[mpich-discuss] parallel execution error

Haider Abbas haiderabbasphy at gmail.com
Mon Oct 8 10:35:30 CDT 2018


Dear all,

Parallel execution of GAMESS gives the output shown below. I suspect this
is caused by an internal Ubuntu firewall, because the cluster is connected
through a separate switch with no Internet connection. Could you please let
me know how to overcome this problem?
Which iptables command should I run on the master and on the other nodes?
As this is my first time setting up a cluster, please also advise me on how
to run it: from which node should I start my job, the master or one of the
other nodes?
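
A minimal sketch of a check that may help narrow this down, assuming Python 3
is available on every node. The function name probe and its return labels are
illustrative only and are not part of GAMESS or DDI. The idea is that a
"refused" result usually means the connection was actively rejected (nothing
listening on that port, or a firewall rule that rejects), whereas a firewall
that silently drops packets tends to show up as a "timeout".

```python
import errno
import socket


def probe(host, port, timeout=3.0):
    """Try one TCP connection to host:port and classify the outcome.

    Returns one of "open", "refused", "timeout", "unresolved",
    or "error: <ERRNO NAME>" for anything else.
    """
    try:
        # create_connection resolves the name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer answered with a reset: no listener, or a REJECT rule.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout: packets may be dropped.
        return "timeout"
    except socket.gaierror:
        # The host name could not be resolved to an address.
        return "unresolved"
    except OSError as exc:
        return "error: " + errno.errorcode.get(exc.errno, str(exc))
```

For example, calling probe("master", 22) from each compute node would test
whether the master is reachable on the SSH port; port 22 is only an assumption
here, and any port with a known listener would serve. Comparing the address
printed by socket.gethostbyname("master") on each node may also show whether
the name resolves to the intended network interface.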

With regards,
Yours sincerely,

Haider Abbas







----- GAMESS execution script 'rungms' -----
This job is running on host physics-OptiPlex-3046
under operating system Linux at Sat Oct 6 14:32:01 IST 2018
Available scratch disk space (Kbyte units) at beginning of the job is
Filesystem           1K-blocks     Used Available Use% Mounted on
master:/home/mpiuser 953145344 23300096 881404928   3% /home/mpiuser
GAMESS temporary binary files will be written to /home/mpiuser/gamess
GAMESS supplementary output files will be written to /home/mpiuser/scr
Copying input file siguanine.inp to your run's scratch directory...
cp siguanine.inp /home/mpiuser/gamess/siguanine.F05
unset echo
/home/mpiuser/gamess/ddikick.x /home/mpiuser/gamess/gamess.00.x siguanine
-ddi 10 40 master:cpus=4 node1:cpus=4 node2:cpus=4 node3:cpus=4
node4:cpus=4 node5:cpus=4 node6:cpus=4 node7:cpus=4 node8:cpus=4
node9:cpus=4 -scr /home/mpiuser/gamess

 Distributed Data Interface kickoff program.
 Initiating 40 compute processes on 10 nodes to run the following command:
 /home/mpiuser/gamess/gamess.00.x siguanine

 TCP connect error: ECONNREFUSED.
 DDI Process 40: error code 911
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 DDI Process 42: error code 911
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP connect error: ECONNREFUSED.
 TCP connect error: ECONNREFUSED.
 TCP connect error: ECONNREFUSED.
 TCP connect error: ECONNREFUSED.
 TCP connect error: ECONNREFUSED.
 DDI Process 1: error code 911
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 DDI Process 41: error code 911
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 DDI Process 43: error code 911
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 DDI Process 2: error code 911
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP connect error: ECONNREFUSED.
 TCP connect error: ECONNREFUSED.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP connect error: ECONNREFUSED.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 DDI Process 3: error code 911
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 DDI Process 0: error code 911
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP connect error: ECONNREFUSED.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 TCP connect error: ECONNREFUSED.
 TCP: Connect failed. physics-OptiPlex-3046 -> master:35850.
 ddikick.x: Timed out while waiting for DDI processes to check in.
 ddikick.x: Fatal error detected.
 The error is most likely to be in the application, so check for
 input errors, disk space, memory needs, application bugs, etc.
 ddikick.x will now clean up all processes, and exit...
 ddikick.x: Sending kill signal to DDI processes.
 ddikick.x: Execution terminated due to error(s).
unset echo
----- accounting info -----
Files used on the master node physics-OptiPlex-3046 were:
-rw-r--r-- 1 mpiuser mpiuser 2594 Oct  6 14:28
/home/mpiuser/gamess/siguanine.F05
-rw-r--r-- 1 physics physics 2594 Aug  8  2016
/home/mpiuser/gamess/siguanine.inp
-rw-rw-r-- 1 mpiuser mpiuser 7856 Oct  6 14:29
/home/mpiuser/gamess/siguanine.out
ls: No match.
ls: No match.
ls: No match.
Files from node1 are:
-rw-r--r-- 1 physics physics 2594 Aug  8  2016
/home/mpiuser/gamess/siguanine.inp
-rw-rw-r-- 1 mpiuser mpiuser 8166 Oct  6 14:29
/home/mpiuser/gamess/siguanine.out
Files from node2 are:
-rw-r--r-- 1 physics physics 2594 Aug  8  2016
/home/mpiuser/gamess/siguanine.inp
-rw-rw-r-- 1 mpiuser mpiuser 8352 Oct  6 14:29
/home/mpiuser/gamess/siguanine.out
Files from node3 are:
-rw-r--r-- 1 physics physics 2594 Aug  8  2016
/home/mpiuser/gamess/siguanine.inp
-rw-rw-r-- 1 mpiuser mpiuser 8538 Oct  6 14:29
/home/mpiuser/gamess/siguanine.out
Files from node4 are: