[mpich-discuss] success and failure report for mpich-3.0.2

Pavan Balaji balaji at mcs.anl.gov
Wed May 1 07:42:02 CDT 2013


Siegmar,

Adding discuss at mpich.org back to the cc list.  Please don't drop it.

First, don't add the full path.  That won't help when the executable is
at different paths on the two machines.

Can you please run this from both sunpc1 and linpc1:

% mpiexec -np 2 -hosts sunpc1,linpc1 which hostname

I'm still curious whether both machines see the same path because one
of them is accessed locally (through fork) while the other is accessed
over ssh.  The environment you see when you log in interactively might
not be the same as the one a non-interactive ssh launch gets.
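
As a rough sketch of how the two environments could be compared,
assuming a POSIX sh on the machine you run it from: path_only_in below
is a hypothetical helper (not part of MPICH or Hydra) that prints the
directories present in one PATH string but missing from another.

```shell
# path_only_in A B: print each directory that appears in the
# colon-separated PATH string A but not in PATH string B, one per line.
# Hypothetical helper for illustration only.
path_only_in() {
    a=$1
    b=$2
    old_ifs=$IFS
    IFS=:                      # split A on colons in the for loop below
    for dir in $a; do
        case ":$b:" in
            *":$dir:"*) ;;     # also present in B, nothing to report
            *) printf '%s\n' "$dir" ;;
        esac
    done
    IFS=$old_ifs
}

# Possible usage with the hosts from this thread.  The single quotes
# make the remote, non-interactive shell expand $PATH:
#   sun_path=$(ssh sunpc1 'echo $PATH')
#   lin_path=$(ssh linpc1 'echo $PATH')
#   path_only_in "$lin_path" "$sun_path"
```

Comparing that output with "echo $PATH" from an interactive login on
the same machine should show whether the two environments differ.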

 -- Pavan

On 05/01/2013 04:29 AM US Central Time, Siegmar Gross wrote:
> Hi
> 
>> On 04/30/2013 05:55 AM US Central Time, Siegmar Gross wrote:
>>> It seems that I don't need a path if the command has the same path
>>> on both machines. It breaks if the program has different pathnames.
>>
>> From the launching logic, I don't know how that'll be true.  I just
>> tested this as well and it works fine for me.
>>
>>> sunpc1 fd1026 108 mpiexec -np 2 -host sunpc1,linpc1 hostname
>>> sunpc1
>>> [proxy:0:1 at linpc1] HYDU_create_process 
>>> (../../../../mpich-3.0.2/src/pm/hydra/utils/launch/launch.c:74):
>>>   execvp error on file hostname (No such file or directory)
>>
>> My guess is that "hostname" is on your path on one of the machines
>> and not on the other.
> 
> No, both machines know "hostname". I'll try to show you which PATH is
> available on both machines.
> 
> 
> sunpc1 hello_1 110 mpiexec -np 2 -host sunpc1 environ_mpi
> 
> Now 1 slave tasks are sending their environment.
> 
> Environment from task 1:
>   message type:        3
>   msg length:          3394 characters
>   message:             
>     hostname:          sunpc1
>     operating system:  SunOS
>     release:           5.10
>     processor:         i86pc
>     PATH
>                        /usr/local/eclipse-3.6.1
>                        /usr/local/NetBeans-4.0/bin
>                        /usr/local/jdk1.7.0_07/bin/amd64
>                        /usr/local/apache-ant-1.6.2/bin
>                        /usr/local/gcc-4.8.0/bin
>                        /opt/solstudio12.3/bin
>                        /usr/local/bin
>                        /usr/local/ssl/bin
>                        /usr/local/pgsql/bin
>                        /usr/bin
>                        /usr/openwin/bin
>                        /usr/dt/bin
>                        /usr/ccs/bin
>                        /usr/sfw/bin
>                        /opt/sfw/bin
>                        /usr/ucb
>                        /usr/lib/lp/postscript
>                        /usr/local/teTeX-1.0.7/bin/i386-pc-solaris2.10
>                        /usr/local/bluej-2.1.2
>                        /usr/local/mpich-3.0.2_64_cc/bin
>                        /home/fd1026/SunOS/x86_64/bin
>                        .
>                        /usr/sbin
>     LD_LIBRARY_PATH_64
> ...
> 
> 
> sunpc1 hello_1 111 mpiexec -np 2 -host linpc1 environ_mpi
> [proxy:0:0 at linpc1] HYDU_create_process 
> (../../../../mpich-3.0.2/src/pm/hydra/utils/launch/launch.c:74):
>   execvp error on file environ_mpi (No such file or directory)
> [proxy:0:0 at linpc1] HYDU_create_process 
> (../../../../mpich-3.0.2/src/pm/hydra/utils/launch/launch.c:74):
>   execvp error on file environ_mpi (No such file or directory)
> 
> ======================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   EXIT CODE: 255
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> =======================================================================
> sunpc1 hello_1 112 
> 
> 
> 
> Now I switch the local host from Solaris to Linux and try everything
> once more.
> 
> sunpc1 hello_1 112 ssh linpc1
> linpc1 fd1026 102  mpiexec -np 2 -host linpc1 environ_mpi
> 
> Now 1 slave tasks are sending their environment.
> 
> Environment from task 1:
>   message type:        3
>   msg length:          3452 characters
>   message:             
>     hostname:          linpc1
>     operating system:  Linux
>     release:           3.1.10-1.16-desktop
>     processor:         x86_64
>     PATH
>                        /usr/local/eclipse-3.6.1
>                        /usr/local/NetBeans-4.0/bin
>                        /usr/local/jdk1.7.0_07-64/bin
>                        /usr/local/apache-ant-1.6.2/bin
>                        /usr/local/icc-9.1/idb/bin
>                        /usr/local/icc-9.1/cc/bin
>                        /usr/local/icc-9.1/fc/bin
>                        /usr/local/gcc-4.8.0/bin
>                        /opt/solstudio12.3/bin
>                        /usr/local/bin
>                        /usr/local/ssl/bin
>                        /usr/local/pgsql/bin
>                        /bin
>                        /usr/bin
>                        /usr/X11R6/bin
>                        /usr/local/teTeX-1.0.7/bin/i586-pc-linux-gnu
>                        /usr/local/bluej-2.1.2
>                        /usr/local/mpich-3.0.2_64_cc/bin
>                        /home/fd1026/Linux/x86_64/bin
>                        .
>                        /usr/sbin
>     LD_LIBRARY_PATH_64
> ...
> 
> 
> 
> linpc1 fd1026 103 mpiexec -np 2 -host sunpc1 environ_mpi
> 
> ====================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   EXIT CODE: 9
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> =====================================================================
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions
> 
> 
> OK, now let's try with a full pathname.
> 
> linpc1 fd1026 104 mpiexec -np 2 -host sunpc1 /home/fd1026/SunOS/x86_64/bin/environ_mpi
> 
> Now 1 slave tasks are sending their environment.
> 
> Environment from task 1:
>   message type:        3
>   msg length:          3436 characters
>   message:             
>     hostname:          sunpc1
>     operating system:  SunOS
>     release:           5.10
>     processor:         i86pc
>     PATH
>                        /usr/local/eclipse-3.6.1
>                        /usr/local/NetBeans-4.0/bin
>                        /usr/local/jdk1.7.0_07-64/bin
>                        /usr/local/apache-ant-1.6.2/bin
>                        /usr/local/icc-9.1/idb/bin
>                        /usr/local/icc-9.1/cc/bin
>                        /usr/local/icc-9.1/fc/bin
>                        /usr/local/gcc-4.8.0/bin
>                        /opt/solstudio12.3/bin
>                        /usr/local/bin
>                        /usr/local/ssl/bin
>                        /usr/local/pgsql/bin
>                        /bin
>                        /usr/bin
>                        /usr/X11R6/bin
>                        /usr/local/teTeX-1.0.7/bin/i586-pc-linux-gnu
>                        /usr/local/bluej-2.1.2
>                        /usr/local/mpich-3.0.2_64_cc/bin
>                        /home/fd1026/Linux/x86_64/bin
>                        .
>                        /usr/sbin
>     LD_LIBRARY_PATH_64
> ...
> 
> 
> Ah, you are still using PATH from Linux and not from SunOS. I
> was lucky with "date", because Linux contains its "default"
> pathnames and "/usr/local/bin", while "/bin" is not a "default"
> pathname for Solaris as you can see above. My MPI programs are
> stored in "/home/fd1026/Linux/x86_64/bin" for Linux and in
> "/home/fd1026/SunOS/x86_64/bin" for Solaris x86_64 (I'm using
> NFS, so that I need different directories for the same program
> on different operating systems).
> 
> 
> 
>>> linpc1 fd1026 105 mpiexec -np 2 -host sunpc1,linpc1 hostname
>>> linpc1
>>> sunpc1
>>
>> Are all of /bin /usr/local/bin and /usr/bin in your path?
> 
> No, PATH depends on the operating system and architecture, but
> PATH contains all directories necessary to find all programs.
> All environment variables are set via $HOME/.cshrc. Does MPICH
> need the same PATH on all machines? How do you distinguish a
> program for different operating systems in a NFS environment?
> Do you need a link "$HOME/mpich_programs", which points to the
> operating system specific directory and which is part of PATH?
> 
> 
> Kind regards
> 
> Siegmar
> 

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji


