[mpich-discuss] MPI error

Halim Amer halim.amer at acm.org
Wed Feb 8 10:50:06 CST 2017


Hi,

It seems you are using Open MPI. You can either contact the Open MPI 
team to look into your problem, or, if you want to use MPICH, clean up 
your environment (both binaries and libraries) to make sure you are 
truly using MPICH when building and running your program.
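
A quick sanity check (a sketch; the exact paths and output depend on 
where each MPI is installed) is to see which implementation the 
wrappers and launcher resolve to:

which mpicc mpiexec      # both should live under your MPICH install prefix
mpicc -show              # MPICH's wrapper prints the underlying compile line
mpiexec --version        # MPICH's Hydra launcher identifies itself as HYDRA
mpichversion             # utility shipped only with MPICH

If mpiexec resolves to Open MPI's launcher, then PATH (and likely 
LD_LIBRARY_PATH) still point at the Open MPI installation.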

Halim
www.mcs.anl.gov/~aamer

On 2/8/17 10:18 AM, Seth Munholland wrote:
> Hello everyone,
>
> I have been configuring a new Ubuntu cluster and wanted it to run MPI
> programs.  I got MPICH configured and compiled, then ran the following as
> a test:
>
> #include <stdio.h>
> #include <mpi.h>
>
> int main (int argc, char** argv) {
>     int rank = 0, size = 0, nameLen = 0;
>     char procName[MPI_MAX_PROCESSOR_NAME];
>
>     MPI_Init (&argc, &argv);
>     MPI_Comm_size (MPI_COMM_WORLD, &size);
>     MPI_Comm_rank (MPI_COMM_WORLD, &rank);
>     MPI_Get_processor_name (procName, &nameLen);
>
>     printf ("Hello from processor %s, rank %d of %d\n", procName, rank,
> size);
>
>     MPI_Finalize();
>     return 0;
> }
>
> using the commands:
> mpicc mpi_hello.c -o mpi_hello
> mpiexec -machinefile machinefile mpi_hello
>
> my machinefile looks like this:
> beanblade4:24
> beanblade3:24
> beanblade2:24
> beanblade:24
>
> which are all defined in /etc/hosts.  It compiled and executed
> without error and printed a line from every process on each node.  I went
> on to install some programs on the NFS drive, and when I tried to run
> them I got the following error:
>
> bash: orted: command not found
> --------------------------------------------------------------------------
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
>
> * not finding the required libraries and/or binaries on
>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>   settings, or configure OMPI with --enable-orterun-prefix-by-default
>
> * lack of authority to execute on one or more specified nodes.
>   Please verify your allocation and authorities.
>
> * the inability to write startup files into /tmp
> (--tmpdir/orte_tmpdir_base).
>   Please check with your sys admin to determine the correct location to use.
>
> *  compilation of the orted with dynamic libraries when static are required
>   (e.g., on Cray). Please check your configure cmd line and consider using
>   one of the contrib/platform definitions for your system type.
>
> * an inability to create a connection back to mpirun due to a
>   lack of common network interfaces and/or no route found between
>   them. Please check network connectivity (including firewalls
>   and network routing requirements).
> --------------------------------------------------------------------------
>
> This error is now returned when I try to run the mpi_hello program I
> used to test my MPI installation.  If I drop the machinefile option from
> my mpiexec command I get:
>
> mpiexec ./mpi_hello
> --------------------------------------------------------------------------
> [[14894,1],17]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
>
> Module: OpenFabrics (openib)
>   Host: beanblade
>
> Another transport will be used instead, although this may result in
> lower performance.
> --------------------------------------------------------------------------
> Hello from processor beanblade, rank 18 of 24
> Hello from processor beanblade, rank 0 of 24
> Hello from processor beanblade, rank 2 of 24
> Hello from processor beanblade, rank 4 of 24
> Hello from processor beanblade, rank 6 of 24
> Hello from processor beanblade, rank 7 of 24
> Hello from processor beanblade, rank 3 of 24
> Hello from processor beanblade, rank 1 of 24
> Hello from processor beanblade, rank 5 of 24
> Hello from processor beanblade, rank 8 of 24
> Hello from processor beanblade, rank 12 of 24
> Hello from processor beanblade, rank 13 of 24
> Hello from processor beanblade, rank 11 of 24
> Hello from processor beanblade, rank 9 of 24
> Hello from processor beanblade, rank 10 of 24
> Hello from processor beanblade, rank 14 of 24
> Hello from processor beanblade, rank 15 of 24
> Hello from processor beanblade, rank 16 of 24
> Hello from processor beanblade, rank 17 of 24
> Hello from processor beanblade, rank 19 of 24
> Hello from processor beanblade, rank 20 of 24
> Hello from processor beanblade, rank 21 of 24
> Hello from processor beanblade, rank 22 of 24
> Hello from processor beanblade, rank 23 of 24
> [beanblade:13849] 23 more processes have sent help message
> help-mpi-btl-base.txt / btl:no-nics
> [beanblade:13849] Set MCA parameter "orte_base_help_aggregate" to 0 to
> see all help / error messages
>
> I've been trying to figure it out on the forums and I think it's
> something to do with my .bashrc file (I have a few exports for
> environment variables), but removing them didn't fix the problem.
> What did I break?  Where do I look to fix it?
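>
> (For context, a minimal sketch of the sort of exports in question, using a
> hypothetical /opt/mpich prefix rather than the actual install paths:
>
> export PATH=/opt/mpich/bin:$PATH                      # hypothetical prefix
> export LD_LIBRARY_PATH=/opt/mpich/lib:$LD_LIBRARY_PATH
>
> If an Open MPI bin directory comes before the MPICH one on PATH, Open
> MPI's mpiexec and orted are the ones that get picked up.)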
>
> Seth Munholland, B.Sc.
> Department of Biological Sciences
> Rm. 304 Biology Building
> University of Windsor
> 401 Sunset Ave. N9B 3P4
> T: (519) 253-3000 Ext: 4755
>
>
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

