[mpich-discuss] MPI error

Seth Munholland munholl at uwindsor.ca
Wed Feb 8 10:18:56 CST 2017


Hello everyone,

I have been configuring a new Ubuntu cluster to run MPI programs. I
configured and compiled MPICH, then ran the following test program:

#include <stdio.h>
#include <mpi.h>

int main (int argc, char** argv) {
    int rank = 0, size = 0, nameLen = 0;
    char procName[MPI_MAX_PROCESSOR_NAME];

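    MPI_Init (&argc, &argv);
    /* Total number of processes, and this process's rank, in MPI_COMM_WORLD. */
    MPI_Comm_size (MPI_COMM_WORLD, &size);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    /* Name of the node this rank is running on. */
    MPI_Get_processor_name (procName, &nameLen);

    printf ("Hello from processor %s, rank %d of %d\n", procName, rank, size);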

    MPI_Finalize();
    return 0;
}

using these commands:
mpicc mpi_hello.c -o mpi_hello
mpiexec -machinefile machinefile mpi_hello

My machinefile looks like this:
beanblade4:24
beanblade3:24
beanblade2:24
beanblade:24

All four hostnames are defined in /etc/hosts. The program compiled and ran
without error, printing a line from every process on each node. I then
installed some programs on the NFS drive, and when I tried to run it I got
the following error:

bash: orted: command not found
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--------------------------------------------------------------------------

This error is now also returned when I run the mpi_hello program I used to
test my MPI installation. If I drop the -machinefile option from my mpiexec
command, I get:

mpiexec ./mpi_hello
--------------------------------------------------------------------------
[[14894,1],17]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: beanblade

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
Hello from processor beanblade, rank 18 of 24
Hello from processor beanblade, rank 0 of 24
Hello from processor beanblade, rank 2 of 24
Hello from processor beanblade, rank 4 of 24
Hello from processor beanblade, rank 6 of 24
Hello from processor beanblade, rank 7 of 24
Hello from processor beanblade, rank 3 of 24
Hello from processor beanblade, rank 1 of 24
Hello from processor beanblade, rank 5 of 24
Hello from processor beanblade, rank 8 of 24
Hello from processor beanblade, rank 12 of 24
Hello from processor beanblade, rank 13 of 24
Hello from processor beanblade, rank 11 of 24
Hello from processor beanblade, rank 9 of 24
Hello from processor beanblade, rank 10 of 24
Hello from processor beanblade, rank 14 of 24
Hello from processor beanblade, rank 15 of 24
Hello from processor beanblade, rank 16 of 24
Hello from processor beanblade, rank 17 of 24
Hello from processor beanblade, rank 19 of 24
Hello from processor beanblade, rank 20 of 24
Hello from processor beanblade, rank 21 of 24
Hello from processor beanblade, rank 22 of 24
Hello from processor beanblade, rank 23 of 24
[beanblade:13849] 23 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[beanblade:13849] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

I've been searching the forums, and I suspect it has something to do with
my .bashrc file (I have a few exports for environment variables), but
removing them didn't fix the problem. What did I break, and where should I
look to fix it?

Seth Munholland, B.Sc.
Department of Biological Sciences
Rm. 304 Biology Building
University of Windsor
401 Sunset Ave. N9B 3P4
T: (519) 253-3000 Ext: 4755
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

