<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div dir="ltr"><div>Hello everyone,<br><br>I have been configuring a new Ubuntu cluster
and wanted it to run MPI programs. I got MPICH configured and compiled,
then ran the following as a test:<br><br>#include &lt;stdio.h&gt;<br>#include &lt;mpi.h&gt;<br><br>int main (int argc, char** argv) {<br> int rank = 0, size = 0, nameLen = 0;<br> char procName[MPI_MAX_PROCESSOR_NAME];<br><br> MPI_Init (&amp;argc, &amp;argv);<br> MPI_Comm_size (MPI_COMM_WORLD, &amp;size);<br> MPI_Comm_rank (MPI_COMM_WORLD, &amp;rank);<br> MPI_Get_processor_name (procName, &amp;nameLen);<br><br> printf ("Hello from processor %s, rank %d of %d\n", procName, rank, size);<br><br> MPI_Finalize();<br> return 0;<br>}<br><br></div><div>using the commands:<br></div><div>mpicc mpi_hello.c -o mpi_hello<br></div><div>mpiexec -machinefile machinefile mpi_hello<br></div><div><br></div><div>My machinefile looks like this:<br></div><div>beanblade4:24<br></div><div>beanblade3:24<br></div><div>beanblade2:24<br></div><div>beanblade:24<br><br></div><div>All
of these hosts are properly defined in /etc/hosts. It compiled and executed without
error and printed a greeting from every process on each node. I then went on to
install some programs on the NFS drive, and when I tried to run them I got
the following error:<br></div><div><br>bash: orted: command not found<br>------------------------------<wbr>------------------------------<wbr>--------------<br>ORTE was unable to reliably start one or more daemons.<br>This usually is caused by:<br><br>* not finding the required libraries and/or binaries on<br> one or more nodes. Please check your PATH and LD_LIBRARY_PATH<br> settings, or configure OMPI with --enable-orterun-prefix-by-<wbr>default<br><br>* lack of authority to execute on one or more specified nodes.<br> Please verify your allocation and authorities.<br><br>* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).<br> Please check with your sys admin to determine the correct location to use.<br><br>* compilation of the orted with dynamic libraries when static are required<br> (e.g., on Cray). Please check your configure cmd line and consider using<br> one of the contrib/platform definitions for your system type.<br><br>* an inability to create a connection back to mpirun due to a<br> lack of common network interfaces and/or no route found between<br> them. Please check network connectivity (including firewalls<br> and network routing requirements).<br>------------------------------<wbr>------------------------------<wbr>--------------<br><br></div><div>This
error is now returned even when I run the mpi_hello program I used
to test my MPI installation. If I drop the -machinefile option from my
mpiexec command, I get:<br><br>mpiexec ./mpi_hello<br>------------------------------<wbr>------------------------------<wbr>--------------<br>[[14894,1],17]: A high-performance Open MPI point-to-point messaging module<br>was unable to find any relevant network interfaces:<br><br>Module: OpenFabrics (openib)<br> Host: beanblade<br><br>Another transport will be used instead, although this may result in<br>lower performance.<br>------------------------------<wbr>------------------------------<wbr>--------------<br>Hello from processor beanblade, rank 18 of 24<br>Hello from processor beanblade, rank 0 of 24<br>Hello from processor beanblade, rank 2 of 24<br>Hello from processor beanblade, rank 4 of 24<br>Hello from processor beanblade, rank 6 of 24<br>Hello from processor beanblade, rank 7 of 24<br>Hello from processor beanblade, rank 3 of 24<br>Hello from processor beanblade, rank 1 of 24<br>Hello from processor beanblade, rank 5 of 24<br>Hello from processor beanblade, rank 8 of 24<br>Hello from processor beanblade, rank 12 of 24<br>Hello from processor beanblade, rank 13 of 24<br>Hello from processor beanblade, rank 11 of 24<br>Hello from processor beanblade, rank 9 of 24<br>Hello from processor beanblade, rank 10 of 24<br>Hello from processor beanblade, rank 14 of 24<br>Hello from processor beanblade, rank 15 of 24<br>Hello from processor beanblade, rank 16 of 24<br>Hello from processor beanblade, rank 17 of 24<br>Hello from processor beanblade, rank 19 of 24<br>Hello from processor beanblade, rank 20 of 24<br>Hello from processor beanblade, rank 21 of 24<br>Hello from processor beanblade, rank 22 of 24<br>Hello from processor beanblade, rank 23 of 24<br>[beanblade:13849] 23 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics<br>[beanblade:13849] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages<br><br></div>I've
been trying to figure this out from the forums, and I think it's something
to do with my .bashrc file (I have a few exports for environment
variables), but removing them didn't fix the problem. What did I
break? Where do I look to fix it?<br><br><div><div class="gmail_signature"><div dir="ltr"><div>Seth Munholland, B.Sc.<br></div><div>Department of Biological Sciences<br>
Rm. 304 Biology Building<br>
University of Windsor<br>
401 Sunset Ave. N9B 3P4<br>
T: <a value="+15192533000">(519) 253-3000 Ext: 4755</a></div></div></div></div>
</div>