[mpich-discuss] MPI error

Seth Munholland munholl at uwindsor.ca
Wed Feb 15 10:26:16 CST 2017


That's very strange, as I never installed Open MPI.  I used apt-get to
install MPICH and deliberately avoided Open MPI to keep this from
happening.  I thought it would be an easy fix to uninstall and reinstall
via apt-get, but I can't seem to actually remove this Open MPI version.
Would a source compile overwrite the apt-get-installed version?
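A source compile usually would not overwrite the apt-get files: it installs under its own prefix (typically /usr/local), so whichever install's bin directory comes first in PATH is the one that actually runs. A minimal sketch of that shadowing behavior, using two hypothetical stub `mpiexec` scripts under /tmp in place of real MPI installs:

```shell
# Sketch: PATH order, not installation order, decides which mpiexec runs.
# The two prefixes below are hypothetical stand-ins, not real MPI builds.
mkdir -p /tmp/mpich_demo/bin /tmp/ompi_demo/bin
printf '#!/bin/sh\necho "MPICH mpiexec"\n'    > /tmp/mpich_demo/bin/mpiexec
printf '#!/bin/sh\necho "Open MPI mpiexec"\n' > /tmp/ompi_demo/bin/mpiexec
chmod +x /tmp/mpich_demo/bin/mpiexec /tmp/ompi_demo/bin/mpiexec

# With the MPICH prefix first on PATH, its mpiexec shadows the other:
env PATH=/tmp/mpich_demo/bin:/tmp/ompi_demo/bin:/usr/bin:/bin mpiexec
# With the order reversed, the Open MPI stub wins instead:
env PATH=/tmp/ompi_demo/bin:/tmp/mpich_demo/bin:/usr/bin:/bin mpiexec
```

Against the real installs, the equivalent check would be `command -v mpiexec` and `mpiexec --version` on each node; a source build in /usr/local will shadow or be shadowed by the apt packages depending on how PATH is ordered in the shell startup files.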

Seth Munholland, B.Sc.
Department of Biological Sciences
Rm. 304 Biology Building
University of Windsor
401 Sunset Ave. N9B 3P4
T: (519) 253-3000 Ext: 4755

On Wed, Feb 8, 2017 at 11:50 AM, Halim Amer <halim.amer at acm.org> wrote:

> Hi,
>
> It seems you are using Open MPI. You can either contact the Open MPI team
> to look into your problem, or clean your environment (both binaries and
> libraries) if you want to use MPICH, to make sure you are truly using
> MPICH when building and running your program.
>
> Halim
> www.mcs.anl.gov/~aamer
>
>
> On 2/8/17 10:18 AM, Seth Munholland wrote:
>
>> Hello everyone,
>>
>> I have been configuring a new ubuntu cluster and wanted it to run MPI
>> programs.  I got mpich configured and compiled then ran the following as
>> a test:
>>
>> #include <stdio.h>
>> #include <mpi.h>
>>
>> int main (int argc, char** argv) {
>>     int rank = 0, size = 0, nameLen = 0;
>>     char procName[MPI_MAX_PROCESSOR_NAME];
>>
>>     MPI_Init (&argc, &argv);
>>     MPI_Comm_size (MPI_COMM_WORLD, &size);
>>     MPI_Comm_rank (MPI_COMM_WORLD, &rank);
>>     MPI_Get_processor_name (procName, &nameLen);
>>
>>     printf ("Hello from processor %s, rank %d of %d\n", procName, rank, size);
>>
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> using the commands:
>> mpicc mpi_hello.c -o mpi_hello
>> mpiexec -machinefile machinefile mpi_hello
>>
>> my machinefile looks like this:
>> beanblade4:24
>> beanblade3:24
>> beanblade2:24
>> beanblade:24
>>
>> all of which are properly defined in /etc/hosts.  It compiled and executed
>> without error and printed output from all the processes I had on each
>> node.  I went on to install some programs on the NFS drive, and when I
>> tried to run them I got the following error:
>>
>> bash: orted: command not found
>> --------------------------------------------------------------------------
>> ORTE was unable to reliably start one or more daemons.
>> This usually is caused by:
>>
>> * not finding the required libraries and/or binaries on
>>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>>   settings, or configure OMPI with --enable-orterun-prefix-by-default
>>
>> * lack of authority to execute on one or more specified nodes.
>>   Please verify your allocation and authorities.
>>
>> * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
>>   Please check with your sys admin to determine the correct location to use.
>>
>> * compilation of the orted with dynamic libraries when static are required
>>   (e.g., on Cray). Please check your configure cmd line and consider using
>>   one of the contrib/platform definitions for your system type.
>>
>> * an inability to create a connection back to mpirun due to a
>>   lack of common network interfaces and/or no route found between
>>   them. Please check network connectivity (including firewalls
>>   and network routing requirements).
>> --------------------------------------------------------------------------
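The "bash: orted: command not found" line comes from the remote shell, not the local one: Open MPI's mpirun launches its orted daemon on each node via ssh, and non-interactive ssh shells often skip the ~/.bashrc exports an interactive login sees. A small simulation of that failure mode, using a hypothetical install prefix under /tmp and a restricted PATH in place of a real remote shell:

```shell
# Sketch: whether 'orted' is found depends entirely on the PATH the
# remote (non-interactive) shell sees. /tmp/orted_demo is a hypothetical
# install prefix standing in for a real Open MPI tree.
mkdir -p /tmp/orted_demo/bin
printf '#!/bin/sh\necho "orted stub"\n' > /tmp/orted_demo/bin/orted
chmod +x /tmp/orted_demo/bin/orted

# A minimal PATH, like one a non-interactive ssh shell may get, cannot find it:
env PATH=/usr/bin:/bin sh -c 'command -v orted || echo "orted: command not found"'
# With the install's bin directory on PATH, the daemon resolves:
env PATH=/tmp/orted_demo/bin:/usr/bin:/bin sh -c 'command -v orted'
```

On the real cluster the equivalent check would be something like `ssh beanblade2 'which orted'` (node name taken from the machinefile above); if that fails while the same command works in an interactive login, the PATH export lives in a startup file that non-interactive shells do not read.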
>>
>> This error now gets returned when I try to run the mpi_hello program I
>> used to test my MPI installation.  If I drop the machinefile flag from
>> my mpiexec command I get:
>>
>> mpiexec ./mpi_hello
>> --------------------------------------------------------------------------
>> [[14894,1],17]: A high-performance Open MPI point-to-point messaging module
>> was unable to find any relevant network interfaces:
>>
>> Module: OpenFabrics (openib)
>>   Host: beanblade
>>
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --------------------------------------------------------------------------
>> Hello from processor beanblade, rank 18 of 24
>> Hello from processor beanblade, rank 0 of 24
>> Hello from processor beanblade, rank 2 of 24
>> Hello from processor beanblade, rank 4 of 24
>> Hello from processor beanblade, rank 6 of 24
>> Hello from processor beanblade, rank 7 of 24
>> Hello from processor beanblade, rank 3 of 24
>> Hello from processor beanblade, rank 1 of 24
>> Hello from processor beanblade, rank 5 of 24
>> Hello from processor beanblade, rank 8 of 24
>> Hello from processor beanblade, rank 12 of 24
>> Hello from processor beanblade, rank 13 of 24
>> Hello from processor beanblade, rank 11 of 24
>> Hello from processor beanblade, rank 9 of 24
>> Hello from processor beanblade, rank 10 of 24
>> Hello from processor beanblade, rank 14 of 24
>> Hello from processor beanblade, rank 15 of 24
>> Hello from processor beanblade, rank 16 of 24
>> Hello from processor beanblade, rank 17 of 24
>> Hello from processor beanblade, rank 19 of 24
>> Hello from processor beanblade, rank 20 of 24
>> Hello from processor beanblade, rank 21 of 24
>> Hello from processor beanblade, rank 22 of 24
>> Hello from processor beanblade, rank 23 of 24
>> [beanblade:13849] 23 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
>> [beanblade:13849] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>
>> I've been trying to figure it out on the forums, and I think it's
>> something to do with my .bashrc file (I have a few exports for
>> environment variables), but removing them didn't fix the problem.
>> What did I break?  Where do I look to fix it?
>>
>> Seth Munholland, B.Sc.
>> Department of Biological Sciences
>> Rm. 304 Biology Building
>> University of Windsor
>> 401 Sunset Ave. N9B 3P4
>> T: (519) 253-3000 Ext: 4755
>>
>>
>> _______________________________________________
>> discuss mailing list     discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
>>

