[mpich-discuss] Error running examples after install

Balaji, Pavan balaji at anl.gov
Sat Jan 24 13:14:43 CST 2015


Sounds like a network setup issue, such as a firewall or the entries in your /etc/hosts file.  Did you look through the FAQ entry on this?

http://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions#Q:_My_MPI_program_aborts_with_an_error_saying_it_cannot_communicate_with_other_processes
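
One thing worth ruling out on each machine is a hostname that resolves to a loopback address (stock Debian/Ubuntu installs often map the local hostname to 127.0.1.1 in /etc/hosts), since the other rank cannot reach an address like that. The sketch below is only an illustration and not part of MPICH: the helper names `is_loopback` and `check_hosts` are made up for this example, and the hostnames are the ones from the report below.

```python
import ipaddress
import socket


def is_loopback(address):
    """Return True if the IPv4/IPv6 address string is a loopback address."""
    return ipaddress.ip_address(address).is_loopback


def check_hosts(hostnames, resolve=socket.gethostbyname):
    """Map each hostname to (address, problem).

    problem is None when the name resolves to a non-loopback address.
    The resolve argument is a hypothetical hook so the check can be
    exercised without touching the real resolver.
    """
    report = {}
    for name in hostnames:
        try:
            address = resolve(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            report[name] = (None, "does not resolve: %s" % exc)
            continue
        problem = "resolves to loopback" if is_loopback(address) else None
        report[name] = (address, problem)
    return report
```

Running `check_hosts(["ubuntu", "ubuntu-clone"])` on both machines should show whether either name is missing or points at 127.x.x.x. It does not test the firewall; that needs a separate check.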

  -- Pavan

> On Jan 24, 2015, at 12:58 PM, Tiago dos Santos <santos.tmd at gmail.com> wrote:
> 
> Hello everyone,
> 
> After installing MPICH, I ran the examples and keep getting this error stack:
> 
> tds at ubuntu:~/Downloads/mpich-3.1.3$ mpiexec -f machinefile -n 2 ./examples/cpi
> Warning: Permanently added the ECDSA host key for IP address '192.168.201.138' to the list of known hosts.
> Process 0 of 2 is on ubuntu
> Fatal error in PMPI_Reduce: Unknown error class, error stack:
> PMPI_Reduce(1263)...............: MPI_Reduce(sbuf=0x7fff3f86da00, rbuf=0x7fff3f86da08, count=1, MPI_DOUBLE, MPI_SUM, root=0, MPI_COMM_WORLD) failed
> MPIR_Reduce_impl(1075)..........: 
> MPIR_Reduce_intra(881)..........: 
> MPIR_Reduce_binomial(188).......: 
> MPIDI_CH3U_Recvq_FDU_or_AEP(636): Communication error with rank 1
> 
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 7563 RUNNING AT ubuntu
> =   EXIT CODE: 1
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> [proxy:0:1 at ubuntu-clone] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
> [proxy:0:1 at ubuntu-clone] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
> [proxy:0:1 at ubuntu-clone] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
> [mpiexec at ubuntu] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
> [mpiexec at ubuntu] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
> [mpiexec at ubuntu] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
> [mpiexec at ubuntu] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion
> 
> 
> Since I’m pretty new to the MPI world, I can’t work out what I did wrong. Did I do something wrong with ssh? Was it something else?
> 
> System Specifications:
> - Ubuntu 14.04, 64-bit
> - gcc version 4.8.2
> - Fortran support was disabled during installation
> - This system is running on a virtual machine
> 
> 
> Network Specification:
> - Two machines with the specifications above in a private virtual network
> - One machine is called ubuntu and the other one is ubuntu-clone
> 
> Host Files:
> - ubuntu
> 	- ubuntu
> 	- ubuntu-clone
> - ubuntu-clone
> 	- ubuntu-clone
> 	- ubuntu
> 
> 
> As you can see, the stack trace is from a command running on host ubuntu. The same error is raised when I run the same command on host ubuntu-clone.
> Can anyone help me figure out where I messed up?
> 
> Thanks in advance
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss

--
Pavan Balaji  ✉️
http://www.mcs.anl.gov/~balaji
