[mpich-discuss] Fatal error in PMPI_Barrier: A process has failed, error stack:

Rajeev Thakur thakur at mcs.anl.gov
Wed Mar 26 20:45:29 CDT 2014


Is there a firewall on either machine that is in the way of communication?
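
One rough way to check this from either side is to try a plain TCP connection to the other machine. The sketch below is only an illustration, not part of MPICH: the helper name can_connect is made up, and the host "pc5" and port 22 in the usage comment are assumptions taken from this thread. A successful connection to the ssh port only shows that process launch can get through. MPI processes typically talk to each other on other, dynamically chosen ports, which a firewall may still block.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper for illustration; it only tests reachability of one
    port and says nothing about other ports a firewall might block.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example use from svr (host name and port are assumptions from this thread):
#   can_connect("pc5", 22)
```

If the ssh port is reachable but a port opened by a test listener on the other node is not, a firewall is a likely suspect.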

Rajeev

On Mar 26, 2014, at 8:28 PM, Tony Ladd <tladd at che.ufl.edu> wrote:

> No - you get the same error. It looks as if process 1 (on the remote node) is not starting:
> 
> svr:tladd(netbench)> mpirun -n 2 -f hosts /global/usr/src/mpich-3.0.4/examples/cpi
> Process 0 of 2 is on svr.che.ufl.edu
> Fatal error in PMPI_Reduce: A process has failed, error stack:
> PMPI_Reduce(1217)...............: MPI_Reduce(sbuf=0x7fff30ecced8, rbuf=0x7fff30ecced0, count=1, MPI_DOUBLE,
> 
> But if I reverse the order in the host file (pc5 first, then svr), both processes apparently start:
> 
> svr:tladd(netbench)> mpirun -n 2 -f hosts /global/usr/src/mpich-3.0.4/examples/cpi
> Process 1 of 2 is on svr.che.ufl.edu
> Process 0 of 2 is on pc5
> Fatal error in PMPI_Reduce: A process has failed, error stack:
> PMPI_Reduce(1217)...............: MPI_Reduce(sbuf=0x7fff4d776348, rbuf=0x7fff4d776340, count=1, MPI_DOUBLE,
> 
> But with the same result in the end.
> 
> Tony
> 
> 
> 
> On 03/26/2014 08:18 PM, Rajeev Thakur wrote:
>> Does the cpi example run across two machines?
>> 
>> Rajeev
>> 
>> On Mar 26, 2014, at 7:13 PM, Tony Ladd <tladd at che.ufl.edu> wrote:
>> 
>>> Rajeev
>>> 
>>> Sorry about that. I was switching back and forth between Open MPI and MPICH, but it does not make a difference. Here is a clean log from a fresh terminal, with no mention of Open MPI.
>>> 
>>> Tony
>>> 
>>> PS - it's a CentOS 6.5 install; I should have mentioned it before.
>>> 
>>> -- 
>>> Tony Ladd
>>> 
>>> Chemical Engineering Department
>>> University of Florida
>>> Gainesville, Florida 32611-6005
>>> USA
>>> 
>>> Email: tladd-"(AT)"-che.ufl.edu
>>> Web    http://ladd.che.ufl.edu
>>> 
>>> Tel:   (352)-392-6509
>>> FAX:   (352)-392-9514
>>> 
>>> <mpich.log>
>>> _______________________________________________
>>> discuss mailing list     discuss at mpich.org
>>> To manage subscription options or unsubscribe:
>>> https://lists.mpich.org/mailman/listinfo/discuss
> 



