[mpich-discuss] Parallel test hanging with mpich on rhel7
Balaji, Pavan
balaji at anl.gov
Mon Feb 10 22:56:41 CST 2014
On 2/10/14, 10:47 PM, "Orion Poplawski" <orion at cora.nwra.com> wrote:
>On 02/06/2014 09:10 PM, Balaji, Pavan wrote:
>>
>> Thanks. That’s very useful analysis. Would you be willing to try the
>> attached patch to see if it solves this issue?
>>
>> — Pavan
>
>Well, it seems to prevent a hang (although I'm also updating from 3.0.4
>to 3.1rc3 so not sure what is all changing here), but it does not run:
It might be easier to use the nightly snapshots to make sure you are not
missing some fixes:
http://www.mpich.org/static/tarballs/nightly/master/mpich/
The patch I sent, as well as a few other patches after 3.1rc3, are all
included in the nightly snapshots.
>============================
>Fatal error in MPI_Init: Other MPI error, error stack:
>MPIR_Init_thread(467)..............:
>MPID_Init(177).....................: channel initialization failed
>MPIDI_CH3_Init(70).................:
>MPID_nem_init(319).................:
>MPID_nem_tcp_init(171).............:
>MPID_nem_tcp_get_business_card(418):
>MPID_nem_tcp_init(377).............: gethostbyname failed, i-00001ff8
>(errno 1)
>Fatal error in MPI_Init: Other MPI error, error stack:
>MPIR_Init_thread(467)..............:
>MPID_Init(177).....................: channel initialization failed
>MPIDI_CH3_Init(70).................:
>MPID_nem_init(319).................:
>MPID_nem_tcp_init(171).............:
>MPID_nem_tcp_get_business_card(418):
>MPID_nem_tcp_init(377).............: gethostbyname failed, i-00001ff8
>(errno 1)
That’s really weird. Errno 1 is EPERM ("Operation not permitted"). I don’t
know how that’s happening with gethostbyname.
Can you send your mpiexec command line and a small program that reproduces
this error? E.g., a program that just calls MPI_Init and MPI_Finalize would
be ideal, if it reproduces the error.
— Pavan