[mpich-discuss] MPI_Intercomm_create() for merging two spawned groups
Dave Goodell (dgoodell)
dgoodell at cisco.com
Fri Sep 26 12:45:57 CDT 2014
I haven't read the test case code, but based on the description in this thread I think this is probably a duplicate of ticket #1502: http://trac.mpich.org/projects/mpich/ticket/1502
FWIW, I think that issue has since been fixed in Open MPI, though I haven't tested it myself.
On Sep 26, 2014, at 11:29 AM, Wesley Bland <wbland at anl.gov> wrote:
> I believe that your code is correct. I’ve gone through it to simplify things a bit (attached) and see the same errors as you. That’s probably a bug in MPICH that needs to be fixed, unless someone else comes along and says that MPI_INTERCOMM_CREATE only works if all processes are in the same peer_comm, which isn't how I read the standard. I’ll create a ticket and add you as a CC so you can keep track of things.
> In the meantime, you can avoid this problem by using one of the other ways of setting up communication between two groups of processes. You can use the connect/accept functions as per this ticket: http://trac.mpich.org/projects/mpich/ticket/495 or you can change the way you spawn processes so that all processes in MPI_COMM_WORLD are involved in spawning the new processes. I don’t know if that will actually work for your application, but it’s a stopgap measure while we fix this bug.
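The second workaround, spawning collectively over MPI_COMM_WORLD, can be sketched roughly as follows. This is a minimal illustration, not the code attached to this thread: it assumes the program re-spawns its own binary via argv[0] and is started with two initial processes (mpiexec -n 2). A single MPI_Comm_spawn call yields one intercommunicator spanning both parents and both children, so one MPI_Intercomm_merge gives the 4-process intracommunicator without any MPI_Intercomm_create call.

```c
#include <mpi.h>
#include <stdio.h>

/* Hypothetical sketch of the "spawn from all of MPI_COMM_WORLD" workaround.
 * Assumes launch as: mpiexec -n 2 ./a.out (2 parents + 2 children = 4). */
int main(int argc, char **argv)
{
    MPI_Comm parent, inter, all;
    int nparents, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        MPI_Comm_size(MPI_COMM_WORLD, &nparents);
        /* Collective over MPI_COMM_WORLD; root 0 supplies the command and
         * maxprocs, here one child per parent. */
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, nparents, MPI_INFO_NULL, 0,
                       MPI_COMM_WORLD, &inter, MPI_ERRCODES_IGNORE);
        MPI_Intercomm_merge(inter, 0, &all);   /* parents ordered first */
    } else {
        inter = parent;
        MPI_Intercomm_merge(inter, 1, &all);   /* children ordered after */
    }

    MPI_Comm_size(all, &size);
    printf("merged communicator has %d processes\n", size);

    MPI_Comm_free(&all);
    MPI_Comm_free(&inter);
    MPI_Finalize();
    return 0;
}
```

The trade-off is that both children now share one MPI_COMM_WORLD and are created by a single collective call, so each parent no longer spawns its child independently; as noted above, that may not fit every application.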
> > On Sep 26, 2014, at 10:50 AM, Carsten Clauss <c.clauss at fz-juelich.de> wrote:
> > Dear all,
> > I have a code where two processes (forming the original MPI_COMM_WORLD) each spawn one additional child process (using MPI_COMM_SELF as spawning group).
> > Now I want to create an intra-comm that covers all of these four processes.
> > To do so, I first merge the two inter-comms resulting from the spawn calls into two new intra-comms (by using MPI_Intercomm_merge()).
> > Then I use MPI_Intercomm_create() to create a new inter-comm that connects these two, with the original world communicator as peer_comm.
> > Finally, I merge the resulting inter-comm into the desired intra-comm.
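The four steps just described can be sketched roughly as follows. This is a hypothetical reconstruction, not the attached spaiccreate3.c: it assumes exactly two initial processes, that each re-spawns its own binary via argv[0], and it reuses tag 123 from the error log. Per this thread, MPICH 3.1.2 reportedly fails inside MPI_Intercomm_create on this pattern with "Unknown gpid", while Open MPI runs it. Note that peer_comm and remote_leader are only significant at the local leader (here the parent), which is why the child can pass its own one-process MPI_COMM_WORLD.

```c
#include <mpi.h>
#include <stdio.h>

/* Hypothetical sketch of the spawn / merge / intercomm-create / merge steps.
 * Assumes launch as: mpiexec -n 2 ./a.out */
int main(int argc, char **argv)
{
    MPI_Comm parent, inter, local, bridge, all;
    int is_child, group_id = 0, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);
    is_child = (parent != MPI_COMM_NULL);

    if (!is_child) {
        /* Step 1: each original process spawns one child on MPI_COMM_SELF. */
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &inter, MPI_ERRCODES_IGNORE);
        MPI_Comm_rank(MPI_COMM_WORLD, &group_id);
    } else {
        inter = parent;
    }

    /* Step 2: merge parent and child into a 2-process intracomm
     * (parent passes high = 0, so it becomes rank 0). */
    MPI_Intercomm_merge(inter, is_child, &local);

    /* The child's MPI_COMM_WORLD holds only itself, so it learns which
     * original process (0 or 1) spawned it from its parent. */
    MPI_Bcast(&group_id, 1, MPI_INT, 0, local);

    /* Step 3: connect the two pairs. The other pair's leader has rank
     * 1 - group_id in the parents' MPI_COMM_WORLD. */
    MPI_Intercomm_create(local, 0, MPI_COMM_WORLD, 1 - group_id, 123, &bridge);

    /* Step 4: merge into one intracomm, pair 0 ordered first. */
    MPI_Intercomm_merge(bridge, group_id, &all);

    MPI_Comm_size(all, &size);
    if (size != 4) {
        fprintf(stderr, "expected 4 processes, got %d\n", size);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    MPI_Comm_free(&all);
    MPI_Comm_free(&bridge);
    MPI_Comm_free(&local);
    MPI_Comm_free(&inter);
    MPI_Finalize();
    return 0;
}
```

The broadcast in step 2 is one way to give every process a consistent high flag and remote_leader; the design only relies on the standard's rule that the leader-side arguments of MPI_Intercomm_create are ignored elsewhere.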
> > When using Open MPI, my code (it's derived from the MPICH test spaiccreate2.c, see attachment) works fine on my local machine.
> > However, when running it with MPICH-3.1.2, I get the following error message:
> > PMPI_Intercomm_create(601).....: MPI_Intercomm_create(comm=0x84000006, local_leader=1, MPI_COMM_WORLD, remote_leader=1, tag=123, newintercomm=0x7fff8323ee3c) failed
> > MPIR_Intercomm_create_impl(258):
> > MPID_GPID_ToLpidArray(461).....: Internal MPI error: Unknown gpid (1809769587)0
> > Fatal error in PMPI_Intercomm_create: Internal MPI error!, error stack:
> > PMPI_Intercomm_create(601).....: MPI_Intercomm_create(comm=0x84000004, local_leader=1, MPI_COMM_WORLD, remote_leader=0, tag=123, newintercomm=0x7fff6e0b4c7c) failed
> > MPIR_Intercomm_create_impl(258):
> > MPID_GPID_ToLpidArray(461).....: Internal MPI error: Unknown gpid (1607388239)0
> > Here are my questions:
> > 1) Is the above mentioned approach the right way to reach my goal?
> > 2) Is the semantics of the attached code MPI compliant?
> > 3) What is the reason for the error message when using MPICH?
> > Thanks in advance and with kind regards,
> > Carsten
> > --
> > Carsten Clauss
> > www.par-tec.com
> > _____________________________________
> > ParTec Cluster Competence Center GmbH
> > Possartstrasse 20
> > D-81679 Muenchen
> > Managing Director: RA. Dipl.-Ing. Bernhard Frohwitter. Registered at
> > Amtsgericht München HRB 151545, Tax No. 08/32305, VAT ID DE235527064
> > <spaiccreate3.c>
> > _______________________________________________
> > discuss mailing list discuss at mpich.org
> > To manage subscription options or unsubscribe:
> > https://lists.mpich.org/mailman/listinfo/discuss