[mpich-discuss] How to terminate MPI_Comm_accept
Lu, Huiwei
huiweilu at mcs.anl.gov
Wed Oct 8 10:18:51 CDT 2014
Hi Hirak,
I can reproduce your error with the attached program using one process:
mpicc -g -o mpi_comm_accept mpi_comm_accept.c -pthread
mpiexec -n 1 ./mpi_comm_accept
I found that MPI_Send and MPI_Recv were not using the same communicator, which is why MPI_Recv never receives the message. So either the communicator creation is wrong or the application is wrong.
The standard says MPI_Comm_accept and MPI_Comm_connect are used for “establishing contact between two groups of processes that do not share an existing communicator”. In this case, however, threads 1 and 2 do share an existing communicator and want to create a new communicator based on it. I don’t know whether that is allowed. If it is allowed, then MPI_Comm_accept and MPI_Comm_connect should be fixed to support the multithreaded case; if it is not, the application needs another way to terminate MPI_Comm_accept.
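If the multithreaded self-connect turns out not to be allowed, the pattern the standard describes puts the connecting side in a separate process. Below is a minimal sketch of that shape; the tag names, the port exchange over MPI_COMM_WORLD, and the shutdown protocol are assumptions for illustration, not code from this thread:

```c
/* Sketch: terminating an accept loop from a *separate* client process,
 * i.e. two groups that do not share the dynamic communicator.
 * Tag names and the port exchange are illustrative assumptions. */
#include <mpi.h>

#define TAG_NEW_CONN 1
#define TAG_SHUTDOWN 2

int main(int argc, char **argv)
{
    int rank;
    char port[MPI_MAX_PORT_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Server: open a port, hand it to the client, then accept
         * connections until one arrives carrying TAG_SHUTDOWN. */
        MPI_Open_port(MPI_INFO_NULL, port);
        MPI_Send(port, MPI_MAX_PORT_NAME, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        while (1) {
            MPI_Comm client;
            MPI_Status st;
            int buf;
            MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
            MPI_Recv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     client, &st);
            MPI_Comm_disconnect(&client);   /* collective with the client */
            if (st.MPI_TAG == TAG_SHUTDOWN)
                break;
        }
        MPI_Close_port(port);
    } else if (rank == 1) {
        /* Client in a different process group: connect once and
         * request shutdown. */
        MPI_Comm server;
        int val = 0;
        MPI_Recv(port, MPI_MAX_PORT_NAME, MPI_CHAR, 0, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);
        MPI_Send(&val, 1, MPI_INT, 0, TAG_SHUTDOWN, server);
        MPI_Comm_disconnect(&server);
    }

    MPI_Finalize();
    return 0;
}
```

Built with mpicc and launched as `mpiexec -n 2 ./a.out`, the accept loop should exit cleanly when the shutdown message arrives, since accepter and connector are in distinct processes.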
Thanks,
—
Huiwei
On Oct 8, 2014, at 12:14 AM, Roy, Hirak <Hirak_Roy at mentor.com> wrote:
> Hi Pavan,
>
> Here is my code for thread2 :
>
> do {
>     MPI_Comm newComm;
>     MPI_Comm_accept(m_serverPort.c_str(), MPI_INFO_NULL, 0, MPI_COMM_SELF, &newComm);
>     Log("Accepted a connection");
>
>     int buf = 0;
>     MPI_Status status;
>     MPI_Recv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, newComm, &status);
>
>     if (status.MPI_TAG == MPI_MSG_TAG_NEW_CONN) {
>         m_clientComs[m_clientCount] = newComm;
>         m_clientCount++;
>     } else if (status.MPI_TAG == MPI_MSG_TAG_SHUTDOWN) {
>         Log("Shutdown");
>         //MPI_Comm_disconnect(&newComm);
>         Log("Disconnect");
>         break;
>     } else {
>         Log("Unmatched Receive");
>     }
> } while (1);
>
>
> Here is my code for thread1 to terminate thread2 :
>
> MPI_Comm newComm;
> MPI_Comm_connect(m_serverPort.c_str(), MPI_INFO_NULL, 0, MPI_COMM_SELF, &newComm);
> Log("Connect to Self");
>
> int val = 0;
> MPI_Request req;
> MPI_Send(&val, 1, MPI_INT, 0, MPI_MSG_TAG_SHUTDOWN, newComm);
> Log("Successful");
> //MPI_Status stat;
> //MPI_Wait(&req, &stat);
> Log("Complete");
>
> //MPI_Comm_disconnect(&newComm);
>
>
>
>
> The MPI_Send/MPI_Recv calls hang.
> I am using sock channel.
> For nemesis, I get the following crash :
> Assertion failed in file ./src/mpid/ch3/channels/nemesis/include/mpid_nem_inline.h at line 58: vc_ch->is_local
> internal ABORT - process 0
>
> I tried a non-blocking send and receive followed by a wait, but that does not solve the problem either.
>
> Thanks,
> Hirak
>
>
>
> -----
>
> Hirak,
>
> Your approach should work fine. I’m not sure what issue you are facing. I assume thread 1 is doing this:
>
> while (1) {
>     MPI_Comm_accept(..);
>     MPI_Recv(.., tag, ..);
>     if (tag == REGULAR_CONNECTION)
>         continue;
>     else if (tag == TERMINATION) {
>         MPI_Send(..);
>         break;
>     }
> }
>
> In this case, all clients do an MPI_Comm_connect and then send a message with tag = REGULAR_CONNECTION. When thread 2 is done with its work, it’ll do an MPI_Comm_connect and then send a message with tag = TERMINATION, wait for a response from thread 1, and call finalize.
>
> — Pavan
>
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpi_comm_accept.c
Type: application/octet-stream
Size: 1839 bytes
Desc: mpi_comm_accept.c
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20141008/270d03f1/attachment.obj>