[mpich-discuss] Failed to allocate memory for an unexpected message

Luiz Carlos da Costa Junior lcjunior at ufrj.br
Thu Mar 13 12:35:50 CDT 2014


Does anyone have any clue about this?

Thanks in advance.


On 12 March 2014 14:40, Luiz Carlos da Costa Junior <lcjunior at ufrj.br> wrote:

> Dear Kenneth,
>
> Thanks for your quick reply.
> I tested your suggestion and, unfortunately, this approach didn't work.
>
> Question: when I call MPI_IPROBE, does it also account for messages that
> have already been received asynchronously?
>
> Is there any way to know, for my list of mpi_requests (from my
> MPI_IRECVs), which ones are still open and which ones have messages?
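>
> For instance, something along these lines is what I have in mind (just a
> rough sketch; num_req and requests() stand for the size and array of my
> actual MPI_IRECV request handles):
>
>           include 'mpif.h'
>           integer    num_req
>           parameter( num_req = 16 )       ! placeholder value
>           integer    requests(num_req)    ! handles returned by MPI_IRECV
>           integer    indices(num_req), outcount, m_ierr
>           integer    statuses(MPI_STATUS_SIZE, num_req)
>
>     c     MPI_TESTSOME completes every request that already has a message,
>     c     returns their positions in indices(1:outcount), and resets those
>     c     entries to MPI_REQUEST_NULL; the entries still holding a valid
>     c     handle are the ones that remain "open".
>           call MPI_TESTSOME( num_req, requests, outcount,
>          .                   indices, statuses, m_ierr )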
>
> Regards,
>
>
> On 11 March 2014 17:00, Kenneth Raffenetti <raffenet at mcs.anl.gov> wrote:
>
>> You could use MPI_Probe/MPI_Iprobe and pass in your "data" tag to test
>> for any more pending messages.
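>>
>> For example (untested sketch; M_DATA and MY_COMM stand in for whatever
>> your data tag and communicator actually are):
>>
>>           logical    flag
>>           integer    m_stat(MPI_STATUS_SIZE), m_ierr
>>
>>     c     flag comes back .true. if a message with the data tag is pending.
>>     c     Note that MPI_IPROBE only reports messages that have not yet been
>>     c     matched by a posted receive.
>>           call MPI_IPROBE( MPI_ANY_SOURCE, M_DATA, MY_COMM,
>>          .                 flag, m_stat, m_ierr )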
>>
>> Ken
>>
>>
>> On 03/11/2014 02:50 PM, Luiz Carlos da Costa Junior wrote:
>>
>>> Dear all,
>>>
>>> I am all set with your suggestions and my program has been working quite
>>> well without any "unexpected message" errors since then. Thanks again.
>>>
>>> However, I am now facing a small problem, which I describe next.
>>>
>>> My receiver process actually receives two types of messages (two
>>> different tags):
>>>
>>>   * the basic tag means that the message is a "data message" that should
>>>     be processed ("data" tag).
>>>   * the second one means that the worker process is done and will send
>>>     no more messages ("end_of_processing" tag).
>>>
>>> Once all worker processes send their end_of_processing tag, the receiver
>>> process finishes its execution.
>>>
>>> The problem I noticed is that some of the last messages sent by the
>>> worker processes were not being processed. I think the problem is
>>> related to the logic I am using with MPI_WAITANY in the receiver
>>> process. I am simply counting the number of end_of_processing messages
>>> received and, once it reaches the number of worker processes, I finish
>>> execution without checking whether there are more messages waiting in
>>> the MPI_WAITANY queue.
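>>>
>>> In rough Fortran, the receiving logic is essentially this (simplified;
>>> M_EOP is my end_of_processing tag, n_workers is the number of worker
>>> processes, and requests/num_req hold my outstanding MPI_IRECVs):
>>>
>>>           integer    n_done, n_workers, m_indx
>>>           integer    m_stat(MPI_STATUS_SIZE), m_ierr
>>>
>>>           n_done = 0
>>>           do while( n_done .lt. n_workers )
>>>             call MPI_WAITANY( num_req, requests, m_indx, m_stat, m_ierr )
>>>             if( m_stat(MPI_TAG) .eq. M_EOP ) then
>>>               n_done = n_done + 1
>>>             else
>>>     c         "data" message: process it and re-post the receive
>>>               <DO SOMETHING>
>>>             end if
>>>           end do
>>>     c     ...and here I stop, without checking whether other requests in
>>>     c     the array have already completed with data messages.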
>>>
>>> Since the order in which messages arrive is not relevant for MPI_WAITANY,
>>> I think my logic misses some of the messages at the end of the queue. Is
>>> this right?
>>>
>>> Is there any way to check if there is any pending request to be
>>> processed?
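>>>
>>> What I am after is, roughly, being able to do something like this before
>>> finishing (sketch only; M_DATA stands for my data tag):
>>>
>>>           logical    flag
>>>
>>>     c     Drain any data messages that are still pending after the last
>>>     c     end_of_processing has been counted.  (Messages already matched
>>>     c     to posted MPI_IRECVs would still have to be collected through
>>>     c     the request array, e.g. with MPI_TESTSOME.)
>>>           call MPI_IPROBE( MPI_ANY_SOURCE, M_DATA, MY_COMM,
>>>          .                 flag, m_stat, m_ierr )
>>>           do while( flag )
>>>             <RECEIVE AND PROCESS ONE DATA MESSAGE>
>>>             call MPI_IPROBE( MPI_ANY_SOURCE, M_DATA, MY_COMM,
>>>          .                   flag, m_stat, m_ierr )
>>>           end do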
>>>
>>> Best regards,
>>> Luiz
>>>
>>>
>>> On 16 January 2014 16:57, "Antonio J. Peña" <apenya at mcs.anl.gov> wrote:
>>>
>>>
>>>     Profiling both of your codes would help us understand where the
>>>     time is spent and the difference between them in terms of
>>>     performance.
>>>
>>>        Antonio
>>>
>>>
>>>
>>>     On 01/16/2014 12:47 PM, Luiz Carlos da Costa Junior wrote:
>>>
>>>>     Yes, I am comparing the original vs. the new implementation.
>>>>
>>>>     The original implementation is as follows.
>>>>
>>>>     c ------------------------------------------------------------------
>>>>           subroutine my_receiver_original
>>>>     c ------------------------------------------------------------------
>>>>           (...)
>>>>
>>>>     c     Local
>>>>     c     -----
>>>>           integer*4 m_stat(MPI_STATUS_SIZE)
>>>>           integer*4 m_ierr
>>>>           logical   keep_receiving
>>>>           character card*(zbuf)      ! buffer for messages received
>>>>
>>>>           keep_receiving = .true.
>>>>           do while( keep_receiving )
>>>>     c       Blocking receive of the next message from any worker
>>>>             call MPI_RECV( card, zbuf, MPI_CHARACTER,
>>>>          .                 MPI_ANY_SOURCE, M_RECCSV, MY_COMM,
>>>>          .                 m_stat, m_ierr )
>>>>
>>>>     c       Process message: disk IO
>>>>     c       ------------------------
>>>>             <DO SOMETHING>
>>>>             if( SOMETHING_ELSE ) then
>>>>               keep_receiving = .false.
>>>>             end if
>>>>           end do
>>>>
>>>>           (...)
>>>>
>>>>           return
>>>>           end
>>>>
>>>>     Regards,
>>>>     Luiz
>>>>
>>>>     On 16 January 2014 16:19, Balaji, Pavan <balaji at mcs.anl.gov> wrote:
>>>>
>>>>
>>>>         On Jan 16, 2014, at 12:16 PM, Luiz Carlos da Costa Junior
>>>>         <lcjunior at ufrj.br> wrote:
>>>>         > No, these failures don't occur all the time. I have a
>>>>         successful run (with my original implementation) which I am
>>>>         using as the base case for comparison.
>>>>
>>>>         What are the two cases you are comparing?  Original
>>>>         implementation vs. new implementation?  What's the original
>>>>         implementation?
>>>>
>>>>           -- Pavan
>>>>
>>>
>>>
>>>     --
>>>     Antonio J. Peña
>>>     Postdoctoral Appointee
>>>     Mathematics and Computer Science Division
>>>     Argonne National Laboratory
>>>     9700 South Cass Avenue, Bldg. 240, Of. 3148
>>>     Argonne, IL 60439-4847
>>>     apenya at mcs.anl.gov
>>>     www.mcs.anl.gov/~apenya
>>>
>>>