[mpich-discuss] process failing...

Ron Palmer ron.palmer at pgcgroup.com.au
Sat May 24 02:00:35 CDT 2014


Antonio, Rajeev and others,
thanks for your replies and comments on possible causes for the error
messages and failure. I have passed them on to the programmers of the
underlying application. I must admit I do not understand what unexpected
messages are (I am but a mere user); could you perhaps give examples of
typical causes? For example, the cluster it runs on consists of three
dual-Xeon computers with different CPU clock speeds. Could these error
messages be due to the processes getting out of sync, i.e. expecting
results but not getting them from the slower computer? As I was running
out of ideas, I have restarted the process but excluded the slowest
computer (2.27 GHz; the other two run at 2.87 and 3.2 GHz).

For your information, this runs well on smaller problems (with fewer computations).

Thanks,
Ron

On 24/05/2014 3:10 AM, Rajeev Thakur wrote:
> Yes. The message below says some process has received 261,895 messages 
> for which no matching receives have been posted yet.
>
> Rajeev
>
>
>> It looks like at least one of your processes is receiving too
>> many unexpected messages, causing it to run out of memory.
>> Unexpected messages are those that do not match any receive already
>> posted on the receiver side. You may want to check with the
>> application developers and ask them to review the algorithm or look
>> for a possible bug.
>>
>>   Antonio
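To illustrate what Rajeev and Antonio describe, here is a toy simulation of a single receiving rank, written in plain Python rather than MPI. The function `simulate`, the per-tick rates and `credit_limit` are all hypothetical; the model only assumes the MPI matching rule that a message arriving before a matching receive has been posted must be buffered as "unexpected". In this sketch, when senders produce messages faster than the receiver posts receives (for instance because the receiver sits on a slower node), the queue grows every tick. Some form of sender-side pacing, modelled here as a crude credit limit standing in for things like waiting for acknowledgements or using MPI_Ssend (which does not complete until the matching receive has started), keeps it bounded.

```python
from collections import deque


def simulate(ticks, send_per_tick, recv_per_tick, credit_limit=None):
    """Toy model of one receiver's unexpected-message queue (not MPI code).

    Each tick the receiver posts `recv_per_tick` receives, then the senders
    deliver up to `send_per_tick` messages. A message that finds no posted
    receive is "unexpected" and has to be buffered. If `credit_limit` is set
    (hypothetical flow control), senders stall once that many unexpected
    messages are already waiting. Returns the peak unexpected-queue length.
    """
    unexpected = deque()  # messages buffered with no matching receive yet
    posted = 0            # receives posted but not yet matched
    peak = 0
    next_id = 0           # sequence number of the next message sent
    for _ in range(ticks):
        # Receiver: a newly posted receive first checks the unexpected queue.
        for _ in range(recv_per_tick):
            if unexpected:
                unexpected.popleft()
            else:
                posted += 1
        # Senders: a message matches a waiting posted receive if there is one,
        # otherwise it joins the unexpected queue (unless flow control stalls it).
        for _ in range(send_per_tick):
            if posted:
                posted -= 1
            elif credit_limit is not None and len(unexpected) >= credit_limit:
                break  # sender waits instead of flooding the receiver
            else:
                unexpected.append(next_id)
            next_id += 1
        peak = max(peak, len(unexpected))
    return peak


if __name__ == "__main__":
    print(simulate(100, 3, 2))                  # unpaced, receiver slower: 100
    print(simulate(100, 3, 2, credit_limit=8))  # paced senders: 8
    print(simulate(100, 2, 3))                  # receiver keeps up: 0
```

In this model the unpaced 3-to-2 case gains one buffered message per tick with no upper limit, which mirrors in miniature the 261,895 queued messages in the error, while the paced and the faster-receiver cases stay small no matter how long the run is.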
