[mpich-discuss] Help troubleshooting a session

mark dimitsas.markos at gmail.com
Mon Jun 30 05:09:33 CDT 2014


Hello.
I wrote an MPI program that runs fine for some combinations of data size 
and number of nodes, but for others it fails and returns errors 
(e.g. rank 0 in job 29  Calliope_50394   caused collective abort of all 
ranks exit status of rank 0: killed by signal 9). As I see it, the way the 
data is divided among the nodes is what makes the execution fail: with 16 
nodes and, for example, a 320-line array as input, it runs fine, but with 
a 50-line array it fails. Is there a way to troubleshoot the code and find 
where it fails? Some examples would also be great.


PS. I have written other MPI programs that worked. The only difference 
between them and this one, which keeps failing, is that here I use loops 
like this:

     for(i=id*(n/p); i< (id+1)*n/p; i++){.....

to divide the data among the nodes, but again, for some combinations of 
data size and number of nodes it worked...
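
A small stand-alone check of how the two bounds in that loop line up for 
the sizes mentioned above. This is only a sketch: it assumes i, id, n and 
p are ints, with id the rank, p the number of processes and n the number 
of lines, and the helper names are made up for illustration. It shows 
where the ranges disagree, not necessarily why the job gets killed.

```c
#include <stdio.h>

/* Bounds exactly as written in the loop: the start divides first,
   id*(n/p), while the end multiplies first, (id+1)*n/p. */
static int loop_start(int id, int n, int p) { return id * (n / p); }
static int loop_end(int id, int n, int p)   { return (id + 1) * n / p; }

/* Hypothetical alternative: one formula for both ends, so the range of
   rank id is [block_low(id), block_low(id + 1)). */
static int block_low(int id, int n, int p) { return id * n / p; }

/* Number of neighbouring ranks whose ranges do not meet exactly,
   i.e. rank id stops somewhere other than where rank id+1 starts. */
static int count_mismatched_boundaries(int n, int p)
{
    int id, mismatches = 0;
    for (id = 0; id + 1 < p; id++) {
        if (loop_end(id, n, p) != loop_start(id + 1, n, p))
            mismatches++;
    }
    return mismatches;
}

/* Print the half-open index range each rank would loop over. */
static void print_ranges(int n, int p)
{
    int id;
    for (id = 0; id < p; id++)
        printf("rank %2d: [%d, %d)\n", id,
               loop_start(id, n, p), loop_end(id, n, p));
}
```

With n = 320 and p = 16 every range ends where the next one begins. With 
n = 50 and p = 16, ranks 8 to 15 each start one index before the previous 
rank's end, so some lines would be handled by two ranks. Using 
block_low(id, n, p) and block_low(id + 1, n, p) as the two bounds avoids 
that, assuming id*n fits in an int.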


