[mpich-discuss] Help troubleshooting a session
mark
dimitsas.markos at gmail.com
Mon Jun 30 05:09:33 CDT 2014
Hello.
I wrote an MPI program that runs fine for some combinations of input
size and number of nodes, but for others it fails to run and returns
errors (e.g. rank 0 in job 29 Calliope_50394 caused collective abort of
all ranks exit status of rank 0: killed by signal 9). As I see it, the
way the data is divided among the nodes is what makes the execution
fail: with 16 nodes and, for example, a 320-line array as input, it
executes fine, but with a 50-line array it fails. Is there a way to
troubleshoot the code and find where it fails? Some examples would also
be great.
PS. The other MPI programs I have written work. The only difference in
this one, which keeps failing, is that it uses loops like this:
for(i=id*(n/p); i< (id+1)*n/p; i++){.....
so that each process works through its own part of the data. But again,
for some combinations of data size and number of nodes it works.