[mpich-discuss] Need help troubleshooting
mark
dimitsas.markos at gmail.com
Mon Jun 30 05:20:00 CDT 2014
Hello.
I need some help troubleshooting a program I wrote. For some combinations
of data size and node count the program runs fine, but for others it
fails. I store the data in 2D arrays and divide the lines among the nodes.
If the array has 320 lines and there are 16 nodes (8 physical nodes with
multi-threading), the program runs fine. If the array has 50 lines and
there are 16 nodes, the program fails, yet with 50 lines and 2 or 4 nodes
it runs OK.
Is there a way to find the exact spot where the code is failing?
Also, some examples would do wonders. Thanks.
PS 1: The errors that the program returns have this form:
rank 25 in job 29 Calliope_50394 caused collective abort of all ranks
- exit status of rank 25: killed by signal 11
PS 2: I have written other MPI programs that worked. The only difference
is that in this program I use loops like this one to split the data
among the nodes:
for(i=id*n/p; i< (id+1)*n/p; i++){.....
(where id is the id of the node, n is the size of the data collection,
and p is the number of nodes)