[mpich-discuss] Possible bug MPICH2 1.0.6

mark dimitsas.markos at gmail.com
Wed Jul 2 15:02:17 CDT 2014


Hello.
In a program I am writing, I have an array whose size is equal to the 
number of nodes in my cluster.
I made this array to keep track of the objects that each node has under 
its control. When I index it with the actual rank of a node 
(*array[node_rank]=objects;*), the program reports an error and 
stops executing:

rank 0 in job 10  Calliope_49755   caused collective abort of all ranks
  exit status of rank 0: killed by signal 11

However, until now I have freely used the variable /id/ or /node_rank/ 
to select a specific action for a specific node and never had problems.

If, instead of using the node's rank as the index into the array, I use 
a fixed integer, e.g. 5, the program runs fine, but even then roughly 
1 out of 7 executions returns an error and stops. The operation that 
keeps returning the error is a simple subtraction (*array[id]--;*).

Any ideas?


PS. The array is one-dimensional and is allocated dynamically, like this:
    int *array = malloc(processes * sizeof(int));
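
A minimal sketch of the pattern described above, with the checks that
usually help narrow down a "killed by signal 11" (a segmentation fault on
Linux). It is not the actual program: the helper names make_counts,
set_count and decrement_count are hypothetical, and it assumes that in the
real code processes comes from MPI_Comm_size and id / node_rank from
MPI_Comm_rank. The MPI calls are left out so that the sketch stands alone.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate one zero-initialised counter per process.
   Assumption: in the MPI program, `processes` is the value returned by
   MPI_Comm_size. Returns NULL if the size is invalid or allocation fails. */
int *make_counts(int processes)
{
    if (processes <= 0)
        return NULL;
    return calloc((size_t)processes, sizeof(int));
}

/* Hypothetical helper: bounds-checked form of array[id] = objects.
   Returns 0 on success, -1 if the array is NULL or id is out of range. */
int set_count(int *array, int processes, int id, int objects)
{
    if (array == NULL || id < 0 || id >= processes) {
        fprintf(stderr, "set_count: bad index %d (size %d)\n", id, processes);
        return -1;
    }
    array[id] = objects;
    return 0;
}

/* Hypothetical helper: bounds-checked form of array[id]--.
   Returns 0 on success, -1 if the array is NULL or id is out of range. */
int decrement_count(int *array, int processes, int id)
{
    if (array == NULL || id < 0 || id >= processes) {
        fprintf(stderr, "decrement_count: bad index %d (size %d)\n",
                id, processes);
        return -1;
    }
    array[id]--;
    return 0;
}
```

Replacing the bare array[id]-- with decrement_count(array, processes, id)
would turn an out-of-range index or a NULL array into a printed message
instead of a crash. Valid indices run from 0 to processes - 1, so a fixed
index such as 5 is only safe when at least 6 processes are started. These
checks would not catch memory being overwritten elsewhere in the program,
which is another possible cause of a failure that appears only on some runs.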