[mpich-discuss] Possible bug MPICH2 1.0.6

mark dimitsas.markos at gmail.com
Wed Jul 2 15:04:28 CDT 2014


On 02/07/2014 11:02 PM, mark wrote:
> Hello.
> In a program I am writing, I have an array whose size is equal to the
> number of nodes in my cluster.
> I made this array to keep track of the objects that each node
> controls, and when I index it with the actual ranks of the nodes
> (*array[node_rank]=objects;*), the program reports an error and
> stops executing:
>
> rank 0 in job 10  Calliope_49755   caused collective abort of all ranks
>   exit status of rank 0: killed by signal 11
>
> However, until now I have freely used the variable /id/ or /node_rank/
> to select a specific action for a specific node and never had
> problems.
>
> If, instead of using the node's rank as the array index, I use an
> integer constant such as 5, the program runs fine, but even then about
> 1 in 7 executions returns an error and stops. The operation that keeps
> triggering the error is a simple subtraction
> (*array[id]--;*).
>
> Any ideas?
>
>
> PS. The array is 1D and is created by allocating memory, like this:
> *int *array = malloc(processes * sizeof(int));*
I forgot to mention how I compiled and executed the program:
mpicc -o prog prog.c -lm
mpiexec -n nodes `pwd`/prog
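
Signal 11 is a segmentation fault, which generally indicates an invalid
memory access in the program. One way to narrow it down is to check every
index before touching the array. The following is only a minimal sketch,
assuming the array size is the value returned by MPI_Comm_size and the
index is the rank returned by MPI_Comm_rank. The helper names
make_counters and checked_decrement are hypothetical and do not come from
the original program. The sketch uses calloc rather than malloc so that
every slot starts at zero.

```c
#include <stdio.h>
#include <stdlib.h>   /* declares calloc and free */

/* Hypothetical helper: allocate one zero-initialised counter per process.
 * `processes` is assumed to be the value obtained from MPI_Comm_size. */
int *make_counters(int processes)
{
    if (processes <= 0)
        return NULL;
    return calloc((size_t)processes, sizeof(int));
}

/* Hypothetical helper: decrement counters[index] only if the index is valid.
 * Returns 0 on success, -1 if the array is NULL or the index is out of range. */
int checked_decrement(int *counters, int processes, int index)
{
    if (counters == NULL || index < 0 || index >= processes) {
        fprintf(stderr, "invalid index %d (array size %d)\n", index, processes);
        return -1;
    }
    counters[index]--;
    return 0;
}
```

If checked_decrement ever reports an invalid index, the value used as
/id/ or /node_rank/ at that point is not a rank in the range
0..processes-1, or the allocation itself failed.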