[mpich-discuss] Possible bug MPICH2 1.0.6

Junchao Zhang jczhang at mcs.anl.gov
Wed Jul 2 17:38:43 CDT 2014


It is very likely a bug in your code.  Without a toy program to reproduce
your error, it is hard for us to help.
Also, could you try a newer MPICH release, such as MPICH 3.1.1?

--Junchao Zhang


On Wed, Jul 2, 2014 at 3:02 PM, mark <dimitsas.markos at gmail.com> wrote:

>  Hello.
> In a program I am writing, I have an array whose size is equal to the
> number of nodes in my cluster.
> I use this array to keep track of the objects that each node controls.
> When I index it with the actual rank of a node (
> *array[node_rank]=objects;*), the program reports an error and stops
> executing:
>
> *rank 0 in job 10  Calliope_49755   caused collective abort of all ranks*
> *  exit status of rank 0: killed by signal 11 *
>
> However, until now I have freely used the variable *id* or *node_rank* to
> select a specific action for a specific node and never had problems.
>
> If, instead of using the node's rank as the index into the array, I use an
> integer constant, e.g. 5, the program runs fine, but even then about 1 out
> of 7 executions reports an error and stops. The statement that keeps
> triggering the error is a simple subtraction (*array[id]--;*).
>
> Any ideas?
>
>
> PS. The array is 1D and is allocated dynamically, like this:
> *int *array = malloc(processes * sizeof(int));*
>
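
Signal 11 is SIGSEGV, which usually points to an invalid memory access. The message does not contain enough information to identify the cause, so the following is only a minimal sketch of how the array accesses could be guarded while debugging. It assumes that `processes` comes from `MPI_Comm_size` and that `id` comes from `MPI_Comm_rank`; the helper names `alloc_counts` and `decrement_count` are hypothetical and do not appear in the original program.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate one zero-initialized counter per process.
   Returns NULL if nprocs is not positive or if the allocation fails.
   calloc is used so that no element is read before it has a value. */
int *alloc_counts(int nprocs)
{
    if (nprocs <= 0)
        return NULL;
    return calloc((size_t)nprocs, sizeof(int));
}

/* Hypothetical helper: decrement counts[idx] only when idx is a valid
   position. Returns 0 on success and -1 if the access would be invalid. */
int decrement_count(int *counts, int nprocs, int idx)
{
    if (counts == NULL || idx < 0 || idx >= nprocs) {
        fprintf(stderr, "index %d out of range [0, %d)\n", idx, nprocs);
        return -1;
    }
    counts[idx]--;
    return 0;
}
```

In an MPI program, `nprocs` would be the value filled in by `MPI_Comm_size(MPI_COMM_WORLD, &processes)` and `idx` the value filled in by `MPI_Comm_rank(MPI_COMM_WORLD, &id)`, both obtained after `MPI_Init` and before the array is allocated. If the check ever reports an out-of-range index, one thing to look at is whether `processes` or `id` was still uninitialized at that point.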

