[mpich-discuss] Random Aborts in An MPI Program

Balaji, Pavan balaji at anl.gov
Sat Jul 5 18:48:42 CDT 2014


Mark,

Sorry, 1.0.6 is too old for us to help you here.  It might be a bug that was resolved a long time ago.

You can install MPICH in your home directory; you don't need root permissions for that.  Please see the README for instructions.

Regards,

  — Pavan

On Jul 5, 2014, at 3:34 PM, mark <dimitsas.markos at gmail.com> wrote:

> Hello to all and, to whoever is from the United States, Happy 4th of July!
> 
> 
> I am getting collective aborts while executing a program. More specifically, the error is:
> rank 7 in job 38  Calliope_50667   caused collective abort of all ranks
>   exit status of rank 7: killed by signal 11
> [cli_2]: aborting job: Fatal error in MPI_Allgather: Error message texts are not available
> [cli_4]: aborting job: Fatal error in MPI_Allgather: Error message texts are not available
> 
> But the MPI_Allgather call is written correctly; I have checked it multiple times. And what does "Error message texts are not available" mean?
> 
> MPI_Allgather(&docs, 1, MPI_INT, texts_vectors, 1, MPI_INT, MPI_COMM_WORLD); 
> 
> Here docs is an int variable holding a single value on each node, and texts_vectors is an int array whose length equals the number of nodes.
> 
> I compile the program using mpicc -g -o prog prog.c -lm and execute it using mpiexec -n number_of_nodes `pwd`/prog
> I am using MPICH2 1.0.6 on a Linux cluster that I use for my bachelor thesis. I know it's an older version, but the machine belongs to an institute, so I don't have permission to upgrade it.
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
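
A note on the quoted MPI_Allgather call: on Linux, signal 11 is normally SIGSEGV, and one possible cause (nothing in this thread confirms it) is a receive buffer with fewer than number_of_ranks * recvcount elements. The sketch below sizes and allocates the buffer from the rank count. The helper names allgather_recv_elems and alloc_allgather_recvbuf are hypothetical and not part of MPI. The trailing comment shows the intended use with the real MPI calls and is not compiled here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not part of MPI): number of elements the receive
 * buffer needs when each of `nranks` ranks contributes `recvcount`
 * elements. Returns 0 on invalid input or on size_t overflow. */
size_t allgather_recv_elems(int nranks, int recvcount)
{
    if (nranks <= 0 || recvcount <= 0)
        return 0;
    size_t n = (size_t)nranks;
    size_t c = (size_t)recvcount;
    if (n > SIZE_MAX / c)
        return 0;
    return n * c;
}

/* Hypothetical helper: allocate a zero-initialized int receive buffer of
 * that size. Returns NULL on invalid input or allocation failure; the
 * caller releases the buffer with free(). */
int *alloc_allgather_recvbuf(int nranks, int recvcount)
{
    size_t elems = allgather_recv_elems(nranks, recvcount);
    if (elems == 0)
        return NULL;
    return calloc(elems, sizeof(int));
}

/* Intended use inside an MPI program (needs <mpi.h>; shown as a comment
 * so this block compiles without an MPI installation):
 *
 *   int nranks;
 *   MPI_Comm_size(MPI_COMM_WORLD, &nranks);
 *   int *texts_vectors = alloc_allgather_recvbuf(nranks, 1);
 *   if (texts_vectors == NULL)
 *       MPI_Abort(MPI_COMM_WORLD, 1);
 *   MPI_Allgather(&docs, 1, MPI_INT, texts_vectors, 1, MPI_INT,
 *                 MPI_COMM_WORLD);
 *   free(texts_vectors);
 */
```

If texts_vectors is declared with a fixed length, or is used elsewhere with an index beyond the number of ranks, the crash can surface inside MPI_Allgather even though the call itself looks correct.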



