[mpich-discuss] MPI_Abort not working with multinode jobs launched by hydra-3.2

Min Si msi at anl.gov
Mon Dec 12 12:33:48 CST 2016


Hi Doug,

We changed the internal implementation of MPI_Abort in both the MPICH 
code and hydra in 3.2. Intel MPI 5.1.3.210, however, is based on an 
older MPICH release, v3.1.2. If you compile a program with that Intel 
MPI and launch the binary with hydra v3.2, processes on remote nodes 
might not exit when MPI_Abort is called.

We do not support mixing mismatched versions of MPICH and hydra. 
If you want to use hydra 3.2, you should try a newer version of Intel MPI. 
AFAIK, Intel MPI 2017.0.064 is based on MPICH v3.2.
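
As a rough illustration of the version rule above, here is a minimal sketch 
that compares the MPICH base version of the MPI library with the hydra 
version. Treating "same major.minor series" as "matched" is only an 
assumption made for this example, not a statement of what MPICH formally 
guarantees, and the helper names release_series and same_series are 
hypothetical. The version strings are expected to be supplied by hand.

```python
import re


def release_series(version):
    """Return (major, minor) parsed from strings like '3.1.2', 'v3.2' or '3.3a2'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def same_series(mpich_base, hydra):
    """Assumption for this sketch: versions 'match' if major.minor are equal."""
    return release_series(mpich_base) == release_series(hydra)


if __name__ == "__main__":
    # Combinations mentioned in this thread: (MPICH base of the library, hydra).
    for base, hydra in [("3.1.2", "3.2"), ("3.1.2", "3.1.4"), ("3.2", "3.2")]:
        status = "matched" if same_series(base, hydra) else "MISMATCHED"
        print("MPICH base %s with hydra %s: %s" % (base, hydra, status))
```

Under that assumption, the reported combination (a library based on 
v3.1.2 launched by hydra 3.2) is flagged as mismatched, while hydra 3.1.4 
with the same library is not.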

Min

On 12/12/16 7:16 AM, Doug Johnson wrote:
> Hi,
>
> We've encountered a problem with hydra-3.2 and Intel MPI 5.1.3.210 with
> multi-node MPI programs.  A call to MPI_Abort causes all MPI ranks
> running on the same node as the rank that called MPI_Abort to exit,
> but leaves the other ranks running.  The program hangs on the other
> nodes indefinitely (at least until the time limit of the batch job is
> reached).  The behavior is the same with hydra-3.3a2.  The problem
> does not exist when using hydra-3.1.4; all processes exit on all nodes.
>
> Reverting commit 9882227414439a4a79edd49ec10261742bb60108 fixes this
> problem with 3.2.
>
> The hydra shipped with Intel MPI does not exhibit this problem, but we
> are using an out-of-tree hydra because we want the pbs launcher enabled.
> Is there a better mechanism for process cleanup than reverting the
> patch above?  Let me know if there's other information needed.
>
> Thanks,
> Doug
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
