[mpich-discuss] Client hangs if server dies in dynamic process management

Lu, Huiwei huiweilu at mcs.anl.gov
Wed Nov 19 15:05:07 CST 2014


Hi, Hirak,

Yes, I can reproduce the bug on both Mac OS and Ubuntu with the sock channel.

I have created a ticket for it. You can track the progress here:
http://trac.mpich.org/projects/mpich/ticket/2205

Thanks for reporting the bug.

--
Huiwei

> On Nov 18, 2014, at 12:49 PM, Roy, Hirak <Hirak_Roy at mentor.com> wrote:
> 
>  
> Hi Huiwei,
>  
> 1. Did you start your nameserver?
> 2. Did the server program crash?
> I see the same hang (incomplete MPI_Finalize in client).
>  
> Here is my command line:
> > hydra_nameserver &
> > mpiexec -n 1 -nameserver <hostname> ./server
> > mpiexec -n 1 -nameserver <hostname> ./client
>  
>  
>  
> MPICH Version:                 3.2a2
> MPICH Release date:       Sun Nov 16 11:09:31 CST 2014
> MPICH Device:                  ch3:sock
> MPICH configure:             --prefix /home/hroy/local//mpich-3.2a2/linux_x86_64 --disable-f77 --disable-fc --disable-f90modules --disable-cxx --enable-fast=nochkmsg --enable-fast=notiming --enable-fast=ndebug --enable-fast=O3 --with-device=ch3:sock --enable-g=dbg --disable-fortran --without-valgrind CFLAGS=-O3 -fPIC CXXFLAGS=-O3 -fPIC
> MPICH CC:          /u/prod/gnu/gcc/20121129/gcc-4.5.0-linux_x86_64/bin/gcc -O3 -fPIC   -g -O3
> MPICH CXX:        no -O3 -fPIC  -g
> MPICH F77:         no   -g
> MPICH FC:           no   -g
>  
>  
> Thanks,
> Hirak
>  
> Could you try with the latest mpich-3.2a2?
> The client exits successfully on my MacBook with the sock channel.
>  
> Huiwei
>  
> > On Nov 16, 2014, at 10:44 PM, Hirak Roy <hirak_roy at mentor.com> wrote:
> > 
> > Hi All,
> > 
> > Here is my sample program. I am using the sock channel of mpich-3.0.4.
> > 
> > I am running it as
> > > mpiexec -n 1 ./server.out
> > > mpiexec -n 1 ./client.out
> > 
> > My client program (client.c) hangs in MPI_Finalize.
> > An assert in server.c makes the server exit.
> > 
> > There is no way to detect that in the client.
> > Even if we detect it using some timeout strategy, the client still hangs in the finalize step.
> > Could you please suggest what is going wrong here, or is this a bug in the sock channel?
> > 
> > Thanks,
> > Hirak
> > <client.c><server.c>
> > _______________________________________________
> > discuss mailing list     discuss at mpich.org
> > To manage subscription options or unsubscribe:
> > https://lists.mpich.org/mailman/listinfo/discuss
>  
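
A note on the "timeout strategy" mentioned in the original report. The
attached client.c is not reproduced in this thread, so the following is only
a minimal sketch of what such a strategy might look like, not the reporter's
code. It polls a callback against a deadline on a monotonic clock. The names
probe_fn, wait_with_timeout, countdown_probe and never_probe are hypothetical
and chosen for illustration. In a real MPI client the callback would
presumably wrap a nonblocking check such as MPI_Iprobe on the
intercommunicator; that part is left as a comment so the sketch compiles
without MPI.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: returns nonzero once the awaited event
 * (for example, a message from the server) has been observed. */
typedef int (*probe_fn)(void *ctx);

/* Seconds on a monotonic clock, so wall-clock adjustments cannot
 * shorten or extend the deadline. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Poll `probe` until it reports success or `timeout_s` seconds pass.
 * Returns 1 if the event was seen, 0 on timeout. */
static int wait_with_timeout(probe_fn probe, void *ctx, double timeout_s)
{
    const struct timespec pause = { 0, 1000000L };  /* 1 ms between polls */
    const double deadline = now_seconds() + timeout_s;

    assert(probe != NULL);
    for (;;) {
        if (probe(ctx))
            return 1;
        if (now_seconds() >= deadline)
            return 0;
        nanosleep(&pause, NULL);
    }
}

/* Stand-in probe for illustration: succeeds once the counter reaches zero.
 * In an MPI client the body would instead call something like MPI_Iprobe
 * and return its flag (an assumption, not taken from client.c). */
static int countdown_probe(void *ctx)
{
    int *remaining = ctx;
    if (*remaining <= 0)
        return 1;
    --*remaining;
    return 0;
}

/* Stand-in for a server that never answers. */
static int never_probe(void *ctx)
{
    (void)ctx;
    return 0;
}
```

As the report points out, a loop like this only lets the client notice that
the server has gone silent. It does not by itself avoid the hang in
MPI_Finalize, which is the behaviour tracked in the ticket.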
