[mpich-discuss] Client hangs if server dies in dynamic process management
Lu, Huiwei
huiweilu at mcs.anl.gov
Wed Nov 19 15:05:07 CST 2014
Hi, Hirak,
Yes, I can reproduce the bug on both MacOS and Ubuntu with the sock channel.
I have created a ticket for it. You can track the progress here:
http://trac.mpich.org/projects/mpich/ticket/2205
Thanks for reporting the bug.
—
Huiwei
> On Nov 18, 2014, at 12:49 PM, Roy, Hirak <Hirak_Roy at mentor.com> wrote:
>
>
> Hi Huiwei,
>
> 1> Did you start your nameserver?
> 2> Did the server program crash?
> I see the same hang (incomplete MPI_Finalize in the client).
>
> Here is my command line:
>   hydra_nameserver &
>   mpiexec -n 1 -nameserver <hostname> ./server
>   mpiexec -n 1 -nameserver <hostname> ./client
>
>
>
> MPICH Version: 3.2a2
> MPICH Release date: Sun Nov 16 11:09:31 CST 2014
> MPICH Device: ch3:sock
> MPICH configure: --prefix /home/hroy/local//mpich-3.2a2/linux_x86_64 --disable-f77 --disable-fc --disable-f90modules --disable-cxx --enable-fast=nochkmsg --enable-fast=notiming --enable-fast=ndebug --enable-fast=O3 --with-device=ch3:sock --enable-g=dbg --disable-fortran --without-valgrind CFLAGS=-O3 -fPIC CXXFLAGS=-O3 -fPIC
> MPICH CC: /u/prod/gnu/gcc/20121129/gcc-4.5.0-linux_x86_64/bin/gcc -O3 -fPIC -g -O3
> MPICH CXX: no -O3 -fPIC -g
> MPICH F77: no -g
> MPICH FC: no -g
>
>
> Thanks,
> Hirak
>
> Could you try with the latest mpich-3.2a2?
> The client exits successfully on my MacBook with the sock channel.
>
> —
> Huiwei
>
> > On Nov 16, 2014, at 10:44 PM, Hirak Roy <hirak_roy at mentor.com> wrote:
> >
> > Hi All,
> >
> > Here is my sample program. I am using the sock channel of mpich-3.0.4.
> >
> > I am running it as
> > > mpiexec -n 1 ./server.out
> > > mpiexec -n 1 ./client.out
> >
> > Here, my client program (client.c) hangs in MPI_Finalize.
> > There is an assert in server.c where the server exits.
> >
> > There is no way to detect that in the client.
> > Even if we detect that using some timeout strategy, the client hangs in the finalize step.
> > Could you please suggest what is going wrong here, or is this a bug in the sock channel?
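
One way to picture the "timeout strategy" mentioned above is a polling loop with a deadline. The sketch below is illustrative only and is not taken from the attached client.c: wait_with_timeout, poll_fn, and countdown_poll are hypothetical names, and the completion check is left as a callback so the code builds without MPI (in an MPI client the callback could, for example, wrap MPI_Test on a nonblocking request). It assumes a POSIX system for clock_gettime and nanosleep. As the report notes, detecting the dead server this way does not by itself keep MPI_Finalize from hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: returns true once the awaited event has
 * completed. It is kept abstract so this sketch has no MPI dependency. */
typedef bool (*poll_fn)(void *ctx);

/* Current time in seconds on a monotonic clock (POSIX clock_gettime). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Poll until poll(ctx) returns true or timeout_s seconds have elapsed.
 * Returns true on completion and false on timeout. */
bool wait_with_timeout(poll_fn poll, void *ctx, double timeout_s)
{
    assert(poll != NULL);
    const double deadline = now_seconds() + timeout_s;
    const struct timespec pause = { 0, 1000000 }; /* 1 ms between polls */

    do {
        if (poll(ctx))
            return true;
        nanosleep(&pause, NULL);
    } while (now_seconds() < deadline);

    return false;
}

/* Example callback: reports completion after a fixed number of polls. */
struct countdown { int remaining; };

bool countdown_poll(void *ctx)
{
    struct countdown *c = ctx;
    if (c->remaining > 0)
        c->remaining--;
    return c->remaining == 0;
}
```

For instance, calling wait_with_timeout(countdown_poll, &c, 1.0) with c.remaining set to 3 returns true on the third poll, while a callback that never reports completion makes it return false once the timeout has elapsed.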
> >
> > Thanks,
> > Hirak
> > <client.c><server.c>
> > _______________________________________________
> > discuss mailing list discuss at mpich.org
> > To manage subscription options or unsubscribe:
> > https://lists.mpich.org/mailman/listinfo/discuss
>