[mpich-discuss] mpi_comm_spawn - process not destroyed
Stanislav Simko
s.simko at uu.nl
Wed Sep 6 06:42:28 CDT 2017
Hi Min,
I will have a look into it when I get a bit of time, but it's strange,
because I still see it. I've tested similar C++ code as well, and the
same problem is there - I see 3 processes running during the
computation in the main program, after the workers should have finished.
Also, I have tested on Debian 8.7 in VirtualBox with mpich from the
distro repositories (v3.1-5+b2), with the same outcome.
I'll try to get back with new info asap.
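To make that observation easier to pin down when reproducing, here is a
tiny helper I would consider dropping into both programs right after
MPI_Init (hypothetical, not part of the code below). It prints which OS
pid each MPI process runs as, so the lingering entries in the process
list can be attributed to parent vs. worker:

#include <unistd.h>    // getpid()
#include <iostream>

// Hypothetical diagnostic: report the OS pid of the calling process.
// Call as report_pid("parent") in mpi.cpp and report_pid("worker") in
// worker.cpp, right after MPI_Init.
static void report_pid(const char *who) {
    std::cout << who << " is running as pid " << getpid() << std::endl;
}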
If you're interested, here's the C++ code, with the calculation removed
from the worker and added to the main program instead:

mpi.cpp:
#include <mpi.h>
#include <math.h>
#include <iostream>

int main(int argc, char *argv[]){
    MPI_Comm everyone;
    MPI_Init(&argc, &argv);
    double c = 0.;
    std::cout << "starting worker\n";
    // spawn a single worker twice, waiting for each at a barrier
    for(int i=0;i<2;i++){
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &everyone, MPI_ERRCODES_IGNORE);
        MPI_Barrier(everyone);
    }
    std::cout << "parent rolling again & doing stuff" << "\n";
    // busy loop so the parent keeps computing for a while
    for(int i=0;i<50000000;i++){
        c = (double)i*i + 1 - (i*10) + sin(pow(i, i%8));  // cast avoids int overflow
    }
    std::cout << "parent finished stuff: " << c << "\n";
    MPI_Finalize();
    return 0;
}
worker.cpp:
#include <mpi.h>
#include <iostream>
using namespace std;

int main(int argc, char *argv[]){
    double c = 0.;   // initialised; the worker's computation was removed
    MPI_Init(&argc, &argv);
    MPI_Comm parent;
    MPI_Comm_get_parent(&parent);
    cout << "worker inside worker finished stuff: " << c << "\n";
    if(parent != MPI_COMM_NULL){
        MPI_Barrier(parent);
    }
    MPI_Finalize();
    return 0;
}
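For comparison, here is a variant of the parent loop that also calls
MPI_Comm_disconnect on the intercommunicator after the barrier; the
worker would have to make the matching MPI_Comm_disconnect(&parent)
call before MPI_Finalize, since disconnect is collective. I don't know
whether this changes the lingering-process behaviour; it is just the
explicit way to sever the parent/child connection, so it might be a
useful data point when trying to reproduce:

mpi_disconnect.cpp (sketch):
#include <mpi.h>
#include <iostream>

int main(int argc, char *argv[]){
    MPI_Init(&argc, &argv);
    for(int i=0;i<2;i++){
        MPI_Comm everyone;
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &everyone, MPI_ERRCODES_IGNORE);
        MPI_Barrier(everyone);
        // Explicitly drop the connection to this child. The worker must
        // call MPI_Comm_disconnect(&parent) as well, or this call blocks.
        MPI_Comm_disconnect(&everyone);
        std::cout << "spawn round " << i << " disconnected\n";
    }
    MPI_Finalize();
    return 0;
}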
best,
stanislav.
On Tue, 2017-09-05 at 16:27 -1000, Min Si wrote:
> Hi Stanislav,
>
> I apologize for the late update. After a while I finally found a
> chance to try this test. I could not reproduce this problem on my side.
>
> Here is my environment:
>
> - MPICH version: 3.3a2 (using the same configure options as shown in
>   your config.log)
> - mpi4py: version 2.0.0
> - platform: a Fedora 25 VM and a CentOS 7 VM
>
> If you are still facing this problem, please try the following steps
> to narrow it down:
> 1. Try removing the computation in child.py.
> 2. Try the MPICH test suite under <your MPICH build
>    directory>/test/mpi/spawn/:
>    make
>    make testing V=1
>
> Regards,
> Min
>
> On 8/13/17 11:07 AM, Stanislav Simko wrote:
> > Hi Min,
> > attached are the configure and make logs for the build that I did
> > for myself on our local cluster with Intel compilers, i.e. a build
> > that is definitely not guaranteed to be perfect/optimal. But please
> > keep in mind that I get the same behaviour with the standard MPICH
> > package in the Fedora distribution - I think those configuration
> > options should be available online, maybe prepared by someone from
> > MPICH?
> >
> > Also, I tested with a simple C++ hello-world-like program and it's
> > the same.
> >
> > Thank you.
> > best,
> > stanislav.
> >
> > On Sun, 2017-08-13 at 19:58 +0100, Min Si wrote:
> >
> > > Hi Stanislav,
> > >
> > > This seems interesting. Could you please also attach the MPICH
> > > config.log? You can find it under the directory where you built
> > > MPICH. I will look into this problem then and keep you updated.
> > >
> > > Min
> > >
> > > On 8/11/17 1:31 PM, Stanislav Simko wrote:
> > > > Dear all,
> > > >
> > > > I'm just trying some very basic stuff with MPI_COMM_SPAWN in
> > > > Python (i.e. I use the mpi4py package), but I see behaviour that
> > > > I do not understand - the child process gets spawned, does its
> > > > stuff and then "should" finish. I see, though, that the process
> > > > created for the child stays alive. I see this only with MPICH;
> > > > Open MPI does what I would (naively) expect. In this way I can
> > > > end up with N "ghost" processes after calling SPAWN N times. My
> > > > minimal working example is the following:
> > > >
> > > > ______________________________________________
> > > > parent.py
> > > >
> > > > from __future__ import print_function
> > > > import sys
> > > > from mpi4py import MPI
> > > > comm = MPI.COMM_WORLD
> > > > spawned = MPI.COMM_SELF.Spawn(sys.executable, args=['child.py'],
> > > >                               maxprocs=1)
> > > > print("parent process is waiting for child")
> > > > spawned.Barrier()
> > > >
> > > > ______________________________________________
> > > > child.py
> > > >
> > > > from __future__ import print_function
> > > > import math
> > > > from mpi4py import MPI
> > > > parent = MPI.Comm.Get_parent()
> > > > # just do some stupid stuff that takes a bit of time
> > > > for i in range(5000000):
> > > >     a = i*i+1-(i*10) + math.sin(math.pow(i,i%8))
> > > > parent.Barrier()
> > > >
> > > > ______________________________________________
> > > >
> > > > I run it with e.g.:
> > > > mpirun -n 1 python parent.py
> > > >
> > > > Am I missing something with the SPAWN method?
> > > > (I tested on two independent systems: our local cluster with
> > > > mpich v3.0.4, and my laptop - Fedora 26, mpich v3.2.8 from the
> > > > repositories.)
> > > >
> > > > Thank you very much for any suggestions.
> > > >
> > > > Regards,
> > > > stanislav.
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss