[mpich-discuss] mpi_comm_spawn - process not destroyed

Stanislav Simko s.simko at uu.nl
Wed Sep 6 06:42:28 CDT 2017


Hi Min,
I will have a look into it when I get a bit of time, but it's strange,
because I still see it. I've tested similar C++ code as well, and the
same problem is there: I see 3 processes running during the
computation in the main program, after the workers should have finished.
Also, I have tested on Debian 8.7 in VirtualBox with mpich from the distro
repositories (v3.1-5+b2), with the same outcome.
I'll try to get back with new info asap.
If interested, here's the C++ code, with the calculation removed from the
worker and added to the main program:

mpi.cpp:

#include <mpi.h>
#include <math.h>
#include <iostream>

int main(int argc, char *argv[])
{
    MPI_Comm everyone;
    MPI_Init(&argc, &argv);
    double c = 0.;
    std::cout << "starting worker\n";
    for(int i = 0; i < 2; i++){
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0, MPI_COMM_SELF,
                       &everyone, MPI_ERRCODES_IGNORE);
        MPI_Barrier(everyone);
    }
    std::cout << "parent rolling again & doing stuff" << "\n";
    for(int i = 0; i < 50000000; i++){
        c = i*i + 1 - (i*10) + sin(pow(i, i % 8));
    }
    std::cout << "parent finished stuff: " << c << "\n";
    MPI_Finalize();
    return 0;
}
worker.cpp:

#include <mpi.h>
#include <iostream>
using namespace std;

int main(int argc, char *argv[])
{
    double c = 0.;
    MPI_Init(&argc, &argv);
    MPI_Comm parent;
    MPI_Comm_get_parent(&parent);
    cout << "worker inside worker finished stuff: " << c << "\n";
    if(parent != MPI_COMM_NULL){
        MPI_Barrier(parent);
    }
    MPI_Finalize();
    return 0;
}
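
Perhaps relevant (just my guess, not something I have tested here): neither
program ever releases the intercommunicator returned by MPI_Comm_spawn, so
the parent keeps an open connection to each spawned worker until
MPI_Finalize. A minimal sketch of the parent with the connection severed
explicitly via MPI_Comm_disconnect - assuming the worker mirrors the call
after its barrier, since MPI_Comm_disconnect is collective - looks like
this; I don't know whether it changes the lingering-process behaviour on
MPICH, but it may help narrow things down:

// sketch: mpi.cpp with the intercommunicator disconnected after each barrier
#include <mpi.h>
#include <iostream>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    for(int i = 0; i < 2; i++){
        MPI_Comm everyone;
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0, MPI_COMM_SELF,
                       &everyone, MPI_ERRCODES_IGNORE);
        MPI_Barrier(everyone);
        // collective over the intercommunicator; sets everyone to MPI_COMM_NULL
        MPI_Comm_disconnect(&everyone);
    }
    std::cout << "parent continues after disconnecting the workers\n";
    MPI_Finalize();
    return 0;
}

The matching change in worker.cpp would just be a call to
MPI_Comm_disconnect(&parent) right after MPI_Barrier(parent).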


best,
stanislav.

On Tue, 2017-09-05 at 16:27 -1000, Min Si wrote:
> Hi Stanislav,
> 
> I apologize for the late update. After a while I finally found a
> chance to try this test. I could not reproduce this problem on my side.
> 
> Here is my environment:
> - MPICH version: 3.3a2 (using the same configure options as shown in
>   your config.log)
> - mpi4py: version 2.0.0
> - platform: a Fedora 25 VM and a CentOS 7 VM
> 
> If you are still facing this problem, please try the following steps
> to narrow it down:
> 1. Try removing the computation in child.py
> 2. Try the MPICH test suite under <your MPICH build
>    directory>/test/mpi/spawn/:
>        make
>        make testing V=1
> 
> Regards,
> Min
> 
> On 8/13/17 11:07 AM, Stanislav Simko wrote:
> > Hi Min,
> > attached are the configure and make logs for the build that I did
> > for myself on our local cluster with Intel compilers, i.e. the build
> > is definitely not guaranteed to be perfect/optimal. But please keep
> > in mind that I get the same behaviour with the standard MPICH package
> > in the Fedora distribution - I think those configuration options
> > should be available online, maybe prepared by someone from MPICH?
> > 
> > Also, I tested with a simple C++ hello-world-like program and it's
> > the same.
> > 
> > Thank you.
> > best,
> > stanislav.
> > 
> > On Sun, 2017-08-13 at 19:58 +0100, Min Si wrote:
> > > Hi Stanislav,
> > > 
> > > This seems interesting. Could you please also attach the MPICH
> > > config.log? You can find it under the directory where you built
> > > MPICH. I will look into this problem then and keep you updated.
> > > 
> > > Min
> > > 
> > > On 8/11/17 1:31 PM, Stanislav Simko wrote:
> > > > Dear all,
> > > > 
> > > > I'm just trying some very basic stuff with MPI_COMM_SPAWN in
> > > > python (i.e. I use the mpi4py package), but I see behaviour that
> > > > I do not understand - the child process gets spawned, does its
> > > > stuff and then "should" finish. I see, though, that the process
> > > > created for the child stays alive. I see this only with MPICH;
> > > > OpenMPI does what I would (naively) expect. In this way I can end
> > > > up with N "ghost" processes after calling SPAWN N times. My
> > > > minimal working example is the following:
> > > > 
> > > > ______________________________________________
> > > > parent.py
> > > > 
> > > > from __future__ import print_function
> > > > import sys
> > > > from mpi4py import MPI
> > > > comm = MPI.COMM_WORLD
> > > > spawned = MPI.COMM_SELF.Spawn(sys.executable, args=['child.py'], maxprocs=1)
> > > > print("parent process is waiting for child")
> > > > spawned.Barrier()
> > > > 
> > > > ______________________________________________
> > > > child.py
> > > > 
> > > > from __future__ import print_function
> > > > import math
> > > > from mpi4py import MPI
> > > > parent = MPI.Comm.Get_parent()
> > > > # just do some stupid stuff that takes a bit of time
> > > > for i in range(5000000):
> > > >     a = i*i + 1 - (i*10) + math.sin(math.pow(i, i % 8))
> > > > parent.Barrier()
> > > > ______________________________________________
> > > > 
> > > > I run with e.g.:
> > > > mpirun -n 1 python parent.py
> > > > 
> > > > Do I miss something with the SPAWN method?
> > > > (I tested on two independent systems: our local cluster with
> > > > mpich v3.0.4, and my laptop - Fedora 26, mpich v3.2.8 from the
> > > > repositories)
> > > > 
> > > > Thank you very much for suggestions.
> > > > 
> > > > Regards,
> > > > stanislav.
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

