<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body><div>Hi Min,</div><div><br></div><div>I'll look into it when I get a bit of time, but it's strange, because I still see the problem. I've also tested similar C++ code, and the same problem is there: I see 3 processes running during the computation in the main program, after the workers should have finished.</div><div><br></div><div>I have also tested on Debian 8.7 in VirtualBox with MPICH v3.1-5+b2 from the distro repositories, with the same outcome.</div><div><br></div><div>I'll try to get back with new info asap.</div><div><br></div><div>If you're interested, here's the C++ code, with the calculation removed from the worker and added to the main program:</div>
<div>mpi.cpp:</div>
<div>#include &lt;mpi.h&gt;</div>
<div>#include &lt;math.h&gt;</div>
<div>#include &lt;iostream&gt;</div>
<div><br></div>
<div>int main(int argc, char *argv[]){</div>
<div><br></div>
<div>&nbsp;&nbsp;MPI_Comm everyone;</div>
<div>&nbsp;&nbsp;MPI_Init(&amp;argc, &amp;argv);</div>
<div>&nbsp;&nbsp;double c = 0.;</div>
<div>&nbsp;&nbsp;std::cout &lt;&lt; "starting worker\n";</div>
<div>&nbsp;&nbsp;// spawn two workers, one after another, and wait for each at a barrier</div>
<div>&nbsp;&nbsp;for(int i=0;i&lt;2;i++){</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;MPI_Comm_spawn("./worker",MPI_ARGV_NULL,1,MPI_INFO_NULL,0,MPI_COMM_SELF,</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&amp;everyone,MPI_ERRCODES_IGNORE);</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;MPI_Barrier(everyone);</div>
<div>&nbsp;&nbsp;}</div>
<div>&nbsp;&nbsp;std::cout &lt;&lt; "parent rolling again &amp; doing stuff" &lt;&lt; "\n";</div>
<div>&nbsp;&nbsp;for(int i=0;i&lt;50000000;i++){</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;const double x = i;  // compute in double: i*i overflows int for large i</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;c = x*x+1-(x*10) + sin(pow(x,i%8));</div>
<div>&nbsp;&nbsp;}</div>
<div>&nbsp;&nbsp;std::cout &lt;&lt; "parent finished stuff: " &lt;&lt; c &lt;&lt; "\n";</div>
<div>&nbsp;&nbsp;MPI_Finalize();</div>
<div>&nbsp;&nbsp;return 0;</div>
<div>}</div>
<div><br></div>
<div>worker.cpp:</div>
<div>#include &lt;mpi.h&gt;</div>
<div>#include &lt;iostream&gt;</div>
<div>using namespace std;</div>
<div><br></div>
<div>int main(int argc, char *argv[]){</div>
<div><br></div>
<div>&nbsp;&nbsp;double c = 0.;  // initialised, since no calculation is done in the worker any more</div>
<div>&nbsp;&nbsp;MPI_Init(&amp;argc, &amp;argv);</div>
<div>&nbsp;&nbsp;MPI_Comm parent;</div>
<div>&nbsp;&nbsp;MPI_Comm_get_parent(&amp;parent);</div>
<div>&nbsp;&nbsp;cout &lt;&lt; "worker inside worker finished stuff: " &lt;&lt; c &lt;&lt; "\n";</div>
<div>&nbsp;&nbsp;if(parent != MPI_COMM_NULL){</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;MPI_Barrier(parent);</div>
<div>&nbsp;&nbsp;}</div>
<div>&nbsp;&nbsp;MPI_Finalize();</div>
<div>&nbsp;&nbsp;return 0;</div>
<div>}</div>
<div><br></div><div>best,</div><div>stanislav.</div><div><br></div><div><br></div><div>On Tue, 2017-09-05 at 16:27 -1000, Min Si wrote:</div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex">
Hi Stanislav,<br>
<br>
  I apologize for the late update. I finally found a chance
  to try this test, and I could not reproduce the problem on my side.<br>
<br>
Here is my environment:<br>
- MPICH version: version 3.3a2 (using the same configure options as
shown in your config.log)<br>
- mpi4py: version 2.0.0<br>
  - platform: a Fedora 25 VM and a CentOS 7 VM<br>
<br>
  If you are still facing this problem, please try the following steps
  to narrow it down:<br>
  1. Try removing the computation in child.py<br>
  2. Try the MPICH test suite under &lt;your MPICH build
  directory&gt;/test/mpi/spawn/:<br>
      make <br>
      make testing V=1<br>
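  <div>While narrowing this down, it can help to check from a script which processes are still alive after each spawn. The following is only a minimal sketch: the helper names parse_ps and live_pids are hypothetical, and it assumes a ps that accepts "-eo pid,comm" (as on the Linux systems mentioned in this thread).</div>
<pre>
```python
import subprocess


def parse_ps(ps_output, name):
    """Return the PIDs whose command name equals `name`.

    `ps_output` is the text printed by `ps -eo pid,comm`: a header
    line followed by one "PID COMMAND" row per process.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[1].strip() == name:
            pids.append(int(fields[0]))
    return pids


def live_pids(name):
    """Run ps and return the PIDs of processes currently named `name`."""
    out = subprocess.run(
        ["ps", "-eo", "pid,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out, name)
```
</pre>
  <div>For the C++ test, live_pids("worker") could be called from a separate shell while the parent is still computing; for the Python test the command name would be that of the interpreter (e.g. "python"), so the parent itself is counted as well.</div>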
<br>
Regards,<br>
Min<br>
<br>
<div class="moz-cite-prefix">On 8/13/17 11:07 AM, Stanislav Simko
wrote:<br>
</div>
<blockquote type="cite" cite="mid:1502658423.18927.3.camel@uu.nl" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex">
<div>Hi Min,</div>
      <div>Attached are the configure and make logs for the build that I did
        for myself on our local cluster with the Intel compilers, i.e. the
        build is definitely not guaranteed to be perfect/optimal. But
        please keep in mind that I get the same behaviour with the standard
        MPICH package in the Fedora distribution. I think its
        configuration options should be available online, maybe prepared
        by someone from MPICH?</div>
<div><br>
</div>
      <div>Also, I tested with a simple C++ hello-world-like program, and
        the behaviour is the same.</div>
<div><br>
</div>
<div>Thank you.</div>
<div>best,</div>
<div>stanislav.</div>
<div><br>
</div>
<div><br>
</div>
<div>On Sun, 2017-08-13 at 19:58 +0100, Min Si wrote:</div>
<blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"> Hi Stanislav,<br>
<br>
        This seems interesting. Could you please also attach the MPICH
        config.log? You can find it under the directory where you built
        MPICH. I will then look into this problem and keep you updated.<br>
<br>
Min<br>
<br>
<div class="moz-cite-prefix">On 8/11/17 1:31 PM, Stanislav Simko
wrote:<br>
</div>
<blockquote type="cite" cite="mid:1502454690.4363.18.camel@uu.nl" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex">
<div>Dear all,</div>
<div><br>
</div>
          <div>I'm trying some very basic things with MPI_COMM_SPAWN
            in Python (using the mpi4py package), but I see behaviour
            that I do not understand: the child process gets spawned,
            does its work and then "should" finish, yet the process
            created for the child stays alive. I see this
            only with MPICH; Open MPI does what I would (naively)
            expect. This way I can end up with N "ghost" processes
            after calling SPAWN N times. My minimal working example is
            as follows:</div>
<div><br>
</div>
<div><br>
</div>
<div>______________________________________________</div>
<div>parent.py</div>
<div><br>
</div>
<div>from __future__ import print_function</div>
          <div>import sys</div>
          <div>from mpi4py import MPI</div>
<div>comm = MPI.COMM_WORLD</div>
<div>spawned =
MPI.COMM_SELF.Spawn(sys.executable,args=['child.py'],maxprocs=1)</div>
<div>print("parent process is waiting for child")</div>
<div>spawned.Barrier()</div>
<div><br>
</div>
<div><br>
</div>
<div>______________________________________________</div>
<div>child.py</div>
<div><br>
</div>
<div>from __future__ import print_function</div>
          <div>import math</div>
          <div>from mpi4py import MPI</div>
          <div>parent = MPI.Comm.Get_parent()</div>
          <div># just do some stupid stuff that takes a bit of time</div>
          <div>for i in range(5000000):</div>
          <div>&nbsp;&nbsp;&nbsp;&nbsp;a = i*i+1-(i*10) + math.sin(math.pow(i,i%8))</div>
          <div>parent.Barrier()</div>
<div><br>
</div>
<div>______________________________________________</div>
<div><br>
</div>
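          <div>One thing that may be worth trying, although nothing in this thread confirms that it changes the behaviour: the MPI standard defines MPI_Finalize as collective over all connected processes, and a spawned child stays connected to its parent until both sides call MPI_Comm_disconnect on the intercommunicator. Below is a minimal sketch of that variant. The function names sync_and_disconnect, run_parent and run_child are hypothetical; Barrier() and Disconnect() are the mpi4py communicator methods.</div>
<pre>
```python
def sync_and_disconnect(comm):
    """Synchronise over an intercommunicator, then release it.

    `comm` is expected to offer Barrier() and Disconnect(), as mpi4py
    intercommunicators do.
    """
    comm.Barrier()
    # Disconnect is collective: the other side has to call it too.
    comm.Disconnect()


def run_parent(spawn, n_spawns=1):
    """Spawn `n_spawns` children one after another, releasing each one.

    `spawn` is a hypothetical zero-argument callable that returns the
    intercommunicator to the new child, for example
    lambda: MPI.COMM_SELF.Spawn(sys.executable, args=['child.py'], maxprocs=1)
    """
    for _ in range(n_spawns):
        sync_and_disconnect(spawn())


def run_child(parent, work=None):
    """Child side: do the optional work, then barrier and disconnect."""
    if work is not None:
        work()
    sync_and_disconnect(parent)
```
</pre>
          <div>In child.py this would be called as run_child(MPI.Comm.Get_parent()), and in parent.py as run_parent with the Spawn call from the example above wrapped in a lambda. Whether this removes the lingering processes with MPICH is an assumption that would need to be tested.</div>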
          <div>I run it with, e.g.:</div>
          <div>mpirun -n 1 python parent.py</div>
<div><br>
</div>
          <div>Am I missing something about the Spawn method?</div>
          <div>(I tested on two independent systems: our local cluster
            with mpich v3.0.4, and my laptop with Fedora 26 and mpich v3.2.8
            from the repositories.)</div>
<div><br>
</div>
          <div>Thank you very much for any suggestions.</div>
<div><br>
</div>
<div>Regards,</div>
<div>stanislav.</div>
<br>
<br>
<pre wrap="">_______________________________________________
discuss mailing list <a class="moz-txt-link-abbreviated" href="mailto:discuss@mpich.org" moz-do-not-send="true">discuss@mpich.org</a>
To manage subscription options or unsubscribe:
<a class="moz-txt-link-freetext" href="https://lists.mpich.org/mailman/listinfo/discuss" moz-do-not-send="true">https://lists.mpich.org/mailman/listinfo/discuss</a></pre>
</blockquote>
<br>
</blockquote>
</blockquote>
<br>
</blockquote><div><span></span></div></body></html>