<div dir="ltr">I am not calling the cpi.py script directly. The master is spawning those processes. So I call<div><br></div><div>$ mpiexec -n 30 python master.py </div><div><br></div><div>Then each of the 30 ranks should spawn a cpi.py process. But with the attached master.py and cpi.py (directly from the mpi4py tutorial), you can see the errors I get:</div>
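For reference, what each rank's master.py does is essentially the mpi4py dynamic-process-management tutorial example. A minimal sketch of that pattern (the helper midpoint_pi is mine, added only to show the midpoint-rule sum that cpi.py evaluates; it is not part of the tutorial code):

```python
# Sketch of master.py, modeled on the mpi4py "Dynamic Process Management"
# tutorial example.  spawn_cpi() is what each of the 30 ranks started by
# `mpiexec -n 30 python master.py` would run; midpoint_pi() is an
# illustrative helper showing the quadrature that cpi.py computes.
import sys


def midpoint_pi(n):
    # Midpoint-rule approximation of pi = integral of 4/(1+x^2) over [0,1].
    h = 1.0 / n
    return h * sum(4.0 / (1.0 + ((i + 0.5) * h) ** 2) for i in range(n))


def spawn_cpi(n_intervals=100):
    # Spawn one cpi.py worker next to this rank, broadcast the interval
    # count to it, and reduce the partial sums back (mirrors the tutorial
    # master).  Imports are local so the sketch loads without MPI present.
    import numpy
    from mpi4py import MPI

    comm = MPI.COMM_SELF.Spawn(sys.executable, args=['cpi.py'], maxprocs=1)
    N = numpy.array(n_intervals, dtype='i')
    comm.Bcast([N, MPI.INT], root=MPI.ROOT)
    PI = numpy.array(0.0, dtype='d')
    comm.Reduce(None, [PI, MPI.DOUBLE], op=MPI.SUM, root=MPI.ROOT)
    comm.Disconnect()
    return float(PI)
```

Running spawn_cpi() from every rank is exactly the pattern that triggers the crash above.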
<div><br></div><div><div>[jlarson@mintthinkpad tutorial_example]$ mpiexec -n 30 python master.py </div><div>[mpiexec@mintthinkpad] control_cb (pm/pmiserv/pmiserv_cb.c:200): assert (!closed) failed</div><div>[mpiexec@mintthinkpad] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status</div>
<div>[mpiexec@mintthinkpad] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:198): error waiting for event</div><div>[mpiexec@mintthinkpad] main (ui/mpich/mpiexec.c:336): process manager error waiting for completion</div>
</div><div><br></div><div>As was previously stated, this appears to be an mpi4py problem and not an MPICH question. </div><div><br></div><div>Since you are curious about the application: the motivating example is the numerical optimization of the output of an expensive simulation. I do not have access to the simulation code, so my master tells the workers where to evaluate the expensive simulation; the simulation may itself depend heavily on MPI. </div>
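To make the design concrete, here is a hedged sketch of the master-side bookkeeping I have in mind; the function name assign_points and the round-robin policy are illustrative only, not from the real code:

```python
# Illustrative master-side scheduling: distribute candidate evaluation
# points round-robin among worker ranks.  The real master would then send
# each worker its points, and each worker would spawn the (possibly
# MPI-heavy) simulation to evaluate them.
def assign_points(points, n_workers):
    # Return one list of points per worker, assigned round-robin.
    buckets = [[] for _ in range(n_workers)]
    for i, p in enumerate(points):
        buckets[i % n_workers].append(p)
    return buckets


# Example: five candidate points split between two workers.
print(assign_points([0.1, 0.2, 0.3, 0.4, 0.5], 2))
```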
<div><br></div><div>But I welcome your input on design paradigms that avoid the "sharp edges."</div><div><br></div><div>Thank you again,</div><div>Jeff</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">
On Wed, Mar 12, 2014 at 5:15 PM, Jed Brown <span dir="ltr"><<a href="mailto:jed@jedbrown.org" target="_blank">jed@jedbrown.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Jeffrey Larson <<a href="mailto:jmlarson@anl.gov">jmlarson@anl.gov</a>> writes:<br>
<br>
> I am trying to have a single master, with a group of workers who are<br>
> themselves calculating function values. The evaluation of the function may<br>
> itself involve spawning MPI tasks.<br>
<br>
How are you running the cpi test then? I ran it with many spawned<br>
processes<br>
<br>
MPI.COMM_SELF.Spawn(cmd, None, 30)<br>
<br>
and with one spawned process on each of many masters. Neither crashed<br>
with current MPICH or Open MPI. What exactly is needed to reproduce the<br>
failure you see?<br>
<br>
<br>
I am curious why you want all this process spawning (there are lots of<br>
sharp edges to deploying this approach), but mpi4py should work.<br>
</blockquote></div><br></div>