<meta http-equiv="Content-Type" content="text/html; charset=utf-8">Yeah, I think checkpoint-restart is probably the best way to do what this user wants. <div><br></div><div>Jeff <br><br>On Thursday, October 8, 2015, Rob Latham <<a href="mailto:robl@mcs.anl.gov">robl@mcs.anl.gov</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
<br>
On 10/08/2015 08:18 AM, Jeff Hammond wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Nodes and cores are *hardware* concepts.  If you need to add hardware<br>
resources to your job, that is a resource manager issue.  If you want to<br>
create new MPI processes _that will run on those new nodes/cores_, you<br>
can use MPI_Comm_spawn for that.<br>
<br>
As for pausing a job, I'm not sure why you want to do that.  Are you<br>
trying to suspend the job until the new hardware resources become available?<br>
</blockquote>
<br>
Probably not what Wahi has in mind, but back on the old Crays (the old vector-era Crays, I mean), when checkpoint/restart was an OS-level service, the job scheduler used that feature: when a new job came in with higher priority, it checkpointed the old job; when the new job finished, it restarted the old job.<br>
<br>
==rob<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
On Wed, Oct 7, 2015 at 10:08 PM, wahi <<a>wahi@sci.am</a>> wrote:<br>
<br>
    Hi Jeff,<br>
<br>
    Thank you for your reply. I think MPI_Comm_spawn is for<br>
    generating new processes, but I need to add more nodes or cores after<br>
    pausing the job. Am I right?<br>
<br>
<br>
    On 10/08/2015 08:11 AM, Jeff Hammond wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
    See MPI_Comm_spawn(_multiple).<br>
<br>
    Jeff<br>
<br>
    On Wednesday, October 7, 2015, Zhao, Xin<br>
    <<a>xinzhao3@illinois.edu</a>> wrote:<br>
<br>
        Hi Wahi,<br>
<br>
        Do you mean you want to change the number of hosts during one MPI<br>
        execution? If so, we think that is not possible.<br>
<br>
        Xin<br>
        ________________________________________<br>
        From: wahi [<a>wahi@sci.am</a>]<br>
        Sent: Wednesday, October 07, 2015 2:43 AM<br>
        To: <a>discuss@mpich.org</a><br>
        Subject: [mpich-discuss] Can the MPICH job pause and resumed<br>
<br>
        Hi,<br>
<br>
        I would like to know whether it is possible to pause an<br>
        MPICH job and<br>
        restart it with more nodes.<br>
<br>
<br>
<br>
        Thanks in advance for any help or suggestion.<br>
<br>
<br>
<br>
<br>
<br>
        Regards,<br>
          Wahi<br>
        _______________________________________________<br>
        discuss mailing list <a>discuss@mpich.org</a><br>
        To manage subscription options or unsubscribe:<br>
        <a href="https://lists.mpich.org/mailman/listinfo/discuss" target="_blank">https://lists.mpich.org/mailman/listinfo/discuss</a><br>
<br>
<br>
<br>
    --<br>
    Jeff Hammond<br>
    <a>jeff.science@gmail.com</a><br>
    <a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a><br>
<br>
<br>
</blockquote>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
</blockquote>
<br>
-- <br>
Rob Latham<br>
Mathematics and Computer Science Division<br>
Argonne National Lab, IL USA<br>
</blockquote></div><br><br>-- <br>Jeff Hammond<br><a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br><a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a><br>
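<br>
The checkpoint-restart idea recommended at the top of this thread can be sketched at the application level. This is only a minimal illustration under stated assumptions, not something MPICH provides: it uses no MPI calls at all, the "workers" are plain loop iterations standing in for MPI ranks, and every name in it (partition, run, save_checkpoint, load_checkpoint, demo, job.ckpt) is hypothetical. The point it shows is that if the checkpoint stores the global state rather than per-rank state, the job can be stopped and later restarted with a different number of workers.<br>
<br>

```python
import json
import os
import tempfile


def partition(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous (start, stop) blocks."""
    base, extra = divmod(n_items, n_workers)
    blocks = []
    start = 0
    for w in range(n_workers):
        # The first `extra` workers take one additional item each.
        stop = start + base + (1 if extra > w else 0)
        blocks.append((start, stop))
        start = stop
    return blocks


def run(values, start_iter, stop_iter, n_workers):
    """Advance the global state; each simulated worker updates its own block."""
    values = list(values)
    for it in range(start_iter, stop_iter):
        for start, stop in partition(len(values), n_workers):
            for i in range(start, stop):
                values[i] = values[i] + i * (it + 1)
    return values


def save_checkpoint(path, iteration, values):
    """Write to a temporary file, then rename, so a reader never sees a
    half-written checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"iteration": iteration, "values": values}, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return (iteration, values) from a checkpoint file."""
    with open(path) as f:
        data = json.load(f)
    return data["iteration"], data["values"]


def demo():
    n_items, total_iters = 10, 6
    initial = [0] * n_items

    # Reference: an uninterrupted run with 2 workers.
    reference = run(initial, 0, total_iters, n_workers=2)

    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "job.ckpt")

        # "Pause": run half the iterations with 2 workers, then checkpoint.
        partial = run(initial, 0, 3, n_workers=2)
        save_checkpoint(path, 3, partial)

        # "Resume": reload the global state and continue with 5 workers.
        iteration, values = load_checkpoint(path)
        resumed = run(values, iteration, total_iters, n_workers=5)

    return reference, resumed


if __name__ == "__main__":
    reference, resumed = demo()
    print(resumed == reference)
```

<br>
In a real MPI program the restart step would be a fresh launch with a larger process count from the resource manager, and each rank would read only its own block of the checkpoint. The sketch deliberately leaves that part out.<br>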