<meta http-equiv="Content-Type" content="text/html; charset=utf-8">Yeah, I think checkpoint-restart is probably the best way to do what this user wants. <div><br></div><div>Jeff <br><br>On Thursday, October 8, 2015, Rob Latham &lt;<a href="mailto:robl@mcs.anl.gov">robl@mcs.anl.gov</a>&gt; wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
<br>
On 10/08/2015 08:18 AM, Jeff Hammond wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Nodes and cores are *hardware* concepts. If you need to add hardware<br>
resources to your job, that is a resource manager issue. If you want to<br>
create new MPI processes _that will run on those new nodes/cores_, you<br>
can use MPI_Comm_spawn for that.<br>
<br>
As for pausing a job, I'm not sure why you want to do that. Are you<br>
trying to suspend the job until the new hardware resources become available?
</blockquote>
<br>
Probably not what Wahi has in mind, but back on the old Crays (the old vector-era Crays, I mean), when checkpoint/restart was an OS-level service, the job scheduler used that feature: when a new job came in with higher priority, it checkpointed the old job; when the new job finished, it restarted the old job.<br>
<br>
==rob<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
On Wed, Oct 7, 2015 at 10:08 PM, wahi &lt;<a>wahi@sci.am</a>&gt;<br>
wrote:<br>
<br>
Hi Jeff,<br>
<br>
Thank you for your reply. I think MPI_Comm_spawn is for<br>
generating new processes, but I need to add more nodes or cores after<br>
pausing the job. Am I right?<br>
<br>
<br>
On 10/08/2015 08:11 AM, Jeff Hammond wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
See MPI_Comm_spawn(_multiple).<br>
<br>
Jeff<br>
<br>
On Wednesday, October 7, 2015, Zhao, Xin &lt;<a>xinzhao3@illinois.edu</a>&gt; wrote:<br>
<br>
Hi Wahi,<br>
<br>
Do you mean you want to change the number of hosts during one MPI<br>
execution? If so, we think that is not possible.<br>
<br>
Xin<br>
________________________________________<br>
From: wahi [<a>wahi@sci.am</a>]<br>
Sent: Wednesday, October 07, 2015 2:43 AM<br>
To: <a>discuss@mpich.org</a><br>
Subject: [mpich-discuss] Can the MPICH job pause and resumed<br>
<br>
Hi,<br>
<br>
I would like to know whether it is possible to pause an MPICH job and<br>
restart it with more nodes.<br>
<br>
<br>
<br>
Thanks in advance for any help or suggestion.<br>
<br>
<br>
<br>
<br>
<br>
Regards,<br>
Wahi<br>
_______________________________________________<br>
discuss mailing list <a>discuss@mpich.org</a><br>
To manage subscription options or unsubscribe:<br>
<a href="https://lists.mpich.org/mailman/listinfo/discuss" target="_blank">https://lists.mpich.org/mailman/listinfo/discuss</a><br>
<br>
<br>
<br>
--<br>
Jeff Hammond<br>
<a>jeff.science@gmail.com</a><br>
<a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a><br>
<br>
<br>
</blockquote>
<br>
<br>
<br>
<br>
<br>
<br>
--<br>
Jeff Hammond<br>
<a>jeff.science@gmail.com</a><br>
<a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a><br>
<br>
<br>
<br>
</blockquote>
<br>
-- <br>
Rob Latham<br>
Mathematics and Computer Science Division<br>
Argonne National Lab, IL USA<br>
</blockquote></div><br><br>-- <br>Jeff Hammond<br><a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br><a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a><br>
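<div>For reference, the checkpoint-restart idea discussed in this thread can be done at the application level. The following is only a minimal sketch in plain C, not anything MPICH provides: the <code>State</code> struct, the function names, and the checkpoint file layout are all hypothetical, and the loop body is a stand-in for real work. It shows the general pattern: save state periodically, and on startup resume from the last saved state if one exists.</div>

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical application state: a step counter and a running sum. */
typedef struct {
    long step;
    double sum;
} State;

/* Write the state to `path`. Returns 0 on success, -1 on failure. */
int save_checkpoint(const char *path, const State *s)
{
    assert(path != NULL && s != NULL);
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t n = fwrite(s, sizeof *s, 1, f);
    int closed = fclose(f);
    return (n == 1 && closed == 0) ? 0 : -1;
}

/* Read the state back. Returns 0 on success, -1 if there is no usable
 * checkpoint; in that case *s is left unchanged. */
int load_checkpoint(const char *path, State *s)
{
    assert(path != NULL && s != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    State tmp;
    size_t n = fread(&tmp, sizeof tmp, 1, f);
    fclose(f);
    if (n != 1)
        return -1;
    *s = tmp;
    return 0;
}

/* Resume from `path` if a checkpoint exists, otherwise start from zero.
 * Advance to `until_step`, saving every `interval` steps and once at the end. */
State run(const char *path, long until_step, long interval)
{
    assert(interval > 0);
    State s = {0, 0.0};
    load_checkpoint(path, &s); /* on failure, s keeps its zero state */
    while (s.step < until_step) {
        s.sum += (double)s.step; /* stand-in for real work */
        s.step++;
        if (s.step % interval == 0)
            save_checkpoint(path, &s);
    }
    save_checkpoint(path, &s);
    return s;
}
```

<div>A first launch might call <code>run("state.ckpt", 1000, 100)</code> and be stopped partway; a later launch with the same path picks up from the last saved step. In an MPI job, each rank would typically write its own file (or use parallel I/O), and restarting on a different number of processes also requires redistributing the data, which this sketch does not attempt.</div>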