[mpich-discuss] Can the MPICH job pause and resumed

Rob Latham robl at mcs.anl.gov
Thu Oct 8 13:13:52 CDT 2015



On 10/08/2015 08:18 AM, Jeff Hammond wrote:
> Nodes and cores are *hardware* concepts.  If you need to add hardware
> resources to your job, that is a resource manager issue.  If you want to
> create new MPI processes _that will run on those new nodes/cores_, you
> can use MPI_Comm_spawn for that.
>
> As for pausing a job, I'm not sure why you want to do that.  Are you
> trying to suspend the job until the new hardware resources become available?

Probably not what Wahi has in mind, but back on the old Crays (the
vector-era Crays, I mean), when checkpoint/restart was an OS-level
service, the job scheduler used that feature: when a new job came in
with higher priority, it checkpointed the old job, and when the new job
finished, it restarted the old one.

==rob

>
> On Wed, Oct 7, 2015 at 10:08 PM, wahi <wahi at sci.am> wrote:
>
>     Hi Jeff,
>
>     Thank you for your reply. I think MPI_Comm_spawn is for
>     generating new processes, but I need to add more nodes or cores
>     after pausing the job. Am I right?
>
>
>     On 10/08/2015 08:11 AM, Jeff Hammond wrote:
>>     See MPI_Comm_spawn(_multiple).
>>
>>     Jeff
>>
>>     On Wednesday, October 7, 2015, Zhao, Xin
>>     <xinzhao3 at illinois.edu> wrote:
>>
>>         Hi Wahi,
>>
>>         Do you mean you want to change the number of hosts during a
>>         single MPI execution? If so, we think that is not possible.
>>
>>         Xin
>>         ________________________________________
>>         From: wahi [wahi at sci.am]
>>         Sent: Wednesday, October 07, 2015 2:43 AM
>>         To: discuss at mpich.org
>>         Subject: [mpich-discuss] Can the MPICH job pause and resumed
>>
>>         Hi,
>>
>>         I would like to know whether it is possible to pause an
>>         MPICH job and restart it on more nodes.
>>
>>         Thanks in advance for any help or suggestion.
>>
>>         Regards,
>>           Wahi
>>         _______________________________________________
>>         discuss mailing list discuss at mpich.org
>>         To manage subscription options or unsubscribe:
>>         https://lists.mpich.org/mailman/listinfo/discuss
>>
>>
>>
>>     --
>>     Jeff Hammond
>>     jeff.science at gmail.com
>>     http://jeffhammond.github.io/
>>
>>
>
>
>
>
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
>
>
>

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

