[mpich-discuss] Can an MPICH job be paused and resumed
Jeff Hammond
jeff.science at gmail.com
Sat Oct 10 16:09:33 CDT 2015
Yeah, I think checkpoint-restart is probably the best way to do what this
user wants.
Jeff
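
For readers of the archive, here is a minimal sketch of what
application-level checkpoint-restart can look like. It is an illustration
under stated assumptions, not MPICH's own mechanism: the state struct, the
function names, and the file layout are all hypothetical, and the code is
plain C with no MPI calls so that it compiles without an MPI installation.
In a real MPI job each rank would typically write its own file (or use
MPI-IO), and restarting on a different number of ranks would also require
redistributing the saved data.

```c
#include <stdio.h>

/* Hypothetical application state: a step counter and one accumulated value. */
typedef struct {
    long step;
    double value;
} sim_state;

/* Write the state to `path`. Returns 0 on success, -1 on failure. */
int save_checkpoint(const char *path, const sim_state *s)
{
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    size_t n = fwrite(s, sizeof *s, 1, f);
    if (fclose(f) != 0 || n != 1)
        return -1;
    return 0;
}

/* Read the state back from `path`. Returns 0 on success, -1 on failure. */
int load_checkpoint(const char *path, sim_state *s)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t n = fread(s, sizeof *s, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}

/*
 * Advance the toy computation from s->step up to (not including) end_step,
 * saving a checkpoint every `interval` steps (interval must be > 0).
 * Returns 0 on success, -1 if a checkpoint could not be written.
 */
int run_until(sim_state *s, long end_step, long interval, const char *path)
{
    while (s->step < end_step) {
        s->value += (double)s->step;   /* stand-in for real work */
        s->step++;
        if (s->step % interval == 0 && save_checkpoint(path, s) != 0)
            return -1;
    }
    return 0;
}
```

On startup the program would call load_checkpoint() first; if that
succeeds it continues from the saved step, otherwise it starts from step 0.
The job can then be killed and resubmitted later, losing at most one
checkpoint interval of work.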
On Thursday, October 8, 2015, Rob Latham <robl at mcs.anl.gov> wrote:
>
>
> On 10/08/2015 08:18 AM, Jeff Hammond wrote:
>
>> Nodes and cores are *hardware* concepts. If you need to add hardware
>> resources to your job, that is a resource manager issue. If you want to
>> create new MPI processes _that will run on those new nodes/cores_, you
>> can use MPI_Comm_spawn for that.
>>
>> As for pausing a job, I'm not sure why you want to do that. Are you
>> trying to suspend the job until the new hardware resources become
>> available?
>>
>
> Probably not what Wahi has in mind, but back on the old Crays (old
> vector-era Crays, I mean), when checkpoint/restart was an OS-level service,
> the job scheduler used that feature: when a new job came in with higher
> priority, it checkpointed the old job; when the new job finished, it
> restarted the old job.
>
> ==rob
>
>
>> On Wed, Oct 7, 2015 at 10:08 PM, wahi <wahi at sci.am> wrote:
>>
>> Hi Jeff,
>>
>> Thank you for your reply. I think MPI_Comm_spawn is for
>> generating new processes, but I need to add more nodes or cores after
>> pausing the job. Am I right?
>>
>>
>> On 10/08/2015 08:11 AM, Jeff Hammond wrote:
>>
>>> See MPI_Comm_spawn(_multiple).
>>>
>>> Jeff
>>>
>>> On Wednesday, October 7, 2015, Zhao, Xin <xinzhao3 at illinois.edu> wrote:
>>>
>>> Hi Wahi,
>>>
>>> Do you mean you want to change the number of hosts during one MPI
>>> execution? If so, we think that is not possible.
>>>
>>> Xin
>>> ________________________________________
>>> From: wahi [wahi at sci.am]
>>> Sent: Wednesday, October 07, 2015 2:43 AM
>>> To: discuss at mpich.org
>>> Subject: [mpich-discuss] Can an MPICH job be paused and resumed
>>>
>>> Hi,
>>>
>>> I would like to know whether it is possible to pause an MPICH job and
>>> restart it on more nodes.
>>>
>>>
>>>
>>> Thanks in advance for any help or suggestion.
>>>
>>>
>>>
>>>
>>>
>>> Regards,
>>> Wahi
>>> _______________________________________________
>>> discuss mailing list discuss at mpich.org
>>> To manage subscription options or unsubscribe:
>>> https://lists.mpich.org/mailman/listinfo/discuss
>>>
>>>
>>>
>>> --
>>> Jeff Hammond
>>> jeff.science at gmail.com
>>> http://jeffhammond.github.io/
>>>
>>>
>>>
>>
>>
>>
>>
>>
>>
>> --
>> Jeff Hammond
>> jeff.science at gmail.com
>> http://jeffhammond.github.io/
>>
>>
>>
>>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
>
--
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/