[mpich-discuss] Can the MPICH job pause and resumed
Rob Latham
robl at mcs.anl.gov
Thu Oct 8 13:13:52 CDT 2015
On 10/08/2015 08:18 AM, Jeff Hammond wrote:
> Nodes and cores are *hardware* concepts. If you need to add hardware
> resources to your job, that is a resource manager issue. If you want to
> create new MPI processes _that will run on those new nodes/cores_, you
> can use MPI_Comm_spawn for that.
>
> As for pausing a job, I'm not sure why you want to do that. Are you
> trying to suspend the job until the new hardware resources become available?
Probably not what Wahi has in mind, but back on the old Crays (the
vector-era Crays, I mean), when checkpoint/restart was an OS-level
service, the job scheduler used that feature: when a new job came in
with higher priority, it checkpointed the old job, and when the new job
finished, it restarted the old one.
==rob
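
The OS-level service described above is not something the sketch below
reproduces. As a rough application-level stand-in, a program can write
its own restart state to a file and read it back when it is launched
again. The job_state struct, the function names, and the file layout are
all assumptions made up for illustration; a real MPI code would also have
to decide which rank writes what. In principle, if the saved state does
not depend on the number of ranks, a later launch with a different
process count could resume from it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical application state: just enough to resume a loop. */
typedef struct {
    long   step;   /* next iteration to execute */
    double value;  /* running result so far */
} job_state;

/* Write the state to `path`. Returns 0 on success, -1 on failure.
 * The struct is written raw, so the file is only readable on a machine
 * with the same data layout; that is an accepted shortcut here. */
int save_checkpoint(const char *path, const job_state *st)
{
    FILE *f;
    int ok;

    assert(path != NULL && st != NULL);

    f = fopen(path, "wb");
    if (f == NULL)
        return -1;

    ok = (fwrite(st, sizeof *st, 1, f) == 1);
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}

/* Read the state back from `path`. Returns 0 on success, -1 if the
 * file is missing or too short; *st is left untouched on failure. */
int load_checkpoint(const char *path, job_state *st)
{
    job_state tmp;
    FILE *f;
    int ok;

    assert(path != NULL && st != NULL);

    f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    ok = (fread(&tmp, sizeof tmp, 1, f) == 1);
    fclose(f);
    if (!ok)
        return -1;

    *st = tmp;
    return 0;
}
```

On startup the program would try load_checkpoint first and fall back to
a fresh job_state if it returns -1, then call save_checkpoint every so
often so that the job can be stopped and relaunched later.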
>
> On Wed, Oct 7, 2015 at 10:08 PM, wahi <wahi at sci.am> wrote:
>
> Hi Jeff,
>
> Thank you for your reply. I think MPI_Comm_spawn is for
> generating new processes, but I need to add more nodes or cores after
> pausing the job. Am I right?
>
>
> On 10/08/2015 08:11 AM, Jeff Hammond wrote:
>> See MPI_Comm_spawn(_multiple).
>>
>> Jeff
>>
>> On Wednesday, October 7, 2015, Zhao, Xin <xinzhao3 at illinois.edu> wrote:
>>
>> Hi Wahi,
>>
>> Do you mean you want to change the number of hosts during one MPI
>> execution? If so, we think that is not possible.
>>
>> Xin
>> ________________________________________
>> From: wahi [wahi at sci.am]
>> Sent: Wednesday, October 07, 2015 2:43 AM
>> To: discuss at mpich.org
>> Subject: [mpich-discuss] Can the MPICH job pause and resumed
>>
>> Hi,
>>
>> I would like to know if it is possible to pause an MPICH job and
>> restart it with more nodes.
>>
>>
>>
>> Thanks in advance for any help or suggestion.
>>
>>
>>
>>
>>
>> Regards,
>> Wahi
>> _______________________________________________
>> discuss mailing list discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
>>
>>
>>
>> --
>> Jeff Hammond
>> jeff.science at gmail.com
>> http://jeffhammond.github.io/
>>
>>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
>
>
>
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA