[mpich-discuss] Fwd: MPICH fault tolerance and resiliency

sanjeev s snjv.workmail at gmail.com
Fri May 26 10:40:32 CDT 2017


Hi,

For dynamic processes, I read about two models: client-server and parent-child.

1) Client-server: We need dedicated threads for both the client and the
server roles. Assuming all instances are identical, we would end up creating
a lot of threads in addition to our application worker threads. Moreover,
when one instance (app) goes down, we want that instance to come back up
without much manual work, and we don't want to bundle this logic into our
application. Also, when I query the size (number of instances) of that
communicator, the count does not include the client instance. To distribute
tasks, I would need additional logic in my application to handle this case.

2) Parent-child: Suppose we have started 4 instances on 4 different
machines, and now we need to add another server. I don't think either
parent-child or client-server is a good option in this case.

We don't want to build process management capabilities into our application.
We are looking for process management in MPI itself (e.g., in Hydra) so that
we can leverage it.

Please correct me if I am missing something in my understanding of the
dynamic process model.

Regards
Sanjeev Sinha



On Fri, May 26, 2017 at 8:46 PM, Halim Amer <aamer at anl.gov> wrote:

> Sanjeev,
>
> > More precisely my requirement is suppose I started 4 instances of my
> > application. Now I want to add one more instance dynamically to this set
>
> From my understanding, dynamic processes would work fine for this case.
> Could you elaborate on why the dynamic process model is not sufficient for
> your needs?
>
> Halim
> www.mcs.anl.gov/~aamer
>
>
> On 5/26/17 9:11 AM, sanjeev s wrote:
>
>> Hi mpich,
>>
>> I have a requirement wherein we need to add, start, and stop application
>> instances on the fly before starting a job. Is there any MPICH service
>> available? I looked through the dynamic process model, but it does not
>> meet our needs.
>>
>> More precisely my requirement is suppose I started 4 instances of my
>> application. Now I want to add one more instance dynamically to this set
>>
>> Is there any tool supported by MPICH that provides fault-tolerance behavior?
>>
>> Thanks
>> Sanjeev
>>
>>
>>
>> _______________________________________________
>> discuss mailing list     discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss