[mpich-discuss] question about -disable-auto-cleanup

Zaak Beekman zbeekman at gmail.com
Wed Aug 30 12:29:11 CDT 2017


OK, since there were no responses here to my previous email, perhaps a
better question would be:

What is a good resource to learn about the impact of passing
`--disable-auto-cleanup` at runtime?

Some Google searches bring up discussions of what appear to be bugs in the
standard and/or implementation, but I'm not sure where to look to find out
about even the intended runtime semantics.

Any and all help pointing me in the right direction would be much
appreciated.

Thanks,
Zaak
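
For concreteness, here is a minimal sketch of the two defaults weighed in the quoted message below. It assumes a launcher wrapper written in Python; the names `build_launch_command` and `fault_tolerant` are hypothetical and are not taken from OpenCoarrays. The flag is spelled as it appears in this message. The sketch only shows how the opt-out and opt-in policies differ in the command line handed to mpiexec; it says nothing about the runtime or performance impact of the flag, which is the open question here.

```python
import shlex

# Flag spelling copied from the message; treat it as an assumption for your mpiexec.
CLEANUP_FLAG = "--disable-auto-cleanup"


def build_launch_command(num_images, program, program_args=(),
                         fault_tolerant=True, launcher="mpiexec"):
    """Return the argument list a hypothetical launcher wrapper would run.

    fault_tolerant=True models the "always pass it unless told otherwise"
    default; fault_tolerant=False models the opt-out (or an opt-in default).
    """
    cmd = [launcher, "-n", str(num_images)]
    if fault_tolerant:
        cmd.append(CLEANUP_FLAG)
    cmd.append(program)
    cmd.extend(program_args)
    return cmd


if __name__ == "__main__":
    # Opt-out policy: the flag is present by default.
    print(shlex.join(build_launch_command(4, "./a.out")))
    # Opt-in policy: the flag is absent unless the user asks for it.
    print(shlex.join(build_launch_command(4, "./a.out", fault_tolerant=False)))
```

Keeping the decision in one boolean means the default can be flipped later without touching the rest of the wrapper.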

On Wed, Aug 30, 2017 at 1:00 PM <discuss-request at mpich.org> wrote:

> Today's Topics:
>
>    1.  question about -disable-auto-cleanup (Zaak Beekman)
>    2.  Torque MPICH jobs stuck (Souparno Adhikary)
>    3. Re:  Torque MPICH jobs stuck (Halim Amer)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 29 Aug 2017 21:22:49 +0000
> From: Zaak Beekman <zbeekman at gmail.com>
> To: discuss at mpich.org
> Subject: [mpich-discuss] question about -disable-auto-cleanup
> Message-ID: <CAAbnBwZrQ03YmmmayhcHEywh8bEFMZ_AycBydOqZFB023KeJZQ at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> I know that --disable-auto-cleanup is required to enable the fault-tolerant
> MPI features, but are there downsides to passing this? Performance
> implications?
>
> I ask because over at https://github.com/sourceryinstitute/OpenCoarrays
> we've implemented much of the Fortran 2015 failed images feature on top of
> MPICH and other MPI implementations. But to use this,
> --disable-auto-cleanup must be passed to mpiexec. We provide wrapper
> scripts that try to abstract the back end (GASNet, MPI, OpenSHMEM, etc.) in
> the form of a Fortran compiler wrapper and an executable launcher. So I'm
> wondering: since failed images are part of the standard (2015), would it be
> dumb to always pass --disable-auto-cleanup to mpiexec and only turn off
> support when the user explicitly asks for that? Or is it safer/more
> performant to require the user to pass an additional flag to our wrapper
> script, which then passes --disable-auto-cleanup to mpiexec?
>
> Feedback would be much appreciated. Feel free to post responses at
> https://github.com/sourceryinstitute/OpenCoarrays/issues/401 as well.
>
> Thanks,
> Zaak
>
> ------------------------------
>
> Message: 2
> Date: Wed, 30 Aug 2017 13:48:00 +0530
> From: Souparno Adhikary <souparnoa91 at gmail.com>
> To: discuss at mpich.org
> Subject: [mpich-discuss] Torque MPICH jobs stuck
> Message-ID: <CAL6QJ1BF8FAYAvLiyqtKGMo+6e_3vdSf95wmH2n2F8efHMyfCw at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> I know this is not the proper place to discuss this, but as the Torque-mpich
> list seems dead, I can't think of any other place to post it.
>
> MPICH2 was installed on the servers. I installed Torque afterwards. I
> opened the ports by adding them to the iptables file.
>
> Torque MPI jobs (even simple jobs like hostname) remain stuck. However,
> the jobs are properly distributed across the nodes, and pbsnodes -a shows
> them in order.
>
> The sched_log and server_logs files do not show anything unusual.
> Therefore, it might be a problem with MPICH2.
>
> Can you please suggest where I can start troubleshooting?
>
> Thanks,
>
> Souparno Adhikary,
> CHPC Lab,
> Department of Microbiology,
> University of Calcutta.
>
> ------------------------------
>
> Message: 3
> Date: Wed, 30 Aug 2017 11:00:51 -0500
> From: Halim Amer <aamer at anl.gov>
> To: <discuss at mpich.org>
> Subject: Re: [mpich-discuss] Torque MPICH jobs stuck
> Message-ID: <3a2d0cc3-51a5-c646-4afc-40ece230bb04 at anl.gov>
> Content-Type: text/plain; charset="utf-8"; format=flowed
>
> Which MPICH version are you using? Have you tried the latest 3.2 version?
>
> If it still fails, can you attach your simple Torque job script here?
>
> Halim
> www.mcs.anl.gov/~aamer
>
>
> ------------------------------
>
> End of discuss Digest, Vol 58, Issue 18
> ***************************************
>
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

