<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div dir="ltr">Ken,<div><br></div><div>Thanks so much for your concise response!</div><div><br></div><div>Is my understanding correct that: If one MPI rank, or all the ranks on a given node are to fail, from, say, a hardware issue, there is no means of determining that this has happened unless the MPI library supports the user level failure mitigation (ULFM) features? So if one were to pass --disable-auto-cleanup when the runtime didn't have support for ULFM, and processes on remote MPI ranks died, there'd be no way to query them and call MPI_abort from a different, still running rank?</div><div><br></div><div>Thanks again,</div><div>Zaak<br><br><div class="gmail_quote"><div dir="ltr">On Thu, Aug 31, 2017 at 12:19 PM <<a href="mailto:discuss-request@mpich.org">discuss-request@mpich.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Send discuss mailing list submissions to<br>
<a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a><br>
<br>
To subscribe or unsubscribe via the World Wide Web, visit<br>
<a href="https://lists.mpich.org/mailman/listinfo/discuss" rel="noreferrer" target="_blank">https://lists.mpich.org/mailman/listinfo/discuss</a><br>
or, via email, send a message with subject or body 'help' to<br>
<a href="mailto:discuss-request@mpich.org" target="_blank">discuss-request@mpich.org</a><br>
<br>
You can reach the person managing the list at<br>
<a href="mailto:discuss-owner@mpich.org" target="_blank">discuss-owner@mpich.org</a><br>
<br>
When replying, please edit your Subject line so it is more specific<br>
than "Re: Contents of discuss digest..."<br>
<br>
<br>
Today's Topics:<br>
<br>
1. question about -disable-auto-cleanup (Zaak Beekman)<br>
2. Re: Torque MPICH jobs stuck (Souparno Adhikary)<br>
3. Re: question about -disable-auto-cleanup (Kenneth Raffenetti)<br>
<br>
<br>
----------------------------------------------------------------------<br>
<br>
Message: 1<br>
Date: Wed, 30 Aug 2017 17:29:11 +0000<br>
From: Zaak Beekman <<a href="mailto:zbeekman@gmail.com" target="_blank">zbeekman@gmail.com</a>><br>
To: <a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a><br>
Subject: [mpich-discuss] question about -disable-auto-cleanup<br>
Message-ID:<br>
<<a href="mailto:CAAbnBwb8L2VtXqsbbGjZvSOHpTkd7mK6SeQuhHKTEG0brOPSBQ@mail.gmail.com" target="_blank">CAAbnBwb8L2VtXqsbbGjZvSOHpTkd7mK6SeQuhHKTEG0brOPSBQ@mail.gmail.com</a>><br>
Content-Type: text/plain; charset="utf-8"<br>
<br>
OK, since there were no responses here to my previous email, perhaps a<br>
better question would be:<br>
<br>
What is a good resource to learn about the impact of passing<br>
`--disable-auto-cleanup` at runtime?<br>
<br>
Some Google searches bring up discussions of what appear to be bugs in the<br>
standard and/or implementation, but I'm not sure where to look to find out<br>
about even the intended runtime semantics.<br>
<br>
Any and all help pointing me in the right direction would be much<br>
appreciated.<br>
<br>
Thanks,<br>
Zaak<br>
<br>
On Wed, Aug 30, 2017 at 1:00 PM <<a href="mailto:discuss-request@mpich.org" target="_blank">discuss-request@mpich.org</a>> wrote:<br>
<br>
><br>
><br>
> Today's Topics:<br>
><br>
> 1. question about -disable-auto-cleanup (Zaak Beekman)<br>
> 2. Torque MPICH jobs stuck (Souparno Adhikary)<br>
> 3. Re: Torque MPICH jobs stuck (Halim Amer)<br>
><br>
><br>
> ----------------------------------------------------------------------<br>
><br>
> Message: 1<br>
> Date: Tue, 29 Aug 2017 21:22:49 +0000<br>
> From: Zaak Beekman <<a href="mailto:zbeekman@gmail.com" target="_blank">zbeekman@gmail.com</a>><br>
> To: <a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a><br>
> Subject: [mpich-discuss] question about -disable-auto-cleanup<br>
> Message-ID:<br>
> <<br>
> <a href="mailto:CAAbnBwZrQ03YmmmayhcHEywh8bEFMZ_AycBydOqZFB023KeJZQ@mail.gmail.com" target="_blank">CAAbnBwZrQ03YmmmayhcHEywh8bEFMZ_AycBydOqZFB023KeJZQ@mail.gmail.com</a>><br>
> Content-Type: text/plain; charset="utf-8"<br>
><br>
> I know that --disable-auto-cleanup is required to enable the fault-tolerant<br>
> MPI features, but are there downsides to passing this? Performance<br>
> implications?<br>
><br>
> I ask, because over at <a href="https://github.com/sourceryinstitute/OpenCoarrays" rel="noreferrer" target="_blank">https://github.com/sourceryinstitute/OpenCoarrays</a><br>
> we've<br>
> implemented much of the Fortran 2015 failed images feature on top of MPICH<br>
> and other MPI implementations. But to use this, --disable-auto-cleanup must<br>
> be passed to mpiexec. We provide wrapper scripts to try to abstract the<br>
> back end (GASNet, MPI, OpenSHMEM etc.) in the form of a Fortran compiler<br>
> wrapper, and an executable launcher. So I'm wondering: since failed images<br>
> are part of the standard (2015), would it be unwise for us to always pass<br>
> --disable-auto-cleanup to mpiexec and disable support only when the user<br>
> explicitly asks? Or is it safer/more performant to default to requiring<br>
> the user to pass an additional flag to our wrapper script that results in<br>
> --disable-auto-cleanup getting passed to mpiexec?<br>
><br>
> Feedback would be much appreciated. Feel free to post responses at<br>
> <a href="https://github.com/sourceryinstitute/OpenCoarrays/issues/401" rel="noreferrer" target="_blank">https://github.com/sourceryinstitute/OpenCoarrays/issues/401</a> as well..<br>
><br>
> Thanks,<br>
> Zaak<br>
> -------------- next part --------------<br>
> An HTML attachment was scrubbed...<br>
> URL: <<br>
> <a href="http://lists.mpich.org/pipermail/discuss/attachments/20170829/52d25b23/attachment-0001.html" rel="noreferrer" target="_blank">http://lists.mpich.org/pipermail/discuss/attachments/20170829/52d25b23/attachment-0001.html</a><br>
> ><br>
><br>
> ------------------------------<br>
><br>
> Message: 2<br>
> Date: Wed, 30 Aug 2017 13:48:00 +0530<br>
> From: Souparno Adhikary <<a href="mailto:souparnoa91@gmail.com" target="_blank">souparnoa91@gmail.com</a>><br>
> To: <a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a><br>
> Subject: [mpich-discuss] Torque MPICH jobs stuck<br>
> Message-ID:<br>
> <<br>
> <a href="mailto:CAL6QJ1BF8FAYAvLiyqtKGMo%2B6e_3vdSf95wmH2n2F8efHMyfCw@mail.gmail.com" target="_blank">CAL6QJ1BF8FAYAvLiyqtKGMo+6e_3vdSf95wmH2n2F8efHMyfCw@mail.gmail.com</a>><br>
> Content-Type: text/plain; charset="utf-8"<br>
><br>
> I know this is not a proper place to discuss this, but, as the Torque-mpich<br>
> list seems dead, I can't think of any other place to post this.<br>
><br>
> MPICH2 was installed in the servers. I installed Torque afterwards. I<br>
> opened the ports including them in the iptables file.<br>
><br>
> Torque MPI jobs (even simple jobs like hostname) remain stuck. However,<br>
> the jobs are properly distributed across the nodes, and pbsnodes -a shows<br>
> them in order.<br>
><br>
> The sched_log and server_logs files do not show anything unusual.<br>
> Therefore, it might be a problem with MPICH2.<br>
><br>
> Can you please suggest where I should start troubleshooting?<br>
><br>
> Thanks,<br>
><br>
> Souparno Adhikary,<br>
> CHPC Lab,<br>
> Department of Microbiology,<br>
> University of Calcutta.<br>
><br>
> ------------------------------<br>
><br>
> Message: 3<br>
> Date: Wed, 30 Aug 2017 11:00:51 -0500<br>
> From: Halim Amer <<a href="mailto:aamer@anl.gov" target="_blank">aamer@anl.gov</a>><br>
> To: <<a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a>><br>
> Subject: Re: [mpich-discuss] Torque MPICH jobs stuck<br>
> Message-ID: <<a href="mailto:3a2d0cc3-51a5-c646-4afc-40ece230bb04@anl.gov" target="_blank">3a2d0cc3-51a5-c646-4afc-40ece230bb04@anl.gov</a>><br>
> Content-Type: text/plain; charset="utf-8"; format=flowed<br>
><br>
> Which MPICH version are you using? Have you tried the latest 3.2 version?<br>
><br>
> If it still fails, can you attach your simple Torque job script here?<br>
><br>
> Halim<br>
> <a href="http://www.mcs.anl.gov/~aamer" rel="noreferrer" target="_blank">www.mcs.anl.gov/~aamer</a><br>
><br>
><br>
><br>
> ------------------------------<br>
><br>
> Subject: Digest Footer<br>
><br>
> _______________________________________________<br>
> discuss mailing list<br>
> <a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a><br>
> <a href="https://lists.mpich.org/mailman/listinfo/discuss" rel="noreferrer" target="_blank">https://lists.mpich.org/mailman/listinfo/discuss</a><br>
><br>
> ------------------------------<br>
><br>
> End of discuss Digest, Vol 58, Issue 18<br>
> ***************************************<br>
><br>
<br>
------------------------------<br>
<br>
Message: 2<br>
Date: Thu, 31 Aug 2017 13:00:34 +0530<br>
From: Souparno Adhikary <<a href="mailto:souparnoa91@gmail.com" target="_blank">souparnoa91@gmail.com</a>><br>
To: <a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a><br>
Subject: Re: [mpich-discuss] Torque MPICH jobs stuck<br>
Message-ID:<br>
<CAL6QJ1BaHHTQdfESDxFEcxNes-ZyjQD48KXe5==FTS_9=<a href="mailto:4Mw8w@mail.gmail.com" target="_blank">4Mw8w@mail.gmail.com</a>><br>
Content-Type: text/plain; charset="utf-8"<br>
<br>
We are using mpich2-1.4.1p1. I can give the latest version a try. My<br>
job script is as follows:<br>
<br>
#!/bin/sh<br>
#PBS -N asyn<br>
#PBS -q batch<br>
#PBS -l nodes=4:ppn=4<br>
#PBS -l walltime=120:00:00<br>
#PBS -V<br>
cd $PBS_O_WORKDIR<br>
mpirun -np 16 gmx_mpi mdrun -deffnm asyn_10ns<br>
<br>
<br>
Souparno Adhikary,<br>
CHPC Lab,<br>
Department of Microbiology,<br>
University of Calcutta.<br>
<br>
------------------------------<br>
<br>
Message: 3<br>
Date: Thu, 31 Aug 2017 11:19:45 -0500<br>
From: Kenneth Raffenetti <<a href="mailto:raffenet@mcs.anl.gov" target="_blank">raffenet@mcs.anl.gov</a>><br>
To: <<a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a>><br>
Subject: Re: [mpich-discuss] question about -disable-auto-cleanup<br>
Message-ID: <<a href="mailto:8a531da2-8de2-70ab-9ff2-bd4e36660154@mcs.anl.gov" target="_blank">8a531da2-8de2-70ab-9ff2-bd4e36660154@mcs.anl.gov</a>><br>
Content-Type: text/plain; charset="utf-8"; format=flowed<br>
<br>
Hi Zaak,<br>
<br>
I'll try my best to explain here. There are a few things to consider.<br>
<br>
1. Hydra: -disable-auto-cleanup means if an MPI process dies, let other<br>
processes in the job continue running. Since Hydra (mpiexec) is already<br>
monitoring MPI processes to detect when one dies, there is no impact<br>
inside Hydra from passing this option.<br>
<br>
2. Application behavior: Since the default error handler in MPI is<br>
MPI_ERRORS_ARE_FATAL, some applications may rely on that fact and expect<br>
a running job to be aborted/cleaned up if a process quits. With<br>
-disable-auto-cleanup this will no longer be the case. An application<br>
can call MPI_Abort() to force the old behavior, however.<br>
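To make point 2 concrete, here is a minimal C sketch of that pattern. It is illustrative only and not from this thread: the policy of aborting the whole job on any communication error is an application choice, not something MPICH prescribes.<br>

```c
/* Sketch: opt out of the default MPI_ERRORS_ARE_FATAL handler and decide
 * for ourselves when to abort. Build with mpicc; launch with
 *   mpiexec -disable-auto-cleanup -n 4 ./a.out
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* With -disable-auto-cleanup, a peer's death no longer tears down the
     * job. Make failed communication return an error code instead of
     * aborting this process. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int buf = rank;
    int err = MPI_Allreduce(MPI_IN_PLACE, &buf, 1, MPI_INT, MPI_SUM,
                            MPI_COMM_WORLD);
    if (err != MPI_SUCCESS) {
        /* A peer may have died; restore the old "whole job dies" behavior
         * by aborting explicitly, as Ken describes. */
        fprintf(stderr, "rank %d: communication failed, aborting\n", rank);
        MPI_Abort(MPI_COMM_WORLD, err);
    }

    MPI_Finalize();
    return 0;
}
```

Without the error-handler change, the MPI_Allreduce failure would terminate the process before the if-branch ever runs; with it, the application keeps control and chooses when (or whether) to call MPI_Abort.<br>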
<br>
Ken<br>
<br>
<br>
<br>
------------------------------<br>
<br>
Subject: Digest Footer<br>
<br>
_______________________________________________<br>
discuss mailing list<br>
<a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a><br>
<a href="https://lists.mpich.org/mailman/listinfo/discuss" rel="noreferrer" target="_blank">https://lists.mpich.org/mailman/listinfo/discuss</a><br>
<br>
------------------------------<br>
<br>
End of discuss Digest, Vol 58, Issue 19<br>
***************************************<br>
</blockquote></div></div></div>