<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Michael,<br>
<br>
Good question. Since this is the MPICH forum, I ran some test
cases on a Cray and on a 64-bit PC.<br>
<br>
On the Cray I used the cray-mpich/6.3.1 module. The test code
(exit_test.C) simply initializes MPI and returns the process's
rank from main:<br>
<br>
<pre>#include <iostream>
#include <cstdlib>
#include <mpi.h>

int main(void) {
    int rank;
    MPI_Init(0, 0);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        int size;
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        std::cout << "Rank:" << rank << ", size:" << size << std::endl;
    }
    MPI_Finalize();
    return rank;  // each process exits with its own rank
}
</pre>
<br>
I compiled with <br>
<br>
CC exit_test.C -o exit_test<br>
<br>
and ran it in an interactive PBS session (PBS_INTERACTIVE) with<br>
<br>
aprun -n 7 ./exit_test<br>
<br>
I was surprised by the result. I ran it about a dozen times;
here are a few of the runs:<br>
<br>
xxx@batch2:~/src/MPI/tmp> aprun -n 7 ./exit_test<br>
Rank:0, size:7<br>
Application 13573540 exit codes: 6<br>
Application 13573540 resources: utime ~0s, stime ~1s, Rss ~39164,
inblocks ~10234, outblocks ~23470<br>
xxx@batch2:~/src/MPI/tmp> echo $?<br>
6<br>
xxx@batch2:~/src/MPI/tmp> aprun -n 7 ./exit_test<br>
Rank:0, size:7<br>
Application 13573552 exit codes: 4<br>
Application 13573552 resources: utime ~0s, stime ~1s, Rss ~39164,
inblocks ~10234, outblocks ~23470<br>
xxx@batch2:~/src/MPI/tmp> echo $?<br>
4<br>
xxx@batch2:~/src/MPI/tmp> aprun -n 7 ./exit_test<br>
Rank:0, size:7<br>
Application 13573564 exit codes: 1<br>
Application 13573564 resources: utime ~0s, stime ~1s, Rss ~39172,
inblocks ~10234, outblocks ~23470<br>
xxx@batch2:~/src/MPI/tmp> echo $?<br>
1<br>
xxx@batch2:~/src/MPI/tmp> aprun -n 7 ./exit_test<br>
Rank:0, size:7<br>
Application 13573589 exit codes: 5<br>
Application 13573589 resources: utime ~0s, stime ~1s, Rss ~39188,
inblocks ~10234, outblocks ~23470<br>
xxx@batch2:~/src/MPI/tmp> echo $?<br>
5<br>
<br>
I say surprised because the mpiexec man page at
<a class="moz-txt-link-freetext" href="http://www.mpich.org/static/docs/v3.1/www1/mpiexec.html">http://www.mpich.org/static/docs/v3.1/www1/mpiexec.html</a><br>
states the following:<br>
<br>
<h2>Return Status</h2>
<tt>mpiexec</tt> returns the maximum of the exit status values of
all of the
processes created by <tt>mpiexec</tt>.
<br>
<br>
<br>
But the fine Cray folks have probably modified the MPI launcher
(on the XC30 the job is started with aprun, not MPICH's mpiexec),
so the results there need not agree with the MPICH man page.<br>
<br>
I also ran this test on a PC running Ubuntu 12.04 with MPICH
3.1.2 built from source.<br>
<br>
Compile line: mpicc exit_test.C -o exit_test<br>
<br>
Here are the results. Note that I am varying the process count
between runs.<br>
<br>
xxx@Cue ~/tmp/MPI $ mpiexec -np 9 ./exit_test<br>
Rank:0, size:9<br>
<br>
===================================================================================<br>
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES<br>
= PID 19062 RUNNING AT Cue<br>
= EXIT CODE: 1<br>
= CLEANING UP REMAINING PROCESSES<br>
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES<br>
===================================================================================<br>
xxx@Cue ~/tmp/MPI $ echo $?<br>
15<br>
xxx@Cue ~/tmp/MPI $ mpiexec -np 8 ./exit_test<br>
Rank:0, size:8<br>
<br>
===================================================================================<br>
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES<br>
= PID 19130 RUNNING AT Cue<br>
= EXIT CODE: 1<br>
= CLEANING UP REMAINING PROCESSES<br>
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES<br>
===================================================================================<br>
xxx@Cue ~/tmp/MPI $ echo $?<br>
7<br>
xxx@Cue ~/tmp/MPI $ mpiexec -np 7 ./exit_test<br>
Rank:0, size:7<br>
<br>
===================================================================================<br>
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES<br>
= PID 19142 RUNNING AT Cue<br>
= EXIT CODE: 1<br>
= CLEANING UP REMAINING PROCESSES<br>
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES<br>
===================================================================================<br>
xxx@Cue ~/tmp/MPI $ echo $?<br>
7<br>
xxx@Cue ~/tmp/MPI $ mpiexec -np 3 ./exit_test<br>
Rank:0, size:3<br>
<br>
===================================================================================<br>
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES<br>
= PID 19158 RUNNING AT Cue<br>
= EXIT CODE: 1<br>
= CLEANING UP REMAINING PROCESSES<br>
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES<br>
===================================================================================<br>
xxx@Cue ~/tmp/MPI $ echo $?<br>
3<br>
<br>
So unlike the test on the Cray, the results on the PC are
consistent, but they do not agree with the man page: with -np 9
the largest exit status is 8, yet mpiexec returned 15.<br>
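The four PC values happen to match the bitwise OR of all the ranks' exit statuses rather than their maximum. The sketch below is only an arithmetic check of that reading against the runs above, not a description of MPICH's actual code; the helper names rank_statuses, combine_max, and combine_or are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Exit statuses produced by exit_test: rank r returns r, for r = 0..np-1.
std::vector<int> rank_statuses(int np) {
    assert(np >= 1);
    std::vector<int> statuses(np);
    std::iota(statuses.begin(), statuses.end(), 0);
    return statuses;
}

// The rule the MPICH man page describes: the maximum status.
int combine_max(const std::vector<int>& statuses) {
    assert(!statuses.empty());
    return *std::max_element(statuses.begin(), statuses.end());
}

// Alternative reading of the PC results: bitwise OR of all statuses.
int combine_or(const std::vector<int>& statuses) {
    int result = 0;
    for (int s : statuses) {
        result |= s;
    }
    return result;
}
```

For -np 9, 8, 7, and 3, combine_or gives 15, 7, 7, and 3 (the values echo $? printed above), while combine_max gives 8, 7, 6, and 2.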
<br>
Michael, to address your question: it is certainly implementation
dependent. There are any number of ways to report the exit
status. The MPICH docs describe taking the maximum value; another
approach is to use the exit status of the root process.<br>
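To make the two approaches concrete, here is a minimal POSIX sketch of a hypothetical launcher; it is not how mpiexec, aprun, or SGI's mpirun are implemented. It forks n children that exit with their rank, as exit_test does, collects each status with waitpid, and then applies either the maximum rule or the root-process rule. The function names are made up for illustration. One thing it makes visible: only the low 8 bits of a process's exit status reach its parent, so returning the rank from main, as exit_test does, wraps around for ranks 256 and above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical mini-launcher: fork n children that each exit with their
// "rank" (as exit_test does), then collect every child's exit status.
std::vector<int> run_and_collect(int n) {
    assert(n >= 1);
    std::vector<pid_t> pids;
    for (int rank = 0; rank < n; ++rank) {
        pid_t pid = fork();
        if (pid < 0) {
            std::perror("fork");
            std::exit(EXIT_FAILURE);
        }
        if (pid == 0) {
            // Child: only the low 8 bits of this value reach the parent.
            _exit(rank & 0xFF);
        }
        pids.push_back(pid);
    }

    std::vector<int> statuses(n, 0);
    for (int rank = 0; rank < n; ++rank) {
        int status = 0;
        if (waitpid(pids[rank], &status, 0) < 0) {
            std::perror("waitpid");
            std::exit(EXIT_FAILURE);
        }
        if (WIFEXITED(status)) {
            statuses[rank] = WEXITSTATUS(status);
        } else if (WIFSIGNALED(status)) {
            // Common shell convention (e.g. bash) for $?: 128 + signal number.
            statuses[rank] = 128 + WTERMSIG(status);
        }
    }
    return statuses;
}

// Policy 1: the maximum status over all ranks (the MPICH man page rule).
int policy_max(const std::vector<int>& statuses) {
    assert(!statuses.empty());
    return *std::max_element(statuses.begin(), statuses.end());
}

// Policy 2: report only the root (rank 0) process's status.
int policy_root(const std::vector<int>& statuses) {
    return statuses.at(0);
}
```

For run_and_collect(7), policy_max returns 6, the value the man page rule would predict for the aprun -n 7 runs, while policy_root returns 0 even though ranks 1 through 6 exited with non-zero statuses.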
<br>
As a side note, MPICH reports a non-zero return value as a bad
termination. If I run the PC test with -np 1, I get<br>
<br>
xxx@Cue ~/tmp/MPI $ mpiexec -np 1 ./exit_test<br>
Rank:0, size:1<br>
xxx@Cue ~/tmp/MPI $ echo $?<br>
0<br>
<br>
which is what I would expect, since the only process is rank 0
and it returns 0. --Mike<br>
<br>
On 04/09/2015 02:27 PM, Michael Raymond wrote:<br>
</div>
<blockquote cite="mid:5526D2BC.8060101@sgi.com" type="cite"> Hi.
I'm the lead developer of SGI MPI.
<br>
<br>
Do other MPIs do something different here? As you might have
thousands of ranks, I'm wondering how you'd decide which exit code
to return?
<br>
<br>
On 04/09/2015 11:36 AM, Michael L. Stokes wrote:
<br>
<blockquote type="cite">This question is not MPICH specific, but
I'm sure the expertise is here to answer this question.
<br>
<br>
While running tests on spirit.afrl.hpc.mil (SGI ICE X) using the
MPT/2.11 stack, I noticed that mpirun returns 0 to the shell
regardless of the exit value (<stdlib.h> exit(int)) or the return
value (return(int)) from the main.
<br>
<br>
Would this behavior be regarded as an error? What are the
issues?
<br>
<br>
--Mike
<br>
_______________________________________________
<br>
discuss mailing list <a class="moz-txt-link-abbreviated" href="mailto:discuss@mpich.org">discuss@mpich.org</a>
<br>
</blockquote>
<br>
</blockquote>
<br>
</body>
</html>