[mpich-discuss] Test of MPICH 3.1.3 on BlueGene/Q
Jeff Hammond
jeff.science at gmail.com
Tue Jan 19 17:38:37 CST 2016
Because Blue Gene doesn't have fork() or any other OS mechanism for
spawning processes after job start, it has never had a nontrivial
implementation of MPI_Comm_spawn and thus has never passed the full MPICH
test suite. By nontrivial, I mean one that does something other than fail
in a compliant way because world_size = universe_size (which may never
have been implemented, but which I proposed as a trivial way to achieve
MPI-2.2 compliance).
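For reference, an application can detect this situation itself by querying
the MPI_UNIVERSE_SIZE attribute on MPI_COMM_WORLD and comparing it to the
world size. A minimal sketch, using only standard MPI and nothing Blue Gene
specific:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int world_size, flag, *universe_size;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* MPI_UNIVERSE_SIZE is a predefined attribute on MPI_COMM_WORLD; the
     * value retrieved is a pointer to an int, and the attribute may not
     * be set at all. */
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &universe_size, &flag);

    if (!flag)
        printf("MPI_UNIVERSE_SIZE not provided; dynamic process support is unclear\n");
    else if (*universe_size <= world_size)
        printf("universe_size (%d) <= world_size (%d): no room to spawn\n",
               *universe_size, world_size);
    else
        printf("up to %d additional processes could be spawned\n",
               *universe_size - world_size);

    MPI_Finalize();
    return 0;
}

On Blue Gene one would presumably find the attribute either unset or equal
to the world size, which is exactly why MPI_Comm_spawn can only fail in a
compliant way there.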
For Blue Gene/Q acceptance testing, every MPICH test (from some version of
the test suite circa 2012, which I do not recall offhand) passed except
those explicitly excluded. The exclusions were anything related to dynamic
processes, connect-accept, etc., and the language bindings (certainly
Fortran; I don't know what we said about C++, but I don't think that is
relevant here). Fortran was excluded because it has nothing to do with the
guts of MPI, the network, etc.; it is strictly a test of the Fortran
compiler and the Fortran bindings. So if some MPICH Fortran test is
failing, it is either a compiler issue or a problem with the MPICH Fortran
bindings.
I hope this helps.
Jeff
On Tue, Jan 19, 2016 at 8:59 AM, Rob Latham <robl at mcs.anl.gov> wrote:
>
>
> On 01/17/2016 08:16 PM, Dominic Chien wrote:
>
>> Hi,
>>
>> I have built MPICH 3.1.3 on Blue Gene/Q with the following
>> configuration:
>> ../configure --host=powerpc64-bgq-linux --with-device=pamid:BGQ
>> --with-file-system=gpfs:BGQ
>> --with-bgq-install-dir=/bgsys/drivers/V1R2M0/ppc64 --disable-wrapper-rpath
>> --enable-fast=nochkmsg,notiming,O3 --with-assert-level=0
>> --disable-error-messages --disable-debuginfo --enable-thread-cs=per-object
>> --with-atomic-primitives --enable-handle-allocation=tls
>> --enable-refcount=lock-free --disable-predefined-refcount
>> --with-cross-file=src/mpid/pamid/cross/bgq8
>> --prefix=/scratch/home/chiensh/apps/mpich/3.1.3-opt/ --disable-spawn
>>
>> The resulting MPICH passed most of the tests (679), but 9 failed (see
>> below). I am not sure whether these errors are critical. Can anyone
>> comment on this?
>>
>>
> I don't know if MPICH ever passed 100% of the mpich tests on Blue Gene
> (maybe back in 1.5.1 days, but we had fewer tests then, too).
>
> These 9 errors all look like something an application might run into:
> probing messages, truncated messages, RMA via the Fortran interface,
> examining the status object.
>
> If your application does any of those things I'd pay particular attention
> to the results. It's entirely possible that your application won't touch
> the parts of MPICH that are not fully up to spec on Blue Gene.
>
> So, I would say these errors are concerning, but not critical. Press on
> and let us know how things go with your application!
>
> ==rob
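To make concrete what "examining the status object" means here: the
errinstat* tests below check that a failed MPI_Waitall/MPI_Testall reports
MPI_ERR_IN_STATUS and that each per-request status carries the error class
for its own request. A minimal sketch of that checking pattern (my own
illustration, not code from the test suite):

#include <mpi.h>
#include <stdio.h>

/* Sketch of the error-checking pattern the errinstat* tests exercise:
 * when a completion call fails with MPI_ERR_IN_STATUS, each status
 * carries the error class for its own request (or MPI_ERR_PENDING if
 * the request has not completed).  Nothing is forced to fail here, so
 * the error branch is normally not taken; it only shows the pattern. */
int main(int argc, char **argv)
{
    int rank, i, err, errclass, val, buf[2];
    MPI_Request reqs[2];
    MPI_Status stats[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Errors must be returned (not fatal) for the statuses to matter. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    /* Two self-sends so the example is self-contained on any rank count. */
    MPI_Irecv(&buf[0], 1, MPI_INT, rank, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&buf[1], 1, MPI_INT, rank, 1, MPI_COMM_WORLD, &reqs[1]);
    val = rank;
    MPI_Send(&val, 1, MPI_INT, rank, 0, MPI_COMM_WORLD);
    MPI_Send(&val, 1, MPI_INT, rank, 1, MPI_COMM_WORLD);

    err = MPI_Waitall(2, reqs, stats);
    if (err != MPI_SUCCESS) {
        MPI_Error_class(err, &errclass);
        if (errclass == MPI_ERR_IN_STATUS) {
            for (i = 0; i < 2; i++)
                if (stats[i].MPI_ERROR != MPI_SUCCESS &&
                    stats[i].MPI_ERROR != MPI_ERR_PENDING)
                    printf("rank %d: request %d failed with error class %d\n",
                           rank, i, stats[i].MPI_ERROR);
        }
    }

    MPI_Finalize();
    return 0;
}

The truncmsg1 failure further down is in the same family: it expects a
too-small receive to report a truncation error (MPI_ERR_TRUNCATE) rather
than MPI_SUCCESS, which is again behavior an application running with
MPI_ERRORS_RETURN could observe directly.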
>
>
>
>> Many Thanks!
>>
>> Regards,
>> Dominic Chien
>>
>> =========================================================
>> not ok 283 - ./init/timeout 2
>> ---
>> Directory: ./init
>> File: timeout
>> Num-procs: 2
>> Date: "Wed Jan 13 13:55:50 2016"
>> ...
>> ## Test output (expected 'No Errors'):
>> ## srun returned a zero status but the program returned a nonzero status
>> =========================================================
>> =========================================================
>> not ok 324 - ./pt2pt/mprobe 2
>> ---
>> Directory: ./pt2pt
>> File: mprobe
>> Num-procs: 2
>> Date: "Wed Jan 13 14:07:47 2016"
>> ...
>> ## Test output (expected 'No Errors'):
>> ## 2016-01-13 14:07:47.846 (WARN ) [0xfff8d988bb0] 78050:ibm.runjob.client.Job: terminated by signal 11
>> ## 2016-01-13 14:07:47.846 (WARN ) [0xfff8d988bb0] 78050:ibm.runjob.client.Job: abnormal termination by signal 11 from rank 1
>> =========================================================
>> =========================================================
>> not ok 538 - ./f77/rma/wingetf 5
>> ---
>> Directory: ./f77/rma
>> File: wingetf
>> Num-procs: 5
>> Date: "Wed Jan 13 16:28:10 2016"
>> ...
>> ## Test output (expected 'No Errors'):
>> ## 0 buf( 1 , 11 ) = 751 expected 251
>> ## 0 buf( 2 , 11 ) = 752 expected 252
>> ## 0 buf( 3 , 11 ) = 753 expected 253
>> ## 0 buf( 4 , 11 ) = 754 expected 254
>> ## 0 buf( 5 , 11 ) = 755 expected 255
>> ## 0 buf( 6 , 11 ) = 756 expected 256
>> ## 0 buf( 7 , 11 ) = 757 expected 257
>> ## 0 buf( 8 , 11 ) = 758 expected 258
>> ## 0 buf( 9 , 11 ) = 759 expected 259
>> ## 0 buf( 10 , 11 ) = 760 expected 260
>> ## Found 25 errors
>> =========================================================
>> =========================================================
>> not ok 640 - ./f90/rma/wingetf90 5
>> ---
>> Directory: ./f90/rma
>> File: wingetf90
>> Num-procs: 5
>> Date: "Wed Jan 13 18:10:17 2016"
>> ...
>> ## Test output (expected 'No Errors'):
>> ## 4 buf( 1 ,0) = 0 expected 976
>> ## 4 buf( 2 ,0) = 24525328 expected 977
>> ## 4 buf( 3 ,0) = 31 expected 978
>> ## 4 buf( 4 ,0) = -1073759872 expected 979
>> ## 4 buf( 5 ,0) = 1107296292 expected 980
>> ## 4 buf( 6 ,0) = -1073758504 expected 981
>> ## 4 buf( 7 ,0) = 0 expected 982
>> ## 4 buf( 8 ,0) = 22184620 expected 983
>> ## 4 buf( 9 ,0) = 0 expected 984
>> ## 4 buf( 10 ,0) = 25808064 expected 985
>> ## Found 50 errors
>> =========================================================
>> =========================================================
>> not ok 668 - ./errors/pt2pt/truncmsg1 2
>> ---
>> Directory: ./errors/pt2pt
>> File: truncmsg1
>> Num-procs: 2
>> Date: "Wed Jan 13 18:16:19 2016"
>> ...
>> ## Test output (expected 'No Errors'):
>> ## MPI_Recv (short) returned MPI_SUCCESS instead of truncated message
>> ## MPI_Recv (irecv-short) returned MPI_SUCCESS instead of truncated message
>> ## MPI_Recv (medium) returned MPI_SUCCESS instead of truncated message
>> ## Found 3 errors
>> =========================================================
>> =========================================================
>> not ok 670 - ./errors/pt2pt/errinstatts 2
>> ---
>> Directory: ./errors/pt2pt
>> File: errinstatts
>> Num-procs: 2
>> Date: "Wed Jan 13 18:16:45 2016"
>> ...
>> ## Test output (expected 'No Errors'):
>> ## Did not get ERR_IN_STATUS in Testsome (outcount = 2, should equal 2); class returned was 0
>> ## Found 1 errors
>> =========================================================
>> =========================================================
>> not ok 671 - ./errors/pt2pt/errinstatta 2
>> ---
>> Directory: ./errors/pt2pt
>> File: errinstatta
>> Num-procs: 2
>> Date: "Wed Jan 13 18:16:58 2016"
>> ...
>> ## Test output (expected 'No Errors'):
>> ## Did not get ERR_IN_STATUS in Testall
>> ## Found 1 errors
>> =========================================================
>> =========================================================
>> not ok 672 - ./errors/pt2pt/errinstatws 2
>> ---
>> Directory: ./errors/pt2pt
>> File: errinstatws
>> Num-procs: 2
>> Date: "Wed Jan 13 18:17:11 2016"
>> ...
>> ## Test output (expected 'No Errors'):
>> ## Did not get ERR_IN_STATUS in Waitsome. Got 0.
>> ## Found 1 errors
>> =========================================================
>> =========================================================
>> not ok 673 - ./errors/pt2pt/errinstatwa 2
>> ---
>> Directory: ./errors/pt2pt
>> File: errinstatwa
>> Num-procs: 2
>> Date: "Wed Jan 13 18:17:23 2016"
>> ...
>> ## Test output (expected 'No Errors'):
>> ## Did not get ERR_IN_STATUS in Waitall
>> ## Found 1 errors
>> =========================================================
>>
>>
--
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/