[mpich-discuss] Test of MPICH 3.1.3 on BlueGene/Q
Rob Latham
robl at mcs.anl.gov
Tue Jan 19 10:59:59 CST 2016
On 01/17/2016 08:16 PM, Dominic Chien wrote:
> Hi,
>
> I have built MPICH 3.1.3 on Blue Gene/Q using the following configuration:
> ../configure --host=powerpc64-bgq-linux --with-device=pamid:BGQ --with-file-system=gpfs:BGQ --with-bgq-install-dir=/bgsys/drivers/V1R2M0/ppc64 --disable-wrapper-rpath --enable-fast=nochkmsg,notiming,O3 --with-assert-level=0 --disable-error-messages --disable-debuginfo --enable-thread-cs=per-object --with-atomic-primitives --enable-handle-allocation=tls --enable-refcount=lock-free --disable-predefined-refcount --with-cross-file=src/mpid/pamid/cross/bgq8 --prefix=/scratch/home/chiensh/apps/mpich/3.1.3-opt/ --disable-spawn
>
> The resulting MPICH passed most of the tests (679) and failed 9 (see below), but I am not sure whether these errors are critical. Can anyone comment on this?
>
I don't know if MPICH ever passed 100% of the MPICH tests on Blue Gene
(maybe back in 1.5.1 days, but we had fewer tests then, too).
These 9 errors all look like something an application might run into:
probing messages, truncated messages, RMA via the Fortran interface,
examining the status object.
If your application does any of those things I'd pay particular
attention to the results. It's entirely possible that your application
won't touch the parts of MPICH that are not fully up to spec on Blue Gene.
So, I would say these errors are concerning, but not critical. Press on
and let us know how things go with your application!
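
If it helps to see at a glance which areas the failures fall into, the
"not ok" lines in the test summary can be grouped by test directory. What
follows is only a rough sketch, not part of the MPICH test suite: the
function names failed_tests and group_by_directory are made up for
illustration, and the line format is assumed to be exactly what appears
in the quoted output below ("not ok N - ./dir/test nprocs", optionally
with a leading "> " quote marker).

```python
import re
from collections import defaultdict

# Matches failure lines such as "not ok 324 - ./pt2pt/mprobe 2",
# optionally prefixed by an email quote marker ("> ").
# The format is assumed from the quoted test output, not from a spec.
FAIL_RE = re.compile(r"^(?:>\s*)?not ok\s+(\d+)\s+-\s+\./(\S+)\s+(\d+)\s*$")


def failed_tests(lines):
    """Return (test number, test path, process count) for each failing test."""
    failures = []
    for line in lines:
        m = FAIL_RE.match(line.strip())
        if m:
            failures.append((int(m.group(1)), m.group(2), int(m.group(3))))
    return failures


def group_by_directory(failures):
    """Group failing test names by their directory, e.g. 'errors/pt2pt'."""
    groups = defaultdict(list)
    for _num, path, _nprocs in failures:
        directory, _, name = path.rpartition("/")
        groups[directory].append(name)
    return dict(groups)


if __name__ == "__main__":
    # A few lines copied from the quoted summary, used as sample input.
    sample = [
        "> not ok 324 - ./pt2pt/mprobe 2",
        "> not ok 538 - ./f77/rma/wingetf 5",
        "> not ok 668 - ./errors/pt2pt/truncmsg1 2",
        "> not ok 670 - ./errors/pt2pt/errinstatts 2",
    ]
    groups = group_by_directory(failed_tests(sample))
    for directory, names in sorted(groups.items()):
        print(f"{directory}: {', '.join(names)}")
```

Feeding it the whole summary file instead of the sample list should make
it easy to compare the failing directories against the MPI features your
application actually uses.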
==rob
> Many Thanks!
>
> Regards,
> Dominic Chien
>
> =========================================================
> not ok 283 - ./init/timeout 2
> ---
> Directory: ./init
> File: timeout
> Num-procs: 2
> Date: "Wed Jan 13 13:55:50 2016"
> ...
> ## Test output (expected 'No Errors'):
> ## srun returned a zero status but the program returned a nonzero status
> =========================================================
> =========================================================
> not ok 324 - ./pt2pt/mprobe 2
> ---
> Directory: ./pt2pt
> File: mprobe
> Num-procs: 2
> Date: "Wed Jan 13 14:07:47 2016"
> ...
> ## Test output (expected 'No Errors'):
> ## 2016-01-13 14:07:47.846 (WARN ) [0xfff8d988bb0] 78050:ibm.runjob.client.Job: terminated by signal 11
> ## 2016-01-13 14:07:47.846 (WARN ) [0xfff8d988bb0] 78050:ibm.runjob.client.Job: abnormal termination by signal 11 from rank 1
> =========================================================
> =========================================================
> not ok 538 - ./f77/rma/wingetf 5
> ---
> Directory: ./f77/rma
> File: wingetf
> Num-procs: 5
> Date: "Wed Jan 13 16:28:10 2016"
> ...
> ## Test output (expected 'No Errors'):
> ## 0 buf( 1 , 11 ) = 751 expected 251
> ## 0 buf( 2 , 11 ) = 752 expected 252
> ## 0 buf( 3 , 11 ) = 753 expected 253
> ## 0 buf( 4 , 11 ) = 754 expected 254
> ## 0 buf( 5 , 11 ) = 755 expected 255
> ## 0 buf( 6 , 11 ) = 756 expected 256
> ## 0 buf( 7 , 11 ) = 757 expected 257
> ## 0 buf( 8 , 11 ) = 758 expected 258
> ## 0 buf( 9 , 11 ) = 759 expected 259
> ## 0 buf( 10 , 11 ) = 760 expected 260
> ## Found 25 errors
> =========================================================
> =========================================================
> not ok 640 - ./f90/rma/wingetf90 5
> ---
> Directory: ./f90/rma
> File: wingetf90
> Num-procs: 5
> Date: "Wed Jan 13 18:10:17 2016"
> ...
> ## Test output (expected 'No Errors'):
> ## 4 buf( 1 ,0) = 0 expected 976
> ## 4 buf( 2 ,0) = 24525328 expected 977
> ## 4 buf( 3 ,0) = 31 expected 978
> ## 4 buf( 4 ,0) = -1073759872 expected 979
> ## 4 buf( 5 ,0) = 1107296292 expected 980
> ## 4 buf( 6 ,0) = -1073758504 expected 981
> ## 4 buf( 7 ,0) = 0 expected 982
> ## 4 buf( 8 ,0) = 22184620 expected 983
> ## 4 buf( 9 ,0) = 0 expected 984
> ## 4 buf( 10 ,0) = 25808064 expected 985
> ## Found 50 errors
> =========================================================
> =========================================================
> not ok 668 - ./errors/pt2pt/truncmsg1 2
> ---
> Directory: ./errors/pt2pt
> File: truncmsg1
> Num-procs: 2
> Date: "Wed Jan 13 18:16:19 2016"
> ...
> ## Test output (expected 'No Errors'):
> ## MPI_Recv (short) returned MPI_SUCCESS instead of truncated message
> ## MPI_Recv (irecv-short) returned MPI_SUCCESS instead of truncated message
> ## MPI_Recv (medium) returned MPI_SUCCESS instead of truncated message
> ## Found 3 errors
> =========================================================
> =========================================================
> not ok 670 - ./errors/pt2pt/errinstatts 2
> ---
> Directory: ./errors/pt2pt
> File: errinstatts
> Num-procs: 2
> Date: "Wed Jan 13 18:16:45 2016"
> ...
> ## Test output (expected 'No Errors'):
> ## Did not get ERR_IN_STATUS in Testsome (outcount = 2, should equal 2); class returned was 0
> ## Found 1 errors
> =========================================================
> =========================================================
> not ok 671 - ./errors/pt2pt/errinstatta 2
> ---
> Directory: ./errors/pt2pt
> File: errinstatta
> Num-procs: 2
> Date: "Wed Jan 13 18:16:58 2016"
> ...
> ## Test output (expected 'No Errors'):
> ## Did not get ERR_IN_STATUS in Testall
> ## Found 1 errors
> =========================================================
> =========================================================
> not ok 672 - ./errors/pt2pt/errinstatws 2
> ---
> Directory: ./errors/pt2pt
> File: errinstatws
> Num-procs: 2
> Date: "Wed Jan 13 18:17:11 2016"
> ...
> ## Test output (expected 'No Errors'):
> ## Did not get ERR_IN_STATUS in Waitsome. Got 0.
> ## Found 1 errors
> =========================================================
> =========================================================
> not ok 673 - ./errors/pt2pt/errinstatwa 2
> ---
> Directory: ./errors/pt2pt
> File: errinstatwa
> Num-procs: 2
> Date: "Wed Jan 13 18:17:23 2016"
> ...
> ## Test output (expected 'No Errors'):
> ## Did not get ERR_IN_STATUS in Waitall
> ## Found 1 errors
> =========================================================
>
>
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>