[mpich-discuss] Test of MPICH 3.1.3 on BlueGene/Q

Rob Latham robl at mcs.anl.gov
Tue Jan 19 10:59:59 CST 2016



On 01/17/2016 08:16 PM, Dominic Chien wrote:
> Hi,
>
> I have built MPICH 3.1.3 on Blue Gene/Q with the following configuration:
> ../configure --host=powerpc64-bgq-linux --with-device=pamid:BGQ --with-file-system=gpfs:BGQ --with-bgq-install-dir=/bgsys/drivers/V1R2M0/ppc64 --disable-wrapper-rpath --enable-fast=nochkmsg,notiming,O3 --with-assert-level=0 --disable-error-messages --disable-debuginfo --enable-thread-cs=per-object --with-atomic-primitives --enable-handle-allocation=tls --enable-refcount=lock-free --disable-predefined-refcount --with-cross-file=src/mpid/pamid/cross/bgq8 --prefix=/scratch/home/chiensh/apps/mpich/3.1.3-opt/ --disable-spawn
>
> The resulting MPICH build passed most of the tests (679), but 9 failed (see below). I am not sure whether these failures are critical. Can anyone comment on this?
>

I don't know if MPICH has ever passed 100% of the MPICH tests on Blue Gene
(maybe back in the 1.5.1 days, but we had fewer tests then, too).

These 9 errors all look like something an application might run into: 
probing messages, truncated messages, RMA via the Fortran interface,
examining the status object.

If your application does any of those things I'd pay particular 
attention to the results.  It's entirely possible that your application 
won't touch the parts of MPICH that are not fully up to spec on Blue Gene.
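
One way to see which areas to watch is to group the failed tests by their
test-suite directory. The following is only a rough sketch in Python, not
part of the MPICH test suite: the names FAIL_RE, SAMPLE and
failures_by_directory are made up for illustration, and it assumes the
summary uses "not ok N - ./dir/test nprocs" lines as in the report below.

```python
import re
from collections import defaultdict

# Matches failure lines such as "not ok 324 - ./pt2pt/mprobe 2",
# with or without a leading "> " quote marker from the email.
FAIL_RE = re.compile(
    r"^(?:>\s*)?not ok\s+(\d+)\s+-\s+\./(\S+)/([^/\s]+)\s+(\d+)\s*$"
)

# The nine failure lines from the report quoted in this message.
SAMPLE = [
    "> not ok 283 - ./init/timeout 2",
    "> not ok 324 - ./pt2pt/mprobe 2",
    "> not ok 538 - ./f77/rma/wingetf 5",
    "> not ok 640 - ./f90/rma/wingetf90 5",
    "> not ok 668 - ./errors/pt2pt/truncmsg1 2",
    "> not ok 670 - ./errors/pt2pt/errinstatts 2",
    "> not ok 671 - ./errors/pt2pt/errinstatta 2",
    "> not ok 672 - ./errors/pt2pt/errinstatws 2",
    "> not ok 673 - ./errors/pt2pt/errinstatwa 2",
]


def failures_by_directory(lines):
    """Group failed tests by directory.

    Returns a dict mapping directory -> list of (test number, name, nprocs).
    Lines that are not failure lines are ignored.
    """
    groups = defaultdict(list)
    for line in lines:
        match = FAIL_RE.match(line)
        if match:
            number, directory, name, nprocs = match.groups()
            groups[directory].append((int(number), name, int(nprocs)))
    return dict(groups)


if __name__ == "__main__":
    for directory, tests in sorted(failures_by_directory(SAMPLE).items()):
        names = ", ".join(name for _, name, _ in tests)
        print(f"{directory}: {len(tests)} failed ({names})")
```

If your application never uses, say, Fortran RMA or MPI_Mprobe, the
corresponding groups matter less to you than the errors/pt2pt ones.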

So, I would say these errors are concerning, but not critical.  Press on 
and let us know how things go with your application!

==rob


> Many Thanks!
>
> Regards,
> Dominic Chien
>
> =========================================================
> not ok 283 - ./init/timeout 2
>    ---
>    Directory: ./init
>    File: timeout
>    Num-procs: 2
>    Date: "Wed Jan 13 13:55:50 2016"
>    ...
> ## Test output (expected 'No Errors'):
> ## srun returned a zero status but the program returned a nonzero status
> =========================================================
> =========================================================
> not ok 324 - ./pt2pt/mprobe 2
>    ---
>    Directory: ./pt2pt
>    File: mprobe
>    Num-procs: 2
>    Date: "Wed Jan 13 14:07:47 2016"
>    ...
> ## Test output (expected 'No Errors'):
> ## 2016-01-13 14:07:47.846 (WARN ) [0xfff8d988bb0] 78050:ibm.runjob.client.Job: terminated by signal 11
> ## 2016-01-13 14:07:47.846 (WARN ) [0xfff8d988bb0] 78050:ibm.runjob.client.Job: abnormal termination by signal 11 from rank 1
> =========================================================
> =========================================================
> not ok 538 - ./f77/rma/wingetf 5
>    ---
>    Directory: ./f77/rma
>    File: wingetf
>    Num-procs: 5
>    Date: "Wed Jan 13 16:28:10 2016"
>    ...
> ## Test output (expected 'No Errors'):
> ##  0  buf( 1 , 11 ) =  751  expected  251
> ##  0  buf( 2 , 11 ) =  752  expected  252
> ##  0  buf( 3 , 11 ) =  753  expected  253
> ##  0  buf( 4 , 11 ) =  754  expected  254
> ##  0  buf( 5 , 11 ) =  755  expected  255
> ##  0  buf( 6 , 11 ) =  756  expected  256
> ##  0  buf( 7 , 11 ) =  757  expected  257
> ##  0  buf( 8 , 11 ) =  758  expected  258
> ##  0  buf( 9 , 11 ) =  759  expected  259
> ##  0  buf( 10 , 11 ) =  760  expected  260
> ##   Found  25  errors
> =========================================================
> =========================================================
> not ok 640 - ./f90/rma/wingetf90 5
>    ---
>    Directory: ./f90/rma
>    File: wingetf90
>    Num-procs: 5
>    Date: "Wed Jan 13 18:10:17 2016"
>    ...
> ## Test output (expected 'No Errors'):
> ##  4  buf( 1 ,0) =  0  expected 976
> ##  4  buf( 2 ,0) =  24525328  expected 977
> ##  4  buf( 3 ,0) =  31  expected 978
> ##  4  buf( 4 ,0) =  -1073759872  expected 979
> ##  4  buf( 5 ,0) =  1107296292  expected 980
> ##  4  buf( 6 ,0) =  -1073758504  expected 981
> ##  4  buf( 7 ,0) =  0  expected 982
> ##  4  buf( 8 ,0) =  22184620  expected 983
> ##  4  buf( 9 ,0) =  0  expected 984
> ##  4  buf( 10 ,0) =  25808064  expected 985
> ##   Found  50  errors
> =========================================================
> =========================================================
> not ok 668 - ./errors/pt2pt/truncmsg1 2
>    ---
>    Directory: ./errors/pt2pt
>    File: truncmsg1
>    Num-procs: 2
>    Date: "Wed Jan 13 18:16:19 2016"
>    ...
> ## Test output (expected 'No Errors'):
> ## MPI_Recv (short) returned MPI_SUCCESS instead of truncated message
> ## MPI_Recv (irecv-short) returned MPI_SUCCESS instead of truncated message
> ## MPI_Recv (medium) returned MPI_SUCCESS instead of truncated message
> ##  Found 3 errors
> =========================================================
> =========================================================
> not ok 670 - ./errors/pt2pt/errinstatts 2
>    ---
>    Directory: ./errors/pt2pt
>    File: errinstatts
>    Num-procs: 2
>    Date: "Wed Jan 13 18:16:45 2016"
>    ...
> ## Test output (expected 'No Errors'):
> ## Did not get ERR_IN_STATUS in Testsome (outcount = 2, should equal 2); class returned was 0
> ##  Found 1 errors
> =========================================================
> =========================================================
> not ok 671 - ./errors/pt2pt/errinstatta 2
>    ---
>    Directory: ./errors/pt2pt
>    File: errinstatta
>    Num-procs: 2
>    Date: "Wed Jan 13 18:16:58 2016"
>    ...
> ## Test output (expected 'No Errors'):
> ## Did not get ERR_IN_STATUS in Testall
> ##  Found 1 errors
> =========================================================
> =========================================================
> not ok 672 - ./errors/pt2pt/errinstatws 2
>    ---
>    Directory: ./errors/pt2pt
>    File: errinstatws
>    Num-procs: 2
>    Date: "Wed Jan 13 18:17:11 2016"
>    ...
> ## Test output (expected 'No Errors'):
> ## Did not get ERR_IN_STATUS in Waitsome.  Got 0.
> ##  Found 1 errors
> =========================================================
> =========================================================
> not ok 673 - ./errors/pt2pt/errinstatwa 2
>    ---
>    Directory: ./errors/pt2pt
>    File: errinstatwa
>    Num-procs: 2
>    Date: "Wed Jan 13 18:17:23 2016"
>    ...
> ## Test output (expected 'No Errors'):
> ## Did not get ERR_IN_STATUS in Waitall
> ##  Found 1 errors
> =========================================================
>
>
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>