[mpich-discuss] MPICH 3.2 on BlueGene/Q

Jeff Hammond jeff.science at gmail.com
Sun Jan 10 21:30:04 CST 2016


I recall that MPI-3 RMA on BGQ deadlocks if you set PAMID_THREAD_MULTIPLE
(please check the ALCF MPI docs to verify the exact name), which is required
for async progress.

The ARMCI-MPI test suite is one good way to validate that MPI-3 RMA is working.
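One way to check whether that setting is what triggers the hang is to run the same RMA test (for example, one from the ARMCI-MPI suite) with PAMID_THREAD_MULTIPLE unset and then set, under a time limit. The following is a minimal shell sketch of that check. It assumes GNU coreutils `timeout` (exit status 124 when the limit expires) and a launcher that forwards the caller's environment to the ranks. The variable name is the one recalled above and still needs checking against the ALCF docs. `LAUNCHER`, `run_case`, `compare_thread_modes`, and `./my_rma_test` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: check whether setting PAMID_THREAD_MULTIPLE turns a passing
# RMA test into a hang. The variable name should be verified against the
# ALCF MPI docs; LAUNCHER and the function names are hypothetical.

# Turn an exit status from timeout(1) into a one-word verdict.
# GNU timeout exits with 124 when the time limit expires.
classify_status() {
  case "$1" in
    0)   echo "ok" ;;
    124) echo "hang" ;;
    *)   echo "fail($1)" ;;
  esac
}

# run_case NPROCS SECONDS CMD [ARGS...]
# Launch CMD on NPROCS ranks (srun unless LAUNCHER is set), discard its
# output, and print the verdict.
run_case() {
  local nprocs=$1 secs=$2
  shift 2
  timeout "$secs" "${LAUNCHER:-srun}" -n "$nprocs" "$@" >/dev/null 2>&1
  classify_status "$?"
}

# compare_thread_modes NPROCS SECONDS CMD [ARGS...]
# Run the same case with the variable unset and set to 1. Each run happens
# in a subshell, so the caller's environment is left unchanged.
compare_thread_modes() {
  local off on
  off=$(unset PAMID_THREAD_MULTIPLE; run_case "$@")
  on=$(export PAMID_THREAD_MULTIPLE=1; run_case "$@")
  echo "PAMID_THREAD_MULTIPLE unset: $off"
  echo "PAMID_THREAD_MULTIPLE=1: $on"
}

# Example (hypothetical test binary, 4 ranks, 5-minute limit):
#   compare_thread_modes 4 300 ./my_rma_test
```

If the set case reports `hang` while the unset case reports `ok`, that is consistent with the deadlock described above, though it does not by itself pin down the cause.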

Jeff

On Sunday, January 10, 2016, Dominic Chien <chiensh.acrc at gmail.com> wrote:

> Hi Rob and Pavan,
>
> Thank you for your clarification.
>
> If the MPI-3 functionality works correctly in version 3.1rc4 on BG/Q,
> I may not need version 3.2. Have all MPI-3 features been tested and
> verified on BGQ? I just found that 3.1rc4 hangs, as if deadlocked, in the
> nonblocking test when n > 2. (Maybe I should open another ticket for this
> issue.)
>
> [chiensh at cumulus coll.bak]$ srun -n 2 nonblocking
>  No errors
> [chiensh at cumulus coll.bak]$ srun -n 4 nonblocking
> (never return)
>
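The following is a minimal shell sketch for narrowing down a hang like this one. It runs the test at increasing rank counts under a time limit and stops at the first count that does not pass. It assumes GNU coreutils `timeout` and `srun` as the launcher. `sweep_nprocs` and `LAUNCHER` are hypothetical names, and the limit has to be long enough for a correct run to finish on the machine in question.

```shell
#!/usr/bin/env bash
# Sketch: find the smallest rank count at which a test stops passing.
# A time limit turns a hang into a reported result instead of a job
# that never returns. Assumes GNU coreutils timeout(1), which exits with
# 124 on expiry; sweep_nprocs and LAUNCHER are hypothetical names.

# sweep_nprocs MAX SECONDS CMD [ARGS...]
# Runs CMD on 2, 3, ..., MAX ranks (srun unless LAUNCHER is set), printing
# one line per count. Stops and returns 1 at the first count that hangs or
# fails; returns 0 if every count passes.
sweep_nprocs() {
  local max=$1 secs=$2 n=2 status
  shift 2
  while [ "$n" -le "$max" ]; do
    timeout "$secs" "${LAUNCHER:-srun}" -n "$n" "$@" >/dev/null 2>&1
    status=$?
    case "$status" in
      0)   echo "n=$n ok" ;;
      124) echo "n=$n hang"; return 1 ;;
      *)   echo "n=$n fail($status)"; return 1 ;;
    esac
    n=$((n + 1))
  done
  return 0
}

# Example: sweep the nonblocking test up to 8 ranks,
# allowing 120 seconds per run:
#   sweep_nprocs 8 120 ./nonblocking
```

Because the test's own output is discarded, rerun the first failing count by hand to see any error text.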
> Thanks!
>
> Regards,
> Dominic
>
> On 9 Jan, 2016, at 12:14 am, Rob Latham <robl at mcs.anl.gov> wrote:
>
> >
> >
> > On 01/08/2016 10:13 AM, Rob Latham wrote:
> >
> >> If there are 3.2 features you'd like to cherry-pick into this fork, and
> >> they don't touch the device interface, or depend on patches that do,
> >> then we can definitely do that.
> >>
> >> If you are working on Argonne's Blue Gene, I built and installed this
> >> fork to /soft/libraries/unsupported/mpich
> >>
> >> Let me know if you run into any problems.
> >
> > Oh, and as Pavan says, this is unsupported, which means I'd like to know
> > if you have problems, but I cannot promise that I'll be able to spend a lot
> > of time debugging them.
> >
> > ==rob
> >
> >
> >>> Regards,
> >>> Dominic
> >>>
> >>> Here are the environment variables:
> >>>
> >>> ====================================================================================
> >>>
> >>> export CC=/opt/ibmcmp/vac/bg/12.1/bin/bgxlc_r
> >>> export CXX=/opt/ibmcmp/vacpp/bg/12.1/bin/bgxlC_r
> >>> export F77=/opt/ibmcmp/xlf/bg/14.1/bin/bgxlf_r
> >>> export FC=/opt/ibmcmp/xlf/bg/14.1/bin/bgxlf90_r
> >>> export AR=/bgsys/drivers/V1R2M0/ppc64/gnu-linux/powerpc64-bgq-linux/bin/ar
> >>> export LD=/bgsys/drivers/V1R2M0/ppc64/gnu-linux/powerpc64-bgq-linux/bin/ld
> >>> export RANLIB=/bgsys/drivers/V1R2M0/ppc64/gnu-linux/powerpc64-bgq-linux/bin/ranlib
> >>>
> >>> export MPICHLIB_CXXFLAGS="-qhot -qinline=800 -qflag=i:i -qsaveopt -qsuppress=1506-236"
> >>> export MPICHLIB_CFLAGS=${MPICHLIB_CXXFLAGS}
> >>> export MPICHLIB_FFLAGS=${MPICHLIB_CXXFLAGS}
> >>> export MPICHLIB_F90FLAGS=${MPICHLIB_CXXFLAGS}
> >>>
> >>> ====================================================================================
> >>>
> >>>
> >>> Here is the configure command:
> >>>
> >>> ====================================================================================
> >>>
> >>> ../configure --host=powerpc64-bgq-linux --with-device=pamid \
> >>> --with-file-system=gpfs:BGQ \
> >>> --with-bgq-install-dir=/bgsys/drivers/V1R2M0/ppc64 \
> >>> --with-pami=/bgsys/drivers/V1R2M0/ppc64/comm/sys \
> >>> --with-pami-include=/bgsys/drivers/V1R2M0/ppc64/comm/sys/include \
> >>> --with-pami-lib=/bgsys/drivers/V1R2M0/ppc64/comm/sys/lib \
> >>> --disable-wrapper-rpath --enable-fast=nochkmsg,notiming,O3 \
> >>> --with-assert-level=0 --disable-error-messages --disable-debuginfo \
> >>> --enable-thread-cs=per-object --with-atomic-primitives \
> >>> --enable-handle-allocation=tls --enable-refcount=lock-free \
> >>> --disable-predefined-refcount \
> >>> --with-cross-file=src/mpid/pamid/cross/bgq8 \
> >>> --prefix=/scratch/home/chiensh/apps/mpich/3.2.rc2
> >>>
> >>> ====================================================================================
> >>>
> >>>
> >>> Here is the make log
> >>>
> >>> ====================================================================================
> >>>
> >>> ...
> >>>  CC       src/mpi/attr/lib_libmpi_la-attr_delete.lo
> >>> "/scratch/home/chiensh/mpich/mpich-3.2rc2/src/mpid/pamid/include/mpidi_thread.h", line 64.9: 1506-358 (I) "MPIU_THREAD_CS_ENTER" is defined on line 80 of ../src/util/thread/mpiu_thread_multiple.h.
> >>> "/scratch/home/chiensh/mpich/mpich-3.2rc2/src/mpid/pamid/include/mpidi_thread.h", line 65.9: 1506-358 (I) "MPIU_THREAD_CS_EXIT" is defined on line 81 of ../src/util/thread/mpiu_thread_multiple.h.
> >>> "/scratch/home/chiensh/mpich/mpich-3.2rc2/src/mpid/pamid/include/mpidi_thread.h", line 66.9: 1506-358 (I) "MPIU_THREAD_CS_YIELD" is defined on line 82 of ../src/util/thread/mpiu_thread_multiple.h.
> >>> "../src/include/mpiimpl.h", line 1184.5: 1506-046 (S) Syntax error.
> >>> "../src/include/mpiimpl.h", line 1481.5: 1506-046 (S) Syntax error.
> >>> "../src/include/mpiimpl.h", line 1636.5: 1506-046 (S) Syntax error.
> >>> "../src/include/mpiimpl.h", line 2785.5: 1506-343 (S) Redeclaration of MPID_Send differs from previous declaration on line 2760 of "../src/include/mpiimpl.h".
> >>> "../src/include/mpiimpl.h", line 2785.5: 1506-377 (I) The type "int" of parameter 2 differs from the previous type "long".
> >>> "../src/include/mpiimpl.h", line 2884.5: 1506-343 (S) Redeclaration of MPID_Isend differs from previous declaration on line 2865 of "../src/include/mpiimpl.h".
> >>> "../src/include/mpiimpl.h", line 2884.5: 1506-377 (I) The type "int" of parameter 2 differs from the previous type "long".
> >>> "../src/include/mpitimpl.h", line 245.5: 1506-046 (S) Syntax error.
> >>> "../src/include/mpitimpl.h", line 900.40: 1506-022 (S) "total" is not a member of "struct {...}".
> >>> "/scratch/home/chiensh/mpich/mpich-3.2rc2/src/mpid/pamid/include/../src/mpid_request.h", line 156.21: 1506-022 (S) "cc" is not a member of "struct MPID_Request".
> >>> "/scratch/home/chiensh/mpich/mpich-3.2rc2/src/mpid/pamid/include/../src/mpid_request.h", line 157.8: 1506-022 (S) "cc_ptr" is not a member of "struct MPID_Request".
> >>> "/scratch/home/chiensh/mpich/mpich-3.2rc2/src/mpid/pamid/include/../src/mpid_request.h", line 157.23: 1506-022 (S) "cc" is not a member of "struct MPID_Request".
> >>> "/scratch/home/chiensh/mpich/mpich-3.2rc2/src/mpid/pamid/include/../src/mpid_request.h", line 283.62: 1506-099 (S) Unexpected argument.
> >>> "/scratch/home/chiensh/mpich/mpich-3.2rc2/src/mpid/pamid/include/../src/mpid_request.h", line 326.23: 1506-022 (S) "cc_ptr" is not a member of "struct MPID_Request".
> >>> "/scratch/home/chiensh/mpich/mpich-3.2rc2/src/mpid/pamid/include/../src/mpid_request.h", line 341.23: 1506-022 (S) "cc_ptr" is not a member of "struct MPID_Request".
> >>> "/scratch/home/chiensh/mpich/mpich-3.2rc2/src/mpid/pamid/include/mpidpost.h", line 56.44: 1506-022 (S) "vcr" is not a member of "struct MPID_Comm".
> >>> "/scratch/home/chiensh/mpich/mpich-3.2rc2/src/mpid/pamid/include/mpidpost.h", line 83.43: 1506-022 (S) "vcrt" is not a member of "struct MPID_Comm".
> >>> "/scratch/home/chiensh/mpich/mpich-3.2rc2/src/mpid/pamid/include/mpidpost.h", line 84.37: 1506-022 (S) "vcrt" is not a member of "struct MPID_Comm".
> >>> "/scratch/home/chiensh/mpich/mpich-3.2rc2/src/mpid/pamid/include/mpidpost.h", line 84.57: 1506-022 (S) "vcr" is not a member of "struct MPID_Comm".
> >>> "/scratch/home/chiensh/mpich/mpich-3.2rc2/src/mpid/pamid/include/mpidpost.h", line 91.42: 1506-022 (S) "vcr" is not a member of "struct MPID_Comm".
> >>> "/scratch/home/chiensh/mpich/mpich-3.2rc2/src/mpid/pamid/include/mpidpost.h", line 92.41: 1506-022 (S) "vcr" is not a member of "struct MPID_Comm".
> >>> "../src/mpi/attr/attr_delete.c", line 65.26: 1506-045 (S) Undeclared identifier GLOBAL.
> >>> make[2]: *** [src/mpi/attr/lib_libmpi_la-attr_delete.lo] Error 1
> >>> make[2]: Leaving directory `/scratch/home/chiensh/mpich/mpich-3.2rc2/build'
> >>> make[1]: *** [all-recursive] Error 1
> >>> make[1]: Leaving directory `/scratch/home/chiensh/mpich/mpich-3.2rc2/build'
> >>> make: *** [all] Error 2
> >>>
> >>> ====================================================================================
> >>>
> >>> _______________________________________________
> >>> discuss mailing list     discuss at mpich.org
> >>> To manage subscription options or unsubscribe:
> >>> https://lists.mpich.org/mailman/listinfo/discuss
> >>>
>


-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/

