[mpich-discuss] MPICH 3.2 on BlueGene/Q

pramod kumbhar pramod.s.kumbhar at gmail.com
Thu Feb 25 13:04:02 CST 2016


Dear All,

I came across the thread below in the archives about MPICH 3.2 on BG-Q.

I am testing non-blocking collectives and I/O functions on a cluster and
would like to do the same on BG-Q. I have the following questions:

1. The last email from Dominic suggests that the last "compilable" version
doesn't support async progress. Is there any version that has
non-blocking support and compiles on BG-Q? (the fork from Rob?)

2. Do I have to consider anything specific on BG-Q while benchmarking
non-blocking functions (from MPI-3)? (I sketch below the kind of overlap
test I have in mind.)
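
For concreteness, here is that sketch: a toy MPI_Iallreduce overlap
measurement of my own (not the MPICH test suite's "nonblocking" test),
where the message size and the dummy compute loop are arbitrary
placeholders.

==================================================================
/* Sketch: measure compute/communication overlap of MPI_Iallreduce.
 * Build: mpicc -O2 -o iallreduce_overlap iallreduce_overlap.c
 * Run:   srun -n 4 ./iallreduce_overlap
 */
#include <mpi.h>
#include <stdio.h>

#define NELEM 1048576   /* placeholder message size */

int main(int argc, char **argv)
{
    static double in[NELEM], out[NELEM];
    double t0, t_compute, t_total;
    volatile double work = 0.0;
    long i;
    int rank;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < NELEM; i++) in[i] = (double)rank;

    t0 = MPI_Wtime();
    MPI_Iallreduce(in, out, NELEM, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    /* Dummy computation; it only overlaps with the collective if the
     * library makes progress in the background. */
    for (i = 0; i < 50000000L; i++) work += 1e-9;
    t_compute = MPI_Wtime() - t0;

    MPI_Wait(&req, MPI_STATUS_IGNORE);
    t_total = MPI_Wtime() - t0;

    if (rank == 0)
        printf("compute %.3f s, total %.3f s (good overlap => total ~ compute)\n",
               t_compute, t_total);

    MPI_Finalize();
    return 0;
}
==================================================================

Comparing t_total against t_compute plus the time of a plain blocking
MPI_Allreduce of the same size gives a rough idea of how much overlap the
async progress machinery actually delivers.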

Thanks in advance!

Regards,

Pramod

P.S. I am copying the email thread from the archive; I am not sure if this
will be delivered to the correct thread...


Hi All,

Here is an update:

MPICH 3.1.3 is the last version that passed the nonblocking test, even
without setting PAMID_THREAD_MULTIPLE. However, setting
PAMID_ASYNC_PROGRESS=1 causes an error (Abort(1) on node 7: 'locking'
async progress not applicable...).
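
Side note: if I read that abort message correctly, the "locking" flavour of
async progress needs the MPI_THREAD_MULTIPLE level to actually be granted at
initialization (which matches Jeff's comment further down). A minimal,
untested sketch for requesting that level and cross-checking it against the
"mpi thread level" line in the PAMID verbose output:

==================================================================
/* Sketch: request MPI_THREAD_MULTIPLE and report what was granted. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0 && provided < MPI_THREAD_MULTIPLE)
        printf("warning: requested MPI_THREAD_MULTIPLE, got level %d\n",
               provided);

    MPI_Finalize();
    return 0;
}
==================================================================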

[chiensh at cumulus coll.bak]$ which mpif90
~/apps/mpich/3.1.3/bin/mpif90
[chiensh at cumulus coll.bak]$ make nonblocking
  CC       nonblocking.o
  CCLD     nonblocking
[chiensh at cumulus coll.bak]$ srun -n 2 ./nonblocking
 No errors
[chiensh at cumulus coll.bak]$ srun -n 4 ./nonblocking
 No errors
[chiensh at cumulus coll.bak]$ srun -n 16 ./nonblocking
 No errors

Thanks all!

Regards,
Dominic


On 11 Jan, 2016, at 2:29 pm, Dominic Chien <chiensh.acrc at gmail.com> wrote:

> Thank you Jeff and Halim,
>
> Halim, I have tried 3.1.4, but it does not return 0 when the program
> finishes (it exits with an error), e.g. for a hello-world program:
> ==================================================================
>   program hello
>   include 'mpif.h'
>   integer rank, size, ierror, tag, status(MPI_STATUS_SIZE)
>
>   call MPI_INIT(ierror)
>   call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
>   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
>   print*, 'node', rank, ': Hello world'
>   call MPI_FINALIZE(ierror)
>   end
> ==================================================================
>
> Using MPICH 3.1.rc4
> ==================================================================
> [chiensh at cumulus test]$ which mpif90
> ~/apps/mpich/3.1.rc4/bin/mpif90
> [chiensh at cumulus test]$ srun -n 2 ./a.out
> node 1 : Hello world
> node 0 : Hello world
> [chiensh at cumulus test]$
> ==================================================================
> Using MPICH 3.1.4
> ==================================================================
> [chiensh at cumulus test]$ which mpif90
> ~/apps/mpich/3.1.4/bin/mpif90
> [chiensh at cumulus test]$ srun -n 2 ./a.out
> node 1 : Hello world
> node 0 : Hello world
> 2016-01-11 14:24:25.968 (WARN ) [0xfff7ef48b10] 75532:ibm.runjob.client.Job: terminated by signal 11
> 2016-01-11 14:24:25.968 (WARN ) [0xfff7ef48b10] 75532:ibm.runjob.client.Job: abnormal termination by signal 11 from rank 1
> ==================================================================
>
> Jeff, after I set PAMID_THREAD_MULTIPLE=1 and PAMID_ASYNC_PROGRESS=1,
> there seems to be some "improvement": the nonblocking test can run with
> up to 4 processes sometimes, but sometimes it just "deadlocks" (see below).
> ==========================================================
> [chiensh at cumulus coll.bak]$ srun --nodes=4 --ntasks-per-node=1 nonblocking
> MPIDI_Process.*
>  verbose               : 2
>  statistics            : 1
>  contexts              : 32
>  async_progress        : 1
>  context_post          : 1
>  pt2pt.limits
>    application
>      eager
>        remote, local   : 4097, 4097
>      short
>        remote, local   : 113, 113
>    internal
>      eager
>        remote, local   : 4097, 4097
>      short
>        remote, local   : 113, 113
>  rma_pending           : 1000
>  shmem_pt2pt           : 1
>  disable_internal_eager_scale : 524288
>  optimized.collectives : 0
>  optimized.select_colls: 2
>  optimized.subcomms    : 1
>  optimized.memory      : 0
>  optimized.num_requests: 1
>  mpir_nbc              : 1
>  numTasks              : 4
> mpi thread level        : 'MPI_THREAD_SINGLE'
> MPIU_THREAD_GRANULARITY : 'per object'
> ASSERT_LEVEL            : 0
> MPICH_LIBDIR           : not defined
> The following MPICH_* environment variables were specified:
> The following PAMID_* environment variables were specified:
>  PAMID_STATISTICS=1
>  PAMID_ASYNC_PROGRESS=1
>  PAMID_THREAD_MULTIPLE=1
>  PAMID_VERBOSE=2
> The following PAMI_* environment variables were specified:
> The following COMMAGENT_* environment variables were specified:
> The following MUSPI_* environment variables were specified:
> The following BG_* environment variables were specified:
> No errors
> ==========================================================
> [chiensh at cumulus coll.bak]$ srun --nodes=4 --ntasks-per-node=1 nonblocking
> MPIDI_Process.*
>  verbose               : 2
>  statistics            : 1
>  contexts              : 32
>  async_progress        : 1
>  context_post          : 1
>  pt2pt.limits
>    application
>      eager
>        remote, local   : 4097, 4097
>      short
>        remote, local   : 113, 113
>    internal
>      eager
>        remote, local   : 4097, 4097
>      short
>        remote, local   : 113, 113
>  rma_pending           : 1000
>  shmem_pt2pt           : 1
>  disable_internal_eager_scale : 524288
>  optimized.collectives : 0
>  optimized.select_colls: 2
>  optimized.subcomms    : 1
>  optimized.memory      : 0
>  optimized.num_requests: 1
>  mpir_nbc              : 1
>  numTasks              : 4
> mpi thread level        : 'MPI_THREAD_SINGLE'
> MPIU_THREAD_GRANULARITY : 'per object'
> ASSERT_LEVEL            : 0
> MPICH_LIBDIR           : not defined
> The following MPICH_* environment variables were specified:
> The following PAMID_* environment variables were specified:
>  PAMID_STATISTICS=1
>  PAMID_ASYNC_PROGRESS=1
>  PAMID_THREAD_MULTIPLE=1
>  PAMID_VERBOSE=2
> The following PAMI_* environment variables were specified:
> The following COMMAGENT_* environment variables were specified:
> The following MUSPI_* environment variables were specified:
> The following BG_* environment variables were specified:
> (never returns from here)
> ==========================================================
>
> Thanks!
>
> Regards,
> Dominic
>
> On 11 Jan, 2016, at 12:08 pm, Halim Amer <aamer at anl.gov> wrote:
>
>> Dominic,
>>
>> There were a bunch of fixes that went into PAMID since v3.1rc4. You
>> could try a release from the 3.1 series (i.e. from 3.1 through 3.1.4).
>>
>> Regards,
>> --Halim
>>
>> www.mcs.anl.gov/~aamer
>
> On 11 Jan, 2016, at 11:30 am, Jeff Hammond <jeff.science at gmail.com> wrote:
>
>> I recall MPI-3 RMA on BGQ deadlocks if you set PAMID_THREAD_MULTIPLE
>> (please see the ALCF MPI docs to verify the exact name), which is
>> required for async progress.
>>
>> The ARMCI-MPI test suite is one good way to validate that MPI-3 RMA is
>> working.
>>
>> Jeff
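
On Jeff's last point: the full ARMCI-MPI suite is the thorough option, but
for a quick smoke test I was planning something much smaller along the lines
of the sketch below (a toy passive-target exercise of my own, so take it
with a grain of salt).

==================================================================
/* Sketch: tiny MPI-3 passive-target RMA smoke test (not a substitute
 * for the ARMCI-MPI suite). Each rank puts its rank number into the
 * window of the next rank and then checks what it received. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, target, expected, got;
    int *buf;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Win_allocate(sizeof(int), sizeof(int), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &buf, &win);
    buf[0] = -1;
    MPI_Barrier(MPI_COMM_WORLD);

    /* Put our rank into the neighbour's window under an exclusive lock. */
    target = (rank + 1) % size;
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, target, 0, win);
    MPI_Put(&rank, 1, MPI_INT, target, 0, 1, MPI_INT, win);
    MPI_Win_unlock(target, win);   /* completes the put at the target */

    MPI_Barrier(MPI_COMM_WORLD);

    /* Lock our own window before reading locally, to stay within the
     * separate memory model's rules. */
    MPI_Win_lock(MPI_LOCK_SHARED, rank, 0, win);
    got = buf[0];
    MPI_Win_unlock(rank, win);

    expected = (rank + size - 1) % size;
    if (got != expected)
        printf("rank %d: expected %d, got %d\n", rank, expected, got);
    else if (rank == 0)
        printf(" No errors\n");

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
==================================================================

If something like this hangs with PAMID_ASYNC_PROGRESS=1 but completes
without it, that alone would already narrow things down.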