[mpich-discuss] MPI_Gather fails with 2048 processes and 4096 MB total

Balaji, Pavan balaji at anl.gov
Wed Dec 2 12:42:00 CST 2015


I believe Cray has a newer version based on mpich-3.2 that they are releasing soon-ish.  If you are a good customer, they might give you an engineering build to try out.  :-)

  -- Pavan

> On Dec 2, 2015, at 9:28 AM, Jeff Hammond <jeff.science at gmail.com> wrote:
> 
> This was fixed after Cray forked MPICH 3.1.2.  You should file a support issue with Cray via the appropriate channel.
> 
> # master
> jrhammon-mac01:git jrhammon$ git grep MPIC_Send
> src/include/mpiimpl.h:int MPIC_Send(const void *buf, MPI_Aint count, MPI_Datatype datatype, int dest, int tag,
> ...
> 
> # 3.1 branch
> jrhammon-mac01:git jrhammon$ git checkout 3.1.x
> jrhammon-mac01:git jrhammon$ git grep MPIC_Send
> src/include/mpiimpl.h:int MPIC_Send(const void *buf, int count, MPI_Datatype datatype, int dest, int tag,
> ...
> 
> Best,
> 
> Jeff
> 
> On Wed, Dec 2, 2015 at 6:40 AM, <Florian.Willich at dlr.de> wrote:
> >
> > Thanks for the "bash line", the output is:
> >
> > $ grep "define MPICH_VERSION" $CRAY_MPICH2_DIR/include/mpi.h
> > #define MPICH_VERSION "3.1.2"
> >
> > Florian
> >
> > ________________________________
> > From: Jeff Hammond [jeff.science at gmail.com]
> > Sent: Wednesday, 2 December 2015 14:58
> > To: MPICH
> >
> > Subject: Re: [mpich-discuss] MPI_Gather fails with 2048 processes and 4096 MB total
> >
> > 1) You should address this with Cray.  Their implementation of MPI has closed-source modifications to MPICH that we cannot analyze for count-safety.
> >
> > 2) You can discover the version of MPICH associated with Cray MPI like this:
> > jhammond at cori11:~> grep "define MPICH_VERSION" $CRAY_MPICH2_DIR/include/mpi.h
> > #define MPICH_VERSION "3.1.2"
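> >
> > A runtime alternative, if you'd rather not grep headers, is MPI_Get_library_version (MPI-3).  Untested sketch, not Cray-specific:
> >
> > #include <mpi.h>
> > #include <stdio.h>
> >
> > int main(int argc, char **argv)
> > {
> >     char version[MPI_MAX_LIBRARY_VERSION_STRING];
> >     int len;
> >     MPI_Init(&argc, &argv);
> >     /* Prints the implementation's version string; MPICH-derived
> >        libraries include the underlying MPICH version in it. */
> >     MPI_Get_library_version(version, &len);
> >     printf("%s\n", version);
> >     MPI_Finalize();
> >     return 0;
> > }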
> >
> > Best,
> >
> > Jeff
> >
> > On Wed, Dec 2, 2015 at 12:15 AM, <Florian.Willich at dlr.de> wrote:
> >>
> >> Hi Rob,
> >>
> >> Well, maybe I was addressing the wrong organisation... I am currently testing on the Cray Swan supercomputer, which provides the module cray-mpich/7.2.6 ("Cray Message Passing Toolkit 7.2.6").
> >>
> >> I cannot determine whether the Cray MPICH version is MPICH with additional modifications or something entirely different from the official MPICH releases. Additionally, I cannot figure out which MPICH version this cray-mpich module is based on. I'll continue investigating and keep you updated.
> >>
> >> Best Regards
> >>
> >> Florian
> >> ________________________________________
> >> From: Rob Latham [robl at mcs.anl.gov]
> >> Sent: Tuesday, 1 December 2015 16:48
> >> To: discuss at mpich.org
> >> Subject: Re: [mpich-discuss] MPI_Gather fails with 2048 processes and 4096 MB total
> >>
> >> On 11/26/2015 12:38 PM, Archer, Charles J wrote:
> >> > FYI, we hit various flavors of this problem when I was still at IBM, I think mostly in weather codes.
> >> > Apparently Cray hit this too:
> >> >
> >> > https://trac.mpich.org/projects/mpich/ticket/1767
> >> >
> >> > We pretty much told our customers back then that a fix was forthcoming (with no ETA :) ) with the revamp of datatypes to use internal 64-bit counts.
> >> > We also provided workarounds.
> >> >
> >> > In the case of this gather operation, we asked the customer to implement gather as a flat tree using point-to-point.
> >> > Root posts irecvs, then barrier, children send to root.
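> >> >
> >> > A minimal sketch of such a flat-tree gather (illustrative only; assumes a contiguous datatype and identical counts on every rank):
> >> >
> >> > /* Root posts one irecv per non-root rank, everyone synchronizes,
> >> >  * then the children send.  Each point-to-point message carries a
> >> >  * single rank's contribution, so no count comes near INT_MAX. */
> >> > #include <mpi.h>
> >> > #include <stdlib.h>
> >> > #include <string.h>
> >> >
> >> > int flat_gather(const void *sbuf, int count, MPI_Datatype dtype,
> >> >                 void *rbuf, int root, MPI_Comm comm)
> >> > {
> >> >     int rank, size, n = 0;
> >> >     MPI_Aint lb, extent;
> >> >     MPI_Comm_rank(comm, &rank);
> >> >     MPI_Comm_size(comm, &size);
> >> >     if (rank == root) {
> >> >         MPI_Type_get_extent(dtype, &lb, &extent);
> >> >         MPI_Request *reqs = malloc((size - 1) * sizeof(*reqs));
> >> >         for (int r = 0; r < size; r++) {
> >> >             if (r == root) continue;
> >> >             MPI_Irecv((char *) rbuf + (MPI_Aint) r * count * extent,
> >> >                       count, dtype, r, 0, comm, &reqs[n++]);
> >> >         }
> >> >         /* root's own contribution (contiguous-datatype shortcut) */
> >> >         memcpy((char *) rbuf + (MPI_Aint) root * count * extent,
> >> >                sbuf, (size_t) count * extent);
> >> >         MPI_Barrier(comm);
> >> >         MPI_Waitall(n, reqs, MPI_STATUSES_IGNORE);
> >> >         free(reqs);
> >> >     } else {
> >> >         MPI_Barrier(comm);
> >> >         MPI_Send(sbuf, count, dtype, root, 0, comm);
> >> >     }
> >> >     return MPI_SUCCESS;
> >> > }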
> >> >
> >> > IIRC, the giant gather we were debugging was at the very end of the application and was used to gather some statistics for IO at the root, so it wasn't critical to perform well.
> >> > I also attempted a workaround using some derived datatypes, but I hit another truncation in the datatype code itself :\
> >> > I should see if I can dig up that implementation and make sure it isn’t still broken for large counts.
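> >> >
> >> > The rough shape of that derived-datatype idea is below (my sketch, not necessarily the exact code): it shrinks the count the collective sees by gathering fixed-size chunks, but as noted it can still trip over truncation inside the datatype code.
> >> >
> >> > /* Gather 1 MiB "chunks" instead of individual ints, so the count
> >> >  * argument stays small; scount must be a multiple of CHUNK_ELEMS. */
> >> > #define CHUNK_ELEMS 262144            /* 1 MiB of MPI_INT */
> >> > MPI_Datatype chunk;
> >> > MPI_Type_contiguous(CHUNK_ELEMS, MPI_INT, &chunk);
> >> > MPI_Type_commit(&chunk);
> >> > MPI_Gather(sbuf, scount / CHUNK_ELEMS, chunk,
> >> >            rbuf, scount / CHUNK_ELEMS, chunk, 0, MPI_COMM_WORLD);
> >> > MPI_Type_free(&chunk);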
> >>
> >> Those are all fine approaches to work around the problem.  The internals
> >> of MPICH, though, need to be 64-bit clean -- there are still 4500 places
> >> where clang warns of a 64-bit value being assigned to a 32-bit type.
> >>
> >> Florian Willich, what version of MPICH is this?   The line numbers in
> >> the back trace don't match up with what I've got, and
> >> I really thought we fixed this class of bug with commits 31d95ed7b18c
> >> and 68f8c7aa7 over the summer.
> >>
> >> ==rob
> >>
> >>
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Nov 26, 2015, at 12:14 PM, Balaji, Pavan <balaji at anl.gov<mailto:balaji at anl.gov>> wrote:
> >> >
> >> >
> >> > Thanks for reporting.  This looks like an integer-overflow issue, which occurs when the data contributed by all processes adds up to more than INT_MAX (2 billion).  We'll look into it.  I've created a ticket for it and added you as the reporter, so you'll be notified as there are updates.
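> >> >
> >> > For what it's worth, the negative counts in the error stacks below are consistent with a 32-bit byte count wrapping at an intermediate rank of the binomial tree: in the 2 MB case, rank 1024 forwards the data of 1024 ranks, i.e. 1024 * 524288 ints * 4 bytes = 2147483648 bytes = 2^31, which wraps to -2147483648; in the 3 MB / 2048-process case, 1024 * 786432 * 4 = 3221225472 wraps to -1073741824; and in the 3 MB / 2047-process case, 1023 * 786432 * 4 = 3218079744 wraps to -1076887552 -- all matching the reported values.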
> >> >
> >> > http://trac.mpich.org/projects/mpich/ticket/2317
> >> >
> >> > Rob: can you create a simple test program for this and add it to the test bucket, so it shows up on the nightlies?
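> >> >
> >> > Something along these lines should do it (untested sketch; it needs enough ranks and memory -- e.g. 2048 ranks at 2 MB each -- to push the gathered total past 2^31 bytes):
> >> >
> >> > #include <mpi.h>
> >> > #include <stdlib.h>
> >> >
> >> > int main(int argc, char **argv)
> >> > {
> >> >     const int scount = 524288;        /* 2 MB of MPI_INT per rank */
> >> >     int rank, size;
> >> >     MPI_Init(&argc, &argv);
> >> >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >> >     MPI_Comm_size(MPI_COMM_WORLD, &size);
> >> >     int *sbuf = malloc((size_t) scount * sizeof(int));
> >> >     int *rbuf = (rank == 0)
> >> >         ? malloc((size_t) size * scount * sizeof(int)) : NULL;
> >> >     for (int i = 0; i < scount; i++) sbuf[i] = rank;
> >> >     MPI_Gather(sbuf, scount, MPI_INT, rbuf, scount, MPI_INT,
> >> >                0, MPI_COMM_WORLD);
> >> >     free(rbuf);
> >> >     free(sbuf);
> >> >     MPI_Finalize();
> >> >     return 0;
> >> > }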
> >> >
> >> > Thanks,
> >> >
> >> >   -- Pavan
> >> >
> >> > On Nov 26, 2015, at 10:18 AM, Florian.Willich at dlr.de wrote:
> >> >
> >> > Dear mpich discussion group,
> >> >
> >> > The following issue appeared while running some benchmarks with MPI_Gather:
> >> >
> >> > Gathering data (calling MPI_Gather(...)) with 2048 processes, each sending 2 MB of data (4096 MB total), fails with the following output:
> >> > ____________________________
> >> >
> >> > Rank 1024 [Thu Nov 26 09:43:16 2015] [c1-0c1s12n3] Fatal error in PMPI_Gather: Invalid count, error stack:
> >> > PMPI_Gather(959)......: MPI_Gather(sbuf=0x2aaab826c010, scount=524288, MPI_INT, rbuf=(nil), rcount=524288, MPI_INT, root=0, MPI_COMM_WORLD) failed
> >> > MPIR_Gather_impl(775).:
> >> > MPIR_Gather(735)......:
> >> > MPIR_Gather_intra(347):
> >> > MPIC_Send(360)........: Negative count, value is -2147483648
> >> > _pmiu_daemon(SIGCHLD): [NID 00307] [c1-0c1s12n3] [Thu Nov 26 09:43:16 2015] PE RANK 1024 exit signal Aborted
> >> > [NID 00307] 2015-11-26 09:43:16 Apid 949450: initiated application termination
> >> > Application 949450 exit codes: 134
> >> > Application 949450 exit signals: Killed
> >> > Application 949450 resources: utime ~1s, stime ~137s, Rss ~2110448, inblocks ~617782, outblocks ~1659320
> >> > ____________________________
> >> >
> >> > The following are some tests that I ran to better understand the problem:
> >> >
> >> > 2047 processes - 2 MB (4094 MB total) -> works!
> >> >
> >> > 2048 processes - 2047.5 KB (~1.999512 MB) (4095 MB total) -> works!
> >> >
> >> > 2048 processes - 3 MB (6144 MB total) -> fails:
> >> > ____________________________
> >> >
> >> > Rank 1024 [Thu Nov 26 09:41:15 2015] [c1-0c1s12n3] Fatal error in PMPI_Gather: Invalid count, error stack:
> >> > PMPI_Gather(959)......: MPI_Gather(sbuf=0x2aaab826c010, scount=786432, MPI_INT, rbuf=(nil), rcount=786432, MPI_INT, root=0, MPI_COMM_WORLD) failed
> >> > MPIR_Gather_impl(775).:
> >> > MPIR_Gather(735)......:
> >> > MPIR_Gather_intra(347):
> >> > MPIC_Send(360)........: Negative count, value is -1073741824
> >> > _pmiu_daemon(SIGCHLD): [NID 00307] [c1-0c1s12n3] [Thu Nov 26 09:41:15 2015] PE RANK 1024 exit signal Aborted
> >> > [NID 00307] 2015-11-26 09:41:15 Apid 949448: initiated application termination
> >> > Application 949448 exit codes: 134
> >> > Application 949448 exit signals: Killed
> >> > Application 949448 resources: utime ~1s, stime ~139s, Rss ~3159984, inblocks ~617782, outblocks ~1659351
> >> > ____________________________
> >> >
> >> > 2047 processes - 3 MB (6141 MB total) -> fails:
> >> > ____________________________
> >> >
> >> > Rank 1024 [Thu Nov 26 09:40:31 2015] [c1-0c1s12n3] Fatal error in PMPI_Gather: Invalid count, error stack:
> >> > PMPI_Gather(959)......: MPI_Gather(sbuf=0x2aaab826c010, scount=786432, MPI_INT, rbuf=(nil), rcount=786432, MPI_INT, root=0, MPI_COMM_WORLD) failed
> >> > MPIR_Gather_impl(775).:
> >> > MPIR_Gather(735)......:
> >> > MPIR_Gather_intra(347):
> >> > MPIC_Send(360)........: Negative count, value is -1076887552
> >> > _pmiu_daemon(SIGCHLD): [NID 00307] [c1-0c1s12n3] [Thu Nov 26 09:40:32 2015] PE RANK 1024 exit signal Aborted
> >> > [NID 00307] 2015-11-26 09:40:32 Apid 949446: initiated application termination
> >> > Application 949446 exit codes: 134
> >> > Application 949446 exit signals: Killed
> >> > Application 949446 resources: utime ~1s, stime ~134s, Rss ~3157072, inblocks ~617780, outblocks ~1659351
> >> > ____________________________
> >> >
> >> > 8 processes - 625 MB (5000 MB total) -> works!
> >> >
> >> > I can think of some pitfalls that might cause this issue, but I do not know the internally called routines well enough to check them. Is someone familiar with the implementation of MPI_Gather(...) and willing to help me?
> >> >
> >> > Best regards
> >> >
> >> > Florian
> >> >
> >> > Deutsches Zentrum für Luft- und Raumfahrt e. V. (DLR)
> >> > German Aerospace Center
> >> > Institute of Planetary Research | Planetary Physics | Rutherfordstraße 2 | 12489 Berlin
> >> >
> >> > Florian Willich| Intern - Software Developer (Parallel Applications)
> >> > florian.willlich at dlr.de
> >> > DLR.de
> >>
> >> --
> >> Rob Latham
> >> Mathematics and Computer Science Division
> >> Argonne National Lab, IL USA
> >
> >
> >
> >
> > --
> > Jeff Hammond
> > jeff.science at gmail.com
> > http://jeffhammond.github.io/
> >
> 
> 
> 
> 
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/

_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

