[mpich-discuss] MPI_Gather fails with 2048 processes and 4096 MB total

Jeff Hammond jeff.science at gmail.com
Wed Dec 2 07:58:02 CST 2015


1) You should address this with Cray.  Their implementation of MPI has
closed-source modifications to MPICH that we cannot analyze for
count-safety.

2) You can discover the version of MPICH associated with Cray MPI like this:
jhammond at cori11:~> grep "define MPICH_VERSION" $CRAY_MPICH2_DIR/include/mpi.h
#define MPICH_VERSION "3.1.2"
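
If you want to check at runtime instead, MPI-3's MPI_Get_library_version
returns the implementation's version string.  A minimal sketch (the exact
contents of the string are implementation-defined; on Cray systems I'd
expect it to name the MPT release):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char version[MPI_MAX_LIBRARY_VERSION_STRING];
    int len, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_library_version(version, &len);
    if (rank == 0)
        printf("%s\n", version);
    MPI_Finalize();
    return 0;
}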

Best,

Jeff

On Wed, Dec 2, 2015 at 12:15 AM, <Florian.Willich at dlr.de> wrote:

> Hi Rob,
>
> well, maybe I was addressing the wrong organisation... I am currently
> testing on the Cray swan supercomputer, which provides the module
> cray-mpich/7.2.6 ("Cray Message Passing Toolkit 7.2.6").
>
> I cannot determine whether the Cray MPICH version is MPICH with
> additional modifications or something entirely different from the
> official MPICH releases. Additionally, I cannot figure out which MPICH
> version this cray-mpich module is based on. I'll continue investigating
> and keep you updated.
>
> Best Regards
>
> Florian
> ________________________________________
> From: Rob Latham [robl at mcs.anl.gov]
> Sent: Tuesday, December 1, 2015 16:48
> To: discuss at mpich.org
> Subject: Re: [mpich-discuss] MPI_Gather fails with 2048 processes and 4096
> MB total
>
> On 11/26/2015 12:38 PM, Archer, Charles J wrote:
> > FYI, we hit various flavors of this problem when I was still at IBM,
> mostly in weather codes, I think.
> > Apparently Cray hit this too:
> >
> > https://trac.mpich.org/projects/mpich/ticket/1767
> >
> > We pretty much told our customers back then that a fix was forthcoming
> (with no ETA :) ) with the revamp of datatypes to use internal 64-bit counts.
> > We also provided workarounds.
> >
> > In the case of this gather operation, we asked the customer to implement
> gather as a flat tree using point-to-point.
> > The root posts irecvs, then a barrier, then the children send to the root.
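> >
> > A minimal sketch of that flat-tree workaround (illustrative, not the
> > code we actually shipped; assumes a contiguous datatype for the
> > root's local copy):
> >
> > #include <mpi.h>
> > #include <stdlib.h>
> > #include <string.h>
> >
> > /* Root preposts one irecv per child; the barrier guarantees the
> >  * receives are up before any child sends. */
> > static void flat_gather(const void *sendbuf, void *recvbuf, int count,
> >                         MPI_Datatype type, int root, MPI_Comm comm)
> > {
> >     int rank, size, n = 0, i;
> >     MPI_Aint lb, extent;
> >
> >     MPI_Comm_rank(comm, &rank);
> >     MPI_Comm_size(comm, &size);
> >
> >     if (rank == root) {
> >         MPI_Request *reqs = malloc((size - 1) * sizeof(MPI_Request));
> >         MPI_Type_get_extent(type, &lb, &extent);
> >         for (i = 0; i < size; i++) {
> >             if (i == root) continue;
> >             /* 64-bit offset arithmetic: MPI_Aint, not int */
> >             MPI_Irecv((char *)recvbuf + (MPI_Aint)i * count * extent,
> >                       count, type, i, 0, comm, &reqs[n++]);
> >         }
> >         memcpy((char *)recvbuf + (MPI_Aint)root * count * extent,
> >                sendbuf, (size_t)count * extent);
> >         MPI_Barrier(comm);
> >         MPI_Waitall(n, reqs, MPI_STATUSES_IGNORE);
> >         free(reqs);
> >     } else {
> >         MPI_Barrier(comm);
> >         MPI_Send(sendbuf, count, type, root, 0, comm);
> >     }
> > }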
> >
> > IIRC, the giant gather we were debugging was at the very end of the
> application and was used to gather some statistics for IO at the root, so it
> wasn’t critical to perform well.
> > I also attempted a workaround using some derived datatypes, but I hit
> another truncation in the datatype code itself :\
> > I should see if I can dig up that implementation and make sure it isn’t
> still broken for large counts.
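> >
> > That style of workaround looks roughly like the following sketch (the
> > general idea, not the original implementation): wrap the whole payload
> > in one contiguous type so the count passed to MPI_Gather is 1.
> >
> > MPI_Datatype chunk;
> > MPI_Type_contiguous(count, MPI_INT, &chunk);  /* payload = 1 element */
> > MPI_Type_commit(&chunk);
> > /* The large count has only moved into the datatype engine, which
> >  * is where the other 32-bit truncation showed up. */
> > MPI_Gather(sendbuf, 1, chunk, recvbuf, 1, chunk, 0, MPI_COMM_WORLD);
> > MPI_Type_free(&chunk);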
>
> those are all fine approaches to work around the problem.  the internals
> of MPICH, though, need to be 64-bit clean -- there are still 4500 places
> where clang warns of a 64-bit value being assigned to a 32-bit type.
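>
> The typical offender is a pattern like this (a generic illustration,
> not a specific line from the tree): a byte count computed in 64 bits
> stored back into an int.
>
> MPI_Aint total = (MPI_Aint)comm_size * count * type_size;
> int sendcnt = total;  /* silently truncated to 32 bits; wraps
>                          negative once the total reaches 2 GiB */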
>
> Florian Willich, what version of MPICH is this?  The line numbers in
> the backtrace don't match up with what I've got, and
> I really thought we fixed this class of bug with commits 31d95ed7b18c
> and 68f8c7aa7 over the summer.
>
> ==rob
>
>
> >
> >
> >
> >
> >
> > On Nov 26, 2015, at 12:14 PM, Balaji, Pavan <balaji at anl.gov> wrote:
> >
> >
> > Thanks for reporting.  This looks like an integer-overflow issue, which
> fails when the summation of data elements from all processes is larger than
> INT_MAX (2 billion).  We'll look into it.  I've created a ticket for it,
> and added you as the reporter, so you'll get notified as there are updates.
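> >
> > (For concreteness: in the 2 MB-per-rank run below, the tree algorithm
> > presumably reaches a step where rank 1024 forwards the lower half of
> > the data, i.e. 1024 ranks x 524288 ints x 4 bytes = 2,147,483,648
> > bytes = 2^31, which read as a signed 32-bit integer is exactly the
> > -2147483648 in the error stack.  The 3 MB runs give 1024 x 786432 x 4
> > = 3,221,225,472 -> -1073741824 and 1023 x 786432 x 4 = 3,218,079,744
> > -> -1076887552, matching the other two traces.)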
> >
> > http://trac.mpich.org/projects/mpich/ticket/2317
> >
> > Rob: can you create a simple test program for this and add it to the
> test bucket, so it shows up on the nightlies?
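> >
> > Something along these lines should do it; a sketch using Florian's
> > failing configuration from below, not the actual test for the suite:
> >
> > #include <mpi.h>
> > #include <stdlib.h>
> >
> > #define COUNT 524288                /* 2 MB of MPI_INT per rank */
> >
> > int main(int argc, char **argv)
> > {
> >     int rank, size;
> >     int *sbuf, *rbuf = NULL;
> >
> >     MPI_Init(&argc, &argv);
> >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >     MPI_Comm_size(MPI_COMM_WORLD, &size);
> >
> >     sbuf = calloc(COUNT, sizeof(int));
> >     if (rank == 0)  /* >4 GiB at 2048 ranks: do the size math in 64 bits */
> >         rbuf = malloc((size_t)size * COUNT * sizeof(int));
> >
> >     MPI_Gather(sbuf, COUNT, MPI_INT, rbuf, COUNT, MPI_INT,
> >                0, MPI_COMM_WORLD);
> >
> >     free(sbuf);
> >     free(rbuf);
> >     MPI_Finalize();
> >     return 0;
> > }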
> >
> > Thanks,
> >
> >   -- Pavan
> >
> > On Nov 26, 2015, at 10:18 AM, Florian.Willich at dlr.de wrote:
> >
> > Dear mpich discussion group,
> >
> > the following issue appeared when running some benchmarks with MPI
> Gather:
> >
> > Gathering data (calling MPI_Gather(...)) involving 2048 processes and 2
> MB of data (4096 MB total) that each process sends fails with the following
> output:
> > ____________________________
> >
> > Rank 1024 [Thu Nov 26 09:43:16 2015] [c1-0c1s12n3] Fatal error in
> PMPI_Gather: Invalid count, error stack:
> > PMPI_Gather(959)......: MPI_Gather(sbuf=0x2aaab826c010, scount=524288,
> MPI_INT, rbuf=(nil), rcount=524288, MPI_INT, root=0, MPI_COMM_WORLD) failed
> > MPIR_Gather_impl(775).:
> > MPIR_Gather(735)......:
> > MPIR_Gather_intra(347):
> > MPIC_Send(360)........: Negative count, value is -2147483648
> > _pmiu_daemon(SIGCHLD): [NID 00307] [c1-0c1s12n3] [Thu Nov 26 09:43:16
> 2015] PE RANK 1024 exit signal Aborted
> > [NID 00307] 2015-11-26 09:43:16 Apid 949450: initiated application
> termination
> > Application 949450 exit codes: 134
> > Application 949450 exit signals: Killed
> > Application 949450 resources: utime ~1s, stime ~137s, Rss ~2110448,
> inblocks ~617782, outblocks ~1659320
> > ____________________________
> >
> > The following are some tests that I ran to better understand the problem:
> >
> > 2047 processes - 2 MB (4094 MB total) -> works!
> >
> > 2048 processes - 2047.5 KB (~1.999512 MB) (4095 MB total) -> works!
> >
> > 2048 processes - 3 MB (6144 MB total) -> fails:
> > ____________________________
> >
> > Rank 1024 [Thu Nov 26 09:41:15 2015] [c1-0c1s12n3] Fatal error in
> PMPI_Gather: Invalid count, error stack:
> > PMPI_Gather(959)......: MPI_Gather(sbuf=0x2aaab826c010, scount=786432,
> MPI_INT, rbuf=(nil), rcount=786432, MPI_INT, root=0, MPI_COMM_WORLD) failed
> > MPIR_Gather_impl(775).:
> > MPIR_Gather(735)......:
> > MPIR_Gather_intra(347):
> > MPIC_Send(360)........: Negative count, value is -1073741824
> > _pmiu_daemon(SIGCHLD): [NID 00307] [c1-0c1s12n3] [Thu Nov 26 09:41:15
> 2015] PE RANK 1024 exit signal Aborted
> > [NID 00307] 2015-11-26 09:41:15 Apid 949448: initiated application
> termination
> > Application 949448 exit codes: 134
> > Application 949448 exit signals: Killed
> > Application 949448 resources: utime ~1s, stime ~139s, Rss ~3159984,
> inblocks ~617782, outblocks ~1659351
> > ____________________________
> >
> > 2047 processes - 3 MB (6141 MB total) -> fails:
> > ____________________________
> >
> > Rank 1024 [Thu Nov 26 09:40:31 2015] [c1-0c1s12n3] Fatal error in
> PMPI_Gather: Invalid count, error stack:
> > PMPI_Gather(959)......: MPI_Gather(sbuf=0x2aaab826c010, scount=786432,
> MPI_INT, rbuf=(nil), rcount=786432, MPI_INT, root=0, MPI_COMM_WORLD) failed
> > MPIR_Gather_impl(775).:
> > MPIR_Gather(735)......:
> > MPIR_Gather_intra(347):
> > MPIC_Send(360)........: Negative count, value is -1076887552
> > _pmiu_daemon(SIGCHLD): [NID 00307] [c1-0c1s12n3] [Thu Nov 26 09:40:32
> 2015] PE RANK 1024 exit signal Aborted
> > [NID 00307] 2015-11-26 09:40:32 Apid 949446: initiated application
> termination
> > Application 949446 exit codes: 134
> > Application 949446 exit signals: Killed
> > Application 949446 resources: utime ~1s, stime ~134s, Rss ~3157072,
> inblocks ~617780, outblocks ~1659351
> > ____________________________
> >
> > 8 processes - 625 MB (5000 MB total) -> works!
> >
> > I can think of some pitfalls that might cause this issue, but I do not
> have the knowledge of the internally called routines to check them. Is
> someone familiar with the implementation of MPI_Gather(...) and willing to
> help me?
> >
> > Best regards
> >
> > Florian
> >
> > Deutsches Zentrum für Luft- und Raumfahrt e. V. (DLR)
> > German Aerospace Center
> > Institute of Planetary Research | Planetary Physics | Rutherfordstraße 2
> | 12489 Berlin
> >
> > Florian Willich | Intern - Software Developer (Parallel Applications)
> > florian.willich at dlr.de
> > DLR.de
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
>



-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

