[mpich-discuss] MPI_Gather fails with 2048 processes and 4096 MB total

Jeff Hammond jeff.science at gmail.com
Wed Dec 2 09:28:20 CST 2015


This was fixed after Cray forked MPICH 3.1.2.  You should file a support
issue with Cray via the appropriate channel.

# master
jrhammon-mac01:git jrhammon$ git grep MPIC_Send
src/include/mpiimpl.h:int MPIC_Send(const void *buf, MPI_Aint count,
MPI_Datatype datatype, int dest, int tag,
...

# 3.1 branch
jrhammon-mac01:git jrhammon$ git checkout 3.1.x
jrhammon-mac01:git jrhammon$ git grep MPIC_Send
src/include/mpiimpl.h:int MPIC_Send(const void *buf, int count,
MPI_Datatype datatype, int dest, int tag,
...
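
For what it's worth, the arithmetic of the failure is consistent with that
int count.  Here is a minimal standalone sketch (plain arithmetic, not MPICH
internals; it only needs mpicc for the MPI_Aint typedef) of how the byte
count for roughly half the total payload truncates to the -2147483648
reported below:

#include <stdio.h>
#include <mpi.h>      /* only for the MPI_Aint typedef */

int main(void)
{
    /* numbers from the failing run: presumably an intermediate rank
     * forwards the data of 1024 ranks, each 524288 4-byte MPI_INTs */
    int ranks = 1024, count = 524288, type_size = 4;

    MPI_Aint bytes64 = (MPI_Aint)ranks * count * type_size;
    int      bytes32 = bytes64;   /* same 64-to-32 truncation as the old
                                     int-count interface */

    printf("64-bit byte count: %lld\n", (long long)bytes64); /* 2147483648  */
    printf("32-bit byte count: %d\n",   bytes32);            /* -2147483648 */
    return 0;
}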

Best,

Jeff

On Wed, Dec 2, 2015 at 6:40 AM, <Florian.Willich at dlr.de> wrote:
>
> Thanks for the "bash line", the output is:
>
> $ grep "define MPICH_VERSION" $CRAY_MPICH2_DIR/include/mpi.h
> #define MPICH_VERSION "3.1.2"
>
> Florian
>
> ________________________________
> From: Jeff Hammond [jeff.science at gmail.com]
> Sent: Wednesday, 2 December 2015 14:58
> To: MPICH
>
> Subject: Re: [mpich-discuss] MPI_Gather fails with 2048 processes and
4096 MB total
>
> 1) You should address this with Cray.  Their implementation of MPI has
closed-source modifications to MPICH that we cannot analyze for
count-safety.
>
> 2) You can discover the version of MPICH associated with Cray MPI like
this:
> jhammond at cori11:~> grep "define MPICH_VERSION"
$CRAY_MPICH2_DIR/include/mpi.h
> #define MPICH_VERSION "3.1.2"
>
> Best,
>
> Jeff
>
> On Wed, Dec 2, 2015 at 12:15 AM, <Florian.Willich at dlr.de> wrote:
>>
>> Hi Rob,
>>
>> well, maybe I was addressing the wrong organisation... I am currently
testing on the Cray Swan supercomputer, which provides the module
cray-mpich/7.2.6 ("Cray Message Passing Toolkit 7.2.6").
>>
>> I cannot tell whether Cray's MPICH is MPICH with additional
modifications or something entirely different from the official MPICH
releases. Additionally, I cannot figure out which MPICH version this
cray-mpich module is based on. I'll continue investigating and keep you
updated.
>>
>> Best Regards
>>
>> Florian
>> ________________________________________
>> From: Rob Latham [robl at mcs.anl.gov]
>> Sent: Tuesday, 1 December 2015 16:48
>> To: discuss at mpich.org
>> Subject: Re: [mpich-discuss] MPI_Gather fails with 2048 processes and
4096 MB total
>>
>> On 11/26/2015 12:38 PM, Archer, Charles J wrote:
>> > FYI, we hit various flavors of this problem when I was still at IBM, I
think mostly in weather codes.
>> > Apparently Cray hit this too:
>> >
>> > https://trac.mpich.org/projects/mpich/ticket/1767
>> >
>> > We pretty much told our customers back then that a fix was forthcoming
(with no ETA :) ) with the revamp of datatypes to use internal 64-bit counts.
>> > We also provided workarounds.
>> >
>> > In the case of this gather operation, we asked the customer to
implement gather as a flat tree using point-to-point.
>> > Root posts irecvs, then barrier, children send to root.
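>> >
>> > A rough sketch of that workaround (from memory; hypothetical name
>> > flat_gather, assumes sendcount == recvcount, error checking omitted).
>> > The barrier just guarantees the receives are preposted before the big
>> > sends start, and each message is only one rank's block, so no count
>> > comes anywhere near INT_MAX:
>> >
>> > #include <mpi.h>
>> > #include <stdlib.h>
>> >
>> > int flat_gather(const void *sendbuf, int count, MPI_Datatype dtype,
>> >                 void *recvbuf, int root, MPI_Comm comm)
>> > {
>> >     int rank, size;
>> >     MPI_Comm_rank(comm, &rank);
>> >     MPI_Comm_size(comm, &size);
>> >
>> >     if (rank == root) {
>> >         MPI_Aint lb, extent;
>> >         MPI_Type_get_extent(dtype, &lb, &extent);
>> >         MPI_Request *reqs = malloc(size * sizeof(*reqs));
>> >
>> >         /* prepost one receive per rank, including the root's own block */
>> >         for (int i = 0; i < size; i++) {
>> >             char *dst = (char *)recvbuf + (MPI_Aint)i * count * extent;
>> >             MPI_Irecv(dst, count, dtype, i, 0, comm, &reqs[i]);
>> >         }
>> >         MPI_Barrier(comm);
>> >         /* root's own contribution lands in the preposted self-receive */
>> >         MPI_Send(sendbuf, count, dtype, root, 0, comm);
>> >         MPI_Waitall(size, reqs, MPI_STATUSES_IGNORE);
>> >         free(reqs);
>> >     } else {
>> >         MPI_Barrier(comm);          /* root's receives are now posted */
>> >         MPI_Send(sendbuf, count, dtype, root, 0, comm);
>> >     }
>> >     return MPI_SUCCESS;
>> > }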
>> >
>> > IIRC, the giant gather we were debugging was at the very end of the
application and used to gather some statistics for IO at the root, so it
wasn’t critical to perform well.
>> > I also attempted a workaround using some derived datatypes, but I hit
another truncation in the datatype code itself :\
>> > I should see if I can dig up that implementation and make sure it
isn’t still broken for large counts.
>>
>> Those are all fine approaches to work around the problem.  The internals
>> of MPICH, though, need to be 64-bit clean -- there are still 4500 places
>> where clang warns of a 64-bit value being assigned to a 32-bit type.
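>>
>> (The typical shape of those, with made-up names:
>>
>>     static int narrowed(MPI_Aint nbytes)
>>     {
>>         int count = nbytes;   /* 64-bit value silently narrowed to a
>>                                  32-bit int; goes negative past INT_MAX */
>>         return count;
>>     }
>>
>> harmless right up until nbytes really is larger than INT_MAX.)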
>>
>> Florian Willich, what version of MPICH is this?   The line numbers in
>> the back trace don't match up with what I've got, and
>> I really thought we fixed this class of bug with commits 31d95ed7b18c
>> and 68f8c7aa7 over the summer.
>>
>> ==rob
>>
>>
>> >
>> >
>> >
>> >
>> >
>> > On Nov 26, 2015, at 12:14 PM, Balaji, Pavan <balaji at anl.gov> wrote:
>> >
>> >
>> > Thanks for reporting.  This looks like an integer-overflow issue,
which fails when the summation of data elements from all processes is
larger than INT_MAX (2 billion).  We'll look into it.  I've created a
ticket for it, and added you as the reporter, so you'll get notified as
there are updates.
>> >
>> > http://trac.mpich.org/projects/mpich/ticket/2317
>> >
>> > Rob: can you create a simple test program for this and add it to the
test bucket, so it shows up on the nightlies?
>> >
>> > Thanks,
>> >
>> >   -- Pavan
>> >
>> > On Nov 26, 2015, at 10:18 AM, Florian.Willich at dlr.de wrote:
>> >
>> > Dear mpich discussion group,
>> >
>> > The following issue appeared when running some benchmarks with
MPI_Gather:
>> >
>> > Gathering data (calling MPI_Gather(...)) involving 2048 processes, with
each process sending 2 MB of data (4096 MB total), fails with the following
output:
>> > ____________________________
>> >
>> > Rank 1024 [Thu Nov 26 09:43:16 2015] [c1-0c1s12n3] Fatal error in
PMPI_Gather: Invalid count, error stack:
>> > PMPI_Gather(959)......: MPI_Gather(sbuf=0x2aaab826c010, scount=524288,
MPI_INT, rbuf=(nil), rcount=524288, MPI_INT, root=0, MPI_COMM_WORLD) failed
>> > MPIR_Gather_impl(775).:
>> > MPIR_Gather(735)......:
>> > MPIR_Gather_intra(347):
>> > MPIC_Send(360)........: Negative count, value is -2147483648
>> > _pmiu_daemon(SIGCHLD): [NID 00307] [c1-0c1s12n3] [Thu Nov 26 09:43:16
2015] PE RANK 1024 exit signal Aborted
>> > [NID 00307] 2015-11-26 09:43:16 Apid 949450: initiated application
termination
>> > Application 949450 exit codes: 134
>> > Application 949450 exit signals: Killed
>> > Application 949450 resources: utime ~1s, stime ~137s, Rss ~2110448,
inblocks ~617782, outblocks ~1659320
>> > ____________________________
>> >
>> > The following are some tests that I ran to better understand the
problem:
>> >
>> > 2047 processes - 2 MB (4094 MB total) -> works!
>> >
>> > 2048 processes - 2047.5 KB (~1.999512 MB) (4095 MB total) -> works!
>> >
>> > 2048 processes - 3 MB (6144 MB total) -> fails:
>> > ____________________________
>> >
>> > Rank 1024 [Thu Nov 26 09:41:15 2015] [c1-0c1s12n3] Fatal error in
PMPI_Gather: Invalid count, error stack:
>> > PMPI_Gather(959)......: MPI_Gather(sbuf=0x2aaab826c010, scount=786432,
MPI_INT, rbuf=(nil), rcount=786432, MPI_INT, root=0, MPI_COMM_WORLD) failed
>> > MPIR_Gather_impl(775).:
>> > MPIR_Gather(735)......:
>> > MPIR_Gather_intra(347):
>> > MPIC_Send(360)........: Negative count, value is -1073741824
>> > _pmiu_daemon(SIGCHLD): [NID 00307] [c1-0c1s12n3] [Thu Nov 26 09:41:15
2015] PE RANK 1024 exit signal Aborted
>> > [NID 00307] 2015-11-26 09:41:15 Apid 949448: initiated application
termination
>> > Application 949448 exit codes: 134
>> > Application 949448 exit signals: Killed
>> > Application 949448 resources: utime ~1s, stime ~139s, Rss ~3159984,
inblocks ~617782, outblocks ~1659351
>> > ____________________________
>> >
>> > 2047 processes - 3 MB (6141 MB total) -> fails:
>> > ____________________________
>> >
>> > Rank 1024 [Thu Nov 26 09:40:31 2015] [c1-0c1s12n3] Fatal error in
PMPI_Gather: Invalid count, error stack:
>> > PMPI_Gather(959)......: MPI_Gather(sbuf=0x2aaab826c010, scount=786432,
MPI_INT, rbuf=(nil), rcount=786432, MPI_INT, root=0, MPI_COMM_WORLD) failed
>> > MPIR_Gather_impl(775).:
>> > MPIR_Gather(735)......:
>> > MPIR_Gather_intra(347):
>> > MPIC_Send(360)........: Negative count, value is -1076887552
>> > _pmiu_daemon(SIGCHLD): [NID 00307] [c1-0c1s12n3] [Thu Nov 26 09:40:32
2015] PE RANK 1024 exit signal Aborted
>> > [NID 00307] 2015-11-26 09:40:32 Apid 949446: initiated application
termination
>> > Application 949446 exit codes: 134
>> > Application 949446 exit signals: Killed
>> > Application 949446 resources: utime ~1s, stime ~134s, Rss ~3157072,
inblocks ~617780, outblocks ~1659351
>> > ____________________________
>> >
>> > 8 processes - 625 MB (5000 MB total) -> works!
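>> >
>> > For reference, the benchmark essentially boils down to the following
>> > (a sketch, not my exact code; timing and error checking omitted):
>> >
>> > #include <mpi.h>
>> > #include <stdlib.h>
>> >
>> > int main(int argc, char **argv)
>> > {
>> >     MPI_Init(&argc, &argv);
>> >
>> >     int rank, size;
>> >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>> >     MPI_Comm_size(MPI_COMM_WORLD, &size);
>> >
>> >     const int count = 524288;         /* 2 MB of MPI_INT per process */
>> >     int *sbuf = malloc(count * sizeof(int));   /* contents irrelevant */
>> >     int *rbuf = NULL;                 /* only significant at the root */
>> >     if (rank == 0)
>> >         rbuf = malloc((size_t)size * count * sizeof(int));
>> >
>> >     MPI_Gather(sbuf, count, MPI_INT, rbuf, count, MPI_INT, 0,
>> >                MPI_COMM_WORLD);
>> >
>> >     free(sbuf);
>> >     free(rbuf);
>> >     MPI_Finalize();
>> >     return 0;
>> > }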
>> >
>> > I can think of some pitfalls that might cause this issue, but I do not
know the internally called routines well enough to check them. Is someone
familiar with the implementation of MPI_Gather(...) and willing to help me?
>> >
>> > Best regards
>> >
>> > Florian
>> >
>> > Deutsches Zentrum für Luft- und Raumfahrt e. V. (DLR)
>> > German Aerospace Center
>> > Institute of Planetary Research | Planetary Physics | Rutherfordstraße
2 | 12489 Berlin
>> >
>> > Florian Willich| Intern - Software Developer (Parallel Applications)
>> > florian.willlich at dlr.de
>> > DLR.de
>>
>> --
>> Rob Latham
>> Mathematics and Computer Science Division
>> Argonne National Lab, IL USA
>
>
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
>




--
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

