[mpich-devel] Fwd: MPI_Comm_Split/Dup scalability on BGQ and K supercomputers
Jeff Hammond
jeff.science at gmail.com
Thu May 29 15:41:49 CDT 2014
On Thu, May 29, 2014 at 2:41 PM, Rob Latham <robl at mcs.anl.gov> wrote:
>
>
> On 05/29/2014 02:21 PM, Kenneth Raffenetti wrote:
>>
>> I do not [know where the source code to IBM's MPI
>> implementation on Blue Gene is]. RobL should know. I imagine you have to
>> add it via softenv?
>
>
> Don't be shy, man! These messages should all be on the mpich-devel list,
> which I'm looping back in now.
>
> - You can add the mpich-ibm repository and then clone it with git. It's a bit
> of a mess, since V1R1M1 and V1R2M1 have different directory structures.
>
> - You can browse the source online. Mira is now (finally!) running V1R2M1,
> so follow the BGQ/IBM_V1R2M1/efix15 tag:
>
> http://git.mpich.org/mpich-ibm.git/tree/81ec186864f8cfe69ed9d99dfbd2d5ffd423e6cd:/mpich2
If you want the canonical MPI BGQ source that IBM packages in the
driver, see https://wiki.alcf.anl.gov/parts/index.php/Blue_Gene/Q#Source_Code

Jeff
>> On 05/29/2014 02:15 PM, Junchao Zhang wrote:
>>>
>>> Ken,
>>> Do you know where the IBM MPI source is on Mira?
>>>
>>> --Junchao Zhang
>>>
>>>
>>> ---------- Forwarded message ----------
>>> From: *Thakur, Rajeev* <thakur at mcs.anl.gov <mailto:thakur at mcs.anl.gov>>
>>> Date: Thu, May 29, 2014 at 2:11 PM
>>> Subject: Re: [mpich-devel] MPI_Comm_Split/Dup scalability on BGQ and K
>>> supercomputers
>>> To: "Zhang, Junchao" <jczhang at mcs.anl.gov <mailto:jczhang at mcs.anl.gov>>
>>> Cc: "Balaji, Pavan" <balaji at anl.gov <mailto:balaji at anl.gov>>
>>>
>>>
>>> Can you look at the source code of IBM MPI on Mira to see what they are
>>> doing? I don't know if the source code is in our repo or somewhere else,
>>> but Rob Latham should know.
>>>
>>> Also, there is a bug in Comm_split in the latest version of MPI
>>> installed on Mira. There was a lot of discussion on the early-users
>>> mailing list. Rob Latham knows about it, and he can tell you what the
>>> bug is about.
>>>
>>> Rajeev
>>>
>>>
>>> On May 29, 2014, at 1:49 PM, Junchao Zhang <jczhang at mcs.anl.gov
>>> <mailto:jczhang at mcs.anl.gov>>
>>> wrote:
>>>
>>> > I did more experiments on Mira, but I feel I am lost :) In short, I
>>> think the original MPICH comm_split implementation is good, if not the
>>> best. However, for unknown reasons, the MPI library provided on Mira
>>> performs badly.
>>> > I wrote three micro-benchmarks: 1) qsort.c, which is a sequential C
>>> code, to test the libc qsort performance when sorting a presorted array;
>>> 2) split.c, which measures MPI_Comm_split(MPI_COMM_WORLD, 0, myrank,
>>> ...). Note that all processes have the same color and use their old
>>> rank as the key. The maximum time across processes is reported. 3) dup.c,
>>> which simply does MPI_Comm_dup(MPI_COMM_WORLD, &newcomm).
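For illustration, a minimal split.c-style benchmark might look like the sketch below (an assumed reconstruction, not the exact file used above; a dup.c variant would simply replace the MPI_Comm_split call with MPI_Comm_dup(MPI_COMM_WORLD, &newcomm)):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int myrank;
    double t, tmax;
    MPI_Comm newcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    MPI_Barrier(MPI_COMM_WORLD);
    t = MPI_Wtime();
    /* every process uses the same color (0) and its old rank as the key */
    MPI_Comm_split(MPI_COMM_WORLD, 0, myrank, &newcomm);
    t = MPI_Wtime() - t;

    /* report the maximum time across processes */
    MPI_Reduce(&t, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (myrank == 0)
        printf("MPI_Comm_split: %f seconds (max over ranks)\n", tmax);

    MPI_Comm_free(&newcomm);
    MPI_Finalize();
    return 0;
}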
>>> >
>>> > I compiled the code using either mpixlc, which is provided by
>>> +mpiwrapper-gcc on Mira, or mpicc, which is built from mpich/master by
>>> myself, or mpicc-mergesort, which is also built by myself with a
>>> mergesort comm_split.c
>>> >
>>> > QSORT: compiled with mpixlc, run with 1 core on 1 node (also run
>>> with 16 cores simultaneously on 1 node, which gives even better
>>> results). The results indicate the libc qsort linked by mpixlc is good.
>>> > # integers time (s)
>>> > 64000 0.110789
>>> > 125000 0.200747
>>> > 512000 0.445075
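For reference, a qsort.c-style test could be as simple as the following sketch (assumed, not the original file): fill an array with already-sorted integers and time libc qsort on it.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(int argc, char **argv)
{
    int n = (argc > 1) ? atoi(argv[1]) : 64000;   /* e.g. 64000, 125000, 512000 */
    int *a = malloc(n * sizeof(int));
    clock_t t0, t1;
    int i;

    for (i = 0; i < n; i++)
        a[i] = i;                                 /* presorted input */

    t0 = clock();
    qsort(a, n, sizeof(int), cmp_int);
    t1 = clock();

    printf("%d integers: %f seconds\n", n, (double)(t1 - t0) / CLOCKS_PER_SEC);
    free(a);
    return 0;
}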
>>> >
>>> > COMM_SPLIT: compiled with mpixlc. (Note the results match Sam's.)
>>> > # ranks time (s)
>>> > 64000 2.297776
>>> > 125000 8.157885
>>> > 512000 237.079862
>>> >
>>> > COMM_DUP: compiled with mpixlc. (Note the results match Sam's.)
>>> > # ranks      time (s)
>>> > 64000 2.057966
>>> > 125000 7.808858
>>> > 512000 235.948144
>>> >
>>> > COMM_SPLIT: compiled with mpicc, run twice. The variation is large
>>> at 512,000.
>>> > # ranks      run 1 (s)    run 2 (s)
>>> > 64000 0.549024 0.545003
>>> > 125000 0.959198 0.965059
>>> > 512000 6.625385 3.716696
>>> >
>>> > COMM_DUP: compiled with mpicc
>>> > # ranks      time (s)
>>> > 64000 0.028518
>>> > 125000 0.056386
>>> > 512000 0.202677
>>> >
>>> > COMM_SPLIT: compiled with mpicc-mergesort
>>> > # ranks      time (s)
>>> > 64000 0.362572
>>> > 125000 0.712167
>>> > 512000 2.929134
>>> >
>>> > COMM_DUP: compiled with mpicc-mergesort
>>> > # ranks      time (s)
>>> > 64000 0.028835
>>> > 125000 0.055987
>>> > 512000 0.202660
>>> >
>>> > (Another difference worth mentioning is that the mpixlc version reports
>>> "MPICH2 version 1.5". I looked at the log history of comm_split.c;
>>> after MPICH2 1.5, there are no big changes.)
>>> > We can see that Comm_dup performance is independent of Comm_split, and
>>> mpicc-mergesort is slightly better than mpicc. It may still be worth
>>> pushing mergesort to mpich.
>>> >
>>> > I have no idea why mpixlc is bad. Should I report it to ALCF?
>>> >
>>> > --Junchao Zhang
>>> >
>>> >
>>> > On Wed, May 28, 2014 at 7:34 AM, Junchao Zhang <jczhang at mcs.anl.gov
>>> <mailto:jczhang at mcs.anl.gov>> wrote:
>>> > Fixed them. Thanks.
>>> >
>>> > --Junchao Zhang
>>> >
>>> >
>>> > On Tue, May 27, 2014 at 11:02 PM, Thakur, Rajeev <thakur at mcs.anl.gov
>>> <mailto:thakur at mcs.anl.gov>> wrote:
>>> > Junchao,
>>> > Your code looks fine, although I didn't check the merge
>>> sort in detail for potential bugs. You may want to add a comment explaining
>>> that the 2x in the memory allocation for keytable is because the second
>>> half of it will be used as scratch space for the merge sort. Otherwise,
>>> someone looking at the code 3 years from now will miss it :-). Also, a
>>> typo in one comment: sortting -> sorting.
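To make that comment concrete, here is a small self-contained sketch of the layout Rajeev describes (the names splittype and merge_sort are illustrative, not MPICH's comm_split.c verbatim): keytable gets 2*n entries from a single malloc, and the merge sort only writes into the second half as scratch space.

#include <stdio.h>
#include <stdlib.h>

typedef struct { int key; int rank; } splittype;

/* merge sort by key; ties keep their original order (stable); scratch must
 * hold at least n entries */
static void merge_sort(splittype *a, splittype *scratch, int n)
{
    int half = n / 2, i = 0, j = n / 2, k = 0;
    if (n <= 1)
        return;
    merge_sort(a, scratch, half);
    merge_sort(a + half, scratch, n - half);
    while (i < half && j < n)
        scratch[k++] = (a[i].key <= a[j].key) ? a[i++] : a[j++];
    while (i < half)
        scratch[k++] = a[i++];
    while (j < n)
        scratch[k++] = a[j++];
    for (k = 0; k < n; k++)
        a[k] = scratch[k];
}

int main(void)
{
    int n = 8, i;
    /* one malloc of 2*n entries: the first n hold the {key, rank} pairs,
     * the second n are used only as merge-sort scratch space */
    splittype *keytable = malloc(2 * n * sizeof(splittype));
    splittype *scratch = keytable + n;

    for (i = 0; i < n; i++) {
        keytable[i].key = n - i;
        keytable[i].rank = i;
    }
    merge_sort(keytable, scratch, n);

    for (i = 0; i < n; i++)
        printf("key=%d old rank=%d\n", keytable[i].key, keytable[i].rank);
    free(keytable);   /* a single free releases both halves */
    return 0;
}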
>>> >
>>> > Rajeev
>>> >
>>> >
>>> > On May 27, 2014, at 3:42 PM, Junchao Zhang <jczhang at mcs.anl.gov
>>> <mailto:jczhang at mcs.anl.gov>> wrote:
>>> >
>>> > > Here is the code.
>>> > > We do not just sort keys. We also want to keep track of old ranks
>>> for these keys. So we actually sort an array of {key, rank}.
>>> > >
>>> > > --Junchao Zhang
>>> > >
>>> > >
>>> > > On Tue, May 27, 2014 at 12:58 PM, Junchao Zhang
>>> <jczhang at mcs.anl.gov <mailto:jczhang at mcs.anl.gov>> wrote:
>>> > > Downloaded mpich2-1.3.1 and checked it. It uses bubble sort,
>>> which is a stable sort algorithm and needs no extra buffer.
>>> > > Merge sort is also stable, but needs O(N) extra memory. I can allocate
>>> the array and the scratch together in one malloc instead of two separate
>>> mallocs. Let me do that.
>>> > >
>>> > > --Junchao Zhang
>>> > >
>>> > >
>>> > > On Tue, May 27, 2014 at 12:46 PM, Thakur, Rajeev
>>> <thakur at mcs.anl.gov <mailto:thakur at mcs.anl.gov>> wrote:
>>> > > The version I suggested (1.4) was not old enough. It has the new
>>> code with quick sort, not the old code with bubble sort. Instead, see
>>> 1.3.1 (http://www.mpich.org/static/downloads/1.3.1/). I confirmed that
>>> it has the old bubble sort. I believe it used O(p) less memory as well
>>> but I didn't recheck just now.
>>> > >
>>> > > Rajeev
>>> > >
>>> > >
>>> > >
>>> > > On May 27, 2014, at 12:38 PM, Junchao Zhang <jczhang at mcs.anl.gov
>>> <mailto:jczhang at mcs.anl.gov>>
>>> > > wrote:
>>> > >
>>> > > > I read the old code. It is not efficient. The code gathers {key,
>>> color} pairs (the color field is in effect later used to store old ranks)
>>> and puts them in an array. Actually, that is enough info for a stable sort
>>> (i.e., using old ranks for ties). But the old code did not realize that
>>> and created a new type, splittype {key, color, orig_idx}. It assigns
>>> orig_idx the index of the element in the array and uses that for ties.
>>> This is the O(p) memory you mentioned. We can say it adds 50% memory use.
>>> > > > For merge sort, the buffer has to be standalone (i.e., not mixed
>>> with old data). So I deleted the orig_idx field from splittype and
>>> allocated a buffer as big as the original array, in effect adding 100%
>>> memory use.
>>> > > > I think we can have a fast qsort with 0% extra memory for our
>>> case. I will do that experiment later.
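A minimal sketch of that tie-breaking point (illustrative only, not the actual patch): since each gathered entry already carries the old rank, comparing by key and then by old rank yields a total order, so no orig_idx field is needed even when the underlying sort is unstable.

#include <stdio.h>
#include <stdlib.h>

typedef struct { int key; int rank; } splittype;   /* {key, old rank} */

/* sort by key; equal keys fall back to the old rank, which is unique */
static int cmp_key_then_rank(const void *a, const void *b)
{
    const splittype *x = a, *y = b;
    if (x->key != y->key)
        return (x->key > y->key) - (x->key < y->key);
    return (x->rank > y->rank) - (x->rank < y->rank);
}

int main(void)
{
    splittype table[4] = { {1, 2}, {0, 3}, {1, 0}, {0, 1} };
    int i;
    qsort(table, 4, sizeof(splittype), cmp_key_then_rank);
    for (i = 0; i < 4; i++)
        printf("key=%d old rank=%d\n", table[i].key, table[i].rank);
    return 0;
}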
>>> > > >
>>> > > > --Junchao Zhang
>>> > > >
>>> > > >
>>> > > > On Tue, May 27, 2014 at 11:17 AM, Thakur, Rajeev
>>> <thakur at mcs.anl.gov <mailto:thakur at mcs.anl.gov>> wrote:
>>> > > > Junchao,
>>> > > > As far as I remember, when we switched from bubble
>>> sort to quicksort, we had to allocate additional O(nprocs) memory for
>>> the trick to make the qsort stable. You can check the old code in a
>>> release from 2011 (say 1.4 from
>>> http://www.mpich.org/static/downloads/1.4/). It would be good if we could
>>> avoid that additional memory allocation in the new sort. And if, in the
>>> merge sort, you have added one more allocation, that should definitely be
>>> avoided.
>>> > > >
>>> > > > Rajeev
>>> > > >
>>> > > >
>>> > > > On May 27, 2014, at 9:40 AM, Junchao Zhang <jczhang at mcs.anl.gov
>>> <mailto:jczhang at mcs.anl.gov>> wrote:
>>> > > >
>>> > > > > Rajeev,
>>> > > > > The code is at mpich-review/commsplit. I pushed it there
>>> several days ago and the MPICH test results look good. Could you review it?
>>> > > > > Yes, I will look into why Comm_dup improves so much.
>>> > > > > Another thing I want to investigate is this: in Comm_split, we
>>> sort {key, old rank}, i.e., an integer pair. So basically, we already have
>>> a mechanism to avoid qsort's instability. We can copy an
>>> optimized qsort instead of using the one in libc, and test its
>>> performance. The benefit of qsort is that we can avoid the extra buffer
>>> needed in mergesort. Using a customized qsort, we can also avoid
>>> function pointer dereferencing. In libc qsort, one provides a generic
>>> comp function pointer. But our case is simple: we just want to compare
>>> integers, so we can avoid the function call cost for each comparison.
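A sketch of what such a customized sort could look like (a hand-written quicksort with the integer-pair comparison inlined; illustrative only, not a proposed patch):

#include <stdio.h>

typedef struct { int key; int rank; } splittype;

/* compare by key, break ties by old rank, with no comp function pointer */
#define PAIR_LT(a, b) \
    ((a).key < (b).key || ((a).key == (b).key && (a).rank < (b).rank))

static void qsort_pairs(splittype *a, int lo, int hi)
{
    while (lo < hi) {
        splittype p = a[lo + (hi - lo) / 2];   /* middle-element pivot */
        int i = lo, j = hi;
        do {
            while (PAIR_LT(a[i], p)) i++;
            while (PAIR_LT(p, a[j])) j--;
            if (i <= j) {
                splittype t = a[i]; a[i] = a[j]; a[j] = t;
                i++; j--;
            }
        } while (i <= j);
        /* recurse on the smaller part, iterate on the larger one */
        if (j - lo < hi - i) { qsort_pairs(a, lo, j); lo = i; }
        else                 { qsort_pairs(a, i, hi); hi = j; }
    }
}

int main(void)
{
    splittype table[5] = { {2, 0}, {0, 1}, {2, 2}, {0, 3}, {1, 4} };
    int i;
    qsort_pairs(table, 0, 4);
    for (i = 0; i < 5; i++)
        printf("key=%d old rank=%d\n", table[i].key, table[i].rank);
    return 0;
}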
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > --Junchao Zhang
>>> > > > >
>>> > > > >
>>> > > > > On Tue, May 27, 2014 at 9:24 AM, Thakur, Rajeev
>>> <thakur at mcs.anl.gov <mailto:thakur at mcs.anl.gov>> wrote:
>>> > > > > Thanks. Have you run your new version through the MPICH test
>>> suite to make sure there are no bugs in the new sort?
>>> > > > >
>>> > > > > Can you look at Comm_dup to see why it improved so much?
>>> > > > >
>>> > > > > BG/Q has a relatively slower processor, so the sorting time is
>>> noticeable, unlike on the NERSC system.
>>> > > > >
>>> > > > > I think we should include it in 3.1.1.
>>> > > > >
>>> > > > > Rajeev
>>> > > > >
>>> > > > > On May 27, 2014, at 9:16 AM, Junchao Zhang <jczhang at mcs.anl.gov
>>> <mailto:jczhang at mcs.anl.gov>>
>>> > > > > wrote:
>>> > > > >
>>> > > > > > I tested Sam's HPGMG on Mira. I tried mpiwrapper-gcc and
>>> mpiwrapper-xl provided by the softenv of Mira, and also tried a
>>> self-built mpich with my own comm_split + mergesort. Here are the test
>>> results.
>>> > > > > >
>>> > > > > > mpiwrapper-gcc and mpiwrapper-xl give similar results, like
>>> the following (all times in seconds):
>>> > > > > > # of ranks comm_dup comm_split MGBuild
>>> > > > > > 64000 2.059304 2.291601 28.085873
>>> > > > > > 125000 7.807612 8.158366 88.644299
>>> > > > > > 512000 235.923766 (no result; the job timed
>>> out)
>>> > > > > >
>>> > > > > > mpich-mergesort
>>> > > > > > # of ranks comm_dup comm_split MGBuild
>>> > > > > > 64000 0.046616 0.716383 6.999050
>>> > > > > > 125000 0.065818 1.293382 10.696351
>>> > > > > > 512000 0.229605 4.998839 52.467708
>>> > > > > >
>>> > > > > > I don't know why the MPICH libraries on Mira are bad, since
>>> optimizations in qsort to avoid the worst case are not new. The libc
>>> library on Mira should have an optimized qsort.
>>> > > > > > Another thing I have not looked into is why comm_dup is
>>> affected by comm_split. It seems comm_dup calls comm_split internally.
>>> > > > > >
>>> > > > > > In short, the current mergesort solution largely fixes the
>>> scaling problem. We may further investigate the issue and come up with
>>> better solutions. But if you think this is a key fix, probably we can
>>> push it to 3.1.1.
>>> > > > > >
>>> > > > > >
>>> > > > > > --Junchao Zhang
>>> > > > > >
>>> > > > > >
>>> > > > > > On Sun, May 25, 2014 at 12:25 PM, Junchao Zhang
>>> <jczhang at mcs.anl.gov <mailto:jczhang at mcs.anl.gov>> wrote:
>>> > > > > > Yes, I am going to test that.
>>> > > > > >
>>> > > > > > --Junchao Zhang
>>> > > > > >
>>> > > > > >
>>> > > > > > On Sun, May 25, 2014 at 11:11 AM, Thakur, Rajeev
>>> <thakur at mcs.anl.gov <mailto:thakur at mcs.anl.gov>> wrote:
>>> > > > > > Can you also see if the xlc compiler makes any difference?
>>> > > > > >
>>> > > > > > Rajeev
>>> > > > > >
>>> > > > > > On May 25, 2014, at 8:14 AM, Junchao Zhang
>>> <jczhang at mcs.anl.gov <mailto:jczhang at mcs.anl.gov>>
>>> > > > > > wrote:
>>>
>>> > > > > >
>>> > > > > > > I tested hpgmg on Mira and reproduced the scaling problem.
>>> I used mpiwrapper-gcc.
>>> > > > > > > With 64000 MPI tasks, the output is:
>>> > > > > > > Duplicating MPI_COMM_WORLD...done (2.059304 seconds)
>>> > > > > > > Building MPI subcommunicators..., level 1...done (2.291601
>>> seconds)
>>> > > > > > > Total time in MGBuild 28.085873 seconds
>>> > > > > > > With 125000 MPI Tasks, the output is:
>>> > > > > > > Duplicating MPI_COMM_WORLD...done (7.807612 seconds)
>>> > > > > > > Building MPI subcommunicators..., level 1...done (8.158366
>>> seconds)
>>> > > > > > > Total time in MGBuild 88.644299 seconds
>>> > > > > > >
>>> > > > > > > Let me replace qsort with mergesort in Comm_split to see
>>> what will happen.
>>> > > > > > >
>>> > > > > > > --Junchao Zhang
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > On Sat, May 24, 2014 at 9:07 AM, Sam Williams
>>> <swwilliams at lbl.gov <mailto:swwilliams at lbl.gov>> wrote:
>>> > > > > > > I saw the problem on Mira and K but not Edison. I don't
>>> know if that is due to scale or implementation.
>>> > > > > > >
>>> > > > > > > On Mira, I was running jobs with a 10min wallclock limit.
>>> I scaled to 46656 processes with 64 threads per process (c1,
>>> OMP_NUM_THREADS=64) and all jobs completed successfully. However, I was
>>> only looking at MGSolve times and not MGBuild times. I then decided to
>>> explore 8 threads per process (c8, OMP_NUM_THREADS=8) and started at the
>>> high concurrency. With 373248 processes, the jobs timed out while still
>>> in MGBuild after 10 minutes, and again with a 20-minute limit. At that
>>> point I added the USE_SUBCOMM option to enable/disable the use of
>>> comm_split. I haven't tried scaling with the subcommunicator on Mira
>>> since then.
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > On May 24, 2014, at 6:39 AM, Junchao Zhang
>>> <jczhang at mcs.anl.gov <mailto:jczhang at mcs.anl.gov>> wrote:
>>> > > > > > >
>>> > > > > > > > Hi, Sam,
>>> > > > > > > > Could you give me the exact number of MPI ranks for
>>> your results on Mira?
>>> > > > > > > > I ran hpgmg on Edison with export OMP_NUM_THREADS=1,
>>> aprun -n 64000 -ss -cc numa_node ./hpgmg-fv 6 1. The total time in
>>> MGBuild was about 0.005 seconds. I was wondering how many cores I need to
>>> request to reproduce the problem.
>>> > > > > > > > Thanks.
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > --Junchao Zhang
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > On Sat, May 17, 2014 at 9:06 AM, Sam Williams
>>> <swwilliams at lbl.gov <mailto:swwilliams at lbl.gov>> wrote:
>>> > > > > > > > I've been conducting scaling experiments on the Mira
>>> (Blue Gene/Q) and K (Sparc) supercomputers. I've noticed that the time
>>> required for MPI_Comm_split and MPI_Comm_dup can grow quickly with scale
>>> (~P^2). As such, their performance eventually becomes a bottleneck. That
>>> is, although the benefit of using a subcommunicator is huge (multigrid
>>> solves are weak-scalable), the penalty of creating one (multigrid build
>>> time) is also huge.
>>> > > > > > > >
>>> > > > > > > > For example, when scaling from 1 to 46K nodes (= cubes of
>>> integers) on Mira, the time (in seconds) required to build an MG solver
>>> (including a subcommunicator) scales as
>>> > > > > > > > 222335.output: Total time in MGBuild 0.056704
>>> > > > > > > > 222336.output: Total time in MGBuild 0.060834
>>> > > > > > > > 222348.output: Total time in MGBuild 0.064782
>>> > > > > > > > 222349.output: Total time in MGBuild 0.090229
>>> > > > > > > > 222350.output: Total time in MGBuild 0.075280
>>> > > > > > > > 222351.output: Total time in MGBuild 0.091852
>>> > > > > > > > 222352.output: Total time in MGBuild 0.137299
>>> > > > > > > > 222411.output: Total time in MGBuild 0.301552
>>> > > > > > > > 222413.output: Total time in MGBuild 0.606444
>>> > > > > > > > 222415.output: Total time in MGBuild 0.745272
>>> > > > > > > > 222417.output: Total time in MGBuild 0.779757
>>> > > > > > > > 222418.output: Total time in MGBuild 4.671838
>>> > > > > > > > 222419.output: Total time in MGBuild 15.123162
>>> > > > > > > > 222420.output: Total time in MGBuild 33.875626
>>> > > > > > > > 222421.output: Total time in MGBuild 49.494547
>>> > > > > > > > 222422.output: Total time in MGBuild 151.329026
>>> > > > > > > >
>>> > > > > > > > If I disable the call to MPI_Comm_Split, my time scales as
>>> > > > > > > > 224982.output: Total time in MGBuild 0.050143
>>> > > > > > > > 224983.output: Total time in MGBuild 0.052607
>>> > > > > > > > 224988.output: Total time in MGBuild 0.050697
>>> > > > > > > > 224989.output: Total time in MGBuild 0.078343
>>> > > > > > > > 224990.output: Total time in MGBuild 0.054634
>>> > > > > > > > 224991.output: Total time in MGBuild 0.052158
>>> > > > > > > > 224992.output: Total time in MGBuild 0.060286
>>> > > > > > > > 225008.output: Total time in MGBuild 0.062925
>>> > > > > > > > 225009.output: Total time in MGBuild 0.097357
>>> > > > > > > > 225010.output: Total time in MGBuild 0.061807
>>> > > > > > > > 225011.output: Total time in MGBuild 0.076617
>>> > > > > > > > 225012.output: Total time in MGBuild 0.099683
>>> > > > > > > > 225013.output: Total time in MGBuild 0.125580
>>> > > > > > > > 225014.output: Total time in MGBuild 0.190711
>>> > > > > > > > 225016.output: Total time in MGBuild 0.218329
>>> > > > > > > > 225017.output: Total time in MGBuild 0.282081
>>> > > > > > > >
>>> > > > > > > > Although I didn't directly measure it, this suggests the
>>> time for MPI_Comm_Split is growing roughly quadratically with process
>>> concurrency.
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > I see the same effect on the K machine (8...64K nodes)
>>> where the code uses comm_split/dup in conjunction:
>>> > > > > > > > run00008_7_1.sh.o2412931: Total time in MGBuild
>>> 0.026458 seconds
>>> > > > > > > > run00064_7_1.sh.o2415876: Total time in MGBuild
>>> 0.039121 seconds
>>> > > > > > > > run00512_7_1.sh.o2415877: Total time in MGBuild
>>> 0.086800 seconds
>>> > > > > > > > run01000_7_1.sh.o2414496: Total time in MGBuild
>>> 0.129764 seconds
>>> > > > > > > > run01728_7_1.sh.o2415878: Total time in MGBuild
>>> 0.224576 seconds
>>> > > > > > > > run04096_7_1.sh.o2415880: Total time in MGBuild
>>> 0.738979 seconds
>>> > > > > > > > run08000_7_1.sh.o2414504: Total time in MGBuild
>>> 2.123800 seconds
>>> > > > > > > > run13824_7_1.sh.o2415881: Total time in MGBuild
>>> 6.276573 seconds
>>> > > > > > > > run21952_7_1.sh.o2415882: Total time in MGBuild
>>> 13.634200 seconds
>>> > > > > > > > run32768_7_1.sh.o2415884: Total time in MGBuild
>>> 36.508670 seconds
>>> > > > > > > > run46656_7_1.sh.o2415874: Total time in MGBuild
>>> 58.668228 seconds
>>> > > > > > > > run64000_7_1.sh.o2415875: Total time in MGBuild
>>> 117.322217 seconds
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > A glance at the implementation on Mira (I don't know if
>>> the implementation on K is stock) suggests it uses qsort to
>>> sort based on keys. Unfortunately, qsort is not performance-robust like
>>> heap/merge sort. If one were to call comm_split in the natural way,
>>> like...
>>> > > > > > > > MPI_Comm_split(...,mycolor,myrank,...)
>>> > > > > > > > then one runs the risk that the keys are presorted. This
>>> hits the worst-case computational complexity for qsort... O(P^2).
>>> Demanding that programmers avoid sending sorted keys seems unreasonable.
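To spell out the worst case Sam describes, here is a toy demonstration (a textbook quicksort with the last element as pivot; purely illustrative, not IBM's or glibc's implementation): on already-sorted keys every partition is maximally unbalanced, so the comparison count grows as roughly P^2/2.

#include <stdio.h>

static long comparisons = 0;

static void naive_qsort(int *a, int lo, int hi)
{
    int pivot, i, j, t;
    if (lo >= hi)
        return;
    pivot = a[hi];                        /* last element as pivot */
    for (i = lo - 1, j = lo; j < hi; j++) {
        comparisons++;
        if (a[j] <= pivot) {
            i++;
            t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }
    t = a[i + 1]; a[i + 1] = a[hi]; a[hi] = t;
    naive_qsort(a, lo, i);                /* on sorted input: everything but the pivot */
    naive_qsort(a, i + 2, hi);            /* on sorted input: empty */
}

int main(void)
{
    enum { P = 10000 };
    static int key[P];
    int i;

    for (i = 0; i < P; i++)
        key[i] = i;                       /* keys already sorted, as with key = myrank */
    naive_qsort(key, 0, P - 1);
    printf("P = %d, comparisons = %ld (~P^2/2 = %d)\n", P, comparisons, P * P / 2);
    return 0;
}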
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > I should note, I see a similar lack of scaling with
>>> MPI_Comm_dup on the K machine. Unfortunately, my BGQ data used an
>>> earlier version of the code that did not use comm_dup. As such, I can’t
>>> definitively say that it is a problem on that machine as well.
>>> > > > > > > >
>>> > > > > > > > Thus, I'm asking that scalable implementations of
>>> comm_split/dup using merge/heap sort, whose worst-case complexity is
>>> still P log P, be prioritized in the next update.
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > thanks
>>> > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > >
>>> > > > >
>>> > > >
>>> > > >
>>> > >
>>> > >
>>> > >
>>> > > <comm_split.c>
>>> >
>>> >
>>> >
>>>
>>>
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
>
--
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/