[mpich-discuss] MPI memory allocation.

Anatoly G anatolyrishon at gmail.com
Sat Dec 21 23:54:51 CST 2013


Good morning, Pavan.
Do you see the process memory rise and the network throughput drop?

Anatoly.


On Thu, Dec 19, 2013 at 7:09 PM, Anatoly G <anatolyrishon at gmail.com> wrote:

> No. You can see the memory increase using a monitor like "qps", or any other.
> I got the stack trace that I sent by stopping my real program (not the
> simulation) in the TotalView debugger.
> If you monitor network traffic, you will see the network speed drop
> until it reaches almost 0.
>
> Anatoly.
>
>
> On Thu, Dec 19, 2013 at 5:16 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>
>>
>> Are you printing it out?
>>
>>   — Pavan
>>
>> On Dec 19, 2013, at 10:51 PM, Anatoly G <anatolyrishon at gmail.com> wrote:
>>
>> > Can you please uncomment this section:
>> > /*
>> >         // swap tag & enter blocked recv
>> >         MPI_Status stat;
>> >         tags[slaveIdx] = (tags[slaveIdx] == TAG1) ? TAG2 : TAG1;
>> >         MPI_Recv(RcvBufs[slaveIdx], BUF_SZ, MPI::CHAR, slaveRank,
>> >                  tags[slaveIdx], MPI_COMM_WORLD, &stat);
>> >
>> >         ++SlavesRcvIters[slaveIdx];
>> > */
>> >
>> > And then run it.
>> > Do you see the memory allocation increase?
>> >
>> > Regards,
>> > Anatoly.
>> >
>> >
>> > On Thu, Dec 19, 2013 at 4:29 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>> >
>> > I’m not sure what I should look for.  I ran the program and it
>> > completed fine.
>> >
>> >   — Pavan
>> >
>> > On Dec 19, 2013, at 7:16 PM, Anatoly G <anatolyrishon at gmail.com> wrote:
>> >
>> > > Good afternoon.
>> > > My program frequently ends up in the functions in the attached stack trace.
>> > > Can you please tell me whether that is OK?
>> > > Did you manage to run the simulation from my previous mail?
>> > > Did you see the memory rise when MPI_Recv is not commented out?
>> > >
>> > > Regards,
>> > > Anatoly.
>> > >
>> > >
>> > > On Thu, Dec 12, 2013 at 9:36 PM, Anatoly G <anatolyrishon at gmail.com> wrote:
>> > > Hi.
>> > > Finally, I have some additional information.
>> > > I built a short simulation of my real application.
>> > >
>> > > A short description of the real scenario:
>> > > I have a Master + N slaves. Each slave sends the Master 2 types of
>> > > messages:
>> > >       • a constant-length message with predefined fields (one of its
>> > >         fields is the length of the second message).
>> > >       • a second message, whose length is different each time and is
>> > >         passed in the first message.
>> > > The Master has to use MPI_Irecv in order to tolerate slave failures
>> > > (a blocking MPI_Recv would block the Master if a slave fails).
>> > > The Master posts an MPI_Irecv for each slave with a buffer size equal to
>> > > the constant size of the first message type. After receiving a message of
>> > > the first type, the Master allocates a buffer of the expected size for the
>> > > second message and posts a receive for it too. This happens in an endless
>> > > loop for each slave. I use MPI_Waitany to monitor all receives.
>> > > To tell the messages apart, the Master & slaves use different tags (as
>> > > ids) for the first & second messages.
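
The receive protocol described above (a fixed-size first message carrying the length of the second, with a separate tag for each) can be sketched as a small per-slave state machine. This is only a sketch with no MPI calls in it: the tag values, the Header layout beyond its length field, and the names SlaveChannel and NextRecv are assumptions made for illustration and are not taken from the attached program. Each NextRecv returned is what the Master would pass to its next MPI_Irecv for that slave.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tag values; the thread only says two distinct tags are used.
constexpr int TAG_HEADER = 1;
constexpr int TAG_BODY = 2;

// Fixed-size first message. Only body_len comes from the description
// ("one of its fields is the length of the second message").
struct Header {
    std::size_t body_len;
};

// What the Master should post next for one slave: the tag to match and
// the number of bytes the receive buffer must hold.
struct NextRecv {
    int tag;
    std::size_t bytes;
};

// Receive state for a single slave; the Master keeps one per slave.
class SlaveChannel {
public:
    // The very first receive is always for a header.
    NextRecv initial() const { return {TAG_HEADER, sizeof(Header)}; }

    // Call when the header receive completes (for example, when
    // MPI_Waitany reports it). Sizes the body buffer from the header.
    NextRecv on_header(const Header& h) {
        assert(!waiting_for_body_);
        waiting_for_body_ = true;
        body_.resize(h.body_len);  // storage is reused across iterations
        return {TAG_BODY, body_.size()};
    }

    // Call when the body receive completes; go back to waiting for a header.
    NextRecv on_body() {
        assert(waiting_for_body_);
        waiting_for_body_ = false;
        ++messages_;
        return {TAG_HEADER, sizeof(Header)};
    }

    std::size_t messages() const { return messages_; }

private:
    std::vector<char> body_;         // buffer for the variable-length message
    bool waiting_for_body_ = false;  // which of the two receives is pending
    std::size_t messages_ = 0;       // completed header+body pairs
};
```

With one such object per slave, the Master's MPI_Waitany loop only has to look up the channel belonging to the completed request and post whatever that channel returns.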
>> > >
>> > > Simulation description:
>> > > All buffers passed (first & second) have the same size.
>> > > The slave (SndSyncSlave) sends messages, alternating between 2 tags
>> > > (like the 2 types of messages, except that the second one has a constant size too).
>> > > The Master routine (the Rcv_WaitAny function) posts an MPI_Irecv for the
>> > > first message and, once it is received, posts an MPI_Irecv for the second one.
>> > >
>> > > In this scenario, 5 processes work fine, but if I run 20 processes
>> > > and uncomment the line "usleep(200000)", I see 800 Mbit/s on the
>> > > network at the beginning of the test; after 1-2 seconds the network
>> > > speed drops to 200-300 Kbit/s and never recovers.
>> > >
>> > > If I add the MPI_Recv block in the Master (uncomment "MPI_Recv" and the
>> > > lines around it), I see the Master's memory start to grow as in my real
>> > > application, but again this does not happen with 5 processes. This is the
>> > > scenario used in my real application.
>> > >
>> > > Command line: mpiexec.hydra -genvall -f MpiConfigMachines.txt
>> > > -launcher=ssh -n 20 mpi_rcv_any_multithread 100000 1000000 out
>> > >
>> > > where
>> > > 100000 - number of sends from each slave
>> > > 1000000 - scale to separate input from each slave (used for debug
>> > > only)
>> > > out - prefix of the output file. Each process produces an out_"rank".txt file.
>> > >
>> > > MpiConfigMachines.txt - configuration file for my computers: 2
>> > > computers connected back to back by a 1 Gbit/s network.
>> > >
>> > >
>> > > Can you please test this case and give me your suggestions?
>> > >
>> > > Thank you,
>> > > Anatoly.
>> > >
>> > >
>> > >
>> > > On Mon, Dec 9, 2013 at 9:55 PM, Anatoly G <anatolyrishon at gmail.com> wrote:
>> > > Yes, I understand that. I'll try to make my standalone test closer
>> > > to the real application. Thank you.
>> > >
>> > >
>> > > On Mon, Dec 9, 2013 at 9:31 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>> > >
>> > > It sounds like MPICH is working correctly.  Without a test case, it’s
>> > > unfortunately quite hard for us to even know what to look for.  It’s also
>> > > possible that there’s a bug in your code which might be causing some bad
>> > > behavior.
>> > >
>> > >   — Pavan
>> > >
>> > > On Dec 9, 2013, at 1:27 PM, Anatoly G <anatolyrishon at gmail.com> wrote:
>> > >
>> > > > Yes, I actually need fault tolerance, and it was the main reason
>> > > > for choosing MPICH2. I use fault tolerance to cope with unpredictable
>> > > > future bugs; my system should survive partially. But in the regular case
>> > > > I just need full performance. I suspect that I am not using MPI correctly,
>> > > > but at a low rate everything works fine. The failure is triggered by
>> > > > increasing the rate of MPI_Isend calls or the data buffer size. I haven't
>> > > > found any strong dependence yet, only the general trend.
>> > > >
>> > > > Unfortunately I have a complex system with a number of threads
>> > > > in each process. Some of the threads use different communicators.
>> > > >
>> > > > I tried to simulate the same MPI behavior in a simple standalone test,
>> > > > but the standalone test works fine. It shows full network performance;
>> > > > when I slow down the master (in the standalone test), all slaves stop too
>> > > > and wait for the master to continue. Is there any MPICH log I can enable
>> > > > to send you the results?
>> > > >
>> > > >
>> > > > On Mon, Dec 9, 2013 at 8:10 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>> > > >
>> > > > Do you actually need Fault Tolerance (one of your previous emails
>> > > > seemed to indicate that)?
>> > > >
>> > > > It sounds like there is a bug in either your application or in the
>> > > > MPICH stack and you are trying to trace that down, and don’t really care
>> > > > about fault tolerance.  Is that a correct assessment?
>> > > >
>> > > > Do you have a simplified program that reproduces this error, that
>> > > > we can try?
>> > > >
>> > > >   — Pavan
>> > > >
>> > > > On Dec 9, 2013, at 11:44 AM, Anatoly G <anatolyrishon at gmail.com> wrote:
>> > > >
>> > > > > No. The hardware is OK. The Master process allocates memory (a check
>> > > > > with MemoryScape doesn't show any significant memory allocation in my
>> > > > > code). Then network traffic drops, and then the Master process crashes
>> > > > > without saving a core file. I have the core file size set to unlimited.
>> > > > > I see the same failure (without a core) when I call MPI_Abort, but I
>> > > > > don't call it.
>> > > > >
>> > > > >
>> > > > > On Mon, Dec 9, 2013 at 7:28 PM, Wesley Bland <wbland at mcs.anl.gov> wrote:
>> > > > > Are you actually seeing hardware failure or is your code just
>> > > > > crashing? It's odd that one specific process would fail so often in the
>> > > > > same way if it were a hardware problem.
>> > > > >
>> > > > > Thanks,
>> > > > > Wesley
>> > > > >
>> > > > > On Dec 9, 2013, at 11:15 AM, Anatoly G <anatolyrishon at gmail.com> wrote:
>> > > > >
>> > > > >> One more interesting fact. Each time I have a failure, only the
>> > > > >> master process fails; the slaves are still alive, together with
>> > > > >> mpiexec.hydra. I thought that the slaves should fail too, but they stay alive.
>> > > > >>
>> > > > >>
>> > > > >> On Mon, Dec 9, 2013 at 10:30 AM, Anatoly G <anatolyrishon at gmail.com> wrote:
>> > > > >> I configured the MPICH-3.1rc2 build without "so" files. But unlike with
>> > > > >> MPICH2 & MPICH-3.0.4, I still get so files. What should I change in the
>> > > > >> configure line to link MPI with my application statically?
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >> On Mon, Dec 9, 2013 at 9:47 AM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>> > > > >>
>> > > > >> Can you try mpich-3.1rc2?  There were several fixes for this in
>> > > > >> this version and it’ll be good to try that out before we go digging too far
>> > > > >> into this.
>> > > > >>
>> > > > >>   — Pavan
>> > > > >>
>> > > > >> On Dec 9, 2013, at 1:46 AM, Anatoly G <anatolyrishon at gmail.com> wrote:
>> > > > >>
>> > > > >> > With MPICH 3.0.4 the situation repeats. It looks like MPI
>> > > > >> > allocates memory for messages.
>> > > > >> > Can you please advise in which scenarios MPI, or maybe TCP
>> > > > >> > underneath MPI, allocates memory due to a high transfer rate?
>> > > > >> >
>> > > > >> >
>> > > > >> > On Mon, Dec 9, 2013 at 9:32 AM, Anatoly G <anatolyrishon at gmail.com> wrote:
>> > > > >> > Thank you very much.
>> > > > >> > Issend is not so good: it doesn't work with my fault tolerance.
>> > > > >> > If a slave process fails, the master stalls.
>> > > > >> > I tried mpich-3.0.4 with hydra-3.0.4, but my program, which uses
>> > > > >> > MPI fault tolerance, doesn't recognize the failure of a slave process,
>> > > > >> > whereas it does recognize the failure with MPICH2. Maybe you can suggest a solution?
>> > > > >> > I tried using hydra from MPICH2 while linking my program with
>> > > > >> > MPICH3. This combination recognizes failures, but I'm not sure that such a
>> > > > >> > combination is stable enough.
>> > > > >> > Can you please advise?
>> > > > >> > Anatoly.
>> > > > >> >
>> > > > >> >
>> > > > >> >
>> > > > >> > On Sat, Dec 7, 2013 at 5:20 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>> > > > >> >
>> > > > >> > As much as I hate saying this — some people find it easier to
>> > > > >> > think of it as “MPICH3”.
>> > > > >> >
>> > > > >> >   — Pavan
>> > > > >> >
>> > > > >> > On Dec 7, 2013, at 7:37 AM, Wesley Bland <wbland at mcs.anl.gov> wrote:
>> > > > >> >
>> > > > >> > > MPICH is just the new version of MPICH2. We renamed it when
>> > > > >> > > we went past version 3.0.
>> > > > >> > >
>> > > > >> > > On Dec 7, 2013, at 3:55 AM, Anatoly G <anatolyrishon at gmail.com> wrote:
>> > > > >> > >
>> > > > >> > >> OK. I'll try both: Issend and, as the next step, upgrading MPICH to
>> > > > >> > >> 3.0.4.
>> > > > >> > >> I previously thought that MPICH & MPICH2 were two different
>> > > > >> > >> branches, with MPICH2 partially supporting fault tolerance and MPICH not.
>> > > > >> > >> Now I understand that I was wrong and MPICH2 is just a major version of MPICH.
>> > > > >> > >>
>> > > > >> > >> Thank you very much,
>> > > > >> > >> Anatoly.
>> > > > >> > >>
>> > > > >> > >>
>> > > > >> > >>
>> > > > >> > >> On Thu, Dec 5, 2013 at 11:01 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
>> > > > >> > >> The master is receiving more incoming messages than it
>> > > > >> > >> can match quickly enough with Irecvs. Try using MPI_Issend instead of
>> > > > >> > >> MPI_Isend.
>> > > > >> > >>
>> > > > >> > >> Rajeev
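
Rajeev's point can be illustrated with a toy model. It contains no MPI calls and does not describe MPICH internals; the function name, the tick-based timing, and all rates are assumptions chosen only to show the trend. The idea it stands in for: a message that arrives before a matching receive is posted has to be buffered at the receiver, and MPI_Issend completes only once the matching receive has started, which limits each sender to one such message at a time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Outcome of one simulated run.
struct Result {
    std::size_t peak_queued;  // most messages ever waiting at the receiver
    std::size_t delivered;    // messages matched by the receiver
};

// eager == true : a send completes as soon as the message is handed off
//                 (stand-in for MPI_Isend + MPI_Wait with small messages),
//                 so every sender adds sends_per_tick messages each tick.
// eager == false: a send completes only after the receiver has matched it
//                 (stand-in for MPI_Issend + MPI_Wait), so each sender has
//                 at most one message waiting at any time.
Result simulate(std::size_t senders, std::size_t sends_per_tick,
                std::size_t matches_per_tick, std::size_t ticks, bool eager) {
    assert(senders > 0);
    std::size_t queued = 0, peak = 0, delivered = 0;
    for (std::size_t t = 0; t < ticks; ++t) {
        if (eager) {
            queued += senders * sends_per_tick;
        } else if (sends_per_tick > 0) {
            // Senders whose last message was matched post one new message;
            // senders still waiting post nothing.
            queued = senders;
        }
        peak = std::max(peak, queued);
        // The receiver matches as many queued messages as it can this tick.
        const std::size_t matched = std::min(queued, matches_per_tick);
        queued -= matched;
        delivered += matched;
    }
    return {peak, delivered};
}
```

For example, simulate(16, 4, 32, 100, true) lets the queue grow every tick because 64 messages arrive while only 32 are matched, whereas simulate(16, 4, 32, 100, false) never has more than 16 messages waiting. The price of the second mode is lower throughput when the receiver is the bottleneck, which is the flow control asked about in the original question.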
>> > > > >> > >>
>> > > > >> > >> On Dec 5, 2013, at 2:58 AM, Anatoly G <anatolyrishon at gmail.com> wrote:
>> > > > >> > >>
>> > > > >> > >> > Hello.
>> > > > >> > >> > I'm using MPICH2 1.5.
>> > > > >> > >> > My system contains a master and 16 slaves.
>> > > > >> > >> > The system uses a number of communicators.
>> > > > >> > >> > A single communicator is used for the scenario below:
>> > > > >> > >> > Each slave continuously sends a 2 Kbyte data buffer using
>> > > > >> > >> > MPI_Isend and waits using MPI_Wait.
>> > > > >> > >> > The master starts by posting an MPI_Irecv for each slave.
>> > > > >> > >> > Then, in an endless loop:
>> > > > >> > >> > MPI_Waitany, then a new MPI_Irecv for the slave whose receive completed.
>> > > > >> > >> >
>> > > > >> > >> > Another communicator is used for broadcast communication
>> > > > >> > >> > (commands between master + slaves),
>> > > > >> > >> > but it's not used in parallel with the previous communicator,
>> > > > >> > >> > only before or after data transfer.
>> > > > >> > >> >
>> > > > >> > >> > The system runs on two computers linked by 1 Gbit/s
>> > > > >> > >> > Ethernet.
>> > > > >> > >> > The master runs on the first computer, all slaves on the other
>> > > > >> > >> > one.
>> > > > >> > >> > Network traffic is ~800 Mbit/s.
>> > > > >> > >> >
>> > > > >> > >> > After 1-2 minutes of execution, the master process starts to
>> > > > >> > >> > increase its memory allocation and network traffic drops.
>> > > > >> > >> > This memory growth & network slowdown
>> > > > >> > >> > continues until MPI fails,
>> > > > >> > >> > without saving a core file.
>> > > > >> > >> > My program doesn't allocate memory. Can you please
>> > > > >> > >> > explain this behaviour?
>> > > > >> > >> > How can I make MPI stop the slaves from sending if the Master
>> > > > >> > >> > can't keep up with such traffic, instead of allocating memory and failing?
>> > > > >> > >> >
>> > > > >> > >> >
>> > > > >> > >> > Thank you,
>> > > > >> > >> > Anatoly.
>> > > > >> > >> >
>> > > > >> > >> > P.S.
>> > > > >> > >> > In my standalone test, I simulate similar behaviour, but
>> > > > >> > >> > with a single thread in each process (master & hosts).
>> > > > >> > >> > When I run the standalone test, the master stalls the slaves until
>> > > > >> > >> > it finishes processing the accumulated data, and MPI doesn't increase
>> > > > >> > >> > its memory allocation.
>> > > > >> > >> > When the Master is free, the slaves continue to send data.
>> > > > >> > >> > _______________________________________________
>> > > > >> > >> > discuss mailing list     discuss at mpich.org
>> > > > >> > >> > To manage subscription options or unsubscribe:
>> > > > >> > >> > https://lists.mpich.org/mailman/listinfo/discuss
>> > > > >> >
>> > > > >> > --
>> > > > >> > Pavan Balaji
>> > > > >> > http://www.mcs.anl.gov/~balaji
>> > > <backtrace2.txt>
>
>