[mpich-discuss] MPI memory allocation.

Pavan Balaji balaji at mcs.anl.gov
Sat Dec 7 09:20:33 CST 2013


As much as I hate saying this — some people find it easier to think of it as “MPICH3”.

  — Pavan

On Dec 7, 2013, at 7:37 AM, Wesley Bland <wbland at mcs.anl.gov> wrote:

> MPICH is just the new version of MPICH2. We renamed it when we went past version 3.0. 
> 
> On Dec 7, 2013, at 3:55 AM, Anatoly G <anatolyrishon at gmail.com> wrote:
> 
>> OK. I'll try both: first MPI_Issend, and as a next step upgrading MPICH to 3.0.4.
>> I previously thought that MPICH and MPICH2 were two different branches, with MPICH2 partially supporting fault tolerance and MPICH not. Now I understand that I was wrong and that MPICH2 is just the main version of MPICH.
>> 
>> Thank you very much,
>> Anatoly.
>> 
>> 
>> 
>> On Thu, Dec 5, 2013 at 11:01 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
>> The master is receiving more incoming messages than it can match quickly enough with Irecvs. Try using MPI_Issend instead of MPI_Isend.
>> 
>> Rajeev
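
The reasoning behind this suggestion can be illustrated with a toy model. This is not MPI code, only a rough sketch under stated assumptions: small messages are delivered eagerly, so an MPI_Isend-like send completes as soon as the message is handed off, while an MPI_Issend-like send completes only once the master has matched it. The function name `peak_unexpected_queue`, the tick-based timing, and the numbers (1000 ticks, 4 matches per tick) are hypothetical and chosen only for illustration.

```python
from collections import deque


def peak_unexpected_queue(num_senders, ticks, matches_per_tick, synchronous):
    """Toy model: return the peak number of messages waiting unmatched at the master."""
    unexpected = deque()                 # messages that arrived before being matched
    outstanding = [False] * num_senders  # sender has a message the master has not matched yet
    peak = 0
    for _ in range(ticks):
        # Each sender tries to start one new send per tick.
        for rank in range(num_senders):
            if synchronous and outstanding[rank]:
                # Issend-like: the sender's wait does not return until the message is matched.
                continue
            unexpected.append(rank)
            outstanding[rank] = True
        peak = max(peak, len(unexpected))
        # The master matches only a limited number of messages per tick (assumed rate).
        for _ in range(min(matches_per_tick, len(unexpected))):
            outstanding[unexpected.popleft()] = False
    return peak


if __name__ == "__main__":
    eager = peak_unexpected_queue(16, 1000, 4, synchronous=False)
    sync = peak_unexpected_queue(16, 1000, 4, synchronous=True)
    print("Isend-like peak queue:", eager)
    print("Issend-like peak queue:", sync)
```

In this model the Isend-like queue keeps growing for as long as the senders outpace the master, which corresponds to the memory growth described in the original report. The Issend-like queue never exceeds one message per sender, because each sender stalls in its wait until its previous message has been matched.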
>> 
>> On Dec 5, 2013, at 2:58 AM, Anatoly G <anatolyrishon at gmail.com> wrote:
>> 
>> > Hello.
>> > I'm using MPICH2 1.5.
>> > My system contains a master and 16 slaves.
>> > The system uses a number of communicators.
>> > A single communicator is used for the scenario below:
>> > Each slave continuously sends a 2 KB data buffer using MPI_Isend and waits using MPI_Wait.
>> > The master starts by posting an MPI_Irecv for each slave.
>> > Then, in an endless loop, it calls
>> > MPI_Waitany and posts a new MPI_Irecv for the rank returned by MPI_Waitany.
>> >
>> > Another communicator is used for broadcast communication (commands between the master and slaves),
>> > but it is not used in parallel with the previous communicator,
>> > only before or after data transfer.
>> >
>> > The system runs on two computers linked by 1 Gbit/s Ethernet.
>> > The master runs on the first computer and all slaves on the other one.
>> > Network traffic is ~800 Mbit/s.
>> >
>> > After 1-2 minutes of execution, the master process starts to increase its memory allocation and network traffic drops.
>> > This memory growth and network slowdown continue until MPI fails,
>> > without saving a core file.
>> > My program doesn't allocate memory. Can you please explain this behaviour?
>> > How can I make MPI stop the slaves from sending when the master can't keep up with the traffic, instead of allocating memory and failing?
>> >
>> >
>> > Thank you,
>> > Anatoly.
>> >
>> > P.S.
>> > In my standalone test, I simulate similar behaviour, but with a single thread in each process (master & hosts).
>> > When I run the standalone test, the master stalls the slaves until it finishes processing the accumulated data, and MPI doesn't increase its memory allocation.
>> > When the master is free, the slaves continue to send data.
>> > _______________________________________________
>> > discuss mailing list     discuss at mpich.org
>> > To manage subscription options or unsubscribe:
>> > https://lists.mpich.org/mailman/listinfo/discuss
>> 

--
Pavan Balaji
http://www.mcs.anl.gov/~balaji



