[mpich-discuss] MPI memory allocation.

Reuti reuti at staff.uni-marburg.de
Thu Dec 5 04:00:38 CST 2013


Am 05.12.2013 um 09:58 schrieb Anatoly G:

> I'm using MPICH2 1.5.
> My system consists of a master and 16 slaves.
> The system uses a number of communicators.
> A single communicator is used for the scenario below:
> Each slave continuously sends a 2 KB data buffer using MPI_Isend and waits for completion with MPI_Wait.
> The master starts by posting an MPI_Irecv for each slave.
> Then, in an endless loop, it calls MPI_Waitany and posts a new MPI_Irecv
> for the rank returned by MPI_Waitany.
> Another communicator is used for broadcast communication (commands between master and slaves),
> but it's not used concurrently with the previous communicator,
> only before or after data transfer.
> The system runs on two computers linked by 1 Gbit/s Ethernet.
> The master runs on the first computer, all slaves on the other one.
> Network traffic is ~800 Mbit/s.
> After 1-2 minutes of execution, the master process's memory allocation starts to grow and network traffic drops.
> The memory growth and traffic slowdown continue until MPI fails,
> without saving a core file.

This might depend on the ulimit setting for the core file size (ulimit -c).

> My program doesn't allocate memory. Can you please explain this behaviour?
> How can I make MPI stop the slaves from sending when the master can't keep up with the traffic, instead of allocating memory and failing?

Can you please try with the latest MPICH 3.0.4 to check whether it behaves in the same way.

-- Reuti

> Thank you,
> Anatoly.
> P.S.
> In my standalone test, I simulate similar behaviour, but with a single thread on each process (master & hosts).
> When I run the standalone test, the master stalls the slaves until it finishes processing the accumulated data, and MPI's memory allocation doesn't grow.
> When the master is free, the slaves continue to send data.
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
