[mpich-discuss] Maximum number of communicators
Zhou, Hui
zhouh at anl.gov
Wed Oct 9 14:52:33 CDT 2024
Hi Bruce,
Could you create a GitHub issue for this?
Hui
________________________________
From: Palmer, Bruce J <Bruce.Palmer at pnnl.gov>
Sent: Wednesday, October 9, 2024 1:37 PM
To: discuss at mpich.org <discuss at mpich.org>; Zhou, Hui <zhouh at anl.gov>
Subject: Re: Maximum number of communicators
I may have spoken too soon. This fixed the issue on one SMP node using 24 processes, but when I increase to 4 nodes with 96 processes, I can’t even get past MPI_Init. I’m seeing the following on standard out:
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 483326 RUNNING AT j006
= EXIT CODE: 9
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
I get the following in standard error:
Abort(613541135) on node 65: Fatal error in internal_Init: Other MPI error, error stack:
internal_Init(49162).............: MPI_Init(argc=0x7ffff3c5646c, argv=0x7ffff3c56460) failed
MPII_Init_thread(265)............:
MPIR_init_comm_world(34).........:
MPIR_Comm_commit(823)............:
MPID_Comm_commit_post_hook(222)..:
MPIDI_world_post_init(660).......:
MPIDI_OFI_init_vcis(842).........:
check_num_nics(891)..............:
MPIR_Allreduce_allcomm_auto(4726):
MPIC_Sendrecv(302)...............:
MPID_Isend(63)...................:
MPIDI_isend(35)..................:
MPIDI_OFI_send_fallback(549).....: OFI call tsendv failed (ofi_send.h:549:MPIDI_OFI_send_fallback:No such file or directory)
Bruce
From: Palmer, Bruce J via discuss <discuss at mpich.org>
Date: Monday, October 7, 2024 at 1:14 PM
To: Zhou, Hui <zhouh at anl.gov>, discuss at mpich.org <discuss at mpich.org>
Cc: Palmer, Bruce J <Bruce.Palmer at pnnl.gov>
Subject: Re: [mpich-discuss] Maximum number of communicators
That seemed to fix the issue.
Thanks!
Bruce
From: Zhou, Hui <zhouh at anl.gov>
Date: Thursday, October 3, 2024 at 11:01 AM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Palmer, Bruce J <Bruce.Palmer at pnnl.gov>
Subject: Re: Maximum number of communicators
Hi Bruce,
Try configuring MPICH with --with-device=ch4:ofi --enable-extended-context-bits.
Hui
________________________________
From: Palmer, Bruce J via discuss <discuss at mpich.org>
Sent: Thursday, October 3, 2024 11:44 AM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Palmer, Bruce J <Bruce.Palmer at pnnl.gov>
Subject: [mpich-discuss] Maximum number of communicators
Hi,
I’m looking at using MPI RMA to support sparse data structures in Global Arrays. I’ve got an application that uses a large number of sparse arrays and it is failing when the number of sparse arrays reaches about 500. Each sparse array is built on top of 4 conventional global arrays and each global array uses one MPI Window. Each Window appears to be creating its own communicator and I’m hitting an internal limit at 2048 communicators. Is there a way to increase the number of communicators?
Bruce Palmer
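
As a minimal sketch of the pattern described in the question above (one MPI window per array, with each window apparently consuming an internal communicator/context ID), the hypothetical C program below creates windows in a loop until creation fails, which is one way to probe the implementation's limit. The 4096 window cap and 1 KiB per-window buffer are arbitrary illustrative choices, not values from the thread.

/* Hypothetical repro sketch: create many MPI windows, one per "array",
 * until window creation fails. Each window may consume an internal
 * communicator/context ID, depending on the MPI implementation. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_WINDOWS 4096   /* arbitrary upper bound for the probe */
#define WIN_BYTES   1024   /* arbitrary per-window buffer size */

int main(int argc, char **argv)
{
    int rank, n;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Return errors to the caller instead of aborting, so the
     * failure point can be reported. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    MPI_Win *wins = malloc(MAX_WINDOWS * sizeof(MPI_Win));
    char *bufs = malloc((size_t)MAX_WINDOWS * WIN_BYTES);

    for (n = 0; n < MAX_WINDOWS; n++) {
        int err = MPI_Win_create(bufs + (size_t)n * WIN_BYTES, WIN_BYTES, 1,
                                 MPI_INFO_NULL, MPI_COMM_WORLD, &wins[n]);
        if (err != MPI_SUCCESS) {
            if (rank == 0)
                printf("window creation failed after %d windows\n", n);
            break;
        }
    }
    if (rank == 0 && n == MAX_WINDOWS)
        printf("created all %d windows without hitting a limit\n", n);

    for (int i = 0; i < n; i++)
        MPI_Win_free(&wins[i]);
    free(wins);
    free(bufs);
    MPI_Finalize();
    return 0;
}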