[mpich-discuss] -configfile detail

Balaji, Pavan balaji at anl.gov
Mon May 19 12:56:35 CDT 2014


There was a limit, which we increased in the past.  Are you using the latest version of MPICH (or Hydra)?

  — Pavan

On May 19, 2014, at 12:38 PM, Raeth . Peter <PRaeth at drc.com> wrote:

> 
> Is there a limit to the number of process lines in a -configfile? In our case, we get a segmentation fault with a large number of lines, but the run goes fine with a small number. For example, our scenario has ten nodes, and each node has 16 cores.
> 
> The job starts with:  mpiexec -configfile mpich.txt
> 
> For a small number of process lines, mpich.txt contains:
> 
> -v -launcher ssh -launcher-exec /app/local/pbs_ssh -print-all-exitcodes -prepend-rank -host wputilc-0012 -np 1 ./simpleTest/distrib/simpleTest 0 0 0 22 16 33
> -host wputilc-0012 -np 1 ./simpleTest/distrib/simpleTest 1 1 1 22 16 33
> -host wputilc-0012 -np 1 ./simpleTest/distrib/simpleTest 2 2 2 22 16 33
> -host wputilc-0012 -np 1 ./simpleTest/distrib/simpleTest 3 3 3 22 16 33
> -host wputilc-0012 -np 1 ./simpleTest/distrib/simpleTest 4 4 4 22 16 33
> -host wputilc-0012 -np 1 ./simpleTest/distrib/simpleTest 1 1 1 22 16 33
> -host wputilc-0012 -np 1 ./simpleTest/distrib/simpleTest 2 2 2 22 16 33
> -host wputilc-0012 -np 1 ./simpleTest/distrib/simpleTest 3 3 3 22 16 33
> -host wputilc-0012 -np 1 ./simpleTest/distrib/simpleTest 4 4 4 22 16 33
> -host wputilc-0012 -np 1 ./simpleTest/distrib/simpleTest 4 4 4 22 16 33
> -host wputilc-0012 -np 1 ./simpleTest/distrib/simpleTest 4 4 4 22 16 33
> -host wputilc-0012 -np 1 ./simpleTest/distrib/simpleTest 4 4 4 22 16 33
> -host wputilc-0012 -np 1 ./simpleTest/distrib/simpleTest 4 4 4 22 16 33
> -host wputilc-0012 -np 1 ./simpleTest/distrib/simpleTest 4 4 4 22 16 33
> -host wputilc-0012 -np 1 ./simpleTest/distrib/simpleTest 4 4 4 22 16 33
> -host wputilc-0012 -np 1 ./simpleTest/distrib/simpleTest 4 4 4 22 16 33
> 
> 
> For a large number of process lines, mpich.txt contains more -host lines, 160 in total. Each group of 16 processes specifies a different node, so that only 16 processes go on each of the 10 nodes. The memory occupied by simpleTest is extremely small. It is this large run that fails with a segmentation fault before any processes actually start running.
> 
> The reason for wanting to run this way is that different inputs are needed for each process. Plus, we want to be able to support users who want to physically co-locate processes that spend a lot of time communicating with each other.
> 
> Thank you for whatever insights you can offer.
> 
> 
> Best,
> 
> Peter.
> 
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
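
The layout described in the quoted message (one "-host ... -np 1 ..." line per process, 16 lines per node, 160 lines in total) could be generated with a short script rather than written by hand. The following is only a sketch under stated assumptions: the host names node-00 through node-09 and the helper names build_configfile and args_for_rank are hypothetical, and the per-process arguments merely imitate the pattern shown in the message. The global options on the first line are copied from the message. It does not address the segmentation fault itself.

```python
def build_configfile(hosts, procs_per_host, executable, args_for_rank, global_opts=""):
    """Return -configfile text with one '-host H -np 1 EXE ARGS' line per process."""
    lines = []
    rank = 0
    for host in hosts:
        for _ in range(procs_per_host):
            line = f"-host {host} -np 1 {executable} {args_for_rank(rank)}"
            # As in the quoted message, global options appear only on the first line.
            if rank == 0 and global_opts:
                line = f"{global_opts} {line}"
            lines.append(line)
            rank += 1
    return "\n".join(lines) + "\n"


# Hypothetical host names; the message names only wputilc-0012.
hosts = [f"node-{i:02d}" for i in range(10)]


def args_for_rank(rank):
    # Hypothetical per-process inputs, loosely following the pattern in the message.
    return f"{rank} {rank} {rank} 22 16 33"


config_text = build_configfile(
    hosts,
    16,
    "./simpleTest/distrib/simpleTest",
    args_for_rank,
    global_opts="-v -launcher ssh -launcher-exec /app/local/pbs_ssh "
                "-print-all-exitcodes -prepend-rank",
)
```

Writing config_text to mpich.txt and starting the job with "mpiexec -configfile mpich.txt" would then match the invocation in the message. Generating the file this way also makes it easy to vary the number of lines when looking for the point at which the failure first appears.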



