[mpich-discuss] Problem with Mpich3.0.4 build for WRF run across multiple nodes in a cluster.

Teck-Bin Arthur Lim limtba at ihpc.a-star.edu.sg
Tue Jun 21 22:09:17 CDT 2016


Dear Rob, 

Thank you for your pointer. It was down to a simple syntax error in the
machinefile; MPICH is fine. I got it working across multiple nodes.
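The thread does not show the corrected file. Hydra's host-file format lists one host per line, optionally followed by `:N` for the number of processes to place on that host (e.g. `fdns00_ib:3`). Below is a minimal sketch, assuming each line of the old file has the `host cpu=N` form shown in this thread and that `cpu=N` was meant as a process count; the file names `machinefile.old` and `machinefile.new` are hypothetical.

```shell
#!/bin/sh
# Sketch: rewrite "host cpu=N" lines (the form Hydra rejected in this thread)
# into Hydra's "host:N" form. Assumes one host per line and that cpu=N was
# intended as the number of processes for that host.
cat > machinefile.old <<'EOF'
fdns00_ib cpu=3
fdns01_ib cpu=2
fdns02_ib cpu=3
EOF

awk '
  NF == 0 { next }                 # skip blank lines
  {
    n = 1                          # no cpu= token: write a single slot
    for (i = 2; i <= NF; i++)
      if ($i ~ /^cpu=[0-9]+$/) { split($i, kv, "="); n = kv[2] }
    print $1 ":" n
  }
' machinefile.old > machinefile.new

cat machinefile.new
```

The three entries give 3 + 2 + 3 = 8 slots, matching `-np 8`, so the job could then be launched as before with `mpirun -machinefile machinefile.new -np 8 ./a.out`.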

Arthur 

-----Original Message-----
From: Teck-Bin Arthur Lim [mailto:limtba at ihpc.a-star.edu.sg] 
Sent: Wednesday, June 22, 2016 10:23 AM
To: discuss at mpich.org
Subject: Re: [mpich-discuss] Problem with Mpich3.0.4 build for WRF run across multiple nodes in a cluster.

Dear Rob, 

Thanks, my machinefile and script are simple, as follows: 
*********************************************
[limtba at fdns00_ws testenv-testlib]$ more mpi-script
nohup time mpirun -machinefile machinefile -np 8 ./a.out >& log.aout &

[limtba at fdns00_ws testenv-testlib]$ more machinefile
fdns00_ib cpu=3
fdns01_ib cpu=2
fdns02_ib cpu=3
*********************************************
Is there some syntax error? This is only a small test.
Without the machinefile, MPICH works fine on a single node.
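Hydra stopped at the `cpu=3` token, so a quick pre-flight check can catch such lines before launching. Below is a minimal sketch, assuming the host file should contain only plain `host` or `host:N` entries (this is stricter than Hydra itself may be); `check_machinefile`, `bad.hosts` and `good.hosts` are hypothetical names.

```shell
#!/bin/sh
# Sketch: report host-file lines that are not plain "host" or "host:N".
# Assumption: no other per-host options are wanted. Blank lines and lines
# starting with "#" are treated as ignorable by this checker.
check_machinefile() {
  awk '
    BEGIN { bad = 0 }
    /^[ \t]*(#|$)/ { next }
    $0 !~ /^[ \t]*[^ \t:]+(:[0-9]+)?[ \t]*$/ {
      printf "line %d: unexpected syntax: %s\n", NR, $0
      bad = 1
    }
    END { exit bad }
  ' "$1"
}

printf 'fdns00_ib cpu=3\nfdns01_ib cpu=2\n' > bad.hosts
printf 'fdns00_ib:3\nfdns01_ib:2\n' > good.hosts

check_machinefile bad.hosts || echo "bad.hosts: fix before running mpirun"
check_machinefile good.hosts && echo "good.hosts: OK"
```

The function returns a non-zero status when any line is flagged, so it can guard the `mpirun` line in a script like `mpi-script`.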

Arthur 

-----Original Message-----
From: Rob Latham [mailto:robl at mcs.anl.gov]
Sent: Tuesday, June 21, 2016 11:21 PM
To: discuss at mpich.org
Subject: Re: [mpich-discuss] Problem with Mpich3.0.4 build for WRF run across multiple nodes in a cluster.



On 06/21/2016 05:58 AM, Teck-Bin Arthur Lim wrote:
> Hi ,
>
> I ran into a basic problem trying to get an old MPICH version (3.0.4) to do
> parallel runs across different machines in a mini-cluster. The error
> messages when invoking mpirun are:
>
> *****************error messages********************
>
> [limtba at fdns00_ws testenv-testlib]$ ./mpi-script
>
> [limtba at fdns00_ws testenv-testlib]$ more log.aout
>
> [mpiexec at fdns00_ws] HYDU_process_mfile_token (./utils/args/args.c:299): token cpu not supported at this time
>
> [mpiexec at fdns00_ws] HYDU_parse_hostfile (./utils/args/args.c:347): unable to process token

What does your machine file look like? It sounds like you've got something in there that Hydra does not expect.

==rob
>
> [mpiexec at fdns00_ws] mfile_fn (./ui/mpich/utils.c:341): error parsing hostfile
>
> [mpiexec at fdns00_ws] match_arg (./utils/args/args.c:153): match handler returned error
>
> [mpiexec at fdns00_ws] HYDU_parse_array (./utils/args/args.c:175): argument matching returned error
>
> [mpiexec at fdns00_ws] parse_args (./ui/mpich/utils.c:1609): error parsing input array
>
> [mpiexec at fdns00_ws] HYD_uii_mpx_get_parameters (./ui/mpich/utils.c:1660): unable to parse user arguments
>
> [mpiexec at fdns00_ws] main (./ui/mpich/mpiexec.c:153): error parsing parameters
>
> Command exited with non-zero status 255
>
> *****************error messages********************
>
> This old version was downloaded from the WRF site
> (http://www2.mmm.ucar.edu/wrf/OnLineTutorial/compilation_tutorial.php#STEP2)
> and was built with essentially all the default configuration settings,
> without any option arguments given in the configure/make/make install
> process, as follows:
>
> ./configure -prefix=$DIR/mpich
> make
> make install
>
> There were no error messages during the build process, and mpirun
> works fine for parallel runs using multiple processors on a single
> node only, but I get the above error messages when attempting a
> parallel run across multiple machines.
>
> I need some advice on how to get this old MPICH 3.0.4 working across
> machines. The OS on these machines is CentOS 5.5, with gcc 4.1.2 and
> gcc 4.4.7 installed. As WRF needs gcc 4.4 or higher, I built
> MPICH 3.0.4 using gcc 4.4.7.
>
> I would appreciate any help and advice.
>
> Many thanks.
>
>
>
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>