[mpich-discuss] Problem with Mpich3.0.4 build for WRF run across multiple nodes in a cluster.
Teck-Bin Arthur Lim
limtba at ihpc.a-star.edu.sg
Tue Jun 21 21:23:01 CDT 2016
Dear Rob,
Thanks. My machinefile and script are simple; here they are:
*********************************************
[limtba at fdns00_ws testenv-testlib]$ more mpi-script
nohup time mpirun -machinefile machinefile -np 8 ./a.out >& log.aout &
[limtba at fdns00_ws testenv-testlib]$ more machinefile
fdns00_ib cpu=3
fdns01_ib cpu=2
fdns02_ib cpu=3
*********************************************
Is there a syntax error? This is only a small test.
Without the machinefile, MPICH works fine on a single node.
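For reference, the first error in the log is "token cpu not supported at this time", which suggests that Hydra, the process manager that ships with MPICH 3.x, does not accept the `cpu=N` token. Hydra's documented hostfile syntax is one host per line, optionally followed by `:count` (for example `fdns00_ib:3`). The following is a minimal sketch, assuming the three-line machinefile above, that rewrites `host cpu=N` lines into that `host:N` form. `convert_machinefile` and `machinefile.hydra` are hypothetical names used only for illustration; they are not part of MPICH.

```shell
#!/bin/sh
# Hypothetical helper (not part of MPICH): convert "host cpu=N" lines into
# the "host:N" form that the Hydra process manager reads from a hostfile.
convert_machinefile() {
  awk '
    /^[ \t]*(#|$)/ { next }   # skip blank lines and comment lines
    {
      host = $1
      n = ""
      # look for a cpu=N token anywhere after the host name
      for (i = 2; i <= NF; i++)
        if ($i ~ /^cpu=[0-9]+$/) n = substr($i, 5)
      if (n == "") print host          # no cpu= token: keep the bare host name
      else print host ":" n
    }
  ' "$1"
}

# Example usage (file names are assumptions):
# convert_machinefile machinefile > machinefile.hydra
```

The converted file can then be passed to mpirun with `-machinefile` as in mpi-script. The counts 3 + 2 + 3 still add up to the `-np 8` requested there.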
Arthur
-----Original Message-----
From: Rob Latham [mailto:robl at mcs.anl.gov]
Sent: Tuesday, June 21, 2016 11:21 PM
To: discuss at mpich.org
Subject: Re: [mpich-discuss] Problem with Mpich3.0.4 build for WRF run across multiple nodes in a cluster.
On 06/21/2016 05:58 AM, Teck-Bin Arthur Lim wrote:
> Hi,
>
> I have run into a basic problem trying to get an old MPICH version (3.0.4)
> to do parallel runs across different machines in a mini-cluster. The error
> messages printed when invoking mpirun are:
>
> *****************error messages********************
>
> [limtba at fdns00_ws testenv-testlib]$ ./mpi-script
>
> [limtba at fdns00_ws testenv-testlib]$ more log.aout
>
> [mpiexec at fdns00_ws] HYDU_process_mfile_token (./utils/args/args.c:299): token cpu not supported at this time
> [mpiexec at fdns00_ws] HYDU_parse_hostfile (./utils/args/args.c:347): unable to process token
what does your machine file look like? it sounds like you've got something in there that Hydra does not expect.
==rob
>
> [mpiexec at fdns00_ws] mfile_fn (./ui/mpich/utils.c:341): error parsing hostfile
> [mpiexec at fdns00_ws] match_arg (./utils/args/args.c:153): match handler returned error
> [mpiexec at fdns00_ws] HYDU_parse_array (./utils/args/args.c:175): argument matching returned error
> [mpiexec at fdns00_ws] parse_args (./ui/mpich/utils.c:1609): error parsing input array
> [mpiexec at fdns00_ws] HYD_uii_mpx_get_parameters (./ui/mpich/utils.c:1660): unable to parse user arguments
> [mpiexec at fdns00_ws] main (./ui/mpich/mpiexec.c:153): error parsing parameters
> Command exited with non-zero status 255
>
> *****************error messages********************
>
> This old version was downloaded from the WRF site
> (http://www2.mmm.ucar.edu/wrf/OnLineTutorial/compilation_tutorial.php#STEP2)
> and was built with essentially all the default configuration settings,
> without any option arguments given in the configure/make/make install
> process:
>
> ./configure -prefix=$DIR/mpich
>
> make
>
> make install
>
> There were no error messages during the build process, and mpirun works
> fine for parallel runs using multiple processors on a single node only,
> but I get the above error messages when attempting a parallel run across
> multiple machines.
>
> I need some advice on how to get this old MPICH 3.0.4 working across
> machines. The OS on these machines is CentOS 5.5, with gcc 4.1.2 and
> gcc 4.4.7 installed. As WRF needs gcc 4.4 or higher, I built MPICH 3.0.4
> using gcc 4.4.7.
>
> I would appreciate any help and advice.
>
> Many Thanks.
>
>
>
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>