[mpich-discuss] MPI_init very slow with more than 3 nodes: Solved (in part)

Pavan Balaji balaji at mcs.anl.gov
Thu Dec 5 10:11:03 CST 2013


Excellent.  Thanks for letting us know.

  — Pavan

On Dec 5, 2013, at 9:24 AM, Bixente BODO GOMEZ <bixente.bodo at ehu.es> wrote:

> Hi!
> 
> The problem is NFS, although I don't know whether the server or the clients are at fault.
> I have installed MPICH locally on every node, and now the system runs well:
> 
> mpiu@u105251:~$ date; mpirun -f machinefile -np 28 test/hello; date
> Thu Dec  5 16:08:54 CET 2013
> ...
> Hello from processor u103972. 4 of 28
> ...
> Thu Dec  5 16:08:57 CET 2013
> 
> And
> 
> mpiu@u105251:~$ date; mpirun -f machinefile -np 8 test/cilindro; date
> Thu Dec  5 16:17:17 CET 2013
> 
>  Calculation of the surface area and volume of a cylinder
>    of radius r and height h
>    Number of processors:  8
> 
>    Enter the radius r:  1
>          the height h:  22
>                 delta:  1e-8
> 
> Calculating.....
> 
>    Known value of the volume    : 69.1150383770
>    Computed value               : 69.1150349775
>    Known value of the area      : 138.2300767540
>    Computed value               : 138.2300699549
> 
>    Elapsed time (sec)           : 0.95427608
> 
> Thu Dec  5 16:17:26 CET 2013
> 
> 
> mpiu@u105251:~$ date; mpirun -f machinefile -np 28 test/cilindro; date
> Thu Dec  5 16:17:46 CET 2013
> 
>  Calculation of the surface area and volume of a cylinder
>    of radius r and height h
>    Number of processors:  28
> 
>    Enter the radius r:  1
>          the height h:  22
>                 delta:  1e-8
> 
> Calculating.....
> 
>    Known value of the volume    : 69.1150383770
>    Computed value               : 69.1150348402
>    Known value of the area      : 138.2300767540
>    Computed value               : 138.2300696804
> 
>    Elapsed time (sec)           : 0.30431104
> 
> Thu Dec  5 16:17:55 CET 2013
> 
> Thanks.
> 
> Pavan Balaji <balaji at mcs.anl.gov> wrote:
> 
>> One thing you could try is to install mpich on a non-NFS directory such as /tmp and run a program that is also not on NFS to see if the problem persists.
>> 
>> You can build mpich on an NFS directory with --prefix=/tmp/mpich and then do “make install” on each machine.
>> 
>> Similarly for the “application”, you can run “/bin/hostname” or something similar.
>> 
>>  — Pavan
>> 
>> On Dec 3, 2013, at 12:06 PM, Antonio J. Peña <apenya at mcs.anl.gov> wrote:
>> 
>>> 
>>> Yes, it may be a good idea to look into that. Hopefully somebody on this list with NFS knowledge will give you a hint.
>>> 
>>>  Antonio
>>> 
>>> 
>>> On 12/03/2013 09:48 AM, Bixente BODO GOMEZ wrote:
>>>> I am sending an extract of the tcpdump output.  It seems that the problem lies in the NFS file attributes or locks.  I changed fstab on the clients to add these options:
>>>> 
>>>> nfsvers=3,rw,bg,noac,rsize=8192,wsize=8192
>>>> 
>>>> but nothing has improved.
>>>> 
>>>> "Antonio J. Peña" <apenya at mcs.anl.gov> wrote:
>>>> 
>>>>> I don't think this is helping. Maybe tcpdump / wireshark captures could be helpful.
>>>>> 
>>>>>  Antonio
>>>>> 
>>>>> 
>>>>> On 12/02/2013 11:44 AM, Bixente BODO GOMEZ wrote:
>>>>>> The attachment...
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> discuss mailing list     discuss at mpich.org
>>>>>> To manage subscription options or unsubscribe:
>>>>>> https://lists.mpich.org/mailman/listinfo/discuss
>>>>> 
>>>>> 
>>>>> --
>>>>> Antonio J. Peña
>>>>> Postdoctoral Appointee
>>>>> Mathematics and Computer Science Division
>>>>> Argonne National Laboratory
>>>>> 9700 South Cass Avenue, Bldg. 240, Of. 3148
>>>>> Argonne, IL 60439-4847
>>>>> (+1) 630-252-7928
>>>>> apenya at mcs.anl.gov
>>>>> www.mcs.anl.gov/~apenya
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> --
>> Pavan Balaji
>> http://www.mcs.anl.gov/~balaji
>> 
> 
> 
> 

--
Pavan Balaji
http://www.mcs.anl.gov/~balaji
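
Since this thread traced the slow startup to NFS, a quick first check is whether the launcher and the test program actually live on an NFS mount. The following is only a minimal sketch, assuming GNU coreutils `stat` (the `-f -c %T` options are not portable to every system). The helper names `fs_type`, `is_on_nfs`, and `report` are hypothetical, and the paths are just examples based on the commands quoted above.

```shell
#!/bin/sh
# Print the filesystem type of the mount that holds a path.
# Assumption: GNU coreutils stat, which reports NFS mounts as "nfs".
fs_type() {
    stat -f -c %T -- "$1"
}

# Succeed (exit status 0) if the path sits on an NFS mount.
is_on_nfs() {
    case "$(fs_type "$1")" in
        nfs*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Print one line per existing path, saying whether it is on NFS.
report() {
    for p in "$@"; do
        [ -e "$p" ] || continue        # skip empty or missing paths
        if is_on_nfs "$p"; then
            echo "$p: on NFS"
        else
            echo "$p: not on NFS ($(fs_type "$p"))"
        fi
    done
}

# Example paths only: the mpirun found in PATH and the test program.
report "$(command -v mpirun 2>/dev/null)" "$HOME/test/hello"
```

Running this on every node listed in the machinefile shows which nodes load the binaries over NFS and which use a local copy.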



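The suggestion quoted above (build on NFS with --prefix=/tmp/mpich, then run "make install" on each machine) could be scripted roughly as follows. This is a sketch under stated assumptions, not something posted in the thread: it assumes passwordless ssh to every node, a machinefile with one "host" or "host:count" entry per line, and a hypothetical build directory `$HOME/mpich-build` where `./configure --prefix=/tmp/mpich && make` has already been run. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical locations; adjust to the actual cluster.
BUILD_DIR=${BUILD_DIR:-$HOME/mpich-build}   # shared (NFS) build tree, already built
MACHINEFILE=${MACHINEFILE:-machinefile}
DRY_RUN=${DRY_RUN:-1}                       # 1 = only print what would be run

# List the unique host names in a machinefile, ignoring comments,
# blank lines, and any ":count" suffix.
hosts_from_machinefile() {
    awk '{ sub(/#.*/, ""); if (NF) { split($1, a, ":"); print a[1] } }' "$1" | sort -u
}

# Run "make install" on one host, so that /tmp/mpich ends up on its local disk.
install_on() {
    cmd="cd '$BUILD_DIR' && make install"
    if [ "$DRY_RUN" = 1 ]; then
        echo "ssh $1 \"$cmd\""
    else
        ssh "$1" "$cmd"
    fi
}

# Install on every host in the machinefile.
install_everywhere() {
    hosts_from_machinefile "$1" | while read -r host; do
        # Redirect stdin so ssh does not swallow the remaining host list.
        install_on "$host" </dev/null
    done
}

if [ -r "$MACHINEFILE" ]; then
    install_everywhere "$MACHINEFILE"
fi
```

After the install, the test suggested in the thread would be something like `/tmp/mpich/bin/mpirun -f machinefile -np 28 /bin/hostname`, so that neither the launcher nor the application is read from NFS.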
