[mpich-discuss] MPI_init very slow with more than 3 nodes: Solved (in part)

Bixente BODO GOMEZ bixente.bodo at ehu.es
Thu Dec 5 09:24:10 CST 2013


Hi!

The problem is NFS, though I don't know whether the cause is on the server or on the clients.
I have installed MPICH locally on all nodes, and now the system runs well:

mpiu@u105251:~$ date; mpirun -f machinefile -np 28 test/hello; date
Thu Dec  5 16:08:54 CET 2013
...
Hello from processor u103972. 4 of 28
...
Thu Dec  5 16:08:57 CET 2013
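
Bracketing the run with date, as above, only resolves whole seconds. A minimal sketch of a finer-grained timer follows, assuming GNU date (for the %N nanoseconds format). The name elapsed_ms is a hypothetical helper, not part of MPICH, and the figure it reports includes process launch, not just MPI_Init:

```shell
# elapsed_ms (hypothetical helper): run a command, print wall-clock time in ms.
# Assumes GNU date, which supports %N (nanoseconds).
# The body runs in a subshell, so its variables do not leak into the caller.
elapsed_ms() (
    start=$(date +%s%N)
    status=0
    "$@" >/dev/null || status=$?    # discard the command's stdout, keep its exit code
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
    exit "$status"
)

# Example (not run here): elapsed_ms mpirun -f machinefile -np 28 test/hello
```

Because the command's standard output is discarded, this suits non-interactive programs such as test/hello rather than ones that prompt for input.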

And

mpiu@u105251:~$ date; mpirun -f machinefile -np 8 test/cilindro; date
Thu Dec  5 16:17:17 CET 2013

   Calculation of the surface area and volume of a cylinder
     of radius r and height h
     Number of processors:  8

     Enter the radius r:  1
           the height h:  22
                  delta:  1e-8

Calculating.....

     Known value of the volume    : 69.1150383770
     Computed value               : 69.1150349775
     Known value of the area      : 138.2300767540
     Computed value               : 138.2300699549

     Elapsed time (sec)           : 0.95427608

Thu Dec  5 16:17:26 CET 2013


mpiu@u105251:~$ date; mpirun -f machinefile -np 28 test/cilindro; date
Thu Dec  5 16:17:46 CET 2013

   Calculation of the surface area and volume of a cylinder
     of radius r and height h
     Number of processors:  28

     Enter the radius r:  1
           the height h:  22
                  delta:  1e-8

Calculating.....

     Known value of the volume    : 69.1150383770
     Computed value               : 69.1150348402
     Known value of the area      : 138.2300767540
     Computed value               : 138.2300696804

     Elapsed time (sec)           : 0.30431104

Thu Dec  5 16:17:55 CET 2013

Thanks.
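
Since the slowdown went away once MPICH was moved off NFS, it may be worth confirming on each node which filesystem the MPICH installation and the test binaries actually live on. A minimal sketch, assuming GNU coreutils stat (its -f -c %T format). The names fs_type and is_nfs are hypothetical helpers introduced here for illustration:

```shell
# fs_type (hypothetical helper): print the filesystem type name for a path.
# Assumes GNU coreutils stat; -f queries the filesystem, %T prints its type name.
fs_type() {
    stat -f -c %T "$1"
}

# is_nfs (hypothetical helper): succeed if the path is on an NFS mount.
is_nfs() {
    case "$(fs_type "$1")" in
        nfs*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Example (not run here): report where a test binary is stored.
# if is_nfs test/hello; then echo "on NFS"; else echo "local"; fi
```

Running the check on every host in the machinefile, for example over ssh, would show whether any node still picks up binaries over NFS.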

Pavan Balaji <balaji at mcs.anl.gov> wrote:

> One thing you could try is to install mpich on a non-NFS directory  
> such as /tmp and run a program that is also not on NFS to see if the  
> problem persists.
>
> You can build mpich on an NFS directory with --prefix=/tmp/mpich and  
> then do “make install” on each machine.
>
> Similarly for the “application”, you can run “/bin/hostname” or  
> something similar.
>
>   — Pavan
>
> On Dec 3, 2013, at 12:06 PM, Antonio J. Peña <apenya at mcs.anl.gov> wrote:
>
>>
>> Yes, it may be a good idea to look into that. Hopefully somebody in  
>> this list with NFS knowledge will give you a hint.
>>
>>   Antonio
>>
>>
>> On 12/03/2013 09:48 AM, Bixente BODO GOMEZ wrote:
>>> I am sending an extract of a tcpdump capture. It seems that the
>>> problem is in the file attributes or locks of NFS. I changed fstab
>>> on the clients to add these options:
>>>
>>> nfsvers=3,rw,bg,noac,rsize=8192,wsize=8192
>>>
>>> but nothing has improved.
>>>
>>> "Antonio J. Peña" <apenya at mcs.anl.gov> wrote:
>>>
>>>> I don't think this is helping. Maybe tcpdump / wireshark captures  
>>>> could be helpful.
>>>>
>>>>   Antonio
>>>>
>>>>
>>>> On 12/02/2013 11:44 AM, Bixente BODO GOMEZ wrote:
>>>>> The attachment...
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> discuss mailing list     discuss at mpich.org
>>>>> To manage subscription options or unsubscribe:
>>>>> https://lists.mpich.org/mailman/listinfo/discuss
>>>>
>>>>
>>>> --
>>>> Antonio J. Peña
>>>> Postdoctoral Appointee
>>>> Mathematics and Computer Science Division
>>>> Argonne National Laboratory
>>>> 9700 South Cass Avenue, Bldg. 240, Of. 3148
>>>> Argonne, IL 60439-4847
>>>> (+1) 630-252-7928
>>>> apenya at mcs.anl.gov
>>>> www.mcs.anl.gov/~apenya
>>>
>>
>>
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji





