[mpich-discuss] MPI_init very slow with more than 3 nodes

Bixente BODO GOMEZ bixente.bodo at ehu.es
Tue Dec 3 09:48:34 CST 2013


I am sending an extract of the tcpdump output.  The problem seems to be in
the NFS file attributes or locks.  I changed fstab on the clients to add
these options:

nfsvers=3,rw,bg,noac,rsize=8192,wsize=8192

but nothing has improved.
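
An fstab change only takes effect once the share is remounted, so it can be worth confirming which options are really active on each client. The following is a minimal sketch, not a definitive tool: it parses text in the /proc/mounts format and lists the options of every NFS mount. The function name and the sample line (server name, export path, mount point) are hypothetical, and the kernel may report option names differently from how they were written in fstab.

```python
def nfs_mount_options(mounts_text):
    """Map each NFS mount point to its list of active mount options."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] in ("nfs", "nfs4"):
            result[fields[1]] = fields[3].split(",")
    return result


# Hypothetical sample line; on a real client read /proc/mounts instead.
SAMPLE = "server:/export /home nfs rw,vers=3,rsize=8192,wsize=8192,noac 0 0\n"

if __name__ == "__main__":
    for mount_point, options in nfs_mount_options(SAMPLE).items():
        print(mount_point, options)
```

On a client, the sample string would be replaced by the contents of /proc/mounts, for example `nfs_mount_options(open("/proc/mounts").read())`.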

"Antonio J. Peña" <apenya at mcs.anl.gov> wrote:

> I don't think this is helping. Maybe tcpdump / wireshark captures  
> could be helpful.
>
>   Antonio
>
>
> On 12/02/2013 11:44 AM, Bixente BODO GOMEZ wrote:
>> The attachment...
>>
>>
>> _______________________________________________
>> discuss mailing list     discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
>
>
> -- 
> Antonio J. Peña
> Postdoctoral Appointee
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 9700 South Cass Avenue, Bldg. 240, Of. 3148
> Argonne, IL 60439-4847
> (+1) 630-252-7928
> apenya at mcs.anl.gov
> www.mcs.anl.gov/~apenya


-------------- next part --------------
16:37:34.648679 IP u103972.165588456 > u105251.nfs: 232 getattr fh 0,0/22
16:37:34.670903 IP u105251.nfs > u103950.903: Flags [.], ack 1510177, win 7250, options [nop,nop,TS val 1884415 ecr 1806213], length 0
16:37:34.681582 IP u105251.nfs > u103950.3170553679: reply ok 76 getattr NON 3 ids 0/24 sz 0
16:37:34.681613 IP u105251.nfs > u103972.165588456: reply ok 52 getattr ERROR: unk 10011
16:37:34.682380 IP u103950.3187330895 > u105251.nfs: 240 getattr fh 0,0/22
16:37:34.682402 IP u103972.182365672 > u105251.nfs: 172 getattr fh 0,0/35
16:37:34.682426 IP u105251.nfs > u103972.182365672: reply ok 60 getattr NON 1 ids 0/1004313938 sz -1482030592
16:37:34.682438 IP u105251.nfs > u103950.903: Flags [.], ack 1510421, win 7250, options [nop,nop,TS val 1884417 ecr 1806226], length 0
16:37:34.682488 IP u105251.nfs > u103950.3187330895: reply ok 316 getattr NON 5 ids 0/18 sz 0
16:37:34.683187 IP u103972.199142888 > u105251.nfs: 116 getattr fh 0,0/36
16:37:34.683208 IP u103950.3204108111 > u105251.nfs: 156 getattr fh 0,0/22
16:37:34.698442 IP u105251.nfs > u103950.3204108111: reply ok 68 getattr NON 2 ids 0/20 sz 0
16:37:34.699259 IP u103950.3220885327 > u105251.nfs: 232 getattr fh 0,0/22
16:37:34.722904 IP u105251.nfs > u103972.961: Flags [.], ack 1592445, win 6925, options [nop,nop,TS val 1884428 ecr 1801168], length 0
16:37:34.732194 IP u105251.nfs > u103972.199142888: reply ok 76 getattr NON 3 ids 0/24 sz 0
16:37:34.732211 IP u105251.nfs > u103950.3220885327: reply ok 52 getattr ERROR: unk 10011
16:37:34.732995 IP u103972.215920104 > u105251.nfs: 240 getattr fh 0,0/22
16:37:34.733017 IP u103950.3237662543 > u105251.nfs: 172 getattr fh 0,0/35
16:37:34.733040 IP u105251.nfs > u103950.3237662543: reply ok 60 getattr NON 1 ids 0/1004313938 sz -1465253376
16:37:34.733051 IP u105251.nfs > u103972.961: Flags [.], ack 1592689, win 6925, options [nop,nop,TS val 1884430 ecr 1801180], length 0
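
A capture like the one above is easier to read once the getattr traffic is tallied per client. The following is a rough sketch under stated assumptions, not part of the original attachment: the `summarize` function and its regular expression are hypothetical helpers, they assume the one-line tcpdump text format shown here, and they assume host names without dots. Bare TCP ACK lines are ignored.

```python
import re
from collections import Counter

# Matches lines such as:
# 16:37:34.682380 IP u103950.3187330895 > u105251.nfs: 240 getattr fh 0,0/22
LINE_RE = re.compile(r"^\S+ IP (\S+?)\.(\S+) > (\S+?)\.(\S+): (.*)$")


def summarize(capture_text):
    """Count getattr calls, replies and error replies per client host."""
    counts = Counter()
    for line in capture_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        src_host, _, dst_host, _, rest = match.groups()
        if rest.startswith("reply"):
            # Replies go from the server to the client, so key on dst_host.
            kind = "error reply" if "ERROR" in rest else "reply"
            counts[(dst_host, kind)] += 1
        elif "getattr" in rest:
            # Calls go from the client to the server, so key on src_host.
            counts[(src_host, "getattr call")] += 1
    return counts


# A few lines copied from the capture above.
SAMPLE_CAPTURE = """\
16:37:34.681613 IP u105251.nfs > u103972.165588456: reply ok 52 getattr ERROR: unk 10011
16:37:34.682380 IP u103950.3187330895 > u105251.nfs: 240 getattr fh 0,0/22
16:37:34.682438 IP u105251.nfs > u103950.903: Flags [.], ack 1510421, win 7250, options [nop,nop,TS val 1884417 ecr 1806226], length 0
16:37:34.682488 IP u105251.nfs > u103950.3187330895: reply ok 316 getattr NON 5 ids 0/18 sz 0
"""

if __name__ == "__main__":
    for (host, kind), n in sorted(summarize(SAMPLE_CAPTURE).items()):
        print(host, kind, n)
```

Run over a full capture file, the per-host totals show whether one client accounts for most of the error replies or whether they are spread evenly.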


