[mpich-discuss] Running mpich over InfiniBand Open Fabrics

Ramiro Alba raq at cttc.upc.edu
Thu Jan 15 12:25:31 CST 2015


Antonio,

Thanks for the info. I'll give it a try. All our IB cards are from
Mellanox, both DDR and QDR. The only extra step is installing the
Mellanox OFED stack to get the MXM library and modules.
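
A minimal sketch of what that rebuild might look like, assuming the
Mellanox OFED install places MXM under /opt/mellanox/mxm (a hypothetical
prefix, adjust as needed) and that this MPICH release accepts the mxm
netmod name together with a --with-mxm option. The remaining options are
the ones from the original build quoted below:

```shell
# Assumed MXM install prefix; override by exporting MXM_DIR beforehand.
MXM_DIR="${MXM_DIR:-/opt/mellanox/mxm}"

# Same options as the original build, with the netmod switched from ib to mxm.
CONFIGURE_ARGS="--enable-fortran=yes --with-device=ch3:nemesis:mxm --with-mxm=${MXM_DIR} --with-pm=hydra:gforker"

# Print the command for review instead of running it directly.
echo "./configure ${CONFIGURE_ARGS}"
```

The snippet only prints the configure line, so it can be checked against
the local setup before anything is rebuilt.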

Regards

On 2015-01-15 17:21, Antonio J. Peña wrote:
> Hi Ramiro,
> 
>  Our folks that contributed that netmod are looking into this issue.
> In the meantime, I'd suggest trying the MXM netmod. MXM is a netmod
> for IB networks using the MXM API instead of Verbs to interact with
> the HCA.
> 
>  Best,
>    Antonio
> 
>  On 01/15/2015 05:52 AM, Ramiro Alba wrote:
> 
>> Hi all,
>> 
>> I've compiled mpich-3.1.3 on CentOS 6.5 with the following options:
>> 
>> 
>>         --enable-fortran=yes
>>         --with-device=ch3:nemesis:ib
>>         --with-pm=hydra:gforker
>> 
>> and the package 'libibverbs-devel' installed.
>> 
>> When I try to run a test hello program on two IB DDR nodes, using the
>> command:
>> 
>> mpiexec.hydra -np 16 -bind-to core -launcher rsh -iface ib0 -hosts
>> jff201,jff202 mpi_hello
>> 
>> I get the errors below, even when running as the root user.
>> 
>> If I compile with:
>> 
>> --with-device=ch3:nemesis
>> 
>> it works with no errors.
>> 
>> I am also using both Open MPI and MVAPICH2 on InfiniBand, and they
>> work fine.
>> 
>> Am I doing something wrong when compiling and/or running?
>> Any suggestion is welcome.
>> 
>> Thanks in advance
>> Regards
>> 
>> 
>> ##########################################################################
>> 
>> MPICH OVER IB: RUN ERRORS
>> 
>> ##########################################################################
>> 
>> Fatal error in MPI_Init: Other MPI error, error stack:
>> MPIR_Init_thread(498):
>> MPID_Init(177).......: channel initialization failed
>> MPIDI_CH3_Init(89)...:
>> MPID_nem_init(320)...:
>> MPID_nem_ib_init(264): MPID_nem_ib_com_open failed
>> Fatal error in MPI_Init: Other MPI error, error stack:
>> MPIR_Init_thread(498):
>> MPID_Init(177).......: channel initialization failed
>> MPIDI_CH3_Init(89)...:
>> MPID_nem_init(320)...:
>> MPID_nem_ib_init(264): MPID_nem_ib_com_open failed
>> [root at jff201 mpich]# mpirun -np 2 -iface eth0 mpi_hello-mpich
>> IB device not foundFatal error in MPI_Init: Other MPI error, error stack:
>> MPIR_Init_thread(498):
>> MPID_Init(177).......: channel initialization failed
>> MPIDI_CH3_Init(89)...:
>> MPID_nem_init(320)...:
>> MPID_nem_ib_init(264): MPID_nem_ib_com_open failed
>> IB device not foundFatal error in MPI_Init: Other MPI error, error stack:
>> MPIR_Init_thread(498):
>> MPID_Init(177).......: channel initialization failed
>> MPIDI_CH3_Init(89)...:
>> MPID_nem_init(320)...:
>> MPID_nem_ib_init(264): MPID_nem_ib_com_open failed
>> 
>> ##########################################################################
>> 
>> 
>> _______________________________________________
>> discuss mailing list discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss [1]
> 
> --
> Antonio J. Peña
> Postdoctoral Appointee
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 9700 South Cass Avenue, Bldg. 240, Of. 3148
> Argonne, IL 60439-4847
> apenya at mcs.anl.gov
> www.mcs.anl.gov/~apenya [2]
> 
> --
> This message has been scanned by MAILSCANNER [3]
> for viruses and other dangerous content,
> and is believed to be clean.
> 
> Links:
> ------
> [1] https://lists.mpich.org/mailman/listinfo/discuss
> [2] http://www.mcs.anl.gov/~apenya
> [3] http://www.mailscanner.info/

-- 
Ramiro Alba

Centre Tecnològic de Transferència de Calor
http://www.cttc.upc.edu

Escola Tècnica Superior d'Enginyeries
Industrial i Aeronàutica de Terrassa
Colom 11, E-08222, Terrassa, Barcelona, Spain
Tel: (+34) 93 739 8928

