[mpich-discuss] Running mpich over InfiniBand Open Fabrics
Ramiro Alba
raq at cttc.upc.edu
Thu Jan 15 12:25:31 CST 2015
Antonio,
Thanks for the info. I'll give it a try. All our IB cards are from Mellanox,
both DDR and QDR. The only extra step is installing the Mellanox OFED stack
to get the mxm library and modules.
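
A rebuild against MXM might look like the sketch below. It reuses the
configure options from the original build and only swaps the device string
to ch3:nemesis:mxm. The install prefix, the MXM location (/opt/mellanox/mxm)
and the helper name mxm_configure_line are assumptions for illustration, and
the --with-mxm flag should be checked against ./configure --help before use.

```shell
#!/bin/sh
# Sketch only: compose a configure line for MPICH 3.1.3 with the MXM netmod.

# Print the configure command for a given install prefix and MXM location.
# Both arguments are placeholders chosen for illustration.
mxm_configure_line() {
    prefix="$1"
    mxm_dir="$2"
    # --enable-fortran and --with-pm are taken from the original build;
    # only the device string changes from ch3:nemesis:ib to ch3:nemesis:mxm.
    echo "./configure --prefix=$prefix --enable-fortran=yes --with-device=ch3:nemesis:mxm --with-mxm=$mxm_dir --with-pm=hydra:gforker"
}

# Assumed locations; override through the environment if they differ.
MXM_DIR="${MXM_DIR:-/opt/mellanox/mxm}"
PREFIX="${PREFIX:-$HOME/mpich-3.1.3-mxm}"

# Warn early if MXM is not where we expect it.
if [ ! -d "$MXM_DIR" ]; then
    echo "warning: $MXM_DIR not found; is Mellanox OFED installed?" >&2
fi

# Print rather than run, so the line can be reviewed before building.
mxm_configure_line "$PREFIX" "$MXM_DIR"
```

The script only prints the command, so nothing is built until the line has
been reviewed and run by hand from the mpich source directory.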
Regards
On 2015-01-15 17:21, Antonio J. Peña wrote:
> Hi Ramiro,
>
> Our folks that contributed that netmod are looking into this issue.
> In the meantime, I'd suggest trying the MXM netmod. MXM is a netmod
> for IB networks using the MXM API instead of Verbs to interact with
> the HCA.
>
> Best,
> Antonio
>
> On 01/15/2015 05:52 AM, Ramiro Alba wrote:
>
>> Hi all,
>>
>> I've compiled mpich-3.1.3 on CentOS 6.5 with the following options:
>>
>>
>> --enable-fortran=yes
>> --with-device=ch3:nemesis:ib
>> --with-pm=hydra:gforker
>>
>> and the package 'libibverbs-devel' installed.
>>
>> When I try to run a test hello program on two IB DDR nodes with the
>> command:
>>
>> mpiexec.hydra -np 16 -bind-to core -launcher rsh -iface ib0 -hosts jff201,jff202 mpi_hello
>>
>> I get the errors below, even when running as root.
>>
>> If I compile with:
>>
>> --with-device=ch3:nemesis
>>
>> it works with no errors.
>>
>> I am also using both openmpi and mvapich2 on InfiniBand, and they
>> work fine.
>>
>> Am I doing something wrong when compiling and/or running?
>> Any suggestions are welcome.
>>
>> Thanks in advance
>> Regards
>>
>>
>> ##########################################################################
>>
>> MPICH OVER IB: RUN ERRORS
>>
>> ##########################################################################
>>
>> Fatal error in MPI_Init: Other MPI error, error stack:
>> MPIR_Init_thread(498):
>> MPID_Init(177).......: channel initialization failed
>> MPIDI_CH3_Init(89)...:
>> MPID_nem_init(320)...:
>> MPID_nem_ib_init(264): MPID_nem_ib_com_open failed
>> Fatal error in MPI_Init: Other MPI error, error stack:
>> MPIR_Init_thread(498):
>> MPID_Init(177).......: channel initialization failed
>> MPIDI_CH3_Init(89)...:
>> MPID_nem_init(320)...:
>> MPID_nem_ib_init(264): MPID_nem_ib_com_open failed
>> [root@jff201 mpich]# mpirun -np 2 -iface eth0 mpi_hello-mpich
>> IB device not found
>> Fatal error in MPI_Init: Other MPI error, error stack:
>> MPIR_Init_thread(498):
>> MPID_Init(177).......: channel initialization failed
>> MPIDI_CH3_Init(89)...:
>> MPID_nem_init(320)...:
>> MPID_nem_ib_init(264): MPID_nem_ib_com_open failed
>> IB device not found
>> Fatal error in MPI_Init: Other MPI error, error stack:
>> MPIR_Init_thread(498):
>> MPID_Init(177).......: channel initialization failed
>> MPIDI_CH3_Init(89)...:
>> MPID_nem_init(320)...:
>> MPID_nem_ib_init(264): MPID_nem_ib_com_open failed
>>
>> ##########################################################################
>>
>>
>> _______________________________________________
>> discuss mailing list discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss [1]
>
> --
> Antonio J. Peña
> Postdoctoral Appointee
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 9700 South Cass Avenue, Bldg. 240, Of. 3148
> Argonne, IL 60439-4847
> apenya at mcs.anl.gov
> www.mcs.anl.gov/~apenya [2]
>
> --
> This message has been scanned by MAILSCANNER [3]
> for viruses and other dangerous content,
> and is believed to be clean.
>
> Links:
> ------
> [1] https://lists.mpich.org/mailman/listinfo/discuss
> [2] http://www.mcs.anl.gov/~apenya
> [3] http://www.mailscanner.info/
--
Ramiro Alba
Centre Tecnològic de Tranferència de Calor
http://www.cttc.upc.edu
Escola Tècnica Superior d'Enginyeries
Industrial i Aeronàutica de Terrassa
Colom 11, E-08222, Terrassa, Barcelona, Spain
Tel: (+34) 93 739 8928