[mpich-discuss] Hybrid HPC system

Min Si msi at anl.gov
Sun Jan 22 14:47:27 CST 2017


Hi Doha,

Can you please send us the config.log file for each MPICH build and your
helloworld source code? The config.log file should be under your MPICH
build directory where you executed ./configure.

Min
On 1/21/17 4:53 AM, Doha Ehab wrote:
> I have tried what you mentioned in the previous e-mail.
>
> 1- I have built MPICH for the CPU node and the ARM node.
> 2- Uploaded the binaries to the same path on the 2 nodes.
> 3- Compiled helloWorld (it sends a number from process zero to all
> other processes) for both nodes. Then tried mpiexec -np 2 -f
> <hostfile with mic hostnames> ./helloworld
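>
> (The helloworld source is not attached here; a minimal sketch
> consistent with that description, where rank 0 sends an integer to
> every other rank and each receiver prints it, would be something like
> the following. The payload value and the tag are illustrative.)
>
>     #include <mpi.h>
>     #include <stdio.h>
>
>     int main(int argc, char **argv)
>     {
>         int rank, size, value = 42;   /* illustrative payload */
>         MPI_Init(&argc, &argv);
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>         MPI_Comm_size(MPI_COMM_WORLD, &size);
>         if (rank == 0) {
>             /* process zero sends the number to every other process */
>             for (int dst = 1; dst < size; dst++)
>                 MPI_Send(&value, 1, MPI_INT, dst, 0, MPI_COMM_WORLD);
>         } else {
>             /* the other processes receive it from rank 0 and print it */
>             MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
>                      MPI_STATUS_IGNORE);
>             printf("rank %d received %d\n", rank, value);
>         }
>         MPI_Finalize();
>         return 0;
>     }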
>
> I got this error
>  Fatal error in MPI_Recv: Other MPI error, error stack:
> MPI_Recv(200)................................: 
> MPI_Recv(buf=0xbe9460d0, count=1, MPI_INT, src=0, tag=0, 
> MPI_COMM_WORLD, status=0x1) failed
> MPIDI_CH3i_Progress_wait(242)................: an error occurred while 
> handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(554)...:
> MPIDI_CH3_Sockconn_handle_connopen_event(899): unable to find the 
> process group structure with id <>
>
> Regards,
> Doha
>
>
> On Wed, Nov 16, 2016 at 6:38 PM, Min Si <msi at anl.gov> wrote:
>
>     I guess you might need to put all the MPICH binaries (e.g.,
>     hydra_pmi_proxy) in the same path on each node. I have executed
>     MPICH on Intel MIC chips from the host CPU node, where the OSes
>     are different. What I did was:
>     1. build MPICH for both the CPU node and the MIC on the CPU node
>     (you have done this step).
>     2. upload the MIC binaries to the same path on the MIC chip as on
>     the CPU node
>        For example:
>        - on the CPU node: /tmp/mpich/install/bin holds the CPU version
>        - on the MIC:      /tmp/mpich/install/bin holds the MIC version
>     3. compile helloworld.c with the MIC version of mpicc
>     4. execute on the CPU node: mpiexec -np 2 -f <hostfile with mic
>     hostnames> ./helloworld
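>
>     Roughly, with the /tmp/mpich/install prefix above, a MIC hostname
>     "mic0", and a MIC staging directory /tmp/mpich-mic-install (the
>     hostname and staging path are just illustrative), the sequence
>     looks something like:
>
>        # step 1: native build/install on the CPU node
>        ./configure --prefix=/tmp/mpich/install && make && make install
>        # ...plus a cross build for the MIC into the staging directory
>        # (configure options depend on your cross toolchain)
>
>        # step 2: copy the MIC build to the *same* path on the card
>        scp -r /tmp/mpich-mic-install/* mic0:/tmp/mpich/install/
>
>        # step 3: compile helloworld with the MIC version of mpicc
>        /tmp/mpich-mic-install/bin/mpicc helloworld.c -o helloworld
>
>        # step 4: launch from the CPU node with a hostfile of MIC names
>        mpiexec -np 2 -f micfile ./helloworld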
>
>     I think you should be able to follow step 2, but since your
>     helloworld binary is also built for a different OS, you might want
>     to put it in the same path on both nodes as well, just as we do
>     for the MPICH binaries.
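>
>     (For example, if the job is launched from /home/doha/run on the
>     CPU node, copying an ARM-built binary as
>         scp ./helloworld-arm arm0:/home/doha/run/helloworld
>     lets ./helloworld resolve to the right build on each node; the
>     hostname, path, and file names here are only illustrative.)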
>
>     Min
>
>
>     On 11/16/16 8:29 AM, Kenneth Raffenetti wrote:
>
>         Have you disabled any and all firewalls on both nodes? It
>         sounds like they are unable to communicate during initialization.
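>
>         (The exact commands vary by distribution; on each node,
>         something along the lines of:
>             sudo systemctl status firewalld   # or: sudo ufw status
>             sudo systemctl stop firewalld     # temporarily, for testing
>             sudo iptables -L -n               # inspect remaining rules
>         is usually enough to rule the firewall in or out.)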
>
>         Ken
>
>         On 11/16/2016 07:34 AM, Doha Ehab wrote:
>
>             Yes, I built MPICH-3 on both systems. I tried the code on
>             each node separately and it worked, and I tried each node
>             with other nodes that have the same operating system and
>             it worked as well.
>             When I try the code on the 2 nodes that have different
>             operating systems, no result or error message appears.
>
>             Regards
>             Doha
>
>             On Mon, Nov 14, 2016 at 6:25 PM, Kenneth Raffenetti
>             <raffenet at mcs.anl.gov> wrote:
>
>                 It may be possible to run in such a setup, but it
>                 would not be recommended. Did you build MPICH on both
>                 systems you are trying to run on? What exactly
>                 happened when the code didn't work?
>
>                 Ken
>
>
>                 On 11/13/2016 12:36 AM, Doha Ehab wrote:
>
>                     Hello,
>                     I tried to run a parallel (Hello World) C code on
>                     a cluster that has 2 nodes. The nodes have
>                     different operating systems, so the code did not
>                     work and no results were printed.
>                     How can I make such a cluster work? Are there
>                     extra steps that should be done?
>
>                     Regards,
>                     Doha
>

_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

