[mpich-discuss] Hybrid HPC system
Min Si
msi at anl.gov
Sun Jan 22 14:47:27 CST 2017
Hi Doha,
Can you please send us the config.log file for each MPICH build and your
helloworld source code? The config.log file should be under the MPICH
build directory where you executed ./configure.
Min
On 1/21/17 4:53 AM, Doha Ehab wrote:
> I have tried what you mentioned in the previous email.
>
> 1- I built MPICH for both the CPU node and the ARM node.
> 2- Uploaded the binaries to the same path on the two nodes.
> 3- Compiled helloWorld (it sends a number from process zero to all
> other processes; a sketch of the logic is below) for both nodes. Then
> tried mpiexec -np 2 -f <hostfile with mic hostnames> ./helloworld
>
> I got this error:
> Fatal error in MPI_Recv: Other MPI error, error stack:
> MPI_Recv(200)................................: MPI_Recv(buf=0xbe9460d0, count=1, MPI_INT, src=0, tag=0, MPI_COMM_WORLD, status=0x1) failed
> MPIDI_CH3i_Progress_wait(242)................: an error occurred while handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(554)...:
> MPIDI_CH3_Sockconn_handle_connopen_event(899): unable to find the process group structure with id <>
>
> Regards,
> Doha
>
>
> On Wed, Nov 16, 2016 at 6:38 PM, Min Si <msi at anl.gov> wrote:
>
>     I guess you might need to put all the MPICH binaries (e.g.,
>     hydra_pmi_proxy) in the same path on each node. I have executed
>     MPICH on Intel MIC chips from the host CPU node, where the
>     operating systems are different. What I did was:
>     1. build MPICH for both the CPU node and the MIC on the CPU node
>     (you have done this step).
>     2. upload the MIC binaries to the same path on the MIC chip as on
>     the CPU node. For example:
>     - on the CPU node: /tmp/mpich/install/bin holds the CPU version
>     - on the MIC: /tmp/mpich/install/bin holds the MIC version
>     3. compile helloworld.c with the MIC version of mpicc
>     4. execute on the CPU node: mpiexec -np 2 -f <hostfile with mic
>     hostnames> ./helloworld
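>
>     To quickly confirm that ranks really launch on both nodes, a
>     small check along these lines can help (just an illustrative
>     sketch, not your actual program):
>
>     #include <mpi.h>
>     #include <stdio.h>
>
>     int main(int argc, char **argv)
>     {
>         int rank, len;
>         char name[MPI_MAX_PROCESSOR_NAME];
>         MPI_Init(&argc, &argv);
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>         /* report which host each rank actually landed on */
>         MPI_Get_processor_name(name, &len);
>         printf("rank %d running on %s\n", rank, name);
>         MPI_Finalize();
>         return 0;
>     }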
>
>     I think you should be able to follow step 2, but since your
>     helloworld binary is also built for a different OS on each node,
>     you might want to put it in the same path on the two nodes as
>     well, just as we do for the MPICH binaries.
>
> Min
>
>
> On 11/16/16 8:29 AM, Kenneth Raffenetti wrote:
>
>         Have you disabled any and all firewalls on both nodes? It
>         sounds like they are unable to communicate during initialization.
>
> Ken
>
> On 11/16/2016 07:34 AM, Doha Ehab wrote:
>
>             Yes, I built MPICH-3 on both systems. I tried the code on
>             each node separately and it worked, and I tried each node
>             with other nodes that have the same operating system and
>             it worked as well. When I try the code on the 2 nodes
>             that have different operating systems, no result or error
>             message appears.
>
> Regards
> Doha
>
>             On Mon, Nov 14, 2016 at 6:25 PM, Kenneth Raffenetti
>             <raffenet at mcs.anl.gov> wrote:
>
>                 It may be possible to run in such a setup, but it
>                 would not be recommended. Did you build MPICH on both
>                 systems you are trying to run on? What exactly
>                 happened when the code didn't work?
>
> Ken
>
>
> On 11/13/2016 12:36 AM, Doha Ehab wrote:
>
>                     Hello,
>                     I tried to run a parallel (Hello World) C program
>                     on a cluster that has 2 nodes. The nodes have
>                     different operating systems, so the code did not
>                     work and no results were printed.
>                     How can I make such a cluster work? Are there
>                     extra steps that should be done?
>
> Regards,
> Doha