[mpich-discuss] Hybrid HPC system

Doha Ehab dohaehab at gmail.com
Sat Jan 21 04:53:49 CST 2017


I have tried what you suggested in the previous e-mail.

1- Built MPICH for both the CPU node and the ARM node.
2- Uploaded the binaries to the same path on both nodes.
3- Compiled helloWorld (it sends a number from process zero to all other
processes) for both nodes, then ran: mpiexec -np 2 -f <hostfile with the
two hostnames> ./helloworld
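
For reference, helloworld follows this pattern (a minimal sketch; the
exact source may differ in details, and the value 42 is illustrative).
The receive here matches the MPI_Recv shown in the error stack below
(count=1, MPI_INT, src=0, tag=0):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, number = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            /* process zero sends a number to every other process */
            number = 42;
            for (int dst = 1; dst < size; dst++)
                MPI_Send(&number, 1, MPI_INT, dst, 0, MPI_COMM_WORLD);
        } else {
            /* every other process receives the number from process zero */
            MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank %d received %d\n", rank, number);
        }

        MPI_Finalize();
        return 0;
    }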

I got this error:
 Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(200)................................: MPI_Recv(buf=0xbe9460d0,
count=1, MPI_INT, src=0, tag=0, MPI_COMM_WORLD, status=0x1) failed
MPIDI_CH3i_Progress_wait(242)................: an error occurred while
handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(554)...:
MPIDI_CH3_Sockconn_handle_connopen_event(899): unable to find the process
group structure with id <>

Regards,
Doha


On Wed, Nov 16, 2016 at 6:38 PM, Min Si <msi at anl.gov> wrote:

> I guess you might need to put all the MPICH binaries (e.g.,
> hydra_pmi_proxy) in the same path on each node. I have executed MPICH on
> Intel MIC chips from the host CPU node, where the operating systems
> differ. What I did was:
> 1. build MPICH for both CPU node and MIC on the CPU node (you have done
> this step).
> 2. upload the MIC binaries to the same path on the MIC chip as on the CPU node
>    For example:
>    - on CPU node : /tmp/mpich/install/bin holds the CPU version
>    - on MIC      : /tmp/mpich/install/bin holds the MIC version
> 3. compile helloworld.c with the MIC version mpicc
> 4. execute on CPU node: mpiexec -np 2 -f <hostfile with mic
> hostnames> ./helloworld
>
> I think you should be able to follow step 2, but since your helloworld
> binary is also built for a different OS, you might want to put it in the
> same path on both nodes as well, just as we do for the MPICH binaries.
>
> Min
>
>
> On 11/16/16 8:29 AM, Kenneth Raffenetti wrote:
>
>> Have you disabled any and all firewalls on both nodes? It sounds like
>> they are unable to communicate during initialization.
>>
>> Ken
>>
>> On 11/16/2016 07:34 AM, Doha Ehab wrote:
>>
>>> Yes, I built MPICH-3 on both systems. I tried the code on each node
>>> separately and it worked, and I tried each node with other nodes that
>>> have the same operating system and it worked as well.
>>> When I try the code on the 2 nodes that have different operating
>>> systems, no result or error message appears.
>>>
>>> Regards
>>> Doha
>>>
>>> On Mon, Nov 14, 2016 at 6:25 PM, Kenneth Raffenetti
>>> <raffenet at mcs.anl.gov <mailto:raffenet at mcs.anl.gov>> wrote:
>>>
>>>     It may be possible to run in such a setup, but it would not be
>>>     recommended. Did you build MPICH on both systems you are trying to
>>>     run on? What exactly happened when the code didn't work?
>>>
>>>     Ken
>>>
>>>
>>>     On 11/13/2016 12:36 AM, Doha Ehab wrote:
>>>
>>>         Hello,
>>>          I tried to run a parallel "Hello World" C code on a cluster
>>>         that has 2 nodes. The nodes have different operating systems,
>>>         so the code did not work and no results were printed.
>>>          How can I make such a cluster work? Are there extra steps
>>>         that should be done?
>>>
>>>         Regards,
>>>         Doha
>>>
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

