[mpich-discuss] Hybrid HPC system
Doha Ehab
dohaehab at gmail.com
Sat Jan 21 04:53:49 CST 2017
I have tried what you mentioned in the previous e-mail:
1. I built MPICH for the CPU node and the ARM node.
2. Uploaded the binaries to the same path on both nodes.
3. Compiled helloWorld (it sends a number from process zero to all other
processes) for both nodes, then ran: mpiexec -np 2 -f <hostfile with mic
hostnames> ./helloworld
I got this error:
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(200)................................: MPI_Recv(buf=0xbe9460d0,
count=1, MPI_INT, src=0, tag=0, MPI_COMM_WORLD, status=0x1) failed
MPIDI_CH3i_Progress_wait(242)................: an error occurred while
handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(554)...:
MPIDI_CH3_Sockconn_handle_connopen_event(899): unable to find the process
group structure with id <>
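For reference, the launch above can be sketched as below. The hostnames and the install prefix are placeholders (assumptions, not taken from this thread); the mpiexec line is printed rather than executed, since it needs the real two-node cluster:

```shell
# Hypothetical hostfile; "cpu-node" and "arm-node" are placeholders
# for the real CPU and ARM node hostnames.
cat > hostfile <<'EOF'
cpu-node
arm-node
EOF

# The launch command, using mpiexec from an assumed shared install
# prefix so both nodes resolve the same binary paths. Printed here
# rather than run, because it requires the actual cluster:
echo /tmp/mpich/install/bin/mpiexec -np 2 -f hostfile ./helloworld
```

The "unable to find the process group structure" failure occurs during connection setup, so checking that both nodes see identical MPICH paths (including hydra_pmi_proxy) is a reasonable first step.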
Regards,
Doha
On Wed, Nov 16, 2016 at 6:38 PM, Min Si <msi at anl.gov> wrote:
> I guess you might need to put all the MPICH binaries (e.g.,
> hydra_pmi_proxy) in the same path on each node. I have run MPICH on
> Intel MIC chips from a host CPU node where the operating systems
> differ. What I did was:
> 1. build MPICH for both the CPU node and the MIC, on the CPU node (you have
> done this step).
> 2. upload the MIC binaries to the same path on the MIC chip as on the CPU
> node.
> For example:
> - on CPU node : /tmp/mpich/install/bin holds the CPU version
> - on MIC : /tmp/mpich/install/bin holds the MIC version
> 3. compile helloworld.c with the MIC version mpicc
> 4. execute on the CPU node: mpiexec -np 2 -f <hostfile with mic
> hostnames> ./helloworld
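The four steps above could be sketched roughly as follows. All paths and the "mic0" hostname are illustrative assumptions, not taken from this thread; adapt them to your own layout:

```shell
# Step 1: build MPICH twice on the CPU node, once per architecture
# (configure flags omitted here; use your existing builds, e.g.
# a native build in /tmp/mpich/install and a cross build in
# /tmp/mpich-mic/install -- both paths are assumptions).

# Step 2: copy the cross build to the SAME path on the remote chip
# as the native build occupies on the host ("mic0" is a placeholder):
scp -r /tmp/mpich-mic/install/. mic0:/tmp/mpich/install/

# Step 3: compile helloworld with the cross-build mpicc, and make
# sure the binary sits at the same path on both nodes:
/tmp/mpich-mic/install/bin/mpicc helloworld.c -o helloworld
scp helloworld mic0:"$PWD"/

# Step 4: launch from the CPU node with its own mpiexec:
/tmp/mpich/install/bin/mpiexec -np 2 -f hostfile ./helloworld
```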
>
> I think you should be able to follow step 2, but since your helloworld
> binary is also built for a different OS, you might want to put it in the
> same path on both nodes as well, just as we do for the MPICH binaries.
>
> Min
>
>
> On 11/16/16 8:29 AM, Kenneth Raffenetti wrote:
>
>> Have you disabled any and all firewalls on both nodes? It sounds like
>> they are unable to communicate during initialization.
>>
>> Ken
>>
>> On 11/16/2016 07:34 AM, Doha Ehab wrote:
>>
>>> Yes, I built MPICH-3 on both systems. I tried the code on each node
>>> separately and it worked; I also tried each node with other nodes that
>>> have the same operating system, and that worked as well.
>>> When I try the code on the 2 nodes that have different operating
>>> systems, no result or error message appears.
>>>
>>> Regards
>>> Doha
>>>
>>> On Mon, Nov 14, 2016 at 6:25 PM, Kenneth Raffenetti
>>> <raffenet at mcs.anl.gov <mailto:raffenet at mcs.anl.gov>> wrote:
>>>
>>> It may be possible to run in such a setup, but it would not be
>>> recommended. Did you build MPICH on both systems you are trying to
>>> run on? What exactly happened when the code didn't work?
>>>
>>> Ken
>>>
>>>
>>> On 11/13/2016 12:36 AM, Doha Ehab wrote:
>>>
>>> Hello,
>>> I tried to run a parallel (Hello World) C code on a cluster that
>>> has 2 nodes. The nodes have different operating systems, so the code
>>> did not work and no results were printed.
>>> How can I make such a cluster work? Are there extra steps that
>>> should be done?
>>>
>>> Regards,
>>> Doha
>>>
>>>
>>> _______________________________________________
>>> discuss mailing list discuss at mpich.org
>>> To manage subscription options or unsubscribe:
>>> https://lists.mpich.org/mailman/listinfo/discuss