[mpich-discuss] Hybrid HPC system

Min Si msi at anl.gov
Wed Nov 16 10:38:39 CST 2016


I guess you might need to put all the MPICH binaries (e.g., 
hydra_pmi_proxy) in the same path on each node. I have run MPICH on 
Intel MIC chips from a host CPU node where the OSes are different. What 
I did was:
1. Build MPICH for both the CPU node and the MIC on the CPU node (you have 
   done this step).
2. Upload the MIC binaries to the same path on the MIC chip as on the CPU 
   node.
    For example:
    - on CPU node : /tmp/mpich/install/bin holds the CPU version
    - on MIC      : /tmp/mpich/install/bin holds the MIC version
3. Compile helloworld.c with the MIC version of mpicc.
4. Execute on the CPU node:
    mpiexec -np 2 -f <hostfile with MIC hostnames> ./helloworld

I think you should be able to follow step 2, but since your helloworld 
binary is also built for a different OS on each node, you might want to 
put it in the same path on both nodes as well, as we do for the MPICH 
binaries.
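
A minimal sketch of steps 2 to 4 as a shell script, run from the CPU node. 
Only the /tmp/mpich/install prefix comes from the steps above; the host 
names mic0 and mic1, the staging directory /tmp/mpich/install-mic, and the 
application path /tmp/helloworld are hypothetical placeholders. By default 
the script only prints the commands; set RUN=1 to execute them.

```shell
#!/bin/sh
# Sketch of steps 2-4, run from the CPU node. Every value below except
# PREFIX is a hypothetical placeholder; adjust to your cluster.
PREFIX=/tmp/mpich/install          # install path used on every node (step 2)
MIC_STAGE=/tmp/mpich/install-mic   # assumed: where the MIC build sits on the CPU node
MIC_MPICC=$MIC_STAGE/bin/mpicc     # assumed: MIC-targeted mpicc wrapper
MIC_HOSTS="mic0 mic1"              # assumed MIC host names
APP=/tmp/helloworld                # assumed: same application path on every node
HOSTFILE=./hosts.mic

# Print each command instead of running it unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Hydra hostfile: one host name per line.
printf '%s\n' $MIC_HOSTS > "$HOSTFILE"

# Step 3: build the test program with the MIC version of mpicc.
run "$MIC_MPICC" -o helloworld helloworld.c

for h in $MIC_HOSTS; do
    # Step 2: copy the MIC build to the same path as on the CPU node
    # (assumes $PREFIX does not exist on the MIC yet).
    run ssh "$h" mkdir -p "$(dirname "$PREFIX")"
    run scp -r "$MIC_STAGE" "$h:$PREFIX"
    # Put the application at the same path on the MIC as well.
    run scp helloworld "$h:$APP"
done

# Step 4: launch from the CPU node; -f names the hostfile.
run "$PREFIX/bin/mpiexec" -np 2 -f "$HOSTFILE" "$APP"
```

Using an absolute path for the application means the launch does not 
depend on the working directory existing on the other node.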

Min

On 11/16/16 8:29 AM, Kenneth Raffenetti wrote:
> Have you disabled any and all firewalls on both nodes? It sounds like 
> they are unable to communicate in initialization.
>
> Ken
>
> On 11/16/2016 07:34 AM, Doha Ehab wrote:
>> Yes, I built MPICH-3 on both systems. I tried the code on each node
>> separately and it worked, and I tried each node with other nodes that
>> have the same operating system and it worked as well.
>> When I try the code on the 2 nodes that have different operating
>> systems, no result or error message appears.
>>
>> Regards
>> Doha
>>
>> On Mon, Nov 14, 2016 at 6:25 PM, Kenneth Raffenetti
>> <raffenet at mcs.anl.gov> wrote:
>>
>>     It may be possible to run in such a setup, but it would not be
>>     recommended. Did you build MPICH on both systems you are trying to
>>     run on? What exactly happened when the code didn't work?
>>
>>     Ken
>>
>>
>>     On 11/13/2016 12:36 AM, Doha Ehab wrote:
>>
>>         Hello,
>>         I tried to run a parallel (Hello World) C code on a cluster
>>         that has 2 nodes. The nodes have different operating systems,
>>         so the code did not work and no results were printed.
>>         How can I make such a cluster work? Are there extra steps
>>         that should be done?
>>
>>         Regards,
>>         Doha
>>
>>
>>         _______________________________________________
>>         discuss mailing list     discuss at mpich.org
>>         <mailto:discuss at mpich.org>
>>         To manage subscription options or unsubscribe:
>>         https://lists.mpich.org/mailman/listinfo/discuss
>>         <https://lists.mpich.org/mailman/listinfo/discuss>
>>


