[mpich-discuss] Hybrid HPC system

Doha Ehab dohaehab at gmail.com
Mon Jan 23 10:46:27 CST 2017


Hi Min,
 I have attached the two config.log files. Here is the code:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int i = 0;
    MPI_Init(&argc, &argv); /* starts MPI */

    // Find out rank, size
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    int number;
    if (world_rank == 0) {
        /* Rank 0 sends the same value to every other rank, tag 0. */
        number = -1;
        for (i = 1; i < world_size; i++) {
            MPI_Send(&number, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
        }
    } else {
        /* Every other rank receives one int from rank 0, tag 0. */
        MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("Process %d received number %d from process 0\n",
               world_rank, number);
    }
    MPI_Finalize();

    return 0;
}

Regards,
Doha

On Sun, Jan 22, 2017 at 10:47 PM, Min Si <msi at anl.gov> wrote:

> Hi Doha,
>
> Can you please send us the config.log file for each MPICH build and your
> helloworld source code? The config.log file should be under your MPICH
> build directory where you executed ./configure.
>
> Min
>
> On 1/21/17 4:53 AM, Doha Ehab wrote:
>
> I have tried what you mentioned in the previous email.
>
> 1- I built MPICH for the CPU node and the ARM node.
> 2- Uploaded the binaries to the same path on the 2 nodes.
> 3- Compiled helloWorld (it sends a number from process zero to all other
> processes) for both nodes. Then tried mpiexec -np 2 -f <hostfile with
> mic hostnames> ./helloworld
>
> I got this error
> Fatal error in MPI_Recv: Other MPI error, error stack:
> MPI_Recv(200)................................: MPI_Recv(buf=0xbe9460d0, count=1, MPI_INT, src=0, tag=0, MPI_COMM_WORLD, status=0x1) failed
> MPIDI_CH3i_Progress_wait(242)................: an error occurred while handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(554)...:
> MPIDI_CH3_Sockconn_handle_connopen_event(899): unable to find the process group structure with id <>
>
> Regards,
> Doha
>
>
> On Wed, Nov 16, 2016 at 6:38 PM, Min Si <msi at anl.gov> wrote:
>
>> I guess you might need to put all the MPICH binaries (e.g.,
>> hydra_pmi_proxy) in the same path on each node. I have run MPICH on
>> Intel MIC chips from the host CPU node, where the OSes are different. What
>> I did was:
>> 1. Build MPICH for both the CPU node and the MIC on the CPU node (you have
>>    done this step).
>> 2. Upload the MIC binaries to the same path on the MIC chip as on the CPU
>>    node. For example:
>>    - on the CPU node: /tmp/mpich/install/bin holds the CPU version
>>    - on the MIC:      /tmp/mpich/install/bin holds the MIC version
>> 3. Compile helloworld.c with the MIC version of mpicc.
>> 4. Execute on the CPU node:
>>    mpiexec -np 2 -f <hostfile with mic hostnames> ./helloworld
>>
>> I think you should be able to follow step 2, but since your helloworld
>> binary is also built for a different OS, you might want to put it in the
>> same path on both nodes too, as we do for the MPICH binaries.
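Step 2 above depends on every node finding the MPICH binaries (and, as suggested, the helloworld binary) at one shared path. A minimal pre-flight sketch, assuming a POSIX system on each node: count_missing_binaries is a hypothetical helper, not part of MPICH, and /tmp/mpich/install/bin is only the example prefix from this message. It reports which listed files are missing or not executable under the prefix.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical pre-flight helper (not part of MPICH): returns how many of
 * the n executables in `names` are missing, or not executable, under
 * `prefix` (e.g. "/tmp/mpich/install/bin"), printing each one to stderr. */
int count_missing_binaries(const char *prefix, const char *const names[],
                           size_t n)
{
    int missing = 0;
    char path[4096];

    for (size_t i = 0; i < n; i++) {
        int len = snprintf(path, sizeof path, "%s/%s", prefix, names[i]);
        /* A truncated path counts as missing rather than checking the wrong file. */
        if (len < 0 || (size_t)len >= sizeof path || access(path, X_OK) != 0) {
            fprintf(stderr, "missing or not executable: %s/%s\n",
                    prefix, names[i]);
            missing++;
        }
    }
    return missing;
}
```

Compiled with each node's own C compiler and called with the same prefix and names such as hydra_pmi_proxy, it should return 0 on every node once the binaries are in place; a nonzero count points at the node to fix first.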
>>
>> Min
>>
>>
>> On 11/16/16 8:29 AM, Kenneth Raffenetti wrote:
>>
>>> Have you disabled any and all firewalls on both nodes? It sounds like
>>> they are unable to communicate during initialization.
>>>
>>> Ken
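The firewall question can be probed from each node before launching MPI at all. A minimal sketch, assuming POSIX sockets: tcp_reachable is a hypothetical helper, not an MPICH tool, and it checks only the one numeric TCP port you pass (for example the ssh port, if your Hydra setup launches through ssh). By default MPICH opens further connections on ports chosen at runtime, so success on one port does not prove the firewall is fully open.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of MPICH): returns 1 if a TCP connection
 * to host:port can be opened from this node, 0 otherwise. `port` must be
 * numeric. A blocking connect() to a port that a firewall silently drops
 * may take the OS default connect timeout to fail. */
int tcp_reachable(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int ok = 0;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* TCP */
    hints.ai_flags = AI_NUMERICSERV; /* reject service names like "ssh" */

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return 0; /* host or port could not be resolved */

    /* Try each resolved address until one connection succeeds. */
    for (ai = res; ai != NULL && !ok; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            ok = 1;
        close(fd);
    }
    freeaddrinfo(res);
    return ok;
}
```

Running it in both directions (CPU node to ARM node and back) matters, since a firewall on either side can block the connection.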
>>>
>>> On 11/16/2016 07:34 AM, Doha Ehab wrote:
>>>
>>>> Yes, I built MPICH-3 on both systems and tried the code on each node
>>>> separately, and it worked. I also tried each node with other nodes that
>>>> have the same operating system, and that worked as well.
>>>> When I try the code on the 2 nodes that have different operating systems,
>>>> no result or error message appears.
>>>>
>>>> Regards
>>>> Doha
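Since the failure only appears when the two nodes run different operating systems, it can help to confirm exactly what each node reports before deciding which MPICH build goes where. A minimal sketch, assuming POSIX uname(2); describe_node is a hypothetical helper name. It formats the kernel name, release, and machine architecture, which is enough to tell, say, an x86_64 node from an ARM one.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper (not part of MPICH): writes "sysname release machine"
 * for the node it runs on into buf. Returns 0 on success, -1 if uname()
 * fails or the text does not fit in len bytes. */
int describe_node(char *buf, size_t len)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    int n = snprintf(buf, len, "%s %s %s", u.sysname, u.release, u.machine);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Printing this string on each node shows at a glance whether the binaries copied there were built for the right architecture.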
>>>>
>>>> On Mon, Nov 14, 2016 at 6:25 PM, Kenneth Raffenetti
>>>> <raffenet at mcs.anl.gov> wrote:
>>>>
>>>>     It may be possible to run in such a setup, but it would not be
>>>>     recommended. Did you build MPICH on both systems you are trying to
>>>>     run on? What exactly happened when the code didn't work?
>>>>
>>>>     Ken
>>>>
>>>>
>>>>     On 11/13/2016 12:36 AM, Doha Ehab wrote:
>>>>
>>>>         Hello,
>>>>          I tried to run a parallel (Hello World) C code on a cluster
>>>>         that has 2 nodes. The nodes have different operating systems,
>>>>         so the code did not work and no results were printed.
>>>>          How can I make such a cluster work? Are there extra steps that
>>>>         should be done?
>>>>
>>>>         Regards,
>>>>         Doha
>>>>
>>>>
>>>>         _______________________________________________
>>>>         discuss mailing list     discuss at mpich.org
>>>>         <mailto:discuss at mpich.org>
>>>>         To manage subscription options or unsubscribe:
>>>>         https://lists.mpich.org/mailman/listinfo/discuss
>>>>         <https://lists.mpich.org/mailman/listinfo/discuss>
>>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: config.log
Type: text/x-log
Size: 396252 bytes
Desc: not available
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20170123/dd5e30d0/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: config.log
Type: text/x-log
Size: 397938 bytes
Desc: not available
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20170123/dd5e30d0/attachment-0001.bin>