[mpich-discuss] Can't receive messages

Matthias Neuer mneuer at web.de
Tue Dec 31 10:11:59 CST 2013


On 12/31/2013 04:34 PM, Pavan Balaji wrote:
>
> Are matze-debian and notebook of the same architecture (x86_64 and running the same OS/configuration)?
>
>    — Pavan

No, notebook is 32-bit x86 running Debian stable and matze-debian is x86_64
running Debian testing, but I compiled the program separately on each system.

Matze

> On Dec 31, 2013, at 9:26 AM, Matthias Neuer <mneuer at web.de> wrote:
>
>> On 12/30/2013 03:23 PM, Rajeev Thakur wrote:
>>> There must be a firewall issue. A process on one machine cannot open a TCP connection to the other machine; only well-known services such as ssh work.
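Rajeev's diagnosis can be checked directly: start any TCP listener on an otherwise unused port on one machine and try to connect to it from the other. The C sketch below does the connecting side with the POSIX calls getaddrinfo(), socket() and connect(). The function name tcp_can_connect is hypothetical and not part of MPICH. It only reports whether some address for the host accepted the connection, so a failure can mean either that nothing is listening on that port or that the connection is being blocked.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: try to open a TCP connection to host:port.
 * Returns 0 if any resolved address for the host accepted the connection,
 * -1 if name resolution failed or every connection attempt failed. */
int tcp_can_connect(const char *host, unsigned port)
{
    char service[16];
    struct addrinfo hints;
    struct addrinfo *res, *ai;
    int result = -1;

    snprintf(service, sizeof service, "%u", port);

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 or IPv6 addresses */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    /* Try each resolved address until one connection succeeds. */
    for (ai = res; ai != NULL && result != 0; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            result = 0;
        close(fd);
    }

    freeaddrinfo(res);
    return result;
}
```

For example, with a listener running on some high port of matze-debian, calling tcp_can_connect("matze-debian", port) on notebook should return 0 if nothing is filtering. If the same call succeeds for port 22 (sshd) but fails for the high port, that would support the firewall explanation. Note that a connection that is silently dropped, rather than refused, can take a long time to time out.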
>>
>> I don't think I have any firewall filter rule in my configuration.
>> Running "iptables --list" on both machines shows:
>>
>> Chain INPUT (policy ACCEPT)
>> target     prot opt source               destination
>>
>> Chain FORWARD (policy ACCEPT)
>> target     prot opt source               destination
>>
>> Chain OUTPUT (policy ACCEPT)
>> target     prot opt source               destination
>>
>> Is there a log file I can check to find out whether the packet was refused?
>>
>> Thanks for your help
>>
>>>
>>> On Dec 30, 2013, at 6:43 AM, Matthias Neuer <mneuer at web.de> wrote:
>>>
>>>> On 12/30/2013 12:12 PM, Rajeev Thakur wrote:
>>>>> There may be some other connectivity issue between the machines. Does the cpi example from the MPICH examples directory run across the two machines?
>>>>
>>>> The cpi-example hangs too. Output:
>>>>
>>>> Process 1 of 4 is on matze-debian
>>>> Process 3 of 4 is on matze-debian
>>>> Process 2 of 4 is on notebook
>>>> Process 0 of 4 is on notebook
>>>>
>>>> Then it hangs.
>>>> It seems like a configuration issue, but why are the printf messages transferred correctly?
>>>>
>>>> Matthias
>>>>
>>>>> On Dec 30, 2013, at 3:58 AM, Matthias Neuer <mneuer at web.de> wrote:
>>>>>
>>>>>> Hi.
>>>>>>
>>>>>> Unfortunately, this does not solve the problem.
>>>>>>
>>>>>> I should have mentioned that the program hangs after printing the output shown in my last post. It waits for MPI_Recv to return, which never happens.
>>>>>>
>>>>>> By the way, I can log in from one machine to the other using ssh without any problem.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On 12/30/2013 06:39 AM, Rajeev Thakur wrote:
>>>>>>> Try adding an fflush(stdout) after the printf that follows the recv.
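Rajeev's suggestion targets stdio buffering. When a process is started by mpiexec, its stdout is normally forwarded through a pipe rather than attached to a terminal, so the C library may keep printf output in a buffer until the buffer fills or the program exits. A program that then blocks can look as if it never reached the printf. A minimal sketch of the print-then-flush pattern follows; the helper name print_and_flush is hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: print a formatted line and flush it immediately, so
 * the text leaves this process's stdio buffer even when the stream is a
 * pipe rather than a terminal.
 * Returns 0 on success, -1 if formatting/writing or flushing failed. */
int print_and_flush(FILE *out, const char *fmt, ...)
{
    va_list ap;
    int written;

    va_start(ap, fmt);
    written = vfprintf(out, fmt, ap);
    va_end(ap);

    if (written < 0 || fflush(out) != 0)
        return -1;
    return 0;
}
```

In a test program like the attached one, the call after the receive would look like print_and_flush(stdout, "received: %s\n", buf), where buf stands for whatever buffer was passed to MPI_Recv. Flushing only changes when output appears; it cannot make a receive complete that is genuinely stuck.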
>>>>>>>
>>>>>>> On Dec 29, 2013, at 12:50 PM, Matthias Neuer <mneuer at web.de> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -------- Original Message --------
>>>>>>>> Subject: Can't receive messages
>>>>>>>> Date: Sun, 29 Dec 2013 19:00:05 +0100
>>>>>>>> From: Matthias Neuer <mneuer at web.de>
>>>>>>>> To: discuss at mpich.org
>>>>>>>>
>>>>>>>> Hi.
>>>>>>>>
>>>>>>>> I installed MPICH on two machines (called notebook and matze-debian)
>>>>>>>> from the same source code. The output of mpichversion is identical on both:
>>>>>>>>
>>>>>>>> MPICH Version:          3.0.4
>>>>>>>> MPICH Release date:     Wed Apr 24 10:08:10 CDT 2013
>>>>>>>> MPICH Device:           ch3:nemesis
>>>>>>>> MPICH configure:        --disable-f77 --disable-fc --prefix=/home/matze/mpich-install
>>>>>>>> MPICH CC:       cc    -O2
>>>>>>>> MPICH CXX:      c++   -O2
>>>>>>>> MPICH F77:      no
>>>>>>>> MPICH FC:       no
>>>>>>>>
>>>>>>>> I wrote a small test program, which I have attached.
>>>>>>>> When I run it on matze-debian, I get the following:
>>>>>>>>
>>>>>>>> Process 0 is on matze-debian
>>>>>>>> Process 2 is on matze-debian
>>>>>>>> Process 1 is on notebook
>>>>>>>> received: Hello, from number 2 on matze-debian
>>>>>>>> Process 2 has sent the message
>>>>>>>> Process 1 has sent the message
>>>>>>>>
>>>>>>>> I don't receive the message from process 1, although the output
>>>>>>>> suggests that it was sent successfully. I checked the network traffic
>>>>>>>> with Wireshark, and matze-debian did receive a packet containing the
>>>>>>>> sent message. So somehow the program does not notice that the message
>>>>>>>> has arrived.
>>>>>>>>
>>>>>>>> When I run the program on a single machine, it works.
>>>>>>>>
>>>>>>>> I don't know if this is a programming error or a configuration error.
>>>>>>>>
>>>>>>>> Thanks for your help
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> <ex1.c>
>>>>>>>> _______________________________________________
>>>>>>>> discuss mailing list     discuss at mpich.org
>>>>>>>> To manage subscription options or unsubscribe:
>>>>>>>> https://lists.mpich.org/mailman/listinfo/discuss
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>
>



