[mpich-discuss] Can't receive messages

Jeff Hammond jeff.science at gmail.com
Wed Jan 1 16:11:03 CST 2014



Sent from my iPhone

> On Jan 1, 2014, at 3:17 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
> 
> 
> Most often, this is not a problem on “well set up” clusters that have a resource manager installed on them. While not exclusive, 99% of the time we see this problem when someone is trying to run MPICH between their laptop and some remote server.
> 

It comes up often enough on the list to justify detecting the broken case. 

Jeff 

>  — Pavan
> 
>> On Jan 1, 2014, at 2:00 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
>> 
>>> On 01.01.2014 at 20:25, Jeff Hammond wrote:
>>> 
>>> Isn't the process manager infrastructure hetero-safe? At the very least, ssh-ing "uname -a" around the ring of procs can identify the problem at O(1) cost.
>> 
>> As there are clusters without `ssh` availability (i.e. MPICH is running tightly integrated in a queuing system), I would put this in the initial startup call to the slave daemons: `hydra_pmi_proxy` could get an additional parameter about the architecture of the machine where `mpiexec` was issued. It could then return instantly with an error code for something like "refuse to participate".
>> 
>> -- Reuti
>> 
>> 
>>> Jeff
>>> 
>>>> On Jan 1, 2014, at 11:33 AM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>>>> 
>>>> 
>>>> If we detect what the local architecture is, how will we communicate it to other processes without having some infrastructure that is heterogeneous-architecture safe?
>>>> 
>>>> — Pavan
>>>> 
>>>>> On Jan 1, 2014, at 9:50 AM, Jeff Hammond <jeff.science at gmail.com> wrote:
>>>>> 
>>>>> One wonders if MPICH could verify the system was sufficiently homogeneous to function properly during initialization rather than defer detection to an error during communication calls. 
>>>>> 
>>>>> Jeff
>>>>> 
>>>>>> On Jan 1, 2014, at 8:07 AM, Matthias Neuer <mneuer at web.de> wrote:
>>>>>> 
>>>>>> On 31.12.2013 18:44, Pavan Balaji wrote:
>>>>>>> 
>>>>>>> On Dec 31, 2013, at 10:11 AM, Matthias Neuer <mneuer at web.de> wrote:
>>>>>>> 
>>>>>>>>> On 12/31/2013 04:34 PM, Pavan Balaji wrote:
>>>>>>>>> 
>>>>>>>>> Are matze-debian and notebook of the same architecture (x86_64 and running the same OS/configuration)?
>>>>>>>> 
>>>>>>>> No, notebook is x86_32 on Debian stable and matze-debian is x86_64 on Debian testing, but I compiled the program for each system separately.
>>>>>>> 
>>>>>>> Such heterogeneous configurations are not supported in MPICH.
>>>>>> 
>>>>>> OK, that's the problem then. I didn't find that information in the FAQ or
>>>>>> in the installation manual.
>>>>>> 
>>>>>> Thanks for your help.
>>>>>> Matthias
>>>>>> _______________________________________________
>>>>>> discuss mailing list     discuss at mpich.org
>>>>>> To manage subscription options or unsubscribe:
>>>>>> https://lists.mpich.org/mailman/listinfo/discuss
>>>> 
>>>> --
>>>> Pavan Balaji
>>>> http://www.mcs.anl.gov/~balaji
>>>> 
>> 
> 
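A minimal sketch of the startup check Reuti proposes, written in Python for brevity rather than as MPICH code. It assumes the launcher computes a plain-text architecture fingerprint and hands it to each proxy as the first command-line argument; the fingerprint format, the `main` entry point, and the `EXIT_REFUSE` status are all hypothetical and are not real `mpiexec` or `hydra_pmi_proxy` options. Exchanging the value as text is one possible answer to Pavan's question: an ASCII string reads the same on 32-bit and 64-bit hosts and on either byte order, so no heterogeneous-safe binary channel is needed to carry it.

```python
import platform
import struct
import sys
from typing import List, Optional

# Hypothetical exit status meaning "refuse to participate";
# not a real MPICH or Hydra constant.
EXIT_REFUSE = 3


def arch_fingerprint() -> str:
    """Describe this host as plain text: machine type, pointer width, byte order."""
    machine = platform.machine() or "unknown"
    # Pointer width of the running interpreter, used here as a stand-in
    # for the ABI the MPI program was built for (an assumption).
    pointer_bits = struct.calcsize("P") * 8
    return f"{machine}/{pointer_bits}bit/{sys.byteorder}"


def proxy_startup(launcher_arch: str, local_arch: Optional[str] = None) -> int:
    """Return 0 if this host matches the launcher, EXIT_REFUSE otherwise."""
    local = arch_fingerprint() if local_arch is None else local_arch
    if launcher_arch != local:
        # Fail at startup instead of later inside a communication call.
        print(
            f"refusing to participate: launcher is {launcher_arch}, "
            f"this host is {local}",
            file=sys.stderr,
        )
        return EXIT_REFUSE
    return 0


def main(argv: List[str]) -> int:
    """Hypothetical proxy entry point: argv[1] is the launcher's fingerprint."""
    # With no argument, compare the host against itself (always passes).
    expected = argv[1] if len(argv) > 1 else arch_fingerprint()
    return proxy_startup(expected)
```

A proxy wrapper could call `sys.exit(main(sys.argv))`. For the setup in this thread, the notebook would report something like `i686/32bit/little` and matze-debian something like `x86_64/64bit/little`, so the check would stop the job before any message traffic. Matching fingerprints do not prove two hosts are compatible (different MPICH builds, for example), so this only catches the obvious cases.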


