[mpich-discuss] Hydra fills in the MPIR_proctable[] incorrectly with multiple processes per node

John DelSignore John.DelSignore at roguewave.com
Tue Mar 12 11:32:23 CDT 2013


Hi Pavan,

Thanks very much for your reply and for working on a patch. I tried it again with the "-hosts" option you suggested (mpiexec -hosts 127.0.0.2:3,127.0.0.3:3 -n 6 ./mpichbug), and indeed version 1.4.1p1 does work:

Hello from rank 0 of 6, getpid()==21949
Hello from rank 1 of 6, getpid()==21950
Hello from rank 2 of 6, getpid()==21951
Hello from rank 3 of 6, getpid()==21945
Hello from rank 4 of 6, getpid()==21946
Hello from rank 5 of 6, getpid()==21947

mpir_proctable_t::create: extracting hostname/execname/pids for 6 processes
mpir_proctable_t::create: MPIR_proctable[0]: host_name(0x005641c0)="127.0.0.2", executable_name(0x00575640)="./mpichbug", pid=21949
mpir_proctable_t::create: MPIR_proctable[1]: host_name(0x00575620)="127.0.0.2", executable_name(0x005757a0)="./mpichbug", pid=21950
mpir_proctable_t::create: MPIR_proctable[2]: host_name(0x005643a0)="127.0.0.2", executable_name(0x00575740)="./mpichbug", pid=21951
mpir_proctable_t::create: MPIR_proctable[3]: host_name(0x00564110)="127.0.0.3", executable_name(0x00575410)="./mpichbug", pid=21945
mpir_proctable_t::create: MPIR_proctable[4]: host_name(0x00575430)="127.0.0.3", executable_name(0x00575450)="./mpichbug", pid=21946
mpir_proctable_t::create: MPIR_proctable[5]: host_name(0x00575470)="127.0.0.3", executable_name(0x00575490)="./mpichbug", pid=21947

I look forward to your fix for this problem.

On a different topic, the above output shows a potential performance and scalability problem with the way the MPIR_proctable[] is filled out. The MPIR_PROCDESC structure contains two pointer members that refer to null-terminated strings, as described in the MPIR spec:

typedef struct {
  char *host_name;
  char *executable_name;
  int pid;
} MPIR_PROCDESC;

As an "advice to implementors" kind-of suggestion, the MPIR spec says:

"The MPI implementation should share the host and executable name character strings across multiple process descriptor entries whenever possible. For example, if all of the MPI processes are executing “/path/a.out”, then the executable name field in each process descriptor should point to the same null-terminated character string. Sharing the strings enhances the tools scalability by allowing it to cache data from the starter process and avoid reading redundant character strings."

In the output above, we can see that there are exactly three unique strings: "127.0.0.2", "127.0.0.3", and "./mpichbug". However, each string is allocated at a unique address. This arrangement not only causes the MPIR proctable entries to require more space in the MPI starter process, but it also forces the tool to read each string individually, defeating any data caching the tool might do and slowing the tool down when it reads the MPIR proctable.

If possible, it would be good if MPICH could be changed to share the host and executable name character strings across multiple process descriptor entries.

Cheers, John D.


Pavan Balaji wrote:
> John,
> 
> This does seem like a bug.  Specifically, this is a problem with the
> wrap-around of hosts.  For example, I don't expect this problem to show
> up when you do:
> 
> mpiexec -hosts 127.0.0.2:3,127.0.0.3:3 -n 6 ./mpichbug
> 
> This should only show up when the number of cores is not sufficient in
> the first round and mpiexec has to wrap around to the first host again:
> 
> mpiexec -hosts 127.0.0.2,127.0.0.3 -n 6 ./mpichbug
> 
> I'm working on a patch.  I'll commit it shortly.
> 
>  -- Pavan
> 
> On 03/11/2013 02:59 PM US Central Time, John DelSignore wrote:
>> Hi,
>>
>> I'm pretty sure this is an MPICH Hydra bug, but I wanted to ask this group before going through the trouble of figuring out how to file an MPICH bug report, which I think requires creating an MPICH Trac account, and I don't know how to do that.
>>
>> As far as I can tell, Hydra fills in the MPIR_proctable[] incorrectly when there are multiple processes per node. The index into the MPIR_proctable[] is supposed to be the MPI process's rank in MPI_COMM_WORLD. To demonstrate the problem, I created a simple MPI "hello world" program in which each MPI process prints its rank and pid; I attached it to this email.
>>
>> This is the version of MPICH I am using:
>>
>> shell% /home/mware/argonne/mpich2/1.4.1p1/x86_64-linux/bin/mpirun --version|head -3
>> HYDRA build details:
>>     Version:                                 1.4.1p1
>>     Release Date:                            Thu Sep  1 13:53:02 CDT 2011
>> shell% 
>>
>> I ran the code under TotalView using 2 nodes and 6 processes (3 per node). I enabled logging so that TotalView would output the contents of the MPIR_proctable[] as it extracted it from the mpirun process. Here is the output of the run:
>>
>> shell% tv8cli \
>>   -verbosity errors \
>>   -x15 \
>>   -parallel_stop no \
>>   -debug_file debug.log \
>>   -args \
>>     /home/mware/argonne/mpich2/1.4.1p1/x86_64-linux/bin/mpirun \
>>     -hosts 127.0.0.2,127.0.0.3 \
>>     -n 6 \
>>     ./mpichbug
>> d1.<> dcont
>> Hello from rank 0 of 6, getpid()==2691
>> Hello from rank 1 of 6, getpid()==2729
>> Hello from rank 2 of 6, getpid()==2693
>> Hello from rank 3 of 6, getpid()==2730
>> Hello from rank 4 of 6, getpid()==2694
>> Hello from rank 5 of 6, getpid()==2734
>> d1.<> quit -force
>> shell%
>>
>> Grepping for "proctable" in the debugger's log file shows the contents of the MPIR_proctable[]:
>>
>> shell% grep proctable debug.log
>> mpir_proctable_t::create: extracting hostname/execname/pids for 6 processes
>> mpir_proctable_t::create: MPIR_proctable[0]: host_name(0x0056be20)="127.0.0.2", executable_name(0x0056be80)="./mpichbug", pid=2691
>> mpir_proctable_t::create: MPIR_proctable[1]: host_name(0x0056be00)="127.0.0.2", executable_name(0x0056bde0)="./mpichbug", pid=2693
>> mpir_proctable_t::create: MPIR_proctable[2]: host_name(0x005859a0)="127.0.0.2", executable_name(0x0056c080)="./mpichbug", pid=2694
>> mpir_proctable_t::create: MPIR_proctable[3]: host_name(0x005856b0)="127.0.0.3", executable_name(0x005856d0)="./mpichbug", pid=2729
>> mpir_proctable_t::create: MPIR_proctable[4]: host_name(0x005856f0)="127.0.0.3", executable_name(0x00585710)="./mpichbug", pid=2730
>> mpir_proctable_t::create: MPIR_proctable[5]: host_name(0x00585730)="127.0.0.3", executable_name(0x00585750)="./mpichbug", pid=2734
>> shell% 
>>
>> Matching up the pid values shows that, for some of the MPI processes, the MPIR_proctable[] index does not match the rank returned to the program by MPI_Comm_rank(). Here's the MPI rank to MPIR_proctable[] index mapping:
>> 0 => 0
>> 1 => 3
>> 2 => 1
>> 3 => 4
>> 4 => 2
>> 5 => 5    
>>
>> Do you agree that this is an MPICH Hydra bug?
>>
>> Any advice on how to create an MPICH Trac account so that I can report the bug?
>>
>> Thanks, John D.
>>
>>
>>
>> _______________________________________________
>> discuss mailing list     discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
>>
> 


