[mpich-discuss] lldb and mpich issue:

Dave Goodell (dgoodell) dgoodell at cisco.com
Wed Apr 9 10:15:40 CDT 2014


Sounds like something worth reporting upstream to LLVM in addition to an MPICH-specific workaround.

-Dave

On Apr 8, 2014, at 8:47 PM, "Balaji, Pavan" <balaji at anl.gov> wrote:

> Drew,
> 
> Thanks for reporting this issue.  I’m able to reproduce this on a Mac with lldb.
> 
> From some initial digging, it looks like lldb is interfering with the UNIX socket created between mpiexec’s proxy and the MPI process.  lldb should be seeing that socket, but it doesn’t seem to hand it off to the application process it launches.  That’s nasty.
> 
> I’ve created a ticket for this:
> 
> https://trac.mpich.org/projects/mpich/ticket/2063
> 
> The simplest solution will be to add an option to mpiexec that avoids the UNIX sockets path completely and uses TCP/IP sockets instead.  We don’t want to do this by default because: (1) it requires the system to allow internal TCP connections and (2) UNIX sockets can be a bit faster in some cases.  We’ll try to get this into mpich soon.
> 
> Thanks,
> 
>  — Pavan
> 
> On Apr 8, 2014, at 8:19 PM, Drew Lewis <drew90 at vt.edu> wrote:
> 
>> Dear MPICH discuss group,  
>> 
>> Sorry if this is a duplicate; I tried to send it before subscribing and am not sure whether it ever made it to the list.
>> 
>> I am encountering an issue when trying to use lldb with both mpich 3.1 and 3.0.4.  
>> 
>> The command that generates the error is 
>> 
>> computer:tests ME$ mpirun -np 1 lldb ./a.out
>> Current executable set to './a.out' (x86_64).
>> (lldb) run
>> run
>> Process 67202 launched: './a.out' (x86_64)
>> [cli_0]: write_line error; fd=6 buf=:cmd=init pmi_version=1 pmi_subversion=1
>> :
>> system msg for write_line failure : Bad file descriptor
>> [cli_0]: Unable to write to PMI_fd
>> [cli_0]: write_line error; fd=6 buf=:cmd=get_appnum
>> :
>> system msg for write_line failure : Bad file descriptor
>> Fatal error in MPI_Init: Other MPI error, error stack:
>> MPIR_Init_thread(467):
>> MPID_Init(140).......: channel initialization failed
>> MPID_Init(422).......: PMI_Get_appnum returned -1
>> Process 67202 exited with status = 1 (0x00000001)
>> 
>> mpich 3.1 was configured with 
>> $SRC_DIR/configure \
>> --prefix=$INSTALL_ROOT/$BUILD_DIR \
>> --disable-cxx \
>> --disable-f77 \
>> --disable-fc \
>> CC=clang \
>> CXX=clang++
>> 
>> The version of lldb is 
>> lldb-310.2.36
>> 
>> The minimal example that shows the problem is 
>> #include <mpi.h>
>> 
>> int main(int argc, char**argv){
>>  MPI_Init(&argc, &argv);
>>  MPI_Finalize();
>>  return 0;
>> }
>> compiled with mpicc
>> 
>> Since I know it has been an issue in the past, the shell I am using is 
>> GNU bash, version 3.2.51(1)-release (x86_64-apple-darwin13)
>> Copyright (C) 2007 Free Software Foundation, Inc.
>> 
>> Any help you can offer would be much appreciated.  
>> 
>> Thank you, 
>> -Drew Lewis
>> _______________________________________________
>> discuss mailing list     discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
> 

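For anyone reproducing the report quoted above, here is a minimal sketch of how one might confirm that the PMI descriptor is really missing in the debugged process. It assumes the proxy passes the descriptor number in the PMI_FD environment variable, which is my understanding of MPICH's simple PMI client and is not something the thread itself states. The helper names fd_is_open and check_pmi_fd are hypothetical, made up for this illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Return 1 if fd is an open descriptor in this process, 0 otherwise.
 * fcntl(F_GETFD) fails with EBADF exactly when fd is not open. */
static int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/* Check the descriptor named by PMI_FD (assumed variable name, see above).
 * Returns 1 if it is open, 0 if it is not, -1 if PMI_FD is unset or
 * does not hold a valid non-negative integer. */
static int check_pmi_fd(void)
{
    const char *val = getenv("PMI_FD");
    if (val == NULL) {
        fprintf(stderr, "PMI_FD is not set\n");
        return -1;
    }

    char *end = NULL;
    errno = 0;
    long fd = strtol(val, &end, 10);
    if (errno != 0 || end == val || *end != '\0' || fd < 0 || fd > INT_MAX) {
        fprintf(stderr, "PMI_FD=\"%s\" is not a valid descriptor number\n", val);
        return -1;
    }

    int open_now = fd_is_open((int)fd);
    fprintf(stderr, "PMI_FD=%ld is %s in this process\n",
            fd, open_now ? "open" : "NOT open");
    return open_now;
}
```

Calling check_pmi_fd() as the first statement of main() in the minimal example, before MPI_Init, should show whether the descriptor survived the launch. If the diagnosis in the thread is right, I would expect 1 when the program is started directly by mpiexec and 0 when it is started from inside lldb, but I have not verified that.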


