[mpich-discuss] [Fwd: MPI_File_read_at_all external32 bug in MV2-2.1]

Balaji, Pavan balaji at anl.gov
Fri Nov 6 19:13:08 CST 2015


Adam,

Not yet.  We were working heads-down on the mpich-3.2 release.

I'm adding the mpich discuss list to cc, and poking @robl as well.  Do you have a test program that reproduces the error?  (Sorry if you already sent it; I'm still catching up on the pending issues.)

  -- Pavan

> On Nov 6, 2015, at 2:10 PM, Adam T. Moody <moody20 at llnl.gov> wrote:
> 
> Hi Pavan,
> Did you have a chance to look into this in MPICH?
> 
> I'm pretty certain there is a lurking ROMIO bug here.
> -Adam
> 
> 
> Adam T. Moody wrote:
> 
>> Hi Pavan and Howard,
>> FYI, it looks like this same bug is in MPICH-3.2rc1 and Open MPI-1.10.0.
>> 
>> I guess no one uses collective I/O with external32 :-)
>> -Adam
>> 
>> ------------------------------------------------------------------------
>> 
>> Subject:
>> MPI_File_read_at_all external32 bug in MV2-2.1
>> From:
>> "Adam T. Moody" <moody20 at llnl.gov>
>> Date:
>> Fri, 30 Oct 2015 11:48:35 -0700
>> To:
>> "mvapich-discuss at cse.ohio-state.edu" <mvapich-discuss at cse.ohio-state.edu>
>> 
>> 
>> Hello MVAPICH team,
>> I've hit a bug in MPI_File_read_at_all in MVAPICH2-2.1.  I have an application that reads and writes files in external32 format.  It writes the file just fine, but it throws the following error when reading the file back:
>> 
>> internal ABORT - process 0
>> srun: error: rzmerl2: task 0: Exited with exit code 1
>> Assertion failed in file src/mpid/common/datatype/mpid_ext32_segment.c at line 277: FALSE
>> memcpy argument memory ranges overlap, dst_=0x2aaab5c160a8 src_=0x2aaab5c160a8 len_=176
>> 
>> Above, MPI is trying to do a memcpy where the source and destination buffers have the same address.  Looking through the code for MVAPICH2-2.1, the problem seems to be at line 132 in src/mpi/romio/mpi-io/read_all.c:
>> 
>>   if (e32_buf != NULL) {
>>       error_code = MPIU_read_external32_conversion_fn(xbuf, datatype,
>>               count, e32_buf);
>>       ADIOI_Free(e32_buf);
>>   }
>> 
>> I think the fix is to change "xbuf" above to "buf" as it is in read.c below:
>> 
>>   if (e32_buf != NULL) {
>>       error_code = MPIU_read_external32_conversion_fn(buf, datatype,
>>               count, e32_buf);
>>       ADIOI_Free(e32_buf);
>>   }
>> 
>> When in external32 mode, xbuf == e32_buf, which acts as a temporary buffer in which to read the data.  The code is then meant to unpack and convert the data from the temporary buffer into the user buffer at buf.
>> 
>> It's probably worth checking the other external32 code paths to look for similar bugs.
>> Thanks,
>> -Adam
>> 
> 
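
The aliasing described in the report can be illustrated without MPI. What follows is a minimal sketch, not ROMIO code: unpack_int32_be and ranges_overlap are hypothetical stand-ins, and the sketch assumes the data is just an array of 32-bit integers stored big-endian (as external32 stores them). It shows why the conversion step needs two distinct buffers: a temporary one holding the packed file bytes, and the user buffer that receives the converted values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if [a, a+len) and [b, b+len) overlap. */
static int ranges_overlap(const void *a, const void *b, size_t len)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    return len > 0 && pa < pb + len && pb < pa + len;
}

/* Hypothetical stand-in for the read-side conversion, 32-bit ints only.
 * Unpacks `count` big-endian integers from `packed` (the temporary buffer)
 * into `user` (the caller's buffer). Returns 0 on success, or -1 if the
 * two buffers overlap, which is the condition the assertion in the
 * report tripped on. */
static int unpack_int32_be(int32_t *user, size_t count,
                           const unsigned char *packed)
{
    assert(user != NULL && packed != NULL);

    if (ranges_overlap(user, packed, count * sizeof(int32_t)))
        return -1;

    for (size_t i = 0; i < count; i++) {
        const unsigned char *p = packed + 4 * i;
        uint32_t v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                     ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
        /* Copy the bit pattern into the signed destination. */
        memcpy(&user[i], &v, sizeof v);
    }
    return 0;
}
```

Calling unpack_int32_be(user, n, tmp) with separate buffers corresponds to passing buf in read.c. Passing the temporary buffer as both arguments corresponds to passing xbuf in read_all.c when xbuf == e32_buf, and the sketch rejects that call with -1.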



