[mpich-discuss] MPICH3 and Nagfor: Corrupts writing/IO?

Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] matthew.thompson at nasa.gov
Wed Jan 18 09:53:53 CST 2017


Ken, et al,

I have found one more possible issue with NAG. It seems to be due to
MPICH's use of declarations like:

integer*8
real*8

Why? Well, NAG's default is -kind=sequential (kind numbers are, after
all, implementation dependent), so by default it really wants those to
be integer*2 and real*2. I'm currently trying to build another version
of MPICH3 by passing in -kind=byte (kinds 4, 8, ... instead of
1, 2, ...), because I can't build any upstream libraries (e.g., HDF5)
with -kind=byte if the MPI itself wasn't built that way. Fun!
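
To make the kind issue concrete, here is a tiny sketch (my own
illustration, not code from MPICH; the kind numbers in the comments are
my reading of NAG's -kind=byte and -kind=sequential schemes):

   ! Illustration only: a hard-wired kind constant vs. a computed kind.
   program kind_demo
      implicit none
      ! Under -kind=byte the real kinds are the byte sizes (4, 8, 16),
      ! so kind=8 below is an 8-byte real. Under the default
      ! -kind=sequential the real kinds are 1, 2, 3, so (as I understand
      ! it) there is no kind 8 and this declaration will not compile.
      real(kind=8) :: hardwired
      ! This one works under either scheme, because the kind is computed.
      integer, parameter :: dp = selected_real_kind(15, 307)
      real(dp) :: portable
      hardwired = 1.0_8
      portable  = 1.0_dp
      print *, kind(hardwired), kind(portable)
   end program kind_demo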


Now, it might not be possible to change such MPI modules to use
'selected_real_kind' and 'selected_int_kind' (as Fortran 90 allows),
but it would be good if anyone cared to.

The one I'm more concerned about is seeing "integer*8" in:

  test/mpi/f08/coll/allredopttf08.f90

That is part of the F08 interface. Once you are on F08, you could
(should?) just use INT64, REAL32, etc., as is done in:

   src/binding/fortran/use_mpi_f08/mpi_f08_types.f90

or use selected_real_kind/selected_int_kind by default.
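
For what it's worth, here is the sort of spelling I mean (a sketch in
plain standard Fortran, not taken from the MPICH sources; int64, real32,
real64 are the Fortran 2008 constants from iso_fortran_env):

   ! Sketch: portable replacements for integer*8 / real*8.
   program portable_kinds
      use, intrinsic :: iso_fortran_env, only : int64, real32, real64
      implicit none
      ! Option 1: compute the kinds (works back to Fortran 90).
      integer, parameter :: i8 = selected_int_kind(18)        ! 64-bit integer
      integer, parameter :: r8 = selected_real_kind(15, 307)  ! double precision
      integer(i8) :: n64
      real(r8)    :: x
      ! Option 2: the named constants from iso_fortran_env (F2008),
      ! as mpi_f08_types.f90 already does.
      integer(int64) :: n64_env
      real(real32)   :: s
      real(real64)   :: d
      n64     = 2_i8**40
      x       = 1.0_r8 / 3.0_r8
      n64_env = n64
      s       = real(x, real32)
      d       = x
      print *, n64, x, n64_env, s, d
   end program portable_kinds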

Still, no matter what, I'm happy I have a working stack!

Matt

On 01/17/2017 01:18 PM, Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND 
APPLICATIONS INC] wrote:
> Ken,
>
> I just tried your configure (with enable-cxx) and it seemed to work!
> Thanks!
>
> I suppose I should try and figure out what flag caused the corruption
> but it's working. Don't really want to poke the bear right now.
>
> Matt
>
>
> On 01/12/2017 04:46 PM, Kenneth Raffenetti wrote:
>> Can you check your compiler flags and make sure they are all necessary?
>> I was able to reproduce the error locally with your settings, but a more
>> default configuration works fine. I.e.
>>
>> ./configure --prefix=$PWD/i --disable-wrapper-rpath CC=gcc CXX=g++
>> FC=nagfor F77=nagfor FCFLAGS=-mismatch FFLAGS=-mismatch
>> --enable-fortran=all
>>
>> Ken
>>
>> On 01/12/2017 10:54 AM, Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND
>> APPLICATIONS INC] wrote:
>>> All,
>>>
>>> I've been having some "fun" recently trying to get an MPI stack built
>>> with nagfor 6.1. I've tried Open MPI and MVAPICH2 and failed to even
>>> build the MPI stack; SGI MPT doesn't like nagfor, and I'm guessing
>>> Intel MPI wouldn't either.
>>>
>>> So, I figured I'd go for MPICH3. And, lo and behold, building with
>>> nagfor 6.1 and gcc 5.3 (for CC and CXX) with:
>>>
>>>> ./configure --prefix=$SWDEV/MPI/mpich/3.2/nagfor_6.1-gcc_5.3-nomismatchall \
>>>>     --disable-wrapper-rpath CC=gcc CXX=g++ FC=nagfor F77=nagfor \
>>>>     CFLAGS='-fpic -m64' CXXFLAGS='-fpic -m64' \
>>>>     FCFLAGS='-PIC -abi=64' FFLAGS='-PIC -abi=64 -mismatch' \
>>>>     --enable-fortran=all --enable-cxx
>>>
>>> I got something to build. Huzzah!
>>>
>>> I then tried the cpi test, it worked! It even detected I was on slurm
>>> according to mpirun -verbose.
>>>
>>> I then tried a simple Fortran 90 Hello world program and...crash:
>>>
>>>> (1211)(master) $ cat helloWorld.F90
>>>> program hello_world
>>>>
>>>>    use mpi
>>>>
>>>>    implicit none
>>>>
>>>>    integer :: comm
>>>>    integer :: myid, npes, ierror
>>>>    integer :: name_length
>>>>
>>>>    character(len=MPI_MAX_PROCESSOR_NAME) :: processor_name
>>>>
>>>>    call mpi_init(ierror)
>>>>
>>>>    comm = MPI_COMM_WORLD
>>>>
>>>>    call MPI_Comm_Rank(comm,myid,ierror)
>>>>    call MPI_Comm_Size(comm,npes,ierror)
>>>>    call MPI_Get_Processor_Name(processor_name,name_length,ierror)
>>>>
>>>>    write (*,'(A,1X,I4,1X,A,1X,I4,1X,A,1X,A)') "Process", myid, "of", &
>>>>         npes, "is on", trim(processor_name)
>>>>
>>>>    call MPI_Finalize(ierror)
>>>>
>>>> end program hello_world
>>>> (1212)(master) $ mpifort -o helloWorld.exe helloWorld.F90
>>>> NAG Fortran Compiler Release 6.1(Tozai) Build 6113
>>>> [NAG Fortran Compiler normal termination]
>>>> (1213)(master) $ mpirun -np 4 ./helloWorld.exe
>>>> srun.slurm: cluster configuration lacks support for cpu binding
>>>> Runtime Error: Buffer overflow on output
>>>> Program terminated by I/O error on unit 6
>>>> (Output_Unit,Unformatted,Direct)
>>>> Runtime Error: Buffer overflow on output
>>>> Program terminated by I/O error on unit 6
>>>> (Output_Unit,Unformatted,Direct)
>>>> Runtime Error: Buffer overflow on output
>>>> Program terminated by I/O error on unit Runtime Error: Buffer overflow
>>>> on output
>>>> Program terminated by I/O error on unit 6
>>>> (Output_Unit,Unformatted,Direct)
>>>> 6 (Output_Unit,Unformatted,Direct)
>>>>
>>>> ===================================================================================
>>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>> =   PID 7642 RUNNING AT borgl189
>>>> =   EXIT CODE: 134
>>>> =   CLEANING UP REMAINING PROCESSES
>>>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>>> ===================================================================================
>>>>
>>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
>>>> This typically refers to a problem with your application.
>>>> Please see the FAQ page for debugging suggestions
>>>
>>> Weird. So, I decided to try something different, this program:
>>>
>>>> program main
>>>>    implicit none
>>>>    real :: a
>>>>    a = 1240.0
>>>>    write (*,*) "Hello world", a
>>>> end program main
>>>
>>> Looks boring and is standard-compliant and nagfor likes it:
>>>
>>>> (1226) $ nagfor test.F90 && ./a.out
>>>> NAG Fortran Compiler Release 6.1(Tozai) Build 6113
>>>> [NAG Fortran Compiler normal termination]
>>>>  Hello world   1.2400000E+03
>>>
>>> Looks correct. Now let's try mpifort:
>>>
>>>> (1232) $ mpifort test.F90 && ./a.out
>>>> NAG Fortran Compiler Release 6.1(Tozai) Build 6113
>>>> [NAG Fortran Compiler normal termination]
>>>>  Hello world
>>>> Segmentation fault (core dumped)
>>>
>>> You can't really see it here, but that "Hello world" is surrounded by LF
>>> characters. Like a literal LineFeed...and then it core dumps.
>>>
>>> Now let's try running with mpirun as well:
>>>
>>>> (1233) $ mpifort test.F90 && mpirun -np 1 ./a.out
>>>> NAG Fortran Compiler Release 6.1(Tozai) Build 6113
>>>> [NAG Fortran Compiler normal termination]
>>>> srun.slurm: cluster configuration lacks support for cpu binding
>>>>
>>>> ===================================================================================
>>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>> =   PID 8520 RUNNING AT borgl189
>>>> =   EXIT CODE: 139
>>>> =   CLEANING UP REMAINING PROCESSES
>>>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>>> ===================================================================================
>>>>
>>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault
>>>> (signal 11)
>>>> This typically refers to a problem with your application.
>>>> Please see the FAQ page for debugging suggestions
>>>
>>> All righty then.
>>>
>>> Does anyone have advice for this? I'll fully accept that I configured
>>> MPICH3 wrong, as it's the first time in a while that I've built MPICH
>>> (think MPICH2 era). But still, I don't have any exciting flags.
>>>
>>> Matt
>>>
>
>


-- 
Matt Thompson, SSAI, Sr Scientific Programmer/Analyst
NASA GSFC,    Global Modeling and Assimilation Office
Code 610.1,  8800 Greenbelt Rd,  Greenbelt,  MD 20771
Phone: 301-614-6712                 Fax: 301-614-6246
http://science.gsfc.nasa.gov/sed/bio/matthew.thompson
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

