[mpich-discuss] MPICH3 and Nagfor: Corrupts writing/IO?

Kenneth Raffenetti raffenet at mcs.anl.gov
Thu Jan 12 15:46:09 CST 2017


Can you check your compiler flags and make sure they are all necessary? 
I was able to reproduce the error locally with your settings, but a more 
default configuration works fine, i.e.:

./configure --prefix=$PWD/i --disable-wrapper-rpath CC=gcc CXX=g++ \
    FC=nagfor F77=nagfor FCFLAGS=-mismatch FFLAGS=-mismatch --enable-fortran=all
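
If the rebuilt wrappers still misbehave, it can also help to see exactly what
mpifort adds on top of nagfor. The MPICH compiler wrappers accept -show, which
prints the underlying compile/link command without executing it, e.g. (using
your test.F90):

mpifort -show test.F90

Comparing that against a plain nagfor invocation of the same file should point
at whatever extra flag or library is changing the I/O behavior.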

Ken

On 01/12/2017 10:54 AM, Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND 
APPLICATIONS INC] wrote:
> All,
>
> I've been having some "fun" recently trying to get an MPI stack built
> with nagfor 6.1. I've tried Open MPI and MVAPICH2 and failed to even
> build the MPI stack, SGI MPT doesn't like nagfor, and Intel MPI I'm
> guessing wouldn't either.
>
> So, I figured I'd go for MPICH3. And, lo and behold, building with
> nagfor 6.1 and gcc 5.3 (for CC and CXX) with:
>
>> ./configure \
>>     --prefix=$SWDEV/MPI/mpich/3.2/nagfor_6.1-gcc_5.3-nomismatchall \
>>     --disable-wrapper-rpath CC=gcc CXX=g++ FC=nagfor F77=nagfor \
>>     CFLAGS='-fpic -m64' CXXFLAGS='-fpic -m64' \
>>     FCFLAGS='-PIC -abi=64' FFLAGS='-PIC -abi=64 -mismatch' \
>>     --enable-fortran=all --enable-cxx
>
> I got something to build. Huzzah!
>
> I then tried the cpi test, it worked! It even detected I was on slurm
> according to mpirun -verbose.
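>
> For reference, that was just the cpi program that ships in MPICH's examples
> directory, run along these lines (exact location assumed):
>
>> mpirun -verbose -np 4 ./examples/cpi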
>
> I then tried a simple Fortran 90 Hello world program and...crash:
>
>> (1211)(master) $ cat helloWorld.F90
>> program hello_world
>>
>>    use mpi
>>
>>    implicit none
>>
>>    integer :: comm
>>    integer :: myid, npes, ierror
>>    integer :: name_length
>>
>>    character(len=MPI_MAX_PROCESSOR_NAME) :: processor_name
>>
>>    call mpi_init(ierror)
>>
>>    comm = MPI_COMM_WORLD
>>
>>    call MPI_Comm_Rank(comm,myid,ierror)
>>    call MPI_Comm_Size(comm,npes,ierror)
>>    call MPI_Get_Processor_Name(processor_name,name_length,ierror)
>>
>>    write (*,'(A,1X,I4,1X,A,1X,I4,1X,A,1X,A)') "Process", myid, "of", &
>>         npes, "is on", trim(processor_name)
>>
>>    call MPI_Finalize(ierror)
>>
>> end program hello_world
>> (1212)(master) $ mpifort -o helloWorld.exe helloWorld.F90
>> NAG Fortran Compiler Release 6.1(Tozai) Build 6113
>> [NAG Fortran Compiler normal termination]
>> (1213)(master) $ mpirun -np 4 ./helloWorld.exe
>> srun.slurm: cluster configuration lacks support for cpu binding
>> Runtime Error: Buffer overflow on output
>> Program terminated by I/O error on unit 6
>> (Output_Unit,Unformatted,Direct)
>> Runtime Error: Buffer overflow on output
>> Program terminated by I/O error on unit 6
>> (Output_Unit,Unformatted,Direct)
>> Runtime Error: Buffer overflow on output
>> Program terminated by I/O error on unit Runtime Error: Buffer overflow
>> on output
>> Program terminated by I/O error on unit 6
>> (Output_Unit,Unformatted,Direct)
>> 6 (Output_Unit,Unformatted,Direct)
>>
>> ===================================================================================
>>
>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> =   PID 7642 RUNNING AT borgl189
>> =   EXIT CODE: 134
>> =   CLEANING UP REMAINING PROCESSES
>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> ===================================================================================
>>
>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
>> This typically refers to a problem with your application.
>> Please see the FAQ page for debugging suggestions
>
> Weird. So, I decided to try something different, this program:
>
>> program main
>>    implicit none
>>    real :: a
>>    a = 1240.0
>>    write (*,*) "Hello world", a
>> end program main
>
> Looks boring and is standard-compliant and nagfor likes it:
>
>> (1226) $ nagfor test.F90 && ./a.out
>> NAG Fortran Compiler Release 6.1(Tozai) Build 6113
>> [NAG Fortran Compiler normal termination]
>>  Hello world   1.2400000E+03
>
> Looks correct. Now let's try mpifort:
>
>> (1232) $ mpifort test.F90 && ./a.out
>> NAG Fortran Compiler Release 6.1(Tozai) Build 6113
>> [NAG Fortran Compiler normal termination]
>>  Hello world
>> Segmentation fault (core dumped)
>
> You can't really see it here, but that "Hello world" is surrounded by LF
> characters. Like a literal LineFeed...and then it core dumps.
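>
> The "(Output_Unit,Unformatted,Direct)" in the earlier crash makes me wonder
> what the NAG runtime thinks unit 6 is once the MPI libraries are linked in. A
> quick check I could try next (just a sketch) would be something like:
>
>> program check_unit6
>>    use iso_fortran_env, only: output_unit, error_unit
>>    implicit none
>>    character(len=16) :: form_, access_
>>    logical :: opened_
>>    ! Ask the runtime how the standard output unit is connected
>>    inquire(unit=output_unit, opened=opened_, form=form_, access=access_)
>>    write (error_unit,*) 'opened=', opened_, ' form=', trim(form_), &
>>         ' access=', trim(access_)
>> end program check_unit6
>
> built once with plain nagfor and once with mpifort, to see whether the unit's
> connection state differs between the two.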
>
> Now let's try running with mpirun as well:
>
>> (1233) $ mpifort test.F90 && mpirun -np 1 ./a.out
>> NAG Fortran Compiler Release 6.1(Tozai) Build 6113
>> [NAG Fortran Compiler normal termination]
>> srun.slurm: cluster configuration lacks support for cpu binding
>>
>> ===================================================================================
>>
>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> =   PID 8520 RUNNING AT borgl189
>> =   EXIT CODE: 139
>> =   CLEANING UP REMAINING PROCESSES
>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> ===================================================================================
>>
>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault
>> (signal 11)
>> This typically refers to a problem with your application.
>> Please see the FAQ page for debugging suggestions
>
> All righty then.
>
> Does anyone have advice for this? I'll fully accept that I may have configured
> MPICH3 wrong, as it's the first time in a while that I've built MPICH (back in
> the MPICH2 days). But, still, I don't have any exciting flags.
>
> Matt
>
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss


More information about the discuss mailing list