[mpich-discuss] Hangup or stall at MPI_INIT
Ryan Crocker
rcrocker at uvm.edu
Thu Dec 6 23:27:47 CST 2012
Hi all,
I'm having a problem with hangups or some sort of stall. The program keeps running, or appears to, on the cluster I'm running on (I've actually noticed this issue on multiple clusters), but nothing is happening. When I have every processor print out, they all print right before MPI_INIT and nothing after. I counted the files and all the processors are entering the call. What makes it even odder is that the exact same simulation will run if I decrease the number of nodes/cores I'm using, i.e. it hangs on 96 cores (or more), but moving down to 72 the program runs without issue.
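To help isolate things, here is a minimal standalone test I can build against the same MPICH2 install (nothing from my solver, just MPI_INIT and a print); if this also stalls at 96 cores, the problem would be in the MPI setup rather than my code:

! ====================================== !
program mpi_init_test
  use mpi
  implicit none
  integer :: ierr,rank,nprocs
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD,nprocs,ierr)
  write(*,'(A,I0,A,I0)') 'rank ',rank,' of ',nprocs
  call MPI_FINALIZE(ierr)
end program mpi_init_test
! ====================================== !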
I'm using the latest gcc compiler set with the latest version of MPICH2, configured with:
'--with-pbs=/PBS' '--with-default-comm=pmi' '--enable-pbspro-helper' 'CC=gcc' 'LDFLAGS=-lpthread' 'CPPFLAGS=-fpic'
I've switched to the Intel/MPICH2 compiler set and have not had the same problem. I have no idea what this issue could be and have had little luck finding an answer through Stack Overflow or Google searches; any help would be much appreciated. Also, hit me up for any more information you'd need to help.
Thanks,
My flow solver is written in Fortran; here are the subroutines leading up to the MPI_INIT call. The whole flow solver is compiled with '-O3 -ffree-line-length-none':
! ====================================== !
program main
  call main_init
  call simulation_run
  call main_stop
end program main
! ====================================== !
subroutine main_init
  use string
  implicit none
  character(len=str_medium) :: input_name
  call parallel_init
  ! Initialize the random number generator
  call random_init
  ! Parse the input file
  call parser_init
  input_name='input'
  call parser_parsefile(input_name)
  ! Geometry initialization
  call geometry_init
  ! Data initialization
  call data_init
  ! Simulation initialization
  call simulation_init
  return
end subroutine main_init
! ====================================== !
subroutine parallel_init
  use parallel
  use parser
  implicit none
  integer :: ierr
  integer :: size_real,size_dp
  ! Initialize a first basic MPI environment
  !##### This is where it stalls out #########
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD,irank,ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD,nproc,ierr)
  irank = irank+1
  iroot = 1
  ! Set MPI working precision - WP
  call MPI_TYPE_SIZE(MPI_REAL,size_real,ierr)
  call MPI_TYPE_SIZE(MPI_DOUBLE_PRECISION,size_dp,ierr)
  if (WP .eq. size_real) then
     MPI_REAL_WP = MPI_REAL
  else if (WP .eq. size_dp) then
     MPI_REAL_WP = MPI_DOUBLE_PRECISION
  else
     call parallel_kill('Error in parallel_init: no WP equivalent in MPI')
  end if
  ! Set MPI single precision - SP
  call MPI_TYPE_SIZE(MPI_REAL,size_real,ierr)
  call MPI_TYPE_SIZE(MPI_DOUBLE_PRECISION,size_dp,ierr)
  if (SP .eq. size_real) then
     MPI_REAL_SP = MPI_REAL
  else if (SP .eq. size_dp) then
     MPI_REAL_SP = MPI_DOUBLE_PRECISION
  else
     call parallel_kill('Error in parallel_init: no SP equivalent in MPI')
  end if
  ! For now, comm should point to MPI_COMM_WORLD
  comm = MPI_COMM_WORLD
  return
end subroutine parallel_init
! ====================================== !
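For what it's worth, this is roughly how I check that every process actually reaches the call (each process touches a file before and after MPI_INIT; getpid() here is a gfortran extension, used only to get a unique file name before ranks exist, so treat the details as a sketch):

! ====================================== !
subroutine mpi_init_check(ierr)
  use mpi
  implicit none
  integer, intent(out) :: ierr
  integer :: pid
  character(len=32) :: fname
  ! getpid() is a gfortran extension; gives a unique name per process
  pid = getpid()
  write(fname,'(A,I0)') 'before_init.',pid
  open(unit=99,file=trim(fname),status='replace')
  close(99)
  call MPI_INIT(ierr)
  ! If the 'after_init.*' files never appear, MPI_INIT is the stall
  write(fname,'(A,I0)') 'after_init.',pid
  open(unit=99,file=trim(fname),status='replace')
  close(99)
end subroutine mpi_init_check
! ====================================== !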