[mpich-discuss] Issues with shared memory window

Balaji, Pavan balaji at anl.gov
Sat Jul 26 09:46:17 CDT 2014


Jonathan,

There are several errors in the code.

Why is the window malloc(0)’ed?  The window is created by MPI_Win_allocate_shared, so you only need to declare an MPI_Win and pass its address.

You need to do a barrier after all accesses to the window.  Otherwise, another process might free the shared memory while you are accessing it.

You need MPI_Win_sync calls after each process has finished its writes and again before processes start their reads.

You need to free the window using MPI_Win_free, not MPI_Free_mem.

You should free your communicator, otherwise it’s a resource leak.

The argument to be passed for your array should be &arr, not arr, since arr is an OUT parameter.

I’d recommend taking one of the code examples in test/mpi/rma and modifying it to what you need.  That way the code would be standard-compliant.

Regards,

  — Pavan

On Jul 25, 2014, at 11:07 PM, Jonathan Blair <qbit at utexas.edu> wrote:

> Hi MPICH users,
> 
> I've been having issues with MPI_Win_allocate_shared(). I believe my use case is compliant with the standard, but I am not ruling out ignorance on my part as the fault.
> 
> In my project, one task allocates the memory to be shared, and the other tasks attach to the shared memory. The allocation function returns MPI_SUCCESS, as do all calls of MPI_Win_shared_query(). The size and displacement unit match expected values. However, the picture of the memory is nonuniform.
> 
> I'm running this on a shared memory system (the communicator is intra-node, currently being tested on a desktop), with MPICH 3.1.2 installed, passing all internal tests during installation.
> 
> I notice that MPI_Free_mem() reports errors and I believe MPI_Finalize() causes a segfault, but I'm not sure if this is specifically related to the issue at hand.
> 
> I have included a minimal test case below. Does anyone have any insight into my problem?
> 
> Thanks for your input,
> Jonathan
> 
> 
> [begin file test.cpp]
> #include <stdlib.h>
> #include <stdio.h>
> #include <mpi.h>
> 
> using namespace std;
> 
> int main(int argc, char *argv[]){
> 
>  int rank;
>  int color = 1;
>  int ierr;
>  int *arr = (int *) malloc( 0 );
>  int disp_unit;
> 
>  MPI_Aint size = 2048;
>  MPI_Aint reportedSize = 0;
>  MPI_Comm comm;
> 
>  ierr = MPI_Init( &argc, &argv );
>  ierr = MPI_Comm_rank( MPI_COMM_WORLD, &rank );
>  ierr = MPI_Comm_split( MPI_COMM_WORLD, color, rank, &comm );
> 
>  MPI_Win *win = (MPI_Win *) malloc( 0 );
> 
>  if (rank == 0){
>    ierr = MPI_Win_allocate_shared( \
>      size, \
>      (int) sizeof(int), \
>      MPI_INFO_NULL, \
>      comm, \
>      (void *) arr, \
>      win );
> 
>    printf( "Rank: %i ierr from MPI_Win_allocate_shared = %i\n", rank, ierr);
> 
>    ierr = MPI_Barrier( comm );
> 
>    for (int i=0; i < size; i++){
>      arr[i] = i;
>    }
> 
>    printf( "Rank: %i arr[0] = %i, arr[1] = %i\n", rank, arr[0], arr[1] );
> 
>    ierr = MPI_Barrier( comm );
>  }
>  else{
>    ierr = MPI_Win_allocate_shared( \
>      (MPI_Aint) 0, \
>      (int) sizeof(int), \
>      MPI_INFO_NULL, \
>      comm, \
>      (void *) arr, \
>      win );
> 
>    printf( "Rank: %i ierr from MPI_Win_allocate_shared = %i\n", rank, ierr);
> 
>    ierr = MPI_Win_shared_query( \
>      *win, \
>      (int) 0, \
>      &reportedSize, \
>      &disp_unit, \
>      (void *) arr );
> 
>    printf( "Rank: %i ierr from MPI_Win_shared_query = %i\n", rank, ierr);
>    printf( "Rank: %i reportedSize = %i\n", rank, (int) reportedSize);
>    printf( "Rank: %i disp_unit = %i\n", rank, disp_unit);
> 
>    ierr = MPI_Barrier( comm );
> 
>    ierr = MPI_Barrier( comm );
>    printf( "Rank: %i arr[0] = %i, arr[1] = %i\n", rank, arr[0], arr[1] );
>  }
> 
>  MPI_Free_mem((void *) win);
>  ierr = MPI_Finalize();
>  return 0;
> }
> [end file test.cpp]
> 
> 
> 
> [begin shell output]
> $ mpirun -n 2 ./test
> Rank: 0 ierr from MPI_Win_allocate_shared = 0
> Rank: 1 ierr from MPI_Win_allocate_shared = 0
> Rank: 1 ierr from MPI_Win_shared_query = 0
> Rank: 1 reportedSize = 2048
> Rank: 1 disp_unit = 4
> Rank: 0 arr[0] = 0, arr[1] = 1
> Rank: 1 arr[0] = 1996775424, arr[1] = 32592
> [0] Block at address 0x000000000093f190 is corrupted; cannot free;
> may be block not allocated with MPL_trmalloc or MALLOC
> called in /path/to/mpich-3.1.2/src/mpid/ch3/src/ch3u_rma_ops.c at line 493
> [1] Block at address 0x0000000000ec8190 is corrupted; cannot free;
> may be block not allocated with MPL_trmalloc or MALLOC
> called in /path/to/mpich-3.1.2/src/mpid/ch3/src/ch3u_rma_ops.c at line 493
> [1] 56 at [0x0000000000eccb78], ich-3.1.2/src/util/wrappers/mpiu_shm_wrappers.h[188]
> [1] 24 at [0x0000000000eccab8], ich-3.1.2/src/util/wrappers/mpiu_shm_wrappers.h[217]
> [1] 56 at [0x0000000000ecc9d8], ich-3.1.2/src/util/wrappers/mpiu_shm_wrappers.h[188]
> [1] 24 at [0x0000000000ecc918], ich-3.1.2/src/util/wrappers/mpiu_shm_wrappers.h[217]
> [1] 8 at [0x0000000000ecc788], src/mpid/ch3/channels/nemesis/src/ch3_win_fns.c[131]
> [1] 8 at [0x0000000000eca3a8], src/mpid/ch3/channels/nemesis/src/ch3_win_fns.c[127]
> [1] 8 at [0x0000000000ec74e8], src/mpid/ch3/channels/nemesis/src/ch3_win_fns.c[123]
> [1] 16 at [0x0000000000ecc6c8], src/mpid/ch3/channels/nemesis/src/ch3_win_fns.c[120]
> [1] 16 at [0x0000000000ecc608], src/mpid/ch3/channels/nemesis/src/ch3_win_fns.c[117]
> [1] 16 at [0x0000000000ecc548], src/mpid/ch3/channels/nemesis/src/ch3_win_fns.c[113]
> [1] 48 at [0x0000000000ecbe98], h/mpich/mpich-3.1.2/src/mpid/ch3/src/mpid_rma.c[301]
> [1] 32 at [0x0000000000ecc478], ch/mpich/mpich-3.1.2/src/mpid/ch3/src/mpid_vc.c[122]
> [1] 8 at [0x0000000000ecc2f8], mpich/mpich-3.1.2/src/util/procmap/local_proc.c[93]
> [1] 8 at [0x0000000000ecc248], mpich/mpich-3.1.2/src/util/procmap/local_proc.c[92]
> [1] 32 at [0x0000000000ecc3a8], ch/mpich/mpich-3.1.2/src/mpid/ch3/src/mpid_vc.c[122]
> [1] 8 at [0x0000000000ecc198], mpich/mpich-3.1.2/src/util/procmap/local_proc.c[93]
> [1] 8 at [0x0000000000ecc0e8], mpich/mpich-3.1.2/src/util/procmap/local_proc.c[92]
> [1] 32 at [0x0000000000ecc018], ch/mpich/mpich-3.1.2/src/mpid/ch3/src/mpid_vc.c[122]
> [1] 504 at [0x0000000000ecafc8], earch/mpich/mpich-3.1.2/src/mpi/comm/commutil.c[281]
> [1] 504 at [0x0000000000ecaa88], earch/mpich/mpich-3.1.2/src/mpi/comm/commutil.c[281]
> 
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 8822 RUNNING AT Machine
> =   EXIT CODE: 139
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions
> [end shell output]

--
Pavan Balaji  ✉️
http://www.mcs.anl.gov/~balaji


