[mpich-discuss] Issues with shared memory window
Balaji, Pavan
balaji at anl.gov
Sat Jul 26 09:46:17 CDT 2014
Jonathan,
There are several errors in the code.
Why is the window malloc(0)’ed? The window is created by MPI_Win_allocate_shared, so a plain MPI_Win variable passed by address is enough.
You need a barrier after all accesses to the window and before it is freed. Otherwise, another process might free the shared memory while you are still accessing it.
You need MPI_Win_sync calls after each process has finished its writes and before any process starts its reads.
You need to free the window with MPI_Win_free, not MPI_Free_mem.
You should free your communicator with MPI_Comm_free; otherwise it’s a resource leak.
The argument passed for your array should be &arr, not arr, since the base pointer is an OUT parameter.
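The OUT-parameter point can be illustrated without MPI at all. The following is a minimal sketch, assuming a hypothetical function allocate_ints() that mimics the way MPI_Win_allocate_shared declares its base pointer as void * but treats it as the address of the caller's pointer. Neither allocate_ints() nor out_param_demo() is part of MPI or of the original test case.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an API with an OUT pointer parameter.
 * The parameter is declared void *, but the function treats it as the
 * address of the caller's int * and stores the new block there. */
static int allocate_ints(size_t count, void *baseptr)
{
    int *block = calloc(count, sizeof *block);
    if (block == NULL)
        return 1;
    *(int **)baseptr = block;   /* write the address into the caller's pointer */
    return 0;
}

/* Returns 1 if the caller's pointer was updated, 0 if allocation failed. */
static int out_param_demo(void)
{
    int *arr = NULL;            /* no malloc needed; the callee sets it */

    if (allocate_ints(4, &arr) != 0)   /* pass &arr, not arr */
        return 0;

    assert(arr != NULL);
    arr[0] = 42;
    printf("arr now points at %p, arr[0] = %d\n", (void *)arr, arr[0]);

    free(arr);
    return 1;
}
```

Passing arr instead of &arr would still compile, because any object pointer converts to void *, but the callee would then write the new address into whatever arr happens to point at and leave arr itself unchanged.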
I’d recommend taking one of the code examples in test/mpi/rma and modifying it to what you need. That way the code would be standard compliant.
Regards,
— Pavan
On Jul 25, 2014, at 11:07 PM, Jonathan Blair <qbit at utexas.edu> wrote:
> Hi MPICH users,
>
> I've been having issues with MPI_Win_allocate_shared(). I believe my use case is compliant with the standard, but I am not ruling out ignorance on my part as the fault.
>
> In my project, one task allocates the memory to be shared, and the other tasks attach to the shared memory. The allocation function returns MPI_SUCCESS, as do all calls of MPI_Win_shared_query(). The size and displacement unit match expected values. However, the tasks do not see the same contents in the shared memory.
>
> I'm running this on a shared memory system (the communicator is intra-node, currently being tested on a desktop), with MPICH 3.1.2 installed, passing all internal tests during installation.
>
> I notice that MPI_Free_mem() reports errors and I believe MPI_Finalize() causes a segfault, but I'm not sure if this is specifically related to the issue at hand.
>
> I have included a minimal test case below. Does anyone have any insight into my problem?
>
> Thanks for your input,
> Jonathan
>
>
> [begin file test.cpp]
> #include <stdlib.h>
> #include <stdio.h>
> #include <mpi.h>
>
> using namespace std;
>
> int main(int argc, char *argv[]){
>
> int rank;
> int color = 1;
> int ierr;
> int *arr = (int *) malloc( 0 );
> int disp_unit;
>
> MPI_Aint size = 2048;
> MPI_Aint reportedSize = 0;
> MPI_Comm comm;
>
> ierr = MPI_Init( &argc, &argv );
> ierr = MPI_Comm_rank( MPI_COMM_WORLD, &rank );
> ierr = MPI_Comm_split( MPI_COMM_WORLD, color, rank, &comm );
>
> MPI_Win *win = (MPI_Win *) malloc( 0 );
>
> if (rank == 0){
> ierr = MPI_Win_allocate_shared( \
> size, \
> (int) sizeof(int), \
> MPI_INFO_NULL, \
> comm, \
> (void *) arr, \
> win );
>
> printf( "Rank: %i ierr from MPI_Win_allocate_shared = %i\n", rank, ierr);
>
> ierr = MPI_Barrier( comm );
>
> for (int i=0; i < size; i++){
> arr[i] = i;
> }
>
> printf( "Rank: %i arr[0] = %i, arr[1] = %i\n", rank, arr[0], arr[1] );
>
> ierr = MPI_Barrier( comm );
> }
> else{
> ierr = MPI_Win_allocate_shared( \
> (MPI_Aint) 0, \
> (int) sizeof(int), \
> MPI_INFO_NULL, \
> comm, \
> (void *) arr, \
> win );
>
> printf( "Rank: %i ierr from MPI_Win_allocate_shared = %i\n", rank, ierr);
>
> ierr = MPI_Win_shared_query( \
> *win, \
> (int) 0, \
> &reportedSize, \
> &disp_unit, \
> (void *) arr );
>
> printf( "Rank: %i ierr from MPI_Win_shared_query = %i\n", rank, ierr);
> printf( "Rank: %i reportedSize = %i\n", rank, (int) reportedSize);
> printf( "Rank: %i disp_unit = %i\n", rank, disp_unit);
>
> ierr = MPI_Barrier( comm );
>
> ierr = MPI_Barrier( comm );
> printf( "Rank: %i arr[0] = %i, arr[1] = %i\n", rank, arr[0], arr[1] );
> }
>
> MPI_Free_mem((void *) win);
> ierr = MPI_Finalize();
> return 0;
> }
> [end file test.cpp]
>
>
>
> [begin shell output]
> $ mpirun -n 2 ./test
> Rank: 0 ierr from MPI_Win_allocate_shared = 0
> Rank: 1 ierr from MPI_Win_allocate_shared = 0
> Rank: 1 ierr from MPI_Win_shared_query = 0
> Rank: 1 reportedSize = 2048
> Rank: 1 disp_unit = 4
> Rank: 0 arr[0] = 0, arr[1] = 1
> Rank: 1 arr[0] = 1996775424, arr[1] = 32592
> [0] Block at address 0x000000000093f190 is corrupted; cannot free;
> may be block not allocated with MPL_trmalloc or MALLOC
> called in /path/to/mpich-3.1.2/src/mpid/ch3/src/ch3u_rma_ops.c at line 493
> [1] Block at address 0x0000000000ec8190 is corrupted; cannot free;
> may be block not allocated with MPL_trmalloc or MALLOC
> called in /path/to/mpich-3.1.2/src/mpid/ch3/src/ch3u_rma_ops.c at line 493
> [1] 56 at [0x0000000000eccb78], ich-3.1.2/src/util/wrappers/mpiu_shm_wrappers.h[188]
> [1] 24 at [0x0000000000eccab8], ich-3.1.2/src/util/wrappers/mpiu_shm_wrappers.h[217]
> [1] 56 at [0x0000000000ecc9d8], ich-3.1.2/src/util/wrappers/mpiu_shm_wrappers.h[188]
> [1] 24 at [0x0000000000ecc918], ich-3.1.2/src/util/wrappers/mpiu_shm_wrappers.h[217]
> [1] 8 at [0x0000000000ecc788], src/mpid/ch3/channels/nemesis/src/ch3_win_fns.c[131]
> [1] 8 at [0x0000000000eca3a8], src/mpid/ch3/channels/nemesis/src/ch3_win_fns.c[127]
> [1] 8 at [0x0000000000ec74e8], src/mpid/ch3/channels/nemesis/src/ch3_win_fns.c[123]
> [1] 16 at [0x0000000000ecc6c8], src/mpid/ch3/channels/nemesis/src/ch3_win_fns.c[120]
> [1] 16 at [0x0000000000ecc608], src/mpid/ch3/channels/nemesis/src/ch3_win_fns.c[117]
> [1] 16 at [0x0000000000ecc548], src/mpid/ch3/channels/nemesis/src/ch3_win_fns.c[113]
> [1] 48 at [0x0000000000ecbe98], h/mpich/mpich-3.1.2/src/mpid/ch3/src/mpid_rma.c[301]
> [1] 32 at [0x0000000000ecc478], ch/mpich/mpich-3.1.2/src/mpid/ch3/src/mpid_vc.c[122]
> [1] 8 at [0x0000000000ecc2f8], mpich/mpich-3.1.2/src/util/procmap/local_proc.c[93]
> [1] 8 at [0x0000000000ecc248], mpich/mpich-3.1.2/src/util/procmap/local_proc.c[92]
> [1] 32 at [0x0000000000ecc3a8], ch/mpich/mpich-3.1.2/src/mpid/ch3/src/mpid_vc.c[122]
> [1] 8 at [0x0000000000ecc198], mpich/mpich-3.1.2/src/util/procmap/local_proc.c[93]
> [1] 8 at [0x0000000000ecc0e8], mpich/mpich-3.1.2/src/util/procmap/local_proc.c[92]
> [1] 32 at [0x0000000000ecc018], ch/mpich/mpich-3.1.2/src/mpid/ch3/src/mpid_vc.c[122]
> [1] 504 at [0x0000000000ecafc8], earch/mpich/mpich-3.1.2/src/mpi/comm/commutil.c[281]
> [1] 504 at [0x0000000000ecaa88], earch/mpich/mpich-3.1.2/src/mpi/comm/commutil.c[281]
>
> ===================================================================================
> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> = PID 8822 RUNNING AT Machine
> = EXIT CODE: 139
> = CLEANING UP REMAINING PROCESSES
> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions
> [end shell output]
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji