[mpich-discuss] Problem with RMA && MPI_Type_vector when stride value is relatively large (UPD)
Zhao, Xin
xinzhao3 at illinois.edu
Fri Oct 31 09:28:26 CDT 2014
Hi Victor,
The "at_completion_counter" is used to detect the completion of all RMA operations on the target side (Win_fence / Win_complete can return after it reaches zero). It is initialized to the number of expected origins and is decremented when the target receives the last operation from an origin. However, GET-like operations need to send data back, and that may not finish immediately when the target receives the GET-like packet. So we increment the counter when the target receives a GET-like packet and decrement it when the send-back finishes (which is where your assertion failure happens).
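To make the counting concrete, here is a rough toy model of it. This is only a sketch and not the actual MPICH code: the struct and every function name below are made up for illustration, and only the counter name and the ">= 0" check come from the messages in this thread.

```c
#include <assert.h>

/* Toy model of the target-side counter; NOT MPICH code.
 * toy_win and all toy_* functions are hypothetical names. */
typedef struct {
    int at_completion_counter;
} toy_win;

/* Start of an epoch: one pending "last operation" per expected origin. */
void toy_win_init(toy_win *w, int expected_origins)
{
    w->at_completion_counter = expected_origins;
}

/* Target received a GET-like packet: the reply data is still outstanding. */
void toy_on_get_packet(toy_win *w)
{
    w->at_completion_counter++;
}

/* The reply for a GET-like packet has been fully sent back. */
void toy_on_get_reply_sent(toy_win *w)
{
    w->at_completion_counter--;
    /* Same kind of check as the failing assertion: never go below zero. */
    assert(w->at_completion_counter >= 0);
}

/* Target received the last operation from one origin. */
void toy_on_last_op_from_origin(toy_win *w)
{
    w->at_completion_counter--;
    assert(w->at_completion_counter >= 0);
}

/* The epoch can close only once nothing is pending. */
int toy_epoch_complete(const toy_win *w)
{
    return w->at_completion_counter == 0;
}
```

In this model, with two origins and one GET in flight, the counter goes 2, 3, 2, 1, and reaches 0 only after toy_on_get_reply_sent() runs. A decrement with no matching increment (or an extra "last operation") is what would drive it negative.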
Is it possible for you to reproduce this bug in a simple program, so that I can look at what is going wrong?
Thanks,
Xin
________________________________________
From: Victor Vysotskiy [victor.vysotskiy at teokem.lu.se]
Sent: Friday, October 31, 2014 5:47 AM
To: discuss at mpich.org
Subject: Re: [mpich-discuss] Problem with RMA && MPI_Type_vector when stride value is relatively large (UPD)
Dear Xin,
thank you very much for your efforts! Indeed, the test-bed code works with your recent fix. However, my real application is now failing with the following error message:
Assertion failed in file src/mpid/ch3/src/ch3u_handle_send_req.c at line 61: win_ptr->at_completion_counter >= 0
internal ABORT - process 10
Do you, by any chance, have any idea what might cause such an error? Could you please advise me on how to debug this problem?
With best regards,
Victor.
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss