[mpich-discuss] How to use non-blocking send/receive without calling MPI_Wait
Lei Shi
lshi at ku.edu
Tue Apr 7 02:39:14 CDT 2015
On Tue, Apr 7, 2015 at 2:37 AM, Lei Shi <leishi at ku.edu> wrote:
> Hi Huiwei and Jeff,
>
> I use hybrid OpenMP/MPI to overlap communication with computation: all
> communication goes into one dedicated OpenMP thread and the computation
> runs in the other thread. For this case I'm using the Intel MPI library.
> Probably I made some mistakes.
>
> One version of my code, using a dedicated thread for the messaging, looks
> like this:
>
> /** hybrid MPI/OpenMP overlap **/
> template<typename T>
> void CPR_NS_3D_Solver<T>::UpdateRes(T **q, T **res){
>   int thread_id, n_thread;
>   int sol_rev_flag = 0, grad_rev_flag = 0;
>
>   // Explicitly disable dynamic teams
>   omp_set_dynamic(0);
>   // Use 2 threads for all consecutive parallel regions
>   omp_set_num_threads(2);
>
> #pragma omp parallel default(shared) private(thread_id, n_thread)
>   {
>     thread_id = omp_get_thread_num();
>     n_thread  = omp_get_num_threads();
>
>     /** communication thread: posts and completes the interface exchange **/
>     if(thread_id == 1){
>       SendInterfaceSol();
>       RevInterfaceSol();
> #pragma omp flush
>       sol_rev_flag = 1;              // signal the compute thread
> #pragma omp flush(sol_rev_flag)
>     }
>
>     /** computation thread: overlaps local work with the exchange **/
>     if(thread_id == 0){
>       ResFromDivInvisFlux(q, res);   // local computation, needs no interface data
> #pragma omp flush(sol_rev_flag)
>       while(sol_rev_flag != 1){      // spin until the exchange is done
> #pragma omp flush(sol_rev_flag)
>       }
> #pragma omp flush
>       ResFromFluxCorrection(q, res); // depends on interface sol
>     }
>   } // end of omp parallel
> }
>
> template<typename T>
>
> void CPR_NS_3D_Solver<T>::SendInterfaceSol(){
>   uint  *n_if_to_proc = this->grid_->num_iface_proc;
>   uint **if_to_proc   = this->grid_->snd_iface_proc;
>   uint **rev_if_to_f  = this->grid_->rev_iface_proc;
>
>   int tag = 52;
>   for(int p2 = 0; p2 < _n_proc; ++p2){
>     if(p2 != _proc_id){
>       int nif = n_if_to_proc[p2];
>       // pack data to send ....
>     }
>   }
>
>   /** Exchange interface sol **/
>   n_proc_exchange_ = 0;  // member counter, reused by RevInterfaceSol() below
>   for(int z = 0; z < _n_proc; ++z){
>     int nif = n_if_to_proc[z];
>
>     // post the non-blocking send/receive pair for this neighbor
>     if(nif > 0){
>       MPI_Isend(&snd_buf_[z][0], n_buf_[z], MPI_DOUBLE, z, tag,
>                 MPI_COMM_WORLD, &s_sol_req_[n_proc_exchange_]);
>       MPI_Irecv(&rev_buf_[z][0], n_buf_[z], MPI_DOUBLE, z, tag,
>                 MPI_COMM_WORLD, &r_sol_req_[n_proc_exchange_]);
>       n_proc_exchange_++;
>     }
>   }
> }
>
> template<typename T>
> void CPR_NS_3D_Solver<T>::RevInterfaceSol(){
>   uint  *n_if_to_proc = this->grid_->num_iface_proc;
>   uint **if_to_proc   = this->grid_->snd_iface_proc;
>   uint **rev_if_to_f  = this->grid_->rev_iface_proc;
>
>   // wait for all posted sends and receives to complete
>   if(n_proc_exchange_ > 0){
>     MPI_Waitall(n_proc_exchange_, s_sol_req_, MPI_STATUSES_IGNORE);
>     MPI_Waitall(n_proc_exchange_, r_sol_req_, MPI_STATUSES_IGNORE);
>   }
>
>   /** store to local data structure **/
>   for(int z = 0; z < _n_proc; ++z){
>     int nif = n_if_to_proc[z];
>
>     if(nif > 0){
>       // unpacking ....
>     }
>   }
> }
>
>
> Sincerely Yours,
>
> Lei Shi
> ---------
>
> On Fri, Apr 3, 2015 at 4:37 PM, Jeff Hammond <jeff.science at gmail.com>
> wrote:
>
>> As far as I know, Ethernet is not good at making asynchronous progress in
>> hardware the way e.g. InfiniBand is. I would have thought that a dedicated
>> progress thread would help, but it seems you tried that. Did you use your
>> own progress thread or MPICH_ASYNC_PROGRESS=1?
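>>
>> For reference, a minimal sketch of what using the MPICH progress thread
>> looks like (this assumes an MPICH-derived library; with Intel MPI the
>> corresponding knob is different, so treat the launch line as illustrative):
>>
>> #include <mpi.h>
>> #include <cstdio>
>>
>> int main(int argc, char **argv){
>>   // The asynchronous progress thread needs full thread support from MPI.
>>   int provided;
>>   MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>>   if(provided < MPI_THREAD_MULTIPLE)
>>     std::printf("warning: MPI_THREAD_MULTIPLE not available\n");
>>
>>   /* ... post MPI_Isend/MPI_Irecv, compute, MPI_Waitall ... */
>>
>>   MPI_Finalize();
>>   return 0;
>> }
>>
>> // Launched with the progress thread enabled, e.g.:
>> //   MPICH_ASYNC_PROGRESS=1 mpiexec -n 16 ./app
>> // Note that the extra progress thread competes with the compute threads
>> // for cores, so it can help or hurt depending on how ranks are pinned.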
>>
>> Jeff
>>
>> On Fri, Apr 3, 2015 at 10:10 AM, Lei Shi <lshi at ku.edu> wrote:
>>
>>> Huiwei,
>>>
>>> Thanks for your email. Your answer leads to another question of mine
>>> about asynchronous MPI communication.
>>>
>>> I'm trying to overlap communication with computation to speed up my MPI
>>> code. I read some papers comparing different approaches to overlapped
>>> communication: the "naive" implementation, which only uses non-blocking
>>> MPI_Isend/MPI_Irecv, and the hybrid approach using OpenMP and MPI
>>> together, where a separate thread does all the non-blocking
>>> communication. Exactly as you said, the results indicate that current
>>> MPI implementations do not support true asynchronous communication.
>>>
>>> If I use the naive approach, my code gives almost the same performance
>>> (in terms of MPI_Wtime) with non-blocking send/recv as with blocking
>>> send/recv: all of the communication is effectively postponed to MPI_Wait.
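>>>
>>> To be concrete, by the naive approach I mean roughly the following
>>> sketch (buffer sizes, neighbors, and tags here are just illustrative):
>>>
>>> #include <mpi.h>
>>> #include <vector>
>>>
>>> int main(int argc, char **argv){
>>>   MPI_Init(&argc, &argv);
>>>   int rank, size;
>>>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>   MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>
>>>   int right = (rank + 1) % size, left = (rank - 1 + size) % size;
>>>   std::vector<double> snd(1000, rank), rcv(1000);
>>>
>>>   // Post the exchange up front ...
>>>   MPI_Request reqs[2];
>>>   MPI_Isend(snd.data(), (int)snd.size(), MPI_DOUBLE, right, 52,
>>>             MPI_COMM_WORLD, &reqs[0]);
>>>   MPI_Irecv(rcv.data(), (int)rcv.size(), MPI_DOUBLE, left, 52,
>>>             MPI_COMM_WORLD, &reqs[1]);
>>>
>>>   // ... interior computation that needs no remote data ...
>>>
>>>   // In practice almost all of the transfer seems to happen here:
>>>   MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
>>>
>>>   // ... boundary computation that uses rcv ...
>>>
>>>   MPI_Finalize();
>>>   return 0;
>>> }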
>>>
>>> I have tried calling MPI_Test during the iterations to push the library
>>> to make progress on communication, and I have also tried using a
>>> dedicated thread for communication with the other thread doing the
>>> computation only. However, the performance gains are very small or there
>>> is no gain at all. I'm wondering whether it is due to the hardware; the
>>> cluster I tested on uses 10G Ethernet cards.
>>>
>>>
>>> Best,
>>>
>>> Lei Shi
>>>
>>> On Fri, Apr 3, 2015 at 8:49 AM, Huiwei Lu <huiweilu at mcs.anl.gov> wrote:
>>>
>>>> Hi Lei,
>>>>
>>>> As far as I know, current MPI implementations do not support true
>>>> asynchronous communication for now, i.e., if there are no MPI calls in
>>>> your iterations, MPICH will not be able to make progress on the
>>>> communication.
>>>>
>>>> One solution is to poll the MPI runtime regularly to make progress by
>>>> inserting MPI_Test into your iteration (even though you do not want to
>>>> check the data).
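>>>>
>>>> Roughly like this (a sketch only; compute_chunk and the request array
>>>> stand in for your own computation and your pending Isend/Irecv requests):
>>>>
>>>> #include <mpi.h>
>>>>
>>>> void compute_with_progress(MPI_Request *reqs, int n_req,
>>>>                            void (*compute_chunk)(int), int n_chunks){
>>>>   int done = 0;
>>>>   for(int chunk = 0; chunk < n_chunks; ++chunk){
>>>>     compute_chunk(chunk);            // a slice of the local computation
>>>>     if(!done)                        // cheap poke at the progress engine
>>>>       MPI_Testall(n_req, reqs, &done, MPI_STATUSES_IGNORE);
>>>>   }
>>>>   if(!done)                          // make sure everything completes
>>>>     MPI_Waitall(n_req, reqs, MPI_STATUSES_IGNORE);
>>>> }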
>>>>
>>>> Another solution is to enable MPI's asynchronous progress thread to
>>>> make progress for you.
>>>>
>>>> --
>>>> Huiwei
>>>>
>>>> On Thu, Apr 2, 2015 at 11:44 PM, Lei Shi <lshi at ku.edu> wrote:
>>>>
>>>>> Hi Junchao,
>>>>>
>>>>> Thanks for your reply. In my case, I don't want to check whether the
>>>>> data has been received or not, so I don't want to call MPI_Test or any
>>>>> other function to verify it. But my problem is that if I skip calling
>>>>> MPI_Wait and just call Isend/Irecv, my program freezes for several
>>>>> seconds and then continues to run. My guess is that I probably exhausted
>>>>> the MPI library's internal buffers by doing this.
>>>>>
>>>>> On Thu, Apr 2, 2015 at 7:25 PM, Junchao Zhang <jczhang at mcs.anl.gov>
>>>>> wrote:
>>>>>
>>>>>> Does MPI_Test fit your needs?
>>>>>>
>>>>>> --Junchao Zhang
>>>>>>
>>>>>> On Thu, Apr 2, 2015 at 7:16 PM, Lei Shi <lshi at ku.edu> wrote:
>>>>>>
>>>>>>> I want to use the non-blocking send/receive calls MPI_Isend/MPI_Irecv
>>>>>>> to do communication. In my case, I don't really care what data I get
>>>>>>> or whether it is ready to use, so I don't want to spend time on any
>>>>>>> synchronization by calling MPI_Wait or similar APIs.
>>>>>>>
>>>>>>> But when I avoid calling MPI_Wait, my program freezes for several
>>>>>>> seconds after running some iterations (after multiple MPI_Isend/Irecv
>>>>>>> calls), then continues. It takes even more time than the case with
>>>>>>> MPI_Wait. So my question is how to do "true" non-blocking communication
>>>>>>> without waiting for the data to be ready. Thanks.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>> --
>> Jeff Hammond
>> jeff.science at gmail.com
>> http://jeffhammond.github.io/
>>
>
>
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss