[mpich-discuss] How to use non-blocking send/receive without calling MPI_Wait

Lei Shi lshi at ku.edu
Tue Apr 7 02:48:08 CDT 2015


 Here is my pure MPI overlap version. I profiled it with Intel Trace Analyzer;
the profiling shows that, right now, communication only proceeds when I call
MPI_Waitall on nodes with a 10G network.

/** pure mpi overlap  **/
  template<typename T>
  void CPR_NS_3D_Solver<T>::UpdateRes(T**q, T**res){
    if(_n_proc>1)
      SendInterfaceSol(); //call isend/irecv to send msg 1

    ResFromDivInvisFlux(q,res); //do local jobs

    if(_n_proc>1){
      RevInterfaceSol(); //mpi_waitall for msg 1
      if(vis_mode_)
        SendInterfaceCorrGrad(); //depends on msg 1 then snd msg 2
    }

    if(vis_mode_)
      ResFromDivVisFlux(q,res); //computing, which depends on msg 1

    if(_n_proc>1 && vis_mode_)
      RevInterfaceCorrGrad(); //mpi_waitall for msg 2

    ResFromFluxCorrection(q,res); //computing, which depends on msg 1 and 2
  }
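
One thing I am considering (following Huiwei's suggestion further down in the
thread) is to poke the progress engine with MPI_Testall between chunks of the
local work instead of leaving everything to MPI_Waitall. This is only a rough
sketch: ResFromDivInvisFluxChunk and n_chunk are hypothetical and stand for
splitting the existing local job into pieces.

/** sketch only: poll the progress engine between chunks of the local work **/
template<typename T>
void CPR_NS_3D_Solver<T>::UpdateResPolling(T**q, T**res){
  if(_n_proc>1)
    SendInterfaceSol(); //call isend/irecv to send msg 1

  const int n_chunk=8; //hypothetical chunking of the local job
  for(int c=0;c<n_chunk;++c){
    ResFromDivInvisFluxChunk(q,res,c); //hypothetical: one piece of the local job

    if(_n_proc>1 && n_proc_exchange_>0){
      int done=0; //result ignored; the calls only drive progress
      MPI_Testall(n_proc_exchange_,r_sol_req_,&done,MPI_STATUSES_IGNORE);
      MPI_Testall(n_proc_exchange_,s_sol_req_,&done,MPI_STATUSES_IGNORE);
    }
  }

  if(_n_proc>1)
    RevInterfaceSol(); //mpi_waitall for msg 1, as before
}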


On Tue, Apr 7, 2015 at 2:39 AM, Lei Shi <lshi at ku.edu> wrote:

>
>
> On Tue, Apr 7, 2015 at 2:37 AM, Lei Shi <leishi at ku.edu> wrote:
>
>> Hi Huiwei and Jeff,
>>
>> I use hybrid OpenMP/MPI to overlap communication with computation, so I put
>> all communication in one dedicated OpenMP thread and the computation in the
>> other thread. For this case I'm using the Intel MPI library. I probably made
>> some mistakes.
>>
>> One version of my code, which uses a dedicated thread for messaging, looks
>> like this:
>>
>> /** hybrid mpi/openmp overlap **/
>> template<typename T>
>> void CPR_NS_3D_Solver<T>::UpdateRes(T**q, T**res){
>>   int thread_id,n_thread;
>>   int sol_rev_flag=0,grad_rev_flag=0;
>>
>>   // Explicitly disable dynamic teams
>>   omp_set_dynamic(0);
>>   // Use 2 threads for all consecutive parallel regions
>>   omp_set_num_threads(2);
>>     #pragma omp parallel default(shared) private(thread_id)
>>   {
>>     thread_id=omp_get_thread_num();
>>     n_thread=omp_get_num_threads();
>>
>>     /** communication thread   **/
>>     if(thread_id==1){
>>       SendInterfaceSol();
>>       RevInterfaceSol();
>>       #pragma omp flush
>>       sol_rev_flag=1;
>>       #pragma omp flush(sol_rev_flag)
>>     }
>>
>>     /** computation thread **/
>>     if(thread_id==0){
>>       ResFromDivInvisFlux(q,res); //local computation
>>         #pragma omp flush(sol_rev_flag)
>>         while(sol_rev_flag!=1){
>>           #pragma omp flush(sol_rev_flag)
>>         }
>>         #pragma omp flush
>>         ResFromFluxCorrection(q,res); //depends on interface sol
>>     }
>>   }//end of omp
>>     }
>>
>> template<typename T>
>>
>>   void CPR_NS_3D_Solver<T>::SendInterfaceSol(){
>>     uint *n_if_to_proc=this->grid_->num_iface_proc;
>>     uint **if_to_proc=this->grid_->snd_iface_proc;
>>     uint **rev_if_to_f=this->grid_->rev_iface_proc;
>>
>>     int tag=52;
>>     for(int p2=0;p2<_n_proc;++p2){
>>       if(p2!=_proc_id){
>>         int nif=n_if_to_proc[p2];
>>         //pack data to send ....
>>
>>       }
>>     }
>>
>>     /** Exchange interface sol **/
>>     int n_proc_exchange=0;
>>     for(int z=0;z<_n_proc;++z){
>>       int nif=n_if_to_proc[z];
>>
>>       //send data
>>       if(nif>0){
>>         MPI_Isend(&snd_buf_[z][0],n_buf_[z],MPI_DOUBLE,z,tag, MPI_COMM_WORLD, &s_sol_req_[n_proc_exchange]);
>>         MPI_Irecv(&rev_buf_[z][0],n_buf_[z],MPI_DOUBLE,z,tag, MPI_COMM_WORLD, &r_sol_req_[n_proc_exchange]);
>>         n_proc_exchange++;
>>       }
>>     }
>>
>>   }
>>
>>   template<typename T>
>>   void CPR_NS_3D_Solver<T>::RevInterfaceSol(){
>>     uint *n_if_to_proc=this->grid_->num_iface_proc;
>>     uint **if_to_proc=this->grid_->snd_iface_proc;
>>     uint **rev_if_to_f=this->grid_->rev_iface_proc;
>>
>>     //wait
>>     if(n_proc_exchange_>0){
>>       MPI_Waitall(n_proc_exchange_,s_sol_req_,MPI_STATUSES_IGNORE);
>>       MPI_Waitall(n_proc_exchange_,r_sol_req_,MPI_STATUSES_IGNORE);
>>     }
>>
>>     /** store to local data structure **/
>>     for(int z=0;z<_n_proc;++z){
>>       int nif=n_if_to_proc[z];
>>
>>       if(nif>0){
>>
>>         //unpacking ....
>>       }
>>     }
>>
>>   }
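>>
>> One note on the dedicated-thread scheme above: because a thread other than
>> the main thread issues the MPI calls, MPI has to be initialized with an
>> appropriate thread level. A minimal sketch of the initialization (argc/argv
>> are the usual main() parameters):
>>
>> int provided;
>> MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>> if(provided < MPI_THREAD_MULTIPLE){
>>   // Not enough thread support: the dedicated communication thread
>>   // above is not guaranteed to be safe with this MPI build.
>> }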
>>
>> Sincerely Yours,
>>
>> Lei Shi
>> ---------
>>
>> On Fri, Apr 3, 2015 at 4:37 PM, Jeff Hammond <jeff.science at gmail.com>
>> wrote:
>>
>>> As far as I know, Ethernet is not good at making asynchronous progress
>>> in hardware the way e.g. InfiniBand is.  I would have thought that a
>>> dedicated progress thread would help, but it seems you tried that.  Did you
>>> use your own progress thread or MPICH_ASYNC_PROGRESS=1?
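>>>
>>> (For reference, a minimal sketch of the MPICH_ASYNC_PROGRESS route; the
>>> variable is read at MPI initialization, so it is normally exported or
>>> passed through mpiexec before the job starts, and the exact spelling can
>>> differ between MPICH versions and other MPI libraries:)
>>>
>>> // e.g.  mpiexec -n 16 -env MPICH_ASYNC_PROGRESS 1 ./solver
>>> // The internal progress thread also needs full thread support, which can
>>> // be checked at runtime:
>>> int level;
>>> MPI_Query_thread(&level);
>>> if(level != MPI_THREAD_MULTIPLE){
>>>   // async progress is unlikely to be effective with this thread level
>>> }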
>>>
>>> Jeff
>>>
>>> On Fri, Apr 3, 2015 at 10:10 AM, Lei Shi <lshi at ku.edu> wrote:
>>>
>>>> Huiwei,
>>>>
>>>> Thanks for your email. Your answer leads to another question of mine about
>>>> asynchronous MPI communication.
>>>>
>>>> I'm trying to overlap communication and computation to speed up my MPI
>>>> code. I read some papers comparing different approaches to overlapped
>>>> communication: the "naive" overlapped implementation, which only uses
>>>> non-blocking MPI_Isend/MPI_Irecv, and a hybrid approach that uses OpenMP
>>>> and MPI together. In the hybrid approach, a separate thread is used to do
>>>> all non-blocking communication. Exactly as you said, the results indicate
>>>> that current MPI implementations do not support true asynchronous
>>>> communication.
>>>>
>>>> If I use the naive approach, my code gives almost the same performance,
>>>> measured with MPI_Wtime, with non-blocking as with blocking send/recv. All
>>>> of the communication is postponed to MPI_Wait.
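>>>>
>>>> (A small sketch of how the two costs could be separated with MPI_Wtime, to
>>>> see whether the time really just moves into the wait; n_req and reqs are
>>>> placeholder names, not the ones from my solver:)
>>>>
>>>> double t0 = MPI_Wtime();
>>>> ResFromDivInvisFlux(q,res); // local work placed between Isend and Wait
>>>> double t1 = MPI_Wtime();
>>>> MPI_Waitall(n_req, reqs, MPI_STATUSES_IGNORE); // placeholder request array
>>>> double t2 = MPI_Wtime();
>>>> // t1-t0 is the overlapped compute, t2-t1 is what is still spent waiting;
>>>> // if t2-t1 barely shrinks versus the blocking version, no overlap happened.
>>>> printf("compute %.3e s  wait %.3e s\n", t1-t0, t2-t1);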
>>>>
>>>> I have tried calling MPI_Test to push the library to make communication
>>>> progress during the iterations, and I have also tried using a dedicated
>>>> thread for communication while the other thread only does computation.
>>>> However, the performance gains are very small, or there is no gain at all.
>>>> I wonder whether this is due to the hardware; the cluster I tested on uses
>>>> 10G Ethernet cards.
>>>>
>>>>
>>>> Best,
>>>>
>>>> Lei Shi
>>>>
>>>> On Fri, Apr 3, 2015 at 8:49 AM, Huiwei Lu <huiweilu at mcs.anl.gov> wrote:
>>>>
>>>>> Hi Lei,
>>>>>
>>>>> As far as I know, no current MPI implementation supports true
>>>>> asynchronous communication for now. That is, if there are no MPI calls in
>>>>> your iterations, MPICH will not be able to make progress on communication.
>>>>>
>>>>> One solution is to poll the MPI runtime regularly to make progress by
>>>>> inserting MPI_Test into your iteration (even though you do not want to
>>>>> check the data).
>>>>>
>>>>> Another solution is to enable MPI's asynchronous progress thread to
>>>>> make progress for you.
>>>>>
>>>>> --
>>>>> Huiwei
>>>>>
>>>>> On Thu, Apr 2, 2015 at 11:44 PM, Lei Shi <lshi at ku.edu> wrote:
>>>>>
>>>>>> Hi Junchao,
>>>>>>
>>>>>> Thanks for your reply. For my case, I don't want to check whether the
>>>>>> data has been received or not, so I don't want to call MPI_Test or any
>>>>>> other function to verify that. But the problem is that if I skip
>>>>>> MPI_Wait and just call Isend/Irecv, my program freezes for several
>>>>>> seconds and then continues to run. My guess is that I am probably
>>>>>> messing up the MPI library's internal buffers by doing this.
>>>>>>
>>>>>> On Thu, Apr 2, 2015 at 7:25 PM, Junchao Zhang <jczhang at mcs.anl.gov>
>>>>>> wrote:
>>>>>>
>>>>>>> Does MPI_Test fit your needs?
>>>>>>>
>>>>>>> --Junchao Zhang
>>>>>>>
>>>>>>> On Thu, Apr 2, 2015 at 7:16 PM, Lei Shi <lshi at ku.edu> wrote:
>>>>>>>
>>>>>>>> I want to use the non-blocking send/receive calls MPI_Isend/MPI_Irecv
>>>>>>>> to do communication. In my case, I don't really care what data I get
>>>>>>>> or whether it is ready to use, so I don't want to waste time on
>>>>>>>> synchronization by calling MPI_Wait or similar APIs.
>>>>>>>>
>>>>>>>> But when I avoid calling MPI_Wait, my program freezes for several
>>>>>>>> seconds after running some iterations (after multiple
>>>>>>>> MPI_Isend/MPI_Irecv calls), then continues. It takes even more time
>>>>>>>> than the case with MPI_Wait. So my question is: how do I do "true"
>>>>>>>> non-blocking communication without waiting for the data to be ready?
>>>>>>>> Thanks.
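>>>>>>>>
>>>>>>>> (A minimal sketch of the two standard ways to retire a request without
>>>>>>>> blocking on it; buf, n, dest, and tag are placeholder names, and
>>>>>>>> MPI_Request_free in particular comes with caveats:)
>>>>>>>>
>>>>>>>> MPI_Request req;
>>>>>>>> MPI_Isend(buf, n, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD, &req);
>>>>>>>>
>>>>>>>> // Option 1: a cheap, non-blocking completion check; it also lets the
>>>>>>>> // library make progress instead of piling everything onto a later wait.
>>>>>>>> int done = 0;
>>>>>>>> MPI_Test(&req, &done, MPI_STATUS_IGNORE);
>>>>>>>>
>>>>>>>> // Option 2: hand the request back to MPI and never wait on it.
>>>>>>>> // Caveat: there is then no way to know when buf can be reused, and
>>>>>>>> // this pattern is discouraged for receives.
>>>>>>>> if (!done)
>>>>>>>>   MPI_Request_free(&req);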
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Jeff Hammond
>>> jeff.science at gmail.com
>>> http://jeffhammond.github.io/
>>>
>>>
>>
>>
>