[mpich-discuss] How to use non-blocking send/receive without calling MPI_Wait
Lei Shi
lshi at ku.edu
Tue Apr 7 02:48:08 CDT 2015
Here is my pure MPI overlap version. I used Intel Trace Analyzer, and the
profiling shows that right now communication only proceeds when I call
MPI_Waitall, on nodes with a 10G network.
/** pure MPI overlap **/
template<typename T>
void CPR_NS_3D_Solver<T>::UpdateRes(T **q, T **res) {
  if (_n_proc > 1)
    SendInterfaceSol();            // call Isend/Irecv to send msg 1

  ResFromDivInvisFlux(q, res);     // do the local work

  if (_n_proc > 1) {
    RevInterfaceSol();             // MPI_Waitall for msg 1
    if (vis_mode_)
      SendInterfaceCorrGrad();     // depends on msg 1, then sends msg 2
  }

  if (vis_mode_)
    ResFromDivVisFlux(q, res);     // computation that depends on msg 1

  if (_n_proc > 1 && vis_mode_)
    RevInterfaceCorrGrad();        // MPI_Waitall for msg 2

  ResFromFluxCorrection(q, res);   // computation that depends on msgs 1 and 2
}
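
As a side note, what I mean by poking the library with MPI_Test is roughly the
sketch below (the function and variable names are placeholders, not my actual
members): interleave MPI_Testall with chunks of the local work, then finish
with MPI_Waitall.

#include <mpi.h>

// Placeholder for one slice of the local residual computation.
void ComputeLocalChunk(int chunk);

// Sketch: poll the pending receives with MPI_Testall between chunks of
// local work so the library gets a chance to progress the transfers,
// then finish with MPI_Waitall for whatever is still in flight.
void PollWhileComputing(int n_req, MPI_Request *recv_req, int n_chunks) {
  int done = 0;
  for (int c = 0; c < n_chunks; ++c) {
    ComputeLocalChunk(c);                               // a slice of local work
    if (!done)
      MPI_Testall(n_req, recv_req, &done, MPI_STATUSES_IGNORE);
  }
  if (!done)
    MPI_Waitall(n_req, recv_req, MPI_STATUSES_IGNORE);  // finish the rest
}

This is the kind of polling I tried before; the gain was small on our 10G
Ethernet cluster, which is why I suspect the network hardware.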
On Tue, Apr 7, 2015 at 2:39 AM, Lei Shi <lshi at ku.edu> wrote:
>
>
> On Tue, Apr 7, 2015 at 2:37 AM, Lei Shi <leishi at ku.edu> wrote:
>
>> Hi Huiwei and Jeff,
>>
>> I use hybrid OpenMP/MPI to overlap communication with computation, so I
>> put all the communication in one dedicated OpenMP thread and the
>> computation in the other thread. For this case I'm using the Intel MPI
>> library. Probably I made some mistakes somewhere.
>>
>> One version of my code that uses a dedicated thread for messaging looks
>> like this:
>>
>> /** hybrid MPI/OpenMP overlap **/
>> template<typename T>
>> void CPR_NS_3D_Solver<T>::UpdateRes(T **q, T **res) {
>>   int thread_id, n_thread;
>>   int sol_rev_flag = 0, grad_rev_flag = 0;
>>
>>   // Explicitly disable dynamic teams
>>   omp_set_dynamic(0);
>>   // Use 2 threads for all consecutive parallel regions
>>   omp_set_num_threads(2);
>> #pragma omp parallel default(shared) private(thread_id)
>>   {
>>     thread_id = omp_get_thread_num();
>>     n_thread  = omp_get_num_threads();
>>
>>     /** communication thread **/
>>     if (thread_id == 1) {
>>       SendInterfaceSol();
>>       RevInterfaceSol();
>> #pragma omp flush
>>       sol_rev_flag = 1;
>> #pragma omp flush(sol_rev_flag)
>>     }
>>
>>     /** computation thread **/
>>     if (thread_id == 0) {
>>       ResFromDivInvisFlux(q, res);    // local computation
>> #pragma omp flush(sol_rev_flag)
>>       while (sol_rev_flag != 1) {
>> #pragma omp flush(sol_rev_flag)
>>       }
>> #pragma omp flush
>>       ResFromFluxCorrection(q, res);  // depends on interface sol
>>     }
>>   } // end of omp parallel
>> }
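>>
>> (One detail worth noting: since all MPI calls here are made from OpenMP
>> thread 1 rather than the master thread, the library has to be initialized
>> with a sufficient thread level. A minimal sketch of the check I mean, not
>> copied from my actual driver, would be:)
>>
>> #include <mpi.h>
>> #include <cstdio>
>>
>> // Sketch: request MPI_THREAD_MULTIPLE because the communication is driven
>> // by a non-master OpenMP thread; abort if the library grants a lower level.
>> int main(int argc, char **argv) {
>>   int provided = 0;
>>   MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>>   if (provided < MPI_THREAD_MULTIPLE) {
>>     std::fprintf(stderr, "MPI thread level too low: %d\n", provided);
>>     MPI_Abort(MPI_COMM_WORLD, 1);
>>   }
>>   // ... set up and run the solver ...
>>   MPI_Finalize();
>>   return 0;
>> }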
>>
>> template<typename T>
>> void CPR_NS_3D_Solver<T>::SendInterfaceSol() {
>>   uint  *n_if_to_proc = this->grid_->num_iface_proc;
>>   uint **if_to_proc   = this->grid_->snd_iface_proc;
>>   uint **rev_if_to_f  = this->grid_->rev_iface_proc;
>>
>>   int tag = 52;
>>   for (int p2 = 0; p2 < _n_proc; ++p2) {
>>     if (p2 != _proc_id) {
>>       int nif = n_if_to_proc[p2];
>>       // pack data to send ...
>>     }
>>   }
>>
>>   /** Exchange interface sol **/
>>   int n_proc_exchange = 0;
>>   for (int z = 0; z < _n_proc; ++z) {
>>     int nif = n_if_to_proc[z];
>>
>>     // post the non-blocking send and receive for this neighbor
>>     if (nif > 0) {
>>       MPI_Isend(&snd_buf_[z][0], n_buf_[z], MPI_DOUBLE, z, tag,
>>                 MPI_COMM_WORLD, &s_sol_req_[n_proc_exchange]);
>>       MPI_Irecv(&rev_buf_[z][0], n_buf_[z], MPI_DOUBLE, z, tag,
>>                 MPI_COMM_WORLD, &r_sol_req_[n_proc_exchange]);
>>       n_proc_exchange++;
>>     }
>>   }
>> }
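>>
>> (As a side note, since the same neighbor exchange repeats every iteration,
>> persistent requests are another option. This is only a sketch with
>> placeholder buffer names, not my actual members:)
>>
>> #include <mpi.h>
>>
>> // Sketch: create the send/receive requests once per neighbor, then restart
>> // them every iteration with MPI_Startall and complete them with MPI_Waitall.
>> void SetupPersistentExchange(int neighbor, double *snd, double *rcv, int count,
>>                              int tag, MPI_Request *s_req, MPI_Request *r_req) {
>>   MPI_Send_init(snd, count, MPI_DOUBLE, neighbor, tag, MPI_COMM_WORLD, s_req);
>>   MPI_Recv_init(rcv, count, MPI_DOUBLE, neighbor, tag, MPI_COMM_WORLD, r_req);
>> }
>> // Per iteration: MPI_Startall(n, reqs); ... compute ...;
>> //                MPI_Waitall(n, reqs, MPI_STATUSES_IGNORE);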
>>
>> template<typename T>
>> void CPR_NS_3D_Solver<T>::RevInterfaceSol() {
>>   uint  *n_if_to_proc = this->grid_->num_iface_proc;
>>   uint **if_to_proc   = this->grid_->snd_iface_proc;
>>   uint **rev_if_to_f  = this->grid_->rev_iface_proc;
>>
>>   // wait for all pending sends and receives
>>   // (MPI_Waitall takes MPI_STATUSES_IGNORE for the status array)
>>   if (n_proc_exchange_ > 0) {
>>     MPI_Waitall(n_proc_exchange_, s_sol_req_, MPI_STATUSES_IGNORE);
>>     MPI_Waitall(n_proc_exchange_, r_sol_req_, MPI_STATUSES_IGNORE);
>>   }
>>
>>   /** store to local data structure **/
>>   for (int z = 0; z < _n_proc; ++z) {
>>     int nif = n_if_to_proc[z];
>>     if (nif > 0) {
>>       // unpacking ...
>>     }
>>   }
>> }
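>>
>> (As an aside, one alternative to the single MPI_Waitall, sketched here with
>> placeholder names, is to unpack each neighbor's message as soon as its
>> receive completes, using MPI_Waitany:)
>>
>> #include <mpi.h>
>>
>> // Placeholder: unpack the data that arrived for the given request slot
>> // (the mapping from request slot back to neighbor rank is kept elsewhere).
>> void UnpackRequestSlot(int req_index);
>>
>> // Sketch: complete the receives one at a time and unpack each message
>> // immediately, instead of blocking until every neighbor has arrived.
>> void WaitAndUnpack(int n_req, MPI_Request *recv_req) {
>>   for (int k = 0; k < n_req; ++k) {
>>     int idx;
>>     MPI_Waitany(n_req, recv_req, &idx, MPI_STATUS_IGNORE);
>>     UnpackRequestSlot(idx);  // idx is the slot of the completed receive
>>   }
>> }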
>>
>>
>>
>>
>>
>>
>> Sincerely Yours,
>>
>> Lei Shi
>> ---------
>>
>> On Fri, Apr 3, 2015 at 4:37 PM, Jeff Hammond <jeff.science at gmail.com>
>> wrote:
>>
>>> As far as I know, Ethernet is not good at making asynchronous progress
>>> in hardware the way e.g. InfiniBand is. I would have thought that a
>>> dedicated progress thread would help, but it seems you tried that. Did you
>>> use your own progress thread or MPICH_ASYNC_PROGRESS=1?
>>>
>>> Jeff
>>>
>>> On Fri, Apr 3, 2015 at 10:10 AM, Lei Shi <lshi at ku.edu> wrote:
>>>
>>>> Huiwei,
>>>>
>>>> Thanks for your email. Your answer leads to another question I have about
>>>> asynchronous MPI communication.
>>>>
>>>> I'm trying to overlap communication with computation to speed up my MPI
>>>> code. I read some papers comparing different approaches to overlapped
>>>> communication: the "naive" implementation, which only uses non-blocking
>>>> MPI Isend/Irecv, and the hybrid approach using OpenMP and MPI together,
>>>> where a separate thread does all the non-blocking communication. Exactly
>>>> as you said, the results indicate that current MPI implementations do not
>>>> support true asynchronous communication.
>>>>
>>>> If I use the naive approach, my code gives almost the same performance in
>>>> terms of Wtime with non-blocking as with blocking send/recv. All
>>>> communication is postponed until MPI_Wait.
>>>>
>>>> I have tried calling MPI_Test to push the library to make communication
>>>> progress during the iterations, and I have also tried using a dedicated
>>>> thread for communication with the other thread doing only computation.
>>>> However, the performance gains are very small or there is no gain at all.
>>>> I'm wondering whether it is due to the hardware. The cluster I tested on
>>>> uses 10G Ethernet cards.
>>>>
>>>>
>>>> Best,
>>>>
>>>> Lei Shi
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Apr 3, 2015 at 8:49 AM, Huiwei Lu <huiweilu at mcs.anl.gov> wrote:
>>>>
>>>>> Hi Lei,
>>>>>
>>>>> As far as I know, all current MPI implementations do not support true
>>>>> asynchronous communication for now; i.e., if there are no MPI calls in
>>>>> your iterations, MPICH will not be able to make progress on communication.
>>>>>
>>>>> One solution is to poll the MPI runtime regularly to make progress by
>>>>> inserting MPI_Test into your iterations (even though you do not want to
>>>>> check the data).
>>>>>
>>>>> Another solution is to enable MPI's asynchronous progress thread to
>>>>> make progress for you.
>>>>>
>>>>> --
>>>>> Huiwei
>>>>>
>>>>> On Thu, Apr 2, 2015 at 11:44 PM, Lei Shi <lshi at ku.edu> wrote:
>>>>>
>>>>>> Hi Junchao,
>>>>>>
>>>>>> Thanks for your reply. In my case, I don't want to check whether the data
>>>>>> has been received or not, so I don't want to call MPI_Test or any other
>>>>>> function to verify it. But my problem is that if I skip calling MPI_Wait
>>>>>> and just call Isend/Irecv, my program freezes for several seconds and then
>>>>>> continues to run. My guess is that I probably messed up the MPI library's
>>>>>> internal buffers by doing this.
>>>>>>
>>>>>> On Thu, Apr 2, 2015 at 7:25 PM, Junchao Zhang <jczhang at mcs.anl.gov>
>>>>>> wrote:
>>>>>>
>>>>>>> Does MPI_Test fit your needs?
>>>>>>>
>>>>>>> --Junchao Zhang
>>>>>>>
>>>>>>> On Thu, Apr 2, 2015 at 7:16 PM, Lei Shi <lshi at ku.edu> wrote:
>>>>>>>
>>>>>>>> I want to use non-blocking send/receive, MPI_Isend/MPI_Irecv, to do
>>>>>>>> communication. But in my case, I don't really care what data I get or
>>>>>>>> whether it is ready to use or not, so I don't want to waste time on any
>>>>>>>> synchronization by calling MPI_Wait or similar APIs.
>>>>>>>>
>>>>>>>> But when I avoid calling MPI_Wait, my program freezes for several seconds
>>>>>>>> after running some iterations (after multiple MPI_Isend/Irecv calls), then
>>>>>>>> continues. It takes even more time than the case with MPI_Wait. So my
>>>>>>>> question is how to do "true" non-blocking communication without waiting
>>>>>>>> for the data to be ready. Thanks.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Jeff Hammond
>>> jeff.science at gmail.com
>>> http://jeffhammond.github.io/
>>>
>>
>>
>