<div dir="ltr">Hello Rob,<div><br></div><div>I think I may have just found what the problem is:<div><div><br></div><div>In ADIOI_GEN_IfileSync() I pass args to MPI_Grequest_start() so that the query function can later report how much data has been written to the global file system and return the error_code. The problem is that I was freeing args in the pthread routine (which I did not include in my previous email, by the way). This makes MPI_Wait() fail whenever the pthread has already completed and freed args before the query function runs (which happens when I have many segments). I fixed the code and now it seems to work : )</div></div></div><div><br></div><div>I am sorry to have bothered you with a silly problem caused by a lack of attention on my part. Nevertheless, talking to you helped me find the solution in a few minutes after wandering for days.</div><div><br></div><div>Best regards,</div><div><br></div><div>Giuseppe</div><div class="gmail_extra"><br><div class="gmail_quote">On 4 December 2014 at 13:14, Giuseppe Congiu <span dir="ltr">&lt;<a href="mailto:giuseppe.congiu@seagate.com" target="_blank">giuseppe.congiu@seagate.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On 3 December 2014 at 21:19, Rob Latham <span dir="ltr">&lt;<a href="mailto:robl@mcs.anl.gov" target="_blank">robl@mcs.anl.gov</a>&gt;</span> wrote:<br><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span></span>
Sorry not to have responded to you sooner. Between a big conference and US Thanksgiving, a lot of us were out of the office for the last few weeks.<br></blockquote><div><br></div></span><div>Hello Rob, I can totally understand; I have also been busy with other stuff in the last few weeks. Nevertheless, I cannot deny I have been looking forward to a reply : )</div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
Welcome. You're digging into ROMIO and Generalized Requests, so you've picked two fairly obscure areas of MPI to work on!<br>
<br>
As it happens, I am the world's expert in both ROMIO and Generalized requests. (The population size for that domain is exceedingly small...)</blockquote><div><br></div></span><div>Then you are the blessing that I have been waiting for : ) <br></div><span class=""><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span></span>
I think this approach can greatly benefit collective I/O -- I observe on some systems that it is not the communication cost that hurts two-phase collective I/O but the synchronization: if one I/O aggregator is taking longer to do I/O than the others, all N processes pay a price waiting for the laggard.<span> <br></span></blockquote><div><br></div></span><div>Yes, this is exactly the point. Communication cost does not seem to be a problem; on the other hand, the MPI_Allreduce() at the end of the collective write is a real pain. From what I observed, writing to local devices greatly reduces the variation in I/O response time among aggregators. </div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
Before we get too far, we need to talk a bit about generalized requests and their "super-strict" progress model. Can you share your query, free, and cancel functions?</blockquote><div><br></div></span><div>The callback functions are quite simple; they don't do anything special:</div><div><div><br></div><div><font face="monospace"> int ADIOI_GEN_Ifile_sync_query_fn( void *extra_state, MPI_Status *status )</font></div><div><font face="monospace"> {</font></div><div><font face="monospace"> ARGS *state = (ARGS*)extra_state;</font></div><div><font face="monospace"><br></font></div><div><font face="monospace"> MPI_Status_set_cancelled( status, 0 );</font></div><div><font face="monospace"> MPI_Status_set_elements( status, MPI_BYTE, state->bytes_xfered );</font></div><div><font face="monospace"> status->MPI_SOURCE = MPI_UNDEFINED;</font></div><div><font face="monospace"> status->MPI_TAG = MPI_UNDEFINED;</font></div><div><font face="monospace"> </font></div><div><font face="monospace"> return state->error_code;</font></div><div><font face="monospace"> }</font></div><div><font face="monospace"><br></font></div><div><font face="monospace"> int ADIOI_GEN_Ifile_sync_free_fn( void *extra_state )</font></div><div><font face="monospace"> {</font></div><div><font face="monospace"> return MPI_SUCCESS;</font></div><div><font face="monospace"> }</font></div><div><font face="monospace"><br></font></div><div><font face="monospace"> int ADIOI_GEN_Ifile_sync_cancel_fn( void *extra_state, int complete )</font></div><div><font face="monospace"> {</font></div><div><font face="monospace"> return MPI_SUCCESS;</font></div><div><font face="monospace"> }</font></div></div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
Which part of this ends up in a generalized request?</blockquote><div><br></div></span><div>The function that ends up in a generalized request is the synchronization function, ADIOI_GEN_IfileSync(), which is also pretty simple. It just reads the locally written file domains and writes them to the global file. </div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">MPICH is going to set those hints to "automatic", but you have overridden the defaults? (It's the right override in most cases, so good! Unless you did not, in which case we should double-check that you are not mixing vendor MPICH and your own MPICH.)</blockquote><div> </div></span><div>I am setting the MPI hints using an IOR configuration file; here is a snippet:</div><div><br></div><div><div><font face="monospace">IOR_HINT__MPI__cb_buffer_size=16777216</font></div><div><font face="monospace">IOR_HINT__MPI__cb_nodes=1</font></div><div><font face="monospace">IOR_HINT__MPI__romio_cb_read=enable</font></div><div><font face="monospace">IOR_HINT__MPI__romio_cb_write=enable</font></div><div><font face="monospace">IOR_HINT__MPI__local_cache=enable</font></div><div><font face="monospace">IOR_HINT__MPI__local_cache_path=/tmp/ior_tmp_file</font></div><div><font face="monospace">IOR_HINT__MPI__local_cache_flush_flag=flush_immediate</font></div><div><font face="monospace">IOR_HINT__MPI__local_cache_discard_flag=enable</font></div><div><font face="monospace">IOR_HINT__MPI__romio_no_indep_rw=true</font></div></div><div><br></div><div>The collective buffer size is 16 MB by default, so the hint in this case is superfluous (the same goes for cb_nodes=1, since I am using only one node). 
</div><span class=""><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
MPIR_Barrier_intra(169):<br>
PSIlogger: Child with rank 5 exited with status 1.<br>
mpid_irecv_done(101)...: read from socket failed - request state:recv(pde)done<br>
wait entry: 2a1ef18, 21f2790, 2a1e8e8, 21f4488<br>
Fatal error in PMPI_Wait: Other MPI error, error stack:<br>
PMPI_Wait(180)..........: MPI_Wait(request=0x21f4488, status=0x21f2138) failed<br>
MPIR_Wait_impl(77)......:<br>
MPIR_Grequest_query(447): user request query function returned error code 1601<br>
</blockquote>
<br></span>
I wonder if your query is doing more than it's supposed to do...</blockquote><div><br></div></span><div>Maybe you can tell me. I basically used Chapter 12, "External Interfaces", of the MPI specification published September 21, 2012 (MPI-3.0) as a reference for the generalized request interface. </div><div>I just want to remind you that this problem shows up only when I write more than 4 segments in IOR.</div><span class=""><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
Does anyone have a clue about what is going wrong here?<br>
</blockquote>
<br></span>
We're going to need to see some more code, I think...</blockquote><div><br></div></span><div>Here is the function that starts the generalized request to synchronize the file:</div><div><br></div><div><font face="monospace"> int ADIOI_GEN_IfileSync( ADIO_File fd, int count, MPI_Datatype datatype, <br></font></div><div><font face="monospace"> int file_ptr_type, ADIO_Offset offset, </font></div><div><font face="monospace"> ADIO_Request *request )</font></div><div><font face="monospace"> {</font></div><div><font face="monospace"> pthread_t thread;</font></div><div><font face="monospace"> pthread_attr_t attr;</font></div><div><font face="monospace"> int rc;</font></div><div><font face="monospace"> ARGS *args;</font></div><div><font face="monospace"><br></font></div><div><font face="monospace"> /* copy args */</font></div><div><font face="monospace"> args = (ARGS *)ADIOI_Malloc(sizeof(ARGS));</font></div><div><font face="monospace"> args->fd = fd;</font></div><div><font face="monospace"> args->count = count;</font></div><div><font face="monospace"> args->datatype = datatype;</font></div><div><font face="monospace"> args->file_ptr_type = file_ptr_type;</font></div><div><font face="monospace"> args->offset = offset;</font></div><div><font face="monospace"> args->request = request;</font></div><div><font face="monospace"><br></font></div><div><font face="monospace"> /* start generalized request */</font></div><div><font face="monospace"> MPI_Grequest_start( &ADIOI_GEN_Ifile_sync_query_fn,</font></div><div><font face="monospace"><span style="white-space:pre-wrap"> </span> &ADIOI_GEN_Ifile_sync_free_fn,</font></div><div><font face="monospace"> &ADIOI_GEN_Ifile_sync_cancel_fn,</font></div><div><font face="monospace"> args,</font></div><div><font face="monospace"> request );</font></div><div><font face="monospace"><br></font></div><div><font face="monospace"> /* spawn a new thread to handle the request */</font></div><div><font face="monospace"> pthread_attr_init( &attr 
);</font></div><div><font face="monospace"> pthread_attr_setdetachstate( &attr, PTHREAD_CREATE_DETACHED );</font></div><div><font face="monospace"> rc = pthread_create( &thread, &attr, ADIOI_GEN_Ifile_sync_thread, args );</font></div><div><font face="monospace"> pthread_attr_destroy( &attr );</font></div><div><font face="monospace"><br></font></div><div><font face="monospace"> /* --Error handling starts here-- */</font></div><div><font face="monospace"> //TODO: do something </font></div><div><font face="monospace"> /* --End of Error handling-- */</font></div><div><font face="monospace"> </font></div><div><font face="monospace"> return MPI_SUCCESS;</font></div><div><font face="monospace"> }</font></div><div><br></div><div>BTW, I just realized I forgot to free ARGS in the query function. I will fix that but I don't think it is causing the problem. Am I wrong?</div><div><br></div><div>Thanks for your help,</div></div><span class=""><div><br></div>-- <br><div><div dir="ltr">Giuseppe Congiu <strong>·</strong> Research Engineer II<br>
Seagate Technology, LLC<br>
office: <a href="tel:%2B44%20%280%2923%209249%206082" value="+442392496082" target="_blank">+44 (0)23 9249 6082</a> <strong>·</strong> mobile: <br>
<a href="http://www.seagate.com" target="_blank">www.seagate.com</a><br></div></div>
</span></div></div>
</blockquote></div>
</div></div>
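<div>For anyone who hits the same MPI_Wait() failure: the underlying rule is about who owns the extra_state passed to MPI_Grequest_start(). MPI calls query_fn only after the request has been marked complete with MPI_Grequest_complete() (and may call it again from MPI_Request_get_status()), and calls free_fn once, after query_fn, when the request is freed. So the worker thread should never free that state itself; free_fn is the natural place to release it. The sketch below models just that ownership rule with plain pthreads, so it builds without MPI. It uses hypothetical names: sync_state stands in for ARGS, sync_worker() for the thread routine, query_state() and free_state() for the query and free callbacks, and run_sync() for the MPI_Wait() path. None of these are ROMIO or MPI functions, and the 4096-byte transfer is made up.</div>

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for ARGS: the fields the "query" step reads
 * after the worker thread has already finished. */
typedef struct {
    long long bytes_xfered;   /* set by the worker */
    int error_code;           /* set by the worker; 0 stands in for MPI_SUCCESS */
    int complete;             /* set by the worker; plays the role of MPI_Grequest_complete() */
    pthread_mutex_t lock;
    pthread_cond_t done;
} sync_state;

/* Worker thread: does the (pretend) sync, publishes the result and signals
 * completion. It must NOT free the state: the waiter still reads it. */
static void *sync_worker(void *arg)
{
    sync_state *s = arg;

    pthread_mutex_lock(&s->lock);
    s->bytes_xfered = 4096;   /* pretend 4 KiB were copied to the global file */
    s->error_code = 0;
    s->complete = 1;
    pthread_cond_signal(&s->done);
    pthread_mutex_unlock(&s->lock);
    return NULL;              /* no free(s) here: freeing it was the bug */
}

/* Stand-in for query_fn: only reads the state; may run after the worker exited. */
static long long query_state(const sync_state *s, int *err)
{
    *err = s->error_code;
    return s->bytes_xfered;
}

/* Stand-in for free_fn: the single place where the state is released. */
static void free_state(sync_state *s)
{
    pthread_cond_destroy(&s->done);
    pthread_mutex_destroy(&s->lock);
    free(s);
}

/* Start a detached worker, then follow the order MPI uses when MPI_Wait()
 * completes a generalized request: wait for completion, query, free. */
static long long run_sync(int *err)
{
    sync_state *s = calloc(1, sizeof *s);
    pthread_attr_t attr;
    pthread_t thread;
    long long bytes;

    assert(s != NULL);        /* sketch-level error handling */
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->done, NULL);

    /* Detached worker, as in ADIOI_GEN_IfileSync(). */
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (pthread_create(&thread, &attr, sync_worker, s) != 0) {
        pthread_attr_destroy(&attr);
        free_state(s);
        *err = -1;
        return -1;
    }
    pthread_attr_destroy(&attr);

    /* "MPI_Wait()": block until the worker reports completion. */
    pthread_mutex_lock(&s->lock);
    while (!s->complete)
        pthread_cond_wait(&s->done, &s->lock);
    pthread_mutex_unlock(&s->lock);

    bytes = query_state(s, err);   /* "query_fn" */
    free_state(s);                 /* "free_fn" */
    return bytes;
}
```

<div>Mapped back onto the ROMIO code, this would roughly mean: the thread routine stores bytes_xfered and error_code in args, calls MPI_Grequest_complete() on the request, and does not touch args after that call, because MPI_Wait() may run query_fn and free_fn as soon as the request is complete; ADIOI_GEN_Ifile_sync_free_fn() then releases extra_state (for example with ADIOI_Free()) instead of only returning MPI_SUCCESS. Since MPI_Grequest_complete() is called from a separate thread while another thread may be inside MPI_Wait(), MPI generally has to be initialized with MPI_THREAD_MULTIPLE for this pattern.</div>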