<div dir="ltr">Thanks again Rob, I will !<div><br></div><div>Giuseppe<br><div><br></div><div><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On 4 December 2014 at 14:51, Rob Latham <span dir="ltr"><<a href="mailto:robl@mcs.anl.gov" target="_blank">robl@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>
<br>
On 12/04/2014 08:16 AM, Giuseppe Congiu wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hello Rob,<br>
<br>
I think I may have just found what the problem is:<br>
<br>
In ADIOI_GEN_IfileSync() I pass args to MPI_Grequest_start() so that the query<br>
function can later report how much data has been written to the global file<br>
system and return the error_code. The problem is that I was freeing args in the<br>
pthread routine (which is not shown in the previous email, btw). This makes<br>
MPI_Wait() blow up when the pthread has already completed and args has already<br>
been freed (which happens when I have many segments). I fixed the code and now<br>
it seems to work : ) (a sketch of the fix follows right after this quote)<br>
<br>
I am sorry I have bothered you with a silly problem that was just due to a lack<br>
of attention on my side. Nevertheless, talking to you has helped me find the<br>
solution in a few minutes after wandering for days.<br>
</blockquote>
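One way to fix the lifetime problem described in the quote above (the actual<br>
change is not shown in this thread, so the snippet below is only an<br>
illustration, assuming ROMIO's ADIOI_Free as the counterpart of the<br>
ADIOI_Malloc used further down): the extra_state passed to MPI_Grequest_start()<br>
must remain valid until MPI_Wait() has run the query function, so the free<br>
callback, which MPI invokes last, is the safe place to release it, while the<br>
pthread routine only completes the request.<br>
<br>
/* sketch only: ownership of the Grequest extra_state */<br>
int ADIOI_GEN_Ifile_sync_free_fn( void *extra_state )<br>
{<br>
    /* this is the last callback MPI invokes for the request, so args can go<br>
     * away now; the pthread routine must NOT free it, since the query<br>
     * function may still dereference it when MPI_Wait() runs */<br>
    ADIOI_Free( extra_state );<br>
    return MPI_SUCCESS;<br>
}<br>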
<br></span>
My old office mate had a little rubber duck he'd keep on his desk. When faced with a programming problem, he'd talk to "debugging ducky" and sort out the state.<br>
<br>
I'm glad I could be your debugging ducky. I am always glad to come across folks brave enough to hack on the ROMIO code. Please keep me updated with your research.<br>
<br>
==rob<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">
<br>
Best Regards,<br>
<br>
Giuseppe<br>
<br>
On 4 December 2014 at 13:14, Giuseppe Congiu<br></span><span class="">
<<a href="mailto:giuseppe.congiu@seagate.com" target="_blank">giuseppe.congiu@seagate.com</a> <mailto:<a href="mailto:giuseppe.congiu@seagate.com" target="_blank">giuseppe.congiu@<u></u>seagate.com</a>>> wrote:<br>
<br>
<br>
On 3 December 2014 at 21:19, Rob Latham <<a href="mailto:robl@mcs.anl.gov" target="_blank">robl@mcs.anl.gov</a>> wrote:<br></span><div><div class="h5">
<br>
Sorry not to have responded to you sooner. Between a big<br>
conference and US Thanksgiving, a lot of us were out of the<br>
office for the last few weeks.<br>
<br>
<br>
Hello Rob, I can totally understand, as I have also been busy with other stuff<br>
in the last few weeks. Nevertheless, I cannot deny I have been looking forward<br>
to a reply : )<br>
<br>
Welcome. You're digging into ROMIO and Generalized Requests, so<br>
you've picked two fairly obscure areas of MPI to work on!<br>
<br>
As it happens, I am the world's expert in both ROMIO and<br>
Generalized requests. (the population size for that domain is<br>
exceedingly small...)<br>
<br>
<br>
Then you are the blessing that I have been waiting for : )<br>
<br>
I think this approach can greatly benefit collective I/O -- I<br>
observe on some systems that the communication cost is not<br>
hurting two-phase collective I/O but instead it is the<br>
synchronization: if one I/O aggregator is taking longer to do<br>
I/O than the others, all N processes pay a price waiting for the<br>
laggard.<br>
<br>
<br>
Yes, this is exactly the point. Communication cost does not seem to be a<br>
problem; on the other hand, the MPI_Allreduce() at the end of the collective<br>
write is a real pain. From what I have observed, writing to local devices<br>
greatly reduces the I/O response time variation among aggregators.<br>
<br>
before we get too far, we need to talk a bit about generalized<br>
requests and their "super-strict" progress model. Can you<br>
share your query, free, and cancel functions?<br>
<br>
<br>
The callback functions are quite simple; they don't do anything special:<br>
<br>
int ADIOI_GEN_Ifile_sync_query_fn( void *extra_state, MPI_Status *status )<br>
{<br>
    ARGS *state = (ARGS*)extra_state;<br>
<br>
    MPI_Status_set_cancelled( status, 0 );<br>
    MPI_Status_set_elements( status, MPI_BYTE, state->bytes_xfered );<br>
    status->MPI_SOURCE = MPI_UNDEFINED;<br>
    status->MPI_TAG = MPI_UNDEFINED;<br>
    return state->error_code;<br>
}<br>
<br>
int ADIOI_GEN_Ifile_sync_free_fn( void *extra_state )<br>
{<br>
    return MPI_SUCCESS;<br>
}<br>
<br>
int ADIOI_GEN_Ifile_sync_cancel_fn( void *extra_state, int complete )<br>
{<br>
    return MPI_SUCCESS;<br>
}<br>
<br>
Which part of this ends up in a generalized request?<br>
<br>
<br>
The function that ends up in a generalized request is the synchronization<br>
function, ADIOI_GEN_IfileSync(), which is also pretty simple. It just reads the<br>
locally written file domains and writes them to the global file.<br>
<br>
MPICH is going to set those hints to "automatic", but you have<br>
overridden the defaults? (it's the right override in most<br>
cases, so good! unless you did not, in which case we should<br>
double check that you are not mixing vendor MPICH and your own<br>
MPICH)<br>
<br>
I am setting the MPI hints using an IOR configuration file; here is a snippet<br>
of the config file:<br>
<br>
IOR_HINT__MPI__cb_buffer_size=16777216<br>
IOR_HINT__MPI__cb_nodes=1<br>
IOR_HINT__MPI__romio_cb_read=enable<br>
IOR_HINT__MPI__romio_cb_write=enable<br>
IOR_HINT__MPI__local_cache=enable<br>
IOR_HINT__MPI__local_cache_path=/tmp/ior_tmp_file<br>
IOR_HINT__MPI__local_cache_flush_flag=flush_immediate<br>
IOR_HINT__MPI__local_cache_discard_flag=enable<br>
IOR_HINT__MPI__romio_no_indep_rw=true<br>
<br>
The collective buffer size is 16MB by default, so the hint in this case is<br>
superfluous (the same goes for cb_nodes=1, since I am using only one node).<br>
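For readers who are not driving this through IOR, the same hints could be set<br>
programmatically with an MPI_Info object at open time. A minimal sketch,<br>
assuming the keys match the IOR names above without the IOR_HINT__MPI__ prefix<br>
(cb_* and romio_* are standard ROMIO hints; the local_cache* keys are the<br>
custom ones from this work, and the file name is just a placeholder):<br>
<br>
#include "mpi.h"<br>
<br>
int main( int argc, char **argv )<br>
{<br>
    MPI_Info info;<br>
    MPI_File fh;<br>
<br>
    MPI_Init( &argc, &argv );<br>
    MPI_Info_create( &info );<br>
<br>
    /* standard ROMIO collective buffering hints */<br>
    MPI_Info_set( info, "cb_buffer_size", "16777216" );<br>
    MPI_Info_set( info, "cb_nodes", "1" );<br>
    MPI_Info_set( info, "romio_cb_write", "enable" );<br>
<br>
    /* custom local cache hints (assumed key names) */<br>
    MPI_Info_set( info, "local_cache", "enable" );<br>
    MPI_Info_set( info, "local_cache_path", "/tmp/ior_tmp_file" );<br>
    MPI_Info_set( info, "local_cache_flush_flag", "flush_immediate" );<br>
<br>
    /* open collectively with the hints attached, then write as usual */<br>
    MPI_File_open( MPI_COMM_WORLD, "testfile", MPI_MODE_CREATE | MPI_MODE_WRONLY,<br>
                   info, &fh );<br>
    MPI_File_close( &fh );<br>
    MPI_Info_free( &info );<br>
    MPI_Finalize();<br>
    return 0;<br>
}<br>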
<br>
MPIR_Barrier_intra(169):<br>
PSIlogger: Child with rank 5 exited with status 1.<br>
mpid_irecv_done(101)...: read from socket failed - request<br>
state:recv(pde)done<br>
wait entry: 2a1ef18, 21f2790, 2a1e8e8, 21f4488<br>
Fatal error in PMPI_Wait: Other MPI error, error stack:<br>
PMPI_Wait(180)..........: MPI_Wait(request=0x21f4488,<br>
status=0x21f2138)<br>
failed<br>
MPIR_Wait_impl(77)......:<br>
MPIR_Grequest_query(447): user request query function<br>
returned error<br>
code 1601<br>
<br>
<br>
I wonder if your query is doing more than it's supposed to do....<br>
<br>
<br>
Maybe you can tell me. I basically used Chapter 12 (External Interfaces) of the<br>
MPI specification published September 21, 2012 as a reference for the<br>
generalized request interface.<br>
I just want to point out that this problem only shows up when I am writing more<br>
than 4 segments in IOR.<br>
<br>
Does anyone have a clue as to what is going wrong here?<br>
<br>
<br>
We're going to need to see some more code, I think...<br>
<br>
<br>
Here is the function that starts the generalized request to synchronize the<br>
file:<br>
<br>
int ADIOI_GEN_IfileSync( ADIO_File fd, int count, MPI_Datatype datatype,<br>
                         int file_ptr_type, ADIO_Offset offset,<br>
                         ADIO_Request *request )<br>
{<br>
    pthread_t thread;<br>
    pthread_attr_t attr;<br>
    int rc;<br>
    ARGS *args;<br>
<br>
    /* copy args */<br>
    args = (ARGS *)ADIOI_Malloc(sizeof(ARGS));<br>
    args->fd = fd;<br>
    args->count = count;<br>
    args->datatype = datatype;<br>
    args->file_ptr_type = file_ptr_type;<br>
    args->offset = offset;<br>
    args->request = request;<br>
<br>
    /* start generalized request */<br>
    MPI_Grequest_start( &ADIOI_GEN_Ifile_sync_query_fn,<br>
                        &ADIOI_GEN_Ifile_sync_free_fn,<br>
                        &ADIOI_GEN_Ifile_sync_cancel_fn,<br>
                        args,<br>
                        request );<br>
<br>
    /* spawn a new thread to handle the request */<br>
    pthread_attr_init( &attr );<br>
    pthread_attr_setdetachstate( &attr, PTHREAD_CREATE_DETACHED );<br>
    rc = pthread_create( &thread, &attr, ADIOI_GEN_Ifile_sync_thread, args );<br>
    pthread_attr_destroy( &attr );<br>
<br>
    /* --Error handling starts here-- */<br>
    //TODO: do something<br>
    /* --End of Error handling-- */<br>
    return MPI_SUCCESS;<br>
}<br>
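The thread routine itself is not quoted anywhere in this thread, so the sketch<br>
below is only a guess at its shape, but it illustrates the pattern generalized<br>
requests expect: do the work, record bytes_xfered and error_code in args for<br>
the query function, call MPI_Grequest_complete(), and leave the freeing of args<br>
to the free callback rather than doing it here.<br>
<br>
/* hypothetical sketch -- not the actual ROMIO routine */<br>
void *ADIOI_GEN_Ifile_sync_thread( void *arg )<br>
{<br>
    ARGS *args = (ARGS *)arg;<br>
<br>
    /* read the locally written file domains and write them to the global<br>
     * file, filling in args->bytes_xfered and args->error_code as it goes */<br>
    /* ... actual sync I/O omitted ... */<br>
<br>
    /* completing the request lets MPI_Wait()/MPI_Test() return; MPI will<br>
     * then call the query function and finally the free function */<br>
    MPI_Grequest_complete( *args->request );<br>
    return NULL;<br>
}<br>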
<br>
BTW, I just realized I forgot to free ARGS in the query function. I<br>
will fix that but I don't think it is causing the problem. Am I wrong?<br>
<br>
Thanks for your help,<br>
<br>
--<br></div></div>
Giuseppe Congiu *·* Research Engineer II<br>
Seagate Technology, LLC<br>
office: <a href="tel:%2B44%20%280%2923%209249%206082" value="+442392496082" target="_blank">+44 (0)23 9249 6082</a> <tel:%2B44%20%280%2923%209249%<u></u>206082><br>
*·* mobile:<br>
<a href="http://www.seagate.com" target="_blank">www.seagate.com</a> <<a href="http://www.seagate.com" target="_blank">http://www.seagate.com</a>><br>
<br>
<br>
<br><span class="HOEnZb"><font color="#888888">
<br>
--<br>
Giuseppe Congiu *·* Research Engineer II<br>
Seagate Technology, LLC<br>
office: <a href="tel:%2B44%20%280%2923%209249%206082" value="+442392496082" target="_blank">+44 (0)23 9249 6082</a> *·* mobile:<br>
<a href="http://www.seagate.com" target="_blank">www.seagate.com</a> <<a href="http://www.seagate.com" target="_blank">http://www.seagate.com</a>><br>
</font></span></blockquote><div class="HOEnZb"><div class="h5">
<br>
-- <br>
Rob Latham<br>
Mathematics and Computer Science Division<br>
Argonne National Lab, IL USA<br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr">Giuseppe Congiu <strong>·</strong> Research Engineer II<br>
Seagate Technology, LLC<br>
office: +44 (0)23 9249 6082 <strong>·</strong> mobile: <br>
<a href="http://www.seagate.com" target="_blank">www.seagate.com</a><br></div></div>
</div>