Or install from MacPorts; it has the patch.<br><br>On Thursday, August 11, 2016, Kenneth Raffenetti <<a href="mailto:raffenet@mcs.anl.gov">raffenet@mcs.anl.gov</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Or a snapshot tarball: <a href="http://www.mpich.org/static/downloads/nightly/master/mpich/" target="_blank">http://www.mpich.org/static/downloads/nightly/master/mpich/</a><br>
<br>
On 08/11/2016 04:21 PM, Halim Amer wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
This should be related to the alignment problem reported before<br>
(<a href="http://lists.mpich.org/pipermail/discuss/2016-May/004764.html" target="_blank">http://lists.mpich.org/piperm<wbr>ail/discuss/2016-May/004764.<wbr>html</a>).<br>
<br>
We plan to include a fix in the 3.2.x bug fix release series. Meanwhile,<br>
please try the repo version (<a href="http://git.mpich.org/mpich.git" target="_blank">git.mpich.org/mpich.git</a>), which should not<br>
suffer from this problem.<br>
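For anyone trying the repo version, a minimal build sketch follows (assuming autoconf, automake, and libtool are installed; the install prefix and compiler choices are just examples):<br>

```shell
# Build MPICH from the git repo mentioned above (a git checkout needs
# autogen.sh; release/nightly tarballs ship a pre-generated configure).
git clone http://git.mpich.org/mpich.git
cd mpich
./autogen.sh
./configure --prefix=$HOME/mpich-install CC=clang CXX=clang++
make -j8 && make install

# Put the fresh build first on PATH and confirm it is picked up.
export PATH=$HOME/mpich-install/bin:$PATH
mpichversion
```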
<br>
--Halim<br>
<a href="http://www.mcs.anl.gov/~aamer" target="_blank">www.mcs.anl.gov/~aamer</a><br>
<br>
On 8/11/16 8:48 AM, Mark Davis wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hello, I'm running into a segfault when I run some relatively simple<br>
MPI programs. In this particular case, I'm running a small program in<br>
a loop that does MPI_Bcast, once per loop, within MPI_COMM_WORLD. The<br>
buffer consists of just 7 doubles. I'm running with 6 procs on a<br>
machine with 8 cores on OSX (Darwin - 15.6.0 Darwin Kernel Version<br>
15.6.0: Thu Jun 23 18:25:34 PDT 2016;<br>
root:xnu-3248.60.10~1/RELEASE_<wbr>X86_64 x86_64). When I run the same<br>
program with a smaller number of procs, the error usually doesn't show<br>
up. My compiler (both for compiling the MPICH source as well as my<br>
application) is clang 3.8.1.<br>
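A minimal reproducer along these lines (a hypothetical sketch based on the description above, not the poster's actual bcast_test.cpp; root rank and iteration count are made up) would look like:<br>

```cpp
// Hypothetical reproducer: repeatedly broadcast a small buffer of
// 7 doubles over MPI_COMM_WORLD, as described in the report.
// Build: mpicxx bcast_repro.cpp -o bcast_repro
// Run:   mpiexec -n 6 ./bcast_repro
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int count = 7;                 // 7 doubles, per the report
    double buf[count];
    for (int i = 0; i < count; ++i)
        buf[i] = (rank == 0) ? i : 0.0;  // root fills, others receive

    // The report says the crash appears intermittently when the
    // broadcast runs in a loop with 6 processes.
    for (int iter = 0; iter < 100000; ++iter)
        MPI_Bcast(buf, count, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("done: buf[6] = %f\n", buf[6]);
    MPI_Finalize();
    return 0;
}
```

(Requires an MPI installation to compile and run; with the MPICH-3.2 release on the OS X setup above this would be expected to segfault intermittently at higher process counts.)<br>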
<br>
When I run the same program on Linux, also with MPICH-3.2 (I believe<br>
the exact same source), compiled with gcc 5.3, I do not get this<br>
error. This seems to be something I get only on OS X with clang.<br>
<br>
gdb shows the following stack trace. I have a feeling that this has<br>
something to do with my toolchain and/or libraries on my system given<br>
that I never get this error on my other system (linux). However, it's<br>
possible that there's an application bug as well.<br>
<br>
I'm running the MPICH-3.2 stable release; I haven't tried anything<br>
from the repository yet.<br>
<br>
Does anyone have any ideas about what's going on here? I'm happy to<br>
provide more details.<br>
<br>
Thank you,<br>
Mark<br>
<br>
<br>
Program received signal SIGSEGV, Segmentation fault.<br>
MPID_Request_create () at src/mpid/ch3/src/ch3u_request.<wbr>c:101<br>
101 req->dev.ext_hdr_ptr = NULL;<br>
(gdb) bt full<br>
#0 MPID_Request_create () at src/mpid/ch3/src/ch3u_request.<wbr>c:101<br>
No locals.<br>
#1 0x00000001003ac4c9 in MPIDI_CH3U_Recvq_FDP_or_AEU<br>
(match=<optimized out>, foundp=0x7fff5fbfe2bc) at<br>
src/mpid/ch3/src/ch3u_recvq.c:<wbr>830<br>
proc_failure_bit_masked = <error reading variable<br>
proc_failure_bit_masked (Cannot access memory at address 0x1)><br>
error_bit_masked = <error reading variable error_bit_masked<br>
(Cannot access memory at address 0x1)><br>
prev_rreq = <optimized out><br>
channel_matched = <optimized out><br>
rreq = <optimized out><br>
#2 0x00000001003d1ffe in MPIDI_CH3_PktHandler_EagerSend<br>
(vc=<optimized out>, pkt=0x1004b3fd8 <MPIU_DBG_MaxLevel>,<br>
buflen=0x7fff5fbfe440, rreqp=0x7fff5fbfe438) at<br>
src/mpid/ch3/src/ch3u_eager.c:<wbr>629<br>
mpi_errno = <error reading variable mpi_errno (Cannot access<br>
memory at address 0x0)><br>
found = <error reading variable found (Cannot access memory at<br>
address 0xefefefefefefefef)><br>
rreq = <optimized out><br>
data_len = <optimized out><br>
complete = <optimized out><br>
#3 0x00000001003f6045 in MPID_nem_handle_pkt (vc=<optimized out>,<br>
buf=0x102ad07e0 "", buflen=<optimized out>) at<br>
src/mpid/ch3/channels/nemesis/<wbr>src/ch3_progress.c:760<br>
len = 140734799800192<br>
mpi_errno = <optimized out><br>
complete = <error reading variable complete (Cannot access<br>
memory at address 0x1)><br>
rreq = <optimized out><br>
#4 0x00000001003f4e41 in MPIDI_CH3I_Progress<br>
(progress_state=0x7fff5fbfe750<wbr>, is_blocking=1) at<br>
src/mpid/ch3/channels/nemesis/<wbr>src/ch3_progress.c:570<br>
payload_len = 4299898840<br>
cell_buf = <optimized out><br>
rreq = <optimized out><br>
vc = 0x102ad07e8<br>
made_progress = <error reading variable made_progress (Cannot<br>
access memory at address 0x0)><br>
mpi_errno = <optimized out><br>
#5 0x000000010035386d in MPIC_Wait (request_ptr=<optimized out>,<br>
errflag=<optimized out>) at src/mpi/coll/helper_fns.c:225<br>
progress_state = {ch = {completion_count = -1409286143}}<br>
mpi_errno = <error reading variable mpi_errno (Cannot access<br>
memory at address 0x0)><br>
#6 0x0000000100353b10 in MPIC_Send (buf=0x100917c30,<br>
count=4299945096, datatype=-1581855963, dest=<optimized out>,<br>
tag=4975608, comm_ptr=0x1004b3fd8 <MPIU_DBG_MaxLevel>,<br>
errflag=<optimized out>) at src/mpi/coll/helper_fns.c:302<br>
mpi_errno = <optimized out><br>
request_ptr = 0x1004bf7e0 <MPID_Request_direct+1760><br>
#7 0x0000000100246031 in MPIR_Bcast_binomial (buffer=<optimized out>,<br>
count=<optimized out>, datatype=<optimized out>, root=<optimized out>,<br>
comm_ptr=<optimized out>, errflag=<optimized out>) at<br>
src/mpi/coll/bcast.c:280<br>
nbytes = <optimized out><br>
mpi_errno_ret = <optimized out><br>
mpi_errno = 0<br>
comm_size = <optimized out><br>
rank = 2<br>
type_size = <optimized out><br>
tmp_buf = 0x0<br>
position = <optimized out><br>
relative_rank = <optimized out><br>
mask = <optimized out><br>
src = <optimized out><br>
status = <optimized out><br>
recvd_size = <optimized out><br>
dst = <optimized out><br>
#8 0x00000001002455a3 in MPIR_SMP_Bcast (buffer=<optimized out>,<br>
count=<optimized out>, datatype=<optimized out>, root=<optimized out>,<br>
comm_ptr=<optimized out>, errflag=<optimized out>) at<br>
src/mpi/coll/bcast.c:1087<br>
mpi_errno_ = <error reading variable mpi_errno_ (Cannot access<br>
memory at address 0x0)><br>
mpi_errno = <optimized out><br>
mpi_errno_ret = <optimized out><br>
nbytes = <optimized out><br>
type_size = <optimized out><br>
status = <optimized out><br>
recvd_size = <optimized out><br>
#9 MPIR_Bcast_intra (buffer=0x100917c30, count=<optimized out>,<br>
datatype=<optimized out>, root=1, comm_ptr=<optimized out>,<br>
errflag=<optimized out>) at src/mpi/coll/bcast.c:1245<br>
nbytes = <optimized out><br>
mpi_errno_ret = <error reading variable mpi_errno_ret (Cannot<br>
access memory at address 0x0)><br>
mpi_errno = <optimized out><br>
type_size = <optimized out><br>
comm_size = <optimized out><br>
#10 0x000000010024751e in MPIR_Bcast (buffer=<optimized out>,<br>
count=<optimized out>, datatype=<optimized out>, root=<optimized out>,<br>
comm_ptr=0x0, errflag=<optimized out>) at src/mpi/coll/bcast.c:1475<br>
mpi_errno = <optimized out><br>
#11 MPIR_Bcast_impl (buffer=0x1004bf7e0 <MPID_Request_direct+1760>,<br>
count=-269488145, datatype=-16, root=0, comm_ptr=0x0,<br>
errflag=0x1004bf100 <MPID_Request_direct>) at<br>
src/mpi/coll/bcast.c:1451<br>
mpi_errno = <optimized out><br>
#12 0x00000001000f3c24 in MPI_Bcast (buffer=<optimized out>, count=7,<br>
datatype=1275069445, root=1, comm=<optimized out>) at<br>
src/mpi/coll/bcast.c:1585<br>
errflag = 2885681152<br>
mpi_errno = <optimized out><br>
comm_ptr = <optimized out><br>
#13 0x0000000100001df7 in run_test<int> (my_rank=2,<br>
num_ranks=<optimized out>, count=<optimized out>, root_rank=1,<br>
datatype=@0x7fff5fbfeaec: 1275069445, iterations=<optimized out>) at<br>
bcast_test.cpp:83<br>
No locals.<br>
#14 0x00000001000019cd in main (argc=<optimized out>, argv=<optimized<br>
out>) at bcast_test.cpp:137<br>
root_rank = <optimized out><br>
count = <optimized out><br>
iterations = <optimized out><br>
my_rank = 4978656<br>
num_errors = <optimized out><br>
runtime_ns = <optimized out><br>
stats = {<std::__1::__basic_string_com<wbr>mon<true>> = {<No data<br>
fields>}, __r_ =<br>
{<std::__1::__libcpp_compresse<wbr>d_pair_imp<std::__1::basic_<wbr>string<char,<br>
std::__1::char_traits<char>, std::__1::allocator<char> >::__rep,<br>
std::__1::allocator<char>, 2>> = {<std::__1::allocator<char>> = {<No<br>
data fields>}, __first_ = {{__l = {__cap_ = 17289301308300324847,<br>
__size_ = 17289301308300324847, __data_ = 0xefefefefefefefef <error:<br>
Cannot access memory at address 0xefefefefefefefef>}<br>
______________________________<wbr>_________________<br>
discuss mailing list <a>discuss@mpich.org</a><br>
To manage subscription options or unsubscribe:<br>
<a href="https://lists.mpich.org/mailman/listinfo/discuss" target="_blank">https://lists.mpich.org/mailma<wbr>n/listinfo/discuss</a><br>
<br>
</blockquote>
</blockquote>
</blockquote>