[mpich-discuss] Internal Error: invalid error code 409e10 (Ring ids do not match)
Gus Correa
gus at ldeo.columbia.edu
Mon Jun 2 20:01:05 CDT 2014
Oh, I just replied to your previous email.
On 06/02/2014 08:53 PM, Jakub Łuczyński wrote:
> Solved, it was my error.
> int block_lengths[] = { 4, 2 };
> should be:
> int block_lengths[] = { 4, 1 };
>
Great that you solved it!
That makes sense; it sounds as if the oversized datatype may have caused
the inadvertent buffer overlap (the "memcpy arguments alias each other"
error quoted below).
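For the archives, here is a minimal sketch of the corrected type
construction, using the struct and variable names from the code quoted
further down the thread (untested; the only change is the second block length):

    // msp_solution is 4 ints followed by a single long long (m_entry_t),
    // so the second block describes ONE element, not two.
    MPI_Datatype types[] = { MPI_INT, MPI_LONG_LONG_INT };
    int block_lengths[] = { 4, 1 };

    MPI_Aint base_addr, x1_addr, max_sum_addr;
    MPI_Get_address(&collected_solution, &base_addr);
    MPI_Get_address(&collected_solution.x1, &x1_addr);
    MPI_Get_address(&collected_solution.max_sum, &max_sum_addr);

    MPI_Aint displacements[] = { x1_addr - base_addr, max_sum_addr - base_addr };

    MPI_Datatype MPI_msp_solution_t;
    MPI_Type_create_struct(2, block_lengths, displacements, types, &MPI_msp_solution_t);
    MPI_Type_commit(&MPI_msp_solution_t);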
> Did not read the man page carefully.
>
> There was an overflow all along. On MPICH it caused an error; on Open MPI
> and BGP it was silent.
>
I guess memory issues are not always reproducible across compilers,
libraries, etc.
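If it helps for next time, a cheap sanity check is to compare the committed
datatype against the C++ struct. This is only a sketch, assuming <cassert>
is included and it runs in the same scope as the code quoted below:

    int type_size = 0;
    MPI_Type_size(MPI_msp_solution_t, &type_size);   // 4*4 + 2*8 = 32 with { 4, 2 }

    MPI_Aint lb = 0, extent = 0;
    MPI_Type_get_extent(MPI_msp_solution_t, &lb, &extent);

    // On a typical 64-bit layout sizeof(msp_solution) is 24, so a 32-byte
    // datatype is exactly the kind of overflow MPICH complained about
    // (a 32-byte memcpy between buffers that are only 24 bytes apart).
    // (If the struct had trailing padding you would also need
    // MPI_Type_create_resized to make the extent match.)
    assert(type_size == (int) sizeof(msp_solution));
    assert(extent == (MPI_Aint) sizeof(msp_solution));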
>
> On Tue, Jun 3, 2014 at 1:15 AM, Jakub Łuczyński <doubleloop at o2.pl> wrote:
>
> P.S.
> for clarity
>
> typedef long long m_entry_t;
>
>
> On Tue, Jun 3, 2014 at 1:09 AM, Jakub Łuczyński <doubleloop at o2.pl> wrote:
>
> After installing the current version (3.1):
>
> $ mpiexec -n 2 ~/tmp/opencl/msp-par.exe 10 10 1
> Fatal error in PMPI_Reduce: Internal MPI error!, error stack:
> PMPI_Reduce(1259)........: MPI_Reduce(sbuf=0x7fff40ad30c8, rbuf=0x7fff40ad30e0, count=1, dtype=USER<struct>, op=0x98000000, root=0, MPI_COMM_WORLD) failed
> MPIR_Reduce_impl(1071)...:
> MPIR_Reduce_intra(822)...:
> MPIR_Reduce_impl(1071)...:
> MPIR_Reduce_intra(877)...:
> MPIR_Reduce_binomial(130):
> MPIR_Localcopy(123)......: memcpy arguments alias each other, dst=0x7fff40ad30e0 src=0x7fff40ad30c8 len=32
> MPIR_Reduce_intra(842)...:
> MPIR_Reduce_impl(1071)...:
> MPIR_Reduce_intra(877)...:
> MPIR_Reduce_binomial(246): Failure during collective
>
>
> On Tue, Jun 3, 2014 at 12:25 AM, Gus Correa <gus at ldeo.columbia.edu> wrote:
>
> This is an old version of MPICH.
> Is it perhaps still using the mpd ring?
> [If so, you need to start the mpd ring, if not already running,
> before you launch the job. But that method has been phased out.]
> It may be worth updating to the latest stable MPICH
> and using the current mpiexec (Hydra) to launch the job.
>
> http://www.mpich.org/downloads/
> http://www.mpich.org/documentation/guides/
> http://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager
>
>
>
> On 06/02/2014 06:03 PM, Lu, Huiwei wrote:
>
> Hi Kuba,
>
> Since it works with both Open MPI and BGP, it is most
> likely a problem with your MPICH installation or your
> platform.
>
> We stopped supporting the Windows platform a while
> ago due to a lack of developer resources. Please refer to
> our FAQ for more information:
> http://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions#Q:_Why_can.27t_I_build_MPICH_on_Windows_anymore.3F
>
> If you are on the Windows platform, we recommend you use
> Microsoft MPI, which can be found here:
> http://msdn.microsoft.com/en-us/library/bb524831(v=vs.85).aspx
>
> We also encourage you to use the latest MPICH on Linux
> or OSX platforms, which can be downloaded here:
> http://www.mpich.org/downloads/
>
> —
> Huiwei
>
> On Jun 2, 2014, at 4:49 PM, Jakub Łuczyński <doubleloop at o2.pl> wrote:
>
> I wrote my assignment using MPI and tested it both
> locally on Open MPI (1.6.5) and on an IBM Blue Gene/P
> (with the MPI implementation provided by IBM).
> Everything worked fine. It turns out that our solutions
> are also tested in our labs, where MPICH is installed:
>
> $ mpich2version
> MPICH2 Version: 1.4.1p1
>
> And when I run my solution there I get this strange
> error:
> $ mpirun -n 2 msp-par.exe 10 10 1
> Internal Error: invalid error code 409e10 (Ring ids do not match) in MPIR_Reduce_impl:1087
> Fatal error in PMPI_Reduce: Other MPI error, error stack:
> PMPI_Reduce(1270).....: MPI_Reduce(sbuf=0x7fff693a92e8, rbuf=0x7fff693a9300, count=1, dtype=USER<struct>, op=0x98000000, root=0, MPI_COMM_WORLD) failed
> MPIR_Reduce_impl(1087):
>
> I am literally out of ideas as to what is wrong!
>
> Below are the relevant source code fragments (C++):
>
> struct msp_solution
> {
>     int x1, y1, x2, y2;
>     m_entry_t max_sum;
>     msp_solution();
>     msp_solution(const pair<int, int> &c1, const pair<int, int> &c2, int max_sum);
>     friend bool operator<(const msp_solution &s1, const msp_solution &s2);
> };
>
> void max_msp_solution(msp_solution *in, msp_solution *inout, int, MPI_Datatype*)
> {
>     *inout = max(*in, *inout);
> }
>
> // somewhere in code
> {
>     MPI_Datatype MPI_msp_solution_t;
>     MPI_Op max_msp_solution_op;
>
>     // create MPI struct from msp_solution
>     MPI_Datatype types[] = { MPI_INT, MPI_LONG_LONG_INT };
>     int block_lengths[] = { 4, 2 };
>     MPI_Aint base_addr, x1_addr, max_sum_addr;
>     MPI_Get_address(&collected_solution, &base_addr);
>     MPI_Get_address(&collected_solution.x1, &x1_addr);
>     MPI_Get_address(&collected_solution.max_sum, &max_sum_addr);
>
>     MPI_Aint displacements[] =
>     {
>         x1_addr - base_addr,
>         max_sum_addr - base_addr
>     };
>
>     MPI_Type_create_struct(2, block_lengths, displacements, types, &MPI_msp_solution_t);
>     MPI_Type_commit(&MPI_msp_solution_t);
>
>     // max reduction function
>     MPI_Op_create((MPI_User_function *) max_msp_solution, 1, &max_msp_solution_op);
>
>     ...
>
>     msp_solution solution, received_solution;
>     MPI_Comm comm,
>     ...
>     // comm is created using MPI_Comm_split
>     // solution is initialized
>     MPI_Reduce(&solution, &received_solution, 1, MPI_msp_solution_t, max_msp_solution_op, 0, MPI_COMM_WORLD);
>     // ERROR above!!!
> }
>
>
> Is there some error in this? How can I make it run?
> P.S. MPI_Send and MPI_Recv on my struct
> MPI_msp_solution_t seem to work fine.
>
> Thanks in advance!
> Best regards,
> Kuba
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>