[mpich-discuss] Internal Error: invalid error code 409e10 (Ring ids do not match)

Gus Correa gus at ldeo.columbia.edu
Mon Jun 2 19:54:05 CDT 2014


Sorry, C++ is just wild guesswork for me ...

Do your send and recv buffers in MPI_Reduce
(&solution, &received_solution)
overlap, or are they the same buffer on root (0)?

 >     MPIR_Localcopy(123)......: memcpy arguments alias each other,
 >     dst=0x7fff40ad30e0 src=0x7fff40ad30c8 len=32

I guess you need to replace the send buffer with MPI_IN_PLACE on root
when using a single buffer, or perhaps use different buffers.
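
For example, a minimal sketch of the in-place variant
(just a sketch, assuming rank 0 is the root as in your call,
and that "rank" holds the result of MPI_Comm_rank):

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        /* root: the reduction result overwrites 'solution' in place */
        MPI_Reduce(MPI_IN_PLACE, &solution, 1, MPI_msp_solution_t,
                   max_msp_solution_op, 0, MPI_COMM_WORLD);
    } else {
        /* non-root ranks: the receive buffer argument is ignored */
        MPI_Reduce(&solution, NULL, 1, MPI_msp_solution_t,
                   max_msp_solution_op, 0, MPI_COMM_WORLD);
    }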

On 06/02/2014 07:15 PM, Jakub Łuczyński wrote:
> P.S.
> for clarity
>
> typedef long long m_entry_t;
>
>
> On Tue, Jun 3, 2014 at 1:09 AM, Jakub Łuczyński <doubleloop at o2.pl> wrote:
>
>     After installing the current version (3.1):
>
>     $ mpiexec -n 2 ~/tmp/opencl/msp-par.exe 10 10 1
>     Fatal error in PMPI_Reduce: Internal MPI error!, error stack:
>     PMPI_Reduce(1259)........: MPI_Reduce(sbuf=0x7fff40ad30c8,
>     rbuf=0x7fff40ad30e0, count=1, dtype=USER<struct>, op=0x98000000,
>     root=0, MPI_COMM_WORLD) failed
>     MPIR_Reduce_impl(1071)...:
>     MPIR_Reduce_intra(822)...:
>     MPIR_Reduce_impl(1071)...:
>     MPIR_Reduce_intra(877)...:
>     MPIR_Reduce_binomial(130):
>     MPIR_Localcopy(123)......: memcpy arguments alias each other,
>     dst=0x7fff40ad30e0 src=0x7fff40ad30c8 len=32
>     MPIR_Reduce_intra(842)...:
>     MPIR_Reduce_impl(1071)...:
>     MPIR_Reduce_intra(877)...:
>     MPIR_Reduce_binomial(246): Failure during collective
>
>
>     On Tue, Jun 3, 2014 at 12:25 AM, Gus Correa <gus at ldeo.columbia.edu> wrote:
>
>         This is an old version of mpich.
>         Is it perhaps still using the mpd ring?
>         [If so, you need to start the mpd ring, if it is not already
>         running, before you launch the job. But that method was phased out.]
>         It may be worth updating to the latest stable mpich
>         and using the current mpiexec (hydra) to launch the job.
>
>         http://www.mpich.org/downloads/
>         http://www.mpich.org/documentation/guides/
>         http://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager
>
>
>
>         On 06/02/2014 06:03 PM, Lu, Huiwei wrote:
>
>             Hi Kuba,
>
>             Since it works with both Open MPI and BGP, it is most likely
>             a problem with your MPICH installation or your platform.
>
>             We stopped supporting the Windows platform a while ago
>             due to a lack of developer resources. Please refer to our FAQ
>             for more information:
>             http://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions#Q:_Why_can.27t_I_build_MPICH_on_Windows_anymore.3F
>
>             If you are on the Windows platform, we recommend you use
>             Microsoft MPI, which can be found here:
>             http://msdn.microsoft.com/en-us/library/bb524831(v=vs.85).aspx
>
>             We also encourage you to use the latest MPICH on Linux or
>             OSX platforms, which can be downloaded here:
>             http://www.mpich.org/downloads/
>
>             Huiwei
>
>             On Jun 2, 2014, at 4:49 PM, Jakub Łuczyński <doubleloop at o2.pl> wrote:
>
>                 I wrote my assignment using MPI and tested it both
>                 locally with Open MPI (1.6.5) and on IBM Blue Gene/P
>                 (with the MPI implementation provided by IBM). Everything
>                 worked fine. It turns out that our solutions are also
>                 tested in our labs, where MPICH is installed:
>
>                 $ mpich2version
>                 MPICH2 Version:        1.4.1p1
>
>                 And when I run my solution there I get this strange error:
>                 $ mpirun -n 2 msp-par.exe 10 10 1
>                 Internal Error: invalid error code 409e10 (Ring ids do
>                 not match) in MPIR_Reduce_impl:1087
>                 Fatal error in PMPI_Reduce: Other MPI error, error stack:
>                 PMPI_Reduce(1270).....:
>                 MPI_Reduce(sbuf=0x7fff693a92e8, rbuf=0x7fff693a9300,
>                 count=1, dtype=USER<struct>, op=0x98000000, root=0,
>                 MPI_COMM_WORLD) failed
>                 MPIR_Reduce_impl(1087):
>
>                 I am literally out of ideas about what is wrong!
>
>                 Below are the source code fragments (C++):
>
>                 struct msp_solution
>                 {
>                      int x1, y1, x2, y2;
>                      m_entry_t max_sum;
>                      msp_solution();
>                      msp_solution(const pair<int, int> &c1, const
>                 pair<int, int> &c2, int max_sum);
>                      friend bool operator<(const msp_solution &s1, const
>                 msp_solution &s2);
>                 };
>
>                 void max_msp_solution(msp_solution *in, msp_solution
>                 *inout, int, MPI_Datatype*)
>                 {
>                       *inout = max(*in, *inout);
>                 }
>
>                 // somewhere in code
>                 {
>                       MPI_Datatype MPI_msp_solution_t;
>                       MPI_Op max_msp_solution_op;
>
>                       // create MPI struct from msp_solution
>                       MPI_Datatype types[] = { MPI_INT, MPI_LONG_LONG_INT };
>                       int block_lengths[] = { 4, 2 };
>                       MPI_Aint base_addr, x1_addr, max_sum_addr;
>                       MPI_Get_address(&collected_solution, &base_addr);
>                       MPI_Get_address(&collected_solution.x1, &x1_addr);
>                       MPI_Get_address(&collected_solution.max_sum,
>                 &max_sum_addr);
>
>                       MPI_Aint displacements[] =
>                       {
>                           x1_addr - base_addr,
>                           max_sum_addr - base_addr
>                       };
>
>                       MPI_Type_create_struct(2, block_lengths,
>                 displacements, types, &MPI_msp_solution_t);
>                       MPI_Type_commit(&MPI_msp_solution_t);
>
>                       // max reduction function
>                       MPI_Op_create((MPI_User_function *)
>                 max_msp_solution, 1, &max_msp_solution_op);
>
>                      ...
>
>                       msp_solution solution, received_solution;
>                       MPI_Comm comm,
>                       ...
>                       // comm is created using MPI_Comm_split
>                       // solution is initialized
>                       MPI_Reduce(&solution, &received_solution, 1,
>                 MPI_msp_solution_t, max_msp_solution_op, 0,
>                 MPI_COMM_WORLD);
>                       // ERROR above!!!
>                 }
>
>
>                 Is there some error in this? How can I make it run?
>                 P.S. MPI_Send and MPI_Recv on my struct
>                 MPI_msp_solution_t seem to work fine.
>
>                 Thanks in advance!
>                 Best regards,
>                 Kuba
>
>
>
>
>
>
>
>
>
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>



