[mpich-discuss] Internal Error: invalid error code 409e10 (Ring ids do not match)

Jakub Łuczyński doubleloop at o2.pl
Mon Jun 2 16:49:13 CDT 2014


I wrote my assignment using MPI and tested it both locally on Open MPI
(1.6.5) and on IBM Blue Gene/P (with the MPI implementation provided by
IBM). Everything worked fine. It turns out that our solutions are also
tested in our labs, where MPICH is installed:

$ mpich2version
MPICH2 Version:        1.4.1p1

And when I run my solution there, I get this strange error:
$ mpirun -n 2 msp-par.exe 10 10 1
Internal Error: invalid error code 409e10 (Ring ids do not match) in
MPIR_Reduce_impl:1087
Fatal error in PMPI_Reduce: Other MPI error, error stack:
PMPI_Reduce(1270).....: MPI_Reduce(sbuf=0x7fff693a92e8,
rbuf=0x7fff693a9300, count=1, dtype=USER<struct>, op=0x98000000, root=0,
MPI_COMM_WORLD) failed
MPIR_Reduce_impl(1087):

I am literally out of ideas as to what is wrong!

Below are the relevant source code fragments (C++):

struct msp_solution
{
    int x1, y1, x2, y2;
    m_entry_t max_sum;
    msp_solution();
    msp_solution(const pair<int, int> &c1, const pair<int, int> &c2,
                 int max_sum);
    friend bool operator<(const msp_solution &s1, const msp_solution &s2);
};

void max_msp_solution(msp_solution *in, msp_solution *inout, int,
                      MPI_Datatype *)
{
    *inout = max(*in, *inout);
}
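
(One suspicious thing I noticed while composing this mail: the MPI standard
declares MPI_User_function as taking void pointers and an int *len, whereas
my function takes typed pointers and a plain int, and I silence the mismatch
with a cast in MPI_Op_create below. A strictly conforming version would
presumably look like the untested sketch below -- the name is just for the
sketch -- which also loops over *len in case MPI ever hands the function more
than one element at a time. I am not sure this explains the MPICH error,
though, since the same code runs fine elsewhere.)

void max_msp_solution_conforming(void *in, void *inout, int *len,
                                 MPI_Datatype *)
{
    msp_solution *s_in = static_cast<msp_solution *>(in);
    msp_solution *s_inout = static_cast<msp_solution *>(inout);
    // a user-defined op must reduce all *len elements, not just the first
    for (int i = 0; i < *len; ++i)
        s_inout[i] = max(s_in[i], s_inout[i]);
}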

// somewhere in code
{
    MPI_Datatype MPI_msp_solution_t;
    MPI_Op max_msp_solution_op;

    // create MPI struct type mirroring msp_solution
    // (collected_solution is an msp_solution instance declared elsewhere)
    MPI_Datatype types[] = { MPI_INT, MPI_LONG_LONG_INT };
    int block_lengths[] = { 4, 2 };
    MPI_Aint base_addr, x1_addr, max_sum_addr;
    MPI_Get_address(&collected_solution, &base_addr);
    MPI_Get_address(&collected_solution.x1, &x1_addr);
    MPI_Get_address(&collected_solution.max_sum, &max_sum_addr);

    MPI_Aint displacements[] =
    {
        x1_addr - base_addr,
        max_sum_addr - base_addr
    };

    MPI_Type_create_struct(2, block_lengths, displacements, types,
                           &MPI_msp_solution_t);
    MPI_Type_commit(&MPI_msp_solution_t);

    // max reduction function; note the cast below is what hides the
    // signature mismatch described above
    MPI_Op_create((MPI_User_function *) max_msp_solution, 1,
                  &max_msp_solution_op);

   ...

    msp_solution solution, received_solution;
    MPI_Comm comm;
    ...
    // comm is created using MPI_Comm_split
    // solution is initialized
    MPI_Reduce(&solution, &received_solution, 1, MPI_msp_solution_t,
               max_msp_solution_op, 0, MPI_COMM_WORLD);
    // ERROR above!!!
}
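
One more thing I am unsure about: struct padding. If the extent of the
committed type does not match sizeof(msp_solution), MPI and the compiler
would disagree about element boundaries; as far as I understand,
MPI_Type_create_resized is the standard fix. An untested sketch of the
check, which would go right after the commit above:

    // compare the MPI type's extent with the compiler's struct layout and
    // resize the type if they disagree (e.g. because of trailing padding)
    MPI_Aint lb, extent;
    MPI_Type_get_extent(MPI_msp_solution_t, &lb, &extent);
    if (extent != (MPI_Aint) sizeof(msp_solution))
    {
        MPI_Datatype resized;
        MPI_Type_create_resized(MPI_msp_solution_t, 0,
                                sizeof(msp_solution), &resized);
        MPI_Type_commit(&resized);
        MPI_Type_free(&MPI_msp_solution_t);
        MPI_msp_solution_t = resized;
    }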


Is there some error in this? How can I make it run?
P.S. MPI_Send and MPI_Recv on my struct type MPI_msp_solution_t seem to
work fine.
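
If it helps with diagnosing, my next step would be to bisect: reduce only
the max_sum field with a builtin op (assuming m_entry_t is long long, as
the MPI_LONG_LONG_INT entry above suggests). If even this fails under the
lab MPICH, the problem is more likely the installation or a header/library
mismatch than my custom op:

    // bisection test: a builtin op on a builtin type takes my custom
    // datatype and reduction function out of the picture entirely
    long long local_max = solution.max_sum, global_max = 0;
    MPI_Reduce(&local_max, &global_max, 1, MPI_LONG_LONG_INT, MPI_MAX,
               0, MPI_COMM_WORLD);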

Thanks in advance!
Best regards,
Kuba