[mpich-commits] [mpich] MPICH primary repository branch, 3.1.x, created. v3.1.3-185-gb05fe62

Service Account noreply at mpich.org
Fri Feb 20 12:41:28 CST 2015


This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "MPICH primary repository".

The branch, 3.1.x has been created
        at  b05fe62344b37c92a6e32fc36fbec1dc13ad1a17 (commit)

- Log -----------------------------------------------------------------
http://git.mpich.org/mpich.git/commitdiff/b05fe62344b37c92a6e32fc36fbec1dc13ad1a17

commit b05fe62344b37c92a6e32fc36fbec1dc13ad1a17
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Sat Feb 14 18:19:34 2015 -0600

    update version info for 3.1.4
    
    Signed-off-by: Antonio J. Pena <apenya at mcs.anl.gov>

diff --git a/maint/version.m4 b/maint/version.m4
index 68b7b1b..3405cb0 100644
--- a/maint/version.m4
+++ b/maint/version.m4
@@ -14,7 +14,7 @@
 # changing this by playing with diversions, but then we would probably be
 # playing with autotools-fire.
 
-m4_define([MPICH_VERSION_m4],[3.1.3])dnl
+m4_define([MPICH_VERSION_m4],[3.1.4])dnl
 m4_define([MPICH_RELEASE_DATE_m4],[unreleased development copy])dnl
 
 # For libtool ABI versioning rules see:
@@ -35,6 +35,6 @@ m4_define([MPICH_RELEASE_DATE_m4],[unreleased development copy])dnl
 # libmpi so version only includes functionality defined in the MPI
 # standard, and does not include MPIX_ functions and C++ bindings.
 
-m4_define([libmpi_so_version_m4],[12:4:0])dnl
+m4_define([libmpi_so_version_m4],[12:5:0])dnl
 
 [#] end of __file__
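
For readers unfamiliar with libtool's current:revision:age triple, here is a minimal sketch of how such a string maps to a shared-library file suffix. It assumes the usual Linux/ELF numbering (current minus age, then age, then revision); other platforms number differently. The struct and function names are hypothetical and are not part of MPICH or libtool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: one libtool -version-info triple. */
struct lt_version {
    int current;   /* newest interface number the library implements */
    int revision;  /* implementation revision of that interface */
    int age;       /* number of older interfaces still supported */
};

/* Parse "C:R:A" into v. Returns 0 on success, -1 on malformed input. */
int lt_parse(const char *s, struct lt_version *v)
{
    if (sscanf(s, "%d:%d:%d", &v->current, &v->revision, &v->age) != 3)
        return -1;
    if (v->current < 0 || v->revision < 0 || v->age < 0 || v->age > v->current)
        return -1;
    return 0;
}

/* Assumed Linux/ELF suffix: (current - age).age.revision */
int lt_linux_suffix(const struct lt_version *v, char *buf, size_t len)
{
    return snprintf(buf, len, "%d.%d.%d",
                    v->current - v->age, v->age, v->revision);
}
```

Under that assumption, 12:4:0 and 12:5:0 give the suffixes 12.0.4 and 12.0.5. Only the revision field changes, so the leading number stays at 12, which is what one would expect from a bug-fix release that does not alter the library interface.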

http://git.mpich.org/mpich.git/commitdiff/8f98a27cdcc82d9fe0e2a224c13d57c7ddef4a8e

commit 8f98a27cdcc82d9fe0e2a224c13d57c7ddef4a8e
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Sat Feb 14 18:16:22 2015 -0600

    update CHANGES file for 3.1.4
    
    Signed-off-by: Antonio J. Pena <apenya at mcs.anl.gov>

diff --git a/CHANGES b/CHANGES
index 9145ab0..cd886db 100644
--- a/CHANGES
+++ b/CHANGES
@@ -1,4 +1,31 @@
 ===============================================================================
+                               Changes in 3.1.4
+===============================================================================
+
+ # Bug fixes to MPI-3 shared memory functionality.
+
+ # Fixed a bug that prevented Fortran programs from being profiled by PMPI
+   libraries written in C.
+
+ # Fixed support for building MPICH on OSX with Intel C/C++ and Fortran compilers.
+
+ # Several bug fixes in ROMIO.
+
+ # Enhancements to the testsuite.
+
+ # Backports support for the Mellanox MXM InfiniBand interface.
+
+ # Backports support for the Mellanox HCOLL interface for collectives.
+
+ # Several other minor bug fixes, memory leak fixes, and code cleanup.
+
+   A full list of changes is available at the following link:
+
+     http://git.mpich.org/mpich.git/shortlog/v3.1.3..v3.1.4
+
+ 
+
+===============================================================================
                                Changes in 3.1.3
 ===============================================================================
 

http://git.mpich.org/mpich.git/commitdiff/f0565d14b3b6c136090cc236988a27b17487c1c1

commit f0565d14b3b6c136090cc236988a27b17487c1c1
Author: Xin Zhao <xinzhao3 at illinois.edu>
Date:   Wed Feb 11 02:15:17 2015 -0600

    Bug-fix: add barriers between init buffers and issuing RMA operations.
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>

diff --git a/test/mpi/rma/atomic_rmw_gacc.c b/test/mpi/rma/atomic_rmw_gacc.c
index d04e9c3..8bd7582 100644
--- a/test/mpi/rma/atomic_rmw_gacc.c
+++ b/test/mpi/rma/atomic_rmw_gacc.c
@@ -149,7 +149,9 @@ int main (int argc, char *argv[]) {
             MPI_Win_unlock(rank, win);
         }
 
-                MPI_Win_lock_all(0, win);
+        MPI_Barrier(MPI_COMM_WORLD);
+
+        MPI_Win_lock_all(0, win);
         if (rank != dest) {
             for (i = 0; i < my_buf_num; i++) {
                 MPI_Get_accumulate(&(orig_buf[i*OP_COUNT]), 1, origin_dtp,
@@ -176,7 +178,9 @@ int main (int argc, char *argv[]) {
             MPI_Win_unlock(rank, win);
         }
 
-                MPI_Win_lock_all(0, win);
+        MPI_Barrier(MPI_COMM_WORLD);
+
+        MPI_Win_lock_all(0, win);
         if (rank != dest) {
             for (i = 0; i < my_buf_num; i++) {
                 MPI_Get_accumulate(&(orig_buf[i*OP_COUNT]), OP_COUNT, MPI_INT,
@@ -203,7 +207,9 @@ int main (int argc, char *argv[]) {
             MPI_Win_unlock(rank, win);
         }
 
-                MPI_Win_lock_all(0, win);
+        MPI_Barrier(MPI_COMM_WORLD);
+
+        MPI_Win_lock_all(0, win);
         if (rank != dest) {
             for (i = 0; i < my_buf_num; i++) {
                 MPI_Get_accumulate(&(orig_buf[i*OP_COUNT]), 1, origin_dtp,

http://git.mpich.org/mpich.git/commitdiff/8766b10f3abd51a065ab6bb91a88ed4223170e63

commit 8766b10f3abd51a065ab6bb91a88ed4223170e63
Author: Igor Ivanov <Igor.Ivanov at itseez.com>
Date:   Mon Feb 9 14:36:02 2015 +0200

    mpid/sched: Fix issue with schedule entries list processing
    
    Calling an MPID_Sched_cb callback function can force the entries list
    to be reallocated. As a result, the entry pointer obtained before the
    call can become invalid, so it must be set again after the callback returns.
    
    Signed-off-by: Devendar Bureddy <devendar at mellanox.com>
    Signed-off-by: Igor Ivanov <Igor.Ivanov at itseez.com>
    Signed-off-by: Wesley Bland <wbland at anl.gov>
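
The hazard described in this commit message can be sketched in plain C, without MPI. This is a simplified illustration, not the MPICH scheduler: the types, constants, and function names below are hypothetical stand-ins. It shows a callback that grows a realloc-backed array, and a caller that re-derives its element pointer from the index after the callback returns.

```c
#include <assert.h>
#include <stdlib.h>

enum { ENTRY_FAILED = -1, ENTRY_NOT_STARTED = 0, ENTRY_STARTED = 1 };

struct entry { int status; };

struct sched {
    struct entry *entries;  /* growable array, may move on realloc */
    size_t num_entries;
    size_t size;            /* allocated capacity */
};

/* Append one entry, doubling the capacity when full. Returns 0 on success. */
int sched_add(struct sched *s, int status)
{
    if (s->num_entries == s->size) {
        size_t new_size = s->size ? 2 * s->size : 1;
        struct entry *p = realloc(s->entries, new_size * sizeof *p);
        if (p == NULL)
            return -1;
        s->entries = p;   /* the old address may now be invalid */
        s->size = new_size;
    }
    s->entries[s->num_entries++].status = status;
    return 0;
}

/* A callback that appends entries, so it can move the array. */
int grow_cb(struct sched *s)
{
    int i;
    for (i = 0; i < 64; i++)
        if (sched_add(s, ENTRY_NOT_STARTED) != 0)
            return -1;
    return 0;
}

/* Start entry idx: run the callback, then reload the pointer before use. */
int sched_start_entry(struct sched *s, size_t idx, int (*cb)(struct sched *))
{
    struct entry *e = &s->entries[idx];
    int err;

    if (e->status != ENTRY_NOT_STARTED)
        return 0;
    err = cb(s);
    e = &s->entries[idx];  /* entries may have been reallocated in cb */
    e->status = err ? ENTRY_FAILED : ENTRY_STARTED;
    return err;
}
```

Keeping the index and recomputing the pointer is the cheapest repair here. Without the reload, the write to e->status after the callback would go through a pointer into memory that realloc may already have released.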

diff --git a/src/mpid/common/sched/mpid_sched.c b/src/mpid/common/sched/mpid_sched.c
index 21082d1..15e332c 100644
--- a/src/mpid/common/sched/mpid_sched.c
+++ b/src/mpid/common/sched/mpid_sched.c
@@ -196,6 +196,8 @@ static int MPIDU_Sched_start_entry(struct MPIDU_Sched *s, size_t idx, struct MPI
         case MPIDU_SCHED_ENTRY_CB:
             if (e->u.cb.cb_type == MPIDU_SCHED_CB_TYPE_1) {
                 mpi_errno = e->u.cb.u.cb_p(r->comm, s->tag, e->u.cb.cb_state);
+                /* Sched entries list can be reallocated inside callback */
+                e = &s->entries[idx];
                 if (mpi_errno) {
                     e->status = MPIDU_SCHED_ENTRY_STATUS_FAILED;
                     MPIU_ERR_POP(mpi_errno);
@@ -203,6 +205,8 @@ static int MPIDU_Sched_start_entry(struct MPIDU_Sched *s, size_t idx, struct MPI
             }
             else if (e->u.cb.cb_type == MPIDU_SCHED_CB_TYPE_2) {
                 mpi_errno = e->u.cb.u.cb2_p(r->comm, s->tag, e->u.cb.cb_state, e->u.cb.cb_state2);
+                /* Sched entries list can be reallocated inside callback */
+                e = &s->entries[idx];
                 if (mpi_errno) {
                     e->status = MPIDU_SCHED_ENTRY_STATUS_FAILED;
                     MPIU_ERR_POP(mpi_errno);
@@ -244,6 +248,8 @@ static int MPIDU_Sched_continue(struct MPIDU_Sched *s)
 
         if (e->status == MPIDU_SCHED_ENTRY_STATUS_NOT_STARTED) {
             mpi_errno = MPIDU_Sched_start_entry(s, i, e);
+            /* Sched entries list can be reallocated inside callback */
+            e = &s->entries[i];
             if (mpi_errno) MPIU_ERR_POP(mpi_errno);
         }
 

http://git.mpich.org/mpich.git/commitdiff/1df34380ad52ce3563fe51366d92802f64c92c7b

commit 1df34380ad52ce3563fe51366d92802f64c92c7b
Author: Xin Zhao <xinzhao3 at illinois.edu>
Date:   Thu Feb 5 01:41:01 2015 -0800

    Add tests to test the atomicity for FOP, CAS and GACC operations.
    
    The entire "read-modify-write" should be atomic for CAS, FOP and
    GACC operations. This patch adds corresponding tests for them.
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>
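
As a rough illustration of what atomicity means for the CAS test added here, the sketch below enumerates the two serial orders of the two compare-and-swap operations, using the same initial values as the test (P0: origin 1, compare 0; P1: origin 0, compare 1; target starts at 0). It is ordinary single-process C, not MPI, and all names are hypothetical.

```c
#include <assert.h>

/* Compare-and-swap on a plain int: returns the old value and stores
 * origin only if the old value equals compare. */
int cas(int *target, int origin, int compare)
{
    int old = *target;
    if (old == compare)
        *target = origin;
    return old;
}

struct cas_outcome {
    int result_shm;  /* value fetched by P0 (origin_shm) */
    int result_am;   /* value fetched by P1 (origin_am) */
    int target;      /* final value on P2 (dest) */
};

/* Apply the two operations in one of the two possible serial orders. */
struct cas_outcome run_order(int p0_first)
{
    struct cas_outcome o;
    int target = 0;

    if (p0_first) {
        o.result_shm = cas(&target, 1, 0);
        o.result_am  = cas(&target, 0, 1);
    } else {
        o.result_am  = cas(&target, 0, 1);
        o.result_shm = cas(&target, 1, 0);
    }
    o.target = target;
    return o;
}
```

If P0 goes first, P0 fetches 0, P1 fetches 1, and the target ends at 0. If P1 goes first, both fetch 0 and the target ends at 1. Any other combination would mean the two operations were not applied one after the other.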

diff --git a/test/mpi/rma/Makefile.am b/test/mpi/rma/Makefile.am
index 507de42..8a5308c 100644
--- a/test/mpi/rma/Makefile.am
+++ b/test/mpi/rma/Makefile.am
@@ -140,7 +140,10 @@ noinst_PROGRAMS =          \
     get-struct             \
     rput_local_comp        \
     racc_local_comp        \
-    at_complete
+    at_complete            \
+    atomic_rmw_fop         \
+    atomic_rmw_cas         \
+    atomic_rmw_gacc
 
 strided_acc_indexed_LDADD       = $(LDADD) -lm
 strided_acc_onelock_LDADD       = $(LDADD) -lm
diff --git a/test/mpi/rma/atomic_rmw_cas.c b/test/mpi/rma/atomic_rmw_cas.c
new file mode 100644
index 0000000..2b9a711
--- /dev/null
+++ b/test/mpi/rma/atomic_rmw_cas.c
@@ -0,0 +1,129 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *
+ *  (C) 2015 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+
+/* This test is going to test the atomicity for "read-modify-write" in CAS
+ * operations */
+
+/* There are three processes involved in this test: P0 (origin_shm), P1 (origin_am),
+ * and P2 (dest). P0 and P1 each issue one CAS to P2, via SHM and AM respectively.
+ * For P0, origin value is 1 and compare value is 0; for P1, origin value is 0 and
+ * compare value is 1; for P2, initial target value is 0. The correct results can
+ * only be one of the following cases:
+ *
+ *   (1) result value on P0: 0, result value on P1: 0, target value on P2: 1.
+ *   (2) result value on P0: 0, result value on P1: 1, target value on P2: 0.
+ *
+ * All other results are not correct. */
+
+#include "mpi.h"
+#include <stdio.h>
+
+#define LOOP_SIZE 10000
+#define CHECK_TAG 123
+
+int main (int argc, char *argv[]) {
+    int rank, size, i, j, k;
+    int errors = 0;
+    int origin_shm, origin_am, dest;
+    int *orig_buf = NULL, *result_buf = NULL, *compare_buf = NULL,
+        *target_buf = NULL, *check_buf = NULL;
+    MPI_Win win;
+    MPI_Status status;
+
+    MPI_Init(&argc, &argv);
+
+    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
+    MPI_Comm_size(MPI_COMM_WORLD, &size);
+    if (size != 3) {
+        /* run this test with three processes */
+        goto exit_test;
+    }
+
+    /* this works when MPIR_PARAM_CH3_ODD_EVEN_CLIQUES is set */
+    dest = 2;
+    origin_shm = 0;
+    origin_am = 1;
+
+    if (rank != dest) {
+        MPI_Alloc_mem(sizeof(int), MPI_INFO_NULL, &orig_buf);
+        MPI_Alloc_mem(sizeof(int), MPI_INFO_NULL, &result_buf);
+        MPI_Alloc_mem(sizeof(int), MPI_INFO_NULL, &compare_buf);
+    }
+
+    MPI_Win_allocate(sizeof(int), sizeof(int), MPI_INFO_NULL,
+                     MPI_COMM_WORLD, &target_buf, &win);
+
+    for (k = 0; k < LOOP_SIZE; k++)  {
+
+        /* init buffers */
+        if (rank == origin_shm) {
+            orig_buf[0] = 1;
+            compare_buf[0] = 0;
+            result_buf[0] = 0;
+        }
+        else if (rank == origin_am) {
+            orig_buf[0] = 0;
+            compare_buf[0] = 1;
+            result_buf[0] = 0;
+        }
+        else {
+            MPI_Win_lock(MPI_LOCK_SHARED, rank, 0, win);
+            target_buf[0] = 0;
+            MPI_Win_unlock(rank, win);
+        }
+
+        MPI_Barrier(MPI_COMM_WORLD);
+
+        /* perform CAS */
+        MPI_Win_lock_all(0, win);
+        if (rank != dest) {
+            MPI_Compare_and_swap(orig_buf, compare_buf, result_buf, MPI_INT, dest, 0, win);
+            MPI_Win_flush(dest, win);
+        }
+        MPI_Win_unlock_all(win);
+
+        MPI_Barrier(MPI_COMM_WORLD);
+
+        /* check results */
+        if (rank != dest) {
+            MPI_Gather(result_buf, 1, MPI_INT, check_buf, 1, MPI_INT, dest, MPI_COMM_WORLD);
+        }
+        else {
+            MPI_Alloc_mem(sizeof(int) * 3, MPI_INFO_NULL, &check_buf);
+            MPI_Gather(target_buf, 1, MPI_INT, check_buf, 1, MPI_INT, dest, MPI_COMM_WORLD);
+
+            if (!(check_buf[dest] == 0 && check_buf[origin_shm] == 0 && check_buf[origin_am] == 1) &&
+                !(check_buf[dest] == 1 && check_buf[origin_shm] == 0 && check_buf[origin_am] == 0)) {
+
+                printf("Wrong results: target result = %d, origin_shm result = %d, origin_am result = %d\n",
+                       check_buf[dest], check_buf[origin_shm], check_buf[origin_am]);
+
+                printf("Expected results (1): target result = 1, origin_shm result = 0, origin_am result = 0\n");
+                printf("Expected results (2): target result = 0, origin_shm result = 0, origin_am result = 1\n");
+
+                errors++;
+            }
+
+            MPI_Free_mem(check_buf);
+        }
+    }
+
+    MPI_Win_free(&win);
+
+    if (rank == origin_am || rank == origin_shm) {
+        MPI_Free_mem(orig_buf);
+        MPI_Free_mem(result_buf);
+        MPI_Free_mem(compare_buf);
+    }
+
+ exit_test:
+    if (rank == dest && errors == 0)
+        printf(" No Errors\n");
+
+    MPI_Finalize();
+    return 0;
+}
diff --git a/test/mpi/rma/atomic_rmw_fop.c b/test/mpi/rma/atomic_rmw_fop.c
new file mode 100644
index 0000000..873efe8
--- /dev/null
+++ b/test/mpi/rma/atomic_rmw_fop.c
@@ -0,0 +1,131 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *
+ *  (C) 2015 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+
+/* This test is going to test the atomicity for "read-modify-write" in FOP
+ * operations */
+
+/* There are three processes involved in this test: P0 (origin_shm), P1 (origin_am),
+ * and P2 (dest). P0 and P1 issue multiple FOPs with MPI_SUM and an integer (value 1)
+ * to P2 via SHM and AM respectively. If the operations are atomic, no result value
+ * fetched by P0 is ever equal to a result value fetched by P1. */
+
+#include "mpi.h"
+#include <stdio.h>
+
+#define AM_BUF_SIZE  10
+#define SHM_BUF_SIZE 1000
+#define WIN_BUF_SIZE 1
+
+#define LOOP_SIZE 15
+#define CHECK_TAG 123
+
+int main (int argc, char *argv[]) {
+    int rank, size, i, j, k;
+    int errors = 0, all_errors = 0;
+    int origin_shm, origin_am, dest;
+    int my_buf_size;
+    int *orig_buf = NULL, *result_buf = NULL, *target_buf = NULL, *check_buf = NULL;
+    MPI_Win win;
+    MPI_Status status;
+
+    MPI_Init(&argc, &argv);
+
+    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
+    MPI_Comm_size(MPI_COMM_WORLD, &size);
+    if (size != 3) {
+        /* run this test with three processes */
+        goto exit_test;
+    }
+
+    /* this works when MPIR_PARAM_CH3_ODD_EVEN_CLIQUES is set */
+    dest = 2;
+    origin_shm = 0;
+    origin_am = 1;
+
+    if (rank == origin_am) my_buf_size = AM_BUF_SIZE;
+    else if (rank == origin_shm) my_buf_size = SHM_BUF_SIZE;
+
+    if (rank != dest) {
+        MPI_Alloc_mem(sizeof(int) * my_buf_size, MPI_INFO_NULL, &orig_buf);
+        MPI_Alloc_mem(sizeof(int) * my_buf_size, MPI_INFO_NULL, &result_buf);
+    }
+
+    MPI_Win_allocate(sizeof(int) * WIN_BUF_SIZE, sizeof(int), MPI_INFO_NULL,
+                     MPI_COMM_WORLD, &target_buf, &win);
+
+    for (k = 0; k < LOOP_SIZE; k++)  {
+
+        /* init buffers */
+        if (rank != dest) {
+            for (i = 0; i < my_buf_size; i++) {orig_buf[i] = 1; result_buf[i] = 0;}
+        }
+        else {
+            MPI_Win_lock(MPI_LOCK_SHARED, rank, 0, win);
+            for (i = 0; i < WIN_BUF_SIZE; i++) {target_buf[i] = 0;}
+            MPI_Win_unlock(rank, win);
+        }
+
+        MPI_Barrier(MPI_COMM_WORLD);
+
+        /* perform FOP */
+        MPI_Win_lock_all(0, win);
+        if (rank != dest) {
+            for (i = 0; i < my_buf_size; i++) {
+                MPI_Fetch_and_op(&(orig_buf[i]), &(result_buf[i]), MPI_INT, dest, 0, MPI_SUM, win);
+                MPI_Win_flush(dest, win);
+            }
+        }
+        MPI_Win_unlock_all(win);
+
+        MPI_Barrier(MPI_COMM_WORLD);
+
+        if (rank != dest) {
+            /* check results on P0 and P1 (origin) */
+            if (rank == origin_am) {
+                MPI_Send(result_buf, AM_BUF_SIZE, MPI_INT, origin_shm, CHECK_TAG, MPI_COMM_WORLD);
+            }
+            else if (rank == origin_shm) {
+                MPI_Alloc_mem(sizeof(int) * AM_BUF_SIZE, MPI_INFO_NULL, &check_buf);
+                MPI_Recv(check_buf, AM_BUF_SIZE, MPI_INT, origin_am, CHECK_TAG, MPI_COMM_WORLD, &status);
+                for (i = 0; i < AM_BUF_SIZE; i++) {
+                    for (j = 0; j < SHM_BUF_SIZE; j++) {
+                        if (check_buf[i] == result_buf[j]) {
+                            printf("LOOP=%d, rank=%d, FOP, both check_buf[%d] and result_buf[%d] equal to %d, expected to be different. \n",
+                                   k, rank, i, j, check_buf[i]);
+                            errors++;
+                        }
+                    }
+                }
+                MPI_Free_mem(check_buf);
+            }
+        }
+        else {
+            /* check results on P2 (dest) */
+            if (target_buf[0] != AM_BUF_SIZE + SHM_BUF_SIZE) {
+                printf("LOOP=%d, rank=%d, FOP, target_buf[0] = %d, expected %d. \n",
+                       k, rank, target_buf[0], AM_BUF_SIZE+SHM_BUF_SIZE);
+                errors++;
+            }
+        }
+    }
+
+    MPI_Win_free(&win);
+
+    if (rank == origin_am || rank == origin_shm) {
+        MPI_Free_mem(orig_buf);
+        MPI_Free_mem(result_buf);
+    }
+
+ exit_test:
+    MPI_Reduce(&errors, &all_errors, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
+
+    if (rank == 0 && all_errors == 0)
+        printf(" No Errors\n");
+
+    MPI_Finalize();
+    return 0;
+}
diff --git a/test/mpi/rma/atomic_rmw_gacc.c b/test/mpi/rma/atomic_rmw_gacc.c
new file mode 100644
index 0000000..d04e9c3
--- /dev/null
+++ b/test/mpi/rma/atomic_rmw_gacc.c
@@ -0,0 +1,240 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *
+ *  (C) 2015 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+
+/* This test is going to test the atomicity for "read-modify-write" in GACC
+ * operations */
+
+/* This test is similar to atomic_rmw_fop.c.
+ * There are three processes involved in this test: P0 (origin_shm), P1 (origin_am),
+ * and P2 (dest). P0 and P1 issue multiple GACCs with MPI_SUM and OP_COUNT integers
+ * (value 1) to P2 via SHM and AM respectively. If the operations are atomic, the
+ * results on P0 and P1 are never the same for integers at the corresponding index
+ * in [0...OP_COUNT-1].
+ */
+
+#include "mpi.h"
+#include <stdio.h>
+
+#define OP_COUNT 10
+#define AM_BUF_NUM  10
+#define SHM_BUF_NUM 10000
+#define WIN_BUF_NUM 1
+
+#define LOOP_SIZE 15
+#define CHECK_TAG 123
+
+int rank, size;
+int dest, origin_shm, origin_am;
+int *orig_buf = NULL, *result_buf = NULL, *target_buf = NULL, *check_buf = NULL;
+
+void checkResults(int loop_k, int *errors) {
+    int i, j, m;
+    MPI_Status status;
+
+    if (rank != dest) {
+        /* check results on P0 and P1 (origin) */
+        if (rank == origin_am) {
+            MPI_Send(result_buf, AM_BUF_NUM * OP_COUNT, MPI_INT, origin_shm, CHECK_TAG, MPI_COMM_WORLD);
+        }
+        else if (rank == origin_shm) {
+            MPI_Alloc_mem(sizeof(int) * AM_BUF_NUM * OP_COUNT, MPI_INFO_NULL, &check_buf);
+            MPI_Recv(check_buf, AM_BUF_NUM * OP_COUNT, MPI_INT, origin_am, CHECK_TAG, MPI_COMM_WORLD, &status);
+            for (i = 0; i < AM_BUF_NUM; i++) {
+                for (j = 0; j < SHM_BUF_NUM; j++) {
+                    for (m = 0; m < OP_COUNT; m++) {
+                        if (check_buf[i*OP_COUNT+m] == result_buf[j*OP_COUNT+m]) {
+                            printf("LOOP=%d, rank=%d, FOP, both check_buf[%d] and result_buf[%d] equal to %d, expected to be different. \n",
+                                   loop_k, rank, i*OP_COUNT+m, j*OP_COUNT+m, check_buf[i*OP_COUNT+m]);
+                            (*errors)++;
+                        }
+                    }
+                }
+            }
+            MPI_Free_mem(check_buf);
+        }
+    }
+    else {
+        /* check results on P2 (dest) */
+        for (i = 0; i < OP_COUNT; i++) {
+            if (target_buf[i] != AM_BUF_NUM + SHM_BUF_NUM) {
+                printf("LOOP=%d, rank=%d, FOP, target_buf[%d] = %d, expected %d. \n",
+                       loop_k, rank, i, target_buf[i], AM_BUF_NUM+SHM_BUF_NUM);
+                (*errors)++;
+            }
+        }
+    }
+}
+
+int main (int argc, char *argv[]) {
+    int i, j, k;
+    int errors = 0, all_errors = 0;
+    int my_buf_num;
+    MPI_Win win;
+    MPI_Datatype origin_dtp, target_dtp;
+
+    MPI_Init(&argc, &argv);
+
+    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
+    MPI_Comm_size(MPI_COMM_WORLD, &size);
+    if (size != 3) {
+        /* run this test with three processes */
+        goto exit_test;
+    }
+
+    MPI_Type_contiguous(OP_COUNT, MPI_INT, &origin_dtp);
+    MPI_Type_commit(&origin_dtp);
+    MPI_Type_contiguous(OP_COUNT, MPI_INT, &target_dtp);
+    MPI_Type_commit(&target_dtp);
+
+    /* this works when MPIR_PARAM_CH3_ODD_EVEN_CLIQUES is set */
+    dest = 2;
+    origin_shm = 0;
+    origin_am = 1;
+
+    if (rank == origin_am) my_buf_num = AM_BUF_NUM;
+    else if (rank == origin_shm) my_buf_num = SHM_BUF_NUM;
+
+    if (rank != dest) {
+        MPI_Alloc_mem(sizeof(int) * my_buf_num * OP_COUNT, MPI_INFO_NULL, &orig_buf);
+        MPI_Alloc_mem(sizeof(int) * my_buf_num * OP_COUNT, MPI_INFO_NULL, &result_buf);
+    }
+
+    MPI_Win_allocate(sizeof(int) * WIN_BUF_NUM * OP_COUNT, sizeof(int), MPI_INFO_NULL,
+                     MPI_COMM_WORLD, &target_buf, &win);
+
+    for (k = 0; k < LOOP_SIZE; k++)  {
+
+        /* ====== Part 1: test basic datatypes ======== */
+
+        /* init buffers */
+        if (rank != dest) {
+            for (i = 0; i < my_buf_num * OP_COUNT; i++) {orig_buf[i] = 1; result_buf[i] = 0;}
+        }
+        else {
+            MPI_Win_lock(MPI_LOCK_SHARED, rank, 0, win);
+            for (i = 0; i < WIN_BUF_NUM * OP_COUNT; i++) {target_buf[i] = 0;}
+            MPI_Win_unlock(rank, win);
+        }
+
+        MPI_Barrier(MPI_COMM_WORLD);
+
+        MPI_Win_lock_all(0, win);
+        if (rank != dest) {
+            for (i = 0; i < my_buf_num; i++) {
+                MPI_Get_accumulate(&(orig_buf[i*OP_COUNT]), OP_COUNT, MPI_INT,
+                                   &(result_buf[i*OP_COUNT]), OP_COUNT, MPI_INT,
+                                   dest, 0, OP_COUNT, MPI_INT, MPI_SUM, win);
+                MPI_Win_flush(dest, win);
+            }
+        }
+        MPI_Win_unlock_all(win);
+
+        MPI_Barrier(MPI_COMM_WORLD);
+
+        checkResults(k, &errors);
+
+        /* ====== Part 2: test derived datatypes (origin derived, target derived) ======== */
+
+        /* init buffers */
+        if (rank != dest) {
+            for (i = 0; i < my_buf_num * OP_COUNT; i++) {orig_buf[i] = 1; result_buf[i] = 0;}
+        }
+        else {
+            MPI_Win_lock(MPI_LOCK_SHARED, rank, 0, win);
+            for (i = 0; i < WIN_BUF_NUM * OP_COUNT; i++) {target_buf[i] = 0;}
+            MPI_Win_unlock(rank, win);
+        }
+
+                MPI_Win_lock_all(0, win);
+        if (rank != dest) {
+            for (i = 0; i < my_buf_num; i++) {
+                MPI_Get_accumulate(&(orig_buf[i*OP_COUNT]), 1, origin_dtp,
+                                   &(result_buf[i*OP_COUNT]), 1, origin_dtp,
+                                   dest, 0, 1, target_dtp, MPI_SUM, win);
+                MPI_Win_flush(dest, win);
+            }
+        }
+        MPI_Win_unlock_all(win);
+
+        MPI_Barrier(MPI_COMM_WORLD);
+
+        checkResults(k, &errors);
+
+        /* ====== Part 3: test derived datatypes (origin basic, target derived) ======== */
+
+        /* init buffers */
+        if (rank != dest) {
+            for (i = 0; i < my_buf_num * OP_COUNT; i++) {orig_buf[i] = 1; result_buf[i] = 0;}
+        }
+        else {
+            MPI_Win_lock(MPI_LOCK_SHARED, rank, 0, win);
+            for (i = 0; i < WIN_BUF_NUM * OP_COUNT; i++) {target_buf[i] = 0;}
+            MPI_Win_unlock(rank, win);
+        }
+
+                MPI_Win_lock_all(0, win);
+        if (rank != dest) {
+            for (i = 0; i < my_buf_num; i++) {
+                MPI_Get_accumulate(&(orig_buf[i*OP_COUNT]), OP_COUNT, MPI_INT,
+                                   &(result_buf[i*OP_COUNT]), OP_COUNT, MPI_INT,
+                                   dest, 0, 1, target_dtp, MPI_SUM, win);
+                MPI_Win_flush(dest, win);
+            }
+        }
+        MPI_Win_unlock_all(win);
+
+        MPI_Barrier(MPI_COMM_WORLD);
+
+        checkResults(k, &errors);
+
+        /* ====== Part 4: test derived datatypes (origin derived target basic) ======== */
+
+        /* init buffers */
+        if (rank != dest) {
+            for (i = 0; i < my_buf_num * OP_COUNT; i++) {orig_buf[i] = 1; result_buf[i] = 0;}
+        }
+        else {
+            MPI_Win_lock(MPI_LOCK_SHARED, rank, 0, win);
+            for (i = 0; i < WIN_BUF_NUM * OP_COUNT; i++) {target_buf[i] = 0;}
+            MPI_Win_unlock(rank, win);
+        }
+
+                MPI_Win_lock_all(0, win);
+        if (rank != dest) {
+            for (i = 0; i < my_buf_num; i++) {
+                MPI_Get_accumulate(&(orig_buf[i*OP_COUNT]), 1, origin_dtp,
+                                   &(result_buf[i*OP_COUNT]), 1, origin_dtp,
+                                   dest, 0, OP_COUNT, MPI_INT, MPI_SUM, win);
+                MPI_Win_flush(dest, win);
+            }
+        }
+        MPI_Win_unlock_all(win);
+
+        MPI_Barrier(MPI_COMM_WORLD);
+
+        checkResults(k, &errors);
+    }
+
+    MPI_Win_free(&win);
+
+    if (rank == origin_am || rank == origin_shm) {
+        MPI_Free_mem(orig_buf);
+        MPI_Free_mem(result_buf);
+    }
+
+    MPI_Type_free(&origin_dtp);
+    MPI_Type_free(&target_dtp);
+
+ exit_test:
+    MPI_Reduce(&errors, &all_errors, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
+
+    if (rank == 0 && all_errors == 0)
+        printf(" No Errors\n");
+
+    MPI_Finalize();
+    return 0;
+}
diff --git a/test/mpi/rma/testlist.in b/test/mpi/rma/testlist.in
index 40318b3..a6149f1 100644
--- a/test/mpi/rma/testlist.in
+++ b/test/mpi/rma/testlist.in
@@ -128,6 +128,9 @@ win_shared_zerobyte 4 mpiversion=3.0
 win_shared_put_flush_get 4 mpiversion=3.0
 get-struct 2
 at_complete 2
+atomic_rmw_fop 3
+atomic_rmw_cas 3
+atomic_rmw_gacc 3
 
 ## This test is not strictly correct.  This was meant to test out the
 ## case when MPI_Test is not nonblocking.  However, we ended up

http://git.mpich.org/mpich.git/commitdiff/d4a3e09e4f74894ffc4615db2579ee9491462afa

commit d4a3e09e4f74894ffc4615db2579ee9491462afa
Author: Xin Zhao <xinzhao3 at illinois.edu>
Date:   Wed Feb 4 05:21:04 2015 -0800

    Bug-fix: guarantee atomicity for FOP and GACC.
    
    FOP, CAS and GACC are atomic "read-modify-write" operations,
    which means when the target window is defined on a SHM region,
    we need inter-process lock to guarantee the atomicity of the
    entire "read+OP". The current implementation is correct for
    SHM-based RMA operations, but not correct for AM-based RMA
    operations: for SHM-based operations, it protects the entire
    "read+OP", but for AM-based operations, it only protects the
    "OP" part.
    
    This patch fixes this issue by protecting the memory copy to
    temporary buffer and computation together for AM-based operations.
    
    Fix ticket 2226
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>
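
The difference between protecting only the "OP" and protecting the whole "read+OP" can be sketched with a deterministic, single-threaded simulation. No real locks or MPI calls are involved; the interleaving is written out by hand, and every name below is hypothetical. The operation modelled is a fetch-and-add of 1, as in the FOP test.

```c
#include <assert.h>

/* Step 1 of a fetch-and-op: copy the current target value for the reply. */
int fop_read(const int *target)
{
    return *target;
}

/* Step 2 of a fetch-and-op: apply the operation (a sum here). */
void fop_apply(int *target, int operand)
{
    *target += operand;
}

/* Read and op performed as one indivisible unit. */
int fop_atomic(int *target, int operand)
{
    int old = fop_read(target);
    fop_apply(target, operand);
    return old;
}

/* Simulated schedule in which only the op of the AM-based operation is
 * protected: a complete SHM-based fetch-and-op slips in between the AM
 * handler's read and its op. */
void interleaved(int *target, int *fetched_am, int *fetched_shm)
{
    *fetched_am  = fop_read(target);       /* AM: read, unprotected */
    *fetched_shm = fop_atomic(target, 1);  /* SHM: whole read+op */
    fop_apply(target, 1);                  /* AM: op, protected */
}

/* Simulated schedule in which both operations protect read+op. */
void serialized(int *target, int *fetched_am, int *fetched_shm)
{
    *fetched_am  = fop_atomic(target, 1);
    *fetched_shm = fop_atomic(target, 1);
}
```

In the interleaved schedule the target still ends at the correct sum, because each op was applied exactly once, but both origins fetch the same old value. That duplicate is the symptom an atomicity test can detect by requiring the fetched values to differ.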

diff --git a/src/mpid/ch3/src/ch3u_handle_recv_req.c b/src/mpid/ch3/src/ch3u_handle_recv_req.c
index 5ed0828..1ff6a30 100644
--- a/src/mpid/ch3/src/ch3u_handle_recv_req.c
+++ b/src/mpid/ch3/src/ch3u_handle_recv_req.c
@@ -112,6 +112,10 @@ int MPIDI_CH3_ReqHandler_PutAccumRespComplete( MPIDI_VC_t *vc,
         MPIU_CHKPMEM_MALLOC(resp_req->dev.user_buf, void *, rreq->dev.user_count * type_size,
                             mpi_errno, "GACC resp. buffer");
 
+        /* atomic read-modify-write for get_acc*/
+        if (win_ptr->shm_allocated == TRUE)
+            MPIDI_CH3I_SHM_MUTEX_LOCK(win_ptr);
+
         if (MPIR_DATATYPE_IS_PREDEFINED(rreq->dev.datatype)) {
             MPIU_Memcpy(resp_req->dev.user_buf, rreq->dev.real_user_buf, 
                         rreq->dev.user_count * type_size);
@@ -125,6 +129,10 @@ int MPIDI_CH3_ReqHandler_PutAccumRespComplete( MPIDI_VC_t *vc,
             MPID_Segment_free(seg);
         }
 
+        mpi_errno = do_accumulate_op(rreq);
+        if (win_ptr->shm_allocated == TRUE)
+            MPIDI_CH3I_SHM_MUTEX_UNLOCK(win_ptr);
+
         resp_req->dev.OnFinal = MPIDI_CH3_ReqHandler_GetAccumRespComplete;
         resp_req->dev.OnDataAvail = MPIDI_CH3_ReqHandler_GetAccumRespComplete;
         resp_req->dev.target_win_handle = rreq->dev.target_win_handle;
@@ -151,8 +159,7 @@ int MPIDI_CH3_ReqHandler_PutAccumRespComplete( MPIDI_VC_t *vc,
 
         get_acc_flag = 1;
     }
-
-    if (MPIDI_Request_get_type(rreq) == MPIDI_REQUEST_TYPE_ACCUM_RESP) {
+    else if (MPIDI_Request_get_type(rreq) == MPIDI_REQUEST_TYPE_ACCUM_RESP) {
 
 	if (win_ptr->shm_allocated == TRUE)
 	    MPIDI_CH3I_SHM_MUTEX_LOCK(win_ptr);
@@ -530,6 +537,10 @@ int MPIDI_CH3_ReqHandler_FOPComplete( MPIDI_VC_t *vc,
 
     MPID_Win_get_ptr(rreq->dev.target_win_handle, win_ptr);
 
+    /* Atomic read-modify-write for FOP */
+    if (win_ptr->shm_allocated == TRUE)
+        MPIDI_CH3I_SHM_MUTEX_LOCK(win_ptr);
+
     /* Copy original data into the send buffer.  If data will fit in the
        header, use that.  Otherwise allocate a temporary buffer.  */
     if (len <= sizeof(fop_resp_pkt->data)) {
@@ -558,13 +569,12 @@ int MPIDI_CH3_ReqHandler_FOPComplete( MPIDI_VC_t *vc,
         uop = MPIR_OP_HDL_TO_FN(rreq->dev.op);
         one = 1;
 
-        if (win_ptr->shm_allocated == TRUE)
-            MPIDI_CH3I_SHM_MUTEX_LOCK(win_ptr);
         (*uop)(rreq->dev.user_buf, rreq->dev.real_user_buf, &one, &rreq->dev.datatype);
-        if (win_ptr->shm_allocated == TRUE)
-            MPIDI_CH3I_SHM_MUTEX_UNLOCK(win_ptr);
     }
 
+    if (win_ptr->shm_allocated == TRUE)
+        MPIDI_CH3I_SHM_MUTEX_UNLOCK(win_ptr);
+
     /* Send back the original data.  We do this here to ensure that the
        operation is remote complete before responding to the origin. */
     if (len <= sizeof(fop_resp_pkt->data)) {

http://git.mpich.org/mpich.git/commitdiff/6379e66882bebe85bc77e7b0f47ebc7bcd0a79ac

commit 6379e66882bebe85bc77e7b0f47ebc7bcd0a79ac
Author: Xin Zhao <xinzhao3 at illinois.edu>
Date:   Wed Feb 4 16:48:40 2015 -0800

    Add tests to test cases when both SHM window and non-SHM window exist.
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>

diff --git a/test/mpi/rma/testlist.in b/test/mpi/rma/testlist.in
index dd9aa16..40318b3 100644
--- a/test/mpi/rma/testlist.in
+++ b/test/mpi/rma/testlist.in
@@ -71,6 +71,7 @@ contention_putget 4
 put_base 2
 put_bottom 2
 win_flavors 4 mpiversion=3.0
+win_flavors 3 mpiversion=3.0
 manyrma2 2 timeLimit=500
 manyrma2_shm 2 timeLimit=500
 manyrma3 2
@@ -81,6 +82,7 @@ win_shared_noncontig 4 mpiversion=3.0
 win_shared_noncontig_put 4 mpiversion=3.0
 win_zero 4 mpiversion=3.0
 @largetest@win_large_shm 4 mpiversion=3.0
+@largetest@win_large_shm 3 mpiversion=3.0
 win_dynamic_acc 4 mpiversion=3.0
 get_acc_local 1 mpiversion=3.0
 linked_list 4 mpiversion=3.0
diff --git a/test/mpi/rma/win_large_shm.c b/test/mpi/rma/win_large_shm.c
index fe730d2..583eab2 100644
--- a/test/mpi/rma/win_large_shm.c
+++ b/test/mpi/rma/win_large_shm.c
@@ -17,62 +17,71 @@ int main(int argc, char **argv) {
     MPI_Win win;
     MPI_Info win_info;
     MPI_Comm shared_comm;
+    int i;
     int shm_win_size = 1024 * 1024 * 1024 * sizeof(char); /* 1GB */
 
     MPI_Init(&argc, &argv);
 
     MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
 
-    MPI_Info_create(&win_info);
-    MPI_Info_set(win_info, (char*)"alloc_shm", (char*)"true");
+    for (i = 0; i < 2; i++) {
+        if (i == 0) {
+            MPI_Info_create(&win_info);
+            MPI_Info_set(win_info, (char*)"alloc_shm", (char*)"true");
+        }
+        else {
+            win_info = MPI_INFO_NULL;
+        }
 
-    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, my_rank, MPI_INFO_NULL, &shared_comm);
+        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, my_rank, MPI_INFO_NULL, &shared_comm);
 
-    MPI_Comm_rank(shared_comm, &shared_rank);
+        MPI_Comm_rank(shared_comm, &shared_rank);
 
-    /* every processes allocate 1GB window memory */
-    MPI_Win_allocate(shm_win_size, sizeof(char), win_info, MPI_COMM_WORLD, &mybase, &win);
+        /* every processes allocate 1GB window memory */
+        MPI_Win_allocate(shm_win_size, sizeof(char), win_info, MPI_COMM_WORLD, &mybase, &win);
 
-    MPI_Win_free(&win);
+        MPI_Win_free(&win);
 
-    MPI_Win_allocate_shared(shm_win_size, sizeof(char), win_info, shared_comm, &mybase, &win);
+        MPI_Win_allocate_shared(shm_win_size, sizeof(char), win_info, shared_comm, &mybase, &win);
 
-    MPI_Win_free(&win);
+        MPI_Win_free(&win);
 
-    /* some processes allocate 1GB and some processes allocate zero bytes */
-    if (my_rank % 2 == 0)
-        MPI_Win_allocate(shm_win_size, sizeof(char), win_info, MPI_COMM_WORLD, &mybase, &win);
-    else
-        MPI_Win_allocate(0, sizeof(char), win_info, MPI_COMM_WORLD, &mybase, &win);
+        /* some processes allocate 1GB and some processes allocate zero bytes */
+        if (my_rank % 2 == 0)
+            MPI_Win_allocate(shm_win_size, sizeof(char), win_info, MPI_COMM_WORLD, &mybase, &win);
+        else
+            MPI_Win_allocate(0, sizeof(char), win_info, MPI_COMM_WORLD, &mybase, &win);
 
-    MPI_Win_free(&win);
+        MPI_Win_free(&win);
 
-    if (shared_rank % 2 == 0)
-        MPI_Win_allocate_shared(shm_win_size, sizeof(char), win_info, shared_comm, &mybase, &win);
-    else
-        MPI_Win_allocate_shared(0, sizeof(char), win_info, shared_comm, &mybase, &win);
+        if (shared_rank % 2 == 0)
+            MPI_Win_allocate_shared(shm_win_size, sizeof(char), win_info, shared_comm, &mybase, &win);
+        else
+            MPI_Win_allocate_shared(0, sizeof(char), win_info, shared_comm, &mybase, &win);
 
-    MPI_Win_free(&win);
+        MPI_Win_free(&win);
 
-    /* some processes allocate 1GB and some processes allocate smaller bytes */
-    if (my_rank % 2 == 0)
-        MPI_Win_allocate(shm_win_size, sizeof(char), win_info, MPI_COMM_WORLD, &mybase, &win);
-    else
-        MPI_Win_allocate(shm_win_size/2, sizeof(char), win_info, MPI_COMM_WORLD, &mybase, &win);
+        /* some processes allocate 1GB and some processes allocate smaller bytes */
+        if (my_rank % 2 == 0)
+            MPI_Win_allocate(shm_win_size, sizeof(char), win_info, MPI_COMM_WORLD, &mybase, &win);
+        else
+            MPI_Win_allocate(shm_win_size/2, sizeof(char), win_info, MPI_COMM_WORLD, &mybase, &win);
 
-    MPI_Win_free(&win);
+        MPI_Win_free(&win);
 
-    /* some processes allocate 1GB and some processes allocate smaller bytes */
-    if (shared_rank % 2 == 0)
-        MPI_Win_allocate_shared(shm_win_size, sizeof(char), win_info, shared_comm, &mybase, &win);
-    else
-        MPI_Win_allocate_shared(shm_win_size/2, sizeof(char), win_info, shared_comm, &mybase, &win);
+        /* some processes allocate 1GB and some processes allocate smaller bytes */
+        if (shared_rank % 2 == 0)
+            MPI_Win_allocate_shared(shm_win_size, sizeof(char), win_info, shared_comm, &mybase, &win);
+        else
+            MPI_Win_allocate_shared(shm_win_size/2, sizeof(char), win_info, shared_comm, &mybase, &win);
 
-    MPI_Win_free(&win);
+        MPI_Win_free(&win);
 
-    MPI_Comm_free(&shared_comm);
+        MPI_Comm_free(&shared_comm);
 
-    MPI_Info_free(&win_info);
+        if (i == 0)
+            MPI_Info_free(&win_info);
+    }
 
     if (my_rank == 0)
         printf(" No Errors\n");

http://git.mpich.org/mpich.git/commitdiff/49610f1fed8a92c635e1de1b46e36c694378f467

commit 49610f1fed8a92c635e1de1b46e36c694378f467
Author: Xin Zhao <xinzhao3 at illinois.edu>
Date:   Wed Feb 4 16:36:57 2015 -0800

    Bug-fix: making processes with SHM and without SHM win work correctly.
    
    In commit 7d71278, if node_comm is NULL (the calling process is the only
    one on its node), we call allocate_no_shm() in CH3 to allocate the window.
    If node_comm is not NULL (more than one process is on the same node), we
    call allocate_shm() in Nemesis to allocate an SHM window. However, the
    amount of information exchanged (in MPI_Allgather) differs between
    allocate_no_shm() and allocate_shm(), which leads to incorrect execution
    when both SHM and non-SHM windows exist. This patch fixes the issue.
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>
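
Editorial sketch (not part of the commit): the fix depends on every rank
contributing the same number of values to the communicator-wide exchange and
reading them back with the same stride. The fragment below illustrates that
layout without MPI. The names win_desc, pack_slot and unpack_all are
hypothetical, and a plain array of long stands in for the gathered buffer of
MPI_Aint values.

```c
#include <assert.h>

#define FIELDS 4   /* base, size, disp_unit, handle: same count on every rank */

/* Hypothetical per-rank window description, for illustration only. */
struct win_desc {
    long base;
    long size;
    long disp_unit;
    long handle;
};

/* Each rank fills its own FIELDS-wide slot of the exchange buffer. */
void pack_slot(long *buf, int rank, const struct win_desc *d)
{
    assert(rank >= 0);
    buf[FIELDS * rank]     = d->base;
    buf[FIELDS * rank + 1] = d->size;
    buf[FIELDS * rank + 2] = d->disp_unit;
    buf[FIELDS * rank + 3] = d->handle;
}

/* After the exchange, every rank walks the buffer with the same stride. */
void unpack_all(const long *buf, struct win_desc *out, int nprocs)
{
    int i, k = 0;

    for (i = 0; i < nprocs; ++i) {
        out[i].base      = buf[k++];
        out[i].size      = buf[k++];
        out[i].disp_unit = buf[k++];
        out[i].handle    = buf[k++];
    }
}
```

If one code path packed three values per rank and another packed one, the
receiving side would read fields from the wrong slots, which is the kind of
mismatch the commit message describes.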

diff --git a/src/mpid/ch3/channels/nemesis/src/ch3_win_fns.c b/src/mpid/ch3/channels/nemesis/src/ch3_win_fns.c
index eebca52..dba6d63 100644
--- a/src/mpid/ch3/channels/nemesis/src/ch3_win_fns.c
+++ b/src/mpid/ch3/channels/nemesis/src/ch3_win_fns.c
@@ -345,46 +345,19 @@ static int MPIDI_CH3I_Win_allocate_shm(MPI_Aint size, int disp_unit, MPID_Info *
 
     /* get the sizes of the windows and window objectsof
        all processes.  allocate temp. buffer for communication */
-    MPIU_CHKLMEM_MALLOC(tmp_buf, MPI_Aint *, 3*comm_size*sizeof(MPI_Aint), mpi_errno, "tmp_buf");
+    MPIU_CHKLMEM_MALLOC(node_sizes, MPI_Aint *, node_size*sizeof(MPI_Aint), mpi_errno, "node_sizes");
 
     /* FIXME: This needs to be fixed for heterogeneous systems */
-    tmp_buf[3*rank]   = (MPI_Aint) size;
-    tmp_buf[3*rank+1] = (MPI_Aint) disp_unit;
-    tmp_buf[3*rank+2] = (MPI_Aint) (*win_ptr)->handle;
+    node_sizes[node_rank]   = (MPI_Aint) size;
 
     mpi_errno = MPIR_Allgather_impl(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
-                                    tmp_buf, 3 * sizeof(MPI_Aint), MPI_BYTE,
-                                    (*win_ptr)->comm_ptr, &errflag);
+                                    node_sizes, sizeof(MPI_Aint), MPI_BYTE,
+                                    node_comm_ptr, &errflag);
     MPIR_T_PVAR_TIMER_END(RMA, rma_wincreate_allgather);
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
     MPIU_ERR_CHKANDJUMP(errflag, mpi_errno, MPI_ERR_OTHER, "**coll_fail");
 
-    if ((*win_ptr)->create_flavor != MPI_WIN_FLAVOR_SHARED) {
-        MPIU_CHKLMEM_MALLOC(node_sizes, MPI_Aint *, node_size*sizeof(MPI_Aint), mpi_errno, "node_sizes");
-        for (i = 0; i < node_size; i++) node_sizes[i] = 0;
-    }
-    else {
-        node_sizes = (*win_ptr)->sizes;
-    }
-
     (*win_ptr)->shm_segment_len = 0;
-    k = 0;
-    for (i = 0; i < comm_size; ++i) {
-        (*win_ptr)->sizes[i]           = tmp_buf[k++];
-        (*win_ptr)->disp_units[i]      = (int) tmp_buf[k++];
-        (*win_ptr)->all_win_handles[i] = (MPI_Win) tmp_buf[k++];
-
-        if ((*win_ptr)->create_flavor != MPI_WIN_FLAVOR_SHARED) {
-            /* If create flavor is not MPI_WIN_FLAVOR_SHARED, all processes on this
-               window may not be on the same node. Because we only need the sizes of local
-               processes (in order), we copy their sizes to a seperate array and keep them
-               in order, fur purpose of future use of calculating shm_base_addrs. */
-            if ((*win_ptr)->comm_ptr->intranode_table[i] >= 0) {
-                MPIU_Assert((*win_ptr)->comm_ptr->intranode_table[i] < node_size);
-                node_sizes[(*win_ptr)->comm_ptr->intranode_table[i]] = (*win_ptr)->sizes[i];
-            }
-        }
-    }
 
     for (i = 0; i < node_size; i++) {
         if (noncontig)
@@ -555,18 +528,29 @@ static int MPIDI_CH3I_Win_allocate_shm(MPI_Aint size, int disp_unit, MPID_Info *
     (*win_ptr)->base = (*win_ptr)->shm_base_addrs[rank];
     }
 
+    MPIU_CHKLMEM_MALLOC(tmp_buf, MPI_Aint *, 4*comm_size*sizeof(MPI_Aint),
+                        mpi_errno, "tmp_buf");
+
     /* get the base addresses of the windows.  Note we reuse tmp_buf from above
        since it's at least as large as we need it for this allgather. */
-    tmp_buf[rank] = MPIU_PtrToAint((*win_ptr)->base);
+    tmp_buf[4*rank] = MPIU_PtrToAint((*win_ptr)->base);
+    tmp_buf[4*rank+1] = size;
+    tmp_buf[4*rank+2] = (MPI_Aint) disp_unit;
+    tmp_buf[4*rank+3] = (MPI_Aint) (*win_ptr)->handle;
 
     mpi_errno = MPIR_Allgather_impl(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
-                                    tmp_buf, 1, MPI_AINT,
+                                    tmp_buf, 4, MPI_AINT,
                                     (*win_ptr)->comm_ptr, &errflag);
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
     MPIU_ERR_CHKANDJUMP(errflag, mpi_errno, MPI_ERR_OTHER, "**coll_fail");
 
-    for (i = 0; i < comm_size; ++i)
-        (*win_ptr)->base_addrs[i] = MPIU_AintToPtr(tmp_buf[i]);
+    k = 0;
+    for (i = 0; i < comm_size; ++i) {
+        (*win_ptr)->base_addrs[i] = MPIU_AintToPtr(tmp_buf[k++]);
+        (*win_ptr)->sizes[i] = tmp_buf[k++];
+        (*win_ptr)->disp_units[i] = (int) tmp_buf[k++];
+        (*win_ptr)->all_win_handles[i] = (MPI_Win) tmp_buf[k++];
+    }
 
     *base_pp = (*win_ptr)->base;
 
diff --git a/src/mpid/ch3/src/ch3u_rma_ops.c b/src/mpid/ch3/src/ch3u_rma_ops.c
index f30c464..6fe601f 100644
--- a/src/mpid/ch3/src/ch3u_rma_ops.c
+++ b/src/mpid/ch3/src/ch3u_rma_ops.c
@@ -47,8 +47,12 @@ int MPIDI_Win_free(MPID_Win **win_ptr)
     MPIU_ERR_CHKANDJUMP((*win_ptr)->epoch_state != MPIDI_EPOCH_NONE,
                         mpi_errno, MPI_ERR_RMA_SYNC, "**rmasync");
 
-    mpi_errno = MPIDI_CH3I_Wait_for_pt_ops_finish(*win_ptr);
-    if(mpi_errno) MPIU_ERR_POP(mpi_errno);
+    if (!(*win_ptr)->shm_allocated) {
+        /* when SHM is allocated, we already waited for operation completion in
+         MPIDI_CH3_SHM_Win_free, so we do not need to do it again here. */
+        mpi_errno = MPIDI_CH3I_Wait_for_pt_ops_finish(*win_ptr);
+        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+    }
 
     comm_ptr = (*win_ptr)->comm_ptr;
     mpi_errno = MPIR_Comm_free_impl(comm_ptr);

http://git.mpich.org/mpich.git/commitdiff/b4effd55f83995e0c40c43b8644e5fb97a3be5f1

commit b4effd55f83995e0c40c43b8644e5fb97a3be5f1
Author: Rajeev Thakur <thakur at mcs.anl.gov>
Date:   Tue Feb 3 19:54:50 2015 -0600

    Fix MPI_Info_get to pass valuelen+1 to MPIU_Strncpy and check return
    code of MPIU_Strncpy. Added test program.
    Closes #2225
    
    Signed-off-by: William Gropp <wgropp at illinois.edu>
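
Editorial sketch (not part of the commit): the valuelen convention can be
illustrated with a small bounded copy. bounded_copy and get_value are
hypothetical stand-ins, not the MPICH routines. The sketch assumes a copy
helper that always null-terminates and reports truncation, which is the
behaviour the removed FIXME comment asks for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounded copy: writes at most n bytes including the
 * terminating null and returns nonzero if src had to be truncated. */
int bounded_copy(char *dest, const char *src, size_t n)
{
    size_t i;

    assert(dest != NULL && src != NULL);
    if (n == 0)
        return 1;
    for (i = 0; i + 1 < n && src[i] != '\0'; ++i)
        dest[i] = src[i];
    dest[i] = '\0';
    return src[i] != '\0';
}

/* Hypothetical getter following the valuelen convention: the caller's
 * buffer holds valuelen + 1 bytes, so the copy limit is valuelen + 1. */
int get_value(const char *stored, int valuelen, char *value)
{
    return bounded_copy(value, stored, (size_t) valuelen + 1);
}
```

With a 4-byte buffer and valuelen 3, get_value("val", 3, buf) copies the whole
string. Passing valuelen itself as the limit would leave room for only two
characters plus the terminator.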

diff --git a/src/include/mpiimpl.h b/src/include/mpiimpl.h
index 58666fd..ee297bb 100644
--- a/src/include/mpiimpl.h
+++ b/src/include/mpiimpl.h
@@ -4351,7 +4351,7 @@ int MPIR_Cart_map_impl(const MPID_Comm *comm_ptr, int ndims, const int dims[],
                        const int periodic[], int *newrank);
 int MPIR_Close_port_impl(const char *port_name);
 int MPIR_Open_port_impl(MPID_Info *info_ptr, char *port_name);
-void MPIR_Info_get_impl(MPID_Info *info_ptr, const char *key, int valuelen, char *value, int *flag);
+int MPIR_Info_get_impl(MPID_Info *info_ptr, const char *key, int valuelen, char *value, int *flag);
 void MPIR_Info_get_nkeys_impl(MPID_Info *info_ptr, int *nkeys);
 int MPIR_Info_get_nthkey_impl(MPID_Info *info, int n, char *key);
 void MPIR_Info_get_valuelen_impl(MPID_Info *info_ptr, const char *key, int *valuelen, int *flag);
diff --git a/src/mpi/info/info_get.c b/src/mpi/info/info_get.c
index 690c38a..cae3f46 100644
--- a/src/mpi/info/info_get.c
+++ b/src/mpi/info/info_get.c
@@ -29,28 +29,35 @@ int MPI_Info_get(MPI_Info info, const char *key, int valuelen, char *value, int
 #define FUNCNAME MPIR_Info_get_impl
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
-void MPIR_Info_get_impl(MPID_Info *info_ptr, const char *key, int valuelen, char *value, int *flag)
+int MPIR_Info_get_impl(MPID_Info *info_ptr, const char *key, int valuelen, char *value, int *flag)
 {
     MPID_Info *curr_ptr;
+    int err=0, mpi_errno=0;
+
     curr_ptr = info_ptr->next;
     *flag = 0;
 
     while (curr_ptr) {
         if (!strncmp(curr_ptr->key, key, MPI_MAX_INFO_KEY)) {
-            MPIU_Strncpy(value, curr_ptr->value, valuelen);
-            /* The following is problematic - if the user passes the
-               declared length, then this will access memory one
-               passed that point */
-            /* FIXME: The real fix is to change MPIU_Strncpy to
-               set the null at the end (always!) and return an error
-               if it had to truncate the result. */
-            /* value[valuelen] = '\0'; */
+            err = MPIU_Strncpy(value, curr_ptr->value, valuelen+1);
+            /* +1 because the MPI Standard says "In C, valuelen
+             * (passed to MPI_Info_get) should be one less than the
+             * amount of allocated space to allow for the null
+             * terminator*/
             *flag = 1;
             break;
         }
         curr_ptr = curr_ptr->next;
     }
-    return;
+
+    /* --BEGIN ERROR HANDLING-- */
+    if (err != 0)
+    {
+        mpi_errno = MPIR_Err_create_code(MPI_SUCCESS, MPIR_ERR_RECOVERABLE, FCNAME, __LINE__, MPI_ERR_INFO_VALUE, "**infovallong", NULL);
+    }
+    /* --END ERROR HANDLING-- */
+
+    return mpi_errno;
 }
 
 #endif
@@ -61,7 +68,7 @@ void MPIR_Info_get_impl(MPID_Info *info_ptr, const char *key, int valuelen, char
 Input Parameters:
 + info - info object (handle)
 . key - key (string)
-- valuelen - length of value argument (integer)
+- valuelen - length of value argument, not including null terminator (integer)
 
 Output Parameters:
 + value - value (string)
@@ -139,8 +146,9 @@ int MPI_Info_get(MPI_Info info, const char *key, int valuelen, char *value,
 #   endif /* HAVE_ERROR_CHECKING */
 
     /* ... body of routine ...  */
-    MPIR_Info_get_impl(info_ptr, key, valuelen, value, flag);
+    mpi_errno = MPIR_Info_get_impl(info_ptr, key, valuelen, value, flag);
     /* ... end of body of routine ... */
+    if (mpi_errno) goto fn_fail;
 
 #ifdef HAVE_ERROR_CHECKING
   fn_exit:
diff --git a/test/mpi/info/Makefile.am b/test/mpi/info/Makefile.am
index 956fa90..1bc23a8 100644
--- a/test/mpi/info/Makefile.am
+++ b/test/mpi/info/Makefile.am
@@ -20,4 +20,5 @@ noinst_PROGRAMS = \
     infomany      \
     infomany2     \
     infotest      \
+    infoget      \
     infoenv
diff --git a/test/mpi/info/infoget.c b/test/mpi/info/infoget.c
new file mode 100644
index 0000000..cdc0f1d
--- /dev/null
+++ b/test/mpi/info/infoget.c
@@ -0,0 +1,42 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *  (C) 2001 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+
+/* Test code provided by Hajime Fujita. See Trac ticket #2225. */
+
+#include "mpi.h"
+#include <stdio.h>
+#include "mpitest.h"
+#include <string.h>
+
+int main(int argc, char *argv[])
+{
+    MPI_Info info;
+    const char *key = "key", *val = "val";
+    char buff[3 + 1]; /* strlen("val") + 1 */
+    int flag, errs=0;
+
+    MTest_Init(&argc, &argv);
+
+    MPI_Info_create(&info);
+    MPI_Info_set(info, key, val);
+    MPI_Info_get(info, key, sizeof(buff)-1, buff, &flag);
+    if (flag) {
+        if (strncmp(buff, val, sizeof(buff)-1) != 0) {
+            errs++;
+            printf("returned value is %s, should be %s\n", buff, val);
+        }
+    }
+    else {
+        errs++;
+        printf("key not found\n");
+    }
+    MPI_Info_free(&info);
+
+    MTest_Finalize(errs);
+    MPI_Finalize();
+
+    return 0;
+}
diff --git a/test/mpi/info/testlist b/test/mpi/info/testlist
index 724a6a2..b174c00 100644
--- a/test/mpi/info/testlist
+++ b/test/mpi/info/testlist
@@ -5,4 +5,5 @@ infoorder 1
 infomany 1
 infomany2 1
 infotest 1
+infoget 1
 infoenv 1 mpiversion=3.0

http://git.mpich.org/mpich.git/commitdiff/82c14ecd3ae8aa9b66db6ede46ec79fb138ec851

commit 82c14ecd3ae8aa9b66db6ede46ec79fb138ec851
Author: Wesley Bland <wbland at anl.gov>
Date:   Wed Feb 4 13:25:34 2015 -0600

    Add weak symbols for MPIX functions
    
    Some of the MPIX functions did not have weak symbols set up correctly
    which causes problems on some compilers (Pathscale). This patch adds the
    correct attribute for all of them that were missing.
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>
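
Editorial sketch (not part of the commit): the general shape of a weak alias,
assuming GCC or Clang on an ELF platform where the alias attribute is
supported. PDEMO_answer and DEMO_answer are hypothetical names chosen for the
illustration. The profiling-style name carries the definition and the public
name is declared as a weak alias of it.

```c
/* The "P"-prefixed name carries the real definition. */
int PDEMO_answer(int x)
{
    return x + 1;
}

/* The public name is a weak alias of the definition above. A tool that
 * supplies its own strong DEMO_answer overrides this symbol at link time
 * and can still call PDEMO_answer directly. */
int DEMO_answer(int x) __attribute__((weak, alias("PDEMO_answer")));
```

The alias target must be a symbol defined in the same translation unit.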

diff --git a/src/mpi/comm/comm_agree.c b/src/mpi/comm/comm_agree.c
index 49a71d7..93cf9c5 100644
--- a/src/mpi/comm/comm_agree.c
+++ b/src/mpi/comm/comm_agree.c
@@ -15,6 +15,8 @@
 #pragma _HP_SECONDARY_DEF PMPIX_Comm_agree  MPIX_Comm_agree
 #elif defined(HAVE_PRAGMA_CRI_DUP)
 #pragma _CRI duplicate MPIX_Comm_agree as PMPIX_Comm_agree
+#elif defined(HAVE_WEAK_ATTRIBUTE)
+int MPIX_Comm_agree(MPI_Comm comm, int *flag) __attribute__((weak,alias("PMPIX_Comm_agree")));
 #endif
 /* -- End Profiling Symbol Block */
 
diff --git a/src/mpi/comm/comm_failure_ack.c b/src/mpi/comm/comm_failure_ack.c
index 4337eb3..14c929e 100644
--- a/src/mpi/comm/comm_failure_ack.c
+++ b/src/mpi/comm/comm_failure_ack.c
@@ -15,6 +15,8 @@
 #pragma _HP_SECONDARY_DEF PMPIX_Comm_failure_ack  MPIX_Comm_failure_ack
 #elif defined(HAVE_PRAGMA_CRI_DUP)
 #pragma _CRI duplicate MPIX_Comm_failure_ack as PMPIX_Comm_failure_ack
+#elif defined(HAVE_WEAK_ATTRIBUTE)
+int MPIX_Comm_failure_ack( MPI_Comm comm ) __attribute__((weak,alias("PMPIX_Comm_failure_ack")));
 #endif
 /* -- End Profiling Symbol Block */
 
diff --git a/src/mpi/comm/comm_failure_get_acked.c b/src/mpi/comm/comm_failure_get_acked.c
index aac0c95..2289a89 100644
--- a/src/mpi/comm/comm_failure_get_acked.c
+++ b/src/mpi/comm/comm_failure_get_acked.c
@@ -15,6 +15,8 @@
 #pragma _HP_SECONDARY_DEF PMPIX_Comm_failure_get_acked  MPIX_Comm_failure_get_acked
 #elif defined(HAVE_PRAGMA_CRI_DUP)
 #pragma _CRI duplicate MPIX_Comm_failure_get_acked as PMPIX_Comm_failure_get_acked
+#elif defined(HAVE_WEAK_ATTRIBUTE)
+int MPIX_Comm_failure_get_acked( MPI_Comm comm, MPI_Group *failedgrp ) __attribute__((weak,alias("PMPIX_Comm_failure_get_acked")));
 #endif
 /* -- End Profiling Symbol Block */
 
diff --git a/src/mpi/comm/comm_revoke.c b/src/mpi/comm/comm_revoke.c
index 7a5154f..d134e67 100644
--- a/src/mpi/comm/comm_revoke.c
+++ b/src/mpi/comm/comm_revoke.c
@@ -17,6 +17,8 @@
 #pragma _HP_SECONDARY_DEF PMPIX_Comm_revoke  MPIX_Comm_revoke
 #elif defined(HAVE_PRAGMA_CRI_DUP)
 #pragma _CRI duplicate MPIX_Comm_revoke as PMPIX_Comm_revoke
+#elif defined(HAVE_WEAK_ATTRIBUTE)
+int MPIX_Comm_revoke(MPI_Comm comm) __attribute__((weak,alias("PMPIX_Comm_revoke")));
 #endif
 /* -- End Profiling Symbol Block */
 
diff --git a/src/mpi/comm/comm_shrink.c b/src/mpi/comm/comm_shrink.c
index 844d617..a8a70e2 100644
--- a/src/mpi/comm/comm_shrink.c
+++ b/src/mpi/comm/comm_shrink.c
@@ -29,6 +29,8 @@
 #pragma _HP_SECONDARY_DEF PMPIX_Comm_shrink  MPIX_Comm_shrink
 #elif defined(HAVE_PRAGMA_CRI_DUP)
 #pragma _CRI duplicate MPIX_Comm_shrink as PMPIX_Comm_shrink
+#elif defined(HAVE_WEAK_ATTRIBUTE)
+int MPIX_Comm_shrink(MPI_Comm comm, MPI_Comm *newcomm) __attribute__((weak,alias("PMPIX_Comm_shrink")));
 #endif
 /* -- End Profiling Symbol Block */
 
diff --git a/src/mpi/pt2pt/greq_start.c b/src/mpi/pt2pt/greq_start.c
index 3bd9a77..5ec7075 100644
--- a/src/mpi/pt2pt/greq_start.c
+++ b/src/mpi/pt2pt/greq_start.c
@@ -230,6 +230,13 @@ int MPI_Grequest_start( MPI_Grequest_query_function *query_fn,
 #pragma _HP_SECONDARY_DEF PMPIX_Grequest_class_create MPIX_Grequest_class_create
 #elif defined(HAVE_PRAGMA_CRI_DUP)
 #pragma _CRI duplicate MPIX_Grequest_class_create as PMPIX_Grequest_class_create
+#elif defined(HAVE_WEAK_ATTRIBUTE)
+int MPIX_Grequest_class_create(MPI_Grequest_query_function *query_fn,
+        MPI_Grequest_free_function *free_fn,
+        MPI_Grequest_cancel_function *cancel_fn,
+        MPIX_Grequest_poll_function *poll_fn,
+        MPIX_Grequest_wait_function *wait_fn,
+        MPIX_Grequest_class *greq_class) __attribute__((weak,alias("MPIX_Grequest_class_create")));
 #endif
 /* -- End Profiling Symbol Block */
 
@@ -320,6 +327,8 @@ fn_fail:
 #pragma _HP_SECONDARY_DEF PMPI_Grequest_class_allocate MPIX_Grequest_class_allocate
 #elif defined(HAVE_PRAGMA_CRI_DUP)
 #pragma _CRI duplicate MPIX_Grequest_class_allocate as PMPIX_Grequest_class_allocate
+#elif defined(HAVE_WEAK_ATTRIBUTE)
+int MPIX_Grequest_class_allocate(MPIX_Grequest_class greq_class, void *extra_state, MPI_Request *request) __attribute__((weak,alias("MPIX_Grequest_class_allocate")));
 #endif
 /* -- End Profiling Symbol Block */
 
@@ -363,6 +372,14 @@ int MPIX_Grequest_class_allocate(MPIX_Grequest_class greq_class,
 #pragma _HP_SECONDARY_DEF PMPI_Grequest_start MPIX_Grequest_start
 #elif defined(HAVE_PRAGMA_CRI_DUP)
 #pragma _CRI duplicate MPIX_Grequest_start as PMPIX_Grequest_start
+#elif defined(HAVE_WEAK_ATTRIBUTE)
+int MPIX_Grequest_start( MPI_Grequest_query_function *query_fn,
+        MPI_Grequest_free_function *free_fn,
+        MPI_Grequest_cancel_function *cancel_fn,
+        MPIX_Grequest_poll_function *poll_fn,
+        MPIX_Grequest_wait_function *wait_fn,
+        void *extra_state,
+        MPI_Request *request ) __attribute__((weak,alias("MPIX_Grequest_start")));
 #endif
 /* -- End Profiling Symbol Block */
 
diff --git a/src/mutex/mutex_create.c b/src/mutex/mutex_create.c
index 5bae747..eec39c1 100644
--- a/src/mutex/mutex_create.c
+++ b/src/mutex/mutex_create.c
@@ -22,6 +22,8 @@
 #pragma _HP_SECONDARY_DEF PMPIX_Mutex_create  MPIX_Mutex_create
 #elif defined(HAVE_PRAGMA_CRI_DUP)
 #pragma _CRI duplicate MPIX_Mutex_create as PMPIX_Mutex_create
+#elif defined(HAVE_WEAK_ATTRIBUTE)
+int MPIX_Mutex_create(int my_count, MPI_Comm comm, MPIX_Mutex * hdl_out) __attribute__((weak,alias("MPIX_Mutex_create")));
 #endif
 /* -- End Profiling Symbol Block */
 
diff --git a/src/mutex/mutex_free.c b/src/mutex/mutex_free.c
index 767c00b..79715ca 100644
--- a/src/mutex/mutex_free.c
+++ b/src/mutex/mutex_free.c
@@ -19,6 +19,8 @@
 #pragma _HP_SECONDARY_DEF PMPIX_Mutex_free  MPIX_Mutex_free
 #elif defined(HAVE_PRAGMA_CRI_DUP)
 #pragma _CRI duplicate MPIX_Mutex_free as PMPIX_Mutex_free
+#elif defined(HAVE_WEAK_ATTRIBUTE)
+int MPIX_Mutex_free(MPIX_Mutex * hdl_ptr) __attribute__((weak,alias("MPIX_Mutex_free")));
 #endif
 /* -- End Profiling Symbol Block */
 
diff --git a/src/mutex/mutex_lock.c b/src/mutex/mutex_lock.c
index d6f9836..257e80f 100644
--- a/src/mutex/mutex_lock.c
+++ b/src/mutex/mutex_lock.c
@@ -20,6 +20,8 @@
 #pragma _HP_SECONDARY_DEF PMPIX_Mutex_lock  MPIX_Mutex_lock
 #elif defined(HAVE_PRAGMA_CRI_DUP)
 #pragma _CRI duplicate MPIX_Mutex_lock as PMPIX_Mutex_lock
+#elif defined(HAVE_WEAK_ATTRIBUTE)
+int MPIX_Mutex_lock(MPIX_Mutex hdl, int mutex, int proc) __attribute__((weak,alias("MPIX_Mutex_lock")));
 #endif
 /* -- End Profiling Symbol Block */
 
diff --git a/src/mutex/mutex_unlock.c b/src/mutex/mutex_unlock.c
index 5dd67d9..722f8b9 100644
--- a/src/mutex/mutex_unlock.c
+++ b/src/mutex/mutex_unlock.c
@@ -20,6 +20,8 @@
 #pragma _HP_SECONDARY_DEF PMPIX_Mutex_unlock  MPIX_Mutex_unlock
 #elif defined(HAVE_PRAGMA_CRI_DUP)
 #pragma _CRI duplicate MPIX_Mutex_unlock as PMPIX_Mutex_unlock
+#elif defined(HAVE_WEAK_ATTRIBUTE)
+int MPIX_Mutex_unlock(MPIX_Mutex hdl, int mutex, int proc) __attribute__((weak,alias("MPIX_Mutex_unlock")));
 #endif
 /* -- End Profiling Symbol Block */
 

http://git.mpich.org/mpich.git/commitdiff/92be121b56c15aebaed5f8bdf0053de6a46af692

commit 92be121b56c15aebaed5f8bdf0053de6a46af692
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Mon Feb 2 15:57:15 2015 -0600

    portals4: simplify send callback
    
    Merges the existing send callbacks into a single function. Uses the
    completion counter to track remaining operations and complete the
    request once finished.
    
    Signed-off-by: Antonio Pena Monferrer <apenya at mcs.anl.gov>
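
Editorial sketch (not part of the commit): the completion-counter idea can be
modelled without Portals. The demo_req type and the demo_* functions are
hypothetical. The sketch assumes the counter starts at 1 for the initial put
and is incremented once for each posted get, so a single handler can serve
every event.

```c
#include <assert.h>

/* Hypothetical request; only the completion counter matters here. */
struct demo_req {
    int cc;         /* outstanding operations, 1 for the initial put */
    int released;   /* set once resources have been released */
};

void demo_req_init(struct demo_req *r)
{
    r->cc = 1;
    r->released = 0;
}

/* Called once for each get operation posted on behalf of the request. */
void demo_add_get(struct demo_req *r)
{
    assert(r->cc > 0);   /* the request must still be incomplete */
    r->cc++;
}

/* Single handler for both send and get events. */
void demo_handler(struct demo_req *r)
{
    assert(r->cc > 0);
    if (r->cc == 1) {
        r->released = 1;   /* last event: release resources ... */
        r->cc = 0;         /* ... and complete the request */
    } else {
        r->cc--;           /* more events are still expected */
    }
}
```

Because the counter already records how many events remain, separate state
such as a put_done flag or a per-request get countdown is no longer needed to
decide when to complete.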

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
index d3f4d91..4ee5de1 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
@@ -13,7 +13,7 @@
 #define FCNAME MPIU_QUOTE(FUNCNAME)
 static void big_meappend(void *buf, ptl_size_t left_to_send, MPIDI_VC_t *vc, ptl_match_bits_t match_bits, MPID_Request *sreq)
 {
-    int i, ret;
+    int i, ret, was_incomplete;
     MPID_nem_ptl_vc_area *vc_ptl;
     ptl_me_t me;
 
@@ -45,77 +45,54 @@ static void big_meappend(void *buf, ptl_size_t left_to_send, MPIDI_VC_t *vc, ptl
                                      &REQ_PTL(sreq)->get_me_p[i]);
         DBG_MSG_MEAPPEND("CTL", vc->pg_rank, me, sreq);
         MPIU_Assert(ret == 0);
+        /* increment the cc for each get operation */
+        MPIDI_CH3U_Request_increment_cc(sreq, &was_incomplete);
+        MPIU_Assert(was_incomplete);
+        REQ_PTL(sreq)->num_gets++;
 
         /* account for what has been sent */
         me.start = (char *)me.start + me.length;
         left_to_send -= me.length;
-        REQ_PTL(sreq)->num_gets++;
     }
 }
 
 #undef FUNCNAME
-#define FUNCNAME handler_send_complete
+#define FUNCNAME handler_send
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
-static int handler_send_complete(const ptl_event_t *e)
+static int handler_send(const ptl_event_t *e)
 {
     int mpi_errno = MPI_SUCCESS;
     MPID_Request *const sreq = e->user_ptr;
-    int ret;
-    int i;
-    MPIDI_STATE_DECL(MPID_STATE_HANDLER_SEND_COMPLETE);
+    int i, ret, incomplete;
+
+    MPIDI_STATE_DECL(MPID_STATE_HANDLER_SEND);
 
-    MPIDI_FUNC_ENTER(MPID_STATE_HANDLER_SEND_COMPLETE);
+    MPIDI_FUNC_ENTER(MPID_STATE_HANDLER_SEND);
 
     MPIU_Assert(e->type == PTL_EVENT_SEND || e->type == PTL_EVENT_GET);
 
-    if (REQ_PTL(sreq)->md != PTL_INVALID_HANDLE) {
-        ret = PtlMDRelease(REQ_PTL(sreq)->md);
-        MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmdrelease", "**ptlmdrelease %s", MPID_nem_ptl_strerror(ret));
-    }
+    /* if we are done, release all resources and complete the request */
+    if (sreq->cc == 1) {
+        if (REQ_PTL(sreq)->md != PTL_INVALID_HANDLE) {
+            ret = PtlMDRelease(REQ_PTL(sreq)->md);
+            MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmdrelease", "**ptlmdrelease %s", MPID_nem_ptl_strerror(ret));
+        }
 
-    for (i = 0; i < MPID_NEM_PTL_NUM_CHUNK_BUFFERS; ++i)
-        if (REQ_PTL(sreq)->chunk_buffer[i])
-            MPIU_Free(REQ_PTL(sreq)->chunk_buffer[i]);
+        for (i = 0; i < MPID_NEM_PTL_NUM_CHUNK_BUFFERS; ++i)
+            if (REQ_PTL(sreq)->chunk_buffer[i])
+                MPIU_Free(REQ_PTL(sreq)->chunk_buffer[i]);
 
-    if (REQ_PTL(sreq)->get_me_p)
-        MPIU_Free(REQ_PTL(sreq)->get_me_p);
+        if (REQ_PTL(sreq)->get_me_p)
+            MPIU_Free(REQ_PTL(sreq)->get_me_p);
     
-    MPIDI_CH3U_Request_complete(sreq);
-
- fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_HANDLER_SEND_COMPLETE);
-    return mpi_errno;
- fn_fail:
-    goto fn_exit;
-}
-
-#undef FUNCNAME
-#define FUNCNAME handler_large
-#undef FCNAME
-#define FCNAME MPIU_QUOTE(FUNCNAME)
-static int handler_large(const ptl_event_t *e)
-{
-    int mpi_errno = MPI_SUCCESS;
-    MPID_Request *const sreq = e->user_ptr;
-    MPIDI_STATE_DECL(MPID_STATE_HANDLER_LARGE);
-
-    MPIDI_FUNC_ENTER(MPID_STATE_HANDLER_LARGE);
-
-    MPIU_Assert(e->type == PTL_EVENT_SEND || e->type == PTL_EVENT_GET);
-
-    if (e->type == PTL_EVENT_SEND) {
-        REQ_PTL(sreq)->put_done = 1;
-    } else if (e->type == PTL_EVENT_GET) {
-        /* decrement the remaining get operations */
-        REQ_PTL(sreq)->num_gets--;
+        MPIDI_CH3U_Request_complete(sreq);
+    } else {
+        MPIDI_CH3U_Request_decrement_cc(sreq, &incomplete);
     }
 
-    if (REQ_PTL(sreq)->num_gets == 0 && REQ_PTL(sreq)->put_done)
-        mpi_errno = handler_send_complete(e);
-
  fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_HANDLER_LARGE);
+    MPIDI_FUNC_EXIT(MPID_STATE_HANDLER_SEND);
     return mpi_errno;
  fn_fail:
     goto fn_exit;
@@ -163,7 +140,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
         /* Small message.  Send all data eagerly */
         if (dt_contig) {
             MPIU_DBG_MSG(CH3_CHANNEL, VERBOSE, "Small contig message");
-            REQ_PTL(sreq)->event_handler = handler_send_complete;
+            REQ_PTL(sreq)->event_handler = handler_send;
             MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "&REQ_PTL(sreq)->event_handler = %p", &(REQ_PTL(sreq)->event_handler));
             ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)((char *)buf + dt_true_lb), data_sz, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->pt,
                          NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
@@ -201,7 +178,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
             ret = PtlMDBind(MPIDI_nem_ptl_ni, &md, &REQ_PTL(sreq)->md);
             MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmdbind", "**ptlmdbind %s", MPID_nem_ptl_strerror(ret));
                 
-            REQ_PTL(sreq)->event_handler = handler_send_complete;
+            REQ_PTL(sreq)->event_handler = handler_send;
             ret = MPID_nem_ptl_rptl_put(REQ_PTL(sreq)->md, 0, data_sz, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->pt,
                          NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
                                         NPTL_HEADER(ssend_flag, data_sz));
@@ -218,7 +195,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
         last = data_sz;
         MPID_Segment_pack(sreq->dev.segment_ptr, sreq->dev.segment_first, &last, REQ_PTL(sreq)->chunk_buffer[0]);
         MPIU_Assert(last == sreq->dev.segment_size);
-        REQ_PTL(sreq)->event_handler = handler_send_complete;
+        REQ_PTL(sreq)->event_handler = handler_send;
         ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(sreq)->chunk_buffer[0], data_sz, PTL_NO_ACK_REQ,
                      vc_ptl->id, vc_ptl->pt, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
                                     NPTL_HEADER(ssend_flag, data_sz));
@@ -235,7 +212,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
                      NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), sreq);
         REQ_PTL(sreq)->large = TRUE;
 
-        REQ_PTL(sreq)->event_handler = handler_large;
+        REQ_PTL(sreq)->event_handler = handler_send;
         ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)((char *)buf + dt_true_lb), PTL_LARGE_THRESHOLD, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->pt,
                      NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
                                     NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz));
@@ -273,6 +250,8 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
 
             if (last == sreq->dev.segment_size && last <= MPIDI_nem_ptl_ni_limits.max_msg_size + PTL_LARGE_THRESHOLD) {
                 /* Entire message fit in one IOV */
+                int was_incomplete;
+
                 MPIU_DBG_MSG(CH3_CHANNEL, VERBOSE, "    rest of message fits in one IOV");
                 /* Create ME for remaining data */
                 me.start = &sreq->dev.iov[initial_iov_count];
@@ -288,11 +267,14 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
 
                 MPIU_CHKPMEM_MALLOC(REQ_PTL(sreq)->get_me_p, ptl_handle_me_t *, sizeof(ptl_handle_me_t), mpi_errno, "get_me_p");
 
-                REQ_PTL(sreq)->num_gets = 1;
                 ret = MPID_nem_ptl_me_append(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, sreq,
                                              &REQ_PTL(sreq)->get_me_p[0]);
                 MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));
                 DBG_MSG_MEAPPEND("CTL", vc->pg_rank, me, sreq);
+                /* increment the cc for the get operation */
+                MPIDI_CH3U_Request_increment_cc(sreq, &was_incomplete);
+                MPIU_Assert(was_incomplete);
+                REQ_PTL(sreq)->num_gets = 1;
 
                 /* Create MD for first chunk */
                 md.start = sreq->dev.iov;
@@ -305,7 +287,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
 
                 REQ_PTL(sreq)->large = TRUE;
 
-                REQ_PTL(sreq)->event_handler = handler_large;
+                REQ_PTL(sreq)->event_handler = handler_send;
                 ret = MPID_nem_ptl_rptl_put(REQ_PTL(sreq)->md, 0, PTL_LARGE_THRESHOLD, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->pt,
                              NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
                                             NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz));
@@ -329,7 +311,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
                  NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), sreq);
     REQ_PTL(sreq)->large = TRUE;
 
-    REQ_PTL(sreq)->event_handler = handler_large;
+    REQ_PTL(sreq)->event_handler = handler_send;
     ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(sreq)->chunk_buffer[0], PTL_LARGE_THRESHOLD,
                                 PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->pt, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank),
                                 0, sreq, NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz));

http://git.mpich.org/mpich.git/commitdiff/527b159b2298a3d4c5764b249f6d1cbae123ebf3

commit 527b159b2298a3d4c5764b249f6d1cbae123ebf3
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Mon Feb 2 10:32:03 2015 -0600

    portals4: remove dead code
    
    Signed-off-by: Antonio Pena Monferrer <apenya at mcs.anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
index e49546f..d3f4d91 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
@@ -121,99 +121,6 @@ static int handler_large(const ptl_event_t *e)
     goto fn_exit;
 }
 
-#if 0
-
-#undef FUNCNAME
-#define FUNCNAME handler_pack_chunk
-#undef FCNAME
-#define FCNAME MPIU_QUOTE(FUNCNAME)
-static int handler_pack_chunk(const ptl_event_t *e)
-{
-    int mpi_errno = MPI_SUCCESS;
-    MPID_Request *const sreq = e->user_ptr;
-    MPIDI_STATE_DECL(MPID_STATE_HANDLER_PACK_CHUNK);
-
-    MPIDI_FUNC_ENTER(MPID_STATE_HANDLER_PACK_CHUNK);
-
-    MPIU_Assert(e->type == PTL_EVENT_GET || e->type == PTL_EVENT_PUT);
-
-    if (e->type == PTL_EVENT_PUT) {
-        mpi_errno = handler_send_complete(e);
-        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-        goto fn_exit;
-    }
-
-    /* pack next chunk */
-    MPI_nem_ptl_pack_byte(sreq->dev.segment_ptr, sreq->dev.segment_first, sreq->dev.segment_first + PTL_LARGE_THRESHOLD,
-              REQ_PTL(sreq_)->chunk_buffer[1], &REQ_PTL(sreq)->overflow[1]);
-    sreq->dev.segment_first += PTL_LARGE_THRESHOLD;
-
-    /* notify receiver */
-    ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, 0, 0, PTL_ACK_REQ, vc_ptl->id,
-                 vc_ptl->pt, ?????, 0, sreq,
-                 NPTL_HEADER(?????, MPIDI_Process.my_pg_rank, me.match_bits));
-
-
- fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_HANDLER_PACK_CHUNK);
-    return mpi_errno;
- fn_fail:
-    goto fn_exit;
-}
-#undef FUNCNAME
-#define FUNCNAME handler_multi_put
-#undef FCNAME
-#define FCNAME MPIU_QUOTE(FUNCNAME)
-static int handler_multi_put(const ptl_event_t *e)
-{
-    int mpi_errno = MPI_SUCCESS;
-    MPID_Request *const sreq = e->user_ptr;
-    MPIDI_STATE_DECL(MPID_STATE_HANDLER_MULTI_PUT);
-
-    MPIDI_FUNC_ENTER(MPID_STATE_HANDLER_MULTI_PUT);
-
-    
-    
-
- fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_HANDLER_MULTI_PUT);
-    return mpi_errno;
- fn_fail:
-    goto fn_exit;
-}
-
-
-#undef FUNCNAME
-#define FUNCNAME handler_large_multi
-#undef FCNAME
-#define FCNAME MPIU_QUOTE(FUNCNAME)
-static int handler_large_multi(const ptl_event_t *e)
-{
-    int mpi_errno = MPI_SUCCESS;
-    MPID_Request *const sreq = e->user_ptr;
-    MPIDI_STATE_DECL(MPID_STATE_HANDLER_LARGE_MULTI);
-
-    MPIU_Assert(e->type == PTL_EVENT_ACK);
-
-    MPIDI_FUNC_ENTER(MPID_STATE_HANDLER_LARGE_MULTI);
-    if (e->mlength < PTL_LARGE_THRESHOLD) {
-        /* truncated message */
-        mpi_errno = handler_send_complete(e);
-        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-    } else {
-        REQ_PTL(sreq)->event_handler = handler_pack_chunk;
-    }
-    
- fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_HANDLER_LARGE_MULTI);
-    return mpi_errno;
- fn_fail:
-    goto fn_exit;
-}
-
-#endif
-
-
 /* Send message for either isend or issend */
 #undef FUNCNAME
 #define FUNCNAME send_msg

http://git.mpich.org/mpich.git/commitdiff/a9ba4460efd9e1c717225b4d9bcb37d98db2118b

commit a9ba4460efd9e1c717225b4d9bcb37d98db2118b
Author: Rajeev Thakur <thakur at mcs.anl.gov>
Date:   Tue Jan 27 14:51:42 2015 -0600

    Changes the sticky lb/ub fields in resized types to 0, since the lb/ub
    set by type_create_resized are not sticky.
    
    Changes darray and subarray types to use type_create_resized instead
    of type_struct with explicit lb/ub, because explicit MPI_LB/MPI_UB
    have been removed from MPI in MPI-3 and they also cause other problems
    because they were defined to be sticky in MPI-1.
    
    Fixes type_create_struct, which was incorrectly setting lb and ub to
    true_lb and true_ub in the non-sticky case.
    
    Closes #2218
    Closes #2220
    Closes #2224
    
    Signed-off-by: Rob Latham <robl at mcs.anl.gov>

diff --git a/src/mpi/datatype/type_create_darray.c b/src/mpi/datatype/type_create_darray.c
index 0047f38..5c85f47 100644
--- a/src/mpi/datatype/type_create_darray.c
+++ b/src/mpi/datatype/type_create_darray.c
@@ -197,7 +197,7 @@ PMPI_LOCAL int MPIR_Type_cyclic(const int *array_of_gsizes,
     int mpi_errno,blksize, i, blklens[3], st_index, end_index,
 	local_size, rem, count;
     MPI_Aint stride, disps[3];
-    MPI_Datatype type_tmp, types[3];
+    MPI_Datatype type_tmp, type_indexed, types[3];
 
     if (darg == MPI_DISTRIBUTE_DFLT_DARG) blksize = 1;
     else blksize = darg;
@@ -281,18 +281,30 @@ PMPI_LOCAL int MPIR_Type_cyclic(const int *array_of_gsizes,
     if (((order == MPI_ORDER_FORTRAN) && (dim == 0)) ||
 	((order == MPI_ORDER_C) && (dim == ndims-1)))
     {
-        types[0] = MPI_LB;
         disps[0] = 0;
-        types[1] = *type_new;
         disps[1] = (MPI_Aint) rank * (MPI_Aint) blksize * orig_extent;
-        types[2] = MPI_UB;
         disps[2] = orig_extent * (MPI_Aint)(array_of_gsizes[dim]);
-        blklens[0] = blklens[1] = blklens[2] = 1;
-        mpi_errno = MPID_Type_struct(3,
-				     blklens,
-				     disps,
-				     types,
-				     &type_tmp);
+
+/* Instead of using MPI_LB/MPI_UB, which have been removed from MPI in MPI-3,
+   use MPI_Type_create_resized. Use hindexed_block to set the starting displacement
+   of the datatype (disps[1]) and type_create_resized to set lb to 0 (disps[0])
+   and extent to disps[2], which makes ub = disps[2].
+ */
+        mpi_errno = MPID_Type_blockindexed(1, 1, &disps[1],
+                                           1, /* 1 means disp is in bytes */
+                                           *type_new, &type_indexed);
+
+	/* --BEGIN ERROR HANDLING-- */
+	if (mpi_errno != MPI_SUCCESS)
+	{
+	    mpi_errno = MPIR_Err_create_code(mpi_errno, MPIR_ERR_RECOVERABLE, FCNAME, __LINE__, MPI_ERR_OTHER, "**fail", 0);
+	    return mpi_errno;
+	}
+	/* --END ERROR HANDLING-- */
+
+        mpi_errno = MPID_Type_create_resized(type_indexed, 0, disps[2], &type_tmp);
+
+        MPIR_Type_free_impl(&type_indexed);
         MPIR_Type_free_impl(type_new);
         *type_new = type_tmp;
 
@@ -364,9 +376,9 @@ int MPI_Type_create_darray(int size,
     int mpi_errno = MPI_SUCCESS, i;
     MPI_Datatype new_handle;
 
-    int procs, tmp_rank, tmp_size, blklens[3], *coords;
+    int procs, tmp_rank, tmp_size, *coords;
     MPI_Aint *st_offsets, orig_extent, disps[3];
-    MPI_Datatype type_old, type_new = MPI_DATATYPE_NULL, types[3];
+    MPI_Datatype type_old, type_new = MPI_DATATYPE_NULL, tmp_type;
 
 #   ifdef HAVE_ERROR_CHECKING
     MPI_Aint   size_with_aint;
@@ -666,20 +678,27 @@ int MPI_Type_create_darray(int size,
     for (i=0; i<ndims; i++) disps[2] *= (MPI_Aint)(array_of_gsizes[i]);
 	
     disps[0] = 0;
-    blklens[0] = blklens[1] = blklens[2] = 1;
-    types[0] = MPI_LB;
-    types[1] = type_new;
-    types[2] = MPI_UB;
-
-    mpi_errno = MPID_Type_struct(3,
-				 blklens,
-				 disps,
-				 types,
-				 &new_handle);
+
+/* Instead of using MPI_LB/MPI_UB, which have been removed from MPI in MPI-3,
+   use MPI_Type_create_resized. Use hindexed_block to set the starting displacement
+   of the datatype (disps[1]) and type_create_resized to set lb to 0 (disps[0])
+   and extent to disps[2], which makes ub = disps[2].
+ */
+    mpi_errno = MPID_Type_blockindexed(1, 1, &disps[1],
+                                       1, /* 1 means disp is in bytes */
+                                       type_new, &tmp_type);
+
+    /* --BEGIN ERROR HANDLING-- */
+    if (mpi_errno != MPI_SUCCESS) goto fn_fail;
+    /* --END ERROR HANDLING-- */
+
+    mpi_errno = MPID_Type_create_resized(tmp_type, 0, disps[2], &new_handle);
+
     /* --BEGIN ERROR HANDLING-- */
     if (mpi_errno != MPI_SUCCESS) goto fn_fail;
     /* --END ERROR HANDLING-- */
 
+    MPIR_Type_free_impl(&tmp_type);
     MPIR_Type_free_impl(&type_new);
 
     /* at this point we have the new type, and we've cleaned up any
diff --git a/src/mpi/datatype/type_create_subarray.c b/src/mpi/datatype/type_create_subarray.c
index 0089f37..906f865 100644
--- a/src/mpi/datatype/type_create_subarray.c
+++ b/src/mpi/datatype/type_create_subarray.c
@@ -72,8 +72,7 @@ int MPI_Type_create_subarray(int ndims,
 
     /* these variables are from the original version in ROMIO */
     MPI_Aint size, extent, disps[3];
-    int blklens[3];
-    MPI_Datatype tmp1, tmp2, types[3];
+    MPI_Datatype tmp1, tmp2;
 
 #   ifdef HAVE_ERROR_CHECKING
     MPI_Aint   size_with_aint;
@@ -278,29 +277,23 @@ int MPI_Type_create_subarray(int ndims,
     for (i=0; i<ndims; i++) disps[2] *= (MPI_Aint)(array_of_sizes[i]);
 
     disps[0] = 0;
-    blklens[0] = blklens[1] = blklens[2] = 1;
-    types[0] = MPI_LB;
-    types[1] = tmp1;
-    types[2] = MPI_UB;
-
-    /* TODO:
-     * if we were to do all this as an mpid function, we could just
-     * directly adjust the LB and UB in the MPID_Datatype structure
-     * instead of jumping through this hoop.
-     *
-     * i suppose we could do the same thing here...
-     *
-     * another alternative would be to use MPID_Type_create_resized()
-     * instead of building the struct.  that would also be cleaner.
-     */
-    mpi_errno = MPID_Type_struct(3,
-				 blklens,
-				 disps,
-				 types,
-				 &new_handle);
+
+/* Instead of using MPI_LB/MPI_UB, which have been removed from MPI in MPI-3,
+   use MPI_Type_create_resized. Use hindexed_block to set the starting displacement
+   of the datatype (disps[1]) and type_create_resized to set lb to 0 (disps[0])
+   and extent to disps[2], which makes ub = disps[2].
+ */
+
+    mpi_errno = MPID_Type_blockindexed(1, 1, &disps[1],
+                                       1, /* 1 means disp is in bytes */
+                                       tmp1, &tmp2);
+    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+
+    mpi_errno = MPID_Type_create_resized(tmp2, 0, disps[2], &new_handle);
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
 
     MPIR_Type_free_impl(&tmp1);
+    MPIR_Type_free_impl(&tmp2);
 
     /* at this point we have the new type, and we've cleaned up any
      * intermediate types created in the process.  we just need to save
diff --git a/src/mpid/common/datatype/mpid_type_create_resized.c b/src/mpid/common/datatype/mpid_type_create_resized.c
index 07d753d..75b535f 100644
--- a/src/mpid/common/datatype/mpid_type_create_resized.c
+++ b/src/mpid/common/datatype/mpid_type_create_resized.c
@@ -47,8 +47,8 @@ int MPID_Type_create_resized(MPI_Datatype oldtype,
 	int oldsize = MPID_Datatype_get_basic_size(oldtype);
 
 	new_dtp->size           = oldsize;
-	new_dtp->has_sticky_ub  = 1;
-	new_dtp->has_sticky_lb  = 1;
+	new_dtp->has_sticky_ub  = 0;
+	new_dtp->has_sticky_lb  = 0;
 	new_dtp->dataloop_depth = 1;
 	new_dtp->true_lb        = 0;
 	new_dtp->lb             = lb;
@@ -70,8 +70,8 @@ int MPID_Type_create_resized(MPI_Datatype oldtype,
 	MPID_Datatype_get_ptr(oldtype, old_dtp);
 
 	new_dtp->size           = old_dtp->size;
-	new_dtp->has_sticky_ub  = 1;
-	new_dtp->has_sticky_lb  = 1;
+	new_dtp->has_sticky_ub  = 0;
+	new_dtp->has_sticky_lb  = 0;
 	new_dtp->dataloop_depth = old_dtp->dataloop_depth;
 	new_dtp->true_lb        = old_dtp->true_lb;
 	new_dtp->lb             = lb;
diff --git a/src/mpid/common/datatype/mpid_type_struct.c b/src/mpid/common/datatype/mpid_type_struct.c
index 148958e..72e23fd 100644
--- a/src/mpid/common/datatype/mpid_type_struct.c
+++ b/src/mpid/common/datatype/mpid_type_struct.c
@@ -151,12 +151,12 @@ int MPID_Type_struct(int count,
     int mpi_errno = MPI_SUCCESS;
     int i, old_are_contig = 1, definitely_not_contig = 0;
     int found_sticky_lb = 0, found_sticky_ub = 0, found_true_lb = 0,
-	found_true_ub = 0, found_el_type = 0;
+	found_true_ub = 0, found_el_type = 0, found_lb=0, found_ub=0;
     MPI_Aint el_sz = 0;
     MPI_Aint size = 0;
     MPI_Datatype el_type = MPI_DATATYPE_NULL;
     MPI_Aint true_lb_disp = 0, true_ub_disp = 0, sticky_lb_disp = 0,
-	sticky_ub_disp = 0;
+	sticky_ub_disp = 0, lb_disp = 0, ub_disp = 0;
 
     MPID_Datatype *new_dtp;
 
@@ -320,7 +320,7 @@ int MPID_Type_struct(int count,
 	    }
 	}
 
-	/* keep lowest true lb and highest true ub
+	/* keep lowest lb/true_lb and highest ub/true_ub
 	 *
 	 * note: checking for contiguity at the same time, to avoid
 	 *       yet another pass over the arrays
@@ -339,6 +339,18 @@ int MPID_Type_struct(int count,
 		definitely_not_contig = 1;
 	    }
 
+	    if (!found_lb)
+	    {
+		found_lb = 1;
+		lb_disp  = tmp_lb;
+	    }
+	    else if (lb_disp > tmp_lb)
+	    {
+		/* lb before previous */
+		lb_disp = tmp_lb;
+		definitely_not_contig = 1;
+	    }
+
 	    if (!found_true_ub)
 	    {
 		found_true_ub = 1;
@@ -352,6 +364,20 @@ int MPID_Type_struct(int count,
 		/* element ends before previous ended */
 		definitely_not_contig = 1;
 	    }
+
+	    if (!found_ub)
+	    {
+		found_ub = 1;
+		ub_disp  = tmp_ub;
+	    }
+	    else if (ub_disp < tmp_ub)
+	    {
+		ub_disp = tmp_ub;
+	    }
+	    else {
+		/* ub before previous */
+		definitely_not_contig = 1;
+	    }
 	}
 
 	if (!is_builtin && !old_dtp->is_contig)
@@ -366,11 +392,11 @@ int MPID_Type_struct(int count,
 
     new_dtp->has_sticky_lb = found_sticky_lb;
     new_dtp->true_lb       = true_lb_disp;
-    new_dtp->lb = (found_sticky_lb) ? sticky_lb_disp : true_lb_disp;
+    new_dtp->lb = (found_sticky_lb) ? sticky_lb_disp : lb_disp;
 
     new_dtp->has_sticky_ub = found_sticky_ub;
     new_dtp->true_ub       = true_ub_disp;
-    new_dtp->ub = (found_sticky_ub) ? sticky_ub_disp : true_ub_disp;
+    new_dtp->ub = (found_sticky_ub) ? sticky_ub_disp : ub_disp;
 
     new_dtp->alignsize = MPID_Type_struct_alignsize(count,
 						    oldtype_array,
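
The struct fix above can be illustrated outside MPICH. The following is a minimal sketch, assuming a simplified model in which every member has block length 1 and no alignment padding or sticky MPI_LB/MPI_UB markers are involved. The names member_t, bounds_t and struct_bounds are hypothetical and do not exist in MPICH; the point is only that lb/ub and true_lb/true_ub are tracked separately, as the commit message describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one struct member (not MPICH's
 * MPID_Datatype). All values are byte offsets; block length is assumed 1. */
typedef struct {
    long disp;              /* displacement of the member in the struct   */
    long lb, ub;            /* member bounds, possibly set by a resize     */
    long true_lb, true_ub;  /* bounds of the data the member really holds  */
} member_t;

typedef struct {
    long lb, ub, true_lb, true_ub;
} bounds_t;

/* Keep the lowest lb / true_lb and the highest ub / true_ub independently.
 * Sticky markers and alignment padding are deliberately not modeled. */
static bounds_t struct_bounds(const member_t *m, size_t n)
{
    bounds_t b = {0, 0, 0, 0};
    size_t i;

    assert(m != NULL && n > 0);
    for (i = 0; i < n; i++) {
        long lb      = m[i].disp + m[i].lb;
        long ub      = m[i].disp + m[i].ub;
        long true_lb = m[i].disp + m[i].true_lb;
        long true_ub = m[i].disp + m[i].true_ub;

        if (i == 0 || lb < b.lb)           b.lb = lb;
        if (i == 0 || ub > b.ub)           b.ub = ub;
        if (i == 0 || true_lb < b.true_lb) b.true_lb = true_lb;
        if (i == 0 || true_ub > b.true_ub) b.true_ub = true_ub;
    }
    return b;
}
```

With a first member resized to lb = -4 and a second member at displacement 16 resized to ub = 16, this model yields lb = -4 and ub = 32, while true_lb and true_ub stay at 0 and 24. Reporting the true bounds as lb/ub in that case is the behavior the commit message calls incorrect.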

http://git.mpich.org/mpich.git/commitdiff/316ea7cdbfce60db123eead06f0c0b4202064369

commit 316ea7cdbfce60db123eead06f0c0b4202064369
Author: Jithin Jose <jithin.jose at intel.com>
Date:   Tue Jan 27 13:22:53 2015 -0800

    Updates for latest OFI-libfabric API
    
    Signed-off-by: Charles J Archer <charles.j.archer at intel.com>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_cm.c b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_cm.c
index 31b38fc..c38968c 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_cm.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_cm.c
@@ -105,7 +105,7 @@ static inline int MPID_nem_ofi_conn_req_callback(cq_tagged_entry_t * wc, MPID_Re
                        gl_data.conn_req->dev.user_buf,
                        OFI_KVSAPPSTRLEN,
                        gl_data.mr,
-                       0,
+                       FI_ADDR_UNSPEC,
                        MPID_CONN_REQ,
                        ~MPID_PROTOCOL_MASK,
                        (void *) &(REQ_OFI(gl_data.conn_req)->ofi_context)), trecv);
@@ -246,7 +246,7 @@ static inline int MPID_nem_ofi_preposted_callback(cq_tagged_entry_t * wc, MPID_R
                        &rreq->dev.user_count,
                        sizeof rreq->dev.user_count,
                        gl_data.mr,
-                       0,
+                       FI_ADDR_UNSPEC,
                        MPID_MSG_RTS,
                        ~MPID_PROTOCOL_MASK, &(REQ_OFI(rreq)->ofi_context)), trecv);
     END_FUNC_RC(FCNAME);
@@ -307,7 +307,7 @@ int MPID_nem_ofi_cm_init(MPIDI_PG_t * pg_p, int pg_rank ATTRIBUTE((unused)))
                        &persistent_req->dev.user_count,
                        sizeof persistent_req->dev.user_count,
                        gl_data.mr,
-                       0,
+                       FI_ADDR_UNSPEC,
                        MPID_MSG_RTS,
                        ~MPID_PROTOCOL_MASK,
                        (void *) &(REQ_OFI(persistent_req)->ofi_context)), trecv);
@@ -326,7 +326,7 @@ int MPID_nem_ofi_cm_init(MPIDI_PG_t * pg_p, int pg_rank ATTRIBUTE((unused)))
                        conn_req->dev.user_buf,
                        OFI_KVSAPPSTRLEN,
                        gl_data.mr,
-                       0,
+                       FI_ADDR_UNSPEC,
                        MPID_CONN_REQ,
                        ~MPID_PROTOCOL_MASK, (void *) &(REQ_OFI(conn_req)->ofi_context)), trecv);
     gl_data.conn_req = conn_req;
diff --git a/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_impl.h b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_impl.h
index 9e5f048..07b119c 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_impl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_impl.h
@@ -40,7 +40,7 @@ typedef int (*req_fn) (MPIDI_VC_t *, MPID_Request *, int *);
 /* Global Object for state tracking */
 /* ******************************** */
 typedef struct {
-    fi_addr_t bound_addr;       /* This ranks bound address    */
+    char bound_addr[128];       /* This ranks bound address    */
     fi_addr_t any_addr;         /* Specifies any source        */
     size_t bound_addrlen;       /* length of the bound address */
     struct fid_fabric *fabric;  /* fabric object               */
diff --git a/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_init.c b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_init.c
index cefa87d..2fd7c87 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_init.c
@@ -56,7 +56,6 @@ int MPID_nem_ofi_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_
     hints.ep_type = FI_EP_RDM;  /* Reliable datagram         */
     hints.caps = FI_TAGGED;     /* Tag matching interface    */
     hints.caps |= FI_BUFFERED_RECV;     /* Buffered receives         */
-    hints.caps |= FI_REMOTE_COMPLETE;   /* Remote completion         */
     hints.caps |= FI_CANCEL;    /* Support cancel            */
     hints.caps |= FI_DYNAMIC_MR;        /* Global dynamic mem region */
 
@@ -80,7 +79,7 @@ int MPID_nem_ofi_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_
 
     domain_attr.threading = FI_THREAD_ENDPOINT;
     domain_attr.control_progress = FI_PROGRESS_AUTO;
-    tx_attr.op_flags = FI_REMOTE_COMPLETE;
+    domain_attr.data_progress = FI_PROGRESS_AUTO;
     hints.domain_attr = &domain_attr;
     hints.tx_attr = &tx_attr;
 
@@ -250,10 +249,8 @@ int MPID_nem_ofi_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_
     /* ---------------------------------------------------- */
     /* Insert the ANY_SRC address                           */
     /* ---------------------------------------------------- */
-    MPIU_CHKLMEM_MALLOC(null_addr, char *, 1 * gl_data.bound_addrlen, mpi_errno, "null_addr");
-    memset(null_addr, 0, gl_data.bound_addrlen);
 
-    FI_RC(fi_av_insert(gl_data.av, null_addr, 1, &gl_data.any_addr, 0ULL, NULL), avmap);
+    gl_data.any_addr = FI_ADDR_UNSPEC;
 
     /* --------------------------------- */
     /* Store the direct addresses in     */

http://git.mpich.org/mpich.git/commitdiff/71de78550fe7769367da55b2174f42daca62de9e

commit 71de78550fe7769367da55b2174f42daca62de9e
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Mon Jan 26 17:06:52 2015 -0600

    correct tag usage in MPIC_Sendrecv
    
    The tag for send was ignored and recvtag incorrectly used in its place.
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/src/mpi/coll/helper_fns.c b/src/mpi/coll/helper_fns.c
index 8fbbe56..e46ef45 100644
--- a/src/mpi/coll/helper_fns.c
+++ b/src/mpi/coll/helper_fns.c
@@ -465,7 +465,7 @@ int MPIC_Sendrecv(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
     mpi_errno = MPID_Irecv(recvbuf, recvcount, recvtype, source, recvtag,
                            comm_ptr, context_id, &recv_req_ptr);
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-    mpi_errno = MPID_Isend(sendbuf, sendcount, sendtype, dest, recvtag,
+    mpi_errno = MPID_Isend(sendbuf, sendcount, sendtype, dest, sendtag,
                            comm_ptr, context_id, &send_req_ptr);
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
 

http://git.mpich.org/mpich.git/commitdiff/d2adf9832b3f30a25c52217e1692101aa05d601b

commit d2adf9832b3f30a25c52217e1692101aa05d601b
Author: Rob Latham <robl at mcs.anl.gov>
Date:   Tue Jan 20 10:53:55 2015 -0600

    fix issue with small i/o and big datatypes
    
    HDF5 folks reported a bug with ROMIO and one of their slightly-strange (but
    100% legal) datatypes.  git-bisect points to the "promote size of length"
    change.  Seems that MPICH does not like struct datatypes with zero-count
    elements?  Further investigation required.  This change (construct a simpler
    datatype in more cases) is sufficient to help HDF5 move forward.
    
    See #2221
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>

diff --git a/src/mpi/romio/adio/common/utils.c b/src/mpi/romio/adio/common/utils.c
index b539187..eef48cb 100644
--- a/src/mpi/romio/adio/common/utils.c
+++ b/src/mpi/romio/adio/common/utils.c
@@ -63,17 +63,40 @@ int ADIOI_Type_create_hindexed_x(int count,
     int i, ret;
     MPI_Datatype *types;
     int *blocklens;
+    int is_big=0;
 
     types = ADIOI_Malloc(count*sizeof(MPI_Datatype));
     blocklens = ADIOI_Malloc(count*sizeof(int));
 
+    /* squashing two loops into one.
+     * - Look in the array_of_blocklengths for any large values
+     * - convert MPI_Count items (if they are not too big) into int-sized items
+     * after this loop we will know if we can use MPI_type_hindexed or if we
+     * need a more complicated BigMPI-style struct-of-chunks.
+     *
+     * Why not use the struct-of-chunks in all cases?  HDF5 reported a bug,
+     * which I have not yet precicesly nailed down, but appears to have
+     * something to do with struct-of-chunks when the chunks are small */
+
     for(i=0; i<count; i++) {
-	blocklens[i] = 1;
-	type_create_contiguous_x(array_of_blocklengths[i], oldtype,  &(types[i]));
+	if (array_of_blocklengths[i] > INT_MAX) {
+	    blocklens[i] = 1;
+	    is_big=1;
+	    type_create_contiguous_x(array_of_blocklengths[i], oldtype,  &(types[i]));
+	} else {
+	    /* OK to cast: checked for "bigness" above */
+	    blocklens[i] = (int)array_of_blocklengths[i];
+	    MPI_Type_contiguous(blocklens[i], oldtype, &(types[i]));
+	}
     }
 
-    ret = MPI_Type_create_struct(count, blocklens, array_of_displacements,
-	    types, newtype);
+    if (is_big) {
+	ret = MPI_Type_create_struct(count, blocklens, array_of_displacements,
+		types, newtype);
+    } else {
+	ret = MPI_Type_hindexed(count, blocklens,
+		array_of_displacements, oldtype, newtype);
+    }
     for (i=0; i< count; i++)
 	MPI_Type_free(&(types[i]));
     ADIOI_Free(types);
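
The "is any block length too big for an int" scan in the hunk above can be tried in isolation. This is a rough sketch with no MPI calls: long long stands in for MPI_Count, and narrow_blocklens is a hypothetical name, not a ROMIO function. It only reproduces the decision and the narrowing, not the datatype construction.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Scan block lengths once: narrow the ones that fit in an int and report
 * whether any of them does not. long long is an assumed stand-in for
 * MPI_Count. Returns 1 if a "big" length was seen, 0 otherwise. */
static int narrow_blocklens(const long long *in, int *out, size_t n)
{
    int is_big = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        assert(in[i] >= 0);
        if (in[i] > INT_MAX) {
            /* too large: the caller would build one chunked type here */
            out[i] = 1;
            is_big = 1;
        } else {
            /* safe to cast: checked against INT_MAX above */
            out[i] = (int) in[i];
        }
    }
    return is_big;
}
```

A return value of 0 corresponds to the branch in the diff that can use the simpler hindexed type; a return value of 1 corresponds to the struct-of-chunks branch.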

http://git.mpich.org/mpich.git/commitdiff/2d465f31676c94b8c694718f28721fcb186815bb

commit 2d465f31676c94b8c694718f28721fcb186815bb
Author: Rob Latham <robl at mcs.anl.gov>
Date:   Mon Jan 19 15:36:08 2015 -0600

    clean up 64-to-32 bit casts
    
    many many places where a 64 bit value is stored in a 32 bit value
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>

diff --git a/src/mpi/romio/adio/common/ad_seek.c b/src/mpi/romio/adio/common/ad_seek.c
index d8bbe85..ceaf6f2 100644
--- a/src/mpi/romio/adio/common/ad_seek.c
+++ b/src/mpi/romio/adio/common/ad_seek.c
@@ -26,8 +26,8 @@ ADIO_Offset ADIOI_GEN_SeekIndividual(ADIO_File fd, ADIO_Offset offset,
     ADIO_Offset n_etypes_in_filetype, n_filetypes, etype_in_filetype;
     ADIO_Offset abs_off_in_filetype=0;
     ADIO_Offset size_in_filetype, sum;
-    MPI_Count filetype_size;
-    int etype_size, filetype_is_contig;
+    MPI_Count filetype_size, etype_size;
+    int filetype_is_contig;
     MPI_Aint filetype_extent;
 
     ADIOI_UNREFERENCED_ARG(whence);
@@ -35,7 +35,7 @@ ADIO_Offset ADIOI_GEN_SeekIndividual(ADIO_File fd, ADIO_Offset offset,
     ADIOI_Datatype_iscontig(fd->filetype, &filetype_is_contig);
     etype_size = fd->etype_size;
 
-    if (filetype_is_contig) off = fd->disp + (ADIO_Offset)etype_size * offset;
+    if (filetype_is_contig) off = fd->disp + etype_size * offset;
     else {
         flat_file = ADIOI_Flatlist;
         while (flat_file->type != fd->filetype) flat_file = flat_file->next;
diff --git a/src/mpi/romio/adio/common/utils.c b/src/mpi/romio/adio/common/utils.c
index 9814220..b539187 100644
--- a/src/mpi/romio/adio/common/utils.c
+++ b/src/mpi/romio/adio/common/utils.c
@@ -25,8 +25,12 @@ static int type_create_contiguous_x(MPI_Count count,
     int blocklens[2];
     MPI_Datatype types[2];
 
-    MPI_Count c = count/INT_MAX;
-    MPI_Count r = count%INT_MAX;
+    /* truly stupendously large counts will overflow an integer with this math,
+     * but that is a problem for a few decades from now.  Sorry, few decades
+     * from now! */
+    ADIOI_Assert(count/INT_MAX == (int)(count/INT_MAX));
+    int c = (int)(count/INT_MAX); /* OK to cast until 'count' is 256 bits */
+    int r = count%INT_MAX;
 
     MPI_Type_vector(c, INT_MAX, INT_MAX, oldtype, &chunks);
     MPI_Type_contiguous(r, oldtype, &remainder);
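
The chunk arithmetic in the hunk above splits a large count into c full chunks of INT_MAX elements plus a remainder of r elements. A small stand-alone sketch of that arithmetic follows; split_t and split_count are hypothetical names, and long long is used here as an assumed stand-in for MPI_Count.

```c
#include <assert.h>
#include <limits.h>

/* Result of splitting a large element count into int-sized pieces. */
typedef struct {
    int chunks;     /* number of full INT_MAX-sized chunks */
    int remainder;  /* elements left over after the chunks */
} split_t;

/* Split count so that count == chunks * INT_MAX + remainder.
 * long long is an assumed stand-in for MPI_Count. */
static split_t split_count(long long count)
{
    split_t s;

    assert(count >= 0);
    /* the chunk count itself must still fit in an int */
    assert(count / INT_MAX <= INT_MAX);

    s.chunks    = (int) (count / INT_MAX);
    s.remainder = (int) (count % INT_MAX);
    return s;
}
```

The remainder is always smaller than INT_MAX, so only the chunk count needs the range check, which mirrors the ADIOI_Assert added in the diff.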

http://git.mpich.org/mpich.git/commitdiff/2b090de851871fbd600757ca885d9ccb6df7afdd

commit 2b090de851871fbd600757ca885d9ccb6df7afdd
Author: Rob Latham <robl at mcs.anl.gov>
Date:   Mon Jan 19 14:12:03 2015 -0600

    Tweak MPIDU_Datatype_debug output
    
    - bump up subtypes from 3 to 6. The limit is arbitrary.  I am trying to
      figure out a type with 4 sub-types.
    - split up indexed/hindexed lists onto separate lines.  MPICH debug
      output format adds its own newlines, but we have to clean out MPICH's
      extra debug output anyway: joining a few lines isn't that much more
      work.
    - output a name of the digraph that graphviz can actually parse.
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>

diff --git a/src/mpid/common/datatype/mpid_type_debug.c b/src/mpid/common/datatype/mpid_type_debug.c
index 2489dbd..25a81ed 100644
--- a/src/mpid/common/datatype/mpid_type_debug.c
+++ b/src/mpid/common/datatype/mpid_type_debug.c
@@ -26,6 +26,9 @@ void MPIDI_Dataloop_dot_printf(MPID_Dataloop *loop_p, int depth, int header);
 void MPIDI_Datatype_contents_printf(MPI_Datatype type, int depth, int acount);
 static char *MPIDI_Datatype_depth_spacing(int depth) ATTRIBUTE((unused));
 
+#define NR_TYPE_CUTOFF 6 /* Number of types to display before truncating
+			    output. 6 picked as arbitrary cutoff */
+
 /* note: this isn't really "error handling" per se, but leave these comments
  * because Bill uses them for coverage analysis.
  */
@@ -65,7 +68,8 @@ void MPIDI_Dataloop_dot_printf(MPID_Dataloop *loop_p,
 
     if (header) {
 	MPIU_DBG_OUT_FMT(DATATYPE,(MPIU_DBG_FDEST,
-				   "digraph %p {   {", loop_p));
+		    /* graphviz does not like the 0xNNN format */
+				   "digraph %lld {   {", (long long int)loop_p));
     }
 
     switch (loop_p->kind & DLOOP_KIND_MASK) {
@@ -94,28 +98,27 @@ void MPIDI_Dataloop_dot_printf(MPID_Dataloop *loop_p,
 			    (int) loop_p->loop_params.i_t.count,
 			    (int) loop_p->loop_params.i_t.total_blocks));
 
-	    /* 3 picked as arbitrary cutoff */
-	    for (i=0; i < 3 && i < loop_p->loop_params.i_t.count; i++) {
+	    for (i=0; i < NR_TYPE_CUTOFF && i < loop_p->loop_params.i_t.count; i++) {
 		if (i + 1 < loop_p->loop_params.i_t.count) {
 		    /* more regions after this one */
 		    MPIU_DBG_OUT_FMT(DATATYPE,(MPIU_DBG_FDEST,
-		    "(" MPI_AINT_FMT_DEC_SPEC ", %d), ",
+		    "\\n(" MPI_AINT_FMT_DEC_SPEC ", %d), ",
 			  (MPI_Aint) loop_p->loop_params.i_t.offset_array[i],
 		          (int) loop_p->loop_params.i_t.blocksize_array[i]));
 		}
 		else {
 		    MPIU_DBG_OUT_FMT(DATATYPE,(MPIU_DBG_FDEST,
-		           "(" MPI_AINT_FMT_DEC_SPEC ", %d); ",
+		           "\\n(" MPI_AINT_FMT_DEC_SPEC ", %d); ",
 		           (MPI_Aint) loop_p->loop_params.i_t.offset_array[i],
 			   (int) loop_p->loop_params.i_t.blocksize_array[i]));
 		}
 	    }
 	    if (i < loop_p->loop_params.i_t.count) {
-		MPIU_DBG_OUT(DATATYPE,"...; ");
+		MPIU_DBG_OUT(DATATYPE,"\\n...; ");
 	    }
 
 	    MPIU_DBG_OUT_FMT(DATATYPE,(MPIU_DBG_FDEST,
-				       "el_sz = " MPI_AINT_FMT_DEC_SPEC "; el_ext = " MPI_AINT_FMT_DEC_SPEC " }\"];\n",
+				       "\\nel_sz = " MPI_AINT_FMT_DEC_SPEC "; el_ext = " MPI_AINT_FMT_DEC_SPEC " }\"];\n",
 				       (MPI_Aint) loop_p->el_size,
 				       (MPI_Aint) loop_p->el_extent));
 	    break;
@@ -126,12 +129,11 @@ void MPIDI_Dataloop_dot_printf(MPID_Dataloop *loop_p,
 			    (int) loop_p->loop_params.bi_t.count,
 			    (int) loop_p->loop_params.bi_t.blocksize));
 
-	    /* 3 picked as arbitrary cutoff */
-	    for (i=0; i < 3 && i < loop_p->loop_params.bi_t.count; i++) {
+	    for (i=0; i < NR_TYPE_CUTOFF && i < loop_p->loop_params.bi_t.count; i++) {
 		if (i + 1 < loop_p->loop_params.bi_t.count) {
 		    /* more regions after this one */
 		    MPIU_DBG_OUT_FMT(DATATYPE,(MPIU_DBG_FDEST,
-		        MPI_AINT_FMT_DEC_SPEC ", ",
+		        MPI_AINT_FMT_DEC_SPEC ",\\n ",
 			(MPI_Aint) loop_p->loop_params.bi_t.offset_array[i]));
 		}
 		else {
@@ -145,7 +147,7 @@ void MPIDI_Dataloop_dot_printf(MPID_Dataloop *loop_p,
 	    }
 
 	    MPIU_DBG_OUT_FMT(DATATYPE,(MPIU_DBG_FDEST,
-				      "el_sz = " MPI_AINT_FMT_DEC_SPEC "; el_ext = " MPI_AINT_FMT_DEC_SPEC " }\"];",
+				      "\\nel_sz = " MPI_AINT_FMT_DEC_SPEC "; el_ext = " MPI_AINT_FMT_DEC_SPEC " }\"];",
 				       (MPI_Aint) loop_p->el_size,
 				       (MPI_Aint) loop_p->el_extent));
 	    break;
@@ -154,7 +156,7 @@ void MPIDI_Dataloop_dot_printf(MPID_Dataloop *loop_p,
 	    "      dl%d [shape = record, label = \"struct | {ct = %d; blks = ",
 			    depth,
 			    (int) loop_p->loop_params.s_t.count));
-	    for (i=0; i < 3 && i < loop_p->loop_params.s_t.count; i++) {
+	    for (i=0; i < NR_TYPE_CUTOFF && i < loop_p->loop_params.s_t.count; i++) {
 		if (i + 1 < loop_p->loop_params.s_t.count) {
 		    MPIU_DBG_OUT_FMT(DATATYPE,(MPIU_DBG_FDEST,"%d, ",
 			    (int) loop_p->loop_params.s_t.blocksize_array[i]));
@@ -171,7 +173,7 @@ void MPIDI_Dataloop_dot_printf(MPID_Dataloop *loop_p,
 		MPIU_DBG_OUT(DATATYPE,"disps = ");
 	    }
 
-	    for (i=0; i < 3 && i < loop_p->loop_params.s_t.count; i++) {
+	    for (i=0; i < NR_TYPE_CUTOFF && i < loop_p->loop_params.s_t.count; i++) {
 		if (i + 1 < loop_p->loop_params.s_t.count) {
 		    MPIU_DBG_OUT_FMT(DATATYPE,(MPIU_DBG_FDEST,MPI_AINT_FMT_DEC_SPEC ", ",
 			    (MPI_Aint) loop_p->loop_params.s_t.offset_array[i]));
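
The digraph-name change above prints the pointer as a decimal integer because, per the comment in the diff, graphviz does not accept the 0xNNN form that %p typically produces. A minimal sketch of the same idea, assuming uintptr_t and PRIuPTR are available on the platform; format_digraph_header is a hypothetical helper, not an MPICH function.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Write "digraph <decimal id> {" into buf, using the pointer value as a
 * purely numeric graph name. Returns 0 on success, -1 if buf is too small
 * or formatting fails. */
static int format_digraph_header(char *buf, size_t len, const void *p)
{
    int n;

    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "digraph %" PRIuPTR " {", (uintptr_t) p);
    return (n >= 0 && (size_t) n < len) ? 0 : -1;
}
```

The diff itself casts to long long int and prints with %lld; the uintptr_t variant here is a substitution chosen for the sketch, not what the commit does.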

http://git.mpich.org/mpich.git/commitdiff/95d785be8f8babd97fba6cb3db1b3ea06e42356e

commit 95d785be8f8babd97fba6cb3db1b3ea06e42356e
Author: Su Huang <suhuang at us.ibm.com>
Date:   Fri Jan 16 14:55:16 2015 -0500

     PAMID:MP_STATISTICS=print will cause mpi hw case coredump
    
     The segfault was caused by the library trying to free an already freed mpid_statp
     structure. The structure is freed right after the status information is printed.
     To fix the problem, the mpid_statp is set to NULL after the free is done.
    
     (ibm) D202018
    
    Signed-off-by: Sameh Sharkawi <sssharka at us.ibm.com>

diff --git a/src/mpid/pamid/src/mpidi_util.c b/src/mpid/pamid/src/mpidi_util.c
index 8217ade..774e786 100644
--- a/src/mpid/pamid/src/mpidi_util.c
+++ b/src/mpid/pamid/src/mpidi_util.c
@@ -800,10 +800,16 @@ void MPIDI_print_statistics() {
        (MPIDI_Process.mp_printenv)) {
        if (MPIDI_Process.mp_statistics) {
            MPIDI_Statistics_write(stdout);
-           if (mpid_statp) MPIU_Free(mpid_statp);
+           if (mpid_statp) {
+               MPIU_Free(mpid_statp);
+               mpid_statp=NULL;
+           }
        }
     if (MPIDI_Process.mp_printenv) {
-        if (mpich_env)  MPIU_Free(mpich_env);
+        if (mpich_env) {
+            MPIU_Free(mpich_env);
+            mpich_env=NULL;
+        }
     }
   }
 }
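The free-then-NULL pattern used in this fix can be sketched in isolation. This is a minimal illustration, not the PAMID code: `stats_buf`, `free_and_clear`, and `stats_allocated` are hypothetical names, and plain `free` stands in for `MPIU_Free`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a global such as mpid_statp. */
static int *stats_buf = NULL;

/* Free *pp (if set) and clear it, so a second call becomes a no-op
 * instead of a double free. */
static void free_and_clear(int **pp)
{
    assert(pp != NULL);
    if (*pp) {
        free(*pp);
        *pp = NULL;
    }
}

/* Returns 1 while the buffer is still allocated, 0 otherwise. */
static int stats_allocated(void)
{
    return stats_buf != NULL;
}
```

Calling `free_and_clear(&stats_buf)` twice in a row is safe here, which is the property the patch restores for the later cleanup path.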

http://git.mpich.org/mpich.git/commitdiff/c007db6c5e041543ca3a9c187dfa38de795dda87

commit c007db6c5e041543ca3a9c187dfa38de795dda87
Author: Rob Latham <robl at mcs.anl.gov>
Date:   Mon Jan 12 16:23:14 2015 -0600

    use PATH_MAX instead of magic number
    
    A user on the OpenMPI list wanted to create a file with a 259-character
    name.  Shared file pointer name construction used the magic '256' value
    to construct a full path to the hidden shared file pointer file.
    PATH_MAX already exists for this purpose, so use it.
    
    While there, found a few spots checking/setting PATH_MAX, so do that in
    one place.
    
    Closes #2212
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>

diff --git a/src/mpi/romio/adio/common/ad_fstype.c b/src/mpi/romio/adio/common/ad_fstype.c
index e2c41ea..c89b560 100644
--- a/src/mpi/romio/adio/common/ad_fstype.c
+++ b/src/mpi/romio/adio/common/ad_fstype.c
@@ -137,10 +137,6 @@ Output Parameters:
 */
 #ifdef ROMIO_NEEDS_ADIOPARENTDIR
 
-#ifndef PATH_MAX
-#define PATH_MAX 65535
-#endif
-
 /* In a strict ANSI environment, S_ISLNK may not be defined.  Fix that
    here.  We assume that S_ISLNK is *always* defined as a macro.  If
    that is not universally true, then add a test to the romio
diff --git a/src/mpi/romio/adio/common/shfp_fname.c b/src/mpi/romio/adio/common/shfp_fname.c
index f1e5eee..dfa5baf 100644
--- a/src/mpi/romio/adio/common/shfp_fname.c
+++ b/src/mpi/romio/adio/common/shfp_fname.c
@@ -21,8 +21,11 @@
    store the shared file pointer. The shared-file-pointer file is a 
    hidden file in the same directory as the real file being accessed.
    If the real file is /tmp/thakur/testfile, the shared-file-pointer
-   file will be /tmp/thakur/.testfile.shfp.xxxx, where xxxx is
-   a random number. This file is created only if the shared
+   file will be /tmp/thakur/.testfile.shfp.yyy.xxxx, where yyy
+   is rank 0's process id and xxxx is a random number. If the
+   underlying file system supports shared file pointers
+   (PVFS does not, for example), the file name is always
+   constructed. This file is created only if the shared
    file pointer functions are used and is deleted when the real
    file is closed. */
 
@@ -33,14 +36,14 @@ void ADIOI_Shfp_fname(ADIO_File fd, int rank, int *error_code)
     char *slash, *ptr, tmp[128];
     int pid = 0;
 
-    fd->shared_fp_fname = (char *) ADIOI_Malloc(256);
+    fd->shared_fp_fname = (char *) ADIOI_Malloc(PATH_MAX);
 
     if (!rank) {
         srand(time(NULL));
         i = rand();
 	pid = (int)getpid();
 	
-	if (ADIOI_Strncpy(fd->shared_fp_fname, fd->filename, 256)) {
+	if (ADIOI_Strncpy(fd->shared_fp_fname, fd->filename, PATH_MAX)) {
 	    *error_code = ADIOI_Err_create_code("ADIOI_Shfp_fname",
 		    fd->filename, ENAMETOOLONG);
 	    return;
@@ -57,7 +60,7 @@ void ADIOI_Shfp_fname(ADIO_File fd, int rank, int *error_code)
 			fd->filename, ENAMETOOLONG);
 		return;
 	    }
-	    if (ADIOI_Strncpy(fd->shared_fp_fname + 1, fd->filename, 255)) {
+	    if (ADIOI_Strncpy(fd->shared_fp_fname + 1, fd->filename, PATH_MAX-1)) {
 		*error_code = ADIOI_Err_create_code("ADIOI_Shfp_fname",
 			fd->filename, ENAMETOOLONG);
 		return;
@@ -76,7 +79,7 @@ void ADIOI_Shfp_fname(ADIO_File fd, int rank, int *error_code)
 		return;
 	    }
 	    /* ok to cast: file names bounded by PATH_MAX and NAME_MAX */
-	    len = (int) (256 - (slash+2 - fd->shared_fp_fname));
+	    len = (int) (PATH_MAX - (slash+2 - fd->shared_fp_fname));
 	    if (ADIOI_Strncpy(slash + 2, ptr + 1, len)) {
 		*error_code = ADIOI_Err_create_code("ADIOI_Shfp_fname",
 			ptr + 1, ENAMETOOLONG);
@@ -86,7 +89,7 @@ void ADIOI_Shfp_fname(ADIO_File fd, int rank, int *error_code)
 	    
 	ADIOI_Snprintf(tmp, 128, ".shfp.%d.%d", pid, i);
 	/* ADIOI_Strnapp will return non-zero if truncated.  That's ok */
-	ADIOI_Strnapp(fd->shared_fp_fname, tmp, 256);
+	ADIOI_Strnapp(fd->shared_fp_fname, tmp, PATH_MAX);
 	
 	len = (int)strlen(fd->shared_fp_fname);
 	MPI_Bcast(&len, 1, MPI_INT, 0, fd->comm);
diff --git a/src/mpi/romio/adio/common/system_hints.c b/src/mpi/romio/adio/common/system_hints.c
index 5d0e24b..fd6cba5 100644
--- a/src/mpi/romio/adio/common/system_hints.c
+++ b/src/mpi/romio/adio/common/system_hints.c
@@ -28,10 +28,6 @@
 #include <io.h>
 #endif
 
-#ifndef PATH_MAX
-#define PATH_MAX 65535
-#endif
-
 /*#define SYSHINT_DEBUG 1  */
 
 #define ROMIO_HINT_DEFAULT_CFG "/etc/romio-hints"
diff --git a/src/mpi/romio/adio/include/adioi.h b/src/mpi/romio/adio/include/adioi.h
index 2532941..3fddbda 100644
--- a/src/mpi/romio/adio/include/adioi.h
+++ b/src/mpi/romio/adio/include/adioi.h
@@ -906,7 +906,13 @@ typedef struct wcThreadFuncData {
 void *ADIOI_IO_Thread_Func(void *vptr_args);
 
 
+#ifdef HAVE_LIMITS_H
+#include <limits.h>
+#endif
 
+#ifndef PATH_MAX
+#define PATH_MAX 65535
+#endif
 
 #endif
 
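The naming scheme described in the updated comment (`/tmp/thakur/testfile` becomes `/tmp/thakur/.testfile.shfp.yyy.xxxx`) can be sketched with a PATH_MAX-sized buffer. This is an illustrative sketch only: `make_shfp_name` is a hypothetical helper, it handles only `/` separators, and unlike the patched code (which tolerates a truncated suffix) it treats any truncation as an error.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 65535   /* same fallback value the patch places in adioi.h */
#endif

/* Build "<dir>/.<base>.shfp.<pid>.<rnd>" from path into out (outsz bytes).
 * Returns 0 on success, -1 if the result would not fit. */
static int make_shfp_name(const char *path, int pid, int rnd,
                          char *out, size_t outsz)
{
    const char *slash;
    int n;

    assert(path != NULL && out != NULL);
    slash = strrchr(path, '/');
    if (slash) {
        /* directory part (including the '/'), then '.', then the base name */
        n = snprintf(out, outsz, "%.*s.%s.shfp.%d.%d",
                     (int) (slash - path + 1), path, slash + 1, pid, rnd);
    } else {
        /* no directory component: the hidden file lives in the cwd */
        n = snprintf(out, outsz, ".%s.shfp.%d.%d", path, pid, rnd);
    }
    return (n < 0 || (size_t) n >= outsz) ? -1 : 0;
}
```

Sizing the caller's buffer as `char buf[PATH_MAX]` mirrors the switch from `ADIOI_Malloc(256)` to `ADIOI_Malloc(PATH_MAX)` in the diff.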

http://git.mpich.org/mpich.git/commitdiff/a53b662ce47168471fca4c1b71381c7cc8ac440b

commit a53b662ce47168471fca4c1b71381c7cc8ac440b
Author: Rob Latham <robl at mcs.anl.gov>
Date:   Mon Jan 12 15:50:32 2015 -0600

    make ADIOI_Shfp_fname report errors
    
    Right now there's only one error condition: file name too long.  This
    change checks return codes of ADIOI_Strncpy and informs caller.
    Otherwise, really long names result in buffer overruns.
    
    See #2212
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>

diff --git a/src/mpi/romio/adio/common/shfp_fname.c b/src/mpi/romio/adio/common/shfp_fname.c
index 3761cf8..f1e5eee 100644
--- a/src/mpi/romio/adio/common/shfp_fname.c
+++ b/src/mpi/romio/adio/common/shfp_fname.c
@@ -26,7 +26,7 @@
    file pointer functions are used and is deleted when the real
    file is closed. */
 
-void ADIOI_Shfp_fname(ADIO_File fd, int rank)
+void ADIOI_Shfp_fname(ADIO_File fd, int rank, int *error_code)
 {
     int i;
     int len;
@@ -40,7 +40,11 @@ void ADIOI_Shfp_fname(ADIO_File fd, int rank)
         i = rand();
 	pid = (int)getpid();
 	
-	ADIOI_Strncpy(fd->shared_fp_fname, fd->filename, 256);
+	if (ADIOI_Strncpy(fd->shared_fp_fname, fd->filename, 256)) {
+	    *error_code = ADIOI_Err_create_code("ADIOI_Shfp_fname",
+		    fd->filename, ENAMETOOLONG);
+	    return;
+	}
 	
 #ifdef ROMIO_NTFS
 	slash = strrchr(fd->filename, '\\');
@@ -48,8 +52,16 @@ void ADIOI_Shfp_fname(ADIO_File fd, int rank)
 	slash = strrchr(fd->filename, '/');
 #endif
 	if (!slash) {
-	    ADIOI_Strncpy(fd->shared_fp_fname, ".", 2);
-	    ADIOI_Strncpy(fd->shared_fp_fname + 1, fd->filename, 255);
+	    if (ADIOI_Strncpy(fd->shared_fp_fname, ".", 2)) {
+		*error_code = ADIOI_Err_create_code("ADIOI_Shfp_fname",
+			fd->filename, ENAMETOOLONG);
+		return;
+	    }
+	    if (ADIOI_Strncpy(fd->shared_fp_fname + 1, fd->filename, 255)) {
+		*error_code = ADIOI_Err_create_code("ADIOI_Shfp_fname",
+			fd->filename, ENAMETOOLONG);
+		return;
+	    }
 	}
 	else {
 	    ptr = slash;
@@ -58,13 +70,22 @@ void ADIOI_Shfp_fname(ADIO_File fd, int rank)
 #else
 	    slash = strrchr(fd->shared_fp_fname, '/');
 #endif
-	    ADIOI_Strncpy(slash + 1, ".", 2);
+	    if (ADIOI_Strncpy(slash + 1, ".", 2))  {
+		*error_code = ADIOI_Err_create_code("ADIOI_Shfp_fname",
+			fd->filename, ENAMETOOLONG);
+		return;
+	    }
 	    /* ok to cast: file names bounded by PATH_MAX and NAME_MAX */
 	    len = (int) (256 - (slash+2 - fd->shared_fp_fname));
-	    ADIOI_Strncpy(slash + 2, ptr + 1, len);
+	    if (ADIOI_Strncpy(slash + 2, ptr + 1, len)) {
+		*error_code = ADIOI_Err_create_code("ADIOI_Shfp_fname",
+			ptr + 1, ENAMETOOLONG);
+		return;
+	    }
 	}
 	    
 	ADIOI_Snprintf(tmp, 128, ".shfp.%d.%d", pid, i);
+	/* ADIOI_Strnapp will return non-zero if truncated.  That's ok */
 	ADIOI_Strnapp(fd->shared_fp_fname, tmp, 256);
 	
 	len = (int)strlen(fd->shared_fp_fname);
diff --git a/src/mpi/romio/adio/include/adioi.h b/src/mpi/romio/adio/include/adioi.h
index e3f9a16..2532941 100644
--- a/src/mpi/romio/adio/include/adioi.h
+++ b/src/mpi/romio/adio/include/adioi.h
@@ -579,7 +579,7 @@ ADIO_Offset ADIOI_GEN_SeekIndividual(ADIO_File fd, ADIO_Offset offset,
 void ADIOI_GEN_Resize(ADIO_File fd, ADIO_Offset size, int *error_code);
 void ADIOI_GEN_SetInfo(ADIO_File fd, MPI_Info users_info, int *error_code);
 void ADIOI_GEN_Close(ADIO_File fd, int *error_code);
-void ADIOI_Shfp_fname(ADIO_File fd, int rank);
+void ADIOI_Shfp_fname(ADIO_File fd, int rank, int *error_code);
 void ADIOI_GEN_Prealloc(ADIO_File fd, ADIO_Offset size, int *error_code);
 int ADIOI_Error(ADIO_File fd, int error_code, char *string);
 int MPIR_Err_setmsg( int, int, const char *, const char *, const char *, ... );
diff --git a/src/mpi/romio/mpi-io/open.c b/src/mpi/romio/mpi-io/open.c
index 27a3e84..a2a68c9 100644
--- a/src/mpi/romio/mpi-io/open.c
+++ b/src/mpi/romio/mpi-io/open.c
@@ -179,7 +179,9 @@ int MPI_File_open(MPI_Comm comm, ROMIO_CONST char *filename, int amode,
     if ((error_code == MPI_SUCCESS) && 
 		    ADIO_Feature((*fh), ADIO_SHARED_FP)) {
 	MPI_Comm_rank(dupcomm, &rank);
-	ADIOI_Shfp_fname(*fh, rank);
+	ADIOI_Shfp_fname(*fh, rank, &error_code);
+	if (error_code != MPI_SUCCESS)
+	    goto fn_fail;
 
         /* if MPI_MODE_APPEND, set the shared file pointer to end of file.
            indiv. file pointer already set to end of file in ADIO_Open. 
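The error-reporting pattern introduced here (check the copy's return code, set `*error_code`, return early, let the caller bail out) can be sketched as follows. `bounded_copy` and `build_name` are hypothetical stand-ins, not ROMIO functions; the diff only shows that a non-zero return from `ADIOI_Strncpy` is treated as "name too long", and a raw `ENAMETOOLONG` is used here in place of `ADIOI_Err_create_code`.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for a truncation-aware copy: copies src into
 * dest (n bytes available); returns 0 on success, 1 if src did not fit. */
static int bounded_copy(char *dest, const char *src, size_t n)
{
    size_t len;

    assert(dest != NULL && src != NULL);
    len = strlen(src);
    if (len >= n)
        return 1;               /* would overrun: report, do not copy */
    memcpy(dest, src, len + 1); /* includes the terminating NUL */
    return 0;
}

/* Hypothetical caller: failure is reported through an out-parameter,
 * in the style of the patched ADIOI_Shfp_fname signature. Resetting
 * *error_code to 0 on entry is an assumption of this sketch. */
static void build_name(char *dest, size_t n, const char *filename,
                       int *error_code)
{
    *error_code = 0;
    if (bounded_copy(dest, filename, n)) {
        *error_code = ENAMETOOLONG;
        return;
    }
}
```

A caller then tests `error_code` right after the call, much as `MPI_File_open` now does before jumping to `fn_fail`.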

http://git.mpich.org/mpich.git/commitdiff/7fec9f73c79ab55a6cb1d0fe14afe18a2240f315

commit 7fec9f73c79ab55a6cb1d0fe14afe18a2240f315
Author: Charles J Archer <charles.j.archer at intel.com>
Date:   Wed Jan 14 08:58:12 2015 -0800

    Updates to latest OFI pre-1.0 release
    
    Compile time fix required for OFI threading model
    No semantic changes
    
    Signed-off-by: Yohann Burette <yohann.burette at intel.com>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_init.c b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_init.c
index 182a780..cefa87d 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_init.c
@@ -78,7 +78,7 @@ int MPID_nem_ofi_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_
     tx_attr_t tx_attr;
     memset(&tx_attr, 0, sizeof(tx_attr));
 
-    domain_attr.threading = FI_THREAD_PROGRESS;
+    domain_attr.threading = FI_THREAD_ENDPOINT;
     domain_attr.control_progress = FI_PROGRESS_AUTO;
     tx_attr.op_flags = FI_REMOTE_COMPLETE;
     hints.domain_attr = &domain_attr;

http://git.mpich.org/mpich.git/commitdiff/30cff73f4ecc07d3387035f23f9de406b0e733df

commit 30cff73f4ecc07d3387035f23f9de406b0e733df
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Thu Jan 8 13:03:15 2015 -0600

    rptls: do not send pause/unpause messages to self
    
    CH3 ensures that self communication does not go through the netmod,
    so there is no need for a process to pause/unpause itself.
    
    Signed-off-by: Antonio J. Pena <apenya at mcs.anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
index 6e58e8f..74a39ee 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
@@ -142,7 +142,7 @@ static int poke_progress(void)
         /* if we are in a local AWAITING PAUSE ACKS state, see if we
          * can send out the unpause message */
         if (rptl->local_state == RPTL_LOCAL_STATE_AWAITING_PAUSE_ACKS &&
-            rptl->pause_ack_counter == rptl_info.world_size) {
+            rptl->pause_ack_counter == rptl_info.world_size - 1) {
             /* if we are over the max count limit, do not send an
              * unpause message yet */
             if (rptl->data.ob_curr_count > rptl->data.ob_max_count)
@@ -154,6 +154,8 @@ static int poke_progress(void)
             rptl->local_state = RPTL_LOCAL_STATE_ACTIVE;
 
             for (i = 0; i < rptl_info.world_size; i++) {
+                if (i == MPIDI_Process.my_pg_rank)
+                    continue;
                 mpi_errno = rptl_info.get_target_info(i, &id, rptl->data.pt, &data_pt, &control_pt);
                 if (mpi_errno) {
                     ret = PTL_FAIL;
@@ -182,6 +184,8 @@ static int poke_progress(void)
             /* send a pause ack message */
             assert(target->rptl);
             for (i = 0; i < rptl_info.world_size; i++) {
+                if (i == MPIDI_Process.my_pg_rank)
+                    continue;
                 /* find the target that has this target id and get the
                  * control portal information for it */
                 mpi_errno = rptl_info.get_target_info(i, &id, target->rptl->data.pt, &data_pt, &control_pt);
@@ -455,6 +459,8 @@ static int send_pause_messages(struct rptl *rptl)
     rptl->data.ob_max_count = rptl->data.ob_curr_count / 2;
 
     for (i = 0; i < rptl_info.world_size; i++) {
+        if (i == MPIDI_Process.my_pg_rank)
+            continue;
         mpi_errno = rptl_info.get_target_info(i, &id, rptl->data.pt, &data_pt, &control_pt);
         if (mpi_errno) {
             ret = PTL_FAIL;
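The two halves of this change have to agree: each loop skips the local rank, so the ack threshold drops to `world_size - 1`. A minimal sketch of that invariant, with hypothetical helper names and a counter standing in for the actual Portals message sends:

```c
#include <assert.h>

/* Count how many peers a rank would message when it skips itself. */
static int count_targets(int world_size, int my_rank)
{
    int i, sent = 0;

    assert(world_size > 0 && my_rank >= 0 && my_rank < world_size);
    for (i = 0; i < world_size; i++) {
        if (i == my_rank)
            continue;   /* self communication never goes through the netmod */
        sent++;         /* stand-in for sending a pause/unpause message */
    }
    return sent;
}

/* All acks are in once every *other* rank has answered. */
static int all_acks_received(int ack_counter, int world_size)
{
    return ack_counter == world_size - 1;
}
```

With the old threshold of `world_size`, a rank that no longer messages itself would wait for an ack that never arrives.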

http://git.mpich.org/mpich.git/commitdiff/4dd94abb51fccac523fd7f37b830583a738d13ce

commit 4dd94abb51fccac523fd7f37b830583a738d13ce
Author: Ralph Castain <rhc at open-mpi.org>
Date:   Tue Jan 6 18:40:52 2015 -0800

    ROMIO: Add missing files to Makefile.mk 'noinst'
    
    OpenMPI uses 'make dist', but MPICH does not.  Some recently added
    (internal) header files were not listed in ROMIO's noinst declaration.
    
    Note: RobL combined and edited these OpenMPI patches into this patch:
    - e0927895db8d
    - 84c41429e9ac
    
    Signed-off-by: Rob Latham <robl at mcs.anl.gov>

diff --git a/src/mpi/romio/adio/Makefile.mk b/src/mpi/romio/adio/Makefile.mk
index a64eb9e..505d518 100644
--- a/src/mpi/romio/adio/Makefile.mk
+++ b/src/mpi/romio/adio/Makefile.mk
@@ -19,7 +19,9 @@ noinst_HEADERS +=                      \
     adio/include/mpio_error.h          \
     adio/include/mpipr.h               \
     adio/include/mpiu_greq.h           \
-    adio/include/nopackage.h
+    adio/include/nopackage.h           \
+    adio/include/mpiu_external32.h     \
+    adio/include/hint_fns.h
 
 include $(top_srcdir)/adio/ad_gpfs/Makefile.mk
 include $(top_srcdir)/adio/ad_gpfs/bg/Makefile.mk

http://git.mpich.org/mpich.git/commitdiff/4c1c33fbf9ca8443338f7ef2cb5461f78d1cbe62

commit 4c1c33fbf9ca8443338f7ef2cb5461f78d1cbe62
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Tue Dec 23 10:39:50 2014 -0600

    add FCMODOUTFLAG to FC_COMPILE_MODS
    
    Adding FCMODOUTFLAG directly to AM_FCFLAGS could cause conflicts with
    certain libtool flags (-module) during linking. This change allows us
    to set FCMODOUTFLAG during module creation, but not have it present
    during linking. Refs #2024
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>

diff --git a/src/binding/fortran/use_mpi/buildiface b/src/binding/fortran/use_mpi/buildiface
index 8159e69..cc6dc72 100755
--- a/src/binding/fortran/use_mpi/buildiface
+++ b/src/binding/fortran/use_mpi/buildiface
@@ -950,7 +950,7 @@ nodist_noinst_HEADERS += \\
 
 # cause any .\$(MOD) files to be output in the f90 bindings directory instead of
 # the current directory
-AM_FCFLAGS += \$(FCMODOUTFLAG)src/binding/fortran/use_mpi
+FC_COMPILE_MODS += \$(FCMODOUTFLAG)src/binding/fortran/use_mpi
 
 mpi_fc_sources += \\
     src/binding/fortran/use_mpi/typef90cmplxf.c \\

http://git.mpich.org/mpich.git/commitdiff/56044c3a33e1200855ff8edb72dea8ee6a4f5996

commit 56044c3a33e1200855ff8edb72dea8ee6a4f5996
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Tue Dec 23 11:27:57 2014 -0600

    patch libtool for ifort on darwin
    
    Recent versions of ifort on darwin will drop flags intended for the
    linker unless they are prefixed with "-Wl,". Jeff Hammond checked with
    the Intel compiler folks, and they confirmed that "-Wl," has been
    supported since the initial ifort release on OSX (9.1).
    
    Closes #2024
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>

diff --git a/autogen.sh b/autogen.sh
index cdbd09d..8a8ccd9 100755
--- a/autogen.sh
+++ b/autogen.sh
@@ -951,7 +951,22 @@ if [ "$do_build_configure" = "yes" ] ; then
                     fi
                 fi
 
-                if [ $powerpcle_patch_requires_rebuild = "yes" -o $nagfor_patch_requires_rebuild = "yes" ] ; then
+                # There is no need to patch if we're not going to use Fortran.
+                ifort_patch_requires_rebuild=no
+                if [ $do_bindings = "yes" ] ; then
+                    echo_n "Patching libtool.m4 for compatibility with ifort on OSX... "
+                    patch -N -s -l $amdir/confdb/libtool.m4 maint/darwin-ifort.patch
+                    if [ $? -eq 0 ] ; then
+                        ifort_patch_requires_rebuild=yes
+                        # Remove possible leftovers, which don't imply a failure
+                        rm -f $amdir/confdb/libtool.m4.orig
+                        echo "done"
+                    else
+                        echo "failed"
+                    fi
+                fi
+
+                if [ $powerpcle_patch_requires_rebuild = "yes" -o $nagfor_patch_requires_rebuild = "yes" -o $ifort_patch_requires_rebuild = "yes" ] ; then
                     # Rebuild configure
                     (cd $amdir && $autoconf -f) || exit 1
                     # Reset libtool.m4 timestamps to avoid confusing make
diff --git a/maint/darwin-ifort.patch b/maint/darwin-ifort.patch
new file mode 100644
index 0000000..42c7816
--- /dev/null
+++ b/maint/darwin-ifort.patch
@@ -0,0 +1,14 @@
+--- confdb/libtool.m4~ 2014-12-23 10:59:38.000000000 -0600
++++ confdb/libtool.m4  2014-12-23 11:05:54.000000000 -0600
+@@ -1097,7 +1097,10 @@
+   _LT_TAGVAR(link_all_deplibs, $1)=yes
+   _LT_TAGVAR(allow_undefined_flag, $1)="$_lt_dar_allow_undefined"
+   case $cc_basename in
+-     ifort*) _lt_dar_can_shared=yes ;;
++     ifort*)
++        _LT_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,'
++        _lt_dar_can_shared=yes
++      ;;
+      *) _lt_dar_can_shared=$GCC ;;
+   esac
+   if test "$_lt_dar_can_shared" = "yes"; then

http://git.mpich.org/mpich.git/commitdiff/ff007459aa3d5d4ae1f2d70ef59e714690a7e619

commit ff007459aa3d5d4ae1f2d70ef59e714690a7e619
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Wed Dec 17 16:15:20 2014 -0600

    Fortran profiling interface fix
    
    Previous re-organization of the library symbols resulted in a
    situation where Fortran programs could no longer be profiled using
    tools written in C. Functions in libmpifort directly called the
    PMPI_* versions in libmpi.
    
    Now we always call the MPI_* versions from libmpifort. In the case
    where we are building a separate profiling library, we use a new
    preprocessor flag to ensure we call PMPI_* from inside libpmpi.
    
    Additional bug fix:
      - always define mpi_conversion_fn_null_, there is no pmpi version
    
    Fixes #2209
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>

diff --git a/Makefile.am b/Makefile.am
index 818f60d..463bfe4 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -150,6 +150,7 @@ if BUILD_PROFILING_LIB
 lib_LTLIBRARIES += lib/lib at PMPILIBNAME@.la
 lib_lib at PMPILIBNAME@_la_SOURCES = $(mpi_sources) $(mpi_f77_sources) $(mpi_core_sources)
 lib_lib at PMPILIBNAME@_la_LDFLAGS = $(external_ldflags) $(ABIVERSIONFLAGS)
+lib_lib at PMPILIBNAME@_la_CPPFLAGS = $(AM_CPPFLAGS) -DF77_USE_PMPI
 lib_lib at PMPILIBNAME@_la_LIBADD = $(external_libs) $(pmpi_convenience_libs)
 EXTRA_lib_lib at PMPILIBNAME@_la_DEPENDENCIES = $(pmpi_convenience_libs)
 
diff --git a/src/binding/fortran/mpif_h/buildiface b/src/binding/fortran/mpif_h/buildiface
index 5498459..62e0073 100755
--- a/src/binding/fortran/mpif_h/buildiface
+++ b/src/binding/fortran/mpif_h/buildiface
@@ -1367,11 +1367,13 @@ sub print_name_map_block {
 #define ${lcprefix}${lcname}_ p${lcprefix}${lcname}_
 #endif /* Test on name mapping */
 
+#ifdef F77_USE_PMPI
 /* This defines the routine that we call, which must be the PMPI version
    since we're renaming the Fortran entry as the pmpi version.  The MPI name
    must be undefined first to prevent any conflicts with previous renamings. */
 #undef ${ucprefix}${routine_name}
 #define ${ucprefix}${routine_name} P${ucprefix}${routine_name} 
+#endif
 
 #else
 ";
@@ -5327,12 +5329,10 @@ extern FORT_DLL_SPEC int FORT_CALL mpi_conversion_fn_null_ ( void*v1, MPI_Fint*v
 
 #endif
 
-#ifndef MPICH_MPI_FROM_PMPI
 /* This isn't a callable function */
 FORT_DLL_SPEC int FORT_CALL mpi_conversion_fn_null_ ( void*v1, MPI_Fint*v2, MPI_Fint*v3, void*v4, MPI_Offset*v5, MPI_Fint *v6, MPI_Fint*v7, MPI_Fint *ierr ) {
     return 0;
 }
-#endif
 
 ";
 

http://git.mpich.org/mpich.git/commitdiff/8e4302ce935bab8007420e865b53e04521dfd8df

commit 8e4302ce935bab8007420e865b53e04521dfd8df
Author: William Gropp <wgropp at illinois.edu>
Date:   Thu Dec 18 10:41:11 2014 -0600

    Add timelimit option
    
    Adds a way to pass a timelimit argument to the run command, as long
    as the timelimit is in seconds.  This is enough for some of the MPICH
    versions of mpiexec and for recent versions of the Cray aprun command.
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/test/mpi/runtests.in b/test/mpi/runtests.in
index 3edc028..a6d05f3 100644
--- a/test/mpi/runtests.in
+++ b/test/mpi/runtests.in
@@ -49,6 +49,9 @@ $mpiexec = "@MPIEXEC@";    # Name of mpiexec program (including path, if necessa
 # "-ppn %d"
 $ppnArg  = "";
 $ppnMax  = -1;
+# timelimitArg is the argument to use to mpiexec to set the timelimit
+# in seconds.  The format is "string%d", e.g., "-t %d" for Cray aprun
+$timelimitArg="";
 #
 $testIsStrict = "@MPI_IS_STRICT@";
 $MPIhasMPIX   = "@MPI_HAS_MPIX@";
@@ -160,6 +163,9 @@ if (defined($ENV{'MPITEST_PPNARG'})) {
 if (defined($ENV{'MPITEST_PPNMAX'})) {
     $ppnMax = $ENV{'MPITEST_PPNMAX'};
 }
+if (defined($ENV{'MPITEST_TIMELIMITARG'})) {
+    $timelimitArg = $ENV{'MPITEST_TIMELIMITARG'};
+}
 
 #---------------------------------------------------------------------------
 # Process arguments and override any defaults
@@ -174,6 +180,7 @@ foreach $_ (@ARGV) {
     elsif (/--?maxnp=(\d+)/) { $np_max = $1; }
     elsif (/--?ppn=(\d+)/)  { $ppnMax = $1; }
     elsif (/--?ppnarg=(.*)/) { $ppnArg = $1; }
+    elsif (/--?timelimitarg=(.*)/) { $timelimitArg = $1; }
     elsif (/--?tests=(.*)/) { $listfiles = $1; }
     elsif (/--?srcdir=(.*)/) { $srcdir = $1; }
     elsif (/--?verbose/) { $verbose = 1; }
@@ -245,6 +252,7 @@ foreach $_ (@ARGV) {
 	print STDERR "runtests [-tests=testfile] [-np=nprocesses] \
         [-maxnp=max-nprocesses] [-srcdir=location-of-tests] \
         [-ppn=max-proc-per-node] [-ppnarg=string] \
+        [-timelimitarg=string] \
         [-xmlfile=filename ] [-tapfile=filename ] \
         [-junitfile=filename ] [-noxmlclose] \
         [-verbose] [-showprogress] [-debug] [-batch]\n";
@@ -689,6 +697,13 @@ sub RunMPIProgram {
 	$extraArgs .= " " . $ppnargs;
     }
 
+    # Handle the timelimit option.
+    if ($timelimitArg ne "" && $timeout> 0) {
+        $tlargs = "";
+	$tlargs = $timelimitArg;
+	$tlargs =~ s/\%d/$timeout/;
+	$extraArgs .= " " . $tlargs;
+    }
 
     # Run the optional setup routine. For example, the timeout tests could
     # be set to a shorter timeout.
@@ -831,6 +846,15 @@ sub AddMPIProgram {
 	$extraArgs .= " " . $ppnargs;
     }
 
+    # Handle the timelimit option.
+    if ($timelimitArg ne "" && $timeout> 0) {
+        $tlargs = "";
+	$tlargs = $timelimitArg;
+	$tlargs =~ s/\%d/$timeout/;
+	$extraArgs .= " " . $tlargs;
+    }
+
+
     print STDOUT "Env includes $progEnv\n" if $verbose;
     print STDOUT "$mpiexec $np_arg $np $extraArgs $program_wrapper ./$programname $progArgs\n" if $verbose;
     print STDOUT "." if $showProgress;
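The timelimit handling amounts to replacing the first `%d` in the template with the timeout in seconds. The script does this in Perl with `s/\%d/$timeout/`; the same substitution is sketched below in C for illustration only. `expand_timelimit` is a hypothetical name, and returning an error on a too-small buffer is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Replace the first "%d" in tmpl with timeout, writing into out.
 * A template without "%d" is copied unchanged.
 * Returns 0 on success, -1 if out is too small. */
static int expand_timelimit(const char *tmpl, int timeout,
                            char *out, size_t outsz)
{
    const char *p;
    int n;

    assert(tmpl != NULL && out != NULL);
    p = strstr(tmpl, "%d");
    if (p)
        /* text before "%d", the number, then the text after "%d" */
        n = snprintf(out, outsz, "%.*s%d%s",
                     (int) (p - tmpl), tmpl, timeout, p + 2);
    else
        n = snprintf(out, outsz, "%s", tmpl);
    return (n < 0 || (size_t) n >= outsz) ? -1 : 0;
}
```

For the Cray aprun format mentioned in the patch, a template of `-t %d` with a timeout of 180 expands to `-t 180`, which the script then appends to `$extraArgs`.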

http://git.mpich.org/mpich.git/commitdiff/ed1e71a092d69fd852152d8dbfc7a502e48c8316

commit ed1e71a092d69fd852152d8dbfc7a502e48c8316
Author: William Gropp <wgropp at illinois.edu>
Date:   Thu Dec 11 11:51:05 2014 -0800

    Correct testsuite dist target support
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/test/mpi/ckpoint/Makefile.am b/test/mpi/ckpoint/Makefile.am
index 456e04d..d897d14 100644
--- a/test/mpi/ckpoint/Makefile.am
+++ b/test/mpi/ckpoint/Makefile.am
@@ -7,6 +7,8 @@
 
 include $(top_srcdir)/Makefile.mtest
 
+EXTRA_DIST = testlist
+
 ## for all programs that are just built from the single corresponding source
 ## file, we don't need per-target _SOURCES rules, automake will infer them
 ## correctly
diff --git a/test/mpi/cxx/topo/Makefile.am b/test/mpi/cxx/topo/Makefile.am
index 7df4fd8..da38cda 100644
--- a/test/mpi/cxx/topo/Makefile.am
+++ b/test/mpi/cxx/topo/Makefile.am
@@ -7,7 +7,7 @@
 
 include $(top_srcdir)/Makefile_cxx.mtest
 
-EXTRA_DIST = testlist distgraphcxx.cxx
+EXTRA_DIST = testlist.in distgraphcxx.cxx
 
 # avoid having to write many "foo_SOURCES = foo.cxx" lines because
 # automake is too limited to figure this out for itself
diff --git a/test/mpi/datatype/Makefile.am b/test/mpi/datatype/Makefile.am
index 2840393..ee6d9c1 100644
--- a/test/mpi/datatype/Makefile.am
+++ b/test/mpi/datatype/Makefile.am
@@ -7,7 +7,7 @@
 
 include $(top_srcdir)/Makefile.mtest
 
-EXTRA_DIST = testlist
+EXTRA_DIST = testlist.in
 
 ## For some reason, these tests were missing from both the simplemakefile and
 ## the testlist.  Leaving them disabled for now.
diff --git a/test/mpi/errors/Makefile.am b/test/mpi/errors/Makefile.am
index 760e420..8500b36 100644
--- a/test/mpi/errors/Makefile.am
+++ b/test/mpi/errors/Makefile.am
@@ -7,6 +7,8 @@
 
 include $(top_srcdir)/Makefile.mtest
 
+EXTRA_DIST = testlist.in
+
 # FIXME should "datatype" be included in this list?  It was not in the
 # simplemake version for some reason and is also missing from the testlist
 # file
diff --git a/test/mpi/errors/f77/Makefile.am b/test/mpi/errors/f77/Makefile.am
index 1cb612a..39cca6c 100644
--- a/test/mpi/errors/f77/Makefile.am
+++ b/test/mpi/errors/f77/Makefile.am
@@ -7,7 +7,7 @@
 
 include $(top_srcdir)/Makefile_f77.mtest
 
-EXTRA_DIST = testlist
+EXTRA_DIST = testlist.in
 
 SUBDIRS = @iodir@
 DIST_SUBDIRS = io
diff --git a/test/mpi/errors/f90/Makefile.am b/test/mpi/errors/f90/Makefile.am
index 6824021..ab3aeed 100644
--- a/test/mpi/errors/f90/Makefile.am
+++ b/test/mpi/errors/f90/Makefile.am
@@ -7,7 +7,7 @@
 
 include $(top_srcdir)/Makefile_f90.mtest
 
-EXTRA_DIST = testlist
+EXTRA_DIST = testlist.in
 
 SUBDIRS = @iodir@
 DIST_SUBDIRS = io
diff --git a/test/mpi/f08/Makefile.am b/test/mpi/f08/Makefile.am
index 64d0fc8..40e7366 100644
--- a/test/mpi/f08/Makefile.am
+++ b/test/mpi/f08/Makefile.am
@@ -7,5 +7,7 @@
 
 include $(top_srcdir)/Makefile_f08.mtest
 
+EXTRA_DIST = testlist
+
 SUBDIRS = attr coll comm datatype ext info init io misc profile pt2pt rma spawn subarray timer topo
-DIST_SUBDIRS = $(SUBDIRS)
+DIST_SUBDIRS = $(SUBDIRS) util
diff --git a/test/mpi/f08/coll/Makefile.am b/test/mpi/f08/coll/Makefile.am
index 46ac1a1..1b7d0b1 100644
--- a/test/mpi/f08/coll/Makefile.am
+++ b/test/mpi/f08/coll/Makefile.am
@@ -7,6 +7,8 @@
 
 include $(top_srcdir)/Makefile_f08.mtest
 
+EXTRA_DIST = testlist
+
 # avoid having to write many "foo_SOURCES = foo.f90" lines
 AM_DEFAULT_SOURCE_EXT = .f90
 
diff --git a/test/mpi/f08/datatype/Makefile.am b/test/mpi/f08/datatype/Makefile.am
index 2de5301..8ab9147 100644
--- a/test/mpi/f08/datatype/Makefile.am
+++ b/test/mpi/f08/datatype/Makefile.am
@@ -9,6 +9,8 @@
 
 include $(top_srcdir)/Makefile_f08.mtest
 
+EXTRA_DIST = testlist
+
 # avoid having to write many "foo_SOURCES = foo.f90" lines
 AM_DEFAULT_SOURCE_EXT = .f90
 
diff --git a/test/mpi/f08/ext/Makefile.am b/test/mpi/f08/ext/Makefile.am
index 426f9ed..7beb9bd 100644
--- a/test/mpi/f08/ext/Makefile.am
+++ b/test/mpi/f08/ext/Makefile.am
@@ -7,7 +7,7 @@
 
 include $(top_srcdir)/Makefile_f08.mtest
 
-EXTRA_DIST = testlist.in
+EXTRA_DIST = testlist
 
 # allocmemf is an "extra" program because it requires a Fortran extension
 EXTRA_PROGRAMS = allocmemf90
diff --git a/test/mpi/f08/pt2pt/Makefile.am b/test/mpi/f08/pt2pt/Makefile.am
index 4df3e41..eb0e5dc 100644
--- a/test/mpi/f08/pt2pt/Makefile.am
+++ b/test/mpi/f08/pt2pt/Makefile.am
@@ -7,6 +7,8 @@
 
 include $(top_srcdir)/Makefile_f08.mtest
 
+EXTRA_DIST = testlist
+
 # avoid having to write many "foo_SOURCES = foo.f90" lines
 AM_DEFAULT_SOURCE_EXT = .f90
 
diff --git a/test/mpi/f08/rma/Makefile.am b/test/mpi/f08/rma/Makefile.am
index 4588779..0c45f79 100644
--- a/test/mpi/f08/rma/Makefile.am
+++ b/test/mpi/f08/rma/Makefile.am
@@ -9,7 +9,7 @@
 
 include $(top_srcdir)/Makefile_f08.mtest
 
-EXTRA_DIST = testlist
+EXTRA_DIST = testlist.in
 
 # avoid having to write many "foo_SOURCES = foo.f90" lines
 AM_DEFAULT_SOURCE_EXT = .f90
diff --git a/test/mpi/f77/rma/Makefile.am b/test/mpi/f77/rma/Makefile.am
index b0c820b..52ac79d 100644
--- a/test/mpi/f77/rma/Makefile.am
+++ b/test/mpi/f77/rma/Makefile.am
@@ -7,7 +7,7 @@
 
 include $(top_srcdir)/Makefile_f77.mtest
 
-EXTRA_DIST = testlist
+EXTRA_DIST = testlist.in
 
 # avoid having to write many "foo_SOURCES = foo.f" lines
 AM_DEFAULT_SOURCE_EXT = .f
diff --git a/test/mpi/f77/spawn/Makefile.am b/test/mpi/f77/spawn/Makefile.am
index 30f0302..36e6119 100644
--- a/test/mpi/f77/spawn/Makefile.am
+++ b/test/mpi/f77/spawn/Makefile.am
@@ -7,7 +7,7 @@
 
 include $(top_srcdir)/Makefile_f77.mtest
 
-EXTRA_DIST = testlist
+EXTRA_DIST = testlist.in
 
 # avoid having to write many "foo_SOURCES = foo.f" lines
 AM_DEFAULT_SOURCE_EXT = .f
diff --git a/test/mpi/f90/misc/Makefile.am b/test/mpi/f90/misc/Makefile.am
index 6d87255..600378f 100644
--- a/test/mpi/f90/misc/Makefile.am
+++ b/test/mpi/f90/misc/Makefile.am
@@ -9,6 +9,8 @@ include $(top_srcdir)/Makefile_f90.mtest
 
 EXTRA_DIST = testlist
 
+EXTRA_DIST += testlist.ap
+
 noinst_PROGRAMS = sizeof2
 sizeof2_SOURCES = sizeof2.f90
 
diff --git a/test/mpi/ft/Makefile.am b/test/mpi/ft/Makefile.am
index 11d9136..5b4a012 100644
--- a/test/mpi/ft/Makefile.am
+++ b/test/mpi/ft/Makefile.am
@@ -7,6 +7,8 @@
 
 include $(top_srcdir)/Makefile.mtest
 
+EXTRA_DIST = testlist
+
 ## for all programs that are just built from the single corresponding source
 ## file, we don't need per-target _SOURCES rules, automake will infer them
 ## correctly
diff --git a/test/mpi/io/Makefile.am b/test/mpi/io/Makefile.am
index 4d5ba77..8c64d42 100644
--- a/test/mpi/io/Makefile.am
+++ b/test/mpi/io/Makefile.am
@@ -7,7 +7,7 @@
 
 include $(top_srcdir)/Makefile.mtest
 
-EXTRA_DIST = testlist
+EXTRA_DIST = testlist.in
 
 ## for all programs that are just built from the single corresponding source
 ## file, we don't need per-target _SOURCES rules, automake will infer them
diff --git a/test/mpi/rma/Makefile.am b/test/mpi/rma/Makefile.am
index 0fce1ad..507de42 100644
--- a/test/mpi/rma/Makefile.am
+++ b/test/mpi/rma/Makefile.am
@@ -7,7 +7,7 @@
 
 include $(top_srcdir)/Makefile.mtest
 
-EXTRA_DIST = testlist
+EXTRA_DIST = testlist.in
 
 ## for all programs that are just built from the single corresponding source
 ## file, we don't need per-target _SOURCES rules, automake will infer them
diff --git a/test/mpi/util/Makefile.am b/test/mpi/util/Makefile.am
index 9f33fdd..a31cc70 100644
--- a/test/mpi/util/Makefile.am
+++ b/test/mpi/util/Makefile.am
@@ -2,6 +2,7 @@
 AM_CPPFLAGS = -I${srcdir}/../include -I../include
 
 mtest.$(OBJEXT): mtest.c
+mtest_datatype.$(OBJEXT): mtest_datatype.c mtest_datatype.h
 dtypes.$(OBJEXT): dtypes.c
 nbc_pmpi_adapter.$(OBJEXT): nbc_pmpi_adapter.c
 all-local: mtest.$(OBJEXT) dtypes.$(OBJEXT) nbc_pmpi_adapter.$(OBJEXT)
@@ -10,5 +11,9 @@ EXTRA_PROGRAMS = mtestcheck dtypes
 mtestcheck_SOURCES = mtestcheck.c mtest.c
 
 # exploiting the NBC PMPI adapter is still very much a manual process...
-EXTRA_DIST = nbc_pmpi_adapter.c
+# mtest_datatype.c and mtest_datatype_gen.c also needed
+# FIXME: mtest_datatype.h belongs with the other include files, in
+# ../include
+EXTRA_DIST = nbc_pmpi_adapter.c mtest_datatype.c mtest_datatype.h \
+	mtest_datatype_gen.c
 

http://git.mpich.org/mpich.git/commitdiff/6532a4765333f0e49b8d06214e9a2db637067a20

commit 6532a4765333f0e49b8d06214e9a2db637067a20
Author: William Gropp <wgropp at illinois.edu>
Date:   Thu Dec 11 11:50:31 2014 -0800

    Update test suite README
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/test/mpi/README b/test/mpi/README
index 7b81d59..fa8baaf 100644
--- a/test/mpi/README
+++ b/test/mpi/README
@@ -1,7 +1,7 @@
 MPICH Test Suite
 
 This test suite is a *supplement* to other test suites, including the
-original MPICH testsuite, the Intel testsuite, and the IBM MPI test suite 
+original MPICH testsuite, the Intel testsuite, and the IBM MPI test suite
 (or test suites derived from that test, including the MPI C++ tests).
 
 Building the Test Suite
@@ -12,7 +12,7 @@ automatically.  In some cases, it will need some help.  For example:
 For IBM MPI, where the compilation commands are not mpicc and mpif77 etc.:
 
 ./configure CC=xlc MPICC=mpcc F77=xlf MPIF77=mpxlf CXX=xlC \
-                       MPICXX="mpCC -cpp" F90=xlf90 MPIF90=mpxlf90 \
+                       MPICXX="mpCC -cpp" FC=xlf90 MPIFC=mpxlf90 \
 		       --disable-spawn \
 		       --enable-strictmpi
 
@@ -82,13 +82,25 @@ to checktests:
 
 cd btest && ../checktests --ignorebogus
 
+See "More control over running tests" to see how to control how many
+processes per node are used.  For example, on a Cray XE-6, this command
+line to runtests can be used:
+
+  runtests -batch -tests=testlist -ppnarg="-N %d" -ppn=2 -showprogress \
+           -mpiexec=aprun
+
+This runs at most 2 processes per node.  Note that this can take a long
+time to execute because it builds all of the executables required for the
+tests (over 800 of them!).  The "-showprogress" flag lets you know that
+something is happening, but is not necessary.
+
 Controlling the Tests that are Run
 ==================================
-The tests are actually built and run by the script "runtests".  This script 
+The tests are actually built and run by the script "runtests".  This script
 can be given a file that contains a list of the tests to run.  This file has
 two primary types of entries:
 
-    directories:  Enter directory and look for the file "testlist".  
+    directories:  Enter directory and look for the file "testlist".
                   Recursively run the contents of that file
     program names: Build and run that program
 
@@ -153,3 +165,41 @@ resultTest=proc : This is used to change the way in which the success or
                   in fact handled.
 
 
+More control over running tests
+===============================
+
+You can provide a "processes per node" argument to the run command (typically
+mpiexec) with either options to "runtests" or environment variables.
+The two values are
+
+-ppnarg=string or MPITEST_PPNARG
+   The string used to specify the number of processes per node.  The number
+   of processes to use will be substituted for the %d in the string.  For
+   example,
+
+   export MPITEST_PPNARG="-ppn %d"
+
+-ppn=n or MPITEST_PPNMAX
+   The maximum number of processes per node.  For example
+
+   runtests ... -ppn=2
+
+   This allows the runtests script to ensure that the value of the
+   processes per node argument does not exceed the total number of processes;
+   some run commands (e.g., aprun on Cray) require that the number of
+   processes per node be no greater than the total number of processes.
+
+Note that for most systems it will be important to run the tests
+multiple times, using this option to ensure that the tests that
+involve more than one process are run in each of the following cases:
+1) Multiple MPI processes per chip (likely using shared memory to
+communicate between processes)
+2) MPI processes on separate chips within the same node (also likely
+using shared memory between processes, but may use a different
+approach to handle the NUMA nature of this case)
+3) MPI processes on separate nodes (likely using the best available
+interconnect).
+Note, this depends on the nature of the MPI implementation; these
+options make it easier to run the necessary cases.  If you run only
+the first case, which is often the default case, you may not
+effectively test the MPI implementation.
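
The "-ppnarg"/"-ppn" behavior described in the README text above can be
sketched in C as follows.  This is only an illustration of the clamp and
substitution steps, not code from the test suite (the real logic lives in
the Perl script runtests); the function name build_ppn_arg and its
signature are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the processes-per-node argument.
 *   tmpl    - template such as "-ppn %d" or "-N %d" (the -ppnarg value)
 *   ppn_max - the -ppn / MPITEST_PPNMAX value; <= 0 means "ignore"
 *   np      - total number of processes used by the test
 * Returns the number of characters written, 0 if no argument should be
 * added, or -1 if the output buffer is too small.
 */
int build_ppn_arg(const char *tmpl, int ppn_max, int np,
                  char *out, size_t outlen)
{
    const char *slot;
    int nn, n;

    if (outlen == 0)
        return -1;
    out[0] = '\0';
    if (tmpl == NULL || tmpl[0] == '\0' || ppn_max <= 0)
        return 0;

    /* Some launchers (e.g., aprun) reject a per-node count that is
     * larger than the total number of processes, so clamp it. */
    nn = (ppn_max > np) ? np : ppn_max;

    /* Substitute the first "%d" in the template, if there is one. */
    slot = strstr(tmpl, "%d");
    if (slot == NULL)
        n = snprintf(out, outlen, "%s", tmpl);
    else
        n = snprintf(out, outlen, "%.*s%d%s",
                     (int) (slot - tmpl), tmpl, nn, slot + 2);

    if (n < 0 || (size_t) n >= outlen) {
        out[0] = '\0';
        return -1;
    }
    return n;
}
```

With the template "-ppn %d", a maximum of 8 and a 3-process test, the
sketch produces "-ppn 3" rather than "-ppn 8".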

http://git.mpich.org/mpich.git/commitdiff/b33d3f60e2a86c6896f234a1893771fc2be49372

commit b33d3f60e2a86c6896f234a1893771fc2be49372
Author: William Gropp <wgropp at illinois.edu>
Date:   Thu Dec 11 11:50:00 2014 -0800

    Add support for procs-per-node to testsuite runs
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/test/mpi/runtests.in b/test/mpi/runtests.in
index ba8dd6a..3edc028 100644
--- a/test/mpi/runtests.in
+++ b/test/mpi/runtests.in
@@ -44,6 +44,12 @@ use File::Copy qw(move);
 $MPIMajorVersion = "@MPI_VERSION@";
 $MPIMinorVersion = "@MPI_SUBVERSION@";
 $mpiexec = "@MPIEXEC@";    # Name of mpiexec program (including path, if necessary)
+# ppnMax is the maximum number of processes per node.  -1 means ignore.
+# ppnArg is the argument to use to mpiexec - format is "string%d"; e.g.,
+# "-ppn %d"
+$ppnArg  = "";
+$ppnMax  = -1;
+#
 $testIsStrict = "@MPI_IS_STRICT@";
 $MPIhasMPIX   = "@MPI_HAS_MPIX@";
 $runxfail     = "@RUN_XFAIL@";
@@ -147,6 +153,13 @@ if (defined($ENV{'MPITEST_BATCH'})) {
 if (defined($ENV{'MPITEST_BATCHDIR'})) {
     $batrundir = $ENV{'MPITEST_BATCHDIR'};
 }
+# PPN support
+if (defined($ENV{'MPITEST_PPNARG'})) {
+    $ppnArg = $ENV{'MPITEST_PPNARG'};
+}
+if (defined($ENV{'MPITEST_PPNMAX'})) {
+    $ppnMax = $ENV{'MPITEST_PPNMAX'};
+}
 
 #---------------------------------------------------------------------------
 # Process arguments and override any defaults
@@ -157,8 +170,10 @@ foreach $_ (@ARGV) {
 	# we don't want to bother to try and find it.
 	$mpiexec = $1; 
     }
-    elsif (/--?np=(.*)/)   { $np_default = $1; }
-    elsif (/--?maxnp=(.*)/) { $np_max = $1; }
+    elsif (/--?np=(\d+)/)   { $np_default = $1; }
+    elsif (/--?maxnp=(\d+)/) { $np_max = $1; }
+    elsif (/--?ppn=(\d+)/)  { $ppnMax = $1; }
+    elsif (/--?ppnarg=(.*)/) { $ppnArg = $1; }
     elsif (/--?tests=(.*)/) { $listfiles = $1; }
     elsif (/--?srcdir=(.*)/) { $srcdir = $1; }
     elsif (/--?verbose/) { $verbose = 1; }
@@ -229,6 +244,7 @@ foreach $_ (@ARGV) {
 	print STDERR "Unrecognized argument $_\n";
 	print STDERR "runtests [-tests=testfile] [-np=nprocesses] \
         [-maxnp=max-nprocesses] [-srcdir=location-of-tests] \
+        [-ppn=max-proc-per-node] [-ppnarg=string] \
         [-xmlfile=filename ] [-tapfile=filename ] \
         [-junitfile=filename ] [-noxmlclose] \
         [-verbose] [-showprogress] [-debug] [-batch]\n";
@@ -648,6 +664,7 @@ sub RunMPIProgram {
     my $found_error   = 0;
     my $found_noerror = 0;
     my $inline = "";
+    my $extraArgs = "";
 
     &RunPreMsg( $programname, $np, $curdir );
 
@@ -659,14 +676,27 @@ sub RunMPIProgram {
 	$timeout = $timeLimit;
     }
     $ENV{"MPIEXEC_TIMEOUT"} = $timeout;
-    
+
+    # Handle the ppn (processes per node) option.
+    $ppnargs = "";
+    if ($ppnArg ne "" && $ppnMax > 0) {
+	$ppnargs = $ppnArg;
+	$nn = $ppnMax;
+	# Some systems require setting the number of processes per node
+	# no greater than the total number of processes (e.g., aprun on Cray)
+	if ($nn > $np) { $nn = $np; }
+	$ppnargs =~ s/\%d/$nn/;
+	$extraArgs .= " " . $ppnargs;
+    }
+
+
     # Run the optional setup routine. For example, the timeout tests could
     # be set to a shorter timeout.
     if ($InitForTest ne "") {
 	&$InitForTest();
     }
     print STDOUT "Env includes $progEnv\n" if $verbose;
-    print STDOUT "$mpiexec $np_arg $np $program_wrapper ./$programname $progArgs\n" if $verbose;
+    print STDOUT "$mpiexec $np_arg $np $extraArgs $mpiexecArgs $program_wrapper ./$programname $progArgs\n" if $verbose;
     print STDOUT "." if $showProgress;
     # Save and restore the environment if necessary before running mpiexec.
     if ($progEnv ne "") {
@@ -680,7 +710,7 @@ sub RunMPIProgram {
 	    }
 	}
     }
-    open ( MPIOUT, "$mpiexec $np_arg $np $mpiexecArgs $program_wrapper ./$programname $progArgs 2>&1 |" ) ||
+    open ( MPIOUT, "$mpiexec $np_arg $np $extraArgs $mpiexecArgs $program_wrapper ./$programname $progArgs 2>&1 |" ) ||
 	die "Could not run ./$programname\n";
     if ($progEnv ne "") {
 	%ENV = %saveEnv;
@@ -691,7 +721,7 @@ sub RunMPIProgram {
     }
     else {
 	if ($verbose) {
-	    $inline = "$mpiexec $np_arg $np $program_wrapper ./$programname\n";
+	    $inline = "$mpiexec $np_arg $np $extraArgs $mpiexecArgs $program_wrapper ./$programname\n";
 	}
 	else {
 	    $inline = "";
@@ -789,6 +819,18 @@ sub AddMPIProgram {
 	$extraArgs .= $timeoutArg
     }
 
+    # Handle the ppn (processes per node) option.
+    $ppnargs = "";
+    if ($ppnArg ne "" && $ppnMax > 0) {
+	$ppnargs = $ppnArg;
+	$nn = $ppnMax;
+	# Some systems require setting the number of processes per node
+	# no greater than the total number of processes (e.g., aprun on Cray)
+	if ($nn > $np) { $nn = $np; }
+	$ppnargs =~ s/\%d/$nn/;
+	$extraArgs .= " " . $ppnargs;
+    }
+
     print STDOUT "Env includes $progEnv\n" if $verbose;
     print STDOUT "$mpiexec $np_arg $np $extraArgs $program_wrapper ./$programname $progArgs\n" if $verbose;
     print STDOUT "." if $showProgress;

http://git.mpich.org/mpich.git/commitdiff/9e4d5049144bc773e0ac7560f2cd457e5b2820c8

commit 9e4d5049144bc773e0ac7560f2cd457e5b2820c8
Author: William Gropp <wgropp at illinois.edu>
Date:   Wed Dec 10 14:42:51 2014 -0800

    Check for endif in f77tof90 to avoid false warning
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/maint/f77tof90.in b/maint/f77tof90.in
index b6ecccc..340b047 100644
--- a/maint/f77tof90.in
+++ b/maint/f77tof90.in
@@ -332,7 +332,7 @@ sub ConvertMakefile {
         if (not m/^\s*#/) {
             while (m/\b(\w+f)\b/g) {
                 my $word = $1;
-                next if $word eq "if" or $word eq "rf"; # filter out some noise
+                next if $word eq "if" or $word eq "rf" or $word eq "endif"; # filter out some noise
                 if (-e "$indir/${word}.f" or
                     0 == system(qq(grep 'TESTDEFN filename="${word}\\.f"' '$indir/ioharness.defn' >/dev/null 2>&1)))
                 {
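
To see why "endif" triggered a false warning: the scan above picks up every
word that ends in "f" and then filters out known noise words.  A rough C
sketch of that filtering step is given below.  It is an illustration only;
the name count_f_candidates is hypothetical, and the real Perl script goes
on to check whether a matching ".f" file exists, which this sketch omits.

```c
#include <ctype.h>
#include <string.h>

/* Word characters in the sense of Perl's \w: letters, digits, underscore. */
static int is_word_char(int c)
{
    return isalnum(c) || c == '_';
}

/* Words that end in 'f' but are never Fortran source names. */
static int is_noise(const char *w, size_t len)
{
    static const char *noise[] = { "if", "rf", "endif" };
    size_t i;

    for (i = 0; i < sizeof noise / sizeof noise[0]; i++) {
        if (strlen(noise[i]) == len && strncmp(w, noise[i], len) == 0)
            return 1;
    }
    return 0;
}

/* Count the words on a line that end in 'f' and are not noise words. */
int count_f_candidates(const char *line)
{
    const char *p = line;
    int count = 0;

    while (*p) {
        const char *start;
        size_t len;

        /* Skip separators, then collect one word. */
        while (*p && !is_word_char((unsigned char) *p))
            p++;
        start = p;
        while (*p && is_word_char((unsigned char) *p))
            p++;
        len = (size_t) (p - start);

        if (len >= 2 && start[len - 1] == 'f' && !is_noise(start, len))
            count++;
    }
    return count;
}
```

Without "endif" in the noise list, a Makefile.am line containing only
"endif" would count as one candidate and be reported.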

http://git.mpich.org/mpich.git/commitdiff/477961071f56bb7d36b38d1ac462fb073d554503

commit 477961071f56bb7d36b38d1ac462fb073d554503
Author: Wesley Bland <wbland at anl.gov>
Date:   Mon Jan 5 12:58:58 2015 -0600

    Bring test suite version in line with MPICH
    
    Instead of using its own versioning system that wasn't getting updated
    with any regularity, now the test suite will use the same versioning
    scheme as mainline MPICH. This is consistent with other parts of MPICH
    that get distributed separately (MPL, ROMIO, Hydra).
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>

diff --git a/.gitignore b/.gitignore
index 9712189..113379a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -119,6 +119,7 @@ tags
 /src/pm/hydra/mpl/confdb
 /src/mpi/romio/version.m4
 /src/pm/hydra/version.m4
+/test/mpi/version.m4
 
 # created by the build process in the test dirs
 gen-src-stamp
diff --git a/autogen.sh b/autogen.sh
index a286f5e..cdbd09d 100755
--- a/autogen.sh
+++ b/autogen.sh
@@ -105,6 +105,7 @@ done
 if [ -f maint/version.m4 ] ; then
     cp -pPR maint/version.m4 src/pm/hydra/version.m4
     cp -pPR maint/version.m4 src/mpi/romio/version.m4
+    cp -pPR maint/version.m4 test/mpi/version.m4
 fi
 
 # Now sanity check that some of the above sync was successful
diff --git a/test/mpi/configure.ac b/test/mpi/configure.ac
index 70c05f5..c22c245 100644
--- a/test/mpi/configure.ac
+++ b/test/mpi/configure.ac
@@ -11,8 +11,13 @@ dnl
 dnl The file name here refers to a file in the source being configured
 dnl FIXME this is the old style, needs updating to new style
 dnl AC_INIT(include/mpitest.h)
-dnl FIXME duplication with VERSION variable below
-AC_INIT([mpich-testsuite],[1.2])
+m4_include([version.m4])
+AC_INIT([mpich-testsuite],
+        MPICH_VERSION_m4,
+        [discuss@mpich.org],
+        [mpich-testsuite],
+        [http://www.mpich.org/])
+
 AC_CONFIG_HEADER(include/mpitestconf.h)
 AH_TOP([/* -*- Mode: C; c-basic-offset:4 ; -*- */
 /*  
@@ -23,11 +28,8 @@ AH_TOP([/* -*- Mode: C; c-basic-offset:4 ; -*- */
 #define MPITESTCONF_H_INCLUDED
 ])
 AH_BOTTOM([#endif])
-# This version is the version of the test suite.
-# 1.0: Initial version (all versions) before independent release
-# 1.1: Initial version that is independent of MPICH  11/08
-# 1.2: Automake replaces simplemake
-VERSION=1.2
+
+VERSION=MPICH_VERSION_m4
 AC_SUBST(VERSION)
 AC_CONFIG_AUX_DIR([confdb])
 AC_CONFIG_MACRO_DIR([confdb])

http://git.mpich.org/mpich.git/commitdiff/2eab48a7fc5995340b09b38deb74e8e202820a96

commit 2eab48a7fc5995340b09b38deb74e8e202820a96
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Mon Dec 15 13:18:07 2014 -0600

    disable libtool versioning for embedded libraries
    
    Squashes a warning when using the embedded versions of OPA and MPL.
    
    Signed-off-by: Sangmin Seo <sseo at anl.gov>

diff --git a/configure.ac b/configure.ac
index 064ac11..0af4994 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1163,7 +1163,9 @@ AC_SUBST([mpllibdir])
 mpllib=""
 AC_SUBST([mpllib])
 if test "$with_mpl_prefix" = "embedded" ; then
-    PAC_CONFIG_SUBDIR(src/mpl,,AC_MSG_ERROR(MPL configure failed))
+    # no need for libtool versioning when embedding MPL
+    mpl_subdir_args="--disable-versioning"
+    PAC_CONFIG_SUBDIR_ARGS([src/mpl],[$mpl_subdir_args],[],[AC_MSG_ERROR(MPL configure failed)])
     PAC_APPEND_FLAG([-I${master_top_builddir}/src/mpl/include], [CPPFLAGS])
     PAC_APPEND_FLAG([-I${use_top_srcdir}/src/mpl/include], [CPPFLAGS])
 
@@ -1229,9 +1231,10 @@ if test "$with_openpa_prefix" = "embedded" ; then
                                      implementation.  See the src/openpa directory
                                      for more info.])],
                     [],[with_atomic_primitives=not_specified])
-        opa_subdir_args=""
+        # no need for libtool versioning when embedding OPA
+        opa_subdir_args="--disable-versioning"
         if test "$with_atomic_primitives" = "not_specified" ; then
-            opa_subdir_args="--with-atomic-primitives=auto_allow_emulation"
+            PAC_APPEND_FLAG([--with-atomic-primitives=auto_allow_emulation], [opa_subdir_args])
         fi
         PAC_CONFIG_SUBDIR_ARGS([src/openpa],[$opa_subdir_args],[],[AC_MSG_ERROR([OpenPA configure failed])])
     else

http://git.mpich.org/mpich.git/commitdiff/d421f39c9a75ad5ed159e4eb6b7f9bff74c91d30

commit d421f39c9a75ad5ed159e4eb6b7f9bff74c91d30
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Thu Dec 4 13:54:58 2014 -0600

    remove mpl/opa libs from external flags when embedded
    
    We were incorrectly adding the build directories for mpl and opa to
    external_ldflags in Makefile.am, causing them to be listed in the
    installed libmpi.la libtool file. If a linker does not handle this
    potentially non-existent build directory gracefully, it could cause
    an issue. Since the mpl and opa libraries are now embedded in libmpi
    by default, we simply eliminate the flags unless we are using
    pre-built, external libraries. Fixes #2208
    
    Thanks to Markus Geimer for the bug report and suggested solution.
    
    Signed-off-by: Sangmin Seo <sseo at anl.gov>

diff --git a/Makefile.am b/Makefile.am
index d08430e..818f60d 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -48,7 +48,7 @@ pkgconfigdir = @pkgconfigdir@
 errnames_txt_files = 
 
 external_subdirs = @mplsrcdir@ @opasrcdir@
-external_ldflags = -L@mpllibdir@ -L@opalibdir@
+external_ldflags = @mpllibdir@ @opalibdir@
 external_libs = @EXTERNAL_LIBS@
 mpi_convenience_libs =
 pmpi_convenience_libs = @mpllib@ @opalib@
diff --git a/configure.ac b/configure.ac
index 552e1d9..064ac11 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1167,7 +1167,6 @@ if test "$with_mpl_prefix" = "embedded" ; then
     PAC_APPEND_FLAG([-I${master_top_builddir}/src/mpl/include], [CPPFLAGS])
     PAC_APPEND_FLAG([-I${use_top_srcdir}/src/mpl/include], [CPPFLAGS])
 
-    mpllibdir="${master_top_builddir}/src/mpl"
     mplsrcdir="${master_top_builddir}/src/mpl"
     mpllib="src/mpl/lib${MPLLIBNAME}.la"
 else
@@ -1178,7 +1177,7 @@ else
     PAC_APPEND_FLAG([-I${with_mpl_prefix}/include],[CPPFLAGS])
     PAC_PREPEND_FLAG([-l${MPLLIBNAME}],[EXTERNAL_LIBS])
     PAC_APPEND_FLAG([-L${with_mpl_prefix}/lib],[WRAPPER_LDFLAGS])
-    mpllibdir="${with_mpl_prefix}/lib"
+    mpllibdir="-L${with_mpl_prefix}/lib"
 fi
 
 # OpenPA
@@ -1218,7 +1217,6 @@ AC_SUBST([opalib])
 if test "$with_openpa_prefix" = "embedded" ; then
     if test -e "${use_top_srcdir}/src/openpa" ; then
         opasrcdir="${master_top_builddir}/src/openpa"
-        opalibdir="${master_top_builddir}/src/openpa/src"
         opalib="${master_top_builddir}/src/openpa/src/lib${OPALIBNAME}.la"
         PAC_APPEND_FLAG([-I${use_top_srcdir}/src/openpa/src],[CPPFLAGS])
         PAC_APPEND_FLAG([-I${master_top_builddir}/src/openpa/src],[CPPFLAGS])
@@ -1254,9 +1252,9 @@ else
     PAC_PREPEND_FLAG([-l${OPALIBNAME}],[EXTERNAL_LIBS])
     if test -d ${with_openpa_prefix}/lib64 ; then
         PAC_APPEND_FLAG([-L${with_openpa_prefix}/lib64],[WRAPPER_LDFLAGS])
-        opalibdir="${with_openpa_prefix}/lib64"
+        opalibdir="-L${with_openpa_prefix}/lib64"
     else
-        opalibdir="${with_openpa_prefix}/lib"
+        opalibdir="-L${with_openpa_prefix}/lib"
     fi
     PAC_APPEND_FLAG([-L${with_openpa_prefix}/lib],[WRAPPER_LDFLAGS])
 fi
diff --git a/examples/Makefile.am b/examples/Makefile.am
index 40d4cff..a6fe6bf 100644
--- a/examples/Makefile.am
+++ b/examples/Makefile.am
@@ -36,14 +36,14 @@ cpi_DEPENDENCIES =
 # the make-time instances of libpmpi.la and libmpi.la live here
 AM_LDFLAGS += -L../lib
 # the make-time instances of libmpl.la and libopa.la live here
-AM_LDFLAGS += -L@mpllibdir@ -L@opalibdir@
+AM_LDFLAGS += @mpllibdir@ @opalibdir@
 
 # Wrapper LDFLAGS need to be added at the end to make sure we link
 # with the libraries we just built, and not any previously installed
 # libraries.
 AM_LDFLAGS += $(WRAPPER_LDFLAGS)
 
-external_libs = -l@MPLLIBNAME@ -l@OPALIBNAME@ $(WRAPPER_LIBS)
+external_libs = $(WRAPPER_LIBS)
 if BUILD_PROFILING_LIB
 LIBS += -l@PMPILIBNAME@
 cpi_DEPENDENCIES += ../lib/lib@PMPILIBNAME@.la

http://git.mpich.org/mpich.git/commitdiff/cdfe67b858de0822dca08e97b24446452b49aeae

commit cdfe67b858de0822dca08e97b24446452b49aeae
Author: Paul Coffman <pkcoff at us.ibm.com>
Date:   Fri Dec 12 14:51:18 2014 -0600

    barrier in close whenever shared files supported
    
    Currently in the MPI_File_close there is a barrier in place whenever the
    ADIO_SHARED_FP feature is enabled AND the ADIO_UNLINK_AFTER_CLOSE
    feature is disabled right before the code to close the shared file
    pointer and potentially unlink the shared file itself.  PE testing on
    GPFS revealed a situation using the non-collective
    MPI_File_read_shared/MPI_File_write_shared
    where based on this implementation all tasks needed to wait for all
    other tasks to complete processing before unlinking the shared file
    pointer or the open of the shared file pointer could fail.  This
    situation is illustrated as follows with the simplest example of 2 tasks
    that do this:
    MPI_File_open
    MPI_File_set_view
    MPI_File_read_shared
    MPI_File_close
    
    So both tasks call MPI_File_read_shared at the same time which first
    does the ADIO_Get_shared_fp which does the file open with create mode on
    the shared file pointer.   Only 1 task can actually create the file, so
    there is a race to see who can get it done first.  If task 0 gets it
    created then he is the winner and goes on to use it, read the file and
    then MPI_File_close which then unlinks the shared file pointer first and
    then closes the output file.  Meanwhile, task 1 lost the race to create
    the file and is in error, the error handling in gpfs goes into effect
    and task 1 now just tries to open the file that task 0 created.  The
    problem is this error handling took longer than task 0 took to read and
    close the output file, so at the time when task 0 does the close he is
    the only process with a link since task 1 is still in the create file
    error handling code so therefore gpfs goes ahead and deletes the shared
    file pointer.  Then when the error handling code for task 1 does
    complete and he tries to do the open, the file is no longer there, so
    the open fails as does the subsequent read of the shared file pointer.
    Currently GPFS has the ADIO_UNLINK_AFTER_CLOSE  feature enabled, so the
    fix for this is to remove the additional condition of
    ADIO_UNLINK_AFTER_CLOSE  being disabled for the barrier in the close to
    be done.  Presumably this could be an issue for any parallel file system
    so this change is being done in the common code.
    
    See ticket #2214
    
    Signed-off-by: Paul Coffman <pkcoff at us.ibm.com>
    Signed-off-by: Rob Latham <robl at mcs.anl.gov>

diff --git a/src/mpi/romio/mpi-io/close.c b/src/mpi/romio/mpi-io/close.c
index cb1df99..520f206 100644
--- a/src/mpi/romio/mpi-io/close.c
+++ b/src/mpi/romio/mpi-io/close.c
@@ -58,9 +58,9 @@ int MPI_File_close(MPI_File *fh)
 	/* POSIX semantics say a deleted file remains available until all
 	 * processes close the file.  But since when was NFS posix-compliant?
 	 */
-	if (!ADIO_Feature(adio_fh, ADIO_UNLINK_AFTER_CLOSE)) {
-		MPI_Barrier((adio_fh)->comm);
-	}
+	/* this used to be gated by the lack of the UNLINK_AFTER_CLOSE feature,
+	 * but a race condition in GPFS necessitated this.  See ticket #2214 */
+	MPI_Barrier((adio_fh)->comm);
 	if ((adio_fh)->shared_fp_fd != ADIO_FILE_NULL) {
 	    MPI_File *fh_shared = &(adio_fh->shared_fp_fd);
 	    ADIO_Close((adio_fh)->shared_fp_fd, &error_code);

http://git.mpich.org/mpich.git/commitdiff/f1a7ea41e4214a331c2232ad07910d5a09d056eb

commit f1a7ea41e4214a331c2232ad07910d5a09d056eb
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Wed Dec 10 10:26:22 2014 -0800

    hydra: fix zero-length format string warnings
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>

diff --git a/src/pm/hydra/ui/mpich/utils.c b/src/pm/hydra/ui/mpich/utils.c
index 7941673..d72172f 100644
--- a/src/pm/hydra/ui/mpich/utils.c
+++ b/src/pm/hydra/ui/mpich/utils.c
@@ -151,7 +151,7 @@ static HYD_status help_fn(char *arg, char ***argv)
     HYD_status status = HYD_SUCCESS;
 
     help_help_fn();
-    HYDU_ERR_SETANDJUMP(status, HYD_GRACEFUL_ABORT, "");
+    HYDU_ERR_SETANDJUMP(status, HYD_GRACEFUL_ABORT, "%s", "");
 
   fn_exit:
     return status;
@@ -1227,7 +1227,7 @@ static HYD_status info_fn(char *arg, char ***argv)
                        "    Demux engines available:                 %s\n",
                        HYDRA_AVAILABLE_DEMUXES);
 
-    HYDU_ERR_SETANDJUMP(status, HYD_GRACEFUL_ABORT, "");
+    HYDU_ERR_SETANDJUMP(status, HYD_GRACEFUL_ABORT, "%s", "");
 
   fn_exit:
     return status;
diff --git a/src/pm/hydra/utils/args/args.c b/src/pm/hydra/utils/args/args.c
index 56a5571..929d7e0 100644
--- a/src/pm/hydra/utils/args/args.c
+++ b/src/pm/hydra/utils/args/args.c
@@ -144,7 +144,7 @@ static HYD_status match_arg(char ***argv_p, struct HYD_arg_match_table *match_ta
                 }
                 else {
                     m->help_fn();
-                    HYDU_ERR_SETANDJUMP(status, HYD_GRACEFUL_ABORT, "");
+                    HYDU_ERR_SETANDJUMP(status, HYD_GRACEFUL_ABORT, "%s", "");
                 }
             }
 
diff --git a/src/pm/hydra/utils/string/string.c b/src/pm/hydra/utils/string/string.c
index b2083dc..841177c 100644
--- a/src/pm/hydra/utils/string/string.c
+++ b/src/pm/hydra/utils/string/string.c
@@ -89,7 +89,7 @@ HYD_status HYDU_strsplit(char *str, char **str1, char **str2, char sep)
     HYDU_FUNC_ENTER();
 
     if (str == NULL)
-        HYDU_ERR_SETANDJUMP(status, HYD_INTERNAL_ERROR, "");
+        HYDU_ERR_SETANDJUMP(status, HYD_INTERNAL_ERROR, "%s", "");
 
     *str1 = HYDU_strdup(str);
     for (i = 0; (*str1)[i] && ((*str1)[i] != sep); i++);
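
For context on the change above: GCC's -Wformat-zero-length (enabled as part
of -Wformat) warns when a printf-style function is given "" as its format
string.  Passing "%s" with an empty string argument yields the same empty
message without the warning.  The sketch below shows the idea with a
stand-in function; set_error_msg and PRINTF_LIKE are hypothetical names,
and how HYDU_ERR_SETANDJUMP forwards its arguments is assumed rather than
taken from the Hydra source.

```c
#include <stdarg.h>
#include <stdio.h>

/* Let GCC-compatible compilers check the format string, which is what
 * makes a zero-length format visible to -Wformat-zero-length. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, arg_idx) \
    __attribute__((format(printf, fmt_idx, arg_idx)))
#else
#define PRINTF_LIKE(fmt_idx, arg_idx)
#endif

int set_error_msg(char *buf, size_t buflen, const char *fmt, ...)
    PRINTF_LIKE(3, 4);

/* Stand-in for a printf-style error reporting routine: formats the
 * message into buf and returns the length vsnprintf reports. */
int set_error_msg(char *buf, size_t buflen, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, buflen, fmt, ap);
    va_end(ap);
    return n;
}
```

A call such as set_error_msg(buf, sizeof buf, "%s", "") leaves buf empty,
just as a "" format would, but keeps the format string non-empty.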

http://git.mpich.org/mpich.git/commitdiff/40d115e8934e7fb578420e3f6bf2ee871f9fac77

commit 40d115e8934e7fb578420e3f6bf2ee871f9fac77
Author: Junchao Zhang <jczhang at mcs.anl.gov>
Date:   Wed Dec 17 13:46:16 2014 -0600

    Fix a typo in mpivar.c
    
    Signed-off-by: Rob Latham <robl at mcs.anl.gov>

diff --git a/src/env/mpivars.c b/src/env/mpivars.c
index 91e21aa..35b0ed5 100644
--- a/src/env/mpivars.c
+++ b/src/env/mpivars.c
@@ -78,7 +78,7 @@ int main( int argc, char *argv[] )
     for (i=1; i<argc; i++) {
         /* Check for "no descriptions" */
         if (strcmp( argv[i], "--nodesc" ) == 0 ||
-            strcmp( argv[i], "-nodex" ) == 0) showDesc = 0;
+            strcmp( argv[i], "-nodesc" ) == 0) showDesc = 0;
         else {
             if (wrank == 0) {
                 fprintf( stderr, "Unrecognized command line argument %s\n",

http://git.mpich.org/mpich.git/commitdiff/bf882b60cb1d43952c433e5f41c0bbdf2b33d15d

commit bf882b60cb1d43952c433e5f41c0bbdf2b33d15d
Author: Junchao Zhang <jczhang at mcs.anl.gov>
Date:   Fri Dec 5 13:35:30 2014 -0600

    Fix misleading text in doc for error code MPI_ERR_TYPE
    
    See http://lists.mpich.org/pipermail/discuss/2014-December/003531.html
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>

diff --git a/maint/docnotes b/maint/docnotes
index b8cf2cd..215ac56 100644
--- a/maint/docnotes
+++ b/maint/docnotes
@@ -200,8 +200,9 @@ N*/
   non-negative; a count of zero is often valid.
 N*/
 /*N MPI_ERR_TYPE
-. MPI_ERR_TYPE - Invalid datatype argument.  May be an uncommitted 
-  MPI_Datatype (see 'MPI_Type_commit').
+. MPI_ERR_TYPE - Invalid datatype argument.  Additionally, this error can
+  occur if an uncommitted MPI_Datatype (see 'MPI_Type_commit') is used
+  in a communication call.
 N*/
 /*N MPI_ERR_TAG
 .  MPI_ERR_TAG - Invalid tag argument.  Tags must be non-negative; tags

http://git.mpich.org/mpich.git/commitdiff/e33e3a7caea6c6b47c433a94cd442258a8670c8d

commit e33e3a7caea6c6b47c433a94cd442258a8670c8d
Author: Charles J Archer <charles.j.archer at intel.com>
Date:   Thu Dec 11 19:10:47 2014 -0800

    Include uppercase SFI to OFI in netmod rename

diff --git a/src/mpid/ch3/channels/nemesis/netmod/ofi/Makefile.mk b/src/mpid/ch3/channels/nemesis/netmod/ofi/Makefile.mk
index 0e19072..70487e3 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/ofi/Makefile.mk
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/Makefile.mk
@@ -4,7 +4,7 @@
 ## (C) 2011 by Argonne National Laboratory.
 ##     See COPYRIGHT in top-level directory.
 ##
-if BUILD_NEMESIS_NETMOD_SFI
+if BUILD_NEMESIS_NETMOD_OFI
 
 mpi_core_sources +=                                 		\
     src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_init.c 	\
diff --git a/src/mpid/ch3/channels/nemesis/netmod/ofi/errnames.txt b/src/mpid/ch3/channels/nemesis/netmod/ofi/errnames.txt
index 16c61b0..620e52d 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/ofi/errnames.txt
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/errnames.txt
@@ -1,42 +1,42 @@
-**ofi_avmap:SFI get address vector map failed
-**ofi_avmap %s %d %s %s:SFI address vector map failed (%s:%d:%s:%s)
-**ofi_tsend:SFI tagged sendto failed
-**ofi_tsend %s %d %s %s:SFI tagged sendto failed (%s:%d:%s:%s)
-**ofi_trecv:SFI tagged recvfrom failed
-**ofi_trecv %s %d %s %s:SFI tagged recvfrom failed (%s:%d:%s:%s)
-**ofi_getinfo:SFI getinfo() failed
-**ofi_getinfo %s %d %s %s:SFI getinfo() failed (%s:%d:%s:%s)
-**ofi_openep:SFI endpoint open failed
-**ofi_openep %s %d %s %s:SFI endpoint open failed (%s:%d:%s:%s)
-**ofi_openfabric:SFI fabric open failure
-**ofi_openfabric %s %d %s %s:SFI fabric open failed (%s:%d:%s:%s)
-**ofi_opendomain:SFI domain open failure
-**ofi_opendomain %s %d %s %s:SFI domain open failed (%s:%d:%s:%s)
-**ofi_opencq:SFI event queue create failure
-**ofi_opencq %s %d %s %s:SFI event queue create failed (%s:%d:%s:%s)
-**ofi_avopen:SFI address vector open failed
-**ofi_avopen %s %d %s %s:SFI address vector open failed (%s:%d:%s:%s)
-**ofi_bind:SFI resource bind failure
-**ofi_bind %s %d %s %s:SFI resource bind failed (%s:%d:%s:%s)
-**ofi_ep_enable:SFI endpoint enable failed
-**ofi_ep_enable %s %d %s %s:SFI endpoint enable failed (%s:%d:%s:%s)
-**ofi_getname:SFI get endpoint name failed
-**ofi_getname %s %d %s %s:SFI get endpoint name failed (%s:%d:%s:%s)
-**ofi_avclose:SFI av close failed
-**ofi_avclose %s %d %s %s:SFI av close failed (%s:%d:%s:%s)
-**ofi_epclose:SFI endpoint close failed
-**ofi_epclose %s %d %s %s:SFI endpoint close failed (%s:%d:%s:%s)
-**ofi_cqclose:SFI cq close failed
-**ofi_cqclose %s %d %s %s:SFI cq close failed (%s:%d:%s:%s)
-**ofi_mrclose:SFI mr close failed
-**ofi_mrclose %s %d %s %s:SFI mr close failed (%s:%d:%s:%s)
-**ofi_fabricclose:SFI fabric close failed
-**ofi_fabricclose %s %d %s %s:SFI fabric close failed (%s:%d:%s:%s)
-**ofi_domainclose:SFI domain close failed
-**ofi_domainclose %s %d %s %s:SFI domain close failed (%s:%d:%s:%s)
-**ofi_tsearch:SFI tsearch failed
-**ofi_tsearch %s %d %s %s:SFI tsearch failed (%s:%d:%s:%s)
-**ofi_poll:SFI poll failed
-**ofi_poll %s %d %s %s:SFI poll failed (%s:%d:%s:%s)
-**ofi_cancel:SFI cancel failed
-**ofi_cancel %s %d %s %s:SFI cancel failed (%s:%d:%s:%s)
+**ofi_avmap:OFI get address vector map failed
+**ofi_avmap %s %d %s %s:OFI address vector map failed (%s:%d:%s:%s)
+**ofi_tsend:OFI tagged sendto failed
+**ofi_tsend %s %d %s %s:OFI tagged sendto failed (%s:%d:%s:%s)
+**ofi_trecv:OFI tagged recvfrom failed
+**ofi_trecv %s %d %s %s:OFI tagged recvfrom failed (%s:%d:%s:%s)
+**ofi_getinfo:OFI getinfo() failed
+**ofi_getinfo %s %d %s %s:OFI getinfo() failed (%s:%d:%s:%s)
+**ofi_openep:OFI endpoint open failed
+**ofi_openep %s %d %s %s:OFI endpoint open failed (%s:%d:%s:%s)
+**ofi_openfabric:OFI fabric open failure
+**ofi_openfabric %s %d %s %s:OFI fabric open failed (%s:%d:%s:%s)
+**ofi_opendomain:OFI domain open failure
+**ofi_opendomain %s %d %s %s:OFI domain open failed (%s:%d:%s:%s)
+**ofi_opencq:OFI event queue create failure
+**ofi_opencq %s %d %s %s:OFI event queue create failed (%s:%d:%s:%s)
+**ofi_avopen:OFI address vector open failed
+**ofi_avopen %s %d %s %s:OFI address vector open failed (%s:%d:%s:%s)
+**ofi_bind:OFI resource bind failure
+**ofi_bind %s %d %s %s:OFI resource bind failed (%s:%d:%s:%s)
+**ofi_ep_enable:OFI endpoint enable failed
+**ofi_ep_enable %s %d %s %s:OFI endpoint enable failed (%s:%d:%s:%s)
+**ofi_getname:OFI get endpoint name failed
+**ofi_getname %s %d %s %s:OFI get endpoint name failed (%s:%d:%s:%s)
+**ofi_avclose:OFI av close failed
+**ofi_avclose %s %d %s %s:OFI av close failed (%s:%d:%s:%s)
+**ofi_epclose:OFI endpoint close failed
+**ofi_epclose %s %d %s %s:OFI endpoint close failed (%s:%d:%s:%s)
+**ofi_cqclose:OFI cq close failed
+**ofi_cqclose %s %d %s %s:OFI cq close failed (%s:%d:%s:%s)
+**ofi_mrclose:OFI mr close failed
+**ofi_mrclose %s %d %s %s:OFI mr close failed (%s:%d:%s:%s)
+**ofi_fabricclose:OFI fabric close failed
+**ofi_fabricclose %s %d %s %s:OFI fabric close failed (%s:%d:%s:%s)
+**ofi_domainclose:OFI domain close failed
+**ofi_domainclose %s %d %s %s:OFI domain close failed (%s:%d:%s:%s)
+**ofi_tsearch:OFI tsearch failed
+**ofi_tsearch %s %d %s %s:OFI tsearch failed (%s:%d:%s:%s)
+**ofi_poll:OFI poll failed
+**ofi_poll %s %d %s %s:OFI poll failed (%s:%d:%s:%s)
+**ofi_cancel:OFI cancel failed
+**ofi_cancel %s %d %s %s:OFI cancel failed (%s:%d:%s:%s)
diff --git a/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_cm.c b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_cm.c
index 686a905..31b38fc 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_cm.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_cm.c
@@ -38,7 +38,7 @@ static inline MPIDI_VC_t *ofi_tag_to_vc(uint64_t match_bits)
         port = get_port(match_bits);
         vc = gl_data.cm_vcs;
         while (vc && vc->port_name_tag != port) {
-            vc = VC_SFI(vc)->next;
+            vc = VC_OFI(vc)->next;
         }
         if (NULL == vc) {
             MPIU_Assertp(0);
@@ -90,7 +90,7 @@ static inline MPIDI_VC_t *ofi_tag_to_vc(uint64_t match_bits)
 static inline int MPID_nem_ofi_conn_req_callback(cq_tagged_entry_t * wc, MPID_Request * rreq)
 {
     int ret, len, mpi_errno = MPI_SUCCESS;
-    char bc[SFI_KVSAPPSTRLEN];
+    char bc[OFI_KVSAPPSTRLEN];
 
     MPIDI_VC_t *vc;
     char *addr = NULL;
@@ -103,12 +103,12 @@ static inline int MPID_nem_ofi_conn_req_callback(cq_tagged_entry_t * wc, MPID_Re
     MPIU_Assert(gl_data.conn_req == rreq);
     FI_RC(fi_trecv(gl_data.endpoint,
                        gl_data.conn_req->dev.user_buf,
-                       SFI_KVSAPPSTRLEN,
+                       OFI_KVSAPPSTRLEN,
                        gl_data.mr,
                        0,
                        MPID_CONN_REQ,
                        ~MPID_PROTOCOL_MASK,
-                       (void *) &(REQ_SFI(gl_data.conn_req)->ofi_context)), trecv);
+                       (void *) &(REQ_OFI(gl_data.conn_req)->ofi_context)), trecv);
 
     addr = MPIU_Malloc(gl_data.bound_addrlen);
     MPIU_Assertp(addr);
@@ -118,16 +118,16 @@ static inline int MPID_nem_ofi_conn_req_callback(cq_tagged_entry_t * wc, MPID_Re
 
     MPIDI_VC_Init(vc, NULL, 0);
     MPI_RC(MPIDI_GetTagFromPort(bc, &vc->port_name_tag));
-    ret = MPIU_Str_get_binary_arg(bc, "SFI", addr, gl_data.bound_addrlen, &len);
+    ret = MPIU_Str_get_binary_arg(bc, "OFI", addr, gl_data.bound_addrlen, &len);
     MPIU_ERR_CHKANDJUMP((ret != MPIU_STR_SUCCESS && ret != MPIU_STR_NOMEM) ||
                         (size_t) len != gl_data.bound_addrlen,
                         mpi_errno, MPI_ERR_OTHER, "**badbusinesscard");
 
     FI_RC(fi_av_insert(gl_data.av, addr, 1, &direct_addr, 0ULL, NULL), avmap);
-    VC_SFI(vc)->direct_addr = direct_addr;
-    VC_SFI(vc)->ready = 1;
-    VC_SFI(vc)->is_cmvc = 1;
-    VC_SFI(vc)->next = gl_data.cm_vcs;
+    VC_OFI(vc)->direct_addr = direct_addr;
+    VC_OFI(vc)->ready = 1;
+    VC_OFI(vc)->is_cmvc = 1;
+    VC_OFI(vc)->next = gl_data.cm_vcs;
     gl_data.cm_vcs = vc;
 
     MPIDI_CH3I_Acceptq_enqueue(vc, vc->port_name_tag);
@@ -159,10 +159,10 @@ static inline int MPID_nem_ofi_handle_packet(cq_tagged_entry_t * wc ATTRIBUTE((u
 
     BEGIN_FUNC(FCNAME);
     if (rreq->cc == 1) {
-        vc = REQ_SFI(rreq)->vc;
+        vc = REQ_OFI(rreq)->vc;
         MPIU_Assert(vc);
-        MPI_RC(MPID_nem_handle_pkt(vc, REQ_SFI(rreq)->pack_buffer, REQ_SFI(rreq)->pack_buffer_size))
-            MPIU_Free(REQ_SFI(rreq)->pack_buffer);
+        MPI_RC(MPID_nem_handle_pkt(vc, REQ_OFI(rreq)->pack_buffer, REQ_OFI(rreq)->pack_buffer_size))
+            MPIU_Free(REQ_OFI(rreq)->pack_buffer);
     }
     MPIDI_CH3U_Request_complete(rreq);
     END_FUNC_RC(FCNAME);
@@ -179,7 +179,7 @@ static inline int MPID_nem_ofi_cts_send_callback(cq_tagged_entry_t * wc, MPID_Re
 {
     int mpi_errno = MPI_SUCCESS;
     BEGIN_FUNC(FCNAME);
-    MPI_RC(MPID_nem_ofi_handle_packet(wc, REQ_SFI(sreq)->parent));
+    MPI_RC(MPID_nem_ofi_handle_packet(wc, REQ_OFI(sreq)->parent));
     MPIDI_CH3U_Request_complete(sreq);
     END_FUNC_RC(FCNAME);
 }
@@ -217,28 +217,28 @@ static inline int MPID_nem_ofi_preposted_callback(cq_tagged_entry_t * wc, MPID_R
     MPID_cc_incr(new_rreq->cc_ptr, &c);
     new_rreq->dev.OnDataAvail = NULL;
     new_rreq->dev.next = NULL;
-    REQ_SFI(new_rreq)->event_callback = MPID_nem_ofi_handle_packet;
-    REQ_SFI(new_rreq)->vc = vc;
-    REQ_SFI(new_rreq)->pack_buffer = pack_buffer;
-    REQ_SFI(new_rreq)->pack_buffer_size = pkt_len;
+    REQ_OFI(new_rreq)->event_callback = MPID_nem_ofi_handle_packet;
+    REQ_OFI(new_rreq)->vc = vc;
+    REQ_OFI(new_rreq)->pack_buffer = pack_buffer;
+    REQ_OFI(new_rreq)->pack_buffer_size = pkt_len;
     FI_RC(fi_trecv(gl_data.endpoint,
-                       REQ_SFI(new_rreq)->pack_buffer,
-                       REQ_SFI(new_rreq)->pack_buffer_size,
+                       REQ_OFI(new_rreq)->pack_buffer,
+                       REQ_OFI(new_rreq)->pack_buffer_size,
                        gl_data.mr,
-                       VC_SFI(vc)->direct_addr,
-                       wc->tag | MPID_MSG_DATA, 0, &(REQ_SFI(new_rreq)->ofi_context)), trecv);
+                       VC_OFI(vc)->direct_addr,
+                       wc->tag | MPID_MSG_DATA, 0, &(REQ_OFI(new_rreq)->ofi_context)), trecv);
 
     MPID_nem_ofi_create_req(&sreq, 1);
     sreq->dev.OnDataAvail = NULL;
     sreq->dev.next = NULL;
-    REQ_SFI(sreq)->event_callback = MPID_nem_ofi_cts_send_callback;
-    REQ_SFI(sreq)->parent = new_rreq;
+    REQ_OFI(sreq)->event_callback = MPID_nem_ofi_cts_send_callback;
+    REQ_OFI(sreq)->parent = new_rreq;
     FI_RC(fi_tsend(gl_data.endpoint,
                      NULL,
                      0,
                      gl_data.mr,
-                     VC_SFI(vc)->direct_addr,
-                     wc->tag | MPID_MSG_CTS, &(REQ_SFI(sreq)->ofi_context)), tsend);
+                     VC_OFI(vc)->direct_addr,
+                     wc->tag | MPID_MSG_CTS, &(REQ_OFI(sreq)->ofi_context)), tsend);
     MPIU_Assert(gl_data.persistent_req == rreq);
 
     rreq->dev.user_count = 0;
@@ -248,7 +248,7 @@ static inline int MPID_nem_ofi_preposted_callback(cq_tagged_entry_t * wc, MPID_R
                        gl_data.mr,
                        0,
                        MPID_MSG_RTS,
-                       ~MPID_PROTOCOL_MASK, &(REQ_SFI(rreq)->ofi_context)), trecv);
+                       ~MPID_PROTOCOL_MASK, &(REQ_OFI(rreq)->ofi_context)), trecv);
     END_FUNC_RC(FCNAME);
 }
 
@@ -264,8 +264,8 @@ int MPID_nem_ofi_connect_to_root_callback(cq_tagged_entry_t * wc ATTRIBUTE((unus
     int mpi_errno = MPI_SUCCESS;
     BEGIN_FUNC(FCNAME);
 
-    if (REQ_SFI(sreq)->pack_buffer)
-        MPIU_Free(REQ_SFI(sreq)->pack_buffer);
+    if (REQ_OFI(sreq)->pack_buffer)
+        MPIU_Free(REQ_OFI(sreq)->pack_buffer);
     MPIDI_CH3U_Request_complete(sreq);
 
     END_FUNC(FCNAME);
@@ -301,8 +301,8 @@ int MPID_nem_ofi_cm_init(MPIDI_PG_t * pg_p, int pg_rank ATTRIBUTE((unused)))
     MPID_nem_ofi_create_req(&persistent_req, 1);
     persistent_req->dev.OnDataAvail = NULL;
     persistent_req->dev.next = NULL;
-    REQ_SFI(persistent_req)->vc = NULL;
-    REQ_SFI(persistent_req)->event_callback = MPID_nem_ofi_preposted_callback;
+    REQ_OFI(persistent_req)->vc = NULL;
+    REQ_OFI(persistent_req)->event_callback = MPID_nem_ofi_preposted_callback;
     FI_RC(fi_trecv(gl_data.endpoint,
                        &persistent_req->dev.user_count,
                        sizeof persistent_req->dev.user_count,
@@ -310,25 +310,25 @@ int MPID_nem_ofi_cm_init(MPIDI_PG_t * pg_p, int pg_rank ATTRIBUTE((unused)))
                        0,
                        MPID_MSG_RTS,
                        ~MPID_PROTOCOL_MASK,
-                       (void *) &(REQ_SFI(persistent_req)->ofi_context)), trecv);
+                       (void *) &(REQ_OFI(persistent_req)->ofi_context)), trecv);
     gl_data.persistent_req = persistent_req;
 
     /* --------------------------------- */
     /* Post recv for connection requests */
     /* --------------------------------- */
     MPID_nem_ofi_create_req(&conn_req, 1);
-    conn_req->dev.user_buf = MPIU_Malloc(SFI_KVSAPPSTRLEN * sizeof(char));
+    conn_req->dev.user_buf = MPIU_Malloc(OFI_KVSAPPSTRLEN * sizeof(char));
     conn_req->dev.OnDataAvail = NULL;
     conn_req->dev.next = NULL;
-    REQ_SFI(conn_req)->vc = NULL;       /* We don't know the source yet */
-    REQ_SFI(conn_req)->event_callback = MPID_nem_ofi_conn_req_callback;
+    REQ_OFI(conn_req)->vc = NULL;       /* We don't know the source yet */
+    REQ_OFI(conn_req)->event_callback = MPID_nem_ofi_conn_req_callback;
     FI_RC(fi_trecv(gl_data.endpoint,
                        conn_req->dev.user_buf,
-                       SFI_KVSAPPSTRLEN,
+                       OFI_KVSAPPSTRLEN,
                        gl_data.mr,
                        0,
                        MPID_CONN_REQ,
-                       ~MPID_PROTOCOL_MASK, (void *) &(REQ_SFI(conn_req)->ofi_context)), trecv);
+                       ~MPID_PROTOCOL_MASK, (void *) &(REQ_OFI(conn_req)->ofi_context)), trecv);
     gl_data.conn_req = conn_req;
 
 
@@ -351,12 +351,12 @@ int MPID_nem_ofi_cm_finalize()
     int mpi_errno = MPI_SUCCESS;
     BEGIN_FUNC(FCNAME);
     FI_RC(fi_cancel((fid_t) gl_data.endpoint,
-                    &(REQ_SFI(gl_data.persistent_req)->ofi_context)), cancel);
+                    &(REQ_OFI(gl_data.persistent_req)->ofi_context)), cancel);
     MPIR_STATUS_SET_CANCEL_BIT(gl_data.persistent_req->status, TRUE);
     MPIR_STATUS_SET_COUNT(gl_data.persistent_req->status, 0);
     MPIDI_CH3U_Request_complete(gl_data.persistent_req);
 
-    FI_RC(fi_cancel((fid_t) gl_data.endpoint, &(REQ_SFI(gl_data.conn_req)->ofi_context)), cancel);
+    FI_RC(fi_cancel((fid_t) gl_data.endpoint, &(REQ_OFI(gl_data.conn_req)->ofi_context)), cancel);
     MPIU_Free(gl_data.conn_req->dev.user_buf);
     MPIR_STATUS_SET_CANCEL_BIT(gl_data.conn_req->status, TRUE);
     MPIR_STATUS_SET_COUNT(gl_data.conn_req->status, 0);
@@ -373,31 +373,31 @@ int MPID_nem_ofi_cm_finalize()
 /* Handle CH3/Nemesis VC connections                                        */
 /*   * Query the VC address information.  In particular we are looking for  */
 /*     the fabric address name.                                             */
-/*   * Use fi_av_insert to register the address name with SFI               */
+/*   * Use fi_av_insert to register the address name with OFI               */
 /* ------------------------------------------------------------------------ */
 #undef FCNAME
 #define FCNAME DECL_FUNC(MPID_nem_ofi_vc_connect)
 int MPID_nem_ofi_vc_connect(MPIDI_VC_t * vc)
 {
     int len, ret, mpi_errno = MPI_SUCCESS;
-    char bc[SFI_KVSAPPSTRLEN], *addr = NULL;
+    char bc[OFI_KVSAPPSTRLEN], *addr = NULL;
 
     BEGIN_FUNC(FCNAME);
     addr = MPIU_Malloc(gl_data.bound_addrlen);
     MPIU_Assert(addr);
-    MPIU_Assert(1 != VC_SFI(vc)->ready);
+    MPIU_Assert(1 != VC_OFI(vc)->ready);
 
     if (!vc->pg || !vc->pg->getConnInfo) {
         goto fn_exit;
     }
 
-    MPI_RC(vc->pg->getConnInfo(vc->pg_rank, bc, SFI_KVSAPPSTRLEN, vc->pg));
-    ret = MPIU_Str_get_binary_arg(bc, "SFI", addr, gl_data.bound_addrlen, &len);
+    MPI_RC(vc->pg->getConnInfo(vc->pg_rank, bc, OFI_KVSAPPSTRLEN, vc->pg));
+    ret = MPIU_Str_get_binary_arg(bc, "OFI", addr, gl_data.bound_addrlen, &len);
     MPIU_ERR_CHKANDJUMP((ret != MPIU_STR_SUCCESS && ret != MPIU_STR_NOMEM) ||
                         (size_t) len != gl_data.bound_addrlen,
                         mpi_errno, MPI_ERR_OTHER, "**badbusinesscard");
-    FI_RC(fi_av_insert(gl_data.av, addr, 1, &(VC_SFI(vc)->direct_addr), 0ULL, NULL), avmap);
-    VC_SFI(vc)->ready = 1;
+    FI_RC(fi_av_insert(gl_data.av, addr, 1, &(VC_OFI(vc)->direct_addr), 0ULL, NULL), avmap);
+    VC_OFI(vc)->ready = 1;
 
   fn_exit:
     if (addr)
@@ -415,7 +415,7 @@ int MPID_nem_ofi_vc_init(MPIDI_VC_t * vc)
 {
     int mpi_errno = MPI_SUCCESS;
     MPIDI_CH3I_VC *const vc_ch = &vc->ch;
-    MPID_nem_ofi_vc_t *const vc_ofi = VC_SFI(vc);
+    MPID_nem_ofi_vc_t *const vc_ofi = VC_OFI(vc);
 
     BEGIN_FUNC(FCNAME);
     vc->sendNoncontig_fn = MPID_nem_ofi_SendNoncontig;
@@ -447,25 +447,25 @@ int MPID_nem_ofi_vc_init(MPIDI_VC_t * vc)
 int MPID_nem_ofi_vc_destroy(MPIDI_VC_t * vc)
 {
     BEGIN_FUNC(FCNAME);
-    if (vc && (VC_SFI(vc)->is_cmvc == 1) && (VC_SFI(vc)->ready == 1)) {
+    if (vc && (VC_OFI(vc)->is_cmvc == 1) && (VC_OFI(vc)->ready == 1)) {
         if (vc->pg != NULL) {
             printf("ERROR: VC Destroy (%p) pg = %s\n", vc, (char *) vc->pg->id);
         }
         MPIDI_VC_t *prev = gl_data.cm_vcs;
-        while (prev && prev != vc && VC_SFI(prev)->next != vc) {
-            prev = VC_SFI(vc)->next;
+        while (prev && prev != vc && VC_OFI(prev)->next != vc) {
+            prev = VC_OFI(vc)->next;
         }
-        if (VC_SFI(prev)->next == vc) {
-            VC_SFI(prev)->next = VC_SFI(vc)->next;
+        if (VC_OFI(prev)->next == vc) {
+            VC_OFI(prev)->next = VC_OFI(vc)->next;
         }
         else if (vc == gl_data.cm_vcs) {
-            gl_data.cm_vcs = VC_SFI(vc)->next;
+            gl_data.cm_vcs = VC_OFI(vc)->next;
         }
         else {
             MPIU_Assert(0);
         }
     }
-    VC_SFI(vc)->ready = 0;
+    VC_OFI(vc)->ready = 0;
     END_FUNC(FCNAME);
     return MPI_SUCCESS;
 }
@@ -477,7 +477,7 @@ int MPID_nem_ofi_vc_terminate(MPIDI_VC_t * vc)
     int mpi_errno = MPI_SUCCESS;
     BEGIN_FUNC(FCNAME);
     MPI_RC(MPIDI_CH3U_Handle_connection(vc, MPIDI_VC_EVENT_TERMINATED));
-    VC_SFI(vc)->ready = 0;
+    VC_OFI(vc)->ready = 0;
     END_FUNC_RC(FCNAME);
 }
 
@@ -502,14 +502,14 @@ int MPID_nem_ofi_vc_terminate(MPIDI_VC_t * vc)
 int MPID_nem_ofi_connect_to_root(const char *business_card, MPIDI_VC_t * new_vc)
 {
     int len, ret, mpi_errno = MPI_SUCCESS, str_errno = MPI_SUCCESS;
-    int my_bc_len = SFI_KVSAPPSTRLEN;
+    int my_bc_len = OFI_KVSAPPSTRLEN;
     char *addr = NULL, *bc = NULL, *my_bc = NULL;
     MPID_Request *sreq;
     uint64_t conn_req_send_bits;
 
     BEGIN_FUNC(FCNAME);
     addr = MPIU_Malloc(gl_data.bound_addrlen);
-    bc = MPIU_Malloc(SFI_KVSAPPSTRLEN);
+    bc = MPIU_Malloc(OFI_KVSAPPSTRLEN);
     MPIU_Assertp(addr);
     MPIU_Assertp(bc);
     my_bc = bc;
@@ -518,34 +518,34 @@ int MPID_nem_ofi_connect_to_root(const char *business_card, MPIDI_VC_t * new_vc)
         goto fn_fail;
     }
     MPI_RC(MPIDI_GetTagFromPort(business_card, &new_vc->port_name_tag));
-    ret = MPIU_Str_get_binary_arg(business_card, "SFI", addr, gl_data.bound_addrlen, &len);
+    ret = MPIU_Str_get_binary_arg(business_card, "OFI", addr, gl_data.bound_addrlen, &len);
     MPIU_ERR_CHKANDJUMP((ret != MPIU_STR_SUCCESS && ret != MPIU_STR_NOMEM) ||
                         (size_t) len != gl_data.bound_addrlen,
                         mpi_errno, MPI_ERR_OTHER, "**badbusinesscard");
-    FI_RC(fi_av_insert(gl_data.av, addr, 1, &(VC_SFI(new_vc)->direct_addr), 0ULL, NULL), avmap);
+    FI_RC(fi_av_insert(gl_data.av, addr, 1, &(VC_OFI(new_vc)->direct_addr), 0ULL, NULL), avmap);
 
-    VC_SFI(new_vc)->ready = 1;
+    VC_OFI(new_vc)->ready = 1;
     str_errno = MPIU_Str_add_int_arg(&bc, &my_bc_len, "tag", new_vc->port_name_tag);
     MPIU_ERR_CHKANDJUMP(str_errno, mpi_errno, MPI_ERR_OTHER, "**argstr_port_name_tag");
     MPI_RC(MPID_nem_ofi_get_business_card(MPIR_Process.comm_world->rank, &bc, &my_bc_len));
-    my_bc_len = SFI_KVSAPPSTRLEN - my_bc_len;
+    my_bc_len = OFI_KVSAPPSTRLEN - my_bc_len;
 
     MPID_nem_ofi_create_req(&sreq, 1);
     sreq->kind = MPID_REQUEST_SEND;
     sreq->dev.OnDataAvail = NULL;
     sreq->dev.next = NULL;
-    REQ_SFI(sreq)->event_callback = MPID_nem_ofi_connect_to_root_callback;
-    REQ_SFI(sreq)->pack_buffer = my_bc;
+    REQ_OFI(sreq)->event_callback = MPID_nem_ofi_connect_to_root_callback;
+    REQ_OFI(sreq)->pack_buffer = my_bc;
     conn_req_send_bits = init_sendtag(0, MPIR_Process.comm_world->rank, 0, MPID_CONN_REQ);
     FI_RC(fi_tsend(gl_data.endpoint,
-                     REQ_SFI(sreq)->pack_buffer,
+                     REQ_OFI(sreq)->pack_buffer,
                      my_bc_len,
                      gl_data.mr,
-                     VC_SFI(new_vc)->direct_addr,
-                     conn_req_send_bits, &(REQ_SFI(sreq)->ofi_context)), tsend);
+                     VC_OFI(new_vc)->direct_addr,
+                     conn_req_send_bits, &(REQ_OFI(sreq)->ofi_context)), tsend);
     MPID_nem_ofi_poll(MPID_NONBLOCKING_POLL);
-    VC_SFI(new_vc)->is_cmvc = 1;
-    VC_SFI(new_vc)->next = gl_data.cm_vcs;
+    VC_OFI(new_vc)->is_cmvc = 1;
+    VC_OFI(new_vc)->next = gl_data.cm_vcs;
     gl_data.cm_vcs = new_vc;
   fn_exit:
     if (addr)
@@ -567,7 +567,7 @@ int MPID_nem_ofi_get_business_card(int my_rank ATTRIBUTE((unused)),
     BEGIN_FUNC(FCNAME);
     str_errno = MPIU_Str_add_binary_arg(bc_val_p,
                                         val_max_sz_p,
-                                        "SFI",
+                                        "OFI",
                                         (char *) &gl_data.bound_addr, sizeof(gl_data.bound_addr));
     if (str_errno) {
         MPIU_ERR_CHKANDJUMP(str_errno == MPIU_STR_NOMEM, mpi_errno, MPI_ERR_OTHER, "**buscard_len");
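
Note on the ofi_cm.c hunks above: connection-management VCs are kept on a
singly linked list headed by gl_data.cm_vcs. They are pushed at the head in
the connect paths, searched by port in ofi_tag_to_vc, and unlinked in
MPID_nem_ofi_vc_destroy. The following is a minimal standalone sketch of
those three list operations. The names node_t, list_push, list_unlink and
list_find are hypothetical and are not MPICH code; the unlink walk advances
along the list from the head, which is an assumption about the intended
behaviour rather than a copy of the loop in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VC that carries a `next` link. */
typedef struct node {
    int port;            /* plays the role of vc->port_name_tag */
    struct node *next;   /* plays the role of VC_OFI(vc)->next  */
} node_t;

/* Push at the head, as the connect paths do with gl_data.cm_vcs. */
static void list_push(node_t **head, node_t *n)
{
    n->next = *head;
    *head = n;
}

/* Unlink `n` if it is on the list; returns 1 if removed, 0 if not found. */
static int list_unlink(node_t **head, node_t *n)
{
    node_t **link = head;           /* points at the link that may hold n */
    while (*link && *link != n)
        link = &(*link)->next;
    if (*link == NULL)
        return 0;
    *link = n->next;
    n->next = NULL;
    return 1;
}

/* Linear search by port, mirroring the loop in ofi_tag_to_vc. */
static node_t *list_find(node_t *head, int port)
{
    while (head && head->port != port)
        head = head->next;
    return head;
}
```

Walking a pointer to the link, rather than a `prev` node, removes the need
for a separate head-of-list case.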
diff --git a/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_impl.h b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_impl.h
index 553a5b7..9e5f048 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_impl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_impl.h
@@ -7,8 +7,8 @@
  *  to Argonne National Laboratory subject to Software Grant and Corporate
  *  Contributor License Agreement dated February 8, 2012.
  */
-#ifndef SFI_IMPL_H
-#define SFI_IMPL_H
+#ifndef OFI_IMPL_H
+#define OFI_IMPL_H
 
 #include "mpid_nem_impl.h"
 #include "mpihandlemem.h"
@@ -61,20 +61,20 @@ typedef struct {
 /* This is per destination          */
 /* ******************************** */
 typedef struct {
-    fi_addr_t direct_addr;      /* Remote SFI address */
+    fi_addr_t direct_addr;      /* Remote OFI address */
     int ready;                  /* VC ready state     */
     int is_cmvc;                /* Cleanup VC         */
     MPIDI_VC_t *next;           /* VC queue           */
 } MPID_nem_ofi_vc_t;
-#define VC_SFI(vc) ((MPID_nem_ofi_vc_t *)vc->ch.netmod_area.padding)
+#define VC_OFI(vc) ((MPID_nem_ofi_vc_t *)vc->ch.netmod_area.padding)
 
 /* ******************************** */
 /* Per request object data          */
-/* SFI/Netmod specific              */
+/* OFI/Netmod specific              */
 /* ******************************** */
 typedef struct {
     context_t ofi_context;      /* Context Object              */
-    void *addr;                 /* SFI Address                 */
+    void *addr;                 /* OFI Address                 */
     event_callback_fn event_callback;   /* Callback Event              */
     char *pack_buffer;          /* MPI Pack Buffer             */
     int pack_buffer_size;       /* Pack buffer size            */
@@ -84,7 +84,7 @@ typedef struct {
     uint64_t tag;               /* 64 bit tag request          */
     MPID_Request *parent;       /* Parent request              */
 } MPID_nem_ofi_req_t;
-#define REQ_SFI(req) ((MPID_nem_ofi_req_t *)((req)->ch.netmod_area.padding))
+#define REQ_OFI(req) ((MPID_nem_ofi_req_t *)((req)->ch.netmod_area.padding))
 
 /* ******************************** */
 /* Logging and function macros      */
@@ -109,7 +109,7 @@ fn_fail:                      \
    : __FILE__                                   \
 )
 #define DECL_FUNC(FUNCNAME)  MPIU_QUOTE(FUNCNAME)
-#define SFI_COMPILE_TIME_ASSERT(expr_)                                  \
+#define OFI_COMPILE_TIME_ASSERT(expr_)                                  \
   do { switch(0) { case 0: case (expr_): default: break; } } while (0)
 
 #define FI_RC(FUNC,STR)                                         \
@@ -151,17 +151,17 @@ fn_fail:                      \
 
 #define VC_READY_CHECK(vc)                      \
 ({                                              \
-  if (1 != VC_SFI(vc)->ready) {                 \
+  if (1 != VC_OFI(vc)->ready) {                 \
     MPI_RC(MPID_nem_ofi_vc_connect(vc));        \
   }                                             \
 })
 
-#define SFI_ADDR_INIT(src, vc, remote_proc) \
+#define OFI_ADDR_INIT(src, vc, remote_proc) \
 ({                                          \
   if (MPI_ANY_SOURCE != src) {              \
     MPIU_Assert(vc != NULL);                \
     VC_READY_CHECK(vc);                     \
-    remote_proc = VC_SFI(vc)->direct_addr;  \
+    remote_proc = VC_OFI(vc)->direct_addr;  \
   } else {                                  \
     MPIU_Assert(vc == NULL);                \
     remote_proc = gl_data.any_addr;         \
@@ -197,14 +197,14 @@ fn_fail:                      \
 #define MPID_TAG_SHIFT           (28)
 #define MPID_PSOURCE_SHIFT       (16)
 #define MPID_PORT_SHIFT          (32)
-#define SFI_KVSAPPSTRLEN         1024
+#define OFI_KVSAPPSTRLEN         1024
 
 /* ******************************** */
 /* Request manipulation inlines     */
 /* ******************************** */
 static inline void MPID_nem_ofi_init_req(MPID_Request * req)
 {
-    memset(REQ_SFI(req), 0, sizeof(MPID_nem_ofi_req_t));
+    memset(REQ_OFI(req), 0, sizeof(MPID_nem_ofi_req_t));
 }
 
 static inline int MPID_nem_ofi_create_req(MPID_Request ** request, int refcnt)
@@ -320,7 +320,7 @@ int MPID_nem_ofi_iSendContig(MPIDI_VC_t * vc, MPID_Request * sreq, void *hdr,
                              MPIDI_msg_sz_t hdr_sz, void *data, MPIDI_msg_sz_t data_sz);
 
 /* ************************************************************************** */
-/* SFI utility functions : not exposed as a netmod public API                 */
+/* OFI utility functions : not exposed as a netmod public API                 */
 /* ************************************************************************** */
 #define MPID_NONBLOCKING_POLL 0
 #define MPID_BLOCKING_POLL 1
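
Note on the ofi_impl.h hunks above: VC_OFI() and REQ_OFI() cast an opaque
netmod_area.padding buffer to a netmod-private struct, and
OFI_COMPILE_TIME_ASSERT is used elsewhere in the patch to check that the
private struct fits in that buffer. The sketch below shows the same pattern
in isolation. object_t, private_t, PRIV, AREA_LEN and object_init are
hypothetical names chosen for illustration; only the shape of the
switch-based assert is taken from the patch.

```c
#include <assert.h>
#include <string.h>

#define AREA_LEN 64   /* hypothetical size of the opaque per-object area */

/* Generic object that reserves opaque space for a plug-in module. */
typedef struct {
    int id;
    union {
        char padding[AREA_LEN];
        void *align_;          /* gives the area pointer alignment */
    } area;
} object_t;

/* Module-private view of the reserved area (hypothetical fields). */
typedef struct {
    int ready;
    object_t *next;
} private_t;

/* Accessor in the style of VC_OFI()/REQ_OFI(): cast the padding. */
#define PRIV(obj) ((private_t *)((obj)->area.padding))

/* Same trick as OFI_COMPILE_TIME_ASSERT: a false expr_ evaluates to 0,
 * which duplicates `case 0` and is rejected by the compiler. */
#define COMPILE_TIME_ASSERT(expr_) \
    do { switch (0) { case 0: case (expr_): default: break; } } while (0)

/* Zero the private area, in the style of MPID_nem_ofi_init_req. */
static void object_init(object_t *obj, int id)
{
    COMPILE_TIME_ASSERT(sizeof(private_t) <= AREA_LEN);
    obj->id = id;
    memset(PRIV(obj), 0, sizeof(private_t));
}
```

Accessing a char buffer through another struct type relies on the compiler
tolerating this kind of type punning; code that must be strictly portable
could memcpy the private struct in and out instead.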
diff --git a/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_init.c b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_init.c
index cd9e7c7..182a780 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_init.c
@@ -21,7 +21,7 @@ int MPID_nem_ofi_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_
     info_t hints, *prov_tagged, *prov_use;
     cq_attr_t cq_attr;
     av_attr_t av_attr;
-    char kvsname[SFI_KVSAPPSTRLEN], key[SFI_KVSAPPSTRLEN], bc[SFI_KVSAPPSTRLEN];
+    char kvsname[OFI_KVSAPPSTRLEN], key[OFI_KVSAPPSTRLEN], bc[OFI_KVSAPPSTRLEN];
     char *my_bc, *addrs, *null_addr;
     fi_addr_t *fi_addrs = NULL;
     MPIDI_VC_t *vc;
@@ -39,11 +39,11 @@ int MPID_nem_ofi_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_
     /*        communication calls.                                              */
     /*        Note that we do not fill in FI_LOCAL_MR, which means this netmod  */
     /*        does not support exchange of memory regions on communication calls */
-    /*        SFI requires that all communication calls use a registered mr     */
+    /*        OFI requires that all communication calls use a registered mr     */
     /*        but in our case this netmod is written to only support transfers  */
     /*        on a dynamic memory region that spans all of memory.  So, we do   */
     /*        not set the FI_LOCAL_MR mode bit, and we set the FI_DYNAMIC_MR    */
-    /*        bit to tell SFI our requirement and filter providers appropriately */
+    /*        bit to tell OFI our requirement and filter providers appropriately */
     /* ep_type:  reliable datagram operation                                    */
     /* caps:     Capabilities required from the provider.  The bits specified   */
     /*           with buffered receive, cancel, and remote complete implements  */
@@ -62,7 +62,7 @@ int MPID_nem_ofi_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_
 
     /* ------------------------------------------------------------------------ */
     /* FI_VERSION provides binary backward and forward compatibility support    */
-    /* Specify the version of SFI is coded to, the provider will select struct  */
+    /* Specify the version of OFI is coded to, the provider will select struct  */
     /* layouts that are compatible with this version.                           */
     /* ------------------------------------------------------------------------ */
     fi_version = FI_VERSION(1, 0);
@@ -203,8 +203,8 @@ int MPID_nem_ofi_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_
     /* Publish the business card        */
     /* to the KVS                       */
     /* -------------------------------- */
-    PMI_RC(PMI_KVS_Get_my_name(kvsname, SFI_KVSAPPSTRLEN), pmi);
-    sprintf(key, "SFI-%d", pg_rank);
+    PMI_RC(PMI_KVS_Get_my_name(kvsname, OFI_KVSAPPSTRLEN), pmi);
+    sprintf(key, "OFI-%d", pg_rank);
 
     PMI_RC(PMI_KVS_Put(kvsname, key, my_bc), pmi);
     PMI_RC(PMI_KVS_Commit(kvsname), pmi);
@@ -228,10 +228,10 @@ int MPID_nem_ofi_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_
     MPIU_CHKLMEM_MALLOC(addrs, char *, pg_p->size * gl_data.bound_addrlen, mpi_errno, "addrs");
 
     for (i = 0; i < pg_p->size; ++i) {
-        sprintf(key, "SFI-%d", i);
+        sprintf(key, "OFI-%d", i);
 
-        PMI_RC(PMI_KVS_Get(kvsname, key, bc, SFI_KVSAPPSTRLEN), pmi);
-        ret = MPIU_Str_get_binary_arg(bc, "SFI",
+        PMI_RC(PMI_KVS_Get(kvsname, key, bc, OFI_KVSAPPSTRLEN), pmi);
+        ret = MPIU_Str_get_binary_arg(bc, "OFI",
                                       (char *) &addrs[i * gl_data.bound_addrlen],
                                       gl_data.bound_addrlen, &len);
         MPIU_ERR_CHKANDJUMP((ret != MPIU_STR_SUCCESS && ret != MPIU_STR_NOMEM) ||
@@ -261,8 +261,8 @@ int MPID_nem_ofi_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_
     /* --------------------------------- */
     for (i = 0; i < pg_p->size; ++i) {
         MPIDI_PG_Get_vc(pg_p, i, &vc);
-        VC_SFI(vc)->direct_addr = fi_addrs[i];
-        VC_SFI(vc)->ready = 1;
+        VC_OFI(vc)->direct_addr = fi_addrs[i];
+        VC_OFI(vc)->ready = 1;
     }
 
     /* --------------------------------------------- */
@@ -316,9 +316,9 @@ int MPID_nem_ofi_finalize(void)
 
 static inline int compile_time_checking()
 {
-    SFI_COMPILE_TIME_ASSERT(sizeof(MPID_nem_ofi_vc_t) <= MPID_NEM_VC_NETMOD_AREA_LEN);
-    SFI_COMPILE_TIME_ASSERT(sizeof(MPID_nem_ofi_req_t) <= MPID_NEM_REQ_NETMOD_AREA_LEN);
-    SFI_COMPILE_TIME_ASSERT(sizeof(iovec_t) == sizeof(MPID_IOV));
+    OFI_COMPILE_TIME_ASSERT(sizeof(MPID_nem_ofi_vc_t) <= MPID_NEM_VC_NETMOD_AREA_LEN);
+    OFI_COMPILE_TIME_ASSERT(sizeof(MPID_nem_ofi_req_t) <= MPID_NEM_REQ_NETMOD_AREA_LEN);
+    OFI_COMPILE_TIME_ASSERT(sizeof(iovec_t) == sizeof(MPID_IOV));
     MPIU_Assert(((void *) &(((iovec_t *) 0)->iov_base)) ==
                 ((void *) &(((MPID_IOV *) 0)->MPID_IOV_BUF)));
     MPIU_Assert(((void *) &(((iovec_t *) 0)->iov_len)) ==
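
Note on compile_time_checking() above: the function compares the size of
iovec_t and MPID_IOV at compile time and compares member offsets at run time
by taking member addresses through a null pointer. As an alternative
technique (not what the patch does), the same layout comparison can be
sketched with offsetof and C11 _Static_assert. sys_iov_t and mpi_iov_t below
are hypothetical stand-ins for the two real types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for iovec_t and MPID_IOV. */
typedef struct { void *iov_base; size_t iov_len; } sys_iov_t;
typedef struct { void *buf;      size_t len;     } mpi_iov_t;

/* Compile-time form: the build fails if the layouts diverge. */
_Static_assert(sizeof(sys_iov_t) == sizeof(mpi_iov_t),
               "iov sizes differ");
_Static_assert(offsetof(sys_iov_t, iov_base) == offsetof(mpi_iov_t, buf),
               "buffer member offsets differ");
_Static_assert(offsetof(sys_iov_t, iov_len) == offsetof(mpi_iov_t, len),
               "length member offsets differ");

/* Run-time form of the same comparison, for compilers without C11. */
static int iov_layouts_match(void)
{
    return sizeof(sys_iov_t) == sizeof(mpi_iov_t)
        && offsetof(sys_iov_t, iov_base) == offsetof(mpi_iov_t, buf)
        && offsetof(sys_iov_t, iov_len) == offsetof(mpi_iov_t, len);
}
```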
diff --git a/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_msg.c b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_msg.c
index ffa3761..49a6277 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_msg.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_msg.c
@@ -61,33 +61,33 @@
     c = 1;                                                              \
     MPID_cc_incr(sreq->cc_ptr, &c);                                     \
     MPID_cc_incr(sreq->cc_ptr, &c);                                     \
-    REQ_SFI(sreq)->event_callback   = MPID_nem_ofi_data_callback;       \
-    REQ_SFI(sreq)->pack_buffer      = pack_buffer;                      \
-    REQ_SFI(sreq)->pack_buffer_size = pkt_len;                          \
-    REQ_SFI(sreq)->vc               = vc;                               \
-    REQ_SFI(sreq)->tag              = match_bits;                       \
+    REQ_OFI(sreq)->event_callback   = MPID_nem_ofi_data_callback;       \
+    REQ_OFI(sreq)->pack_buffer      = pack_buffer;                      \
+    REQ_OFI(sreq)->pack_buffer_size = pkt_len;                          \
+    REQ_OFI(sreq)->vc               = vc;                               \
+    REQ_OFI(sreq)->tag              = match_bits;                       \
                                                                         \
     MPID_nem_ofi_create_req(&cts_req, 1);                               \
     cts_req->dev.OnDataAvail         = NULL;                            \
     cts_req->dev.next                = NULL;                            \
-    REQ_SFI(cts_req)->event_callback = MPID_nem_ofi_cts_recv_callback;  \
-    REQ_SFI(cts_req)->parent         = sreq;                            \
+    REQ_OFI(cts_req)->event_callback = MPID_nem_ofi_cts_recv_callback;  \
+    REQ_OFI(cts_req)->parent         = sreq;                            \
                                                                         \
     FI_RC(fi_trecv(gl_data.endpoint,                                \
                        NULL,                                            \
                        0,                                               \
                        gl_data.mr,                                      \
-                       VC_SFI(vc)->direct_addr,                         \
+                       VC_OFI(vc)->direct_addr,                         \
                        match_bits | MPID_MSG_CTS,                       \
                        0, /* Exact tag match, no ignore bits */         \
-                       &(REQ_SFI(cts_req)->ofi_context)),trecv);    \
+                       &(REQ_OFI(cts_req)->ofi_context)),trecv);    \
     FI_RC(fi_tsend(gl_data.endpoint,                                  \
-                     &REQ_SFI(sreq)->pack_buffer_size,                  \
-                     sizeof(REQ_SFI(sreq)->pack_buffer_size),           \
+                     &REQ_OFI(sreq)->pack_buffer_size,                  \
+                     sizeof(REQ_OFI(sreq)->pack_buffer_size),           \
                      gl_data.mr,                                        \
-                     VC_SFI(vc)->direct_addr,                           \
+                     VC_OFI(vc)->direct_addr,                           \
                      match_bits,                                        \
-                     &(REQ_SFI(sreq)->ofi_context)),tsend);           \
+                     &(REQ_OFI(sreq)->ofi_context)),tsend);           \
   })
 
 
@@ -106,25 +106,25 @@ static int MPID_nem_ofi_data_callback(cq_tagged_entry_t * wc, MPID_Request * sre
     uint64_t tag = 0;
     BEGIN_FUNC(FCNAME);
     if (sreq->cc == 2) {
-        vc = REQ_SFI(sreq)->vc;
-        REQ_SFI(sreq)->tag = tag | MPID_MSG_DATA;
+        vc = REQ_OFI(sreq)->vc;
+        REQ_OFI(sreq)->tag = tag | MPID_MSG_DATA;
         FI_RC(fi_tsend(gl_data.endpoint,
-                         REQ_SFI(sreq)->pack_buffer,
-                         REQ_SFI(sreq)->pack_buffer_size,
+                         REQ_OFI(sreq)->pack_buffer,
+                         REQ_OFI(sreq)->pack_buffer_size,
                          gl_data.mr,
-                         VC_SFI(vc)->direct_addr,
-                         wc->tag | MPID_MSG_DATA, (void *) &(REQ_SFI(sreq)->ofi_context)), tsend);
+                         VC_OFI(vc)->direct_addr,
+                         wc->tag | MPID_MSG_DATA, (void *) &(REQ_OFI(sreq)->ofi_context)), tsend);
     }
     if (sreq->cc == 1) {
-        if (REQ_SFI(sreq)->pack_buffer)
-            MPIU_Free(REQ_SFI(sreq)->pack_buffer);
+        if (REQ_OFI(sreq)->pack_buffer)
+            MPIU_Free(REQ_OFI(sreq)->pack_buffer);
 
         reqFn = sreq->dev.OnDataAvail;
         if (!reqFn) {
             MPIDI_CH3U_Request_complete(sreq);
         }
         else {
-            vc = REQ_SFI(sreq)->vc;
+            vc = REQ_OFI(sreq)->vc;
             MPI_RC(reqFn(vc, sreq, &complete));
         }
     }
@@ -144,7 +144,7 @@ static int MPID_nem_ofi_cts_recv_callback(cq_tagged_entry_t * wc, MPID_Request *
 {
     int mpi_errno = MPI_SUCCESS;
     BEGIN_FUNC(FCNAME);
-    MPI_RC(MPID_nem_ofi_data_callback(wc, REQ_SFI(rreq)->parent));
+    MPI_RC(MPID_nem_ofi_data_callback(wc, REQ_OFI(rreq)->parent));
     MPIDI_CH3U_Request_complete(rreq);
     END_FUNC_RC(FCNAME);
 }
diff --git a/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_progress.c b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_progress.c
index ebe1fa2..02cc02e 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_progress.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_progress.c
@@ -50,14 +50,14 @@ static int tsearch_callback(cq_tagged_entry_t * wc, MPID_Request * rreq)
     int mpi_errno = MPI_SUCCESS;
     BEGIN_FUNC(FCNAME);
     if (wc->data) {
-        REQ_SFI(rreq)->match_state = TSEARCH_FOUND;
+        REQ_OFI(rreq)->match_state = TSEARCH_FOUND;
         rreq->status.MPI_SOURCE = get_source(wc->tag);
         rreq->status.MPI_TAG = get_tag(wc->tag);
         MPIR_STATUS_SET_COUNT(rreq->status, wc->len);
         rreq->status.MPI_ERROR = MPI_SUCCESS;
     }
     else {
-        REQ_SFI(rreq)->match_state = TSEARCH_NOT_FOUND;
+        REQ_OFI(rreq)->match_state = TSEARCH_NOT_FOUND;
     }
     END_FUNC(FCNAME);
     return mpi_errno;
@@ -92,9 +92,9 @@ int MPID_nem_ofi_iprobe_impl(struct MPIDI_VC *vc,
         rreq = &rreq_s;
         rreq->dev.OnDataAvail = NULL;
     }
-    REQ_SFI(rreq)->event_callback = tsearch_callback;
-    REQ_SFI(rreq)->match_state = TSEARCH_INIT;
-    SFI_ADDR_INIT(source, vc, remote_proc);
+    REQ_OFI(rreq)->event_callback = tsearch_callback;
+    REQ_OFI(rreq)->match_state = TSEARCH_INIT;
+    OFI_ADDR_INIT(source, vc, remote_proc);
     match_bits = init_recvtag(&mask_bits, comm->context_id + context_offset, source, tag);
 
     /* ------------------------------------------------------------------------ */
@@ -110,7 +110,7 @@ int MPID_nem_ofi_iprobe_impl(struct MPIDI_VC *vc,
                      0, /* Flags                */
                      &remote_proc,      /* Remote Address       */
                      &len,      /* Out:  incoming msglen */
-                     &(REQ_SFI(rreq)->ofi_context));    /* Nonblocking context  */
+                     &(REQ_OFI(rreq)->ofi_context));    /* Nonblocking context  */
     if (ret == -FI_ENOMSG) {
         *flag = 0;
         goto fn_exit;
@@ -126,10 +126,10 @@ int MPID_nem_ofi_iprobe_impl(struct MPIDI_VC *vc,
                              "**ofi_tsearch", "**ofi_tsearch %s %d %s %s",
                              __SHORT_FILE__, __LINE__, FCNAME, fi_strerror(-ret));
     }
-    while (TSEARCH_INIT == REQ_SFI(rreq)->match_state)
+    while (TSEARCH_INIT == REQ_OFI(rreq)->match_state)
         MPID_nem_ofi_poll(MPID_BLOCKING_POLL);
 
-    if (REQ_SFI(rreq)->match_state == TSEARCH_NOT_FOUND) {
+    if (REQ_OFI(rreq)->match_state == TSEARCH_NOT_FOUND) {
         if (rreq_ptr) {
             MPIDI_CH3_Request_destroy(rreq);
             *rreq_ptr = NULL;
@@ -233,16 +233,16 @@ int MPID_nem_ofi_poll(int in_blocking_poll)
         if (ret > 0) {
             if (NULL != wc.op_context) {
                 req = context_to_req(wc.op_context);
-                if (REQ_SFI(req)->event_callback) {
-                    MPI_RC(REQ_SFI(req)->event_callback(&wc, req));
+                if (REQ_OFI(req)->event_callback) {
+                    MPI_RC(REQ_OFI(req)->event_callback(&wc, req));
                     continue;
                 }
                 reqFn = req->dev.OnDataAvail;
                 if (reqFn) {
-                    if (REQ_SFI(req)->pack_buffer) {
-                        MPIU_Free(REQ_SFI(req)->pack_buffer);
+                    if (REQ_OFI(req)->pack_buffer) {
+                        MPIU_Free(REQ_OFI(req)->pack_buffer);
                     }
-                    vc = REQ_SFI(req)->vc;
+                    vc = REQ_OFI(req)->vc;
 
                     complete = 0;
                     MPI_RC(reqFn(vc, req, &complete));
@@ -268,10 +268,10 @@ int MPID_nem_ofi_poll(int in_blocking_poll)
                     /* ----------------------------------------------------- */
                     req = context_to_req(error.op_context);
                     if (req->kind == MPID_REQUEST_SEND) {
-                        mpi_errno = REQ_SFI(req)->event_callback(NULL, req);
+                        mpi_errno = REQ_OFI(req)->event_callback(NULL, req);
                     }
                     else if (req->kind == MPID_REQUEST_RECV) {
-                        mpi_errno = REQ_SFI(req)->event_callback(&wc, req);
+                        mpi_errno = REQ_OFI(req)->event_callback(&wc, req);
                         req->status.MPI_ERROR = MPI_ERR_TRUNCATE;
                         req->status.MPI_TAG = error.tag;
                     }
diff --git a/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_tagged.c b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_tagged.c
index 89affd2..23abe0e 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_tagged.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_tagged.c
@@ -23,8 +23,8 @@ static inline int MPID_nem_ofi_sync_recv_callback(cq_tagged_entry_t * wc ATTRIBU
 
     BEGIN_FUNC(FCNAME);
 
-    MPIDI_CH3U_Recvq_DP(REQ_SFI(rreq)->parent);
-    MPIDI_CH3U_Request_complete(REQ_SFI(rreq)->parent);
+    MPIDI_CH3U_Recvq_DP(REQ_OFI(rreq)->parent);
+    MPIDI_CH3U_Request_complete(REQ_OFI(rreq)->parent);
     MPIDI_CH3U_Request_complete(rreq);
 
     END_FUNC(FCNAME);
@@ -42,8 +42,8 @@ static inline int MPID_nem_ofi_send_callback(cq_tagged_entry_t * wc ATTRIBUTE((u
 {
     int mpi_errno = MPI_SUCCESS;
     BEGIN_FUNC(FCNAME);
-    if (REQ_SFI(sreq)->pack_buffer)
-        MPIU_Free(REQ_SFI(sreq)->pack_buffer);
+    if (REQ_OFI(sreq)->pack_buffer)
+        MPIU_Free(REQ_OFI(sreq)->pack_buffer);
     MPIDI_CH3U_Request_complete(sreq);
     END_FUNC(FCNAME);
     return mpi_errno;
@@ -69,16 +69,16 @@ static inline int MPID_nem_ofi_recv_callback(cq_tagged_entry_t * wc, MPID_Reques
     rreq->status.MPI_ERROR = MPI_SUCCESS;
     rreq->status.MPI_SOURCE = get_source(wc->tag);
     rreq->status.MPI_TAG = get_tag(wc->tag);
-    REQ_SFI(rreq)->req_started = 1;
+    REQ_OFI(rreq)->req_started = 1;
     MPIR_STATUS_SET_COUNT(rreq->status, wc->len);
 
-    if (REQ_SFI(rreq)->pack_buffer) {
-        MPIDI_CH3U_Buffer_copy(REQ_SFI(rreq)->pack_buffer,
+    if (REQ_OFI(rreq)->pack_buffer) {
+        MPIDI_CH3U_Buffer_copy(REQ_OFI(rreq)->pack_buffer,
                                MPIR_STATUS_GET_COUNT(rreq->status),
                                MPI_BYTE, &err0, rreq->dev.user_buf,
                                rreq->dev.user_count, rreq->dev.datatype, &sz, &err1);
         MPIR_STATUS_SET_COUNT(rreq->status, sz);
-        MPIU_Free(REQ_SFI(rreq)->pack_buffer);
+        MPIU_Free(REQ_OFI(rreq)->pack_buffer);
         if (err0 || err1) {
             rreq->status.MPI_ERROR = MPI_ERR_TYPE;
         }
@@ -91,7 +91,7 @@ static inline int MPID_nem_ofi_recv_callback(cq_tagged_entry_t * wc, MPID_Reques
         /* MPID_SYNC_SEND_ACK is set in the tag bits to provide */
         /* separation of MPI messages and protocol messages     */
         /* ---------------------------------------------------- */
-        vc = REQ_SFI(rreq)->vc;
+        vc = REQ_OFI(rreq)->vc;
         if (!vc) {      /* MPI_ANY_SOURCE -- Post message from status, complete the VC */
             src = get_source(wc->tag);
             vc = rreq->comm->vcr[src];
@@ -102,14 +102,14 @@ static inline int MPID_nem_ofi_recv_callback(cq_tagged_entry_t * wc, MPID_Reques
         MPID_nem_ofi_create_req(&sync_req, 1);
         sync_req->dev.OnDataAvail = NULL;
         sync_req->dev.next = NULL;
-        REQ_SFI(sync_req)->event_callback = MPID_nem_ofi_sync_recv_callback;
-        REQ_SFI(sync_req)->parent = rreq;
+        REQ_OFI(sync_req)->event_callback = MPID_nem_ofi_sync_recv_callback;
+        REQ_OFI(sync_req)->parent = rreq;
         FI_RC(fi_tsend(gl_data.endpoint,
                          NULL,
                          0,
                          gl_data.mr,
-                         VC_SFI(vc)->direct_addr,
-                         ssend_bits, &(REQ_SFI(sync_req)->ofi_context)), tsend);
+                         VC_OFI(vc)->direct_addr,
+                         ssend_bits, &(REQ_OFI(sync_req)->ofi_context)), tsend);
     }
     else {
         /* ---------------------------------------------------- */
@@ -150,8 +150,8 @@ static inline int do_isend(struct MPIDI_VC *vc,
     MPID_nem_ofi_create_req(&sreq, 2);
     sreq->kind = MPID_REQUEST_SEND;
     sreq->dev.OnDataAvail = NULL;
-    REQ_SFI(sreq)->event_callback = MPID_nem_ofi_send_callback;
-    REQ_SFI(sreq)->vc = vc;
+    REQ_OFI(sreq)->event_callback = MPID_nem_ofi_send_callback;
+    REQ_OFI(sreq)->vc = vc;
 
     /* ---------------------------------------------------- */
     /* Create the pack buffer (if required), and allocate   */
@@ -167,7 +167,7 @@ static inline int do_isend(struct MPIDI_VC *vc,
                              MPI_ERR_OTHER, "**nomem", "**nomem %s", "Send buffer alloc");
         MPIDI_CH3U_Buffer_copy(buf, count, datatype, &err0,
                                send_buffer, data_sz, MPI_BYTE, &data_sz, &err1);
-        REQ_SFI(sreq)->pack_buffer = send_buffer;
+        REQ_OFI(sreq)->pack_buffer = send_buffer;
     }
 
     if (type == MPID_SYNC_SEND) {
@@ -181,26 +181,26 @@ static inline int do_isend(struct MPIDI_VC *vc,
         MPID_nem_ofi_create_req(&sync_req, 1);
         sync_req->dev.OnDataAvail = NULL;
         sync_req->dev.next = NULL;
-        REQ_SFI(sync_req)->event_callback = MPID_nem_ofi_sync_recv_callback;
-        REQ_SFI(sync_req)->parent = sreq;
+        REQ_OFI(sync_req)->event_callback = MPID_nem_ofi_sync_recv_callback;
+        REQ_OFI(sync_req)->parent = sreq;
         ssend_match = init_recvtag(&ssend_mask, comm->context_id + context_offset, dest, tag);
         ssend_match |= MPID_SYNC_SEND_ACK;
         FI_RC(fi_trecv(gl_data.endpoint,    /* endpoint    */
                            NULL,        /* recvbuf     */
                            0,   /* data sz     */
                            gl_data.mr,  /* dynamic mr  */
-                           VC_SFI(vc)->direct_addr,     /* remote proc */
+                           VC_OFI(vc)->direct_addr,     /* remote proc */
                            ssend_match, /* match bits  */
                            0ULL,        /* mask bits   */
-                           &(REQ_SFI(sync_req)->ofi_context)), trecv);
+                           &(REQ_OFI(sync_req)->ofi_context)), trecv);
     }
     FI_RC(fi_tsend(gl_data.endpoint,  /* Endpoint                       */
                      send_buffer,       /* Send buffer(packed or user)    */
                      data_sz,   /* Size of the send               */
                      gl_data.mr,        /* Dynamic memory region          */
-                     VC_SFI(vc)->direct_addr,   /* Use the address of this VC     */
+                     VC_OFI(vc)->direct_addr,   /* Use the address of this VC     */
                      match_bits,        /* Match bits                     */
-                     &(REQ_SFI(sreq)->ofi_context)), tsend);
+                     &(REQ_OFI(sreq)->ofi_context)), tsend);
     *request = sreq;
     END_FUNC_RC(FCNAME);
 }
@@ -223,8 +223,8 @@ int MPID_nem_ofi_recv_posted(struct MPIDI_VC *vc, struct MPID_Request *rreq)
     /* Initialize the request   */
     /* ------------------------ */
     MPID_nem_ofi_init_req(rreq);
-    REQ_SFI(rreq)->event_callback = MPID_nem_ofi_recv_callback;
-    REQ_SFI(rreq)->vc = vc;
+    REQ_OFI(rreq)->event_callback = MPID_nem_ofi_recv_callback;
+    REQ_OFI(rreq)->vc = vc;
 
     /* ---------------------------------------------------- */
     /* Fill out the match info, and allocate the pack buffer */
@@ -234,7 +234,7 @@ int MPID_nem_ofi_recv_posted(struct MPIDI_VC *vc, struct MPID_Request *rreq)
     tag = rreq->dev.match.parts.tag;
     context_id = rreq->dev.match.parts.context_id;
     match_bits = init_recvtag(&mask_bits, context_id, src, tag);
-    SFI_ADDR_INIT(src, vc, remote_proc);
+    OFI_ADDR_INIT(src, vc, remote_proc);
     MPIDI_Datatype_get_info(rreq->dev.user_count, rreq->dev.datatype,
                             dt_contig, data_sz, dt_ptr, dt_true_lb);
     if (dt_contig) {
@@ -244,7 +244,7 @@ int MPID_nem_ofi_recv_posted(struct MPIDI_VC *vc, struct MPID_Request *rreq)
         recv_buffer = (char *) MPIU_Malloc(data_sz);
         MPIU_ERR_CHKANDJUMP1(recv_buffer == NULL, mpi_errno, MPI_ERR_OTHER,
                              "**nomem", "**nomem %s", "Recv Pack Buffer alloc");
-        REQ_SFI(rreq)->pack_buffer = recv_buffer;
+        REQ_OFI(rreq)->pack_buffer = recv_buffer;
     }
 
     /* ---------------- */
@@ -255,7 +255,7 @@ int MPID_nem_ofi_recv_posted(struct MPIDI_VC *vc, struct MPID_Request *rreq)
                        data_sz,
                        gl_data.mr,
                        remote_proc,
-                       match_bits, mask_bits, &(REQ_SFI(rreq)->ofi_context)), trecv);
+                       match_bits, mask_bits, &(REQ_OFI(rreq)->ofi_context)), trecv);
     MPID_nem_ofi_poll(MPID_NONBLOCKING_POLL);
     END_FUNC_RC(FCNAME);
 }
@@ -337,7 +337,7 @@ int MPID_nem_ofi_issend(struct MPIDI_VC *vc,
   BEGIN_FUNC(FCNAME);                                   \
   MPID_nem_ofi_poll(MPID_NONBLOCKING_POLL);             \
   ret = fi_cancel((fid_t)gl_data.endpoint,              \
-                  &(REQ_SFI(req)->ofi_context));        \
+                  &(REQ_OFI(req)->ofi_context));        \
   if (ret == 0) {                                        \
     MPIR_STATUS_SET_CANCEL_BIT(req->status, TRUE);      \
   } else {                                              \
@@ -384,7 +384,7 @@ int MPID_nem_ofi_anysource_matched(MPID_Request * rreq)
     /* source request on another device.  We have the chance */
     /* to cancel this shared request if it has been posted   */
     /* ----------------------------------------------------- */
-    ret = fi_cancel((fid_t) gl_data.endpoint, &(REQ_SFI(rreq)->ofi_context));
+    ret = fi_cancel((fid_t) gl_data.endpoint, &(REQ_OFI(rreq)->ofi_context));
     if (ret == 0) {
         /* --------------------------------------------------- */
         /* Request cancelled:  cancel and complete the request */
diff --git a/src/mpid/ch3/channels/nemesis/netmod/ofi/subconfigure.m4 b/src/mpid/ch3/channels/nemesis/netmod/ofi/subconfigure.m4
index cb6b35c..ac39fe7 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/ofi/subconfigure.m4
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/subconfigure.m4
@@ -7,18 +7,18 @@ AC_DEFUN([PAC_SUBCFG_PREREQ_]PAC_SUBCFG_AUTO_SUFFIX,[
             AS_CASE([$net],[ofi],[build_nemesis_netmod_ofi=yes])
         done
     ])
-    AM_CONDITIONAL([BUILD_NEMESIS_NETMOD_SFI],[test "X$build_nemesis_netmod_ofi" = "Xyes"])
+    AM_CONDITIONAL([BUILD_NEMESIS_NETMOD_OFI],[test "X$build_nemesis_netmod_ofi" = "Xyes"])
 ])dnl
 
 AC_DEFUN([PAC_SUBCFG_BODY_]PAC_SUBCFG_AUTO_SUFFIX,[
-AM_COND_IF([BUILD_NEMESIS_NETMOD_SFI],[
+AM_COND_IF([BUILD_NEMESIS_NETMOD_OFI],[
     AC_MSG_NOTICE([RUNNING CONFIGURE FOR ch3:nemesis:ofi])
 
     PAC_SET_HEADER_LIB_PATH(ofi)
     PAC_CHECK_HEADER_LIB_FATAL(ofi, rdma/fabric.h, fabric, fi_getinfo)
 
     AC_DEFINE([ENABLE_COMM_OVERRIDES], 1, [define to add per-vc function pointers to override send and recv functions])
-])dnl end AM_COND_IF(BUILD_NEMESIS_NETMOD_SFI,...)
+])dnl end AM_COND_IF(BUILD_NEMESIS_NETMOD_OFI,...)
 ])dnl end _BODY
 
 [#] end of __file__

http://git.mpich.org/mpich.git/commitdiff/d2403c946dc5a87fa91290a43eefdab54bc678ee

commit d2403c946dc5a87fa91290a43eefdab54bc678ee
Author: Charles J Archer <charles.j.archer at intel.com>
Date:   Tue Dec 9 15:04:56 2014 -0800

    Open Fabrics official name is OFI; rename from SFI

diff --git a/src/mpid/ch3/channels/nemesis/netmod/Makefile.mk b/src/mpid/ch3/channels/nemesis/netmod/Makefile.mk
index 978da0e..0c30f12 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/Makefile.mk
+++ b/src/mpid/ch3/channels/nemesis/netmod/Makefile.mk
@@ -14,4 +14,4 @@ include $(top_srcdir)/src/mpid/ch3/channels/nemesis/netmod/scif/Makefile.mk
 include $(top_srcdir)/src/mpid/ch3/channels/nemesis/netmod/portals4/Makefile.mk
 include $(top_srcdir)/src/mpid/ch3/channels/nemesis/netmod/ib/Makefile.mk
 include $(top_srcdir)/src/mpid/ch3/channels/nemesis/netmod/mxm/Makefile.mk
-include $(top_srcdir)/src/mpid/ch3/channels/nemesis/netmod/sfi/Makefile.mk
+include $(top_srcdir)/src/mpid/ch3/channels/nemesis/netmod/ofi/Makefile.mk
diff --git a/src/mpid/ch3/channels/nemesis/netmod/ofi/Makefile.mk b/src/mpid/ch3/channels/nemesis/netmod/ofi/Makefile.mk
new file mode 100644
index 0000000..0e19072
--- /dev/null
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/Makefile.mk
@@ -0,0 +1,19 @@
+## -*- Mode: Makefile; -*-
+## vim: set ft=automake :
+##
+## (C) 2011 by Argonne National Laboratory.
+##     See COPYRIGHT in top-level directory.
+##
+if BUILD_NEMESIS_NETMOD_SFI
+
+mpi_core_sources +=                                 		\
+    src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_init.c 	\
+    src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_cm.c	 	\
+    src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_tagged.c	\
+    src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_msg.c	 	\
+    src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_data.c	 	\
+    src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_progress.c
+
+errnames_txt_files += src/mpid/ch3/channels/nemesis/netmod/ofi/errnames.txt
+
+endif
diff --git a/src/mpid/ch3/channels/nemesis/netmod/ofi/errnames.txt b/src/mpid/ch3/channels/nemesis/netmod/ofi/errnames.txt
new file mode 100644
index 0000000..16c61b0
--- /dev/null
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/errnames.txt
@@ -0,0 +1,42 @@
+**ofi_avmap:SFI get address vector map failed
+**ofi_avmap %s %d %s %s:SFI address vector map failed (%s:%d:%s:%s)
+**ofi_tsend:SFI tagged sendto failed
+**ofi_tsend %s %d %s %s:SFI tagged sendto failed (%s:%d:%s:%s)
+**ofi_trecv:SFI tagged recvfrom failed
+**ofi_trecv %s %d %s %s:SFI tagged recvfrom failed (%s:%d:%s:%s)
+**ofi_getinfo:SFI getinfo() failed
+**ofi_getinfo %s %d %s %s:SFI getinfo() failed (%s:%d:%s:%s)
+**ofi_openep:SFI endpoint open failed
+**ofi_openep %s %d %s %s:SFI endpoint open failed (%s:%d:%s:%s)
+**ofi_openfabric:SFI fabric open failure
+**ofi_openfabric %s %d %s %s:SFI fabric open failed (%s:%d:%s:%s)
+**ofi_opendomain:SFI domain open failure
+**ofi_opendomain %s %d %s %s:SFI domain open failed (%s:%d:%s:%s)
+**ofi_opencq:SFI event queue create failure
+**ofi_opencq %s %d %s %s:SFI event queue create failed (%s:%d:%s:%s)
+**ofi_avopen:SFI address vector open failed
+**ofi_avopen %s %d %s %s:SFI address vector open failed (%s:%d:%s:%s)
+**ofi_bind:SFI resource bind failure
+**ofi_bind %s %d %s %s:SFI resource bind failed (%s:%d:%s:%s)
+**ofi_ep_enable:SFI endpoint enable failed
+**ofi_ep_enable %s %d %s %s:SFI endpoint enable failed (%s:%d:%s:%s)
+**ofi_getname:SFI get endpoint name failed
+**ofi_getname %s %d %s %s:SFI get endpoint name failed (%s:%d:%s:%s)
+**ofi_avclose:SFI av close failed
+**ofi_avclose %s %d %s %s:SFI av close failed (%s:%d:%s:%s)
+**ofi_epclose:SFI endpoint close failed
+**ofi_epclose %s %d %s %s:SFI endpoint close failed (%s:%d:%s:%s)
+**ofi_cqclose:SFI cq close failed
+**ofi_cqclose %s %d %s %s:SFI cq close failed (%s:%d:%s:%s)
+**ofi_mrclose:SFI mr close failed
+**ofi_mrclose %s %d %s %s:SFI mr close failed (%s:%d:%s:%s)
+**ofi_fabricclose:SFI fabric close failed
+**ofi_fabricclose %s %d %s %s:SFI fabric close failed (%s:%d:%s:%s)
+**ofi_domainclose:SFI domain close failed
+**ofi_domainclose %s %d %s %s:SFI domain close failed (%s:%d:%s:%s)
+**ofi_tsearch:SFI tsearch failed
+**ofi_tsearch %s %d %s %s:SFI tsearch failed (%s:%d:%s:%s)
+**ofi_poll:SFI poll failed
+**ofi_poll %s %d %s %s:SFI poll failed (%s:%d:%s:%s)
+**ofi_cancel:SFI cancel failed
+**ofi_cancel %s %d %s %s:SFI cancel failed (%s:%d:%s:%s)
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_cm.c b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_cm.c
similarity index 82%
rename from src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_cm.c
rename to src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_cm.c
index 54bf757..686a905 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_cm.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_cm.c
@@ -7,10 +7,10 @@
  *  to Argonne National Laboratory subject to Software Grant and Corporate
  *  Contributor License Agreement dated February 8, 2012.
  */
-#include "sfi_impl.h"
+#include "ofi_impl.h"
 
 /* ------------------------------------------------------------------------ */
-/* sfi_tag_to_vc                                                            */
+/* ofi_tag_to_vc                                                            */
 /* This routine converts tag information from an incoming preposted receive */
 /* into the VC that uses the routine.  There is a possibility of a small    */
 /* list of temporary VC's that are used during dynamic task management      */
@@ -21,8 +21,8 @@
 /* is enough to look up the VC.                                             */
 /* ------------------------------------------------------------------------ */
 #undef FCNAME
-#define FCNAME DECL_FUNC(sfi_tag_to_vc)
-static inline MPIDI_VC_t *sfi_tag_to_vc(uint64_t match_bits)
+#define FCNAME DECL_FUNC(ofi_tag_to_vc)
+static inline MPIDI_VC_t *ofi_tag_to_vc(uint64_t match_bits)
 {
     int pgid = 0, port = 0;
     MPIDI_VC_t *vc = NULL;
@@ -70,7 +70,7 @@ static inline MPIDI_VC_t *sfi_tag_to_vc(uint64_t match_bits)
 }
 
 /* ------------------------------------------------------------------------ */
-/* MPID_nem_sfi_conn_req_callback                                           */
+/* MPID_nem_ofi_conn_req_callback                                           */
 /* A new process has been created and is connected to the current world     */
 /* The address of the new process is exchanged via the business card        */
 /* instead of being exchanged up front during the creation of the first     */
@@ -86,8 +86,8 @@ static inline MPIDI_VC_t *sfi_tag_to_vc(uint64_t match_bits)
 /* other VC's because they may not be part of a process group.              */
 /* ------------------------------------------------------------------------ */
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_conn_req_callback)
-static inline int MPID_nem_sfi_conn_req_callback(cq_tagged_entry_t * wc, MPID_Request * rreq)
+#define FCNAME DECL_FUNC(MPID_nem_ofi_conn_req_callback)
+static inline int MPID_nem_ofi_conn_req_callback(cq_tagged_entry_t * wc, MPID_Request * rreq)
 {
     int ret, len, mpi_errno = MPI_SUCCESS;
     char bc[SFI_KVSAPPSTRLEN];
@@ -108,7 +108,7 @@ static inline int MPID_nem_sfi_conn_req_callback(cq_tagged_entry_t * wc, MPID_Re
                        0,
                        MPID_CONN_REQ,
                        ~MPID_PROTOCOL_MASK,
-                       (void *) &(REQ_SFI(gl_data.conn_req)->sfi_context)), trecv);
+                       (void *) &(REQ_SFI(gl_data.conn_req)->ofi_context)), trecv);
 
     addr = MPIU_Malloc(gl_data.bound_addrlen);
     MPIU_Assertp(addr);
@@ -143,15 +143,15 @@ static inline int MPID_nem_sfi_conn_req_callback(cq_tagged_entry_t * wc, MPID_Re
 }
 
 /* ------------------------------------------------------------------------ */
-/* MPID_nem_sfi_handle_packet                                               */
+/* MPID_nem_ofi_handle_packet                                               */
 /* The "parent" request tracks the state of the entire rendezvous           */
 /* As "child" requests complete, the cc counter is decremented              */
 /* Notify CH3 that we have an incoming packet (if cc hits 1).  Otherwise    */
 /* decrement the ref counter via request completion                         */
 /* ------------------------------------------------------------------------ */
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_handle_packet)
-static inline int MPID_nem_sfi_handle_packet(cq_tagged_entry_t * wc ATTRIBUTE((unused)),
+#define FCNAME DECL_FUNC(MPID_nem_ofi_handle_packet)
+static inline int MPID_nem_ofi_handle_packet(cq_tagged_entry_t * wc ATTRIBUTE((unused)),
                                              MPID_Request * rreq)
 {
     int mpi_errno = MPI_SUCCESS;
@@ -169,24 +169,24 @@ static inline int MPID_nem_sfi_handle_packet(cq_tagged_entry_t * wc ATTRIBUTE((u
 }
 
 /* ------------------------------------------------------------------------ */
-/* MPID_nem_sfi_cts_send_callback                                           */
-/* A wrapper around MPID_nem_sfi_handle_packet that decrements              */
+/* MPID_nem_ofi_cts_send_callback                                           */
+/* A wrapper around MPID_nem_ofi_handle_packet that decrements              */
 /* the parent request's counter, and cleans up the CTS request              */
 /* ------------------------------------------------------------------------ */
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_cts_send_callback)
-static inline int MPID_nem_sfi_cts_send_callback(cq_tagged_entry_t * wc, MPID_Request * sreq)
+#define FCNAME DECL_FUNC(MPID_nem_ofi_cts_send_callback)
+static inline int MPID_nem_ofi_cts_send_callback(cq_tagged_entry_t * wc, MPID_Request * sreq)
 {
     int mpi_errno = MPI_SUCCESS;
     BEGIN_FUNC(FCNAME);
-    MPI_RC(MPID_nem_sfi_handle_packet(wc, REQ_SFI(sreq)->parent));
+    MPI_RC(MPID_nem_ofi_handle_packet(wc, REQ_SFI(sreq)->parent));
     MPIDI_CH3U_Request_complete(sreq);
     END_FUNC_RC(FCNAME);
 }
 
 /* ------------------------------------------------------------------------ */
-/* MPID_nem_sfi_preposted_callback                                          */
-/* This callback handles incoming "SendContig" messages (see sfi_msg.c)     */
+/* MPID_nem_ofi_preposted_callback                                          */
+/* This callback handles incoming "SendContig" messages (see ofi_msg.c)     */
 /* for the send routines.  This implements the CTS response and the RTS     */
 /* handler.  The steps are as follows:                                      */
 /*   * Create a parent data request and post a receive into a pack buffer   */
@@ -194,8 +194,8 @@ static inline int MPID_nem_sfi_cts_send_callback(cq_tagged_entry_t * wc, MPID_Re
 /*   * Re-Post the RTS receive and handler to handle the next message       */
 /* ------------------------------------------------------------------------ */
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_preposted_callback)
-static inline int MPID_nem_sfi_preposted_callback(cq_tagged_entry_t * wc, MPID_Request * rreq)
+#define FCNAME DECL_FUNC(MPID_nem_ofi_preposted_callback)
+static inline int MPID_nem_ofi_preposted_callback(cq_tagged_entry_t * wc, MPID_Request * rreq)
 {
     int c, mpi_errno = MPI_SUCCESS;
     size_t pkt_len;
@@ -204,7 +204,7 @@ static inline int MPID_nem_sfi_preposted_callback(cq_tagged_entry_t * wc, MPID_R
     MPID_Request *new_rreq, *sreq;
     BEGIN_FUNC(FCNAME);
 
-    vc = sfi_tag_to_vc(wc->tag);
+    vc = ofi_tag_to_vc(wc->tag);
     MPIU_Assert(vc);
     VC_READY_CHECK(vc);
 
@@ -213,11 +213,11 @@ static inline int MPID_nem_sfi_preposted_callback(cq_tagged_entry_t * wc, MPID_R
     MPIU_ERR_CHKANDJUMP1(pack_buffer == NULL, mpi_errno, MPI_ERR_OTHER,
                          "**nomem", "**nomem %s", "Pack Buffer alloc");
     c = 1;
-    MPID_nem_sfi_create_req(&new_rreq, 1);
+    MPID_nem_ofi_create_req(&new_rreq, 1);
     MPID_cc_incr(new_rreq->cc_ptr, &c);
     new_rreq->dev.OnDataAvail = NULL;
     new_rreq->dev.next = NULL;
-    REQ_SFI(new_rreq)->event_callback = MPID_nem_sfi_handle_packet;
+    REQ_SFI(new_rreq)->event_callback = MPID_nem_ofi_handle_packet;
     REQ_SFI(new_rreq)->vc = vc;
     REQ_SFI(new_rreq)->pack_buffer = pack_buffer;
     REQ_SFI(new_rreq)->pack_buffer_size = pkt_len;
@@ -226,19 +226,19 @@ static inline int MPID_nem_sfi_preposted_callback(cq_tagged_entry_t * wc, MPID_R
                        REQ_SFI(new_rreq)->pack_buffer_size,
                        gl_data.mr,
                        VC_SFI(vc)->direct_addr,
-                       wc->tag | MPID_MSG_DATA, 0, &(REQ_SFI(new_rreq)->sfi_context)), trecv);
+                       wc->tag | MPID_MSG_DATA, 0, &(REQ_SFI(new_rreq)->ofi_context)), trecv);
 
-    MPID_nem_sfi_create_req(&sreq, 1);
+    MPID_nem_ofi_create_req(&sreq, 1);
     sreq->dev.OnDataAvail = NULL;
     sreq->dev.next = NULL;
-    REQ_SFI(sreq)->event_callback = MPID_nem_sfi_cts_send_callback;
+    REQ_SFI(sreq)->event_callback = MPID_nem_ofi_cts_send_callback;
     REQ_SFI(sreq)->parent = new_rreq;
     FI_RC(fi_tsend(gl_data.endpoint,
                      NULL,
                      0,
                      gl_data.mr,
                      VC_SFI(vc)->direct_addr,
-                     wc->tag | MPID_MSG_CTS, &(REQ_SFI(sreq)->sfi_context)), tsend);
+                     wc->tag | MPID_MSG_CTS, &(REQ_SFI(sreq)->ofi_context)), tsend);
     MPIU_Assert(gl_data.persistent_req == rreq);
 
     rreq->dev.user_count = 0;
@@ -248,17 +248,17 @@ static inline int MPID_nem_sfi_preposted_callback(cq_tagged_entry_t * wc, MPID_R
                        gl_data.mr,
                        0,
                        MPID_MSG_RTS,
-                       ~MPID_PROTOCOL_MASK, &(REQ_SFI(rreq)->sfi_context)), trecv);
+                       ~MPID_PROTOCOL_MASK, &(REQ_SFI(rreq)->ofi_context)), trecv);
     END_FUNC_RC(FCNAME);
 }
 
 /* ------------------------------------------------------------------------ */
-/* MPID_nem_sfi_connect_to_root_callback                                    */
+/* MPID_nem_ofi_connect_to_root_callback                                    */
 /* Complete and clean up the request                                        */
 /* ------------------------------------------------------------------------ */
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_connect_to_root_callback)
-int MPID_nem_sfi_connect_to_root_callback(cq_tagged_entry_t * wc ATTRIBUTE((unused)),
+#define FCNAME DECL_FUNC(MPID_nem_ofi_connect_to_root_callback)
+int MPID_nem_ofi_connect_to_root_callback(cq_tagged_entry_t * wc ATTRIBUTE((unused)),
                                           MPID_Request * sreq)
 {
     int mpi_errno = MPI_SUCCESS;
@@ -273,14 +273,14 @@ int MPID_nem_sfi_connect_to_root_callback(cq_tagged_entry_t * wc ATTRIBUTE((unus
 }
 
 /* ------------------------------------------------------------------------ */
-/* MPID_nem_sfi_cm_init                                                     */
+/* MPID_nem_ofi_cm_init                                                     */
 /* This is a utility routine that sets up persistent connection management  */
 /* requests and a persistent data request to handle rendezvous SendContig   */
 /* messages.                                                                */
 /* ------------------------------------------------------------------------ */
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_cm_init)
-int MPID_nem_sfi_cm_init(MPIDI_PG_t * pg_p, int pg_rank ATTRIBUTE((unused)))
+#define FCNAME DECL_FUNC(MPID_nem_ofi_cm_init)
+int MPID_nem_ofi_cm_init(MPIDI_PG_t * pg_p, int pg_rank ATTRIBUTE((unused)))
 {
     int mpi_errno = MPI_SUCCESS;
     MPID_Request *persistent_req, *conn_req;
@@ -289,20 +289,20 @@ int MPID_nem_sfi_cm_init(MPIDI_PG_t * pg_p, int pg_rank ATTRIBUTE((unused)))
     /* ------------------------------------- */
     /* Set up CH3 and netmod data structures */
     /* ------------------------------------- */
-    MPI_RC(MPIDI_CH3I_Register_anysource_notification(MPID_nem_sfi_anysource_posted,
-                                                      MPID_nem_sfi_anysource_matched));
-    MPIDI_Anysource_iprobe_fn = MPID_nem_sfi_anysource_iprobe;
-    MPIDI_Anysource_improbe_fn = MPID_nem_sfi_anysource_improbe;
+    MPI_RC(MPIDI_CH3I_Register_anysource_notification(MPID_nem_ofi_anysource_posted,
+                                                      MPID_nem_ofi_anysource_matched));
+    MPIDI_Anysource_iprobe_fn = MPID_nem_ofi_anysource_iprobe;
+    MPIDI_Anysource_improbe_fn = MPID_nem_ofi_anysource_improbe;
     gl_data.pg_p = pg_p;
 
     /* ----------------------------------- */
     /* Post a persistent request to handle */
     /* ----------------------------------- */
-    MPID_nem_sfi_create_req(&persistent_req, 1);
+    MPID_nem_ofi_create_req(&persistent_req, 1);
     persistent_req->dev.OnDataAvail = NULL;
     persistent_req->dev.next = NULL;
     REQ_SFI(persistent_req)->vc = NULL;
-    REQ_SFI(persistent_req)->event_callback = MPID_nem_sfi_preposted_callback;
+    REQ_SFI(persistent_req)->event_callback = MPID_nem_ofi_preposted_callback;
     FI_RC(fi_trecv(gl_data.endpoint,
                        &persistent_req->dev.user_count,
                        sizeof persistent_req->dev.user_count,
@@ -310,25 +310,25 @@ int MPID_nem_sfi_cm_init(MPIDI_PG_t * pg_p, int pg_rank ATTRIBUTE((unused)))
                        0,
                        MPID_MSG_RTS,
                        ~MPID_PROTOCOL_MASK,
-                       (void *) &(REQ_SFI(persistent_req)->sfi_context)), trecv);
+                       (void *) &(REQ_SFI(persistent_req)->ofi_context)), trecv);
     gl_data.persistent_req = persistent_req;
 
     /* --------------------------------- */
     /* Post recv for connection requests */
     /* --------------------------------- */
-    MPID_nem_sfi_create_req(&conn_req, 1);
+    MPID_nem_ofi_create_req(&conn_req, 1);
     conn_req->dev.user_buf = MPIU_Malloc(SFI_KVSAPPSTRLEN * sizeof(char));
     conn_req->dev.OnDataAvail = NULL;
     conn_req->dev.next = NULL;
     REQ_SFI(conn_req)->vc = NULL;       /* We don't know the source yet */
-    REQ_SFI(conn_req)->event_callback = MPID_nem_sfi_conn_req_callback;
+    REQ_SFI(conn_req)->event_callback = MPID_nem_ofi_conn_req_callback;
     FI_RC(fi_trecv(gl_data.endpoint,
                        conn_req->dev.user_buf,
                        SFI_KVSAPPSTRLEN,
                        gl_data.mr,
                        0,
                        MPID_CONN_REQ,
-                       ~MPID_PROTOCOL_MASK, (void *) &(REQ_SFI(conn_req)->sfi_context)), trecv);
+                       ~MPID_PROTOCOL_MASK, (void *) &(REQ_SFI(conn_req)->ofi_context)), trecv);
     gl_data.conn_req = conn_req;
 
 
@@ -341,22 +341,22 @@ int MPID_nem_sfi_cm_init(MPIDI_PG_t * pg_p, int pg_rank ATTRIBUTE((unused)))
 }
 
 /* ------------------------------------------------------------------------ */
-/* MPID_nem_sfi_cm_finalize                                                 */
+/* MPID_nem_ofi_cm_finalize                                                 */
 /* Clean up and cancle the requests initiated by the cm_init routine        */
 /* ------------------------------------------------------------------------ */
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_cm_finalize)
-int MPID_nem_sfi_cm_finalize()
+#define FCNAME DECL_FUNC(MPID_nem_ofi_cm_finalize)
+int MPID_nem_ofi_cm_finalize()
 {
     int mpi_errno = MPI_SUCCESS;
     BEGIN_FUNC(FCNAME);
     FI_RC(fi_cancel((fid_t) gl_data.endpoint,
-                    &(REQ_SFI(gl_data.persistent_req)->sfi_context)), cancel);
+                    &(REQ_SFI(gl_data.persistent_req)->ofi_context)), cancel);
     MPIR_STATUS_SET_CANCEL_BIT(gl_data.persistent_req->status, TRUE);
     MPIR_STATUS_SET_COUNT(gl_data.persistent_req->status, 0);
     MPIDI_CH3U_Request_complete(gl_data.persistent_req);
 
-    FI_RC(fi_cancel((fid_t) gl_data.endpoint, &(REQ_SFI(gl_data.conn_req)->sfi_context)), cancel);
+    FI_RC(fi_cancel((fid_t) gl_data.endpoint, &(REQ_SFI(gl_data.conn_req)->ofi_context)), cancel);
     MPIU_Free(gl_data.conn_req->dev.user_buf);
     MPIR_STATUS_SET_CANCEL_BIT(gl_data.conn_req->status, TRUE);
     MPIR_STATUS_SET_COUNT(gl_data.conn_req->status, 0);
@@ -369,15 +369,15 @@ int MPID_nem_sfi_cm_finalize()
 }
 
 /* ------------------------------------------------------------------------ */
-/* MPID_nem_sfi_vc_connect                                                  */
+/* MPID_nem_ofi_vc_connect                                                  */
 /* Handle CH3/Nemesis VC connections                                        */
 /*   * Query the VC address information.  In particular we are looking for  */
 /*     the fabric address name.                                             */
 /*   * Use fi_av_insert to register the address name with SFI               */
 /* ------------------------------------------------------------------------ */
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_vc_connect)
-int MPID_nem_sfi_vc_connect(MPIDI_VC_t * vc)
+#define FCNAME DECL_FUNC(MPID_nem_ofi_vc_connect)
+int MPID_nem_ofi_vc_connect(MPIDI_VC_t * vc)
 {
     int len, ret, mpi_errno = MPI_SUCCESS;
     char bc[SFI_KVSAPPSTRLEN], *addr = NULL;
@@ -410,26 +410,26 @@ int MPID_nem_sfi_vc_connect(MPIDI_VC_t * vc)
 }
 
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_vc_init)
-int MPID_nem_sfi_vc_init(MPIDI_VC_t * vc)
+#define FCNAME DECL_FUNC(MPID_nem_ofi_vc_init)
+int MPID_nem_ofi_vc_init(MPIDI_VC_t * vc)
 {
     int mpi_errno = MPI_SUCCESS;
     MPIDI_CH3I_VC *const vc_ch = &vc->ch;
-    MPID_nem_sfi_vc_t *const vc_sfi = VC_SFI(vc);
+    MPID_nem_ofi_vc_t *const vc_ofi = VC_SFI(vc);
 
     BEGIN_FUNC(FCNAME);
-    vc->sendNoncontig_fn = MPID_nem_sfi_SendNoncontig;
-    vc_ch->iStartContigMsg = MPID_nem_sfi_iStartContigMsg;
-    vc_ch->iSendContig = MPID_nem_sfi_iSendContig;
+    vc->sendNoncontig_fn = MPID_nem_ofi_SendNoncontig;
+    vc_ch->iStartContigMsg = MPID_nem_ofi_iStartContigMsg;
+    vc_ch->iSendContig = MPID_nem_ofi_iSendContig;
     vc_ch->next = NULL;
     vc_ch->prev = NULL;
-    vc_sfi->is_cmvc = 0;
+    vc_ofi->is_cmvc = 0;
     vc->comm_ops = &_g_comm_ops;
 
     MPIDI_CHANGE_VC_STATE(vc, ACTIVE);
 
     if (NULL == vc->pg) {
-        vc_sfi->is_cmvc = 1;
+        vc_ofi->is_cmvc = 1;
     }
     else {
     }
@@ -438,13 +438,13 @@ int MPID_nem_sfi_vc_init(MPIDI_VC_t * vc)
 }
 
 /* ------------------------------------------------------------------------ */
-/* MPID_nem_sfi_vc_destroy                                                  */
-/* MPID_nem_sfi_vc_terminate                                                */
+/* MPID_nem_ofi_vc_destroy                                                  */
+/* MPID_nem_ofi_vc_terminate                                                */
 /* TODO:  Verify this code has no leaks                                     */
 /* ------------------------------------------------------------------------ */
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_vc_destroy)
-int MPID_nem_sfi_vc_destroy(MPIDI_VC_t * vc)
+#define FCNAME DECL_FUNC(MPID_nem_ofi_vc_destroy)
+int MPID_nem_ofi_vc_destroy(MPIDI_VC_t * vc)
 {
     BEGIN_FUNC(FCNAME);
     if (vc && (VC_SFI(vc)->is_cmvc == 1) && (VC_SFI(vc)->ready == 1)) {
@@ -471,8 +471,8 @@ int MPID_nem_sfi_vc_destroy(MPIDI_VC_t * vc)
 }
 
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_vc_terminate)
-int MPID_nem_sfi_vc_terminate(MPIDI_VC_t * vc)
+#define FCNAME DECL_FUNC(MPID_nem_ofi_vc_terminate)
+int MPID_nem_ofi_vc_terminate(MPIDI_VC_t * vc)
 {
     int mpi_errno = MPI_SUCCESS;
     BEGIN_FUNC(FCNAME);
@@ -484,7 +484,7 @@ int MPID_nem_sfi_vc_terminate(MPIDI_VC_t * vc)
 
 
 /* ------------------------------------------------------------------------ */
-/* MPID_nem_sfi_connect_to_root                                             */
+/* MPID_nem_ofi_connect_to_root                                             */
 /*  * A new unconnected VC (cm/ephemeral VC) has been created.  This code   */
 /*    connects the new VC to a rank in another process group.  The parent   */
 /*    address is obtained by an out of band method and given to this        */
@@ -499,7 +499,7 @@ int MPID_nem_sfi_vc_terminate(MPIDI_VC_t * vc)
 /* ------------------------------------------------------------------------ */
 #undef FCNAME
 #define FCNAME DECL_FUNC(nm_connect_to_root)
-int MPID_nem_sfi_connect_to_root(const char *business_card, MPIDI_VC_t * new_vc)
+int MPID_nem_ofi_connect_to_root(const char *business_card, MPIDI_VC_t * new_vc)
 {
     int len, ret, mpi_errno = MPI_SUCCESS, str_errno = MPI_SUCCESS;
     int my_bc_len = SFI_KVSAPPSTRLEN;
@@ -527,14 +527,14 @@ int MPID_nem_sfi_connect_to_root(const char *business_card, MPIDI_VC_t * new_vc)
     VC_SFI(new_vc)->ready = 1;
     str_errno = MPIU_Str_add_int_arg(&bc, &my_bc_len, "tag", new_vc->port_name_tag);
     MPIU_ERR_CHKANDJUMP(str_errno, mpi_errno, MPI_ERR_OTHER, "**argstr_port_name_tag");
-    MPI_RC(MPID_nem_sfi_get_business_card(MPIR_Process.comm_world->rank, &bc, &my_bc_len));
+    MPI_RC(MPID_nem_ofi_get_business_card(MPIR_Process.comm_world->rank, &bc, &my_bc_len));
     my_bc_len = SFI_KVSAPPSTRLEN - my_bc_len;
 
-    MPID_nem_sfi_create_req(&sreq, 1);
+    MPID_nem_ofi_create_req(&sreq, 1);
     sreq->kind = MPID_REQUEST_SEND;
     sreq->dev.OnDataAvail = NULL;
     sreq->dev.next = NULL;
-    REQ_SFI(sreq)->event_callback = MPID_nem_sfi_connect_to_root_callback;
+    REQ_SFI(sreq)->event_callback = MPID_nem_ofi_connect_to_root_callback;
     REQ_SFI(sreq)->pack_buffer = my_bc;
     conn_req_send_bits = init_sendtag(0, MPIR_Process.comm_world->rank, 0, MPID_CONN_REQ);
     FI_RC(fi_tsend(gl_data.endpoint,
@@ -542,8 +542,8 @@ int MPID_nem_sfi_connect_to_root(const char *business_card, MPIDI_VC_t * new_vc)
                      my_bc_len,
                      gl_data.mr,
                      VC_SFI(new_vc)->direct_addr,
-                     conn_req_send_bits, &(REQ_SFI(sreq)->sfi_context)), tsend);
-    MPID_nem_sfi_poll(MPID_NONBLOCKING_POLL);
+                     conn_req_send_bits, &(REQ_SFI(sreq)->ofi_context)), tsend);
+    MPID_nem_ofi_poll(MPID_NONBLOCKING_POLL);
     VC_SFI(new_vc)->is_cmvc = 1;
     VC_SFI(new_vc)->next = gl_data.cm_vcs;
     gl_data.cm_vcs = new_vc;
@@ -559,8 +559,8 @@ int MPID_nem_sfi_connect_to_root(const char *business_card, MPIDI_VC_t * new_vc)
 }
 
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_get_business_card)
-int MPID_nem_sfi_get_business_card(int my_rank ATTRIBUTE((unused)),
+#define FCNAME DECL_FUNC(MPID_nem_ofi_get_business_card)
+int MPID_nem_ofi_get_business_card(int my_rank ATTRIBUTE((unused)),
                                    char **bc_val_p, int *val_max_sz_p)
 {
     int mpi_errno = MPI_SUCCESS, str_errno = MPIU_STR_SUCCESS;
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_data.c b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_data.c
similarity index 51%
rename from src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_data.c
rename to src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_data.c
index 1e39684..d89270a 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_data.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_data.c
@@ -7,23 +7,23 @@
  *  to Argonne National Laboratory subject to Software Grant and Corporate
  *  Contributor License Agreement dated February 8, 2012.
  */
-#include "sfi_impl.h"
+#include "ofi_impl.h"
 
 
-MPID_nem_sfi_global_t gl_data;
+MPID_nem_ofi_global_t gl_data;
 
 /* ************************************************************************** */
 /* Netmod Function Table                                                      */
 /* ************************************************************************** */
 MPIDI_Comm_ops_t _g_comm_ops = {
-    MPID_nem_sfi_recv_posted,   /* recv_posted */
+    MPID_nem_ofi_recv_posted,   /* recv_posted */
 
-    MPID_nem_sfi_send,  /* send */
-    MPID_nem_sfi_send,  /* rsend */
-    MPID_nem_sfi_ssend, /* ssend */
-    MPID_nem_sfi_isend, /* isend */
-    MPID_nem_sfi_isend, /* irsend */
-    MPID_nem_sfi_issend,        /* issend */
+    MPID_nem_ofi_send,  /* send */
+    MPID_nem_ofi_send,  /* rsend */
+    MPID_nem_ofi_ssend, /* ssend */
+    MPID_nem_ofi_isend, /* isend */
+    MPID_nem_ofi_isend, /* irsend */
+    MPID_nem_ofi_issend,        /* issend */
 
     NULL,       /* send_init */
     NULL,       /* bsend_init */
@@ -31,28 +31,28 @@ MPIDI_Comm_ops_t _g_comm_ops = {
     NULL,       /* ssend_init */
     NULL,       /* startall */
 
-    MPID_nem_sfi_cancel_send,   /* cancel_send */
-    MPID_nem_sfi_cancel_recv,   /* cancel_recv */
+    MPID_nem_ofi_cancel_send,   /* cancel_send */
+    MPID_nem_ofi_cancel_recv,   /* cancel_recv */
 
     NULL,       /* probe */
-    MPID_nem_sfi_iprobe,        /* iprobe */
-    MPID_nem_sfi_improbe        /* improbe */
+    MPID_nem_ofi_iprobe,        /* iprobe */
+    MPID_nem_ofi_improbe        /* improbe */
 };
 
-MPID_nem_netmod_funcs_t MPIDI_nem_sfi_funcs = {
-    MPID_nem_sfi_init,
-    MPID_nem_sfi_finalize,
+MPID_nem_netmod_funcs_t MPIDI_nem_ofi_funcs = {
+    MPID_nem_ofi_init,
+    MPID_nem_ofi_finalize,
 #ifdef ENABLE_CHECKPOINTING
     NULL,
     NULL,
     NULL,
 #endif
-    MPID_nem_sfi_poll,
-    MPID_nem_sfi_get_business_card,
-    MPID_nem_sfi_connect_to_root,
-    MPID_nem_sfi_vc_init,
-    MPID_nem_sfi_vc_destroy,
-    MPID_nem_sfi_vc_terminate,
-    MPID_nem_sfi_anysource_iprobe,
-    MPID_nem_sfi_anysource_improbe,
+    MPID_nem_ofi_poll,
+    MPID_nem_ofi_get_business_card,
+    MPID_nem_ofi_connect_to_root,
+    MPID_nem_ofi_vc_init,
+    MPID_nem_ofi_vc_destroy,
+    MPID_nem_ofi_vc_terminate,
+    MPID_nem_ofi_anysource_iprobe,
+    MPID_nem_ofi_anysource_improbe,
 };
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_impl.h b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_impl.h
similarity index 83%
rename from src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_impl.h
rename to src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_impl.h
index d720170..553a5b7 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_impl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_impl.h
@@ -54,7 +54,7 @@ typedef struct {
     MPID_Request *persistent_req;       /* Unexpected request queue    */
     MPID_Request *conn_req;     /* Connection request          */
     MPIDI_Comm_ops_t comm_ops;
-} MPID_nem_sfi_global_t;
+} MPID_nem_ofi_global_t;
 
 /* ******************************** */
 /* Device channel specific data     */
@@ -65,15 +65,15 @@ typedef struct {
     int ready;                  /* VC ready state     */
     int is_cmvc;                /* Cleanup VC         */
     MPIDI_VC_t *next;           /* VC queue           */
-} MPID_nem_sfi_vc_t;
-#define VC_SFI(vc) ((MPID_nem_sfi_vc_t *)vc->ch.netmod_area.padding)
+} MPID_nem_ofi_vc_t;
+#define VC_SFI(vc) ((MPID_nem_ofi_vc_t *)vc->ch.netmod_area.padding)
 
 /* ******************************** */
 /* Per request object data          */
 /* SFI/Netmod specific              */
 /* ******************************** */
 typedef struct {
-    context_t sfi_context;      /* Context Object              */
+    context_t ofi_context;      /* Context Object              */
     void *addr;                 /* SFI Address                 */
     event_callback_fn event_callback;   /* Callback Event              */
     char *pack_buffer;          /* MPI Pack Buffer             */
@@ -83,8 +83,8 @@ typedef struct {
     MPIDI_VC_t *vc;             /* VC paired with this request */
     uint64_t tag;               /* 64 bit tag request          */
     MPID_Request *parent;       /* Parent request              */
-} MPID_nem_sfi_req_t;
-#define REQ_SFI(req) ((MPID_nem_sfi_req_t *)((req)->ch.netmod_area.padding))
+} MPID_nem_ofi_req_t;
+#define REQ_SFI(req) ((MPID_nem_ofi_req_t *)((req)->ch.netmod_area.padding))
 
 /* ******************************** */
 /* Logging and function macros      */
@@ -119,8 +119,8 @@ fn_fail:                      \
       MPIU_ERR_##CHKANDJUMP4(_ret<0,                            \
                            mpi_errno,                           \
                            MPI_ERR_OTHER,                       \
-                           "**sfi_"#STR,                        \
-                           "**sfi_"#STR" %s %d %s %s",          \
+                           "**ofi_"#STR,                        \
+                           "**ofi_"#STR" %s %d %s %s",          \
                            __SHORT_FILE__,                      \
                            __LINE__,                            \
                            FCNAME,                              \
@@ -134,8 +134,8 @@ fn_fail:                      \
       MPIU_ERR_##CHKANDJUMP4(pmi_errno!=PMI_SUCCESS,            \
                            mpi_errno,                           \
                            MPI_ERR_OTHER,                       \
-                           "**sfi_"#STR,                        \
-                           "**sfi_"#STR" %s %d %s %s",          \
+                           "**ofi_"#STR,                        \
+                           "**ofi_"#STR" %s %d %s %s",          \
                            __SHORT_FILE__,                      \
                            __LINE__,                            \
                            FCNAME,                              \
@@ -152,7 +152,7 @@ fn_fail:                      \
 #define VC_READY_CHECK(vc)                      \
 ({                                              \
   if (1 != VC_SFI(vc)->ready) {                 \
-    MPI_RC(MPID_nem_sfi_vc_connect(vc));        \
+    MPI_RC(MPID_nem_ofi_vc_connect(vc));        \
   }                                             \
 })
 
@@ -202,19 +202,19 @@ fn_fail:                      \
 /* ******************************** */
 /* Request manipulation inlines     */
 /* ******************************** */
-static inline void MPID_nem_sfi_init_req(MPID_Request * req)
+static inline void MPID_nem_ofi_init_req(MPID_Request * req)
 {
-    memset(REQ_SFI(req), 0, sizeof(MPID_nem_sfi_req_t));
+    memset(REQ_SFI(req), 0, sizeof(MPID_nem_ofi_req_t));
 }
 
-static inline int MPID_nem_sfi_create_req(MPID_Request ** request, int refcnt)
+static inline int MPID_nem_ofi_create_req(MPID_Request ** request, int refcnt)
 {
     int mpi_errno = MPI_SUCCESS;
     MPID_Request *req;
     req = MPID_Request_create();
     MPIU_Assert(req);
     MPIU_Object_set_ref(req, refcnt);
-    MPID_nem_sfi_init_req(req);
+    MPID_nem_ofi_init_req(req);
     *request = req;
     return mpi_errno;
 }
@@ -285,38 +285,38 @@ static inline int get_port(uint64_t match_bits)
 /* ************************************************************************** */
 /* MPICH Comm Override and Netmod functions                                   */
 /* ************************************************************************** */
-int MPID_nem_sfi_recv_posted(struct MPIDI_VC *vc, struct MPID_Request *req);
-int MPID_nem_sfi_send(struct MPIDI_VC *vc, const void *buf, int count,
+int MPID_nem_ofi_recv_posted(struct MPIDI_VC *vc, struct MPID_Request *req);
+int MPID_nem_ofi_send(struct MPIDI_VC *vc, const void *buf, int count,
                       MPI_Datatype datatype, int dest, int tag, MPID_Comm * comm,
                       int context_offset, struct MPID_Request **request);
-int MPID_nem_sfi_isend(struct MPIDI_VC *vc, const void *buf, int count,
+int MPID_nem_ofi_isend(struct MPIDI_VC *vc, const void *buf, int count,
                        MPI_Datatype datatype, int dest, int tag, MPID_Comm * comm,
                        int context_offset, struct MPID_Request **request);
-int MPID_nem_sfi_ssend(struct MPIDI_VC *vc, const void *buf, int count,
+int MPID_nem_ofi_ssend(struct MPIDI_VC *vc, const void *buf, int count,
                        MPI_Datatype datatype, int dest, int tag, MPID_Comm * comm,
                        int context_offset, struct MPID_Request **request);
-int MPID_nem_sfi_issend(struct MPIDI_VC *vc, const void *buf, int count,
+int MPID_nem_ofi_issend(struct MPIDI_VC *vc, const void *buf, int count,
                         MPI_Datatype datatype, int dest, int tag, MPID_Comm * comm,
                         int context_offset, struct MPID_Request **request);
-int MPID_nem_sfi_cancel_send(struct MPIDI_VC *vc, struct MPID_Request *sreq);
-int MPID_nem_sfi_cancel_recv(struct MPIDI_VC *vc, struct MPID_Request *rreq);
-int MPID_nem_sfi_iprobe(struct MPIDI_VC *vc, int source, int tag, MPID_Comm * comm,
+int MPID_nem_ofi_cancel_send(struct MPIDI_VC *vc, struct MPID_Request *sreq);
+int MPID_nem_ofi_cancel_recv(struct MPIDI_VC *vc, struct MPID_Request *rreq);
+int MPID_nem_ofi_iprobe(struct MPIDI_VC *vc, int source, int tag, MPID_Comm * comm,
                         int context_offset, int *flag, MPI_Status * status);
-int MPID_nem_sfi_improbe(struct MPIDI_VC *vc, int source, int tag, MPID_Comm * comm,
+int MPID_nem_ofi_improbe(struct MPIDI_VC *vc, int source, int tag, MPID_Comm * comm,
                          int context_offset, int *flag, MPID_Request ** message,
                          MPI_Status * status);
-int MPID_nem_sfi_anysource_iprobe(int tag, MPID_Comm * comm, int context_offset,
+int MPID_nem_ofi_anysource_iprobe(int tag, MPID_Comm * comm, int context_offset,
                                   int *flag, MPI_Status * status);
-int MPID_nem_sfi_anysource_improbe(int tag, MPID_Comm * comm, int context_offset,
+int MPID_nem_ofi_anysource_improbe(int tag, MPID_Comm * comm, int context_offset,
                                    int *flag, MPID_Request ** message, MPI_Status * status);
-void MPID_nem_sfi_anysource_posted(MPID_Request * rreq);
-int MPID_nem_sfi_anysource_matched(MPID_Request * rreq);
-int MPID_nem_sfi_send_data(cq_tagged_entry_t * wc, MPID_Request * sreq);
-int MPID_nem_sfi_SendNoncontig(MPIDI_VC_t * vc, MPID_Request * sreq,
+void MPID_nem_ofi_anysource_posted(MPID_Request * rreq);
+int MPID_nem_ofi_anysource_matched(MPID_Request * rreq);
+int MPID_nem_ofi_send_data(cq_tagged_entry_t * wc, MPID_Request * sreq);
+int MPID_nem_ofi_SendNoncontig(MPIDI_VC_t * vc, MPID_Request * sreq,
                                void *hdr, MPIDI_msg_sz_t hdr_sz);
-int MPID_nem_sfi_iStartContigMsg(MPIDI_VC_t * vc, void *hdr, MPIDI_msg_sz_t hdr_sz,
+int MPID_nem_ofi_iStartContigMsg(MPIDI_VC_t * vc, void *hdr, MPIDI_msg_sz_t hdr_sz,
                                  void *data, MPIDI_msg_sz_t data_sz, MPID_Request ** sreq_ptr);
-int MPID_nem_sfi_iSendContig(MPIDI_VC_t * vc, MPID_Request * sreq, void *hdr,
+int MPID_nem_ofi_iSendContig(MPIDI_VC_t * vc, MPID_Request * sreq, void *hdr,
                              MPIDI_msg_sz_t hdr_sz, void *data, MPIDI_msg_sz_t data_sz);
 
 /* ************************************************************************** */
@@ -324,19 +324,19 @@ int MPID_nem_sfi_iSendContig(MPIDI_VC_t * vc, MPID_Request * sreq, void *hdr,
 /* ************************************************************************** */
 #define MPID_NONBLOCKING_POLL 0
 #define MPID_BLOCKING_POLL 1
-int MPID_nem_sfi_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_max_sz_p);
-int MPID_nem_sfi_finalize(void);
-int MPID_nem_sfi_vc_init(MPIDI_VC_t * vc);
-int MPID_nem_sfi_get_business_card(int my_rank, char **bc_val_p, int *val_max_sz_p);
-int MPID_nem_sfi_poll(int in_blocking_poll);
-int MPID_nem_sfi_vc_terminate(MPIDI_VC_t * vc);
-int MPID_nem_sfi_vc_connect(MPIDI_VC_t * vc);
-int MPID_nem_sfi_connect_to_root(const char *business_card, MPIDI_VC_t * new_vc);
-int MPID_nem_sfi_vc_destroy(MPIDI_VC_t * vc);
-int MPID_nem_sfi_cm_init(MPIDI_PG_t * pg_p, int pg_rank);
-int MPID_nem_sfi_cm_finalize();
+int MPID_nem_ofi_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_max_sz_p);
+int MPID_nem_ofi_finalize(void);
+int MPID_nem_ofi_vc_init(MPIDI_VC_t * vc);
+int MPID_nem_ofi_get_business_card(int my_rank, char **bc_val_p, int *val_max_sz_p);
+int MPID_nem_ofi_poll(int in_blocking_poll);
+int MPID_nem_ofi_vc_terminate(MPIDI_VC_t * vc);
+int MPID_nem_ofi_vc_connect(MPIDI_VC_t * vc);
+int MPID_nem_ofi_connect_to_root(const char *business_card, MPIDI_VC_t * new_vc);
+int MPID_nem_ofi_vc_destroy(MPIDI_VC_t * vc);
+int MPID_nem_ofi_cm_init(MPIDI_PG_t * pg_p, int pg_rank);
+int MPID_nem_ofi_cm_finalize();
 
-extern MPID_nem_sfi_global_t gl_data;
+extern MPID_nem_ofi_global_t gl_data;
 extern MPIDI_Comm_ops_t _g_comm_ops;
 
 #endif
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_init.c b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_init.c
similarity index 92%
rename from src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_init.c
rename to src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_init.c
index 586e522..cd9e7c7 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_init.c
@@ -7,14 +7,14 @@
  *  to Argonne National Laboratory subject to Software Grant and Corporate
  *  Contributor License Agreement dated February 8, 2012.
  */
-#include "sfi_impl.h"
+#include "ofi_impl.h"
 
 static inline int dump_and_choose_providers(info_t * prov, info_t ** prov_use);
 static inline int compile_time_checking();
 
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_init)
-int MPID_nem_sfi_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_max_sz_p)
+#define FCNAME DECL_FUNC(MPID_nem_ofi_init)
+int MPID_nem_ofi_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_max_sz_p)
 {
     int ret, fi_version, i, len, pmi_errno;
     int mpi_errno = MPI_SUCCESS;
@@ -92,7 +92,7 @@ int MPID_nem_sfi_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_
                      &prov_tagged),     /* Out: List of providers that match hints   */
           getinfo);
     MPIU_ERR_CHKANDJUMP4(prov_tagged == NULL, mpi_errno, MPI_ERR_OTHER,
-                         "**sfi_getinfo", "**sfi_getinfo %s %d %s %s",
+                         "**ofi_getinfo", "**ofi_getinfo %s %d %s %s",
                          __SHORT_FILE__, __LINE__, FCNAME, "No tag matching provider found");
     /* ------------------------------------------------------------------------ */
     /* Open fabric                                                              */
@@ -197,7 +197,7 @@ int MPID_nem_sfi_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_
     /* Get our business card            */
     /* -------------------------------- */
     my_bc = *bc_val_p;
-    MPI_RC(MPID_nem_sfi_get_business_card(pg_rank, bc_val_p, val_max_sz_p));
+    MPI_RC(MPID_nem_ofi_get_business_card(pg_rank, bc_val_p, val_max_sz_p));
 
     /* -------------------------------- */
     /* Publish the business card        */
@@ -273,7 +273,7 @@ int MPID_nem_sfi_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_
     /* required, like connection management and      */
     /* startcontig messages                          */
     /* --------------------------------------------- */
-    MPI_RC(MPID_nem_sfi_cm_init(pg_p, pg_rank));
+    MPI_RC(MPID_nem_ofi_cm_init(pg_p, pg_rank));
   fn_exit:
     if (fi_addrs)
         MPIU_Free(fi_addrs);
@@ -285,8 +285,8 @@ int MPID_nem_sfi_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_
 }
 
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_finalize)
-int MPID_nem_sfi_finalize(void)
+#define FCNAME DECL_FUNC(MPID_nem_ofi_finalize)
+int MPID_nem_ofi_finalize(void)
 {
     int mpi_errno = MPI_SUCCESS;
     int ret = 0;
@@ -303,7 +303,7 @@ int MPID_nem_sfi_finalize(void)
     /* Cancels any persistent/global requests and    */
     /* frees any resources from cm_init()            */
     /* --------------------------------------------- */
-    MPI_RC(MPID_nem_sfi_cm_finalize());
+    MPI_RC(MPID_nem_ofi_cm_finalize());
 
     FI_RC(fi_close((fid_t) gl_data.mr), mrclose);
     FI_RC(fi_close((fid_t) gl_data.av), avclose);
@@ -316,8 +316,8 @@ int MPID_nem_sfi_finalize(void)
 
 static inline int compile_time_checking()
 {
-    SFI_COMPILE_TIME_ASSERT(sizeof(MPID_nem_sfi_vc_t) <= MPID_NEM_VC_NETMOD_AREA_LEN);
-    SFI_COMPILE_TIME_ASSERT(sizeof(MPID_nem_sfi_req_t) <= MPID_NEM_REQ_NETMOD_AREA_LEN);
+    SFI_COMPILE_TIME_ASSERT(sizeof(MPID_nem_ofi_vc_t) <= MPID_NEM_VC_NETMOD_AREA_LEN);
+    SFI_COMPILE_TIME_ASSERT(sizeof(MPID_nem_ofi_req_t) <= MPID_NEM_REQ_NETMOD_AREA_LEN);
     SFI_COMPILE_TIME_ASSERT(sizeof(iovec_t) == sizeof(MPID_IOV));
     MPIU_Assert(((void *) &(((iovec_t *) 0)->iov_base)) ==
                 ((void *) &(((MPID_IOV *) 0)->MPID_IOV_BUF)));
@@ -333,28 +333,28 @@ static inline int compile_time_checking()
     /* likely needs a MPIU_ERR_REGISTER macro                                   */
     /* ------------------------------------------------------------------------ */
 #if 0
-    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_avmap", "**sfi_avmap %s %d %s %s", a, b, a, a);
-    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_tsend", "**sfi_tsend %s %d %s %s", a, b, a, a);
-    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_trecv", "**sfi_trecv %s %d %s %s", a, b, a, a);
-    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_getinfo", "**sfi_getinfo %s %d %s %s", a, b, a, a);
-    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_openep", "**sfi_openep %s %d %s %s", a, b, a, a);
-    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_openfabric", "**sfi_openfabric %s %d %s %s", a, b, a, a);
-    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_opendomain", "**sfi_opendomain %s %d %s %s", a, b, a, a);
-    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_opencq", "**sfi_opencq %s %d %s %s", a, b, a, a);
-    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_avopen", "**sfi_avopen %s %d %s %s", a, b, a, a);
-    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_bind", "**sfi_bind %s %d %s %s", a, b, a, a);
-    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_ep_enable", "**sfi_ep_enable %s %d %s %s", a, b, a, a);
-    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_getname", "**sfi_getname %s %d %s %s", a, b, a, a);
-    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_avclose", "**sfi_avclose %s %d %s %s", a, b, a, a);
-    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_epclose", "**sfi_epclose %s %d %s %s", a, b, a, a);
-    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_cqclose", "**sfi_cqclose %s %d %s %s", a, b, a, a);
-    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_fabricclose", "**sfi_fabricclose %s %d %s %s", a, b, a,
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**ofi_avmap", "**ofi_avmap %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**ofi_tsend", "**ofi_tsend %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**ofi_trecv", "**ofi_trecv %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**ofi_getinfo", "**ofi_getinfo %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**ofi_openep", "**ofi_openep %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**ofi_openfabric", "**ofi_openfabric %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**ofi_opendomain", "**ofi_opendomain %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**ofi_opencq", "**ofi_opencq %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**ofi_avopen", "**ofi_avopen %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**ofi_bind", "**ofi_bind %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**ofi_ep_enable", "**ofi_ep_enable %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**ofi_getname", "**ofi_getname %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**ofi_avclose", "**ofi_avclose %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**ofi_epclose", "**ofi_epclose %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**ofi_cqclose", "**ofi_cqclose %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**ofi_fabricclose", "**ofi_fabricclose %s %d %s %s", a, b, a,
                   a);
-    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_domainclose", "**sfi_domainclose %s %d %s %s", a, b, a,
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**ofi_domainclose", "**ofi_domainclose %s %d %s %s", a, b, a,
                   a);
-    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_tsearch", "**sfi_tsearch %s %d %s %s", a, b, a, a);
-    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_poll", "**sfi_poll %s %d %s %s", a, b, a, a);
-    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_cancel", "**sfi_cancel %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**ofi_tsearch", "**ofi_tsearch %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**ofi_poll", "**ofi_poll %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**ofi_cancel", "**ofi_cancel %s %d %s %s", a, b, a, a);
 #endif
     return 0;
 }
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_msg.c b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_msg.c
similarity index 90%
rename from src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_msg.c
rename to src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_msg.c
index c202997..ffa3761 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_msg.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_msg.c
@@ -7,7 +7,7 @@
  *  to Argonne National Laboratory subject to Software Grant and Corporate
  *  Contributor License Agreement dated February 8, 2012.
  */
-#include "sfi_impl.h"
+#include "ofi_impl.h"
 
 /* ------------------------------------------------------------------------ */
 /* GET_PGID_AND_SET_MATCH macro looks up the process group to find the      */
@@ -44,7 +44,7 @@
 /* is based on a tagged rendezvous message.                                 */
 /* The rendezvous is implemented with an RTS-CTS-Data send protocol:        */
 /* CTS_POST()   |                                  |                        */
-/* RTS_SEND()   | -------------------------------> | ue_callback()(sfi_cm.c)*/
+/* RTS_SEND()   | -------------------------------> | ue_callback()(ofi_cm.c)*/
 /*              |                                  |   pack_buffer()        */
 /*              |                                  |   DATA_POST()          */
 /*              |                                  |   RTS_POST()           */
@@ -61,16 +61,16 @@
     c = 1;                                                              \
     MPID_cc_incr(sreq->cc_ptr, &c);                                     \
     MPID_cc_incr(sreq->cc_ptr, &c);                                     \
-    REQ_SFI(sreq)->event_callback   = MPID_nem_sfi_data_callback;       \
+    REQ_SFI(sreq)->event_callback   = MPID_nem_ofi_data_callback;       \
     REQ_SFI(sreq)->pack_buffer      = pack_buffer;                      \
     REQ_SFI(sreq)->pack_buffer_size = pkt_len;                          \
     REQ_SFI(sreq)->vc               = vc;                               \
     REQ_SFI(sreq)->tag              = match_bits;                       \
                                                                         \
-    MPID_nem_sfi_create_req(&cts_req, 1);                               \
+    MPID_nem_ofi_create_req(&cts_req, 1);                               \
     cts_req->dev.OnDataAvail         = NULL;                            \
     cts_req->dev.next                = NULL;                            \
-    REQ_SFI(cts_req)->event_callback = MPID_nem_sfi_cts_recv_callback;  \
+    REQ_SFI(cts_req)->event_callback = MPID_nem_ofi_cts_recv_callback;  \
     REQ_SFI(cts_req)->parent         = sreq;                            \
                                                                         \
     FI_RC(fi_trecv(gl_data.endpoint,                                \
@@ -80,14 +80,14 @@
                        VC_SFI(vc)->direct_addr,                         \
                        match_bits | MPID_MSG_CTS,                       \
                        0, /* Exact tag match, no ignore bits */         \
-                       &(REQ_SFI(cts_req)->sfi_context)),trecv);    \
+                       &(REQ_SFI(cts_req)->ofi_context)),trecv);    \
     FI_RC(fi_tsend(gl_data.endpoint,                                  \
                      &REQ_SFI(sreq)->pack_buffer_size,                  \
                      sizeof(REQ_SFI(sreq)->pack_buffer_size),           \
                      gl_data.mr,                                        \
                      VC_SFI(vc)->direct_addr,                           \
                      match_bits,                                        \
-                     &(REQ_SFI(sreq)->sfi_context)),tsend);           \
+                     &(REQ_SFI(sreq)->ofi_context)),tsend);           \
   })
 
 
@@ -97,8 +97,8 @@
 /* bulk data transfer.  On data send completion, the request can be freed   */
 /* ------------------------------------------------------------------------ */
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_data_callback)
-static int MPID_nem_sfi_data_callback(cq_tagged_entry_t * wc, MPID_Request * sreq)
+#define FCNAME DECL_FUNC(MPID_nem_ofi_data_callback)
+static int MPID_nem_ofi_data_callback(cq_tagged_entry_t * wc, MPID_Request * sreq)
 {
     int complete = 0, mpi_errno = MPI_SUCCESS;
     MPIDI_VC_t *vc;
@@ -113,7 +113,7 @@ static int MPID_nem_sfi_data_callback(cq_tagged_entry_t * wc, MPID_Request * sre
                          REQ_SFI(sreq)->pack_buffer_size,
                          gl_data.mr,
                          VC_SFI(vc)->direct_addr,
-                         wc->tag | MPID_MSG_DATA, (void *) &(REQ_SFI(sreq)->sfi_context)), tsend);
+                         wc->tag | MPID_MSG_DATA, (void *) &(REQ_SFI(sreq)->ofi_context)), tsend);
     }
     if (sreq->cc == 1) {
         if (REQ_SFI(sreq)->pack_buffer)
@@ -135,16 +135,16 @@ static int MPID_nem_sfi_data_callback(cq_tagged_entry_t * wc, MPID_Request * sre
 }
 
 /* ------------------------------------------------------------------------ */
-/* Signals the CTS has been received.  Call MPID_nem_sfi_data_callback on   */
+/* Signals the CTS has been received.  Call MPID_nem_ofi_data_callback on   */
 /* the parent send request to kick off the bulk data transfer               */
 /* ------------------------------------------------------------------------ */
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_cts_recv_callback)
-static int MPID_nem_sfi_cts_recv_callback(cq_tagged_entry_t * wc, MPID_Request * rreq)
+#define FCNAME DECL_FUNC(MPID_nem_ofi_cts_recv_callback)
+static int MPID_nem_ofi_cts_recv_callback(cq_tagged_entry_t * wc, MPID_Request * rreq)
 {
     int mpi_errno = MPI_SUCCESS;
     BEGIN_FUNC(FCNAME);
-    MPI_RC(MPID_nem_sfi_data_callback(wc, REQ_SFI(rreq)->parent));
+    MPI_RC(MPID_nem_ofi_data_callback(wc, REQ_SFI(rreq)->parent));
     MPIDI_CH3U_Request_complete(rreq);
     END_FUNC_RC(FCNAME);
 }
@@ -158,8 +158,8 @@ static int MPID_nem_sfi_cts_recv_callback(cq_tagged_entry_t * wc, MPID_Request *
 /* functions over a tagged msg interface                                    */
 /* ------------------------------------------------------------------------ */
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_iSendContig)
-int MPID_nem_sfi_iSendContig(MPIDI_VC_t * vc,
+#define FCNAME DECL_FUNC(MPID_nem_ofi_iSendContig)
+int MPID_nem_ofi_iSendContig(MPIDI_VC_t * vc,
                              MPID_Request * sreq,
                              void *hdr, MPIDI_msg_sz_t hdr_sz, void *data, MPIDI_msg_sz_t data_sz)
 {
@@ -170,7 +170,7 @@ int MPID_nem_sfi_iSendContig(MPIDI_VC_t * vc,
 
     BEGIN_FUNC(FCNAME);
     MPIU_Assert(hdr_sz <= (MPIDI_msg_sz_t) sizeof(MPIDI_CH3_Pkt_t));
-    MPID_nem_sfi_init_req(sreq);
+    MPID_nem_ofi_init_req(sreq);
     pkt_len = sizeof(MPIDI_CH3_Pkt_t) + data_sz;
     pack_buffer = MPIU_Malloc(pkt_len);
     MPIU_Assert(pack_buffer);
@@ -181,8 +181,8 @@ int MPID_nem_sfi_iSendContig(MPIDI_VC_t * vc,
 }
 
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_SendNoncontig)
-int MPID_nem_sfi_SendNoncontig(MPIDI_VC_t * vc,
+#define FCNAME DECL_FUNC(MPID_nem_ofi_SendNoncontig)
+int MPID_nem_ofi_SendNoncontig(MPIDI_VC_t * vc,
                                MPID_Request * sreq, void *hdr, MPIDI_msg_sz_t hdr_sz)
 {
     int c, pgid, pkt_len, mpi_errno = MPI_SUCCESS;
@@ -202,13 +202,13 @@ int MPID_nem_sfi_SendNoncontig(MPIDI_VC_t * vc,
     MPIU_Memcpy(pack_buffer, hdr, hdr_sz);
     MPID_Segment_pack(sreq->dev.segment_ptr, 0, &data_sz, pack_buffer + sizeof(MPIDI_CH3_Pkt_t));
     START_COMM();
-    MPID_nem_sfi_poll(MPID_NONBLOCKING_POLL);
+    MPID_nem_ofi_poll(MPID_NONBLOCKING_POLL);
     END_FUNC_RC(FCNAME);
 }
 
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_iStartContigMsg)
-int MPID_nem_sfi_iStartContigMsg(MPIDI_VC_t * vc,
+#define FCNAME DECL_FUNC(MPID_nem_ofi_iStartContigMsg)
+int MPID_nem_ofi_iStartContigMsg(MPIDI_VC_t * vc,
                                  void *hdr,
                                  MPIDI_msg_sz_t hdr_sz,
                                  void *data, MPIDI_msg_sz_t data_sz, MPID_Request ** sreq_ptr)
@@ -221,7 +221,7 @@ int MPID_nem_sfi_iStartContigMsg(MPIDI_VC_t * vc,
     BEGIN_FUNC(FCNAME);
     MPIU_Assert(hdr_sz <= (MPIDI_msg_sz_t) sizeof(MPIDI_CH3_Pkt_t));
 
-    MPID_nem_sfi_create_req(&sreq, 2);
+    MPID_nem_ofi_create_req(&sreq, 2);
     sreq->kind = MPID_REQUEST_SEND;
     sreq->dev.OnDataAvail = NULL;
     sreq->dev.next = NULL;
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_progress.c b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_progress.c
similarity index 89%
rename from src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_progress.c
rename to src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_progress.c
index fdb1bbf..ebe1fa2 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_progress.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_progress.c
@@ -7,7 +7,7 @@
  *  to Argonne National Laboratory subject to Software Grant and Corporate
  *  Contributor License Agreement dated February 8, 2012.
  */
-#include "sfi_impl.h"
+#include "ofi_impl.h"
 
 #define TSEARCH_INIT      0
 #define TSEARCH_NOT_FOUND 1
@@ -16,9 +16,9 @@
 /* ------------------------------------------------------------------------ */
 /* This routine looks up the request that contains a context object         */
 /* ------------------------------------------------------------------------ */
-static inline MPID_Request *context_to_req(void *sfi_context)
+static inline MPID_Request *context_to_req(void *ofi_context)
 {
-    return (MPID_Request *) container_of(sfi_context, MPID_Request, ch.netmod_area.padding);
+    return (MPID_Request *) container_of(ofi_context, MPID_Request, ch.netmod_area.padding);
 }
 
 /* ------------------------------------------------------------------------ */
@@ -64,8 +64,8 @@ static int tsearch_callback(cq_tagged_entry_t * wc, MPID_Request * rreq)
 }
 
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_iprobe_impl)
-int MPID_nem_sfi_iprobe_impl(struct MPIDI_VC *vc,
+#define FCNAME DECL_FUNC(MPID_nem_ofi_iprobe_impl)
+int MPID_nem_ofi_iprobe_impl(struct MPIDI_VC *vc,
                              int source,
                              int tag,
                              MPID_Comm * comm,
@@ -110,7 +110,7 @@ int MPID_nem_sfi_iprobe_impl(struct MPIDI_VC *vc,
                      0, /* Flags                */
                      &remote_proc,      /* Remote Address       */
                      &len,      /* Out:  incoming msglen */
-                     &(REQ_SFI(rreq)->sfi_context));    /* Nonblocking context  */
+                     &(REQ_SFI(rreq)->ofi_context));    /* Nonblocking context  */
     if (ret == -FI_ENOMSG) {
         *flag = 0;
         goto fn_exit;
@@ -123,11 +123,11 @@ int MPID_nem_sfi_iprobe_impl(struct MPIDI_VC *vc,
     }
     else {
         MPIU_ERR_CHKANDJUMP4((ret < 0), mpi_errno, MPI_ERR_OTHER,
-                             "**sfi_tsearch", "**sfi_tsearch %s %d %s %s",
+                             "**ofi_tsearch", "**ofi_tsearch %s %d %s %s",
                              __SHORT_FILE__, __LINE__, FCNAME, fi_strerror(-ret));
     }
     while (TSEARCH_INIT == REQ_SFI(rreq)->match_state)
-        MPID_nem_sfi_poll(MPID_BLOCKING_POLL);
+        MPID_nem_ofi_poll(MPID_BLOCKING_POLL);
 
     if (REQ_SFI(rreq)->match_state == TSEARCH_NOT_FOUND) {
         if (rreq_ptr) {
@@ -144,22 +144,22 @@ int MPID_nem_sfi_iprobe_impl(struct MPIDI_VC *vc,
 }
 
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_iprobe)
-int MPID_nem_sfi_iprobe(struct MPIDI_VC *vc,
+#define FCNAME DECL_FUNC(MPID_nem_ofi_iprobe)
+int MPID_nem_ofi_iprobe(struct MPIDI_VC *vc,
                         int source,
                         int tag,
                         MPID_Comm * comm, int context_offset, int *flag, MPI_Status * status)
 {
     int rc;
     BEGIN_FUNC(FCNAME);
-    rc = MPID_nem_sfi_iprobe_impl(vc, source, tag, comm, context_offset, flag, status, NULL);
+    rc = MPID_nem_ofi_iprobe_impl(vc, source, tag, comm, context_offset, flag, status, NULL);
     END_FUNC(FCNAME);
     return rc;
 }
 
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_improbe)
-int MPID_nem_sfi_improbe(struct MPIDI_VC *vc,
+#define FCNAME DECL_FUNC(MPID_nem_ofi_improbe)
+int MPID_nem_ofi_improbe(struct MPIDI_VC *vc,
                          int source,
                          int tag,
                          MPID_Comm * comm,
@@ -169,7 +169,7 @@ int MPID_nem_sfi_improbe(struct MPIDI_VC *vc,
     int old_error = status->MPI_ERROR;
     int s;
     BEGIN_FUNC(FCNAME);
-    s = MPID_nem_sfi_iprobe_impl(vc, source, tag, comm, context_offset, flag, status, message);
+    s = MPID_nem_ofi_iprobe_impl(vc, source, tag, comm, context_offset, flag, status, message);
     if (flag && *flag) {
         status->MPI_ERROR = old_error;
         (*message)->kind = MPID_REQUEST_MPROBE;
@@ -179,36 +179,36 @@ int MPID_nem_sfi_improbe(struct MPIDI_VC *vc,
 }
 
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_anysource_iprobe)
-int MPID_nem_sfi_anysource_iprobe(int tag,
+#define FCNAME DECL_FUNC(MPID_nem_ofi_anysource_iprobe)
+int MPID_nem_ofi_anysource_iprobe(int tag,
                                   MPID_Comm * comm,
                                   int context_offset, int *flag, MPI_Status * status)
 {
     int rc;
     BEGIN_FUNC(FCNAME);
-    rc = MPID_nem_sfi_iprobe(NULL, MPI_ANY_SOURCE, tag, comm, context_offset, flag, status);
+    rc = MPID_nem_ofi_iprobe(NULL, MPI_ANY_SOURCE, tag, comm, context_offset, flag, status);
     END_FUNC(FCNAME);
     return rc;
 }
 
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_anysource_improbe)
-int MPID_nem_sfi_anysource_improbe(int tag,
+#define FCNAME DECL_FUNC(MPID_nem_ofi_anysource_improbe)
+int MPID_nem_ofi_anysource_improbe(int tag,
                                    MPID_Comm * comm,
                                    int context_offset,
                                    int *flag, MPID_Request ** message, MPI_Status * status)
 {
     int rc;
     BEGIN_FUNC(FCNAME);
-    rc = MPID_nem_sfi_improbe(NULL, MPI_ANY_SOURCE, tag, comm,
+    rc = MPID_nem_ofi_improbe(NULL, MPI_ANY_SOURCE, tag, comm,
                               context_offset, flag, message, status);
     END_FUNC(FCNAME);
     return rc;
 }
 
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_poll)
-int MPID_nem_sfi_poll(int in_blocking_poll)
+#define FCNAME DECL_FUNC(MPID_nem_ofi_poll)
+int MPID_nem_ofi_poll(int in_blocking_poll)
 {
     int complete = 0, mpi_errno = MPI_SUCCESS;
     ssize_t ret;
@@ -281,8 +281,8 @@ int MPID_nem_sfi_poll(int in_blocking_poll)
                 }
             }
             else {
-                MPIU_ERR_CHKANDJUMP4(1, mpi_errno, MPI_ERR_OTHER, "**sfi_poll",
-                                     "**sfi_poll %s %d %s %s", __SHORT_FILE__,
+                MPIU_ERR_CHKANDJUMP4(1, mpi_errno, MPI_ERR_OTHER, "**ofi_poll",
+                                     "**ofi_poll %s %d %s %s", __SHORT_FILE__,
                                      __LINE__, FCNAME, fi_strerror(-ret));
             }
         }
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_tagged.c b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_tagged.c
similarity index 86%
rename from src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_tagged.c
rename to src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_tagged.c
index 237c735..89affd2 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_tagged.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/ofi_tagged.c
@@ -7,7 +7,7 @@
  *  to Argonne National Laboratory subject to Software Grant and Corporate
  *  Contributor License Agreement dated February 8, 2012.
  */
-#include "sfi_impl.h"
+#include "ofi_impl.h"
 
 #define MPID_NORMAL_SEND 0
 
@@ -15,8 +15,8 @@
 /* Receive callback called after sending a syncronous send acknowledgement. */
 /* ------------------------------------------------------------------------ */
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_sync_recv_callback)
-static inline int MPID_nem_sfi_sync_recv_callback(cq_tagged_entry_t * wc ATTRIBUTE((unused)),
+#define FCNAME DECL_FUNC(MPID_nem_ofi_sync_recv_callback)
+static inline int MPID_nem_ofi_sync_recv_callback(cq_tagged_entry_t * wc ATTRIBUTE((unused)),
                                                   MPID_Request * rreq)
 {
     int mpi_errno = MPI_SUCCESS;
@@ -36,8 +36,8 @@ static inline int MPID_nem_sfi_sync_recv_callback(cq_tagged_entry_t * wc ATTRIBU
 /* Free any temporary/pack buffers and complete the send request            */
 /* ------------------------------------------------------------------------ */
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_send_callback)
-static inline int MPID_nem_sfi_send_callback(cq_tagged_entry_t * wc ATTRIBUTE((unused)),
+#define FCNAME DECL_FUNC(MPID_nem_ofi_send_callback)
+static inline int MPID_nem_ofi_send_callback(cq_tagged_entry_t * wc ATTRIBUTE((unused)),
                                              MPID_Request * sreq)
 {
     int mpi_errno = MPI_SUCCESS;
@@ -54,8 +54,8 @@ static inline int MPID_nem_sfi_send_callback(cq_tagged_entry_t * wc ATTRIBUTE((u
 /* Handle an incoming receive completion event                              */
 /* ------------------------------------------------------------------------ */
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_recv_callback)
-static inline int MPID_nem_sfi_recv_callback(cq_tagged_entry_t * wc, MPID_Request * rreq)
+#define FCNAME DECL_FUNC(MPID_nem_ofi_recv_callback)
+static inline int MPID_nem_ofi_recv_callback(cq_tagged_entry_t * wc, MPID_Request * rreq)
 {
     int err0, err1, src, mpi_errno = MPI_SUCCESS;
     uint64_t ssend_bits;
@@ -99,17 +99,17 @@ static inline int MPID_nem_sfi_recv_callback(cq_tagged_entry_t * wc, MPID_Reques
         }
         ssend_bits = init_sendtag(rreq->dev.match.parts.context_id,
                                   rreq->comm->rank, rreq->status.MPI_TAG, MPID_SYNC_SEND_ACK);
-        MPID_nem_sfi_create_req(&sync_req, 1);
+        MPID_nem_ofi_create_req(&sync_req, 1);
         sync_req->dev.OnDataAvail = NULL;
         sync_req->dev.next = NULL;
-        REQ_SFI(sync_req)->event_callback = MPID_nem_sfi_sync_recv_callback;
+        REQ_SFI(sync_req)->event_callback = MPID_nem_ofi_sync_recv_callback;
         REQ_SFI(sync_req)->parent = rreq;
         FI_RC(fi_tsend(gl_data.endpoint,
                          NULL,
                          0,
                          gl_data.mr,
                          VC_SFI(vc)->direct_addr,
-                         ssend_bits, &(REQ_SFI(sync_req)->sfi_context)), tsend);
+                         ssend_bits, &(REQ_SFI(sync_req)->ofi_context)), tsend);
     }
     else {
         /* ---------------------------------------------------- */
@@ -147,10 +147,10 @@ static inline int do_isend(struct MPIDI_VC *vc,
     /* ---------------------------------------------------- */
     /* Create the MPI request                               */
     /* ---------------------------------------------------- */
-    MPID_nem_sfi_create_req(&sreq, 2);
+    MPID_nem_ofi_create_req(&sreq, 2);
     sreq->kind = MPID_REQUEST_SEND;
     sreq->dev.OnDataAvail = NULL;
-    REQ_SFI(sreq)->event_callback = MPID_nem_sfi_send_callback;
+    REQ_SFI(sreq)->event_callback = MPID_nem_ofi_send_callback;
     REQ_SFI(sreq)->vc = vc;
 
     /* ---------------------------------------------------- */
@@ -178,10 +178,10 @@ static inline int do_isend(struct MPIDI_VC *vc,
         /* ---------------------------------------------------- */
         int c = 1;
         MPID_cc_incr(sreq->cc_ptr, &c);
-        MPID_nem_sfi_create_req(&sync_req, 1);
+        MPID_nem_ofi_create_req(&sync_req, 1);
         sync_req->dev.OnDataAvail = NULL;
         sync_req->dev.next = NULL;
-        REQ_SFI(sync_req)->event_callback = MPID_nem_sfi_sync_recv_callback;
+        REQ_SFI(sync_req)->event_callback = MPID_nem_ofi_sync_recv_callback;
         REQ_SFI(sync_req)->parent = sreq;
         ssend_match = init_recvtag(&ssend_mask, comm->context_id + context_offset, dest, tag);
         ssend_match |= MPID_SYNC_SEND_ACK;
@@ -192,7 +192,7 @@ static inline int do_isend(struct MPIDI_VC *vc,
                            VC_SFI(vc)->direct_addr,     /* remote proc */
                            ssend_match, /* match bits  */
                            0ULL,        /* mask bits   */
-                           &(REQ_SFI(sync_req)->sfi_context)), trecv);
+                           &(REQ_SFI(sync_req)->ofi_context)), trecv);
     }
     FI_RC(fi_tsend(gl_data.endpoint,  /* Endpoint                       */
                      send_buffer,       /* Send buffer(packed or user)    */
@@ -200,14 +200,14 @@ static inline int do_isend(struct MPIDI_VC *vc,
                      gl_data.mr,        /* Dynamic memory region          */
                      VC_SFI(vc)->direct_addr,   /* Use the address of this VC     */
                      match_bits,        /* Match bits                     */
-                     &(REQ_SFI(sreq)->sfi_context)), tsend);
+                     &(REQ_SFI(sreq)->ofi_context)), tsend);
     *request = sreq;
     END_FUNC_RC(FCNAME);
 }
 
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_recv_posted)
-int MPID_nem_sfi_recv_posted(struct MPIDI_VC *vc, struct MPID_Request *rreq)
+#define FCNAME DECL_FUNC(MPID_nem_ofi_recv_posted)
+int MPID_nem_ofi_recv_posted(struct MPIDI_VC *vc, struct MPID_Request *rreq)
 {
     int mpi_errno = MPI_SUCCESS, dt_contig, src, tag;
     uint64_t match_bits = 0, mask_bits = 0;
@@ -222,8 +222,8 @@ int MPID_nem_sfi_recv_posted(struct MPIDI_VC *vc, struct MPID_Request *rreq)
     /* ------------------------ */
     /* Initialize the request   */
     /* ------------------------ */
-    MPID_nem_sfi_init_req(rreq);
-    REQ_SFI(rreq)->event_callback = MPID_nem_sfi_recv_callback;
+    MPID_nem_ofi_init_req(rreq);
+    REQ_SFI(rreq)->event_callback = MPID_nem_ofi_recv_callback;
     REQ_SFI(rreq)->vc = vc;
 
     /* ---------------------------------------------------- */
@@ -255,14 +255,14 @@ int MPID_nem_sfi_recv_posted(struct MPIDI_VC *vc, struct MPID_Request *rreq)
                        data_sz,
                        gl_data.mr,
                        remote_proc,
-                       match_bits, mask_bits, &(REQ_SFI(rreq)->sfi_context)), trecv);
-    MPID_nem_sfi_poll(MPID_NONBLOCKING_POLL);
+                       match_bits, mask_bits, &(REQ_SFI(rreq)->ofi_context)), trecv);
+    MPID_nem_ofi_poll(MPID_NONBLOCKING_POLL);
     END_FUNC_RC(FCNAME);
 }
 
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_send)
-int MPID_nem_sfi_send(struct MPIDI_VC *vc,
+#define FCNAME DECL_FUNC(MPID_nem_ofi_send)
+int MPID_nem_ofi_send(struct MPIDI_VC *vc,
                       const void *buf,
                       int count,
                       MPI_Datatype datatype,
@@ -279,8 +279,8 @@ int MPID_nem_sfi_send(struct MPIDI_VC *vc,
 }
 
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_isend)
-int MPID_nem_sfi_isend(struct MPIDI_VC *vc,
+#define FCNAME DECL_FUNC(MPID_nem_ofi_isend)
+int MPID_nem_ofi_isend(struct MPIDI_VC *vc,
                        const void *buf,
                        int count,
                        MPI_Datatype datatype,
@@ -296,8 +296,8 @@ int MPID_nem_sfi_isend(struct MPIDI_VC *vc,
 }
 
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_ssend)
-int MPID_nem_sfi_ssend(struct MPIDI_VC *vc,
+#define FCNAME DECL_FUNC(MPID_nem_ofi_ssend)
+int MPID_nem_ofi_ssend(struct MPIDI_VC *vc,
                        const void *buf,
                        int count,
                        MPI_Datatype datatype,
@@ -313,8 +313,8 @@ int MPID_nem_sfi_ssend(struct MPIDI_VC *vc,
 }
 
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_issend)
-int MPID_nem_sfi_issend(struct MPIDI_VC *vc,
+#define FCNAME DECL_FUNC(MPID_nem_ofi_issend)
+int MPID_nem_ofi_issend(struct MPIDI_VC *vc,
                         const void *buf,
                         int count,
                         MPI_Datatype datatype,
@@ -335,9 +335,9 @@ int MPID_nem_sfi_issend(struct MPIDI_VC *vc,
   int mpi_errno = MPI_SUCCESS;                          \
   int ret;                                              \
   BEGIN_FUNC(FCNAME);                                   \
-  MPID_nem_sfi_poll(MPID_NONBLOCKING_POLL);             \
+  MPID_nem_ofi_poll(MPID_NONBLOCKING_POLL);             \
   ret = fi_cancel((fid_t)gl_data.endpoint,              \
-                  &(REQ_SFI(req)->sfi_context));        \
+                  &(REQ_SFI(req)->ofi_context));        \
   if (ret == 0) {                                        \
     MPIR_STATUS_SET_CANCEL_BIT(req->status, TRUE);      \
   } else {                                              \
@@ -348,33 +348,33 @@ int MPID_nem_sfi_issend(struct MPIDI_VC *vc,
 })
 
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_cancel_send)
-int MPID_nem_sfi_cancel_send(struct MPIDI_VC *vc ATTRIBUTE((unused)), struct MPID_Request *sreq)
+#define FCNAME DECL_FUNC(MPID_nem_ofi_cancel_send)
+int MPID_nem_ofi_cancel_send(struct MPIDI_VC *vc ATTRIBUTE((unused)), struct MPID_Request *sreq)
 {
     DO_CANCEL(sreq);
 }
 
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_cancel_recv)
-int MPID_nem_sfi_cancel_recv(struct MPIDI_VC *vc ATTRIBUTE((unused)), struct MPID_Request *rreq)
+#define FCNAME DECL_FUNC(MPID_nem_ofi_cancel_recv)
+int MPID_nem_ofi_cancel_recv(struct MPIDI_VC *vc ATTRIBUTE((unused)), struct MPID_Request *rreq)
 {
     DO_CANCEL(rreq);
 }
 
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_anysource_posted)
-void MPID_nem_sfi_anysource_posted(MPID_Request * rreq)
+#define FCNAME DECL_FUNC(MPID_nem_ofi_anysource_posted)
+void MPID_nem_ofi_anysource_posted(MPID_Request * rreq)
 {
     int mpi_errno = MPI_SUCCESS;
     BEGIN_FUNC(FCNAME);
-    mpi_errno = MPID_nem_sfi_recv_posted(NULL, rreq);
+    mpi_errno = MPID_nem_ofi_recv_posted(NULL, rreq);
     MPIU_Assert(mpi_errno == MPI_SUCCESS);
     END_FUNC(FCNAME);
 }
 
 #undef FCNAME
-#define FCNAME DECL_FUNC(MPID_nem_sfi_anysource_matched)
-int MPID_nem_sfi_anysource_matched(MPID_Request * rreq)
+#define FCNAME DECL_FUNC(MPID_nem_ofi_anysource_matched)
+int MPID_nem_ofi_anysource_matched(MPID_Request * rreq)
 {
     int mpi_errno = FALSE;
     int ret;
@@ -384,7 +384,7 @@ int MPID_nem_sfi_anysource_matched(MPID_Request * rreq)
     /* source request on another device.  We have the chance */
     /* to cancel this shared request if it has been posted   */
     /* ----------------------------------------------------- */
-    ret = fi_cancel((fid_t) gl_data.endpoint, &(REQ_SFI(rreq)->sfi_context));
+    ret = fi_cancel((fid_t) gl_data.endpoint, &(REQ_SFI(rreq)->ofi_context));
     if (ret == 0) {
         /* --------------------------------------------------- */
         /* Request cancelled:  cancel and complete the request */
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/subconfigure.m4 b/src/mpid/ch3/channels/nemesis/netmod/ofi/subconfigure.m4
similarity index 71%
rename from src/mpid/ch3/channels/nemesis/netmod/sfi/subconfigure.m4
rename to src/mpid/ch3/channels/nemesis/netmod/ofi/subconfigure.m4
index 361f7d0..cb6b35c 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/sfi/subconfigure.m4
+++ b/src/mpid/ch3/channels/nemesis/netmod/ofi/subconfigure.m4
@@ -4,18 +4,18 @@ dnl MPICH_SUBCFG_AFTER=src/mpid/ch3/channels/nemesis
 AC_DEFUN([PAC_SUBCFG_PREREQ_]PAC_SUBCFG_AUTO_SUFFIX,[
     AM_COND_IF([BUILD_CH3_NEMESIS],[
         for net in $nemesis_networks ; do
-            AS_CASE([$net],[sfi],[build_nemesis_netmod_sfi=yes])
+            AS_CASE([$net],[ofi],[build_nemesis_netmod_ofi=yes])
         done
     ])
-    AM_CONDITIONAL([BUILD_NEMESIS_NETMOD_SFI],[test "X$build_nemesis_netmod_sfi" = "Xyes"])
+    AM_CONDITIONAL([BUILD_NEMESIS_NETMOD_SFI],[test "X$build_nemesis_netmod_ofi" = "Xyes"])
 ])dnl
 
 AC_DEFUN([PAC_SUBCFG_BODY_]PAC_SUBCFG_AUTO_SUFFIX,[
 AM_COND_IF([BUILD_NEMESIS_NETMOD_SFI],[
-    AC_MSG_NOTICE([RUNNING CONFIGURE FOR ch3:nemesis:sfi])
+    AC_MSG_NOTICE([RUNNING CONFIGURE FOR ch3:nemesis:ofi])
 
-    PAC_SET_HEADER_LIB_PATH(sfi)
-    PAC_CHECK_HEADER_LIB_FATAL(sfi, rdma/fabric.h, fabric, fi_getinfo)
+    PAC_SET_HEADER_LIB_PATH(ofi)
+    PAC_CHECK_HEADER_LIB_FATAL(ofi, rdma/fabric.h, fabric, fi_getinfo)
 
     AC_DEFINE([ENABLE_COMM_OVERRIDES], 1, [define to add per-vc function pointers to override send and recv functions])
 ])dnl end AM_COND_IF(BUILD_NEMESIS_NETMOD_SFI,...)
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/Makefile.mk b/src/mpid/ch3/channels/nemesis/netmod/sfi/Makefile.mk
deleted file mode 100644
index bc3d6ef..0000000
--- a/src/mpid/ch3/channels/nemesis/netmod/sfi/Makefile.mk
+++ /dev/null
@@ -1,19 +0,0 @@
-## -*- Mode: Makefile; -*-
-## vim: set ft=automake :
-##
-## (C) 2011 by Argonne National Laboratory.
-##     See COPYRIGHT in top-level directory.
-##
-if BUILD_NEMESIS_NETMOD_SFI
-
-mpi_core_sources +=                                 		\
-    src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_init.c 	\
-    src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_cm.c	 	\
-    src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_tagged.c	\
-    src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_msg.c	 	\
-    src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_data.c	 	\
-    src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_progress.c
-
-errnames_txt_files += src/mpid/ch3/channels/nemesis/netmod/sfi/errnames.txt
-
-endif
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/errnames.txt b/src/mpid/ch3/channels/nemesis/netmod/sfi/errnames.txt
deleted file mode 100644
index 09cb0f6..0000000
--- a/src/mpid/ch3/channels/nemesis/netmod/sfi/errnames.txt
+++ /dev/null
@@ -1,42 +0,0 @@
-**sfi_avmap:SFI get address vector map failed
-**sfi_avmap %s %d %s %s:SFI address vector map failed (%s:%d:%s:%s)
-**sfi_tsend:SFI tagged sendto failed
-**sfi_tsend %s %d %s %s:SFI tagged sendto failed (%s:%d:%s:%s)
-**sfi_trecv:SFI tagged recvfrom failed
-**sfi_trecv %s %d %s %s:SFI tagged recvfrom failed (%s:%d:%s:%s)
-**sfi_getinfo:SFI getinfo() failed
-**sfi_getinfo %s %d %s %s:SFI getinfo() failed (%s:%d:%s:%s)
-**sfi_openep:SFI endpoint open failed
-**sfi_openep %s %d %s %s:SFI endpoint open failed (%s:%d:%s:%s)
-**sfi_openfabric:SFI fabric open failure
-**sfi_openfabric %s %d %s %s:SFI fabric open failed (%s:%d:%s:%s)
-**sfi_opendomain:SFI domain open failure
-**sfi_opendomain %s %d %s %s:SFI domain open failed (%s:%d:%s:%s)
-**sfi_opencq:SFI event queue create failure
-**sfi_opencq %s %d %s %s:SFI event queue create failed (%s:%d:%s:%s)
-**sfi_avopen:SFI address vector open failed
-**sfi_avopen %s %d %s %s:SFI address vector open failed (%s:%d:%s:%s)
-**sfi_bind:SFI resource bind failure
-**sfi_bind %s %d %s %s:SFI resource bind failed (%s:%d:%s:%s)
-**sfi_ep_enable:SFI endpoint enable failed
-**sfi_ep_enable %s %d %s %s:SFI endpoint enable failed (%s:%d:%s:%s)
-**sfi_getname:SFI get endpoint name failed
-**sfi_getname %s %d %s %s:SFI get endpoint name failed (%s:%d:%s:%s)
-**sfi_avclose:SFI av close failed
-**sfi_avclose %s %d %s %s:SFI av close failed (%s:%d:%s:%s)
-**sfi_epclose:SFI endpoint close failed
-**sfi_epclose %s %d %s %s:SFI endpoint close failed (%s:%d:%s:%s)
-**sfi_cqclose:SFI cq close failed
-**sfi_cqclose %s %d %s %s:SFI cq close failed (%s:%d:%s:%s)
-**sfi_mrclose:SFI mr close failed
-**sfi_mrclose %s %d %s %s:SFI mr close failed (%s:%d:%s:%s)
-**sfi_fabricclose:SFI fabric close failed
-**sfi_fabricclose %s %d %s %s:SFI fabric close failed (%s:%d:%s:%s)
-**sfi_domainclose:SFI domain close failed
-**sfi_domainclose %s %d %s %s:SFI domain close failed (%s:%d:%s:%s)
-**sfi_tsearch:SFI tsearch failed
-**sfi_tsearch %s %d %s %s:SFI tsearch failed (%s:%d:%s:%s)
-**sfi_poll:SFI poll failed
-**sfi_poll %s %d %s %s:SFI poll failed (%s:%d:%s:%s)
-**sfi_cancel:SFI cancel failed
-**sfi_cancel %s %d %s %s:SFI cancel failed (%s:%d:%s:%s)

http://git.mpich.org/mpich.git/commitdiff/64cf28039180236a0c8111503ad2888f45602a45

commit 64cf28039180236a0c8111503ad2888f45602a45
Author: Charles J Archer <charles.j.archer at intel.com>
Date:   Tue Dec 9 14:30:33 2014 -0800

    Compile-time fixes for the latest OFI interfaces

diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/errnames.txt b/src/mpid/ch3/channels/nemesis/netmod/sfi/errnames.txt
index c1ae0e3..09cb0f6 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/sfi/errnames.txt
+++ b/src/mpid/ch3/channels/nemesis/netmod/sfi/errnames.txt
@@ -1,9 +1,9 @@
 **sfi_avmap:SFI get address vector map failed
 **sfi_avmap %s %d %s %s:SFI address vector map failed (%s:%d:%s:%s)
-**sfi_tsendto:SFI tagged sendto failed
-**sfi_tsendto %s %d %s %s:SFI tagged sendto failed (%s:%d:%s:%s)
-**sfi_trecvfrom:SFI tagged recvfrom failed
-**sfi_trecvfrom %s %d %s %s:SFI tagged recvfrom failed (%s:%d:%s:%s)
+**sfi_tsend:SFI tagged sendto failed
+**sfi_tsend %s %d %s %s:SFI tagged sendto failed (%s:%d:%s:%s)
+**sfi_trecv:SFI tagged recvfrom failed
+**sfi_trecv %s %d %s %s:SFI tagged recvfrom failed (%s:%d:%s:%s)
 **sfi_getinfo:SFI getinfo() failed
 **sfi_getinfo %s %d %s %s:SFI getinfo() failed (%s:%d:%s:%s)
 **sfi_openep:SFI endpoint open failed
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_cm.c b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_cm.c
index b39517a..54bf757 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_cm.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_cm.c
@@ -101,14 +101,14 @@ static inline int MPID_nem_sfi_conn_req_callback(cq_tagged_entry_t * wc, MPID_Re
     MPIU_Memcpy(bc, rreq->dev.user_buf, wc->len);
     bc[wc->len] = '\0';
     MPIU_Assert(gl_data.conn_req == rreq);
-    FI_RC(fi_trecvfrom(gl_data.endpoint,
+    FI_RC(fi_trecv(gl_data.endpoint,
                        gl_data.conn_req->dev.user_buf,
                        SFI_KVSAPPSTRLEN,
                        gl_data.mr,
                        0,
                        MPID_CONN_REQ,
                        ~MPID_PROTOCOL_MASK,
-                       (void *) &(REQ_SFI(gl_data.conn_req)->sfi_context)), trecvfrom);
+                       (void *) &(REQ_SFI(gl_data.conn_req)->sfi_context)), trecv);
 
     addr = MPIU_Malloc(gl_data.bound_addrlen);
     MPIU_Assertp(addr);
@@ -221,34 +221,34 @@ static inline int MPID_nem_sfi_preposted_callback(cq_tagged_entry_t * wc, MPID_R
     REQ_SFI(new_rreq)->vc = vc;
     REQ_SFI(new_rreq)->pack_buffer = pack_buffer;
     REQ_SFI(new_rreq)->pack_buffer_size = pkt_len;
-    FI_RC(fi_trecvfrom(gl_data.endpoint,
+    FI_RC(fi_trecv(gl_data.endpoint,
                        REQ_SFI(new_rreq)->pack_buffer,
                        REQ_SFI(new_rreq)->pack_buffer_size,
                        gl_data.mr,
                        VC_SFI(vc)->direct_addr,
-                       wc->tag | MPID_MSG_DATA, 0, &(REQ_SFI(new_rreq)->sfi_context)), trecvfrom);
+                       wc->tag | MPID_MSG_DATA, 0, &(REQ_SFI(new_rreq)->sfi_context)), trecv);
 
     MPID_nem_sfi_create_req(&sreq, 1);
     sreq->dev.OnDataAvail = NULL;
     sreq->dev.next = NULL;
     REQ_SFI(sreq)->event_callback = MPID_nem_sfi_cts_send_callback;
     REQ_SFI(sreq)->parent = new_rreq;
-    FI_RC(fi_tsendto(gl_data.endpoint,
+    FI_RC(fi_tsend(gl_data.endpoint,
                      NULL,
                      0,
                      gl_data.mr,
                      VC_SFI(vc)->direct_addr,
-                     wc->tag | MPID_MSG_CTS, &(REQ_SFI(sreq)->sfi_context)), tsendto);
+                     wc->tag | MPID_MSG_CTS, &(REQ_SFI(sreq)->sfi_context)), tsend);
     MPIU_Assert(gl_data.persistent_req == rreq);
 
     rreq->dev.user_count = 0;
-    FI_RC(fi_trecvfrom(gl_data.endpoint,
+    FI_RC(fi_trecv(gl_data.endpoint,
                        &rreq->dev.user_count,
                        sizeof rreq->dev.user_count,
                        gl_data.mr,
                        0,
                        MPID_MSG_RTS,
-                       ~MPID_PROTOCOL_MASK, &(REQ_SFI(rreq)->sfi_context)), trecvfrom);
+                       ~MPID_PROTOCOL_MASK, &(REQ_SFI(rreq)->sfi_context)), trecv);
     END_FUNC_RC(FCNAME);
 }
 
@@ -303,14 +303,14 @@ int MPID_nem_sfi_cm_init(MPIDI_PG_t * pg_p, int pg_rank ATTRIBUTE((unused)))
     persistent_req->dev.next = NULL;
     REQ_SFI(persistent_req)->vc = NULL;
     REQ_SFI(persistent_req)->event_callback = MPID_nem_sfi_preposted_callback;
-    FI_RC(fi_trecvfrom(gl_data.endpoint,
+    FI_RC(fi_trecv(gl_data.endpoint,
                        &persistent_req->dev.user_count,
                        sizeof persistent_req->dev.user_count,
                        gl_data.mr,
                        0,
                        MPID_MSG_RTS,
                        ~MPID_PROTOCOL_MASK,
-                       (void *) &(REQ_SFI(persistent_req)->sfi_context)), trecvfrom);
+                       (void *) &(REQ_SFI(persistent_req)->sfi_context)), trecv);
     gl_data.persistent_req = persistent_req;
 
     /* --------------------------------- */
@@ -322,13 +322,13 @@ int MPID_nem_sfi_cm_init(MPIDI_PG_t * pg_p, int pg_rank ATTRIBUTE((unused)))
     conn_req->dev.next = NULL;
     REQ_SFI(conn_req)->vc = NULL;       /* We don't know the source yet */
     REQ_SFI(conn_req)->event_callback = MPID_nem_sfi_conn_req_callback;
-    FI_RC(fi_trecvfrom(gl_data.endpoint,
+    FI_RC(fi_trecv(gl_data.endpoint,
                        conn_req->dev.user_buf,
                        SFI_KVSAPPSTRLEN,
                        gl_data.mr,
                        0,
                        MPID_CONN_REQ,
-                       ~MPID_PROTOCOL_MASK, (void *) &(REQ_SFI(conn_req)->sfi_context)), trecvfrom);
+                       ~MPID_PROTOCOL_MASK, (void *) &(REQ_SFI(conn_req)->sfi_context)), trecv);
     gl_data.conn_req = conn_req;
 
 
@@ -537,12 +537,12 @@ int MPID_nem_sfi_connect_to_root(const char *business_card, MPIDI_VC_t * new_vc)
     REQ_SFI(sreq)->event_callback = MPID_nem_sfi_connect_to_root_callback;
     REQ_SFI(sreq)->pack_buffer = my_bc;
     conn_req_send_bits = init_sendtag(0, MPIR_Process.comm_world->rank, 0, MPID_CONN_REQ);
-    FI_RC(fi_tsendto(gl_data.endpoint,
+    FI_RC(fi_tsend(gl_data.endpoint,
                      REQ_SFI(sreq)->pack_buffer,
                      my_bc_len,
                      gl_data.mr,
                      VC_SFI(new_vc)->direct_addr,
-                     conn_req_send_bits, &(REQ_SFI(sreq)->sfi_context)), tsendto);
+                     conn_req_send_bits, &(REQ_SFI(sreq)->sfi_context)), tsend);
     MPID_nem_sfi_poll(MPID_NONBLOCKING_POLL);
     VC_SFI(new_vc)->is_cmvc = 1;
     VC_SFI(new_vc)->next = gl_data.cm_vcs;
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_impl.h b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_impl.h
index 9e8b93f..d720170 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_impl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_impl.h
@@ -29,7 +29,7 @@ typedef struct fi_info info_t;
 typedef struct fi_cq_attr cq_attr_t;
 typedef struct fi_av_attr av_attr_t;
 typedef struct fi_domain_attr domain_attr_t;
-typedef struct fi_tx_ctx_attr tx_ctx_attr_t;
+typedef struct fi_tx_attr tx_attr_t;
 typedef struct fi_cq_tagged_entry cq_tagged_entry_t;
 typedef struct fi_cq_err_entry cq_err_entry_t;
 typedef struct fi_context context_t;
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_init.c b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_init.c
index 88a6496..586e522 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_init.c
@@ -75,7 +75,7 @@ int MPID_nem_sfi_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_
     domain_attr_t domain_attr;
     memset(&domain_attr, 0, sizeof(domain_attr));
 
-    tx_ctx_attr_t tx_attr;
+    tx_attr_t tx_attr;
     memset(&tx_attr, 0, sizeof(tx_attr));
 
     domain_attr.threading = FI_THREAD_PROGRESS;
@@ -334,8 +334,8 @@ static inline int compile_time_checking()
     /* ------------------------------------------------------------------------ */
 #if 0
     MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_avmap", "**sfi_avmap %s %d %s %s", a, b, a, a);
-    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_tsendto", "**sfi_tsendto %s %d %s %s", a, b, a, a);
-    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_trecvfrom", "**sfi_trecvfrom %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_tsend", "**sfi_tsend %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_trecv", "**sfi_trecv %s %d %s %s", a, b, a, a);
     MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_getinfo", "**sfi_getinfo %s %d %s %s", a, b, a, a);
     MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_openep", "**sfi_openep %s %d %s %s", a, b, a, a);
     MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_openfabric", "**sfi_openfabric %s %d %s %s", a, b, a, a);
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_msg.c b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_msg.c
index 3797f92..c202997 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_msg.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_msg.c
@@ -73,21 +73,21 @@
     REQ_SFI(cts_req)->event_callback = MPID_nem_sfi_cts_recv_callback;  \
     REQ_SFI(cts_req)->parent         = sreq;                            \
                                                                         \
-    FI_RC(fi_trecvfrom(gl_data.endpoint,                                \
+    FI_RC(fi_trecv(gl_data.endpoint,                                \
                        NULL,                                            \
                        0,                                               \
                        gl_data.mr,                                      \
                        VC_SFI(vc)->direct_addr,                         \
                        match_bits | MPID_MSG_CTS,                       \
                        0, /* Exact tag match, no ignore bits */         \
-                       &(REQ_SFI(cts_req)->sfi_context)),trecvfrom);    \
-    FI_RC(fi_tsendto(gl_data.endpoint,                                  \
+                       &(REQ_SFI(cts_req)->sfi_context)),trecv);    \
+    FI_RC(fi_tsend(gl_data.endpoint,                                  \
                      &REQ_SFI(sreq)->pack_buffer_size,                  \
                      sizeof(REQ_SFI(sreq)->pack_buffer_size),           \
                      gl_data.mr,                                        \
                      VC_SFI(vc)->direct_addr,                           \
                      match_bits,                                        \
-                     &(REQ_SFI(sreq)->sfi_context)),tsendto);           \
+                     &(REQ_SFI(sreq)->sfi_context)),tsend);           \
   })
 
 
@@ -108,12 +108,12 @@ static int MPID_nem_sfi_data_callback(cq_tagged_entry_t * wc, MPID_Request * sre
     if (sreq->cc == 2) {
         vc = REQ_SFI(sreq)->vc;
         REQ_SFI(sreq)->tag = tag | MPID_MSG_DATA;
-        FI_RC(fi_tsendto(gl_data.endpoint,
+        FI_RC(fi_tsend(gl_data.endpoint,
                          REQ_SFI(sreq)->pack_buffer,
                          REQ_SFI(sreq)->pack_buffer_size,
                          gl_data.mr,
                          VC_SFI(vc)->direct_addr,
-                         wc->tag | MPID_MSG_DATA, (void *) &(REQ_SFI(sreq)->sfi_context)), tsendto);
+                         wc->tag | MPID_MSG_DATA, (void *) &(REQ_SFI(sreq)->sfi_context)), tsend);
     }
     if (sreq->cc == 1) {
         if (REQ_SFI(sreq)->pack_buffer)
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_progress.c b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_progress.c
index 8f40aeb..fdb1bbf 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_progress.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_progress.c
@@ -258,7 +258,7 @@ int MPID_nem_sfi_poll(int in_blocking_poll)
         }
         else if (ret < 0) {
             if (ret == -FI_EAVAIL) {
-                ret = fi_cq_readerr(gl_data.cq, (void *) &error, sizeof(error), 0);
+                ret = fi_cq_readerr(gl_data.cq, (void *) &error, 0);
                 if (error.err == FI_EMSGSIZE) {
                     /* ----------------------------------------------------- */
                     /* This error message should only be delivered on send   */
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_tagged.c b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_tagged.c
index 2d88c10..237c735 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_tagged.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_tagged.c
@@ -104,12 +104,12 @@ static inline int MPID_nem_sfi_recv_callback(cq_tagged_entry_t * wc, MPID_Reques
         sync_req->dev.next = NULL;
         REQ_SFI(sync_req)->event_callback = MPID_nem_sfi_sync_recv_callback;
         REQ_SFI(sync_req)->parent = rreq;
-        FI_RC(fi_tsendto(gl_data.endpoint,
+        FI_RC(fi_tsend(gl_data.endpoint,
                          NULL,
                          0,
                          gl_data.mr,
                          VC_SFI(vc)->direct_addr,
-                         ssend_bits, &(REQ_SFI(sync_req)->sfi_context)), tsendto);
+                         ssend_bits, &(REQ_SFI(sync_req)->sfi_context)), tsend);
     }
     else {
         /* ---------------------------------------------------- */
@@ -185,22 +185,22 @@ static inline int do_isend(struct MPIDI_VC *vc,
         REQ_SFI(sync_req)->parent = sreq;
         ssend_match = init_recvtag(&ssend_mask, comm->context_id + context_offset, dest, tag);
         ssend_match |= MPID_SYNC_SEND_ACK;
-        FI_RC(fi_trecvfrom(gl_data.endpoint,    /* endpoint    */
+        FI_RC(fi_trecv(gl_data.endpoint,    /* endpoint    */
                            NULL,        /* recvbuf     */
                            0,   /* data sz     */
                            gl_data.mr,  /* dynamic mr  */
                            VC_SFI(vc)->direct_addr,     /* remote proc */
                            ssend_match, /* match bits  */
                            0ULL,        /* mask bits   */
-                           &(REQ_SFI(sync_req)->sfi_context)), trecvfrom);
+                           &(REQ_SFI(sync_req)->sfi_context)), trecv);
     }
-    FI_RC(fi_tsendto(gl_data.endpoint,  /* Endpoint                       */
+    FI_RC(fi_tsend(gl_data.endpoint,  /* Endpoint                       */
                      send_buffer,       /* Send buffer(packed or user)    */
                      data_sz,   /* Size of the send               */
                      gl_data.mr,        /* Dynamic memory region          */
                      VC_SFI(vc)->direct_addr,   /* Use the address of this VC     */
                      match_bits,        /* Match bits                     */
-                     &(REQ_SFI(sreq)->sfi_context)), tsendto);
+                     &(REQ_SFI(sreq)->sfi_context)), tsend);
     *request = sreq;
     END_FUNC_RC(FCNAME);
 }
@@ -250,12 +250,12 @@ int MPID_nem_sfi_recv_posted(struct MPIDI_VC *vc, struct MPID_Request *rreq)
     /* ---------------- */
     /* Post the receive */
     /* ---------------- */
-    FI_RC(fi_trecvfrom(gl_data.endpoint,
+    FI_RC(fi_trecv(gl_data.endpoint,
                        recv_buffer,
                        data_sz,
                        gl_data.mr,
                        remote_proc,
-                       match_bits, mask_bits, &(REQ_SFI(rreq)->sfi_context)), trecvfrom);
+                       match_bits, mask_bits, &(REQ_SFI(rreq)->sfi_context)), trecv);
     MPID_nem_sfi_poll(MPID_NONBLOCKING_POLL);
     END_FUNC_RC(FCNAME);
 }

http://git.mpich.org/mpich.git/commitdiff/8d0c60761774dc05ec147116fbfa79894b477e3a

commit 8d0c60761774dc05ec147116fbfa79894b477e3a
Author: artem.v.yalozo <artem.v.yalozo at intel.com>
Date:   Thu Oct 23 11:48:41 2014 +0400

    Windows conformance: RMA mutexes
    
    This patch provides the following fix for the Windows conformance feature
    (making a single code base work on both Linux and Windows):
        - RMA mutexes fix for Windows
    
    Change-Id: Ib4f7b2ec8a07813f0ed35281a1d584637c84c0a9
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>

diff --git a/src/mpi/errhan/errnames.txt b/src/mpi/errhan/errnames.txt
index 060b3b4..68a0bb8 100644
--- a/src/mpi/errhan/errnames.txt
+++ b/src/mpi/errhan/errnames.txt
@@ -585,7 +585,6 @@ is too big (> MPIU_SHMW_GHND_SZ)
 **msgrcv:msgrcv failed
 **msgrcv %d:msgrcv returned %d
 **nextbootmsg:failed to get the next bootstrap message
-**winwait:WaitForSingleObject failed
 **CreateThread:CreateThread failed
 **CreateThread %d:CreateThread failed, error %d
 **FindWindowEx:FindWindowEx failed
@@ -810,6 +809,8 @@ is too big (> MPIU_SHMW_GHND_SZ)
 **pthread_unlock %s:pthread_unlock failed (%s)
 **pthread_mutex:pthread mutex routine failed
 **pthread_mutex %s:pthread mutex routine failed (%s)
+**windows_mutex:Windows mutex routine failed
+**windows_mutex %s:Windows mutex routine failed (%s)
 **badportrange:MPICH_PORT_RANGE - invalid range specified
 **argstr_missingifname:Missing ifname or invalid host/port description in business card
 **rtspkt:failure occurred while attempting to send RTS packet
diff --git a/src/mpid/ch3/channels/nemesis/include/mpidi_ch3_impl.h b/src/mpid/ch3/channels/nemesis/include/mpidi_ch3_impl.h
index 90aec70..33a28d0 100644
--- a/src/mpid/ch3/channels/nemesis/include/mpidi_ch3_impl.h
+++ b/src/mpid/ch3/channels/nemesis/include/mpidi_ch3_impl.h
@@ -87,6 +87,7 @@ int MPIDI_CH3_SHM_Win_free(MPID_Win **win_ptr);
 
 /* Shared memory window atomic/accumulate mutex implementation */
 
+#if !defined(HAVE_WINDOWS_H)
 #define MPIDI_CH3I_SHM_MUTEX_LOCK(win_ptr)                                              \
     do {                                                                                \
         int pt_err = pthread_mutex_lock((win_ptr)->shm_mutex);                          \
@@ -126,6 +127,64 @@ int MPIDI_CH3_SHM_Win_free(MPID_Win **win_ptr);
         MPIU_ERR_CHKANDJUMP1(pt_err, mpi_errno, MPI_ERR_OTHER, "**pthread_mutex",       \
                              "**pthread_mutex %s", strerror(pt_err));                   \
     } while (0);
+#else
+#define HANDLE_WIN_MUTEX_ERROR()                                                        \
+    do {                                                                                \
+        HLOCAL str;                                                                     \
+        char error_msg[MPIU_STRERROR_BUF_SIZE];                                         \
+        DWORD error = GetLastError();                                                   \
+        int num_bytes = FormatMessage(                                                  \
+        FORMAT_MESSAGE_FROM_SYSTEM |                                                    \
+        FORMAT_MESSAGE_ALLOCATE_BUFFER,                                                 \
+        0,                                                                              \
+        error,                                                                          \
+        MAKELANGID( LANG_NEUTRAL, SUBLANG_DEFAULT ),                                    \
+        (LPTSTR) &str,                                                                  \
+        0,0);                                                                           \
+                                                                                        \
+        if (num_bytes != 0) {                                                           \
+            int pt_err = 1;                                                             \
+            int mpi_errno = MPI_ERR_OTHER;                                              \
+            MPIU_Strncpy(error_msg, str, MPIU_STRERROR_BUF_SIZE);                       \
+            LocalFree(str);                                                             \
+            strtok(error_msg, "\r\n");                                                  \
+            MPIU_ERR_CHKANDJUMP1(pt_err, mpi_errno, MPI_ERR_OTHER, "**windows_mutex",   \
+                                 "**windows_mutex %s", error_msg);                      \
+        }                                                                               \
+    } while (0);
+
+#define MPIDI_CH3I_SHM_MUTEX_LOCK(win_ptr)                                              \
+    do {                                                                                \
+        DWORD result = WaitForSingleObject(*((win_ptr)->shm_mutex), INFINITE);          \
+        if (result == WAIT_FAILED) {                                                    \
+            HANDLE_WIN_MUTEX_ERROR();                                                   \
+        }                                                                               \
+    } while (0);
+
+#define MPIDI_CH3I_SHM_MUTEX_UNLOCK(win_ptr)                                            \
+    do {                                                                                \
+        BOOL result = ReleaseMutex(*((win_ptr)->shm_mutex));                            \
+        if (!result) {                                                                  \
+            HANDLE_WIN_MUTEX_ERROR();                                                   \
+        }                                                                               \
+    } while (0);
+
+#define MPIDI_CH3I_SHM_MUTEX_INIT(win_ptr)                                              \
+    do {                                                                                \
+        *((win_ptr)->shm_mutex) = CreateMutex(NULL, FALSE, NULL);                       \
+        if (*((win_ptr)->shm_mutex) == NULL) {                                          \
+            HANDLE_WIN_MUTEX_ERROR();                                                   \
+        }                                                                               \
+    } while (0);
+
+#define MPIDI_CH3I_SHM_MUTEX_DESTROY(win_ptr)                                           \
+    do {                                                                                \
+        BOOL result = CloseHandle(*((win_ptr)->shm_mutex));                             \
+        if (!result) {                                                                  \
+            HANDLE_WIN_MUTEX_ERROR();                                                   \
+        }                                                                               \
+    } while (0);
+#endif /* !defined(HAVE_WINDOWS_H) */
 
 
 /* Starting of shared window list */

http://git.mpich.org/mpich.git/commitdiff/8b90056beee08fe0558678515cf14de2719d1b88

commit 8b90056beee08fe0558678515cf14de2719d1b88
Author: andrey.lobanov <andrey.lobanov at intel.com>
Date:   Fri Oct 17 13:58:21 2014 +0400

    Fixed memory leaks in PMI message handling
    
      - name publishing/lookup support (PMI v1 and v2)
      - job and node attrs (v2)
    
    Change-Id: Id18d968da0d0bbf6e8cb2e7acffaf77d82a5e8b0
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>
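
The leak fixes in this commit follow one pattern: initialize each pointer to NULL at declaration, then release whatever was allocated at the shared fn_exit label so that both success and error paths clean up. A minimal standalone sketch of that pattern is shown here. The names handle_message, tracked_alloc, tracked_free and live_allocs are hypothetical and are not part of Hydra; the allocation counter exists only to make the absence of leaks observable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocation counter, used only to observe leaks in this sketch. */
int live_allocs = 0;

void *tracked_alloc(size_t n)
{
    void *p = malloc(n);
    if (p)
        live_allocs++;
    return p;
}

void tracked_free(void *p)
{
    if (p) {
        live_allocs--;
        free(p);
    }
}

/* Hypothetical handler: returns 0 on success, -1 on failure.
 * fail_midway simulates an error raised after the first allocation. */
int handle_message(const char *msg, int fail_midway)
{
    char *buf = NULL;           /* NULL until allocated, so cleanup is safe */
    char *tokens = NULL;
    int status = 0;

    assert(msg != NULL);

    buf = tracked_alloc(strlen(msg) + 1);
    if (!buf) {
        status = -1;
        goto fn_fail;
    }
    strcpy(buf, msg);

    if (fail_midway) {
        status = -1;
        goto fn_fail;           /* tokens is still NULL here */
    }

    tokens = tracked_alloc(16);
    if (!tokens) {
        status = -1;
        goto fn_fail;
    }

  fn_exit:
    /* Single cleanup point shared by the success and error paths. */
    if (tokens)
        tracked_free(tokens);
    if (buf)
        tracked_free(buf);
    return status;

  fn_fail:
    goto fn_exit;
}
```

Without the NULL initializers, an early jump to fn_exit would test an indeterminate pointer, which is why the diff below changes the declarations together with the cleanup code.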

diff --git a/src/pm/hydra/pm/pmiserv/pmip_pmi_v2.c b/src/pm/hydra/pm/pmiserv/pmip_pmi_v2.c
index 66f677b..7a1723b 100644
--- a/src/pm/hydra/pm/pmiserv/pmip_pmi_v2.c
+++ b/src/pm/hydra/pm/pmiserv/pmip_pmi_v2.c
@@ -19,7 +19,7 @@ static HYD_status send_cmd_upstream(const char *start, int fd, char *args[])
 {
     int i, sent, closed;
     struct HYD_string_stash stash;
-    char *buf;
+    char *buf = NULL;
     struct HYD_pmcd_hdr hdr;
     HYD_status status = HYD_SUCCESS;
 
@@ -56,6 +56,8 @@ static HYD_status send_cmd_upstream(const char *start, int fd, char *args[])
     HYDU_ASSERT(!closed, status);
 
   fn_exit:
+    if (buf)
+        HYDU_FREE(buf);
     HYDU_FUNC_EXIT();
     return status;
 
@@ -216,7 +218,7 @@ static HYD_status fn_job_getid(int fd, char *args[])
 {
     struct HYD_string_stash stash;
     char *cmd, *thrid;
-    struct HYD_pmcd_token *tokens;
+    struct HYD_pmcd_token *tokens = NULL;
     int token_count;
     HYD_status status = HYD_SUCCESS;
 
@@ -244,6 +246,8 @@ static HYD_status fn_job_getid(int fd, char *args[])
     HYDU_FREE(cmd);
 
   fn_exit:
+    if (tokens)
+        HYD_pmcd_pmi_free_tokens(tokens, token_count);
     HYDU_FUNC_EXIT();
     return status;
 
@@ -255,7 +259,7 @@ static HYD_status fn_info_putnodeattr(int fd, char *args[])
 {
     struct HYD_string_stash stash;
     char *key, *val, *thrid, *cmd;
-    struct HYD_pmcd_token *tokens;
+    struct HYD_pmcd_token *tokens = NULL;
     int token_count, ret;
     struct HYD_pmcd_pmi_v2_reqs *req;
     HYD_status status = HYD_SUCCESS;
@@ -302,6 +306,8 @@ static HYD_status fn_info_putnodeattr(int fd, char *args[])
     }
 
   fn_exit:
+    if (tokens)
+        HYD_pmcd_pmi_free_tokens(tokens, token_count);
     HYDU_FUNC_EXIT();
     return status;
 
@@ -316,7 +322,7 @@ static HYD_status fn_info_getnodeattr(int fd, char *args[])
     char *key, *waitval, *thrid;
     struct HYD_string_stash stash;
     char *cmd;
-    struct HYD_pmcd_token *tokens;
+    struct HYD_pmcd_token *tokens = NULL;
     int token_count;
     HYD_status status = HYD_SUCCESS;
 
@@ -384,6 +390,8 @@ static HYD_status fn_info_getnodeattr(int fd, char *args[])
     }
 
   fn_exit:
+    if (tokens)
+        HYD_pmcd_pmi_free_tokens(tokens, token_count);
     HYDU_FUNC_EXIT();
     return status;
 
@@ -446,7 +454,7 @@ static HYD_status fn_finalize(int fd, char *args[])
     char *thrid;
     struct HYD_string_stash stash;
     char *cmd;
-    struct HYD_pmcd_token *tokens;
+    struct HYD_pmcd_token *tokens = NULL;
     int token_count;
     HYD_status status = HYD_SUCCESS;
 
@@ -476,6 +484,8 @@ static HYD_status fn_finalize(int fd, char *args[])
     close(fd);
 
   fn_exit:
+    if (tokens)
+        HYD_pmcd_pmi_free_tokens(tokens, token_count);
     HYDU_FUNC_EXIT();
     return status;
 
diff --git a/src/pm/hydra/pm/pmiserv/pmiserv_pmi_v1.c b/src/pm/hydra/pm/pmiserv/pmiserv_pmi_v1.c
index 9955e18..4899fed 100644
--- a/src/pm/hydra/pm/pmiserv/pmiserv_pmi_v1.c
+++ b/src/pm/hydra/pm/pmiserv/pmiserv_pmi_v1.c
@@ -629,8 +629,8 @@ static HYD_status fn_publish_name(int fd, int pid, int pgid, char *args[])
     struct HYD_string_stash stash;
     char *cmd, *val;
     int token_count;
-    struct HYD_pmcd_token *tokens;
-    char *name, *port;
+    struct HYD_pmcd_token *tokens = NULL;
+    char *name = NULL, *port = NULL;
     int success = 0;
     HYD_status status = HYD_SUCCESS;
 
@@ -666,6 +666,13 @@ static HYD_status fn_publish_name(int fd, int pid, int pgid, char *args[])
     HYDU_FREE(cmd);
 
   fn_exit:
+    if (tokens)
+        HYD_pmcd_pmi_free_tokens(tokens, token_count);
+    if (name)
+        HYDU_FREE(name);
+    if (port)
+        HYDU_FREE(port);
+
     HYDU_FUNC_EXIT();
     return status;
 
@@ -678,7 +685,7 @@ static HYD_status fn_unpublish_name(int fd, int pid, int pgid, char *args[])
     struct HYD_string_stash stash;
     char *cmd, *name;
     int token_count;
-    struct HYD_pmcd_token *tokens;
+    struct HYD_pmcd_token *tokens = NULL;
     int success = 0;
     HYD_status status = HYD_SUCCESS;
 
@@ -709,6 +716,8 @@ static HYD_status fn_unpublish_name(int fd, int pid, int pgid, char *args[])
     HYDU_FREE(cmd);
 
   fn_exit:
+    if (tokens)
+        HYD_pmcd_pmi_free_tokens(tokens, token_count);
     HYDU_FUNC_EXIT();
     return status;
 
@@ -719,9 +728,9 @@ static HYD_status fn_unpublish_name(int fd, int pid, int pgid, char *args[])
 static HYD_status fn_lookup_name(int fd, int pid, int pgid, char *args[])
 {
     struct HYD_string_stash stash;
-    char *cmd, *name, *value;
+    char *cmd, *name, *value = NULL;
     int token_count;
-    struct HYD_pmcd_token *tokens;
+    struct HYD_pmcd_token *tokens = NULL;
     HYD_status status = HYD_SUCCESS;
 
     HYDU_FUNC_ENTER();
@@ -753,6 +762,10 @@ static HYD_status fn_lookup_name(int fd, int pid, int pgid, char *args[])
     HYDU_FREE(cmd);
 
   fn_exit:
+    if (tokens)
+        HYD_pmcd_pmi_free_tokens(tokens, token_count);
+    if (value)
+        HYDU_FREE(value);
     HYDU_FUNC_EXIT();
     return status;
 
diff --git a/src/pm/hydra/pm/pmiserv/pmiserv_pmi_v2.c b/src/pm/hydra/pm/pmiserv/pmiserv_pmi_v2.c
index b079b7d..00e6377 100644
--- a/src/pm/hydra/pm/pmiserv/pmiserv_pmi_v2.c
+++ b/src/pm/hydra/pm/pmiserv/pmiserv_pmi_v2.c
@@ -747,9 +747,9 @@ static HYD_status fn_spawn(int fd, int pid, int pgid, char *args[])
 static HYD_status fn_name_publish(int fd, int pid, int pgid, char *args[])
 {
     struct HYD_string_stash stash;
-    char *cmd, *thrid, *val, *name, *port;
+    char *cmd, *thrid, *val, *name = NULL, *port = NULL;
     int token_count, success;
-    struct HYD_pmcd_token *tokens;
+    struct HYD_pmcd_token *tokens = NULL;
     HYD_status status = HYD_SUCCESS;
 
     HYDU_FUNC_ENTER();
@@ -792,6 +792,12 @@ static HYD_status fn_name_publish(int fd, int pid, int pgid, char *args[])
     HYDU_FREE(cmd);
 
   fn_exit:
+    if (tokens)
+        HYD_pmcd_pmi_free_tokens(tokens, token_count);
+    if (name)
+        HYDU_FREE(name);
+    if (port)
+        HYDU_FREE(port);
     HYDU_FUNC_EXIT();
     return status;
 
@@ -804,7 +810,7 @@ static HYD_status fn_name_unpublish(int fd, int pid, int pgid, char *args[])
     struct HYD_string_stash stash;
     char *cmd, *thrid, *name;
     int token_count, success;
-    struct HYD_pmcd_token *tokens;
+    struct HYD_pmcd_token *tokens = NULL;
     HYD_status status = HYD_SUCCESS;
 
     HYDU_FUNC_ENTER();
@@ -842,6 +848,8 @@ static HYD_status fn_name_unpublish(int fd, int pid, int pgid, char *args[])
     HYDU_FREE(cmd);
 
   fn_exit:
+    if (tokens)
+        HYD_pmcd_pmi_free_tokens(tokens, token_count);
     HYDU_FUNC_EXIT();
     return status;
 
@@ -853,9 +861,8 @@ static HYD_status fn_name_lookup(int fd, int pid, int pgid, char *args[])
 {
     struct HYD_string_stash stash;
     char *cmd, *thrid, *name, *value;
-    struct HYD_pmcd_pmi_publish *publish;
     int token_count;
-    struct HYD_pmcd_token *tokens;
+    struct HYD_pmcd_token *tokens = NULL;
     HYD_status status = HYD_SUCCESS;
 
     HYDU_FUNC_ENTER();
@@ -865,9 +872,6 @@ static HYD_status fn_name_lookup(int fd, int pid, int pgid, char *args[])
 
     thrid = HYD_pmcd_pmi_find_token_keyval(tokens, token_count, "thrid");
 
-    HYDU_MALLOC(publish, struct HYD_pmcd_pmi_publish *, sizeof(struct HYD_pmcd_pmi_publish),
-                status);
-
     if ((name = HYD_pmcd_pmi_find_token_keyval(tokens, token_count, "name")) == NULL)
         HYDU_ERR_POP(status, "cannot find token: name\n");
 
@@ -897,6 +901,8 @@ static HYD_status fn_name_lookup(int fd, int pid, int pgid, char *args[])
     HYDU_FREE(cmd);
 
   fn_exit:
+    if (tokens)
+        HYD_pmcd_pmi_free_tokens(tokens, token_count);
     HYDU_FUNC_EXIT();
     return status;
 

http://git.mpich.org/mpich.git/commitdiff/967187c427d45ad92b9886c65e0735a107b4a0dc

commit 967187c427d45ad92b9886c65e0735a107b4a0dc
Author: Paul Coffman <pkcoff at us.ibm.com>
Date:   Thu Dec 4 20:55:01 2014 -0600

    romio gpfs: select correct read buffer addendum
    
    The original fix for 'romio gpfs: select correct read buffer' was still
    missing a critical piece for the last round to use the correct read
    buffer, resulting in a correctness issue that was missed by IOR but
    still found by the IBM PE test team.  The fix was to correctly toggle
    the buffer after the last read.
    
    Signed-off-by: Paul Coffman <pkcoff at us.ibm.com>
    Signed-off-by: Rob Latham <robl at mcs.anl.gov>
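
As a generic illustration of the buffer toggling this message refers to, the following sketch alternates between two staging buffers and switches after every round, including the last one. It is not the ROMIO aggregation code; CHUNK and copy_double_buffered are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <string.h>

#define CHUNK 4   /* hypothetical chunk size for this sketch */

/* Copies n_rounds * CHUNK bytes from src to dst, one chunk per round,
 * staging each chunk in one of two buffers used alternately.
 * Returns the index (0 or 1) of the buffer used in the final round,
 * or -1 if no round was executed. */
int copy_double_buffered(const char *src, char *dst, int n_rounds)
{
    char buf0[CHUNK], buf1[CHUNK];
    int current = 0;
    int last = -1;
    int r;

    assert(n_rounds >= 0);

    for (r = 0; r < n_rounds; r++) {
        char *read_buf = (current == 0) ? buf0 : buf1;

        memcpy(read_buf, src + r * CHUNK, CHUNK);   /* "read" into the staging buffer */
        memcpy(dst + r * CHUNK, read_buf, CHUNK);   /* "consume" from the same buffer */

        last = current;
        current = 1 - current;  /* toggle after every round, including the last */
    }
    return last;
}
```

If the consumer picked the buffer from a stale index, it would read the chunk staged one round earlier, which is the kind of correctness issue the message describes.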

diff --git a/src/mpi/romio/adio/common/p2p_aggregation.c b/src/mpi/romio/adio/common/p2p_aggregation.c
index 292ba47..89e891b 100644
--- a/src/mpi/romio/adio/common/p2p_aggregation.c
+++ b/src/mpi/romio/adio/common/p2p_aggregation.c
@@ -733,10 +733,10 @@ void ADIOI_P2PContigReadAggregation(ADIO_File fd,
 
 		    }
 		    if (currentReadBuf == 0) {
-			read_buf = read_buf0;
+			read_buf = read_buf1;
 		    }
 		    else {
-			read_buf = read_buf1;
+			read_buf = read_buf0;
 		    }
 
 		}
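
Editorial sketch (not part of the commit): the buffer selection in the hunk above follows a double-buffering pattern, where one buffer is being filled by the next read while the other holds the data that is ready to be distributed. The code below is a minimal, self-contained illustration of that pattern under simplifying assumptions. All names (`sketch_fill`, `sketch_double_buffered_read`, the chunk and round sizes) are hypothetical, the "read" is simulated with `memset`, and no threads are involved, so it is not ROMIO's actual control flow.

```c
#include <string.h>

#define SKETCH_ROUNDS 4
#define SKETCH_CHUNK  4

/* Simulated read for a given round: fills the buffer with 'A' + round.
 * Stands in for the real file read; purely illustrative. */
void sketch_fill(char *buf, int round)
{
    memset(buf, 'A' + round, SKETCH_CHUNK);
}

/* Double-buffered loop. `current` is the index of the buffer that the NEXT
 * read will fill, so the data that is ready to distribute always sits in
 * the other buffer, bufs[1 - current]. `out` must hold at least
 * SKETCH_ROUNDS * SKETCH_CHUNK bytes. */
void sketch_double_buffered_read(char *out)
{
    char buf0[SKETCH_CHUNK], buf1[SKETCH_CHUNK];
    char *bufs[2] = { buf0, buf1 };
    int current = 0;
    int round;

    sketch_fill(bufs[current], 0);   /* first read */
    current = 1 - current;           /* toggle after the first read */

    for (round = 0; round < SKETCH_ROUNDS; round++) {
        /* The buffer filled most recently is the one NOT selected for the
         * next read. */
        char *ready = bufs[1 - current];

        /* Prefetch the next round into the other buffer (a helper thread
         * would do this in a threaded implementation). */
        if (round + 1 < SKETCH_ROUNDS)
            sketch_fill(bufs[current], round + 1);

        memcpy(out + round * SKETCH_CHUNK, ready, SKETCH_CHUNK);
        current = 1 - current;       /* toggle after every read */
    }
}
```

In this sketch, choosing `bufs[current]` instead of `bufs[1 - current]` as the ready buffer would distribute data from the buffer that is about to be overwritten, which is the kind of wrong-buffer error the commit message describes.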

http://git.mpich.org/mpich.git/commitdiff/700cd1569d9c6e8b69dbba03a452849101685e08

commit 700cd1569d9c6e8b69dbba03a452849101685e08
Author: Min Si <msi at il.is.s.u-tokyo.ac.jp>
Date:   Thu Dec 4 17:01:11 2014 -0600

    Fix win size translation in attrlangf90 test.
    
    This test passed a literal 0 as the size argument to win_create. The
    Fortran compiler treats the literal as a default (32-bit) integer, but
    the Fortran binding hands it to the C mpi_win_create as an MPI_Aint
    (64-bit), which is invalid; nothing catches this because prototype
    checking is not supported. The test can fail if mpi_win_create
    internally initializes resources based on the value of size (e.g., mxm
    maps the win buffer in win_init).
    
    This patch fixes the issue by passing a 64-bit local variable as the
    size parameter instead of the constant 0 in this f90 test.
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>

diff --git a/test/mpi/f90/attr/attrlangf90.f90 b/test/mpi/f90/attr/attrlangf90.f90
index e1cedbc..d126631 100644
--- a/test/mpi/f90/attr/attrlangf90.f90
+++ b/test/mpi/f90/attr/attrlangf90.f90
@@ -142,6 +142,7 @@
       implicit none
       integer ierr
       integer errs, tv, rank
+      integer(MPI_ADDRESS_KIND) tmp
 
       errs = 0
       call MPI_INIT( ierr )
@@ -193,7 +194,8 @@
            & )
 !
 !     Create a window to use with the attribute tests in Fortran
-      call MPI_WIN_CREATE( MPI_BOTTOM, 0, 1, MPI_INFO_NULL,&
+      tmp = 0
+      call MPI_WIN_CREATE( MPI_BOTTOM, tmp, 1, MPI_INFO_NULL,&
            & MPI_COMM_WORLD, win, ierr )
 !
       if (fverbose) then
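
Editorial sketch (not part of the commit): the size mismatch described in the commit message above can be modeled in C. Fortran passes arguments by reference, so a callee that expects a 64-bit value reads 8 bytes from an address where the caller may have stored only a 4-byte integer. The sketch below simulates that safely with `memcpy` on an explicit 8-byte region; the function names are hypothetical, `int64_t` and `int32_t` stand in for `MPI_Aint` and a default Fortran integer, and the 0xFF filler stands in for whatever happens to follow the literal in memory.

```c
#include <stdint.h>
#include <string.h>

/* Models a C routine that receives its size argument by reference and
 * reads a 64-bit value from that address. */
int64_t sketch_read_size(const void *size_arg)
{
    int64_t size;
    memcpy(&size, size_arg, sizeof size);
    return size;
}

/* Models the old call: only the first 4 bytes hold the literal 0. The
 * remaining 4 bytes belong to something else; here they are 0xFF filler. */
int64_t sketch_pass_default_integer_zero(void)
{
    unsigned char storage[8];
    int32_t zero = 0;

    memset(storage, 0xFF, sizeof storage);
    memcpy(storage, &zero, sizeof zero);
    return sketch_read_size(storage);
}

/* Models the patched call: the caller owns a full 64-bit variable. */
int64_t sketch_pass_address_kind_zero(void)
{
    int64_t tmp = 0;
    return sketch_read_size(&tmp);
}
```

With the filler bytes, the first helper yields a non-zero size on both little-endian and big-endian hosts, while the second always yields 0. In a real program the neighboring bytes are unpredictable, which is why the original test could pass with some devices and fail with others.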

http://git.mpich.org/mpich.git/commitdiff/ccdc619a9e542ffeec5ca027d1d4c15a5b171d2e

commit ccdc619a9e542ffeec5ca027d1d4c15a5b171d2e
Author: James Dinan <james.dinan at intel.com>
Date:   Thu Nov 20 08:27:49 2014 -0500

    Add dynamic err code, predefined err class test
    
    Test for correct error class when a dynamic error code is created from a
    predefined error class.
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/test/mpi/errhan/Makefile.am b/test/mpi/errhan/Makefile.am
index 2d28fa9..2098592 100644
--- a/test/mpi/errhan/Makefile.am
+++ b/test/mpi/errhan/Makefile.am
@@ -19,7 +19,8 @@ noinst_PROGRAMS = \
     commcall      \
     errfatal      \
     predef_eh     \
-    errstring2
+    errstring2    \
+    dynamic_errcode_predefined_errclass
 
 EXTRA_PROGRAMS = errcode errring errstring
 
diff --git a/test/mpi/errhan/dynamic_errcode_predefined_errclass.c b/test/mpi/errhan/dynamic_errcode_predefined_errclass.c
new file mode 100644
index 0000000..3fa1c9f
--- /dev/null
+++ b/test/mpi/errhan/dynamic_errcode_predefined_errclass.c
@@ -0,0 +1,30 @@
+/*
+ *  (C) 2006 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ *
+ *  Portions of this code were written by Intel Corporation.
+ *  Copyright (C) 2011-2012 Intel Corporation.  Intel provides this material
+ *  to Argonne National Laboratory subject to Software Grant and Corporate
+ *  Contributor License Agreement dated February 8, 2012.
+ */
+
+#include <stdio.h>
+#include <mpi.h>
+
+int main(int argc, char **argv) {
+    int errcode, errclass;
+
+    MPI_Init(&argc, &argv);
+
+    MPI_Add_error_code(MPI_ERR_ARG, &errcode);
+    MPI_Error_class(errcode, &errclass);
+
+    if (errclass != MPI_ERR_ARG) {
+        printf("ERROR: Got 0x%x, expected 0x%x\n", errclass, MPI_ERR_ARG);
+    } else {
+        printf( " No Errors\n" );
+    }
+
+    MPI_Finalize();
+    return 0;
+}
diff --git a/test/mpi/errhan/testlist b/test/mpi/errhan/testlist
index 8ddc826..ebd60ff 100644
--- a/test/mpi/errhan/testlist
+++ b/test/mpi/errhan/testlist
@@ -4,3 +4,4 @@ errfatal 1 resultTest=TestErrFatal
 predef_eh 1
 predef_eh 2
 errstring2 1
+dynamic_errcode_predefined_errclass 1

http://git.mpich.org/mpich.git/commitdiff/260a0401ed51c3c1f156be22a8c3f33f17da0a83

commit 260a0401ed51c3c1f156be22a8c3f33f17da0a83
Author: James Dinan <james.dinan at intel.com>
Date:   Thu Nov 20 08:20:18 2014 -0500

    Fix error class bug in MPI_Add_error_code
    
    During error code creation, the error class was erroneously modified by
    applying ERROR_DYN_MASK.  The dynamic bit is already set for
    user-defined error classes, so this bug had no effect in any existing
    MPICH test.  However, when a predefined error class was passed during
    error code creation, it would be incorrectly marked as dynamic, so
    MPI_Error_class returned an invalid error class for the resulting
    error code.
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/src/mpi/errhan/dynerrutil.c b/src/mpi/errhan/dynerrutil.c
index 2e3bb27..8dddf5a 100644
--- a/src/mpi/errhan/dynerrutil.c
+++ b/src/mpi/errhan/dynerrutil.c
@@ -256,7 +256,7 @@ int MPIR_Err_add_code( int class )
     /* --END ERROR HANDLING-- */
 
     /* Create the full error code */
-    new_code = class | ERROR_DYN_MASK | (new_code << ERROR_GENERIC_SHIFT);
+    new_code = class | (new_code << ERROR_GENERIC_SHIFT);
 
     /* FIXME: For robustness, we should make sure that the associated string
        is initialized to null */
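
Editorial sketch (not part of the commit): the effect of the one-line change above can be reproduced with plain bit arithmetic. The mask and shift values below are assumptions chosen for illustration and are not taken from this diff, and the class-extraction rule (keep the class bits plus the dynamic bit) is likewise an assumption about how the class is recovered from a code. All `sketch_` names are hypothetical.

```c
/* Hypothetical bit layout, for illustration only. */
#define SKETCH_CLASS_MASK    0x0000007f  /* low bits hold the error class */
#define SKETCH_DYN_MASK      0x40000000  /* marks a user-defined (dynamic) class */
#define SKETCH_GENERIC_SHIFT 8           /* code index sits above the class bits */

/* Old behavior: the dynamic bit is forced on for every new code. */
int sketch_add_code_buggy(int errclass, int index)
{
    return errclass | SKETCH_DYN_MASK | (index << SKETCH_GENERIC_SHIFT);
}

/* Patched behavior: the class is used as given; a user-defined class
 * already carries the dynamic bit. */
int sketch_add_code_fixed(int errclass, int index)
{
    return errclass | (index << SKETCH_GENERIC_SHIFT);
}

/* Assumed class extraction: class bits plus the dynamic bit. */
int sketch_error_class(int errcode)
{
    return errcode & (SKETCH_CLASS_MASK | SKETCH_DYN_MASK);
}
```

Under these assumptions, a predefined class such as 12 round-trips unchanged through `sketch_add_code_fixed`, while `sketch_add_code_buggy` returns `12 | SKETCH_DYN_MASK`. For a class that already has the dynamic bit set, both variants agree, which matches the commit message's remark that existing tests were unaffected.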

http://git.mpich.org/mpich.git/commitdiff/e2862b51b471a1e219aba97139d3ba476534b76a

commit e2862b51b471a1e219aba97139d3ba476534b76a
Author: Kuleshov Aleksey <rndfax at yandex.ru>
Date:   Mon Dec 1 09:45:30 2014 -0600

    Fix bug in using wrong type for packsize.
    
    Type of the third argument for MPIR_Pack_size_impl should be a pointer
    to MPI_Aint. This patch fixes the wrong usage of int pointer for
    MPIR_Pack_size_impl in NEWMAD and MXM netmods.
    
    Signed-off-by: Sangmin Seo <sseo at anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_poll.c b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_poll.c
index e8bddc3..752a1f9 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_poll.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_poll.c
@@ -482,8 +482,8 @@ static int _mxm_process_rdtype(MPID_Request ** rreq_p, MPI_Datatype datatype,
         *iov_count = n_iov;
     }
     else {
-        int packsize = 0;
-        MPIR_Pack_size_impl(rreq->dev.user_count, rreq->dev.datatype, (MPI_Aint *) & packsize);
+        MPI_Aint packsize = 0;
+        MPIR_Pack_size_impl(rreq->dev.user_count, rreq->dev.datatype, &packsize);
         rreq->dev.tmpbuf = MPIU_Malloc((size_t) packsize);
         MPIU_Assert(rreq->dev.tmpbuf);
         rreq->dev.tmpbuf_sz = packsize;
diff --git a/src/mpid/ch3/channels/nemesis/netmod/newmad/newmad_poll.c b/src/mpid/ch3/channels/nemesis/netmod/newmad/newmad_poll.c
index 2dba872..5a32515 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/newmad/newmad_poll.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/newmad/newmad_poll.c
@@ -575,7 +575,7 @@ int MPID_nem_newmad_process_rdtype(MPID_Request **rreq_p, MPID_Datatype * dt_ptr
     }
     else
     {
-	int packsize = 0;
+	MPI_Aint packsize = 0;
 	MPIR_Pack_size_impl(rreq->dev.user_count, rreq->dev.datatype, &packsize);
 	rreq->dev.tmpbuf = MPIU_Malloc((size_t) packsize);
 	MPIU_Assert(rreq->dev.tmpbuf);
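
Editorial sketch (not part of the commit): the pattern adopted in both hunks above is to declare the out-variable with the same type the callee writes through, so that no pointer cast is needed. When an `int` is passed through a cast to a wider pointer type, the callee stores more bytes than the variable owns. The sketch below shows only the corrected pattern; `sketch_aint` is a stand-in for `MPI_Aint` (assumed 64-bit here), and `sketch_pack_size` is a hypothetical analogue of the pack-size routine, not its real signature.

```c
#include <stddef.h>
#include <stdint.h>

typedef int64_t sketch_aint;  /* stand-in for MPI_Aint; 64-bit is an assumption */

/* Hypothetical analogue of a pack-size query: reports count * elem_size
 * through an out-parameter of the wide type. */
void sketch_pack_size(int count, size_t elem_size, sketch_aint *size_out)
{
    *size_out = (sketch_aint) count * (sketch_aint) elem_size;
}

/* Caller in the patched style: the local has the callee's type, so the
 * full-width store lands entirely inside the variable and sizes beyond
 * the range of int are preserved. */
sketch_aint sketch_tmpbuf_size(int count, size_t elem_size)
{
    sketch_aint packsize = 0;

    sketch_pack_size(count, elem_size, &packsize);
    return packsize;
}
```

For example, `sketch_tmpbuf_size(1 << 20, 4096)` yields 4294967296, a value that a 32-bit `int` could not represent.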

http://git.mpich.org/mpich.git/commitdiff/22734a653a37a907d64d3bdc840ab09cea8a0fe3

commit 22734a653a37a907d64d3bdc840ab09cea8a0fe3
Author: Pavan Balaji <balaji at anl.gov>
Date:   Sat Nov 22 11:21:25 2014 -0600

    Move static function to sit inside !defined MPICH_MPI_FROM_PMPI.
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/src/mpi/comm/intercomm_merge.c b/src/mpi/comm/intercomm_merge.c
index 001201e..63df40c 100644
--- a/src/mpi/comm/intercomm_merge.c
+++ b/src/mpi/comm/intercomm_merge.c
@@ -21,6 +21,12 @@ int MPI_Intercomm_merge(MPI_Comm intercomm, int high, MPI_Comm *newintracomm) __
 /* -- End Profiling Symbol Block */
 
 
+/* Define MPICH_MPI_FROM_PMPI if weak symbols are not supported to build
+   the MPI routines */
+#ifndef MPICH_MPI_FROM_PMPI
+#undef MPI_Intercomm_merge
+#define MPI_Intercomm_merge PMPI_Intercomm_merge
+
 /* This function creates VCRT for new communicator
  * basing on VCRT of existing communicator.
  */
@@ -73,12 +79,6 @@ fn_fail:
     return mpi_errno;
 }
 
-/* Define MPICH_MPI_FROM_PMPI if weak symbols are not supported to build
-   the MPI routines */
-#ifndef MPICH_MPI_FROM_PMPI
-#undef MPI_Intercomm_merge
-#define MPI_Intercomm_merge PMPI_Intercomm_merge
-
 #undef FUNCNAME
 #define FUNCNAME MPIR_Intercomm_merge_impl
 #undef FCNAME

http://git.mpich.org/mpich.git/commitdiff/da720427ca009f3cab14aa4b9cc491fbfad68d61

commit da720427ca009f3cab14aa4b9cc491fbfad68d61
Author: Pavan Balaji <balaji at anl.gov>
Date:   Sat Nov 22 11:19:58 2014 -0600

    No need to namespace a static function that is unused outside.
    
    Also remove the function prototype declaration since it is not used
    out-of-order.
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/src/mpi/comm/intercomm_merge.c b/src/mpi/comm/intercomm_merge.c
index e12d991..001201e 100644
--- a/src/mpi/comm/intercomm_merge.c
+++ b/src/mpi/comm/intercomm_merge.c
@@ -20,19 +20,16 @@ int MPI_Intercomm_merge(MPI_Comm intercomm, int high, MPI_Comm *newintracomm) __
 #endif
 /* -- End Profiling Symbol Block */
 
-/* These functions help implement the merge procedure */
-static int MPIR_Intercomm_merge_create_and_map_vcrt(MPID_Comm *comm_ptr, int local_high, MPID_Comm *new_intracomm_ptr);
-
 
 /* This function creates VCRT for new communicator
  * basing on VCRT of existing communicator.
  */
 
 #undef FUNCNAME
-#define FUNCNAME MPIR_Intercomm_merge_create_and_map_vcrt
+#define FUNCNAME create_and_map_vcrt
 #undef FCNAME
 #define FCNAME MPIDI_QUOTE(FUNCNAME)
-static int MPIR_Intercomm_merge_create_and_map_vcrt(MPID_Comm *comm_ptr, int local_high, MPID_Comm *new_intracomm_ptr)
+static int create_and_map_vcrt(MPID_Comm *comm_ptr, int local_high, MPID_Comm *new_intracomm_ptr)
 {
     int mpi_errno = MPI_SUCCESS;
     int i, j;
@@ -174,7 +171,7 @@ int MPIR_Intercomm_merge_impl(MPID_Comm *comm_ptr, int high, MPID_Comm **new_int
 
     /* Now we know which group comes first.  Build the new vcr
        from the existing vcrs */
-    mpi_errno = MPIR_Intercomm_merge_create_and_map_vcrt(comm_ptr, local_high, (*new_intracomm_ptr));
+    mpi_errno = create_and_map_vcrt(comm_ptr, local_high, (*new_intracomm_ptr));
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
 
     /* We've setup a temporary context id, based on the context id
@@ -208,7 +205,7 @@ int MPIR_Intercomm_merge_impl(MPID_Comm *comm_ptr, int high, MPID_Comm **new_int
     (*new_intracomm_ptr)->context_id = new_context_id;
     (*new_intracomm_ptr)->recvcontext_id = new_context_id;
 
-    mpi_errno = MPIR_Intercomm_merge_create_and_map_vcrt(comm_ptr, local_high, (*new_intracomm_ptr));
+    mpi_errno = create_and_map_vcrt(comm_ptr, local_high, (*new_intracomm_ptr));
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
 
     mpi_errno = MPIR_Comm_commit((*new_intracomm_ptr));

http://git.mpich.org/mpich.git/commitdiff/01d09be1ee869a0b9848dc2b8e3f48bca4980bf1

commit 01d09be1ee869a0b9848dc2b8e3f48bca4980bf1
Author: Pavan Balaji <balaji at anl.gov>
Date:   Sat Nov 22 11:07:46 2014 -0600

    Remove old unused errnames.
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/src/mpi/errhan/errnames-old.txt b/src/mpi/errhan/errnames-old.txt
deleted file mode 100644
index 213b2c8..0000000
--- a/src/mpi/errhan/errnames-old.txt
+++ /dev/null
@@ -1,286 +0,0 @@
-# These are unused error names, saved in case they're added again
-**abort:application called MPI_ABORT
-**allocmem %d %d:Unable to allocate %d memory for MPI_Alloc_mem; only %d available
-**argaddress:Address of location given to MPI_ADDRESS does not fix in a \
-Fortran integer
-**argaddress %ld:Address of location given to MPI_ADDRESS does not fix in a \
-Fortran integer (value is %ld)
-**argarray:Invalid value in array
-**argarray %s %d %d:Invalid value in %s[%d] = %d
-**argnamed:Invalid argument
-**argnamed %s %d:Invalid argument %s with value %d
-#
-# ch3:essm
-#
-**argstr_shmevent:shared memory event not found in the business card
-**event_create:unable to create an event
-**event_open:unable to open an event
-**event_reset:unable to reset an event
-**event_set:unable to set an event
-**event_wait:unable to wait on an event
-**postwrite:postwrite failed
-**postwrite %p %p:postwrite failed (%p %p)
-#
-**attrcopy:User defined attribute copy routine returned a non-zero return code
-**attrcopy %d:User defined attribute copy routine returned a non-zero return code %d
-**bad_conn:bad conn structure pointer
-**bad_conn %p %p:bad conn structure pointer (%p != %p)
-**badpacket:Received a packet of unknown type
-**badpacket %d:Received a packet of unknown type (%d)
-**bad_sock %d %d:bad sock (%d != %d)
-**base %d:Invalid base address %d
-**boot_attach:failed to attach to a bootstrap queue
-**boot_attach %s:failed to attach to a bootstrap queue - %s
-**bootqmsg %d %d:invalid bootstrap queue message size (%d bytes > %d)
-**bsendnobuf:No buffer to detach. 
-**bufalias %s %s:Buffer parameters %s and %s must not be aliased 
-**bufsize:Invalid buffer size
-**bufsize %d:Invalid buffer size (value is %d)
-**ca:invalid completion action
-**ca %d:invalid completion action (%d)
-**cancelperrecv:Cancellation of persistent receive requests is not supported
-**cancelpersend:Cancellation of persistent send requests is not supported
-**cancelsend:Cancellation of send requests is not supported
-# CH3:rdma
-**ch3_finalize:Channel finalization failed
-**ch3progress:Unable to make message passing progress
-**ch3progress %d:Channel progress engine failed on line %d
-**ch3_init:Channel init failed
-**ch3_send:send failed
-**read_progress:Unable to make read progress
-**handle_read:Unable to handle the read data
-**process_group:Process group initialization failed
-**rdma_finalize:Channel rdma finalization failed
-**rdma_init:Channel rdma initialization failed
-**poke:progress_poke failed
-**postpkt:Unable to post a read for the next packet header
-**write_progress:Write progress failed
-#
-**ch3|putpkt:UNIMPLEMENTED: unable to handling put packets
-**ch3|sock|addrinuse %d:[ch3:sock] tcp port %d already in use
-**ch3|sock|badbuscard:[ch3:sock] GetHostAndPort - Invalid business card
-**ch3|sock|badbuscard %s:[ch3:sock] GetHostAndPort - Invalid business card (%s)
-**ch3|sock|bizcard_cache:business card does not match the one in the cache
-**ch3|sock|bizcard_cache %s %s:business card in cache: %s, business card passed: %s
-**ch3|sock|connallocfailed:[ch3:sock] unable to allocate a connection structure
-**ch3|sock|connfailed %d %d:[ch3:sock] failed to connnect to remote process %d:%d
-**ch3|sock|hostlookup %s %d %s:[ch3:sock] failed to obtain host information for process %s:%d (%s)
-# 
-**fcntl:fcntl failed
-**needthreads:This function needs threads and threads have not been enabled
-**winpassive:Attempt to use passive target access with a window not allocated \
-with MPI_Alloc_mem. 
-**namepublish %s:Unable to publish service name %s
-**spawnfail:Could not spawn all requested processes 
-**failure:unknown failure
-**post_write %p %p:Unable to post a write (%p %p)
-**socket %d:WSASocket failed (errno %d)
-**shmq:invalid shm queue pointer
-**mx_get_info:mx_get_info failed
-**internrc %d:Internal MPI error!  Unexpected return code from internal function (rc=%d).
-**mpi_status_f2c:MPI_Status_f2c failed
-**hostlookup:Host lookup failed
-**mpi_wtick:MPI_Wtick failed
-**ioneedwr:Write access is required to this file
-**winnamelen:Specified window object name is too long
-**mpi_type_test:MPI_Type_test failed
-**iofstype:Cannot determine filesystem type
-**countarray:Invalid count in count array 
-**dimstensor %d %d:Tensor product size is %d but must be the same as the number of \
- nodes, which is %d
-**shmctl %d:shmctl failed, error %d
-**keyvalwin:Keyval is not in window object 
-**rangeinvalid:Invalid range
-**f90typetoomany %s %d:Too many requests for unnamed, predefined f90 %s \
- types; no more than %d allowed.
-**nameservice %s:Invalid service name %s (see MPI_Publish_name)
-**namepublish:Unable to publish service name
-**connclose:active connection unexpectedly closed
-**mpi_pcontrol %d:MPI_Pcontrol(level=%d) failed
-**dtypecomm:Pack buffer not packed for this communicator.
-**spawnpgm:The named program could not be found
-**listen %d:listen failed (errno %d)
-**badsock:internal error - bad sock
-**pkt_type:invalid packet type
-**pkt_type %d:invalid packet type (%d)
-**connterm:active connection unexpectedly terminated
-**opundefined_rma %d:RMA target received unknown RMA operation type %d
-**duphandle %d:unable to duplicate a handle (errno %d)
-**keyvalperm:Cannot free permanent attribute key
-**inet_ntop:inet_ntop failed
-**notsamevalue:Arguments to collective routine must be the same
-**invalid_shmq:invalid shm queue pointer
-**startup:Error on startup, such as a \
-      mismatch between mpiexec and the MPI libraries
-**ioamodeseq %s:Cannot use function %s when the file is opened with amode \
-    MPI_MODE_SEQUENTIAL
-**preinit %s:MPI_Init or MPI_Init_thread must be called before %s
-**rootlarge:Value of root is too large
-**dtypepermcontents:Cannot get contents of a permanent or basic data type 
-**connfailed %d %d:Failed to connect to remote process %d-%d
-**fcntl %s:fcntl failed - %s
-**getinfo %d:getaddrinfo failed (errno %d)
-**contextIdInUse:Context id already in use
-**mpi_wtime:MPI_Wtime failed
-**intracomm:Intracommunicator is not allowed 
-**infovallong %s %d %d:Value %s is too long (length is %d but maximum length is %d)
-**sock_gethost %d:gethostname failed (errno %d)
-**intercommcoll %s:Intercommunicator collective operation for %s has not been implemented
-**iofilenull:Null file handle
-**sock_iocp:unable to create an I/O completion port
-**dev|pg_destroy|pg_not_found:process group being destroyed was not in the process group list
-**dev|pg_destroy|pg_not_found %p:process group being destroyed was not in the process group list (pg=%p)
-**servicename %s:Attempt to lookup an unknown service name %s
-**multi_post_write:posting a write while a previously posted write is outstanding
-**mx_get_info %s:mx_get_info failed (%s)
-**statusignore:Invalid use of MPI_STATUS_IGNORE or MPI_STATUSES_IGNORE
-**gethostbyname %d:gethostbyname failed (errno %d)
-**fileopunsupported %s:Unsupported file operation %s
-**notsameorder:Collective routines called in an inconsistent order
-#**notsameorder:Collective routines called in an inconsistent order (optional arguments: null terminated
-#      array of names (array of string))
-**sock_byname %d:gethostbyname failed (errno %d)
-**sockw_badwsethnd: Invalid handle to waitSet
-**mpi_pcontrol:MPI_Pcontrol failed
-**failure %d:unknown failure, error %d
-**shmdt %s:shmdt failed - %s
-**preinit:MPI_Init or MPI_Init_thread must be called first
-**not_in_local_ranks:cannot find our rank in the list of local processes
-**inet_pton %s:inet_pton failed - %s
-**fileinuse %s:File %s is in use by some process
-**ioasyncwaiting:There are outstanding nonblocking I/O operations on this file
-**init_comm_create:unable to create an intercommunicator for the parent
-**post_sock_write_on_shm:posting a socket read on a shm connection
-**rootlarge %d %d:Value of root is too large (value is %d but must be less than %d)
-**mx_wait:mx_wait failed
-**init_strtok_host:failed to copy the hostname from the business card
-**iosequnsupported %s:MPI_MODE_SEQUENTIAL not supported on file system %s
-**keyvalcomm:Keyval is not in communicator 
-**commpeer:Peer communicator is not valid 
-**iofstypeunsupported %s:Specified filesystem %s is not available
-**GetMemTwice:Global shared memory initializer called more than once
-**inet_addr %s %s %d: inet_addr on %s failed, %s (errno %d)
-**othersys %s:System resource (%s) limit exceeded 
-**getsockopt %s:getsockopt failed - %s
-**filenospace %s %d %d:Not enough space for file %s; %d needed but only %d available
-**ibv_open_device %p:ibv_open_device failed got list (%p)
-**init_comm_create %d:unable to create an intercommunicator for the parent (error %d)
-**ibu_op:invalid infiniband operation
-**iofilecorrupt:File corrupt
-**shutdown %d:shutdown failed (errno %d)
-**sock_iocp %d:unable to create an I/O completion port (errno %d)
-**post_write:Unable to post a write
-**keyvaldtype:Keyval is not in datatype 
-**errhandlerperm:Cannot free permanent error handler
-**closesocket %d:closesocket failed (errno %d)
-**shm_wait:wait function failed
-**opundefined_rma:RMA target received unknown RMA operation
-**ibwrite:infiniband write failed
-**ioneedrdwr:Read/write access is required to this file
-**dimsarray %d %d:Value of dims[%d] is %d which is invalid
-**select %s:select failed - %s
-**iodispnotcurrent:Displacement must be set to MPI_DISPLACEMENT_CURRENT \
-    since file was opened with MPI_MODE_SEQUENTIAL
-**connfailed:Failed to connect to remote process
-**dtypeperm %s:Cannot free permanent data type %s
-**mpi_status_c2f:MPI_Status_c2f failed
-**boot_tostring:unable to get a string representation of the boostrap queue
-**porttimeout:Time out attempting an MPI_Comm_connect to a port
-**contextIdNotInUse %d:Context id %d to be freed is not in use
-**dtypepermcontents %s:Cannot get contents of a permanent or basic data type %s
-**countarray %d %d:Invalid count in the %d element of the count array; value \
-is %d
-**mpi_status_c2f %p %p:MPI_Status_c2f(c_status=%p, f_status=%p) failed
-**conn_still_active:connection closed while still active
-**sock_post_close %d:posting a close of the socket failed (errno %d)
-**servicenameunpublish:Attempt to unpublish an unknown service name
-**grapharraysize:Specified edge less than zero or greater than nnodes
-**commname:Cannot set name in communicator 
-**dimsarray:Invalid dimension argument in array 
-**infokeylong %s %d %d:Key %s is too long (length is %d but maximum allowed is %d)
-**count %d:Invalid count, value = %d
-**invalid_refcount %d %p %d:invalid reference count (handle=%d, object=%p, count=%d)
-**sock_create %d:unable to create a socket (errno %d)
-**nouniquehigh:Could not determine which group to place first in merged \
- intracommunicator.  Please use the parameter high to choose which group \
- should be placed first.
-**mx_test:mx_test failed
-**mx_test %s:mx_test failed (%s)
-**dtypenullarray:Null datatype in array of datatypes
-**dimstensor:Tensor product size does not match nnodes
-**spawnpgm %s:The program %s could not be found
-**hostlookup %d %d %s:Host lookup failed for process group %d, rank %d, business card <%s>
-**commlocalnull:Local communicator must not be MPI_COMM_NULL
-**init_vcrdup:failed to duplicate the virtual connection reference
-**sock_wait:sock_wait failed
-**invalid_handle %d %p:invalid handle (handle=%d, object=%p)
-**connrefused %d %d %s:Connection refused for process group %d, rank %d, business card <%s>
-**grapharraysize %d %d %d:Specified edge %d is %d but must be at least zero \
- and less than %d
-**init_strtok_host %s:failed to copy the hostname from this business card: %s
-**contextIdNotInUse:Context id to be freed is not in use
-**fileexist %s:File %s exists
-**filerdonly %s:Read-only file or filesystem name %s
-**notsameroot:Inconsistent root 
-**wsasock %d:WSAStartup failed (errno %d)
-**contextIdInUse %d:Context id %d already in use
-**mpi_status_f2c %p %p:MPI_Status_f2c(f_status=%p, c_status=%p) failed
-**rsendnomatch %d %d %d:Ready send from source %d, for destination %d and \
- with tag %d had no matching receive
-**iocp %d:unable to create an I/O completion port (errno %d)
-**winnotinit: Attempt to use an MPI RMA function that requires an MPI Window \
- object before creating any MPI Window object
-**inet_ntop %s:inet_ntop failed - %s
-**finalize_progress:finalizing the progress engine failed
-**inet_pton:inet_pton failed
-**filenull:Null MPI_File 
-**mx_wait %s:mx_wait failed (%s)
-**nomem %d %d:Out of memory (requested %d but only %d available)
-**fileamode %d:Invalid amode value of %d in MPI_File_open 
-**ibu_op %d:invalid infiniband operation (%d)
-**dtypenomatch:Type signatures do not match in communication
-**shmdt:shmdt failed
-**winname:Cannot set window object name 
-**multi_post_read:posting a read while a previously posted read is outstanding
-**progress_finalize:finalization of the progress engine failed
-**iofstype %s:Cannot determine filesystem type for file %s
-**keyvalnull:Null keyval 
-**startup %s:Error on startup: reason is %s
-**porttimeout %s:Time out attempting an MPI_Comm_connect to a port named %s
-**invalid_listener:invalid listener
-**invalid_listener %p:invalid listener (%p)
-**invalid_shmq %p %p:invalid shm queue pointer (%p != %p)
-**freemembase:Invalid base address in MPI_Free_mem 
-**spawnmanager:The process manager returned an error
-**spawnmanager %s:The process manager returned an error: %s
-**commnamelen:Specified communicator name is too long
-**commnamelen %d:Specified communicator name is too long (%d characters)
-**connallocfailed:Connection failed
-**connrefused:Connection refused
-**datarepunsupported %s:Unsupported datarep %s passed to MPI_File_set_view
-**errhandlerperm %s:Cannot free permanent error handler %s
-**filequota %s:Quota %s exceeded for files
-**intercommcoll:Intercommunicator collective operations have not been implemented
-**intercomm:Intercommunicator is not allowed 
-**iosharedunsupported %s:Shared file pointers not supported on filesystem %s
-**notsamevalue %s %s:Argument %s to collective routine %s must be the same
-**othersys:System resource limit exceeded 
-**pfinal_sockclose:sock_close failed
-**servicenameunpublish %s:Attempt to unpublish an unknown service name %s
-**setsockopt %s:setsockopt failed - %s
-**shmq %p %p:invalid shm queue pointer (%p != %p)
-**shmw_serbufsmall: Size of buffer to serialize shared memory handle    \
-is too small (< MPIU_SHMW_GHND_SZ)
-**sock_byname:gethostbyname failed
-**sock_byname %s %d:gethostbyname failed, %s (errno %d)
-**sock_connect %d:connect failed (errno %d)
-**sock_connect %s %d %d:unable to connect to %s on port %d, error %d
-**sockw_badtvalhnd: Invalid handle to timeval
-**sockw_badwsetshnd: Invalid handle to waitSet sock handle
-**winnamelen %d:Specified window object name is too long (%d characters)
-**ch3ireadaggressive:aggressive reading failed
-**dtypename:Cannot set name in data type 
-**dtypenullarray %s %d:Null datatype in array of datatypes %s[%d]
-**dev|pg_destroy_failed:attempt to destroy a process group failed
-**dev|pg_destroy_failed %p:attempt to destroy a process group failed (pg=%p)
-**pmi_kvs_get_parent %d:unable to get the PARENT_ROOT_PORT_NAME from the keyval space (pmi_error %d)
diff --git a/src/mpid/ch3/errnames-old.txt b/src/mpid/ch3/errnames-old.txt
deleted file mode 100644
index 6370c9c..0000000
--- a/src/mpid/ch3/errnames-old.txt
+++ /dev/null
@@ -1,12 +0,0 @@
-# Previously used error names
-**ch3|badca %d:specified completion action in not known (%d)
-**ch3|badca:specified completion action in not known
-**ch3|canceleager:failure occurred while performing local cancellation of a eager message
-**ch3|flowcntlpkt:UNIMPLEMENTED: unable to handle flow control packets
-**ch3|get_parent_port_err_bcast:an error occurred while broadcasting the error code from MPIDI_CH3_GetParentPort()
-**ch3|get_universe_size_notimpl: MPIDI_CH3_Get_universe_size() is not implemented
-**ch3|loadrecviov %s:failure occurred while loading the receive I/O vector (%s)
-**ch3|recvdata:failure occurred while attempting to receive message data
-**ch3|recvdata %s:failure occurred while attempting to receive message data (%s)
-**ch3|unknownpkt:received unknown packet type
-**ch3|unknownpkt %d:received unknown packet type (type=%d)
diff --git a/src/mpid/ch3/util/sock/errnames-old.txt b/src/mpid/ch3/util/sock/errnames-old.txt
deleted file mode 100644
index 57d9cf4..0000000
--- a/src/mpid/ch3/util/sock/errnames-old.txt
+++ /dev/null
@@ -1,19 +0,0 @@
-# No longer used error strings
-**ch3|sock|connfailed %s %d:[ch3:sock] failed to connnect to remote process %s:%d
-**ch3|sock|pg_limit:reached the limit of process groups for spawn/connect/accept
-**ch3|sock|pgrank:rank must be less than process group size
-**ch3|sock|pgrank %d %d:pg size %d, rank passed %d
-**ch3|sock|pgrank_cache:rank must be less than process group size in the cache
-**ch3|sock|pgrank_cache %d %d:pg size in cache %d, rank passed %d
-**ch3|sock|pgsize:process group sizes don't match
-**ch3|sock|pgsize %d %d:existing pg size %d, matching pg size %d
-**ch3|sock|postwrite %p %p %p:attempt to post a write operation failed (sreq=%p,conn=%p,vc=%p)
-**ch3|sock|progress_finalize:[ch3:sock] progress_finalize failed
-**ch3|sock|strdup:[ch3:sock] MPIU_Strdup failed
-**ch3|sock|immedread %p %p %p:immediate read operation failed (rreq=%p,conn=%p,vc=%p)
-**ch3|sock|immedwrite %p %p %p:immediate write operation failed (rreq=%p,conn=%p,vc=%p)
-**ch3|sock|pg_finalize:process group finalization failed
-**ch3|sock|pgsize_cache:process group size does not match the one in the cache
-**ch3|sock|pgsize_cache %d %d:size in cache %d, size passed %d
-**ch3|sock|post_write:[ch3:sock] posting a write failed
-**ch3|sock|postread %p %p %p:attempt to post a read operation failed (rreq=%p,conn=%p,vc=%p)
diff --git a/src/mpid/common/sock/errnames-old.txt b/src/mpid/common/sock/errnames-old.txt
deleted file mode 100644
index dfb1210..0000000
--- a/src/mpid/common/sock/errnames-old.txt
+++ /dev/null
@@ -1,7 +0,0 @@
-# Unused error strings
-**sock|badbuf %d %d:the supplied buffer contains invalid memory (set=%d,sock=%d)
-**sock|closed:sock has been closed locally
-**sock|connfailed %d %d:connection failure (set=%d,sock=%d)
-**sock|hostres:unable to resolve host name to an address
-**sock|osnomem %d %d:operating system routine failed due to lack of memory (set=%d,sock=%d)
-
diff --git a/src/mpid/common/sock/poll/errnames-old.txt b/src/mpid/common/sock/poll/errnames-old.txt
deleted file mode 100644
index e53dc70..0000000
--- a/src/mpid/common/sock/poll/errnames-old.txt
+++ /dev/null
@@ -1,3 +0,0 @@
-# Unused error names
-**sock|poll|hostres %d %d %s:unable to resolve host name to an address (set=%d,sock=%d,host=%s)
-**sock|poll|oserror %s %d %s:unexpected operating system error from %s (errno=%d:%s)
diff --git a/src/pmi/errnames-old.txt b/src/pmi/errnames-old.txt
deleted file mode 100644
index 3c09d43..0000000
--- a/src/pmi/errnames-old.txt
+++ /dev/null
@@ -1,16 +0,0 @@
-# Unused pmi error names
-**pmi_finalize:PMI_Finalize failed
-**pmi_finalize %d:PMI_Finalize returned %d
-**pmi_initialized:PMI_Initialized failed
-**pmi_initialized %d:PMI_Initialized returned %d
-**pmi_kvs_create:PMI_KVS_Create failed
-**pmi_kvs_create %d:PMI_KVS_Create returned %d
-**pmi_kvs_destroy %d:PMI_KVS_Destroy returned %d
-**pmi_kvs_get %s %s %d:PMI_KVS_Get(%s, %s) returned %d
-**pmi_kvs_iter_first:PMI_KVS_iter_first failed
-**pmi_kvs_iter_first %s %d:PMI_KVS_iter_first(%s) returned %d
-**pmi_kvs_iter_first %d:PMI_KVS_iter_first returned %d
-**pmi_kvs_iter_next %s %d:PMI_KVS_iter_next(%s) returned %d
-**pmi_kvs_put %s %s %s %d:PMI_KVS_Put(%s, %s, %s) returned %d
-**pmi_kvs_iter_next:PMI_KVS_iter_next failed
-**pmi_kvs_iter_next %d:PMI_KVS_iter_next returned %d

http://git.mpich.org/mpich.git/commitdiff/f3eb60f9cab4062e205e5db27df8b0050ea73bd7

commit f3eb60f9cab4062e205e5db27df8b0050ea73bd7
Author: Paul Coffman <pkcoff at us.ibm.com>
Date:   Mon Nov 24 21:29:58 2014 -0600

    romio gpfs: select correct read buffer
    
    ROMIO GPFSMPIO_P2PCONTIG threaded read needs to toggle first read buffer
    
    When using both the GPFSMPIO_P2PCONTIG and GPFSMPIO_PTHREADIO
    optimizations there was a correctness bug when reading: in the first
    round the read buffer did not toggle to the two-phase buffer for the
    pthread reader, so data was disseminated from the wrong buffer.  The
    fix is to do the toggle after the first read.
    
    Signed-off-by: Paul Coffman <pkcoff at us.ibm.com>
    Signed-off-by: Rob Latham <robl at mcs.anl.gov>
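
The buffer toggle described in this commit message can be illustrated with a
small, self-contained double-buffering sketch. This is not the ROMIO code: the
names sketch_read_round, sketch_run and SKETCH_BUF_SIZE are hypothetical, the
read-ahead that a helper thread would perform is simulated sequentially, and
the buffer roles are an assumption made for illustration.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_BUF_SIZE 8       /* hypothetical size, for illustration only */

/* Stand-in for a file read: fills dst with a byte that identifies the round. */
static void sketch_read_round(char *dst, int round)
{
    memset(dst, 'A' + round, SKETCH_BUF_SIZE);
}

/*
 * Runs nrounds rounds of "read, then disseminate" over two buffers and stores
 * in out[r] the first byte that round r handed to the consumer.
 * toggle_after_first_read selects the fixed (1) or the buggy (0) behaviour.
 */
static void sketch_run(int nrounds, int toggle_after_first_read, char *out)
{
    char bufs[2][SKETCH_BUF_SIZE];
    int cur = 0;                /* buffer that the next read will fill */
    int r;

    assert(nrounds > 0 && out != NULL);
    memset(bufs, 0, sizeof bufs);

    /* The first round is read up front into buffer 0. */
    sketch_read_round(bufs[cur], 0);
    if (toggle_after_first_read)
        cur = 1 - cur;          /* the toggle the commit message describes */

    for (r = 0; r < nrounds; r++) {
        /* Buffer that is expected to hold the data of round r. */
        const char *data = bufs[1 - cur];

        /* Read ahead for the next round into the other buffer. A real
         * implementation would do this in a helper thread; here it is
         * done sequentially to keep the sketch deterministic. */
        if (r + 1 < nrounds)
            sketch_read_round(bufs[cur], r + 1);

        out[r] = data[0];       /* "disseminate" round r */
        cur = 1 - cur;
    }
}
```

With the toggle, every round hands out the buffer that was just filled.
Without it, the first round hands out the other, still empty, buffer while the
read-ahead overwrites the data that should have been distributed.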

diff --git a/src/mpi/romio/adio/common/p2p_aggregation.c b/src/mpi/romio/adio/common/p2p_aggregation.c
index c7c1800..292ba47 100644
--- a/src/mpi/romio/adio/common/p2p_aggregation.c
+++ b/src/mpi/romio/adio/common/p2p_aggregation.c
@@ -667,6 +667,7 @@ void ADIOI_P2PContigReadAggregation(ADIO_File fd,
 		ADIO_ReadContig(fd, read_buf,amountDataToReadThisRound,
 			MPI_BYTE, ADIO_EXPLICIT_OFFSET, currentRoundFDStart,
 			&status, error_code);
+        currentReadBuf = 1;
 
 #ifdef ROMIO_GPFS
 		endTimeBase = MPI_Wtime();

http://git.mpich.org/mpich.git/commitdiff/2251543f5b67a4f5baa0a22a8a07cb0db1844376

commit 2251543f5b67a4f5baa0a22a8a07cb0db1844376
Author: William Gropp <wgropp at illinois.edu>
Date:   Mon Nov 24 10:44:25 2014 -0600

    Make ROMIO htmldocs update link file
    
    Update the use of DOCTEXT to match the rest of MPICH: add -nolocation
    (which drops the location of the source file from the documentation)
    and ensure that the mpi.cit file contains the I/O routines as well as
    the others (this file can be used to add links to the man pages in
    other documents).
    
    Signed-off-by: Rob Latham <robl at mcs.anl.gov>

diff --git a/src/mpi/romio/Makefile.am b/src/mpi/romio/Makefile.am
index 7468dd3..b9d4e25 100644
--- a/src/mpi/romio/Makefile.am
+++ b/src/mpi/romio/Makefile.am
@@ -132,20 +132,27 @@ mandoc_path3=$(abs_top_builddir)/man/man3
 htmldoc_path1=$(abs_top_builddir)/www/www1
 htmldoc_path3=$(abs_top_builddir)/www/www3
 doctext_docnotes=
+# Provide an easily replaced url root for the generated index file.
+# You can override this with URL desired in the index file generated by doctext.
+# You can ignore this if you don't use mapnames or tohtml to add links
+# to the MPI manual pages to documents.
+htmldoc_root3="--your-url-here--"
 
 .c.man-phony:
 	$(doctextman_verbose)$(DOCTEXT) -man -mpath $(mandoc_path3) -ext 3 \
-	    -heading MPI -quotefmt $(doctext_docnotes) $<
+	    -heading MPI -quotefmt -nolocation $(doctext_docnotes) $<
 .c.html-phony:
 	$(doctexthtml_verbose)$(DOCTEXT) -html -mpath $(htmldoc_path3) \
-	    -heading MPI -quotefmt $(doctext_docnotes) $<
+	    -heading MPI -quotefmt -nolocation \
+	    -index $(htmldoc_path3)/mpi.cit -indexdir $(htmldoc_root3) \
+		$(doctext_docnotes) $<
 
 .txt.man1-phony:
 	$(doctextman_verbose)$(DOCTEXT) -man -mpath $(mandoc_path1) -ext 1 \
-	    -heading MPI -quotefmt $(doctext_docnotes) $<
+	    -heading MPI -quotefmt -nolocation $(doctext_docnotes) $<
 .txt.html1-phony:
 	$(doctexthtml_verbose)$(DOCTEXT) -html -mpath $(htmldoc_path1) \
-	    -heading MPI -quotefmt $(doctext_docnotes) $<
+	    -heading MPI -quotefmt -nolocation $(doctext_docnotes) $<
 
 # use mandoc-local target to force directory creation before running DOCTEXT
 mandoc:

http://git.mpich.org/mpich.git/commitdiff/fb709f4f658b3898a1e71fd1b077dd00b0b6c709

commit fb709f4f658b3898a1e71fd1b077dd00b0b6c709
Author: Min Si <msi at il.is.s.u-tokyo.ac.jp>
Date:   Fri Nov 21 15:03:01 2014 -0600

    Simplify test setting of min/full datatype tests.
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>

diff --git a/test/mpi/coll/bcast2.c b/test/mpi/coll/bcast2.c
index 2acfda3..40054cd 100644
--- a/test/mpi/coll/bcast2.c
+++ b/test/mpi/coll/bcast2.c
@@ -39,10 +39,9 @@ int main( int argc, char *argv[] )
 
     MTEST_DATATYPE_FOR_EACH_COUNT(count) {
 
-        /* Only run full datatype tests in comm world to shorten test time. */
-        if (comm == MPI_COMM_WORLD) {
-            MTestInitFullDatatypes();
-        } else {
+        /* To shorten test time, only run the default version of datatype tests
+         * for comm world and run the minimum version for other communicators. */
+        if (comm != MPI_COMM_WORLD) {
             MTestInitMinDatatypes();
         }
 
diff --git a/test/mpi/coll/bcast3.c b/test/mpi/coll/bcast3.c
index c78d769..2da4bf8 100644
--- a/test/mpi/coll/bcast3.c
+++ b/test/mpi/coll/bcast3.c
@@ -35,10 +35,9 @@ int main( int argc, char *argv[] )
 	count = 1;
 	MTEST_DATATYPE_FOR_EACH_COUNT(count) {
 
-        /* Only run full datatype tests in comm world to shorten test time. */
-        if (comm == MPI_COMM_WORLD) {
-            MTestInitFullDatatypes();
-        } else {
+        /* To shorten test time, only run the default version of datatype tests
+         * for comm world and run the minimum version for other communicators. */
+        if (comm != MPI_COMM_WORLD) {
             MTestInitMinDatatypes();
         }
 
diff --git a/test/mpi/pt2pt/pingping.c b/test/mpi/pt2pt/pingping.c
index c725216..d8646f7 100644
--- a/test/mpi/pt2pt/pingping.c
+++ b/test/mpi/pt2pt/pingping.c
@@ -42,10 +42,9 @@ int main( int argc, char *argv[] )
 
 	for (count = 1; count < MAX_COUNT; count = count * 2) {
 
-        /* Only run full datatype tests in comm world to shorten test time. */
-        if (comm == MPI_COMM_WORLD) {
-            MTestInitFullDatatypes();
-        } else {
+        /* To shorten test time, only run the default version of datatype tests
+         * for comm world and run the minimum version for other communicators. */
+        if (comm != MPI_COMM_WORLD) {
             MTestInitMinDatatypes();
         }
 

http://git.mpich.org/mpich.git/commitdiff/d6ef4d28ae6623f84bfcea325923189254f25dd0

commit d6ef4d28ae6623f84bfcea325923189254f25dd0
Author: Min Si <msi at il.is.s.u-tokyo.ac.jp>
Date:   Sat Nov 22 08:29:38 2014 -0600

    Set default test level of mtest-datatype through env var.
    
    Three datatype test levels are defined: basic, min, and full (the
    default).  The default level can be overridden at runtime by
    setting the environment variable MPITEST_DATATYPE_TEST_LEVEL.
    
    An MPI test can also specify a different level for each datatype
    loop by calling the corresponding datatype test initialization
    function before that loop; otherwise the default level is used.
    
        Basic     : MTestInitBasicDatatypes
        Minimum   : MTestInitMinDatatypes
        Full      : MTestInitFullDatatypes
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>
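
The level selection described in this commit message can be sketched as
follows. This is a simplified illustration, not the mtest code: all sketch_*
names are hypothetical, and the sketch compares strings exactly with strcmp,
whereas the patch itself uses prefix comparison with strncmp. Only the
variable name MPITEST_DATATYPE_TEST_LEVEL and the values basic, min and full
are taken from the commit.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum {
    SKETCH_LEVEL_FULL,
    SKETCH_LEVEL_MIN,
    SKETCH_LEVEL_BASIC
} sketch_level;

/* Maps a level string to a level. A missing, empty, or unknown value falls
 * back to the full level; an unknown value is also reported on stderr. */
static sketch_level sketch_parse_level(const char *val)
{
    if (val == NULL || val[0] == '\0')
        return SKETCH_LEVEL_FULL;
    if (strcmp(val, "min") == 0)
        return SKETCH_LEVEL_MIN;
    if (strcmp(val, "basic") == 0)
        return SKETCH_LEVEL_BASIC;
    if (strcmp(val, "full") != 0)
        fprintf(stderr, "Unknown MPITEST_DATATYPE_TEST_LEVEL %s\n", val);
    return SKETCH_LEVEL_FULL;
}

/* Reads the global default level from the environment. */
static sketch_level sketch_default_level(void)
{
    return sketch_parse_level(getenv("MPITEST_DATATYPE_TEST_LEVEL"));
}

/* A level chosen explicitly for one datatype loop takes priority over the
 * environment default; pass NULL when the test made no explicit choice. */
static sketch_level sketch_effective_level(const sketch_level *explicit_level,
                                           const char *envval)
{
    if (explicit_level != NULL)
        return *explicit_level;
    return sketch_parse_level(envval);
}

/* Printable name of a level. */
static const char *sketch_level_name(sketch_level level)
{
    static const char *const names[] = { "full", "min", "basic" };
    assert(level == SKETCH_LEVEL_FULL || level == SKETCH_LEVEL_MIN ||
           level == SKETCH_LEVEL_BASIC);
    return names[level];
}
```

In this sketch a test that calls nothing before its datatype loop would get
sketch_default_level(), while a test that requests a level explicitly keeps
that level regardless of the environment, which mirrors the priority rule
stated in the commit.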

diff --git a/test/mpi/util/mtest_datatype_gen.c b/test/mpi/util/mtest_datatype_gen.c
index 66279af..7a95da3 100644
--- a/test/mpi/util/mtest_datatype_gen.c
+++ b/test/mpi/util/mtest_datatype_gen.c
@@ -68,17 +68,26 @@ static int verbose = 0;         /* Message level (0 is none) */
  *    2. Add its create/init/check functions in file mtest_datatype.c
  *    3. Add its creator function to mtestDdtCreators variable
  *
- *  Following two datatype generators are defined.
- *    1. Full datatypes generator:
+ *  Following three test levels of datatype are defined.
+ *    1. Basic
+ *      All basic datatypes
+ *    2. Minimum
+ *      All basic datatypes | Vector | Indexed
+ *    3. Full
  *      All basic datatypes | Vector | Hvector | Indexed | Hindexed |
  *      Indexed-block | Hindexed-block | Subarray with order-C | Subarray with order-Fortran
- *    2. Minimum datatypes generator:
- *      All basic datatypes | Vector | Indexed
  *
- *  MPI test can initialize either generator by calling the corresponding init
- *  function before datatype loop, The full generator is set by default.
- *    Full generator : MTestInitFullDatatypes
- *    Minimum generator : MTestInitMinDatatypes
+ *  There are two ways to specify the test level of datatype. The second way has
+ *  higher priority (means the value specified by the first way will be overwritten
+ *  by that in the second way).
+ *  1. Specify global test level by setting the MPITEST_DATATYPE_TEST_LEVEL
+ *     environment variable before execution (basic,min,full|full by default).
+ *  2. Initialize a special level for a datatype loop by calling the corresponding
+ *     initialization function before that loop, otherwise the default value specified
+ *     in the first way is used.
+ *    Basic     : MTestInitBasicDatatypes
+ *    Minimum   : MTestInitMinDatatypes
+ *    Full      : MTestInitFullDatatypes
  */
 
 static int datatype_index = 0;
@@ -103,11 +112,17 @@ static int MTEST_RECV_DDT_NUM_TESTS = 0;
 static int MTEST_RECV_DDT_RANGE = 0;
 
 enum {
-    MTEST_DATATYPE_VERSION_FULL,
-    MTEST_DATATYPE_VERSION_MIN
+    MTEST_DATATYPE_TEST_LEVEL_FULL,
+    MTEST_DATATYPE_TEST_LEVEL_MIN,
+    MTEST_DATATYPE_TEST_LEVEL_BASIC,
 };
 
-static int MTEST_DATATYPE_VERSION = MTEST_DATATYPE_VERSION_FULL;
+/* current datatype test level */
+static int MTEST_DATATYPE_TEST_LEVEL = MTEST_DATATYPE_TEST_LEVEL_FULL;
+/* default datatype test level specified by environment variable */
+static int MTEST_DATATYPE_TEST_LEVEL_ENV = -1;
+/* default datatype initialization function */
+static void (*MTestInitDefaultTestFunc) (void) = NULL;
 
 static void MTestInitDatatypeGen(int basic_dt_num, int derived_dt_num)
 {
@@ -130,9 +145,9 @@ static int MTestIsDatatypeGenInited()
 
 static void MTestPrintDatatypeGen()
 {
-    MTestPrintfMsg(1, "MTest datatype version : %s. %d basic datatype tests, "
+    MTestPrintfMsg(1, "MTest datatype test level : %s. %d basic datatype tests, "
                    "%d derived datatype tests will be generated\n",
-                   (MTEST_DATATYPE_VERSION == MTEST_DATATYPE_VERSION_FULL) ? "FULL" : "MIN",
+                   (MTEST_DATATYPE_TEST_LEVEL == MTEST_DATATYPE_TEST_LEVEL_FULL) ? "FULL" : "MIN",
                    MTEST_BDT_NUM_TESTS, MTEST_SEND_DDT_NUM_TESTS + MTEST_RECV_DDT_NUM_TESTS);
 }
 
@@ -143,12 +158,12 @@ static void MTestResetDatatypeGen()
 
 void MTestInitFullDatatypes()
 {
-    /* Do not allow to change datatype version during loop.
+    /* Do not allow to change datatype test level during loop.
      * Otherwise indexes will be wrong.
      * Test must explicitly call reset or wait for current datatype loop being
-     * done before changing to another datatype version. */
+     * done before changing to another test level. */
     if (!MTestIsDatatypeGenInited()) {
-        MTEST_DATATYPE_VERSION = MTEST_DATATYPE_VERSION_FULL;
+        MTEST_DATATYPE_TEST_LEVEL = MTEST_DATATYPE_TEST_LEVEL_FULL;
         MTestTypeCreatorInit((MTestDdtCreator *) mtestDdtCreators);
         MTestInitDatatypeGen(MTEST_BDT_MAX, MTEST_DDT_MAX);
     }
@@ -159,12 +174,12 @@ void MTestInitFullDatatypes()
 
 void MTestInitMinDatatypes()
 {
-    /* Do not allow to change datatype version during loop.
+    /* Do not allow to change datatype test level during loop.
      * Otherwise indexes will be wrong.
      * Test must explicitly call reset or wait for current datatype loop being
-     * done before changing to another datatype version. */
+     * done before changing to another test level. */
     if (!MTestIsDatatypeGenInited()) {
-        MTEST_DATATYPE_VERSION = MTEST_DATATYPE_VERSION_MIN;
+        MTEST_DATATYPE_TEST_LEVEL = MTEST_DATATYPE_TEST_LEVEL_MIN;
         MTestTypeMinCreatorInit((MTestDdtCreator *) mtestDdtCreators);
         MTestInitDatatypeGen(MTEST_BDT_MAX, MTEST_MIN_DDT_MAX);
     }
@@ -173,6 +188,49 @@ void MTestInitMinDatatypes()
     }
 }
 
+void MTestInitBasicDatatypes()
+{
+    /* Do not allow to change datatype test level during loop.
+     * Otherwise indexes will be wrong.
+     * Test must explicitly call reset or wait for current datatype loop being
+     * done before changing to another test level. */
+    if (!MTestIsDatatypeGenInited()) {
+        MTEST_DATATYPE_TEST_LEVEL = MTEST_DATATYPE_TEST_LEVEL_BASIC;
+        MTestInitDatatypeGen(MTEST_BDT_MAX, 0);
+    }
+    else {
+        printf("Warning: trying to reinitialize mtest datatype during " "datatype iteration!");
+    }
+}
+
+static inline void MTestInitDatatypeEnv()
+{
+    char *envval = 0;
+
+    /* Read global test level specified by user environment variable.
+     * Only initialize once at the first time that test calls datatype routine. */
+    if (MTEST_DATATYPE_TEST_LEVEL_ENV > -1)
+        return;
+
+    /* default full */
+    MTEST_DATATYPE_TEST_LEVEL_ENV = MTEST_DATATYPE_TEST_LEVEL_FULL;
+    MTestInitDefaultTestFunc = MTestInitFullDatatypes;
+
+    envval = getenv("MPITEST_DATATYPE_TEST_LEVEL");
+    if (envval && strlen(envval)) {
+        if (!strncmp(envval, "min", strlen("min"))) {
+            MTEST_DATATYPE_TEST_LEVEL_ENV = MTEST_DATATYPE_TEST_LEVEL_MIN;
+            MTestInitDefaultTestFunc = MTestInitMinDatatypes;
+        }
+        else if (!strncmp(envval, "basic", strlen("basic"))) {
+            MTEST_DATATYPE_TEST_LEVEL_ENV = MTEST_DATATYPE_TEST_LEVEL_BASIC;
+            MTestInitDefaultTestFunc = MTestInitBasicDatatypes;
+        }
+        else if (strncmp(envval, "full", strlen("full"))) {
+            fprintf(stderr, "Unknown MPITEST_DATATYPE_TEST_LEVEL %s\n", envval);
+        }
+    }
+}
 
 /* -------------------------------------------------------------------------------*/
 /* Routine to define various sets of blocklen/count/stride for derived datatypes. */
@@ -407,11 +465,12 @@ int MTestGetDatatypes(MTestDatatype * sendtype, MTestDatatype * recvtype, MPI_Ai
     int merr = 0;
 
     MTestGetDbgInfo(&dbgflag, &verbose);
+    MTestInitDatatypeEnv();
     MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
 
-    /* Initialize the full version if test does not specify. */
+    /* Initialize the default test level if test does not specify. */
     if (!MTestIsDatatypeGenInited()) {
-        MTestInitFullDatatypes();
+        MTestInitDefaultTestFunc();
     }
 
     if (datatype_index == 0) {

http://git.mpich.org/mpich.git/commitdiff/aa649570c1e8cf4ce874413445cd08feda78cdea

commit aa649570c1e8cf4ce874413445cd08feda78cdea
Author: Charles J Archer <charles.j.archer at intel.com>
Date:   Mon Nov 17 10:15:05 2014 -0800

    Open Fabrics Working Group (OFIWG) Netmod Support
    
     * Implements a tag matching interface netmod over the OFIWG Scalable Fabric Interfaces (SFI)
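
As a rough illustration of what tag matching means here: the sfi_cm.c file
added by this commit splits the 64-bit match bits into a process-group number
and a source, and posts its persistent receives with ~MPID_PROTOCOL_MASK as
the ignore mask. The sketch below shows one way such a scheme could work. The
bit layout, the SK_* constants and the sk_* functions are assumptions made for
illustration; the real layout is defined in headers not shown in this part of
the email. The sketch also assumes that bits set in the ignore mask are
excluded from the comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: | protocol:4 | pgid:20 | source:20 | tag:20 | */
#define SK_FIELD_BITS    20
#define SK_FIELD_MASK    ((UINT64_C(1) << SK_FIELD_BITS) - 1)
#define SK_SRC_SHIFT     SK_FIELD_BITS
#define SK_PGID_SHIFT    (2 * SK_FIELD_BITS)
#define SK_PROTO_SHIFT   60
#define SK_PROTOCOL_MASK (UINT64_C(0xF) << SK_PROTO_SHIFT)

/* Hypothetical protocol identifiers stored in the top four bits. */
#define SK_MSG_RTS (UINT64_C(1) << SK_PROTO_SHIFT)
#define SK_MSG_CTS (UINT64_C(2) << SK_PROTO_SHIFT)

/* Packs the four fields into one 64-bit match value. */
static uint64_t sk_pack(uint64_t proto, uint64_t pgid, uint64_t source,
                        uint64_t tag)
{
    assert((proto & ~SK_PROTOCOL_MASK) == 0);
    assert(pgid <= SK_FIELD_MASK && source <= SK_FIELD_MASK &&
           tag <= SK_FIELD_MASK);
    return proto | (pgid << SK_PGID_SHIFT) | (source << SK_SRC_SHIFT) | tag;
}

static uint64_t sk_get_pgid(uint64_t bits)
{
    return (bits >> SK_PGID_SHIFT) & SK_FIELD_MASK;
}

static uint64_t sk_get_source(uint64_t bits)
{
    return (bits >> SK_SRC_SHIFT) & SK_FIELD_MASK;
}

/* An incoming message matches a posted receive when all bits that are not
 * ignored are equal. */
static int sk_matches(uint64_t incoming, uint64_t posted, uint64_t ignore)
{
    return ((incoming ^ posted) & ~ignore) == 0;
}
```

Posting a receive for SK_MSG_RTS with ~SK_PROTOCOL_MASK as the ignore mask
then accepts a request-to-send from any process group, source, and tag, and
the receiver can recover the sender afterwards with sk_get_pgid and
sk_get_source.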

diff --git a/src/mpid/ch3/channels/nemesis/netmod/Makefile.mk b/src/mpid/ch3/channels/nemesis/netmod/Makefile.mk
index c85abf9..978da0e 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/Makefile.mk
+++ b/src/mpid/ch3/channels/nemesis/netmod/Makefile.mk
@@ -14,3 +14,4 @@ include $(top_srcdir)/src/mpid/ch3/channels/nemesis/netmod/scif/Makefile.mk
 include $(top_srcdir)/src/mpid/ch3/channels/nemesis/netmod/portals4/Makefile.mk
 include $(top_srcdir)/src/mpid/ch3/channels/nemesis/netmod/ib/Makefile.mk
 include $(top_srcdir)/src/mpid/ch3/channels/nemesis/netmod/mxm/Makefile.mk
+include $(top_srcdir)/src/mpid/ch3/channels/nemesis/netmod/sfi/Makefile.mk
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/Makefile.mk b/src/mpid/ch3/channels/nemesis/netmod/sfi/Makefile.mk
new file mode 100644
index 0000000..bc3d6ef
--- /dev/null
+++ b/src/mpid/ch3/channels/nemesis/netmod/sfi/Makefile.mk
@@ -0,0 +1,19 @@
+## -*- Mode: Makefile; -*-
+## vim: set ft=automake :
+##
+## (C) 2011 by Argonne National Laboratory.
+##     See COPYRIGHT in top-level directory.
+##
+if BUILD_NEMESIS_NETMOD_SFI
+
+mpi_core_sources +=                                 		\
+    src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_init.c 	\
+    src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_cm.c	 	\
+    src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_tagged.c	\
+    src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_msg.c	 	\
+    src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_data.c	 	\
+    src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_progress.c
+
+errnames_txt_files += src/mpid/ch3/channels/nemesis/netmod/sfi/errnames.txt
+
+endif
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/errnames.txt b/src/mpid/ch3/channels/nemesis/netmod/sfi/errnames.txt
new file mode 100644
index 0000000..c1ae0e3
--- /dev/null
+++ b/src/mpid/ch3/channels/nemesis/netmod/sfi/errnames.txt
@@ -0,0 +1,42 @@
+**sfi_avmap:SFI get address vector map failed
+**sfi_avmap %s %d %s %s:SFI address vector map failed (%s:%d:%s:%s)
+**sfi_tsendto:SFI tagged sendto failed
+**sfi_tsendto %s %d %s %s:SFI tagged sendto failed (%s:%d:%s:%s)
+**sfi_trecvfrom:SFI tagged recvfrom failed
+**sfi_trecvfrom %s %d %s %s:SFI tagged recvfrom failed (%s:%d:%s:%s)
+**sfi_getinfo:SFI getinfo() failed
+**sfi_getinfo %s %d %s %s:SFI getinfo() failed (%s:%d:%s:%s)
+**sfi_openep:SFI endpoint open failed
+**sfi_openep %s %d %s %s:SFI endpoint open failed (%s:%d:%s:%s)
+**sfi_openfabric:SFI fabric open failure
+**sfi_openfabric %s %d %s %s:SFI fabric open failed (%s:%d:%s:%s)
+**sfi_opendomain:SFI domain open failure
+**sfi_opendomain %s %d %s %s:SFI domain open failed (%s:%d:%s:%s)
+**sfi_opencq:SFI event queue create failure
+**sfi_opencq %s %d %s %s:SFI event queue create failed (%s:%d:%s:%s)
+**sfi_avopen:SFI address vector open failed
+**sfi_avopen %s %d %s %s:SFI address vector open failed (%s:%d:%s:%s)
+**sfi_bind:SFI resource bind failure
+**sfi_bind %s %d %s %s:SFI resource bind failed (%s:%d:%s:%s)
+**sfi_ep_enable:SFI endpoint enable failed
+**sfi_ep_enable %s %d %s %s:SFI endpoint enable failed (%s:%d:%s:%s)
+**sfi_getname:SFI get endpoint name failed
+**sfi_getname %s %d %s %s:SFI get endpoint name failed (%s:%d:%s:%s)
+**sfi_avclose:SFI av close failed
+**sfi_avclose %s %d %s %s:SFI av close failed (%s:%d:%s:%s)
+**sfi_epclose:SFI endpoint close failed
+**sfi_epclose %s %d %s %s:SFI endpoint close failed (%s:%d:%s:%s)
+**sfi_cqclose:SFI cq close failed
+**sfi_cqclose %s %d %s %s:SFI cq close failed (%s:%d:%s:%s)
+**sfi_mrclose:SFI mr close failed
+**sfi_mrclose %s %d %s %s:SFI mr close failed (%s:%d:%s:%s)
+**sfi_fabricclose:SFI fabric close failed
+**sfi_fabricclose %s %d %s %s:SFI fabric close failed (%s:%d:%s:%s)
+**sfi_domainclose:SFI domain close failed
+**sfi_domainclose %s %d %s %s:SFI domain close failed (%s:%d:%s:%s)
+**sfi_tsearch:SFI tsearch failed
+**sfi_tsearch %s %d %s %s:SFI tsearch failed (%s:%d:%s:%s)
+**sfi_poll:SFI poll failed
+**sfi_poll %s %d %s %s:SFI poll failed (%s:%d:%s:%s)
+**sfi_cancel:SFI cancel failed
+**sfi_cancel %s %d %s %s:SFI cancel failed (%s:%d:%s:%s)
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_cm.c b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_cm.c
new file mode 100644
index 0000000..b39517a
--- /dev/null
+++ b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_cm.c
@@ -0,0 +1,577 @@
+/*
+ *  (C) 2006 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ *
+ *  Portions of this code were written by Intel Corporation.
+ *  Copyright (C) 2011-2012 Intel Corporation.  Intel provides this material
+ *  to Argonne National Laboratory subject to Software Grant and Corporate
+ *  Contributor License Agreement dated February 8, 2012.
+ */
+#include "sfi_impl.h"
+
+/* ------------------------------------------------------------------------ */
+/* sfi_tag_to_vc                                                            */
+/* This routine converts tag information from an incoming preposted receive */
+/* into the VC that uses the routine.  There is a possibility of a small    */
+/* list of temporary VC's that are used during dynamic task management      */
+/* to create the VC's.  This search is linear, but should be a small number */
+/* of temporary VC's that will eventually be destroyed by the upper layers  */
+/* Otherwise the tag is split into a PG "number", which is a hash of the    */
+/* data contained in the process group, and a source.  The source/pg number */
+/* is enough to look up the VC.                                             */
+/* ------------------------------------------------------------------------ */
+#undef FCNAME
+#define FCNAME DECL_FUNC(sfi_tag_to_vc)
+static inline MPIDI_VC_t *sfi_tag_to_vc(uint64_t match_bits)
+{
+    int pgid = 0, port = 0;
+    MPIDI_VC_t *vc = NULL;
+    MPIDI_PG_t *pg = NULL;
+
+    BEGIN_FUNC(FCNAME);
+    if (NO_PGID == get_pgid(match_bits)) {
+        /* -------------------------------------------------------------------- */
+        /* Dynamic path -- This uses a linear search, but number of cm vc's is  */
+        /* a small number, and they should be ephemeral.  This lookup should    */
+        /* be fast yet not normally on the critical path.                       */
+        /* -------------------------------------------------------------------- */
+        port = get_port(match_bits);
+        vc = gl_data.cm_vcs;
+        while (vc && vc->port_name_tag != port) {
+            vc = VC_SFI(vc)->next;
+        }
+        if (NULL == vc) {
+            MPIU_Assertp(0);
+        }
+    }
+    else {
+        /* -------------------------------------------------------------------- */
+        /* If there are no connection management VC's, this is the normal path  */
+        /* Generate the PG number hash for each known process group and compare */
+        /* it to the pg number in the tag.  The number of PG's should be small  */
+        /* -------------------------------------------------------------------- */
+        pg = gl_data.pg_p;
+        while (pg) {
+            MPIDI_PG_IdToNum(pg, &pgid);
+            if (get_pgid(match_bits) == pgid) {
+                break;
+            }
+            pg = pg->next;
+        }
+        if (pg) {
+            MPIDI_PG_Get_vc(pg, get_psource(match_bits), &vc);
+        }
+        else {
+            MPIU_Assert(0);
+        }
+    }
+    END_FUNC(FCNAME);
+    return vc;
+}
+
+/* ------------------------------------------------------------------------ */
+/* MPID_nem_sfi_conn_req_callback                                           */
+/* A new process has been created and is connected to the current world     */
+/* The address of the new process is exchanged via the business card        */
+/* instead of being exchanged up front during the creation of the first     */
+/* world.  The new connection routine is usually invoked when two worlds    */
+/* are started via dynamic tasking.                                         */
+/* This routine:                                                            */
+/*     * repost the persistent connection management receive request        */
+/*     * malloc/create/initialize the VC                                    */
+/*     * grabs the address name from the business card                      */
+/*     * uses fi_av_insert to insert the addr into the address vector.      */
+/* This is marked as a "connection management" vc, and may be destroyed     */
+/* by the upper layers.  We handle the cm vc's slightly differently than    */
+/* other VC's because they may not be part of a process group.              */
+/* ------------------------------------------------------------------------ */
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_conn_req_callback)
+static inline int MPID_nem_sfi_conn_req_callback(cq_tagged_entry_t * wc, MPID_Request * rreq)
+{
+    int ret, len, mpi_errno = MPI_SUCCESS;
+    char bc[SFI_KVSAPPSTRLEN];
+
+    MPIDI_VC_t *vc;
+    char *addr = NULL;
+    fi_addr_t direct_addr;
+
+    BEGIN_FUNC(FCNAME);
+
+    MPIU_Memcpy(bc, rreq->dev.user_buf, wc->len);
+    bc[wc->len] = '\0';
+    MPIU_Assert(gl_data.conn_req == rreq);
+    FI_RC(fi_trecvfrom(gl_data.endpoint,
+                       gl_data.conn_req->dev.user_buf,
+                       SFI_KVSAPPSTRLEN,
+                       gl_data.mr,
+                       0,
+                       MPID_CONN_REQ,
+                       ~MPID_PROTOCOL_MASK,
+                       (void *) &(REQ_SFI(gl_data.conn_req)->sfi_context)), trecvfrom);
+
+    addr = MPIU_Malloc(gl_data.bound_addrlen);
+    MPIU_Assertp(addr);
+
+    vc = MPIU_Malloc(sizeof(MPIDI_VC_t));
+    MPIU_Assertp(vc);
+
+    MPIDI_VC_Init(vc, NULL, 0);
+    MPI_RC(MPIDI_GetTagFromPort(bc, &vc->port_name_tag));
+    ret = MPIU_Str_get_binary_arg(bc, "SFI", addr, gl_data.bound_addrlen, &len);
+    MPIU_ERR_CHKANDJUMP((ret != MPIU_STR_SUCCESS && ret != MPIU_STR_NOMEM) ||
+                        (size_t) len != gl_data.bound_addrlen,
+                        mpi_errno, MPI_ERR_OTHER, "**badbusinesscard");
+
+    FI_RC(fi_av_insert(gl_data.av, addr, 1, &direct_addr, 0ULL, NULL), avmap);
+    VC_SFI(vc)->direct_addr = direct_addr;
+    VC_SFI(vc)->ready = 1;
+    VC_SFI(vc)->is_cmvc = 1;
+    VC_SFI(vc)->next = gl_data.cm_vcs;
+    gl_data.cm_vcs = vc;
+
+    MPIDI_CH3I_Acceptq_enqueue(vc, vc->port_name_tag);
+    MPIDI_CH3I_INCR_PROGRESS_COMPLETION_COUNT;
+  fn_exit:
+    MPIU_Free(addr);
+    END_FUNC(FCNAME);
+    return mpi_errno;
+  fn_fail:
+    if (vc)
+        MPIU_Free(vc);
+    goto fn_exit;
+}
+
+/* ------------------------------------------------------------------------ */
+/* MPID_nem_sfi_handle_packet                                               */
+/* The "parent" request tracks the state of the entire rendezvous           */
+/* As "child" requests complete, the cc counter is decremented              */
+/* Notify CH3 that we have an incoming packet (if cc hits 1).  Otherwise    */
+/* decrement the ref counter via request completion                         */
+/* ------------------------------------------------------------------------ */
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_handle_packet)
+static inline int MPID_nem_sfi_handle_packet(cq_tagged_entry_t * wc ATTRIBUTE((unused)),
+                                             MPID_Request * rreq)
+{
+    int mpi_errno = MPI_SUCCESS;
+    MPIDI_VC_t *vc;
+
+    BEGIN_FUNC(FCNAME);
+    if (rreq->cc == 1) {
+        vc = REQ_SFI(rreq)->vc;
+        MPIU_Assert(vc);
+        MPI_RC(MPID_nem_handle_pkt(vc, REQ_SFI(rreq)->pack_buffer, REQ_SFI(rreq)->pack_buffer_size))
+            MPIU_Free(REQ_SFI(rreq)->pack_buffer);
+    }
+    MPIDI_CH3U_Request_complete(rreq);
+    END_FUNC_RC(FCNAME);
+}
+
+/* ------------------------------------------------------------------------ */
+/* MPID_nem_sfi_cts_send_callback                                           */
+/* A wrapper around MPID_nem_sfi_handle_packet that decrements              */
+/* the parent request's counter, and cleans up the CTS request              */
+/* ------------------------------------------------------------------------ */
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_cts_send_callback)
+static inline int MPID_nem_sfi_cts_send_callback(cq_tagged_entry_t * wc, MPID_Request * sreq)
+{
+    int mpi_errno = MPI_SUCCESS;
+    BEGIN_FUNC(FCNAME);
+    MPI_RC(MPID_nem_sfi_handle_packet(wc, REQ_SFI(sreq)->parent));
+    MPIDI_CH3U_Request_complete(sreq);
+    END_FUNC_RC(FCNAME);
+}
+
+/* ------------------------------------------------------------------------ */
+/* MPID_nem_sfi_preposted_callback                                          */
+/* This callback handles incoming "SendContig" messages (see sfi_msg.c)     */
+/* for the send routines.  This implements the CTS response and the RTS     */
+/* handler.  The steps are as follows:                                      */
+/*   * Create a parent data request and post a receive into a pack buffer   */
+/*   * Create a child request and send the CTS packet                       */
+/*   * Re-Post the RTS receive and handler to handle the next message       */
+/* ------------------------------------------------------------------------ */
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_preposted_callback)
+static inline int MPID_nem_sfi_preposted_callback(cq_tagged_entry_t * wc, MPID_Request * rreq)
+{
+    int c, mpi_errno = MPI_SUCCESS;
+    size_t pkt_len;
+    char *pack_buffer = NULL;
+    MPIDI_VC_t *vc;
+    MPID_Request *new_rreq, *sreq;
+    BEGIN_FUNC(FCNAME);
+
+    vc = sfi_tag_to_vc(wc->tag);
+    MPIU_Assert(vc);
+    VC_READY_CHECK(vc);
+
+    pkt_len = rreq->dev.user_count;
+    pack_buffer = (char *) MPIU_Malloc(pkt_len);
+    MPIU_ERR_CHKANDJUMP1(pack_buffer == NULL, mpi_errno, MPI_ERR_OTHER,
+                         "**nomem", "**nomem %s", "Pack Buffer alloc");
+    c = 1;
+    MPID_nem_sfi_create_req(&new_rreq, 1);
+    MPID_cc_incr(new_rreq->cc_ptr, &c);
+    new_rreq->dev.OnDataAvail = NULL;
+    new_rreq->dev.next = NULL;
+    REQ_SFI(new_rreq)->event_callback = MPID_nem_sfi_handle_packet;
+    REQ_SFI(new_rreq)->vc = vc;
+    REQ_SFI(new_rreq)->pack_buffer = pack_buffer;
+    REQ_SFI(new_rreq)->pack_buffer_size = pkt_len;
+    FI_RC(fi_trecvfrom(gl_data.endpoint,
+                       REQ_SFI(new_rreq)->pack_buffer,
+                       REQ_SFI(new_rreq)->pack_buffer_size,
+                       gl_data.mr,
+                       VC_SFI(vc)->direct_addr,
+                       wc->tag | MPID_MSG_DATA, 0, &(REQ_SFI(new_rreq)->sfi_context)), trecvfrom);
+
+    MPID_nem_sfi_create_req(&sreq, 1);
+    sreq->dev.OnDataAvail = NULL;
+    sreq->dev.next = NULL;
+    REQ_SFI(sreq)->event_callback = MPID_nem_sfi_cts_send_callback;
+    REQ_SFI(sreq)->parent = new_rreq;
+    FI_RC(fi_tsendto(gl_data.endpoint,
+                     NULL,
+                     0,
+                     gl_data.mr,
+                     VC_SFI(vc)->direct_addr,
+                     wc->tag | MPID_MSG_CTS, &(REQ_SFI(sreq)->sfi_context)), tsendto);
+    MPIU_Assert(gl_data.persistent_req == rreq);
+
+    rreq->dev.user_count = 0;
+    FI_RC(fi_trecvfrom(gl_data.endpoint,
+                       &rreq->dev.user_count,
+                       sizeof rreq->dev.user_count,
+                       gl_data.mr,
+                       0,
+                       MPID_MSG_RTS,
+                       ~MPID_PROTOCOL_MASK, &(REQ_SFI(rreq)->sfi_context)), trecvfrom);
+    END_FUNC_RC(FCNAME);
+}
+
+/* ------------------------------------------------------------------------ */
+/* MPID_nem_sfi_connect_to_root_callback                                    */
+/* Complete and clean up the request                                        */
+/* ------------------------------------------------------------------------ */
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_connect_to_root_callback)
+int MPID_nem_sfi_connect_to_root_callback(cq_tagged_entry_t * wc ATTRIBUTE((unused)),
+                                          MPID_Request * sreq)
+{
+    int mpi_errno = MPI_SUCCESS;
+    BEGIN_FUNC(FCNAME);
+
+    if (REQ_SFI(sreq)->pack_buffer)
+        MPIU_Free(REQ_SFI(sreq)->pack_buffer);
+    MPIDI_CH3U_Request_complete(sreq);
+
+    END_FUNC(FCNAME);
+    return mpi_errno;
+}
+
+/* ------------------------------------------------------------------------ */
+/* MPID_nem_sfi_cm_init                                                     */
+/* This is a utility routine that sets up persistent connection management  */
+/* requests and a persistent data request to handle rendezvous SendContig   */
+/* messages.                                                                */
+/* ------------------------------------------------------------------------ */
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_cm_init)
+int MPID_nem_sfi_cm_init(MPIDI_PG_t * pg_p, int pg_rank ATTRIBUTE((unused)))
+{
+    int mpi_errno = MPI_SUCCESS;
+    MPID_Request *persistent_req, *conn_req;
+    BEGIN_FUNC(FCNAME);
+
+    /* ------------------------------------- */
+    /* Set up CH3 and netmod data structures */
+    /* ------------------------------------- */
+    MPI_RC(MPIDI_CH3I_Register_anysource_notification(MPID_nem_sfi_anysource_posted,
+                                                      MPID_nem_sfi_anysource_matched));
+    MPIDI_Anysource_iprobe_fn = MPID_nem_sfi_anysource_iprobe;
+    MPIDI_Anysource_improbe_fn = MPID_nem_sfi_anysource_improbe;
+    gl_data.pg_p = pg_p;
+
+    /* ----------------------------------- */
+    /* Post a persistent request to handle */
+    /* ----------------------------------- */
+    MPID_nem_sfi_create_req(&persistent_req, 1);
+    persistent_req->dev.OnDataAvail = NULL;
+    persistent_req->dev.next = NULL;
+    REQ_SFI(persistent_req)->vc = NULL;
+    REQ_SFI(persistent_req)->event_callback = MPID_nem_sfi_preposted_callback;
+    FI_RC(fi_trecvfrom(gl_data.endpoint,
+                       &persistent_req->dev.user_count,
+                       sizeof persistent_req->dev.user_count,
+                       gl_data.mr,
+                       0,
+                       MPID_MSG_RTS,
+                       ~MPID_PROTOCOL_MASK,
+                       (void *) &(REQ_SFI(persistent_req)->sfi_context)), trecvfrom);
+    gl_data.persistent_req = persistent_req;
+
+    /* --------------------------------- */
+    /* Post recv for connection requests */
+    /* --------------------------------- */
+    MPID_nem_sfi_create_req(&conn_req, 1);
+    conn_req->dev.user_buf = MPIU_Malloc(SFI_KVSAPPSTRLEN * sizeof(char));
+    conn_req->dev.OnDataAvail = NULL;
+    conn_req->dev.next = NULL;
+    REQ_SFI(conn_req)->vc = NULL;       /* We don't know the source yet */
+    REQ_SFI(conn_req)->event_callback = MPID_nem_sfi_conn_req_callback;
+    FI_RC(fi_trecvfrom(gl_data.endpoint,
+                       conn_req->dev.user_buf,
+                       SFI_KVSAPPSTRLEN,
+                       gl_data.mr,
+                       0,
+                       MPID_CONN_REQ,
+                       ~MPID_PROTOCOL_MASK, (void *) &(REQ_SFI(conn_req)->sfi_context)), trecvfrom);
+    gl_data.conn_req = conn_req;
+
+
+  fn_exit:
+    END_FUNC(FCNAME);
+    return mpi_errno;
+
+  fn_fail:
+    goto fn_exit;
+}
+
+/* ------------------------------------------------------------------------ */
+/* MPID_nem_sfi_cm_finalize                                                 */
+/* Clean up and cancel the requests initiated by the cm_init routine        */
+/* ------------------------------------------------------------------------ */
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_cm_finalize)
+int MPID_nem_sfi_cm_finalize(void)
+{
+    int mpi_errno = MPI_SUCCESS;
+    BEGIN_FUNC(FCNAME);
+    FI_RC(fi_cancel((fid_t) gl_data.endpoint,
+                    &(REQ_SFI(gl_data.persistent_req)->sfi_context)), cancel);
+    MPIR_STATUS_SET_CANCEL_BIT(gl_data.persistent_req->status, TRUE);
+    MPIR_STATUS_SET_COUNT(gl_data.persistent_req->status, 0);
+    MPIDI_CH3U_Request_complete(gl_data.persistent_req);
+
+    FI_RC(fi_cancel((fid_t) gl_data.endpoint, &(REQ_SFI(gl_data.conn_req)->sfi_context)), cancel);
+    MPIU_Free(gl_data.conn_req->dev.user_buf);
+    MPIR_STATUS_SET_CANCEL_BIT(gl_data.conn_req->status, TRUE);
+    MPIR_STATUS_SET_COUNT(gl_data.conn_req->status, 0);
+    MPIDI_CH3U_Request_complete(gl_data.conn_req);
+  fn_exit:
+    END_FUNC(FCNAME);
+    return mpi_errno;
+  fn_fail:
+    goto fn_exit;
+}
+
+/* ------------------------------------------------------------------------ */
+/* MPID_nem_sfi_vc_connect                                                  */
+/* Handle CH3/Nemesis VC connections                                        */
+/*   * Query the VC address information.  In particular we are looking for  */
+/*     the fabric address name.                                             */
+/*   * Use fi_av_insert to register the address name with SFI               */
+/* ------------------------------------------------------------------------ */
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_vc_connect)
+int MPID_nem_sfi_vc_connect(MPIDI_VC_t * vc)
+{
+    int len, ret, mpi_errno = MPI_SUCCESS;
+    char bc[SFI_KVSAPPSTRLEN], *addr = NULL;
+
+    BEGIN_FUNC(FCNAME);
+    addr = MPIU_Malloc(gl_data.bound_addrlen);
+    MPIU_Assert(addr);
+    MPIU_Assert(1 != VC_SFI(vc)->ready);
+
+    if (!vc->pg || !vc->pg->getConnInfo) {
+        goto fn_exit;
+    }
+
+    MPI_RC(vc->pg->getConnInfo(vc->pg_rank, bc, SFI_KVSAPPSTRLEN, vc->pg));
+    ret = MPIU_Str_get_binary_arg(bc, "SFI", addr, gl_data.bound_addrlen, &len);
+    MPIU_ERR_CHKANDJUMP((ret != MPIU_STR_SUCCESS && ret != MPIU_STR_NOMEM) ||
+                        (size_t) len != gl_data.bound_addrlen,
+                        mpi_errno, MPI_ERR_OTHER, "**badbusinesscard");
+    FI_RC(fi_av_insert(gl_data.av, addr, 1, &(VC_SFI(vc)->direct_addr), 0ULL, NULL), avmap);
+    VC_SFI(vc)->ready = 1;
+
+  fn_exit:
+    if (addr)
+        MPIU_Free(addr);
+    END_FUNC(FCNAME);
+    return mpi_errno;
+
+  fn_fail:
+    goto fn_exit;
+}
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_vc_init)
+int MPID_nem_sfi_vc_init(MPIDI_VC_t * vc)
+{
+    int mpi_errno = MPI_SUCCESS;
+    MPIDI_CH3I_VC *const vc_ch = &vc->ch;
+    MPID_nem_sfi_vc_t *const vc_sfi = VC_SFI(vc);
+
+    BEGIN_FUNC(FCNAME);
+    vc->sendNoncontig_fn = MPID_nem_sfi_SendNoncontig;
+    vc_ch->iStartContigMsg = MPID_nem_sfi_iStartContigMsg;
+    vc_ch->iSendContig = MPID_nem_sfi_iSendContig;
+    vc_ch->next = NULL;
+    vc_ch->prev = NULL;
+    vc_sfi->is_cmvc = 0;
+    vc->comm_ops = &_g_comm_ops;
+
+    MPIDI_CHANGE_VC_STATE(vc, ACTIVE);
+
+    if (NULL == vc->pg) {
+        vc_sfi->is_cmvc = 1;
+    }
+    else {
+    }
+    END_FUNC(FCNAME);
+    return mpi_errno;
+}
+
+/* ------------------------------------------------------------------------ */
+/* MPID_nem_sfi_vc_destroy                                                  */
+/* MPID_nem_sfi_vc_terminate                                                */
+/* TODO:  Verify this code has no leaks                                     */
+/* ------------------------------------------------------------------------ */
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_vc_destroy)
+int MPID_nem_sfi_vc_destroy(MPIDI_VC_t * vc)
+{
+    BEGIN_FUNC(FCNAME);
+    if (vc && (VC_SFI(vc)->is_cmvc == 1) && (VC_SFI(vc)->ready == 1)) {
+        if (vc->pg != NULL) {
+            printf("ERROR: VC Destroy (%p) pg = %s\n", vc, (char *) vc->pg->id);
+        }
+        MPIDI_VC_t *prev = gl_data.cm_vcs;
+        while (prev && prev != vc && VC_SFI(prev)->next != vc) {
+            prev = VC_SFI(prev)->next;
+        }
+        if (prev && VC_SFI(prev)->next == vc) {
+            VC_SFI(prev)->next = VC_SFI(vc)->next;
+        }
+        else if (vc == gl_data.cm_vcs) {
+            gl_data.cm_vcs = VC_SFI(vc)->next;
+        }
+        else {
+            MPIU_Assert(0);
+        }
+    }
+    VC_SFI(vc)->ready = 0;
+    END_FUNC(FCNAME);
+    return MPI_SUCCESS;
+}
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_vc_terminate)
+int MPID_nem_sfi_vc_terminate(MPIDI_VC_t * vc)
+{
+    int mpi_errno = MPI_SUCCESS;
+    BEGIN_FUNC(FCNAME);
+    MPI_RC(MPIDI_CH3U_Handle_connection(vc, MPIDI_VC_EVENT_TERMINATED));
+    VC_SFI(vc)->ready = 0;
+    END_FUNC_RC(FCNAME);
+}
+
+
+
+/* ------------------------------------------------------------------------ */
+/* MPID_nem_sfi_connect_to_root                                             */
+/*  * A new unconnected VC (cm/ephemeral VC) has been created.  This code   */
+/*    connects the new VC to a rank in another process group.  The parent   */
+/*    address is obtained by an out of band method and given to this        */
+/*    routine as a business card                                            */
+/*  * Read the business card address and insert the address                 */
+/*  * Send a connection request to the parent.  The parent has posted a     */
+/*    persistent request to handle incoming connection requests             */
+/*    The connect message has the child's business card.                    */
+/*  * Add the new VC to the list of ephemeral VC's (cm_vc's).  These VC's   */
+/*    are not part of the process group, so they require special handling   */
+/*    during the SendContig family of routines.                             */
+/* ------------------------------------------------------------------------ */
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_connect_to_root)
+int MPID_nem_sfi_connect_to_root(const char *business_card, MPIDI_VC_t * new_vc)
+{
+    int len, ret, mpi_errno = MPI_SUCCESS, str_errno = MPI_SUCCESS;
+    int my_bc_len = SFI_KVSAPPSTRLEN;
+    char *addr = NULL, *bc = NULL, *my_bc = NULL;
+    MPID_Request *sreq;
+    uint64_t conn_req_send_bits;
+
+    BEGIN_FUNC(FCNAME);
+    addr = MPIU_Malloc(gl_data.bound_addrlen);
+    bc = MPIU_Malloc(SFI_KVSAPPSTRLEN);
+    MPIU_Assertp(addr);
+    MPIU_Assertp(bc);
+    my_bc = bc;
+    if (!business_card || business_card[0] != 't') {
+        mpi_errno = MPI_ERR_OTHER;
+        goto fn_fail;
+    }
+    MPI_RC(MPIDI_GetTagFromPort(business_card, &new_vc->port_name_tag));
+    ret = MPIU_Str_get_binary_arg(business_card, "SFI", addr, gl_data.bound_addrlen, &len);
+    MPIU_ERR_CHKANDJUMP((ret != MPIU_STR_SUCCESS && ret != MPIU_STR_NOMEM) ||
+                        (size_t) len != gl_data.bound_addrlen,
+                        mpi_errno, MPI_ERR_OTHER, "**badbusinesscard");
+    FI_RC(fi_av_insert(gl_data.av, addr, 1, &(VC_SFI(new_vc)->direct_addr), 0ULL, NULL), avmap);
+
+    VC_SFI(new_vc)->ready = 1;
+    str_errno = MPIU_Str_add_int_arg(&bc, &my_bc_len, "tag", new_vc->port_name_tag);
+    MPIU_ERR_CHKANDJUMP(str_errno, mpi_errno, MPI_ERR_OTHER, "**argstr_port_name_tag");
+    MPI_RC(MPID_nem_sfi_get_business_card(MPIR_Process.comm_world->rank, &bc, &my_bc_len));
+    my_bc_len = SFI_KVSAPPSTRLEN - my_bc_len;
+
+    MPID_nem_sfi_create_req(&sreq, 1);
+    sreq->kind = MPID_REQUEST_SEND;
+    sreq->dev.OnDataAvail = NULL;
+    sreq->dev.next = NULL;
+    REQ_SFI(sreq)->event_callback = MPID_nem_sfi_connect_to_root_callback;
+    REQ_SFI(sreq)->pack_buffer = my_bc;
+    conn_req_send_bits = init_sendtag(0, MPIR_Process.comm_world->rank, 0, MPID_CONN_REQ);
+    FI_RC(fi_tsendto(gl_data.endpoint,
+                     REQ_SFI(sreq)->pack_buffer,
+                     my_bc_len,
+                     gl_data.mr,
+                     VC_SFI(new_vc)->direct_addr,
+                     conn_req_send_bits, &(REQ_SFI(sreq)->sfi_context)), tsendto);
+    MPID_nem_sfi_poll(MPID_NONBLOCKING_POLL);
+    VC_SFI(new_vc)->is_cmvc = 1;
+    VC_SFI(new_vc)->next = gl_data.cm_vcs;
+    gl_data.cm_vcs = new_vc;
+  fn_exit:
+    if (addr)
+        MPIU_Free(addr);
+    END_FUNC(FCNAME);
+    return mpi_errno;
+  fn_fail:
+    if (my_bc)
+        MPIU_Free(my_bc);
+    goto fn_exit;
+}
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_get_business_card)
+int MPID_nem_sfi_get_business_card(int my_rank ATTRIBUTE((unused)),
+                                   char **bc_val_p, int *val_max_sz_p)
+{
+    int mpi_errno = MPI_SUCCESS, str_errno = MPIU_STR_SUCCESS;
+    BEGIN_FUNC(FCNAME);
+    str_errno = MPIU_Str_add_binary_arg(bc_val_p,
+                                        val_max_sz_p,
+                                        "SFI",
+                                        (char *) &gl_data.bound_addr, sizeof(gl_data.bound_addr));
+    if (str_errno) {
+        MPIU_ERR_CHKANDJUMP(str_errno == MPIU_STR_NOMEM, mpi_errno, MPI_ERR_OTHER, "**buscard_len");
+        MPIU_ERR_SETANDJUMP(mpi_errno, MPI_ERR_OTHER, "**buscard");
+    }
+    END_FUNC_RC(FCNAME);
+}
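
Note on the cm_vcs bookkeeping in sfi_cm.c above: MPID_nem_sfi_connect_to_root
pushes each ephemeral VC on the front of gl_data.cm_vcs, and
MPID_nem_sfi_vc_destroy unlinks it again. The sketch below shows the same
push/unlink pattern on a stand-alone list. It is not the patch's code: the
sketch_vc_t type and both helper names are hypothetical, and the unlink uses a
pointer-to-pointer walk rather than the prev-based loop in the patch, so the
head and interior cases share one code path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VC: only the intrusive "next" link matters. */
typedef struct sketch_vc {
    struct sketch_vc *next;
} sketch_vc_t;

/* Push a node on the front of the list, as connect_to_root does with
 * gl_data.cm_vcs. */
static void sketch_push(sketch_vc_t **head, sketch_vc_t *vc)
{
    vc->next = *head;
    *head = vc;
}

/* Unlink vc from the list.  Returns 1 if it was found and removed,
 * 0 if it was not on the list. */
static int sketch_unlink(sketch_vc_t **head, sketch_vc_t *vc)
{
    sketch_vc_t **link = head;

    /* Walk the chain of "next" pointers until one points at vc. */
    while (*link != NULL && *link != vc)
        link = &(*link)->next;
    if (*link == NULL)
        return 0;

    *link = vc->next;   /* bypass vc, whether it is the head or not */
    vc->next = NULL;
    return 1;
}
```

A caller that wants the patch's assert-on-missing behaviour could assert on
the return value of sketch_unlink instead of ignoring it.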
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_data.c b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_data.c
new file mode 100644
index 0000000..1e39684
--- /dev/null
+++ b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_data.c
@@ -0,0 +1,58 @@
+/*
+ *  (C) 2006 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ *
+ *  Portions of this code were written by Intel Corporation.
+ *  Copyright (C) 2011-2012 Intel Corporation.  Intel provides this material
+ *  to Argonne National Laboratory subject to Software Grant and Corporate
+ *  Contributor License Agreement dated February 8, 2012.
+ */
+#include "sfi_impl.h"
+
+
+MPID_nem_sfi_global_t gl_data;
+
+/* ************************************************************************** */
+/* Netmod Function Table                                                      */
+/* ************************************************************************** */
+MPIDI_Comm_ops_t _g_comm_ops = {
+    MPID_nem_sfi_recv_posted,   /* recv_posted */
+
+    MPID_nem_sfi_send,  /* send */
+    MPID_nem_sfi_send,  /* rsend */
+    MPID_nem_sfi_ssend, /* ssend */
+    MPID_nem_sfi_isend, /* isend */
+    MPID_nem_sfi_isend, /* irsend */
+    MPID_nem_sfi_issend,        /* issend */
+
+    NULL,       /* send_init */
+    NULL,       /* bsend_init */
+    NULL,       /* rsend_init */
+    NULL,       /* ssend_init */
+    NULL,       /* startall */
+
+    MPID_nem_sfi_cancel_send,   /* cancel_send */
+    MPID_nem_sfi_cancel_recv,   /* cancel_recv */
+
+    NULL,       /* probe */
+    MPID_nem_sfi_iprobe,        /* iprobe */
+    MPID_nem_sfi_improbe        /* improbe */
+};
+
+MPID_nem_netmod_funcs_t MPIDI_nem_sfi_funcs = {
+    MPID_nem_sfi_init,
+    MPID_nem_sfi_finalize,
+#ifdef ENABLE_CHECKPOINTING
+    NULL,
+    NULL,
+    NULL,
+#endif
+    MPID_nem_sfi_poll,
+    MPID_nem_sfi_get_business_card,
+    MPID_nem_sfi_connect_to_root,
+    MPID_nem_sfi_vc_init,
+    MPID_nem_sfi_vc_destroy,
+    MPID_nem_sfi_vc_terminate,
+    MPID_nem_sfi_anysource_iprobe,
+    MPID_nem_sfi_anysource_improbe,
+};
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_impl.h b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_impl.h
new file mode 100644
index 0000000..9e8b93f
--- /dev/null
+++ b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_impl.h
@@ -0,0 +1,342 @@
+/*
+ *  (C) 2006 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ *
+ *  Portions of this code were written by Intel Corporation.
+ *  Copyright (C) 2011-2012 Intel Corporation.  Intel provides this material
+ *  to Argonne National Laboratory subject to Software Grant and Corporate
+ *  Contributor License Agreement dated February 8, 2012.
+ */
+#ifndef SFI_IMPL_H
+#define SFI_IMPL_H
+
+#include "mpid_nem_impl.h"
+#include "mpihandlemem.h"
+#include "pmi.h"
+#include <rdma/fabric.h>
+#include <rdma/fi_errno.h>
+#include <rdma/fi_endpoint.h>
+#include <rdma/fi_domain.h>
+#include <rdma/fi_tagged.h>
+#include <rdma/fi_cm.h>
+#include <netdb.h>
+
+/* ************************************************************************** */
+/* Type Definitions                                                           */
+/* ************************************************************************** */
+typedef struct iovec iovec_t;
+typedef struct fi_info info_t;
+typedef struct fi_cq_attr cq_attr_t;
+typedef struct fi_av_attr av_attr_t;
+typedef struct fi_domain_attr domain_attr_t;
+typedef struct fi_tx_ctx_attr tx_ctx_attr_t;
+typedef struct fi_cq_tagged_entry cq_tagged_entry_t;
+typedef struct fi_cq_err_entry cq_err_entry_t;
+typedef struct fi_context context_t;
+typedef int (*event_callback_fn) (cq_tagged_entry_t * wc, MPID_Request *);
+typedef int (*req_fn) (MPIDI_VC_t *, MPID_Request *, int *);
+
+/* ******************************** */
+/* Global Object for state tracking */
+/* ******************************** */
+typedef struct {
+    fi_addr_t bound_addr;       /* This rank's bound address   */
+    fi_addr_t any_addr;         /* Specifies any source        */
+    size_t bound_addrlen;       /* length of the bound address */
+    struct fid_fabric *fabric;  /* fabric object               */
+    struct fid_domain *domain;  /* domain object               */
+    struct fid_ep *endpoint;    /* endpoint object             */
+    struct fid_cq *cq;          /* completion queue            */
+    struct fid_av *av;          /* address vector              */
+    struct fid_mr *mr;          /* memory region               */
+    MPIDI_PG_t *pg_p;           /* MPI Process group           */
+    MPIDI_VC_t *cm_vcs;         /* temporary VC's              */
+    MPID_Request *persistent_req;       /* Unexpected request queue    */
+    MPID_Request *conn_req;     /* Connection request          */
+    MPIDI_Comm_ops_t comm_ops;
+} MPID_nem_sfi_global_t;
+
+/* ******************************** */
+/* Device channel specific data     */
+/* This is per destination          */
+/* ******************************** */
+typedef struct {
+    fi_addr_t direct_addr;      /* Remote SFI address */
+    int ready;                  /* VC ready state     */
+    int is_cmvc;                /* Cleanup VC         */
+    MPIDI_VC_t *next;           /* VC queue           */
+} MPID_nem_sfi_vc_t;
+#define VC_SFI(vc) ((MPID_nem_sfi_vc_t *)vc->ch.netmod_area.padding)
+
+/* ******************************** */
+/* Per request object data          */
+/* SFI/Netmod specific              */
+/* ******************************** */
+typedef struct {
+    context_t sfi_context;      /* Context Object              */
+    void *addr;                 /* SFI Address                 */
+    event_callback_fn event_callback;   /* Callback Event              */
+    char *pack_buffer;          /* MPI Pack Buffer             */
+    int pack_buffer_size;       /* Pack buffer size            */
+    int match_state;            /* State of the match          */
+    int req_started;            /* Request state               */
+    MPIDI_VC_t *vc;             /* VC paired with this request */
+    uint64_t tag;               /* 64 bit tag request          */
+    MPID_Request *parent;       /* Parent request              */
+} MPID_nem_sfi_req_t;
+#define REQ_SFI(req) ((MPID_nem_sfi_req_t *)((req)->ch.netmod_area.padding))
+
+/* ******************************** */
+/* Logging and function macros      */
+/* ******************************** */
+#undef FUNCNAME
+#define FUNCNAME nothing
+#define BEGIN_FUNC(FUNCNAME)                    \
+  MPIDI_STATE_DECL(FUNCNAME);                   \
+  MPIDI_FUNC_ENTER(FUNCNAME);
+#define END_FUNC(FUNCNAME)                      \
+  MPIDI_FUNC_EXIT(FUNCNAME);
+#define END_FUNC_RC(FUNCNAME) \
+  fn_exit:                    \
+  MPIDI_FUNC_EXIT(FUNCNAME);  \
+  return mpi_errno;           \
+fn_fail:                      \
+  goto fn_exit;
+
+#define __SHORT_FILE__                          \
+  (strrchr(__FILE__,'/')                        \
+   ? strrchr(__FILE__,'/')+1                    \
+   : __FILE__                                   \
+)
+#define DECL_FUNC(FUNCNAME)  MPIU_QUOTE(FUNCNAME)
+#define SFI_COMPILE_TIME_ASSERT(expr_)                                  \
+  do { switch(0) { case 0: case (expr_): default: break; } } while (0)
+
+#define FI_RC(FUNC,STR)                                         \
+  do                                                            \
+    {                                                           \
+      ssize_t _ret = FUNC;                                      \
+      MPIU_ERR_##CHKANDJUMP4(_ret<0,                            \
+                           mpi_errno,                           \
+                           MPI_ERR_OTHER,                       \
+                           "**sfi_"#STR,                        \
+                           "**sfi_"#STR" %s %d %s %s",          \
+                           __SHORT_FILE__,                      \
+                           __LINE__,                            \
+                           FCNAME,                              \
+                           fi_strerror(-_ret));                 \
+    } while (0)
+
+#define PMI_RC(FUNC,STR)                                        \
+  do                                                            \
+    {                                                           \
+      pmi_errno  = FUNC;                                        \
+      MPIU_ERR_##CHKANDJUMP4(pmi_errno!=PMI_SUCCESS,            \
+                           mpi_errno,                           \
+                           MPI_ERR_OTHER,                       \
+                           "**sfi_"#STR,                        \
+                           "**sfi_"#STR" %s %d %s %s",          \
+                           __SHORT_FILE__,                      \
+                           __LINE__,                            \
+                           FCNAME,                              \
+                           #STR);                               \
+    } while (0)
+
+#define MPI_RC(FUNC)                                        \
+  do                                                        \
+    {                                                       \
+      mpi_errno  = FUNC;                                    \
+      if (mpi_errno) MPIU_ERR_POP(mpi_errno);               \
+    } while (0)
+
+#define VC_READY_CHECK(vc)                      \
+({                                              \
+  if (1 != VC_SFI(vc)->ready) {                 \
+    MPI_RC(MPID_nem_sfi_vc_connect(vc));        \
+  }                                             \
+})
+
+#define SFI_ADDR_INIT(src, vc, remote_proc) \
+({                                          \
+  if (MPI_ANY_SOURCE != src) {              \
+    MPIU_Assert(vc != NULL);                \
+    VC_READY_CHECK(vc);                     \
+    remote_proc = VC_SFI(vc)->direct_addr;  \
+  } else {                                  \
+    MPIU_Assert(vc == NULL);                \
+    remote_proc = gl_data.any_addr;         \
+  }                                         \
+})
+
+
+#define NO_PGID 0
+
+/* **************************************************************************
+ *  match/ignore bit manipulation
+ * **************************************************************************
+ * 0123 4567 01234567 0123 4567 01234567 0123 4567 01234567 01234567 01234567
+ *     |                  |                  |
+ * ^   |    context id    |       source     |       message tag
+ * |   |                  |                  |
+ * +---- protocol
+ * ************************************************************************** */
+#define MPID_PROTOCOL_MASK       (0xF000000000000000ULL)
+#define MPID_CONTEXT_MASK        (0x0FFFF00000000000ULL)
+#define MPID_SOURCE_MASK         (0x00000FFFF0000000ULL)
+#define MPID_TAG_MASK            (0x000000000FFFFFFFULL)
+#define MPID_PGID_MASK           (0x00000000FFFFFFFFULL)
+#define MPID_PSOURCE_MASK        (0x0000FFFF00000000ULL)
+#define MPID_PORT_NAME_MASK      (0x0FFF000000000000ULL)
+#define MPID_SYNC_SEND           (0x1000000000000000ULL)
+#define MPID_SYNC_SEND_ACK       (0x2000000000000000ULL)
+#define MPID_MSG_RTS             (0x3000000000000000ULL)
+#define MPID_MSG_CTS             (0x4000000000000000ULL)
+#define MPID_MSG_DATA            (0x5000000000000000ULL)
+#define MPID_CONN_REQ            (0x6000000000000000ULL)
+#define MPID_SOURCE_SHIFT        (16)
+#define MPID_TAG_SHIFT           (28)
+#define MPID_PSOURCE_SHIFT       (16)
+#define MPID_PORT_SHIFT          (32)
+#define SFI_KVSAPPSTRLEN         1024
+
+/* ******************************** */
+/* Request manipulation inlines     */
+/* ******************************** */
+static inline void MPID_nem_sfi_init_req(MPID_Request * req)
+{
+    memset(REQ_SFI(req), 0, sizeof(MPID_nem_sfi_req_t));
+}
+
+static inline int MPID_nem_sfi_create_req(MPID_Request ** request, int refcnt)
+{
+    int mpi_errno = MPI_SUCCESS;
+    MPID_Request *req;
+    req = MPID_Request_create();
+    MPIU_Assert(req);
+    MPIU_Object_set_ref(req, refcnt);
+    MPID_nem_sfi_init_req(req);
+    *request = req;
+    return mpi_errno;
+}
+
+/* ******************************** */
+/* Tag Manipulation inlines         */
+/* ******************************** */
+static inline uint64_t init_sendtag(MPIR_Context_id_t contextid, int source, int tag, uint64_t type)
+{
+    uint64_t match_bits;
+    match_bits = contextid;
+    match_bits = (match_bits << MPID_SOURCE_SHIFT);
+    match_bits |= source;
+    match_bits = (match_bits << MPID_TAG_SHIFT);
+    match_bits |= (MPID_TAG_MASK & tag) | type;
+    return match_bits;
+}
+
+/* receive posting */
+static inline uint64_t init_recvtag(uint64_t * mask_bits,
+                                    MPIR_Context_id_t contextid, int source, int tag)
+{
+    uint64_t match_bits = 0;
+    *mask_bits = MPID_SYNC_SEND;
+    match_bits = contextid;
+    match_bits = (match_bits << MPID_SOURCE_SHIFT);
+    if (MPI_ANY_SOURCE == source) {
+        match_bits = (match_bits << MPID_TAG_SHIFT);
+        *mask_bits |= MPID_SOURCE_MASK;
+    }
+    else {
+        match_bits |= source;
+        match_bits = (match_bits << MPID_TAG_SHIFT);
+    }
+    if (MPI_ANY_TAG == tag)
+        *mask_bits |= MPID_TAG_MASK;
+    else
+        match_bits |= (MPID_TAG_MASK & tag);
+
+    return match_bits;
+}
+
+static inline int get_tag(uint64_t match_bits)
+{
+    return ((int) (match_bits & MPID_TAG_MASK));
+}
+
+static inline int get_source(uint64_t match_bits)
+{
+    return ((int) ((match_bits & MPID_SOURCE_MASK) >> (MPID_TAG_SHIFT)));
+}
+
+static inline int get_psource(uint64_t match_bits)
+{
+    return ((int) ((match_bits & MPID_PSOURCE_MASK) >> (MPID_PORT_SHIFT)));
+}
+
+static inline int get_pgid(uint64_t match_bits)
+{
+    return ((int) (match_bits & MPID_PGID_MASK));
+}
+
+static inline int get_port(uint64_t match_bits)
+{
+    return ((int) ((match_bits & MPID_PORT_NAME_MASK) >> MPID_TAG_SHIFT));
+}
+
+/* ************************************************************************** */
+/* MPICH Comm Override and Netmod functions                                   */
+/* ************************************************************************** */
+int MPID_nem_sfi_recv_posted(struct MPIDI_VC *vc, struct MPID_Request *req);
+int MPID_nem_sfi_send(struct MPIDI_VC *vc, const void *buf, int count,
+                      MPI_Datatype datatype, int dest, int tag, MPID_Comm * comm,
+                      int context_offset, struct MPID_Request **request);
+int MPID_nem_sfi_isend(struct MPIDI_VC *vc, const void *buf, int count,
+                       MPI_Datatype datatype, int dest, int tag, MPID_Comm * comm,
+                       int context_offset, struct MPID_Request **request);
+int MPID_nem_sfi_ssend(struct MPIDI_VC *vc, const void *buf, int count,
+                       MPI_Datatype datatype, int dest, int tag, MPID_Comm * comm,
+                       int context_offset, struct MPID_Request **request);
+int MPID_nem_sfi_issend(struct MPIDI_VC *vc, const void *buf, int count,
+                        MPI_Datatype datatype, int dest, int tag, MPID_Comm * comm,
+                        int context_offset, struct MPID_Request **request);
+int MPID_nem_sfi_cancel_send(struct MPIDI_VC *vc, struct MPID_Request *sreq);
+int MPID_nem_sfi_cancel_recv(struct MPIDI_VC *vc, struct MPID_Request *rreq);
+int MPID_nem_sfi_iprobe(struct MPIDI_VC *vc, int source, int tag, MPID_Comm * comm,
+                        int context_offset, int *flag, MPI_Status * status);
+int MPID_nem_sfi_improbe(struct MPIDI_VC *vc, int source, int tag, MPID_Comm * comm,
+                         int context_offset, int *flag, MPID_Request ** message,
+                         MPI_Status * status);
+int MPID_nem_sfi_anysource_iprobe(int tag, MPID_Comm * comm, int context_offset,
+                                  int *flag, MPI_Status * status);
+int MPID_nem_sfi_anysource_improbe(int tag, MPID_Comm * comm, int context_offset,
+                                   int *flag, MPID_Request ** message, MPI_Status * status);
+void MPID_nem_sfi_anysource_posted(MPID_Request * rreq);
+int MPID_nem_sfi_anysource_matched(MPID_Request * rreq);
+int MPID_nem_sfi_send_data(cq_tagged_entry_t * wc, MPID_Request * sreq);
+int MPID_nem_sfi_SendNoncontig(MPIDI_VC_t * vc, MPID_Request * sreq,
+                               void *hdr, MPIDI_msg_sz_t hdr_sz);
+int MPID_nem_sfi_iStartContigMsg(MPIDI_VC_t * vc, void *hdr, MPIDI_msg_sz_t hdr_sz,
+                                 void *data, MPIDI_msg_sz_t data_sz, MPID_Request ** sreq_ptr);
+int MPID_nem_sfi_iSendContig(MPIDI_VC_t * vc, MPID_Request * sreq, void *hdr,
+                             MPIDI_msg_sz_t hdr_sz, void *data, MPIDI_msg_sz_t data_sz);
+
+/* ************************************************************************** */
+/* SFI utility functions : not exposed as a netmod public API                 */
+/* ************************************************************************** */
+#define MPID_NONBLOCKING_POLL 0
+#define MPID_BLOCKING_POLL 1
+int MPID_nem_sfi_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_max_sz_p);
+int MPID_nem_sfi_finalize(void);
+int MPID_nem_sfi_vc_init(MPIDI_VC_t * vc);
+int MPID_nem_sfi_get_business_card(int my_rank, char **bc_val_p, int *val_max_sz_p);
+int MPID_nem_sfi_poll(int in_blocking_poll);
+int MPID_nem_sfi_vc_terminate(MPIDI_VC_t * vc);
+int MPID_nem_sfi_vc_connect(MPIDI_VC_t * vc);
+int MPID_nem_sfi_connect_to_root(const char *business_card, MPIDI_VC_t * new_vc);
+int MPID_nem_sfi_vc_destroy(MPIDI_VC_t * vc);
+int MPID_nem_sfi_cm_init(MPIDI_PG_t * pg_p, int pg_rank);
+int MPID_nem_sfi_cm_finalize(void);
+
+extern MPID_nem_sfi_global_t gl_data;
+extern MPIDI_Comm_ops_t _g_comm_ops;
+
+#endif
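
Note on the match/ignore bit layout in sfi_impl.h above: the sketch below
repeats the init_sendtag arithmetic on its own, so the field positions can be
checked in isolation. The mask and shift values are copied from the header.
The sketch_ names, the use of uint16_t for the context id, and the matching
helper are assumptions made for illustration. The helper assumes the usual
tagged-matching rule, where bits set in the ignore mask are disregarded on
both sides of the comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from sfi_impl.h: 4 protocol bits, 16 context bits,
 * 16 source bits, 28 tag bits. */
#define SKETCH_PROTOCOL_MASK (0xF000000000000000ULL)
#define SKETCH_SOURCE_MASK   (0x00000FFFF0000000ULL)
#define SKETCH_TAG_MASK      (0x000000000FFFFFFFULL)
#define SKETCH_SYNC_SEND     (0x1000000000000000ULL)
#define SKETCH_MSG_RTS       (0x3000000000000000ULL)
#define SKETCH_SOURCE_SHIFT  (16)
#define SKETCH_TAG_SHIFT     (28)

/* Same arithmetic as init_sendtag(); uint16_t stands in for
 * MPIR_Context_id_t (assumption). */
static inline uint64_t sketch_sendtag(uint16_t contextid, int source,
                                      int tag, uint64_t type)
{
    uint64_t match_bits = contextid;
    match_bits = (match_bits << SKETCH_SOURCE_SHIFT) | (uint64_t) source;
    match_bits = (match_bits << SKETCH_TAG_SHIFT);
    match_bits |= (SKETCH_TAG_MASK & (uint64_t) tag) | type;
    return match_bits;
}

/* Field extraction, mirroring get_source() and get_tag(). */
static inline int sketch_get_source(uint64_t match_bits)
{
    return (int) ((match_bits & SKETCH_SOURCE_MASK) >> SKETCH_TAG_SHIFT);
}

static inline int sketch_get_tag(uint64_t match_bits)
{
    return (int) (match_bits & SKETCH_TAG_MASK);
}

/* Assumed matching rule: bits set in "ignore" are not compared. */
static inline int sketch_matches(uint64_t posted, uint64_t ignore,
                                 uint64_t incoming)
{
    return (posted | ignore) == (incoming | ignore);
}
```

Under that rule, a receive posted with tag MPID_MSG_RTS and ignore mask
~MPID_PROTOCOL_MASK, as in MPID_nem_sfi_cm_init, would match any message whose
protocol field is RTS, whatever its context id, source, or tag.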
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_init.c b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_init.c
new file mode 100644
index 0000000..88a6496
--- /dev/null
+++ b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_init.c
@@ -0,0 +1,461 @@
+/*
+ *  (C) 2006 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ *
+ *  Portions of this code were written by Intel Corporation.
+ *  Copyright (C) 2011-2012 Intel Corporation.  Intel provides this material
+ *  to Argonne National Laboratory subject to Software Grant and Corporate
+ *  Contributor License Agreement dated February 8, 2012.
+ */
+#include "sfi_impl.h"
+
+static inline int dump_and_choose_providers(info_t * prov, info_t ** prov_use);
+static inline int compile_time_checking(void);
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_init)
+int MPID_nem_sfi_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_max_sz_p)
+{
+    int ret, fi_version, i, len, pmi_errno;
+    int mpi_errno = MPI_SUCCESS;
+    info_t hints, *prov_tagged, *prov_use;
+    cq_attr_t cq_attr;
+    av_attr_t av_attr;
+    char kvsname[SFI_KVSAPPSTRLEN], key[SFI_KVSAPPSTRLEN], bc[SFI_KVSAPPSTRLEN];
+    char *my_bc, *addrs, *null_addr;
+    fi_addr_t *fi_addrs = NULL;
+    MPIDI_VC_t *vc;
+
+    BEGIN_FUNC(FCNAME);
+    MPIU_CHKLMEM_DECL(2);
+
+    compile_time_checking();
+    /* ------------------------------------------------------------------------ */
+    /* Hints to filter providers                                                */
+    /* See man fi_getinfo for a list                                            */
+    /* of all filters                                                           */
+    /* mode:  Select capabilities netmod is prepared to support.                */
+    /*        In this case, netmod will pass in context into                    */
+    /*        communication calls.                                              */
+    /*        Note that we do not fill in FI_LOCAL_MR, which means this netmod  */
+    /*        does not support exchange of memory regions on communication calls */
+    /*        SFI requires that all communication calls use a registered mr     */
+    /*        but in our case this netmod is written to only support transfers  */
+    /*        on a dynamic memory region that spans all of memory.  So, we do   */
+    /*        not set the FI_LOCAL_MR mode bit, and we set the FI_DYNAMIC_MR    */
+    /*        bit to tell SFI our requirement and filter providers appropriately */
+    /* ep_type:  reliable datagram operation                                    */
+    /* caps:     Capabilities required from the provider.  The bits specified   */
+    /*           with buffered receive, cancel, and remote complete implement   */
+    /*           MPI semantics.  Tagged is used to support tag matching.        */
+    /*           We expect to register all memory up front for use with this    */
+    /*           endpoint, so the netmod requires dynamic memory regions        */
+    /* ------------------------------------------------------------------------ */
+    memset(&hints, 0, sizeof(hints));
+    hints.mode = FI_CONTEXT;
+    hints.ep_type = FI_EP_RDM;  /* Reliable datagram         */
+    hints.caps = FI_TAGGED;     /* Tag matching interface    */
+    hints.caps |= FI_BUFFERED_RECV;     /* Buffered receives         */
+    hints.caps |= FI_REMOTE_COMPLETE;   /* Remote completion         */
+    hints.caps |= FI_CANCEL;    /* Support cancel            */
+    hints.caps |= FI_DYNAMIC_MR;        /* Global dynamic mem region */
+
+    /* ------------------------------------------------------------------------ */
+    /* FI_VERSION provides binary backward and forward compatibility support.   */
+    /* Specify the version of SFI this netmod is coded to; the provider will    */
+    /* select struct layouts that are compatible with this version.             */
+    /* ------------------------------------------------------------------------ */
+    fi_version = FI_VERSION(1, 0);
+
+    /* ------------------------------------------------------------------------ */
+    /* fi_getinfo:  returns information about fabric services for reaching a    */
+    /* remote node or service.  This does not necessarily allocate resources.   */
+    /* Pass NULL for name/service because we want a list of providers supported */
+    /* ------------------------------------------------------------------------ */
+    domain_attr_t domain_attr;
+    memset(&domain_attr, 0, sizeof(domain_attr));
+
+    tx_ctx_attr_t tx_attr;
+    memset(&tx_attr, 0, sizeof(tx_attr));
+
+    domain_attr.threading = FI_THREAD_PROGRESS;
+    domain_attr.control_progress = FI_PROGRESS_AUTO;
+    tx_attr.op_flags = FI_REMOTE_COMPLETE;
+    hints.domain_attr = &domain_attr;
+    hints.tx_attr = &tx_attr;
+
+    FI_RC(fi_getinfo(fi_version,        /* Interface version requested               */
+                     NULL,      /* Optional name or fabric to resolve        */
+                     NULL,      /* Service name or port number to request    */
+                     0ULL,      /* Flag:  node/service specify local address */
+                     &hints,    /* In:  Hints to filter available providers  */
+                     &prov_tagged),     /* Out: List of providers that match hints   */
+          getinfo);
+    MPIU_ERR_CHKANDJUMP4(prov_tagged == NULL, mpi_errno, MPI_ERR_OTHER,
+                         "**sfi_getinfo", "**sfi_getinfo %s %d %s %s",
+                         __SHORT_FILE__, __LINE__, FCNAME, "No tag matching provider found");
+    /* ------------------------------------------------------------------------ */
+    /* Open fabric                                                              */
+    /* The getinfo struct returns a fabric attribute struct that can be used to */
+    /* instantiate the virtual or physical network.  This opens a "fabric       */
+    /* provider".   We choose the first available fabric, but getinfo           */
+    /* returns a list.  see man fi_fabric for details                           */
+    /* ------------------------------------------------------------------------ */
+    dump_and_choose_providers(prov_tagged, &prov_use);
+    FI_RC(fi_fabric(prov_use->fabric_attr,      /* In:   Fabric attributes */
+                    &gl_data.fabric,    /* Out:  Fabric descriptor */
+                    NULL), openfabric); /* Context: fabric events  */
+
+    /* ------------------------------------------------------------------------ */
+    /* Create the access domain, which is the physical or virtual network or    */
+    /* hardware port/collection of ports.  Returns a domain object that can be  */
+    /* used to create endpoints.  See man fi_domain for details.                */
+    /* Refine get_info filter for additional capabilities                       */
+    /* threading:  Disable locking, MPICH handles locking model                 */
+    /* control_progress:  enable async progress                                 */
+    /* op_flags:  Specifies default operation to set on all communication.      */
+    /*            In this case, we want remote completion to be set by default  */
+    /* ------------------------------------------------------------------------ */
+    FI_RC(fi_domain(gl_data.fabric,     /* In:  Fabric object             */
+                    prov_use,   /* In:  default domain attributes */
+                    &gl_data.domain,    /* Out: domain object             */
+                    NULL), opendomain); /* Context: Domain events         */
+
+    /* ------------------------------------------------------------------------ */
+    /* Create a transport level communication endpoint.  To use the endpoint,   */
+    /* it must first be bound to the resources it consumes, such as address     */
+    /* vectors, completion counters, completion queues, or event queues,        */
+    /* and then enabled.                                                        */
+    /* see man fi_endpoint for more details                                     */
+    /* ------------------------------------------------------------------------ */
+    FI_RC(fi_endpoint(gl_data.domain,   /* In: Domain Object        */
+                      prov_use, /* In: Configuration object */
+                      &gl_data.endpoint,        /* Out: Endpoint Object     */
+                      NULL), openep);   /* Context: endpoint events */
+
+    /* ------------------------------------------------------------------------ */
+    /* Create the objects that will be bound to the endpoint.                   */
+    /* The objects include:                                                     */
+    /*     * completion queue for events                                        */
+    /*     * address vector of other endpoint addresses                         */
+    /*     * dynamic memory-spanning memory region                              */
+    /* Other objects could be created, but are unused in this netmod, e.g.:     */
+    /*     * counters for incoming writes                                       */
+    /*     * completion counters for put and get                                */
+    /* ------------------------------------------------------------------------ */
+    FI_RC(fi_mr_reg(gl_data.domain,     /* In:  Domain Object              */
+                    0,  /* In:  Lower memory address       */
+                    UINTPTR_MAX,        /* In:  Upper memory address       */
+                    FI_SEND | FI_RECV,  /* In:  Expose MR for read/write   */
+                    0ULL,       /* In:  base MR offset             */
+                    0ULL,       /* In:  requested key              */
+                    0ULL,       /* In:  No flags                   */
+                    &gl_data.mr,        /* Out: memregion object           */
+                    NULL), mr_reg);     /* Context: memregion events       */
+
+    memset(&cq_attr, 0, sizeof(cq_attr));
+    cq_attr.format = FI_CQ_FORMAT_TAGGED;
+    FI_RC(fi_cq_open(gl_data.domain,    /* In:  Domain Object         */
+                     &cq_attr,  /* In:  Configuration object  */
+                     &gl_data.cq,       /* Out: CQ Object             */
+                     NULL), opencq);    /* Context: CQ events         */
+
+    memset(&av_attr, 0, sizeof(av_attr));
+    av_attr.type = FI_AV_MAP;   /* Mapped addressing mode     */
+    FI_RC(fi_av_open(gl_data.domain,    /* In:  Domain Object         */
+                     &av_attr,  /* In:  Configuration object  */
+                     &gl_data.av,       /* Out: AV Object             */
+                     NULL), avopen);    /* Context: AV events         */
+
+    /* --------------------------------------------- */
+    /* Bind the MR, CQ and AV to the endpoint object */
+    /* --------------------------------------------- */
+    FI_RC(fi_ep_bind(gl_data.endpoint, (fid_t) gl_data.mr, 0), bind);
+    FI_RC(fi_ep_bind(gl_data.endpoint, (fid_t) gl_data.cq, FI_SEND | FI_RECV), bind);
+    FI_RC(fi_ep_bind(gl_data.endpoint, (fid_t) gl_data.av, 0), bind);
+
+    /* ------------------------------------- */
+    /* Enable the endpoint for communication */
+    /* This commits the bind operations      */
+    /* ------------------------------------- */
+    FI_RC(fi_enable(gl_data.endpoint), ep_enable);
+
+    /* --------------------------- */
+    /* Free providers info         */
+    /* --------------------------- */
+    fi_freeinfo(prov_use);
+
+    /* ---------------------------------------------------- */
+    /* Exchange endpoint addresses using scalable database  */
+    /* or job launcher, in this case, use PMI interfaces    */
+    /* ---------------------------------------------------- */
+    gl_data.bound_addrlen = sizeof(gl_data.bound_addr);
+    FI_RC(fi_getname((fid_t) gl_data.endpoint, &gl_data.bound_addr,
+                     &gl_data.bound_addrlen), getname);
+
+    /* -------------------------------- */
+    /* Get our business card            */
+    /* -------------------------------- */
+    my_bc = *bc_val_p;
+    MPI_RC(MPID_nem_sfi_get_business_card(pg_rank, bc_val_p, val_max_sz_p));
+
+    /* -------------------------------- */
+    /* Publish the business card        */
+    /* to the KVS                       */
+    /* -------------------------------- */
+    PMI_RC(PMI_KVS_Get_my_name(kvsname, SFI_KVSAPPSTRLEN), pmi);
+    sprintf(key, "SFI-%d", pg_rank);
+
+    PMI_RC(PMI_KVS_Put(kvsname, key, my_bc), pmi);
+    PMI_RC(PMI_KVS_Commit(kvsname), pmi);
+
+    /* -------------------------------- */
+    /* Set the MPI maximum tag value    */
+    /* -------------------------------- */
+    MPIR_Process.attrs.tag_ub = (1 << MPID_TAG_SHIFT) - 1;
+
+    /* --------------------------------- */
+    /* Wait for all the ranks to publish */
+    /* their business card               */
+    /* --------------------------------- */
+    PMI_RC(PMI_Barrier(), pmi);
+
+    /* --------------------------------- */
+    /* Retrieve every rank's address     */
+    /* from KVS and store them in local  */
+    /* table                             */
+    /* --------------------------------- */
+    MPIU_CHKLMEM_MALLOC(addrs, char *, pg_p->size * gl_data.bound_addrlen, mpi_errno, "addrs");
+
+    for (i = 0; i < pg_p->size; ++i) {
+        sprintf(key, "SFI-%d", i);
+
+        PMI_RC(PMI_KVS_Get(kvsname, key, bc, SFI_KVSAPPSTRLEN), pmi);
+        ret = MPIU_Str_get_binary_arg(bc, "SFI",
+                                      (char *) &addrs[i * gl_data.bound_addrlen],
+                                      gl_data.bound_addrlen, &len);
+        MPIU_ERR_CHKANDJUMP((ret != MPIU_STR_SUCCESS && ret != MPIU_STR_NOMEM) ||
+                            (size_t) len != gl_data.bound_addrlen,
+                            mpi_errno, MPI_ERR_OTHER, "**badbusinesscard");
+    }
+
+    /* ---------------------------------------------------- */
+    /* Map the addresses into an address vector             */
+    /* The addressing mode is "map", so we must provide     */
+    /* storage to store the per destination addresses       */
+    /* ---------------------------------------------------- */
+    fi_addrs = MPIU_Malloc(pg_p->size * sizeof(fi_addr_t));
+    FI_RC(fi_av_insert(gl_data.av, addrs, pg_p->size, fi_addrs, 0ULL, NULL), avmap);
+
+    /* ---------------------------------------------------- */
+    /* Insert the ANY_SRC address                           */
+    /* ---------------------------------------------------- */
+    MPIU_CHKLMEM_MALLOC(null_addr, char *, 1 * gl_data.bound_addrlen, mpi_errno, "null_addr");
+    memset(null_addr, 0, gl_data.bound_addrlen);
+
+    FI_RC(fi_av_insert(gl_data.av, null_addr, 1, &gl_data.any_addr, 0ULL, NULL), avmap);
+
+    /* --------------------------------- */
+    /* Store the direct addresses in     */
+    /* the ranks' respective VCs         */
+    /* --------------------------------- */
+    for (i = 0; i < pg_p->size; ++i) {
+        MPIDI_PG_Get_vc(pg_p, i, &vc);
+        VC_SFI(vc)->direct_addr = fi_addrs[i];
+        VC_SFI(vc)->ready = 1;
+    }
+
+    /* --------------------------------------------- */
+    /* Initialize the connection management routines */
+    /* This completes any function handlers and      */
+    /* global data structures, and posts any         */
+    /* persistent communication requests that are    */
+    /* required, like connection management and      */
+    /* startcontig messages                          */
+    /* --------------------------------------------- */
+    MPI_RC(MPID_nem_sfi_cm_init(pg_p, pg_rank));
+  fn_exit:
+    if (fi_addrs)
+        MPIU_Free(fi_addrs);
+    MPIU_CHKLMEM_FREEALL();
+    END_FUNC(FCNAME);
+    return mpi_errno;
+  fn_fail:
+    goto fn_exit;
+}
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_finalize)
+int MPID_nem_sfi_finalize(void)
+{
+    int mpi_errno = MPI_SUCCESS;
+    int ret = 0;
+    BEGIN_FUNC(FCNAME);
+
+    /* --------------------------------------------- */
+    /* Synchronization                               */
+    /* Barrier across all ranks in this world        */
+    /* --------------------------------------------- */
+    MPIR_Barrier_impl(MPIR_Process.comm_world, &ret);
+
+    /* --------------------------------------------- */
+    /* Finalize connection management routines       */
+    /* Cancels any persistent/global requests and    */
+    /* frees any resources from cm_init()            */
+    /* --------------------------------------------- */
+    MPI_RC(MPID_nem_sfi_cm_finalize());
+
+    FI_RC(fi_close((fid_t) gl_data.mr), mrclose);
+    FI_RC(fi_close((fid_t) gl_data.av), avclose);
+    FI_RC(fi_close((fid_t) gl_data.endpoint), epclose);
+    FI_RC(fi_close((fid_t) gl_data.cq), cqclose);
+    FI_RC(fi_close((fid_t) gl_data.domain), domainclose);
+    FI_RC(fi_close((fid_t) gl_data.fabric), fabricclose);
+    END_FUNC_RC(FCNAME);
+}
+
+static inline int compile_time_checking(void)
+{
+    SFI_COMPILE_TIME_ASSERT(sizeof(MPID_nem_sfi_vc_t) <= MPID_NEM_VC_NETMOD_AREA_LEN);
+    SFI_COMPILE_TIME_ASSERT(sizeof(MPID_nem_sfi_req_t) <= MPID_NEM_REQ_NETMOD_AREA_LEN);
+    SFI_COMPILE_TIME_ASSERT(sizeof(iovec_t) == sizeof(MPID_IOV));
+    MPIU_Assert(((void *) &(((iovec_t *) 0)->iov_base)) ==
+                ((void *) &(((MPID_IOV *) 0)->MPID_IOV_BUF)));
+    MPIU_Assert(((void *) &(((iovec_t *) 0)->iov_len)) ==
+                ((void *) &(((MPID_IOV *) 0)->MPID_IOV_LEN)));
+    MPIU_Assert(sizeof(((iovec_t *) 0)->iov_len) == sizeof(((MPID_IOV *) 0)->MPID_IOV_LEN));
+
+    /* ------------------------------------------------------------------------ */
+    /* Generate the MPICH catalog files                                         */
+    /* The high level mpich build scripts inspect MPIU_ERR_ macros to generate  */
+    /* the message catalog.  However, this netmod buries the messages under the */
+    /* FI_RC macros, so the catalog doesn't get generated.  The build system    */
+    /* likely needs a MPIU_ERR_REGISTER macro                                   */
+    /* ------------------------------------------------------------------------ */
+#if 0
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_avmap", "**sfi_avmap %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_tsendto", "**sfi_tsendto %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_trecvfrom", "**sfi_trecvfrom %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_getinfo", "**sfi_getinfo %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_openep", "**sfi_openep %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_openfabric", "**sfi_openfabric %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_opendomain", "**sfi_opendomain %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_opencq", "**sfi_opencq %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_avopen", "**sfi_avopen %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_bind", "**sfi_bind %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_ep_enable", "**sfi_ep_enable %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_getname", "**sfi_getname %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_avclose", "**sfi_avclose %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_epclose", "**sfi_epclose %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_cqclose", "**sfi_cqclose %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_fabricclose", "**sfi_fabricclose %s %d %s %s", a, b, a,
+                  a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_domainclose", "**sfi_domainclose %s %d %s %s", a, b, a,
+                  a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_tsearch", "**sfi_tsearch %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_poll", "**sfi_poll %s %d %s %s", a, b, a, a);
+    MPIU_ERR_SET2(e, MPI_ERR_OTHER, "**sfi_cancel", "**sfi_cancel %s %d %s %s", a, b, a, a);
+#endif
+    return 0;
+}
+
+
+/*
+=== BEGIN_MPI_T_CVAR_INFO_BLOCK ===
+
+cvars:
+    - name        : MPIR_CVAR_DUMP_PROVIDERS
+      category    : DEVELOPER
+      type        : boolean
+      default     : false
+      class       : device
+      verbosity   : MPI_T_VERBOSITY_MPIDEV_DETAIL
+      scope       : MPI_T_SCOPE_LOCAL
+      description : >-
+        If true, dump provider information at init
+
+=== END_MPI_T_CVAR_INFO_BLOCK ===
+*/
+static inline int dump_and_choose_providers(info_t * prov, info_t ** prov_use)
+{
+    info_t *p = prov;
+    int i = 0;
+    *prov_use = prov;
+    if (MPIR_CVAR_DUMP_PROVIDERS) {
+        fprintf(stdout, "Dumping Providers(first=%p):\n", prov);
+        while (p) {
+            fprintf(stdout, " ********** Provider %d (%p) *********\n", i++, p);
+            fprintf(stdout, "%-18s: %-#20" PRIx64 "\n", "caps", p->caps);
+            fprintf(stdout, "%-18s: %-#20" PRIx64 "\n", "mode", p->mode);
+            fprintf(stdout, "%-18s: %-#20" PRIx32 "\n", "ep_type", p->ep_type);
+            fprintf(stdout, "%-18s: %-#20" PRIx32 "\n", "addr_format", p->addr_format);
+            fprintf(stdout, "%-18s: %-20lu\n", "src_addrlen", p->src_addrlen);
+            fprintf(stdout, "%-18s: %-20lu\n", "dest_addrlen", p->dest_addrlen);
+            fprintf(stdout, "%-18s: %-20p\n", "src_addr", p->src_addr);
+            fprintf(stdout, "%-18s: %-20p\n", "dest_addr", p->dest_addr);
+            fprintf(stdout, "%-18s: %-20p\n", "connreq", p->connreq);
+            fprintf(stdout, "%-18s: %-20p\n", "tx_attr", p->tx_attr);
+            fprintf(stdout, "       %-18s: %-#20" PRIx64 "\n", ".caps", p->tx_attr->caps);
+            fprintf(stdout, "       %-18s: %-#20" PRIx64 "\n", ".mode", p->tx_attr->mode);
+            fprintf(stdout, "       %-18s: %-#20" PRIx64 "\n", ".op_flags", p->tx_attr->op_flags);
+            fprintf(stdout, "       %-18s: %-#20" PRIx64 "\n", ".msg_order", p->tx_attr->msg_order);
+            fprintf(stdout, "       %-18s: %-20lu\n", ".inject_size", p->tx_attr->inject_size);
+            fprintf(stdout, "       %-18s: %-20lu\n", ".size", p->tx_attr->size);
+            fprintf(stdout, "       %-18s: %-20lu\n", ".iov_limit", p->tx_attr->iov_limit);
+            fprintf(stdout, "%-18s: %-20p\n", "rx_attr", p->rx_attr);
+            fprintf(stdout, "       %-18s: %-#20" PRIx64 "\n", ".caps", p->rx_attr->caps);
+            fprintf(stdout, "       %-18s: %-#20" PRIx64 "\n", ".mode", p->rx_attr->mode);
+            fprintf(stdout, "       %-18s: %-#20" PRIx64 "\n", ".op_flags", p->rx_attr->op_flags);
+            fprintf(stdout, "       %-18s: %-#20" PRIx64 "\n", ".msg_order", p->rx_attr->msg_order);
+            fprintf(stdout, "       %-18s: %-20lu\n", ".total_buffered_recv",
+                    p->rx_attr->total_buffered_recv);
+            fprintf(stdout, "       %-18s: %-20lu\n", ".size", p->rx_attr->size);
+            fprintf(stdout, "       %-18s: %-20lu\n", ".iov_limit", p->rx_attr->iov_limit);
+            fprintf(stdout, "%-18s: %-20p\n", "ep_attr", p->ep_attr);
+            fprintf(stdout, "       %-18s: %-#20" PRIx32 "\n", ".protocol", p->ep_attr->protocol);
+            fprintf(stdout, "       %-18s: %-20lu\n", ".max_msg_size", p->ep_attr->max_msg_size);
+            fprintf(stdout, "       %-18s: %-20lu\n", ".inject_size", p->ep_attr->inject_size);
+            fprintf(stdout, "       %-18s: %-20lu\n", ".total_buffered_recv",
+                    p->ep_attr->total_buffered_recv);
+            fprintf(stdout, "       %-18s: %-20lu\n", ".max_order_raw_size",
+                    p->ep_attr->max_order_raw_size);
+            fprintf(stdout, "       %-18s: %-20lu\n", ".max_order_war_size",
+                    p->ep_attr->max_order_war_size);
+            fprintf(stdout, "       %-18s: %-20lu\n", ".max_order_waw_size",
+                    p->ep_attr->max_order_waw_size);
+            fprintf(stdout, "       %-18s: %-20lu\n", ".mem_tag_format",
+                    p->ep_attr->mem_tag_format);
+            fprintf(stdout, "       %-18s: %-20lu\n", ".msg_order", p->ep_attr->msg_order);
+            fprintf(stdout, "       %-18s: %-20lu\n", ".tx_ctx_cnt", p->ep_attr->tx_ctx_cnt);
+            fprintf(stdout, "       %-18s: %-20lu\n", ".rx_ctx_cnt", p->ep_attr->rx_ctx_cnt);
+            fprintf(stdout, "%-18s: %-20p\n", "domain_attr", p->domain_attr);
+            fprintf(stdout, "           %-18s: %-20s\n", ".name", p->domain_attr->name);
+            fprintf(stdout, "           %-18s: %-#20" PRIx32 "\n", ".threading",
+                    p->domain_attr->threading);
+            fprintf(stdout, "           %-18s: %-#20" PRIx32 "\n", ".control_progress",
+                    p->domain_attr->control_progress);
+            fprintf(stdout, "           %-18s: %-#20" PRIx32 "\n", ".data_progress",
+                    p->domain_attr->data_progress);
+            fprintf(stdout, "           %-18s: %-20lu\n", ".mr_key_size",
+                    p->domain_attr->mr_key_size);
+            fprintf(stdout, "           %-18s: %-20lu\n", ".cq_data_size",
+                    p->domain_attr->cq_data_size);
+            fprintf(stdout, "           %-18s: %-20lu\n", ".ep_cnt", p->domain_attr->ep_cnt);
+            fprintf(stdout, "           %-18s: %-20lu\n", ".tx_ctx_cnt",
+                    p->domain_attr->tx_ctx_cnt);
+            fprintf(stdout, "           %-18s: %-20lu\n", ".rx_ctx_cnt",
+                    p->domain_attr->rx_ctx_cnt);
+            fprintf(stdout, "           %-18s: %-20lu\n", ".max_ep_tx_ctx",
+                    p->domain_attr->max_ep_tx_ctx);
+            fprintf(stdout, "           %-18s: %-20lu\n", ".max_ep_rx_ctx",
+                    p->domain_attr->max_ep_rx_ctx);
+            fprintf(stdout, "%-18s: %-20p\n", "fabric_attr", p->fabric_attr);
+            fprintf(stdout, "           %-18s: %-20s\n", ".name", p->fabric_attr->name);
+            fprintf(stdout, "           %-18s: %-20s\n", ".prov_name", p->fabric_attr->prov_name);
+            fprintf(stdout, "           %-18s: %-#20" PRIx32 "\n", ".prov_version",
+                    p->fabric_attr->prov_version);
+            p = p->next;
+        }
+    }
+    return i;
+}
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_msg.c b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_msg.c
new file mode 100644
index 0000000..3797f92
--- /dev/null
+++ b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_msg.c
@@ -0,0 +1,237 @@
+/*
+ *  (C) 2006 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ *
+ *  Portions of this code were written by Intel Corporation.
+ *  Copyright (C) 2011-2012 Intel Corporation.  Intel provides this material
+ *  to Argonne National Laboratory subject to Software Grant and Corporate
+ *  Contributor License Agreement dated February 8, 2012.
+ */
+#include "sfi_impl.h"
+
+/* ------------------------------------------------------------------------ */
+/* GET_PGID_AND_SET_MATCH macro looks up the process group to find the      */
+/* correct rank in multiple process groups.  The "contigmsg" family of APIs */
+/* works on a global scope, not on a communicator scope (like tagged MPI).  */
+/* The pgid matching is used for uniquely scoping the tag, usually in       */
+/* intercomms and dynamic process management where there are multiple       */
+/* global world spaces with similar ranks in the global space               */
+/* ------------------------------------------------------------------------ */
+#define GET_PGID_AND_SET_MATCH()                                        \
+({                                                                      \
+  if (vc->pg) {                                                         \
+    MPIDI_PG_IdToNum(gl_data.pg_p, &pgid);                              \
+  } else {                                                              \
+    pgid = NO_PGID;                                                     \
+  }                                                                     \
+  match_bits = (uint64_t)MPIR_Process.comm_world->rank <<               \
+    (MPID_PORT_SHIFT);                                                  \
+  if (0 == pgid) {                                                      \
+    match_bits |= (uint64_t)vc->port_name_tag<<                         \
+      (MPID_PORT_SHIFT+MPID_PSOURCE_SHIFT);                             \
+  }                                                                     \
+  match_bits |= pgid;                                                   \
+  match_bits |= MPID_MSG_RTS;                                           \
+})
+
+/* ------------------------------------------------------------------------ */
+/* START_COMM is common code used by the nemesis netmod functions:          */
+/* iSendContig                                                              */
+/* SendNoncontig                                                            */
+/* iStartContigMsg                                                          */
+/* These routines differ slightly in their behaviors, but can share common  */
+/* code to perform the send.  START_COMM provides that common code, which   */
+/* is based on a tagged rendezvous message.                                 */
+/* The rendezvous is implemented with an RTS-CTS-Data send protocol:        */
+/* CTS_POST()   |                                  |                        */
+/* RTS_SEND()   | -------------------------------> | ue_callback()(sfi_cm.c)*/
+/*              |                                  |   pack_buffer()        */
+/*              |                                  |   DATA_POST()          */
+/*              |                                  |   RTS_POST()           */
+/*              |                                  |   CTS_SEND()           */
+/* CTS_MATCH()  | <------------------------------- |                        */
+/* DATA_SEND()  | ===============================> | handle_packet()        */
+/*              |                                  |   notify_ch3_pkt()     */
+/*              v                                  v                        */
+/* ------------------------------------------------------------------------ */
+#define START_COMM()                                                    \
+  ({                                                                    \
+    GET_PGID_AND_SET_MATCH();                                           \
+    VC_READY_CHECK(vc);                                                 \
+    c = 1;                                                              \
+    MPID_cc_incr(sreq->cc_ptr, &c);                                     \
+    MPID_cc_incr(sreq->cc_ptr, &c);                                     \
+    REQ_SFI(sreq)->event_callback   = MPID_nem_sfi_data_callback;       \
+    REQ_SFI(sreq)->pack_buffer      = pack_buffer;                      \
+    REQ_SFI(sreq)->pack_buffer_size = pkt_len;                          \
+    REQ_SFI(sreq)->vc               = vc;                               \
+    REQ_SFI(sreq)->tag              = match_bits;                       \
+                                                                        \
+    MPID_nem_sfi_create_req(&cts_req, 1);                               \
+    cts_req->dev.OnDataAvail         = NULL;                            \
+    cts_req->dev.next                = NULL;                            \
+    REQ_SFI(cts_req)->event_callback = MPID_nem_sfi_cts_recv_callback;  \
+    REQ_SFI(cts_req)->parent         = sreq;                            \
+                                                                        \
+    FI_RC(fi_trecvfrom(gl_data.endpoint,                                \
+                       NULL,                                            \
+                       0,                                               \
+                       gl_data.mr,                                      \
+                       VC_SFI(vc)->direct_addr,                         \
+                       match_bits | MPID_MSG_CTS,                       \
+                       0, /* Exact tag match, no ignore bits */         \
+                       &(REQ_SFI(cts_req)->sfi_context)),trecvfrom);    \
+    FI_RC(fi_tsendto(gl_data.endpoint,                                  \
+                     &REQ_SFI(sreq)->pack_buffer_size,                  \
+                     sizeof(REQ_SFI(sreq)->pack_buffer_size),           \
+                     gl_data.mr,                                        \
+                     VC_SFI(vc)->direct_addr,                           \
+                     match_bits,                                        \
+                     &(REQ_SFI(sreq)->sfi_context)),tsendto);           \
+  })
+
+
+/* ------------------------------------------------------------------------ */
+/* General handler for RTS-CTS-Data protocol.  Waits for the cc counter     */
+/* to hit two (send RTS and receive CTS decrementers) before kicking off the*/
+/* bulk data transfer.  On data send completion, the request can be freed   */
+/* ------------------------------------------------------------------------ */
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_data_callback)
+static int MPID_nem_sfi_data_callback(cq_tagged_entry_t * wc, MPID_Request * sreq)
+{
+    int complete = 0, mpi_errno = MPI_SUCCESS;
+    MPIDI_VC_t *vc;
+    req_fn reqFn;
+    uint64_t tag = 0;
+    BEGIN_FUNC(FCNAME);
+    if (sreq->cc == 2) {
+        vc = REQ_SFI(sreq)->vc;
+        REQ_SFI(sreq)->tag = tag | MPID_MSG_DATA;
+        FI_RC(fi_tsendto(gl_data.endpoint,
+                         REQ_SFI(sreq)->pack_buffer,
+                         REQ_SFI(sreq)->pack_buffer_size,
+                         gl_data.mr,
+                         VC_SFI(vc)->direct_addr,
+                         wc->tag | MPID_MSG_DATA, (void *) &(REQ_SFI(sreq)->sfi_context)), tsendto);
+    }
+    if (sreq->cc == 1) {
+        if (REQ_SFI(sreq)->pack_buffer)
+            MPIU_Free(REQ_SFI(sreq)->pack_buffer);
+
+        reqFn = sreq->dev.OnDataAvail;
+        if (!reqFn) {
+            MPIDI_CH3U_Request_complete(sreq);
+        }
+        else {
+            vc = REQ_SFI(sreq)->vc;
+            MPI_RC(reqFn(vc, sreq, &complete));
+        }
+    }
+    else {
+        MPIDI_CH3U_Request_complete(sreq);
+    }
+    END_FUNC_RC(FCNAME);
+}
+
+/* ------------------------------------------------------------------------ */
+/* Signals the CTS has been received.  Call MPID_nem_sfi_data_callback on   */
+/* the parent send request to kick off the bulk data transfer               */
+/* ------------------------------------------------------------------------ */
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_cts_recv_callback)
+static int MPID_nem_sfi_cts_recv_callback(cq_tagged_entry_t * wc, MPID_Request * rreq)
+{
+    int mpi_errno = MPI_SUCCESS;
+    BEGIN_FUNC(FCNAME);
+    MPI_RC(MPID_nem_sfi_data_callback(wc, REQ_SFI(rreq)->parent));
+    MPIDI_CH3U_Request_complete(rreq);
+    END_FUNC_RC(FCNAME);
+}
+
+/* ------------------------------------------------------------------------ */
+/* The nemesis API implementations:                                         */
+/* These functions currently memory copy into a pack buffer before sending. */
+/* To improve performance, we can replace the memory copy with a non-contig */
+/* send (using tsendmsg).                                                   */
+/* For now, the memory copy is the simplest implementation of these         */
+/* functions over a tagged msg interface.                                   */
+/* ------------------------------------------------------------------------ */
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_iSendContig)
+int MPID_nem_sfi_iSendContig(MPIDI_VC_t * vc,
+                             MPID_Request * sreq,
+                             void *hdr, MPIDI_msg_sz_t hdr_sz, void *data, MPIDI_msg_sz_t data_sz)
+{
+    int pgid, c, pkt_len, mpi_errno = MPI_SUCCESS;
+    char *pack_buffer;
+    uint64_t match_bits;
+    MPID_Request *cts_req;
+
+    BEGIN_FUNC(FCNAME);
+    MPIU_Assert(hdr_sz <= (MPIDI_msg_sz_t) sizeof(MPIDI_CH3_Pkt_t));
+    MPID_nem_sfi_init_req(sreq);
+    pkt_len = sizeof(MPIDI_CH3_Pkt_t) + data_sz;
+    pack_buffer = MPIU_Malloc(pkt_len);
+    MPIU_Assert(pack_buffer);
+    MPIU_Memcpy(pack_buffer, hdr, hdr_sz);
+    MPIU_Memcpy(pack_buffer + sizeof(MPIDI_CH3_Pkt_t), data, data_sz);
+    START_COMM();
+    END_FUNC_RC(FCNAME);
+}
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_SendNoncontig)
+int MPID_nem_sfi_SendNoncontig(MPIDI_VC_t * vc,
+                               MPID_Request * sreq, void *hdr, MPIDI_msg_sz_t hdr_sz)
+{
+    int c, pgid, pkt_len, mpi_errno = MPI_SUCCESS;
+    char *pack_buffer;
+    MPI_Aint data_sz;
+    uint64_t match_bits;
+    MPID_Request *cts_req;
+
+    BEGIN_FUNC(FCNAME);
+    MPIU_Assert(hdr_sz <= (MPIDI_msg_sz_t) sizeof(MPIDI_CH3_Pkt_t));
+    MPIU_Assert(sreq->dev.segment_first == 0);
+
+    data_sz = sreq->dev.segment_size;
+    pkt_len = sizeof(MPIDI_CH3_Pkt_t) + data_sz;
+    pack_buffer = MPIU_Malloc(pkt_len);
+    MPIU_Assert(pack_buffer);
+    MPIU_Memcpy(pack_buffer, hdr, hdr_sz);
+    MPID_Segment_pack(sreq->dev.segment_ptr, 0, &data_sz, pack_buffer + sizeof(MPIDI_CH3_Pkt_t));
+    START_COMM();
+    MPID_nem_sfi_poll(MPID_NONBLOCKING_POLL);
+    END_FUNC_RC(FCNAME);
+}
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_iStartContigMsg)
+int MPID_nem_sfi_iStartContigMsg(MPIDI_VC_t * vc,
+                                 void *hdr,
+                                 MPIDI_msg_sz_t hdr_sz,
+                                 void *data, MPIDI_msg_sz_t data_sz, MPID_Request ** sreq_ptr)
+{
+    int pkt_len, c, pgid, mpi_errno = MPI_SUCCESS;
+    MPID_Request *sreq;
+    MPID_Request *cts_req;
+    char *pack_buffer;
+    uint64_t match_bits;
+    BEGIN_FUNC(FCNAME);
+    MPIU_Assert(hdr_sz <= (MPIDI_msg_sz_t) sizeof(MPIDI_CH3_Pkt_t));
+
+    MPID_nem_sfi_create_req(&sreq, 2);
+    sreq->kind = MPID_REQUEST_SEND;
+    sreq->dev.OnDataAvail = NULL;
+    sreq->dev.next = NULL;
+    pkt_len = sizeof(MPIDI_CH3_Pkt_t) + data_sz;
+    pack_buffer = MPIU_Malloc(pkt_len);
+    MPIU_Assert(pack_buffer);
+    MPIU_Memcpy((void *) pack_buffer, hdr, hdr_sz);
+    if (data_sz)
+        MPIU_Memcpy((void *) (pack_buffer + sizeof(MPIDI_CH3_Pkt_t)), data, data_sz);
+    START_COMM();
+    *sreq_ptr = sreq;
+    END_FUNC_RC(FCNAME);
+}
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_progress.c b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_progress.c
new file mode 100644
index 0000000..8f40aeb
--- /dev/null
+++ b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_progress.c
@@ -0,0 +1,291 @@
+/*
+ *  (C) 2006 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ *
+ *  Portions of this code were written by Intel Corporation.
+ *  Copyright (C) 2011-2012 Intel Corporation.  Intel provides this material
+ *  to Argonne National Laboratory subject to Software Grant and Corporate
+ *  Contributor License Agreement dated February 8, 2012.
+ */
+#include "sfi_impl.h"
+
+#define TSEARCH_INIT      0
+#define TSEARCH_NOT_FOUND 1
+#define TSEARCH_FOUND     2
+
+/* ------------------------------------------------------------------------ */
+/* This routine looks up the request that contains a context object         */
+/* ------------------------------------------------------------------------ */
+static inline MPID_Request *context_to_req(void *sfi_context)
+{
+    return (MPID_Request *) container_of(sfi_context, MPID_Request, ch.netmod_area.padding);
+}
+
+/* ------------------------------------------------------------------------ */
+/* Populate the status object from the return of the tsearch                */
+/* ------------------------------------------------------------------------ */
+#undef FCNAME
+#define FCNAME DECL_FUNC(search_complete)
+static int search_complete(uint64_t tag, size_t msglen, MPID_Request * rreq)
+{
+    int mpi_errno = MPI_SUCCESS;
+    BEGIN_FUNC(FCNAME);
+    rreq->status.MPI_SOURCE = get_source(tag);
+    rreq->status.MPI_TAG = get_tag(tag);
+    rreq->status.MPI_ERROR = MPI_SUCCESS;
+    MPIR_STATUS_SET_COUNT(rreq->status, msglen);
+    END_FUNC(FCNAME);
+    return mpi_errno;
+}
+
+/* ------------------------------------------------------------------------ */
+/* Check if wc->data is filled.  If it is, a message was found              */
+/* and we fill out the status.  Otherwise, it's not found, and we set the   */
+/* state of the search request to TSEARCH_NOT_FOUND                         */
+/* ------------------------------------------------------------------------ */
+#undef FCNAME
+#define FCNAME DECL_FUNC(tsearch_callback)
+static int tsearch_callback(cq_tagged_entry_t * wc, MPID_Request * rreq)
+{
+    int mpi_errno = MPI_SUCCESS;
+    BEGIN_FUNC(FCNAME);
+    if (wc->data) {
+        REQ_SFI(rreq)->match_state = TSEARCH_FOUND;
+        rreq->status.MPI_SOURCE = get_source(wc->tag);
+        rreq->status.MPI_TAG = get_tag(wc->tag);
+        MPIR_STATUS_SET_COUNT(rreq->status, wc->len);
+        rreq->status.MPI_ERROR = MPI_SUCCESS;
+    }
+    else {
+        REQ_SFI(rreq)->match_state = TSEARCH_NOT_FOUND;
+    }
+    END_FUNC(FCNAME);
+    return mpi_errno;
+}
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_iprobe_impl)
+int MPID_nem_sfi_iprobe_impl(struct MPIDI_VC *vc,
+                             int source,
+                             int tag,
+                             MPID_Comm * comm,
+                             int context_offset,
+                             int *flag, MPI_Status * status, MPID_Request ** rreq_ptr)
+{
+    int ret, mpi_errno = MPI_SUCCESS;
+    fi_addr_t remote_proc = 0;
+    uint64_t match_bits, mask_bits;
+    size_t len;
+    MPID_Request rreq_s, *rreq;
+
+    BEGIN_FUNC(FCNAME);
+    if (rreq_ptr) {
+        MPIDI_Request_create_rreq(rreq, mpi_errno, goto fn_exit);
+        *rreq_ptr = rreq;
+        rreq->comm = comm;
+        rreq->dev.match.parts.rank = source;
+        rreq->dev.match.parts.tag = tag;
+        rreq->dev.match.parts.context_id = comm->context_id;
+        MPIR_Comm_add_ref(comm);
+    }
+    else {
+        rreq = &rreq_s;
+        rreq->dev.OnDataAvail = NULL;
+    }
+    REQ_SFI(rreq)->event_callback = tsearch_callback;
+    REQ_SFI(rreq)->match_state = TSEARCH_INIT;
+    SFI_ADDR_INIT(source, vc, remote_proc);
+    match_bits = init_recvtag(&mask_bits, comm->context_id + context_offset, source, tag);
+
+    /* ------------------------------------------------------------------------ */
+    /* fi_tsearch:                                                              */
+    /* Initiate a search for a match in the hardware or software queue.         */
+    /* The search can complete immediately with a match found (or not, ENOMSG). */
+    /* It can also enqueue a context entry into the completion queue to make the */
+    /* search nonblocking.  This code will poll until the entry is complete.    */
+    /* ------------------------------------------------------------------------ */
+    ret = fi_tsearch(gl_data.endpoint,  /* Tagged Endpoint      */
+                     &match_bits,       /* Match bits           */
+                     mask_bits, /* Bits to ignore       */
+                     0, /* Flags                */
+                     &remote_proc,      /* Remote Address       */
+                     &len,      /* Out:  incoming msglen */
+                     &(REQ_SFI(rreq)->sfi_context));    /* Nonblocking context  */
+    if (ret == -FI_ENOMSG) {
+        *flag = 0;
+        goto fn_exit;
+    }
+    else if (ret == 1) {
+        *flag = 1;
+        search_complete(match_bits, len, rreq);
+        *status = rreq->status;
+        goto fn_exit;
+    }
+    else {
+        MPIU_ERR_CHKANDJUMP4((ret < 0), mpi_errno, MPI_ERR_OTHER,
+                             "**sfi_tsearch", "**sfi_tsearch %s %d %s %s",
+                             __SHORT_FILE__, __LINE__, FCNAME, fi_strerror(-ret));
+    }
+    while (TSEARCH_INIT == REQ_SFI(rreq)->match_state)
+        MPID_nem_sfi_poll(MPID_BLOCKING_POLL);
+
+    if (REQ_SFI(rreq)->match_state == TSEARCH_NOT_FOUND) {
+        if (rreq_ptr) {
+            MPIDI_CH3_Request_destroy(rreq);
+            *rreq_ptr = NULL;
+        }
+        *flag = 0;
+    }
+    else {
+        *status = rreq->status;
+        *flag = 1;
+    }
+    END_FUNC_RC(FCNAME);
+}
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_iprobe)
+int MPID_nem_sfi_iprobe(struct MPIDI_VC *vc,
+                        int source,
+                        int tag,
+                        MPID_Comm * comm, int context_offset, int *flag, MPI_Status * status)
+{
+    int rc;
+    BEGIN_FUNC(FCNAME);
+    rc = MPID_nem_sfi_iprobe_impl(vc, source, tag, comm, context_offset, flag, status, NULL);
+    END_FUNC(FCNAME);
+    return rc;
+}
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_improbe)
+int MPID_nem_sfi_improbe(struct MPIDI_VC *vc,
+                         int source,
+                         int tag,
+                         MPID_Comm * comm,
+                         int context_offset,
+                         int *flag, MPID_Request ** message, MPI_Status * status)
+{
+    int old_error = status->MPI_ERROR;
+    int s;
+    BEGIN_FUNC(FCNAME);
+    s = MPID_nem_sfi_iprobe_impl(vc, source, tag, comm, context_offset, flag, status, message);
+    if (flag && *flag) {
+        status->MPI_ERROR = old_error;
+        (*message)->kind = MPID_REQUEST_MPROBE;
+    }
+    END_FUNC(FCNAME);
+    return s;
+}
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_anysource_iprobe)
+int MPID_nem_sfi_anysource_iprobe(int tag,
+                                  MPID_Comm * comm,
+                                  int context_offset, int *flag, MPI_Status * status)
+{
+    int rc;
+    BEGIN_FUNC(FCNAME);
+    rc = MPID_nem_sfi_iprobe(NULL, MPI_ANY_SOURCE, tag, comm, context_offset, flag, status);
+    END_FUNC(FCNAME);
+    return rc;
+}
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_anysource_improbe)
+int MPID_nem_sfi_anysource_improbe(int tag,
+                                   MPID_Comm * comm,
+                                   int context_offset,
+                                   int *flag, MPID_Request ** message, MPI_Status * status)
+{
+    int rc;
+    BEGIN_FUNC(FCNAME);
+    rc = MPID_nem_sfi_improbe(NULL, MPI_ANY_SOURCE, tag, comm,
+                              context_offset, flag, message, status);
+    END_FUNC(FCNAME);
+    return rc;
+}
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_poll)
+int MPID_nem_sfi_poll(int in_blocking_poll)
+{
+    int complete = 0, mpi_errno = MPI_SUCCESS;
+    ssize_t ret;
+    cq_tagged_entry_t wc;
+    cq_err_entry_t error;
+    MPIDI_VC_t *vc;
+    MPID_Request *req;
+    req_fn reqFn;
+    BEGIN_FUNC(FCNAME);
+    do {
+        /* ----------------------------------------------------- */
+        /* Poll the completion queue                             */
+        /* The strategy here is                                  */
+        /* ret>0 successful poll, events returned                */
+        /* ret==0 empty poll, no events/no error                 */
+        /* ret<0, error, but some error instances should not     */
+        /* cause MPI to terminate                                */
+        /* ----------------------------------------------------- */
+        ret = fi_cq_read(gl_data.cq,    /* Tagged completion queue       */
+                         (void *) &wc,  /* OUT:  Tagged completion entry */
+                         1);    /* Number of entries to poll     */
+        if (ret > 0) {
+            if (NULL != wc.op_context) {
+                req = context_to_req(wc.op_context);
+                if (REQ_SFI(req)->event_callback) {
+                    MPI_RC(REQ_SFI(req)->event_callback(&wc, req));
+                    continue;
+                }
+                reqFn = req->dev.OnDataAvail;
+                if (reqFn) {
+                    if (REQ_SFI(req)->pack_buffer) {
+                        MPIU_Free(REQ_SFI(req)->pack_buffer);
+                    }
+                    vc = REQ_SFI(req)->vc;
+
+                    complete = 0;
+                    MPI_RC(reqFn(vc, req, &complete));
+                    continue;
+                }
+                else {
+                    MPIU_Assert(0);
+                }
+            }
+            else {
+                MPIU_Assert(0);
+            }
+        }
+        else if (ret < 0) {
+            if (ret == -FI_EAVAIL) {
+                ret = fi_cq_readerr(gl_data.cq, (void *) &error, sizeof(error), 0);
+                if (error.err == FI_EMSGSIZE) {
+                    /* ----------------------------------------------------- */
+                    /* This error message should only be delivered on send   */
+                    /* events.  We want to ignore truncation errors          */
+                    /* on the sender side, but complete the request anyway   */
+                    /* Other kinds of requests, this is fatal.               */
+                    /* ----------------------------------------------------- */
+                    req = context_to_req(error.op_context);
+                    if (req->kind == MPID_REQUEST_SEND) {
+                        mpi_errno = REQ_SFI(req)->event_callback(NULL, req);
+                    }
+                    else if (req->kind == MPID_REQUEST_RECV) {
+                        mpi_errno = REQ_SFI(req)->event_callback(&wc, req);
+                        req->status.MPI_ERROR = MPI_ERR_TRUNCATE;
+                        req->status.MPI_TAG = error.tag;
+                    }
+                    else {
+                        mpi_errno = MPI_ERR_OTHER;
+                    }
+                }
+            }
+            else {
+                MPIU_ERR_CHKANDJUMP4(1, mpi_errno, MPI_ERR_OTHER, "**sfi_poll",
+                                     "**sfi_poll %s %d %s %s", __SHORT_FILE__,
+                                     __LINE__, FCNAME, fi_strerror(-ret));
+            }
+        }
+    } while (in_blocking_poll && (ret > 0));
+    END_FUNC_RC(FCNAME);
+}
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_tagged.c b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_tagged.c
new file mode 100644
index 0000000..2d88c10
--- /dev/null
+++ b/src/mpid/ch3/channels/nemesis/netmod/sfi/sfi_tagged.c
@@ -0,0 +1,399 @@
+/*
+ *  (C) 2006 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ *
+ *  Portions of this code were written by Intel Corporation.
+ *  Copyright (C) 2011-2012 Intel Corporation.  Intel provides this material
+ *  to Argonne National Laboratory subject to Software Grant and Corporate
+ *  Contributor License Agreement dated February 8, 2012.
+ */
+#include "sfi_impl.h"
+
+#define MPID_NORMAL_SEND 0
+
+/* ------------------------------------------------------------------------ */
+/* Receive callback called after sending a synchronous send acknowledgement */
+/* ------------------------------------------------------------------------ */
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_sync_recv_callback)
+static inline int MPID_nem_sfi_sync_recv_callback(cq_tagged_entry_t * wc ATTRIBUTE((unused)),
+                                                  MPID_Request * rreq)
+{
+    int mpi_errno = MPI_SUCCESS;
+
+    BEGIN_FUNC(FCNAME);
+
+    MPIDI_CH3U_Recvq_DP(REQ_SFI(rreq)->parent);
+    MPIDI_CH3U_Request_complete(REQ_SFI(rreq)->parent);
+    MPIDI_CH3U_Request_complete(rreq);
+
+    END_FUNC(FCNAME);
+    return mpi_errno;
+}
+
+/* ------------------------------------------------------------------------ */
+/* Send done callback                                                       */
+/* Free any temporary/pack buffers and complete the send request            */
+/* ------------------------------------------------------------------------ */
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_send_callback)
+static inline int MPID_nem_sfi_send_callback(cq_tagged_entry_t * wc ATTRIBUTE((unused)),
+                                             MPID_Request * sreq)
+{
+    int mpi_errno = MPI_SUCCESS;
+    BEGIN_FUNC(FCNAME);
+    if (REQ_SFI(sreq)->pack_buffer)
+        MPIU_Free(REQ_SFI(sreq)->pack_buffer);
+    MPIDI_CH3U_Request_complete(sreq);
+    END_FUNC(FCNAME);
+    return mpi_errno;
+}
+
+/* ------------------------------------------------------------------------ */
+/* Receive done callback                                                    */
+/* Handle an incoming receive completion event                              */
+/* ------------------------------------------------------------------------ */
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_recv_callback)
+static inline int MPID_nem_sfi_recv_callback(cq_tagged_entry_t * wc, MPID_Request * rreq)
+{
+    int err0, err1, src, mpi_errno = MPI_SUCCESS;
+    uint64_t ssend_bits;
+    MPIDI_msg_sz_t sz;
+    MPIDI_VC_t *vc;
+    MPID_Request *sync_req;
+    BEGIN_FUNC(FCNAME);
+    /* ---------------------------------------------------- */
+    /* Populate the MPI Status and unpack noncontig buffer  */
+    /* ---------------------------------------------------- */
+    rreq->status.MPI_ERROR = MPI_SUCCESS;
+    rreq->status.MPI_SOURCE = get_source(wc->tag);
+    rreq->status.MPI_TAG = get_tag(wc->tag);
+    REQ_SFI(rreq)->req_started = 1;
+    MPIR_STATUS_SET_COUNT(rreq->status, wc->len);
+
+    if (REQ_SFI(rreq)->pack_buffer) {
+        MPIDI_CH3U_Buffer_copy(REQ_SFI(rreq)->pack_buffer,
+                               MPIR_STATUS_GET_COUNT(rreq->status),
+                               MPI_BYTE, &err0, rreq->dev.user_buf,
+                               rreq->dev.user_count, rreq->dev.datatype, &sz, &err1);
+        MPIR_STATUS_SET_COUNT(rreq->status, sz);
+        MPIU_Free(REQ_SFI(rreq)->pack_buffer);
+        if (err0 || err1) {
+            rreq->status.MPI_ERROR = MPI_ERR_TYPE;
+        }
+    }
+
+    if ((wc->tag & MPID_PROTOCOL_MASK) == MPID_SYNC_SEND) {
+        /* ---------------------------------------------------- */
+        /* Ack the sync send and wait for the send request      */
+        /* completion (when callback executed).  A protocol bit */
+        /* MPID_SYNC_SEND_ACK is set in the tag bits to provide */
+        /* separation of MPI messages and protocol messages     */
+        /* ---------------------------------------------------- */
+        vc = REQ_SFI(rreq)->vc;
+        if (!vc) {      /* MPI_ANY_SOURCE -- Post message from status, complete the VC */
+            src = get_source(wc->tag);
+            vc = rreq->comm->vcr[src];
+            MPIU_Assert(vc);
+        }
+        ssend_bits = init_sendtag(rreq->dev.match.parts.context_id,
+                                  rreq->comm->rank, rreq->status.MPI_TAG, MPID_SYNC_SEND_ACK);
+        MPID_nem_sfi_create_req(&sync_req, 1);
+        sync_req->dev.OnDataAvail = NULL;
+        sync_req->dev.next = NULL;
+        REQ_SFI(sync_req)->event_callback = MPID_nem_sfi_sync_recv_callback;
+        REQ_SFI(sync_req)->parent = rreq;
+        FI_RC(fi_tsendto(gl_data.endpoint,
+                         NULL,
+                         0,
+                         gl_data.mr,
+                         VC_SFI(vc)->direct_addr,
+                         ssend_bits, &(REQ_SFI(sync_req)->sfi_context)), tsendto);
+    }
+    else {
+        /* ---------------------------------------------------- */
+        /* Non-synchronous send, complete normally              */
+        /* by removing from the CH3 queue and completing the    */
+        /* request object                                       */
+        /* ---------------------------------------------------- */
+        MPIDI_CH3U_Recvq_DP(rreq);
+        MPIDI_CH3U_Request_complete(rreq);
+    }
+    END_FUNC_RC(FCNAME);
+}
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(do_isend)
+static inline int do_isend(struct MPIDI_VC *vc,
+                           const void *buf,
+                           int count,
+                           MPI_Datatype datatype,
+                           int dest,
+                           int tag,
+                           MPID_Comm * comm,
+                           int context_offset, struct MPID_Request **request, uint64_t type)
+{
+    int err0, err1, dt_contig, mpi_errno = MPI_SUCCESS;
+    char *send_buffer;
+    uint64_t match_bits, ssend_match, ssend_mask;
+    MPI_Aint dt_true_lb;
+    MPID_Request *sreq = NULL, *sync_req = NULL;
+    MPIDI_msg_sz_t data_sz;
+    MPID_Datatype *dt_ptr;
+    BEGIN_FUNC(FCNAME);
+    VC_READY_CHECK(vc);
+
+    /* ---------------------------------------------------- */
+    /* Create the MPI request                               */
+    /* ---------------------------------------------------- */
+    MPID_nem_sfi_create_req(&sreq, 2);
+    sreq->kind = MPID_REQUEST_SEND;
+    sreq->dev.OnDataAvail = NULL;
+    REQ_SFI(sreq)->event_callback = MPID_nem_sfi_send_callback;
+    REQ_SFI(sreq)->vc = vc;
+
+    /* ---------------------------------------------------- */
+    /* Create the pack buffer (if required), and allocate   */
+    /* a send request                                       */
+    /* ---------------------------------------------------- */
+    match_bits = init_sendtag(comm->context_id + context_offset, comm->rank, tag, type);
+    sreq->dev.match.parts.tag = match_bits;
+    MPIDI_Datatype_get_info(count, datatype, dt_contig, data_sz, dt_ptr, dt_true_lb);
+    send_buffer = (char *) buf + dt_true_lb;
+    if (!dt_contig) {
+        send_buffer = (char *) MPIU_Malloc(data_sz);
+        MPIU_ERR_CHKANDJUMP1(send_buffer == NULL, mpi_errno,
+                             MPI_ERR_OTHER, "**nomem", "**nomem %s", "Send buffer alloc");
+        MPIDI_CH3U_Buffer_copy(buf, count, datatype, &err0,
+                               send_buffer, data_sz, MPI_BYTE, &data_sz, &err1);
+        REQ_SFI(sreq)->pack_buffer = send_buffer;
+    }
+
+    if (type == MPID_SYNC_SEND) {
+        /* ---------------------------------------------------- */
+        /* For synchronous send, we post a receive to catch the */
+        /* match ack, but use the tag protocol bits to avoid    */
+        /* matching with MPI level messages.                    */
+        /* ---------------------------------------------------- */
+        int c = 1;
+        MPID_cc_incr(sreq->cc_ptr, &c);
+        MPID_nem_sfi_create_req(&sync_req, 1);
+        sync_req->dev.OnDataAvail = NULL;
+        sync_req->dev.next = NULL;
+        REQ_SFI(sync_req)->event_callback = MPID_nem_sfi_sync_recv_callback;
+        REQ_SFI(sync_req)->parent = sreq;
+        ssend_match = init_recvtag(&ssend_mask, comm->context_id + context_offset, dest, tag);
+        ssend_match |= MPID_SYNC_SEND_ACK;
+        FI_RC(fi_trecvfrom(gl_data.endpoint,    /* endpoint    */
+                           NULL,        /* recvbuf     */
+                           0,   /* data sz     */
+                           gl_data.mr,  /* dynamic mr  */
+                           VC_SFI(vc)->direct_addr,     /* remote proc */
+                           ssend_match, /* match bits  */
+                           0ULL,        /* mask bits   */
+                           &(REQ_SFI(sync_req)->sfi_context)), trecvfrom);
+    }
+    FI_RC(fi_tsendto(gl_data.endpoint,  /* Endpoint                       */
+                     send_buffer,       /* Send buffer(packed or user)    */
+                     data_sz,   /* Size of the send               */
+                     gl_data.mr,        /* Dynamic memory region          */
+                     VC_SFI(vc)->direct_addr,   /* Use the address of this VC     */
+                     match_bits,        /* Match bits                     */
+                     &(REQ_SFI(sreq)->sfi_context)), tsendto);
+    *request = sreq;
+    END_FUNC_RC(FCNAME);
+}
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_recv_posted)
+int MPID_nem_sfi_recv_posted(struct MPIDI_VC *vc, struct MPID_Request *rreq)
+{
+    int mpi_errno = MPI_SUCCESS, dt_contig, src, tag;
+    uint64_t match_bits = 0, mask_bits = 0;
+    fi_addr_t remote_proc = 0;
+    MPIDI_msg_sz_t data_sz;
+    MPI_Aint dt_true_lb;
+    MPID_Datatype *dt_ptr;
+    MPIR_Context_id_t context_id;
+    char *recv_buffer;
+    BEGIN_FUNC(FCNAME);
+
+    /* ------------------------ */
+    /* Initialize the request   */
+    /* ------------------------ */
+    MPID_nem_sfi_init_req(rreq);
+    REQ_SFI(rreq)->event_callback = MPID_nem_sfi_recv_callback;
+    REQ_SFI(rreq)->vc = vc;
+
+    /* ---------------------------------------------------- */
+    /* Fill out the match info, and allocate the pack       */
+    /* buffer (if required)                                 */
+    /* ---------------------------------------------------- */
+    src = rreq->dev.match.parts.rank;
+    tag = rreq->dev.match.parts.tag;
+    context_id = rreq->dev.match.parts.context_id;
+    match_bits = init_recvtag(&mask_bits, context_id, src, tag);
+    SFI_ADDR_INIT(src, vc, remote_proc);
+    MPIDI_Datatype_get_info(rreq->dev.user_count, rreq->dev.datatype,
+                            dt_contig, data_sz, dt_ptr, dt_true_lb);
+    if (dt_contig) {
+        recv_buffer = (char *) rreq->dev.user_buf + dt_true_lb;
+    }
+    else {
+        recv_buffer = (char *) MPIU_Malloc(data_sz);
+        MPIU_ERR_CHKANDJUMP1(recv_buffer == NULL, mpi_errno, MPI_ERR_OTHER,
+                             "**nomem", "**nomem %s", "Recv Pack Buffer alloc");
+        REQ_SFI(rreq)->pack_buffer = recv_buffer;
+    }
+
+    /* ---------------- */
+    /* Post the receive */
+    /* ---------------- */
+    FI_RC(fi_trecvfrom(gl_data.endpoint,
+                       recv_buffer,
+                       data_sz,
+                       gl_data.mr,
+                       remote_proc,
+                       match_bits, mask_bits, &(REQ_SFI(rreq)->sfi_context)), trecvfrom);
+    MPID_nem_sfi_poll(MPID_NONBLOCKING_POLL);
+    END_FUNC_RC(FCNAME);
+}
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_send)
+int MPID_nem_sfi_send(struct MPIDI_VC *vc,
+                      const void *buf,
+                      int count,
+                      MPI_Datatype datatype,
+                      int dest,
+                      int tag, MPID_Comm * comm, int context_offset, struct MPID_Request **request)
+{
+    int mpi_errno = MPI_SUCCESS;
+
+    BEGIN_FUNC(FCNAME);
+    mpi_errno = do_isend(vc, buf, count, datatype, dest, tag,
+                         comm, context_offset, request, MPID_NORMAL_SEND);
+    END_FUNC(FCNAME);
+    return mpi_errno;
+}
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_isend)
+int MPID_nem_sfi_isend(struct MPIDI_VC *vc,
+                       const void *buf,
+                       int count,
+                       MPI_Datatype datatype,
+                       int dest,
+                       int tag, MPID_Comm * comm, int context_offset, struct MPID_Request **request)
+{
+    int mpi_errno = MPI_SUCCESS;
+    BEGIN_FUNC(FCNAME);
+    mpi_errno = do_isend(vc, buf, count, datatype, dest,
+                         tag, comm, context_offset, request, MPID_NORMAL_SEND);
+    END_FUNC(FCNAME);
+    return mpi_errno;
+}
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_ssend)
+int MPID_nem_sfi_ssend(struct MPIDI_VC *vc,
+                       const void *buf,
+                       int count,
+                       MPI_Datatype datatype,
+                       int dest,
+                       int tag, MPID_Comm * comm, int context_offset, struct MPID_Request **request)
+{
+    int mpi_errno = MPI_SUCCESS;
+    BEGIN_FUNC(FCNAME);
+    mpi_errno = do_isend(vc, buf, count, datatype, dest,
+                         tag, comm, context_offset, request, MPID_SYNC_SEND);
+    END_FUNC(FCNAME);
+    return mpi_errno;
+}
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_issend)
+int MPID_nem_sfi_issend(struct MPIDI_VC *vc,
+                        const void *buf,
+                        int count,
+                        MPI_Datatype datatype,
+                        int dest,
+                        int tag,
+                        MPID_Comm * comm, int context_offset, struct MPID_Request **request)
+{
+    int mpi_errno = MPI_SUCCESS;
+    BEGIN_FUNC(FCNAME);
+    mpi_errno = do_isend(vc, buf, count, datatype, dest,
+                         tag, comm, context_offset, request, MPID_SYNC_SEND);
+    END_FUNC(FCNAME);
+    return mpi_errno;
+}
+
+#define DO_CANCEL(req)                                  \
+({                                                      \
+  int mpi_errno = MPI_SUCCESS;                          \
+  int ret;                                              \
+  BEGIN_FUNC(FCNAME);                                   \
+  MPID_nem_sfi_poll(MPID_NONBLOCKING_POLL);             \
+  ret = fi_cancel((fid_t)gl_data.endpoint,              \
+                  &(REQ_SFI(req)->sfi_context));        \
+  if (ret == 0) {                                        \
+    MPIR_STATUS_SET_CANCEL_BIT(req->status, TRUE);      \
+  } else {                                              \
+    MPIR_STATUS_SET_CANCEL_BIT(req->status, FALSE);     \
+  }                                                     \
+  END_FUNC(FCNAME);                                     \
+  return mpi_errno;                                     \
+})
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_cancel_send)
+int MPID_nem_sfi_cancel_send(struct MPIDI_VC *vc ATTRIBUTE((unused)), struct MPID_Request *sreq)
+{
+    DO_CANCEL(sreq);
+}
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_cancel_recv)
+int MPID_nem_sfi_cancel_recv(struct MPIDI_VC *vc ATTRIBUTE((unused)), struct MPID_Request *rreq)
+{
+    DO_CANCEL(rreq);
+}
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_anysource_posted)
+void MPID_nem_sfi_anysource_posted(MPID_Request * rreq)
+{
+    int mpi_errno = MPI_SUCCESS;
+    BEGIN_FUNC(FCNAME);
+    mpi_errno = MPID_nem_sfi_recv_posted(NULL, rreq);
+    MPIU_Assert(mpi_errno == MPI_SUCCESS);
+    END_FUNC(FCNAME);
+}
+
+#undef FCNAME
+#define FCNAME DECL_FUNC(MPID_nem_sfi_anysource_matched)
+int MPID_nem_sfi_anysource_matched(MPID_Request * rreq)
+{
+    int mpi_errno = FALSE;
+    int ret;
+    BEGIN_FUNC(FCNAME);
+    /* ----------------------------------------------------- */
+    /* Netmod has notified us that it has matched an any     */
+    /* source request on another device.  We have the chance */
+    /* to cancel this shared request if it has been posted   */
+    /* ----------------------------------------------------- */
+    ret = fi_cancel((fid_t) gl_data.endpoint, &(REQ_SFI(rreq)->sfi_context));
+    if (ret == 0) {
+        /* --------------------------------------------------- */
+        /* Request cancelled:  cancel and complete the request */
+        /* --------------------------------------------------- */
+        mpi_errno = TRUE;
+        MPIR_STATUS_SET_CANCEL_BIT(rreq->status, TRUE);
+        MPIR_STATUS_SET_COUNT(rreq->status, 0);
+        MPIDI_CH3U_Request_complete(rreq);
+    }
+    END_FUNC(FCNAME);
+    return mpi_errno;
+}
diff --git a/src/mpid/ch3/channels/nemesis/netmod/sfi/subconfigure.m4 b/src/mpid/ch3/channels/nemesis/netmod/sfi/subconfigure.m4
new file mode 100644
index 0000000..361f7d0
--- /dev/null
+++ b/src/mpid/ch3/channels/nemesis/netmod/sfi/subconfigure.m4
@@ -0,0 +1,24 @@
+[#] start of __file__
+dnl MPICH_SUBCFG_AFTER=src/mpid/ch3/channels/nemesis
+
+AC_DEFUN([PAC_SUBCFG_PREREQ_]PAC_SUBCFG_AUTO_SUFFIX,[
+    AM_COND_IF([BUILD_CH3_NEMESIS],[
+        for net in $nemesis_networks ; do
+            AS_CASE([$net],[sfi],[build_nemesis_netmod_sfi=yes])
+        done
+    ])
+    AM_CONDITIONAL([BUILD_NEMESIS_NETMOD_SFI],[test "X$build_nemesis_netmod_sfi" = "Xyes"])
+])dnl
+
+AC_DEFUN([PAC_SUBCFG_BODY_]PAC_SUBCFG_AUTO_SUFFIX,[
+AM_COND_IF([BUILD_NEMESIS_NETMOD_SFI],[
+    AC_MSG_NOTICE([RUNNING CONFIGURE FOR ch3:nemesis:sfi])
+
+    PAC_SET_HEADER_LIB_PATH(sfi)
+    PAC_CHECK_HEADER_LIB_FATAL(sfi, rdma/fabric.h, fabric, fi_getinfo)
+
+    AC_DEFINE([ENABLE_COMM_OVERRIDES], 1, [define to add per-vc function pointers to override send and recv functions])
+])dnl end AM_COND_IF(BUILD_NEMESIS_NETMOD_SFI,...)
+])dnl end _BODY
+
+[#] end of __file__

http://git.mpich.org/mpich.git/commitdiff/b84be57606aade7f9cee87c248fd81423cc2f45e

commit b84be57606aade7f9cee87c248fd81423cc2f45e
Author: Paul Coffman <pkcoff at us.ibm.com>
Date:   Thu Nov 20 18:36:41 2014 -0600

    Criteria for disabling pami optimized collectives invalid on BGQ
    
    At the end of MPIDI_Init_collsel_extension in the pami device init code
    (mpid_init.c) there is logic to disable the optimized collectives based on
    criteria that are invalid on BGQ. The condition nonetheless always
    evaluated to true there, which disabled the optimized collectives on BGQ.
    Preprocessor directives (#ifndef __BGQ__) were placed around the logic so
    that it is skipped on the BGQ platform.
    
    Signed-off-by: Paul Coffman <pkcoff at us.ibm.com>
    Signed-off-by: Rob Latham <robl at mcs.anl.gov>

diff --git a/src/mpid/pamid/src/mpid_init.c b/src/mpid/pamid/src/mpid_init.c
index 064f085..52f0b7c 100644
--- a/src/mpid/pamid/src/mpid_init.c
+++ b/src/mpid/pamid/src/mpid_init.c
@@ -636,9 +636,11 @@ void MPIDI_Init_collsel_extension()
   else
     MPIDI_Process.optimized.auto_select_colls = MPID_AUTO_SELECT_COLLS_NONE;
 
+#ifndef __BGQ__
   //If collective selection will be disabled, check on fca, if both not required, disable pami alltogether
   if(MPIDI_Process.optimized.auto_select_colls == MPID_AUTO_SELECT_COLLS_NONE && MPIDI_Process.optimized.collectives != MPID_COLL_FCA)
     MPIDI_Process.optimized.collectives = MPID_COLL_OFF;
+#endif
 }
 
 void MPIDI_Collsel_table_generate()
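
The effect of the guard can be sketched in isolation. This is a minimal illustration, not the pamid code: the enum names, the SKETCH_BGQ macro and apply_disable_check() are hypothetical stand-ins for the MPIDI_Process fields and the __BGQ__ define seen in the diff.

```c
/* Hypothetical stand-ins for the pamid constants; values are illustrative. */
enum coll_mode { COLL_OFF = 0, COLL_ON = 1, COLL_FCA = 2 };
enum auto_select { AUTO_SELECT_NONE = 0, AUTO_SELECT_ALL = 1 };

/* Returns the collectives mode after the "disable" check has run.
 * Building with -DSKETCH_BGQ compiles the check out, which mirrors
 * what the #ifndef __BGQ__ guard does in the patch above. */
enum coll_mode apply_disable_check(enum coll_mode collectives,
                                   enum auto_select sel)
{
#ifndef SKETCH_BGQ
    /* No auto selection and no FCA: turn optimized collectives off. */
    if (sel == AUTO_SELECT_NONE && collectives != COLL_FCA)
        collectives = COLL_OFF;
#else
    (void) sel;                 /* check skipped on the guarded platform */
#endif
    return collectives;
}
```

Without -DSKETCH_BGQ the function turns collectives off whenever auto selection is NONE and FCA is not in use. With it, the input mode is returned unchanged.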

http://git.mpich.org/mpich.git/commitdiff/99b94f7eb247d769931250619a0bae050008cad5

commit 99b94f7eb247d769931250619a0bae050008cad5
Author: Min Si <msi at il.is.s.u-tokyo.ac.jp>
Date:   Wed Nov 19 14:49:06 2014 -0600

    Add tests for SHM detection in win_create.
    
    This program creates a window over a shm window buffer and checks the
    correctness of RMA operations issued through that window. It is built
    as two tests, with and without the alloc_shm info key, in which the
    operations are issued as SHM OPs and as AM, respectively.
    
    Signed-off-by: Xin Zhao <xinzhao3 at illinois.edu>

diff --git a/test/mpi/rma/Makefile.am b/test/mpi/rma/Makefile.am
index c5c7d71..0fce1ad 100644
--- a/test/mpi/rma/Makefile.am
+++ b/test/mpi/rma/Makefile.am
@@ -92,6 +92,8 @@ noinst_PROGRAMS =          \
     win_shared_noncontig   \
     win_shared_noncontig_put \
     win_shared_zerobyte    \
+    win_shared_create_allocshm    \
+    win_shared_create_no_allocshm \
     win_zero               \
     win_large_shm          \
     win_dynamic_acc        \
@@ -214,3 +216,7 @@ mutex_bench_shm_ordered_SOURCES  = mutex_bench.c mcs-mutex.c mcs-mutex.h
 
 linked_list_bench_lock_shr_nocheck_SOURCES  = linked_list_bench_lock_shr.c
 linked_list_bench_lock_shr_nocheck_CPPFLAGS = -DUSE_MODE_NOCHECK $(AM_CPPFLAGS)
+
+win_shared_create_allocshm_SOURCES = win_shared_create.c
+win_shared_create_no_allocshm_SOURCES = win_shared_create.c
+win_shared_create_allocshm_CPPFLAGS = -DUSE_INFO_ALLOC_SHM $(AM_CPPFLAGS)
diff --git a/test/mpi/rma/testlist.in b/test/mpi/rma/testlist.in
index 03f659d..dd9aa16 100644
--- a/test/mpi/rma/testlist.in
+++ b/test/mpi/rma/testlist.in
@@ -75,6 +75,8 @@ manyrma2 2 timeLimit=500
 manyrma2_shm 2 timeLimit=500
 manyrma3 2
 win_shared 4 mpiversion=3.0
+win_shared_create_allocshm 4 mpiversion=3.0
+win_shared_create_no_allocshm 4 mpiversion=3.0
 win_shared_noncontig 4 mpiversion=3.0
 win_shared_noncontig_put 4 mpiversion=3.0
 win_zero 4 mpiversion=3.0
diff --git a/test/mpi/rma/win_shared_create.c b/test/mpi/rma/win_shared_create.c
new file mode 100644
index 0000000..eec22d6
--- /dev/null
+++ b/test/mpi/rma/win_shared_create.c
@@ -0,0 +1,140 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *
+ *  (C) 2014 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <assert.h>
+#include <mpi.h>
+#include "mpitest.h"
+
+#define ELEM_PER_PROC 4
+int local_buf[ELEM_PER_PROC];
+
+const int verbose = 0;
+
+int main(int argc, char **argv)
+{
+    int i, rank, nproc;
+    int shm_rank, shm_nproc;
+    MPI_Aint size;
+    int errors = 0, all_errors = 0;
+    int **bases = NULL, *my_base = NULL;
+    int disp_unit;
+    MPI_Win shm_win = MPI_WIN_NULL, win = MPI_WIN_NULL;
+    MPI_Comm shm_comm = MPI_COMM_NULL;
+    MPI_Group shm_group = MPI_GROUP_NULL, world_group = MPI_GROUP_NULL;
+    int dst_shm_rank, dst_world_rank;
+    MPI_Info create_info = MPI_INFO_NULL;
+
+    MPI_Init(&argc, &argv);
+
+    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
+    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
+
+    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, rank, MPI_INFO_NULL, &shm_comm);
+
+    MPI_Comm_rank(shm_comm, &shm_rank);
+    MPI_Comm_size(shm_comm, &shm_nproc);
+
+    /* Platform does not support shared memory, just return. */
+    if (shm_nproc < 2) {
+        goto exit;
+    }
+
+    /* Specify the last process in the node as the target process */
+    dst_shm_rank = shm_nproc - 1;
+    MPI_Comm_group(shm_comm, &shm_group);
+    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
+    MPI_Group_translate_ranks(shm_group, 1, &dst_shm_rank, world_group, &dst_world_rank);
+
+    bases = calloc(shm_nproc, sizeof(int *));
+
+    /* Allocate shm window among local processes, then create a global window with
+     * those shm window buffers */
+    MPI_Win_allocate_shared(sizeof(int) * ELEM_PER_PROC, sizeof(int), MPI_INFO_NULL,
+                            shm_comm, &my_base, &shm_win);
+    if (verbose)
+        printf("%d -- allocate shared: my_base = %p, absolute base\n", shm_rank, my_base);
+
+    for (i = 0; i < shm_nproc; i++) {
+        MPI_Win_shared_query(shm_win, i, &size, &disp_unit, &bases[i]);
+        if (verbose)
+            printf("%d --    shared query: base[%d]=%p, size %ld, unit %d\n",
+                   shm_rank, i, bases[i], size, disp_unit);
+    }
+
+#ifdef USE_INFO_ALLOC_SHM
+    MPI_Info_create(&create_info);
+    MPI_Info_set(create_info, "alloc_shm", "true");
+#else
+    create_info = MPI_INFO_NULL;
+#endif
+
+    MPI_Win_create(my_base, sizeof(int) * ELEM_PER_PROC, sizeof(int), create_info, MPI_COMM_WORLD,
+                   &win);
+
+    /* Reset data */
+    for (i = 0; i < ELEM_PER_PROC; i++) {
+        my_base[i] = 0;
+        local_buf[i] = i + 1;
+    }
+
+    /* Do RMA through global window, then check value through shared window */
+    MPI_Win_lock_all(0, win);
+    MPI_Win_lock_all(0, shm_win);
+
+    if (shm_rank == 0) {
+        MPI_Put(&local_buf[0], 1, MPI_INT, dst_world_rank, 0, 1, MPI_INT, win);
+        MPI_Put(&local_buf[ELEM_PER_PROC - 1], 1, MPI_INT, dst_world_rank, ELEM_PER_PROC - 1, 1,
+                MPI_INT, win);
+        MPI_Win_flush(dst_world_rank, win);
+    }
+
+    MPI_Win_sync(shm_win);
+    MPI_Barrier(shm_comm);
+    MPI_Win_sync(shm_win);
+
+    if (bases[dst_shm_rank][0] != local_buf[0]) {
+        errors++;
+        printf("%d -- Got %d at rank %d index %d, expected %d\n", rank,
+               bases[dst_shm_rank][0], dst_shm_rank, 0, local_buf[0]);
+    }
+    if (bases[dst_shm_rank][ELEM_PER_PROC - 1] != local_buf[ELEM_PER_PROC - 1]) {
+        errors++;
+        printf("%d -- Got %d at rank %d index %d, expected %d\n", rank,
+               bases[dst_shm_rank][ELEM_PER_PROC - 1], dst_shm_rank,
+               ELEM_PER_PROC - 1, local_buf[ELEM_PER_PROC - 1]);
+    }
+
+    MPI_Win_unlock_all(shm_win);
+    MPI_Win_unlock_all(win);
+
+    MPI_Reduce(&errors, &all_errors, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
+
+    MPI_Win_free(&win);
+    MPI_Win_free(&shm_win);
+
+  exit:
+    if (rank == 0 && all_errors == 0)
+        printf(" No Errors\n");
+
+    if (create_info != MPI_INFO_NULL)
+        MPI_Info_free(&create_info);
+    if (shm_comm != MPI_COMM_NULL)
+        MPI_Comm_free(&shm_comm);
+    if (shm_group != MPI_GROUP_NULL)
+        MPI_Group_free(&shm_group);
+    if (world_group != MPI_GROUP_NULL)
+        MPI_Group_free(&world_group);
+
+    MPI_Finalize();
+
+    if (bases)
+        free(bases);
+
+    return 0;
+}

http://git.mpich.org/mpich.git/commitdiff/4aa9470ba67ed9bec7bb97f0dcd7e8ea44e6de5a

commit 4aa9470ba67ed9bec7bb97f0dcd7e8ea44e6de5a
Author: Min Si <msi at il.is.s.u-tokyo.ac.jp>
Date:   Wed Nov 19 13:57:39 2014 -0600

    Add info check to avoid unnecessary SHM detection.
    
    If the user does not explicitly set alloc_shm to TRUE in win_create, we
    should never try to detect SHM windows, because detection is expensive.
    However, the current code does not check this info flag. This patch
    fixes that.
    
    Closes #2161
    
    Signed-off-by: Xin Zhao <xinzhao3 at illinois.edu>

diff --git a/src/mpid/ch3/src/ch3u_win_fns.c b/src/mpid/ch3/src/ch3u_win_fns.c
index 9e16ce8..71f8b83 100644
--- a/src/mpid/ch3/src/ch3u_win_fns.c
+++ b/src/mpid/ch3/src/ch3u_win_fns.c
@@ -132,7 +132,8 @@ int MPIDI_CH3U_Win_create(void *base, MPI_Aint size, int disp_unit, MPID_Info *i
     mpi_errno = MPIDI_CH3U_Win_create_gather(base, size, disp_unit, info, comm_ptr, win_ptr);
     if (mpi_errno != MPI_SUCCESS) { MPIU_ERR_POP(mpi_errno); }
 
-    if (MPIDI_CH3U_Win_fns.detect_shm != NULL) {
+    if ((*win_ptr)->info_args.alloc_shm == TRUE
+            && MPIDI_CH3U_Win_fns.detect_shm != NULL) {
         /* Detect if shared buffers are specified for the processes in the
          * current node. If so, enable shm RMA.*/
         mpi_errno = MPIDI_CH3U_Win_fns.detect_shm(win_ptr);
diff --git a/src/mpid/ch3/src/mpid_rma.c b/src/mpid/ch3/src/mpid_rma.c
index 685b665..59f438c 100644
--- a/src/mpid/ch3/src/mpid_rma.c
+++ b/src/mpid/ch3/src/mpid_rma.c
@@ -87,7 +87,17 @@ int MPID_Win_create(void *base, MPI_Aint size, int disp_unit, MPID_Info *info,
 
     (*win_ptr)->base = base;
 
-    mpi_errno = MPIDI_CH3U_Win_fns.create(base, size, disp_unit, info, comm_ptr, win_ptr); 
+    /* FOR CREATE, alloc_shm info is default to set to FALSE */
+    (*win_ptr)->info_args.alloc_shm = FALSE;
+    if (info != NULL) {
+        int alloc_shm_flag = 0;
+        char shm_alloc_value[MPI_MAX_INFO_VAL+1];
+        MPIR_Info_get_impl(info, "alloc_shm", MPI_MAX_INFO_VAL, shm_alloc_value, &alloc_shm_flag);
+        if ((alloc_shm_flag == 1) && (!strncmp(shm_alloc_value, "true", sizeof("true"))))
+            (*win_ptr)->info_args.alloc_shm = TRUE;
+    }
+
+    mpi_errno = MPIDI_CH3U_Win_fns.create(base, size, disp_unit, info, comm_ptr, win_ptr);
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
 
  fn_fail:
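
The string comparison in the hunk above passes sizeof("true") as the length, so the terminating NUL is compared as well and only the exact value "true" enables detection. A minimal sketch of that check outside MPICH follows; alloc_shm_requested() is a hypothetical helper, and key_found stands in for the flag returned by the info lookup.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of MPICH: returns 1 only when the
 * info key was found and its value is exactly the string "true". */
int alloc_shm_requested(int key_found, const char *value)
{
    if (!key_found || value == NULL)
        return 0;
    /* sizeof("true") is 5, so the NUL terminator takes part in the
     * comparison; "truex" and "tru" therefore do not match. */
    return strncmp(value, "true", sizeof("true")) == 0;
}
```

Under this check, values such as "TRUE" or "1" leave alloc_shm at FALSE, so SHM detection stays off for them.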

http://git.mpich.org/mpich.git/commitdiff/88cebe83e35a18c79796d1b7ccef26353624bcce

commit 88cebe83e35a18c79796d1b7ccef26353624bcce
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Fri Nov 14 15:48:08 2014 -0600

    portals4: add macro for safe PtlMEAppend
    
    Signed-off-by: Antonio J. Pena <apenya at mcs.anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
index 8f39d73..4071f72 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
@@ -208,6 +208,26 @@ int MPID_nem_ptl_lmt_handle_cookie(MPIDI_VC_t *vc, MPID_Request *req, MPID_IOV s
 int MPID_nem_ptl_lmt_done_send(MPIDI_VC_t *vc, MPID_Request *req);
 int MPID_nem_ptl_lmt_done_recv(MPIDI_VC_t *vc, MPID_Request *req);
 
+/* a safe PtlMEAppend for when there is no space available */
+static inline int MPID_nem_ptl_me_append(ptl_handle_ni_t  ni_handle,
+                                         ptl_pt_index_t   pt_index,
+                                         const ptl_me_t  *me,
+                                         ptl_list_t       ptl_list,
+                                         void            *user_ptr,
+                                         ptl_handle_me_t *me_handle)
+{
+    int ret;
+
+    while (1) {
+        ret = PtlMEAppend(ni_handle, pt_index, me, ptl_list, user_ptr, me_handle);
+        if (ret != PTL_NO_SPACE)
+            break;
+        MPID_nem_ptl_poll(1);
+    }
+
+    return ret;
+}
+
 /* packet handlers */
 
 int MPID_nem_ptl_pkt_cancel_send_req_handler(MPIDI_VC_t *vc, MPIDI_CH3_Pkt_t *pkt,
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
index 02fcc0b..1780920 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
@@ -63,8 +63,8 @@ int MPID_nem_ptl_nm_init(void)
         mes[i].match_bits = CTL_TAG;
         mes[i].ignore_bits = 0;
 
-        ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_control_pt, &mes[i],
-                          PTL_PRIORITY_LIST, (void *)(uint64_t)i, &me_handles[i]);
+        ret = MPID_nem_ptl_me_append(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_control_pt, &mes[i],
+                                     PTL_PRIORITY_LIST, (void *)(uint64_t)i, &me_handles[i]);
         MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s",
                              MPID_nem_ptl_strerror(ret));
     }
@@ -134,15 +134,8 @@ static inline int meappend_large(ptl_process_t id, MPID_Request *req, ptl_match_
 
         ++REQ_PTL(req)->num_gets;
 
-        /* if there is no space to append the entry, process outstanding events and try again */
-        while (1) {
-            ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_control_pt, &me, PTL_PRIORITY_LIST, req,
-                              &foo_me_handle);
-            if (ret != PTL_NO_SPACE)
-                break;
-            MPID_nem_ptl_poll(1);
-        }
-
+        ret = MPID_nem_ptl_me_append(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_control_pt, &me, PTL_PRIORITY_LIST, req,
+                                     &foo_me_handle);
         MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s",
                              MPID_nem_ptl_strerror(ret));
         MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "PtlMEAppend(req=%p tag=%#lx)", req, tag));
@@ -450,14 +443,8 @@ int MPID_nem_ptl_nm_ctl_event_handler(const ptl_event_t *e)
             }
 
             /* Repost the recv buffer */
-            /* if there is no space to append the entry, process outstanding events and try again */
-            while (1) {
-                ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_control_pt, &mes[buf_idx],
-                                  PTL_PRIORITY_LIST, e->user_ptr /* buf_idx */, &me_handles[buf_idx]);
-                if (ret != PTL_NO_SPACE)
-                    break;
-                MPID_nem_ptl_poll(1);
-            }
+            ret = MPID_nem_ptl_me_append(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_control_pt, &mes[buf_idx],
+                                         PTL_PRIORITY_LIST, e->user_ptr /* buf_idx */, &me_handles[buf_idx]);
             MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend",
                                  "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));
         }
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
index 0a3bf10..3e22556 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
@@ -106,13 +106,8 @@ static int append_overflow(int i)
     me.min_free = PTL_MAX_EAGER;
     
     /* if there is no space to append the entry, process outstanding events and try again */
-    while (1) {
-        ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_pt, &me, PTL_OVERFLOW_LIST, (void *)(size_t)i,
-                          &overflow_me_handle[i]);
-        if (ret != PTL_NO_SPACE)
-            break;
-        MPID_nem_ptl_poll(1);
-    }
+    ret = MPID_nem_ptl_me_append(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_pt, &me, PTL_OVERFLOW_LIST, (void *)(size_t)i,
+                                 &overflow_me_handle[i]);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));
 
  fn_exit:
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
index 3633236..e69e9b1 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
@@ -543,13 +543,7 @@ int MPID_nem_ptl_recv_posted(MPIDI_VC_t *vc, MPID_Request *rreq)
         
     }
 
-    /* if there is no space to append the entry, process outstanding events and try again */
-    while (1) {
-        ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_pt, &me, PTL_PRIORITY_LIST, rreq, &REQ_PTL(rreq)->put_me);
-        if (ret != PTL_NO_SPACE)
-            break;
-        MPID_nem_ptl_poll(1);
-    }
+    ret = MPID_nem_ptl_me_append(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_pt, &me, PTL_PRIORITY_LIST, rreq, &REQ_PTL(rreq)->put_me);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));
     DBG_MSG_MEAPPEND("REG", vc ? vc->pg_rank : MPI_ANY_SOURCE, me, rreq);
     MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "    buf=%p", me.start);
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
index f6402f9..e49546f 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
@@ -41,13 +41,8 @@ static void big_meappend(void *buf, ptl_size_t left_to_send, MPIDI_VC_t *vc, ptl
         else
             me.length = left_to_send;
 
-        /* if there is no space to append the entry, process outstanding events and try again */
-        while (1) {
-            ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, sreq, &REQ_PTL(sreq)->get_me_p[i]);
-            if (ret != PTL_NO_SPACE)
-                break;
-            MPID_nem_ptl_poll(1);
-        }
+        ret = MPID_nem_ptl_me_append(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, sreq,
+                                     &REQ_PTL(sreq)->get_me_p[i]);
         DBG_MSG_MEAPPEND("CTL", vc->pg_rank, me, sreq);
         MPIU_Assert(ret == 0);
 
@@ -387,14 +382,8 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
                 MPIU_CHKPMEM_MALLOC(REQ_PTL(sreq)->get_me_p, ptl_handle_me_t *, sizeof(ptl_handle_me_t), mpi_errno, "get_me_p");
 
                 REQ_PTL(sreq)->num_gets = 1;
-                /* if there is no space to append the entry, process outstanding events and try again */
-                while (1) {
-                    ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, sreq,
-                                      &REQ_PTL(sreq)->get_me_p[0]);
-                    if (ret != PTL_NO_SPACE)
-                        break;
-                    MPID_nem_ptl_poll(1);
-                }
+                ret = MPID_nem_ptl_me_append(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, sreq,
+                                             &REQ_PTL(sreq)->get_me_p[0]);
                 MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));
                 DBG_MSG_MEAPPEND("CTL", vc->pg_rank, me, sreq);
 

http://git.mpich.org/mpich.git/commitdiff/c8b3a45945b31f6f59bd8af42106d7ab0bdc1c4e

commit c8b3a45945b31f6f59bd8af42106d7ab0bdc1c4e
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Fri Nov 14 10:41:53 2014 -0600

    portals4: raise list size limit
    
    Signed-off-by: Antonio J. Pena <apenya at mcs.anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
index ffad963..3b0807d 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
@@ -14,6 +14,7 @@
 
 #define UNEXPECTED_HDR_COUNT 32768
 #define EQ_COUNT             32768
+#define LIST_SIZE            32768
 #define NID_KEY  "NID"
 #define PID_KEY  "PID"
 #define PTI_KEY  "PTI"
@@ -180,6 +181,8 @@ static int ptl_init(MPIDI_PG_t *pg_p, int pg_rank, char **bc_val_p, int *val_max
         desired.max_unexpected_headers = UNEXPECTED_HDR_COUNT;
     if (desired.max_eqs < EQ_COUNT && getenv("PTL_LIM_MAX_EQS") == NULL)
         desired.max_eqs = EQ_COUNT;
+    if (desired.max_list_size < LIST_SIZE && getenv("PTL_LIM_MAX_LIST_SIZE") == NULL)
+        desired.max_list_size = LIST_SIZE;
 
     /* do the real init */
     ret = PtlNIInit(PTL_IFACE_DEFAULT, PTL_NI_MATCHING | PTL_NI_PHYSICAL,
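
The rule applied to max_list_size (raise the limit to a floor unless the user set the matching environment variable) can be sketched on its own. The function names below are hypothetical. Only the shape of the condition is taken from the diff.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: raise `desired` to `floor_value` unless the user
 * supplied an override (non-NULL), in which case the value is left alone. */
unsigned raise_limit(unsigned desired, unsigned floor_value,
                     const char *user_override)
{
    if (desired < floor_value && user_override == NULL)
        return floor_value;
    return desired;
}

/* Same rule, reading the override from the environment by name,
 * as the patch does with getenv("PTL_LIM_MAX_LIST_SIZE"). */
unsigned raise_limit_from_env(unsigned desired, unsigned floor_value,
                              const char *env_name)
{
    return raise_limit(desired, floor_value, getenv(env_name));
}
```

Note that the override only needs to be set; its value is not parsed here, which matches the getenv(...) == NULL test in the diff.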

http://git.mpich.org/mpich.git/commitdiff/47355ef010b086648af9960bdb77d932437d36b6

commit 47355ef010b086648af9960bdb77d932437d36b6
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Thu Nov 13 13:10:20 2014 -0600

    portals4: handle PTL_NO_SPACE
    
    It is possible that PtlMEAppend can return a PTL_NO_SPACE error, meaning
    there are too many outstanding operations already active. To avoid an abort
    we simply retry after processing events that have queued up locally.
    
    Signed-off-by: Antonio J. Pena <apenya at mcs.anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
index 9a1000e..02fcc0b 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
@@ -134,8 +134,15 @@ static inline int meappend_large(ptl_process_t id, MPID_Request *req, ptl_match_
 
         ++REQ_PTL(req)->num_gets;
 
-        ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_control_pt, &me, PTL_PRIORITY_LIST, req,
-                          &foo_me_handle);
+        /* if there is no space to append the entry, process outstanding events and try again */
+        while (1) {
+            ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_control_pt, &me, PTL_PRIORITY_LIST, req,
+                              &foo_me_handle);
+            if (ret != PTL_NO_SPACE)
+                break;
+            MPID_nem_ptl_poll(1);
+        }
+
         MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s",
                              MPID_nem_ptl_strerror(ret));
         MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "PtlMEAppend(req=%p tag=%#lx)", req, tag));
@@ -443,8 +450,14 @@ int MPID_nem_ptl_nm_ctl_event_handler(const ptl_event_t *e)
             }
 
             /* Repost the recv buffer */
-            ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_control_pt, &mes[buf_idx],
-                              PTL_PRIORITY_LIST, e->user_ptr /* buf_idx */, &me_handles[buf_idx]);
+            /* if there is no space to append the entry, process outstanding events and try again */
+            while (1) {
+                ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_control_pt, &mes[buf_idx],
+                                  PTL_PRIORITY_LIST, e->user_ptr /* buf_idx */, &me_handles[buf_idx]);
+                if (ret != PTL_NO_SPACE)
+                    break;
+                MPID_nem_ptl_poll(1);
+            }
             MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend",
                                  "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));
         }
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
index c016952..0a3bf10 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
@@ -105,8 +105,14 @@ static int append_overflow(int i)
     me.ignore_bits = ~((ptl_match_bits_t)0);
     me.min_free = PTL_MAX_EAGER;
     
-    ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_pt, &me, PTL_OVERFLOW_LIST, (void *)(size_t)i,
-                      &overflow_me_handle[i]);
+    /* if there is no space to append the entry, process outstanding events and try again */
+    while (1) {
+        ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_pt, &me, PTL_OVERFLOW_LIST, (void *)(size_t)i,
+                          &overflow_me_handle[i]);
+        if (ret != PTL_NO_SPACE)
+            break;
+        MPID_nem_ptl_poll(1);
+    }
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));
 
  fn_exit:
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
index ec6d90a..3633236 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
@@ -543,7 +543,13 @@ int MPID_nem_ptl_recv_posted(MPIDI_VC_t *vc, MPID_Request *rreq)
         
     }
 
-    ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_pt, &me, PTL_PRIORITY_LIST, rreq, &REQ_PTL(rreq)->put_me);
+    /* if there is no space to append the entry, process outstanding events and try again */
+    while (1) {
+        ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_pt, &me, PTL_PRIORITY_LIST, rreq, &REQ_PTL(rreq)->put_me);
+        if (ret != PTL_NO_SPACE)
+            break;
+        MPID_nem_ptl_poll(1);
+    }
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));
     DBG_MSG_MEAPPEND("REG", vc ? vc->pg_rank : MPI_ANY_SOURCE, me, rreq);
     MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "    buf=%p", me.start);
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
index 34a1b4f..f6402f9 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
@@ -41,7 +41,13 @@ static void big_meappend(void *buf, ptl_size_t left_to_send, MPIDI_VC_t *vc, ptl
         else
             me.length = left_to_send;
 
-        ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, sreq, &REQ_PTL(sreq)->get_me_p[i]);
+        /* if there is no space to append the entry, process outstanding events and try again */
+        while (1) {
+            ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, sreq, &REQ_PTL(sreq)->get_me_p[i]);
+            if (ret != PTL_NO_SPACE)
+                break;
+            MPID_nem_ptl_poll(1);
+        }
         DBG_MSG_MEAPPEND("CTL", vc->pg_rank, me, sreq);
         MPIU_Assert(ret == 0);
 
@@ -381,8 +387,14 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
                 MPIU_CHKPMEM_MALLOC(REQ_PTL(sreq)->get_me_p, ptl_handle_me_t *, sizeof(ptl_handle_me_t), mpi_errno, "get_me_p");
 
                 REQ_PTL(sreq)->num_gets = 1;
-                ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, sreq,
-                                  &REQ_PTL(sreq)->get_me_p[0]);
+                /* if there is no space to append the entry, process outstanding events and try again */
+                while (1) {
+                    ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, sreq,
+                                      &REQ_PTL(sreq)->get_me_p[0]);
+                    if (ret != PTL_NO_SPACE)
+                        break;
+                    MPID_nem_ptl_poll(1);
+                }
                 MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));
                 DBG_MSG_MEAPPEND("CTL", vc->pg_rank, me, sreq);
 
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl_init.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl_init.c
index 9262c5e..ce6113d 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl_init.c
@@ -35,7 +35,11 @@ int rptli_post_control_buffer(ptl_handle_ni_t ni_handle, ptl_pt_index_t pt,
     me.ignore_bits = 0;
     me.min_free = 0;
 
-    ret = PtlMEAppend(ni_handle, pt, &me, PTL_PRIORITY_LIST, NULL, me_handle);
+    while (1) {
+        ret = PtlMEAppend(ni_handle, pt, &me, PTL_PRIORITY_LIST, NULL, me_handle);
+        if (ret != PTL_NO_SPACE)
+            break;
+    }
     RPTLU_ERR_POP(ret, "Error appending empty buffer to priority list\n");
 
   fn_exit:
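
The retry pattern this commit adds around PtlMEAppend can be sketched with a mock resource. Everything below is an assumption made for illustration: mock_list, mock_append() and mock_poll() are invented stand-ins, not Portals calls, and the SKETCH_* codes are not the real PTL_* values.

```c
/* Illustrative return codes; not the Portals PTL_* constants. */
enum { SKETCH_OK = 0, SKETCH_NO_SPACE = 1 };

/* Mock list with a fixed number of slots. Each pending event, once
 * processed, releases one slot. */
struct mock_list {
    int capacity;
    int in_use;
    int pending_events;
    int polls;              /* how many times we had to poll */
};

/* Stand-in for the append call: fails when the list is full. */
int mock_append(struct mock_list *l)
{
    if (l->in_use >= l->capacity)
        return SKETCH_NO_SPACE;
    l->in_use++;
    return SKETCH_OK;
}

/* Stand-in for event processing: completes one event if any is queued. */
void mock_poll(struct mock_list *l)
{
    l->polls++;
    if (l->pending_events > 0) {
        l->pending_events--;
        l->in_use--;
    }
}

/* Retry until the append returns something other than "no space".
 * Like the loops in the patch, this spins for as long as polling
 * frees nothing. */
int append_with_retry(struct mock_list *l)
{
    int ret;
    while (1) {
        ret = mock_append(l);
        if (ret != SKETCH_NO_SPACE)
            break;
        mock_poll(l);
    }
    return ret;
}
```

Any return code other than "no space" leaves the loop immediately, so real errors still reach the caller's error check.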

http://git.mpich.org/mpich.git/commitdiff/0addd1519ebb06594534f2a1105fa215fbcaeb14

commit 0addd1519ebb06594534f2a1105fa215fbcaeb14
Author: Junchao Zhang <jczhang at mcs.anl.gov>
Date:   Mon Nov 17 20:23:57 2014 -0600

    Add MPI_Status_f2f08 & MPI_Status_f082f
    
    No reviewer

diff --git a/src/binding/fortran/use_mpi_f08/mpi_f08_types.F90 b/src/binding/fortran/use_mpi_f08/mpi_f08_types.F90
index f87e1d4..f669123 100644
--- a/src/binding/fortran/use_mpi_f08/mpi_f08_types.F90
+++ b/src/binding/fortran/use_mpi_f08/mpi_f08_types.F90
@@ -385,6 +385,30 @@ subroutine MPI_Sizeof_xcomplex128 (x, size, ierror)
     ierror = 0
 end subroutine MPI_Sizeof_xcomplex128
 
+subroutine MPI_Status_f2f08(f_status, f08_status, ierror)
+    integer, intent(in) :: f_status(MPI_STATUS_SIZE)
+    type(MPI_Status), intent(out) :: f08_status
+    integer, optional,  intent(out) :: ierror
+    f08_status%count_lo = f_status(1)
+    f08_status%count_hi_and_cancelled = f_status(2)
+    f08_status%MPI_SOURCE = f_status(MPI_SOURCE)
+    f08_status%MPI_TAG = f_status(MPI_TAG)
+    f08_status%MPI_ERROR = f_status(MPI_ERROR)
+    if (present(ierror)) ierror = 0
+end subroutine
+
+subroutine MPI_Status_f082f(f08_status, f_status, ierror)
+    type(MPI_Status), intent(in) :: f08_status
+    integer, intent(out) :: f_status(MPI_STATUS_SIZE)
+    integer, optional,  intent(out) :: ierror
+    f_status(1) = f08_status%count_lo
+    f_status(2) = f08_status%count_hi_and_cancelled
+    f_status(MPI_SOURCE) = f08_status%MPI_SOURCE
+    f_status(MPI_TAG) = f08_status%MPI_TAG
+    f_status(MPI_ERROR) = f08_status%MPI_ERROR
+    if (present(ierror)) ierror = 0
+end subroutine
+
 elemental subroutine MPI_Status_f08_assgn_c (status_f08, status_c)
     ! Defined status_f08 = status_c
     type(MPI_Status),intent(out) :: status_f08

http://git.mpich.org/mpich.git/commitdiff/3261c72f95bfaedff77544eea5ba1a15ad37c976

commit 3261c72f95bfaedff77544eea5ba1a15ad37c976
Author: Junchao Zhang <jczhang at mcs.anl.gov>
Date:   Mon Nov 17 20:23:34 2014 -0600

    Fix an arg name in MPI_TYPE_NULL_DELETE_FN
    
    No reviewer

diff --git a/src/binding/fortran/use_mpi_f08/mpi_f08_callbacks.F90 b/src/binding/fortran/use_mpi_f08/mpi_f08_callbacks.F90
index 4a596dd..9de14e9 100644
--- a/src/binding/fortran/use_mpi_f08/mpi_f08_callbacks.F90
+++ b/src/binding/fortran/use_mpi_f08/mpi_f08_callbacks.F90
@@ -237,12 +237,12 @@ subroutine MPI_TYPE_NULL_COPY_FN(oldtype,type_keyval,extra_state, &
     ierror = MPI_SUCCESS
 end subroutine
 
-subroutine MPI_TYPE_NULL_DELETE_FN(type,type_keyval, &
+subroutine MPI_TYPE_NULL_DELETE_FN(datatype,type_keyval, &
        attribute_val, extra_state, ierror)
     use mpi_f08_types, only : MPI_Datatype
     use mpi_f08_compile_constants, only : MPI_ADDRESS_KIND, MPI_SUCCESS
     implicit none
-    type(MPI_Datatype) :: type
+    type(MPI_Datatype) :: datatype
     integer :: type_keyval, ierror
     integer(kind=MPI_ADDRESS_KIND) :: attribute_val, extra_state
 

http://git.mpich.org/mpich.git/commitdiff/7c5ef0245c72a67a9d013c276ce2fa106e198426

commit 7c5ef0245c72a67a9d013c276ce2fa106e198426
Author: Junchao Zhang <jczhang at mcs.anl.gov>
Date:   Mon Nov 17 20:22:56 2014 -0600

    Fix an arg name in MPI_Comm_set_info
    
    No reviewer

diff --git a/src/binding/fortran/use_mpi_f08/mpi_f08.F90 b/src/binding/fortran/use_mpi_f08/mpi_f08.F90
index bd4c52d..e225dbb 100644
--- a/src/binding/fortran/use_mpi_f08/mpi_f08.F90
+++ b/src/binding/fortran/use_mpi_f08/mpi_f08.F90
@@ -1557,11 +1557,11 @@ interface MPI_Comm_set_attr
 end interface MPI_Comm_set_attr
 
 interface MPI_Comm_set_info
-    subroutine MPI_Comm_set_info_f08(comm, info_used, ierror)
+    subroutine MPI_Comm_set_info_f08(comm, info, ierror)
         use :: mpi_f08_types, only : MPI_Comm, MPI_Info
         implicit none
         type(MPI_Comm), intent(in) :: comm
-        type(MPI_Info), intent(in) :: info_used
+        type(MPI_Info), intent(in) :: info
         integer, optional, intent(out) :: ierror
     end subroutine MPI_Comm_set_info_f08
 end interface MPI_Comm_set_info

http://git.mpich.org/mpich.git/commitdiff/5b9ff6f2d96beea35ac4d1c256ea2dba06ec06b0

commit 5b9ff6f2d96beea35ac4d1c256ea2dba06ec06b0
Author: Junchao Zhang <jczhang at mcs.anl.gov>
Date:   Mon Nov 17 20:41:17 2014 -0600

    Fix a typo for MPI_TYPE_NULL
    
    It should be MPI_DATATYPE_NULL. MPI does not have MPI_TYPE_NULL.
    
    Signed-off-by: Huiwei Lu <huiweilu at mcs.anl.gov>

diff --git a/src/mpid/common/sched/mpid_sched.c b/src/mpid/common/sched/mpid_sched.c
index 2fab4d6..21082d1 100644
--- a/src/mpid/common/sched/mpid_sched.c
+++ b/src/mpid/common/sched/mpid_sched.c
@@ -19,7 +19,7 @@
 #endif
 
 /* helper macros to improve code readability */
-/* we pessimistically assume that MPI_TYPE_NULL may be passed as a "valid" type
+/* we pessimistically assume that MPI_DATATYPE_NULL may be passed as a "valid" type
  * for send/recv when MPI_PROC_NULL is the destination/src */
 #define dtype_add_ref_if_not_builtin(datatype_)                    \
     do {                                                           \
diff --git a/test/mpi/f08/ext/ctypesinf90.f90 b/test/mpi/f08/ext/ctypesinf90.f90
index 815d9b0..131761f 100644
--- a/test/mpi/f08/ext/ctypesinf90.f90
+++ b/test/mpi/f08/ext/ctypesinf90.f90
@@ -31,11 +31,11 @@
       errs = errs + f2ctype( MPI_LONG_INT, 14 )
       errs = errs + f2ctype( MPI_SHORT_INT, 15 )
       errs = errs + f2ctype( MPI_2INT, 16 )
-      if (MPI_LONG_DOUBLE .ne. MPI_TYPE_NULL) then
+      if (MPI_LONG_DOUBLE .ne. MPI_DATATYPE_NULL) then
           errs = errs + f2ctype( MPI_LONG_DOUBLE, 17 )
           errs = errs + f2ctype( MPI_LONG_DOUBLE_INT, 21 )
       endif
-      if (MPI_LONG_LONG .ne. MPI_TYPE_NULL) then
+      if (MPI_LONG_LONG .ne. MPI_DATATYPE_NULL) then
           errs = errs + f2ctype( MPI_LONG_LONG_INT, 18 )
           errs = errs + f2ctype( MPI_LONG_LONG, 19 )
           errs = errs + f2ctype( MPI_UNSIGNED_LONG_LONG, 20 )
diff --git a/test/mpi/f77/ext/ctypesinf.f b/test/mpi/f77/ext/ctypesinf.f
index f4d8ccb..5ee8d8c 100644
--- a/test/mpi/f77/ext/ctypesinf.f
+++ b/test/mpi/f77/ext/ctypesinf.f
@@ -31,11 +31,11 @@ C
       errs = errs + f2ctype( MPI_LONG_INT, 14 )
       errs = errs + f2ctype( MPI_SHORT_INT, 15 )
       errs = errs + f2ctype( MPI_2INT, 16 )
-      if (MPI_LONG_DOUBLE .ne. MPI_TYPE_NULL) then
+      if (MPI_LONG_DOUBLE .ne. MPI_DATATYPE_NULL) then
           errs = errs + f2ctype( MPI_LONG_DOUBLE, 17 )
           errs = errs + f2ctype( MPI_LONG_DOUBLE_INT, 21 )
       endif
-      if (MPI_LONG_LONG .ne. MPI_TYPE_NULL) then
+      if (MPI_LONG_LONG .ne. MPI_DATATYPE_NULL) then
           errs = errs + f2ctype( MPI_LONG_LONG_INT, 18 )
           errs = errs + f2ctype( MPI_LONG_LONG, 19 )
           errs = errs + f2ctype( MPI_UNSIGNED_LONG_LONG, 20 )

http://git.mpich.org/mpich.git/commitdiff/e1199dd62a56770326242a128b613e19a80a04d3

commit e1199dd62a56770326242a128b613e19a80a04d3
Author: Min Si <msi at il.is.s.u-tokyo.ac.jp>
Date:   Fri Nov 14 16:37:18 2014 -0600

    Increase bcast2 and bcast3 timelimit to 20mins.
    
    Some overloaded nightly test nodes take almost 20 minutes to run these
    tests. We increase their time limit for now so that other bugs reported
    by the nightly tests are easier to identify.

diff --git a/test/mpi/coll/testlist b/test/mpi/coll/testlist
index 36686e7..1a10ae1 100644
--- a/test/mpi/coll/testlist
+++ b/test/mpi/coll/testlist
@@ -35,8 +35,8 @@ bcasttest 10
 bcast2 4
 # More that 8 processes are required to get bcast to switch to the long
 # msg algorithm (see coll definitions in mpiimpl.h)
-bcast2 10 timeLimit=780
-bcast3 10 timeLimit=780
+bcast2 10 timeLimit=1200
+bcast3 10 timeLimit=1200
 bcastzerotype 1
 bcastzerotype 4
 bcastzerotype 5

http://git.mpich.org/mpich.git/commitdiff/ccfdd60e7719ec9c66dcf7008dda65d6f1aa371a

commit ccfdd60e7719ec9c66dcf7008dda65d6f1aa371a
Author: Antonio Pena Monferrer <apenya at mcs.anl.gov>
Date:   Thu Nov 13 19:50:07 2014 -0600

    Fix strict-aliasing rules break in Portals netmod
    
    Switching from a macro to a function fixes the issue because the
    function call creates a copy of the pointer.
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>
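
The macro-to-function change described in this commit message can be sketched in isolation as follows. This is only an illustration of the shape of the change: `struct request`, `struct req_area`, `req_area()` and `request_create()` are hypothetical stand-ins, not the MPICH definitions, and the sketch does not claim anything about how a particular compiler diagnoses the macro form.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical request layout: an opaque scratch area owned by the netmod. */
struct request {
    int kind;
    union {
        char padding[64];
        long double align_;     /* keeps the scratch area suitably aligned */
    } netmod_area;
};

/* Hypothetical private view that the netmod overlays on the scratch area. */
struct req_area {
    int num_handles;
    void *chunk_buffer;
};

/* Old style (cast expanded at every use site):
 *   #define REQ_AREA(req) ((struct req_area *)((req)->netmod_area.padding))
 */

/* New style: the cast happens once, inside a function, and callers only
 * ever see the returned (copied) pointer value. */
static inline struct req_area *req_area(struct request *req)
{
    return (struct req_area *) req->netmod_area.padding;
}

/* Allocate a request on the heap and initialise its private area. */
struct request *request_create(void)
{
    struct request *req = malloc(sizeof(*req));

    /* the private view must fit inside the scratch area */
    assert(sizeof(struct req_area) <= sizeof(req->netmod_area.padding));
    if (req) {
        req->kind = 0;
        req_area(req)->num_handles = 0;
        req_area(req)->chunk_buffer = NULL;
    }
    return req;
}
```

A side benefit of the function form is type checking: passing anything other than a `struct request *` is rejected at compile time, whereas the macro accepts any expression that happens to have the right member.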

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
index 5ab1eef..8f39d73 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
@@ -58,7 +58,9 @@ typedef struct {
 } MPID_nem_ptl_req_area;
 
 /* macro for ptl private in req */
-#define REQ_PTL(req) ((MPID_nem_ptl_req_area *)((req)->ch.netmod_area.padding))
+static inline MPID_nem_ptl_req_area * REQ_PTL(MPID_Request *req) {
+    return (MPID_nem_ptl_req_area *)req->ch.netmod_area.padding;
+}
 
 #define MPID_nem_ptl_init_req(req_) do {                        \
         int i;                                                  \

http://git.mpich.org/mpich.git/commitdiff/ac358d325368843c6a45bcb3948803ff4f5d473f

commit ac358d325368843c6a45bcb3948803ff4f5d473f
Author: Pavan Balaji <balaji at anl.gov>
Date:   Thu Nov 13 13:05:05 2014 -0600

    Clean up event stashing.
    
    Now, when we pop an event, we queue up the buddy event (e.g., ACK for
    SEND) to return next.  This way, we don't need to search for the event
    every time.  Since we know that there'll be at most one such pending
    event, we maintain a single event structure for this.
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>
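
The single-slot scheme described in this commit message can be sketched roughly as below. The event type and the helper names `deliver_with_buddy()` and `poll_pending()` are hypothetical simplifications; only `pending_event` and `pending_event_valid` mirror names that appear in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the Portals event type. */
enum ev_type { EV_SEND, EV_ACK };
struct event {
    enum ev_type type;
    void *user_ptr;
};

/* At most one buddy event is ever waiting, so a single slot is enough. */
static struct event pending_event;
static int pending_event_valid = 0;

/* Hand `first` to the caller now; if there is a buddy event, keep it in
 * the slot so the next poll returns it without any searching. */
void deliver_with_buddy(struct event *out, const struct event *first,
                        const struct event *buddy)
{
    memcpy(out, first, sizeof(*out));
    if (buddy != NULL) {
        assert(pending_event_valid == 0);   /* only one event may be pending */
        memcpy(&pending_event, buddy, sizeof(pending_event));
        pending_event_valid = 1;
    }
}

/* Returns 1 and fills `out` if a stashed buddy event was waiting, else 0. */
int poll_pending(struct event *out)
{
    if (!pending_event_valid)
        return 0;
    memcpy(out, &pending_event, sizeof(*out));
    pending_event_valid = 0;
    return 1;
}
```

Checking one flag is constant time, in contrast to the loop over all targets and pending operations that the removed `retrieve_event` performed on each call.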

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
index 08b858d..6e58e8f 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
@@ -614,55 +614,8 @@ static int stash_event(struct rptl_op *op, ptl_event_t event)
 }
 
 
-#undef FUNCNAME
-#define FUNCNAME retrieve_event
-#undef FCNAME
-#define FCNAME MPIU_QUOTE(FUNCNAME)
-static int retrieve_event(ptl_event_t * event)
-{
-    struct rptl_target *target;
-    struct rptl_op *op;
-    int have_event = 0;
-    MPIDI_STATE_DECL(MPID_STATE_RETRIEVE_EVENT);
-
-    MPIDI_FUNC_ENTER(MPID_STATE_RETRIEVE_EVENT);
-
-    /* FIXME: this is an expensive loop over all pending operations
-     * everytime the user does an eqget */
-    for (target = rptl_info.target_list; target; target = target->next) {
-        for (op = target->data_op_list; op; op = op->next) {
-            if (op->events_ready) {
-                assert(op->op_type == RPTL_OP_PUT);
-                assert(op->u.put.send || op->u.put.ack);
-
-                if (op->u.put.send) {
-                    memcpy(event, op->u.put.send, sizeof(ptl_event_t));
-                    MPIU_Free(op->u.put.send);
-                    op->u.put.send = NULL;
-                }
-                else {
-                    memcpy(event, op->u.put.ack, sizeof(ptl_event_t));
-                    MPIU_Free(op->u.put.ack);
-                    op->u.put.ack = NULL;
-                }
-                event->user_ptr = op->u.put.user_ptr;
-
-                MPL_DL_DELETE(target->data_op_list, op);
-                rptli_op_free(op);
-
-                have_event = 1;
-                goto fn_exit;
-            }
-        }
-    }
-
-  fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_RETRIEVE_EVENT);
-    return have_event;
-
-  fn_fail:
-    goto fn_exit;
-}
+static ptl_event_t pending_event;
+static int pending_event_valid = 0;
 
 #undef FUNCNAME
 #define FUNCNAME MPID_nem_ptl_rptl_eqget
@@ -684,7 +637,9 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
 
     /* before we poll the eq, we need to check if there are any
      * completed operations that need to be returned */
-    if (retrieve_event(event)) {
+    if (pending_event_valid) {
+        memcpy(event, &pending_event, sizeof(ptl_event_t));
+        pending_event_valid = 0;
         ret = PTL_OK;
         goto fn_exit;
     }
@@ -865,6 +820,21 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
                 /* drop the send event */
                 ret = PTL_EQ_EMPTY;
             }
+            else {
+                /* if the message is over the data portal, we'll
+                 * return the send event.  if the user asked for an
+                 * ACK, we will enqueue the ack to be returned
+                 * next. */
+                if (op->u.put.ack_req & PTL_ACK_REQ) {
+                    /* only one event should be pending */
+                    assert(pending_event_valid == 0);
+                    memcpy(&pending_event, op->u.put.ack, sizeof(ptl_event_t));
+                    pending_event_valid = 1;
+                }
+                MPIU_Free(op->u.put.ack);
+                MPL_DL_DELETE(op->target->data_op_list, op);
+                rptli_op_free(op);
+            }
         }
 
         else if (event->type == PTL_EVENT_ACK && op->u.put.send) {
@@ -876,25 +846,40 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
             op->events_ready = 1;
             event->user_ptr = op->u.put.user_ptr;
 
-            /* if the message is over the control portal, ignore ACK
-             * event */
+            /* if the message is over the control portal, ignore both
+             * events */
             if (op->u.put.pt_type == RPTL_PT_CONTROL) {
+                /* drop the send event */
                 MPIU_Free(op->u.put.send);
                 MPL_DL_DELETE(op->target->control_op_list, op);
                 rptli_op_free(op);
+
+                /* drop the ack event */
                 ret = PTL_EQ_EMPTY;
             }
+            else {
+                /* if the message is over the data portal, we'll
+                 * return the send event.  if the user asked for an
+                 * ACK, we will enqueue the ack to be returned
+                 * next. */
+                if (op->u.put.ack_req & PTL_ACK_REQ) {
+                    /* user asked for an ACK, so return it to the user
+                     * and queue up the SEND event for next time */
+                    memcpy(&pending_event, op->u.put.send, sizeof(ptl_event_t));
+                    MPIU_Free(op->u.put.send);
+                    assert(pending_event_valid == 0);
+                    pending_event_valid = 1;
+                }
+                else {
+                    /* user didn't ask for an ACK, overwrite the ACK
+                     * event with the pending send event */
+                    memcpy(event, op->u.put.send, sizeof(ptl_event_t));
+                    MPIU_Free(op->u.put.send);
 
-            /* if the user did not ask for an ACK discard this event
-             * and return the send event. */
-            else if (!(op->u.put.ack_req & PTL_ACK_REQ)) {
-                memcpy(event, op->u.put.send, sizeof(ptl_event_t));
-                MPIU_Free(op->u.put.send);
-
-                /* set the event user pointer again, since we copied
-                 * over the original event */
-                event->user_ptr = op->u.put.user_ptr;
-
+                    /* set the event user pointer again, since we
+                     * copied over the original event */
+                    event->user_ptr = op->u.put.user_ptr;
+                }
                 /* we should be in the data op list */
                 MPL_DL_DELETE(op->target->data_op_list, op);
                 rptli_op_free(op);

http://git.mpich.org/mpich.git/commitdiff/5f9e6b17d9a74d4288c0bec30162394b8cc1b4f3

commit 5f9e6b17d9a74d4288c0bec30162394b8cc1b4f3
Author: Pavan Balaji <balaji at anl.gov>
Date:   Thu Nov 13 12:46:42 2014 -0600

    On an error drop all existing events for that op.
    
    We were stashing events when the origin receives a NACK.  This is
    unnecessary since we retransmit the op and never use those stashed
    events.
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>
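
A minimal sketch of the "drop everything on error" rule from this commit message, assuming a hypothetical `struct put_op` that holds at most one stashed SEND and one stashed ACK event (the real op structure carries more state):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real rptl structures. */
struct event { int type; };
struct put_op {
    struct event *send;     /* stashed SEND event, or NULL */
    struct event *ack;      /* stashed ACK event, or NULL */
};

/* On a NACK the op will be retransmitted, so stashed events from the
 * failed attempt are never used again: free both and clear the slots. */
void discard_stashed_events(struct put_op *op)
{
    assert(op != NULL);
    free(op->ack);          /* free(NULL) is a no-op */
    op->ack = NULL;
    free(op->send);
    op->send = NULL;
}
```

Clearing the pointers after freeing keeps the function safe to call again when the retransmitted op fails a second time.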

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
index 6bbe304..08b858d 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
@@ -807,21 +807,16 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
                 assert(!(event->type == PTL_EVENT_SEND && op->u.put.send));
                 assert(!(event->type == PTL_EVENT_ACK && op->u.put.ack));
 
-                /* if we have received both events, discard them.
-                 * otherwise, stash the one we received while waiting
-                 * for the other. */
-                if (event->type == PTL_EVENT_SEND && op->u.put.ack) {
+                /* discard pending events, since we will retransmit
+                 * this op anyway */
+                if (op->u.put.ack) {
                     MPIU_Free(op->u.put.ack);
                     op->u.put.ack = NULL;
                 }
-                else if (event->type == PTL_EVENT_ACK && op->u.put.send) {
+                if (op->u.put.send) {
                     MPIU_Free(op->u.put.send);
                     op->u.put.send = NULL;
                 }
-                else {
-                    ret = stash_event(op, *event);
-                    RPTLU_ERR_POP(ret, "error stashing event\n");
-                }
             }
 
             if (op->op_type == RPTL_OP_PUT)

http://git.mpich.org/mpich.git/commitdiff/60c5695305edd71b724127eb2f4daa201247151a

commit 60c5695305edd71b724127eb2f4daa201247151a
Author: Pavan Balaji <balaji at anl.gov>
Date:   Thu Nov 13 12:45:46 2014 -0600

    Additional comments to explain what we are doing.
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
index 3b603b7..6bbe304 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
@@ -859,12 +859,15 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
             op->events_ready = 1;
             event->user_ptr = op->u.put.user_ptr;
 
-            /* if the message is over the control portal, ignore the
-             * ACK event */
+            /* if the message is over the control portal, ignore both
+             * events */
             if (op->u.put.pt_type == RPTL_PT_CONTROL) {
+                /* drop the ack event */
                 MPIU_Free(op->u.put.ack);
                 MPL_DL_DELETE(op->target->control_op_list, op);
                 rptli_op_free(op);
+
+                /* drop the send event */
                 ret = PTL_EQ_EMPTY;
             }
         }

http://git.mpich.org/mpich.git/commitdiff/e595b05e7d03095ef8dddb08c0c9457cc3d3f040

commit e595b05e7d03095ef8dddb08c0c9457cc3d3f040
Author: Pavan Balaji <balaji at anl.gov>
Date:   Wed Nov 12 20:41:08 2014 -0600

    rportals code refactoring.
    
    1. Move op management to a different file.
    
    2. Move rptl_info to an extern, so it can be shared by multiple files.
    
    3. Separate out rptl initialization routines.
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>
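
The "op management" in item 1 is a per-target pool of operation structures that grows one segment at a time. The sketch below shows that idea in simplified form, assuming a singly linked free list and hypothetical names (`op_alloc`, `op_free`, `target_destroy`, `POOL_SEGMENT_COUNT`); the code in the diff uses the `MPL_DL_*` list macros and MPICH's own allocation helpers instead.

```c
#include <assert.h>
#include <stdlib.h>

#define POOL_SEGMENT_COUNT 4    /* hypothetical; kept small for illustration */

struct op {
    int id;
    struct op *next;
};

/* Ops are allocated in segments so the pool can grow on demand. */
struct segment {
    struct op ops[POOL_SEGMENT_COUNT];
    struct segment *next;
};

struct target {
    struct op *op_pool;             /* free ops, ready for reuse */
    struct segment *segment_list;   /* every segment, kept for cleanup */
};

/* Take an op from the pool, adding a fresh segment if the pool is empty.
 * Returns NULL only if a new segment cannot be allocated. */
struct op *op_alloc(struct target *t)
{
    struct op *op;
    int i;

    assert(t != NULL);
    if (t->op_pool == NULL) {
        struct segment *seg = malloc(sizeof(*seg));
        if (seg == NULL)
            return NULL;
        seg->next = t->segment_list;
        t->segment_list = seg;
        for (i = 0; i < POOL_SEGMENT_COUNT; i++) {
            seg->ops[i].id = 0;
            seg->ops[i].next = t->op_pool;
            t->op_pool = &seg->ops[i];
        }
    }
    op = t->op_pool;
    t->op_pool = op->next;
    op->next = NULL;
    return op;
}

/* Return an op to the pool; its memory stays owned by its segment. */
void op_free(struct target *t, struct op *op)
{
    op->next = t->op_pool;
    t->op_pool = op;
}

/* Release all segments once no op is in use any more. */
void target_destroy(struct target *t)
{
    while (t->segment_list != NULL) {
        struct segment *seg = t->segment_list;
        t->segment_list = seg->next;
        free(seg);
    }
    t->op_pool = NULL;
}
```

Individual ops are never handed back to the system allocator; only whole segments are released at teardown, which matches the cleanup loop over `op_segment_list` visible in the removed `MPID_nem_ptl_rptl_drain_eq` code.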

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/Makefile.mk b/src/mpid/ch3/channels/nemesis/netmod/portals4/Makefile.mk
index 06c26d1..764e821 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/Makefile.mk
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/Makefile.mk
@@ -16,7 +16,9 @@ mpi_core_sources +=					\
     src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c	        \
     src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c            \
     src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_lmt.c             \
-    src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
+    src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c                \
+    src/mpid/ch3/channels/nemesis/netmod/portals4/rptl_init.c           \
+    src/mpid/ch3/channels/nemesis/netmod/portals4/rptl_op.c
 
 noinst_HEADERS +=                                                \
     src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h     \
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
index cbcc81c..3b603b7 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
@@ -60,15 +60,7 @@
 #define IDS_ARE_EQUAL(t1, t2) \
     (t1.phys.nid == t2.phys.nid && t1.phys.pid == t2.phys.pid)
 
-static struct {
-    struct rptl *rptl_list;
-    struct rptl_target *target_list;
-
-    int world_size;
-    uint64_t origin_events_left;
-    int (*get_target_info) (int rank, ptl_process_t * id, ptl_pt_index_t local_data_pt,
-                            ptl_pt_index_t * target_data_pt, ptl_pt_index_t * target_control_pt);
-} rptl_info;
+struct rptl_info rptl_info;
 
 
 #undef FUNCNAME
@@ -118,292 +110,6 @@ static int find_target(ptl_process_t id, struct rptl_target **target)
 }
 
 
-#undef FUNCNAME
-#define FUNCNAME MPID_nem_ptl_rptl_init
-#undef FCNAME
-#define FCNAME MPIU_QUOTE(FUNCNAME)
-int MPID_nem_ptl_rptl_init(int world_size, uint64_t max_origin_events,
-                           int (*get_target_info) (int rank, ptl_process_t * id,
-                                                   ptl_pt_index_t local_data_pt,
-                                                   ptl_pt_index_t * target_data_pt,
-                                                   ptl_pt_index_t * target_control_pt))
-{
-    int ret = PTL_OK;
-    MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_RPTL_INIT);
-
-    MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_RPTL_INIT);
-
-    rptl_info.rptl_list = NULL;
-    rptl_info.target_list = NULL;
-
-    rptl_info.world_size = world_size;
-    rptl_info.origin_events_left = max_origin_events;
-    rptl_info.get_target_info = get_target_info;
-
-  fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_RPTL_INIT);
-    return ret;
-
-  fn_fail:
-    goto fn_exit;
-}
-
-
-#undef FUNCNAME
-#define FUNCNAME MPID_nem_ptl_rptl_drain_eq
-#undef FCNAME
-#define FCNAME MPIU_QUOTE(FUNCNAME)
-int MPID_nem_ptl_rptl_drain_eq(int eq_count, ptl_handle_eq_t *eq)
-{
-    int ret = PTL_OK;
-    ptl_event_t event;
-    struct rptl_op_pool_segment *op_segment;
-    int i;
-    struct rptl_target *target, *t;
-    MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_RPTL_FINALIZE);
-
-    MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_RPTL_FINALIZE);
-
-    for (target = rptl_info.target_list; target; target = target->next) {
-        while (target->control_op_list || target->data_op_list) {
-            for (i = 0; i < eq_count; i++) {
-                /* read and ignore all events */
-                ret = MPID_nem_ptl_rptl_eqget(eq[i], &event);
-                if (ret == PTL_EQ_EMPTY)
-                    ret = PTL_OK;
-                RPTLU_ERR_POP(ret, "Error calling MPID_nem_ptl_rptl_eqget\n");
-            }
-        }
-    }
-
-    for (target = rptl_info.target_list; target;) {
-        assert(target->data_op_list == NULL);
-        assert(target->control_op_list == NULL);
-
-        while (target->op_segment_list) {
-            op_segment = target->op_segment_list;
-            MPL_DL_DELETE(target->op_segment_list, op_segment);
-            MPIU_Free(op_segment);
-        }
-
-        t = target->next;
-        MPIU_Free(target);
-        target = t;
-    }
-
-  fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_RPTL_FINALIZE);
-    return ret;
-
-  fn_fail:
-    goto fn_exit;
-}
-
-
-#undef FUNCNAME
-#define FUNCNAME post_empty_buffer
-#undef FCNAME
-#define FCNAME MPIU_QUOTE(FUNCNAME)
-static inline int post_empty_buffer(ptl_handle_ni_t ni_handle, ptl_pt_index_t pt,
-                                    ptl_handle_me_t * me_handle)
-{
-    int ret;
-    ptl_me_t me;
-    ptl_process_t id;
-    MPIDI_STATE_DECL(MPID_STATE_POST_EMPTY_BUFFER);
-
-    MPIDI_FUNC_ENTER(MPID_STATE_POST_EMPTY_BUFFER);
-
-    id.phys.nid = PTL_NID_ANY;
-    id.phys.pid = PTL_PID_ANY;
-
-    me.start = NULL;
-    me.length = 0;
-    me.ct_handle = PTL_CT_NONE;
-    me.uid = PTL_UID_ANY;
-    me.options = (PTL_ME_OP_PUT | PTL_ME_OP_GET | PTL_ME_USE_ONCE | PTL_ME_IS_ACCESSIBLE |
-                  PTL_ME_EVENT_LINK_DISABLE | PTL_ME_EVENT_UNLINK_DISABLE);
-    me.match_id = id;
-    me.match_bits = 0;
-    me.ignore_bits = 0;
-    me.min_free = 0;
-
-    ret = PtlMEAppend(ni_handle, pt, &me, PTL_PRIORITY_LIST, NULL, me_handle);
-    RPTLU_ERR_POP(ret, "Error appending empty buffer to priority list\n");
-
-  fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_POST_EMPTY_BUFFER);
-    return ret;
-
-  fn_fail:
-    goto fn_exit;
-}
-
-
-#undef FUNCNAME
-#define FUNCNAME MPID_nem_ptl_rptl_ptinit
-#undef FCNAME
-#define FCNAME MPIU_QUOTE(FUNCNAME)
-int MPID_nem_ptl_rptl_ptinit(ptl_handle_ni_t ni_handle, ptl_handle_eq_t eq_handle, ptl_pt_index_t data_pt,
-                             ptl_pt_index_t control_pt)
-{
-    int ret = PTL_OK;
-    struct rptl *rptl;
-    int mpi_errno = MPI_SUCCESS;
-    int i;
-    ptl_md_t md;
-    MPIU_CHKPMEM_DECL(2);
-    MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_RPTL_PTINIT);
-
-    MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_RPTL_PTINIT);
-
-
-    /* setup the parts of rptls that can be done before world size or
-     * target information */
-    MPIU_CHKPMEM_MALLOC(rptl, struct rptl *, sizeof(struct rptl), mpi_errno, "rptl");
-    MPL_DL_APPEND(rptl_info.rptl_list, rptl);
-
-    rptl->local_state = RPTL_LOCAL_STATE_ACTIVE;
-    rptl->pause_ack_counter = 0;
-
-    rptl->data.ob_max_count = 0;
-    rptl->data.ob_curr_count = 0;
-
-    rptl->data.pt = data_pt;
-    rptl->control.pt = control_pt;
-
-    rptl->ni = ni_handle;
-    rptl->eq = eq_handle;
-
-    md.start = 0;
-    md.length = (ptl_size_t) (-1);
-    md.options = 0x0;
-    md.eq_handle = rptl->eq;
-    md.ct_handle = PTL_CT_NONE;
-    ret = PtlMDBind(rptl->ni, &md, &rptl->md);
-    RPTLU_ERR_POP(ret, "Error binding new global MD\n");
-
-    /* post world_size number of empty buffers on the control portal */
-    if (rptl->control.pt != PTL_PT_ANY) {
-        MPIU_CHKPMEM_MALLOC(rptl->control.me, ptl_handle_me_t *,
-                            2 * rptl_info.world_size * sizeof(ptl_handle_me_t), mpi_errno,
-                            "rptl target info");
-        for (i = 0; i < 2 * rptl_info.world_size; i++) {
-            ret = post_empty_buffer(rptl->ni, rptl->control.pt, &rptl->control.me[i]);
-            RPTLU_ERR_POP(ret, "Error in post_empty_buffer\n");
-        }
-        rptl->control.me_idx = 0;
-    }
-
-  fn_exit:
-    MPIU_CHKPMEM_COMMIT();
-    MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_RPTL_PTINIT);
-    return ret;
-
-  fn_fail:
-    if (mpi_errno)
-        ret = PTL_FAIL;
-    MPIU_CHKPMEM_REAP();
-    goto fn_exit;
-}
-
-
-#undef FUNCNAME
-#define FUNCNAME MPID_nem_ptl_rptl_ptfini
-#undef FCNAME
-#define FCNAME MPIU_QUOTE(FUNCNAME)
-int MPID_nem_ptl_rptl_ptfini(ptl_pt_index_t pt_index)
-{
-    int i;
-    int ret = PTL_OK;
-    struct rptl *rptl;
-    MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_RPTL_PTFINI);
-
-    MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_RPTL_PTFINI);
-
-    /* find the right rptl */
-    for (rptl = rptl_info.rptl_list; rptl && rptl->data.pt != pt_index; rptl = rptl->next);
-    assert(rptl);
-
-    /* free control portals that were created */
-    if (rptl->control.pt != PTL_PT_ANY) {
-        for (i = 0; i < rptl_info.world_size * 2; i++) {
-            ret = PtlMEUnlink(rptl->control.me[i]);
-            RPTLU_ERR_POP(ret, "Error unlinking control buffers\n");
-        }
-        MPIU_Free(rptl->control.me);
-    }
-
-    MPL_DL_DELETE(rptl_info.rptl_list, rptl);
-    MPIU_Free(rptl);
-
-  fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_RPTL_PTFINI);
-    return ret;
-
-  fn_fail:
-    goto fn_exit;
-}
-
-
-#undef FUNCNAME
-#define FUNCNAME alloc_op
-#undef FCNAME
-#define FCNAME MPIU_QUOTE(FUNCNAME)
-static int alloc_op(struct rptl_op **op, struct rptl_target *target)
-{
-    int ret = PTL_OK;
-    struct rptl_op_pool_segment *op_segment;
-    int mpi_errno = MPI_SUCCESS;
-    int i;
-    MPIU_CHKPMEM_DECL(1);
-    MPIDI_STATE_DECL(MPID_STATE_ALLOC_OP);
-
-    MPIDI_FUNC_ENTER(MPID_STATE_ALLOC_OP);
-
-    assert(target);
-
-    if (target->op_pool == NULL) {
-        MPIU_CHKPMEM_MALLOC(op_segment, struct rptl_op_pool_segment *, sizeof(struct rptl_op_pool_segment),
-                            mpi_errno, "op pool segment");
-        MPL_DL_APPEND(target->op_segment_list, op_segment);
-
-        for (i = 0; i < RPTL_OP_POOL_SEGMENT_COUNT; i++)
-            MPL_DL_APPEND(target->op_pool, &op_segment->op[i]);
-    }
-
-    *op = target->op_pool;
-    MPL_DL_DELETE(target->op_pool, *op);
-
-  fn_exit:
-    MPIU_CHKPMEM_COMMIT();
-    MPIDI_FUNC_EXIT(MPID_STATE_ALLOC_OP);
-    return ret;
-
-  fn_fail:
-    if (mpi_errno)
-        ret = PTL_FAIL;
-    MPIU_CHKPMEM_REAP();
-    goto fn_exit;
-}
-
-
-#undef FUNCNAME
-#define FUNCNAME free_op
-#undef FCNAME
-#define FCNAME MPIU_QUOTE(FUNCNAME)
-static void free_op(struct rptl_op *op)
-{
-    MPIDI_STATE_DECL(MPID_STATE_FREE_OP);
-
-    MPIDI_FUNC_ENTER(MPID_STATE_FREE_OP);
-
-    MPL_DL_APPEND(op->target->op_pool, op);
-
-    MPIDI_FUNC_EXIT(MPID_STATE_FREE_OP);
-}
-
-
 static int rptl_put(ptl_handle_md_t md_handle, ptl_size_t local_offset, ptl_size_t length,
                     ptl_ack_req_t ack_req, ptl_process_t target_id, ptl_pt_index_t pt_index,
                     ptl_match_bits_t match_bits, ptl_size_t remote_offset, void *user_ptr,
@@ -620,7 +326,7 @@ static int rptl_put(ptl_handle_md_t md_handle, ptl_size_t local_offset, ptl_size
     ret = find_target(target_id, &target);
     RPTLU_ERR_POP(ret, "error finding target structure\n");
 
-    ret = alloc_op(&op, target);
+    ret = rptli_op_alloc(&op, target);
     RPTLU_ERR_POP(ret, "error allocating op\n");
 
     op->op_type = RPTL_OP_PUT;
@@ -694,7 +400,7 @@ int MPID_nem_ptl_rptl_get(ptl_handle_md_t md_handle, ptl_size_t local_offset, pt
     ret = find_target(target_id, &target);
     RPTLU_ERR_POP(ret, "error finding target structure\n");
 
-    ret = alloc_op(&op, target);
+    ret = rptli_op_alloc(&op, target);
     RPTLU_ERR_POP(ret, "error allocating op\n");
 
     op->op_type = RPTL_OP_GET;
@@ -942,7 +648,7 @@ static int retrieve_event(ptl_event_t * event)
                 event->user_ptr = op->u.put.user_ptr;
 
                 MPL_DL_DELETE(target->data_op_list, op);
-                free_op(op);
+                rptli_op_free(op);
 
                 have_event = 1;
                 goto fn_exit;
@@ -1048,11 +754,11 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
         ret = PTL_EQ_EMPTY;
 
         /* the message came in on the control PT, repost it */
-        tmp_ret = post_empty_buffer(rptl->ni, rptl->control.pt,
+        tmp_ret = rptli_post_control_buffer(rptl->ni, rptl->control.pt,
                                     &rptl->control.me[rptl->control.me_idx]);
         if (tmp_ret) {
             ret = tmp_ret;
-            RPTLU_ERR_POP(ret, "Error returned from post_empty_buffer\n");
+            RPTLU_ERR_POP(ret, "Error returned from rptli_post_control_buffer\n");
         }
         rptl->control.me_idx++;
         if (rptl->control.me_idx >= 2 * rptl_info.world_size)
@@ -1141,7 +847,7 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
 
             /* GET operations only go into the data op list */
             MPL_DL_DELETE(op->target->data_op_list, op);
-            free_op(op);
+            rptli_op_free(op);
         }
 
         else if (event->type == PTL_EVENT_SEND && op->u.put.ack) {
@@ -1158,7 +864,7 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
             if (op->u.put.pt_type == RPTL_PT_CONTROL) {
                 MPIU_Free(op->u.put.ack);
                 MPL_DL_DELETE(op->target->control_op_list, op);
-                free_op(op);
+                rptli_op_free(op);
                 ret = PTL_EQ_EMPTY;
             }
         }
@@ -1177,7 +883,7 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
             if (op->u.put.pt_type == RPTL_PT_CONTROL) {
                 MPIU_Free(op->u.put.send);
                 MPL_DL_DELETE(op->target->control_op_list, op);
-                free_op(op);
+                rptli_op_free(op);
                 ret = PTL_EQ_EMPTY;
             }
 
@@ -1193,7 +899,7 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
 
                 /* we should be in the data op list */
                 MPL_DL_DELETE(op->target->data_op_list, op);
-                free_op(op);
+                rptli_op_free(op);
             }
         }
 
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.h b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.h
index 7ce31d9..f99523c 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.h
@@ -160,28 +160,44 @@ struct rptl_target {
     struct rptl_target *prev;
 };
 
+struct rptl_info {
+    struct rptl *rptl_list;
+    struct rptl_target *target_list;
+
+    int world_size;
+    uint64_t origin_events_left;
+    int (*get_target_info) (int rank, ptl_process_t * id, ptl_pt_index_t local_data_pt,
+                            ptl_pt_index_t * target_data_pt, ptl_pt_index_t * target_control_pt);
+};
+
+extern struct rptl_info rptl_info;
+
+
+/* initialization */
 int MPID_nem_ptl_rptl_init(int world_size, uint64_t max_origin_events,
                            int (*get_target_info) (int rank, ptl_process_t * id,
                                                    ptl_pt_index_t local_data_pt,
                                                    ptl_pt_index_t * target_data_pt,
                                                    ptl_pt_index_t * target_control_pt));
-
 int MPID_nem_ptl_rptl_drain_eq(int eq_count, ptl_handle_eq_t *eq);
-
 int MPID_nem_ptl_rptl_ptinit(ptl_handle_ni_t ni_handle, ptl_handle_eq_t eq_handle, ptl_pt_index_t data_pt,
                              ptl_pt_index_t control_pt);
-
 int MPID_nem_ptl_rptl_ptfini(ptl_pt_index_t pt_index);
+int rptli_post_control_buffer(ptl_handle_ni_t ni_handle, ptl_pt_index_t pt,
+                              ptl_handle_me_t * me_handle);
+
+/* op management */
+int rptli_op_alloc(struct rptl_op **op, struct rptl_target *target);
+void rptli_op_free(struct rptl_op *op);
 
+/* communication */
 int MPID_nem_ptl_rptl_put(ptl_handle_md_t md_handle, ptl_size_t local_offset, ptl_size_t length,
                           ptl_ack_req_t ack_req, ptl_process_t target_id, ptl_pt_index_t pt_index,
                           ptl_match_bits_t match_bits, ptl_size_t remote_offset, void *user_ptr,
                           ptl_hdr_data_t hdr_data);
-
 int MPID_nem_ptl_rptl_get(ptl_handle_md_t md_handle, ptl_size_t local_offset, ptl_size_t length,
                           ptl_process_t target_id, ptl_pt_index_t pt_index,
                           ptl_match_bits_t match_bits, ptl_size_t remote_offset, void *user_ptr);
-
 int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event);
 
 #endif /* RPTL_H_INCLUDED */
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl_init.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl_init.c
new file mode 100644
index 0000000..9262c5e
--- /dev/null
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl_init.c
@@ -0,0 +1,235 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *  (C) 2014 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+
+#include "ptl_impl.h"
+#include "rptl.h"
+
+#undef FUNCNAME
+#define FUNCNAME rptli_post_control_buffer
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int rptli_post_control_buffer(ptl_handle_ni_t ni_handle, ptl_pt_index_t pt,
+                              ptl_handle_me_t * me_handle)
+{
+    int ret;
+    ptl_me_t me;
+    ptl_process_t id;
+    MPIDI_STATE_DECL(MPID_STATE_RPTLI_POST_CONTROL_BUFFER);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_RPTLI_POST_CONTROL_BUFFER);
+
+    id.phys.nid = PTL_NID_ANY;
+    id.phys.pid = PTL_PID_ANY;
+
+    me.start = NULL;
+    me.length = 0;
+    me.ct_handle = PTL_CT_NONE;
+    me.uid = PTL_UID_ANY;
+    me.options = (PTL_ME_OP_PUT | PTL_ME_OP_GET | PTL_ME_USE_ONCE | PTL_ME_IS_ACCESSIBLE |
+                  PTL_ME_EVENT_LINK_DISABLE | PTL_ME_EVENT_UNLINK_DISABLE);
+    me.match_id = id;
+    me.match_bits = 0;
+    me.ignore_bits = 0;
+    me.min_free = 0;
+
+    ret = PtlMEAppend(ni_handle, pt, &me, PTL_PRIORITY_LIST, NULL, me_handle);
+    RPTLU_ERR_POP(ret, "Error appending empty buffer to priority list\n");
+
+  fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_RPTLI_POST_CONTROL_BUFFER);
+    return ret;
+
+  fn_fail:
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME MPID_nem_ptl_rptl_init
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int MPID_nem_ptl_rptl_init(int world_size, uint64_t max_origin_events,
+                           int (*get_target_info) (int rank, ptl_process_t * id,
+                                                   ptl_pt_index_t local_data_pt,
+                                                   ptl_pt_index_t * target_data_pt,
+                                                   ptl_pt_index_t * target_control_pt))
+{
+    int ret = PTL_OK;
+    MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_RPTL_INIT);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_RPTL_INIT);
+
+    rptl_info.rptl_list = NULL;
+    rptl_info.target_list = NULL;
+
+    rptl_info.world_size = world_size;
+    rptl_info.origin_events_left = max_origin_events;
+    rptl_info.get_target_info = get_target_info;
+
+  fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_RPTL_INIT);
+    return ret;
+
+  fn_fail:
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME MPID_nem_ptl_rptl_drain_eq
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int MPID_nem_ptl_rptl_drain_eq(int eq_count, ptl_handle_eq_t *eq)
+{
+    int ret = PTL_OK;
+    ptl_event_t event;
+    struct rptl_op_pool_segment *op_segment;
+    int i;
+    struct rptl_target *target, *t;
+    MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_RPTL_FINALIZE);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_RPTL_FINALIZE);
+
+    for (target = rptl_info.target_list; target; target = target->next) {
+        while (target->control_op_list || target->data_op_list) {
+            for (i = 0; i < eq_count; i++) {
+                /* read and ignore all events */
+                ret = MPID_nem_ptl_rptl_eqget(eq[i], &event);
+                if (ret == PTL_EQ_EMPTY)
+                    ret = PTL_OK;
+                RPTLU_ERR_POP(ret, "Error calling MPID_nem_ptl_rptl_eqget\n");
+            }
+        }
+    }
+
+    for (target = rptl_info.target_list; target;) {
+        assert(target->data_op_list == NULL);
+        assert(target->control_op_list == NULL);
+
+        while (target->op_segment_list) {
+            op_segment = target->op_segment_list;
+            MPL_DL_DELETE(target->op_segment_list, op_segment);
+            MPIU_Free(op_segment);
+        }
+
+        t = target->next;
+        MPIU_Free(target);
+        target = t;
+    }
+
+  fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_RPTL_FINALIZE);
+    return ret;
+
+  fn_fail:
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME MPID_nem_ptl_rptl_ptinit
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int MPID_nem_ptl_rptl_ptinit(ptl_handle_ni_t ni_handle, ptl_handle_eq_t eq_handle, ptl_pt_index_t data_pt,
+                             ptl_pt_index_t control_pt)
+{
+    int ret = PTL_OK;
+    struct rptl *rptl;
+    int mpi_errno = MPI_SUCCESS;
+    int i;
+    ptl_md_t md;
+    MPIU_CHKPMEM_DECL(2);
+    MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_RPTL_PTINIT);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_RPTL_PTINIT);
+
+
+    /* setup the parts of rptls that can be done before world size or
+     * target information */
+    MPIU_CHKPMEM_MALLOC(rptl, struct rptl *, sizeof(struct rptl), mpi_errno, "rptl");
+    MPL_DL_APPEND(rptl_info.rptl_list, rptl);
+
+    rptl->local_state = RPTL_LOCAL_STATE_ACTIVE;
+    rptl->pause_ack_counter = 0;
+
+    rptl->data.ob_max_count = 0;
+    rptl->data.ob_curr_count = 0;
+
+    rptl->data.pt = data_pt;
+    rptl->control.pt = control_pt;
+
+    rptl->ni = ni_handle;
+    rptl->eq = eq_handle;
+
+    md.start = 0;
+    md.length = (ptl_size_t) (-1);
+    md.options = 0x0;
+    md.eq_handle = rptl->eq;
+    md.ct_handle = PTL_CT_NONE;
+    ret = PtlMDBind(rptl->ni, &md, &rptl->md);
+    RPTLU_ERR_POP(ret, "Error binding new global MD\n");
+
+    /* post world_size number of empty buffers on the control portal */
+    if (rptl->control.pt != PTL_PT_ANY) {
+        MPIU_CHKPMEM_MALLOC(rptl->control.me, ptl_handle_me_t *,
+                            2 * rptl_info.world_size * sizeof(ptl_handle_me_t), mpi_errno,
+                            "rptl target info");
+        for (i = 0; i < 2 * rptl_info.world_size; i++) {
+            ret = rptli_post_control_buffer(rptl->ni, rptl->control.pt, &rptl->control.me[i]);
+            RPTLU_ERR_POP(ret, "Error in rptli_post_control_buffer\n");
+        }
+        rptl->control.me_idx = 0;
+    }
+
+  fn_exit:
+    MPIU_CHKPMEM_COMMIT();
+    MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_RPTL_PTINIT);
+    return ret;
+
+  fn_fail:
+    if (mpi_errno)
+        ret = PTL_FAIL;
+    MPIU_CHKPMEM_REAP();
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME MPID_nem_ptl_rptl_ptfini
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int MPID_nem_ptl_rptl_ptfini(ptl_pt_index_t pt_index)
+{
+    int i;
+    int ret = PTL_OK;
+    struct rptl *rptl;
+    MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_RPTL_PTFINI);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_RPTL_PTFINI);
+
+    /* find the right rptl */
+    for (rptl = rptl_info.rptl_list; rptl && rptl->data.pt != pt_index; rptl = rptl->next);
+    assert(rptl);
+
+    /* free control portals that were created */
+    if (rptl->control.pt != PTL_PT_ANY) {
+        for (i = 0; i < rptl_info.world_size * 2; i++) {
+            ret = PtlMEUnlink(rptl->control.me[i]);
+            RPTLU_ERR_POP(ret, "Error unlinking control buffers\n");
+        }
+        MPIU_Free(rptl->control.me);
+    }
+
+    MPL_DL_DELETE(rptl_info.rptl_list, rptl);
+    MPIU_Free(rptl);
+
+  fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_RPTL_PTFINI);
+    return ret;
+
+  fn_fail:
+    goto fn_exit;
+}
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl_op.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl_op.c
new file mode 100644
index 0000000..44a287c
--- /dev/null
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl_op.c
@@ -0,0 +1,65 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *  (C) 2014 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+
+#include "ptl_impl.h"
+#include "rptl.h"
+
+#undef FUNCNAME
+#define FUNCNAME rptli_op_alloc
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int rptli_op_alloc(struct rptl_op **op, struct rptl_target *target)
+{
+    int ret = PTL_OK;
+    struct rptl_op_pool_segment *op_segment;
+    int mpi_errno = MPI_SUCCESS;
+    int i;
+    MPIU_CHKPMEM_DECL(1);
+    MPIDI_STATE_DECL(MPID_STATE_RPTLI_OP_ALLOC);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_RPTLI_OP_ALLOC);
+
+    assert(target);
+
+    if (target->op_pool == NULL) {
+        MPIU_CHKPMEM_MALLOC(op_segment, struct rptl_op_pool_segment *, sizeof(struct rptl_op_pool_segment),
+                            mpi_errno, "op pool segment");
+        MPL_DL_APPEND(target->op_segment_list, op_segment);
+
+        for (i = 0; i < RPTL_OP_POOL_SEGMENT_COUNT; i++)
+            MPL_DL_APPEND(target->op_pool, &op_segment->op[i]);
+    }
+
+    *op = target->op_pool;
+    MPL_DL_DELETE(target->op_pool, *op);
+
+  fn_exit:
+    MPIU_CHKPMEM_COMMIT();
+    MPIDI_FUNC_EXIT(MPID_STATE_RPTLI_OP_ALLOC);
+    return ret;
+
+  fn_fail:
+    if (mpi_errno)
+        ret = PTL_FAIL;
+    MPIU_CHKPMEM_REAP();
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME rptli_op_free
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+void rptli_op_free(struct rptl_op *op)
+{
+    MPIDI_STATE_DECL(MPID_STATE_RPTLI_OP_FREE);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_RPTLI_OP_FREE);
+
+    MPL_DL_APPEND(op->target->op_pool, op);
+
+    MPIDI_FUNC_EXIT(MPID_STATE_RPTLI_OP_FREE);
+}
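
The op management split out above (rptli_op_alloc / rptli_op_free) is a segment-backed free pool: ops are handed out from a per-target pool, the pool grows by one segment when it runs dry, and freeing an op returns it to the pool rather than to the heap. The following is a minimal standalone sketch of that idea, not the MPICH code. All names are hypothetical, it uses a singly linked LIFO list instead of the MPL_DL_* macros, and it passes the target explicitly instead of following op->target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_SEGMENT_COUNT 4    /* hypothetical; stands in for RPTL_OP_POOL_SEGMENT_COUNT */

struct op {
    int payload;
    struct op *next;            /* link used only while the op sits in the free pool */
};

struct segment {
    struct op ops[POOL_SEGMENT_COUNT];
    struct segment *next;
};

struct target {
    struct op *pool;            /* free ops ready for reuse */
    struct segment *segments;   /* every segment ever allocated, kept for teardown */
};

/* Take an op from the pool, growing the pool by one segment when it is empty.
 * Returns 0 on success, -1 if malloc fails. */
int op_alloc(struct target *t, struct op **op)
{
    assert(t && op);
    if (t->pool == NULL) {
        struct segment *seg = malloc(sizeof(*seg));
        if (seg == NULL)
            return -1;
        seg->next = t->segments;
        t->segments = seg;
        for (int i = 0; i < POOL_SEGMENT_COUNT; i++) {
            seg->ops[i].next = t->pool;
            t->pool = &seg->ops[i];
        }
    }
    *op = t->pool;
    t->pool = (*op)->next;
    (*op)->next = NULL;
    return 0;
}

/* Return an op to the pool; no memory is released here. */
void op_free(struct target *t, struct op *op)
{
    op->next = t->pool;
    t->pool = op;
}

/* Release all segments; every op must already be back in the pool. */
void target_destroy(struct target *t)
{
    while (t->segments) {
        struct segment *seg = t->segments;
        t->segments = seg->next;
        free(seg);
    }
    t->pool = NULL;
}
```

Because individual ops are never freed, teardown only has to walk the segment list, which is the same shape as the op_segment_list loop in MPID_nem_ptl_rptl_drain_eq.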

http://git.mpich.org/mpich.git/commitdiff/101fd2b7482158b09d919bc99cb0ccfefc2479aa

commit 101fd2b7482158b09d919bc99cb0ccfefc2479aa
Author: Junchao Zhang <jczhang at mcs.anl.gov>
Date:   Wed Nov 12 12:50:17 2014 -0600

    Support spaces after @whatever@ in testlist
    
    This makes testlist.in files more flexible and easier to read.
    
    Signed-off-by: Sangmin Seo <sseo at anl.gov>

diff --git a/maint/f77tof90.in b/maint/f77tof90.in
index 3be22ed..b6ecccc 100644
--- a/maint/f77tof90.in
+++ b/maint/f77tof90.in
@@ -472,7 +472,7 @@ sub ConvertTestlist {
 	    # Cray-style pointers for MPI_Alloc_mem).
 	    $_ = "\@ALLOCMEMFC\@\n";
 	}
-	elsif (/^(\@\w+\@)(\w+f)\s+(.*)/) {
+	elsif (/^(\@\w+\@)(\s*\w+f)\s+(.*)/) {
 	    # This handles the case where an autoconf variable in the
             # testlist.in file is used to optionally comment out a test
 	    $_ = $1 . $2 . "90 "  . $3 . "\n";
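
To illustrate what the relaxed pattern accepts, here is a rough C equivalent using POSIX extended regular expressions. It is only a sketch: the function name is hypothetical, [[:alnum:]_] approximates Perl's \w, and the input is assumed to be a single line without its trailing newline.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Rewrite "@VAR@ namef args" as "@VAR@ namef90 args", tolerating optional
 * whitespace after the closing '@'. Returns 1 if the line was rewritten,
 * 0 if it did not match or the output buffer was too small. */
int convert_testlist_line(const char *line, char *out, size_t outsz)
{
    /* POSIX ERE stand-in for the Perl pattern /^(\@\w+\@)(\s*\w+f)\s+(.*)/ */
    static const char *pattern =
        "^(@[[:alnum:]_]+@)([[:space:]]*[[:alnum:]_]+f)[[:space:]]+(.*)";
    regex_t re;
    regmatch_t m[4];
    int rewritten = 0;

    assert(line && out && outsz > 0);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, line, 4, m, 0) == 0) {
        /* Same composition as the Perl code: $1 . $2 . "90 " . $3 */
        int n = snprintf(out, outsz, "%.*s%.*s90 %.*s",
                         (int) (m[1].rm_eo - m[1].rm_so), line + m[1].rm_so,
                         (int) (m[2].rm_eo - m[2].rm_so), line + m[2].rm_so,
                         (int) (m[3].rm_eo - m[3].rm_so), line + m[3].rm_so);
        rewritten = (n >= 0 && (size_t) n < outsz);
    }
    regfree(&re);
    return rewritten;
}
```

With the old pattern only the form without a space after the closing '@' would match; the new one accepts both, and the captured whitespace is carried through into the output unchanged.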

http://git.mpich.org/mpich.git/commitdiff/01718643e8a76cffb77490cd9d2fb68aff8a9388

commit 01718643e8a76cffb77490cd9d2fb68aff8a9388
Author: Junchao Zhang <jczhang at mcs.anl.gov>
Date:   Thu Nov 6 20:21:48 2014 -0600

    Fix: skip #ifdef/#endif lines in input headers
    
    Without doing so, the script wrongly thinks #ifdef etc. are part of
    a subroutine's prototype line.
    
    No review since F08 binding is experimental now.

diff --git a/src/binding/fortran/use_mpi_f08/wrappers_c/buildiface b/src/binding/fortran/use_mpi_f08/wrappers_c/buildiface
index 89b9419..1c7b13b 100755
--- a/src/binding/fortran/use_mpi_f08/wrappers_c/buildiface
+++ b/src/binding/fortran/use_mpi_f08/wrappers_c/buildiface
@@ -407,6 +407,9 @@ while (<FD>) {
         }
     }
 
+    # Skip lines starting with # such as #ifdef or #endif
+    if (/^\s*#/) { next; }
+
     # If we found a semi-colon at the end, that's the end of the line.
     # This is not perfect (e.g., does not work when a single line has
     # multiple semi-colon separated statements), but should be good
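
The new check treats any line whose first non-blank character is '#' as a preprocessor line and skips it before prototype accumulation. A small C sketch of the same predicate (the function name is hypothetical) could look like this:

```c
#include <assert.h>
#include <ctype.h>

/* Returns 1 when the first non-whitespace character of line is '#',
 * mirroring the Perl test /^\s*#/; returns 0 otherwise. */
int is_preprocessor_line(const char *line)
{
    assert(line);
    while (*line && isspace((unsigned char) *line))
        line++;
    return *line == '#';
}
```

A caller reading a header line by line would simply continue to the next line whenever this returns 1.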

http://git.mpich.org/mpich.git/commitdiff/aa004b9befae6e32e2866e2be795e538b1db33a2

commit aa004b9befae6e32e2866e2be795e538b1db33a2
Author: Junchao Zhang <jczhang at mcs.anl.gov>
Date:   Tue Nov 11 09:45:14 2014 -0600

    Add support for MPIX_ subroutines in cdesc buildiface
    
    No review since F08 binding is experimental now.

diff --git a/src/binding/fortran/use_mpi_f08/wrappers_c/buildiface b/src/binding/fortran/use_mpi_f08/wrappers_c/buildiface
index 91cbf73..89b9419 100755
--- a/src/binding/fortran/use_mpi_f08/wrappers_c/buildiface
+++ b/src/binding/fortran/use_mpi_f08/wrappers_c/buildiface
@@ -452,15 +452,17 @@ while (<FD>) {
     if (grep/void\s*\*/, @arglist) {
         $fname = "$routine";
         $fname =~ s/MPI_//g;
+        $fname =~ s/MPIX_//g;
         $fname =~ tr/A-Z/a-z/;
         $fname .= "_cdesc.c";
 
         print MAKEFD "\tsrc/binding/fortran/use_mpi_f08/wrappers_c/$fname \\\n";
         open(CFILE, ">$fname") || die "Could not open $fname\n";
 
-        # replace MPI_Foo with MPIR_Foo_cdesc
+        # replace MPI(X)_Foo with MPIR_Foo_cdesc
         $cdesc_routine = $routine;
         $cdesc_routine =~ s/MPI_/MPIR_/g;
+        $cdesc_routine =~ s/MPIX_/MPIR_/g;
         $cdesc_routine .= "_cdesc";
 
         print CFILE <<EOT;
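
The name mapping in this hunk can be sketched in C as follows. This is an illustration only: the helper names are hypothetical, MPIX_Foo is a made-up routine, and the sketch strips just a leading MPI_ or MPIX_ prefix (and assumes one is present), whereas the Perl substitutions are global.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Length of a leading "MPIX_" or "MPI_" prefix, or 0 if neither is present. */
size_t mpi_prefix_len(const char *routine)
{
    if (strncmp(routine, "MPIX_", 5) == 0)
        return 5;
    if (strncmp(routine, "MPI_", 4) == 0)
        return 4;
    return 0;
}

/* "MPI_Foo" or "MPIX_Foo" -> "foo_cdesc.c". Returns 0 on success, -1 on truncation. */
int cdesc_file_name(const char *routine, char *out, size_t outsz)
{
    int n;

    assert(routine && out);
    n = snprintf(out, outsz, "%s_cdesc.c", routine + mpi_prefix_len(routine));
    if (n < 0 || (size_t) n >= outsz)
        return -1;
    for (char *p = out; *p; p++)
        *p = (char) tolower((unsigned char) *p);
    return 0;
}

/* "MPI_Foo" or "MPIX_Foo" -> "MPIR_Foo_cdesc". Returns 0 on success, -1 on truncation. */
int cdesc_routine_name(const char *routine, char *out, size_t outsz)
{
    int n;

    assert(routine && out);
    n = snprintf(out, outsz, "MPIR_%s_cdesc", routine + mpi_prefix_len(routine));
    return (n >= 0 && (size_t) n < outsz) ? 0 : -1;
}
```

Before this change an MPIX_ routine kept its prefix in both names, since "MPIX_" does not contain the substring "MPI_".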

http://git.mpich.org/mpich.git/commitdiff/ce4322b5bf63155d97b254e740fad40fbec6a00a

commit ce4322b5bf63155d97b254e740fad40fbec6a00a
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Thu Nov 13 18:20:41 2014 -0600

    portals4: change variable name put_acked -> put_done
    
    Helps clarity since we no longer use ACKs in the netmod code.
    
    Signed-off-by: Antonio Pena Monferrer <apenya at mcs.anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
index e5e1aea..5ab1eef 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
@@ -49,7 +49,7 @@ typedef struct {
     ptl_handle_me_t put_me;
     ptl_handle_me_t *get_me_p;
     int num_gets;
-    int put_acked;
+    int put_done;
     ptl_size_t chunk_offset;
     void *chunk_buffer[MPID_NEM_PTL_NUM_CHUNK_BUFFERS];
     MPIDI_msg_sz_t bytes_put;
@@ -72,7 +72,7 @@ typedef struct {
         REQ_PTL(req_)->put_me        = PTL_INVALID_HANDLE;      \
         REQ_PTL(req_)->get_me_p      = NULL;                    \
         REQ_PTL(req_)->num_gets      = 0;                       \
-        REQ_PTL(req_)->put_acked     = 0;                       \
+        REQ_PTL(req_)->put_done     = 0;                       \
         REQ_PTL(req_)->event_handler = NULL;                    \
         REQ_PTL(req_)->chunk_offset  = 0;                       \
     } while (0)
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
index f506987..9a1000e 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
@@ -177,7 +177,7 @@ static inline int send_pkt(MPIDI_VC_t *vc, void *hdr_p, void *data_p, MPIDI_msg_
     NEW_TAG(sendbuf->tag);
     TMPBUF(sreq) = NULL;
     REQ_PTL(sreq)->num_gets = 0;
-    REQ_PTL(sreq)->put_acked = 0;
+    REQ_PTL(sreq)->put_done = 0;
 
     if (data_sz) {
         MPIU_Memcpy(sendbuf->packet + sizeof(MPIDI_CH3_Pkt_t), data_p, sent_sz);
@@ -230,7 +230,7 @@ static int send_noncontig_pkt(MPIDI_VC_t *vc, MPID_Request *sreq, void *hdr_p)
     NEW_TAG(sendbuf->tag);
     TMPBUF(sreq) = NULL;
     REQ_PTL(sreq)->num_gets = 0;
-    REQ_PTL(sreq)->put_acked = 0;
+    REQ_PTL(sreq)->put_done = 0;
 
     if (sreq->dev.segment_size) {
         MPIDI_msg_sz_t last = sent_sz;
@@ -456,7 +456,7 @@ int MPID_nem_ptl_nm_ctl_event_handler(const ptl_event_t *e)
 
             if (--REQ_PTL(req)->num_gets == 0) {
                 MPIU_Free(TMPBUF(req));
-                if (REQ_PTL(req)->put_acked)
+                if (REQ_PTL(req)->put_done)
                     on_data_avail(req);  /* Otherwise we'll do it on the SEND */
             }
         }
@@ -467,7 +467,7 @@ int MPID_nem_ptl_nm_ctl_event_handler(const ptl_event_t *e)
             MPID_Request *const req = e->user_ptr;
 
             MPIU_Free(SENDBUF(req));
-            REQ_PTL(req)->put_acked = 1;
+            REQ_PTL(req)->put_done = 1;
             if (REQ_PTL(req)->num_gets == 0)  /* Otherwise GET will do it */
                 on_data_avail(req);
         }
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
index fb6d5f3..34a1b4f 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
@@ -104,13 +104,13 @@ static int handler_large(const ptl_event_t *e)
     MPIU_Assert(e->type == PTL_EVENT_SEND || e->type == PTL_EVENT_GET);
 
     if (e->type == PTL_EVENT_SEND) {
-        REQ_PTL(sreq)->put_acked = 1;
+        REQ_PTL(sreq)->put_done = 1;
     } else if (e->type == PTL_EVENT_GET) {
         /* decrement the remaining get operations */
         REQ_PTL(sreq)->num_gets--;
     }
 
-    if (REQ_PTL(sreq)->num_gets == 0 && REQ_PTL(sreq)->put_acked)
+    if (REQ_PTL(sreq)->num_gets == 0 && REQ_PTL(sreq)->put_done)
         mpi_errno = handler_send_complete(e);
 
  fn_exit:
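
The completion rule in handler_large after the rename can be shown in isolation: a large send completes only once the initial put has been reported done and every outstanding get has been served, in whichever order those events arrive. The sketch below uses hypothetical types in place of the Portals event and request structures.

```c
#include <assert.h>

/* Stand-ins for PTL_EVENT_SEND and PTL_EVENT_GET. */
enum send_ev { SEND_EV_SEND, SEND_EV_GET };

struct send_state {
    int put_done;    /* set once the SEND event for the initial put arrives */
    int num_gets;    /* GET events still expected from the receiver */
    int completed;   /* number of times completion has fired */
};

/* Process one event. Returns 1 if this event completed the request, 0 otherwise. */
int on_large_send_event(struct send_state *s, enum send_ev type)
{
    assert(s);
    assert(type == SEND_EV_SEND || type == SEND_EV_GET);

    if (type == SEND_EV_SEND)
        s->put_done = 1;
    else
        s->num_gets--;      /* one fewer get outstanding */

    if (s->num_gets == 0 && s->put_done) {
        s->completed++;
        return 1;
    }
    return 0;
}
```

Checking both conditions after every event is what makes the ordering irrelevant: the last GET may arrive before or after the SEND.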

http://git.mpich.org/mpich.git/commitdiff/7879fbae03871006066d167dafc9d37bb141dc67

commit 7879fbae03871006066d167dafc9d37bb141dc67
Author: Antonio Pena Monferrer <apenya at mcs.anl.gov>
Date:   Thu Nov 13 11:52:28 2014 -0600

    Use SEND instead of ACK events in Portals4
    
    The rportals layer is taking care of retransmissions, so we should only
    be interested in delivery events in the netmod layer.
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
index a6d4697..c016952 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
@@ -170,7 +170,6 @@ int MPID_nem_ptl_poll(int is_blocking_poll)
             }
         case PTL_EVENT_PUT_OVERFLOW:
         case PTL_EVENT_GET:
-        case PTL_EVENT_ACK:
         case PTL_EVENT_SEND:
         case PTL_EVENT_REPLY:
         case PTL_EVENT_SEARCH: {
@@ -193,6 +192,7 @@ int MPID_nem_ptl_poll(int is_blocking_poll)
         case PTL_EVENT_LINK:
             /* ignore */
             break;
+        case PTL_EVENT_ACK:
         default:
             MPIU_Error_printf("Received unexpected event type: %d %s", event.type, MPID_nem_ptl_strevent(&event));
             MPIU_ERR_INTERNALANDJUMP(mpi_errno, "Unexpected event type");
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
index 022b541..fb6d5f3 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
@@ -66,10 +66,7 @@ static int handler_send_complete(const ptl_event_t *e)
 
     MPIDI_FUNC_ENTER(MPID_STATE_HANDLER_SEND_COMPLETE);
 
-    if (e->type == PTL_EVENT_SEND)  /* Ignore */
-        goto fn_exit;
-
-    MPIU_Assert(e->type == PTL_EVENT_ACK || e->type == PTL_EVENT_GET);
+    MPIU_Assert(e->type == PTL_EVENT_SEND || e->type == PTL_EVENT_GET);
 
     if (REQ_PTL(sreq)->md != PTL_INVALID_HANDLE) {
         ret = PtlMDRelease(REQ_PTL(sreq)->md);
@@ -104,20 +101,9 @@ static int handler_large(const ptl_event_t *e)
 
     MPIDI_FUNC_ENTER(MPID_STATE_HANDLER_LARGE);
 
-    if (e->type == PTL_EVENT_SEND)  /* Ignore */
-        goto fn_exit;
+    MPIU_Assert(e->type == PTL_EVENT_SEND || e->type == PTL_EVENT_GET);
 
-    if (e->type != PTL_EVENT_ACK && e->type != PTL_EVENT_GET)
-        MPIU_Error_printf("ACK event expected, received %s ni_fail=%s list=%s user_ptr=%p hdr_data=%#lx\n",
-                          MPID_nem_ptl_strevent(e), MPID_nem_ptl_strnifail(e->ni_fail_type),
-                          MPID_nem_ptl_strlist(e->ptl_list), e->user_ptr, e->hdr_data);
-    MPIU_Assert(e->type == PTL_EVENT_ACK || e->type == PTL_EVENT_GET);
-    
-    if (e->type == PTL_EVENT_ACK && e->mlength < PTL_LARGE_THRESHOLD) {
-        /* truncated message */
-        mpi_errno = handler_send_complete(e);
-        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-    } else if (e->type == PTL_EVENT_ACK) {
+    if (e->type == PTL_EVENT_SEND) {
         REQ_PTL(sreq)->put_acked = 1;
     } else if (e->type == PTL_EVENT_GET) {
         /* decrement the remaining get operations */
@@ -271,7 +257,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
             MPIU_DBG_MSG(CH3_CHANNEL, VERBOSE, "Small contig message");
             REQ_PTL(sreq)->event_handler = handler_send_complete;
             MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "&REQ_PTL(sreq)->event_handler = %p", &(REQ_PTL(sreq)->event_handler));
-            ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)((char *)buf + dt_true_lb), data_sz, PTL_ACK_REQ, vc_ptl->id, vc_ptl->pt,
+            ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)((char *)buf + dt_true_lb), data_sz, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->pt,
                          NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
                                         NPTL_HEADER(ssend_flag, data_sz));
             MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
@@ -308,7 +294,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
             MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmdbind", "**ptlmdbind %s", MPID_nem_ptl_strerror(ret));
                 
             REQ_PTL(sreq)->event_handler = handler_send_complete;
-            ret = MPID_nem_ptl_rptl_put(REQ_PTL(sreq)->md, 0, data_sz, PTL_ACK_REQ, vc_ptl->id, vc_ptl->pt,
+            ret = MPID_nem_ptl_rptl_put(REQ_PTL(sreq)->md, 0, data_sz, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->pt,
                          NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
                                         NPTL_HEADER(ssend_flag, data_sz));
             MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
@@ -325,7 +311,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
         MPID_Segment_pack(sreq->dev.segment_ptr, sreq->dev.segment_first, &last, REQ_PTL(sreq)->chunk_buffer[0]);
         MPIU_Assert(last == sreq->dev.segment_size);
         REQ_PTL(sreq)->event_handler = handler_send_complete;
-        ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(sreq)->chunk_buffer[0], data_sz, PTL_ACK_REQ,
+        ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(sreq)->chunk_buffer[0], data_sz, PTL_NO_ACK_REQ,
                      vc_ptl->id, vc_ptl->pt, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
                                     NPTL_HEADER(ssend_flag, data_sz));
         MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
@@ -342,7 +328,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
         REQ_PTL(sreq)->large = TRUE;
 
         REQ_PTL(sreq)->event_handler = handler_large;
-        ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)((char *)buf + dt_true_lb), PTL_LARGE_THRESHOLD, PTL_ACK_REQ, vc_ptl->id, vc_ptl->pt,
+        ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)((char *)buf + dt_true_lb), PTL_LARGE_THRESHOLD, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->pt,
                      NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
                                     NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz));
         MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
@@ -412,7 +398,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
                 REQ_PTL(sreq)->large = TRUE;
 
                 REQ_PTL(sreq)->event_handler = handler_large;
-                ret = MPID_nem_ptl_rptl_put(REQ_PTL(sreq)->md, 0, PTL_LARGE_THRESHOLD, PTL_ACK_REQ, vc_ptl->id, vc_ptl->pt,
+                ret = MPID_nem_ptl_rptl_put(REQ_PTL(sreq)->md, 0, PTL_LARGE_THRESHOLD, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->pt,
                              NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
                                             NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz));
                 MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
@@ -437,7 +423,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
 
     REQ_PTL(sreq)->event_handler = handler_large;
     ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(sreq)->chunk_buffer[0], PTL_LARGE_THRESHOLD,
-                                PTL_ACK_REQ, vc_ptl->id, vc_ptl->pt, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank),
+                                PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->pt, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank),
                                 0, sreq, NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz));
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
     DBG_MSG_PUT("global", PTL_LARGE_THRESHOLD, vc->pg_rank, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz));
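
Since every put now passes PTL_NO_ACK_REQ, an ACK event reaching the netmod would indicate a bug, which is why the poll switch moves PTL_EVENT_ACK next to the default case. The classification visible in the ptl_poll.c hunks can be sketched as below. The enums are hypothetical stand-ins, only the cases shown in the hunks are modeled, and "dispatch" is an assumption about what the first group of cases does with the event.

```c
#include <assert.h>

/* Stand-ins for the Portals event types that appear in the hunks above. */
enum poll_ev {
    POLL_EV_PUT_OVERFLOW,
    POLL_EV_GET,
    POLL_EV_SEND,
    POLL_EV_REPLY,
    POLL_EV_SEARCH,
    POLL_EV_LINK,
    POLL_EV_ACK
};

enum poll_action { POLL_DISPATCH, POLL_IGNORE, POLL_UNEXPECTED };

/* Decide what the poll loop does with an event of the given type. */
enum poll_action classify_event(enum poll_ev type)
{
    switch (type) {
    case POLL_EV_PUT_OVERFLOW:
    case POLL_EV_GET:
    case POLL_EV_SEND:
    case POLL_EV_REPLY:
    case POLL_EV_SEARCH:
        return POLL_DISPATCH;       /* assumed: handed to the request's event handler */
    case POLL_EV_LINK:
        return POLL_IGNORE;
    case POLL_EV_ACK:               /* no ACKs are requested any more */
    default:
        return POLL_UNEXPECTED;     /* reported as an internal error */
    }
}
```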

http://git.mpich.org/mpich.git/commitdiff/08c09a9bd2d94b6c2299e5548a53a8e273a9bd49

commit 08c09a9bd2d94b6c2299e5548a53a8e273a9bd49
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Thu Nov 13 18:12:51 2014 -0600

    update .gitignore in testsuite
    
    No reviewer

diff --git a/test/mpi/.gitignore b/test/mpi/.gitignore
index 3a62aca..f575616 100644
--- a/test/mpi/.gitignore
+++ b/test/mpi/.gitignore
@@ -882,6 +882,7 @@
 /init/finalized
 /init/library_version
 /io/resized
+/io/testlist
 /manual/mpi_t/mpit_test
 /manual/mpi_t/mpit_test2
 /manual/mpi_t/nem_fbox_fallback_to_queue_count

http://git.mpich.org/mpich.git/commitdiff/4488a3abaa2c1054bb91474613616b6a8b4a185b

commit 4488a3abaa2c1054bb91474613616b6a8b4a185b
Author: Min Si <msi at il.is.s.u-tokyo.ac.jp>
Date:   Thu Nov 13 17:56:08 2014 -0600

    Increase bcast2,bcast3 timelimit to 13 mins.
    
    Nightly testing reported timeouts on octopus with the 12-minute time
    limit; each test took between 12:05 and 12:10.

diff --git a/test/mpi/coll/testlist b/test/mpi/coll/testlist
index 280365e..36686e7 100644
--- a/test/mpi/coll/testlist
+++ b/test/mpi/coll/testlist
@@ -35,8 +35,8 @@ bcasttest 10
 bcast2 4
 # More that 8 processes are required to get bcast to switch to the long
 # msg algorithm (see coll definitions in mpiimpl.h)
-bcast2 10 timeLimit=720
-bcast3 10 timeLimit=720
+bcast2 10 timeLimit=780
+bcast3 10 timeLimit=780
 bcastzerotype 1
 bcastzerotype 4
 bcastzerotype 5

http://git.mpich.org/mpich.git/commitdiff/8648d333dbfdb4ac2786160fc21ff837190e17bd

commit 8648d333dbfdb4ac2786160fc21ff837190e17bd
Author: Antonio Pena Monferrer <apenya at mcs.anl.gov>
Date:   Thu Nov 13 01:35:10 2014 -0600

    Remove unnecessary soft ACKs from portals RMA code
    
    The soft ACKs were introduced to make the protocol robust during
    development and are no longer needed.
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
index 481b88b..f506987 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
@@ -15,14 +15,11 @@
 #define SENDBUF_SIZE(sent_sz_) (offsetof(buf_t, packet) + sizeof(MPIDI_CH3_Pkt_t) + (sent_sz_))
 #define SENDBUF(req_) REQ_PTL(req_)->chunk_buffer[0]
 #define TMPBUF(req_) REQ_PTL(req_)->chunk_buffer[1]
-#define NEW_TAG(tag_) do {     \
-    global_tag += 2;           \
-    if (global_tag == CTL_TAG) \
-        global_tag += 2;       \
-    (tag_) = global_tag;       \
+#define NEW_TAG(tag_) do {       \
+    if (++global_tag == CTL_TAG) \
+        ++global_tag;            \
+    (tag_) = global_tag;         \
 } while(0)
-#define GET_TAG(tag_)  (((tag_) >> 1) << 1)
-#define DONE_TAG(tag_) ((tag_) | 0x1)
 
 typedef struct {
     size_t remaining;
@@ -34,7 +31,6 @@ static buf_t recvbufs[NUM_RECV_BUFS];
 static ptl_me_t mes[NUM_RECV_BUFS];
 static ptl_handle_me_t me_handles[NUM_RECV_BUFS];
 static unsigned long long put_cnt = 0;  /* required to not finalizing too early */
-static MPID_Request *done_req;
 static ptl_match_bits_t global_tag = 0;
 
 
@@ -73,12 +69,6 @@ int MPID_nem_ptl_nm_init(void)
                              MPID_nem_ptl_strerror(ret));
     }
 
-    done_req = MPID_Request_create();
-    MPIU_Assert(done_req != NULL);
-    done_req->dev.OnDataAvail = NULL;
-    SENDBUF(done_req) = NULL;
-    REQ_PTL(done_req)->event_handler = MPID_nem_ptl_nm_ctl_event_handler;
-
  fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_NM_INIT);
     return mpi_errno;
@@ -107,8 +97,6 @@ int MPID_nem_ptl_nm_finalize(void)
                              MPID_nem_ptl_strerror(ret));
     }
 
-    MPIDI_CH3_Request_destroy(done_req);
-
  fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_NM_FINALIZE);
     return mpi_errno;
@@ -117,44 +105,6 @@ int MPID_nem_ptl_nm_finalize(void)
 }
 
 #undef FUNCNAME
-#define FUNCNAME meappend_done
-#undef FCNAME
-#define FCNAME MPIU_QUOTE(FUNCNAME)
-static inline int meappend_done(ptl_process_t id, MPID_Request *req, ptl_match_bits_t tag)
-{
-    int mpi_errno = MPI_SUCCESS;
-    int ret;
-    ptl_me_t me;
-    ptl_handle_me_t me_handle;
-    MPIDI_STATE_DECL(MPID_STATE_MEAPPEND_DONE);
-
-    MPIDI_FUNC_ENTER(MPID_STATE_MEAPPEND_DONE);
-
-    me.start = NULL;
-    me.length = 0;
-    me.ct_handle = PTL_CT_NONE;
-    me.uid = PTL_UID_ANY;
-    me.options = ( PTL_ME_OP_PUT | PTL_ME_USE_ONCE | PTL_ME_IS_ACCESSIBLE |
-                   PTL_ME_EVENT_LINK_DISABLE | PTL_ME_EVENT_UNLINK_DISABLE );
-    me.match_id = id;
-    me.match_bits = DONE_TAG(tag);
-    me.ignore_bits = 0;
-    me.min_free = 0;
-    ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_control_pt, &me, PTL_PRIORITY_LIST, req,
-                      &me_handle);
-    MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "PtlMEAppend(req=%p tag=%#lx)", req, DONE_TAG(tag)));
-    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s",
-                         MPID_nem_ptl_strerror(ret));
-    ++put_cnt;
-
- fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_MEAPPEND_DONE);
-    return mpi_errno;
- fn_fail:
-    goto fn_exit;
-}
-
-#undef FUNCNAME
 #define FUNCNAME meappend_large
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
@@ -175,21 +125,20 @@ static inline int meappend_large(ptl_process_t id, MPID_Request *req, ptl_match_
     me.options = ( PTL_ME_OP_GET | PTL_ME_USE_ONCE | PTL_ME_IS_ACCESSIBLE |
                    PTL_ME_EVENT_LINK_DISABLE | PTL_ME_EVENT_UNLINK_DISABLE );
     me.match_id = id;
-    me.match_bits = GET_TAG(tag);
+    me.match_bits = tag;
     me.ignore_bits = 0;
     me.min_free = 0;
 
     while (remaining) {
-        int incomplete;
         ptl_handle_me_t foo_me_handle;
 
-        MPIDI_CH3U_Request_increment_cc(req, &incomplete);  /* Cannot avoid GET events from poll infrastructure */
+        ++REQ_PTL(req)->num_gets;
 
         ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_control_pt, &me, PTL_PRIORITY_LIST, req,
                           &foo_me_handle);
         MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s",
                              MPID_nem_ptl_strerror(ret));
-        MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "PtlMEAppend(req=%p tag=%#lx)", req, GET_TAG(tag)));
+        MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "PtlMEAppend(req=%p tag=%#lx)", req, tag));
 
         me.start = (char *)me.start + me.length;
         remaining -= me.length;
@@ -227,6 +176,8 @@ static inline int send_pkt(MPIDI_VC_t *vc, void *hdr_p, void *data_p, MPIDI_msg_
     sendbuf->remaining = data_sz - sent_sz;
     NEW_TAG(sendbuf->tag);
     TMPBUF(sreq) = NULL;
+    REQ_PTL(sreq)->num_gets = 0;
+    REQ_PTL(sreq)->put_acked = 0;
 
     if (data_sz) {
         MPIU_Memcpy(sendbuf->packet + sizeof(MPIDI_CH3_Pkt_t), data_p, sent_sz);
@@ -239,11 +190,6 @@ static inline int send_pkt(MPIDI_VC_t *vc, void *hdr_p, void *data_p, MPIDI_msg_
     SENDBUF(sreq) = sendbuf;
     REQ_PTL(sreq)->event_handler = MPID_nem_ptl_nm_ctl_event_handler;
 
-    /* Post ME for the DONE message */
-    mpi_errno = meappend_done(vc_ptl->id, sreq, sendbuf->tag);
-    if (mpi_errno)
-        goto fn_fail;
-
     ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)sendbuf, sendbuf_sz, PTL_NO_ACK_REQ,
                                 vc_ptl->id, vc_ptl->ptc, CTL_TAG, 0, sreq, MPIDI_Process.my_pg_rank);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s",
@@ -251,6 +197,8 @@ static inline int send_pkt(MPIDI_VC_t *vc, void *hdr_p, void *data_p, MPIDI_msg_
     MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "PtlPut(size=%lu id=(%#x,%#x) pt=%#x)",
                                             sendbuf_sz, vc_ptl->id.phys.nid,
                                             vc_ptl->id.phys.pid, vc_ptl->ptc));
+    ++put_cnt;
+
  fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_SEND_PKT);
     return mpi_errno;
@@ -281,6 +229,8 @@ static int send_noncontig_pkt(MPIDI_VC_t *vc, MPID_Request *sreq, void *hdr_p)
     sendbuf->remaining = sreq->dev.segment_size - sent_sz;
     NEW_TAG(sendbuf->tag);
     TMPBUF(sreq) = NULL;
+    REQ_PTL(sreq)->num_gets = 0;
+    REQ_PTL(sreq)->put_acked = 0;
 
     if (sreq->dev.segment_size) {
         MPIDI_msg_sz_t last = sent_sz;
@@ -302,11 +252,6 @@ static int send_noncontig_pkt(MPIDI_VC_t *vc, MPID_Request *sreq, void *hdr_p)
     SENDBUF(sreq) = sendbuf;
     REQ_PTL(sreq)->event_handler = MPID_nem_ptl_nm_ctl_event_handler;
 
-    /* Post ME for the DONE message */
-    mpi_errno = meappend_done(vc_ptl->id, sreq, sendbuf->tag);
-    if (mpi_errno)
-        goto fn_fail;
-
     ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)sendbuf, sendbuf_sz, PTL_NO_ACK_REQ,
                                 vc_ptl->id, vc_ptl->ptc, CTL_TAG, 0, sreq, MPIDI_Process.my_pg_rank);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s",
@@ -314,6 +259,7 @@ static int send_noncontig_pkt(MPIDI_VC_t *vc, MPID_Request *sreq, void *hdr_p)
     MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "PtlPut(size=%lu id=(%#x,%#x) pt=%#x)",
                                             sendbuf_sz, vc_ptl->id.phys.nid,
                                             vc_ptl->id.phys.pid, vc_ptl->ptc));
+    ++put_cnt;
 
  fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_SEND_NONCONTIG_PKT);
@@ -405,10 +351,11 @@ int MPID_nem_ptl_iSendContig(MPIDI_VC_t *vc, MPID_Request *sreq, void *hdr, MPID
 #define FCNAME MPIU_QUOTE(FUNCNAME)
 static inline void on_data_avail(MPID_Request * req)
 {
+    int (*reqFn) (MPIDI_VC_t *, MPID_Request *, int *);
     MPIDI_STATE_DECL(MPID_STATE_ON_DATA_AVAIL);
+
     MPIDI_FUNC_ENTER(MPID_STATE_ON_DATA_AVAIL);
 
-    int (*reqFn) (MPIDI_VC_t *, MPID_Request *, int *);
     reqFn = req->dev.OnDataAvail;
     if (!reqFn) {
         MPIDI_CH3U_Request_complete(req);
@@ -420,6 +367,9 @@ static inline void on_data_avail(MPID_Request * req)
         reqFn(vc, req, &complete);
         MPIU_Assert(complete == TRUE);
     }
+
+    --put_cnt;
+
     MPIDI_FUNC_EXIT(MPID_STATE_ON_DATA_AVAIL);
 }
 
@@ -437,13 +387,7 @@ int MPID_nem_ptl_nm_ctl_event_handler(const ptl_event_t *e)
     switch(e->type) {
 
     case PTL_EVENT_PUT:
-        if (e->match_bits != CTL_TAG) {
-            MPIU_Free(SENDBUF((MPID_Request *)e->user_ptr));
-            MPIU_Free(TMPBUF((MPID_Request *)e->user_ptr));
-            on_data_avail((MPID_Request *)e->user_ptr);
-            --put_cnt;
-        }
-        else {
+        {
             int ret;
             const uint64_t buf_idx = (uint64_t) e->user_ptr;
             const size_t packet_sz = e->mlength - offsetof(buf_t, packet);
@@ -459,15 +403,6 @@ int MPID_nem_ptl_nm_ctl_event_handler(const ptl_event_t *e)
                 mpi_errno = MPID_nem_handle_pkt(vc, recvbufs[buf_idx].packet, packet_sz);
                 if (mpi_errno)
                     MPIU_ERR_POP(mpi_errno);
-                /* Notify we're done */
-                ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, 0, 0, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->ptc,
-                                            DONE_TAG(recvbufs[buf_idx].tag), 0, done_req, MPIDI_Process.my_pg_rank);
-                MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s",
-                                     MPID_nem_ptl_strerror(ret));
-                MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST,
-                                                        "PtlPut(size=0 id=(%#x,%#x) pt=%#x tag=%#lx)",
-                                                        vc_ptl->id.phys.nid, vc_ptl->id.phys.pid,
-                                                        vc_ptl->ptc, DONE_TAG(recvbufs[buf_idx].tag)));
             }
             else {
                 int incomplete;
@@ -493,13 +428,13 @@ int MPID_nem_ptl_nm_ctl_event_handler(const ptl_event_t *e)
                 while (recvbufs[buf_idx].remaining) {
                     MPIDI_CH3U_Request_increment_cc(req, &incomplete);  /* Will be decremented - and eventually freed in REPLY */
                     ret = MPID_nem_ptl_rptl_get(MPIDI_nem_ptl_global_md, (ptl_size_t)buf_ptr,
-                                                size, vc_ptl->id, vc_ptl->ptc, GET_TAG(recvbufs[buf_idx].tag), 0, req);
+                                                size, vc_ptl->id, vc_ptl->ptc, recvbufs[buf_idx].tag, 0, req);
                     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlget", "**ptlget %s",
                                          MPID_nem_ptl_strerror(ret));
                     MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST,
                                                             "PtlGet(size=%lu id=(%#x,%#x) pt=%#x tag=%#lx)", size,
                                                             vc_ptl->id.phys.nid,
-                                                            vc_ptl->id.phys.pid, vc_ptl->ptc, GET_TAG(recvbufs[buf_idx].tag)));
+                                                            vc_ptl->id.phys.pid, vc_ptl->ptc, recvbufs[buf_idx].tag));
                     buf_ptr += size;
                     recvbufs[buf_idx].remaining -= size;
                     if (recvbufs[buf_idx].remaining < MPIDI_nem_ptl_ni_limits.max_msg_size)
@@ -515,41 +450,47 @@ int MPID_nem_ptl_nm_ctl_event_handler(const ptl_event_t *e)
         }
         break;
 
+    case PTL_EVENT_GET:
+        {
+            MPID_Request *const req = e->user_ptr;
+
+            if (--REQ_PTL(req)->num_gets == 0) {
+                MPIU_Free(TMPBUF(req));
+                if (REQ_PTL(req)->put_acked)
+                    on_data_avail(req);  /* Otherwise we'll do it on the SEND */
+            }
+        }
+        break;
+
+    case PTL_EVENT_SEND:
+        {
+            MPID_Request *const req = e->user_ptr;
+
+            MPIU_Free(SENDBUF(req));
+            REQ_PTL(req)->put_acked = 1;
+            if (REQ_PTL(req)->num_gets == 0)  /* Otherwise GET will do it */
+                on_data_avail(req);
+        }
+        break;
+
     case PTL_EVENT_REPLY:
         {
             int incomplete;
-            MPID_Request *const rreq = e->user_ptr;
+            MPID_Request *const req = e->user_ptr;
 
-            MPIDI_CH3U_Request_decrement_cc(rreq, &incomplete);
+            MPIDI_CH3U_Request_decrement_cc(req, &incomplete);
             if (!incomplete) {
-                int ret;
-                MPID_nem_ptl_vc_area *const vc_ptl = VC_PTL(rreq->ch.vc);
-
-                mpi_errno = MPID_nem_handle_pkt(rreq->ch.vc, TMPBUF(rreq), REQ_PTL(rreq)->bytes_put);
+                mpi_errno = MPID_nem_handle_pkt(req->ch.vc, TMPBUF(req), REQ_PTL(req)->bytes_put);
                 if (mpi_errno)
                     MPIU_ERR_POP(mpi_errno);
 
-                /* Notify we're done */
-                ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, 0, 0, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->ptc,
-                                            DONE_TAG(rreq->dev.match.parts.tag), 0, done_req, MPIDI_Process.my_pg_rank);
-                MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s",
-                                     MPID_nem_ptl_strerror(ret));
-                MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST,
-                                                        "PtlPut(size=0 id=(%#x,%#x) pt=%#x tag=%#lx)",
-                                                        vc_ptl->id.phys.nid, vc_ptl->id.phys.pid,
-                                                        vc_ptl->ptc, DONE_TAG((ptl_match_bits_t)SENDBUF(rreq))));
-
                 /* Free resources */
-                MPIU_Free(TMPBUF(rreq));
-                MPID_Request_release(rreq);
+                MPIU_Free(TMPBUF(req));
+                MPID_Request_release(req);
             }
         }
         break;
 
-    case PTL_EVENT_GET:
-        MPIDI_CH3U_Request_complete((MPID_Request *)e->user_ptr);
-        break;
-
     default:
         MPIU_Error_printf("Received unexpected event type: %d %s", e->type, MPID_nem_ptl_strevent(e));
         MPIU_ERR_INTERNALANDJUMP(mpi_errno, "Unexpected event type");
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
index 85ef2f8..a6d4697 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
@@ -171,6 +171,7 @@ int MPID_nem_ptl_poll(int is_blocking_poll)
         case PTL_EVENT_PUT_OVERFLOW:
         case PTL_EVENT_GET:
         case PTL_EVENT_ACK:
+        case PTL_EVENT_SEND:
         case PTL_EVENT_REPLY:
         case PTL_EVENT_SEARCH: {
             MPID_Request * const req = event.user_ptr;
@@ -189,7 +190,6 @@ int MPID_nem_ptl_poll(int is_blocking_poll)
         case PTL_EVENT_AUTO_UNLINK:
             overflow_me_handle[(size_t)event.user_ptr] = PTL_INVALID_HANDLE;
             break;
-        case PTL_EVENT_SEND:
         case PTL_EVENT_LINK:
             /* ignore */
             break;
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
index 51c0016..022b541 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
@@ -66,6 +66,9 @@ static int handler_send_complete(const ptl_event_t *e)
 
     MPIDI_FUNC_ENTER(MPID_STATE_HANDLER_SEND_COMPLETE);
 
+    if (e->type == PTL_EVENT_SEND)  /* Ignore */
+        goto fn_exit;
+
     MPIU_Assert(e->type == PTL_EVENT_ACK || e->type == PTL_EVENT_GET);
 
     if (REQ_PTL(sreq)->md != PTL_INVALID_HANDLE) {
@@ -101,6 +104,9 @@ static int handler_large(const ptl_event_t *e)
 
     MPIDI_FUNC_ENTER(MPID_STATE_HANDLER_LARGE);
 
+    if (e->type == PTL_EVENT_SEND)  /* Ignore */
+        goto fn_exit;
+
     if (e->type != PTL_EVENT_ACK && e->type != PTL_EVENT_GET)
         MPIU_Error_printf("ACK event expected, received %s ni_fail=%s list=%s user_ptr=%p hdr_data=%#lx\n",
                           MPID_nem_ptl_strevent(e), MPID_nem_ptl_strnifail(e->ni_fail_type),
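
Note on the ptl_nm.c change above: the send side now completes a request
only after two independent conditions hold: the SEND event for the initial
put has arrived (put_acked) and every outstanding GET has been served
(num_gets == 0). Whichever event arrives last triggers completion. The
following is a minimal sketch of that pattern, not the MPICH code; the
struct, the function names and the completions counter are hypothetical
stand-ins for MPID_Request, the event handler cases and on_data_avail().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-request Portals state; only the two
 * fields named in the diff (num_gets, put_acked) are modelled. */
struct send_state {
    int  num_gets;     /* GET events still outstanding */
    bool put_acked;    /* set once the SEND event has been seen */
    int  completions;  /* illustrative: times the request was completed */
};

/* Stand-in for on_data_avail(): must run only when both conditions hold. */
static void complete_request(struct send_state *s)
{
    assert(s->put_acked && s->num_gets == 0);
    ++s->completions;
}

/* Mirrors the PTL_EVENT_GET case: the last GET completes the request,
 * but only if the SEND event has already arrived. */
static void on_get_event(struct send_state *s)
{
    if (--s->num_gets == 0 && s->put_acked)
        complete_request(s);    /* otherwise the SEND event will do it */
}

/* Mirrors the PTL_EVENT_SEND case: records the event and completes the
 * request only if no GETs are outstanding. */
static void on_send_event(struct send_state *s)
{
    s->put_acked = true;
    if (s->num_gets == 0)
        complete_request(s);    /* otherwise the last GET will do it */
}
```

With this shape the request is completed exactly once regardless of the
order in which the SEND and GET events are delivered.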

http://git.mpich.org/mpich.git/commitdiff/eee6659fc01d5900dfbc6cf4f9ca6799b424e388

commit eee6659fc01d5900dfbc6cf4f9ca6799b424e388
Author: Devendar Bureddy <devendar at mellanox.com>
Date:   Thu Nov 13 21:31:13 2014 +0200

    Fix hcoll setup issues

diff --git a/src/mpid/common/hcoll/hcoll_dtypes.h b/src/mpid/common/hcoll/hcoll_dtypes.h
index 65ef6e3..6c86ab6 100644
--- a/src/mpid/common/hcoll/hcoll_dtypes.h
+++ b/src/mpid/common/hcoll/hcoll_dtypes.h
@@ -5,6 +5,7 @@
 static dte_data_representation_t mpi_dtype_2_dte_dtype(MPI_Datatype datatype)
 {
     switch (datatype) {
+    case MPI_CHAR:
     case MPI_SIGNED_CHAR:
         return DTE_BYTE;
     case MPI_SHORT:
diff --git a/src/mpid/common/hcoll/hcoll_init.c b/src/mpid/common/hcoll/hcoll_init.c
index 2bae31d..613a832 100644
--- a/src/mpid/common/hcoll/hcoll_init.c
+++ b/src/mpid/common/hcoll/hcoll_init.c
@@ -1,6 +1,7 @@
 #include "hcoll.h"
 
 static int hcoll_initialized = 0;
+static int hcoll_comm_world_initialized = 0;
 int hcoll_enable = 1;
 int hcoll_enable_barrier = 1;
 int hcoll_enable_bcast = 1;
@@ -22,10 +23,6 @@ int hcoll_destroy(void *param ATTRIBUTE((unused)))
 {
     if (1 == hcoll_initialized) {
         hcoll_finalize();
-        if (MPI_KEYVAL_INVALID != hcoll_comm_attr_keyval) {
-            MPIR_Comm_free_keyval_impl(hcoll_comm_attr_keyval);
-            hcoll_comm_attr_keyval = MPI_KEYVAL_INVALID;
-        }
     }
     hcoll_initialized = 0;
     return 0;
@@ -130,6 +127,13 @@ int hcoll_comm_create(MPID_Comm * comm_ptr, void *param)
     if (0 == hcoll_enable) {
         goto fn_exit;
     }
+    if (MPIR_Process.comm_world == comm_ptr) {
+        hcoll_comm_world_initialized = 1;
+    }
+    if (!hcoll_comm_world_initialized) {
+        comm_ptr->hcoll_priv.is_hcoll_init = 0;
+        goto fn_exit;
+    }
     num_ranks = comm_ptr->local_size;
     if ((MPID_INTRACOMM != comm_ptr->comm_kind) || (2 > num_ranks)) {
         comm_ptr->hcoll_priv.is_hcoll_init = 0;
@@ -185,6 +189,14 @@ int hcoll_comm_destroy(MPID_Comm * comm_ptr, void *param)
         goto fn_exit;
     }
     mpi_errno = MPI_SUCCESS;
+
+    if (comm_ptr == MPIR_Process.comm_world) {
+        if (MPI_KEYVAL_INVALID != hcoll_comm_attr_keyval) {
+            MPIR_Comm_free_keyval_impl(hcoll_comm_attr_keyval);
+            hcoll_comm_attr_keyval = MPI_KEYVAL_INVALID;
+        }
+    }
+
     context_destroyed = 0;
     if ((NULL != comm_ptr) && (0 != comm_ptr->hcoll_priv.is_hcoll_init)) {
         if (NULL != comm_ptr->coll_fns) {
@@ -193,7 +205,6 @@ int hcoll_comm_destroy(MPID_Comm * comm_ptr, void *param)
         comm_ptr->coll_fns = comm_ptr->hcoll_priv.hcoll_origin_coll_fns;
         hcoll_destroy_context(comm_ptr->hcoll_priv.hcoll_context,
                               (rte_grp_handle_t) comm_ptr, &context_destroyed);
-        MPIU_Assert(context_destroyed);
         comm_ptr->hcoll_priv.is_hcoll_init = 0;
     }
   fn_exit:
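
Note on the hcoll_init.c change above: hcoll_comm_create() now skips any
communicator created before the world communicator has been seen. A
minimal sketch of that guard follows, assuming a reduced communicator
type; struct comm, world_seen and comm_create_hook() are hypothetical
names, and the real function performs further checks (communicator kind,
number of ranks) that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced communicator: only what the guard needs. */
struct comm {
    bool is_world;    /* true for the world communicator */
    bool hcoll_init;  /* stand-in for hcoll_priv.is_hcoll_init */
};

/* Stand-in for hcoll_comm_world_initialized. */
static bool world_seen = false;

/* Communicators created before the world communicator are left
 * uninitialized; everything from the world communicator on is set up. */
static void comm_create_hook(struct comm *c)
{
    assert(c != NULL);
    if (c->is_world)
        world_seen = true;
    if (!world_seen) {
        c->hcoll_init = false;
        return;
    }
    c->hcoll_init = true;   /* stand-in for the real context creation */
}
```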

http://git.mpich.org/mpich.git/commitdiff/1a8618f8ebdb0c20ec2fcdb4f895ee69a9500fd9

commit 1a8618f8ebdb0c20ec2fcdb4f895ee69a9500fd9
Author: Pavan Balaji <balaji at anl.gov>
Date:   Thu Nov 13 09:36:30 2014 -0600

    Fix rportals bug when ACK not requested
    
    The user pointer was set, but later overwritten with an internal value.
    
    Signed-off-by: Antonio Pena Monferrer <apenya at mcs.anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
index 4de135c..cbcc81c 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
@@ -1186,6 +1186,11 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
             else if (!(op->u.put.ack_req & PTL_ACK_REQ)) {
                 memcpy(event, op->u.put.send, sizeof(ptl_event_t));
                 MPIU_Free(op->u.put.send);
+
+                /* set the event user pointer again, since we copied
+                 * over the original event */
+                event->user_ptr = op->u.put.user_ptr;
+
                 /* we should be in the data op list */
                 MPL_DL_DELETE(op->target->data_op_list, op);
                 free_op(op);
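
Note on the rptl.c change above: copying a stored event over the caller's
event with memcpy() replaces every field, including the user pointer that
had been set just before. The sketch below reproduces the pattern and the
fix with a reduced, hypothetical event type (struct event and
deliver_stored_event() are illustrative names; the real ptl_event_t has
many more fields).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, reduced event type. */
struct event {
    int   type;
    void *user_ptr;
};

/* Copy a stored event into the caller's event, then restore the
 * caller-visible user pointer, which the copy replaced with the value
 * kept in the stored (internal) event. */
static void deliver_stored_event(struct event *out, const struct event *stored,
                                 void *user_ptr)
{
    memcpy(out, stored, sizeof(*out));
    out->user_ptr = user_ptr;   /* set again: memcpy overwrote it */
    assert(out->type == stored->type);
}
```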

http://git.mpich.org/mpich.git/commitdiff/e54deef81b8d55521b7ff9e4a05b8233a60ac125

commit e54deef81b8d55521b7ff9e4a05b8233a60ac125
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Wed Nov 12 12:26:36 2014 -0600

    portals4: better dtype mismatch detection
    
    The previous code only detected a datatype mismatch when the message
    was copied out of the unexpected queue. Now it reports the error
    both in that case and when the message arrives in a posted receive.
    
    We also set the error in the status object to match the default ch3
    behavior. This fixes an issue where the request was not freed, which
    caused extra debugging output at MPI_Finalize.
    
    Signed-off-by: Antonio Pena Monferrer <apenya at mcs.anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
index 0440972..ec6d90a 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
@@ -84,23 +84,24 @@ static int handler_recv_dequeue_complete(const ptl_event_t *e)
 {
     int mpi_errno = MPI_SUCCESS;
     MPID_Request *const rreq = e->user_ptr;
+    int is_contig;
+    MPI_Aint last;
+    MPI_Aint dt_true_lb;
+    MPIDI_msg_sz_t data_sz;
+    MPID_Datatype *dt_ptr ATTRIBUTE((unused));
+
     MPIDI_STATE_DECL(MPID_STATE_HANDLER_RECV_DEQUEUE_COMPLETE);
 
     MPIDI_FUNC_ENTER(MPID_STATE_HANDLER_RECV_DEQUEUE_COMPLETE);
 
     MPIU_Assert(e->type == PTL_EVENT_PUT || e->type == PTL_EVENT_PUT_OVERFLOW);
+
+    MPIDI_Datatype_get_info(rreq->dev.user_count, rreq->dev.datatype, is_contig, data_sz, dt_ptr, dt_true_lb);
     
     dequeue_req(e);
 
     if (e->type == PTL_EVENT_PUT_OVERFLOW) {
         /* unpack the data from unexpected buffer */
-        int is_contig;
-        MPI_Aint last;
-        MPI_Aint dt_true_lb;
-        MPIDI_msg_sz_t data_sz ATTRIBUTE((unused));
-        MPID_Datatype *dt_ptr ATTRIBUTE((unused));
-
-        MPIDI_Datatype_get_info(rreq->dev.user_count, rreq->dev.datatype, is_contig, data_sz, dt_ptr, dt_true_lb);
         MPIU_DBG_MSG_D(CH3_CHANNEL, VERBOSE, "is_contig = %d", is_contig);
 
         if (is_contig) {
@@ -108,8 +109,12 @@ static int handler_recv_dequeue_complete(const ptl_event_t *e)
         } else {
             last = e->mlength;
             MPID_Segment_unpack(rreq->dev.segment_ptr, rreq->dev.segment_first, &last, e->start);
-            MPIU_ERR_CHKANDJUMP(last != e->mlength, mpi_errno, MPI_ERR_OTHER, "**dtypemismatch");
+            if (last != e->mlength)
+                MPIU_ERR_SET(rreq->status.MPI_ERROR, MPI_ERR_TYPE, "**dtypemismatch");
         }
+    } else {
+        if (!is_contig && data_sz != e->mlength)
+            MPIU_ERR_SET(rreq->status.MPI_ERROR, MPI_ERR_TYPE, "**dtypemismatch");
     }
     
     mpi_errno = handler_recv_complete(e);
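
Note on the ptl_recv.c change above: instead of jumping to the failure
path, the handler now records the mismatch in the request's status and
lets normal completion continue. A minimal sketch of that check follows.
The error codes, struct recv_status and check_dtype_mismatch() are
hypothetical; expected_sz stands for either the byte count consumed by
the unpack (unexpected-queue path) or the receive's data size (direct
path).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes standing in for MPI_SUCCESS / MPI_ERR_TYPE. */
enum { ERR_NONE = 0, ERR_TYPE = 1 };

/* Hypothetical, reduced status object. */
struct recv_status {
    int error;
};

/* Record a datatype mismatch in the status without aborting the handler.
 * As in the diff, only non-contiguous receives are checked. */
static void check_dtype_mismatch(struct recv_status *st, int is_contig,
                                 size_t expected_sz, size_t mlength)
{
    assert(st != NULL);
    if (!is_contig && expected_sz != mlength)
        st->error = ERR_TYPE;
}
```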

http://git.mpich.org/mpich.git/commitdiff/ad1465979e73837bab2fe8cdabbebf4fd0494497

commit ad1465979e73837bab2fe8cdabbebf4fd0494497
Author: Antonio Pena Monferrer <apenya at mcs.anl.gov>
Date:   Wed Nov 12 19:55:23 2014 -0600

    Code cleanup to fix compiler warnings in rportals
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
index af67c47..4de135c 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
@@ -128,7 +128,6 @@ int MPID_nem_ptl_rptl_init(int world_size, uint64_t max_origin_events,
                                                    ptl_pt_index_t * target_data_pt,
                                                    ptl_pt_index_t * target_control_pt))
 {
-    int mpi_errno = MPI_SUCCESS;
     int ret = PTL_OK;
     MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_RPTL_INIT);
 
@@ -393,7 +392,7 @@ static int alloc_op(struct rptl_op **op, struct rptl_target *target)
 #define FUNCNAME free_op
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
-void free_op(struct rptl_op *op)
+static void free_op(struct rptl_op *op)
 {
     MPIDI_STATE_DECL(MPID_STATE_FREE_OP);
 
@@ -414,7 +413,7 @@ static int rptl_put(ptl_handle_md_t md_handle, ptl_size_t local_offset, ptl_size
 #define FUNCNAME poke_progress
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
-int poke_progress(void)
+static int poke_progress(void)
 {
     int ret = PTL_OK;
     struct rptl_target *target;
@@ -965,13 +964,11 @@ static int retrieve_event(ptl_event_t * event)
 #define FCNAME MPIU_QUOTE(FUNCNAME)
 int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
 {
-    struct rptl_op *op;
-    struct rptl *rptl;
-    ptl_event_t e;
+    struct rptl_op *op = NULL;
+    struct rptl *rptl = NULL;
     int ret = PTL_OK, tmp_ret = PTL_OK;
     int mpi_errno = MPI_SUCCESS;
     struct rptl_target *target;
-    MPIU_CHKPMEM_DECL(1);
     MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_RPTL_EQGET);
 
     MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_RPTL_EQGET);
@@ -1208,13 +1205,11 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
     }
 
   fn_exit:
-    MPIU_CHKPMEM_COMMIT();
     MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_RPTL_EQGET);
     return ret;
 
   fn_fail:
     if (mpi_errno)
         ret = PTL_FAIL;
-    MPIU_CHKPMEM_REAP();
     goto fn_exit;
 }

http://git.mpich.org/mpich.git/commitdiff/bfd7f00ecef9ef8c4cf3b53638de48334266621b

commit bfd7f00ecef9ef8c4cf3b53638de48334266621b
Author: Pavan Balaji <balaji at anl.gov>
Date:   Wed Nov 12 13:51:16 2014 -0600

    Change "flow_control" to data/control portal type.
    
    The terminology "flow_control" was a bit of a misnomer since we do
    more than just enable/disable flow control based on whether messages
    are on the data or control portal.
    
    Signed-off-by: Antonio Pena Monferrer <apenya at mcs.anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
index e6b4de6..af67c47 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
@@ -408,7 +408,7 @@ void free_op(struct rptl_op *op)
 static int rptl_put(ptl_handle_md_t md_handle, ptl_size_t local_offset, ptl_size_t length,
                     ptl_ack_req_t ack_req, ptl_process_t target_id, ptl_pt_index_t pt_index,
                     ptl_match_bits_t match_bits, ptl_size_t remote_offset, void *user_ptr,
-                    ptl_hdr_data_t hdr_data, int flow_control);
+                    ptl_hdr_data_t hdr_data, enum rptl_pt_type pt_type);
 
 #undef FUNCNAME
 #define FUNCNAME poke_progress
@@ -458,9 +458,8 @@ int poke_progress(void)
                 /* make sure the user setup a control portal */
                 assert(control_pt != PTL_PT_ANY);
 
-                /* disable flow control for control messages */
                 ret = rptl_put(rptl->md, 0, 0, PTL_NO_ACK_REQ, id, control_pt,
-                               0, 0, NULL, RPTL_CONTROL_MSG_UNPAUSE, 0);
+                               0, 0, NULL, RPTL_CONTROL_MSG_UNPAUSE, RPTL_PT_CONTROL);
                 RPTLU_ERR_POP(ret, "Error sending unpause message\n");
             }
         }
@@ -494,9 +493,8 @@ int poke_progress(void)
 
             target->state = RPTL_TARGET_STATE_PAUSE_ACKED;
 
-            /* disable flow control for control messages */
             ret = rptl_put(target->rptl->md, 0, 0, PTL_NO_ACK_REQ, id, control_pt, 0,
-                                        0, NULL, RPTL_CONTROL_MSG_PAUSE_ACK, 0);
+                           0, NULL, RPTL_CONTROL_MSG_PAUSE_ACK, RPTL_PT_CONTROL);
             RPTLU_ERR_POP(ret, "Error sending pause ack message\n");
 
             continue;
@@ -611,7 +609,7 @@ int poke_progress(void)
 static int rptl_put(ptl_handle_md_t md_handle, ptl_size_t local_offset, ptl_size_t length,
                     ptl_ack_req_t ack_req, ptl_process_t target_id, ptl_pt_index_t pt_index,
                     ptl_match_bits_t match_bits, ptl_size_t remote_offset, void *user_ptr,
-                    ptl_hdr_data_t hdr_data, int flow_control)
+                    ptl_hdr_data_t hdr_data, enum rptl_pt_type pt_type)
 {
     struct rptl_op *op;
     int ret = PTL_OK;
@@ -644,11 +642,11 @@ static int rptl_put(ptl_handle_md_t md_handle, ptl_size_t local_offset, ptl_size
     /* place to store the send and ack events */
     op->u.put.send = NULL;
     op->u.put.ack = NULL;
-    op->u.put.flow_control = flow_control;
+    op->u.put.pt_type = pt_type;
     op->events_ready = 0;
     op->target = target;
 
-    if (op->u.put.flow_control)
+    if (op->u.put.pt_type == RPTL_PT_DATA)
         MPL_DL_APPEND(target->data_op_list, op);
     else
         MPL_DL_APPEND(target->control_op_list, op);
@@ -675,7 +673,7 @@ int MPID_nem_ptl_rptl_put(ptl_handle_md_t md_handle, ptl_size_t local_offset, pt
                           ptl_hdr_data_t hdr_data)
 {
     return rptl_put(md_handle, local_offset, length, ack_req, target_id, pt_index, match_bits,
-                    remote_offset, user_ptr, hdr_data, 1);
+                    remote_offset, user_ptr, hdr_data, RPTL_PT_DATA);
 }
 
 
@@ -761,9 +759,8 @@ static int send_pause_messages(struct rptl *rptl)
         /* make sure the user setup a control portal */
         assert(control_pt != PTL_PT_ANY);
 
-        /* disable flow control for control messages */
         ret = rptl_put(rptl->md, 0, 0, PTL_NO_ACK_REQ, id, control_pt, 0, 0,
-                                    NULL, RPTL_CONTROL_MSG_PAUSE, 0);
+                                    NULL, RPTL_CONTROL_MSG_PAUSE, RPTL_PT_CONTROL);
         RPTLU_ERR_POP(ret, "Error sending pause message\n");
     }
 
@@ -1099,7 +1096,7 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
 
             /* we should not get NACKs on the control portal */
             if (event->type == PTL_EVENT_ACK)
-                assert(op->u.put.flow_control);
+                assert(op->u.put.pt_type == RPTL_PT_DATA);
 
             op->state = RPTL_OP_STATE_NACKED;
 
@@ -1159,8 +1156,9 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
             op->events_ready = 1;
             event->user_ptr = op->u.put.user_ptr;
 
-            /* if flow control is not set, ignore the ACK event */
-            if (op->u.put.flow_control == 0) {
+            /* if the message is over the control portal, ignore the
+             * ACK event */
+            if (op->u.put.pt_type == RPTL_PT_CONTROL) {
                 MPIU_Free(op->u.put.ack);
                 MPL_DL_DELETE(op->target->control_op_list, op);
                 free_op(op);
@@ -1177,8 +1175,9 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
             op->events_ready = 1;
             event->user_ptr = op->u.put.user_ptr;
 
-            /* if flow control is not set, ignore ACK event */
-            if (op->u.put.flow_control == 0) {
+            /* if the message is over the control portal, ignore ACK
+             * event */
+            if (op->u.put.pt_type == RPTL_PT_CONTROL) {
                 MPIU_Free(op->u.put.send);
                 MPL_DL_DELETE(op->target->control_op_list, op);
                 free_op(op);
@@ -1190,7 +1189,7 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
             else if (!(op->u.put.ack_req & PTL_ACK_REQ)) {
                 memcpy(event, op->u.put.send, sizeof(ptl_event_t));
                 MPIU_Free(op->u.put.send);
-                /* flow control is set, we should be in the data op list */
+                /* we should be in the data op list */
                 MPL_DL_DELETE(op->target->data_op_list, op);
                 free_op(op);
             }
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.h b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.h
index 08e95e7..7ce31d9 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.h
@@ -30,6 +30,11 @@
         }                                                               \
     }
 
+enum rptl_pt_type {
+    RPTL_PT_DATA,
+    RPTL_PT_CONTROL
+};
+
 struct rptl_target;
 struct rptl_op {
     enum {
@@ -59,7 +64,7 @@ struct rptl_op {
             /* internal variables store events */
             ptl_event_t *send;
             ptl_event_t *ack;
-            int flow_control;
+            enum rptl_pt_type pt_type;
         } put;
         struct {
             ptl_handle_md_t md_handle;
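
Note on the rptl.c/rptl.h change above: the int flag passed as 0 or 1 is
replaced by an enum that names the portal an operation travels on, so
call sites read RPTL_PT_CONTROL or RPTL_PT_DATA instead of a bare number.
The sketch below shows the same idea on a reduced example; enum pt_type,
struct op_lists and enqueue_op() are hypothetical, and the counters stand
in for the data and control operation lists.

```c
#include <assert.h>

/* Mirrors RPTL_PT_DATA / RPTL_PT_CONTROL from the diff. */
enum pt_type {
    PT_DATA,
    PT_CONTROL
};

/* Hypothetical stand-in for the per-target operation lists. */
struct op_lists {
    int data_ops;
    int control_ops;
};

/* Queue an operation on the list that matches its portal type. */
static void enqueue_op(struct op_lists *lists, enum pt_type type)
{
    assert(type == PT_DATA || type == PT_CONTROL);
    if (type == PT_DATA)
        ++lists->data_ops;
    else
        ++lists->control_ops;
}
```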

http://git.mpich.org/mpich.git/commitdiff/95cc2b16b2ad4ceed9164cb56aebfeb23638382f

commit 95cc2b16b2ad4ceed9164cb56aebfeb23638382f
Author: Pavan Balaji <balaji at anl.gov>
Date:   Wed Nov 12 13:45:36 2014 -0600

    Do not expose the flowcontrol parameter to the user.
    
    Signed-off-by: Antonio Pena Monferrer <apenya at mcs.anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
index 996f4be..481b88b 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
@@ -245,7 +245,7 @@ static inline int send_pkt(MPIDI_VC_t *vc, void *hdr_p, void *data_p, MPIDI_msg_
         goto fn_fail;
 
     ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)sendbuf, sendbuf_sz, PTL_NO_ACK_REQ,
-                                vc_ptl->id, vc_ptl->ptc, CTL_TAG, 0, sreq, MPIDI_Process.my_pg_rank, 1);
+                                vc_ptl->id, vc_ptl->ptc, CTL_TAG, 0, sreq, MPIDI_Process.my_pg_rank);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s",
                          MPID_nem_ptl_strerror(ret));
     MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "PtlPut(size=%lu id=(%#x,%#x) pt=%#x)",
@@ -308,7 +308,7 @@ static int send_noncontig_pkt(MPIDI_VC_t *vc, MPID_Request *sreq, void *hdr_p)
         goto fn_fail;
 
     ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)sendbuf, sendbuf_sz, PTL_NO_ACK_REQ,
-                                vc_ptl->id, vc_ptl->ptc, CTL_TAG, 0, sreq, MPIDI_Process.my_pg_rank, 1);
+                                vc_ptl->id, vc_ptl->ptc, CTL_TAG, 0, sreq, MPIDI_Process.my_pg_rank);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s",
                          MPID_nem_ptl_strerror(ret));
     MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "PtlPut(size=%lu id=(%#x,%#x) pt=%#x)",
@@ -461,7 +461,7 @@ int MPID_nem_ptl_nm_ctl_event_handler(const ptl_event_t *e)
                     MPIU_ERR_POP(mpi_errno);
                 /* Notify we're done */
                 ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, 0, 0, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->ptc,
-                                            DONE_TAG(recvbufs[buf_idx].tag), 0, done_req, MPIDI_Process.my_pg_rank, 1);
+                                            DONE_TAG(recvbufs[buf_idx].tag), 0, done_req, MPIDI_Process.my_pg_rank);
                 MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s",
                                      MPID_nem_ptl_strerror(ret));
                 MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST,
@@ -531,7 +531,7 @@ int MPID_nem_ptl_nm_ctl_event_handler(const ptl_event_t *e)
 
                 /* Notify we're done */
                 ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, 0, 0, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->ptc,
-                                            DONE_TAG(rreq->dev.match.parts.tag), 0, done_req, MPIDI_Process.my_pg_rank, 1);
+                                            DONE_TAG(rreq->dev.match.parts.tag), 0, done_req, MPIDI_Process.my_pg_rank);
                 MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s",
                                      MPID_nem_ptl_strerror(ret));
                 MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST,
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
index a6f7d4b..51c0016 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
@@ -267,7 +267,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
             MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "&REQ_PTL(sreq)->event_handler = %p", &(REQ_PTL(sreq)->event_handler));
             ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)((char *)buf + dt_true_lb), data_sz, PTL_ACK_REQ, vc_ptl->id, vc_ptl->pt,
                          NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
-                                        NPTL_HEADER(ssend_flag, data_sz), 1);
+                                        NPTL_HEADER(ssend_flag, data_sz));
             MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
             DBG_MSG_PUT("global", data_sz, vc->pg_rank, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), NPTL_HEADER(ssend_flag, data_sz));
             MPIU_DBG_MSG_D(CH3_CHANNEL, VERBOSE, "id.nid = %#x", vc_ptl->id.phys.nid);
@@ -304,7 +304,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
             REQ_PTL(sreq)->event_handler = handler_send_complete;
             ret = MPID_nem_ptl_rptl_put(REQ_PTL(sreq)->md, 0, data_sz, PTL_ACK_REQ, vc_ptl->id, vc_ptl->pt,
                          NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
-                                        NPTL_HEADER(ssend_flag, data_sz), 1);
+                                        NPTL_HEADER(ssend_flag, data_sz));
             MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
             DBG_MSG_PUT("sreq", data_sz, vc->pg_rank, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), NPTL_HEADER(ssend_flag, data_sz));
             goto fn_exit;
@@ -321,7 +321,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
         REQ_PTL(sreq)->event_handler = handler_send_complete;
         ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(sreq)->chunk_buffer[0], data_sz, PTL_ACK_REQ,
                      vc_ptl->id, vc_ptl->pt, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
-                                    NPTL_HEADER(ssend_flag, data_sz), 1);
+                                    NPTL_HEADER(ssend_flag, data_sz));
         MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
         DBG_MSG_PUT("global", data_sz, vc->pg_rank, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), NPTL_HEADER(ssend_flag, data_sz));
         goto fn_exit;
@@ -338,7 +338,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
         REQ_PTL(sreq)->event_handler = handler_large;
         ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)((char *)buf + dt_true_lb), PTL_LARGE_THRESHOLD, PTL_ACK_REQ, vc_ptl->id, vc_ptl->pt,
                      NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
-                                    NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz), 1);
+                                    NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz));
         MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
         DBG_MSG_PUT("global", PTL_LARGE_THRESHOLD, vc->pg_rank, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz));
         goto fn_exit;
@@ -408,7 +408,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
                 REQ_PTL(sreq)->event_handler = handler_large;
                 ret = MPID_nem_ptl_rptl_put(REQ_PTL(sreq)->md, 0, PTL_LARGE_THRESHOLD, PTL_ACK_REQ, vc_ptl->id, vc_ptl->pt,
                              NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
-                                            NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz), 1);
+                                            NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz));
                 MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
                 DBG_MSG_PUT("req", PTL_LARGE_THRESHOLD, vc->pg_rank, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz));
                 goto fn_exit;
@@ -432,7 +432,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
     REQ_PTL(sreq)->event_handler = handler_large;
     ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(sreq)->chunk_buffer[0], PTL_LARGE_THRESHOLD,
                                 PTL_ACK_REQ, vc_ptl->id, vc_ptl->pt, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank),
-                                0, sreq, NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz), 1);
+                                0, sreq, NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz));
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
     DBG_MSG_PUT("global", PTL_LARGE_THRESHOLD, vc->pg_rank, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz));
     
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
index 210bfdc..e6b4de6 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
@@ -405,6 +405,11 @@ void free_op(struct rptl_op *op)
 }
 
 
+static int rptl_put(ptl_handle_md_t md_handle, ptl_size_t local_offset, ptl_size_t length,
+                    ptl_ack_req_t ack_req, ptl_process_t target_id, ptl_pt_index_t pt_index,
+                    ptl_match_bits_t match_bits, ptl_size_t remote_offset, void *user_ptr,
+                    ptl_hdr_data_t hdr_data, int flow_control);
+
 #undef FUNCNAME
 #define FUNCNAME poke_progress
 #undef FCNAME
@@ -454,8 +459,8 @@ int poke_progress(void)
                 assert(control_pt != PTL_PT_ANY);
 
                 /* disable flow control for control messages */
-                ret = MPID_nem_ptl_rptl_put(rptl->md, 0, 0, PTL_NO_ACK_REQ, id, control_pt,
-                                            0, 0, NULL, RPTL_CONTROL_MSG_UNPAUSE, 0);
+                ret = rptl_put(rptl->md, 0, 0, PTL_NO_ACK_REQ, id, control_pt,
+                               0, 0, NULL, RPTL_CONTROL_MSG_UNPAUSE, 0);
                 RPTLU_ERR_POP(ret, "Error sending unpause message\n");
             }
         }
@@ -490,7 +495,7 @@ int poke_progress(void)
             target->state = RPTL_TARGET_STATE_PAUSE_ACKED;
 
             /* disable flow control for control messages */
-            ret = MPID_nem_ptl_rptl_put(target->rptl->md, 0, 0, PTL_NO_ACK_REQ, id, control_pt, 0,
+            ret = rptl_put(target->rptl->md, 0, 0, PTL_NO_ACK_REQ, id, control_pt, 0,
                                         0, NULL, RPTL_CONTROL_MSG_PAUSE_ACK, 0);
             RPTLU_ERR_POP(ret, "Error sending pause ack message\n");
 
@@ -600,20 +605,20 @@ int poke_progress(void)
 
 
 #undef FUNCNAME
-#define FUNCNAME MPID_nem_ptl_rptl_put
+#define FUNCNAME rptl_put
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
-int MPID_nem_ptl_rptl_put(ptl_handle_md_t md_handle, ptl_size_t local_offset, ptl_size_t length,
-                          ptl_ack_req_t ack_req, ptl_process_t target_id, ptl_pt_index_t pt_index,
-                          ptl_match_bits_t match_bits, ptl_size_t remote_offset, void *user_ptr,
-                          ptl_hdr_data_t hdr_data, int flow_control)
+static int rptl_put(ptl_handle_md_t md_handle, ptl_size_t local_offset, ptl_size_t length,
+                    ptl_ack_req_t ack_req, ptl_process_t target_id, ptl_pt_index_t pt_index,
+                    ptl_match_bits_t match_bits, ptl_size_t remote_offset, void *user_ptr,
+                    ptl_hdr_data_t hdr_data, int flow_control)
 {
     struct rptl_op *op;
     int ret = PTL_OK;
     struct rptl_target *target;
-    MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_RPTL_PUT);
+    MPIDI_STATE_DECL(MPID_STATE_RPTL_PUT);
 
-    MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_RPTL_PUT);
+    MPIDI_FUNC_ENTER(MPID_STATE_RPTL_PUT);
 
     ret = find_target(target_id, &target);
     RPTLU_ERR_POP(ret, "error finding target structure\n");
@@ -652,7 +657,7 @@ int MPID_nem_ptl_rptl_put(ptl_handle_md_t md_handle, ptl_size_t local_offset, pt
     RPTLU_ERR_POP(ret, "Error from poke_progress\n");
 
   fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_RPTL_PUT);
+    MPIDI_FUNC_EXIT(MPID_STATE_RPTL_PUT);
     return ret;
 
   fn_fail:
@@ -661,6 +666,20 @@ int MPID_nem_ptl_rptl_put(ptl_handle_md_t md_handle, ptl_size_t local_offset, pt
 
 
 #undef FUNCNAME
+#define FUNCNAME rptl_put
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int MPID_nem_ptl_rptl_put(ptl_handle_md_t md_handle, ptl_size_t local_offset, ptl_size_t length,
+                          ptl_ack_req_t ack_req, ptl_process_t target_id, ptl_pt_index_t pt_index,
+                          ptl_match_bits_t match_bits, ptl_size_t remote_offset, void *user_ptr,
+                          ptl_hdr_data_t hdr_data)
+{
+    return rptl_put(md_handle, local_offset, length, ack_req, target_id, pt_index, match_bits,
+                    remote_offset, user_ptr, hdr_data, 1);
+}
+
+
+#undef FUNCNAME
 #define FUNCNAME MPID_nem_ptl_rptl_get
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
@@ -743,7 +762,7 @@ static int send_pause_messages(struct rptl *rptl)
         assert(control_pt != PTL_PT_ANY);
 
         /* disable flow control for control messages */
-        ret = MPID_nem_ptl_rptl_put(rptl->md, 0, 0, PTL_NO_ACK_REQ, id, control_pt, 0, 0,
+        ret = rptl_put(rptl->md, 0, 0, PTL_NO_ACK_REQ, id, control_pt, 0, 0,
                                     NULL, RPTL_CONTROL_MSG_PAUSE, 0);
         RPTLU_ERR_POP(ret, "Error sending pause message\n");
     }
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.h b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.h
index c5f1254..08e95e7 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.h
@@ -171,7 +171,7 @@ int MPID_nem_ptl_rptl_ptfini(ptl_pt_index_t pt_index);
 int MPID_nem_ptl_rptl_put(ptl_handle_md_t md_handle, ptl_size_t local_offset, ptl_size_t length,
                           ptl_ack_req_t ack_req, ptl_process_t target_id, ptl_pt_index_t pt_index,
                           ptl_match_bits_t match_bits, ptl_size_t remote_offset, void *user_ptr,
-                          ptl_hdr_data_t hdr_data, int flow_control);
+                          ptl_hdr_data_t hdr_data);
 
 int MPID_nem_ptl_rptl_get(ptl_handle_md_t md_handle, ptl_size_t local_offset, ptl_size_t length,
                           ptl_process_t target_id, ptl_pt_index_t pt_index,
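
The diff above drops the flow_control argument from the public MPID_nem_ptl_rptl_put and routes the pause/unpause control messages through a file-local rptl_put that still accepts it. A minimal sketch of that wrapper pattern follows. All demo_* names are hypothetical and the body is reduced to recording the flag, so this only illustrates the calling convention, not the real rptl code.

```c
#include <assert.h>

/* Hypothetical: remembers the flag the internal routine last received. */
static int demo_last_flow_control = -1;

/* File-local worker: the only place that still sees the flow_control flag. */
static int demo_put_internal(int length, int flow_control)
{
    assert(length >= 0);
    demo_last_flow_control = flow_control;
    return 0;
}

/* Public entry point: callers can no longer opt out of flow control. */
int demo_put(int length)
{
    return demo_put_internal(length, 1);
}

/* Internal control traffic bypasses flow control by calling the worker. */
int demo_send_control(void)
{
    return demo_put_internal(0, 0);
}
```

With this split, code outside the file cannot pass 0 by mistake, which appears to be the intent of the change above.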

http://git.mpich.org/mpich.git/commitdiff/2bd62c3c871fa968f5e67261dcc660e95f26a2ed

commit 2bd62c3c871fa968f5e67261dcc660e95f26a2ed
Author: Pavan Balaji <balaji at anl.gov>
Date:   Tue Nov 4 22:03:36 2014 -0600

    Several updates to the rportals code.
    
    We now use a target structure for each target ID that we want to send
    data to.  This allows us to separate out target-specific states and
    more cleanly manage operations to a single target.
    
    Signed-off-by: Antonio Pena Monferrer <apenya at mcs.anl.gov>
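
As a rough illustration of the per-target structure described in the commit message, the sketch below keeps one record per destination ID, created on first use, with its own state and pending-operation queue. The demo_* types and functions are hypothetical stand-ins, not the rptl definitions. The real code uses ptl_process_t and the MPL_DL_* list macros, and also keeps a separate control queue and an op pool per target.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for ptl_process_t (physical node id + process id). */
typedef struct {
    uint32_t nid;
    uint32_t pid;
} demo_id_t;

enum demo_target_state { DEMO_TARGET_ACTIVE, DEMO_TARGET_PAUSED };

struct demo_op {
    struct demo_op *next;
};

/* One record per destination: its state and its own pending-op queue. */
struct demo_target {
    demo_id_t id;
    enum demo_target_state state;
    struct demo_op *data_ops;
    struct demo_target *next;
};

static struct demo_target *demo_target_list = NULL;

/* Return the record for id, creating it on first use; NULL if malloc fails. */
static struct demo_target *demo_find_target(demo_id_t id)
{
    struct demo_target *t;

    for (t = demo_target_list; t; t = t->next)
        if (t->id.nid == id.nid && t->id.pid == id.pid)
            return t;

    t = malloc(sizeof(*t));
    if (t == NULL)
        return NULL;
    t->id = id;
    t->state = DEMO_TARGET_ACTIVE;
    t->data_ops = NULL;
    t->next = demo_target_list;   /* sketch prepends; ordering of targets is irrelevant here */
    demo_target_list = t;
    return t;
}

/* Queue op on the destination's own list, appending to preserve issue order. */
static int demo_enqueue(demo_id_t id, struct demo_op *op)
{
    struct demo_target *t = demo_find_target(id);
    struct demo_op **tail;

    if (t == NULL)
        return -1;
    op->next = NULL;
    for (tail = &t->data_ops; *tail; tail = &(*tail)->next)
        ;
    *tail = op;
    return 0;
}

/* Count the ops still queued for one destination. */
static int demo_pending(demo_id_t id)
{
    struct demo_target *t = demo_find_target(id);
    struct demo_op *op;
    int n = 0;

    assert(t);
    for (op = t->data_ops; op; op = op->next)
        n++;
    return n;
}
```

Because each destination owns its queue, pausing or draining one target only requires walking that target's list rather than filtering a single global list.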

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
index 60e8db8..996f4be 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
@@ -461,7 +461,7 @@ int MPID_nem_ptl_nm_ctl_event_handler(const ptl_event_t *e)
                     MPIU_ERR_POP(mpi_errno);
                 /* Notify we're done */
                 ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, 0, 0, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->ptc,
-                                            DONE_TAG(recvbufs[buf_idx].tag), 0, done_req, MPIDI_Process.my_pg_rank, 0);
+                                            DONE_TAG(recvbufs[buf_idx].tag), 0, done_req, MPIDI_Process.my_pg_rank, 1);
                 MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s",
                                      MPID_nem_ptl_strerror(ret));
                 MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST,
@@ -531,7 +531,7 @@ int MPID_nem_ptl_nm_ctl_event_handler(const ptl_event_t *e)
 
                 /* Notify we're done */
                 ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, 0, 0, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->ptc,
-                                            DONE_TAG(rreq->dev.match.parts.tag), 0, done_req, MPIDI_Process.my_pg_rank, 0);
+                                            DONE_TAG(rreq->dev.match.parts.tag), 0, done_req, MPIDI_Process.my_pg_rank, 1);
                 MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s",
                                      MPID_nem_ptl_strerror(ret));
                 MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST,
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
index da126e4..210bfdc 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
@@ -60,39 +60,9 @@
 #define IDS_ARE_EQUAL(t1, t2) \
     (t1.phys.nid == t2.phys.nid && t1.phys.pid == t2.phys.pid)
 
-#define RPTL_OP_POOL_SEGMENT_COUNT  (1024)
-
 static struct {
     struct rptl *rptl_list;
-
-    struct rptl_op_pool_segment {
-        struct rptl_op op[RPTL_OP_POOL_SEGMENT_COUNT];
-        struct rptl_op_pool_segment *next;
-        struct rptl_op_pool_segment *prev;
-    } *op_segment_list;
-    struct rptl_op *op_pool;
-
-    struct rptl_op *op_list;
-
-    /* targets that we do not send messages to either because they
-     * sent a PAUSE message or because we received a NACK from them */
-    struct rptl_paused_target {
-        ptl_process_t id;
-        enum rptl_paused_target_state {
-            RPTL_TARGET_STATE_FLOWCONTROL,
-            RPTL_TARGET_STATE_DISABLED,
-            RPTL_TARGET_STATE_RECEIVED_PAUSE,
-            RPTL_TARGET_STATE_PAUSE_ACKED
-        } state;
-
-        /* the rptl on which the pause message came in, since we need
-         * to use it to send the pause ack to the right target
-         * portal */
-        struct rptl *rptl;
-
-        struct rptl_paused_target *next;
-        struct rptl_paused_target *prev;
-    } *paused_target_list;
+    struct rptl_target *target_list;
 
     int world_size;
     uint64_t origin_events_left;
@@ -102,82 +72,42 @@ static struct {
 
 
 #undef FUNCNAME
-#define FUNCNAME alloc_target
+#define FUNCNAME find_target
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
-static int alloc_target(ptl_process_t id, enum rptl_paused_target_state state, struct rptl *rptl)
+static int find_target(ptl_process_t id, struct rptl_target **target)
 {
     int mpi_errno = MPI_SUCCESS;
     int ret = PTL_OK;
-    struct rptl_paused_target *target;
+    struct rptl_target *t;
     MPIU_CHKPMEM_DECL(1);
-    MPIDI_STATE_DECL(MPID_STATE_ALLOC_TARGET);
+    MPIDI_STATE_DECL(MPID_STATE_FIND_TARGET);
 
-    MPIDI_FUNC_ENTER(MPID_STATE_ALLOC_TARGET);
+    MPIDI_FUNC_ENTER(MPID_STATE_FIND_TARGET);
 
-    for (target = rptl_info.paused_target_list; target; target = target->next)
-        if (IDS_ARE_EQUAL(target->id, id))
+    for (t = rptl_info.target_list; t; t = t->next)
+        if (IDS_ARE_EQUAL(t->id, id))
             break;
 
-    /* if a paused target does not already exist, create one */
-    if (target == NULL) {
-        /* create a new paused target */
-        MPIU_CHKPMEM_MALLOC(target, struct rptl_paused_target *, sizeof(struct rptl_paused_target),
-                            mpi_errno, "rptl paused target");
-        MPL_DL_APPEND(rptl_info.paused_target_list, target);
-
-        target->id = id;
-        target->state = state;
-        target->rptl = rptl;
-    }
-    else if (target->state < state) {
-        target->state = state;
-        target->rptl = rptl;
-    }
-    else {
-        /* target already exists and is in a higher state than the
-         * state we are trying to set.  e.g., this is possible if we
-         * got a PAUSE event from a different portal and acked. */
+    /* if the target does not already exist, create one */
+    if (t == NULL) {
+        MPIU_CHKPMEM_MALLOC(t, struct rptl_target *, sizeof(struct rptl_target), mpi_errno, "rptl target");
+        MPL_DL_APPEND(rptl_info.target_list, t);
+
+        t->id = id;
+        t->state = RPTL_TARGET_STATE_ACTIVE;
+        t->rptl = NULL;
+        t->op_segment_list = NULL;
+        t->op_pool = NULL;
+        t->data_op_list = NULL;
+        t->control_op_list = NULL;
     }
 
-  fn_exit:
-    MPIU_CHKPMEM_COMMIT();
-    MPIDI_FUNC_EXIT(MPID_STATE_ALLOC_TARGET);
-    return ret;
-
-  fn_fail:
-    if (mpi_errno)
-        ret = PTL_FAIL;
-    MPIU_CHKPMEM_REAP();
-    goto fn_exit;
-}
-
-
-#undef FUNCNAME
-#define FUNCNAME alloc_op_segment
-#undef FCNAME
-#define FCNAME MPIU_QUOTE(FUNCNAME)
-static int alloc_op_segment(void)
-{
-    struct rptl_op_pool_segment *op_segment;
-    int mpi_errno = MPI_SUCCESS;
-    int i;
-    int ret = PTL_OK;
-    MPIU_CHKPMEM_DECL(1);
-    MPIDI_STATE_DECL(MPID_STATE_ALLOC_OP_SEGMENT);
-
-    MPIDI_FUNC_ENTER(MPID_STATE_ALLOC_OP_SEGMENT);
-
-    MPIU_CHKPMEM_MALLOC(op_segment, struct rptl_op_pool_segment *, sizeof(struct rptl_op_pool_segment),
-                        mpi_errno, "op pool segment");
-    MPL_DL_APPEND(rptl_info.op_segment_list, op_segment);
-
-    for (i = 0; i < RPTL_OP_POOL_SEGMENT_COUNT; i++)
-        MPL_DL_APPEND(rptl_info.op_pool, &op_segment->op[i]);
+    *target = t;
 
   fn_exit:
     MPIU_CHKPMEM_COMMIT();
-    MPIDI_FUNC_EXIT(MPID_STATE_ALLOC_OP_SEGMENT);
+    MPIDI_FUNC_EXIT(MPID_STATE_FIND_TARGET);
     return ret;
 
   fn_fail:
@@ -205,14 +135,8 @@ int MPID_nem_ptl_rptl_init(int world_size, uint64_t max_origin_events,
     MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_RPTL_INIT);
 
     rptl_info.rptl_list = NULL;
+    rptl_info.target_list = NULL;
 
-    rptl_info.op_pool = NULL;
-    ret = alloc_op_segment();
-    RPTLU_ERR_POP(ret, "error allocating op segment\n");
-
-    rptl_info.op_list = NULL;
-
-    rptl_info.paused_target_list = NULL;
     rptl_info.world_size = world_size;
     rptl_info.origin_events_left = max_origin_events;
     rptl_info.get_target_info = get_target_info;
@@ -236,24 +160,36 @@ int MPID_nem_ptl_rptl_drain_eq(int eq_count, ptl_handle_eq_t *eq)
     ptl_event_t event;
     struct rptl_op_pool_segment *op_segment;
     int i;
+    struct rptl_target *target, *t;
     MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_RPTL_FINALIZE);
 
     MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_RPTL_FINALIZE);
 
-    while (rptl_info.op_list) {
-        for (i = 0; i < eq_count; i++) {
-            /* read and ignore all events */
-            ret = MPID_nem_ptl_rptl_eqget(eq[i], &event);
-            if (ret == PTL_EQ_EMPTY)
-                ret = PTL_OK;
-            RPTLU_ERR_POP(ret, "Error calling MPID_nem_ptl_rptl_eqget\n");
+    for (target = rptl_info.target_list; target; target = target->next) {
+        while (target->control_op_list || target->data_op_list) {
+            for (i = 0; i < eq_count; i++) {
+                /* read and ignore all events */
+                ret = MPID_nem_ptl_rptl_eqget(eq[i], &event);
+                if (ret == PTL_EQ_EMPTY)
+                    ret = PTL_OK;
+                RPTLU_ERR_POP(ret, "Error calling MPID_nem_ptl_rptl_eqget\n");
+            }
         }
     }
 
-    while (rptl_info.op_segment_list) {
-        op_segment = rptl_info.op_segment_list;
-        MPL_DL_DELETE(rptl_info.op_segment_list, op_segment);
-        MPIU_Free(op_segment);
+    for (target = rptl_info.target_list; target;) {
+        assert(target->data_op_list == NULL);
+        assert(target->control_op_list == NULL);
+
+        while (target->op_segment_list) {
+            op_segment = target->op_segment_list;
+            MPL_DL_DELETE(target->op_segment_list, op_segment);
+            MPIU_Free(op_segment);
+        }
+
+        t = target->next;
+        MPIU_Free(target);
+        target = t;
     }
 
   fn_exit:
@@ -328,7 +264,7 @@ int MPID_nem_ptl_rptl_ptinit(ptl_handle_ni_t ni_handle, ptl_handle_eq_t eq_handl
     MPIU_CHKPMEM_MALLOC(rptl, struct rptl *, sizeof(struct rptl), mpi_errno, "rptl");
     MPL_DL_APPEND(rptl_info.rptl_list, rptl);
 
-    rptl->local_state = RPTL_LOCAL_STATE_NORMAL;
+    rptl->local_state = RPTL_LOCAL_STATE_ACTIVE;
     rptl->pause_ack_counter = 0;
 
     rptl->data.ob_max_count = 0;
@@ -415,26 +351,40 @@ int MPID_nem_ptl_rptl_ptfini(ptl_pt_index_t pt_index)
 #define FUNCNAME alloc_op
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
-int alloc_op(struct rptl_op **op)
+static int alloc_op(struct rptl_op **op, struct rptl_target *target)
 {
     int ret = PTL_OK;
+    struct rptl_op_pool_segment *op_segment;
+    int mpi_errno = MPI_SUCCESS;
+    int i;
+    MPIU_CHKPMEM_DECL(1);
     MPIDI_STATE_DECL(MPID_STATE_ALLOC_OP);
 
     MPIDI_FUNC_ENTER(MPID_STATE_ALLOC_OP);
 
-    if (rptl_info.op_pool == NULL) {
-        ret = alloc_op_segment();
-        RPTLU_ERR_POP(ret, "error allocating op segment\n");
+    assert(target);
+
+    if (target->op_pool == NULL) {
+        MPIU_CHKPMEM_MALLOC(op_segment, struct rptl_op_pool_segment *, sizeof(struct rptl_op_pool_segment),
+                            mpi_errno, "op pool segment");
+        MPL_DL_APPEND(target->op_segment_list, op_segment);
+
+        for (i = 0; i < RPTL_OP_POOL_SEGMENT_COUNT; i++)
+            MPL_DL_APPEND(target->op_pool, &op_segment->op[i]);
     }
 
-    *op = rptl_info.op_pool;
-    MPL_DL_DELETE(rptl_info.op_pool, *op);
+    *op = target->op_pool;
+    MPL_DL_DELETE(target->op_pool, *op);
 
   fn_exit:
+    MPIU_CHKPMEM_COMMIT();
     MPIDI_FUNC_EXIT(MPID_STATE_ALLOC_OP);
     return ret;
 
   fn_fail:
+    if (mpi_errno)
+        ret = PTL_FAIL;
+    MPIU_CHKPMEM_REAP();
     goto fn_exit;
 }
 
@@ -449,74 +399,199 @@ void free_op(struct rptl_op *op)
 
     MPIDI_FUNC_ENTER(MPID_STATE_FREE_OP);
 
-    MPL_DL_APPEND(rptl_info.op_pool, op);
+    MPL_DL_APPEND(op->target->op_pool, op);
 
     MPIDI_FUNC_EXIT(MPID_STATE_FREE_OP);
 }
 
 
 #undef FUNCNAME
-#define FUNCNAME issue_op
+#define FUNCNAME poke_progress
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
-int issue_op(struct rptl_op *op)
+int poke_progress(void)
 {
     int ret = PTL_OK;
-    struct rptl_paused_target *target;
-    MPIDI_STATE_DECL(MPID_STATE_ISSUE_OP);
+    struct rptl_target *target;
+    struct rptl_op *op;
+    struct rptl *rptl;
+    int i;
+    int mpi_errno = MPI_SUCCESS;
+    ptl_process_t id;
+    ptl_pt_index_t data_pt, control_pt;
+    MPIDI_STATE_DECL(MPID_STATE_POKE_PROGRESS);
 
-    MPIDI_FUNC_ENTER(MPID_STATE_ISSUE_OP);
+    MPIDI_FUNC_ENTER(MPID_STATE_POKE_PROGRESS);
 
-    if (op->op_type == RPTL_OP_PUT) {
-        for (target = rptl_info.paused_target_list; target; target = target->next)
-            if (IDS_ARE_EQUAL(target->id, op->u.put.target_id))
-                break;
+    /* make progress on local RPTLs */
+    for (rptl = rptl_info.rptl_list; rptl; rptl = rptl->next) {
+        /* if the local state is active, there's nothing to do */
+        if (rptl->local_state == RPTL_LOCAL_STATE_ACTIVE)
+            continue;
 
-        if (target && op->u.put.flow_control)
-            goto fn_exit;
+        /* if we are in a local AWAITING PAUSE ACKS state, see if we
+         * can send out the unpause message */
+        if (rptl->local_state == RPTL_LOCAL_STATE_AWAITING_PAUSE_ACKS &&
+            rptl->pause_ack_counter == rptl_info.world_size) {
+            /* if we are over the max count limit, do not send an
+             * unpause message yet */
+            if (rptl->data.ob_curr_count > rptl->data.ob_max_count)
+                continue;
 
-        if (rptl_info.origin_events_left < 2) {
-            ret = alloc_target(op->u.put.target_id, RPTL_TARGET_STATE_FLOWCONTROL, NULL);
-            RPTLU_ERR_POP(ret, "error allocating paused target\n");
-            goto fn_exit;
+            ret = PtlPTEnable(rptl->ni, rptl->data.pt);
+            RPTLU_ERR_POP(ret, "Error returned while reenabling PT\n");
+
+            rptl->local_state = RPTL_LOCAL_STATE_ACTIVE;
+
+            for (i = 0; i < rptl_info.world_size; i++) {
+                mpi_errno = rptl_info.get_target_info(i, &id, rptl->data.pt, &data_pt, &control_pt);
+                if (mpi_errno) {
+                    ret = PTL_FAIL;
+                    RPTLU_ERR_POP(ret, "Error getting target info\n");
+                }
+
+                /* make sure the user setup a control portal */
+                assert(control_pt != PTL_PT_ANY);
+
+                /* disable flow control for control messages */
+                ret = MPID_nem_ptl_rptl_put(rptl->md, 0, 0, PTL_NO_ACK_REQ, id, control_pt,
+                                            0, 0, NULL, RPTL_CONTROL_MSG_UNPAUSE, 0);
+                RPTLU_ERR_POP(ret, "Error sending unpause message\n");
+            }
         }
-        rptl_info.origin_events_left -= 2;
-
-        /* force request for an ACK even if the user didn't ask for
-         * it.  replace the user pointer with the OP id. */
-        ret =
-            PtlPut(op->u.put.md_handle, op->u.put.local_offset, op->u.put.length,
-                   PTL_ACK_REQ, op->u.put.target_id, op->u.put.pt_index,
-                   op->u.put.match_bits, op->u.put.remote_offset, op,
-                   op->u.put.hdr_data);
-        RPTLU_ERR_POP(ret, "Error issuing PUT\n");
     }
-    else {
-        for (target = rptl_info.paused_target_list; target; target = target->next)
-            if (IDS_ARE_EQUAL(target->id, op->u.get.target_id))
+
+    /* make progress on targets */
+    for (target = rptl_info.target_list; target; target = target->next) {
+        if (target->state == RPTL_TARGET_STATE_RECEIVED_PAUSE) {
+            for (op = target->data_op_list; op; op = op->next)
+                if (op->state == RPTL_OP_STATE_ISSUED)
+                    break;
+            if (op)
+                continue;
+
+            /* send a pause ack message */
+            assert(target->rptl);
+            for (i = 0; i < rptl_info.world_size; i++) {
+                /* find the target that has this target id and get the
+                 * control portal information for it */
+                mpi_errno = rptl_info.get_target_info(i, &id, target->rptl->data.pt, &data_pt, &control_pt);
+                if (mpi_errno) {
+                    ret = PTL_FAIL;
+                    RPTLU_ERR_POP(ret, "Error getting target info\n");
+                }
+                if (IDS_ARE_EQUAL(id, target->id))
+                    break;
+            }
+
+            /* make sure the user setup a control portal */
+            assert(control_pt != PTL_PT_ANY);
+
+            target->state = RPTL_TARGET_STATE_PAUSE_ACKED;
+
+            /* disable flow control for control messages */
+            ret = MPID_nem_ptl_rptl_put(target->rptl->md, 0, 0, PTL_NO_ACK_REQ, id, control_pt, 0,
+                                        0, NULL, RPTL_CONTROL_MSG_PAUSE_ACK, 0);
+            RPTLU_ERR_POP(ret, "Error sending pause ack message\n");
+
+            continue;
+        }
+
+        /* issue out all the control messages first */
+        for (op = target->control_op_list; op; op = op->next) {
+            assert(op->op_type == RPTL_OP_PUT);
+
+            /* skip all the issued ops */
+            if (op->state == RPTL_OP_STATE_ISSUED)
+                continue;
+
+            /* we should not get any NACKs on the control portal */
+            assert(op->state != RPTL_OP_STATE_NACKED);
+
+            if (rptl_info.origin_events_left < 2) {
+                /* too few origin events left.  we can't issue this op
+                 * or any following op to this target in order to
+                 * maintain ordering */
                 break;
+            }
 
-        if (target)
-            goto fn_exit;
+            rptl_info.origin_events_left -= 2;
 
-        if (rptl_info.origin_events_left < 1) {
-            ret = alloc_target(op->u.get.target_id, RPTL_TARGET_STATE_FLOWCONTROL, NULL);
-            RPTLU_ERR_POP(ret, "error allocating paused target\n");
-            goto fn_exit;
+            /* force request for an ACK even if the user didn't ask
+             * for it.  replace the user pointer with the OP id. */
+            ret = PtlPut(op->u.put.md_handle, op->u.put.local_offset, op->u.put.length,
+                         PTL_ACK_REQ, op->u.put.target_id, op->u.put.pt_index,
+                         op->u.put.match_bits, op->u.put.remote_offset, op,
+                         op->u.put.hdr_data);
+            RPTLU_ERR_POP(ret, "Error issuing PUT\n");
+
+            op->state = RPTL_OP_STATE_ISSUED;
         }
-        rptl_info.origin_events_left--;
 
-        ret =
-            PtlGet(op->u.get.md_handle, op->u.get.local_offset, op->u.get.length,
-                   op->u.get.target_id, op->u.get.pt_index, op->u.get.match_bits,
-                   op->u.get.remote_offset, op);
-        RPTLU_ERR_POP(ret, "Error issuing GET\n");
-    }
+        if (target->state == RPTL_TARGET_STATE_DISABLED || target->state == RPTL_TARGET_STATE_PAUSE_ACKED)
+            continue;
 
-    op->state = RPTL_OP_STATE_ISSUED;
+        /* then issue out all the data messages */
+        for (op = target->data_op_list; op; op = op->next) {
+            if (op->op_type == RPTL_OP_PUT) {
+                /* skip all the issued ops */
+                if (op->state == RPTL_OP_STATE_ISSUED)
+                    continue;
+
+                /* if an op has been nacked, don't issue anything else
+                 * to this target */
+                if (op->state == RPTL_OP_STATE_NACKED)
+                    break;
+
+                if (rptl_info.origin_events_left < 2) {
+                    /* too few origin events left.  we can't issue
+                     * this op or any following op to this target in
+                     * order to maintain ordering */
+                    break;
+                }
+
+                rptl_info.origin_events_left -= 2;
+
+                /* force request for an ACK even if the user didn't
+                 * ask for it.  replace the user pointer with the OP
+                 * id. */
+                ret = PtlPut(op->u.put.md_handle, op->u.put.local_offset, op->u.put.length,
+                             PTL_ACK_REQ, op->u.put.target_id, op->u.put.pt_index,
+                             op->u.put.match_bits, op->u.put.remote_offset, op,
+                             op->u.put.hdr_data);
+                RPTLU_ERR_POP(ret, "Error issuing PUT\n");
+            }
+            else if (op->op_type == RPTL_OP_GET) {
+                /* skip all the issued ops */
+                if (op->state == RPTL_OP_STATE_ISSUED)
+                    continue;
+
+                /* if an op has been nacked, don't issue anything else
+                 * to this target */
+                if (op->state == RPTL_OP_STATE_NACKED)
+                    break;
+
+                if (rptl_info.origin_events_left < 1) {
+                    /* too few origin events left.  we can't issue
+                     * this op or any following op to this target in
+                     * order to maintain ordering */
+                    break;
+                }
+
+                rptl_info.origin_events_left--;
+
+                ret = PtlGet(op->u.get.md_handle, op->u.get.local_offset, op->u.get.length,
+                             op->u.get.target_id, op->u.get.pt_index, op->u.get.match_bits,
+                             op->u.get.remote_offset, op);
+                RPTLU_ERR_POP(ret, "Error issuing GET\n");
+            }
+
+            op->state = RPTL_OP_STATE_ISSUED;
+        }
+    }
 
   fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_ISSUE_OP);
+    MPIDI_FUNC_EXIT(MPID_STATE_POKE_PROGRESS);
     return ret;
 
   fn_fail:
@@ -535,11 +610,15 @@ int MPID_nem_ptl_rptl_put(ptl_handle_md_t md_handle, ptl_size_t local_offset, pt
 {
     struct rptl_op *op;
     int ret = PTL_OK;
+    struct rptl_target *target;
     MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_RPTL_PUT);
 
     MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_RPTL_PUT);
 
-    ret = alloc_op(&op);
+    ret = find_target(target_id, &target);
+    RPTLU_ERR_POP(ret, "error finding target structure\n");
+
+    ret = alloc_op(&op, target);
     RPTLU_ERR_POP(ret, "error allocating op\n");
 
     op->op_type = RPTL_OP_PUT;
@@ -562,12 +641,15 @@ int MPID_nem_ptl_rptl_put(ptl_handle_md_t md_handle, ptl_size_t local_offset, pt
     op->u.put.ack = NULL;
     op->u.put.flow_control = flow_control;
     op->events_ready = 0;
+    op->target = target;
 
-    MPL_DL_APPEND(rptl_info.op_list, op);
+    if (op->u.put.flow_control)
+        MPL_DL_APPEND(target->data_op_list, op);
+    else
+        MPL_DL_APPEND(target->control_op_list, op);
 
-    /* if we are not in a PAUSED state, issue the operation */
-    ret = issue_op(op);
-    RPTLU_ERR_POP(ret, "Error from issue_op\n");
+    ret = poke_progress();
+    RPTLU_ERR_POP(ret, "Error from poke_progress\n");
 
   fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_RPTL_PUT);
@@ -588,12 +670,15 @@ int MPID_nem_ptl_rptl_get(ptl_handle_md_t md_handle, ptl_size_t local_offset, pt
 {
     struct rptl_op *op;
     int ret = PTL_OK;
-    struct rptl_paused_target *target;
+    struct rptl_target *target;
     MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_RPTL_GET);
 
     MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_RPTL_GET);
 
-    ret = alloc_op(&op);
+    ret = find_target(target_id, &target);
+    RPTLU_ERR_POP(ret, "error finding target structure\n");
+
+    ret = alloc_op(&op, target);
     RPTLU_ERR_POP(ret, "error allocating op\n");
 
     op->op_type = RPTL_OP_GET;
@@ -610,15 +695,12 @@ int MPID_nem_ptl_rptl_get(ptl_handle_md_t md_handle, ptl_size_t local_offset, pt
     op->u.get.user_ptr = user_ptr;
 
     op->events_ready = 0;
+    op->target = target;
 
-    MPL_DL_APPEND(rptl_info.op_list, op);
+    MPL_DL_APPEND(target->data_op_list, op);
 
-    for (target = rptl_info.paused_target_list; target; target = target->next)
-        if (IDS_ARE_EQUAL(target->id, target_id))
-            break;
-
-    ret = issue_op(op);
-    RPTLU_ERR_POP(ret, "Error from issue_op\n");
+    ret = poke_progress();
+    RPTLU_ERR_POP(ret, "Error from poke_progress\n");
 
   fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_RPTL_GET);
@@ -657,6 +739,9 @@ static int send_pause_messages(struct rptl *rptl)
             RPTLU_ERR_POP(ret, "Error getting target info while sending pause messages\n");
         }
 
+        /* make sure the user set up a control portal */
+        assert(control_pt != PTL_PT_ANY);
+
         /* disable flow control for control messages */
         ret = MPID_nem_ptl_rptl_put(rptl->md, 0, 0, PTL_NO_ACK_REQ, id, control_pt, 0, 0,
                                     NULL, RPTL_CONTROL_MSG_PAUSE, 0);
@@ -673,167 +758,35 @@ static int send_pause_messages(struct rptl *rptl)
 
 
 #undef FUNCNAME
-#define FUNCNAME send_pause_ack_messages
+#define FUNCNAME clear_nacks
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
-static int send_pause_ack_messages(void)
+static int clear_nacks(ptl_process_t target_id)
 {
+    struct rptl_target *target;
     struct rptl_op *op;
     int ret = PTL_OK;
-    struct rptl_paused_target *target;
-    MPIDI_STATE_DECL(MPID_STATE_SEND_PAUSE_ACK_MESSAGES);
+    MPIDI_STATE_DECL(MPID_STATE_CLEAR_NACKS);
 
-    MPIDI_FUNC_ENTER(MPID_STATE_SEND_PAUSE_ACK_MESSAGES);
-
-    for (target = rptl_info.paused_target_list; target; target = target->next) {
-        if (target->state != RPTL_TARGET_STATE_RECEIVED_PAUSE)
-            continue;
-
-        for (op = rptl_info.op_list; op; op = op->next) {
-            if (op->op_type == RPTL_OP_GET && IDS_ARE_EQUAL(op->u.get.target_id, target->id) &&
-                op->state == RPTL_OP_STATE_ISSUED)
-                break;
-
-            if (op->op_type == RPTL_OP_PUT && IDS_ARE_EQUAL(op->u.put.target_id, target->id)) {
-                if (op->state == RPTL_OP_STATE_ISSUED)
-                    break;
-                if (op->u.put.send || op->u.put.ack)
-                    break;
-            }
-        }
+    MPIDI_FUNC_ENTER(MPID_STATE_CLEAR_NACKS);
 
-        if (op == NULL) {
-            ptl_process_t id;
-            ptl_pt_index_t data_pt, control_pt;
-            int i;
-            int mpi_errno = MPI_SUCCESS;
+    ret = find_target(target_id, &target);
+    RPTLU_ERR_POP(ret, "error finding target\n");
 
-            for (i = 0; i < rptl_info.world_size; i++) {
-                /* find the target that has this target id and get the
-                 * control portal information for it */
-                mpi_errno = rptl_info.get_target_info(i, &id, target->rptl->data.pt, &data_pt, &control_pt);
-                if (mpi_errno) {
-                    ret = PTL_FAIL;
-                    RPTLU_ERR_POP(ret,
-                                  "Error getting target info while sending pause ack message\n");
-                }
-                if (IDS_ARE_EQUAL(id, target->id))
-                    break;
-            }
-
-            /* disable flow control for control messages */
-            ret =
-                MPID_nem_ptl_rptl_put(target->rptl->md, 0, 0, PTL_NO_ACK_REQ, id, control_pt, 0,
-                                      0, NULL, RPTL_CONTROL_MSG_PAUSE_ACK, 0);
-            RPTLU_ERR_POP(ret, "Error sending pause ack message\n");
-
-            if (target->state == RPTL_TARGET_STATE_RECEIVED_PAUSE)
-                target->state = RPTL_TARGET_STATE_PAUSE_ACKED;
-        }
-    }
-
-  fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_SEND_PAUSE_ACK_MESSAGES);
-    return ret;
-
-  fn_fail:
-    goto fn_exit;
-}
-
-
-#undef FUNCNAME
-#define FUNCNAME send_unpause_messages
-#undef FCNAME
-#define FCNAME MPIU_QUOTE(FUNCNAME)
-static int send_unpause_messages(void)
-{
-    int i, mpi_errno = MPI_SUCCESS;
-    ptl_process_t id;
-    ptl_pt_index_t data_pt, control_pt;
-    int ret = PTL_OK;
-    struct rptl *rptl;
-    MPIDI_STATE_DECL(MPID_STATE_SEND_UNPAUSE_MESSAGES);
-
-    MPIDI_FUNC_ENTER(MPID_STATE_SEND_UNPAUSE_MESSAGES);
-
-    for (rptl = rptl_info.rptl_list; rptl; rptl = rptl->next) {
-        assert(rptl->local_state != RPTL_LOCAL_STATE_AWAITING_PAUSE_ACKS ||
-               rptl->control.pt != PTL_PT_ANY);
-        if (rptl->control.pt == PTL_PT_ANY)
-            continue;
-        if (rptl->local_state != RPTL_LOCAL_STATE_AWAITING_PAUSE_ACKS)
-            continue;
-
-        if (rptl->pause_ack_counter == rptl_info.world_size) {
-            /* if we are over the max count limit, do not send an
-             * unpause message yet */
-            if (rptl->data.ob_curr_count > rptl->data.ob_max_count)
-                goto fn_exit;
-
-            ret = PtlPTEnable(rptl->ni, rptl->data.pt);
-            RPTLU_ERR_POP(ret, "Error returned while reenabling PT\n");
-
-            rptl->local_state = RPTL_LOCAL_STATE_NORMAL;
-
-            for (i = 0; i < rptl_info.world_size; i++) {
-                mpi_errno = rptl_info.get_target_info(i, &id, rptl->data.pt, &data_pt, &control_pt);
-                if (mpi_errno) {
-                    ret = PTL_FAIL;
-                    RPTLU_ERR_POP(ret,
-                                  "Error getting target info while sending unpause messages\n");
-                }
-
-                /* disable flow control for control messages */
-                ret =
-                    MPID_nem_ptl_rptl_put(rptl->md, 0, 0, PTL_NO_ACK_REQ, id, control_pt,
-                                          0, 0, NULL, RPTL_CONTROL_MSG_UNPAUSE, 0);
-                RPTLU_ERR_POP(ret, "Error sending unpause message\n");
-            }
-        }
-    }
-
-  fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_SEND_UNPAUSE_MESSAGES);
-    return ret;
-
-  fn_fail:
-    goto fn_exit;
-}
-
-
-#undef FUNCNAME
-#define FUNCNAME reissue_ops
-#undef FCNAME
-#define FCNAME MPIU_QUOTE(FUNCNAME)
-static int reissue_ops(ptl_process_t target_id)
-{
-    struct rptl_paused_target *target;
-    struct rptl_op *op;
-    int ret = PTL_OK;
-    MPIDI_STATE_DECL(MPID_STATE_REISSUE_OPS);
-
-    MPIDI_FUNC_ENTER(MPID_STATE_REISSUE_OPS);
-
-    for (target = rptl_info.paused_target_list; target; target = target->next)
-        if (IDS_ARE_EQUAL(target->id, target_id))
-            break;
-    assert(target);
-
-    MPL_DL_DELETE(rptl_info.paused_target_list, target);
-    MPIU_Free(target);
-
-    for (op = rptl_info.op_list; op; op = op->next) {
+    for (op = target->data_op_list; op; op = op->next) {
         if ((op->op_type == RPTL_OP_PUT && IDS_ARE_EQUAL(op->u.put.target_id, target_id)) ||
             (op->op_type == RPTL_OP_GET && IDS_ARE_EQUAL(op->u.get.target_id, target_id))) {
-            if (op->state != RPTL_OP_STATE_ISSUED) {
-                ret = issue_op(op);
-                RPTLU_ERR_POP(ret, "Error calling issue_op\n");
-            }
+            if (op->state == RPTL_OP_STATE_NACKED)
+                op->state = RPTL_OP_STATE_QUEUED;
         }
     }
+    target->state = RPTL_TARGET_STATE_ACTIVE;
+
+    ret = poke_progress();
+    RPTLU_ERR_POP(ret, "error in poke_progress\n");
 
   fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_REISSUE_OPS);
+    MPIDI_FUNC_EXIT(MPID_STATE_CLEAR_NACKS);
     return ret;
 
   fn_fail:
@@ -845,11 +798,11 @@ static int reissue_ops(ptl_process_t target_id)
 #define FUNCNAME get_event_info
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
-static void get_event_info(ptl_event_t * event, struct rptl **ret_rptl, struct rptl_op **ret_op)
+static int get_event_info(ptl_event_t * event, struct rptl **ret_rptl, struct rptl_op **ret_op)
 {
     struct rptl *rptl;
     struct rptl_op *op;
-    struct rptl_paused_target *target, *tmp;
+    int ret = PTL_OK;
     MPIDI_STATE_DECL(MPID_STATE_GET_EVENT_INFO);
 
     MPIDI_FUNC_ENTER(MPID_STATE_GET_EVENT_INFO);
@@ -860,18 +813,9 @@ static void get_event_info(ptl_event_t * event, struct rptl **ret_rptl, struct r
 
         rptl_info.origin_events_left++;
 
-        if (rptl_info.origin_events_left >= 2) {
-            for (target = rptl_info.paused_target_list; target;) {
-                if (target->state == RPTL_TARGET_STATE_FLOWCONTROL) {
-                    tmp = target->next;
-                    MPL_DL_DELETE(rptl_info.paused_target_list, target);
-                    MPIU_Free(target);
-                    target = tmp;
-                }
-                else
-                    target = target->next;
-            }
-        }
+        /* see if there are any pending ops to be issued */
+        ret = poke_progress();
+        RPTLU_ERR_POP(ret, "Error returned from poke_progress\n");
 
         assert(op);
         rptl = NULL;
@@ -892,7 +836,7 @@ static void get_event_info(ptl_event_t * event, struct rptl **ret_rptl, struct r
 
   fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_GET_EVENT_INFO);
-    return;
+    return ret;
 
   fn_fail:
     goto fn_exit;
@@ -953,72 +897,52 @@ static int stash_event(struct rptl_op *op, ptl_event_t event)
 #define FUNCNAME retrieve_event
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
-static void retrieve_event(struct rptl *rptl, struct rptl_op *op, ptl_event_t * event)
+static int retrieve_event(ptl_event_t * event)
 {
+    struct rptl_target *target;
+    struct rptl_op *op;
+    int have_event = 0;
     MPIDI_STATE_DECL(MPID_STATE_RETRIEVE_EVENT);
 
     MPIDI_FUNC_ENTER(MPID_STATE_RETRIEVE_EVENT);
 
-    assert(op->op_type == RPTL_OP_PUT);
-    assert(op->u.put.send || op->u.put.ack);
-
-    if (op->u.put.send) {
-        memcpy(event, op->u.put.send, sizeof(ptl_event_t));
-        MPIU_Free(op->u.put.send);
-    }
-    else {
-        memcpy(event, op->u.put.ack, sizeof(ptl_event_t));
-        MPIU_Free(op->u.put.ack);
-    }
-    event->user_ptr = op->u.put.user_ptr;
-
-    MPL_DL_DELETE(rptl_info.op_list, op);
-    free_op(op);
-
-  fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_RETRIEVE_EVENT);
-    return;
-
-  fn_fail:
-    goto fn_exit;
-}
-
-
-#undef FUNCNAME
-#define FUNCNAME issue_pending_ops
-#undef FCNAME
-#define FCNAME MPIU_QUOTE(FUNCNAME)
-static int issue_pending_ops(void)
-{
-    struct rptl_paused_target *target, *tmp;
-    struct rptl_op *op;
-    int ret = PTL_OK;
-    MPIDI_STATE_DECL(MPID_STATE_ISSUE_PENDING_OPS);
+    /* FIXME: this is an expensive loop over all pending operations
+     * every time the user does an eqget */
+    for (target = rptl_info.target_list; target; target = target->next) {
+        for (op = target->data_op_list; op; op = op->next) {
+            if (op->events_ready) {
+                assert(op->op_type == RPTL_OP_PUT);
+                assert(op->u.put.send || op->u.put.ack);
+
+                if (op->u.put.send) {
+                    memcpy(event, op->u.put.send, sizeof(ptl_event_t));
+                    MPIU_Free(op->u.put.send);
+                    op->u.put.send = NULL;
+                }
+                else {
+                    memcpy(event, op->u.put.ack, sizeof(ptl_event_t));
+                    MPIU_Free(op->u.put.ack);
+                    op->u.put.ack = NULL;
+                }
+                event->user_ptr = op->u.put.user_ptr;
 
-    MPIDI_FUNC_ENTER(MPID_STATE_ISSUE_PENDING_OPS);
+                MPL_DL_DELETE(target->data_op_list, op);
+                free_op(op);
 
-    for (op = rptl_info.op_list; op; op = op->next) {
-        if (op->state == RPTL_OP_STATE_QUEUED) {
-            for (target = rptl_info.paused_target_list; target; target = target->next)
-                if ((op->op_type == RPTL_OP_PUT && IDS_ARE_EQUAL(op->u.put.target_id, target->id)) ||
-                    (op->op_type == RPTL_OP_GET && IDS_ARE_EQUAL(op->u.get.target_id, target->id)))
-                    break;
-            if (target == NULL) {
-                ret = issue_op(op);
-                RPTLU_ERR_POP(ret, "error issuing op\n");
+                have_event = 1;
+                goto fn_exit;
             }
         }
     }
 
   fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_ISSUE_PENDING_OPS);
-    return ret;
+    MPIDI_FUNC_EXIT(MPID_STATE_RETRIEVE_EVENT);
+    return have_event;
 
   fn_fail:
     goto fn_exit;
 }
 
-
 #undef FUNCNAME
 #define FUNCNAME MPID_nem_ptl_rptl_eqget
 #undef FCNAME
@@ -1029,44 +953,21 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
     struct rptl *rptl;
     ptl_event_t e;
     int ret = PTL_OK, tmp_ret = PTL_OK;
-    struct rptl_paused_target *target;
     int mpi_errno = MPI_SUCCESS;
+    struct rptl_target *target;
     MPIU_CHKPMEM_DECL(1);
     MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_RPTL_EQGET);
 
     MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_RPTL_EQGET);
 
+    ret = poke_progress();
+    RPTLU_ERR_POP(ret, "error poking progress\n");
+
     /* before we poll the eq, we need to check if there are any
      * completed operations that need to be returned */
-    /* FIXME: this is an expensive loop over all pending operations
-     * everytime the user does an eqget */
-    for (op = rptl_info.op_list; op; op = op->next) {
-        if (op->events_ready) {
-            retrieve_event(rptl, op, event);
-            ret = PTL_OK;
-            goto fn_exit;
-        }
-    }
-
-    /* see if pause ack messages need to be sent out */
-    tmp_ret = send_pause_ack_messages();
-    if (tmp_ret) {
-        ret = tmp_ret;
-        RPTLU_ERR_POP(ret, "Error returned from send_pause_ack_messages\n");
-    }
-
-    /* see if unpause messages need to be sent out */
-    tmp_ret = send_unpause_messages();
-    if (tmp_ret) {
-        ret = tmp_ret;
-        RPTLU_ERR_POP(ret, "Error returned from send_unpause_messages\n");
-    }
-
-    /* see if there are any pending ops to be issued */
-    tmp_ret = issue_pending_ops();
-    if (tmp_ret) {
-        ret = tmp_ret;
-        RPTLU_ERR_POP(ret, "Error returned from issue_pending_ops\n");
+    if (retrieve_event(event)) {
+        ret = PTL_OK;
+        goto fn_exit;
     }
 
     ret = PtlEQGet(eq_handle, event);
@@ -1074,7 +975,11 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
         goto fn_exit;
 
     /* find the rptl and op associated with this event */
-    get_event_info(event, &rptl, &op);
+    tmp_ret = get_event_info(event, &rptl, &op);
+    if (tmp_ret) {
+        ret = tmp_ret;
+        RPTLU_ERR_POP(ret, "Error returned from get_event_info\n");
+    }
 
     /* PT_DISABLED events only occur on the target */
     if (event->type == PTL_EVENT_PT_DISABLED) {
@@ -1088,14 +993,16 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
          * recover from disable events */
         assert(rptl->control.pt != PTL_PT_ANY);
 
-        rptl->local_state = RPTL_LOCAL_STATE_AWAITING_PAUSE_ACKS;
-        rptl->pause_ack_counter = 0;
+        if (rptl->local_state == RPTL_LOCAL_STATE_ACTIVE) {
+            rptl->local_state = RPTL_LOCAL_STATE_AWAITING_PAUSE_ACKS;
+            rptl->pause_ack_counter = 0;
 
-        /* send out pause messages */
-        tmp_ret = send_pause_messages(rptl);
-        if (tmp_ret) {
-            ret = tmp_ret;
-            RPTLU_ERR_POP(ret, "Error returned from send_pause_messages\n");
+            /* send out pause messages */
+            tmp_ret = send_pause_messages(rptl);
+            if (tmp_ret) {
+                ret = tmp_ret;
+                RPTLU_ERR_POP(ret, "Error returned from send_pause_messages\n");
+            }
         }
     }
 
@@ -1139,21 +1046,25 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
             rptl->control.me_idx = 0;
 
         if (event->hdr_data == RPTL_CONTROL_MSG_PAUSE) {
-            tmp_ret = alloc_target(event->initiator, RPTL_TARGET_STATE_RECEIVED_PAUSE, rptl);
+            tmp_ret = find_target(event->initiator, &target);
             if (tmp_ret) {
                 ret = tmp_ret;
-                RPTLU_ERR_POP(ret, "Error returned from alloc_target\n");
+                RPTLU_ERR_POP(ret, "Error finding target\n");
             }
+            assert(target->state < RPTL_TARGET_STATE_RECEIVED_PAUSE);
+            target->state = RPTL_TARGET_STATE_RECEIVED_PAUSE;
+            target->rptl = rptl;
         }
         else if (event->hdr_data == RPTL_CONTROL_MSG_PAUSE_ACK) {
             rptl->pause_ack_counter++;
         }
         else {  /* got an UNPAUSE message */
-            /* reissue all operations to this target */
-            tmp_ret = reissue_ops(event->initiator);
+            /* clear NACKs from all operations to this target and poke
+             * progress */
+            tmp_ret = clear_nacks(event->initiator);
             if (tmp_ret) {
                 ret = tmp_ret;
-                RPTLU_ERR_POP(ret, "Error returned from reissue_ops\n");
+                RPTLU_ERR_POP(ret, "Error returned from clear_nacks\n");
             }
         }
     }
@@ -1167,6 +1078,10 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
             /* hide the event from the user */
             ret = PTL_EQ_EMPTY;
 
+            /* we should not get NACKs on the control portal */
+            if (event->type == PTL_EVENT_ACK)
+                assert(op->u.put.flow_control);
+
             op->state = RPTL_OP_STATE_NACKED;
 
             if (op->op_type == RPTL_OP_PUT) {
@@ -1191,12 +1106,17 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
             }
 
             if (op->op_type == RPTL_OP_PUT)
-                tmp_ret = alloc_target(op->u.put.target_id, RPTL_TARGET_STATE_DISABLED, NULL);
+                tmp_ret = find_target(op->u.put.target_id, &target);
             else
-                tmp_ret = alloc_target(op->u.get.target_id, RPTL_TARGET_STATE_DISABLED, NULL);
+                tmp_ret = find_target(op->u.get.target_id, &target);
             if (tmp_ret) {
                 ret = tmp_ret;
-                RPTLU_ERR_POP(ret, "Error returned from alloc_target\n");
+                RPTLU_ERR_POP(ret, "Error finding target\n");
+            }
+
+            if (target->state == RPTL_TARGET_STATE_ACTIVE) {
+                target->state = RPTL_TARGET_STATE_DISABLED;
+                target->rptl = NULL;
             }
         }
 
@@ -1205,7 +1125,9 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
             assert(op->op_type == RPTL_OP_GET);
 
             event->user_ptr = op->u.get.user_ptr;
-            MPL_DL_DELETE(rptl_info.op_list, op);
+
+            /* GET operations only go into the data op list */
+            MPL_DL_DELETE(op->target->data_op_list, op);
             free_op(op);
         }
 
@@ -1218,9 +1140,11 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
             op->events_ready = 1;
             event->user_ptr = op->u.put.user_ptr;
 
-            /* if flow control is not set, ignore events */
+            /* if flow control is not set, ignore the ACK event */
             if (op->u.put.flow_control == 0) {
-                retrieve_event(rptl, op, event);
+                MPIU_Free(op->u.put.ack);
+                MPL_DL_DELETE(op->target->control_op_list, op);
+                free_op(op);
                 ret = PTL_EQ_EMPTY;
             }
         }
@@ -1234,17 +1158,23 @@ int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
             op->events_ready = 1;
             event->user_ptr = op->u.put.user_ptr;
 
-            /* if flow control is not set, ignore events */
+            /* if flow control is not set, ignore ACK event */
             if (op->u.put.flow_control == 0) {
-                retrieve_event(rptl, op, event);
+                MPIU_Free(op->u.put.send);
+                MPL_DL_DELETE(op->target->control_op_list, op);
+                free_op(op);
                 ret = PTL_EQ_EMPTY;
             }
 
-            /* if the user asked for an ACK, just return this event.
-             * if not, discard this event and retrieve the send
-             * event. */
-            else if (!(op->u.put.ack_req & PTL_ACK_REQ))
-                retrieve_event(rptl, op, event);
+            /* if the user did not ask for an ACK, discard this event
+             * and return the send event. */
+            else if (!(op->u.put.ack_req & PTL_ACK_REQ)) {
+                memcpy(event, op->u.put.send, sizeof(ptl_event_t));
+                MPIU_Free(op->u.put.send);
+                /* flow control is set, we should be in the data op list */
+                MPL_DL_DELETE(op->target->data_op_list, op);
+                free_op(op);
+            }
         }
 
         else {
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.h b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.h
index 359e24f..c5f1254 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.h
@@ -30,6 +30,7 @@
         }                                                               \
     }
 
+struct rptl_target;
 struct rptl_op {
     enum {
         RPTL_OP_PUT,
@@ -73,6 +74,7 @@ struct rptl_op {
     } u;
 
     int events_ready;
+    struct rptl_target *target;
 
     struct rptl_op *next;
     struct rptl_op *prev;
@@ -85,7 +87,7 @@ struct rptl_op {
 struct rptl {
     /* local portal state */
     enum {
-        RPTL_LOCAL_STATE_NORMAL,
+        RPTL_LOCAL_STATE_ACTIVE,
         RPTL_LOCAL_STATE_AWAITING_PAUSE_ACKS
     } local_state;
     uint64_t pause_ack_counter;
@@ -122,6 +124,37 @@ struct rptl {
     struct rptl *prev;
 };
 
+#define RPTL_OP_POOL_SEGMENT_COUNT  (1024)
+
+struct rptl_target {
+    ptl_process_t id;
+
+    enum rptl_target_state {
+        RPTL_TARGET_STATE_ACTIVE,
+        RPTL_TARGET_STATE_DISABLED,
+        RPTL_TARGET_STATE_RECEIVED_PAUSE,
+        RPTL_TARGET_STATE_PAUSE_ACKED
+    } state;
+
+    /* when we get a pause message, we need to know which rptl it came
+     * in on, so we can figure out what the corresponding target
+     * portal is.  for this, we store the local rptl */
+    struct rptl *rptl;
+
+    struct rptl_op_pool_segment {
+        struct rptl_op op[RPTL_OP_POOL_SEGMENT_COUNT];
+        struct rptl_op_pool_segment *next;
+        struct rptl_op_pool_segment *prev;
+    } *op_segment_list;
+    struct rptl_op *op_pool;
+
+    struct rptl_op *data_op_list;
+    struct rptl_op *control_op_list;
+
+    struct rptl_target *next;
+    struct rptl_target *prev;
+};
+
 int MPID_nem_ptl_rptl_init(int world_size, uint64_t max_origin_events,
                            int (*get_target_info) (int rank, ptl_process_t * id,
                                                    ptl_pt_index_t local_data_pt,
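
Editorial note, not part of the commit: the per-target issue loop that the
rptl.c changes above introduce in poke_progress can be hard to follow in
diff form. The following is a minimal standalone sketch of the ordering rule
only, under stated assumptions. It models two target states instead of the
four in struct rptl_target, replaces the PtlPut/PtlGet calls with a state
change, and uses hypothetical names (poke_progress_sketch, events_needed,
struct op, struct target) that do not exist in MPICH. The credit costs
(2 for a PUT, 1 for a GET) mirror what the diff subtracts from
rptl_info.origin_events_left.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the rptl types in the diff. */
enum op_type      { OP_PUT, OP_GET };
enum op_state     { OP_QUEUED, OP_ISSUED, OP_NACKED };
enum target_state { TARGET_ACTIVE, TARGET_DISABLED };

struct op {
    enum op_type type;
    enum op_state state;
    struct op *next;
};

struct target {
    enum target_state state;
    struct op *data_ops;        /* FIFO of data operations for this target */
    struct target *next;
};

/* Origin-event credits an op consumes, as in the diff: 2 per PUT,
 * 1 per GET. */
static int events_needed(const struct op *op)
{
    return op->type == OP_PUT ? 2 : 1;
}

/* Walk every target and issue its queued ops strictly in list order.
 * Returns the number of ops issued and decrements *credits to match. */
static int poke_progress_sketch(struct target *targets, int *credits)
{
    int issued = 0;

    for (struct target *t = targets; t; t = t->next) {
        /* nothing may be sent to a target that is not active */
        if (t->state != TARGET_ACTIVE)
            continue;

        for (struct op *op = t->data_ops; op; op = op->next) {
            /* skip ops that are already on the wire */
            if (op->state == OP_ISSUED)
                continue;

            /* a NACKed op blocks everything queued behind it */
            if (op->state == OP_NACKED)
                break;

            /* too few credits: stop here rather than letting a
             * cheaper later op overtake this one */
            if (*credits < events_needed(op))
                break;

            *credits -= events_needed(op);
            assert(*credits >= 0);

            /* the real code calls PtlPut or PtlGet at this point */
            op->state = OP_ISSUED;
            issued++;
        }
    }

    return issued;
}
```

With three credits and a queue of PUT, PUT, GET for one target, the sketch
issues only the first PUT: the second PUT does not fit, and the GET behind
it is held back even though one credit remains, which is the ordering
guarantee the comments in the diff describe.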

http://git.mpich.org/mpich.git/commitdiff/a13bf9b49399b578b30548be540e1007be80f66b

commit a13bf9b49399b578b30548be540e1007be80f66b
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Tue Nov 11 22:53:49 2014 -0600

    portals4: use a separate EQ per PT
    
    Signed-off-by: Antonio Pena Monferrer <apenya at mcs.anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
index 61ceaed..e5e1aea 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
@@ -19,8 +19,12 @@ extern ptl_pt_index_t  MPIDI_nem_ptl_pt;
 extern ptl_pt_index_t  MPIDI_nem_ptl_get_pt; /* portal for gets by receiver */
 extern ptl_pt_index_t  MPIDI_nem_ptl_control_pt; /* portal for MPICH control messages */
 extern ptl_pt_index_t  MPIDI_nem_ptl_rpt_pt; /* portal for MPICH control messages */
-extern ptl_handle_eq_t MPIDI_nem_ptl_target_eq;
 extern ptl_handle_eq_t MPIDI_nem_ptl_origin_eq;
+extern ptl_handle_eq_t MPIDI_nem_ptl_eq;
+extern ptl_handle_eq_t MPIDI_nem_ptl_get_eq;
+extern ptl_handle_eq_t MPIDI_nem_ptl_control_eq;
+extern ptl_handle_eq_t MPIDI_nem_ptl_origin_eq;
+extern ptl_handle_eq_t MPIDI_nem_ptl_rpt_eq;
 
 extern ptl_handle_md_t MPIDI_nem_ptl_global_md;
 extern ptl_ni_limits_t MPIDI_nem_ptl_ni_limits;
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
index 5087c92..ffad963 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
@@ -28,7 +28,9 @@ ptl_pt_index_t  MPIDI_nem_ptl_pt;
 ptl_pt_index_t  MPIDI_nem_ptl_get_pt; /* portal for gets by receiver */
 ptl_pt_index_t  MPIDI_nem_ptl_control_pt; /* portal for MPICH control messages */
 ptl_pt_index_t  MPIDI_nem_ptl_rpt_pt; /* portal for rportals control messages */
-ptl_handle_eq_t MPIDI_nem_ptl_target_eq;
+ptl_handle_eq_t MPIDI_nem_ptl_eq;
+ptl_handle_eq_t MPIDI_nem_ptl_get_eq;
+ptl_handle_eq_t MPIDI_nem_ptl_control_eq;
 ptl_handle_eq_t MPIDI_nem_ptl_origin_eq;
 ptl_pt_index_t  MPIDI_nem_ptl_control_rpt_pt; /* portal for rportals control messages */
 ptl_pt_index_t  MPIDI_nem_ptl_get_rpt_pt; /* portal for rportals control messages */
@@ -184,7 +186,14 @@ static int ptl_init(MPIDI_PG_t *pg_p, int pg_rank, char **bc_val_p, int *val_max
                     PTL_PID_ANY, &desired, &MPIDI_nem_ptl_ni_limits, &MPIDI_nem_ptl_ni);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlniinit", "**ptlniinit %s", MPID_nem_ptl_strerror(ret));
 
-    ret = PtlEQAlloc(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_ni_limits.max_eqs, &MPIDI_nem_ptl_target_eq);
+    /* allocate EQs for each portal */
+    ret = PtlEQAlloc(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_ni_limits.max_eqs, &MPIDI_nem_ptl_eq);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptleqalloc", "**ptleqalloc %s", MPID_nem_ptl_strerror(ret));
+
+    ret = PtlEQAlloc(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_ni_limits.max_eqs, &MPIDI_nem_ptl_get_eq);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptleqalloc", "**ptleqalloc %s", MPID_nem_ptl_strerror(ret));
+
+    ret = PtlEQAlloc(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_ni_limits.max_eqs, &MPIDI_nem_ptl_control_eq);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptleqalloc", "**ptleqalloc %s", MPID_nem_ptl_strerror(ret));
 
     /* allocate a separate EQ for origin events. with this, we can implement rate-limit operations
@@ -193,32 +202,32 @@ static int ptl_init(MPIDI_PG_t *pg_p, int pg_rank, char **bc_val_p, int *val_max
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptleqalloc", "**ptleqalloc %s", MPID_nem_ptl_strerror(ret));
 
     /* allocate portal for matching messages */
-    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_target_eq,
+    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_eq,
                      PTL_PT_ANY, &MPIDI_nem_ptl_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
 
     /* allocate portal for large messages where receiver does a get */
-    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_target_eq,
+    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_get_eq,
                      PTL_PT_ANY, &MPIDI_nem_ptl_get_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
 
     /* allocate portal for MPICH control messages */
-    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_target_eq,
+    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_control_eq,
                      PTL_PT_ANY, &MPIDI_nem_ptl_control_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
 
     /* allocate portal for MPICH control messages */
-    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_target_eq,
+    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_eq,
                      PTL_PT_ANY, &MPIDI_nem_ptl_rpt_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
 
     /* allocate portal for MPICH control messages */
-    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_target_eq,
+    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_get_eq,
                      PTL_PT_ANY, &MPIDI_nem_ptl_get_rpt_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
 
     /* allocate portal for MPICH control messages */
-    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_target_eq,
+    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_control_eq,
                      PTL_PT_ANY, &MPIDI_nem_ptl_control_rpt_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
 
@@ -276,7 +285,7 @@ static int ptl_finalize(void)
 {
     int mpi_errno = MPI_SUCCESS;
     int ret;
-    ptl_handle_eq_t eqs[2];
+    ptl_handle_eq_t eqs[4];
     MPIDI_STATE_DECL(MPID_STATE_PTL_FINALIZE);
     MPIDI_FUNC_ENTER(MPID_STATE_PTL_FINALIZE);
 
@@ -288,9 +297,11 @@ static int ptl_finalize(void)
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
 
     /* shut down portals */
-    eqs[0] = MPIDI_nem_ptl_target_eq;
-    eqs[1] = MPIDI_nem_ptl_origin_eq;
-    ret = MPID_nem_ptl_rptl_drain_eq(2, eqs);
+    eqs[0] = MPIDI_nem_ptl_eq;
+    eqs[1] = MPIDI_nem_ptl_get_eq;
+    eqs[2] = MPIDI_nem_ptl_control_eq;
+    eqs[3] = MPIDI_nem_ptl_origin_eq;
+    ret = MPID_nem_ptl_rptl_drain_eq(4, eqs);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptfree", "**ptlptfree %s", MPID_nem_ptl_strerror(ret));
 
     ret = MPID_nem_ptl_rptl_ptfini(MPIDI_nem_ptl_pt);
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
index 857c9ec..85ef2f8 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
@@ -131,16 +131,26 @@ int MPID_nem_ptl_poll(int is_blocking_poll)
     /* MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_POLL); */
 
     while (1) {
-        /* check both origin and target EQs for events */
-        ret = MPID_nem_ptl_rptl_eqget(MPIDI_nem_ptl_target_eq, &event);
+        /* check EQs for events */
+        ret = MPID_nem_ptl_rptl_eqget(MPIDI_nem_ptl_eq, &event);
         MPIU_ERR_CHKANDJUMP(ret == PTL_EQ_DROPPED, mpi_errno, MPI_ERR_OTHER, "**eqdropped");
         if (ret == PTL_EQ_EMPTY) {
-            ret = MPID_nem_ptl_rptl_eqget(MPIDI_nem_ptl_origin_eq, &event);
+            ret = MPID_nem_ptl_rptl_eqget(MPIDI_nem_ptl_get_eq, &event);
             MPIU_ERR_CHKANDJUMP(ret == PTL_EQ_DROPPED, mpi_errno, MPI_ERR_OTHER, "**eqdropped");
 
-            /* if both queues are empty, exit the loop */
-            if (ret == PTL_EQ_EMPTY)
-                break;
+            if (ret == PTL_EQ_EMPTY) {
+                ret = MPID_nem_ptl_rptl_eqget(MPIDI_nem_ptl_control_eq, &event);
+                MPIU_ERR_CHKANDJUMP(ret == PTL_EQ_DROPPED, mpi_errno, MPI_ERR_OTHER, "**eqdropped");
+
+                if (ret == PTL_EQ_EMPTY) {
+                    ret = MPID_nem_ptl_rptl_eqget(MPIDI_nem_ptl_origin_eq, &event);
+                    MPIU_ERR_CHKANDJUMP(ret == PTL_EQ_DROPPED, mpi_errno, MPI_ERR_OTHER, "**eqdropped");
+                }
+
+                /* all EQs are empty */
+                if (ret == PTL_EQ_EMPTY)
+                    break;
+            }
         }
         MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptleqget", "**ptleqget %s", MPID_nem_ptl_strerror(ret));
         MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "Received event %s pt_idx=%d ni_fail=%s list=%s user_ptr=%p hdr_data=%#lx mlength=%lu rlength=%lu",
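
Note on the MPID_nem_ptl_poll hunk above: the nested PTL_EQ_EMPTY checks try the event queues in a fixed order (MPIDI_nem_ptl_eq, get, control, origin) and stop at the first one that yields something. A minimal standalone sketch of the same pattern, written as a loop over an array, follows. The names mock_eq_t, mock_eqget, poll_first_event and the MOCK_* codes are hypothetical stand-ins for illustration, not Portals4 or MPICH calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes standing in for PTL_OK / PTL_EQ_EMPTY / PTL_EQ_DROPPED. */
enum { MOCK_OK = 0, MOCK_EQ_EMPTY = 1, MOCK_EQ_DROPPED = 2 };

/* Hypothetical event queue holding at most one pending event. */
typedef struct {
    int has_event;   /* nonzero if an event is pending */
    int dropped;     /* nonzero if the queue reports dropped events */
    int event;       /* payload of the pending event */
} mock_eq_t;

/* Fetch the pending event, if any, and clear it from the queue. */
static int mock_eqget(mock_eq_t *eq, int *event)
{
    if (eq->dropped)
        return MOCK_EQ_DROPPED;
    if (!eq->has_event)
        return MOCK_EQ_EMPTY;
    *event = eq->event;
    eq->has_event = 0;
    return MOCK_OK;
}

/* Try each queue in order and return the first non-empty result.
 * *which receives the index of the queue that answered, or -1 if all were empty. */
static int poll_first_event(mock_eq_t *eqs, size_t n, int *event, int *which)
{
    size_t i;

    assert(eqs != NULL && event != NULL && which != NULL);
    *which = -1;
    for (i = 0; i < n; ++i) {
        int ret = mock_eqget(&eqs[i], event);
        if (ret == MOCK_EQ_EMPTY)
            continue;            /* fall through to the next queue */
        *which = (int)i;
        return ret;              /* MOCK_OK or MOCK_EQ_DROPPED */
    }
    return MOCK_EQ_EMPTY;        /* all queues are empty */
}
```

Calling poll_first_event repeatedly until it returns MOCK_EQ_EMPTY corresponds to the while (1) loop in the hunk, which restarts from the first queue after every event.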

http://git.mpich.org/mpich.git/commitdiff/6a58ae1a1aaef14a5166de9d7fe2af9556571df9

commit 6a58ae1a1aaef14a5166de9d7fe2af9556571df9
Author: Antonio Pena Monferrer <apenya at mcs.anl.gov>
Date:   Tue Nov 11 21:02:14 2014 -0600

    Added internal control portal for the get portal
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
index 62dd474..61ceaed 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
@@ -99,6 +99,7 @@ typedef struct {
     ptl_pt_index_t ptg;
     ptl_pt_index_t ptc;
     ptl_pt_index_t ptr;
+    ptl_pt_index_t ptrg;
     ptl_pt_index_t ptrc;
     int id_initialized; /* TRUE iff id and pt have been initialized */
     MPIDI_msg_sz_t num_queued_sends; /* number of reqs for this vc in sendq */
@@ -166,7 +167,7 @@ int MPID_nem_ptl_poll_finalize(void);
 int MPID_nem_ptl_poll(int is_blocking_poll);
 int MPID_nem_ptl_vc_terminated(MPIDI_VC_t *vc);
 int MPID_nem_ptl_get_id_from_bc(const char *business_card, ptl_process_t *id, ptl_pt_index_t *pt, ptl_pt_index_t *ptg,
-                                ptl_pt_index_t *ptc, ptl_pt_index_t *ptr, ptl_pt_index_t *ptrc);
+                                ptl_pt_index_t *ptc, ptl_pt_index_t *ptr, ptl_pt_index_t *ptrg, ptl_pt_index_t *ptrc);
 void MPI_nem_ptl_pack_byte(MPID_Segment *segment, MPI_Aint first, MPI_Aint last, void *buf,
                            MPID_nem_ptl_pack_overflow_t *overflow);
 int MPID_nem_ptl_unpack_byte(MPID_Segment *segment, MPI_Aint first, MPI_Aint last, void *buf,
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
index a631723..5087c92 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
@@ -20,6 +20,7 @@
 #define PTIG_KEY "PTIG"
 #define PTIC_KEY "PTIC"
 #define PTIR_KEY "PTIR"
+#define PTIRG_KEY "PTIRG"
 #define PTIRC_KEY "PTIRC"
 
 ptl_handle_ni_t MPIDI_nem_ptl_ni;
@@ -30,6 +31,7 @@ ptl_pt_index_t  MPIDI_nem_ptl_rpt_pt; /* portal for rportals control messages */
 ptl_handle_eq_t MPIDI_nem_ptl_target_eq;
 ptl_handle_eq_t MPIDI_nem_ptl_origin_eq;
 ptl_pt_index_t  MPIDI_nem_ptl_control_rpt_pt; /* portal for rportals control messages */
+ptl_pt_index_t  MPIDI_nem_ptl_get_rpt_pt; /* portal for rportals control messages */
 ptl_handle_md_t MPIDI_nem_ptl_global_md;
 ptl_ni_limits_t MPIDI_nem_ptl_ni_limits;
 
@@ -212,6 +214,11 @@ static int ptl_init(MPIDI_PG_t *pg_p, int pg_rank, char **bc_val_p, int *val_max
 
     /* allocate portal for MPICH control messages */
     ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_target_eq,
+                     PTL_PT_ANY, &MPIDI_nem_ptl_get_rpt_pt);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
+
+    /* allocate portal for MPICH control messages */
+    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_target_eq,
                      PTL_PT_ANY, &MPIDI_nem_ptl_control_rpt_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
 
@@ -237,7 +244,7 @@ static int ptl_init(MPIDI_PG_t *pg_p, int pg_rank, char **bc_val_p, int *val_max
      * we pass PTL_PT_ANY as the dummy portal.  unfortunately, portals
      * does not have an "invalid" PT constant, which would have been
      * more appropriate to pass over here. */
-    ret = MPID_nem_ptl_rptl_ptinit(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_origin_eq, MPIDI_nem_ptl_get_pt, PTL_PT_ANY);
+    ret = MPID_nem_ptl_rptl_ptinit(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_origin_eq, MPIDI_nem_ptl_get_pt, MPIDI_nem_ptl_get_rpt_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
 
     ret = MPID_nem_ptl_rptl_ptinit(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_origin_eq, MPIDI_nem_ptl_control_pt, MPIDI_nem_ptl_control_rpt_pt);
@@ -307,6 +314,9 @@ static int ptl_finalize(void)
     ret = PtlPTFree(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_rpt_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptfree", "**ptlptfree %s", MPID_nem_ptl_strerror(ret));
 
+    ret = PtlPTFree(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_rpt_pt);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptfree", "**ptlptfree %s", MPID_nem_ptl_strerror(ret));
+
     ret = PtlPTFree(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_control_rpt_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptfree", "**ptlptfree %s", MPID_nem_ptl_strerror(ret));
 
@@ -377,6 +387,12 @@ static int get_business_card(int my_rank, char **bc_val_p, int *val_max_sz_p)
         MPIU_ERR_CHKANDJUMP(str_errno == MPIU_STR_NOMEM, mpi_errno, MPI_ERR_OTHER, "**buscard_len");
         MPIU_ERR_SETANDJUMP(mpi_errno, MPI_ERR_OTHER, "**buscard");
     }
+    str_errno = MPIU_Str_add_binary_arg(bc_val_p, val_max_sz_p, PTIRG_KEY, (char *)&MPIDI_nem_ptl_get_rpt_pt,
+                                        sizeof(MPIDI_nem_ptl_get_rpt_pt));
+    if (str_errno) {
+        MPIU_ERR_CHKANDJUMP(str_errno == MPIU_STR_NOMEM, mpi_errno, MPI_ERR_OTHER, "**buscard_len");
+        MPIU_ERR_SETANDJUMP(mpi_errno, MPI_ERR_OTHER, "**buscard");
+    }
     str_errno = MPIU_Str_add_binary_arg(bc_val_p, val_max_sz_p, PTIRC_KEY, (char *)&MPIDI_nem_ptl_control_rpt_pt,
                                         sizeof(MPIDI_nem_ptl_control_rpt_pt));
     if (str_errno) {
@@ -475,7 +491,7 @@ static int vc_destroy(MPIDI_VC_t *vc)
 #define FUNCNAME MPID_nem_ptl_get_id_from_bc
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
-int MPID_nem_ptl_get_id_from_bc(const char *business_card, ptl_process_t *id, ptl_pt_index_t *pt, ptl_pt_index_t *ptg, ptl_pt_index_t *ptc, ptl_pt_index_t *ptr, ptl_pt_index_t *ptrc)
+int MPID_nem_ptl_get_id_from_bc(const char *business_card, ptl_process_t *id, ptl_pt_index_t *pt, ptl_pt_index_t *ptg, ptl_pt_index_t *ptc, ptl_pt_index_t *ptr, ptl_pt_index_t *ptrg, ptl_pt_index_t *ptrc)
 {
     int mpi_errno = MPI_SUCCESS;
     int ret;
@@ -502,6 +518,9 @@ int MPID_nem_ptl_get_id_from_bc(const char *business_card, ptl_process_t *id, pt
     ret = MPIU_Str_get_binary_arg(business_card, PTIR_KEY, (char *)ptr, sizeof(ptr), &len);
     MPIU_ERR_CHKANDJUMP(ret != MPIU_STR_SUCCESS || len != sizeof(*ptr), mpi_errno, MPI_ERR_OTHER, "**badbusinesscard");
 
+    ret = MPIU_Str_get_binary_arg(business_card, PTIRG_KEY, (char *)ptrg, sizeof(ptr), &len);
+    MPIU_ERR_CHKANDJUMP(ret != MPIU_STR_SUCCESS || len != sizeof(*ptrc), mpi_errno, MPI_ERR_OTHER, "**badbusinesscard");
+
     ret = MPIU_Str_get_binary_arg(business_card, PTIRC_KEY, (char *)ptrc, sizeof(ptr), &len);
     MPIU_ERR_CHKANDJUMP(ret != MPIU_STR_SUCCESS || len != sizeof(*ptrc), mpi_errno, MPI_ERR_OTHER, "**badbusinesscard");
 
@@ -595,7 +614,7 @@ int MPID_nem_ptl_init_id(MPIDI_VC_t *vc)
     mpi_errno = vc->pg->getConnInfo(vc->pg_rank, bc, val_max_sz, vc->pg);
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
 
-    mpi_errno = MPID_nem_ptl_get_id_from_bc(bc, &vc_ptl->id, &vc_ptl->pt, &vc_ptl->ptg, &vc_ptl->ptc, &vc_ptl->ptr, &vc_ptl->ptrc);
+    mpi_errno = MPID_nem_ptl_get_id_from_bc(bc, &vc_ptl->id, &vc_ptl->pt, &vc_ptl->ptg, &vc_ptl->ptc, &vc_ptl->ptr, &vc_ptl->ptrg, &vc_ptl->ptrc);
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
 
     vc_ptl->id_initialized = TRUE;
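
Note on the MPID_nem_ptl_get_id_from_bc hunk above: the new PTIRG_KEY lookup passes sizeof(ptr) as the buffer size and compares len against sizeof(*ptrc). sizeof(ptr) is the size of the pointer, not of the ptl_pt_index_t it points to, and the length check names ptrc rather than ptrg (equivalent only because both pointees have the same type). Whether the pointer-sized limit matters in practice depends on how MPIU_Str_get_binary_arg uses that argument, which is not shown here. The sketch below illustrates sizing by the pointee instead; pt_index_t, get_binary, and read_index are hypothetical stand-ins, and the 32-bit width is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t pt_index_t;   /* hypothetical stand-in for ptl_pt_index_t */

/* Hypothetical decoder: copies at most maxlen bytes of a stored value into
 * buf and reports how many bytes it wrote. */
static void get_binary(const void *stored, size_t stored_len,
                       char *buf, size_t maxlen, size_t *out_len)
{
    size_t n = stored_len < maxlen ? stored_len : maxlen;
    memcpy(buf, stored, n);
    *out_len = n;
}

/* Decode one index, sizing the destination by the pointee (sizeof(*ptrg)),
 * not by the pointer (sizeof(ptrg)). Returns 1 on success, 0 on a short value. */
static int read_index(const void *stored, size_t stored_len, pt_index_t *ptrg)
{
    size_t len = 0;

    assert(ptrg != NULL);
    get_binary(stored, stored_len, (char *)ptrg, sizeof(*ptrg), &len);
    return len == sizeof(*ptrg);
}
```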

http://git.mpich.org/mpich.git/commitdiff/7cad07836684993f753eafbb6b0920ba4e22bf22

commit 7cad07836684993f753eafbb6b0920ba4e22bf22
Author: Antonio J. Pena <apenya at mcs.anl.gov>
Date:   Wed Oct 22 16:23:56 2014 -0500

    Fix Portals4 RMA
    
    Full redesign, mainly of the functions in ptl_nm.c and the
    communications involving the "control" portal. Still some
    problems with flow control.
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
index f94fa9a..62dd474 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
@@ -99,6 +99,7 @@ typedef struct {
     ptl_pt_index_t ptg;
     ptl_pt_index_t ptc;
     ptl_pt_index_t ptr;
+    ptl_pt_index_t ptrc;
     int id_initialized; /* TRUE iff id and pt have been initialized */
     MPIDI_msg_sz_t num_queued_sends; /* number of reqs for this vc in sendq */
 } MPID_nem_ptl_vc_area;
@@ -153,7 +154,7 @@ typedef struct {
 
 int MPID_nem_ptl_nm_init(void);
 int MPID_nem_ptl_nm_finalize(void);
-int MPID_nem_ptl_nm_event_handler(const ptl_event_t *e);
+int MPID_nem_ptl_nm_ctl_event_handler(const ptl_event_t *e);
 int MPID_nem_ptl_sendq_complete_with_error(MPIDI_VC_t *vc, int req_errno);
 int MPID_nem_ptl_SendNoncontig(MPIDI_VC_t *vc, MPID_Request *sreq, void *hdr, MPIDI_msg_sz_t hdr_sz);
 int MPID_nem_ptl_iStartContigMsg(MPIDI_VC_t *vc, void *hdr, MPIDI_msg_sz_t hdr_sz, void *data, MPIDI_msg_sz_t data_sz,
@@ -165,7 +166,7 @@ int MPID_nem_ptl_poll_finalize(void);
 int MPID_nem_ptl_poll(int is_blocking_poll);
 int MPID_nem_ptl_vc_terminated(MPIDI_VC_t *vc);
 int MPID_nem_ptl_get_id_from_bc(const char *business_card, ptl_process_t *id, ptl_pt_index_t *pt, ptl_pt_index_t *ptg,
-                                ptl_pt_index_t *ptc, ptl_pt_index_t *ptr);
+                                ptl_pt_index_t *ptc, ptl_pt_index_t *ptr, ptl_pt_index_t *ptrc);
 void MPI_nem_ptl_pack_byte(MPID_Segment *segment, MPI_Aint first, MPI_Aint last, void *buf,
                            MPID_nem_ptl_pack_overflow_t *overflow);
 int MPID_nem_ptl_unpack_byte(MPID_Segment *segment, MPI_Aint first, MPI_Aint last, void *buf,
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
index 96ada05..a631723 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
@@ -20,6 +20,7 @@
 #define PTIG_KEY "PTIG"
 #define PTIC_KEY "PTIC"
 #define PTIR_KEY "PTIR"
+#define PTIRC_KEY "PTIRC"
 
 ptl_handle_ni_t MPIDI_nem_ptl_ni;
 ptl_pt_index_t  MPIDI_nem_ptl_pt;
@@ -28,6 +29,7 @@ ptl_pt_index_t  MPIDI_nem_ptl_control_pt; /* portal for MPICH control messages *
 ptl_pt_index_t  MPIDI_nem_ptl_rpt_pt; /* portal for rportals control messages */
 ptl_handle_eq_t MPIDI_nem_ptl_target_eq;
 ptl_handle_eq_t MPIDI_nem_ptl_origin_eq;
+ptl_pt_index_t  MPIDI_nem_ptl_control_rpt_pt; /* portal for rportals control messages */
 ptl_handle_md_t MPIDI_nem_ptl_global_md;
 ptl_ni_limits_t MPIDI_nem_ptl_ni_limits;
 
@@ -114,7 +116,7 @@ static int get_target_info(int rank, ptl_process_t *id, ptl_pt_index_t local_dat
     }
     else if (local_data_pt == MPIDI_nem_ptl_control_pt) {
         *target_data_pt = vc_ptl->ptc;
-        *target_control_pt = PTL_PT_ANY;
+        *target_control_pt = vc_ptl->ptrc;
     }
 
  fn_exit:
@@ -208,6 +210,11 @@ static int ptl_init(MPIDI_PG_t *pg_p, int pg_rank, char **bc_val_p, int *val_max
                      PTL_PT_ANY, &MPIDI_nem_ptl_rpt_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
 
+    /* allocate portal for MPICH control messages */
+    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_target_eq,
+                     PTL_PT_ANY, &MPIDI_nem_ptl_control_rpt_pt);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
+
     /* create an MD that covers all of memory */
     md.start = 0;
     md.length = (ptl_size_t)-1;
@@ -226,14 +233,14 @@ static int ptl_init(MPIDI_PG_t *pg_p, int pg_rank, char **bc_val_p, int *val_max
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
 
     /* allow rportal to manage the get and control portals, but we
-     * don't expect retransmission to be needed on these portals, so
+     * don't expect retransmission to be needed on the get portal, so
      * we pass PTL_PT_ANY as the dummy portal.  unfortunately, portals
      * does not have an "invalid" PT constant, which would have been
      * more appropriate to pass over here. */
     ret = MPID_nem_ptl_rptl_ptinit(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_origin_eq, MPIDI_nem_ptl_get_pt, PTL_PT_ANY);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
 
-    ret = MPID_nem_ptl_rptl_ptinit(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_origin_eq, MPIDI_nem_ptl_control_pt, PTL_PT_ANY);
+    ret = MPID_nem_ptl_rptl_ptinit(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_origin_eq, MPIDI_nem_ptl_control_pt, MPIDI_nem_ptl_control_rpt_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
 
     /* create business card */
@@ -300,6 +307,9 @@ static int ptl_finalize(void)
     ret = PtlPTFree(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_rpt_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptfree", "**ptlptfree %s", MPID_nem_ptl_strerror(ret));
 
+    ret = PtlPTFree(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_control_rpt_pt);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptfree", "**ptlptfree %s", MPID_nem_ptl_strerror(ret));
+
     ret = PtlNIFini(MPIDI_nem_ptl_ni);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlnifini", "**ptlnifini %s", MPID_nem_ptl_strerror(ret));
 
@@ -367,6 +377,12 @@ static int get_business_card(int my_rank, char **bc_val_p, int *val_max_sz_p)
         MPIU_ERR_CHKANDJUMP(str_errno == MPIU_STR_NOMEM, mpi_errno, MPI_ERR_OTHER, "**buscard_len");
         MPIU_ERR_SETANDJUMP(mpi_errno, MPI_ERR_OTHER, "**buscard");
     }
+    str_errno = MPIU_Str_add_binary_arg(bc_val_p, val_max_sz_p, PTIRC_KEY, (char *)&MPIDI_nem_ptl_control_rpt_pt,
+                                        sizeof(MPIDI_nem_ptl_control_rpt_pt));
+    if (str_errno) {
+        MPIU_ERR_CHKANDJUMP(str_errno == MPIU_STR_NOMEM, mpi_errno, MPI_ERR_OTHER, "**buscard_len");
+        MPIU_ERR_SETANDJUMP(mpi_errno, MPI_ERR_OTHER, "**buscard");
+    }
 
  fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_GET_BUSINESS_CARD);
@@ -435,6 +451,8 @@ static int vc_init(MPIDI_VC_t *vc)
     vc_ptl->id_initialized = FALSE;
     vc_ptl->num_queued_sends = 0;
 
+    mpi_errno = MPID_nem_ptl_init_id(vc);
+
     MPIDI_FUNC_EXIT(MPID_STATE_VC_INIT);
     return mpi_errno;
 }
@@ -457,7 +475,7 @@ static int vc_destroy(MPIDI_VC_t *vc)
 #define FUNCNAME MPID_nem_ptl_get_id_from_bc
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
-int MPID_nem_ptl_get_id_from_bc(const char *business_card, ptl_process_t *id, ptl_pt_index_t *pt, ptl_pt_index_t *ptg, ptl_pt_index_t *ptc, ptl_pt_index_t *ptr)
+int MPID_nem_ptl_get_id_from_bc(const char *business_card, ptl_process_t *id, ptl_pt_index_t *pt, ptl_pt_index_t *ptg, ptl_pt_index_t *ptc, ptl_pt_index_t *ptr, ptl_pt_index_t *ptrc)
 {
     int mpi_errno = MPI_SUCCESS;
     int ret;
@@ -484,6 +502,9 @@ int MPID_nem_ptl_get_id_from_bc(const char *business_card, ptl_process_t *id, pt
     ret = MPIU_Str_get_binary_arg(business_card, PTIR_KEY, (char *)ptr, sizeof(ptr), &len);
     MPIU_ERR_CHKANDJUMP(ret != MPIU_STR_SUCCESS || len != sizeof(*ptr), mpi_errno, MPI_ERR_OTHER, "**badbusinesscard");
 
+    ret = MPIU_Str_get_binary_arg(business_card, PTIRC_KEY, (char *)ptrc, sizeof(ptr), &len);
+    MPIU_ERR_CHKANDJUMP(ret != MPIU_STR_SUCCESS || len != sizeof(*ptrc), mpi_errno, MPI_ERR_OTHER, "**badbusinesscard");
+
  fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_GET_ID_FROM_BC);
     return mpi_errno;
@@ -509,8 +530,6 @@ int vc_terminate(MPIDI_VC_t *vc)
            outstanding sends with an error and terminate
            connection immediately. */
         MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", vc->pg_rank);
-        mpi_errno = MPID_nem_ptl_sendq_complete_with_error(vc, req_errno);
-        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
         mpi_errno = MPID_nem_ptl_vc_terminated(vc);
         if (mpi_errno) MPIU_ERR_POP(mpi_errno);
      } else if (vc_ptl->num_queued_sends == 0) {
@@ -576,7 +595,7 @@ int MPID_nem_ptl_init_id(MPIDI_VC_t *vc)
     mpi_errno = vc->pg->getConnInfo(vc->pg_rank, bc, val_max_sz, vc->pg);
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
 
-    mpi_errno = MPID_nem_ptl_get_id_from_bc(bc, &vc_ptl->id, &vc_ptl->pt, &vc_ptl->ptg, &vc_ptl->ptc, &vc_ptl->ptr);
+    mpi_errno = MPID_nem_ptl_get_id_from_bc(bc, &vc_ptl->id, &vc_ptl->pt, &vc_ptl->ptg, &vc_ptl->ptc, &vc_ptl->ptr, &vc_ptl->ptrc);
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
 
     vc_ptl->id_initialized = TRUE;
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
index f0d447d..60e8db8 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
@@ -1,46 +1,42 @@
 /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
 /*
- *  (C) 2012 by Argonne National Laboratory.
+ *  (C) 2014 by Argonne National Laboratory.
  *      See COPYRIGHT in top-level directory.
  */
 
 #include "ptl_impl.h"
+#include "stddef.h"  /* C99; for offsetof */
 #include <mpl_utlist.h>
 #include "rptl.h"
 
-#define NUM_SEND_BUFS 100
-#define NUM_RECV_BUFS 100
-#define BUFLEN  (sizeof(MPIDI_CH3_Pkt_t) + PTL_MAX_EAGER)
+#define NUM_RECV_BUFS 50
+#define CTL_TAG 0
+#define PAYLOAD_SIZE  (PTL_MAX_EAGER - offsetof(buf_t, packet) - sizeof(MPIDI_CH3_Pkt_t))
+#define SENDBUF_SIZE(sent_sz_) (offsetof(buf_t, packet) + sizeof(MPIDI_CH3_Pkt_t) + (sent_sz_))
+#define SENDBUF(req_) REQ_PTL(req_)->chunk_buffer[0]
+#define TMPBUF(req_) REQ_PTL(req_)->chunk_buffer[1]
+#define NEW_TAG(tag_) do {     \
+    global_tag += 2;           \
+    if (global_tag == CTL_TAG) \
+        global_tag += 2;       \
+    (tag_) = global_tag;       \
+} while(0)
+#define GET_TAG(tag_)  (((tag_) >> 1) << 1)
+#define DONE_TAG(tag_) ((tag_) | 0x1)
+
+typedef struct {
+    size_t remaining;
+    ptl_match_bits_t tag;
+    char packet[PTL_MAX_EAGER];
+} buf_t;
+
+static buf_t recvbufs[NUM_RECV_BUFS];
+static ptl_me_t mes[NUM_RECV_BUFS];
+static ptl_handle_me_t me_handles[NUM_RECV_BUFS];
+static unsigned long long put_cnt = 0;  /* required to not finalizing too early */
+static MPID_Request *done_req;
+static ptl_match_bits_t global_tag = 0;
 
-typedef struct MPID_nem_ptl_sendbuf {
-    struct MPID_nem_ptl_sendbuf *next;
-    union {
-        struct {
-            MPIDI_CH3_Pkt_t hdr;
-            char payload[PTL_MAX_EAGER];
-        } hp; /* header+payload */
-        char p[BUFLEN]; /* just payload */
-    } buf;
-} MPID_nem_ptl_sendbuf_t;
-
-static MPID_nem_ptl_sendbuf_t sendbuf[NUM_SEND_BUFS];
-static MPID_nem_ptl_sendbuf_t *free_head = NULL;
-static MPID_nem_ptl_sendbuf_t *free_tail = NULL;
-
-static char recvbuf[BUFLEN][NUM_RECV_BUFS];
-static ptl_me_t recvbuf_me[NUM_RECV_BUFS];
-static ptl_handle_me_t recvbuf_me_handle[NUM_RECV_BUFS];
-
-#define FREE_EMPTY() (free_head == NULL)
-#define FREE_HEAD() free_head
-#define FREE_PUSH(buf_p) MPL_LL_PREPEND(free_head, free_tail, buf_p)
-#define FREE_POP(buf_pp) do { *(buf_pp) = free_head; MPL_LL_DELETE(free_head, free_tail, free_head); } while (0)
-
-static struct {MPID_Request *head, *tail;} send_queue;
-
-static int send_queued(void);
-
-static void vc_dbg_print_sendq(FILE *stream, MPIDI_VC_t *vc) {/* FIXME: write real function */ return;}
 
 #undef FUNCNAME
 #define FUNCNAME MPID_nem_ptl_nm_init
@@ -56,36 +52,33 @@ int MPID_nem_ptl_nm_init(void)
 
     MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_NM_INIT);
 
-    MPIU_Assert(BUFLEN == sizeof(sendbuf->buf));
-
-    /* init send */
-    for (i = 0; i < NUM_SEND_BUFS; ++i)
-        FREE_PUSH(&sendbuf[i]);
-
-    send_queue.head = send_queue.tail = NULL;
-
-    MPID_nem_net_module_vc_dbg_print_sendq = vc_dbg_print_sendq;
-
     /* init recv */
     id_any.phys.pid = PTL_PID_ANY;
     id_any.phys.nid = PTL_NID_ANY;
     
     for (i = 0; i < NUM_RECV_BUFS; ++i) {
-        recvbuf_me[i].start = recvbuf[i];
-        recvbuf_me[i].length = BUFLEN;
-        recvbuf_me[i].ct_handle = PTL_CT_NONE;
-        recvbuf_me[i].uid = PTL_UID_ANY;
-        recvbuf_me[i].options = (PTL_ME_OP_PUT | PTL_ME_USE_ONCE | PTL_ME_EVENT_UNLINK_DISABLE |
-                                 PTL_ME_EVENT_LINK_DISABLE | PTL_ME_IS_ACCESSIBLE);
-        recvbuf_me[i].match_id = id_any;
-        recvbuf_me[i].match_bits = 0;
-        recvbuf_me[i].ignore_bits = (ptl_match_bits_t)~0;
-
-        ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_control_pt, &recvbuf_me[i], PTL_PRIORITY_LIST, (void *)(uint64_t)i,
-                          &recvbuf_me_handle[i]);
-        MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));
+        mes[i].start = &recvbufs[i];
+        mes[i].length = sizeof(buf_t);
+        mes[i].ct_handle = PTL_CT_NONE;
+        mes[i].uid = PTL_UID_ANY;
+        mes[i].options = (PTL_ME_OP_PUT | PTL_ME_USE_ONCE | PTL_ME_EVENT_UNLINK_DISABLE |
+                         PTL_ME_EVENT_LINK_DISABLE | PTL_ME_IS_ACCESSIBLE);
+        mes[i].match_id = id_any;
+        mes[i].match_bits = CTL_TAG;
+        mes[i].ignore_bits = 0;
+
+        ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_control_pt, &mes[i],
+                          PTL_PRIORITY_LIST, (void *)(uint64_t)i, &me_handles[i]);
+        MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s",
+                             MPID_nem_ptl_strerror(ret));
     }
 
+    done_req = MPID_Request_create();
+    MPIU_Assert(done_req != NULL);
+    done_req->dev.OnDataAvail = NULL;
+    SENDBUF(done_req) = NULL;
+    REQ_PTL(done_req)->event_handler = MPID_nem_ptl_nm_ctl_event_handler;
+
  fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_NM_INIT);
     return mpi_errno;
@@ -106,11 +99,16 @@ int MPID_nem_ptl_nm_finalize(void)
 
     MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_NM_FINALIZE);
 
+    while (put_cnt) MPID_nem_ptl_poll(1);  /* Wait for puts to finish */
+
     for (i = 0; i < NUM_RECV_BUFS; ++i) {
-        ret = PtlMEUnlink(recvbuf_me_handle[i]);
-        MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeunlink", "**ptlmeunlink %s", MPID_nem_ptl_strerror(ret));
+        ret = PtlMEUnlink(me_handles[i]);
+        MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeunlink", "**ptlmeunlink %s",
+                             MPID_nem_ptl_strerror(ret));
     }
 
+    MPIDI_CH3_Request_destroy(done_req);
+
  fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_NM_FINALIZE);
     return mpi_errno;
@@ -119,111 +117,140 @@ int MPID_nem_ptl_nm_finalize(void)
 }
 
 #undef FUNCNAME
-#define FUNCNAME MPID_nem_ptl_sendq_complete_with_error
+#define FUNCNAME meappend_done
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
-int MPID_nem_ptl_sendq_complete_with_error(MPIDI_VC_t *vc, int req_errno)
+static inline int meappend_done(ptl_process_t id, MPID_Request *req, ptl_match_bits_t tag)
 {
     int mpi_errno = MPI_SUCCESS;
-    MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_SENDQ_COMPLETE_WITH_ERROR);
-
-    MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_SENDQ_COMPLETE_WITH_ERROR);
-
+    int ret;
+    ptl_me_t me;
+    ptl_handle_me_t me_handle;
+    MPIDI_STATE_DECL(MPID_STATE_MEAPPEND_DONE);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_MEAPPEND_DONE);
+
+    me.start = NULL;
+    me.length = 0;
+    me.ct_handle = PTL_CT_NONE;
+    me.uid = PTL_UID_ANY;
+    me.options = ( PTL_ME_OP_PUT | PTL_ME_USE_ONCE | PTL_ME_IS_ACCESSIBLE |
+                   PTL_ME_EVENT_LINK_DISABLE | PTL_ME_EVENT_UNLINK_DISABLE );
+    me.match_id = id;
+    me.match_bits = DONE_TAG(tag);
+    me.ignore_bits = 0;
+    me.min_free = 0;
+    ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_control_pt, &me, PTL_PRIORITY_LIST, req,
+                      &me_handle);
+    MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "PtlMEAppend(req=%p tag=%#lx)", req, DONE_TAG(tag)));
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s",
+                         MPID_nem_ptl_strerror(ret));
+    ++put_cnt;
 
  fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_SENDQ_COMPLETE_WITH_ERROR);
+    MPIDI_FUNC_EXIT(MPID_STATE_MEAPPEND_DONE);
     return mpi_errno;
  fn_fail:
     goto fn_exit;
 }
 
-
-
 #undef FUNCNAME
-#define FUNCNAME save_iov
+#define FUNCNAME meappend_large
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
-static inline void save_iov(MPID_Request *sreq, void *hdr, void *data, MPIDI_msg_sz_t data_sz)
+static inline int meappend_large(ptl_process_t id, MPID_Request *req, ptl_match_bits_t tag, void *buf, size_t remaining)
 {
-    int index = 0;
-    MPIDI_STATE_DECL(MPID_STATE_SAVE_IOV);
-
-    MPIDI_FUNC_ENTER(MPID_STATE_SAVE_IOV);
-
-    MPIU_Assert(hdr || data_sz);
-    
-    if (hdr) {
-        sreq->dev.pending_pkt = *(MPIDI_CH3_Pkt_t *)hdr;
-        sreq->dev.iov[index].MPID_IOV_BUF = &sreq->dev.pending_pkt;
-        sreq->dev.iov[index].MPID_IOV_LEN = sizeof(MPIDI_CH3_Pkt_t);
-        ++index;
-    }
-    if (data_sz) {
-        sreq->dev.iov[index].MPID_IOV_BUF = data;
-        sreq->dev.iov[index].MPID_IOV_LEN = data_sz;
-        ++index;
+    int mpi_errno = MPI_SUCCESS;
+    int ret;
+    ptl_me_t me;
+    MPIDI_STATE_DECL(MPID_STATE_MEAPPEND_LARGE);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_MEAPPEND_LARGE);
+
+    me.start = buf;
+    me.length = remaining < MPIDI_nem_ptl_ni_limits.max_msg_size ?
+                    remaining : MPIDI_nem_ptl_ni_limits.max_msg_size;
+    me.ct_handle = PTL_CT_NONE;
+    me.uid = PTL_UID_ANY;
+    me.options = ( PTL_ME_OP_GET | PTL_ME_USE_ONCE | PTL_ME_IS_ACCESSIBLE |
+                   PTL_ME_EVENT_LINK_DISABLE | PTL_ME_EVENT_UNLINK_DISABLE );
+    me.match_id = id;
+    me.match_bits = GET_TAG(tag);
+    me.ignore_bits = 0;
+    me.min_free = 0;
+
+    while (remaining) {
+        int incomplete;
+        ptl_handle_me_t foo_me_handle;
+
+        MPIDI_CH3U_Request_increment_cc(req, &incomplete);  /* Cannot avoid GET events from poll infrastructure */
+
+        ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_control_pt, &me, PTL_PRIORITY_LIST, req,
+                          &foo_me_handle);
+        MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s",
+                             MPID_nem_ptl_strerror(ret));
+        MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "PtlMEAppend(req=%p tag=%#lx)", req, GET_TAG(tag)));
+
+        me.start = (char *)me.start + me.length;
+        remaining -= me.length;
+        if (remaining < MPIDI_nem_ptl_ni_limits.max_msg_size)
+            me.length = remaining;
     }
-    sreq->dev.iov_count = index;
 
-    MPIDI_FUNC_EXIT(MPID_STATE_SAVE_IOV);
+ fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_MEAPPEND_LARGE);
+    return mpi_errno;
+ fn_fail:
+    goto fn_exit;
 }
 
 #undef FUNCNAME
 #define FUNCNAME send_pkt
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
-static inline int send_pkt(MPIDI_VC_t *vc, void **vhdr_p, void **vdata_p, MPIDI_msg_sz_t *data_sz_p)
+static inline int send_pkt(MPIDI_VC_t *vc, void *hdr_p, void *data_p, MPIDI_msg_sz_t data_sz,
+                           MPID_Request *sreq)
 {
     int mpi_errno = MPI_SUCCESS;
-    MPID_nem_ptl_sendbuf_t *sb;
     MPID_nem_ptl_vc_area *const vc_ptl = VC_PTL(vc);
     int ret;
-    MPIDI_CH3_Pkt_t **hdr_p = (MPIDI_CH3_Pkt_t **)vhdr_p;
-    char **data_p = (char **)vdata_p;
+    buf_t *sendbuf;
+    const size_t sent_sz = data_sz < PAYLOAD_SIZE ? data_sz : PAYLOAD_SIZE;
+    const size_t sendbuf_sz = SENDBUF_SIZE(sent_sz);
     MPIDI_STATE_DECL(MPID_STATE_SEND_PKT);
 
     MPIDI_FUNC_ENTER(MPID_STATE_SEND_PKT);
     
-    if (!vc_ptl->id_initialized) {
-        mpi_errno = MPID_nem_ptl_init_id(vc);
-        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-    }
+    sendbuf = MPIU_Malloc(sendbuf_sz);
+    MPIU_Assert(sendbuf != NULL);
+    MPIU_Memcpy(sendbuf->packet, hdr_p, sizeof(MPIDI_CH3_Pkt_t));
+    sendbuf->remaining = data_sz - sent_sz;
+    NEW_TAG(sendbuf->tag);
+    TMPBUF(sreq) = NULL;
 
-    if (MPIDI_CH3I_Sendq_empty(send_queue) && !FREE_EMPTY()) {
-        MPIDI_msg_sz_t len;
-        /* send header and first chunk of data */
-        FREE_POP(&sb);
-        sb->buf.hp.hdr = **hdr_p;
-        len = *data_sz_p;
-        if (len > PTL_MAX_EAGER)
-            len = PTL_MAX_EAGER;
-        MPIU_Memcpy(sb->buf.hp.payload, *data_p, len);
-        ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)sb->buf.p, sizeof(sb->buf.hp.hdr) + len, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->ptc, 0, 0, sb,
-                                    MPIDI_Process.my_pg_rank, 1);
-        MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
-        MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "MPID_nem_ptl_rptl_put(size=%lu id=(%#x,%#x) pt=%#x) sb=%p",
-                                                sizeof(sb->buf.hp.hdr) + len, vc_ptl->id.phys.nid, vc_ptl->id.phys.pid,
-                                                vc_ptl->ptc, sb));
-        *hdr_p = NULL;
-        *data_p += len;
-        *data_sz_p -= len;
-
-        /* send additional data chunks if necessary */
-        while (*data_sz_p && !FREE_EMPTY()) {
-            FREE_POP(&sb);
-            len = *data_sz_p;
-            if (len > BUFLEN)
-                len = BUFLEN;
-            MPIU_Memcpy(sb->buf.p, *data_p, len);
-            ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)sb->buf.p, len, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->ptc, 0, 0, sb, MPIDI_Process.my_pg_rank, 1);
-            MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
-            MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "MPID_nem_ptl_rptl_put(size=%lu id=(%#x,%#x) pt=%#x) sb=%p", len,
-                                                    vc_ptl->id.phys.nid, vc_ptl->id.phys.pid, vc_ptl->ptc, sb));
-            *data_p += len;
-            *data_sz_p -= len;
-        }
+    if (data_sz) {
+        MPIU_Memcpy(sendbuf->packet + sizeof(MPIDI_CH3_Pkt_t), data_p, sent_sz);
+        if (sendbuf->remaining)  /* Post MEs for the remote gets */
+            mpi_errno = meappend_large(vc_ptl->id, sreq, sendbuf->tag, (char *)data_p + sent_sz, sendbuf->remaining);
+            if (mpi_errno)
+                goto fn_fail;
     }
 
+    SENDBUF(sreq) = sendbuf;
+    REQ_PTL(sreq)->event_handler = MPID_nem_ptl_nm_ctl_event_handler;
+
+    /* Post ME for the DONE message */
+    mpi_errno = meappend_done(vc_ptl->id, sreq, sendbuf->tag);
+    if (mpi_errno)
+        goto fn_fail;
+
+    ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)sendbuf, sendbuf_sz, PTL_NO_ACK_REQ,
+                                vc_ptl->id, vc_ptl->ptc, CTL_TAG, 0, sreq, MPIDI_Process.my_pg_rank, 1);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s",
+                         MPID_nem_ptl_strerror(ret));
+    MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "PtlPut(size=%lu id=(%#x,%#x) pt=%#x)",
+                                            sendbuf_sz, vc_ptl->id.phys.nid,
+                                            vc_ptl->id.phys.pid, vc_ptl->ptc));
  fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_SEND_PKT);
     return mpi_errno;
@@ -235,116 +262,61 @@ static inline int send_pkt(MPIDI_VC_t *vc, void **vhdr_p, void **vdata_p, MPIDI_
 #define FUNCNAME send_noncontig_pkt
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
-static int send_noncontig_pkt(MPIDI_VC_t *vc, MPID_Request *sreq, void **vhdr_p, int *complete)
+static int send_noncontig_pkt(MPIDI_VC_t *vc, MPID_Request *sreq, void *hdr_p)
 {
     int mpi_errno = MPI_SUCCESS;
-    MPID_nem_ptl_sendbuf_t *sb;
     MPID_nem_ptl_vc_area *const vc_ptl = VC_PTL(vc);
     int ret;
-    MPIDI_msg_sz_t last;
-    MPIDI_CH3_Pkt_t **hdr_p = (MPIDI_CH3_Pkt_t **)vhdr_p;
+    buf_t *sendbuf;
+    const size_t sent_sz = sreq->dev.segment_size < PAYLOAD_SIZE ? sreq->dev.segment_size : PAYLOAD_SIZE;
+    size_t sendbuf_sz = SENDBUF_SIZE(sent_sz);
     MPIDI_STATE_DECL(MPID_STATE_SEND_NONCONTIG_PKT);
-
     MPIDI_FUNC_ENTER(MPID_STATE_SEND_NONCONTIG_PKT);
 
-    *complete = 0;
-    MPID_nem_ptl_init_req(sreq);
+    MPIU_Assert(sreq->dev.segment_first == 0);
 
-    if (!vc_ptl->id_initialized) {
-        mpi_errno = MPID_nem_ptl_init_id(vc);
-        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-    }
+    sendbuf = MPIU_Malloc(sendbuf_sz);
+    MPIU_Assert(sendbuf != NULL);
+    MPIU_Memcpy(sendbuf->packet, hdr_p, sizeof(MPIDI_CH3_Pkt_t));
+    sendbuf->remaining = sreq->dev.segment_size - sent_sz;
+    NEW_TAG(sendbuf->tag);
+    TMPBUF(sreq) = NULL;
 
-    if (MPIDI_CH3I_Sendq_empty(send_queue) && !FREE_EMPTY()) {
-        /* send header and first chunk of data */
-        FREE_POP(&sb);
-        sb->buf.hp.hdr = **hdr_p;
-
-        MPIU_Assert(sreq->dev.segment_first == 0);
-
-        last = sreq->dev.segment_size;
-        if (last > PTL_MAX_EAGER)
-            last = PTL_MAX_EAGER;
-        MPI_nem_ptl_pack_byte(sreq->dev.segment_ptr, 0, last, sb->buf.hp.payload, &REQ_PTL(sreq)->overflow[0]);
-        ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)sb->buf.p, sizeof(sb->buf.hp.hdr) + last, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->ptc, 0, 0, sb,
-                                    MPIDI_Process.my_pg_rank, 1);
-        MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
-        MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "MPID_nem_ptl_rptl_put(size=%lu id=(%#x,%#x) pt=%#x) sb=%p",
-                                                sizeof(sb->buf.hp.hdr) + last, vc_ptl->id.phys.nid, vc_ptl->id.phys.pid,
-                                                vc_ptl->ptc, sb));
-        *vhdr_p = NULL;
-
-        if (last == sreq->dev.segment_size) {
-            *complete = 1;
-            goto fn_exit;
-        }
-        
-        /* send additional data chunks */
-        sreq->dev.segment_first = last;
+    if (sreq->dev.segment_size) {
+        MPIDI_msg_sz_t last = sent_sz;
+        MPID_Segment_pack(sreq->dev.segment_ptr, 0, &last, sendbuf->packet + sizeof(MPIDI_CH3_Pkt_t));
 
-        while (!FREE_EMPTY()) {
-            FREE_POP(&sb);
-            
+        if (sendbuf->remaining) {  /* Post MEs for the remote gets */
+            TMPBUF(sreq) = MPIU_Malloc(sendbuf->remaining);
+            sreq->dev.segment_first = last;
             last = sreq->dev.segment_size;
-            if (last > sreq->dev.segment_first+BUFLEN)
-                last = sreq->dev.segment_first+BUFLEN;
+            MPID_Segment_pack(sreq->dev.segment_ptr, sreq->dev.segment_first, &last, TMPBUF(sreq));
+            MPIU_Assert(last == sreq->dev.segment_size);
 
-            MPI_nem_ptl_pack_byte(sreq->dev.segment_ptr, sreq->dev.segment_first, last, sb->buf.p, &REQ_PTL(sreq)->overflow[0]);
-            sreq->dev.segment_first = last;
-            ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)sb->buf.p, last - sreq->dev.segment_first, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->ptc, 0, 0, sb,
-                                        MPIDI_Process.my_pg_rank, 1);
-            MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
-            MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "MPID_nem_ptl_rptl_put(size=%lu id=(%#x,%#x) pt=%#x) sb=%p",
-                                                    last - sreq->dev.segment_first, vc_ptl->id.phys.nid, vc_ptl->id.phys.pid,
-                                                    vc_ptl->ptc, sb));
-
-            if (last == sreq->dev.segment_size) {
-                *complete = 1;
-                goto fn_exit;
-            }
+            mpi_errno = meappend_large(vc_ptl->id, sreq, sendbuf->tag, TMPBUF(sreq), sendbuf->remaining);
+            if (mpi_errno)
+                goto fn_fail;
         }
     }
 
- fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_SEND_NONCONTIG_PKT);
-    return mpi_errno;
- fn_fail:
-    goto fn_exit;
-}
-
+    SENDBUF(sreq) = sendbuf;
+    REQ_PTL(sreq)->event_handler = MPID_nem_ptl_nm_ctl_event_handler;
 
-#undef FUNCNAME
-#define FUNCNAME enqueue_request
-#undef FCNAME
-#define FCNAME MPIU_QUOTE(FUNCNAME)
-static int enqueue_request(MPIDI_VC_t *vc, MPID_Request *sreq)
-{
-    int mpi_errno = MPI_SUCCESS;
-    MPID_nem_ptl_vc_area *const vc_ptl = VC_PTL(vc);
-    MPIDI_STATE_DECL(MPID_STATE_ENQUEUE_REQUEST);
+    /* Post ME for the DONE message */
+    mpi_errno = meappend_done(vc_ptl->id, sreq, sendbuf->tag);
+    if (mpi_errno)
+        goto fn_fail;
 
-    MPIDI_FUNC_ENTER(MPID_STATE_ENQUEUE_REQUEST);
-    
-    MPIU_DBG_MSG (CH3_CHANNEL, VERBOSE, "enqueuing");
-    MPIU_Assert(FREE_EMPTY() || !MPIDI_CH3I_Sendq_empty(send_queue));
-    MPIU_Assert(sreq->dev.iov_count >= 1 && sreq->dev.iov[0].MPID_IOV_LEN > 0);
-
-    sreq->ch.vc = vc;
-    sreq->dev.iov_offset = 0;
-
-    ++(vc_ptl->num_queued_sends);
-        
-    if (FREE_EMPTY()) {
-        MPIDI_CH3I_Sendq_enqueue(&send_queue, sreq);
-    } else {
-        /* there are other sends in the queue before this one: try to send from the queue */
-        MPIDI_CH3I_Sendq_enqueue(&send_queue, sreq);
-        mpi_errno = send_queued();
-        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-    }
+    ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)sendbuf, sendbuf_sz, PTL_NO_ACK_REQ,
+                                vc_ptl->id, vc_ptl->ptc, CTL_TAG, 0, sreq, MPIDI_Process.my_pg_rank, 1);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s",
+                         MPID_nem_ptl_strerror(ret));
+    MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "PtlPut(size=%lu id=(%#x,%#x) pt=%#x)",
+                                            sendbuf_sz, vc_ptl->id.phys.nid,
+                                            vc_ptl->id.phys.pid, vc_ptl->ptc));
 
  fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_ENQUEUE_REQUEST);
+    MPIDI_FUNC_EXIT(MPID_STATE_SEND_NONCONTIG_PKT);
     return mpi_errno;
  fn_fail:
     goto fn_exit;
@@ -358,46 +330,14 @@ static int enqueue_request(MPIDI_VC_t *vc, MPID_Request *sreq)
 int MPID_nem_ptl_SendNoncontig(MPIDI_VC_t *vc, MPID_Request *sreq, void *hdr, MPIDI_msg_sz_t hdr_sz)
 {
     int mpi_errno = MPI_SUCCESS;
-    int complete = 0;
     MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_SENDNONCONTIG);
 
     MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_SENDNONCONTIG);
-    MPIU_ERR_SETFATALANDJUMP(mpi_errno, MPI_ERR_OTHER, "**notimpl");
     
     MPIU_Assert(hdr_sz <= sizeof(MPIDI_CH3_Pkt_t));
-    
-    mpi_errno = send_noncontig_pkt(vc, sreq, &hdr, &complete);
+    mpi_errno = send_noncontig_pkt(vc, sreq, hdr);
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
     
-    if (complete) {
-        /* sent whole message */
-        int (*reqFn)(MPIDI_VC_t *, MPID_Request *, int *);
-        reqFn = sreq->dev.OnDataAvail;
-        if (!reqFn) {
-            MPIU_Assert(MPIDI_Request_get_type(sreq) != MPIDI_REQUEST_TYPE_GET_RESP);
-            MPIDI_CH3U_Request_complete(sreq);
-            MPIU_DBG_MSG(CH3_CHANNEL, VERBOSE, ".... complete");
-            goto fn_exit;
-        } else {
-            complete = 0;
-            mpi_errno = reqFn(vc, sreq, &complete);
-            if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-                        
-            if (complete) {
-                MPIU_DBG_MSG(CH3_CHANNEL, VERBOSE, ".... complete");
-                goto fn_exit;
-            }
-            /* not completed: more to send */
-        }
-    }
-
-    REQ_PTL(sreq)->noncontig = TRUE;
-    save_iov(sreq, hdr, NULL, 0); /* save the header in IOV if necessary */
-
-    /* enqueue request */
-    mpi_errno = enqueue_request(vc, sreq);
-    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-
  fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_SENDNONCONTIG);
     return mpi_errno;
@@ -409,40 +349,25 @@ int MPID_nem_ptl_SendNoncontig(MPIDI_VC_t *vc, MPID_Request *sreq, void *hdr, MP
 #define FUNCNAME MPID_nem_ptl_iStartContigMsg
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
-int MPID_nem_ptl_iStartContigMsg(MPIDI_VC_t *vc, void *hdr, MPIDI_msg_sz_t hdr_sz, void *data, MPIDI_msg_sz_t data_sz,
-                                   MPID_Request **sreq_ptr)
+int MPID_nem_ptl_iStartContigMsg(MPIDI_VC_t *vc, void *hdr, MPIDI_msg_sz_t hdr_sz, void *data,
+                                 MPIDI_msg_sz_t data_sz, MPID_Request **sreq_ptr)
 {
     int mpi_errno = MPI_SUCCESS;
-    MPID_Request *sreq = NULL;
     MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_ISTARTCONTIGMSG);
 
     MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_ISTARTCONTIGMSG);
     MPIU_Assert(hdr_sz <= sizeof(MPIDI_CH3_Pkt_t));
 
-    mpi_errno = send_pkt(vc, &hdr, &data, &data_sz);
-    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-    
-    if (hdr == NULL && data_sz == 0) {
-        /* sent whole message */
-        *sreq_ptr = NULL;
-        goto fn_exit;
-    }
-    
     /* create a request */
-    sreq = MPID_Request_create();
-    MPIU_Assert(sreq != NULL);
-    MPIU_Object_set_ref(sreq, 2);
-    sreq->kind = MPID_REQUEST_SEND;
-
-    sreq->dev.OnDataAvail = 0;
-    REQ_PTL(sreq)->noncontig = FALSE;
-    save_iov(sreq, hdr, data, data_sz);
-
-    /* enqueue request */
-    mpi_errno = enqueue_request(vc, sreq);
+    *sreq_ptr = MPID_Request_create();
+    MPIU_Assert(*sreq_ptr != NULL);
+    MPIU_Object_set_ref(*sreq_ptr, 2);
+    (*sreq_ptr)->kind = MPID_REQUEST_SEND;
+    (*sreq_ptr)->dev.OnDataAvail = NULL;
+    (*sreq_ptr)->dev.user_buf = NULL;
+
+    mpi_errno = send_pkt(vc, hdr, data, data_sz, *sreq_ptr);
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-    
-    *sreq_ptr = sreq;
 
  fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_ISTARTCONTIGMSG);
@@ -464,40 +389,7 @@ int MPID_nem_ptl_iSendContig(MPIDI_VC_t *vc, MPID_Request *sreq, void *hdr, MPID
     MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_ISENDCONTIG);
     MPIU_Assert(hdr_sz <= sizeof(MPIDI_CH3_Pkt_t));
     
-    mpi_errno = send_pkt(vc, &hdr, &data, &data_sz);
-    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-    
-    if (hdr == NULL && data_sz == 0) {
-        /* sent whole message */
-        int (*reqFn)(MPIDI_VC_t *, MPID_Request *, int *);
-        reqFn = sreq->dev.OnDataAvail;
-        if (!reqFn) {
-            MPIU_Assert(MPIDI_Request_get_type(sreq) != MPIDI_REQUEST_TYPE_GET_RESP);
-            MPIDI_CH3U_Request_complete(sreq);
-            MPIU_DBG_MSG(CH3_CHANNEL, VERBOSE, ".... complete");
-            goto fn_exit;
-        } else {
-            int complete = 0;
-                        
-            mpi_errno = reqFn(vc, sreq, &complete);
-            if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-                        
-            if (complete) {
-                MPIU_DBG_MSG(CH3_CHANNEL, VERBOSE, ".... complete");
-                goto fn_exit;
-            }
-            /* not completed: more to send */
-        }
-    } else {
-        save_iov(sreq, hdr, data, data_sz);
-    }
-
-    REQ_PTL(sreq)->noncontig = FALSE;
-    
-    /* enqueue request */
-    MPIU_Assert(sreq->dev.iov_count >= 1 && sreq->dev.iov[0].MPID_IOV_LEN > 0);
-
-    mpi_errno = enqueue_request(vc, sreq);
+    mpi_errno = send_pkt(vc, hdr, data, data_sz, sreq);
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
     
  fn_exit:
@@ -508,159 +400,156 @@ int MPID_nem_ptl_iSendContig(MPIDI_VC_t *vc, MPID_Request *sreq, void *hdr, MPID
 }
 
 #undef FUNCNAME
-#define FUNCNAME send_queued
+#define FUNCNAME on_data_avail
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
-static int send_queued(void)
+static inline void on_data_avail(MPID_Request * req)
 {
-    int mpi_errno = MPI_SUCCESS;
-    MPID_nem_ptl_sendbuf_t *sb;
-    int ret;
-    MPIDI_STATE_DECL(MPID_STATE_SEND_QUEUED);
-
-    MPIDI_FUNC_ENTER(MPID_STATE_SEND_QUEUED);
-
-    while (!MPIDI_CH3I_Sendq_empty(send_queue) && !FREE_EMPTY()) {
-        int complete = TRUE;
-        MPIDI_msg_sz_t send_len = 0;
-        int i;
-        MPID_Request *sreq;
-        int (*reqFn)(MPIDI_VC_t *, MPID_Request *, int *);
-
-        sreq = MPIDI_CH3I_Sendq_head(send_queue); /* don't dequeue until we're finished sending this request */
-        FREE_POP(&sb);
-        
-        /* copy the iov */
-        MPIU_Assert(sreq->dev.iov_count <= 2);
-        for (i = sreq->dev.iov_offset; i < sreq->dev.iov_count + sreq->dev.iov_offset; ++i) {
-            MPIDI_msg_sz_t len;
-            len = sreq->dev.iov[i].iov_len;
-            if (len > BUFLEN)
-                len = BUFLEN;
-            MPIU_Memcpy(sb->buf.p, sreq->dev.iov[i].iov_base, len);
-            send_len += len;
-            if (len < sreq->dev.iov[i].iov_len) {
-                /* ran out of space in buffer */
-                sreq->dev.iov[i].iov_base = (char *)sreq->dev.iov[i].iov_base + len;
-                sreq->dev.iov[i].iov_len -= len;
-                sreq->dev.iov_offset = i;
-                complete = FALSE;
-                break;
-            }
-        }
+    MPIDI_STATE_DECL(MPID_STATE_ON_DATA_AVAIL);
+    MPIDI_FUNC_ENTER(MPID_STATE_ON_DATA_AVAIL);
 
-        /* copy any noncontig data if there's room left in the send buffer */
-        if (send_len < BUFLEN && REQ_PTL(sreq)->noncontig) {
-            MPIDI_msg_sz_t last;
-            MPIU_Assert(complete); /* if complete has been set to false, there can't be any space left in the send buffer */
-            last = sreq->dev.segment_size;
-            if (last > sreq->dev.segment_first+BUFLEN) {
-                last = sreq->dev.segment_first+BUFLEN;
-                complete = FALSE;
-            }
-            MPI_nem_ptl_pack_byte(sreq->dev.segment_ptr, sreq->dev.segment_first, last, sb->buf.p, &REQ_PTL(sreq)->overflow[0]);
-            send_len += last - sreq->dev.segment_first;
-            sreq->dev.segment_first = last;
-        }
-        ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)sb->buf.p, send_len, PTL_NO_ACK_REQ, VC_PTL(sreq->ch.vc)->id, VC_PTL(sreq->ch.vc)->ptc, 0, 0, sb,
-                                    MPIDI_Process.my_pg_rank, 1);
-        MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
-
-        if (!complete)
-            continue;
-        
-        /* sent all of the data */
-        reqFn = sreq->dev.OnDataAvail;
-        if (!reqFn) {
-            MPIU_Assert(MPIDI_Request_get_type(sreq) != MPIDI_REQUEST_TYPE_GET_RESP);
-            MPIDI_CH3U_Request_complete(sreq);
-        } else {
-            complete = 0;
-            mpi_errno = reqFn(sreq->ch.vc, sreq, &complete);
-            if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-
-            if (!complete)
-                continue;
-        }
-        
-        /* completed the request */
-        --(VC_PTL(sreq->ch.vc)->num_queued_sends);
-        MPIDI_CH3I_Sendq_dequeue(&send_queue, &sreq);
+    int (*reqFn) (MPIDI_VC_t *, MPID_Request *, int *);
+    reqFn = req->dev.OnDataAvail;
+    if (!reqFn) {
+        MPIDI_CH3U_Request_complete(req);
         MPIU_DBG_MSG(CH3_CHANNEL, VERBOSE, ".... complete");
-
-        if (VC_PTL(sreq->ch.vc)->num_queued_sends == 0 && sreq->ch.vc->state == MPIDI_VC_STATE_CLOSED) {
-            /* this VC is closing, if this was the last req queued for that vc, call vc_terminated() */
-            mpi_errno = MPID_nem_ptl_vc_terminated(sreq->ch.vc);
-            if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-        }
-        
     }
-    
- fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_SEND_QUEUED);
-    return mpi_errno;
- fn_fail:
-    goto fn_exit;
-}
-
-
-#undef FUNCNAME
-#define FUNCNAME handle_ack
-#undef FCNAME
-#define FCNAME MPIU_QUOTE(FUNCNAME)
-static int handle_ack(const ptl_event_t *e)
-{
-    int mpi_errno = MPI_SUCCESS;
-    MPIDI_STATE_DECL(HANDLE_ACK);
-
-    MPIDI_FUNC_ENTER(HANDLE_ACK);
-    MPIU_Assert(e->type == PTL_EVENT_SEND);
-
-    FREE_PUSH((MPID_nem_ptl_sendbuf_t *)e->user_ptr);
-
-    if (!MPIDI_CH3I_Sendq_empty(send_queue)) {
-        mpi_errno = send_queued();
-        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+    else {
+        int complete;
+        MPIDI_VC_t *vc = req->ch.vc;
+        reqFn(vc, req, &complete);
+        MPIU_Assert(complete == TRUE);
     }
-    
- fn_exit:
-    MPIDI_FUNC_EXIT(HANDLE_ACK);
-    return mpi_errno;
- fn_fail:
-    goto fn_exit;
+    MPIDI_FUNC_EXIT(MPID_STATE_ON_DATA_AVAIL);
 }
 
 #undef FUNCNAME
-#define FUNCNAME MPID_nem_ptl_nm_event_handler
+#define FUNCNAME MPID_nem_ptl_nm_ctl_event_handler
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
-int MPID_nem_ptl_nm_event_handler(const ptl_event_t *e)
+int MPID_nem_ptl_nm_ctl_event_handler(const ptl_event_t *e)
 {
     int mpi_errno = MPI_SUCCESS;
-    MPIDI_VC_t *vc;
-    int ret;
-    MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_NM_EVENT_HANDLER);
+    MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_NM_CTL_EVENT_HANDLER);
 
-    MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_NM_EVENT_HANDLER);
+    MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_NM_CTL_EVENT_HANDLER);
+
+    switch(e->type) {
 
-    switch (e->type) {
     case PTL_EVENT_PUT:
-        MPIDI_PG_Get_vc_set_active(MPIDI_Process.my_pg, (uint64_t)e->hdr_data, &vc);
-        mpi_errno = MPID_nem_handle_pkt(vc, e->start, e->rlength);
-        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-
-        MPIU_Assert(e->start == recvbuf[(uint64_t)e->user_ptr]);
-        ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_control_pt, &recvbuf_me[(uint64_t)e->user_ptr],
-                          PTL_PRIORITY_LIST, e->user_ptr, &recvbuf_me_handle[(uint64_t)e->user_ptr]);
-        MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));
+        if (e->match_bits != CTL_TAG) {
+            MPIU_Free(SENDBUF((MPID_Request *)e->user_ptr));
+            MPIU_Free(TMPBUF((MPID_Request *)e->user_ptr));
+            on_data_avail((MPID_Request *)e->user_ptr);
+            --put_cnt;
+        }
+        else {
+            int ret;
+            const uint64_t buf_idx = (uint64_t) e->user_ptr;
+            const size_t packet_sz = e->mlength - offsetof(buf_t, packet);
+            MPIDI_VC_t *vc;
+            MPID_nem_ptl_vc_area * vc_ptl;
+
+            MPIU_Assert(e->start == &recvbufs[buf_idx]);
+
+            MPIDI_PG_Get_vc(MPIDI_Process.my_pg, (uint64_t)e->hdr_data, &vc);
+            vc_ptl = VC_PTL(vc);
+
+            if (recvbufs[buf_idx].remaining == 0) {
+                mpi_errno = MPID_nem_handle_pkt(vc, recvbufs[buf_idx].packet, packet_sz);
+                if (mpi_errno)
+                    MPIU_ERR_POP(mpi_errno);
+                /* Notify we're done */
+                ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, 0, 0, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->ptc,
+                                            DONE_TAG(recvbufs[buf_idx].tag), 0, done_req, MPIDI_Process.my_pg_rank, 0);
+                MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s",
+                                     MPID_nem_ptl_strerror(ret));
+                MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST,
+                                                        "PtlPut(size=0 id=(%#x,%#x) pt=%#x tag=%#lx)",
+                                                        vc_ptl->id.phys.nid, vc_ptl->id.phys.pid,
+                                                        vc_ptl->ptc, DONE_TAG(recvbufs[buf_idx].tag)));
+            }
+            else {
+                int incomplete;
+                size_t size;
+                char *buf_ptr;
+
+                MPID_Request *req = MPID_Request_create();
+                MPIU_Assert(req != NULL);
+                MPIDI_CH3U_Request_decrement_cc(req, &incomplete);  /* We'll increment it below */
+                REQ_PTL(req)->event_handler = MPID_nem_ptl_nm_ctl_event_handler;
+                REQ_PTL(req)->bytes_put = packet_sz + recvbufs[buf_idx].remaining;
+                TMPBUF(req) = MPIU_Malloc(REQ_PTL(req)->bytes_put);
+                MPIU_Assert(TMPBUF(req) != NULL);
+                MPIU_Memcpy(TMPBUF(req), recvbufs[buf_idx].packet, packet_sz);
+
+                req->ch.vc = vc;
+
+                req->dev.match.parts.tag = recvbufs[buf_idx].tag;
+
+                size = recvbufs[buf_idx].remaining < MPIDI_nem_ptl_ni_limits.max_msg_size ?
+                           recvbufs[buf_idx].remaining : MPIDI_nem_ptl_ni_limits.max_msg_size;
+                buf_ptr = (char *)TMPBUF(req) + packet_sz;
+                while (recvbufs[buf_idx].remaining) {
+                    MPIDI_CH3U_Request_increment_cc(req, &incomplete);  /* Will be decremented - and eventually freed in REPLY */
+                    ret = MPID_nem_ptl_rptl_get(MPIDI_nem_ptl_global_md, (ptl_size_t)buf_ptr,
+                                                size, vc_ptl->id, vc_ptl->ptc, GET_TAG(recvbufs[buf_idx].tag), 0, req);
+                    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlget", "**ptlget %s",
+                                         MPID_nem_ptl_strerror(ret));
+                    MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST,
+                                                            "PtlGet(size=%lu id=(%#x,%#x) pt=%#x tag=%#lx)", size,
+                                                            vc_ptl->id.phys.nid,
+                                                            vc_ptl->id.phys.pid, vc_ptl->ptc, GET_TAG(recvbufs[buf_idx].tag)));
+                    buf_ptr += size;
+                    recvbufs[buf_idx].remaining -= size;
+                    if (recvbufs[buf_idx].remaining < MPIDI_nem_ptl_ni_limits.max_msg_size)
+                        size = recvbufs[buf_idx].remaining;
+                }
+            }
+
+            /* Repost the recv buffer */
+            ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_control_pt, &mes[buf_idx],
+                              PTL_PRIORITY_LIST, e->user_ptr /* buf_idx */, &me_handles[buf_idx]);
+            MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend",
+                                 "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));
+        }
         break;
-    case PTL_EVENT_ACK:
-        mpi_errno = handle_ack(e);
-        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+
+    case PTL_EVENT_REPLY:
+        {
+            int incomplete;
+            MPID_Request *const rreq = e->user_ptr;
+
+            MPIDI_CH3U_Request_decrement_cc(rreq, &incomplete);
+            if (!incomplete) {
+                int ret;
+                MPID_nem_ptl_vc_area *const vc_ptl = VC_PTL(rreq->ch.vc);
+
+                mpi_errno = MPID_nem_handle_pkt(rreq->ch.vc, TMPBUF(rreq), REQ_PTL(rreq)->bytes_put);
+                if (mpi_errno)
+                    MPIU_ERR_POP(mpi_errno);
+
+                /* Notify we're done */
+                ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, 0, 0, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->ptc,
+                                            DONE_TAG(rreq->dev.match.parts.tag), 0, done_req, MPIDI_Process.my_pg_rank, 0);
+                MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s",
+                                     MPID_nem_ptl_strerror(ret));
+                MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST,
+                                                        "PtlPut(size=0 id=(%#x,%#x) pt=%#x tag=%#lx)",
+                                                        vc_ptl->id.phys.nid, vc_ptl->id.phys.pid,
+                                                        vc_ptl->ptc, DONE_TAG((ptl_match_bits_t)SENDBUF(rreq))));
+
+                /* Free resources */
+                MPIU_Free(TMPBUF(rreq));
+                MPID_Request_release(rreq);
+            }
+        }
         break;
-    case PTL_EVENT_SEND:
-        /* ignore */
+
+    case PTL_EVENT_GET:
+        MPIDI_CH3U_Request_complete((MPID_Request *)e->user_ptr);
         break;
+
     default:
         MPIU_Error_printf("Received unexpected event type: %d %s", e->type, MPID_nem_ptl_strevent(e));
         MPIU_ERR_INTERNALANDJUMP(mpi_errno, "Unexpected event type");
@@ -668,7 +557,7 @@ int MPID_nem_ptl_nm_event_handler(const ptl_event_t *e)
     }
 
  fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_NM_EVENT_HANDLER);
+    MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_NM_CTL_EVENT_HANDLER);
     return mpi_errno;
  fn_fail:
     goto fn_exit;
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
index 26a1eb2..857c9ec 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
@@ -143,23 +143,21 @@ int MPID_nem_ptl_poll(int is_blocking_poll)
                 break;
         }
         MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptleqget", "**ptleqget %s", MPID_nem_ptl_strerror(ret));
-        MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "Received event %s ni_fail=%s list=%s user_ptr=%p hdr_data=%#lx mlength=%lu",
-                                                MPID_nem_ptl_strevent(&event), MPID_nem_ptl_strnifail(event.ni_fail_type),
-                                                MPID_nem_ptl_strlist(event.ptl_list), event.user_ptr, event.hdr_data, event.mlength));
-
+        MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "Received event %s pt_idx=%d ni_fail=%s list=%s user_ptr=%p hdr_data=%#lx mlength=%lu rlength=%lu",
+                                                MPID_nem_ptl_strevent(&event), event.pt_index, MPID_nem_ptl_strnifail(event.ni_fail_type),
+                                                MPID_nem_ptl_strlist(event.ptl_list), event.user_ptr, event.hdr_data, event.mlength, event.rlength));
         MPIU_ERR_CHKANDJUMP2(event.ni_fail_type != PTL_NI_OK && event.ni_fail_type != PTL_NI_NO_MATCH, mpi_errno, MPI_ERR_OTHER, "**ptlni_fail", "**ptlni_fail %s %s", MPID_nem_ptl_strevent(&event), MPID_nem_ptl_strnifail(event.ni_fail_type));
-        
-        /* handle control messages */
-        if (event.pt_index == MPIDI_nem_ptl_control_pt) {
-            mpi_errno = MPID_nem_ptl_nm_event_handler(&event);
-            if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-            goto fn_exit;
-        }
-        
+
         switch (event.type) {
         case PTL_EVENT_PUT:
             if (event.ptl_list == PTL_OVERFLOW_LIST)
                 break;
+            if (event.pt_index == MPIDI_nem_ptl_control_pt) {
+                mpi_errno = MPID_nem_ptl_nm_ctl_event_handler(&event);
+                if (mpi_errno)
+                    MPIU_ERR_POP(mpi_errno);
+                break;
+            }
         case PTL_EVENT_PUT_OVERFLOW:
         case PTL_EVENT_GET:
         case PTL_EVENT_ACK:
@@ -168,8 +166,10 @@ int MPID_nem_ptl_poll(int is_blocking_poll)
             MPID_Request * const req = event.user_ptr;
             MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "req = %p", req);
             MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "REQ_PTL(req)->event_handler = %p", REQ_PTL(req)->event_handler);
-            mpi_errno = REQ_PTL(req)->event_handler(&event);
-            if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+            if (REQ_PTL(req)->event_handler) {
+                mpi_errno = REQ_PTL(req)->event_handler(&event);
+                if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+            }
             break;
         }
         case PTL_EVENT_AUTO_FREE:
@@ -179,8 +179,8 @@ int MPID_nem_ptl_poll(int is_blocking_poll)
         case PTL_EVENT_AUTO_UNLINK:
             overflow_me_handle[(size_t)event.user_ptr] = PTL_INVALID_HANDLE;
             break;
-        case PTL_EVENT_LINK:
         case PTL_EVENT_SEND:
+        case PTL_EVENT_LINK:
             /* ignore */
             break;
         default:
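
Editorial note on the diff above: the PTL_EVENT_REPLY branch of MPID_nem_ptl_nm_ctl_event_handler relies on the request's completion counter. One increment is made per outstanding get, one decrement per reply, and the assembled packet is handed on only when the counter reaches zero. The following is a minimal sketch of that bookkeeping only, not the MPICH implementation. The type sketch_req_t and the functions sketch_issue_gets and sketch_on_reply are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a request that carries a completion counter. */
typedef struct {
    int cc;         /* number of gets still outstanding */
    int delivered;  /* set to 1 once the whole packet has been handed on */
} sketch_req_t;

/* Record one outstanding operation per chunk that is fetched remotely. */
static void sketch_issue_gets(sketch_req_t *req, size_t nchunks)
{
    size_t i;

    assert(req != NULL);
    req->cc = 0;
    req->delivered = 0;
    for (i = 0; i < nchunks; ++i)
        ++req->cc;              /* mirrors the increment before each get */
}

/* Called once per reply; returns 1 only for the reply that completes
 * the request, which is the point where the packet would be processed
 * and the sender notified. */
static int sketch_on_reply(sketch_req_t *req)
{
    assert(req != NULL && req->cc > 0);
    if (--req->cc == 0) {
        req->delivered = 1;
        return 1;
    }
    return 0;
}
```

With three chunks, the first two calls to sketch_on_reply return 0 and the third returns 1. Replies may arrive in any order, so counting them is simpler than tracking which chunk finished.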

http://git.mpich.org/mpich.git/commitdiff/7160f815be10f4d087524bc88e106bfb183a9011

commit 7160f815be10f4d087524bc88e106bfb183a9011
Author: Min Si <msi at il.is.s.u-tokyo.ac.jp>
Date:   Wed Nov 12 13:08:37 2014 -0600

    Increase time limit of bcast2 and bcast3 to 12 minutes.
    
    Timeouts were reported on some overloaded machines with the 10-minute
    time limit.
    
    Signed-off-by: Xin Zhao <xinzhao3 at illinois.edu>

diff --git a/test/mpi/coll/testlist b/test/mpi/coll/testlist
index 2e68ed7..280365e 100644
--- a/test/mpi/coll/testlist
+++ b/test/mpi/coll/testlist
@@ -35,8 +35,8 @@ bcasttest 10
 bcast2 4
 # More than 8 processes are required to get bcast to switch to the long
 # msg algorithm (see coll definitions in mpiimpl.h)
-bcast2 10 timeLimit=600
-bcast3 10 timeLimit=600
+bcast2 10 timeLimit=720
+bcast3 10 timeLimit=720
 bcastzerotype 1
 bcastzerotype 4
 bcastzerotype 5

http://git.mpich.org/mpich.git/commitdiff/6ae110358946948b73679cf0ae6485c380223a2c

commit 6ae110358946948b73679cf0ae6485c380223a2c
Author: Huiwei Lu <huiweilu at mcs.anl.gov>
Date:   Wed Nov 12 14:55:04 2014 -0600

    Cleans the debugging print of bcast
    
    Free the group and communicator created in the test so it does not
    complain when memory debug is on.
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/test/mpi/ft/bcast.c b/test/mpi/ft/bcast.c
index 8c877a9..02b2ee2 100644
--- a/test/mpi/ft/bcast.c
+++ b/test/mpi/ft/bcast.c
@@ -84,6 +84,9 @@ int main(int argc, char **argv)
         fflush(stdout);
     }
 
+    MPI_Group_free(&world);
+    MPI_Group_free(&newgroup);
+    MPI_Comm_free(&newcomm);
     MPI_Finalize();
 
     return 0;

http://git.mpich.org/mpich.git/commitdiff/55c69dad7396fc72532bb494e056bfdf8cb52c0c

commit 55c69dad7396fc72532bb494e056bfdf8cb52c0c
Author: Antonio Pena Monferrer <apenya at mcs.anl.gov>
Date:   Sat Nov 8 12:27:17 2014 -0600

    Added large message cases to getfence1/putfence1
    
    These are meant to hit the >1GB message size and hence test the large
    message case in Portals4.
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>
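
Editorial note: the arithmetic behind the two counts can be checked with a small sketch. It assumes an 8-byte element (the derived datatypes in the test utilities are built on MPI_DOUBLE; the element size of the contiguous case is an assumption here). The helper names `payload_bytes` and `fits_in_int` are hypothetical and are not part of the test suite.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Bytes occupied by `count` elements of `elem_size` bytes each,
 * computed in 64-bit arithmetic so the product cannot wrap for
 * the counts used in these tests. */
int64_t payload_bytes(int64_t count, int64_t elem_size)
{
    assert(count >= 0 && elem_size > 0);
    return count * elem_size;
}

/* Nonzero if `v` is representable as a plain int. */
int fits_in_int(int64_t v)
{
    return v >= INT_MIN && v <= INT_MAX;
}
```

Under the 8-byte assumption, LARGE_CNT_NONCONTIG (150000000) gives 1.2e9 bytes, which is above 2^30, and LARGE_CNT_CONTIG (550000000) gives 4.4e9 bytes, which no longer fits in a 32-bit int. That is consistent with the tests keeping byte sizes in MPI_Aint rather than int.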

diff --git a/test/mpi/rma/getfence1.c b/test/mpi/rma/getfence1.c
index 869b143..80034df 100644
--- a/test/mpi/rma/getfence1.c
+++ b/test/mpi/rma/getfence1.c
@@ -6,20 +6,89 @@
  */
 #include "mpi.h"
 #include <stdio.h>
+#include <string.h>
+#include <limits.h>
 #include "mpitest.h"
 
+#define LARGE_CNT_CONTIG    550000000
+#define LARGE_CNT_NONCONTIG 150000000
+
 /*
 static char MTEST_Descrip[] = "Get with Fence";
 */
 
-int main( int argc, char *argv[] )
+static inline int test(MPI_Comm comm, int rank, int source, int dest,
+                MTestDatatype *sendtype, MTestDatatype *recvtype)
 {
     int errs = 0, err;
+    int disp_unit;
+    MPI_Aint      extent;
+    MPI_Win       win;
+
+    MTestPrintfMsg( 1,
+                    "Getting count = %ld of sendtype %s - count = %ld receive type %s\n",
+                    (long) sendtype->count, MTestGetDatatypeName( sendtype ),
+                    (long) recvtype->count, MTestGetDatatypeName( recvtype ) );
+    /* Make sure that everyone has a recv buffer */
+    recvtype->InitBuf( recvtype );
+    sendtype->InitBuf( sendtype );
+    /* By default, print information about errors */
+    recvtype->printErrors = 1;
+    sendtype->printErrors = 1;
+
+    MPI_Type_extent( sendtype->datatype, &extent );
+    disp_unit = extent < INT_MAX ? extent : 1;
+    MPI_Win_create( sendtype->buf, sendtype->count * extent,
+                    disp_unit, MPI_INFO_NULL, comm, &win );
+    MPI_Win_fence( 0, win );
+    if (rank == source) {
+        /* The source does not need to do anything besides the
+           fence */
+        MPI_Win_fence( 0, win );
+    }
+    else if (rank == dest) {
+        /* To improve reporting of problems about operations, we
+           change the error handler to errors return */
+        MPI_Win_set_errhandler( win, MPI_ERRORS_RETURN );
+
+        /* This should have the same effect, in terms of
+           transferring data, as a send/recv pair */
+        err = MPI_Get( recvtype->buf, recvtype->count,
+                       recvtype->datatype, source, 0,
+                       sendtype->count, sendtype->datatype, win );
+        if (err) {
+            errs++;
+            if (errs < 10) {
+                MTestPrintError( err );
+            }
+        }
+        err = MPI_Win_fence( 0, win );
+        if (err) {
+            errs++;
+            if (errs < 10) {
+                MTestPrintError( err );
+            }
+        }
+        err = MTestCheckRecv( 0, recvtype );
+        if (err) {
+            errs += err;
+        }
+    }
+    else {
+        MPI_Win_fence( 0, win );
+    }
+    MPI_Win_free( &win );
+
+    return errs;
+}
+
+
+int main( int argc, char *argv[] )
+{
+    int errs = 0;
     int rank, size, source, dest;
     int minsize = 2, count; 
     MPI_Comm      comm;
-    MPI_Win       win;
-    MPI_Aint      extent;
     MTestDatatype sendtype, recvtype;
 
     MTest_Init( &argc, &argv );
@@ -38,61 +107,34 @@ int main( int argc, char *argv[] )
 	
 	MTEST_DATATYPE_FOR_EACH_COUNT(count) {
 	    while (MTestGetDatatypes( &sendtype, &recvtype, count )) {
-		/* Make sure that everyone has a recv buffer */
-		recvtype.InitBuf( &recvtype );
-		sendtype.InitBuf( &sendtype );
-		/* By default, print information about errors */
-		recvtype.printErrors = 1;
-		sendtype.printErrors = 1;
-
-		MPI_Type_extent( sendtype.datatype, &extent );
-		MPI_Win_create( sendtype.buf, sendtype.count * extent, 
-				(int)extent, MPI_INFO_NULL, comm, &win );
-		MPI_Win_fence( 0, win );
-		if (rank == source) {
-		    /* The source does not need to do anything besides the
-		       fence */
-		    MPI_Win_fence( 0, win );
-		}
-		else if (rank == dest) {
-		    /* To improve reporting of problems about operations, we
-		       change the error handler to errors return */
-		    MPI_Win_set_errhandler( win, MPI_ERRORS_RETURN );
-
-		    /* This should have the same effect, in terms of
-		       transfering data, as a send/recv pair */
-		    err = MPI_Get( recvtype.buf, recvtype.count, 
-				   recvtype.datatype, source, 0, 
-				   sendtype.count, sendtype.datatype, win );
-		    if (err) {
-			errs++;
-			if (errs < 10) {
-			    MTestPrintError( err );
-			}
-		    }
-		    err = MPI_Win_fence( 0, win );
-		    if (err) {
-			errs++;
-			if (errs < 10) {
-			    MTestPrintError( err );
-			}
-		    }
-		    err = MTestCheckRecv( 0, &recvtype );
-		    if (err) {
-			errs += err;
-		    }
-		}
-		else {
-		    MPI_Win_fence( 0, win );
-		}
-		MPI_Win_free( &win );
-		MTestFreeDatatype( &recvtype );
-		MTestFreeDatatype( &sendtype );
+                errs += test(comm, rank, source, dest, &sendtype, &recvtype);
+                MTestFreeDatatype(&sendtype);
+                MTestFreeDatatype(&recvtype);
 	    }
 	}
         MTestFreeComm(&comm);
     }
 
+    /* Part #2: simple large size test - contiguous and noncontiguous */
+    if (sizeof(void *) > 4) {  /* Only if > 32-bit architecture */
+        MPI_Comm_rank( MPI_COMM_WORLD, &rank );
+        MPI_Comm_size( MPI_COMM_WORLD, &size );
+        source = 0;
+        dest   = size - 1;
+
+        MTestGetDatatypes(&sendtype, &recvtype, LARGE_CNT_CONTIG);
+        errs += test(MPI_COMM_WORLD, rank, source, dest, &sendtype, &recvtype);
+
+        do {
+            MTestFreeDatatype(&sendtype);
+            MTestFreeDatatype(&recvtype);
+            MTestGetDatatypes(&sendtype, &recvtype, LARGE_CNT_NONCONTIG);
+        } while (strstr(MTestGetDatatypeName(&sendtype), "vector") == NULL);
+        errs += test(MPI_COMM_WORLD, rank, source, dest, &sendtype, &recvtype);
+        MTestFreeDatatype(&sendtype);
+        MTestFreeDatatype(&recvtype);
+    }
+
     MTest_Finalize( errs );
     MPI_Finalize();
     return 0;
diff --git a/test/mpi/rma/putfence1.c b/test/mpi/rma/putfence1.c
index 5c6bee4..de13720 100644
--- a/test/mpi/rma/putfence1.c
+++ b/test/mpi/rma/putfence1.c
@@ -6,24 +6,94 @@
  */
 #include "mpi.h"
 #include <stdio.h>
+#include <string.h>
 #include "mpitest.h"
 
+/* These counts allow reasonable sizes for the large tests */
+#define LARGE_CNT_CONTIG    550000000
+#define LARGE_CNT_NONCONTIG 150000000
+
 /*
 static char MTEST_Descrip[] = "Put with Fence";
 */
 
-int main( int argc, char *argv[] )
+static inline int test(MPI_Comm comm, int rank, int source, int dest,
+                MTestDatatype *sendtype, MTestDatatype *recvtype)
 {
     int errs = 0, err;
+    MPI_Aint extent;
+    MPI_Win  win;
+
+    MTestPrintfMsg( 1,
+                    "Putting count = %ld of sendtype %s - count = %ld receive type %s\n",
+                    (long) sendtype->count, MTestGetDatatypeName( sendtype ),
+                    (long) recvtype->count, MTestGetDatatypeName( recvtype ) );
+
+    /* Make sure that everyone has a recv buffer */
+    recvtype->InitBuf( recvtype );
+    MPI_Type_extent( recvtype->datatype, &extent );
+    MPI_Win_create( recvtype->buf, recvtype->count * extent,
+                    extent, MPI_INFO_NULL, comm, &win );
+    MPI_Win_fence( 0, win );
+    if (rank == source) {
+        /* To improve reporting of problems about operations, we
+           change the error handler to errors return */
+        MPI_Win_set_errhandler( win, MPI_ERRORS_RETURN );
+
+        sendtype->InitBuf( sendtype );
+
+        err = MPI_Put( sendtype->buf, sendtype->count,
+                       sendtype->datatype, dest, 0,
+                       recvtype->count, recvtype->datatype, win );
+        if (err) {
+            errs++;
+            if (errs < 10) {
+                MTestPrintError( err );
+            }
+        }
+        err = MPI_Win_fence( 0, win );
+        if (err) {
+            errs++;
+            if (errs < 10) {
+                MTestPrintError( err );
+            }
+        }
+    }
+    else if (rank == dest) {
+        MPI_Win_fence( 0, win );
+        /* This should have the same effect, in terms of
+           transferring data, as a send/recv pair */
+        err = MTestCheckRecv( 0, recvtype );
+        if (err) {
+            if (errs < 10) {
+                printf( "Data in target buffer did not match for destination datatype %s (put with source datatype %s)\n",
+                        MTestGetDatatypeName( recvtype ),
+                        MTestGetDatatypeName( sendtype ) );
+                /* Redo the test, with the errors printed */
+                recvtype->printErrors = 1;
+                (void)MTestCheckRecv( 0, recvtype );
+            }
+            errs += err;
+        }
+    }
+    else {
+        MPI_Win_fence( 0, win );
+    }
+    MPI_Win_free( &win );
+
+    return errs;
+}
+
+
+int main( int argc, char *argv[] )
+{
+    int errs = 0;
     int rank, size, source, dest;
     int minsize = 2, count; 
     MPI_Comm      comm;
-    MPI_Win       win;
-    MPI_Aint      extent;
     MTestDatatype sendtype, recvtype;
 
     MTest_Init( &argc, &argv );
-
     /* The following illustrates the use of the routines to 
        run through a selection of communicators and datatypes.
        Use subsets of these for tests that do not involve combinations 
@@ -38,71 +108,34 @@ int main( int argc, char *argv[] )
 	
 	MTEST_DATATYPE_FOR_EACH_COUNT(count) {
 	    while (MTestGetDatatypes( &sendtype, &recvtype, count )) {
-
-		MTestPrintfMsg( 1, 
-		       "Putting count = %d of sendtype %s receive type %s\n", 
-				count, MTestGetDatatypeName( &sendtype ),
-				MTestGetDatatypeName( &recvtype ) );
-
-		/* Make sure that everyone has a recv buffer */
-		recvtype.InitBuf( &recvtype );
-
-		MPI_Type_extent( recvtype.datatype, &extent );
-		MPI_Win_create( recvtype.buf, recvtype.count * extent, 
-				extent, MPI_INFO_NULL, comm, &win );
-		MPI_Win_fence( 0, win );
-		if (rank == source) {
-		    /* To improve reporting of problems about operations, we
-		       change the error handler to errors return */
-		    MPI_Win_set_errhandler( win, MPI_ERRORS_RETURN );
-
-		    sendtype.InitBuf( &sendtype );
-		    
-		    err = MPI_Put( sendtype.buf, sendtype.count, 
-				   sendtype.datatype, dest, 0, 
-				   recvtype.count, recvtype.datatype, win );
-		    if (err) {
-			errs++;
-			if (errs < 10) {
-			    MTestPrintError( err );
-			}
-		    }
-		    err = MPI_Win_fence( 0, win );
-		    if (err) {
-			errs++;
-			if (errs < 10) {
-			    MTestPrintError( err );
-			}
-		    }
-		}
-		else if (rank == dest) {
-		    MPI_Win_fence( 0, win );
-		    /* This should have the same effect, in terms of
-		       transfering data, as a send/recv pair */
-		    err = MTestCheckRecv( 0, &recvtype );
-		    if (err) {
-			if (errs < 10) {
-			    printf( "Data in target buffer did not match for destination datatype %s (put with source datatype %s)\n", 
-				    MTestGetDatatypeName( &recvtype ),
-				    MTestGetDatatypeName( &sendtype ) );
-			    /* Redo the test, with the errors printed */
-			    recvtype.printErrors = 1;
-			    (void)MTestCheckRecv( 0, &recvtype );
-			}
-			errs += err;
-		    }
-		}
-		else {
-		    MPI_Win_fence( 0, win );
-		}
-		MPI_Win_free( &win );
-		MTestFreeDatatype( &sendtype );
-		MTestFreeDatatype( &recvtype );
+                errs += test(comm, rank, source, dest, &sendtype, &recvtype);
+                MTestFreeDatatype(&sendtype);
+                MTestFreeDatatype(&recvtype);
 	    }
 	}
         MTestFreeComm(&comm);
     }
 
+    /* Part #2: simple large size test - contiguous and noncontiguous */
+    if (sizeof(void *) > 4) {  /* Only if > 32-bit architecture */
+        MPI_Comm_rank( MPI_COMM_WORLD, &rank );
+        MPI_Comm_size( MPI_COMM_WORLD, &size );
+        source = 0;
+        dest   = size - 1;
+
+        MTestGetDatatypes(&sendtype, &recvtype, LARGE_CNT_CONTIG);
+        errs += test(MPI_COMM_WORLD, rank, source, dest, &sendtype, &recvtype);
+
+        do {
+            MTestFreeDatatype(&sendtype);
+            MTestFreeDatatype(&recvtype);
+            MTestGetDatatypes(&sendtype, &recvtype, LARGE_CNT_NONCONTIG);
+        } while (strstr(MTestGetDatatypeName(&sendtype), "vector") == NULL);
+        errs += test(MPI_COMM_WORLD, rank, source, dest, &sendtype, &recvtype);
+        MTestFreeDatatype(&sendtype);
+        MTestFreeDatatype(&recvtype);
+    }
+
     MTest_Finalize( errs );
     MPI_Finalize();
     return 0;
diff --git a/test/mpi/rma/testlist.in b/test/mpi/rma/testlist.in
index cb60752..03f659d 100644
--- a/test/mpi/rma/testlist.in
+++ b/test/mpi/rma/testlist.in
@@ -2,7 +2,7 @@ winname 2
 allocmem 2
 putfence1 4
 putfidx 4
-getfence1 4
+getfence1 4 timeLimit=400
 accfence1 4
 adlb_mimic1 3
 accfence2 4

http://git.mpich.org/mpich.git/commitdiff/b0e9dab6f247857b85f023c25993d47519f87df9

commit b0e9dab6f247857b85f023c25993d47519f87df9
Author: Antonio Pena Monferrer <apenya at mcs.anl.gov>
Date:   Sat Nov 8 12:26:03 2014 -0600

    Added support for large-count datatype tests
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/test/mpi/include/mpitest.h b/test/mpi/include/mpitest.h
index 0b46ef7..f20eb76 100644
--- a/test/mpi/include/mpitest.h
+++ b/test/mpi/include/mpitest.h
@@ -38,7 +38,7 @@ void MTestGetDbgInfo(int *dbgflag, int *verbose);
 typedef struct _MTestDatatype {
     MPI_Datatype datatype;
     void *buf;              /* buffer to use in communication */
-    int  count;             /* count to use for this datatype */
+    MPI_Aint  count;        /* count to use for this datatype */
     int  isBasic;           /* true if the type is predefined */
     int  printErrors;       /* true if errors should be printed
 			       (used by the CheckBuf routines) */
@@ -72,7 +72,7 @@ void MTestInitFullDatatypes();
 void MTestInitMinDatatypes();
 
 int MTestCheckRecv( MPI_Status *, MTestDatatype * );
-int MTestGetDatatypes( MTestDatatype *, MTestDatatype *, int );
+int MTestGetDatatypes( MTestDatatype *, MTestDatatype *, MPI_Aint );
 void MTestResetDatatypes( void );
 void MTestFreeDatatype( MTestDatatype * );
 const char *MTestGetDatatypeName( MTestDatatype * );
diff --git a/test/mpi/util/mtest_datatype.c b/test/mpi/util/mtest_datatype.c
index 39a5f1e..39e3ceb 100644
--- a/test/mpi/util/mtest_datatype.c
+++ b/test/mpi/util/mtest_datatype.c
@@ -82,7 +82,7 @@ static void *MTestTypeContigInit(MTestDatatype * mtype)
 
     if (mtype->count > 0) {
         unsigned char *p;
-        int i, totsize;
+        MPI_Aint i, totsize;
         merr = MPI_Type_extent(mtype->datatype, &size);
         if (merr)
             MTestPrintError(merr);
@@ -116,8 +116,8 @@ static int MTestTypeContigCheckbuf(MTestDatatype * mtype)
 {
     unsigned char *p;
     unsigned char expected;
-    int i, totsize, err = 0, merr;
-    MPI_Aint size;
+    int err = 0, merr;
+    MPI_Aint i, totsize, size;
 
     p = (unsigned char *) mtype->buf;
     if (p) {
@@ -130,7 +130,7 @@ static int MTestTypeContigCheckbuf(MTestDatatype * mtype)
             if (p[i] != expected) {
                 err++;
                 if (mtype->printErrors && err < 10) {
-                    printf("Data expected = %x but got p[%d] = %x\n", expected, i, p[i]);
+                    printf("Data expected = %x but got p[%ld] = %x\n", expected, (long) i, p[i]);
                     fflush(stdout);
                 }
             }
@@ -154,7 +154,8 @@ static void *MTestTypeVectorInit(MTestDatatype * mtype)
 
     if (mtype->count > 0) {
         unsigned char *p;
-        int i, j, k, nc;
+        int j, k, nc;
+        MPI_Aint i;
 
         merr = MPI_Type_extent(mtype->datatype, &size);
         if (merr)
@@ -257,7 +258,8 @@ static void *MTestTypeIndexedInit(MTestDatatype * mtype)
 
     if (mtype->count > 0) {
         unsigned char *p;
-        int i, j, k, b, nc;
+        int j, k, b, nc;
+        MPI_Aint i;
 
         /* Allocate buffer */
         merr = MPI_Type_extent(mtype->datatype, &size);
@@ -371,7 +373,8 @@ static void *MTestTypeIndexedBlockInit(MTestDatatype * mtype)
 
     if (mtype->count > 0) {
         unsigned char *p;
-        int i, k, j, nc;
+        int k, j, nc;
+        MPI_Aint i;
 
         /* Allocate the send/recv buffer */
         merr = MPI_Type_extent(mtype->datatype, &size);
@@ -477,7 +480,8 @@ static void *MTestTypeSubarrayInit(MTestDatatype * mtype)
 
     if (mtype->count > 0) {
         unsigned char *p;
-        int i, k, j, b, nc;
+        int k, j, b, nc;
+        MPI_Aint i;
 
         /* Allocate the send/recv buffer */
         merr = MPI_Type_extent(mtype->datatype, &size);
@@ -1240,7 +1244,7 @@ void *MTestTypeInitRecv(MTestDatatype * mtype)
 
     if (mtype->count > 0) {
         signed char *p;
-        int i, totsize;
+        MPI_Aint i, totsize;
         merr = MPI_Type_extent(mtype->datatype, &size);
         if (merr)
             MTestPrintError(merr);
diff --git a/test/mpi/util/mtest_datatype_gen.c b/test/mpi/util/mtest_datatype_gen.c
index 2cd1d2c..66279af 100644
--- a/test/mpi/util/mtest_datatype_gen.c
+++ b/test/mpi/util/mtest_datatype_gen.c
@@ -178,12 +178,12 @@ void MTestInitMinDatatypes()
 /* Routine to define various sets of blocklen/count/stride for derived datatypes. */
 /* ------------------------------------------------------------------------------ */
 
-static inline int MTestDdtStructDefine(int ddt_index, int tot_count, int *count,
-                                       int *blen, int *stride, int *align_tot_count)
+static inline int MTestDdtStructDefine(int ddt_index, MPI_Aint tot_count, MPI_Aint *count,
+                                       MPI_Aint *blen, MPI_Aint *stride, MPI_Aint *align_tot_count)
 {
     int merr = 0;
     int ddt_c_st;
-    int _short = 0, _align_tot_count = 0, _count = 0, _blen = 0, _stride = 0;
+    MPI_Aint _short = 0, _align_tot_count = 0, _count = 0, _blen = 0, _stride = 0;
     ddt_c_st = ddt_index % MTEST_DDT_NUM_SUBTESTS;
 
     /* Get short value according to user specified tot_count.
@@ -244,7 +244,7 @@ static inline int MTestDdtStructDefine(int ddt_index, int tot_count, int *count,
 /* ------------------------------------------------------------------------ */
 
 static inline int MTestGetBasicDatatypes(MTestDatatype * sendtype,
-                                         MTestDatatype * recvtype, int tot_count)
+                                         MTestDatatype * recvtype, MPI_Aint tot_count)
 {
     int merr = 0;
     int bdt_index = datatype_index - MTEST_BDT_START_IDX;
@@ -303,11 +303,12 @@ static inline int MTestGetBasicDatatypes(MTestDatatype * sendtype,
 /* ------------------------------------------------------------------------ */
 
 static inline int MTestGetSendDerivedDatatypes(MTestDatatype * sendtype,
-                                               MTestDatatype * recvtype, int tot_count)
+                                               MTestDatatype * recvtype, MPI_Aint tot_count)
 {
     int merr = 0;
     int ddt_datatype_index, ddt_c_dt;
-    int blen, stride, count, align_tot_count, tsize = 1;
+    MPI_Count tsize = 1;
+    MPI_Aint blen, stride, count, align_tot_count;
     MPI_Datatype old_type = MPI_DOUBLE;
 
     /* Check index */
@@ -336,7 +337,7 @@ static inline int MTestGetSendDerivedDatatypes(MTestDatatype * sendtype,
         return merr;
 
     sendtype->count = 1;
-    merr = MPI_Type_size(sendtype->datatype, &tsize);
+    merr = MPI_Type_size_x(sendtype->datatype, &tsize);
     if (merr)
         MTestPrintError(merr);
 
@@ -351,11 +352,12 @@ static inline int MTestGetSendDerivedDatatypes(MTestDatatype * sendtype,
 }
 
 static inline int MTestGetRecvDerivedDatatypes(MTestDatatype * sendtype,
-                                               MTestDatatype * recvtype, int tot_count)
+                                               MTestDatatype * recvtype, MPI_Aint tot_count)
 {
     int merr = 0;
     int ddt_datatype_index, ddt_c_dt;
-    int blen, stride, count, align_tot_count, tsize;
+    MPI_Count tsize;
+    MPI_Aint blen, stride, count, align_tot_count;
     MPI_Datatype old_type = MPI_DOUBLE;
 
     /* Check index */
@@ -383,7 +385,7 @@ static inline int MTestGetRecvDerivedDatatypes(MTestDatatype * sendtype,
         return merr;
 
     recvtype->count = 1;
-    merr = MPI_Type_size(recvtype->datatype, &tsize);
+    merr = MPI_Type_size_x(recvtype->datatype, &tsize);
     if (merr)
         MTestPrintError(merr);
 
@@ -400,7 +402,7 @@ static inline int MTestGetRecvDerivedDatatypes(MTestDatatype * sendtype,
 /* ------------------------------------------------------------------------ */
 /* Exposed routine to external tests                                         */
 /* ------------------------------------------------------------------------ */
-int MTestGetDatatypes(MTestDatatype * sendtype, MTestDatatype * recvtype, int tot_count)
+int MTestGetDatatypes(MTestDatatype * sendtype, MTestDatatype * recvtype, MPI_Aint tot_count)
 {
     int merr = 0;
 
@@ -448,11 +450,11 @@ int MTestGetDatatypes(MTestDatatype * sendtype, MTestDatatype * recvtype, int to
     datatype_index++;
 
     if (verbose >= 2 && datatype_index > 0) {
-        int ssize, rsize;
+        MPI_Count ssize, rsize;
         const char *sendtype_nm = MTestGetDatatypeName(sendtype);
         const char *recvtype_nm = MTestGetDatatypeName(recvtype);
-        MPI_Type_size(sendtype->datatype, &ssize);
-        MPI_Type_size(recvtype->datatype, &rsize);
+        MPI_Type_size_x(sendtype->datatype, &ssize);
+        MPI_Type_size_x(recvtype->datatype, &rsize);
 
         MTestPrintfMsg(2, "Get datatypes: send = %s(size %d count %d basesize %d), "
                        "recv = %s(size %d count %d basesize %d), tot_count=%d\n",

http://git.mpich.org/mpich.git/commitdiff/4e618fd87e837e7d4f7ff5b50b4330ae1bec78c8

commit 4e618fd87e837e7d4f7ff5b50b4330ae1bec78c8
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Mon Nov 10 15:50:56 2014 -0600

    portals4: implement cancel send
    
    All MPI_Sends in the Portals4 netmod cause some or all of the data to be
    sent eagerly to the receiver. Canceling a send therefore means finding the
    data in the unexpected message queue and removing it in order to preserve
    matching. Because the message queues exist at the netmod level, the netmod
    needs its own cancel protocol.
    
    The protocol is modeled on a similar case in CH3, but with its own method
    for searching the unexpected queue. Custom netmod packet handlers are used to
    receive and process the control messages.
    
    Known Issue:
      Because we are using different PTs for the send and cancel message, it is
      possible the cancel request could arrive before the message being canceled.
    
    Signed-off-by: Antonio Pena Monferrer <apenya at mcs.anl.gov>
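
Editorial note: the receiver side of the protocol described above can be sketched as a search-and-delete over an unexpected-message queue. This is a toy model under stated assumptions, not the netmod code: `msg_match_t`, `unexpected_q_t`, `uq_push`, and `handle_cancel_req` are hypothetical names, and a fixed-size array stands in for the Portals4 list that the real handler searches with PtlMESearch and PTL_SEARCH_DELETE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the (rank, tag, context_id) match triple. */
typedef struct {
    int rank;
    int tag;
    int context_id;
} msg_match_t;

#define UQ_CAPACITY 16

/* Toy unexpected-message queue; entries are kept in arrival order. */
typedef struct {
    msg_match_t entries[UQ_CAPACITY];
    size_t len;
} unexpected_q_t;

/* Record an eagerly received message that no receive has matched yet. */
void uq_push(unexpected_q_t *q, msg_match_t m)
{
    assert(q->len < UQ_CAPACITY);
    q->entries[q->len++] = m;
}

/* Receiver side of a cancel request: remove the first queued message
 * with the same match triple. Returns 1 (ack) if a message was removed,
 * 0 if none was found, e.g. because it was already matched. */
int handle_cancel_req(unexpected_q_t *q, msg_match_t m)
{
    for (size_t i = 0; i < q->len; i++) {
        const msg_match_t *e = &q->entries[i];
        if (e->rank == m.rank && e->tag == m.tag &&
            e->context_id == m.context_id) {
            /* Shift later entries down so arrival order is preserved. */
            for (size_t j = i + 1; j < q->len; j++)
                q->entries[j - 1] = q->entries[j];
            q->len--;
            return 1;
        }
    }
    return 0;
}
```

In this model the return value plays the role of the `ack` field in the response packet: the sender marks the request as cancelled only when it is nonzero. The sketch also shows why the known issue matters. If the cancel request is processed before the message has been queued, the search finds nothing and the cancel fails even though the message is still in flight.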

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
index f5c204d..f94fa9a 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
@@ -200,6 +200,36 @@ int MPID_nem_ptl_lmt_handle_cookie(MPIDI_VC_t *vc, MPID_Request *req, MPID_IOV s
 int MPID_nem_ptl_lmt_done_send(MPIDI_VC_t *vc, MPID_Request *req);
 int MPID_nem_ptl_lmt_done_recv(MPIDI_VC_t *vc, MPID_Request *req);
 
+/* packet handlers */
+
+int MPID_nem_ptl_pkt_cancel_send_req_handler(MPIDI_VC_t *vc, MPIDI_CH3_Pkt_t *pkt,
+                                             MPIDI_msg_sz_t *buflen, MPID_Request **rreqp);
+int MPID_nem_ptl_pkt_cancel_send_resp_handler(MPIDI_VC_t *vc, MPIDI_CH3_Pkt_t *pkt,
+                                              MPIDI_msg_sz_t *buflen, MPID_Request **rreqp);
+
+/* local packet types */
+
+typedef enum MPIDI_nem_ptl_pkt_type {
+    MPIDI_NEM_PTL_PKT_CANCEL_SEND_REQ,
+    MPIDI_NEM_PTL_PKT_CANCEL_SEND_RESP,
+    MPIDI_NEM_TCP_PKT_INVALID = -1 /* force signed, to avoid warnings */
+} MPIDI_nem_ptl_pkt_type_t;
+
+typedef struct MPIDI_nem_ptl_pkt_cancel_send_req
+{
+    MPIDI_CH3_Pkt_type_t type;
+    unsigned subtype;
+    MPIDI_Message_match match;
+    MPI_Request sender_req_id;
+} MPIDI_nem_ptl_pkt_cancel_send_req_t;
+
+typedef struct MPIDI_nem_ptl_pkt_cancel_send_resp
+{
+    MPIDI_CH3_Pkt_type_t type;
+    unsigned subtype;
+    MPI_Request sender_req_id;
+    int ack;
+} MPIDI_nem_ptl_pkt_cancel_send_resp_t;
 
 /* debugging */
 const char *MPID_nem_ptl_strerror(int ret);
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
index 2803cae..96ada05 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
@@ -76,6 +76,7 @@ static MPIDI_Comm_ops_t comm_ops = {
     MPID_nem_ptl_improbe        /* improbe */
 };
 
+static MPIDI_CH3_PktHandler_Fcn *MPID_nem_ptl_pkt_handlers[2]; /* for CANCEL_SEND_REQ and CANCEL_SEND_RESP */
 
 #undef FUNCNAME
 #define FUNCNAME get_target_info
@@ -412,6 +413,13 @@ static int vc_init(MPIDI_VC_t *vc)
     vc_ch->iStartContigMsg = MPID_nem_ptl_iStartContigMsg;
     vc_ch->iSendContig     = MPID_nem_ptl_iSendContig;
 
+    vc_ch->num_pkt_handlers = 2;
+    vc_ch->pkt_handler = MPID_nem_ptl_pkt_handlers;
+    MPID_nem_ptl_pkt_handlers[MPIDI_NEM_PTL_PKT_CANCEL_SEND_REQ] =
+        MPID_nem_ptl_pkt_cancel_send_req_handler;
+    MPID_nem_ptl_pkt_handlers[MPIDI_NEM_PTL_PKT_CANCEL_SEND_RESP] =
+        MPID_nem_ptl_pkt_cancel_send_resp_handler;
+
     vc_ch->lmt_initiate_lmt  = MPID_nem_ptl_lmt_initiate_lmt;
     vc_ch->lmt_start_recv    = MPID_nem_ptl_lmt_start_recv;
     vc_ch->lmt_start_send    = MPID_nem_ptl_lmt_start_send;
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_probe.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_probe.c
index 9a583e5..3de3b1f 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_probe.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_probe.c
@@ -292,3 +292,116 @@ int MPID_nem_ptl_anysource_improbe(int tag, MPID_Comm * comm, int context_offset
     goto fn_exit;
 }
 
+#undef FUNCNAME
+#define FUNCNAME MPID_nem_ptl_pkt_cancel_send_req_handler
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int MPID_nem_ptl_pkt_cancel_send_req_handler(MPIDI_VC_t *vc, MPIDI_CH3_Pkt_t *pkt,
+                                                    MPIDI_msg_sz_t *buflen, MPID_Request **rreqp)
+{
+    int ret, mpi_errno = MPI_SUCCESS;
+    MPIDI_nem_ptl_pkt_cancel_send_req_t *req_pkt = (MPIDI_nem_ptl_pkt_cancel_send_req_t *)pkt;
+    MPID_PKT_DECL_CAST(upkt, MPIDI_nem_ptl_pkt_cancel_send_resp_t, resp_pkt);
+    MPID_Request *search_req, *resp_req;
+    ptl_me_t me;
+    MPID_nem_ptl_vc_area *const vc_ptl = VC_PTL(vc);
+
+    MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
+      "received cancel send req pkt, sreq=0x%08x, rank=%d, tag=%d, context=%d",
+                      req_pkt->sender_req_id, req_pkt->match.parts.rank,
+                      req_pkt->match.parts.tag, req_pkt->match.parts.context_id));
+
+    /* create a dummy request to search for the message; check the
+       allocation before initializing the request */
+    search_req = MPID_Request_create();
+    MPIU_ERR_CHKANDJUMP1(!search_req, mpi_errno, MPI_ERR_OTHER, "**nomem", "**nomem %s", "MPID_Request_create");
+    MPID_nem_ptl_init_req(search_req);
+    MPIU_Object_set_ref(search_req, 2); /* 1 ref for progress engine and 1 ref for us */
+    search_req->kind = MPID_REQUEST_MPROBE;
+
+    /* create a dummy ME to use for searching the list */
+    me.start = NULL;
+    me.length = 0;
+    me.ct_handle = PTL_CT_NONE;
+    me.uid = PTL_UID_ANY;
+    me.options = ( PTL_ME_OP_PUT | PTL_ME_USE_ONCE );
+    me.min_free = 0;
+    me.match_bits = NPTL_MATCH(req_pkt->match.parts.tag, req_pkt->match.parts.context_id, req_pkt->match.parts.rank);
+
+    me.match_id = vc_ptl->id;
+    me.ignore_bits = NPTL_MATCH_IGNORE;
+
+    /* FIXME: this should use a custom handler that throws the data away inline */
+    REQ_PTL(search_req)->event_handler = handle_mprobe;
+
+    /* submit a search request */
+    ret = PtlMESearch(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_pt, &me, PTL_SEARCH_DELETE, search_req);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmesearch", "**ptlmesearch %s", MPID_nem_ptl_strerror(ret));
+    DBG_MSG_MESearch("REG", vc ? vc->pg_rank : 0, me, search_req);
+
+    /* wait for search request to complete */
+    do {
+        mpi_errno = MPID_nem_ptl_poll(FALSE);
+        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+    } while (!MPID_Request_is_complete(search_req));
+
+    /* send response */
+    resp_pkt->type = MPIDI_NEM_PKT_NETMOD;
+    resp_pkt->subtype = MPIDI_NEM_PTL_PKT_CANCEL_SEND_RESP;
+    resp_pkt->ack = REQ_PTL(search_req)->found;
+    resp_pkt->sender_req_id = req_pkt->sender_req_id;
+
+    MPID_nem_ptl_iStartContigMsg(vc, resp_pkt, sizeof(*resp_pkt), NULL,
+                                 0, &resp_req);
+
+    /* if the message was found, free the temporary buffer used to copy the data */
+    if (REQ_PTL(search_req)->found)
+        MPIU_Free(search_req->dev.tmpbuf);
+
+    MPID_Request_release(search_req);
+    if (resp_req != NULL)
+        MPID_Request_release(resp_req);
+
+ fn_exit:
+    return mpi_errno;
+ fn_fail:
+    goto fn_exit;
+}
+
+#undef FUNCNAME
+#define FUNCNAME MPID_nem_ptl_pkt_cancel_send_resp_handler
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int MPID_nem_ptl_pkt_cancel_send_resp_handler(MPIDI_VC_t *vc, MPIDI_CH3_Pkt_t *pkt,
+                                              MPIDI_msg_sz_t *buflen, MPID_Request **rreqp)
+{
+    int mpi_errno = MPI_SUCCESS;
+    MPID_Request *sreq;
+    MPIDI_nem_ptl_pkt_cancel_send_resp_t *resp_pkt = (MPIDI_nem_ptl_pkt_cancel_send_resp_t *)pkt;
+    int i, ret;
+
+    MPID_Request_get_ptr(resp_pkt->sender_req_id, sreq);
+
+    if (resp_pkt->ack) {
+        MPIR_STATUS_SET_CANCEL_BIT(sreq->status, TRUE);
+
+        /* remove any remaining get MEs */
+        for (i = 0; i < REQ_PTL(sreq)->num_gets; i++) {
+            ret = PtlMEUnlink(REQ_PTL(sreq)->get_me_p[i]);
+            MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeunlink", "**ptlmeunlink %s", MPID_nem_ptl_strerror(ret));
+        }
+        MPIU_DBG_MSG(CH3_OTHER,TYPICAL,"message cancelled");
+    } else {
+        MPIR_STATUS_SET_CANCEL_BIT(sreq->status, FALSE);
+        MPIU_DBG_MSG(CH3_OTHER,TYPICAL,"unable to cancel message");
+    }
+
+    MPIDI_CH3U_Request_complete(sreq);
+
+     *rreqp = NULL;
+
+ fn_exit:
+    return mpi_errno;
+ fn_fail:
+    goto fn_exit;
+}
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
index 0ef1335..0440972 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
@@ -87,7 +87,7 @@ static int handler_recv_dequeue_complete(const ptl_event_t *e)
     MPIDI_STATE_DECL(MPID_STATE_HANDLER_RECV_DEQUEUE_COMPLETE);
 
     MPIDI_FUNC_ENTER(MPID_STATE_HANDLER_RECV_DEQUEUE_COMPLETE);
-    
+
     MPIU_Assert(e->type == PTL_EVENT_PUT || e->type == PTL_EVENT_PUT_OVERFLOW);
     
     dequeue_req(e);
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
index e6bbc66..a6f7d4b 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
@@ -248,15 +248,17 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
 
     MPID_nem_ptl_request_create_sreq(sreq, mpi_errno, comm);
     sreq->dev.match.parts.rank = dest;
+    sreq->dev.match.parts.tag = tag;
+    sreq->dev.match.parts.context_id = comm->context_id + context_offset;
 
     if (!vc_ptl->id_initialized) {
         mpi_errno = MPID_nem_ptl_init_id(vc);
         if (mpi_errno) MPIU_ERR_POP(mpi_errno);
     }
-    
+
     MPIDI_Datatype_get_info(count, datatype, dt_contig, data_sz, dt_ptr, dt_true_lb);
     MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "count=%d datatype=%#x contig=%d data_sz=%lu", count, datatype, dt_contig, data_sz));
-    
+
     if (data_sz <= PTL_LARGE_THRESHOLD) {
         /* Small message.  Send all data eagerly */
         if (dt_contig) {
@@ -492,12 +494,37 @@ int MPID_nem_ptl_issend(struct MPIDI_VC *vc, const void *buf, int count, MPI_Dat
 int MPID_nem_ptl_cancel_send(struct MPIDI_VC *vc,  struct MPID_Request *sreq)
 {
     int mpi_errno = MPI_SUCCESS;
+    MPID_PKT_DECL_CAST(upkt, MPIDI_nem_ptl_pkt_cancel_send_req_t, csr_pkt);
+    MPID_Request *csr_sreq;
+    int was_incomplete;
+
     MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_CANCEL_SEND);
 
     MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_CANCEL_SEND);
 
-    /* portals4 has no way of cancelling a send */
-    MPIU_ERR_SETFATAL(mpi_errno, MPI_ERR_OTHER, "**notimpl");
+    /* The completion counter and reference count are incremented to keep
+       the request around long enough to receive a
+       response regardless of what the user does (free the request before
+       waiting, etc.). */
+    MPIDI_CH3U_Request_increment_cc(sreq, &was_incomplete);
+    if (!was_incomplete) {
+        /* The reference count is incremented only if the request was
+           complete before the increment. */
+        MPIR_Request_add_ref(sreq);
+    }
+
+    csr_pkt->type = MPIDI_NEM_PKT_NETMOD;
+    csr_pkt->subtype = MPIDI_NEM_PTL_PKT_CANCEL_SEND_REQ;
+    csr_pkt->match.parts.rank = sreq->dev.match.parts.rank;
+    csr_pkt->match.parts.tag = sreq->dev.match.parts.tag;
+    csr_pkt->match.parts.context_id = sreq->dev.match.parts.context_id;
+    csr_pkt->sender_req_id = sreq->handle;
+
+    MPID_nem_ptl_iStartContigMsg(vc, csr_pkt, sizeof(*csr_pkt), NULL,
+                                 0, &csr_sreq);
+
+    if (csr_sreq != NULL)
+        MPID_Request_release(csr_sreq);
 
  fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_CANCEL_SEND);

http://git.mpich.org/mpich.git/commitdiff/a7197f0b8ec51bfb41f1b336be24323d7ac2479d

commit a7197f0b8ec51bfb41f1b336be24323d7ac2479d
Author: Norio Yamaguchi <norio.yamaguchi at riken.jp>
Date:   Tue Oct 7 09:38:12 2014 +0900

    Fix malloc functions in netmod-IB
    
    Instead of overriding malloc functions, set some hook functions only
    when using netmod-IB.
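
Editorial sketch of the selection logic described above, assuming the environment variable names and precedence used by ib_check_env() in this commit. The function name netmod_is_ib and its default_netmod parameter are hypothetical; default_netmod stands in for MPID_nem_netmod_strings[0]. The glibc hook installation itself is deliberately left out.

```c
#define _POSIX_C_SOURCE 200112L  /* only needed if you call setenv() when trying this out */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the "ib" netmod is selected, else 0.
 * default_netmod is used when none of the environment variables is set. */
int netmod_is_ib(const char *default_netmod)
{
    /* Later names override earlier ones, mirroring the order in the commit. */
    static const char *const names[] = {
        "MPICH_NEMESIS_NETMOD",
        "MPIR_PARAM_NEMESIS_NETMOD",
        "MPIR_CVAR_NEMESIS_NETMOD",
    };
    const char *target = NULL;
    size_t i;

    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        const char *val = getenv(names[i]);
        if (val)
            target = val;
    }

    /* Fall back to the build-time default only when nothing was set. */
    if (!target)
        target = default_netmod;

    return target != NULL && strcmp(target, "ib") == 0;
}
```

The sketch uses strcmp where the commit uses a length-bounded strncmp; that is a simplification, not a claim about the original.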

diff --git a/src/mpid/ch3/channels/nemesis/netmod/ib/ib_malloc.c b/src/mpid/ch3/channels/nemesis/netmod/ib/ib_malloc.c
index 5cf81a4..569b4a7 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/ib/ib_malloc.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/ib/ib_malloc.c
@@ -1,11 +1,14 @@
 #define _GNU_SOURCE 1
 #include <stdio.h>
+#include <stdlib.h>
 #include <stdint.h>
 #include <string.h>
 #include <sys/mman.h>
 #include <unistd.h>
 #include <sys/syscall.h>
 #include <pthread.h>
+#include <malloc.h>
+#include "mpid_nem_impl.h"
 
 //#define __DEBUG__
 
@@ -22,10 +25,9 @@
 #endif
 
 static void _local_malloc_initialize_hook(void);
-void *malloc(size_t size);
-void free(void *addr);
-void *realloc(void *addr, size_t size);
-void *calloc(size_t nmemb, size_t size);
+void *ib_malloc_hook(size_t size, const void *caller);
+void ib_free_hook(void *addr, const void *caller);
+void *ib_realloc_hook(void *addr, size_t size, const void *caller);
 
 void (*__malloc_initialize_hook) (void) = _local_malloc_initialize_hook;
 
@@ -47,6 +49,38 @@ static int __tunnel_munmap = 0;
 
 #define do_segfault  (*(unsigned int*)0 = 0)    // segmentation fault
 
+static int use_ib_malloc = 0;
+
+static void ib_check_env(void)
+{
+    char *target = NULL, *tmp_str = NULL;
+
+    /* The order of comparison is the same as MPIR_T_cvar_init in mpich_cvars.c */
+    tmp_str = getenv("MPICH_NEMESIS_NETMOD");
+    if (tmp_str) {
+        target = tmp_str;
+    }
+    tmp_str = getenv("MPIR_PARAM_NEMESIS_NETMOD");
+    if (tmp_str) {
+        target = tmp_str;
+    }
+    tmp_str = getenv("MPIR_CVAR_NEMESIS_NETMOD");
+    if (tmp_str) {
+        target = tmp_str;
+    }
+
+    /* If environment variable is set, then compare with it.
+     * If environment variables are not set, then compare with the first element of netmod-list.
+     */
+    if ((target && !strncmp(target, "ib", MPID_NEM_MAX_NETMOD_STRING_LEN)) ||
+        (!target && !strncmp(MPID_nem_netmod_strings[0], "ib", MPID_NEM_MAX_NETMOD_STRING_LEN))) {
+        use_ib_malloc = 1;
+        __malloc_hook = ib_malloc_hook;
+        __free_hook = ib_free_hook;
+        __realloc_hook = ib_realloc_hook;
+    }
+}
+
 struct free_list {
     struct free_list *next;
     struct free_list *prev;
@@ -206,6 +240,13 @@ static void _local_malloc_initialize_hook(void)
     pthread_mutex_init(&mutex, NULL);
 
     pthread_mutex_lock(&mutex);
+
+    ib_check_env();
+    if (!use_ib_malloc) {
+        pthread_mutex_unlock(&mutex);
+        return;
+    }
+
     __initialized_malloc = 1;
 
     for (i = 0; i < ARRAY_SIZE; i++) {
@@ -260,7 +301,7 @@ static void _local_malloc_initialize_hook(void)
     pthread_mutex_unlock(&mutex);
 }
 
-void *malloc(size_t size)
+void *ib_malloc_hook(size_t size, const void *caller)
 {
     int i;
     int pow;
@@ -425,7 +466,7 @@ static inline void free_core(void *addr)
     pthread_mutex_unlock(&mutex);
 }
 
-void free(void *addr)
+void ib_free_hook(void *addr, const void *caller)
 {
     if (addr) {
         free_core(addr);
@@ -433,13 +474,13 @@ void free(void *addr)
     }
 }
 
-void *realloc(void *addr, size_t size)
+void *ib_realloc_hook(void *addr, size_t size, const void *caller)
 {
     void *tmp;
 
     dprintf("realloc(%p, %lu)\n", addr, size);
 
-    tmp = malloc(size);
+    tmp = ib_malloc_hook(size, NULL);
 
     if (addr != NULL) {
         int old_pow, new_pow, power;
@@ -472,27 +513,9 @@ void *realloc(void *addr, size_t size)
     return tmp;
 }
 
-void *calloc(size_t nmemb, size_t size)
-{
-    size_t total_sz;
-    char *ptr;
-
-    if (!nmemb || !size)
-        return NULL;
-
-    total_sz = nmemb * size;
-    ptr = malloc(total_sz);
-    if (ptr == NULL)
-        return NULL;
-
-    memset(ptr, 0, total_sz);
-
-    return ptr;
-}
-
 int munmap(void *addr, size_t length)
 {
-    if (__tunnel_munmap) {
+    if (!use_ib_malloc || __tunnel_munmap) {
         dprintf("munmap(%p, 0x%lx)\n", addr, length);
 
         return syscall(__NR_munmap, addr, length);

http://git.mpich.org/mpich.git/commitdiff/b5cf2aa8a7646d32a51f305094a5689e89bbc39d

commit b5cf2aa8a7646d32a51f305094a5689e89bbc39d
Author: Norio Yamaguchi <norio.yamaguchi at riken.jp>
Date:   Tue Nov 11 18:07:04 2014 +0900

    Fix the implementation of RMA in netmod-IB
    
    Update netmod-IB to match the current RMA implementation in the upper layer.
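
Editorial sketch of the completion pattern this commit introduces in MPID_nem_ib_drain_scq: call the request's final handler if one is set, otherwise fall back to plain completion. The struct and function names here are hypothetical stand-ins, not the MPICH types; on_final plays the role of req->dev.OnFinal.

```c
#include <assert.h>
#include <stddef.h>

struct request;
typedef int (*final_fn)(struct request *req, int *complete);

/* Hypothetical stand-in for a request object. */
struct request {
    final_fn on_final;      /* optional handler; NULL means none */
    int completed;          /* set once the request is finished */
    int handled_by_final;   /* set only by the final handler */
};

/* Default completion path, used when no handler is registered. */
int request_complete(struct request *req)
{
    req->completed = 1;
    return 0;
}

/* Example final handler, analogous in role to a *RecvComplete handler. */
int put_recv_complete(struct request *req, int *complete)
{
    req->handled_by_final = 1;
    req->completed = 1;
    *complete = 1;
    return 0;
}

/* Dispatch: prefer the registered handler, else complete directly. */
int finish_request(struct request *req)
{
    int complete = 0;

    if (req->on_final)
        return req->on_final(req, &complete);
    return request_complete(req);
}
```

Unlike the diff, this sketch propagates the handler's return value; that choice is an assumption made for illustration.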

diff --git a/src/mpid/ch3/channels/nemesis/netmod/ib/ib_poll.c b/src/mpid/ch3/channels/nemesis/netmod/ib/ib_poll.c
index 938adf3..c5e8779 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/ib/ib_poll.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/ib/ib_poll.c
@@ -462,6 +462,7 @@ int MPID_nem_ib_drain_scq(int dont_call_progress)
                 req->dev.recv_data_sz = type_size * req->dev.user_count;
 
                 int complete = 0;
+                int (*reqFn) (MPIDI_VC_t *, MPID_Request *, int *);
                 mpi_errno =
                     MPIDI_CH3U_Receive_data_found(req, REQ_FIELD(req, lmt_pack_buf), &data_len,
                                                   &complete);
@@ -472,7 +473,12 @@ int MPID_nem_ib_drain_scq(int dont_call_progress)
                 MPIU_Free(REQ_FIELD(req, lmt_pack_buf));
 
                 MPID_nem_ib_lmt_send_PKT_LMT_DONE(req->ch.vc, req);
-                MPIDI_CH3U_Request_complete(req);
+                reqFn = req->dev.OnFinal;
+                if (reqFn) {
+                    reqFn(req->ch.vc, req, &complete);
+                } else {
+                    MPIDI_CH3U_Request_complete(req);
+                }
             }
 
             /* decrement the number of entries in IB command queue */
@@ -498,7 +504,7 @@ int MPID_nem_ib_drain_scq(int dont_call_progress)
                 MPIU_Free(REQ_FIELD(req, lmt_pack_buf));
 
                 complete = 0;
-                mpi_errno = MPIDI_CH3_ReqHandler_PutAccumRespComplete(req->ch.vc, req, &complete);      // call MPIDI_CH3U_Request_complete()
+                mpi_errno = MPIDI_CH3_ReqHandler_PutRecvComplete(req->ch.vc, req, &complete);      // call MPIDI_CH3U_Request_complete()
                 if (mpi_errno)
                     MPIU_ERR_POP(mpi_errno);
                 MPIU_Assert(complete == TRUE);
@@ -530,7 +536,7 @@ int MPID_nem_ib_drain_scq(int dont_call_progress)
 
                 /* All dtype data has been received, call req handler */
                 mpi_errno =
-                    MPIDI_CH3_ReqHandler_PutRespDerivedDTComplete(req->ch.vc, req, &complete);
+                    MPIDI_CH3_ReqHandler_PutDerivedDTRecvComplete(req->ch.vc, req, &complete);
                 MPIU_ERR_CHKANDJUMP1(mpi_errno, mpi_errno, MPI_ERR_OTHER, "**ch3|postrecv",
                                      "**ch3|postrecv %s", "MPIDI_CH3_PKT_PUT");
                 /* return 'complete == FALSE' */
@@ -571,7 +577,7 @@ int MPID_nem_ib_drain_scq(int dont_call_progress)
                 MPIU_Free(REQ_FIELD(req, lmt_pack_buf));
 
                 complete = 0;
-                mpi_errno = MPIDI_CH3_ReqHandler_PutAccumRespComplete(req->ch.vc, req, &complete);      // call MPIDI_CH3U_Request_complete()
+                mpi_errno = MPIDI_CH3_ReqHandler_AccumRecvComplete(req->ch.vc, req, &complete);      // call MPIDI_CH3U_Request_complete()
                 if (mpi_errno)
                     MPIU_ERR_POP(mpi_errno);
                 MPIU_Assert(complete == TRUE);
@@ -603,7 +609,7 @@ int MPID_nem_ib_drain_scq(int dont_call_progress)
 
                 /* All dtype data has been received, call req handler */
                 mpi_errno =
-                    MPIDI_CH3_ReqHandler_AccumRespDerivedDTComplete(req->ch.vc, req, &complete);
+                    MPIDI_CH3_ReqHandler_AccumDerivedDTRecvComplete(req->ch.vc, req, &complete);
                 MPIU_ERR_CHKANDJUMP1(mpi_errno, mpi_errno, MPI_ERR_OTHER, "**ch3|postrecv",
                                      "**ch3|postrecv %s", "MPIDI_CH3_ACCUMULATE");
                 /* return 'complete == FALSE' */
@@ -1819,7 +1825,6 @@ int MPID_nem_ib_PktHandler_Put(MPIDI_VC_t * vc, MPIDI_CH3_Pkt_t * pkt,
 
     MPIU_Assert(put_pkt->target_win_handle != MPI_WIN_NULL);
     MPID_Win_get_ptr(put_pkt->target_win_handle, win_ptr);
-    mpi_errno = MPIDI_CH3_Start_rma_op_target(win_ptr, put_pkt->flags);
 
     req = MPID_Request_create();
     MPIU_Object_set_ref(req, 1);        /* decrement only in drain_scq ? */
@@ -1831,6 +1836,7 @@ int MPID_nem_ib_PktHandler_Put(MPIDI_VC_t * vc, MPIDI_CH3_Pkt_t * pkt,
     req->dev.target_win_handle = put_pkt->target_win_handle;
     req->dev.source_win_handle = put_pkt->source_win_handle;
     req->dev.flags = put_pkt->flags;
+    req->dev.OnFinal = MPIDI_CH3_ReqHandler_PutRecvComplete;
 
     if (MPIR_DATATYPE_IS_PREDEFINED(put_pkt->datatype)) {
         MPIDI_Request_set_type(req, MPIDI_REQUEST_TYPE_PUT_RESP);
@@ -1838,6 +1844,12 @@ int MPID_nem_ib_PktHandler_Put(MPIDI_VC_t * vc, MPIDI_CH3_Pkt_t * pkt,
 
         MPID_Datatype_get_size_macro(put_pkt->datatype, type_size);
         req->dev.recv_data_sz = type_size * put_pkt->count;
+        if (put_pkt->immed_len > 0) {
+            /* See if we can receive some data from packet header. */
+            MPIU_Memcpy(req->dev.user_buf, put_pkt->data, put_pkt->immed_len);
+            req->dev.user_buf = (void*)((char*)req->dev.user_buf + put_pkt->immed_len);
+            req->dev.recv_data_sz -= put_pkt->immed_len;
+        }
     }
     else {
         /* derived datatype */
@@ -1945,7 +1957,6 @@ int MPID_nem_ib_PktHandler_Accumulate(MPIDI_VC_t * vc,
 
     MPIU_Assert(accum_pkt->target_win_handle != MPI_WIN_NULL);
     MPID_Win_get_ptr(accum_pkt->target_win_handle, win_ptr);
-    mpi_errno = MPIDI_CH3_Start_rma_op_target(win_ptr, accum_pkt->flags);
 
     req = MPID_Request_create();
     MPIU_Object_set_ref(req, 1);
@@ -1960,12 +1971,8 @@ int MPID_nem_ib_PktHandler_Accumulate(MPIDI_VC_t * vc,
     req->dev.source_win_handle = accum_pkt->source_win_handle;
     req->dev.flags = accum_pkt->flags;
 
-    if (accum_pkt->type == MPIDI_CH3_PKT_GET_ACCUM) {
-        req->dev.resp_request_handle = accum_pkt->request_handle;
-    }
-    else {
-        req->dev.resp_request_handle = MPI_REQUEST_NULL;
-    }
+    req->dev.resp_request_handle = MPI_REQUEST_NULL;
+    req->dev.OnFinal = MPIDI_CH3_ReqHandler_AccumRecvComplete;
 
     if (MPIR_DATATYPE_IS_PREDEFINED(accum_pkt->datatype)) {
         MPIDI_Request_set_type(req, MPIDI_REQUEST_TYPE_ACCUM_RESP);
@@ -1978,13 +1985,22 @@ int MPID_nem_ib_PktHandler_Accumulate(MPIDI_VC_t * vc,
         MPIU_Assert(true_lb == 0);
 
         req->dev.user_buf = MPIU_Malloc(accum_pkt->count * (MPIR_MAX(extent, true_extent)));
+        req->dev.final_user_buf = req->dev.user_buf;
 
         MPID_Datatype_get_size_macro(accum_pkt->datatype, type_size);
         req->dev.recv_data_sz = type_size * accum_pkt->count;
+
+        if (accum_pkt->immed_len > 0) {
+            /* See if we can receive some data from packet header. */
+            MPIU_Memcpy(req->dev.user_buf, accum_pkt->data, accum_pkt->immed_len);
+            req->dev.user_buf = (void*)((char*)req->dev.user_buf + accum_pkt->immed_len);
+            req->dev.recv_data_sz -= accum_pkt->immed_len;
+        }
+
     }
     else {
         MPIDI_Request_set_type(req, MPIDI_REQUEST_TYPE_ACCUM_RESP_DERIVED_DT);
-        req->dev.OnDataAvail = MPIDI_CH3_ReqHandler_AccumRespDerivedDTComplete;
+        req->dev.OnDataAvail = MPIDI_CH3_ReqHandler_AccumDerivedDTRecvComplete;
         req->dev.datatype = MPI_DATATYPE_NULL;
 
         req->dev.dtype_info = (MPIDI_RMA_dtype_info *) MPIU_Malloc(sizeof(MPIDI_RMA_dtype_info));
@@ -2137,6 +2153,25 @@ int MPID_nem_ib_PktHandler_GetResp(MPIDI_VC_t * vc,
 
     MPID_Request_get_ptr(get_resp_pkt->request_handle, req);
 
+    MPID_Win *win_ptr;
+    int target_rank = get_resp_pkt->target_rank;
+
+    MPID_Win_get_ptr(get_resp_pkt->source_win_handle, win_ptr);
+
+    /* decrement ack_counter on target */
+    if (get_resp_pkt->flags & MPIDI_CH3_PKT_FLAG_RMA_LOCK_GRANTED) {
+        mpi_errno = set_lock_sync_counter(win_ptr, target_rank);
+        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+    }
+    if (get_resp_pkt->flags & MPIDI_CH3_PKT_FLAG_RMA_FLUSH_ACK) {
+        mpi_errno = MPIDI_CH3I_RMA_Handle_flush_ack(win_ptr, target_rank);
+        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+    }
+    if (get_resp_pkt->flags & MPIDI_CH3_PKT_FLAG_RMA_UNLOCK_ACK) {
+        mpi_errno = MPIDI_CH3I_RMA_Handle_flush_ack(win_ptr, target_rank);
+        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+    }
+
     void *write_to_buf;
 
     req->ch.lmt_data_sz = s_cookie_buf->len;
diff --git a/src/mpid/ch3/channels/nemesis/netmod/ib/ib_send.c b/src/mpid/ch3/channels/nemesis/netmod/ib/ib_send.c
index 152f33c..45c2006 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/ib/ib_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/ib/ib_send.c
@@ -302,9 +302,6 @@ static int MPID_nem_ib_iSendContig_core(MPIDI_VC_t * vc, MPID_Request * sreq, vo
     if (((MPIDI_CH3_Pkt_t *) hdr)->type == MPIDI_CH3_PKT_GET) {
         //printf("isendcontig_core,MPIDI_CH3_PKT_GET,ref_count=%d\n", sreq->ref_count);
     }
-    if (hdr && ((MPIDI_CH3_Pkt_t *) hdr)->type == MPIDI_CH3_PKT_ACCUM_IMMED) {
-        dprintf("isendcontig_core,MPIDI_CH3_PKT_ACCUM_IMMED,ref_count=%d\n", sreq->ref_count);
-    }
     if (hdr && ((MPIDI_CH3_Pkt_t *) hdr)->type == MPIDI_CH3_PKT_ACCUMULATE) {
         dprintf("isendcontig_core,MPIDI_CH3_PKT_ACCUMULATE,ref_count=%d\n", sreq->ref_count);
     }

http://git.mpich.org/mpich.git/commitdiff/0815558bd0add7aa204b4949ac34302cb7d17676

commit 0815558bd0add7aa204b4949ac34302cb7d17676
Author: Norio Yamaguchi <norio.yamaguchi at riken.jp>
Date:   Tue Nov 11 15:17:05 2014 +0900

    Fix compile error

diff --git a/src/mpid/ch3/channels/nemesis/netmod/ib/ib_poll.c b/src/mpid/ch3/channels/nemesis/netmod/ib/ib_poll.c
index e6cc5b4..938adf3 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/ib/ib_poll.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/ib/ib_poll.c
@@ -33,7 +33,7 @@ static int entered_drain_scq = 0;
                 mpi_errno = MPID_nem_ib_poll_eager(&MPID_nem_ib_ringbuf[n]); /*FIXME: perform send_progress for all sendqs */ \
                 MPIU_ERR_CHKANDJUMP(mpi_errno, mpi_errno, MPI_ERR_OTHER, "**MPID_nem_ib_poll_eager"); \
             }                                                           \
-        } while (0)
+        } while (0);                                                    \
 }
 #if 0
    int n;                                         \

http://git.mpich.org/mpich.git/commitdiff/2a94597d52f81ca20232dda2934183b8a8092a5c

commit 2a94597d52f81ca20232dda2934183b8a8092a5c
Author: Junchao Zhang <jczhang at mcs.anl.gov>
Date:   Mon Nov 10 14:35:35 2014 -0600

    Fix: Explicitly add the OPA header file to mpiimpl.h
    
    Without it, the code is broken in Intel's MPI build
    
    Signed-off-by: Antonio J. Pena <apenya at mcs.anl.gov>

diff --git a/src/include/mpiimpl.h b/src/include/mpiimpl.h
index a063061..58666fd 100644
--- a/src/include/mpiimpl.h
+++ b/src/include/mpiimpl.h
@@ -36,6 +36,8 @@
    do not want mpi.h to depend on any other files or configure flags */
 #include "mpichconf.h"
 
+#include "opa_primitives.h"
+
 /* if we are defining this, we must define it before including mpl.h */
 #if defined(MPICH_DEBUG_MEMINIT)
 #define MPL_VG_ENABLED 1

http://git.mpich.org/mpich.git/commitdiff/5ad1d4d24a1ee7e704b87fbd36a0a9e1a187f64e

commit 5ad1d4d24a1ee7e704b87fbd36a0a9e1a187f64e
Author: Junchao Zhang <jczhang at mcs.anl.gov>
Date:   Fri Nov 7 09:58:12 2014 -0600

    Add support for skipping individual MPIX tests
    
    If --enable-strictmpi is passed to configure, we need to skip non-MPI-standard tests.
    Here is how to do that. Suppose you have an MPIX test foobar; you need to:
    
    1) In Makefile.am, to skip building foobar, add
    
        if BUILD_MPIX_TESTS
        noinst_PROGRAMS += foobar
        endif
    
      Note: There must be no tab indentation before noinst_PROGRAMS
    
    2) In testlist.in (please convert testlist to testlist.in if necessary), to skip
       running foobar, add
    
        @mpix@ foobar 2
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>
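
Editorial sketch of what the @mpix@ marker in step 2 amounts to, assuming the substitution values set in configure.ac by this commit ("#" for a strict build, empty otherwise). The helper expand_mpix is hypothetical and only handles a marker at the start of a line; the real substitution is performed by configure, not by code like this.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: expand a leading "@mpix@" in a testlist.in line.
 * strict != 0 turns the line into a comment; otherwise the marker vanishes.
 * Returns 0 on success, -1 if the result does not fit in out. */
int expand_mpix(const char *line, int strict, char *out, size_t outsz)
{
    static const char marker[] = "@mpix@";
    const size_t mlen = sizeof(marker) - 1;
    int n;

    if (strncmp(line, marker, mlen) == 0)
        n = snprintf(out, outsz, "%s%s", strict ? "#" : "", line + mlen);
    else
        n = snprintf(out, outsz, "%s", line);

    return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}
```

With the line from step 2, a strict build yields "# foobar 2", which the test runner treats as a comment, while a non-strict build leaves the test entry in place.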

diff --git a/test/mpi/configure.ac b/test/mpi/configure.ac
index 1bf4a62..70c05f5 100644
--- a/test/mpi/configure.ac
+++ b/test/mpi/configure.ac
@@ -464,6 +464,18 @@ if test "$FROM_MPICH" = "yes" -a "$enable_strictmpi" = "no" ; then
 fi
 AC_SUBST(MPI_HAS_MPIX)
 
+# Prepend @mpix@ to lines of tests in testlist.in which are MPIX tests so that
+# we can skip running these tests when we do strict MPI test.
+mpix="#"
+if test "$enable_strictmpi" = "no"; then
+    mpix=""
+fi
+AC_SUBST(mpix)
+
+# Use the conditional variable BUILD_MPIX_TESTS to conditionally add MPIX tests
+# to noinst_PROGRAMS to skip building the tests when we do strict MPI test
+AM_CONDITIONAL([BUILD_MPIX_TESTS], [test "$enable_strictmpi" = "no"])
+
 # preserve these values across a reconfigure
 AC_ARG_VAR([WRAPPER_CFLAGS],[])
 AC_ARG_VAR([WRAPPER_CPPFLAGS],[])

http://git.mpich.org/mpich.git/commitdiff/a1359352ff7a3861269f83318f035aef4bb52d19

commit a1359352ff7a3861269f83318f035aef4bb52d19
Author: Junchao Zhang <jczhang at mcs.anl.gov>
Date:   Fri Nov 7 12:58:46 2014 -0600

    Remove unused AM_CONDITIONAL vars in testsuite configure.ac
    
    The expressions are wrong, e.g., [test "X$f77dir" = "f77"] should be [test "$f77dir" = "f77"].
    Also, these vars are not used. So we just remove them.
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>

diff --git a/test/mpi/configure.ac b/test/mpi/configure.ac
index 81cbd95..1bf4a62 100644
--- a/test/mpi/configure.ac
+++ b/test/mpi/configure.ac
@@ -804,7 +804,6 @@ if test "$f77dir" = "f77" ; then
     AC_DEFINE(HAVE_FORTRAN_BINDING,1,[Define if Fortran is supported])
 fi
 
-AM_CONDITIONAL([BUILD_F77_TESTS],[test "X$f77dir" = "f77"])
 
 AC_ARG_VAR([MPI_SIZEOF_AINT],[if set, force MPI_Aint to a width of this many bytes])
 AC_ARG_VAR([MPI_SIZEOF_OFFSET],[if set, force MPI_Offset to a width of this many bytes])
@@ -1189,7 +1188,6 @@ elif test "$enable_fc" = yes ; then
     ])
     AC_LANG_POP([Fortran])
 fi
-AM_CONDITIONAL([BUILD_F90_TESTS],[test "X$f90dir" = "f90"])
 
 f08dir="#"
 AC_SUBST(f08dir)
@@ -1294,7 +1292,6 @@ if test "$enable_cxx" = yes ; then
     fi
     AC_LANG_POP([C++])
 fi
-AM_CONDITIONAL([BUILD_CXX_TESTS],[test "X$cxxdir" = "cxx"])
 
 AC_LANG_C
 # IO

http://git.mpich.org/mpich.git/commitdiff/a58b494e02bbf19f635d8f58ead81bddc32919ee

commit a58b494e02bbf19f635d8f58ead81bddc32919ee
Author: Huiwei Lu <huiweilu at mcs.anl.gov>
Date:   Thu Nov 6 15:58:40 2014 -0600

    Fixes env in runtest
    
    When the 'env' parameter is parsed for the first time, an extra space
    is added at the front. When the script launches each test, it cannot
    interpret the value with this leading space and complains in the
    output: "not in a=b form".
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>
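
Editorial sketch of the same fix expressed in C: when building a space-separated list, emit the separator only if the list is already non-empty. The function append_env is hypothetical; the actual change is the Perl hunk in runtests.in below.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append value to the space-separated list held in buf.
 * buf must already be NUL-terminated within bufsz bytes.
 * Returns 0 on success, -1 if the result would not fit. */
int append_env(char *buf, size_t bufsz, const char *value)
{
    size_t len = strlen(buf);
    /* No separator for the first entry, so no leading space appears. */
    int n = snprintf(buf + len, bufsz - len, "%s%s", len ? " " : "", value);

    return (n >= 0 && (size_t)n < bufsz - len) ? 0 : -1;
}
```

Starting from an empty buffer, appending "A=1" and then "B=2" gives "A=1 B=2", with every entry still in a=b form.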

diff --git a/test/mpi/runtests.in b/test/mpi/runtests.in
index 1659c8f..ba8dd6a 100644
--- a/test/mpi/runtests.in
+++ b/test/mpi/runtests.in
@@ -439,7 +439,12 @@ sub RunList {
 		    $mpiexecArgs = "$mpiexecArgs $value";
 		}
 		elsif ($key eq "env") {
-		    $progEnv = "$progEnv $value";
+		    if ($progEnv eq "") {
+			$progEnv = "$value";
+		    }
+		    else {
+			$progEnv = "$progEnv $value";
+		    }
 		}
 		elsif ($key eq "mpiversion") {
 		    $mpiVersion = $value;

http://git.mpich.org/mpich.git/commitdiff/ea8993ebd8940eca8a5151bff7a9e73ea077252a

commit ea8993ebd8940eca8a5151bff7a9e73ea077252a
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Thu Nov 6 13:07:48 2014 -0600

    portals4: fixup for noncontig recvs
    
    A recent testsuite update unveiled an issue when unpacking a large
    noncontiguous message. We need to ignore any previous segment
    manipulation when unpacking the beginning of the message.
    
    Signed-off-by: Antonio J. Pena <apenya at mcs.anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
index 2152da7..0ef1335 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
@@ -287,7 +287,7 @@ static int handler_recv_dequeue_large(const ptl_event_t *e)
             MPIU_Memcpy((char *)rreq->dev.user_buf + dt_true_lb, e->start, e->mlength);
         } else {
             last = e->mlength;
-            MPID_Segment_unpack(rreq->dev.segment_ptr, rreq->dev.segment_first, &last, e->start);
+            MPID_Segment_unpack(rreq->dev.segment_ptr, 0, &last, e->start);
             MPIU_Assert(last == e->mlength);
             rreq->dev.segment_first = e->mlength;
         }

http://git.mpich.org/mpich.git/commitdiff/4761cf58315577613d8a13de4abfec65d4fa9e4a

commit 4761cf58315577613d8a13de4abfec65d4fa9e4a
Author: Xin Zhao <xinzhao3 at illinois.edu>
Date:   Thu Nov 6 13:57:35 2014 -0600

    Remove unused variable in examples/ircpi.c.
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>

diff --git a/examples/ircpi.c b/examples/ircpi.c
index 30c1ed2..6e42c5b 100644
--- a/examples/ircpi.c
+++ b/examples/ircpi.c
@@ -11,7 +11,7 @@
 
 int main(int argc, char *argv[])
 {
-    int n, myid, numprocs, i, ierr;
+    int n, myid, numprocs, i;
     double PI25DT = 3.141592653589793238462643;
     double mypi, pi, h, sum, x;
     MPI_Win nwin, piwin;
@@ -36,7 +36,7 @@ int main(int argc, char *argv[])
         if (myid == 0) {
             fprintf(stdout, "Enter the number of intervals: (0 quits) ");
             fflush(stdout);
-            ierr=scanf("%d",&n);
+            scanf("%d",&n);
             pi = 0.0;
         }
         MPI_Win_fence(0, nwin);

http://git.mpich.org/mpich.git/commitdiff/6eb880c06d3ced595f52878638136eaa1b44ea90

commit 6eb880c06d3ced595f52878638136eaa1b44ea90
Author: Xin Zhao <xinzhao3 at illinois.edu>
Date:   Thu Nov 6 13:31:12 2014 -0600

    Move ircpi.c from test/mpi/rma to examples.
    
    ircpi.c is an interactive test, which is never triggered
    in the RMA test suite. It is better to put it under examples.
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>

diff --git a/examples/Makefile.am b/examples/Makefile.am
index 72d593a..40d4cff 100644
--- a/examples/Makefile.am
+++ b/examples/Makefile.am
@@ -57,7 +57,7 @@ noinst_PROGRAMS = cpi
 # pmandel requires a separate set of socket calls (its a long story)
 # and may not build on most platforms
 EXTRA_PROGRAMS = pmandel pmandel_spawn pmandel_service pmandel_spaserv    \
-                 pmandel_fence hellow icpi parent child srtest \
+                 pmandel_fence hellow icpi ircpi parent child srtest \
                  spawn_merge_parent spawn_merge_child1 spawn_merge_child2
 
 # LIBS includes -lmpich and other libraries (e.g., -lpmpich if
@@ -82,6 +82,8 @@ cpi_LDFLAGS = $(AM_LDFLAGS) $(mpich_libtool_static_flag)
 
 icpi_SOURCES = icpi.c
 icpi_LDADD = -lm
+ircpi_SOURCES = ircpi.c
+ircpi_LDADD = -lm
 pmandel_SOURCES = pmandel.c
 pmandel_LDADD = -lm
 pmandel_spawn_SOURCES = pmandel_spawn.c
diff --git a/examples/ircpi.c b/examples/ircpi.c
new file mode 100644
index 0000000..30c1ed2
--- /dev/null
+++ b/examples/ircpi.c
@@ -0,0 +1,71 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *  (C) 2001 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+#include "mpi.h"
+#include "stdio.h"
+#include <math.h>
+
+/* From Using MPI-2 */
+
+int main(int argc, char *argv[])
+{
+    int n, myid, numprocs, i, ierr;
+    double PI25DT = 3.141592653589793238462643;
+    double mypi, pi, h, sum, x;
+    MPI_Win nwin, piwin;
+
+    MPI_Init(&argc,&argv);
+    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
+    MPI_Comm_rank(MPI_COMM_WORLD,&myid);
+
+    if (myid == 0) {
+        MPI_Win_create(&n, sizeof(int), 1, MPI_INFO_NULL,
+                       MPI_COMM_WORLD, &nwin);
+        MPI_Win_create(&pi, sizeof(double), 1, MPI_INFO_NULL,
+                       MPI_COMM_WORLD, &piwin);
+    }
+    else {
+        MPI_Win_create(MPI_BOTTOM, 0, 1, MPI_INFO_NULL,
+                       MPI_COMM_WORLD, &nwin);
+        MPI_Win_create(MPI_BOTTOM, 0, 1, MPI_INFO_NULL,
+                       MPI_COMM_WORLD, &piwin);
+    }
+    while (1) {
+        if (myid == 0) {
+            fprintf(stdout, "Enter the number of intervals: (0 quits) ");
+            fflush(stdout);
+            ierr=scanf("%d",&n);
+            pi = 0.0;
+        }
+        MPI_Win_fence(0, nwin);
+        if (myid != 0)
+            MPI_Get(&n, 1, MPI_INT, 0, 0, 1, MPI_INT, nwin);
+        MPI_Win_fence(0, nwin);
+        if (n == 0)
+            break;
+        else {
+            h = 1.0 / (double) n;
+            sum = 0.0;
+            for (i = myid + 1; i <= n; i += numprocs) {
+                x = h * ((double)i - 0.5);
+                sum += (4.0 / (1.0 + x*x));
+            }
+            mypi = h * sum;
+            MPI_Win_fence( 0, piwin);
+            MPI_Accumulate(&mypi, 1, MPI_DOUBLE, 0, 0, 1, MPI_DOUBLE,
+                           MPI_SUM, piwin);
+            MPI_Win_fence(0, piwin);
+            if (myid == 0) {
+                fprintf(stdout, "pi is approximately %.16f, Error is %.16f\n",
+                        pi, fabs(pi - PI25DT));
+                fflush(stdout);
+            }
+        }
+    }
+    MPI_Win_free(&nwin);
+    MPI_Win_free(&piwin);
+    MPI_Finalize();
+    return 0;
+}
diff --git a/test/mpi/rma/Makefile.am b/test/mpi/rma/Makefile.am
index e370fbf..c5c7d71 100644
--- a/test/mpi/rma/Makefile.am
+++ b/test/mpi/rma/Makefile.am
@@ -14,7 +14,6 @@ EXTRA_DIST = testlist
 ## correctly
 noinst_PROGRAMS =          \
     allocmem               \
-    ircpi                  \
     test1                  \
     test2                  \
     test2_shm              \
@@ -215,5 +214,3 @@ mutex_bench_shm_ordered_SOURCES  = mutex_bench.c mcs-mutex.c mcs-mutex.h
 
 linked_list_bench_lock_shr_nocheck_SOURCES  = linked_list_bench_lock_shr.c
 linked_list_bench_lock_shr_nocheck_CPPFLAGS = -DUSE_MODE_NOCHECK $(AM_CPPFLAGS)
-
-ircpi_LDADD    = $(LDADD) -lm
diff --git a/test/mpi/rma/ircpi.c b/test/mpi/rma/ircpi.c
deleted file mode 100644
index 99a83ed..0000000
--- a/test/mpi/rma/ircpi.c
+++ /dev/null
@@ -1,71 +0,0 @@
-/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
-/*
- *  (C) 2001 by Argonne National Laboratory.
- *      See COPYRIGHT in top-level directory.
- */
-#include "mpi.h" 
-#include "stdio.h"
-#include <math.h> 
-
-/* From Using MPI-2 */
-
-int main(int argc, char *argv[]) 
-{ 
-    int n, myid, numprocs, i, ierr; 
-    double PI25DT = 3.141592653589793238462643; 
-    double mypi, pi, h, sum, x; 
-    MPI_Win nwin, piwin; 
- 
-    MPI_Init(&argc,&argv); 
-    MPI_Comm_size(MPI_COMM_WORLD,&numprocs); 
-    MPI_Comm_rank(MPI_COMM_WORLD,&myid); 
- 
-    if (myid == 0) { 
-	MPI_Win_create(&n, sizeof(int), 1, MPI_INFO_NULL, 
-		       MPI_COMM_WORLD, &nwin); 
-	MPI_Win_create(&pi, sizeof(double), 1, MPI_INFO_NULL, 
-		       MPI_COMM_WORLD, &piwin);  
-    } 
-    else { 
-	MPI_Win_create(MPI_BOTTOM, 0, 1, MPI_INFO_NULL, 
-		       MPI_COMM_WORLD, &nwin); 
-	MPI_Win_create(MPI_BOTTOM, 0, 1, MPI_INFO_NULL, 
-		       MPI_COMM_WORLD, &piwin); 
-    } 
-    while (1) { 
-        if (myid == 0) { 
-            fprintf(stdout, "Enter the number of intervals: (0 quits) ");
-	    fflush(stdout); 
-            ierr=scanf("%d",&n); 
-	    pi = 0.0;			 
-        } 
-	MPI_Win_fence(0, nwin); 
-	if (myid != 0)  
-	    MPI_Get(&n, 1, MPI_INT, 0, 0, 1, MPI_INT, nwin); 
-	MPI_Win_fence(0, nwin); 
-        if (n == 0) 
-            break; 
-        else { 
-            h   = 1.0 / (double) n; 
-            sum = 0.0; 
-            for (i = myid + 1; i <= n; i += numprocs) { 
-                x = h * ((double)i - 0.5); 
-                sum += (4.0 / (1.0 + x*x)); 
-            } 
-            mypi = h * sum; 
-	    MPI_Win_fence( 0, piwin); 
-	    MPI_Accumulate(&mypi, 1, MPI_DOUBLE, 0, 0, 1, MPI_DOUBLE, 
-			   MPI_SUM, piwin); 
-	    MPI_Win_fence(0, piwin); 
-            if (myid == 0) { 
-                fprintf(stdout, "pi is approximately %.16f, Error is %.16f\n", 
-                       pi, fabs(pi - PI25DT)); 
-		fflush(stdout);
-	    }
-        } 
-    } 
-    MPI_Win_free(&nwin); 
-    MPI_Win_free(&piwin); 
-    MPI_Finalize(); 
-    return 0; 
-} 

http://git.mpich.org/mpich.git/commitdiff/362bf4fbd43d3b1d5f3086d1d5348dd5d6906fbd

commit 362bf4fbd43d3b1d5f3086d1d5348dd5d6906fbd
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Thu Nov 6 09:46:12 2014 -0600

    portals4: temporarily raise buf counts in ptl_nm.c
    
    There is a complete overhaul of this file on the way, but in the meantime
    we raise these limits to prevent deadlock in MPI_Finalize with process
    count >= 12.
    
    No reviewer.

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
index e461bbc..f0d447d 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
@@ -8,8 +8,8 @@
 #include <mpl_utlist.h>
 #include "rptl.h"
 
-#define NUM_SEND_BUFS 20
-#define NUM_RECV_BUFS 20
+#define NUM_SEND_BUFS 100
+#define NUM_RECV_BUFS 100
 #define BUFLEN  (sizeof(MPIDI_CH3_Pkt_t) + PTL_MAX_EAGER)
 
 typedef struct MPID_nem_ptl_sendbuf {

http://git.mpich.org/mpich.git/commitdiff/470b2e760439df147a9dc2fbf3e3d5ac3ff5f4e1

commit 470b2e760439df147a9dc2fbf3e3d5ac3ff5f4e1
Author: Min Si <msi at il.is.s.u-tokyo.ac.jp>
Date:   Wed Oct 29 15:34:32 2014 -0500

    Increase timelimit to 10mins for bcast2 and bcast3.
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/test/mpi/coll/testlist b/test/mpi/coll/testlist
index c4028f6..2e68ed7 100644
--- a/test/mpi/coll/testlist
+++ b/test/mpi/coll/testlist
@@ -35,8 +35,8 @@ bcasttest 10
 bcast2 4
 # More that 8 processes are required to get bcast to switch to the long
 # msg algorithm (see coll definitions in mpiimpl.h)
-bcast2 10 timeLimit=420
-bcast3 10 timeLimit=420
+bcast2 10 timeLimit=600
+bcast3 10 timeLimit=600
 bcastzerotype 1
 bcastzerotype 4
 bcastzerotype 5

http://git.mpich.org/mpich.git/commitdiff/a5f290875eb73b5e97166638e348d1d47f7062c9

commit a5f290875eb73b5e97166638e348d1d47f7062c9
Author: Min Si <msi at il.is.s.u-tokyo.ac.jp>
Date:   Mon Nov 3 15:41:15 2014 -0600

    Add min version of mtest datatype generator.
    
    Some mpi tests such as bcast2 and bcast3 take 20mins to run all the
    datatypes on tcp. Therefore, we also define a minimum version of the
    datatype generator which only generates vector and indexed tests so that
    such heavy tests can use the min version to shorten time.
    
    The full version is enabled by default; a test can switch to the min
    version by calling the corresponding init function before its datatype
    loop.
    
    In the coll/bcast2, coll/bcast3 and pt2pt/pingping tests, we switch to
    the min version from the second datatype loop onward.
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/test/mpi/coll/bcast2.c b/test/mpi/coll/bcast2.c
index 2ff6d86..2acfda3 100644
--- a/test/mpi/coll/bcast2.c
+++ b/test/mpi/coll/bcast2.c
@@ -38,6 +38,14 @@ int main( int argc, char *argv[] )
 	MPI_Errhandler_set( comm, MPI_ERRORS_RETURN );
 
     MTEST_DATATYPE_FOR_EACH_COUNT(count) {
+
+        /* Only run full datatype tests in comm world to shorten test time. */
+        if (comm == MPI_COMM_WORLD) {
+            MTestInitFullDatatypes();
+        } else {
+            MTestInitMinDatatypes();
+        }
+
 	    while (MTestGetDatatypes( &sendtype, &recvtype, count )) {
 		for (root=0; root<size; root++) {
 		    if (rank == root) {
diff --git a/test/mpi/coll/bcast3.c b/test/mpi/coll/bcast3.c
index 3457208..c78d769 100644
--- a/test/mpi/coll/bcast3.c
+++ b/test/mpi/coll/bcast3.c
@@ -34,6 +34,14 @@ int main( int argc, char *argv[] )
 	
 	count = 1;
 	MTEST_DATATYPE_FOR_EACH_COUNT(count) {
+
+        /* Only run full datatype tests in comm world to shorten test time. */
+        if (comm == MPI_COMM_WORLD) {
+            MTestInitFullDatatypes();
+        } else {
+            MTestInitMinDatatypes();
+        }
+
 	    while (MTestGetDatatypes( &sendtype, &recvtype, count )) {
 		for (root=0; root<size; root++) {
 		    if (rank == root) {
diff --git a/test/mpi/include/mpitest.h b/test/mpi/include/mpitest.h
index cdbbe79..0b46ef7 100644
--- a/test/mpi/include/mpitest.h
+++ b/test/mpi/include/mpitest.h
@@ -63,6 +63,14 @@ typedef struct _MTestDatatype {
 #define MTEST_DATATYPE_FOR_EACH_COUNT(count) \
         for (count = 1; count <= 262144; count *= 8)
 
+/* Setup the full version of datatype tests.
+ * It generate tests for all basic datatypes and all derived datatypes except darray. */
+void MTestInitFullDatatypes();
+
+/* Setup the minimum version of datatype tests.
+ * It generate tests for all basic datatypes, vector and indexed. */
+void MTestInitMinDatatypes();
+
 int MTestCheckRecv( MPI_Status *, MTestDatatype * );
 int MTestGetDatatypes( MTestDatatype *, MTestDatatype *, int );
 void MTestResetDatatypes( void );
diff --git a/test/mpi/pt2pt/pingping.c b/test/mpi/pt2pt/pingping.c
index 95f6e39..c725216 100644
--- a/test/mpi/pt2pt/pingping.c
+++ b/test/mpi/pt2pt/pingping.c
@@ -41,6 +41,14 @@ int main( int argc, char *argv[] )
 	MPI_Comm_set_errhandler( comm, MPI_ERRORS_RETURN );
 
 	for (count = 1; count < MAX_COUNT; count = count * 2) {
+
+        /* Only run full datatype tests in comm world to shorten test time. */
+        if (comm == MPI_COMM_WORLD) {
+            MTestInitFullDatatypes();
+        } else {
+            MTestInitMinDatatypes();
+        }
+
 	    while (MTestGetDatatypes( &sendtype, &recvtype, count )) {
 		int nbytes;
 		MPI_Type_size( sendtype.datatype, &nbytes );
diff --git a/test/mpi/util/mtest_datatype.c b/test/mpi/util/mtest_datatype.c
index 412faa7..39a5f1e 100644
--- a/test/mpi/util/mtest_datatype.c
+++ b/test/mpi/util/mtest_datatype.c
@@ -1281,3 +1281,10 @@ void MTestTypeCreatorInit(MTestDdtCreator * creators)
     creators[MTEST_DDT_SUBARRAY_ORDER_C] = MTestTypeSubArrayOrderCCreate;
     creators[MTEST_DDT_SUBARRAY_ORDER_FORTRAN] = MTestTypeSubArrayOrderFortranCreate;
 }
+
+void MTestTypeMinCreatorInit(MTestDdtCreator * creators)
+{
+    memset(creators, 0, sizeof(MTestDdtCreator) * MTEST_DDT_MAX);
+    creators[MTEST_MIN_DDT_VECTOR] = MTestTypeVectorCreate;
+    creators[MTEST_MIN_DDT_INDEXED] = MTestTypeIndexedCreate;
+}
diff --git a/test/mpi/util/mtest_datatype.h b/test/mpi/util/mtest_datatype.h
index f4c6828..c33a8e4 100644
--- a/test/mpi/util/mtest_datatype.h
+++ b/test/mpi/util/mtest_datatype.h
@@ -44,9 +44,16 @@ enum MTEST_DERIVED_DT {
     MTEST_DDT_MAX
 };
 
+enum MTEST_MIN_DERIVED_DT {
+    MTEST_MIN_DDT_VECTOR,
+    MTEST_MIN_DDT_INDEXED,
+    MTEST_MIN_DDT_MAX
+};
+
 typedef int (*MTestDdtCreator) (int, int, int, MPI_Datatype, const char *, MTestDatatype *);
 
 extern void MTestTypeCreatorInit(MTestDdtCreator * creators);
+extern void MTestTypeMinCreatorInit(MTestDdtCreator * creators);
 extern void *MTestTypeInitRecv(MTestDatatype * mtype);
 
 extern int MTestTypeBasicCreate(MPI_Datatype oldtype, MTestDatatype * mtype);
diff --git a/test/mpi/util/mtest_datatype_gen.c b/test/mpi/util/mtest_datatype_gen.c
index 60c53ef..2cd1d2c 100644
--- a/test/mpi/util/mtest_datatype_gen.c
+++ b/test/mpi/util/mtest_datatype_gen.c
@@ -61,35 +61,117 @@ static int verbose = 0;         /* Message level (0 is none) */
  *  Datatype definition:
  *    Every type is initialized by the creation function stored in
  *    mtestDdtCreators variable, all of their create/init/check functions are
- *    defined in file mtest_datatype.c. Following derived datatypes are defined:
- *    Contiguous | Vector | HVector | Indexed | Hindexed | Indexed-block |
- *    Hindexed-block | Struct | Subarray with order-C | Subarray with order-Fortran
+ *    defined in file mtest_datatype.c.
  *
  *  How to add a new derived datatype:
  *    1. Add the new datatype in enum MTEST_DERIVED_DT.
  *    2. Add its create/init/check functions in file mtest_datatype.c
  *    3. Add its creator function to mtestDdtCreators variable
+ *
+ *  Following two datatype generators are defined.
+ *    1. Full datatypes generator:
+ *      All basic datatypes | Vector | Hvector | Indexed | Hindexed |
+ *      Indexed-block | Hindexed-block | Subarray with order-C | Subarray with order-Fortran
+ *    2. Minimum datatypes generator:
+ *      All basic datatypes | Vector | Indexed
+ *
+ *  MPI test can initialize either generator by calling the corresponding init
+ *  function before datatype loop, The full generator is set by default.
+ *    Full generator : MTestInitFullDatatypes
+ *    Minimum generator : MTestInitMinDatatypes
  */
 
 static int datatype_index = 0;
 
+/* ------------------------------------------------------------------------ */
+/* Routine and internal parameters to define the range of datatype tests */
+/* ------------------------------------------------------------------------ */
+#define MTEST_DDT_NUM_SUBTESTS 4        /* 4 kinds of derived datatype structure */
+static MTestDdtCreator mtestDdtCreators[MTEST_DDT_MAX];
 
-#define MTEST_BDT_START_IDX 0
-#define MTEST_BDT_NUM_TESTS (MTEST_BDT_MAX)
-#define MTEST_BDT_RANGE (MTEST_BDT_START_IDX + MTEST_BDT_NUM_TESTS)
+static int MTEST_BDT_START_IDX = -1;
+static int MTEST_BDT_NUM_TESTS = 0;
+static int MTEST_BDT_RANGE = 0;
 
-#define MTEST_DDT_NUM_SUBTESTS 4        /* 4 kinds of derived datatype structure */
-#define MTEST_DDT_NUM_TYPES (MTEST_DDT_MAX)
+static int MTEST_DDT_NUM_TYPES = 0;
+static int MTEST_SEND_DDT_START_IDX = 0;
+static int MTEST_SEND_DDT_NUM_TESTS = 0;
+static int MTEST_SEND_DDT_RANGE = 0;
 
-#define MTEST_SEND_DDT_START_IDX (MTEST_BDT_NUM_TESTS)
-#define MTEST_SEND_DDT_NUM_TESTS (MTEST_DDT_NUM_TYPES * MTEST_DDT_NUM_SUBTESTS)
-#define MTEST_SEND_DDT_RANGE (MTEST_SEND_DDT_START_IDX + MTEST_SEND_DDT_NUM_TESTS)
+static int MTEST_RECV_DDT_START_IDX = 0;
+static int MTEST_RECV_DDT_NUM_TESTS = 0;
+static int MTEST_RECV_DDT_RANGE = 0;
 
-#define MTEST_RECV_DDT_START_IDX (MTEST_SEND_DDT_START_IDX + MTEST_SEND_DDT_NUM_TESTS)
-#define MTEST_RECV_DDT_NUM_TESTS (MTEST_DDT_NUM_TYPES * MTEST_DDT_NUM_SUBTESTS)
-#define MTEST_RECV_DDT_RANGE (MTEST_RECV_DDT_START_IDX + MTEST_RECV_DDT_NUM_TESTS)
+enum {
+    MTEST_DATATYPE_VERSION_FULL,
+    MTEST_DATATYPE_VERSION_MIN
+};
 
-static MTestDdtCreator mtestDdtCreators[MTEST_DDT_MAX];
+static int MTEST_DATATYPE_VERSION = MTEST_DATATYPE_VERSION_FULL;
+
+static void MTestInitDatatypeGen(int basic_dt_num, int derived_dt_num)
+{
+    MTEST_BDT_START_IDX = 0;
+    MTEST_BDT_NUM_TESTS = basic_dt_num;
+    MTEST_BDT_RANGE = MTEST_BDT_START_IDX + MTEST_BDT_NUM_TESTS;
+    MTEST_DDT_NUM_TYPES = derived_dt_num;
+    MTEST_SEND_DDT_START_IDX = MTEST_BDT_NUM_TESTS;
+    MTEST_SEND_DDT_NUM_TESTS = MTEST_DDT_NUM_TYPES * MTEST_DDT_NUM_SUBTESTS;
+    MTEST_SEND_DDT_RANGE = MTEST_SEND_DDT_START_IDX + MTEST_SEND_DDT_NUM_TESTS;
+    MTEST_RECV_DDT_START_IDX = MTEST_SEND_DDT_START_IDX + MTEST_SEND_DDT_NUM_TESTS;
+    MTEST_RECV_DDT_NUM_TESTS = MTEST_DDT_NUM_TYPES * MTEST_DDT_NUM_SUBTESTS;
+    MTEST_RECV_DDT_RANGE = MTEST_RECV_DDT_START_IDX + MTEST_RECV_DDT_NUM_TESTS;
+}
+
+static int MTestIsDatatypeGenInited()
+{
+    return (MTEST_BDT_START_IDX < 0) ? 0 : 1;
+}
+
+static void MTestPrintDatatypeGen()
+{
+    MTestPrintfMsg(1, "MTest datatype version : %s. %d basic datatype tests, "
+                   "%d derived datatype tests will be generated\n",
+                   (MTEST_DATATYPE_VERSION == MTEST_DATATYPE_VERSION_FULL) ? "FULL" : "MIN",
+                   MTEST_BDT_NUM_TESTS, MTEST_SEND_DDT_NUM_TESTS + MTEST_RECV_DDT_NUM_TESTS);
+}
+
+static void MTestResetDatatypeGen()
+{
+    MTEST_BDT_START_IDX = -1;
+}
+
+void MTestInitFullDatatypes()
+{
+    /* Do not allow to change datatype version during loop.
+     * Otherwise indexes will be wrong.
+     * Test must explicitly call reset or wait for current datatype loop being
+     * done before changing to another datatype version. */
+    if (!MTestIsDatatypeGenInited()) {
+        MTEST_DATATYPE_VERSION = MTEST_DATATYPE_VERSION_FULL;
+        MTestTypeCreatorInit((MTestDdtCreator *) mtestDdtCreators);
+        MTestInitDatatypeGen(MTEST_BDT_MAX, MTEST_DDT_MAX);
+    }
+    else {
+        printf("Warning: trying to reinitialize mtest datatype during " "datatype iteration!");
+    }
+}
+
+void MTestInitMinDatatypes()
+{
+    /* Do not allow to change datatype version during loop.
+     * Otherwise indexes will be wrong.
+     * Test must explicitly call reset or wait for current datatype loop being
+     * done before changing to another datatype version. */
+    if (!MTestIsDatatypeGenInited()) {
+        MTEST_DATATYPE_VERSION = MTEST_DATATYPE_VERSION_MIN;
+        MTestTypeMinCreatorInit((MTestDdtCreator *) mtestDdtCreators);
+        MTestInitDatatypeGen(MTEST_BDT_MAX, MTEST_MIN_DDT_MAX);
+    }
+    else {
+        printf("Warning: trying to reinitialize mtest datatype during " "datatype iteration!");
+    }
+}
 
 
 /* -------------------------------------------------------------------------------*/
@@ -325,8 +407,16 @@ int MTestGetDatatypes(MTestDatatype * sendtype, MTestDatatype * recvtype, int to
     MTestGetDbgInfo(&dbgflag, &verbose);
     MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
 
-    MTestTypeCreatorInit((MTestDdtCreator *) mtestDdtCreators);
+    /* Initialize the full version if test does not specify. */
+    if (!MTestIsDatatypeGenInited()) {
+        MTestInitFullDatatypes();
+    }
+
+    if (datatype_index == 0) {
+        MTestPrintDatatypeGen();
+    }
 
+    /* Start generating tests */
     if (datatype_index < MTEST_BDT_RANGE) {
         merr = MTestGetBasicDatatypes(sendtype, recvtype, tot_count);
 
@@ -342,6 +432,7 @@ int MTestGetDatatypes(MTestDatatype * sendtype, MTestDatatype * recvtype, int to
     else {
         /* out of range */
         datatype_index = -1;
+        MTestResetDatatypeGen();
     }
 
     /* stop if error reported */
@@ -356,17 +447,18 @@ int MTestGetDatatypes(MTestDatatype * sendtype, MTestDatatype * recvtype, int to
 
     datatype_index++;
 
-    if ((verbose || dbgflag) && datatype_index > 0) {
+    if (verbose >= 2 && datatype_index > 0) {
         int ssize, rsize;
         const char *sendtype_nm = MTestGetDatatypeName(sendtype);
         const char *recvtype_nm = MTestGetDatatypeName(recvtype);
         MPI_Type_size(sendtype->datatype, &ssize);
         MPI_Type_size(recvtype->datatype, &rsize);
-        printf("Get datatypes: send = %s(size %d count %d basesize %d), "
-               "recv = %s(size %d count %d basesize %d), tot_count=%d\n",
-               sendtype_nm, ssize, sendtype->count, sendtype->basesize,
-               recvtype_nm, rsize, recvtype->count, recvtype->basesize,
-               tot_count);
+
+        MTestPrintfMsg(2, "Get datatypes: send = %s(size %d count %d basesize %d), "
+                       "recv = %s(size %d count %d basesize %d), tot_count=%d\n",
+                       sendtype_nm, ssize, sendtype->count, sendtype->basesize,
+                       recvtype_nm, rsize, recvtype->count, recvtype->basesize,
+                       tot_count);
         fflush(stdout);
     }
 
@@ -380,6 +472,7 @@ int MTestGetDatatypes(MTestDatatype * sendtype, MTestDatatype * recvtype, int to
 void MTestResetDatatypes(void)
 {
     datatype_index = 0;
+    MTestResetDatatypeGen();
 }
 
 /* Return the index of the current datatype.  This is rarely needed and

http://git.mpich.org/mpich.git/commitdiff/34e665cec90942f83dce403ee11a8cfb7f14bde2

commit 34e665cec90942f83dce403ee11a8cfb7f14bde2
Author: Min Si <msi at il.is.s.u-tokyo.ac.jp>
Date:   Fri Oct 24 13:22:37 2014 -0500

    Change count of mtest datatype in bcast3 mpi test.
    
    A count of 0 is not allowed in subarray. Please revert this change if
    there is a special reason to pass count-1 in the datatype while loop.
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/test/mpi/coll/bcast3.c b/test/mpi/coll/bcast3.c
index daf8c1c..3457208 100644
--- a/test/mpi/coll/bcast3.c
+++ b/test/mpi/coll/bcast3.c
@@ -34,7 +34,7 @@ int main( int argc, char *argv[] )
 	
 	count = 1;
 	MTEST_DATATYPE_FOR_EACH_COUNT(count) {
-	    while (MTestGetDatatypes( &sendtype, &recvtype, count-1 )) {
+	    while (MTestGetDatatypes( &sendtype, &recvtype, count )) {
 		for (root=0; root<size; root++) {
 		    if (rank == root) {
 			sendtype.InitBuf( &sendtype );

http://git.mpich.org/mpich.git/commitdiff/ccdd417bd2419224a60752d0db39e771832de129

commit ccdd417bd2419224a60752d0db39e771832de129
Author: Min Si <msi at il.is.s.u-tokyo.ac.jp>
Date:   Fri Oct 24 12:35:56 2014 -0500

    Use unified loop of mtest datatype in mpi tests.
    
    Each mpi test originally called the mtest datatype by using a hardcoded
    count loop. Now we predefine a count loop and let the mpi tests call
    this predefined loop instead of hardcoding. This change allows the mtest
    routine to manage the size of generated datatypes in every mpi
    test (i.e., to ensure that every mpi test reaches the large message
    algorithms).
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/test/mpi/coll/bcast2.c b/test/mpi/coll/bcast2.c
index b2c2f79..2ff6d86 100644
--- a/test/mpi/coll/bcast2.c
+++ b/test/mpi/coll/bcast2.c
@@ -37,9 +37,7 @@ int main( int argc, char *argv[] )
 	   change the error handler to errors return */
 	MPI_Errhandler_set( comm, MPI_ERRORS_RETURN );
 
-	/* The max value of count must be very large to ensure that we 
-	   reach the long message algorithms */
-	for (count = 1; count < 280000; count = count * 4) {
+    MTEST_DATATYPE_FOR_EACH_COUNT(count) {
 	    while (MTestGetDatatypes( &sendtype, &recvtype, count )) {
 		for (root=0; root<size; root++) {
 		    if (rank == root) {
diff --git a/test/mpi/coll/bcast3.c b/test/mpi/coll/bcast3.c
index 84250b2..daf8c1c 100644
--- a/test/mpi/coll/bcast3.c
+++ b/test/mpi/coll/bcast3.c
@@ -33,9 +33,7 @@ int main( int argc, char *argv[] )
 	MPI_Comm_size( comm, &size );
 	
 	count = 1;
-	/* This must be very large to ensure that we reach the long message
-	   algorithms */
-	for (count = 4; count < 66000; count = count * 4) {
+	MTEST_DATATYPE_FOR_EACH_COUNT(count) {
 	    while (MTestGetDatatypes( &sendtype, &recvtype, count-1 )) {
 		for (root=0; root<size; root++) {
 		    if (rank == root) {
diff --git a/test/mpi/include/mpitest.h b/test/mpi/include/mpitest.h
index f31c1bc..cdbbe79 100644
--- a/test/mpi/include/mpitest.h
+++ b/test/mpi/include/mpitest.h
@@ -57,6 +57,12 @@ typedef struct _MTestDatatype {
     int   (*CheckBuf)( struct _MTestDatatype * );
 } MTestDatatype;
 
+/* The max value of count must be very large to ensure that we
+ *  reach the long message algorithms. (The maximal count or block length
+ *  can be generated by 256K count is 4K or 32Kbytes respectively) */
+#define MTEST_DATATYPE_FOR_EACH_COUNT(count) \
+        for (count = 1; count <= 262144; count *= 8)
+
 int MTestCheckRecv( MPI_Status *, MTestDatatype * );
 int MTestGetDatatypes( MTestDatatype *, MTestDatatype *, int );
 void MTestResetDatatypes( void );
diff --git a/test/mpi/pt2pt/sendrecv1.c b/test/mpi/pt2pt/sendrecv1.c
index 13f1dd2..57059eb 100644
--- a/test/mpi/pt2pt/sendrecv1.c
+++ b/test/mpi/pt2pt/sendrecv1.c
@@ -39,7 +39,7 @@ int main( int argc, char *argv[] )
 	   change the error handler to errors return */
 	MPI_Comm_set_errhandler( comm, MPI_ERRORS_RETURN );
 
-	for (count = 1; count < 65000; count = count * 2) {
+	MTEST_DATATYPE_FOR_EACH_COUNT(count) {
 	    while (MTestGetDatatypes( &sendtype, &recvtype, count )) {
 		/* Make sure that everyone has a recv buffer */
 		recvtype.InitBuf( &recvtype );
diff --git a/test/mpi/pt2pt/sendself.c b/test/mpi/pt2pt/sendself.c
index 5286272..6b1e261 100644
--- a/test/mpi/pt2pt/sendself.c
+++ b/test/mpi/pt2pt/sendself.c
@@ -31,7 +31,7 @@ int main( int argc, char *argv[] )
        change the error handler to errors return */
     MPI_Comm_set_errhandler( comm, MPI_ERRORS_RETURN );
     
-    for (count = 1; count < 65000; count = count * 2) {
+    MTEST_DATATYPE_FOR_EACH_COUNT(count) {
 	while (MTestGetDatatypes( &sendtype, &recvtype, count )) {
 	    
 	    sendtype.InitBuf( &sendtype );
diff --git a/test/mpi/rma/accfence1.c b/test/mpi/rma/accfence1.c
index 91d9f43..0ede7ab 100644
--- a/test/mpi/rma/accfence1.c
+++ b/test/mpi/rma/accfence1.c
@@ -36,7 +36,7 @@ int main( int argc, char *argv[] )
 	source = 0;
 	dest   = size - 1;
 	
-	for (count = 1; count < 65000; count = count * 2) {
+	MTEST_DATATYPE_FOR_EACH_COUNT(count) {
 	    while (MTestGetDatatypes( &sendtype, &recvtype, count )) {
 		/* Make sure that everyone has a recv buffer */
 		recvtype.InitBuf( &recvtype );
diff --git a/test/mpi/rma/accpscw1.c b/test/mpi/rma/accpscw1.c
index 4b4976e..893a577 100644
--- a/test/mpi/rma/accpscw1.c
+++ b/test/mpi/rma/accpscw1.c
@@ -37,7 +37,7 @@ int main( int argc, char *argv[] )
 	source = 0;
 	dest   = size - 1;
 	
-	for (count = 1; count < 65000; count = count * 2) {
+	MTEST_DATATYPE_FOR_EACH_COUNT(count) {
 	    while (MTestGetDatatypes( &sendtype, &recvtype, count )) {
 		/* Make sure that everyone has a recv buffer */
 		recvtype.InitBuf( &recvtype );
diff --git a/test/mpi/rma/epochtest.c b/test/mpi/rma/epochtest.c
index 7a3222a..4504454 100644
--- a/test/mpi/rma/epochtest.c
+++ b/test/mpi/rma/epochtest.c
@@ -57,7 +57,7 @@ int main( int argc, char **argv )
 	source = 0;
 	dest   = size - 1;
 	
-	for (count = 1; count < 65000; count = count * 2) {
+	MTEST_DATATYPE_FOR_EACH_COUNT(count) {
 	    while (MTestGetDatatypes( &sendtype, &recvtype, count )) {
 
 		MTestPrintfMsg( 1, 
diff --git a/test/mpi/rma/getfence1.c b/test/mpi/rma/getfence1.c
index 2aaba82..869b143 100644
--- a/test/mpi/rma/getfence1.c
+++ b/test/mpi/rma/getfence1.c
@@ -36,7 +36,7 @@ int main( int argc, char *argv[] )
 	source = 0;
 	dest   = size - 1;
 	
-	for (count = 1; count < 65000; count = count * 2) {
+	MTEST_DATATYPE_FOR_EACH_COUNT(count) {
 	    while (MTestGetDatatypes( &sendtype, &recvtype, count )) {
 		/* Make sure that everyone has a recv buffer */
 		recvtype.InitBuf( &recvtype );
diff --git a/test/mpi/rma/putfence1.c b/test/mpi/rma/putfence1.c
index 1020063..5c6bee4 100644
--- a/test/mpi/rma/putfence1.c
+++ b/test/mpi/rma/putfence1.c
@@ -36,7 +36,7 @@ int main( int argc, char *argv[] )
 	source = 0;
 	dest   = size - 1;
 	
-	for (count = 1; count < 65000; count = count * 2) {
+	MTEST_DATATYPE_FOR_EACH_COUNT(count) {
 	    while (MTestGetDatatypes( &sendtype, &recvtype, count )) {
 
 		MTestPrintfMsg( 1, 
diff --git a/test/mpi/rma/putpscw1.c b/test/mpi/rma/putpscw1.c
index ff18f4c..fb05df5 100644
--- a/test/mpi/rma/putpscw1.c
+++ b/test/mpi/rma/putpscw1.c
@@ -37,7 +37,7 @@ int main( int argc, char *argv[] )
 	source = 0;
 	dest   = size - 1;
 	
-	for (count = 1; count < 65000; count = count * 2) {
+	MTEST_DATATYPE_FOR_EACH_COUNT(count) {
 	    while (MTestGetDatatypes( &sendtype, &recvtype, count )) {
 		/* Make sure that everyone has a recv buffer */
 		recvtype.InitBuf( &recvtype );
diff --git a/test/mpi/template.c b/test/mpi/template.c
index 323586e..801fc5e 100644
--- a/test/mpi/template.c
+++ b/test/mpi/template.c
@@ -38,7 +38,7 @@ int main( int argc, char *argv[] )
 	   change the error handler to errors return */
 	MPI_Comm_set_errhandler( comm, MPI_ERRORS_RETURN );
 	
-	for (count = 1; count < 65000; count = count * 2) {
+	MTEST_DATATYPE_FOR_EACH_COUNT(count) {
 	    while (MTestGetDatatypes( &sendtype, &recvtype, count )) {
 		if (rank == source) {
 		    sendtype.InitBuf( &sendtype );

http://git.mpich.org/mpich.git/commitdiff/d3abd5ae76c8ff88ebb46d11d49c361958c04939

commit d3abd5ae76c8ff88ebb46d11d49c361958c04939
Author: Min Si <msi at il.is.s.u-tokyo.ac.jp>
Date:   Tue Nov 4 14:31:29 2014 -0600

    Add new datatypes in mtest_datatype.
    
    Add hvector, hindexed, indexed_block, hindexed_block, struct,
    subarray with order-c and subarray with order-fortran datatypes.
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/test/mpi/include/mpitest.h b/test/mpi/include/mpitest.h
index 86a6a54..f31c1bc 100644
--- a/test/mpi/include/mpitest.h
+++ b/test/mpi/include/mpitest.h
@@ -44,11 +44,14 @@ typedef struct _MTestDatatype {
 			       (used by the CheckBuf routines) */
     /* The following is optional data that is used by some of
        the derived datatypes */
-    int  stride, nblock, blksize, *index;
-    /* stride, nelm, and blksize are in bytes */
-    int *displs, *displ_in_bytes, basesize;
-    /* displacements are in multiples of base type; basesize is the
-       size of that type*/
+    int  nblock, *index;
+    /* stride, and blksize are in bytes */
+    MPI_Aint stride, blksize, *displ_in_bytes;
+    int *displs, basesize;
+    MPI_Datatype *old_datatypes;
+    /* used in subarray */
+    int arr_sizes[2], arr_subsizes[2], arr_starts[2], order;
+
     void *(*InitBuf)( struct _MTestDatatype * );
     void *(*FreeBuf)( struct _MTestDatatype * );
     int   (*CheckBuf)( struct _MTestDatatype * );
diff --git a/test/mpi/util/mtest_datatype.c b/test/mpi/util/mtest_datatype.c
index c07aedd..412faa7 100644
--- a/test/mpi/util/mtest_datatype.c
+++ b/test/mpi/util/mtest_datatype.c
@@ -43,10 +43,13 @@ static void *MTestTypeFree(MTestDatatype * mtype)
         free(mtype->displ_in_bytes);
     if (mtype->index)
         free(mtype->index);
+    if (mtype->old_datatypes)
+        free(mtype->old_datatypes);
     mtype->buf = 0;
     mtype->displs = 0;
     mtype->displ_in_bytes = 0;
     mtype->index = 0;
+    mtype->old_datatypes = 0;
 
     return 0;
 }
@@ -63,6 +66,7 @@ static inline void MTestTypeReset(MTestDatatype * mtype)
     mtype->index = 0;
     mtype->displs = 0;
     mtype->displ_in_bytes = 0;
+    mtype->old_datatypes = 0;
 }
 
 /* ------------------------------------------------------------------------ */
@@ -248,12 +252,12 @@ static int MTestTypeVectorCheckbuf(MTestDatatype * mtype)
  */
 static void *MTestTypeIndexedInit(MTestDatatype * mtype)
 {
-    MPI_Aint size = 0, totsize;
+    MPI_Aint size = 0, totsize, dt_offset, offset;
     int merr;
 
     if (mtype->count > 0) {
         unsigned char *p;
-        int i, j, k, b, nc, offset, dt_offset;
+        int i, j, k, b, nc;
 
         /* Allocate buffer */
         merr = MPI_Type_extent(mtype->datatype, &size);
@@ -314,11 +318,11 @@ static int MTestTypeIndexedCheckbuf(MTestDatatype * mtype)
     unsigned char *p;
     unsigned char expected;
     int err = 0, merr;
-    MPI_Aint size = 0;
+    MPI_Aint size = 0, offset, dt_offset;
 
     p = (unsigned char *) mtype->buf;
     if (p) {
-        int i, j, k, b, nc, offset, dt_offset;
+        int i, j, k, b, nc;
         merr = MPI_Type_extent(mtype->datatype, &size);
         if (merr)
             MTestPrintError(merr);
@@ -353,6 +357,240 @@ static int MTestTypeIndexedCheckbuf(MTestDatatype * mtype)
     return err;
 }
 
+/* ------------------------------------------------------------------------ */
+/* Datatype routines for indexed-block datatypes                            */
+/* ------------------------------------------------------------------------ */
+
+/*
+ * Initialize buffer of indexed-block datatype
+ */
+static void *MTestTypeIndexedBlockInit(MTestDatatype * mtype)
+{
+    MPI_Aint size = 0, totsize, offset, dt_offset;
+    int merr;
+
+    if (mtype->count > 0) {
+        unsigned char *p;
+        int i, k, j, nc;
+
+        /* Allocate the send/recv buffer */
+        merr = MPI_Type_extent(mtype->datatype, &size);
+        if (merr)
+            MTestPrintError(merr);
+        totsize = size * mtype->count;
+
+        if (!mtype->buf) {
+            mtype->buf = (void *) malloc(totsize);
+        }
+        p = (unsigned char *) (mtype->buf);
+        if (!p) {
+            char errmsg[128] = { 0 };
+            sprintf(errmsg, "Out of memory in %s", __FUNCTION__);
+            MTestError(errmsg);
+        }
+
+        /* First, set to -1 */
+        for (i = 0; i < totsize; i++)
+            p[i] = 0xff;
+
+        /* Now, set the actual elements to the successive values.
+         * We require that the base type is a contiguous type */
+        nc = 0;
+        dt_offset = 0;
+        /* For each datatype */
+        for (k = 0; k < mtype->count; k++) {
+            /* For each block */
+            for (i = 0; i < mtype->nblock; i++) {
+                offset = dt_offset + mtype->displ_in_bytes[i];
+                /* For each byte in the block */
+                for (j = 0; j < mtype->blksize; j++) {
+                    p[offset + j] = (unsigned char) (0xff ^ (nc++ & 0xff));
+                }
+            }
+            dt_offset += size;
+        }
+    }
+    else {
+        /* count == 0 */
+        if (mtype->buf) {
+            free(mtype->buf);
+        }
+        mtype->buf = 0;
+    }
+    return mtype->buf;
+}
+
+/*
+ * Check value of received indexed-block datatype buffer
+ */
+static int MTestTypeIndexedBlockCheckbuf(MTestDatatype * mtype)
+{
+    unsigned char *p;
+    unsigned char expected;
+    int err = 0, merr;
+    MPI_Aint size = 0, offset, dt_offset;
+
+    p = (unsigned char *) mtype->buf;
+    if (p) {
+        int i, j, k, nc;
+        merr = MPI_Type_extent(mtype->datatype, &size);
+        if (merr)
+            MTestPrintError(merr);
+
+        nc = 0;
+        dt_offset = 0;
+        /* For each datatype */
+        for (k = 0; k < mtype->count; k++) {
+            /* For each block */
+            for (i = 0; i < mtype->nblock; i++) {
+                offset = dt_offset + mtype->displ_in_bytes[i];
+                /* For each byte in the block */
+                for (j = 0; j < mtype->blksize; j++) {
+                    expected = (unsigned char) (0xff ^ (nc++ & 0xff));
+                    if (p[offset + j] != expected) {
+                        err++;
+                        if (mtype->printErrors && err < 10) {
+                            printf("Data expected = %x but got p[%d,%d] = %x\n",
+                                   expected, i, j, p[offset + j]);
+                            fflush(stdout);
+                        }
+                    }
+                }
+            }
+            dt_offset += size;
+        }
+    }
+    return err;
+}
+
+/* ------------------------------------------------------------------------ */
+/* Datatype routines for subarray datatypes with order Fortran              */
+/* ------------------------------------------------------------------------ */
+
+/*
+ * Initialize buffer of subarray datatype.
+ */
+static void *MTestTypeSubarrayInit(MTestDatatype * mtype)
+{
+    MPI_Aint size = 0, totsize, offset, dt_offset, byte_offset;
+    int merr;
+
+    if (mtype->count > 0) {
+        unsigned char *p;
+        int i, k, j, b, nc;
+
+        /* Allocate the send/recv buffer */
+        merr = MPI_Type_extent(mtype->datatype, &size);
+        if (merr)
+            MTestPrintError(merr);
+        totsize = size * mtype->count;
+
+        if (!mtype->buf) {
+            mtype->buf = (void *) malloc(totsize);
+        }
+        p = (unsigned char *) (mtype->buf);
+        if (!p) {
+            char errmsg[128] = { 0 };
+            sprintf(errmsg, "Out of memory in %s", __FUNCTION__);
+            MTestError(errmsg);
+        }
+
+        /* First, set to -1 */
+        for (i = 0; i < totsize; i++)
+            p[i] = 0xff;
+
+        /* Now, set the actual elements to the successive values.
+         * We require that the base type is a contiguous type. */
+        int ncol, sub_ncol, sub_nrow, sub_col_start, sub_row_start;
+        ncol = mtype->arr_sizes[1];
+        sub_nrow = mtype->arr_subsizes[0];
+        sub_ncol = mtype->arr_subsizes[1];
+        sub_row_start = mtype->arr_starts[0];
+        sub_col_start = mtype->arr_starts[1];
+
+        nc = 0;
+        dt_offset = 0;
+        /* For each datatype */
+        for (k = 0; k < mtype->count; k++) {
+            /* For each row */
+            for (i = 0; i < sub_nrow; i++) {
+                offset = (sub_row_start + i) * ncol + sub_col_start;
+                /* For each element in row */
+                for (j = 0; j < sub_ncol; j++) {
+                    byte_offset = dt_offset + (offset + j) * mtype->basesize;
+                    /* For each byte in element */
+                    for (b = 0; b < mtype->basesize; b++)
+                        p[byte_offset + b] = (unsigned char) (0xff ^ (nc++ & 0xff));
+                }
+            }
+            dt_offset += size;
+        }
+    }
+    else {
+        /* count == 0 */
+        if (mtype->buf) {
+            free(mtype->buf);
+        }
+        mtype->buf = 0;
+    }
+    return mtype->buf;
+}
+
+/*
+ * Check value of received subarray datatype buffer
+ */
+static int MTestTypeSubarrayCheckbuf(MTestDatatype * mtype)
+{
+    unsigned char *p;
+    unsigned char expected;
+    int err = 0, merr;
+    MPI_Aint size, offset, dt_offset, byte_offset;
+
+    p = (unsigned char *) mtype->buf;
+    if (p) {
+        int j, k, i, b, nc;
+        merr = MPI_Type_extent(mtype->datatype, &size);
+        if (merr)
+            MTestPrintError(merr);
+
+        int ncol, sub_ncol, sub_nrow, sub_col_start, sub_row_start;
+        ncol = mtype->arr_sizes[1];
+        sub_nrow = mtype->arr_subsizes[0];
+        sub_ncol = mtype->arr_subsizes[1];
+        sub_row_start = mtype->arr_starts[0];
+        sub_col_start = mtype->arr_starts[1];
+
+        nc = 0;
+        dt_offset = 0;
+        /* For each datatype */
+        for (k = 0; k < mtype->count; k++) {
+            /* For each row */
+            for (i = 0; i < sub_nrow; i++) {
+                offset = (sub_row_start + i) * ncol + sub_col_start;
+                /* For each element in row */
+                for (j = 0; j < sub_ncol; j++) {
+                    byte_offset = dt_offset + (offset + j) * mtype->basesize;
+                    /* For each byte in element */
+                    for (b = 0; b < mtype->basesize; b++) {
+                        expected = (unsigned char) (0xff ^ (nc++ & 0xff));
+                        if (p[byte_offset + b] != expected) {
+                            err++;
+                            if (mtype->printErrors && err < 10) {
+                                printf("Data expected = %x but got p[%d,%d,%d] = %x\n",
+                                       expected, i, j, b, p[byte_offset + b]);
+                                fflush(stdout);
+                            }
+                        }
+                    }
+                }
+            }
+            dt_offset += size;
+        }
+    }
+    if (err)
+        printf("%s error\n", __FUNCTION__);
+    return err;
+}
 
 /* ------------------------------------------------------------------------ */
 /* Datatype creators                                                      */
@@ -451,6 +689,56 @@ static int MTestTypeVectorCreate(int nblock, int blocklen, int stride,
 }
 
 /*
+ * Setup hvector type info and handlers.
+ *
+ * An hvector datatype is created using the following parameters.
+ * nblock:   Number of blocks.
+ * blocklen: Number of elements in each block.
+ * stride:   Number of elements between the starts of adjacent blocks.
+ * oldtype:  Datatype of each element.
+ */
+static int MTestTypeHvectorCreate(int nblock, int blocklen, int stride,
+                                  MPI_Datatype oldtype, const char *typename_prefix,
+                                  MTestDatatype * mtype)
+{
+    int merr;
+    char type_name[128];
+
+    MTestTypeReset(mtype);
+
+    merr = MPI_Type_size(oldtype, &mtype->basesize);
+    if (merr)
+        MTestPrintError(merr);
+
+    /* These sizes are in bytes (see the VectorInit code) */
+    mtype->stride = stride * mtype->basesize;
+    mtype->blksize = blocklen * mtype->basesize;
+    mtype->nblock = nblock;
+
+    /* Hvector uses stride in bytes */
+    merr = MPI_Type_create_hvector(nblock, blocklen, mtype->stride, oldtype, &mtype->datatype);
+    if (merr)
+        MTestPrintError(merr);
+    merr = MPI_Type_commit(&mtype->datatype);
+    if (merr)
+        MTestPrintError(merr);
+
+    memset(type_name, 0, sizeof(type_name));
+    sprintf(type_name, "%s %s (%d nblock %d blocklen %d stride)", typename_prefix, "hvector",
+            nblock, blocklen, stride);
+    merr = MPI_Type_set_name(mtype->datatype, (char *) type_name);
+    if (merr)
+        MTestPrintError(merr);
+
+    /* Use the same functions as vector, because mtype->stride is in bytes */
+    mtype->InitBuf = MTestTypeVectorInit;
+    mtype->FreeBuf = MTestTypeFree;
+    mtype->CheckBuf = MTestTypeVectorCheckbuf;
+
+    return merr;
+}
+
+/*
  * Setup indexed type info and handlers.
  *
  * A indexed datatype is created by using following parameters.
@@ -475,7 +763,7 @@ static int MTestTypeIndexedCreate(int nblock, int blocklen, int stride,
         MTestPrintError(merr);
 
     mtype->displs = (int *) malloc(nblock * sizeof(int));
-    mtype->displ_in_bytes = (int *) malloc(nblock * sizeof(int));
+    mtype->displ_in_bytes = (MPI_Aint *) malloc(nblock * sizeof(MPI_Aint));
     mtype->index = (int *) malloc(nblock * sizeof(int));
     if (!mtype->displs || !mtype->displ_in_bytes || !mtype->index) {
         char errmsg[128] = { 0 };
@@ -513,6 +801,381 @@ static int MTestTypeIndexedCreate(int nblock, int blocklen, int stride,
 }
 
 /*
+ * Setup hindexed type info and handlers.
+ *
+ * An hindexed datatype is created using the following parameters.
+ * nblock:   Number of blocks.
+ * blocklen: Number of elements in each block. Each block has the same length.
+ * stride:   Number of elements between the starts of two adjacent blocks. The byte
+ *           displacement of each block is (index of current block * stride * size of oldtype).
+ * oldtype:  Datatype of each element.
+ */
+static inline int MTestTypeHindexedCreate(int nblock, int blocklen, int stride,
+                                          MPI_Datatype oldtype, const char *typename_prefix,
+                                          MTestDatatype * mtype)
+{
+    int merr;
+    char type_name[128];
+    int i;
+
+    MTestTypeReset(mtype);
+
+    merr = MPI_Type_size(oldtype, &mtype->basesize);
+    if (merr)
+        MTestPrintError(merr);
+
+    mtype->index = (int *) malloc(nblock * sizeof(int));
+    mtype->displ_in_bytes = (MPI_Aint *) malloc(nblock * sizeof(MPI_Aint));
+    if (!mtype->displ_in_bytes || !mtype->index) {
+        char errmsg[128] = { 0 };
+        sprintf(errmsg, "Out of memory in %s", __FUNCTION__);
+        MTestError(errmsg);
+    }
+
+    mtype->nblock = nblock;
+    for (i = 0; i < nblock; i++) {
+        mtype->index[i] = blocklen;
+        mtype->displ_in_bytes[i] = stride * i * mtype->basesize;
+    }
+
+    /* Hindexed uses displacement in bytes */
+    merr = MPI_Type_create_hindexed(nblock, mtype->index, mtype->displ_in_bytes,
+                                    oldtype, &mtype->datatype);
+    if (merr)
+        MTestPrintError(merr);
+    merr = MPI_Type_commit(&mtype->datatype);
+    if (merr)
+        MTestPrintError(merr);
+
+    memset(type_name, 0, sizeof(type_name));
+    sprintf(type_name, "%s %s (%d nblock %d blocklen %d stride)", typename_prefix, "hindex", nblock,
+            blocklen, stride);
+    merr = MPI_Type_set_name(mtype->datatype, (char *) type_name);
+    if (merr)
+        MTestPrintError(merr);
+
+    /* Reuse indexed functions, because all of them only use displ_in_bytes */
+    mtype->InitBuf = MTestTypeIndexedInit;
+    mtype->FreeBuf = MTestTypeFree;
+    mtype->CheckBuf = MTestTypeIndexedCheckbuf;
+
+    return merr;
+}
+
+
+/*
+ * Setup indexed-block type info and handlers.
+ *
+ * An indexed-block datatype is created using the following parameters.
+ * nblock:   Number of blocks.
+ * blocklen: Number of elements in each block.
+ * stride:   Number of elements between the starts of two adjacent blocks. The
+ *           displacement of each block is (index of current block * stride).
+ * oldtype:  Datatype of each element.
+ */
+static int MTestTypeIndexedBlockCreate(int nblock, int blocklen, int stride,
+                                       MPI_Datatype oldtype, const char *typename_prefix,
+                                       MTestDatatype * mtype)
+{
+    int merr;
+    char type_name[128];
+    int i;
+
+    MTestTypeReset(mtype);
+
+    merr = MPI_Type_size(oldtype, &mtype->basesize);
+    if (merr)
+        MTestPrintError(merr);
+
+    mtype->displs = (int *) malloc(nblock * sizeof(int));
+    mtype->displ_in_bytes = (MPI_Aint *) malloc(nblock * sizeof(MPI_Aint));
+    if (!mtype->displs || !mtype->displ_in_bytes) {
+        char errmsg[128] = { 0 };
+        sprintf(errmsg, "Out of memory in %s", __FUNCTION__);
+        MTestError(errmsg);
+    }
+
+    mtype->nblock = nblock;
+    mtype->blksize = blocklen * mtype->basesize;
+    for (i = 0; i < nblock; i++) {
+        mtype->displs[i] = stride * i;
+        mtype->displ_in_bytes[i] = stride * i * mtype->basesize;
+    }
+
+    /* Indexed-block uses displacement in oldtypes */
+    merr = MPI_Type_create_indexed_block(nblock, blocklen, mtype->displs,
+                                         oldtype, &mtype->datatype);
+    if (merr)
+        MTestPrintError(merr);
+    merr = MPI_Type_commit(&mtype->datatype);
+    if (merr)
+        MTestPrintError(merr);
+
+    memset(type_name, 0, sizeof(type_name));
+    sprintf(type_name, "%s %s (%d nblock %d blocklen %d stride)", typename_prefix, "index_block",
+            nblock, blocklen, stride);
+    merr = MPI_Type_set_name(mtype->datatype, (char *) type_name);
+    if (merr)
+        MTestPrintError(merr);
+
+    mtype->InitBuf = MTestTypeIndexedBlockInit;
+    mtype->FreeBuf = MTestTypeFree;
+    mtype->CheckBuf = MTestTypeIndexedBlockCheckbuf;
+
+    return merr;
+}
+
+/*
+ * Setup hindexed-block type info and handlers.
+ *
+ * An hindexed-block datatype is created using the following parameters.
+ * nblock:   Number of blocks.
+ * blocklen: Number of elements in each block.
+ * stride:   Number of elements between the starts of two adjacent blocks. The byte
+ *           displacement of each block is (index of current block * stride * size of oldtype).
+ * oldtype:  Datatype of each element.
+ */
+static int MTestTypeHindexedBlockCreate(int nblock, int blocklen, int stride,
+                                        MPI_Datatype oldtype, const char *typename_prefix,
+                                        MTestDatatype * mtype)
+{
+    int merr;
+    char type_name[128];
+    int i;
+
+    MTestTypeReset(mtype);
+
+    merr = MPI_Type_size(oldtype, &mtype->basesize);
+    if (merr)
+        MTestPrintError(merr);
+
+    mtype->displ_in_bytes = (MPI_Aint *) malloc(nblock * sizeof(MPI_Aint));
+    if (!mtype->displ_in_bytes) {
+        char errmsg[128] = { 0 };
+        sprintf(errmsg, "Out of memory in %s", __FUNCTION__);
+        MTestError(errmsg);
+    }
+
+    mtype->nblock = nblock;
+    mtype->blksize = blocklen * mtype->basesize;
+    for (i = 0; i < nblock; i++) {
+        mtype->displ_in_bytes[i] = stride * i * mtype->basesize;
+    }
+
+    /* Hindexed-block uses displacement in bytes */
+    merr = MPI_Type_create_hindexed_block(nblock, blocklen, mtype->displ_in_bytes,
+                                          oldtype, &mtype->datatype);
+    if (merr)
+        MTestPrintError(merr);
+    merr = MPI_Type_commit(&mtype->datatype);
+    if (merr)
+        MTestPrintError(merr);
+
+    memset(type_name, 0, sizeof(type_name));
+    sprintf(type_name, "%s %s (%d nblock %d blocklen %d stride)", typename_prefix, "hindex_block",
+            nblock, blocklen, stride);
+    merr = MPI_Type_set_name(mtype->datatype, (char *) type_name);
+    if (merr)
+        MTestPrintError(merr);
+
+    /* Reuse indexed-block functions, because all of them only use displ_in_bytes */
+    mtype->InitBuf = MTestTypeIndexedBlockInit;
+    mtype->FreeBuf = MTestTypeFree;
+    mtype->CheckBuf = MTestTypeIndexedBlockCheckbuf;
+
+    return merr;
+}
+
+/*
+ * Setup struct type info and handlers.
+ *
+ * A struct datatype is created using the following parameters.
+ * nblock:   Number of blocks.
+ * blocklen: Number of elements in each block. Each block has the same length.
+ * stride:   Number of elements between the starts of two adjacent blocks. The byte
+ *           displacement of each block is (index of current block * stride * size of oldtype).
+ * oldtype:  Datatype of each element. Each block has the same oldtype.
+ */
+static int MTestTypeStructCreate(int nblock, int blocklen, int stride,
+                                 MPI_Datatype oldtype, const char *typename_prefix,
+                                 MTestDatatype * mtype)
+{
+    int merr;
+    char type_name[128];
+    int i;
+
+    MTestTypeReset(mtype);
+
+    merr = MPI_Type_size(oldtype, &mtype->basesize);
+    if (merr)
+        MTestPrintError(merr);
+
+    mtype->old_datatypes = (MPI_Datatype *) malloc(nblock * sizeof(MPI_Datatype));
+    mtype->displ_in_bytes = (MPI_Aint *) malloc(nblock * sizeof(MPI_Aint));
+    mtype->index = (int *) malloc(nblock * sizeof(int));
+    if (!mtype->displ_in_bytes || !mtype->old_datatypes || !mtype->index) {
+        char errmsg[128] = { 0 };
+        sprintf(errmsg, "Out of memory in %s", __FUNCTION__);
+        MTestError(errmsg);
+    }
+
+    mtype->nblock = nblock;
+    mtype->blksize = blocklen * mtype->basesize;
+    for (i = 0; i < nblock; i++) {
+        mtype->displ_in_bytes[i] = stride * i * mtype->basesize;
+        mtype->old_datatypes[i] = oldtype;
+        mtype->index[i] = blocklen;
+    }
+
+    /* Struct uses displacement in bytes */
+    merr = MPI_Type_create_struct(nblock, mtype->index, mtype->displ_in_bytes,
+                                  mtype->old_datatypes, &mtype->datatype);
+    if (merr)
+        MTestPrintError(merr);
+    merr = MPI_Type_commit(&mtype->datatype);
+    if (merr)
+        MTestPrintError(merr);
+
+    memset(type_name, 0, sizeof(type_name));
+    sprintf(type_name, "%s %s (%d nblock %d blocklen %d stride)", typename_prefix, "struct",
+            nblock, blocklen, stride);
+    merr = MPI_Type_set_name(mtype->datatype, (char *) type_name);
+    if (merr)
+        MTestPrintError(merr);
+
+    /* Reuse indexed functions, because they use the same displ_in_bytes and index */
+    mtype->InitBuf = MTestTypeIndexedInit;
+    mtype->FreeBuf = MTestTypeFree;
+    mtype->CheckBuf = MTestTypeIndexedCheckbuf;
+
+    return merr;
+}
+
+/*
+ * Setup order-C subarray type info and handlers.
+ *
+ * A 2D subarray datatype in C order, occupying the middle rows and the
+ * rightmost columns of the full array, is created from the input parameters.
+ * Number of elements in the dimensions of the full array: {nblock + 2, stride}
+ * Number of elements in the dimensions of the subarray: {nblock, blocklen}
+ * Start of the subarray in each dimension: {1, stride - blocklen}
+ * order: MPI_ORDER_C
+ * oldtype: oldtype
+ */
+static int MTestTypeSubArrayOrderCCreate(int nblock, int blocklen, int stride,
+                                         MPI_Datatype oldtype, const char *typename_prefix,
+                                         MTestDatatype * mtype)
+{
+    int merr;
+    char type_name[128];
+
+    MTestTypeReset(mtype);
+
+    merr = MPI_Type_size(oldtype, &mtype->basesize);
+    if (merr)
+        MTestPrintError(merr);
+
+    mtype->arr_sizes[0] = nblock + 2;   /* {row, col} */
+    mtype->arr_sizes[1] = stride;
+    mtype->arr_subsizes[0] = nblock;    /* {row, col} */
+    mtype->arr_subsizes[1] = blocklen;
+    mtype->arr_starts[0] = 1;   /* {row, col} */
+    mtype->arr_starts[1] = stride - blocklen;
+    mtype->order = MPI_ORDER_C;
+
+    merr = MPI_Type_create_subarray(2, mtype->arr_sizes, mtype->arr_subsizes, mtype->arr_starts,
+                                    mtype->order, oldtype, &mtype->datatype);
+    if (merr)
+        MTestPrintError(merr);
+    merr = MPI_Type_commit(&mtype->datatype);
+    if (merr)
+        MTestPrintError(merr);
+
+    memset(type_name, 0, sizeof(type_name));
+    sprintf(type_name, "%s %s (full{%d,%d}, sub{%d,%d},start{%d,%d})",
+            typename_prefix, "subarray-c", mtype->arr_sizes[0], mtype->arr_sizes[1],
+            mtype->arr_subsizes[0], mtype->arr_subsizes[1], mtype->arr_starts[0],
+            mtype->arr_starts[1]);
+    merr = MPI_Type_set_name(mtype->datatype, (char *) type_name);
+    if (merr)
+        MTestPrintError(merr);
+
+    mtype->InitBuf = MTestTypeSubarrayInit;
+    mtype->FreeBuf = MTestTypeFree;
+    mtype->CheckBuf = MTestTypeSubarrayCheckbuf;
+
+    return merr;
+}
+
+
+/*
+ * Setup order-Fortran subarray type info and handlers.
+ *
+ * A 2D subarray datatype in Fortran order, located at the bottom middle
+ * of the full array, is created from the input parameters.
+ * Number of elements in the dimensions of the full array: {stride, nblock + 2}
+ * Number of elements in the dimensions of the subarray: {blocklen, nblock}
+ * Start of the subarray in each dimension: {stride - blocklen, 1}
+ * order: MPI_ORDER_FORTRAN
+ * oldtype: oldtype
+ */
+static int MTestTypeSubArrayOrderFortranCreate(int nblock, int blocklen, int stride,
+                                               MPI_Datatype oldtype, const char *typename_prefix,
+                                               MTestDatatype * mtype)
+{
+    int merr;
+    char type_name[128];
+
+    MTestTypeReset(mtype);
+
+    merr = MPI_Type_size(oldtype, &mtype->basesize);
+    if (merr)
+        MTestPrintError(merr);
+
+    /* Use the same row and col values as the order-C subarray for buffer
+     * initialization and checking, because we access the buffer in C order */
+    mtype->arr_sizes[0] = nblock + 2;   /* {row, col} */
+    mtype->arr_sizes[1] = stride;
+    mtype->arr_subsizes[0] = nblock;    /* {row, col} */
+    mtype->arr_subsizes[1] = blocklen;
+    mtype->arr_starts[0] = 1;   /* {row, col} */
+    mtype->arr_starts[1] = stride - blocklen;
+    mtype->order = MPI_ORDER_FORTRAN;
+
+    /* Reverse row and col when creating the datatype so that the packed data on
+     * the other side is the same, which lets us reuse the contig check function */
+    int arr_sizes[2] = { mtype->arr_sizes[1], mtype->arr_sizes[0] };
+    int arr_subsizes[2] = { mtype->arr_subsizes[1], mtype->arr_subsizes[0] };
+    int arr_starts[2] = { mtype->arr_starts[1], mtype->arr_starts[0] };
+
+    merr = MPI_Type_create_subarray(2, arr_sizes, arr_subsizes, arr_starts,
+                                    mtype->order, oldtype, &mtype->datatype);
+    if (merr)
+        MTestPrintError(merr);
+    merr = MPI_Type_commit(&mtype->datatype);
+    if (merr)
+        MTestPrintError(merr);
+
+    memset(type_name, 0, sizeof(type_name));
+    sprintf(type_name, "%s %s (full{%d,%d}, sub{%d,%d},start{%d,%d})",
+            typename_prefix, "subarray-f", arr_sizes[0], arr_sizes[1],
+            arr_subsizes[0], arr_subsizes[1], arr_starts[0], arr_starts[1]);
+    merr = MPI_Type_set_name(mtype->datatype, (char *) type_name);
+    if (merr)
+        MTestPrintError(merr);
+
+    mtype->InitBuf = MTestTypeSubarrayInit;
+    mtype->FreeBuf = MTestTypeFree;
+    mtype->CheckBuf = MTestTypeSubarrayCheckbuf;
+
+    return merr;
+}
+
+/* ------------------------------------------------------------------------ */
+/* Datatype routines exposed to test generator                             */
+/* ------------------------------------------------------------------------ */
+
+/*
  * Setup basic type info and handlers.
  */
 int MTestTypeBasicCreate(MPI_Datatype oldtype, MTestDatatype * mtype)
@@ -609,5 +1272,12 @@ void MTestTypeCreatorInit(MTestDdtCreator * creators)
     memset(creators, 0, sizeof(MTestDdtCreator) * MTEST_DDT_MAX);
     creators[MTEST_DDT_CONTIGUOUS] = MTestTypeContiguousCreate;
     creators[MTEST_DDT_VECTOR] = MTestTypeVectorCreate;
+    creators[MTEST_DDT_HVECTOR] = MTestTypeHvectorCreate;
     creators[MTEST_DDT_INDEXED] = MTestTypeIndexedCreate;
+    creators[MTEST_DDT_HINDEXED] = MTestTypeHindexedCreate;
+    creators[MTEST_DDT_INDEXED_BLOCK] = MTestTypeIndexedBlockCreate;
+    creators[MTEST_DDT_HINDEXED_BLOCK] = MTestTypeHindexedBlockCreate;
+    creators[MTEST_DDT_STRUCT] = MTestTypeStructCreate;
+    creators[MTEST_DDT_SUBARRAY_ORDER_C] = MTestTypeSubArrayOrderCCreate;
+    creators[MTEST_DDT_SUBARRAY_ORDER_FORTRAN] = MTestTypeSubArrayOrderFortranCreate;
 }
diff --git a/test/mpi/util/mtest_datatype.h b/test/mpi/util/mtest_datatype.h
index 48fa96d..f4c6828 100644
--- a/test/mpi/util/mtest_datatype.h
+++ b/test/mpi/util/mtest_datatype.h
@@ -33,7 +33,14 @@ enum MTEST_BASIC_DT {
 enum MTEST_DERIVED_DT {
     MTEST_DDT_CONTIGUOUS,
     MTEST_DDT_VECTOR,
+    MTEST_DDT_HVECTOR,
     MTEST_DDT_INDEXED,
+    MTEST_DDT_HINDEXED,
+    MTEST_DDT_INDEXED_BLOCK,
+    MTEST_DDT_HINDEXED_BLOCK,
+    MTEST_DDT_STRUCT,
+    MTEST_DDT_SUBARRAY_ORDER_C,
+    MTEST_DDT_SUBARRAY_ORDER_FORTRAN,
     MTEST_DDT_MAX
 };
 
diff --git a/test/mpi/util/mtest_datatype_gen.c b/test/mpi/util/mtest_datatype_gen.c
index a19247b..60c53ef 100644
--- a/test/mpi/util/mtest_datatype_gen.c
+++ b/test/mpi/util/mtest_datatype_gen.c
@@ -62,7 +62,8 @@ static int verbose = 0;         /* Message level (0 is none) */
  *    Every type is initialized by the creation function stored in
  *    mtestDdtCreators variable, all of their create/init/check functions are
  *    defined in file mtest_datatype.c. Following derived datatypes are defined:
- *    Contiguous | Vector | Indexed
+ *    Contiguous | Vector | HVector | Indexed | Hindexed | Indexed-block |
+ *    Hindexed-block | Struct | Subarray with order-C | Subarray with order-Fortran
  *
  *  How to add a new derived datatype:
  *    1. Add the new datatype in enum MTEST_DERIVED_DT.
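
The subarray init and check routines in this commit address the full array in row-major order and write a descending byte pattern into the subarray only. A minimal sketch of that indexing, without MPI, is given here for orientation. The struct `subarray_layout` and the functions `subarray_fill` and `subarray_check` are hypothetical names, not part of the test suite; the sketch handles a single datatype instance and assumes the caller passes the instance's extent in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the MTestDatatype fields the subarray routines read. */
typedef struct {
    int ncol;           /* columns in the full array (arr_sizes[1]) */
    int sub_nrow;       /* rows in the subarray (arr_subsizes[0]) */
    int sub_ncol;       /* columns in the subarray (arr_subsizes[1]) */
    int sub_row_start;  /* first subarray row (arr_starts[0]) */
    int sub_col_start;  /* first subarray column (arr_starts[1]) */
    int basesize;       /* bytes per element */
} subarray_layout;

/* Set all `extent` bytes to 0xff, then write 0xff ^ (nc & 0xff) into each
 * subarray byte in row-major order. Returns the number of bytes written. */
static int subarray_fill(unsigned char *buf, size_t extent, const subarray_layout *l)
{
    int nc = 0;
    memset(buf, 0xff, extent);
    for (int i = 0; i < l->sub_nrow; i++) {
        /* Element offset of the first subarray element in this row */
        size_t offset = (size_t) (l->sub_row_start + i) * l->ncol + l->sub_col_start;
        for (int j = 0; j < l->sub_ncol; j++) {
            size_t byte_offset = (offset + j) * l->basesize;
            assert(byte_offset + l->basesize <= extent);
            for (int b = 0; b < l->basesize; b++)
                buf[byte_offset + b] = (unsigned char) (0xff ^ (nc++ & 0xff));
        }
    }
    return nc;
}

/* Walk the same bytes in the same order and count mismatches. */
static int subarray_check(const unsigned char *buf, const subarray_layout *l)
{
    int nc = 0, err = 0;
    for (int i = 0; i < l->sub_nrow; i++) {
        size_t offset = (size_t) (l->sub_row_start + i) * l->ncol + l->sub_col_start;
        for (int j = 0; j < l->sub_ncol; j++) {
            size_t byte_offset = (offset + j) * l->basesize;
            for (int b = 0; b < l->basesize; b++) {
                unsigned char expected = (unsigned char) (0xff ^ (nc++ & 0xff));
                if (buf[byte_offset + b] != expected)
                    err++;
            }
        }
    }
    return err;
}
```

With nblock = 2, blocklen = 2, stride = 4 and a 1-byte base type, the creator's parameters give a 4x4 full array whose subarray covers rows 1-2 and columns 2-3, so the pattern lands at byte offsets 6, 7, 10 and 11.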

http://git.mpich.org/mpich.git/commitdiff/717dde2ac9014f835295872dcba2a553ad7bd952

commit 717dde2ac9014f835295872dcba2a553ad7bd952
Author: Min Si <msi at il.is.s.u-tokyo.ac.jp>
Date:   Tue Nov 4 14:30:38 2014 -0600

    Move datatype reset to mtest_datatype.c
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/test/mpi/util/mtest_datatype.c b/test/mpi/util/mtest_datatype.c
index 596e979..c07aedd 100644
--- a/test/mpi/util/mtest_datatype.c
+++ b/test/mpi/util/mtest_datatype.c
@@ -51,6 +51,19 @@ static void *MTestTypeFree(MTestDatatype * mtype)
     return 0;
 }
 
+static inline void MTestTypeReset(MTestDatatype * mtype)
+{
+    mtype->InitBuf = 0;
+    mtype->FreeBuf = 0;
+    mtype->CheckBuf = 0;
+    mtype->datatype = 0;
+    mtype->isBasic = 0;
+    mtype->printErrors = 0;
+    mtype->buf = 0;
+    mtype->index = 0;
+    mtype->displs = 0;
+    mtype->displ_in_bytes = 0;
+}
 
 /* ------------------------------------------------------------------------ */
 /* Datatype routines for contiguous datatypes                               */
@@ -361,6 +374,8 @@ static int MTestTypeContiguousCreate(int nblock, int blocklen, int stride,
     int merr = 0;
     char type_name[128];
 
+    MTestTypeReset(mtype);
+
     merr = MPI_Type_size(oldtype, &mtype->basesize);
     if (merr)
         MTestPrintError(merr);
@@ -403,6 +418,8 @@ static int MTestTypeVectorCreate(int nblock, int blocklen, int stride,
     int merr = 0;
     char type_name[128];
 
+    MTestTypeReset(mtype);
+
     merr = MPI_Type_size(oldtype, &mtype->basesize);
     if (merr)
         MTestPrintError(merr);
@@ -451,6 +468,8 @@ static int MTestTypeIndexedCreate(int nblock, int blocklen, int stride,
     char type_name[128];
     int i;
 
+    MTestTypeReset(mtype);
+
     merr = MPI_Type_size(oldtype, &mtype->basesize);
     if (merr)
         MTestPrintError(merr);
@@ -500,6 +519,8 @@ int MTestTypeBasicCreate(MPI_Datatype oldtype, MTestDatatype * mtype)
 {
     int merr = 0;
 
+    MTestTypeReset(mtype);
+
     merr = MPI_Type_size(oldtype, &mtype->basesize);
     if (merr)
         MTestPrintError(merr);
@@ -523,6 +544,8 @@ int MTestTypeDupCreate(MPI_Datatype oldtype, MTestDatatype * mtype)
 {
     int merr = 0;
 
+    MTestTypeReset(mtype);
+
     merr = MPI_Type_size(oldtype, &mtype->basesize);
     if (merr)
         MTestPrintError(merr);
diff --git a/test/mpi/util/mtest_datatype_gen.c b/test/mpi/util/mtest_datatype_gen.c
index 79fb1f9..a19247b 100644
--- a/test/mpi/util/mtest_datatype_gen.c
+++ b/test/mpi/util/mtest_datatype_gen.c
@@ -314,18 +314,6 @@ static inline int MTestGetRecvDerivedDatatypes(MTestDatatype * sendtype,
     return merr;
 }
 
-static inline void MTestResetDatatype(MTestDatatype * mtype)
-{
-    mtype->InitBuf = 0;
-    mtype->FreeBuf = 0;
-    mtype->CheckBuf = 0;
-    mtype->datatype = 0;
-    mtype->isBasic = 0;
-    mtype->printErrors = 0;
-    mtype->buf = 0;
-}
-
-
 /* ------------------------------------------------------------------------ */
 /* Exposed routine to external tests                                         */
 /* ------------------------------------------------------------------------ */
@@ -336,9 +324,6 @@ int MTestGetDatatypes(MTestDatatype * sendtype, MTestDatatype * recvtype, int to
     MTestGetDbgInfo(&dbgflag, &verbose);
     MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
 
-    MTestResetDatatype(sendtype);
-    MTestResetDatatype(recvtype);
-
     MTestTypeCreatorInit((MTestDdtCreator *) mtestDdtCreators);
 
     if (datatype_index < MTEST_BDT_RANGE) {
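
The reset routine moved by this commit zeroes every pointer that the general free routine later tests, which is what makes it safe to call one free function for every datatype kind. The lifecycle can be sketched as follows. `mini_type`, `mini_reset` and `mini_free` are hypothetical, trimmed-down stand-ins, and the return value of `mini_free` is added only so the behaviour can be observed; this is an illustration under those assumptions, not the test suite's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in holding only the heap-owned fields. */
typedef struct {
    void *buf;
    int *index;
    int *displs;
    long *displ_in_bytes;
} mini_type;

/* Put the structure into a known state before a creator fills it in. */
static void mini_reset(mini_type *t)
{
    assert(t != NULL);
    t->buf = NULL;
    t->index = NULL;
    t->displs = NULL;
    t->displ_in_bytes = NULL;
}

/* Release whichever fields were allocated; each pointer is tested on its
 * own, so a type that never allocated displs or index is handled too.
 * Returns the number of allocations released. */
static int mini_free(mini_type *t)
{
    int released = 0;
    assert(t != NULL);
    if (t->buf) { free(t->buf); released++; }
    if (t->displs) { free(t->displs); released++; }
    if (t->displ_in_bytes) { free(t->displ_in_bytes); released++; }
    if (t->index) { free(t->index); released++; }
    mini_reset(t);      /* leave no dangling pointers behind */
    return released;
}
```

Without the reset, a stack-allocated structure would carry indeterminate pointer values into the free routine, which is presumably why each creator now calls the reset first.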

http://git.mpich.org/mpich.git/commitdiff/7e965650d88d59ac484d89974fe97a0ab793e9f5

commit 7e965650d88d59ac484d89974fe97a0ab793e9f5
Author: Min Si <msi at il.is.s.u-tokyo.ac.jp>
Date:   Mon Nov 3 14:53:47 2014 -0600

    Use a general free function for all datatypes.
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/test/mpi/util/mtest_datatype.c b/test/mpi/util/mtest_datatype.c
index b2cf308..596e979 100644
--- a/test/mpi/util/mtest_datatype.c
+++ b/test/mpi/util/mtest_datatype.c
@@ -28,6 +28,30 @@
 #endif
 #include <errno.h>
 
+
+/* ------------------------------------------------------------------------ */
+/* General datatype routines                                                */
+/* ------------------------------------------------------------------------ */
+
+static void *MTestTypeFree(MTestDatatype * mtype)
+{
+    if (mtype->buf)
+        free(mtype->buf);
+    if (mtype->displs)
+        free(mtype->displs);
+    if (mtype->displ_in_bytes)
+        free(mtype->displ_in_bytes);
+    if (mtype->index)
+        free(mtype->index);
+    mtype->buf = 0;
+    mtype->displs = 0;
+    mtype->displ_in_bytes = 0;
+    mtype->index = 0;
+
+    return 0;
+}
+
+
 /* ------------------------------------------------------------------------ */
 /* Datatype routines for contiguous datatypes                               */
 /* ------------------------------------------------------------------------ */
@@ -69,18 +93,6 @@ static void *MTestTypeContigInit(MTestDatatype * mtype)
 }
 
 /*
- * Free buffer of basic datatype
- */
-static void *MTestTypeContigFree(MTestDatatype * mtype)
-{
-    if (mtype->buf) {
-        free(mtype->buf);
-        mtype->buf = 0;
-    }
-    return 0;
-}
-
-/*
  * Check value of received basic datatype buffer.
  */
 static int MTestTypeContigCheckbuf(MTestDatatype * mtype)
@@ -170,18 +182,6 @@ static void *MTestTypeVectorInit(MTestDatatype * mtype)
 }
 
 /*
- * Free buffer of vector datatype
- */
-static void *MTestTypeVectorFree(MTestDatatype * mtype)
-{
-    if (mtype->buf) {
-        free(mtype->buf);
-        mtype->buf = 0;
-    }
-    return 0;
-}
-
-/*
  * Check value of received vector datatype buffer
  */
 static int MTestTypeVectorCheckbuf(MTestDatatype * mtype)
@@ -294,24 +294,6 @@ static void *MTestTypeIndexedInit(MTestDatatype * mtype)
 }
 
 /*
- * Free buffer of indexed datatype
- */
-static void *MTestTypeIndexedFree(MTestDatatype * mtype)
-{
-    if (mtype->buf) {
-        free(mtype->buf);
-        free(mtype->displs);
-        free(mtype->displ_in_bytes);
-        free(mtype->index);
-        mtype->buf = 0;
-        mtype->displs = 0;
-        mtype->displ_in_bytes = 0;
-        mtype->index = 0;
-    }
-    return 0;
-}
-
-/*
  * Check value of received indexed datatype buffer
  */
 static int MTestTypeIndexedCheckbuf(MTestDatatype * mtype)
@@ -360,7 +342,7 @@ static int MTestTypeIndexedCheckbuf(MTestDatatype * mtype)
 
 
 /* ------------------------------------------------------------------------ */
-/* Datatype generators                                                      */
+/* Datatype creators                                                      */
 /* ------------------------------------------------------------------------ */
 
 /*
@@ -400,7 +382,7 @@ static int MTestTypeContiguousCreate(int nblock, int blocklen, int stride,
         MTestPrintError(merr);
 
     mtype->InitBuf = MTestTypeContigInit;
-    mtype->FreeBuf = MTestTypeContigFree;
+    mtype->FreeBuf = MTestTypeFree;
     mtype->CheckBuf = MTestTypeContigCheckbuf;
     return merr;
 }
@@ -446,7 +428,7 @@ static int MTestTypeVectorCreate(int nblock, int blocklen, int stride,
         MTestPrintError(merr);
 
     mtype->InitBuf = MTestTypeVectorInit;
-    mtype->FreeBuf = MTestTypeVectorFree;
+    mtype->FreeBuf = MTestTypeFree;
     mtype->CheckBuf = MTestTypeVectorCheckbuf;
     return merr;
 }
@@ -505,7 +487,7 @@ static int MTestTypeIndexedCreate(int nblock, int blocklen, int stride,
         MTestPrintError(merr);
 
     mtype->InitBuf = MTestTypeIndexedInit;
-    mtype->FreeBuf = MTestTypeIndexedFree;
+    mtype->FreeBuf = MTestTypeFree;
     mtype->CheckBuf = MTestTypeIndexedCheckbuf;
 
     return merr;
@@ -525,7 +507,7 @@ int MTestTypeBasicCreate(MPI_Datatype oldtype, MTestDatatype * mtype)
     mtype->datatype = oldtype;
     mtype->isBasic = 1;
     mtype->InitBuf = MTestTypeContigInit;
-    mtype->FreeBuf = MTestTypeContigFree;
+    mtype->FreeBuf = MTestTypeFree;
     mtype->CheckBuf = MTestTypeContigCheckbuf;
 
     return merr;
@@ -553,7 +535,7 @@ int MTestTypeDupCreate(MPI_Datatype oldtype, MTestDatatype * mtype)
      * was committed (MPI-2, section 8.8) */
 
     mtype->InitBuf = MTestTypeContigInit;
-    mtype->FreeBuf = MTestTypeContigFree;
+    mtype->FreeBuf = MTestTypeFree;
     mtype->CheckBuf = MTestTypeContigCheckbuf;
 
     return merr;

http://git.mpich.org/mpich.git/commitdiff/019939c36a28b2b1d834da425854c898d592dbd7

commit 019939c36a28b2b1d834da425854c898d592dbd7
Author: Min Si <msi at il.is.s.u-tokyo.ac.jp>
Date:   Tue Oct 21 12:48:44 2014 -0500

    Separate and rewrite mtest datatype.
    
    This patch separates the mtest datatype code from mtest.c and rewrites
    the whole structure so that various test patterns and datatypes can be applied.
    
    We separate the mtest datatype functions from the test generators.
    1. In mtest_datatype_gen.c, we generate test cases for both basic and
    derived datatypes.
    2. In mtest_datatype.c, we define the MTestDatatype creator and the
    init/free/check functions for each derived datatype so that they can be
    reused by the test cases generated in 1.
    
    About test case definition:
    1. For every basic datatype, we define only one test case, using
    the same type for both send and receive buffers.
    2. For every derived datatype, we test the ddt send buffer and receive
    buffer separately, each with a contig buffer on the other side. We define
    the following four ddt structures for each test:
    	a.large block length
    	b.large count
    	c.large block length and large stride
    	d.large count and large stride
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>
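
The four structures listed above differ only in the (nblock, blocklen, stride) triple passed to a datatype creator. A small sketch of how that triple maps to byte displacements and to the memory a layout spans may help when reading the creators. The function names are hypothetical, the message does not state concrete parameter values, and the span formula assumes a positive stride no smaller than the block length.

```c
#include <assert.h>
#include <stddef.h>

/* Byte displacement of block i when stride is given in elements. */
static size_t block_displ_bytes(int i, int stride, int basesize)
{
    return (size_t) i * stride * basesize;
}

/* Bytes from the start of the first block to the end of the last block. */
static size_t layout_span_bytes(int nblock, int blocklen, int stride, int basesize)
{
    assert(nblock > 0 && blocklen > 0 && stride >= blocklen && basesize > 0);
    return ((size_t) (nblock - 1) * stride + blocklen) * basesize;
}

/* Bytes that actually carry data, ignoring the gaps between blocks. */
static size_t layout_data_bytes(int nblock, int blocklen, int basesize)
{
    return (size_t) nblock * blocklen * basesize;
}
```

For example, 4 blocks of 2 four-byte elements with a stride of 8 elements carry 32 bytes of data spread over 104 bytes, while a single long block has a span equal to its data size; a large stride therefore stresses the gaps, and a large block length stresses the contiguous runs.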

diff --git a/test/mpi/Makefile.mtest b/test/mpi/Makefile.mtest
index f051b0f..67a9983 100644
--- a/test/mpi/Makefile.mtest
+++ b/test/mpi/Makefile.mtest
@@ -17,6 +17,8 @@
 # AM_CPPFLAGS are used for C++ code as well
 AM_CPPFLAGS = -I$(top_builddir)/include -I$(top_srcdir)/include
 LDADD = $(top_builddir)/util/mtest.o
+LDADD += $(top_builddir)/util/mtest_datatype.o
+LDADD += $(top_builddir)/util/mtest_datatype_gen.o
 
 ## FIXME "DEPADD" is a simplemake concept, which we can handle on a per-target
 ## prog_DEPENDENCIES variable, but it would be better to figure out the right
@@ -26,6 +28,12 @@ LDADD = $(top_builddir)/util/mtest.o
 $(top_builddir)/util/mtest.$(OBJEXT): $(top_srcdir)/util/mtest.c
 	(cd $(top_builddir)/util && $(MAKE) mtest.$(OBJEXT))
 
+$(top_builddir)/util/mtest_datatype.$(OBJEXT): $(top_srcdir)/util/mtest_datatype.c
+	(cd $(top_builddir)/util && $(MAKE) mtest_datatype.$(OBJEXT))
+
+$(top_builddir)/util/mtest_datatype_gen.$(OBJEXT): $(top_srcdir)/util/mtest_datatype_gen.c
+	(cd $(top_builddir)/util && $(MAKE) mtest_datatype_gen.$(OBJEXT))
+
 testing:
 	$(top_builddir)/runtests -srcdir=$(srcdir) -tests=testlist \
 		-mpiexec=${MPIEXEC} -xmlfile=summary.xml \
diff --git a/test/mpi/include/mpitest.h b/test/mpi/include/mpitest.h
index 292033a..86a6a54 100644
--- a/test/mpi/include/mpitest.h
+++ b/test/mpi/include/mpitest.h
@@ -25,6 +25,7 @@ int MTestReturnValue( int );
  * Utilities
  */
 void MTestSleep( int );
+void MTestGetDbgInfo(int *dbgflag, int *verbose);
 
 /*
  * This structure contains the information used to test datatypes
@@ -43,9 +44,9 @@ typedef struct _MTestDatatype {
 			       (used by the CheckBuf routines) */
     /* The following is optional data that is used by some of
        the derived datatypes */
-    int  stride, nelm, blksize, *index;
+    int  stride, nblock, blksize, *index;
     /* stride, nelm, and blksize are in bytes */
-    int *displs, basesize;
+    int *displs, *displ_in_bytes, basesize;
     /* displacements are in multiples of base type; basesize is the
        size of that type*/
     void *(*InitBuf)( struct _MTestDatatype * );
diff --git a/test/mpi/util/mtest.c b/test/mpi/util/mtest.c
index 43a1590..ea309c0 100644
--- a/test/mpi/util/mtest.c
+++ b/test/mpi/util/mtest.c
@@ -256,712 +256,12 @@ void MTestSleep( int sec )
 }
 #endif
 
-/*
- * Datatypes
- *
- * Eventually, this could read a description of a file.  For now, we hard 
- * code the choices.
- *
- * Each kind of datatype has the following functions:
- *    MTestTypeXXXInit     - Initialize a send buffer for that type
- *    MTestTypeXXXInitRecv - Initialize a receive buffer for that type
- *    MTestTypeXXXFree     - Free any buffers associate with that type
- *    MTestTypeXXXCheckbuf - Check that the buffer contains the expected data
- * These routines work with (nearly) any datatype that is of type XXX, 
- * allowing the test codes to create a variety of contiguous, vector, and
- * indexed types, then test them by calling these routines.
- *
- * Available types (for the XXX) are
- *    Contig   - Simple contiguous buffers
- *    Vector   - Simple strided "vector" type
- *    Indexed  - Indexed datatype.  Only for a count of 1 instance of the 
- *               datatype
- */
-static int datatype_index = 0;
-
-/* ------------------------------------------------------------------------ */
-/* Datatype routines for contiguous datatypes                               */
-/* ------------------------------------------------------------------------ */
-/* 
- * Setup contiguous buffers of n copies of a datatype.
- */
-static void *MTestTypeContigInit( MTestDatatype *mtype )
-{
-    MPI_Aint size;
-    int merr;
-
-    if (mtype->count > 0) {
-	signed char *p;
-	int  i, totsize;
-	merr = MPI_Type_extent( mtype->datatype, &size );
-	if (merr) MTestPrintError( merr );
-	totsize = size * mtype->count;
-	if (!mtype->buf) {
-	    mtype->buf = (void *) malloc( totsize );
-	}
-	p = (signed char *)(mtype->buf);
-	if (!p) {
-	    /* Error - out of memory */
-	    MTestError( "Out of memory in type buffer init" );
-	}
-	for (i=0; i<totsize; i++) {
-	    p[i] = 0xff ^ (i & 0xff);
-	}
-    }
-    else {
-	if (mtype->buf) {
-	    free( mtype->buf );
-	}
-	mtype->buf = 0;
-    }
-    return mtype->buf;
-}
-
-/* 
- * Setup contiguous buffers of n copies of a datatype.  Initialize for
- * reception (e.g., set initial data to detect failure)
- */
-static void *MTestTypeContigInitRecv( MTestDatatype *mtype )
-{
-    MPI_Aint size;
-    int      merr;
-
-    if (mtype->count > 0) {
-	signed char *p;
-	int  i, totsize;
-	merr = MPI_Type_extent( mtype->datatype, &size );
-	if (merr) MTestPrintError( merr );
-	totsize = size * mtype->count;
-	if (!mtype->buf) {
-	    mtype->buf = (void *) malloc( totsize );
-	}
-	p = (signed char *)(mtype->buf);
-	if (!p) {
-	    /* Error - out of memory */
-	    MTestError( "Out of memory in type buffer init" );
-	}
-	for (i=0; i<totsize; i++) {
-	    p[i] = 0xff;
-	}
-    }
-    else {
-	if (mtype->buf) {
-	    free( mtype->buf );
-	}
-	mtype->buf = 0;
-    }
-    return mtype->buf;
-}
-static void *MTestTypeContigFree( MTestDatatype *mtype )
-{
-    if (mtype->buf) {
-	free( mtype->buf );
-	mtype->buf = 0;
-    }
-    return 0;
-}
-static int MTestTypeContigCheckbuf( MTestDatatype *mtype )
-{
-    unsigned char *p;
-    unsigned char expected;
-    int  i, totsize, err = 0, merr;
-    MPI_Aint size;
-
-    p = (unsigned char *)mtype->buf;
-    if (p) {
-	merr = MPI_Type_extent( mtype->datatype, &size );
-	if (merr) MTestPrintError( merr );
-	totsize = size * mtype->count;
-	for (i=0; i<totsize; i++) {
-	    expected = (0xff ^ (i & 0xff));
-	    if (p[i] != expected) {
-		err++;
-		if (mtype->printErrors && err < 10) {
-		    printf( "Data expected = %x but got p[%d] = %x\n",
-			    expected, i, p[i] );
-		    fflush( stdout );
-		}
-	    }
-	}
-    }
-    return err;
-}
-
-/* ------------------------------------------------------------------------ */
-/* Datatype routines for vector datatypes                                   */
-/* ------------------------------------------------------------------------ */
-
-static void *MTestTypeVectorInit( MTestDatatype *mtype )
-{
-    MPI_Aint size;
-    int      merr;
-
-    if (mtype->count > 0) {
-	unsigned char *p;
-	int  i, j, k, nc, totsize;
-
-	merr = MPI_Type_extent( mtype->datatype, &size );
-	if (merr) MTestPrintError( merr );
-	totsize	   = mtype->count * size;
-	if (!mtype->buf) {
-	    mtype->buf = (void *) malloc( totsize );
-	}
-	p	   = (unsigned char *)(mtype->buf);
-	if (!p) {
-	    /* Error - out of memory */
-	    MTestError( "Out of memory in type buffer init" );
-	}
-
-	/* First, set to -1 */
-	for (i=0; i<totsize; i++) p[i] = 0xff;
-
-	/* Now, set the actual elements to the successive values.
-	   To do this, we need to run 3 loops */
-	nc = 0;
-	/* count is usually one for a vector type */
-	for (k=0; k<mtype->count; k++) {
-	    /* For each element (block) */
-	    for (i=0; i<mtype->nelm; i++) {
-		/* For each value */
-		for (j=0; j<mtype->blksize; j++) {
-		    p[j] = (0xff ^ (nc & 0xff));
-		    nc++;
-		}
-		p += mtype->stride;
-	    }
-	}
-    }
-    else {
-	mtype->buf = 0;
-    }
-    return mtype->buf;
+/* Other mtest subfiles read debug setting using this function. */
+void MTestGetDbgInfo(int *_dbgflag, int *_verbose) {
+    *_dbgflag = dbgflag;
+    *_verbose = verbose;
 }
 
-static void *MTestTypeVectorFree( MTestDatatype *mtype )
-{
-    if (mtype->buf) {
-	free( mtype->buf );
-	mtype->buf = 0;
-    }
-    return 0;
-}
-
-/* ------------------------------------------------------------------------ */
-/* Datatype routines for indexed block datatypes                            */
-/* ------------------------------------------------------------------------ */
-
-/* 
- * Setup a buffer for one copy of an indexed datatype. 
- */
-static void *MTestTypeIndexedInit( MTestDatatype *mtype )
-{
-    MPI_Aint totsize;
-    int      merr;
-    
-    if (mtype->count > 1) {
-	MTestError( "This datatype is supported only for a single count" );
-    }
-    if (mtype->count == 1) {
-	signed char *p;
-	int  i, k, offset, j;
-
-	/* Allocate the send/recv buffer */
-	merr = MPI_Type_extent( mtype->datatype, &totsize );
-	if (merr) MTestPrintError( merr );
-	if (!mtype->buf) {
-	    mtype->buf = (void *) malloc( totsize );
-	}
-	p = (signed char *)(mtype->buf);
-	if (!p) {
-	    MTestError( "Out of memory in type buffer init\n" );
-	}
-	/* Initialize the elements */
-	/* First, set to -1 */
-	for (i=0; i<totsize; i++) p[i] = 0xff;
-
-	/* Now, set the actual elements to the successive values.
-	   We require that the base type is a contiguous type */
-	k = 0;
-	for (i=0; i<mtype->nelm; i++) {
-	    int b;
-	    /* Compute the offset: */
-	    offset = mtype->displs[i] * mtype->basesize;
-	    /* For each element in the block */
-	    for (b=0; b<mtype->index[i]; b++) {
-		for (j=0; j<mtype->basesize; j++) {
-		    p[offset+j] = 0xff ^ (k++ & 0xff);
-		}
-		offset += mtype->basesize;
-	    }
-	}
-    }
-    else {
-	/* count == 0 */
-	if (mtype->buf) {
-	    free( mtype->buf );
-	}
-	mtype->buf = 0;
-    }
-    return mtype->buf;
-}
-
-/* 
- * Setup indexed buffers for 1 copy of a datatype.  Initialize for
- * reception (e.g., set initial data to detect failure)
- */
-static void *MTestTypeIndexedInitRecv( MTestDatatype *mtype )
-{
-    MPI_Aint totsize;
-    int      merr;
-
-    if (mtype->count > 1) {
-	MTestError( "This datatype is supported only for a single count" );
-    }
-    if (mtype->count == 1) {
-	signed char *p;
-	int  i;
-	merr = MPI_Type_extent( mtype->datatype, &totsize );
-	if (merr) MTestPrintError( merr );
-	if (!mtype->buf) {
-	    mtype->buf = (void *) malloc( totsize );
-	}
-	p = (signed char *)(mtype->buf);
-	if (!p) {
-	    /* Error - out of memory */
-	    MTestError( "Out of memory in type buffer init\n" );
-	}
-	for (i=0; i<totsize; i++) {
-	    p[i] = 0xff;
-	}
-    }
-    else {
-	/* count == 0 */
-	if (mtype->buf) {
-	    free( mtype->buf );
-	}
-	mtype->buf = 0;
-    }
-    return mtype->buf;
-}
-
-static void *MTestTypeIndexedFree( MTestDatatype *mtype )
-{
-    if (mtype->buf) {
-	free( mtype->buf );
-	free( mtype->displs );
-	free( mtype->index );
-	mtype->buf    = 0;
-	mtype->displs = 0;
-	mtype->index  = 0;
-    }
-    return 0;
-}
-
-static int MTestTypeIndexedCheckbuf( MTestDatatype *mtype )
-{
-    unsigned char *p;
-    unsigned char expected;
-    int  i, err = 0, merr;
-    MPI_Aint totsize;
-
-    p = (unsigned char *)mtype->buf;
-    if (p) {
-	int j, k, offset;
-	merr = MPI_Type_extent( mtype->datatype, &totsize );
-	if (merr) MTestPrintError( merr );
-	
-	k = 0;
-	for (i=0; i<mtype->nelm; i++) {
-	    int b;
-	    /* Compute the offset: */
-	    offset = mtype->displs[i] * mtype->basesize;
-	    for (b=0; b<mtype->index[i]; b++) {
-		for (j=0; j<mtype->basesize; j++) {
-		    expected = (0xff ^ (k & 0xff));
-		    if (p[offset+j] != expected) {
-			err++;
-			if (mtype->printErrors && err < 10) {
-			    printf( "Data expected = %x but got p[%d,%d] = %x\n",
-				    expected, i,j, p[offset+j] );
-			    fflush( stdout );
-			}
-		    }
-		    k++;
-		}
-		offset += mtype->basesize;
-	    }
-	}
-    }
-    return err;
-}
-
-
-/* ------------------------------------------------------------------------ */
-/* Routines to select a datatype and associated buffer create/fill/check    */
-/* routines                                                                 */
-/* ------------------------------------------------------------------------ */
-
-/* 
-   Create a range of datatypes with a given count elements.
-   This uses a selection of types, rather than an exhaustive collection.
-   It allocates both send and receive types so that they can have the same
-   type signature (collection of basic types) but different type maps (layouts
-   in memory) 
- */
-int MTestGetDatatypes( MTestDatatype *sendtype, MTestDatatype *recvtype,
-		       int count )
-{
-    int merr;
-    int i;
-
-    sendtype->InitBuf	  = 0;
-    sendtype->FreeBuf	  = 0;
-    sendtype->CheckBuf	  = 0;
-    sendtype->datatype	  = 0;
-    sendtype->isBasic	  = 0;
-    sendtype->printErrors = 0;
-    recvtype->InitBuf	  = 0;
-    recvtype->FreeBuf	  = 0;
-
-    recvtype->CheckBuf	  = 0;
-    recvtype->datatype	  = 0;
-    recvtype->isBasic	  = 0;
-    recvtype->printErrors = 0;
-
-    sendtype->buf	  = 0;
-    recvtype->buf	  = 0;
-
-    /* Set the defaults for the message lengths */
-    sendtype->count	  = count;
-    recvtype->count	  = count;
-    /* Use datatype_index to choose a datatype to use.  If at the end of the
-       list, return 0 */
-    switch (datatype_index) {
-    case 0:
-	sendtype->datatype = MPI_INT;
-	sendtype->isBasic  = 1;
-	recvtype->datatype = MPI_INT;
-	recvtype->isBasic  = 1;
-	break;
-    case 1:
-	sendtype->datatype = MPI_DOUBLE;
-	sendtype->isBasic  = 1;
-	recvtype->datatype = MPI_DOUBLE;
-	recvtype->isBasic  = 1;
-	break;
-    case 2:
-	sendtype->datatype = MPI_FLOAT_INT;
-	sendtype->isBasic  = 1;
-	recvtype->datatype = MPI_FLOAT_INT;
-	recvtype->isBasic  = 1;
-	break;
-    case 3:
-	merr = MPI_Type_dup( MPI_INT, &sendtype->datatype );
-	if (merr) MTestPrintError( merr );
-	merr = MPI_Type_set_name( sendtype->datatype,
-                                  (char*)"dup of MPI_INT" );
-	if (merr) MTestPrintError( merr );
-	merr = MPI_Type_dup( MPI_INT, &recvtype->datatype );
-	if (merr) MTestPrintError( merr );
-	merr = MPI_Type_set_name( recvtype->datatype,
-                                  (char*)"dup of MPI_INT" );
-	if (merr) MTestPrintError( merr );
-	/* dup'ed types are already committed if the original type 
-	   was committed (MPI-2, section 8.8) */
-	break;
-    case 4:
-	/* vector send type and contiguous receive type */
-	/* These sizes are in bytes (see the VectorInit code) */
- 	sendtype->stride   = 3 * sizeof(int);
-	sendtype->blksize  = sizeof(int);
-	sendtype->nelm     = recvtype->count;
-
-	merr = MPI_Type_vector( recvtype->count, 1, 3, MPI_INT, 
-				&sendtype->datatype );
-	if (merr) MTestPrintError( merr );
-        merr = MPI_Type_commit( &sendtype->datatype );
-	if (merr) MTestPrintError( merr );
-	merr = MPI_Type_set_name( sendtype->datatype,
-                                  (char*)"int-vector" );
-	if (merr) MTestPrintError( merr );
-	sendtype->count    = 1;
- 	recvtype->datatype = MPI_INT;
-	recvtype->isBasic  = 1;
-	sendtype->InitBuf  = MTestTypeVectorInit;
-	recvtype->InitBuf  = MTestTypeContigInitRecv;
-	sendtype->FreeBuf  = MTestTypeVectorFree;
-	recvtype->FreeBuf  = MTestTypeContigFree;
-	sendtype->CheckBuf = 0;
-	recvtype->CheckBuf = MTestTypeContigCheckbuf;
-	break;
-
-    case 5:
-	/* Indexed send using many small blocks and contig receive */
-	sendtype->blksize  = sizeof(int);
-	sendtype->nelm     = recvtype->count;
-	sendtype->basesize = sizeof(int);
-	sendtype->displs   = (int *)malloc( sendtype->nelm * sizeof(int) );
-	sendtype->index    = (int *)malloc( sendtype->nelm * sizeof(int) );
-	if (!sendtype->displs || !sendtype->index) {
-	    MTestError( "Out of memory in type init\n" );
-	}
-	/* Make the sizes larger (4 ints) to help push the total
-	   size to over 256k in some cases, as the MPICH code as of
-	   10/1/06 used large internal buffers for packing non-contiguous
-	   messages */
-	for (i=0; i<sendtype->nelm; i++) {
-	    sendtype->index[i]   = 4;
-	    sendtype->displs[i]  = 5*i;
-	}
-	merr = MPI_Type_indexed( sendtype->nelm,
-				 sendtype->index, sendtype->displs, 
-				 MPI_INT, &sendtype->datatype );
-	if (merr) MTestPrintError( merr );
-        merr = MPI_Type_commit( &sendtype->datatype );
-	if (merr) MTestPrintError( merr );
-	merr = MPI_Type_set_name( sendtype->datatype,
-                                  (char*)"int-indexed(4-int)" );
-	if (merr) MTestPrintError( merr );
-	sendtype->count    = 1;
-	sendtype->InitBuf  = MTestTypeIndexedInit;
-	sendtype->FreeBuf  = MTestTypeIndexedFree;
-	sendtype->CheckBuf = 0;
-
- 	recvtype->datatype = MPI_INT;
-	recvtype->isBasic  = 1;
-	recvtype->count    = count * 4;
-	recvtype->InitBuf  = MTestTypeContigInitRecv;
-	recvtype->FreeBuf  = MTestTypeContigFree;
-	recvtype->CheckBuf = MTestTypeContigCheckbuf;
-	break;
-
-    case 6:
-	/* Indexed send using 2 large blocks and contig receive */
-	sendtype->blksize  = sizeof(int);
-	sendtype->nelm     = 2;
-	sendtype->basesize = sizeof(int);
-	sendtype->displs   = (int *)malloc( sendtype->nelm * sizeof(int) );
-	sendtype->index    = (int *)malloc( sendtype->nelm * sizeof(int) );
-	if (!sendtype->displs || !sendtype->index) {
-	    MTestError( "Out of memory in type init\n" );
-	}
-	/* index -> block size */
-	sendtype->index[0]   = (recvtype->count + 1) / 2;
-	sendtype->displs[0]  = 0;
-	sendtype->index[1]   = recvtype->count - sendtype->index[0];
-	sendtype->displs[1]  = sendtype->index[0] + 1; 
-	/* There is a deliberate gap here */
-
-	merr = MPI_Type_indexed( sendtype->nelm,
-				 sendtype->index, sendtype->displs, 
-				 MPI_INT, &sendtype->datatype );
-	if (merr) MTestPrintError( merr );
-        merr = MPI_Type_commit( &sendtype->datatype );
-	if (merr) MTestPrintError( merr );
-	merr = MPI_Type_set_name( sendtype->datatype,
-                                  (char*)"int-indexed(2 blocks)" );
-	if (merr) MTestPrintError( merr );
-	sendtype->count    = 1;
-	sendtype->InitBuf  = MTestTypeIndexedInit;
-	sendtype->FreeBuf  = MTestTypeIndexedFree;
-	sendtype->CheckBuf = 0;
-
- 	recvtype->datatype = MPI_INT;
-	recvtype->isBasic  = 1;
-	recvtype->count    = sendtype->index[0] + sendtype->index[1];
-	recvtype->InitBuf  = MTestTypeContigInitRecv;
-	recvtype->FreeBuf  = MTestTypeContigFree;
-	recvtype->CheckBuf = MTestTypeContigCheckbuf;
-	break;
-
-    case 7:
-	/* Indexed receive using many small blocks and contig send */
-	recvtype->blksize  = sizeof(int);
-	recvtype->nelm     = recvtype->count;
-	recvtype->basesize = sizeof(int);
-	recvtype->displs   = (int *)malloc( recvtype->nelm * sizeof(int) );
-	recvtype->index    = (int *)malloc( recvtype->nelm * sizeof(int) );
-	if (!recvtype->displs || !recvtype->index) {
-	    MTestError( "Out of memory in type recv init\n" );
-	}
-	/* Make the sizes larger (4 ints) to help push the total
-	   size to over 256k in some cases, as the MPICH code as of
-	   10/1/06 used large internal buffers for packing non-contiguous
-	   messages */
-	/* Note that there are gaps in the indexed type */
-	for (i=0; i<recvtype->nelm; i++) {
-	    recvtype->index[i]   = 4;
-	    recvtype->displs[i]  = 5*i;
-	}
-	merr = MPI_Type_indexed( recvtype->nelm,
-				 recvtype->index, recvtype->displs, 
-				 MPI_INT, &recvtype->datatype );
-	if (merr) MTestPrintError( merr );
-        merr = MPI_Type_commit( &recvtype->datatype );
-	if (merr) MTestPrintError( merr );
-	merr = MPI_Type_set_name( recvtype->datatype,
-                                  (char*)"recv-int-indexed(4-int)" );
-	if (merr) MTestPrintError( merr );
-	recvtype->count    = 1;
-	recvtype->InitBuf  = MTestTypeIndexedInitRecv;
-	recvtype->FreeBuf  = MTestTypeIndexedFree;
-	recvtype->CheckBuf = MTestTypeIndexedCheckbuf;
-
- 	sendtype->datatype = MPI_INT;
-	sendtype->isBasic  = 1;
-	sendtype->count    = count * 4;
-	sendtype->InitBuf  = MTestTypeContigInit;
-	sendtype->FreeBuf  = MTestTypeContigFree;
-	sendtype->CheckBuf = 0;
-	break;
-
-	/* Less commonly used but still simple types */
-    case 8:
-	sendtype->datatype = MPI_SHORT;
-	sendtype->isBasic  = 1;
-	recvtype->datatype = MPI_SHORT;
-	recvtype->isBasic  = 1;
-	break;
-    case 9:
-	sendtype->datatype = MPI_LONG;
-	sendtype->isBasic  = 1;
-	recvtype->datatype = MPI_LONG;
-	recvtype->isBasic  = 1;
-	break;
-    case 10:
-	sendtype->datatype = MPI_CHAR;
-	sendtype->isBasic  = 1;
-	recvtype->datatype = MPI_CHAR;
-	recvtype->isBasic  = 1;
-	break;
-    case 11:
-	sendtype->datatype = MPI_UINT64_T;
-	sendtype->isBasic  = 1;
-	recvtype->datatype = MPI_UINT64_T;
-	recvtype->isBasic  = 1;
-	break;
-    case 12:
-	sendtype->datatype = MPI_FLOAT;
-	sendtype->isBasic  = 1;
-	recvtype->datatype = MPI_FLOAT;
-	recvtype->isBasic  = 1;
-	break;
-
-#ifndef USE_STRICT_MPI
-	/* MPI_BYTE may only be used with MPI_BYTE in strict MPI */
-    case 13:
-	sendtype->datatype = MPI_INT;
-	sendtype->isBasic  = 1;
-	recvtype->datatype = MPI_BYTE;
-	recvtype->isBasic  = 1;
-	recvtype->count    *= sizeof(int);
-	break;
-#endif
-    default:
-	datatype_index = -1;
-    }
-
-    if (!sendtype->InitBuf) {
-	sendtype->InitBuf  = MTestTypeContigInit;
-	recvtype->InitBuf  = MTestTypeContigInitRecv;
-	sendtype->FreeBuf  = MTestTypeContigFree;
-	recvtype->FreeBuf  = MTestTypeContigFree;
-	sendtype->CheckBuf = MTestTypeContigCheckbuf;
-	recvtype->CheckBuf = MTestTypeContigCheckbuf;
-    }
-    datatype_index++;
-
-    if (dbgflag && datatype_index > 0) {
-	int typesize;
-	fprintf( stderr, "%d: sendtype is %s\n", wrank, MTestGetDatatypeName( sendtype ) );
-	merr = MPI_Type_size( sendtype->datatype, &typesize );
-	if (merr) MTestPrintError( merr );
-	fprintf( stderr, "%d: sendtype size = %d\n", wrank, typesize );
-	fprintf( stderr, "%d: recvtype is %s\n", wrank, MTestGetDatatypeName( recvtype ) );
-	merr = MPI_Type_size( recvtype->datatype, &typesize );
-	if (merr) MTestPrintError( merr );
-	fprintf( stderr, "%d: recvtype size = %d\n", wrank, typesize );
-	fflush( stderr );
-	
-    }
-    else if (verbose && datatype_index > 0) {
-	printf( "Get new datatypes: send = %s, recv = %s\n", 
-		MTestGetDatatypeName( sendtype ), 
-		MTestGetDatatypeName( recvtype ) );
-	fflush( stdout );
-    }
-
-    return datatype_index;
-}
-
-/* Reset the datatype index (start from the initial data type.
-   Note: This routine is rarely needed; MTestGetDatatypes automatically
-   starts over after the last available datatype is used.
-*/
-void MTestResetDatatypes( void )
-{
-    datatype_index = 0;
-}
-/* Return the index of the current datatype.  This is rarely needed and
-   is provided mostly to enable debugging of the MTest package itself */
-int MTestGetDatatypeIndex( void )
-{
-    return datatype_index;
-}
-
-/* Free the storage associated with a datatype */
-void MTestFreeDatatype( MTestDatatype *mtype )
-{
-    int merr;
-    /* Invoke a datatype-specific free function to handle
-       both the datatype and the send/receive buffers */
-    if (mtype->FreeBuf) {
-	(mtype->FreeBuf)( mtype );
-    }
-    /* Free the datatype itself if it was created */
-    if (!mtype->isBasic) {
-	merr = MPI_Type_free( &mtype->datatype );
-	if (merr) MTestPrintError( merr );
-    }
-}
-
-/* Check that a message was received correctly.  Returns the number of
-   errors detected.  Status may be NULL or MPI_STATUS_IGNORE */
-int MTestCheckRecv( MPI_Status *status, MTestDatatype *recvtype )
-{
-    int count;
-    int errs = 0, merr;
-
-    if (status && status != MPI_STATUS_IGNORE) {
-	merr = MPI_Get_count( status, recvtype->datatype, &count );
-	if (merr) MTestPrintError( merr );
-	
-	/* Check count against expected count */
-	if (count != recvtype->count) {
-	    errs ++;
-	}
-    }
-
-    /* Check received data */
-    if (!errs && recvtype->CheckBuf( recvtype )) {
-	errs++;
-    }
-    return errs;
-}
-
-/* This next routine uses a circular buffer of static name arrays just to
-   simplify the use of the routine */
-const char *MTestGetDatatypeName( MTestDatatype *dtype )
-{
-    static char name[4][MPI_MAX_OBJECT_NAME];
-    static int sp=0;
-    int rlen, merr;
-
-    if (sp >= 4) sp = 0;
-    merr = MPI_Type_get_name( dtype->datatype, name[sp], &rlen );
-    if (merr) MTestPrintError( merr );
-    return (const char *)name[sp++];
-}
 /* ----------------------------------------------------------------------- */
 
 /* 
diff --git a/test/mpi/util/mtest_datatype.c b/test/mpi/util/mtest_datatype.c
new file mode 100644
index 0000000..b2cf308
--- /dev/null
+++ b/test/mpi/util/mtest_datatype.c
@@ -0,0 +1,608 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *
+ *  (C) 2014 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+#include "mtest_datatype.h"
+#if defined(HAVE_STDIO_H) || defined(STDC_HEADERS)
+#include <stdio.h>
+#endif
+#if defined(HAVE_STDLIB_H) || defined(STDC_HEADERS)
+#include <stdlib.h>
+#endif
+#if defined(HAVE_STRING_H) || defined(STDC_HEADERS)
+#include <string.h>
+#endif
+#ifdef HAVE_STDARG_H
+#include <stdarg.h>
+#endif
+/* The following two includes permit the collection of resource usage
+   data in the tests
+ */
+#ifdef HAVE_SYS_TIME_H
+#include <sys/time.h>
+#endif
+#ifdef HAVE_SYS_RESOURCE_H
+#include <sys/resource.h>
+#endif
+#include <errno.h>
+
+/* ------------------------------------------------------------------------ */
+/* Datatype routines for contiguous datatypes                               */
+/* ------------------------------------------------------------------------ */
+/*
+ * Initialize buffer of basic datatype
+ */
+static void *MTestTypeContigInit(MTestDatatype * mtype)
+{
+    MPI_Aint size;
+    int merr;
+
+    if (mtype->count > 0) {
+        unsigned char *p;
+        int i, totsize;
+        merr = MPI_Type_extent(mtype->datatype, &size);
+        if (merr)
+            MTestPrintError(merr);
+        totsize = size * mtype->count;
+        if (!mtype->buf) {
+            mtype->buf = (void *) malloc(totsize);
+        }
+        p = (unsigned char *) (mtype->buf);
+        if (!p) {
+            char errmsg[128] = { 0 };
+            sprintf(errmsg, "Out of memory in %s", __FUNCTION__);
+            MTestError(errmsg);
+        }
+        for (i = 0; i < totsize; i++) {
+            p[i] = (unsigned char) (0xff ^ (i & 0xff));
+        }
+    }
+    else {
+        if (mtype->buf) {
+            free(mtype->buf);
+        }
+        mtype->buf = 0;
+    }
+    return mtype->buf;
+}
+
+/*
+ * Free buffer of basic datatype
+ */
+static void *MTestTypeContigFree(MTestDatatype * mtype)
+{
+    if (mtype->buf) {
+        free(mtype->buf);
+        mtype->buf = 0;
+    }
+    return 0;
+}
+
+/*
+ * Check value of received basic datatype buffer.
+ */
+static int MTestTypeContigCheckbuf(MTestDatatype * mtype)
+{
+    unsigned char *p;
+    unsigned char expected;
+    int i, totsize, err = 0, merr;
+    MPI_Aint size;
+
+    p = (unsigned char *) mtype->buf;
+    if (p) {
+        merr = MPI_Type_extent(mtype->datatype, &size);
+        if (merr)
+            MTestPrintError(merr);
+        totsize = size * mtype->count;
+        for (i = 0; i < totsize; i++) {
+            expected = (unsigned char) (0xff ^ (i & 0xff));
+            if (p[i] != expected) {
+                err++;
+                if (mtype->printErrors && err < 10) {
+                    printf("Data expected = %x but got p[%d] = %x\n", expected, i, p[i]);
+                    fflush(stdout);
+                }
+            }
+        }
+    }
+    return err;
+}
+
+
+/* ------------------------------------------------------------------------ */
+/* Datatype routines for vector datatypes                                   */
+/* ------------------------------------------------------------------------ */
+
+/*
+ * Initialize buffer of vector datatype
+ */
+static void *MTestTypeVectorInit(MTestDatatype * mtype)
+{
+    MPI_Aint size, totsize, dt_offset, byte_offset;
+    int merr;
+
+    if (mtype->count > 0) {
+        unsigned char *p;
+        int i, j, k, nc;
+
+        merr = MPI_Type_extent(mtype->datatype, &size);
+        if (merr)
+            MTestPrintError(merr);
+        totsize = mtype->count * size;
+        if (!mtype->buf) {
+            mtype->buf = (void *) malloc(totsize);
+        }
+        p = (unsigned char *) (mtype->buf);
+        if (!p) {
+            char errmsg[128] = { 0 };
+            sprintf(errmsg, "Out of memory in %s", __FUNCTION__);
+            MTestError(errmsg);
+        }
+
+        /* First, set to -1 */
+        for (i = 0; i < totsize; i++)
+            p[i] = 0xff;
+
+        /* Now, set the actual elements to the successive values.
+         * We require that the base type is a contiguous type */
+        nc = 0;
+        dt_offset = 0;
+        /* For each datatype */
+        for (k = 0; k < mtype->count; k++) {
+            /* For each block */
+            for (i = 0; i < mtype->nblock; i++) {
+                byte_offset = dt_offset + i * mtype->stride;
+                /* For each byte */
+                for (j = 0; j < mtype->blksize; j++) {
+                    p[byte_offset + j] = (unsigned char) (0xff ^ (nc & 0xff));
+                    nc++;
+                }
+            }
+            dt_offset += size;
+        }
+    }
+    else {
+        mtype->buf = 0;
+    }
+    return mtype->buf;
+}
+
+/*
+ * Free buffer of vector datatype
+ */
+static void *MTestTypeVectorFree(MTestDatatype * mtype)
+{
+    if (mtype->buf) {
+        free(mtype->buf);
+        mtype->buf = 0;
+    }
+    return 0;
+}
+
+/*
+ * Check value of received vector datatype buffer
+ */
+static int MTestTypeVectorCheckbuf(MTestDatatype * mtype)
+{
+    unsigned char *p;
+    unsigned char expected;
+    int i, err = 0, merr;
+    MPI_Aint size = 0, byte_offset, dt_offset;
+
+    p = (unsigned char *) mtype->buf;
+    if (p) {
+        int j, k, nc;
+        merr = MPI_Type_extent(mtype->datatype, &size);
+        if (merr)
+            MTestPrintError(merr);
+
+        nc = 0;
+        dt_offset = 0;
+        /* For each datatype */
+        for (k = 0; k < mtype->count; k++) {
+            /* For each block */
+            for (i = 0; i < mtype->nblock; i++) {
+                byte_offset = dt_offset + i * mtype->stride;
+                /* For each byte */
+                for (j = 0; j < mtype->blksize; j++) {
+                    expected = (unsigned char) (0xff ^ (nc & 0xff));
+                    if (p[byte_offset + j] != expected) {
+                        err++;
+                        if (mtype->printErrors && err < 10) {
+                            printf("Data expected = %x but got p[%d,%d] = %x\n", expected, i, j,
+                                   p[byte_offset + j]);
+                            fflush(stdout);
+                        }
+                    }
+                    nc++;
+                }
+            }
+            dt_offset += size;
+        }
+    }
+    return err;
+}
+
+
+/* ------------------------------------------------------------------------ */
+/* Datatype routines for indexed datatypes                            */
+/* ------------------------------------------------------------------------ */
+
+/*
+ * Initialize buffer of indexed datatype
+ */
+static void *MTestTypeIndexedInit(MTestDatatype * mtype)
+{
+    MPI_Aint size = 0, totsize;
+    int merr;
+
+    if (mtype->count > 0) {
+        unsigned char *p;
+        int i, j, k, b, nc, offset, dt_offset;
+
+        /* Allocate buffer */
+        merr = MPI_Type_extent(mtype->datatype, &size);
+        if (merr)
+            MTestPrintError(merr);
+        totsize = size * mtype->count;
+
+        if (!mtype->buf) {
+            mtype->buf = (void *) malloc(totsize);
+        }
+        p = (unsigned char *) (mtype->buf);
+        if (!p) {
+            char errmsg[128] = { 0 };
+            sprintf(errmsg, "Out of memory in %s", __FUNCTION__);
+            MTestError(errmsg);
+        }
+
+        /* First, set to -1 */
+        for (i = 0; i < totsize; i++)
+            p[i] = 0xff;
+
+        /* Now, set the actual elements to the successive values.
+         * We require that the base type is a contiguous type */
+        nc = 0;
+        dt_offset = 0;
+        /* For each datatype */
+        for (k = 0; k < mtype->count; k++) {
+            /* For each block */
+            for (i = 0; i < mtype->nblock; i++) {
+                /* For each element in the block */
+                for (j = 0; j < mtype->index[i]; j++) {
+                    offset = dt_offset + mtype->displ_in_bytes[i]
+                        + j * mtype->basesize;
+                    /* For each byte in the element */
+                    for (b = 0; b < mtype->basesize; b++) {
+                        p[offset + b] = (unsigned char) (0xff ^ (nc++ & 0xff));
+                    }
+                }
+            }
+            dt_offset += size;
+        }
+    }
+    else {
+        /* count == 0 */
+        if (mtype->buf) {
+            free(mtype->buf);
+        }
+        mtype->buf = 0;
+    }
+    return mtype->buf;
+}
+
+/*
+ * Free buffer of indexed datatype
+ */
+static void *MTestTypeIndexedFree(MTestDatatype * mtype)
+{
+    if (mtype->buf) {
+        free(mtype->buf);
+        free(mtype->displs);
+        free(mtype->displ_in_bytes);
+        free(mtype->index);
+        mtype->buf = 0;
+        mtype->displs = 0;
+        mtype->displ_in_bytes = 0;
+        mtype->index = 0;
+    }
+    return 0;
+}
+
+/*
+ * Check value of received indexed datatype buffer
+ */
+static int MTestTypeIndexedCheckbuf(MTestDatatype * mtype)
+{
+    unsigned char *p;
+    unsigned char expected;
+    int err = 0, merr;
+    MPI_Aint size = 0;
+
+    p = (unsigned char *) mtype->buf;
+    if (p) {
+        int i, j, k, b, nc, offset, dt_offset;
+        merr = MPI_Type_extent(mtype->datatype, &size);
+        if (merr)
+            MTestPrintError(merr);
+
+        nc = 0;
+        dt_offset = 0;
+        /* For each datatype */
+        for (k = 0; k < mtype->count; k++) {
+            /* For each block */
+            for (i = 0; i < mtype->nblock; i++) {
+                /* For each element in the block */
+                for (j = 0; j < mtype->index[i]; j++) {
+                    offset = dt_offset + mtype->displ_in_bytes[i]
+                        + j * mtype->basesize;
+                    /* For each byte in the element */
+                    for (b = 0; b < mtype->basesize; b++) {
+                        expected = (unsigned char) (0xff ^ (nc++ & 0xff));
+                        if (p[offset + b] != expected) {
+                            err++;
+                            if (mtype->printErrors && err < 10) {
+                                printf("Data expected = %x but got p[%d,%d] = %x\n",
+                                       expected, i, j, p[offset + b]);
+                                fflush(stdout);
+                            }
+                        }
+                    }
+                }
+            }
+            dt_offset += size;
+        }
+    }
+    return err;
+}
+
+
+/* ------------------------------------------------------------------------ */
+/* Datatype generators                                                      */
+/* ------------------------------------------------------------------------ */
+
+/*
+ * Setup contiguous type info and handlers.
+ *
+ * A contiguous datatype is created using the following parameters (stride is unused).
+ * nblock:   Number of blocks.
+ * blocklen: Number of elements in each block. The total number of elements in
+ *           this datatype is set as (nblock * blocklen).
+ * oldtype:  Datatype of element.
+ */
+static int MTestTypeContiguousCreate(int nblock, int blocklen, int stride,
+                                     MPI_Datatype oldtype, const char *typename_prefix,
+                                     MTestDatatype * mtype)
+{
+    int merr = 0;
+    char type_name[128];
+
+    merr = MPI_Type_size(oldtype, &mtype->basesize);
+    if (merr)
+        MTestPrintError(merr);
+
+    mtype->nblock = nblock;
+    mtype->blksize = blocklen * mtype->basesize;
+
+    merr = MPI_Type_contiguous(nblock * blocklen, oldtype, &mtype->datatype);
+    if (merr)
+        MTestPrintError(merr);
+    merr = MPI_Type_commit(&mtype->datatype);
+    if (merr)
+        MTestPrintError(merr);
+
+    memset(type_name, 0, sizeof(type_name));
+    sprintf(type_name, "%s %s (%d count)", typename_prefix, "contiguous", nblock * blocklen);
+    merr = MPI_Type_set_name(mtype->datatype, (char *) type_name);
+    if (merr)
+        MTestPrintError(merr);
+
+    mtype->InitBuf = MTestTypeContigInit;
+    mtype->FreeBuf = MTestTypeContigFree;
+    mtype->CheckBuf = MTestTypeContigCheckbuf;
+    return merr;
+}
+
+/*
+ * Setup vector type info and handlers.
+ *
+ * A vector datatype is created using the following parameters.
+ * nblock:   Number of blocks.
+ * blocklen: Number of elements in each block.
+ * stride:   Number of elements between the starts of consecutive blocks.
+ * oldtype:  Datatype of element.
+ */
+static int MTestTypeVectorCreate(int nblock, int blocklen, int stride,
+                                 MPI_Datatype oldtype, const char *typename_prefix,
+                                 MTestDatatype * mtype)
+{
+    int merr = 0;
+    char type_name[128];
+
+    merr = MPI_Type_size(oldtype, &mtype->basesize);
+    if (merr)
+        MTestPrintError(merr);
+
+    /* These sizes are in bytes (see the VectorInit code) */
+    mtype->stride = stride * mtype->basesize;
+    mtype->blksize = blocklen * mtype->basesize;
+    mtype->nblock = nblock;
+
+    /* Vector uses stride in oldtypes */
+    merr = MPI_Type_vector(nblock, blocklen, stride, oldtype, &mtype->datatype);
+    if (merr)
+        MTestPrintError(merr);
+    merr = MPI_Type_commit(&mtype->datatype);
+    if (merr)
+        MTestPrintError(merr);
+
+    memset(type_name, 0, sizeof(type_name));
+    sprintf(type_name, "%s %s (%d nblock %d blocklen %d stride)", typename_prefix, "vector", nblock,
+            blocklen, stride);
+    merr = MPI_Type_set_name(mtype->datatype, (char *) type_name);
+    if (merr)
+        MTestPrintError(merr);
+
+    mtype->InitBuf = MTestTypeVectorInit;
+    mtype->FreeBuf = MTestTypeVectorFree;
+    mtype->CheckBuf = MTestTypeVectorCheckbuf;
+    return merr;
+}
+
+/*
+ * Setup indexed type info and handlers.
+ *
+ * An indexed datatype is created using the following parameters.
+ * nblock:   Number of blocks.
+ * blocklen: Number of elements in each block. Each block has the same length.
+ * stride:   Number of elements between the starts of two adjacent blocks. The
+ *           displacement of each block is set to (index of current block * stride).
+ * oldtype:  Datatype of element.
+ */
+static int MTestTypeIndexedCreate(int nblock, int blocklen, int stride,
+                                  MPI_Datatype oldtype, const char *typename_prefix,
+                                  MTestDatatype * mtype)
+{
+    int merr = 0;
+    char type_name[128];
+    int i;
+
+    merr = MPI_Type_size(oldtype, &mtype->basesize);
+    if (merr)
+        MTestPrintError(merr);
+
+    mtype->displs = (int *) malloc(nblock * sizeof(int));
+    mtype->displ_in_bytes = (int *) malloc(nblock * sizeof(int));
+    mtype->index = (int *) malloc(nblock * sizeof(int));
+    if (!mtype->displs || !mtype->displ_in_bytes || !mtype->index) {
+        char errmsg[128] = { 0 };
+        sprintf(errmsg, "Out of memory in %s", __FUNCTION__);
+        MTestError(errmsg);
+    }
+
+    mtype->nblock = nblock;
+    for (i = 0; i < nblock; i++) {
+        mtype->index[i] = blocklen;
+        mtype->displs[i] = stride * i;  /*stride between the start of two blocks */
+        mtype->displ_in_bytes[i] = stride * i * mtype->basesize;
+    }
+
+    /* Indexed uses displacement in oldtypes */
+    merr = MPI_Type_indexed(nblock, mtype->index, mtype->displs, oldtype, &mtype->datatype);
+    if (merr)
+        MTestPrintError(merr);
+    merr = MPI_Type_commit(&mtype->datatype);
+    if (merr)
+        MTestPrintError(merr);
+
+    memset(type_name, 0, sizeof(type_name));
+    sprintf(type_name, "%s %s (%d nblock %d blocklen %d stride)", typename_prefix, "index", nblock,
+            blocklen, stride);
+    merr = MPI_Type_set_name(mtype->datatype, (char *) type_name);
+    if (merr)
+        MTestPrintError(merr);
+
+    mtype->InitBuf = MTestTypeIndexedInit;
+    mtype->FreeBuf = MTestTypeIndexedFree;
+    mtype->CheckBuf = MTestTypeIndexedCheckbuf;
+
+    return merr;
+}
+
+/*
+ * Setup basic type info and handlers.
+ */
+int MTestTypeBasicCreate(MPI_Datatype oldtype, MTestDatatype * mtype)
+{
+    int merr = 0;
+
+    merr = MPI_Type_size(oldtype, &mtype->basesize);
+    if (merr)
+        MTestPrintError(merr);
+
+    mtype->datatype = oldtype;
+    mtype->isBasic = 1;
+    mtype->InitBuf = MTestTypeContigInit;
+    mtype->FreeBuf = MTestTypeContigFree;
+    mtype->CheckBuf = MTestTypeContigCheckbuf;
+
+    return merr;
+}
+
+/*
+ * Setup dup type info and handlers.
+ *
+ * A dup datatype is created using the following parameter.
+ * oldtype:  Datatype of element.
+ */
+int MTestTypeDupCreate(MPI_Datatype oldtype, MTestDatatype * mtype)
+{
+    int merr = 0;
+
+    merr = MPI_Type_size(oldtype, &mtype->basesize);
+    if (merr)
+        MTestPrintError(merr);
+
+    merr = MPI_Type_dup(oldtype, &mtype->datatype);
+    if (merr)
+        MTestPrintError(merr);
+
+    /* dup'ed types are already committed if the original type
+     * was committed (MPI-2, section 8.8) */
+
+    mtype->InitBuf = MTestTypeContigInit;
+    mtype->FreeBuf = MTestTypeContigFree;
+    mtype->CheckBuf = MTestTypeContigCheckbuf;
+
+    return merr;
+}
+
+
+/*
+ * General initialization for receive buffer.
+ * Allocate buffer and initialize for reception (e.g., set initial data to detect failure)
+ * Both basic and derived datatypes can be handled by using the extent as the buffer size.
+ */
+void *MTestTypeInitRecv(MTestDatatype * mtype)
+{
+    MPI_Aint size;
+    int merr;
+
+    if (mtype->count > 0) {
+        signed char *p;
+        int i, totsize;
+        merr = MPI_Type_extent(mtype->datatype, &size);
+        if (merr)
+            MTestPrintError(merr);
+        totsize = size * mtype->count;
+        if (!mtype->buf) {
+            mtype->buf = (void *) malloc(totsize);
+        }
+        p = (signed char *) (mtype->buf);
+        if (!p) {
+            char errmsg[128] = { 0 };
+            sprintf(errmsg, "Out of memory in %s", __FUNCTION__);
+            MTestError(errmsg);
+        }
+        for (i = 0; i < totsize; i++) {
+            p[i] = 0xff;
+        }
+    }
+    else {
+        if (mtype->buf) {
+            free(mtype->buf);
+        }
+        mtype->buf = 0;
+    }
+    return mtype->buf;
+}
+
+void MTestTypeCreatorInit(MTestDdtCreator * creators)
+{
+    memset(creators, 0, sizeof(MTestDdtCreator) * MTEST_DDT_MAX);
+    creators[MTEST_DDT_CONTIGUOUS] = MTestTypeContiguousCreate;
+    creators[MTEST_DDT_VECTOR] = MTestTypeVectorCreate;
+    creators[MTEST_DDT_INDEXED] = MTestTypeIndexedCreate;
+}
diff --git a/test/mpi/util/mtest_datatype.h b/test/mpi/util/mtest_datatype.h
new file mode 100644
index 0000000..48fa96d
--- /dev/null
+++ b/test/mpi/util/mtest_datatype.h
@@ -0,0 +1,48 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *
+ *  (C) 2014 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+
+#ifndef MTEST_DATATYPE_H_
+#define MTEST_DATATYPE_H_
+
+#include "mpi.h"
+#include "mpitestconf.h"
+#include "mpitest.h"
+
+/* Provide backward portability to MPI 1 */
+#ifndef MPI_VERSION
+#define MPI_VERSION 1
+#endif
+
+enum MTEST_BASIC_DT {
+    MTEST_BDT_INT,
+    MTEST_BDT_DOUBLE,
+    MTEST_BDT_FLOAT_INT,
+    MTEST_BDT_SHORT,
+    MTEST_BDT_LONG,
+    MTEST_BDT_CHAR,
+    MTEST_BDT_UINT64_T,
+    MTEST_BDT_FLOAT,
+    MTEST_BDT_BYTE,
+    MTEST_BDT_MAX
+};
+
+enum MTEST_DERIVED_DT {
+    MTEST_DDT_CONTIGUOUS,
+    MTEST_DDT_VECTOR,
+    MTEST_DDT_INDEXED,
+    MTEST_DDT_MAX
+};
+
+typedef int (*MTestDdtCreator) (int, int, int, MPI_Datatype, const char *, MTestDatatype *);
+
+extern void MTestTypeCreatorInit(MTestDdtCreator * creators);
+extern void *MTestTypeInitRecv(MTestDatatype * mtype);
+
+extern int MTestTypeBasicCreate(MPI_Datatype oldtype, MTestDatatype * mtype);
+extern int MTestTypeDupCreate(MPI_Datatype oldtype, MTestDatatype * mtype);
+
+#endif /* MTEST_DATATYPE_H_ */
diff --git a/test/mpi/util/mtest_datatype_gen.c b/test/mpi/util/mtest_datatype_gen.c
new file mode 100644
index 0000000..79fb1f9
--- /dev/null
+++ b/test/mpi/util/mtest_datatype_gen.c
@@ -0,0 +1,462 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *
+ *  (C) 2014 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+#include "mtest_datatype.h"
+#if defined(HAVE_STDIO_H) || defined(STDC_HEADERS)
+#include <stdio.h>
+#endif
+#if defined(HAVE_STDLIB_H) || defined(STDC_HEADERS)
+#include <stdlib.h>
+#endif
+#if defined(HAVE_STRING_H) || defined(STDC_HEADERS)
+#include <string.h>
+#endif
+#ifdef HAVE_STDARG_H
+#include <stdarg.h>
+#endif
+/* The following two includes permit the collection of resource usage
+   data in the tests
+ */
+#ifdef HAVE_SYS_TIME_H
+#include <sys/time.h>
+#endif
+#ifdef HAVE_SYS_RESOURCE_H
+#include <sys/resource.h>
+#endif
+#include <errno.h>
+
+static int dbgflag = 0;         /* Flag used for debugging */
+static int wrank = -1;          /* World rank */
+static int verbose = 0;         /* Message level (0 is none) */
+
+/*
+ * Utility routines for writing MPI datatype communication tests.
+ *
+ * Both basic and derived datatypes are included.
+ * For basic datatypes, every type has a test case in which both the send and
+ * receive buffers use the same datatype and count.
+ *
+ *  For derived datatypes:
+ *    All the test cases are defined in this file, and the datatype definitions
+ *    are in file mtest_datatype.c. Each test case will be automatically called
+ *    by every datatype.
+ *
+ *  Test case generation:
+ *    Every derived datatype is tested twice: once as the send buffer type
+ *    and once as the receive buffer type. Each test contains several subtests
+ *    for different structures (i.e., different values of count, block length,
+ *    and stride). The following four structures are defined (L = large, S = small):
+ *      S count & L block length & S stride
+ *      L count & S block length & S stride
+ *      S count & L block length & L stride
+ *      L count & S block length & L stride
+ *
+ *  How to add a new structure for each datatype:
+ *    1. Add structure definition in function MTestDdtStructDefine.
+ *    2. Increase MTEST_DDT_NUM_SUBTESTS
+ *
+ *  Datatype definition:
+ *    Every type is initialized by the creation function stored in the
+ *    mtestDdtCreators variable. All create/init/check functions are defined
+ *    in file mtest_datatype.c. The following derived datatypes are defined:
+ *    Contiguous | Vector | Indexed
+ *
+ *  How to add a new derived datatype:
+ *    1. Add the new datatype in enum MTEST_DERIVED_DT.
+ *    2. Add its create/init/check functions in file mtest_datatype.c
+ *    3. Add its creator function to mtestDdtCreators variable
+ */
+
+static int datatype_index = 0;
+
+
+#define MTEST_BDT_START_IDX 0
+#define MTEST_BDT_NUM_TESTS (MTEST_BDT_MAX)
+#define MTEST_BDT_RANGE (MTEST_BDT_START_IDX + MTEST_BDT_NUM_TESTS)
+
+#define MTEST_DDT_NUM_SUBTESTS 4        /* 4 kinds of derived datatype structure */
+#define MTEST_DDT_NUM_TYPES (MTEST_DDT_MAX)
+
+#define MTEST_SEND_DDT_START_IDX (MTEST_BDT_NUM_TESTS)
+#define MTEST_SEND_DDT_NUM_TESTS (MTEST_DDT_NUM_TYPES * MTEST_DDT_NUM_SUBTESTS)
+#define MTEST_SEND_DDT_RANGE (MTEST_SEND_DDT_START_IDX + MTEST_SEND_DDT_NUM_TESTS)
+
+#define MTEST_RECV_DDT_START_IDX (MTEST_SEND_DDT_START_IDX + MTEST_SEND_DDT_NUM_TESTS)
+#define MTEST_RECV_DDT_NUM_TESTS (MTEST_DDT_NUM_TYPES * MTEST_DDT_NUM_SUBTESTS)
+#define MTEST_RECV_DDT_RANGE (MTEST_RECV_DDT_START_IDX + MTEST_RECV_DDT_NUM_TESTS)
+
+static MTestDdtCreator mtestDdtCreators[MTEST_DDT_MAX];
+
+
+/* -------------------------------------------------------------------------------*/
+/* Routine to define various sets of blocklen/count/stride for derived datatypes. */
+/* ------------------------------------------------------------------------------ */
+
+static inline int MTestDdtStructDefine(int ddt_index, int tot_count, int *count,
+                                       int *blen, int *stride, int *align_tot_count)
+{
+    int merr = 0;
+    int ddt_c_st;
+    int _short = 0, _align_tot_count = 0, _count = 0, _blen = 0, _stride = 0;
+    ddt_c_st = ddt_index % MTEST_DDT_NUM_SUBTESTS;
+
+    /* Get short value according to user specified tot_count.
+     * It is used as count for large-block-length structure, or block length
+     * for large-count structure. */
+    if (tot_count < 2) {
+        _short = 1;
+    }
+    else if (tot_count < 64) {
+        _short = 2;
+    }
+    else {
+        _short = 64;
+    }
+    _align_tot_count = (tot_count + _short - 1) & ~(_short - 1);
+
+    switch (ddt_c_st) {
+    case 0:
+        /* Large block length. */
+        _count = _short;
+        _blen = _align_tot_count / _short;
+        _stride = _blen * 2;
+        break;
+    case 1:
+        /* Large count */
+        _count = _align_tot_count / _short;
+        _blen = _short;
+        _stride = _blen * 2;
+        break;
+    case 2:
+        /* Large block length and large stride */
+        _count = _short;
+        _blen = _align_tot_count / _short;
+        _stride = _blen * 10;
+        break;
+    case 3:
+        /* Large count and large stride */
+        _count = _align_tot_count / _short;
+        _blen = _short;
+        _stride = _blen * 10;
+        break;
+    default:
+        /* Undefined index */
+        merr = 1;
+        break;
+    }
+
+    *align_tot_count = _align_tot_count;
+    *count = _count;
+    *blen = _blen;
+    *stride = _stride;
+
+    return merr;
+}
+
+/* ------------------------------------------------------------------------ */
+/* Routine to generate basic datatypes                                       */
+/* ------------------------------------------------------------------------ */
+
+static inline int MTestGetBasicDatatypes(MTestDatatype * sendtype,
+                                         MTestDatatype * recvtype, int tot_count)
+{
+    int merr = 0;
+    int bdt_index = datatype_index - MTEST_BDT_START_IDX;
+    if (bdt_index >= MTEST_BDT_MAX) {
+        printf("Wrong index:  global %d, bdt %d in %s\n", datatype_index, bdt_index, __FUNCTION__);
+        merr++;
+        return merr;
+    }
+
+    switch (bdt_index) {
+    case MTEST_BDT_INT:
+        merr = MTestTypeBasicCreate(MPI_INT, sendtype);
+        merr = MTestTypeBasicCreate(MPI_INT, recvtype);
+        break;
+    case MTEST_BDT_DOUBLE:
+        merr = MTestTypeBasicCreate(MPI_DOUBLE, sendtype);
+        merr = MTestTypeBasicCreate(MPI_DOUBLE, recvtype);
+        break;
+    case MTEST_BDT_FLOAT_INT:
+        merr = MTestTypeBasicCreate(MPI_FLOAT_INT, sendtype);
+        merr = MTestTypeBasicCreate(MPI_FLOAT_INT, recvtype);
+        break;
+    case MTEST_BDT_SHORT:
+        merr = MTestTypeBasicCreate(MPI_SHORT, sendtype);
+        merr = MTestTypeBasicCreate(MPI_SHORT, recvtype);
+        break;
+    case MTEST_BDT_LONG:
+        merr = MTestTypeBasicCreate(MPI_LONG, sendtype);
+        merr = MTestTypeBasicCreate(MPI_LONG, recvtype);
+        break;
+    case MTEST_BDT_CHAR:
+        merr = MTestTypeBasicCreate(MPI_CHAR, sendtype);
+        merr = MTestTypeBasicCreate(MPI_CHAR, recvtype);
+        break;
+    case MTEST_BDT_UINT64_T:
+        merr = MTestTypeBasicCreate(MPI_UINT64_T, sendtype);
+        merr = MTestTypeBasicCreate(MPI_UINT64_T, recvtype);
+        break;
+    case MTEST_BDT_FLOAT:
+        merr = MTestTypeBasicCreate(MPI_FLOAT, sendtype);
+        merr = MTestTypeBasicCreate(MPI_FLOAT, recvtype);
+        break;
+    case MTEST_BDT_BYTE:
+        merr = MTestTypeBasicCreate(MPI_BYTE, sendtype);
+        merr = MTestTypeBasicCreate(MPI_BYTE, recvtype);
+        break;
+    }
+    sendtype->count = tot_count;
+    recvtype->count = tot_count;
+
+    return merr;
+}
+
+/* ------------------------------------------------------------------------ */
+/* Routine to generate send/receive derived datatypes                     */
+/* ------------------------------------------------------------------------ */
+
+static inline int MTestGetSendDerivedDatatypes(MTestDatatype * sendtype,
+                                               MTestDatatype * recvtype, int tot_count)
+{
+    int merr = 0;
+    int ddt_datatype_index, ddt_c_dt;
+    int blen, stride, count, align_tot_count, tsize = 1;
+    MPI_Datatype old_type = MPI_DOUBLE;
+
+    /* Check index */
+    ddt_datatype_index = datatype_index - MTEST_SEND_DDT_START_IDX;
+    ddt_c_dt = ddt_datatype_index / MTEST_DDT_NUM_SUBTESTS;
+    if (ddt_c_dt >= MTEST_DDT_MAX || !mtestDdtCreators[ddt_c_dt]) {
+        printf("Wrong index:  global %d, send %d send-ddt %d, or undefined creator in %s\n",
+               datatype_index, ddt_datatype_index, ddt_c_dt, __FUNCTION__);
+        merr++;
+        return merr;
+    }
+
+    /* Set datatype structure */
+    merr = MTestDdtStructDefine(ddt_datatype_index, tot_count, &count, &blen,
+                                &stride, &align_tot_count);
+    if (merr) {
+        printf("Wrong index:  global %d, send %d send-ddt %d, or undefined ddt structure in %s\n",
+               datatype_index, ddt_datatype_index, ddt_c_dt, __FUNCTION__);
+        merr++;
+        return merr;
+    }
+
+    /* Create send datatype */
+    merr = mtestDdtCreators[ddt_c_dt] (count, blen, stride, old_type, "send", sendtype);
+    if (merr)
+        return merr;
+
+    sendtype->count = 1;
+    merr = MPI_Type_size(sendtype->datatype, &tsize);
+    if (merr)
+        MTestPrintError(merr);
+
+    /* Create receive datatype */
+    merr = MTestTypeBasicCreate(MPI_CHAR, recvtype);
+    if (merr)
+        return merr;
+
+    recvtype->count = sendtype->count * tsize;
+
+    return merr;
+}
+
+static inline int MTestGetRecvDerivedDatatypes(MTestDatatype * sendtype,
+                                               MTestDatatype * recvtype, int tot_count)
+{
+    int merr = 0;
+    int ddt_datatype_index, ddt_c_dt;
+    int blen, stride, count, align_tot_count, tsize;
+    MPI_Datatype old_type = MPI_DOUBLE;
+
+    /* Check index */
+    ddt_datatype_index = datatype_index - MTEST_RECV_DDT_START_IDX;
+    ddt_c_dt = ddt_datatype_index / MTEST_DDT_NUM_SUBTESTS;
+    if (ddt_c_dt >= MTEST_DDT_MAX || !mtestDdtCreators[ddt_c_dt]) {
+        printf("Wrong index:  global %d, recv %d recv-ddt %d, or undefined creator in %s\n",
+               datatype_index, ddt_datatype_index, ddt_c_dt, __FUNCTION__);
+        merr++;
+        return merr;
+    }
+
+    /* Set datatype structure */
+    merr = MTestDdtStructDefine(ddt_datatype_index, tot_count, &count, &blen,
+                                &stride, &align_tot_count);
+    if (merr) {
+        printf("Wrong index:  global %d, recv %d recv-ddt %d, or undefined ddt structure in %s\n",
+               datatype_index, ddt_datatype_index, ddt_c_dt, __FUNCTION__);
+        return merr;
+    }
+
+    /* Create receive datatype */
+    merr = mtestDdtCreators[ddt_c_dt] (count, blen, stride, old_type, "recv", recvtype);
+    if (merr)
+        return merr;
+
+    recvtype->count = 1;
+    merr = MPI_Type_size(recvtype->datatype, &tsize);
+    if (merr)
+        MTestPrintError(merr);
+
+    /* Create send datatype */
+    merr = MTestTypeBasicCreate(MPI_CHAR, sendtype);
+    if (merr)
+        return merr;
+
+    sendtype->count = recvtype->count * tsize;
+
+    return merr;
+}
+
+static inline void MTestResetDatatype(MTestDatatype * mtype)
+{
+    mtype->InitBuf = 0;
+    mtype->FreeBuf = 0;
+    mtype->CheckBuf = 0;
+    mtype->datatype = 0;
+    mtype->isBasic = 0;
+    mtype->printErrors = 0;
+    mtype->buf = 0;
+}
+
+
+/* ------------------------------------------------------------------------ */
+/* Exposed routine to external tests                                         */
+/* ------------------------------------------------------------------------ */
+int MTestGetDatatypes(MTestDatatype * sendtype, MTestDatatype * recvtype, int tot_count)
+{
+    int merr = 0;
+
+    MTestGetDbgInfo(&dbgflag, &verbose);
+    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
+
+    MTestResetDatatype(sendtype);
+    MTestResetDatatype(recvtype);
+
+    MTestTypeCreatorInit((MTestDdtCreator *) mtestDdtCreators);
+
+    if (datatype_index < MTEST_BDT_RANGE) {
+        merr = MTestGetBasicDatatypes(sendtype, recvtype, tot_count);
+
+    }
+    else if (datatype_index < MTEST_SEND_DDT_RANGE) {
+        merr = MTestGetSendDerivedDatatypes(sendtype, recvtype, tot_count);
+
+    }
+    else if (datatype_index < MTEST_RECV_DDT_RANGE) {
+        merr = MTestGetRecvDerivedDatatypes(sendtype, recvtype, tot_count);
+
+    }
+    else {
+        /* out of range */
+        datatype_index = -1;
+    }
+
+    /* stop if error reported */
+    if (merr) {
+        datatype_index = -1;
+    }
+
+    if (datatype_index > 0) {
+        /* general initialization for receive buffer. */
+        recvtype->InitBuf = MTestTypeInitRecv;
+    }
+
+    datatype_index++;
+
+    if ((verbose || dbgflag) && datatype_index > 0) {
+        int ssize, rsize;
+        const char *sendtype_nm = MTestGetDatatypeName(sendtype);
+        const char *recvtype_nm = MTestGetDatatypeName(recvtype);
+        MPI_Type_size(sendtype->datatype, &ssize);
+        MPI_Type_size(recvtype->datatype, &rsize);
+        printf("Get datatypes: send = %s(size %d count %d basesize %d), "
+               "recv = %s(size %d count %d basesize %d), tot_count=%d\n",
+               sendtype_nm, ssize, sendtype->count, sendtype->basesize,
+               recvtype_nm, rsize, recvtype->count, recvtype->basesize,
+               tot_count);
+        fflush(stdout);
+    }
+
+    return datatype_index;
+}
+
+/* Reset the datatype index (start from the initial datatype).
+   Note: This routine is rarely needed; MTestGetDatatypes automatically
+   starts over after the last available datatype is used.
+*/
+void MTestResetDatatypes(void)
+{
+    datatype_index = 0;
+}
+
+/* Return the index of the current datatype.  This is rarely needed and
+   is provided mostly to enable debugging of the MTest package itself */
+int MTestGetDatatypeIndex(void)
+{
+    return datatype_index;
+}
+
+/* Free the storage associated with a datatype */
+void MTestFreeDatatype(MTestDatatype * mtype)
+{
+    int merr;
+    /* Invoke a datatype-specific free function to handle
+     * both the datatype and the send/receive buffers */
+    if (mtype->FreeBuf) {
+        (mtype->FreeBuf) (mtype);
+    }
+    /* Free the datatype itself if it was created */
+    if (!mtype->isBasic) {
+        merr = MPI_Type_free(&mtype->datatype);
+        if (merr)
+            MTestPrintError(merr);
+    }
+}
+
+/* Check that a message was received correctly.  Returns the number of
+   errors detected.  Status may be NULL or MPI_STATUS_IGNORE */
+int MTestCheckRecv(MPI_Status * status, MTestDatatype * recvtype)
+{
+    int count;
+    int errs = 0, merr;
+
+    if (status && status != MPI_STATUS_IGNORE) {
+        merr = MPI_Get_count(status, recvtype->datatype, &count);
+        if (merr)
+            MTestPrintError(merr);
+
+        /* Check count against expected count */
+        if (count != recvtype->count) {
+            errs++;
+        }
+    }
+
+    /* Check received data */
+    if (!errs && recvtype->CheckBuf(recvtype)) {
+        errs++;
+    }
+    return errs;
+}
+
+/* This next routine uses a circular buffer of static name arrays just to
+   simplify the use of the routine */
+const char *MTestGetDatatypeName(MTestDatatype * dtype)
+{
+    static char name[4][MPI_MAX_OBJECT_NAME];
+    static int sp = 0;
+    int rlen, merr;
+
+    if (sp >= 4)
+        sp = 0;
+    merr = MPI_Type_get_name(dtype->datatype, name[sp], &rlen);
+    if (merr)
+        MTestPrintError(merr);
+    return (const char *) name[sp++];
+}

http://git.mpich.org/mpich.git/commitdiff/bda9517d616b83133028aa11949a91184ae52512

commit bda9517d616b83133028aa11949a91184ae52512
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Wed Nov 5 01:03:32 2014 -0600

    portals4: set max origin events in rptl_init
    
    Set the maximum number of origin events to the returned limit from
    PtlNIInit. Rportals uses the value to prevent exhausting the local
    EQ and causing a flowcontrol event.
    
    No reviewer.

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
index e69f4e1..2803cae 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
@@ -217,7 +217,7 @@ static int ptl_init(MPIDI_PG_t *pg_p, int pg_rank, char **bc_val_p, int *val_max
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmdbind", "**ptlmdbind %s", MPID_nem_ptl_strerror(ret));
 
     /* currently, rportlas only works with a single NI and EQ */
-    ret = MPID_nem_ptl_rptl_init(MPIDI_Process.my_pg->size, 5, get_target_info);
+    ret = MPID_nem_ptl_rptl_init(MPIDI_Process.my_pg->size, MPIDI_nem_ptl_ni_limits.max_eqs, get_target_info);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlniinit", "**ptlniinit %s", MPID_nem_ptl_strerror(ret));
 
     /* allow rportal to manage the primary portal and retransmit if needed */

http://git.mpich.org/mpich.git/commitdiff/aa4992e71aab7c210b877c58d965d37159f82dbb

commit aa4992e71aab7c210b877c58d965d37159f82dbb
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Tue Nov 4 22:28:19 2014 -0600

    portals4: prevent early request free
    
    The large send handler incorrectly assumed event ordering from portals.
    This could lead to a request being freed while pending events would
    still attempt to access it, causing a segfault or incorrect handler to
    execute.
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
index d7974a5..f5c204d 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
@@ -45,6 +45,7 @@ typedef struct {
     ptl_handle_me_t put_me;
     ptl_handle_me_t *get_me_p;
     int num_gets;
+    int put_acked;
     ptl_size_t chunk_offset;
     void *chunk_buffer[MPID_NEM_PTL_NUM_CHUNK_BUFFERS];
     MPIDI_msg_sz_t bytes_put;
@@ -67,6 +68,7 @@ typedef struct {
         REQ_PTL(req_)->put_me        = PTL_INVALID_HANDLE;      \
         REQ_PTL(req_)->get_me_p      = NULL;                    \
         REQ_PTL(req_)->num_gets      = 0;                       \
+        REQ_PTL(req_)->put_acked     = 0;                       \
         REQ_PTL(req_)->event_handler = NULL;                    \
         REQ_PTL(req_)->chunk_offset  = 0;                       \
     } while (0)
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
index 6abffaa..e6bbc66 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
@@ -111,12 +111,16 @@ static int handler_large(const ptl_event_t *e)
         /* truncated message */
         mpi_errno = handler_send_complete(e);
         if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+    } else if (e->type == PTL_EVENT_ACK) {
+        REQ_PTL(sreq)->put_acked = 1;
     } else if (e->type == PTL_EVENT_GET) {
         /* decrement the remaining get operations */
-        if (--REQ_PTL(sreq)->num_gets == 0)
-            mpi_errno = handler_send_complete(e);
+        REQ_PTL(sreq)->num_gets--;
     }
 
+    if (REQ_PTL(sreq)->num_gets == 0 && REQ_PTL(sreq)->put_acked)
+        mpi_errno = handler_send_complete(e);
+
  fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_HANDLER_LARGE);
     return mpi_errno;

http://git.mpich.org/mpich.git/commitdiff/104f93433c0b1fc29cc760f270233fd13986c8b7

commit 104f93433c0b1fc29cc760f270233fd13986c8b7
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Tue Oct 28 17:49:44 2014 -0500

    portals4: handle out-of-order reply events
    
    In large message cases, when multiple get operations are issued, the
    data may arrive out-of-order back at the initiator. A counter is required
    to ensure all operations have completed. In the temporary buffer case, we
    simply wait for all the data to arrive, and unpack in one operation.
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
index ec599f2..d7974a5 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
@@ -44,6 +44,7 @@ typedef struct {
     ptl_handle_md_t md;
     ptl_handle_me_t put_me;
     ptl_handle_me_t *get_me_p;
+    int num_gets;
     ptl_size_t chunk_offset;
     void *chunk_buffer[MPID_NEM_PTL_NUM_CHUNK_BUFFERS];
     MPIDI_msg_sz_t bytes_put;
@@ -65,6 +66,7 @@ typedef struct {
         REQ_PTL(req_)->md            = PTL_INVALID_HANDLE;      \
         REQ_PTL(req_)->put_me        = PTL_INVALID_HANDLE;      \
         REQ_PTL(req_)->get_me_p      = NULL;                    \
+        REQ_PTL(req_)->num_gets      = 0;                       \
         REQ_PTL(req_)->event_handler = NULL;                    \
         REQ_PTL(req_)->chunk_offset  = 0;                       \
     } while (0)
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
index 4c4b6a4..26a1eb2 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
@@ -165,14 +165,11 @@ int MPID_nem_ptl_poll(int is_blocking_poll)
         case PTL_EVENT_ACK:
         case PTL_EVENT_REPLY:
         case PTL_EVENT_SEARCH: {
-            /* intermediate operations for large messages pass a NULL user_ptr. we can ignore these events */
-            if (event.user_ptr) {
-                MPID_Request * const req = event.user_ptr;
-                MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "req = %p", req);
-                MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "REQ_PTL(req)->event_handler = %p", REQ_PTL(req)->event_handler);
-                mpi_errno = REQ_PTL(req)->event_handler(&event);
-                if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-            }
+            MPID_Request * const req = event.user_ptr;
+            MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "req = %p", req);
+            MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "REQ_PTL(req)->event_handler = %p", REQ_PTL(req)->event_handler);
+            mpi_errno = REQ_PTL(req)->event_handler(&event);
+            if (mpi_errno) MPIU_ERR_POP(mpi_errno);
             break;
         }
         case PTL_EVENT_AUTO_FREE:
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
index 80576e7..2152da7 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
@@ -57,7 +57,7 @@ static int handler_recv_complete(const ptl_event_t *e)
     MPIDI_FUNC_ENTER(MPID_STATE_HANDLER_RECV_COMPLETE);
     
     MPIU_Assert(e->type == PTL_EVENT_REPLY || e->type == PTL_EVENT_PUT || e->type == PTL_EVENT_PUT_OVERFLOW);
-    
+
     if (REQ_PTL(rreq)->md != PTL_INVALID_HANDLE) {
         ret = PtlMDRelease(REQ_PTL(rreq)->md);
         MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmdrelease", "**ptlmdrelease %s", MPID_nem_ptl_strerror(ret));
@@ -122,10 +122,10 @@ static int handler_recv_dequeue_complete(const ptl_event_t *e)
 }
 
 #undef FUNCNAME
-#define FUNCNAME handler_recv_unpack
+#define FUNCNAME handler_recv_big_get
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
-static int handler_recv_unpack(const ptl_event_t *e)
+static int handler_recv_big_get(const ptl_event_t *e)
 {
     int mpi_errno = MPI_SUCCESS;
     MPID_Request *const rreq = e->user_ptr;
@@ -137,14 +137,17 @@ static int handler_recv_unpack(const ptl_event_t *e)
 
     MPIU_Assert(e->type == PTL_EVENT_REPLY);
 
-    last = rreq->dev.segment_size;
-    MPID_Segment_unpack(rreq->dev.segment_ptr, rreq->dev.segment_first, &last,
-                        (char *)REQ_PTL(rreq)->chunk_buffer[0] + REQ_PTL(rreq)->chunk_offset);
-
-    rreq->dev.segment_first += e->mlength;
-    REQ_PTL(rreq)->chunk_offset += e->mlength;
-    if (rreq->dev.segment_first == rreq->dev.segment_size)
+    /* decrement the number of remaining gets */
+    REQ_PTL(rreq)->num_gets--;
+    if (REQ_PTL(rreq)->num_gets == 0) {
+        /* if we used a temporary buffer, unpack the data */
+        if (REQ_PTL(rreq)->chunk_buffer[0]) {
+            last = rreq->dev.segment_size;
+            MPID_Segment_unpack(rreq->dev.segment_ptr, rreq->dev.segment_first, &last, REQ_PTL(rreq)->chunk_buffer[0]);
+            MPIU_Assert(last == rreq->dev.segment_size);
+        }
         mpi_errno = handler_recv_complete(e);
+    }
 
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
 
@@ -164,25 +167,21 @@ static void big_get(void *buf, ptl_size_t left_to_get, MPIDI_VC_t *vc, ptl_match
     int ret;
     MPID_nem_ptl_vc_area *vc_ptl;
     ptl_size_t start, get_sz;
-    void *user_ptr = NULL;
 
     vc_ptl = VC_PTL(vc);
     start = (ptl_size_t)buf;
 
-    /* we need to handle all event if we are unpacking from the chunk_buffer */
-    if (REQ_PTL(rreq)->event_handler == handler_recv_unpack)
-        user_ptr = rreq;
+    /* we need to handle all events */
+    REQ_PTL(rreq)->event_handler = handler_recv_big_get;
 
     while (left_to_get > 0) {
         /* get up to the maximum allowed by the portals interface */
-        if (left_to_get > MPIDI_nem_ptl_ni_limits.max_msg_size) {
+        if (left_to_get > MPIDI_nem_ptl_ni_limits.max_msg_size)
             get_sz = MPIDI_nem_ptl_ni_limits.max_msg_size;
-        } else {
+        else
             get_sz = left_to_get;
-            /* attach the request to the final operation */
-            user_ptr = rreq;
-        }
-        ret = MPID_nem_ptl_rptl_get(MPIDI_nem_ptl_global_md, start, get_sz, vc_ptl->id, vc_ptl->ptg, match_bits, 0, user_ptr);
+
+        ret = MPID_nem_ptl_rptl_get(MPIDI_nem_ptl_global_md, start, get_sz, vc_ptl->id, vc_ptl->ptg, match_bits, 0, rreq);
         DBG_MSG_GET("global", get_sz, vc->pg_rank, match_bits);
         MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "   buf=%p", (char *)start);
         MPIU_Assert(ret == 0);
@@ -190,6 +189,7 @@ static void big_get(void *buf, ptl_size_t left_to_get, MPIDI_VC_t *vc, ptl_match
         /* account for what has been sent */
         start += get_sz;
         left_to_get -= get_sz;
+        REQ_PTL(rreq)->num_gets++;
     }
 }
 
@@ -304,8 +304,6 @@ static int handler_recv_dequeue_large(const ptl_event_t *e)
 
     /* we need to GET the rest of the data from the sender's buffer */
     if (dt_contig) {
-        REQ_PTL(rreq)->event_handler = handler_recv_complete;
-
         big_get((char *)rreq->dev.user_buf + dt_true_lb + PTL_LARGE_THRESHOLD, data_sz - PTL_LARGE_THRESHOLD,
                 vc, e->match_bits, rreq);
         goto fn_exit;
@@ -338,10 +336,9 @@ static int handler_recv_dequeue_large(const ptl_event_t *e)
         
     /* message won't fit in a single IOV, allocate buffer and unpack when received */
     /* FIXME: For now, allocate a single large buffer to hold entire message */
-    MPIU_CHKPMEM_MALLOC(REQ_PTL(rreq)->chunk_buffer[0], void *, rreq->dev.segment_size - rreq->dev.segment_first,
+    MPIU_CHKPMEM_MALLOC(REQ_PTL(rreq)->chunk_buffer[0], void *, data_sz - PTL_LARGE_THRESHOLD,
                         mpi_errno, "chunk_buffer");
-    REQ_PTL(rreq)->event_handler = handler_recv_unpack;
-    big_get(REQ_PTL(rreq)->chunk_buffer[0], rreq->dev.segment_size - rreq->dev.segment_first, vc, e->match_bits, rreq);
+    big_get(REQ_PTL(rreq)->chunk_buffer[0], data_sz - PTL_LARGE_THRESHOLD, vc, e->match_bits, rreq);
 
  fn_exit:
     MPIU_CHKPMEM_COMMIT();
@@ -396,7 +393,6 @@ static int handler_recv_dequeue_unpack_large(const ptl_event_t *e)
 
     MPIU_CHKPMEM_MALLOC(REQ_PTL(rreq)->chunk_buffer[0], void *, rreq->dev.segment_size - rreq->dev.segment_first,
                         mpi_errno, "chunk_buffer");
-    REQ_PTL(rreq)->event_handler = handler_recv_unpack;
     big_get(REQ_PTL(rreq)->chunk_buffer[0], rreq->dev.segment_size - rreq->dev.segment_first, vc, e->match_bits, rreq);
 
  fn_exit:
@@ -708,7 +704,6 @@ int MPID_nem_ptl_lmt_start_recv(MPIDI_VC_t *vc,  MPID_Request *rreq, MPID_IOV s_
     if (dt_contig) {
         void * real_user_buf = (char *)rreq->dev.user_buf + dt_true_lb;
 
-        REQ_PTL(rreq)->event_handler = handler_recv_complete;
         big_get((char *)real_user_buf + PTL_LARGE_THRESHOLD, data_sz - PTL_LARGE_THRESHOLD, vc, match_bits, rreq);
 
         /* The memcpy is done after the get purposely for overlapping */
@@ -756,7 +751,6 @@ int MPID_nem_ptl_lmt_start_recv(MPIDI_VC_t *vc,  MPID_Request *rreq, MPID_IOV s_
             /* FIXME: For now, allocate a single large buffer to hold entire message */
             MPIU_CHKPMEM_MALLOC(REQ_PTL(rreq)->chunk_buffer[0], void *, rreq->dev.segment_size - rreq->dev.segment_first,
                                 mpi_errno, "chunk_buffer");
-            REQ_PTL(rreq)->event_handler = handler_recv_unpack;
             big_get(REQ_PTL(rreq)->chunk_buffer[0], rreq->dev.segment_size - rreq->dev.segment_first, vc, match_bits, rreq);
         }
     }
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
index 152ff88..6abffaa 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
@@ -13,7 +13,6 @@
 #define FCNAME MPIU_QUOTE(FUNCNAME)
 static void big_meappend(void *buf, ptl_size_t left_to_send, MPIDI_VC_t *vc, ptl_match_bits_t match_bits, MPID_Request *sreq)
 {
-    void * user_ptr = NULL;
     int i, ret;
     MPID_nem_ptl_vc_area *vc_ptl;
     ptl_me_t me;
@@ -39,19 +38,17 @@ static void big_meappend(void *buf, ptl_size_t left_to_send, MPIDI_VC_t *vc, ptl
         /* send up to the maximum allowed by the portals interface */
         if (left_to_send > MPIDI_nem_ptl_ni_limits.max_msg_size)
             me.length = MPIDI_nem_ptl_ni_limits.max_msg_size;
-        else {
+        else
             me.length = left_to_send;
-            /* attach the request to the final operation */
-            user_ptr = sreq;
-        }
 
-        ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, user_ptr, &REQ_PTL(sreq)->get_me_p[i]);
+        ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, sreq, &REQ_PTL(sreq)->get_me_p[i]);
         DBG_MSG_MEAPPEND("CTL", vc->pg_rank, me, sreq);
         MPIU_Assert(ret == 0);
 
         /* account for what has been sent */
         me.start = (char *)me.start + me.length;
         left_to_send -= me.length;
+        REQ_PTL(sreq)->num_gets++;
     }
 }
 
@@ -114,8 +111,10 @@ static int handler_large(const ptl_event_t *e)
         /* truncated message */
         mpi_errno = handler_send_complete(e);
         if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-    } else {
-        REQ_PTL(sreq)->event_handler = handler_send_complete;
+    } else if (e->type == PTL_EVENT_GET) {
+        /* decrement the remaining get operations */
+        if (--REQ_PTL(sreq)->num_gets == 0)
+            mpi_errno = handler_send_complete(e);
     }
 
  fn_exit:
@@ -382,7 +381,8 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
                 me.min_free = 0;
 
                 MPIU_CHKPMEM_MALLOC(REQ_PTL(sreq)->get_me_p, ptl_handle_me_t *, sizeof(ptl_handle_me_t), mpi_errno, "get_me_p");
-                        
+
+                REQ_PTL(sreq)->num_gets = 1;
                 ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, sreq,
                                   &REQ_PTL(sreq)->get_me_p[0]);
                 MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));

http://git.mpich.org/mpich.git/commitdiff/f20134719cf1621e9b9dc71507f53dc61120ba9b

commit f20134719cf1621e9b9dc71507f53dc61120ba9b
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Thu Oct 23 10:54:13 2014 -0500

    add test for large non-contiguous datatype
    
    This is a useful test case for netmods that use packing and/or break
    large messages into smaller chunks.
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>
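
To see why this test stresses packing and chunking, it helps to work out the sizes involved. The sketch below assumes an 8-byte `long long` and uses the usual extent formula for a vector type with positive stride and no explicit bounds; the helper names are hypothetical and not part of the test suite.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of actual data carried by a vector type:
 * count blocks of blocklen elements each. */
static int64_t sketch_vector_size(int64_t count, int64_t blocklen, int64_t elem_bytes)
{
    return count * blocklen * elem_bytes;
}

/* Span from the first byte of the first block to the last byte of the
 * last block, with stride measured in elements (assumed positive). */
static int64_t sketch_vector_extent(int64_t count, int64_t blocklen,
                                    int64_t stride, int64_t elem_bytes)
{
    assert(count > 0 && stride >= blocklen);
    return ((count - 1) * stride + blocklen) * elem_bytes;
}
```

With the test's parameters (135,000,000 blocks of one element, stride 2), the type carries 1,080,000,000 bytes of data spread over an extent of 2,159,999,992 bytes, which is past the range of a signed 32-bit integer.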

diff --git a/test/mpi/datatype/Makefile.am b/test/mpi/datatype/Makefile.am
index 03c13b5..2840393 100644
--- a/test/mpi/datatype/Makefile.am
+++ b/test/mpi/datatype/Makefile.am
@@ -39,6 +39,7 @@ noinst_PROGRAMS =           \
     large-count             \
     large_type              \
     large_type_sendrec      \
+    large_vec	            \
     lbub                    \
     localpack               \
     longdouble              \
diff --git a/test/mpi/datatype/large_vec.c b/test/mpi/datatype/large_vec.c
new file mode 100644
index 0000000..0b7eafb
--- /dev/null
+++ b/test/mpi/datatype/large_vec.c
@@ -0,0 +1,81 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *  (C) 2014 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+#include <mpi.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include "mpitest.h"
+
+/* tests non-contig send/recv of a message > 2GB. count=270M, type=long long
+   run with 3 processes to exercise both shared memory and TCP in Nemesis tests*/
+
+int main(int argc, char *argv[])
+{
+    int ierr, i, size, rank;
+    int elems = 270000000;
+    MPI_Status status;
+    MPI_Datatype dtype;
+    long long *cols;
+    int errs = 0;
+
+
+    MTest_Init(&argc, &argv);
+
+    /* need large memory */
+    if (sizeof(void *) < 8) {
+        MTest_Finalize(errs);
+        MPI_Finalize();
+        return 0;
+    }
+
+    ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
+    ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
+    if (size != 3) {
+        fprintf(stderr, "[%d] usage: mpiexec -n 3 %s\n", rank, argv[0]);
+        MPI_Abort(MPI_COMM_WORLD, 1);
+    }
+
+    cols = malloc(elems * sizeof(long long));
+    if (cols == NULL) {
+        printf("malloc of >2GB array failed\n");
+        errs++;
+        MTest_Finalize(errs);
+        MPI_Finalize();
+        return 0;
+    }
+
+    MPI_Type_vector(elems / 2, 1, 2, MPI_LONG_LONG_INT, &dtype);
+    MPI_Type_commit(&dtype);
+
+    if (rank == 0) {
+        for (i = 0; i < elems; i++)
+            cols[i] = i;
+        /* printf("[%d] sending...\n",rank); */
+        ierr = MPI_Send(cols, 1, dtype, 1, 0, MPI_COMM_WORLD);
+        ierr = MPI_Send(cols, 1, dtype, 2, 0, MPI_COMM_WORLD);
+    }
+    else {
+        /* printf("[%d] receiving...\n",rank); */
+        for (i = 0; i < elems; i++)
+            cols[i] = -1;
+        ierr = MPI_Recv(cols, 1, dtype, 0, 0, MPI_COMM_WORLD, &status);
+        /* ierr = MPI_Get_count(&status,MPI_LONG_LONG_INT,&cnt);
+         * Get_count still fails because count is not 64 bit */
+        for (i = 0; i < elems; i++) {
+            if (i % 2)
+                continue;
+            if (cols[i] != i) {
+                printf("Rank %d, cols[i]=%lld, should be %d\n", rank, cols[i], i);
+                errs++;
+            }
+        }
+    }
+
+    MPI_Type_free(&dtype);
+
+    MTest_Finalize(errs);
+    MPI_Finalize();
+    return 0;
+}
diff --git a/test/mpi/datatype/testlist.in b/test/mpi/datatype/testlist.in
index 549c3eb..7fee6a4 100644
--- a/test/mpi/datatype/testlist.in
+++ b/test/mpi/datatype/testlist.in
@@ -59,3 +59,4 @@ cxx-types 1 mpiversion=3.0
 @largetest@large_type 1 mpiversion=3.0
 @largetest@large_type_sendrec 2 arg=31 mpiversion=3.0
 @largetest@large_type_sendrec 2 arg=32 mpiversion=3.0 timeLimit=360
+@largetest@large_vec 3 mpiversion=3.0

http://git.mpich.org/mpich.git/commitdiff/bbc93f88e3d0d18cc498e1ad278982feb5fe39ce

commit bbc93f88e3d0d18cc498e1ad278982feb5fe39ce
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Fri Oct 24 14:35:27 2014 -0500

    portals4: use helper function for big sends
    
    Move some duplicate code for posting multiple get operations into a
    dedicated helper function.
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>
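
The loop that the helper factors out walks a buffer in pieces no larger than the interface's message limit. A stripped-down sketch of that walk is shown here, with no Portals calls; `sketch_split` and its parameters are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Walk a buffer of `total` bytes in pieces of at most `max_chunk` bytes.
 * Records up to `cap` (offset, length) pairs and returns the number of
 * pieces the whole buffer needs. */
static size_t sketch_split(size_t total, size_t max_chunk,
                           size_t *offsets, size_t *lengths, size_t cap)
{
    size_t n = 0, off = 0;

    assert(max_chunk > 0);
    while (total > 0) {
        size_t len = total > max_chunk ? max_chunk : total;
        if (n < cap) {
            offsets[n] = off;
            lengths[n] = len;
        }
        /* account for what has been described so far */
        off += len;
        total -= len;
        n++;
    }
    return n;
}
```

The handle array in the patch is sized as `(left_to_send / max_msg_size) + 1`, which is always at least the number of pieces this walk produces, and one more than needed when the length is an exact multiple of the limit.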

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
index 95e244e..152ff88 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
@@ -8,6 +8,54 @@
 #include "rptl.h"
 
 #undef FUNCNAME
+#define FUNCNAME big_meappend
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+static void big_meappend(void *buf, ptl_size_t left_to_send, MPIDI_VC_t *vc, ptl_match_bits_t match_bits, MPID_Request *sreq)
+{
+    void * user_ptr = NULL;
+    int i, ret;
+    MPID_nem_ptl_vc_area *vc_ptl;
+    ptl_me_t me;
+
+    vc_ptl = VC_PTL(vc);
+
+    me.start = buf;
+    me.ct_handle = PTL_CT_NONE;
+    me.uid = PTL_UID_ANY;
+    me.options = ( PTL_ME_OP_PUT | PTL_ME_OP_GET | PTL_ME_USE_ONCE | PTL_ME_IS_ACCESSIBLE | PTL_ME_EVENT_LINK_DISABLE |
+                   PTL_ME_EVENT_UNLINK_DISABLE );
+    me.match_id = vc_ptl->id;
+    me.match_bits = match_bits;
+    me.ignore_bits = 0;
+    me.min_free = 0;
+
+    /* allocate enough handles to cover all get operations */
+    REQ_PTL(sreq)->get_me_p = MPIU_Malloc(sizeof(ptl_handle_me_t) *
+                                        ((left_to_send / MPIDI_nem_ptl_ni_limits.max_msg_size) + 1));
+
+    /* queue up as many entries as necessary to describe the entire message */
+    for (i = 0; left_to_send > 0; i++) {
+        /* send up to the maximum allowed by the portals interface */
+        if (left_to_send > MPIDI_nem_ptl_ni_limits.max_msg_size)
+            me.length = MPIDI_nem_ptl_ni_limits.max_msg_size;
+        else {
+            me.length = left_to_send;
+            /* attach the request to the final operation */
+            user_ptr = sreq;
+        }
+
+        ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, user_ptr, &REQ_PTL(sreq)->get_me_p[i]);
+        DBG_MSG_MEAPPEND("CTL", vc->pg_rank, me, sreq);
+        MPIU_Assert(ret == 0);
+
+        /* account for what has been sent */
+        me.start = (char *)me.start + me.length;
+        left_to_send -= me.length;
+    }
+}
+
+#undef FUNCNAME
 #define FUNCNAME handler_send_complete
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
@@ -277,47 +325,9 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
     /* Large message.  Send first chunk of data and let receiver get the rest */
     if (dt_contig) {
         /* create ME for buffer so receiver can issue a GET for the data */
-        ptl_size_t left_to_send;
-        void * user_ptr = NULL;
-        int i;
-
         MPIU_DBG_MSG(CH3_CHANNEL, VERBOSE, "Large contig message");
-        me.start = (char *)buf + dt_true_lb + PTL_LARGE_THRESHOLD;
-        left_to_send = data_sz - PTL_LARGE_THRESHOLD;
-        me.ct_handle = PTL_CT_NONE;
-        me.uid = PTL_UID_ANY;
-        me.options = ( PTL_ME_OP_PUT | PTL_ME_OP_GET | PTL_ME_USE_ONCE | PTL_ME_IS_ACCESSIBLE | PTL_ME_EVENT_LINK_DISABLE |
-                       PTL_ME_EVENT_UNLINK_DISABLE );
-        me.match_id = vc_ptl->id;
-        me.match_bits = NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank);
-        me.ignore_bits = 0;
-        me.min_free = 0;
-
-        /* allocate enough handles to cover all get operations */
-        MPIU_CHKPMEM_MALLOC(REQ_PTL(sreq)->get_me, ptl_handle_me_t *,
-                            sizeof(ptl_handle_me_t) * ((left_to_send / MPIDI_nem_ptl_ni_limits.max_msg_size) + 1),
-                            mpi_errno, "get_me");
-
-        /* queue up as many entries as necessary to describe the entire message */
-        for (i = 0; left_to_send > 0; i++) {
-            /* send up to the maximum allowed by the portals interface */
-            if (left_to_send > MPIDI_nem_ptl_ni_limits.max_msg_size)
-                me.length = MPIDI_nem_ptl_ni_limits.max_msg_size;
-            else {
-                me.length = left_to_send;
-                /* attach the request to the final operation */
-                user_ptr = sreq;
-            }
-
-            ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, user_ptr, &REQ_PTL(sreq)->get_me[i]);
-            MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));
-            DBG_MSG_MEAPPEND("CTL", vc->pg_rank, me, sreq);
-
-            /* account for what has been sent */
-            me.start = (char *)me.start + me.length;
-            left_to_send -= me.length;
-        }
-
+        big_meappend((char *)buf + dt_true_lb + PTL_LARGE_THRESHOLD, data_sz - PTL_LARGE_THRESHOLD, vc,
+                     NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), sreq);
         REQ_PTL(sreq)->large = TRUE;
 
         REQ_PTL(sreq)->event_handler = handler_large;
@@ -402,11 +412,6 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
         /* Don't handle this case separately */
     }
 
-    /* same code as large contig */
-    ptl_size_t left_to_send;
-    void * user_ptr = NULL;
-    int i;
-
     /* allocate a temporary buffer and copy all the data to send */
     MPIU_CHKPMEM_MALLOC(REQ_PTL(sreq)->chunk_buffer[0], void *, data_sz, mpi_errno, "tmpbuf");
 
@@ -414,42 +419,8 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
     MPID_Segment_pack(sreq->dev.segment_ptr, 0, &last, REQ_PTL(sreq)->chunk_buffer[0]);
     MPIU_Assert(last == data_sz);
 
-    me.start = (char *)REQ_PTL(sreq)->chunk_buffer[0] + PTL_LARGE_THRESHOLD;
-    left_to_send = data_sz - PTL_LARGE_THRESHOLD;
-    me.ct_handle = PTL_CT_NONE;
-    me.uid = PTL_UID_ANY;
-    me.options = ( PTL_ME_OP_PUT | PTL_ME_OP_GET | PTL_ME_USE_ONCE | PTL_ME_IS_ACCESSIBLE | PTL_ME_EVENT_LINK_DISABLE |
-                   PTL_ME_EVENT_UNLINK_DISABLE );
-    me.match_id = vc_ptl->id;
-    me.match_bits = NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank);
-    me.ignore_bits = 0;
-    me.min_free = 0;
-
-    /* allocate enough handles to cover all get operations */
-    MPIU_CHKPMEM_MALLOC(REQ_PTL(sreq)->get_me, ptl_handle_me_t *,
-                        sizeof(ptl_handle_me_t) * ((left_to_send / MPIDI_nem_ptl_ni_limits.max_msg_size) + 1),
-                        mpi_errno, "get_me");
-
-    /* queue up as many entries as necessary to describe the entire message */
-    for (i = 0; left_to_send > 0; i++) {
-        /* send up to the maximum allowed by the portals interface */
-        if (left_to_send > MPIDI_nem_ptl_ni_limits.max_msg_size)
-            me.length = MPIDI_nem_ptl_ni_limits.max_msg_size;
-        else {
-            me.length = left_to_send;
-            /* attach the request to the final operation */
-            user_ptr = sreq;
-        }
-
-        ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, user_ptr, &REQ_PTL(sreq)->get_me[i]);
-        MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));
-        DBG_MSG_MEAPPEND("CTL", vc->pg_rank, me, sreq);
-
-        /* account for what has been sent */
-        me.start = (char *)me.start + me.length;
-        left_to_send -= me.length;
-    }
-
+    big_meappend((char *)REQ_PTL(sreq)->chunk_buffer[0] + PTL_LARGE_THRESHOLD, data_sz - PTL_LARGE_THRESHOLD, vc,
+                 NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), sreq);
     REQ_PTL(sreq)->large = TRUE;
 
     REQ_PTL(sreq)->event_handler = handler_large;

http://git.mpich.org/mpich.git/commitdiff/632cacf4284749e39f4b65b9d0b02291f70e2f1b

commit 632cacf4284749e39f4b65b9d0b02291f70e2f1b
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Thu Oct 23 10:53:45 2014 -0500

    portals4: add support for large non-contig
    
    Large messages (either larger than max_msg_size or needing more than
    MPID_IOV_LIMIT IOV entries) will be packed into a temporary buffer. This
    path still needs to be optimized.
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>
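
The temporary-buffer approach amounts to gathering the non-contiguous elements into one contiguous region before transfer and scattering them back afterwards. The sketch below illustrates that idea for a simple strided layout. It stands in for what `MPID_Segment_pack` and `MPID_Segment_unpack` do for general datatypes, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Gather `count` elements, `stride` elements apart, into a contiguous buffer. */
static void sketch_pack(const long long *src, size_t count, size_t stride,
                        long long *tmp)
{
    size_t i;

    assert(stride > 0);
    for (i = 0; i < count; i++)
        tmp[i] = src[i * stride];
}

/* Scatter a contiguous buffer back into a strided destination,
 * leaving the gaps between elements untouched. */
static void sketch_unpack(const long long *tmp, size_t count, size_t stride,
                          long long *dst)
{
    size_t i;

    assert(stride > 0);
    for (i = 0; i < count; i++)
        dst[i * stride] = tmp[i];
}
```

The cost of this scheme is an extra copy and a buffer as large as the packed data on each side, which is presumably what the log message has in mind when it says the path needs optimizing.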

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
index 10c66f1..ec599f2 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
@@ -44,6 +44,7 @@ typedef struct {
     ptl_handle_md_t md;
     ptl_handle_me_t put_me;
     ptl_handle_me_t *get_me_p;
+    ptl_size_t chunk_offset;
     void *chunk_buffer[MPID_NEM_PTL_NUM_CHUNK_BUFFERS];
     MPIDI_msg_sz_t bytes_put;
     int found; /* used in probes with PtlMESearch() */
@@ -65,6 +66,7 @@ typedef struct {
         REQ_PTL(req_)->put_me        = PTL_INVALID_HANDLE;      \
         REQ_PTL(req_)->get_me_p      = NULL;                    \
         REQ_PTL(req_)->event_handler = NULL;                    \
+        REQ_PTL(req_)->chunk_offset  = 0;                       \
     } while (0)
 
 #define MPID_nem_ptl_request_create_sreq(sreq_, errno_, comm_) do {                                             \
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
index 2f4345d..80576e7 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
@@ -122,6 +122,78 @@ static int handler_recv_dequeue_complete(const ptl_event_t *e)
 }
 
 #undef FUNCNAME
+#define FUNCNAME handler_recv_unpack
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+static int handler_recv_unpack(const ptl_event_t *e)
+{
+    int mpi_errno = MPI_SUCCESS;
+    MPID_Request *const rreq = e->user_ptr;
+    MPI_Aint last;
+
+    MPIDI_STATE_DECL(MPID_STATE_HANDLER_RECV_UNPACK);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_HANDLER_RECV_UNPACK);
+
+    MPIU_Assert(e->type == PTL_EVENT_REPLY);
+
+    last = rreq->dev.segment_size;
+    MPID_Segment_unpack(rreq->dev.segment_ptr, rreq->dev.segment_first, &last,
+                        (char *)REQ_PTL(rreq)->chunk_buffer[0] + REQ_PTL(rreq)->chunk_offset);
+
+    rreq->dev.segment_first += e->mlength;
+    REQ_PTL(rreq)->chunk_offset += e->mlength;
+    if (rreq->dev.segment_first == rreq->dev.segment_size)
+        mpi_errno = handler_recv_complete(e);
+
+    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+
+ fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_HANDLER_RECV_UNPACK);
+    return mpi_errno;
+ fn_fail:
+    goto fn_exit;
+}
+
+#undef FUNCNAME
+#define FUNCNAME big_get
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+static void big_get(void *buf, ptl_size_t left_to_get, MPIDI_VC_t *vc, ptl_match_bits_t match_bits, MPID_Request *rreq)
+{
+    int ret;
+    MPID_nem_ptl_vc_area *vc_ptl;
+    ptl_size_t start, get_sz;
+    void *user_ptr = NULL;
+
+    vc_ptl = VC_PTL(vc);
+    start = (ptl_size_t)buf;
+
+    /* we need to handle all event if we are unpacking from the chunk_buffer */
+    if (REQ_PTL(rreq)->event_handler == handler_recv_unpack)
+        user_ptr = rreq;
+
+    while (left_to_get > 0) {
+        /* get up to the maximum allowed by the portals interface */
+        if (left_to_get > MPIDI_nem_ptl_ni_limits.max_msg_size) {
+            get_sz = MPIDI_nem_ptl_ni_limits.max_msg_size;
+        } else {
+            get_sz = left_to_get;
+            /* attach the request to the final operation */
+            user_ptr = rreq;
+        }
+        ret = MPID_nem_ptl_rptl_get(MPIDI_nem_ptl_global_md, start, get_sz, vc_ptl->id, vc_ptl->ptg, match_bits, 0, user_ptr);
+        DBG_MSG_GET("global", get_sz, vc->pg_rank, match_bits);
+        MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "   buf=%p", (char *)start);
+        MPIU_Assert(ret == 0);
+
+        /* account for what has been sent */
+        start += get_sz;
+        left_to_get -= get_sz;
+    }
+}
+
+#undef FUNCNAME
 #define FUNCNAME handler_recv_unpack_complete
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
@@ -232,33 +304,10 @@ static int handler_recv_dequeue_large(const ptl_event_t *e)
 
     /* we need to GET the rest of the data from the sender's buffer */
     if (dt_contig) {
-        /* recv buffer is contig */
-        ptl_size_t start, left_to_get, get_sz;
-        void * user_ptr = NULL;
-
         REQ_PTL(rreq)->event_handler = handler_recv_complete;
 
-        start = (ptl_size_t)((char *)rreq->dev.user_buf + dt_true_lb + PTL_LARGE_THRESHOLD);
-        left_to_get = data_sz - PTL_LARGE_THRESHOLD;
-
-        while (left_to_get > 0) {
-            /* get up to the maximum allowed by the portals interface */
-            if (left_to_get > MPIDI_nem_ptl_ni_limits.max_msg_size) {
-                get_sz = MPIDI_nem_ptl_ni_limits.max_msg_size;
-            } else {
-                get_sz = left_to_get;
-                /* attach the request to the final operation */
-                user_ptr = rreq;
-            }
-            ret = MPID_nem_ptl_rptl_get(MPIDI_nem_ptl_global_md, start, get_sz, vc_ptl->id, vc_ptl->ptg, e->match_bits, 0, user_ptr);
-            DBG_MSG_GET("global", get_sz, vc->pg_rank, e->match_bits);
-            MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "   buf=%p", (char *)start);
-            MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlget", "**ptlget %s", MPID_nem_ptl_strerror(ret));
-
-            /* account for what has been sent */
-            start += get_sz;
-            left_to_get -= get_sz;
-        }
+        big_get((char *)rreq->dev.user_buf + dt_true_lb + PTL_LARGE_THRESHOLD, data_sz - PTL_LARGE_THRESHOLD,
+                vc, e->match_bits, rreq);
         goto fn_exit;
     }
 
@@ -268,7 +317,7 @@ static int handler_recv_dequeue_large(const ptl_event_t *e)
     rreq->dev.iov_count = MPID_IOV_LIMIT;
     MPID_Segment_pack_vector(rreq->dev.segment_ptr, rreq->dev.segment_first, &last, rreq->dev.iov, &rreq->dev.iov_count);
 
-    if (last == rreq->dev.segment_size) {
+    if (last == rreq->dev.segment_size && rreq->dev.segment_size <= MPIDI_nem_ptl_ni_limits.max_msg_size + PTL_LARGE_THRESHOLD) {
         /* Rest of message fits in one IOV */
         ptl_md_t md;
 
@@ -289,12 +338,10 @@ static int handler_recv_dequeue_large(const ptl_event_t *e)
         
     /* message won't fit in a single IOV, allocate buffer and unpack when received */
     /* FIXME: For now, allocate a single large buffer to hold entire message */
-    MPIU_CHKPMEM_MALLOC(REQ_PTL(rreq)->chunk_buffer[0], void *, rreq->dev.segment_size - rreq->dev.segment_first, mpi_errno, "chunk_buffer");
-
-    REQ_PTL(rreq)->event_handler = handler_recv_unpack_complete;
-    ret = MPID_nem_ptl_rptl_get(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(rreq)->chunk_buffer[0],
-                 rreq->dev.segment_size - rreq->dev.segment_first, vc_ptl->id, vc_ptl->ptg, e->match_bits, 0, rreq);
-    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlget", "**ptlget %s", MPID_nem_ptl_strerror(ret));
+    MPIU_CHKPMEM_MALLOC(REQ_PTL(rreq)->chunk_buffer[0], void *, rreq->dev.segment_size - rreq->dev.segment_first,
+                        mpi_errno, "chunk_buffer");
+    REQ_PTL(rreq)->event_handler = handler_recv_unpack;
+    big_get(REQ_PTL(rreq)->chunk_buffer[0], rreq->dev.segment_size - rreq->dev.segment_first, vc, e->match_bits, rreq);
 
  fn_exit:
     MPIU_CHKPMEM_COMMIT();
@@ -316,8 +363,7 @@ static int handler_recv_dequeue_unpack_large(const ptl_event_t *e)
     int mpi_errno = MPI_SUCCESS;
     MPID_Request *const rreq = e->user_ptr;
     MPIDI_VC_t *vc;
-    MPID_nem_ptl_vc_area *vc_ptl;
-    int ret;
+    MPI_Aint last;
     void *buf;
     MPIU_CHKPMEM_DECL(1);
     MPIDI_STATE_DECL(MPID_STATE_HANDLER_RECV_DEQUEUE_UNPACK_LARGE);
@@ -326,7 +372,6 @@ static int handler_recv_dequeue_unpack_large(const ptl_event_t *e)
     MPIU_Assert(e->type == PTL_EVENT_PUT || e->type == PTL_EVENT_PUT_OVERFLOW);
 
     MPIDI_Comm_get_vc(rreq->comm, NPTL_MATCH_GET_RANK(e->match_bits), &vc);
-    vc_ptl = VC_PTL(vc);
 
     dequeue_req(e);
 
@@ -343,18 +388,16 @@ static int handler_recv_dequeue_unpack_large(const ptl_event_t *e)
         buf = REQ_PTL(rreq)->chunk_buffer[0];
 
     MPIU_Assert(e->mlength == PTL_LARGE_THRESHOLD);
-    mpi_errno = MPID_nem_ptl_unpack_byte(rreq->dev.segment_ptr, rreq->dev.segment_first, PTL_LARGE_THRESHOLD,
-                                         buf, &REQ_PTL(rreq)->overflow[0]);
-    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+    last = PTL_LARGE_THRESHOLD;
+    MPID_Segment_unpack(rreq->dev.segment_ptr, rreq->dev.segment_first, &last, buf);
+    MPIU_Assert(last == PTL_LARGE_THRESHOLD);
     rreq->dev.segment_first += PTL_LARGE_THRESHOLD;
     MPIU_Free(REQ_PTL(rreq)->chunk_buffer[0]);
 
-    MPIU_CHKPMEM_MALLOC(REQ_PTL(rreq)->chunk_buffer[0], void *, rreq->dev.segment_size - rreq->dev.segment_first, mpi_errno, "chunk_buffer");
-
-    REQ_PTL(rreq)->event_handler = handler_recv_unpack_complete;
-    ret = MPID_nem_ptl_rptl_get(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(rreq)->chunk_buffer[0],
-                 rreq->dev.segment_size - rreq->dev.segment_first, vc_ptl->id, vc_ptl->ptg, e->match_bits, 0, rreq);
-    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlget", "**ptlget %s", MPID_nem_ptl_strerror(ret));
+    MPIU_CHKPMEM_MALLOC(REQ_PTL(rreq)->chunk_buffer[0], void *, rreq->dev.segment_size - rreq->dev.segment_first,
+                        mpi_errno, "chunk_buffer");
+    REQ_PTL(rreq)->event_handler = handler_recv_unpack;
+    big_get(REQ_PTL(rreq)->chunk_buffer[0], rreq->dev.segment_size - rreq->dev.segment_first, vc, e->match_bits, rreq);
 
  fn_exit:
     MPIU_CHKPMEM_COMMIT();
@@ -666,12 +709,8 @@ int MPID_nem_ptl_lmt_start_recv(MPIDI_VC_t *vc,  MPID_Request *rreq, MPID_IOV s_
         void * real_user_buf = (char *)rreq->dev.user_buf + dt_true_lb;
 
         REQ_PTL(rreq)->event_handler = handler_recv_complete;
-        ret = MPID_nem_ptl_rptl_get(MPIDI_nem_ptl_global_md, (ptl_size_t)((char *)real_user_buf + PTL_LARGE_THRESHOLD),
-                     data_sz - PTL_LARGE_THRESHOLD, vc_ptl->id, vc_ptl->ptg, match_bits, 0, rreq);
-        DBG_MSG_GET("global", data_sz - PTL_LARGE_THRESHOLD, vc->pg_rank, match_bits);
-        MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "   buf=%p", (char *)real_user_buf + PTL_LARGE_THRESHOLD);
-        MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlget", "**ptlget %s",
-                             MPID_nem_ptl_strerror(ret));
+        big_get((char *)real_user_buf + PTL_LARGE_THRESHOLD, data_sz - PTL_LARGE_THRESHOLD, vc, match_bits, rreq);
+
         /* The memcpy is done after the get purposely for overlapping */
         MPIU_Memcpy(real_user_buf, rreq->dev.tmpbuf, PTL_LARGE_THRESHOLD);
     }
@@ -684,16 +723,16 @@ int MPID_nem_ptl_lmt_start_recv(MPIDI_VC_t *vc,  MPID_Request *rreq, MPID_IOV s_
         MPID_Segment_init(rreq->dev.user_buf, rreq->dev.user_count, rreq->dev.datatype,
                           rreq->dev.segment_ptr, 0);
         rreq->dev.segment_first = 0;
-        rreq->dev.segment_size = data_sz - PTL_LARGE_THRESHOLD;
+        rreq->dev.segment_size = data_sz;
         last = PTL_LARGE_THRESHOLD;
         MPID_Segment_unpack(rreq->dev.segment_ptr, rreq->dev.segment_first, &last, rreq->dev.tmpbuf);
         MPIU_Assert(last == PTL_LARGE_THRESHOLD);
         rreq->dev.segment_first = PTL_LARGE_THRESHOLD;
-        last = data_sz - PTL_LARGE_THRESHOLD;
+        last = rreq->dev.segment_size;
         rreq->dev.iov_count = MPID_IOV_LIMIT;
         MPID_Segment_pack_vector(rreq->dev.segment_ptr, rreq->dev.segment_first, &last, rreq->dev.iov,
                                  &rreq->dev.iov_count);
-        if (last == rreq->dev.segment_size) {
+        if (last == rreq->dev.segment_size && last <= MPIDI_nem_ptl_ni_limits.max_msg_size + PTL_LARGE_THRESHOLD) {
             /* Rest of message fits in one IOV */
             ptl_md_t md;
 
@@ -707,22 +746,18 @@ int MPID_nem_ptl_lmt_start_recv(MPIDI_VC_t *vc,  MPID_Request *rreq, MPID_IOV s_
                                  MPID_nem_ptl_strerror(ret));
 
             REQ_PTL(rreq)->event_handler = handler_recv_complete;
-            ret = MPID_nem_ptl_rptl_get(REQ_PTL(rreq)->md, 0, rreq->dev.segment_size, vc_ptl->id, vc_ptl->ptg,
-                         match_bits, PTL_LARGE_THRESHOLD, rreq);
+            ret = MPID_nem_ptl_rptl_get(REQ_PTL(rreq)->md, 0, rreq->dev.segment_size - rreq->dev.segment_first,
+                                        vc_ptl->id, vc_ptl->ptg, match_bits, 0, rreq);
             MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlget", "**ptlget %s",
                                  MPID_nem_ptl_strerror(ret));
         }
         else {
             /* message won't fit in a single IOV, allocate buffer and unpack when received */
             /* FIXME: For now, allocate a single large buffer to hold entire message */
-            MPIU_CHKPMEM_MALLOC(REQ_PTL(rreq)->chunk_buffer[0], void *, rreq->dev.segment_size,
+            MPIU_CHKPMEM_MALLOC(REQ_PTL(rreq)->chunk_buffer[0], void *, rreq->dev.segment_size - rreq->dev.segment_first,
                                 mpi_errno, "chunk_buffer");
-            REQ_PTL(rreq)->event_handler = handler_recv_unpack_complete;
-            ret = MPID_nem_ptl_rptl_get(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(rreq)->chunk_buffer[0],
-                         rreq->dev.segment_size, vc_ptl->id, vc_ptl->ptg, match_bits,
-                         PTL_LARGE_THRESHOLD, rreq);
-            MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlget", "**ptlget %s",
-                                 MPID_nem_ptl_strerror(ret));
+            REQ_PTL(rreq)->event_handler = handler_recv_unpack;
+            big_get(REQ_PTL(rreq)->chunk_buffer[0], rreq->dev.segment_size - rreq->dev.segment_first, vc, match_bits, rreq);
         }
     }
     MPIU_Free(rreq->dev.tmpbuf);
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
index d2cee52..95e244e 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
@@ -317,9 +317,9 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
             me.start = (char *)me.start + me.length;
             left_to_send -= me.length;
         }
-        
+
         REQ_PTL(sreq)->large = TRUE;
-            
+
         REQ_PTL(sreq)->event_handler = handler_large;
         ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)((char *)buf + dt_true_lb), PTL_LARGE_THRESHOLD, PTL_ACK_REQ, vc_ptl->id, vc_ptl->pt,
                      NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
@@ -356,7 +356,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
                                      &sreq->dev.iov[initial_iov_count], &sreq->dev.iov_count);
             remaining_iov_count = sreq->dev.iov_count;
 
-            if (last == sreq->dev.segment_size) {
+            if (last == sreq->dev.segment_size && last <= MPIDI_nem_ptl_ni_limits.max_msg_size + PTL_LARGE_THRESHOLD) {
                 /* Entire message fit in one IOV */
                 MPIU_DBG_MSG(CH3_CHANNEL, VERBOSE, "    rest of message fits in one IOV");
                 /* Create ME for remaining data */
@@ -388,7 +388,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
                 MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmdbind", "**ptlmdbind %s", MPID_nem_ptl_strerror(ret));
 
                 REQ_PTL(sreq)->large = TRUE;
-                        
+
                 REQ_PTL(sreq)->event_handler = handler_large;
                 ret = MPID_nem_ptl_rptl_put(REQ_PTL(sreq)->md, 0, PTL_LARGE_THRESHOLD, PTL_ACK_REQ, vc_ptl->id, vc_ptl->pt,
                              NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
@@ -402,16 +402,20 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
         /* Don't handle this case separately */
     }
 
-    /* Message doesn't fit in IOV, pack into buffers */
-    MPIU_DBG_MSG(CH3_CHANNEL, VERBOSE, "    Message doesn't fit in IOV: use bounce buffer");
+    /* same code as large contig */
+    ptl_size_t left_to_send;
+    void * user_ptr = NULL;
+    int i;
+
+    /* allocate a temporary buffer and copy all the data to send */
+    MPIU_CHKPMEM_MALLOC(REQ_PTL(sreq)->chunk_buffer[0], void *, data_sz, mpi_errno, "tmpbuf");
 
-    /* FIXME: For now, allocate a single large buffer to hold entire message */
-    MPIU_CHKPMEM_MALLOC(REQ_PTL(sreq)->chunk_buffer[0], void *, data_sz, mpi_errno, "chunk_buffer");
-    MPI_nem_ptl_pack_byte(sreq->dev.segment_ptr, 0, data_sz, REQ_PTL(sreq)->chunk_buffer[0], &REQ_PTL(sreq)->overflow[0]);
+    last = data_sz;
+    MPID_Segment_pack(sreq->dev.segment_ptr, 0, &last, REQ_PTL(sreq)->chunk_buffer[0]);
+    MPIU_Assert(last == data_sz);
 
-    /* create ME for buffer so receiver can issue a GET for the data */
     me.start = (char *)REQ_PTL(sreq)->chunk_buffer[0] + PTL_LARGE_THRESHOLD;
-    me.length = data_sz - PTL_LARGE_THRESHOLD;
+    left_to_send = data_sz - PTL_LARGE_THRESHOLD;
     me.ct_handle = PTL_CT_NONE;
     me.uid = PTL_UID_ANY;
     me.options = ( PTL_ME_OP_PUT | PTL_ME_OP_GET | PTL_ME_USE_ONCE | PTL_ME_IS_ACCESSIBLE | PTL_ME_EVENT_LINK_DISABLE |
@@ -421,61 +425,39 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
     me.ignore_bits = 0;
     me.min_free = 0;
 
-    MPIU_CHKPMEM_MALLOC(REQ_PTL(sreq)->get_me, ptl_handle_me_t *, sizeof(ptl_handle_me_t), mpi_errno, "get_me");
+    /* allocate enough handles to cover all get operations */
+    MPIU_CHKPMEM_MALLOC(REQ_PTL(sreq)->get_me, ptl_handle_me_t *,
+                        sizeof(ptl_handle_me_t) * ((left_to_send / MPIDI_nem_ptl_ni_limits.max_msg_size) + 1),
+                        mpi_errno, "get_me");
+
+    /* queue up as many entries as necessary to describe the entire message */
+    for (i = 0; left_to_send > 0; i++) {
+        /* send up to the maximum allowed by the portals interface */
+        if (left_to_send > MPIDI_nem_ptl_ni_limits.max_msg_size)
+            me.length = MPIDI_nem_ptl_ni_limits.max_msg_size;
+        else {
+            me.length = left_to_send;
+            /* attach the request to the final operation */
+            user_ptr = sreq;
+        }
+
+        ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, user_ptr, &REQ_PTL(sreq)->get_me[i]);
+        MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));
+        DBG_MSG_MEAPPEND("CTL", vc->pg_rank, me, sreq);
 
-    DBG_MSG_MEAPPEND("CTL", vc->pg_rank, me, sreq);
-    ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, sreq, &REQ_PTL(sreq)->get_me[0]);
-    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));
+        /* account for what has been sent */
+        me.start = (char *)me.start + me.length;
+        left_to_send -= me.length;
+    }
 
     REQ_PTL(sreq)->large = TRUE;
-    
+
     REQ_PTL(sreq)->event_handler = handler_large;
-    ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(sreq)->chunk_buffer[0], PTL_LARGE_THRESHOLD, PTL_ACK_REQ,
-                 vc_ptl->id, vc_ptl->pt, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
-                                NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz), 1);
+    ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(sreq)->chunk_buffer[0], PTL_LARGE_THRESHOLD,
+                                PTL_ACK_REQ, vc_ptl->id, vc_ptl->pt, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank),
+                                0, sreq, NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz), 1);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
     DBG_MSG_PUT("global", PTL_LARGE_THRESHOLD, vc->pg_rank, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz));
-    goto fn_exit;
-
-#if 0
-    sreq->dev.segment_first = 0;
-
-    /* Pack first chunk of message */
-    MPIU_CHKPMEM_MALLOC(req_PTL(sreq_)->chunk_buffer, void *, PTL_LARGE_THRESHOLD, mpi_errno, "chunk_buffer");
-    MPI_nem_ptl_pack_byte(sreq->dev.segment_ptr, 0, PTL_LARGE_THRESHOLD, REQ_PTL(sreq_)->chunk_buffer[0],
-              &REQ_PTL(sreq)->overflow[0]);
-    sreq->dev.segment_first = PTL_LARGE_THRESHOLD;
-            
-    /* Pack second chunk of message */
-    MPIU_CHKPMEM_MALLOC(req_PTL(sreq_)->chunk_buffer, void *, PTL_LARGE_THRESHOLD, mpi_errno, "chunk_buffer");
-    MPI_nem_ptl_pack_byte(sreq->dev.segment_ptr, sreq->dev.segment_first, sreq->dev.segment_first + PTL_LARGE_THRESHOLD,
-              REQ_PTL(sreq_)->chunk_buffer[1], &REQ_PTL(sreq)->overflow[1]);
-    sreq->dev.segment_first += PTL_LARGE_THRESHOLD;
-
-    /* create ME for second chunk */
-    me.start = REQ_PTL(sreq_)->chunk_buffer[1];
-    me.length = PTL_LARGE_THRESHOLD;
-    me.ct_handle = PTL_CT_NONE;
-    me.uid = PTL_UID_ANY;
-    me.options = ( PTL_ME_OP_PUT | PTL_ME_OP_GET | PTL_ME_USE_ONCE | PTL_ME_IS_ACCESSIBLE | PTL_ME_EVENT_LINK_DISABLE |
-                   PTL_ME_EVENT_UNLINK_DISABLE );
-    me.match_id = vc_ptl->id;
-    me.match_bits = NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank);
-    me.ignore_bits = 0;
-    me.min_free = 0;
-            
-    ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, sreq, &REQ_PTL(sreq)->me);
-    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));
-
-
-    REQ_PTL(sreq)->large = TRUE;
-                        
-    REQ_PTL(sreq)->event_handler = handler_large_multi;
-    ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(sreq_)->chunk_buffer[0], PTL_LARGE_THRESHOLD, PTL_ACK_REQ, vc_ptl->id,
-                 vc_ptl->pt, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
-                                NPTL_HEADER(ssend_flag | NPTL_LARGE | NPTL_MULTIPLE, data_sz), 1);
-    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
-#endif
     
  fn_exit:
     *request = sreq;

http://git.mpich.org/mpich.git/commitdiff/8dc3a7f01bb25d801d353c05feab7d3ff8eb6b9f

commit 8dc3a7f01bb25d801d353c05feab7d3ff8eb6b9f
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Fri Oct 10 16:31:39 2014 -0500

    portals4: support for large contiguous messages
    
    If a message is larger than the max_msg_size limit, the sender appends
    multiple MEs for the remainder of the message and the receiver issues
    one get per chunk. Intermediate operations carry a NULL user_ptr, so
    their completion events are ignored. Only the final operation carries
    the request and triggers the event handler that tells MPI the
    communication is complete.
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>
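
The splitting described in this commit message can be sketched in isolation.
This is a minimal illustration under stated assumptions, not the MPICH code:
chunk_op, max_chunks, split_message and count_handled_events are hypothetical
names, and no Portals call is made. Only the chunking rule, the "request on
the final operation only" rule, and the NULL user_ptr check are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one posted operation (an ME or a get). */
typedef struct {
    size_t offset;   /* byte offset of this chunk within the remainder */
    size_t length;   /* chunk length, never above max_msg_size */
    void *user_ptr;  /* NULL for intermediate chunks, request for the last */
} chunk_op;

/* Upper bound on the chunk count, same formula as the handle allocation
 * in the diff: (left_to_send / max_msg_size) + 1. */
static size_t max_chunks(size_t total, size_t max_msg_size)
{
    assert(max_msg_size > 0);
    return total / max_msg_size + 1;
}

/* Split `total` bytes into chunks of at most `max_msg_size` bytes.
 * Returns the number of chunks written into `ops`. */
static size_t split_message(size_t total, size_t max_msg_size, void *request,
                            chunk_op *ops, size_t ops_cap)
{
    size_t left = total, offset = 0, n = 0;

    assert(max_msg_size > 0);
    while (left > 0) {
        size_t len = left > max_msg_size ? max_msg_size : left;

        assert(n < ops_cap);
        ops[n].offset = offset;
        ops[n].length = len;
        /* attach the request only to the final operation */
        ops[n].user_ptr = (len == left) ? request : NULL;

        /* account for what has been described so far */
        offset += len;
        left -= len;
        n++;
    }
    return n;
}

/* Mirror of the poll-loop check: events whose user_ptr is NULL are
 * skipped, so only the final chunk reaches an event handler. */
static size_t count_handled_events(const chunk_op *ops, size_t n)
{
    size_t handled = 0;

    for (size_t i = 0; i < n; i++)
        if (ops[i].user_ptr)
            handled++;
    return handled;
}
```

With a 10-byte remainder and a 4-byte limit, split_message produces chunks of
4, 4 and 2 bytes, and only the last one carries the request, so exactly one
event would be handled. The allocation formula can over-allocate by one handle
when the remainder is an exact multiple of the limit, which is harmless.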

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
index 46d33f7..10c66f1 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
@@ -42,7 +42,8 @@ typedef struct {
     int noncontig;
     int large;
     ptl_handle_md_t md;
-    ptl_handle_me_t me;
+    ptl_handle_me_t put_me;
+    ptl_handle_me_t *get_me_p;
     void *chunk_buffer[MPID_NEM_PTL_NUM_CHUNK_BUFFERS];
     MPIDI_msg_sz_t bytes_put;
     int found; /* used in probes with PtlMESearch() */
@@ -61,7 +62,8 @@ typedef struct {
         REQ_PTL(req_)->noncontig     = FALSE;                   \
         REQ_PTL(req_)->large         = FALSE;                   \
         REQ_PTL(req_)->md            = PTL_INVALID_HANDLE;      \
-        REQ_PTL(req_)->me            = PTL_INVALID_HANDLE;      \
+        REQ_PTL(req_)->put_me        = PTL_INVALID_HANDLE;      \
+        REQ_PTL(req_)->get_me_p      = NULL;                    \
         REQ_PTL(req_)->event_handler = NULL;                    \
     } while (0)
 
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
index 26a1eb2..4c4b6a4 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
@@ -165,11 +165,14 @@ int MPID_nem_ptl_poll(int is_blocking_poll)
         case PTL_EVENT_ACK:
         case PTL_EVENT_REPLY:
         case PTL_EVENT_SEARCH: {
-            MPID_Request * const req = event.user_ptr;
-            MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "req = %p", req);
-            MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "REQ_PTL(req)->event_handler = %p", REQ_PTL(req)->event_handler);
-            mpi_errno = REQ_PTL(req)->event_handler(&event);
-            if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+            /* intermediate operations for large messages pass a NULL user_ptr. we can ignore these events */
+            if (event.user_ptr) {
+                MPID_Request * const req = event.user_ptr;
+                MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "req = %p", req);
+                MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "REQ_PTL(req)->event_handler = %p", REQ_PTL(req)->event_handler);
+                mpi_errno = REQ_PTL(req)->event_handler(&event);
+                if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+            }
             break;
         }
         case PTL_EVENT_AUTO_FREE:
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_probe.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_probe.c
index 3d88225..9a583e5 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_probe.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_probe.c
@@ -73,7 +73,7 @@ static int handle_mprobe(const ptl_event_t *e)
 
     /* At this point we know the ME is unlinked. Invalidate the handle to
        prevent further accesses, e.g. an attempted cancel. */
-    REQ_PTL(req)->me = PTL_INVALID_HANDLE;
+    REQ_PTL(req)->put_me = PTL_INVALID_HANDLE;
     req->dev.recv_pending_count = 1;
 
   fn_exit:
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
index cca6d4c..2f4345d 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
@@ -19,7 +19,7 @@ static void dequeue_req(const ptl_event_t *e)
 
     /* At this point we know the ME is unlinked. Invalidate the handle to
        prevent further accesses, e.g. an attempted cancel. */
-    REQ_PTL(rreq)->me = PTL_INVALID_HANDLE;
+    REQ_PTL(rreq)->put_me = PTL_INVALID_HANDLE;
 
     found = MPIDI_CH3U_Recvq_DP(rreq);
     MPIU_Assert(found);
@@ -233,12 +233,32 @@ static int handler_recv_dequeue_large(const ptl_event_t *e)
     /* we need to GET the rest of the data from the sender's buffer */
     if (dt_contig) {
         /* recv buffer is contig */
+        ptl_size_t start, left_to_get, get_sz;
+        void * user_ptr = NULL;
+
         REQ_PTL(rreq)->event_handler = handler_recv_complete;
-        ret = MPID_nem_ptl_rptl_get(MPIDI_nem_ptl_global_md, (ptl_size_t)((char *)rreq->dev.user_buf + dt_true_lb + PTL_LARGE_THRESHOLD),
-                     data_sz - PTL_LARGE_THRESHOLD, vc_ptl->id, vc_ptl->ptg, e->match_bits, 0, rreq);
-        DBG_MSG_GET("global", data_sz - PTL_LARGE_THRESHOLD, vc->pg_rank, e->match_bits);
-        MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "   buf=%p", (char *)rreq->dev.user_buf + dt_true_lb + PTL_LARGE_THRESHOLD);
-        MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlget", "**ptlget %s", MPID_nem_ptl_strerror(ret));
+
+        start = (ptl_size_t)((char *)rreq->dev.user_buf + dt_true_lb + PTL_LARGE_THRESHOLD);
+        left_to_get = data_sz - PTL_LARGE_THRESHOLD;
+
+        while (left_to_get > 0) {
+            /* get up to the maximum allowed by the portals interface */
+            if (left_to_get > MPIDI_nem_ptl_ni_limits.max_msg_size) {
+                get_sz = MPIDI_nem_ptl_ni_limits.max_msg_size;
+            } else {
+                get_sz = left_to_get;
+                /* attach the request to the final operation */
+                user_ptr = rreq;
+            }
+            ret = MPID_nem_ptl_rptl_get(MPIDI_nem_ptl_global_md, start, get_sz, vc_ptl->id, vc_ptl->ptg, e->match_bits, 0, user_ptr);
+            DBG_MSG_GET("global", get_sz, vc->pg_rank, e->match_bits);
+            MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "   buf=%p", (char *)start);
+            MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlget", "**ptlget %s", MPID_nem_ptl_strerror(ret));
+
+            /* account for what has been sent */
+            start += get_sz;
+            left_to_get -= get_sz;
+        }
         goto fn_exit;
     }
 
@@ -479,7 +499,7 @@ int MPID_nem_ptl_recv_posted(MPIDI_VC_t *vc, MPID_Request *rreq)
         
     }
 
-    ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_pt, &me, PTL_PRIORITY_LIST, rreq, &REQ_PTL(rreq)->me);
+    ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_pt, &me, PTL_PRIORITY_LIST, rreq, &REQ_PTL(rreq)->put_me);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));
     DBG_MSG_MEAPPEND("REG", vc ? vc->pg_rank : MPI_ANY_SOURCE, me, rreq);
     MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "    buf=%p", me.start);
@@ -533,8 +553,8 @@ static int cancel_recv(MPID_Request *rreq, int *cancelled)
     /* An invalid handle indicates the operation has been completed
        and the matching list entry unlinked. At that point, the operation
        cannot be cancelled. */
-    if (REQ_PTL(rreq)->me != PTL_INVALID_HANDLE) {
-        ptl_err = PtlMEUnlink(REQ_PTL(rreq)->me);
+    if (REQ_PTL(rreq)->put_me != PTL_INVALID_HANDLE) {
+        ptl_err = PtlMEUnlink(REQ_PTL(rreq)->put_me);
         if (ptl_err == PTL_OK)
             *cancelled = TRUE;
         /* FIXME: if we properly invalidate matching list entry handles, we should be
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
index 796f559..d2cee52 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
@@ -31,6 +31,9 @@ static int handler_send_complete(const ptl_event_t *e)
     for (i = 0; i < MPID_NEM_PTL_NUM_CHUNK_BUFFERS; ++i)
         if (REQ_PTL(sreq)->chunk_buffer[i])
             MPIU_Free(REQ_PTL(sreq)->chunk_buffer[i]);
+
+    if (REQ_PTL(sreq)->get_me_p)
+        MPIU_Free(REQ_PTL(sreq)->get_me_p);
     
     MPIDI_CH3U_Request_complete(sreq);
 
@@ -274,9 +277,13 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
     /* Large message.  Send first chunk of data and let receiver get the rest */
     if (dt_contig) {
         /* create ME for buffer so receiver can issue a GET for the data */
+        ptl_size_t left_to_send;
+        void * user_ptr = NULL;
+        int i;
+
         MPIU_DBG_MSG(CH3_CHANNEL, VERBOSE, "Large contig message");
         me.start = (char *)buf + dt_true_lb + PTL_LARGE_THRESHOLD;
-        me.length = data_sz - PTL_LARGE_THRESHOLD;
+        left_to_send = data_sz - PTL_LARGE_THRESHOLD;
         me.ct_handle = PTL_CT_NONE;
         me.uid = PTL_UID_ANY;
         me.options = ( PTL_ME_OP_PUT | PTL_ME_OP_GET | PTL_ME_USE_ONCE | PTL_ME_IS_ACCESSIBLE | PTL_ME_EVENT_LINK_DISABLE |
@@ -286,9 +293,30 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
         me.ignore_bits = 0;
         me.min_free = 0;
 
-        ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, sreq, &REQ_PTL(sreq)->me);
-        MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));
-        DBG_MSG_MEAPPEND("CTL", vc->pg_rank, me, sreq);
+        /* allocate enough handles to cover all get operations */
+        MPIU_CHKPMEM_MALLOC(REQ_PTL(sreq)->get_me, ptl_handle_me_t *,
+                            sizeof(ptl_handle_me_t) * ((left_to_send / MPIDI_nem_ptl_ni_limits.max_msg_size) + 1),
+                            mpi_errno, "get_me");
+
+        /* queue up as many entries as necessary to describe the entire message */
+        for (i = 0; left_to_send > 0; i++) {
+            /* send up to the maximum allowed by the portals interface */
+            if (left_to_send > MPIDI_nem_ptl_ni_limits.max_msg_size)
+                me.length = MPIDI_nem_ptl_ni_limits.max_msg_size;
+            else {
+                me.length = left_to_send;
+                /* attach the request to the final operation */
+                user_ptr = sreq;
+            }
+
+            ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, user_ptr, &REQ_PTL(sreq)->get_me[i]);
+            MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));
+            DBG_MSG_MEAPPEND("CTL", vc->pg_rank, me, sreq);
+
+            /* account for what has been sent */
+            me.start = (char *)me.start + me.length;
+            left_to_send -= me.length;
+        }
         
         REQ_PTL(sreq)->large = TRUE;
             
@@ -342,9 +370,11 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
                 me.match_bits = NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank);
                 me.ignore_bits = 0;
                 me.min_free = 0;
+
+                MPIU_CHKPMEM_MALLOC(REQ_PTL(sreq)->get_me_p, ptl_handle_me_t *, sizeof(ptl_handle_me_t), mpi_errno, "get_me_p");
                         
                 ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, sreq,
-                                  &REQ_PTL(sreq)->me);
+                                  &REQ_PTL(sreq)->get_me_p[0]);
                 MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));
                 DBG_MSG_MEAPPEND("CTL", vc->pg_rank, me, sreq);
 
@@ -391,8 +421,10 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
     me.ignore_bits = 0;
     me.min_free = 0;
 
+    MPIU_CHKPMEM_MALLOC(REQ_PTL(sreq)->get_me, ptl_handle_me_t *, sizeof(ptl_handle_me_t), mpi_errno, "get_me");
+
     DBG_MSG_MEAPPEND("CTL", vc->pg_rank, me, sreq);
-    ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, sreq, &REQ_PTL(sreq)->me);
+    ret = PtlMEAppend(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt, &me, PTL_PRIORITY_LIST, sreq, &REQ_PTL(sreq)->get_me[0]);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmeappend", "**ptlmeappend %s", MPID_nem_ptl_strerror(ret));
 
     REQ_PTL(sreq)->large = TRUE;

http://git.mpich.org/mpich.git/commitdiff/0fc7ab9b6811bdac119195241bf1a13edd8bd787

commit 0fc7ab9b6811bdac119195241bf1a13edd8bd787
Author: Min Si <msi at il.is.s.u-tokyo.ac.jp>
Date:   Sat Nov 1 23:19:10 2014 -0500

    Add request-based RMA op tests checking local completion.
    
    Rput/Raccumulate followed by a wait guarantees local completion, which
    means the local buffer can be modified once the wait returns. These two
    tests check local completion of Rput and Raccumulate by modifying the
    local buffer after the wait and then checking the remote data. The
    remote data is expected to equal the local data as it was before the
    modification.
    
    Signed-off-by: Xin Zhao <xinzhao3 at illinois.edu>

diff --git a/test/mpi/rma/Makefile.am b/test/mpi/rma/Makefile.am
index 13529c3..e370fbf 100644
--- a/test/mpi/rma/Makefile.am
+++ b/test/mpi/rma/Makefile.am
@@ -137,6 +137,8 @@ noinst_PROGRAMS =          \
     acc-loc                \
     fence_shm              \
     get-struct             \
+    rput_local_comp        \
+    racc_local_comp        \
     at_complete
 
 strided_acc_indexed_LDADD       = $(LDADD) -lm
diff --git a/test/mpi/rma/racc_local_comp.c b/test/mpi/rma/racc_local_comp.c
new file mode 100644
index 0000000..ea9e57c
--- /dev/null
+++ b/test/mpi/rma/racc_local_comp.c
@@ -0,0 +1,132 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *
+ *  (C) 2014 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+#include <mpi.h>
+#include <stdio.h>
+#include <assert.h>
+#include "mpitest.h"
+
+#define ITER 100
+#define MAX_SIZE 65536
+
+int main(int argc, char *argv[])
+{
+    int rank, nproc, i;
+    int errors = 0, all_errors = 0;
+    int *buf = NULL, *winbuf = NULL;
+    MPI_Win window;
+
+    MPI_Init(&argc, &argv);
+    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
+    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
+
+    if (nproc < 2) {
+        if (rank == 0)
+            printf("Error: must be run with two or more processes\n");
+        MPI_Abort(MPI_COMM_WORLD, 1);
+    }
+
+    MPI_Alloc_mem(MAX_SIZE * sizeof(int), MPI_INFO_NULL, &buf);
+    MPI_Alloc_mem(MAX_SIZE * sizeof(int), MPI_INFO_NULL, &winbuf);
+    MPI_Win_create(winbuf, MAX_SIZE * sizeof(int), sizeof(int), MPI_INFO_NULL,
+                   MPI_COMM_WORLD, &window);
+
+    MPI_Win_lock_all(0, window);
+
+    /* Test Raccumulate local completion with small data.
+     * Small data is always copied to header packet as immediate data. */
+    if (rank == 1) {
+        for (i = 0; i < ITER; i++) {
+            MPI_Request acc_req;
+            int val = -1;
+
+            buf[0] = rank * i;
+            MPI_Raccumulate(&buf[0], 1, MPI_INT, 0, 0, 1, MPI_INT, MPI_MAX, window, &acc_req);
+            MPI_Wait(&acc_req, MPI_STATUS_IGNORE);
+
+            /* reset local buffer to check local completion */
+            buf[0] = 0;
+            MPI_Win_flush(0, window);
+
+            MPI_Get(&val, 1, MPI_INT, 0, 0, 1, MPI_INT, window);
+            MPI_Win_flush(0, window);
+
+            if (val != rank * i) {
+                printf("%d - Got %d in small Raccumulate test, expected %d (%d * %d)\n", rank, val,
+                       rank * i, rank, i);
+                errors++;
+            }
+        }
+    }
+
+    MPI_Barrier(MPI_COMM_WORLD);
+
+    /* Test Raccumulate local completion with large data.
+     * Large data is not suitable for 1-copy optimization, and always sent out
+     * from user buffer. */
+    if (rank == 1) {
+        for (i = 0; i < ITER; i++) {
+            MPI_Request acc_req;
+            int val0 = -1, val1 = -1, val2 = -1;
+            int j;
+
+            /* initialize data */
+            for (j = 0; j < MAX_SIZE; j++) {
+                buf[j] = rank + j + i;
+            }
+
+            MPI_Raccumulate(buf, MAX_SIZE, MPI_INT, 0, 0, MAX_SIZE, MPI_INT, MPI_REPLACE, window,
+                            &acc_req);
+            MPI_Wait(&acc_req, MPI_STATUS_IGNORE);
+
+            /* reset local buffer to check local completion */
+            buf[0] = 0;
+            buf[MAX_SIZE - 1] = 0;
+            buf[MAX_SIZE / 2] = 0;
+            MPI_Win_flush(0, window);
+
+            /* get remote values which are modified in local buffer after wait */
+            MPI_Get(&val0, 1, MPI_INT, 0, 0, 1, MPI_INT, window);
+            MPI_Get(&val1, 1, MPI_INT, 0, MAX_SIZE - 1, 1, MPI_INT, window);
+            MPI_Get(&val2, 1, MPI_INT, 0, MAX_SIZE / 2, 1, MPI_INT, window);
+            MPI_Win_flush(0, window);
+
+            if (val0 != rank + i) {
+                printf("%d - Got %d in large Raccumulate test, expected %d\n", rank,
+                       val0, rank + i);
+                errors++;
+            }
+            if (val1 != rank + MAX_SIZE - 1 + i) {
+                printf("%d - Got %d in large Raccumulate test, expected %d\n", rank,
+                       val1, rank + MAX_SIZE - 1 + i);
+                errors++;
+            }
+            if (val2 != rank + MAX_SIZE / 2 + i) {
+                printf("%d - Got %d in large Raccumulate test, expected %d\n", rank,
+                       val2, rank + MAX_SIZE / 2 + i);
+                errors++;
+            }
+        }
+    }
+
+    MPI_Win_unlock_all(window);
+    MPI_Barrier(MPI_COMM_WORLD);
+
+    MPI_Win_free(&window);
+    if (buf)
+        MPI_Free_mem(buf);
+    if (winbuf)
+        MPI_Free_mem(winbuf);
+
+    MPI_Reduce(&errors, &all_errors, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
+
+    if (rank == 0 && all_errors == 0)
+        printf(" No Errors\n");
+
+    MPI_Finalize();
+
+    return 0;
+}
diff --git a/test/mpi/rma/rput_local_comp.c b/test/mpi/rma/rput_local_comp.c
new file mode 100644
index 0000000..0d1f682
--- /dev/null
+++ b/test/mpi/rma/rput_local_comp.c
@@ -0,0 +1,129 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *
+ *  (C) 2014 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+#include <mpi.h>
+#include <stdio.h>
+#include <assert.h>
+#include "mpitest.h"
+
+#define ITER 100
+#define MAX_SIZE 65536
+
+int main(int argc, char *argv[])
+{
+    int rank, nproc, i;
+    int errors = 0, all_errors = 0;
+    int *buf = NULL, *winbuf = NULL;
+    MPI_Win window;
+
+    MPI_Init(&argc, &argv);
+    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
+    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
+
+    if (nproc < 2) {
+        if (rank == 0)
+            printf("Error: must be run with two or more processes\n");
+        MPI_Abort(MPI_COMM_WORLD, 1);
+    }
+
+    MPI_Alloc_mem(MAX_SIZE * sizeof(int), MPI_INFO_NULL, &buf);
+    MPI_Alloc_mem(MAX_SIZE * sizeof(int), MPI_INFO_NULL, &winbuf);
+    MPI_Win_create(winbuf, MAX_SIZE * sizeof(int), sizeof(int), MPI_INFO_NULL,
+                   MPI_COMM_WORLD, &window);
+
+    MPI_Win_lock_all(0, window);
+
+    /* Test Rput local completion with small data.
+     * Small data is always copied to header packet as immediate data. */
+    if (rank == 1) {
+        for (i = 0; i < ITER; i++) {
+            MPI_Request put_req;
+            int val = -1;
+
+            buf[0] = rank;
+            MPI_Rput(&buf[0], 1, MPI_INT, 0, 0, 1, MPI_INT, window, &put_req);
+            MPI_Wait(&put_req, MPI_STATUS_IGNORE);
+
+            /* reset local buffer to check local completion */
+            buf[0] = 0;
+            MPI_Win_flush(0, window);
+
+            MPI_Get(&val, 1, MPI_INT, 0, 0, 1, MPI_INT, window);
+            MPI_Win_flush(0, window);
+
+            if (val != rank) {
+                printf("%d - Got %d in small Rput test, expected %d\n", rank, val, rank);
+                errors++;
+            }
+        }
+    }
+
+    MPI_Barrier(MPI_COMM_WORLD);
+
+    /* Test Rput local completion with large data.
+     * Large data is not suitable for 1-copy optimization, and always sent out
+     * from user buffer. */
+    if (rank == 1) {
+        for (i = 0; i < ITER; i++) {
+            MPI_Request put_req;
+            int val0 = -1, val1 = -1, val2 = -1;
+            int j;
+
+            /* initialize data */
+            for (j = 0; j < MAX_SIZE; j++) {
+                buf[j] = rank + j + i;
+            }
+
+            MPI_Rput(buf, MAX_SIZE, MPI_INT, 0, 0, MAX_SIZE, MPI_INT, window, &put_req);
+            MPI_Wait(&put_req, MPI_STATUS_IGNORE);
+
+            /* reset local buffer to check local completion */
+            buf[0] = 0;
+            buf[MAX_SIZE - 1] = 0;
+            buf[MAX_SIZE / 2] = 0;
+            MPI_Win_flush(0, window);
+
+            /* get remote values which are modified in local buffer after wait */
+            MPI_Get(&val0, 1, MPI_INT, 0, 0, 1, MPI_INT, window);
+            MPI_Get(&val1, 1, MPI_INT, 0, MAX_SIZE - 1, 1, MPI_INT, window);
+            MPI_Get(&val2, 1, MPI_INT, 0, MAX_SIZE / 2, 1, MPI_INT, window);
+            MPI_Win_flush(0, window);
+
+            if (val0 != rank + i) {
+                printf("%d - Got %d in large Rput test, expected %d\n", rank, val0, rank + i);
+                errors++;
+            }
+            if (val1 != rank + MAX_SIZE - 1 + i) {
+                printf("%d - Got %d in large Rput test, expected %d\n", rank, val1,
+                       rank + MAX_SIZE - 1 + i);
+                errors++;
+            }
+            if (val2 != rank + MAX_SIZE / 2 + i) {
+                printf("%d - Got %d in large Rput test, expected %d\n", rank, val2,
+                       rank + MAX_SIZE / 2 + i);
+                errors++;
+            }
+        }
+    }
+
+    MPI_Win_unlock_all(window);
+    MPI_Barrier(MPI_COMM_WORLD);
+
+    MPI_Win_free(&window);
+    if (buf)
+        MPI_Free_mem(buf);
+    if (winbuf)
+        MPI_Free_mem(winbuf);
+
+    MPI_Reduce(&errors, &all_errors, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
+
+    if (rank == 0 && all_errors == 0)
+        printf(" No Errors\n");
+
+    MPI_Finalize();
+
+    return 0;
+}
diff --git a/test/mpi/rma/testlist.in b/test/mpi/rma/testlist.in
index d13d7cf..cb60752 100644
--- a/test/mpi/rma/testlist.in
+++ b/test/mpi/rma/testlist.in
@@ -102,6 +102,8 @@ flush 4 mpiversion=3.0
 reqops 4 mpiversion=3.0
 req_example 4 mpiversion=3.0
 req_example_shm 4 mpiversion=3.0
+rput_local_comp 2 mpiversion=3.0
+racc_local_comp 2 mpiversion=3.0
 win_info 4 mpiversion=3.0
 linked_list_lockall 4 mpiversion=3.0
 pscw_ordering 4 mpiversion=3.0

http://git.mpich.org/mpich.git/commitdiff/b0fa16749ff4ff760a2e894275ca4081852c663c

commit b0fa16749ff4ff760a2e894275ca4081852c663c
Author: Junchao Zhang <jczhang at mcs.anl.gov>
Date:   Tue Nov 4 14:34:45 2014 -0600

    Uncomment some code in F08 since the cce bug is fixed
    
    It was commented out due to a cce/8.3.0 bug. Since the bug is fixed in cce/8.3.2,
    we can safely uncomment the code.
    
    No reviewer

diff --git a/src/binding/fortran/use_mpi_f08/wrappers_f/register_datarep_f08ts.F90 b/src/binding/fortran/use_mpi_f08/wrappers_f/register_datarep_f08ts.F90
index a687844..fda3212 100644
--- a/src/binding/fortran/use_mpi_f08/wrappers_f/register_datarep_f08ts.F90
+++ b/src/binding/fortran/use_mpi_f08/wrappers_f/register_datarep_f08ts.F90
@@ -35,13 +35,13 @@ subroutine MPI_Register_datarep_f08(datarep, read_conversion_fn, write_conversio
     write_conversion_fn_c = c_funloc(write_conversion_fn)
     dtype_file_extent_fn_c = c_funloc(dtype_file_extent_fn)
 
-   !if (c_associated(read_conversion_fn_c, c_funloc(MPI_CONVERSION_FN_NULL))) then
-   !    read_conversion_fn_c = C_NULL_FUNPTR
-   !end if
+    if (c_associated(read_conversion_fn_c, c_funloc(MPI_CONVERSION_FN_NULL))) then
+        read_conversion_fn_c = C_NULL_FUNPTR
+    end if
 
-   !if (c_associated(write_conversion_fn_c, c_funloc(MPI_CONVERSION_FN_NULL))) then
-   !    read_conversion_fn_c = C_NULL_FUNPTR
-   !end if
+    if (c_associated(write_conversion_fn_c, c_funloc(MPI_CONVERSION_FN_NULL))) then
+        write_conversion_fn_c = C_NULL_FUNPTR
+    end if
 
     ierror_c = MPIR_Register_datarep_c(datarep_c, read_conversion_fn_c, write_conversion_fn_c, &
                                        dtype_file_extent_fn_c, extra_state)

http://git.mpich.org/mpich.git/commitdiff/0b526b2bbf6ec89f96da55e0f9fe85b8105b3f8f

commit 0b526b2bbf6ec89f96da55e0f9fe85b8105b3f8f
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Fri Oct 31 11:19:39 2014 -0500

    portals4: create separate EQ for origin events
    
    An EQ for origin events is useful for rate-limiting operations so that
    a process does not locally trigger a flow control event on its portal.
    We will implement the rate-limiting logic in the rportals layer.
    
    Signed-off-by: Antonio J. Pena <apenya at mcs.anl.gov>
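
The polling order that follows from splitting the EQ can be sketched as
below. This is a minimal illustration, not the netmod code: toy_eq,
toy_eq_get, toy_poll and the TOY_* codes are hypothetical stand-ins for
the Portals event queue handle, the rportals eqget wrapper, and the
PTL_EQ_EMPTY return code. The target queue is tried first, the origin
queue is the fallback, and the loop stops only when both are empty.

```c
#include <stddef.h>

/* Hypothetical return codes; the real code tests PTL_EQ_EMPTY. */
enum { TOY_OK = 0, TOY_EQ_EMPTY = 1 };

/* Toy event queue: a fixed array of event ids consumed front to back. */
typedef struct {
    const int *events;
    size_t len;
    size_t next;
} toy_eq;

/* Pop one event, or report that the queue is empty. */
static int toy_eq_get(toy_eq *eq, int *event)
{
    if (eq->next == eq->len)
        return TOY_EQ_EMPTY;
    *event = eq->events[eq->next++];
    return TOY_OK;
}

/* Check the target queue first and fall back to the origin queue.
 * Exit only when both queues are empty (or out is full).
 * Returns the number of events stored in out. */
static size_t toy_poll(toy_eq *target, toy_eq *origin, int *out, size_t cap)
{
    size_t n = 0;

    while (n < cap) {
        int ev;
        int ret = toy_eq_get(target, &ev);

        if (ret == TOY_EQ_EMPTY) {
            ret = toy_eq_get(origin, &ev);
            if (ret == TOY_EQ_EMPTY)
                break;      /* both queues are empty */
        }
        out[n++] = ev;
    }
    return n;
}
```

With this ordering, origin events are handled only once the target
queue has been drained. A real implementation may want a different
fairness policy, which this sketch does not attempt to model.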

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
index 497a51d..46d33f7 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
@@ -19,7 +19,8 @@ extern ptl_pt_index_t  MPIDI_nem_ptl_pt;
 extern ptl_pt_index_t  MPIDI_nem_ptl_get_pt; /* portal for gets by receiver */
 extern ptl_pt_index_t  MPIDI_nem_ptl_control_pt; /* portal for MPICH control messages */
 extern ptl_pt_index_t  MPIDI_nem_ptl_rpt_pt; /* portal for rportals control messages */
-extern ptl_handle_eq_t MPIDI_nem_ptl_eq;
+extern ptl_handle_eq_t MPIDI_nem_ptl_target_eq;
+extern ptl_handle_eq_t MPIDI_nem_ptl_origin_eq;
 
 extern ptl_handle_md_t MPIDI_nem_ptl_global_md;
 extern ptl_ni_limits_t MPIDI_nem_ptl_ni_limits;
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
index 06a8be6..e69f4e1 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
@@ -26,7 +26,8 @@ ptl_pt_index_t  MPIDI_nem_ptl_pt;
 ptl_pt_index_t  MPIDI_nem_ptl_get_pt; /* portal for gets by receiver */
 ptl_pt_index_t  MPIDI_nem_ptl_control_pt; /* portal for MPICH control messages */
 ptl_pt_index_t  MPIDI_nem_ptl_rpt_pt; /* portal for rportals control messages */
-ptl_handle_eq_t MPIDI_nem_ptl_eq;
+ptl_handle_eq_t MPIDI_nem_ptl_target_eq;
+ptl_handle_eq_t MPIDI_nem_ptl_origin_eq;
 ptl_handle_md_t MPIDI_nem_ptl_global_md;
 ptl_ni_limits_t MPIDI_nem_ptl_ni_limits;
 
@@ -178,26 +179,31 @@ static int ptl_init(MPIDI_PG_t *pg_p, int pg_rank, char **bc_val_p, int *val_max
                     PTL_PID_ANY, &desired, &MPIDI_nem_ptl_ni_limits, &MPIDI_nem_ptl_ni);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlniinit", "**ptlniinit %s", MPID_nem_ptl_strerror(ret));
 
-    ret = PtlEQAlloc(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_ni_limits.max_eqs, &MPIDI_nem_ptl_eq);
+    ret = PtlEQAlloc(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_ni_limits.max_eqs, &MPIDI_nem_ptl_target_eq);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptleqalloc", "**ptleqalloc %s", MPID_nem_ptl_strerror(ret));
+
+    /* allocate a separate EQ for origin events. with this, we can rate-limit operations
+       to prevent a locally triggered flow control event */
+    ret = PtlEQAlloc(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_ni_limits.max_eqs, &MPIDI_nem_ptl_origin_eq);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptleqalloc", "**ptleqalloc %s", MPID_nem_ptl_strerror(ret));
 
     /* allocate portal for matching messages */
-    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_eq,
+    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_target_eq,
                      PTL_PT_ANY, &MPIDI_nem_ptl_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
 
     /* allocate portal for large messages where receiver does a get */
-    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_eq,
+    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_target_eq,
                      PTL_PT_ANY, &MPIDI_nem_ptl_get_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
 
     /* allocate portal for MPICH control messages */
-    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_eq,
+    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_target_eq,
                      PTL_PT_ANY, &MPIDI_nem_ptl_control_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
 
     /* allocate portal for rportals control messages */
-    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_eq,
+    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_target_eq,
                      PTL_PT_ANY, &MPIDI_nem_ptl_rpt_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
 
@@ -205,7 +211,7 @@ static int ptl_init(MPIDI_PG_t *pg_p, int pg_rank, char **bc_val_p, int *val_max
     md.start = 0;
     md.length = (ptl_size_t)-1;
     md.options = 0x0;
-    md.eq_handle = MPIDI_nem_ptl_eq;
+    md.eq_handle = MPIDI_nem_ptl_origin_eq;
     md.ct_handle = PTL_CT_NONE;
     ret = PtlMDBind(MPIDI_nem_ptl_ni, &md, &MPIDI_nem_ptl_global_md);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmdbind", "**ptlmdbind %s", MPID_nem_ptl_strerror(ret));
@@ -215,7 +221,7 @@ static int ptl_init(MPIDI_PG_t *pg_p, int pg_rank, char **bc_val_p, int *val_max
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlniinit", "**ptlniinit %s", MPID_nem_ptl_strerror(ret));
 
     /* allow rportal to manage the primary portal and retransmit if needed */
-    ret = MPID_nem_ptl_rptl_ptinit(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_eq, MPIDI_nem_ptl_pt, MPIDI_nem_ptl_rpt_pt);
+    ret = MPID_nem_ptl_rptl_ptinit(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_origin_eq, MPIDI_nem_ptl_pt, MPIDI_nem_ptl_rpt_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
 
     /* allow rportal to manage the get and control portals, but we
@@ -223,10 +229,10 @@ static int ptl_init(MPIDI_PG_t *pg_p, int pg_rank, char **bc_val_p, int *val_max
      * we pass PTL_PT_ANY as the dummy portal.  unfortunately, portals
      * does not have an "invalid" PT constant, which would have been
      * more appropriate to pass over here. */
-    ret = MPID_nem_ptl_rptl_ptinit(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_eq, MPIDI_nem_ptl_get_pt, PTL_PT_ANY);
+    ret = MPID_nem_ptl_rptl_ptinit(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_origin_eq, MPIDI_nem_ptl_get_pt, PTL_PT_ANY);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
 
-    ret = MPID_nem_ptl_rptl_ptinit(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_eq, MPIDI_nem_ptl_control_pt, PTL_PT_ANY);
+    ret = MPID_nem_ptl_rptl_ptinit(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_origin_eq, MPIDI_nem_ptl_control_pt, PTL_PT_ANY);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
 
     /* create business card */
@@ -255,6 +261,7 @@ static int ptl_finalize(void)
 {
     int mpi_errno = MPI_SUCCESS;
     int ret;
+    ptl_handle_eq_t eqs[2];
     MPIDI_STATE_DECL(MPID_STATE_PTL_FINALIZE);
     MPIDI_FUNC_ENTER(MPID_STATE_PTL_FINALIZE);
 
@@ -266,7 +273,9 @@ static int ptl_finalize(void)
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
 
     /* shut down portals */
-    ret = MPID_nem_ptl_rptl_drain_eq(1, &MPIDI_nem_ptl_eq);
+    eqs[0] = MPIDI_nem_ptl_target_eq;
+    eqs[1] = MPIDI_nem_ptl_origin_eq;
+    ret = MPID_nem_ptl_rptl_drain_eq(2, eqs);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptfree", "**ptlptfree %s", MPID_nem_ptl_strerror(ret));
 
     ret = MPID_nem_ptl_rptl_ptfini(MPIDI_nem_ptl_pt);
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
index 60e07a0..26a1eb2 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
@@ -131,10 +131,17 @@ int MPID_nem_ptl_poll(int is_blocking_poll)
     /* MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_POLL); */
 
     while (1) {
-        ret = MPID_nem_ptl_rptl_eqget(MPIDI_nem_ptl_eq, &event);
-        if (ret == PTL_EQ_EMPTY)
-            break;
+        /* check both origin and target EQs for events */
+        ret = MPID_nem_ptl_rptl_eqget(MPIDI_nem_ptl_target_eq, &event);
         MPIU_ERR_CHKANDJUMP(ret == PTL_EQ_DROPPED, mpi_errno, MPI_ERR_OTHER, "**eqdropped");
+        if (ret == PTL_EQ_EMPTY) {
+            ret = MPID_nem_ptl_rptl_eqget(MPIDI_nem_ptl_origin_eq, &event);
+            MPIU_ERR_CHKANDJUMP(ret == PTL_EQ_DROPPED, mpi_errno, MPI_ERR_OTHER, "**eqdropped");
+
+            /* if both queues are empty, exit the loop */
+            if (ret == PTL_EQ_EMPTY)
+                break;
+        }
         MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptleqget", "**ptleqget %s", MPID_nem_ptl_strerror(ret));
         MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "Received event %s ni_fail=%s list=%s user_ptr=%p hdr_data=%#lx mlength=%lu",
                                                 MPID_nem_ptl_strevent(&event), MPID_nem_ptl_strnifail(event.ni_fail_type),
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
index fd694f5..cca6d4c 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
@@ -255,7 +255,7 @@ static int handler_recv_dequeue_large(const ptl_event_t *e)
         md.start = rreq->dev.iov;
         md.length = rreq->dev.iov_count;
         md.options = PTL_IOVEC;
-        md.eq_handle = MPIDI_nem_ptl_eq;
+        md.eq_handle = MPIDI_nem_ptl_origin_eq;
         md.ct_handle = PTL_CT_NONE;
         ret = PtlMDBind(MPIDI_nem_ptl_ni, &md, &REQ_PTL(rreq)->md);
         MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmdbind", "**ptlmdbind %s", MPID_nem_ptl_strerror(ret));
@@ -680,7 +680,7 @@ int MPID_nem_ptl_lmt_start_recv(MPIDI_VC_t *vc,  MPID_Request *rreq, MPID_IOV s_
             md.start = rreq->dev.iov;
             md.length = rreq->dev.iov_count;
             md.options = PTL_IOVEC;
-            md.eq_handle = MPIDI_nem_ptl_eq;
+            md.eq_handle = MPIDI_nem_ptl_origin_eq;
             md.ct_handle = PTL_CT_NONE;
             ret = PtlMDBind(MPIDI_nem_ptl_ni, &md, &REQ_PTL(rreq)->md);
             MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmdbind", "**ptlmdbind %s",
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
index a6fab84..796f559 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
@@ -240,7 +240,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
             md.start = sreq->dev.iov;
             md.length = sreq->dev.iov_count;
             md.options = PTL_IOVEC;
-            md.eq_handle = MPIDI_nem_ptl_eq;
+            md.eq_handle = MPIDI_nem_ptl_origin_eq;
             md.ct_handle = PTL_CT_NONE;
             ret = PtlMDBind(MPIDI_nem_ptl_ni, &md, &REQ_PTL(sreq)->md);
             MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmdbind", "**ptlmdbind %s", MPID_nem_ptl_strerror(ret));
@@ -352,7 +352,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
                 md.start = sreq->dev.iov;
                 md.length = initial_iov_count;
                 md.options = PTL_IOVEC;
-                md.eq_handle = MPIDI_nem_ptl_eq;
+                md.eq_handle = MPIDI_nem_ptl_origin_eq;
                 md.ct_handle = PTL_CT_NONE;
                 ret = PtlMDBind(MPIDI_nem_ptl_ni, &md, &REQ_PTL(sreq)->md);
                 MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmdbind", "**ptlmdbind %s", MPID_nem_ptl_strerror(ret));

http://git.mpich.org/mpich.git/commitdiff/d5c8c75a01a4705f5d7561ef69423ad731538f3c

commit d5c8c75a01a4705f5d7561ef69423ad731538f3c
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Fri Oct 31 11:13:01 2014 -0500

    portals4: create EQ using limit from NI init
    
    Rather than use an EQ limit that may be lower than the system default,
    just create our EQ using the returned maximum from PtlNIInit.
    
    Signed-off-by: Antonio J. Pena <apenya at mcs.anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
index 4a53a3d..06a8be6 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
@@ -178,7 +178,7 @@ static int ptl_init(MPIDI_PG_t *pg_p, int pg_rank, char **bc_val_p, int *val_max
                     PTL_PID_ANY, &desired, &MPIDI_nem_ptl_ni_limits, &MPIDI_nem_ptl_ni);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlniinit", "**ptlniinit %s", MPID_nem_ptl_strerror(ret));
 
-    ret = PtlEQAlloc(MPIDI_nem_ptl_ni, EQ_COUNT, &MPIDI_nem_ptl_eq);
+    ret = PtlEQAlloc(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_ni_limits.max_eqs, &MPIDI_nem_ptl_eq);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptleqalloc", "**ptleqalloc %s", MPID_nem_ptl_strerror(ret));
 
     /* allocate portal for matching messages */

http://git.mpich.org/mpich.git/commitdiff/aa7eb720e4fe45a32dbb7d34b8a91aae38393837

commit aa7eb720e4fe45a32dbb7d34b8a91aae38393837
Author: Junchao Zhang <jczhang at mcs.anl.gov>
Date:   Sun Nov 2 22:09:42 2014 -0600

    Code clean up for enum MPIR_MPI_State_t
    
    Also remove numbering for each enum, which is not necessary and is
    hard to maintain.
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>
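
Dropping the explicit numbers is safe because C assigns each enumerator
without an initializer the value of the previous one plus one. A small
sketch, using a hypothetical toy_state enum that only mirrors the shape
of MPIR_MPI_State_t:

```c
/* Hypothetical enum for illustration; only the first value is pinned. */
typedef enum toy_state {
    TOY_PRE_INIT = 0,
    TOY_WITHIN_MPI,         /* implicitly 1 */
    TOY_POST_FINALIZED      /* implicitly 2 */
} toy_state;

/* Nonzero when MPI calls would be legal in this toy model. */
static int toy_is_initialized(toy_state s)
{
    return s == TOY_WITHIN_MPI;
}
```

Inserting a new state between two existing ones renumbers everything
after it, so this style assumes that no code depends on the numeric
values themselves.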

diff --git a/src/include/mpiimpl.h b/src/include/mpiimpl.h
index f3ede9e..a063061 100644
--- a/src/include/mpiimpl.h
+++ b/src/include/mpiimpl.h
@@ -2165,8 +2165,11 @@ extern struct MPID_CommOps  *MPID_Comm_fns; /* Communicator creation functions *
 
 
 /* Per process data */
-typedef enum MPIR_MPI_State_t { MPICH_PRE_INIT=0, MPICH_WITHIN_MPI=1,
-               MPICH_POST_FINALIZED=2 } MPIR_MPI_State_t;
+typedef enum MPIR_MPI_State_t {
+    MPICH_PRE_INIT=0,
+    MPICH_WITHIN_MPI,
+    MPICH_POST_FINALIZED
+} MPIR_MPI_State_t;
 
 typedef struct PreDefined_attrs {
     int appnum;          /* Application number provided by mpiexec (MPI-2) */

http://git.mpich.org/mpich.git/commitdiff/9470920a1d735ebe14a8e6aab30a222ebc3a0b0c

commit 9470920a1d735ebe14a8e6aab30a222ebc3a0b0c
Author: Pavan Balaji <balaji at anl.gov>
Date:   Tue Oct 21 13:35:02 2014 -0500

    Initial draft of flow-control in the portals4 netmod.
    
    Portals4 by itself does not provide any flow control.  This needs to
    be managed by an upper layer, such as MPICH.  Before this patch we
    relied on a set of unexpected buffers posted to the portals library
    to absorb unexpected messages.  However, since portals asynchronously
    pulls messages off the network, a delayed application can let those
    buffers fill up, at which point the portal is disabled.  This would
    cause MPICH to abort.
    
    In this patch, we implement an initial version of flow-control that
    allows us to reenable the portal when it gets disabled.  All this is
    done in the context of the "rportals" wrappers that are implemented in
    the rptl.* files.  We create an extra control portal that is only used
    by rportals.  When the primary data portal gets disabled, the target
    sends PAUSE messages to all other processes.  Once each process
    confirms that it has no outstanding packets on the wire (i.e., all
    packets have either been ACKed or NACKed), it sends a PAUSE-ACK
    message.  When the target receives PAUSE-ACK messages from all
    processes (thus confirming that the network traffic to itself has been
    quiesced), it reenables the portal and sends an UNPAUSE message to all
    processes.
    
    This patch still does not deal with origin-side resource exhaustion.
    This can happen, for example, if we run out of space on the event
    queue on the origin side.
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>
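
The PAUSE / PAUSE-ACK / UNPAUSE handshake described above can be
sketched as a small state machine. This is an illustration under
stated assumptions, not the code in rptl.c: every toy_* name is
hypothetical, message transport is left out, and n_peers (the number of
processes other than the target) is assumed to be at least 1.

```c
#include <stdbool.h>

typedef enum { TOY_PT_ENABLED, TOY_PT_PAUSING } toy_pt_state;

/* Target-side bookkeeping for one data portal (hypothetical layout). */
typedef struct {
    toy_pt_state state;
    int n_peers;        /* processes other than the target, assumed >= 1 */
    int acks_pending;   /* PAUSE-ACKs still expected */
} toy_target;

/* The data portal was disabled.  Returns how many PAUSE messages the
 * target should send (one per peer), or 0 if a pause is already active. */
static int toy_on_disabled(toy_target *t)
{
    if (t->state == TOY_PT_PAUSING)
        return 0;
    t->state = TOY_PT_PAUSING;
    t->acks_pending = t->n_peers;
    return t->n_peers;
}

/* Origin side: a PAUSE-ACK may be sent only once every packet on the
 * wire has been either ACKed or NACKed. */
static bool toy_origin_can_ack(int sent, int acked, int nacked)
{
    return sent == acked + nacked;
}

/* A PAUSE-ACK arrived at the target.  Returns true when it was the
 * last one, meaning the portal is re-enabled and UNPAUSE should be
 * sent to all peers. */
static bool toy_on_pause_ack(toy_target *t)
{
    if (t->state != TOY_PT_PAUSING)
        return false;
    if (--t->acks_pending > 0)
        return false;
    t->state = TOY_PT_ENABLED;
    return true;
}
```

For example, with three peers, toy_on_disabled reports three PAUSE
messages to send, and only the third call to toy_on_pause_ack returns
true. Origin-side resource exhaustion is outside this sketch, as it is
outside the patch.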

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/Makefile.mk b/src/mpid/ch3/channels/nemesis/netmod/portals4/Makefile.mk
index 3901503..06c26d1 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/Makefile.mk
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/Makefile.mk
@@ -15,10 +15,12 @@ mpi_core_sources +=					\
     src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c		\
     src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c	        \
     src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c            \
-    src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_lmt.c
+    src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_lmt.c             \
+    src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
 
 noinst_HEADERS +=                                                \
-    src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h 
+    src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h     \
+    src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.h
 
 endif BUILD_NEMESIS_NETMOD_PORTALS4
 
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
index 6130d98..497a51d 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_impl.h
@@ -18,6 +18,7 @@ extern ptl_handle_ni_t MPIDI_nem_ptl_ni;
 extern ptl_pt_index_t  MPIDI_nem_ptl_pt;
 extern ptl_pt_index_t  MPIDI_nem_ptl_get_pt; /* portal for gets by receiver */
 extern ptl_pt_index_t  MPIDI_nem_ptl_control_pt; /* portal for MPICH control messages */
+extern ptl_pt_index_t  MPIDI_nem_ptl_rpt_pt; /* portal for rportals control messages */
 extern ptl_handle_eq_t MPIDI_nem_ptl_eq;
 
 extern ptl_handle_md_t MPIDI_nem_ptl_global_md;
@@ -88,6 +89,7 @@ typedef struct {
     ptl_pt_index_t pt;
     ptl_pt_index_t ptg;
     ptl_pt_index_t ptc;
+    ptl_pt_index_t ptr;
     int id_initialized; /* TRUE iff id and pt have been initialized */
     MPIDI_msg_sz_t num_queued_sends; /* number of reqs for this vc in sendq */
 } MPID_nem_ptl_vc_area;
@@ -154,7 +156,7 @@ int MPID_nem_ptl_poll_finalize(void);
 int MPID_nem_ptl_poll(int is_blocking_poll);
 int MPID_nem_ptl_vc_terminated(MPIDI_VC_t *vc);
 int MPID_nem_ptl_get_id_from_bc(const char *business_card, ptl_process_t *id, ptl_pt_index_t *pt, ptl_pt_index_t *ptg,
-                                ptl_pt_index_t *ptc);
+                                ptl_pt_index_t *ptc, ptl_pt_index_t *ptr);
 void MPI_nem_ptl_pack_byte(MPID_Segment *segment, MPI_Aint first, MPI_Aint last, void *buf,
                            MPID_nem_ptl_pack_overflow_t *overflow);
 int MPID_nem_ptl_unpack_byte(MPID_Segment *segment, MPI_Aint first, MPI_Aint last, void *buf,
@@ -197,7 +199,7 @@ const char *MPID_nem_ptl_strnifail(ptl_ni_fail_t ni_fail);
 const char *MPID_nem_ptl_strlist(ptl_list_t list);
 
 #define DBG_MSG_PUT(md_, data_sz_, pg_rank_, match_, header_) do {                                                                          \
-        MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "PtlPut: md=%s data_sz=%lu pg_rank=%d", md_, data_sz_, pg_rank_));          \
+        MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "MPID_nem_ptl_rptl_put: md=%s data_sz=%lu pg_rank=%d", md_, data_sz_, pg_rank_));          \
         MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "        tag=%#lx ctx=%#lx rank=%ld match=%#lx",                            \
                                                 NPTL_MATCH_GET_TAG(match_), NPTL_MATCH_GET_CTX(match_), NPTL_MATCH_GET_RANK(match_), match_)); \
         MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "        flags=%c%c%c data_sz=%ld header=%#lx",                             \
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
index a6ef6e6..4a53a3d 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
@@ -6,6 +6,7 @@
 
 #include "ptl_impl.h"
 #include <pmi.h>
+#include "rptl.h"
 
 #ifdef ENABLE_CHECKPOINTING
 #error Checkpointing not implemented
@@ -18,11 +19,13 @@
 #define PTI_KEY  "PTI"
 #define PTIG_KEY "PTIG"
 #define PTIC_KEY "PTIC"
+#define PTIR_KEY "PTIR"
 
 ptl_handle_ni_t MPIDI_nem_ptl_ni;
 ptl_pt_index_t  MPIDI_nem_ptl_pt;
 ptl_pt_index_t  MPIDI_nem_ptl_get_pt; /* portal for gets by receiver */
 ptl_pt_index_t  MPIDI_nem_ptl_control_pt; /* portal for MPICH control messages */
+ptl_pt_index_t  MPIDI_nem_ptl_rpt_pt; /* portal for rportals control messages */
 ptl_handle_eq_t MPIDI_nem_ptl_eq;
 ptl_handle_md_t MPIDI_nem_ptl_global_md;
 ptl_ni_limits_t MPIDI_nem_ptl_ni_limits;
@@ -74,6 +77,54 @@ static MPIDI_Comm_ops_t comm_ops = {
 
 
 #undef FUNCNAME
+#define FUNCNAME get_target_info
+#undef FCNAME
+#define FCNAME MPIDI_QUOTE(FUNCNAME)
+static int get_target_info(int rank, ptl_process_t *id, ptl_pt_index_t local_data_pt, ptl_pt_index_t *target_data_pt,
+                           ptl_pt_index_t *target_control_pt)
+{
+    int mpi_errno = MPI_SUCCESS;
+    struct MPIDI_VC *vc;
+    MPID_nem_ptl_vc_area *vc_ptl;
+    MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_GET_TARGET_INFO);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_GET_TARGET_INFO);
+
+    MPIDI_PG_Get_vc(MPIDI_Process.my_pg, rank, &vc);
+    vc_ptl = VC_PTL(vc);
+    if (!vc_ptl->id_initialized) {
+        mpi_errno = MPID_nem_ptl_init_id(vc);
+        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+    }
+
+    *id = vc_ptl->id;
+
+    MPIU_Assert(local_data_pt == MPIDI_nem_ptl_pt || local_data_pt == MPIDI_nem_ptl_get_pt ||
+                local_data_pt == MPIDI_nem_ptl_control_pt);
+
+    if (local_data_pt == MPIDI_nem_ptl_pt) {
+        *target_data_pt = vc_ptl->pt;
+        *target_control_pt = vc_ptl->ptr;
+    }
+    else if (local_data_pt == MPIDI_nem_ptl_get_pt) {
+        *target_data_pt = vc_ptl->ptg;
+        *target_control_pt = PTL_PT_ANY;
+    }
+    else if (local_data_pt == MPIDI_nem_ptl_control_pt) {
+        *target_data_pt = vc_ptl->ptc;
+        *target_control_pt = PTL_PT_ANY;
+    }
+
+ fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_GET_TARGET_INFO);
+    return mpi_errno;
+
+ fn_fail:
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
 #define FUNCNAME ptl_init
 #undef FCNAME
 #define FCNAME MPIDI_QUOTE(FUNCNAME)
@@ -145,6 +196,11 @@ static int ptl_init(MPIDI_PG_t *pg_p, int pg_rank, char **bc_val_p, int *val_max
                      PTL_PT_ANY, &MPIDI_nem_ptl_control_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
 
+    /* allocate portal for rportals control messages */
+    ret = PtlPTAlloc(MPIDI_nem_ptl_ni, PTL_PT_ONLY_USE_ONCE | PTL_PT_ONLY_TRUNCATE | PTL_PT_FLOWCTRL, MPIDI_nem_ptl_eq,
+                     PTL_PT_ANY, &MPIDI_nem_ptl_rpt_pt);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
+
     /* create an MD that covers all of memory */
     md.start = 0;
     md.length = (ptl_size_t)-1;
@@ -154,6 +210,24 @@ static int ptl_init(MPIDI_PG_t *pg_p, int pg_rank, char **bc_val_p, int *val_max
     ret = PtlMDBind(MPIDI_nem_ptl_ni, &md, &MPIDI_nem_ptl_global_md);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmdbind", "**ptlmdbind %s", MPID_nem_ptl_strerror(ret));
 
+    /* currently, rportals only works with a single NI and EQ */
+    ret = MPID_nem_ptl_rptl_init(MPIDI_Process.my_pg->size, 5, get_target_info);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlniinit", "**ptlniinit %s", MPID_nem_ptl_strerror(ret));
+
+    /* allow rportal to manage the primary portal and retransmit if needed */
+    ret = MPID_nem_ptl_rptl_ptinit(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_eq, MPIDI_nem_ptl_pt, MPIDI_nem_ptl_rpt_pt);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
+
+    /* allow rportal to manage the get and control portals, but we
+     * don't expect retransmission to be needed on these portals, so
+     * we pass PTL_PT_ANY as the dummy portal.  unfortunately, portals
+     * does not have an "invalid" PT constant, which would have been
+     * more appropriate to pass over here. */
+    ret = MPID_nem_ptl_rptl_ptinit(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_eq, MPIDI_nem_ptl_get_pt, PTL_PT_ANY);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
+
+    ret = MPID_nem_ptl_rptl_ptinit(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_eq, MPIDI_nem_ptl_control_pt, PTL_PT_ANY);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptalloc", "**ptlptalloc %s", MPID_nem_ptl_strerror(ret));
 
     /* create business card */
     mpi_errno = get_business_card(pg_rank, bc_val_p, val_max_sz_p);
@@ -192,15 +266,30 @@ static int ptl_finalize(void)
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
 
     /* shut down portals */
+    ret = MPID_nem_ptl_rptl_drain_eq(1, &MPIDI_nem_ptl_eq);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptfree", "**ptlptfree %s", MPID_nem_ptl_strerror(ret));
+
+    ret = MPID_nem_ptl_rptl_ptfini(MPIDI_nem_ptl_pt);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptfree", "**ptlptfree %s", MPID_nem_ptl_strerror(ret));
+
     ret = PtlPTFree(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptfree", "**ptlptfree %s", MPID_nem_ptl_strerror(ret));
 
+    ret = MPID_nem_ptl_rptl_ptfini(MPIDI_nem_ptl_get_pt);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptfree", "**ptlptfree %s", MPID_nem_ptl_strerror(ret));
+
     ret = PtlPTFree(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptfree", "**ptlptfree %s", MPID_nem_ptl_strerror(ret));
 
+    ret = MPID_nem_ptl_rptl_ptfini(MPIDI_nem_ptl_control_pt);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptfree", "**ptlptfree %s", MPID_nem_ptl_strerror(ret));
+
     ret = PtlPTFree(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_control_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptfree", "**ptlptfree %s", MPID_nem_ptl_strerror(ret));
 
+    ret = PtlPTFree(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_rpt_pt);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptfree", "**ptlptfree %s", MPID_nem_ptl_strerror(ret));
+
     ret = PtlNIFini(MPIDI_nem_ptl_ni);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlnifini", "**ptlnifini %s", MPID_nem_ptl_strerror(ret));
 
@@ -262,6 +351,12 @@ static int get_business_card(int my_rank, char **bc_val_p, int *val_max_sz_p)
         MPIU_ERR_CHKANDJUMP(str_errno == MPIU_STR_NOMEM, mpi_errno, MPI_ERR_OTHER, "**buscard_len");
         MPIU_ERR_SETANDJUMP(mpi_errno, MPI_ERR_OTHER, "**buscard");
     }
+    str_errno = MPIU_Str_add_binary_arg(bc_val_p, val_max_sz_p, PTIR_KEY, (char *)&MPIDI_nem_ptl_rpt_pt,
+                                        sizeof(MPIDI_nem_ptl_rpt_pt));
+    if (str_errno) {
+        MPIU_ERR_CHKANDJUMP(str_errno == MPIU_STR_NOMEM, mpi_errno, MPI_ERR_OTHER, "**buscard_len");
+        MPIU_ERR_SETANDJUMP(mpi_errno, MPI_ERR_OTHER, "**buscard");
+    }
 
  fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_GET_BUSINESS_CARD);
@@ -345,7 +440,7 @@ static int vc_destroy(MPIDI_VC_t *vc)
 #define FUNCNAME MPID_nem_ptl_get_id_from_bc
 #undef FCNAME
 #define FCNAME MPIU_QUOTE(FUNCNAME)
-int MPID_nem_ptl_get_id_from_bc(const char *business_card, ptl_process_t *id, ptl_pt_index_t *pt, ptl_pt_index_t *ptg, ptl_pt_index_t *ptc)
+int MPID_nem_ptl_get_id_from_bc(const char *business_card, ptl_process_t *id, ptl_pt_index_t *pt, ptl_pt_index_t *ptg, ptl_pt_index_t *ptc, ptl_pt_index_t *ptr)
 {
     int mpi_errno = MPI_SUCCESS;
     int ret;
@@ -369,6 +464,9 @@ int MPID_nem_ptl_get_id_from_bc(const char *business_card, ptl_process_t *id, pt
     ret = MPIU_Str_get_binary_arg(business_card, PTIC_KEY, (char *)ptc, sizeof(ptc), &len);
     MPIU_ERR_CHKANDJUMP(ret != MPIU_STR_SUCCESS || len != sizeof(*ptc), mpi_errno, MPI_ERR_OTHER, "**badbusinesscard");
 
+    ret = MPIU_Str_get_binary_arg(business_card, PTIR_KEY, (char *)ptr, sizeof(ptr), &len);
+    MPIU_ERR_CHKANDJUMP(ret != MPIU_STR_SUCCESS || len != sizeof(*ptr), mpi_errno, MPI_ERR_OTHER, "**badbusinesscard");
+
  fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_GET_ID_FROM_BC);
     return mpi_errno;
@@ -461,7 +559,7 @@ int MPID_nem_ptl_init_id(MPIDI_VC_t *vc)
     mpi_errno = vc->pg->getConnInfo(vc->pg_rank, bc, val_max_sz, vc->pg);
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
 
-    mpi_errno = MPID_nem_ptl_get_id_from_bc(bc, &vc_ptl->id, &vc_ptl->pt, &vc_ptl->ptg, &vc_ptl->ptc);
+    mpi_errno = MPID_nem_ptl_get_id_from_bc(bc, &vc_ptl->id, &vc_ptl->pt, &vc_ptl->ptg, &vc_ptl->ptc, &vc_ptl->ptr);
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
 
     vc_ptl->id_initialized = TRUE;
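
Note on the PTIR_KEY lookup in MPID_nem_ptl_get_id_from_bc above: the added
call passes sizeof(ptr), the size of the pointer, as the buffer bound, while
the check on the following line compares the decoded length against
sizeof(*ptr). It mirrors the existing PTIC_KEY call, and is probably harmless
whenever the encoded value has the expected size, but the two sizes differ on
most 64-bit platforms. The standalone sketch below is not MPICH code:
pt_index_t, get_binary_arg and decode_pt are hypothetical stand-ins (the index
type is assumed to be an unsigned int) that show the bound and the length
check both using the size of the pointee.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ptl_pt_index_t; assumed to be an unsigned int. */
typedef unsigned int pt_index_t;

/* Hypothetical decoder: copies a stored value into buf if it fits in maxlen
 * bytes and reports the copied length.  Returns 0 on success, -1 otherwise. */
static int get_binary_arg(const void *stored, size_t stored_len,
                          void *buf, size_t maxlen, size_t *out_len)
{
    if (stored_len > maxlen)
        return -1;
    memcpy(buf, stored, stored_len);
    *out_len = stored_len;
    return 0;
}

/* Decode one index.  The buffer bound is sizeof(*pt), the size of the object
 * being written, not sizeof(pt), the size of the pointer to it. */
static int decode_pt(const void *stored, size_t stored_len, pt_index_t *pt)
{
    size_t len = 0;
    int ret = get_binary_arg(stored, stored_len, pt, sizeof(*pt), &len);

    if (ret != 0 || len != sizeof(*pt))
        return -1;
    return 0;
}
```

With the pointee size as the bound, an oversized value is rejected by the
decoder itself instead of relying on the later length comparison.
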
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
index 1032a2b..e461bbc 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
@@ -6,6 +6,7 @@
 
 #include "ptl_impl.h"
 #include <mpl_utlist.h>
+#include "rptl.h"
 
 #define NUM_SEND_BUFS 20
 #define NUM_RECV_BUFS 20
@@ -197,10 +198,10 @@ static inline int send_pkt(MPIDI_VC_t *vc, void **vhdr_p, void **vdata_p, MPIDI_
         if (len > PTL_MAX_EAGER)
             len = PTL_MAX_EAGER;
         MPIU_Memcpy(sb->buf.hp.payload, *data_p, len);
-        ret = PtlPut(MPIDI_nem_ptl_global_md, (ptl_size_t)sb->buf.p, sizeof(sb->buf.hp.hdr) + len, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->ptc, 0, 0, sb,
-                     MPIDI_Process.my_pg_rank);
+        ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)sb->buf.p, sizeof(sb->buf.hp.hdr) + len, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->ptc, 0, 0, sb,
+                                    MPIDI_Process.my_pg_rank, 1);
         MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
-        MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "PtlPut(size=%lu id=(%#x,%#x) pt=%#x) sb=%p",
+        MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "MPID_nem_ptl_rptl_put(size=%lu id=(%#x,%#x) pt=%#x) sb=%p",
                                                 sizeof(sb->buf.hp.hdr) + len, vc_ptl->id.phys.nid, vc_ptl->id.phys.pid,
                                                 vc_ptl->ptc, sb));
         *hdr_p = NULL;
@@ -214,9 +215,9 @@ static inline int send_pkt(MPIDI_VC_t *vc, void **vhdr_p, void **vdata_p, MPIDI_
             if (len > BUFLEN)
                 len = BUFLEN;
             MPIU_Memcpy(sb->buf.p, *data_p, len);
-            ret = PtlPut(MPIDI_nem_ptl_global_md, (ptl_size_t)sb->buf.p, len, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->ptc, 0, 0, sb, MPIDI_Process.my_pg_rank);
+            ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)sb->buf.p, len, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->ptc, 0, 0, sb, MPIDI_Process.my_pg_rank, 1);
             MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
-            MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "PtlPut(size=%lu id=(%#x,%#x) pt=%#x) sb=%p", len,
+            MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "MPID_nem_ptl_rptl_put(size=%lu id=(%#x,%#x) pt=%#x) sb=%p", len,
                                                     vc_ptl->id.phys.nid, vc_ptl->id.phys.pid, vc_ptl->ptc, sb));
             *data_p += len;
             *data_sz_p -= len;
@@ -265,10 +266,10 @@ static int send_noncontig_pkt(MPIDI_VC_t *vc, MPID_Request *sreq, void **vhdr_p,
         if (last > PTL_MAX_EAGER)
             last = PTL_MAX_EAGER;
         MPI_nem_ptl_pack_byte(sreq->dev.segment_ptr, 0, last, sb->buf.hp.payload, &REQ_PTL(sreq)->overflow[0]);
-        ret = PtlPut(MPIDI_nem_ptl_global_md, (ptl_size_t)sb->buf.p, sizeof(sb->buf.hp.hdr) + last, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->ptc, 0, 0, sb,
-                     MPIDI_Process.my_pg_rank);
+        ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)sb->buf.p, sizeof(sb->buf.hp.hdr) + last, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->ptc, 0, 0, sb,
+                                    MPIDI_Process.my_pg_rank, 1);
         MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
-        MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "PtlPut(size=%lu id=(%#x,%#x) pt=%#x) sb=%p",
+        MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "MPID_nem_ptl_rptl_put(size=%lu id=(%#x,%#x) pt=%#x) sb=%p",
                                                 sizeof(sb->buf.hp.hdr) + last, vc_ptl->id.phys.nid, vc_ptl->id.phys.pid,
                                                 vc_ptl->ptc, sb));
         *vhdr_p = NULL;
@@ -290,10 +291,10 @@ static int send_noncontig_pkt(MPIDI_VC_t *vc, MPID_Request *sreq, void **vhdr_p,
 
             MPI_nem_ptl_pack_byte(sreq->dev.segment_ptr, sreq->dev.segment_first, last, sb->buf.p, &REQ_PTL(sreq)->overflow[0]);
             sreq->dev.segment_first = last;
-            ret = PtlPut(MPIDI_nem_ptl_global_md, (ptl_size_t)sb->buf.p, last - sreq->dev.segment_first, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->ptc, 0, 0, sb,
-                         MPIDI_Process.my_pg_rank);
+            ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)sb->buf.p, last - sreq->dev.segment_first, PTL_NO_ACK_REQ, vc_ptl->id, vc_ptl->ptc, 0, 0, sb,
+                                        MPIDI_Process.my_pg_rank, 1);
             MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
-            MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "PtlPut(size=%lu id=(%#x,%#x) pt=%#x) sb=%p",
+            MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "MPID_nem_ptl_rptl_put(size=%lu id=(%#x,%#x) pt=%#x) sb=%p",
                                                     last - sreq->dev.segment_first, vc_ptl->id.phys.nid, vc_ptl->id.phys.pid,
                                                     vc_ptl->ptc, sb));
 
@@ -561,8 +562,8 @@ static int send_queued(void)
             send_len += last - sreq->dev.segment_first;
             sreq->dev.segment_first = last;
         }
-        ret = PtlPut(MPIDI_nem_ptl_global_md, (ptl_size_t)sb->buf.p, send_len, PTL_NO_ACK_REQ, VC_PTL(sreq->ch.vc)->id, VC_PTL(sreq->ch.vc)->ptc, 0, 0, sb,
-                     MPIDI_Process.my_pg_rank);
+        ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)sb->buf.p, send_len, PTL_NO_ACK_REQ, VC_PTL(sreq->ch.vc)->id, VC_PTL(sreq->ch.vc)->ptc, 0, 0, sb,
+                                    MPIDI_Process.my_pg_rank, 1);
         MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
 
         if (!complete)
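
Note on the PtlPut to MPID_nem_ptl_rptl_put conversions above: the header
comment in rptl.c, added later in this commit, says that every operation is
logged at the origin, that an ACK lets the entry be deleted, and that a NACK
marks it for retransmission. The sketch below illustrates only that
bookkeeping. It is not the rptl implementation: the fixed-size array and the
names op_log, log_issue, log_ack and log_nack are assumptions made for the
example.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_CAPACITY 8

/* State of one logged operation at the origin (hypothetical names). */
enum op_state { OP_FREE, OP_IN_FLIGHT, OP_NACKED };

struct op_log {
    enum op_state state[LOG_CAPACITY];
};

static void log_init(struct op_log *log)
{
    for (size_t i = 0; i < LOG_CAPACITY; i++)
        log->state[i] = OP_FREE;
}

/* Record a newly issued operation; returns its slot, or -1 if the log is full. */
static int log_issue(struct op_log *log)
{
    for (int i = 0; i < LOG_CAPACITY; i++) {
        if (log->state[i] == OP_FREE) {
            log->state[i] = OP_IN_FLIGHT;
            return i;
        }
    }
    return -1;
}

/* An ACK completes the operation, so its entry can be dropped. */
static void log_ack(struct op_log *log, int slot)
{
    log->state[slot] = OP_FREE;
}

/* A NACK keeps the entry so it can be retransmitted later. */
static void log_nack(struct op_log *log, int slot)
{
    log->state[slot] = OP_NACKED;
}

/* Operations still on the wire: neither ACKed nor NACKed yet. */
static int log_outstanding(const struct op_log *log)
{
    int n = 0;
    for (int i = 0; i < LOG_CAPACITY; i++)
        n += (log->state[i] == OP_IN_FLIGHT);
    return n;
}

/* Operations waiting for retransmission once the target unpauses. */
static int log_pending_retransmit(const struct op_log *log)
{
    int n = 0;
    for (int i = 0; i < LOG_CAPACITY; i++)
        n += (log->state[i] == OP_NACKED);
    return n;
}
```

In this model, log_outstanding() reaching zero corresponds to the point at
which the rptl.c comment says an origin may send its PAUSE-ACK.
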
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
index 8104eaf..60e07a0 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
@@ -5,6 +5,7 @@
  */
 
 #include "ptl_impl.h"
+#include "rptl.h"
 
 #define OVERFLOW_LENGTH (1024*1024)
 #define NUM_OVERFLOW_ME 8
@@ -130,7 +131,7 @@ int MPID_nem_ptl_poll(int is_blocking_poll)
     /* MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_POLL); */
 
     while (1) {
-        ret = PtlEQGet(MPIDI_nem_ptl_eq, &event);
+        ret = MPID_nem_ptl_rptl_eqget(MPIDI_nem_ptl_eq, &event);
         if (ret == PTL_EQ_EMPTY)
             break;
         MPIU_ERR_CHKANDJUMP(ret == PTL_EQ_DROPPED, mpi_errno, MPI_ERR_OTHER, "**eqdropped");
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
index 15a9345..fd694f5 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
@@ -5,6 +5,7 @@
  */
 
 #include "ptl_impl.h"
+#include "rptl.h"
 
 #undef FUNCNAME
 #define FUNCNAME dequeue_req
@@ -233,7 +234,7 @@ static int handler_recv_dequeue_large(const ptl_event_t *e)
     if (dt_contig) {
         /* recv buffer is contig */
         REQ_PTL(rreq)->event_handler = handler_recv_complete;
-        ret = PtlGet(MPIDI_nem_ptl_global_md, (ptl_size_t)((char *)rreq->dev.user_buf + dt_true_lb + PTL_LARGE_THRESHOLD),
+        ret = MPID_nem_ptl_rptl_get(MPIDI_nem_ptl_global_md, (ptl_size_t)((char *)rreq->dev.user_buf + dt_true_lb + PTL_LARGE_THRESHOLD),
                      data_sz - PTL_LARGE_THRESHOLD, vc_ptl->id, vc_ptl->ptg, e->match_bits, 0, rreq);
         DBG_MSG_GET("global", data_sz - PTL_LARGE_THRESHOLD, vc->pg_rank, e->match_bits);
         MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "   buf=%p", (char *)rreq->dev.user_buf + dt_true_lb + PTL_LARGE_THRESHOLD);
@@ -260,7 +261,7 @@ static int handler_recv_dequeue_large(const ptl_event_t *e)
         MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmdbind", "**ptlmdbind %s", MPID_nem_ptl_strerror(ret));
 
         REQ_PTL(rreq)->event_handler = handler_recv_complete;
-        ret = PtlGet(REQ_PTL(rreq)->md, 0, rreq->dev.segment_size - rreq->dev.segment_first, vc_ptl->id, vc_ptl->ptg,
+        ret = MPID_nem_ptl_rptl_get(REQ_PTL(rreq)->md, 0, rreq->dev.segment_size - rreq->dev.segment_first, vc_ptl->id, vc_ptl->ptg,
                      e->match_bits, 0, rreq);
         MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlget", "**ptlget %s", MPID_nem_ptl_strerror(ret));
         goto fn_exit;
@@ -271,7 +272,7 @@ static int handler_recv_dequeue_large(const ptl_event_t *e)
     MPIU_CHKPMEM_MALLOC(REQ_PTL(rreq)->chunk_buffer[0], void *, rreq->dev.segment_size - rreq->dev.segment_first, mpi_errno, "chunk_buffer");
 
     REQ_PTL(rreq)->event_handler = handler_recv_unpack_complete;
-    ret = PtlGet(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(rreq)->chunk_buffer[0],
+    ret = MPID_nem_ptl_rptl_get(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(rreq)->chunk_buffer[0],
                  rreq->dev.segment_size - rreq->dev.segment_first, vc_ptl->id, vc_ptl->ptg, e->match_bits, 0, rreq);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlget", "**ptlget %s", MPID_nem_ptl_strerror(ret));
 
@@ -331,7 +332,7 @@ static int handler_recv_dequeue_unpack_large(const ptl_event_t *e)
     MPIU_CHKPMEM_MALLOC(REQ_PTL(rreq)->chunk_buffer[0], void *, rreq->dev.segment_size - rreq->dev.segment_first, mpi_errno, "chunk_buffer");
 
     REQ_PTL(rreq)->event_handler = handler_recv_unpack_complete;
-    ret = PtlGet(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(rreq)->chunk_buffer[0],
+    ret = MPID_nem_ptl_rptl_get(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(rreq)->chunk_buffer[0],
                  rreq->dev.segment_size - rreq->dev.segment_first, vc_ptl->id, vc_ptl->ptg, e->match_bits, 0, rreq);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlget", "**ptlget %s", MPID_nem_ptl_strerror(ret));
 
@@ -645,7 +646,7 @@ int MPID_nem_ptl_lmt_start_recv(MPIDI_VC_t *vc,  MPID_Request *rreq, MPID_IOV s_
         void * real_user_buf = (char *)rreq->dev.user_buf + dt_true_lb;
 
         REQ_PTL(rreq)->event_handler = handler_recv_complete;
-        ret = PtlGet(MPIDI_nem_ptl_global_md, (ptl_size_t)((char *)real_user_buf + PTL_LARGE_THRESHOLD),
+        ret = MPID_nem_ptl_rptl_get(MPIDI_nem_ptl_global_md, (ptl_size_t)((char *)real_user_buf + PTL_LARGE_THRESHOLD),
                      data_sz - PTL_LARGE_THRESHOLD, vc_ptl->id, vc_ptl->ptg, match_bits, 0, rreq);
         DBG_MSG_GET("global", data_sz - PTL_LARGE_THRESHOLD, vc->pg_rank, match_bits);
         MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "   buf=%p", (char *)real_user_buf + PTL_LARGE_THRESHOLD);
@@ -686,7 +687,7 @@ int MPID_nem_ptl_lmt_start_recv(MPIDI_VC_t *vc,  MPID_Request *rreq, MPID_IOV s_
                                  MPID_nem_ptl_strerror(ret));
 
             REQ_PTL(rreq)->event_handler = handler_recv_complete;
-            ret = PtlGet(REQ_PTL(rreq)->md, 0, rreq->dev.segment_size, vc_ptl->id, vc_ptl->ptg,
+            ret = MPID_nem_ptl_rptl_get(REQ_PTL(rreq)->md, 0, rreq->dev.segment_size, vc_ptl->id, vc_ptl->ptg,
                          match_bits, PTL_LARGE_THRESHOLD, rreq);
             MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlget", "**ptlget %s",
                                  MPID_nem_ptl_strerror(ret));
@@ -697,7 +698,7 @@ int MPID_nem_ptl_lmt_start_recv(MPIDI_VC_t *vc,  MPID_Request *rreq, MPID_IOV s_
             MPIU_CHKPMEM_MALLOC(REQ_PTL(rreq)->chunk_buffer[0], void *, rreq->dev.segment_size,
                                 mpi_errno, "chunk_buffer");
             REQ_PTL(rreq)->event_handler = handler_recv_unpack_complete;
-            ret = PtlGet(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(rreq)->chunk_buffer[0],
+            ret = MPID_nem_ptl_rptl_get(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(rreq)->chunk_buffer[0],
                          rreq->dev.segment_size, vc_ptl->id, vc_ptl->ptg, match_bits,
                          PTL_LARGE_THRESHOLD, rreq);
             MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlget", "**ptlget %s",
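
Note on the target side of the flow control introduced by this commit: the
header comment in rptl.c, added later in this commit, describes a target that
sends PAUSE when its data portal is disabled, waits for a PAUSE-ACK from every
other process, then waits until at least half of the messages placed in the
overflow buffers have been dequeued before re-enabling the portal and sending
UNPAUSE. The sketch below models only that sequence as a small state machine.
The struct, the function names, and the way counters are reset on unpause are
assumptions for illustration, not the rptl code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical phases of the target-side protocol. */
enum target_phase { PHASE_ENABLED, PHASE_WAIT_PAUSE_ACKS, PHASE_WAIT_DRAIN };

struct target_fc {
    enum target_phase phase;
    int peers;            /* number of other processes */
    int pause_acks;       /* PAUSE-ACKs received so far */
    uint64_t injected;    /* messages placed in the overflow buffers */
    uint64_t dequeued;    /* messages dequeued by the user */
};

static void fc_init(struct target_fc *fc, int peers)
{
    fc->phase = PHASE_ENABLED;
    fc->peers = peers;
    fc->pause_acks = 0;
    fc->injected = 0;
    fc->dequeued = 0;
}

static void fc_on_message(struct target_fc *fc) { fc->injected++; }
static void fc_on_dequeue(struct target_fc *fc) { fc->dequeued++; }

/* Step 1: the portal was disabled; PAUSE goes out to all peers. */
static void fc_portal_disabled(struct target_fc *fc)
{
    fc->pause_acks = 0;
    fc->phase = (fc->peers == 0) ? PHASE_WAIT_DRAIN : PHASE_WAIT_PAUSE_ACKS;
}

/* Steps 2-3: count PAUSE-ACKs until every peer has quiesced. */
static void fc_on_pause_ack(struct target_fc *fc)
{
    if (fc->phase == PHASE_WAIT_PAUSE_ACKS && ++fc->pause_acks == fc->peers)
        fc->phase = PHASE_WAIT_DRAIN;
}

/* Steps 3-4: returns 1 when at least half of the injected messages have been
 * dequeued, meaning the portal can be re-enabled and UNPAUSE sent. */
static int fc_try_unpause(struct target_fc *fc)
{
    if (fc->phase != PHASE_WAIT_DRAIN)
        return 0;
    if (fc->dequeued * 2 < fc->injected)
        return 0;
    fc->injected -= fc->dequeued;   /* assumed: keep only undequeued messages */
    fc->dequeued = 0;
    fc->phase = PHASE_ENABLED;
    return 1;
}
```
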
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
index 72d64b3..a6fab84 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
@@ -5,6 +5,7 @@
  */
 
 #include "ptl_impl.h"
+#include "rptl.h"
 
 #undef FUNCNAME
 #define FUNCNAME handler_send_complete
@@ -101,7 +102,7 @@ static int handler_pack_chunk(const ptl_event_t *e)
     sreq->dev.segment_first += PTL_LARGE_THRESHOLD;
 
     /* notify receiver */
-    ret = PtlPut(MPIDI_nem_ptl_global_md, 0, 0, PTL_ACK_REQ, vc_ptl->id,
+    ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, 0, 0, PTL_ACK_REQ, vc_ptl->id,
                  vc_ptl->pt, ?????, 0, sreq,
                  NPTL_HEADER(?????, MPIDI_Process.my_pg_rank, me.match_bits));
 
@@ -208,9 +209,9 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
             MPIU_DBG_MSG(CH3_CHANNEL, VERBOSE, "Small contig message");
             REQ_PTL(sreq)->event_handler = handler_send_complete;
             MPIU_DBG_MSG_P(CH3_CHANNEL, VERBOSE, "&REQ_PTL(sreq)->event_handler = %p", &(REQ_PTL(sreq)->event_handler));
-            ret = PtlPut(MPIDI_nem_ptl_global_md, (ptl_size_t)((char *)buf + dt_true_lb), data_sz, PTL_ACK_REQ, vc_ptl->id, vc_ptl->pt,
+            ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)((char *)buf + dt_true_lb), data_sz, PTL_ACK_REQ, vc_ptl->id, vc_ptl->pt,
                          NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
-                         NPTL_HEADER(ssend_flag, data_sz));
+                                        NPTL_HEADER(ssend_flag, data_sz), 1);
             MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
             DBG_MSG_PUT("global", data_sz, vc->pg_rank, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), NPTL_HEADER(ssend_flag, data_sz));
             MPIU_DBG_MSG_D(CH3_CHANNEL, VERBOSE, "id.nid = %#x", vc_ptl->id.phys.nid);
@@ -245,9 +246,9 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
             MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlmdbind", "**ptlmdbind %s", MPID_nem_ptl_strerror(ret));
                 
             REQ_PTL(sreq)->event_handler = handler_send_complete;
-            ret = PtlPut(REQ_PTL(sreq)->md, 0, data_sz, PTL_ACK_REQ, vc_ptl->id, vc_ptl->pt,
+            ret = MPID_nem_ptl_rptl_put(REQ_PTL(sreq)->md, 0, data_sz, PTL_ACK_REQ, vc_ptl->id, vc_ptl->pt,
                          NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
-                         NPTL_HEADER(ssend_flag, data_sz));
+                                        NPTL_HEADER(ssend_flag, data_sz), 1);
             MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
             DBG_MSG_PUT("sreq", data_sz, vc->pg_rank, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), NPTL_HEADER(ssend_flag, data_sz));
             goto fn_exit;
@@ -262,9 +263,9 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
         MPID_Segment_pack(sreq->dev.segment_ptr, sreq->dev.segment_first, &last, REQ_PTL(sreq)->chunk_buffer[0]);
         MPIU_Assert(last == sreq->dev.segment_size);
         REQ_PTL(sreq)->event_handler = handler_send_complete;
-        ret = PtlPut(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(sreq)->chunk_buffer[0], data_sz, PTL_ACK_REQ,
+        ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(sreq)->chunk_buffer[0], data_sz, PTL_ACK_REQ,
                      vc_ptl->id, vc_ptl->pt, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
-                     NPTL_HEADER(ssend_flag, data_sz));
+                                    NPTL_HEADER(ssend_flag, data_sz), 1);
         MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
         DBG_MSG_PUT("global", data_sz, vc->pg_rank, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), NPTL_HEADER(ssend_flag, data_sz));
         goto fn_exit;
@@ -292,9 +293,9 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
         REQ_PTL(sreq)->large = TRUE;
             
         REQ_PTL(sreq)->event_handler = handler_large;
-        ret = PtlPut(MPIDI_nem_ptl_global_md, (ptl_size_t)((char *)buf + dt_true_lb), PTL_LARGE_THRESHOLD, PTL_ACK_REQ, vc_ptl->id, vc_ptl->pt,
+        ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)((char *)buf + dt_true_lb), PTL_LARGE_THRESHOLD, PTL_ACK_REQ, vc_ptl->id, vc_ptl->pt,
                      NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
-                     NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz));
+                                    NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz), 1);
         MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
         DBG_MSG_PUT("global", PTL_LARGE_THRESHOLD, vc->pg_rank, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz));
         goto fn_exit;
@@ -359,9 +360,9 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
                 REQ_PTL(sreq)->large = TRUE;
                         
                 REQ_PTL(sreq)->event_handler = handler_large;
-                ret = PtlPut(REQ_PTL(sreq)->md, 0, PTL_LARGE_THRESHOLD, PTL_ACK_REQ, vc_ptl->id, vc_ptl->pt,
+                ret = MPID_nem_ptl_rptl_put(REQ_PTL(sreq)->md, 0, PTL_LARGE_THRESHOLD, PTL_ACK_REQ, vc_ptl->id, vc_ptl->pt,
                              NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
-                             NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz));
+                                            NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz), 1);
                 MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
                 DBG_MSG_PUT("req", PTL_LARGE_THRESHOLD, vc->pg_rank, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz));
                 goto fn_exit;
@@ -397,9 +398,9 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
     REQ_PTL(sreq)->large = TRUE;
     
     REQ_PTL(sreq)->event_handler = handler_large;
-    ret = PtlPut(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(sreq)->chunk_buffer[0], PTL_LARGE_THRESHOLD, PTL_ACK_REQ,
+    ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(sreq)->chunk_buffer[0], PTL_LARGE_THRESHOLD, PTL_ACK_REQ,
                  vc_ptl->id, vc_ptl->pt, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
-                 NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz));
+                                NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz), 1);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
     DBG_MSG_PUT("global", PTL_LARGE_THRESHOLD, vc->pg_rank, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), NPTL_HEADER(ssend_flag | NPTL_LARGE, data_sz));
     goto fn_exit;
@@ -438,9 +439,9 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
     REQ_PTL(sreq)->large = TRUE;
                         
     REQ_PTL(sreq)->event_handler = handler_large_multi;
-    ret = PtlPut(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(sreq_)->chunk_buffer[0], PTL_LARGE_THRESHOLD, PTL_ACK_REQ, vc_ptl->id,
+    ret = MPID_nem_ptl_rptl_put(MPIDI_nem_ptl_global_md, (ptl_size_t)REQ_PTL(sreq_)->chunk_buffer[0], PTL_LARGE_THRESHOLD, PTL_ACK_REQ, vc_ptl->id,
                  vc_ptl->pt, NPTL_MATCH(tag, comm->context_id + context_offset, comm->rank), 0, sreq,
-                 NPTL_HEADER(ssend_flag | NPTL_LARGE | NPTL_MULTIPLE, data_sz));
+                                NPTL_HEADER(ssend_flag | NPTL_LARGE | NPTL_MULTIPLE, data_sz), 1);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlput", "**ptlput %s", MPID_nem_ptl_strerror(ret));
 #endif
     
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
new file mode 100644
index 0000000..da126e4
--- /dev/null
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.c
@@ -0,0 +1,1272 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *  (C) 2014 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+
+#include "ptl_impl.h"
+#include "rptl.h"
+
+/*
+ * Prereqs:
+ *
+ * 1. We create an extra control portal that is only used by rportals.
+ *
+ * 2. All communication operations are logged at the origin process,
+ * and their ACKs and NACKs are kept track of.  If an operation gets
+ * an ACK, it is complete and can be deleted from the logs.  If an
+ * operation gets a NACK, it will need to be retransmitted once the
+ * flow-control protocol described below has completed.
+ *
+ *
+ * Flow control algorithm:
+ *
+ * 1. When the primary data portal gets disabled, the target sends
+ * PAUSE messages to all other processes.
+ *
+ * 2. Once each process confirms that it has no outstanding packets on
+ * the wire (i.e., all packets have either been ACKed or NACKed), it
+ * sends a PAUSE-ACK message.
+ *
+ * 3. When the target receives PAUSE-ACK messages from all processes
+ * (thus confirming that the network traffic to itself has been
+ * quiesced), it waits till the user has dequeued at least half the
+ * messages from the overflow buffer.  This is done by keeping track
+ * of the number of messages that are injected into the overflow
+ * buffer by portals and the number of messages that are dequeued by
+ * the user.
+ *
+ * 4. Once we know that there is enough free space in the overflow
+ * buffers, the target reenables the portal and sends an UNPAUSE
+ * message to all processes.
+ *
+ *
+ * Known issues:
+ *
+ * 1. None of the error codes specified by portals allow us to return
+ * an "OTHER" error, when something bad happens internally.  So we
+ * arbitrarily return PTL_FAIL when it is an internal error even
+ * though that's not a specified error return code for some portals
+ * functions.  When portals functions are called internally, if they
+ * return an error, we funnel them back upstream.  This is not an
+ * "issue" per se, but is still ugly.
+ *
+ * 2. None of the pt index types specified by portals allow us to
+ * return an "INVALID" pt entry, to show that a portal is invalid.  So
+ * we arbitrarily use PTL_PT_ANY in such cases.  Again, this is not an
+ * "issue" per se, but is ugly.
+ */
+
+#define IDS_ARE_EQUAL(t1, t2) \
+    (t1.phys.nid == t2.phys.nid && t1.phys.pid == t2.phys.pid)
+
+#define RPTL_OP_POOL_SEGMENT_COUNT  (1024)
+
+static struct {
+    struct rptl *rptl_list;
+
+    struct rptl_op_pool_segment {
+        struct rptl_op op[RPTL_OP_POOL_SEGMENT_COUNT];
+        struct rptl_op_pool_segment *next;
+        struct rptl_op_pool_segment *prev;
+    } *op_segment_list;
+    struct rptl_op *op_pool;
+
+    struct rptl_op *op_list;
+
+    /* targets that we do not send messages to either because they
+     * sent a PAUSE message or because we received a NACK from them */
+    struct rptl_paused_target {
+        ptl_process_t id;
+        enum rptl_paused_target_state {
+            RPTL_TARGET_STATE_FLOWCONTROL,
+            RPTL_TARGET_STATE_DISABLED,
+            RPTL_TARGET_STATE_RECEIVED_PAUSE,
+            RPTL_TARGET_STATE_PAUSE_ACKED
+        } state;
+
+        /* the rptl on which the pause message came in, since we need
+         * to use it to send the pause ack to the right target
+         * portal */
+        struct rptl *rptl;
+
+        struct rptl_paused_target *next;
+        struct rptl_paused_target *prev;
+    } *paused_target_list;
+
+    int world_size;
+    uint64_t origin_events_left;
+    int (*get_target_info) (int rank, ptl_process_t * id, ptl_pt_index_t local_data_pt,
+                            ptl_pt_index_t * target_data_pt, ptl_pt_index_t * target_control_pt);
+} rptl_info;
+
+
+#undef FUNCNAME
+#define FUNCNAME alloc_target
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+static int alloc_target(ptl_process_t id, enum rptl_paused_target_state state, struct rptl *rptl)
+{
+    int mpi_errno = MPI_SUCCESS;
+    int ret = PTL_OK;
+    struct rptl_paused_target *target;
+    MPIU_CHKPMEM_DECL(1);
+    MPIDI_STATE_DECL(MPID_STATE_ALLOC_TARGET);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_ALLOC_TARGET);
+
+    for (target = rptl_info.paused_target_list; target; target = target->next)
+        if (IDS_ARE_EQUAL(target->id, id))
+            break;
+
+    /* if a paused target does not already exist, create one */
+    if (target == NULL) {
+        /* create a new paused target */
+        MPIU_CHKPMEM_MALLOC(target, struct rptl_paused_target *, sizeof(struct rptl_paused_target),
+                            mpi_errno, "rptl paused target");
+        MPL_DL_APPEND(rptl_info.paused_target_list, target);
+
+        target->id = id;
+        target->state = state;
+        target->rptl = rptl;
+    }
+    else if (target->state < state) {
+        target->state = state;
+        target->rptl = rptl;
+    }
+    else {
+        /* target already exists and is in a higher state than the
+         * state we are trying to set.  e.g., this is possible if we
+         * got a PAUSE event from a different portal and acked. */
+    }
+
+  fn_exit:
+    MPIU_CHKPMEM_COMMIT();
+    MPIDI_FUNC_EXIT(MPID_STATE_ALLOC_TARGET);
+    return ret;
+
+  fn_fail:
+    if (mpi_errno)
+        ret = PTL_FAIL;
+    MPIU_CHKPMEM_REAP();
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME alloc_op_segment
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+static int alloc_op_segment(void)
+{
+    struct rptl_op_pool_segment *op_segment;
+    int mpi_errno = MPI_SUCCESS;
+    int i;
+    int ret = PTL_OK;
+    MPIU_CHKPMEM_DECL(1);
+    MPIDI_STATE_DECL(MPID_STATE_ALLOC_OP_SEGMENT);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_ALLOC_OP_SEGMENT);
+
+    MPIU_CHKPMEM_MALLOC(op_segment, struct rptl_op_pool_segment *, sizeof(struct rptl_op_pool_segment),
+                        mpi_errno, "op pool segment");
+    MPL_DL_APPEND(rptl_info.op_segment_list, op_segment);
+
+    for (i = 0; i < RPTL_OP_POOL_SEGMENT_COUNT; i++)
+        MPL_DL_APPEND(rptl_info.op_pool, &op_segment->op[i]);
+
+  fn_exit:
+    MPIU_CHKPMEM_COMMIT();
+    MPIDI_FUNC_EXIT(MPID_STATE_ALLOC_OP_SEGMENT);
+    return ret;
+
+  fn_fail:
+    if (mpi_errno)
+        ret = PTL_FAIL;
+    MPIU_CHKPMEM_REAP();
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME MPID_nem_ptl_rptl_init
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int MPID_nem_ptl_rptl_init(int world_size, uint64_t max_origin_events,
+                           int (*get_target_info) (int rank, ptl_process_t * id,
+                                                   ptl_pt_index_t local_data_pt,
+                                                   ptl_pt_index_t * target_data_pt,
+                                                   ptl_pt_index_t * target_control_pt))
+{
+    int mpi_errno = MPI_SUCCESS;
+    int ret = PTL_OK;
+    MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_RPTL_INIT);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_RPTL_INIT);
+
+    rptl_info.rptl_list = NULL;
+
+    rptl_info.op_pool = NULL;
+    ret = alloc_op_segment();
+    RPTLU_ERR_POP(ret, "error allocating op segment\n");
+
+    rptl_info.op_list = NULL;
+
+    rptl_info.paused_target_list = NULL;
+    rptl_info.world_size = world_size;
+    rptl_info.origin_events_left = max_origin_events;
+    rptl_info.get_target_info = get_target_info;
+
+  fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_RPTL_INIT);
+    return ret;
+
+  fn_fail:
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME MPID_nem_ptl_rptl_drain_eq
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int MPID_nem_ptl_rptl_drain_eq(int eq_count, ptl_handle_eq_t *eq)
+{
+    int ret = PTL_OK;
+    ptl_event_t event;
+    struct rptl_op_pool_segment *op_segment;
+    int i;
+    MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_RPTL_FINALIZE);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_RPTL_FINALIZE);
+
+    while (rptl_info.op_list) {
+        for (i = 0; i < eq_count; i++) {
+            /* read and ignore all events */
+            ret = MPID_nem_ptl_rptl_eqget(eq[i], &event);
+            if (ret == PTL_EQ_EMPTY)
+                ret = PTL_OK;
+            RPTLU_ERR_POP(ret, "Error calling MPID_nem_ptl_rptl_eqget\n");
+        }
+    }
+
+    while (rptl_info.op_segment_list) {
+        op_segment = rptl_info.op_segment_list;
+        MPL_DL_DELETE(rptl_info.op_segment_list, op_segment);
+        MPIU_Free(op_segment);
+    }
+
+  fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_RPTL_FINALIZE);
+    return ret;
+
+  fn_fail:
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME post_empty_buffer
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+static inline int post_empty_buffer(ptl_handle_ni_t ni_handle, ptl_pt_index_t pt,
+                                    ptl_handle_me_t * me_handle)
+{
+    int ret;
+    ptl_me_t me;
+    ptl_process_t id;
+    MPIDI_STATE_DECL(MPID_STATE_POST_EMPTY_BUFFER);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_POST_EMPTY_BUFFER);
+
+    id.phys.nid = PTL_NID_ANY;
+    id.phys.pid = PTL_PID_ANY;
+
+    me.start = NULL;
+    me.length = 0;
+    me.ct_handle = PTL_CT_NONE;
+    me.uid = PTL_UID_ANY;
+    me.options = (PTL_ME_OP_PUT | PTL_ME_OP_GET | PTL_ME_USE_ONCE | PTL_ME_IS_ACCESSIBLE |
+                  PTL_ME_EVENT_LINK_DISABLE | PTL_ME_EVENT_UNLINK_DISABLE);
+    me.match_id = id;
+    me.match_bits = 0;
+    me.ignore_bits = 0;
+    me.min_free = 0;
+
+    ret = PtlMEAppend(ni_handle, pt, &me, PTL_PRIORITY_LIST, NULL, me_handle);
+    RPTLU_ERR_POP(ret, "Error appending empty buffer to priority list\n");
+
+  fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_POST_EMPTY_BUFFER);
+    return ret;
+
+  fn_fail:
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME MPID_nem_ptl_rptl_ptinit
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int MPID_nem_ptl_rptl_ptinit(ptl_handle_ni_t ni_handle, ptl_handle_eq_t eq_handle, ptl_pt_index_t data_pt,
+                             ptl_pt_index_t control_pt)
+{
+    int ret = PTL_OK;
+    struct rptl *rptl;
+    int mpi_errno = MPI_SUCCESS;
+    int i;
+    ptl_md_t md;
+    MPIU_CHKPMEM_DECL(2);
+    MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_RPTL_PTINIT);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_RPTL_PTINIT);
+
+
+    /* set up the parts of rptls that can be done before world size or
+     * target information */
+    MPIU_CHKPMEM_MALLOC(rptl, struct rptl *, sizeof(struct rptl), mpi_errno, "rptl");
+    MPL_DL_APPEND(rptl_info.rptl_list, rptl);
+
+    rptl->local_state = RPTL_LOCAL_STATE_NORMAL;
+    rptl->pause_ack_counter = 0;
+
+    rptl->data.ob_max_count = 0;
+    rptl->data.ob_curr_count = 0;
+
+    rptl->data.pt = data_pt;
+    rptl->control.pt = control_pt;
+
+    rptl->ni = ni_handle;
+    rptl->eq = eq_handle;
+
+    md.start = 0;
+    md.length = (ptl_size_t) (-1);
+    md.options = 0x0;
+    md.eq_handle = rptl->eq;
+    md.ct_handle = PTL_CT_NONE;
+    ret = PtlMDBind(rptl->ni, &md, &rptl->md);
+    RPTLU_ERR_POP(ret, "Error binding new global MD\n");
+
+    /* post 2 * world_size empty buffers on the control portal */
+    if (rptl->control.pt != PTL_PT_ANY) {
+        MPIU_CHKPMEM_MALLOC(rptl->control.me, ptl_handle_me_t *,
+                            2 * rptl_info.world_size * sizeof(ptl_handle_me_t), mpi_errno,
+                            "rptl target info");
+        for (i = 0; i < 2 * rptl_info.world_size; i++) {
+            ret = post_empty_buffer(rptl->ni, rptl->control.pt, &rptl->control.me[i]);
+            RPTLU_ERR_POP(ret, "Error in post_empty_buffer\n");
+        }
+        rptl->control.me_idx = 0;
+    }
+
+  fn_exit:
+    MPIU_CHKPMEM_COMMIT();
+    MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_RPTL_PTINIT);
+    return ret;
+
+  fn_fail:
+    if (mpi_errno)
+        ret = PTL_FAIL;
+    MPIU_CHKPMEM_REAP();
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME MPID_nem_ptl_rptl_ptfini
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int MPID_nem_ptl_rptl_ptfini(ptl_pt_index_t pt_index)
+{
+    int i;
+    int ret = PTL_OK;
+    struct rptl *rptl;
+    MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_RPTL_PTFINI);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_RPTL_PTFINI);
+
+    /* find the right rptl */
+    for (rptl = rptl_info.rptl_list; rptl && rptl->data.pt != pt_index; rptl = rptl->next);
+    assert(rptl);
+
+    /* unlink and free the control buffers that were posted */
+    if (rptl->control.pt != PTL_PT_ANY) {
+        for (i = 0; i < rptl_info.world_size * 2; i++) {
+            ret = PtlMEUnlink(rptl->control.me[i]);
+            RPTLU_ERR_POP(ret, "Error unlinking control buffers\n");
+        }
+        MPIU_Free(rptl->control.me);
+    }
+
+    MPL_DL_DELETE(rptl_info.rptl_list, rptl);
+    MPIU_Free(rptl);
+
+  fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_RPTL_PTFINI);
+    return ret;
+
+  fn_fail:
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME alloc_op
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int alloc_op(struct rptl_op **op)
+{
+    int ret = PTL_OK;
+    MPIDI_STATE_DECL(MPID_STATE_ALLOC_OP);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_ALLOC_OP);
+
+    if (rptl_info.op_pool == NULL) {
+        ret = alloc_op_segment();
+        RPTLU_ERR_POP(ret, "error allocating op segment\n");
+    }
+
+    *op = rptl_info.op_pool;
+    MPL_DL_DELETE(rptl_info.op_pool, *op);
+
+  fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_ALLOC_OP);
+    return ret;
+
+  fn_fail:
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME free_op
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+void free_op(struct rptl_op *op)
+{
+    MPIDI_STATE_DECL(MPID_STATE_FREE_OP);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_FREE_OP);
+
+    MPL_DL_APPEND(rptl_info.op_pool, op);
+
+    MPIDI_FUNC_EXIT(MPID_STATE_FREE_OP);
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME issue_op
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int issue_op(struct rptl_op *op)
+{
+    int ret = PTL_OK;
+    struct rptl_paused_target *target;
+    MPIDI_STATE_DECL(MPID_STATE_ISSUE_OP);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_ISSUE_OP);
+
+    if (op->op_type == RPTL_OP_PUT) {
+        for (target = rptl_info.paused_target_list; target; target = target->next)
+            if (IDS_ARE_EQUAL(target->id, op->u.put.target_id))
+                break;
+
+        if (target && op->u.put.flow_control)
+            goto fn_exit;
+
+        if (rptl_info.origin_events_left < 2) {
+            ret = alloc_target(op->u.put.target_id, RPTL_TARGET_STATE_FLOWCONTROL, NULL);
+            RPTLU_ERR_POP(ret, "error allocating paused target\n");
+            goto fn_exit;
+        }
+        rptl_info.origin_events_left -= 2;
+
+        /* force request for an ACK even if the user didn't ask for
+         * it.  replace the user pointer with the OP id. */
+        ret =
+            PtlPut(op->u.put.md_handle, op->u.put.local_offset, op->u.put.length,
+                   PTL_ACK_REQ, op->u.put.target_id, op->u.put.pt_index,
+                   op->u.put.match_bits, op->u.put.remote_offset, op,
+                   op->u.put.hdr_data);
+        RPTLU_ERR_POP(ret, "Error issuing PUT\n");
+    }
+    else {
+        for (target = rptl_info.paused_target_list; target; target = target->next)
+            if (IDS_ARE_EQUAL(target->id, op->u.get.target_id))
+                break;
+
+        if (target)
+            goto fn_exit;
+
+        if (rptl_info.origin_events_left < 1) {
+            ret = alloc_target(op->u.get.target_id, RPTL_TARGET_STATE_FLOWCONTROL, NULL);
+            RPTLU_ERR_POP(ret, "error allocating paused target\n");
+            goto fn_exit;
+        }
+        rptl_info.origin_events_left--;
+
+        ret =
+            PtlGet(op->u.get.md_handle, op->u.get.local_offset, op->u.get.length,
+                   op->u.get.target_id, op->u.get.pt_index, op->u.get.match_bits,
+                   op->u.get.remote_offset, op);
+        RPTLU_ERR_POP(ret, "Error issuing GET\n");
+    }
+
+    op->state = RPTL_OP_STATE_ISSUED;
+
+  fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_ISSUE_OP);
+    return ret;
+
+  fn_fail:
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME MPID_nem_ptl_rptl_put
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int MPID_nem_ptl_rptl_put(ptl_handle_md_t md_handle, ptl_size_t local_offset, ptl_size_t length,
+                          ptl_ack_req_t ack_req, ptl_process_t target_id, ptl_pt_index_t pt_index,
+                          ptl_match_bits_t match_bits, ptl_size_t remote_offset, void *user_ptr,
+                          ptl_hdr_data_t hdr_data, int flow_control)
+{
+    struct rptl_op *op;
+    int ret = PTL_OK;
+    MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_RPTL_PUT);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_RPTL_PUT);
+
+    ret = alloc_op(&op);
+    RPTLU_ERR_POP(ret, "error allocating op\n");
+
+    op->op_type = RPTL_OP_PUT;
+    op->state = RPTL_OP_STATE_QUEUED;
+
+    /* store the user parameters */
+    op->u.put.md_handle = md_handle;
+    op->u.put.local_offset = local_offset;
+    op->u.put.length = length;
+    op->u.put.ack_req = ack_req;
+    op->u.put.target_id = target_id;
+    op->u.put.pt_index = pt_index;
+    op->u.put.match_bits = match_bits;
+    op->u.put.remote_offset = remote_offset;
+    op->u.put.user_ptr = user_ptr;
+    op->u.put.hdr_data = hdr_data;
+
+    /* place to store the send and ack events */
+    op->u.put.send = NULL;
+    op->u.put.ack = NULL;
+    op->u.put.flow_control = flow_control;
+    op->events_ready = 0;
+
+    MPL_DL_APPEND(rptl_info.op_list, op);
+
+    /* if we are not in a PAUSED state, issue the operation */
+    ret = issue_op(op);
+    RPTLU_ERR_POP(ret, "Error from issue_op\n");
+
+  fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_RPTL_PUT);
+    return ret;
+
+  fn_fail:
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME MPID_nem_ptl_rptl_get
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int MPID_nem_ptl_rptl_get(ptl_handle_md_t md_handle, ptl_size_t local_offset, ptl_size_t length,
+                          ptl_process_t target_id, ptl_pt_index_t pt_index,
+                          ptl_match_bits_t match_bits, ptl_size_t remote_offset, void *user_ptr)
+{
+    struct rptl_op *op;
+    int ret = PTL_OK;
+    struct rptl_paused_target *target;
+    MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_RPTL_GET);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_RPTL_GET);
+
+    ret = alloc_op(&op);
+    RPTLU_ERR_POP(ret, "error allocating op\n");
+
+    op->op_type = RPTL_OP_GET;
+    op->state = RPTL_OP_STATE_QUEUED;
+
+    /* store the user parameters */
+    op->u.get.md_handle = md_handle;
+    op->u.get.local_offset = local_offset;
+    op->u.get.length = length;
+    op->u.get.target_id = target_id;
+    op->u.get.pt_index = pt_index;
+    op->u.get.match_bits = match_bits;
+    op->u.get.remote_offset = remote_offset;
+    op->u.get.user_ptr = user_ptr;
+
+    op->events_ready = 0;
+
+    MPL_DL_APPEND(rptl_info.op_list, op);
+
+    for (target = rptl_info.paused_target_list; target; target = target->next)
+        if (IDS_ARE_EQUAL(target->id, target_id))
+            break;
+
+    ret = issue_op(op);
+    RPTLU_ERR_POP(ret, "Error from issue_op\n");
+
+  fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_RPTL_GET);
+    return ret;
+
+  fn_fail:
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME send_pause_messages
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+static int send_pause_messages(struct rptl *rptl)
+{
+    int i, mpi_errno = MPI_SUCCESS;
+    ptl_process_t id;
+    ptl_pt_index_t data_pt, control_pt;
+    int ret = PTL_OK;
+    MPIDI_STATE_DECL(MPID_STATE_SEND_PAUSE_MESSAGES);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_SEND_PAUSE_MESSAGES);
+
+    /* if no control portal is set up for this rptl, we are doomed */
+    assert(rptl->control.pt != PTL_PT_ANY);
+
+    /* set the max message count in the overflow buffers we can keep
+     * before sending the unpause messages */
+    rptl->data.ob_max_count = rptl->data.ob_curr_count / 2;
+
+    for (i = 0; i < rptl_info.world_size; i++) {
+        mpi_errno = rptl_info.get_target_info(i, &id, rptl->data.pt, &data_pt, &control_pt);
+        if (mpi_errno) {
+            ret = PTL_FAIL;
+            RPTLU_ERR_POP(ret, "Error getting target info while sending pause messages\n");
+        }
+
+        /* disable flow control for control messages */
+        ret = MPID_nem_ptl_rptl_put(rptl->md, 0, 0, PTL_NO_ACK_REQ, id, control_pt, 0, 0,
+                                    NULL, RPTL_CONTROL_MSG_PAUSE, 0);
+        RPTLU_ERR_POP(ret, "Error sending pause message\n");
+    }
+
+  fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_SEND_PAUSE_MESSAGES);
+    return ret;
+
+  fn_fail:
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME send_pause_ack_messages
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+static int send_pause_ack_messages(void)
+{
+    struct rptl_op *op;
+    int ret = PTL_OK;
+    struct rptl_paused_target *target;
+    MPIDI_STATE_DECL(MPID_STATE_SEND_PAUSE_ACK_MESSAGES);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_SEND_PAUSE_ACK_MESSAGES);
+
+    for (target = rptl_info.paused_target_list; target; target = target->next) {
+        if (target->state != RPTL_TARGET_STATE_RECEIVED_PAUSE)
+            continue;
+
+        for (op = rptl_info.op_list; op; op = op->next) {
+            if (op->op_type == RPTL_OP_GET && IDS_ARE_EQUAL(op->u.get.target_id, target->id) &&
+                op->state == RPTL_OP_STATE_ISSUED)
+                break;
+
+            if (op->op_type == RPTL_OP_PUT && IDS_ARE_EQUAL(op->u.put.target_id, target->id)) {
+                if (op->state == RPTL_OP_STATE_ISSUED)
+                    break;
+                if (op->u.put.send || op->u.put.ack)
+                    break;
+            }
+        }
+
+        if (op == NULL) {
+            ptl_process_t id;
+            ptl_pt_index_t data_pt, control_pt;
+            int i;
+            int mpi_errno = MPI_SUCCESS;
+
+            for (i = 0; i < rptl_info.world_size; i++) {
+                /* find the target that has this target id and get the
+                 * control portal information for it */
+                mpi_errno = rptl_info.get_target_info(i, &id, target->rptl->data.pt, &data_pt, &control_pt);
+                if (mpi_errno) {
+                    ret = PTL_FAIL;
+                    RPTLU_ERR_POP(ret,
+                                  "Error getting target info while sending pause ack message\n");
+                }
+                if (IDS_ARE_EQUAL(id, target->id))
+                    break;
+            }
+
+            /* disable flow control for control messages */
+            ret =
+                MPID_nem_ptl_rptl_put(target->rptl->md, 0, 0, PTL_NO_ACK_REQ, id, control_pt, 0,
+                                      0, NULL, RPTL_CONTROL_MSG_PAUSE_ACK, 0);
+            RPTLU_ERR_POP(ret, "Error sending pause ack message\n");
+
+            if (target->state == RPTL_TARGET_STATE_RECEIVED_PAUSE)
+                target->state = RPTL_TARGET_STATE_PAUSE_ACKED;
+        }
+    }
+
+  fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_SEND_PAUSE_ACK_MESSAGES);
+    return ret;
+
+  fn_fail:
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME send_unpause_messages
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+static int send_unpause_messages(void)
+{
+    int i, mpi_errno = MPI_SUCCESS;
+    ptl_process_t id;
+    ptl_pt_index_t data_pt, control_pt;
+    int ret = PTL_OK;
+    struct rptl *rptl;
+    MPIDI_STATE_DECL(MPID_STATE_SEND_UNPAUSE_MESSAGES);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_SEND_UNPAUSE_MESSAGES);
+
+    for (rptl = rptl_info.rptl_list; rptl; rptl = rptl->next) {
+        assert(rptl->local_state != RPTL_LOCAL_STATE_AWAITING_PAUSE_ACKS ||
+               rptl->control.pt != PTL_PT_ANY);
+        if (rptl->control.pt == PTL_PT_ANY)
+            continue;
+        if (rptl->local_state != RPTL_LOCAL_STATE_AWAITING_PAUSE_ACKS)
+            continue;
+
+        if (rptl->pause_ack_counter == rptl_info.world_size) {
+            /* if we are over the max count limit, do not send an
+             * unpause message yet */
+            if (rptl->data.ob_curr_count > rptl->data.ob_max_count)
+                goto fn_exit;
+
+            ret = PtlPTEnable(rptl->ni, rptl->data.pt);
+            RPTLU_ERR_POP(ret, "Error returned while reenabling PT\n");
+
+            rptl->local_state = RPTL_LOCAL_STATE_NORMAL;
+
+            for (i = 0; i < rptl_info.world_size; i++) {
+                mpi_errno = rptl_info.get_target_info(i, &id, rptl->data.pt, &data_pt, &control_pt);
+                if (mpi_errno) {
+                    ret = PTL_FAIL;
+                    RPTLU_ERR_POP(ret,
+                                  "Error getting target info while sending unpause messages\n");
+                }
+
+                /* disable flow control for control messages */
+                ret =
+                    MPID_nem_ptl_rptl_put(rptl->md, 0, 0, PTL_NO_ACK_REQ, id, control_pt,
+                                          0, 0, NULL, RPTL_CONTROL_MSG_UNPAUSE, 0);
+                RPTLU_ERR_POP(ret, "Error sending unpause message\n");
+            }
+        }
+    }
+
+  fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_SEND_UNPAUSE_MESSAGES);
+    return ret;
+
+  fn_fail:
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME reissue_ops
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+static int reissue_ops(ptl_process_t target_id)
+{
+    struct rptl_paused_target *target;
+    struct rptl_op *op;
+    int ret = PTL_OK;
+    MPIDI_STATE_DECL(MPID_STATE_REISSUE_OPS);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_REISSUE_OPS);
+
+    for (target = rptl_info.paused_target_list; target; target = target->next)
+        if (IDS_ARE_EQUAL(target->id, target_id))
+            break;
+    assert(target);
+
+    MPL_DL_DELETE(rptl_info.paused_target_list, target);
+    MPIU_Free(target);
+
+    for (op = rptl_info.op_list; op; op = op->next) {
+        if ((op->op_type == RPTL_OP_PUT && IDS_ARE_EQUAL(op->u.put.target_id, target_id)) ||
+            (op->op_type == RPTL_OP_GET && IDS_ARE_EQUAL(op->u.get.target_id, target_id))) {
+            if (op->state != RPTL_OP_STATE_ISSUED) {
+                ret = issue_op(op);
+                RPTLU_ERR_POP(ret, "Error calling issue_op\n");
+            }
+        }
+    }
+
+  fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_REISSUE_OPS);
+    return ret;
+
+  fn_fail:
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME get_event_info
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+static void get_event_info(ptl_event_t * event, struct rptl **ret_rptl, struct rptl_op **ret_op)
+{
+    struct rptl *rptl;
+    struct rptl_op *op;
+    struct rptl_paused_target *target, *tmp;
+    MPIDI_STATE_DECL(MPID_STATE_GET_EVENT_INFO);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_GET_EVENT_INFO);
+
+    if (event->type == PTL_EVENT_SEND || event->type == PTL_EVENT_REPLY ||
+        event->type == PTL_EVENT_ACK) {
+        op = (struct rptl_op *) event->user_ptr;
+
+        rptl_info.origin_events_left++;
+
+        if (rptl_info.origin_events_left >= 2) {
+            for (target = rptl_info.paused_target_list; target;) {
+                if (target->state == RPTL_TARGET_STATE_FLOWCONTROL) {
+                    tmp = target->next;
+                    MPL_DL_DELETE(rptl_info.paused_target_list, target);
+                    MPIU_Free(target);
+                    target = tmp;
+                }
+                else
+                    target = target->next;
+            }
+        }
+
+        assert(op);
+        rptl = NULL;
+    }
+    else {
+        /* for all target-side events, we look up the rptl based on
+         * the pt_index */
+        for (rptl = rptl_info.rptl_list; rptl; rptl = rptl->next)
+            if (rptl->data.pt == event->pt_index || rptl->control.pt == event->pt_index)
+                break;
+
+        assert(rptl);
+        op = NULL;
+    }
+
+    *ret_rptl = rptl;
+    *ret_op = op;
+
+  fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_GET_EVENT_INFO);
+    return;
+
+  fn_fail:
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME stash_event
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+static int stash_event(struct rptl_op *op, ptl_event_t event)
+{
+    int mpi_errno = MPI_SUCCESS;
+    int ret = PTL_OK;
+    MPIU_CHKPMEM_DECL(1);
+    MPIDI_STATE_DECL(MPID_STATE_STASH_EVENT);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_STASH_EVENT);
+
+    /* make sure this is of the event type we know of */
+    assert(event.type == PTL_EVENT_SEND || event.type == PTL_EVENT_ACK);
+
+    /* only PUT events are stashed */
+    assert(op->op_type == RPTL_OP_PUT);
+
+    /* we should never stash anything once events_ready is set */
+    assert(op->events_ready == 0);
+
+    /* only one of send or ack is stashed.  if we are in this
+     * function, both the events should be NULL at this point. */
+    assert(op->u.put.send == NULL && op->u.put.ack == NULL);
+
+    if (event.type == PTL_EVENT_SEND) {
+        MPIU_CHKPMEM_MALLOC(op->u.put.send, ptl_event_t *, sizeof(ptl_event_t), mpi_errno,
+                            "ptl event");
+        memcpy(op->u.put.send, &event, sizeof(ptl_event_t));
+    }
+    else {
+        MPIU_CHKPMEM_MALLOC(op->u.put.ack, ptl_event_t *, sizeof(ptl_event_t), mpi_errno,
+                            "ptl event");
+        memcpy(op->u.put.ack, &event, sizeof(ptl_event_t));
+    }
+
+  fn_exit:
+    MPIU_CHKPMEM_COMMIT();
+    MPIDI_FUNC_EXIT(MPID_STATE_STASH_EVENT);
+    return ret;
+
+  fn_fail:
+    if (mpi_errno)
+        ret = PTL_FAIL;
+    MPIU_CHKPMEM_REAP();
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME retrieve_event
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+static void retrieve_event(struct rptl *rptl, struct rptl_op *op, ptl_event_t * event)
+{
+    MPIDI_STATE_DECL(MPID_STATE_RETRIEVE_EVENT);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_RETRIEVE_EVENT);
+
+    assert(op->op_type == RPTL_OP_PUT);
+    assert(op->u.put.send || op->u.put.ack);
+
+    if (op->u.put.send) {
+        memcpy(event, op->u.put.send, sizeof(ptl_event_t));
+        MPIU_Free(op->u.put.send);
+    }
+    else {
+        memcpy(event, op->u.put.ack, sizeof(ptl_event_t));
+        MPIU_Free(op->u.put.ack);
+    }
+    event->user_ptr = op->u.put.user_ptr;
+
+    MPL_DL_DELETE(rptl_info.op_list, op);
+    free_op(op);
+
+  fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_RETRIEVE_EVENT);
+    return;
+
+  fn_fail:
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME issue_pending_ops
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+static int issue_pending_ops(void)
+{
+    struct rptl_paused_target *target, *tmp;
+    struct rptl_op *op;
+    int ret = PTL_OK;
+    MPIDI_STATE_DECL(MPID_STATE_ISSUE_PENDING_OPS);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_ISSUE_PENDING_OPS);
+
+    for (op = rptl_info.op_list; op; op = op->next) {
+        if (op->state == RPTL_OP_STATE_QUEUED) {
+            for (target = rptl_info.paused_target_list; target; target = target->next)
+                if ((op->op_type == RPTL_OP_PUT && IDS_ARE_EQUAL(op->u.put.target_id, target->id)) ||
+                    (op->op_type == RPTL_OP_GET && IDS_ARE_EQUAL(op->u.get.target_id, target->id)))
+                    break;
+            if (target == NULL) {
+                ret = issue_op(op);
+                RPTLU_ERR_POP(ret, "error issuing op\n");
+            }
+        }
+    }
+
+  fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_ISSUE_PENDING_OPS);
+    return ret;
+
+  fn_fail:
+    goto fn_exit;
+}
+
+
+#undef FUNCNAME
+#define FUNCNAME MPID_nem_ptl_rptl_eqget
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event)
+{
+    struct rptl_op *op;
+    struct rptl *rptl;
+    ptl_event_t e;
+    int ret = PTL_OK, tmp_ret = PTL_OK;
+    struct rptl_paused_target *target;
+    int mpi_errno = MPI_SUCCESS;
+    MPIU_CHKPMEM_DECL(1);
+    MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_PTL_RPTL_EQGET);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_PTL_RPTL_EQGET);
+
+    /* before we poll the eq, we need to check if there are any
+     * completed operations that need to be returned */
+    /* FIXME: this is an expensive loop over all pending operations
+     * every time the user does an eqget */
+    for (op = rptl_info.op_list; op; op = op->next) {
+        if (op->events_ready) {
+            retrieve_event(rptl, op, event);
+            ret = PTL_OK;
+            goto fn_exit;
+        }
+    }
+
+    /* see if pause ack messages need to be sent out */
+    tmp_ret = send_pause_ack_messages();
+    if (tmp_ret) {
+        ret = tmp_ret;
+        RPTLU_ERR_POP(ret, "Error returned from send_pause_ack_messages\n");
+    }
+
+    /* see if unpause messages need to be sent out */
+    tmp_ret = send_unpause_messages();
+    if (tmp_ret) {
+        ret = tmp_ret;
+        RPTLU_ERR_POP(ret, "Error returned from send_unpause_messages\n");
+    }
+
+    /* see if there are any pending ops to be issued */
+    tmp_ret = issue_pending_ops();
+    if (tmp_ret) {
+        ret = tmp_ret;
+        RPTLU_ERR_POP(ret, "Error returned from issue_pending_ops\n");
+    }
+
+    ret = PtlEQGet(eq_handle, event);
+    if (ret == PTL_EQ_EMPTY)
+        goto fn_exit;
+
+    /* find the rptl and op associated with this event */
+    get_event_info(event, &rptl, &op);
+
+    /* PT_DISABLED events only occur on the target */
+    if (event->type == PTL_EVENT_PT_DISABLED) {
+        /* we hide PT disabled events from the user */
+        ret = PTL_EQ_EMPTY;
+
+        /* we should only receive disable events on the data pt */
+        assert(rptl->data.pt == event->pt_index);
+
+        /* if we don't have a control PT, we don't have a way to
+         * recover from disable events */
+        assert(rptl->control.pt != PTL_PT_ANY);
+
+        rptl->local_state = RPTL_LOCAL_STATE_AWAITING_PAUSE_ACKS;
+        rptl->pause_ack_counter = 0;
+
+        /* send out pause messages */
+        tmp_ret = send_pause_messages(rptl);
+        if (tmp_ret) {
+            ret = tmp_ret;
+            RPTLU_ERR_POP(ret, "Error returned from send_pause_messages\n");
+        }
+    }
+
+    /* PUT_OVERFLOW and GET_OVERFLOW events only occur on the target and
+     * only for the data portal */
+    else if (event->type == PTL_EVENT_PUT_OVERFLOW || event->type == PTL_EVENT_GET_OVERFLOW) {
+        /* something is being pulled out of the overflow buffer,
+         * decrement counter */
+        rptl->data.ob_curr_count--;
+
+        /* we should only receive overflow events on the data pt */
+        assert(rptl->data.pt == event->pt_index);
+    }
+
+    /* PUT and GET events only occur on the target */
+    else if (event->type == PTL_EVENT_PUT || event->type == PTL_EVENT_GET) {
+        if (rptl->data.pt == event->pt_index) {
+            /* if the event is in the OVERFLOW list, then it means we
+             * just got a match in there */
+            if (event->ptl_list == PTL_OVERFLOW_LIST)
+                rptl->data.ob_curr_count++;
+            goto fn_exit;
+        }
+
+        /* control PT should never see a GET event */
+        assert(event->type == PTL_EVENT_PUT);
+
+        /* else, this message is on the control PT, so hide this event
+         * from the user */
+        ret = PTL_EQ_EMPTY;
+
+        /* the message came in on the control PT, repost it */
+        tmp_ret = post_empty_buffer(rptl->ni, rptl->control.pt,
+                                    &rptl->control.me[rptl->control.me_idx]);
+        if (tmp_ret) {
+            ret = tmp_ret;
+            RPTLU_ERR_POP(ret, "Error returned from post_empty_buffer\n");
+        }
+        rptl->control.me_idx++;
+        if (rptl->control.me_idx >= 2 * rptl_info.world_size)
+            rptl->control.me_idx = 0;
+
+        if (event->hdr_data == RPTL_CONTROL_MSG_PAUSE) {
+            tmp_ret = alloc_target(event->initiator, RPTL_TARGET_STATE_RECEIVED_PAUSE, rptl);
+            if (tmp_ret) {
+                ret = tmp_ret;
+                RPTLU_ERR_POP(ret, "Error returned from alloc_target\n");
+            }
+        }
+        else if (event->hdr_data == RPTL_CONTROL_MSG_PAUSE_ACK) {
+            rptl->pause_ack_counter++;
+        }
+        else {  /* got an UNPAUSE message */
+            /* reissue all operations to this target */
+            tmp_ret = reissue_ops(event->initiator);
+            if (tmp_ret) {
+                ret = tmp_ret;
+                RPTLU_ERR_POP(ret, "Error returned from reissue_ops\n");
+            }
+        }
+    }
+
+    /* origin side events */
+    else if (event->type == PTL_EVENT_SEND || event->type == PTL_EVENT_ACK ||
+             event->type == PTL_EVENT_REPLY) {
+
+        /* if this is a failed event, we simply drop this event */
+        if (event->ni_fail_type == PTL_NI_PT_DISABLED) {
+            /* hide the event from the user */
+            ret = PTL_EQ_EMPTY;
+
+            op->state = RPTL_OP_STATE_NACKED;
+
+            if (op->op_type == RPTL_OP_PUT) {
+                assert(!(event->type == PTL_EVENT_SEND && op->u.put.send));
+                assert(!(event->type == PTL_EVENT_ACK && op->u.put.ack));
+
+                /* if we have received both events, discard them.
+                 * otherwise, stash the one we received while waiting
+                 * for the other. */
+                if (event->type == PTL_EVENT_SEND && op->u.put.ack) {
+                    MPIU_Free(op->u.put.ack);
+                    op->u.put.ack = NULL;
+                }
+                else if (event->type == PTL_EVENT_ACK && op->u.put.send) {
+                    MPIU_Free(op->u.put.send);
+                    op->u.put.send = NULL;
+                }
+                else {
+                    ret = stash_event(op, *event);
+                    RPTLU_ERR_POP(ret, "error stashing event\n");
+                }
+            }
+
+            if (op->op_type == RPTL_OP_PUT)
+                tmp_ret = alloc_target(op->u.put.target_id, RPTL_TARGET_STATE_DISABLED, NULL);
+            else
+                tmp_ret = alloc_target(op->u.get.target_id, RPTL_TARGET_STATE_DISABLED, NULL);
+            if (tmp_ret) {
+                ret = tmp_ret;
+                RPTLU_ERR_POP(ret, "Error returned from alloc_target\n");
+            }
+        }
+
+        /* if this is a REPLY event, we are done with this op */
+        else if (event->type == PTL_EVENT_REPLY) {
+            assert(op->op_type == RPTL_OP_GET);
+
+            event->user_ptr = op->u.get.user_ptr;
+            MPL_DL_DELETE(rptl_info.op_list, op);
+            free_op(op);
+        }
+
+        else if (event->type == PTL_EVENT_SEND && op->u.put.ack) {
+            assert(op->op_type == RPTL_OP_PUT);
+
+            /* we already got the other event we needed earlier.  mark
+             * the op events as ready and return this current event to
+             * the user. */
+            op->events_ready = 1;
+            event->user_ptr = op->u.put.user_ptr;
+
+            /* if flow control is not set, ignore events */
+            if (op->u.put.flow_control == 0) {
+                retrieve_event(rptl, op, event);
+                ret = PTL_EQ_EMPTY;
+            }
+        }
+
+        else if (event->type == PTL_EVENT_ACK && op->u.put.send) {
+            assert(op->op_type == RPTL_OP_PUT);
+
+            /* we already got the other event we needed earlier.  mark
+             * the op events as ready and return this current event to
+             * the user. */
+            op->events_ready = 1;
+            event->user_ptr = op->u.put.user_ptr;
+
+            /* if flow control is not set, ignore events */
+            if (op->u.put.flow_control == 0) {
+                retrieve_event(rptl, op, event);
+                ret = PTL_EQ_EMPTY;
+            }
+
+            /* if the user asked for an ACK, just return this event.
+             * if not, discard this event and retrieve the send
+             * event. */
+            else if (!(op->u.put.ack_req & PTL_ACK_REQ))
+                retrieve_event(rptl, op, event);
+        }
+
+        else {
+            assert(!(event->type == PTL_EVENT_SEND && op->u.put.send));
+            assert(!(event->type == PTL_EVENT_ACK && op->u.put.ack));
+
+            /* stash this event as we need to wait for the buddy event
+             * as well before returning to the user */
+            ret = stash_event(op, *event);
+            RPTLU_ERR_POP(ret, "error stashing event\n");
+            ret = PTL_EQ_EMPTY;
+        }
+    }
+
+  fn_exit:
+    MPIU_CHKPMEM_COMMIT();
+    MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_PTL_RPTL_EQGET);
+    return ret;
+
+  fn_fail:
+    if (mpi_errno)
+        ret = PTL_FAIL;
+    MPIU_CHKPMEM_REAP();
+    goto fn_exit;
+}
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.h b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.h
new file mode 100644
index 0000000..359e24f
--- /dev/null
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/rptl.h
@@ -0,0 +1,149 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *  (C) 2014 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+
+#if !defined RPTL_H_INCLUDED
+#define RPTL_H_INCLUDED
+
+#if !defined HAVE_MACRO_VA_ARGS
+#error "portals requires VA_ARGS support"
+#endif /* HAVE_MACRO_VA_ARGS */
+
+#if defined HAVE__FUNC__
+#define RPTLU_FUNC __func__
+#elif defined HAVE_CAP__FUNC__
+#define RPTLU_FUNC __FUNC__
+#elif defined HAVE__FUNCTION__
+#define RPTLU_FUNC __FUNCTION__
+#else
+#define RPTLU_FUNC "Unknown"
+#endif
+
+#define RPTLU_ERR_POP(ret, ...)                                         \
+    {                                                                   \
+        if (ret) {                                                      \
+            MPIU_Error_printf("%s (%d): ", RPTLU_FUNC, __LINE__);       \
+            MPIU_Error_printf(__VA_ARGS__);                             \
+            goto fn_fail;                                               \
+        }                                                               \
+    }
+
+struct rptl_op {
+    enum {
+        RPTL_OP_PUT,
+        RPTL_OP_GET
+    } op_type;
+
+    enum {
+        RPTL_OP_STATE_QUEUED,
+        RPTL_OP_STATE_ISSUED,
+        RPTL_OP_STATE_NACKED
+    } state;
+
+    union {
+        struct {
+            ptl_handle_md_t md_handle;
+            ptl_size_t local_offset;
+            ptl_size_t length;
+            ptl_ack_req_t ack_req;
+            ptl_process_t target_id;
+            ptl_pt_index_t pt_index;
+            ptl_match_bits_t match_bits;
+            ptl_size_t remote_offset;
+            void *user_ptr;
+            ptl_hdr_data_t hdr_data;
+
+            /* internal variables store events */
+            ptl_event_t *send;
+            ptl_event_t *ack;
+            int flow_control;
+        } put;
+        struct {
+            ptl_handle_md_t md_handle;
+            ptl_size_t local_offset;
+            ptl_size_t length;
+            ptl_process_t target_id;
+            ptl_pt_index_t pt_index;
+            ptl_match_bits_t match_bits;
+            ptl_size_t remote_offset;
+            void *user_ptr;
+        } get;
+    } u;
+
+    int events_ready;
+
+    struct rptl_op *next;
+    struct rptl_op *prev;
+};
+
+#define RPTL_CONTROL_MSG_PAUSE       (0)
+#define RPTL_CONTROL_MSG_PAUSE_ACK   (1)
+#define RPTL_CONTROL_MSG_UNPAUSE     (2)
+
+struct rptl {
+    /* local portal state */
+    enum {
+        RPTL_LOCAL_STATE_NORMAL,
+        RPTL_LOCAL_STATE_AWAITING_PAUSE_ACKS
+    } local_state;
+    uint64_t pause_ack_counter;
+
+    struct {
+        ptl_handle_eq_t eq;
+        ptl_pt_index_t pt;      /* primary pt for data exchange */
+
+        /* ob_max_count refers to the number of messages that were in
+         * the overflow buffer when the pt was disabled */
+        uint64_t ob_max_count;
+
+        /* ob_curr_count refers to the current tally of messages in
+         * the overflow buffer.  if we are in disabled state, when
+         * this count reaches half of the maximum count, we are ready
+         * to reenable the PT. */
+        uint64_t ob_curr_count;
+    } data;
+
+    struct {
+        ptl_pt_index_t pt;      /* pt for control messages */
+
+        /* the remaining contents of the control structure are only
+         * valid when the control.pt field is not PTL_PT_ANY */
+        ptl_handle_me_t *me;
+        int me_idx;
+    } control;
+
+    ptl_handle_ni_t ni;
+    ptl_handle_eq_t eq;
+    ptl_handle_md_t md;
+
+    struct rptl *next;
+    struct rptl *prev;
+};
+
+int MPID_nem_ptl_rptl_init(int world_size, uint64_t max_origin_events,
+                           int (*get_target_info) (int rank, ptl_process_t * id,
+                                                   ptl_pt_index_t local_data_pt,
+                                                   ptl_pt_index_t * target_data_pt,
+                                                   ptl_pt_index_t * target_control_pt));
+
+int MPID_nem_ptl_rptl_drain_eq(int eq_count, ptl_handle_eq_t *eq);
+
+int MPID_nem_ptl_rptl_ptinit(ptl_handle_ni_t ni_handle, ptl_handle_eq_t eq_handle, ptl_pt_index_t data_pt,
+                             ptl_pt_index_t control_pt);
+
+int MPID_nem_ptl_rptl_ptfini(ptl_pt_index_t pt_index);
+
+int MPID_nem_ptl_rptl_put(ptl_handle_md_t md_handle, ptl_size_t local_offset, ptl_size_t length,
+                          ptl_ack_req_t ack_req, ptl_process_t target_id, ptl_pt_index_t pt_index,
+                          ptl_match_bits_t match_bits, ptl_size_t remote_offset, void *user_ptr,
+                          ptl_hdr_data_t hdr_data, int flow_control);
+
+int MPID_nem_ptl_rptl_get(ptl_handle_md_t md_handle, ptl_size_t local_offset, ptl_size_t length,
+                          ptl_process_t target_id, ptl_pt_index_t pt_index,
+                          ptl_match_bits_t match_bits, ptl_size_t remote_offset, void *user_ptr);
+
+int MPID_nem_ptl_rptl_eqget(ptl_handle_eq_t eq_handle, ptl_event_t * event);
+
+#endif /* RPTL_H_INCLUDED */
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/subconfigure.m4 b/src/mpid/ch3/channels/nemesis/netmod/portals4/subconfigure.m4
index 5e98951..1b4db86 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/subconfigure.m4
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/subconfigure.m4
@@ -14,6 +14,8 @@ AC_DEFUN([PAC_SUBCFG_BODY_]PAC_SUBCFG_AUTO_SUFFIX,[
 AM_COND_IF([BUILD_NEMESIS_NETMOD_PORTALS4],[
     AC_MSG_NOTICE([RUNNING CONFIGURE FOR ch3:nemesis:portals4])
 
+    PAC_CC_FUNCTION_NAME_SYMBOL
+
     PAC_SET_HEADER_LIB_PATH(portals4)
     PAC_PUSH_FLAG(LIBS)
     PAC_CHECK_HEADER_LIB_FATAL(portals4, portals4.h, portals, PtlInit)

http://git.mpich.org/mpich.git/commitdiff/71a56602c788b588cb08e4cfb56da7c0cfd9a1c0

commit 71a56602c788b588cb08e4cfb56da7c0cfd9a1c0
Author: Huiwei Lu <huiweilu at mcs.anl.gov>
Date:   Thu Oct 30 16:35:32 2014 -0500

    Fixes configure.ac when no fortran is found
    
    Fixes the case where configure is run with default settings but no
    Fortran compiler is installed. It should fail with the error 'No
    Fortran 77/90 compiler found', but it did not.
    
    This patch is related to [d4e30cc0], in which configure was changed to
    support '--disable-fc'.
    
    Signed-off-by: Antonio J. Pena <apenya at mcs.anl.gov>

diff --git a/configure.ac b/configure.ac
index f7ffd0a..552e1d9 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1880,13 +1880,15 @@ fi
 # Handle default choices for the Fortran compilers
 # Note that these have already been set above
 
-if test "$enable_f77" = "yes" -a "$F77" = "no" ; then
-   # No Fortran 77 compiler found; abort
-   AC_MSG_ERROR([No Fortran 77 compiler found. If you don't need to
-   build any Fortran programs, you can disable Fortran support using
-   --disable-fortran. If you do want to build Fortran
-   programs, you need to install a Fortran compiler such as gfortran
-   or ifort before you can proceed.])
+if test "$enable_f77" = "yes"; then
+    if test "$F77" = "" -o "$F77" = "no"; then
+        # No Fortran 77 compiler found; abort
+        AC_MSG_ERROR([No Fortran 77 compiler found. If you don't need to
+        build any Fortran programs, you can disable Fortran support using
+        --disable-fortran. If you do want to build Fortran
+        programs, you need to install a Fortran compiler such as gfortran
+        or ifort before you can proceed.])
+    fi
 fi
 
 if test "$enable_f77" = yes ; then
@@ -2054,13 +2056,15 @@ if test "$enable_fc" = "yes" -a "$enable_f77" = yes ; then
         AC_MSG_WARN([Use --disable-fc to keep configure from searching for a Fortran 90 compiler])
     fi
 
-    if test "$enable_fc" = "yes" -a "$FC" = "no" ; then
-       # No Fortran 90 compiler found; abort
-       AC_MSG_ERROR([No Fortran 90 compiler found. If you don't need
-       to build any Fortran 90 programs, you can disable Fortran 90
-       support using --disable-fc. If you do want to build Fortran 90
-       programs, you need to install a Fortran 90 compiler such as
-       gfortran or ifort before you can proceed.])
+    if test "$enable_fc" = "yes"; then
+        if test "$FC" = "no" -o "$FC" = ""; then
+            # No Fortran 90 compiler found; abort
+            AC_MSG_ERROR([No Fortran 90 compiler found. If you don't need
+            to build any Fortran 90 programs, you can disable Fortran 90
+            support using --disable-fc. If you do want to build Fortran 90
+            programs, you need to install a Fortran 90 compiler such as
+            gfortran or ifort before you can proceed.])
+        fi
     fi
 fi
 

http://git.mpich.org/mpich.git/commitdiff/0249f87cb932e63658267bb9db0f92e1a5f94db7

commit 0249f87cb932e63658267bb9db0f92e1a5f94db7
Author: Xin Zhao <xinzhao3 at illinois.edu>
Date:   Thu Oct 30 15:57:02 2014 -0500

    Bug-fix: avoid free NULL pointer in RMA.
    
    req->dev.user_buf points to the data sent from the origin process
    to the target process. For FOP, it sometimes points to the IMMED
    area in the packet header, when the data fits in the packet header.
    In that case we should not free req->dev.user_buf in the final
    request handler, since that data area is freed by the runtime
    when the packet header is freed.
    
    In this patch we initialize user_buf to NULL when creating the
    request, set it to NULL when FOP completes, and skip the free in
    the final request handler when user_buf is NULL.
    
    Signed-off-by: Min Si <msi at il.is.s.u-tokyo.ac.jp>
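
The ownership rule described above can be illustrated with a minimal, self-contained sketch. This is not the MPICH code: `struct request`, `struct pkt_header`, `IMMED_BYTES`, and all function names are hypothetical stand-ins chosen for illustration. It only shows the pattern of a buffer pointer that may refer either to heap memory or to an area inside the packet header, and of clearing that pointer so a later shared handler does not free it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define IMMED_BYTES 8   /* hypothetical size of the in-header data area */

/* Hypothetical stand-ins for a packet header and a request; these are
 * not the MPICH structures. */
struct pkt_header {
    char immed[IMMED_BYTES];
};

struct request {
    struct pkt_header hdr;
    void *user_buf;      /* NULL, heap memory, or a pointer into hdr.immed */
    int heap_frees;      /* counts real free() calls, for illustration */
};

static void request_init(struct request *req)
{
    memset(req, 0, sizeof(*req));
    req->user_buf = NULL;   /* initialised at creation, as in the patch */
}

/* Stage incoming data: small payloads live in the header, large ones get
 * a temporary heap buffer.  Returns 0 on success, -1 on allocation failure. */
static int stage_data(struct request *req, const void *data, size_t len)
{
    if (len <= IMMED_BYTES) {
        memcpy(req->hdr.immed, data, len);
        req->user_buf = req->hdr.immed;
    } else {
        req->user_buf = malloc(len);
        if (req->user_buf == NULL)
            return -1;
        memcpy(req->user_buf, data, len);
    }
    return 0;
}

/* Operation-specific completion: free only what was heap-allocated, then
 * clear the pointer in both cases. */
static void op_complete(struct request *req, size_t len)
{
    if (len > IMMED_BYTES) {
        free(req->user_buf);
        req->heap_frees++;
    }
    req->user_buf = NULL;
}

/* Final handler shared by several operations: it frees user_buf only if
 * an earlier handler left it set. */
static void final_handler(struct request *req)
{
    if (req->user_buf != NULL) {
        free(req->user_buf);
        req->heap_frees++;
        req->user_buf = NULL;
    }
}
```

Calling `stage_data`, then `op_complete`, then `final_handler` on a small payload performs no `free()` at all, while the same sequence on a large payload frees the heap buffer exactly once.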

diff --git a/src/mpid/ch3/src/ch3u_handle_recv_req.c b/src/mpid/ch3/src/ch3u_handle_recv_req.c
index 262cf71..5ed0828 100644
--- a/src/mpid/ch3/src/ch3u_handle_recv_req.c
+++ b/src/mpid/ch3/src/ch3u_handle_recv_req.c
@@ -322,7 +322,8 @@ int MPIDI_CH3_ReqHandler_GetAccumRespComplete( MPIDI_VC_t *vc,
     MPIDI_STATE_DECL(MPID_STATE_MPIDI_CH3_REQHANDLER_GETACCUMRESPCOMPLETE);
     
     MPIDI_FUNC_ENTER(MPID_STATE_MPIDI_CH3_REQHANDLER_GETACCUMRESPCOMPLETE);
-    MPIU_Free(rreq->dev.user_buf);
+    if (rreq->dev.user_buf != NULL)
+        MPIU_Free(rreq->dev.user_buf);
 
     MPID_Win_get_ptr(rreq->dev.target_win_handle, win_ptr);
 
@@ -616,6 +617,16 @@ int MPIDI_CH3_ReqHandler_FOPComplete( MPIDI_VC_t *vc,
     /* Free temporary buffer allocated in PktHandler_FOP */
     if (len > sizeof(int) * MPIDI_RMA_FOP_IMMED_INTS && rreq->dev.op != MPI_NO_OP) {
         MPIU_Free(rreq->dev.user_buf);
+        /* Assign user_buf to NULL so that reqHandler_GetAccumRespComplete()
+           will not try to free an empty buffer. */
+        rreq->dev.user_buf = NULL;
+    }
+    else {
+        /* FOP data fit in pkt header and user_buf just points to data area in pkt header
+           in pktHandler_FOP(), and it should be freed when pkt header is freed.
+           Here we assign user_buf to NULL so that reqHandler_GetAccumRespComplete()
+           will not try to free it. */
+        rreq->dev.user_buf = NULL;
     }
 
     *complete = 1;
diff --git a/src/mpid/ch3/src/ch3u_request.c b/src/mpid/ch3/src/ch3u_request.c
index c5ff48d..7caa636 100644
--- a/src/mpid/ch3/src/ch3u_request.c
+++ b/src/mpid/ch3/src/ch3u_request.c
@@ -88,6 +88,7 @@ MPID_Request * MPID_Request_create(void)
 	req->dev.iov_offset        = 0;
         req->dev.flags             = MPIDI_CH3_PKT_FLAG_NONE;
         req->dev.resp_request_handle = MPI_REQUEST_NULL;
+        req->dev.user_buf          = NULL;
         req->dev.OnDataAvail       = NULL;
         req->dev.OnFinal           = NULL;
 #ifdef MPIDI_CH3_REQUEST_INIT

http://git.mpich.org/mpich.git/commitdiff/061a996fd606b0a7117b8230bc7de6467d622236

commit 061a996fd606b0a7117b8230bc7de6467d622236
Author: Igor Ivanov <Igor.Ivanov at itseez.com>
Date:   Mon Oct 20 17:35:05 2014 +0200

    netmod/mxm: Avoid calling mxm send req handling from mxm send completion callback
    
    Signed-off-by: Devendar Bureddy <devendar at mellanox.com>
    Signed-off-by: Igor Ivanov <Igor.Ivanov at itseez.com>
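
The idea in this commit, as far as the diff below shows it, is to have the completion callback only record the finished request and to do the real handling later from the poll loop. The following is a minimal sketch of that deferral pattern under stated assumptions: `struct request`, `struct req_queue`, and every function name here are hypothetical and do not correspond to the MXM or MPICH APIs. The patch's enqueue macro also takes a reference on the request so it is not freed before it is dequeued; the sketch omits reference counting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request type with an intrusive "next" link. */
struct request {
    int id;
    int handled;            /* set once the deferred handling has run */
    struct request *next;
};

/* Simple singly linked FIFO of completed requests. */
struct req_queue {
    struct request *head;
    struct request *tail;
};

static void queue_init(struct req_queue *q)
{
    q->head = q->tail = NULL;
}

static int queue_empty(const struct req_queue *q)
{
    return q->head == NULL;
}

static void queue_enqueue(struct req_queue *q, struct request *r)
{
    r->next = NULL;
    if (q->tail != NULL)
        q->tail->next = r;
    else
        q->head = r;
    q->tail = r;
}

static struct request *queue_dequeue(struct req_queue *q)
{
    struct request *r = q->head;
    if (r != NULL) {
        q->head = r->next;
        if (q->head == NULL)
            q->tail = NULL;
        r->next = NULL;
    }
    return r;
}

/* Completion callback: it runs inside the transport library, so it only
 * records the request and returns. */
static void send_completion_cb(struct req_queue *q, struct request *r)
{
    queue_enqueue(q, r);
}

/* Progress function: drains the queue and performs the real handling
 * outside the callback.  Returns the number of requests handled. */
static int poll_once(struct req_queue *q)
{
    int n = 0;
    while (!queue_empty(q)) {
        struct request *r = queue_dequeue(q);
        r->handled = 1;     /* stand-in for the real send-request handling */
        n++;
    }
    return n;
}
```

With this split, a request passed to `send_completion_cb` stays unhandled until the next `poll_once` call, which processes queued requests in arrival order.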

diff --git a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_impl.h b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_impl.h
index 3b2bb12..43070da 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_impl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_impl.h
@@ -69,6 +69,8 @@ void MPID_nem_mxm_get_adi_msg(mxm_conn_h conn, mxm_imm_t imm, void *data,
 void MPID_nem_mxm_anysource_posted(MPID_Request * req);
 int MPID_nem_mxm_anysource_matched(MPID_Request * req);
 
+int _mxm_handle_sreq(MPID_Request * req);
+
 /* List type as queue
  * Operations, initialization etc
  */
@@ -174,6 +176,25 @@ typedef struct {
 /* macro for mxm private in REQ */
 #define REQ_BASE(reqp) ((reqp) ? (MPID_nem_mxm_req_area *)((reqp)->ch.netmod_area.padding) : NULL)
 
+typedef GENERIC_Q_DECL(struct MPID_Request) MPID_nem_mxm_reqq_t;
+#define MPID_nem_mxm_queue_empty(q) GENERIC_Q_EMPTY (q)
+#define MPID_nem_mxm_queue_head(q) GENERIC_Q_HEAD (q)
+#define MPID_nem_mxm_queue_enqueue(qp, ep) do {                                           \
+        /* add refcount so req doesn't get freed before it's dequeued */                \
+        MPIR_Request_add_ref(ep);                                                       \
+        MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST,                         \
+                          "MPID_nem_mxm_queue_enqueue req=%p (handle=%#x), queue=%p",     \
+                          ep, (ep)->handle, qp));                                       \
+        GENERIC_Q_ENQUEUE (qp, ep, dev.next);                                           \
+    } while (0)
+#define MPID_nem_mxm_queue_dequeue(qp, ep)  do {                                          \
+        GENERIC_Q_DEQUEUE (qp, ep, dev.next);                                           \
+        MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST,                         \
+                          "MPID_nem_mxm_queue_dequeuereq=%p (handle=%#x), queue=%p",      \
+                          *(ep), *(ep) ? (*(ep))->handle : -1, qp));                    \
+        MPID_Request_release(*(ep));                                                    \
+    } while (0)
+
 typedef struct MPID_nem_mxm_module_t {
     char *runtime_version;
     const char *compiletime_version;
@@ -188,6 +209,7 @@ typedef struct MPID_nem_mxm_module_t {
     int mxm_np;
     MPID_nem_mxm_ep_t *endpoint;
     list_head_t free_queue;
+    MPID_nem_mxm_reqq_t sreq_queue;
     struct {
         int bulk_connect;       /* use bulk connect */
         int bulk_disconnect;    /* use bulk disconnect */
diff --git a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
index 949efc7..4d78565 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
@@ -482,6 +482,8 @@ static int _mxm_init(int rank, int size)
     list_grow_mxm_req(&_mxm_obj.free_queue);
     MPIU_Assert(list_length(&_mxm_obj.free_queue) == MXM_MPICH_MAX_REQ);
 
+    _mxm_obj.sreq_queue.head = _mxm_obj.sreq_queue.tail = NULL;
+
     mxm_obj = &_mxm_obj;
 
   fn_exit:
diff --git a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_poll.c b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_poll.c
index ba7686e..e8bddc3 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_poll.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_poll.c
@@ -24,10 +24,16 @@ static int _mxm_process_rdtype(MPID_Request ** rreq_p, MPI_Datatype datatype,
 int MPID_nem_mxm_poll(int in_blocking_progress)
 {
     int mpi_errno = MPI_SUCCESS;
+    MPID_Request *req = NULL;
 
     MPIDI_STATE_DECL(MPID_STATE_MXM_POLL);
     MPIDI_FUNC_ENTER(MPID_STATE_MXM_POLL);
 
+    while (!MPID_nem_mxm_queue_empty(mxm_obj->sreq_queue)) {
+        MPID_nem_mxm_queue_dequeue(&mxm_obj->sreq_queue, &req);
+        _mxm_handle_sreq(req);
+    }
+
     mpi_errno = _mxm_poll();
     if (mpi_errno)
         MPIU_ERR_POP(mpi_errno);
@@ -72,6 +78,7 @@ void MPID_nem_mxm_get_adi_msg(mxm_conn_h conn, mxm_imm_t imm, void *data,
     vc = mxm_conn_ctx_get(conn);
 
     _dbg_mxm_output(5, "========> Getting ADI msg (from=%d data_size %d) \n", vc->pg_rank, length);
+    _dbg_mxm_out_buf(data, (length > 16 ? 16 : length));
 
     MPID_nem_handle_pkt(vc, data, (MPIDI_msg_sz_t) (length));
 }
@@ -144,6 +151,10 @@ int MPID_nem_mxm_anysource_matched(MPID_Request * req)
 int MPID_nem_mxm_recv(MPIDI_VC_t * vc, MPID_Request * rreq)
 {
     int mpi_errno = MPI_SUCCESS;
+    MPIDI_msg_sz_t data_sz;
+    int dt_contig;
+    MPI_Aint dt_true_lb;
+    MPID_Datatype *dt_ptr;
 
     MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_MXM_RECV);
     MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_MXM_RECV);
@@ -152,18 +163,15 @@ int MPID_nem_mxm_recv(MPIDI_VC_t * vc, MPID_Request * rreq)
     MPIU_Assert(((rreq->dev.match.parts.rank == MPI_ANY_SOURCE) && (vc == NULL)) ||
                 (vc && !vc->ch.is_local));
 
+    MPIDI_Datatype_get_info(rreq->dev.user_count, rreq->dev.datatype, dt_contig, data_sz,
+                            dt_ptr, dt_true_lb);
+
     {
         MPIR_Context_id_t context_id = rreq->dev.match.parts.context_id;
         int tag = rreq->dev.match.parts.tag;
-        MPIDI_msg_sz_t data_sz;
-        int dt_contig;
-        MPI_Aint dt_true_lb;
-        MPID_Datatype *dt_ptr;
         MPID_nem_mxm_vc_area *vc_area = NULL;
         MPID_nem_mxm_req_area *req_area = NULL;
 
-        MPIDI_Datatype_get_info(rreq->dev.user_count, rreq->dev.datatype, dt_contig, data_sz,
-                                dt_ptr, dt_true_lb);
         rreq->dev.OnDataAvail = NULL;
         rreq->dev.tmpbuf = NULL;
         rreq->ch.vc = vc;
@@ -223,7 +231,6 @@ static int _mxm_handle_rreq(MPID_Request * req)
     MPIDI_msg_sz_t userbuf_sz;
     MPID_Datatype *dt_ptr;
     MPIDI_msg_sz_t data_sz;
-    MPIDI_VC_t *vc = NULL;
     MPID_nem_mxm_vc_area *vc_area ATTRIBUTE((unused)) = NULL;
     MPID_nem_mxm_req_area *req_area = NULL;
     void *tmp_buf = NULL;
@@ -319,7 +326,7 @@ static int _mxm_handle_rreq(MPID_Request * req)
         }
     }
 
-    MPIDI_CH3U_Handle_recv_req(vc, req, &complete);
+    MPIDI_CH3U_Handle_recv_req(req->ch.vc, req, &complete);
     MPIU_Assert(complete == TRUE);
 
     if (tmp_buf) MPIU_Free(tmp_buf);
diff --git a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_send.c b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_send.c
index 75003bf..69f3adc 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_send.c
@@ -15,7 +15,6 @@ enum {
 };
 
 
-static int _mxm_handle_sreq(MPID_Request * req);
 static void _mxm_send_completion_cb(void *context);
 static int _mxm_isend(MPID_nem_mxm_ep_t * ep, MPID_nem_mxm_req_area * req,
                       int type, mxm_mq_h mxm_mq, int mxm_rank, int id, mxm_tag_t tag, int block);
@@ -235,6 +234,7 @@ int MPID_nem_mxm_send(MPIDI_VC_t * vc, const void *buf, int count, MPI_Datatype
     MPIDI_Request_create_sreq(sreq, mpi_errno, goto fn_exit);
     MPIU_Assert(sreq != NULL);
     MPIDI_Request_set_type(sreq, MPIDI_REQUEST_TYPE_SEND);
+
     MPIDI_VC_FAI_send_seqnum(vc, seqnum);
     MPIDI_Request_set_seqnum(sreq, seqnum);
     if (HANDLE_GET_KIND(datatype) != HANDLE_KIND_BUILTIN) {
@@ -336,7 +336,8 @@ int MPID_nem_mxm_ssend(MPIDI_VC_t * vc, const void *buf, int count, MPI_Datatype
     /* create a request */
     MPIDI_Request_create_sreq(sreq, mpi_errno, goto fn_exit);
     MPIU_Assert(sreq != NULL);
-    MPIDI_Request_set_type(sreq, MPIDI_REQUEST_TYPE_SEND);
+    MPIDI_Request_set_type(sreq, MPIDI_REQUEST_TYPE_SSEND);
+
     MPIDI_VC_FAI_send_seqnum(vc, seqnum);
     MPIDI_Request_set_seqnum(sreq, seqnum);
     if (HANDLE_GET_KIND(datatype) != HANDLE_KIND_BUILTIN) {
@@ -439,6 +440,7 @@ int MPID_nem_mxm_isend(MPIDI_VC_t * vc, const void *buf, int count, MPI_Datatype
     MPIDI_Request_create_sreq(sreq, mpi_errno, goto fn_exit);
     MPIU_Assert(sreq != NULL);
     MPIDI_Request_set_type(sreq, MPIDI_REQUEST_TYPE_SEND);
+
     MPIDI_VC_FAI_send_seqnum(vc, seqnum);
     MPIDI_Request_set_seqnum(sreq, seqnum);
     if (HANDLE_GET_KIND(datatype) != HANDLE_KIND_BUILTIN) {
@@ -541,7 +543,8 @@ int MPID_nem_mxm_issend(MPIDI_VC_t * vc, const void *buf, int count, MPI_Datatyp
     /* create a request */
     MPIDI_Request_create_sreq(sreq, mpi_errno, goto fn_exit);
     MPIU_Assert(sreq != NULL);
-    MPIDI_Request_set_type(sreq, MPIDI_REQUEST_TYPE_SEND);
+    MPIDI_Request_set_type(sreq, MPIDI_REQUEST_TYPE_SSEND);
+
     MPIDI_VC_FAI_send_seqnum(vc, seqnum);
     MPIDI_Request_set_seqnum(sreq, seqnum);
     if (HANDLE_GET_KIND(datatype) != HANDLE_KIND_BUILTIN) {
@@ -619,10 +622,9 @@ int MPID_nem_mxm_issend(MPIDI_VC_t * vc, const void *buf, int count, MPI_Datatyp
 }
 
 
-static int _mxm_handle_sreq(MPID_Request * req)
+int _mxm_handle_sreq(MPID_Request * req)
 {
     int complete = FALSE;
-    int (*reqFn) (MPIDI_VC_t *, MPID_Request *, int *);
     MPID_nem_mxm_vc_area *vc_area = NULL;
     MPID_nem_mxm_req_area *req_area = NULL;
 
@@ -634,8 +636,10 @@ static int _mxm_handle_sreq(MPID_Request * req)
                       16 ? 16 : req_area->iov_buf[0].length));
 
     vc_area->pending_sends -= 1;
-    if (((req->dev.datatype_ptr != NULL) && (req->dev.tmpbuf != NULL))) {
-        MPIU_Free(req->dev.tmpbuf);
+    if (req->dev.tmpbuf) {
+        if (req->dev.datatype_ptr || req->ch.noncontig) {
+            MPIU_Free(req->dev.tmpbuf);
+        }
     }
 
     if (req_area->iov_count > MXM_MPICH_MAX_IOV) {
@@ -644,19 +648,8 @@ static int _mxm_handle_sreq(MPID_Request * req)
         req_area->iov_count = 0;
     }
 
-    reqFn = req->dev.OnDataAvail;
-    if (!reqFn) {
-        MPIDI_CH3U_Request_complete(req);
-        MPIU_DBG_MSG(CH3_CHANNEL, VERBOSE, ".... complete");
-    }
-    else {
-        MPIDI_VC_t *vc = req->ch.vc;
-
-        reqFn(vc, req, &complete);
-        if (!complete) {
-            MPIU_Assert(complete == TRUE);
-        }
-    }
+    MPIDI_CH3U_Handle_send_req(req->ch.vc, req, &complete);
+    MPIU_Assert(complete == TRUE);
 
     return complete;
 }
@@ -683,7 +676,7 @@ static void _mxm_send_completion_cb(void *context)
                     req, req->status.MPI_ERROR);
 
     if (likely(!MPIR_STATUS_GET_CANCEL_BIT(req->status))) {
-        _mxm_handle_sreq(req);
+        MPID_nem_mxm_queue_enqueue(&mxm_obj->sreq_queue, req);
     }
 }
 

http://git.mpich.org/mpich.git/commitdiff/ca610ca14a9469cab77dc79bd293dfe9bab5d767

commit ca610ca14a9469cab77dc79bd293dfe9bab5d767
Author: Antonio Pena Monferrer <apenya at mcs.anl.gov>
Date:   Fri Oct 31 15:10:17 2014 -0500

    portals4: consider MPI_STATUS_IGNORE in Probe

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_probe.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_probe.c
index f686381..3d88225 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_probe.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_probe.c
@@ -158,7 +158,8 @@ int MPID_nem_ptl_iprobe(MPIDI_VC_t *vc, int source, int tag, MPID_Comm *comm, in
     } while (!MPID_Request_is_complete(req));
 
     *flag = REQ_PTL(req)->found;
-    *status = req->status;
+    if (status != MPI_STATUS_IGNORE)
+        *status = req->status;
     
     MPID_Request_release(req);
 

http://git.mpich.org/mpich.git/commitdiff/f8aa784266110cf85b3a15d43e7718b364b00130

commit f8aa784266110cf85b3a15d43e7718b364b00130
Author: Min Si <msi at il.is.s.u-tokyo.ac.jp>
Date:   Fri Oct 31 10:27:47 2014 -0500

    Bug-fix: use normal PUT/ACC + FLUSH in req ops test.
    
    In reqops.c, the ring communication test assumes remote
    completion after MPI_Rput/MPI_Raccumulate + MPI_Wait, which
    is not correct. MPI_Wait only guarantees local completion.
    
    Here we fix it by replacing MPI_Rput/MPI_Raccumulate + MPI_Wait
    with MPI_Put/MPI_Accumulate + MPI_Win_flush.
    
    Signed-off-by: Xin Zhao <xinzhao3 at illinois.edu>

diff --git a/test/mpi/rma/reqops.c b/test/mpi/rma/reqops.c
index ef2636f..36ef18e 100644
--- a/test/mpi/rma/reqops.c
+++ b/test/mpi/rma/reqops.c
@@ -114,9 +114,9 @@ int main( int argc, char *argv[] )
         assert(req != MPI_REQUEST_NULL);
         MPI_Wait(&req, MPI_STATUS_IGNORE);
 
-        MPI_Rput(&rank, 1, MPI_INT, 0, 0, 1, MPI_INT, window, &req);
-        assert(req != MPI_REQUEST_NULL);
-        MPI_Wait(&req, MPI_STATUS_IGNORE);
+        /* Use flush to guarantee remote completion */
+        MPI_Put(&rank, 1, MPI_INT, 0, 0, 1, MPI_INT, window);
+        MPI_Win_flush(0, window);
 
         exp = (rank + nproc-1) % nproc;
 
@@ -153,9 +153,9 @@ int main( int argc, char *argv[] )
         assert(req != MPI_REQUEST_NULL);
         MPI_Wait(&req, MPI_STATUS_IGNORE);
 
-        MPI_Raccumulate(&rank, 1, MPI_INT, 0, 0, 1, MPI_INT, MPI_REPLACE, window, &req);
-        assert(req != MPI_REQUEST_NULL);
-        MPI_Wait(&req, MPI_STATUS_IGNORE);
+        /* Use flush to guarantee remote completion */
+        MPI_Accumulate(&rank, 1, MPI_INT, 0, 0, 1, MPI_INT, MPI_REPLACE, window);
+        MPI_Win_flush(0, window);
 
         exp = (rank + nproc-1) % nproc;
 

http://git.mpich.org/mpich.git/commitdiff/e409c8797834cad967dd47fa87572fce52fd50c2

commit e409c8797834cad967dd47fa87572fce52fd50c2
Author: Sangmin Seo <sseo at anl.gov>
Date:   Fri Oct 31 10:02:31 2014 -0500

    Change mpich-discuss at mcs.anl.gov to discuss at mpich.org.
    
    Used discuss at mpich.org instead of mpich-discuss at mcs.anl.gov in the
    installers' guide.
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>

diff --git a/doc/installguide/install.tex.vin b/doc/installguide/install.tex.vin
index 0a77d84..b256706 100644
--- a/doc/installguide/install.tex.vin
+++ b/doc/installguide/install.tex.vin
@@ -98,7 +98,7 @@ run MPI applications.  Some particular features are different
 if you have system administration privileges (can become ``root'' on a
 Unix system), and these are explained here.  It is not necessary to have
 such privileges to build and install MPICH.  In the event of problems,
-send mail to \texttt{mpich-discuss at mcs.anl.gov}.  Once MPICH is
+send mail to \texttt{discuss at mpich.org}.  Once MPICH is
 installed, details on how to run MPI jobs are covered in the \emph{MPICH
 User's Guide}, found in this same \texttt{doc} subdirectory.
 
@@ -196,7 +196,7 @@ where the \texttt{$\backslash$} means that this is really one line.  (On
 instead of \verb+|& tee c.txt+).  Other configure options are
 described below.  Check the \texttt{c.txt} file to make sure
 everything went well.  Problems should be self-explanatory, but if not,
-send \texttt{c.txt} to \texttt{mpich-discuss at mcs.anl.gov}.
+send \texttt{c.txt} to \texttt{discuss at mpich.org}.
 The file \texttt{config.log} is created by \texttt{configure} and
 contains a record of the tests that \texttt{configure} performed.  It
 is normal for some tests recorded in \texttt{config.log} to fail.  
@@ -217,7 +217,7 @@ do a \texttt{make clean} and then run make again with \texttt{VERBOSE=1}
     make VERBOSE=1 2>&1 | tee m.txt   (for bash and sh)
 \end{verbatim}
 and then send \texttt{m.txt} and \texttt{c.txt} to 
-\texttt{mpich-discuss at mcs.anl.gov}.
+\texttt{discuss at mpich.org}.
 
 \item
 Install the MPICH commands:

http://git.mpich.org/mpich.git/commitdiff/696558f535c3669bbeb4ffe67672af7c5350b01a

commit 696558f535c3669bbeb4ffe67672af7c5350b01a
Author: Wesley Bland <wbland at anl.gov>
Date:   Thu Oct 30 10:45:28 2014 -0500

    Fix configure related to --enable-error-checking
    
    When --enable-error-checking is given without a value, the option
    ended up set to "yes" instead of "default". This prevented configure
    from passing under certain conditions. This changes the default to
    "all" and fixes the check for "yes".
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>

diff --git a/configure.ac b/configure.ac
index 5c3d291..f7ffd0a 100644
--- a/configure.ac
+++ b/configure.ac
@@ -329,7 +329,7 @@ AC_ARG_ENABLE(error-checking,
         runtime   - error checking controllable at runtime through environment 
                     variables
         all       - error checking always enabled (default)
-],,enable_error_checking=default)
+],,enable_error_checking=all)
 
 AC_ARG_ENABLE(error-messages,
 [  --enable-error-messages=level - Control the amount of detail in error messages.
@@ -916,7 +916,7 @@ fi
 
 # error-checking
 # Change default into the specific value of the default
-if test "$enable_error_checking" = "default" ; then
+if test "$enable_error_checking" = "yes" ; then
    enable_error_checking=all
 fi
 # glue_romio.h needs the variable HAVE_ERROR_CHECKING to have the value 0 or 1
@@ -927,7 +927,7 @@ case "$enable_error_checking" in
     # checking tests in the test suite
     ac_configure_args="${ac_configure_args} --disable-checkerrors"
     ;;
-    all|yes|runtime)
+    all|runtime)
     error_checking_kind=`echo $enable_error_checking | \
     tr 'abcdefghijklmnopqrstuvwxyz' 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'`
     error_checking_kind=MPID_ERROR_LEVEL_$error_checking_kind

http://git.mpich.org/mpich.git/commitdiff/0e0230dfbff01a027673b1af404ac6d72885de37

commit 0e0230dfbff01a027673b1af404ac6d72885de37
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Fri Oct 24 13:39:19 2014 -0500

    tsuite: disable threads/rma tests when RMA disabled
    
    If the testsuite is configured with --disable-rma, ensure that rma
    tests in the threads directory do not still run.
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/test/mpi/threads/testlist.in b/test/mpi/threads/testlist.in
index 6854dd7..ac3f3cc 100644
--- a/test/mpi/threads/testlist.in
+++ b/test/mpi/threads/testlist.in
@@ -3,4 +3,4 @@ comm
 init
 mpi_t
 @spawndir@
-rma
+@rmadir@

http://git.mpich.org/mpich.git/commitdiff/b628d864aa8f9b85be16d031abbe7c1ccadda563

commit b628d864aa8f9b85be16d031abbe7c1ccadda563
Author: Min Si <msi at il.is.s.u-tokyo.ac.jp>
Date:   Fri Oct 17 11:05:10 2014 -0500

    Bug-fix: trigger final req handler for receiving derived datatype.
    
    There are two request handlers used when receiving data:
    (1) OnDataAvail, which is triggered when data has arrived;
    (2) OnFinal, which is triggered when receiving the data has finished.
    
    When receiving a large derived datatype, the receiving iov can be divided
    into multiple iovs. The OnDataAvail handler is set to the iov load function
    while still waiting for remaining data. However, the handler should be
    set to OnFinal when starting to receive the last iov.
    
    The original code does not set the OnDataAvail handler to OnFinal at the
    end. This patch fixes this bug.
    
    Note that this bug only appears in RMA calls, because only the RMA
    packet handlers need to specify OnFinal.
    
    Resolve #2189.
    
    Signed-off-by: Xin Zhao <xinzhao3 at illinois.edu>

diff --git a/src/mpid/ch3/src/ch3u_request.c b/src/mpid/ch3/src/ch3u_request.c
index af39b3d..c5ff48d 100644
--- a/src/mpid/ch3/src/ch3u_request.c
+++ b/src/mpid/ch3/src/ch3u_request.c
@@ -88,6 +88,8 @@ MPID_Request * MPID_Request_create(void)
 	req->dev.iov_offset        = 0;
         req->dev.flags             = MPIDI_CH3_PKT_FLAG_NONE;
         req->dev.resp_request_handle = MPI_REQUEST_NULL;
+        req->dev.OnDataAvail       = NULL;
+        req->dev.OnFinal           = NULL;
 #ifdef MPIDI_CH3_REQUEST_INIT
 	MPIDI_CH3_REQUEST_INIT(req);
 #endif
@@ -397,7 +399,7 @@ int MPIDI_CH3U_Request_load_recv_iov(MPID_Request * const rreq)
 	    MPIU_DBG_MSG(CH3_CHANNEL,VERBOSE,
      "updating rreq to read the remaining data directly into the user buffer");
 	    /* Eventually, use OnFinal for this instead */
-	    rreq->dev.OnDataAvail = 0;
+	    rreq->dev.OnDataAvail = rreq->dev.OnFinal;
 	}
 	else if (last == rreq->dev.segment_size || 
 		 (last - rreq->dev.segment_first) / rreq->dev.iov_count >= MPIDI_IOV_DENSITY_MIN)
@@ -466,7 +468,7 @@ int MPIDI_CH3U_Request_load_recv_iov(MPID_Request * const rreq)
 	    rreq->dev.iov[0].MPID_IOV_LEN = data_sz;
 	    MPIU_Assert(MPIDI_Request_get_type(rreq) == MPIDI_REQUEST_TYPE_RECV);
 	    /* Eventually, use OnFinal for this instead */
-	    rreq->dev.OnDataAvail = 0;
+	    rreq->dev.OnDataAvail = rreq->dev.OnFinal;
 	}
 	else
 	{
diff --git a/src/mpid/ch3/src/ch3u_rma_sync.c b/src/mpid/ch3/src/ch3u_rma_sync.c
index 4cbf25c..1324324 100644
--- a/src/mpid/ch3/src/ch3u_rma_sync.c
+++ b/src/mpid/ch3/src/ch3u_rma_sync.c
@@ -4502,6 +4502,7 @@ int MPIDI_CH3_PktHandler_Put( MPIDI_VC_t *vc, MPIDI_CH3_Pkt_t *pkt,
         /* derived datatype */
         MPIDI_Request_set_type(req, MPIDI_REQUEST_TYPE_PUT_RESP_DERIVED_DT);
         req->dev.datatype = MPI_DATATYPE_NULL;
+        req->dev.OnFinal = MPIDI_CH3_ReqHandler_PutAccumRespComplete;
 	    
         req->dev.dtype_info = (MPIDI_RMA_dtype_info *) 
             MPIU_Malloc(sizeof(MPIDI_RMA_dtype_info));
@@ -4806,6 +4807,7 @@ int MPIDI_CH3_PktHandler_Accumulate( MPIDI_VC_t *vc, MPIDI_CH3_Pkt_t *pkt,
 	MPIDI_Request_set_type(req, MPIDI_REQUEST_TYPE_ACCUM_RESP_DERIVED_DT);
 	req->dev.OnDataAvail = MPIDI_CH3_ReqHandler_AccumRespDerivedDTComplete;
 	req->dev.datatype = MPI_DATATYPE_NULL;
+        req->dev.OnFinal = MPIDI_CH3_ReqHandler_PutAccumRespComplete;
                 
 	req->dev.dtype_info = (MPIDI_RMA_dtype_info *) 
 	    MPIU_Malloc(sizeof(MPIDI_RMA_dtype_info));
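
The handler hand-off described in the commit message above can be sketched in isolation. This is a minimal model, not MPICH code: `request_t`, `load_recv_iov`, `reload_iov`, `count_final` and `simulate_receive` are hypothetical names, and the chunked receive loop is an assumption made only to exercise the logic. It shows the one property the fix relies on: when the last iov is loaded, the data-available handler is replaced by the final handler, so the final handler runs exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a receive request; not MPID_Request. */
typedef struct request request_t;
typedef int (*handler_fn)(request_t *req);

struct request {
    size_t total;             /* bytes expected in total */
    size_t received;          /* bytes consumed so far */
    size_t chunk;             /* most bytes a single iov load can cover */
    handler_fn on_data_avail; /* runs when the current iov has been filled */
    handler_fn on_final;      /* runs when the whole receive is finished */
    int final_calls;          /* how many times the final handler ran */
};

static int reload_iov(request_t *req);

/* Load the next iov and choose the handler to run once it is filled. */
static void load_recv_iov(request_t *req)
{
    size_t remaining = req->total - req->received;
    if (remaining <= req->chunk) {
        /* Last iov: hand completion over to the final handler (may be NULL). */
        req->on_data_avail = req->on_final;
    } else {
        /* More data will follow: come back here to load another iov. */
        req->on_data_avail = reload_iov;
    }
}

static int reload_iov(request_t *req)
{
    load_recv_iov(req);
    return 0;
}

static int count_final(request_t *req)
{
    req->final_calls++;
    return 0;
}

/* Drive one receive to completion; returns the number of final-handler calls. */
int simulate_receive(size_t total, size_t chunk)
{
    request_t req = { total, 0, chunk, NULL, count_final, 0 };
    load_recv_iov(&req);
    while (req.received < req.total) {
        size_t remaining = req.total - req.received;
        req.received += remaining < req.chunk ? remaining : req.chunk;
        if (req.on_data_avail != NULL)
            req.on_data_avail(&req);
    }
    return req.final_calls;
}
```

If the last-iov branch set `on_data_avail` to NULL instead, as the pre-patch code did with `0`, `simulate_receive` would return 0 in this model: the final handler would never be invoked.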

http://git.mpich.org/mpich.git/commitdiff/71083d9465508b28b2c98ba62423e70a0d7530ef

commit 71083d9465508b28b2c98ba62423e70a0d7530ef
Author: Pavan Balaji <balaji at anl.gov>
Date:   Tue Oct 7 22:11:10 2014 -0500

    Disable hugepage support by default.
    
    This patch is a workaround for an issue with older HPC-X machines.
    Once we are comfortable upgrading to the latest HPC-X version, the
    default value of the CVAR should be changed to true.
    
    Signed-off-by: Xin Zhao <xinzhao3 at illinois.edu>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
index ec04d56..949efc7 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
@@ -40,6 +40,18 @@ cvars:
         If true, force mxm to disconnect all processes at
         finalization time.
 
+    - name        : MPIR_CVAR_NEMESIS_MXM_HUGEPAGE
+      category    : CH3
+      type        : boolean
+      default     : false
+      class       : none
+      verbosity   : MPI_T_VERBOSITY_USER_BASIC
+      scope       : MPI_T_SCOPE_ALL_EQ
+      description : >-
+        If true, mxm tries detecting hugepage support.  On HPC-X 2.3
+        and earlier, this might cause problems on Ubuntu and other
+        platforms even if the system provides hugepage support.
+
 === END_MPI_T_CVAR_INFO_BLOCK ===
 */
 
@@ -126,6 +138,20 @@ int MPID_nem_mxm_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_
         MPIU_ERR_CHKANDJUMP(r, mpi_errno, MPI_ERR_OTHER, "**putenv");
     }
 
+    /* [PB @ 2014-10-06] If hugepage support is not enabled, we force
+     * memory allocation to go through mmap.  This is mainly to
+     * workaround issues in MXM with Ubuntu where the detection has
+     * some issues (either because of bugs on the platform or within
+     * MXM) causing errors.  This can probably be deleted eventually
+     * when this issue is resolved.  */
+    if (MPIR_CVAR_NEMESIS_MXM_HUGEPAGE == 0) {
+        if (getenv("MXM_MEM_ALLOC") == NULL) {
+            r = MPL_putenv("MXM_MEM_ALLOC=mmap,libc,sysv");
+            MPIU_ERR_CHKANDJUMP(r, mpi_errno, MPI_ERR_OTHER, "**putenv");
+        }
+    }
+
+
     mpi_errno = _mxm_init(pg_rank, pg_p->size);
     if (mpi_errno)
         MPIU_ERR_POP(mpi_errno);
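
The "only set the variable when the user has not already set it" pattern in this patch can be sketched outside MPICH as follows. This is an assumption-laden sketch: it uses POSIX `getenv`/`setenv` rather than `MPL_putenv`, and `set_env_default` and `apply_hugepage_workaround` are hypothetical helper names. The variable name and value are the ones shown in the diff.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Set name=value only if name is not already in the environment.
 * Returns 1 if this call set it, 0 if it was left alone, -1 on error. */
int set_env_default(const char *name, const char *value)
{
    if (getenv(name) != NULL)
        return 0;                       /* the user's setting takes precedence */
    return setenv(name, value, 0) == 0 ? 1 : -1;
}

/* Hypothetical wrapper: with hugepage detection disabled, steer allocation
 * through mmap first, mirroring the value used in the diff. */
int apply_hugepage_workaround(int hugepage_enabled)
{
    if (hugepage_enabled)
        return 0;
    return set_env_default("MXM_MEM_ALLOC", "mmap,libc,sysv");
}
```

Checking `getenv` first keeps an explicit user override of `MXM_MEM_ALLOC` intact, which is the same precedence the patch gives it.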

http://git.mpich.org/mpich.git/commitdiff/c435225e9cd91a379b25e624296133a5b937cb0b

commit c435225e9cd91a379b25e624296133a5b937cb0b
Author: Junchao Zhang <jczhang at mcs.anl.gov>
Date:   Mon Oct 27 15:39:27 2014 -0500

    Make F08 buildiface also support MPIX_ subroutines
    
    No review since F08 binding is experimental now.

diff --git a/src/binding/fortran/use_mpi_f08/buildiface b/src/binding/fortran/use_mpi_f08/buildiface
index a4bfd93..0431474 100755
--- a/src/binding/fortran/use_mpi_f08/buildiface
+++ b/src/binding/fortran/use_mpi_f08/buildiface
@@ -25,7 +25,9 @@ open($pmpi_f08_fh, ">", $pmpi_f08_file) || die "Error: Could not open $pmpi_f08_
 while (<$mpi_f08_fh>) {
     if (/pmpi_f08/) {
         next; # Skip the "use :: pmpi_f08" line
-    } elsif (/module\s+mpi_f08/) {
+    }
+    # MPI_ part
+    elsif (/module\s+mpi_f08/) {
         $_ =~ s/module\s+mpi_f08/module pmpi_f08/;
     } elsif (/interface\s+MPI_/) {
         $_ =~ s/interface\s+MPI_/interface PMPI_/;
@@ -34,6 +36,14 @@ while (<$mpi_f08_fh>) {
     } elsif (/function\s+MPI_/) {
         $_ =~ s/function\s+MPI_/function PMPIR_/;
     }
+    # MPIX_ part
+    elsif (/interface\s+MPIX_/) {
+        $_ =~ s/interface\s+MPIX_/interface PMPIX_/;
+    } elsif (/subroutine\s+MPIX_/) {
+        $_ =~ s/subroutine\s+MPIX_/subroutine PMPIXR_/;
+    } elsif (/function\s+MPIX_/) {
+        $_ =~ s/function\s+MPIX_/function PMPIXR_/;
+    }
     print $pmpi_f08_fh $_;
 }
 
@@ -59,6 +69,12 @@ foreach my $mpi_file (glob("$wrappers_f_dir/*.F90")) {
         } elsif (/function\s+MPI_/) {
             $_ =~ s/function\s+MPI_/function PMPIR_/
         }
+        elsif (/subroutine\s+MPIX_/) {
+            $_ =~ s/subroutine\s+MPIX_/subroutine PMPIXR_/
+        } elsif (/function\s+MPIX_/) {
+            $_ =~ s/function\s+MPIX_/function PMPIXR_/
+        }
+
         print $pmpi_fh $_;
     }
 

http://git.mpich.org/mpich.git/commitdiff/ed75969d2d373b02a7be2690d2530081aee22e92

commit ed75969d2d373b02a7be2690d2530081aee22e92
Author: Junchao Zhang <jczhang at mcs.anl.gov>
Date:   Mon Oct 27 15:45:13 2014 -0500

    Revise F08 MPI_Wtime/Wtick to make buildiface simpler
    
    No review since F08 binding is experimental now.

diff --git a/src/binding/fortran/use_mpi_f08/buildiface b/src/binding/fortran/use_mpi_f08/buildiface
index 7f61bae..a4bfd93 100755
--- a/src/binding/fortran/use_mpi_f08/buildiface
+++ b/src/binding/fortran/use_mpi_f08/buildiface
@@ -33,10 +33,6 @@ while (<$mpi_f08_fh>) {
         $_ =~ s/subroutine\s+MPI_/subroutine PMPIR_/;
     } elsif (/function\s+MPI_/) {
         $_ =~ s/function\s+MPI_/function PMPIR_/;
-    } elsif (/MPI_Wtick_f08/) { # Replace return value
-        $_ =~ s/MPI_Wtick_f08/PMPIR_Wtick_f08/;
-    } elsif (/MPI_Wtime_f08/) { # Replace return value
-        $_ =~ s/MPI_Wtime_f08/PMPIR_Wtime_f08/;
     }
     print $pmpi_f08_fh $_;
 }
diff --git a/src/binding/fortran/use_mpi_f08/mpi_f08.F90 b/src/binding/fortran/use_mpi_f08/mpi_f08.F90
index 4b3cb9e..bd4c52d 100644
--- a/src/binding/fortran/use_mpi_f08/mpi_f08.F90
+++ b/src/binding/fortran/use_mpi_f08/mpi_f08.F90
@@ -4090,16 +4090,16 @@ interface MPI_Ineighbor_alltoallw
 end interface MPI_Ineighbor_alltoallw
 
 interface MPI_Wtick
-    function  MPI_Wtick_f08()
+    function  MPI_Wtick_f08() result(res)
         implicit none
-        double precision :: MPI_Wtick_f08
+        double precision :: res
     end function MPI_Wtick_f08
 end interface MPI_Wtick
 
 interface MPI_Wtime
-    function MPI_Wtime_f08()
+    function MPI_Wtime_f08() result(res)
         implicit none
-        double precision :: MPI_Wtime_f08
+        double precision :: res
     end function MPI_Wtime_f08
 end interface MPI_Wtime
 

http://git.mpich.org/mpich.git/commitdiff/33026041726904e3668ef781107afcc6c8ee3889

commit 33026041726904e3668ef781107afcc6c8ee3889
Author: Junchao Zhang <jczhang at mcs.anl.gov>
Date:   Wed Oct 29 16:36:20 2014 -0500

    Add the missing value attribute to intent(in) arguments
    
    This is needed since we pass these arguments by value
    
    No review since F08 binding is experimental now.

diff --git a/src/binding/fortran/use_mpi_f08/mpi_c_interface_cdesc.F90 b/src/binding/fortran/use_mpi_f08/mpi_c_interface_cdesc.F90
index 9c4738a..af7b15d 100644
--- a/src/binding/fortran/use_mpi_f08/mpi_c_interface_cdesc.F90
+++ b/src/binding/fortran/use_mpi_f08/mpi_c_interface_cdesc.F90
@@ -805,10 +805,10 @@ function MPIR_Compare_and_swap_cdesc(origin_addr, compare_addr, result_addr, dat
     type(*), dimension(..), intent(in), asynchronous :: origin_addr
     type(*), dimension(..), intent(in), asynchronous :: compare_addr
     type(*), dimension(..), asynchronous :: result_addr
-    integer(c_Datatype), intent(in) :: datatype
-    integer(c_int), intent(in) :: target_rank
-    integer(kind=MPI_ADDRESS_KIND), intent(in) :: target_disp
-    integer(c_Win), intent(in) :: win
+    integer(c_Datatype), value, intent(in) :: datatype
+    integer(c_int), value, intent(in) :: target_rank
+    integer(kind=MPI_ADDRESS_KIND), value, intent(in) :: target_disp
+    integer(c_Win), value, intent(in) :: win
     integer(c_int) :: ierror
 end function MPIR_Compare_and_swap_cdesc
 
@@ -821,11 +821,11 @@ function MPIR_Fetch_and_op_cdesc(origin_addr, result_addr, datatype, target_rank
     implicit none
     type(*), dimension(..), intent(in), asynchronous :: origin_addr
     type(*), dimension(..), asynchronous :: result_addr
-    integer(c_Datatype), intent(in) :: datatype
-    integer(c_int), intent(in) :: target_rank
-    integer(kind=MPI_ADDRESS_KIND), intent(in) :: target_disp
-    integer(c_Op), intent(in) :: op
-    integer(c_Win), intent(in) :: win
+    integer(c_Datatype), value, intent(in) :: datatype
+    integer(c_int), value, intent(in) :: target_rank
+    integer(kind=MPI_ADDRESS_KIND), value, intent(in) :: target_disp
+    integer(c_Op), value, intent(in) :: op
+    integer(c_Win), value, intent(in) :: win
     integer(c_int) :: ierror
 end function MPIR_Fetch_and_op_cdesc
 
@@ -855,11 +855,11 @@ function MPIR_Get_accumulate_cdesc(origin_addr, origin_count, origin_datatype, r
     implicit none
     type(*), dimension(..), intent(in), asynchronous :: origin_addr
     type(*), dimension(..), asynchronous :: result_addr
-    integer(c_int), intent(in) :: origin_count, result_count, target_rank, target_count
-    integer(c_Datatype), intent(in) :: origin_datatype, target_datatype, result_datatype
-    integer(kind=MPI_ADDRESS_KIND), intent(in) :: target_disp
-    integer(c_Op), intent(in) :: op
-    integer(c_Win), intent(in) :: win
+    integer(c_int), value, intent(in) :: origin_count, result_count, target_rank, target_count
+    integer(c_Datatype), value, intent(in) :: origin_datatype, target_datatype, result_datatype
+    integer(kind=MPI_ADDRESS_KIND), value, intent(in) :: target_disp
+    integer(c_Op), value, intent(in) :: op
+    integer(c_Win), value, intent(in) :: win
     integer(c_int) :: ierror
 end function MPIR_Get_accumulate_cdesc
 
@@ -887,11 +887,11 @@ function MPIR_Raccumulate_cdesc(origin_addr, origin_count, origin_datatype, targ
     use :: mpi_f08_compile_constants, only : MPI_ADDRESS_KIND
     implicit none
     type(*), dimension(..), intent(in), asynchronous :: origin_addr
-    integer, intent(in) :: origin_count, target_rank, target_count
-    integer(c_Datatype), intent(in) :: origin_datatype, target_datatype
-    integer(kind=MPI_ADDRESS_KIND), intent(in) :: target_disp
-    integer(c_Op), intent(in) :: op
-    integer(c_Win), intent(in) :: win
+    integer, value, intent(in) :: origin_count, target_rank, target_count
+    integer(c_Datatype), value, intent(in) :: origin_datatype, target_datatype
+    integer(kind=MPI_ADDRESS_KIND), value, intent(in) :: target_disp
+    integer(c_Op), value, intent(in) :: op
+    integer(c_Win), value, intent(in) :: win
     integer(c_Request), intent(out) :: request
     integer(c_int) :: ierror
 end function MPIR_Raccumulate_cdesc
@@ -904,10 +904,10 @@ function MPIR_Rget_cdesc(origin_addr, origin_count, origin_datatype, target_rank
     use :: mpi_f08_compile_constants, only : MPI_ADDRESS_KIND
     implicit none
     type(*), dimension(..), asynchronous :: origin_addr
-    integer, intent(in) :: origin_count, target_rank, target_count
-    integer(c_Datatype), intent(in) :: origin_datatype, target_datatype
-    integer(kind=MPI_ADDRESS_KIND), intent(in) :: target_disp
-    integer(c_Win), intent(in) :: win
+    integer, value, intent(in) :: origin_count, target_rank, target_count
+    integer(c_Datatype), value, intent(in) :: origin_datatype, target_datatype
+    integer(kind=MPI_ADDRESS_KIND), value, intent(in) :: target_disp
+    integer(c_Win), value, intent(in) :: win
     integer(c_Request), intent(out) :: request
     integer(c_int) :: ierror
 end function MPIR_Rget_cdesc
@@ -922,11 +922,11 @@ function MPIR_Rget_accumulate_cdesc(origin_addr, origin_count, origin_datatype,
     implicit none
     type(*), dimension(..), intent(in), asynchronous :: origin_addr
     type(*), dimension(..), asynchronous :: result_addr
-    integer, intent(in) :: origin_count, result_count, target_rank, target_count
-    integer(c_Datatype), intent(in) :: origin_datatype, target_datatype, result_datatype
-    integer(kind=MPI_ADDRESS_KIND), intent(in) :: target_disp
-    integer(c_Op), intent(in) :: op
-    integer(c_Win), intent(in) :: win
+    integer, value, intent(in) :: origin_count, result_count, target_rank, target_count
+    integer(c_Datatype), value, intent(in) :: origin_datatype, target_datatype, result_datatype
+    integer(kind=MPI_ADDRESS_KIND), value, intent(in) :: target_disp
+    integer(c_Op), value, intent(in) :: op
+    integer(c_Win), value, intent(in) :: win
     integer(c_Request), intent(out) :: request
     integer(c_int) :: ierror
 end function MPIR_Rget_accumulate_cdesc
@@ -939,10 +939,10 @@ function MPIR_Rput_cdesc(origin_addr, origin_count, origin_datatype, target_rank
     use :: mpi_f08_compile_constants, only : MPI_ADDRESS_KIND
     implicit none
     type(*), dimension(..), intent(in), asynchronous :: origin_addr
-    integer, intent(in) :: origin_count, target_rank, target_count
-    integer(c_Datatype), intent(in) :: origin_datatype, target_datatype
-    integer(kind=MPI_ADDRESS_KIND), intent(in) :: target_disp
-    integer(c_Win), intent(in) :: win
+    integer, value, intent(in) :: origin_count, target_rank, target_count
+    integer(c_Datatype), value, intent(in) :: origin_datatype, target_datatype
+    integer(kind=MPI_ADDRESS_KIND), value, intent(in) :: target_disp
+    integer(c_Win), value, intent(in) :: win
     integer(c_Request), intent(out) :: request
     integer(c_int) :: ierror
 end function MPIR_Rput_cdesc
@@ -953,9 +953,9 @@ function MPIR_Win_attach_cdesc(win, base, size) &
     use :: mpi_c_interface_types, only : c_Win
     use :: mpi_f08_compile_constants, only : MPI_ADDRESS_KIND
     implicit none
-    integer(c_Win), intent(in) :: win
+    integer(c_Win), value, intent(in) :: win
     type(*), dimension(..), asynchronous :: base
-    integer(kind=MPI_ADDRESS_KIND), intent(in) :: size
+    integer(kind=MPI_ADDRESS_KIND), value, intent(in) :: size
     integer(c_int) :: ierror
 end function MPIR_Win_attach_cdesc
 
@@ -979,7 +979,7 @@ function MPIR_Win_detach_cdesc(win, base) &
     use, intrinsic :: iso_c_binding, only : c_int
     use :: mpi_c_interface_types, only : c_Win
     implicit none
-    integer(c_Win), intent(in) :: win
+    integer(c_Win), value, intent(in) :: win
     type(*), dimension(..), asynchronous :: base
     integer(c_int) :: ierror
 end function MPIR_Win_detach_cdesc
diff --git a/src/binding/fortran/use_mpi_f08/mpi_c_interface_nobuf.F90 b/src/binding/fortran/use_mpi_f08/mpi_c_interface_nobuf.F90
index 306db6f..d1676b8 100644
--- a/src/binding/fortran/use_mpi_f08/mpi_c_interface_nobuf.F90
+++ b/src/binding/fortran/use_mpi_f08/mpi_c_interface_nobuf.F90
@@ -755,7 +755,7 @@ function MPIR_Comm_get_info_c(comm, info_used) &
     use, intrinsic :: iso_c_binding, only : c_int
     use :: mpi_c_interface_types, only : c_Comm, c_Info
     implicit none
-    integer(c_Comm), intent(in) :: comm
+    integer(c_Comm), value, intent(in) :: comm
     integer(c_Info), intent(out) :: info_used
     integer(c_int) :: ierror
 end function MPIR_Comm_get_info_c
@@ -765,8 +765,8 @@ function MPIR_Comm_set_info_c(comm, info) &
     use, intrinsic :: iso_c_binding, only : c_int
     use :: mpi_c_interface_types, only : c_Comm, c_Info
     implicit none
-    integer(c_Comm), intent(in) :: comm
-    integer(c_Info), intent(in) :: info
+    integer(c_Comm), value, intent(in) :: comm
+    integer(c_Info), value, intent(in) :: info
     integer(c_int) :: ierror
 end function MPIR_Comm_set_info_c
 
@@ -1787,10 +1787,10 @@ function MPIR_Win_allocate_c(size, disp_unit, info, comm, baseptr, win) &
     use :: mpi_c_interface_types, only : c_Info, c_Comm, c_Win
     use :: mpi_f08_compile_constants, only : MPI_ADDRESS_KIND
     implicit none
-    integer(kind=MPI_ADDRESS_KIND), intent(in) :: size
-    integer(c_int), intent(in) :: disp_unit
-    integer(c_Info), intent(in) :: info
-    integer(c_Comm), intent(in) :: comm
+    integer(kind=MPI_ADDRESS_KIND), value, intent(in) :: size
+    integer(c_int), value, intent(in) :: disp_unit
+    integer(c_Info), value, intent(in) :: info
+    integer(c_Comm), value, intent(in) :: comm
     type(c_ptr), intent(out) :: baseptr
     integer(c_Win), intent(out) :: win
     integer(c_int) :: ierror
@@ -1802,10 +1802,10 @@ function MPIR_Win_allocate_shared_c(size, disp_unit, info, comm, baseptr, win) &
     use :: mpi_c_interface_types, only : c_Info, c_Comm, c_Win
     use :: mpi_f08_compile_constants, only : MPI_ADDRESS_KIND
     implicit none
-    integer(kind=MPI_ADDRESS_KIND), intent(in) :: size
-    integer(c_int), intent(in) :: disp_unit
-    integer(c_Info), intent(in) :: info
-    integer(c_Comm), intent(in) :: comm
+    integer(kind=MPI_ADDRESS_KIND), value, intent(in) :: size
+    integer(c_int), value, intent(in) :: disp_unit
+    integer(c_Info), value, intent(in) :: info
+    integer(c_Comm), value, intent(in) :: comm
     type(c_ptr), intent(out) :: baseptr
     integer(c_Win), intent(out) :: win
     integer(c_int) :: ierror
@@ -1825,8 +1825,8 @@ function MPIR_Win_create_dynamic_c(info, comm, win) &
     use, intrinsic :: iso_c_binding, only : c_int
     use :: mpi_c_interface_types, only : c_Info, c_Comm, c_Win
     implicit none
-    integer(c_Info), intent(in) :: info
-    integer(c_Comm), intent(in) :: comm
+    integer(c_Info), value, intent(in) :: info
+    integer(c_Comm), value, intent(in) :: comm
     integer(c_Win), intent(out) :: win
     integer(c_int) :: ierror
 end function MPIR_Win_create_dynamic_c
@@ -1846,8 +1846,8 @@ function MPIR_Win_flush_c(rank, win) &
     use, intrinsic :: iso_c_binding, only : c_int
     use :: mpi_c_interface_types, only : c_Win
     implicit none
-    integer(c_int), intent(in) :: rank
-    integer(c_Win), intent(in) :: win
+    integer(c_int), value, intent(in) :: rank
+    integer(c_Win), value, intent(in) :: win
     integer(c_int) :: ierror
 end function MPIR_Win_flush_c
 
@@ -1856,7 +1856,7 @@ function MPIR_Win_flush_all_c(win) &
     use, intrinsic :: iso_c_binding, only : c_int
     use :: mpi_c_interface_types, only : c_Win
     implicit none
-    integer(c_Win), intent(in) :: win
+    integer(c_Win), value, intent(in) :: win
     integer(c_int) :: ierror
 end function MPIR_Win_flush_all_c
 
@@ -1865,8 +1865,8 @@ function MPIR_Win_flush_local_c(rank, win) &
     use, intrinsic :: iso_c_binding, only : c_int
     use :: mpi_c_interface_types, only : c_Win
     implicit none
-    integer(c_int), intent(in) :: rank
-    integer(c_Win), intent(in) :: win
+    integer(c_int), value, intent(in) :: rank
+    integer(c_Win), value, intent(in) :: win
     integer(c_int) :: ierror
 end function MPIR_Win_flush_local_c
 
@@ -1875,7 +1875,7 @@ function MPIR_Win_flush_local_all_c(win) &
     use, intrinsic :: iso_c_binding, only : c_int
     use :: mpi_c_interface_types, only : c_Win
     implicit none
-    integer(c_Win), intent(in) :: win
+    integer(c_Win), value, intent(in) :: win
     integer(c_int) :: ierror
 end function MPIR_Win_flush_local_all_c
 
@@ -1903,7 +1903,7 @@ function MPIR_Win_get_info_c(win, info_used) &
     use, intrinsic :: iso_c_binding, only : c_int
     use :: mpi_c_interface_types, only : c_Win, c_Info
     implicit none
-    integer(c_Win), intent(in) :: win
+    integer(c_Win), value, intent(in) :: win
     integer(c_Info), intent(out) :: info_used
     integer(c_int) :: ierror
 end function MPIR_Win_get_info_c
@@ -1923,8 +1923,8 @@ function MPIR_Win_lock_all_c(assert, win) &
     use, intrinsic :: iso_c_binding, only : c_int
     use :: mpi_c_interface_types, only : c_Win
     implicit none
-    integer(c_int), intent(in) :: assert
-    integer(c_Win), intent(in) :: win
+    integer(c_int), value, intent(in) :: assert
+    integer(c_Win), value, intent(in) :: win
     integer(c_int) :: ierror
 end function MPIR_Win_lock_all_c
 
@@ -1944,8 +1944,8 @@ function MPIR_Win_set_info_c(win, info) &
     use, intrinsic :: iso_c_binding, only : c_int
     use :: mpi_c_interface_types, only : c_Win, c_Info
     implicit none
-    integer(c_Win), intent(in) :: win
-    integer(c_Info), intent(in) :: info
+    integer(c_Win), value, intent(in) :: win
+    integer(c_Info), value, intent(in) :: info
     integer(c_int) :: ierror
 end function MPIR_Win_set_info_c
 
@@ -1955,8 +1955,8 @@ function MPIR_Win_shared_query_c(win, rank, size, disp_unit, baseptr) &
     use :: mpi_c_interface_types, only : c_Win
     use :: mpi_f08_compile_constants, only : MPI_ADDRESS_KIND
     implicit none
-    integer(c_Win), intent(in) :: win
-    integer(c_int), intent(in) :: rank
+    integer(c_Win), value, intent(in) :: win
+    integer(c_int), value, intent(in) :: rank
     integer(kind=MPI_ADDRESS_KIND), intent(out) :: size
     integer(c_int), intent(out) :: disp_unit
     type(c_ptr), intent(out) :: baseptr
@@ -1979,7 +1979,7 @@ function MPIR_Win_sync_c(win) &
     use, intrinsic :: iso_c_binding, only : c_int
     use :: mpi_c_interface_types, only : c_Win
     implicit none
-    integer(c_Win), intent(in) :: win
+    integer(c_Win), value, intent(in) :: win
     integer(c_int) :: ierror
 end function MPIR_Win_sync_c
 
@@ -2008,7 +2008,7 @@ function MPIR_Win_unlock_all_c(win) &
     use, intrinsic :: iso_c_binding, only : c_int
     use :: mpi_c_interface_types, only : c_Win
     implicit none
-    integer(c_Win), intent(in) :: win
+    integer(c_Win), value, intent(in) :: win
     integer(c_int) :: ierror
 end function MPIR_Win_unlock_all_c
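
For readers less familiar with Fortran/C interoperability, the effect of the `value` attribute can be seen from the C side. In a `bind(C)` interface, a scalar dummy argument declared with `value` corresponds to a plain C parameter, while one declared without it corresponds to a pointer. The two functions below are hypothetical illustrations of those two shapes, not MPICH wrappers, and the addition is only a placeholder computation.

```c
#include <assert.h>

/* What the C side looks like when the Fortran interface declares
 * "integer(c_int), value, intent(in)": the integers arrive directly. */
int flush_by_value(int rank, int win)
{
    return rank + win;      /* placeholder computation for the sketch */
}

/* What the C side would have to look like when "value" is omitted:
 * each dummy argument arrives as an address. */
int flush_by_reference(const int *rank, const int *win)
{
    return *rank + *win;    /* must dereference to reach the integers */
}
```

If the C function has the first shape but the Fortran interface omits `value`, the callee receives addresses where it expects integers, which is the mismatch this commit removes.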
 

http://git.mpich.org/mpich.git/commitdiff/72be0e10f27f13f8ca1a9d5f8563a6d670094dbe

commit 72be0e10f27f13f8ca1a9d5f8563a6d670094dbe
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Tue Oct 28 14:00:43 2014 -0500

    portals4: set reasonable interface limits
    
    Set reasonable limits for maximum unexpected headers and EQs at init
    time. We accomplish this with a pre-init stage where we fill in a limits
    struct with the system defaults, increase certain values (if they are not
    set already in the environment), then do the real init.
    
    If the "desired" limits structure had a way to allow default values for
    limits we don't care about, the pre-init stage could go away.
    
    Signed-off-by: Antonio J. Pena <apenya at mcs.anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
index 9535797..a6ef6e6 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
@@ -11,7 +11,8 @@
 #error Checkpointing not implemented
 #endif
 
-#define EQ_COUNT 1024
+#define UNEXPECTED_HDR_COUNT 32768
+#define EQ_COUNT             32768
 #define NID_KEY  "NID"
 #define PID_KEY  "PID"
 #define PTI_KEY  "PTI"
@@ -81,6 +82,7 @@ static int ptl_init(MPIDI_PG_t *pg_p, int pg_rank, char **bc_val_p, int *val_max
     int mpi_errno = MPI_SUCCESS;
     int ret;
     ptl_md_t md;
+    ptl_ni_limits_t desired;
     MPIDI_STATE_DECL(MPID_STATE_PTL_INIT);
 
     MPIDI_FUNC_ENTER(MPID_STATE_PTL_INIT);
@@ -101,19 +103,28 @@ static int ptl_init(MPIDI_PG_t *pg_p, int pg_rank, char **bc_val_p, int *val_max
 
     MPIDI_Anysource_improbe_fn = MPID_nem_ptl_anysource_improbe;
 
-    /* set the unexpected header limit before PtlInit, unless it is already set in the env */
-    if (getenv("PTL_LIM_MAX_UNEXPECTED_HEADERS") == NULL) {
-        char *envstr = MPIU_Strdup("PTL_LIM_MAX_UNEXPECTED_HEADERS=2000000");
-        MPL_putenv(envstr);
-        MPIU_Free(envstr);
-    }
-
     /* init portals */
     ret = PtlInit();
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlinit", "**ptlinit %s", MPID_nem_ptl_strerror(ret));
     
+    /* do an interface pre-init to get the default limits struct */
+    ret = PtlNIInit(PTL_IFACE_DEFAULT, PTL_NI_MATCHING | PTL_NI_PHYSICAL,
+                    PTL_PID_ANY, NULL, &desired, &MPIDI_nem_ptl_ni);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlniinit", "**ptlniinit %s", MPID_nem_ptl_strerror(ret));
+
+    /* finalize the interface so we can re-init with our desired maximums */
+    ret = PtlNIFini(MPIDI_nem_ptl_ni);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlnifini", "**ptlnifini %s", MPID_nem_ptl_strerror(ret));
+
+    /* set higher limits if they are determined to be too low */
+    if (desired.max_unexpected_headers < UNEXPECTED_HDR_COUNT && getenv("PTL_LIM_MAX_UNEXPECTED_HEADERS") == NULL)
+        desired.max_unexpected_headers = UNEXPECTED_HDR_COUNT;
+    if (desired.max_eqs < EQ_COUNT && getenv("PTL_LIM_MAX_EQS") == NULL)
+        desired.max_eqs = EQ_COUNT;
+
+    /* do the real init */
     ret = PtlNIInit(PTL_IFACE_DEFAULT, PTL_NI_MATCHING | PTL_NI_PHYSICAL,
-                    PTL_PID_ANY, NULL, &MPIDI_nem_ptl_ni_limits, &MPIDI_nem_ptl_ni);
+                    PTL_PID_ANY, &desired, &MPIDI_nem_ptl_ni_limits, &MPIDI_nem_ptl_ni);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlniinit", "**ptlniinit %s", MPID_nem_ptl_strerror(ret));
 
     ret = PtlEQAlloc(MPIDI_nem_ptl_ni, EQ_COUNT, &MPIDI_nem_ptl_eq);
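
The limit-raising step between the pre-init and the real init can be sketched without the Portals library. The sketch below assumes a hypothetical two-field `limits_t` in place of `ptl_ni_limits_t`, and `raise_limit` and `desired_limits` are illustrative helper names. The constants and environment variable names are the ones in the diff.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>

/* Hypothetical subset of a limits structure; not the Portals ptl_ni_limits_t. */
typedef struct {
    int max_unexpected_headers;
    int max_eqs;
} limits_t;

#define UNEXPECTED_HDR_COUNT 32768
#define EQ_COUNT             32768

/* Raise *limit to minimum unless it is already large enough or the user
 * has set env_name in the environment. */
static void raise_limit(int *limit, int minimum, const char *env_name)
{
    if (*limit < minimum && getenv(env_name) == NULL)
        *limit = minimum;
}

/* Start from the defaults (as reported by a first, throwaway init) and
 * produce the "desired" limits to pass to the real init. */
limits_t desired_limits(limits_t defaults)
{
    limits_t desired = defaults;
    raise_limit(&desired.max_unexpected_headers, UNEXPECTED_HDR_COUNT,
                "PTL_LIM_MAX_UNEXPECTED_HEADERS");
    raise_limit(&desired.max_eqs, EQ_COUNT, "PTL_LIM_MAX_EQS");
    return desired;
}
```

Copying the defaults first means every field the code does not care about keeps the system value, which is why the commit message notes the pre-init stage is needed at all.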

http://git.mpich.org/mpich.git/commitdiff/ea14e01968c919f3c493b95afaf1a6d32c9e7a21

commit ea14e01968c919f3c493b95afaf1a6d32c9e7a21
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Tue Oct 28 13:37:58 2014 -0500

    portals4: reduce unexpected buffer space
    
    Our previous increase consumed too much memory. 8MB should be enough given
    the maximum put size and number of headers.
    
    Signed-off-by: Antonio J. Pena <apenya at mcs.anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
index 3394ff4..8104eaf 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
@@ -7,7 +7,7 @@
 #include "ptl_impl.h"
 
 #define OVERFLOW_LENGTH (1024*1024)
-#define NUM_OVERFLOW_ME 512
+#define NUM_OVERFLOW_ME 8
 
 static ptl_handle_me_t overflow_me_handle[NUM_OVERFLOW_ME];
 static void *overflow_buf[NUM_OVERFLOW_ME];
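
The memory figures behind this change follow from simple arithmetic, assuming each overflow match entry owns one buffer of `OVERFLOW_LENGTH` bytes (the allocation itself is not shown in this diff, so that is an assumption). `overflow_bytes` is a hypothetical helper used only for the calculation.

```c
#include <assert.h>
#include <stddef.h>

#define OVERFLOW_LENGTH (1024 * 1024)   /* bytes per overflow buffer, from the diff */

/* Total bytes reserved for unexpected messages for a given entry count. */
size_t overflow_bytes(size_t num_overflow_me)
{
    return num_overflow_me * (size_t) OVERFLOW_LENGTH;
}
```

Under that assumption, 512 entries reserve 512 MiB per process, while 8 entries reserve the 8MB mentioned in the commit message.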

http://git.mpich.org/mpich.git/commitdiff/a019d91415622da976f6ad1d1f19a8525e936970

commit a019d91415622da976f6ad1d1f19a8525e936970
Author: Xin Zhao <xinzhao3 at illinois.edu>
Date:   Tue Oct 28 21:50:09 2014 -0500

    Revert "netmod/mxm: Avoid calling mxm send req handling from mxm send completion callback"
    
    This reverts commit 20f1f116f01d314fa3d6abf2be2a100b1a25d5de.

diff --git a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_impl.h b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_impl.h
index 43070da..3b2bb12 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_impl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_impl.h
@@ -69,8 +69,6 @@ void MPID_nem_mxm_get_adi_msg(mxm_conn_h conn, mxm_imm_t imm, void *data,
 void MPID_nem_mxm_anysource_posted(MPID_Request * req);
 int MPID_nem_mxm_anysource_matched(MPID_Request * req);
 
-int _mxm_handle_sreq(MPID_Request * req);
-
 /* List type as queue
  * Operations, initialization etc
  */
@@ -176,25 +174,6 @@ typedef struct {
 /* macro for mxm private in REQ */
 #define REQ_BASE(reqp) ((reqp) ? (MPID_nem_mxm_req_area *)((reqp)->ch.netmod_area.padding) : NULL)
 
-typedef GENERIC_Q_DECL(struct MPID_Request) MPID_nem_mxm_reqq_t;
-#define MPID_nem_mxm_queue_empty(q) GENERIC_Q_EMPTY (q)
-#define MPID_nem_mxm_queue_head(q) GENERIC_Q_HEAD (q)
-#define MPID_nem_mxm_queue_enqueue(qp, ep) do {                                           \
-        /* add refcount so req doesn't get freed before it's dequeued */                \
-        MPIR_Request_add_ref(ep);                                                       \
-        MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST,                         \
-                          "MPID_nem_mxm_queue_enqueue req=%p (handle=%#x), queue=%p",     \
-                          ep, (ep)->handle, qp));                                       \
-        GENERIC_Q_ENQUEUE (qp, ep, dev.next);                                           \
-    } while (0)
-#define MPID_nem_mxm_queue_dequeue(qp, ep)  do {                                          \
-        GENERIC_Q_DEQUEUE (qp, ep, dev.next);                                           \
-        MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST,                         \
-                          "MPID_nem_mxm_queue_dequeuereq=%p (handle=%#x), queue=%p",      \
-                          *(ep), *(ep) ? (*(ep))->handle : -1, qp));                    \
-        MPID_Request_release(*(ep));                                                    \
-    } while (0)
-
 typedef struct MPID_nem_mxm_module_t {
     char *runtime_version;
     const char *compiletime_version;
@@ -209,7 +188,6 @@ typedef struct MPID_nem_mxm_module_t {
     int mxm_np;
     MPID_nem_mxm_ep_t *endpoint;
     list_head_t free_queue;
-    MPID_nem_mxm_reqq_t sreq_queue;
     struct {
         int bulk_connect;       /* use bulk connect */
         int bulk_disconnect;    /* use bulk disconnect */
diff --git a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
index f155b82..ec04d56 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
@@ -456,8 +456,6 @@ static int _mxm_init(int rank, int size)
     list_grow_mxm_req(&_mxm_obj.free_queue);
     MPIU_Assert(list_length(&_mxm_obj.free_queue) == MXM_MPICH_MAX_REQ);
 
-    _mxm_obj.sreq_queue.head = _mxm_obj.sreq_queue.tail = NULL;
-
     mxm_obj = &_mxm_obj;
 
   fn_exit:
diff --git a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_poll.c b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_poll.c
index e8bddc3..ba7686e 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_poll.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_poll.c
@@ -24,16 +24,10 @@ static int _mxm_process_rdtype(MPID_Request ** rreq_p, MPI_Datatype datatype,
 int MPID_nem_mxm_poll(int in_blocking_progress)
 {
     int mpi_errno = MPI_SUCCESS;
-    MPID_Request *req = NULL;
 
     MPIDI_STATE_DECL(MPID_STATE_MXM_POLL);
     MPIDI_FUNC_ENTER(MPID_STATE_MXM_POLL);
 
-    while (!MPID_nem_mxm_queue_empty(mxm_obj->sreq_queue)) {
-        MPID_nem_mxm_queue_dequeue(&mxm_obj->sreq_queue, &req);
-        _mxm_handle_sreq(req);
-    }
-
     mpi_errno = _mxm_poll();
     if (mpi_errno)
         MPIU_ERR_POP(mpi_errno);
@@ -78,7 +72,6 @@ void MPID_nem_mxm_get_adi_msg(mxm_conn_h conn, mxm_imm_t imm, void *data,
     vc = mxm_conn_ctx_get(conn);
 
     _dbg_mxm_output(5, "========> Getting ADI msg (from=%d data_size %d) \n", vc->pg_rank, length);
-    _dbg_mxm_out_buf(data, (length > 16 ? 16 : length));
 
     MPID_nem_handle_pkt(vc, data, (MPIDI_msg_sz_t) (length));
 }
@@ -151,10 +144,6 @@ int MPID_nem_mxm_anysource_matched(MPID_Request * req)
 int MPID_nem_mxm_recv(MPIDI_VC_t * vc, MPID_Request * rreq)
 {
     int mpi_errno = MPI_SUCCESS;
-    MPIDI_msg_sz_t data_sz;
-    int dt_contig;
-    MPI_Aint dt_true_lb;
-    MPID_Datatype *dt_ptr;
 
     MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_MXM_RECV);
     MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_MXM_RECV);
@@ -163,15 +152,18 @@ int MPID_nem_mxm_recv(MPIDI_VC_t * vc, MPID_Request * rreq)
     MPIU_Assert(((rreq->dev.match.parts.rank == MPI_ANY_SOURCE) && (vc == NULL)) ||
                 (vc && !vc->ch.is_local));
 
-    MPIDI_Datatype_get_info(rreq->dev.user_count, rreq->dev.datatype, dt_contig, data_sz,
-                            dt_ptr, dt_true_lb);
-
     {
         MPIR_Context_id_t context_id = rreq->dev.match.parts.context_id;
         int tag = rreq->dev.match.parts.tag;
+        MPIDI_msg_sz_t data_sz;
+        int dt_contig;
+        MPI_Aint dt_true_lb;
+        MPID_Datatype *dt_ptr;
         MPID_nem_mxm_vc_area *vc_area = NULL;
         MPID_nem_mxm_req_area *req_area = NULL;
 
+        MPIDI_Datatype_get_info(rreq->dev.user_count, rreq->dev.datatype, dt_contig, data_sz,
+                                dt_ptr, dt_true_lb);
         rreq->dev.OnDataAvail = NULL;
         rreq->dev.tmpbuf = NULL;
         rreq->ch.vc = vc;
@@ -231,6 +223,7 @@ static int _mxm_handle_rreq(MPID_Request * req)
     MPIDI_msg_sz_t userbuf_sz;
     MPID_Datatype *dt_ptr;
     MPIDI_msg_sz_t data_sz;
+    MPIDI_VC_t *vc = NULL;
     MPID_nem_mxm_vc_area *vc_area ATTRIBUTE((unused)) = NULL;
     MPID_nem_mxm_req_area *req_area = NULL;
     void *tmp_buf = NULL;
@@ -326,7 +319,7 @@ static int _mxm_handle_rreq(MPID_Request * req)
         }
     }
 
-    MPIDI_CH3U_Handle_recv_req(req->ch.vc, req, &complete);
+    MPIDI_CH3U_Handle_recv_req(vc, req, &complete);
     MPIU_Assert(complete == TRUE);
 
     if (tmp_buf) MPIU_Free(tmp_buf);
diff --git a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_send.c b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_send.c
index 69f3adc..75003bf 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_send.c
@@ -15,6 +15,7 @@ enum {
 };
 
 
+static int _mxm_handle_sreq(MPID_Request * req);
 static void _mxm_send_completion_cb(void *context);
 static int _mxm_isend(MPID_nem_mxm_ep_t * ep, MPID_nem_mxm_req_area * req,
                       int type, mxm_mq_h mxm_mq, int mxm_rank, int id, mxm_tag_t tag, int block);
@@ -234,7 +235,6 @@ int MPID_nem_mxm_send(MPIDI_VC_t * vc, const void *buf, int count, MPI_Datatype
     MPIDI_Request_create_sreq(sreq, mpi_errno, goto fn_exit);
     MPIU_Assert(sreq != NULL);
     MPIDI_Request_set_type(sreq, MPIDI_REQUEST_TYPE_SEND);
-
     MPIDI_VC_FAI_send_seqnum(vc, seqnum);
     MPIDI_Request_set_seqnum(sreq, seqnum);
     if (HANDLE_GET_KIND(datatype) != HANDLE_KIND_BUILTIN) {
@@ -336,8 +336,7 @@ int MPID_nem_mxm_ssend(MPIDI_VC_t * vc, const void *buf, int count, MPI_Datatype
     /* create a request */
     MPIDI_Request_create_sreq(sreq, mpi_errno, goto fn_exit);
     MPIU_Assert(sreq != NULL);
-    MPIDI_Request_set_type(sreq, MPIDI_REQUEST_TYPE_SSEND);
-
+    MPIDI_Request_set_type(sreq, MPIDI_REQUEST_TYPE_SEND);
     MPIDI_VC_FAI_send_seqnum(vc, seqnum);
     MPIDI_Request_set_seqnum(sreq, seqnum);
     if (HANDLE_GET_KIND(datatype) != HANDLE_KIND_BUILTIN) {
@@ -440,7 +439,6 @@ int MPID_nem_mxm_isend(MPIDI_VC_t * vc, const void *buf, int count, MPI_Datatype
     MPIDI_Request_create_sreq(sreq, mpi_errno, goto fn_exit);
     MPIU_Assert(sreq != NULL);
     MPIDI_Request_set_type(sreq, MPIDI_REQUEST_TYPE_SEND);
-
     MPIDI_VC_FAI_send_seqnum(vc, seqnum);
     MPIDI_Request_set_seqnum(sreq, seqnum);
     if (HANDLE_GET_KIND(datatype) != HANDLE_KIND_BUILTIN) {
@@ -543,8 +541,7 @@ int MPID_nem_mxm_issend(MPIDI_VC_t * vc, const void *buf, int count, MPI_Datatyp
     /* create a request */
     MPIDI_Request_create_sreq(sreq, mpi_errno, goto fn_exit);
     MPIU_Assert(sreq != NULL);
-    MPIDI_Request_set_type(sreq, MPIDI_REQUEST_TYPE_SSEND);
-
+    MPIDI_Request_set_type(sreq, MPIDI_REQUEST_TYPE_SEND);
     MPIDI_VC_FAI_send_seqnum(vc, seqnum);
     MPIDI_Request_set_seqnum(sreq, seqnum);
     if (HANDLE_GET_KIND(datatype) != HANDLE_KIND_BUILTIN) {
@@ -622,9 +619,10 @@ int MPID_nem_mxm_issend(MPIDI_VC_t * vc, const void *buf, int count, MPI_Datatyp
 }
 
 
-int _mxm_handle_sreq(MPID_Request * req)
+static int _mxm_handle_sreq(MPID_Request * req)
 {
     int complete = FALSE;
+    int (*reqFn) (MPIDI_VC_t *, MPID_Request *, int *);
     MPID_nem_mxm_vc_area *vc_area = NULL;
     MPID_nem_mxm_req_area *req_area = NULL;
 
@@ -636,10 +634,8 @@ int _mxm_handle_sreq(MPID_Request * req)
                       16 ? 16 : req_area->iov_buf[0].length));
 
     vc_area->pending_sends -= 1;
-    if (req->dev.tmpbuf) {
-        if (req->dev.datatype_ptr || req->ch.noncontig) {
-            MPIU_Free(req->dev.tmpbuf);
-        }
+    if (((req->dev.datatype_ptr != NULL) && (req->dev.tmpbuf != NULL))) {
+        MPIU_Free(req->dev.tmpbuf);
     }
 
     if (req_area->iov_count > MXM_MPICH_MAX_IOV) {
@@ -648,8 +644,19 @@ int _mxm_handle_sreq(MPID_Request * req)
         req_area->iov_count = 0;
     }
 
-    MPIDI_CH3U_Handle_send_req(req->ch.vc, req, &complete);
-    MPIU_Assert(complete == TRUE);
+    reqFn = req->dev.OnDataAvail;
+    if (!reqFn) {
+        MPIDI_CH3U_Request_complete(req);
+        MPIU_DBG_MSG(CH3_CHANNEL, VERBOSE, ".... complete");
+    }
+    else {
+        MPIDI_VC_t *vc = req->ch.vc;
+
+        reqFn(vc, req, &complete);
+        if (!complete) {
+            MPIU_Assert(complete == TRUE);
+        }
+    }
 
     return complete;
 }
@@ -676,7 +683,7 @@ static void _mxm_send_completion_cb(void *context)
                     req, req->status.MPI_ERROR);
 
     if (likely(!MPIR_STATUS_GET_CANCEL_BIT(req->status))) {
-        MPID_nem_mxm_queue_enqueue(&mxm_obj->sreq_queue, req);
+        _mxm_handle_sreq(req);
     }
 }
 

http://git.mpich.org/mpich.git/commitdiff/85e5828612587e5fbc9ea1415fba9263a9f5e168

commit 85e5828612587e5fbc9ea1415fba9263a9f5e168
Author: Paul Coffman <pkcoff at us.ibm.com>
Date:   Mon Oct 27 21:42:43 2014 -0500

    Assign large blocks first in ADIOI_GPFS_Calc_file_domains
    
    For files that are less than the size of a GPFS block there seems to be
    an issue if successive MPI_File_write_at_all calls are made with proceeding
    offsets.  Given the simple case of 2 aggs, the 2nd agg/fd will be utilized,
    however the initial offset into the 2nd agg is distorted on the 2nd call
    to MPI_File_write_at_all because of the negative size of the 1st agg/fd:
    the offset into the 2nd agg/fd is influenced by the size of the
    first.  The simple solution is to reverse the default large block assignment
    so that in the case where only 1 agg/fd will be used it will be the first.
    By chance, in the 2 agg situation this is what the GPFSMPIO_BALANCECONTIG
    optimization does, and it does not have this problem.
    
    Signed-off-by: Rob Latham <robl at mcs.anl.gov>
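
The reordering described above can be illustrated with a minimal sketch. This is not the ROMIO code: `assign_fd_sizes` is a hypothetical helper, and deriving the number of large domains as `nblocks % naggs` is an assumption made for illustration (the diff below only shows the loop that consumes `naggs_large` and `nb_cn_small`).

```c
#include <assert.h>

/* Hypothetical helper, not part of ROMIO: split nblocks file-system
 * blocks across naggs aggregators so that the aggregators receiving
 * one extra block come first. */
static void assign_fd_sizes(long long nblocks, int naggs,
                            long long blksize, long long *fd_size)
{
    assert(naggs > 0 && nblocks >= 0 && blksize > 0);

    long long nb_small = nblocks / naggs;      /* blocks in a small domain */
    int naggs_large = (int)(nblocks % naggs);  /* domains with one extra block (assumed) */

    for (int i = 0; i < naggs; i++) {
        if (i < naggs_large)
            fd_size[i] = (nb_small + 1) * blksize;  /* large domains first */
        else
            fd_size[i] = nb_small * blksize;
    }
}
```

With one block and two aggregators, the first domain gets the whole block and the second is empty, which is the single-aggregator case the commit message wants to land on the first agg/fd.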

diff --git a/src/mpi/romio/adio/ad_gpfs/ad_gpfs_aggrs.c b/src/mpi/romio/adio/ad_gpfs/ad_gpfs_aggrs.c
index 0e67b54..e403d9b 100644
--- a/src/mpi/romio/adio/ad_gpfs/ad_gpfs_aggrs.c
+++ b/src/mpi/romio/adio/ad_gpfs/ad_gpfs_aggrs.c
@@ -367,10 +367,10 @@ void ADIOI_GPFS_Calc_file_domains(ADIO_File fd,
 	/* BG/L- and BG/P-style distribution of file domains: simple allocation of
 	 * file domins to each aggregator */
 	for (i=0; i<naggs; i++) {
-	    if (i < naggs_small) {
-		fd_size[i] = nb_cn_small     * blksize;
-	    } else {
+	    if (i < naggs_large) {
 		fd_size[i] = (nb_cn_small+1) * blksize;
+	    } else {
+		fd_size[i] = nb_cn_small     * blksize;
 	    }
 	}
     }
@@ -387,12 +387,12 @@ void ADIOI_GPFS_Calc_file_domains(ADIO_File fd,
 
 #else // not BGQ platform
 	for (i=0; i<naggs; i++) {
-	    if (i < naggs_small) {
-		fd_size[i] = nb_cn_small     * blksize;
-	    } else {
+	    if (i < naggs_large) {
 		fd_size[i] = (nb_cn_small+1) * blksize;
+	    } else {
+		fd_size[i] = nb_cn_small     * blksize;
 	    }
-	}
+    }
 
 #endif
 

http://git.mpich.org/mpich.git/commitdiff/52a4eea12b149d6a3151682a4c3eecd15599fdf6

commit 52a4eea12b149d6a3151682a4c3eecd15599fdf6
Author: Paul Coffman <pkcoff at us.ibm.com>
Date:   Mon Oct 27 13:57:22 2014 -0500

    MP_IOTASKLIST error checking
    
    PE users may manually specify MP_IOTASKLIST for explicit aggregator
    selection.  Code needed to be added to verify that the aggregators
    specified by the user are all valid.
    
    Do our best to maintain the old PE behavior: use as much of the
    correctly specified MP_IOTASKLIST as possible, issue what PE labeled
    error messages (but were really warnings) about the incorrect
    portions, and functionally just ignore those portions, unless none of
    the list is usable, in which case fall back on the default.
    
    Signed-off-by: Rob Latham <robl at mcs.anl.gov>
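
A rough sketch of the validation policy described above, assuming the `count:rank:rank:...` layout that the parsing code in the diff implies. `parse_task_list` and its parameters are hypothetical and simplified (no warnings are printed, and tokens are not limited to 7 digits); it is not the PE implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse "N:r1:r2:..." into out[], keeping only
 * entries that are all digits, within [0, comm_size), and not already
 * listed. Returns the number of accepted ranks; 0 means the caller
 * should fall back to default aggregator selection. list must be
 * non-NULL. */
static int parse_task_list(const char *list, int comm_size,
                           int *out, int out_cap)
{
    char *end;
    long requested = strtol(list, &end, 10);

    if (end == list || requested <= 0)
        return 0;                        /* missing or zero count */
    if (requested > comm_size)
        requested = comm_size;           /* clamp to communicator size */
    if (requested > out_cap)
        requested = out_cap;

    int n = 0;
    const char *p = end;
    while (*p == ':' && n < requested) {
        p++;                             /* advance past the ':' */
        size_t len = strcspn(p, ":");    /* token runs to next ':' or end */
        int ok = (len > 0);
        long rank = 0;

        for (size_t k = 0; k < len && ok; k++) {
            if (p[k] < '0' || p[k] > '9') {
                ok = 0;                  /* not a valid integer task id */
            } else {
                rank = rank * 10 + (p[k] - '0');
                if (rank >= comm_size)
                    ok = 0;              /* outside the task range */
            }
        }
        for (int i = 0; ok && i < n; i++)
            if (out[i] == (int)rank)
                ok = 0;                  /* duplicate entry is ignored */
        if (ok)
            out[n++] = (int)rank;
        p += len;
    }
    return n;
}
```

For example, with a communicator of size 4, the list `3:0:2:5` yields two aggregators (0 and 2), since rank 5 is out of range and is skipped rather than aborting the whole list.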

diff --git a/src/mpi/romio/adio/ad_gpfs/pe/ad_pe_aggrs.c b/src/mpi/romio/adio/ad_gpfs/pe/ad_pe_aggrs.c
index dfeeff5..8453238 100644
--- a/src/mpi/romio/adio/ad_gpfs/pe/ad_pe_aggrs.c
+++ b/src/mpi/romio/adio/ad_gpfs/pe/ad_pe_aggrs.c
@@ -60,50 +60,89 @@ ADIOI_PE_gen_agg_ranklist(ADIO_File fd)
     int i,j;
     int inTERcommFlag = 0;
 
-    int myRank;
+    int myRank,commSize;
     MPI_Comm_rank(fd->comm, &myRank);
+    MPI_Comm_size(fd->comm, &commSize);
 
     MPI_Comm_test_inter(fd->comm, &inTERcommFlag);
     if (inTERcommFlag) {
-      FPRINTF(stderr,"inTERcomms are not supported in MPI-IO - aborting....\n");
+      FPRINTF(stderr,"ERROR: ATTENTION: inTERcomms are not supported in MPI-IO - aborting....\n");
       perror("ADIOI_PE_gen_agg_ranklist:");
       MPI_Abort(MPI_COMM_WORLD, 1);
     }
 
     if (ioTaskList) {
+      int ioTaskListLen = strlen(ioTaskList);
+      int ioTaskListPos = 0;
       char tmpBuf[8];   /* Big enough for 1M tasks (7 digits task ID). */
       tmpBuf[7] = '\0';
       for (i=0; i<7; i++) {
          tmpBuf[i] = *ioTaskList++;      /* Maximum is 7 digits for 1 million. */
+         ioTaskListPos++;
          if (*ioTaskList == ':') {       /* If the next char is a ':' ends it. */
              tmpBuf[i+1] = '\0';
              break;
          }
       }
       numAggs = atoi(tmpBuf);
+      if (numAggs == 0)
+        FPRINTF(stderr,"ERROR: ATTENTION: Number of aggregators specified in MP_IOTASKLIST set at 0 - default aggregator selection will be used.\n");
+      else if (!((numAggs > 0 ) && (numAggs <= commSize))) {
+        FPRINTF(stderr,"ERROR: ATTENTION: The number of aggregators (%s) specified in MP_IOTASKLIST is outside the communicator task range of %d.\n",tmpBuf,commSize);
+        numAggs = commSize;
+      }
       fd->hints->ranklist = (int *) ADIOI_Malloc (numAggs * sizeof(int));
 
-      for (j=0; j<numAggs; j++) {
+      int aggIndex = 0;
+      while (aggIndex < numAggs) {
          ioTaskList++;                /* Advance past the ':' */
+         ioTaskListPos++;
+         int allDigits=1;
          for (i=0; i<7; i++) {
+            if (*ioTaskList < '0' || *ioTaskList > '9')
+              allDigits=0;
             tmpBuf[i] = *ioTaskList++;
+            ioTaskListPos++;
             if ( (*ioTaskList == ':') || (*ioTaskList == '\0') ) {
                 tmpBuf[i+1] = '\0';
                 break;
             }
          }
-         fd->hints->ranklist[j] = atoi(tmpBuf);
+         if (allDigits) {
+           int newAggRank = atoi(tmpBuf);
+           if (!((newAggRank >= 0 ) && (newAggRank < commSize))) {
+             FPRINTF(stderr,"ERROR: ATTENTION: The aggregator '%s' specified in MP_IOTASKLIST is not within the communicator task range of 0 to %d  - it will be ignored.\n",tmpBuf,commSize-1);
+           }
+           else {
+             int aggAlreadyAdded = 0;
+             for (i=0;i<aggIndex;i++)
+               if (fd->hints->ranklist[i] == newAggRank) {
+                 aggAlreadyAdded = 1;
+                 break;
+               }
+             if (!aggAlreadyAdded)
+               fd->hints->ranklist[aggIndex++] = newAggRank;
+             else
+               FPRINTF(stderr,"ERROR: ATTENTION: The aggregator '%d' is specified multiple times in MP_IOTASKLIST - duplicates are ignored.\n",newAggRank);
+           }
+         }
+         else {
+           FPRINTF(stderr,"ERROR: ATTENTION: The aggregator '%s' specified in MP_IOTASKLIST is not a valid integer task id  - it will be ignored.\n",tmpBuf);
+         }
 
          /* At the end check whether the list is shorter than specified. */
-         if (*ioTaskList == '\0') {
-            if (j < (numAggs-1)) {
-               numAggs = j;
-            }
-            break;
+         if (ioTaskListPos == ioTaskListLen) {
+           if (aggIndex == 0) {
+             FPRINTF(stderr,"ERROR: ATTENTION: No aggregators were correctly specified in MP_IOTASKLIST - default aggregator selection will be used.\n");
+             ADIOI_Free(fd->hints->ranklist);
+           }
+           else if (aggIndex < numAggs)
+             FPRINTF(stderr,"ERROR: ATTENTION: %d aggregators were specified in MP_IOTASKLIST but only %d were correctly specified - setting the number of aggregators to %d.\n",numAggs, aggIndex,aggIndex);
+           numAggs = aggIndex;
          }
       }
     }
-    else {
+    if (numAggs == 0)  {
       MPID_Comm *mpidCommData;
 
       MPID_Comm_get_ptr(fd->comm,mpidCommData);

http://git.mpich.org/mpich.git/commitdiff/fcd9f271157054099b4868e8eee3c3314a730ee9

commit fcd9f271157054099b4868e8eee3c3314a730ee9
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Thu Oct 23 17:02:59 2014 -0500

    portals4: fix get buffer location for packed sends
    
    This code did not account for the fact that the first part of the message
    is already sent in a PtlPut.
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>
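
The pointer arithmetic behind this fix can be sketched in isolation. The names below (`get_region`, `remaining_after_put`, `eager_bytes`) are hypothetical; `eager_bytes` stands in for the part of the packed message already delivered by the put, which in the diff is `PTL_LARGE_THRESHOLD`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of the region the receiver still has to fetch. */
struct get_region {
    const char *start;
    size_t length;
};

/* Skip the bytes that were already sent eagerly; expose only the rest. */
static struct get_region remaining_after_put(const void *packed,
                                             size_t data_sz,
                                             size_t eager_bytes)
{
    assert(data_sz >= eager_bytes);

    struct get_region r;
    r.start = (const char *)packed + eager_bytes;  /* cast so the offset is in bytes */
    r.length = data_sz - eager_bytes;
    return r;
}
```

The cast to a character pointer matters because arithmetic on `void *` is not defined by standard C; the diff applies the same cast to the chunk buffer.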

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
index 0ab2bc7..72d64b3 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
@@ -379,8 +379,8 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
     MPI_nem_ptl_pack_byte(sreq->dev.segment_ptr, 0, data_sz, REQ_PTL(sreq)->chunk_buffer[0], &REQ_PTL(sreq)->overflow[0]);
 
     /* create ME for buffer so receiver can issue a GET for the data */
-    me.start = REQ_PTL(sreq)->chunk_buffer[0];
-    me.length = data_sz;
+    me.start = (char *)REQ_PTL(sreq)->chunk_buffer[0] + PTL_LARGE_THRESHOLD;
+    me.length = data_sz - PTL_LARGE_THRESHOLD;
     me.ct_handle = PTL_CT_NONE;
     me.uid = PTL_UID_ANY;
     me.options = ( PTL_ME_OP_PUT | PTL_ME_OP_GET | PTL_ME_USE_ONCE | PTL_ME_IS_ACCESSIBLE | PTL_ME_EVENT_LINK_DISABLE |

http://git.mpich.org/mpich.git/commitdiff/730c2fd96f3fba3bf263803b70b62e928c76958b

commit 730c2fd96f3fba3bf263803b70b62e928c76958b
Author: Pavan Balaji <balaji at anl.gov>
Date:   Tue Oct 7 22:11:10 2014 -0500

    MPICH-specific initialization of mxm.
    
    The defaults used by MXM might not be fully appropriate for mpich.  So
    we automatically initialize it to our preferred defaults unless the
    user is trying to override it.
    
    Signed-off-by: Xin Zhao <xinzhao3 at illinois.edu>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
index 37e0558..f155b82 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
@@ -107,6 +107,7 @@ static int _mxm_conf(void);
 #define FCNAME MPIDI_QUOTE(FUNCNAME)
 int MPID_nem_mxm_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_max_sz_p)
 {
+    int r;
     int mpi_errno = MPI_SUCCESS;
 
     MPIDI_STATE_DECL(MPID_STATE_MXM_INIT);
@@ -116,6 +117,15 @@ int MPID_nem_mxm_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_
     MPIU_Assert(sizeof(MPID_nem_mxm_vc_area) <= MPID_NEM_VC_NETMOD_AREA_LEN);
     MPIU_Assert(sizeof(MPID_nem_mxm_req_area) <= MPID_NEM_REQ_NETMOD_AREA_LEN);
 
+
+    /* mpich-specific initialization of mxm */
+    /* check if the user is not trying to override the tls setting
+     * before resetting it */
+    if (getenv("MXM_TLS") == NULL) {
+        r = MPL_putenv("MXM_TLS=rc,dc,ud");
+        MPIU_ERR_CHKANDJUMP(r, mpi_errno, MPI_ERR_OTHER, "**putenv");
+    }
+
     mpi_errno = _mxm_init(pg_rank, pg_p->size);
     if (mpi_errno)
         MPIU_ERR_POP(mpi_errno);

http://git.mpich.org/mpich.git/commitdiff/c038bc9e2c21e6c3b8429e8af20e7940387a899b

commit c038bc9e2c21e6c3b8429e8af20e7940387a899b
Author: Xin Zhao <xinzhao3 at illinois.edu>
Date:   Sat Oct 25 16:15:59 2014 -0500

    Revert "MPICH-specific initialization of mxm."
    
    This reverts commit 4ce4103ae1ee924c8d27dc5202f669e6bb5e8a35.

diff --git a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
index f155b82..37e0558 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
@@ -107,7 +107,6 @@ static int _mxm_conf(void);
 #define FCNAME MPIDI_QUOTE(FUNCNAME)
 int MPID_nem_mxm_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_max_sz_p)
 {
-    int r;
     int mpi_errno = MPI_SUCCESS;
 
     MPIDI_STATE_DECL(MPID_STATE_MXM_INIT);
@@ -117,15 +116,6 @@ int MPID_nem_mxm_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_
     MPIU_Assert(sizeof(MPID_nem_mxm_vc_area) <= MPID_NEM_VC_NETMOD_AREA_LEN);
     MPIU_Assert(sizeof(MPID_nem_mxm_req_area) <= MPID_NEM_REQ_NETMOD_AREA_LEN);
 
-
-    /* mpich-specific initialization of mxm */
-    /* check if the user is not trying to override the tls setting
-     * before resetting it */
-    if (getenv("MXM_TLS") == NULL) {
-        r = MPL_putenv("MXM_TLS=rc,dc,ud");
-        MPIU_ERR_CHKANDJUMP(r, mpi_errno, MPI_ERR_OTHER, "**putenv");
-    }
-
     mpi_errno = _mxm_init(pg_rank, pg_p->size);
     if (mpi_errno)
         MPIU_ERR_POP(mpi_errno);

http://git.mpich.org/mpich.git/commitdiff/f1a258b10c4b1ae4c17c5dee1f440be7d76954b5

commit f1a258b10c4b1ae4c17c5dee1f440be7d76954b5
Author: Devendar Bureddy <devendar at mellanox.com>
Date:   Sat Oct 25 03:04:00 2014 +0300

    Fix for threaded_sr
    
    Do not block in mxm for blocking sends. Instead, set the req pointer and
    let the MPICH MPI layer block on it. This allows progress to go through
    MPIDI_CH3I_Progress(), which can release the global mutex in the
    thread-multiple case.
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_send.c b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_send.c
index 7e490b3..69f3adc 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_send.c
@@ -297,7 +297,7 @@ int MPID_nem_mxm_send(MPIDI_VC_t * vc, const void *buf, int count, MPI_Datatype
                                                                                               comm->context_id
                                                                                               +
                                                                                               context_offset),
-                           1);
+                           0);
     if (mpi_errno)
         MPIU_ERR_POP(mpi_errno);
 
@@ -400,7 +400,7 @@ int MPID_nem_mxm_ssend(MPIDI_VC_t * vc, const void *buf, int count, MPI_Datatype
                                                                                               comm->context_id
                                                                                               +
                                                                                               context_offset),
-                           1);
+                           0);
     if (mpi_errno)
         MPIU_ERR_POP(mpi_errno);
 

http://git.mpich.org/mpich.git/commitdiff/3e73779aa952521b7b3f18651aaa8b91a1128ed4

commit 3e73779aa952521b7b3f18651aaa8b91a1128ed4
Author: Rob Latham <robl at mcs.anl.gov>
Date:   Fri Oct 24 09:31:28 2014 -0500

    guard against null file data representation
    
    "native", "internal", and "external32" are the only valid values for
    datarep, but if a user passes null
    (http://stackoverflow.com/questions/26548398/segmentation-fault-while-using-mpi-file-read-at/)
    then strcmp will segfault.
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>
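
The check can also be written as a standalone predicate, shown here as a minimal sketch. `datarep_is_valid` is a hypothetical helper, not a ROMIO function; the accepted spellings are the six strings compared in the diff below.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if datarep is one of the spellings
 * accepted by the check in the diff, 0 otherwise (including NULL). */
static int datarep_is_valid(const char *datarep)
{
    static const char *const accepted[] = {
        "native", "NATIVE",
        "external32", "EXTERNAL32",
        "internal", "INTERNAL"
    };

    if (datarep == NULL)
        return 0;   /* test for NULL before any strcmp call */

    for (size_t i = 0; i < sizeof accepted / sizeof accepted[0]; i++)
        if (strcmp(datarep, accepted[i]) == 0)
            return 1;
    return 0;
}
```

Because the NULL test comes first, `strcmp` is never called with a null pointer, which is undefined behavior in C and the cause of the reported segfault.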

diff --git a/src/mpi/romio/mpi-io/set_view.c b/src/mpi/romio/mpi-io/set_view.c
index 7f871c7..4a820a8 100644
--- a/src/mpi/romio/mpi-io/set_view.c
+++ b/src/mpi/romio/mpi-io/set_view.c
@@ -126,12 +126,12 @@ int MPI_File_set_view(MPI_File fh, MPI_Offset disp, MPI_Datatype etype,
 	goto fn_exit;
     }
 
-    if (strcmp(datarep, "native") && 
+    if ((datarep == NULL) || (strcmp(datarep, "native") &&
 	    strcmp(datarep, "NATIVE") &&
 	    strcmp(datarep, "external32") &&
 	    strcmp(datarep, "EXTERNAL32") &&
 	    strcmp(datarep, "internal") &&
-	    strcmp(datarep, "INTERNAL"))
+	    strcmp(datarep, "INTERNAL")) )
     {
 	error_code = MPIO_Err_create_code(MPI_SUCCESS, MPIR_ERR_RECOVERABLE,
 					  myname, __LINE__,

http://git.mpich.org/mpich.git/commitdiff/abfad0fca2124c496b584c7e1595110e9ab29f37

commit abfad0fca2124c496b584c7e1595110e9ab29f37
Author: Pavan Balaji <balaji at anl.gov>
Date:   Tue Oct 7 22:11:10 2014 -0500

    MPICH-specific initialization of mxm.
    
    The defaults used by MXM might not be fully appropriate for mpich.  So
    we automatically initialize it to our preferred defaults unless the
    user is trying to override it.
    
    Signed-off-by: Xin Zhao <xinzhao3 at illinois.edu>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
index 37e0558..f155b82 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
@@ -107,6 +107,7 @@ static int _mxm_conf(void);
 #define FCNAME MPIDI_QUOTE(FUNCNAME)
 int MPID_nem_mxm_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_max_sz_p)
 {
+    int r;
     int mpi_errno = MPI_SUCCESS;
 
     MPIDI_STATE_DECL(MPID_STATE_MXM_INIT);
@@ -116,6 +117,15 @@ int MPID_nem_mxm_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_
     MPIU_Assert(sizeof(MPID_nem_mxm_vc_area) <= MPID_NEM_VC_NETMOD_AREA_LEN);
     MPIU_Assert(sizeof(MPID_nem_mxm_req_area) <= MPID_NEM_REQ_NETMOD_AREA_LEN);
 
+
+    /* mpich-specific initialization of mxm */
+    /* check if the user is not trying to override the tls setting
+     * before resetting it */
+    if (getenv("MXM_TLS") == NULL) {
+        r = MPL_putenv("MXM_TLS=rc,dc,ud");
+        MPIU_ERR_CHKANDJUMP(r, mpi_errno, MPI_ERR_OTHER, "**putenv");
+    }
+
     mpi_errno = _mxm_init(pg_rank, pg_p->size);
     if (mpi_errno)
         MPIU_ERR_POP(mpi_errno);

http://git.mpich.org/mpich.git/commitdiff/ace5acc6f9e7762debbc234a10b69c4161535998

commit ace5acc6f9e7762debbc234a10b69c4161535998
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Thu Oct 23 10:14:08 2014 -0500

    portals4: increase unexpected message limits
    
    Set the unexpected message header limit to 2 million and allocate
    512MB of buffer space.
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>
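
The "set a default unless the user already chose a value" pattern used for the header limit can be sketched with plain POSIX calls. `set_env_default` is a hypothetical helper; the sketch uses `setenv`, which copies its arguments, rather than the `MPL_putenv` wrapper in the diff, whose ownership rules are not shown here.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Hypothetical helper: give name a default value only when the user
 * has not already set it. Returns 0 on success or when the variable
 * was left untouched, -1 if setenv fails. */
static int set_env_default(const char *name, const char *value)
{
    if (getenv(name) != NULL)
        return 0;                   /* respect the user's override */
    return setenv(name, value, 1);  /* setenv copies name and value */
}
```

With POSIX `putenv`, by contrast, the string passed in becomes part of the environment itself, so the caller must keep that storage alive for as long as the variable is set.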

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
index 09d84f7..9535797 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
@@ -101,6 +101,13 @@ static int ptl_init(MPIDI_PG_t *pg_p, int pg_rank, char **bc_val_p, int *val_max
 
     MPIDI_Anysource_improbe_fn = MPID_nem_ptl_anysource_improbe;
 
+    /* set the unexpected header limit before PtlInit, unless it is already set in the env */
+    if (getenv("PTL_LIM_MAX_UNEXPECTED_HEADERS") == NULL) {
+        char *envstr = MPIU_Strdup("PTL_LIM_MAX_UNEXPECTED_HEADERS=2000000");
+        MPL_putenv(envstr);
+        MPIU_Free(envstr);
+    }
+
     /* init portals */
     ret = PtlInit();
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlinit", "**ptlinit %s", MPID_nem_ptl_strerror(ret));
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
index 2167e18..3394ff4 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_poll.c
@@ -7,7 +7,7 @@
 #include "ptl_impl.h"
 
 #define OVERFLOW_LENGTH (1024*1024)
-#define NUM_OVERFLOW_ME 50
+#define NUM_OVERFLOW_ME 512
 
 static ptl_handle_me_t overflow_me_handle[NUM_OVERFLOW_ME];
 static void *overflow_buf[NUM_OVERFLOW_ME];

http://git.mpich.org/mpich.git/commitdiff/b9f30794ca2f45fc3d5dbda4b7b5d04cc3083cbd

commit b9f30794ca2f45fc3d5dbda4b7b5d04cc3083cbd
Author: Wesley Bland <wbland at anl.gov>
Date:   Thu Oct 23 12:47:58 2014 -0500

    Fix typo in 72513b14

diff --git a/src/mpi/coll/helper_fns.c b/src/mpi/coll/helper_fns.c
index a730f7e..8fbbe56 100644
--- a/src/mpi/coll/helper_fns.c
+++ b/src/mpi/coll/helper_fns.c
@@ -730,7 +730,7 @@ int MPIC_Waitall(int numreq, MPI_Request requests[], MPI_Status statuses[], int
 
  fn_exit:
     MPIU_DBG_MSG_S(PT2PT, TYPICAL, "OUT: errflag = %s", *errflag?"TRUE":"FALSE");
-    MPIDI_FUNC_EXIT(MPID_STATE_MPIC_WAITALL
+    MPIDI_FUNC_EXIT(MPID_STATE_MPIC_WAITALL);
     return mpi_errno;
  fn_fail:
     goto fn_exit;

http://git.mpich.org/mpich.git/commitdiff/621c618061176588d8d8d96737c925c86ba51348

commit 621c618061176588d8d8d96737c925c86ba51348
Author: Wesley Bland <wbland at anl.gov>
Date:   Thu Oct 23 10:13:42 2014 -0500

    Remove _FT from state names
    
    Back in the 3.1 series, we made the FT versions of all of the MPIC functions
    default. However, we never changed the names of all of the states. This
    removes the extra state names.
    
    No reviewer.

diff --git a/src/mpi/coll/helper_fns.c b/src/mpi/coll/helper_fns.c
index 1ee252c..a730f7e 100644
--- a/src/mpi/coll/helper_fns.c
+++ b/src/mpi/coll/helper_fns.c
@@ -284,9 +284,9 @@ int MPIC_Send(const void *buf, int count, MPI_Datatype datatype, int dest, int t
     int context_id;
     MPID_Request *request_ptr = NULL;
     MPID_Comm *comm_ptr = NULL;
-    MPIDI_STATE_DECL(MPID_STATE_MPIC_SEND_FT);
+    MPIDI_STATE_DECL(MPID_STATE_MPIC_SEND);
 
-    MPIDI_FUNC_ENTER(MPID_STATE_MPIC_SEND_FT);
+    MPIDI_FUNC_ENTER(MPID_STATE_MPIC_SEND);
 
     MPIU_DBG_MSG_S(PT2PT, TYPICAL, "IN: errflag = %s", *errflag?"TRUE":"FALSE");
 
@@ -310,7 +310,7 @@ int MPIC_Send(const void *buf, int count, MPI_Datatype datatype, int dest, int t
     }
 
  fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_MPIC_SEND_FT);
+    MPIDI_FUNC_EXIT(MPID_STATE_MPIC_SEND);
     return mpi_errno;
  fn_fail:
     /* --BEGIN ERROR HANDLING-- */
@@ -331,9 +331,9 @@ int MPIC_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag,
     MPI_Status mystatus;
     MPID_Request *request_ptr = NULL;
     MPID_Comm *comm_ptr = NULL;
-    MPIDI_STATE_DECL(MPID_STATE_MPIC_RECV_FT);
+    MPIDI_STATE_DECL(MPID_STATE_MPIC_RECV);
 
-    MPIDI_FUNC_ENTER(MPID_STATE_MPIC_RECV_FT);
+    MPIDI_FUNC_ENTER(MPID_STATE_MPIC_RECV);
 
     MPIU_DBG_MSG_S(PT2PT, TYPICAL, "IN: errflag = %s", *errflag?"TRUE":"FALSE");
 
@@ -374,7 +374,7 @@ int MPIC_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag,
 
  fn_exit:
     MPIU_DBG_MSG_S(PT2PT, TYPICAL, "OUT: errflag = %s", *errflag?"TRUE":"FALSE");
-    MPIDI_FUNC_EXIT(MPID_STATE_MPIC_RECV_FT);
+    MPIDI_FUNC_EXIT(MPID_STATE_MPIC_RECV);
     return mpi_errno;
  fn_fail:
     /* --BEGIN ERROR HANDLING-- */
@@ -394,9 +394,9 @@ int MPIC_Ssend(const void *buf, int count, MPI_Datatype datatype, int dest, int
     int context_id;
     MPID_Request *request_ptr = NULL;
     MPID_Comm *comm_ptr = NULL;
-    MPIDI_STATE_DECL(MPID_STATE_MPIC_SSEND_FT);
+    MPIDI_STATE_DECL(MPID_STATE_MPIC_SSEND);
 
-    MPIDI_FUNC_ENTER(MPID_STATE_MPIC_SSEND_FT);
+    MPIDI_FUNC_ENTER(MPID_STATE_MPIC_SSEND);
 
     MPIU_DBG_MSG_S(PT2PT, TYPICAL, "IN: errflag = %s", *errflag?"TRUE":"FALSE");
 
@@ -420,7 +420,7 @@ int MPIC_Ssend(const void *buf, int count, MPI_Datatype datatype, int dest, int
     }
 
  fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_MPIC_SSEND_FT);
+    MPIDI_FUNC_EXIT(MPID_STATE_MPIC_SSEND);
     return mpi_errno;
  fn_fail:
     /* --BEGIN ERROR HANDLING-- */
@@ -443,9 +443,9 @@ int MPIC_Sendrecv(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
     MPI_Status mystatus;
     MPID_Request *recv_req_ptr = NULL, *send_req_ptr = NULL;
     MPID_Comm *comm_ptr = NULL;
-    MPIDI_STATE_DECL(MPID_STATE_MPIC_SENDRECV_FT);
+    MPIDI_STATE_DECL(MPID_STATE_MPIC_SENDRECV);
 
-    MPIDI_FUNC_ENTER(MPID_STATE_MPIC_SENDRECV_FT);
+    MPIDI_FUNC_ENTER(MPID_STATE_MPIC_SENDRECV);
 
     MPIU_DBG_MSG_S(PT2PT, TYPICAL, "IN: errflag = %s", *errflag?"TRUE":"FALSE");
 
@@ -494,7 +494,7 @@ int MPIC_Sendrecv(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
  fn_exit:
     MPIU_DBG_MSG_S(PT2PT, TYPICAL, "OUT: errflag = %s", *errflag?"TRUE":"FALSE");
 
-    MPIDI_FUNC_EXIT(MPID_STATE_MPIC_SENDRECV_FT);
+    MPIDI_FUNC_EXIT(MPID_STATE_MPIC_SENDRECV);
     return mpi_errno;
  fn_fail:
     goto fn_exit;
@@ -522,13 +522,13 @@ int MPIC_Sendrecv_replace(void *buf, int count, MPI_Datatype datatype,
     MPI_Aint tmpbuf_count = 0;
     MPID_Comm *comm_ptr;
     MPIU_CHKLMEM_DECL(1);
-    MPIDI_STATE_DECL(MPID_STATE_MPIC_SENDRECV_REPLACE_FT);
+    MPIDI_STATE_DECL(MPID_STATE_MPIC_SENDRECV_REPLACE);
 #ifdef MPID_LOG_ARROWS
     /* The logging macros log sendcount and recvcount */
     int sendcount = count, recvcount = count;
 #endif
 
-    MPIDI_FUNC_ENTER(MPID_STATE_MPIC_SENDRECV_REPLACE_FT);
+    MPIDI_FUNC_ENTER(MPID_STATE_MPIC_SENDRECV_REPLACE);
 
     MPIU_DBG_MSG_S(PT2PT, TYPICAL, "IN: errflag = %s", *errflag?"TRUE":"FALSE");
 
@@ -610,7 +610,7 @@ int MPIC_Sendrecv_replace(void *buf, int count, MPI_Datatype datatype,
  fn_exit:
     MPIU_CHKLMEM_FREEALL();
     MPIU_DBG_MSG_S(PT2PT, TYPICAL, "OUT: errflag = %s", *errflag?"TRUE":"FALSE");
-    MPIDI_FUNC_EXIT(MPID_STATE_MPIC_SENDRECV_REPLACE_FT);
+    MPIDI_FUNC_EXIT(MPID_STATE_MPIC_SENDRECV_REPLACE);
     return mpi_errno;
  fn_fail:
     goto fn_exit;
@@ -627,9 +627,9 @@ int MPIC_Isend(const void *buf, int count, MPI_Datatype datatype, int dest, int
     int context_id;
     MPID_Request *request_ptr = NULL;
     MPID_Comm *comm_ptr = NULL;
-    MPIDI_STATE_DECL(MPID_STATE_MPIC_ISEND_FT);
+    MPIDI_STATE_DECL(MPID_STATE_MPIC_ISEND);
 
-    MPIDI_FUNC_ENTER(MPID_STATE_MPIC_ISEND_FT);
+    MPIDI_FUNC_ENTER(MPID_STATE_MPIC_ISEND);
 
     MPIU_DBG_MSG_S(PT2PT, TYPICAL, "IN: errflag = %s", *errflag?"TRUE":"FALSE");
 
@@ -650,7 +650,7 @@ int MPIC_Isend(const void *buf, int count, MPI_Datatype datatype, int dest, int
     *request = request_ptr->handle;
 
  fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_MPIC_ISEND_FT);
+    MPIDI_FUNC_EXIT(MPID_STATE_MPIC_ISEND);
     return mpi_errno;
  fn_fail:
     goto fn_exit;
@@ -667,9 +667,9 @@ int MPIC_Irecv(void *buf, int count, MPI_Datatype datatype, int source,
     int context_id;
     MPID_Request *request_ptr = NULL;
     MPID_Comm *comm_ptr = NULL;
-    MPIDI_STATE_DECL(MPID_STATE_MPIC_IRECV_FT);
+    MPIDI_STATE_DECL(MPID_STATE_MPIC_IRECV);
 
-    MPIDI_FUNC_ENTER(MPID_STATE_MPIC_IRECV_FT);
+    MPIDI_FUNC_ENTER(MPID_STATE_MPIC_IRECV);
 
     MPIU_ERR_CHKANDJUMP1((count < 0), mpi_errno, MPI_ERR_COUNT,
                          "**countneg", "**countneg %d", count);
@@ -685,7 +685,7 @@ int MPIC_Irecv(void *buf, int count, MPI_Datatype datatype, int source,
     *request = request_ptr->handle;
 
  fn_exit:
-    MPIDI_FUNC_EXIT(MPID_STATE_MPIC_IRECV_FT);
+    MPIDI_FUNC_EXIT(MPID_STATE_MPIC_IRECV);
     return mpi_errno;
  fn_fail:
     goto fn_exit;
@@ -700,9 +700,9 @@ int MPIC_Waitall(int numreq, MPI_Request requests[], MPI_Status statuses[], int
 {
     int mpi_errno = MPI_SUCCESS;
     int i;
-    MPIDI_STATE_DECL(MPID_STATE_MPIC_WAITALL_FT);
+    MPIDI_STATE_DECL(MPID_STATE_MPIC_WAITALL);
 
-    MPIDI_FUNC_ENTER(MPID_STATE_MPIC_WAITALL_FT);
+    MPIDI_FUNC_ENTER(MPID_STATE_MPIC_WAITALL);
 
     MPIU_Assert(statuses != MPI_STATUSES_IGNORE);
 
@@ -730,7 +730,7 @@ int MPIC_Waitall(int numreq, MPI_Request requests[], MPI_Status statuses[], int
 
  fn_exit:
     MPIU_DBG_MSG_S(PT2PT, TYPICAL, "OUT: errflag = %s", *errflag?"TRUE":"FALSE");
-    MPIDI_FUNC_EXIT(MPID_STATE_MPIC_WAITALL_FT);
+    MPIDI_FUNC_EXIT(MPID_STATE_MPIC_WAITALL);
     return mpi_errno;
  fn_fail:
     goto fn_exit;

http://git.mpich.org/mpich.git/commitdiff/edb3a6fc322250717f74f1504ad3b048ee4e10ca

commit edb3a6fc322250717f74f1504ad3b048ee4e10ca
Author: Igor Ivanov <Igor.Ivanov at itseez.com>
Date:   Mon Oct 20 17:35:05 2014 +0200

    netmod/mxm: Avoid calling mxm send req handling from mxm send completion callback
    
    Signed-off-by: Devendar Bureddy <devendar at mellanox.com>
    Signed-off-by: Igor Ivanov <Igor.Ivanov at itseez.com>
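
The deferral described in this commit message can be illustrated with a minimal, self-contained sketch. It is not the MPICH code: `demo_req`, `demo_queue`, `demo_completion_cb` and `demo_poll` are hypothetical names, and the reference counting done by the real enqueue/dequeue macros is left out. The idea shown is only that the completion callback records the request, and the progress loop handles it later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a send request (not MPID_Request). */
typedef struct demo_req {
    int id;
    int handled;              /* set once the request has been processed */
    struct demo_req *next;
} demo_req;

/* Simple FIFO with head and tail pointers. */
typedef struct {
    demo_req *head;
    demo_req *tail;
} demo_queue;

void demo_enqueue(demo_queue *q, demo_req *r)
{
    assert(q != NULL && r != NULL);
    r->next = NULL;
    if (q->tail)
        q->tail->next = r;
    else
        q->head = r;
    q->tail = r;
}

demo_req *demo_dequeue(demo_queue *q)
{
    demo_req *r = q->head;
    if (r) {
        q->head = r->next;
        if (!q->head)
            q->tail = NULL;
        r->next = NULL;
    }
    return r;
}

/* Runs in the library's completion context: only record the request. */
void demo_completion_cb(demo_queue *q, demo_req *r)
{
    demo_enqueue(q, r);
}

/* Runs in the progress loop: do the real handling outside the callback.
 * Returns the number of requests drained. */
int demo_poll(demo_queue *q)
{
    int n = 0;
    demo_req *r;
    while ((r = demo_dequeue(q)) != NULL) {
        r->handled = 1;
        n++;
    }
    return n;
}
```

In this sketch a request stays untouched between `demo_completion_cb` and the next `demo_poll`, which is the property the commit relies on to keep request handling out of the callback.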

diff --git a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_impl.h b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_impl.h
index 3b2bb12..43070da 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_impl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_impl.h
@@ -69,6 +69,8 @@ void MPID_nem_mxm_get_adi_msg(mxm_conn_h conn, mxm_imm_t imm, void *data,
 void MPID_nem_mxm_anysource_posted(MPID_Request * req);
 int MPID_nem_mxm_anysource_matched(MPID_Request * req);
 
+int _mxm_handle_sreq(MPID_Request * req);
+
 /* List type as queue
  * Operations, initialization etc
  */
@@ -174,6 +176,25 @@ typedef struct {
 /* macro for mxm private in REQ */
 #define REQ_BASE(reqp) ((reqp) ? (MPID_nem_mxm_req_area *)((reqp)->ch.netmod_area.padding) : NULL)
 
+typedef GENERIC_Q_DECL(struct MPID_Request) MPID_nem_mxm_reqq_t;
+#define MPID_nem_mxm_queue_empty(q) GENERIC_Q_EMPTY (q)
+#define MPID_nem_mxm_queue_head(q) GENERIC_Q_HEAD (q)
+#define MPID_nem_mxm_queue_enqueue(qp, ep) do {                                           \
+        /* add refcount so req doesn't get freed before it's dequeued */                \
+        MPIR_Request_add_ref(ep);                                                       \
+        MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST,                         \
+                          "MPID_nem_mxm_queue_enqueue req=%p (handle=%#x), queue=%p",     \
+                          ep, (ep)->handle, qp));                                       \
+        GENERIC_Q_ENQUEUE (qp, ep, dev.next);                                           \
+    } while (0)
+#define MPID_nem_mxm_queue_dequeue(qp, ep)  do {                                          \
+        GENERIC_Q_DEQUEUE (qp, ep, dev.next);                                           \
+        MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST,                         \
+                          "MPID_nem_mxm_queue_dequeue req=%p (handle=%#x), queue=%p",     \
+                          *(ep), *(ep) ? (*(ep))->handle : -1, qp));                    \
+        MPID_Request_release(*(ep));                                                    \
+    } while (0)
+
 typedef struct MPID_nem_mxm_module_t {
     char *runtime_version;
     const char *compiletime_version;
@@ -188,6 +209,7 @@ typedef struct MPID_nem_mxm_module_t {
     int mxm_np;
     MPID_nem_mxm_ep_t *endpoint;
     list_head_t free_queue;
+    MPID_nem_mxm_reqq_t sreq_queue;
     struct {
         int bulk_connect;       /* use bulk connect */
         int bulk_disconnect;    /* use bulk disconnect */
diff --git a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
index 7bd2cad..37e0558 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
@@ -446,6 +446,8 @@ static int _mxm_init(int rank, int size)
     list_grow_mxm_req(&_mxm_obj.free_queue);
     MPIU_Assert(list_length(&_mxm_obj.free_queue) == MXM_MPICH_MAX_REQ);
 
+    _mxm_obj.sreq_queue.head = _mxm_obj.sreq_queue.tail = NULL;
+
     mxm_obj = &_mxm_obj;
 
   fn_exit:
diff --git a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_poll.c b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_poll.c
index ba7686e..e8bddc3 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_poll.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_poll.c
@@ -24,10 +24,16 @@ static int _mxm_process_rdtype(MPID_Request ** rreq_p, MPI_Datatype datatype,
 int MPID_nem_mxm_poll(int in_blocking_progress)
 {
     int mpi_errno = MPI_SUCCESS;
+    MPID_Request *req = NULL;
 
     MPIDI_STATE_DECL(MPID_STATE_MXM_POLL);
     MPIDI_FUNC_ENTER(MPID_STATE_MXM_POLL);
 
+    while (!MPID_nem_mxm_queue_empty(mxm_obj->sreq_queue)) {
+        MPID_nem_mxm_queue_dequeue(&mxm_obj->sreq_queue, &req);
+        _mxm_handle_sreq(req);
+    }
+
     mpi_errno = _mxm_poll();
     if (mpi_errno)
         MPIU_ERR_POP(mpi_errno);
@@ -72,6 +78,7 @@ void MPID_nem_mxm_get_adi_msg(mxm_conn_h conn, mxm_imm_t imm, void *data,
     vc = mxm_conn_ctx_get(conn);
 
     _dbg_mxm_output(5, "========> Getting ADI msg (from=%d data_size %d) \n", vc->pg_rank, length);
+    _dbg_mxm_out_buf(data, (length > 16 ? 16 : length));
 
     MPID_nem_handle_pkt(vc, data, (MPIDI_msg_sz_t) (length));
 }
@@ -144,6 +151,10 @@ int MPID_nem_mxm_anysource_matched(MPID_Request * req)
 int MPID_nem_mxm_recv(MPIDI_VC_t * vc, MPID_Request * rreq)
 {
     int mpi_errno = MPI_SUCCESS;
+    MPIDI_msg_sz_t data_sz;
+    int dt_contig;
+    MPI_Aint dt_true_lb;
+    MPID_Datatype *dt_ptr;
 
     MPIDI_STATE_DECL(MPID_STATE_MPID_NEM_MXM_RECV);
     MPIDI_FUNC_ENTER(MPID_STATE_MPID_NEM_MXM_RECV);
@@ -152,18 +163,15 @@ int MPID_nem_mxm_recv(MPIDI_VC_t * vc, MPID_Request * rreq)
     MPIU_Assert(((rreq->dev.match.parts.rank == MPI_ANY_SOURCE) && (vc == NULL)) ||
                 (vc && !vc->ch.is_local));
 
+    MPIDI_Datatype_get_info(rreq->dev.user_count, rreq->dev.datatype, dt_contig, data_sz,
+                            dt_ptr, dt_true_lb);
+
     {
         MPIR_Context_id_t context_id = rreq->dev.match.parts.context_id;
         int tag = rreq->dev.match.parts.tag;
-        MPIDI_msg_sz_t data_sz;
-        int dt_contig;
-        MPI_Aint dt_true_lb;
-        MPID_Datatype *dt_ptr;
         MPID_nem_mxm_vc_area *vc_area = NULL;
         MPID_nem_mxm_req_area *req_area = NULL;
 
-        MPIDI_Datatype_get_info(rreq->dev.user_count, rreq->dev.datatype, dt_contig, data_sz,
-                                dt_ptr, dt_true_lb);
         rreq->dev.OnDataAvail = NULL;
         rreq->dev.tmpbuf = NULL;
         rreq->ch.vc = vc;
@@ -223,7 +231,6 @@ static int _mxm_handle_rreq(MPID_Request * req)
     MPIDI_msg_sz_t userbuf_sz;
     MPID_Datatype *dt_ptr;
     MPIDI_msg_sz_t data_sz;
-    MPIDI_VC_t *vc = NULL;
     MPID_nem_mxm_vc_area *vc_area ATTRIBUTE((unused)) = NULL;
     MPID_nem_mxm_req_area *req_area = NULL;
     void *tmp_buf = NULL;
@@ -319,7 +326,7 @@ static int _mxm_handle_rreq(MPID_Request * req)
         }
     }
 
-    MPIDI_CH3U_Handle_recv_req(vc, req, &complete);
+    MPIDI_CH3U_Handle_recv_req(req->ch.vc, req, &complete);
     MPIU_Assert(complete == TRUE);
 
     if (tmp_buf) MPIU_Free(tmp_buf);
diff --git a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_send.c b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_send.c
index b817459..7e490b3 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_send.c
@@ -15,7 +15,6 @@ enum {
 };
 
 
-static int _mxm_handle_sreq(MPID_Request * req);
 static void _mxm_send_completion_cb(void *context);
 static int _mxm_isend(MPID_nem_mxm_ep_t * ep, MPID_nem_mxm_req_area * req,
                       int type, mxm_mq_h mxm_mq, int mxm_rank, int id, mxm_tag_t tag, int block);
@@ -235,6 +234,7 @@ int MPID_nem_mxm_send(MPIDI_VC_t * vc, const void *buf, int count, MPI_Datatype
     MPIDI_Request_create_sreq(sreq, mpi_errno, goto fn_exit);
     MPIU_Assert(sreq != NULL);
     MPIDI_Request_set_type(sreq, MPIDI_REQUEST_TYPE_SEND);
+
     MPIDI_VC_FAI_send_seqnum(vc, seqnum);
     MPIDI_Request_set_seqnum(sreq, seqnum);
     if (HANDLE_GET_KIND(datatype) != HANDLE_KIND_BUILTIN) {
@@ -336,7 +336,8 @@ int MPID_nem_mxm_ssend(MPIDI_VC_t * vc, const void *buf, int count, MPI_Datatype
     /* create a request */
     MPIDI_Request_create_sreq(sreq, mpi_errno, goto fn_exit);
     MPIU_Assert(sreq != NULL);
-    MPIDI_Request_set_type(sreq, MPIDI_REQUEST_TYPE_SEND);
+    MPIDI_Request_set_type(sreq, MPIDI_REQUEST_TYPE_SSEND);
+
     MPIDI_VC_FAI_send_seqnum(vc, seqnum);
     MPIDI_Request_set_seqnum(sreq, seqnum);
     if (HANDLE_GET_KIND(datatype) != HANDLE_KIND_BUILTIN) {
@@ -439,6 +440,7 @@ int MPID_nem_mxm_isend(MPIDI_VC_t * vc, const void *buf, int count, MPI_Datatype
     MPIDI_Request_create_sreq(sreq, mpi_errno, goto fn_exit);
     MPIU_Assert(sreq != NULL);
     MPIDI_Request_set_type(sreq, MPIDI_REQUEST_TYPE_SEND);
+
     MPIDI_VC_FAI_send_seqnum(vc, seqnum);
     MPIDI_Request_set_seqnum(sreq, seqnum);
     if (HANDLE_GET_KIND(datatype) != HANDLE_KIND_BUILTIN) {
@@ -541,7 +543,8 @@ int MPID_nem_mxm_issend(MPIDI_VC_t * vc, const void *buf, int count, MPI_Datatyp
     /* create a request */
     MPIDI_Request_create_sreq(sreq, mpi_errno, goto fn_exit);
     MPIU_Assert(sreq != NULL);
-    MPIDI_Request_set_type(sreq, MPIDI_REQUEST_TYPE_SEND);
+    MPIDI_Request_set_type(sreq, MPIDI_REQUEST_TYPE_SSEND);
+
     MPIDI_VC_FAI_send_seqnum(vc, seqnum);
     MPIDI_Request_set_seqnum(sreq, seqnum);
     if (HANDLE_GET_KIND(datatype) != HANDLE_KIND_BUILTIN) {
@@ -619,10 +622,9 @@ int MPID_nem_mxm_issend(MPIDI_VC_t * vc, const void *buf, int count, MPI_Datatyp
 }
 
 
-static int _mxm_handle_sreq(MPID_Request * req)
+int _mxm_handle_sreq(MPID_Request * req)
 {
     int complete = FALSE;
-    int (*reqFn) (MPIDI_VC_t *, MPID_Request *, int *);
     MPID_nem_mxm_vc_area *vc_area = NULL;
     MPID_nem_mxm_req_area *req_area = NULL;
 
@@ -634,8 +636,10 @@ static int _mxm_handle_sreq(MPID_Request * req)
                       16 ? 16 : req_area->iov_buf[0].length));
 
     vc_area->pending_sends -= 1;
-    if (((req->dev.datatype_ptr != NULL) && (req->dev.tmpbuf != NULL))) {
-        MPIU_Free(req->dev.tmpbuf);
+    if (req->dev.tmpbuf) {
+        if (req->dev.datatype_ptr || req->ch.noncontig) {
+            MPIU_Free(req->dev.tmpbuf);
+        }
     }
 
     if (req_area->iov_count > MXM_MPICH_MAX_IOV) {
@@ -644,19 +648,8 @@ static int _mxm_handle_sreq(MPID_Request * req)
         req_area->iov_count = 0;
     }
 
-    reqFn = req->dev.OnDataAvail;
-    if (!reqFn) {
-        MPIDI_CH3U_Request_complete(req);
-        MPIU_DBG_MSG(CH3_CHANNEL, VERBOSE, ".... complete");
-    }
-    else {
-        MPIDI_VC_t *vc = req->ch.vc;
-
-        reqFn(vc, req, &complete);
-        if (!complete) {
-            MPIU_Assert(complete == TRUE);
-        }
-    }
+    MPIDI_CH3U_Handle_send_req(req->ch.vc, req, &complete);
+    MPIU_Assert(complete == TRUE);
 
     return complete;
 }
@@ -683,7 +676,7 @@ static void _mxm_send_completion_cb(void *context)
                     req, req->status.MPI_ERROR);
 
     if (likely(!MPIR_STATUS_GET_CANCEL_BIT(req->status))) {
-        _mxm_handle_sreq(req);
+        MPID_nem_mxm_queue_enqueue(&mxm_obj->sreq_queue, req);
     }
 }
 

http://git.mpich.org/mpich.git/commitdiff/db87c8d07c03c467d5d367aa03be91c2bda1e0b5

commit db87c8d07c03c467d5d367aa03be91c2bda1e0b5
Author: Pavan Balaji <balaji at anl.gov>
Date:   Tue Oct 21 16:58:29 2014 -0500

    Bug-fix: free all created portals.
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
index 984f8c1..09d84f7 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
@@ -177,6 +177,12 @@ static int ptl_finalize(void)
     ret = PtlPTFree(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_pt);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptfree", "**ptlptfree %s", MPID_nem_ptl_strerror(ret));
 
+    ret = PtlPTFree(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_get_pt);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptfree", "**ptlptfree %s", MPID_nem_ptl_strerror(ret));
+
+    ret = PtlPTFree(MPIDI_nem_ptl_ni, MPIDI_nem_ptl_control_pt);
+    MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlptfree", "**ptlptfree %s", MPID_nem_ptl_strerror(ret));
+
     ret = PtlNIFini(MPIDI_nem_ptl_ni);
     MPIU_ERR_CHKANDJUMP1(ret, mpi_errno, MPI_ERR_OTHER, "**ptlnifini", "**ptlnifini %s", MPID_nem_ptl_strerror(ret));
 

http://git.mpich.org/mpich.git/commitdiff/fdf63d8e40102d19d0a60a120ee029220cdca901

commit fdf63d8e40102d19d0a60a120ee029220cdca901
Author: Wesley Bland <wbland at anl.gov>
Date:   Tue Oct 21 10:27:20 2014 -0500

    Fix typo in bcast macro
    
    The macro that called the bcast function left out an underscore in the
    mpi_errno return value. This caused the test to always return MPI_ERR_OTHER
    instead of the value being returned by the underlying bcast function.
    
    Signed-off-by: Huiwei Lu <huiweilu at mcs.anl.gov>
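
The class of bug described above (a macro body that names a variable from the caller's scope instead of its own parameter) can be reproduced in a few lines. This is a hypothetical sketch, not the MPICH macro: `ADD_ERR_BUGGY`, `ADD_ERR_FIXED` and the `demo_*` functions are invented for illustration, and the error-combining logic is simplified to "keep the first error".

```c
#include <assert.h>

/* Buggy: the body reads `err` from the enclosing scope.
 * The macro argument err_ is silently ignored. */
#define ADD_ERR_BUGGY(ret_, err_) \
    do { if ((ret_) == 0) (ret_) = err; } while (0)

/* Fixed: the body uses the macro parameter, trailing underscore included. */
#define ADD_ERR_FIXED(ret_, err_) \
    do { if ((ret_) == 0) (ret_) = (err_); } while (0)

int demo_buggy(int step_err)
{
    int err = 0;              /* unrelated local that shares the name */
    int ret = 0;
    assert(step_err >= 0);
    ADD_ERR_BUGGY(ret, step_err);
    return ret;               /* step_err never reaches ret */
}

int demo_fixed(int step_err)
{
    int err = 0;              /* still present, but no longer captured */
    int ret = 0;
    (void) err;
    assert(step_err >= 0);
    ADD_ERR_FIXED(ret, step_err);
    return ret;
}
```

The buggy variant compiles without complaint only because a variable with the un-suffixed name happens to exist at the call site, which is why this kind of typo can go unnoticed.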

diff --git a/src/mpi/coll/bcast.c b/src/mpi/coll/bcast.c
index 836dd4d..7e42727 100644
--- a/src/mpi/coll/bcast.c
+++ b/src/mpi/coll/bcast.c
@@ -982,7 +982,7 @@ fn_fail:
             /* for communication errors, just record the error but continue */                   \
             *(errflag_) = TRUE;                                                                  \
             MPIU_ERR_SET(mpi_errno_, MPI_ERR_OTHER, "**fail");                                   \
-            MPIU_ERR_ADD(mpi_errno_ret_, mpi_errno);                                             \
+            MPIU_ERR_ADD(mpi_errno_ret_, mpi_errno_);                                            \
         }                                                                                        \
     } while (0)
 
diff --git a/test/mpi/ft/bcast.c b/test/mpi/ft/bcast.c
index 06cfeff..8c877a9 100644
--- a/test/mpi/ft/bcast.c
+++ b/test/mpi/ft/bcast.c
@@ -27,6 +27,10 @@ int main(int argc, char **argv)
     MPI_Comm_size(MPI_COMM_WORLD, &size);
     MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
 
+    MPI_Comm_group(MPI_COMM_WORLD, &world);
+    MPI_Group_excl(world, 1, deadprocs, &newgroup);
+    MPI_Comm_create_group(MPI_COMM_WORLD, newgroup, 0, &newcomm);
+
     if (size < 3) {
         fprintf( stderr, "Must run with at least 3 processes\n" );
         MPI_Abort( MPI_COMM_WORLD, 1 );
@@ -66,10 +70,6 @@ int main(int argc, char **argv)
     }
 #endif
 
-    MPI_Comm_group(MPI_COMM_WORLD, &world);
-    MPI_Group_excl(world, 1, deadprocs, &newgroup);
-    MPI_Comm_create_group(MPI_COMM_WORLD, newgroup, 0, &newcomm);
-
     rc = MPI_Reduce(&errs, &toterrs, 1, MPI_INT, MPI_SUM, 0, newcomm);
     if(rc)
         fprintf(stderr, "Failed to get errors from other processes\n");

http://git.mpich.org/mpich.git/commitdiff/ded935df4be4df3f6c1e5d0bd799b75c3d5e2ced

commit ded935df4be4df3f6c1e5d0bd799b75c3d5e2ced
Author: Pavan Balaji <balaji at anl.gov>
Date:   Thu Oct 16 18:14:06 2014 -0500

    Remove unnecessary arbitrary code.
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/mx/mx_cancel.c b/src/mpid/ch3/channels/nemesis/netmod/mx/mx_cancel.c
index e183cac..755c327 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/mx/mx_cancel.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/mx/mx_cancel.c
@@ -65,29 +65,6 @@ int MPID_nem_mx_cancel_send(MPIDI_VC_t *vc, MPID_Request *sreq)
 }
 
 
-/* code in cancel_recv */
-/* FIXME: The vc is only needed to find which function to call*/
-/* This is otherwise any_source ready */
-/*
-#ifdef ENABLE_COMM_OVERRIDES
- {                                                              
-      MPIDI_VC_t *vc;
-      MPIU_Assert(rreq->dev.match.parts.rank != MPI_ANY_SOURCE);
-      MPIDI_Comm_get_vc_set_active(rreq->comm, rreq->dev.match.parts.rank, &vc);
-      if (vc->comm_ops && vc->comm_ops->cancel_recv)
-      {
-         int handled;
-         handled = vc->comm_ops->cancel_recv(NULL, rreq);
-         if (handled)
-         {
-            MPIDI_FUNC_EXIT(MPID_STATE_MPID_CANCEL_RECV);
-            return MPI_SUCCESS;
-         }
-      }
-  }
-  #endif
-*/
-
 #undef FUNCNAME
 #define FUNCNAME MPID_nem_mx_cancel_recv
 #undef FCNAME

http://git.mpich.org/mpich.git/commitdiff/f7df2d1b57fe4409e825cfe993b3c0b9ef013b7c

commit f7df2d1b57fe4409e825cfe993b3c0b9ef013b7c
Author: Paul Coffman <pkcoff at us.ibm.com>
Date:   Sun Oct 19 20:07:18 2014 -0500

    fix failure to update status in p2pcontig case
    
    ADIOI_GPFS_WriteStridedColl and ADIOI_GPFS_ReadStridedColl need to call
    MPIR_Status_set_bytes when GPFSMPIO_P2PCONTIG=1.
    
    When the GPFSMPIO_P2PCONTIG optimization is set, the code path for
    ADIOI_GPFS_WriteStridedColl and ADIOI_GPFS_ReadStridedColl returns
    before MPIR_Status_set_bytes is called.  Duplicate the call to
    MPIR_Status_set_bytes in the GPFSMPIO_P2PCONTIG code path.
    
    Signed-off-by: Rob Latham <robl at mcs.anl.gov>
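
A minimal sketch of the control-flow problem described above, assuming a simplified status structure: an early `return` on the fast path skips the status update that sits at the end of the function. `demo_status`, `demo_read_return` and `demo_read_goto` are hypothetical names, not ROMIO functions. Note that the diff below jumps to a shared `fn_exit` label rather than literally duplicating the call, and the sketch follows the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the status object filled in at function exit. */
typedef struct {
    long bytes;
} demo_status;

/* Before: the fast path returns early, so the status is never updated. */
void demo_read_return(int fast_path, long nbytes, demo_status *status)
{
    assert(status != NULL);
    if (fast_path) {
        /* fast-path work and cleanup would happen here */
        return;
    }
    /* slow-path work would happen here */
    status->bytes = nbytes;
}

/* After: both paths fall through a common exit label that sets the status. */
void demo_read_goto(int fast_path, long nbytes, demo_status *status)
{
    assert(status != NULL);
    if (fast_path) {
        /* fast-path work and cleanup would happen here */
        goto fn_exit;
    }
    /* slow-path work would happen here */
fn_exit:
    status->bytes = nbytes;
}
```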

diff --git a/src/mpi/romio/adio/ad_gpfs/ad_gpfs_rdcoll.c b/src/mpi/romio/adio/ad_gpfs/ad_gpfs_rdcoll.c
index 92b6336..c2cad8b 100644
--- a/src/mpi/romio/adio/ad_gpfs/ad_gpfs_rdcoll.c
+++ b/src/mpi/romio/adio/ad_gpfs/ad_gpfs_rdcoll.c
@@ -299,8 +299,8 @@ void ADIOI_GPFS_ReadStridedColl(ADIO_File fd, void *buf, int count,
             ADIOI_Free(end_offsets);
             ADIOI_Free(fd_start);
             ADIOI_Free(fd_end);
+	    goto fn_exit;
 
-	    return;
 	}
     }
 
@@ -398,6 +398,7 @@ void ADIOI_GPFS_ReadStridedColl(ADIO_File fd, void *buf, int count,
     ADIOI_Free(fd_start);
     ADIOI_Free(fd_end);
 
+fn_exit:
 #ifdef HAVE_STATUS_SET_BYTES
     MPI_Type_size_x(datatype, &size);
     bufsize = size * count;
diff --git a/src/mpi/romio/adio/ad_gpfs/ad_gpfs_wrcoll.c b/src/mpi/romio/adio/ad_gpfs/ad_gpfs_wrcoll.c
index 0e2a1d2..968e6e6 100644
--- a/src/mpi/romio/adio/ad_gpfs/ad_gpfs_wrcoll.c
+++ b/src/mpi/romio/adio/ad_gpfs/ad_gpfs_wrcoll.c
@@ -288,7 +288,7 @@ void ADIOI_GPFS_WriteStridedColl(ADIO_File fd, const void *buf, int count,
             ADIOI_Free(fd_start);
             ADIOI_Free(fd_end);
 
-	    return;
+	    goto fn_exit;
 	}
     }
 
@@ -370,6 +370,7 @@ void ADIOI_GPFS_WriteStridedColl(ADIO_File fd, const void *buf, int count,
     ADIOI_Free(fd_start);
     ADIOI_Free(fd_end);
 
+fn_exit:
 #ifdef HAVE_STATUS_SET_BYTES
     if (status) {
       MPI_Count bufsize, size;

http://git.mpich.org/mpich.git/commitdiff/459534e47de4c2199aa3c77b30b75678eb37a981

commit 459534e47de4c2199aa3c77b30b75678eb37a981
Author: Rob Latham <robl at mcs.anl.gov>
Date:   Thu Aug 21 15:38:03 2014 +0000

    romio: small formatting fix for compiler warnings
    
    Update a debug-only print string to accommodate recent updates to the
    type of the length parameter.
    
    No reviewer
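
As an illustration of the format-string issue, here is a small sketch under the assumption that the length became a 64-bit integer; the commit message does not name the new type. `demo_format_req` is a hypothetical helper. Casting to `long long` at the call site keeps the argument in step with `%lld` whatever the underlying typedef is.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Formats one offset/length pair. Both values are assumed to be 64-bit,
 * so they are cast to long long to match the %lld conversions. */
int demo_format_req(char *buf, size_t n, int idx, int64_t off, int64_t len)
{
    assert(buf != NULL);
    return snprintf(buf, n, "off[%d] = %lld, len[%d] = %lld",
                    idx, (long long) off, idx, (long long) len);
}
```

Passing a 64-bit value to `%d` is undefined behavior in C, which is why compilers warn about it even in debug-only output.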

diff --git a/src/mpi/romio/adio/ad_gpfs/ad_gpfs_aggrs.c b/src/mpi/romio/adio/ad_gpfs/ad_gpfs_aggrs.c
index 517db04..0e67b54 100644
--- a/src/mpi/romio/adio/ad_gpfs/ad_gpfs_aggrs.c
+++ b/src/mpi/romio/adio/ad_gpfs/ad_gpfs_aggrs.c
@@ -607,7 +607,7 @@ void ADIOI_GPFS_Calc_my_req(ADIO_File fd, ADIO_Offset *offset_list, ADIO_Offset
 	    DBG_FPRINTF(stderr, "data needed from %d (count = %d):\n", i,
 		    my_req[i].count);
 	    for (l=0; l < my_req[i].count; l++) {
-		DBG_FPRINTF(stderr, "   off[%d] = %lld, len[%d] = %d\n", l,
+		DBG_FPRINTF(stderr, "   off[%d] = %lld, len[%d] = %lld\n", l,
 			my_req[i].offsets[l], l, my_req[i].lens[l]);
 	    }
 	}

http://git.mpich.org/mpich.git/commitdiff/3825f25a4b67cc671d3b9ab7135852d2b0bc9400

commit 3825f25a4b67cc671d3b9ab7135852d2b0bc9400
Author: Wesley Bland <wbland at anl.gov>
Date:   Thu Aug 21 10:21:10 2014 -0500

    Fix FUNC_ENTER macros for some ft functions
    
    The calls in MPID_Comm_get_all_failed_procs and MPID_Comm_agree were the
    wrong macros for entering and exiting an MPID function. This corrects it.
    
    Signed-off-by: Huiwei Lu <huiweilu at mcs.anl.gov>

diff --git a/src/mpid/ch3/src/mpid_comm_agree.c b/src/mpid/ch3/src/mpid_comm_agree.c
index 12ae988..3bd5849 100644
--- a/src/mpid/ch3/src/mpid_comm_agree.c
+++ b/src/mpid/ch3/src/mpid_comm_agree.c
@@ -64,8 +64,8 @@ int MPID_Comm_agree(MPID_Comm *comm_ptr, uint32_t *bitarray, int *flag, int new_
     int errflag = new_fail;
     int tmp_flag;
 
-    MPID_MPI_STATE_DECL(MPID_STATE_MPID_COMM_AGREE);
-    MPID_MPI_FUNC_ENTER(MPID_STATE_MPID_COMM_AGREE);
+    MPIDI_STATE_DECL(MPID_STATE_MPID_COMM_AGREE);
+    MPIDI_FUNC_ENTER(MPID_STATE_MPID_COMM_AGREE);
 
     children = (int *) MPIU_Malloc(sizeof(int) * ((comm_ptr->local_size) / 2));
 
@@ -113,7 +113,7 @@ int MPID_Comm_agree(MPID_Comm *comm_ptr, uint32_t *bitarray, int *flag, int new_
     MPIU_Free(children);
 
   fn_exit:
-    MPID_MPI_FUNC_EXIT(MPID_STATE_MPID_COMM_AGREE);
+    MPIDI_FUNC_EXIT(MPID_STATE_MPID_COMM_AGREE);
     return mpi_errno;
   fn_fail:
     goto fn_exit;
diff --git a/src/mpid/ch3/src/mpid_comm_get_all_failed_procs.c b/src/mpid/ch3/src/mpid_comm_get_all_failed_procs.c
index 9d29cf6..98e8230 100644
--- a/src/mpid/ch3/src/mpid_comm_get_all_failed_procs.c
+++ b/src/mpid/ch3/src/mpid_comm_get_all_failed_procs.c
@@ -90,7 +90,7 @@ int MPID_Comm_get_all_failed_procs(MPID_Comm *comm_ptr, MPID_Group **failed_grou
     MPID_Group *local_fail;
     MPIDI_STATE_DECL(MPID_STATE_MPID_COMM_GET_ALL_FAILED_PROCS);
 
-    MPID_MPI_FUNC_ENTER(MPID_STATE_MPID_COMM_GET_ALL_FAILED_PROCS);
+    MPIDI_FUNC_ENTER(MPID_STATE_MPID_COMM_GET_ALL_FAILED_PROCS);
 
     /* Kick the progress engine in case it's been a while so we get all the
      * latest updates about failures */
@@ -152,7 +152,7 @@ int MPID_Comm_get_all_failed_procs(MPID_Comm *comm_ptr, MPID_Group **failed_grou
     MPIU_Free(remote_bitarray);
 
   fn_exit:
-    MPID_MPI_FUNC_EXIT(MPID_STATE_MPID_COMM_GET_ALL_FAILED_PROCS);
+    MPIDI_FUNC_EXIT(MPID_STATE_MPID_COMM_GET_ALL_FAILED_PROCS);
     return mpi_errno;
   fn_fail:
     goto fn_exit;

http://git.mpich.org/mpich.git/commitdiff/7f3076fcd0198245419a39b7335bf017456e15e3

commit 7f3076fcd0198245419a39b7335bf017456e15e3
Author: Huiwei Lu <huiweilu at mcs.anl.gov>
Date:   Fri Aug 8 15:53:24 2014 -0500

    Fixes one missing case in MPIDI_CH3U_Clean_recvq
    
    Revoke calls MPIDI_CH3U_Clean_recvq to dequeue all requests on revoked
    communicators. One case was missing: hierarchical communicators that use
    a different context id. This patch adds checks for the hierarchical
    communicators.
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/src/mpid/ch3/src/ch3u_recvq.c b/src/mpid/ch3/src/ch3u_recvq.c
index c37e4b3..e4b7a38 100644
--- a/src/mpid/ch3/src/ch3u_recvq.c
+++ b/src/mpid/ch3/src/ch3u_recvq.c
@@ -958,7 +958,8 @@ int MPIDI_CH3U_Clean_recvq(MPID_Comm *comm_ptr)
         match.parts.context_id = comm_ptr->recvcontext_id + MPID_CONTEXT_INTRA_COLL;
 
         if (MATCH_WITH_LEFT_RIGHT_MASK(rreq->dev.match, match, mask)) {
-            if (rreq->dev.match.parts.tag != MPIR_AGREE_TAG && rreq->dev.match.parts.tag != MPIR_SHRINK_TAG) {
+            if (MPIR_TAG_MASK_ERROR_BIT(rreq->dev.match.parts.tag) != MPIR_AGREE_TAG &&
+                MPIR_TAG_MASK_ERROR_BIT(rreq->dev.match.parts.tag) != MPIR_SHRINK_TAG) {
                 MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
                             "cleaning up unexpected collective pkt rank=%d tag=%d contextid=%d",
                             rreq->dev.match.parts.rank, rreq->dev.match.parts.tag, rreq->dev.match.parts.context_id));
@@ -967,12 +968,56 @@ int MPIDI_CH3U_Clean_recvq(MPID_Comm *comm_ptr)
             }
         }
 
-        if (MPIR_CVAR_ENABLE_SMP_COLLECTIVES && MPIR_Comm_is_node_aware(comm_ptr)) {
-            int offset = (comm_ptr->comm_kind == MPID_INTRACOMM) ?  MPID_CONTEXT_INTRA_COLL : MPID_CONTEXT_INTER_COLL;
+        if (MPIR_Comm_is_node_aware(comm_ptr)) {
+            int offset;
+            offset = (comm_ptr->comm_kind == MPID_INTRACOMM) ?  MPID_CONTEXT_INTRA_PT2PT : MPID_CONTEXT_INTER_PT2PT;
             match.parts.context_id = comm_ptr->recvcontext_id + MPID_CONTEXT_INTRANODE_OFFSET + offset;
 
             if (MATCH_WITH_LEFT_RIGHT_MASK(rreq->dev.match, match, mask)) {
-                if (rreq->dev.match.parts.tag != MPIR_AGREE_TAG && rreq->dev.match.parts.tag != MPIR_SHRINK_TAG) {
+                if (MPIR_TAG_MASK_ERROR_BIT(rreq->dev.match.parts.tag) != MPIR_AGREE_TAG &&
+                    MPIR_TAG_MASK_ERROR_BIT(rreq->dev.match.parts.tag) != MPIR_SHRINK_TAG) {
+                    MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
+                                "cleaning up unexpected pt2pt pkt rank=%d tag=%d contextid=%d",
+                                rreq->dev.match.parts.rank, rreq->dev.match.parts.tag, rreq->dev.match.parts.context_id));
+                    dequeue_and_set_error(&rreq, prev_rreq, &recvq_unexpected_head, &recvq_unexpected_tail, &error, MPI_PROC_NULL);
+                    continue;
+                }
+            }
+
+            offset = (comm_ptr->comm_kind == MPID_INTRACOMM) ?  MPID_CONTEXT_INTRA_COLL : MPID_CONTEXT_INTER_COLL;
+            match.parts.context_id = comm_ptr->recvcontext_id + MPID_CONTEXT_INTRANODE_OFFSET + offset;
+
+            if (MATCH_WITH_LEFT_RIGHT_MASK(rreq->dev.match, match, mask)) {
+                if (MPIR_TAG_MASK_ERROR_BIT(rreq->dev.match.parts.tag) != MPIR_AGREE_TAG &&
+                    MPIR_TAG_MASK_ERROR_BIT(rreq->dev.match.parts.tag) != MPIR_SHRINK_TAG) {
+                    MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
+                                "cleaning up unexpected collective pkt rank=%d tag=%d contextid=%d",
+                                rreq->dev.match.parts.rank, rreq->dev.match.parts.tag, rreq->dev.match.parts.context_id));
+                    dequeue_and_set_error(&rreq, prev_rreq, &recvq_unexpected_head, &recvq_unexpected_tail, &error, MPI_PROC_NULL);
+                    continue;
+                }
+            }
+
+            offset = (comm_ptr->comm_kind == MPID_INTRACOMM) ?  MPID_CONTEXT_INTRA_PT2PT : MPID_CONTEXT_INTER_PT2PT;
+            match.parts.context_id = comm_ptr->recvcontext_id + MPID_CONTEXT_INTERNODE_OFFSET + offset;
+
+            if (MATCH_WITH_LEFT_RIGHT_MASK(rreq->dev.match, match, mask)) {
+                if (MPIR_TAG_MASK_ERROR_BIT(rreq->dev.match.parts.tag) != MPIR_AGREE_TAG &&
+                    MPIR_TAG_MASK_ERROR_BIT(rreq->dev.match.parts.tag) != MPIR_SHRINK_TAG) {
+                    MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
+                                "cleaning up unexpected pt2pt pkt rank=%d tag=%d contextid=%d",
+                                rreq->dev.match.parts.rank, rreq->dev.match.parts.tag, rreq->dev.match.parts.context_id));
+                    dequeue_and_set_error(&rreq, prev_rreq, &recvq_unexpected_head, &recvq_unexpected_tail, &error, MPI_PROC_NULL);
+                    continue;
+                }
+            }
+
+            offset = (comm_ptr->comm_kind == MPID_INTRACOMM) ?  MPID_CONTEXT_INTRA_COLL : MPID_CONTEXT_INTER_COLL;
+            match.parts.context_id = comm_ptr->recvcontext_id + MPID_CONTEXT_INTERNODE_OFFSET + offset;
+
+            if (MATCH_WITH_LEFT_RIGHT_MASK(rreq->dev.match, match, mask)) {
+                if (MPIR_TAG_MASK_ERROR_BIT(rreq->dev.match.parts.tag) != MPIR_AGREE_TAG &&
+                    MPIR_TAG_MASK_ERROR_BIT(rreq->dev.match.parts.tag) != MPIR_SHRINK_TAG) {
                     MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
                                 "cleaning up unexpected collective pkt rank=%d tag=%d contextid=%d",
                                 rreq->dev.match.parts.rank, rreq->dev.match.parts.tag, rreq->dev.match.parts.context_id));
@@ -1005,7 +1050,8 @@ int MPIDI_CH3U_Clean_recvq(MPID_Comm *comm_ptr)
         match.parts.context_id = comm_ptr->recvcontext_id + MPID_CONTEXT_INTRA_COLL;
 
         if (MATCH_WITH_LEFT_RIGHT_MASK(rreq->dev.match, match, mask)) {
-            if (rreq->dev.match.parts.tag != MPIR_AGREE_TAG && rreq->dev.match.parts.tag != MPIR_SHRINK_TAG) {
+            if (MPIR_TAG_MASK_ERROR_BIT(rreq->dev.match.parts.tag) != MPIR_AGREE_TAG &&
+                MPIR_TAG_MASK_ERROR_BIT(rreq->dev.match.parts.tag) != MPIR_SHRINK_TAG) {
                 MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
                             "cleaning up posted collective pkt rank=%d tag=%d contextid=%d",
                             rreq->dev.match.parts.rank, rreq->dev.match.parts.tag, rreq->dev.match.parts.context_id));
@@ -1014,12 +1060,54 @@ int MPIDI_CH3U_Clean_recvq(MPID_Comm *comm_ptr)
             }
         }
 
-        if (MPIR_CVAR_ENABLE_SMP_COLLECTIVES && MPIR_Comm_is_node_aware(comm_ptr)) {
-            int offset = (comm_ptr->comm_kind == MPID_INTRACOMM) ?  MPID_CONTEXT_INTRA_COLL : MPID_CONTEXT_INTER_COLL;
+        if (MPIR_Comm_is_node_aware(comm_ptr)) {
+            int offset;
+            offset = (comm_ptr->comm_kind == MPID_INTRACOMM) ?  MPID_CONTEXT_INTRA_PT2PT : MPID_CONTEXT_INTER_PT2PT;
             match.parts.context_id = comm_ptr->recvcontext_id + MPID_CONTEXT_INTRANODE_OFFSET + offset;
 
             if (MATCH_WITH_LEFT_RIGHT_MASK(rreq->dev.match, match, mask)) {
-                if (rreq->dev.match.parts.tag != MPIR_AGREE_TAG && rreq->dev.match.parts.tag != MPIR_SHRINK_TAG) {
+                if (MPIR_TAG_MASK_ERROR_BIT(rreq->dev.match.parts.tag) != MPIR_AGREE_TAG &&
+                    MPIR_TAG_MASK_ERROR_BIT(rreq->dev.match.parts.tag) != MPIR_SHRINK_TAG) {
+                    MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
+                                "cleaning up posted pt2pt pkt rank=%d tag=%d contextid=%d",
+                                rreq->dev.match.parts.rank, rreq->dev.match.parts.tag, rreq->dev.match.parts.context_id));
+                    dequeue_and_set_error(&rreq, prev_rreq, &recvq_posted_head, &recvq_posted_tail, &error, MPI_PROC_NULL);
+                    continue;
+                }
+            }
+
+            offset = (comm_ptr->comm_kind == MPID_INTRACOMM) ?  MPID_CONTEXT_INTRA_COLL : MPID_CONTEXT_INTER_COLL;
+
+            if (MATCH_WITH_LEFT_RIGHT_MASK(rreq->dev.match, match, mask)) {
+                if (MPIR_TAG_MASK_ERROR_BIT(rreq->dev.match.parts.tag) != MPIR_AGREE_TAG &&
+                    MPIR_TAG_MASK_ERROR_BIT(rreq->dev.match.parts.tag) != MPIR_SHRINK_TAG) {
+                    MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
+                                "cleaning up posted collective pkt rank=%d tag=%d contextid=%d",
+                                rreq->dev.match.parts.rank, rreq->dev.match.parts.tag, rreq->dev.match.parts.context_id));
+                    dequeue_and_set_error(&rreq, prev_rreq, &recvq_posted_head, &recvq_posted_tail, &error, MPI_PROC_NULL);
+                    continue;
+                }
+            }
+
+            offset = (comm_ptr->comm_kind == MPID_INTRACOMM) ?  MPID_CONTEXT_INTRA_PT2PT : MPID_CONTEXT_INTER_PT2PT;
+            match.parts.context_id = comm_ptr->recvcontext_id + MPID_CONTEXT_INTERNODE_OFFSET + offset;
+
+            if (MATCH_WITH_LEFT_RIGHT_MASK(rreq->dev.match, match, mask)) {
+                if (MPIR_TAG_MASK_ERROR_BIT(rreq->dev.match.parts.tag) != MPIR_AGREE_TAG &&
+                    MPIR_TAG_MASK_ERROR_BIT(rreq->dev.match.parts.tag) != MPIR_SHRINK_TAG) {
+                    MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
+                                "cleaning up posted pt2pt pkt rank=%d tag=%d contextid=%d",
+                                rreq->dev.match.parts.rank, rreq->dev.match.parts.tag, rreq->dev.match.parts.context_id));
+                    dequeue_and_set_error(&rreq, prev_rreq, &recvq_posted_head, &recvq_posted_tail, &error, MPI_PROC_NULL);
+                    continue;
+                }
+            }
+
+            offset = (comm_ptr->comm_kind == MPID_INTRACOMM) ?  MPID_CONTEXT_INTRA_COLL : MPID_CONTEXT_INTER_COLL;
+
+            if (MATCH_WITH_LEFT_RIGHT_MASK(rreq->dev.match, match, mask)) {
+                if (MPIR_TAG_MASK_ERROR_BIT(rreq->dev.match.parts.tag) != MPIR_AGREE_TAG &&
+                    MPIR_TAG_MASK_ERROR_BIT(rreq->dev.match.parts.tag) != MPIR_SHRINK_TAG) {
                     MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
                                 "cleaning up posted collective pkt rank=%d tag=%d contextid=%d",
                                 rreq->dev.match.parts.rank, rreq->dev.match.parts.tag, rreq->dev.match.parts.context_id));

http://git.mpich.org/mpich.git/commitdiff/d1acc9c160dbd9fee3b02dbe992abdb6991c4492

commit d1acc9c160dbd9fee3b02dbe992abdb6991c4492
Author: Huiwei Lu <huiweilu at mcs.anl.gov>
Date:   Thu Aug 7 16:32:04 2014 -0500

    Simplify revoke_nofail test
    
    Simplifies the test to only use 2 processes and not make as many MPI calls.
    
    Modifications by Wesley Bland.
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/test/mpi/ft/revoke_nofail.c b/test/mpi/ft/revoke_nofail.c
index 9e48712..df7b470 100644
--- a/test/mpi/ft/revoke_nofail.c
+++ b/test/mpi/ft/revoke_nofail.c
@@ -18,23 +18,23 @@ int main(int argc, char **argv)
     int rank, size;
     int rc, ec;
     char error[MPI_MAX_ERROR_STRING];
-    MPI_Comm world_dup, world_dup2;
+    MPI_Comm world_dup;
 
     MPI_Init(&argc, &argv);
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
     MPI_Comm_size(MPI_COMM_WORLD, &size);
-    if (size < 4) {
-        fprintf( stderr, "Must run with at least 4 processes\n" );
+    if (size < 2) {
+        fprintf( stderr, "Must run with at least 2 processes\n" );
         MPI_Abort(MPI_COMM_WORLD, 1);
     }
 
     MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
 
     MPI_Comm_dup(MPI_COMM_WORLD, &world_dup);
-    MPI_Comm_dup(MPI_COMM_WORLD, &world_dup2);
 
-    if (rank == 3)
+    if (rank == 1) {
         MPIX_Comm_revoke(world_dup);
+    }
 
     rc = MPI_Barrier(world_dup);
     MPI_Error_class(rc, &ec);
@@ -45,17 +45,7 @@ int main(int argc, char **argv)
         MPI_Abort(MPI_COMM_WORLD, 1);
     }
 
-    rc = MPI_Barrier(world_dup2);
-    MPI_Error_class(rc, &ec);
-    if (ec != MPI_SUCCESS) {
-        MPI_Error_string(ec, error, &size);
-        fprintf(stderr, "[%d] MPI_Barrier should have returned MPI_SUCCESS, but it actually returned: %d\n%s\n",
-                rank, ec, error);
-        MPI_Abort(MPI_COMM_WORLD, 1);
-    }
-
     MPI_Comm_free(&world_dup);
-    MPI_Comm_free(&world_dup2);
 
     if (rank == 0)
         fprintf(stdout, " No errors\n");
diff --git a/test/mpi/ft/testlist b/test/mpi/ft/testlist
index f26bd47..2f93edd 100644
--- a/test/mpi/ft/testlist
+++ b/test/mpi/ft/testlist
@@ -13,6 +13,6 @@ reduce 4 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=f
 bcast 4 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10 xfail=ticket1945
 scatter 4 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10 xfail=ticket1945
 anysource 3 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10 xfail=ticket1945
-revoke_nofail 4 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10 xfail=ticket1945
+revoke_nofail 2 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10 xfail=ticket1945
 shrink 8 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10 xfail=ticket1945
 agree 4 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10 xfail=ticket1945

http://git.mpich.org/mpich.git/commitdiff/aad518e5020428493b426d55cdc1559022d3c128

commit aad518e5020428493b426d55cdc1559022d3c128
Author: Pavan Balaji <balaji at anl.gov>
Date:   Thu Oct 16 18:13:22 2014 -0500

    Change macros to use standard format
    
    This makes sure that macros work the normal way (using a semicolon at
    the end, etc.). It also removes a block of unused code from
    mx_cancel.c.
    
    Modified by Wesley to split from previous patch.
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>
    Signed-off-by: Pavan Balaji <balaji at anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/ib/ib_poll.c b/src/mpid/ch3/channels/nemesis/netmod/ib/ib_poll.c
index 9106eb9..e6cc5b4 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/ib/ib_poll.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/ib/ib_poll.c
@@ -24,14 +24,16 @@ static int entered_drain_scq = 0;
 #define MPID_NEM_IB_SEND_PROGRESS_POLLINGSET MPID_nem_ib_send_progress(vc);
 #else
 #define MPID_NEM_IB_SEND_PROGRESS_POLLINGSET {     \
-    int n;                                         \
-    for (n = 0; n < MPID_NEM_IB_NRINGBUF; n++) {                    \
-        if (((MPID_nem_ib_ringbuf_allocated[n / 64] >> (n & 63)) & 1) == 0) { \
-            continue;                                               \
-        } \
-        mpi_errno = MPID_nem_ib_poll_eager(&MPID_nem_ib_ringbuf[n]); /*FIXME: perform send_progress for all sendqs */\
-        MPIU_ERR_CHKANDJUMP(mpi_errno, mpi_errno, MPI_ERR_OTHER, "**MPID_nem_ib_poll_eager"); \
-         } \
+        do {                                                        \
+            int n;                                                      \
+            for (n = 0; n < MPID_NEM_IB_NRINGBUF; n++) {                \
+                if (((MPID_nem_ib_ringbuf_allocated[n / 64] >> (n & 63)) & 1) == 0) { \
+                    continue;                                           \
+                }                                                       \
+                mpi_errno = MPID_nem_ib_poll_eager(&MPID_nem_ib_ringbuf[n]); /*FIXME: perform send_progress for all sendqs */ \
+                MPIU_ERR_CHKANDJUMP(mpi_errno, mpi_errno, MPI_ERR_OTHER, "**MPID_nem_ib_poll_eager"); \
+            }                                                           \
+        } while (0)
 }
 #if 0
    int n;                                         \
@@ -45,11 +47,16 @@ static int entered_drain_scq = 0;
 #endif
 #if 1
 #define MPID_NEM_IB_CHECK_AND_SEND_PROGRESS \
-    if (!MPID_nem_ib_sendq_empty(vc_ib->sendq) && MPID_nem_ib_sendq_ready_to_send_head(vc_ib)) { \
-    MPID_nem_ib_send_progress(vc); \
-}
+    do {                                                                \
+        if (!MPID_nem_ib_sendq_empty(vc_ib->sendq) && MPID_nem_ib_sendq_ready_to_send_head(vc_ib)) { \
+            MPID_nem_ib_send_progress(vc);                              \
+        }                                                               \
+    } while (0)
 #else
-#define MPID_NEM_IB_CHECK_AND_SEND_PROGRESS MPID_NEM_IB_SEND_PROGRESS_POLLINGSET
+#define MPID_NEM_IB_CHECK_AND_SEND_PROGRESS \
+    do { \
+        MPID_NEM_IB_SEND_PROGRESS_POLLINGSET; \
+    } while (0)
 #endif
 
 #undef FUNCNAME
@@ -1690,13 +1697,13 @@ int MPID_nem_ib_PktHandler_EagerSend(MPIDI_VC_t * vc,
      * progress_send for all of VCs using nces in ib_poll. */
     dprintf("pkthandler,eagersend,send_progress\n");
     fflush(stdout);
-    MPID_NEM_IB_CHECK_AND_SEND_PROGRESS
-        /* fall back to the original handler */
-        /* we don't need to worry about the difference caused by embedding seq_num
-         * because size of MPI-header of MPIDI_CH3_PKT_EAGER_SEND equals to sizeof(MPIDI_CH3_Pkt_t)
-         * see MPID_nem_ib_iSendContig
-         */
-        //ch3_pkt->type = MPIDI_CH3_PKT_EAGER_SEND;
+    MPID_NEM_IB_CHECK_AND_SEND_PROGRESS;
+    /* fall back to the original handler */
+    /* we don't need to worry about the difference caused by embedding seq_num
+     * because size of MPI-header of MPIDI_CH3_PKT_EAGER_SEND equals to sizeof(MPIDI_CH3_Pkt_t)
+     * see MPID_nem_ib_iSendContig
+     */
+    //ch3_pkt->type = MPIDI_CH3_PKT_EAGER_SEND;
 #if 0
         mpi_errno = MPID_nem_handle_pkt(vc, (char *) pkt_parent_class, *buflen);
 #else
@@ -2254,7 +2261,8 @@ int MPID_nem_ib_pkt_GET_DONE_handler(MPIDI_VC_t * vc,
 
         if (REQ_FIELD(req, seg_seq_num) == REQ_FIELD(req, seg_num)) {
             /* last packet of segments */
-            MPID_NEM_IB_CHECK_AND_SEND_PROGRESS mpi_errno = vc->ch.lmt_done_send(vc, req);
+            MPID_NEM_IB_CHECK_AND_SEND_PROGRESS;
+            mpi_errno = vc->ch.lmt_done_send(vc, req);
             if (mpi_errno)
                 MPIU_ERR_POP(mpi_errno);
         }
@@ -2446,8 +2454,10 @@ int MPID_nem_ib_PktHandler_reply_seq_num(MPIDI_VC_t * vc,
     /* try to send from sendq because at least one RDMA-write-to buffer has been released */
     //dprintf("pkthandler,reply_seq_num,send_progress\n");
     dprintf("pkthandler,reply_seq_num,send_progress\n");
-    MPID_NEM_IB_CHECK_AND_SEND_PROGRESS fn_exit:MPIDI_FUNC_EXIT
-        (MPID_STATE_MPID_NEM_IB_PKTHANDLER_REPLY_SEQ_NUM);
+    MPID_NEM_IB_CHECK_AND_SEND_PROGRESS;
+
+  fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_MPID_NEM_IB_PKTHANDLER_REPLY_SEQ_NUM);
     return mpi_errno;
     //fn_fail:
     goto fn_exit;
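
The commit above wraps multi-statement macros in do { } while (0) so that a call site can end with a semicolon like an ordinary statement. The following is a minimal sketch of why that matters, not code from the MPICH tree: CHECK_AND_SEND_PROGRESS, send_progress, and poll_once are hypothetical names chosen for illustration.

```c
/* Counts how often the hypothetical progress routine ran. */
static int progress_calls = 0;

static void send_progress(void)
{
    progress_calls++;
}

/* Hypothetical macro: the do { } while (0) wrapper turns the body into a
 * single statement that requires a trailing semicolon at the call site. */
#define CHECK_AND_SEND_PROGRESS(ready)  \
    do {                                \
        if (ready) {                    \
            send_progress();            \
        }                               \
    } while (0)

/* Returns 1 if the else branch ran, 0 otherwise.
 * If the macro were a bare "if (ready) { ... }", then the call plus its
 * semicolon would leave the "else" without a matching "if" (a syntax
 * error), and without the semicolon the "else" would silently attach to
 * the macro's inner "if" instead of the outer one. */
static int poll_once(int have_work, int ready)
{
    int idle = 0;

    if (have_work)
        CHECK_AND_SEND_PROGRESS(ready);
    else
        idle = 1;

    return idle;
}
```

With the wrapper in place, poll_once(0, 1) takes the else branch and leaves progress_calls untouched, which is what a reader of the call site would expect.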

http://git.mpich.org/mpich.git/commitdiff/6b78c3cbec63c7131b24c6e64e42836b7ca8da29

commit 6b78c3cbec63c7131b24c6e64e42836b7ca8da29
Author: Pavan Balaji <balaji at anl.gov>
Date:   Thu Oct 9 13:35:56 2014 -0500

    Function state cleanup.
    
    We were not setting the function states correctly in a bunch of
    functions.
    
    Modifications by Wesley to split up big commit.
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>
    Signed-off-by: Pavan Balaji <balaji at anl.gov>

diff --git a/src/mpi/attr/win_set_attr.c b/src/mpi/attr/win_set_attr.c
index a211331..8522d40 100644
--- a/src/mpi/attr/win_set_attr.c
+++ b/src/mpi/attr/win_set_attr.c
@@ -200,6 +200,7 @@ int MPI_Win_set_attr(MPI_Win win, int win_keyval, void *attribute_val)
 {
     int mpi_errno = MPI_SUCCESS;
     MPID_MPI_STATE_DECL(MPID_STATE_MPI_WIN_SET_ATTR);
+    MPID_MPI_FUNC_ENTER(MPID_STATE_MPI_WIN_SET_ATTR);
 
     MPIR_ERRTEST_INITIALIZED_ORDIE();
     
diff --git a/src/mpi/comm/comm_agree.c b/src/mpi/comm/comm_agree.c
index 870f43d..49a71d7 100644
--- a/src/mpi/comm/comm_agree.c
+++ b/src/mpi/comm/comm_agree.c
@@ -98,6 +98,7 @@ int MPIR_Comm_agree(MPID_Comm *comm_ptr, int *flag)
     }
 
   fn_exit:
+    MPID_MPI_FUNC_EXIT(MPID_STATE_MPIR_COMM_AGREE);
     return mpi_errno;
   fn_fail:
     goto fn_exit;
diff --git a/src/mpi/comm/comm_set_info.c b/src/mpi/comm/comm_set_info.c
index e6940a8..0acbfbe 100644
--- a/src/mpi/comm/comm_set_info.c
+++ b/src/mpi/comm/comm_set_info.c
@@ -62,6 +62,7 @@ int MPIR_Comm_set_info_impl(MPID_Comm * comm_ptr, MPID_Info * info_ptr)
     }
 
   fn_exit:
+    MPID_MPI_FUNC_EXIT(MPID_STATE_MPIR_COMM_SET_INFO_IMPL);
     return mpi_errno;
   fn_fail:
     goto fn_exit;
diff --git a/src/mpi/comm/comm_shrink.c b/src/mpi/comm/comm_shrink.c
index 24560e7..844d617 100644
--- a/src/mpi/comm/comm_shrink.c
+++ b/src/mpi/comm/comm_shrink.c
@@ -82,6 +82,7 @@ int MPIR_Comm_shrink(MPID_Comm *comm_ptr, MPID_Comm **newcomm_ptr)
 
   fn_exit:
     MPIR_Group_release(comm_grp);
+    MPID_MPI_FUNC_EXIT(MPID_STATE_MPIR_COMM_SHRINK);
     return mpi_errno;
   fn_fail:
     if (*newcomm_ptr) MPIU_Object_set_ref(*newcomm_ptr, 0);
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
index 1934ce6..1032a2b 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_nm.c
@@ -163,6 +163,8 @@ static inline void save_iov(MPID_Request *sreq, void *hdr, void *data, MPIDI_msg
         ++index;
     }
     sreq->dev.iov_count = index;
+
+    MPIDI_FUNC_EXIT(MPID_STATE_SAVE_IOV);
 }
 
 #undef FUNCNAME
diff --git a/src/mpid/ch3/channels/nemesis/netmod/wintcp/socksm.c b/src/mpid/ch3/channels/nemesis/netmod/wintcp/socksm.c
index a6f5099..fec9e89 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/wintcp/socksm.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/wintcp/socksm.c
@@ -1972,7 +1972,7 @@ static int state_c_tmpvcack_success_handler(MPIU_EXOVERLAPPED *rd_ov)
     }
 
  fn_exit:
-    MPIDI_FUNC_ENTER(MPID_STATE_STATE_C_TMPVCACK_SUCCESS_HANDLER);
+    MPIDI_FUNC_EXIT(MPID_STATE_STATE_C_TMPVCACK_SUCCESS_HANDLER);
     return mpi_errno;
  fn_fail:
     goto fn_exit;
diff --git a/src/mpid/ch3/include/mpidrma.h b/src/mpid/ch3/include/mpidrma.h
index 02c0b9c..8024bf2 100644
--- a/src/mpid/ch3/include/mpidrma.h
+++ b/src/mpid/ch3/include/mpidrma.h
@@ -1095,7 +1095,7 @@ static inline int MPIDI_CH3I_Wait_for_pt_ops_finish(MPID_Win *win_ptr)
     }
 
  fn_exit:
-    MPIDI_RMA_FUNC_ENTER(MPID_STATE_MPIDI_CH3I_WAIT_FOR_PT_OPS_FINISH);
+    MPIDI_RMA_FUNC_EXIT(MPID_STATE_MPIDI_CH3I_WAIT_FOR_PT_OPS_FINISH);
     return mpi_errno;
  fn_fail:
     goto fn_exit;
diff --git a/src/mpid/ch3/src/mpid_comm_agree.c b/src/mpid/ch3/src/mpid_comm_agree.c
index 6377397..12ae988 100644
--- a/src/mpid/ch3/src/mpid_comm_agree.c
+++ b/src/mpid/ch3/src/mpid_comm_agree.c
@@ -113,6 +113,7 @@ int MPID_Comm_agree(MPID_Comm *comm_ptr, uint32_t *bitarray, int *flag, int new_
     MPIU_Free(children);
 
   fn_exit:
+    MPID_MPI_FUNC_EXIT(MPID_STATE_MPID_COMM_AGREE);
     return mpi_errno;
   fn_fail:
     goto fn_exit;
diff --git a/src/mpid/ch3/src/mpid_comm_failure_ack.c b/src/mpid/ch3/src/mpid_comm_failure_ack.c
index fe70fdc..e5034a1 100644
--- a/src/mpid/ch3/src/mpid_comm_failure_ack.c
+++ b/src/mpid/ch3/src/mpid_comm_failure_ack.c
@@ -139,6 +139,7 @@ int MPID_Comm_failed_bitarray(MPID_Comm *comm_ptr, uint32_t **bitarray, int acke
 
   fn_exit:
     MPIU_CHKLMEM_FREEALL();
+    MPIDI_FUNC_EXIT(MPID_STATE_COMM_FAILED_BITARRAY);
     return mpi_errno;
   fn_fail:
     goto fn_exit;
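
The fixes above all restore the pairing between a function-enter call and a function-exit call on the shared fn_exit path. A rough sketch of that pattern, assuming a simple depth counter in place of the real MPICH state macros (FUNC_ENTER, FUNC_EXIT, state_depth, and do_work are illustrative names only):

```c
/* Hypothetical nesting counter standing in for the real function-state
 * bookkeeping; it should be back at zero whenever no function is active. */
static int state_depth = 0;

#define FUNC_ENTER() do { state_depth++; } while (0)
#define FUNC_EXIT()  do { state_depth--; } while (0)

/* Every path, including the failure path, leaves through fn_exit, so the
 * single FUNC_EXIT there balances the FUNC_ENTER at the top. */
static int do_work(int fail)
{
    int err = 0;

    FUNC_ENTER();

    if (fail) {
        err = 1;
        goto fn_fail;
    }

  fn_exit:
    FUNC_EXIT();
    return err;
  fn_fail:
    goto fn_exit;
}
```

Calling FUNC_ENTER a second time at fn_exit, or omitting FUNC_EXIT there, would leave state_depth nonzero after the call returns, which is the kind of imbalance the commit corrects.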

http://git.mpich.org/mpich.git/commitdiff/55017de2e2f7088a42858a984de47ceb2158a024

commit 55017de2e2f7088a42858a984de47ceb2158a024
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Thu Oct 16 09:47:38 2014 -0500

    disable interlibrary dependencies on BGQ
    
    The default linker behavior on BGQ makes interlibrary dependencies
    tricky to support correctly. Just disable them to make our lives easier.
    
    Signed-off-by: Rob Latham <robl at mcs.anl.gov>

diff --git a/src/mpid/pamid/subconfigure.m4 b/src/mpid/pamid/subconfigure.m4
index d9f2bb7..f67f86e 100644
--- a/src/mpid/pamid/subconfigure.m4
+++ b/src/mpid/pamid/subconfigure.m4
@@ -228,6 +228,13 @@ if test "${pamid_platform}" = "BGQ" ; then
   dnl
   dnl Only the 'cpi', 'mpivars', and 'mpichversion' executables have this problem.
   MPID_LIBTOOL_STATIC_FLAG="-all-static"
+
+  dnl Another bgq special case. The default linker behavior is to use static versions
+  dnl of libraries. This makes supporting interlibrary dependencies difficult. Just
+  dnl disable them to make our lives easier.
+  if test "$INTERLIB_DEPS" = "yes"; then
+	INTERLIB_DEPS="no"
+  fi
 fi
 
 if test "${pamid_platform}" = "PE" ; then

http://git.mpich.org/mpich.git/commitdiff/20d584c19e9f6b9b2c35a53c20996dbef09bd6f3

commit 20d584c19e9f6b9b2c35a53c20996dbef09bd6f3
Author: Pavan Balaji <balaji at anl.gov>
Date:   Tue Oct 14 15:00:46 2014 -0500

    Allow ROMIO to propagate library dependencies to MPICH.
    
    Signed-off-by: Rob Latham <robl at mcs.anl.gov>

diff --git a/configure.ac b/configure.ac
index c7e4592..5c3d291 100644
--- a/configure.ac
+++ b/configure.ac
@@ -5977,7 +5977,7 @@ AC_DEFINE(HAVE_MPICHCONF,1,[Define so that we can test whether the mpichconf.h f
 # add the LDFLAGS/LIBS we got so far to WRAPPERs
 if test "$INTERLIB_DEPS" = "no" ; then
    WRAPPER_LDFLAGS="$WRAPPER_LDFLAGS $LDFLAGS"
-   WRAPPER_LIBS="$WRAPPER_LIBS $EXTERNAL_LIBS $ROMIO_EXTERNAL_LIBS"
+   WRAPPER_LIBS="$WRAPPER_LIBS $EXTERNAL_LIBS"
 fi
 
 if test "$USE_PMI2_API" = "yes" ; then
diff --git a/src/mpi/romio/localdefs.in b/src/mpi/romio/localdefs.in
index 11edc9e..bf057d1 100644
--- a/src/mpi/romio/localdefs.in
+++ b/src/mpi/romio/localdefs.in
@@ -1,4 +1,7 @@
 #! /bin/sh
-ROMIO_EXTERNAL_LIBS="@LIBS@"
+
+# Append ROMIO library dependencies to the global list
+EXTERNAL_LIBS="$EXTERNAL_LIBS @LIBS@"
+
 MPI_OFFSET_TYPE="@MPI_OFFSET_TYPE@"
 FORTRAN_MPI_OFFSET="@FORTRAN_MPI_OFFSET@"

http://git.mpich.org/mpich.git/commitdiff/6de2a1d9e8d72514beb5ebf4bfddde3d8aa9ea9e

commit 6de2a1d9e8d72514beb5ebf4bfddde3d8aa9ea9e
Author: Paul Coffman <pkcoff at us.ibm.com>
Date:   Tue Oct 7 21:11:27 2014 -0500

    MPIDI_Coll_comm_create unsetting default collops for inTERcomms
    
    There are no pami optimized collectives available for inTERcomms,
    so MPIDI_Coll_comm_create checks whether the comm is an inTRAcomm
    and returns early if it is not. However, it was doing the new
    malloc for comm->coll_fns before that if-check, which was
    unsetting the MPICH defaults. The solution is to move the malloc
    after the inTRAcomm if-check.
    
    Signed-off-by: Rob Latham <robl at mcs.anl.gov>

diff --git a/src/mpid/pamid/src/comm/mpid_comm.c b/src/mpid/pamid/src/comm/mpid_comm.c
index 173c352..5359da8 100644
--- a/src/mpid/pamid/src/comm/mpid_comm.c
+++ b/src/mpid/pamid/src/comm/mpid_comm.c
@@ -227,12 +227,12 @@ void MPIDI_Coll_comm_create(MPID_Comm *comm)
   if (!MPIDI_Process.optimized.collectives)
     return;
 
-  comm->coll_fns = MPIU_Calloc0(1, MPID_Collops);
-  MPID_assert(comm->coll_fns != NULL);
-
   if(comm->comm_kind != MPID_INTRACOMM) return;
   /* Create a geometry */
 
+  comm->coll_fns = MPIU_Calloc0(1, MPID_Collops);
+  MPID_assert(comm->coll_fns != NULL);
+
    if(comm->mpid.geometry != MPIDI_Process.world_geometry)
    {
       if(unlikely(MPIDI_Process.verbose >= MPIDI_VERBOSE_DETAILS_0 && comm->rank == 0))
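
The reordering above can be illustrated with a small stand-alone sketch. The types and names here (struct collops, struct comm, default_ops, coll_comm_create) are simplified stand-ins, not the real pamid definitions, and the sketch keeps the defaults on allocation failure where the original asserts instead.

```c
#include <stdlib.h>

/* Simplified stand-ins for the collective function table and communicator. */
struct collops { int (*barrier)(void); };
enum comm_kind { INTRACOMM, INTERCOMM };
struct comm {
    enum comm_kind kind;
    struct collops *coll_fns;
};

static int default_barrier(void) { return 0; }

/* Plays the role of the defaults installed before the device hook runs. */
static struct collops default_ops = { default_barrier };

/* Check the communicator kind first and allocate second, so that an
 * intercommunicator returns early with its default table still in place. */
static void coll_comm_create(struct comm *c)
{
    struct collops *ops;

    if (c->kind != INTRACOMM)
        return;                 /* nothing optimized here: keep defaults */

    ops = calloc(1, sizeof *ops);
    if (ops == NULL)
        return;                 /* sketch only: fall back to defaults */

    c->coll_fns = ops;          /* caller fills in optimized entries */
}
```

Had the allocation come before the kind check, an intercommunicator would leave the function holding a zeroed table in place of default_ops.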

http://git.mpich.org/mpich.git/commitdiff/0d69b60645097892b62d183f73a0a34a2019a5ae

commit 0d69b60645097892b62d183f73a0a34a2019a5ae
Author: Paul Coffman <pkcoff at us.ibm.com>
Date:   Tue Oct 7 21:05:26 2014 -0500

    MPIDO_Ibarrier optimized pami code incorrect
    
    The optimized pami code currently within MPIDO_Ibarrier
    is incorrect. For now do not run it; instead just
    kick back to MPICH if mpir_nbc is set, otherwise
    call the blocking MPIR_Barrier().
    
    Signed-off-by: Rob Latham <robl at mcs.anl.gov>

diff --git a/src/mpid/pamid/src/coll/barrier/mpido_ibarrier.c b/src/mpid/pamid/src/coll/barrier/mpido_ibarrier.c
index 10607c7..fd33193 100644
--- a/src/mpid/pamid/src/coll/barrier/mpido_ibarrier.c
+++ b/src/mpid/pamid/src/coll/barrier/mpido_ibarrier.c
@@ -34,8 +34,11 @@ int MPIDO_Ibarrier(MPID_Comm *comm_ptr, MPID_Request **request)
 {
    TRACE_ERR("Entering MPIDO_Ibarrier\n");
 
-   if(unlikely(comm_ptr->mpid.user_selected_type[PAMI_XFER_BARRIER] == MPID_COLL_USE_MPICH))
-   {
+     /*
+      * There is actually no current pami optimization for this
+      * so just kick it back to MPICH if mpir_nbc is set, otherwise
+      * call the blocking MPIR_Barrier().
+      */
      if (MPIDI_Process.mpir_nbc != 0)
        return 0;
 
@@ -56,61 +59,4 @@ int MPIDO_Ibarrier(MPID_Comm *comm_ptr, MPID_Request **request)
       MPIDI_Request_complete_norelease_inline(mpid_request);
 
       return rc;
-   }
-
-   MPIDI_Post_coll_t barrier_post;
-   pami_xfer_t barrier;
-   pami_algorithm_t my_barrier;
-   pami_metadata_t *my_barrier_md;
-   int queryreq = 0;
-
-   MPID_Request * mpid_request = MPID_Request_create_inline();
-   mpid_request->kind = MPID_COLL_REQUEST;
-   *request = mpid_request;
-
-   barrier.cb_done = cb_ibarrier;
-   barrier.cookie = (void *)mpid_request;
-
-   if(comm_ptr->mpid.user_selected_type[PAMI_XFER_BARRIER] == MPID_COLL_OPTIMIZED)
-   {
-      TRACE_ERR("Optimized barrier (%s) was pre-selected\n", comm_ptr->mpid.opt_protocol_md[PAMI_XFER_BARRIER][0].name);
-      my_barrier = comm_ptr->mpid.opt_protocol[PAMI_XFER_BARRIER][0];
-      my_barrier_md = &comm_ptr->mpid.opt_protocol_md[PAMI_XFER_BARRIER][0];
-      queryreq = comm_ptr->mpid.must_query[PAMI_XFER_BARRIER][0];
-   }
-   else
-   {
-      TRACE_ERR("Barrier (%s) was specified by user\n", comm_ptr->mpid.user_metadata[PAMI_XFER_BARRIER].name);
-      my_barrier = comm_ptr->mpid.user_selected[PAMI_XFER_BARRIER];
-      my_barrier_md = &comm_ptr->mpid.user_metadata[PAMI_XFER_BARRIER];
-      queryreq = comm_ptr->mpid.user_selected_type[PAMI_XFER_BARRIER];
-   }
-
-   barrier.algorithm = my_barrier;
-   /* There is no support for query-required barrier protocols here */
-   MPID_assert_always(queryreq != MPID_COLL_ALWAYS_QUERY);
-   MPID_assert_always(queryreq != MPID_COLL_CHECK_FN_REQUIRED);
-
-   /* TODO Name needs fixed somehow */
-   MPIDI_Update_last_algorithm(comm_ptr, my_barrier_md->name);
-   if(unlikely(MPIDI_Process.verbose >= MPIDI_VERBOSE_DETAILS_ALL && comm_ptr->rank == 0))
-   {
-      unsigned long long int threadID;
-      MPIU_Thread_id_t tid;
-      MPIU_Thread_self(&tid);
-      threadID = (unsigned long long int)tid;
-      fprintf(stderr,"<%llx> Using protocol %s for barrier on %u\n",
-              threadID,
-              my_barrier_md->name,
-              (unsigned) comm_ptr->context_id);
-   }
-   TRACE_ERR("%s barrier\n", MPIDI_Process.context_post.active>0?"posting":"invoking");
-   MPIDI_Context_post(MPIDI_Context[0], &barrier_post.state,
-                      MPIDI_Pami_post_wrapper, (void *)&barrier);
-   TRACE_ERR("barrier %s rc: %d\n", MPIDI_Process.context_post.active>0?"posted":"invoked", rc);
-
-   MPID_Progress_wait_inline(1);
-
-   TRACE_ERR("exiting mpido_ibarrier\n");
-   return 0;
 }

http://git.mpich.org/mpich.git/commitdiff/ed9fd1c0b4d014f2ee2fbb2ae96439dc80cd1ba1

commit ed9fd1c0b4d014f2ee2fbb2ae96439dc80cd1ba1
Author: Paul Coffman <pkcoff at us.ibm.com>
Date:   Tue Oct 7 20:51:16 2014 -0500

    Disable MPIDI_Init_collsel_extension call on BGQ
    
    At the end of MPIDI_PAMI_context_init, MPIDI_Init_collsel_extension
    is called to enable the dynamic optimized collective advisor.  This
    is not supported on BGQ, so ifdef out the call.
    
    Signed-off-by: Rob Latham <robl at mcs.anl.gov>

diff --git a/src/mpid/pamid/src/mpid_init.c b/src/mpid/pamid/src/mpid_init.c
index 51ccd9b..064f085 100644
--- a/src/mpid/pamid/src/mpid_init.c
+++ b/src/mpid/pamid/src/mpid_init.c
@@ -810,8 +810,12 @@ MPIDI_PAMI_context_init(int* threading, int *size)
   /* Get collective selection advisor and cache it */
   /* --------------------------------------------- */
   /* Context is created, i.e. collective selection extension is initialized in PAMI. Now I can get the
-     advisor if I am not in TUNE mode. If in TUNE mode, I can init collsel and generate the table */
+     advisor if I am not in TUNE mode. If in TUNE mode, I can init collsel and generate the table.
+     This is not supported on BGQ.
+  */
+#ifndef __BGQ__
   MPIDI_Init_collsel_extension();
+#endif
 
 #if (MPIDI_STATISTICS || MPIDI_PRINTENV)
   MPIDI_open_pe_extension();

http://git.mpich.org/mpich.git/commitdiff/1b2a48f0881fabeff151c8d66f0b1b20197d94a0

commit 1b2a48f0881fabeff151c8d66f0b1b20197d94a0
Author: Paul Coffman <pkcoff at us.ibm.com>
Date:   Tue Oct 7 20:35:59 2014 -0500

    Remove Blue Gene/Q specific MPID_VCR_GET_LPIDS
    
    Since Blue Gene/Q does not support dynamic tasking, there was only 1
    element in the MPID_VCR_t data structure, so a shortcut was taken
    to avoid a malloc and free of a new list of pami_task_t in the form
    the pami geometry creation was expecting.  However, it seems an
    array of structures with 1 pami_task_t element in each is not exactly
    the same in memory as an array of pami_task_t themselves, so the pami
    geometry creation was failing.  The fix is to simply do what
    all other platforms do and malloc a separate list of pami_task_t.
    
    Signed-off-by: Rob Latham <robl at mcs.anl.gov>

diff --git a/src/mpid/pamid/include/mpidi_macros.h b/src/mpid/pamid/include/mpidi_macros.h
index b8a1b45..5c59753 100644
--- a/src/mpid/pamid/include/mpidi_macros.h
+++ b/src/mpid/pamid/include/mpidi_macros.h
@@ -133,14 +133,6 @@ _data_sz_out)                                                   \
   vcr[index]->taskid;                           \
 })
 
-#ifdef __BGQ__
-/* BGQ just shares the MPICH vcr/tasklist.
-   This relies on the VCR being a simple task list which is asserted
-   in static_assertions() in mpid_init.c */
-#define MPID_VCR_GET_LPIDS(comm, taskids) taskids =  &((*comm->vcr)->taskid);
-#define MPID_VCR_FREE_LPIDS(taskids) 
-#else
-/* non-BGQ mallocs and copies the MPICH vcr/tasklist */
 #define MPID_VCR_GET_LPIDS(comm, taskids)                      \
 ({                                                             \
   int i;                                                       \
@@ -151,8 +143,6 @@ _data_sz_out)                                                   \
 })
 #define MPID_VCR_FREE_LPIDS(taskids) MPIU_Free(taskids)
 
-#endif
-
 #define MPID_GPID_Get(comm_ptr, rank, gpid)             \
 ({                                                      \
   gpid[1] = MPID_VCR_GET_LPID(comm_ptr->vcr, rank);     \
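
The approach the commit falls back to, copying each task id out of the VCR entries into a separately allocated contiguous list, might look roughly like the sketch below. It assumes, based on the vcr[index]->taskid access shown in the surrounding macros, that the VCR is an array of pointers to entries; task_t, struct vcr_entry, and get_lpids are hypothetical stand-ins rather than the real pamid types.

```c
#include <stdlib.h>

typedef unsigned int task_t;            /* stand-in for pami_task_t */
struct vcr_entry { task_t taskid; };    /* stand-in for one VCR element */

/* Copy the task ids into a freshly allocated contiguous array.
 * Taking the address of the first entry's taskid is only valid as a
 * task list if the entries happen to sit back to back in memory with no
 * extra members or padding, which an array of pointers does not promise.
 * The caller frees the returned list. */
static task_t *get_lpids(struct vcr_entry *const *vcr, int size)
{
    task_t *ids = malloc((size_t)size * sizeof *ids);
    int i;

    if (ids == NULL)
        return NULL;

    for (i = 0; i < size; i++)
        ids[i] = vcr[i]->taskid;

    return ids;
}
```

The copy costs one malloc and free per use, but the result no longer depends on how the entries are laid out.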

http://git.mpich.org/mpich.git/commitdiff/2f747908de96935856ad4bc9cb12a3cbcb6c7f1a

commit 2f747908de96935856ad4bc9cb12a3cbcb6c7f1a
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Tue Oct 14 18:10:01 2014 -0500

    portals4: large threshold comparison
    
    If a message size is <= PTL_LARGE_THRESHOLD, use a single operation.
    Previously, the comparison was a strict <, which generated unnecessary
    0-byte operations when messages were exactly the size of the threshold.
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
index 005b678..15a9345 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_recv.c
@@ -399,7 +399,7 @@ int MPID_nem_ptl_recv_posted(MPIDI_VC_t *vc, MPID_Request *rreq)
     MPIDI_Datatype_get_info(rreq->dev.user_count, rreq->dev.datatype, dt_contig, data_sz, dt_ptr, dt_true_lb);
     MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "count=%d datatype=%#x contig=%d data_sz=%lu", rreq->dev.user_count, rreq->dev.datatype, dt_contig, data_sz));
 
-    if (data_sz < PTL_LARGE_THRESHOLD) {
+    if (data_sz <= PTL_LARGE_THRESHOLD) {
         if (dt_contig) {
             /* small contig message */
             MPIU_DBG_MSG(CH3_CHANNEL, VERBOSE, "Small contig message");
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
index f86cbcc..0ab2bc7 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
@@ -202,7 +202,7 @@ static int send_msg(ptl_hdr_data_t ssend_flag, struct MPIDI_VC *vc, const void *
     MPIDI_Datatype_get_info(count, datatype, dt_contig, data_sz, dt_ptr, dt_true_lb);
     MPIU_DBG_MSG_FMT(CH3_CHANNEL, VERBOSE, (MPIU_DBG_FDEST, "count=%d datatype=%#x contig=%d data_sz=%lu", count, datatype, dt_contig, data_sz));
     
-    if (data_sz < PTL_LARGE_THRESHOLD) {
+    if (data_sz <= PTL_LARGE_THRESHOLD) {
         /* Small message.  Send all data eagerly */
         if (dt_contig) {
             MPIU_DBG_MSG(CH3_CHANNEL, VERBOSE, "Small contig message");
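
The boundary case can be worked through with a small sketch. It assumes, as the commit message implies, that the large-message path moves the first threshold-sized part in one operation and the remainder in a second one. LARGE_THRESHOLD, struct plan, and plan_transfer are hypothetical, and the value 1024 is arbitrary, not the real PTL_LARGE_THRESHOLD.

```c
#include <stddef.h>

#define LARGE_THRESHOLD 1024   /* arbitrary illustrative value */

/* How a transfer would be split under the assumed two-part scheme. */
struct plan {
    size_t first;   /* bytes moved by the first operation */
    size_t rest;    /* bytes moved by the second operation, if any */
    int ops;        /* number of operations issued */
};

/* inclusive == 0 models the old "<" test, inclusive != 0 the new "<=". */
static struct plan plan_transfer(size_t data_sz, int inclusive)
{
    struct plan p;
    int is_small = inclusive ? (data_sz <= LARGE_THRESHOLD)
                             : (data_sz < LARGE_THRESHOLD);

    if (is_small) {
        p.first = data_sz;
        p.rest = 0;
        p.ops = 1;
    } else {
        p.first = LARGE_THRESHOLD;
        p.rest = data_sz - LARGE_THRESHOLD;
        p.ops = 2;
    }
    return p;
}
```

For a message of exactly LARGE_THRESHOLD bytes, the strict test yields two operations with a 0-byte remainder, while the inclusive test yields a single operation.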

http://git.mpich.org/mpich.git/commitdiff/f86d6fdc4c865b49e9f7ff5c98a23dd72501216d

commit f86d6fdc4c865b49e9f7ff5c98a23dd72501216d
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Mon Oct 13 12:29:52 2014 -0500

    portals4: remove pointer update from unmatched improbe
    
    Previously, the message pointer in an improbe call in the portals4
    netmod layer was set to MPI_MESSAGE_NULL if there was no match. This
    is incorrect because the ch3 layer eventually returns either a valid
    pointer or NULL. As ch3 already starts the value at NULL, we can just
    omit the update.
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_probe.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_probe.c
index b7c7dcc..f686381 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_probe.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_probe.c
@@ -241,7 +241,6 @@ int MPID_nem_ptl_improbe(MPIDI_VC_t *vc, int source, int tag, MPID_Comm *comm, i
     }
     else {
         MPID_Request_release(req);
-        *message = MPI_MESSAGE_NULL;
     }
 
  fn_exit:

http://git.mpich.org/mpich.git/commitdiff/1084143383906236f49e1c5aae556a58be62785b

commit 1084143383906236f49e1c5aae556a58be62785b
Author: Huiwei Lu <huiweilu at mcs.anl.gov>
Date:   Tue Oct 14 13:49:40 2014 -0500

    Mark two MPI_Comm_idup related tests as xfail
    
    They are known to be failing. Mark them as xfail so they will not send
    false alarms to other patches.
    
    No reviewer.

diff --git a/test/mpi/threads/comm/testlist b/test/mpi/threads/comm/testlist
index 2dfc780..d3b6b5e 100644
--- a/test/mpi/threads/comm/testlist
+++ b/test/mpi/threads/comm/testlist
@@ -3,5 +3,5 @@ dup_leak_test 2
 comm_dup_deadlock 4
 comm_create_threads 4
 comm_create_group_threads 4
-comm_idup 4 mpiversion=3.0
-ctxidup 4
+comm_idup 4 mpiversion=3.0 xfail=ticket2108
+ctxidup 4 mpiversion=3.0 xfail=ticket2108

http://git.mpich.org/mpich.git/commitdiff/b3350c45058fa464375c13c65e661dd389ccb6bf

commit b3350c45058fa464375c13c65e661dd389ccb6bf
Author: Pavan Balaji <balaji at anl.gov>
Date:   Tue Oct 14 12:54:47 2014 -0500

    Fixes to the man and www page builds.
    
    We need to check both the build and src directories before installing
    the man and www pages.  We were only checking the build directory for
    man and the src directory for www.  Also, make sure to install both
    the man pages and the www pages on install.
    
    Signed-off-by: Sangmin Seo <sseo at anl.gov>

diff --git a/Makefile.am b/Makefile.am
index fe42721..d08430e 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -453,7 +453,14 @@ htmldoc-local: $(mpi_sources:.c=.html-phony) $(doc1_src_txt:.txt=.html1-phony)
 INSTALL_DATA_LOCAL_TARGETS += install-man-local
 # this is a variation on the recipe that was previously generated by simplemake
 install-man-local:
-	@if test -d $(top_builddir)/man && cd $(top_builddir)/man ; then \
+	@dir= ; \
+	if test -d $(builddir)/man ; then \
+	 dir=$(builddir)/man ; \
+	elif test -d $(srcdir)/man ; then \
+	 dir=$(srcdir)/man ; \
+	fi ; \
+	export dir ; \
+	if test ! -z $$dir && test -d $$dir && cd $$dir ; then \
 	 for name in * ; do \
 	  if [ "$$name" = "*" ] ; then continue ; fi ; \
 	  if [ -f $$name ] ; then \
@@ -477,8 +484,16 @@ install-man-local:
 	 done ; \
 	fi
 
+INSTALL_DATA_LOCAL_TARGETS += install-html-local
 install-html-local:
-	@if test -d $(builddir)/www && cd $(builddir)/www ; then \
+	@dir= ; \
+	if test -d $(builddir)/www ; then \
+	 dir=$(builddir)/www ; \
+	elif test -d $(srcdir)/www ; then \
+	 dir=$(srcdir)/www ; \
+	fi ; \
+	export dir ; \
+	if test ! -z $$dir && test -d $$dir && cd $$dir ; then \
 	 for name in * ; do \
 	  if [ "$$name" = "*" ] ; then continue ; fi ; \
 	  if [ -f $$name ] ; then \

http://git.mpich.org/mpich.git/commitdiff/92d2dfcce8bfcf47f861b9e687b1349164076f37

commit 92d2dfcce8bfcf47f861b9e687b1349164076f37
Author: Wesley Bland <wbland at anl.gov>
Date:   Tue Oct 14 11:16:13 2014 -0500

    Test MPI_IN_PLACE variants of NBCs.
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>

diff --git a/test/mpi/coll/nonblocking.c b/test/mpi/coll/nonblocking.c
index 3a306f5..b9f4f5b 100644
--- a/test/mpi/coll/nonblocking.c
+++ b/test/mpi/coll/nonblocking.c
@@ -86,48 +86,108 @@ int main(int argc, char **argv)
     MPI_Igather(sbuf, NUM_INTS, MPI_INT, rbuf, NUM_INTS, MPI_INT, 0, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    if (0 == rank)
+        MPI_Igather(MPI_IN_PLACE, -1, MPI_DATATYPE_NULL, rbuf, NUM_INTS, MPI_INT, 0, comm, &req);
+    else
+        MPI_Igather(sbuf, NUM_INTS, MPI_INT, rbuf, NUM_INTS, MPI_INT, 0, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Igatherv(sbuf, NUM_INTS, MPI_INT, rbuf, rcounts, rdispls, MPI_INT, 0, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    if (0 == rank)
+        MPI_Igatherv(MPI_IN_PLACE, -1, MPI_DATATYPE_NULL, rbuf, rcounts, rdispls, MPI_INT, 0, comm, &req);
+    else
+        MPI_Igatherv(sbuf, NUM_INTS, MPI_INT, rbuf, rcounts, rdispls, MPI_INT, 0, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Iscatter(sbuf, NUM_INTS, MPI_INT, rbuf, NUM_INTS, MPI_INT, 0, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    if (0 == rank)
+        MPI_Iscatter(sbuf, NUM_INTS, MPI_INT, MPI_IN_PLACE, -1, MPI_DATATYPE_NULL, 0, comm, &req);
+    else
+        MPI_Iscatter(sbuf, NUM_INTS, MPI_INT, rbuf, NUM_INTS, MPI_INT, 0, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Iscatterv(sbuf, scounts, sdispls, MPI_INT, rbuf, NUM_INTS, MPI_INT, 0, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    if (0 == rank)
+        MPI_Iscatterv(sbuf, scounts, sdispls, MPI_INT, MPI_IN_PLACE, -1, MPI_DATATYPE_NULL, 0, comm, &req);
+    else
+        MPI_Iscatterv(sbuf, scounts, sdispls, MPI_INT, rbuf, NUM_INTS, MPI_INT, 0, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Iallgather(sbuf, NUM_INTS, MPI_INT, rbuf, NUM_INTS, MPI_INT, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Iallgather(MPI_IN_PLACE, -1, MPI_DATATYPE_NULL, rbuf, NUM_INTS, MPI_INT, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Iallgatherv(sbuf, NUM_INTS, MPI_INT, rbuf, rcounts, rdispls, MPI_INT, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Iallgatherv(MPI_IN_PLACE, -1, MPI_DATATYPE_NULL, rbuf, rcounts, rdispls, MPI_INT, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Ialltoall(sbuf, NUM_INTS, MPI_INT, rbuf, NUM_INTS, MPI_INT, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Ialltoall(MPI_IN_PLACE, -1, MPI_DATATYPE_NULL, rbuf, NUM_INTS, MPI_INT, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Ialltoallv(sbuf, scounts, sdispls, MPI_INT, rbuf, rcounts, rdispls, MPI_INT, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Ialltoallv(MPI_IN_PLACE, NULL, NULL, MPI_DATATYPE_NULL, rbuf, rcounts, rdispls, MPI_INT, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Ialltoallw(sbuf, scounts, sdispls, types, rbuf, rcounts, rdispls, types, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Ialltoallw(MPI_IN_PLACE, NULL, NULL, NULL, rbuf, rcounts, rdispls, types, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Ireduce(sbuf, rbuf, NUM_INTS, MPI_INT, MPI_SUM, 0, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    if (0 == rank)
+        MPI_Ireduce(MPI_IN_PLACE, rbuf, NUM_INTS, MPI_INT, MPI_SUM, 0, comm, &req);
+    else
+        MPI_Ireduce(sbuf, rbuf, NUM_INTS, MPI_INT, MPI_SUM, 0, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Iallreduce(sbuf, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Iallreduce(MPI_IN_PLACE, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Ireduce_scatter(sbuf, rbuf, rcounts, MPI_INT, MPI_SUM, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Ireduce_scatter(MPI_IN_PLACE, rbuf, rcounts, MPI_INT, MPI_SUM, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Ireduce_scatter_block(sbuf, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Ireduce_scatter_block(MPI_IN_PLACE, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Iscan(sbuf, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Iscan(MPI_IN_PLACE, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Iexscan(sbuf, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Iexscan(MPI_IN_PLACE, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     if (sbuf) free(sbuf);
     if (rbuf) free(rbuf);
     if (scounts) free(scounts);
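
Note on the rank checks in the diff above: for rooted collectives (gather, scatter, reduce) MPI accepts MPI_IN_PLACE only at the root, while the symmetric collectives accept it at every rank. A minimal sketch of that rule in plain C follows. The names coll_kind and may_use_in_place are hypothetical, chosen for illustration, and are not part of MPICH or of this test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical grouping of the collectives exercised by nonblocking.c. */
typedef enum {
    COLL_ROOTED_GATHER,   /* Igather, Igatherv, Ireduce: IN_PLACE is the sendbuf at the root */
    COLL_ROOTED_SCATTER,  /* Iscatter, Iscatterv: IN_PLACE is the recvbuf at the root */
    COLL_SYMMETRIC        /* Iallgather, Iallreduce, Iscan, ...: IN_PLACE at every rank */
} coll_kind;

/* Returns true when the calling rank may pass the in-place marker. */
bool may_use_in_place(coll_kind kind, int rank, int root)
{
    assert(rank >= 0 && root >= 0);
    if (kind == COLL_SYMMETRIC)
        return true;          /* every rank passes the marker */
    return rank == root;      /* rooted: only the root may pass it */
}
```

With root 0, may_use_in_place(COLL_ROOTED_GATHER, 1, 0) is false, which is why the non-root ranks in the test fall back to the ordinary sbuf call.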

http://git.mpich.org/mpich.git/commitdiff/8e9a9c84a4e5b42f42ded7a90307527d1aaab25b

commit 8e9a9c84a4e5b42f42ded7a90307527d1aaab25b
Author: Ken Raffenetti <raffenet at mcs.anl.gov>
Date:   Fri Oct 10 16:30:16 2014 -0500

    portals4: PTL_EVENT_PUT is only valid at the target
    
    Remove incorrect event from send handler assertion. PTL_EVENT_PUT should
    only be seen at the target.
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
index 12b942c..f86cbcc 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_send.c
@@ -20,7 +20,7 @@ static int handler_send_complete(const ptl_event_t *e)
 
     MPIDI_FUNC_ENTER(MPID_STATE_HANDLER_SEND_COMPLETE);
 
-    MPIU_Assert(e->type == PTL_EVENT_ACK || e->type == PTL_EVENT_PUT || e->type == PTL_EVENT_GET);
+    MPIU_Assert(e->type == PTL_EVENT_ACK || e->type == PTL_EVENT_GET);
 
     if (REQ_PTL(sreq)->md != PTL_INVALID_HANDLE) {
         ret = PtlMDRelease(REQ_PTL(sreq)->md);

http://git.mpich.org/mpich.git/commitdiff/9af91d6824f1f7d66d56c73858306b4c40b3120c

commit 9af91d6824f1f7d66d56c73858306b4c40b3120c
Author: Wesley Bland <wbland at anl.gov>
Date:   Fri Oct 10 23:51:56 2014 -0500

    Revert "Test MPI_IN_PLACE variants of NBCs."
    
    This reverts commit dd62e80936512f3bf9e69ff466ff462202b44d00.

diff --git a/test/mpi/coll/nonblocking.c b/test/mpi/coll/nonblocking.c
index bbd2f95..3a306f5 100644
--- a/test/mpi/coll/nonblocking.c
+++ b/test/mpi/coll/nonblocking.c
@@ -86,93 +86,48 @@ int main(int argc, char **argv)
     MPI_Igather(sbuf, NUM_INTS, MPI_INT, rbuf, NUM_INTS, MPI_INT, 0, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
-    MPI_Igather(MPI_IN_PLACE, -1, MPI_DATATYPE_NULL, rbuf, NUM_INTS, MPI_INT, 0, comm, &req);
-    MPI_Wait(&req, MPI_STATUS_IGNORE);
-
     MPI_Igatherv(sbuf, NUM_INTS, MPI_INT, rbuf, rcounts, rdispls, MPI_INT, 0, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
-    MPI_Igatherv(NULL, -1, MPI_DATATYPE_NULL, rbuf, rcounts, rdispls, MPI_INT, 0, comm, &req);
-    MPI_Wait(&req, MPI_STATUS_IGNORE);
-
     MPI_Iscatter(sbuf, NUM_INTS, MPI_INT, rbuf, NUM_INTS, MPI_INT, 0, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
-    MPI_Iscatter(sbuf, NUM_INTS, MPI_INT, MPI_IN_PLACE, -1, MPI_DATATYPE_NULL, 0, comm, &req);
-    MPI_Wait(&req, MPI_STATUS_IGNORE);
-
     MPI_Iscatterv(sbuf, scounts, sdispls, MPI_INT, rbuf, NUM_INTS, MPI_INT, 0, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
-    MPI_Iscatterv(sbuf, scounts, sdispls, MPI_INT, MPI_IN_PLACE, -1, MPI_DATATYPE_NULL, 0, comm, &req);
-    MPI_Wait(&req, MPI_STATUS_IGNORE);
-
     MPI_Iallgather(sbuf, NUM_INTS, MPI_INT, rbuf, NUM_INTS, MPI_INT, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
-    MPI_Iallgather(MPI_IN_PLACE, -1, MPI_DATATYPE_NULL, rbuf, NUM_INTS, MPI_INT, comm, &req);
-    MPI_Wait(&req, MPI_STATUS_IGNORE);
-
     MPI_Iallgatherv(sbuf, NUM_INTS, MPI_INT, rbuf, rcounts, rdispls, MPI_INT, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
-    MPI_Iallgatherv(MPI_IN_PLACE, -1, MPI_DATATYPE_NULL, rbuf, rcounts, rdispls, MPI_INT, comm, &req);
-    MPI_Wait(&req, MPI_STATUS_IGNORE);
-
     MPI_Ialltoall(sbuf, NUM_INTS, MPI_INT, rbuf, NUM_INTS, MPI_INT, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
-    MPI_Ialltoall(MPI_IN_PLACE, -1, MPI_DATATYPE_NULL, rbuf, NUM_INTS, MPI_INT, comm, &req);
-    MPI_Wait(&req, MPI_STATUS_IGNORE);
-
     MPI_Ialltoallv(sbuf, scounts, sdispls, MPI_INT, rbuf, rcounts, rdispls, MPI_INT, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
-    MPI_Ialltoallv(MPI_IN_PLACE, NULL, NULL, MPI_DATATYPE_NULL, rbuf, rcounts, rdispls, MPI_INT, comm, &req);
-    MPI_Wait(&req, MPI_STATUS_IGNORE);
-
     MPI_Ialltoallw(sbuf, scounts, sdispls, types, rbuf, rcounts, rdispls, types, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
-    MPI_Ialltoallw(MPI_IN_PLACE, NULL, NULL, NULL, rbuf, rcounts, rdispls, types, comm, &req);
-    MPI_Wait(&req, MPI_STATUS_IGNORE);
-
     MPI_Ireduce(sbuf, rbuf, NUM_INTS, MPI_INT, MPI_SUM, 0, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
-    MPI_Ireduce(MPI_IN_PLACE, rbuf, NUM_INTS, MPI_INT, MPI_SUM, 0, comm, &req);
-    MPI_Wait(&req, MPI_STATUS_IGNORE);
-
     MPI_Iallreduce(sbuf, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
-    MPI_Iallreduce(MPI_IN_PLACE, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
-    MPI_Wait(&req, MPI_STATUS_IGNORE);
-
     MPI_Ireduce_scatter(sbuf, rbuf, rcounts, MPI_INT, MPI_SUM, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
-    MPI_Ireduce_scatter(MPI_IN_PLACE, rbuf, rcounts, MPI_INT, MPI_SUM, comm, &req);
-    MPI_Wait(&req, MPI_STATUS_IGNORE);
-
     MPI_Ireduce_scatter_block(sbuf, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
-    MPI_Ireduce_scatter_block(MPI_IN_PLACE, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
-    MPI_Wait(&req, MPI_STATUS_IGNORE);
-
     MPI_Iscan(sbuf, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
-    MPI_Iscan(MPI_IN_PLACE, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
-    MPI_Wait(&req, MPI_STATUS_IGNORE);
-
     MPI_Iexscan(sbuf, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
-    MPI_Iexscan(MPI_IN_PLACE, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
-    MPI_Wait(&req, MPI_STATUS_IGNORE);
-
     if (sbuf) free(sbuf);
     if (rbuf) free(rbuf);
     if (scounts) free(scounts);

http://git.mpich.org/mpich.git/commitdiff/bb68b1aa4ef1524c16bcc2848fba3dd3229eaf7a

commit bb68b1aa4ef1524c16bcc2848fba3dd3229eaf7a
Author: Pavan Balaji <balaji at anl.gov>
Date:   Fri Oct 10 19:29:55 2014 -0500

    Test MPI_IN_PLACE variants of NBCs.
    
    This doesn't pass with mpich yet.
    
    Signed-off-by: Wesley Bland <wbland at anl.gov>

diff --git a/test/mpi/coll/nonblocking.c b/test/mpi/coll/nonblocking.c
index 3a306f5..bbd2f95 100644
--- a/test/mpi/coll/nonblocking.c
+++ b/test/mpi/coll/nonblocking.c
@@ -86,48 +86,93 @@ int main(int argc, char **argv)
     MPI_Igather(sbuf, NUM_INTS, MPI_INT, rbuf, NUM_INTS, MPI_INT, 0, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Igather(MPI_IN_PLACE, -1, MPI_DATATYPE_NULL, rbuf, NUM_INTS, MPI_INT, 0, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Igatherv(sbuf, NUM_INTS, MPI_INT, rbuf, rcounts, rdispls, MPI_INT, 0, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Igatherv(NULL, -1, MPI_DATATYPE_NULL, rbuf, rcounts, rdispls, MPI_INT, 0, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Iscatter(sbuf, NUM_INTS, MPI_INT, rbuf, NUM_INTS, MPI_INT, 0, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Iscatter(sbuf, NUM_INTS, MPI_INT, MPI_IN_PLACE, -1, MPI_DATATYPE_NULL, 0, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Iscatterv(sbuf, scounts, sdispls, MPI_INT, rbuf, NUM_INTS, MPI_INT, 0, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Iscatterv(sbuf, scounts, sdispls, MPI_INT, MPI_IN_PLACE, -1, MPI_DATATYPE_NULL, 0, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Iallgather(sbuf, NUM_INTS, MPI_INT, rbuf, NUM_INTS, MPI_INT, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Iallgather(MPI_IN_PLACE, -1, MPI_DATATYPE_NULL, rbuf, NUM_INTS, MPI_INT, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Iallgatherv(sbuf, NUM_INTS, MPI_INT, rbuf, rcounts, rdispls, MPI_INT, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Iallgatherv(MPI_IN_PLACE, -1, MPI_DATATYPE_NULL, rbuf, rcounts, rdispls, MPI_INT, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Ialltoall(sbuf, NUM_INTS, MPI_INT, rbuf, NUM_INTS, MPI_INT, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Ialltoall(MPI_IN_PLACE, -1, MPI_DATATYPE_NULL, rbuf, NUM_INTS, MPI_INT, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Ialltoallv(sbuf, scounts, sdispls, MPI_INT, rbuf, rcounts, rdispls, MPI_INT, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Ialltoallv(MPI_IN_PLACE, NULL, NULL, MPI_DATATYPE_NULL, rbuf, rcounts, rdispls, MPI_INT, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Ialltoallw(sbuf, scounts, sdispls, types, rbuf, rcounts, rdispls, types, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Ialltoallw(MPI_IN_PLACE, NULL, NULL, NULL, rbuf, rcounts, rdispls, types, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Ireduce(sbuf, rbuf, NUM_INTS, MPI_INT, MPI_SUM, 0, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Ireduce(MPI_IN_PLACE, rbuf, NUM_INTS, MPI_INT, MPI_SUM, 0, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Iallreduce(sbuf, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Iallreduce(MPI_IN_PLACE, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Ireduce_scatter(sbuf, rbuf, rcounts, MPI_INT, MPI_SUM, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Ireduce_scatter(MPI_IN_PLACE, rbuf, rcounts, MPI_INT, MPI_SUM, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Ireduce_scatter_block(sbuf, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Ireduce_scatter_block(MPI_IN_PLACE, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Iscan(sbuf, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Iscan(MPI_IN_PLACE, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     MPI_Iexscan(sbuf, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
     MPI_Wait(&req, MPI_STATUS_IGNORE);
 
+    MPI_Iexscan(MPI_IN_PLACE, rbuf, NUM_INTS, MPI_INT, MPI_SUM, comm, &req);
+    MPI_Wait(&req, MPI_STATUS_IGNORE);
+
     if (sbuf) free(sbuf);
     if (rbuf) free(rbuf);
     if (scounts) free(scounts);

http://git.mpich.org/mpich.git/commitdiff/8fb07be51ac7cbefc11e4a6f211e9a60c056c10e

commit 8fb07be51ac7cbefc11e4a6f211e9a60c056c10e
Author: Igor Ivanov <Igor.Ivanov at itseez.com>
Date:   Fri Oct 10 14:07:21 2014 +0200

    mpid: Fix REQUEST memory leak for Irsend
    
    A memory leak can occur when a netmod is in use. Comm override functions
    are responsible for creating their own requests, so they need to be
    called before the sreq is created.
    
    Signed-off-by: Igor Ivanov <Igor.Ivanov at itseez.com>
    Signed-off-by: Pavan Balaji <balaji at anl.gov>

diff --git a/src/mpid/ch3/src/mpid_irsend.c b/src/mpid/ch3/src/mpid_irsend.c
index d8a0c0a..0bae646 100644
--- a/src/mpid/ch3/src/mpid_irsend.c
+++ b/src/mpid/ch3/src/mpid_irsend.c
@@ -50,6 +50,19 @@ int MPID_Irsend(const void * buf, int count, MPI_Datatype datatype, int rank, in
 	mpi_errno = MPIDI_Isend_self(buf, count, datatype, rank, tag, comm, context_offset, MPIDI_REQUEST_TYPE_RSEND, &sreq);
 	goto fn_exit;
     }
+
+    if (rank != MPI_PROC_NULL) {
+        MPIDI_Comm_get_vc_set_active(comm, rank, &vc);
+#ifdef ENABLE_COMM_OVERRIDES
+        /* this needs to come before the sreq is created, since the override
+         * function is responsible for creating its own request */
+        if (vc->comm_ops && vc->comm_ops->irsend)
+        {
+            mpi_errno = vc->comm_ops->irsend( vc, buf, count, datatype, rank, tag, comm, context_offset, &sreq);
+            goto fn_exit;
+        }
+#endif
+    }
     
     MPIDI_Request_create_sreq(sreq, mpi_errno, goto fn_exit);
     MPIDI_Request_set_type(sreq, MPIDI_REQUEST_TYPE_RSEND);
@@ -62,16 +75,6 @@ int MPID_Irsend(const void * buf, int count, MPI_Datatype datatype, int rank, in
 	goto fn_exit;
     }
     
-    MPIDI_Comm_get_vc_set_active(comm, rank, &vc);
-
-#ifdef ENABLE_COMM_OVERRIDES
-    if (vc->comm_ops && vc->comm_ops->irsend)
-    {
-	mpi_errno = vc->comm_ops->irsend( vc, buf, count, datatype, rank, tag, comm, context_offset, &sreq);
-	goto fn_exit;
-    }
-#endif
-    
     MPIDI_Datatype_get_info(count, datatype, dt_contig, data_sz, dt_ptr, dt_true_lb);
 
     MPIDI_Pkt_init(ready_pkt, MPIDI_CH3_PKT_READY_SEND);
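
Note on the reordering above: the leak comes from creating a request and then letting the override replace the pointer with a request of its own. The toy model below is only a sketch of that ordering problem. The request type, the live_requests counter, and all function names are hypothetical stand-ins, not MPICH APIs.

```c
#include <assert.h>
#include <stdlib.h>

int live_requests = 0;              /* number of allocated, unfreed requests */

typedef struct request { int tag; } request;

request *request_create(int tag)
{
    request *r = malloc(sizeof *r);
    assert(r != NULL);
    r->tag = tag;
    live_requests++;
    return r;
}

void request_free(request *r)
{
    free(r);
    live_requests--;
}

/* Stand-in for a netmod override: it always creates its own request. */
void override_irsend(int tag, request **out)
{
    *out = request_create(tag);
}

/* Old order: the request created first becomes unreachable. */
request *irsend_leaky(int tag, int has_override)
{
    request *sreq = request_create(tag);
    if (has_override)
        override_irsend(tag, &sreq);   /* overwrites the only pointer */
    return sreq;
}

/* New order: consult the override before creating anything. */
request *irsend_fixed(int tag, int has_override)
{
    request *sreq = NULL;
    if (has_override) {
        override_irsend(tag, &sreq);
        return sreq;
    }
    return request_create(tag);
}
```

After one call to irsend_leaky with an override present, two requests are live but only one is reachable. irsend_fixed leaves exactly one.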

http://git.mpich.org/mpich.git/commitdiff/7bab4b511228c73d846e4f90df9eb6239b7c0df8

commit 7bab4b511228c73d846e4f90df9eb6239b7c0df8
Author: Igor Ivanov <Igor.Ivanov at itseez.com>
Date:   Fri Oct 10 14:33:37 2014 +0200

    mpi/coll: Fix incorrect parameter check
    
    Fixed the wrong parameter check condition for MPI_Iallgather and
    MPI_Iallgatherv: -1 is a valid value for sendcount when MPI_IN_PLACE
    is used. The MPI spec says:
    The in place option for intracommunicators is specified by passing the value
    MPI_IN_PLACE to the argument sendbuf at all processes. In such a case, sendcount and
    sendtype are ignored, and the input data of each process is assumed to be in the area where
    that process would receive its own contribution to the receive buffer.
    
    Signed-off-by: Igor Ivanov <Igor.Ivanov at itseez.com>
    Signed-off-by: Pavan Balaji <balaji at anl.gov>

diff --git a/src/mpi/coll/iallgather.c b/src/mpi/coll/iallgather.c
index 18d798c..1e0eb10 100644
--- a/src/mpi/coll/iallgather.c
+++ b/src/mpi/coll/iallgather.c
@@ -675,9 +675,10 @@ int MPI_Iallgather(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
     {
         MPID_BEGIN_ERROR_CHECKS
         {
-            if (sendbuf != MPI_IN_PLACE)
+            if (sendbuf != MPI_IN_PLACE) {
                 MPIR_ERRTEST_DATATYPE(sendtype, "sendtype", mpi_errno);
-            MPIR_ERRTEST_COUNT(sendcount, mpi_errno);
+                MPIR_ERRTEST_COUNT(sendcount, mpi_errno);
+            }
             MPIR_ERRTEST_DATATYPE(recvtype, "recvtype", mpi_errno);
             MPIR_ERRTEST_COMM(comm, mpi_errno);
 
diff --git a/src/mpi/coll/iallgatherv.c b/src/mpi/coll/iallgatherv.c
index 5d5fd15..39b88d6 100644
--- a/src/mpi/coll/iallgatherv.c
+++ b/src/mpi/coll/iallgatherv.c
@@ -779,8 +779,10 @@ int MPI_Iallgatherv(const void *sendbuf, int sendcount, MPI_Datatype sendtype, v
     {
         MPID_BEGIN_ERROR_CHECKS
         {
-            if (sendbuf != MPI_IN_PLACE)
+            if (sendbuf != MPI_IN_PLACE) {
                 MPIR_ERRTEST_DATATYPE(sendtype, "sendtype", mpi_errno);
+                MPIR_ERRTEST_COUNT(sendcount, mpi_errno);
+            }
             MPIR_ERRTEST_DATATYPE(recvtype, "recvtype", mpi_errno);
             MPIR_ERRTEST_COMM(comm, mpi_errno);
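
Note on the check above: the send-side count is validated only when the caller is not using the in-place option, because the standard says sendcount and sendtype are ignored in that case. A minimal sketch in plain C, with stand-ins for the MPI pieces: SKETCH_IN_PLACE, the SKETCH_* codes, and check_send_args are hypothetical names, and real code would compare against MPI_IN_PLACE from <mpi.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in sentinel: any unique address works for the sketch. */
static const char in_place_marker = 0;
#define SKETCH_IN_PLACE ((const void *) &in_place_marker)

enum { SKETCH_OK = 0, SKETCH_ERR_COUNT = 1 };

/* Validate send-side arguments the way the patched check does:
 * a negative count is an error only when a real send buffer is given. */
int check_send_args(const void *sendbuf, int sendcount)
{
    if (sendbuf != SKETCH_IN_PLACE) {
        if (sendcount < 0)
            return SKETCH_ERR_COUNT;
    }
    return SKETCH_OK;   /* in-place: sendcount is ignored */
}
```

Here check_send_args(SKETCH_IN_PLACE, -1) succeeds, matching the test's use of -1 with MPI_IN_PLACE, while the same count with an ordinary buffer is rejected.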
 

http://git.mpich.org/mpich.git/commitdiff/9597d51e0d2b9d5252c9e4bd200ac86c395a3e80

commit 9597d51e0d2b9d5252c9e4bd200ac86c395a3e80
Author: Igor Ivanov <Igor.Ivanov at itseez.com>
Date:   Thu Oct 9 15:26:10 2014 +0200

    netmod/mxm: Add mprobe/imrecv support
    
    Signed-off-by: Igor Ivanov <Igor.Ivanov at itseez.com>

diff --git a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_impl.h b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_impl.h
index bd80a69..3b2bb12 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_impl.h
+++ b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_impl.h
@@ -265,6 +265,11 @@ static inline void _mxm_req_wait(mxm_req_base_t * req)
     mxm_wait(&mxm_wreq);
 }
 
+static inline int _mxm_eager_threshold(void)
+{
+    return 262144;
+}
+
 /*
  * Tag management section
  */
diff --git a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
index ca31588..7bd2cad 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_init.c
@@ -70,7 +70,7 @@ static MPIDI_Comm_ops_t comm_ops = {
     MPID_nem_mxm_ssend, /* ssend */
     MPID_nem_mxm_isend, /* isend */
     MPID_nem_mxm_isend, /* irsend */
-    MPID_nem_mxm_issend,        /* issend */
+    MPID_nem_mxm_issend,/* issend */
 
     NULL,       /* send_init */
     NULL,       /* bsend_init */
@@ -142,6 +142,8 @@ int MPID_nem_mxm_init(MPIDI_PG_t * pg_p, int pg_rank, char **bc_val_p, int *val_
     if (mpi_errno)
         MPIU_ERR_POP(mpi_errno);
 
+    MPIDI_Anysource_improbe_fn = MPID_nem_mxm_anysource_improbe;
+
   fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_MXM_INIT);
     return mpi_errno;
@@ -266,8 +268,12 @@ int MPID_nem_mxm_vc_init(MPIDI_VC_t * vc)
 
     vc_area->pending_sends = 0;
 
-    vc->rndvSend_fn = NULL;
-    vc->rndvRecv_fn = NULL;
+    /* Use default rendezvous functions */
+    vc->eager_max_msg_sz = _mxm_eager_threshold();
+    vc->ready_eager_max_msg_sz = vc->eager_max_msg_sz;
+    vc->rndvSend_fn = MPID_nem_lmt_RndvSend;
+    vc->rndvRecv_fn = MPID_nem_lmt_RndvRecv;
+
     vc->sendNoncontig_fn = MPID_nem_mxm_SendNoncontig;
     vc->comm_ops = &comm_ops;
 
diff --git a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_probe.c b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_probe.c
index c5d6d8a..0d7cf24 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_probe.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/mxm/mxm_probe.c
@@ -107,11 +107,76 @@ int MPID_nem_mxm_improbe(MPIDI_VC_t * vc, int source, int tag, MPID_Comm * comm,
                          int *flag, MPID_Request ** message, MPI_Status * status)
 {
     int mpi_errno = MPI_SUCCESS;
+    mxm_error_t err;
+    mxm_recv_req_t mxm_req;
+    mxm_message_h mxm_msg;
+    MPID_nem_mxm_vc_area *vc_area = (vc ? VC_BASE(vc) : NULL);
 
     MPIDI_STATE_DECL(MPID_STATE_MXM_IMPROBE);
     MPIDI_FUNC_ENTER(MPID_STATE_MXM_IMPROBE);
 
-    MPIU_Assert(0 && "not currently implemented");
+    mxm_req.base.state = MXM_REQ_NEW;
+    mxm_req.base.mq = (mxm_mq_h) comm->dev.ch.netmod_priv;
+    mxm_req.base.conn = (vc_area ? vc_area->mxm_ep->mxm_conn : 0);
+
+    mxm_req.tag = _mxm_tag_mpi2mxm(tag, comm->context_id + context_offset);
+    mxm_req.tag_mask = _mxm_tag_mask(tag);
+
+    err = mxm_req_mprobe(&mxm_req, &mxm_msg);
+    if (MXM_OK == err) {
+        MPID_Request *req;
+
+        *flag = 1;
+
+        req = MPID_Request_create();
+        MPIU_Object_set_ref(req, 2);
+        req->kind = MPID_REQUEST_MPROBE;
+        req->comm = comm;
+        MPIR_Comm_add_ref(comm);
+        req->ch.vc = vc;
+//        MPIDI_Request_set_sync_send_flag(req, 1); /* set this flag in case MXM_REQ_OP_SEND_SYNC*/
+        MPIDI_Request_set_msg_type(req, MPIDI_REQUEST_EAGER_MSG);
+        req->dev.recv_pending_count = 1;
+
+        _mxm_to_mpi_status(mxm_req.base.error, &req->status);
+        req->status.MPI_TAG = _mxm_tag_mxm2mpi(mxm_req.completion.sender_tag);
+        req->status.MPI_SOURCE = mxm_req.completion.sender_imm;
+        req->dev.recv_data_sz = mxm_req.completion.sender_len;
+        MPIR_STATUS_SET_COUNT(req->status, req->dev.recv_data_sz);
+        req->dev.tmpbuf = MPIU_Malloc(req->dev.recv_data_sz);
+        MPIU_Assert(req->dev.tmpbuf);
+
+        mxm_req.base.completed_cb = NULL;
+        mxm_req.base.context = req;
+        mxm_req.base.data_type = MXM_REQ_DATA_BUFFER;
+        mxm_req.base.data.buffer.ptr = req->dev.tmpbuf;
+        mxm_req.base.data.buffer.length = req->dev.recv_data_sz;
+
+        err = mxm_message_recv(&mxm_req, mxm_msg);
+        _mxm_req_wait(&mxm_req.base);
+
+        MPIDI_CH3U_Request_complete(req);
+
+        *message = req;
+
+        /* TODO: Should we change status
+                _mxm_to_mpi_status(mxm_req.base.error, status);
+        */
+        status->MPI_SOURCE = req->status.MPI_SOURCE;
+        status->MPI_TAG = req->status.MPI_TAG;
+        MPIR_STATUS_SET_COUNT(*status, req->dev.recv_data_sz);
+
+        _dbg_mxm_output(8,
+                        "imProbe ========> Found USER msg (context %d from %d tag %d size %d) \n",
+                        comm->context_id + context_offset, status->MPI_SOURCE, status->MPI_TAG, MPIR_STATUS_GET_COUNT(*status));
+    }
+    else if (MXM_ERR_NO_MESSAGE == err) {
+        *flag = 0;
+        *message = NULL;
+    }
+    else {
+        mpi_errno = MPI_ERR_INTERN;
+    }
 
   fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_MXM_IMPROBE);

http://git.mpich.org/mpich.git/commitdiff/62b464bc47400f82ce5910d02de6336079d4d05a

commit 62b464bc47400f82ce5910d02de6336079d4d05a
Author: Pavan Balaji <balaji at anl.gov>
Date:   Tue Oct 7 22:11:02 2014 -0500

    MPICH-specific initialization of hcoll.
    
    In some cases, we cannot let hcoll use whatever transport it needs.
    For example, ch3:sock assumes that while blocking the next event will
    come over the socket channel.  If HCOLL decides to use a different
    transport (such as mxm) and the next event comes on that transport,
    this can result in a deadlock.  In this patch, we let the channel
    specify what transports it can accept.
    
    Signed-off-by: Devendar Bureddy <devendar at mellanox.com>

diff --git a/src/mpid/ch3/channels/sock/subconfigure.m4 b/src/mpid/ch3/channels/sock/subconfigure.m4
index 79b2250..b95f9cb 100644
--- a/src/mpid/ch3/channels/sock/subconfigure.m4
+++ b/src/mpid/ch3/channels/sock/subconfigure.m4
@@ -12,6 +12,7 @@ AC_DEFUN([PAC_SUBCFG_PREREQ_]PAC_SUBCFG_AUTO_SUFFIX,[
         build_ch3u_sock=yes
 
         MPID_MAX_THREAD_LEVEL=MPI_THREAD_MULTIPLE
+        MPID_CH3I_CH_HCOLL_BCOL="basesmuma,basesmuma,ptpcoll"
 
         # code that formerly lived in setup_args
         #
diff --git a/src/mpid/ch3/src/ch3u_comm.c b/src/mpid/ch3/src/ch3u_comm.c
index b0e10c3..46bfe64 100644
--- a/src/mpid/ch3/src/ch3u_comm.c
+++ b/src/mpid/ch3/src/ch3u_comm.c
@@ -57,6 +57,9 @@ static hook_elt *destroy_hooks_tail = NULL;
 int MPIDI_CH3I_Comm_init(void)
 {
     int mpi_errno = MPI_SUCCESS;
+#if defined HAVE_LIBHCOLL && MPID_CH3I_CH_HCOLL_BCOL
+    MPIU_CHKLMEM_DECL(1);
+#endif
     MPIDI_STATE_DECL(MPID_STATE_MPIDI_CH3U_COMM_INIT);
 
     MPIDI_FUNC_ENTER(MPID_STATE_MPIDI_CH3U_COMM_INIT);
@@ -69,6 +72,31 @@ int MPIDI_CH3I_Comm_init(void)
 
 #if defined HAVE_LIBHCOLL
     if (MPIR_CVAR_CH3_ENABLE_HCOLL) {
+        int r;
+
+        /* check if the user is not trying to override the multicast
+         * setting before resetting it */
+        if (getenv("HCOLL_ENABLE_MCAST_ALL") == NULL) {
+            /* FIXME: We should not unconditionally disable multicast.
+             * Test to make sure it's available before choosing to
+             * enable or disable it. */
+            r = MPL_putenv("HCOLL_ENABLE_MCAST_ALL=0");
+            MPIU_ERR_CHKANDJUMP(r, mpi_errno, MPI_ERR_OTHER, "**putenv");
+        }
+
+#if defined MPID_CH3I_CH_HCOLL_BCOL
+        if (getenv("HCOLL_BCOL") == NULL) {
+            char *envstr;
+            int size = strlen("HCOLL_BCOL=") + strlen(MPID_CH3I_CH_HCOLL_BCOL) + 1;
+
+            MPIU_CHKLMEM_MALLOC(envstr, char *, size, mpi_errno, "**malloc");
+            MPL_snprintf(envstr, size, "HCOLL_BCOL=%s", MPID_CH3I_CH_HCOLL_BCOL);
+
+            r = MPL_putenv(envstr);
+            MPIU_ERR_CHKANDJUMP(r, mpi_errno, MPI_ERR_OTHER, "**putenv");
+        }
+#endif
+
         mpi_errno = MPIDI_CH3U_Comm_register_create_hook(hcoll_comm_create, NULL);
         if (mpi_errno) MPIU_ERR_POP(mpi_errno);
         mpi_errno = MPIDI_CH3U_Comm_register_destroy_hook(hcoll_comm_destroy, NULL);
@@ -81,6 +109,9 @@ int MPIDI_CH3I_Comm_init(void)
     
  fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_MPIDI_CH3U_COMM_INIT);
+#if defined HAVE_LIBHCOLL && MPID_CH3I_CH_HCOLL_BCOL
+    MPIU_CHKLMEM_FREEALL();
+#endif
     return mpi_errno;
  fn_fail:
     goto fn_exit;
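
Note on the environment handling above: the pattern is "apply a default only when the user has not set the variable". The sketch below shows that pattern with POSIX setenv, which is swapped in here for MPL_putenv because it copies its arguments. set_env_default and the SKETCH_BCOL variable in the usage note are hypothetical names, not part of MPICH or hcoll.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Set NAME=VALUE only when NAME is absent from the environment.
 * Returns 0 on success (including "already set"), -1 on failure. */
int set_env_default(const char *name, const char *value)
{
    assert(name != NULL && value != NULL);
    if (getenv(name) != NULL)
        return 0;                       /* a user-provided value wins */
    return setenv(name, value, 0) == 0 ? 0 : -1;
}
```

For example, set_env_default("SKETCH_BCOL", "basesmuma,ptpcoll") installs the default on a clean environment and leaves an existing SKETCH_BCOL value untouched.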

http://git.mpich.org/mpich.git/commitdiff/aacb14557b8d2ad9965227a62e6ec0c65d0387a2

commit aacb14557b8d2ad9965227a62e6ec0c65d0387a2
Author: Pavan Balaji <balaji at anl.gov>
Date:   Sun Oct 5 20:50:22 2014 -0500

    Added a note in the README about MXM_LOG_LEVEL.
    
    Tell users that they can disable warnings if needed.
    
    Signed-off-by: Devendar Bureddy <devendar at mellanox.com>

diff --git a/README.vin b/README.vin
index 528164b..b1ad747 100644
--- a/README.vin
+++ b/README.vin
@@ -492,6 +492,16 @@ include headers are present in /path/to/mxm/include):
   --with-mxm-lib=/path/to/mxm/lib
   --with-mxm-include=/path/to/mxm/include
 
+By default, the mxm library throws warnings when the system does not
+enable certain features that might hurt performance.  These are
+important warnings that might cause performance degradation on your
+system.  But you might need root privileges to fix some of them.  If
+you would like to disable such warnings, you can set the MXM log level
+to "error" instead of the default "warn" by using:
+
+  MXM_LOG_LEVEL=error
+  export MXM_LOG_LEVEL
+
 ib network module
 `````````````````
 The IB netmod provides support for InfiniBand on x86_64 platforms

-----------------------------------------------------------------------


hooks/post-receive
-- 
MPICH primary repository

