[mpich-discuss] Error when using Einstein Toolkit

Jeff Squyres (jsquyres) jsquyres at cisco.com
Tue Apr 10 12:57:27 CDT 2018


Swarnim --

The substance of my answer to you on the Open MPI list (https://www.mail-archive.com/users@lists.open-mpi.org/msg32131.html) is the same over here on the MPICH list:

> That being said, the error that you display is usually indicative of an error 
> in your program: i.e., you passed a bad communicator argument to 
> MPI_Comm_rank().  Double check your source code and make sure that the 
> communicator parameter value that you're passing to MPI_Comm_rank() is 
> initialized / valid / etc.

If there's an actual MPICH error, the good folks over here on the MPICH list can help you with the MPICH specifics (which is why I referred you to this list).  Looking closer at that error output, though, I'm not 100% sure whether you are running Open MPI or MPICH...!  Are you somehow mixing the use of both MPICH and Open MPI, perchance?
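
A rough way to check the mixing question is to look at which implementation each message in the job output appears to come from. The sketch below is only an illustration: the helper name implementations_seen is made up, the marker strings are copied from the output quoted further down in this message, and attributing them to MPICH and Open MPI respectively is an assumption based on how each project's messages usually look, not an official list from either project.

```python
# Hypothetical helper: guess which MPI implementation formatted each line
# of a job's output. The marker substrings are assumptions taken from the
# output quoted in this thread, not an official list from either project.
MARKERS = {
    "MPICH": (
        "Fatal error in PMPI_",
        "write_line error",
    ),
    "Open MPI": (
        "mpirun detected that one or more processes exited",
        "Per user-direction, the job has been aborted",
    ),
}


def implementations_seen(output):
    """Return the set of implementation names whose markers appear in output."""
    seen = set()
    for line in output.splitlines():
        for impl, markers in MARKERS.items():
            if any(marker in line for marker in markers):
                seen.add(impl)
    return seen


# Two lines copied from the quoted job output.
sample = (
    "Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:\n"
    "mpirun detected that one or more processes exited with non-zero status, thus causing\n"
)

found = implementations_seen(sample)
print(sorted(found))  # prints ['MPICH', 'Open MPI']
if len(found) > 1:
    print("Output mixes message styles from more than one MPI implementation.")
```

If both names show up, one plausible explanation is that the application was built against one implementation and launched with the other's mpirun. The sketch cannot confirm that, so it is still worth checking which compiler wrappers and which mpirun are first in your PATH.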

Either way, at first glance this looks like an error in your application.



> On Apr 10, 2018, at 1:47 PM, Swarnim Shashank <swarnim.shashank at cbs.ac.in> wrote:
> 
> Hello,
> 
> I am an undergraduate student. I have never worked with MPI and I have to use the Einstein Toolkit code which uses MPI for my project. I am running it on my personal laptop, it has two cores and four threads.
> 
> I get this error in my simulation:
> 
> Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
> PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x57ed58e0, rank=0x7ffe6cb18148) failed
> PMPI_Comm_rank(68).: Invalid communicator
> [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=609541
> :
> system msg for write_line failure : Bad file descriptor
> -------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun detected that one or more processes exited with non-zero status, thus causing
> the job to be terminated. The first process to do so was:
> 
>   Process name: [[32146,1],0]
>   Exit code:    5
> --------------------------------------------------------------------------
> 
> 
> Please let me know what the problem is and how to solve it.
> 
> Thank You
> Regards
> Swarnim Shashank
> Fourth Year Integrated MSc. Student
> UM-DAE-Centre for Excellence in Basic Sciences
> Mumbai
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss


-- 
Jeff Squyres
jsquyres at cisco.com
