[mpich-discuss] BLCR kernel module not present

Raghunath rajachan at cse.ohio-state.edu
Fri Apr 26 15:56:49 CDT 2013


Michael,

The BLCR support in MVAPICH works fine as well.  MVAPICH implements
its own Checkpoint-Restart mechanism for the CH3-IB and Nemesis-IB
channels. The MPICH design for the Nemesis-TCP channel is left
untouched, as Pavan indicated.

--
Raghu


On Fri, Apr 26, 2013 at 4:25 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>
> I can't speak for mvapich, of course.  I'm only speaking about mpich.
> However, most of our derivatives don't destroy the features that are in
> stock mpich.  So I'd think it'll work fine with mvapich as well.
>
>  -- Pavan
>
> On 04/26/2013 03:20 PM US Central Time, michael wrote:
>> Thanks Pavan
>>
>> Just to be clear, you're saying if I use mvapich with blcr then a
>> running multi-node MPI job when killed (eg out of time) by a batch
>> scheduler can be restarted (from which checkpoint?) presuming it doesn't
>> have open files?
>>
>> Many thanks, M
>>
>> On Fri, 2013-04-26 at 15:12 -0500, Pavan Balaji wrote:
>>> Michael,
>>>
>>> BLCR support for mpich should work fine; if something is broken, please
>>> let us know.
>>>
>>> However, the core BLCR group itself hadn't released updates in a while,
>>> primarily because they didn't have direct funding for it.  I believe
>>> that's fixed now and they are working on newer releases.
>>>
>>>  -- Pavan
>>>
>>> On 04/26/2013 03:07 PM US Central Time, michael wrote:
>>> > Hi folks
>>> > I was wondering what the state of BLCR for mpich/mvapich is, e.g. how
>>> > reliable can one presume it to be?
>>> > Thanks, Michael
>>> >
>>> >
>>> > On Fri, 2013-04-26 at 14:33 -0500, Wesley Bland wrote:
>>> >> It looks like you might have missed installing the kernel module for
>>> >> BLCR. What is the output of `lsmod`?
>>> >>
>>> >>
>>> >> Alternatively, if you installed BLCR by using apt-get in Ubuntu, you
>>> >> should be able to use dkms to manage your kernel modules
>>> >> automatically. Make sure you have the package 'blcr-dkms' installed
>>> >> (you should be able to check this by typing `dkms status`).
>>> >>
>>> >>
>>> >> Do either of those solutions solve your issue?
>>> >>
>>> >>
>>> >> Wesley
>>> >>
>>> >> On Apr 26, 2013, at 1:55 PM, basma a.azeem
>>> >> <basmaabdelazeem at hotmail.com> wrote:
>>> >>
>>> >>>
>>> >>>
>>> >>> Thank you for your help
>>> >>>
>>> >>>
>>> >>> I installed BLCR 0.8.5 on my Ubuntu 12.10 system for use with MPICH 3.0.3.
>>> >>> This version of BLCR should support kernels through 3.7.1.
>>> >>>
>>> >>> When I run the command:
>>> >>> basma at basma-Satellite-A500:~$ mpiexec --info
>>> >>>
>>> >>> results:
>>> >>>
>>> >>> HYDRA build details:
>>> >>>     Version:                                 3.0.3
>>> >>>     Release Date:                            Thu Mar 28 16:01:21 CDT 2013
>>> >>>     CC:                              gcc
>>> >>>     CXX:                             c++
>>> >>>     F77:                             no
>>> >>>     F90:                             no
>>> >>>     Configure options:
>>> >>> '--disable-option-checking' '--prefix=/home/basma/mpich2-install'
>>> >>> '--disable-f77' '--disable-fc' '--enable-checkpointing'
>>> >>> '--with-hydra-ckpointlib=blcr' '--cache-file=/dev/null' '--srcdir=.'
>>> >>> 'CC=gcc' 'CFLAGS= -O2' 'LDFLAGS= ' 'LIBS=-lrt -lcr -lpthread '
>>> >>> 'CPPFLAGS= -I/home/basma/libraries/mpich-3.0.3/src/mpl/include
>>> >>> -I/home/basma/libraries/mpich-3.0.3/src/mpl/include
>>> >>> -I/home/basma/libraries/mpich-3.0.3/src/openpa/src
>>> >>> -I/home/basma/libraries/mpich-3.0.3/src/openpa/src
>>> >>> -I/home/basma/libraries/mpich-3.0.3/src/mpi/romio/include'
>>> >>>     Process Manager:                         pmi
>>> >>>     Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
>>> >>>     Topology libraries available:            hwloc
>>> >>>     Resource management kernels available:   user slurm ll lsf sge pbs cobalt
>>> >>>     Checkpointing libraries available:       blcr
>>> >>>     Demux engines available:                 poll select
>>> >>>
>>> >>> So I thought that everything was OK, but when I tried to run mpiexec it
>>> >>> failed:
>>> >>>
>>> >>> basma at basma-Satellite-A500:~$ mpiexec -ckpointlib blcr
>>> >>> -ckpoint-prefix /home/business/ckpts/app.ckpoint -ckpoint-interval
>>> >>> 3600  -n 4 /home/basma/libraries/mpich-3.0.3/examples/cpi
>>> >>>
>>> >>> results:
>>> >>>
>>> >>> Fatal error in MPI_Init: Other MPI error, error stack:
>>> >>> MPIR_Init_thread(433)...:
>>> >>> MPID_Init(151)..........: channel initialization failed
>>> >>> MPIDI_CH3_Init(70)......:
>>> >>> MPID_nem_init(379)......:
>>> >>> MPIDI_nem_ckpt_init(153): BLCR kernel module not present
>>> >>> Fatal error in MPI_Init: Other MPI error, error stack:
>>> >>> MPIR_Init_thread(433)...:
>>> >>> MPID_Init(151)..........: channel initialization failed
>>> >>> MPIDI_CH3_Init(70)......:
>>> >>> MPID_nem_init(379)......:
>>> >>> MPIDI_nem_ckpt_init(153): BLCR kernel module not present
>>> >>> Fatal error in MPI_Init: Other MPI error, error stack:
>>> >>> MPIR_Init_thread(433)...:
>>> >>> MPID_Init(151)..........: channel initialization failed
>>> >>> MPIDI_CH3_Init(70)......:
>>> >>> MPID_nem_init(379)......:
>>> >>> MPIDI_nem_ckpt_init(153): BLCR kernel module not present
>>> >>> Fatal error in MPI_Init: Other MPI error, error stack:
>>> >>> MPIR_Init_thread(433)...:
>>> >>> MPID_Init(151)..........: channel initialization failed
>>> >>> MPIDI_CH3_Init(70)......:
>>> >>> MPID_nem_init(379)......:
>>> >>> MPIDI_nem_ckpt_init(153): BLCR kernel module not present
>>> >>>
>>> >>> ===================================================================================
>>> >>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> >>> =   EXIT CODE: 1
>>> >>> =   CLEANING UP REMAINING PROCESSES
>>> >>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> >>> ===================================================================================
>>> >>>
>>> >>>
>>> >>>
>>> >>> I am a beginner with Linux and parallel programming.
>>> >>>
>>> >>> Thank you
>>> >>>
>>> >>>
>>> >>>
>>> >>> _______________________________________________
>>> >>> discuss mailing list     discuss at mpich.org
>>> >>> To manage subscription options or unsubscribe:
>>> >>> https://lists.mpich.org/mailman/listinfo/discuss
>>> >>
>>> >
>>>
>>
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
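
The `lsmod` check that Wesley suggests in the quoted thread can be sketched as a small script. This is only an illustration, not something posted by anyone in the thread: the helper names `has_module` and `check_blcr` are hypothetical, and the module names `blcr` and `blcr_imports` are an assumption based on what a typical BLCR install loads.

```shell
#!/bin/sh
# Succeeds if the module named $1 appears in lsmod-style text on stdin.
# The match is exact on the first column, so "blcr_imports" does not
# count as "blcr".
has_module() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Report on the two modules BLCR is assumed to load (names are an
# assumption; adjust them to match your install).
check_blcr() {
    for mod in blcr_imports blcr; do
        if lsmod 2>/dev/null | has_module "$mod"; then
            echo "$mod: loaded"
        else
            echo "$mod: not loaded"
        fi
    done
}
```

Calling `check_blcr` prints one line per module. If a module is reported as not loaded, trying `sudo modprobe blcr`, or running `dkms status` on a DKMS-managed install, is probably the next step before re-running mpiexec.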


