[mpich-discuss] [Openmp-dev] Libomptarget fatal error 1: failure of target construct while offloading is mandatory

Siegmar Gross siegmar.gross at informatik.hs-fulda.de
Mon Oct 1 12:48:16 CDT 2018


Hi George,

> The problem was related to parallel reduction. Jonas Hahnfeld
> submitted a patch (thanks Jonas). Your code should now work.

Jonas, thank you very much for your patch.


> Please update your copy of libomptarget 
> and recompile it. Let us know if there are any further issues.

It seems to work now. Do you know why I get roughly the same
execution time for CPU and GPU? I would have expected a shorter
run time on the GPU.
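The LIBOMPTARGET_DEBUG log quoted further down reports "Using requested number of teams 1" and "Launch kernel with 1 blocks and 128 threads" for both target regions, so each kernel apparently ran on a single CUDA block, a small fraction of the GPU. That log was taken before the reduction patch, so it is an assumption that the patched build launches the same way. The source of dot_prod_accelerator_OpenMP.c is not shown in the thread. If its loops use `target` with `parallel for` but no `teams distribute`, a combined construct like the one below lets the runtime spread the iterations over many blocks. The function `dot_prod_offload` and its parameters are hypothetical.

```c
#include <stddef.h>

/* Hypothetical dot-product kernel (the original program is not shown).
 * "teams distribute" splits the iterations across many teams (CUDA blocks),
 * "parallel for" across the threads of each team. sum is mapped tofrom
 * explicitly so the reduced value is copied back; OpenMP 4.5 otherwise
 * treats a scalar referenced in a target region as firstprivate. */
double dot_prod_offload(const double *a, const double *b, size_t n)
{
  double sum = 0.0;
  #pragma omp target teams distribute parallel for reduction(+:sum) \
      map(to: a[0:n], b[0:n]) map(tofrom: sum)
  for (size_t i = 0; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}
```

Built without `-fopenmp`, the pragma is ignored and the loop runs serially on the host, which gives a convenient reference result to compare the offloaded sum against.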

loki introduction 143 clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda dot_prod_accelerator_OpenMP.c
loki introduction 144 /usr/bin/time -p a.out
Number of processors:     24
Number of devices:        1
Default device:           0
Is initial device:        1
sum = 6.000000e+08
real 3.93
user 0.66
sys 2.64
loki introduction 145 clang -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu dot_prod_accelerator_OpenMP.c
loki introduction 146 /usr/bin/time -p a.out
Number of processors:     24
Number of devices:        4
Default device:           0
Is initial device:        1
sum = 6.000000e+08
real 3.51
user 10.35
sys 6.04
loki introduction 147
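`/usr/bin/time` measures the whole process, including CUDA initialization and all data movement. In the debug log quoted below, each of the two 800000000-byte arrays is copied device-to-host after the first kernel and host-to-device before the second, i.e. about 3.2 GB over the bus for a dot product that does one multiply-add per element pair, so transfers rather than arithmetic may dominate the run time. A minimal sketch, assuming the arrays are already filled on the host, separates the copy time from the kernel time with a `target data` region. `timed_dot_prod` and `now_seconds` are hypothetical names.

```c
#include <stddef.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds via OpenMP; without OpenMP, falls back to CPU time. */
static double now_seconds(void)
{
#ifdef _OPENMP
  return omp_get_wtime();
#else
  return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Hypothetical helper: maps a and b to the device once, then times the
 * reduction kernel on its own. *t_copy may also include the lazy device
 * initialization triggered by the first device construct. */
double timed_dot_prod(const double *a, const double *b, size_t n,
                      double *t_copy, double *t_kernel)
{
  double sum = 0.0;
  double t0 = now_seconds();
  #pragma omp target data map(to: a[0:n], b[0:n])
  {
    double t1 = now_seconds();
    /* a and b are already present on the device: no further copies here */
    #pragma omp target teams distribute parallel for reduction(+:sum) \
        map(tofrom: sum)
    for (size_t i = 0; i < n; i++)
      sum += a[i] * b[i];
    *t_kernel = now_seconds() - t1;
    *t_copy = t1 - t0;
  }
  return sum;
}
```

Keeping data resident across several target regions with `target data`, or initializing the arrays on the device and not copying them back when the host does not need them, avoids paying the transfer cost more than once.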


Best regards, and thank you very much once more for all your help

Siegmar


> 
> George
> 
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
> *From:* Siegmar Gross <siegmar.gross at informatik.hs-fulda.de>
> *Sent:* 01 October 2018 15:59
> *To:* George Rokos; llvm-openmp-dev
> *Subject:* Re: [Openmp-dev] Libomptarget fatal error 1: failure of target construct while offloading is mandatory
> Hi George,
> 
> thank you very much for your suggestions.
> 
>> Apparently your application fails to offload to the GPU. Because offloading
>> is mandatory (that's the default behavior), the library terminates the application.
>> 
>> Can you compile libomptarget in debug mode and run the app with 
>> LIBOMPTARGET_DEBUG=1 to see the debug output? That will help us identify the 
>> problem.
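When checking whether a region really ran on the GPU, note that `omp_is_initial_device()` called from host code always returns true. If the program's "Is initial device: 1" line is printed outside a target region, as seems likely, it says nothing about offloading; calling the function inside a target region does. A minimal sketch with a hypothetical helper `ran_on_device`:

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns 1 if the target region executed on a
 * non-host device, 0 if it ran on the host (fallback or no OpenMP). */
int ran_on_device(void)
{
  int on_host = 1;
#ifdef _OPENMP
  #pragma omp target map(from: on_host)
  on_host = omp_is_initial_device();
#endif
  return !on_host;
}
```

Under the mandatory offload policy described above, a failed offload aborts the program before the helper can return; it returns 0 only when the region legitimately runs on the host.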
> 
> 
> 
> loki introduction 115 clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda dot_prod_accelerator_OpenMP.c
> loki introduction 116 a.out
> Number of processors:     24
> Number of devices:        1
> Default device:           0
> Is initial device:        1
> Libomptarget fatal error 1: failure of target construct while offloading is mandatory
> 
> 
> loki introduction 117 setenv LIBOMPTARGET_DEBUG 1
> loki introduction 118 a.out
> Libomptarget --> Loading RTLs...
> Libomptarget --> Loading library 'libomptarget.rtl.ppc64.so'...
> Libomptarget --> Unable to load library 'libomptarget.rtl.ppc64.so':
> libomptarget.rtl.ppc64.so: cannot open shared object file: No such file or
> directory!
> Libomptarget --> Loading library 'libomptarget.rtl.x86_64.so'...
> Libomptarget --> Successfully loaded library 'libomptarget.rtl.x86_64.so'!
> Libomptarget --> Registering RTL libomptarget.rtl.x86_64.so supporting 4 devices!
> Libomptarget --> Loading library 'libomptarget.rtl.cuda.so'...
> Target CUDA RTL --> Start initializing CUDA
> Libomptarget --> Successfully loaded library 'libomptarget.rtl.cuda.so'!
> Libomptarget --> Registering RTL libomptarget.rtl.cuda.so supporting 1 devices!
> Libomptarget --> Loading library 'libomptarget.rtl.aarch64.so'...
> Libomptarget --> Unable to load library 'libomptarget.rtl.aarch64.so':
> libomptarget.rtl.aarch64.so: cannot open shared object file: No such file or
> directory!
> Libomptarget --> RTLs loaded!
> Libomptarget --> Image 0x0000000000602090 is NOT compatible with RTL
> libomptarget.rtl.x86_64.so!
> Libomptarget --> Image 0x0000000000602090 is compatible with RTL
> libomptarget.rtl.cuda.so!
> Libomptarget --> RTL 0x00000000609f95d0 has index 0!
> Libomptarget --> Registering image 0x0000000000602090 with RTL
> libomptarget.rtl.cuda.so!
> Libomptarget --> Done registering entries!
> Libomptarget --> Call to omp_get_num_devices returning 1
> Libomptarget --> Default TARGET OFFLOAD policy is now mandatory (devicew were found)
> Libomptarget --> Entering target region with entry point 0x00000000004012d0 and
> device Id -1
> Libomptarget --> Checking whether device 0 is ready.
> Libomptarget --> Is the device 0 (local ID 0) initialized? 0
> Target CUDA RTL --> Getting device 0
> Target CUDA RTL --> Max CUDA blocks per grid 2147483647 exceeds the hard team
> limit 65536, capping at the hard limit
> Target CUDA RTL --> Using 1024 CUDA threads per block
> Target CUDA RTL --> Max number of CUDA blocks 65536, threads 1024 & warp size 32
> Target CUDA RTL --> Default number of teams set according to library's default 128
> Target CUDA RTL --> Default number of threads set according to library's default 128
> Libomptarget --> Device 0 is ready to use.
> Target CUDA RTL --> Load data from image 0x0000000000602090
> Target CUDA RTL --> CUDA module successfully loaded!
> Target CUDA RTL --> Entry point 0x0000000000000000 maps to
> __omp_offloading_2b_1890d30_main_l48 (0x0000000060f23320)
> Target CUDA RTL --> Entry point 0x0000000000000001 maps to
> __omp_offloading_2b_1890d30_main_l67 (0x0000000060f27c70)
> Target CUDA RTL --> Sending global device environment data 4 bytes
> Libomptarget --> Entry  0: Base=0x0000000000613bf0, Begin=0x0000000000613bf0,
> Size=800000000, Type=0x22
> Libomptarget --> Entry  1: Base=0x00000000301043f0, Begin=0x00000000301043f0,
> Size=800000000, Type=0x22
> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0,
> Size=800000000)...
> Libomptarget --> Creating new map entry: HstBase=0x0000000000613bf0,
> HstBegin=0x0000000000613bf0, HstEnd=0x00000000301043f0, TgtBegin=0x0000000b08c20000
> Libomptarget --> There are 800000000 bytes allocated at target address
> 0x0000000b08c20000 - is new
> Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0,
> Size=800000000)...
> Libomptarget --> Creating new map entry: HstBase=0x00000000301043f0,
> HstBegin=0x00000000301043f0, HstEnd=0x000000005fbf4bf0, TgtBegin=0x0000000b38720000
> Libomptarget --> There are 800000000 bytes allocated at target address
> 0x0000000b38720000 - is new
> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0,
> Size=800000000)...
> Libomptarget --> Mapping exists with HstPtrBegin=0x0000000000613bf0,
> TgtPtrBegin=0x0000000b08c20000, Size=800000000, RefCount=1
> Libomptarget --> Obtained target argument 0x0000000b08c20000 from host pointer
> 0x0000000000613bf0
> Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0,
> Size=800000000)...
> Libomptarget --> Mapping exists with HstPtrBegin=0x00000000301043f0,
> TgtPtrBegin=0x0000000b38720000, Size=800000000, RefCount=1
> Libomptarget --> Obtained target argument 0x0000000b38720000 from host pointer
> 0x00000000301043f0
> Libomptarget --> Launching target execution __omp_offloading_2b_1890d30_main_l48
> with pointer 0x0000000060ee2ee0 (index=0).
> Target CUDA RTL --> Setting CUDA threads per block to default 128
> Target CUDA RTL --> Using requested number of teams 1
> Target CUDA RTL --> Launch kernel with 1 blocks and 128 threads
> Target CUDA RTL --> Launch of entry point at 0x0000000060ee2ee0 successful!
> Target CUDA RTL --> Kernel execution at 0x0000000060ee2ee0 successful!
> Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0,
> Size=800000000)...
> Libomptarget --> Mapping exists with HstPtrBegin=0x00000000301043f0,
> TgtPtrBegin=0x0000000b38720000, Size=800000000, updated RefCount=1
> Libomptarget --> There are 800000000 bytes allocated at target address
> 0x0000000b38720000 - is last
> Libomptarget --> Moving 800000000 bytes (tgt:0x0000000b38720000) ->
> (hst:0x00000000301043f0)
> Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0,
> Size=800000000)...
> Libomptarget --> Deleting tgt data 0x0000000b38720000 of size 800000000
> Libomptarget --> Removing mapping with HstPtrBegin=0x00000000301043f0,
> TgtPtrBegin=0x0000000b38720000, Size=800000000
> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0,
> Size=800000000)...
> Libomptarget --> Mapping exists with HstPtrBegin=0x0000000000613bf0,
> TgtPtrBegin=0x0000000b08c20000, Size=800000000, updated RefCount=1
> Libomptarget --> There are 800000000 bytes allocated at target address
> 0x0000000b08c20000 - is last
> Libomptarget --> Moving 800000000 bytes (tgt:0x0000000b08c20000) ->
> (hst:0x0000000000613bf0)
> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0,
> Size=800000000)...
> Libomptarget --> Deleting tgt data 0x0000000b08c20000 of size 800000000
> Libomptarget --> Removing mapping with HstPtrBegin=0x0000000000613bf0,
> TgtPtrBegin=0x0000000b08c20000, Size=800000000
> Libomptarget --> Call to omp_get_num_devices returning 1
> Number of processors:     24
> Number of devices:        1
> Default device:           0
> Is initial device:        1
> Libomptarget --> Entering target region with entry point 0x00000000004012d1 and
> device Id -1
> Libomptarget --> Checking whether device 0 is ready.
> Libomptarget --> Is the device 0 (local ID 0) initialized? 1
> Libomptarget --> Device 0 is ready to use.
> Libomptarget --> Entry  0: Base=0x0000000000613bf0, Begin=0x0000000000613bf0,
> Size=800000000, Type=0x21
> Libomptarget --> Entry  1: Base=0x00000000301043f0, Begin=0x00000000301043f0,
> Size=800000000, Type=0x21
> Libomptarget --> Entry  2: Base=0x00007fff707a86e8, Begin=0x00007fff707a86e8,
> Size=8, Type=0x23
> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0,
> Size=800000000)...
> Libomptarget --> Creating new map entry: HstBase=0x0000000000613bf0,
> HstBegin=0x0000000000613bf0, HstEnd=0x00000000301043f0, TgtBegin=0x0000000b08c20000
> Libomptarget --> There are 800000000 bytes allocated at target address
> 0x0000000b08c20000 - is new
> Libomptarget --> Moving 800000000 bytes (hst:0x0000000000613bf0) ->
> (tgt:0x0000000b08c20000)
> Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0,
> Size=800000000)...
> Libomptarget --> Creating new map entry: HstBase=0x00000000301043f0,
> HstBegin=0x00000000301043f0, HstEnd=0x000000005fbf4bf0, TgtBegin=0x0000000b38720000
> Libomptarget --> There are 800000000 bytes allocated at target address
> 0x0000000b38720000 - is new
> Libomptarget --> Moving 800000000 bytes (hst:0x00000000301043f0) ->
> (tgt:0x0000000b38720000)
> Libomptarget --> Looking up mapping(HstPtrBegin=0x00007fff707a86e8, Size=8)...
> Libomptarget --> Creating new map entry: HstBase=0x00007fff707a86e8,
> HstBegin=0x00007fff707a86e8, HstEnd=0x00007fff707a86f0, TgtBegin=0x0000000b68220000
> Libomptarget --> There are 8 bytes allocated at target address
> 0x0000000b68220000 - is new
> Libomptarget --> Moving 8 bytes (hst:0x00007fff707a86e8) -> (tgt:0x0000000b68220000)
> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000613bf0,
> Size=800000000)...
> Libomptarget --> Mapping exists with HstPtrBegin=0x0000000000613bf0,
> TgtPtrBegin=0x0000000b08c20000, Size=800000000, RefCount=1
> Libomptarget --> Obtained target argument 0x0000000b08c20000 from host pointer
> 0x0000000000613bf0
> Libomptarget --> Looking up mapping(HstPtrBegin=0x00000000301043f0,
> Size=800000000)...
> Libomptarget --> Mapping exists with HstPtrBegin=0x00000000301043f0,
> TgtPtrBegin=0x0000000b38720000, Size=800000000, RefCount=1
> Libomptarget --> Obtained target argument 0x0000000b38720000 from host pointer
> 0x00000000301043f0
> Libomptarget --> Looking up mapping(HstPtrBegin=0x00007fff707a86e8, Size=8)...
> Libomptarget --> Mapping exists with HstPtrBegin=0x00007fff707a86e8,
> TgtPtrBegin=0x0000000b68220000, Size=8, RefCount=1
> Libomptarget --> Obtained target argument 0x0000000b68220000 from host pointer
> 0x00007fff707a86e8
> Libomptarget --> Launching target execution __omp_offloading_2b_1890d30_main_l67
> with pointer 0x0000000060ee2e70 (index=1).
> Target CUDA RTL --> Setting CUDA threads per block to default 128
> Target CUDA RTL --> Using requested number of teams 1
> Target CUDA RTL --> Launch kernel with 1 blocks and 128 threads
> Target CUDA RTL --> Launch of entry point at 0x0000000060ee2e70 successful!
> Target CUDA RTL --> Kernel execution error at 0x0000000060ee2e70!
> Target CUDA RTL --> CUDA error is: an illegal memory access was encountered
> Libomptarget --> Executing target region abort target.
> Libomptarget fatal error 1: failure of target construct while offloading is
> mandatory
> Libomptarget --> Unloading target library!
> Libomptarget --> Image 0x0000000000602090 is compatible with RTL 0x00000000609f95d0!
> Libomptarget --> Unregistered image 0x0000000000602090 from RTL 0x00000000609f95d0!
> Libomptarget --> Done unregistering images!
> Libomptarget --> Removing translation table for descriptor 0x0000000000613b90
> Libomptarget --> Done unregistering library!
> Target CUDA RTL --> Error when unloading CUDA module
> Target CUDA RTL --> CUDA error is: an illegal memory access was encountered
> loki introduction 119
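The `Type=` values in the log are bit masks of map-type flags. Using the flag values from libomptarget's `omptarget.h` (as of the 2018 sources; check the header of your own build), 0x22 decodes to FROM|TARGET_PARAM for both arrays of the first kernel (`..._main_l48`), 0x21 to TO|TARGET_PARAM for the arrays of the second kernel (`..._main_l67`), and 0x23 to TO|FROM|TARGET_PARAM for its 8-byte scalar. The small decoder below is a hypothetical sketch; `decode_maptype` and the `MAPTYPE_*` names are illustrative, not part of any library API.

```c
#include <string.h>

/* Flag values as defined in libomptarget's omptarget.h (2018);
 * verify against the header shipped with your own LLVM build. */
enum {
  MAPTYPE_TO           = 0x001,
  MAPTYPE_FROM         = 0x002,
  MAPTYPE_ALWAYS       = 0x004,
  MAPTYPE_DELETE       = 0x008,
  MAPTYPE_PTR_AND_OBJ  = 0x010,
  MAPTYPE_TARGET_PARAM = 0x020,
  MAPTYPE_RETURN_PARAM = 0x040,
  MAPTYPE_PRIVATE      = 0x080,
  MAPTYPE_LITERAL      = 0x100,
  MAPTYPE_IMPLICIT     = 0x200
};

/* Writes the names of the flags set in `type` into buf, separated by '|'.
 * len must be at least 1; longer output is truncated to fit. */
void decode_maptype(unsigned long type, char *buf, size_t len)
{
  static const struct { unsigned long bit; const char *name; } flags[] = {
    { MAPTYPE_TO, "TO" }, { MAPTYPE_FROM, "FROM" },
    { MAPTYPE_ALWAYS, "ALWAYS" }, { MAPTYPE_DELETE, "DELETE" },
    { MAPTYPE_PTR_AND_OBJ, "PTR_AND_OBJ" },
    { MAPTYPE_TARGET_PARAM, "TARGET_PARAM" },
    { MAPTYPE_RETURN_PARAM, "RETURN_PARAM" },
    { MAPTYPE_PRIVATE, "PRIVATE" }, { MAPTYPE_LITERAL, "LITERAL" },
    { MAPTYPE_IMPLICIT, "IMPLICIT" }
  };
  size_t i;
  buf[0] = '\0';
  for (i = 0; i < sizeof flags / sizeof flags[0]; i++) {
    if (type & flags[i].bit) {
      if (buf[0] != '\0')
        strncat(buf, "|", len - strlen(buf) - 1);
      strncat(buf, flags[i].name, len - strlen(buf) - 1);
    }
  }
}
```

Read this way, the log shows the first kernel initializing both arrays on the device and copying them back, and the second kernel copying them to the device again, which matches the four 800000000-byte transfers it records.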
> 
> 
> Thank you very much for your help in advance.
> 
> 
> Best regards
> 
> Siegmar
> 
> 
> 
> 
>> 
>> George
>> 
>> --------------------------------------------------------------------------------
>> *From:* Openmp-dev <openmp-dev-bounces at lists.llvm.org> on behalf of Siegmar 
>> Gross via Openmp-dev <openmp-dev at lists.llvm.org>
>> *Sent:* 01 October 2018 13:26
>> *To:* llvm-openmp-dev
>> *Subject:* [Openmp-dev] Libomptarget fatal error 1: failure of target construct
>> while offloading is mandatory
>> Hi,
>> 
>> Today I installed llvm-trunk. Unfortunately, I get an error for one of my
>> programs.
>> 
>> 
>> loki introduction 110 clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda dot_prod_accelerator_OpenMP.c
>> loki introduction 111 a.out
>> Number of processors:     24
>> Number of devices:        1
>> Default device:           0
>> Is initial device:        1
>> Libomptarget fatal error 1: failure of target construct while offloading is mandatory
>> 
>> loki introduction 112 setenv OMP_DEFAULT_DEVICE 1
>> loki introduction 113 a.out
>> Libomptarget fatal error 1: failure of target construct while offloading is mandatory
>> 
>> loki introduction 114 clang -v
>> clang version 8.0.0 (trunk 343447)
>> Target: x86_64-unknown-linux-gnu
>> Thread model: posix
>> InstalledDir: /usr/local/llvm-trunk/bin
>> Found candidate GCC installation: /usr/lib64/gcc/x86_64-suse-linux/4.8
>> Selected GCC installation: /usr/lib64/gcc/x86_64-suse-linux/4.8
>> Candidate multilib: .;@m64
>> Candidate multilib: 32;@m32
>> Selected multilib: .;@m64
>> Found CUDA installation: /usr/local/cuda-9.0, version 9.0
>> loki introduction 115
>> 
>> 
>> 
>> The program works fine with llvm-7.0.0.
>> 
>> loki introduction 125 clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda dot_prod_accelerator_OpenMP.c
>> loki introduction 126 a.out
>> Number of processors:     24
>> Number of devices:        1
>> Default device:           0
>> Is initial device:        1
>> sum = 6.000000e+08
>> 
>> loki introduction 127 setenv OMP_DEFAULT_DEVICE 1
>> loki introduction 128 a.out
>> Number of processors:     24
>> Number of devices:        1
>> Default device:           1
>> Is initial device:        1
>> sum = 6.000000e+08
>> 
>> loki introduction 129 clang -v
>> clang version 7.0.0 (tags/RELEASE_700/final)
>> Target: x86_64-unknown-linux-gnu
>> Thread model: posix
>> InstalledDir: /usr/local/llvm-7.0.0/bin
>> Found candidate GCC installation: /usr/lib64/gcc/x86_64-suse-linux/4.8
>> Selected GCC installation: /usr/lib64/gcc/x86_64-suse-linux/4.8
>> Candidate multilib: .;@m64
>> Candidate multilib: 32;@m32
>> Selected multilib: .;@m64
>> Found CUDA installation: /usr/local/cuda-9.0, version 9.0
>> loki introduction 130
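With OMP_DEFAULT_DEVICE set to 1, llvm-7.0.0 reports "Default device: 1" although only one device exists. Non-host device numbers run from 0 to omp_get_num_devices() - 1, so 1 names no GPU on this machine. The thread does not establish whether the trunk abort after `setenv OMP_DEFAULT_DEVICE 1` came from this or from the reduction bug. A defensive sketch (the function `dot_prod_default_device` is hypothetical) offloads only when the default device number is valid and otherwise uses the `if(target: ...)` clause to keep the loop on the host:

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: offload to the default device only if its number
 * refers to an existing non-host device; otherwise the if clause makes
 * the target region execute on the host. */
double dot_prod_default_device(const double *a, const double *b, size_t n)
{
  int dev = 0, use_device = 0;
#ifdef _OPENMP
  dev = omp_get_default_device();
  use_device = dev >= 0 && dev < omp_get_num_devices();
#endif
  double sum = 0.0;
  #pragma omp target teams distribute parallel for reduction(+:sum) \
      map(to: a[0:n], b[0:n]) map(tofrom: sum) \
      if(target: use_device) device(dev)
  for (size_t i = 0; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}
```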
>> 
>> 
>> Hopefully somebody can fix the problem. Do you need anything else to locate the
>> error? Thank you very much for any help in advance.
>> 
>> 
>> Kind regards
>> 
>> Siegmar
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

