GPU computing on the bwGRiD
First tests on the bwGRiD with NVIDIA Tesla
The queue is called gpu. The wall time is currently 48h but is open for negotiation. 4 GPU nodes with 2 GPU cards each are available. For more information see below.
Access to /scratch is very slow at the moment, but we hope to make it somewhat faster. Infiniband is not possible on these nodes, so NFS at 1GB is the maximum access speed.
gpu3:~$ /cluster/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery
/cluster/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

There are 2 devices supporting CUDA

Device 0: "Tesla T10 Processor"
  CUDA Driver Version: 3.20
  CUDA Runtime Version: 3.20
  CUDA Capability Major/Minor version number: 1.3
  Total amount of global memory: 4294770688 bytes
  Multiprocessors x Cores/MP = Cores: 30 (MP) x 8 (Cores/MP) = 240 (Cores)
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 16384
  Warp size: 32
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 256 bytes
  Clock rate: 1.30 GHz
  Concurrent copy and execution: Yes
  Run time limit on kernels: No
  Integrated: No
  Support host page-locked memory mapping: Yes
  Compute mode: Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution: No
  Device has ECC support enabled: No
  Device is using TCC driver mode: No

Device 1: "Tesla T10 Processor"
  CUDA Driver Version: 3.20
  CUDA Runtime Version: 3.20
  CUDA Capability Major/Minor version number: 1.3
  Total amount of global memory: 4294770688 bytes
  Multiprocessors x Cores/MP = Cores: 30 (MP) x 8 (Cores/MP) = 240 (Cores)
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 16384
  Warp size: 32
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 256 bytes
  Clock rate: 1.30 GHz
  Concurrent copy and execution: Yes
  Run time limit on kernels: No
  Integrated: No
  Support host page-locked memory mapping: Yes
  Compute mode: Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution: No
  Device has ECC support enabled: No
  Device is using TCC driver mode: No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.20, CUDA Runtime Version = 3.20, NumDevs = 2, Device = Tesla T10 Processor, Device = Tesla T10 Processor

PASSED

Press <Enter> to Quit...
-----------------------------------------------------------
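The summary line near the end of the deviceQuery output above is easy to check from a script, for example to confirm that a node really reports both cards before a job starts. The following is a minimal Python sketch under the assumption that the summary line has been captured as a string; the helper name `parse_summary` is hypothetical and not part of the NVIDIA SDK.

```python
def parse_summary(line):
    """Split a deviceQuery-style summary line into (tool name, fields).

    Keys that occur more than once (such as "Device") collect all
    of their values in a list, in the order they appear.
    """
    tool, _, rest = line.partition(",")
    fields = {}
    for item in rest.split(","):
        key, sep, value = item.partition("=")
        if not sep:
            continue  # skip anything that is not "key = value"
        fields.setdefault(key.strip(), []).append(value.strip())
    return tool.strip(), fields


# Summary line copied from the deviceQuery output on gpu3.
summary = ("deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.20, "
           "CUDA Runtime Version = 3.20, NumDevs = 2, "
           "Device = Tesla T10 Processor, Device = Tesla T10 Processor")

tool, fields = parse_summary(summary)
num_devs = int(fields["NumDevs"][0])
devices = fields["Device"]

# The reported device count should match the number of listed devices.
assert num_devs == len(devices)
print(tool, num_devs, devices)
```

The same helper should also work on the oclDeviceQuery summary line, since it uses the same comma-separated "key = value" layout.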
gpu3:~$ /cluster/NVIDIA_GPU_Computing_SDK/OpenCL/bin/linux/release/oclDeviceQuery
oclDeviceQuery.exe Starting...

OpenCL SW Info:

 CL_PLATFORM_NAME: NVIDIA CUDA
 CL_PLATFORM_VERSION: OpenCL 1.0 CUDA 3.2.1
 OpenCL SDK Revision: 7027912

OpenCL Device Info:

 2 devices found supporting OpenCL:

 ---------------------------------
 Device Tesla T10 Processor
 ---------------------------------
  CL_DEVICE_NAME: Tesla T10 Processor
  CL_DEVICE_VENDOR: NVIDIA Corporation
  CL_DRIVER_VERSION: 260.19.29
  CL_DEVICE_VERSION: OpenCL 1.0 CUDA
  CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
  CL_DEVICE_MAX_COMPUTE_UNITS: 30
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
  CL_DEVICE_MAX_WORK_ITEM_SIZES: 512 / 512 / 64
  CL_DEVICE_MAX_WORK_GROUP_SIZE: 512
  CL_DEVICE_MAX_CLOCK_FREQUENCY: 1296 MHz
  CL_DEVICE_ADDRESS_BITS: 32
  CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1023 MByte
  CL_DEVICE_GLOBAL_MEM_SIZE: 4095 MByte
  CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
  CL_DEVICE_LOCAL_MEM_TYPE: local
  CL_DEVICE_LOCAL_MEM_SIZE: 16 KByte
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
  CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
  CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
  CL_DEVICE_IMAGE_SUPPORT: 1
  CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
  CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
  CL_DEVICE_SINGLE_FP_CONFIG: INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma
  CL_DEVICE_IMAGE <dim> 2D_MAX_WIDTH 4096
                        2D_MAX_HEIGHT 32768
                        3D_MAX_WIDTH 2048
                        3D_MAX_HEIGHT 2048
                        3D_MAX_DEPTH 2048
  CL_DEVICE_EXTENSIONS: cl_khr_byte_addressable_store
                        cl_khr_icd
                        cl_khr_gl_sharing
                        cl_nv_compiler_options
                        cl_nv_device_attribute_query
                        cl_nv_pragma_unroll
                        cl_khr_global_int32_base_atomics
                        cl_khr_global_int32_extended_atomics
                        cl_khr_local_int32_base_atomics
                        cl_khr_local_int32_extended_atomics
                        cl_khr_fp64
  CL_DEVICE_COMPUTE_CAPABILITY_NV: 1.3
  NUMBER OF MULTIPROCESSORS: 30
  NUMBER OF CUDA CORES: 240
  CL_DEVICE_REGISTERS_PER_BLOCK_NV: 16384
  CL_DEVICE_WARP_SIZE_NV: 32
  CL_DEVICE_GPU_OVERLAP_NV: CL_TRUE
  CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV: CL_FALSE
  CL_DEVICE_INTEGRATED_MEMORY_NV: CL_FALSE
  CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 1

 ---------------------------------
 Device Tesla T10 Processor
 ---------------------------------
  CL_DEVICE_NAME: Tesla T10 Processor
  CL_DEVICE_VENDOR: NVIDIA Corporation
  CL_DRIVER_VERSION: 260.19.29
  CL_DEVICE_VERSION: OpenCL 1.0 CUDA
  CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
  CL_DEVICE_MAX_COMPUTE_UNITS: 30
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
  CL_DEVICE_MAX_WORK_ITEM_SIZES: 512 / 512 / 64
  CL_DEVICE_MAX_WORK_GROUP_SIZE: 512
  CL_DEVICE_MAX_CLOCK_FREQUENCY: 1296 MHz
  CL_DEVICE_ADDRESS_BITS: 32
  CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1023 MByte
  CL_DEVICE_GLOBAL_MEM_SIZE: 4095 MByte
  CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
  CL_DEVICE_LOCAL_MEM_TYPE: local
  CL_DEVICE_LOCAL_MEM_SIZE: 16 KByte
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
  CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
  CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
  CL_DEVICE_IMAGE_SUPPORT: 1
  CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
  CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
  CL_DEVICE_SINGLE_FP_CONFIG: INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma
  CL_DEVICE_IMAGE <dim> 2D_MAX_WIDTH 4096
                        2D_MAX_HEIGHT 32768
                        3D_MAX_WIDTH 2048
                        3D_MAX_HEIGHT 2048
                        3D_MAX_DEPTH 2048
  CL_DEVICE_EXTENSIONS: cl_khr_byte_addressable_store
                        cl_khr_icd
                        cl_khr_gl_sharing
                        cl_nv_compiler_options
                        cl_nv_device_attribute_query
                        cl_nv_pragma_unroll
                        cl_khr_global_int32_base_atomics
                        cl_khr_global_int32_extended_atomics
                        cl_khr_local_int32_base_atomics
                        cl_khr_local_int32_extended_atomics
                        cl_khr_fp64
  CL_DEVICE_COMPUTE_CAPABILITY_NV: 1.3
  NUMBER OF MULTIPROCESSORS: 30
  NUMBER OF CUDA CORES: 240
  CL_DEVICE_REGISTERS_PER_BLOCK_NV: 16384
  CL_DEVICE_WARP_SIZE_NV: 32
  CL_DEVICE_GPU_OVERLAP_NV: CL_TRUE
  CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV: CL_FALSE
  CL_DEVICE_INTEGRATED_MEMORY_NV: CL_FALSE
  CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 1

 ---------------------------------
 2D Image Formats Supported (71)
 ---------------------------------
  #   Channel Order  Channel Type
  1   CL_R           CL_FLOAT
  2   CL_R           CL_HALF_FLOAT
  3   CL_R           CL_UNORM_INT8
  4   CL_R           CL_UNORM_INT16
  5   CL_R           CL_SNORM_INT16
  6   CL_R           CL_SIGNED_INT8
  7   CL_R           CL_SIGNED_INT16
  8   CL_R           CL_SIGNED_INT32
  9   CL_R           CL_UNSIGNED_INT8
  10  CL_R           CL_UNSIGNED_INT16
  11  CL_R           CL_UNSIGNED_INT32
  12  CL_A           CL_FLOAT
  13  CL_A           CL_HALF_FLOAT
  14  CL_A           CL_UNORM_INT8
  15  CL_A           CL_UNORM_INT16
  16  CL_A           CL_SNORM_INT16
  17  CL_A           CL_SIGNED_INT8
  18  CL_A           CL_SIGNED_INT16
  19  CL_A           CL_SIGNED_INT32
  20  CL_A           CL_UNSIGNED_INT8
  21  CL_A           CL_UNSIGNED_INT16
  22  CL_A           CL_UNSIGNED_INT32
  23  CL_RG          CL_FLOAT
  24  CL_RG          CL_HALF_FLOAT
  25  CL_RG          CL_UNORM_INT8
  26  CL_RG          CL_UNORM_INT16
  27  CL_RG          CL_SNORM_INT16
  28  CL_RG          CL_SIGNED_INT8
  29  CL_RG          CL_SIGNED_INT16
  30  CL_RG          CL_SIGNED_INT32
  31  CL_RG          CL_UNSIGNED_INT8
  32  CL_RG          CL_UNSIGNED_INT16
  33  CL_RG          CL_UNSIGNED_INT32
  34  CL_RA          CL_FLOAT
  35  CL_RA          CL_HALF_FLOAT
  36  CL_RA          CL_UNORM_INT8
  37  CL_RA          CL_UNORM_INT16
  38  CL_RA          CL_SNORM_INT16
  39  CL_RA          CL_SIGNED_INT8
  40  CL_RA          CL_SIGNED_INT16
  41  CL_RA          CL_SIGNED_INT32
  42  CL_RA          CL_UNSIGNED_INT8
  43  CL_RA          CL_UNSIGNED_INT16
  44  CL_RA          CL_UNSIGNED_INT32
  45  CL_RGBA        CL_FLOAT
  46  CL_RGBA        CL_HALF_FLOAT
  47  CL_RGBA        CL_UNORM_INT8
  48  CL_RGBA        CL_UNORM_INT16
  49  CL_RGBA        CL_SNORM_INT16
  50  CL_RGBA        CL_SIGNED_INT8
  51  CL_RGBA        CL_SIGNED_INT16
  52  CL_RGBA        CL_SIGNED_INT32
  53  CL_RGBA        CL_UNSIGNED_INT8
  54  CL_RGBA        CL_UNSIGNED_INT16
  55  CL_RGBA        CL_UNSIGNED_INT32
  56  CL_BGRA        CL_UNORM_INT8
  57  CL_BGRA        CL_SIGNED_INT8
  58  CL_BGRA        CL_UNSIGNED_INT8
  59  CL_ARGB        CL_UNORM_INT8
  60  CL_ARGB        CL_SIGNED_INT8
  61  CL_ARGB        CL_UNSIGNED_INT8
  62  CL_INTENSITY   CL_FLOAT
  63  CL_INTENSITY   CL_HALF_FLOAT
  64  CL_INTENSITY   CL_UNORM_INT8
  65  CL_INTENSITY   CL_UNORM_INT16
  66  CL_INTENSITY   CL_SNORM_INT16
  67  CL_LUMINANCE   CL_FLOAT
  68  CL_LUMINANCE   CL_HALF_FLOAT
  69  CL_LUMINANCE   CL_UNORM_INT8
  70  CL_LUMINANCE   CL_UNORM_INT16
  71  CL_LUMINANCE   CL_SNORM_INT16

 ---------------------------------
 3D Image Formats Supported (71)
 ---------------------------------
  #   Channel Order  Channel Type
  1   CL_R           CL_FLOAT
  2   CL_R           CL_HALF_FLOAT
  3   CL_R           CL_UNORM_INT8
  4   CL_R           CL_UNORM_INT16
  5   CL_R           CL_SNORM_INT16
  6   CL_R           CL_SIGNED_INT8
  7   CL_R           CL_SIGNED_INT16
  8   CL_R           CL_SIGNED_INT32
  9   CL_R           CL_UNSIGNED_INT8
  10  CL_R           CL_UNSIGNED_INT16
  11  CL_R           CL_UNSIGNED_INT32
  12  CL_A           CL_FLOAT
  13  CL_A           CL_HALF_FLOAT
  14  CL_A           CL_UNORM_INT8
  15  CL_A           CL_UNORM_INT16
  16  CL_A           CL_SNORM_INT16
  17  CL_A           CL_SIGNED_INT8
  18  CL_A           CL_SIGNED_INT16
  19  CL_A           CL_SIGNED_INT32
  20  CL_A           CL_UNSIGNED_INT8
  21  CL_A           CL_UNSIGNED_INT16
  22  CL_A           CL_UNSIGNED_INT32
  23  CL_RG          CL_FLOAT
  24  CL_RG          CL_HALF_FLOAT
  25  CL_RG          CL_UNORM_INT8
  26  CL_RG          CL_UNORM_INT16
  27  CL_RG          CL_SNORM_INT16
  28  CL_RG          CL_SIGNED_INT8
  29  CL_RG          CL_SIGNED_INT16
  30  CL_RG          CL_SIGNED_INT32
  31  CL_RG          CL_UNSIGNED_INT8
  32  CL_RG          CL_UNSIGNED_INT16
  33  CL_RG          CL_UNSIGNED_INT32
  34  CL_RA          CL_FLOAT
  35  CL_RA          CL_HALF_FLOAT
  36  CL_RA          CL_UNORM_INT8
  37  CL_RA          CL_UNORM_INT16
  38  CL_RA          CL_SNORM_INT16
  39  CL_RA          CL_SIGNED_INT8
  40  CL_RA          CL_SIGNED_INT16
  41  CL_RA          CL_SIGNED_INT32
  42  CL_RA          CL_UNSIGNED_INT8
  43  CL_RA          CL_UNSIGNED_INT16
  44  CL_RA          CL_UNSIGNED_INT32
  45  CL_RGBA        CL_FLOAT
  46  CL_RGBA        CL_HALF_FLOAT
  47  CL_RGBA        CL_UNORM_INT8
  48  CL_RGBA        CL_UNORM_INT16
  49  CL_RGBA        CL_SNORM_INT16
  50  CL_RGBA        CL_SIGNED_INT8
  51  CL_RGBA        CL_SIGNED_INT16
  52  CL_RGBA        CL_SIGNED_INT32
  53  CL_RGBA        CL_UNSIGNED_INT8
  54  CL_RGBA        CL_UNSIGNED_INT16
  55  CL_RGBA        CL_UNSIGNED_INT32
  56  CL_BGRA        CL_UNORM_INT8
  57  CL_BGRA        CL_SIGNED_INT8
  58  CL_BGRA        CL_UNSIGNED_INT8
  59  CL_ARGB        CL_UNORM_INT8
  60  CL_ARGB        CL_SIGNED_INT8
  61  CL_ARGB        CL_UNSIGNED_INT8
  62  CL_INTENSITY   CL_FLOAT
  63  CL_INTENSITY   CL_HALF_FLOAT
  64  CL_INTENSITY   CL_UNORM_INT8
  65  CL_INTENSITY   CL_UNORM_INT16
  66  CL_INTENSITY   CL_SNORM_INT16
  67  CL_LUMINANCE   CL_FLOAT
  68  CL_LUMINANCE   CL_HALF_FLOAT
  69  CL_LUMINANCE   CL_UNORM_INT8
  70  CL_LUMINANCE   CL_UNORM_INT16
  71  CL_LUMINANCE   CL_SNORM_INT16

oclDeviceQuery, Platform Name = NVIDIA CUDA, Platform Version = OpenCL 1.0 CUDA 3.2.1, SDK Revision = 7027912, NumDevs = 2, Device = Tesla T10 Processor, Device = Tesla T10 Processor

System Info:

 Local Time/Date = 19:22:40, 01/21/2011
 CPU Name: Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
 # of CPU processors: 8
 Linux version 2.6.18-194.26.1.el5 (brewbuilder@norob.fnal.gov) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Tue Nov 9 12:46:16 EST 2010

PASSED

Press <Enter> to Quit...
-----------------------------------------------------------
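The CUDA and OpenCL reports above describe the same cards in different units, so it can help to check that the figures agree. The short Python sketch below redoes the arithmetic with values copied from the two outputs. It assumes that "MByte" in the OpenCL report means 2^20 bytes and that the tool rounds down; both are assumptions that happen to match the printed numbers, not something the output states.

```python
# Values copied from the deviceQuery (CUDA) output above.
global_mem_bytes = 4294770688   # Total amount of global memory
multiprocessors = 30            # (MP)
cores_per_mp = 8                # (Cores/MP)

# Value copied from the oclDeviceQuery (OpenCL) output above.
clock_mhz = 1296                # CL_DEVICE_MAX_CLOCK_FREQUENCY

MBYTE = 2**20  # assumption: "MByte" means 2^20 bytes, rounded down

cores = multiprocessors * cores_per_mp
global_mem_mbyte = global_mem_bytes // MBYTE
quarter_mem_mbyte = (global_mem_bytes // 4) // MBYTE
clock_ghz = f"{clock_mhz / 1000:.2f}"

print("CUDA cores per card:      ", cores)              # reported: 240
print("Global memory [MByte]:    ", global_mem_mbyte)   # reported: 4095
print("Quarter of memory [MByte]:", quarter_mem_mbyte)  # reported max alloc: 1023
print("Clock rate [GHz]:         ", clock_ghz)          # reported: 1.30
```

Under these assumptions, the 1023 MByte reported for CL_DEVICE_MAX_MEM_ALLOC_SIZE is consistent with one quarter of the global memory, so a single OpenCL buffer cannot span the full 4 GB of a card.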