Deck 2: NVIDIA CUDA and GPU Programming

Question
An NVIDIA CUDA warp is made up of how many threads?

A)512
B)1024
C)312
D)32

Answer: 32
Question
Out-of-order instruction execution is not possible on GPUs.

Answer: False
Question
CUDA supports programming in ....

A)C or C++ only
B)Java, Python, and more
C)C, C++, third-party wrappers for Java, Python, and more
D)Pascal

Answer: C, C++, third-party wrappers for Java, Python, and more
Question
FADD, FMAD, FMIN, FMAX are ----- supported by the scalar processors of an NVIDIA GPU.

A)32-bit IEEE floating-point instructions
B)32-bit integer instructions
C)both
D)none of the above
Question
Each streaming multiprocessor (SM) of CUDA hardware has ------ scalar processors (SPs).

A)1024
B)128
C)512
D)8
Question
Each NVIDIA GPU has ------ streaming multiprocessors (SMs).

A)8
B)1024
C)512
D)16
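
For reference, the warp size and the number of streaming multiprocessors vary by device and can be queried at runtime. A minimal sketch using the standard CUDA runtime API (device 0 assumed):

#include <cstdio>
#include <cuda_runtime.h>

int main(void) {
    cudaDeviceProp prop;
    // Query the properties of device 0.
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed\n");
        return 1;
    }
    printf("Device: %s\n", prop.name);
    printf("Warp size: %d threads\n", prop.warpSize);                   // 32 on current NVIDIA GPUs
    printf("Streaming multiprocessors: %d\n", prop.multiProcessorCount);
    return 0;
}
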
Question
CUDA provides ------- warp and thread scheduling. Also, the overhead of thread creation is on the order of ----.

A)"programming-overhead", 2 clock
B)"zero-overhead", 1 clock
C)64, 2 clock
D)32, 1 clock
Question
Each warp of a GPU receives a single instruction and "broadcasts" it to all of its threads. This is a ---- operation.

A)SIMD (single instruction multiple data)
B)SIMT (single instruction multiple thread)
C)SISD (single instruction single data)
D)SIST (single instruction single thread)
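
As an illustration of SIMT execution, here is a hypothetical kernel (illustrative name, not from the deck): every thread in a warp receives the same instruction, and when a branch splits the warp, the two paths are executed one after the other with the non-participating lanes masked off.

__global__ void simt_demo(int *out) {
    int tid = threadIdx.x;          // all 32 lanes of the warp execute this same instruction
    if (tid % 2 == 0)
        out[tid] = 2 * tid;         // even lanes active, odd lanes masked
    else
        out[tid] = 3 * tid;         // odd lanes active, even lanes masked
}
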
Question
Limitations of a CUDA kernel:

A)recursion, call stack, static variable declaration
B)no recursion, no call stack, no static variable declarations
C)recursion, no call stack, static variable declaration
D)no recursion, call stack, no static variable declarations
Question
What is a Unified Virtual Machine?

A)it is a technique that allows both the CPU and the GPU to read from a single virtual machine simultaneously.
B)it is a technique for managing separate host and device memory spaces.
C)it is a technique for executing device code on host and host code on device.
D)it is a technique for executing general purpose programs on device instead of host.
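
Option A describes the behaviour that the CUDA runtime exposes through managed (unified) memory. A minimal sketch, assuming a device that supports cudaMallocManaged:

#include <cstdio>
#include <cuda_runtime.h>

__global__ void increment(int *x) { *x += 1; }   // device code writes to the managed allocation

int main(void) {
    int *x;
    cudaMallocManaged(&x, sizeof(int));   // one allocation, addressable from both host and device
    *x = 41;                              // host write
    increment<<<1, 1>>>(x);               // device update of the same pointer
    cudaDeviceSynchronize();              // wait before the host reads the result
    printf("%d\n", *x);                   // prints 42
    cudaFree(x);
    return 0;
}
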
Question
_______ became the first language specifically designed by a GPU company to facilitate general-purpose computing on ____.

A)Python, GPUs.
B)C, CPUs.
C)CUDA C, GPUs.
D)Java, CPUs.
Question
The CUDA architecture consists of --------- for parallel computing kernels and functions.

A)RISC instruction set architecture
B)CISC instruction set architecture
C)ZISC instruction set architecture
D)PTX instruction set architecture
Question
CUDA stands for --------, designed by NVIDIA.

A)Common Union Discrete Architecture
B)Complex Unidentified Device Architecture
C)Compute Unified Device Architecture
D)Complex Unstructured Distributed Architecture
Question
The host processor spawns multithreaded tasks (or kernels, as they are known in CUDA) onto the GPU device.
Question
The NVIDIA G80 is a ---- CUDA core device, the NVIDIA G200 is a ---- CUDA core device, and the NVIDIA Fermi is a ---- CUDA core device.

A)128, 256, 512
B)32, 64, 128
C)64, 128, 256
D)256, 512, 1024
Question
NVIDIA 8-series GPUs offer --------.

A)50-200 GFLOPS
B)200-400 GFLOPS
C)400-800 GFLOPS
D)800-1000 GFLOPS
Question
IADD, IMUL24, IMAD24, IMIN, IMAX are ----------- supported by the scalar processors of an NVIDIA GPU.

A)32-bit IEEE floating-point instructions
B)32-bit integer instructions
C)both
D)none of the above
Question
The CUDA hardware programming model supports:
A. A fully general data-parallel architecture;
B. General thread launch;
C. Global load-store;
D. Parallel data cache;
E. Scalar architecture;
F. Integer and bit operations

A)a,c,d,f
B)b,c,d,e
C)a,d,e,f
D)a,b,c,d,e,f
Question
In the CUDA memory model, the following memory types are available:
A. Registers;
B. Local Memory;
C. Shared Memory;
D. Global Memory;
E. Constant Memory;
F. Texture Memory.

A)a, b, d, f
B)a, c, d, e, f
C)a, b, c, d, e, f
D)b, c, e, f
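
A hypothetical kernel sketch showing where several of these memory types appear in CUDA C source (names are illustrative; texture and local memory are not shown):

__constant__ float scale;                              // constant memory: read-only in kernels, cached

__global__ void memory_demo(const float *in, float *out, int n) {
    __shared__ float tile[256];                        // shared memory: visible to all threads of a block
    int idx = blockIdx.x * blockDim.x + threadIdx.x;   // automatic variable, typically held in a register
    if (idx < n) tile[threadIdx.x] = in[idx];          // in and out point to global memory
    __syncthreads();                                   // barrier across the thread block
    if (idx < n) out[idx] = tile[threadIdx.x] * scale; // scale is fetched from constant memory
}
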
Question
What is the CUDA C equivalent of the following C program?
int main(void)
{
    printf("Hello, World!\n");
    return 0;
}

A)int main( void )
{
    kernel<<<1,1>>>();
    printf("Hello, World!\n");
    return 0;
}
B)__global__ void kernel( void ) { }
int main( void ) { kernel<<<1,1>>>();
    printf("Hello, World!\n");
    return 0;
}
C)__global__ void kernel( void ) {
    kernel<<<1,1>>>();
    printf("Hello, World!\n");
    return 0;
}
D)__global__ int main( void ) {
    kernel<<<1,1>>>();
    printf("Hello, World!\n");
    return 0;
}
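
A slightly extended sketch (assumes a device of compute capability 2.0 or newer, where device-side printf is available) that launches more than one thread so each prints its own coordinates:

#include <cstdio>
#include <cuda_runtime.h>

__global__ void kernel(void) {
    // Device-side printf: each launched thread prints its block and thread index.
    printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main(void) {
    kernel<<<2, 4>>>();           // 2 blocks of 4 threads each
    cudaDeviceSynchronize();      // flush device-side output before the host exits
    printf("Hello, World! (from the host)\n");
    return 0;
}
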
Question
A simple kernel for adding two integers:
__global__ void add( int *a, int *b, int *c ) { *c = *a + *b; }
where __global__ is a CUDA C keyword that indicates:

A)add() will execute on device, add() will be called from host
B)add() will execute on host, add() will be called from device
C)add() will be called and executed on host
D)add() will be called and executed on device
Question
If a is a host variable and dev_a is a device (GPU) variable, select the correct statement to allocate memory for dev_a:

A)cudaMalloc( &dev_a, sizeof( int ) )
B)malloc( &dev_a, sizeof( int ) )
C)cudaMalloc( (void**) &dev_a, sizeof( int ) )
D)malloc( (void**) &dev_a, sizeof( int ) )
Question
If a is a host variable and dev_a is a device (GPU) variable, select the correct statement to copy the input from a to dev_a:

A)memcpy( dev_a, &a, size );
B)cudaMemcpy( dev_a, &a, size, cudaMemcpyHostToDevice );
C)memcpy( (void*) dev_a, &a, size );
D)cudaMemcpy( (void*) &dev_a, &a, size, cudaMemcpyDeviceToHost );
Question
What does the triple angle bracket mark in a statement inside the main function indicate?

A)a call from host code to device code
B)a call from device code to host code
C)less than comparison
D)greater than comparison
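
Putting the last few cards together, a minimal host driver for the add() kernel shown above, written as a sketch with the standard runtime API (error checking omitted for brevity):

#include <cstdio>
#include <cuda_runtime.h>

__global__ void add(int *a, int *b, int *c) { *c = *a + *b; }

int main(void) {
    int a = 2, b = 7, c = 0;
    int *dev_a, *dev_b, *dev_c;
    size_t size = sizeof(int);

    cudaMalloc((void**)&dev_a, size);                     // allocate device memory
    cudaMalloc((void**)&dev_b, size);
    cudaMalloc((void**)&dev_c, size);

    cudaMemcpy(dev_a, &a, size, cudaMemcpyHostToDevice);  // copy inputs host -> device
    cudaMemcpy(dev_b, &b, size, cudaMemcpyHostToDevice);

    add<<<1, 1>>>(dev_a, dev_b, dev_c);                   // <<< >>>: host code launching device code

    cudaMemcpy(&c, dev_c, size, cudaMemcpyDeviceToHost);  // copy the result device -> host
    printf("2 + 7 = %d\n", c);

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return 0;
}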