[Question] Problem with CUDA execution results

Original poster: v00623 (阿哩他命EX PLUS)   2017-05-08 09:48:55
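Platform: (Ex: Win10, Linux, ...)
Linux GPGPU-sim
Compiler (Ex: GCC, clang, VC++...) + target environment (list it if it differs from the development platform)
nvcc
Question:
I am practicing with a simple vectorAdd.
Originally main() called a function that launched the kernel, and that worked fine.
I then tried moving the kernel launch into main() itself,
but the output is no longer what I expect.
So far I have not been able to find the problem.
Expected Output: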
dataD = 1.000000
dataD = 1.000000
dataD = 1.000000
dataD = 1.000000
dataD = 1.000000
dataD = 1.000000
dataD = 1.000000
dataD = 1.000000
dataD = 1.000000
dataD = 1.000000
Wrong Output:
dataD = 1.000000
dataD = 1.000000
dataD = 0.000000
dataD = 0.000000
dataD = 0.000000
dataD = 0.000000
dataD = 0.000000
dataD = 0.000000
dataD = 0.000000
dataD = 0.000000
Code: (please use the paste site from the pinned post, and remember to format it)
__global__ void VectorAdd( float* arrayA, float* arrayB, float* output )
{
    int idx = threadIdx.x;
    output[idx] = arrayA[idx] + arrayB[idx] + 1;
}

void add_vector_gpu( float* a, float* b, float *c, int size );

int main( int argc, char** argv){
    int data_size = 10;
    float *dataA = new float[data_size],
          *dataB = new float[data_size],
          *dataC = new float[data_size],
          *dataD = new float[data_size],
          *dataE = new float[data_size];
    for( int i = 0; i < data_size; ++ i )
    {
        dataA[i] = i;
        dataB[i] = -1 * i;
    }
    add_vector_cpu( dataA, dataB, dataC, data_size );
    float data_size2 = data_size * sizeof(float);
    float *dev_A, *dev_B, *dev_C, *dev_D;
    cudaMalloc( (void**)&dev_A, data_size2 );
    cudaMalloc( (void**)&dev_B, data_size2 );
    cudaMalloc( (void**)&dev_C, data_size2 );
    cudaMalloc( (void**)&dev_D, data_size2 );
    cudaMemcpy( dev_A, dataA, data_size, cudaMemcpyHostToDevice );
    cudaMemcpy( dev_B, dataB, data_size, cudaMemcpyHostToDevice );
    VectorAdd<<< 1, 10 >>>( dev_A, dev_B, dev_C );
    cudaMemcpy( dataD, dev_C, data_size, cudaMemcpyDeviceToHost );
    for( int i = 0; i < data_size; ++ i )
    {
        printf( "dataD = %f\n", dataD[i] );
    }
}
Supplement:
I would also like to ask how to printf some data from inside a kernel.
I read that you need to #include "cuPrintf.cu"
before you can use cuPrintf ("Thread_number %d\n", threadIdx.x);
but nothing gets printed. Am I using it the wrong way?
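Reply from: mike0227 (我又小看了那复杂的世界)   2017-05-08 11:26:00
The size you pass to cudaMemcpy is data_size, not data_size2, so only 10 bytes are copied instead of 10 floats. Also, data_size2 should be a size_t, not a float.
As for printf in a kernel: unless your card is very old it should be supported, so just call printf("Hello world!\n") directly and nvcc will take care of it.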
Reply from: LPH66 (-6.2598534e+18f)   2017-05-08 14:00:00
float data_size2 ← this should be a size_t
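
The byte-count mistake pointed out in the replies can be reproduced without a GPU. The following is only a minimal host-side sketch, not the poster's code: it uses std::memcpy as a stand-in for cudaMemcpy (both take a size in bytes), and copy_bytes is a hypothetical helper invented for this illustration. It assumes a 4-byte float and a destination buffer that starts out zeroed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Assumption for this sketch: float is 4 bytes, as on common platforms.
static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");

// Hypothetical helper (not part of CUDA): copies `byte_count` bytes from
// `src` into a zero-filled buffer of the same length. std::memcpy stands in
// for cudaMemcpy here, since both interpret their size argument as bytes.
std::vector<float> copy_bytes(const std::vector<float>& src,
                              std::size_t byte_count) {
    assert(byte_count <= src.size() * sizeof(float));
    std::vector<float> dst(src.size(), 0.0f);
    std::memcpy(dst.data(), src.data(), byte_count);
    return dst;
}

// Byte count for n floats, kept in size_t as the replies suggest.
std::size_t bytes_for(std::size_t n) {
    return n * sizeof(float);
}
```

With ten elements, copy_bytes(src, src.size()) transfers only 10 bytes, so just the first two floats arrive complete (the third is half written), which is consistent with the wrong output in the question. Passing bytes_for(src.size()), that is 40 bytes, copies all ten.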
