CUDA - reduction for sum of vector when size is not a power of 2?
The classical reduction algorithm on the GPU works if the size of the vector is a power of 2. What if that is not the case? At some point you have to find the sum of an odd number of elements. What is the best way to deal with that?
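To make the difficulty concrete: a tree reduction repeatedly folds the upper half of the live partial sums onto the lower half. With $n_k$ live values and integer halving, a step pairs only $2\lfloor n_k/2 \rfloor$ of them, so for odd $n_k$ one value is left over. One common remedy (stated here independently of the answer) is to round the split point up:

$$
h_k = \left\lceil \frac{n_k}{2} \right\rceil, \qquad
x_t \leftarrow x_t + x_{t+h_k} \quad \text{for } 0 \le t < n_k - h_k, \qquad
n_{k+1} = h_k .
$$

For example, $n_0 = 5$ gives $5 \to 3 \to 2 \to 1$: first $x_0 \mathrel{+}= x_3$ and $x_1 \mathrel{+}= x_4$ (with $x_2$ left alone), then $x_0 \mathrel{+}= x_2$, then $x_0 \mathrel{+}= x_1$, so $x_0$ ends up holding all five values. When $n_k$ is even, $h_k = n_k/2$ and this is the usual power-of-two step.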
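You can compute the sum of an array whose size is not a power of two. Only the block size needs to be a power of two: the guard on the global index keeps elements past the end of the array out of the sum, and each block writes one partial sum to `c`, which still has to be added up afterwards (on the host or with a second pass). For example: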
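    #define N 1022         // total size (not a power of two)
    #define BLOCK_SIZE 256 // threads per block; must be a power of two

    __global__ void sum(const int *a, int *c)
    {
        // a __shared__ array size must be a compile-time constant, not blockDim.x
        __shared__ int temp[BLOCK_SIZE];
        int idx = threadIdx.x + blockDim.x * blockIdx.x;
        int local_idx = threadIdx.x;

        // the last block runs past N: pad with 0 instead of reading out of bounds
        temp[local_idx] = (idx < N) ? a[idx] : 0;
        __syncthreads();

        for (int i = blockDim.x / 2; i != 0; i /= 2) {
            // skip partners that lie past the end of the array
            if (idx + i < N && local_idx < i)
                temp[local_idx] += temp[local_idx + i];
            __syncthreads();
        }
        if (local_idx == 0)
            c[blockIdx.x] = temp[0]; // one partial sum per block
    }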
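If the number of values a block reduces may itself be odd (the last block, or a block size that is not a power of two), the halving step can round up instead. Below is a minimal self-contained sketch of that variant, which also shows the host side: launching enough blocks to cover every element and adding the per-block partial sums. The names `block_sum` and `gpu_sum`, `BLOCK_SIZE = 256`, and summing the partials on the CPU are assumptions for illustration, not part of the answer above; CUDA error checking is omitted for brevity.

```cuda
// Sketch: sum an int array of arbitrary length on the GPU.
// Hypothetical names: block_sum, gpu_sum, BLOCK_SIZE (chosen for illustration).
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // threads per block; the loop below does not need a power of two

// Each block reduces the (up to BLOCK_SIZE) elements it owns into one partial sum.
__global__ void block_sum(const int *a, int *partial, int n)
{
    __shared__ int temp[BLOCK_SIZE];
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    int tid = threadIdx.x;

    // Number of elements this block owns; the last block may own fewer.
    // It is the same for every thread in the block, so __syncthreads() in the loop is safe.
    int count = min((int)blockDim.x, n - (int)(blockDim.x * blockIdx.x));

    if (tid < count)
        temp[tid] = a[idx];
    __syncthreads();

    // Fold the upper part onto the lower part, rounding the split point up,
    // so an odd count leaves its middle element in place for the next round.
    while (count > 1) {
        int half = (count + 1) / 2;
        if (tid < count - half)
            temp[tid] += temp[tid + half];
        __syncthreads();
        count = half;
    }
    if (tid == 0)
        partial[blockIdx.x] = temp[0];
}

// Host helper: copy the data in, launch one block per BLOCK_SIZE elements,
// then add the per-block partial sums on the CPU.
long long gpu_sum(const int *h_a, int n)
{
    assert(n > 0);
    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;  // round up so no element is missed

    int *d_a = nullptr;
    int *d_partial = nullptr;
    cudaMalloc(&d_a, n * sizeof(int));
    cudaMalloc(&d_partial, blocks * sizeof(int));
    cudaMemcpy(d_a, h_a, n * sizeof(int), cudaMemcpyHostToDevice);

    block_sum<<<blocks, BLOCK_SIZE>>>(d_a, d_partial, n);

    std::vector<int> partial(blocks);
    cudaMemcpy(partial.data(), d_partial, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_partial);

    long long total = 0;
    for (int p : partial)
        total += p;
    return total;
}
```

Called with a `std::vector<int>` of 1022 ones, `gpu_sum(v.data(), 1022)` launches 4 blocks (the last one owns 254 elements) and should return 1022. For an even count, `(count + 1) / 2` equals `count / 2`, so the loop behaves exactly like the usual power-of-two reduction in that case.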