CUDA - Reduction for the sum of a vector when the size is not a power of 2?


The classical reduction algorithm on the GPU works if the size of the vector is a power of 2. What if that is not the case? At some point I have to find the sum of an odd number of elements. What is the best way to deal with that?

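You can compute the sum of an array whose size is not a power of two. Only the block size has to be a power of two, because the loop halves the stride at every step; the total size can be arbitrary. For example: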

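#define N 1022                 // total size; does not have to be a power of two
// Added name: a __shared__ array needs a compile-time size.
// Must be a power of two and equal to the block size used at launch.
#define THREADS_PER_BLOCK 256

// Each block writes the sum of its slice of a to c[blockIdx.x];
// the per-block results in c still have to be added together afterwards.
__global__ void sum(int *a, int *c)
{
    __shared__ int temp[THREADS_PER_BLOCK];
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    int local_idx = threadIdx.x;

    // Threads past the end of the vector load 0 instead of reading out of bounds.
    temp[local_idx] = (idx < N) ? a[idx] : 0;
    __syncthreads();

    int i = blockDim.x / 2;
    while (i != 0)
    {
        // Skip partners that lie past the end of the vector.
        if (idx + i < N && local_idx < i)
            temp[local_idx] += temp[local_idx + i];
        i /= 2;
        __syncthreads();
    }

    if (local_idx == 0)
        c[blockIdx.x] = temp[0];
}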
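If you also want to handle the "odd number of elements" step directly, for example because the block size itself is not a power of two, one option is to round the stride up at each step so the middle element is carried over instead of dropped. The following is only a minimal sketch under that assumption, not the code from the answer above: the names kBlockSize, blockSum and sumOnGpu are made up for illustration, the final pass over the per-block results is done on the CPU, and CUDA error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical block size for this sketch; deliberately not a power of two.
constexpr int kBlockSize = 192;

// Each block reduces its slice of `in` and writes one partial sum to `out`.
__global__ void blockSum(const int *in, int *out, int n)
{
    __shared__ int temp[kBlockSize];
    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;

    // Out-of-range threads contribute 0, the identity for addition.
    temp[tid] = (idx < n) ? in[idx] : 0;
    __syncthreads();

    // `count` is the number of live partial sums; it may be odd.
    for (int count = blockDim.x; count > 1; )
    {
        int half = (count + 1) / 2;   // round up so the middle element survives
        if (tid + half < count)
            temp[tid] += temp[tid + half];
        __syncthreads();
        count = half;
    }

    if (tid == 0)
        out[blockIdx.x] = temp[0];
}

// Host helper (hypothetical): launches the kernel, then adds the
// per-block partial sums on the CPU.
int sumOnGpu(const std::vector<int> &v)
{
    int n = static_cast<int>(v.size());
    if (n == 0) return 0;
    int blocks = (n + kBlockSize - 1) / kBlockSize;

    int *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, blocks * sizeof(int));
    cudaMemcpy(dIn, v.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    blockSum<<<blocks, kBlockSize>>>(dIn, dOut, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<int> partial(blocks);
    cudaMemcpy(partial.data(), dOut, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);

    int total = 0;
    for (int p : partial) total += p;
    return total;
}
```

Calling sumOnGpu on a vector of 1022 ones should give 1022. With 5 live values the loop uses strides 3, 2, 1: the first step adds elements 3 and 4 into 0 and 1 and leaves element 2 alone, so nothing is lost when the count is odd.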
