c++ - Why doesn't my OpenCL 3d image lookup work? -
i have been having trouble opencl kernel i've written producing incorrect results (compared reference brute-force cpu implementation).
i tracked problem down 3d lookup table i'm using seems returning garbage results, rather values passed in.
i have following (simplified) opencl kernel reading precomputed function 3d image type:
__constant sampler_t legsampler = clk_normalized_coords_true | clk_address_clamp_to_edge | clk_filter_linear; inline float normalizedlegendre(int n, int m, float z, image3d_t legendrelut) { float ncoord = (((float) n) / get_image_width(legendrelut)); float mcoord = (((float) m) / get_image_height(legendrelut)); float zcoord = ((z + 1.0f) / 2.0f); float4 coord = (float4)(floor(ncoord) + 0.5f, floor(mcoord) + 0.5f, zcoord, 0.0f); return read_imagef(legendrelut, legsampler, coord).x; } _kernel void noisemain(__read_only image3d_t legendrelut, __global float* outlegdump) { //k linear index array. int k = get_global_id(0); if(k < get_image_depth(legendrelut)) { float z = ((float) k / (float) get_image_depth(legendrelut)) * 2.0 - 1.0; float leglookup = normalizedlegendre(5, 4, z, legendrelut); float texcoord = ((float) k / 1024.0) * 2 - 1; outlegdump = leglookup; } } on host side, generate 3d image, legendrelut, using following code:
static const size_t nlegpolybins = 1024; static const size_t nlegpolyorders = 16; boost::scoped_array<float> legendrehostbuffer(new float[nlegpolyorders * nlegpolyorders * nlegpolybins]); float stepsize = 1.0 / (((float) nlegpolybins/2.0) - 0.5); float z = -1.0; std::cout << "generating legendre polynomials..." << std::endl; for(size_t n = 0; n < nlegpolyorders; n++) { for(size_t m = 0; m < nlegpolyorders; m++) { for(size_t zi = 0; zi < nlegpolybins; zi++) { using namespace boost::math; size_t index = (n * nlegpolyorders * nlegpolybins) + (m * nlegpolybins) + zi; //-1..1 in nlegpolybins steps... float val; if(m > n) { legendrehostbuffer[index] = 0; continue; } else { //boost::math::legendre_p val = legendre_p<float>(n,m,z); } float npm = n+m; float nmm = n-m; float factnum; float factden; factnum = factorial<float>(n-m); factden = factorial<float>(n+m); float nrmterm; nrmterm = pow(-1.0, m) * sqrt((n + 0.5) * (factnum/factden)); legendrehostbuffer[index] = val; z += stepsize; if(z > 1.0) z + 1.0; } z = -1.0; } } //debugging step: dump we've generated m = 4, n = 5, z=-1..1 std::ofstream legdump("legdump.txt"); for(size_t = 0; < nlegpolybins; i++) { int n =5; int m = 4; size_t index = (n * nlegpolyorders * nlegpolybins) + (m * nlegpolybins) + i; float texcoord = ((float) / (float) nlegpolybins) * 2 - 1; legdump << << " " << texcoord << " " << legendrehostbuffer[index] << std::endl; } legdump.close(); std::cout << "creating legendre polynomial table image..." << std::endl; cl::imageformat legformat(cl_r, cl_float); //generate out legendre polynomials image... m_legendretable = cl::image3d(m_clcontext, cl_mem_read_only | cl_mem_copy_host_ptr, legformat, nlegpolyorders, nlegpolyorders, nlegpolybins, 0, 0, legendrehostbuffer.get()); other index, actual generation of values more or less irrelevant, i've included here completeness.
and here how execute kernel , read results:
cl::buffer outlegdump = cl::buffer(m_clcontext, cl_mem_write_only, nlegpolybins * sizeof(float)); //create out kernel... cl::kernel kernel(m_program, "noisemain"); kernel.setarg(0, m_legendretable); kernel.setarg(1, outlegdump); size_t kernelsize = 1024; cl::ndrange globalrange(kernelsize); cl::ndrange localrange(1); m_commandqueue.enqueuendrangekernel(kernel, cl::nullrange, globalrange, cl::nullrange); m_commandqueue.finish(); boost::scoped_array<float> legdumphost(new float[nlegpolybins]); m_commandqueue.enqueuereadbuffer(outlegdump, cl_true, 0, nlegpolybins * sizeof(float), legdumphost.get()); std::ofstream legreadback("legreadback.txt"); for(size_t = 0; < nlegpolybins; i++) { legreadback << << " " << legdumphost[i] << std::endl; } legreadback.close(); when @ dumped data (i.e. put out in legdump.txt host-side buffer), expected data. however, when compare data received back device side (i.e. looked kernel , put out in legreadback.txt), incorrect values.
since i'm calculating 1024 values in both cases, i'll spare whole dump, however, here first few/last values of each:
legdump.txt (host side sanity check):
0 -0 1 -0.0143913 2 -0.0573401 3 -0.12851 4 -0.227566 5 -0.354175 .. .. 1020 0.12859 1021 0.0144185 1022 0.0144185 1023 1.2905e-8 legreadback.txt (device-side lookup , readback)
0 1 1 1 2 1 3 1 4 0.5 5 0 .. .. 1020 7.74249e+11 1021 -1.91171e+15 1022 -3.81029e+15 1023 -1.91173e+15 note these values same across multiple runs of code, don't think it's initialization problem.
i can assume i'm calculating indices wrong somewhere, don't know where. i've checked calculation of z coordinate (which naturally defined on -1..1), conversion texture coordinates (0..1 range), , conversion of m , n texture coordinates (which should done without interpolation), , found nothing wrong.
so question thus:
what proper way create , index 3d lookup table in opencl?
as expected, problem turned out in indexing on host-side used generate lookup table.
the previous index calculation:
size_t index = (n * nlegpolyorders * nlegpolybins) + (m * nlegpolybins) + zi; was based on c++ 3d array indexing, not way addressing works in opencl 3d image. 3d image can thought of "stack" of 2d images on top of each other, depth coordinate (z in case) selects image, , horizontal , vertical coordinates (m , n in case) select pixel within selected image.
the correct indexing calculation is:
size_t index = m * nlegpolyorders + n + (zi * nlegpolyorders * nlegpolyorders); as 1 can see, new approach fits "stacked image" layout described previously, whereas previous calculation not.
Comments
Post a Comment