c++ - 3D Convolution with Intel MKL -


i have written c/c++ code uses intel mkl compute 3d convolution of array has about 300×200×200 elements. want apply kernel either 3×3×3 or 5×5×5. both 3d input array , kernel have real values.

this 3d array stored 1d array of type double in columnwise fashion. kernel of type double , saved columnwise. example,

for( int k = 0; k < nk; k++ ) // loop through height.     for( int j = 0; j < nj; j++ ) // loop through rows.         for( int = 0; < ni; i++ ) // loop through columns.         {             ijk = + ni * j + ni * nj * k;             my3darray[ ijk ] = 1.0;         } 

for computation of convolution, want perform not-in-place fft on input array , kernel , prevent them getting modified (i need use them later in code) , backward computation in-place.

when compare result obtained code 1 obtained matlab different. kindly me fix issue? missing in code?

here matlab code used:

a = ones( 10, 10, 10 ); kernel = ones( 3, 3, 3 ); aconvolved = convn( a, kernel, 'same' ); 

here c/c++ code:

#include <stdio.h> #include "mkl.h"  void conv3d(     double *in, double *ker, double *out,     int nrows, int ncols, int nheights) {      int ni = nrows;     int nj = ncols;     int nk = nheights;      double *in_fft  = new double [ni*nj*nk];     double *ker_fft = new double [ni*nj*nk];      dfti_descriptor_handle fft_desc = 0;     mkl_long sizes[]   = { nk, nj, ni };     mkl_long strides[] = { 0, nj*ni, ni, 1 };      dfticreatedescriptor( &fft_desc, dfti_double, dfti_real, 3, sizes     );     dftisetvalue        (  fft_desc, dfti_placement     , dfti_not_inplace);   // out-of-place computation.     dftisetvalue        (  fft_desc, dfti_input_strides , strides         );     dftisetvalue        (  fft_desc, dfti_output_strides, strides         );     dftisetvalue        (  fft_desc, dfti_backward_scale, 1/ni/nj/nk      );     dfticommitdescriptor(  fft_desc );      dfticomputeforward  (  fft_desc, in , in_fft  );     dfticomputeforward  (  fft_desc, ker, ker_fft );      (long long = 0; < (long long)ni*nj*nk; ++i )         out[i] = in_fft[i]*ker_fft[i];      // in-place computation.     dftisetvalue        (  fft_desc, dfti_placement, dfti_inplace );     dfticommitdescriptor(  fft_desc      );     dfticomputebackward (  fft_desc, out );      dftifreedescriptor  ( &fft_desc );      delete[] in_fft;     delete[] ker_fft;  }  int main(int argc, char* argv[]) {     int n = 10;     int nkernel = 3;      double *a          = new double [n*n*n]; // array real.     double *aconvolved = new double [n*n*n]; // convolved array real.     double *kernel     = new double [nkernel*nkernel*nkernel]; // kernel real.      // fill array 'real' numbers.     for( int = 0; < n*n*n; i++ )         a[ ] = 1.0;      // fill kernel 'real' numbers.     for( int = 0; < nkernel*nkernel*nkernel; i++ )         kernel[ ] = 1.0;      // calculate convolution.     conv3d( a, kernel, aconvolved, n, n, n );      printf("convolved:\n");     for( int = 0; < n*n*n; i++ )         printf( "%15.8f\n", aconvolved[i] );      delete[] a;     delete[] kernel;     delete[] aconvolved;      return 0; } 

you can't reverse fft real-valued frequency data (just magnitude). forward fft needs output complex data. done setting dfti_forward_domain setting dfti_complex.

dfticreatedescriptor( &fft_desc, dfti_double, dfti_complex, 3, sizes     ); 

doing implicitly sets backward domain complex too.

you need complex data type. like,

mkl_complex16* in_fft  = new mkl_complex16[ni*nj*nk]; 

this means have multiply both real , imaginary parts:

for (size_t = 0; < (size_t)ni*nj*nk; ++i) {     out_fft[i].real = in_fft[i].real * ker_fft[i].real;     out_fft[i].imag = in_fft[i].imag * ker_fft[i].imag; } 

the output of inverse fft complex, , assuming input data real, can grab .real component , result. means you'll need temporary complex output array (say, out_fft above).

also note avoid artifacts, want size of fft (at least) m+n-1 on each dimension. choose next highest power of 2 speed.

i suggest implement in matlab first, using ffts. there many such implementations available (example), start basics , make simple function on own.


Comments

Popular posts from this blog

c++ - OpenMP unpredictable overhead -

ruby on rails - RuntimeError: Circular dependency detected while autoloading constant - ActiveAdmin.register Role -

javascript - Wordpress slider, not displayed 100% width -