An introduction to interoperability between OpenCL and D3D 10, though my personal impression is that CUDA is a bit better than OpenCL

   
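OpenCL is a framework for parallel heterogeneous computing. Many vendors, including Intel, AMD and NVIDIA, support it, although NVIDIA only goes up to version 1.2 and has mainly put its effort into its own CUDA. I have not used CUDA myself, but my personal impression is that CUDA is somewhat better than OpenCL. OpenCL, however, has broader support: CPUs, GPUs, DSPs, FPGAs and other kinds of chips can all support it. OpenCL kernels closely resemble pixel shaders in D3D.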

   
OpenCL Programming Guide: Interoperability with Direct3D

    This part introduces interoperability between OpenCL and D3D 10.

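1. Bilateral filter principle

    For the principle of the bilateral filter, see Rachel-Zhang's blog post "Bilateral filter: principle and implementation". Quoting that post, the principle is as follows: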

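The bilateral filter is an edge-preserving, noise-reducing filter. It achieves this denoising effect because the filter is built from two functions: one determines the filter coefficients from geometric (spatial) distance, the other from the difference in pixel values. Two filters are useful for comparison: the Gaussian low-pass filter (http://en.wikipedia.org/wiki/Gaussian_filter) and the α-trimmed mean filter (the mean of the pixels that remain after discarding the smallest and largest α percent of values).

In the bilateral filter, the value of the output pixel depends on a weighted combination of the values of the neighboring pixels,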

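[Image 1 in the original: the formula for the output pixel]

The weight coefficient w(i,j,k,l) is the product [image 4] of the domain kernel [image 2] and the range kernel [image 3]. It therefore accounts for differences in both the spatial domain and the range, whereas the Gaussian filter and the α-trimmed mean filter consider only the spatial difference and only the range difference, respectively.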

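This article implements a bilateral filter in OpenCL based on these formulas and uses it for skin smoothing (a "beauty" filter).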
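The formula images did not survive in this copy. The block below is a reconstruction, not a copy of the original images: it is the standard bilateral filter definition, written so that it matches the terms dkl, rkl and wkl computed in the kernel later in this article. The symbols \sigma_d and \sigma_r correspond to the kernel's sigma_d and sigma_r. The kernel applies the range term to each color channel separately.

```latex
g(i,j) = \frac{\sum_{k,l} f(k,l)\, w(i,j,k,l)}{\sum_{k,l} w(i,j,k,l)}

d(i,j,k,l) = \exp\!\left(-\frac{(i-k)^2 + (j-l)^2}{2\sigma_d^2}\right)

r(i,j,k,l) = \exp\!\left(-\frac{\lVert f(i,j) - f(k,l) \rVert^2}{2\sigma_r^2}\right)

w(i,j,k,l) = d(i,j,k,l)\, r(i,j,k,l)
           = \exp\!\left(-\frac{(i-k)^2 + (j-l)^2}{2\sigma_d^2}
                         -\frac{\lVert f(i,j) - f(k,l) \rVert^2}{2\sigma_r^2}\right)
```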

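1. Initializing an OpenCL context for Direct3D interoperability

    OpenCL sharing is enabled with the cl_khr_d3d10_sharing pragma:

    #pragma OPENCL EXTENSION cl_khr_d3d10_sharing : enable

    When D3D sharing is enabled, many OpenCL functions are extended to accept parameter types and values that deal with D3D10 sharing.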

    An OpenCL context can be created with a D3D interoperability property:

    ·CL_CONTEXT_D3D10_DEVICE_KHR
is accepted as a property name in the properties parameter of clCreateContext and clCreateContextFromType.

    The query functions accept D3D-interop-specific parameter names:

    ·CL_CONTEXT_D3D10_PREFER_SHARED_RESOURCES_KHR
as the param_name value for clGetContextInfo.

    ·CL_MEM_D3D10_RESOURCE_KHR
as the param_name value for clGetMemObjectInfo.

    ·CL_IMAGE_D3D10_SUBRESOURCE_KHR
as the param_name value for clGetImageInfo.

    ·CL_COMMAND_ACQUIRE_D3D10_OBJECTS_KHR and CL_COMMAND_RELEASE_D3D10_OBJECTS_KHR
are returned in the param_value of clGetEventInfo when param_name is CL_EVENT_COMMAND_TYPE.

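    The OpenCL D3D10 interoperability functions are declared in the header cl_d3d10.h. The Khronos extension for D3D10 can be obtained from the Khronos website. With some releases you may need to download this extension separately.

    Initializing OpenCL is largely the same as usual, with a few small differences. First, list the platforms with clGetPlatformIDs. Because we are looking for a platform that supports D3D sharing, call clGetPlatformInfo() on each platform to query the extensions it supports. If the extension string contains cl_khr_d3d10_sharing, that platform can be used for D3D sharing.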
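The extension check described above can be sketched as a small helper. This is only a sketch: hasExtension is a hypothetical name, not part of OpenCL, and it assumes the extension string has already been fetched with clGetPlatformInfo() using CL_PLATFORM_EXTENSIONS. It compares whole space-separated tokens, so a longer extension name that merely begins with cl_khr_d3d10_sharing does not count as a match.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not an OpenCL function): returns true if `name`
// appears as a whole space-separated token in `extensions`, the string
// reported for CL_PLATFORM_EXTENSIONS.
bool hasExtension(const std::string& extensions, const std::string& name)
{
    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token)
    {
        if (token == name)
            return true;
    }
    return false;
}
```

A caller would loop over the platform IDs, fetch each extension string, and keep the first platform for which hasExtension(extensions, "cl_khr_d3d10_sharing") is true.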

   
    Given a cl_platform_id that supports D3D sharing, use clGetDeviceIDsFromD3D10KHR() on that platform to query the corresponding OpenCL device IDs:

cl_int clGetDeviceIDsFromD3D10KHR(
    cl_platform_id             platform,
    cl_d3d10_device_source_khr d3d_device_source,
    void *                     d3d_object,
    cl_d3d10_device_set_khr    d3d_device_set,
    cl_uint                    num_entries,
    cl_device_id *             devices,
    cl_uint *                  num_devices)

For example:

errNum = clGetDeviceIDsFromD3D10KHR(
    platformIds[index_platform],
    CL_D3D10_DEVICE_KHR,
    g_pD3DDevice,
    CL_PREFERRED_DEVICES_FOR_D3D10_KHR,
    1,
    &cdDevice,
    &num_devices);

if (errNum == CL_INVALID_PLATFORM) {
    printf("Invalid Platform: Specified platform is not valid\n");
} else if( errNum == CL_INVALID_VALUE) {
    printf("Invalid Value: d3d_device_source, d3d_device_set is not valid or num_entries = 0 and devices != NULL or num_devices == devices == NULL\n");
} else if( errNum == CL_DEVICE_NOT_FOUND) {
    printf("No OpenCL devices corresponding to the d3d_object were found\n");
}

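This code obtains an OpenCL device ID (cdDevice) for the selected OpenCL platform (platformIds[index_platform]). The constant CL_D3D10_DEVICE_KHR indicates that the D3D10 object being passed (g_pD3DDevice) is a D3D10 device, and CL_PREFERRED_DEVICES_FOR_D3D10_KHR selects the preferred device for that platform. The call returns the preferred OpenCL device associated with the platform and the D3D10 device.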

   
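    The device ID returned by this function can be used to create a context that supports D3D sharing. When creating the OpenCL context, the cl_context_properties array passed to the clCreateContext*() call should include a pointer to the D3D10 device to be shared. For example: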

cl_context_properties contextProperties[] =
{
    CL_CONTEXT_D3D10_DEVICE_KHR, 
    (cl_context_properties)g_pD3DDevice,
    CL_CONTEXT_PLATFORM,
    (cl_context_properties)platformIds[index_platform],
    0
};


context = clCreateContextFromType( contextProperties, CL_DEVICE_TYPE_GPU, NULL, NULL, &errNum ) ;

In this sample code, the pointer to the D3D10 device, g_pD3DDevice, is the one returned by the call to D3D10CreateDeviceAndSwapChain().

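2. Kernel function

    The skin-smoothing algorithm is based on http://www.zealfilter.com/portal.php?mod=view&aid=138. The skin-color detection in it did not work well, so I removed it. I had also planned to add a sharpening step, but the result looks fine without it, so I have left it out for now and will add sharpening in the next step of learning OpenCL.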

// Integer pixel coordinates, reads outside the image clamp to the edge, no interpolation.
const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;

kernel void bilateralBlur(read_only image2d_t src, write_only image2d_t dst)
{
    int x = (int)get_global_id(0);
    int y = (int)get_global_id(1);
    // Skip work-items that fall outside the image.
    if (x >= get_image_width(src) || y >= get_image_height(src))
        return;

    int ksize = 11;         // the filter window is ksize x ksize pixels
    float sigma_d = 3.0f;   // spatial (domain) sigma
    float sigma_r = 0.1f;   // range sigma, in normalized color units

    float4 fij = read_imagef(src, sampler, (int2)(x, y));
    float alpha = 0.2f;     // weight of the original pixel in the final blend

    float4 fkl;
    float dkl;
    float4 rkl;
    float4 wkl;

    float4 numerator = (float4)(0.0f, 0.0f, 0.0f, 0.0f);
    // Start the weight sum at 0 so the result is sum(f*w) / sum(w), as in the formula.
    float4 denominator = (float4)(0.0f, 0.0f, 0.0f, 0.0f);
    for (int K = -ksize / 2; K <= ksize / 2; K++)
    {
        for (int L = -ksize / 2; L <= ksize / 2; L++)
        {
            fkl = read_imagef(src, sampler, (int2)(x + K, y + L));

            // Exponents of the domain kernel and of the per-channel range kernel.
            dkl = -(K*K + L*L) / (2 * sigma_d*sigma_d);
            rkl.x = -(fij.x - fkl.x)*(fij.x - fkl.x) / (2 * sigma_r*sigma_r);
            rkl.y = -(fij.y - fkl.y)*(fij.y - fkl.y) / (2 * sigma_r*sigma_r);
            rkl.z = -(fij.z - fkl.z)*(fij.z - fkl.z) / (2 * sigma_r*sigma_r);

            wkl.x = exp(dkl + rkl.x);
            wkl.y = exp(dkl + rkl.y);
            wkl.z = exp(dkl + rkl.z);

            numerator.x += fkl.x * wkl.x;
            numerator.y += fkl.y * wkl.y;
            numerator.z += fkl.z * wkl.z;

            denominator.x += wkl.x;
            denominator.y += wkl.y;
            denominator.z += wkl.z;
        }
    }

    float4 gij = (float4)(0.0f, 0.0f, 0.0f, 1.0f);
    if (denominator.x > 0 && denominator.y > 0 && denominator.z > 0)
    {
        gij.x = numerator.x / denominator.x;
        gij.y = numerator.y / denominator.y;
        gij.z = numerator.z / denominator.z;

        // Blend the bilateral-filtered result with the original pixel.
        gij.x = fij.x*alpha + gij.x*(1.0f - alpha);
        gij.y = fij.y*alpha + gij.y*(1.0f - alpha);
        gij.z = fij.z*alpha + gij.z*(1.0f - alpha);
    }

    write_imagef(dst, (int2)(x, y), gij);
}

The kernel function basically just writes out the mathematical formula, so it is very simple.
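When checking a kernel like this one, a plain CPU version of the same formula can be handy for comparing results. The following is a minimal single-channel sketch, not the author's code: bilateralReference is a hypothetical name, the image is assumed to be a row-major array of floats in [0, 1], and out-of-range reads are clamped to the border in the same way as CLK_ADDRESS_CLAMP_TO_EDGE. It omits the final alpha blend.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical single-channel CPU reference for the bilateral filter.
// `img` holds width * height values in [0, 1], stored row by row.
std::vector<float> bilateralReference(const std::vector<float>& img,
                                      int width, int height,
                                      int ksize, float sigmaD, float sigmaR)
{
    std::vector<float> out(img.size());
    const int half = ksize / 2;
    for (int y = 0; y < height; ++y)
    {
        for (int x = 0; x < width; ++x)
        {
            const float fij = img[y * width + x];
            float numerator = 0.0f;
            float denominator = 0.0f;
            for (int k = -half; k <= half; ++k)
            {
                for (int l = -half; l <= half; ++l)
                {
                    // Clamp neighbor coordinates to the image border.
                    const int xx = std::min(std::max(x + k, 0), width - 1);
                    const int yy = std::min(std::max(y + l, 0), height - 1);
                    const float fkl = img[yy * width + xx];

                    // Exponents of the domain and range kernels.
                    const float d = -(k * k + l * l) / (2.0f * sigmaD * sigmaD);
                    const float r = -(fij - fkl) * (fij - fkl) / (2.0f * sigmaR * sigmaR);
                    const float w = std::exp(d + r);

                    numerator += fkl * w;
                    denominator += w;
                }
            }
            // The center pixel always contributes weight 1, so denominator >= 1.
            out[y * width + x] = numerator / denominator;
        }
    }
    return out;
}
```

With a small range sigma such as 0.1, a hard step edge passes through almost unchanged, which is the edge-preserving behavior described in the principle section.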

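2. Creating OpenCL memory objects from D3D buffers and textures

    The clCreateFromD3D10*KHR() OpenCL functions create OpenCL buffer and image objects from existing D3D buffer objects and textures.

    Use clCreateFromD3D10BufferKHR() to create an OpenCL memory object from an existing D3D buffer: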

cl_mem clCreateFromD3D10BufferKHR(
    cl_context     context,
    cl_mem_flags   flags,
    ID3D10Buffer * resource,
    cl_int *       errcode_ret)

   
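The size of the returned OpenCL buffer object is the same as the size of resource. This call increments the internal Direct3D reference count on resource by 1. When the OpenCL reference count on the returned OpenCL memory object drops to 0, the internal Direct3D reference count on resource is decremented by 1.

    Both buffers and textures can be shared with OpenCL.

    In D3D10, a texture can be created as follows: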

// 2D texture
D3D10_TEXTURE2D_DESC desc;
ZeroMemory( &desc, sizeof(D3D10_TEXTURE2D_DESC) );
desc.Width = g_WindowWidth;
desc.Height = g_WindowHeight;
desc.MipLevels = 1;
desc.ArraySize = 1;
desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
desc.SampleDesc.Count = 1;
desc.Usage = D3D10_USAGE_DEFAULT;
desc.BindFlags = D3D10_BIND_SHADER_RESOURCE;
if (FAILED(g_pD3DDevice->CreateTexture2D( &desc, NULL, &g_pTexture2D)))
    return E_FAIL;

The format of this shared texture is DXGI_FORMAT_R8G8B8A8_UNORM. You can then use

cl_mem clCreateFromD3D10Texture2DKHR(
    cl_context        context,
    cl_mem_flags      flags,
    ID3D10Texture2D * resource,
    UINT              subresource,
    cl_int *          errcode_ret)

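to create an OpenCL image object. The width, height and depth of the returned OpenCL image object are determined by the width, height and depth of subresource subresource of resource. The channel type and channel order of the returned OpenCL image object are determined by the format of resource.

This call increments the internal Direct3D reference count on resource by 1. When the OpenCL reference count on the returned OpenCL memory object drops to 0, the internal Direct3D reference count on resource is decremented by 1.

    There is a similar function for 3D textures: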

cl_mem clCreateFromD3D10Texture3DKHR(
    cl_context        context,
    cl_mem_flags      flags,
    ID3D10Texture3D * resource,
    UINT              subresource,
    cl_int *          errcode_ret)

 

OpenCL memory objects created from D3D10 resources must be acquired before OpenCL commands can use them. The acquire function is:

cl_int clEnqueueAcquireD3D10ObjectsKHR(
    cl_command_queue command_queue,
    cl_uint          num_objects,
    const cl_mem *   mem_objects,
    cl_uint          num_events_in_wait_list,
    const cl_event * event_wait_list,
    cl_event *       event)

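This acquires OpenCL memory objects that were created from Direct3D 10 resources. clEnqueueAcquireD3D10ObjectsKHR() provides a synchronization guarantee: all D3D 10 calls made before clEnqueueAcquireD3D10ObjectsKHR() is called must finish executing before event reports completion and before any subsequent OpenCL work in command_queue begins executing.

       The release function is: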

cl_int clEnqueueReleaseD3D10ObjectsKHR(
    cl_command_queue command_queue,
    cl_uint          num_objects,
    const cl_mem *   mem_objects,
    cl_uint          num_events_in_wait_list,
    const cl_event * event_wait_list,
    cl_event *       event)

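This releases OpenCL memory objects that were created from Direct3D 10 resources. clEnqueueReleaseD3D10ObjectsKHR() provides a synchronization guarantee: D3D 10 calls made after clEnqueueReleaseD3D10ObjectsKHR() is called will not begin executing until all events in event_wait_list have completed and all work already submitted to command_queue has finished executing.

Note that, unlike the D3D10 functions, the OpenGL acquire functions do not provide a synchronization guarantee. Also, when acquiring and releasing textures, it is most efficient to acquire and release all shared textures and resources at the same time. It is also best to finish all OpenCL kernels before switching back to D3D processing. Used this way, the acquire and release calls mark the boundaries between OpenCL processing and D3D processing.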
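One way to keep acquire and release paired is a small scope guard. The sketch below is hypothetical: ScopedAcquire and runFrame are illustrative names, and plain callables stand in for the real clEnqueueAcquireD3D10ObjectsKHR() and clEnqueueReleaseD3D10ObjectsKHR() calls so that the example runs without an OpenCL or D3D device.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical RAII guard: runs `acquire` on construction and `release`
// on destruction, so OpenCL work on shared objects is always bracketed.
class ScopedAcquire
{
public:
    ScopedAcquire(const std::function<void()>& acquire,
                  std::function<void()> release)
        : release_(std::move(release))
    {
        acquire();
    }

    ~ScopedAcquire() { release_(); }

    ScopedAcquire(const ScopedAcquire&) = delete;
    ScopedAcquire& operator=(const ScopedAcquire&) = delete;

private:
    std::function<void()> release_;
};

// Illustrative frame loop body; the lambdas are stand-ins for the real
// enqueue calls and only record the order in which steps run.
std::vector<std::string> runFrame()
{
    std::vector<std::string> log;
    {
        ScopedAcquire guard(
            [&log] { log.push_back("acquire"); },    // stand-in for the acquire call
            [&log] { log.push_back("release"); });   // stand-in for the release call
        log.push_back("kernels");                    // all OpenCL kernels for this frame
    }
    log.push_back("d3d render");                     // D3D work resumes after release
    return log;
}
```

The guard acquires all shared objects once, lets every kernel for the frame run, and releases them in one step before D3D rendering continues, which follows the advice above.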

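3. Host-side code

    OpenCL code is split into host-side code and device-side code. Kernels run on the parallel device. The host is generally suited to serial, logic-heavy code, while the device is better suited to computation such as convolution. In a computer, the CPU is usually the host and the GPU the device. In fact, a CPU can also act as a device, since Intel supports OpenCL as well. This article uses the CPU as the host and the GPU as the device.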
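One small host-side detail ties back to the kernel: its early return for x or y outside the image suggests that the global work size may be larger than the image. A common reason is rounding each dimension up to a multiple of the work-group size. The sketch below assumes that approach; roundUp is a hypothetical helper, and the work-group size of 16 used in the note after it is an assumption, not a value taken from the original code.

```cpp
#include <cstddef>

// Hypothetical helper: smallest multiple of `groupSize` that is >= `size`.
// `groupSize` must be non-zero.
std::size_t roundUp(std::size_t size, std::size_t groupSize)
{
    const std::size_t remainder = size % groupSize;
    return remainder == 0 ? size : size + (groupSize - remainder);
}
```

For a 641-pixel-wide image and an assumed work-group width of 16, roundUp(641, 16) gives 656, so the extra 15 work-items in each row rely on the kernel's bounds check to exit without touching the image.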

#include "stdafx.h"

#include <iostream>  
#include <fstream>  
#include <sstream>  
#include <malloc.h> 
#include <string.h>  
#include <opencv2/opencv.hpp>  

#include <CL/cl.h>  


//---------- Query OpenCL platform and device information ----------

void DisplayPlatformInfo(
    cl_platform_id id,
    cl_platform_info name,
    std::string str)
{
    cl_int errNum;
    std::size_t paramValueSize;

    errNum = clGetPlatformInfo(
        id,
        name,
        0,
        NULL,
        &paramValueSize);
    if (errNum != CL_SUCCESS)
    {
        std::cerr << "Failed to find OpenCL platform " << str << "." << std::endl;
        return;
    }

    char * info = (char *)alloca(sizeof(char) * paramValueSize);
    errNum = clGetPlatformInfo(
        id,
        name,
        paramValueSize,
        info,
        NULL);
    if (errNum != CL_SUCCESS)
    {
        std::cerr << "Failed to find OpenCL platform " << str << "." << std::endl;
        return;
    }

    std::cout << "\t" << str << ":\t" << info << std::endl;
}

template<typename T>
void appendBitfield(T info, T value, std::string name, std::string & str)
{
    if (info & value)
    {
        if (str.length() > 0)
        {
            str.append(" | ");
        }
        str.append(name);
    }
}

///
// Display information for a particular device.
// As different calls to clGetDeviceInfo may return
// values of different types a template is used. 
// As some values returned are arrays of values, a templated class is
// used so it can be specialized for this case, see below.
//
template <typename T>
class InfoDevice
{
public:
    static void display(
        cl_device_id id,
        cl_device_info name,
        std::string str)
    {
        cl_int errNum;
        std::size_t paramValueSize;

        errNum = clGetDeviceInfo(
            id,
            name,
            0,
            NULL,
            &paramValueSize);
        if (errNum != CL_SUCCESS)
        {
            std::cerr << "Failed to find OpenCL device info " << str << "." << std::endl;
            return;
        }

        // paramValueSize is already a byte count, so no sizeof(T) factor is needed.
        T * info = (T *)alloca(paramValueSize);
        errNum = clGetDeviceInfo(
            id,
            name,
            paramValueSize,
            info,
            NULL);
        if (errNum != CL_SUCCESS)
        {
            std::cerr << "Failed to find OpenCL device info " << str << "." << std::endl;
            return;
        }

        // Handle a few special cases
        switch (name)
        {
        case CL_DEVICE_TYPE:
        {
            std::string deviceType;

            appendBitfield<cl_device_type>(
                *(reinterpret_cast<cl_device_type*>(info)),
                CL_DEVICE_TYPE_CPU,
                "CL_DEVICE_TYPE_CPU",
                deviceType);

            appendBitfield<cl_device_type>(
                *(reinterpret_cast<cl_device_type*>(info)),
                CL_DEVICE_TYPE_GPU,
                "CL_DEVICE_TYPE_GPU",
                deviceType);

            appendBitfield<cl_device_type>(
                *(reinterpret_cast<cl_device_type*>(info)),
                CL_DEVICE_TYPE_ACCELERATOR,
                "CL_DEVICE_TYPE_ACCELERATOR",
                deviceType);

            appendBitfield<cl_device_type>(
                *(reinterpret_cast<cl_device_type*>(info)),
                CL_DEVICE_TYPE_DEFAULT,
                "CL_DEVICE_TYPE_DEFAULT",
                deviceType);

            std::cout << "\t\t" << str << ":\t" << deviceType << std::endl;
        }
            break;
        case CL_DEVICE_SINGLE_FP_CONFIG:
        {
            std::string fpType;

            appendBitfield<cl_device_fp_config>(
                *(reinterpret_cast<cl_device_fp_config*>(info)),
                CL_FP_DENORM,
                "CL_FP_DENORM",
                fpType);

            appendBitfield<cl_device_fp_config>(
                *(reinterpret_cast<cl_device_fp_config*>(info)),
                CL_FP_INF_NAN,
                "CL_FP_INF_NAN",
                fpType);

            appendBitfield<cl_device_fp_config>(
                *(reinterpret_cast<cl_device_fp_config*>(info)),
                CL_FP_ROUND_TO_NEAREST,
                "CL_FP_ROUND_TO_NEAREST",
                fpType);

            appendBitfield<cl_device_fp_config>(
                *(reinterpret_cast<cl_device_fp_config*>(info)),
                CL_FP_ROUND_TO_ZERO,
                "CL_FP_ROUND_TO_ZERO",
                fpType);

            appendBitfield<cl_device_fp_config>(
                *(reinterpret_cast<cl_device_fp_config*>(info)),
                CL_FP_ROUND_TO_INF,
                "CL_FP_ROUND_TO_INF",
                fpType);

            appendBitfield<cl_device_fp_config>(
                *(reinterpret_cast<cl_device_fp_config*>(info)),
                CL_FP_FMA,
                "CL_FP_FMA",
                fpType);

#ifdef CL_FP_SOFT_FLOAT
            appendBitfield<cl_device_fp_config>(
                *(reinterpret_cast<cl_device_fp_config*>(info)),
                CL_FP_SOFT_FLOAT,
                "CL_FP_SOFT_FLOAT",
                fpType);
#endif

            std::cout << "\t\t" << str << ":\t" << fpType << std::endl;
        }
            break;
        case CL_DEVICE_GLOBAL_MEM_CACHE_TYPE:
        {
            std::string memType;

            appendBitfield<cl_device_mem_cache_type>(
                *(reinterpret_cast<cl_device_mem_cache_type*>(info)),
                CL_NONE,
                "CL_NONE",
                memType);
            appendBitfield<cl_device_mem_cache_type>(
                *(reinterpret_cast<cl_device_mem_cache_type*>(info)),
                CL_READ_ONLY_CACHE,
                "CL_READ_ONLY_CACHE",
                memType);

            appendBitfield<cl_device_mem_cache_type>(
                *(reinterpret_cast<cl_device_mem_cache_type*>(info)),
                CL_READ_WRITE_CACHE,
                "CL_READ_WRITE_CACHE",
                memType);

            std::cout << "\t\t" << str << ":\t" << memType << std::endl;
        }
            break;
        case CL_DEVICE_LOCAL_MEM_TYPE:
        {
            std::string memType;

            appendBitfield<cl_device_local_mem_type>(
                *(reinterpret_cast<cl_device_local_mem_type*>(info)),
                CL_LOCAL,
                "CL_LOCAL",
                memType);

            appendBitfield<cl_device_local_mem_type>(
                *(reinterpret_cast<cl_device_local_mem_type*>(info)),
                CL_GLOBAL,
                "CL_GLOBAL",
                memType);

            std::cout << "\t\t" << str << ":\t" << memType << std::endl;
        }
            break;
        case CL_DEVICE_EXECUTION_CAPABILITIES:
        {
            std::string memType;

            appendBitfield<cl_device_exec_capabilities>(
                *(reinterpret_cast<cl_device_exec_capabilities*>(info)),
                CL_EXEC_KERNEL,
                "CL_EXEC_KERNEL",
                memType);

            appendBitfield<cl_device_exec_capabilities>(
                *(reinterpret_cast<cl_device_exec_capabilities*>(info)),
                CL_EXEC_NATIVE_KERNEL,
                "CL_EXEC_NATIVE_KERNEL",
                memType);

            std::cout << "\t\t" << str << ":\t" << memType << std::endl;
        }
            break;
        case CL_DEVICE_QUEUE_PROPERTIES:
        {
            std::string memType;

            appendBitfield<cl_command_queue_properties>(
                *(reinterpret_cast<cl_command_queue_properties*>(info)),
                CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE,
                "CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE",
                memType);

            appendBitfield<cl_command_queue_properties>(
                *(reinterpret_cast<cl_command_queue_properties*>(info)),
                CL_QUEUE_PROFILING_ENABLE,
                "CL_QUEUE_PROFILING_ENABLE",
                memType);

            std::cout << "\t\t" << str << ":\t" << memType << std::endl;
        }
            break;
        default:
            std::cout << "\t\t" << str << ":\t" << *info << std::endl;
            break;
        }
    }
};

///
// Simple trait class used to wrap base types.
//
template <typename T>
class ArrayType
{
public:
    static bool isChar() { return false; }
};

///
// Specialized for the char (i.e. null terminated string case).
//
template<>
class ArrayType<char>
{
public:
    static bool isChar() { return true; }
};

///
// Specialized instance of class InfoDevice for array types.
//
template <typename T>
class InfoDevice<ArrayType<T> >
{
public:
    static void display(
        cl_device_id id,
        cl_device_info name,
        std::string str)
    {
        cl_int errNum;
        std::size_t paramValueSize;

        errNum = clGetDeviceInfo(
            id,
            name,
            0,
            NULL,
            &paramValueSize);
        if (errNum != CL_SUCCESS)
        {
            std::cerr
                << "Failed to find OpenCL device info "
                << str
                << "."
                << std::endl;
            return;
        }

        // paramValueSize is already a byte count, so no sizeof(T) factor is needed.
        T * info = (T *)alloca(paramValueSize);
        errNum = clGetDeviceInfo(
            id,
            name,
            paramValueSize,
            info,
            NULL);
        if (errNum != CL_SUCCESS)
        {
            std::cerr
                << "Failed to find OpenCL device info "
                << str
                << "."
                << std::endl;
            return;
        }

        if (ArrayType<T>::isChar())
        {
            std::cout << "\t" << str << ":\t" << info << std::endl;
        }
        else if (name == CL_DEVICE_MAX_WORK_ITEM_SIZES)
        {
            cl_uint maxWorkItemDimensions;

            errNum = clGetDeviceInfo(
                id,
                CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS,
                sizeof(cl_uint),
                &maxWorkItemDimensions,
                NULL);
            if (errNum != CL_SUCCESS)
            {
                std::cerr
                    << "Failed to find OpenCL device info "
                    << "CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS."
                    << std::endl;
                return;
            }

            std::cout << "\t" << str << ":\t";
            for (cl_uint i = 0; i < maxWorkItemDimensions; i++)
            {
                std::cout << info[i] << " ";
            }
            std::cout << std::endl;
        }
    }
};

///
//  Enumerate platforms and display information about them 
//  and their associated devices.
//
void displayInfo(void)
{
    cl_int errNum;
    cl_uint numPlatforms;
    cl_platform_id * platformIds;
    cl_context context = NULL;

    // First, query the total number of platforms
    errNum = clGetPlatformIDs(0, NULL, &numPlatforms);
    if (errNum != CL_SUCCESS || numPlatforms <= 0)
    {
        std::cerr << "Failed to find any OpenCL platform." << std::endl;
        return;
    }

    // Next, allocate memory for the installed platforms, and query
    // to get the list.
    platformIds = (cl_platform_id *)alloca(sizeof(cl_platform_id) * numPlatforms);
    // Then retrieve the platform IDs themselves
    errNum = clGetPlatformIDs(numPlatforms, platformIds, NULL);
    if (errNum != CL_SUCCESS)
    {
        std::cerr << "Failed to find any OpenCL platforms." << std::endl;
        return;
    }

    std::cout << "Number of platforms: \t" << numPlatforms << std::endl;
    // Iterate through the list of platforms displaying associated information
    for (cl_uint i = 0; i < numPlatforms; i++) {
        // First we display information associated with the platform
        DisplayPlatformInfo(
            platformIds[i],
            CL_PLATFORM_PROFILE,
            "CL_PLATFORM_PROFILE");
        DisplayPlatformInfo(
            platformIds[i],
            CL_PLATFORM_VERSION,
            "CL_PLATFORM_VERSION");
        DisplayPlatformInfo(
            platformIds[i],
            CL_PLATFORM_VENDOR,
            "CL_PLATFORM_VENDOR");
        DisplayPlatformInfo(
            platformIds[i],
            CL_PLATFORM_EXTENSIONS,
            "CL_PLATFORM_EXTENSIONS");

        // Now query the set of devices associated with the platform
        cl_uint numDevices;
        errNum = clGetDeviceIDs(
            platformIds[i],
            CL_DEVICE_TYPE_ALL,
            0,
            NULL,
            &numDevices);
        if (errNum != CL_SUCCESS)
        {
            std::cerr << "Failed to find OpenCL devices." << std::endl;
            return;
        }

        cl_device_id * devices = (cl_device_id *)alloca(sizeof(cl_device_id) * numDevices);
        errNum = clGetDeviceIDs(
            platformIds[i],
            CL_DEVICE_TYPE_ALL,
            numDevices,
            devices,
            NULL);
        if (errNum != CL_SUCCESS)
        {
            std::cerr << "Failed to find OpenCL devices." << std::endl;
            return;
        }

        std::cout << "\tNumber of devices: \t" << numDevices << std::endl;
        // Iterate through each device, displaying associated information
        for (cl_uint j = 0; j < numDevices; j++)
        {
            InfoDevice<cl_device_type>::display(
                devices[j],
                CL_DEVICE_TYPE,
                "CL_DEVICE_TYPE");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_VENDOR_ID,
                "CL_DEVICE_VENDOR_ID");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_MAX_COMPUTE_UNITS,
                "CL_DEVICE_MAX_COMPUTE_UNITS");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS,
                "CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS");

            InfoDevice<ArrayType<size_t> >::display(
                devices[j],
                CL_DEVICE_MAX_WORK_ITEM_SIZES,
                "CL_DEVICE_MAX_WORK_ITEM_SIZES");

            InfoDevice<std::size_t>::display(
                devices[j],
                CL_DEVICE_MAX_WORK_GROUP_SIZE,
                "CL_DEVICE_MAX_WORK_GROUP_SIZE");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR,
                "CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT,
                "CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT,
                "CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG,
                "CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT,
                "CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE,
                "CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE");

#ifdef CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF,
                "CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR,
                "CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT,
                "CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_NATIVE_VECTOR_WIDTH_INT,
                "CL_DEVICE_NATIVE_VECTOR_WIDTH_INT");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG,
                "CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT,
                "CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE,
                "CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF,
                "CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF");
#endif

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_MAX_CLOCK_FREQUENCY,
                "CL_DEVICE_MAX_CLOCK_FREQUENCY");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_ADDRESS_BITS,
                "CL_DEVICE_ADDRESS_BITS");

            InfoDevice<cl_ulong>::display(
                devices[j],
                CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                "CL_DEVICE_MAX_MEM_ALLOC_SIZE");

            InfoDevice<cl_bool>::display(
                devices[j],
                CL_DEVICE_IMAGE_SUPPORT,
                "CL_DEVICE_IMAGE_SUPPORT");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_MAX_READ_IMAGE_ARGS,
                "CL_DEVICE_MAX_READ_IMAGE_ARGS");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_MAX_WRITE_IMAGE_ARGS,
                "CL_DEVICE_MAX_WRITE_IMAGE_ARGS");

            InfoDevice<std::size_t>::display(
                devices[j],
                CL_DEVICE_IMAGE2D_MAX_WIDTH,
                "CL_DEVICE_IMAGE2D_MAX_WIDTH");

            InfoDevice<std::size_t>::display(
                devices[j],
                CL_DEVICE_IMAGE2D_MAX_HEIGHT,
                "CL_DEVICE_IMAGE2D_MAX_HEIGHT");

            InfoDevice<std::size_t>::display(
                devices[j],
                CL_DEVICE_IMAGE3D_MAX_WIDTH,
                "CL_DEVICE_IMAGE3D_MAX_WIDTH");

            InfoDevice<std::size_t>::display(
                devices[j],
                CL_DEVICE_IMAGE3D_MAX_HEIGHT,
                "CL_DEVICE_IMAGE3D_MAX_HEIGHT");

            InfoDevice<std::size_t>::display(
                devices[j],
                CL_DEVICE_IMAGE3D_MAX_DEPTH,
                "CL_DEVICE_IMAGE3D_MAX_DEPTH");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_MAX_SAMPLERS,
                "CL_DEVICE_MAX_SAMPLERS");

            InfoDevice<std::size_t>::display(
                devices[j],
                CL_DEVICE_MAX_PARAMETER_SIZE,
                "CL_DEVICE_MAX_PARAMETER_SIZE");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_MEM_BASE_ADDR_ALIGN,
                "CL_DEVICE_MEM_BASE_ADDR_ALIGN");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE,
                "CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE");

            InfoDevice<cl_device_fp_config>::display(
                devices[j],
                CL_DEVICE_SINGLE_FP_CONFIG,
                "CL_DEVICE_SINGLE_FP_CONFIG");

            InfoDevice<cl_device_mem_cache_type>::display(
                devices[j],
                CL_DEVICE_GLOBAL_MEM_CACHE_TYPE,
                "CL_DEVICE_GLOBAL_MEM_CACHE_TYPE");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE,
                "CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE");

            InfoDevice<cl_ulong>::display(
                devices[j],
                CL_DEVICE_GLOBAL_MEM_CACHE_SIZE,
                "CL_DEVICE_GLOBAL_MEM_CACHE_SIZE");

            InfoDevice<cl_ulong>::display(
                devices[j],
                CL_DEVICE_GLOBAL_MEM_SIZE,
                "CL_DEVICE_GLOBAL_MEM_SIZE");

            InfoDevice<cl_ulong>::display(
                devices[j],
                CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE,
                "CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_MAX_CONSTANT_ARGS,
                "CL_DEVICE_MAX_CONSTANT_ARGS");

            InfoDevice<cl_device_local_mem_type>::display(
                devices[j],
                CL_DEVICE_LOCAL_MEM_TYPE,
                "CL_DEVICE_LOCAL_MEM_TYPE");

            InfoDevice<cl_ulong>::display(
                devices[j],
                CL_DEVICE_LOCAL_MEM_SIZE,
                "CL_DEVICE_LOCAL_MEM_SIZE");

            InfoDevice<cl_bool>::display(
                devices[j],
                CL_DEVICE_ERROR_CORRECTION_SUPPORT,
                "CL_DEVICE_ERROR_CORRECTION_SUPPORT");

#ifdef CL_DEVICE_HOST_UNIFIED_MEMORY
            InfoDevice<cl_bool>::display(
                devices[j],
                CL_DEVICE_HOST_UNIFIED_MEMORY,
                "CL_DEVICE_HOST_UNIFIED_MEMORY");
#endif

            InfoDevice<std::size_t>::display(
                devices[j],
                CL_DEVICE_PROFILING_TIMER_RESOLUTION,
                "CL_DEVICE_PROFILING_TIMER_RESOLUTION");

            InfoDevice<cl_bool>::display(
                devices[j],
                CL_DEVICE_ENDIAN_LITTLE,
                "CL_DEVICE_ENDIAN_LITTLE");

            InfoDevice<cl_bool>::display(
                devices[j],
                CL_DEVICE_AVAILABLE,
                "CL_DEVICE_AVAILABLE");

            InfoDevice<cl_bool>::display(
                devices[j],
                CL_DEVICE_COMPILER_AVAILABLE,
                "CL_DEVICE_COMPILER_AVAILABLE");

            InfoDevice<cl_device_exec_capabilities>::display(
                devices[j],
                CL_DEVICE_EXECUTION_CAPABILITIES,
                "CL_DEVICE_EXECUTION_CAPABILITIES");

            InfoDevice<cl_command_queue_properties>::display(
                devices[j],
                CL_DEVICE_QUEUE_PROPERTIES,
                "CL_DEVICE_QUEUE_PROPERTIES");

            InfoDevice<cl_platform_id>::display(
                devices[j],
                CL_DEVICE_PLATFORM,
                "CL_DEVICE_PLATFORM");

            InfoDevice<ArrayType<char> >::display(
                devices[j],
                CL_DEVICE_NAME,
                "CL_DEVICE_NAME");

            InfoDevice<ArrayType<char> >::display(
                devices[j],
                CL_DEVICE_VENDOR,
                "CL_DEVICE_VENDOR");

            InfoDevice<ArrayType<char> >::display(
                devices[j],
                CL_DRIVER_VERSION,
                "CL_DRIVER_VERSION");

            InfoDevice<ArrayType<char> >::display(
                devices[j],
                CL_DEVICE_PROFILE,
                "CL_DEVICE_PROFILE");

            InfoDevice<ArrayType<char> >::display(
                devices[j],
                CL_DEVICE_VERSION,
                "CL_DEVICE_VERSION");

#ifdef CL_DEVICE_OPENCL_C_VERSION
            InfoDevice<ArrayType<char> >::display(
                devices[j],
                CL_DEVICE_OPENCL_C_VERSION,
                "CL_DEVICE_OPENCL_C_VERSION");
#endif

            InfoDevice<ArrayType<char> >::display(
                devices[j],
                CL_DEVICE_EXTENSIONS,
                "CL_DEVICE_EXTENSIONS");


            std::cout << std::endl << std::endl;
        }
    }
}

//-----------The code above retrieves and displays OpenCL device information------------------

cl_program CreateProgram(cl_context context, cl_device_id device, const char* fileName)  
{  
    cl_int errNum;  
    cl_program program;  

    std::ifstream kernelFile(fileName, std::ios::in);  
    if (!kernelFile.is_open())  
    {  
        std::cerr << "Failed to open file for reading: " << fileName << std::endl;  
        return NULL;  
    }  

    std::ostringstream oss;  
    oss << kernelFile.rdbuf();  

    std::string srcStdStr = oss.str();  
    const char *srcStr = srcStdStr.c_str();  
    program = clCreateProgramWithSource(context, 1,  
        (const char**)&srcStr,  
        NULL, NULL);  
    if (program == NULL)  
    {  
        std::cerr << "Failed to create CL program from source." << std::endl;  
        return NULL;  
    }  

    errNum = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);  
    if (errNum != CL_SUCCESS)  
    {  
        // Determine the reason for the error  
        char buildLog[16384];  
        clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,  
            sizeof(buildLog), buildLog, NULL);  

        std::cerr << "Error in kernel: " << std::endl;  
        std::cerr << buildLog;  
        clReleaseProgram(program);  
        return NULL;  
    }  

    return program;  
}  


void Cleanup(cl_context context, cl_command_queue commandQueue,  
             cl_program program, cl_kernel kernel, cl_mem imageObjects[2])  
{  
    for (int i = 0; i < 2; i++)  
    {  
        if (imageObjects[i] != 0)  
            clReleaseMemObject(imageObjects[i]);  
    }  
    if (commandQueue != 0)  
        clReleaseCommandQueue(commandQueue);  

    if (kernel != 0)  
        clReleaseKernel(kernel);  

    if (program != 0)  
        clReleaseProgram(program);  

    if (context != 0)  
        clReleaseContext(context);  

}  

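cl_mem LoadImage(cl_context context, const char *fileName, int &width, int &height)
{
    cv::Mat image1 = cv::imread(fileName);
    if (image1.empty())
    {
        std::cerr << "Failed to read image: " << fileName << std::endl;
        return 0;
    }
    width = image1.cols;
    height = image1.rows;

    // Pack the 3-channel image into a 4-channel buffer, bottom row first.
    // OpenCV stores pixels as B, G, R; main() unpacks the result in the
    // same order, so the round trip stays consistent.
    unsigned char *buffer = new unsigned char[width * height * 4];
    int w = 0;
    for (int v = height - 1; v >= 0; v--)
    {
        for (int u = 0; u < width; u++)
        {
            buffer[w++] = image1.at<cv::Vec3b>(v, u)[0];
            buffer[w++] = image1.at<cv::Vec3b>(v, u)[1];
            buffer[w++] = image1.at<cv::Vec3b>(v, u)[2];
            buffer[w++] = 255;  // opaque alpha instead of an uninitialized byte
        }
    }

    // Create OpenCL image
    cl_image_format clImageFormat;
    clImageFormat.image_channel_order = CL_RGBA;
    clImageFormat.image_channel_data_type = CL_UNORM_INT8;

    cl_int errNum;
    cl_mem clImage;
    clImage = clCreateImage2D(context,
        CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
        &clImageFormat,
        width,
        height,
        0,
        buffer,
        &errNum);

    // CL_MEM_COPY_HOST_PTR copies the pixels at creation time,
    // so the host-side staging buffer can be released here.
    delete[] buffer;

    if (errNum != CL_SUCCESS)
    {
        std::cerr << "Error creating CL image object" << std::endl;
        return 0;
    }

    return clImage;
}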

size_t RoundUp(int groupSize, int globalSize)  
{  
    int r = globalSize % groupSize;  
    if (r == 0)  
    {  
        return globalSize;  
    }  
    else  
    {  
        return globalSize + groupSize - r;  
    }  
}  

int main(int argc, char** argv)  
{  
    cl_context context = 0;  
    cl_command_queue commandQueue = 0;  
    cl_program program = 0;  
    cl_device_id device = 0;  
    cl_kernel kernel = 0;  
    cl_mem imageObjects[2] = { 0, 0 };  
    cl_int errNum;  

    // Print information about every OpenCL platform and device
    displayInfo();

    cl_uint numplatforms;
    errNum = clGetPlatformIDs(0, NULL, &numplatforms);
    if (errNum != CL_SUCCESS || numplatforms <= 0){
        printf("No OpenCL platform found\n");
        return 1;
    }

    cl_platform_id * platformIds;
    platformIds = (cl_platform_id*)alloca(sizeof(cl_platform_id)*numplatforms);
    errNum = clGetPlatformIDs(numplatforms, platformIds, NULL);
    if (errNum != CL_SUCCESS){
        printf("Failed to get the OpenCL platform IDs\n");
        return 1;
    }
    printf("Number of platforms: %u\n", numplatforms);

    // Pick the GPU with the largest CL_DEVICE_MAX_WORK_GROUP_SIZE
    cl_uint numDevices, index_platform = 0;
    cl_device_id *devicesIds;
    std::size_t paramValueSize = 0;
    bool deviceFound = false;
    for (cl_uint i = 0; i < numplatforms; i++){
        errNum = clGetDeviceIDs(platformIds[i], CL_DEVICE_TYPE_GPU, 0, NULL, &numDevices);
        if (errNum != CL_SUCCESS || numDevices <= 0){
            printf("No GPU device found on platform %u\n", i);
            continue;
        }
        devicesIds = (cl_device_id*)alloca(sizeof(cl_device_id)*numDevices);
        errNum = clGetDeviceIDs(platformIds[i], CL_DEVICE_TYPE_GPU, numDevices, devicesIds, NULL);
        if (errNum != CL_SUCCESS ){
            printf("Failed to get the device IDs on platform %u\n", i);
            continue;
        }

        for (cl_uint j = 0; j < numDevices; j++){
            std::size_t tmpSize = 0;
            errNum = clGetDeviceInfo(devicesIds[j], CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(size_t), &tmpSize, NULL);
            if (errNum != CL_SUCCESS){
                std::cerr << "Failed to find OpenCL device info " << std::endl;
                continue;
            }

            // Keep the running maximum up to date; without this update
            // the last device enumerated would always be selected.
            if (!deviceFound || tmpSize > paramValueSize){
                paramValueSize = tmpSize;
                index_platform = i;
                device = devicesIds[j];
                deviceFound = true;
            }
        }
    }
    if (!deviceFound){
        std::cerr << "No usable OpenCL GPU device found" << std::endl;
        return 1;
    }

    // Create the context on the selected device only, so the platform and
    // the device passed to clCreateContext always belong together.
    cl_context_properties contextProperties[] ={
        CL_CONTEXT_PLATFORM,
        (cl_context_properties)platformIds[index_platform],
        0
    };
    context = clCreateContext(contextProperties, 1, &device, NULL, NULL, &errNum);
    if (errNum != CL_SUCCESS){
        std::cerr << "Failed to Create Context " << std::endl;
        return 1;
    }

    // Create a command-queue on the selected device
    // in the created context
    commandQueue = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, &errNum);
    if (commandQueue == NULL)  {  
        Cleanup(context, commandQueue, program, kernel, imageObjects);  
         system("pause") ; return 1; 
    }  

    // Make sure the device supports images, otherwise exit  
    cl_bool imageSupport = CL_FALSE;  
    clGetDeviceInfo(device, CL_DEVICE_IMAGE_SUPPORT, sizeof(cl_bool), &imageSupport, NULL);  
    if (imageSupport != CL_TRUE)  {  
        std::cerr << "OpenCL device does not support images." << std::endl;  
        Cleanup(context, commandQueue, program, kernel, imageObjects);  
         system("pause") ; return 1; 
    }  

    // Load input image from file and load it into  
    // an OpenCL image object  
    int width, height;  
    char *src0 = "test.png";
    imageObjects[0] = LoadImage(context, src0, width, height);  
    if (imageObjects[0] == 0)  {  
        std::cerr << "Error loading: " << std::string(src0) << std::endl;  
        Cleanup(context, commandQueue, program, kernel, imageObjects);  
         system("pause") ; return 1; 
    }  

    // Create output image object
    cl_image_format clImageFormat;  
    clImageFormat.image_channel_order = CL_RGBA;  
    clImageFormat.image_channel_data_type = CL_UNORM_INT8;  
    imageObjects[1] = clCreateImage2D(context,  
        CL_MEM_WRITE_ONLY,  
        &clImageFormat,  
        width,  
        height,  
        0,  
        NULL,  
        &errNum);  

    if (errNum != CL_SUCCESS){  
        std::cerr << "Error creating CL output image object." << std::endl;  
        Cleanup(context, commandQueue, program, kernel, imageObjects);  
         system("pause") ; return 1; 
    }  

    // Create OpenCL program  
    program = CreateProgram(context, device, "bilateralBlur.cl");  
    if (program == NULL)  {  
        Cleanup(context, commandQueue, program, kernel, imageObjects);  
         system("pause") ; return 1; 
    }  
    // Create OpenCL kernel  
    kernel = clCreateKernel(program, "bilateralBlur", NULL);  
    if (kernel == NULL)  {  
        std::cerr << "Failed to create kernel" << std::endl;  
        Cleanup(context, commandQueue, program, kernel, imageObjects);  
         system("pause") ; return 1; 
    }  

    // Set the kernel arguments  
    errNum = clSetKernelArg(kernel, 0, sizeof(cl_mem), &imageObjects[0]);  
    errNum |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &imageObjects[1]);  
    if (errNum != CL_SUCCESS)  {  
        std::cerr << "Error setting kernel arguments." << std::endl;  
        Cleanup(context, commandQueue, program, kernel, imageObjects);  
        system("pause") ; return 1; 
    }  

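    // 32 x 32 = 1024 work-items per group; this must not exceed the device's
    // CL_DEVICE_MAX_WORK_GROUP_SIZE, otherwise clEnqueueNDRangeKernel fails.
    size_t localWorkSize[2] = { 32, 32 };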
    size_t globalWorkSize[2] = { RoundUp(localWorkSize[0], width),  
        RoundUp(localWorkSize[1], height) };  

    cl_event prof_event;

    // Queue the kernel up for execution  
    errNum = clEnqueueNDRangeKernel(commandQueue, kernel, 2, NULL,  
        globalWorkSize, localWorkSize,  
        0, NULL, &prof_event);
    if (errNum != CL_SUCCESS)  
    {  
        std::cerr << "Error queuing kernel for execution." << std::endl;  
        Cleanup(context, commandQueue, program, kernel, imageObjects);  
         system("pause") ; return 1; 
    }

    clFinish(commandQueue);
    errNum = clWaitForEvents(1, &prof_event);
    if (errNum)
    {
        printf("clWaitForEvents() failed for bilateralBlur kernel. (%d)\n", errNum);
        Cleanup(context, commandQueue, program, kernel, imageObjects);
        return EXIT_FAILURE;
    }

    cl_ulong ev_start_time = (cl_ulong)0;
    cl_ulong ev_end_time = (cl_ulong)0;
    size_t return_bytes;

    // START..END covers only the kernel's execution on the device;
    // CL_PROFILING_COMMAND_QUEUED would also count time spent waiting in the queue.
    errNum = clGetEventProfilingInfo(prof_event, CL_PROFILING_COMMAND_START,sizeof(cl_ulong), &ev_start_time, &return_bytes);
    errNum |= clGetEventProfilingInfo(prof_event, CL_PROFILING_COMMAND_END,sizeof(cl_ulong), &ev_end_time, &return_bytes);
    if (errNum)
    {
        printf("clGetEventProfilingInfo() failed for kernel. (%d)\n", errNum);
        Cleanup(context, commandQueue, program, kernel, imageObjects);
        return EXIT_FAILURE;
    }

    double run_time = (double)(ev_end_time - ev_start_time);

    printf("Image dimensions: %d x %d pixels, Image type = CL_RGBA, CL_UNORM_INT8\n", width, height);
    printf("Work Timer:%lfms\n", run_time / 1000000);

    clReleaseEvent(prof_event);

    // Read the output buffer back to the Host  
    char *buffer = new char[width * height * 4];  
    size_t origin[3] = { 0, 0, 0 };  
    size_t region[3] = { (size_t)width, (size_t)height, 1 };
    errNum = clEnqueueReadImage(commandQueue, imageObjects[1], CL_TRUE,  
        origin, region, 0, 0, buffer,  
        0, NULL, NULL);  
    if (errNum != CL_SUCCESS)  {  
        std::cerr << "Error reading result buffer." << std::endl;  
        Cleanup(context, commandQueue, program, kernel, imageObjects);  
         system("pause") ; return 1; 
    }  

    std::cout << std::endl;  
    std::cout << "Executed program successfully." << std::endl;

    // Save the image out to disk  
    const char *saveImage = "output.jpg";
    //std::cout << buffer << std::endl;  
    cv::Mat imageColor = cv::imread(src0);  
    cv::Mat imageColor2;  
    imageColor2.create(imageColor.rows, imageColor.cols, imageColor.type());  
    int w = 0;  
    for (int v = imageColor2.rows-1; v >=0; v--)  {  
        for (int u =0 ; u <imageColor2.cols; u++)  {  
            imageColor2.at<cv::Vec3b>(v, u)[0] = buffer[w++];  
            imageColor2.at<cv::Vec3b>(v, u)[1] = buffer[w++];  
            imageColor2.at<cv::Vec3b>(v, u)[2] = buffer[w++];  
            w++;  
        }  
    }

    cv::imshow("Original image", imageColor);
    cv::imshow("After smoothing", imageColor2);
    cv::imwrite(saveImage, imageColor2);  
    cv::waitKey(0);  

    delete[] buffer;  

    Cleanup(context, commandQueue, program, kernel, imageObjects);  

    return 0;  
}
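
A note on the work sizes used in main: the global work size is rounded up to a multiple of the local work size, so some work-items can fall outside the image. The sketch below repeats that rounding in isolation and counts the extra work-items. The names roundUpToMultiple and paddedWorkItems are mine, for illustration only. The kernel source is not shown in this part, so whether bilateralBlur.cl already skips those extra work-items is an assumption to check.

```cpp
#include <cstddef>

// Smallest multiple of groupSize that is >= globalSize
// (the same rounding rule as the RoundUp helper in the listing).
std::size_t roundUpToMultiple(std::size_t groupSize, std::size_t globalSize)
{
    std::size_t r = globalSize % groupSize;
    return (r == 0) ? globalSize : globalSize + groupSize - r;
}

// Number of padded work-items that lie outside a width x height image.
// A kernel normally skips these, for example by returning early when
// get_global_id(0) >= width or get_global_id(1) >= height.
std::size_t paddedWorkItems(std::size_t width, std::size_t height,
                            std::size_t localX, std::size_t localY)
{
    std::size_t gx = roundUpToMultiple(localX, width);
    std::size_t gy = roundUpToMultiple(localY, height);
    return gx * gy - width * height;
}
```

For a 500 x 300 image with 32 x 32 groups, the global size becomes 512 x 320, which leaves 13840 work-items with nothing to do.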

   
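This host-side program uses a little OpenCV, mainly to read the image; any other way of reading an image would work just as well. In fact, OpenCV itself has an ocl module, apparently an OpenCL extension contributed to OpenCV by AMD, which contains OpenCL implementations of many commonly used OpenCV functions, including bilateral filtering and adaptive bilateral filtering.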

   
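This program picks the GPU with the largest CL_DEVICE_MAX_WORK_GROUP_SIZE. Choosing the best OpenCL device really ought to weigh several factors together. On my machine, going by CL_DEVICE_MAX_WORK_GROUP_SIZE alone, the CPU looks like the best OpenCL device, and many of the CPU's reported parameters are better than the GPU's, yet its actual running time is several times that of the GPU. So I am not sure which parameters indicate that an OpenCL device is the best one; if anyone knows, I would appreciate some pointers.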
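
As far as I know, no single value reported by clGetDeviceInfo predicts run time, so the sketch below is only a rough heuristic and not an official metric: prefer GPUs, then rank by compute units times clock frequency. DeviceSummary, roughScore and pickDevice are hypothetical names, and the struct is assumed to be filled from the queries named in the comments. Timing the real kernel on each candidate, as the profiling code in main already does, remains the more reliable test.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical summary of values that clGetDeviceInfo can report.
struct DeviceSummary {
    std::string name;             // CL_DEVICE_NAME
    bool isGpu;                   // CL_DEVICE_TYPE contains CL_DEVICE_TYPE_GPU
    unsigned computeUnits;        // CL_DEVICE_MAX_COMPUTE_UNITS
    unsigned clockMHz;            // CL_DEVICE_MAX_CLOCK_FREQUENCY
    std::size_t maxWorkGroupSize; // CL_DEVICE_MAX_WORK_GROUP_SIZE (not used for ranking)
};

// Rough throughput guess; compute units are not comparable across
// vendors, so treat this as a tie-breaker at best.
double roughScore(const DeviceSummary& d)
{
    return static_cast<double>(d.computeUnits) * d.clockMHz;
}

// Returns the index of the preferred device, or -1 for an empty list.
// A GPU always beats a non-GPU; otherwise the higher rough score wins.
int pickDevice(const std::vector<DeviceSummary>& devices)
{
    int best = -1;
    for (int i = 0; i < static_cast<int>(devices.size()); ++i) {
        if (best < 0) { best = i; continue; }
        const DeviceSummary& a = devices[i];
        const DeviceSummary& b = devices[best];
        bool better = (a.isGpu != b.isGpu) ? a.isGpu
                                           : roughScore(a) > roughScore(b);
        if (better) best = i;
    }
    return best;
}
```

With this rule a CPU that reports a large work-group size no longer wins over a GPU, which matches the timing I observed.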

   
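Also, this program is actually quite simple: the code that does real work is only a little over 300 lines. The device-information code is just there so I can see which OpenCL devices my machine has and what their properties are; the displayInfo(); call in main can safely be commented out.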

   
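As for obtaining the OpenCL library files, they are available from Intel, NVIDIA, AMD and others. The OpenCL headers and lib files I use were copied out of NVIDIA's CUDA toolkit, and you can also just use mine directly.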

3. Host-side code

   
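OpenCL code is split into host-side code and device-side code. Kernels run on the parallel device. The host is generally suited to serial, logic-heavy code, while the device is better suited to computation such as convolution. In a computer, the CPU usually acts as the host and the GPU as the device. The CPU can in fact also serve as a device, since Intel supports OpenCL too. This article uses the CPU as the host and the GPU as the device.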
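
To make the host/device split concrete, here is a plain C++ emulation, under my own simplifying assumptions, of what launching a kernel amounts to. boxBlurAt plays the role of a kernel body that handles one output pixel (a 3x3 box blur, a simple convolution), and runOverRange is a serial stand-in for the launch. Both names are hypothetical. On a real device the calls to the kernel body would run in parallel, one per work-item.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// "Device" side: the work done for ONE output pixel (x, y), like a single
// work-item. 3x3 box blur with coordinates clamped to the image edge.
float boxBlurAt(const std::vector<float>& src, int width, int height,
                int x, int y)
{
    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int sx = std::min(std::max(x + dx, 0), width - 1);
            int sy = std::min(std::max(y + dy, 0), height - 1);
            sum += src[static_cast<std::size_t>(sy) * width + sx];
        }
    }
    return sum / 9.0f;
}

// "Host" side: a serial double loop standing in for the kernel launch.
// Each (x, y) pair corresponds to one work-item's global id.
std::vector<float> runOverRange(const std::vector<float>& src,
                                int width, int height)
{
    std::vector<float> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            dst[static_cast<std::size_t>(y) * width + x] =
                boxBlurAt(src, width, height, x, y);
        }
    }
    return dst;
}
```

Because every output pixel depends only on the input image, the iterations are independent of each other, which is why this kind of computation maps well onto a device.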

#include "stdafx.h"

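#include <cstdio>    // printf
#include <cstdlib>   // system, EXIT_FAILURE
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <malloc.h>  // alloca (MSVC)
#include <string.h>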
#include <opencv2/opencv.hpp>  

#include <CL/cl.h>  


 //----------Query OpenCL platform and device information---------

void DisplayPlatformInfo(
    cl_platform_id id,
    cl_platform_info name,
    std::string str)
{
    cl_int errNum;
    std::size_t paramValueSize;

    errNum = clGetPlatformInfo(
        id,
        name,
        0,
        NULL,
        &paramValueSize);
    if (errNum != CL_SUCCESS)
    {
        std::cerr << "Failed to find OpenCL platform " << str << "." << std::endl;
        return;
    }

    char * info = (char *)alloca(sizeof(char) * paramValueSize);
    errNum = clGetPlatformInfo(
        id,
        name,
        paramValueSize,
        info,
        NULL);
    if (errNum != CL_SUCCESS)
    {
        std::cerr << "Failed to find OpenCL platform " << str << "." << std::endl;
        return;
    }

    std::cout << "\t" << str << ":\t" << info << std::endl;
}

template<typename T>
void appendBitfield(T info, T value, std::string name, std::string & str)
{
    if (info & value)
    {
        if (str.length() > 0)
        {
            str.append(" | ");
        }
        str.append(name);
    }
}

///
// Display information for a particular device.
// As different calls to clGetDeviceInfo may return
// values of different types a template is used. 
// As some values returned are arrays of values, a templated class is
// used so it can be specialized for this case, see below.
//
template <typename T>
class InfoDevice
{
public:
    static void display(
        cl_device_id id,
        cl_device_info name,
        std::string str)
    {
        cl_int errNum;
        std::size_t paramValueSize;

        errNum = clGetDeviceInfo(
            id,
            name,
            0,
            NULL,
            &paramValueSize);
        if (errNum != CL_SUCCESS)
        {
            std::cerr << "Failed to find OpenCL device info " << str << "." << std::endl;
            return;
        }

        T * info = (T *)alloca(sizeof(T) * paramValueSize);
        errNum = clGetDeviceInfo(
            id,
            name,
            paramValueSize,
            info,
            NULL);
        if (errNum != CL_SUCCESS)
        {
            std::cerr << "Failed to find OpenCL device info " << str << "." << std::endl;
            return;
        }

        // Handle a few special cases
        switch (name)
        {
        case CL_DEVICE_TYPE:
        {
            std::string deviceType;

            appendBitfield<cl_device_type>(
                *(reinterpret_cast<cl_device_type*>(info)),
                CL_DEVICE_TYPE_CPU,
                "CL_DEVICE_TYPE_CPU",
                deviceType);

            appendBitfield<cl_device_type>(
                *(reinterpret_cast<cl_device_type*>(info)),
                CL_DEVICE_TYPE_GPU,
                "CL_DEVICE_TYPE_GPU",
                deviceType);

            appendBitfield<cl_device_type>(
                *(reinterpret_cast<cl_device_type*>(info)),
                CL_DEVICE_TYPE_ACCELERATOR,
                "CL_DEVICE_TYPE_ACCELERATOR",
                deviceType);

            appendBitfield<cl_device_type>(
                *(reinterpret_cast<cl_device_type*>(info)),
                CL_DEVICE_TYPE_DEFAULT,
                "CL_DEVICE_TYPE_DEFAULT",
                deviceType);

            std::cout << "\t\t" << str << ":\t" << deviceType << std::endl;
        }
            break;
        case CL_DEVICE_SINGLE_FP_CONFIG:
        {
            std::string fpType;

            appendBitfield<cl_device_fp_config>(
                *(reinterpret_cast<cl_device_fp_config*>(info)),
                CL_FP_DENORM,
                "CL_FP_DENORM",
                fpType);

            appendBitfield<cl_device_fp_config>(
                *(reinterpret_cast<cl_device_fp_config*>(info)),
                CL_FP_INF_NAN,
                "CL_FP_INF_NAN",
                fpType);

            appendBitfield<cl_device_fp_config>(
                *(reinterpret_cast<cl_device_fp_config*>(info)),
                CL_FP_ROUND_TO_NEAREST,
                "CL_FP_ROUND_TO_NEAREST",
                fpType);

            appendBitfield<cl_device_fp_config>(
                *(reinterpret_cast<cl_device_fp_config*>(info)),
                CL_FP_ROUND_TO_ZERO,
                "CL_FP_ROUND_TO_ZERO",
                fpType);

            appendBitfield<cl_device_fp_config>(
                *(reinterpret_cast<cl_device_fp_config*>(info)),
                CL_FP_ROUND_TO_INF,
                "CL_FP_ROUND_TO_INF",
                fpType);

            appendBitfield<cl_device_fp_config>(
                *(reinterpret_cast<cl_device_fp_config*>(info)),
                CL_FP_FMA,
                "CL_FP_FMA",
                fpType);

#ifdef CL_FP_SOFT_FLOAT
            appendBitfield<cl_device_fp_config>(
                *(reinterpret_cast<cl_device_fp_config*>(info)),
                CL_FP_SOFT_FLOAT,
                "CL_FP_SOFT_FLOAT",
                fpType);
#endif

            std::cout << "\t\t" << str << ":\t" << fpType << std::endl;
        }
            break;
        case CL_DEVICE_GLOBAL_MEM_CACHE_TYPE:
        {
            std::string memType;

            appendBitfield<cl_device_mem_cache_type>(
                *(reinterpret_cast<cl_device_mem_cache_type*>(info)),
                CL_NONE,
                "CL_NONE",
                memType);
            appendBitfield<cl_device_mem_cache_type>(
                *(reinterpret_cast<cl_device_mem_cache_type*>(info)),
                CL_READ_ONLY_CACHE,
                "CL_READ_ONLY_CACHE",
                memType);

            appendBitfield<cl_device_mem_cache_type>(
                *(reinterpret_cast<cl_device_mem_cache_type*>(info)),
                CL_READ_WRITE_CACHE,
                "CL_READ_WRITE_CACHE",
                memType);

            std::cout << "\t\t" << str << ":\t" << memType << std::endl;
        }
            break;
        case CL_DEVICE_LOCAL_MEM_TYPE:
        {
            std::string memType;

            appendBitfield<cl_device_local_mem_type>(
                *(reinterpret_cast<cl_device_local_mem_type*>(info)),
                CL_LOCAL,
                "CL_LOCAL",
                memType);

            appendBitfield<cl_device_local_mem_type>(
                *(reinterpret_cast<cl_device_local_mem_type*>(info)),
                CL_GLOBAL,
                "CL_GLOBAL",
                memType);

            std::cout << "\t\t" << str << ":\t" << memType << std::endl;
        }
            break;
        case CL_DEVICE_EXECUTION_CAPABILITIES:
        {
            std::string memType;

            appendBitfield<cl_device_exec_capabilities>(
                *(reinterpret_cast<cl_device_exec_capabilities*>(info)),
                CL_EXEC_KERNEL,
                "CL_EXEC_KERNEL",
                memType);

            appendBitfield<cl_device_exec_capabilities>(
                *(reinterpret_cast<cl_device_exec_capabilities*>(info)),
                CL_EXEC_NATIVE_KERNEL,
                "CL_EXEC_NATIVE_KERNEL",
                memType);

            std::cout << "\t\t" << str << ":\t" << memType << std::endl;
        }
            break;
        case CL_DEVICE_QUEUE_PROPERTIES:
        {
            std::string memType;

            appendBitfield<cl_command_queue_properties>(
                *(reinterpret_cast<cl_command_queue_properties*>(info)),
                CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE,
                "CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE",
                memType);

            appendBitfield<cl_command_queue_properties>(
                *(reinterpret_cast<cl_command_queue_properties*>(info)),
                CL_QUEUE_PROFILING_ENABLE,
                "CL_QUEUE_PROFILING_ENABLE",
                memType);

            std::cout << "\t\t" << str << ":\t" << memType << std::endl;
        }
            break;
        default:
            std::cout << "\t\t" << str << ":\t" << *info << std::endl;
            break;
        }
    }
};

///
// Simple trait class used to wrap base types.
//
template <typename T>
class ArrayType
{
public:
    static bool isChar() { return false; }
};

///
// Specialized for the char (i.e. null terminated string case).
//
template<>
class ArrayType<char>
{
public:
    static bool isChar() { return true; }
};

///
// Specialized instance of class InfoDevice for array types.
//
template <typename T>
class InfoDevice<ArrayType<T> >
{
public:
    static void display(
        cl_device_id id,
        cl_device_info name,
        std::string str)
    {
        cl_int errNum;
        std::size_t paramValueSize;

        errNum = clGetDeviceInfo(
            id,
            name,
            0,
            NULL,
            &paramValueSize);
        if (errNum != CL_SUCCESS)
        {
            std::cerr
                << "Failed to find OpenCL device info "
                << str
                << "."
                << std::endl;
            return;
        }

        T * info = (T *)alloca(sizeof(T) * paramValueSize);
        errNum = clGetDeviceInfo(
            id,
            name,
            paramValueSize,
            info,
            NULL);
        if (errNum != CL_SUCCESS)
        {
            std::cerr
                << "Failed to find OpenCL device info "
                << str
                << "."
                << std::endl;
            return;
        }

        if (ArrayType<T>::isChar())
        {
            std::cout << "\t" << str << ":\t" << info << std::endl;
        }
        else if (name == CL_DEVICE_MAX_WORK_ITEM_SIZES)
        {
            cl_uint maxWorkItemDimensions;

            errNum = clGetDeviceInfo(
                id,
                CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS,
                sizeof(cl_uint),
                &maxWorkItemDimensions,
                NULL);
            if (errNum != CL_SUCCESS)
            {
                std::cerr
                    << "Failed to find OpenCL device info "
                    << "CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS."
                    << std::endl;
                return;
            }

            std::cout << "\t" << str << ":\t";
            for (cl_uint i = 0; i < maxWorkItemDimensions; i++)
            {
                std::cout << info[i] << " ";
            }
            std::cout << std::endl;
        }
    }
};

///
//  Enumerate platforms and display information about them 
//  and their associated devices.
//
void displayInfo(void)
{
    cl_int errNum;
    cl_uint numPlatforms;
    cl_platform_id * platformIds;
    cl_context context = NULL;

    // First, query the total number of platforms
    errNum = clGetPlatformIDs(0, NULL, &numPlatforms);
    if (errNum != CL_SUCCESS || numPlatforms <= 0)
    {
        std::cerr << "Failed to find any OpenCL platform." << std::endl;
        return;
    }

    // Next, allocate memory for the installed platforms, and query
    // to get the list.
    platformIds = (cl_platform_id *)alloca(sizeof(cl_platform_id) * numPlatforms);
    // Then fetch the platform IDs themselves
    errNum = clGetPlatformIDs(numPlatforms, platformIds, NULL);
    if (errNum != CL_SUCCESS)
    {
        std::cerr << "Failed to find any OpenCL platforms." << std::endl;
        return;
    }

    std::cout << "Number of platforms: \t" << numPlatforms << std::endl;
    // Iterate through the list of platforms displaying associated information
    for (cl_uint i = 0; i < numPlatforms; i++) {
        // First we display information associated with the platform
        DisplayPlatformInfo(
            platformIds[i],
            CL_PLATFORM_PROFILE,
            "CL_PLATFORM_PROFILE");
        DisplayPlatformInfo(
            platformIds[i],
            CL_PLATFORM_VERSION,
            "CL_PLATFORM_VERSION");
        DisplayPlatformInfo(
            platformIds[i],
            CL_PLATFORM_VENDOR,
            "CL_PLATFORM_VENDOR");
        DisplayPlatformInfo(
            platformIds[i],
            CL_PLATFORM_EXTENSIONS,
            "CL_PLATFORM_EXTENSIONS");

        // Now query the set of devices associated with the platform
        cl_uint numDevices;
        errNum = clGetDeviceIDs(
            platformIds[i],
            CL_DEVICE_TYPE_ALL,
            0,
            NULL,
            &numDevices);
        if (errNum != CL_SUCCESS)
        {
            std::cerr << "Failed to find OpenCL devices." << std::endl;
            return;
        }

        cl_device_id * devices = (cl_device_id *)alloca(sizeof(cl_device_id) * numDevices);
        errNum = clGetDeviceIDs(
            platformIds[i],
            CL_DEVICE_TYPE_ALL,
            numDevices,
            devices,
            NULL);
        if (errNum != CL_SUCCESS)
        {
            std::cerr << "Failed to find OpenCL devices." << std::endl;
            return;
        }

        std::cout << "\tNumber of devices: \t" << numDevices << std::endl;
        // Iterate through each device, displaying associated information
        for (cl_uint j = 0; j < numDevices; j++)
        {
            InfoDevice<cl_device_type>::display(
                devices[j],
                CL_DEVICE_TYPE,
                "CL_DEVICE_TYPE");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_VENDOR_ID,
                "CL_DEVICE_VENDOR_ID");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_MAX_COMPUTE_UNITS,
                "CL_DEVICE_MAX_COMPUTE_UNITS");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS,
                "CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS");

            InfoDevice<ArrayType<size_t> >::display(
                devices[j],
                CL_DEVICE_MAX_WORK_ITEM_SIZES,
                "CL_DEVICE_MAX_WORK_ITEM_SIZES");

            InfoDevice<std::size_t>::display(
                devices[j],
                CL_DEVICE_MAX_WORK_GROUP_SIZE,
                "CL_DEVICE_MAX_WORK_GROUP_SIZE");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR,
                "CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT,
                "CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT,
                "CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG,
                "CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT,
                "CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE,
                "CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE");

#ifdef CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF,
                "CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR,
                "CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT,
                "CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_NATIVE_VECTOR_WIDTH_INT,
                "CL_DEVICE_NATIVE_VECTOR_WIDTH_INT");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG,
                "CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT,
                "CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE,
                "CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF,
                "CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF");
#endif

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_MAX_CLOCK_FREQUENCY,
                "CL_DEVICE_MAX_CLOCK_FREQUENCY");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_ADDRESS_BITS,
                "CL_DEVICE_ADDRESS_BITS");

            InfoDevice<cl_ulong>::display(
                devices[j],
                CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                "CL_DEVICE_MAX_MEM_ALLOC_SIZE");

            InfoDevice<cl_bool>::display(
                devices[j],
                CL_DEVICE_IMAGE_SUPPORT,
                "CL_DEVICE_IMAGE_SUPPORT");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_MAX_READ_IMAGE_ARGS,
                "CL_DEVICE_MAX_READ_IMAGE_ARGS");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_MAX_WRITE_IMAGE_ARGS,
                "CL_DEVICE_MAX_WRITE_IMAGE_ARGS");

            InfoDevice<std::size_t>::display(
                devices[j],
                CL_DEVICE_IMAGE2D_MAX_WIDTH,
                "CL_DEVICE_IMAGE2D_MAX_WIDTH");

            InfoDevice<std::size_t>::display(
                devices[j],
                CL_DEVICE_IMAGE2D_MAX_WIDTH,
                "CL_DEVICE_IMAGE2D_MAX_WIDTH");

            InfoDevice<std::size_t>::display(
                devices[j],
                CL_DEVICE_IMAGE2D_MAX_HEIGHT,
                "CL_DEVICE_IMAGE2D_MAX_HEIGHT");

            InfoDevice<std::size_t>::display(
                devices[j],
                CL_DEVICE_IMAGE3D_MAX_WIDTH,
                "CL_DEVICE_IMAGE3D_MAX_WIDTH");

            InfoDevice<std::size_t>::display(
                devices[j],
                CL_DEVICE_IMAGE3D_MAX_HEIGHT,
                "CL_DEVICE_IMAGE3D_MAX_HEIGHT");

            InfoDevice<std::size_t>::display(
                devices[j],
                CL_DEVICE_IMAGE3D_MAX_DEPTH,
                "CL_DEVICE_IMAGE3D_MAX_DEPTH");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_MAX_SAMPLERS,
                "CL_DEVICE_MAX_SAMPLERS");

            InfoDevice<std::size_t>::display(
                devices[j],
                CL_DEVICE_MAX_PARAMETER_SIZE,
                "CL_DEVICE_MAX_PARAMETER_SIZE");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_MEM_BASE_ADDR_ALIGN,
                "CL_DEVICE_MEM_BASE_ADDR_ALIGN");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE,
                "CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE");

            InfoDevice<cl_device_fp_config>::display(
                devices[j],
                CL_DEVICE_SINGLE_FP_CONFIG,
                "CL_DEVICE_SINGLE_FP_CONFIG");

            InfoDevice<cl_device_mem_cache_type>::display(
                devices[j],
                CL_DEVICE_GLOBAL_MEM_CACHE_TYPE,
                "CL_DEVICE_GLOBAL_MEM_CACHE_TYPE");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE,
                "CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE");

            InfoDevice<cl_ulong>::display(
                devices[j],
                CL_DEVICE_GLOBAL_MEM_CACHE_SIZE,
                "CL_DEVICE_GLOBAL_MEM_CACHE_SIZE");

            InfoDevice<cl_ulong>::display(
                devices[j],
                CL_DEVICE_GLOBAL_MEM_SIZE,
                "CL_DEVICE_GLOBAL_MEM_SIZE");

            InfoDevice<cl_ulong>::display(
                devices[j],
                CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE,
                "CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE");

            InfoDevice<cl_uint>::display(
                devices[j],
                CL_DEVICE_MAX_CONSTANT_ARGS,
                "CL_DEVICE_MAX_CONSTANT_ARGS");

            InfoDevice<cl_device_local_mem_type>::display(
                devices[j],
                CL_DEVICE_LOCAL_MEM_TYPE,
                "CL_DEVICE_LOCAL_MEM_TYPE");

            InfoDevice<cl_ulong>::display(
                devices[j],
                CL_DEVICE_LOCAL_MEM_SIZE,
                "CL_DEVICE_LOCAL_MEM_SIZE");

            InfoDevice<cl_bool>::display(
                devices[j],
                CL_DEVICE_ERROR_CORRECTION_SUPPORT,
                "CL_DEVICE_ERROR_CORRECTION_SUPPORT");

#ifdef CL_DEVICE_HOST_UNIFIED_MEMORY
            InfoDevice<cl_bool>::display(
                devices[j],
                CL_DEVICE_HOST_UNIFIED_MEMORY,
                "CL_DEVICE_HOST_UNIFIED_MEMORY");
#endif

            InfoDevice<std::size_t>::display(
                devices[j],
                CL_DEVICE_PROFILING_TIMER_RESOLUTION,
                "CL_DEVICE_PROFILING_TIMER_RESOLUTION");

            InfoDevice<cl_bool>::display(
                devices[j],
                CL_DEVICE_ENDIAN_LITTLE,
                "CL_DEVICE_ENDIAN_LITTLE");

            InfoDevice<cl_bool>::display(
                devices[j],
                CL_DEVICE_AVAILABLE,
                "CL_DEVICE_AVAILABLE");

            InfoDevice<cl_bool>::display(
                devices[j],
                CL_DEVICE_COMPILER_AVAILABLE,
                "CL_DEVICE_COMPILER_AVAILABLE");

            InfoDevice<cl_device_exec_capabilities>::display(
                devices[j],
                CL_DEVICE_EXECUTION_CAPABILITIES,
                "CL_DEVICE_EXECUTION_CAPABILITIES");

            InfoDevice<cl_command_queue_properties>::display(
                devices[j],
                CL_DEVICE_QUEUE_PROPERTIES,
                "CL_DEVICE_QUEUE_PROPERTIES");

            InfoDevice<cl_platform_id>::display(
                devices[j],
                CL_DEVICE_PLATFORM,
                "CL_DEVICE_PLATFORM");

            InfoDevice<ArrayType<char> >::display(
                devices[j],
                CL_DEVICE_NAME,
                "CL_DEVICE_NAME");

            InfoDevice<ArrayType<char> >::display(
                devices[j],
                CL_DEVICE_VENDOR,
                "CL_DEVICE_VENDOR");

            InfoDevice<ArrayType<char> >::display(
                devices[j],
                CL_DRIVER_VERSION,
                "CL_DRIVER_VERSION");

            InfoDevice<ArrayType<char> >::display(
                devices[j],
                CL_DEVICE_PROFILE,
                "CL_DEVICE_PROFILE");

            InfoDevice<ArrayType<char> >::display(
                devices[j],
                CL_DEVICE_VERSION,
                "CL_DEVICE_VERSION");

#ifdef CL_DEVICE_OPENCL_C_VERSION
            InfoDevice<ArrayType<char> >::display(
                devices[j],
                CL_DEVICE_OPENCL_C_VERSION,
                "CL_DEVICE_OPENCL_C_VERSION");
#endif

            InfoDevice<ArrayType<char> >::display(
                devices[j],
                CL_DEVICE_EXTENSIONS,
                "CL_DEVICE_EXTENSIONS");


            std::cout << std::endl << std::endl;
        }
    }
}

//----------- The code above queries and displays OpenCL device information ------------------

cl_program CreateProgram(cl_context context, cl_device_id device, const char* fileName)  
{  
    cl_int errNum;  
    cl_program program;  

    std::ifstream kernelFile(fileName, std::ios::in);  
    if (!kernelFile.is_open())  
    {  
        std::cerr << "Failed to open file for reading: " << fileName << std::endl;  
        return NULL;  
    }  

    std::ostringstream oss;  
    oss << kernelFile.rdbuf();  

    std::string srcStdStr = oss.str();  
    const char *srcStr = srcStdStr.c_str();  
    program = clCreateProgramWithSource(context, 1,  
        (const char**)&srcStr,  
        NULL, NULL);  
    if (program == NULL)  
    {  
        std::cerr << "Failed to create CL program from source." << std::endl;  
        return NULL;  
    }  

    errNum = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);  
    if (errNum != CL_SUCCESS)  
    {  
        // Determine the reason for the error  
        char buildLog[16384];  
        clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,  
            sizeof(buildLog), buildLog, NULL);  

        std::cerr << "Error in kernel: " << std::endl;  
        std::cerr << buildLog;  
        clReleaseProgram(program);  
        return NULL;  
    }  

    return program;  
}  


void Cleanup(cl_context context, cl_command_queue commandQueue,  
             cl_program program, cl_kernel kernel, cl_mem imageObjects[2])  
{  
    for (int i = 0; i < 2; i++)  
    {  
        if (imageObjects[i] != 0)  
            clReleaseMemObject(imageObjects[i]);  
    }  
    if (commandQueue != 0)  
        clReleaseCommandQueue(commandQueue);  

    if (kernel != 0)  
        clReleaseKernel(kernel);  

    if (program != 0)  
        clReleaseProgram(program);  

    if (context != 0)  
        clReleaseContext(context);  

}  

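cl_mem LoadImage(cl_context context, const char *fileName, int &width, int &height)
{
    // cv::imread returns an 8-bit, 3-channel BGR image by default;
    // an empty Mat means the file could not be read
    cv::Mat image1 = cv::imread(fileName);
    if (image1.empty())
    {
        std::cerr << "Failed to read image: " << fileName << std::endl;
        return 0;
    }
    width = image1.cols;
    height = image1.rows;

    // Repack to 4 bytes per pixel, bottom row first. The channels stay in
    // OpenCV's BGR order, which is harmless here because main() unpacks the
    // result the same way. The fourth byte is padding, zeroed by the ().
    char *buffer = new char[width * height * 4]();
    int w = 0;
    for (int v = height - 1; v >= 0; v--)
    {
        for (int u = 0; u < width; u++)
        {
            buffer[w++] = image1.at<cv::Vec3b>(v, u)[0];
            buffer[w++] = image1.at<cv::Vec3b>(v, u)[1];
            buffer[w++] = image1.at<cv::Vec3b>(v, u)[2];
            w++;
        }
    }

    // Create OpenCL image
    cl_image_format clImageFormat;
    clImageFormat.image_channel_order = CL_RGBA;
    clImageFormat.image_channel_data_type = CL_UNORM_INT8;

    cl_int errNum;
    cl_mem clImage;
    clImage = clCreateImage2D(context,
        CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
        &clImageFormat,
        width,
        height,
        0,
        buffer,
        &errNum);

    // CL_MEM_COPY_HOST_PTR copies the pixels during creation,
    // so the staging buffer can be freed right away
    delete[] buffer;

    if (errNum != CL_SUCCESS)
    {
        std::cerr << "Error creating CL image object" << std::endl;
        return 0;
    }

    return clImage;
}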

size_t RoundUp(int groupSize, int globalSize)  
{  
    int r = globalSize % groupSize;  
    if (r == 0)  
    {  
        return globalSize;  
    }  
    else  
    {  
        return globalSize + groupSize - r;  
    }  
}  

int main(int argc, char** argv)  
{  
    cl_context context = 0;  
    cl_command_queue commandQueue = 0;  
    cl_program program = 0;  
    cl_device_id device = 0;  
    cl_kernel kernel = 0;  
    cl_mem imageObjects[2] = { 0, 0 };  
    cl_int errNum;  

    // Print information for every OpenCL platform and device
    displayInfo();

    cl_uint numplatforms;
    errNum = clGetPlatformIDs(0, NULL, &numplatforms);
    if (errNum != CL_SUCCESS || numplatforms == 0){
        printf("No OpenCL platform found\n");
        return 1;
    }

    cl_platform_id * platformIds;
    platformIds = (cl_platform_id*)alloca(sizeof(cl_platform_id)*numplatforms);
    errNum = clGetPlatformIDs(numplatforms, platformIds, NULL);
    if (errNum != CL_SUCCESS){
        printf("Failed to get OpenCL platform IDs\n");
        return 1;
    }
    printf("Number of platforms: %u\n", numplatforms);

    // Pick the GPU with the largest CL_DEVICE_MAX_WORK_GROUP_SIZE
    cl_uint numDevices, index_platform = 0;
    cl_device_id *devicesIds;
    std::size_t paramValueSize = 0;
    for (cl_uint i = 0; i < numplatforms; i++){
        errNum = clGetDeviceIDs(platformIds[i], CL_DEVICE_TYPE_GPU, 0, NULL, &numDevices);
        if (errNum != CL_SUCCESS || numDevices == 0){
            printf("Platform %u: no GPU device found\n", i);
            continue;
        }
        devicesIds = (cl_device_id*)alloca(sizeof(cl_device_id)*numDevices);
        errNum = clGetDeviceIDs(platformIds[i], CL_DEVICE_TYPE_GPU, numDevices, devicesIds, NULL);
        if (errNum != CL_SUCCESS ){
            printf("Platform %u: failed to get device IDs\n", i);
            continue;
        }

        for (cl_uint j = 0; j < numDevices; j++){
            std::size_t tmpSize = 0;
            errNum = clGetDeviceInfo(devicesIds[j], CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(size_t), &tmpSize, NULL);
            if (errNum != CL_SUCCESS){
                std::cerr << "Failed to find OpenCL device info " << std::endl;
                continue;
            }

            // Remember the best device seen so far, together with its platform
            if (tmpSize > paramValueSize){
                paramValueSize = tmpSize;
                index_platform = i;
                device = devicesIds[j];
            }
        }
    }
    if (device == 0){
        std::cerr << "No usable OpenCL GPU device found" << std::endl;
        return 1;
    }

    cl_context_properties contextProperties[] ={
        CL_CONTEXT_PLATFORM,
        (cl_context_properties)platformIds[index_platform],
        0
    };
    // Create the context on the selected device only, so that the platform
    // property and the device always belong together
    context = clCreateContext(contextProperties, 1, &device, NULL, NULL, &errNum);
    if (errNum != CL_SUCCESS){
        std::cerr << "Failed to Create Context " << std::endl;
        return 1;
    }

    // Create a command-queue on the selected device
    // in the created context
    commandQueue = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, &errNum);
    if (commandQueue == NULL)  {  
        Cleanup(context, commandQueue, program, kernel, imageObjects);  
         system("pause") ; return 1; 
    }  

    // Make sure the device supports images, otherwise exit  
    cl_bool imageSupport = CL_FALSE;  
    clGetDeviceInfo(device, CL_DEVICE_IMAGE_SUPPORT, sizeof(cl_bool), &imageSupport, NULL);  
    if (imageSupport != CL_TRUE)  {  
        std::cerr << "OpenCL device does not support images." << std::endl;  
        Cleanup(context, commandQueue, program, kernel, imageObjects);  
         system("pause") ; return 1; 
    }  

    // Load input image from file and load it into  
    // an OpenCL image object  
    int width, height;  
    char *src0 = "test.png";
    imageObjects[0] = LoadImage(context, src0, width, height);  
    if (imageObjects[0] == 0)  {  
        std::cerr << "Error loading: " << std::string(src0) << std::endl;  
        Cleanup(context, commandQueue, program, kernel, imageObjects);  
         system("pause") ; return 1; 
    }  

    // Create output image object
    cl_image_format clImageFormat;  
    clImageFormat.image_channel_order = CL_RGBA;  
    clImageFormat.image_channel_data_type = CL_UNORM_INT8;  
    imageObjects[1] = clCreateImage2D(context,  
        CL_MEM_WRITE_ONLY,  
        &clImageFormat,  
        width,  
        height,  
        0,  
        NULL,  
        &errNum);  

    if (errNum != CL_SUCCESS){  
        std::cerr << "Error creating CL output image object." << std::endl;  
        Cleanup(context, commandQueue, program, kernel, imageObjects);  
         system("pause") ; return 1; 
    }  

    // Create OpenCL program  
    program = CreateProgram(context, device, "bilateralBlur.cl");  
    if (program == NULL)  {  
        Cleanup(context, commandQueue, program, kernel, imageObjects);  
         system("pause") ; return 1; 
    }  
    // Create OpenCL kernel  
    kernel = clCreateKernel(program, "bilateralBlur", NULL);  
    if (kernel == NULL)  {  
        std::cerr << "Failed to create kernel" << std::endl;  
        Cleanup(context, commandQueue, program, kernel, imageObjects);  
         system("pause") ; return 1; 
    }  

    // Set the kernel arguments  
    errNum = clSetKernelArg(kernel, 0, sizeof(cl_mem), &imageObjects[0]);  
    errNum |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &imageObjects[1]);  
    if (errNum != CL_SUCCESS)  {  
        std::cerr << "Error setting kernel arguments." << std::endl;  
        Cleanup(context, commandQueue, program, kernel, imageObjects);  
        system("pause") ; return 1; 
    }  

    size_t localWorkSize[2] = { 32, 32 };  
    size_t globalWorkSize[2] = { RoundUp(localWorkSize[0], width),  
        RoundUp(localWorkSize[1], height) };  

    cl_event prof_event;

    // Queue the kernel up for execution  
    errNum = clEnqueueNDRangeKernel(commandQueue, kernel, 2, NULL,  
        globalWorkSize, localWorkSize,  
        0, NULL, &prof_event);
    if (errNum != CL_SUCCESS)  
    {  
        std::cerr << "Error queuing kernel for execution." << std::endl;  
        Cleanup(context, commandQueue, program, kernel, imageObjects);  
         system("pause") ; return 1; 
    }

    clFinish(commandQueue);
    errNum = clWaitForEvents(1, &prof_event);
    if (errNum)
    {
        printf("clWaitForEvents() failed for bilateralBlur kernel. (%d)\n", errNum);
        return EXIT_FAILURE;
    }

    cl_ulong ev_start_time = (cl_ulong)0;
    cl_ulong ev_end_time = (cl_ulong)0;
    size_t return_bytes;

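    // Measured from the moment the command was queued until it finished, so the
    // figure includes queueing latency as well as kernel execution time
    errNum = clGetEventProfilingInfo(prof_event, CL_PROFILING_COMMAND_QUEUED,sizeof(cl_ulong), &ev_start_time, &return_bytes);
    errNum |= clGetEventProfilingInfo(prof_event, CL_PROFILING_COMMAND_END,sizeof(cl_ulong), &ev_end_time, &return_bytes);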
    if (errNum)
    {
        printf("clGetEventProfilingInfo() failed for kernel. (%d)\n", errNum);
        return EXIT_FAILURE;
    }

    double run_time = (double)(ev_end_time - ev_start_time);

    printf("Image dimensions: %d x %d pixels, Image type = CL_RGBA, CL_UNORM_INT8\n", width, height);
    printf("Work Timer:%lfms\n", run_time / 1000000);

    clReleaseEvent(prof_event);

    // Read the output buffer back to the Host  
    char *buffer = new char[width * height * 4];  
    size_t origin[3] = { 0, 0, 0 };  
    size_t region[3] = { (size_t)width, (size_t)height, 1 };
    errNum = clEnqueueReadImage(commandQueue, imageObjects[1], CL_TRUE,  
        origin, region, 0, 0, buffer,  
        0, NULL, NULL);  
    if (errNum != CL_SUCCESS)  {  
        std::cerr << "Error reading result buffer." << std::endl;  
        Cleanup(context, commandQueue, program, kernel, imageObjects);  
         system("pause") ; return 1; 
    }  

    std::cout << std::endl;  
    std::cout << "Executed program successfully." << std::endl;

    // Save the image out to disk  
    const char *saveImage = "output.jpg";
    //std::cout << buffer << std::endl;  
    cv::Mat imageColor = cv::imread(src0);  
    cv::Mat imageColor2;  
    imageColor2.create(imageColor.rows, imageColor.cols, imageColor.type());  
    int w = 0;  
    for (int v = imageColor2.rows-1; v >=0; v--)  {  
        for (int u =0 ; u <imageColor2.cols; u++)  {  
            imageColor2.at<cv::Vec3b>(v, u)[0] = buffer[w++];  
            imageColor2.at<cv::Vec3b>(v, u)[1] = buffer[w++];  
            imageColor2.at<cv::Vec3b>(v, u)[2] = buffer[w++];  
            w++;  
        }  
    }

    cv::imshow("Original image", imageColor);
    cv::imshow("After skin smoothing", imageColor2);
    cv::imwrite(saveImage, imageColor2);  
    cv::waitKey(0);  

    delete[] buffer;  

    Cleanup(context, commandQueue, program, kernel, imageObjects);  

    return 0;  
}

   
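This host-side program uses a little OpenCV, mainly to read the image; any other way of loading an image would of course work just as well. In fact, OpenCV has its own ocl module, apparently an OpenCL extension contributed to OpenCV by AMD, which contains OpenCL implementations of many commonly used OpenCV functions, including the bilateral filter and the adaptive bilateral filter.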
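To make the weighting behind the filter concrete without depending on OpenCV or OpenCL, here is a minimal CPU sketch of a 1-D bilateral filter. It is not the code from bilateralBlur.cl: the function name bilateral1D and the parameters radius, sigmaD (spatial spread) and sigmaR (value spread) are assumptions made for illustration. Each weight is the product of a Gaussian on the distance between samples and a Gaussian on the difference between their values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference implementation: 1-D bilateral filter on the CPU.
// weight(i, k) = exp(-(i-k)^2 / (2*sigmaD^2)) * exp(-(in[i]-in[k])^2 / (2*sigmaR^2))
std::vector<double> bilateral1D(const std::vector<double>& in,
                                int radius, double sigmaD, double sigmaR)
{
    assert(radius >= 0 && sigmaD > 0.0 && sigmaR > 0.0);
    const int n = static_cast<int>(in.size());
    std::vector<double> out(in.size());
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;   // weighted sum of neighbour values
        double wsum = 0.0;  // sum of weights, used to normalise
        for (int k = i - radius; k <= i + radius; ++k) {
            if (k < 0 || k >= n) continue;           // skip samples outside the signal
            const double ds = static_cast<double>(i - k);  // spatial distance
            const double dr = in[i] - in[k];               // value difference
            const double w = std::exp(-(ds * ds) / (2.0 * sigmaD * sigmaD)
                                      - (dr * dr) / (2.0 * sigmaR * sigmaR));
            sum += w * in[k];
            wsum += w;
        }
        out[i] = sum / wsum;  // wsum >= 1 because k == i always contributes weight 1
    }
    return out;
}
```

On a step signal such as {0, 0, 0, 100, 100, 100} with a small sigmaR, samples on the far side of the edge get a vanishing weight, so the step survives while small fluctuations on either side are averaged away. That is the edge-preserving behaviour the skin-smoothing effect relies on.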

   
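This program picks the GPU with the largest CL_DEVICE_MAX_WORK_GROUP_SIZE. Choosing the best OpenCL device should really weigh several factors. On my machine, judged by CL_DEVICE_MAX_WORK_GROUP_SIZE alone, the CPU would appear to be the best OpenCL device, and many of the parameters reported for the CPU do beat those of the GPUs, yet its actual run time is several times that of the GPU. So I am not sure which parameters should be used to decide which OpenCL device is best, and I would welcome pointers from anyone who knows.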
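One possible answer, offered as a rough sketch rather than a rule: prefer GPUs over CPUs, and break ties by compute units times clock frequency. The DeviceSummary struct and pickDevice function below are hypothetical; the fields only mirror values that clGetDeviceInfo reports (CL_DEVICE_TYPE, CL_DEVICE_MAX_COMPUTE_UNITS, CL_DEVICE_MAX_CLOCK_FREQUENCY). Compute units are not comparable across vendors, so this score is an assumption, not a measurement.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical summary of one OpenCL device, filled in from clGetDeviceInfo.
struct DeviceSummary {
    std::string name;
    bool isGpu;             // true for CL_DEVICE_TYPE_GPU
    unsigned computeUnits;  // CL_DEVICE_MAX_COMPUTE_UNITS
    unsigned clockMHz;      // CL_DEVICE_MAX_CLOCK_FREQUENCY
};

// Assumed heuristic: a GPU always beats a CPU; within the same type the
// larger computeUnits * clockMHz wins. Returns the index of the pick.
std::size_t pickDevice(const std::vector<DeviceSummary>& devices)
{
    assert(!devices.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < devices.size(); ++i) {
        const DeviceSummary& a = devices[i];
        const DeviceSummary& b = devices[best];
        const unsigned long long scoreA = 1ULL * a.computeUnits * a.clockMHz;
        const unsigned long long scoreB = 1ULL * b.computeUnits * b.clockMHz;
        if ((a.isGpu && !b.isGpu) || (a.isGpu == b.isGpu && scoreA > scoreB)) {
            best = i;
        }
    }
    return best;
}
```

Fed the three devices from the console log in this post (24 units at 1050 MHz, 4 units at 2500 MHz, 3 units at 1241 MHz), this picks the Intel HD Graphics 520, which may or may not be the faster GPU in practice. The more reliable approach is to time the real kernel on each candidate device, which the profiling events in main already make possible.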

   
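Also, this program is really quite simple: the code that does actual work is only a little over 300 lines. The device-information code is only there to show which OpenCL devices my machine has and what their properties are; the displayInfo(); call in main can be commented out entirely.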

   
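As for obtaining the OpenCL library files, they are available from Intel, NVIDIA, AMD and others. The OpenCL headers and lib file I used were copied out of NVIDIA's CUDA toolkit, and you can simply use your own as well.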

4. Processing a D3D texture in OpenCL

      OpenCL modifies the texture contents:

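cl_int computeTexture()
{
    cl_int errNum;

    // Frame counter passed to the kernel; it wraps after twice the window width
    static cl_int seq =0;
    seq = (seq+1)%(g_WindowWidth*2);

    // Accumulate the status codes so that no failure is silently overwritten
    errNum = clSetKernelArg(tex_kernel, 0, sizeof(cl_mem), &g_clTexture2D);
    errNum |= clSetKernelArg(tex_kernel, 1, sizeof(cl_int), &g_WindowWidth);
    errNum |= clSetKernelArg(tex_kernel, 2, sizeof(cl_int), &g_WindowHeight);
    errNum |= clSetKernelArg(tex_kernel, 3, sizeof(cl_int), &seq);
    if (errNum != CL_SUCCESS)
    {
        std::cerr << "Error setting kernel arguments." << std::endl;
        return errNum;
    }

    // Assumes the window width and height are multiples of the local size (32 x 4)
    size_t tex_globalWorkSize[2] = { g_WindowWidth, g_WindowHeight };
    size_t tex_localWorkSize[2] = { 32, 4 } ;

    // OpenCL must acquire the shared D3D10 texture before the kernel writes to it
    errNum = clEnqueueAcquireD3D10ObjectsKHR(commandQueue, 1, &g_clTexture2D, 0, NULL, NULL );
    if (errNum != CL_SUCCESS)
    {
        std::cerr << "Error acquiring D3D10 texture." << std::endl;
        return errNum;
    }

    cl_int runErr = clEnqueueNDRangeKernel(commandQueue, tex_kernel, 2, NULL,
                                           tex_globalWorkSize, tex_localWorkSize,
                                           0, NULL, NULL);
    if (runErr != CL_SUCCESS)
    {
        std::cerr << "Error queuing kernel for execution." << std::endl;
    }

    // Hand the texture back to D3D10 (even after a failed launch) and wait for the queue to drain
    errNum = clEnqueueReleaseD3D10ObjectsKHR(commandQueue, 1, &g_clTexture2D, 0, NULL, NULL );
    clFinish(commandQueue);
    return (runErr != CL_SUCCESS) ? runErr : errNum;
}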

An OpenCL kernel computes the contents of a D3D texture object:

__kernel void xyz_init_texture_kernel(__write_only image2d_t im, int w, int h, int seq )
{
    int2 coord = { get_global_id(0), get_global_id(1) };
    float4 color =  { 
                      (float)coord.x/(float)w,
                      (float)coord.y/(float)h,
                      (float)abs(seq-w)/(float)w,
                      1.0f};
    write_imagef( im, coord, color );
}

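The texture is written inside the kernel with the write_imagef() function. Here seq is a sequence-number variable that the host increments cyclically on every frame and passes to the kernel. In the kernel, seq is used to generate the texture color values; as seq increments, the colors change, which animates the texture.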
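The arithmetic is easy to check on the host. The sketch below mirrors, in plain C++, what xyz_init_texture_kernel computes for one pixel and how computeTexture() advances seq; the names Color, texelColor and nextSeq are made up for this illustration and do not appear in the original sample.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical host-side stand-in for the float4 written by write_imagef().
struct Color { float r, g, b, a; };

// Same per-pixel formula as the kernel: red follows x, green follows y,
// blue depends only on the frame counter seq.
Color texelColor(int x, int y, int w, int h, int seq)
{
    assert(w > 0 && h > 0);
    Color c;
    c.r = static_cast<float>(x) / static_cast<float>(w);
    c.g = static_cast<float>(y) / static_cast<float>(h);
    c.b = static_cast<float>(std::abs(seq - w)) / static_cast<float>(w);
    c.a = 1.0f;
    return c;
}

// Same update as in computeTexture(): seq cycles through 0 .. 2*w - 1.
int nextSeq(int seq, int w)
{
    return (seq + 1) % (w * 2);
}
```

With seq running from 0 to 2*w - 1, the blue channel falls from 1 to 0 (reached at seq == w) and then climbs back toward 1 before the counter wraps, so the whole texture pulses in blue while the red and green gradients stay fixed.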

       
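The code also uses a rendering technique, g_pTechnique. It is a basic processing pipeline that uses a simple vertex shader to pass the vertex positions and texture coordinates on to a pixel shader: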

//
// Vertex Shader
//
PS_INPUT VS( VS_INPUT input )
{
    PS_INPUT output = (PS_INPUT)0;
    output.Pos = input.Pos;
    output.Tex = input.Tex;

    return output;
}

technique10 Render
{
    pass P0
    {
        SetVertexShader( CompileShader( vs_4_0, VS() ) );
        SetGeometryShader( NULL );
        SetPixelShader( CompileShader( ps_4_0, PS() ) );
    }
}

       
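This technique is loaded with standard D3D10 calls. The pixel shader then performs a texture lookup on the texture modified by the OpenCL kernel and displays it: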

SamplerState samLinear
{
    Filter = MIN_MAG_MIP_LINEAR;
    AddressU = Wrap;
    AddressV = Wrap;
};

float4 PS( PS_INPUT input) : SV_Target
{
    return txDiffuse.Sample( samLinear, input.Tex );
}

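In the pixel shader, samLinear is a linear sampler for the input texture. On every iteration of the render loop, OpenCL updates the texture contents in computeTexture(), and D3D10 displays the updated texture.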

4. Results

(1) Hardware information

[Image 9] [Image 10]

(2) OpenCL device information printed to the console

Number of platforms:    2
        CL_PLATFORM_PROFILE:    FULL_PROFILE
        CL_PLATFORM_VERSION:    OpenCL 2.0
        CL_PLATFORM_VENDOR:     Intel(R) Corporation
        CL_PLATFORM_EXTENSIONS: cl_intel_dx9_media_sharing
cl_khr_3d_image_writes cl_khr_byte_addressable_store
cl_khr_d3d11_sharing cl_khr_depth_images
cl_khr_dx9_media_sharing cl_khr_gl_sharing
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics cl_khr_icd
cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics cl_khr_spir
        Number of devices:      2
                CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
                CL_DEVICE_VENDOR_ID:    32902
                CL_DEVICE_MAX_COMPUTE_UNITS:    24
                CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:     3
        CL_DEVICE_MAX_WORK_ITEM_SIZES:  256 256 256
                CL_DEVICE_MAX_WORK_GROUP_SIZE:  256
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR:  1
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 1
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT:   1
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG:  1
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE:        0
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF:  1
                CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR:     1
                CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT:    1
                CL_DEVICE_NATIVE_VECTOR_WIDTH_INT:      1
                CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG:     1
                CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT:    1
                CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE:   0
                CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF:     1
                CL_DEVICE_MAX_CLOCK_FREQUENCY:  1050
                CL_DEVICE_ADDRESS_BITS: 32
                CL_DEVICE_MAX_MEM_ALLOC_SIZE:   390280806
                CL_DEVICE_IMAGE_SUPPORT:        1
                CL_DEVICE_MAX_READ_IMAGE_ARGS:  128
                CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 128
                CL_DEVICE_IMAGE2D_MAX_WIDTH:    16384
                CL_DEVICE_IMAGE2D_MAX_WIDTH:    16384
                CL_DEVICE_IMAGE2D_MAX_HEIGHT:   16384
                CL_DEVICE_IMAGE3D_MAX_WIDTH:    16384
                CL_DEVICE_IMAGE3D_MAX_HEIGHT:   16384
                CL_DEVICE_IMAGE3D_MAX_DEPTH:    2048
                CL_DEVICE_MAX_SAMPLERS: 16
                CL_DEVICE_MAX_PARAMETER_SIZE:   1024
                CL_DEVICE_MEM_BASE_ADDR_ALIGN:  1024
                CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE:     128
                CL_DEVICE_SINGLE_FP_CONFIG:     CL_FP_DENORM | CL_FP_INF_NAN | CL_FP_ROUND_TO_NEAREST | CL_FP_ROUND_TO_ZERO | CL_FP_ROUND_TO_INF
                CL_DEVICE_SINGLE_FP_CONFIG:     CL_READ_ONLY_CACHE | CL_READ_WRITE_CACHE
                CL_DEVICE_GLOBAL_MEM_CACHE_TYPE:        CL_READ_WRITE_CACHE
                CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE:    64
                CL_DEVICE_GLOBAL_MEM_CACHE_SIZE:        524288
                CL_DEVICE_GLOBAL_MEM_SIZE:      1561123226
                CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:     65536
                CL_DEVICE_MAX_CONSTANT_ARGS:    8
                CL_DEVICE_LOCAL_MEM_TYPE:
                CL_DEVICE_LOCAL_MEM_SIZE:       65536
                CL_DEVICE_ERROR_CORRECTION_SUPPORT:     0
                CL_DEVICE_HOST_UNIFIED_MEMORY:  1
                CL_DEVICE_PROFILING_TIMER_RESOLUTION:   83
                CL_DEVICE_ENDIAN_LITTLE:        1
                CL_DEVICE_AVAILABLE:    1
                CL_DEVICE_COMPILER_AVAILABLE:   1
                CL_DEVICE_EXECUTION_CAPABILITIES:       CL_EXEC_KERNEL
                CL_DEVICE_QUEUE_PROPERTIES:     CL_QUEUE_PROFILING_ENABLE
                CL_DEVICE_PLATFORM:     00DEC488
        CL_DEVICE_NAME: Intel(R) HD Graphics 520
        CL_DEVICE_VENDOR:       Intel(R) Corporation
        CL_DRIVER_VERSION:      20.19.15.4364
        CL_DEVICE_PROFILE:      FULL_PROFILE
        CL_DEVICE_VERSION:      OpenCL 2.0
        CL_DEVICE_OPENCL_C_VERSION:     OpenCL C 2.0
        CL_DEVICE_EXTENSIONS:   cl_intel_accelerator
cl_intel_advanced_motion_estimation cl_intel_ctz
cl_intel_d3d11_nv12_media_sharing cl_intel_dx9_media_sharing
cl_intel_motion_estimation cl_intel_simultaneous_sharing
cl_intel_subgroups cl_khr_3d_image_writes
cl_khr_byte_addressable_store cl_khr_d3d10_sharing
cl_khr_d3d11_sharing cl_khr_depth_images
cl_khr_dx9_media_sharing cl_khr_fp16 cl_khr_gl_depth_images
cl_khr_gl_event cl_khr_gl_msaa_sharing
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics cl_khr_gl_sharing
cl_khr_icd cl_khr_image2d_from_buffer
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics cl_khr_mipmap_image
cl_khr_mipmap_image_writes cl_khr_spir

                CL_DEVICE_TYPE: CL_DEVICE_TYPE_CPU
                CL_DEVICE_VENDOR_ID:    32902
                CL_DEVICE_MAX_COMPUTE_UNITS:    4
                CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:     3
        CL_DEVICE_MAX_WORK_ITEM_SIZES:  8192 8192 8192
                CL_DEVICE_MAX_WORK_GROUP_SIZE:  8192
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR:  1
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 1
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT:   1
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG:  1
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE:        1
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF:  0
                CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR:     32
                CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT:    16
                CL_DEVICE_NATIVE_VECTOR_WIDTH_INT:      8
                CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG:     4
                CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT:    8
                CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE:   4
                CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF:     0
                CL_DEVICE_MAX_CLOCK_FREQUENCY:  2500
                CL_DEVICE_ADDRESS_BITS: 32
                CL_DEVICE_MAX_MEM_ALLOC_SIZE:   536838144
                CL_DEVICE_IMAGE_SUPPORT:        1
                CL_DEVICE_MAX_READ_IMAGE_ARGS:  480
                CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 480
                CL_DEVICE_IMAGE2D_MAX_WIDTH:    16384
                CL_DEVICE_IMAGE2D_MAX_WIDTH:    16384
                CL_DEVICE_IMAGE2D_MAX_HEIGHT:   16384
                CL_DEVICE_IMAGE3D_MAX_WIDTH:    2048
                CL_DEVICE_IMAGE3D_MAX_HEIGHT:   2048
                CL_DEVICE_IMAGE3D_MAX_DEPTH:    2048
                CL_DEVICE_MAX_SAMPLERS: 480
                CL_DEVICE_MAX_PARAMETER_SIZE:   3840
                CL_DEVICE_MEM_BASE_ADDR_ALIGN:  1024
                CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE:     128
                CL_DEVICE_SINGLE_FP_CONFIG:     CL_FP_DENORM | CL_FP_INF_NAN | CL_FP_ROUND_TO_NEAREST
                CL_DEVICE_SINGLE_FP_CONFIG:     CL_READ_ONLY_CACHE | CL_READ_WRITE_CACHE
                CL_DEVICE_GLOBAL_MEM_CACHE_TYPE:        CL_READ_WRITE_CACHE
                CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE:    64
                CL_DEVICE_GLOBAL_MEM_CACHE_SIZE:        262144
                CL_DEVICE_GLOBAL_MEM_SIZE:      2147352576
                CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:     131072
                CL_DEVICE_MAX_CONSTANT_ARGS:    480
                CL_DEVICE_LOCAL_MEM_TYPE:       CL_LOCAL | CL_GLOBAL
                CL_DEVICE_LOCAL_MEM_SIZE:       32768
                CL_DEVICE_ERROR_CORRECTION_SUPPORT:     0
                CL_DEVICE_HOST_UNIFIED_MEMORY:  1
                CL_DEVICE_PROFILING_TIMER_RESOLUTION:   395
                CL_DEVICE_ENDIAN_LITTLE:        1
                CL_DEVICE_AVAILABLE:    1
                CL_DEVICE_COMPILER_AVAILABLE:   1
                CL_DEVICE_EXECUTION_CAPABILITIES:       CL_EXEC_KERNEL | CL_EXEC_NATIVE_KERNEL
                CL_DEVICE_QUEUE_PROPERTIES:     CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_PROFILING_ENABLE
                CL_DEVICE_PLATFORM:     00DEC488
        CL_DEVICE_NAME: Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
        CL_DEVICE_VENDOR:       Intel(R) Corporation
        CL_DRIVER_VERSION:      5.2.0.10094
        CL_DEVICE_PROFILE:      FULL_PROFILE
        CL_DEVICE_VERSION:      OpenCL 2.0 (Build 10094)
        CL_DEVICE_OPENCL_C_VERSION:     OpenCL C 2.0
        CL_DEVICE_EXTENSIONS:   cl_khr_icd
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_byte_addressable_store cl_khr_depth_images
cl_khr_3d_image_writes cl_intel_exec_by_local_thread
cl_khr_spir cl_khr_dx9_media_sharing
cl_intel_dx9_media_sharing cl_khr_d3d11_sharing
cl_khr_gl_sharing cl_khr_fp64 cl_khr_image2d_from_buffer

        CL_PLATFORM_PROFILE:    FULL_PROFILE
        CL_PLATFORM_VERSION:    OpenCL 1.2 CUDA 8.0.44
        CL_PLATFORM_VENDOR:     NVIDIA Corporation
        CL_PLATFORM_EXTENSIONS:
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics cl_khr_fp64
cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
cl_nv_compiler_options cl_nv_device_attribute_query
cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing
cl_nv_d3d11_sharing cl_nv_copy_opts
        Number of devices:      1
                CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
                CL_DEVICE_VENDOR_ID:    4318
                CL_DEVICE_MAX_COMPUTE_UNITS:    3
                CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:     3
        CL_DEVICE_MAX_WORK_ITEM_SIZES:  1024 1024 64
                CL_DEVICE_MAX_WORK_GROUP_SIZE:  1024
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR:  1
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 1
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT:   1
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG:  1
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE:        1
                CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF:  0
                CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR:     1
                CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT:    1
                CL_DEVICE_NATIVE_VECTOR_WIDTH_INT:      1
                CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG:     1
                CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT:    1
                CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE:   1
                CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF:     0
                CL_DEVICE_MAX_CLOCK_FREQUENCY:  1241
                CL_DEVICE_ADDRESS_BITS: 32
                CL_DEVICE_MAX_MEM_ALLOC_SIZE:   536870912
                CL_DEVICE_IMAGE_SUPPORT:        1
                CL_DEVICE_MAX_READ_IMAGE_ARGS:  256
                CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 16
                CL_DEVICE_IMAGE2D_MAX_WIDTH:    16384
                CL_DEVICE_IMAGE2D_MAX_WIDTH:    16384
                CL_DEVICE_IMAGE2D_MAX_HEIGHT:   16384
                CL_DEVICE_IMAGE3D_MAX_WIDTH:    4096
                CL_DEVICE_IMAGE3D_MAX_HEIGHT:   4096
                CL_DEVICE_IMAGE3D_MAX_DEPTH:    4096
                CL_DEVICE_MAX_SAMPLERS: 32
                CL_DEVICE_MAX_PARAMETER_SIZE:   4352
                CL_DEVICE_MEM_BASE_ADDR_ALIGN:  4096
                CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE:     128
                CL_DEVICE_SINGLE_FP_CONFIG:     CL_FP_DENORM |
CL_FP_INF_NAN | CL_FP_ROUND_TO_NEAREST |
CL_FP_ROUND_TO_ZERO | CL_FP_ROUND_TO_INF | CL_FP_FMA
                CL_DEVICE_SINGLE_FP_CONFIG:    
CL_READ_ONLY_CACHE | CL_READ_WRITE_CACHE
                CL_DEVICE_GLOBAL_MEM_CACHE_TYPE:       
CL_READ_WRITE_CACHE
                CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE:    128
                CL_DEVICE_GLOBAL_MEM_CACHE_SIZE:        49152
                CL_DEVICE_GLOBAL_MEM_SIZE:      2147483648
                CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:     65536
                CL_DEVICE_MAX_CONSTANT_ARGS:    9
                CL_DEVICE_LOCAL_MEM_TYPE:
                CL_DEVICE_LOCAL_MEM_SIZE:       49152
                CL_DEVICE_ERROR_CORRECTION_SUPPORT:     0
                CL_DEVICE_HOST_UNIFIED_MEMORY:  0
                CL_DEVICE_PROFILING_TIMER_RESOLUTION:   1000
                CL_DEVICE_ENDIAN_LITTLE:        1
                CL_DEVICE_AVAILABLE:    1
                CL_DEVICE_COMPILER_AVAILABLE:   1
                CL_DEVICE_EXECUTION_CAPABILITIES:      
CL_EXEC_KERNEL
                CL_DEVICE_QUEUE_PROPERTIES:    
CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE |
CL_QUEUE_PROFILING_ENABLE
                CL_DEVICE_PLATFORM:     00E30580
        CL_DEVICE_NAME: GeForce 940MX
        CL_DEVICE_VENDOR:       NVIDIA Corporation
        CL_DRIVER_VERSION:      369.30
        CL_DEVICE_PROFILE:      FULL_PROFILE
        CL_DEVICE_VERSION:      OpenCL 1.2 CUDA
        CL_DEVICE_OPENCL_C_VERSION:     OpenCL C 1.2
        CL_DEVICE_EXTENSIONS:  
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics cl_khr_fp64
cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
cl_nv_compiler_options cl_nv_device_attribute_query
cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing
cl_nv_d3d11_sharing cl_nv_copy_opts

平台数:2
Image dimensions: 273 x 415 pixels, Image type = CL_RGBA,
CL_UNORM_INT8
Work Timer:3.422816ms

Executed program succesfully.

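The 273 x 415 image took less than 4 ms to process.

(3) Result of the bilateral filter

[Image 11]

    The effect is clearly visible. However, because there is no skin-tone detection and no final sharpening step, and because of how the parameters are set, even my partner said the skin smoothing is too heavy and looks fake. So the algorithm still needs improvement.

    In addition, in terms of speed, there should still be room to optimize this algorithm.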

 

 

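Source code: http://download.csdn.net/download/qq_33892166/9761287

    If the program reports "Error queuing kernel for execution.", try changing size_t localWorkSize[2] = { 32, 32 }; to size_t localWorkSize[2] = { 16, 16 };. A 32 x 32 work-group contains 1024 work-items, which exceeds the CL_DEVICE_MAX_WORK_GROUP_SIZE of 256 reported for the Intel HD Graphics 520 in the device listing, while 16 x 16 = 256 fits.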
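Rather than hard-coding the tile size, the local work size can be derived from the device limit. The following is a minimal sketch, not the author's code: `pickTileSide` and `roundUp` are hypothetical helper names, and the limit passed in is assumed to come from querying CL_DEVICE_MAX_WORK_GROUP_SIZE with clGetDeviceInfo.

```cpp
#include <cassert>
#include <cstddef>

// Largest power-of-two tile side such that side * side work-items fit
// within the device limit (e.g. CL_DEVICE_MAX_WORK_GROUP_SIZE).
std::size_t pickTileSide(std::size_t maxWorkGroupSize)
{
    assert(maxWorkGroupSize >= 1);
    std::size_t side = 1;
    while ((side * 2) * (side * 2) <= maxWorkGroupSize)
        side *= 2;
    return side;
}

// Round a global dimension up to the next multiple of the local size,
// since OpenCL 1.x requires global sizes to be divisible by local sizes.
std::size_t roundUp(std::size_t globalSize, std::size_t localSize)
{
    assert(localSize > 0);
    std::size_t r = globalSize % localSize;
    return r == 0 ? globalSize : globalSize + localSize - r;
}
```

For the devices listed in this article, a limit of 256 gives a 16 x 16 tile and a limit of 1024 gives 32 x 32. With a 16 x 16 tile the 273 x 415 image would be launched as 288 x 416 work-items, so the kernel has to skip work-items whose coordinates fall outside the image.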

 

  

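5. Processing D3D vertex data in OpenCL

     Now consider using a D3D buffer containing vertex data to draw a sine curve on the screen. First, define a simple structure for the vertex buffer in D3D: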

struct SimpleSineVertex
{
    D3DXVECTOR4 Pos;
};
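The kernel later in this section addresses the same buffer as an array of float4, so each vertex needs to be exactly four packed floats. The sketch below checks that assumption at compile time. `Vec4` and `SineVertex` are hypothetical stand-ins (the real D3DXVECTOR4 comes from the D3DX headers and is assumed here to be four consecutive 32-bit floats), and `vertexBufferBytes` is an illustrative helper, not part of the sample.

```cpp
#include <cstddef>

// Hypothetical stand-in for D3DXVECTOR4: four consecutive 32-bit floats.
struct Vec4 { float x, y, z, w; };

// Mirrors SimpleSineVertex from the text, using the stand-in type.
struct SineVertex { Vec4 Pos; };

// An OpenCL float4 occupies 16 bytes, so each vertex must be exactly
// 16 bytes for vbo[gid] in the kernel to address whole vertices.
static_assert(sizeof(float) == 4, "expects 32-bit float");
static_assert(sizeof(SineVertex) == 16, "vertex must be four packed floats");

// Size in bytes of a vertex buffer holding `count` vertices.
constexpr std::size_t vertexBufferBytes(std::size_t count)
{
    return sizeof(SineVertex) * count;
}
```

With 256 vertices this gives 4096 bytes, which is the value the ByteWidth field receives in the buffer description.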

A D3D10 buffer can be created for this structure; here the buffer holds 256 elements:

D3D10_BUFFER_DESC bd;
bd.Usage = D3D10_USAGE_DEFAULT;
bd.ByteWidth = sizeof( SimpleSineVertex ) * 256;   // 256 vertices
bd.BindFlags = D3D10_BIND_VERTEX_BUFFER;
bd.CPUAccessFlags = 0;
bd.MiscFlags = 0;

// No initial data: the contents will be written by OpenCL.
hr = g_pD3DDevice->CreateBuffer( &bd, NULL, &g_pSineVertexBuffer );

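Because OpenCL will be used to set the data in the buffer, NULL is passed for the second parameter, pInitialData, so that only the space is allocated. Once the D3D buffer g_pSineVertexBuffer has been created, the clCreateFromD3D10BufferKHR() function can be used to create an OpenCL buffer from g_pSineVertexBuffer: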

// Create an OpenCL buffer object that refers to the D3D10 vertex buffer.
g_clBuffer = clCreateFromD3D10BufferKHR( context, CL_MEM_READ_WRITE, g_pSineVertexBuffer, &errNum );
if( errNum != CL_SUCCESS )
{
    std::cerr << "Error creating buffer from D3D10" << std::endl;
    return E_FAIL;
}
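Since clCreateFromD3D10BufferKHR() belongs to the cl_khr_d3d10_sharing extension, it may be worth confirming that the chosen device reports that extension before calling it. In the device listing in this article, the GeForce 940MX lists cl_khr_d3d10_sharing while the Intel CPU device does not. The sketch below is one possible check, assuming the extension string has already been obtained from clGetDeviceInfo with CL_DEVICE_EXTENSIONS; `hasExtension` is a hypothetical helper name.

```cpp
#include <sstream>
#include <string>

// Returns true if `name` appears as a whole space-separated token in
// the extension string (as returned for CL_DEVICE_EXTENSIONS).
bool hasExtension(const std::string& extensions, const std::string& name)
{
    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token)
        if (token == name)
            return true;
    return false;
}
```

Comparing whole tokens rather than searching for a substring avoids a false match on a longer extension name that happens to contain the one being looked for.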

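As before, g_clBuffer can be passed as a kernel argument to an OpenCL kernel that generates the data.
In the sample code, the vertex positions of the sine curve are generated in the kernel: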

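__kernel void init_vbo_kernel(__global float4 *vbo, int w, int h, int seq)
{
    // One work-item per vertex; h is unused for the 1D curve.
    int gid = get_global_id(0);
    float4 linepts;
    float f = 1.0f;   // frequency
    float a = 0.4f;   // amplitude
    float b = 0.0f;   // vertical offset

    // Map gid in [0, w) to x in [-1, 1).
    linepts.x = gid/(w/2.0f)-1.0f;
    // Float literals avoid requiring double-precision support; seq shifts the phase.
    linepts.y = b + a*sin(3.14f*2.0f*((float)gid/(float)w*f + (float)seq/(float)w));
    linepts.z = 0.5f;
    linepts.w = 0.0f;

    vbo[gid] = linepts;
}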
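To sanity-check the values the kernel should write, the same arithmetic can be reproduced on the host. This is only a reference sketch: `Float4` and `sineVertex` are hypothetical names, the work-item index is passed explicitly instead of coming from get_global_id(0), and the kernel's approximation of pi as 3.14 is kept.

```cpp
#include <cmath>

// Stand-in for OpenCL's float4 on the host side.
struct Float4 { float x, y, z, w; };

// Same arithmetic as init_vbo_kernel for a single work-item.
Float4 sineVertex(int gid, int w, int seq)
{
    const float f = 1.0f;   // frequency
    const float a = 0.4f;   // amplitude
    const float b = 0.0f;   // vertical offset

    Float4 p;
    p.x = gid / (w / 2.0f) - 1.0f;
    p.y = b + a * std::sin(3.14f * 2.0f *
                           ((float)gid / (float)w * f + (float)seq / (float)w));
    p.z = 0.5f;
    p.w = 0.0f;
    return p;
}
```

With w = 256, x runs from -1 at gid 0 up to just under 1 at gid 255, and y stays within an amplitude of 0.4. Increasing seq from frame to frame shifts the phase, which is what makes the curve appear to move.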

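When rendering, set the input layout and the vertex buffer, and specify a line strip as the primitive topology. Next, computeBuffer() calls the kernel above to update the buffer. A simple rendering pipeline is then activated and the 256 data points are drawn: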

// Set the input layout
g_pD3DDevice->IASetInputLayout( g_pSineVertexLayout );
// Set vertex buffer
stride = sizeof( SimpleSineVertex );
offset = 0;
g_pD3DDevice->IASetVertexBuffers( 0, 1, &g_pSineVertexBuffer, &stride, &offset );

// Set primitive topology
g_pD3DDevice->IASetPrimitiveTopology( D3D10_PRIMITIVE_TOPOLOGY_LINESTRIP );

computeBuffer();
g_pTechnique->GetPassByIndex( 1 )->Apply( 0 );
g_pD3DDevice->Draw( 256, 0 );

       
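At run time, the program uses a kernel to generate the texture contents, then runs the D3D pipeline to sample the texture and display it on the screen. It then also draws the vertex buffer, producing a sine curve on the screen.

[Image 15]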

 

Sample project source code: http://download.csdn.net/download/qq_33892166/9867159
