I've recently been playing around with CUDA and wanted to try out the unified memory model. I experimented with some sample code, and strangely, when launching the kernel, no values seemed to update. Modifying unified data from the host works fine, yet kernels that are launched simply won't modify the unified data.

My card is a GTX 770 with 4 GB of memory. I'm running Arch Linux, kernel 3.14-2, and using GCC 4.8 to compile my samples. I'm setting the compute architecture to sm_30 and activating the -m64 flag.

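For reference, the compile command that setup implies would look something like this (the file names here are my own):

nvcc -arch=sm_30 -m64 -o sample sample.cu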

Here's one sample that I was playing with. x[0] and x[1] always evaluate to 0, even when the kernel launches.

#include <stdio.h>
#include <cuda.h>

__global__ void kernel(int* x){
    x[threadIdx.x] = 2;
}

int main(){
    int* x;
    cudaMallocManaged(&x, sizeof(int) * 2);
    cudaError_t error = cudaGetLastError();
    // cudaGetErrorString turns the error code into readable text;
    // printing the enum itself with %s is undefined behaviour
    printf("%s\n", cudaGetErrorString(error));
    x[0] = 0;
    x[1] = 0;

    kernel<<<1, 2>>>(x);
    cudaDeviceSynchronize();

    printf("result = %d\n", x[1]);

    cudaFree(x);
    return 0;
}
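
For what it's worth, here is the same test with every API call checked (a minimal sketch; the CHECK macro is my own helper, not part of the CUDA API). Checking the launch and the synchronize would surface the underlying failure directly instead of silently printing 0:

#include <stdio.h>
#include <cuda_runtime.h>

// Report and bail out on any CUDA runtime error (helper macro, my own)
#define CHECK(call) do { \
    cudaError_t err = (call); \
    if (err != cudaSuccess) { \
        printf("%s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(err)); \
        return 1; \
    } \
} while (0)

__global__ void kernel(int* x){
    x[threadIdx.x] = 2;
}

int main(){
    int* x;
    CHECK(cudaMallocManaged(&x, sizeof(int) * 2));
    x[0] = 0;
    x[1] = 0;

    kernel<<<1, 2>>>(x);
    CHECK(cudaGetLastError());       // catches launch failures
    CHECK(cudaDeviceSynchronize());  // catches errors during kernel execution

    printf("result = %d\n", x[1]);
    CHECK(cudaFree(x));
    return 0;
}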

Another sample is this:

__global__ void adjacency_map_init_gpu(adjacency_map_t* map){
    int row = threadIdx.y + blockIdx.y * blockDim.y;
    int col = threadIdx.x + blockIdx.x * blockDim.x;

    int i = row * map->width + col;

    // clamp the index (the original max/min calls discarded their results)
    i = max(i, 0);
    i = min(i, map->width * map->height - 1);

    map->connections[i] = 0;
}

__global__ void adjacency_map_connect_gpu(edge_t* edges, int num_edges, adjacency_map_t* map){

    int i = threadIdx.x + (((gridDim.x * blockIdx.y) + blockIdx.x)*blockDim.x);

    // clamp the index (the original max/min calls discarded their results)
    i = max(i, 0);
    i = min(i, num_edges - 1);

    int n_start = edges[i].n_start;
    int n_end = edges[i].n_end;

    int map_index = n_start * map->width + n_end;
    map->connections[map_index] = 1;
    printf("%d new value: %d\n", map_index, map->connections[map_index]);
}

adjacency_map_t* adjacency_map_init(int num_nodes, edge_t* edges, int num_edges){
    adjacency_map_t *map;// = (adjacency_map_t*)malloc(sizeof(adjacency_map_t));
    cudaMallocManaged(&map, sizeof(adjacency_map_t));
    cudaMallocManaged(&(map->connections), num_nodes * num_nodes * sizeof(int));
    //map->connections = (int*)malloc(sizeof(int) * num_nodes * num_nodes);

    map->width = num_nodes;
    map->height = num_nodes;

    map->stride = 0;

    //GPU stuff
//    adjacency_map_t *d_map;
//    int* d_connections;

//    cudaMalloc((void**) &d_map, sizeof(adjacency_map_t));
//    cudaMalloc((void**) &d_connections, num_nodes * num_nodes * sizeof(int));

//    cudaMemcpy(d_map, map, sizeof(adjacency_map_t), cudaMemcpyHostToDevice);
//    cudaMemcpy(d_connections, map->connections, num_nodes * num_nodes, cudaMemcpyHostToDevice);
//cudaMemcpy(&(d_map->connections), &d_connections, sizeof(int*), cudaMemcpyHostToDevice);

//    edge_t* d_edges;
//    cudaMalloc((void**) &d_edges, num_edges * sizeof(edge_t));
//    cudaMemcpy(d_edges, edges, num_edges * sizeof(edge_t), cudaMemcpyHostToDevice);

    adjacency_map_init_gpu<<<1, 3>>>(map);
    cudaDeviceSynchronize();
    //adjacency_map_connect_gpu<<<1, 3>>>(edges, num_edges, map);

    cudaDeviceSynchronize();

//    cudaMemcpy(map, d_map, sizeof(adjacency_map_t), cudaMemcpyDeviceToHost);
//Synchronize everything
//    cudaFree(map);
//    cudaFree(edges);

    return map;
}
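
For completeness, the sample above relies on two structs the post doesn't show; here is a plausible reconstruction from the fields the code accesses (the field names come from the code, but the exact layout is my assumption):

typedef struct {
    int n_start;  // index of the edge's start node
    int n_end;    // index of the edge's end node
} edge_t;

typedef struct {
    int width;        // number of nodes (matrix columns)
    int height;       // number of nodes (matrix rows)
    int stride;
    int* connections; // width * height adjacency matrix
} adjacency_map_t;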

Basically, for the second snippet, I can access all the elements of the original structure on the host. Once I try to launch a kernel function, however, the pointer becomes inaccessible (at least when tested from gdb) and the entire object's data is inaccessible. The only part of the edges and map pointers I can still see after the first kernel launch is their respective addresses.

Any help would be greatly appreciated! Thanks so much!

2 Solutions

#1

Got it!

Turns out it was a problem with the IOMMU kernel option being enabled. My motherboard, a GIGABYTE 990FXA-UD3, seems to have an IOMMU bug affecting traffic between the GPU and the CPU.

Detection: whenever you run code that accesses unified memory from the console (without X), you should see an error message resembling this scroll down the screen:

AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0017 address=0x00000002d80d5000 flags=0x0010]

There might also be some discolouration in the top right of the screen (there was for me, at least).

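The same event is written to the kernel log, so you can also check for it after the fact with dmesg (a standard Linux utility):

dmesg | grep AMD-Vi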

Here's the solution (assuming you use GRUB):

Open /etc/default/grub and, on the line GRUB_CMDLINE_LINUX_DEFAULT="", add the option iommu=soft inside the quotes.

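The line should end up looking like this; afterwards regenerate the GRUB configuration (on Arch that is typically grub-mkconfig -o /boot/grub/grub.cfg) and reboot:

# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="iommu=soft"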

Hope this helps people out! Big thanks to Robert Crovella for helping me narrow down the problem!
