Question
When writing CUDA applications, you can either work at the driver level or at the runtime level, as illustrated in this image (the libraries are CUFFT and CUBLAS for advanced math):
(Source: tomshw.it)
I assume the tradeoff between the two is increased performance for the low-level API, but at the cost of increased code complexity. What are the concrete differences, and are there any significant things you cannot do with the high-level API?
I am using CUDA.net for interop with C#, and it is built as a copy of the driver API. This encourages writing a lot of rather complex code in C#, while the C++ equivalent would be simpler using the runtime API. Is there anything to gain by doing it this way? The one benefit I can see is that it is easier to integrate intelligent error handling with the rest of the C# code.
Accepted Answer
The CUDA runtime makes it possible to compile and link your CUDA kernels into executables. This means that you don't have to distribute cubin files with your application or deal with loading them through the driver API. As you have noted, it is generally easier to use.
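As a rough sketch of how little ceremony the runtime API needs (the kernel name and sizes here are illustrative, and running it of course requires an NVIDIA GPU and the CUDA toolkit), a kernel compiled by nvcc is launched directly with the execution-configuration syntax:

```cpp
// runtime_example.cu -- build with: nvcc runtime_example.cu
#include <cuda_runtime.h>
#include <cstdio>

// The kernel is compiled and linked into the executable by nvcc;
// no separate cubin file has to be shipped or loaded.
__global__ void addOne(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    const int n = 256;
    float host[n] = {0};
    float *dev = nullptr;

    cudaMalloc(&dev, n * sizeof(float));      // context is created implicitly
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    addOne<<<(n + 127) / 128, 128>>>(dev, n); // execution-configuration syntax

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    printf("host[0] = %f\n", host[0]);
    return 0;
}
```

Note that there is no explicit initialization, context creation, or module loading anywhere; the runtime handles all of it behind the first CUDA call.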
In contrast, the driver API is harder to program but provides more control over how CUDA is used. The programmer has to deal directly with initialization, module loading, and so on.
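For comparison, a minimal driver-API sketch has to perform each of those steps explicitly (the module filename `kernels.cubin` and the kernel name `addOne` are hypothetical, and error checking is omitted for brevity):

```cpp
// driver_example.cpp -- link against the CUDA driver library (-lcuda)
#include <cuda.h>

int main() {
    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    CUfunction fn;

    cuInit(0);                                // explicit initialization
    cuDeviceGet(&dev, 0);                     // pick a device
    cuCtxCreate(&ctx, 0, dev);                // explicit context management
    cuModuleLoad(&mod, "kernels.cubin");      // load a precompiled cubin
    cuModuleGetFunction(&fn, mod, "addOne");  // look the kernel up by name
    // ... set up arguments and launch, then clean up ...
    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    return 0;
}
```

Every one of these calls is implicit in the runtime-API version, which is exactly the tradeoff: more boilerplate, but full control over which device, context, and module is used and when.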
Apparently, more detailed device information can be queried through the driver API than through the runtime API. For instance, the free memory available on the device can be queried only through the driver API.
From the CUDA Programmer's Guide:
It is composed of two APIs:
- A low-level API called the CUDA driver API,
- A higher-level API called the CUDA runtime API that is implemented on top of the CUDA driver API.
These APIs are mutually exclusive: an application should use either one or the other.
The CUDA runtime eases device code management by providing implicit initialization, context management, and module management. The C host code generated by nvcc is based on the CUDA runtime (see Section 4.2.5), so applications that link to this code must use the CUDA runtime API.
In contrast, the CUDA driver API requires more code, and is harder to program and debug, but offers a better level of control and is language-independent since it only deals with cubin objects (see Section 4.2.5). In particular, it is more difficult to configure and launch kernels using the CUDA driver API, since the execution configuration and kernel parameters must be specified with explicit function calls instead of the execution configuration syntax described in Section 4.2.3. Also, device emulation (see Section 4.5.2.9) does not work with the CUDA driver API.
There is no noticeable performance difference between the APIs. How your kernels use memory and how they are laid out on the GPU (in warps and blocks) will have a much more pronounced effect.