Problem description
I am copying N bytes from pSrc to pDest. This can be done in a single loop:
for (int i = 0; i < N; i++)
    *pDest++ = *pSrc++;
Why is this slower than memcpy or memmove? What tricks do they use to speed it up?
Recommended answer
Because memcpy copies through word pointers instead of byte pointers, so each iteration moves several bytes rather than one. In addition, memcpy implementations are often written with SIMD instructions, which make it possible to move 128 bits at a time.
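To make the word-pointer idea concrete, here is a minimal sketch of a word-at-a-time copy. It is not how any particular C library implements memcpy. The names copy_words and word_t are hypothetical, and the may_alias attribute is a GCC/Clang extension assumed here so that reading bytes through a 64-bit pointer is well defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC or Clang. may_alias lets this 64-bit type alias
   objects of any other type without violating strict aliasing. */
typedef uint64_t __attribute__((may_alias)) word_t;

/* Hypothetical helper: copies n bytes from src to dest (regions must not overlap). */
void *copy_words(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Word copies are used only when both pointers share the same
       misalignment, so that aligning one also aligns the other. */
    if ((uintptr_t)d % sizeof(word_t) == (uintptr_t)s % sizeof(word_t)) {
        /* Copy single bytes until the pointers reach a word boundary. */
        while (n > 0 && (uintptr_t)d % sizeof(word_t) != 0) {
            *d++ = *s++;
            n--;
        }
        /* Main loop: one 8-byte load and one 8-byte store per iteration. */
        while (n >= sizeof(word_t)) {
            *(word_t *)d = *(const word_t *)s;
            d += sizeof(word_t);
            s += sizeof(word_t);
            n -= sizeof(word_t);
        }
    }

    /* Remaining tail bytes, or the whole buffer if alignments differ. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dest;
}
```

The main loop runs roughly one eighth as many iterations as the byte loop in the question. Production implementations go further and also handle pointers with different misalignment without falling back to bytes.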
SIMD instructions are assembly instructions that perform the same operation on every element of a vector up to 16 bytes long (the SSE register width; newer extensions such as AVX use wider vectors). That includes load and store instructions.
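The SIMD load and store instructions mentioned above can be reached from C through compiler intrinsics. The following is only a sketch, assuming an x86 or x86-64 target with SSE2; copy_sse2 is a hypothetical name, and real memcpy implementations add alignment handling, size-based dispatch, and wider registers where available.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86 or x86-64 target */
#include <stddef.h>

/* Hypothetical helper: copies n bytes from src to dest (regions must not overlap). */
void *copy_sse2(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Main loop: one unaligned 128-bit load and store per 16 bytes. */
    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_storeu_si128((__m128i *)d, v);
        d += 16;
        s += 16;
        n -= 16;
    }

    /* Fewer than 16 bytes remain; finish them one at a time. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dest;
}
```

The unaligned variants (loadu/storeu) are used so the sketch works for any pointers; the aligned variants would require 16-byte aligned addresses.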