Version: v1.6.0

SIMT Intrinsics

For the CUDA backend, Taichi supports warp-level and block-level intrinsics, which are needed for writing high-performance SIMT kernels. You can use them in Taichi much as you would in a CUDA kernel. The following functions are currently supported:

Operation                       Mapped CUDA intrinsic
ti.simt.warp.all_nonzero        __all_sync
ti.simt.warp.any_nonzero        __any_sync
ti.simt.warp.unique             __uni_sync
ti.simt.warp.ballot             __ballot_sync
ti.simt.warp.shfl_sync_i32      __shfl_sync
ti.simt.warp.shfl_sync_f32      __shfl_sync
ti.simt.warp.shfl_up_i32        __shfl_up_sync
ti.simt.warp.shfl_up_f32        __shfl_up_sync
ti.simt.warp.shfl_down_i32      __shfl_down_sync
ti.simt.warp.shfl_down_f32      __shfl_down_sync
ti.simt.warp.shfl_xor_i32       __shfl_xor_sync
ti.simt.warp.match_any          __match_any_sync
ti.simt.warp.match_all          __match_all_sync
ti.simt.warp.active_mask        __activemask
ti.simt.warp.sync               __syncwarp

See Taichi's API reference for more information on each function.
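To make the semantics of the vote-style functions in the list above more concrete, here is a small plain-Python model of what `__all_sync`, `__any_sync`, and `__ballot_sync` compute across one 32-lane warp with a full mask. This is only an illustrative sketch that runs on the CPU: the helper names `model_all_nonzero`, `model_any_nonzero`, and `model_ballot` are hypothetical and are not part of Taichi's API.

```python
WARP_SIZE = 32  # number of lanes in a CUDA warp

def model_all_nonzero(predicates):
    """Model of __all_sync: 1 if every lane's predicate is nonzero, else 0."""
    return int(all(p != 0 for p in predicates))

def model_any_nonzero(predicates):
    """Model of __any_sync: 1 if at least one lane's predicate is nonzero, else 0."""
    return int(any(p != 0 for p in predicates))

def model_ballot(predicates):
    """Model of __ballot_sync: bit i of the result is set if lane i's predicate is nonzero."""
    bits = 0
    for lane, p in enumerate(predicates):
        if p != 0:
            bits |= 1 << lane
    return bits

# Hypothetical input: odd-numbered lanes vote 1, even-numbered lanes vote 0.
preds = [lane % 2 for lane in range(WARP_SIZE)]
print(model_all_nonzero(preds), model_any_nonzero(preds), hex(model_ballot(preds)))
```

On a GPU each lane supplies its own predicate and every participating lane receives the same result; the lists here simply stand in for the 32 per-lane values.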

Here is an example of performing data exchange within a warp in Taichi:

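import taichi as ti

ti.init(arch=ti.cuda)  # warp intrinsics require the CUDA backend

a = ti.field(dtype=ti.i32, shape=32)

@ti.kernel
def foo():
    ti.loop_config(block_dim=32)  # one block of 32 threads, i.e. a single warp
    for i in range(32):
        # Each lane receives the value held by the lane one position below it.
        a[i] = ti.simt.warp.shfl_up_i32(ti.u32(0xFFFFFFFF), a[i], 1)

for i in range(32):
    a[i] = i * i

foo()

# Lane 0 has no lower lane to read from, so the check starts at 1.
for i in range(1, 32):
    assert a[i] == (i - 1) * (i - 1)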
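The shuffle in the example above can be reasoned about without a GPU. The following plain-Python sketch models the lane-to-lane data movement of `__shfl_up_sync` and `__shfl_down_sync` for a single full warp, and then uses the down-shuffle for a common pattern, a tree-style warp sum. The helpers `model_shfl_up`, `model_shfl_down`, and `model_warp_sum` are hypothetical names for illustration, not Taichi functions, and the model assumes all 32 lanes are active.

```python
WARP_SIZE = 32  # number of lanes in a CUDA warp

def model_shfl_up(values, offset):
    """Lane i reads from lane i - offset; lanes with no such source keep their own value."""
    return [values[i - offset] if i >= offset else values[i]
            for i in range(WARP_SIZE)]

def model_shfl_down(values, offset):
    """Lane i reads from lane i + offset; lanes with no such source keep their own value."""
    return [values[i + offset] if i + offset < WARP_SIZE else values[i]
            for i in range(WARP_SIZE)]

def model_warp_sum(values):
    """Tree reduction: halve the offset each step; lane 0 ends up holding the total."""
    vals = list(values)
    offset = WARP_SIZE // 2
    while offset > 0:
        shifted = model_shfl_down(vals, offset)
        vals = [v + s for v, s in zip(vals, shifted)]
        offset //= 2
    return vals[0]  # only lane 0 is guaranteed to hold the full sum

squares = [i * i for i in range(WARP_SIZE)]  # same data as the field `a` above
shifted = model_shfl_up(squares, 1)
total = model_warp_sum(squares)
print(shifted[:4], total)
```

In this model lane 0 keeps its own value after the up-shuffle, which is why the assertion in the Taichi example only checks lanes 1 through 31. In an actual kernel, the same reduction would presumably be written with `ti.simt.warp.shfl_down_i32` applied at offsets 16, 8, 4, 2, and 1.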