
Life cycle of a Taichi kernel

It is sometimes helpful to understand the life cycle of a Taichi kernel. In short, compilation only happens on the first invocation of an instance of a kernel.

The life cycle of a Taichi kernel has the following stages:

  • Kernel registration
  • Template instantiation and caching
  • Python AST transforms
  • Taichi IR compilation, optimization, and executable generation
  • Launching


Let's consider the following simple kernel:

@ti.kernel
def add(field: ti.template(), delta: ti.i32):
    for i in field:
        field[i] += delta

For the sake of discussion, let's allocate two 1D fields:

x = ti.field(dtype=ti.f32, shape=128)
y = ti.field(dtype=ti.f32, shape=16)

Kernel registration

When the ti.kernel decorator is executed, a kernel named add is registered. Specifically, Taichi remembers the Python Abstract Syntax Tree (AST) of the add function. No compilation happens until the first time add is invoked.
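The self-contained sketch below illustrates this point. The ti.init() call, the ti.cpu backend choice, and the print statement are illustrative assumptions rather than part of the original example; the kernel and field simply restate the running example:

import taichi as ti

ti.init(arch=ti.cpu)  # any available backend works here

@ti.kernel
def add(field: ti.template(), delta: ti.i32):
    for i in field:
        field[i] += delta

# At this point add has only been registered: Taichi has recorded the
# function's Python AST and wrapped it in a kernel object, but no Taichi
# compilation has happened yet.
print(type(add))  # a Taichi kernel wrapper object, not a plain Python function

x = ti.field(dtype=ti.f32, shape=128)
add(x, 42)  # the first call triggers compilation of this instantiation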

Template instantiation and caching

add(x, 42)

When add is called for the first time, the Taichi frontend compiler will instantiate the kernel.

When you have a second call with the same template signature (explained later), e.g.,

add(x, 1)

Taichi will directly reuse the previously compiled binary.

Arguments hinted with ti.template() are template arguments and will incur template instantiation. For example,

add(y, 42)

will lead to a new instantiation of add.

note

Template signatures are what distinguish different instantiations of a kernel template. The signature of add(x, 42) is (x, ti.i32), which is the same as that of add(x, 1). Therefore, the latter can reuse the previously compiled binary. The signature of add(y, 42) is (y, ti.i32), which differs from the previous signature, so a new kernel will be instantiated and compiled.
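One way to observe this caching behavior is to time the three calls. The sketch below continues the running example (add, x, and y as defined earlier) and assumes a CPU backend; the timing helper is purely illustrative and the absolute numbers will vary between machines:

import time

def timed(label, fn):
    # Illustrative helper: time a single kernel call.
    t0 = time.perf_counter()
    fn()
    print(f"{label}: {time.perf_counter() - t0:.4f} s")

timed("add(x, 42)", lambda: add(x, 42))  # signature (x, ti.i32): compiles on this first call
timed("add(x, 1)", lambda: add(x, 1))    # same signature: reuses the cached binary, noticeably faster
timed("add(y, 42)", lambda: add(y, 42))  # signature (y, ti.i32): a new instantiation, compiles again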

note

Many basic operations in the Taichi standard library are themselves implemented as Taichi kernels using metaprogramming tricks. Invoking them will incur implicit kernel instantiations.

Examples include x.to_numpy() and y.from_torch(torch_tensor). When you invoke these functions, you will see kernel instantiations, as Taichi kernels will be generated to offload the hard work to multiple CPU cores/GPUs.

As mentioned before, the second time you call the same operation, the cached compiled kernel will be reused and no further compilation is needed.
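As a concrete illustration, the NumPy round trip below (continuing the running example with the x field) triggers such implicit instantiations on first use; the exact kernels generated under the hood are an implementation detail:

arr = x.to_numpy()     # first call: an implicit field-to-ndarray kernel is instantiated and compiled
x.from_numpy(arr + 1)  # first call: an implicit ndarray-to-field kernel is instantiated and compiled
arr2 = x.to_numpy()    # the same operation again: the cached kernel is reused, no recompilation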

Code transformation and optimizations

When a new instantiation happens, the Taichi frontend compiler (i.e., the ASTTransformer Python class) will transform the kernel body AST into a Python script, which, when executed, emits a Taichi frontend AST. Basically, some patches are applied to the Python AST so that the Taichi frontend can recognize it.

The Taichi AST lowering pass translates Taichi frontend IR into hierarchical static single assignment (SSA) IR, which allows a series of further IR passes to happen, such as

  • Loop vectorization
  • Type inference and checking
  • General simplifications such as common subexpression elimination (CSE), dead instruction elimination (DIE), constant folding, and store forwarding
  • Access lowering
  • Data access optimizations
  • Reverse-mode automatic differentiation (if using differentiable programming)
  • Parallelization and offloading
  • Atomic operation demotion
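If you are curious what the IR looks like after these passes, Taichi can dump the intermediate representation of each compiled kernel. The sketch below assumes the print_ir switch is available in your Taichi build; treat it as a debugging aid rather than a stable API:

import taichi as ti

ti.init(arch=ti.cpu, print_ir=True)  # assumed debug switch: print the IR of every compiled kernel

x = ti.field(dtype=ti.f32, shape=128)

@ti.kernel
def add(field: ti.template(), delta: ti.i32):
    for i in field:
        field[i] += delta

add(x, 42)  # the IR produced by the passes above is printed during this first call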

The just-in-time (JIT) compilation engine

Finally, the optimized SSA IR is fed into backend compilers such as LLVM or Apple Metal/OpenGL shader compilers. The backend compilers then generate high-performance executable CPU/GPU programs.
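Which backend compiler receives the IR is determined by the arch argument passed to ti.init(). The sketch below is only a rough map between arch choices and backend compilers; availability depends on your platform and on how Taichi was built:

import taichi as ti

# Each arch routes the optimized IR to a different backend compiler:
#   ti.init(arch=ti.cpu)     # LLVM, producing a multi-threaded CPU executable
#   ti.init(arch=ti.cuda)    # LLVM (NVPTX) for NVIDIA GPUs
#   ti.init(arch=ti.metal)   # Apple Metal shader compiler
#   ti.init(arch=ti.opengl)  # OpenGL compute shader compiler
ti.init(arch=ti.gpu)  # let Taichi pick an available GPU backend, falling back to CPU if none is found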

Kernel launching

Taichi kernels will be ultimately launched as multi-threaded CPU tasks or GPU kernels.
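On GPU backends these launches are asynchronous, so a fair timing of a kernel should synchronize first. A minimal sketch, assuming ti.sync() is available and restating the add kernel from the running example:

import time
import taichi as ti

ti.init(arch=ti.gpu)

x = ti.field(dtype=ti.f32, shape=128)

@ti.kernel
def add(field: ti.template(), delta: ti.i32):
    for i in field:
        field[i] += delta

add(x, 42)  # warm-up call: includes compilation
t0 = time.perf_counter()
add(x, 42)  # on GPU backends this launch returns to Python asynchronously
ti.sync()   # wait for all launched kernels to finish before reading the clock
print(f"kernel time: {time.perf_counter() - t0:.6f} s")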