std::bodun::blog

SHEPHERD: Serving DNNs in the Wild

Original link: https://www.bodunhu.com/blog/posts/shepherd-serving-dnns-in-the-wild/ Paper link: SHEPHERD: Serving DNNs in the Wild. Achieving scalability, high system goodput, and high resource utilization at the same time is hard for an inference system. While individual request streams can be highly unpredictable, aggregating request streams into moderately sized groups greatly improves predictability, permitting high resource utilization as well as scalability […]

SHEPHERD: Serving DNNs in the Wild Read More »
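The aggregation claim in the excerpt can be illustrated numerically. The sketch below (a hedged NumPy illustration, not SHEPHERD's actual algorithm) models per-window request counts as independent Poisson streams and compares the relative variability of one stream against the aggregated group:

```python
import numpy as np

# Illustrative model only: 16 independent request streams, each Poisson
# with a mean of 5 requests per time window (all parameters are assumptions).
rng = np.random.default_rng(0)
streams = rng.poisson(lam=5, size=(16, 10_000))  # 16 streams x 10k windows

def cv(x):
    """Coefficient of variation: std / mean, i.e. relative unpredictability."""
    return x.std() / x.mean()

single_cv = cv(streams[0])          # one highly variable stream
group_cv = cv(streams.sum(axis=0))  # the aggregated group stream

# Summing K independent Poisson streams shrinks the CV by roughly 1/sqrt(K),
# which is why the group's load is much easier to predict and provision for.
print(f"single stream CV: {single_cv:.3f}, group CV: {group_cv:.3f}")
```

Under this toy model the aggregate's coefficient of variation is several times smaller than any single stream's, matching the intuition that grouped request streams are far more predictable.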

TensorIR Transformation

Original link: https://www.bodunhu.com/blog/posts/tensorir-transformation/ In the previous post, we explored how to write primitive functions in TensorIR. Here, we will see how to transform TensorIR into other (potentially more performant) variants. The content is derived from the MLC course taught by Tianqi Chen. Batched BMM ReLU: a batched matrix multiplication followed by a ReLU […]

TensorIR Transformation Read More »
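For readers unfamiliar with the workload named in the excerpt, a plain-NumPy reference (not the TensorIR version the post develops; shapes are illustrative assumptions) pins down what "batched matrix multiplication followed by a ReLU" computes:

```python
import numpy as np

def bmm_relu(A, B):
    """C[b, i, j] = max(0, sum_k A[b, i, k] * B[b, k, j]).

    np.matmul broadcasts over the leading batch dimension, so A @ B
    performs one matmul per batch; np.maximum applies ReLU elementwise.
    """
    return np.maximum(A @ B, 0)

rng = np.random.default_rng(0)
A = rng.standard_normal((16, 128, 128)).astype("float32")
B = rng.standard_normal((16, 128, 128)).astype("float32")
C = bmm_relu(A, B)  # shape (16, 128, 128), all entries >= 0
```

A TensorIR `T.prim_func` for this workload expresses the same loop nest explicitly, which is what makes schedule transformations (tiling, reordering, parallelization) possible.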

Dive into TensorIR

Original link: https://www.bodunhu.com/blog/posts/dive-into-tensorir/ TensorIR is a compiler abstraction for optimizing programs with tensor computation primitives in TVM. Imagine a DNN task as a graph, where each node represents a tensor computation. TensorIR explains how each node/tensor computation primitive in the graph is carried out. This post explains my attempt to implement 2D convolution using […]

Dive into TensorIR Read More »
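As a companion to the excerpt, here is a direct NumPy reference for 2D convolution (not the TensorIR implementation the post builds; single channel, stride 1, no padding are simplifying assumptions for clarity):

```python
import numpy as np

def conv2d(x, w):
    """out[i, j] = sum over (di, dj) of x[i + di, j + dj] * w[di, dj]."""
    H, W = x.shape
    kh, kw = w.shape
    out = np.zeros((H - kh + 1, W - kw + 1), dtype=x.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output element is a dot product of a kh x kw input
            # window with the kernel -- the loop nest TensorIR makes explicit.
            out[i, j] = (x[i:i + kh, j:j + kw] * w).sum()
    return out

x = np.arange(16, dtype="float32").reshape(4, 4)
k = np.ones((3, 3), dtype="float32")
y = conv2d(x, k)  # 2x2 output of 3x3 window sums
```

The doubly nested loop over output coordinates is exactly the iteration structure a TensorIR `T.prim_func` would declare, after which schedule primitives can tile or vectorize it.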