Machine learning software development frameworks have undergone dramatic changes over the past decade. Most frameworks rely heavily on CUDA and achieve their best performance on NVIDIA GPUs, but with the arrival of PyTorch 2.0 and OpenAI's Triton, NVIDIA CUDA's monopoly on machine learning is gradually breaking.

A few years ago, Google's TensorFlow framework and its dedicated TPU accelerator had a first-mover advantage, and Google was widely expected to dominate the machine learning industry. Yet Google failed to convert that head start into dominance, and PyTorch won out. By favoring its own software stack and hardware instead of PyTorch and GPUs, Google has become somewhat isolated from the machine learning community; in typical Google fashion, it even developed a second framework, JAX, that competes directly with its own TensorFlow.

A big advantage of PyTorch is its flexibility, and the upcoming 2.0 release will make it easier to utilize different hardware resources. PyTorch 2.0 improves NVIDIA A100 GPU training performance by 86% and CPU inference performance by 26%, significantly reducing the computing time and cost required to train models (a sketch of the new compilation API appears below). It also scales to GPUs and accelerators from AMD, Intel, Tenstorrent, Luminous Computing, Tesla, Google, Amazon, Microsoft, Marvell, Meta, Graphcore, Cerebras, SambaNova, and others.

OpenAI's Triton also chips away at NVIDIA's closed-source software moat in machine learning. It can bypass closed-source CUDA libraries such as cuBLAS in favor of open-source ones such as CUTLASS. Writing CUDA requires a deep understanding of the underlying hardware, whereas Triton lets a high-level language achieve performance comparable to low-level code, greatly improving usability (see the kernel sketch below). Triton currently supports only NVIDIA GPUs, but support for other hardware vendors is planned.
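The PyTorch 2.0 speedups cited above come from its new compilation path, exposed through the `torch.compile` API. A minimal sketch of how it is invoked, assuming PyTorch 2.0 is installed; the model and tensor shapes here are purely illustrative:

```python
import torch
import torch.nn as nn

# A small illustrative model; any nn.Module works the same way.
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
)

# PyTorch 2.0: wrap the model once. The graph is captured and compiled
# behind the scenes (emitting Triton kernels on supported NVIDIA GPUs).
compiled_model = torch.compile(model)

x = torch.randn(64, 1024)
out = compiled_model(x)  # first call triggers compilation; later calls reuse it
```

Because the change is a one-line wrapper around an existing model, the same eager-mode PyTorch code keeps running unmodified, which is part of the flexibility argument above.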
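To make the usability claim concrete, below is a vector-addition kernel in Triton, adapted from Triton's own introductory tutorial. Tiling, masking, and bounds checking are expressed in Python, while the compiler handles the low-level scheduling details a hand-written CUDA kernel would manage explicitly. The block size is an illustrative choice:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide chunk of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against out-of-bounds lanes
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    # Launch enough program instances to cover all n elements.
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

# Usage (requires an NVIDIA GPU, the only backend Triton supports today):
x = torch.rand(98432, device="cuda")
y = torch.rand(98432, device="cuda")
assert torch.allclose(add(x, y), x + y)
```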
Source: https://www.solidot.org/story?sid=73904