Original link: https://soulteary.com/2023/07/22/quantizing-meta-ai-llama2-chinese-version-large-models-using-transformers.html
This article explains how to use HuggingFace's Transformers to quantize the LLaMA2 large model produced by Meta AI, so that the model can run with only about 5GB of GPU memory.
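As a rough illustration of the approach, the sketch below loads a LLaMA2-style model in 4-bit precision with Transformers and bitsandbytes. This is a minimal example under stated assumptions, not the article's exact code: the model name is a placeholder, and the article may use a different quantization configuration.

```python
# Minimal sketch: load a LLaMA2-style model in 4-bit with Transformers + bitsandbytes.
# The model name below is a placeholder; substitute the checkpoint you actually use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "meta-llama/Llama-2-7b-chat-hf"  # assumption, not from the article

# 4-bit NF4 quantization; this is what typically brings a 7B model
# down to roughly 5GB of GPU memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

prompt = "Hello"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```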