Dumousi: I chatted a bit more about ChatGPT with colleagues in China and roughly arrived at the following conclusions: 1. The quality and quantity of the Chinese-language corpus are far inferior to those of other languages. Falling behind English would be normal, but falling behind even Japanese is outrageous. Therefore, anyone who wants to train a large language model in China must use corpora in other languages, hoping the model can learn knowledge from them through translation. In fact, this is not a problem unique to large language models. In the human world, we call this "studying abroad." So the issue raises two questions: what "language/corpus quality" actually means, and what the purpose of studying abroad is. While I don't find it especially outrageous for the Chinese corpus to trail Japanese, I'd be interested to know whether that is really the case, and why.
Original link (Lawrence Li Dumus): https://blog.yitianshijie.net/2023/04/19/3694/