Where are the barriers at the AI application layer?

Original link: http://gaocegege.com/Blog/ai-hype

AI has recently become a hot topic again. The release of ChatGPT half a year ago hit everyone like a bolt from the blue. But half a year on, ChatGPT's monthly active users have begun to decline. After trying many applications myself, I have gradually settled on just a few, such as Poe and GitHub Copilot.

I happened to have no Internet access on a business trip, so I took the opportunity to write down my views on the future of AI. Just as mobile phones and mobile networks lit the spark for the mobile Internet, some elements are still missing before AI can truly enter every industry. What is missing, and where are the barriers for future AI applications? My views on these questions are still immature; I am writing them down in the hope of drawing out better ideas from others.

Previous generation AI

The previous AI boom can be traced back to 2012-2015. One of its key milestones was the emergence of AlexNet. In the 2012 ImageNet image recognition challenge, AlexNet beat traditional methods by a striking margin in accuracy, which drew widespread attention and interest.

However, the previous generation of AI faced challenges when it came to real-world deployment. In areas such as CV (computer vision) and audio, AI needed considerable human intervention and domain expertise to achieve good results. Text, meanwhile, is a very broad application scenario, but natural language processing (NLP) at that time still fell short in terms of intelligence. Despite important advances in areas such as machine translation, sentiment analysis, and text generation, challenges remained in understanding semantics, handling context, and generating natural, fluent text.

AI ultimately delivered its best results in Internet search, advertising, and recommendation. These businesses are not only highly profitable, they also form a positive-feedback flywheel through data. As Internet applications collect more user data, the recommendation system keeps improving through continuous training and iteration, and so provides better service to users and advertisers.

Other fields rarely share these characteristics. CV scenarios, for example, have to deal with data security and privacy issues, which makes it hard to build a low-cost positive feedback loop between data and models.

ChatGPT

By contrast, the reason ChatGPT could become one of the fastest-growing applications in history and bring real intelligence to text scenarios is, apart from model scale and a friendly dialogue-based interface, above all RLHF (reinforcement learning from human feedback). However large a model is, its capabilities stay static if there is no way to further optimize and iterate on it. The room for imagination in AI comes from the fact that it is artificial intelligence: it can keep learning and improving.

On the other hand, the AI applications now in full swing have very few real barriers. In my opinion, this is mainly because they cannot use the data they collect efficiently and cost-effectively. We have seen many YC-backed AIGC applications, and most of them build on ChatGPT, target a particular niche, and rely on their own understanding of that industry to shape the product. The barriers for such products are shallow and come entirely from industry know-how. Strictly speaking, such a company is not an AI company.

The better applications today, such as Perplexity AI, Midjourney, and Runway, all have their own models and can keep iterating on them with new data. Most of these projects already use the scale effects of the Internet to gather more data, then use that data to further optimize the model and provide better service. They have formed their own flywheel. If you merely productize Claude or ChatGPT, you end up in a game of involution: ever-fiercer competition for ever-smaller gains.

In-context Learning vs. Fine-tuning

So why can only a few companies do this today? I think it is because, in current NLP scenarios, there is no way to optimize a model cheaply using large amounts of low-to-medium-quality data. Search, advertising, and recommendation were the real beneficiaries of the last AI boom because they could use new data to optimize models at very low computational cost. Take TikTok: user behavior continuously improves the recommendation system behind it, through online learning or offline training.
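
To make "optimizing a model with new data at very low computational cost" concrete, here is a minimal sketch of online learning, assuming a tiny logistic-regression click model. The feature layout, learning rate, and event stream are hypothetical and chosen only for illustration; real recommendation systems are far more elaborate.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, features):
    """Predicted click probability for one item."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))

def online_update(weights, features, clicked, lr=0.1):
    """One SGD step of logistic regression on a single user event."""
    error = predict(weights, features) - clicked
    return [w - lr * error * x for w, x in zip(weights, features)]

# Hypothetical event stream: the user clicks items of type A ([1, 0])
# and skips items of type B ([0, 1]).
events = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 50

weights = [0.0, 0.0]
for features, clicked in events:
    # Each event costs a handful of multiplications; no full retraining is needed.
    weights = online_update(weights, features, clicked)

prob_a = predict(weights, [1.0, 0.0])
prob_b = predict(weights, [0.0, 1.0])
```

After the loop, prob_a has moved above 0.5 and prob_b below it, which is the flywheel in miniature: every interaction nudges the model at almost no cost.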

The barriers for future AI applications in each vertical also lie in data. Only by using data efficiently can a product succeed commercially and avoid sliding into involution. Only by using newly collected data to keep optimizing the model and serving users better can it stand out from the competition. Efficient use of data will therefore be the key to the success of future AI applications.

At present there are two main ways to put data to use: in-context learning and fine-tuning. Let’s look at fine-tuning first. I think fine-tuning is not yet mature in NLP: the cost is high and the results are hard to tune. In the image field, fine-tuning can be done very cheaply with methods such as LoRA. The base model stays unchanged, and the LoRA patch produced by training may be only tens of MB. This also makes deployment more convenient: the base model can be shared, and only the patch needs its own GPU memory. Fine-tuning and inference both become cheap, which makes personalized models practical.
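
A rough back-of-the-envelope calculation shows why a LoRA patch can be so small. LoRA learns a low-rank update B @ A for each adapted weight matrix instead of a full update, so a d_out x d_in matrix needs only rank * (d_out + d_in) extra parameters. The layer sizes, rank, and number of adapted matrices below are hypothetical, not the dimensions of any particular model.

```python
def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in one LoRA pair: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)

def patch_size_mb(num_matrices: int, d_out: int, d_in: int, rank: int,
                  bytes_per_param: int = 2) -> float:
    """Size of the whole patch in MB, assuming 16-bit (2-byte) parameters."""
    total = num_matrices * lora_param_count(d_out, d_in, rank)
    return total * bytes_per_param / 1e6

# Hypothetical setup: 256 adapted matrices of 1024 x 1024, rank 32.
full_params = 1024 * 1024                        # full update of one matrix
lora_params = lora_param_count(1024, 1024, 32)   # low-rank update of one matrix
fraction = lora_params / full_params             # share of the full update
size_mb = patch_size_mb(256, 1024, 1024, 32)
```

Under these assumptions each adapted matrix needs 65,536 parameters instead of 1,048,576 (about 6%), and the whole patch comes to roughly 34 MB, in line with the "tens of MB" mentioned above.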

The situation in natural language processing is rather different. LLMs are much larger than Stable Diffusion (SD) and other models in the CV field, and problems such as overfitting and catastrophic forgetting are harder to solve when fine-tuning an LLM. Fine-tuning with QLoRA or similar algorithms today still involves many engineering tricks, and because of the larger scale the hardware requirements are very high: a setup with dozens of A100 cards already counts as relatively small.

Now let’s look at in-context learning. The main problem is that we do not understand how it works, so we cannot tell whether it can become the new mainstream way for LLMs to learn. I agree that it will be used more and more. But I doubt whether it can keep delivering good results with large amounts of data, or whether it can only reach a “not bad” level through a few tricks. At its core, it is an ability that emerges once the model becomes large enough, and research on it is still in its infancy. For now, the context window clearly caps what it can do, and the large windows made possible by FlashAttention run into a practical problem: the model tends to forget the middle part of the context.
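
The context-window limit can be illustrated with a small sketch of few-shot prompt construction: labeled examples are packed into the prompt until a token budget runs out, and everything beyond that is simply never seen by the model. The whitespace-based token counter is a crude stand-in for a real tokenizer, and the task, examples, and budgets are hypothetical.

```python
def count_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())

def build_few_shot_prompt(examples, query, max_tokens):
    """Pack as many labeled examples as fit into the token budget."""
    header = "Classify the sentiment as positive or negative."
    tail = f"Input: {query}\nLabel:"
    used = count_tokens(header) + count_tokens(tail)
    chosen = []
    for text, label in examples:
        block = f"Input: {text}\nLabel: {label}"
        cost = count_tokens(block)
        if used + cost > max_tokens:
            break  # remaining data does not fit and is dropped
        chosen.append(block)
        used += cost
    return "\n\n".join([header, *chosen, tail]), len(chosen)

examples = [
    ("I loved it", "positive"),
    ("Terrible service", "negative"),
    ("Would buy again", "positive"),
]

prompt_all, n_all = build_few_shot_prompt(examples, "great movie", max_tokens=100)
prompt_small, n_small = build_few_shot_prompt(examples, "great movie", max_tokens=20)
```

With the generous budget all three examples fit; with the tight one only the first does. However much data an application has collected, in-context learning can use only what fits in the window.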

Taken together, the industry still lacks the ability to efficiently use data to iterate on models. I think this capability is to AI what TCP/IP is to the Internet. AI is only truly AI if it can use data to continuously optimize the model.

Imagine that such a “cheap fine-tuning” capability existed: every travel agency application could use new data to keep optimizing its model and offer a more personalized, better-tuned user experience. Techniques such as embedding recall and prompt engineering are also important optimization methods, but they only attach an “external brain” to the model. Pre-designed recall rules and prompt engineering can guide the model toward more accurate and useful answers, but they are hard to iterate on and optimize with new data.
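
The “external brain” pattern can be sketched in a few lines: embed the documents, recall the one closest to the question, and paste it into the prompt. The bag-of-words embedding below is a toy stand-in for a real embedding model, and the travel-agency documents and prompt wording are made up for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": word counts. A real system would call an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def recall(query: str, documents, k: int = 1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, documents, k: int = 1) -> str:
    context = "\n".join(recall(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical knowledge base for a travel agency application.
docs = [
    "refund policy: tours can be cancelled up to 7 days before departure",
    "visa guide: apply at least one month before travel",
]

prompt = build_prompt("can tours be cancelled", docs)
```

Note that nothing in this pipeline touches the model's weights. New data changes only what gets recalled, which is why this approach complements, but does not replace, the ability to iterate on the model itself.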

Further future

These are just some of the things we can expect in the near future. From a longer-term perspective, agents are the more promising future of AI. If the ability to efficiently use data to iterate on models is the TCP/IP of the AI era, agents are the Internet itself.

What I personally look forward to even more is that agents will place new demands on infra and developer tools. Consider the history of the Internet: in the early days of building websites, demand for developer tools was very weak. A webmaster could do everything with PHP plus MySQL, and that was enough to build a site like hao123. But as business complexity grew and teams pursued efficiency, front end and back end split apart, each with its own frameworks. The front end, for example, iterated from jQuery and Angular to Vue and React, each step improving engineers’ development efficiency.

If AI enters the agent era, business complexity will increase by orders of magnitude. Demand for tools will grow with it, and AI infra, like traditional infra, will split into more specialized categories. Every scenario needs a good tool.

But when will the future come? I do not know.

License

This article is licensed under CC BY-NC-SA 3.0.
Please contact me for commercial use.
