Original link: https://reiner.host/posts/8f0289a0.html
Since OpenAI opened up its API, AI products from the major vendors have sprung up like mushrooms after rain. Much like the Internet boom ten years ago, the next big opportunity is likely to be in AI.
Of course, training your own model or developing your own AI has far too high a barrier for individuals and small or medium-sized companies. Even those who manage it end up well behind OpenAI, so the only place ordinary people can realistically compete is the application layer.
Against this background, I started looking into GPT-based question-answering bots built on an index of custom data, and came across two frameworks, llama_index and langchain. I am recording how to use them here.
llama_index: GitHub – jerryjliu/llama_index: LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLM’s with external data.
langchain: GitHub – hwchase17/langchain: Building applications with LLMs through composability
OpenAI model fine-tuning?
At first, I tried OpenAI's model fine-tuning. I fed it several hundred KB of text data, but when I chatted with the fine-tuned model, its replies were always just a few words, often not even a complete sentence.
After searching for more information, I realized that fine-tuning cannot achieve what I wanted with only a few hundred kilobytes or a few megabytes of text data.
In the end I found that llama_index + langchain can achieve the desired effect.
Building an intelligent question-answering bot with llama_index + langchain
step 1. Set up the environment
Install Python 3.10 or above
Install the required libraries:
pip install llama-index
pip install openai
pip install langchain
pip install pandas
Prepare an OpenAI API key
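Before moving on, it can help to confirm that the environment matches the list above. The following is a minimal sketch using only the standard library. It assumes the four pip packages are imported under the names llama_index, openai, langchain and pandas, and that the key is exposed through the OPENAI_API_KEY environment variable (the variable the openai package reads by default). The helper names are hypothetical.

```python
import importlib.util
import os
import sys

# Import names for the four pip packages listed above
# (assumption: llama-index is imported as llama_index).
REQUIRED_MODULES = ["llama_index", "openai", "langchain", "pandas"]


def python_is_new_enough(version_info=sys.version_info, minimum=(3, 10)):
    """Return True if the interpreter meets the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= minimum


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def has_api_key(env):
    """Return True if OPENAI_API_KEY is set to a non-empty value in env."""
    return bool(env.get("OPENAI_API_KEY", "").strip())


if __name__ == "__main__":
    print("Python 3.10+:", python_is_new_enough())
    print("Missing modules:", missing_modules(REQUIRED_MODULES))
    print("OPENAI_API_KEY set:", has_api_key(os.environ))
```

Passing the environment in as a mapping keeps the key check easy to try with a plain dict instead of the real os.environ.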
step 2. Prepare data
Prepare the data the bot will answer from. It can include PDF, HTML and Word documents, SQL databases, APIs, or even online resources such as GitHub and wikis. In this post I will use a simple TXT file with the following example content:
When is The Legend of Zelda: Tears of the Kingdom coming out? “The Legend of Zelda: Tears of the Kingdom” will be released on May 12, 2023, so stay tuned!
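As a small illustration, the example text can be saved into a folder that a directory-based loader could later read. This is only a sketch: the folder name data and the file name zelda.txt are assumptions made for the example, not names taken from the original post.

```python
from pathlib import Path

# The sample content shown above, stored as a single line of text.
SAMPLE_TEXT = (
    "When is The Legend of Zelda: Tears of the Kingdom coming out? "
    "“The Legend of Zelda: Tears of the Kingdom” will be released on "
    "May 12, 2023, so stay tuned!\n"
)


def write_sample_data(data_dir="data", filename="zelda.txt"):
    """Create data_dir if needed, write the sample text as UTF-8, return the path."""
    directory = Path(data_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / filename
    path.write_text(SAMPLE_TEXT, encoding="utf-8")
    return path
```

Calling write_sample_data() with no arguments creates data/zelda.txt relative to the current working directory.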
step 3. Write the Python code
from llama_index import SimpleDirectoryReader, ServiceContext, GPTVectorStoreIndex, PromptHelper, load_index_from_storage, StorageContext
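The imports above point at a vector index that is queried through an LLM. As a rough illustration of the retrieve-then-ask idea behind that kind of index, here is a standard-library-only sketch. It does not use or reproduce the llama_index API: plain word overlap stands in for embedding similarity, and the function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_chunk(question, chunks):
    """Return the chunk sharing the most words with the question.

    Word overlap is a stand-in for the embedding similarity a vector
    index would use; ties go to the earliest chunk. chunks must be
    non-empty, otherwise max() raises ValueError.
    """
    question_words = tokenize(question)
    return max(chunks, key=lambda chunk: len(question_words & tokenize(chunk)))


def build_prompt(question, chunks):
    """Place the most relevant chunk in front of the question for the LLM."""
    context = best_chunk(question, chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context: {context}\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

The resulting prompt string is what would be sent to the model. Only the most relevant piece of the data travels with each question, which is why this approach works with small data sets where fine-tuning did not.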
final step. Run the .py file
Run the Python script and check whether it correctly answers questions about the prepared data
Possible improvements for the future:
Use WebSocket + streaming text output to achieve a typewriter-like effect; streamed output starts arriving sooner, which makes for a better user experience
Record the user's conversation history as context
Combine data Q&A and ordinary chat in one bot that automatically detects which of the two a message belongs to
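The three ideas above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a finished design: the WebSocket transport is left out, the history is a simple bounded list of turns, and the router is a crude word-overlap heuristic. All names here are hypothetical.

```python
import re
from collections import deque


def stream_text(text, chunk_size=4):
    """Yield the text in small pieces, the way a streaming endpoint might."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


class ChatHistory:
    """Keep only the most recent (role, content) turns."""

    def __init__(self, max_turns=6):
        self.turns = deque(maxlen=max_turns)  # oldest turns drop off automatically

    def add(self, role, content):
        self.turns.append((role, content))

    def as_prompt(self):
        """Render the kept turns as 'role: content' lines."""
        return "\n".join(f"{role}: {content}" for role, content in self.turns)


def route(question, knowledge_words, min_overlap=2):
    """Return "data_qa" if the question shares at least min_overlap words
    with the indexed data, otherwise "chat". A crude heuristic only."""
    words = set(re.findall(r"[a-z0-9]+", question.lower()))
    return "data_qa" if len(words & knowledge_words) >= min_overlap else "chat"
```

A server would forward each piece from stream_text to the client as it arrives, and prepend ChatHistory.as_prompt() to the next question so the model sees the earlier turns.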
Recommended GPT-related projects
AutoGPT: GitHub – Significant-Gravitas/Auto-GPT: An experimental open-source attempt to make GPT-4 fully autonomous.
GPT4-FREE: GitHub – xtekky/gpt4free: decentralizing the Ai Industry, just some language model api’s…
OPENAI-JAVA: GitHub – TheoKanning/openai-java: OpenAI GPT-3 Api Client in Java