LLM Prompt Engineering in Practice: Memory (1)

Original link: http://www.zmonster.me/2023/06/21/llm-prompt-engineering-practice-memory-1.html

This is the third article in the “Where to Run NLP” series. The previous articles in the series are:

  1. LLM Prompt Engineering in Practice: Preface
  2. LLM Prompt Engineering in Practice: Prototype

Earlier, I raised a few problems:

  1. ChatMessageHistory keeps historical messages only in memory, so once the conversation process exits, all of the messages are lost (a minimal persistence sketch follows this list)
  2. When the current topic was already discussed much earlier, selecting only the most recent historical messages cannot take advantage of those earlier discussions
  3. Fixing the maximum number of output tokens to accommodate long outputs means that, when the output is actually short, fewer historical messages can be included than the context would allow
  4. A single, globally set temperature affects how the bot handles different kinds of tasks: translation, classification, and factual question answering need little or no randomness, while writing and idea generation benefit from some
  5. Setting the bot’s persona only at the very beginning cannot guarantee that the persona stays consistent and continuous; users can always break it with deliberately constructed inputs
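To make problem 1 concrete: the history lives only in an in-memory list (which is all ChatMessageHistory does), so it vanishes as soon as the process exits. Below is a minimal sketch of the obvious workaround, serializing the message list to disk between sessions. It is only an illustration, not the approach this series will settle on; the file name and helper functions are made up for the example.

```python
import json
from pathlib import Path

# Hypothetical location for the persisted history, for illustration only.
HISTORY_FILE = Path("chat_history.json")

def load_history() -> list[dict]:
    """Load previously saved messages, or start fresh if none exist."""
    if HISTORY_FILE.exists():
        return json.loads(HISTORY_FILE.read_text(encoding="utf-8"))
    return []

def save_history(messages: list[dict]) -> None:
    """Write the message list to disk so it survives a restart."""
    HISTORY_FILE.write_text(
        json.dumps(messages, ensure_ascii=False, indent=2), encoding="utf-8"
    )

# Load whatever was said in earlier sessions, add the new round, save again.
messages = load_history()
messages.append({"role": "user", "content": "What is a language model"})
messages.append({"role": "assistant", "content": "A language model is ..."})
save_history(messages)
```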

Among these, problems 1, 2, and 3 concern the memory mechanism. Each of them can be thought about and tackled on its own, but before doing that, they deserve some further discussion.

When I first proposed using historical dialogue records in the earlier xorius development log, I put it this way: this code is not in a dialogue mode and does not take the chat history into account; if you want to chat, you need to maintain a chat history.
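For reference, “maintaining a chat history” just means keeping a list of messages and sending the whole list, together with the current input, to the model on every turn. A minimal sketch of that loop is below; `chat_model` is a stand-in for whatever actually calls the model (an OpenAI chat completion, a local model, and so on) and is not part of the original xorius code.

```python
def chat_model(messages: list[dict]) -> str:
    """Stand-in for the real model call; takes the full message list, returns a reply."""
    return "(model reply placeholder)"

def chat_loop(system_prompt: str) -> None:
    # The history starts with the bot's persona / system prompt ...
    messages = [{"role": "system", "content": system_prompt}]
    while True:
        user_input = input("You: ")
        if user_input.strip().lower() in {"quit", "exit"}:
            break
        # ... and grows by one user turn and one assistant turn per round,
        # so each reply is produced with the whole conversation in view.
        messages.append({"role": "user", "content": user_input})
        reply = chat_model(messages)
        messages.append({"role": "assistant", "content": reply})
        print(f"Xorius: {reply}")

if __name__ == "__main__":
    chat_loop("You are Xorius, a helpful assistant.")
```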

So the question is: why should the chat history be taken into account when responding to the current user input at all? Doing so in effect assumes that the chat history is helpful for responding to the current input. Given that assumption:

  1. Will this assumption always hold?
  2. Under what circumstances does it hold? In those cases, what kinds of relationships can exist between the historical memory and the current user input?
  3. If the assumption holds and the most recent K rounds of history are used, is there an upper limit on K imposed by what the model can handle?
  4. If the assumption holds and the most recent K rounds of history are used, what proportion of those K rounds will actually be relevant?
  5. If the assumption holds and the most recent K rounds of history are used, will the irrelevant content in them have a negative impact? If so, what might that impact be?

As for the first question: since it is an assumption, of course it will not always hold; otherwise it would be a fact rather than an assumption. The earlier xorius implementation used historical memory under this assumption unconditionally, that is, it tried to use the history at all times, which naturally brings side effects: when responding to user input that does not need any history, forcing the history into the prompt can lead to wrong answers.

For the second question, I think there are roughly the following situations:

  1. The current input mentions something from the historical conversation, e.g.
    • Using a pronoun to refer to one or several objects mentioned in the historical conversation; this situation is called reference

      “It” refers to “language model”

        
      You: What is a language model?
      Xorius:
      A language model is a computer program used to predict the probability distribution of the next word or character in a natural language text. It is an important component in natural language processing and can be used for many tasks such as speech recognition, machine translation, text generation and automatic summarization, etc. Language models are usually based on statistical methods or neural network methods, which build a model by learning a large amount of text data, and use the model to predict the probability distribution of the next word or character. In practical applications, language models can help computers better understand and generate natural language, thereby improving the effect and quality of natural language processing.  
        
      ------------------- Above is the historical dialogue -------------------  
        
      You: Is it intelligent?
      Xorius:
      A language model has no intelligence in itself, it is just a program that predicts the probability distribution of the next word or character given the text data it has. However, through continuous training and optimization, language models can gradually improve the accuracy and efficiency of their predictions, thereby better simulating human language habits. In addition, the language model can also be used in combination with other artificial intelligence technologies, such as machine learning, deep learning, natural language processing, etc., to achieve more intelligent applications. Therefore, although the language model itself has no intelligence, it can provide important support and foundation for the development of artificial intelligence.  
      
    • Referring to one or several objects from the historical conversation, but with the referring expression omitted entirely; this situation is called zero reference

      “How to use in speech recognition” omits the object “language model”

        
      You: What is a language model?
      Xorius:
      A language model is a technique used in natural language processing that models the linguistic structure of a piece of text to calculate the probability of that text. Language models can be used in many applications, such as speech recognition, machine translation, natural language generation, etc. In speech recognition, the language model can help identify the most likely speech transcription results; in machine translation, the language model can help the translation system choose the most appropriate translation result; in natural language generation, the language model can help generate natural, fluent language. Common language models include n-gram models, neural network language models, etc.
        
      ------------------- Above is the historical dialogue -------------------  
        
      You: How to use in speech recognition?
      Xorius:
      In speech recognition, language models are mainly used to solve the problem of recognition ambiguity. When a speech recognition system transcribes a piece of speech, it may encounter multiple candidate results that appear similar on the audio signal but differ greatly in linguistic expression. At this point, the language model can help the system choose the most likely outcome.  
        
      Specifically, the language model will score each candidate result and calculate its probability in language expression. The system then selects the result with the highest score as the final transcription. In this way, the language model can help the system to recognize speech more accurately and improve the recognition accuracy.  
        
      In speech recognition, commonly used language models include n-gram models and neural network language models. The n-gram model is a statistics-based model that models by calculating the probability of occurrence of adjacent n words in the text. A neural network language model is a neural network-based model that trains a neural network to predict the probability of the next word. These models can be used for language model modeling in speech recognition.  
      
    • Using a pronoun to refer to something in the historical conversation at a granularity larger than a single word, such as an event, an idea, a sentence, or even a whole paragraph

      “The last paragraph” refers to the final passage of the answer in the historical dialogue, “They successfully activated the device…and inspired more humans to explore the adventurous spirit of the universe.”

        
      You: Write an adventure story on Europa
      Xorius:
      On the surface of Europa, there is a huge crater called the "Red Sea". The crater is deeper than the deepest ocean on Earth and is believed to be one of the deepest places in the solar system. One day, a group of brave astronauts decided to explore this mysterious place.  
        
      They put on special spacesuits, took a small probe, and started their adventure. They saw some strange energy signatures on the probe that seemed to come from deep underground caves.  
        
      They decided to follow the signals and entered a huge cave system. Inside the cave, they found strange creatures that seemed to have lived in this deep environment for a long time.  
        
      As they travel deeper into the cave, they discover mysterious symbols and patterns that appear to be from an ancient civilization. They began to study these symbols, trying to understand the history and purpose of this civilization.  
        
      During their expedition, they encountered several dangers, including earthquakes and floods. But they persisted, and finally reached the depths of the cave. There, they discovered a mysterious device that appeared to be an energy generator that could power the entirety of Europa.  
        
      They successfully activated the device and brought it back to Earth. This discovery has changed human understanding of Europa and provided a new source of energy for human beings. This adventure story has become an important event in human history, and it has also inspired more humans to explore the adventurous spirit of the universe.  
        
      ------------------- Above is the historical dialogue -------------------  
        
      You: Translate the last paragraph into English
      Xorius: "They successfully activated the device and brought it back to Earth. This discovery changed humanity's understanding of Europa and provided a new source of energy for humanity. This adventure story became an important event in human history and inspired more people to explore the adventure spirit of the universe."
      
    • Referring to something in the historical conversation at a granularity larger than a single word (an event, an idea, a sentence, or even a whole paragraph), but with the referring expression omitted

      “Compress it, use shorter text” leaves the object unstated; it refers to the entire response in the historical dialogue, “In the distant future…looking for unknown worlds and possibilities”

        
      You: Write a Martian story
      Xorius:
      In the distant future, humans have begun to explore space, including Mars. An astronaut named Amy unfortunately lost contact with her colleagues during a mission and was eventually abandoned on Mars.
        
      Amy realizes that she has been abandoned on this desolate planet without any sign of human life. She began to search for food and water sources, but soon discovered that the environment on Mars was very harsh, and there was almost no possibility of any life.  
        
      Over the next few months, Amy continued to search for food and water sources, while also working to repair her communication equipment, hoping to be able to contact humans on Earth. However, there was no response to any of her efforts.  
        
      In the days of loneliness and despair, Amy began to think about her own life and the meaning of existence. She began to re-examine her values and beliefs, and gradually came to accept the reality that she might live alone on this planet forever.
        
      However, just when Amy was about to give up, she received a signal from the earth. It turns out that scientists on Earth have been looking for her and have sent a rescue team to Mars to rescue her.  
        
      With the help of the rescue team, Amy finally returned to Earth successfully and became a well-respected astronaut. Her experience has also inspired more humans to explore space, looking for unknown worlds and possibilities.  
        
      ------------------- Above is the historical dialogue -------------------  
        
      You: Compress it, use shorter text
      Xorius:
      Abandoned on Mars, an astronaut is alone, searching for food and water, repairing communications equipment, and contemplating the meaning of life. In the end, she received a rescue signal from Earth, successfully returned to Earth, and became a well-respected astronaut, inspiring more humans to explore space.
      
  2. Content in the historical memory can serve as knowledge for answering the current input, for example
    • The historical memory contains the definition or a detailed description of some concept involved in the current input; using that knowledge correctly when answering can keep the AI from making things up
    • The historical memory describes the user’s (or the AI’s) preferences, values, or other personalized (or personified?) attributes, and the current input depends on that personal information; using the relevant history correctly keeps the personality shown in the answer consistent and continuous with the one in the history
  3. A mix of the above two cases

Research on multi-turn dialogue surely has more comprehensive and fine-grained classifications of how the history relates to the current input; I don’t want to go through that literature for now. The goal here is just a rough sorting-out: recognizing that there are different situations in which the model uses the chat history.

Building on the answer to the second question, the third question can be partly answered:

  • If the relationship between the current input and the historical memory is the first case (the current input refers to something in the historical dialogue), the memory needed will usually be fairly recent, i.e. K will not be large, and I think the model can process it directly (see the sketch after this list)
  • If the relationship is the second case (the historical memory serves as knowledge for answering the current input), the relevant memory may not be recent at all; it may be content in long-term memory. For example, when I ask it to recommend books, I hope it remembers that I said a year ago that I like science fiction and then recommends some recent science fiction. In that case K may be very, very large, and I think keeping only the most recent K rounds of history would exceed what the model can handle
  • If the relationship is the third case (a mixture of the first two), the situation is the same as in the second case
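To make the capacity limit in the first bullet concrete, here is a minimal sketch of selecting the most recent rounds under a token budget. It assumes OpenAI’s tiktoken tokenizer; the budget value and function names are made up for the example. The budget stands for the context window minus whatever is reserved for the rest of the prompt and for the output (problem 3 above), so K is simply however many whole rounds happen to fit.

```python
import tiktoken

# Tokenizer for the target model; gpt-3.5-turbo is just an example choice.
ENC = tiktoken.encoding_for_model("gpt-3.5-turbo")

def round_tokens(user_msg: str, ai_msg: str) -> int:
    """Rough token count for one round, ignoring the few extra tokens of chat formatting."""
    return len(ENC.encode(user_msg)) + len(ENC.encode(ai_msg))

def recent_rounds_within_budget(
    rounds: list[tuple[str, str]],  # (user message, assistant message) pairs, oldest first
    budget: int = 3000,             # e.g. context window minus tokens reserved for output
) -> list[tuple[str, str]]:
    """Keep whole rounds, newest first, until the token budget runs out."""
    kept: list[tuple[str, str]] = []
    used = 0
    for user_msg, ai_msg in reversed(rounds):
        cost = round_tokens(user_msg, ai_msg)
        if used + cost > budget:
            break  # older rounds, possibly the relevant ones, never reach the prompt
        kept.append((user_msg, ai_msg))
        used += cost
    return list(reversed(kept))  # restore chronological order
```

For the second case this is exactly where things break down: a preference mentioned a year ago sits far outside any reasonable budget and never reaches the prompt.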

For the fourth question, with enough energy one could analyze some public multi-turn dialogue datasets to give a reasonably quantitative answer, but I don’t want to do that at the moment. I raise the question only to make one point: even when the assumption that the chat history helps with the current user input holds, and the helpful information is within the most recent K rounds, not all of those K rounds are necessarily helpful.

The fifth question takes the fourth one a step further. If there is irrelevant content in the most recent K rounds of history, it is bound to have some negative effect: it dilutes the attention weights between the tokens of the answer and the useful content in the model input (the current user input plus the history being used), which may cause the output to deviate from expectations. How large this effect is depends in part on the capability of the model itself; with a strong model (such as GPT-4) it may not be very noticeable.

In summary, using the most recent K rounds of history does have problems, but it is the easiest method to implement, the most intuitive and understandable, and it works reasonably well when the model is strong. Since there is no recognized better solution, it is what everyone generally does.

It is normal for a method to have flaws; there is no perfect method without any problems. The biggest issue here, though, is not the assumption itself but the lack of any corresponding error handling or iterative improvement around it: once the mechanism fails, all you can do is wait for the model itself to improve, or hope the user masters so-called prompt-writing skills and tries again. If you are aware of these problems and keep your expectations appropriate, there is nothing wrong with using this method; but if you use it, get a bad result, and then just mock the AI for being stupid, that is not very interesting.
