I ran some preliminary tests to assess how well GPT-4 can analyze reference documents and generate answers for tasks assigned in the first semester of my graduate certificate program. Note that this project was carried out after I completed those courses, so it was not used for the assignments in those classes.
Retrieval-augmented generation (RAG) is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources ("What Is Retrieval Augmented Generation aka RAG," NVIDIA Blogs).
Conceptually, large language models (LLMs) are probabilistic text generators: they sample output from probability distributions learned during training. Their output can cover a wide range of possibilities, but it does not consistently reflect factual information. To keep generated text aligned with the facts, a reference or contextual check becomes necessary. However, LLMs have a limited context window, often a few thousand tokens (for example, about 4,096 tokens for early ChatGPT models), so we cannot feed the full text of a long document to an LLM; some searching and filtering are required beforehand.
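As a quick illustration, here is a minimal sketch of checking whether a document fits in a model's context window, using OpenAI's tiktoken tokenizer. The 4,096-token limit and the cl100k_base encoding are illustrative assumptions; actual limits vary by model.

```python
import tiktoken

CONTEXT_LIMIT = 4096  # assumed limit for an early ChatGPT-class model
enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by GPT-3.5/GPT-4

def fits_in_context(text: str, limit: int = CONTEXT_LIMIT) -> bool:
    """Return True if the text's token count is within the model's limit."""
    n_tokens = len(enc.encode(text))
    print(f"{n_tokens} tokens (limit {limit})")
    return n_tokens <= limit

# A book-length reference document will far exceed the limit,
# which is why retrieval must narrow the text down first.
print(fits_in_context("some long reference document ... " * 1000))
```

A RAG system does the following (a runnable sketch follows the list):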
- First, break a sizable reference text into manageable pieces, or chunks. Each chunk is then assigned a numerical representation of its meaning, known as an embedding, which machines can compare. These embeddings are stored either locally or in an external repository, depending on their size.
- When a question is posed about the document, the system embeds the question, compares its similarity against the stored chunk embeddings, and retrieves the most relevant chunks.
- The LLM processes and summarizes the retrieved chunks, using this information to generate a coherent and contextually accurate answer.
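Here is a minimal end-to-end sketch of those three steps, under some stated assumptions: chunking is naive fixed-size splitting, embeddings come from the open sentence-transformers model all-MiniLM-L6-v2, similarity is cosine, and generation uses OpenAI's chat completions API with an OPENAI_API_KEY set in the environment. The chunk size, model names, and top_k are illustrative choices, not requirements.

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from openai import OpenAI

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chunk(text: str, size: int = 500) -> list[str]:
    """Step 1a: break the reference text into fixed-size chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_index(document: str):
    """Step 1b: embed every chunk; keep chunks and embeddings together."""
    chunks = chunk(document)
    vectors = embedder.encode(chunks, normalize_embeddings=True)
    return chunks, np.asarray(vectors)

def retrieve(question: str, chunks, vectors, top_k: int = 3) -> list[str]:
    """Step 2: embed the question and return the most similar chunks."""
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = vectors @ q  # cosine similarity, since vectors are normalized
    best = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in best]

def answer(question: str, document: str) -> str:
    """Step 3: have the LLM answer using only the retrieved context."""
    chunks, vectors = build_index(document)
    context = "\n---\n".join(retrieve(question, chunks, vectors))
    prompt = (f"Answer the question using only this context:\n{context}\n\n"
              f"Question: {question}")
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

In a production system the naive character-based splitter would typically be replaced with sentence- or token-aware chunking, and the in-memory NumPy index with a vector database, but the retrieval-then-generation flow stays the same.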
Please refer to my LinkedIn article for more details: https://www.linkedin.com/pulse/unlocking-knowledge-ai-powered-conversations-your-private-shijun-ju-snofe