What is RAG (Retrieval-Augmented Generation)?
A technique that enhances language model outputs by retrieving relevant information from external knowledge sources before generating responses.
Detailed Explanation
Retrieval-Augmented Generation (RAG) is an architecture pattern that combines information retrieval with text generation. Instead of relying solely on the knowledge stored in a language model's parameters, a RAG system first searches external knowledge sources (documents, databases, web pages) for relevant information, then passes that context to the language model along with the user's query. Because the answer is grounded in retrieved text, responses tend to be more accurate and current, and they can be checked against their sources. RAG is particularly important for agents like Hermes that need to recall specific user preferences, project details, or facts learned in previous sessions. In such agents, the persistent memory system functions as a RAG pipeline, retrieving relevant context before the agent responds.
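The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the word-overlap score stands in for a real retriever, the names tokenize, retrieve, build_prompt, and KNOWLEDGE_BASE are hypothetical, and the final call to a language model is left out because it depends on whichever model API is in use.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return up to k documents that share at least one token with the query.

    Word overlap is a toy relevance score standing in for a real retriever.
    """
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(doc)), doc) for doc in documents]
    relevant = [pair for pair in scored if pair[0] > 0]
    # list.sort() is stable, so ties keep their original document order.
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in relevant[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved passages ahead of the question in one prompt."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}"
    )


# Hypothetical knowledge base used only for this illustration.
KNOWLEDGE_BASE = [
    "Refunds are accepted within 30 days of purchase.",
    "Support is available by email on weekdays.",
    "Shipping to Canada takes five business days.",
]

query = "How many days do refunds take?"
prompt = build_prompt(query, retrieve(query, KNOWLEDGE_BASE, k=1))
print(prompt)  # this string is what would be sent to the language model
```

The point of the sketch is the ordering: retrieval runs first, and the model only sees the question together with the passages that retrieval selected.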
Frequently Asked Questions
Does RAG replace fine-tuning?
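No. RAG and fine-tuning serve different purposes. RAG gives the model access to dynamic, up-to-date information at query time without changing the model itself. Fine-tuning adjusts the model's weights to change how it behaves, for example its tone, output format, or skill at a particular task. They are often used together.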
What makes a good RAG system?
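Key factors include the quality of the knowledge base, the effectiveness of the retrieval mechanism (often a hybrid of semantic and keyword search), relevance ranking of the retrieved passages, context window management (fitting the most useful passages into the model's limited input), and handling of conflicting information across sources.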
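Two of these factors, combining semantic and keyword results and managing the context window, can be illustrated with a short sketch. It assumes the two retrievers already exist and each returns a ranked list of passage IDs. Reciprocal rank fusion is one common way to merge such lists and is used here as an example, not as the only option. The function names are hypothetical, and the word count is only a rough stand-in for a real token count.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of IDs into one list, best first.

    Each ID earns 1 / (k + rank) from every list it appears in; k = 60 is
    a commonly used default that dampens the influence of top ranks.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break exact ties alphabetically by ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def pack_context(passages: list[str], max_words: int) -> list[str]:
    """Keep passages in ranked order until the word budget would be exceeded.

    Counting words is an approximation; a real system would count tokens
    with the tokenizer of the target model.
    """
    selected: list[str] = []
    used = 0
    for passage in passages:
        size = len(passage.split())
        if used + size > max_words:
            break
        selected.append(passage)
        used += size
    return selected


# Hypothetical outputs of a semantic retriever and a keyword retriever.
semantic_ranking = ["a", "b", "c"]
keyword_ranking = ["b", "d", "a"]
fused = reciprocal_rank_fusion([semantic_ranking, keyword_ranking])
print(fused)
```

Passages that both retrievers rank highly rise to the top of the fused list, and pack_context then trims the ranked passages to whatever budget the prompt allows.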
How is RAG used in AI agents?
Agents use RAG to recall relevant past interactions, user preferences, and learned facts before responding. This makes conversations contextually aware rather than stateless.
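As a rough illustration of recall across sessions, the sketch below stores facts in a JSON file and retrieves the ones that overlap with a new query. The MemoryStore class, its methods, and the file layout are all hypothetical; this is not how Hermes or any particular agent implements memory, and the word-overlap scoring stands in for a real semantic retriever.

```python
import json
import re
import tempfile
from pathlib import Path


def _tokens(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class MemoryStore:
    """Toy persistent memory: facts are kept in a JSON file between sessions."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Load facts saved by an earlier session, if the file already exists.
        self.facts: list[str] = (
            json.loads(path.read_text()) if path.exists() else []
        )

    def remember(self, fact: str) -> None:
        """Append a fact and write the full list back to disk."""
        self.facts.append(fact)
        self.path.write_text(json.dumps(self.facts))

    def recall(self, query: str, k: int = 3) -> list[str]:
        """Return up to k stored facts sharing at least one word with the query."""
        query_tokens = _tokens(query)
        scored = [(len(query_tokens & _tokens(fact)), fact) for fact in self.facts]
        relevant = [pair for pair in scored if pair[0] > 0]
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for _, fact in relevant[:k]]


with tempfile.TemporaryDirectory() as tmp:
    memory_file = Path(tmp) / "memory.json"

    # Session one: the agent learns two facts.
    session_one = MemoryStore(memory_file)
    session_one.remember("User prefers dark mode in the editor")
    session_one.remember("Project deadline is Friday")

    # Session two: a fresh object reloads the file and recalls what is relevant.
    session_two = MemoryStore(memory_file)
    recalled = session_two.recall("Which editor theme does the user prefer?")
    print(recalled)
```

The recalled facts would then be added to the prompt before the agent responds, which is what makes the second session aware of the first.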