Apple’s CLaRa system fundamentally reimagines R...
Apple just quietly solved one of the biggest bottlenecks in RAG. Its new system, CLaRa, doesn't just give better answers; it's also dramatically more efficient. Current RAG systems receive a query, generate an embedding, and then hope to find the right documents. That means the query has to be semantically similar to the answer. Apple does it differently. CLaRa digests documents into memory tokens: hyper-compressed representations of a document's content, stripped of filler and syntax. The retriever then takes the query, transforms it into a hypothetical ideal answer, and matches that against the memory tokens in the database. Crucially, this part is trainable: the query reasoner is connected directly to the retrieval step, so it can improve its query generation over a few training runs. The whole concept is basically HyDE (Hypothetical Document Embeddings) on steroids, running at very reasonable model sizes. I know it's a little complex, but I created a summary for you. Comment RAG if you want a link, and don't forget to follow.
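To make the retrieval idea concrete, here is a minimal sketch of HyDE-style retrieval against compressed document representations. Everything here is illustrative, not Apple's actual CLaRa API: the `memory_tokens` vectors stand in for learned compressed document embeddings, and `query_reasoner` stands in for the model that rewrites a query into a hypothetical ideal answer.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "memory token" embeddings: in CLaRa these would be learned,
# compressed representations of each document's content.
memory_tokens = {
    "doc_cooking": np.array([0.9, 0.1, 0.0]),
    "doc_rag":     np.array([0.1, 0.9, 0.2]),
}

def query_reasoner(query: str) -> np.ndarray:
    # Stand-in for the trainable query reasoner: it would rewrite the
    # query into a hypothetical ideal answer and embed it. Here we just
    # return a fixed embedding for demonstration.
    return np.array([0.2, 0.8, 0.1])

def retrieve(query: str) -> str:
    # Match the hypothetical-answer embedding against all memory tokens
    # and return the best-scoring document.
    q = query_reasoner(query)
    return max(memory_tokens, key=lambda d: cosine(q, memory_tokens[d]))

print(retrieve("How does RAG work?"))  # → doc_rag
```

The key design point is that the similarity comparison happens between an *answer-shaped* embedding and the document representations, rather than between the raw query and the documents.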
Summary
CLaRa transforms document retrieval by using Memory Tokens, which are compressed content representations, and a trainable Query Reasoner that generates a hypothetical ideal answer before retrieval. This approach improves accuracy without requiring extensive computational resources, operating more efficiently than traditional RAG systems by directly connecting the query reasoner to the retrieval step for continuous optimization.
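A toy illustration of why connecting the query reasoner to retrieval makes it trainable: if you can score how well the generated query embedding matched the correct document's memory token, you can nudge the query generation toward it. This is a hypothetical simplification; the real system would backpropagate through the reasoner model rather than update a raw vector.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

target = np.array([0.1, 0.9, 0.2])   # memory token of the correct document
q = np.array([0.8, 0.2, 0.0])        # initial (poor) query embedding

before = cosine(q, target)
for _ in range(50):                   # gradient-descent-like nudges
    q += 0.1 * (target - q)
after = cosine(q, target)

print(before < after)  # similarity to the right document improved
```

The point is simply that the retrieval score itself provides a training signal, which is what "directly connecting the query reasoner to the database" buys you.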