Introduction
The allure of “talking to my data with AI” has many engineering teams building prototypes, testing models, and exploring new ways to improve productivity and efficiency. Regardless of whether you buy into the hype surrounding retrieval-augmented generation (RAG) - a fancy phrase for grounding a large language model (LLM) in your own, smaller dataset - there are definitely some opportunities for quick wins. You gather some documents from Google Drive, make them searchable, add a prompt template, call a model, and… it works!
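In fact, the whole prototype loop fits in a few lines. Here’s a minimal sketch of that flow, with naive keyword retrieval standing in for a real search engine and the final prompt printed rather than sent to a model - the document set, `retrieve`, and `build_prompt` are all illustrative:

```python
# Illustrative documents; in the prototype these would come from Google Drive.
DOCS = [
    "Expense reports are due by the 5th of each month.",
    "The VPN setup guide lives in the IT handbook, section 3.",
    "New hires receive laptops on their first day.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Naive keyword retrieval: rank documents by query-term overlap."""
    terms = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: -len(terms & set(d.lower().split())))
    return ranked[:k]

PROMPT = "Answer the question using only this context:\n{context}\n\nQuestion: {question}"

def build_prompt(question: str) -> str:
    """Fill the prompt template with the retrieved context."""
    context = "\n".join(retrieve(question, DOCS))
    return PROMPT.format(context=context, question=question)

# In a real system, this string would be sent to an LLM API.
print(build_prompt("When are expense reports due?"))
```

Every piece of this sketch - retrieval quality, chunking, prompt design, the model call - is exactly where the enterprise-scale challenges below show up.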
It’s true: a simple RAG system can feel magical. It’s fun to see the first glimmers of a correct answer when you try it out. Before long though, the depth and challenges of building such a system for the enterprise become apparent. The challenges aren’t isolated either. They are interlinked and each new challenge can require changes throughout the system.
In this post, we’ll highlight some of the challenges to give you an understanding of the big picture. They include diversity of sources, data design, permissions, and of course, all the internal technical details for search and LLMs.
Diversity of data sources
A valuable answer engine can draw on the right sources of data from across the company. If you’re a Microsoft shop, you’ve got interesting data in SharePoint, Teams, Outlook, and more. Oh, and several of the engineering teams went off-road and installed Slack. And there are a few rogue Notion instances over in Marketing. This proliferation is normal, but now your RAG system needs data from all of these sources.
Most modern systems provide an API for retrieving text and metadata. However, connecting to all of these sources and extracting data is a whole software project unto itself. You’ve got numerous styles of API to handle, including REST, gRPC, and GraphQL. Each API is structured differently and exposes different objects. And that’s all before grappling with details like rate limits, streaming, and data updates. “Hey engineers, Frank just updated the HR FAQ doc and the answers are wrong!”
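To make one of those details concrete, here’s a hedged sketch of the rate-limit chore every connector ends up owning. The `RateLimited` exception and `fetch` callable are illustrative stand-ins for whatever signal a given API actually sends (often an HTTP 429 response with a Retry-After header):

```python
import time

class RateLimited(Exception):
    """Illustrative stand-in for a source API's rate-limit signal
    (often an HTTP 429 response with a Retry-After header)."""
    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after

def with_backoff(fetch, max_attempts: int = 5):
    """Call fetch(), sleeping whenever the source says we're going too fast."""
    for _ in range(max_attempts):
        try:
            return fetch()
        except RateLimited as exc:
            time.sleep(exc.retry_after)
    raise RuntimeError("source kept rate-limiting us; giving up")
```

Multiply this by pagination, cursors, incremental sync, and deletions - per source - and the scope of the connector project becomes clear.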
To further compound the problem, many answers don’t exist just in one place. Want a quick summary on a particular customer? Well, you just need to ask Salesforce for recent account updates… and then also Zendesk for the latest support tickets… and also Slack for those active conversations about this customer. In today’s business, you don’t work in one place, so even the idea of using an “AI Assistant” in one of these products isn’t going to give your users a complete picture.
Good retrieval means normalizing data
Once you have the data, it’s time to put it in a search engine. You can’t have RAG without retrieval! Search is a deep domain, but one that’s often taken for granted when software engineers build a prototype. There’s a reason enterprise search teams are well positioned to succeed with modern search and answer engines.
The diversity of sources means a diversity of text and metadata structure. A well-tuned search platform benefits from uniform chunks of text, normalized schema, and well-formed metadata. Your connectors and pipeline must understand how to assemble and batch that data, how to map various fields to a common schema, and how to push everything into your search engine efficiently. These are table stakes - and we haven’t even talked about embeddings, hybrid search, and ranking!
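As a sketch of what that field mapping looks like, here’s a toy normalizer assuming two hypothetical raw payloads - a Slack-style message and a Drive-style document - mapped onto one illustrative common schema (the field names are assumptions, not any product’s actual API contract):

```python
from dataclasses import dataclass

@dataclass
class Doc:
    """An illustrative common schema for the search index."""
    id: str
    source: str
    title: str
    body: str
    updated_at: str

def from_slack(msg: dict) -> Doc:
    """Map a Slack-style message payload onto the common schema."""
    return Doc(id=msg["ts"], source="slack", title=f"#{msg['channel']}",
               body=msg["text"], updated_at=msg["ts"])

def from_gdoc(doc: dict) -> Doc:
    """Map a Drive-style document payload onto the common schema."""
    return Doc(id=doc["documentId"], source="gdrive", title=doc["name"],
               body=doc["content"], updated_at=doc["modifiedTime"])
```

One mapping function per source sounds simple until you have a dozen sources, each evolving its payloads independently.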
A few Slack messages are not the same as a 10-page Google Doc, but both can have great value depending on what you need in the moment. As soon as you expand to more than one source, you’ll quickly see how much work and effort this can be. Even the most rigorous, documentation-centric teams that we’ve worked with will admit that sometimes decisions are documented in GitHub PRs, or Jira tickets, or Teams channels. Your job is to improve the SEO for answers, which all starts with effective normalization.
Permissions, privacy, and governance
In our consumer lives, finding answers with tools like ChatGPT or Microsoft Copilot is remarkably effective, thanks to the accessibility of public-domain knowledge and the power of LLMs. Building search and RAG over your public-facing knowledge base is a nice step, as is an internal tool that covers internally public data like the employee handbook. But not everyone in your organization has access to everything, right? You need a system that effectively leverages internal knowledge bases, documents within each department, private chat channels, and even email.
This is an intersection of permission systems, privacy policies, and information governance that is critical for the company. Sally the software engineer had better not get answers that include CFO Julio’s private email about compensation. Some RAG tools, like ChatGPT Enterprise, have tried to work around this by having you upload the candidate documents first, which is great when you’ve already found the place where the answers live. But this just sidesteps the inherent permissions problem - and still leaves the user with a search problem to figure out themselves. It takes serious technical understanding and implementation to layer a permission-aware system for data ingestion and search into a RAG platform.
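One common approach is to sync each document’s access control list at ingest time, then trim retrieval results against the user’s group memberships at query time - before any text reaches the LLM. A minimal sketch, assuming each indexed chunk carries an `allowed_groups` set copied from the source system (the index entries and group names are invented for illustration):

```python
def visible_chunks(chunks: list[dict], user_groups: set[str]) -> list[dict]:
    """Keep only chunks whose ACL intersects the user's group memberships."""
    return [c for c in chunks if c["allowed_groups"] & user_groups]

# Sample index entries with ACLs copied from the source systems at ingest.
INDEX = [
    {"text": "Employee handbook: PTO policy", "allowed_groups": {"all-staff"}},
    {"text": "Compensation review notes", "allowed_groups": {"execs"}},
]
```

The hard part isn’t this filter - it’s keeping those ACLs fresh across every source as sharing settings change.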
Technology selection for a RAG platform
We’ve talked a lot about the data. Yet you still have to put everything together and make key technology selections - and then manage and operate all of these systems. There are numerous search engines that support basic text search, but only a few also support vectors and hybrid search. Speaking of which, let’s spend the next few weeks selecting an embedding model and figuring out how to generate vectors for all our text.
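On the ranking side, one widely used way to combine keyword and vector results is reciprocal rank fusion (RRF), which needs only the two ranked lists rather than comparable scores. A small sketch (the `k=60` constant is a conventional default, not a tuned value):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document ids via reciprocal rank fusion:
    each document scores the sum of 1 / (k + rank) across the lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Fuse a keyword ranking with a vector ranking of the same candidates.
fused = rrf_fuse([["a", "b", "c"], ["b", "c", "a"]])
```

RRF is appealing precisely because it dodges the question of how to calibrate BM25 scores against cosine similarities - but it’s one of many choices you’ll have to make and defend.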
Now, let’s pick the right LLM to call. I hear GPT-4 is pretty hot - but is it okay to send your corporate data over the wire to OpenAI? What are Amazon Bedrock and the Azure OpenAI Service? What are 1M tokens, and why is the boss asking about prompt engineering?
All of this adds up to some pretty important long-term decisions to enable the experiences you want for your end-users. Not to mention the concerns from your security or legal teams about data privacy and governance.
Closing
Admittedly, we think all this makes for loads of fun!
We’ve painted a high-level picture of search and RAG challenges for your enterprise. We’ve also spent the last few years tackling these challenges, optimizing our system, and adapting to the new wave of AI. Maybe some of these problems aren’t ones you really want to solve. But that’s also where we can help!
If you’re interested in getting more value from your corporate data, reach out! We love to talk with folks about their internal challenges, and we can help you deliver on your AI readiness plans with limited risk trials.
Dave Cliffe is the Head of RAG (Rendering AI Guidance) at Atolio. Atolio helps enterprises use Large Language Models (LLMs) to find answers privately and securely.