Building Cost-Efficient Agentic RAG on Long-Text Documents in SQL Tables

a reliable, low-latency, cost-efficient RAG system on a SQL table that stores large documents in long-text fields — without changing the existing schema?

This is not a theoretical problem.

In most enterprises, critical business knowledge already lives inside traditional relational databases. Proposals, reports, contracts, articles — all stored in TEXT or LONGTEXT columns — designed for keyword matching and aggregations, not semantic retrieval.

With the advent of LLMs, business demands have evolved to structured computation, deep semantic understanding and contextual insights in a natural, conversational manner.

For example:

How many projects over $1M were approved from 2023 to 2025?
Summarize the major trends seen in technology over the last 6 months
What has been the differentiators of winning proposals in 2025?

They require a retrieval strategy that can decide when to compute, when to semantically search, and when to combine both. In this article, I’ll demonstrate an Agentic RAG architecture that operates directly on top of a traditional SQL database — without schema changes — and discuss the design principles required to make it reliable in production.

System setup

For this illustration, I have used a subset of the Social Animal 10K Articles with NLP dataset, which has a large number of news articles and blog posts along with metadata. The SQL database created has the following columns — url, title, authors, published_date, article_category, word_count and the full_content.

The title can be considered to be a unique identifier (primary key) for the content. The article categories are technology, business, sports, travel, health, entertainment, politics and fashion. The articles are distributed approximately evenly across the categories. The LLM used is gemini-2.5-flash and FAISS to index and store the vector embeddings. The design is applicable for any choice of LLM or vector database.

Architecture

Besides embedding the raw text, we mirrored the vector store metadata with the same fields present in SQL (except the full content). This allows for Filtering, as we will see in the results. For long documents, a sliding window chunking and embedding strategy can be adopted with the metadata attached to each embedding.

The metadata code snippet is attached

for idx, row in df_sql.iterrows():
    content = str(row['full_content']).strip()
    if not content:
        continue
        
    metadata = {
        "source": row.get('url', ''),
        "title": row.get('title', ''),
        "authors": str(row.get('authors', '')),
        "article_category": str(row.get('article_category', 'unknown')),
        "published_date": str(row.get('published_date', '')),
        "word_count": int(row.get('content_word_count', 0))
    }
    
    doc = Document(page_content=content, metadata=metadata)
    documents.append(doc)

We built two specialised, intelligent tools that the ReAct agent can invoke using the following architecture. The ReAct (router) agent orchestrates the entire query pipeline by intelligently deciding which tool to invoke based on the nature of the query. It uses the metadata and query context to determine whether the SQL tool, the vector tool, or a hybrid approach is most appropriate. The following figure depicts the query decision flow:

The tools are as follows:

search_database (SQL tool): Handles questions that require computation, aggregation or complex logic. It executes SQL queries
search_articles (Vector tool): Handles questions about content, topic or specific entities. Accepts a natural language query, and optionally, metadata filters to execute a global semantic search (eg: “articles about children“) or search a subset of the data (eg: “filter_authors=’XYZ’, “query”=”articles”).

As can be seen in the above figure, a query can take the following pathways:

For computations (eg; how many articles..), inequalities/range (eg: Articles published between Jan and Apr 2023) or aggregations (eg; What is the average word count..), use the SQL tool only.
Semantic search, with or without filters, use the Vector tool as explained above.
Hybrid query: Hybrid queries are essential when you need both structured data (e.g., filtering by date) and unstructured content (e.g., semantic search on articles). The query has a metadata filter criteria (usually category or a date range), for which the SQL tool is used to fetch articles. Then the list of titles is passed to the Vector tool to perform a semantic search on only those articles. An example would be “between march and may 2023 is there any article about mother’s day in fashion“

Results

Here are the results of a few queries of each type. It would be interesting to note the tool pathway followed in each case.

1. “what are the articles by year and category?” — This is a complex computation query, so only the SQL tool will be used.

The result is here

QUERY: what are the articles by year and category ?
TOOL SELECTION:
  - search_database: {'query': "SELECT strftime('%Y', published_date) AS year, article_category, COUNT(*) FROM articles GROUP BY year, article_category"}
DOCUMENTS SELECTED:
  - Tool Output: [('1970', 'business', 1), ('2001', 'politics', 1), ('2013', 'politics', 1), ('2018', 'entertainment', 1), ('2020', 'entertainment', 1), ('2020', 'sports', 1), ('2022', 'fashion', 1), ('2023', 'business', 19), ('2023', 'entertainment', 16), ('2023', 'fashion', 16), ('2023', 'health', 18), ('2023', 'politics', 16), ('2023', 'sports', 16), ('2023', 'technology', 18), ('2023', 'travel', 18)]
RESULT:
Here are the number of articles by year and category:
* 1970: business (1)
* 2001: politics (1)
* 2013: politics (1)
* 2018: entertainment (1)
* 2020: entertainment (1), sports (1)
* 2022: fashion (1)
* 2023: business (19), entertainment (16), fashion (16), health (18), politics (16), sports (16), technology (18), travel (18)

2. “what articles about children do you have?” — This does not match any of metadata categories that the schema consists of, so the agent decides to use the Vector tool with a global semantic search.

The result is here

QUERY: what articles about children do you have
TOOL SELECTION:
  - search_articles: {'query': 'children'}
DOCUMENTS SELECTED:
  - Tool Output: --- Document 1 ---
Source: https://www.filmcompanion.in/features/indepth-stories/streaming/best-childrens-movies-on-amazon-prime-video-netflix-hotstar-klaus-kung-fu-panda-jagga-jasoos-childrens-day-2020
Title: 10 Films That Children Should Watch
Category: entertainment
Date: 2020-11-14 02:30:36
Content: On this children's day, here's a list of stories designed to help young ones understand the world around them, gain life lessons and be dazzled by colourful imagination. It's a great time to be a...
  - https://www.filmcompanion.in/features/indepth-stories/streaming/best-childrens-movies-on-amazon-prime-video-netflix-hotstar-klaus-kung-fu-panda-jagga-jasoos-childrens-day-2020
  - https://africabusiness.com/2023/04/07/save-the-children-and-thinkmd-expand-partnership-to-improve-the-lives-of-children-globally/
  - https://www.tcpalm.com/story/news/education/st-lucie-county-schools/2023/04/11/books-stay-in-st-lucie-county-schools-but-most-move-to-high-school/70098338007/
RESULT:
Here are some articles about children:

1.  10 Films That Children Should Watch (entertainment)
2.  Save the Children and THINKMD Expand Partnership to Improve the Lives of Children Globally (health)
3.  St. Lucie County School Board decides to keep challenged books in school libraries (health)

3. “what are the trends in fashion?” — The agent finds the category = fashion, and runs the semantic match using the Vector tool with this filter criteria.

The result is here

QUERY: what are the trends in fashion
TOOL SELECTION:
  - search_articles: {'query': 'trends', 'filter_category': 'fashion'}
DOCUMENTS SELECTED:
  - Tool Output: --- Document 1 ---
Source: https://www.sightunseen.com/2023/04/the-best-thing-we-saw-in-milan-today-india-mahdavi-for-gebruder-thonet-vienna/
Title: The Best Thing We Saw in Milan Today: India Mahdavi for Gebrüder Thonet Vienna - Sight Unseen
Category: fashion
Date: 2023-04-18 12:00:00
Content: How to Live With Objects

The Best Thing We Saw in Milan Today: India Mahdavi for Gebrüder Thonet Vienna

Sight Unseen is on the ground at the Milan Furniture Fair right now and we’ll be bringing you load...
  - https://www.sightunseen.com/2023/04/the-best-thing-we-saw-in-milan-today-india-mahdavi-for-gebruder-thonet-vienna/
  - https://themoderndaygirlfriend.com/clean-make-up-skincare-brand-in-2023/
  - https://poprazzi.com/the-80s-inspired-jewelry-trend-im-absolutely-fawning-over/
RESULT:
The search results mention the following trends in fashion: India Mahdavi for Gebrüder Thonet Vienna, clean make up & skincare, and 80's-inspired jewelry.

4. “tell me technology articles about crypto in 2023” — This is a hybrid query where the SQL tool will be used to get the titles in 2023 for category = technology, then the Vector tool will be invoked with the query = crypto and the title list. The result will be found within that subset.

The result is here

QUERY: tell me technology articles about crypto in 2023
TOOL SELECTION:
  - search_database: {'query': "SELECT title FROM articles WHERE article_category = 'technology' AND published_date LIKE '2023%'"}
  - search_articles: {'filter_titles': ['NPR quits Twitter, says Musk-led platform is “undermining our credibility”', 'Crypto.com arena considers rebrand after FTX fallout reignites investor anger', "World's Biggest EV Battery Maker Unveils S", 'What Checks Should You Perform Before Buying a Used Car? - Free Car Mag', 'Preeclampsia Diagnostics Market 2023-2030 Overall Analysis Report | Fortune Business Insights', 'Spanish startup on ‘mission to save planet’s beer’ from climate crisis', 'Best third-party apps for Reddit (2023)', "How Ford's 'driverless' BlueCruise system works", "Save $100 on Apple's M1-Powered iPad Air", 'Media release: Music cassette sales highest in nearly two decades - allmediascotland…media jobs, media release service and media resources for all', 'Software Engineer, Service Platform at Stripe | echojobs.io', 'Driving data to deliver net zero with life cycle assessments', 'The Future of Work: Trends and Predictions for the Next Decade', 'Risks of Outsourcing Software Development to Uruguay', 'Kochie Arrest News Addressed, Affected By Scam', 'Paying attention to employee retention | IRIS', 'Could this foam-filled £7,250 track bike rival the Team GB Hope HB.T?', 'Aave, Chainlink, and Binance are concentrating on enhancing the Shapella, while Big Eyes Coin is aiming for a significant final push'], 'query': 'crypto'}
DOCUMENTS SELECTED:
  - Tool Output: [('NPR quits Twitter, says Musk-led platform is “undermining our credibility”',), ('Crypto.com arena considers rebrand after FTX fallout reignites investor anger',), ("World's Biggest EV Battery Maker Unveils S",), ('What Checks Should You Perform Before Buying a Used Car? - Free Car Mag',), ('Preeclampsia Diagnostics Market 2023-2030 Overall Analysis Report | Fortune Business Insights',), ('Spanish startup on ‘mission to save planet’s beer’ from climate crisis',), ('Best third-party apps for Re...
  - Tool Output: --- Document 1 ---
Source: https://www.analyticsinsight.net/crypto-com-arena-considers-rebrand-after-ftx-fallout-reignites-investor-anger/
Title: Crypto.com arena considers rebrand after FTX fallout reignites investor anger
Category: technology
Date: 2023-04-19 01:15:52
Content: The Crypto.com Arena, formerly known as the Staples Center, is reportedly considering a rebrand after the recent controversy surrounding FTX, a cryptocurrency exchange. The FTX fallout reignited anger among investors, wh...
  - https://www.analyticsinsight.net/crypto-com-arena-considers-rebrand-after-ftx-fallout-reignites-investor-anger/
  - https://theprint.in/ani-press-releases/aave-chainlink-and-binance-are-concentrating-on-enhancing-the-shapella-while-big-eyes-coin-is-aiming-for-a-significant-final-push/1527613/
  - https://echojobs.io/job/stripe-software-engineer-service-platform-wndr6
RESULT:
Here are some technology articles from 2023 that mention crypto:

1.  **Crypto.com arena considers rebrand after FTX fallout reignites investor anger**: This article discusses the possibility of the Crypto.com Arena rebranding due to the controversy surrounding the FTX cryptocurrency exchange.
2.  **Aave, Chainlink, and Binance are concentrating on enhancing the Shapella, while Big Eyes Coin is aiming for a significant final push**: This article mentions the Shapella upgrade and its implications for cryptocurrencies on the Ethereum network, including Binance, Aave, and Chainlink.
3. **Software Engineer, Service Platform at Stripe | echojobs.io**: This article mentions Stripe, a financial infrastructure platform that allows businesses to accept payments.

Key Considerations

As with any architecture, there are design principles to be considered for a robust application. Here are a few of them:

Tool docstrings vs system prompt: These are two types of instructions that guide the agent behavior in different ways. It is important to use them for the intended purposes without any overlap or conflict for a reliable agent performance. Tool docstring, located inside the @tool decorator, describes what the tool does and how to use it. Besides the tool name, it defines the parameters, types and descriptions.

Here is the example of the search_articles tool docstring.

@tool
def search_articles(query: str, filter_category: Optional[str] = None, ...):
    """Useful for finding information about specific topics, summaries, or details within articles.
    
    You can filter by metadata for precision:
    - `filter_category`: 'health', 'tech', etc.
    - `filter_titles`: List of exact titles to retrieve (BATCH MODE).
    - `filter_date`: Published date (YYYY-MM-DD) for EXACT or PARTIAL match only.
    ...
    """

On the other hand, the system prompt intelligently guides the routing strategy for the agent, enabling it to decide when to use the SQL tool, Vector tool or a combination. It is also the most complex and fragile component of the application. It defines how tools are combined in hybrid workflows, provides examples of correct tool usage, and specifies mandatory rules and constraints. To adequately design the system prompt, It is crucial to begin with a test case repository of expected user queries, provide examples in the system prompt, and continue enriching it for deviations that come up for edge cases during operations.

Here is a sample of the system prompt

system_prompt = (
    "1. **LISTING/BROWSING QUERIES** (e.g., 'what articles are in politics'):\n"
    "   - **ALWAYS use [search_database] to list titles\n"
    "   - DO NOT use [search_articles] without a semantic query\n"
    ...
    "### MANDATORY RULES\n"
    "1. **DATE RANGES & INEQUALITIES**: Use SQL first, then pass titles to vector tool\n"
    ...
)

Pre and Post filtering vector databases: This is a subtle point that can have unintended and hard-to-explain results for specific queries. Consider the following two queries where the only difference is the mis-spelt name: “summarize articles about Doo ley in politics on 17th apr 2023” and “summarize articles about Dooley in politics on 17th apr 2023“. Both the queries follow identical paths, whereby the SQL tool successfully selects the titles for this category and date (there is only 1 article mentioning Judge Dooley), then the Vector tool is called on this title list with the query. Strangely, for the first query, the Vector tool returns "Tool Output: No documents found matching the criteria." for this minor spelling error even when the list has only 1 article to select from, whereas for the second query it returns the correct article.

Here is the result of the first query

QUERY: QUERY: summarize articles about Doo ley in politics on 17th apr 2023
TOOL SELECTION:
  - search_database: {'query': "SELECT title FROM articles WHERE published_date LIKE '2023-04-17%' AND article_category = 'politics'"}
  - search_articles: {'query': 'Doo ley', 'filter_category': 'politics', 'filter_titles': ['Judge Dooley Ends Hartford Police Consent Decree Despite Concerns']}
DOCUMENTS SELECTED:
  - Tool Output: [('Judge Dooley Ends Hartford Police Consent Decree Despite Concerns',)]
  - Tool Output: No documents found matching the criteria.

And the second query

QUERY: summarize articles about Dooley in politics on 17th apr 2023
TOOL SELECTION:
  - search_database: {'query': "SELECT title FROM articles WHERE published_date LIKE '2023-04-17%' AND article_category = 'politics'"}
  - search_articles: {'query': 'Dooley', 'filter_category': 'politics', 'filter_titles': ['Judge Dooley Ends Hartford Police Consent Decree Despite Concerns']}
DOCUMENTS SELECTED:
  - Tool Output: [('Judge Dooley Ends Hartford Police Consent Decree Despite Concerns',)]
  - Tool Output: --- Document 1 ---
Source: https://www.nbcconnecticut.com/news/local/judge-ends-hartford-police-consent-decree-despite-concerns/3015203/
Title: Judge Dooley Ends Hartford Police Consent Decree Despite Concerns
Category: politics
Date: 2023-04-17 05:36:24
Content: Judge Dooley has ended the nearly 50 years of federal oversight of police in Hartford, despite continued concerns the department still has not hired enough minority officers to reflect the city's large Black and Hispanic populations.

And the reason is not just a weaker embedding due to incorrect spelling. It is because FAISS (and Chroma etc) perform post-filtering — first do a global search for the query, and then filter the results for the metadata (= the title list). In this case, the correct article does not feature in the top_k = 3 articles after semantic search. A pre-filtering database, on the other hand, would have done the semantic search only on the articles in the title list and found the correct article even with the incorrect spelling.

Can all metadata filters be removed from the Vector Tool?: Yes, it is possible, but its a higher cost option, as simple semantic queries with a metadata filter (such as category or author), will become a hybrid query, requiring two tool calls, adding to token usage and latency. A pragmatic middle ground would be to keep dates (and possibly other numeric metadata such as word counts in this case) in the SQL only, and mirror all text and categorical metadata in the vector database.

Conclusion

Building RAG on top of SQL is not about adding embeddings. It’s about designing the right retrieval strategy.

When structured metadata and long-form content live in the same table, the real challenge is orchestration — deciding when to compute with SQL, when to semantically search, and when to combine both. Subtle details like metadata filtering and tool routing can make the difference between a reliable system and one that silently fails.

With a well-designed Agentic RAG layer, legacy SQL databases can power semantic applications without schema changes, costly migrations, or performance trade-offs.

Connect with me and share your comments at www.linkedin.com/in/partha-sarkar-lets-talk-AI

Reference

Social Animal 10K Articles with NLP — Dataset by Alex P (Owner) (CC BY-SA 4.0)

_{Images used in this article are generated using Google Gemini. Dataset used under} _{CC-BY-SA 4.0 license. Figures and underlying code created by me.}

Source link

The post Building Cost-Efficient Agentic RAG on Long-Text Documents in SQL Tables first appeared on TechToday.

This post originally appeared on TechToday.

Building Cost-Efficient Agentic RAG on Long-Text Documents in SQL Tables

System setup

Architecture

Results

Key Considerations

Conclusion

Reference

This laptop deal is ideal for WFH and hybrid work – with a $250 price cut

A dangerous new Android backdoor has been found – Keenadu lurks in firmware, here’s what we know

Skullcandy Method 540 review: AI chatbot and price hike are the only changes

Leave a Reply Cancel reply