How We Built a Model-Routing Architecture for Financial AI

A behind-the-scenes look at Ask Linc’s model-routing architecture: combining Claude, Gemini, deterministic finance math, and RAG for better financial analysis.

When we started building Ask Linc, the architecture was simple:

User question → single LLM → answer.

That worked well enough at first. But as the system evolved, we ran into a fundamental problem:

Financial questions are extremely diverse.

Some require reasoning.
Some require structured data analysis.
Some require context retrieval.
Some require quick summaries.

No single model consistently performed best across all of those tasks.

So we redesigned the architecture.

Instead of relying on one model, Ask Linc now uses a model-routing system that selects the best model for each request.

Here’s how it works.


Step 1: Context assembly

Before any model is called, Ask Linc assembles the context needed to answer the question.

This includes:

  • the user’s financial accounts and balances
  • portfolio holdings and allocations
  • historical data
  • the evolving user profile
  • the daily market summary
  • retrieved documents from our RAG system

The goal is to provide the model with the full financial picture.

This step ensures that the model isn't guessing or relying on generic financial advice.
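A minimal sketch of what this assembly step might look like. The field names (`accounts`, `holdings`, `market_summary`, and so on) mirror the list above but are illustrative — they are not Ask Linc's actual schema, and the `stores` lookup is a stand-in for real data services.

```python
from dataclasses import dataclass, field

@dataclass
class QueryContext:
    """Everything gathered before any model is called (illustrative fields)."""
    accounts: list = field(default_factory=list)    # accounts and balances
    holdings: list = field(default_factory=list)    # portfolio holdings/allocations
    history: list = field(default_factory=list)     # historical data
    profile: dict = field(default_factory=dict)     # evolving user profile
    market_summary: str = ""                        # daily market summary
    documents: list = field(default_factory=list)   # RAG-retrieved documents

def assemble_context(user_id: str, question: str, stores: dict) -> QueryContext:
    """Gather the full financial picture for one question.

    `stores` is a hypothetical bag of per-user data sources; a real system
    would call account, profile, and retrieval services here.
    """
    rag = stores.get("rag")
    return QueryContext(
        accounts=stores["accounts"].get(user_id, []),
        holdings=stores["holdings"].get(user_id, []),
        history=stores["history"].get(user_id, []),
        profile=stores["profiles"].get(user_id, {}),
        market_summary=stores.get("market_summary", ""),
        documents=rag.retrieve(question) if rag else [],
    )
```

The key property is that assembly is model-agnostic: the same `QueryContext` is handed to whichever model the router later picks.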


Step 2: Query classification

Next, the system classifies the user’s question.

At a high level, questions typically fall into categories like:

  • reasoning-heavy financial analysis
  • structured data comparison
  • investment portfolio evaluation
  • macroeconomic explanation
  • lightweight summaries

The classification doesn’t need to be perfect. It just needs to identify which model is most likely to perform best.


Step 3: Model routing

Once the query is classified, the system routes the request to the appropriate model.

In practice this looks something like:

Claude

Used for:

  • multi-step reasoning
  • financial decision analysis
  • scenario evaluation
  • complex explanations

Gemini

Used for:

  • structured data interpretation
  • cross-account comparisons
  • portfolio breakdowns
  • pattern identification

This combination has performed consistently well in our evaluations.
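The routing step itself can then be a small lookup table. The category-to-model mapping below follows the split described above (Claude for reasoning and explanation, Gemini for structured data); the `"summary"` route to the cheaper path is my assumption, and the model identifiers are placeholders, not the exact versions Ask Linc runs.

```python
# Placeholder routing table; model names are illustrative.
ROUTES = {
    "reasoning": "claude",        # multi-step reasoning, decision analysis
    "macro": "claude",            # macroeconomic explanation
    "structured_data": "gemini",  # tables, cross-account comparisons
    "portfolio": "gemini",        # portfolio breakdowns
    "summary": "gemini",          # assumption: lightweight summaries take the cheaper path
}

def route(category: str) -> str:
    """Map a query category to a model, defaulting to the strongest reasoner."""
    return ROUTES.get(category, "claude")
```

Because the table is data rather than code, adding a new model or re-balancing categories after an evaluation run is a one-line change.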


Step 4: Financial reasoning layer

In addition to LLM reasoning, Ask Linc uses deterministic financial calculations when appropriate.

For example:

  • retirement withdrawal simulations
  • Monte Carlo projections
  • portfolio stress tests
  • safe withdrawal analysis

These calculations provide structured outputs that the model can interpret and explain.

This hybrid approach improves both accuracy and trustworthiness.
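To make the hybrid idea concrete, here is a toy version of one such deterministic calculation: a Monte Carlo check of how often a portfolio survives a fixed withdrawal schedule. The return assumptions and trial count are illustrative, not Ask Linc's actual simulation — the point is that the number comes from seeded, repeatable math, and the model's job is only to interpret and explain it.

```python
import random

def monte_carlo_success_rate(balance: float, annual_withdrawal: float,
                             years: int, mean_return: float = 0.07,
                             stdev: float = 0.15, trials: int = 2000,
                             seed: int = 42) -> float:
    """Fraction of simulated runs in which the portfolio lasts `years`.

    Deterministic given `seed`, so the same question always yields the
    same figure for the model to explain. All parameters are illustrative.
    """
    rng = random.Random(seed)
    survived = 0
    for _ in range(trials):
        b = balance
        for _ in range(years):
            # Withdraw, then apply a normally distributed annual return.
            b = (b - annual_withdrawal) * (1 + rng.gauss(mean_return, stdev))
            if b <= 0:
                break
        else:
            survived += 1
    return survived / trials
```

The structured output (a success rate, plus whatever intermediate figures you choose to expose) is what gets handed to the routed model for explanation.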


Step 5: Response generation

Once the routed model receives the context and data, it produces the final response.

The system instructs the model to:

  • reference the user's real financial data
  • explain reasoning clearly
  • highlight assumptions
  • avoid generic financial advice

This produces answers that are specific to the user’s financial situation, not hypothetical examples.
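A sketch of how those instructions might be baked into the final call. The instruction text paraphrases the four requirements above; the actual system prompt will differ, and `build_prompt` is a hypothetical helper.

```python
# Paraphrase of the response requirements listed above (not the real prompt).
INSTRUCTIONS = (
    "Reference the user's real financial data. "
    "Explain your reasoning clearly and state any assumptions. "
    "Avoid generic financial advice."
)

def build_prompt(question: str, context: str) -> str:
    """Combine instructions, assembled context, and the user's question."""
    return f"{INSTRUCTIONS}\n\nContext:\n{context}\n\nQuestion: {question}"
```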


Why this architecture works better

Moving to a routing system improved three things immediately.

1. Better reasoning quality

In our evaluations, Claude consistently outperformed the alternatives on multi-step financial reasoning.

Routing those questions to Claude improved answer quality.


2. Better structured data analysis

Gemini showed strong performance when analyzing financial tables and account-level comparisons.

Routing those tasks to Gemini reduced errors and improved clarity.


3. Lower infrastructure cost

Not every request requires the most expensive model.

Routing allows the system to use the right model for each task rather than defaulting to the most powerful one.


The bigger lesson

The biggest lesson from building this system is simple:

The model itself is only one part of the architecture.

What matters just as much is:

  • how context is assembled
  • how requests are classified
  • how models are selected
  • how structured calculations are integrated

AI applications are increasingly becoming orchestration systems, not just model wrappers.

The models are powerful.

But the real product is how you combine them.