How We Built a Model-Routing Architecture for Financial AI
A behind-the-scenes look at Ask Linc’s model-routing architecture: combining Claude, Gemini, deterministic finance math, and RAG for better financial analysis.
When we started building Ask Linc, the architecture was simple:
User question → single LLM → answer.
That worked well enough at first. But as the system evolved, we ran into a fundamental problem:
Financial questions are extremely diverse.
Some require reasoning.
Some require structured data analysis.
Some require context retrieval.
Some require quick summaries.
No single model consistently performed best across all of those tasks.
So we redesigned the architecture.
Instead of relying on one model, Ask Linc now uses a model-routing system that selects the best model for each request.
Here’s how it works.
Step 1: Context assembly
Before any model is called, Ask Linc assembles the context needed to answer the question.
This includes:
- the user’s financial accounts and balances
- portfolio holdings and allocations
- historical data
- the evolving user profile
- the daily market summary
- retrieved documents from our RAG system
The goal is to provide the model with the full financial picture.
This step ensures that the model isn’t guessing or falling back on generic financial advice.
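A minimal sketch of this assembly step, assuming hypothetical `db` and `retriever` interfaces (the field names are illustrative, not Ask Linc’s actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class FinancialContext:
    """Everything gathered before any model is called (illustrative fields)."""
    accounts: list = field(default_factory=list)     # accounts and balances
    holdings: list = field(default_factory=list)     # portfolio positions
    history: list = field(default_factory=list)      # historical data
    profile: dict = field(default_factory=dict)      # evolving user profile
    market_summary: str = ""                         # daily market summary
    documents: list = field(default_factory=list)    # RAG-retrieved docs

def assemble_context(user_id: str, question: str, db, retriever) -> FinancialContext:
    """Gather the full financial picture for one request."""
    return FinancialContext(
        accounts=db.get_accounts(user_id),
        holdings=db.get_holdings(user_id),
        history=db.get_history(user_id),
        profile=db.get_profile(user_id),
        market_summary=db.get_market_summary(),
        documents=retriever.search(question, top_k=5),
    )
```

The key design point is that context assembly happens once, up front, so every downstream model sees the same financial picture.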
Step 2: Query classification
Next, the system classifies the user’s question.
At a high level, questions typically fall into categories like:
- reasoning-heavy financial analysis
- structured data comparison
- investment portfolio evaluation
- macroeconomic explanation
- lightweight summaries
The classification doesn’t need to be perfect. It just needs to identify which model is most likely to perform best.
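As a rough sketch, even a keyword-based classifier captures the idea; a production system would likely use a small model or trained classifier, and these category names and keywords are assumptions:

```python
# Illustrative keyword map; checked in order, first match wins.
CATEGORIES = {
    "structured": ("compare", "breakdown", "allocation", "across accounts"),
    "portfolio": ("portfolio", "holdings", "rebalance"),
    "macro": ("inflation", "interest rates", "economy", "fed"),
    "summary": ("summarize", "overview", "how am i doing"),
}

def classify(question: str) -> str:
    """Return a coarse category for routing; falls back to deep reasoning."""
    q = question.lower()
    for category, keywords in CATEGORIES.items():
        if any(k in q for k in keywords):
            return category
    # Safe default: ambiguous questions go to the strongest reasoner.
    return "reasoning"
```

Note the fallback: when the classifier is unsure, defaulting to the most capable reasoning model is the cheap way to keep imperfect classification from hurting answer quality.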
Step 3: Model routing
Once the query is classified, the system routes the request to the appropriate model.
In practice this looks something like:
Claude
Used for:
- multi-step reasoning
- financial decision analysis
- scenario evaluation
- complex explanations
Gemini
Used for:
- structured data interpretation
- cross-account comparisons
- portfolio breakdowns
- pattern identification
This combination has performed consistently well in our evaluations.
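The routing itself can then be a simple lookup. The Claude and Gemini pairings below follow the split described above; the cheaper tier for lightweight summaries is an assumption:

```python
# Category -> model mapping (illustrative beyond the Claude/Gemini split above).
ROUTES = {
    "reasoning": "claude",   # multi-step reasoning, decision analysis
    "macro": "claude",       # macroeconomic explanation
    "structured": "gemini",  # tables, cross-account comparisons
    "portfolio": "gemini",   # portfolio breakdowns
    "summary": "gemini-flash-tier",  # hypothetical cheaper model for summaries
}

def route(category: str) -> str:
    """Pick a model for a classified query; default to the strongest reasoner."""
    return ROUTES.get(category, "claude")
```

Keeping the mapping in one table makes it easy to re-route a category after an evaluation run without touching the rest of the pipeline.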
Step 4: Financial reasoning layer
In addition to LLM reasoning, Ask Linc uses deterministic financial calculations when appropriate.
For example:
- retirement withdrawal simulations
- Monte Carlo projections
- portfolio stress tests
- safe withdrawal analysis
These calculations provide structured outputs that the model can interpret and explain.
This hybrid approach improves both accuracy and trustworthiness.
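To make the hybrid concrete, here is a deterministic withdrawal simulation of the kind described, written as a sketch (the function name and parameters are illustrative):

```python
def simulate_withdrawals(balance: float, annual_withdrawal: float,
                         annual_return: float, years: int) -> list[float]:
    """Deterministic year-by-year balance under fixed withdrawals and returns.

    Withdraw at the start of each year, then apply the year's return.
    Stops early if the balance is depleted.
    """
    balances = []
    for _ in range(years):
        balance = (balance - annual_withdrawal) * (1 + annual_return)
        balances.append(round(balance, 2))
        if balance <= 0:
            break
    return balances
```

The model never does this arithmetic itself; it receives the computed series and is asked to interpret it, which is where the accuracy gain comes from.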
Step 5: Response generation
Once the routed model receives the context and data, it produces the final response.
The system instructs the model to:
- reference the user’s real financial data
- explain reasoning clearly
- highlight assumptions
- avoid generic financial advice
This produces answers that are specific to the user’s financial situation, not hypothetical examples.
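A sketch of how those instructions and the assembled context might be folded into a single prompt; the wording and layout here are assumptions, not Ask Linc’s actual prompt:

```python
def build_prompt(question: str, context: dict) -> str:
    """Combine guardrail instructions, user context, and the question."""
    # Illustrative instruction block mirroring the four rules above.
    instructions = (
        "Reference the user's real financial data shown below. "
        "Explain your reasoning clearly. "
        "Highlight any assumptions you make. "
        "Do not give generic financial advice."
    )
    context_block = "\n".join(f"{key}: {value}" for key, value in context.items())
    return (
        f"{instructions}\n\n"
        f"--- USER CONTEXT ---\n{context_block}\n\n"
        f"--- QUESTION ---\n{question}"
    )
```

Because the routed model receives the user’s actual numbers inline, “avoid generic advice” becomes enforceable: there is always specific data to cite.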
Why this architecture works better
Moving to a routing system improved three things immediately.
1. Better reasoning quality
Claude consistently performed better on multi-step financial reasoning.
Routing those questions to Claude improved answer quality.
2. Better structured data analysis
Gemini showed strong performance when analyzing financial tables and account-level comparisons.
Routing those tasks to Gemini reduced errors and improved clarity.
3. Lower infrastructure cost
Not every request requires the most expensive model.
Routing allows the system to use the right model for each task rather than defaulting to the most powerful one.
The bigger lesson
The biggest lesson from building this system is simple:
The model itself is only one part of the architecture.
What matters just as much is:
- how context is assembled
- how requests are classified
- how models are selected
- how structured calculations are integrated
AI applications are increasingly becoming orchestration systems, not just model wrappers.
The models are powerful.
But the real product is how you combine them.