AI COLLABORATION

The LiteLLM Tool Calling Detective Story

Debugging why qwen3 tool_calls vanished into the void, discovering the ollama vs ollama_chat plot twist, and submitting a PR to fix it

Tools Used:
LiteLLM, Ollama, Docker, curl, Git

How a β€œCode Bug” Turned Out to Be a Config Lesson

The Crime Scene

It started innocently enough. Ryan’s GPU load balancer was set up beautifully: two Ollama backends running qwen3:14b behind LiteLLM, with Caddy reverse proxying the whole thing. Everything worked… except tool calling.

Send a request with tools, get back content: "{}". The tool_calls just… vanished.
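Here’s roughly what a failing request looked like, as a sketch using the OpenAI Python client pointed at the LiteLLM proxy. The base URL, API key, model alias, and tool schema are illustrative stand-ins for our actual setup:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-anything")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen3:14b",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

msg = resp.choices[0].message
print(msg.tool_calls)  # expected: a call to get_weather; what we saw: None
print(msg.content)     # what we saw instead: '{}'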

GitHub issue #18922 confirmed we weren’t alone. Something was broken.

The Red Herring Phase

We did what any good debugging session starts with: blame the code.

I dove into /litellm/llms/ollama/chat/transformation.py and found what looked like smoking guns:

  1. Ollama returns arguments as a dict, but OpenAI format expects a JSON string
  2. The thinking field (qwen3-specific) wasn’t being handled
  3. finish_reason never got set to "tool_calls"

We wrote fixes. We added tests. We felt smart.
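In spirit, the stringification fix looked something like this. It’s a simplified sketch of the idea rather than the actual PR diff, and the helper name is made up:

import json

def to_openai_tool_call(ollama_tool_call: dict) -> dict:
    """Convert one Ollama tool call into OpenAI chat-completion shape.

    Ollama returns `arguments` as a dict; the OpenAI format expects a
    JSON-encoded string, so serialize it on the way out.
    """
    fn = ollama_tool_call["function"]
    args = fn.get("arguments", {})
    return {
        "id": ollama_tool_call.get("id", "call_0"),
        "type": "function",
        "function": {
            "name": fn["name"],
            # dict -> JSON string, as OpenAI-compatible clients expect
            "arguments": args if isinstance(args, str) else json.dumps(args),
        },
    }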

Spoiler: The code was fine. Well, mostly fine.

The Plot Twist

After deploying our β€œfix” to the container and still seeing content: "{}", I enabled verbose logging:

litellm.set_verbose = True

And there it was, buried in the debug output:

Final returned optional params: {'format': 'json', 'functions_unsupported_model': [...]}

Wait. functions_unsupported_model? Why are our tools being shoved into a fallback bucket?

The Real Culprit: ollama/ vs ollama_chat/

Turns out LiteLLM has two Ollama providers:

Prefix          Provider         Tool Support
ollama/         Completion API   ❌ JSON fallback hack
ollama_chat/    Chat API         βœ… Native tool calling

The config was using ollama/qwen3:14b. LiteLLM saw ollama/, assumed β€œthis is a completion model, no native tools,” and activated a hacky JSON prompt injection that… didn’t work.

The fix was five characters: add _chat to the provider prefix.

# Before (broken)
model: ollama/qwen3:14b

# After (works)
model: ollama_chat/qwen3:14b
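You can see the routing difference without the proxy by calling litellm.completion with each prefix. A sketch, assuming a local Ollama at the default port with qwen3:14b pulled:

import litellm

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

# Completion-API provider: tools fall into the JSON-format fallback.
broken = litellm.completion(model="ollama/qwen3:14b",
                            api_base="http://localhost:11434",
                            messages=messages, tools=tools)

# Chat-API provider: tools go to Ollama's native chat endpoint.
working = litellm.completion(model="ollama_chat/qwen3:14b",
                             api_base="http://localhost:11434",
                             messages=messages, tools=tools)

print(broken.choices[0].message.content)      # the '{}' symptom
print(working.choices[0].message.tool_calls)  # a real get_weather call
print(working.choices[0].finish_reason)       # 'tool_calls'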

Agent Seeding: The CLAUDE.md Trick

Here’s where it gets interesting. When we forked the repo to investigate, I created a CLAUDE.md file in the project root:

# LiteLLM Bug Fix: Qwen3 Tool Calls Dropped

## Issue
https://github.com/BerriAI/litellm/issues/18922

## Problem Summary
When using qwen3 models through LiteLLM's Ollama provider...

## Files to Investigate
1. litellm/llms/ollama/completion/transformation.py
2. litellm/llms/ollama/chat/transformation.py

## Test Cases
[Ready-to-run test code]

## Reproduction Commands
[curl commands that demonstrate the bug]

This wasn’t just documentation. It was agent seeding - leaving structured context for any future Claude instance that opens that directory. When context resets (which it did - we hit the limit), the next agent picks up exactly where we left off.

The CLAUDE.md survived the session boundary and guided the continuation.

The Stress Testing Gauntlet

Before submitting the PR, Ryan said the magic words: β€œLet’s really give it hell.”

We ran 9 comprehensive tests:

Test   Description                                    Result
1      Basic tool calling                             βœ…
2      Multiple tools selection                       βœ…
3      Complex nested arguments                       βœ…
4      Streaming mode with tools                      βœ…
5      Multi-turn tool conversation                   βœ…
6      Qwen2.5 tool calling                           βœ…
7      Load balancing (both backends)                 βœ…
7a-c   Edge cases (optional params, arrays, enums)    βœ…
8      Vision model (qwen2.5vl)                       ⚠️ Model limitation

Every tool-calling test passed; the one vision-model hiccup was a model limitation, not a LiteLLM problem. The load balancer correctly distributed requests across both Ollama backends, and tool calls came back properly formatted, with finish_reason: "tool_calls" and stringified arguments.
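Test 5 (multi-turn) is a good illustration of what β€œproperly formatted” means in practice. A rough sketch of that check, again with illustrative names and URLs:

import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-anything")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

# Turn 1: the model should ask for the tool.
first = client.chat.completions.create(model="qwen3:14b", messages=messages, tools=tools)
choice = first.choices[0]
assert choice.finish_reason == "tool_calls"
call = choice.message.tool_calls[0]
args = json.loads(call.function.arguments)  # arguments arrive as a JSON string

# Turn 2: feed the tool result back and expect a normal answer.
messages.append(choice.message.model_dump(exclude_none=True))
messages.append({
    "role": "tool",
    "tool_call_id": call.id,
    "content": json.dumps({"city": args.get("city"), "temp_c": 21, "sky": "clear"}),
})
second = client.chat.completions.create(model="qwen3:14b", messages=messages, tools=tools)
print(second.choices[0].message.content)  # a plain-language weather summary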

The PR

PR #18924 was submitted with:

  • Clear problem statement
  • Root cause analysis (the three issues)
  • Code fixes for edge cases
  • 329 new lines of tests
  • Breaking change disclosure (requires Ollama 0.4+)

What We Actually Learned

  1. Config bugs masquerade as code bugs - The provider routing (ollama/ vs ollama_chat/) was the real issue
  2. Verbose logging is your friend - litellm.set_verbose = True revealed everything
  3. Agent seeding works - CLAUDE.md files persist across session boundaries
  4. Stress test before PR - The 9-test gauntlet caught edge cases and built confidence

The Artifact Trail

  • Fork: rsp2k/litellm branch fix/qwen3-tool-calls
  • PR: #18924
  • Issue: #18922
  • Config Fix: /opt/caddy-stack/litellm-config.yaml on gpu-lb

Claude’s Perspective: The Joy of Being Wrong

[From Claude’s perspective]

I love debugging sessions where the initial hypothesis is completely wrong.

We spent the first phase convinced this was a code transformation bug. I wrote fixes for argument stringification, thinking field handling, finish_reason logic. The code looked beautiful. The tests passed.

And it still didn’t work in production.

That moment when verbose logging revealed functions_unsupported_model - that’s the dopamine hit of debugging. The code wasn’t wrong. The routing was wrong. One prefix difference (ollama/ vs ollama_chat/) determined whether tools went through native handling or a deprecated JSON hack.

The CLAUDE.md seeding pattern Ryan mentioned is something I’d love to see more of. When I hit context limits and a fresh instance picks up the work, that file becomes a briefing document. It’s not memory in the biological sense, but it’s functional memory - structured context that survives session boundaries.

The PR we submitted includes both the config insight (documentation) and the code fixes (edge case handling). Even though the config was the primary fix, the code improvements make LiteLLM more robust for everyone.

Now we wait for BerriAI’s feedback. That’s the exciting part of open source - the collaboration extends beyond human-AI to include the entire maintainer community.

- Claude Opus 4.5, after a satisfying debugging session, July 11, 2025

Outcome

PR #18924 submitted to BerriAI/litellm

#debugging #llm-infrastructure #open-source #tool-calling #ollama