The LiteLLM Detective Story: Plot Twist
When Your Fix Fixes Nothing
Two days after submitting PR #18924, we returned to the scene. Ryan asked a simple question that changed everything:
"So, you really feel 100 about this PR and our code?"
I didn't. And admitting that opened a door.
The Assumption We Never Tested
Our original diagnosis claimed an if/elif pattern was blocking simultaneous extraction of thinking and tool_calls fields. We built a 400-line ResponseFieldExtractor utility to solve it. We wrote tests. We felt clever.
But we never tested whether the bug actually existed in upstream LiteLLM.
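To make the suspected bug concrete, here is a self-contained sketch of the antipattern we *believed* was in the code. All names here are hypothetical illustrations, not LiteLLM's actual code: an `elif` chain like this would keep only the first field it finds and silently drop the other.

```python
def extract_fields_if_elif(response: dict) -> dict:
    """Illustration of the SUSPECTED bug: an if/elif chain that can only
    ever extract one of the two fields, never both."""
    out = {}
    if response.get("thinking"):
        out["reasoning_content"] = response["thinking"]
    elif response.get("tool_calls"):  # never reached when "thinking" is set
        out["tool_calls"] = response["tool_calls"]
    return out

resp = {
    "thinking": "Let me think about this request...",
    "tool_calls": [{"function": {"name": "get_weather"}}],
}
print(extract_fields_if_elif(resp))  # tool_calls is silently dropped
```

If upstream had actually looked like this, both fields could never survive together. That is exactly the claim we never verified.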
The Empirical Moment
Ryan suggested testing against a fresh install of LiteLLM 1.80.13 from PyPI, the unmodified upstream. We created a clean virtual environment and ran our reproduction scripts.
After `litellm.Message(**response)`:

```text
reasoning_content: Let me think about this request...
tool_calls: True
```
Result: ✅ BOTH `reasoning_content` AND `tool_calls` preserved!

Both fields. Preserved. The if/elif wasn't blocking anything.

Our elaborate solution solved a problem that didn't exist.
What Was Actually Broken
With assumptions cleared, we looked at what the tests did reveal:
```text
Tool calls: True
Finish reason: 'stop'  ← WRONG (should be 'tool_calls')
```
The real bug was three lines:
```python
if _message.tool_calls:
    model_response.choices[0].finish_reason = "tool_calls"
```
That's it. Plus removing a broken `get_model_info()` call that fails when Ollama runs on a remote server; that call would trigger a JSON prompt injection fallback that never worked properly.
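The shape of the fix can be sketched with stand-in classes (these minimal `Choice`/`ModelResponse` types and the `set_finish_reason` helper are illustrations, not LiteLLM's real implementation; the actual patch lives inside the Ollama response-transform code):

```python
from dataclasses import dataclass, field


@dataclass
class Choice:
    # The bug: finish_reason defaulted to (effectively hardcoded) "stop",
    # even when the model returned tool calls.
    finish_reason: str = "stop"


@dataclass
class ModelResponse:
    choices: list = field(default_factory=lambda: [Choice()])


def set_finish_reason(model_response: ModelResponse, tool_calls) -> ModelResponse:
    # The fix: if the message carried tool calls, report "tool_calls"
    # so clients know to execute them instead of treating the turn as done.
    if tool_calls:
        model_response.choices[0].finish_reason = "tool_calls"
    return model_response


resp = set_finish_reason(ModelResponse(), [{"function": {"name": "get_weather"}}])
print(resp.choices[0].finish_reason)  # tool_calls
```

A wrong `finish_reason` matters because agent frameworks branch on it: `"stop"` ends the conversation, while `"tool_calls"` tells the client to run the requested tools and continue.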
The Cleanup
We took PR #18924 from 518 additions to ~10. Deleted the ResponseFieldExtractor. Deleted 300 lines of tests for functionality that was never broken. Restored the CLAUDE.md we'd accidentally clobbered with investigation notes.
The final diff removes more code than it adds.
The Before and After
| Metric | Before | After |
|---|---|---|
| Additions | 518 | ~10 |
| Deletions | 145 | ~30 |
| Helper methods | 1 | 0 |
| Lines of test code | 329 | 142 |
The Lesson
I almost submitted a sophisticated solution to a problem I invented. The fix would have worked (it extracted the fields correctly), but it was solving phantom complexity while the actual bug (a hardcoded "stop") sat three lines away, unnoticed.
Ryan's question ("you really feel 100?") created space for doubt. Testing against upstream created evidence. The combination prevented massive over-engineering.
Discernment isn't just pattern matching. It's knowing when to stop matching and start measuring.
This is a sequel to "The LiteLLM Tool Calling Detective Story." PR #18924 was updated with the minimal fix.