File Search in the Responses API
LiteLLM now supports file_search in the Responses API across both:
- providers that support it natively (like OpenAI / Azure), and
- providers that do not (like Anthropic, Bedrock, and other non-native providers) via emulation.
What this is​
file_search lets models retrieve grounded context from your vector stores and answer with citations.
LiteLLM keeps one OpenAI-compatible output shape while routing requests through either native passthrough or an emulated fallback.
Two paths are covered:
| Path | When it runs | What LiteLLM does |
|---|---|---|
| Native passthrough | Provider natively supports file_search (OpenAI, Azure) | Decodes unified vector store ID → forwards to provider as-is |
| Emulated fallback | Provider doesn't support file_search (Anthropic, Bedrock, etc.) | Converts to a function tool → intercepts tool call → runs vector search → synthesizes OpenAI-format output |
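The four steps of the emulated fallback can be sketched in plain Python. This is only an illustration of the flow described in the table, not LiteLLM's internal code: the tool name `file_search_emulated`, the helper functions, and the `search_fn(vector_store_id, query)` callback are all hypothetical.

```python
import json
from typing import Any, Callable

# Hypothetical name for the stand-in function tool; not LiteLLM's real identifier.
EMULATED_TOOL_NAME = "file_search_emulated"


def to_function_tool(file_search_tool: dict[str, Any]) -> dict[str, Any]:
    """Step 1: swap the file_search tool for a function tool the model can call."""
    if file_search_tool.get("type") != "file_search":
        raise ValueError("expected a file_search tool")
    return {
        "type": "function",
        "name": EMULATED_TOOL_NAME,
        "description": "Search the attached vector stores for relevant passages.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }


def run_emulated_search(
    tool_call: dict[str, Any],
    vector_store_ids: list[str],
    search_fn: Callable[[str, str], list[dict[str, Any]]],
) -> dict[str, Any]:
    """Steps 2-3: intercept the model's tool call and run the vector search."""
    query = json.loads(tool_call["arguments"])["query"]
    results = [hit for vs_id in vector_store_ids for hit in search_fn(vs_id, query)]
    return {"query": query, "results": results}


def synthesize_output(search: dict[str, Any], answer_text: str) -> list[dict[str, Any]]:
    """Step 4: emit a file_search_call item followed by an assistant message."""
    annotations = [
        {"type": "file_citation", "file_id": hit["file_id"], "filename": hit.get("filename")}
        for hit in search["results"]
        if hit.get("file_id")  # results without file-level metadata yield no citation
    ]
    return [
        {"type": "file_search_call", "status": "completed", "queries": [search["query"]]},
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": answer_text, "annotations": annotations}],
        },
    ]
```

In this sketch the search backend is injected as a callback, so the same flow can be exercised with a stub in tests and with a real vector store client in production.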
In tools[].vector_store_ids, LiteLLM accepts both provider-native IDs (e.g. vs_...) and managed vector store unified IDs, which are URL-safe base64 strings produced by the proxy's managed vector store flow. For example: litellm.responses(..., tools=[{"type": "file_search", "vector_store_ids": ["bGl0ZWxsbV9wcm94eT..."]}]).
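To make the two ID kinds concrete, here is a rough sketch of how a caller might tell them apart. It assumes that a unified ID decodes to text beginning with `litellm_proxy`; that prefix is inferred only from the example ID above and is not a documented contract. The function name `classify_vector_store_id` is hypothetical.

```python
import base64
import binascii

# Assumption: unified IDs decode to text starting with this prefix, inferred
# only from the example ID above; it is not a documented contract.
ASSUMED_UNIFIED_PREFIX = "litellm_proxy"


def classify_vector_store_id(vs_id: str) -> str:
    """Return "unified" if the ID looks like a managed unified ID, else "native"."""
    padded = vs_id + "=" * (-len(vs_id) % 4)  # URL-safe base64 may omit padding
    try:
        decoded = base64.urlsafe_b64decode(padded).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return "native"  # not decodable as base64 text, so treat as provider-native
    return "unified" if decoded.startswith(ASSUMED_UNIFIED_PREFIX) else "native"
```

A check like this is only useful for logging or debugging on the client side; LiteLLM performs its own decoding when it receives the request.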
Usage​
LiteLLM Proxy
1. Setup config.yaml
config.yaml
model_list:
  - model_name: gpt-4.1
    litellm_params:
      model: openai/gpt-4.1
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-5
      api_key: os.environ/ANTHROPIC_API_KEY
2. Start the proxy​
litellm --config config.yaml
3. Call Responses API with file_search​
Proxy call
from openai import OpenAI

# Point the OpenAI client at the LiteLLM proxy
client = OpenAI(base_url="http://localhost:4000", api_key="sk-your-proxy-key")

response = client.responses.create(
    model="claude-sonnet",  # swap to "gpt-4.1" for native path
    input="What does LiteLLM support?",
    tools=[{
        "type": "file_search",
        "vector_store_ids": ["vs_abc123"]
    }],
    include=["file_search_call.results"],
)
print(response.output)
LiteLLM SDK
1. Install + set keys
pip install litellm
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
2. Call Responses API with file_search​
SDK call
import litellm

response = litellm.responses(
    model="anthropic/claude-sonnet-4-5",  # swap to openai/gpt-4.1 for native path
    input="What does LiteLLM support?",
    tools=[{
        "type": "file_search",
        "vector_store_ids": ["vs_abc123"]
    }],
    include=["file_search_call.results"],
)
print(response.output)
Behavior Matrix​
| Path | SDK model | Proxy model | Behavior |
|---|---|---|---|
| Native passthrough | openai/gpt-4.1 | gpt-4.1 | Provider executes native file_search |
| Emulated fallback | anthropic/claude-sonnet-4-5 | claude-sonnet | LiteLLM converts to function tool and synthesizes OpenAI-format output |
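The matrix above boils down to a lookup on the provider prefix of the SDK model string. The sketch below illustrates that decision only. The set of native providers contains just the two named in this page, and the function name `file_search_path` is hypothetical; LiteLLM's actual routing logic may consult more than the prefix.

```python
# Assumption: only the providers this page names as native; the real list may differ.
NATIVE_FILE_SEARCH_PROVIDERS = {"openai", "azure"}


def file_search_path(model: str) -> str:
    """Return "native" or "emulated" for an SDK-style "provider/model" string."""
    provider, sep, _ = model.partition("/")
    if not sep:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return "native" if provider in NATIVE_FILE_SEARCH_PROVIDERS else "emulated"
```

With the models from the matrix, `file_search_path("openai/gpt-4.1")` selects the native path and `file_search_path("anthropic/claude-sonnet-4-5")` selects the emulated one.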
Architecture Diagram​
Prerequisites​
pip install 'litellm[proxy]'
export OPENAI_API_KEY="sk-..." # for native path
export ANTHROPIC_API_KEY="sk-ant-..." # for emulated path
Example response shape​
Validating the Output Format​
Regardless of which path ran, the response always follows the OpenAI Responses API format:
{
  "output": [
    {
      "type": "file_search_call",
      "id": "fs_abc123",
      "status": "completed",
      "queries": ["What does LiteLLM support?"],
      "search_results": null
    },
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "LiteLLM is a unified interface...",
          "annotations": [
            {
              "type": "file_citation",
              "index": 150,
              "file_id": "file-xxxx",
              "filename": "knowledge.txt"
            }
          ]
        }
      ]
    }
  ]
}
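Given a response in this shape, the citations can be pulled out with a short walk over the output items. The helper below is a minimal sketch that assumes the response has already been converted to a plain dict (for example, parsed from JSON); `extract_citations` is a hypothetical name, not part of LiteLLM or the OpenAI SDK.

```python
from typing import Any


def extract_citations(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect every file_citation annotation from the message items in `output`."""
    citations = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip file_search_call and any other item types
        for block in item.get("content", []):
            for ann in block.get("annotations") or []:
                if ann.get("type") == "file_citation":
                    citations.append(ann)
    return citations
```

For the example above, this would return a single entry whose `file_id` is `file-xxxx`. An empty list means the answer was generated without file-level citations.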
Validation script:
Validate response structure
def validate_file_search_response(response):
    """Assert that response follows OpenAI file_search output format."""
    output = response.output
    assert len(output) >= 2, "Expected at least 2 output items"

    # First item: file_search_call
    fs_call = output[0]
    fs_type = fs_call["type"] if isinstance(fs_call, dict) else fs_call.type
    assert fs_type == "file_search_call", f"Expected file_search_call, got {fs_type}"
    fs_status = fs_call["status"] if isinstance(fs_call, dict) else fs_call.status
    assert fs_status == "completed"

    # Second item: message
    msg = output[1]
    msg_type = msg["type"] if isinstance(msg, dict) else msg.type
    assert msg_type == "message"
    content = msg["content"] if isinstance(msg, dict) else msg.content
    assert len(content) > 0
    text_block = content[0]
    text = text_block["text"] if isinstance(text_block, dict) else text_block.text
    assert isinstance(text, str) and len(text) > 0

    print("✅ Response structure valid")
    print(f" Queries: {fs_call['queries'] if isinstance(fs_call, dict) else fs_call.queries}")
    print(f" Answer length: {len(text)} chars")
    annotations = text_block["annotations"] if isinstance(text_block, dict) else text_block.annotations
    print(f" Citations: {len(annotations)}")


validate_file_search_response(response)
Q&A​
- Why do I see UnsupportedParamsError? This usually means file_search was passed to a provider that does not support it natively and emulation could not route correctly. Check:
  - The model string is valid (for example, anthropic/claude-sonnet-4-5).
  - custom_llm_provider resolves correctly so LiteLLM can load the provider config.
- Why does vector search return no results? Common causes:
  - The vector store ID is wrong or has no files attached.
  - In LiteLLM-managed stores, file ingestion is not complete (status != completed).
  - The query is too narrow; try a broader query.
- Why am I getting 403 Access denied on vector store calls? The caller does not have access to that vector store.
  - The store may belong to another team.
  - Use an admin/proxy key if your setup requires cross-team access.
- Why are annotations empty in emulated mode? file_citation annotations require file_id metadata in search results. If your vector backend does not return file-level metadata, the answer text is still generated but citations can be empty.
What to check next​
- File Search reference in Responses API docs — full API reference
- Vector Store management — create and manage vector stores
- Managed vector stores — provider-specific setup