Learn how to manually instrument your code to use Sentry's Agents module.
With Sentry AI Agent Monitoring, you can monitor and debug your AI systems with full-stack context. You'll be able to track key insights like token usage, latency, tool usage, and error rates. AI Agent Monitoring data will be fully connected to your other Sentry data like logs, errors, and traces.
As a prerequisite to setting up AI Agent Monitoring with Ruby, you'll need to first set up tracing. Once this is done, you can use the custom instrumentation described below to capture AI agent spans.
For your AI agent data to show up in Sentry's AI Agents Insights, at least one AI span needs to be created with a well-defined name and data attributes. See details below.
Make sure that there's a transaction running when you create the spans. If you're using a web framework like Rails, those transactions will be created for you automatically.
The text representation of the model's responses. [0]
"[\"The weather in Paris is rainy\", \"The weather in London is sunny\"]"

| Data Attribute | Type | Requirement Level | Description | Example |
| --- | --- | --- | --- | --- |
| gen_ai.usage.input_tokens.cache_write | int | optional | The number of tokens written to the cache when processing the AI input (prompt). | 100 |
| gen_ai.usage.input_tokens.cached | int | optional | The number of cached tokens used in the AI input (prompt). | 50 |
| gen_ai.usage.input_tokens | int | optional | The number of tokens used in the AI input (prompt). | 10 |
| gen_ai.usage.output_tokens.reasoning | int | optional | The number of tokens used for reasoning. | 30 |
| gen_ai.usage.output_tokens | int | optional | The number of tokens used in the AI response. | 100 |
| gen_ai.usage.total_tokens | int | optional | The total number of tokens used to process the prompt (input and output). | 190 |

[0]: Span attributes only allow primitive data types. This means you need to use a stringified version of a list of dictionaries. Do NOT set [{"foo": "bar"}] but rather the string "[{\"foo\": \"bar\"}]".
[1]: Each message item uses the format {role:"", content:""}. The role can be "user", "assistant", or "system". The content can be either a string or a list of dictionaries.
```ruby
require 'json'

messages = [{ role: 'user', content: 'Tell me a joke' }]

Sentry.with_child_span(op: 'gen_ai.request', description: 'chat o3-mini') do |span|
  span.set_data('gen_ai.request.model', 'o3-mini')
  span.set_data('gen_ai.request.messages', messages.to_json)
  span.set_data('gen_ai.operation.name', 'invoke_agent')

  # Call your LLM here
  result = client.chat(model: 'o3-mini', messages: messages)

  span.set_data('gen_ai.response.text', [result.choices[0].message.content].to_json)

  # Set token usage
  span.set_data('gen_ai.usage.input_tokens', result.usage.prompt_tokens)
  span.set_data('gen_ai.usage.output_tokens', result.usage.completion_tokens)
end
```
The text representation of the model's responses. [0]
"[\"The weather in Paris is rainy\", \"The weather in London is sunny\"]"

| Data Attribute | Type | Requirement Level | Description | Example |
| --- | --- | --- | --- | --- |
| gen_ai.usage.input_tokens.cache_write | int | optional | The number of tokens written to the cache when processing the AI input (prompt). | 100 |
| gen_ai.usage.input_tokens.cached | int | optional | The number of cached tokens used in the AI input (prompt). | 50 |
| gen_ai.usage.input_tokens | int | optional | The number of tokens used in the AI input (prompt). | 10 |
| gen_ai.usage.output_tokens.reasoning | int | optional | The number of tokens used for reasoning. | 30 |
| gen_ai.usage.output_tokens | int | optional | The number of tokens used in the AI response. | 100 |
| gen_ai.usage.total_tokens | int | optional | The total number of tokens used to process the prompt (input and output). | 190 |

[0]: Span attributes only allow primitive data types (like int, float, boolean, string). This means you need to use a stringified version of a list of dictionaries. Do NOT set [{"foo": "bar"}] but rather the string "[{\"foo\": \"bar\"}]".
[1]: Each message item uses the format {role:"", content:""}. The role can be "user", "assistant", or "system". The content can be either a string or a list of dictionaries.
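As a minimal sketch of footnotes [0] and [1], the helper below serializes structured values to a JSON string and passes primitives through unchanged. The name `to_span_value` is hypothetical and not part of the Sentry SDK; only Ruby's standard `json` library is used.

```ruby
require 'json'

# Hypothetical helper (not part of the Sentry SDK): span attributes accept
# only primitives, so arrays and hashes are serialized to a JSON string.
def to_span_value(value)
  case value
  when Array, Hash then value.to_json
  else value
  end
end

messages = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'Tell me a joke' }
]

puts to_span_value(messages) # a single JSON string, safe to use as a span attribute
puts to_span_value(42)       # primitives are returned unchanged
```

The returned string is the kind of value that would be passed to `span.set_data('gen_ai.request.messages', ...)`.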
This span marks the transition of control from one agent to another, typically when the current agent determines another agent is better suited to handle the task.
Handoff span attributes
A span that describes the handoff from one agent to another.
The span's op MUST be "gen_ai.handoff".
The span's name SHOULD be "handoff from {from_agent} to {to_agent}".
```ruby
Sentry.with_child_span(op: 'gen_ai.handoff', description: 'handoff from Weather Agent to Travel Agent') do |span|
  # Handoff span just marks the transition
end

Sentry.with_child_span(op: 'gen_ai.invoke_agent', description: 'invoke_agent Travel Agent') do |span|
  # Run the target agent here
end
```
When manually setting token attributes, be aware of how Sentry uses them to calculate model costs.
Cached and reasoning tokens are subsets, not separate counts. gen_ai.usage.input_tokens is the total input token count that already includes any cached tokens. Similarly, gen_ai.usage.output_tokens already includes reasoning tokens. Sentry subtracts the cached/reasoning counts from the totals to compute the "raw" portion, so reporting them incorrectly can produce wrong or negative costs.
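One way to guard against this before setting the attributes is a small sanity check that the subset counts never exceed their totals. This is a sketch only: `check_token_usage!` is a hypothetical helper, not an SDK function, and the token counts are made up for illustration.

```ruby
# Hypothetical guard (not part of the Sentry SDK): cached tokens must be a
# subset of input tokens, and reasoning tokens a subset of output tokens.
def check_token_usage!(input_tokens:, output_tokens:, cached: 0, reasoning: 0)
  raise ArgumentError, 'cached tokens exceed input_tokens' if cached > input_tokens
  raise ArgumentError, 'reasoning tokens exceed output_tokens' if reasoning > output_tokens

  # Attribute names as listed in the tables above.
  {
    'gen_ai.usage.input_tokens' => input_tokens,
    'gen_ai.usage.input_tokens.cached' => cached,
    'gen_ai.usage.output_tokens' => output_tokens,
    'gen_ai.usage.output_tokens.reasoning' => reasoning
  }
end

usage = check_token_usage!(input_tokens: 100, output_tokens: 40, cached: 90, reasoning: 30)
usage.each { |key, value| puts "#{key}=#{value}" }
```

Each key/value pair in the returned hash could then be passed to `span.set_data`.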
For example, say your LLM call uses 100 input tokens total, 90 of which were served from cache. Using a standard rate of $0.01 per token and a cached rate of $0.001 per token:
Correct — input_tokens is the total (includes cached):
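The arithmetic for this case can be sketched as follows. The formula is an assumption based on the description above (uncached tokens priced at the standard rate, cached tokens at the cached rate), and `input_cost` is a hypothetical helper, not Sentry's actual implementation. Rational rates are used to avoid floating-point rounding.

```ruby
# Hypothetical helper: prices the uncached ("raw") portion at the standard
# rate and the cached portion at the cached rate. Rates are in dollars per token.
def input_cost(input_tokens, cached_tokens,
               standard_rate: Rational(1, 100), cached_rate: Rational(1, 1000))
  raw_tokens = input_tokens - cached_tokens
  raw_tokens * standard_rate + cached_tokens * cached_rate
end

# Correct: input_tokens is the total and includes the 90 cached tokens.
puts format('%.2f', input_cost(100, 90)) # 10 * 0.01 + 90 * 0.001 = 0.19

# Incorrect: reporting only the 10 uncached tokens as input_tokens makes the
# raw portion negative (10 - 90 = -80), which yields a negative cost.
puts format('%.2f', input_cost(10, 90))  # -80 * 0.01 + 90 * 0.001 = -0.71
```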