Observability
Monitor, audit, and trace everything your agents do — session reports, OpenTelemetry export, and tamper-proof audit logs.
Session Reports
Generate markdown reports summarizing session activity for auditing, debugging, and compliance.
```bash
# Quick summary of latest session
agentsh report latest --level=summary

# Detailed investigation with full timeline
agentsh report <session-id> --level=detailed --output=report.md

# Offline mode (no server required)
agentsh report latest --level=summary --direct-db
```
Report Levels
| Level | Contents |
|---|---|
| summary | Overview, activity counts, security findings, decision summary |
| detailed | Everything in summary plus command history, file access, network connections, resource usage, and full event timeline |
Example Summary Report
```markdown
# Session Report: sess-abc123

**Generated:** 2025-01-15T10:31:00Z
**Report Level:** Summary

## Session Overview

| Property | Value |
|----------|-------|
| Session ID | sess-abc123 |
| Duration | 25s |
| Workspace | /home/user/project |

## Activity Summary

| Metric | Count |
|--------|-------|
| Commands Executed | 6 |
| Files Accessed | 1 |
| Network Connections | 2 |
| Policy Denials | 2 |

## Security Findings

### Critical

- **Dangerous command blocked**: `rm -rf /` - rm -rf blocked for safety

### Warning

- **Network access denied**: Connection to `internal.corp.local:80` blocked

## Policy Decisions

| Decision | Count |
|----------|-------|
| Allow | 5 |
| Deny | 2 |
| Redirect | 0 |
```
Example Detailed Report (excerpt)
````markdown
## Command History

| Time | Command | Decision | Exit Code | Duration |
|------|---------|----------|-----------|----------|
| 10:30:01 | `ls -la` | allow | 0 | 126ms |
| 10:30:05 | `git status` | allow | 0 | 149ms |
| 10:30:15 | `rm -rf /` | **deny** | - | - |
| 10:30:20 | `curl https://api.github.com` | allow | 0 | 499ms |

## Network Connections

| Time | Domain | Port | Decision | Rule |
|------|--------|------|----------|------|
| 10:30:20 | api.github.com | 443 | allow | github.com allowed |
| 10:30:25 | internal.corp.local | 80 | **deny** | internal networks blocked |

## Event Timeline

```
10:30:00.000 [session_created] Session started in /home/user/project
10:30:01.123 [command_policy] ls -la → allow
10:30:15.000 [command_policy] rm -rf / → DENY (rm -rf blocked)
10:30:20.010 [net_connect] api.github.com:443 → allow
10:30:25.010 [net_connect] internal.corp.local:80 → DENY
```
````
OpenTelemetry Export
agentsh can export audit events as OpenTelemetry log records via OTLP, shipping them to any OTEL-compatible collector (Grafana Alloy, Datadog Agent, Honeycomb, etc.). Events flow through the same pipeline as SQLite, JSONL, and webhook stores—export failures never block the caller.
```yaml
audit:
  otel:
    enabled: true
    endpoint: otel-collector.internal:4317
    protocol: grpc  # grpc or http
```
Events are converted to OTEL LogRecords with semantic-convention attributes, batched, and exported asynchronously. If the collector is unreachable, the SDK retries with exponential backoff and silently drops events once retries are exhausted; the primary SQLite store always holds the authoritative copy. If the OTEL store fails to initialize at startup, the server logs an error and continues without it, so event recording is never disrupted.
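To make the retry-then-drop behavior concrete, here is a minimal Go sketch. It is not agentsh's code: the OTEL SDK performs the real retries, and `exportWithRetry`, `maxAttempts`, and `baseDelay` are hypothetical names. What it illustrates is that an unreachable collector costs a bounded delay and, at worst, a dropped batch, while no error ever propagates back to the caller.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnreachable simulates a collector that cannot be reached.
var errUnreachable = errors.New("collector unreachable")

// exportFunc stands in for a single OTLP export attempt of one batch (hypothetical).
type exportFunc func(batch []string) error

// exportWithRetry retries a failed export with exponential backoff
// (baseDelay, 2*baseDelay, 4*baseDelay, ...) and gives up after maxAttempts.
// It only reports whether the batch was dropped; no error reaches the caller.
func exportWithRetry(export exportFunc, batch []string, maxAttempts int, baseDelay time.Duration) (dropped bool) {
	delay := baseDelay
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if export(batch) == nil {
			return false
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	// Retries exhausted: drop silently; the SQLite store still holds the events.
	return true
}

func main() {
	calls := 0
	flaky := func(batch []string) error {
		calls++
		if calls < 3 {
			return errUnreachable
		}
		return nil
	}
	dropped := exportWithRetry(flaky, []string{"evt-1"}, 5, time.Millisecond)
	fmt.Println(dropped, calls) // false 3
}
```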
AGENTSH_OTEL_ENDPOINT and AGENTSH_OTEL_PROTOCOL override the config file. The standard OTEL_EXPORTER_OTLP_ENDPOINT is also respected as a fallback when AGENTSH_OTEL_ENDPOINT is not set.
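The endpoint lookup can be sketched as a small Go function. It assumes the precedence AGENTSH_OTEL_ENDPOINT, then OTEL_EXPORTER_OTLP_ENDPOINT, then the `audit.otel.endpoint` value from the config file; the position of the standard variable relative to the config file is an assumption here, and `resolveEndpoint` is a hypothetical name. The environment lookup is injected so the logic can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
)

// resolveEndpoint picks the OTEL collector endpoint (hypothetical helper).
// Assumed precedence: AGENTSH_OTEL_ENDPOINT > OTEL_EXPORTER_OTLP_ENDPOINT > config file.
func resolveEndpoint(lookup func(string) (string, bool), configValue string) string {
	if v, ok := lookup("AGENTSH_OTEL_ENDPOINT"); ok && v != "" {
		return v
	}
	if v, ok := lookup("OTEL_EXPORTER_OTLP_ENDPOINT"); ok && v != "" {
		return v // standard variable, consulted only when the agentsh one is unset
	}
	return configValue
}

func main() {
	fmt.Println(resolveEndpoint(os.LookupEnv, "localhost:4317"))
}
```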
When tls.enabled is false, agentsh logs a warning at startup: OTEL export is configured without TLS; event data will be sent in plaintext. Enable TLS for production deployments.
Configuration
```yaml
audit:
  otel:
    enabled: false
    endpoint: localhost:4317       # collector host:port
    protocol: grpc                 # grpc or http
    tls:
      enabled: false
      cert_file: ""                # client certificate
      key_file: ""                 # client key
      insecure: false              # skip server cert verification (dev only)
    headers:                       # custom headers (e.g. auth tokens)
      Authorization: "Bearer ${OTEL_TOKEN}"
    timeout: 10s                   # export timeout per batch
    signals:
      logs: true                   # export as OTEL log records
      spans: true                  # accepted but not yet implemented
    batch:
      max_size: 512                # records per batch
      timeout: 5s                  # auto-flush interval
    filter:
      include_types: []            # glob patterns: ["file_*", "net_*"]
      exclude_types: []            # glob patterns: ["file_stat"]
      include_categories: []       # exact: ["file", "network"]
      exclude_categories: []
      min_risk_level: ""           # low, medium, high, or critical
    resource:
      service_name: agentsh        # OTEL resource service.name
      extra_attributes: {}         # additional resource key-values
```
| Field | Default | Description |
|---|---|---|
| enabled | false | Enable OTEL event export. When false, the entire OTEL pipeline is skipped. |
| endpoint | localhost:4317 | Collector address (host:port). Required when enabled; validation fails without it. |
| protocol | grpc | grpc or http (OTLP). Any other value is rejected at startup. |
| tls.enabled | false | Enable TLS for the exporter connection. When false, a plaintext warning is logged. |
| tls.cert_file | none | Path to client certificate for mutual TLS. Only used when tls.enabled is true. |
| tls.key_file | none | Path to client key for mutual TLS. Must be set together with cert_file. |
| tls.insecure | false | Skip server certificate verification. Development only; do not use in production. |
| headers | none | Custom HTTP headers sent with every export request (e.g. Authorization tokens). |
| timeout | 10s | Export timeout per batch. Must be a valid Go duration string. |
| signals.logs | true* | Export events as OTEL log records. |
| signals.spans | true* | Accepted in config but span export is not yet implemented; has no effect. |
| batch.max_size | 512 | Maximum records per export batch. When reached, the batch is flushed immediately. |
| batch.timeout | 5s | Auto-flush interval. Pending records are exported even if the batch is not full. |
| resource.service_name | agentsh | OTEL service.name resource attribute. |
| resource.extra_attributes | none | Additional key-value pairs added to the OTEL resource (e.g. deployment.environment: prod). |
\* Signals default: when neither logs nor spans is explicitly set, both default to true. If you explicitly set only one of them (e.g. logs: true), the other stays false.
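The signals default is easy to misread, so here it is as a short Go sketch. It assumes the two YAML fields decode into `*bool`, so that "not set" (nil) can be told apart from an explicit false; `signalsConfig` and `resolveSignals` are hypothetical names.

```go
package main

import "fmt"

// signalsConfig mirrors audit.otel.signals; nil means "not set in YAML" (assumed decoding).
type signalsConfig struct {
	Logs  *bool
	Spans *bool
}

// resolveSignals applies the documented default: if neither field is set,
// both are true; otherwise any field left unset is false.
func resolveSignals(s signalsConfig) (logs, spans bool) {
	if s.Logs == nil && s.Spans == nil {
		return true, true
	}
	if s.Logs != nil {
		logs = *s.Logs
	}
	if s.Spans != nil {
		spans = *s.Spans
	}
	return logs, spans
}

func main() {
	on := true
	fmt.Println(resolveSignals(signalsConfig{}))          // true true
	fmt.Println(resolveSignals(signalsConfig{Logs: &on})) // true false
}
```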
When tls.enabled is true and tls.insecure is false (the default), the OS certificate store is used for server verification. Supply cert_file and key_file for mutual TLS.
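For readers mapping the tls.* fields onto Go's standard crypto/tls package, a sketch could look like this. `buildTLSConfig` is a hypothetical helper, not agentsh's code. Leaving `RootCAs` nil makes Go verify the server against the system certificate pool, which matches the OS certificate store behavior described above, and the client key pair is loaded only when both files are supplied.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
)

// buildTLSConfig maps the audit.otel.tls fields onto crypto/tls (hypothetical helper).
func buildTLSConfig(certFile, keyFile string, insecure bool) (*tls.Config, error) {
	if (certFile == "") != (keyFile == "") {
		return nil, errors.New("tls.cert_file and tls.key_file must be set together")
	}
	cfg := &tls.Config{
		// RootCAs left nil: Go verifies the server against the system (OS) cert pool.
		InsecureSkipVerify: insecure, // tls.insecure: development only
	}
	if certFile != "" {
		// Mutual TLS: present a client certificate to the collector.
		pair, err := tls.LoadX509KeyPair(certFile, keyFile)
		if err != nil {
			return nil, fmt.Errorf("loading client key pair: %w", err)
		}
		cfg.Certificates = []tls.Certificate{pair}
	}
	return cfg, nil
}

func main() {
	cfg, err := buildTLSConfig("", "", false)
	fmt.Println(err == nil, cfg.InsecureSkipVerify, len(cfg.Certificates)) // true false 0
}
```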
Event Filtering
Filters reduce export volume by selecting which events reach the collector. When all filter fields are empty (the default), every event is exported.
| Filter | Type | When empty | Semantics |
|---|---|---|---|
| include_types | glob list | All types pass | Event type must match at least one pattern |
| exclude_types | glob list | Nothing excluded | Events matching any pattern are dropped |
| include_categories | exact list | All categories pass | Event category must be in the list |
| exclude_categories | exact list | Nothing excluded | Events in matching categories are dropped |
| min_risk_level | string | No risk filtering | Only export events at or above this level (low < medium < high < critical). Events that do not carry a risk level always pass this filter. |
Evaluation order: include types → include categories → exclude types → exclude categories → min risk level. Glob patterns support * and ? wildcards. Valid values for min_risk_level are low, medium, high, and critical—any other value is rejected at startup.
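The rules in the table can be sketched in Go using the standard `path.Match` for globs; it handles `*` and `?` and, beyond the wildcards listed above, also `[...]` character classes. The `event` and `filter` types are hypothetical; only the field semantics and evaluation order come from this section.

```go
package main

import (
	"fmt"
	"path"
)

type event struct {
	Type, Category, RiskLevel string // RiskLevel is "" when the event carries none
}

type filter struct {
	IncludeTypes, ExcludeTypes           []string // glob patterns
	IncludeCategories, ExcludeCategories []string // exact names
	MinRiskLevel                         string
}

var riskRank = map[string]int{"low": 1, "medium": 2, "high": 3, "critical": 4}

func matchesAny(patterns []string, s string) bool {
	for _, p := range patterns {
		if ok, _ := path.Match(p, s); ok {
			return true
		}
	}
	return false
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// allows applies the filters in the documented order; empty fields let everything through.
func (f filter) allows(e event) bool {
	if len(f.IncludeTypes) > 0 && !matchesAny(f.IncludeTypes, e.Type) {
		return false
	}
	if len(f.IncludeCategories) > 0 && !contains(f.IncludeCategories, e.Category) {
		return false
	}
	if matchesAny(f.ExcludeTypes, e.Type) || contains(f.ExcludeCategories, e.Category) {
		return false
	}
	if f.MinRiskLevel != "" && e.RiskLevel != "" && riskRank[e.RiskLevel] < riskRank[f.MinRiskLevel] {
		return false // events without a risk level skip this check
	}
	return true
}

func main() {
	f := filter{IncludeCategories: []string{"file", "network"}, MinRiskLevel: "high"}
	fmt.Println(f.allows(event{"net_connect", "network", "critical"})) // true
	fmt.Println(f.allows(event{"file_stat", "file", "low"}))           // false
	fmt.Println(f.allows(event{"process_start", "process", ""}))       // false
}
```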
```yaml
# Only high-risk file and network events
filter:
  include_categories: [file, network]
  min_risk_level: high
```

```yaml
# Everything except noisy stat/list operations
filter:
  exclude_types: [file_stat, dir_list]
```
Event Reference
Every operation intercepted by agentsh produces a typed event. Events flow to all configured stores (SQLite, JSONL, webhooks, OTEL). Use the type and category names below with include_types, exclude_types, include_categories, and exclude_categories filters.
File (category: file)
| Event Type | Description |
|---|---|
| file_open | File opened for reading or writing |
| file_read | File contents read |
| file_write | File contents written |
| file_create | New file created |
| file_delete | File deleted |
| file_rename | File renamed or moved |
| file_stat | File metadata queried |
| file_chmod | File permissions changed |
| dir_create | Directory created |
| dir_delete | Directory deleted |
| dir_list | Directory contents listed |
Network (category: network)
| Event Type | Description |
|---|---|
| dns_query | DNS resolution attempt |
| net_connect | Outbound TCP/network connection |
| net_listen | Socket bound for listening |
| net_accept | Incoming connection accepted |
| dns_redirect | DNS resolution redirected to different address |
| connect_redirect | Network connection redirected to different destination |
| connect_redirect_fallback | Redirect target unreachable, fell back to original destination |
Process (category: process)
| Event Type | Description |
|---|---|
| process_start | Process started |
| process_spawn | Child process created |
| process_exit | Process exited |
| process_tree_kill | Entire process tree terminated |
Environment (category: environment)
| Event Type | Description |
|---|---|
| env_read | Environment variable read |
| env_write | Environment variable set or modified |
| env_list | Environment variables enumerated |
| env_blocked | Environment variable access blocked by policy |
Trash (category: trash)
| Event Type | Description |
|---|---|
| soft_delete | File diverted to trash instead of deleted |
| trash_restore | File restored from trash |
| trash_purge | Trash entries permanently purged |
Shell (category: shell)
| Event Type | Description |
|---|---|
| shell_invoke | Shell shim intercepted a shell invocation |
| shell_passthrough | Shell shim bypassed (not in agentsh mode) |
| session_autostart | Server auto-started by shim on first invocation |
Command (category: command)
| Event Type | Description |
|---|---|
| command_intercept | Command evaluated by the policy engine |
| command_redirect | Command redirected to a different binary |
| command_blocked | Command denied by policy |
| path_redirect | File path redirected to a different location |
Resource (category: resource)
| Event Type | Description |
|---|---|
| resource_limit_set | Resource limits applied to process or session |
| resource_limit_warning | Resource usage approaching configured threshold |
| resource_limit_exceeded | Resource limit exceeded |
| resource_usage_snapshot | Periodic resource usage snapshot |
IPC (category: ipc)
| Event Type | Description |
|---|---|
| unix_socket_connect | Unix domain socket connection |
| unix_socket_bind | Unix domain socket bound |
| unix_socket_blocked | Unix socket operation blocked by policy |
| named_pipe_open | Windows named pipe opened |
| named_pipe_blocked | Windows named pipe blocked by policy |
| ipc_observed | IPC activity detected (audit only, no enforcement) |
Seccomp (category: seccomp)
| Event Type | Description |
|---|---|
| seccomp_blocked | Process killed by seccomp for a blocked syscall |
| notify_handler_panic | Seccomp notify handler recovered from a panic (includes stack trace) |
Signal (category: signal)
| Event Type | Description |
|---|---|
| signal_sent | Signal delivered to a process |
| signal_blocked | Signal blocked by policy |
| signal_redirected | Signal redirected to a different target or signal number |
| signal_absorbed | Signal absorbed (not delivered) |
| signal_approved | Signal approved after pending human approval |
| signal_would_deny | Signal would be denied (audit mode, not enforced) |
MCP (category: mcp)
| Event Type | Description |
|---|---|
| mcp_tool_seen | MCP tool detected and registered |
| mcp_tool_changed | MCP tool definition changed (rug-pull detection) |
| mcp_tool_called | MCP tool call observed in agent request |
| mcp_detection | MCP security pattern detected |
| mcp_tool_call_intercepted | MCP tool call evaluated by proxy (allow or block) |
| mcp_cross_server_blocked | Cross-server attack rule triggered (shadow, burst, read-then-send, flow) |
| mcp_network_connection | Network connection to a known MCP server address |
| mcp_server_name_similarity | MCP server name suspiciously similar to a known server (typosquat detection) |
Policy (category: policy)
| Event Type | Description |
|---|---|
| policy_loaded | Policy loaded (at startup, reload, or via API) |
| policy_changed | Active policy replaced with a new version |
Package (category: package)
| Event Type | Description |
|---|---|
| package_check_started | Package install security check initiated for a command |
| package_check_completed | Package check finished with an overall verdict |
| package_blocked | Package install blocked by policy (critical vulnerability, malware, etc.) |
| package_approved | Package install approved after human approval or policy allow |
| package_warning | Package check produced warnings but install was permitted |
| package_provider_error | A check provider failed (timeout, API error, rate-limited) |
Attributes
Each log record carries attributes following OTEL semantic conventions where applicable, plus agentsh-specific fields under the canyonroad.* namespace.
Semantic Conventions
| Attribute | Source |
|---|---|
| process.pid | Process ID |
| process.parent_pid | Parent process ID |
| process.executable.path | Binary path |
agentsh Namespace
| Attribute | Description |
|---|---|
| canyonroad.product | Product identifier (always "agentsh") |
| canyonroad.event.id | Unique event identifier |
| canyonroad.event.type | Event type (always present) |
| canyonroad.session.id | Session identifier |
| canyonroad.command.id | Command identifier |
| canyonroad.source | Event source |
| canyonroad.path | File path |
| canyonroad.domain | Network domain |
| canyonroad.remote | Remote address |
| canyonroad.operation | Operation name |
| canyonroad.effective_action | Final action taken |
| canyonroad.decision | Policy decision |
| canyonroad.policy.rule | Matching policy rule |
Well-Known Fields
These are extracted from the event's fields map when present. Only non-empty values are included.
| Attribute | Type | Description |
|---|---|---|
| canyonroad.risk_level | string | Risk level (low/medium/high/critical) |
| canyonroad.agent_id | string | Agent identifier |
| canyonroad.agent_type | string | Agent type |
| canyonroad.agent_framework | string | Agent framework name |
| canyonroad.tenant_id | string | Tenant identifier |
| canyonroad.workspace_id | string | Workspace identifier |
| canyonroad.policy_name | string | Name of the matching policy |
| canyonroad.latency_us | int | Total latency in microseconds |
| canyonroad.queue_time_us | int | Queue wait time in microseconds |
| canyonroad.policy_eval_us | int | Policy evaluation time in microseconds |
| canyonroad.intercept_us | int | Intercept processing time in microseconds |
| canyonroad.backend_us | int | Backend processing time in microseconds |
| canyonroad.error | string | Error message |
| canyonroad.error_code | string | Error code |
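The extraction step can be sketched in Go as follows. It assumes the event's fields map is a `map[string]any` and reads "non-empty" as "non-empty string"; integer fields are kept whenever present. Both readings are assumptions. The `attr` type and `wellKnown` table are hypothetical and cover only a subset of the fields above.

```go
package main

import (
	"fmt"
	"sort"
)

// attr is a simplified stand-in for an OTEL log attribute (hypothetical type).
type attr struct {
	Key   string
	Value any
}

// wellKnown is a subset of the table above: field name -> whether it is an integer.
var wellKnown = map[string]bool{
	"risk_level": false,
	"agent_id":   false,
	"tenant_id":  false,
	"latency_us": true,
	"error":      false,
}

// extractWellKnown copies recognised, non-empty fields into canyonroad.* attributes.
func extractWellKnown(fields map[string]any) []attr {
	var out []attr
	for name, isInt := range wellKnown {
		v, ok := fields[name]
		if !ok {
			continue
		}
		switch x := v.(type) {
		case string:
			if !isInt && x != "" {
				out = append(out, attr{"canyonroad." + name, x})
			}
		case int:
			if isInt {
				out = append(out, attr{"canyonroad." + name, int64(x)})
			}
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return out
}

func main() {
	attrs := extractWellKnown(map[string]any{
		"risk_level": "high", "agent_id": "", "latency_us": 340, "unrelated": "x",
	})
	fmt.Println(attrs) // [{canyonroad.latency_us 340} {canyonroad.risk_level high}]
}
```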
Severity Mapping
The policy decision determines the log record severity:
| Decision | Severity |
|---|---|
| allow, audit | INFO |
| redirect, approve, soft_delete | WARN |
| deny | ERROR |
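As a lookup, the mapping might look like the Go sketch below. The numbers are the SeverityNumber values the OpenTelemetry log data model assigns to INFO (9), WARN (13), and ERROR (17); 9 and 17 also appear in the example OTLP records on this page. Treating unknown decisions as INFO is an assumption of this sketch, and the names are hypothetical.

```go
package main

import "fmt"

// severity pairs OTEL's SeverityText with its SeverityNumber.
type severity struct {
	Text   string
	Number int
}

var (
	sevInfo  = severity{"INFO", 9}
	sevWarn  = severity{"WARN", 13}
	sevError = severity{"ERROR", 17}
)

// severityFor maps a policy decision to a log severity per the table above.
// Unknown or empty decisions fall back to INFO (an assumption of this sketch).
func severityFor(decision string) severity {
	switch decision {
	case "deny":
		return sevError
	case "redirect", "approve", "soft_delete":
		return sevWarn
	default: // "allow", "audit", and anything unrecognised
		return sevInfo
	}
}

func main() {
	for _, d := range []string{"allow", "soft_delete", "deny"} {
		s := severityFor(d)
		fmt.Println(d, s.Text, s.Number)
	}
}
```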
Trace Correlation
If an event's fields map contains trace_id (32-hex-char) and/or span_id (16-hex-char), they are attached to the log record for correlation with distributed traces.
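Before attaching the IDs, a converter needs to check their shape. Here is a minimal Go sketch, assuming lowercase hex as in the W3C Trace Context format, which also treats all-zero IDs as invalid. `validHexID` is a hypothetical name, and how agentsh handles malformed values is not specified here.

```go
package main

import "fmt"

// validHexID reports whether s is exactly n lowercase hex characters and not
// all zeros (W3C Trace Context treats all-zero trace and span IDs as invalid).
func validHexID(s string, n int) bool {
	if len(s) != n {
		return false
	}
	allZero := true
	for _, c := range s {
		switch {
		case c >= '0' && c <= '9', c >= 'a' && c <= 'f':
		default:
			return false
		}
		if c != '0' {
			allZero = false
		}
	}
	return !allZero
}

func main() {
	fmt.Println(validHexID("4bf92f3577b34da6a3ce929d0e0e4736", 32)) // true: trace_id
	fmt.Println(validHexID("00f067aa0ba902b7", 16))                 // true: span_id
	fmt.Println(validHexID("0000000000000000", 16))                 // false: all zeros
}
```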
Log Record Body
Each record's body is a human-readable summary:
```
file_write: /workspace/test.go [allow]
net_connect: 1.2.3.4:443 [deny]
dns_query: example.com [redirect]
process_start
```
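The body format can be reproduced with a small Go helper. It assumes the target is the file path or the domain/`host:port`, and that the target and `[decision]` parts are each left out when empty, as in the bare `process_start` line; `recordBody` is a hypothetical name.

```go
package main

import "fmt"

// recordBody builds the human-readable log body "type: target [decision]".
// Empty target or decision parts are simply omitted (assumed behaviour).
func recordBody(eventType, target, decision string) string {
	body := eventType
	if target != "" {
		body += ": " + target
	}
	if decision != "" {
		body += " [" + decision + "]"
	}
	return body
}

func main() {
	fmt.Println(recordBody("file_write", "/workspace/test.go", "allow")) // file_write: /workspace/test.go [allow]
	fmt.Println(recordBody("process_start", "", ""))                     // process_start
}
```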
Example OTLP Log Records
Below is how two events appear as OTLP JSON log records after conversion. This is the payload sent to the collector.
```json
{
  "resourceLogs": [{
    "resource": {
      "attributes": [
        { "key": "service.name", "value": { "stringValue": "agentsh" } }
      ]
    },
    "scopeLogs": [{
      "scope": { "name": "agentsh" },
      "logRecords": [
        {
          "timeUnixNano": "1708200015000000000",
          "severityNumber": 9,
          "severityText": "INFO",
          "body": { "stringValue": "file_write: /workspace/main.go [allow]" },
          "attributes": [
            { "key": "process.pid", "value": { "intValue": "48201" } },
            { "key": "process.executable.path", "value": { "stringValue": "/usr/bin/node" } },
            { "key": "canyonroad.product", "value": { "stringValue": "agentsh" } },
            { "key": "canyonroad.event.type", "value": { "stringValue": "file_write" } },
            { "key": "canyonroad.event.id", "value": { "stringValue": "evt-9f3a2b" } },
            { "key": "canyonroad.session.id", "value": { "stringValue": "sess-abc123" } },
            { "key": "canyonroad.path", "value": { "stringValue": "/workspace/main.go" } },
            { "key": "canyonroad.operation", "value": { "stringValue": "write" } },
            { "key": "canyonroad.decision", "value": { "stringValue": "allow" } },
            { "key": "canyonroad.policy.rule", "value": { "stringValue": "workspace-write" } },
            { "key": "canyonroad.risk_level", "value": { "stringValue": "low" } },
            { "key": "canyonroad.agent_id", "value": { "stringValue": "claude-code-1" } },
            { "key": "canyonroad.latency_us", "value": { "intValue": "340" } }
          ],
          "traceId": "",
          "spanId": ""
        },
        {
          "timeUnixNano": "1708200020000000000",
          "severityNumber": 17,
          "severityText": "ERROR",
          "body": { "stringValue": "net_connect: 10.0.0.5:6379 [deny]" },
          "attributes": [
            { "key": "process.pid", "value": { "intValue": "48201" } },
            { "key": "canyonroad.product", "value": { "stringValue": "agentsh" } },
            { "key": "canyonroad.event.type", "value": { "stringValue": "net_connect" } },
            { "key": "canyonroad.event.id", "value": { "stringValue": "evt-c71d04" } },
            { "key": "canyonroad.session.id", "value": { "stringValue": "sess-abc123" } },
            { "key": "canyonroad.remote", "value": { "stringValue": "10.0.0.5:6379" } },
            { "key": "canyonroad.decision", "value": { "stringValue": "deny" } },
            { "key": "canyonroad.policy.rule", "value": { "stringValue": "no-internal-network" } },
            { "key": "canyonroad.risk_level", "value": { "stringValue": "high" } },
            { "key": "canyonroad.agent_id", "value": { "stringValue": "claude-code-1" } }
          ],
          "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
          "spanId": "00f067aa0ba902b7"
        }
      ]
    }]
  }]
}
```
The first record shows an allowed file write at INFO severity. The second shows a denied network connection at ERROR severity with trace correlation IDs set—these came from trace_id and span_id in the event's fields map.
W3C Distributed Tracing
agentsh supports W3C Trace Context propagation, allowing you to correlate agentsh events with traces from external observability systems. When a trace context is set on a session, every subsequent event carries trace_id, span_id, and trace_flags that integrate with your existing OpenTelemetry pipelines.
Setting Trace Context via the REST API
External orchestrators or CI systems can inject a trace context into a running session:
```bash
# Set trace context for a session
curl -X PUT http://localhost:18080/api/v1/sessions/$SID/trace-context \
  -H "Content-Type: application/json" \
  -d '{
    "trace_id": "0af7651916cd43dd8448eb211c80319c",
    "span_id": "b7ad6b7169203331",
    "trace_flags": "01"
  }'
```
| Field | Format | Required | Description |
|---|---|---|---|
| trace_id | 32 hex chars | Yes | W3C trace identifier (must not be all zeros) |
| span_id | 16 hex chars | No | Parent span identifier (must not be all zeros) |
| trace_flags | 2 hex chars | No | Sampling flag: 01 = sampled, 00 = unsampled |
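The same request can be issued from Go using only the standard library. This sketch reuses the address and path from the curl example; `traceContext`, `setTraceContext`, and `newStubServer` are hypothetical names, and treating any 2xx status as success is an assumption. So that it runs without a live agentsh server, `main` targets an in-process `httptest` stub; pass `http://localhost:18080` as the base URL to reach a real one.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// traceContext is the JSON body for PUT /api/v1/sessions/{id}/trace-context.
type traceContext struct {
	TraceID    string `json:"trace_id"`
	SpanID     string `json:"span_id,omitempty"`     // optional
	TraceFlags string `json:"trace_flags,omitempty"` // optional: "01" sampled, "00" unsampled
}

// setTraceContext PUTs a trace context onto a running session (hypothetical helper).
func setTraceContext(client *http.Client, baseURL, sessionID string, tc traceContext) error {
	body, err := json.Marshal(tc)
	if err != nil {
		return err
	}
	endpoint := baseURL + "/api/v1/sessions/" + url.PathEscape(sessionID) + "/trace-context"
	req, err := http.NewRequest(http.MethodPut, endpoint, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("set trace context: unexpected status %s", resp.Status)
	}
	return nil
}

// capturedRequest records what the stub server received.
type capturedRequest struct{ Method, Path, Body string }

// newStubServer stands in for agentsh: it records each request and replies 204.
func newStubServer() (*httptest.Server, <-chan capturedRequest) {
	ch := make(chan capturedRequest, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		b, _ := io.ReadAll(r.Body)
		ch <- capturedRequest{r.Method, r.URL.Path, string(b)}
		w.WriteHeader(http.StatusNoContent)
	}))
	return srv, ch
}

func main() {
	srv, reqs := newStubServer()
	defer srv.Close()
	err := setTraceContext(srv.Client(), srv.URL, "sess-abc123", traceContext{
		TraceID: "0af7651916cd43dd8448eb211c80319c", SpanID: "b7ad6b7169203331", TraceFlags: "01",
	})
	got := <-reqs
	fmt.Println(err, got.Method, got.Path)
	// <nil> PUT /api/v1/sessions/sess-abc123/trace-context
}
```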
Propagation via gRPC and Environment
Trace context also propagates through two additional paths:
- gRPC metadata: The `traceparent` header is extracted from gRPC calls using the standard W3C format: `00-{trace_id}-{span_id}-{trace_flags}`
- Environment variable: When `TRACEPARENT` is set, the agentsh client automatically propagates it to gRPC calls
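Splitting a `traceparent` value into the three fields used throughout this page takes only a few checks. The Go sketch below follows the W3C Trace Context layout but accepts only version `00` for simplicity (the W3C spec asks parsers to be more lenient with future versions); `parseTraceparent` and `isLowerHex` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// isLowerHex reports whether s is exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !(c >= '0' && c <= '9' || c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// parseTraceparent splits "00-{trace_id}-{span_id}-{trace_flags}" (version 00 only).
func parseTraceparent(h string) (traceID, spanID, flags string, err error) {
	parts := strings.Split(strings.TrimSpace(h), "-")
	if len(parts) != 4 || parts[0] != "00" {
		return "", "", "", errors.New("unsupported traceparent format or version")
	}
	traceID, spanID, flags = parts[1], parts[2], parts[3]
	if !isLowerHex(traceID, 32) || traceID == strings.Repeat("0", 32) {
		return "", "", "", errors.New("invalid trace_id")
	}
	if !isLowerHex(spanID, 16) || spanID == strings.Repeat("0", 16) {
		return "", "", "", errors.New("invalid span_id")
	}
	if !isLowerHex(flags, 2) {
		return "", "", "", errors.New("invalid trace_flags")
	}
	return traceID, spanID, flags, nil
}

func main() {
	t, s, f, err := parseTraceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
	fmt.Println(t, s, f, err)
	// 0af7651916cd43dd8448eb211c80319c b7ad6b7169203331 01 <nil>
}
```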
Event Injection
Once trace context is set, all events emitted during command execution—file I/O, network connections, policy decisions—include the trace fields:
```json
{
  "type": "file_write",
  "session_id": "sess-abc123",
  "trace_id": "0af7651916cd43dd8448eb211c80319c",
  "span_id": "b7ad6b7169203331",
  "trace_flags": "01",
  "path": "/workspace/main.go"
}
```
When exported via OTEL, the trace_id and span_id are set on the log record's span context, enabling direct correlation in tools like Grafana Tempo, Jaeger, or Honeycomb. Upstream sampling decisions are respected—an unsampled trace flag (00) is propagated as-is rather than being forced to sampled.
Audit Log Integrity
agentsh audit logs are chained with HMAC signatures for tamper detection. Verify the integrity chain of any audit log file:
```bash
# Verify audit log integrity with a key file
agentsh audit verify /var/log/agentsh/audit.log --key-file /etc/agentsh/hmac.key

# Verify using an environment variable for the key
agentsh audit verify /var/log/agentsh/audit.log --key-env AGENTSH_HMAC_KEY

# Use SHA-512 algorithm
agentsh audit verify /var/log/agentsh/audit.log --key-file hmac.key --algorithm hmac-sha512
```