Observability

Monitor, audit, and trace everything your agents do with session reports, OpenTelemetry export, and tamper-evident audit logs.

Session Reports

Generate markdown reports summarizing session activity for auditing, debugging, and compliance.

# Quick summary of latest session
agentsh report latest --level=summary

# Detailed investigation with full timeline
agentsh report <session-id> --level=detailed --output=report.md

# Offline mode (no server required)
agentsh report latest --level=summary --direct-db
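
For scripting, the same invocations can be assembled programmatically. The following is a minimal Python sketch, not part of agentsh: `report_argv` is a hypothetical helper name, and it uses only the flags shown above. Whether `--output` and `--direct-db` can be combined in one call is an assumption.

```python
from typing import List, Optional

LEVELS = ("summary", "detailed")

def report_argv(session: str = "latest", level: str = "summary",
                output: Optional[str] = None, direct_db: bool = False) -> List[str]:
    """Build an `agentsh report` argument list from the documented flags."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}, got {level!r}")
    argv = ["agentsh", "report", session, f"--level={level}"]
    if output:
        argv.append(f"--output={output}")
    if direct_db:
        # Offline mode: no server required.
        argv.append("--direct-db")
    return argv
```

Pass the result to `subprocess.run(argv, check=True)` to execute it.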

Report Levels

| Level | Contents |
|-------|----------|
| `summary` | Overview, activity counts, security findings, decision summary |
| `detailed` | Everything in summary plus command history, file access, network connections, resource usage, and full event timeline |

Example Summary Report

# Session Report: sess-abc123

**Generated:** 2025-01-15T10:31:00Z
**Report Level:** Summary

## Session Overview

| Property | Value |
|----------|-------|
| Session ID | sess-abc123 |
| Duration | 25s |
| Workspace | /home/user/project |

## Activity Summary

| Metric | Count |
|--------|-------|
| Commands Executed | 6 |
| Files Accessed | 1 |
| Network Connections | 2 |
| Policy Denials | 2 |

## Security Findings

### Critical
- **Dangerous command blocked**: `rm -rf /` - rm -rf blocked for safety

### Warning
- **Network access denied**: Connection to `internal.corp.local:80` blocked

## Policy Decisions

| Decision | Count |
|----------|-------|
| Allow | 5 |
| Deny | 2 |
| Redirect | 0 |
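
Because the report is plain markdown, its two-column tables are easy to read back in a script, for example to fail a CI job when denials appear. The sketch below assumes the layout of the example above (`## ` section headings followed by pipe tables); `parse_report_tables` is a hypothetical helper, not an agentsh API.

```python
from typing import Dict

def parse_report_tables(markdown: str) -> Dict[str, Dict[str, str]]:
    """Map each '## ' heading to the two-column table rows beneath it."""
    sections: Dict[str, Dict[str, str]] = {}
    current = None
    in_body = False  # True once the |---|---| separator row has been seen
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = {}
            in_body = False
        elif current is not None and line.startswith("|"):
            cells = [c.strip() for c in line.strip("|").split("|")]
            if cells and all(c and set(c) <= set("-:") for c in cells):
                in_body = True  # separator row; data rows follow
            elif in_body and len(cells) == 2:
                sections[current][cells[0]] = cells[1]
    return sections
```

With the example report, `parse_report_tables(text)["Activity Summary"]["Policy Denials"]` yields the string `"2"`; convert with `int()` before comparing.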

Example Detailed Report (excerpt)

## Command History

| Time | Command | Decision | Exit Code | Duration |
|------|---------|----------|-----------|----------|
| 10:30:01 | `ls -la` | allow | 0 | 126ms |
| 10:30:05 | `git status` | allow | 0 | 149ms |
| 10:30:15 | `rm -rf /` | **deny** | - | - |
| 10:30:20 | `curl https://api.github.com` | allow | 0 | 499ms |

## Network Connections

| Time | Domain | Port | Decision | Rule |
|------|--------|------|----------|------|
| 10:30:20 | api.github.com | 443 | allow | github.com allowed |
| 10:30:25 | internal.corp.local | 80 | **deny** | internal networks blocked |

## Event Timeline

```
10:30:00.000 [session_created] Session started in /home/user/project
10:30:01.123 [command_policy] ls -la → allow
10:30:15.000 [command_policy] rm -rf / → DENY (rm -rf blocked)
10:30:20.010 [net_connect] api.github.com:443 → allow
10:30:25.010 [net_connect] internal.corp.local:80 → DENY
```

OpenTelemetry Export

agentsh can export audit events as OpenTelemetry log records via OTLP, shipping them to any OTEL-compatible collector (Grafana Alloy, Datadog Agent, Honeycomb, etc.). Events flow through the same pipeline as the SQLite, JSONL, and webhook stores, and export failures never block the caller.

audit:
  otel:
    enabled: true
    endpoint: otel-collector.internal:4317
    protocol: grpc   # grpc or http

Events are converted to OTEL LogRecords with semantic-convention attributes, batched, and exported asynchronously. If the collector is unreachable, the SDK retries with exponential backoff and silently drops events after exhausting retries; the primary SQLite store always has the authoritative copy. If the OTEL store fails to initialize at startup, the server logs an error and continues without it; event recording is never disrupted.

Environment variable overrides

`AGENTSH_OTEL_ENDPOINT` and `AGENTSH_OTEL_PROTOCOL` override the config file. The standard `OTEL_EXPORTER_OTLP_ENDPOINT` is also respected as a fallback when `AGENTSH_OTEL_ENDPOINT` is not set.
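
One way to read this precedence is sketched below in Python. It is an illustration only: `resolve_otel_endpoint` is a hypothetical name, and the text above does not say whether `OTEL_EXPORTER_OTLP_ENDPOINT` outranks a value from the config file, so the ordering of those two is an assumption.

```python
import os
from typing import Mapping

def resolve_otel_endpoint(config_endpoint: str,
                          env: Mapping[str, str] = os.environ) -> str:
    """Pick the collector endpoint: agentsh variable, then standard variable, then config."""
    # ASSUMPTION: the standard OTEL variable is consulted before the config file.
    for name in ("AGENTSH_OTEL_ENDPOINT", "OTEL_EXPORTER_OTLP_ENDPOINT"):
        value = env.get(name, "").strip()
        if value:
            return value
    return config_endpoint
```

Passing `env` explicitly keeps the function easy to test without touching the real environment.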

Plaintext warning

When `tls.enabled` is `false`, agentsh logs a warning at startup: OTEL export is configured without TLS; event data will be sent in plaintext. Enable TLS for production deployments.

Configuration

audit:
  otel:
    enabled: false
    endpoint: localhost:4317       # collector host:port
    protocol: grpc                 # grpc or http

    tls:
      enabled: false
      cert_file: ""               # client certificate
      key_file: ""                # client key
      insecure: false             # skip server cert verification (dev only)

    headers:                        # custom headers (e.g. auth tokens)
      Authorization: "Bearer ${OTEL_TOKEN}"

    timeout: 10s                   # export timeout per batch

    signals:
      logs: true                   # export as OTEL log records
      spans: true                  # accepted but not yet implemented

    batch:
      max_size: 512                # records per batch
      timeout: 5s                 # auto-flush interval

    filter:
      include_types: []             # glob patterns: ["file_*", "net_*"]
      exclude_types: []             # glob patterns: ["file_stat"]
      include_categories: []        # exact: ["file", "network"]
      exclude_categories: []
      min_risk_level: ""            # low, medium, high, or critical

    resource:
      service_name: agentsh         # OTEL resource service.name
      extra_attributes: {}          # additional resource key-values

| Field | Default | Description |
|-------|---------|-------------|
| `enabled` | `false` | Enable OTEL event export. When `false` the entire OTEL pipeline is skipped. |
| `endpoint` | `localhost:4317` | Collector address (`host:port`). Required when enabled; validation fails without it. |
| `protocol` | `grpc` | `grpc` or `http` (OTLP). Any other value is rejected at startup. |
| `tls.enabled` | `false` | Enable TLS for the exporter connection. When `false`, a plaintext warning is logged. |
| `tls.cert_file` | none | Path to client certificate for mutual TLS. Only used when `tls.enabled` is `true`. |
| `tls.key_file` | none | Path to client key for mutual TLS. Must be set together with `cert_file`. |
| `tls.insecure` | `false` | Skip server certificate verification. Development only; do not use in production. |
| `headers` | none | Custom HTTP headers sent with every export request (e.g. `Authorization` tokens). |
| `timeout` | `10s` | Export timeout per batch. Must be a valid Go duration string. |
| `signals.logs` | `true`* | Export events as OTEL log records. |
| `signals.spans` | `true`* | Accepted in config but span export is not yet implemented; has no effect. |
| `batch.max_size` | `512` | Maximum records per export batch. When reached the batch is flushed immediately. |
| `batch.timeout` | `5s` | Auto-flush interval. Pending records are exported even if the batch is not full. |
| `resource.service_name` | `agentsh` | OTEL `service.name` resource attribute. |
| `resource.extra_attributes` | none | Additional key-value pairs added to the OTEL resource (e.g. `deployment.environment: prod`). |

* Signals default: when neither `logs` nor `spans` is explicitly set, both default to `true`. If you explicitly set one (e.g. `logs: true`), the other stays `false`.
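
The defaulting rule can be sketched as follows. This is a Python illustration of the rule as worded above, not agentsh's code; `resolve_signals` is a hypothetical name, and `None` stands for "not set in the config".

```python
from typing import Dict, Optional

def resolve_signals(logs: Optional[bool] = None,
                    spans: Optional[bool] = None) -> Dict[str, bool]:
    """Apply the signals default: both on only when neither key is set."""
    if logs is None and spans is None:
        return {"logs": True, "spans": True}
    # At least one key was set explicitly; an unset key stays false.
    return {"logs": bool(logs), "spans": bool(spans)}
```

Under this reading, a config that sets only `spans: false` would leave `logs` off as well, so it is safest to set both keys whenever you set one.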

TLS in production

When `tls.enabled` is `true` and `tls.insecure` is `false` (the default), the OS certificate store is used for server verification. Supply `cert_file` and `key_file` for mutual TLS.

Event Filtering

Filters reduce export volume by selecting which events reach the collector. When all filter fields are empty (the default), every event is exported.

| Filter | Type | When empty | Semantics |
|--------|------|------------|-----------|
| `include_types` | glob list | All types pass | Event type must match at least one pattern |
| `exclude_types` | glob list | Nothing excluded | Events matching any pattern are dropped |
| `include_categories` | exact list | All categories pass | Event category must be in the list |
| `exclude_categories` | exact list | Nothing excluded | Events in matching categories are dropped |
| `min_risk_level` | string | No risk filtering | Only export events at or above this level (`low` < `medium` < `high` < `critical`). Events that do not carry a risk level always pass this filter. |

Evaluation order: include types → include categories → exclude types → exclude categories → min risk level. Glob patterns support `*` and `?` wildcards. Valid values for `min_risk_level` are `low`, `medium`, `high`, and `critical`; any other value is rejected at startup.

# Only high-risk file and network events
filter:
  include_categories: [file, network]
  min_risk_level: high

# Everything except noisy stat/list operations
filter:
  exclude_types: [file_stat, dir_list]
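
The evaluation order can be sketched as a single predicate. This is a Python illustration under stated assumptions, not the agentsh implementation: the event is modeled as a dict with hypothetical `type`, `category`, and `risk_level` keys, and `fnmatchcase` stands in for the glob matcher (it also accepts `[...]` character classes, which the text above does not mention).

```python
from fnmatch import fnmatchcase

RISK_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}

def should_export(event: dict, flt: dict) -> bool:
    """Return True if the event passes every configured filter stage."""
    etype = event.get("type", "")
    category = event.get("category", "")

    include_types = flt.get("include_types") or []
    if include_types and not any(fnmatchcase(etype, p) for p in include_types):
        return False
    include_categories = flt.get("include_categories") or []
    if include_categories and category not in include_categories:
        return False
    if any(fnmatchcase(etype, p) for p in flt.get("exclude_types") or []):
        return False
    if category in (flt.get("exclude_categories") or []):
        return False

    min_level = flt.get("min_risk_level") or ""
    risk = event.get("risk_level")
    # Events without a (recognized) risk level always pass this stage.
    if min_level and risk in RISK_ORDER:
        return RISK_ORDER[risk] >= RISK_ORDER[min_level]
    return True
```

With the first filter above, a `file_write` event at `high` risk passes, the same event at `low` risk is dropped, and a `net_connect` event with no risk level still passes.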

Event Reference

Every operation intercepted by agentsh produces a typed event. Events flow to all configured stores (SQLite, JSONL, webhooks, OTEL). Use the type and category names below with the `include_types`, `exclude_types`, `include_categories`, and `exclude_categories` filters.

File (category: `file`)

| Event Type | Description |
|------------|-------------|
| `file_open` | File opened for reading or writing |
| `file_read` | File contents read |
| `file_write` | File contents written |
| `file_create` | New file created |
| `file_delete` | File deleted |
| `file_rename` | File renamed or moved |
| `file_stat` | File metadata queried |
| `file_chmod` | File permissions changed |
| `dir_create` | Directory created |
| `dir_delete` | Directory deleted |
| `dir_list` | Directory contents listed |

Network (category: `network`)

| Event Type | Description |
|------------|-------------|
| `dns_query` | DNS resolution attempt |
| `net_connect` | Outbound TCP/network connection |
| `net_listen` | Socket bound for listening |
| `net_accept` | Incoming connection accepted |
| `dns_redirect` | DNS resolution redirected to different address |
| `connect_redirect` | Network connection redirected to different destination |
| `connect_redirect_fallback` | Redirect target unreachable, fell back to original destination |

Process (category: `process`)

| Event Type | Description |
|------------|-------------|
| `process_start` | Process started |
| `process_spawn` | Child process created |
| `process_exit` | Process exited |
| `process_tree_kill` | Entire process tree terminated |

Environment (category: `environment`)

| Event Type | Description |
|------------|-------------|
| `env_read` | Environment variable read |
| `env_write` | Environment variable set or modified |
| `env_list` | Environment variables enumerated |
| `env_blocked` | Environment variable access blocked by policy |

Trash (category: `trash`)

| Event Type | Description |
|------------|-------------|
| `soft_delete` | File diverted to trash instead of deleted |
| `trash_restore` | File restored from trash |
| `trash_purge` | Trash entries permanently purged |

Shell (category: `shell`)

| Event Type | Description |
|------------|-------------|
| `shell_invoke` | Shell shim intercepted a shell invocation |
| `shell_passthrough` | Shell shim bypassed (not in agentsh mode) |
| `session_autostart` | Server auto-started by shim on first invocation |

Command (category: `command`)

| Event Type | Description |
|------------|-------------|
| `command_intercept` | Command evaluated by the policy engine |
| `command_redirect` | Command redirected to a different binary |
| `command_blocked` | Command denied by policy |
| `path_redirect` | File path redirected to a different location |

Resource (category: `resource`)

| Event Type | Description |
|------------|-------------|
| `resource_limit_set` | Resource limits applied to process or session |
| `resource_limit_warning` | Resource usage approaching configured threshold |
| `resource_limit_exceeded` | Resource limit exceeded |
| `resource_usage_snapshot` | Periodic resource usage snapshot |

IPC (category: `ipc`)

| Event Type | Description |
|------------|-------------|
| `unix_socket_connect` | Unix domain socket connection |
| `unix_socket_bind` | Unix domain socket bound |
| `unix_socket_blocked` | Unix socket operation blocked by policy |
| `named_pipe_open` | Windows named pipe opened |
| `named_pipe_blocked` | Windows named pipe blocked by policy |
| `ipc_observed` | IPC activity detected (audit only, no enforcement) |

Seccomp (category: `seccomp`)

| Event Type | Description |
|------------|-------------|
| `seccomp_blocked` | Process killed by seccomp for a blocked syscall |
| `notify_handler_panic` | Seccomp notify handler recovered from a panic (includes stack trace) |

Signal (category: `signal`)

| Event Type | Description |
|------------|-------------|
| `signal_sent` | Signal delivered to a process |
| `signal_blocked` | Signal blocked by policy |
| `signal_redirected` | Signal redirected to a different target or signal number |
| `signal_absorbed` | Signal absorbed (not delivered) |
| `signal_approved` | Signal approved after pending human approval |
| `signal_would_deny` | Signal would be denied (audit mode, not enforced) |

MCP (category: `mcp`)

| Event Type | Description |
|------------|-------------|
| `mcp_tool_seen` | MCP tool detected and registered |
| `mcp_tool_changed` | MCP tool definition changed (rug-pull detection) |
| `mcp_tool_called` | MCP tool call observed in agent request |
| `mcp_detection` | MCP security pattern detected |
| `mcp_tool_call_intercepted` | MCP tool call evaluated by proxy (allow or block) |
| `mcp_cross_server_blocked` | Cross-server attack rule triggered (shadow, burst, read-then-send, flow) |
| `mcp_network_connection` | Network connection to a known MCP server address |
| `mcp_server_name_similarity` | MCP server name suspiciously similar to a known server (typosquat detection) |

Policy (category: `policy`)

| Event Type | Description |
|------------|-------------|
| `policy_loaded` | Policy loaded (at startup, reload, or via API) |
| `policy_changed` | Active policy replaced with a new version |

Package (category: `package`)

| Event Type | Description |
|------------|-------------|
| `package_check_started` | Package install security check initiated for a command |
| `package_check_completed` | Package check finished with an overall verdict |
| `package_blocked` | Package install blocked by policy (critical vulnerability, malware, etc.) |
| `package_approved` | Package install approved after human approval or policy allow |
| `package_warning` | Package check produced warnings but install was permitted |
| `package_provider_error` | A check provider failed (timeout, API error, rate-limited) |

Attributes

Each log record carries attributes following OTEL semantic conventions where applicable, plus agentsh-specific fields under the canyonroad.* namespace.

Semantic Conventions

| Attribute | Source |
|-----------|--------|
| `process.pid` | Process ID |
| `process.parent_pid` | Parent process ID |
| `process.executable.path` | Binary path |

agentsh Namespace

| Attribute | Description |
|-----------|-------------|
| `canyonroad.product` | Product identifier (always `"agentsh"`) |
| `canyonroad.event.id` | Unique event identifier |
| `canyonroad.event.type` | Event type (always present) |
| `canyonroad.session.id` | Session identifier |
| `canyonroad.command.id` | Command identifier |
| `canyonroad.source` | Event source |
| `canyonroad.path` | File path |
| `canyonroad.domain` | Network domain |
| `canyonroad.remote` | Remote address |
| `canyonroad.operation` | Operation name |
| `canyonroad.effective_action` | Final action taken |
| `canyonroad.decision` | Policy decision |
| `canyonroad.policy.rule` | Matching policy rule |

Well-Known Fields

These are extracted from the event's fields map when present. Only non-empty values are included.

| Attribute | Type | Description |
|-----------|------|-------------|
| `canyonroad.risk_level` | string | Risk level (low/medium/high/critical) |
| `canyonroad.agent_id` | string | Agent identifier |
| `canyonroad.agent_type` | string | Agent type |
| `canyonroad.agent_framework` | string | Agent framework name |
| `canyonroad.tenant_id` | string | Tenant identifier |
| `canyonroad.workspace_id` | string | Workspace identifier |
| `canyonroad.policy_name` | string | Name of the matching policy |
| `canyonroad.latency_us` | int | Total latency in microseconds |
| `canyonroad.queue_time_us` | int | Queue wait time in microseconds |
| `canyonroad.policy_eval_us` | int | Policy evaluation time in microseconds |
| `canyonroad.intercept_us` | int | Intercept processing time in microseconds |
| `canyonroad.backend_us` | int | Backend processing time in microseconds |
| `canyonroad.error` | string | Error message |
| `canyonroad.error_code` | string | Error code |

Severity Mapping

The policy decision determines the log record severity:

| Decision | Severity |
|----------|----------|
| `allow`, `audit` | INFO |
| `redirect`, `approve`, `soft_delete` | WARN |
| `deny` | ERROR |
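
On the consuming side, the same mapping is handy for alert rules or tests. The sketch below restates the table in Python; the severity numbers (9, 13, 17) are the standard OTEL values for INFO, WARN, and ERROR. Falling back to INFO for an unrecognized decision is an assumption, since the table does not cover that case.

```python
from typing import Tuple

# (severityText, severityNumber) pairs per the OTEL log data model.
_INFO, _WARN, _ERROR = ("INFO", 9), ("WARN", 13), ("ERROR", 17)

SEVERITY_BY_DECISION = {
    "allow": _INFO,
    "audit": _INFO,
    "redirect": _WARN,
    "approve": _WARN,
    "soft_delete": _WARN,
    "deny": _ERROR,
}

def severity_for(decision: str) -> Tuple[str, int]:
    """Look up the severity for a policy decision."""
    # ASSUMPTION: unknown decisions are treated as INFO.
    return SEVERITY_BY_DECISION.get(decision, _INFO)
```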

Trace Correlation

If an event's fields map contains `trace_id` (32 hex characters) and/or `span_id` (16 hex characters), they are attached to the log record for correlation with distributed traces.

Log Record Body

Each record's body is a human-readable summary:

file_write: /workspace/test.go [allow]
net_connect: 1.2.3.4:443 [deny]
dns_query: example.com [redirect]
process_start

Example OTLP Log Records

Below is how two events appear as OTLP JSON log records after conversion. This is the payload sent to the collector.

{
  "resourceLogs": [{
    "resource": {
      "attributes": [
        { "key": "service.name", "value": { "stringValue": "agentsh" } }
      ]
    },
    "scopeLogs": [{
      "scope": { "name": "agentsh" },
      "logRecords": [
        {
          "timeUnixNano": "1708200015000000000",
          "severityNumber": 9,
          "severityText": "INFO",
          "body": { "stringValue": "file_write: /workspace/main.go [allow]" },
          "attributes": [
            { "key": "process.pid",              "value": { "intValue": "48201" } },
            { "key": "process.executable.path",  "value": { "stringValue": "/usr/bin/node" } },
            { "key": "canyonroad.product",           "value": { "stringValue": "agentsh" } },
            { "key": "canyonroad.event.type",       "value": { "stringValue": "file_write" } },
            { "key": "canyonroad.event.id",         "value": { "stringValue": "evt-9f3a2b" } },
            { "key": "canyonroad.session.id",       "value": { "stringValue": "sess-abc123" } },
            { "key": "canyonroad.path",             "value": { "stringValue": "/workspace/main.go" } },
            { "key": "canyonroad.operation",        "value": { "stringValue": "write" } },
            { "key": "canyonroad.decision",         "value": { "stringValue": "allow" } },
            { "key": "canyonroad.policy.rule",      "value": { "stringValue": "workspace-write" } },
            { "key": "canyonroad.risk_level",       "value": { "stringValue": "low" } },
            { "key": "canyonroad.agent_id",         "value": { "stringValue": "claude-code-1" } },
            { "key": "canyonroad.latency_us",       "value": { "intValue": "340" } }
          ],
          "traceId": "",
          "spanId": ""
        },
        {
          "timeUnixNano": "1708200020000000000",
          "severityNumber": 17,
          "severityText": "ERROR",
          "body": { "stringValue": "net_connect: 10.0.0.5:6379 [deny]" },
          "attributes": [
            { "key": "process.pid",              "value": { "intValue": "48201" } },
            { "key": "canyonroad.product",           "value": { "stringValue": "agentsh" } },
            { "key": "canyonroad.event.type",       "value": { "stringValue": "net_connect" } },
            { "key": "canyonroad.event.id",         "value": { "stringValue": "evt-c71d04" } },
            { "key": "canyonroad.session.id",       "value": { "stringValue": "sess-abc123" } },
            { "key": "canyonroad.remote",           "value": { "stringValue": "10.0.0.5:6379" } },
            { "key": "canyonroad.decision",         "value": { "stringValue": "deny" } },
            { "key": "canyonroad.policy.rule",      "value": { "stringValue": "no-internal-network" } },
            { "key": "canyonroad.risk_level",       "value": { "stringValue": "high" } },
            { "key": "canyonroad.agent_id",         "value": { "stringValue": "claude-code-1" } }
          ],
          "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
          "spanId": "00f067aa0ba902b7"
        }
      ]
    }]
  }]
}

The first record shows an allowed file write at INFO severity. The second shows a denied network connection at ERROR severity with trace correlation IDs set; these came from `trace_id` and `span_id` in the event's fields map.
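
When inspecting such a payload by hand, for instance from a collector's debug output, it helps to flatten the key/value attribute list into a plain dict. A minimal Python sketch, assuming the OTLP JSON shape shown above; the helper names are hypothetical:

```python
from typing import Dict, Iterator, List

def flatten_attributes(attributes: List[dict]) -> Dict[str, object]:
    """Turn OTLP [{key, value: {<type>Value: ...}}] entries into a flat dict."""
    flat: Dict[str, object] = {}
    for attr in attributes:
        value = attr["value"]
        if "stringValue" in value:
            flat[attr["key"]] = value["stringValue"]
        elif "intValue" in value:
            # OTLP JSON encodes 64-bit integers as strings.
            flat[attr["key"]] = int(value["intValue"])
        elif "boolValue" in value:
            flat[attr["key"]] = bool(value["boolValue"])
        elif "doubleValue" in value:
            flat[attr["key"]] = float(value["doubleValue"])
        else:
            flat[attr["key"]] = value  # arrays, maps, bytes: keep as-is
    return flat

def iter_log_records(payload: dict) -> Iterator[dict]:
    """Yield every log record in a resourceLogs/scopeLogs payload."""
    for resource_logs in payload.get("resourceLogs", []):
        for scope_logs in resource_logs.get("scopeLogs", []):
            yield from scope_logs.get("logRecords", [])

def denied_records(payload: dict) -> List[Dict[str, object]]:
    """Return flattened attributes of records whose decision is 'deny'."""
    flattened = (flatten_attributes(r.get("attributes", []))
                 for r in iter_log_records(payload))
    return [a for a in flattened if a.get("canyonroad.decision") == "deny"]
```

Applied to the payload above, `denied_records` would return one entry, the `net_connect` event for `10.0.0.5:6379`.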

W3C Distributed Tracing

agentsh supports W3C Trace Context propagation, allowing you to correlate agentsh events with traces from external observability systems. When a trace context is set on a session, every subsequent event carries `trace_id`, `span_id`, and `trace_flags` fields, which integrate with your existing OpenTelemetry pipelines.

Setting Trace Context via the REST API

External orchestrators or CI systems can inject a trace context into a running session:

# Set trace context for a session
curl -X PUT http://localhost:18080/api/v1/sessions/$SID/trace-context \
  -H "Content-Type: application/json" \
  -d '{
    "trace_id": "0af7651916cd43dd8448eb211c80319c",
    "span_id": "b7ad6b7169203331",
    "trace_flags": "01"
  }'

| Field | Format | Required | Description |
|-------|--------|----------|-------------|
| `trace_id` | 32 hex chars | Yes | W3C trace identifier (must not be all zeros) |
| `span_id` | 16 hex chars | No | Parent span identifier (must not be all zeros) |
| `trace_flags` | 2 hex chars | No | Sampling flag: `01` = sampled, `00` = unsampled |
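
A client can check a payload against these rules before sending it. The Python sketch below is a client-side pre-check only, not the server's validator; `validate_trace_context` is a hypothetical name, and requiring lowercase hex follows the W3C Trace Context format rather than anything stated above.

```python
import re
from typing import List

def _check_hex(name: str, value: str, length: int, nonzero: bool) -> List[str]:
    """Validate one hex field; return a list of problems (empty if fine)."""
    if not re.fullmatch(r"[0-9a-f]{%d}" % length, value):
        return [f"{name} must be {length} lowercase hex chars"]
    if nonzero and set(value) == {"0"}:
        return [f"{name} must not be all zeros"]
    return []

def validate_trace_context(ctx: dict) -> List[str]:
    """Return validation errors for a trace-context payload (empty list = valid)."""
    errors: List[str] = []
    trace_id = ctx.get("trace_id")
    if trace_id is None:
        errors.append("trace_id is required")
    else:
        errors += _check_hex("trace_id", trace_id, 32, nonzero=True)
    if ctx.get("span_id") is not None:
        errors += _check_hex("span_id", ctx["span_id"], 16, nonzero=True)
    if ctx.get("trace_flags") is not None:
        errors += _check_hex("trace_flags", ctx["trace_flags"], 2, nonzero=False)
    return errors
```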

Propagation via gRPC and Environment

Trace context also propagates through two additional paths:

- gRPC metadata: The `traceparent` header is extracted from gRPC calls using the standard W3C format: `00-{trace_id}-{span_id}-{trace_flags}`
- Environment variable: When `TRACEPARENT` is set, the agentsh client automatically propagates it to gRPC calls
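
The `traceparent` format maps directly onto the three REST API fields. As a minimal Python sketch (the function name is hypothetical, and only version `00` is handled), a header value can be split like this:

```python
import re
from typing import Dict, Optional

_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")

def parse_traceparent(header: str) -> Optional[Dict[str, str]]:
    """Split a version-00 traceparent value into trace_id, span_id, trace_flags."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, span_id, trace_flags = match.groups()
    # All-zero identifiers are invalid in W3C Trace Context.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return {"trace_id": trace_id, "span_id": span_id, "trace_flags": trace_flags}
```

The returned dict has the same keys as the JSON body accepted by the trace-context endpoint, so an orchestrator that only has a `TRACEPARENT` value can convert it before calling the REST API.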

Event Injection

Once trace context is set, all events emitted during command execution—file I/O, network connections, policy decisions—include the trace fields:

{
  "type": "file_write",
  "session_id": "sess-abc123",
  "trace_id": "0af7651916cd43dd8448eb211c80319c",
  "span_id": "b7ad6b7169203331",
  "trace_flags": "01",
  "path": "/workspace/main.go"
}

When exported via OTEL, the `trace_id` and `span_id` are set on the log record's span context, enabling direct correlation in tools like Grafana Tempo, Jaeger, or Honeycomb. Upstream sampling decisions are respected: an unsampled trace flag (`00`) is propagated as-is rather than being forced to sampled.

Audit Log Integrity

agentsh audit logs are chained with HMAC signatures for tamper detection. Verify the integrity chain of any audit log file:

# Verify audit log integrity with a key file
agentsh audit verify /var/log/agentsh/audit.log --key-file /etc/agentsh/hmac.key

# Verify using an environment variable for the key
agentsh audit verify /var/log/agentsh/audit.log --key-env AGENTSH_HMAC_KEY

# Use SHA-512 algorithm
agentsh audit verify /var/log/agentsh/audit.log --key-file hmac.key --algorithm hmac-sha512
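
To illustrate why chaining makes tampering detectable, here is a generic HMAC chain in Python. It is a conceptual sketch only: agentsh's actual record format and chaining construction are not described here, and the scheme below (each MAC covers the previous MAC plus the current entry) is an assumption chosen for illustration.

```python
import hashlib
import hmac
from typing import Callable, List

def sign_chain(entries: List[bytes], key: bytes,
               digest: Callable = hashlib.sha256) -> List[bytes]:
    """Compute one MAC per entry; each MAC also covers the previous MAC."""
    macs: List[bytes] = []
    prev = b""
    for payload in entries:
        prev = hmac.new(key, prev + payload, digest).digest()
        macs.append(prev)
    return macs

def verify_chain(entries: List[bytes], macs: List[bytes], key: bytes,
                 digest: Callable = hashlib.sha256) -> int:
    """Return the index of the first entry that fails, or -1 if the chain is intact."""
    prev = b""
    for i, (payload, mac) in enumerate(zip(entries, macs)):
        expected = hmac.new(key, prev + payload, digest).digest()
        if not hmac.compare_digest(expected, mac):
            return i
        prev = mac
    if len(entries) != len(macs):
        return min(len(entries), len(macs))  # truncated or extra records
    return -1
```

Because every MAC depends on the one before it, editing, removing, or reordering an entry invalidates that entry's MAC and everything after it unless the attacker holds the key. Passing `hashlib.sha512` as `digest` gives the SHA-512 variant.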
