Observability

Monitor, audit, and trace everything your agents do: session reports, OpenTelemetry export, and tamper-evident audit logs.

Session Reports

Generate markdown reports summarizing session activity for auditing, debugging, and compliance.

# Quick summary of latest session
agentsh report latest --level=summary

# Detailed investigation with full timeline
agentsh report <session-id> --level=detailed --output=report.md

# Offline mode (no server required)
agentsh report latest --level=summary --direct-db

Report Levels

| Level | Contents |
|-------|----------|
| summary | Overview, activity counts, security findings, decision summary |
| detailed | Everything in summary plus command history, file access, network connections, resource usage, and full event timeline |

Example Summary Report

# Session Report: sess-abc123

**Generated:** 2025-01-15T10:31:00Z
**Report Level:** Summary

## Session Overview

| Property | Value |
|----------|-------|
| Session ID | sess-abc123 |
| Duration | 25s |
| Workspace | /home/user/project |

## Activity Summary

| Metric | Count |
|--------|-------|
| Commands Executed | 6 |
| Files Accessed | 1 |
| Network Connections | 2 |
| Policy Denials | 2 |

## Security Findings

### Critical
- **Dangerous command blocked**: `rm -rf /` - rm -rf blocked for safety

### Warning
- **Network access denied**: Connection to `internal.corp.local:80` blocked

## Policy Decisions

| Decision | Count |
|----------|-------|
| Allow | 5 |
| Deny | 2 |
| Redirect | 0 |

Example Detailed Report (excerpt)

## Command History

| Time | Command | Decision | Exit Code | Duration |
|------|---------|----------|-----------|----------|
| 10:30:01 | `ls -la` | allow | 0 | 126ms |
| 10:30:05 | `git status` | allow | 0 | 149ms |
| 10:30:15 | `rm -rf /` | **deny** | - | - |
| 10:30:20 | `curl https://api.github.com` | allow | 0 | 499ms |

## Network Connections

| Time | Domain | Port | Decision | Rule |
|------|--------|------|----------|------|
| 10:30:20 | api.github.com | 443 | allow | github.com allowed |
| 10:30:25 | internal.corp.local | 80 | **deny** | internal networks blocked |

## Event Timeline

```
10:30:00.000 [session_created] Session started in /home/user/project
10:30:01.123 [command_policy] ls -la → allow
10:30:15.000 [command_policy] rm -rf / → DENY (rm -rf blocked)
10:30:20.010 [net_connect] api.github.com:443 → allow
10:30:25.010 [net_connect] internal.corp.local:80 → DENY
```

OpenTelemetry Export

agentsh can export audit events as OpenTelemetry log records via OTLP, shipping them to any OTEL-compatible collector (Grafana Alloy, Datadog Agent, Honeycomb, etc.). Events flow through the same pipeline as SQLite, JSONL, and webhook stores—export failures never block the caller.

audit:
  otel:
    enabled: true
    endpoint: otel-collector.internal:4317
    protocol: grpc   # grpc or http

Events are converted to OTEL LogRecords with semantic-convention attributes, batched, and exported asynchronously. If the collector is unreachable, the SDK retries with exponential backoff and silently drops events after exhausting retries—the primary SQLite store always has the authoritative copy. If the OTEL store fails to initialize at startup the server logs an error and continues without it; event recording is never disrupted.
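
The batching behaviour described above can be pictured with a small Python sketch. It is not agentsh's exporter: the `Batcher` class, its method names, and the injectable `clock` are hypothetical, and flushing on the age of the oldest pending record is a simplification of an interval timer. The `max_size` and `timeout` parameters mirror the `batch` settings in the configuration.

```python
import time
from typing import Callable, List


class Batcher:
    """Illustrative batcher: flushes when full or when the batch is old enough."""

    def __init__(self, export: Callable[[List[dict]], None],
                 max_size: int = 512, timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.export = export
        self.max_size = max_size
        self.timeout = timeout
        self.clock = clock
        self.pending: List[dict] = []
        self.oldest = 0.0  # arrival time of the first pending record

    def add(self, record: dict) -> None:
        if not self.pending:
            self.oldest = self.clock()
        self.pending.append(record)
        if len(self.pending) >= self.max_size:
            self.flush()  # a full batch is exported immediately

    def tick(self) -> None:
        """Call periodically; exports a partial batch once it has aged out."""
        if self.pending and self.clock() - self.oldest >= self.timeout:
            self.flush()

    def flush(self) -> None:
        batch, self.pending = self.pending, []
        try:
            self.export(batch)
        except Exception:
            # Assumption: failures are swallowed so the caller is never blocked.
            # A real exporter would retry with backoff before dropping.
            pass
```

Because the primary store keeps the authoritative copy, dropping a failed batch in this sketch loses only the exported duplicate.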

Environment variable overrides

AGENTSH_OTEL_ENDPOINT and AGENTSH_OTEL_PROTOCOL override the config file. The standard OTEL_EXPORTER_OTLP_ENDPOINT is also respected as a fallback when AGENTSH_OTEL_ENDPOINT is not set.
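
A minimal Python sketch of this lookup order follows. The function name `resolve_otel_settings` is hypothetical, and the sketch assumes that `OTEL_EXPORTER_OTLP_ENDPOINT` also ranks above the config file; the text above only states that it is a fallback for `AGENTSH_OTEL_ENDPOINT`, so check the actual behaviour before relying on that ordering.

```python
import os
from typing import Mapping, Optional


def resolve_otel_settings(config: Mapping[str, str],
                          env: Optional[Mapping[str, str]] = None) -> dict:
    """Pick endpoint and protocol from environment variables, then config."""
    env = os.environ if env is None else env
    endpoint = (
        env.get("AGENTSH_OTEL_ENDPOINT")           # agentsh-specific override
        or env.get("OTEL_EXPORTER_OTLP_ENDPOINT")  # standard variable (assumed to outrank config)
        or config.get("endpoint", "localhost:4317")
    )
    protocol = env.get("AGENTSH_OTEL_PROTOCOL") or config.get("protocol", "grpc")
    return {"endpoint": endpoint, "protocol": protocol}
```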

Plaintext warning

When tls.enabled is false, agentsh logs a warning at startup: OTEL export is configured without TLS; event data will be sent in plaintext. Enable TLS for production deployments.

Configuration

audit:
  otel:
    enabled: false
    endpoint: localhost:4317       # collector host:port
    protocol: grpc                 # grpc or http

    tls:
      enabled: false
      cert_file: ""               # client certificate
      key_file: ""                # client key
      insecure: false             # skip server cert verification (dev only)

    headers:                        # custom headers (e.g. auth tokens)
      Authorization: "Bearer ${OTEL_TOKEN}"

    timeout: 10s                   # export timeout per batch

    signals:
      logs: true                   # export as OTEL log records
      spans: true                  # accepted but not yet implemented

    batch:
      max_size: 512                # records per batch
      timeout: 5s                 # auto-flush interval

    filter:
      include_types: []             # glob patterns: ["file_*", "net_*"]
      exclude_types: []             # glob patterns: ["file_stat"]
      include_categories: []        # exact: ["file", "network"]
      exclude_categories: []
      min_risk_level: ""            # low, medium, high, or critical

    resource:
      service_name: agentsh         # OTEL resource service.name
      extra_attributes: {}          # additional resource key-values

| Field | Default | Description |
|-------|---------|-------------|
| enabled | false | Enable OTEL event export. When false the entire OTEL pipeline is skipped. |
| endpoint | localhost:4317 | Collector address (host:port). Required when enabled; validation fails without it. |
| protocol | grpc | grpc or http (OTLP). Any other value is rejected at startup. |
| tls.enabled | false | Enable TLS for the exporter connection. When false, a plaintext warning is logged. |
| tls.cert_file | none | Path to client certificate for mutual TLS. Only used when tls.enabled is true. |
| tls.key_file | none | Path to client key for mutual TLS. Must be set together with cert_file. |
| tls.insecure | false | Skip server certificate verification. Development only; do not use in production. |
| headers | none | Custom HTTP headers sent with every export request (e.g. Authorization tokens). |
| timeout | 10s | Export timeout per batch. Must be a valid Go duration string. |
| signals.logs | true* | Export events as OTEL log records. |
| signals.spans | true* | Accepted in config but span export is not yet implemented; has no effect. |
| batch.max_size | 512 | Maximum records per export batch. When reached the batch is flushed immediately. |
| batch.timeout | 5s | Auto-flush interval. Pending records are exported even if the batch is not full. |
| resource.service_name | agentsh | OTEL service.name resource attribute. |
| resource.extra_attributes | none | Additional key-value pairs added to the OTEL resource (e.g. deployment.environment: prod). |

* signals default: when neither logs nor spans is explicitly set, both default to true. If you explicitly set one (e.g. logs: true), the other stays false.
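
The defaulting rule in the note above is easy to misread, so here is a small Python sketch of it. The function `resolve_signals` is hypothetical; `None` stands for a key that is absent from the config file.

```python
from typing import Optional, Tuple


def resolve_signals(logs: Optional[bool] = None,
                    spans: Optional[bool] = None) -> Tuple[bool, bool]:
    """Return the effective (logs, spans) pair; None means 'not set in config'."""
    if logs is None and spans is None:
        return True, True          # nothing set: both default to true
    return bool(logs), bool(spans)  # anything unset stays false


# Setting only logs leaves spans disabled.
print(resolve_signals(logs=True))
```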

TLS in production

When tls.enabled is true and tls.insecure is false (the default), the OS certificate store is used for server verification. Supply cert_file and key_file for mutual TLS.

Event Filtering

Filters reduce export volume by selecting which events reach the collector. When all filter fields are empty (the default), every event is exported.

| Filter | Type | When empty | Semantics |
|--------|------|------------|-----------|
| include_types | glob list | All types pass | Event type must match at least one pattern |
| exclude_types | glob list | Nothing excluded | Events matching any pattern are dropped |
| include_categories | exact list | All categories pass | Event category must be in the list |
| exclude_categories | exact list | Nothing excluded | Events in matching categories are dropped |
| min_risk_level | string | No risk filtering | Only export events at or above this level (low < medium < high < critical). Events that do not carry a risk level always pass this filter. |

Evaluation order: include types → include categories → exclude types → exclude categories → min risk level. Glob patterns support * and ? wildcards. Valid values for min_risk_level are low, medium, high, and critical—any other value is rejected at startup.

# Only high-risk file and network events
filter:
  include_categories: [file, network]
  min_risk_level: high

# Everything except noisy stat/list operations
filter:
  exclude_types: [file_stat, dir_list]
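
The evaluation order can be sketched in Python as follows. This is an illustration of the documented semantics, not agentsh's code: `should_export` and `RISK_ORDER` are hypothetical names, inputs are assumed to be already validated, and `fnmatchcase` also understands `[...]` character classes, which the real filter may not.

```python
from fnmatch import fnmatchcase
from typing import Optional, Sequence

RISK_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def should_export(event_type: str, category: str, risk_level: Optional[str],
                  include_types: Sequence[str] = (),
                  include_categories: Sequence[str] = (),
                  exclude_types: Sequence[str] = (),
                  exclude_categories: Sequence[str] = (),
                  min_risk_level: str = "") -> bool:
    # 1. include types: an empty list lets every type through
    if include_types and not any(fnmatchcase(event_type, p) for p in include_types):
        return False
    # 2. include categories: exact match, empty list lets everything through
    if include_categories and category not in include_categories:
        return False
    # 3. exclude types
    if any(fnmatchcase(event_type, p) for p in exclude_types):
        return False
    # 4. exclude categories
    if category in exclude_categories:
        return False
    # 5. minimum risk level: events without a risk level always pass
    if min_risk_level and risk_level:
        return RISK_ORDER[risk_level] >= RISK_ORDER[min_risk_level]
    return True
```

With the first example filter above, `should_export("file_write", "file", "high", include_categories=["file", "network"], min_risk_level="high")` passes, while the same event at `low` risk is dropped.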

Event Reference

Every operation intercepted by agentsh produces a typed event. Events flow to all configured stores (SQLite, JSONL, webhooks, OTEL). Use the type and category names below with include_types, exclude_types, include_categories, and exclude_categories filters.

File (category: file)

| Event Type | Description |
|------------|-------------|
| file_open | File opened for reading or writing |
| file_read | File contents read |
| file_write | File contents written |
| file_create | New file created |
| file_delete | File deleted |
| file_rename | File renamed or moved |
| file_stat | File metadata queried |
| file_chmod | File permissions changed |
| dir_create | Directory created |
| dir_delete | Directory deleted |
| dir_list | Directory contents listed |

Network (category: network)

| Event Type | Description |
|------------|-------------|
| dns_query | DNS resolution attempt |
| net_connect | Outbound TCP/network connection |
| net_listen | Socket bound for listening |
| net_accept | Incoming connection accepted |
| dns_redirect | DNS resolution redirected to different address |
| connect_redirect | Network connection redirected to different destination |
| connect_redirect_fallback | Redirect target unreachable, fell back to original destination |

Process (category: process)

| Event Type | Description |
|------------|-------------|
| process_start | Process started |
| process_spawn | Child process created |
| process_exit | Process exited |
| process_tree_kill | Entire process tree terminated |

Environment (category: environment)

| Event Type | Description |
|------------|-------------|
| env_read | Environment variable read |
| env_write | Environment variable set or modified |
| env_list | Environment variables enumerated |
| env_blocked | Environment variable access blocked by policy |

Trash (category: trash)

| Event Type | Description |
|------------|-------------|
| soft_delete | File diverted to trash instead of deleted |
| trash_restore | File restored from trash |
| trash_purge | Trash entries permanently purged |

Shell (category: shell)

| Event Type | Description |
|------------|-------------|
| shell_invoke | Shell shim intercepted a shell invocation |
| shell_passthrough | Shell shim bypassed (not in agentsh mode) |
| session_autostart | Server auto-started by shim on first invocation |

Command (category: command)

| Event Type | Description |
|------------|-------------|
| command_intercept | Command evaluated by the policy engine |
| command_redirect | Command redirected to a different binary |
| command_blocked | Command denied by policy |
| path_redirect | File path redirected to a different location |

Resource (category: resource)

| Event Type | Description |
|------------|-------------|
| resource_limit_set | Resource limits applied to process or session |
| resource_limit_warning | Resource usage approaching configured threshold |
| resource_limit_exceeded | Resource limit exceeded |
| resource_usage_snapshot | Periodic resource usage snapshot |

IPC (category: ipc)

| Event Type | Description |
|------------|-------------|
| unix_socket_connect | Unix domain socket connection |
| unix_socket_bind | Unix domain socket bound |
| unix_socket_blocked | Unix socket operation blocked by policy |
| named_pipe_open | Windows named pipe opened |
| named_pipe_blocked | Windows named pipe blocked by policy |
| ipc_observed | IPC activity detected (audit only, no enforcement) |

Seccomp (category: seccomp)

| Event Type | Description |
|------------|-------------|
| seccomp_blocked | Process killed by seccomp for a blocked syscall |
| notify_handler_panic | Seccomp notify handler recovered from a panic (includes stack trace) |

Signal (category: signal)

| Event Type | Description |
|------------|-------------|
| signal_sent | Signal delivered to a process |
| signal_blocked | Signal blocked by policy |
| signal_redirected | Signal redirected to a different target or signal number |
| signal_absorbed | Signal absorbed (not delivered) |
| signal_approved | Signal approved after pending human approval |
| signal_would_deny | Signal would be denied (audit mode, not enforced) |

MCP (category: mcp)

| Event Type | Description |
|------------|-------------|
| mcp_tool_seen | MCP tool detected and registered |
| mcp_tool_changed | MCP tool definition changed (rug-pull detection) |
| mcp_tool_called | MCP tool call observed in agent request |
| mcp_detection | MCP security pattern detected |
| mcp_tool_call_intercepted | MCP tool call evaluated by proxy (allow or block) |
| mcp_cross_server_blocked | Cross-server attack rule triggered (shadow, burst, read-then-send, flow) |
| mcp_network_connection | Network connection to a known MCP server address |
| mcp_server_name_similarity | MCP server name suspiciously similar to a known server (typosquat detection) |

Policy (category: policy)

| Event Type | Description |
|------------|-------------|
| policy_loaded | Policy loaded (at startup, reload, or via API) |
| policy_changed | Active policy replaced with a new version |

Package (category: package)

| Event Type | Description |
|------------|-------------|
| package_check_started | Package install security check initiated for a command |
| package_check_completed | Package check finished with an overall verdict |
| package_blocked | Package install blocked by policy (critical vulnerability, malware, etc.) |
| package_approved | Package install approved after human approval or policy allow |
| package_warning | Package check produced warnings but install was permitted |
| package_provider_error | A check provider failed (timeout, API error, rate-limited) |

Attributes

Each log record carries attributes following OTEL semantic conventions where applicable, plus agentsh-specific fields under the canyonroad.* namespace.

Semantic Conventions

| Attribute | Source |
|-----------|--------|
| process.pid | Process ID |
| process.parent_pid | Parent process ID |
| process.executable.path | Binary path |

agentsh Namespace

| Attribute | Description |
|-----------|-------------|
| canyonroad.product | Product identifier (always "agentsh") |
| canyonroad.event.id | Unique event identifier |
| canyonroad.event.type | Event type (always present) |
| canyonroad.session.id | Session identifier |
| canyonroad.command.id | Command identifier |
| canyonroad.source | Event source |
| canyonroad.path | File path |
| canyonroad.domain | Network domain |
| canyonroad.remote | Remote address |
| canyonroad.operation | Operation name |
| canyonroad.effective_action | Final action taken |
| canyonroad.decision | Policy decision |
| canyonroad.policy.rule | Matching policy rule |

Well-Known Fields

These are extracted from the event's fields map when present. Only non-empty values are included.

| Attribute | Type | Description |
|-----------|------|-------------|
| canyonroad.risk_level | string | Risk level (low/medium/high/critical) |
| canyonroad.agent_id | string | Agent identifier |
| canyonroad.agent_type | string | Agent type |
| canyonroad.agent_framework | string | Agent framework name |
| canyonroad.tenant_id | string | Tenant identifier |
| canyonroad.workspace_id | string | Workspace identifier |
| canyonroad.policy_name | string | Name of the matching policy |
| canyonroad.latency_us | int | Total latency in microseconds |
| canyonroad.queue_time_us | int | Queue wait time in microseconds |
| canyonroad.policy_eval_us | int | Policy evaluation time in microseconds |
| canyonroad.intercept_us | int | Intercept processing time in microseconds |
| canyonroad.backend_us | int | Backend processing time in microseconds |
| canyonroad.error | string | Error message |
| canyonroad.error_code | string | Error code |

Severity Mapping

The policy decision determines the log record severity:

| Decision | Severity |
|----------|----------|
| allow, audit | INFO |
| redirect, approve, soft_delete | WARN |
| deny | ERROR |
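
For consumers that key on the numeric severity, the mapping above can be written as a lookup table. The sketch below is illustrative: `severity_for` is a hypothetical helper, the numbers are the standard OTEL SeverityNumber values for INFO, WARN, and ERROR, and the INFO fallback for events without a recognised decision is an assumption, not documented behaviour.

```python
from typing import Tuple

# Decision -> (severityText, severityNumber), using OTEL's standard numbering.
SEVERITY_BY_DECISION = {
    "allow": ("INFO", 9),
    "audit": ("INFO", 9),
    "redirect": ("WARN", 13),
    "approve": ("WARN", 13),
    "soft_delete": ("WARN", 13),
    "deny": ("ERROR", 17),
}


def severity_for(decision: str) -> Tuple[str, int]:
    """Look up the severity for a policy decision (INFO fallback is assumed)."""
    return SEVERITY_BY_DECISION.get(decision, ("INFO", 9))
```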

Trace Correlation

If an event's fields map contains trace_id (32-hex-char) and/or span_id (16-hex-char), they are attached to the log record for correlation with distributed traces.

Log Record Body

Each record's body is a human-readable summary:

file_write: /workspace/test.go [allow]
net_connect: 1.2.3.4:443 [deny]
dns_query: example.com [redirect]
process_start
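
The four sample bodies above follow a simple shape: the event type, then an optional target, then an optional decision in brackets. A minimal Python sketch that reproduces them follows; `format_body` and its parameter names are hypothetical, and how agentsh chooses the target (path, domain, or remote address) is not specified here.

```python
def format_body(event_type: str, target: str = "", decision: str = "") -> str:
    """Build a summary line in the shape of the sample bodies."""
    body = event_type
    if target:
        body += f": {target}"
    if decision:
        body += f" [{decision}]"
    return body


print(format_body("file_write", "/workspace/test.go", "allow"))
print(format_body("process_start"))
```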

Example OTLP Log Records

Below is how two events appear as OTLP JSON log records after conversion. This is the payload sent to the collector.

{
  "resourceLogs": [{
    "resource": {
      "attributes": [
        { "key": "service.name", "value": { "stringValue": "agentsh" } }
      ]
    },
    "scopeLogs": [{
      "scope": { "name": "agentsh" },
      "logRecords": [
        {
          "timeUnixNano": "1708200015000000000",
          "severityNumber": 9,
          "severityText": "INFO",
          "body": { "stringValue": "file_write: /workspace/main.go [allow]" },
          "attributes": [
            { "key": "process.pid",              "value": { "intValue": "48201" } },
            { "key": "process.executable.path",  "value": { "stringValue": "/usr/bin/node" } },
            { "key": "canyonroad.product",           "value": { "stringValue": "agentsh" } },
            { "key": "canyonroad.event.type",       "value": { "stringValue": "file_write" } },
            { "key": "canyonroad.event.id",         "value": { "stringValue": "evt-9f3a2b" } },
            { "key": "canyonroad.session.id",       "value": { "stringValue": "sess-abc123" } },
            { "key": "canyonroad.path",             "value": { "stringValue": "/workspace/main.go" } },
            { "key": "canyonroad.operation",        "value": { "stringValue": "write" } },
            { "key": "canyonroad.decision",         "value": { "stringValue": "allow" } },
            { "key": "canyonroad.policy.rule",      "value": { "stringValue": "workspace-write" } },
            { "key": "canyonroad.risk_level",       "value": { "stringValue": "low" } },
            { "key": "canyonroad.agent_id",         "value": { "stringValue": "claude-code-1" } },
            { "key": "canyonroad.latency_us",       "value": { "intValue": "340" } }
          ],
          "traceId": "",
          "spanId": ""
        },
        {
          "timeUnixNano": "1708200020000000000",
          "severityNumber": 17,
          "severityText": "ERROR",
          "body": { "stringValue": "net_connect: 10.0.0.5:6379 [deny]" },
          "attributes": [
            { "key": "process.pid",              "value": { "intValue": "48201" } },
            { "key": "canyonroad.product",           "value": { "stringValue": "agentsh" } },
            { "key": "canyonroad.event.type",       "value": { "stringValue": "net_connect" } },
            { "key": "canyonroad.event.id",         "value": { "stringValue": "evt-c71d04" } },
            { "key": "canyonroad.session.id",       "value": { "stringValue": "sess-abc123" } },
            { "key": "canyonroad.remote",           "value": { "stringValue": "10.0.0.5:6379" } },
            { "key": "canyonroad.decision",         "value": { "stringValue": "deny" } },
            { "key": "canyonroad.policy.rule",      "value": { "stringValue": "no-internal-network" } },
            { "key": "canyonroad.risk_level",       "value": { "stringValue": "high" } },
            { "key": "canyonroad.agent_id",         "value": { "stringValue": "claude-code-1" } }
          ],
          "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
          "spanId": "00f067aa0ba902b7"
        }
      ]
    }]
  }]
}

The first record shows an allowed file write at INFO severity. The second shows a denied network connection at ERROR severity with trace correlation IDs set—these came from trace_id and span_id in the event's fields map.
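
On the receiving side, a payload of this shape can be flattened into plain dictionaries for ad-hoc analysis. The Python sketch below is one way to do that; `iter_records` and `_attr_value` are hypothetical helpers, and only the value kinds that appear in the example (plus booleans and doubles) are handled.

```python
from typing import Any, Dict, Iterator


def _attr_value(value: Dict[str, Any]) -> Any:
    """Unwrap an OTLP JSON AnyValue; 64-bit ints arrive as strings."""
    if "intValue" in value:
        return int(value["intValue"])
    for kind in ("stringValue", "boolValue", "doubleValue"):
        if kind in value:
            return value[kind]
    return None


def iter_records(payload: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat dict per log record in an OTLP JSON logs payload."""
    for resource_logs in payload.get("resourceLogs", []):
        for scope_logs in resource_logs.get("scopeLogs", []):
            for record in scope_logs.get("logRecords", []):
                flat = {a["key"]: _attr_value(a["value"])
                        for a in record.get("attributes", [])}
                flat["severity"] = record.get("severityText", "")
                flat["body"] = record.get("body", {}).get("stringValue", "")
                flat["trace_id"] = record.get("traceId", "")
                yield flat
```

For example, `[r for r in iter_records(payload) if r.get("canyonroad.decision") == "deny"]` selects the denied events, where `payload` is the parsed JSON document.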

W3C Distributed Tracing

agentsh supports W3C Trace Context propagation, allowing you to correlate agentsh events with traces from external observability systems. When a trace context is set on a session, every subsequent event carries trace_id, span_id, and trace_flags that integrate with your existing OpenTelemetry pipelines.

Setting Trace Context via the REST API

External orchestrators or CI systems can inject a trace context into a running session:

# Set trace context for a session
curl -X PUT http://localhost:18080/api/v1/sessions/$SID/trace-context \
  -H "Content-Type: application/json" \
  -d '{
    "trace_id": "0af7651916cd43dd8448eb211c80319c",
    "span_id": "b7ad6b7169203331",
    "trace_flags": "01"
  }'

| Field | Format | Required | Description |
|-------|--------|----------|-------------|
| trace_id | 32 hex chars | Yes | W3C trace identifier (must not be all zeros) |
| span_id | 16 hex chars | No | Parent span identifier (must not be all zeros) |
| trace_flags | 2 hex chars | No | Sampling flag: 01 = sampled, 00 = unsampled |
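
An orchestrator may want to check these fields before calling the API. The Python sketch below encodes the format rules from the table; `validate_trace_context` and `is_sampled` are hypothetical helpers, and accepting only lowercase hex follows the W3C Trace Context convention rather than anything stated about agentsh's own validation.

```python
import re
from typing import List


def validate_trace_context(trace_id: str, span_id: str = "",
                           trace_flags: str = "") -> List[str]:
    """Return a list of problems; an empty list means the fields look valid."""
    errors = []
    if not re.fullmatch(r"[0-9a-f]{32}", trace_id):
        errors.append("trace_id must be 32 hex chars")
    elif set(trace_id) == {"0"}:
        errors.append("trace_id must not be all zeros")
    if span_id:  # optional field
        if not re.fullmatch(r"[0-9a-f]{16}", span_id):
            errors.append("span_id must be 16 hex chars")
        elif set(span_id) == {"0"}:
            errors.append("span_id must not be all zeros")
    if trace_flags and not re.fullmatch(r"[0-9a-f]{2}", trace_flags):
        errors.append("trace_flags must be 2 hex chars")
    return errors


def is_sampled(trace_flags: str) -> bool:
    """True when the low bit (the W3C 'sampled' flag) is set."""
    return bool(trace_flags) and (int(trace_flags, 16) & 0x01) == 1
```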

Propagation via gRPC and Environment

Trace context also propagates through two additional paths: gRPC and the environment.

Event Injection

Once trace context is set, all events emitted during command execution—file I/O, network connections, policy decisions—include the trace fields:

{
  "type": "file_write",
  "session_id": "sess-abc123",
  "trace_id": "0af7651916cd43dd8448eb211c80319c",
  "span_id": "b7ad6b7169203331",
  "trace_flags": "01",
  "path": "/workspace/main.go"
}

When exported via OTEL, the trace_id and span_id are set on the log record's span context, enabling direct correlation in tools like Grafana Tempo, Jaeger, or Honeycomb. Upstream sampling decisions are respected—an unsampled trace flag (00) is propagated as-is rather than being forced to sampled.

Audit Log Integrity

agentsh audit logs are chained with HMAC signatures for tamper detection. Verify the integrity chain of any audit log file:

# Verify audit log integrity with a key file
agentsh audit verify /var/log/agentsh/audit.log --key-file /etc/agentsh/hmac.key

# Verify using an environment variable for the key
agentsh audit verify /var/log/agentsh/audit.log --key-env AGENTSH_HMAC_KEY

# Use SHA-512 algorithm
agentsh audit verify /var/log/agentsh/audit.log --key-file hmac.key --algorithm hmac-sha512
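
To show why chaining detects tampering, here is a generic HMAC chain sketch in Python. It is not agentsh's on-disk format or verification code: `sign_chain`, `verify_chain`, and the choice to feed the previous tag into the next MAC are assumptions made for illustration.

```python
import hashlib
import hmac
from typing import Iterable, List, Sequence, Tuple


def sign_chain(entries: Iterable[bytes], key: bytes,
               algorithm: str = "sha256") -> List[Tuple[bytes, str]]:
    """Tag each entry with an HMAC over (previous tag + entry)."""
    prev = b""
    signed = []
    for entry in entries:
        tag = hmac.new(key, prev + entry, algorithm).hexdigest()
        signed.append((entry, tag))
        prev = tag.encode()
    return signed


def verify_chain(signed: Sequence[Tuple[bytes, str]], key: bytes,
                 algorithm: str = "sha256") -> int:
    """Return the index of the first entry that fails, or -1 if intact."""
    prev = b""
    for index, (entry, tag) in enumerate(signed):
        expected = hmac.new(key, prev + entry, algorithm).hexdigest()
        if not hmac.compare_digest(expected, tag):
            return index
        prev = tag.encode()
    return -1
```

Because each tag depends on the one before it, editing or deleting an entry breaks verification at that position even if the remaining tags are left untouched.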