Punk Docs - Governance

Governance

Punk governance answers five questions: who is acting, what are they trying to do, is it allowed, what evidence was recorded, and can this action influence future optimized routes?

What Governance Covers

Agent identity.
Trust tiers and trust score.
Policy evaluation.
Side-effect classification.
Web session actions.
Workflow MCP tool calls.
Approval-required rules.
Policy exceptions.
Audit events.
Payload redaction.
Encrypted provider/tool credentials.

Agent Identity

Punk resolves the acting identity from each request:

Tenant from bearer token or open dev default.
App from pinned API key or X-Punk-App.
Agent from X-Punk-Agent or anonymous.
Subject from X-Punk-Subject.

The trust engine creates agent records as needed.
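
The resolution order above can be sketched as a small function. This is an illustration under assumptions, not Punk's implementation: the names Identity and resolve_identity are hypothetical, the token-to-tenant and key-to-app lookups are passed in as already-resolved values, the dev default tenant name "dev" is made up, and a pinned app is assumed to win over the X-Punk-App header.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Identity:
    tenant: str
    app: Optional[str]
    agent: str
    subject: Optional[str]


def resolve_identity(
    headers: Mapping[str, str],
    token_tenant: Optional[str] = None,  # tenant the bearer token maps to, if any
    key_app: Optional[str] = None,       # app pinned to the API key, if any
    dev_default_tenant: str = "dev",     # hypothetical open-dev default name
) -> Identity:
    """Resolve tenant, app, agent, and subject for one request."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    return Identity(
        tenant=token_tenant or dev_default_tenant,
        app=key_app or h.get("x-punk-app"),        # assumption: pinned app wins
        agent=h.get("x-punk-agent") or "anonymous",
        subject=h.get("x-punk-subject"),
    )
```

A request that carries no identity headers and no token resolves to the dev default tenant and the anonymous agent.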

Trust tiers:

unverified
verified
trusted
privileged

The trust score is behavioral: it reflects how the agent acts, and is currently collapsed into a single number in [0, 1].

Policy Format

Policies are YAML files in PUNK_POLICIES_DIR.

Minimal example:

name: default
version: "1.0"
description: Default policy

appliesTo:
  trustTiers:
    - unverified
    - verified
    - trusted
    - privileged

rules:
  - effect: allow
    actions:
      - "model:*"
      - "read:*"

  - effect: deny
    actions:
      - "write:payment"
      - "delete:*"

defaultEffect: deny

Approval example:

name: email-approval
version: "1.0"
appliesTo:
  trustTiers: ["verified"]
rules:
  - effect: allow
    actions: ["write:email"]
    requiresApproval: true
defaultEffect: allow

Policy Selection

Punk selects the most specific applicable policy. Selectors, from most to least specific:

agentIds
tags
trustTiers
no selector (the policy applies to everyone)

Within the selected policy, rules are evaluated in order and the first rule with a matching action wins. If no rule matches, defaultEffect applies. If no policy applies at all, Punk allows the request so that unconfigured local development does not break.
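
The selection and evaluation steps can be sketched as follows. This is a minimal sketch under stated assumptions rather than Punk's engine: the Policy and Rule classes and the rank and evaluate functions are hypothetical, a policy with several selectors is ranked by its most specific matching selector, ties go to the first policy in the list, and fnmatchcase from the Python standard library stands in for Punk's own action matcher.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class Rule:
    effect: str
    actions: List[str]


@dataclass
class Policy:
    name: str
    rules: List[Rule]
    default_effect: str
    agent_ids: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    trust_tiers: List[str] = field(default_factory=list)


def rank(p: Policy, agent_id: str, tags: List[str], tier: str) -> Optional[int]:
    """Lower is more specific; None means the policy does not apply."""
    if agent_id in p.agent_ids:
        return 0
    if set(tags) & set(p.tags):
        return 1
    if tier in p.trust_tiers:
        return 2
    if not (p.agent_ids or p.tags or p.trust_tiers):
        return 3  # selectorless policy applies to everyone
    return None


def evaluate(policies: List[Policy], agent_id: str, tags: List[str],
             tier: str, action: str) -> str:
    ranked = [(r, p) for p in policies
              if (r := rank(p, agent_id, tags, tier)) is not None]
    if not ranked:
        return "allow"  # no applicable policy: traffic is allowed
    _, policy = min(ranked, key=lambda rp: rp[0])
    for rule in policy.rules:  # rules in order, first match wins
        if any(fnmatchcase(action, pat) for pat in rule.actions):
            return rule.effect
    return policy.default_effect
```

With the default policy from the minimal example, read:crm is allowed by the first rule, delete:file is denied by the second, and write:email falls through to defaultEffect: deny.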

Action Matching

Actions use <verb>:<resource>.

Examples:

model:chat
read:crm
write:email
delete:file
execute:tool

Patterns may be exact or use * as a wildcard:

*
read:*
*:email
write:email
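
One way to read these patterns is segment by segment: a bare * matches everything, and otherwise the verb and the resource must each be equal or be *. The function below is a sketch of that reading. The name matches is hypothetical, and the handling of malformed input (no colon) is an assumption.

```python
def matches(pattern: str, action: str) -> bool:
    """Match an action such as "write:email" against a pattern such as "write:*"."""
    if pattern == "*":
        return True  # bare wildcard matches every action
    p_verb, p_sep, p_resource = pattern.partition(":")
    a_verb, a_sep, a_resource = action.partition(":")
    if not p_sep or not a_sep:
        return False  # assumption: both sides must be <verb>:<resource>
    # Each segment matches when the pattern segment is "*" or equal.
    return p_verb in ("*", a_verb) and p_resource in ("*", a_resource)
```

Under this reading, read:* covers read:crm but not write:crm, and *:email covers write:email regardless of verb.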

Side-Effect Classification

SDK tools should declare sideEffectLevel.

Level	Meaning	Example
0	Pure computation	parse, format, math
1	Read-only external	CRM read, search, web read
2	Reversible/idempotent write	upsert with idempotency key
3	User-visible write	email, Slack, ticket creation
4	High-impact write	payments, deletion, permissions

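Undeclared tools default to level 3. This is intentional: a tool that has not declared its impact is treated as a user-visible write rather than assumed safe.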
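
The level table and the default for undeclared tools can be captured in a few lines. The enum member names and the effective_level helper are hypothetical; only the numeric levels and the default of 3 come from the table above.

```python
from enum import IntEnum
from typing import Optional


class SideEffectLevel(IntEnum):
    PURE = 0                # parse, format, math
    READ_ONLY = 1           # CRM read, search, web read
    REVERSIBLE_WRITE = 2    # upsert with idempotency key
    USER_VISIBLE_WRITE = 3  # email, Slack, ticket creation
    HIGH_IMPACT_WRITE = 4   # payments, deletion, permissions


def effective_level(declared: Optional[int]) -> SideEffectLevel:
    """Return the level to enforce; undeclared tools are treated as level 3."""
    if declared is None:
        return SideEffectLevel.USER_VISIBLE_WRITE
    return SideEffectLevel(declared)  # raises ValueError outside 0..4
```

Because the levels are ordered integers, a threshold check such as level >= 3 reads naturally.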

Approval Required

When a rule has requiresApproval: true:

The request fails closed in optimize mode.
Punk creates or reuses a pending approval.
Operators decide from the dashboard or API.
An approved policy exception allows matching future requests for the configured TTL.
Decisions are audited.

Observe mode records what would have happened but does not block.
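
The difference between the two modes can be sketched as a single decision function. The names Outcome and gate are hypothetical, and so is the would_require_approval label used for the observe-mode record. The sketch leaves out creating the pending approval and tracking the exception TTL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    blocked: bool
    decision: str  # what gets recorded for audit


def gate(mode: str, requires_approval: bool, has_active_exception: bool) -> Outcome:
    """Decide whether a request matching a requiresApproval rule may proceed."""
    if not requires_approval or has_active_exception:
        # An approved, unexpired policy exception lets the request through.
        return Outcome(blocked=False, decision="allow")
    if mode == "observe":
        # Observe mode records what would have happened but does not block.
        return Outcome(blocked=False, decision="would_require_approval")
    # Optimize mode fails closed until an operator decides.
    return Outcome(blocked=True, decision="approval_required")
```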

Optimization Promotion Approvals

When a candidate optimized route passes evidence gates but human sign-off is required, Punk opens a promotion approval.

Approval promotes the route. Rejection leaves it unpromoted. Rollback and quarantine remain available after promotion.

Redaction

Set tenant setting redaction=true to redact SDK tool and side-effect payloads before they are stored in run history.

Secrets are also protected in settings:

webhook_secret is accepted by PUT.
GET returns [set] rather than the real value.
Audit metadata never stores the raw secret.
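
The read-side behavior can be illustrated with a small helper. This is a sketch only: settings_for_get and SECRET_KEYS are hypothetical names, and what GET returns when the secret has never been set is an assumption (here the stored value is passed through unchanged).

```python
from typing import Any, Dict

SECRET_KEYS = {"webhook_secret"}


def settings_for_get(stored: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of tenant settings that is safe to send to a client."""
    view = dict(stored)
    for key in SECRET_KEYS:
        if view.get(key):
            view[key] = "[set]"  # presence marker, never the real value
    return view
```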

Provider keys (BYOK): a tenant's own OpenAI, Anthropic, OpenRouter, DeepSeek, Moonshot/Kimi, or Gemini API keys live in the encrypted credentials vault (Governance → Provider keys) and are never returned by the API. Every live run's trace records whether the tenant key or the platform key served it. See Configuration § Provider Keys.

Tripwires and Honeytokens

A tripwire is a planted decoy: a fake credential, token, record id, or URL that should never legitimately appear in your agents' traffic. When the decoy turns up, the tripwire fires. It's a deception layer: plant honeytokens, detect exfiltration or misuse.

Plant tripwires under Governance → Tripwires (admin), or via the API:

GET/POST /api/v1/tripwires: list / plant ({kind, value, label}; kind is credential|token|record|url).
POST /api/v1/tripwires/:id/arm | /disarm, DELETE /api/v1/tripwires/:id.
GET /api/v1/tripwire-events: the trip feed.

On every chat request, Punk scans the prompt, the model output, and (where visible) tool arguments and results for any armed decoy. The scan is case-sensitive substring matching, and it costs nothing when no tripwires are armed (the armed-value list is cached per tenant for about 60 seconds). When a decoy fires, Punk:

records a tripwire_events row and traces tripwire.fired {tripwireId, surface},
lowers the offending agent's trust score (a deny-shaped signal),
enqueues a tripwire.fired webhook when webhook_url is set.

By default a trip alerts: it detects and signals but does not block. Set tenant setting tripwire_action=block to make a trip also block the run (route blocked, reason tripwire). Default-off everywhere: no armed tripwires means no scanning and no behavior change.
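
The scan itself is simple enough to sketch. The function names, the surface labels, and the "alert" value for the default action are hypothetical; the case-sensitive substring match, the early exit when nothing is armed, and the tripwire_action=block setting come from the description above.

```python
from typing import Dict, List, Tuple


def scan(surfaces: Dict[str, str], armed: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (tripwire_id, surface) for every armed decoy found in the traffic."""
    if not armed:
        return []  # nothing armed: no scanning at all
    hits = []
    for surface, text in surfaces.items():
        for tripwire_id, value in armed.items():
            if value in text:  # case-sensitive substring match
                hits.append((tripwire_id, surface))
    return hits


def should_block(hits: List[Tuple[str, str]], tripwire_action: str = "alert") -> bool:
    """A trip only blocks the run when the tenant opted in with tripwire_action=block."""
    return bool(hits) and tripwire_action == "block"
```

Because matching is case-sensitive, a decoy planted as AKIAFAKEDECOY123 does not fire on akiafakedecoy123.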

Streaming DLP

Trace redaction masks the stored payload. Streaming DLP masks the bytes sent to the client: exfiltration prevention at the egress. Set tenant setting streaming_dlp=true (default false) and Punk runs every emitted text chunk through the deterministic masker (emails, cards, SSNs, secret-shaped tokens including common API key prefixes and AWS access key IDs) before it leaves the gateway, on both the OpenAI and Anthropic wires plus cached/artifact responses. Non-streaming responses are masked too when the setting is on, so the policy is consistent.

A secret split across two stream chunks is still caught: the masker buffers a small tail (64 chars) across chunk boundaries and only emits the safe prefix, flushing at stream end. Punk traces dlp.redacted {hits} whenever anything was masked.

Tradeoff: masking is deterministic, so the client sees [EMAIL:abc123]-style tokens in place of the real values. You trade response fidelity (and a touch of latency from the boundary buffer) for guaranteed egress redaction. Trace redaction and streaming DLP are independent settings; turn on either, both, or neither.

Memory Quarantine

Classify the memory/context that influences a run by trust lane, and stop low-trust content from driving high-impact actions. This makes "untrusted web content can't trigger a payment" enforceable.

Declare an influence on a run (always allowed; cheap, useful telemetry even with enforcement off):

POST /api/v1/runs/:runId/memory  { "source": "web:example.com", "trustLane": "untrusted" }

Trust lanes, lowest to highest: untrusted < observed < verified < human_approved. Each declaration records a memory_influences row and a memory.influence trace event (the run detail view shows them as trust-lane badges).

Enforcement is opt-in: set memory_quarantine=true. When a run has recorded a low-trust influence (untrusted/observed) and attempts a tool/side-effect at level ≥ memory_quarantine_min_level (default 3, user-visible writes and above), the action is gated to approval_required and fails closed via the normal approval flow, unless a verified/human_approved influence on the same run covers it. Enforcement is wired into both the chat router's tool governance check and the workflow tool_call path.
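
The gating rule can be sketched as a pure function over the lanes recorded for a run. The function name and the "allow" return value (meaning only that quarantine does not gate the action; policy still applies) are hypothetical, and the sketch assumes that any verified or human_approved influence on the run counts as covering it.

```python
from typing import Iterable

LOW_TRUST = {"untrusted", "observed"}
HIGH_TRUST = {"verified", "human_approved"}


def quarantine_decision(
    influences: Iterable[str],  # trust lanes declared on this run
    level: int,                 # side-effect level of the attempted action
    enabled: bool = False,      # tenant setting memory_quarantine
    min_level: int = 3,         # tenant setting memory_quarantine_min_level
) -> str:
    if not enabled or level < min_level:
        return "allow"  # enforcement off, or action below the threshold
    lanes = set(influences)
    # Assumption: any high-trust influence on the run covers the action.
    if (lanes & LOW_TRUST) and not (lanes & HIGH_TRUST):
        return "approval_required"
    return "allow"
```

With enforcement on, a run influenced only by untrusted web content is gated when it attempts a level 4 payment, while a level 1 read passes.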

Audit Events

Audit records include:

Tenant.
Trace id or subject id.
Actor.
Action.
Decision.
Policy name where applicable.
Reason.
Metadata.

Use /api/v1/audit to inspect governance events.

Punk separates human identity from agent identity:

Agents authenticate with API keys (bearer tokens) and are governed by policies.
Humans authenticate with email + password and get a punk_session cookie.

User identity flows into governance:

A session resolves the tenant from the user's primary organization membership.
Admin rights come from the platform role (admin) or an org role of owner/admin.
Logins, logouts, password changes, and user create/delete/reset are all audited.
Actions taken through a session audit with actor human:<email>.

Passwords are argon2id-hashed; session tokens are stored only as hashes. Temporary passwords (invites and resets) force a password change on first sign-in. See CONFIGURATION.md for login-mode behavior and bootstrap variables.

Observe Mode

Observe mode is the safest rollout mode for new tenants:

Live provider still serves responses.
Policy denials do not block.
Caches and artifacts are not used to serve users.
Punk records ghost savings and what route it would have selected.

Use observe mode to build trust before moving to optimize mode.
