Punk in 30 Minutes

Punk is the runtime between your agents and the world. It observes real work, governs risky actions, turns web pages into compact agent context, learns repeated patterns, proves cheaper routes with evidence, and explains every routing decision.

This guide gets a local Punk checkout running, then gives each user type a simple first path. Everything works offline with the mock provider; set provider keys only when you want live model calls.

For a full pilot and production rollout plan, use the Onboarding Guide after this quickstart.

Hosted reference: punktechnologies.com. Local default: http://localhost:4100.

What you need: Bun 1.2+ and a Punk checkout.

0-5 Min: Start Punk

Start the gateway:

bun install
bun run dev

Open http://localhost:4100.

If the org is blank, Overview shows a Getting started panel. Click SEED DEMO to create the support-triage workflow template plus a demo agent, or jump straight into Chat.

To drive the full optimization loop with repeatable demo traffic, keep the gateway running and use a second terminal:

bun run demo

The dashboard seed gives you objects to inspect. The CLI demo drives live traffic, cache hits, web fetches, repeated work, evidence review, promotion, and optimized traffic at near-zero cost. Run the demo a second time and the optimized share climbs because Punk remembers what it proved.

The dashboard sections map to the system:

Section	First thing to check
Overview	Savings, route mix, recent activity.
Runs	Every model request, route explanation, trace, cost, and latency.
Patterns	Repeated request shapes Punk discovered.
Artifacts	Proven optimized routes, evidence, promote/rollback.
Learning	Evidence, blockers, and confidence trajectory for repeated work.
Web	Compact page snapshots and token savings from web fetches.
Workflows	Built-in workflow creator, templates, runs, and node timelines.
Agents	Simple scheduled one-task runners built on workflows.
Chat	Conversation UI where every assistant reply is a real gateway run.
Governance	Policies, audit, users, API keys, MCP servers, credentials.
Billing	Plan, usage, quotas, spend, savings, and Stripe-backed upgrade flow when enabled.
Approvals	Human decisions for policy exceptions and artifact promotion.

After the common setup, choose the path that matches your role.

Path A: Chat User Or Evaluator

Use this path if you want to see the runtime without wiring an app.

Open http://localhost:4100/#/chat.
Click NEW CHAT.
Ask a concrete repeatable question, for example:

Classify this support ticket: Customer cannot reset password after SSO migration.
Return JSON with category and priority.

Ask the same question again in a new chat.
Look at the route and cost badge under each assistant reply.

What you should see:

The first reply is a real gateway run.
The repeated reply can route through exact_cache.
Each assistant message links back to the underlying run.
The run detail shows the route explanation, alternatives, policy verdict, cost, and trace events.
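
To make the exact_cache behavior above concrete, here is a minimal sketch of exact-match caching in general. It is not Punk's implementation: the key derivation (model plus ordered role/content pairs), the `ExactCache` class, and the `callModel` callback are all assumptions made for illustration.

```typescript
// Illustrative only: Punk's real cache key and storage are not described in this guide.
interface CachedChatMessage {
  role: string;
  content: string;
}

interface CachedCompletion {
  route: "live" | "exact_cache";
  reply: string;
}

class ExactCache {
  private store = new Map<string, string>();

  // Assumed key: the model plus every message, in order, serialized deterministically.
  private keyOf(model: string, messages: CachedChatMessage[]): string {
    return JSON.stringify([model, messages.map((m) => [m.role, m.content])]);
  }

  // Serve from cache on an exact match; otherwise call the model and remember the reply.
  complete(
    model: string,
    messages: CachedChatMessage[],
    callModel: () => string
  ): CachedCompletion {
    const key = this.keyOf(model, messages);
    const cached = this.store.get(key);
    if (cached !== undefined) {
      return { route: "exact_cache", reply: cached };
    }
    const reply = callModel();
    this.store.set(key, reply);
    return { route: "live", reply };
  }
}
```

Under this sketch, asking the same question in a new chat produces the same key, so the second call returns without invoking the model, while changing a single character produces a different key and a live run.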

Turn a useful chat into a scheduled agent:

Open the conversation.
Click SAVE AS AGENT.
Review the prefilled agent form.
Add a cron schedule if needed, or leave it blank for on-demand runs.
Click CREATE AGENT.
Click RUN NOW.

What happened: the chat system prompt became the agent instructions, the last user message became the prompt template, and the agent was stored as a kind: "agent" workflow with a fixed start -> llm -> output graph.
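
The stored shape described above can be pictured as a small graph object. This is a sketch only: the field names (`nodes`, `edges`, `config`, `instructions`, `promptTemplate`, `schedule`) and the `agentFromChat` helper are assumptions for illustration, not Punk's actual schema. Only `kind: "agent"` and the start -> llm -> output order come from the description above.

```typescript
// Hypothetical types: Punk's real workflow schema may differ.
type AgentNodeKind = "start" | "llm" | "output";

interface AgentWorkflow {
  kind: "agent";
  name: string;
  schedule?: string; // cron expression; omitted for on-demand runs
  nodes: { id: string; kind: AgentNodeKind; config?: Record<string, string> }[];
  edges: { from: string; to: string }[];
}

// Build the fixed start -> llm -> output graph from a saved chat.
function agentFromChat(
  name: string,
  systemPrompt: string,
  lastUserMessage: string,
  schedule?: string
): AgentWorkflow {
  return {
    kind: "agent",
    name,
    ...(schedule ? { schedule } : {}),
    nodes: [
      { id: "start", kind: "start" },
      {
        id: "llm",
        kind: "llm",
        // The chat system prompt becomes the instructions,
        // and the last user message becomes the prompt template.
        config: { instructions: systemPrompt, promptTemplate: lastUserMessage },
      },
      { id: "output", kind: "output" },
    ],
    edges: [
      { from: "start", to: "llm" },
      { from: "llm", to: "output" },
    ],
  };
}
```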

After you have a few repeated runs, open http://localhost:4100/#/learning to see which patterns were stable, what evidence exists, and why an optimization is or is not eligible.

Path B: Workflow Builder

Use this path if you want to build multi-step agent jobs in the dashboard.

Open http://localhost:4100/#/workflows.
Start from the template gallery. Pick one:

Template	Use it for
`support-triage`	Classify a ticket, branch on priority, optionally notify.
`web-research`	Fetch a URL as compact page context, summarize it with an LLM node.
`pricing-monitor`	Fetch pricing pages and extract structured plan data.

Click USE TEMPLATE.
Open the created workflow.
Click RUN and provide JSON input.

Example for support-triage:

{
  "ticket": {
    "subject": "SSO reset broken",
    "description": "The customer cannot reset a password after SSO migration."
  }
}

Example for web-research or pricing-monitor:

{
  "url": "https://example.com"
}

Open the run timeline from the result.

What you should see:

Every workflow node emits workflow.node.started and workflow.node.completed or workflow.node.failed.
Every llm node is a real gateway run, so repeated node work can be cached, learned, and routed through proven optimized paths.
Workflow cost and savings roll up from child gateway runs.
The editor builds a graph rather than generating code. The saved graph is validated and then interpreted.
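
As a rough illustration of what validating a workflow graph can involve, the sketch below checks a few structural rules. The rules themselves (exactly one start node, at least one output node, no dangling edges, every node reachable from start) and the `validateGraph` name are assumptions; this guide does not list Punk's actual validation rules.

```typescript
// Hypothetical structural checks; Punk's real validator may apply different rules.
interface WorkflowGraphNode {
  id: string;
  kind: string;
}

interface WorkflowGraphEdge {
  from: string;
  to: string;
}

function validateGraph(nodes: WorkflowGraphNode[], edges: WorkflowGraphEdge[]): string[] {
  const errors: string[] = [];
  const ids = new Set(nodes.map((n) => n.id));
  if (ids.size !== nodes.length) errors.push("duplicate node id");

  const starts = nodes.filter((n) => n.kind === "start");
  if (starts.length !== 1) errors.push("expected exactly one start node");
  if (!nodes.some((n) => n.kind === "output")) errors.push("expected at least one output node");

  for (const e of edges) {
    if (!ids.has(e.from) || !ids.has(e.to)) {
      errors.push(`edge ${e.from} -> ${e.to} references a missing node`);
    }
  }

  // Breadth-first walk from the start node to find unreachable nodes.
  const [start] = starts;
  if (starts.length === 1 && start !== undefined) {
    const seen = new Set<string>([start.id]);
    const queue: string[] = [start.id];
    while (queue.length > 0) {
      const current = queue.shift();
      if (current === undefined) break;
      for (const e of edges) {
        if (e.from === current && !seen.has(e.to)) {
          seen.add(e.to);
          queue.push(e.to);
        }
      }
    }
    for (const n of nodes) {
      if (!seen.has(n.id)) errors.push(`node ${n.id} is unreachable from start`);
    }
  }
  return errors;
}
```

An empty array means the graph passed these checks; each string describes one problem to fix in the editor.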

To edit from scratch:

Click NEW WORKFLOW.
Add nodes from the palette.
Connect output ports to target nodes.
Configure nodes in the inspector.
Click SAVE.
Click RUN.

Useful node kinds:

Node	Use it when
`llm`	You need a governed model call whose output can be cached and learned.
`web_fetch`	You need a URL turned into compact page context.
`choice`	You need branching.
`tool_call`	You need a registered MCP tool.
`notify`	You need an outbound webhook.
`output`	You are done and want to return a value.

Path C: App Developer

Use this path if you already have an agent or app calling a model provider.

OpenAI-Compatible Apps

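Point your existing client at the gateway by changing the base URL. The X-Punk-* headers in this example carry the app, agent, and subject names: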

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:4100/v1",
  apiKey: process.env.PUNK_API_KEY ?? "punk-local",
  defaultHeaders: {
    "X-Punk-App": "support-app",
    "X-Punk-Agent": "support-agent",
    "X-Punk-Subject": "user-123"
  }
});

Every existing client.chat.completions.create(...) call now flows through Punk. If the gateway is started with PUNK_API_KEY set, pass that same value as the SDK's apiKey. Provider keys stay on the Punk gateway, not in your app.
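
For clients that do not use the OpenAI SDK, the sketch below shows roughly what the configuration above puts on the wire: a POST to the base URL plus /chat/completions, with the key as a bearer token and the X-Punk-* headers attached. The `buildChatCompletionRequest` helper and its option names are hypothetical; the header names and base URL are taken from the example above.

```typescript
// Hypothetical helper: returns a plain request description instead of sending it.
interface ChatHttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

interface OpenAiChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequestOptions {
  baseURL: string; // e.g. http://localhost:4100/v1
  apiKey: string; // the gateway key, not a provider key
  model: string;
  messages: OpenAiChatMessage[];
  app?: string;
  agent?: string;
  subject?: string;
}

function buildChatCompletionRequest(opts: ChatRequestOptions): ChatHttpRequest {
  const headers: Record<string, string> = {
    "content-type": "application/json",
    authorization: `Bearer ${opts.apiKey}`,
  };
  // Identity headers are only sent when a value is provided.
  if (opts.app) headers["X-Punk-App"] = opts.app;
  if (opts.agent) headers["X-Punk-Agent"] = opts.agent;
  if (opts.subject) headers["X-Punk-Subject"] = opts.subject;

  return {
    url: `${opts.baseURL.replace(/\/+$/, "")}/chat/completions`,
    method: "POST",
    headers,
    body: JSON.stringify({ model: opts.model, messages: opts.messages }),
  };
}
```

The returned object maps directly onto a `fetch(url, { method, headers, body })` call in any runtime that provides fetch.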

Anthropic-Compatible Apps

Punk also exposes an Anthropic-compatible endpoint:

POST http://localhost:4100/v1/messages

Use a claude-* model. With ANTHROPIC_API_KEY set on the gateway, Punk routes to the live Anthropic backend; otherwise the mock provider keeps local testing offline. With the official Anthropic SDK, set baseURL to http://localhost:4100 and authToken to PUNK_API_KEY when the gateway requires bearer auth.
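
A minimal sketch of a raw request to this endpoint follows, for cases where the Anthropic SDK is not in use. The `buildMessagesRequest` helper is hypothetical. The body fields (`model`, `max_tokens`, `messages`) follow the Anthropic Messages format, and the `anthropic-version` header is what Anthropic's API expects; whether the Punk gateway checks that header is an assumption not confirmed by this guide.

```typescript
// Hypothetical helper: describes the request without sending it.
interface MessagesHttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

interface AnthropicStyleMessage {
  role: "user" | "assistant";
  content: string;
}

interface MessagesRequestOptions {
  gatewayUrl: string; // e.g. http://localhost:4100 (no /v1 suffix)
  token?: string; // PUNK_API_KEY, when the gateway requires bearer auth
  model: string; // must be a claude-* model
  maxTokens: number;
  messages: AnthropicStyleMessage[];
}

function buildMessagesRequest(opts: MessagesRequestOptions): MessagesHttpRequest {
  if (opts.model.indexOf("claude-") !== 0) {
    throw new Error("expected a claude-* model for the Anthropic-compatible endpoint");
  }
  const headers: Record<string, string> = {
    "content-type": "application/json",
    "anthropic-version": "2023-06-01", // assumption: sent for parity with Anthropic's API
  };
  if (opts.token) headers["authorization"] = `Bearer ${opts.token}`;

  return {
    url: `${opts.gatewayUrl.replace(/\/+$/, "")}/v1/messages`,
    method: "POST",
    headers,
    body: JSON.stringify({
      model: opts.model,
      max_tokens: opts.maxTokens,
      messages: opts.messages,
    }),
  };
}
```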

For Chorus evaluation, keep model: "punk/chorus" and set live_synthesis_model when you want to test a specific configured solver lane. The response still comes back through the same gateway wire.

Read The First Route Explanation

Send the same request twice. Each response includes:

Header	Meaning
`x-punk-run-id`	The run id for trace lookup.
`x-punk-route`	The selected route, such as `live`, `exact_cache`, `semantic_cache`, `artifact`, `hybrid_artifact`, `model_substitution`, or `blocked`.

Inspect the run:

curl -s http://localhost:4100/api/v1/runs/<runId> | jq .run.routeExplanation
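
The same lookup can be done in application code. The sketch below reads the two headers from any response-like object and builds the run URL used in the curl command above. The `readRouteHeaders` and `runUrl` helpers are hypothetical, and because the route list above is introduced with "such as", unrecognized values are reported as "unknown" rather than rejected.

```typescript
// Route names copied from the header table; the list may not be exhaustive.
const KNOWN_PUNK_ROUTES = [
  "live",
  "exact_cache",
  "semantic_cache",
  "artifact",
  "hybrid_artifact",
  "model_substitution",
  "blocked",
] as const;

type PunkRoute = (typeof KNOWN_PUNK_ROUTES)[number];

// Minimal shape shared by fetch's Headers and simple test doubles.
interface HeaderReader {
  get(name: string): string | null;
}

interface RouteInfo {
  runId: string | null;
  route: PunkRoute | "unknown";
}

function readRouteHeaders(headers: HeaderReader): RouteInfo {
  const raw = headers.get("x-punk-route");
  const route = KNOWN_PUNK_ROUTES.find((r) => r === raw) ?? "unknown";
  return { runId: headers.get("x-punk-run-id"), route };
}

// Build the run-detail URL for a run id.
function runUrl(gatewayUrl: string, runId: string): string {
  return `${gatewayUrl.replace(/\/+$/, "")}/api/v1/runs/${encodeURIComponent(runId)}`;
}
```

With fetch, `readRouteHeaders(response.headers)` works directly because `Headers` already exposes a compatible `get` method.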

Add Tool Tracing When It Matters

The base-URL swap only shows Punk your model traffic. The SDK adds tool tracing, side-effect classification, tool-result caching, feedback, and web fetch helpers.

import { Punk } from "@punk/sdk";

// `crm` and `mailer` below stand in for your application's own clients.
const punk = new Punk({
  app: "support-app",
  agent: "support-agent",
  subject: "user-123"
});

// Read-only lookup (level 1): results may be cached for ttlSeconds.
const lookupAccount = punk.traceTool({
  name: "crm.lookupAccount",
  sideEffectLevel: 1,
  ttlSeconds: 300,
  execute: async (args: { accountId: string }) => crm.get(args.accountId)
});

// User-visible write (level 3): never cached, so no ttlSeconds.
const sendEmail = punk.traceTool({
  name: "email.send",
  sideEffectLevel: 3,
  execute: async (args: { to: string; body: string }) => mailer.send(args)
});

Side-effect rule of thumb:

Level	Meaning	Punk behavior
0	Pure computation	Cacheable and replayable.
1	Read-only external	Cacheable with TTL and replayable.
2	Reversible/idempotent write	Requires careful policy.
3	User-visible write	Not cached; suppressed in replay/shadow; policy-gated.
4	High-impact write	Live plus approval by default.

Undeclared tools default to level 3.
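
The table and the default can be read as a small lookup. The sketch below is one interpretation, not Punk's code: the `toolHandling` helper and its field names are hypothetical, and two rows are filled in conservatively because the table does not spell them out (level 2 is treated as neither cacheable nor replayable, and level 4 is treated as policy-gated in addition to needing approval).

```typescript
type SideEffectLevel = 0 | 1 | 2 | 3 | 4;

interface ToolHandling {
  level: SideEffectLevel;
  cacheable: boolean;
  needsTtl: boolean;
  replayable: boolean;
  policyGated: boolean;
  approvalByDefault: boolean;
}

// Undeclared tools default to level 3, as stated above.
const DEFAULT_SIDE_EFFECT_LEVEL: SideEffectLevel = 3;

function toolHandling(declared?: SideEffectLevel): ToolHandling {
  const level = declared ?? DEFAULT_SIDE_EFFECT_LEVEL;
  return {
    level,
    cacheable: level <= 1, // levels 0 and 1 only
    needsTtl: level === 1, // read-only external calls are cached with a TTL
    replayable: level <= 1, // assumption: level 2 and above are not replayed
    policyGated: level >= 2, // assumption for level 4; stated for levels 2 and 3
    approvalByDefault: level === 4,
  };
}
```

The practical consequence is that forgetting to declare a level fails safe: the tool is treated as a user-visible write until you say otherwise.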

Close The Loop With Feedback

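const r = await punk.chat({ model: "gpt-4o", messages: [...] });
// Positive signal: the answer was good.
await punk.feedback(r.runId, 1);
// Negative signal; the third argument carries the corrected answer.
await punk.feedback(r.runId, -1, "correct answer here");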

Feedback affects pattern stability and artifact confidence.

Path D: Operator Or Admin

Use this path if you are preparing a real deployment or tenant.

Protect gateway traffic. Setting PUNK_API_KEY requires a bearer token on /v1/* and /api/v1/* requests:

PUNK_API_KEY=replace-me bun run dev

Bootstrap dashboard login for human admins. This is idempotent and never resets a changed password:

PUNK_ADMIN_EMAIL=admin@example.com \
PUNK_ADMIN_PASSWORD='replace-me' \
PUNK_REQUIRE_LOGIN=true \
bun run dev

Set a real credentials-vault key before storing provider keys, MCP secrets, or OAuth tokens:

PUNK_ENCRYPTION_KEY="$(openssl rand -base64 32)" bun run dev
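
The openssl command above emits 32 random bytes as base64, which is a 44-character string ending in a single `=`. If you generate the key some other way, a quick shape check like the sketch below can catch a truncated or wrongly encoded value before it is deployed. The helper names are hypothetical, and whether Punk performs a similar check at startup is not stated in this guide.

```typescript
// Compute how many bytes a standard base64 string decodes to, or null if malformed.
function base64DecodedLength(value: string): number | null {
  const trimmed = value.trim();
  if (trimmed.length === 0 || trimmed.length % 4 !== 0) return null;
  if (!/^[A-Za-z0-9+/]+={0,2}$/.test(trimmed)) return null;
  const padding = trimmed.endsWith("==") ? 2 : trimmed.endsWith("=") ? 1 : 0;
  return (trimmed.length / 4) * 3 - padding;
}

// A usable key, per the environment reference, is 32 bytes once decoded.
function looksLikeEncryptionKey(value: string): boolean {
  return base64DecodedLength(value) === 32;
}
```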

Use the dashboard Governance section to configure the tenant:

Object	Why
Organization	Rename the active org, invite users, review per-org roles.
API keys	Tenant/app-scoped bearer tokens.
Users	Human login sessions.
Provider keys	Tenant BYOK credentials for configured provider families.
MCP servers	External tools available to workflow `tool_call` nodes.
Credentials	Stored secrets referenced as `cred:<id>`.

Open http://localhost:4100/#/billing and decide quota posture:

Self-host or internal deployment: set PUNK_BILLING_DISABLED=true if all quotas should be unlimited.
Hosted billing: set Stripe env vars and let the Billing view create checkout/portal sessions.
Free/dev: leave Stripe unset; the local dashboard can still switch plans directly.

For production storage, set Postgres:

PUNK_DATABASE_URL='postgres://...' bun run dev

Pick the job runner shape:

Persistent API process: the embedded worker drains learning, webhook, retention, and scheduled workflow jobs.
Scale-out deployment: run additional workers against the same Postgres database:

bun run worker

Serverless/Vercel-style deployment: configure PUNK_CRON_SECRET and CRON_SECRET so /api/v1/internal/tick can drain jobs once per minute.

For safer artifact rollout, enable canaries:

curl -X PUT http://localhost:4100/api/v1/settings \
  -H 'content-type: application/json' \
  -H 'authorization: Bearer replace-me' \
  -d '{ "key": "canary_enabled", "value": true }'

Also decide tenant settings for redaction, retention_days, semantic_cache, model_substitutions, model_substitution_enabled, webhook_url, and approval_exception_ttl_hours.
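
To apply several of these settings in one pass, the curl call above can be repeated per key. The sketch below builds one request per setting using the same `{ key, value }` body shape. The `buildSettingRequests` helper is hypothetical, it assumes the endpoint accepts one key per request as in the canary example, and the value types for keys other than `canary_enabled` are assumptions to confirm against your deployment.

```typescript
// Hypothetical helper: describes one PUT per setting without sending anything.
interface SettingsHttpRequest {
  url: string;
  method: "PUT";
  headers: Record<string, string>;
  body: string;
}

type SettingValue = string | number | boolean | null | object;

function buildSettingRequests(
  gatewayUrl: string,
  token: string | undefined,
  settings: Record<string, SettingValue>
): SettingsHttpRequest[] {
  const url = `${gatewayUrl.replace(/\/+$/, "")}/api/v1/settings`;
  return Object.keys(settings).map((key): SettingsHttpRequest => {
    const headers: Record<string, string> = { "content-type": "application/json" };
    // The bearer header is only needed when the gateway is protected.
    if (token) headers["authorization"] = `Bearer ${token}`;
    return {
      url,
      method: "PUT",
      headers,
      body: JSON.stringify({ key, value: settings[key] }),
    };
  });
}
```

Each returned object can be passed to `fetch(url, { method, headers, body })`, sending the requests sequentially so a failure is easy to attribute to a single key.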

How Optimization Is Approved

Whether traffic comes from the gateway, chat, an agent, or a workflow, the promotion model is the same:

Punk observes repeated request shapes.
The learning loop groups them into patterns.
Punk prepares candidate optimized routes when the task is stable enough.
Evidence checks confirm the candidate is safe to serve.
A human or configured policy promotes it when approval is required.
Matching future traffic routes through the cheapest safe path.
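
One way to picture these steps is as a gate sequence that every pattern must clear in order. The sketch below is a simplified mental model, not Punk's learning loop: the stage names, the `PatternEvidence` fields, and the `minRepeats` threshold of 2 are all assumptions chosen for illustration.

```typescript
// Hypothetical model of the promotion steps described above.
interface PatternEvidence {
  occurrences: number; // how many times the request shape was observed
  stable: boolean; // whether the task is stable enough for a candidate route
  evidencePassed: boolean; // whether evidence checks found the candidate safe
  approvalRequired: boolean; // whether a human or policy must approve
  approved: boolean;
}

type PromotionStage = "observed" | "pattern" | "candidate" | "verified" | "promoted";

function promotionStage(p: PatternEvidence, minRepeats = 2): PromotionStage {
  if (p.occurrences < minRepeats) return "observed"; // not yet a repeated shape
  if (!p.stable) return "pattern"; // grouped, but no candidate prepared
  if (!p.evidencePassed) return "candidate"; // candidate exists, evidence pending
  if (p.approvalRequired && !p.approved) return "verified"; // waiting for approval
  return "promoted";
}

// Only promoted patterns serve matching traffic through the optimized route.
function routeFor(p: PatternEvidence): "optimized" | "live" {
  return promotionStage(p) === "promoted" ? "optimized" : "live";
}
```

The ordering is the point: a pattern that is frequent but fails its evidence checks, or passes them but still awaits approval, keeps routing live.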

Force a learning pass when you want to inspect current evidence:

curl -X POST http://localhost:4100/api/v1/learning/tick

When the gateway is protected with PUNK_API_KEY, add an Authorization: Bearer <token> header.

Environment Quick Reference

Area	Variables	Effect
Provider	`OPENAI_API_KEY` / `OPENAI_BASE_URL`	Live OpenAI provider.
Provider	`ANTHROPIC_API_KEY` / `ANTHROPIC_BASE_URL`	Live Anthropic Messages provider.
Provider	`OPENROUTER_API_KEY` / `OPENROUTER_BASE_URL`	Live OpenRouter portfolio provider for DeepSeek, Kimi/Moonshot, Google, Anthropic, OpenAI, and other routed model slugs.
Provider	`DEEPSEEK_API_KEY` / `DEEPSEEK_BASE_URL`	Direct DeepSeek provider.
Provider	`MOONSHOT_API_KEY` or `KIMI_API_KEY`	Direct Moonshot/Kimi provider.
Provider	`PUNK_PROVIDER=mock`	Force the offline provider.
Storage	`PUNK_DATABASE_URL`	Postgres/Neon-compatible storage.
Storage	`PUNK_DB_PATH`	SQLite path, default `data/punk.db`.
Auth	`PUNK_API_KEY`	Require bearer auth for protected routes and gateway calls.
Auth	`PUNK_ADMIN_EMAIL` / `PUNK_ADMIN_PASSWORD` / `PUNK_REQUIRE_LOGIN=true`	Bootstrap dashboard login and force login mode.
Identity	`PUNK_ALLOW_PUBLIC_SIGNUP` / `PUNK_APP_BASE_URL`	Enable public signup and set invite/verification links.
Email	`RESEND_API_KEY` / `PUNK_EMAIL_FROM`	Send real invite/verification email instead of console logs.
Secrets	`PUNK_ENCRYPTION_KEY`	32-byte base64 key for stored credentials.
Learning	`PUNK_AUTO_PROMOTE` / `PUNK_LEARN_INTERVAL_MS`	Hands-free promotion and learning cadence.
Workers	`PUNK_WORKER_POLL_MS` / `PUNK_WORKER_CONCURRENCY`	Embedded or standalone worker polling and concurrency.
Serverless	`PUNK_CRON_SECRET` / `CRON_SECRET`	Enable the secret-gated tick endpoint for scheduled jobs on Vercel-style deployments.
Retention	`PUNK_RETENTION_DAYS`	Trace/audit retention sweep window.
Safety	`PUNK_ALLOW_PRIVATE_WEB_FETCH` / `PUNK_ALLOW_PRIVATE_WEBHOOKS`	Explicit private-network escape hatches; leave off in production unless intended.
Billing	`PUNK_BILLING_DISABLED` / `STRIPE_SECRET_KEY` / `STRIPE_WEBHOOK_SECRET` / `STRIPE_PRICE_*`	Disable quotas for self-hosting or enable Stripe billing.
Governance	`PUNK_POLICIES_DIR`	Optional policy directory.
Public site	`PUNK_MARKETING_HOST` / `PUNK_MEET_HOST` / `PUNK_APP_HOST` / `PUNK_DOCS_DIR`	Serve the marketing splash or product deck by host, redirect `/app`, or override docs source.
