Punk in 30 Minutes
Punk is the runtime between your agents and the world. It observes real work, governs risky actions, turns web pages into compact agent context, learns repeated patterns, proves cheaper routes with evidence, and explains every routing decision.
This guide gets a local Punk checkout running, then gives each user type a simple first path. Everything works offline with the mock provider; set provider keys only when you want live model calls.
For a full pilot and production rollout plan, use the Onboarding Guide after this quickstart.
Hosted reference: punktechnologies.com. Local default: http://localhost:4100.
What you need: Bun 1.2+ and a Punk checkout.
0-5 Min: Start Punk
Start the gateway:
bun install
bun run dev
Open http://localhost:4100.
If the org is blank, Overview shows a Getting started panel. Click SEED DEMO to create the support-triage workflow template plus a demo agent, or jump straight into Chat.
To drive the full optimization loop with repeatable demo traffic, keep the gateway running and use a second terminal:
bun run demo
The dashboard seed gives you objects to inspect. The CLI demo drives live traffic, cache hits, web fetches, repeated work, evidence review, promotion, and optimized traffic at near-zero cost. Run the demo a second time and the optimized share climbs because Punk remembers what it proved.
The dashboard sections map to the system:
| Section | First thing to check |
|---|---|
| Overview | Savings, route mix, recent activity. |
| Runs | Every model request, route explanation, trace, cost, and latency. |
| Patterns | Repeated request shapes Punk discovered. |
| Artifacts | Proven optimized routes, evidence, promote/rollback. |
| Learning | Evidence, blockers, and confidence trajectory for repeated work. |
| Web | Compact page snapshots and token savings from web fetches. |
| Workflows | Built-in workflow creator, templates, runs, and node timelines. |
| Agents | Simple scheduled one-task runners built on workflows. |
| Chat | Conversation UI where every assistant reply is a real gateway run. |
| Governance | Policies, audit, users, API keys, MCP servers, credentials. |
| Billing | Plan, usage, quotas, spend, savings, and Stripe-backed upgrade flow when enabled. |
| Approvals | Human decisions for policy exceptions and artifact promotion. |
After the common setup, choose the path that matches your role.
Path A: Chat User Or Evaluator
Use this path if you want to see the runtime without wiring an app.
- Open http://localhost:4100/#/chat.
- Click NEW CHAT.
- Ask a concrete repeatable question, for example:
Classify this support ticket: Customer cannot reset password after SSO migration.
Return JSON with category and priority.
- Ask the same question again in a new chat.
- Look at the route and cost badge under each assistant reply.
What you should see:
- The first reply is a real gateway run.
- The repeated reply can route through exact_cache.
- Each assistant message links back to the underlying run.
- The run detail shows the route explanation, alternatives, policy verdict, cost, and trace events.
Turn a useful chat into a scheduled agent:
- Open the conversation.
- Click SAVE AS AGENT.
- Review the prefilled agent form.
- Add a cron schedule if needed, or leave it blank for on-demand runs.
- Click CREATE AGENT.
- Click RUN NOW.
What happened: the chat system prompt became the agent instructions, the last user message became the prompt template, and the agent was stored as a kind: "agent" workflow with a fixed start -> llm -> output graph.
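As a rough illustration of that mapping, the sketch below builds a comparable definition in TypeScript. Only kind: "agent" and the start -> llm -> output shape come from the text above; the field names (instructions, promptTemplate, schedule, nodes, edges) and the chatToAgent helper are hypothetical and may not match the format Punk actually stores.

```typescript
// Hypothetical shapes: only kind "agent" and the start -> llm -> output graph
// come from the guide; every field name here is an assumption.
type NodeKind = "start" | "llm" | "output";

interface AgentWorkflow {
  kind: "agent";
  name: string;
  schedule: string | null; // cron expression, or null for on-demand runs
  nodes: { id: string; kind: NodeKind; config?: Record<string, string> }[];
  edges: { from: string; to: string }[];
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Map a conversation onto the fixed three-node graph: the system prompt becomes
// the instructions and the last user message becomes the prompt template.
function chatToAgent(name: string, messages: ChatMessage[], schedule: string | null = null): AgentWorkflow {
  const system = messages.find((m) => m.role === "system")?.content ?? "";
  const lastUser = [...messages].reverse().find((m) => m.role === "user");
  if (!lastUser) throw new Error("conversation has no user message");
  return {
    kind: "agent",
    name,
    schedule,
    nodes: [
      { id: "start", kind: "start" },
      { id: "llm", kind: "llm", config: { instructions: system, promptTemplate: lastUser.content } },
      { id: "output", kind: "output" },
    ],
    edges: [
      { from: "start", to: "llm" },
      { from: "llm", to: "output" },
    ],
  };
}

const agent = chatToAgent("ticket-classifier", [
  { role: "system", content: "You triage support tickets." },
  { role: "user", content: "Classify this support ticket: ..." },
  { role: "assistant", content: "{...}" },
]);
console.log(agent.nodes.map((n) => n.kind).join(" -> ")); // start -> llm -> output
```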
After you have a few repeated runs, open http://localhost:4100/#/learning to see which patterns were stable, what evidence exists, and why an optimization is or is not eligible.
Read next: Chat & Agents.
Path B: Workflow Builder
Use this path if you want to build multi-step agent jobs in the dashboard.
- Open http://localhost:4100/#/workflows.
- Start from the template gallery. Pick one:
| Template | Use it for |
|---|---|
| support-triage | Classify a ticket, branch on priority, optionally notify. |
| web-research | Fetch a URL as compact page context, summarize it with an LLM node. |
| pricing-monitor | Fetch pricing pages and extract structured plan data. |
- Click USE TEMPLATE.
- Open the created workflow.
- Click RUN and provide JSON input.
Example for support-triage:
{
"ticket": {
"subject": "SSO reset broken",
"description": "The customer cannot reset a password after SSO migration."
}
}
Example for web-research or pricing-monitor:
{
"url": "https://example.com"
}
- Open the run timeline from the result.
What you should see:
- Every workflow node emits workflow.node.started and workflow.node.completed or workflow.node.failed.
- Every llm node is a real gateway run, so repeated node work can be cached, learned, and routed through proven optimized paths.
- Workflow cost and savings roll up from child gateway runs.
- The editor is a graph creator, not generated code. The graph is validated and interpreted.
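To make "validated and interpreted" concrete, here is a minimal sketch of a graph interpreter that emits the node events named above. The event names come from this guide; the Graph shape, the validate and run helpers, and the handler signature are assumptions for illustration, not Punk's actual workflow IR or engine.

```typescript
// Illustrative only: a linear graph interpreter. Event names come from the guide;
// the Graph shape and handler signature are assumptions.
type Handler = (input: unknown) => unknown;

interface Graph {
  nodes: Record<string, Handler>;
  edges: Record<string, string>; // node id -> next node id
  entry: string;
}

interface TimelineEvent {
  type: "workflow.node.started" | "workflow.node.completed" | "workflow.node.failed";
  node: string;
}

// Reject graphs whose entry or edges reference missing nodes.
function validate(graph: Graph): string[] {
  const errors: string[] = [];
  if (!(graph.entry in graph.nodes)) errors.push(`unknown entry node: ${graph.entry}`);
  for (const [from, to] of Object.entries(graph.edges)) {
    if (!(from in graph.nodes)) errors.push(`edge from unknown node: ${from}`);
    if (!(to in graph.nodes)) errors.push(`edge to unknown node: ${to}`);
  }
  return errors;
}

// Walk the graph from the entry node, recording one started event and one
// completed or failed event per node.
function run(graph: Graph, input: unknown): { output: unknown; timeline: TimelineEvent[] } {
  const errors = validate(graph);
  if (errors.length > 0) throw new Error(errors.join("; "));
  const timeline: TimelineEvent[] = [];
  const visited = new Set<string>();
  let value = input;
  let current: string | undefined = graph.entry;
  while (current !== undefined) {
    if (visited.has(current)) throw new Error(`cycle at node: ${current}`);
    visited.add(current);
    timeline.push({ type: "workflow.node.started", node: current });
    try {
      value = graph.nodes[current](value);
      timeline.push({ type: "workflow.node.completed", node: current });
    } catch {
      timeline.push({ type: "workflow.node.failed", node: current });
      break;
    }
    current = graph.edges[current];
  }
  return { output: value, timeline };
}

const result = run(
  {
    entry: "start",
    nodes: {
      start: (x) => x,
      classify: () => ({ category: "auth", priority: "high" }), // stands in for an llm node
      output: (x) => x,
    },
    edges: { start: "classify", classify: "output" },
  },
  { ticket: { subject: "SSO reset broken" } },
);
console.log(result.timeline.length); // 6
```

Because the graph is plain data, it can be checked before anything runs, which is the practical difference from generating and executing code.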
To edit from scratch:
- Click NEW WORKFLOW.
- Add nodes from the palette.
- Connect output ports to target nodes.
- Configure nodes in the inspector.
- Click SAVE.
- Click RUN.
Useful node kinds:
| Node | Use it when |
|---|---|
| llm | You need a governed model call whose output can be cached and learned. |
| web_fetch | You need a URL turned into compact page context. |
| choice | You need branching. |
| tool_call | You need a registered MCP tool. |
| notify | You need an outbound webhook. |
| output | You are done and want to return a value. |
Read next: Workflows and Governance.
Path C: App Developer
Use this path if you already have an agent or app calling a model provider.
OpenAI-Compatible Apps
Change only the base URL:
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "http://localhost:4100/v1",
apiKey: process.env.PUNK_API_KEY ?? "punk-local",
defaultHeaders: {
"X-Punk-App": "support-app",
"X-Punk-Agent": "support-agent",
"X-Punk-Subject": "user-123"
}
});
Every existing client.chat.completions.create(...) call now flows through Punk. If the gateway runs with PUNK_API_KEY, pass that key to the provider SDK as its client key; provider keys stay on the Punk gateway, not in your app.
Anthropic-Compatible Apps
Punk also exposes an Anthropic-compatible endpoint:
POST http://localhost:4100/v1/messages
Use a claude-* model. With ANTHROPIC_API_KEY set on the gateway, Punk routes to the live Anthropic backend; otherwise the mock provider keeps local testing offline. With the official Anthropic SDK, set baseURL to http://localhost:4100 and authToken to PUNK_API_KEY when the gateway requires bearer auth.
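For clients that do not use the SDK, the same call can be sketched as a plain HTTP request. The path and bearer auth come from the text above, and the body follows the public Anthropic Messages format (model, max_tokens, messages). The buildMessagesRequest helper and the model name are placeholders, and whether Punk expects extra headers such as anthropic-version is not stated here, so treat that as something to verify.

```typescript
// Sketch: build (but do not send) a request for the Anthropic-compatible endpoint.
interface MessagesRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildMessagesRequest(
  prompt: string,
  opts: { baseURL?: string; apiKey?: string; model?: string } = {},
): MessagesRequest {
  const baseURL = opts.baseURL ?? "http://localhost:4100";
  const headers: Record<string, string> = { "content-type": "application/json" };
  // Only send bearer auth when the gateway is started with PUNK_API_KEY.
  if (opts.apiKey) headers["authorization"] = `Bearer ${opts.apiKey}`;
  return {
    url: `${baseURL}/v1/messages`,
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({
        model: opts.model ?? "claude-placeholder-model", // placeholder: substitute a real claude-* model
        max_tokens: 256,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

const req = buildMessagesRequest("Summarize this ticket in one sentence.", { apiKey: "replace-me" });
console.log(req.url); // http://localhost:4100/v1/messages
// To send it: const res = await fetch(req.url, req.init);
```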
For Chorus evaluation, keep model: "punk/chorus" and set live_synthesis_model when you want to test a specific configured solver lane. The response still comes back through the same gateway wire.
Read The First Route Explanation
Send the same request twice. Each response includes:
| Header | Meaning |
|---|---|
| x-punk-run-id | The run id for trace lookup. |
| x-punk-route | The selected route, such as live, exact_cache, semantic_cache, artifact, hybrid_artifact, model_substitution, or blocked. |
Inspect the run:
curl -s http://localhost:4100/api/v1/runs/<runId> | jq .run.routeExplanation
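In application code, the two headers can be read from any fetch-style response. The sketch below is a small hypothetical helper, not part of the Punk SDK: the header and route names come from the table above, while readRouteHeaders, RouteInfo, and the "unknown" fallback are assumptions added because the route list is introduced with "such as" and may not be exhaustive.

```typescript
// Route names are taken from the header table; the helper itself is illustrative.
const ROUTES = [
  "live",
  "exact_cache",
  "semantic_cache",
  "artifact",
  "hybrid_artifact",
  "model_substitution",
  "blocked",
] as const;
type Route = (typeof ROUTES)[number];

interface RouteInfo {
  runId: string | null;
  route: Route | "unknown";
  servedLive: boolean;
}

// Read the run id and route from response headers, tolerating unlisted routes.
function readRouteHeaders(headers: Headers): RouteInfo {
  const raw = headers.get("x-punk-route");
  const route = (ROUTES as readonly string[]).includes(raw ?? "") ? (raw as Route) : "unknown";
  return { runId: headers.get("x-punk-run-id"), route, servedLive: route === "live" };
}

// Simulated headers for two identical requests sent back to back.
const first = readRouteHeaders(new Headers({ "x-punk-run-id": "run_1", "x-punk-route": "live" }));
const second = readRouteHeaders(new Headers({ "x-punk-run-id": "run_2", "x-punk-route": "exact_cache" }));
console.log(first.route, second.route); // live exact_cache
```

With a real response, pass res.headers from fetch and use the returned runId in the run lookup shown above.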
Add Tool Tracing When It Matters
The base-URL swap sees model traffic. The SDK adds tool tracing, side-effect classification, tool-result caching, feedback, and web fetch helpers.
import { Punk } from "@punk/sdk";
const punk = new Punk({
app: "support-app",
agent: "support-agent",
subject: "user-123"
});
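// crm and mailer below stand in for your app's own clients.
// Level 1 (read-only external): results can be cached for ttlSeconds.
const lookupAccount = punk.traceTool({
name: "crm.lookupAccount",
sideEffectLevel: 1,
ttlSeconds: 300,
execute: async (args: { accountId: string }) => crm.get(args.accountId)
});
// Level 3 (user-visible write): never cached, and policy-gated.
const sendEmail = punk.traceTool({
name: "email.send",
sideEffectLevel: 3,
execute: async (args: { to: string; body: string }) => mailer.send(args)
});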
Side-effect rule of thumb:
| Level | Meaning | Punk behavior |
|---|---|---|
| 0 | Pure computation | Cacheable and replayable. |
| 1 | Read-only external | Cacheable with TTL and replayable. |
| 2 | Reversible/idempotent write | Requires careful policy. |
| 3 | User-visible write | Not cached; suppressed in replay/shadow; policy-gated. |
| 4 | High-impact write | Live plus approval by default. |
Undeclared tools default to level 3.
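One way to keep the table in view while declaring tools is to encode it as data. This is a sketch for reasoning about levels, not Punk's policy engine: the field names are illustrative, and the entries for levels 2 and 4 fill in details the table does not state, as marked in the comments.

```typescript
// Encodes the rule-of-thumb table as data. Field names are illustrative, and
// entries marked "assumption" are not stated in the table.
type SideEffectLevel = 0 | 1 | 2 | 3 | 4;

interface LevelBehavior {
  label: string;
  cacheable: boolean;
  replayable: boolean;
  approvalByDefault: boolean;
}

const BEHAVIOR: Record<SideEffectLevel, LevelBehavior> = {
  0: { label: "Pure computation", cacheable: true, replayable: true, approvalByDefault: false },
  1: { label: "Read-only external", cacheable: true, replayable: true, approvalByDefault: false },
  // Assumption: the table only says level 2 "requires careful policy"; treated as not cacheable here.
  2: { label: "Reversible/idempotent write", cacheable: false, replayable: false, approvalByDefault: false },
  3: { label: "User-visible write", cacheable: false, replayable: false, approvalByDefault: false },
  // Assumption: level 4 runs live, so it is treated as neither cacheable nor replayable.
  4: { label: "High-impact write", cacheable: false, replayable: false, approvalByDefault: true },
};

const DEFAULT_LEVEL: SideEffectLevel = 3; // undeclared tools default to level 3

// Look up the behavior for a declared level, falling back to the default.
function behaviorFor(declared?: SideEffectLevel): LevelBehavior {
  return BEHAVIOR[declared ?? DEFAULT_LEVEL];
}

console.log(behaviorFor(1).cacheable); // true
console.log(behaviorFor().label); // User-visible write
```

The default matters in practice: a read-only tool that is never declared is treated like a user-visible write, so it loses caching until you give it a level.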
Close The Loop With Feedback
const r = await punk.chat({ model: "gpt-4o", messages: [...] });
await punk.feedback(r.runId, 1);
await punk.feedback(r.runId, -1, "correct answer here");
Feedback affects pattern stability and artifact confidence.
Read next: SDK, API, and examples/.
Path D: Operator Or Admin
Use this path if you are preparing a real deployment or tenant.
- Protect gateway traffic. PUNK_API_KEY gates /v1/* and /api/v1/* bearer clients:
PUNK_API_KEY=replace-me bun run dev
- Bootstrap dashboard login for human admins. This is idempotent and never resets a changed password:
PUNK_ADMIN_EMAIL=admin@example.com \
PUNK_ADMIN_PASSWORD='replace-me' \
PUNK_REQUIRE_LOGIN=true \
bun run dev
- Set a real credentials-vault key before storing provider keys, MCP secrets, or OAuth tokens:
PUNK_ENCRYPTION_KEY="$(openssl rand -base64 32)" bun run dev
- Use the dashboard Governance section to configure the tenant:
| Object | Why |
|---|---|
| Organization | Rename the active org, invite users, review per-org roles. |
| API keys | Tenant/app-scoped bearer tokens. |
| Users | Human login sessions. |
| Provider keys | Tenant BYOK credentials for configured provider families. |
| MCP servers | External tools available to workflow tool_call nodes. |
| Credentials | Stored secrets referenced as cred:<id>. |
- Open http://localhost:4100/#/billing and decide quota posture:
  - Self-host or internal deployment: set PUNK_BILLING_DISABLED=true if all quotas should be unlimited.
  - Hosted billing: set Stripe env vars and let the Billing view create checkout/portal sessions.
  - Free/dev: leave Stripe unset; the local dashboard can still switch plans directly.
- For production storage, set Postgres:
PUNK_DATABASE_URL='postgres://...' bun run dev
- Pick the job runner shape:
  - Persistent API process: the embedded worker drains learning, webhook, retention, and scheduled workflow jobs.
  - Scale-out deployment: run additional workers against the same Postgres database:
bun run worker
  - Serverless/Vercel-style deployment: configure PUNK_CRON_SECRET and CRON_SECRET so /api/v1/internal/tick can drain jobs once per minute.
- For safer artifact rollout, enable canaries:
curl -X PUT http://localhost:4100/api/v1/settings \
-H 'content-type: application/json' \
-H 'authorization: Bearer replace-me' \
-d '{ "key": "canary_enabled", "value": true }'
Also decide tenant settings for redaction, retention_days, semantic_cache, model_substitutions, model_substitution_enabled, webhook_url, and approval_exception_ttl_hours.
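If you script several of these settings, the curl call above can be wrapped in a small helper. The sketch below only builds the request; the settingRequest name is hypothetical, and this guide confirms the { key, value } body only for canary_enabled with a boolean, so the value types for other keys (for example a number for retention_days) are assumptions to check against the Configuration docs.

```typescript
// Sketch: build the PUT /api/v1/settings request shown above for any key.
// Only canary_enabled with a boolean value is confirmed by this guide.
type SettingValue = boolean | number | string | Record<string, unknown>;

function settingRequest(key: string, value: SettingValue, token?: string, baseURL = "http://localhost:4100") {
  const headers: Record<string, string> = { "content-type": "application/json" };
  // Bearer auth is only needed when the gateway runs in protected mode.
  if (token) headers["authorization"] = `Bearer ${token}`;
  return {
    url: `${baseURL}/api/v1/settings`,
    init: { method: "PUT" as const, headers, body: JSON.stringify({ key, value }) },
  };
}

const canary = settingRequest("canary_enabled", true, "replace-me");
const retention = settingRequest("retention_days", 30, "replace-me"); // value type is an assumption
console.log(canary.init.body); // {"key":"canary_enabled","value":true}
// To apply it: await fetch(canary.url, canary.init);
```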
Read next: Configuration, Governance, Accounts & Orgs, Billing & Usage, and Troubleshooting.
How Optimization Is Approved
Whether traffic comes from the gateway, chat, an agent, or a workflow, the promotion model is the same:
- Punk observes repeated request shapes.
- The learning loop groups them into patterns.
- Punk prepares candidate optimized routes when the task is stable enough.
- Evidence checks confirm the candidate is safe to serve.
- A human or configured policy promotes it when approval is required.
- Matching future traffic routes through the cheapest safe path.
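The steps above amount to a sequence of gates that a candidate must clear before it serves traffic. The sketch below models that ordering only; the Candidate fields, the decision names, and the minimum-observation threshold are assumptions for illustration and do not reflect Punk's real eligibility rules.

```typescript
// Illustrative model of the promotion gates. Field names and the
// minimum-observation threshold are assumptions, not Punk's real rules.
interface Candidate {
  observations: number; // how many matching requests were seen
  evidencePassed: boolean; // did the evidence checks pass?
  approvalRequired: boolean; // does policy demand a promotion decision?
  approved: boolean; // has a human or configured policy approved it?
}

type Decision = "keep_observing" | "blocked_on_evidence" | "awaiting_approval" | "serve_optimized";

// Apply the gates in order: stability, then evidence, then approval.
function decide(c: Candidate, minObservations = 3): Decision {
  if (c.observations < minObservations) return "keep_observing";
  if (!c.evidencePassed) return "blocked_on_evidence";
  if (c.approvalRequired && !c.approved) return "awaiting_approval";
  return "serve_optimized";
}

console.log(decide({ observations: 5, evidencePassed: true, approvalRequired: true, approved: false }));
// awaiting_approval
```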
Force a learning pass when you want to inspect current evidence:
curl -X POST http://localhost:4100/api/v1/learning/tick
In protected mode, add Authorization: Bearer <token>.
Environment Quick Reference
| Area | Variables | Effect |
|---|---|---|
| Provider | OPENAI_API_KEY / OPENAI_BASE_URL | Live OpenAI provider. |
| Provider | ANTHROPIC_API_KEY / ANTHROPIC_BASE_URL | Live Anthropic Messages provider. |
| Provider | OPENROUTER_API_KEY / OPENROUTER_BASE_URL | Live OpenRouter portfolio provider for DeepSeek, Kimi/Moonshot, Google, Anthropic, OpenAI, and other routed model slugs. |
| Provider | DEEPSEEK_API_KEY / DEEPSEEK_BASE_URL | Direct DeepSeek provider. |
| Provider | MOONSHOT_API_KEY or KIMI_API_KEY | Direct Moonshot/Kimi provider. |
| Provider | PUNK_PROVIDER=mock | Force the offline provider. |
| Storage | PUNK_DATABASE_URL | Postgres/Neon-compatible storage. |
| Storage | PUNK_DB_PATH | SQLite path, default data/punk.db. |
| Auth | PUNK_API_KEY | Require bearer auth for protected routes and gateway calls. |
| Auth | PUNK_ADMIN_EMAIL / PUNK_ADMIN_PASSWORD / PUNK_REQUIRE_LOGIN=true | Bootstrap dashboard login and force login mode. |
| Identity | PUNK_ALLOW_PUBLIC_SIGNUP / PUNK_APP_BASE_URL | Enable public signup and set invite/verification links. |
| | RESEND_API_KEY / PUNK_EMAIL_FROM | Send real invite/verification email instead of console logs. |
| Secrets | PUNK_ENCRYPTION_KEY | 32-byte base64 key for stored credentials. |
| Learning | PUNK_AUTO_PROMOTE / PUNK_LEARN_INTERVAL_MS | Hands-free promotion and learning cadence. |
| Workers | PUNK_WORKER_POLL_MS / PUNK_WORKER_CONCURRENCY | Embedded or standalone worker polling and concurrency. |
| Serverless | PUNK_CRON_SECRET / CRON_SECRET | Enable the secret-gated tick endpoint for scheduled jobs on Vercel-style deployments. |
| Retention | PUNK_RETENTION_DAYS | Trace/audit retention sweep window. |
| Safety | PUNK_ALLOW_PRIVATE_WEB_FETCH / PUNK_ALLOW_PRIVATE_WEBHOOKS | Explicit private-network escape hatches; leave off in production unless intended. |
| Billing | PUNK_BILLING_DISABLED / STRIPE_SECRET_KEY / STRIPE_WEBHOOK_SECRET / STRIPE_PRICE_* | Disable quotas for self-hosting or enable Stripe billing. |
| Governance | PUNK_POLICIES_DIR | Optional policy directory. |
| Public site | PUNK_MARKETING_HOST / PUNK_MEET_HOST / PUNK_APP_HOST / PUNK_DOCS_DIR | Serve the marketing splash or product deck by host, redirect /app, or override docs source. |
What To Read Next
- Docs Home: reader paths and full docs map.
- Onboarding Guide: extended pilot and rollout plan from workflow selection to production operation.
- Chat & Agents: chat economics, save-as-agent, scheduled task agents.
- Workflows: workflow creator, IR, templates, scheduling, MCP tools.
- SDK: TypeScript client reference.
- API: HTTP routes, auth, identity headers, response conventions.
- Configuration: env vars, auth modes, database choices, tenant settings.
- Governance: policies, trust, approvals, audit, observe mode.
- Accounts & Orgs: users, orgs, invitations, public signup, email.
- Billing & Usage: plans, quotas, usage metering, Stripe.