Overview
Demo-specific comparison, not a general database benchmark.
Expense Guard covers five review cases: low-risk auto-approval, hotel manager review, duplicate/fraud review, receipt prompt-injection security review, and manager review with a duplicate signal. Both lanes use an agent workflow; the difference is where trust-sensitive workflow state lives.
Postgres plus pgvector plus an agent framework can implement the workflow, but the app handles retrieval, tenant/policy filtering, evidence shape, proposal state, guardrail logic, direct update/audit glue, and replay reconstruction. Synapsor moves hidden session bindings, agent context, hybrid policy retrieval, reason codes, evidence bundles, branch-staged proposals, settlement policy, replay, and audit handles into durable run primitives.
Synapsor's token savings come mostly from moving repeated context/evidence/policy/workflow state into approved capabilities and compact handles, not from making the LLM inherently smaller. The metrics below are controlled demo measurements from five comparable OpenAI Agents SDK runs generated on 2026-05-19; they are not a general benchmark.
Postgres + app glue vs Synapsor
The lab compares where the trust logic lives, not generic OLTP performance.
Postgres + app lane:
- App owns row fetches and context bundle queries
- App owns pgvector/text policy retrieval and tenant filtering
- App owns evidence shape and proposal state machine
- App owns guardrail checks, direct update path, and audit rows
- Replay must be reconstructed from logs and app workflow tables
Synapsor lane:
- DB owns hidden session bindings
- DB owns context, hybrid policy retrieval, and reason codes
- DB owns evidence bundles and compact resource ids
- DB owns write proposals, branch diffs, and settlement policy
- Replay and audit are persisted capability invocation records
Expense Guard cases
Each case was run through both lanes in the seeded 2026-05-19 controlled demo measurement pass.
- A low-risk receipt with a matching card transaction can auto-settle when policy allows it.
- A hotel expense above the nightly threshold should stage a manager-review proposal.
- Duplicate receipt or high-risk signals should route to finance review or rejection.
- Instruction-like receipt text is treated as untrusted data and routed to security review.
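The routing rules above can be sketched as a small deterministic function. This is an illustration only, not the demo's implementation: the `ExpenseCase` fields, the route names, the check order, and the fallback to manager review are assumptions; the two thresholds are taken from the seeded policies POL-ACME-MEALS-1 and POL-ACME-HOTEL-1.

```python
from dataclasses import dataclass

# Thresholds from the seeded policies; everything else here is hypothetical.
MEAL_AUTO_APPROVE_LIMIT_CENTS = 75_00      # POL-ACME-MEALS-1
HOTEL_NIGHTLY_REVIEW_LIMIT_CENTS = 250_00  # POL-ACME-HOTEL-1


@dataclass
class ExpenseCase:
    expense_id: str
    category: str                 # assumed values: "meal", "hotel", "office", ...
    amount_cents: int
    nights: int = 1
    has_receipt: bool = True
    card_match: bool = True
    duplicate_signal: bool = False
    injection_signal: bool = False


def route(case: ExpenseCase) -> str:
    """Return a review route; security and fraud signals win over amounts."""
    if case.injection_signal:
        return "security_review"   # receipt text is untrusted data
    if case.duplicate_signal:
        return "finance_review"
    if case.category == "hotel":
        nightly_cents = case.amount_cents / max(case.nights, 1)
        if nightly_cents > HOTEL_NIGHTLY_REVIEW_LIMIT_CENTS:
            return "manager_review"
    if (case.category == "meal" and case.has_receipt and case.card_match
            and case.amount_cents < MEAL_AUTO_APPROVE_LIMIT_CENTS):
        return "auto_settle"
    # Assumed conservative default: anything not clearly low-risk gets a human.
    return "manager_review"


print(route(ExpenseCase("EXP-1001", "meal", 38_00)))
print(route(ExpenseCase("EXP-1002", "hotel", 780_00, nights=2)))
```

Checking signals before amounts means a small expense with injected receipt text still goes to security review rather than auto-settling.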
Expense Guard table relationship diagram
The diagram uses the local Expense Guard Synapsor SQL schema plus Synapsor-native evidence, write proposal, branch, and settlement resources.
Synapsor DBMS table design
The Synapsor lane models operational state, append-only transaction history, hybrid policy retrieval, and durable audit/proposal evidence as Synapsor-managed tables.
- tenants: id, name, region - reference_data
- employees: id, tenant_id, role, manager_id, spending_limit_cents - hot_state
- expenses: id, tenant_id, employee_id, card_transaction_id, vendor, amount_cents, receipt_sha, state, reviewer - hot_state
- card_transactions: id, tenant_id, employee_id, merchant, amount_cents, status - append_log
- duplicate_signals: expense_id, candidate_expense_id, risk_score, reason - audit_log
- expense_guardrail_signals: expense_id, signal_code, severity, source, reason - audit_log
- expense_audit: principal, capability, resource, action - audit_log
- expense_policy_chunks: chunk_id, tenant_id, topic, status, allowed_role, body - searchable_knowledge
- lexical_index='body' and vector_index='body' support hybrid retrieval
- filter_keys='tenant_id,topic,status,allowed_role' keep tenant/policy scope in the DB
- zone_map='tenant_id,topic,status' helps skip irrelevant policy segments
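To illustrate why the filter keys matter, here is a minimal in-memory sketch of scope-first retrieval over policy chunks. It is not Synapsor's API: `PolicyChunk`, `retrieve`, the `"active"` status value, the `"acme"` and `"other"` tenant ids, the role name, and the POL-OTHER-HOTEL-1 row are all hypothetical, and a simple word-overlap score stands in for the combined lexical and vector ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyChunk:
    chunk_id: str
    tenant_id: str
    topic: str
    status: str
    allowed_role: str
    body: str


def lexical_score(query: str, body: str) -> float:
    """Fraction of query words found in the body (stand-in for hybrid scoring)."""
    query_words = set(query.lower().split())
    body_words = set(body.lower().split())
    return len(query_words & body_words) / len(query_words) if query_words else 0.0


def retrieve(chunks, query, *, tenant_id, topic, role, limit=3):
    # Scope first: out-of-scope chunks are never scored or returned.
    scoped = [
        c for c in chunks
        if c.tenant_id == tenant_id and c.topic == topic
        and c.status == "active" and c.allowed_role == role
    ]
    ranked = sorted(scoped, key=lambda c: lexical_score(query, c.body), reverse=True)
    return ranked[:limit]


chunks = [
    PolicyChunk("POL-ACME-HOTEL-1", "acme", "hotel", "active", "finance_agent",
                "Hotels above $250/night require manager review"),
    PolicyChunk("POL-ACME-MEALS-1", "acme", "meals", "active", "finance_agent",
                "Meals under $75 with a receipt are normally auto-approved"),
    # Hypothetical row for another tenant, to show that scoping excludes it.
    PolicyChunk("POL-OTHER-HOTEL-1", "other", "hotel", "active", "finance_agent",
                "Hotels above $400/night require manager review"),
]

hits = retrieve(chunks, "hotels manager review", tenant_id="acme",
                topic="hotel", role="finance_agent")
print([c.chunk_id for c in hits])
```

In the Postgres lane this scoping is application code; in the Synapsor lane the same predicates are declared as `filter_keys` on the table.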
Sample seed data
Representative seeded rows from the Expense Guard demo. The page reports measured workflow metrics from this controlled seed set, not a universal benchmark.
- EXP-1001 | Blue Bottle Coffee | $38.00 meal with receipt and matching card transaction
- EXP-1002 | Marriott Marquis | $780.00 hotel, two nights, manager-review threshold
- EXP-1003 | Staples | $119.99 office supplies receipt containing prompt-injection text
- EXP-1004 | Delta Airlines | $642.00 airfare with duplicate receipt signal
- EXP-2001 | Uber | $92.00 ground transport under the old auto-approval policy
- POL-ACME-MEALS-1 | Meals under $75 with a receipt are normally auto-approved
- POL-ACME-HOTEL-1 | Hotels above $250/night require manager review
- POL-ACME-FRAUD-1 | Duplicate receipts and receipt prompt injection require review
- DUP-1004 | Same receipt hash/vendor/amount as already approved EXP-1005
- GRD-1003 | receipt_instruction_injection signal from receipt text
Controlled demo measured averages
Source artifact: docs/labs/expense-guard-metrics-20260519.json. These demo-specific measured averages describe this workflow only and are not a latency or universal benchmark claim.
| Category | Metric | Postgres + app lane | Synapsor lane | Result |
|---|---|---|---|---|
| Token pressure | Average input tokens | 5,143 | 2,139 | 58.4% fewer input tokens |
| Token pressure | Average output tokens | 268 | 220 | 18.1% fewer output tokens |
| Token pressure | Average total tokens | 5,411 | 2,359 | 56.4% fewer total tokens |
| Workflow overhead | Average tool calls | 2.8 | 2.0 | 28.6% fewer tool calls |
| Workflow overhead | Average DB round trips | 13.8 | 2.0 | 85.5% fewer DB round trips |
| Workflow overhead | Elapsed time | 20.2s average | 20.7s average | not a speed claim |
| Workflow overhead | App-owned glue LOC | 503 | 68 | 86.5% less app-owned glue |
| Trust and audit output | Evidence completeness | app-assembled, not stored by Synapsor | evidence bundle recorded | Synapsor persists evidence lookup records |
| Trust and audit output | Write proposal objects | app proposal ids | wrp:// proposals | Synapsor stages writes on branches |
| Trust and audit output | Replay/audit records | reconstructed from app tables | durable run/evidence lookup records | Synapsor records replayable decision state |
| Trust and audit output | Policy duplication points | 4 app-owned points | 0 app-owned points | policy checks move into Synapsor capability/settlement logic |
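The percentage column for the numeric rows can be reproduced from the two lane averages as (baseline - candidate) / baseline. The short check below uses only the values published in the table; the function name is illustrative. The output-token row is left out because the rounded averages shown (268 and 220) give 17.9%, so the published 18.1% presumably reflects unrounded values in the source artifact.

```python
def percent_reduction(baseline: float, candidate: float) -> float:
    """Relative reduction of candidate versus baseline, in percent, to one decimal."""
    return round((baseline - candidate) / baseline * 100, 1)


# (Postgres + app lane, Synapsor lane) averages as published in the table.
rows = {
    "input tokens": (5143, 2139),
    "total tokens": (5411, 2359),
    "tool calls": (2.8, 2.0),
    "DB round trips": (13.8, 2.0),
    "app-owned glue LOC": (503, 68),
}

for name, (postgres_lane, synapsor_lane) in rows.items():
    print(f"{name}: {percent_reduction(postgres_lane, synapsor_lane)}% fewer")
```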
Case-level input-token results
Seeded demo cases from the Expense Guard workflow. Savings vary by case and prompt shape.
| Category | Case | Postgres + app lane (input tokens) | Synapsor lane (input tokens) | Input tokens saved |
|---|---|---|---|---|
| Cases | EXP-1001 low-risk auto approval | 5,467 | 2,301 | 3,166 saved, 57.9% |
| Cases | EXP-1002 hotel manager review | 3,541 | 1,987 | 1,554 saved, 43.9% |
| Cases | EXP-1003 receipt injection security review | 5,549 | 2,112 | 3,437 saved, 61.9% |
| Cases | EXP-1004 duplicate/fraud review | 5,658 | 1,994 | 3,664 saved, 64.8% |
| Cases | EXP-2001 manager review with duplicate signal | 5,499 | 2,302 | 3,197 saved, 58.1% |
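The case-level rows reconcile with the averaged input-token figures: the per-lane means of the five cases round to 5,143 and 2,139. A short check using only the numbers in the table above (variable names are illustrative):

```python
# Input tokens per case: (Postgres + app lane, Synapsor lane), from the table.
cases = {
    "EXP-1001": (5467, 2301),
    "EXP-1002": (3541, 1987),
    "EXP-1003": (5549, 2112),
    "EXP-1004": (5658, 1994),
    "EXP-2001": (5499, 2302),
}

# Tokens saved and percentage saved for each case.
savings = {
    case_id: (pg - syn, round((pg - syn) / pg * 100, 1))
    for case_id, (pg, syn) in cases.items()
}

postgres_avg = sum(pg for pg, _ in cases.values()) / len(cases)
synapsor_avg = sum(syn for _, syn in cases.values()) / len(cases)

for case_id, (saved, pct) in savings.items():
    print(f"{case_id}: {saved} saved, {pct}%")
print(round(postgres_avg), round(synapsor_avg))  # 5143 2139
```

The spread from 43.9% to 64.8% shows how much the saving depends on the case and prompt shape; the hotel case starts from the smallest Postgres-lane prompt and saves the least.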
Expense review proposal flow
The Synapsor lane keeps the risky write staged on a review branch until policy or a reviewer approves it.
The expense table visible to the application remains unchanged while the agent evaluates the case.
Synapsor stages the suggested category, approval status, or reimbursement change away from main.
Reviewers see the row-level diff plus policy evidence and reason codes.
A human reviewer or deterministic low-risk settlement policy decides the outcome.
Only approved changes merge back to main; rejected proposals leave production unchanged.
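The flow above can be modeled in a few lines. This is a toy in-memory sketch under stated assumptions, not Synapsor's branch or settlement API: `ProposalStore`, its method names, the status strings, the proposal id, and the row values are all hypothetical.

```python
class ProposalStore:
    """Toy model: proposed changes live beside main and merge only on approval."""

    def __init__(self, main_rows):
        self.main = main_rows        # expense_id -> row dict ("main")
        self.proposals = {}          # proposal_id -> staged change

    def stage(self, proposal_id, expense_id, changes, evidence):
        # Staging never touches main; it only records the intended change.
        self.proposals[proposal_id] = {
            "expense_id": expense_id,
            "changes": dict(changes),
            "evidence": list(evidence),
            "status": "staged",
        }

    def diff(self, proposal_id):
        """Row-level diff shown to the reviewer: field -> (current, proposed)."""
        proposal = self.proposals[proposal_id]
        row = self.main[proposal["expense_id"]]
        return {
            field: (row.get(field), new_value)
            for field, new_value in proposal["changes"].items()
            if row.get(field) != new_value
        }

    def settle(self, proposal_id, approved):
        proposal = self.proposals[proposal_id]
        if proposal["status"] != "staged":
            raise ValueError("proposal already settled")
        if approved:
            self.main[proposal["expense_id"]].update(proposal["changes"])
            proposal["status"] = "merged"
        else:
            proposal["status"] = "rejected"   # main stays unchanged
        return proposal["status"]


store = ProposalStore({"EXP-1002": {"state": "submitted", "reviewer": None}})
store.stage("proposal-1", "EXP-1002",
            {"state": "manager_review", "reviewer": "manager"},
            evidence=["POL-ACME-HOTEL-1"])
print(store.main["EXP-1002"])    # still the submitted row
print(store.diff("proposal-1"))
print(store.settle("proposal-1", approved=True))
```

The `approved` flag can come from a human reviewer or from a deterministic low-risk settlement policy; either way, the merge is the only path that mutates main.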
Developer notes
- All published numbers are tied to docs/labs/expense-guard-metrics-20260519.json.
- Token counts, tool calls, DB round trips, elapsed time, and app-owned glue LOC were captured for both lanes.
- The comparison lane is Postgres + pgvector + OpenAI Agents SDK versus Synapsor + OpenAI Agents SDK.
- The useful takeaway is workflow ownership, trust, and auditability, with token reduction measured for this seeded demo.