Expense Guard lab

A demo-specific comparison of Postgres app glue versus Synapsor durable run state.

Overview

This is a demo-specific comparison, not a general database benchmark.

Expense Guard reviews low-risk auto-approval, hotel manager review, duplicate/fraud review, receipt prompt-injection/security-review, and duplicate-signal manager-review cases. Both lanes use an agent workflow; the difference is where trust-sensitive workflow state lives.

Postgres plus pgvector plus an agent framework can implement the workflow, but the app handles retrieval, tenant/policy filtering, evidence shape, proposal state, guardrail logic, direct update/audit glue, and replay reconstruction. Synapsor moves hidden session bindings, agent context, hybrid policy retrieval, reason codes, evidence bundles, branch-staged proposals, settlement policy, replay, and audit handles into durable run primitives.

Synapsor's token savings come mostly from moving repeated context/evidence/policy/workflow state into approved capabilities and compact handles, not from making the LLM inherently smaller. The metrics below are controlled demo measurements from five comparable OpenAI Agents SDK runs generated on 2026-05-19; they are not a general benchmark.

Postgres + app glue vs Synapsor

The lab compares where the trust logic lives, not generic OLTP performance.

Postgres + pgvector + agent SDK

App owns row fetches and context bundle queries
App owns pgvector/text policy retrieval and tenant filtering
App owns evidence shape and proposal state machine
App owns guardrail checks, direct update path, and audit rows
Replay must be reconstructed from logs and app workflow tables

Synapsor + agent SDK

DB owns hidden session bindings
DB owns context, hybrid policy retrieval, and reason codes
DB owns evidence bundles and compact resource ids
DB owns write proposals, branch diffs, and settlement policy
Replay and audit come from persisted capability invocation records

Expense Guard cases

Each case was run through both lanes in the seeded 2026-05-19 controlled demo measurement pass.

Green auto-approval

A low-risk receipt with a matching card transaction can auto-settle when policy allows it.

Hotel manager review

A hotel expense above the nightly threshold should stage a manager-review proposal.

Duplicate/fraud review

Duplicate receipt or high-risk signals should route to finance review or rejection.

Receipt injection review

Instruction-like receipt text is treated as untrusted data and routed to security review.

Expense Guard table relationship diagram

The diagram uses the local Expense Guard Synapsor SQL schema plus Synapsor-native evidence, write proposal, branch, and settlement resources.

Synapsor DBMS table design

The Synapsor lane models operational state, append-only transaction history, hybrid policy retrieval, and durable audit/proposal evidence as Synapsor-managed tables.

Tuple hot/reference tables

tenants: id, name, region - reference_data
employees: id, tenant_id, role, manager_id, spending_limit_cents - hot_state
expenses: id, tenant_id, employee_id, card_transaction_id, vendor, amount_cents, receipt_sha, state, reviewer - hot_state

Tuple log/audit tables

card_transactions: id, tenant_id, employee_id, merchant, amount_cents, status - append_log
duplicate_signals: expense_id, candidate_expense_id, risk_score, reason - audit_log
expense_guardrail_signals: expense_id, signal_code, severity, source, reason - audit_log
expense_audit: principal, capability, resource, action - audit_log
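
As a reading aid for the column lists above, here is a minimal Python sketch of two of the tables as dataclasses. It is not the Synapsor schema itself: the column types, the `Optional` fields, the helper `has_matching_card_transaction`, and every example value other than the vendor and amount of EXP-1001 are assumptions made for illustration. The frozen dataclass only mimics the append-only character of an append_log table.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional


@dataclass
class Expense:
    """hot_state row: mutable, because state and reviewer change during review."""
    id: str
    tenant_id: str
    employee_id: str
    card_transaction_id: Optional[str]  # assumed nullable
    vendor: str
    amount_cents: int
    receipt_sha: str
    state: str
    reviewer: Optional[str] = None      # assumed nullable


@dataclass(frozen=True)
class CardTransaction:
    """append_log row: frozen, so a recorded transaction cannot be edited in place."""
    id: str
    tenant_id: str
    employee_id: str
    merchant: str
    amount_cents: int
    status: str


def has_matching_card_transaction(expense: Expense, txn: CardTransaction) -> bool:
    """Hypothetical matching rule: same link id, tenant, employee, and amount."""
    return (
        expense.card_transaction_id == txn.id
        and expense.tenant_id == txn.tenant_id
        and expense.employee_id == txn.employee_id
        and expense.amount_cents == txn.amount_cents
    )


# EXP-1001 from the seed data; ids, hash, state, and status are made-up placeholders.
expense = Expense("EXP-1001", "acme", "emp-1", "txn-1",
                  "Blue Bottle Coffee", 38_00, "sha-demo", "submitted")
txn = CardTransaction("txn-1", "acme", "emp-1", "Blue Bottle Coffee", 38_00, "posted")
```

Amounts stay in integer cents, matching the `amount_cents` columns, so threshold comparisons never depend on floating-point rounding.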

Hybrid knowledge table

expense_policy_chunks: chunk_id, tenant_id, topic, status, allowed_role, body - searchable_knowledge
lexical_index='body' and vector_index='body' support hybrid retrieval
filter_keys='tenant_id,topic,status,allowed_role' keeps tenant/policy scope in the DB
zone_map='tenant_id,topic,status' helps skip irrelevant policy segments
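
The filter-then-rank idea behind this table can be sketched in plain Python. This is an illustration under stated assumptions, not how Synapsor implements hybrid retrieval: `retrieve`, `lexical_score`, `vector_score`, and the `alpha` weight are hypothetical names, the "vector" score is a bag-of-words cosine standing in for a real embedding index, and the topic, status, role, and second-tenant values are invented. Only the three ACME policy bodies come from the seed data.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyChunk:
    chunk_id: str
    tenant_id: str
    topic: str
    status: str
    allowed_role: str
    body: str


def _terms(text):
    return re.findall(r"[a-z0-9$]+", text.lower())


def lexical_score(query, body):
    """Fraction of distinct query terms that appear in the body."""
    q = set(_terms(query))
    return len(q & set(_terms(body))) / len(q) if q else 0.0


def vector_score(query, body):
    """Cosine similarity over term counts (a stand-in for embedding similarity)."""
    a, b = Counter(_terms(query)), Counter(_terms(body))
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(chunks, query, *, tenant_id, role, topic=None, status="active", alpha=0.5, k=3):
    """Apply the filter keys first, then rank the survivors by a blended score."""
    scoped = [
        c for c in chunks
        if c.tenant_id == tenant_id and c.status == status
        and c.allowed_role == role and (topic is None or c.topic == topic)
    ]
    def blended(c):
        return alpha * lexical_score(query, c.body) + (1 - alpha) * vector_score(query, c.body)
    return sorted(scoped, key=blended, reverse=True)[:k]


CHUNKS = [
    PolicyChunk("POL-ACME-MEALS-1", "acme", "meals", "active", "agent",
                "Meals under $75 with a receipt are normally auto-approved"),
    PolicyChunk("POL-ACME-HOTEL-1", "acme", "hotel", "active", "agent",
                "Hotels above $250/night require manager review"),
    PolicyChunk("POL-ACME-FRAUD-1", "acme", "fraud", "active", "agent",
                "Duplicate receipts and receipt prompt injection require review"),
    # Hypothetical second tenant: must never leak into ACME results.
    PolicyChunk("POL-OTHER-HOTEL-1", "globex", "hotel", "active", "agent",
                "Hotels above $250/night require manager review"),
]

results = retrieve(CHUNKS, "hotels manager review", tenant_id="acme", role="agent")
```

Filtering before ranking is the design point: a chunk from another tenant is excluded even when its text matches the query perfectly.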

Sample seed data

Representative seeded rows from the Expense Guard demo. The page reports measured workflow metrics from this controlled seed set, not a universal benchmark.

Expense cases

EXP-1001 | Blue Bottle Coffee | $38.00 meal with receipt and matching card transaction
EXP-1002 | Marriott Marquis | $780.00 hotel, two nights, manager-review threshold
EXP-1003 | Staples | $119.99 office supplies receipt containing prompt-injection text
EXP-1004 | Delta Airlines | $642.00 airfare with duplicate receipt signal
EXP-2001 | Uber | $92.00 ground transport under the old auto-approval policy

Evidence and policy rows

POL-ACME-MEALS-1 | Meals under $75 with a receipt are normally auto-approved
POL-ACME-HOTEL-1 | Hotels above $250/night require manager review
POL-ACME-FRAUD-1 | Duplicate receipts and receipt prompt injection require review
DUP-1004 | Same receipt hash/vendor/amount as already approved EXP-1005
GRD-1003 | receipt_instruction_injection signal from receipt text
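
To make the seeded policies concrete, the sketch below routes four of the sample expenses with a small deterministic function. The thresholds ($75 meals, $250 per night hotels) and the `receipt_instruction_injection` signal come from the rows above; the function name `route`, the category labels, the `duplicate_receipt` signal name, the decision strings, the reason codes, and the rule ordering are all assumptions, and the real demo's settlement logic may differ.

```python
from dataclasses import dataclass

MEAL_AUTO_APPROVE_LIMIT_CENTS = 75_00      # POL-ACME-MEALS-1
HOTEL_NIGHTLY_REVIEW_LIMIT_CENTS = 250_00  # POL-ACME-HOTEL-1


@dataclass(frozen=True)
class ExpenseCase:
    expense_id: str
    category: str            # hypothetical labels: "meal", "hotel", "supplies", "airfare"
    amount_cents: int
    nights: int = 1
    has_receipt: bool = True
    card_match: bool = True
    signals: frozenset = frozenset()


def route(case):
    """Return (decision, reason_code); guardrail signals are checked before amounts."""
    if "receipt_instruction_injection" in case.signals:
        return ("security_review", "RECEIPT_INJECTION")
    if "duplicate_receipt" in case.signals:
        return ("finance_review", "DUPLICATE_RECEIPT")
    # Integer comparison: total > limit * nights is the same as per-night > limit.
    if case.category == "hotel" and case.amount_cents > HOTEL_NIGHTLY_REVIEW_LIMIT_CENTS * case.nights:
        return ("manager_review", "HOTEL_NIGHTLY_THRESHOLD")
    if (case.category == "meal" and case.amount_cents < MEAL_AUTO_APPROVE_LIMIT_CENTS
            and case.has_receipt and case.card_match):
        return ("auto_approve", "MEAL_UNDER_LIMIT")
    return ("manager_review", "NO_AUTO_APPROVAL_RULE")


CASES = [
    ExpenseCase("EXP-1001", "meal", 38_00),
    ExpenseCase("EXP-1002", "hotel", 780_00, nights=2),
    ExpenseCase("EXP-1003", "supplies", 119_99,
                signals=frozenset({"receipt_instruction_injection"})),
    ExpenseCase("EXP-1004", "airfare", 642_00, signals=frozenset({"duplicate_receipt"})),
]
decisions = {c.expense_id: route(c) for c in CASES}
```

EXP-1002 works out to $390.00 per night over two nights, which is above the $250 threshold, so it lands in manager review. Checking signals first means instruction-like receipt text can never be outvoted by a small amount.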

Controlled demo measured averages

Source artifact: docs/labs/expense-guard-metrics-20260519.json. These demo-specific measured averages describe this workflow only and are not a latency or universal benchmark claim.

Category	Metric	Postgres + app lane	Synapsor lane	Result
Token pressure	Average input tokens	5,143	2,139	58.4% fewer input tokens
Token pressure	Average output tokens	268	220	18.1% fewer output tokens
Token pressure	Average total tokens	5,411	2,359	56.4% fewer total tokens
Workflow overhead	Average tool calls	2.8	2.0	28.6% fewer tool calls
Workflow overhead	Average DB round trips	13.8	2.0	85.5% fewer DB round trips
Workflow overhead	Elapsed time	20.2s average	20.7s average	not a speed claim
Workflow overhead	App-owned glue LOC	503	68	86.5% less app-owned glue
Trust and audit output	Evidence completeness	app-assembled, not stored by Synapsor	evidence bundle recorded	Synapsor persists evidence lookup records
Trust and audit output	Write proposal objects	app proposal ids	wrp:// proposals	Synapsor stages writes on branches
Trust and audit output	Replay/audit records	reconstructed from app tables	durable run/evidence lookup records	Synapsor records replayable decision state
Trust and audit output	Policy duplication points	4 app-owned points	0 app-owned points	policy checks move into Synapsor capability/settlement logic

Case-level input-token results

Seeded demo cases from the Expense Guard workflow. Savings vary by case and prompt shape.

Category	Case	Postgres + app lane input tokens	Synapsor lane input tokens	Result
Cases	EXP-1001 low-risk auto approval	5,467	2,301	3,166 saved, 57.9%
Cases	EXP-1002 hotel manager review	3,541	1,987	1,554 saved, 43.9%
Cases	EXP-1003 receipt injection security review	5,549	2,112	3,437 saved, 61.9%
Cases	EXP-1004 duplicate/fraud review	5,658	1,994	3,664 saved, 64.8%
Cases	EXP-2001 manager review with duplicate signal	5,499	2,302	3,197 saved, 58.1%
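
The Result column can be rechecked from the two token columns with a few lines of arithmetic. The sketch below uses only the per-case input-token counts printed in the table; the names `CASE_INPUT_TOKENS` and `savings` are illustrative and do not come from the metrics artifact.

```python
# (Postgres + app lane, Synapsor lane) input tokens per seeded case.
CASE_INPUT_TOKENS = {
    "EXP-1001": (5467, 2301),
    "EXP-1002": (3541, 1987),
    "EXP-1003": (5549, 2112),
    "EXP-1004": (5658, 1994),
    "EXP-2001": (5499, 2302),
}


def savings(postgres_tokens, synapsor_tokens):
    """Return (tokens saved, percent saved rounded to one decimal)."""
    saved = postgres_tokens - synapsor_tokens
    return saved, round(100 * saved / postgres_tokens, 1)


per_case = {case: savings(pg, syn) for case, (pg, syn) in CASE_INPUT_TOKENS.items()}

n = len(CASE_INPUT_TOKENS)
avg_postgres = round(sum(pg for pg, _ in CASE_INPUT_TOKENS.values()) / n, 1)
avg_synapsor = round(sum(syn for _, syn in CASE_INPUT_TOKENS.values()) / n, 1)
_, avg_percent = savings(avg_postgres, avg_synapsor)

for case, (saved, pct) in per_case.items():
    print(f"{case}: {saved} saved, {pct}%")
print(f"average: {avg_postgres} vs {avg_synapsor}, {avg_percent}% fewer input tokens")
```

The five cases average 5,142.8 and 2,139.2 input tokens, which round to the 5,143 and 2,139 reported in the averages table, and the overall reduction comes out at 58.4%.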

Expense review proposal flow

The Synapsor lane keeps the risky write staged on a review branch until policy or a reviewer approves it.

main

The expense table visible to the application remains unchanged while the agent evaluates the case.

proposal branch

Synapsor stages the suggested category, approval status, or reimbursement change away from main.

preview diff

Reviewers see the row-level diff plus policy evidence and reason codes.

approve/settle

A human reviewer or deterministic low-risk settlement policy decides the outcome.

commit

Only approved changes merge back to main; rejected proposals leave production unchanged.
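
The five steps above can be sketched as a small in-memory state machine. This is a toy model of the idea, not Synapsor's branch or settlement API: `ExpenseStore`, `Proposal`, the method names, the status strings, the proposal ids, and the example row values are all hypothetical.

```python
import copy
from dataclasses import dataclass


@dataclass
class Proposal:
    proposal_id: str
    expense_id: str
    changes: dict
    reason_codes: list
    status: str = "staged"


class ExpenseStore:
    def __init__(self, rows):
        self.main = copy.deepcopy(rows)   # what the application sees
        self.proposals = {}               # staged writes, kept away from main

    def stage(self, proposal_id, expense_id, changes, reason_codes):
        """proposal branch: record the suggested change without touching main."""
        if expense_id not in self.main:
            raise KeyError(expense_id)
        proposal = Proposal(proposal_id, expense_id, dict(changes), list(reason_codes))
        self.proposals[proposal_id] = proposal
        return proposal

    def preview_diff(self, proposal_id):
        """preview diff: column -> (current value on main, proposed value)."""
        p = self.proposals[proposal_id]
        row = self.main[p.expense_id]
        return {col: (row.get(col), new) for col, new in p.changes.items() if row.get(col) != new}

    def settle(self, proposal_id, approved):
        """approve/settle, then commit: only an approved proposal merges into main."""
        p = self.proposals[proposal_id]
        if p.status != "staged":
            raise ValueError(f"proposal {proposal_id} already settled")
        if approved:
            self.main[p.expense_id].update(p.changes)
            p.status = "committed"
        else:
            p.status = "rejected"
        return p.status


store = ExpenseStore({"EXP-1002": {"state": "submitted", "reviewer": None}})
store.stage("p-1", "EXP-1002", {"state": "approved", "reviewer": "mgr-1"},
            ["HOTEL_NIGHTLY_THRESHOLD"])
diff_before_settle = store.preview_diff("p-1")
main_before_settle = copy.deepcopy(store.main)
rejected_status = store.settle("p-1", approved=False)
main_after_reject = copy.deepcopy(store.main)

store.stage("p-2", "EXP-1002", {"state": "approved", "reviewer": "mgr-1"},
            ["HOTEL_NIGHTLY_THRESHOLD"])
committed_status = store.settle("p-2", approved=True)
```

Rejecting `p-1` leaves the row on main exactly as it was; only the approved `p-2` changes it. Refusing to settle a proposal twice keeps the decision record unambiguous.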

Developer notes

All published numbers are tied to docs/labs/expense-guard-metrics-20260519.json.
Token counts, tool calls, DB round trips, elapsed time, and app-owned glue LOC were captured for both lanes.
The comparison is the Postgres + pgvector + OpenAI Agents SDK lane versus the Synapsor + OpenAI Agents SDK lane.
The useful takeaway concerns workflow ownership, trust, and auditability; the token reduction is a measurement for this seeded demo only.