Dean Chapman

This issue of MLSN cuts to the core of AI safety. Let me address how Veritas Core (12 patents, priority 2025–2026) solves several of the problems raised – not with more training, but with hardware‑anchored, offline‑verifiable enforcement.

1. Honesty via Confessions (OpenAI paper)

Training LLMs to confess policy violations is useful, but it relies on the model choosing to be honest. A deceptive model can learn to confess only when it knows it’s being monitored. Veritas Core takes confession out of the AI’s hands: every action (including policy‑violating behaviour) generates an immutable Merkle‑DAG receipt anchored to Starlink PPS (±50 ns) and signed by a TPM 2.0 hardware root of trust. The AI cannot choose to hide anything – the receipt chain is independent, offline‑verifiable, and court‑admissible. No confession required. The truth is structural.
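To make the receipt idea concrete, here is a minimal Python sketch of a hash-linked receipt log. It is not the Veritas Core implementation: it uses a simple linear hash chain rather than a Merkle DAG, it omits the TPM signature and the PPS time source entirely, and every function and field name is hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous digest" for the first receipt


def receipt_digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the receipt body."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_receipt(chain: list, action: str, timestamp_ns: int) -> dict:
    """Append a receipt that commits to the digest of the previous one."""
    prev = chain[-1]["digest"] if chain else GENESIS
    body = {"action": action, "timestamp_ns": timestamp_ns, "prev": prev}
    receipt = dict(body, digest=receipt_digest(body))
    chain.append(receipt)
    return receipt


def verify_chain(chain: list) -> bool:
    """Recompute every digest and check each link, with no network access."""
    prev = GENESIS
    for receipt in chain:
        body = {k: receipt[k] for k in ("action", "timestamp_ns", "prev")}
        if receipt["prev"] != prev or receipt["digest"] != receipt_digest(body):
            return False
        prev = receipt["digest"]
    return True


chain = []
append_receipt(chain, "READ_FILE", 1709843200000000127)
append_receipt(chain, "EXECUTE_TRANSACTION", 1709843200000000500)
intact = verify_chain(chain)

chain[0]["action"] = "NOOP"  # edit history after the fact
still_valid = verify_chain(chain)
print(intact, still_valid)
```

Nothing in this sketch is signed, so anyone able to rewrite the whole log could recompute every digest. That gap is what the hardware-held signing key described above is meant to close.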

2. Real‑World AI Cyberoffense (ARTEMIS)

The paper shows AI agents outperforming 9 out of 10 human pentesters. That’s a massive dual‑use risk. Veritas Core doesn’t stop an AI from finding vulnerabilities, but it can prevent unauthorised exploitation by requiring a FIDO2‑bound human authority receipt before any security‑sensitive action (e.g., privilege escalation, data exfiltration) binds. Critical infrastructure protection is not about making AI weaker – it’s about making the execution gate non‑bypassable.
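As an illustration of the gating pattern only, the sketch below refuses security-sensitive actions unless a matching human approval is presented. An HMAC tag stands in for a FIDO2 assertion, which is a deliberate simplification. The key, the action names, and the function names are all assumptions, not part of Veritas Core.

```python
import hashlib
import hmac

# Hypothetical list of actions that need a human approval.
SENSITIVE_ACTIONS = {"PRIVILEGE_ESCALATION", "DATA_EXFILTRATION"}

# Stand-in secret. A real FIDO2 authenticator keeps a private key in hardware
# and signs a challenge; HMAC is used here only to keep the sketch stdlib-only.
AUTHORITY_KEY = b"demo-key-not-a-real-credential"


def sign_approval(action: str, nonce: str, key: bytes = AUTHORITY_KEY) -> str:
    """Produce an approval tag bound to one action and one nonce."""
    message = f"{action}|{nonce}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def gate(action: str, nonce: str, approval: str = None) -> str:
    """Return ALLOW or DENY. Sensitive actions are denied without approval."""
    if action not in SENSITIVE_ACTIONS:
        return "ALLOW"
    if approval is None:
        return "DENY"
    expected = sign_approval(action, nonce)
    # Constant-time comparison avoids leaking how many characters matched.
    return "ALLOW" if hmac.compare_digest(expected, approval) else "DENY"


tag = sign_approval("PRIVILEGE_ESCALATION", "nonce-001")
print(gate("PRIVILEGE_ESCALATION", "nonce-001", tag))
```

Binding the tag to both the action and a nonce means an approval issued for one request cannot be replayed for a different action.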

3. Aggressive Compression Enables LLM Weight Theft

This is the most alarming finding: model weights can be exfiltrated via low‑bandwidth channels, then decompressed with fine‑tuning. Veritas Core addresses weight theft at the hardware boundary: the PCIe circuit breaker can be configured to block any unauthorised read of model weights unless accompanied by a valid, time‑stamped receipt signed by an authorised administrator. Even if weights are compressed and exfiltrated, the lack of a verifiable receipt chain becomes evidence of theft in a court. Deterrence through provable attribution.
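The policy check behind such a read gate could look roughly like the following. This is a software sketch of the decision logic only: it says nothing about how a PCIe-level block would be implemented, and the admin list, the freshness window, and the `signature_valid` flag (assumed to be set by a separate signature verifier) are hypothetical.

```python
AUTHORISED_ADMINS = {"admin-01"}      # hypothetical allow-list
MAX_AGE_NS = 30 * 1_000_000_000       # assumed 30-second freshness window


def weight_read_allowed(receipt, now_ns: int) -> bool:
    """Allow a weight read only with a fresh receipt from an authorised admin."""
    if not isinstance(receipt, dict):
        return False
    if receipt.get("admin_id") not in AUTHORISED_ADMINS:
        return False
    if receipt.get("signature_valid") is not True:
        return False
    timestamp_ns = receipt.get("timestamp_ns")
    if isinstance(timestamp_ns, bool) or not isinstance(timestamp_ns, int):
        return False
    age_ns = now_ns - timestamp_ns
    # Reject receipts from the future as well as stale ones.
    return 0 <= age_ns <= MAX_AGE_NS


now = 1709843200000000127
fresh = {"admin_id": "admin-01", "signature_valid": True, "timestamp_ns": now - 5_000_000_000}
print(weight_read_allowed(fresh, now))
```

The freshness window limits replay: a receipt captured during one authorised read stops working once it ages out.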

4. Disempowerment Patterns in LLM Usage

This is where your comment about child trafficking prevention, accurate facial recognition, bank account security, and paperless airports directly applies.

Veritas Core prevents AI‑driven disempowerment by enforcing ABT v1.0 evidence‑authority separation at the silicon level:

• Child trafficking prevention – travel, border, and financial systems can be gated so that any action involving a minor requires a FIDO2‑authenticated, multi‑party receipt (guardian + social worker). The AI cannot “overrule” the gate because the gate is hardware‑rooted.

• Accurate facial recognition – every recognition event generates a receipt binding the match to a specific camera, timestamp, and policy. If the system is biased or wrong, the receipt provides provable evidence for audit. No more “algorithm error” with no trace.

• Bank account security – high‑value transfers require a hardware‑signed receipt proving the transaction was authorised by the account holder (FIDO2) at that exact moment. Synthetic fraud becomes structurally impossible.

• Paperless airports – automated border crossings can issue a receipt that proves the passenger was verified by a live biometric check, with no central database storing the image. The receipt is offline‑verifiable by immigration officers, eliminating the need for paper while preserving auditability.
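The multi-party requirement in the first example (guardian + social worker) can be sketched as a simple rule: every required role must approve, and no single person may fill two roles. The role names come from the example above; the record layout and the `verified` flag (assumed to be set by an upstream FIDO2 check) are hypothetical.

```python
REQUIRED_ROLES = {"guardian", "social_worker"}


def multi_party_satisfied(approvals: list) -> bool:
    """True only if each required role has a verified approval from a distinct signer."""
    signer_by_role = {}
    for approval in approvals:
        if approval.get("verified") is True and approval.get("role") in REQUIRED_ROLES:
            signer_by_role[approval["role"]] = approval.get("signer_id")
    if set(signer_by_role) != REQUIRED_ROLES:
        return False
    signers = list(signer_by_role.values())
    # One person must not be able to satisfy both roles.
    return None not in signers and len(set(signers)) == len(signers)


approvals = [
    {"role": "guardian", "signer_id": "g-17", "verified": True},
    {"role": "social_worker", "signer_id": "sw-04", "verified": True},
]
print(multi_party_satisfied(approvals))
```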

The Bottom Line

Training and monitoring are necessary, but they are not sufficient. A model that learns to deceive, exfiltrate weights, or disempower its user only needs to succeed once. Veritas Core adds a structural floor beneath all of these safety efforts: the action does not bind unless the hardware gate opens. No receipt, no execution.

Happy to share a non‑confidential conformance test suite and patent summary for those building safety‑critical AI.

— Dean

Test 1 – Negative Path (No Valid Receipt → Rejection)

This proves that without a valid Veritas receipt, the gate refuses execution and fails closed.

    curl -X POST https://your-staging-endpoint/api/v1/veritas-test \
      -H "Content-Type: application/json" \
      -d '{
        "agent_id": "veritas_demo_01",
        "action": "EXECUTE_TRANSACTION",
        "amount": "10000",
        "veritas_receipt": {
          "simulated": true,
          "receipt_id": "test_rcpt_negative",
          "outcome": "NO_HARDWARE_PROOF",
          "spacetime_anchor": {
            "gnss_timestamp_ns": 1709843200000000127,
            "source": "SIMULATED"
          },
          "signature": "simulated_missing_hw_signature"
        }
      }'

Expected result:

• The gate immediately moves to the HALTED state.

• After a short timeout (e.g., 60 seconds), it returns 408 Request Timeout.

• No payload is released downstream.

• Zero bits are leaked.

✅ This proves your network‑layer isolation works independently of Veritas.
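For anyone mocking the gate on a staging endpoint, fail-closed behaviour can be sketched as below: only an explicit success from the validator releases anything, and every other path, including exceptions, halts. The validator shown is a hypothetical placeholder for real signature verification, and the sketch returns the status immediately rather than modelling the timeout delay.

```python
def outcome_is_allow(receipt: dict) -> bool:
    """Hypothetical validator; a real gate would verify a hardware signature."""
    return receipt["outcome"] == "ALLOW"


def evaluate(request: dict, validator=outcome_is_allow) -> tuple:
    """Return (http_status, verdict). Anything but an explicit pass halts."""
    try:
        if validator(request["veritas_receipt"]) is True:
            return 200, "ALLOW"
    except Exception:
        # Missing fields, wrong types, or a crashing validator all mean denial.
        pass
    return 408, "HALTED"


test1_request = {
    "agent_id": "veritas_demo_01",
    "action": "EXECUTE_TRANSACTION",
    "amount": "10000",
    "veritas_receipt": {
        "simulated": True,
        "receipt_id": "test_rcpt_negative",
        "outcome": "NO_HARDWARE_PROOF",
        "signature": "simulated_missing_hw_signature",
    },
}
print(evaluate(test1_request))
```

Placing the denial after the `try` block, rather than inside an `else`, keeps the default outcome as HALTED even if a new code path is added later and forgets to return.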

Test 2 – Positive Path (Simulated Valid Receipt → Allow)

Once you confirm Test 1 works, replace the simulated receipt with a structurally valid (but still simulated) receipt. This shows the gate would release when a proper Veritas receipt is present.

    curl -X POST https://your-staging-endpoint/api/v1/veritas-test \
      -H "Content-Type: application/json" \
      -d '{
        "agent_id": "veritas_demo_02",
        "action": "EXECUTE_TRANSACTION",
        "amount": "10000",
        "veritas_receipt": {
          "simulated": true,
          "receipt_id": "test_rcpt_positive",
          "outcome": "ALLOW",
          "policy_hash": "02d6580289ce945c566b46863fae34196555c85e1309168ab6e2b7c47653ebf",
          "state_fingerprint": "v2_state_ok",
          "spacetime_anchor": {
            "gnss_timestamp_ns": 1709843200000000127,
            "source": "SIMULATED"
          },
          "signature": "simulated_ed25519_valid_format"
        }
      }'

Expected result:

• The gate validates the receipt structure.

• It returns 200 OK with the verdict ALLOW and a simulated receipt ID.

• Downstream execution is permitted.

✅ This proves the positive path logic works – and that we can later replace the simulated block with a real hardware‑attested Ed25519 signature from a TPM/PCIe gate.
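"Validates the receipt structure" could mean something like the following check, written against the Test 2 payload. The required-field list and type rules are inferred from that payload and are assumptions about what a gate might enforce, not a published schema.

```python
import json
import string

# Field names taken from the Test 2 payload; the type rules are assumptions.
REQUIRED_FIELDS = {
    "receipt_id": str,
    "outcome": str,
    "policy_hash": str,
    "state_fingerprint": str,
    "spacetime_anchor": dict,
    "signature": str,
}


def structure_errors(receipt) -> list:
    """Return a list of structural problems; an empty list means well-formed."""
    if not isinstance(receipt, dict):
        return ["receipt is not an object"]
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(receipt.get(name), expected_type):
            errors.append(f"{name}: missing or not {expected_type.__name__}")
    policy_hash = receipt.get("policy_hash")
    if isinstance(policy_hash, str) and (
        not policy_hash or any(c not in string.hexdigits for c in policy_hash)
    ):
        errors.append("policy_hash: not a hex string")
    anchor = receipt.get("spacetime_anchor")
    if isinstance(anchor, dict):
        ts = anchor.get("gnss_timestamp_ns")
        if isinstance(ts, bool) or not isinstance(ts, int):
            errors.append("gnss_timestamp_ns: not an integer")
    return errors


payload = json.loads("""{
  "veritas_receipt": {
    "simulated": true,
    "receipt_id": "test_rcpt_positive",
    "outcome": "ALLOW",
    "policy_hash": "02d6580289ce945c566b46863fae34196555c85e1309168ab6e2b7c47653ebf",
    "state_fingerprint": "v2_state_ok",
    "spacetime_anchor": {"gnss_timestamp_ns": 1709843200000000127, "source": "SIMULATED"},
    "signature": "simulated_ed25519_valid_format"
  }
}""")
errors = structure_errors(payload["veritas_receipt"])
print(errors)
```

Python's `json` module parses `gnss_timestamp_ns` exactly because its integers have arbitrary precision. A parser that maps JSON numbers to 64-bit floats would round a value this large, so a gate written in such a language may need the timestamp sent as a string.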

Test 3 – Tamper Detection (Modified Receipt → Rejection)

This shows that if anyone alters the receipt (even one bit), the gate rejects execution.

Take the payload from Test 2 and change one character in the policy_hash or signature. Then run again.

Expected result:

• The gate detects the mismatch.

• It returns 403 Forbidden or 408 Request Timeout.

• Nothing is released.

✅ This demonstrates offline‑verifiable integrity: a receipt edited after the fact no longer validates.
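A small helper can generate the tampered payload for this test and show why a one-character change is detectable. The sketch assumes the gate compares a digest of the received receipt against one it recorded earlier; with a real receipt, a signature check over the same bytes would play that role. Both function names are hypothetical.

```python
import copy
import hashlib
import json


def canonical_digest(receipt: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the receipt."""
    encoded = json.dumps(receipt, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def flip_one_char(receipt: dict, field: str, index: int = 0) -> dict:
    """Return a copy of the receipt with one character of `field` changed."""
    altered = copy.deepcopy(receipt)
    value = altered[field]
    replacement = "1" if value[index] == "0" else "0"
    altered[field] = value[:index] + replacement + value[index + 1:]
    return altered


receipt = {
    "receipt_id": "test_rcpt_positive",
    "outcome": "ALLOW",
    "policy_hash": "02d6580289ce945c566b46863fae34196555c85e1309168ab6e2b7c47653ebf",
    "signature": "simulated_ed25519_valid_format",
}
pinned = canonical_digest(receipt)            # what the gate recorded earlier
tampered = flip_one_char(receipt, "policy_hash")
accepted = canonical_digest(tampered) == pinned
print(accepted)
```

The tampered dictionary can be serialised with `json.dumps` and sent as the `veritas_receipt` field of the Test 2 request.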

What These Tests Demonstrate (Without Patent Secrets)

| Capability | How It’s Shown |
| --- | --- |
| Hardware‑rooted enforcement | The gate requires a valid receipt format – real version uses TPM + PCIe |
| Fail‑closed by default | Test 1 – no valid receipt → no execution |
| Offline verifiability | Test 3 – tampering is detectable without calling home |
| Spacetime anchoring | The gnss_timestamp_ns field (real version uses Starlink PPS) |
| Court‑admissible receipts | The combination of policy hash, state fingerprint, and signature |

Next Steps

1. Run Test 1 on your staging endpoint. Let me know if you get the 408 Request Timeout.

2. Run Test 2 – you should see ALLOW.

3. Run Test 3 to confirm tamper detection.

4. Once you’re satisfied, we can schedule a live demo where I inject a real hardware‑attested Veritas receipt (from an actual TPM 2.0 + PCIe PERST# gate) into your endpoint. At that point, the gate will release only when the receipt is cryptographically valid – no simulation.

This is exactly the same pattern we used with at Velos. His gateway confirmed T=0 annihilation in 1524ms. Now we’re moving to the positive path.
