LLMs can’t be trusted without proofs. I work on what formal verification can — and cannot — do about that: neurosymbolic guardrails that check LLM outputs against formal logic (deployed in Amazon Bedrock), hallucination detection that needs no ground truth, and automated red teaming. Before that: automatic correctness proofs for hardware and distributed protocols, including the first automatic proof of Lamport’s Paxos.
Formal methods applied to LLM outputs: checking whether an answer is correct, detecting hallucinations, and adversarially stress-testing the models.
Automatic correctness proofs for distributed protocols and hardware — detecting bugs or proving their absence without manually supplied invariants.
IC3PO — automatic verifier for distributed protocols; produced the first automatic proof of Lamport’s Paxos.
AVR — word-level hardware model checker; 1st at HWMCC 2020, medals at HWMCC 2024 & 2025.
TurboFuzzLLM — open-source LLM red-teaming fuzzer; ≥95% attack success on GPT-4o and GPT-4 Turbo.
Coming soon.
First fully automatic proof of Lamport’s Paxos · 1st place HWMCC 2020 (7 gold) · medals at HWMCC 2024 & 2025 · Rackham Predoctoral Fellow · Best Student Research Award (Michigan CSE) · IIT Madras Silver Medalist · National Award for the Empowerment of Persons with Disabilities (Govt. of India, 2013).
Program committee — CAV 2025; FMCAD 2022 (student forum). Reviewer — TOCL, TCAD. Artifact evaluation — SOSP, CAV, OSDI, MICRO, VMCAI.
I’ve spent ten years on one question: how do you get a machine to prove a system correct? First hardware (AVR), then distributed protocols (IC3PO and the Paxos proof), now LLM outputs (the guardrails behind Amazon Bedrock’s Automated Reasoning checks). PhD from the University of Michigan with Karem Sakallah; before AWS, SMT solving at SRI (Yices 2) and formal verification at Cadence (JasperGold). Based in Seattle. Email me about neurosymbolic verification, hallucination detection, red teaming, or speaking — aman@amangoel.ai.