CHAPTERS
Why identical outputs aren’t equally trustworthy: mechanism matters
James Brady opens by arguing that trust depends not just on the final answer but on the process used to produce it. He illustrates how two identical outputs can deserve different levels of confidence depending on model quality, tool use, and internal checks.
Three requirements for Elicit’s research agent: legibility, fidelity, faithful execution
He outlines the core desiderata that pushed Elicit toward a domain-specific language. The agent’s workflow must be inspectable, stable under iteration, and actually executed as written.
Introducing ÆSHPL: a constrained, opinionated Python subset for research workflows
James introduces Elicit’s DSL, ÆSHPL, designed to encode agentic workflows in a controlled way. It is intentionally limited to improve predictability and enable verification and caching.
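As a rough illustration of what "a constrained Python subset" can mean in practice, the sketch below uses Python's standard `ast` module to reject any syntax outside an allow-list. The allow-list and the function name `find_violations` are assumptions made for this example; the summary does not describe ÆSHPL's actual grammar.

```python
import ast

# Hypothetical allow-list: the real ÆSHPL grammar is not given in this summary.
ALLOWED_NODES = (
    ast.Module, ast.Assign, ast.Expr, ast.Call, ast.Name, ast.Load, ast.Store,
    ast.Constant, ast.List, ast.keyword,
)


def find_violations(source: str) -> list[str]:
    """Describe every syntax node in `source` that is outside the allow-list."""
    tree = ast.parse(source)
    violations = []
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            # Some nodes carry no line number, so fall back to "?".
            line = getattr(node, "lineno", "?")
            violations.append(f"line {line}: {type(node).__name__} is not allowed")
    return violations
```

A plain call such as `papers = search(query="x", limit=5)` passes this check, while `import os` or a `while` loop is reported. Restricting the language this way is one plausible route to the predictability the chapter mentions.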
What ÆSHPL code looks like and what it represents
He shows an example ÆSHPL program that resembles Python and encodes a competitive analysis workflow. The key idea: the plan is not just documentation—it’s executable.
The core execution loop: write ÆSHPL → interpret → redraft → re-interpret
He explains Elicit’s internal cycle: a model component writes the program, the system executes it, and the program is iteratively revised based on errors and results. This becomes the engine of progress inside a session.
System architecture: UI + event log + Python service + sandboxed curator
James maps the DSL workflow into a production system architecture. Event sourcing connects user actions to the evolving ÆSHPL program and its execution.
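Event sourcing here can be pictured as an append-only log that is folded into the current program text. The sketch below is only an assumption about the shape of such a log: the event kinds `program_drafted` and `program_reverted` and the function `current_program` are hypothetical names, not Elicit's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str     # hypothetical kinds: "program_drafted", "program_reverted"
    payload: str  # program text for a draft, unused for a revert


def current_program(log):
    """Fold the append-only event log into the latest program text."""
    history = []
    for event in log:
        if event.kind == "program_drafted":
            history.append(event.payload)
        elif event.kind == "program_reverted" and history:
            history.pop()  # undo the most recent draft
    return history[-1] if history else ""
```

Because state is derived from the log rather than stored directly, any earlier version of the program can be recovered by replaying a prefix of the events.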
Operational components: wrapper, model gateway, and credential isolation
He details supporting infrastructure that makes the approach secure and flexible. These layers allow swapping model harnesses while protecting secrets from prompt injection or exfiltration.
From code to execution: parsing, type-checking, AST interpretation, and caching
He walks through the interpreter pipeline: parse and validate the program, then interpret it via an AST walker in Python. A content-addressed store enables memoization critical to performance and iteration.
Why re-run the whole program each time: avoiding drift while staying fast
Elicit reinterprets the full ÆSHPL program after each redraft, rather than patch-executing snippets. This design improves coherence and allows stronger guarantees, with memoization keeping it responsive.
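The two chapters above describe an AST-walking interpreter backed by a content-addressed store, and re-running the whole program after each redraft. A minimal sketch of how these fit together follows. Everything in it is an assumption for illustration: the tools `search` and `first`, the in-memory `CACHE`, and the SHA-256-over-JSON key scheme are stand-ins, not Elicit's implementation.

```python
import ast
import hashlib
import json

CACHE = {}     # content-addressed store: key -> result (in-memory stand-in)
CALL_LOG = []  # records which expensive calls actually ran


def search(query):
    """Hypothetical stand-in for an expensive tool call."""
    CALL_LOG.append(query)
    return [f"{query}-result-1", f"{query}-result-2"]


def first(items):
    return items[0]


TOOLS = {"search": search, "first": first}


def content_key(name, args):
    """Hash the tool name and its evaluated arguments."""
    payload = json.dumps([name, args], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def evaluate(node, env):
    """Walk one expression node, memoizing every tool call by content."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        args = [evaluate(arg, env) for arg in node.args]
        key = content_key(node.func.id, args)
        if key not in CACHE:
            CACHE[key] = TOOLS[node.func.id](*args)
        return CACHE[key]
    raise TypeError(f"unsupported node: {type(node).__name__}")


def run(source):
    """Interpret the whole program from the top; cached calls are skipped."""
    env = {}
    for stmt in ast.parse(source).body:
        is_simple = (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
        )
        if not is_simple:
            raise TypeError("only simple assignments are supported")
        env[stmt.targets[0].id] = evaluate(stmt.value, env)
    return env
```

Running a first draft and then a redraft that appends one line re-interprets every statement, but only the new `search` call reaches the tool; the earlier one is served from the cache. That is the property that lets a full re-run stay fast.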
Demo setup: Elicit’s “research landscape” workflow and rigor-first positioning
James transitions into a demo, positioning Elicit on the “rigor” end of the speed–quality spectrum. He uses a saved session that mapped organizations investing in foundation models for biology.
Demo walkthrough: layered searches, enrichment, screening, and artifact generation
He shows the stepwise analysis blocks: multiple web/paper searches, full-text fetching, and filtering. The output becomes structured “artifacts” (tables) with extracted attributes and provenance.
Inspectability: view the ÆSHPL and a derived graphical workflow view
He demonstrates that each artifact can be traced back to the exact ÆSHPL program that generated it. For usability, Elicit also provides a graph visualization derived directly from the same program.
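One way a graph view can be derived directly from a program is to read data dependencies off its syntax tree. The sketch below assumes a program made of simple assignments and uses a hypothetical function name, `dependency_edges`; it is not Elicit's visualization code.

```python
import ast


def dependency_edges(source):
    """Return (upstream, downstream) pairs between assigned variables."""
    edges = []
    defined = set()
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name):
            target = stmt.targets[0].id
            # Names used on the right-hand side that an earlier line assigned.
            used = {
                node.id
                for node in ast.walk(stmt.value)
                if isinstance(node, ast.Name) and node.id in defined
            }
            edges.extend((name, target) for name in sorted(used))
            defined.add(target)
    return edges
```

Since the edges come from the same text that is executed, the picture cannot drift away from what actually ran.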
Extending the session: joins, oversight bodies, and long programs with caching
He shows iterative expansion: adding comparisons (open vs. closed), commercialization and go-to-market (GTM) strategy, mapping oversight institutions, and finally joining datasets. The final program becomes much longer but remains efficient due to caching.
When a DSL is worth it: engineering checklist and evaluation investment
He closes with guidance: a DSL isn’t for everyone, but it fits when trust, robustness, and provenance matter. Most effort is not the DSL syntax itself but the surrounding system and rigorous evaluation.
Closing thesis: same table, different object—mechanism changes trust
James returns to the opening question: identical-looking outputs can carry different trust levels. Elicit’s differentiator is a visible, executable, repeatable process that users can inspect and endorse.