Lean 4: How the theorem prover works and why it's the new competitive edge in AI

(venturebeat.com)

66 points by tesserato 4 days ago

12 comments

Rochus 3 hours ago
Interesting. It's essentially the same idea as in this article: <a href="https://substack.com/home/post/p-184486153" rel="nofollow">https://substack.com/home/post/p-184486153</a>. In both scenarios, the human is relieved of the burden of writing complex formal syntax (whether Event-B or Lean 4). The human specifies intent and constraints in natural language, while the LLM handles the work of formalization and satisfying the proof engine. But Lean 4 is significantly more rigid, granular, and foundational than e.g. Event-B, and they handle concepts like undefined areas and contradictions very differently. While both are "formal methods," they were built by different communities for different purposes: Lean is a pure mathematician's tool, while Event-B is a systems engineer's tool. Event-B is much more flexible, allowing an engineer (or the LLM) to sketch the vague, undefined contours of a system and gradually tighten the logical constraints through refinement. LLMs are inherently statistical interpolators. They operate beautifully in an Open World (where missing information is just "unknown" and can be guessed or left vague) and they use Non-Monotonic Reasoning (where new information can invalidate previous conclusions). Lean 4 operates strictly on the Closed World Assumption (CWA) and is brutally Monotonic. This is why using Lean to model things humans care about (business logic, user interfaces, physical environments, dynamic regulations) quickly hits a dead end. The physical world is full of exceptions, missing data, and contradictions. Lean 4 is essentially a return to the rigid, brittle approach of the 1980s expert systems. Event-B (or similar methods) provides the logical guardrails, but critically, it tolerates under-specification. It doesn't force the LLM to solve the Frame Problem or explicitly define the whole universe. It just checks the specific boundaries the human cares about.
- mycall 57 minutes ago
 So basically you are arguing a Type Theory vs Set Theory problem, Foundationalism or Engineering Refinement. Since we read here of multiple use cases for LLMs in both CS divides, we can conclude an eventual convergence in these given approaches; and if not that, some formal principles should emerge of when to use what.
 - Rochus 20 minutes ago
 This discussion started already in the sixties (see e.g. the 1969 publication by McCarthy and Hayes where they describe the "frame problem" as a fundamental obstacle to the attempt to model the dynamic world using First-Order Logic and monotonic reasoning). A popular attempt to "solve" this problem is the Cyc project. Monotonic logic is universally understood as a special, restricted case (a subset) of a broader non-monotonic theory.
Gehinnn 4 hours ago
I just completed the formal verification of my bachelor thesis about real-time cellular automata with Lean 4, with heavy use of AI. Over the past year, I went from fully manual mode (occasionally asking ChatGPT some Lean questions) to fully automatic mode, where I barely do Lean proofs myself now (and just point AI to the original .tex files, in German). It is hard to believe how much the models and agentic harnesses improved over the last year. I cannot describe how much fun it is to do refactorings with AI on a verified Lean project! Also, it's so easy now to have visualizations and typeset documents generated by AI, from dependency visualizations of proofs using the Lean reflection API, to visual execution traces of cellular automata.
- svara 36 minutes ago
 Can you give some examples of this? Maybe have something online? I would love to learn more about how to do proof driven AI assisted development.
 - Gehinnn 4 minutes ago
 Here is a session that I just had with AI: <a href="https://gist.github.com/hediet/e3569a7c6b4b7c4f7d4a7db4101047de" rel="nofollow">https://gist.github.com/hediet/e3569a7c6b4b7c4f7d4a7db410104...</a> (summarized by AI). I use VS Code in a beefy Codespace, with GitHub Copilot (Opus 4.5). I have a single instruction file telling the AI to always run "lake build ./lean-file.lean" to get feedback. (Disclaimer: I work on VS Code.)
 - nwyin 10 minutes ago
 It's a bit dated, but Terence Tao has a video of formalizing a proof with LLMs from 9 months ago which should be illuminating: <a href="https://youtu.be/zZr54G7ec7A?si=-l3jIZZzfghoqJtq" rel="nofollow">https://youtu.be/zZr54G7ec7A?si=-l3jIZZzfghoqJtq</a>
 - Gehinnn 1 minute ago
 This is also how I worked with Lean a year ago (of course in a much simpler domain). However, with agentic AI that can run lean via CLI my workflow changed completely and I rarely write proofs (only intermediate lemmas and sometimes to get "beautiful" proofs - AI proofs tend to be very ugly).
xvilka 4 hours ago
Lean is a great idea, especially the 4th version, a huge level up from the 3rd one, but its core is still deficient[1] in some particular scenarios (see an interesting discussion[2] in the Rocq (formerly Coq) issue tracker). Not sure if it might hinder the automation with the AI. [1] <a href="https://artagnon.com/logic/leancoq" rel="nofollow">https://artagnon.com/logic/leancoq</a> [2] <a href="https://github.com/rocq-prover/rocq/issues/10871" rel="nofollow">https://github.com/rocq-prover/rocq/issues/10871</a>
- joomy 2 hours ago
 The issue was a fun read, thanks for sharing.
lo_zamoyski 4 minutes ago
This has been the approach taken by some using LLMs, even in less type-heavy situations. Of course, it is part of a broader tradition in which search is combined with verification. Genetic programming and related areas come to mind. Here, LLMs are search, while Lean is used to express constraints.
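The search-plus-verification split can be sketched in a few lines of Lean 4. This is only an illustration: `candidates` is a hypothetical stand-in for whatever an LLM proposes, and `satisfies` is an assumed toy constraint, not anything from the article.

```lean
-- Hypothetical stand-in for an untrusted generator (the "search" side).
def candidates : List Nat := [3, 5, 7, 9]

-- The "constraint" side: a checker that accepts only valid answers.
def satisfies (n : Nat) : Bool := n * n == 49

-- Keep the first candidate the checker accepts.
#eval candidates.find? satisfies  -- some 7

-- The same idea as a proof: the witness 7 can come from anywhere,
-- and Lean only has to check that it meets the stated constraint.
example : ∃ n : Nat, n * n = 49 := ⟨7, rfl⟩
```

The generator can be arbitrarily unreliable here, because nothing it proposes is accepted until the checker agrees.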
SteveJS 3 hours ago
I am using Lean as part of the prd.md description handed to a coding agent. The definitions in Lean compile and mean exactly what I want them to say. The implementation I want to build is in Rust. HOWEVER … I hit something I now call a McLuhan vortex error: “When a tool, language, or abstraction smuggles in an implied purpose at odds with your intended goal.” Using Lean implies to the coding agent that ‘proven’ is a pervasive goal. I want to use Lean to be more articulate about the goal. Instead, using Lean smuggled in a difficult-to-remove implicit requirement that everything everywhere must be proven. This was obvious because the definitions I made in Lean imply the exact opposite of “everything needs to be proven”. When I use morphism I mean anything that is a morphism, not only things proven to be morphisms. A coding agent driven by an LLM needs a huge amount of structure to use what the math says rather than assume that, because it is using a proof system, everything everywhere is better if proven. The initial way I used Lean poisoned the satisficing structure that unfolds during a coding pass.
- mycall 1 hour ago
 Could you put that distinction into the AGENTS.md file so it will understand and follow that nuance?
kig 4 hours ago
If you want to mess with this at home, I've been vibe coding <a href="https://github.com/kig/formalanswer" rel="nofollow">https://github.com/kig/formalanswer</a> to plug theorem provers into an LLM call loop. It's pretty early dev but it does have a logic rap battle mode.
- nl 1 hour ago
 This is pretty interesting!
throwaway2027 6 hours ago
I think I saw Terence Tao use a formal proof language, but I don't remember if it was Lean. I'm not familiar with it. I do agree that moving to provable languages could improve AI, but isn't the basis just having some immutable, rigorous set of tests, which could be replicated in "regular" programming languages?
- iNic 5 hours ago
 You can think of theorem provers as really crazy type checkers. It's not just a handful of tests that have to run, but more like a program that has to compile.
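 A minimal Lean 4 sketch of the difference between a test and a theorem, assuming a recent toolchain where the `omega` tactic is available; `double` is a made-up function for illustration.

```lean
-- A made-up function to reason about.
def double (n : Nat) : Nat := n + n

-- A unit test: checks a single input.
example : double 3 = 6 := rfl

-- A theorem: covers every input. The file only compiles
-- if the proof type-checks.
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```

 If the definition of `double` were changed to something wrong, the theorem would stop compiling, much like a type error would.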
 - seanhunter 5 hours ago
 Yes exactly. There is this thing called the “Curry-Howard isomorphism” which (as I understand it) says that propositions in formal logic are isomorphic to types. So the “calculus of constructions” is a typed lambda calculus based on this that makes it possible for you to state some proposition as a type, and if you can instantiate that type then what you have done is isomorphic to proving the proposition. Most proof assistants (and certainly Lean) are based on this. So although Lean 4 is a programming language that people can use to write “normal” programs, when you use it as a proof assistant this is what you are doing: stating propositions and then using a combination of a (very extensive) library of previous results, your own ingenuity using the builtins of the language and (in my experience anyway) a bunch of brute force to instantiate the type, thus proving the proposition.
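 A small sketch of "propositions as types" in Lean 4. The names `swapPair`, `A` and `B` are illustrative only; the point is that the proof and the program have the same shape.

```lean
-- An ordinary program: swap the components of a pair.
def swapPair {α β : Type} (p : α × β) : β × α := (p.2, p.1)

-- The same shape as a proof: from a proof of A ∧ B, build a proof of B ∧ A.
-- Instantiating the type `A ∧ B → B ∧ A` is what proves the proposition.
example (A B : Prop) : A ∧ B → B ∧ A :=
  fun ⟨ha, hb⟩ => ⟨hb, ha⟩

-- Implication is a function type, so modus ponens is function application.
example (A B : Prop) (h : A → B) (a : A) : B :=
  h a
```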
- seanhunter 5 hours ago
 It was Lean 4. In fact he has made Lean 4 versions of all of the proofs in his Analysis I textbook available here: <a href="https://github.com/teorth/analysis" rel="nofollow">https://github.com/teorth/analysis</a> He also has blogged about how he uses Lean for his research. Edit to add: Looking at that repo, one thing I like (but others may find infuriating, idk) is that where in the text he leaves certain proofs as exercises for the reader, in the repo he turns those into “sorry”s, so you can fork the repo and have a go at proving those things in Lean yourself. If you have some proposition which you need to use as the basis of further work but you haven’t completed a formal proof of yet, in Lean you can just state the proposition with the proof being “sorry”. Lean will then proceed as though that proposition had been proved, except that it will give you a warning saying that you have a sorry. For something to be proved in Lean you have to have it done without any “sorry”s. <a href="https://lean-lang.org/doc/reference/latest/Tactic-Proofs/Tactic-Reference/#sorry" rel="nofollow">https://lean-lang.org/doc/reference/latest/Tactic-Proofs/Tac...</a>
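 A minimal sketch of the `sorry` workflow. The theorem names are hypothetical and the statement is deliberately simple; it is not taken from the Analysis repo.

```lean
-- An "exercise for the reader": stated, with the proof left as `sorry`.
-- Lean accepts it but warns that the declaration uses `sorry`.
theorem exercise (a b : Nat) : a + b = b + a := sorry

-- Later results can build on it in the meantime.
theorem uses_exercise (x : Nat) : x + 1 = 1 + x :=
  exercise x 1

-- Shows that `uses_exercise` still depends on `sorryAx`,
-- i.e. it does not yet count as proved.
#print axioms uses_exercise
```

 Replacing the `sorry` with a real proof (for instance `Nat.add_comm a b`) removes the dependency without touching anything downstream.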
- gaogao 5 hours ago
 Yes, though often the easiest way to replicate it in regular programming languages is to translate that language to Lean or another ITP, though auto-active verifiers like Verus are used for Rust pretty successfully. Python and C, though, have enough nasal demons and undefined behavior that it's a huge pain to verify things about them, since some random other thread can drive by and modify memory in another thread.
tokenless 4 hours ago
> Large language models (LLMs) have astounded the world with their capabilities, yet they remain plagued by unpredictability and hallucinations – confidently outputting incorrect information. In high-stakes domains like finance, medicine or autonomous systems, such unreliability is unacceptable.
This misses a point that software engineers intimately know, especially ones using AI tools:
* Proofs are one QA tool
* Unit tests, integration tests and browser automation are other tools.
* Your code can have bugs because it fails a test above BUT...
* You may have got the requirements wrong!
Working with Claude Code you can have productive loops getting it to assist you in writing tests, finding bugs you hadn't spotted and generally hardening your code. It takes taste, and dev experience definitely helps (as of Jan 26). So I think hallucinations and proofs as the fix is a bit barking up the wrong tree. The solution to hallucinations is careful shaping of the agent environment around the project to ensure quality. Proofs may be part of the QA toolkit for AI-coded projects, but probably rarely.
zmgsabst 5 hours ago
The real value is in mixed mode:
- Lean supports calling out as a tactic, allowing you to call LLMs or other AI as judges (i.e., they return a judgment about a claim)
- Lean can combine these judgments from external systems according to formal theories (i.e., normal proof mechanics)
- an LLM engaged in higher-order reasoning can decompose its thinking into such logical steps of fuzzy blocks
- this can be done recursively, e.g., having a step that replaces LLM judgments with further logical formulations of fuzzy judgments from the LLM
Something, something, sheaves.
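One way to picture the "combine judgments formally" step, as a loose sketch only: external judgments are modeled as plain hypotheses, and Lean checks that the combination is valid. `ClaimA`, `ClaimB` and `Conclusion` are hypothetical placeholders, and the call-out to an LLM from a tactic is not shown here.

```lean
-- hA and hB stand in for judgments returned by an external judge.
-- `rule` is the formal theory saying how they combine.
example (ClaimA ClaimB Conclusion : Prop)
    (hA : ClaimA) (hB : ClaimB)
    (rule : ClaimA → ClaimB → Conclusion) : Conclusion :=
  rule hA hB
```

Lean vouches only for the combination step; the result is as trustworthy as the assumed judgments `hA` and `hB`, which is where the recursive refinement described above would come in.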
nudpiedo 5 hours ago
I like a lot of the ideas behind such theorem provers; however, I always have issues with them producing code compatible with other languages. This happened to me with Idris and many others: I took some time to learn the basics, wrote some examples, and then the FFI was a joke or the code generators for JavaScript were absolutely useless. So there was no way of leveraging an existing ecosystem.
- seanhunter 5 hours ago
 Lean has standard C ABI FFI support. <a href="https://lean-lang.org/doc/reference/latest/Run-Time-Code/Foreign-Function-Interface/" rel="nofollow">https://lean-lang.org/doc/reference/latest/Run-Time-Code/For...</a>
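 For a rough idea of what a binding looks like on the Lean side, here is a minimal sketch. `my_add` is a hypothetical C symbol invented for this example, and the build configuration needed to compile and link the C file is assumed and not shown.

```lean
-- Hypothetical binding: `my_add` is an assumed C symbol, not a real library function.
-- Assumed C side, compiled and linked separately:
--   uint32_t my_add(uint32_t a, uint32_t b) { return a + b; }
@[extern "my_add"]
opaque myAdd (a b : UInt32) : UInt32
```

 Compiled Lean code calls the C function wherever `myAdd` is used, while proofs see `myAdd` only as an opaque constant and can assume nothing about its behavior.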
 - nudpiedo 4 hours ago
 Literally the first line of the link: “The current interface was designed for internal use in Lean and should be considered unstable. It will be refined and extended in the future.” My point is that in order to use these theorem provers you really gotta be sure you need them, otherwise interaction with an external ecosystem might be a dep/compilation nightmare or a bridge over TCP just to use libraries.
- densh 3 hours ago
 Apart from prioritizing FFI (like Java/Scala, Erlang/Elixir), the other two easy ways to bootstrap an integration of a new or relatively obscure programming language are to focus on RPC (FFI through the network) or file input-output (parse and produce well-known file formats to integrate with other tools at the Bash level). I find it very surprising that nobody tried to make something like gRPC the interop story for a new language, with an easy way to write impure "extensions" in other languages and let your pure/formal/dependently typed language implement the rest purely through immutable message passing over the gRPC boundary. Want file I/O? Implement a gRPC endpoint in Go, and let your language send read/write messages to it without having to deal with the antiquated and memory-unsafe POSIX layer.
AxiomLab 5 hours ago
[flagged]
- youoy 5 hours ago
 This site is getting invaded by AI bots... how long before it's just AI speaking with AI, and just people reading the conversations thinking that it's actual people?
 - 5o1ecist 4 hours ago
 There's no need for you to worry about it, MY FELLOW HUMAN [1], because you will never know. [1] <a href="https://old.reddit.com/r/totallynotrobots" rel="nofollow">https://old.reddit.com/r/totallynotrobots</a> PS: Of course that's not true. An ID system for humans will inevitably become mandatory and, naturally, politicians will soon enough create a reason to use it for enforcing a planet-wide totalitarian government watched over by Big Brother. Conspiracy-theory nonsense? Maybe! I'll invite some billionaires to pizza and ask them what they think.
tesserato 4 days ago
[flagged]
- nudpiedo 5 hours ago
 Are you an AI just summarizing the article?
 - pja 5 hours ago
 If you look at their comment history it's quite clear that's what they are. What's the HN stance on AI bots? To me it just seems rude - this is a space for people to discuss topics that interest them & AI contributions just add noise.
 - seanhunter 5 hours ago
 It is very rude as it just wastes everyone’s time and debases the commons. I’m pretty sure it’s also against the guidelines.