Do you love AI coding tools such as ChatGPT and GitHub Copilot? Do you hate that you can’t trust the code they produce?
In some (for now, limited) cases, you can trust their code! Using formal verification tools such as Kani for Rust we can sometimes automatically prove the correctness of AI-generated code with mathematical certainty. My crate RangeSetBlaze now includes some such code.
Rust already provides mathematical certainty for memory management. In this article, you’ll also learn how you can use Kani to expand that certainty to the topic of computer arithmetic and overflows. You’ll find Kani useful for both human- and AI-generated code.
Aside: You may object that, in general, proving a program correct is undecidable. Also, in general, writing a correct specification is as hard as writing a correct program. While these statements are true in general, in many specific domains writing a specification is easy— for example, we just write an obviously correct but inefficient function as our spec. Also, in some useful cases, proving the equivalence of two functions is well within the capabilities of systems such as Kani. Read on for examples.
To understand the promise and limitations of automatic verification of automatically-generated code, I applied ChatGPT and Kani to parts of RangeSetBlaze. RangeSetBlaze is a Rust crate for efficiently manipulating sets of “clumpy” integers. Specifically, I looked at three programming problems with these properties:
- Problem 1: A one-line problem verifiable by Kani but too hard for ChatGPT
- Problem 2: A one-line problem solvable by ChatGPT and verifiable by Kani
- Problem 3: A more interesting problem, too hard for Kani and for ChatGPT
Excitingly, when I benchmarked the ChatGPT/Kani solution to Problem 2, I found it to be 3% faster than my original code. RangeSetBlaze now includes this computer generated and verified code.