:PROPERTIES:
:ID: aff79137-b7f4-4795-8838-9fe332a3fb9b
:mtime: 20240902015919 20240816162042 20240816145043 20240816133252 20240816104252 20240815172443
:ctime: 20240815172442
:END:
#+title: Report on Formalised Bit Vector Decision Procedures
#+filetags: :blog:public:project:
#+author: Atticus M. Kuhn
#+OPTIONS: broken-links:mark
* Introduction
In this report, I will survey three [[id:f883fe8c-a2db-4033-8c01-93c9be7c20dc][decision procedure]]s on [[id:ecf57e78-a8a2-4af0-9746-bb84525fd8a5][bit vector]]s.
#+include: ~/org/roam/20240815172704-decision_procedure.org :lines "8-"
I will explain these [[id:f883fe8c-a2db-4033-8c01-93c9be7c20dc][decision procedure]]s, how they work, and also benchmark their
rate of success on real-life benchmarks.
* Tactics
** Bitwise Tactic
The Bitwise tactic is based on the idea of [[id:5ea0735e-03b7-4183-88df-1c098c8b7fcb][bit blasting.]]
The following [[id:9db1a2e1-0330-4905-892c-8ed3b088e6d0][Backus-Naur form]] grammar represents the terms that the bitwise tactic accepts
#+BEGIN_SRC bnf
bitwise_term ::= expr "=" expr
expr :: = literal
| expr "&&&" expr
| expr "|||" expr
| expr "^^^" expr
| "~~~" expr
#+END_SRC
This is key: the bitwise tactic only works over equalities on a fixed/constant bit-width.
Here is an example of an equality where the bitwise tactic would work, in Lean:
#+BEGIN_SRC lean4
example (x y : BitVec 64) : (x ||| x) &&& y = y &&& x := by
bv_decide
#+END_SRC
The bitwise tactic works by bit-blasting the expression into constraints for each bit, and then passing these
constraints to a SAT solver. In this case, we have a verified SAT solver in Lean called Lean-SAT.
** Ring Tactic
The Ring tactic works by putting both sides of the equality in ring-normal-form. Here
is the [[id:9db1a2e1-0330-4905-892c-8ed3b088e6d0][Backus-Naur form]] of expressions that the ring tactic can handle:
#+BEGIN_SRC bnf
ring_term ::= expr "=" expr
expr :: = literal
| expr "+" expr
| expr "-" expr
| expr "*" expr
| "-" expr
| expr "<<<" literal -- x <<< c = x * 2 ^ c
| "~~~" expr --- ~~~ x = - x - 1
#+END_SRC
Here is an example of an equality that can be handled by the ring tactic in Lean:
#+BEGIN_SRC lean4
example (w : Nat) (x y : BitVec w) : ~~~ (x + y) = ~~~ x + ~~~ y + 1 := by
bv_ring
#+END_SRC
Note that the ring tactic can handle expressions over an arbitrary width.
** Automata Tactic
The automata tactic may be compared by analogy to Presburger Arithmetic over the Integers.
The automata tactic works by only considering operations that can be computed using a constant
amount of memory scratch space.
How, exactly, the automata tactic works is a complicated subject.
Below is the [[id:9db1a2e1-0330-4905-892c-8ed3b088e6d0][Backus-Naur form]] of expressions accepted by the automata procedure:
#+BEGIN_SRC bnf
automata_term ::= expr "=" expr
expr :: = literal
| expr "+" expr
| expr "-" expr
| "-" expr
| expr "<<<" literal -- x <<< c = x + x + x + .... + x
| expr "*" literal -- x * c = x + x + x + .... + x
| literal "*" expr -- c * x = x + x + x + .... + x
| expr "&&&" expr
| expr "|||" expr
| expr "^^^" expr
| "~~~" expr
#+END_SRC
For example, negation may be computed with one extra bit of scratch space, so it is handled by the automata.
However, multiplication may take up scratch space dependent on the size of its inputs, so it is not handled in
the general case by the automata tactic (although multiplication by a constant can just be re-written as repeated addition)
* Benchmarks
** Hacker's Delight Benchmark
#+include: ~/org/roam/20240816133158-hacker_s_delight.org :lines "8-"
Some of the bit-twiddling theorems from [[id:adb55e19-89c4-4c5b-9988-3ad5be323083][Hacker's Delight]] were manually translated into Lean Theorems.
These theorems came from chapters two and three of the book.
We then collected data using our tactics.
| | # | Not Equalities | Unsupported Operations | Succeeded |
|-----------------+----+-----------------+------------------------+-----------|
| Ring | 86 | 53 | 30 | 3 |
| Bitwise | 86 | 53 | 29 | 4 |
| Automata | 86 | 53 | 9 | 24 |
| Ring + Automata | 86 | 53 | 9 | 24 |
** LLVM InstCombine Benchmark
Instruction Combination, or InstCombine, is the largest part of the LLVM optimisations.
In InstCombine, instructions are locally re-written in thousands of cheap re-writes.
For example, InstCombie may re-write
#+BEGIN_SRC c
// before InstCombine
y = 2 * x;
// after InstCombine
y = x + x;
#+END_SRC
Such an optimisation carries the implicit proof that ~2 * x = x + x~.
In the InstCombine benchmark, the number of files is:
#+BEGIN_SRC bash :exports results
ls ~/coding/llvm-project-main/llvm/test/Transforms/InstCombine | wc -l
#+END_SRC
#+RESULTS:
: 1473
Within those files, the number of functions is
#+BEGIN_SRC bash :exports results
grep "define" -R ~/coding/llvm-project-main/llvm/test/Transforms/InstCombine | wc -l
#+END_SRC
#+RESULTS:
: 33558
We optimise these files using the shell command ~opt -passes=instcombine -S~. Using xDSL, we filter only those
functions which contain the operations we are interested in (arithmetic and bitwise operations).
From that, the number of Lean-understandable theorems we extract is:
#+BEGIN_SRC bash :exports results
grep "BEGIN" -R ~/coding/lean-mlir/SSA/Projects/InstCombine/tests/LLVM | wc -l
#+END_SRC
#+RESULTS:
: 951
Of these, the number of theorems automatically solved by unfolding the definition is:
#+BEGIN_SRC bash :exports results
grep "try extract_goal" -R ~/coding/lean-mlir/SSA/Projects/InstCombine/tests/LLVM | wc -l
#+END_SRC
#+RESULTS:
: 207
This means that the number of non-trivial theorems in this benchmark is:
#+BEGIN_SRC bash :exports results
grep "apply" -R ~/coding/lean-mlir/SSA/Projects/InstCombine/tests/LLVM | wc -l
#+END_SRC
#+RESULTS:
: 744
Of those, we exclude some theorems because they take too long to typecheck. This gives us the final comparison:
| | # | Not Equalities | Unsupported Operations | Succeeded |
|-----------------+-----+-----------------+------------------------+-----------|
| Ring | 717 | 179 | 478 | 58 |
| Bitwise | 717 | 179 | 332 | 206 |
| Automata | 717 | 179 | 293 | 245 |
| Ring + Automata | 717 | 179 | 246 | 290 |
Thus, on the InstCombine Benchmark, we see that ~automata~ solves the most theorems.