YouTube, released 20. 5. 2022, updated 19. 7. 2023; available in [PDF]
This post is an expanded translation of my lecture notes from a Randomized and Approximation Algorithms course that I took, and a more detailed explanation of the topics covered in my video about BEST-SAT.
An example problem could be minimum spanning trees:
- input instances: set of all weighted graphs
- permissible solutions: spanning trees of the given weighted graph
- utility function: the spanning tree weight (sum of its edge weights)
- we’re minimizing
Definition (Optimization problem) is a tuple $(\mathbb{I}, \mathbb{F}, f, \mathrm{opt})$:
- $\mathbb{I}$ – set of all input instances
- $\mathbb{F}(x)$ – sets of permissible solutions (for each input instance $x \in \mathbb{I}$)
- $f(x, y)$ – utility function
- $\mathrm{opt}$ – whether we're maximizing or minimizing (a single bit)
Definition (NP-Optimization problem) is an optimization problem $(\mathbb{I}, \mathbb{F}, f, \mathrm{opt})$, for which we additionally require that:
- the length of all permissible solutions is polynomial: $\forall y \in \mathbb{F}(x): |y| \le \mathrm{poly}(|x|)$
- the language $\mathbb{I}$ of input instances is decidable in polynomial time
- we can check the correctness of a solution in polynomial time (decide whether $y \in \mathbb{F}(x)$)
- $f$ is computable in polynomial time
For a minimization problem, we ensure that the solution is always small enough (at most a multiple of the optimum).
For a maximization problem, we ensure that the solution is always large enough (at least a fraction of the optimum).
Definition: algorithm $A$ is an $R$-approximation, if:
- it computes a solution in polynomial time (in terms of $|x|$)
- for a minimization problem: $f(A(x)) \le R \cdot \mathrm{OPT}(x)$ (where $R \ge 1$)
- for a maximization problem: $f(A(x)) \ge R \cdot \mathrm{OPT}(x)$ (where $R \le 1$)
- Input: a CNF formula $\varphi$ with variables $x_1, \ldots, x_n$ and clauses $C_1, \ldots, C_m$, where each clause is a disjunction of literals (a literal is a variable or its negation)
- Output: an evaluation (truth assignment) of the variables
- Goal: maximize the number of satisfied clauses
We also assume that:
- no literal repeats in a clause
- at most one of $x_i, \neg x_i$ appears in a clause
- choose the value of each variable uniformly at random (independently, $\Pr[x_i = 1] = \frac{1}{2}$ for $i = 1, \ldots, n$)
Theorem: RAND-SAT is a $\frac{1}{2}$-approximation algorithm.
Proof: we'll create an indicator variable $Y_j$ for each clause $C_j$, with $Y_j = 1$ if $C_j$ is satisfied and $Y_j = 0$ otherwise.
- the chance that a clause $C_j$ with $k_j$ literals is not satisfied is $2^{-k_j}$ (every one of its literals must be unsatisfied)

Since the size of the clause is $k_j \ge 1$, we get $\Pr[Y_j = 1] = 1 - 2^{-k_j} \ge \frac{1}{2}$, thus

$$\mathbb{E}\left[\sum_{j} Y_j\right] = \sum_{j} \Pr[Y_j = 1] \ge \frac{m}{2} \ge \frac{\mathrm{OPT}}{2}$$
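The algorithm above can be sketched in a few lines of Python and checked empirically. The clause encoding and helper names here are my own choice (DIMACS-style, not from the notes): a clause is a list of non-zero integers, where $i$ stands for $x_i$ and $-i$ for $\neg x_i$.

```python
import random

def rand_sat(num_vars):
    """RAND-SAT: assign each variable True/False independently with probability 1/2."""
    return {i: random.random() < 0.5 for i in range(1, num_vars + 1)}

def count_satisfied(assignment, clauses):
    """A clause is satisfied if at least one of its literals evaluates to True."""
    return sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

# (x1 ∨ x2) ∧ (¬x1 ∨ x3) ∧ (¬x2 ∨ ¬x3): each 2-literal clause is satisfied
# with probability 1 - 2^(-2) = 3/4, so the expected count is 3 · 3/4 = 2.25.
clauses = [[1, 2], [-1, 3], [-2, -3]]
trials = 20_000
average = sum(count_satisfied(rand_sat(3), clauses) for _ in range(trials)) / trials
print(average)  # close to 2.25, well above m/2 = 1.5
```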
We're using the optimal solution to the linear program (and generally to the formula, if we allow real values for the variables) as a guide for our randomized algorithm.
- build an integer linear program:
    - the variables will be:
        - $y_i \in \{0, 1\}$ for each variable $x_i$
        - $z_j \in \{0, 1\}$ for each clause $C_j$
    - the inequalities will be one for each clause $C_j$, in the form $$\sum_{x_i \in C_j^+} y_i + \sum_{x_i \in C_j^-} (1 - y_i) \ge z_j$$ where $C_j^+$ are its positive and $C_j^-$ its negated literals
    - we'll maximize the number of satisfied clauses, i.e. $\sum_{j} z_j$
- relax the program (allow real values $y_i, z_j \in [0, 1]$ instead of integers) and calculate the optimum $y^*, z^*$
- set each variable $x_i$ to true with probability $y_i^*$
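The steps above can be sketched as follows. Solving the relaxed program is delegated to any off-the-shelf LP solver (not shown); the snippet only builds the per-clause inequality and performs the randomized rounding. The clause encoding and function names are mine, not from the notes.

```python
import random

# Hypothetical encoding: a clause is a list of non-zero integers,
# i meaning the variable x_i and -i its negation.

def clause_inequality(clause):
    """Build the LP constraint of one clause C_j in the form
         sum(coeffs[i] * y_i) + const >= z_j,
       i.e. positive literals contribute y_i and negated ones (1 - y_i)."""
    coeffs, const = {}, 0
    for lit in clause:
        if lit > 0:
            coeffs[lit] = coeffs.get(lit, 0) + 1      # + y_i
        else:
            coeffs[-lit] = coeffs.get(-lit, 0) - 1    # + (1 - y_i)
            const += 1
    return coeffs, const

def round_solution(y_star):
    """Randomized rounding: set x_i to True with probability y_i*."""
    return {i: random.random() < y for i, y in y_star.items()}

# For (x1 ∨ ¬x2 ∨ x3), the inequality is y_1 + (1 - y_2) + y_3 >= z_j:
coeffs, const = clause_inequality([1, -2, 3])
print(coeffs, const)  # {1: 1, 2: -1, 3: 1} and 1
```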
Theorem: LP-SAT is a $\left(1 - \frac{1}{e}\right)$-approximation algorithm.
To prove this, we’ll use a few lemmas/theorems that aren’t difficult to prove, but aren’t really interesting. I left links to (Wikipedia and I don’t feel bad about it) articles with proofs for each, if you’re interested.
Fact (A - A/G mean inequality): for non-negative reals $a_1, \ldots, a_k$, $$\frac{a_1 + \cdots + a_k}{k} \ge \sqrt[k]{a_1 \cdots a_k}$$
Fact (B - Jensen's inequality): if a function $f$ is concave on the interval $[0, 1]$ and $x \in [0, 1]$, then $$f(x) \ge f(0) + \left(f(1) - f(0)\right) \cdot x$$
Fact (C - 1/e inequality): for $k \ge 1$, $$\left(1 - \frac{1}{k}\right)^k \le \frac{1}{e}$$
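None of these facts need to be taken purely on faith; here's a quick numeric sanity check in Python (the sample tuples and ranges are my own choice, and this is of course a spot check, not a proof):

```python
import math

# Fact A (A/G mean inequality) on a few tuples of non-negative reals:
for a in ([1, 4], [2, 3, 7], [0.5, 0.5, 2, 8]):
    assert sum(a) / len(a) >= math.prod(a) ** (1 / len(a))

# Fact B applied to g(z) = 1 - (1 - z/k)^k, which is concave with g(0) = 0,
# so the fact gives g(z) >= g(1) * z on [0, 1]:
k = 3
g = lambda z: 1 - (1 - z / k) ** k
assert all(g(t / 100) >= g(1) * (t / 100) - 1e-12 for t in range(101))

# Fact C: (1 - 1/k)^k increases towards 1/e but never exceeds it:
values = [(1 - 1 / k) ** k for k in range(1, 200)]
assert all(v <= 1 / math.e for v in values)
assert values == sorted(values)
print("all checks pass")
```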
Proof (of the main theorem): consider a clause $C_j$ with $k_j$ literals; then

$$\Pr[C_j \text{ not satisfied}] = \prod_{x_i \in C_j^+} (1 - y_i^*) \prod_{x_i \in C_j^-} y_i^* \overset{(A)}{\le} \left(\frac{\sum_{x_i \in C_j^+} (1 - y_i^*) + \sum_{x_i \in C_j^-} y_i^*}{k_j}\right)^{k_j} \le \left(1 - \frac{z_j^*}{k_j}\right)^{k_j}$$

where the last inequality uses the LP constraint for $C_j$. We're interested in the satisfied ones, so

$$\Pr[C_j \text{ satisfied}] \ge 1 - \left(1 - \frac{z_j^*}{k_j}\right)^{k_j} \overset{(B)}{\ge} \left(1 - \left(1 - \frac{1}{k_j}\right)^{k_j}\right) z_j^* \overset{(C)}{\ge} \left(1 - \frac{1}{e}\right) z_j^*$$
To use fact $B$, we observed that $g(z) = 1 - \left(1 - \frac{z}{k}\right)^k$ satisfies $g(0) = 0$ and that its second derivative is non-positive (so the function is concave). Now to formally count how many clauses our program satisfies in expectation:

$$\mathbb{E}\left[\sum_j Y_j\right] = \sum_j \Pr[C_j \text{ satisfied}] \ge \left(1 - \frac{1}{e}\right) \sum_j z_j^* = \left(1 - \frac{1}{e}\right) \mathrm{OPT}_{LP} \ge \left(1 - \frac{1}{e}\right) \mathrm{OPT}$$
- flip a fair coin: with probability $\frac{1}{2}$, assign the values of all variables using RAND-SAT, else use LP-SAT
- have an existential crisis about the fact that this works and is asymptotically optimal
Theorem: BEST-SAT is a $\frac{3}{4}$-approximation.
Proof: we want to prove that $\Pr[C_j \text{ satisfied}] \ge \frac{3}{4} z_j^*$.
Let's look at the probability that each algorithm satisfies a clause $C_j$ of $k_j$ variables:
- RAND-SAT: $\Pr[C_j \text{ satisfied}] = 1 - 2^{-k_j} \ge \left(1 - 2^{-k_j}\right) z_j^*$ (at least one literal must be satisfied, and $z_j^* \le 1$)
- LP-SAT: $\Pr[C_j \text{ satisfied}] \ge \left(1 - \left(1 - \frac{1}{k_j}\right)^{k_j}\right) z_j^*$ (the formula right before using fact C)
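These two per-clause guarantees (as coefficients of $z_j^*$) and their average can be tabulated for small $k_j$ in a few lines of Python:

```python
# Coefficients of z_j* in the per-clause satisfaction guarantees:
def rand_sat_coeff(k):
    return 1 - 2 ** -k            # RAND-SAT: 1 - 2^(-k)

def lp_sat_coeff(k):
    return 1 - (1 - 1 / k) ** k   # LP-SAT: 1 - (1 - 1/k)^k

for k in range(1, 6):
    avg = (rand_sat_coeff(k) + lp_sat_coeff(k)) / 2
    print(k, round(rand_sat_coeff(k), 3), round(lp_sat_coeff(k), 3), round(avg, 3))

# k = 1: coefficients 0.5 and 1.0, average exactly 3/4
# k = 2: coefficients 0.75 and 0.75, average exactly 3/4
# k >= 3: the average only grows, staying strictly above 3/4
```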
Now the proof boils down to the following table: