This post is an expanded translation of my lecture notes from a Randomized and Approximation Algorithms course that I took, and a more detailed explanation of the topics covered in my video about BEST-SAT.
Basic definitions
An example problem could be minimum spanning trees:
input instances:set of all weighted graphs
permissible inputs:spanning trees for the given weighted graph
utility function:the spanning tree weight (sum of its edges)
we’re minimizing
Definition (Optimalization problem) is a tuple I,F,f,g
set of all input instancesI
sets of permissible inputs∀I∈I:F(I)
utility function∀I∈I,A∈F(I):f(I,A)
whether we’re maximizing or minimizing (a single bit g)
Definition (NP-Optimalization problem) is an optimalization problem I,F,f,g, for which we additionally require that:
the length of all permissible solutions is polynomial
the language of (I,A),I∈I,A∈F(I) is polynomial
we can check the correctness of a solution in polynomial time
f is computable in polynomial time
For minimalization problem, we ensure that the solution is always small enough.
For maximalization problem, we ensure that the solution is always large enough.
Definition: algorithm A is R-approximation, if:
it computes the solution in polynomial time (in terms of ∣I∣)
for minimalization problem: ∀I:f(A)≤R⋅OPT(I)
for maximalization problem: ∀I:f(A)≥OPT(I)/R
MAX-SAT
Input:C1∧…∧Cn, each clause is a disjunction of kj≥1 literals
Output: evaluation a∈{0,1}n of the variables (sometimes called literals)
Goal: maximize the number of satisfied clauses ∑wj
We also assume that:
no literal repeats in a clause
at most one of xi,xi appearas in a clause
RAND-SAT
Algorithm (RAND-SAT):
choose all literals randomly (independently, for p=1/2)
profit?
Theorem:RAND-SAT is a 2-approximation algorithm.
Proof: we’ll create an indicator variable Yj for each clause
the chance that Cj is not satisfied is 2k1
Since the size of the clause k≥1, we get E[Yj]=Pr[Cjis satistied]=1−2k1≥21, thus
E[j=1∑nYj]=of expectationlinearityj=1∑nE[Yj]≥j=1∑n21≥21OPT
LP-SAT
We’re using the optimal solution to the linear program (and generally the formula, if we allow real vlaues for literals) as a guide for our randomized algorithm.
Algorithm (LP-SAT):
build an integer linear program:
variables will be:
yi for each literal
zj for each clause
inequalitites will be one for each clause, in the form zj≤positive∑yi+negative∑(1−yi)
we’ll maximize the number of satisfied clauses ∑zj
relax the program (allow real variables instead of integers) and calculate the optimum y∗,z∗
set literals xi to 1 with probability yi∗
Theorem:LP-SAT is a (1−e1)-approximation algorithm.
To prove this, we’ll use a few lemmas/theorems that aren’t difficult to prove, but aren’t really interesting. I left links to (Wikipedia and I don’t feel bad about it) articles with proofs for each, if you’re interested.
Fact (A - A/G mean inequality):i=1∏nain1≤n1i=1∑nai
Proof (of the main theorem): consider y∗,z∗ and Cj with kj literals; then
Pr[Cjis not satisfied]=i:xi∈Cj∏(1−yi∗)positivei:xi∈Cj∏yi∗negative≤Akj1i:xi∈Cj∑(1−yi∗)+i:xi∈Cj∑yi∗kj=1−kj1i:xi∈Cj∑yi∗+i:xi∈Cj∑(1−yi∗)kj≤(1−kjzj∗)kj
We’re interested in the satisfied ones, so
Pr[Cjis satisfied]≥1−(1−kjzj∗)kjour functionf(zj∗)≥B[1−(1−kj1)kj]zj∗≥C(1−e1)zj∗
To use fact B, we observed that a=f(0)=0 and that the second derivation is non-positive (so the function is concave). Now to formally count how many our program satisfies:
E[j=1∑mYj]=j=1∑mE[Yj]≥j∈U∑Pr[Cjis satisfied]≥j∈U∑(1−e1)zj∗=(1−e1)OPT
BEST-SAT
Algorithm (BEST-SAT):
assign a value of a literal using RAND-SAT with probability 1/2, else use BEST-SAT
have an existential crisis about the fact that this works and is asymptotically optimal
Theorem:BEST-SAT is 43-approximation.
Proof: we want to prove that Pr[Cjis satisfied]≥43zj∗.
Let’s look at the probability that each algorithm satisfies a clause of k variables:
RAND-SAT: 1−2k1 (at least one literal must be satisfied)
LP-SAT: [1−(1−k1)k]zj∗ (the formula right before using fact C)