Approximating E3-LIN is NP-Hard

This lecture, which I gave for my 18.434 seminar, focuses on the MAX-E3LIN problem. We prove that approximating it is NP-hard by a reduction from LABEL-COVER.

1. Introducing MAX-E3LIN

In the MAX-E3LIN problem, our input is a system of linear equations {\pmod 2} in {n} binary variables, each equation having exactly three terms. Equivalently, one can think of this as {\pm 1} variables and ternary products. The objective is to maximize the fraction of satisfied equations.

Example 1 (Example of MAX-E3LIN instance)

\displaystyle \begin{aligned} x_1 + x_3 + x_4 &\equiv 1 \pmod 2 \\ x_1 + x_2 + x_4 &\equiv 0 \pmod 2 \\ x_1 + x_2 + x_5 &\equiv 1 \pmod 2 \\ x_1 + x_3 + x_5 &\equiv 1 \pmod 2 \end{aligned}

\displaystyle \begin{aligned} x_1 x_3 x_4 &= -1 \\ x_1 x_2 x_4 &= +1 \\ x_1 x_2 x_5 &= -1 \\ x_1 x_3 x_5 &= -1 \end{aligned}

A diligent reader can check that we may obtain {\frac34} but not {1}.
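One can also verify this claim by brute force; here is a quick sketch in Python, encoding the {\pm 1} form of the instance above:

```python
from itertools import product

# The four equations above in +-1 form: ((i, j, k), rhs) means
# the constraint x_i * x_j * x_k = rhs.
equations = [((1, 3, 4), -1), ((1, 2, 4), +1),
             ((1, 2, 5), -1), ((1, 3, 5), -1)]

best = 0
for assignment in product([-1, +1], repeat=5):   # values of x_1, ..., x_5
    x = dict(zip(range(1, 6), assignment))
    satisfied = sum(x[i] * x[j] * x[k] == rhs
                    for (i, j, k), rhs in equations)
    best = max(best, satisfied)

print(best / len(equations))  # 0.75
```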

Remark 2

We immediately notice that

  • If there’s a solution with value {1}, we can find it easily with {\mathbb F_2} linear algebra (see the sketch after this list).
  • It is always possible to get at least {\frac{1}{2}} by selecting the better of all-zero and all-one, since each equation (having an odd number of terms) is satisfied by exactly one of the two.
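For the first observation, here is a minimal sketch of the {\mathbb F_2} linear algebra, with each equation encoded as a bitmask of its variables (an encoding of my own choosing):

```python
def solve_f2(equations, n):
    """Decide whether ALL equations can be satisfied, by Gaussian
    elimination over F_2. Each equation is (mask, rhs): the XOR of the
    variables whose bits are set in mask equals rhs. Returns a 0/1
    assignment, or None if the system is inconsistent."""
    pivots = {}  # pivot bit -> (mask, rhs); mask's top bit is the pivot
    for mask, rhs in equations:
        for bit in sorted(pivots, reverse=True):  # reduce by known pivots
            if (mask >> bit) & 1:
                pmask, prhs = pivots[bit]
                mask, rhs = mask ^ pmask, rhs ^ prhs
        if mask == 0:
            if rhs:          # reduced to 0 = 1: no exact solution
                return None
            continue         # redundant equation
        pivots[mask.bit_length() - 1] = (mask, rhs)
    x = [0] * n              # free variables default to 0
    for bit in sorted(pivots):   # back-substitute, lowest pivot first
        mask, rhs = pivots[bit]
        parity = 0
        for i in range(bit):     # all other variables in the row are lower
            if (mask >> i) & 1:
                parity ^= x[i]
        x[bit] = rhs ^ parity
    return x

# The example instance; bit i - 1 stands for x_i.
eqs = [(0b01101, 1), (0b01011, 0), (0b10011, 1), (0b10101, 1)]
print(solve_f2(eqs, 5))  # None: the example is not fully satisfiable
```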

The theorem we will prove today is that these “obvious” observations are essentially the best ones possible! Our main result is that improving the above constants to 51% and 99%, say, is NP-hard.

Theorem 3 (Hardness of MAX-E3LIN)

The {\frac{1}{2}+\varepsilon} vs. {1-\delta} decision problem for MAX-E3LIN is NP-hard.

This means it is NP-hard to decide whether a MAX-E3LIN instance has value {\le \frac{1}{2}+\varepsilon} or {\ge 1-\delta} (given that it is one or the other). A direct corollary is that approximating MAX-SAT is also NP-hard.

Corollary 4

The {\frac78+\varepsilon} vs. {1-\delta} decision problem for MAX-SAT is NP-hard.

Remark 5

The constant {\frac78} is optimal in light of a random assignment. In fact, one can replace {1-\delta} with {1}, but we don’t do so here.

Proof: Given an equation {a+b+c=1} in MAX-E3LIN, we consider the four formulas {a \lor b \lor c}, {a \lor \neg b \lor \neg c}, {\neg a \lor b \lor \neg c}, {\neg a \lor \neg b \lor c}. Either three or four of them are satisfied, with four occurring exactly when {a+b+c \equiv 1 \pmod 2}. One does a similar construction for an equation {a+b+c=0}. \Box
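The case analysis in this gadget is easy to check by brute force; a quick sketch:

```python
from itertools import product

# The four clauses encoding a + b + c = 1 (mod 2); True marks a
# negated literal.
clauses = [((0, False), (1, True),  (2, True)),
           ((0, True),  (1, False), (2, True)),
           ((0, True),  (1, True),  (2, False)),
           ((0, False), (1, False), (2, False))]

for vals in product([0, 1], repeat=3):
    sat = sum(any(vals[i] != neg for i, neg in clause) for clause in clauses)
    # all four clauses hold exactly when a + b + c is odd, else three
    assert sat == (4 if sum(vals) % 2 == 1 else 3)
print("gadget verified")
```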

The hardness of MAX-E3LIN is relevant to the PCP theorem: using MAX-E3LIN gadgets, Håstad was able to prove a very strong version of the PCP theorem, in which the verifier reads just three bits of a proof!

Theorem 6 (Håstad PCP)

Let {\varepsilon, \delta > 0}. We have

\displaystyle \mathbf{NP} \subseteq \mathbf{PCP}_{1-\delta,\ \frac{1}{2}+\varepsilon}\left( O(\log n), 3 \right).

In other words, any {L \in \mathbf{NP}} has a (non-adaptive) verifier with the following properties.

  • The verifier uses {O(\log n)} random bits, and queries just three (!) bits.
  • The acceptance condition is either {a+b+c=1} or {a+b+c=0}.
  • If {x \in L}, then there is a proof {\Pi} which is accepted with probability at least {1-\delta}.
  • If {x \notin L}, then every proof is accepted with probability at most {\frac{1}{2} + \varepsilon}.

2. Label Cover

We will prove our main result by reducing from LABEL-COVER. Recall how LABEL-COVER is played: we have a bipartite graph {G} with parts {U} and {V}, a set of keys {K} for the vertices of {U}, and a set of labels {L} for the vertices of {V}. For every edge {e = \{u,v\}} there is a function {\pi_e : L \rightarrow K} specifying the key {k = \pi_e(\ell) \in K} compatible with each label {\ell \in L}. The goal is to assign a key to every vertex of {U} and a label to every vertex of {V}, maximizing the fraction of edges {e = \{u,v\}} whose endpoints receive a compatible key-label pair.

Approximating LABEL-COVER is NP-hard:

Theorem 7 (Hardness of LABEL-COVER)

The {\eta} vs. {1} decision problem for LABEL-COVER is NP-hard for every {\eta > 0}, provided {|K|} and {|L|} are sufficiently large in terms of {\eta}.

So for any {\eta > 0}, it is NP-hard to decide whether one can satisfy all the edges or at most an {\eta} fraction of them.

3. Setup

We are going to make a reduction of the following shape:

[Figure: schematic of the reduction, mapping a LABEL-COVER instance to a MAX-E3LIN instance.]

In words this means that

  • “Completeness”: If the LABEL-COVER instance is completely satisfiable, then we get a solution of value {\ge 1 - \delta} in the resulting MAX-E3LIN.
  • “Soundness”: If the LABEL-COVER instance has value {\le \eta}, then we get a solution of value {\le \frac{1}{2} + \varepsilon} in the resulting MAX-E3LIN.

Thus given an oracle for MAX-E3LIN decision, we can obtain {\eta} vs. {1} decision for LABEL-COVER, which we know is hard.

The setup for this is quite involved, using a huge number of variables. Just to agree on some conventions:

Definition 8 (“Long Code”)

A {K}-indexed binary string {x = (x_k)_k} is a {\pm 1} sequence indexed by {K}. We can think of it as an element of {\{\pm 1\}^K}. An {L}-indexed binary string {y = (y_\ell)_\ell} is defined similarly.

Now we initialize {|U| \cdot 2^{|K|} + |V| \cdot 2^{|L|}} variables:

  • At every vertex {u \in U}, we will create {2^{|K|}} binary variables, one for every {K}-indexed binary string. It is better to collect these variables into a function

    \displaystyle f_u : \{\pm1\}^K \rightarrow \{\pm1\}.

  • Similarly, at every vertex {v \in V}, we will create {2^{|L|}} binary variables, one for every {L}-indexed binary string, and collect these into a function

    \displaystyle g_v : \{\pm1\}^L \rightarrow \{\pm1\}.

Picture:

[Figure: an edge {e = \{u,v\}}, with the function {f_u} attached to {u} and {g_v} attached to {v}.]

Next we generate the equations. Here’s the motivation: we want to do this in such a way that given a satisfying labelling for LABEL-COVER, nearly all the MAX-E3LIN equations can be satisfied. One idea is as follows: for every edge {e}, letting {\pi = \pi_e},

  • Take a {K}-indexed binary string {x = (x_k)_k} at random. Take an {L}-indexed binary string {y = (y_\ell)_\ell} at random.
  • Define the {L}-indexed binary string {z = (z_\ell)_\ell} by {z_\ell = x_{\pi(\ell)} y_\ell}.
  • Write down the equation {f_u(x) g_v(y) g_v(z) = +1} for the MAX-E3LIN instance.

Thus, assuming we had a valid labelling of the graph, we could let {f_u} and {g_v} be the dictator functions for the assigned keys and labels. In that case, {f_u(x) = x_{\pi(\ell)}}, {g_v(y) = y_\ell}, and {g_v(z) = x_{\pi(\ell)} y_\ell}, so the product is always {+1}.

Unfortunately, this has two fatal flaws:

  1. This means a value-{1} instance of LABEL-COVER gives a value-{1} instance of MAX-E3LIN. But a value-{1} instance of MAX-E3LIN can be detected in polynomial time by linear algebra, so we need completeness {1-\delta} for the reduction to have a hope of working.
  2. Right now we could also just set all variables to {+1}, which satisfies every equation regardless of the LABEL-COVER instance.

We fix both flaws by modifying the equations as follows.

Definition 9 (Equations of reduction)

For every edge {e}, with {\pi = \pi_e}, we alter the construction and say

  • Let {x = (x_k)_k} and {y = (y_\ell)_\ell} be random as before.
  • Let {n = (n_\ell)_\ell} be a random {L}-indexed binary string, drawn from a {\delta}-biased distribution (each bit is {-1} independently with probability {\delta}). Now define {z = (z_\ell)_\ell} by

    \displaystyle z_\ell = x_{\pi(\ell)} y_\ell n_\ell .

    The {n_\ell} are “noise” bits, which resolve the first problem by corrupting each bit of {z} independently with probability {\delta}.

  • Write down one of the following two equations with {\frac{1}{2}} probability each:

    \displaystyle \begin{aligned} f_u(x) g_v(y) g_v(z) &= +1 \\ f_u(x) g_v(y) g_v(-z) &= -1. \end{aligned}

    This resolves the second issue: the all-{+1} assignment now satisfies only half of the equations.

This gives a set of {O(|E|)} equations.

I claim this reduction works. So we need to prove the “completeness” and “soundness” claims above.

4. Proof of Completeness

Given a labelling of {G} with value {1}, we simply let {f_u} and {g_v} be the dictator functions corresponding to this valid labelling, as described. Then, as we’ve seen, each equation reduces to the condition {n_\ell = +1} on a single noise bit, so we pass a {1-\delta} fraction of the equations in expectation.
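To see this concretely, here is a sketch of the sampling procedure for a single edge, together with an empirical check of the completeness claim. Everything here (the tiny {K}, {L}, the projection, the chosen key-label pair) is a toy stand-in of my own, not part of the formal reduction:

```python
import random

def sample_equation(pi, K, L, delta):
    """Sample one equation for an edge with projection pi : L -> K.
    Strings are dicts from index to +-1. Returns (x, y, z, rhs),
    encoding the constraint f_u(x) * g_v(y) * g_v(z) = rhs."""
    x = {k: random.choice([-1, +1]) for k in K}
    y = {l: random.choice([-1, +1]) for l in L}
    n = {l: -1 if random.random() < delta else +1 for l in L}  # noise bits
    z = {l: x[pi[l]] * y[l] * n[l] for l in L}
    if random.random() < 0.5:
        return x, y, z, +1
    return x, y, {l: -z[l] for l in L}, -1      # the second equation type

K, L = range(2), range(3)
pi = {0: 0, 1: 1, 2: 0}      # projection of a single edge
key, label = 0, 2            # a compatible pair: pi[2] == 0
f = lambda x: x[key]         # dictator function on the key
g = lambda y: y[label]       # dictator function on the label

delta, trials = 0.05, 50_000
passed = sum(f(x) * g(y) * g(z) == rhs
             for x, y, z, rhs in (sample_equation(pi, K, L, delta)
                                  for _ in range(trials)))
print(passed / trials)       # close to 1 - delta = 0.95
```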

5. A Fourier Computation

Before proving soundness, we will first need to explicitly compute the probability that an equation above is satisfied. Remember we generated an equation for {e} based on the random strings {x}, {y}, {n}.

For {T \subseteq L}, we define

\displaystyle \pi^{\text{odd}}_e(T) = \left\{ k \in K \mid \left\lvert \pi_e^{-1}(k) \cap T \right\rvert \text{ is odd} \right\}.

Thus {\pi^{\text{odd}}_e} maps subsets of {L} to subsets of {K}.

Remark 10

Note that {|\pi^{\text{odd}}(T)| \le |T|} and that {\pi^{\text{odd}}(T) \neq \varnothing} if {|T|} is odd.

Lemma 11 (Edge Probability)

The probability that an equation generated for {e = \{u,v\}} is true is

\displaystyle \frac{1}{2} + \frac{1}{2} \sum_{\substack{T \subseteq L \\ |T| \text{ odd}}} (1-2\delta)^{|T|} \widehat g_v(T)^2 \widehat f_u(\pi^{\text{odd}}_e(T)).

Proof: Omitted for now\dots \Box
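Though the proof is omitted, one can at least sanity-check the lemma numerically on a tiny edge by exact enumeration. Everything below (the projection, the stand-ins for {f_u} and {g_v}) is an arbitrary toy example of my own:

```python
from itertools import product, combinations

K, L = (0, 1), (0, 1, 2)
pi = {0: 0, 1: 1, 2: 0}      # the edge's projection L -> K
delta = 0.1

# two arbitrary +-1 functions standing in for f_u and g_v
f = lambda x: +1 if x[0] + x[1] >= 0 else -1
g = lambda y: +1 if sum(y.values()) > 0 else -1   # majority

def fhat(h, idx, S):
    """Fourier coefficient of a +-1 function h on {+-1}^idx at the set S."""
    total = 0
    for pt in product([-1, +1], repeat=len(idx)):
        v = dict(zip(idx, pt))
        chi = 1
        for i in S:
            chi *= v[i]
        total += h(v) * chi
    return total / 2 ** len(idx)

# Exact satisfaction probability, enumerating x, y and the noise n.
p = 0.0
for xs in product([-1, +1], repeat=len(K)):
    for ys in product([-1, +1], repeat=len(L)):
        for ns in product([-1, +1], repeat=len(L)):
            x, y = dict(zip(K, xs)), dict(zip(L, ys))
            weight = 1.0
            for nl in ns:                 # delta-biased noise weights
                weight *= delta if nl == -1 else 1 - delta
            z = {l: x[pi[l]] * y[l] * n for l, n in zip(L, ns)}
            mz = {l: -zl for l, zl in z.items()}
            ok = 0.5 * (f(x) * g(y) * g(z) == +1) \
               + 0.5 * (f(x) * g(y) * g(mz) == -1)
            p += weight * ok
p /= 2 ** len(K) * 2 ** len(L)

def pi_odd(T):
    return tuple(k for k in K if sum(pi[l] == k for l in T) % 2 == 1)

formula = 0.5
for r in (1, 3):                          # odd-size T only
    for T in combinations(L, r):
        formula += 0.5 * (1 - 2 * delta) ** r \
                       * fhat(g, L, T) ** 2 * fhat(f, K, pi_odd(T))
print(p, formula)                         # the two values agree
```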

6. Proof of Soundness

We will go in the reverse direction and show (constructively) that if the MAX-E3LIN instance has a solution with value {\ge \frac{1}{2}+2\varepsilon}, then we can reconstruct a solution to LABEL-COVER with value {\ge \eta}. (The use of {2\varepsilon} here will become clear in a moment.) This process is called “decoding”.

The idea is as follows: if {S} is a small set of keys such that {\widehat f_u(S)} is large, then we can pick a key for {u} from {S} at random; compare this with the dictator functions, where {\widehat f_u(S) = 1} for a set {S} with {|S| = 1}. We want to do something similar with {T}.

Here are the concrete details. Let {\Lambda = \frac{\log(1/\varepsilon)}{2\delta}} and {\eta = \frac{\varepsilon^3}{\Lambda^2}} be constants (the actual values arise later).

Definition 12

We say that a nonempty set {S \subseteq K} of keys is heavy for {u} if

\displaystyle \left\lvert S \right\rvert \le \Lambda \qquad\text{and}\qquad \left\lvert \widehat{f_u}(S) \right\rvert \ge \varepsilon.

Note that there are at most {\varepsilon^{-2}} heavy sets by Parseval.

Definition 13

We say that a nonempty set {T \subseteq L} of labels is {e}-excellent for {v} if

\displaystyle \left\lvert T \right\rvert \le \Lambda \qquad\text{and}\qquad S = \pi^{\text{odd}}_e(T) \text{ is heavy.}

In particular {S \neq \varnothing} so at least one compatible key-label pair is in {S \times T}.

Notice that, unlike the case with {S}, the criterion for being “excellent” actually depends on the edge {e} in question! This makes it easier to select keys than labels. In order to pick labels, we will have to choose from the distribution {\widehat g_v^2}, which is indeed a probability distribution since {\sum_T \widehat g_v(T)^2 = 1} by Parseval.

Lemma 14 (At least {\varepsilon} of {T} are excellent)

For any edge {e = \{u,v\}}, a set {T} drawn according to the distribution {\widehat g_v^2} is {e}-excellent with probability at least {\varepsilon}.

Proof: Applying an averaging argument to the inequality

\displaystyle \sum_{\substack{T \subseteq L \\ |T| \text{ odd}}} (1-2\delta)^{|T|} \widehat g_v(T)^2 \left\lvert \widehat f_u(\pi^{\text{odd}}(T)) \right\rvert \ge 2\varepsilon

shows that with probability at least {\varepsilon} over the draw of {T}, the size {|T|} is odd and we have

\displaystyle (1-2\delta)^{|T|} \left\lvert \widehat f_u(S) \right\rvert \ge \varepsilon

where {S = \pi^{\text{odd}}_e(T)}. In particular, {(1-2\delta)^{|T|} \ge \varepsilon}, which forces {|T| \le \Lambda}. Finally, by Remark 10 we have {S \neq \varnothing} and {|S| \le |T| \le \Lambda}, so {S} is heavy. \Box

Now, use the following algorithm.

  • For every vertex {u \in U}, take the union of all heavy sets, say

    \displaystyle \mathcal H = \bigcup_{S \text{ heavy}} S.

    Pick a random key from {\mathcal H}. Note that {|\mathcal H| \le \Lambda\varepsilon^{-2}}, since there are at most {\varepsilon^{-2}} heavy sets (by Parseval) and each has at most {\Lambda} elements.

  • For every vertex {v \in V}, select a random set {T} according to the distribution {\widehat g_v(T)^2}, and select a random element from {T}.
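In code, the decoding might look like the following sketch (Fourier coefficients computed by brute force; the function names and the fallbacks for empty sets are my own arbitrary choices):

```python
import random
from itertools import product, combinations
from math import prod

def fourier_table(h, idx):
    """Brute-force Fourier coefficients of a +-1 function on {+-1}^idx."""
    pts = [dict(zip(idx, pt)) for pt in product([-1, +1], repeat=len(idx))]
    return {frozenset(S): sum(h(v) * prod(v[i] for i in S) for v in pts)
                          / len(pts)
            for r in range(len(idx) + 1)
            for S in combinations(idx, r)}

def decode_key(f_u, K, eps, Lam):
    """Union H of all heavy sets, then a uniformly random key from H."""
    coeffs = fourier_table(f_u, K)
    heavy = [S for S, c in coeffs.items()
             if S and len(S) <= Lam and abs(c) >= eps]
    H = sorted(set().union(*heavy)) if heavy else list(K)  # fallback
    return random.choice(H)

def decode_label(g_v, L):
    """Sample T with probability g_v^(T)^2 (these sum to 1 by Parseval),
    then take a uniformly random label in T."""
    coeffs = fourier_table(g_v, L)
    sets, weights = zip(*coeffs.items())
    T = random.choices(sets, [w * w for w in weights])[0]
    return random.choice(sorted(T)) if T else random.choice(list(L))
```

Both routines take time exponential in {|K|} and {|L|}, which is fine here: the reduction itself already writes down {2^{|K|}} and {2^{|L|}} variables per vertex.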

I claim that this works.

Fix an edge {e}. There is at least an {\varepsilon} chance that {T} is {e}-excellent. If it is, then the heavy set {S = \pi^{\text{odd}}_e(T)} is contained in {\mathcal H}, so at least one compatible key-label pair lies in {\mathcal H \times T}. Hence the probability that the edge {e} receives a compatible pair is at least

\displaystyle \varepsilon \cdot \frac{1}{\Lambda \varepsilon^{-2}} \cdot \frac{1}{\Lambda} = \frac{\varepsilon^3}{\Lambda^2} = \eta.

(Addendum: it’s been pointed out to me that this isn’t quite right; the hypothesis only gives that the overall fraction of satisfied equations is {\ge \frac{1}{2}+2\varepsilon}, which doesn’t imply the bound for every individual edge. Thus one likely needs another averaging argument over the edges.)

 

Miller-Rabin (for MIT 18.434)

This is a transcript of a talk I gave as part of MIT’s 18.434 class, the “Seminar in Theoretical Computer Science” as part of MIT’s communication requirement. (Insert snarky comment about MIT’s CI-* requirements here.) It probably would have made a nice math circle talk for high schoolers but I felt somewhat awkward having to present it to a bunch of students who were clearly older than me.

1. Preliminaries

1.1. Modular arithmetic

In middle school you might have encountered questions such as

Exercise 1

What is {3^{2016} \pmod{10}}?

You could answer such questions by listing out {3^n} for small {n} and then finding a pattern, in this case of period {4}. However, for large moduli this “brute-force” approach can be time-consuming.

Fortunately, it turns out that one can predict the period in advance.

Theorem 2 (Euler’s little theorem)

  1. Let {\gcd(a,n) = 1}. Then {a^{\phi(n)} \equiv 1 \pmod n}.
  2. (Fermat) If {p} is a prime, then {a^p \equiv a \pmod p} for every {a}.

Proof: Part (1) is a special case of Lagrange’s theorem: if {G} is a finite group and {g \in G}, then {g^{|G|}} is the identity element; take {G = (\mathbb Z/n\mathbb Z)^\times}. Part (2) follows from part (1) with {n = p} when {\gcd(a,p) = 1}, and is clear when {p \mid a}. \Box

Thus, in the middle school problem we know in advance that {3^4 \equiv 1 \pmod{10}} because {\phi(10) = 4}. This bound is sharp for primes:
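For instance, Python’s three-argument pow confirms the middle school exercise in both ways:

```python
# phi(10) = 4 and 4 divides 2016, so 3^2016 = 1 (mod 10).
print(pow(3, 2016, 10))      # 1, computed directly
print(pow(3, 2016 % 4, 10))  # 1, using Euler's little theorem
```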

Theorem 3 (Primitive roots)

For every {p} prime there’s a {g \pmod p} such that {g^{p-1} \equiv 1 \pmod p} but {g^{k} \not\equiv 1 \pmod p} for any {k < p-1}. (Hence {(\mathbb Z/p\mathbb Z)^\times \cong \mathbb Z/(p-1)}.)

For a proof, see the last exercise of my orders handout.

We will define the following anyway:

Definition 4

We say an integer {n} (thought of as an exponent) annihilates the prime {p} if

  • {a^n \equiv 1 \pmod p} for every {a \not\equiv 0 \pmod p},
  • or equivalently, {p-1 \mid n}.

Theorem 5 (All/nothing)

Suppose an exponent {n} does not annihilate the prime {p}. Then more than {\frac{1}{2} p} of {x \pmod p} satisfy {x^n \not\equiv 1 \pmod p}.

Proof: A much stronger result is true: if {x^n \equiv 1 \pmod p} then {x^{\gcd(n,p-1)} \equiv 1 \pmod p}. Since {p-1 \nmid n}, the exponent {\gcd(n,p-1)} is a proper divisor of {p-1}, so at most {\frac{p-1}{2}} residues satisfy this. \Box
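A quick numerical illustration with {p = 13} and {n = 4} (note {12 \nmid 4}):

```python
# x^4 = 1 (mod 13) has exactly gcd(4, 12) = 4 solutions, so
# 13 - 4 = 9 > 13/2 residues fail the test.
p, n = 13, 4
fails = [x for x in range(p) if pow(x, n, p) != 1]
print(len(fails))  # 9
```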

1.2. Repeated Exponentiation

Even without the previous facts, one can still do:

Theorem 6 (Repeated exponentiation)

Given {x} and {n}, one can compute {x^n \pmod N} with {O(\log n)} multiplications mod {N}.

The idea is that to compute {x^{600} \pmod N}, one just multiplies {x^{512} \cdot x^{64} \cdot x^{16} \cdot x^8}. All the {x^{2^k}} can be computed by {k} successive squarings, and {k \le \log_2 n}.
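A sketch of this square-and-multiply idea (equivalent to Python’s built-in three-argument pow):

```python
def power_mod(x, n, N):
    """Compute x^n mod N with O(log n) multiplications."""
    result, base = 1, x % N
    while n > 0:
        if n & 1:                  # this power of two appears in n
            result = result * base % N
        base = base * base % N     # x^(2^k) -> x^(2^(k+1)) by squaring
        n >>= 1
    return result

assert power_mod(7, 600, 1000) == pow(7, 600, 1000)
```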

1.3. Chinese remainder theorem

In the middle school problem, we might have noticed that to compute {3^{2016} \pmod{10}}, it suffices to compute it modulo {5}, because we already know it is odd. More generally, to understand {x \pmod n} it suffices to understand {x} modulo each prime power dividing {n}.

The formal statement, which we include for completeness, is:

Theorem 7 (Chinese remainder theorem)

Let {p_1}, {p_2}, \dots, {p_m} be distinct primes, {e_i \ge 1} integers, and set {n = \prod_{i=1}^m p_i^{e_i}}. Then there is a ring isomorphism given by the natural projection

\displaystyle \mathbb Z/n \rightarrow \prod_{i=1}^m \mathbb Z/p_i^{e_i}.

In particular, a random choice of {x \pmod n} amounts to a random choice of {x} mod each prime power.

For an example, in the following table we see the natural bijection between {x \pmod{15}} and {(x \pmod 3, x \pmod 5)}.

\displaystyle \begin{array}{c|cc} x \pmod{15} & x \pmod{3} & x \pmod{5} \\ \hline 0 & 0 & 0 \\ 1 & 1 & 1 \\ 2 & 2 & 2 \\ 3 & 0 & 3 \\ 4 & 1 & 4 \\ 5 & 2 & 0 \\ 6 & 0 & 1 \\ 7 & 1 & 2 \end{array} \quad \begin{array}{c|cc} x \pmod{15} & x \pmod{3} & x \pmod{5} \\ \hline 8 & 2 & 3 \\ 9 & 0 & 4 \\ 10 & 1 & 0 \\ 11 & 2 & 1 \\ 12 & 0 & 2 \\ 13 & 1 & 3 \\ 14 & 2 & 4 \\ && \end{array}
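Going back from residues to {x \pmod n} can also be done constructively; here is a sketch using modular inverses (Python’s pow(M, -1, m)):

```python
def crt(residues, moduli):
    """Combine x = r_i (mod m_i), for pairwise coprime m_i, into the
    unique x modulo the product of the m_i."""
    x, M = 0, 1
    for r, m in zip(residues, moduli):
        # choose t so that x + M*t = r (mod m)
        t = (r - x) * pow(M, -1, m) % m
        x, M = x + M * t, M * m
    return x % M

assert crt([2, 3], [3, 5]) == 8   # the row x = 8 in the table
```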

2. The RSA algorithm

This simple number theory is enough to develop the so-called RSA algorithm. Suppose Alice wants to send Bob a message {M} over an insecure channel. They can do so as follows.

  • Bob selects integers {d}, {e} and {N} (with {N} huge) such that {N} is a semiprime and

    \displaystyle de \equiv 1 \pmod{\phi(N)}.

  • Bob publishes both the number {N} and {e} (the public key) but keeps the number {d} secret (the private key).
  • Alice sends the number {X = M^e \pmod N} across the channel.
  • Bob computes

    \displaystyle X^d \equiv M^{de} \equiv M^1 \equiv M \pmod N

    and hence obtains the message {M}.

In practice, the {N} in RSA is at least {2000} bits long.

The trick is that an adversary should not be able to compute {d} from {e} and {N} without knowing the prime factorization of {N}. So the security relies heavily on the believed difficulty of factoring.
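A toy run of the whole protocol with absurdly small numbers (sizes like this offer no security, of course):

```python
# Toy RSA. N = 61 * 53 is a semiprime and phi(N) = 60 * 52 = 3120.
p, q = 61, 53
N, phi = p * q, (p - 1) * (q - 1)
e = 17                      # public exponent, coprime to phi(N)
d = pow(e, -1, phi)         # private exponent: d*e = 1 (mod phi(N))

M = 1234                    # the message, as a residue mod N
X = pow(M, e, N)            # Alice encrypts
assert pow(X, d, N) == M    # Bob decrypts
```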

Remark 8

It turns out that we basically don’t know how to factor large numbers {N}: the best known classical algorithms can factor an {n}-bit number in

\displaystyle O\left( \exp\left( \left( \frac{64}{9} n \log(n)^2 \right)^{1/3} \right) \right)

time (“general number field sieve”). On the other hand, with a quantum computer one can do this in {O\left( n^2 \log n \log \log n \right)} time.

3. Primality Testing

Main question: if we can’t factor a number {n} quickly, can we at least check it’s prime?

In what follows, we assume for simplicity that {n} is squarefree, i.e. {n = p_1 p_2 \dots p_k} for distinct primes {p_i}. This doesn’t substantially change anything, but it makes my life much easier.

3.1. Co-RP

Here is the goal: we need to show there is a random algorithm {A} which does the following.

  • If {n} is composite then
    • More than half the time {A} says “definitely composite”.
    • Occasionally, {A} says “possibly prime”.
  • If {n} is prime, {A} always says “possibly prime”.

If there is a polynomial time algorithm {A} that does this, we say that PRIMES is in Co-RP. Clearly, this is a very good thing to be true!

3.2. Fermat

One idea is to try to use the converse of Fermat’s little theorem: given an integer {n}, pick a random number {x \pmod n} and see if {x^{n-1} \equiv 1 \pmod n}. (We compute using repeated exponentiation.) If not, then we know for sure {n} is not prime, and we call {x} a Fermat witness modulo {n}.

How good is this test? For most composite {n}, pretty good:

Proposition 9

Let {n} be composite. Assume that there is a prime {p \mid n} such that {n-1} does not annihilate {p}. Then over half the numbers mod {n} are Fermat witnesses.

Proof: Apply the Chinese remainder theorem, then the all-or-nothing theorem: more than half the residues modulo {p} already fail {x^{n-1} \equiv 1 \pmod p}, and any such {x} is a Fermat witness modulo {n}. \Box

Unfortunately, if {n} doesn’t satisfy the hypothesis, then all the {x} with {\gcd(x,n) = 1} satisfy {x^{n-1} \equiv 1 \pmod n}!

Are there such {n} which aren’t prime? Unfortunately, yes: they are called Carmichael numbers, and the first one is {561 = 3 \cdot 11 \cdot 17}.
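In code, the Fermat test is only a few lines; the assertion at the end confirms that {561} fools it:

```python
import random
from math import gcd

def fermat_test(n, trials=20):
    """Return 'composite' or 'possibly prime'."""
    for _ in range(trials):
        x = random.randrange(2, n - 1)
        if pow(x, n - 1, n) != 1:
            return "composite"        # x is a Fermat witness
    return "possibly prime"

# Every x coprime to 561 passes, since 561 is a Carmichael number.
assert all(pow(x, 560, 561) == 1
           for x in range(1, 561) if gcd(x, 561) == 1)
```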

Remark 10

For {X \gg 1}, there are more than {X^{1/3}} Carmichael numbers at most {X}.

Thus these numbers are very rare, but they foil the Fermat test.

Exercise 11

Show that a Carmichael number is not a semiprime.

3.3. Miller-Rabin

Fortunately, we can adapt the Fermat test to cover Carmichael numbers too. It comes from the observation that if {n} is prime, then {a^2 \equiv 1 \pmod n \implies a \equiv \pm 1 \pmod n}.

So let {n-1 = 2^s t}, where {t} is odd. For example, if {n = 561} then {560 = 2^4 \cdot 35}. Then we compute {x^t}, {x^{2t}}, \dots, {x^{n-1}} by repeated squaring. For example, in the case {n=561} and {x=245}:

\displaystyle \begin{array}{c|r|rrr} & \mod 561 & \mod 3 & \mod 11 & \mod 17 \\ \hline x & 245 & -1 & 3 & 7 \\ \hline x^{35} & 122 & -1 & \mathbf 1 & 3 \\ x^{70} & 298 & \mathbf 1 & 1 & 9 \\ x^{140} & 166 & 1 & 1 & -4 \\ x^{280} & 67 & 1 & 1 & -1 \\ x^{560} & 1 & 1 & 1 & \mathbf 1 \end{array}

And there we have our example! We have {67^2 \equiv 1 \pmod{561}} but {67 \not\equiv \pm 1 \pmod{561}}, so {561} isn’t prime.

So the Miller-Rabin test works as follows:

  • Given {n}, select a random {x} and compute powers of {x} as in the table.
  • If {x^{n-1} \not\equiv 1}, stop, {n} is composite (Fermat test).
  • If {x^{n-1} \equiv 1}, see if the entry just before the first {1} is {-1}. If it isn’t, then we say {x} is an MR witness and {n} is composite.
  • Otherwise, {n} is “possibly prime”.
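Putting the procedure together (a sketch; the split {n-1 = 2^s t} and the chain of squarings follow the table above):

```python
import random

def miller_rabin_round(n, x):
    """One round: True means 'possibly prime', False means composite."""
    s, t = 0, n - 1
    while t % 2 == 0:                # write n - 1 = 2^s * t with t odd
        s, t = s + 1, t // 2
    y = pow(x, t, n)
    if y == 1 or y == n - 1:         # chain starts at 1, or hits -1 first
        return True
    for _ in range(s - 1):           # square up the chain x^t, x^2t, ...
        y = y * y % n
        if y == n - 1:               # the entry before the first 1 is -1
            return True
    return False                     # x is an MR witness

def is_probably_prime(n, trials=20):
    if n < 4:
        return n in (2, 3)
    return all(miller_rabin_round(n, random.randrange(2, n - 1))
               for _ in range(trials))

print(miller_rabin_round(561, 245))  # False, matching the table above
print(is_probably_prime(2**61 - 1))  # True; 2^61 - 1 is a Mersenne prime
```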

How effective is this test?

Theorem 12

If {n} is Carmichael, then over half the {x \pmod n} are MR witnesses.

Proof: We sample {x \pmod n} randomly, again by looking modulo each prime (Chinese remainder theorem). By the theorem on primitive roots, one can show that for each prime, the probability that the first {-1} appears in any given row is at most {\frac{1}{2}}. This implies the conclusion. \Box

Exercise 13

Improve the {\frac{1}{2}} in the problem to {\frac34} by using the fact that Carmichael numbers aren’t semiprime.

3.4. AKS

On August 6, 2002, it was in fact shown that PRIMES is in P, using the deterministic AKS algorithm. However, in practice everyone still uses Miller-Rabin, since the implied constants in the AKS runtime are large.