Algebraic Topology Functors

This will be old news to anyone who does algebraic topology, but oddly enough I can’t seem to find it all written in one place anywhere, and in particular I can’t find the bit about {\mathsf{hPairTop}} at all.

In algebraic topology you (for example) associate every topological space {X} with a group, like {\pi_1(X, x_0)} or {H_5(X)}. All of these operations turn out to be functors. This isn’t surprising, because as far as I’m concerned the definition of a functor is “any time you take one type of object and naturally make another object”.

The surprise is that these objects also respect homotopy in a nice way; proving this is a fair amount of the “setup” work in algebraic topology.

1. Homology, {H_n : \mathsf{hTop} \rightarrow \mathsf{Grp}}

Note that {H_5} is a functor

\displaystyle H_5 : \mathsf{Top} \rightarrow \mathsf{Grp}

i.e. to every space {X} we can associate a group {H_5(X)}. (Of course, replace {5} by an integer of your choice.) Recall that:

Definition 1

Two maps {f, g : X \rightarrow Y} are homotopic if there exists a homotopy between them.

Thus for a map {f} we can take its homotopy class {[f]} (the equivalence class under this relation). This has the nice property that {[f \circ g] = [f] \circ [g]} and so on.

Definition 2

Two spaces {X} and {Y} are homotopy equivalent if there exists a pair of maps {f : X \rightarrow Y} and {g : Y \rightarrow X} such that {[g \circ f] = [\mathrm{id}_X]} and {[f \circ g] = [\mathrm{id}_Y]}.

In light of this, we can define

Definition 3

The category {\mathsf{hTop}} is defined as follows:

  • The objects are topological spaces {X}.
  • The morphisms {X \rightarrow Y} are homotopy classes of continuous maps {X \rightarrow Y}.

Remark 4

Composition is well-defined since {[f \circ g] = [f] \circ [g]}. Two spaces are isomorphic in {\mathsf{hTop}} exactly when they are homotopy equivalent.

Remark 5

As you might guess this “quotient” construction is called a quotient category.

Then the big result is that:

Theorem 6

The induced map {f_\sharp = H_n(f)} of a map {f: X \rightarrow Y} depends only on the homotopy class of {f}. Thus {H_n} is a functor

\displaystyle H_n : \mathsf{hTop} \rightarrow \mathsf{Grp}.

The proof of this is geometric, using the so-called prism operators. In any case, as with all functors we deduce

Corollary 7

{H_n(X) \cong H_n(Y)} if {X} and {Y} are homotopy equivalent.

In particular, the contractible spaces are those spaces {X} which are homotopy equivalent to a point; for such spaces, {H_n(X) = 0} for all {n \ge 1}.

2. Relative Homology, {H_n : \mathsf{hPairTop} \rightarrow \mathsf{Grp}}

In fact, we also defined homology groups

\displaystyle H_n(X,A)

for {A \subseteq X}. We will now show this is functorial too.

Definition 8

Let {\varnothing \neq A \subset X} and {\varnothing \neq B \subset Y} be subspaces, and consider a map {f : X \rightarrow Y}. If {f(A) \subseteq B} we write

\displaystyle f : (X,A) \rightarrow (Y,B).

We say {f} is a map of pairs, between the pairs {(X,A)} and {(Y,B)}.

Definition 9

We say that {f,g : (X,A) \rightarrow (Y,B)} are pair-homotopic if they are “homotopic through maps of pairs”.

More formally, a pair-homotopy between {f} and {g} is a map {F : [0,1] \times X \rightarrow Y}, which we’ll write as {F_t}, such that {F} is a homotopy of the maps {f,g : X \rightarrow Y} and each {F_t} is itself a map of pairs.

Thus, we naturally arrive at two categories:

  • {\mathsf{PairTop}}, the category of pairs of topological spaces, and
  • {\mathsf{hPairTop}}, the same category except with morphisms taken to be pair-homotopy classes of maps of pairs.

Definition 10

As before, we say pairs {(X,A)} and {(Y,B)} are pair-homotopy equivalent if they are isomorphic in {\mathsf{hPairTop}}; an isomorphism of {\mathsf{hPairTop}} is called a pair-homotopy equivalence.

The prism operators again let us derive

Theorem 11

We have a functor

\displaystyle H_n : \mathsf{hPairTop} \rightarrow \mathsf{Grp}.

The usual corollaries apply.

Now, we want an analog of contractible spaces for our pairs: i.e. pairs of spaces {(X,A)} such that {H_n(X,A) = 0} for {n \ge 1}. The correct definition is:

Definition 12

Let {A \subset X}. We say that {A} is a deformation retract of {X} if there is a map of pairs {r : (X, A) \rightarrow (A, A)} which is a pair-homotopy equivalence.

Example 13 (Examples of Deformation Retracts)

  1. If a single point {p} is a deformation retract of a space {X}, then {X} is contractible, since the retraction {r : X \rightarrow \{p\}} (when viewed as a map {X \rightarrow X}) is homotopic to the identity map {\mathrm{id}_X : X \rightarrow X}.
  2. The punctured disk {D^2 \setminus \{0\}} deformation retracts onto its boundary {S^1}.
  3. More generally, {D^{n} \setminus \{0\}} deformation retracts onto its boundary {S^{n-1}}.
  4. Similarly, {\mathbb R^n \setminus \{0\}} deformation retracts onto the sphere {S^{n-1}}; an explicit formula is given below.
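
For instance, in the last example one can write down the deformation retraction explicitly; the following straight-line homotopy is the standard formula (spelled out here for concreteness, it is not in the original list):

\displaystyle F_t(x) = (1-t) x + t \frac{x}{\lVert x \rVert}, \qquad x \in \mathbb R^n \setminus \{0\},\ 0 \le t \le 1,

which is the identity at {t = 0}, has image {S^{n-1}} at {t = 1}, and fixes {S^{n-1}} pointwise throughout.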

Of course in this situation we have that

\displaystyle H_n(X,A) \cong H_n(A,A) = 0.

3. Homotopy, {\pi_1 : \mathsf{hTop}_\ast \rightarrow \mathsf{Grp}}

As a special case of the above, we define

Definition 14

The category {\mathsf{Top}_\ast} is defined as follows:

  • The objects are pairs {(X, x_0)} of spaces {X} with a distinguished basepoint {x_0}. We call these pointed spaces.
  • The morphisms are maps {f : (X, x_0) \rightarrow (Y, y_0)}, meaning {f} is continuous and {f(x_0) = y_0}.

Now again we mod out:

Definition 15

Two maps {f , g : (X, x_0) \rightarrow (Y, y_0)} of pointed spaces are homotopic if there is a homotopy between them which also fixes the basepoint. We can then, in the same way as before, define the quotient category {\mathsf{hTop}_\ast}.

And lo and behold:

Theorem 16

We have a functor

\displaystyle \pi_1 : \mathsf{hTop}_\ast \rightarrow \mathsf{Grp}.

Same corollaries as before.


Notes on Publishing My Textbook

Hmm, so hopefully this will be finished within the next 10 years.

— An email of mine at the beginning of this project

My Euclidean geometry book was published last March or so. I thought I’d take the time to write about what the whole process of publishing this book was like, but I’ll start with the disclaimer that my process was probably not very typical and is unlikely to be representative of what everyone else does.

Writing the Book

The Idea

I’m trying to pinpoint exactly when this project changed from “daydream” to “let’s do it”, but I’m not quite sure; here’s the best I can recount.

It was sometime in the fall of 2013, towards the start of the school year; I think late September. I was a senior in high school, and I was only enrolled in two classes. It was fantastic, because it meant I had lots of time to study math. The superintendent of the school eventually found out, though, and forced me to enroll as an “office assistant” for three periods a day. Nonetheless, office assistant is not a very busy job, and so I had lots of time, all the time, every day.

Anyways, I had written a bit of geometry material for my math club the previous year, which was intended to be a light introduction. But in doing so I realized that there was much, much more I wanted to say, and so somewhere on my mental to-do list I added “flesh these notes out”. So one day, sitting in the office, after having spent another hour playing StarCraft, I finally got down to this item on the list. I hadn’t meant it to be a book; I had just wanted to finish what I had started the previous year. But sometimes your own projects spiral out of your control, and that’s what happened to me.

Really, I hadn’t come up with a brilliant idea that no one had thought of before. To my knowledge, no one had even tried yet. If I hadn’t gone and decided to write this book, someone else would have done it; maybe not right away, but within a few years. Indeed, I was honestly surprised that I was the first one to make an attempt. The USAMO has been a serious contest since at least the 1990s, and the demand for this book certainly existed well before my time. Really, I think this all just goes to illustrate that the Efficient Market Hypothesis is not so true in these kinds of domains.

Setting Out

Initially, this text was titled A Voyage in Euclidean Geometry, and the filename Voyage.pdf persisted throughout the entire project even though the title itself changed several times.

The beginning of the writing was actually quite swift. Like everyone else, I started out with an empty LaTeX file. But it was different from the other blank screens I’d had to deal with in my life; rather than staring in despair (think English essay mode), I exploded. I was bursting with things I wanted to write. It was the result of having years of competitive geometry bottled up in my head. In fact, I still have version 0 of the table of contents that came to life as I started putting things together.

  • Angle Chasing (include “Fact 5”)
  • Centers of the Triangle
    • The Medial Triangle
    • The Euler Line
    • The Nine-Point Circle
  • Circles
    • Incircles and Excircles
    • The Power of a Point
    • The Radical Axis
  • Computational Geometry
    • All the Areas (include Extended Sine Law, Ceva/Menelaus)
    • Similar Triangles
    • Homothety
    • Stewart’s Theorem
    • Ptolemy’s Theorem
  • Some More Configurations (include symmedians)
    • Simson lines
    • Incircles and Excenters, Revisited
    • Midpoints of Altitudes
  • Circles Again
    • Inversion
    • Circles Inscribed in Segments
    • The Miquel Point (include Brokard, this could get long)
    • Spiral Similarity
  • Projective Geometry
    • Harmonic Division
    • Brokard’s Theorem
    • Pascal’s Theorem
  • Computational Techniques
    • Complex Numbers
    • Barycentric Coordinates

Of course the table of contents changed drastically over time, but that wasn’t important. The point of the initial skeleton was to provide a bucket sort for all the things that I wanted to cover. Often, I would have three different sections I wanted to write, but like all humans I can only write one thing at a time, so I would have to create section headers for the other two and try to get the first section done as quickly as I could so that I could go and write the other two as well.

I did take the time to do some things correctly, mostly with the LaTeX setup. Some examples of things I did:

  • Set up proper amsthm environments: earlier versions of the draft had “lemma”, “theorem”, “problem”, “exercise”, “proposition”, all distinct
  • Set up an organized master LaTeX file with \include’s for the chapters, rather than having just one fat file.
  • Set up shortcuts for setting up diagrams and so on.
  • Set up a “hints” system where hints to the problems would be printed in random order at the end of the book.
  • Set up a special command for new terms (\vocab). At the beginning all it did was make the text bold, but I suspected that later I might make it do other things (like indexing).

In other words, whenever possible I would pay an O(1) cost to get back an O(n) return. Indeed, the point of using LaTeX for a long document is that you can “say what you mean”: you type \begin{theorem} … \end{theorem}, and all the formatting is taken care of for you. Decide you want to change it later, and you only have to change the relevant code in the preamble.
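
For concreteness, here is a minimal sketch of what such a setup might look like. (This is a reconstruction for illustration, not the actual source of the book; in particular the \vocab definition and the chapter file names are hypothetical.)

% master file, with chapters living in their own files
\documentclass[10pt]{book}
\usepackage{amsthm}
\usepackage{makeidx}
\makeindex

% distinct environments, numbered within chapters
\newtheorem{theorem}{Theorem}[chapter]
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{proposition}[theorem]{Proposition}
\theoremstyle{definition}
\newtheorem{problem}[theorem]{Problem}
\newtheorem{exercise}[theorem]{Exercise}

% new terms: bold for now; the index entry was the later upgrade
\newcommand{\vocab}[1]{\textbf{#1}\index{#1}}

\begin{document}
\include{chapters/angle-chasing}
\include{chapters/circles}
% ... and so on for the other chapters
\printindex
\end{document}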

And so, for three hours a day, five days a week, I sat in the main office of Irvington High School, pounding out chapter after chapter. I was essentially typing up what had been four years of competition experience; when you’re 17 years old, that’s a big chunk of your life.

I spent surprisingly little time revising (before the first submission). Mostly I just fired away. I have always heard about how important it is to rewrite things and how first drafts are always terrible, but I’m glad I ignored that advice, at least at the beginning. It was immensely helpful to have the skeleton of the book laid out in a tangible form that I could actually see. That’s one thing I really like about writing: it helps you collect your thoughts.

It’s possible that this is part of my writing style; compared to what everyone says I should do, I don’t do very much rewriting. My first and final drafts tend to look pretty similar. I think this is just because when I write something that’s not an English essay, I already have a reasonably good idea what I want to say, and that the process of writing it out does much of the polishing for me. I’m also typically pretty hesitant when I write things: I do a lot of pausing for a few minutes deciding whether this sentence is really what I want before actually writing it down, even in drafts.

Some Encouragement

By late October, I had about 80 or so pages of content written. Not that impressive if you think about it; it works out to something like 4 pages per day. In fact, looking through my data, I’m pretty sure I had a fairly consistent writing rate of about 30 minutes per page. It didn’t matter, since I had so much time.

At this point the book was coming out reasonably well, so I was beginning to think about possibly publishing it. It was a bit embarrassing, since as far as I could tell, publishing books was done by people who were actually professionals in some way or another. So I reached out to a couple of teachers of mine (not high school) who I knew had published textbooks in one form or another; I politely asked them what their thoughts were, and if they had any advice. I got some gentle encouragement, but also a pointer to self-publishing: it turns out in this day and age, there are services like Lulu or CreateSpace that will just let you publish… whatever you want. This gave me the guts to keep working, because it meant that there was a floor: even if I couldn’t get a traditional publisher, the worst I could do was self-publish through Amazon, which was at any rate strictly better than the plan of uploading a PDF somewhere.

So I kept writing. The seasons turned, and by February, the draft was 200 pages strong. In April, I had staked out a whopping 333 pages.

The Review Process

Entering the MAA’s Queue

I was finally beginning to run out of things I wanted to add, after about six months of endless typing. So I decided to reach out again; this time I contacted a professor (henceforth Z) whom I knew well from my time at the Berkeley Math Circle. After some discussion, Z agreed to look briefly at an early draft of the manuscript to get a feel for what it was like. I must have exceeded his expectations, because Z responded enthusiastically, suggesting that I submit it to the Problem Book Series of the MAA. As it turns out, he was on the editorial board, so in just a few days my book was in the official queue.

This was all in April. The review process was scheduled to begin in June, and would likely take the entirety of the summer. I was told that if I had a more revised draft before the review began, I should send that in as well.

It was then I decided I needed to get some feedback. So, I reached out to a few of my close friends asking them if they’d be willing to review drafts of the manuscript. This turned out to not go quite as well as I hoped, since

  • Many people agreed eagerly, but then didn’t actually follow through with reading the manuscript chapter by chapter.
  • I was stupid enough to send the entire manuscript rather than excerpts, and thus ran a huge risk of getting the text leaked. Fortunately, I have good friends, but it nagged at me for quite a while. Learned my lesson there.

That’s not to say it was completely useless; I did get some typos fixed. But just not as many as I hoped.

The First Review

Not very much happened for the rest of the summer while I waited impatiently; it was a long four-month wait for me. Finally, at the end of August 2014, I got the comments from the board; I remember I was practicing the piano at Harvard when I saw the email.

There had been six reviews. While I won’t quote the exact reviews, I’ll briefly summarize them.

  1. There is too much idiosyncratic terminology.
  2. This is pretty impressive, but will need careful editing.
  3. This project is fantastic; the author should be encouraged to continue.
  4. This is well developed; may need some editing of contents since some topics are very advanced.
  5. Overall I like this project. That said, it could benefit from some reading and editing. For example, here are some passages in particular that aren’t clear.
  6. This manuscript reads well and is written at a fairly high level. The motivation provided is especially good. It would be nice if there were some solutions, or at least longer hints, for the (many) problems in the text. Overall the author should be encouraged to continue.

The most surprising thing was how short the comments were. I had expected that, given the review had consumed the entire summer, the reviewers would at least have read the manuscript in detail. But it turned out that mostly all that had been obtained were cursory impressions from the board members: the first four reviews were only a few sentences long! The fifth review was more detailed, but it was essentially a “spot check”.

I admit, I was really at a loss for how I should proceed. The comments were not terribly specific, and the only real actionable items were to use less extravagant terminology in response to 1 (I originally had “configuration”, “exercise” vs “problem”, etc.) and to add solutions (in response to 6). When I showed the comments to Z, he commented that while they were positive, they seemed to suggest that publication might not be anytime soon. So I decided to try submitting a second draft to the MAA, but if that didn’t work I would fall back on the self-publishing route.

The reviewers had commented about finding a few typos, so I again enlisted the help of some friends of mine to eliminate them. This time I was a lot smarter. First, I only sent the relevant excerpts that I wanted them to read, and watermarked the PDFs with the names of the recipients. Secondly, this time I paid them as well: specifically, I gave 40 + \min(40, 0.1n^2) dollars for each chapter read, where n was the number of errors found. I also gave a much clearer “I need this done by X” deadline. This worked significantly better than my first round of edits. Note to self: people feel more obliged to do a good job if you pay them!

All in all my friends probably eliminated about 500 errors.

I worked as rapidly as I could, and within four weeks I had the new version. The changes that I made were:

  • In response to the first board comment, I eliminated some of the most extravagant terminology (“demonstration”, “configuration”, etc.) in favor of more conventional terms (“example”, “lemma”).
  • I picked about 5-10 problems from each chapter and added full solutions for them. This inflated the manuscript by another 70 pages, for a new total of 400 pages.
  • Many typos were corrected and other revisions made, thanks to my team of readers.
  • Some formatting changes; most notably, I got the idea to put theorems and lemmas in boxes using mdframed (most of my recent olympiad handouts have the same boxes).
  • Added several references.

I sent this out and sat back.

The Second Review

What followed was another long waiting process for what again ended up being cursory comments. The delay between the first and second review was definitely the most frustrating part — there seemed to be nothing I could do other than sit and wait. I seriously considered dropping the MAA and self-publishing during this time.

I had been told to expect comments back in the spring. Finally, in early April I poked the editorial board again asking whether there had been any progress, and was horrified to find out that the process hadn’t even started, due to a miscommunication. Fortunately, the editor was apologetic enough about the error that she asked the board to try to expedite the process a little. The comments then arrived in mid-May, six weeks afterwards.

There were eight reviewers this time. In addition to some suggested stylistic changes (e.g. avoiding contractions), here were some of the main comments.

  • The main complaint was that I had been a bit too informal. They were right on all counts here: in the draft I had sent, the chapters opened with quotes from years of MOP (which confused the board, for obvious reasons) and I had some snarky comments about high school geometry (since I happen to despise the way Euclidean geometry is taught in high school). I found it amusing that no one had brought this up earlier, and happily obliged to fix it.
  • Some reviewers pointed out that some of the topics were very advanced. In fact, one of the reviewers actually recommended against the publication of the book on the grounds that no one would want to buy it. Fortunately, the book ended up getting accepted anyways.
  • In that vein, there were some remarks that this book, although it serves its target audience well, is written at a fairly advanced level.

Some of the reviews were cursory like before, but some of them were line-by-line readings of a random chapter, and so this time I had something more tangible to work with.

So I proceeded to make the changes. For the first time, I finally had the brains to start using git to track the changes I made to the book. This was an enormously good idea, and I wish I had done so earlier.

Here are some selected changes that were made (the full list of changes is quite long).

  • Eliminated a bunch of snarky comments about high school, and the MOP quotes.
  • Eliminated about 250 contractions.
  • Eliminated about 50 instances of unnecessary future tense.
  • Eliminated the real product from the text.
  • Added in about seven new problems.
  • Significantly improved the index of the book, making it far more complete.
  • Fixed more references.
  • Changed the title to “Euclidean Geometry in Mathematical Olympiads” (it was originally “Geometra Galactica”).
  • Changed the name of Part II from “Dark Arts” to “Analytic Techniques”. (Hehe.)
  • Added people to the acknowledgments.
  • Made changes in formatting: most notably, I changed the font size from 11pt to 10pt to decrease the page count, since my book was already twice as long as many of the other books in the series. This dropped me from about 400 pages back to about 350 pages.
  • Fixed about 200 more typos. Thanks to those of you who found them!

I sent out the third draft just as June started, about three weeks after I had received the comments. (I like to work fast.)

The Last Revisions

There were another two rounds afterwards. In late June, I got a small set of about three pages of additional typos and clarifying suggestions. I sent back an updated draft one day later.

Six days later, I got back a list of four remaining edits to make. I sent an updated fourth draft 17 minutes after receiving those comments. Unfortunately, it then took another five weeks for the four changes I made to be acknowledged. Finally, in early August, the changes were approved and the editorial board forwarded an official recommendation to MAA to publish the book.

Summary of Review Timeline

In summary, the timeline of the review process was

  • First draft submitted: April 6, 2014
  • Feedback received: August 28, 2014
    Second draft submitted: November 5, 2014
  • Feedback received: May 19, 2015
    Third draft submitted: June 23, 2015
  • Feedback received: June 29, 2015
    Fourth draft submitted: June 29, 2015
  • Official recommendation to MAA made: August 2015

I think with traditional publishers there is a lot of waiting; my understanding is that the editorial board largely consists of volunteers, so this seems inevitable.

Approval and Onwards

On September 3, 2015, I got the long-awaited message:

It is a pleasure to inform you that the MAA Council on Books has approved the recommendation of the MAA Problem Books editorial board to publish your manuscript, Euclidean Geometry in Mathematical Olympiads.

I got a fairly standard royalty contract from the publisher, which I signed off on without much thought.

Editing

I had a total of zero math editors and one copy editor provided. This shows in the enormous list of errors (and that is after all the mistakes my friends helped me find).

Fortunately, my copy editor was quite good (and I have a lot of sympathy for this poor soul, who had to read every word of the entire manuscript). My Git history indicates that approximately 1000 corrections were made; on average, this is about 2 per page, which sounds about right. I got the corrections on hard copy in the mail: the entire printout of my book, well marked with red ink.

Many of the changes fell into general patterns:

  • Capitalization. I was unwittingly inconsistent with “Law of Cosines” versus “Law of cosines” versus “law of cosines”, etc., and my copy editor noticed every one of these. Similarly, the capitalization of section and chapter titles was often inconsistent; should I use “Angle Chasing” or “Angle chasing”? The main point is to pick one convention and stick with it.
  • My copy editor pointed out every time I used “Problems for this section” and had only one problem.
  • Several unnecessary “quotes” and italics were deleted.
  • Oxford commas. My god, so many Oxford commas. You just don’t notice when the IMO Shortlist says “the circle through the points E, G, and H” but the European Girls’ Olympiad says “show that KH, EM and BC are concurrent”. I swear there were at least 100 of these in the book. I tried to write a regular expression to find such mistakes (see the sketch after this list), but there were lots of edge cases that came up, and I still had to do many of these manually.
  • Inconsistency of em dashes and en dashes. This one worked better with regular expressions.
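
As a rough illustration of the regular-expression approach (a reconstruction, not the actual command I used; the file names are made up), something like the following flags “X, Y and Z” while leaving “X, Y, and Z” alone:

grep -nE ', [A-Za-z]+ and [A-Za-z]+' chapters/*.tex

The edge cases mentioned above come from sentences where the word before “and” legitimately ends a clause, so every match still needs human review.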

But of course there were plenty of other mistakes like missing spaces, missing degree spaces, punctuation errors, etc.

Cover Art

This was handled for me by the publisher: they gave me a choice of five or so designs and I picked one I liked.

(If you are self-publishing, this is actually one of the hardest parts of the publishing logistics; you need to design the cover on your own.)

Proofs

It turns out that after all the hard work I spent on formatting the draft, the MAA has a standard template and had the production team re-typeset the entire book using this format. Fortunately, the publisher’s format is pretty similar to mine, and so there were no huge cosmetic changes.

At this point I got the proofs, which are essentially the penultimate draft of the book as it would be sent to the printers.

Affiliation and Miscellany

There was a bit more back-and-forth with the publisher towards the end. For example, they asked me if I would like my affiliation to be listed as MIT or to have no affiliation; I chose the latter. I also sent them a bio and photograph, and filled out an author questionnaire asking for some standard details.

Marketing was handled by the publisher based on these details.

The End

Without warning, I got an email on March 25, 2016 announcing that PDF versions of my book were now available on the MAA website. The hard copies followed a few months afterwards. That marked the end of my publication process.

If I were to do this sort of thing again, I guess the main decision would be whether to self-publish or go through a formal publisher. The main disadvantages of the latter seem to be the time delay, and possibly also that the royalties are lower than in self-publishing. On the flip side, the advantages of a formal publisher were:

  • Having a real copy editor read through the entire manuscript.
  • Having a committee of outsiders knock some common sense into me (e.g. not calling the book “Geometra Galactica”).
  • Having cover art and marketing completely done for me.
  • It’s more prestigious; having a real published book is (for whatever reason) a very nice CV item.

Overall I think publishing formally was the right thing to do for this book, but your mileage may vary.

Other advice I would give to my past self, mentioned above already: keep paying O(1) for O(n), use git to keep track of all versions, and be conscious about which grammatical conventions to use (in particular, stay consistent).

Here’s a better concluding question: what surprised me about the process, i.e., what was different from what I expected? Here’s a partial list of answers:

  • It took even longer than I was expecting. Large committees are inherently slow; this is no slight to the MAA, it is just how these sorts of things work.
  • I was surprised that at no point did anyone really check the manuscript for mathematical accuracy. In hindsight this should have been obvious; I expect reading the entire book properly takes at least 1-2 years.
  • I was astounded by how many errors there were in the text, be they mathematical, grammatical, or otherwise. During the entire process something like 2000 errors were corrected (admittedly many were minor, like Oxford commas). Yet even as I published the book, I knew that there had to be errors left. It was still irritating to hear about them post-publication.

All in all, the entire process started in September 2013 and ended in March 2016, which is 30 months. The time was roughly 30% writing, 50% review, and 20% production.

DNSCrypt Setup with PDNSD

Here are notes for setting up DNSCrypt on Arch Linux, using pdnsd as a DNS cache, assuming the use of NetworkManager. I needed this one day since the network I was using blocked traffic to external DNS servers (parental controls), and the DNS server provided had an outdated entry for hmmt.co. (My dad then pointed out to me that I could have just hard-coded the necessary IP address in /etc/hosts, oops.)
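
(For illustration, such a hard-coded entry in /etc/hosts would have looked something like the line below, with a documentation-range placeholder standing in for the actual IP address.)

203.0.113.7 hmmt.co www.hmmt.co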

For the whole process, useful commands to test with are:

  • nslookup hmmt.co will tell you the IP used and the server from which it came.
  • dig www.hmmt.co gives much more detailed information to this effect. (From bind-tools.)
  • dig @127.0.0.1 www.hmmt.co lets you query a specific DNS server (in this case 127.0.0.1).
  • drill @127.0.0.1 www.hmmt.co behaves similarly.

First, pacman -S pdnsd dnscrypt-proxy (prefixed with sudo, which I’ll leave out here and henceforth).

Run systemctl edit dnscrypt-proxy.socket and fill in override.conf with

[Socket]
ListenStream=
ListenDatagram=
ListenStream=127.0.0.1:40
ListenDatagram=127.0.0.1:40

Optionally, one can also specify which DNS server to use with systemctl edit dnscrypt-proxy.service. For example, for cs-uswest I write

[Service]
ExecStart=
ExecStart=/usr/bin/dnscrypt-proxy \
      -R cs-uswest

The empty ExecStart= is necessary, since otherwise systemd will complain about multiple ExecStart commands.

This configures dnscrypt-proxy to listen on 127.0.0.1, port 40.

Now we configure pdnsd to listen on port 53 (the default) as the cache, relaying cache misses to dnscrypt-proxy. This is accomplished by using the following /etc/pdnsd.conf:

global {
    perm_cache = 1024;
    cache_dir = "/var/cache/pdnsd";
    run_as = "pdnsd";
    server_ip = 127.0.0.1;
    status_ctl = on;
    query_method = udp_tcp;
    min_ttl = 15m;       # Retain cached entries at least 15 minutes.
    max_ttl = 1w;        # One week.
    timeout = 10;        # Global timeout option (10 seconds).
    neg_domain_pol = on;
    udpbufsize = 1024;   # Upper limit on the size of UDP messages.
}

server {
    label = "dnscrypt-proxy";
    ip = 127.0.0.1;
    port = 40;
    timeout = 4;
    proxy_only = on;
}

source {
    owner = localhost;
    file = "/etc/hosts";
}

Now it remains to change the DNS server from whatever default is used to 127.0.0.1. For NetworkManager users, it is necessary to edit /etc/NetworkManager/NetworkManager.conf to prevent it from overwriting /etc/resolv.conf:

[main]
...
dns=none

This will cause resolv.conf to be written as an empty file by NetworkManager: in this case, the default 127.0.0.1 is used as the nameserver, which is what we want.

Needless to say, one finishes with

systemctl enable dnscrypt-proxy
systemctl start dnscrypt-proxy
systemctl enable pdnsd
systemctl start pdnsd
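
At this point one can sanity-check the setup using the commands from the beginning; assuming the configuration above, one expects roughly the following behavior:

nslookup hmmt.co          # should report the server as 127.0.0.1
drill @127.0.0.1 hmmt.co  # run twice: the second query should be
                          # answered from pdnsd's cache, much faster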

A Sketchy Overview of Green-Tao

These are the notes of my last lecture in the 18.099 discrete analysis seminar, a very high-level overview of the Green-Tao theorem. It is a subset of the exposition by Conlon, Fox, and Zhao mentioned below.

1. Synopsis

This post is an overview of the proof of:

Theorem 1 (Green-Tao)

The prime numbers contain arbitrarily long arithmetic progressions.

Here, Szemerédi’s theorem isn’t strong enough, because the primes have density approaching zero. Instead, one can try to prove the following “relative” result.

Theorem (Relative Szemerédi)

Let {S} be a sparse “pseudorandom” set of integers. Then any subset {A} of {S} with positive relative density has arbitrarily long arithmetic progressions.

In order to do this, we have to accomplish the following.

  • Make precise the notion of “pseudorandom”.
  • Prove the Relative Szemerédi theorem, and then
  • Exhibit a “pseudorandom” set {S} which contains the prime numbers.

This post will use the graph-theoretic approach to Szemerédi’s theorem as in the exposition of David Conlon, Jacob Fox, and Yufei Zhao. In order to motivate the notion of pseudorandomness, we return to the graph-theoretic approach to Roth’s theorem, i.e. the case {k=3} of Szemerédi’s theorem.

2. Defining the linear forms condition

2.1. Review of Roth’s theorem

Roth’s theorem can be phrased in two ways. The first is the “set-theoretic” formulation:

Theorem 2 (Roth, set version)

If {A \subseteq \mathbb Z/N} is 3-AP-free, then {|A| = o(N)}.

The second is a “weighted” version

Theorem 3 (Roth, weighted version)

Fix {\delta > 0}. Let {f : \mathbb Z/N \rightarrow [0,1]} with {\mathbf E f \ge \delta}. Then

\displaystyle \Lambda_3(f,f,f) \ge \Omega_\delta(1).
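
Here {\Lambda_3} denotes the usual normalized count of three-term progressions, which for completeness is

\displaystyle \Lambda_3(f,g,h) = \mathbf E_{x, d \in \mathbb Z/N} \left[ f(x) g(x+d) h(x+2d) \right],

and {\Lambda_k} later on is the analogous count of {k}-term progressions.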

We sketch the idea of a graph-theoretic proof of the first theorem. We construct a tripartite graph {G_A} on vertices {X \sqcup Y \sqcup Z}, where {X = Y = Z = \mathbb Z/N}. Then one creates the edges

  • {(x,y)} if {2x+ y \in A},
  • {(x,z)} if {x-z \in A}, and
  • {(y,z)} if {-y-2z \in A}.

This construction is selected so that arithmetic progressions in {A} correspond to triangles in the graph {G_A}. As a result, if {A} has no 3-AP’s (except trivial ones, where {x+y+z=0}), the graph {G_A} has exactly one triangle for every edge. Then we can use the theorem of Ruzsa-Szemerédi, which states that a graph in which every edge lies in exactly one triangle has {o(n^2)} edges; since {G_A} has {\Theta(N|A|)} edges, it follows that {|A| = o(N)}.
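
Concretely, the edge labels around a triangle {(x,y,z)} read

\displaystyle 2x+y, \quad x-z, \quad -y-2z,

and the two consecutive differences both equal {-(x+y+z)}, so the three labels always form a 3-AP in {A} (trivial exactly when {x+y+z = 0}).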

2.2. The measure {\nu}

Now for the generalized version, we start with the second version of Roth’s theorem. Instead of a set {S}, we consider a function

\displaystyle \nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}

which we call a majorizing measure. Since we are now dealing with {A} of low density, we normalize {\nu} so that

\displaystyle \mathbf E[\nu] = 1 + o(1).

Our goal is to now show a result of the form:

Theorem (Relative Roth, informally, weighted version)

If {0 \le f \le \nu}, {\mathbf E f \ge \delta}, and {\nu} satisfies a “pseudorandom” condition, then {\Lambda_3(f,f,f) \ge \Omega_{\delta}(1)}.

The prototypical example of course is that if {A \subset S \subset \mathbb Z/N}, then we let {\nu(x) = \frac{N}{|S|} 1_S(x)}.

2.3. Pseudorandomness for {k=3}

So, how should we formulate the pseudorandomness condition? To start, consider the tripartite graph {G_S} defined earlier, and let {p = |S| / N}; since {S} is sparse, we expect {p} to be small. The main idea, which turns out to be correct, is: the number of embeddings of {K_{2,2,2}} in {G_S} is “as expected”, namely {(1+o(1)) p^{12} N^6}. Here {K_{2,2,2}} is the {2}-blow-up of a triangle. This condition gives us control over the distribution of triangles in the sparse graph {G_S}: knowing that we have approximately the correct count of copies of {K_{2,2,2}} is enough to control the distribution of triangles.

For technical reasons, we in fact want this to be true not only for {K_{2,2,2}} but for all of its subgraphs {H}.

Now let’s move on to the weighted version. Consider a tripartite graph {G}, which we can think of as a collection of three functions

\displaystyle \begin{aligned} \mu_{-z} &: X \times Y \rightarrow \mathbb R \\ \mu_{-y} &: X \times Z \rightarrow \mathbb R \\ \mu_{-x} &: Y \times Z \rightarrow \mathbb R. \end{aligned}

We think of {\mu} as normalized so that {\mathbf E[\mu_{-x}] = \mathbf E[\mu_{-y}] = \mathbf E[\mu_{-z}] = 1}. Then we can define

Definition 4

A weighted tripartite graph {\mu = (\mu_{-x}, \mu_{-y}, \mu_{-z})} satisfies the {3}-linear forms condition if

\displaystyle \begin{aligned} \mathbf E_{x^0,x^1,y^0,y^1,z^0,z^1} &\Big[ \mu_{-x}(y^0,z^0) \mu_{-x}(y^0,z^1) \mu_{-x}(y^1,z^0) \mu_{-x}(y^1,z^1) \\ & \mu_{-y}(x^0,z^0) \mu_{-y}(x^0,z^1) \mu_{-y}(x^1,z^0) \mu_{-y}(x^1,z^1) \\ & \mu_{-z}(x^0,y^0) \mu_{-z}(x^0,y^1) \mu_{-z}(x^1,y^0) \mu_{-z}(x^1,y^1) \Big] \\ &= 1 + o(1) \end{aligned}

and similarly if any subset of the twelve factors is deleted.

Then the pseudorandomness condition for {\nu} is phrased via the graph we defined above:

Definition 5

A function {\nu : \mathbb Z / N \rightarrow \mathbb R_{\ge 0}} satisfies the {3}-linear forms condition if {\mathbf E[\nu] = 1 + o(1)}, and the tripartite graph {\mu = (\mu_{-x}, \mu_{-y}, \mu_{-z})} defined by

\displaystyle \begin{aligned} \mu_{-z}(x,y) &= \nu(2x+y) \\ \mu_{-y}(x,z) &= \nu(x-z) \\ \mu_{-x}(y,z) &= \nu(-y-2z) \end{aligned}

satisfies the {3}-linear forms condition.

Finally, the relative version of Roth’s theorem which we seek is:

Theorem 6 (Relative Roth)

Suppose {\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}} satisfies the {3}-linear forms condition. Then for any {f : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}} bounded above by {\nu} and satisfying {\mathbf E[f] \ge \delta > 0}, we have

\displaystyle \Lambda_3(f,f,f) \ge \Omega_{\delta}(1).

2.4. Relative Szemerédi

We of course have:

Theorem 7 (Szemerédi)

Suppose {k \ge 3}, and {f : \mathbb Z/N \rightarrow [0,1]} with {\mathbf E[f] \ge \delta}. Then

\displaystyle \Lambda_k(f, \dots, f) \ge \Omega_{\delta}(1).

For {k > 3}, rather than considering weighted tripartite graphs, we consider a {(k-1)}-uniform {k}-partite hypergraph. For example, given {\nu} with {\mathbf E[\nu] = 1 + o(1)} and {k=4}, we use the construction

\displaystyle \begin{aligned} \mu_{-z}(w,x,y) &= \nu(3w+2x+y) \\ \mu_{-y}(w,x,z) &= \nu(2w+x-z) \\ \mu_{-x}(w,y,z) &= \nu(w-y-2z) \\ \mu_{-w}(x,y,z) &= \nu(-x-2y-3z). \end{aligned}

Thus 4-AP’s correspond to the simplex {K_4^{(3)}} (i.e. a tetrahedron). We then consider the two-blow-up {H} of the simplex, and require the analogous “as expected” count for {H} and all of its subgraphs.

Here is the general version:

Definition 8

A {(k-1)}-uniform {k}-partite weighted hypergraph {\mu = (\mu_{-i})_{i=1}^k} satisfies the {k}-linear forms condition if

\displaystyle \mathbf E_{x_1^0, x_1^1, \dots, x_k^0, x_k^1} \left[ \prod_{j=1}^k \prod_{\omega \in \{0,1\}^{[k] \setminus \{j\}}} \mu_{-j}\left( x_1^{\omega_1}, \dots, x_{j-1}^{\omega_{j-1}}, x_{j+1}^{\omega_{j+1}}, \dots, x_k^{\omega_k} \right)^{n_{j,\omega}} \right] = 1 + o(1)

for all exponents {n_{j,\omega} \in \{0,1\}}.

Definition 9

A function {\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}} satisfies the {k}-linear forms condition if {\mathbf E[\nu] = 1 + o(1)}, and

\displaystyle \mathbf E_{x_1^0, x_1^1, \dots, x_k^0, x_k^1} \left[ \prod_{j=1}^k \prod_{\omega \in \{0,1\}^{[k] \setminus \{j\}}} \nu\left( \sum_{i=1}^k (j-i)x_i^{\omega_i} \right)^{n_{j,\omega}} \right] = 1 + o(1)

for all exponents {n_{j,\omega} \in \{0,1\}}. This is just the previous condition with the natural {\mu} induced by {\nu}.

The natural generalization of relative Szemerédi is then:

Theorem 10 (Relative Szemerédi)

Suppose {k \ge 3}, and {\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}} satisfies the {k}-linear forms condition. Let {f : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}} with {\mathbf E[f] \ge \delta} and {f \le \nu}. Then

\displaystyle \Lambda_k(f, \dots, f) \ge \Omega_{\delta}(1).

3. Outline of proof of Relative Szemerédi

The proof of Relative Szemerédi uses two key facts. First, one replaces {f} with a bounded {\widetilde f} which is near it:

Theorem 11 (Dense model)

Let {\varepsilon > 0}. There exists {\varepsilon' > 0} such that if:

  • {\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}} satisfies {\left\lVert \nu-1 \right\rVert^{\square}_r \le \varepsilon'}, and
  • {f : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}}, {f \le \nu}

then there exists a function {\widetilde f : \mathbb Z/N \rightarrow [0,1]} such that {\left\lVert f - \widetilde f \right\rVert^{\square}_r \le \varepsilon}.

Here we have a new norm, called the cut norm, defined by

\displaystyle \left\lVert f \right\rVert^{\square}_r = \sup_{A_i \subseteq (\mathbb Z/N)^{r-1}} \left\lvert \mathbf E_{x_1, \dots, x_r} f(x_1 + \dots + x_r) 1_{A_1}(x_{-1}) \dots 1_{A_r}(x_{-r}) \right\rvert.
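
(Here {x_{-i}} denotes the tuple of all variables except {x_i}, so e.g. {1_{A_1}(x_{-1}) = 1_{A_1}(x_2, \dots, x_r)}; similarly {X_{-i}} below denotes the product of all the {X_j} with {j \neq i}.)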

This is actually an extension of the cut norm defined on an {r}-uniform {r}-partite hypergraph (not {(r-1)}-uniform like before!): if {g : X_1 \times \dots \times X_r \rightarrow \mathbb R} is such a weighted hypergraph, we let

\displaystyle \left\lVert g \right\rVert^{\square}_{r,r} = \sup_{A_i \subseteq X_{-i}} \left\lvert \mathbf E_{x_1, \dots, x_r} \left[ g(x_1, \dots, x_r) 1_{A_1}(x_{-1}) \dots 1_{A_r}(x_{-r}) \right] \right\rvert.

Taking {g(x_1, \dots, x_r) = f(x_1 + \dots + x_r)}, {X_1 = \dots = X_r = \mathbb Z/N} gives the analogy.

For the second theorem, we define the norm

\displaystyle \left\lVert g \right\rVert^{\square}_{k-1,k} = \max_{i=1,\dots,k} \left( \left\lVert g_{-i} \right\rVert^{\square}_{k-1, k-1} \right).

Theorem 12 (Relative simplex counting lemma)

Let {\mu}, {g}, {\widetilde g} be {(k-1)}-uniform {k}-partite weighted hypergraphs on {X_1 \cup \dots \cup X_k}. Assume that {\mu} satisfies the {k}-linear forms condition, and {0 \le g_{-i} \le \mu_{-i}} for all {i}, {0 \le \widetilde g \le 1}. If {\left\lVert g-\widetilde g \right\rVert^{\square}_{k-1,k} = o(1)} then

\displaystyle \mathbf E_{x_1, \dots, x_k} \left[ g(x_{-1}) \dots g(x_{-k}) - \widetilde g(x_{-1}) \dots \widetilde g(x_{-k}) \right] = o(1).

One then combines these two results to prove Relative Szemerédi, as follows. Start with {f} and {\nu} as in the theorem. The {k}-linear forms condition turns out to imply {\left\lVert \nu-1 \right\rVert^{\square}_{k-1} = o(1)}, so we can find a nearby {\widetilde f} by the dense model theorem. Then, we induce {\mu}, {g}, {\widetilde g} from {\nu}, {f}, {\widetilde f} respectively. The counting lemma then reduces the bounding of {\Lambda_k(f, \dots, f)} to the bounding of {\Lambda_k(\widetilde f, \dots, \widetilde f)}, which is {\Omega_\delta(1)} by the usual Szemerédi theorem.

4. Arithmetic progressions in primes

We now sketch how to obtain Green-Tao from Relative Szemerédi. As expected, we need to use the von Mangoldt function {\Lambda}.

Unfortunately, {\Lambda} is biased (e.g. “all decent primes are odd”). To get around this, we let {w = w(N)} tend to infinity slowly with {N}, and define

\displaystyle W = \prod_{p \le w} p.

In the {W}-trick we consider only primes {\equiv 1 \pmod W}. The modified von Mangoldt function is then defined by

\displaystyle \widetilde \Lambda(n) = \begin{cases} \frac{\varphi(W)}{W} \log (Wn+1) & Wn+1 \text{ prime} \\ 0 & \text{else}. \end{cases}

In accordance with Dirichlet’s theorem, we have {\sum_{n \le N} \widetilde \Lambda(n) = N + o(N)}.

So, we need to show now that

Proposition 13

Fix {k \ge 3}. We can find {\delta = \delta(k) > 0} such that for {N \gg 1} prime, we can find {\nu : \mathbb Z/N \rightarrow \mathbb R_{\ge 0}} which satisfies the {k}-linear forms condition as well as

\displaystyle \nu(n) \ge \delta \widetilde \Lambda(n)

for {N/2 \le n < N}.

In that case, we can let

\displaystyle f(n) = \begin{cases} \delta \widetilde\Lambda(n) & N/2 \le n < N \\ 0 & \text{else}. \end{cases}

Then {0 \le f \le \nu}. The presence of {N/2 \le n < N} allows us to avoid “wrap-around issues” that arise from using {\mathbb Z/N} instead of {\mathbb Z}. Relative Szemerédi then yields the result.

For completeness, we state the construction. Let {\chi : \mathbb R \rightarrow [0,1]} be smooth and supported on {[-1,1]} with {\chi(0) = 1}, and define a normalizing constant {c_\chi = \int_0^\infty \left\lvert \chi'(x) \right\rvert^2 \; dx}. Inspired by {\Lambda(n) = \sum_{d \mid n} \mu(d) \log(n/d)}, we define a truncated {\Lambda} by

\displaystyle \Lambda_{\chi, R}(n) = \log R \sum_{d \mid n} \mu(d) \chi\left( \frac{\log d}{\log R} \right).

Let {k \ge 3}, {R = N^{k^{-1} 2^{-k-3}}}. Now, we define {\nu} by

\displaystyle \nu(n) = \begin{cases} \dfrac{\varphi(W)}{W} \dfrac{\Lambda_{\chi,R}(Wn+1)^2}{c_\chi \log R} & N/2 \le n < N \\ 0 & \text{else}. \end{cases}

This turns out to work, provided {w} grows sufficiently slowly in {N}.

Formal vs Functional Series (OR: Generating Function Voodoo Magic)

Epistemic status: highly dubious. I found almost no literature doing anything quite like what follows, which unsettles me because it makes it likely that I’m overcomplicating things significantly.

1. Synopsis

Recently I was working on an elegant problem, the original Problem 6 for the 2015 International Math Olympiad, which reads as follows:

Problem

[IMO Shortlist 2015 Problem C6] Let {S} be a nonempty set of positive integers. We say that a positive integer {n} is clean if it has a unique representation as a sum of an odd number of distinct elements from {S}. Prove that there exist infinitely many positive integers that are not clean.

Proceeding by contradiction, one can prove (try it!) that in fact all sufficiently large integers also have exactly one representation as a sum of an even number of distinct elements from {S}. Then, the problem reduces to the following:

Problem

Show that if {s_1 < s_2 < \dots} is an increasing sequence of positive integers and {P(x)} is a nonzero polynomial then we cannot have

\displaystyle \prod_{j=1}^\infty (1 - x^{s_j}) = P(x)

as formal series.

To see this, note that for all sufficiently large {N}, the coefficient of {x^N} on the left-hand side is {1 + (-1) = 0} (one even and one odd representation). Now, the intuitive idea is obvious: the root {1} appears with finite multiplicity in {P}, so we can put {P(x) = (1-x)^k Q(x)} where {Q(1) \neq 0}, and then we get that {1-x} on the RHS divides {P} too many times, right?

Well, there are some obvious issues with this “proof”: for example, consider the equality

\displaystyle 1 = (1-x)(1+x)(1+x^2)(1+x^4)(1+x^8) \dots.

The right-hand side is “divisible” by {1-x}, but the left-hand side is not (as a polynomial).

But we still want to use the idea of plugging {x \rightarrow 1^-}, so what is the right thing to do? It turns out that this is a complete minefield, and there are a lot of very subtle distinctions that seem to not be explicitly mentioned in many places. I think I have a complete answer now, but it’s long enough to warrant this entire blog post.

Here’s the short version: there are actually two distinct notions of “generating function”, namely “formal series” and “functional series”. They use exactly the same notation but are two different types of objects, and this ends up being the source of lots of errors, because “formal series” do not allow substituting {x}, while “functional series” do.

Spoiler: we’ll need the asymptotic for the partition function {p(n)}.

2. Formal Series {\neq} Functional Series

I’m assuming you’ve all heard the definition of {\sum_k c_kx^k}. It turns out unfortunately that this isn’t everything: there are actually two types of objects at play here. They are usually called formal power series and power series, but for this post I will use the more descriptive names formal series and functional series. I’ll do everything over {\mathbb C}, but one can of course use {\mathbb R} instead.

The formal series is easier to describe:

Definition 1

A formal series {F} is an infinite sequence {(a_n)_n = (a_0, a_1, a_2, \dots)} of complex numbers. We often denote it by {\sum a_nx^n = a_0 + a_1x + a_2x^2 + \dots}. The set of formal series is denoted {\mathbb C[ [x] ]}.

This is the “algebraic” viewpoint: it’s a sequence of coefficients. Note that there is no worry about convergence issues or “plugging in {x}”.

On the other hand, a functional series is more involved, because it has to support substitution of values of {x} and worry about convergence issues. So here are the necessary pieces of data:

Definition 2

A functional series {G} (centered at zero) is a function {G : U \rightarrow \mathbb C}, where {U} is an open disk centered at {0} or {U = \mathbb C}. We require that there exists an infinite sequence {(c_0, c_1, c_2, \dots)} of complex numbers satisfying

\displaystyle \forall z \in U: \qquad G(z) = \lim_{N \rightarrow \infty} \left( \sum_{k=0}^N c_k z^k \right).

(The limit is taken in the usual metric of {\mathbb C}.) In that case, the {c_i} are unique and called the coefficients of {G}.

This is often written as {G(x) = \sum_n c_n x^n}, with the open set {U} suppressed.

Remark 3

Some remarks on the definition of functional series:

  • This is enough to imply that {G} is holomorphic (and thus analytic) on {U}.
  • For experts: note that I’m including the domain {U} as part of the data required to specify {G}, which makes the presentation cleaner. Most sources do something with “radius of convergence”; I will blissfully ignore this, leaving this data implicitly captured by {U}.
  • For experts: perhaps non-standardly, I require {U} to have positive radius (so in particular {U \neq \{0\}}); otherwise I can’t take derivatives, etc.

Thus formal and functional series, despite having the same notation, have different types: a formal series {F} is a sequence, while a functional series {G} is a function that happens to be expressible as an infinite sum within its domain.

Of course, from every functional series {G} we can extract its coefficients and make them into a formal series {F}. So, for lack of better notation:

Definition 4

If {F = (a_n)_n} is a formal series, and {G : U \rightarrow \mathbb C} is a functional series whose coefficients equal {F}, then we write {F \simeq G}.

3. Finite operations

Now that we have formal and functional series, we can define sums. Since these are different types of objects, we will have to run definitions in parallel and then ideally check that they respect {\simeq}.

For formal series:

Definition 5

Let {F_1 = (a_n)_n} and {F_2 = (b_n)_n} be formal series. Then we set

\displaystyle \begin{aligned} (a_n)_n \pm (b_n)_n &= (a_n \pm b_n)_n \\ (a_n)_n \cdot (b_n)_n &= \left( \textstyle\sum_{j=0}^n a_jb_{n-j} \right)_n. \end{aligned}

This makes {\mathbb C[ [x] ]} into a ring, with additive identity {(0,0,0,\dots)} and multiplicative identity {(1,0,0,\dots)}.

We also define the derivative of {F = (a_n)_n} by {F' = ((n+1)a_{n+1})_n}.

It’s probably more intuitive to write these definitions as

\displaystyle \begin{aligned} \sum_n a_n x^n \pm \sum_n b_n x^n &= \sum_n (a_n \pm b_n) x^n \\ \left( \sum_n a_n x^n \right) \left( \sum_n b_n x^n \right) &= \sum_n \left( \sum_{j=0}^n a_jb_{n-j} \right) x^n \\ \left( \sum_n a_n x^n \right)' &= \sum_n na_n x^{n-1} \end{aligned}

and in what follows I’ll start to use {\sum_n a_nx^n} more. But officially, all definitions for formal series are in terms of the coefficients alone; the presence of {x} serves as motivation only.

Exercise 6

Show that if {F = \sum_n a_nx^n} is a formal series, then it has a multiplicative inverse if and only if {a_0 \neq 0}.
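
For example (standard computations, spelled out here for concreteness): {1 - x = (1,-1,0,0,\dots)} has inverse {(1,1,1,\dots)}, i.e.

\displaystyle \frac{1}{1-x} = 1 + x + x^2 + x^3 + \dots,

while {x = (0,1,0,0,\dots)} has no inverse, since any product with {x} has constant term {0}.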

On the other hand, with functional series, the above operations are even simpler:

Definition 7

Let {G_1 : U \rightarrow \mathbb C} and {G_2 : U \rightarrow \mathbb C} be functional series with the same domain {U}. Then {G_1 \pm G_2} and {G_1 \cdot G_2} are defined pointwise.

If {G : U \rightarrow \mathbb C} is a functional series (hence holomorphic), then {G'} is defined pointwise.

If {G} is nonvanishing on {U}, then {1/G : U \rightarrow \mathbb C} is defined pointwise (and otherwise is not defined).

Now, for these finite operations, everything works as you expect:

Theorem 8 (Compatibility of finite operations)

Suppose {F}, {F_1}, {F_2} are formal series, and {G}, {G_1}, {G_2} are functional series {U \rightarrow \mathbb C}. Assume {F \simeq G}, {F_1 \simeq G_1}, {F_2 \simeq G_2}.

  • {F_1 \pm F_2 \simeq G_1 \pm G_2} and {F_1 \cdot F_2 \simeq G_1 \cdot G_2}.
  • {F' \simeq G'}.
  • If {1/G} is defined, then {1/F} is defined and {1/F \simeq 1/G}.

So far so good, as long as we’re doing finite operations. But once we step beyond that, things begin to go haywire.

4. Limits

We need to start considering limits of {(F_k)_k} and {(G_k)_k}, since we are trying to make progress towards infinite sums and products. Once we do this, things start to burn.

Definition 9

Let {F_1 = \sum_n a_n x^n} and {F_2 = \sum_n b_n x^n} be formal series, and define the distance between them by

\displaystyle d(F_1, F_2) = \begin{cases} 2^{-n} & a_n \neq b_n, \; n \text{ minimal} \\ 0 & F_1 = F_2. \end{cases}

This function makes {\mathbb C[[x]]} into a metric space, so we can discuss limits in this space. In fact, it is a normed vector space, with norm given by {\left\lVert F \right\rVert = d(F,0)}.

Thus, {\lim_{k \rightarrow \infty} F_k = F} if each coefficient of {x^n} eventually stabilizes as {k \rightarrow \infty}. For example, as formal series the sequence {(1,-1,0,0,\dots)}, {(1,0,-1,0,\dots)}, {(1,0,0,-1,\dots)}, … converges to {1 = (1,0,0,0,\dots)}, which we write as

\displaystyle \lim_{k \rightarrow \infty} (1 - x^k) = 1 \qquad \text{as formal series}.

As for functional series, since they are functions on the same open set {U}, we can use pointwise convergence or the stronger uniform convergence; we’ll say explicitly which one we’re doing.

Example 10 (Limits don’t work at all)

In what follows, {F_k \simeq G_k} for every {k}.

  • Here is an example showing that if {\lim_k F_k = F}, the functions {G_k} may not converge even pointwise. Indeed, just take {F_k = 1 - x^k} as before, and let {U = \{ z : |z| < 2 \}}.
  • Here is an example showing that even if {G_k \rightarrow G} uniformly, {\lim_k F_k} may not exist. Take {G_k = 1 - 1/k} as constant functions. Then {G_k \rightarrow 1} uniformly, but {\lim_k F_k} doesn’t exist, because the constant term never stabilizes (convergence of formal series requires each coefficient to be eventually constant).
  • The following example from this math.SE answer by Robert Israel shows that it’s possible that {F = \lim_k F_k} exists, and {G_k \rightarrow G} pointwise, and still {F \not\simeq G}. Let {U} be the open unit disk, and set

    \displaystyle \begin{aligned} A_k &= \{z = r e^{i\theta} \mid 2/k \le r \le 1, \; 0 \le \theta \le 2\pi - 1/k\} \\ B_k &= \left\{ |z| \le 1/k \right\} \end{aligned}

for {k \ge 1}. By Runge’s theorem there’s a polynomial {p_k(z)} such that

    \displaystyle |p_k(z) - 1/z^{k}| < 1/k \text{ on } A_k \qquad \text{and} \qquad |p_k(z)| < 1/k \text{ on }B_k.

    Then

    \displaystyle G_k(z) = z^{k+1} p_k(z)

    is the desired counterexample (with {F_k} being the sequence of coefficients of {G_k}). Indeed by construction {\lim_k F_k = 0}, since {\left\lVert F_k \right\rVert \le 2^{-k}} for each {k}. Alas, {|G_k(z) - z| \le 2/k} for {z \in A_k \cup B_k}, so {G_k} converges pointwise to the identity function {z \mapsto z}.

To be fair, we do have the following saving grace:

Theorem 11 (Uniform convergence plus formal convergence is sufficient)

Suppose that {G_k \rightarrow G} uniformly. If {F_k \simeq G_k} for every {k} and {\lim_k F_k = F}, then {F \simeq G}.

Proof: Here is a proof, adapted from this math.SE answer by Joey Zhou. WLOG {G = 0}, write {g_n(z) = \sum_k a^{(n)}_k z^k} for {G_n}, and let {F = \sum_k a_k x^k}. It suffices to show that {a_k = 0} for all {k}. Choose any {0<r<1} such that the circle {|z|=r} lies inside {U}. By Cauchy’s integral formula, we have

\displaystyle \left| a^{(n)}_k \right| = \left|\frac{1}{2\pi i} \int\limits_{|z|=r}{\frac{g_n(z)}{z^{k+1}}\text{ d}z}\right| \le \frac{1}{2\pi}(2\pi r)\frac{1}{r^{k+1}}\max\limits_{|z|=r}{|g_n(z)|} \xrightarrow{n\rightarrow\infty} 0

since {g_n} converges uniformly to {0} on {U}. Hence {\lim\limits_{n\rightarrow\infty}{a^{(n)}_k} = 0} for each {k}. On the other hand, {\lim_n F_n = F} means each coefficient eventually stabilizes, i.e. {a_k = a^{(n)}_k} for {n} sufficiently large. Therefore {a_k = 0}, as desired. \Box

The take-away from this section is that limits are relatively poorly behaved.

5. Infinite sums and products

Naturally, infinite sums and products are defined by taking limits of partial sums and partial products. The following example (from math.SE again) shows the nuances of this behavior.

Example 12 (On {e^{1+x}})

The expression

\displaystyle \sum_{n=0}^\infty \frac{(1+x)^n}{n!} = \lim_{N \rightarrow \infty} \sum_{n=0}^N \frac{(1+x)^n}{n!}

does not make sense as a formal series: for every {N}, the constant term of the partial sum changes, so the partial sums do not converge in {\mathbb C[[x]]}.

But this does converge (uniformly, even) to a functional series on {U = \mathbb C}, namely to {e^{1+x}}.

Exercise 13

Let {(F_k)_{k \ge 1}} be formal series.

  • Show that an infinite sum {\sum_{k=1}^\infty F_k(x)} converges as formal series exactly when {\lim_k \left\lVert F_k \right\rVert = 0}.
  • Assume for convenience {F_k(0) = 1} for each {k}. Show that the infinite product {\prod_{k=1}^{\infty} F_k} converges as formal series exactly when {\lim_k \left\lVert F_k-1 \right\rVert = 0}.
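
In particular, the infinite product {\prod_{j=1}^\infty (1-x^{s_j})} from the original problem converges as a formal series: since {s_1 < s_2 < \dots} are distinct positive integers, we have {\left\lVert (1 - x^{s_j}) - 1 \right\rVert = 2^{-s_j} \rightarrow 0}.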

Now the upshot is that one example of a convergent formal sum is the expression {\lim_{N} \sum_{n=0}^N a_nx^n} itself! This means we can use standard “radius of convergence” arguments to transfer a formal series into a functional one.

Theorem 14 (Constructing {G} from {F})

Let {F = \sum c_nx^n} be a formal series and let

\displaystyle r = \frac{1}{\limsup_n \sqrt[n]{|c_n|}}.

If {r > 0} then there exists a functional series {G} on {U = \{ |z| < r \}} such that {F \simeq G}.

Proof: Let {F_k} and {G_k} be the corresponding partial sums {c_0x^0 + \dots + c_kx^k}. Then by the Cauchy-Hadamard theorem, we have {G_k \rightarrow G} uniformly on (compact subsets of) {U}. Also, {\lim_k F_k = F} by construction. \Box
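As a quick numerical illustration of the formula (a sketch of mine, not part of the proof), one can estimate {r} directly from the coefficients; note how a polynomial factor in {c_n} washes out:

```python
from math import exp, log

# Estimate r = 1 / limsup |c_n|^{1/n} for c_n = n^2 * 3^n.
# The polynomial factor n^2 does not affect the radius,
# so the estimates approach r = 1/3.
for n in (10, 100, 1000):
    c_n = n ** 2 * 3 ** n
    print(n, exp(-log(c_n) / n))  # tends to 0.333...
```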

This works less well with products: for example we have

\displaystyle 1 \equiv (1-x) \prod_{j \ge 0} (1+x^{2^j})

as formal series, but we can’t “plug in {x=1}”, for example.
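As a sanity check on this identity, here is a short Python sketch of mine that multiplies everything out modulo {x^{16}}, treating truncated polynomials as plain coefficient lists:

```python
N = 16  # work modulo x^N

def mul(p, q):
    """Multiply two coefficient lists modulo x^N."""
    r = [0] * N
    for i, a in enumerate(p):
        if a:
            for j, b in enumerate(q):
                if i + j < N:
                    r[i + j] += a * b
    return r

prod = [1] + [0] * (N - 1)           # running product, starts at 1
j = 0
while 2 ** j < N:
    factor = [0] * N
    factor[0] = 1
    factor[2 ** j] = 1               # the factor 1 + x^(2^j)
    prod = mul(prod, factor)
    j += 1

one_minus_x = [1, -1] + [0] * (N - 2)
print(mul(one_minus_x, prod))        # [1, 0, 0, ..., 0]
```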

6. Finishing the original problem

We finally return to the original problem: we wish to show that the equality

\displaystyle P(x) = \prod_{j=1}^\infty (1 - x^{s_j})

cannot hold as formal series. Tacitly, this just means

\displaystyle \lim_{N \rightarrow \infty} \prod_{j=1}^N\left( 1 - x^{s_j} \right) = P(x)

as formal series.

Here is a solution obtained only by considering coefficients, presented by Qiaochu Yuan from this MathOverflow question.

Both sides have constant coefficient {1}, so we may invert them; thus it suffices to show we cannot have

\displaystyle \frac{1}{P(x)} = \frac{1}{\prod_{j=1}^{\infty} (1 - x^{s_j})}

as formal power series.

The coefficients of the LHS grow asymptotically like a polynomial times an exponential: {1/P(x)} is a rational function, so its coefficients satisfy a linear recurrence.

On the other hand, the coefficients of the RHS can be shown to have growth both strictly larger than any polynomial (by truncating the product) and strictly smaller than any exponential (by comparing to the growth rate in the case where {s_j = j}, which gives the partition function {p(n)} mentioned before). So the two rates of growth can’t match.
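To illustrate the two growth claims concretely in the comparison case {s_j = j}, here is a small Python sketch of mine computing the partition numbers {p(n)}: the quantity {p(n)^{1/n}} tends to {1} (subexponential growth), while {p(n)/n^5} blows up (superpolynomial growth).

```python
# Partition numbers p(n): the coefficients of 1/prod_{j>=1}(1-x^j),
# computed by the standard coin-style dynamic programming.
N = 600
p = [1] + [0] * N
for j in range(1, N + 1):
    for n in range(j, N + 1):
        p[n] += p[n - j]

for n in (100, 200, 400, 600):
    # p(n)^(1/n) -> 1 (slower than any exponential),
    # while p(n)/n^5 -> infinity (faster than any polynomial).
    print(n, p[n] ** (1.0 / n), p[n] / n ** 5)
```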

New algebra handouts on my website

For olympiad students: I have now published some new algebra handouts. They are:

  • Introduction to Functional Equations, which covers the basic techniques and theory for FE’s typically appearing on olympiads like USA(J)MO.
  • Monsters, an advanced handout which covers functional equations that have pathological solutions. It covers in detail the solutions to the Cauchy functional equation.
  • Summation, which is a compilation of various types of olympiad-style sums like generating functions and multiplicative number theory.

I have also uploaded:

  • English, notes on proof-writing that I used at the 2016 MOP (Mathematical Olympiad Summer Program).

You can download all these (and other handouts) from my MIT website. Enjoy!

Approximating E3-LIN is NP-Hard

This lecture, which I gave for my 18.434 seminar, focuses on the MAX-E3LIN problem. We prove that approximating it is NP-hard by a reduction from LABEL-COVER.

1. Introducing MAX-E3LIN

In the MAX-E3LIN problem, our input is a system of linear equations {\pmod 2} in {n} binary variables, each with three terms. Equivalently, one can think of this as {\pm 1} variables and ternary products. The objective is to maximize the fraction of satisfied equations.

Example 1 (Example of MAX-E3LIN instance)

\displaystyle \begin{aligned} x_1 + x_3 + x_4 &\equiv 1 \pmod 2 \\ x_1 + x_2 + x_4 &\equiv 0 \pmod 2 \\ x_1 + x_2 + x_5 &\equiv 1 \pmod 2 \\ x_1 + x_3 + x_5 &\equiv 1 \pmod 2 \end{aligned}

\displaystyle \begin{aligned} x_1 x_3 x_4 &= -1 \\ x_1 x_2 x_4 &= +1 \\ x_1 x_2 x_5 &= -1 \\ x_1 x_3 x_5 &= -1 \end{aligned}

A diligent reader can check that we may obtain {\frac34} but not {1}.
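Here is a quick brute-force verification in Python (my own script, checking all {2^5} assignments of Example 1):

```python
from itertools import product

# The four equations from Example 1, as (indices, parity) pairs:
# x_i + x_j + x_k = parity (mod 2), with variables x_1..x_5.
equations = [((1, 3, 4), 1), ((1, 2, 4), 0), ((1, 2, 5), 1), ((1, 3, 5), 1)]

best = 0
for assignment in product((0, 1), repeat=5):
    x = dict(zip(range(1, 6), assignment))
    satisfied = sum((x[i] + x[j] + x[k]) % 2 == parity
                    for (i, j, k), parity in equations)
    best = max(best, satisfied)

print(best, "of", len(equations))   # prints: 3 of 4
```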

Remark 2

We immediately notice that

  • If there’s a solution with value {1}, we can find it easily with {\mathbb F_2} linear algebra.
  • It is always possible to get at least {\frac{1}{2}} by selecting the all-zero or all-one assignment.

The theorem we will prove today is that these “obvious” observations are essentially the best ones possible! Our main result is that improving the above constants to 51% and 99%, say, is NP-hard.

Theorem 3 (Hardness of MAX-E3LIN)

The {\frac{1}{2}+\varepsilon} vs. {1-\delta} decision problem for MAX-E3LIN is NP-hard.

This means it is NP-hard to decide whether a MAX-E3LIN instance has value {\le \frac{1}{2}+\varepsilon} or {\ge 1-\delta} (given it is one or the other). A direct corollary is that approximating MAX-SAT is also NP-hard.

Corollary 4

The {\frac78+\varepsilon} vs. {1-\delta} decision problem for MAX-SAT is NP-hard.

Remark 5

The constant {\frac78} is optimal in light of a random assignment. In fact, one can replace {1-\delta} with {1}, but we don’t do so here.

Proof: Given an equation {a+b+c=1} in MAX-E3LIN, we consider the four formulas {a \lor b \lor c}, {a \lor \neg b \lor \neg c}, {\neg a \lor b \lor \neg c}, {\neg a \lor \neg b \lor c}. Either three or four of them are satisfied, with four occurring exactly when {a+b+c \equiv 1 \pmod 2}. One does a similar construction for an equation {a+b+c=0}. \Box
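For concreteness, here is a small truth-table check of this gadget in Python (my own illustration; the clause list matches the proof above):

```python
from itertools import product

# Clauses for the equation a + b + c = 1 (mod 2).
clauses = [(("a", 1), ("b", 1), ("c", 1)),   # a or b or c
           (("a", 1), ("b", 0), ("c", 0)),   # a or not-b or not-c
           (("a", 0), ("b", 1), ("c", 0)),   # not-a or b or not-c
           (("a", 0), ("b", 0), ("c", 1))]   # not-a or not-b or c

for a, b, c in product((0, 1), repeat=3):
    val = {"a": a, "b": b, "c": c}
    count = sum(any(val[v] == want for v, want in cl) for cl in clauses)
    print(a, b, c, "->", count, "satisfied")  # 4 iff a+b+c is odd, else 3
```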

The hardness of MAX-E3LIN is relevant to the PCP theorem: using MAX-E3LIN gadgets, Håstad was able to prove a very strong version of the PCP theorem, in which the verifier reads just three bits of a proof!

Theorem 6 (Håstad PCP)

Let {\varepsilon, \delta > 0}. We have

\displaystyle \mathbf{NP} \subseteq \mathbf{PCP}_{\frac{1}{2}+\varepsilon, 1-\delta}(3, O(\log n)).

In other words, any {L \in \mathbf{NP}} has a (non-adaptive) verifier with the following properties.

  • The verifier uses {O(\log n)} random bits, and queries just three (!) bits.
  • The acceptance condition is either {a+b+c=1} or {a+b+c=0}.
  • If {x \in L}, then there is a proof {\Pi} which is accepted with probability at least {1-\delta}.
  • If {x \notin L}, then every proof is accepted with probability at most {\frac{1}{2} + \varepsilon}.

2. Label Cover

We will prove our main result by reducing from LABEL-COVER. Recall LABEL-COVER is played as follows: we have a bipartite graph {G = U \cup V}, a set of keys {K} for the vertices of {U} and a set of labels {L} for those of {V}. For every edge {e = \{u,v\}} there is a function {\pi_e : L \rightarrow K} specifying a key {k = \pi_e(\ell) \in K} for every label {\ell \in L}. The goal is to label the graph {G} while maximizing the number of edges {e} with compatible key-label pairs.

Approximating LABEL-COVER is NP-hard:

Theorem 7 (Hardness of LABEL-COVER)

The {\eta} vs. {1} decision problem for LABEL-COVER is NP-hard for every {\eta > 0}, provided {|K|} and {|L|} are sufficiently large in terms of {\eta}.

So for any {\eta > 0}, it is NP-hard to decide whether one can satisfy all edges or at most an {\eta} fraction of them.

3. Setup

We are going to make a reduction of the following shape:

[Figure: the shape of the reduction, from LABEL-COVER to MAX-E3LIN.]

In words this means that

  • “Completeness”: If the LABEL-COVER instance is completely satisfiable, then we get a solution of value {\ge 1 - \delta} in the resulting MAX-E3LIN.
  • “Soundness”: If the LABEL-COVER instance has value {\le \eta}, then we get a solution of value {\le \frac{1}{2} + \varepsilon} in the resulting MAX-E3LIN.

Thus given an oracle for MAX-E3LIN decision, we can obtain {\eta} vs. {1} decision for LABEL-COVER, which we know is hard.

The setup for this is quite involved, using a huge number of variables. Just to agree on some conventions:

Definition 8 (“Long Code”)

A {K}-indexed binary string {x = (x_k)_k} is a {\pm 1} sequence indexed by {K}. We can think of it as an element of {\{\pm 1\}^K}. An {L}-indexed binary string {y = (y_\ell)_\ell} is defined similarly.

Now we initialize {|U| \cdot 2^{|K|} + |V| \cdot 2^{|L|}} variables:

  • At every vertex {u \in U}, we will create {2^{|K|}} binary variables, one for every {K}-indexed binary string. It is better to collect these variables into a function

    \displaystyle f_u : \{\pm1\}^K \rightarrow \{\pm1\}.

  • Similarly, at every vertex {v \in V}, we will create {2^{|L|}} binary variables, one for every {L}-indexed binary string, and collect these into a function

    \displaystyle g_v : \{\pm1\}^L \rightarrow \{\pm1\}.

Picture:

[Figure: an edge {e = \{u,v\}} with the functions {f_u} and {g_v} attached.]

Next we generate the equations. Here’s the motivation: we want to do this in such a way that given a satisfying labelling for LABEL-COVER, nearly all the MAX-E3LIN equations can be satisfied. One idea is as follows: for every edge {e}, letting {\pi = \pi_e},

  • Take a {K}-indexed binary string {x = (x_k)_k} at random. Take an {L}-indexed binary string {y = (y_\ell)_\ell} at random.
  • Define the {L}-indexed binary string {z = (z_\ell)_\ell} by {z_\ell = x_{\pi(\ell)} y_\ell}.
  • Write down the equation {f_u(x) g_v(y) g_v(z) = +1} for the MAX-E3LIN instance.

Thus, assuming we had a valid labelling of the graph, we could let {f_u} and {g_v} be the dictator functions for the labelling. In that case, {f_u(x) = x_{\pi(\ell)}}, {g_v(y) = y_\ell}, and {g_v(z) = x_{\pi(\ell)} y_\ell}, so the product is always {+1}.

Unfortunately, this has two fatal flaws:

  1. This means a {1} instance of LABEL-COVER gives a {1} instance of MAX-E3LIN; but a value-{1} instance of MAX-E3LIN can be solved exactly in polynomial time by linear algebra, so we need {1-\delta} to have a hope of working.
  2. Right now we could also just set all variables to be {+1}.

We fix this as follows, by using the following equations.

Definition 9 (Equations of reduction)

For every edge {e}, with {\pi = \pi_e}, we alter the construction and say

  • Let {x = (x_k)_k} and {y = (y_\ell)_\ell} be random as before.
  • Let {n = (n_\ell)_\ell} be a random {L}-indexed binary string, drawn from a {\delta}-biased distribution ({-1} with probability {\delta}). And now define {z = (z_\ell)_\ell} by

    \displaystyle z_\ell = x_{\pi(\ell)} y_\ell n_\ell .

    The {n_\ell} represent “noise” bits, which resolve the first problem by corrupting each bit of {z} with probability {\delta}.

  • Write down one of the following two equations with {\frac{1}{2}} probability each:

    \displaystyle \begin{aligned} f_u(x) g_v(y) g_v(z) &= +1 \\ f_u(x) g_v(y) g_v(-z) &= -1. \end{aligned}

    This resolves the second issue.

This gives a set of {O(|E|)} equations.

I claim this reduction works. So we need to prove the “completeness” and “soundness” claims above.

4. Proof of Completeness

Given a labeling of {G} with value {1}, as described we simply let {f_u} and {g_v} be dictator functions corresponding to this valid labelling. Then as we’ve seen, we will pass a {1 - \delta} fraction of the equations in expectation.

5. A Fourier Computation

Before proving soundness, we will first need to explicitly compute the probability an equation above is satisfied. Remember we generated an equation for {e} based on random strings {x}, {y}, {n}.

For {T \subseteq L}, we define

\displaystyle \pi^{\text{odd}}_e(T) = \left\{ k \in K \mid \left\lvert \pi_e^{-1}(k) \cap T \right\rvert \text{ is odd} \right\}.

Thus {\pi^{\text{odd}}_e} maps subsets of {L} to subsets of {K}.

Remark 10

Note that {|\pi^{\text{odd}}(T)| \le |T|} and that {\pi^{\text{odd}}(T) \neq \varnothing} if {|T|} is odd.
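Since {\pi^{\text{odd}}_e} is just a parity count, here is a small Python sketch of mine (the helper name is my own) implementing it and spot-checking Remark 10 on random inputs:

```python
import random

def pi_odd(pi, T):
    """pi: dict label -> key; T: set of labels.
    Returns the set of keys hit by an odd number of labels in T."""
    count = {}
    for ell in T:
        count[pi[ell]] = count.get(pi[ell], 0) + 1
    return {k for k, c in count.items() if c % 2 == 1}

random.seed(0)
labels, keys = range(10), range(4)
for _ in range(1000):
    pi = {ell: random.choice(keys) for ell in labels}
    T = {ell for ell in labels if random.random() < 0.5}
    S = pi_odd(pi, T)
    assert len(S) <= len(T)
    if len(T) % 2 == 1:
        assert S  # nonempty whenever |T| is odd
print("Remark 10 checked on 1000 random instances")
```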

Lemma 11 (Edge Probability)

The probability that an equation generated for {e = \{u,v\}} is true is

\displaystyle \frac{1}{2} + \frac{1}{2} \sum_{\substack{T \subseteq L \\ |T| \text{ odd}}} (1-2\delta)^{|T|} \widehat g_v(T)^2 \widehat f_u(\pi^{\text{odd}}_e(T)).

Proof: Omitted for now\dots \Box

6. Proof of Soundness

We will go in the reverse direction and show (constructively) that if the MAX-E3LIN instance has a solution with value {\ge\frac{1}{2}+2\varepsilon}, then we can reconstruct a solution to LABEL-COVER with value {\ge \eta}. (The use of {2\varepsilon} here will be clear in a moment.) This process is called “decoding”.

The idea is as follows: if {S} is a small set such that {\widehat f_u(S)} is large, then we can pick a key from {S} at random for {f_u}; compare this with the dictator functions where {\widehat f_u(S) = 1} and {|S| = 1}. We want to do something similar with {T}.

Here are the concrete details. Let {\Lambda = \frac{\log(1/\varepsilon)}{2\delta}} and {\eta = \frac{\varepsilon^3}{\Lambda^2}} be constants (the actual values arise later).

Definition 12

We say that a nonempty set {S \subseteq K} of keys is heavy for {u} if

\displaystyle \left\lvert S \right\rvert \le \Lambda \qquad\text{and}\qquad \left\lvert \widehat{f_u}(S) \right\rvert \ge \varepsilon.

Note that there are at most {\varepsilon^{-2}} heavy sets by Parseval.

Definition 13

We say that a nonempty set {T \subseteq L} of labels is {e}-excellent for {v} if

\displaystyle \left\lvert T \right\rvert \le \Lambda \qquad\text{and}\qquad S = \pi^{\text{odd}}_e(T) \text{ is heavy.}

In particular {S \neq \varnothing} so at least one compatible key-label pair is in {S \times T}.

Notice that, unlike the case with {S}, the criterion for “good” in {T} actually depends on the edge {e} in question! This makes it easier to select keys than to select labels. In order to pick labels, we will have to choose from a {\widehat g_v^2} distribution.

Lemma 14 (At least {\varepsilon} of {T} are excellent)

For any edge {e = \{u,v\}}, at least {\varepsilon} of the possible {T} according to the distribution {\widehat g_v^2} are {e}-excellent.

Proof: Applying an averaging argument to the inequality

\displaystyle \sum_{\substack{T \subseteq L \\ |T| \text{ odd}}} (1-2\delta)^{|T|} \widehat g_v(T)^2 \left\lvert \widehat f_u(\pi^{\text{odd}}(T)) \right\rvert \ge 2\varepsilon

shows there is at least an {\varepsilon} chance that {|T|} is odd and satisfies

\displaystyle (1-2\delta)^{|T|} \left\lvert \widehat f_u(S) \right\rvert \ge \varepsilon

where {S = \pi^{\text{odd}}_e(T)}. In particular, {(1-2\delta)^{|T|} \ge \varepsilon \iff |T| \le \Lambda}. Finally, by Remark 10, we see {S} is heavy. \Box

Now, use the following algorithm.

  • For every vertex {u \in U}, take the union of all heavy sets, say

    \displaystyle \mathcal H = \bigcup_{S \text{ heavy}} S.

    Pick a random key from {\mathcal H}. Note that {|\mathcal H| \le \Lambda\varepsilon^{-2}}, since there are at most {\varepsilon^{-2}} heavy sets (by Parseval) and each has at most {\Lambda} elements.

  • For every vertex {v \in V}, select a random set {T} according to the distribution {\widehat g_v(T)^2}, and select a random element from {T}.

I claim that this works.

Fix an edge {e}. There is at least an {\varepsilon} chance that {T} is {e}-excellent. If it is, then there is at least one compatible pair in {\mathcal H \times T}. Hence we conclude the probability of success is at least

\displaystyle \varepsilon \cdot \frac{1}{\Lambda \varepsilon^{-2}} \cdot \frac{1}{\Lambda} = \frac{\varepsilon^3}{\Lambda^2} = \eta.

(Addendum: it’s pointed out to me this isn’t quite right; the overall probability of the equation given by an edge {e} is {\ge \frac{1}{2}+\varepsilon}, but this doesn’t imply it for every edge. Thus one likely needs to do another averaging argument.)

 

Against the “Research vs. Olympiads” Mantra

There’s a Mantra that you often hear in math contest discussions: “math olympiads are very different from math research”. (For known instances, see O’Neil, Tao, and more. More neutral stances: Monks, Xu.)

It’s true. And I wish people would stop saying it.

Every time I’ve heard the Mantra, it set off a little red siren in my head: something felt wrong. And I could never figure out quite why until last July. There was some (silly) forum discussion about how Allen Liu had done extraordinarily on math contests over the past year. Then someone says:

A: Darn, what math problem can he not do?!

B: I’ll go out on a limb and say that the answer to this is “most of the problems worth asking.” We’ll see where this stands in two years, at which point the answer will almost certainly change, but research \neq Olympiads.

Then it hit me.

Ping-pong vs. Tennis

Let’s try the following thought experiment. Consider a world-class ping-pong player, call her Sarah. She has a fan-base talking about her pr0 ping-pong skills. Then someone comes along and says:

Well, table tennis isn’t the same as tennis.

To which I and everyone else reasonable would say, “uh, so what?”. It’s true, but totally irrelevant; ping-pong and tennis are just not related. Maybe Sarah will be better than average at tennis, but there’s no reason to expect her to be world-class in that too.

And yet we say exactly the same thing for olympiads versus research. Someone wins the IMO, out pops the Mantra. Even if the Mantra is true when taken literally, it’s implicitly sending the message there’s something wrong with being good at contests and not good at research.

So now I ask: just what is wrong with that? To answer this question, I first need to answer: “what is math?”.

There’s been a trick played with this debate, and you can’t see it unless you taboo the word “math”. The word “math” can refer to a bunch of things, like:

  • Training for contest problems like USAMO/IMO, or
  • Learning undergraduate/graduate materials like algebra and analysis, or
  • Working on open problems and conjectures (“research”).

So here’s the trick. The research community managed to claim the name “math”, leaving only “math contests” for the olympiad community. Now the sentence

“Math contests should be relevant to math”

seems totally innocuous. But taboo the word “math”, and you get

“Olympiads should be relevant to research”

and then you notice something’s wrong. In other words, since “math” is a substring of “math contests”, it suddenly seems like the olympiads are subordinate to research. All because of an accident in naming.

Since when? Everyone agrees that olympiads and research are different things, but it does not then follow that “olympiads are useless”. Even if ping-pong is called “table tennis”, that doesn’t mean the top ping-pong players are somehow inferior to top tennis players. (And the scary thing is that in a world without the name “ping-pong”, I can imagine some people actually thinking so.)

I think for many students, olympiads do a lot of good, independent of any value to future math research. Math olympiads give high school students something interesting to work on, and even the training process for a contest such as the IMO carries valuable life lessons: it teaches you how to work hard even in the face of possible failure, and what it’s like to be competitive at an international level (i.e. what it’s like to become really good at something after years of hard work). The peer group that math contests give is also wonderful, and quite similar to the kind of people you’d meet at a top-tier university (and in some cases, they’re more or less the same people). And the problem-solving ability you gain from math contests is indisputably helpful elsewhere in life. Consequently, I’m well on record as saying the biggest benefits of math contests have nothing to do with math.

There are also more mundane (but valid) reasons (they help get students out of the classroom, and other standard blurbs about STEM and so on). And as a matter of taste I also think contest problems are interesting and beautiful in their own right. You could even try to make more direct comparisons (for example, I’d guess the average arXiv paper in algebraic geometry gets less attention than the average IMO geometry problem), but that’s a point for another blog post entirely.

The Right and Virtuous Path

Which now leads me to what I think is a culture issue.

MOP alumni prior to maybe 2010 or so were classified into two groups. They would either go on to math research, which was somehow seen as the “right and virtuous path”, or they would defect to software/finance/applied math/etc. Somehow there is always this implicit, unspoken message that the smart MOPpers do math research and the dumb MOPpers drop out.

I’ll tell you how I realized why I didn’t like the Mantra: it’s because the only time I hear the Mantra is when someone is belittling olympiad medalists.

The Mantra says that the USA winning the IMO is no big deal. The Mantra says Allen Liu isn’t part of the “smart club” until he succeeds in research too. The Mantra says that the countless time and energy put into running each year’s MOP are a waste of time. The Mantra says that the students who eventually drop out of math research are “not actually good at math” and “just good at taking tests”. The Mantra even tells outsiders that they, too, can be great researchers, because olympiads are useless anyways.

The Mantra is math research’s recruiting slogan.

And I think this is harmful. The purpose of olympiads was never to produce more math researchers. If it’s really the case that olympiads and research are totally different, then we should expect relatively few olympiad students to go into research; yet in practice, a lot of them do. I think one could make a case that a lot of the past olympiad students are going into math research without realizing that they’re getting into something totally unrelated, just because the sign at the door said “math”. One could also make a case that it’s very harmful for those that don’t do research, or try research and then decide they don’t like it: suddenly these students don’t think they’re “good at math” any more, they’re not smart enough to be a mathematician, etc.

But we need this kind of problem-solving skill and talent too much for it to all be spent on computing R(6,6). Richard Rusczyk’s take from Math Prize for Girls 2014 is:

When people ask me, am I disappointed when my students don’t go off and be mathematicians, my answer is I’d be very disappointed if they all did. We need people who can think about these complex problems and solve really hard problems they haven’t seen before everywhere. It’s not just in math, it’s not just in the sciences, it’s not just in medicine — I mean, what we’d give to get some of them in Congress!

Academia is a fine career, but there’s tons of other options out there: the research community may denounce those who switch out as failures, but I’m sure society will take them with open arms.

To close, I really like this (sarcastic) comment from Steven Karp (near bottom):

Contest math is inaccessible to over 90% of people as it is, and then we’re supposed to tell those that get it that even that isn’t real math? While we’re at it, let’s tell Vi Hart to stop making videos because they don’t accurately represent math research.

Addendums (response to comments)

Thanks first of all for the many long and thoughtful comments from everyone (here, on Facebook, in private, and so on). It’s given me a lot to think about.

Here are my responses to some of the points that were raised; this is necessarily incomplete because of the volume of discussion.

  1. To start off, it was suggested I should explicitly clarify: I do not mean to imply that people who didn’t do well on contests cannot do well in math research. So let me say that now.

  2. My favorite comment that I got was that in fact this whole post pattern matches with bravery debates.

    On one hand you have lots of olympiad students who actually FEEL BAD about winning medals because they “weren’t doing real math”. But on the other hand there are students whose parents tell them to not pursue math as a major or career because of low contest scores. These students (and their parents) would benefit a lot from the Mantra; so I concede that there are indeed good use cases of the Mantra (such as those that Anonymous Chicken, betaveros describe below) and in particular the Mantra is not intrinsically bad.

    Which of these uses is the “common use” probably depends on which tribes you are part of (guess which one I see more?). It’s interesting that in this case, the two sides actually agree on the basic fact (that contests and research are not so correlated).

  3. Some people point out that research is a career while contests aren’t. I am not convinced by this; I don’t think “is a career” is a good metric for measuring value to society, and can think of several examples of actual jobs that I think really should not exist (not saying any names). In addition, I think that if the general public understood what mathematicians actually do for a career, they just might be a little less willing to pay us.

    I think there’s an interesting discussion about whether contests / research are “valuable” or not, but I don’t think the answer is one-sided; this would warrant a whole different debate (and would derail the entire post if I tried to address it).

  4. Some people point out that training for olympiads yields diminishing returns (e.g. learning Muirhead and Schur is probably not useful for anything else). I guess this is true, but isn’t it true of almost anything? Maybe the point is supposed to be “olympiads aren’t everything”, which is agreeable (see below).

  5. The other favorite comment I got was from Another Chicken, who points out below that the olympiad tribe itself is elitist: they tend to wall themselves off from outsiders (I certainly do this), and undervalue anything that isn’t hard technical problems.

    I concede these are real problems with the olympiad community. Again, this could be a whole different blog post.

    But I think this comment missed the point of this post. It is probably fine (albeit patronizing) to encourage olympiad students to expand; but I have a big problem with framing it as “spend time on not-contests because research”. That’s the real issue with the Mantra: it is often used as a recruitment slogan, telling students that research is the next true test after the IMO has been conquered.

    Changing the Golden Metric from olympiads to research seems to just make the world more egotistic than it already is.

Vinogradov’s Three-Prime Theorem (with Sammy Luo and Ryan Alweiss)

This was my final paper for 18.099, seminar in discrete analysis, jointly with Sammy Luo and Ryan Alweiss.

We prove that every sufficiently large odd integer can be written as the sum of three primes, conditioned on a strong form of the prime number theorem.

1. Introduction

In this paper, we prove the following result:

Theorem 1 (Vinogradov)

Every sufficiently large odd integer {N} is the sum of three prime numbers.

In fact, the following result is also true, called the “weak Goldbach conjecture”.

Theorem 2 (Weak Goldbach conjecture)

Every odd integer {N \ge 7} is the sum of three prime numbers.

The proof of Vinogradov’s theorem becomes significantly simpler if one assumes the generalized Riemann hypothesis; this allows one to use a strong form of the prime number theorem (Theorem 9). This conditional proof was given by Hardy and Littlewood in 1923. In 1997, Deshouillers, Effinger, te Riele and Zinoviev showed that the generalized Riemann hypothesis in fact also implies the weak Goldbach conjecture by improving the bound to {10^{20}} and then exhausting the remaining cases via a computer search.

As for unconditional proofs, Vinogradov was able to eliminate the dependency on the generalized Riemann hypothesis in 1937, which is why Theorem 1 bears his name. However, Vinogradov’s bound used the ineffective Siegel-Walfisz theorem; his student K. Borozdin showed that {3^{3^{15}}} is large enough. Over the years the bound was improved, until in 2013 Harald Helfgott claimed the first unconditional proof of Theorem 2, see here.

In this exposition we follow Hardy and Littlewood’s approach, i.e. we prove Theorem 1 assuming the generalized Riemann hypothesis, following the exposition of Rhee. An exposition of the unconditional proof by Vinogradov is given by Rouse.

2. Synopsis

We are going to prove that

\displaystyle  	\sum_{a+b+c = N} \Lambda(a) \Lambda(b) \Lambda(c) \asymp \frac12 N^2 \mathfrak G(N) 	 \ \ \ \ \ (1)

where

\displaystyle  \mathfrak G(N) 	\overset{\text{def}}{=} \prod_{p \mid N} \left( 1 - \frac{1}{(p-1)^2} \right) 	\prod_{p \nmid N} \left( 1 + \frac{1}{(p-1)^3} \right)

and {\Lambda} is the von Mangoldt function defined as usual. Then so long as {2 \nmid N}, the quantity {\mathfrak G(N)} will be bounded away from zero; thus (1) will imply that in fact there are many ways to write {N} as the sum of three distinct prime numbers.
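As a numerical sanity check of (1), here is a short Python sketch of mine (not from the paper; it uses sympy, defines its own von Mangoldt helper, and truncates the infinite product defining {\mathfrak G(N)}) comparing both sides for a smallish odd {N}:

```python
from math import log
from sympy import factorint, primerange

def von_mangoldt(n):
    """Lambda(n) = log p if n is a prime power p^k, else 0."""
    f = factorint(n)
    return log(next(iter(f))) if len(f) == 1 else 0.0

N = 1001  # a smallish odd number
Lam = [0.0] * (N + 1)
for n in range(2, N + 1):
    Lam[n] = von_mangoldt(n)

lhs = sum(Lam[a] * Lam[b] * Lam[N - a - b]
          for a in range(1, N) for b in range(1, N - a))

G = 1.0
for p in primerange(2, 10 ** 5):  # truncate the infinite product
    G *= (1 - 1 / (p - 1) ** 2) if N % p == 0 else (1 + 1 / (p - 1) ** 3)

print(lhs, 0.5 * N ** 2 * G)  # the two should be roughly comparable
```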

The sum (1) is estimated using Fourier analysis. Let us define the following.

Definition 3

Let {\mathbb T = \mathbb R/\mathbb Z} denote the circle group, and let {e : \mathbb T \rightarrow \mathbb C} be the exponential function {\theta \mapsto \exp(2\pi i \theta)}. For {\alpha\in\mathbb T}, {\|\alpha\|} denotes the minimal distance from {\alpha} to an integer.

Note that {|e(\theta)-1|=\Theta(\|\theta\|)}.

Definition 4

For {\alpha \in \mathbb T} and {x > 0} we define

\displaystyle  S(x, \alpha) = \sum_{n \le x} \Lambda(n) e(n\alpha).

Then we can rewrite (1) using {S} as a “Fourier coefficient”:

Proposition 5

We have

\displaystyle  		\sum_{a+b+c = N} \Lambda(a) \Lambda(b) \Lambda(c) 		= \int_{\alpha \in \mathbb T} S(N, \alpha)^3 e(-N\alpha) \; d\alpha. 		 	\ \ \ \ \ (2)

Proof: We have

\displaystyle S(N,\alpha)^3=\sum_{a,b,c\leq N}\Lambda(a)\Lambda(b)\Lambda(c)e((a+b+c)\alpha),

so

\displaystyle  \begin{aligned} \int_{\alpha \in \mathbb T} S(N, \alpha)^3 e(-N\alpha) \; d\alpha &= \int_{\alpha \in \mathbb T} \sum_{a,b,c\leq N}\Lambda(a)\Lambda(b)\Lambda(c)e((a+b+c)\alpha) e(-N\alpha) \; d\alpha \\ &= \sum_{a,b,c\leq N}\Lambda(a)\Lambda(b)\Lambda(c)\int_{\alpha \in \mathbb T}e((a+b+c-N)\alpha) \; d\alpha \\ &= \sum_{a,b,c\leq N}\Lambda(a)\Lambda(b)\Lambda(c)I(a+b+c=N) \\ &= \sum_{a+b+c=N}\Lambda(a)\Lambda(b)\Lambda(c), \end{aligned}

as claimed. \Box

In order to estimate the integral in Proposition 5, we divide {\mathbb T} into the so-called “major” and “minor” arcs. Roughly,

  • The “major arcs” are subintervals of {\mathbb T} centered at a rational number with small denominator.
  • The “minor arcs” are the remaining intervals.

These will be made more precise later. This general method is called the Hardy-Littlewood circle method, because of the integral over the circle group {\mathbb T}.

The rest of the paper is structured as follows. In Section 3, we define the Dirichlet character and other number-theoretic objects, and state some estimates for the partial sums of these objects conditioned on the Riemann hypothesis. These bounds are then used in Section 4 to provide corresponding estimates on {S(x, \alpha)}. In Section 5 we then define the major and minor arcs rigorously and use the previous estimates to give an upper bound for the integral over both regions. Finally, we complete the proof in Section 6.

3. Prime number theorem type bounds

In this section, we collect the necessary number-theoretic results that we will need. It is in this section only that we will require the generalized Riemann hypothesis.

As a reminder, the notation {f(x)\ll g(x)}, where {f} is a complex function and {g} a nonnegative real one, means {f(x)=O(g(x))}, a statement about the magnitude of {f}. Likewise, {f(x)=g(x)+O(h(x))} simply means that for some {C}, {|f(x)-g(x)|\leq C|h(x)|} for all sufficiently large {x}.

3.1. Dirichlet characters

In what follows, {q} denotes a positive integer.

Definition 6

A Dirichlet character {\chi} modulo {q} is a homomorphism {\chi : (\mathbb Z/q)^\times \rightarrow \mathbb C^\times}. It is said to be trivial if {\chi = 1}; we denote this character by {\chi_0}.

By slight abuse of notation, we will also consider {\chi} as a function {\mathbb Z \rightarrow \mathbb C} by setting {\chi(n) = \chi(n \pmod q)} for {\gcd(n,q) = 1} and {\chi(n) = 0} for {\gcd(n,q) > 1}.

Remark 7

The Dirichlet characters form a multiplicative group of order {\phi(q)} under multiplication, with inverse given by complex conjugation. Note that {\chi(m)} is a {\phi(q)}th root of unity for any {m \in (\mathbb Z/q)^\times}, thus {\chi} takes values in the unit circle.

Experts may recognize that the Dirichlet characters are just the elements of the Pontryagin dual of {(\mathbb Z/q)^\times}. In particular, they satisfy an orthogonality relationship

\displaystyle  	\frac{1}{\phi(q)} 	\sum_{\chi \text{ mod } q} \chi(n) \overline{\chi(a)} 	= \begin{cases} 		1 & n = a \pmod q \\ 		0 & \text{otherwise} 	\end{cases} 	 \ \ \ \ \ (3)

and thus form an orthonormal basis for functions {(\mathbb Z/q)^\times \rightarrow \mathbb C}.
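For a prime modulus the characters can be written down explicitly using a primitive root; the following Python sketch (mine, with {q = 7} and primitive root {3}) verifies the orthogonality relation (3) numerically:

```python
import cmath

q = 7   # a prime modulus
g = 3   # a primitive root modulo 7, so (Z/7)^x = {g^0, ..., g^5}

# Discrete logarithm base g: dlog[g^k mod q] = k.
dlog = {pow(g, k, q): k for k in range(q - 1)}

def chi(r, n):
    """The r-th Dirichlet character mod q, for 0 <= r < phi(q) = q - 1."""
    if n % q == 0:
        return 0
    return cmath.exp(2j * cmath.pi * r * dlog[n % q] / (q - 1))

# Verify the orthogonality relation (3).
for n in range(1, q):
    for a in range(1, q):
        s = sum(chi(r, n) * chi(r, a).conjugate()
                for r in range(q - 1)) / (q - 1)
        assert abs(s - (1.0 if n == a else 0.0)) < 1e-9
print("orthogonality verified modulo", q)
```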

3.2. Prime number theorem for arithmetic progressions

Definition 8

The generalized Chebyshev function is defined by

\displaystyle  \psi(x, \chi) = \sum_{n \le x} \Lambda(n) \chi(n).

The Chebyshev function is studied extensively in analytic number theory, as it is the most convenient way to phrase the field’s major results. For example, the prime number theorem is equivalent to the assertion that

\displaystyle  \psi(x, \chi_0) = \sum_{n \le x} \Lambda(n) \asymp x

where {q = 1} (thus {\chi_0} is the constant function {1}). Similarly, Dirichlet’s theorem actually asserts that for any {q \ge 1},

\displaystyle  	\psi(x, \chi) 	= \begin{cases} 		x + o_q(x) & \chi = \chi_0 \text{ trivial} \\ 		o_q(x) & \chi \neq \chi_0 \text{ nontrivial}. 	\end{cases}

However, the error term in these estimates is quite poor (worse than {x^{1-\varepsilon}} for every {\varepsilon > 0}). By assuming the Riemann hypothesis for a certain “{L}-function” attached to {\chi}, we can improve the error terms substantially.

Theorem 9 (Prime number theorem for arithmetic progressions)

Let {\chi} be a Dirichlet character modulo {q}, and assume the Riemann hypothesis for the {L}-function attached to {\chi}.

  1. If {\chi} is nontrivial, then

    \displaystyle  \psi(x, \chi) \ll \sqrt{x} (\log qx)^2.

  2. If {\chi = \chi_0} is trivial, then

    \displaystyle  \psi(x, \chi_0) = x + O\left( \sqrt x (\log x)^2 + \log q \log x \right).

Theorem 9 is the strong estimate that we will require when proving good estimates on {S(x, \alpha)}, and is the only place in which the generalized Riemann hypothesis is actually required.

3.3. Gauss sums

Definition 10

For {\chi} a Dirichlet character modulo {q}, the Gauss sum {\tau(\chi)} is defined by

\displaystyle \tau(\chi)=\sum_{a=0}^{q-1}\chi(a)e(a/q).

We will need the following fact about Gauss sums.

Lemma 11

Consider Dirichlet characters modulo {q}. Then:

  1. We have {\tau(\chi_0) = \mu(q)}.
  2. For any {\chi} modulo {q}, {\left\lvert \tau(\chi) \right\rvert \le \sqrt q}.

3.4. Dirichlet approximation

We finally require the Dirichlet approximation theorem in the following form.

Theorem 12 (Dirichlet approximation)

Let {\alpha \in \mathbb R} be arbitrary, and {M} a fixed integer. Then there exist integers {a} and {q = q(\alpha)}, with {1 \le q \le M} and {\gcd(a,q) = 1}, satisfying

\displaystyle  \left\lvert \alpha - \frac aq \right\rvert \le \frac{1}{qM}.
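Here is a small Python illustration of mine (the helper name is my own; it finds the guaranteed fraction by brute force and reduces it) for {\alpha = \pi}:

```python
from math import gcd, pi

def dirichlet_approx(alpha, M):
    """Find (a, q) with 1 <= q <= M, gcd(a, q) = 1 and
    |alpha - a/q| <= 1/(q*M); Theorem 12 guarantees one exists."""
    for q in range(1, M + 1):
        a = round(alpha * q)  # best numerator for this q
        if abs(alpha - a / q) <= 1 / (q * M):
            d = gcd(a, q)     # reducing only improves the bound
            return a // d, q // d
    raise AssertionError("unreachable, by Theorem 12")

for M in (10, 100, 1000):
    a, q = dirichlet_approx(pi, M)
    print(M, ":", a, "/", q, "error", abs(pi - a / q))
```

(For {\alpha = \pi} this recovers the familiar convergents {22/7} and {355/113}.)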

4. Bounds on {S(x, \alpha)}

In this section, we use our number-theoretic results to bound {S(x,\alpha)}.

First, we provide a bound for {S(x,\alpha)} if {\alpha} is a rational number with “small” denominator {q}.

Lemma 13

Let {\gcd(a,q) = 1}. Assuming Theorem 9, we have

\displaystyle  S(x, a/q) 		= \frac{\mu(q)}{\phi(q)} x + O\left( \sqrt{qx} (\log qx)^2 \right)

where {\mu} denotes the Möbius function.

Proof: Write the sum as

\displaystyle  S(x, a/q) = \sum_{n \le x} \Lambda(n) e(na/q).

First we claim that the terms with {\gcd(n,q) > 1} (and {\Lambda(n) \neq 0}) contribute a negligible {\ll \log q \log x}. To see this, note that

  • The number {q} has {\ll \log q} distinct prime factors, and
  • If {p \mid q}, then the terms {n = p, p^2, \dots} up to {x} contribute {\Lambda(p) + \Lambda(p^2) + \dots \le (\log p) \log_p x = \log x}.

So consider only terms with {\gcd(n,q) = 1}. To bound the sum, notice that

\displaystyle  \begin{aligned} 		e(n \cdot a/q) &= \sum_{b \text{ mod } q} e(b/q) \cdot \mathbf 1(b \equiv an) \\ 		&= \sum_{b \text{ mod } q} e(b/q) \left( \frac{1}{\phi(q)} 			\sum_{\chi \text{ mod } q} \chi(b) \overline{\chi(an)} \right) 	\end{aligned}

by the orthogonality relations. Now we swap the order of summation to obtain a Gauss sum:

\displaystyle  \begin{aligned} 		e(n \cdot a/q) &= \frac{1}{\phi(q)} \sum_{\chi \text{ mod } q} \overline{\chi(an)} 			\left( \sum_{b \text{ mod } q} \chi(b) e(b/q) \right) \\ 		&= \frac{1}{\phi(q)} \sum_{\chi \text{ mod } q} \overline{\chi(an)} \tau(\chi). 	\end{aligned}

Thus, we swap the order of summation to obtain that

\displaystyle  \begin{aligned} 		S(x, \alpha) &= \sum_{\substack{n \le x \\ \gcd(n,q) = 1}} 			\Lambda(n) e(n \cdot a/q) \\ 		&= \frac{1}{\phi(q)} \sum_{\substack{n \le x \\ \gcd(n,q) = 1}} 			\sum_{\chi \text{ mod } q} \Lambda(n) \overline{\chi(an)} \tau(\chi) \\ 		&= \frac{1}{\phi(q)} \sum_{\chi \text{ mod } q} \tau(\chi) 			\sum_{\substack{n \le x \\ \gcd(n,q) = 1}} \Lambda(n) \overline{\chi(an)} \\ 		&= \frac{1}{\phi(q)} \sum_{\chi \text{ mod } q} \overline{\chi(a)} \tau(\chi) 			\sum_{\substack{n \le x \\ \gcd(n,q) = 1}} \Lambda(n)\overline{\chi(n)} \\ 		&= \frac{1}{\phi(q)} \sum_{\chi \text{ mod } q} \overline{\chi(a)} 			\tau(\chi) \psi(x, \overline\chi) \\ 		&= \frac{1}{\phi(q)} \left( \tau(\chi_0) \psi(x, \chi_0) 			+ \sum_{1 \neq \chi \text{ mod } q} \overline{\chi(a)} \tau(\chi) 			\psi(x, \overline\chi) \right). 	\end{aligned}

Now applying both parts of Lemma 11 in conjunction with Theorem 9 gives

\displaystyle  \begin{aligned} 		S(x,\alpha) 		&= \frac{\mu(q)}{\phi(q)} 			\left( x + O\left( \sqrt x (\log qx)^2 \right) \right) 			+ O\left( \sqrt{qx} (\log qx)^2 \right) \\ 		&= \frac{\mu(q)}{\phi(q)} x + O\left( \sqrt{qx} (\log qx)^2 \right) 	\end{aligned}

as desired. \Box

We then provide a bound when {\alpha} is “close to” such an {a/q}.

Lemma 14

Let {\gcd(a,q) = 1} and {\beta \in \mathbb T}. Assuming Theorem 9, we have

\displaystyle  		S(x, a/q + \beta) = 		\frac{\mu(q)}{\phi(q)} \left( \sum_{n \le x} e(\beta n) \right) 		+ O\left( (1+\|\beta\|x) \sqrt{qx} (\log qx)^2 \right).

Proof: For convenience let us assume {x \in \mathbb Z}. Let {\alpha = a/q + \beta}. Let us denote {\text{Err}(x, \alpha) 		= S(x,\alpha) - \frac{\mu(q)}{\phi(q)} x}, so by Lemma 13 we have {\text{Err}(x,\alpha) \ll \sqrt{qx}(\log qx)^2}. We have

\displaystyle  \begin{aligned} 		S(x, \alpha) &= \sum_{n \le x} \Lambda(n) e(na/q) e(n\beta) \\ 		&= \sum_{n \le x} e(n\beta) \left( S(n, a/q) - S(n-1, a/q) \right) \\ 		&= \sum_{n \le x} e(n\beta) \left( 			\frac{\mu(q)}{\phi(q)} 			+ \text{Err}(n, \alpha) - \text{Err}(n-1, \alpha) \right) \\ 		&= \frac{\mu(q)}{\phi(q)} \left( \sum_{n \le x} e(n\beta) \right) 			+ \sum_{1 \le m \le x-1} \left( e( (m+1)\beta) - e( m\beta ) \right) 			\text{Err}(m, \alpha) \\ 		&\qquad + e(x\beta) \text{Err}(x, \alpha) - e(0) \text{Err}(0, \alpha) \\ 		&\le \frac{\mu(q)}{\phi(q)} \left( \sum_{n \le x} e(n\beta) \right) 			+ \left( \sum_{1 \le m \le x-1} \|\beta\| \text{Err}(m, \alpha) \right) 			+ \text{Err}(0, \alpha) + \text{Err}(x, \alpha) \\ 		&\ll \frac{\mu(q)}{\phi(q)} \left( \sum_{n \le x} e(n\beta) \right) 			+ \left( 1+x\left\| \beta \right\| \right) 			O\left( \sqrt{qx} (\log qx)^2 \right) 	\end{aligned}

as desired. \Box

Thus if {\alpha} is close to a fraction with small denominator, the value of {S(x, \alpha)} is bounded above. We can now combine this with the Dirichlet approximation theorem to obtain the following general result.

Corollary 15

Suppose {M = N^{2/3}} and suppose {\left\lvert \alpha - a/q \right\rvert \le \frac{1}{qM}} for some {\gcd(a,q) = 1} with {q \le M}. Assuming Theorem 9, we have

\displaystyle  S(N, \alpha) \ll \frac{N}{\varphi(q)} + N^{\frac56+\varepsilon}

for any {\varepsilon > 0}.

Proof: Apply Lemma 14 directly. \Box

5. Estimation of the arcs

We’ll write

\displaystyle  f(\alpha) \overset{\text{def}}{=} S(N,\alpha)=\sum_{n \le N} \Lambda(n)e(n\alpha)

for brevity in this section.

Recall that we wish to bound the right-hand side of (2) in Proposition 5. We split {[0,1]} into two sets, which we call the “major arcs” and the “minor arcs.” To do so, we use Dirichlet approximation, as hinted at earlier.

In what follows, fix

\displaystyle  \begin{aligned} 	M &= N^{2/3} \\ 	K &= (\log N)^{10}. \end{aligned}

5.1. Setting up the arcs

Definition 16

For {q \le K} and {\gcd(a,q) = 1}, {1 \le a \le q}, we define

\displaystyle  		\mathfrak M(a,q) = \left\{ \alpha \in \mathbb T 		\mid \left\lvert \alpha - \frac aq \right\rvert \le \frac 1M \right\}.

These will be the major arcs. The union of all major arcs is denoted by {\mathfrak M}. The complement is denoted by {\mathfrak m}.

Equivalently, for any {\alpha}, consider {q = q(\alpha) \le M} as in Theorem 12. Then {\alpha \in \mathfrak M} if {q \le K} and {\alpha \in \mathfrak m} otherwise.

Proposition 17

{\mathfrak M} is composed of finitely many disjoint intervals {\mathfrak M(a,q)} with {q \le K}. The complement {\mathfrak m} is nonempty.

Proof: Note that if {q_1, q_2 \le K} and {a/q_1 \neq b/q_2} then {\left\lvert \frac{a}{q_1} - \frac{b}{q_2} \right\rvert 	\ge \frac{1}{q_1q_2} \ge \frac{1}{K^2} > \frac{2}{M}} for large {N}, so the intervals {\mathfrak M(a,q)} are pairwise disjoint. \Box

In particular both {\mathfrak M} and {\mathfrak m} are measurable. Thus we may split the integral in (2) over {\mathfrak M} and {\mathfrak m}. This integral will have large magnitude on the major arcs, and small magnitude on the minor arcs, so overall the integral over the whole interval {[0,1]} will have large magnitude.

5.2. Estimate of the minor arcs

First, we note the well-known fact {\phi(q) \gg q/\log q}. Note also that if {q=q(\alpha)} is as in the last section and {\alpha} is on a minor arc, then {q > (\log N)^{10}}, and thus {\phi(q) \gg (\log N)^{9}}.

As such, Corollary 15 yields that {f(\alpha) \ll \frac{N}{\phi(q)}+N^{.834} \ll \frac{N}{(\log N)^9}}.

Now,

\displaystyle  \begin{aligned} 	\left\lvert \int_{\mathfrak m}f(\alpha)^3e(-N\alpha) \; d\alpha \right\rvert 	&\le \int_{\mathfrak m}\left\lvert f(\alpha)\right\rvert ^3 \; d\alpha \\ 	&\ll \frac{N}{(\log N)^9} \int_{0}^{1}\left\lvert f(\alpha)\right\rvert ^2 \;d\alpha \\ 	&=\frac{N}{(\log N)^9}\int_{0}^{1}f(\alpha)f(-\alpha) \; d\alpha \\ 	&=\frac{N}{(\log N)^9}\sum_{n \le N} \Lambda(n)^2 \\ 	&\ll \frac{N^2}{(\log N)^8}, \end{aligned}

using the well-known bound {\sum_{n \le N} \Lambda(n)^2 \ll N \log N}. This bound of {\frac{N^2}{(\log N)^8}} will be negligible compared to the lower bound for the major arcs in the next section.

5.3. Estimate on the major arcs

We show that

\displaystyle  \int_{\mathfrak M}f(\alpha)^3e(-N\alpha) d\alpha \asymp \frac{N^2}{2} \mathfrak G(N).

By Proposition 17 we can split the integral over each interval and write

\displaystyle  \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha 	= \sum_{q \le (\log N)^{10}}\sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} 	\int_{-1/qM}^{1/qM}f(a/q+\beta)^3e(-N(a/q+\beta)) \; d\beta.

Then we apply Lemma 14, which gives

\displaystyle  \begin{aligned} 	f(a/q+\beta)^3 	&= \left(\frac{\mu(q)}{\phi(q)}\sum_{n \le N}e(\beta n) \right)^3 \\ 	&+\left(\frac{\mu(q)}{\phi(q)}\sum_{n \le N}e(\beta n)\right)^2 		O\left((1+\|\beta\|N)\sqrt{qN} \log^2 qN\right) \\ 	&+\left(\frac{\mu(q)}{\phi(q)}\sum_{n \le N}e(\beta n)\right) 		O\left((1+\|\beta\|N)\sqrt{qN} \log^2 qN\right)^2 \\ 	&+O\left((1+\|\beta\|N)\sqrt{qN} \log^2 qN\right)^3. \end{aligned}

Now, we can do casework on the side of {N^{-.9}} that {\|\beta\|} lies on.

  • If {\|\beta\| \gg N^{-.9}}, we have {\sum_{n \le N}e(\beta n) \ll \frac{2}{|e(\beta)-1|} 	\ll \frac{1}{\|\beta\|} \ll N^{.9}}, and {(1+\|\beta\|N)\sqrt{qN} \log^2 qN \ll N^{5/6+\varepsilon}}, because certainly we have {\|\beta\|<1/M=N^{-2/3}}.
  • If on the other hand {\|\beta\|\ll N^{-.9}}, we have {\sum_{n \le N}e(\beta n) \ll N} obviously, and {(1+\|\beta\|N)\sqrt{qN} \log^2 qN \ll N^{3/5+\varepsilon}}.

As such, we obtain

\displaystyle  f(a/q+\beta)^3 = \left( \frac{\mu(q)}{\phi(q)}\sum_{n \le N}e(\beta n) \right)^3 	+ O\left(N^{79/30+\varepsilon}\right)

in either case. Thus, we can write

\displaystyle  \begin{aligned} 	&\qquad \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha \\ 	&= \sum_{q \le (\log N)^{10}} \sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} 	\int_{-1/qM}^{1/qM} f(a/q+\beta)^3e(-N(a/q+\beta)) \; d\beta \\ 	&= \sum_{q \le (\log N)^{10}} \sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} 		\int_{-1/qM}^{1/qM}\left[\left(\frac{\mu(q)}{\phi(q)}\sum_{n \le N}e(\beta n)\right)^3 		+ O\left(N^{79/30+\varepsilon}\right)\right]e(-N(a/q+\beta)) \; d\beta \\ 	&=\sum_{q \le (\log N)^{10}} \frac{\mu(q)}{\phi(q)^3} 		\left(\sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} e(-N(a/q))\right) 		\left( \int_{-1/qM}^{1/qM}\left(\sum_{n \le N}e(\beta n)\right)^3e(-N\beta) 		\; d\beta \right ) \\ 	&\qquad +O\left(N^{59/30+\varepsilon}\right). \end{aligned}

where the error term was estimated using {M = N^{2/3}}. Now, we use

\displaystyle  \sum_{n \le N}e(\beta n) = \frac{1-e(\beta N)}{1-e(\beta)} 	\ll \frac{1}{\|\beta\|}.

This enables us to bound the expression

\displaystyle  \int_{1/qM}^{1-1/qM}\left (\sum_{n \le N}e(\beta n)\right) ^ 3 e(-N\beta)d\beta 	\ll \int_{1/qM}^{1-1/qM}\|\beta\|^{-3} d\beta = 2\int_{1/qM}^{1/2}\beta^{-3} d\beta 	\ll q^2M^2.

But the integral over the entire interval is

\displaystyle  \begin{aligned} 	\int_{0}^{1}\left(\sum_{n \le N}e(\beta n) \right)^3 e(-N\beta)d\beta 	&= \int_{0}^{1} \sum_{a,b,c \le N} e((a+b+c-N)\beta) \; d\beta \\ 	&= \sum_{a,b,c \le N} \mathbf 1(a+b+c=N) \\ 	&= \binom{N-1}{2}. \end{aligned}

Considering the difference of the two integrals gives

\displaystyle  \int_{-1/qM}^{1/qM}\left(\sum_{n \le N}e(\beta n) \right)^3 	e(-N\beta) \; d\beta - \frac{N^2}{2} \ll q^2 M^2 + N 	\ll (\log N)^c N^{4/3},

for some absolute constant {c}.

For brevity, let

\displaystyle  S_q = \sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} e(-N(a/q)).

Then

\displaystyle  \begin{aligned} 	\int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha &= \sum_{q \le (\log N)^{10}} \frac{\mu(q)}{\phi(q)^3}S_q 		\left( \int_{-1/qM}^{1/qM}\left(\sum_{n \le N}e(\beta n)\right)^3e(-N\beta) 		\; d\beta \right ) \\ 	&\qquad +O\left(N^{59/30+\varepsilon}\right) \\ &= \frac{N^2}{2}\sum_{q \le (\log N)^{10}} 	\frac{\mu(q)}{\phi(q)^3}S_q + O((\log N)^{10+c} N^{4/3}) 		+ O(N^{59/30+\varepsilon}) \\ &= \frac{N^2}{2}\sum_{q \le (\log N)^{10}} \frac{\mu(q)}{\phi(q)^3}S_q 		+ O(N^{59/30+\varepsilon}). \end{aligned}


The inner sum {S_q} satisfies {\left\lvert S_q \right\rvert \le \phi(q)}. So,

\displaystyle \left\lvert \sum_{q>(\log N)^{10}} 	\frac{\mu(q)}{\phi(q)^3} S_q \right\rvert 	 \le \sum_{q>(\log N)^{10}} \frac{1}{\phi(q)^2},

which converges since {\phi(q)^2 \gg q^c} for some {c > 1}. So

\displaystyle  \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha 	= \frac{N^2}{2}\sum_{q = 1}^\infty \frac{\mu(q)}{\phi(q)^3}S_q 	+ O(N^{59/30+\varepsilon}).

Now, since {\mu(q)}, {\phi(q)}, and {\sum_{\substack{1 \le a \le q \\ \gcd(a,q)=1}} e(-N(a/q))} are multiplicative functions of {q}, and {\mu(q)=0} unless {q} is squarefree,

\displaystyle  \begin{aligned} \sum_{q = 1}^\infty \frac{\mu(q)}{\phi(q)^3} S_q 	&= \prod_p \left(1+\frac{\mu(p)}{\phi(p)^3}S_p \right) \\ 	&= \prod_p \left(1-\frac{1}{(p-1)^3} 		\sum_{a=1}^{p-1} e(-N(a/p))\right) \\ 	&= \prod_p \left(1-\frac{1}{(p-1)^3}\sum_{a=1}^{p-1} 		(p\cdot \mathbf 1(p|N) - 1)\right) \\ 	&= \prod_{p|N}\left(1-\frac{1}{(p-1)^2}\right) 		\prod_{p \nmid N}\left(1+\frac{1}{(p-1)^3}\right) \\ 	&= \mathfrak G(N). \end{aligned}

So,

\displaystyle \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha = \frac{N^2}{2}\mathfrak{G}(N) + O(N^{59/30+\varepsilon}).

When {N} is odd,

\displaystyle  \mathfrak{G}(N) = \prod_{p|N}\left(1-\frac{1}{(p-1)^2}\right)\prod_{p \nmid N}\left(1+\frac{1}{(p-1)^3}\right)\geq \prod_{m\geq 3}\left(\frac{m-2}{m-1}\frac{m}{m-1}\right)=\frac{1}{2},

so that we have

\displaystyle \int_{\mathfrak M} f(\alpha)^3e(-N\alpha) \; d\alpha \asymp \frac{N^2}{2}\mathfrak{G}(N),

as desired.

6. Completing the proof

Because the integral over the minor arc is {o(N^2)}, it follows that

\displaystyle \sum_{a+b+c=N} \Lambda(a)\Lambda(b)\Lambda(c) = \int_{0}^{1} f(\alpha)^3 e(-N\alpha) d \alpha \asymp \frac{N^2}{2}\mathfrak{G}(N) \gg N^2.

Consider the set {S_N} of prime powers {p^k\leq N} with {k>1}. We must have {p \le N^{\frac{1}{2}}}, and for each such {p} there are at most {O(\log N)} possible values of {k}. As such, {|S_N| \ll\pi(N^{1/2}) \log N\ll N^{1/2}}.

Thus

\displaystyle \sum_{\substack{a+b+c=N \\ a\in S_N}} \Lambda(a)\Lambda(b)\Lambda(c) \ll (\log N)^3 |S_N| N \ll (\log N)^3 N^{3/2},

and similarly for {b\in S_N} and {c\in S_N}. Notice that once these terms are removed, the only remaining terms with {\Lambda(a)\Lambda(b)\Lambda(c) \neq 0} have {a}, {b}, {c} all prime, so

\displaystyle  \sum_{p_1+p_2+p_3=N} \Lambda(p_1)\Lambda(p_2)\Lambda(p_3) 	=\sum_{a+b+c=N} \Lambda(a)\Lambda(b)\Lambda(c) + O\left((\log N)^3 N^{3/2}\right) 	\gg N^2,

where the sum is over primes {p_i}. This finishes the proof.

First drafts of Napkin up!

EDIT: Here’s a July 19 draft that fixes some of the glaring issues that were pointed out.

This morning I finally uploaded the first drafts of my Napkin project, which I’ve been working on since December 2014. See the Napkin tab above for a listing of all drafts.

Napkin is my personal exposition project, which unifies together a lot of my blog posts and even more that I haven’t written on yet into a single coherent narrative. It’s written for students who don’t know much higher math, but are curious and are already comfortable with proofs. It’s especially suited for e.g. students who did contests like USAMO and IMO.

There are still a lot of rough edges in the draft, but I haven’t been able to find much time to work on it this whole calendar year, and so I’ve finally decided the perfect is the enemy of the good and it’s about time I brought this project out of the garage.

I’d much appreciate any comments, corrections, or suggestions, however minor. Please let me know! I do plan to keep updating this draft as I get comments, though I can’t promise that I’ll be very fast in doing so.

Here’s a table of contents, in brief:

I. Basic Algebra and Topology
II. Linear Algebra and Multivariable Calculus
III. Groups, Rings, and More
IV. Complex Analysis
V. Quantum Algorithms
VI. Algebraic Topology I: Homotopy
VII. Category Theory
VIII. Differential Geometry
IX. Algebraic Topology II: Homology
X. Algebraic NT I: Rings of Integers
XI. Algebraic NT II: Galois and Ramification Theory
XII. Representation Theory
XIII. Algebraic Geometry I: Varieties
XIV. Algebraic Geometry II: Schemes
XV. Set Theory I: ZFC, Ordinals, and Cardinals
XVI. Set Theory II: Model Theory and Forcing

(I’ve also posted this on Reddit to try and grab a larger audience. We’ll see how that goes.)