USEMO Problem Development, Behind the Scenes

In this post I’m hoping to say a bit about the process that’s used for the problem selection of the recent USEMO: how one goes from a pool of problem proposals to a six-problem test. (How to write problems is an entirely different story, and deserves its own post.) I choose USEMO for concreteness here, but I imagine a similar procedure could be used for many other contests.

I hope this might be of interest to students preparing for contests to see a bit of the behind-the-scenes, and maybe helpful for other organizers of olympiads.

The overview of the entire timeline is:

    1. Submission period for authors (5-10 weeks)
    2. Creating the packet
    3. Reviewing period where volunteers try out the proposed problems (6-12 weeks)
    4. Editing and deciding on a draft of the test
    5. Test-solving of the draft of the test (3-5 weeks)
    6. Finalizing and wrap-up

Now I’ll talk about these in more detail.

Pinging for problems

The USA has the rare privilege of having an extremely dedicated and enthusiastic base of volunteers, who is going to make the contest happen rain or shine. When I send out an email asking for problem proposals, I never really worry I won’t get enough people. You might have to adjust the recipe below if you have fewer hands on deck.

When you’re deciding who to invite, you have to think about a trade-off between problem security versus openness. The USEMO is not a high-stakes competition, so it accepts problems from basically anyone. On the other hand, if you are setting problems for your country’s IMO team selection test, you probably don’t want to take problems from the general public.

Submission of the problem is pretty straightforward, ask to have problems emailed as TeX, with a full solution. You should also ask for any information you care about to be included: the list of authors of the problem, any other collaborators who have seen or tried the problem, and a very very rough estimate of how hard the difficulty is. (You shouldn’t trust the estimate too much, but I’ll explain in a moment why this is helpful.)

Ideally I try to allocate 5-10 weeks between when I open submissions for problems and when the submission period ends.

It’s at this time you might as well see who is going to be interested in being a test-solver or reviewer as well — more on that later.

Creating the packet

Once the submission period ends, you want to then collate the problems into a packet that you can send to your reviewers. The reviewers will then rate the problems on difficulty and suitability.

A couple of minor tips for setting the packet:

  • I used to try to sort the packet roughly by difficulty, but in recent years I’ve switched to random order and never looked back. It just biases the reviewers too much to have the problem number matter. The data has been a lot better with random order.
  • Usually I’ll label the problems A-01, A-02, …, A-05 (say), C-06, C-07, …., and so on. The leading zero is deliberate: I’ve done so much IMO shortlist that if I see a problem named “C7”, it will automatically feel like it should be a hard problem, so using “C-07” makes this gut reaction go away for me.
  • It’s debatable whether you need to have subject classifications at all, since in some cases a problem might not fit cleanly, or the “true” classification might even give away the problem. I keep it around just because it’s convenient from an administrative standpoint to have vaguely similar problems grouped together and labelled, but I’ll explicitly tell reviewers to not take the classification seriously, rather as a convenience.

More substantial is the choice of which problems to include in the packet if you are so lucky to have a surplus of submissions. The problem is that reviewers only have so much time and energy and won’t give as good feedback if the packet is too long. In my experience, 20 problems is a nice target, 30 problems is strenuous, anything more than that is usually too much. So if you have more than 30 problems, you might need to cut some problems out.

Since this “early cutting” is necessarily pretty random (because you won’t be able to do all the problems yourself singe-handedly), I usually prefer to do it in a slightly more egalitarian way. For example, if one person submits a lot of problems, you might only take a few problems from them, and say “we had a lot of problems, so we took the 3 of yours we liked the most for review”. (That said, you might have a sense that certain problems are really unlikely to be selected, and so you might as well exclude those.)

You usually also want to make sure that you have a spread of difficulties and subjects. Actually, this is even more true if you don’t have a surplus: if it turns out that, say, you have zero easy algebra or geometry problems, that’s likely to cause problems for you later down the line. It might be good to see if you can get one or two of those if you can. This is why the author’s crude estimates of difficulty can still be useful — it’s not supposed to be used for deciding the test, but it can give you “early warnings” that the packet might be lacking in some area.

One other useful thing to do, if you have the time at this point, is to edit the proposals as they come in, before adding them onto the packet. This includes both making copy edits for clarity and formatting, as well as more substantial changes if you can see an alternate version or formulation that you think is likely to be better than the original. The reason you want to do this step now is that you want the reviewer’s eyes on these changes: making edits halfway through the review process or after it can cause confusion and desynchronization in the reviewer data, and increases the chances of errors (because the modifications have been checked fewer times).

The packet review process

Then comes the period where your reviewers will work through the problems and submit their ratings. This is also something where you want to give reviewers a lot of time: 6-8 weeks is a comfortable amount. Between 10 and 25 reviewers is a comfortable number.

Reviewers are asked to submit a difficulty rating and quality rating for each problem. The system I’ve been using, which seems to work pretty well, goes like follows:

  • The five possible difficulty ratings are “Unsuitable”, “Mediocre”, “Acceptable”, “Nice”, “Excellent”. I’ve found that this choice of five words has the connotations needed to get similar distributions of calibrations across the reviewers.
  • For difficulty, I like to provide three checkboxes “IMO1”, “IMO2”, “IMO3”, but also tell the reviewer that they can check two boxes if, say, they think a problem could appear as either IMO1 or IMO2. That means in essence five ratings {1, 1.5, 2, 2.5, 3} are possible.

This is what I converged on as far as scales that are granular enough to get reasonable numerical data without being so granular that they are unintuitive or confusing. (If your scale is too granular, then a person’s ratings might say more about how a person interpreted the scale than the actual content of the problem.) For my group, five buckets seems to be the magic number; your mileage may vary!

More important is to have lots of free text boxes so that reviewers can provide more detailed comments, alternate solutions, and so on. Those are ultimately more valuable than just a bunch of numbers.

Here’s a few more tips:

  • If you are not too concerned about security, it’s also nice to get discussion between reviewers going. It’s more fun for the reviewers and the value of having reviewers talk with each other a bit tends to outweight the risk of bias.
  • It’s usually advised to send only the problem statements first, and then only send the solutions out about halfway through. I’ve found most reviewers (myself included) appreciate the decreased temptation to look at solutions too early on.
  • One thing I often do is to have a point person for each problem, to make sure every problem is carefully attempted This is nice, but not mandatory — the nicest problems tend to get quite a bit of attention anyways.
  • One thing I’ve had success with is adding a question on the review form that asks “what six problems would you choose if you were making the call, and why?” I’ve found I get a lot of useful perspective from hearing what people say about this.

I just use Google Forms to collect all the data. There’s a feature you can enable that requires a sign-in, so that the reviewer’s responses are saved between sessions and loaded automatically (making it possible to submit the form in multiple sittings).

Choosing the draft of the test

Now that you have the feedback, you should pick a draft of the test! This is the most delicate part, and it’s where it is nice to have a co-director or small committee if possible so that you can talk out loud and bounce ideas of each other.

For this stage I like to have a table with the numerical ratings as a summary of what’s available. The way you want to do this is up to you, but some bits from my workflow:

  • My table is color-coded, and it’s sorted in five different ways: by problem number, by quality rating, by difficulty rating, by subject then quality rating, by subject then difficulty rating.
  • For the quality rating, I use the weights -0.75, -0.5, 0, 1, 1.5 for Unsuitable, Mediocre, Acceptable, Nice, Excellent. This fairly contrived set of weights was chosen based on some experience in which I wanted to the average ratings to satisfy a couple properties: I wanted the sign of the rating (- or +) to match my gut feeling, and I wanted to not the rating too sensitive to a few Unsuitable / Excellent ratings (either extreme). This weighting puts a “cliff” between Acceptable and Nice, which empirically seems a good place to make the most differential.
  • I like to include a short “name” in the table to help with remembering which problem numbers are which, e.g. “2017-vtx graph”.

An example of what a table might look like is given in the image below.

Here is a output made with fake data. The Python script used to generate the table is included here, for anyone that wants to use it.

WT_U = -0.75
WT_M = -0.5
WT_A = 0
WT_N = 1
WT_E = 1.5

# ---- Populate with convincing looking random data ----
import random
slugs = {
		"A-01" : r"$\theta \colon \mathbb Z[x] \to \mathbb Z$",
		"A-02" : r"$\sqrt[3]{\frac{a}{b+7}}$",
		"A-03" : r"$a^a b^b c^c$",
		"C-04" : r"$a+2b+\dots+32c$",
		"C-05" : r"$2017$-vtx dinner",
		"G-06" : r"$ST$ orthog",
		"G-07" : r"$PO \perp YZ$",
		"G-08" : r"Area $5/2$",
		"G-09" : r"$XD \cap AM$ on $\Gamma$",
		"G-10" : r"$\angle PQE, \angle PQF = 90^{\circ}$",
		"N-11" : r"$5^n$ has six zeros",
		"N-12" : r"$n^2 \mid b^n+1$",
		"N-13" : r"$fff$ cycles",
		}

qualities = {}
difficulties = {}
random.seed(150)
for k in slugs.keys():
	# just somehow throw stuff at wall to get counts
	a,b,c,d,e,f = [random.randrange(0,3) for _ in range(6)]
	if c >= 1: a = 0
	if a >= 2: d,e = 1,0
	if e == 0: f = 0
	if a == 0 and b == 0: e *= 2
	qualities[k] = [WT_U] * a + [WT_M] * b + [WT_A] * (b+d+e) + [WT_N] * (c+d+e) + [WT_E] * (c+e+f)

random.seed(369)
for k in slugs.keys():
	# just somehow throw stuff at wall to get counts
	a,b,c,d,e = [random.randrange(0,5) for _ in range(5)]
	if e >= 4:
		b = 0
		c //= 2
	elif e >= 3:
		a = 0
		b //= 2
	if a >= 3:
		e = 0
		d //= 3
	elif a >= 2:
		e = 0
		d //= 2
	difficulties[k] = [1] * a + [1.5] * b + [2] * c + [2.5] * d + [3] * e

# ---- End random data population ----

	
import statistics
def avg(S):
	return statistics.mean(S) if len(S) > 0 else None
def median(S):
	return statistics.median(S) if len(S) > 0 else None

# criteria for inclusion on chart
criteria = lambda k: True

def get_color_string(x, scale_min, scale_max, color_min, color_max):
	if x is None:
		return r"\rowcolor{gray}"
	m = (scale_max+scale_min)/2
	a = min(int(100 * 2 * abs(x-m) / (scale_max-scale_min)), 100)
	color = color_min if x < m else color_max
	return r"\rowcolor{%s!%d}" %(color, a) + "\n"

def get_label(key, slugged=False):
	if slugged:
		return r"{\scriptsize \textbf{%s} %s}" %(key, slugs.get(key, ''))
	else:
		return r"{\scriptsize \textbf{%s}}" % key

## Quality rating
def get_quality_row(key, data, slugged = True):
	a = avg(data)
	s = ("$%+4.2f$" % a) if a is not None else "---"
	color_tex = get_color_string(a, WT_U, WT_E, "Salmon", "green")
	row_tex = r"%s & %d & %d & %d & %d & %d & %s \\" \
			% (get_label(key, slugged),
				data.count(WT_U),
				data.count(WT_M),
				data.count(WT_A),
				data.count(WT_N),
				data.count(WT_E),
				s)
	return color_tex + row_tex
def print_quality_table(d, sort_key = None, slugged = True):
	items = sorted(d.items(), key = sort_key)
	print(r"\begin{tabular}{lcccccr}")
	print(r"\toprule Prob & U & M & A & N & E & Avg \\ \midrule")
	for key, data in items:
		print(get_quality_row(key, data, slugged))
	print(r"\bottomrule")
	print(r"\end{tabular}")

## Difficulty rating
def get_difficulty_row(key, data, slugged = False):
	a = avg(data)
	s = ("$%.3f$" % a) if a is not None else "---"
	color_tex = get_color_string(a, 1, 3, "cyan", "orange")
	row_tex = r"%s & %d & %d & %d & %d & %d & %s \\" \
			% (get_label(key, slugged),
				data.count(1),
				data.count(1.5),
				data.count(2),
				data.count(2.5),
				data.count(3),
				s)
	return color_tex + row_tex
def print_difficulty_table(d, sort_key = None, slugged = False):
	items = sorted(d.items(), key = sort_key)
	print(r"\begin{tabular}{l ccccc c}")
	print(r"\toprule Prob & 1 & 1.5 & 2 & 2.5 & 3 & Avg \\ \midrule")
	for key, data in items:
		print(get_difficulty_row(key, data, slugged))
	print(r"\bottomrule")
	print(r"\end{tabular}")

filtered_qualities = {k:v \
		for k,v in qualities.items() if criteria(k)}
filtered_difficulties = {k:v \
		for k,v in difficulties.items() if criteria(k)}

def print_everything(name, fn = None, flip_slug = False):
	if fn is not None:
		sort_key = lambda item: fn(item[0])
	else:
		sort_key = None
	print(r"\section{" + name + "}")
	if flip_slug:
		print_quality_table(filtered_qualities, sort_key, False)
		print_difficulty_table(filtered_difficulties, sort_key, True)
	else:
		print_quality_table(filtered_qualities, sort_key, True)
		print_difficulty_table(filtered_difficulties, sort_key, False)

# Start outputting content
print(r"""\documentclass[11pt]{scrartcl}
\usepackage{booktabs}
\usepackage[sexy]{evan}
\usepackage{tikz}
\usepackage{pgfplots}
\pgfplotsset{compat=1.17}

\begin{document}
\title{Example of ratings table with randomly generated data}
\maketitle

\setlength\tabcolsep{5pt}
""")



print(r"\section{All ratings}")
print_quality_table(qualities)
print_difficulty_table(difficulties)

print("\n" + r"\newpage" + "\n")
print_everything("Beauty contest, by overall popularity",
		lambda p : (-avg(qualities[p]), p), False)
print_everything("Beauty contest, by subject and popularity",
		lambda p : (p[0], -avg(qualities[p]), p), False)
print("\n" + r"\newpage" + "\n")
print_everything("Beauty contest, by overall difficulty",
		lambda p : (-avg(difficulties[p]), p), True)
print_everything("Beauty contest, by subject and difficulty",
		lambda p : (p[0], -avg(difficulties[p]), p), True)

print("\n")
print(r"\section{Scatter plot}")
print(r"\begin{center}")
print(r"\begin{tikzpicture}")
print(r"""\begin{axis}[width=0.9\textwidth, height=22cm, grid=both,
	xlabel={Average difficulty}, ylabel={Average suitability},
	every node near coord/.append style={font=\scriptsize},
	scatter/classes={A={red},C={blue},G={green},N={black}}]""")
print(r"""\addplot [scatter,
	only marks, point meta=explicit symbolic,
	nodes near coords*={\prob},
	visualization depends on={value \thisrow{prob} \as \prob}]""")
print(r"table [meta=subj] {")
print("X\tY\tprob\tsubj")
for p in qualities.keys():
	x = avg(difficulties[p])
	y = avg(qualities[p])
	if x is None or y is None:
		continue
	print("%0.2f\t%0.2f\t%s\t%s" %(x,y,p[2:],p[0]))
print(r"};")
print(r"\end{axis}")
print(r"\end{tikzpicture}")
print(r"\end{center}")

print(r"\end{document}")

Of course, obligatory warning to not overly rely on the numerical ratings and to put heavy weight on the text comments provided. (The numerical ratings will often have a lot of variance, anyways.)

One thing to keep in mind is that when choosing the problems is that there are two most obvious goals are basically orthogonal. One goal is to have the most attractive problems (“art”), but the other is to have an exam which is balanced across difficulty and subject composition (“science”). These two goals will often compete with each other and you’ll have to make judgment calls to prioritize one over the other.

A final piece of advice is to not be too pedantic. For example, I personally dislike the so-called “Geoff rule” that 1/2/4/5 should be distinct subjects: I find that it is often too restrictive in practice. I also support using “fractional distributions” in which say a problem can be 75% number theory and 25% combinatorics (rather than all-or-nothing) when trying to determine how to balance the exam. This leads to better, more nuanced judgments than insisting on four categories.

This is also the time to make any last edits you want to the problems, again both copy edits or more substantial edits. This gives you a penultimate draft of the exam.

Test solving

If you can, a good last quality check to do is to have a round of test-solving from an unbiased group of additional volunteers who haven’t already seen the packet. (For the volunteers, this is a smaller time commitment than reviewing an entire packet so it’s often feasible as an intermediate commitment.) You ask this last round of volunteers to try out the problems under exam-like conditions, although I find it’s not super necessary to do a full 4.5 hours or have full write-ups, if you’ll get more volunteers this way. A nice number of test-solvers is 5-10 people.

Typically this test-solving is most useful as a sanity check (e.g. to make sure the test is not obviously too difficult) and for any last minute shuffling of the problems (which often happens). I don’t advise making drastic changes at this point. It’s good as a way to get feedback on the most tricky decisions, though.

Wrap-up

After any final edits, I recommend sending a copy of the edited problems and solutions to the reviewers and test-solvers. They’re probably interested to know what problems made the cut, and you want to have eyes going through the final paper to check for ambiguities or errors.

I usually take the time to also send out some details of the selection itself: what the ratings for the problems looked like, often a sentence or two for each problem about the overall feedback, and a documentation of my thought process in the draft selection. It’s good to give people feedback on their problems, in my experience the authors usually appreciate it a lot, especially if they decide to re-submit the problem elsewhere.

And that’s the process.

USA Special Team Selection Test Series for IMO 2021

A lot of people have been asking me how team selection is going to work for the USA this year. This information was sent out to the contestants a while ago, but I understand that there’s a lot of people outside of MOP 2020 who are interested in seeing the TST problems :) so this is a quick overview of how things are going down this year.

This year there are six tests leading to the IMO 2021 team:

  • USA TSTST Day 1: November 12, 2020 (3 problems, 4.5 hours)
  • USA TSTST Day 2: December 10, 2020 (3 problems, 4.5 hours)
  • USA TSTST Day 3: January 21, 2021 (3 problems, 4.5 hours)
  • RMM Day 1: February 2021 (3 problems, 4.5 hours)
  • APMO: March 2021 (5 problems, 4 hours)
  • USAMO: April 2021 (2 days, each with 3 problems and 4.5 hours)

Everyone who was at the virtual MOP in June 2020 is invited to all three days of TSTST, and then the top scores get to take the latter three exams as team selection tests for the IMO. Meanwhile, the RMM teams and EGMO teams are based on just the three days of TSTST.

Similar to past years, discussion of TSTST is allowed on noon Eastern time Monday after each day. That means you can look forward to the first set of three new problems coming out on Monday, November 16, and similarly for the other two days of TSTST.

To add to the hype, I’ll be doing a short one-hour-or-less Twitch stream at 8:00pm ET on Tuesday November 17 where I present the solutions to the TSTST problems of day 1.  If there’s demand, I’ll probably run a review session for the other two days of TSTST, as well.

EDIT: Changed stream time to Tuesday so more people have time to try the problems.

USEMO sign-ups are open

I’m happy to announce that sign-ups for my new olympiad style contest, the United States Ersatz Math Olympiad (USEMO), are open now! The webpage for the USEMO is https://web.evanchen.cc/usemo.html (where sign-ups are posted).

https://web.evanchen.cc/static/usemo/usemo-logo.png

The US Ersatz Math Olympiad is a proof-based competition open to all US middle and high school students. Like many competitions, its goals are to develop interest and ability in mathematics (rather than measure it). However, it is one of few proof-based contests open to all US middle and high school students. You can see more about the goals of this contest in the mission statement.

The contest will run over Memorial day weekend:

  • Day 1 is Saturday May 23 2020, from 12:30pm ET — 5:00pm ET.
  • Day 2 is Sunday May 24 2020, from 12:30pm ET — 5:00pm ET.

In the future, assuming continued interest, I hope to make the USEMO into an annual tradition run in the fall.

Circular optimization

This post will mostly be focused on construction-type problems in which you’re asked to construct something satisfying property {P}.

Minor spoilers for USAMO 2011/4, IMO 2014/5.

1. What is a leap of faith?

Usually, a good thing to do whenever you can is to make “safe moves” which are implied by the property {P}. Here’s a simple example.

Example 1 (USAMO 2011)

Find an integer {n} such that the remainder when {2^n} is divided by {n} is odd.

It is easy to see, for example, that {n} itself must be odd for this to be true, and so we can make our life easier without incurring any worries by restricting our search to odd {n}. You might therefore call this an “optimization”: a kind of move that makes the problem easier, essentially for free.

But often times such “safe moves” or not enough to solve the problem, and you have to eventually make “leap-of-faith moves”. For example, maybe in the above problem, we might try to focus our attention on numbers {n = pq} for primes {p} and {q}. This does make our life easier, because we’ve zoomed in on a special type of {n} which is easy to compute. But it runs the risk that maybe there is no such example of {n}, or that the smallest one is difficult to find.

2. Circular reasoning can sometimes save the day

However, a strange type of circular reasoning can sometimes happen, in which a move that would otherwise be a leap-of-faith is actually known to be safe because you also know that the problem statement you are trying to prove is true. I can hardly do better than to give the most famous example:

Example 2 (IMO 2014)

For every positive integer {n}, the Bank of Cape Town issues coins of denomination {\frac 1n}. Given a finite collection of such coins (of not necessarily different denominations) with total value at most {99 + \frac12}, prove that it is possible to split this collection into {100} or fewer groups, such that each group has total value at most {1}.

Let’s say in this problem we find ourselves holding two coins of weight {1/6}. Perhaps we wish to put these coins in the same group, so that we have one less decision to make. However, this could rightly be viewed as a “leap-of-faith”, because there’s no logical reason why the task must remain possible after making this first move.

Except there is a non-logical reason: this is the same as trading the two coins of weight {1/6} for a single coin of weight {1/3}. Why is the task still possible? Because the problem says so: the very problem we are trying to solve includes this case, too. If the problem is going to be true, then it had better be true after we make this trade.

Thus by a perverse circular reasoning we can rest assured that our leap-of-faith here will not come back to bite us. (And in fact, this optimization is a major step of the solution.)

3. More examples of circular optimization

Here’s some more examples of problems you can try that I think have a similar idea.

Problem 1

Prove that in any connected graph {G} on {2004} vertices one can delete some edges to obtain a graph (also with {2004} vertices) whose degrees are all odd.

Problem 2 (USA TST 2017)

In a sports league, each team uses a set of at most {t} signature colors. A set {S} of teams is color-identifiable if one can assign each team in {S} one of their signature colors, such that no team in {S} is assigned any signature color of a different team in {S}. For all positive integers {n} and {t}, determine the maximum integer {g(n,t)} such that: In any sports league with exactly {n} distinct colors present over all teams, one can always find a color-identifiable set of size at least {g(n,t)}.

Feel free to post more examples in the comments.

MOHS hardness scale

There’s a new addition to my olympiad problems and solutions archive: I created an index of many past IMO/USAMO/USA TST(ST) problems by what my opinions on their difficulties are. You can grab the direct link to the file below:

https://evanchen.cc/upload/MOHS-hardness.pdf

In short, the scale runs from 0M to 50M in increments of 5M, and every USAMO / IMO problem on my archive now has a rating too.

My hope is that this can be useful in a couple ways. One is that I hope it’s a nice reference for students, so that they can better make choices about what practice problems would be most useful for them to work on. The other is that the hardness scale contains a very long discussion about how I judge the difficulty of problems. While this is my own personal opinion, obviously, I hope it might still be useful for coaches or at least interesting to read about.

As long as I’m here, I should express some concern that it’s possible this document does more harm than good, too. (I held off on posting this for a few months, but eventually decided to at least try it and see for myself, and just learn from it if it turns out to be a mistake.) I think there’s something special about solving your first IMO problem or USAMO problem or whatever and suddenly realizing that these problems are actually doable — I hope it would not be diminished by me rating the problem as 0M. Maybe more information isn’t always a good thing!

An opening speech for MOP

While making preparations for this year’s MOP, I imagined to myself what I would say on orientation night if I was director of the camp, and came up with the following speech. I thought it might be nice to share on this blog. Of course, it represents my own views, not the actual views of MOP or MAA. And since I am not actually director of MOP, the speech was never given.

People sometimes ask me, why do we have international students at MOP? Doesn’t that mean we’re training teams from other countries? So I want to make this clear now: the purpose of MOP is not to train and select future IMO teams.

I know it might seem that way, because we invite by score and grade. But I really think the purpose of MOP is to give each one of you the experience of working hard and meeting new people, among other things. Learn math, face challenges, make friends, the usual good stuff, right? And that’s something you can get no matter what your final rank is, or whether you make IMO or EGMO or even next year’s MOP. The MOP community is an extended family, and you are all part of it now.

What I mean to say is, the camp is designed with all 80 of you in mind. It made me sad back in 2012 when one of my friends realized he had little chance of making it back next year, and told me that MAA shouldn’t have invited him to begin with. Even if I can only take six students to the IMO each year, I never forget the other 74 of you are part of MOP too.

This means one important thing: everyone who puts in their best shot deserves to be here. (And unfortunately this also means there are many other people who deserve to be here tonight too, and are not. Maybe they solved one or two fewer problems than you did; or maybe they even solved the same number of problems, but they are in 11th grade and you are in 10th grade.)

Therefore, I hope to see all of you put in your best effort. And I should say this is not easy to do, because MOP is brutal in many ways. The classes are mandatory, we have a 4.5-hour test every two days, and you will be constantly graded. You will likely miss problems that others claim are easy. You might find out you know less than you thought you did, and this can be discouraging. Especially in the last week, when we run the TSTST, many of you will suddenly realize just how strong Team USA is.

So I want to tell you now, stay determined in the face of adversity. This struggle is your own, and we promise it’s worth it, no matter the outcome. We are rooting for you, and your friends sitting around you are too. (And if the people around you aren’t your friends yet, change that asap.)

Hard and soft techniques

In yet another contest-based post, I want to distinguish between two types of thinking: things that could help you solve a problem, and things that could help you understand the problem better. Then I’ll talk a little about how you can use the latter. (I’ve talked about this in my own classes for a while by now, but only recently realized I’ve never gotten the whole thing in writing. So here goes.)

1. More silly terminology

As usual, to make these things easier to talk about, I’m going to introduce some words to describe these two. Taking a page from martial arts, I’m going to run with hard and soft techniques.

A hard technique is something you try in the hopes it will prove something — ideally, solve the problem, but at least give you some intermediate lemma. Perhaps a better definition is “things that will end up in the actual proof”. Examples include:

  • Angle chasing in geometry, or proving quadrilaterals are cyclic.
  • Throwing complex numbers at a geometry problem.
  • Plugging in some values into a functional equation (which gives more equations to work with).
  • Taking a given Diophantine equation modulo {p} to get some information, or taking {p}-adic evaluations.
  • Trying to perform an induction, for example by deleting an element.
  • Trying to write down an inequality that when summed cyclically gives the desired conclusion.
  • Reducing the problem to one or more equivalent claims.

and so on. I’m sure you can come up with more examples.

In contrast, a soft technique is something you might try to help you understand the problem better — even if it might not prove anything. Perhaps a better definition is “things not written up”. Examples include:

  • Examining particular small cases of the problem.
  • Looking at the equality cases of a min/max problem.
  • Considering variants of the problem (for example, adding or deleting conditions).
  • Coming up with lots of concrete examples and playing with them.
  • Trying to come with a counterexample to the problem’s assertion and seeing what the obstructions are.
  • Drawing pictures, even on non-geometry problems (see JMO2 and JMO5 in my 2019 notes for example).
  • Deciding whether or not a geometry problem is “purely projective”.
  • Counting the algebraic degrees of freedom in a geometry problem.
  • Checking all the linear/polynomial solutions to a functional equation, in order to get a guess what the answer might be.
  • Blindly trying to guess solutions to an algebraic equation.
  • Making up an artificial unnatural function in a functional equation, and then trying to see why it doesn’t work (or occasionally being surprised that it does work).
  • Thinking about why a certain hard technique you tried failed, or even better convincing yourself it cannot work (for example, this Diophantine equation has a solution modulo every prime, so stop trying to one-shot by mods).
  • Giving a heuristic argument that some claim should be true or false (“probably {2^n \bmod n} is odd infinitely often”), or even easy/hard to prove.

and so on. There is some grey area between these two, some of the examples above might be argued to be in the other category (especially in context of specific problems), but hopefully this gives you a sense of what I’m talking about.

If you look at things I wrote back when I was in high school, you’ll see this referred to as “attacking” and “scouting” instead. This is too silly for me now even by my standards, but back then it was because I played a lot of StarCraft: Brood War (I’ve since switched to StarCraft II). The analogy there is pretty self-explanatory: knowing what your opponent is doing is important because your army composition and gameplay decisions should change in reaction to more information.

2. Using soft techniques: an example

Now after all that blabber, here’s the action item for you all: you should try soft techniques when stuck.

When you first start doing a problem, you will often have some good ideas for what to try. (For example: a wild geometry appeared, let’s scout for cyclic quadrilaterals.) Sometimes if you are lucky enough (especially if the problem is easier) this will be enough to topple the problem, and you can move on. But more often what happens is that eventually you run out of steam, and the problem is still standing. When that happens, my advice is to try doing some soft techniques if you haven’t already done so.

Here’s an example that I like to give.

Example 1 (USA TST 2009)

Find all real numbers {x}, {y}, {z} which satisfy

\displaystyle  \begin{aligned} x^3 &= 3x - 12y + 50,\\ y^3 &= 12y + 3z - 2,\\ z^3 &= 27z + 27x. \end{aligned}

A common first thing that people will try to do is add the first two equations, since that will cause the {12y} terms to cancel. This gives a factor of {x+y} in the left and an {x+z} in the right, so then maybe you try to submit that into the {27(x+z)} in the last equation, so you get {z^3 = 9(x^3+y^3-48)}, cool, there’s no more linear terms. Then. . .

Usually this doesn’t end well. You add this and subtract that and in the end all you see is equation after equation, and after a while you realize you’re not getting anywhere.

So we’re stuck now. What to do? I’ll now bring in two of the soft techniques I mentioned earlier:

  1. Let’s imagine the problem had {\mathbb R} replaced with {\mathbb C}. In this new problem, you can imagine solving for {y} in terms of {x} using the first equation, then {z} in terms of {y}, and then finally putting everything into the last equation to find a degree {27} polynomial in {x}. I say “imagine” because wow would that be ugly.

    But here’s the kicker: it’s a polynomial. It should have exactly {27} complex roots, with multiplicity. That’s a lot. Really?

    So here’s a hint you might take: there’s a good reason this is over {\mathbb R} but not {\mathbb C}. Often these kind of things end up being because there’s an inequality going on somewhere, so there will only be a few real solutions even though there might be tons of complex ones.

  2. Okay, but there’s an even more blatant thing we don’t know yet: what is the answer, anyways?

    This was more than a little bit embarrassing. We’re half an hour in to the problem and thoroughly stuck, and we don’t even have a single {(x,y,z)} that works? Maybe it’d be a good idea to fix that, like, right now. In the simplest way possible: guess and check.

    It’s much easier than it sounds, since if you pick a value of {z}, say, then you get {x} from the third equation, {y} from the first, then check whether it fits the second. If we restrict our search to integer values of {z}, then there aren’t so many that are reasonable.

I won’t spoil what the answer {(x,y,z)} is, other than saying there is an integer triple and it’s not hard to find it as I described. Once you have these two meta-considerations, you suddenly have a much better foothold, and it’s not too hard to solve the problem from here (for a USA TST problem anyways).

I pick this example because it really illustrates how hopeless repeatedly using hard techniques can be if you miss the right foothold (and also because in this problem it’s unusually tempting to just think that more manipulation is enough). It’s not impossible to solve the problem without first realizing what the answer is, but it is certainly way more difficult.

3. Improving at soft techniques

What this also means is that, in the after-math of a problem (when you’ve solved/given up on a problem and are reading and reflecting on the solution), you should also add soft techniques into the list of possible answers to “how might I have thought of that?”. An example of this is at the end of my earlier post On Reading Solutions, in which I describe how you can come up with solutions to two Putnam problems by thinking carefully about what should be the equality case.

Doing this is harder than it sounds, because the soft techniques are the ones that by definition won’t appear in most written solutions, and many people don’t explicitly even recognize them. But soft techniques are the things that tell you which hard techniques to use, which is why they’re so valuable to learn well.

In writing this post, I’m hoping to make the math contest world more aware that these sorts of non-formalizable ideas are things that can (and should) be acknowledged and discussed, the same way that the hard techniques are. In particular, just as there are a plethora of handouts on every hard technique in the olympiad literature, it should also be possible to design handouts aimed at practicing one or more particular soft techniques.

At MOP every year, I’m starting to see more and more classes to this effect (alongside the usual mix of classes called “inversion” or “graph theory” or “induction” or whatnot). I would love to see more! End speech.