Holomorphic Logarithms and Roots

In this post we’ll make sense of a holomorphic square root and logarithm. Wrote this up because I was surprised how hard it was to find a decent complete explanation.

Let {f : U \rightarrow \mathbb C} be a holomorphic function. A holomorphic {n}th root of {f} is a function {g : U \rightarrow \mathbb C} such that {f(z) = g(z)^n} for all {z \in U}. A logarithm of {f} is a function {g : U \rightarrow \mathbb C} such that {f(z) = e^{g(z)}} for all {z \in U}. The main question we’ll try to figure out is: when do these exist? In particular, what if {f = \mathrm{id}}?

1. Motivation: Square Root of a Complex Number

To start us off, can we define {\sqrt z} for any complex number {z}?

The first obvious problem that comes up is that for any nonzero {z}, there are two numbers {w} such that {w^2 = z}. How can we pick one to use? For our ordinary square root function, we had a notion of “positive”, and so we simply took the positive root.

Let’s expand on this: given { z = r \left( \cos\theta + i \sin\theta \right) } (here {r \ge 0}) we should take the root to be

\displaystyle w = \sqrt{r} \left( \cos \alpha + i \sin \alpha \right).

such that {2\alpha \equiv \theta \pmod{2\pi}}; there are two choices for {\alpha \pmod{2\pi}}, differing by {\pi}.

For complex numbers, we don’t have an obvious way to pick {\alpha}. Nonetheless, perhaps we can also get away with an arbitrary distinction: let’s see what happens if we just choose the {\alpha} with {-\frac{1}{2}\pi < \alpha \le \frac{1}{2}\pi}.

Pictured below are some points (in red) and their images (in blue) under this “right-half” square root. The condition on {\alpha} means we are forcing the blue points to lie on the right-half plane.


Here, {w_i^2 = z_i} for each {i}, and we are constraining the {w_i} to lie in the right half of the complex plane. We see there is an obvious issue: there is a big discontinuity near the points {z_5} and {z_7}! The nearby point {w_6} has been mapped very far away. This discontinuity occurs since the points on the negative real axis are at the “boundary”. For example, given {-4}, we send it to {2i}, but we have hit the boundary: in our interval {-\frac{1}{2}\pi < \alpha \le \frac{1}{2}\pi}, we are at the very right edge.

The negative real axis that we must not touch is what we will later call a branch cut, but for now I call it a ray of death. It is a warning to the red points: if you cross this line, you will die! However, if we move the red circle just a little upwards (so that it misses the negative real axis) this issue is avoided entirely, and we get what seems to be a “nice” square root.


In fact, the ray of death is fairly arbitrary: it is the set of “boundary issues” that arose when we picked {-\frac{1}{2}\pi < \alpha \le \frac{1}{2}\pi}. Suppose we instead insisted on the interval {0 \le \alpha < \pi}; then the ray of death would be the positive real axis instead. The earlier circle we had now works just fine.


What we see is that picking a particular {\alpha}-interval leads to a different set of edge cases, and hence a different ray of death. The only thing these rays have in common is their starting point of zero. In other words, given a red circle and a restriction of {\alpha}, I can make a nice “square rooted” blue circle as long as the ray of death misses it.
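To make the ray of death concrete, here is a minimal Python sketch. The helper `sqrt_with_cut` is a hypothetical name, and it uses the half-open interval {(\mathrm{lo}, \mathrm{lo} + \pi]} for {\alpha}, which is an assumption that differs from the text's {[0, \pi)} only at the endpoints.

```python
import cmath
import math

def sqrt_with_cut(z: complex, lo: float) -> complex:
    """Square root w of z whose argument alpha lies in (lo, lo + pi]."""
    r, theta = cmath.polar(z)      # theta lies in [-pi, pi]
    alpha = theta / 2              # one of the two candidate arguments
    # The other candidate differs by pi, so shift by pi until alpha is in range.
    while alpha <= lo:
        alpha += math.pi
    while alpha > lo + math.pi:
        alpha -= math.pi
    return cmath.rect(math.sqrt(r), alpha)

# Two nearby points on either side of the negative real axis.
above = complex(-4, 0.01)
below = complex(-4, -0.01)

# Interval (-pi/2, pi/2]: the ray of death is the negative real axis.
jump = abs(sqrt_with_cut(above, -math.pi / 2) - sqrt_with_cut(below, -math.pi / 2))

# Interval (0, pi]: the ray of death moves to the positive real axis.
no_jump = abs(sqrt_with_cut(above, 0.0) - sqrt_with_cut(below, 0.0))

print(jump, no_jump)
```

With the first interval the two roots land near {2i} and {-2i}; with the second they stay close together, since the two red points no longer straddle the ray.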

So, what exactly is going on?

2. Square Roots of Holomorphic Functions

To get a picture of what’s happening, we would like to consider a more general problem: let {f: U \rightarrow \mathbb C} be holomorphic. Then we want to decide whether there is a {g : U \rightarrow \mathbb C} such that

\displaystyle f(z) = g(z)^2.

Our previous discussion when {f = \mathrm{id}} tells us we cannot hope to achieve this for {U = \mathbb C}; there is a “half-ray” which causes problems. However, there are certainly functions {f : \mathbb C \rightarrow \mathbb C} such that a {g} exists. As a simplest example, {f(z) = z^2} should definitely have a square root!

Now let’s see if we can fudge together a square root. Earlier, what we did was try to specify a rule to force one of the two choices at each point. This is unnecessarily strict. Perhaps we can do something like the following: start at a point {z_0 \in U}, pick a square root {w_0} of {f(z_0)}, and then try to “fudge” from there the square roots of the other points. What do I mean by fudge? Well, suppose {z_1} is a point very close to {z_0}, and we want to pick a square root {w_1} of {f(z_1)}. While there are two choices, we also would expect {w_0} to be close to {w_1}. Unless we are highly unlucky, this should tell us which choice of {w_1} to pick. (Stupid concrete example: if I have taken the square root {-4.12i} of {-17} and then ask you to continue this square root to {-16}, which sign should you pick for {\pm 4i}?)

There are two possible ways we could get unlucky in the scheme above: first, if {w_0 = 0}, then we’re sunk. But even if we avoid that, we have to worry that we are in a situation where we run around a full loop in the complex plane, and then find that our continuous perturbation has left us in a different place than we started. For concreteness, consider the following situation, again with {f = \mathrm{id}}:


We started at the point {z_0}, with one of its square roots as {w_0}. We then wound a full red circle around the origin, only to find that at the end of it, the blue arc is at a different place than where it started!

The interval construction from earlier doesn’t work either: no matter how we pick the interval for {\alpha}, any ray of death must hit our red circle. The problem somehow lies with the fact that we have enclosed the very special point {0}.

Nevertheless, we know that if we take {f(z) = z^2}, then we don’t run into any problems with our “make it up as you go” procedure. So, what exactly is going on?
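The “make it up as you go” procedure can be imitated numerically. The following is only a sketch under assumptions: `continue_root` is a hypothetical helper, the loop is the unit circle sampled at {N = 1000} points, and “close” is decided by comparing distances to the previous value.

```python
import cmath
import math

def continue_root(f, path, w0):
    """Follow a square root of f along `path`, always picking the candidate nearer the previous value."""
    w = w0
    for z in path[1:]:
        cand = cmath.sqrt(f(z))    # one square root; the other is -cand
        w = cand if abs(cand - w) <= abs(-cand - w) else -cand
    return w

# Unit circle traversed once counterclockwise, starting and ending at z0 = 1.
N = 1000
loop = [cmath.exp(2j * math.pi * k / N) for k in range(N + 1)]

end_id = continue_root(lambda z: z, loop, 1.0)       # f = id, starting root w0 = 1
end_sq = continue_root(lambda z: z * z, loop, 1.0)   # f(z) = z^2, starting root w0 = 1

print(end_id, end_sq)
```

For {f = \mathrm{id}} the continued root comes back as (approximately) {-1} rather than {1}, while for {f(z) = z^2} it returns to {1}, matching the two situations described above.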

3. Covering Projections

By now, if you have read the part on algebraic topology, this should all seem very strangely familiar. The “fudging” procedure exactly describes the idea of a lifting.

More precisely, recall that there is a covering projection

\displaystyle (-)^2 : \mathbb C \setminus \{0\} \rightarrow \mathbb C \setminus \{0\}.

Let {V = \left\{ z \in U \mid f(z) \neq 0 \right\}}. For {z \in U \setminus V}, we already have the square root {g(z) = \sqrt{f(z)} = \sqrt 0 = 0}. So the burden is completing {g : V \rightarrow \mathbb C}.

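Then essentially, what we are trying to do is construct a lifting {g} for the following diagram, in which {E = B = \mathbb C \setminus \{0\}}, the covering map is {p = (-)^2 : E \rightarrow B}, and {f : V \rightarrow B} is to be lifted to {g : V \rightarrow E}.

Our map {p} can be described as “winding around twice”. From algebraic topology, we now know that this lifting exists if and only if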

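\displaystyle f_\ast(\pi_1(V)) \subseteq p_\ast(\pi_1(E)),

that is, the image of {\pi_1(V)} under {f} is a subset of the image of {\pi_1(E)} under {p}. Since {B} and {E} are both punctured planes, they are homotopy equivalent to {S^1}, so both fundamental groups can be identified with {\mathbb Z}.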

Ques 1

Show that the image under {p} is exactly {2\mathbb Z} once we identify {\pi_1(B) = \mathbb Z}.

That means that for any loop {\gamma} in {V}, we need {f \circ \gamma} to have an even winding number around {0 \in B}. This amounts to

\displaystyle \frac{1}{2\pi i} \oint_\gamma \frac{f'}{f} \; dz \in 2\mathbb Z

since {f} has no poles.

Replacing {2} with {n} and carrying over the discussion gives the first main result.

Theorem 2 (Existence of Holomorphic {n}th Roots)

Let {f : U \rightarrow \mathbb C} be holomorphic. Then {f} has a holomorphic {n}th root if and only if

\displaystyle \frac{1}{2\pi i}\oint_\gamma \frac{f'}{f} \; dz \in n\mathbb Z

for every contour {\gamma} in {U}.
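As a rough numerical illustration of the integral in Theorem 2, the sketch below approximates it on a single circle. The helper names `log_derivative_index` and `has_nth_root_on_circle` are hypothetical, the derivative `df` is supplied by hand, and checking one contour is of course weaker than the theorem's condition on every contour.

```python
import cmath
import math

def log_derivative_index(f, df, center=0j, radius=1.0, steps=2000):
    """Approximate (1 / (2 pi i)) times the integral of f'/f over the circle |z - center| = radius."""
    total = 0j
    for k in range(steps):
        t = 2 * math.pi * k / steps
        z = center + radius * cmath.exp(1j * t)
        dz = 1j * radius * cmath.exp(1j * t) * (2 * math.pi / steps)
        total += df(z) / f(z) * dz
    return total / (2j * math.pi)

def has_nth_root_on_circle(f, df, n, **circle):
    """Test the divisibility condition for this one circle only."""
    index = round(log_derivative_index(f, df, **circle).real)
    return index % n == 0

print(log_derivative_index(lambda z: z, lambda z: 1))        # f = id on the unit circle
print(has_nth_root_on_circle(lambda z: z * z, lambda z: 2 * z, 2))
```

The circle must avoid the zeros of {f}, otherwise the integrand blows up. For {f = \mathrm{id}} the index is {1}, which is odd, in line with the failure of the square root on a loop around the origin.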

4. Complex Logarithms

The multivalued nature of the complex logarithm comes from the fact that

\displaystyle \exp(z+2\pi i) = \exp(z).

So if {e^w = z}, then any complex number {w + 2\pi i k} is also a solution.

We can handle this in the same way as before: it amounts to a lifting of the following diagram, now with {E = \mathbb C}, {B = \mathbb C \setminus \{0\}} and {p = \exp}.

There is no longer a need to work with a separate {V} since:

Ques 3

Show that if {f} has any zeros then {g} cannot possibly exist.

In fact, the map {\exp : \mathbb C \rightarrow \mathbb C\setminus\{0\}} is a universal cover, since {\mathbb C} is simply connected. Thus, {p_\ast(\pi_1(\mathbb C))} is trivial. So in addition to being zero-free, {f} cannot have any winding number around {0 \in B} at all. In other words:

Theorem 4 (Existence of Logarithms)

Let {f : U \rightarrow \mathbb C} be holomorphic. Then {f} has a logarithm if and only if

\displaystyle \frac{1}{2\pi i}\oint_\gamma \frac{f'}{f} \; dz = 0

for every contour {\gamma} in {U}.

5. Some Special Cases

The most common special case is

Corollary 5 (Nonvanishing Functions from Simply Connected Domains)

Let {f : \Omega \rightarrow \mathbb C} be holomorphic, where {\Omega} is simply connected. If {f(z) \neq 0} for every {z \in \Omega}, then {f} has both a logarithm and a holomorphic {n}th root for every {n}.

Finally, let’s return to the question of {f = \mathrm{id}} from the very beginning. What’s the best domain {U} such that we can define {\sqrt{-} : U \rightarrow \mathbb C}? Clearly {U = \mathbb C} cannot be made to work, but we can do almost as well. Note that the only zero of {f = \mathrm{id}} is at the origin. Thus if we want to make a logarithm exist, all we have to do is make an incision in the complex plane that renders it impossible to make a loop around the origin. The usual choice is to delete the negative half of the real axis, our very first ray of death; we call this a branch cut, with branch point at {0 \in \mathbb C} (the point which we cannot circle around). This gives

Theorem 6 (Branch Cut Functions)

There exist holomorphic functions

\displaystyle \begin{aligned} \log &: \mathbb C \setminus (-\infty, 0] \rightarrow \mathbb C \\ \sqrt[n]{-} &: \mathbb C \setminus (-\infty, 0] \rightarrow \mathbb C \end{aligned}

satisfying the obvious properties.
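One possible way to see such a {\log} concretely is to integrate {dw/w} along the straight segment from {1} to {z}, which stays inside {\mathbb C \setminus (-\infty, 0]} because that region is star-shaped about {1}. The sketch below assumes this particular construction (the text does not commit to one), and `log_by_integration` is a hypothetical name.

```python
import cmath

def log_by_integration(z: complex, steps: int = 20000) -> complex:
    """Midpoint-rule approximation of the integral of dw / w along the segment from 1 to z."""
    total = 0j
    dw = (z - 1) / steps
    for k in range(steps):
        w = 1 + (k + 0.5) * dw     # midpoint of the k-th piece of the segment
        total += dw / w
    return total

z = complex(-1, 1)
g = log_by_integration(z)
print(g, cmath.exp(g))             # exp(g) should be close to z
```

On the slit plane this agrees, up to discretisation error, with the value returned by Python's `cmath.log`, whose imaginary part lies in {[-\pi, \pi]}.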

There are many possible choices of such functions ({n} choices for the {n}th root and infinitely many for {\log}); a choice of such a function is called a branch. So this is what is meant by a “branch” of a logarithm.

The principal branch is the “canonical” branch, analogous to the way we arbitrarily pick the positive branch to define {\sqrt{-} : \mathbb R_{\ge 0} \rightarrow \mathbb R_{\ge 0}}. For {\log}, we take the {w} such that {e^w = z} and the imaginary part of {w} lies in {(-\pi, \pi]} (since we can shift by integer multiples of {2\pi i}). Often, authors will write {\text{Log } z} to emphasize this choice.

Example 7

Let {U} be the complex plane minus the real interval {[0,1]}. Then the function {U \rightarrow \mathbb C} by {z \mapsto z(z-1)} has a holomorphic square root.
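A sketch of why, using Theorem 2 (the symbol {k} below is introduced only for this argument). Writing {f(z) = z(z-1)}, the logarithmic derivative is

\displaystyle \frac{f'(z)}{f(z)} = \frac{2z-1}{z(z-1)} = \frac{1}{z} + \frac{1}{z-1}.

Any closed contour {\gamma} in {U} misses the connected set {[0,1]}, so it winds the same number of times, say {k}, around {0} and around {1}. Hence

\displaystyle \frac{1}{2\pi i} \oint_\gamma \frac{f'}{f} \; dz = k + k = 2k \in 2\mathbb Z.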

Corollary 8

A holomorphic function {f : U \rightarrow \mathbb C} has a holomorphic {n}th root for all {n \ge 1} if and only if it has a holomorphic logarithm.
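A sketch of the argument, assuming {f} is zero-free and using Theorems 2 and 4 (the name {h} is introduced only here). If {g} is a logarithm of {f}, then {h = e^{g/n}} satisfies

\displaystyle h(z)^n = e^{g(z)} = f(z),

so {h} is a holomorphic {n}th root. Conversely, if the integer {\frac{1}{2\pi i}\oint_\gamma \frac{f'}{f} \; dz} lies in {n\mathbb Z} for every {n \ge 1}, then it must equal {0}, which is exactly the condition of Theorem 4.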

Facts about Lie Groups and Algebras

In Spring 2016 I was taking 18.757 Representations of Lie Algebras. Since I knew next to nothing about either Lie groups or algebras, I was forced to quickly learn about their basic facts and properties. These are the notes that I wrote up accordingly. Proofs of most of these facts can be found in standard textbooks, for example Kirillov.

1. Lie groups

Let {K = \mathbb R} or {K = \mathbb C}, depending on taste.

Definition 1

A Lie group is a group {G} which is also a {K}-manifold; the multiplication map {G \times G \rightarrow G} (by {(g_1, g_2) \mapsto g_1g_2}) and the inversion map {G \rightarrow G} (by {g \mapsto g^{-1}}) are required to be smooth.

A morphism of Lie groups is a map which is both a map of manifolds and a group homomorphism.

Throughout, we will let {e \in G} denote the identity, or {e_G} if we need further emphasis.

Note that in particular, every group {G} can be made into a Lie group by endowing it with the discrete topology. This is silly, so we usually focus only on connected groups:

Proposition 2 (Reduction to connected Lie groups)

Let {G} be a Lie group and {G^0} the connected component of {G} which contains {e}. Then {G^0} is a normal subgroup, itself a Lie group, and the quotient {G/G^0} has the discrete topology.

In fact, we can also reduce this to the study of simply connected Lie groups as follows.

Proposition 3 (Reduction to simply connected Lie groups)

If {G} is connected, let {\pi : \widetilde G \rightarrow G} be its universal cover. Then {\widetilde G} is a Lie group, {\pi} is a morphism of Lie groups, and {\ker \pi \cong \pi_1(G)}.

Here are some examples of Lie groups.

Example 4 (Examples of Lie groups)

  • {\mathbb R} under addition is a real one-dimensional Lie group.
  • {\mathbb C} under addition is a complex one-dimensional Lie group (and a two-dimensional real Lie group)!
  • The unit circle {S^1 \subseteq \mathbb C} is a real Lie group under multiplication.
  • {\text{GL }(n, K) \subset K^{\oplus n^2}} is a Lie group of dimension {n^2}. This example becomes important for representation theory: a representation of a Lie group {G} is a morphism of Lie groups {G \rightarrow \text{GL }(n, K)}.
  • {\text{SL }(n, K) \subset \text{GL }(n, K)} is a Lie group of dimension {n^2-1}.

As geometric objects, Lie groups {G} enjoy a huge amount of symmetry. For example, any neighborhood {U} of {e} can be “copied over” to any other point {g \in G} by the natural map {u \mapsto gu}, whose image is {gU}. There is another theorem worth noting, which is that:

Proposition 5

If {G} is a connected Lie group and {U} is a neighborhood of the identity {e \in G}, then {U} generates {G} as a group.

2. Haar measure

Recall the following result and its proof from representation theory:

Claim 6

For any finite group {G}, {\mathbb C[G]} is semisimple; all finite-dimensional representations decompose into irreducibles.

Proof: Take a representation {V} and equip it with an arbitrary inner form {\left< -,-\right>_0}. Then we can average it to obtain a new inner form

\displaystyle \left< v, w \right> = \frac{1}{|G|} \sum_{g \in G} \left< gv, gw \right>_0.

which is {G}-invariant. Thus given a subrepresentation {W \subseteq V} we can just take its orthogonal complement to decompose {V}. \Box

We would like to repeat this type of proof with Lie groups. In this case the notation {\sum_{g \in G}} doesn’t make sense, so we want to replace it with an integral {\int_{g \in G}} instead. In order to do this we use the following:

Theorem 7 (Haar measure)

Let {G} be a Lie group. Then there exists a unique Radon measure {\mu} (up to scaling) on {G} which is left-invariant, meaning

\displaystyle \mu(g \cdot S) = \mu(S)

for any Borel subset {S \subseteq G} and “translate” {g \in G}. This measure is called the (left) Haar measure.

Example 8 (Examples of Haar measures)

  • The Haar measure on {(\mathbb R, +)} is the standard Lebesgue measure which assigns {1} to the closed interval {[0,1]}. Of course for any {S}, {\mu(a+S) = \mu(S)} for {a \in \mathbb R}.
  • The Haar measure on {(\mathbb R \setminus \{0\}, \times)} is given by

    \displaystyle \mu(S) = \int_S \frac{1}{|t|} \; dt.

    In particular, {\mu([a,b]) = \log(b/a)} for {0 < a < b}. One sees the invariance under multiplication of these intervals.

  • Let {G = \text{GL }(n, \mathbb R)}. Then a Haar measure is given by

    \displaystyle \mu(S) = \int_S |\det(X)|^{-n} \; dX.

  • For the circle group {S^1}, consider {S \subseteq S^1}. We can define

    \displaystyle \mu(S) = \frac{1}{2\pi} \int_S d\varphi

    across complex arguments {\varphi}. The normalization factor of {2\pi} ensures {\mu(S^1) = 1}.
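The invariance in the second example can be checked numerically. This is only a sketch: `haar_measure` is a hypothetical helper that approximates {\int_a^b \frac{dt}{|t|}} with the midpoint rule, for intervals that do not contain {0}.

```python
import math

def haar_measure(a: float, b: float, steps: int = 10000) -> float:
    """Midpoint-rule approximation of the integral of dt / |t| over [a, b], with 0 not in [a, b]."""
    h = (b - a) / steps
    return sum(h / abs(a + (k + 0.5) * h) for k in range(steps))

base = haar_measure(1.0, 2.0)          # should be close to log(2)
scaled = haar_measure(3.0, 6.0)        # the translate 3 * [1, 2]
reflected = haar_measure(-6.0, -3.0)   # the translate (-3) * [1, 2]

print(base, scaled, reflected, math.log(2))
```

All three values agree up to discretisation error, reflecting that multiplying {[1,2]} by {3} or by {-3} does not change its measure.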

Note that we have:

Corollary 9

If the Lie group {G} is compact, there is a unique Haar measure with {\mu(G) = 1}.

This follows by just noting that if {\mu} is a Radon measure on a compact space {X}, then {\mu(X) < \infty}. This now lets us deduce that

Corollary 10 (Compact Lie groups are semisimple)

{\mathbb C[G]} is semisimple for any compact Lie group {G}.

Indeed, we can now consider

\displaystyle \left< v,w\right> = \int_G \left< g \cdot v, g \cdot w\right>_0 \; dg

as we described at the beginning.
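The averaging trick is easiest to see in its finite form from Claim 6. The sketch below uses a made-up example: the two-element group {\{I, A\}} acting on {\mathbb Q^2}, where {A} is an involution that does not preserve the standard dot product. All names (`act`, `dot`, `averaged`) are hypothetical.

```python
from fractions import Fraction

# The two-element group {I, A}; A squares to the identity but is not orthogonal.
I = ((1, 0), (0, 1))
A = ((1, 1), (0, -1))
GROUP = (I, A)

def act(m, v):
    """Apply the 2x2 matrix m to the vector v."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def dot(v, w):
    """The arbitrary starting form <-,->_0, here the standard dot product."""
    return v[0] * w[0] + v[1] * w[1]

def averaged(v, w):
    """<v, w> = (1/|G|) * sum over g in G of <gv, gw>_0."""
    return Fraction(sum(dot(act(g, v), act(g, w)) for g in GROUP), len(GROUP))

v, w = (1, 2), (3, -1)
print(dot(v, w), dot(act(A, v), act(A, w)))            # the starting form is not invariant
print(averaged(v, w), averaged(act(A, v), act(A, w)))  # the averaged form is
```

In this example the line spanned by {(1,0)} is a subrepresentation, and its orthogonal complement under the averaged form is spanned by {(1,-2)}, which {A} sends to its negative, so it is again a subrepresentation.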

3. The tangent space at the identity

In light of the previous comment about neighborhoods of {e} generating {G}, we see that to get some information about the entire Lie group it actually suffices to just get “local” information of {G} at the point {e} (this is one formalization of the fact that Lie groups are super symmetric).

To do this one idea is to look at the tangent space. Let {G} be an {n}-dimensional Lie group (over {K}) and consider {\mathfrak g = T_eG} the tangent space to {G} at the identity {e \in G}. Naturally, this is a {K}-vector space of dimension {n}. We call it the Lie algebra associated to {G}.

Example 11 (Lie algebras corresponding to Lie groups)

  • {(\mathbb R, +)} has a real Lie algebra isomorphic to {\mathbb R}.
  • {(\mathbb C, +)} has a complex Lie algebra isomorphic to {\mathbb C}.
  • The unit circle {S^1 \subseteq \mathbb C} has a real Lie algebra isomorphic to {\mathbb R}, which we think of as the “tangent line” at the point {1 \in S^1}.

Example 12 ({\mathfrak{gl}(n, K)})

Let’s consider {\text{GL }(n, K) \subset K^{\oplus n^2}}, an open subset of {K^{\oplus n^2}}. Its tangent space should just be an {n^2}-dimensional {K}-vector space. By identifying the components in the obvious way, we can think of this Lie algebra as just the set of all {n \times n} matrices.

This Lie algebra goes by the notation {\mathfrak{gl}(n, K)}.

Example 13 ({\mathfrak{sl}(n, K)})

Recall {\text{SL }(n, K) \subset \text{GL }(n, K)} is a Lie group of dimension {n^2-1}, hence its Lie algebra should have dimension {n^2-1}. To see what it is, let’s look at the special case {n=2} first: then

\displaystyle \text{SL }(2, K) = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \mid ad - bc = 1 \right\}.

Viewing this as a polynomial surface {f(a,b,c,d) = ad-bc} in {K^{\oplus 4}}, we compute

\displaystyle \nabla f = \left< d, -c, -b, a \right>

and in particular the tangent space at the identity matrix {\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}} is given by the orthogonal complement of the gradient

\displaystyle \nabla f (1,0,0,1) = \left< 1, 0, 0, 1 \right>.

Hence the tangent plane can be identified with matrices satisfying {a+d=0}. In other words, we see

\displaystyle \mathfrak{sl}(2, K) = \left\{ T \in \mathfrak{gl}(2, K) \mid \text{Tr } T = 0 \right\}.

By repeating this example in greater generality, we discover

\displaystyle \mathfrak{sl}(n, K) = \left\{ T \in \mathfrak{gl}(n, K) \mid \text{Tr } T = 0 \right\}.

4. The exponential map

Right now, {\mathfrak g} is just a vector space. However, by using the group structure we can get a map from {\mathfrak g} back into {G}. The trick is “differential equations”:

Proposition 14 (Differential equations for Lie theorists)

Let {G} be a Lie group over {K} and {\mathfrak g} its Lie algebra. Then for every {x \in \mathfrak g} there is a unique homomorphism

\displaystyle \gamma_x : K \rightarrow G

which is a morphism of Lie groups, such that

\displaystyle \gamma_x'(0) = x \in T_eG = \mathfrak g.

We will write {\gamma_x(t)} to emphasize the argument {t \in K} being thought of as “time”. Thus this proposition should be intuitively clear: the theory of differential equations guarantees that {\gamma_x} is defined and unique in a small neighborhood of {0 \in K}. Then, the group structure allows us to extend {\gamma_x} uniquely to the rest of {K}, giving a trajectory across all of {G}. This is sometimes called a one-parameter subgroup of {G}, but we won’t use this terminology anywhere in what follows.

This lets us define:

Definition 15

Retain the setting of the previous proposition. Then the exponential map is defined by

\displaystyle \exp : \mathfrak g \rightarrow G \qquad\text{by}\qquad x \mapsto \gamma_x(1).

The exponential map gets its name from the fact that for all the examples I discussed before, it is actually just the map {e^\bullet}. Note that below, {e^T = \sum_{k \ge 0} \frac{T^k}{k!}} for a matrix {T}; this is called the matrix exponential.

Example 16 (Exponential Maps of Lie algebras)

  • If {G = \mathbb R_{>0}} under multiplication (which is isomorphic to {(\mathbb R, +)}), then {\mathfrak g = \mathbb R}. We observe {\gamma_x(t) = e^{tx} \in \mathbb R_{>0}} (where {t \in \mathbb R}) is a morphism of Lie groups {\gamma_x : \mathbb R \rightarrow G}. Hence

    \displaystyle \exp : \mathbb R \rightarrow \underbrace{\mathbb R_{>0}}_{=G} \qquad \exp(x) = \gamma_x(1) = e^x \in \mathbb R_{>0} = G.

  • Ditto for {\mathbb C \setminus \{0\}} under multiplication.
  • For {S^1} and {x \in \mathbb R}, the map {\gamma_x : \mathbb R \rightarrow S^1} given by {t \mapsto e^{itx}} works. Hence

    \displaystyle \exp : \mathbb R \rightarrow S^1 \qquad \exp(x) = \gamma_x(1) = e^{ix} \in S^1.

  • For {\text{GL }(n, K)}, the map {\gamma_X : K \rightarrow \text{GL }(n, K)} given by {t \mapsto e^{tX}} works nicely (now {X} is a matrix). (Note that we have to check {e^{tX}} is actually invertible for this map to be well-defined.) Hence the exponential map is given by

    \displaystyle \exp : \mathfrak{gl}(n,K) \rightarrow \text{GL }(n,K) \qquad \exp(X) = \gamma_X(1) = e^X \in \text{GL }(n, K).

  • Similarly,

    \displaystyle \exp : \mathfrak{sl}(n,K) \rightarrow \text{SL }(n,K) \qquad \exp(X) = \gamma_X(1) = e^X \in \text{SL }(n, K).

    Here we had to check that if {X \in \mathfrak{sl}(n,K)}, meaning {\text{Tr } X = 0}, then {\det(e^X) = 1}. This can be seen by writing {X} in an upper triangular basis.
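The last check, that trace zero exponentiates to determinant one, can be illustrated numerically through the identity {\det(e^X) = e^{\text{Tr } X}}. The sketch below truncates the series {\sum_k X^k/k!} at 30 terms, which is an arbitrary choice, and the helper names are hypothetical.

```python
import math

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(x, terms=30):
    """Truncated matrix exponential: sum of x^k / k! for k < terms."""
    n = len(x)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    power = [row[:] for row in result]                             # x^0 / 0!
    for k in range(1, terms):
        power = mat_mul(power, x)
        power = [[entry / k for entry in row] for row in power]    # now x^k / k!
        result = [[result[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return result

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

X = [[1.0, 2.0], [3.0, -1.0]]    # trace zero, so X lies in sl(2, R)
Y = [[0.5, 1.0], [0.0, 0.3]]     # trace 0.8

print(det2(mat_exp(X)))                  # should be close to 1
print(det2(mat_exp(Y)), math.exp(0.8))   # should be close to each other
```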

Actually, taking the tangent space at the identity is a functor. Consider a map {\varphi : G_1 \rightarrow G_2} of Lie groups, with Lie algebras {\mathfrak g_1} and {\mathfrak g_2}. Because {\varphi} is a group homomorphism, {G_1 \ni e_1 \mapsto e_2 \in G_2}. Now, by manifold theory we know that a map {f : M \rightarrow N} between manifolds gives a linear map between the corresponding tangent spaces, say {Tf : T_pM \rightarrow T_{fp}N}. For us we obtain a linear map

\displaystyle \varphi_\ast = T \varphi : \mathfrak g_1 \rightarrow \mathfrak g_2.

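In fact, this {\varphi_\ast} fits into a commutative diagram, which says that {\varphi(\exp(x)) = \exp(\varphi_\ast(x))} for every {x \in \mathfrak g_1}.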


Here are a few more properties of {\exp}:

  • {\exp(0) = e \in G}, which is immediate by looking at the constant trajectory {\gamma_0(t) \equiv e}.
  • {(D\exp)_0(x) = x \in \mathfrak g}, i.e. the total derivative {D\exp : \mathfrak g \rightarrow \mathfrak g} at {0} is the identity. This is again by construction.
  • In particular, by the inverse function theorem this implies that {\exp} is a diffeomorphism in a neighborhood of {0 \in \mathfrak g}, onto a neighborhood of {e \in G}.
  • {\exp} commutes with morphisms of Lie groups, in the sense that {\varphi \circ \exp = \exp \circ \varphi_\ast}. (By the above diagram.)

5. The commutator

Right now {\mathfrak g} is still just a vector space, the tangent space. But now that there is a map {\exp : \mathfrak g \rightarrow G}, we can use it to put a new operation on {\mathfrak g}, the so-called commutator.

The idea is as follows: we want to “multiply” two elements of {\mathfrak g}. But {\mathfrak g} is just a vector space, so we can’t do that. However, {G} itself has a group multiplication, so we should pass to {G} using {\exp}, use the multiplication in {G} and then come back.

Here are the details. As we just mentioned, {\exp} is a diffeomorphism near {e \in G}. So for {x}, {y} close to the origin of {\mathfrak g}, we can look at {\exp(x)} and {\exp(y)}, which are two elements of {G} close to {e}. Multiplying them gives an element still close to {e}, so it’s equal to {\exp(z)} for some unique {z}, call it {\mu(x,y)}.

One can show in fact that {\mu} can be written as a Taylor series in two variables as

\displaystyle \mu(x,y) = x + y + \frac{1}{2} [x,y] + \text{third order terms} + \dots

where {[x,y]} is a skew-symmetric bilinear map, meaning {[x,y] = -[y,x]}. It will be more convenient to work with {[x,y]} than {\mu(x,y)} itself, so we give it a name:

Definition 17

This {[x,y]} is called the commutator of {G}.
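The expansion of {\mu(x,y)} can be verified exactly in one convenient special case, chosen here as an assumption: strictly upper triangular {3 \times 3} matrices with the bracket {xy - yx}. There {x^3 = 0} and {[x,y]} commutes with everything, so all terms of order three and higher vanish and {\mu(x,y) = x + y + \frac{1}{2}[x,y]} on the nose. The helper names are hypothetical.

```python
from fractions import Fraction

def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(3)] for i in range(3)]

def scale(c, a):
    return [[c * entry for entry in row] for row in a]

IDENTITY = [[int(i == j) for j in range(3)] for i in range(3)]

def exp_nilpotent(x):
    """exp(x) = 1 + x + x^2 / 2, exact because x^3 = 0 for strictly upper triangular 3x3 x."""
    return add(add(IDENTITY, x), scale(Fraction(1, 2), mul(x, x)))

def bracket(x, y):
    """Matrix commutator xy - yx."""
    return add(mul(x, y), scale(-1, mul(y, x)))

x = [[0, 2, 5], [0, 0, 3], [0, 0, 0]]
y = [[0, 1, -4], [0, 0, 7], [0, 0, 0]]

mu = add(add(x, y), scale(Fraction(1, 2), bracket(x, y)))
print(mul(exp_nilpotent(x), exp_nilpotent(y)) == exp_nilpotent(mu))
```

Exact rational arithmetic via `Fraction` is used so that the comparison is an equality rather than an approximation.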

Now we know multiplication in {G} is associative, so this should give us some nontrivial relation on the bracket {[,]}. Specifically, since

\displaystyle \exp(x) \left( \exp(y) \exp(z) \right) = \left( \exp(x) \exp(y) \right) \exp(z).

we should have that {\mu(x, \mu(y,z)) = \mu(\mu(x,y), z)}, and this should tell us something. In fact, the claim is:

Theorem 18

The bracket {[,]} satisfies the Jacobi identity

\displaystyle [x,[y,z]] + [y,[z,x]] + [z,[x,y]] = 0.

Proof: Although I won’t prove it, the third-order terms (and all the rest) in our definition of {[x,y]} can be written out explicitly as well: for example, we actually have

\displaystyle \mu(x,y) = x + y + \frac{1}{2} [x,y] + \frac{1}{12} \left( [x, [x,y]] + [y,[y,x]] \right) + \text{fourth order terms} + \dots.

The general formula is called the Baker-Campbell-Hausdorff formula.

Then we can force ourselves to expand this using the first three terms of the BCH formula and then equate the degree three terms. The left-hand side expands initially as {\mu\left( x, y + z + \frac{1}{2} [y,z] + \frac{1}{12} \left( [y,[y,z]] + [z,[z,y]] \right) \right)}, and the next step would be something ugly.

This computation is horrifying and painful, so I’ll pretend I did it and tell you the end result is as claimed. \Box

There is a more natural way to see why this identity is the “right one”; see Qiaochu. However, with this proof I want to make the point that this Jacobi identity is not our decision: instead, the Jacobi identity is forced upon us by associativity in {G}.

Example 19 (Examples of commutators attached to Lie groups)

  • If {G} is an abelian group, we have {-[y,x] = [x,y]} by skew-symmetry and {[x,y] = [y,x]} from {\mu(x,y) = \mu(y,x)}. Thus {[x,y] = 0} in {\mathfrak g} for any abelian Lie group {G}.
  • In particular, the brackets for {G \in \{\mathbb R, \mathbb C, S^1\}} are trivial.
  • Let {G = \text{GL }(n, K)}. Then one can show that

    \displaystyle [T,S] = TS - ST \qquad \forall S, T \in \mathfrak{gl}(n, K).

  • Ditto for {\text{SL }(n, K)}.

In any case, with the Jacobi identity we can define a general Lie algebra as an intrinsic object with a Jacobi-satisfying bracket:

Definition 20

A Lie algebra over {k} is a {k}-vector space equipped with a skew-symmetric bilinear bracket {[,]} satisfying the Jacobi identity.

A morphism of Lie algebras is a linear map which preserves the bracket.

Note that a Lie algebra may even be infinite-dimensional (even though we are assuming {G} is finite-dimensional, so that such algebras will never come up as a tangent space).

Example 21 (Associative algebra {\rightarrow} Lie algebra)

Any associative algebra {A} over {k} can be made into a Lie algebra by taking the same underlying vector space, and using the bracket {[a,b] = ab - ba}.
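For the associative algebra of {2 \times 2} integer matrices, skew-symmetry and the Jacobi identity for {[a,b] = ab - ba} can be spot-checked directly. This is a small sketch on three arbitrarily chosen matrices, not a proof, and the helper names are hypothetical.

```python
def mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(a, b):
    return [[p + q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def neg(a):
    return [[-entry for entry in row] for row in a]

def bracket(a, b):
    """The bracket [a, b] = ab - ba built from the associative product."""
    return add(mul(a, b), neg(mul(b, a)))

def jacobi_sum(x, y, z):
    """[x,[y,z]] + [y,[z,x]] + [z,[x,y]], which should vanish."""
    return add(add(bracket(x, bracket(y, z)), bracket(y, bracket(z, x))), bracket(z, bracket(x, y)))

x = [[1, 2], [3, 4]]
y = [[0, -1], [5, 2]]
z = [[7, 1], [-2, 3]]

print(jacobi_sum(x, y, z))                      # the zero matrix
print(bracket(x, y) == neg(bracket(y, x)))      # skew-symmetry
```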

6. The fundamental theorems

We finish this list of facts by stating the three “fundamental theorems” of Lie theory. They are based upon the functor

\displaystyle \mathscr{L} : G \mapsto T_e G

we have described earlier, which is a functor

  • from the category of Lie groups
  • into the category of finite-dimensional Lie algebras.

The first theorem requires the following definition:

Definition 22

A Lie subgroup {H} of a Lie group {G} is a subgroup {H} such that the inclusion map {H \hookrightarrow G} is also an injective immersion.

A Lie subalgebra {\mathfrak h} of a Lie algebra {\mathfrak g} is a vector subspace preserved under the bracket (meaning that {[\mathfrak h, \mathfrak h] \subseteq \mathfrak h}).

Theorem 23 (Lie I)

Let {G} be a real or complex Lie group with Lie algebra {\mathfrak g}. Then given a connected Lie subgroup {H \subseteq G}, the map

\displaystyle H \mapsto \mathscr{L}(H) \subseteq \mathfrak g

is a bijection between connected Lie subgroups of {G} and Lie subalgebras of {\mathfrak g}.

Theorem 24 (The Lie functor is an equivalence of categories)

Restrict {\mathscr{L}} to a functor

  • from the category of simply connected Lie groups over {K}
  • to the category of finite-dimensional Lie algebras over {K}.

Then:

  1. (Lie II) {\mathscr{L}} is fully faithful, and
  2. (Lie III) {\mathscr{L}} is essentially surjective on objects.

If we drop the “simply connected” condition, we obtain a functor which is faithful and exact, but not full: non-isomorphic Lie groups can have isomorphic Lie algebras (one example is {\text{SO }(3)} and {\text{SU }(2)}).

Algebraic Topology Functors

This will be old news to anyone who does algebraic topology, but oddly enough I can’t seem to find it all written in one place anywhere, and in particular I can’t find the bit about {\mathsf{hPairTop}} at all.

In algebraic topology you (for example) associate every topological space {X} with a group, like {\pi_1(X, x_0)} or {H_5(X)}. All of these operations turn out to be functors. This isn’t surprising, because as far as I’m concerned the definition of a functor is “any time you take one type of object and naturally make another object”.

The surprise is that these objects also respect homotopy in a nice way; proving this is a fair amount of the “setup” work in algebraic topology.

1. Homology, {H_n : \mathsf{hTop} \rightarrow \mathsf{Grp}}

Note that {H_5} is a functor

\displaystyle H_5 : \mathsf{Top} \rightarrow \mathsf{Grp}

i.e. to every space {X} we can associate a group {H_5(X)}. (Of course, replace {5} by the integer of your choice.) Recall that:

Definition 1

Two maps {f, g : X \rightarrow Y} are homotopic if there exists a homotopy between them.

Thus for a map we can take its homotopy class {[f]} (the equivalence class under this relationship). This has the nice property that {[f \circ g] = [f] \circ [g]} and so on.

Definition 2

Two spaces {X} and {Y} are homotopy equivalent if there exists a pair of maps {f : X \rightarrow Y} and {g : Y \rightarrow X} such that {[g \circ f] = [\mathrm{id}_X]} and {[f \circ g] = [\mathrm{id}_Y]}.

In light of this, we can define

Definition 3

The category {\mathsf{hTop}} is defined as follows:

  • The objects are topological spaces {X}.
  • The morphisms {X \rightarrow Y} are homotopy classes of continuous maps {X \rightarrow Y}.

Remark 4

Composition is well-defined since {[f \circ g] = [f] \circ [g]}. Two spaces are isomorphic in {\mathsf{hTop}} if they are homotopy equivalent.

Remark 5

As you might guess this “quotient” construction is called a quotient category.

Then the big result is that:

Theorem 6

The induced map {f_\sharp = H_n(f)} of a map {f: X \rightarrow Y} depends only on the homotopy class of {f}. Thus {H_n} is a functor

\displaystyle H_n : \mathsf{hTop} \rightarrow \mathsf{Grp}.

The proof of this is geometric, using the so-called prism operators. In any case, as with all functors we deduce

Corollary 7

{H_n(X) \cong H_n(Y)} if {X} and {Y} are homotopy equivalent.

In particular, the contractible spaces are those spaces {X} which are homotopy equivalent to a point, in which case {H_n(X) = 0} for all {n \ge 1}.

2. Relative Homology, {H_n : \mathsf{hPairTop} \rightarrow \mathsf{Grp}}

In fact, we also defined homology groups

\displaystyle H_n(X,A)

for {A \subseteq X}. We will now show this is functorial too.

Definition 8

Let {\varnothing \neq A \subset X} and {\varnothing \neq B \subset Y} be subspaces, and consider a map {f : X \rightarrow Y}. If {f(A) \subseteq B} we write

\displaystyle f : (X,A) \rightarrow (Y,B).

We say {f} is a map of pairs, between the pairs {(X,A)} and {(Y,B)}.

Definition 9

We say that {f,g : (X,A) \rightarrow (Y,B)} are pair-homotopic if they are “homotopic through maps of pairs”.

More formally, a pair-homotopy between {f, g : (X,A) \rightarrow (Y,B)} is a map {F : [0,1] \times X \rightarrow Y}, which we’ll write as {F_t(x)}, such that {F} is a homotopy of the maps {f,g : X \rightarrow Y} and each {F_t} is itself a map of pairs.

Thus, we naturally arrive at two categories:

  • {\mathsf{PairTop}}, the category of pairs of topological spaces, and
  • {\mathsf{hPairTop}}, the same category except with maps considered only up to pair-homotopy.

Definition 10

As before, we say pairs {(X,A)} and {(Y,B)} are pair-homotopy equivalent if they are isomorphic in {\mathsf{hPairTop}}. An isomorphism of {\mathsf{hPairTop}} is a pair-homotopy equivalence.

Then, the prism operators now let us derive

Theorem 11

We have a functor

\displaystyle H_n : \mathsf{hPairTop} \rightarrow \mathsf{Grp}.

The usual corollaries apply.

Now, we want an analog of contractible spaces for our pairs: i.e. pairs of spaces {(X,A)} such that {H_n(X,A) = 0} for {n \ge 1}. The correct definition is:

Definition 12

Let {A \subset X}. We say that {A} is a deformation retract of {X} if there is a map of pairs {r : (X, A) \rightarrow (A, A)} which is a pair-homotopy equivalence.

Example 13 (Examples of Deformation Retracts)

  1. If a single point {p} is a deformation retract of a space {X}, then {X} is contractible, since the retraction {r : X \rightarrow \{p\}} (when viewed as a map {X \rightarrow X}) is homotopic to the identity map {\mathrm{id}_X : X \rightarrow X}.
  2. The punctured disk {D^2 \setminus \{0\}} deformation retracts onto its boundary {S^1}.
  3. More generally, {D^{n} \setminus \{0\}} deformation retracts onto its boundary {S^{n-1}}.
  4. Similarly, {\mathbb R^n \setminus \{0\}} deformation retracts onto a sphere {S^{n-1}}.

Of course in this situation we have that

\displaystyle H_n(X,A) \cong H_n(A,A) = 0.

3. Homotopy, {\pi_1 : \mathsf{hTop}_\ast \rightarrow \mathsf{Grp}}

As a special case of the above, we define

Definition 14

The category {\mathsf{Top}_\ast} is defined as follows:

  • The objects are pairs {(X, x_0)} of spaces {X} with a distinguished basepoint {x_0}. We call these pointed spaces.
  • The morphisms are maps {f : (X, x_0) \rightarrow (Y, y_0)}, meaning {f} is continuous and {f(x_0) = y_0}.

Now again we mod out:

Definition 15

Two maps {f , g : (X, x_0) \rightarrow (Y, y_0)} of pointed spaces are homotopic if there is a homotopy between them which also fixes the basepoints. We can then, in the same way as before, define the quotient category {\mathsf{hTop}_\ast}.

And lo and behold:

Theorem 16

We have a functor

\displaystyle \pi_1 : \mathsf{hTop}_\ast \rightarrow \mathsf{Grp}.

Same corollaries as before.

Uniqueness of Solutions for DiffEq’s

Let {V} be a normed finite-dimensional real vector space and let {U \subseteq V} be an open set. A vector field on {U} is a function {\xi : U \rightarrow V}. (In the words of Gaitsgory: “you should imagine a vector field as a domain, and at every point there is a little vector growing out of it.”)

The idea of a differential equation is as follows. Imagine your vector field specifies a velocity at each point. So you initially place a particle somewhere in {U}, and then let it move freely, guided by the arrows in the vector field. (There are plenty of good pictures online.) Intuitively, for nice {\xi} it should be the case that the resulting trajectory is unique. This is the main take-away; the proof itself is just for completeness.

This is a so-called differential equation:

Definition 1

Let {\gamma : (-\varepsilon, \varepsilon) \rightarrow U} be a differentiable path. We say {\gamma} is a solution to the differential equation defined by {\xi} if for each {t \in (-\varepsilon, \varepsilon)} we have

\displaystyle  \gamma'(t) = \xi(\gamma(t)).

Example 2 (Examples of DE’s)

Let {U = V = \mathbb R}.

  1. Consider the vector field {\xi(x) = 1}. Then the solutions {\gamma} are just {\gamma(t) = t+c}.
  2. Consider the vector field {\xi(x) = x}. Then {\gamma} is a solution exactly when {\gamma'(t) = \gamma(t)}. It’s well-known that the solutions are exactly {\gamma(t) = c\exp(t)}.
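
As a quick numerical sanity check of both examples, here is a minimal sketch using the forward Euler method. This is not part of the original lecture, and the names `euler_solve` and `steps` are made up for illustration. It literally “follows the arrows” of {\xi} in small steps and lets us compare the endpoint against {t + c} and {c\exp(t)}.

```python
import math

def euler_solve(xi, x0, t_end, steps):
    """Approximate gamma(t_end), where gamma' = xi(gamma) and gamma(0) = x0,
    by taking `steps` forward-Euler steps of equal size."""
    h = t_end / steps
    x = x0
    for _ in range(steps):
        x = x + h * xi(x)  # move a little along the arrow attached to x
    return x

# xi(x) = x with gamma(0) = 1: the endpoint should be close to exp(1).
approx_exp = euler_solve(lambda x: x, 1.0, 1.0, 100_000)

# xi(x) = 1 with gamma(0) = 0: the endpoint should be close to t + c = 1.
approx_line = euler_solve(lambda x: 1.0, 0.0, 1.0, 100_000)

print(approx_exp, math.e)
print(approx_line)
```

Of course this only approximates one trajectory; it says nothing about uniqueness, which is the point of the theorem below.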

Of course, you may be used to seeing differential equations which are time-dependent: i.e. something like {\gamma'(t) = t}, for example. In fact, you can hack this to fit in the current model using the idea that time is itself just a dimension. Suppose we want to model {\gamma'(t) = F(\gamma(t), t)}. Then we instead consider

\displaystyle  \xi : V \times \mathbb R \rightarrow V \times \mathbb R \qquad\text{by}\qquad \xi(v, t) = (F(v,t), 1)

and solve the resulting differential equation over {V \times \mathbb R}. This does exactly what we want. Geometrically, this means making time into another dimension and imagining that our particle moves at a “constant speed through time”.
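
Here is a small sketch of that hack in code, under the assumption that {V = \mathbb R} and that we again use a crude Euler scheme (the helper names `autonomize` and `euler_solve_2d` are hypothetical). We model {\gamma'(t) = t} with {\gamma(0) = 0}, whose solution is {\gamma(t) = t^2/2}, by solving a time-independent equation on {V \times \mathbb R}.

```python
def autonomize(F):
    """Turn gamma'(t) = F(gamma(t), t) into a time-independent
    vector field xi(v, t) = (F(v, t), 1) on V x R."""
    def xi(state):
        v, t = state
        return (F(v, t), 1.0)  # the time coordinate moves at constant speed 1
    return xi

def euler_solve_2d(xi, state0, duration, steps):
    """Forward Euler on V x R, starting at state0 = (v0, t0)."""
    h = duration / steps
    v, t = state0
    for _ in range(steps):
        dv, dt = xi((v, t))
        v, t = v + h * dv, t + h * dt
    return v, t

xi = autonomize(lambda v, t: t)  # models gamma'(t) = t
v_end, t_end = euler_solve_2d(xi, (0.0, 0.0), 2.0, 200_000)
print(v_end, t_end)  # v_end should be near 2^2 / 2 = 2, t_end near 2
```

The second coordinate of the state simply recovers the time, which is exactly the “constant speed through time” picture.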

The task is then mainly about finding which conditions guarantee that our differential equation behaves nicely. The answer turns out to be:

Definition 3

The vector field {\xi : U \rightarrow V} satisfies the Lipschitz condition if

\displaystyle  \left\lVert \xi(x')-\xi(x'') \right\rVert \le \Lambda \left\lVert x'-x'' \right\rVert

holds identically for some fixed constant {\Lambda}.

Note that a continuously differentiable {\xi} is Lipschitz on any closed ball contained in {U}, which is all that the local argument below needs.

Theorem 4 (Picard-Lindelöf)

Let {V} be a finite-dimensional real vector space, and let {\xi} be a vector field on a domain {U \subseteq V} which satisfies the Lipschitz condition.

Then for every {x_0 \in U} there exists {(-\varepsilon,\varepsilon)} and {\gamma : (-\varepsilon,\varepsilon) \rightarrow U} such that {\gamma'(t) = \xi(\gamma(t))} and {\gamma(0) = x_0}. Moreover, if {\gamma_1} and {\gamma_2} are two solutions and {\gamma_1(t) = \gamma_2(t)} for some {t}, then {\gamma_1 = \gamma_2}.

In fact, Peano’s existence theorem says that if we replace Lipschitz continuity with just continuity, then {\gamma} exists but need not be unique. For example:

Example 5 (Counterexample if {\xi} is not Lipschitz)

Let {U = V = \mathbb R} and consider {\xi(x) = x^{\frac23}}, with {x_0 = 0}. Then {\gamma(t) = 0} and {\gamma(t) = \left( t/3 \right)^3} are both solutions to the differential equation

\displaystyle  \gamma'(t) = \gamma(t)^{\frac 23}.
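
To see the non-uniqueness concretely, the following sketch checks numerically that both paths satisfy the equation and both pass through {x_0 = 0} at time {0}. It uses a central difference for {\gamma'}; the helper names are made up, and the tolerance is an assumption about floating-point error rather than anything from the lecture.

```python
def xi(x):
    # x^(2/3), computed as (x^2)^(1/3) so that it also makes sense for x < 0
    return (x * x) ** (1.0 / 3.0)

def gamma_zero(t):
    return 0.0

def gamma_cubic(t):
    return (t / 3.0) ** 3

def derivative(gamma, t, h=1e-6):
    """Central-difference approximation of gamma'(t)."""
    return (gamma(t + h) - gamma(t - h)) / (2 * h)

def max_residual(gamma, ts):
    """Largest value of |gamma'(t) - xi(gamma(t))| over the sample times."""
    return max(abs(derivative(gamma, t) - xi(gamma(t))) for t in ts)

ts = [k / 10 for k in range(-20, 21)]  # sample times in [-2, 2]
res_zero = max_residual(gamma_zero, ts)
res_cubic = max_residual(gamma_cubic, ts)

print(res_zero, res_cubic)            # both should be (nearly) zero
print(gamma_zero(1), gamma_cubic(1))  # yet the two paths differ at t = 1
```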

Now, for the proof of the main theorem. The main idea is the following result (sometimes called the contraction principle).

Lemma 6 (Banach Fixed-Point Theorem)

Let {(X,d)} be a nonempty complete metric space. Let {f : X \rightarrow X} be a map such that {d(f(x_1), f(x_2)) \le \frac{1}{2} d(x_1, x_2)} for any {x_1, x_2 \in X}. Then {f} has a unique fixed point.
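
The usual way to find the fixed point is to just iterate {f}. As an illustration (my own toy example, not from the lecture), take {X = \mathbb R} and {f(x) = \frac{1}{2}\cos x}, which is contracting with constant {\frac{1}{2}} since {|f'(x)| \le \frac{1}{2}}. The function name `iterate_to_fixed_point` and its tolerance are assumptions of this sketch.

```python
import math

def iterate_to_fixed_point(f, x, tol=1e-12, max_iter=1000):
    """Repeatedly apply f until consecutive iterates are within tol."""
    for _ in range(max_iter):
        nxt = f(x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    raise RuntimeError("iteration did not converge")

f = lambda x: 0.5 * math.cos(x)

# Two very different starting points should land on the same fixed point.
p1 = iterate_to_fixed_point(f, 0.0)
p2 = iterate_to_fixed_point(f, 100.0)
print(p1, p2)
```

The distance between consecutive iterates at least halves at each step, which is why the sequence is Cauchy and why the starting point does not matter.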

For the proof of the main theorem, we are given {x_0 \in U}. Let {X} be the metric space of continuous functions from {(-\varepsilon, \varepsilon)} to the complete metric space {\overline{B}(x_0, r)} which is the closed ball of radius {r} centered at {x_0}. (Here {r > 0} can be arbitrary, so long as the ball stays in {U}.) It turns out that {X} is itself a complete metric space when equipped with the sup metric

\displaystyle  d(f, g) = \sup_{t \in (-\varepsilon, \varepsilon)} \left\lVert f(t)-g(t) \right\rVert.

This is well-defined since {\overline{B}(x_0, r)} is compact.

We wish to use the Banach theorem on {X}, so we’ll rig a function {\Phi : X \rightarrow X} with the property that its fixed points are solutions to the differential equation. Define it by, for every {\gamma \in X},

\displaystyle  \Phi(\gamma) : t \mapsto x_0 + \int_0^t \xi(\gamma(s)) \; ds.

This function is contrived so that {(\Phi\gamma)(0) = x_0} and {\Phi\gamma} is both continuous and differentiable. By the Fundamental Theorem of Calculus, the derivative is exhibited by

\displaystyle  (\Phi\gamma)'(t) = \left( \int_0^t \xi(\gamma(s)) \; ds \right)' = \xi(\gamma(t)).

In particular, fixed points correspond exactly to solutions to our differential equation.
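
To get a feel for {\Phi}, here is a sketch of the iteration {\gamma \mapsto \Phi\gamma} in the special case {\xi(x) = x}, {x_0 = 1} (my choice of example). Paths are represented as polynomials in {t} with exact rational coefficients, and `picard_step` is a hypothetical helper name. Starting from the constant path, the iterates turn out to be the Taylor polynomials of {\exp(t)}, which is the solution we expect.

```python
from fractions import Fraction
from math import factorial

def picard_step(coeffs, x0=Fraction(1)):
    """Apply Phi for the field xi(x) = x:
    gamma |-> x0 + integral from 0 to t of gamma(s) ds.
    coeffs[k] is the coefficient of t^k in the polynomial gamma."""
    # Integrating term by term sends c * t^k to c * t^(k+1) / (k+1).
    integrated = [c / (k + 1) for k, c in enumerate(coeffs)]
    return [x0] + integrated

gamma = [Fraction(1)]  # the constant path gamma(t) = x0 = 1
for _ in range(5):
    gamma = picard_step(gamma)

print(gamma)  # coefficients 1, 1, 1/2, 1/6, 1/24, 1/120
```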

A priori this output has signature {\Phi\gamma : (-\varepsilon,\varepsilon) \rightarrow V}, so we need to check that {\Phi\gamma(t) \in \overline{B}(x_0, r)}. We can check that

\displaystyle  \begin{aligned} \left\lVert (\Phi\gamma)(t) - x_0 \right\rVert &=\left\lVert \int_0^t \xi(\gamma(s)) \; ds \right\rVert \\ &\le \left\lvert \int_0^t \left\lVert \xi(\gamma(s)) \right\rVert \; ds \right\rvert \\ &\le |t| \max_{|s| \le |t|} \left\lVert \xi(\gamma(s)) \right\rVert \\ &\le \varepsilon \cdot A \end{aligned}

where {A = \max_{x \in \overline{B}(x_0,r)} \left\lVert \xi(x) \right\rVert}; we have {A < \infty} since {\overline{B}(x_0,r)} is compact. Hence by selecting {\varepsilon < r/A}, the above is bounded by {r}, so {\Phi\gamma} indeed maps into {\overline{B}(x_0, r)}. (Note that at this point we have not used the Lipschitz condition, only that {\xi} is continuous.)

It remains to show that {\Phi} is contracting. Write

\displaystyle  \begin{aligned} \left\lVert (\Phi\gamma_1)(t) - (\Phi\gamma_2)(t) \right\rVert &= \left\lVert \int_0^t \left( \xi(\gamma_1(s))-\xi(\gamma_2(s)) \right) \; ds \right\rVert \\ &\le \left\lvert \int_0^t \left\lVert \xi(\gamma_1(s))-\xi(\gamma_2(s)) \right\rVert \; ds \right\rvert \\ &\le |t| \Lambda \sup_{|s| \le |t|} \left\lVert \gamma_1(s)-\gamma_2(s) \right\rVert \\ &\le \varepsilon\Lambda \sup_{|s| \le |t|} \left\lVert \gamma_1(s)-\gamma_2(s) \right\rVert \\ &\le \varepsilon\Lambda d(\gamma_1, \gamma_2) . \end{aligned}

Hence once again for {\varepsilon} sufficiently small we get {\varepsilon\Lambda \le \frac{1}{2}}. Since the above holds identically for {t}, this implies

\displaystyle  d(\Phi\gamma_1, \Phi\gamma_2) \le \frac{1}{2} d(\gamma_1, \gamma_2)

as needed.

This is a cleaned-up version of a portion of a lecture from Math 55b in Spring 2015, instructed by Dennis Gaitsgory.

Constructing the Tangent and Cotangent Space

This one confused me for a long time, so I figured I should write this down before I forgot again.

Let {M} be an abstract smooth manifold. We want to define the notion of a tangent vector to {M} at a point {p \in M}. With that, we can define the tangent space {T_p(M)}, which will just be the (real) vector space of tangent vectors at {p}.

Geometrically, we know what this should look like for our usual examples. For example, if {M = S^1} is a circle embedded in {\mathbb R^2}, then the tangent vector at a point {p} should just look like a vector running off tangent to the circle.

Similarly, given a sphere {M = S^2}, the tangent space at a point {p} along the sphere would look like a plane tangent to {M} at {p}.

However, the point of an abstract manifold is that we want to see the manifold as an intrinsic object, in its own right, rather than as embedded in {\mathbb R^n}. This can be thought of as analogous to the way that we think of a group as an abstract object in its own right, even though Cayley’s Theorem tells us that any group is a subgroup of a permutation group. (This wasn’t always the case! During the 19th century, a group was literally defined as a subset of {\text{GL}(n)} or of {S_n}. In fact Sylow developed his theorems without the word “group”. Only much later was the abstract definition of a group given: an abstract set {G} which was independent of any embedding into {S_n}, and an object in its own right.) So, we would like our notion of a tangent vector to not refer to an ambient space, but only to intrinsic properties of the manifold {M} in question.

So how do we capture the notion of the tangent to a manifold referring just to the manifold itself? Well, the smooth structure of the manifold lets us speak of smooth functions {f : M \rightarrow \mathbb R}. In the embedded case, we can thus think of taking a directional derivative along {\vec v} (i.e. some partial derivative). To give a concrete example, suppose we have a smooth function {f : S^2 \rightarrow \mathbb R} and a point {p}. By the structure of a manifold, near the point {p}, {f} looks like a function on some neighborhood of the origin in {\mathbb R^2 = T_p(M)}. So we are allowed to take the partial derivative of {f} with respect to any of the vectors in {T_p(M)}.

For a fixed {\vec v} this partial derivative is a linear map {D : C^\infty(M) \rightarrow \mathbb R}. It turns out this goes the other way: if you know what {D} does to every smooth function, then you can figure out which vector it’s taking the partial derivative along. This is the trick we use in order to create the tangent space. Rather than trying to specify a vector {\vec v} directly (which we can’t do because we don’t have an ambient space), we instead look at arbitrary derivative-like functions, and associate them with a vector. More formally, we have the following.

Definition 1

A derivation {D} at {p} is a linear map {D : C^\infty(M) \rightarrow \mathbb R} (i.e. assigning a real number to every smooth {f}) satisfying the following Leibniz rule: for any {f}, {g} we have the equality

\displaystyle D(fg) = f(p) \cdot D(g) + g(p) \cdot D(f) \in \mathbb R.

This is just a “product rule”. Then the tangent space is easy to define:

Definition 2

A tangent vector is just a derivation at {p}, and the tangent space {T_p(M)} is simply the set of all these tangent vectors.
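
To make the Leibniz rule concrete, here is a sketch for the easy case where {M = \mathbb R^2} and {D} is the directional derivative at {p} along a vector {v}. It uses dual numbers {a + b\epsilon} with {\epsilon^2 = 0}, a standard trick for computing derivatives of polynomial expressions exactly. The class `Dual` and the helpers `derivation` and `value_at` are names I made up, and the specific {f}, {g}, {p}, {v} are arbitrary choices.

```python
class Dual:
    """A number a + b*eps with eps^2 = 0; b carries a first derivative."""
    def __init__(self, val, der=0):
        self.val, self.der = val, der

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)
    __rmul__ = __mul__

def derivation(p, v):
    """D = directional derivative at p along v, for polynomial f(x, y)."""
    def D(f):
        return f(Dual(p[0], v[0]), Dual(p[1], v[1])).der
    return D

def value_at(p, f):
    return f(Dual(p[0]), Dual(p[1])).val

p, v = (2, 3), (1, -1)
D = derivation(p, v)
f = lambda x, y: x * x + 3 * y
g = lambda x, y: x * y + 1
fg = lambda x, y: f(x, y) * g(x, y)

lhs = D(fg)
rhs = value_at(p, f) * D(g) + value_at(p, g) * D(f)
print(lhs, rhs)  # the two sides of the Leibniz rule agree
```

Note that the multiplication rule for dual numbers is itself a product rule, which is why this particular {D} satisfies the definition.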

In fact, one can show that the product rule for {D} is equivalent to the following three conditions:

  1. {D} is linear, meaning {D(af+bg) = a D(f) + b D(g)}.
  2. {D(1_M) = 0}, where {1_M} is the constant function {1} on {M}.
  3. {D(fg) = 0} whenever {f(p) = g(p) = 0}. Intuitively, this means that if a function {h = fg} vanishes to second order at {p}, then its derivative along {D} should be zero.

This suggests a third equivalent definition: suppose we define

\displaystyle \mathfrak m_p \overset{\mathrm{def}}{=} \left\{ f \in C^\infty(M) \mid f(p) = 0 \right\}

to be the set of functions which vanish at {p} (this is called the maximal ideal at {p}). In that case,

\displaystyle \mathfrak m_p^2 = \left\{ \sum_i f_i \cdot g_i \mid f_i(p) = g_i(p) = 0 \right\}

is the set of functions vanishing to second order at {p}. Thus, a tangent vector is really just a linear map

\displaystyle \mathfrak m_p / \mathfrak m_p^2 \rightarrow \mathbb R.

In other words, the tangent space is actually the dual space of {\mathfrak m_p / \mathfrak m_p^2}; for this reason, the space {\mathfrak m_p / \mathfrak m_p^2} is defined as the cotangent space (the dual of the tangent space). This definition is even more abstract than the one with derivations above, but it has the advantage (or so I’m told) that it can be transferred to other settings (like algebraic varieties).

EDIT (Oct 5 2015): Reproducing this Reddit comment by tactics: The beauty of the definition given in this blog post is stuffed away into an easily-forgotten sentence. This definition:

  • Does not rely on any kind of parametrization, and
  • Is defined only in terms of the ring of “regular functions” defined on the space.

The former is nice for philosophical reasons. The latter is nice because we can pull in a lot of intuition about manifolds into the study of algebraic varieties, and similarly, we can inject a lot of ring theory into the study of manifolds.

With all these equivalent definitions, the last thing I should do is check that this definition of tangent space actually gives a vector space of dimension {n}. To do this it suffices to verify this for open subsets of {\mathbb R^n}, which will imply the result for general manifolds {M} (which are locally open subsets of {\mathbb R^n}). Using some real analysis, one can prove the following result:

Theorem 3

Suppose {M \subset \mathbb R^n} is open and {0 \in M}. Then

\displaystyle \begin{aligned} \mathfrak m_0 &= \{ \text{smooth functions } f : f(0) = 0 \} \\ \mathfrak m_0^2 &= \{ \text{smooth functions } f : f(0) = 0, (\nabla f)_0 = 0 \}. \end{aligned}

In other words {\mathfrak m_0^2} is the set of functions which vanish at {0} and such that all first derivatives of {f} vanish at zero.

Thus, it follows that there is an isomorphism

\displaystyle \mathfrak m_0 / \mathfrak m_0^2 \cong \mathbb R^n \quad\text{by}\quad f \mapsto \left[ \frac{\partial f}{\partial x_1}(0), \dots, \frac{\partial f}{\partial x_n}(0) \right]

and so the cotangent space, hence tangent space, indeed has dimension {n}.
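
As a small sanity check that this map is well-defined on the quotient, the sketch below takes two functions {f, g \in \mathfrak m_0} on {\mathbb R^2} (arbitrary linear examples of my choosing) and confirms numerically that the product {fg \in \mathfrak m_0^2} is sent to zero, while {f} itself is sent to its gradient. The helper `gradient_at_origin` is a hypothetical name and uses central differences, so it is only an approximation in general.

```python
def gradient_at_origin(f, n, h=1e-5):
    """Central-difference approximation of
    (df/dx_1, ..., df/dx_n) evaluated at the origin of R^n."""
    grad = []
    for i in range(n):
        plus = [0.0] * n
        plus[i] = h
        minus = [-c for c in plus]
        grad.append((f(*plus) - f(*minus)) / (2 * h))
    return grad

f = lambda x, y: x + 2 * y            # vanishes at 0, so f lies in m_0
g = lambda x, y: 3 * x - y            # vanishes at 0, so g lies in m_0
fg = lambda x, y: f(x, y) * g(x, y)   # a product, so fg lies in m_0^2

grad_f = gradient_at_origin(f, 2)
grad_fg = gradient_at_origin(fg, 2)
print(grad_f)   # approximately [1, 2]
print(grad_fg)  # approximately [0, 0]
```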