This article is also available as a PDF.

Introduction

The Riemann Hypothesis, first posed in 1859, is one of the most famous unsolved problems in mathematics. It’s usually stated as a conjecture about the Riemann zeta function, which is defined as the series \[\zeta(s) = \sum_{n=1}^\infty n^{-s}\] when \(\mathrm{Re}(s)>1\), and which can be extended to all of \(\mathbb{C}\) by analytic continuation. (We’re following the somewhat strange convention, universal in analytic number theory, of referring to the argument in a function like this as \(s\).) The conjecture is that the only zeroes of \(\zeta\) for \(0\le\mathrm{Re}(s)\le 1\) are at points where \(\mathrm{Re}(s)=\frac12\).

When it’s stated this way, it’s pretty difficult to see why it would be anything more than a curiosity — it might be somewhat surprising that it’s so hard to control the zeroes of a function with such a simple definition, but that would be about it. In fact, the reason Riemann proposed the Riemann Hypothesis comes from a deep connection between the Riemann zeta function and the distribution of prime numbers.

Specifically, the Prime Number Theorem, which was first proved in 1898, states that if we define \(\pi(x)\) to be the number of primes \(\le x\), then \[\pi(x) \sim \frac{x}{\log x}.\] (The notation \(\sim\) here refers to asymptotic equality; we say \(f(x)\sim g(x)\) if \(\lim_{x\to\infty} f(x)/g(x)=1\).) Though it’s far from obvious looking at the statement, the proof of the Prime Number Theorem is intimately connected to the zeroes of the Riemann zeta function, and, if it could be established, the Riemann Hypothesis would give us much greater control over the error in a closely related estimate for \(\pi(x)\) than we are able to prove without it.

The purpose of this article is to explain how these connections come about. The intended reader has been exposed to some complex analysis — our main tool will be the Residue Theorem — but might not know anything about the Riemann Hypothesis or how it’s connected to number theory.

This is, in my opinion, a story whose broad strokes are very beautiful, but which gets quite technical if one wants to prove everything completely. We will therefore be skipping a lot of proofs. If you’re interested in going deeper, you can learn much more from analytic number theory textbooks. In preparing this article, I was greatly helped by:

  • Multiplicative Number Theory I. Classical Theory by Hugh Montgomery and Robert Vaughan,
  • Multiplicative Number Theory by Harold Davenport, and
  • Complex Analysis by Elias Stein and Rami Shakarchi.

I’m grateful to Hunter Brooks, Paul Dancstep, Jake Levinson, and Julian Rosen for helpful comments on earlier drafts of this article, and to Vivian Kuperberg for many enlightening conversations on this topic.

Dirichlet Series

The above expression for the Riemann zeta function has the form of a Dirichlet series, that is, a series of the form \[f(s)=\sum_{n=1}^\infty a_n n^{-s}.\] Our investigation of the relationship between the zeta function and the distribution of primes will benefit from a quick overview of the theory of Dirichlet series in general.

First, we should say a bit about what the point of studying a Dirichlet series might be. You may be familiar with the fact that it can be useful to study a sequence of numbers \((c_n)\) by looking at its generating function, that is, by examining the power series \(\sum_{n=0}^\infty c_nz^n\). By manipulating the generating function using the tools of complex analysis, it’s often possible to extract combinatorial information about the coefficients of the power series of the resulting function.

One way to see why one might prefer to study some sequences using Dirichlet series is to examine what happens when you multiply two ordinary power series and compare it to the corresponding result for Dirichlet series. We have that \[\left(\sum_{n=0}^\infty c_n z^n\right)\left(\sum_{n=0}^\infty d_n z^n\right) = \sum_{n=0}^\infty\left(\sum_{k+m=n} c_kd_m\right) z^n,\] whereas \[\left(\sum_{n=1}^\infty a_n n^{-s}\right)\left(\sum_{n=1}^\infty b_n n^{-s}\right) = \sum_{n=1}^\infty\left(\sum_{km=n} a_kb_m\right) n^{-s}.\] Note that computing the coefficient of \(z^n\) in the power series involves a sum over all \(k,m\) with \(k+m=n\), whereas for the coefficient of \(n^{-s}\) in the Dirichlet series we look at \(k,m\) with \(km=n\).

This means that Dirichlet series are especially well-suited to the case where the coefficients behave in an interesting way with respect to multiplication rather than addition. One nice example of this (which I encourage you to verify) is that the \(n\)’th coefficient of the Dirichlet series for \((\zeta(s))^2\) is the number of divisors of \(n\).

Where power series have a radius of convergence, Dirichlet series have an abscissa of convergence, that is, there is a real number \(\sigma_c\) such that the series converges at \(s\) if \(\mathrm{Re}(s)>\sigma_c\) and diverges if \(\mathrm{Re}(s)<\sigma_c\). (As with power series, it’s much harder to pin down what happens right on the boundary.) We might have \(\sigma_c=-\infty\), meaning that the series converges everywhere, or \(\sigma_c=\infty\), meaning it converges nowhere.

This phenomenon is a manifestation of a general pattern which is useful to keep in mind when dealing with Dirichlet series: much of the time, where you would use a circle or a disk to do something with a power series, you will use a vertical line or a right half plane to do the same thing with a Dirichlet series.

This pattern appears again in the next fact that we’ll discuss. One very useful tool when studying ordinary generating functions is the Cauchy Integral Formula, which lets you extract each coefficient of power series by expressing it as an integral around a circle. Our next object of discussion is an analogous result that lets you use an integral to extract coefficients from a Dirichlet series.

The Cauchy Integral Formula is powered by the observation that the integral of \((1/2\pi i)z^n\) around the unit circle is 1 if \(n=-1\) and 0 otherwise. Therefore, if we divide a power series by \(z^{n+1}\) and perform this integral, we learn about the \(n\)’th coefficient.

We’d like to find an analogous way to use an integral to extract coefficients from a Dirichlet series, and in order to do this we’ll need something to play the role of the fact about the integral of \(z^n\) around the unit circle. The fact in question is that \[\frac{1}{2\pi i}\lim_{T\to\infty}\int_{c-iT}^{c+iT}\frac{y^s}{s}ds = \begin{cases} 1, & y>1 \\ \frac12, & y=1 \\ 0, & 0<y<1 \end{cases}\] for \(c,y>0\). (Note that, in keeping with our pattern, we’re now looking at an integral along a vertical line rather than a circle.)

Proving this is a bit more involved than proving the fact about \(z^n\), so I will offer you just a sketch of the argument and encourage you to fill it in yourself. Suppose first that \(y>1\). We’ll apply the Residue Theorem to the integral of \(y^s/s\) around a large rectangle whose right edge is the vertical line from \(c-iT\) to \(c+iT\). (Note that this encloses the pole at \(s=0\), whose residue is 1.) Because \(y>1\), \(y^s\) gets smaller as \(s\) gets more negative, and with sufficient care about how the width of the rectangle grows in relation to \(T\), you can use this to make the integrals along the top, bottom, and left sides of the rectangle go to zero as \(T\) goes to \(\infty\), leaving only the piece we care about.

When \(0<y<1\), we can make a similar argument, except now \(y^s\) gets small as \(s\) increases, which means we should extend the rectangle to the right, where it doesn’t enclose any poles of \(y^s/s\). When \(y=1\), you can compute the integral directly.

This gives us a nice way to pick out terms from a Dirichlet series. Suppose \(x>0\). If \(f(s)=\sum_{n=1}^\infty a_n n^{-s}\), then we should expect \[\begin{aligned} \frac{1}{2\pi i}\lim_{T\to\infty} \int_{c-iT}^{c+iT} f(s) \frac{x^s}{s} ds &= \frac{1}{2\pi i}\lim_{T\to\infty} \int_{c-iT}^{c+iT} \sum_{n=1}^\infty a_n \frac{(x/n)^s}{s} ds \\ &= \frac{1}{2\pi i} \sum_{n=1}^\infty a_n \lim_{T\to\infty} \int_{c-iT}^{c+iT} \frac{(x/n)^s}{s} ds \\ &= \sum_{n\le x}{\vphantom{\sum}}' a_n,\end{aligned}\] where the number theorists’ notation \(\sum'\) indicates that, if \(x\) is an integer, then we should count the term where \(n=x\) with a coefficient of \(\frac12\). This result is called Perron’s Formula, and it is valid as long as everything converges fast enough to enable the exchange of the integral and summation on the second line. This turns out to work as long as \(c>\max(0,\sigma_c)\).

A Target for Perron’s Formula

The Functions \(\Lambda\) and \(\psi_0\)

With Perron’s Formula in hand, we would like a function we can apply it to from which we might extract information about the distribution of the primes. Ideally, we’d arrange for the sum appearing on the right side of Perron’s Formula to be equal to the prime-counting function \(\pi(x)\). That way, we would have an expression for \(\pi(x)\) in terms of an integral, and we could perhaps then use the tools of complex analysis to estimate that integral.

We will in fact do something quite similar to this, but it turns out to be much easier to target a function which is similar to, but a bit different from, \(\pi(x)\) itself. We’ll therefore take a bit of time to introduce this function and discuss how it relates to \(\pi(x)\) before we come back to the question of how to connect it to Perron’s Formula.

We’ll define \[\Lambda(n) = \begin{cases} \log p & n\text{ is a power of $p$ for some prime $p$} \\ 0 & \text{otherwise.} \end{cases}\] This \(\Lambda(n)\) is called the von Mangoldt function. We’ll then define the Chebyshev psi function as \[\psi_0(x) = \sum_{n\le x}{\vphantom{\sum}}' \Lambda(n)\] for \(x>0\), where \(\sum'\) is as in the previous section. This function \(\psi_0(x)\) will be our substitute for \(\pi(x)\). Our plan will be somewhat indirect: we’ll use Perron’s formula to get quantitative information about \(\psi_0(x)\), from which it will be possible to prove some corresponding facts about \(\pi(x)\).

It’s useful to see how this second step might work, so we’ll go through one example now. Specifically, suppose we knew that \(\psi_0(x)\sim x\); we will show that then \(\pi(x)\sim x/\log x\), which was our statement of the Prime Number Theorem in the introduction. (This argument won’t be used going forward, so you should feel free to skip it if you’d like.)

First, note that (assuming \(x\) is not an integer for simplicity) \[\psi_0(x) = \sum_{p\le x}\left\lfloor \frac{\log x}{\log p} \right\rfloor\log p,\] since there are \(\left\lfloor \log x/\log p \right\rfloor\) powers of each prime \(p\) less than \(x\) and each one contributes \(\log p\) to the sum. But this is \[\le\sum_{p\le x}\frac{\log x}{\log p}\log p = \pi(x)\log x,\] and so, for all \(x\), we have that \(\frac{\psi_0(x)}{x}\le \frac{\pi(x)}{x/\log x}\). This gives us that \(\liminf_{x\to\infty}\frac{\pi(x)}{x/\log x}\ge 1\).

For the other inequality, fix \(0<\alpha<1\). We then have (still assuming \(x\) is not an integer) \[\psi_0(x) \ge \sum_{p\le x}\log p \ge \sum_{x^\alpha<p\le x}\log p \ge (\pi(x) - \pi(x^\alpha))\log (x^\alpha),\] which means that \[\psi_0(x) + \alpha\pi(x^\alpha)\log x \ge \alpha \pi(x)\log x,\] which implies that \[\frac{\psi_0(x)}{x} + \frac{\alpha \pi(x^\alpha) \log x}{x} \ge \alpha \frac{\pi(x)}{x/\log x}.\]

Since \(\pi(x^\alpha)<x^\alpha\), we can bound the second term on the left from above by \(\alpha x^{\alpha-1}\log x\), which goes to 0 as \(x\to\infty\). Since \(\alpha\) can be arbitrarily close to 1, this gives us that \(\limsup_{x\to\infty}\frac{\pi(x)}{x/\log x}\le 1\), which was the rest of what we needed.

Perron’s Formula and \(\psi_0\)

Our next task is to find some way for \(\psi_0(x)\) to pop out of Perron’s Formula. This will happen if we can find some function \(f(s)\) whose Dirichlet series is \[f(s) = \sum_{n=1}^\infty \Lambda(n) n^{-s}.\] With that in mind, let’s suppose we had such a function \(f\) and see if we can turn its Dirichlet series into something recognizable.

First, note that \[f(s) = \sum_{m=1}^\infty \Lambda(n) n^{-s} = \sum_{p\text{ prime}}\sum_{m=1}^\infty(\log p)p^{-ms}.\] The term \((\log p)p^{-ms}\) is the derivative of \(-p^{-ms}/m\), so if we define \[F(s) = \sum_{p\text{ prime}}\sum_{m=1}^\infty\frac{p^{-ms}}{m},\] then \(F'(s)=-f(s)\). Now, recall that the power series for \(\log(1-x)\) is \(-\sum_{m=1}^\infty x^m/m\). We can therefore write \[F(s) = -\sum_{p\text{ prime}}\log(1-p^{-s}) = \log\left(\prod_{p\text{ prime}}\frac{1}{1-p^{-s}}\right).\]

The expression inside the logarithm here is the famous Euler Product formula for the Riemann zeta function: if \(\mathrm{Re}(s)>1\), then \[\zeta(s)=\prod_{p\text{ prime}}\frac{1}{1-p^{-s}}.\] Proving that this infinite product converges in the required range is a bit delicate, but it’s not hard to see informally why it ought to be equal to the sum defining \(\zeta\). Using the usual power series for \(1/(1-z)\), we see that \[\prod_{p\text{ prime}}\frac{1}{1-p^{-s}} = \prod_{p\text{ prime}}\left(1+p^{-s}+(p^2)^{-s}+\cdots\right).\] If we imagine multiplying out this product, each term of the result will contain, for each prime, a factor of the form \((p^m)^{-s}\) for some \(m\). Every positive integer \(n\)’s prime factorization will appear exactly once in this way, and the corresponding term will be equal to \(n^{-s}\), exactly what appears in the series defining \(\zeta\).

Pulling this all together, we conclude that \(F(s)=\log\zeta(s)\), and therefore \(f(s)=-F'(s)=-\zeta'(s)/\zeta(s)\). In other words, \(\Lambda(n)\) is the \(n\)’th Dirichlet coefficient for the function \(-\zeta'(s)/\zeta(s)\). Applying Perron’s formula to this function then tells us that \[\psi_0(x) = \frac{1}{2\pi i}\lim_{T\to\infty} \int_{c-iT}^{c+iT} \left(-\frac{\zeta'(s)}{\zeta(s)}\right)\frac{x^s}{s}ds\] whenever \(c\) is greater than both 0 and the abscissa of convergence of the Dirichlet series for \(\zeta'/\zeta\), which turns out to be 1. This integral will be our window into the distribution of the primes.

Evaluating the Integral

We’ll now take on the task of seeing what we can learn about this integral. From here on, we’re going to have to state a lot of things without proof.

Our main tool will once again be the Residue Theorem. Note first that \(x^s/s\) gets small when \(\mathrm{Re}(s)\) and \(|s|\) are small. This suggests that, if we complete our vertical line contour into a large rectangle extending to the left, then (assuming we can control the growth of \(\zeta'/\zeta\)) the contribution of the other three sides of the rectangle ought to go to zero.

This does in fact work. With this result in hand, we can conclude that \(\psi_0(x)\) should be equal to the sum of the residues of the integrand \(\left(-\frac{\zeta'(s)}{\zeta(s)}\right)\frac{x^s}{s}\) at all of its poles with real part \(\le 1\). So we now turn to an examination of those poles.

We’ll first need some facts about \(\zeta\). Our Dirichlet series defining \(\zeta\) converges only for \(\mathrm{Re}(s)>1\). It is possible, though, to analytically continue \(\zeta\) to a meromorphic function on all of \(\mathbb{C}\), and we follow the usual practice in complex analysis of referring to this analytic continuation as \(\zeta\) even when it is no longer equal to the original series.

The resulting function \(\zeta\) has exactly one pole. It’s at \(s=1\), it’s simple, and its residue is 1. The zeta function has no zeroes or poles for \(\mathrm{Re}(s)>1\); this in fact follows from the Dirichlet series representation.

It is somewhat tedious but essentially elementary to show that \(\zeta\) satisfies the following functional equation: \[\zeta(s) = \pi^{s-\frac12} \frac{\Gamma((1-s)/2)}{\Gamma(s/2)} \zeta(1-s).\] (Here \(\pi\) is the familiar mathematical constant, not the prime-counting function, and \(\Gamma\) is the gamma function.) This gives us a way to control the zeroes of \(\zeta(s)\) for \(\mathrm{Re}(s)<0\). Such a zero can come from...

  • ...zeroes of \(\Gamma((1-s)/2)\). There are none of these; \(\Gamma\) in fact has no zeroes at all.
  • ...zeroes of \(\zeta(1-s)\). If \(\mathrm{Re}(s)<0\), then \(\mathrm{Re}(1-s)>1\), and we just said \(\zeta\) has no zeroes in this region.
  • ...poles of \(\Gamma(s/2)\). Here we do get some: \(\Gamma\) has poles at every negative integer, which gives \(\zeta\) a zero at every negative even integer.

The zeroes at the negative even integers are called the trivial zeroes of \(\zeta\). The region \(0\le\mathrm{Re}(s)\le 1\) is called the critical strip, and what we’ve just shown is that it’s the only part of the complex plane in which nontrivial zeroes of \(\zeta\) can live.

Let’s use this information to examine the residues of our integrand \(\left(-\frac{\zeta'(s)}{\zeta(s)}\right)\frac{x^s}{s}\). Recall that, for any meromorphic function \(f\), the logarithmic derivative \(f'/f\) has a simple pole at any point where \(f\) has a zero or a pole, and the residue is equal to the multiplicity of \(f\) at that point. We therefore see that our integrand has poles at:

  • \(s=0\) from the \(1/s\). The residue is \(-\zeta'(0)/\zeta(0)\), which turns out to be \(-\log(2\pi)\).
  • \(s=1\) from the pole of \(\zeta\) there. The residue is \(x\).
  • \(s=-2n\), for any positive integer \(n\), from the trivial zeroes of \(\zeta\). These contribute a total of \(-\sum_{n=1}^\infty\frac{x^{2n}}{2n} = -\frac12\log(1-x^{-2})\).
  • \(s=\rho\), where \(\rho\) is any nontrivial zero of \(\zeta\). The residue of each of these is \(-x^\rho/\rho\) times the multiplicity of the zero.

Putting all of this information together, we conclude that \[\psi_0(x) = x - \sum_\rho\frac{x^\rho}{\rho} - \log(2\pi) - \frac12\log(1-x^{-2}),\] where \(\rho\) ranges over all nontrivial zeroes of \(\zeta\) and we count them with multiplicity.

Estimates of the Distribution of Primes

This already gives us a very tight relationship between the distribution of primes — as captured by the growth of \(\psi_0(x)\) — and the locations of the nontrivial zeroes of the Riemann zeta function. The game one plays from here is to try to find regions of the critical strip without any zeroes (unsurprisingly called zero-free regions), and to try to control the number of zeroes in a given region of the critical strip as a function of the region’s size.

We won’t pursue this project now, since it gets very technical very quickly. We’ll restrict ourselves in this final section to a quick survey of the sorts of estimates that can be extracted from results of this type.

First, it is possible to get enough control over the zeroes of \(\zeta\) to show that \(\sum_\rho x^\rho/\rho = o(x)\), and this is in turn enough to establish that \(\psi_0(x)\sim x\), which we earlier argued implies the Prime Number Theorem. More precisely, the arguments for accomplishing this that appear in most of the textbooks on this subject end up showing that \[\psi_0(x) = x + O(x\exp[-c(\log x)^{\frac12}])\] for some constant \(c>0\).

What does this mean for the prime-counting function \(\pi(x)\) itself, which after all is a much more interesting object than \(\psi_0(x)\)? To state this, we have to confront a small lie I’ve been telling about the Prime Number Theorem up to now. While it is true that \(\pi(x)\sim x/\log x\), this is actually not a very good estimate for \(\pi(x)\). In this form, the estimate of the error ends up taking the form \[\pi(x) = \frac{x}{\log x} + O\left(\frac{x}{(\log x)^2}\right),\] and one can show that this is in a sense best possible.

One can get much closer by instead estimating \(\pi(x)\) in terms of the function \[\operatorname{li}(x) = \int_2^x \frac{1}{\log t}dt,\] which is also asymptotically equal to \(x/\log x\) but comes much closer to being a good estimate for \(\pi(x)\). It is possible to turn the above estimate for \(\psi_0(x)\) into a very similar-looking estimate for \(\pi(x)\): \[\pi(x) = \operatorname{li}(x) + O(x\exp[-c(\log x)^{\frac12}]).\]

While better than \(x/(\log x)^2\), this is still not a great error term; for example, it’s still asymptotically larger than \(x^{1-\epsilon}\) for every positive \(\epsilon\). This is perhaps not surprising if you take another look at the formula we found for \(\psi_0(x)\) above. Any zero \(\rho\) of \(\zeta\) will make a contribution to that sum whose size is on the order of \(x^{\mathrm{Re}(\rho)}\). This strongly suggests that we won’t be able to improve our estimate by much unless we can get all the zeroes of \(\zeta\) to have small real part.

And in fact, it is possible to make this intuition precise: it can be shown that, if every zero of \(\zeta\) in the critical strip has real part \(\le\sigma\), then \[\psi_0(x) = x + O(x^\sigma(\log x)^2)\] and \[\pi(x) = \operatorname{li}(x) + O(x^\sigma\log x).\] This is much better — for example, the error is smaller than \(x^{\sigma+\epsilon}\) for any \(\epsilon\). The Riemann Hypothesis would of course produce this result with \(\sigma=\frac12\), and because we know that there are plenty of zeroes of \(\zeta\) on this line, it’s not possible to make \(\sigma\) any smaller than this.

This is the main reason Riemann proposed the Riemann Hypothesis — it would give us much better control over the growth of \(\pi(x)\) than we are able to prove otherwise. (One nice way to interpret the claim is that it means that the first half of the digits of the \(n\)’th prime should be expected to line up with those of \(\operatorname{li}^{-1}(n)\).) In the years since 1859, an astoundingly large number of other consequences of the Riemann Hypothesis have been discovered both inside and outside number theory. To give just a couple number-theoretic examples of a similar flavor to what we’ve already discussed:

  • The Riemann Hypothesis implies that, for any prime \(p\), the distance from \(p\) to the next prime is \(O(p^{\frac12}\log p)\).
  • The sigma function \(\sigma(n)\) is defined to be the sum of all the divisors of \(n\). The Riemann Hypothesis implies that \(\sigma(n) < e^\gamma n \log \log n\) for all sufficiently large \(n\). (Here \(\gamma\) is the Euler–Mascheroni constant.)
  • The Euler totient function \(\phi(n)\) is defined to be the number of positive integers \(\le n\) which are relatively prime to \(n\). The Riemann Hypothesis is equivalent to the claim that, if \(N_k\) is the product of the first \(k\) primes, then \(N_k/\phi(N_k) > e^\gamma\log \log N_k\).
  • A somewhat goofier one involves the Redheffer matrix \(A(n)\), which is defined to be the \(n\times n\) matrix whose \((i,j)\) entry is 1 if either \(j=1\) or \(i\) divides \(j\), and 0 otherwise. The Riemann Hypothesis is equivalent to the claim that \(\det(A(n)) = O(n^{\frac12+\epsilon})\) for every \(\epsilon>0\).

The list of interesting consequences is much larger for the so-called Generalized Riemann Hypothesis, which is the conjecture that a class of meromorphic functions called Dirichlet \(L\)-functions (of which \(\zeta\) is one example) also have all their nontrivial zeroes on the line \(\mathrm{Re}(s)=\frac12\). There are two discussions (links here and here) on the website MathOverflow that are a good source for more information about interesting consequences of the Riemann Hypothesis and its generalizations.

My hope is that this article has made the connection between the zeta function and the primes feel less mysterious, but, speaking personally, I don’t think it will ever stop feeling remarkable — it’s quite surprising that it’s possible to use information about the zeroes of a meromorphic function like \(\zeta\) to answer questions about primes and divisibility of integers. This connection is, in a sense, the driving force behind much of modern analytic number theory, and there is much more to learn if you’re interested in going further.