Maximising the number of solutions to a linear equation in a set of integers

Given a linear equation of the form $a_1x_1 + a_2x_2 + a_3x_3 = 0$ with integer coefficients $a_i$, we are interested in maximising the number of solutions to this equation in a set $S \subseteq \mathbb{Z}$, for sets $S$ of a given size. We prove that, for any choice of constants $a_1, a_2$ and $a_3$, the maximum number of solutions is at least $\left(\frac{1}{12} + o(1)\right)|S|^2$. Furthermore, we show that this is optimal, in the following sense. For any $\varepsilon>0,$ there are choices of $a_1, a_2$ and $a_3,$ for which any large set $S$ of integers has at most $\left(\frac{1}{12} + \varepsilon\right)|S|^2$ solutions. For equations in $k \geq 3$ variables, we also show an analogous result. Set $\sigma_k = \int_{-\infty}^{\infty} (\frac{\sin \pi x}{\pi x})^k dx.$ Then, for any choice of constants $a_1, \dots, a_k$, there are sets $S$ with at least $(\frac{\sigma_k}{k^{k-1}} + o(1))|S|^{k-1}$ solutions to $a_1x_1 + \dots + a_kx_k = 0$. Moreover, there are choices of coefficients $a_1, \dots, a_k$ for which any large set $S$ must have no more than $(\frac{\sigma_k}{k^{k-1}} + \varepsilon)|S|^{k-1}$ solutions, for any $\varepsilon>0$.


Introduction
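Let $a_1$, $a_2$ and $a_3$ be fixed coprime integers, none of which is zero. We will consider the linear equation
$$a_1x_1 + a_2x_2 + a_3x_3 = 0. \tag{1.1}$$
In this paper, we are interested in the problem of finding sets with as many solutions to (1.1) as possible. This leads to the following definition.

Definition 1.1. For a finite set $S \subseteq \mathbb{Z}$, let $T(S) = T_{a_1,a_2,a_3}(S)$ denote the number of triples $(x_1, x_2, x_3) \in S^3$ satisfying (1.1).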
The trivial upper bound on $T(S)$ is $T(S) \le |S|^2$. This is because, for any choice of $x_1$ and $x_2$, there is at most one choice of $x_3$ such that $a_1x_1 + a_2x_2 + a_3x_3 = 0$, namely $x_3 = -(a_1x_1 + a_2x_2)/a_3$. We are interested in making $T(S)$ as large as possible, for a fixed size $|S|$.
For some choices of coefficients $a_1$, $a_2$ and $a_3$, the exact maximal value of $T(S)$ is known. For example, consider the case $a_1 = a_2 = a_3 = 1$. Then, work of Hardy and Littlewood [8] and Gabriel [5] shows that, when $|S|$ is odd, $T(S)$ is maximised when $S$ is an interval centred about 0. This was extended to even $|S|$ by Lev in [10]. In fact, their arguments show that if $S \subseteq \mathbb{Z}$ is a set and $S'$ is an interval centred about 0 of the same size, then $T_{a_1,a_2,a_3}(S) \le T_{1,1,1}(S')$. The ideas behind their approaches involve rearrangement inequalities, which are discussed in detail in [9, Chapter 10], and which inspire some of the arguments in this paper.
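A brute-force search makes the rearrangement statement concrete on a tiny example. The Python sketch below is only an illustration (the helper `count_solutions` is our own naming, not from the paper): it computes $T_{a_1,a_2,a_3}(S)$ directly and searches all 5-element subsets of $\{-6, \dots, 6\}$. On this small search space one should find that the centred interval $\{-2, \dots, 2\}$ attains the maximum of $T_{1,1,1}$, and that no subset does better for $x + 2y = 3z$; nothing is claimed beyond the searched range.

```python
from itertools import combinations


def count_solutions(S, a1, a2, a3):
    """Number of triples (x1, x2, x3) in S^3 with a1*x1 + a2*x2 + a3*x3 == 0."""
    members = set(S)
    total = 0
    for x1 in members:
        for x2 in members:
            num = -(a1 * x1 + a2 * x2)
            # x3 is forced by (x1, x2), so each pair contributes at most one solution
            if num % a3 == 0 and num // a3 in members:
                total += 1
    return total


size = 5
centred = range(-(size // 2), size // 2 + 1)  # the interval {-2, ..., 2}
subsets = list(combinations(range(-6, 7), size))
best_111 = max(count_solutions(S, 1, 1, 1) for S in subsets)
best_12m3 = max(count_solutions(S, 1, 2, -3) for S in subsets)  # x + 2y = 3z
print(count_solutions(centred, 1, 1, 1), best_111, best_12m3)
```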
The set of solutions to $x_1 - 2x_2 + x_3 = 0$ is precisely the set of three-term arithmetic progressions; that is, the set of affine copies of $\{0, 1, 2\}$. By analogy with this, Bhattacharya, Ganguly, Shao and Zhao considered longer arithmetic progressions; in [2, Theorem 2.4], they proved that the number of $k$-term arithmetic progressions in a set $S$ of $n$ integers is maximised when $S$ is an interval.
Ganguly (in a personal communication) asked about other affine patterns; in particular, finding sets $S$ with as many affine copies of $\{0, 1, 3\}$, or solutions to $x + 2y = 3z$, as possible. In this case, such a result would necessarily be less clean; for instance, there are more solutions to $x + 2y = 3z$ in $\{0, 1, 3\}$ than in $\{0, 1, 2\}$ (four, compared with three).
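The comparison between $\{0, 1, 3\}$ and $\{0, 1, 2\}$ can be checked directly. This sketch uses a brute-force counter of our own (`count_solutions`), writing $x + 2y = 3z$ as $(a_1, a_2, a_3) = (1, 2, -3)$; trivial solutions $x = y = z$ are included, as in the definition of $T(S)$.

```python
def count_solutions(S, a1, a2, a3):
    """Number of triples (x1, x2, x3) in S^3 with a1*x1 + a2*x2 + a3*x3 == 0."""
    members = set(S)
    total = 0
    for x1 in members:
        for x2 in members:
            num = -(a1 * x1 + a2 * x2)  # must equal a3 * x3
            if num % a3 == 0 and num // a3 in members:
                total += 1
    return total


counts = {tuple(sorted(S)): count_solutions(S, 1, 2, -3) for S in ({0, 1, 2}, {0, 1, 3})}
print(counts)
```

Besides the three trivial solutions, $\{0, 1, 3\}$ contains $(x, y, z) = (3, 0, 1)$, which is exactly the extra affine copy of the pattern.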
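Indeed, in general, much less is known. For a lower bound on the maximal value of $T(S)$, a fairly good bound is given by the following example.

Proposition 1.2. For any choice of $a_1$, $a_2$ and $a_3$, there are arbitrarily large sets $S \subseteq \mathbb{Z}$ with $T(S) \ge \left(\frac{1}{12} + o(1)\right)|S|^2$.

Proof. The idea behind the construction is to split $S$ into three pieces $S_1$, $S_2$ and $S_3$, of roughly equal size, for which there are many solutions to $a_1x_1 + a_2x_2 + a_3x_3 = 0$ with each $x_i$ taken from $S_i$. Let $M$ be a large integer, which we assume to be divisible by 6. We will define
$$S_1 = a_2a_3 \cdot [-M/6, M/6], \qquad S_2 = a_1a_3 \cdot [-M/6, M/6], \qquad S_3 = a_1a_2 \cdot [-M/6, M/6],$$
where $[-M/6, M/6]$ denotes the set of integers in that range, and set $S = S_1 \cup S_2 \cup S_3$, so that $|S| \le M + 3$. We may find a large collection of triples $(x_1, x_2, x_3)$ by choosing $x_1 \in S_1$ and $x_2 \in S_2$ arbitrarily, and selecting those for which $x_3 = -(a_1x_1 + a_2x_2)/a_3$ is in $S_3$. If $x_1 = a_2a_3x_1'$ and $x_2 = a_1a_3x_2'$, then we have $x_3 = -a_1a_2(x_1' + x_2')$. Therefore, a pair $(x_1, x_2)$ will give rise to a solution precisely when $|x_1' + x_2'| \le M/6$.

We may compute the number of such pairs $(x_1', x_2')$ as the sum
$$\sum_{|t| \le M/6} \left(\frac{M}{3} + 1 - |t|\right) = \frac{M^2}{12} + \frac{M}{2} + 1.$$
Thus, the number of triples is at least $\frac{1}{12}(M + 3)^2 \ge \frac{1}{12}|S|^2$.

Given this, it is natural to define the following quantity:
$$\gamma_{a_1,a_2,a_3} = \limsup_{n \to \infty} \max_{|S| = n} \frac{T(S)}{n^2},$$
where $S$ runs over subsets of $\mathbb{Z}$.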
Thus, the assertion that
$$\frac{1}{12} \le \gamma_{a_1,a_2,a_3} \le \frac{3}{4} \tag{1.2}$$
holds for all $a_1$, $a_2$ and $a_3$ follows from Proposition 1.2 and the work of Hardy and Littlewood in [8].
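The construction in the proof of Proposition 1.2 can be tested numerically. The sketch below builds $S = S_1 \cup S_2 \cup S_3$ from the three dilates of $[-M/6, M/6]$ described there and counts $T(S)$ by brute force; the function names and the sample coefficient triples are our own choices. Since the number of admissible pairs is exactly $M^2/12 + M/2 + 1$ and $|S| \le M + 3$, one expects $T(S) \ge |S|^2/12$ in every case.

```python
def count_solutions(S, a1, a2, a3):
    """Number of triples (x1, x2, x3) in S^3 with a1*x1 + a2*x2 + a3*x3 == 0."""
    members = set(S)
    total = 0
    for x1 in members:
        for x2 in members:
            num = -(a1 * x1 + a2 * x2)
            if num % a3 == 0 and num // a3 in members:
                total += 1
    return total


def construction(a1, a2, a3, M):
    """S1 ∪ S2 ∪ S3 with S1 = a2*a3*I, S2 = a1*a3*I, S3 = a1*a2*I, I = [-M/6, M/6] ∩ Z."""
    if M % 6:
        raise ValueError("M must be divisible by 6")
    I = range(-M // 6, M // 6 + 1)
    return {d * x for d in (a2 * a3, a1 * a3, a1 * a2) for x in I}


results = []
for coeffs in [(1, 2, -3), (2, 3, 5), (1, 5, 7)]:
    for M in (12, 36, 60):
        S = construction(*coeffs, M)
        T = count_solutions(S, *coeffs)
        results.append((coeffs, M, len(S), T))
        print(coeffs, M, len(S), T, round(T / len(S) ** 2, 4))
```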
As far as the author is aware, exact values for $\gamma_{a_1,a_2,a_3}$ are only known in cases for which $|a_1a_2a_3| \le 2$ (this includes the cases previously discussed). In particular, we have … Theorem 2]. The same holds in the third non-equivalent case with $|a_1a_2a_3| = 2$, namely $\gamma_{1,2,1} = \frac{1}{2}$. Even the value of $\gamma_{1,2,-3}$ is not known, although the author conjectures that it is $\frac{1}{3}$, which is the value calculated for $S = [-M/2, M/2]$. The main theorem of this paper is a converse, of sorts, to Proposition 1.2. In particular, we will prove the following.
Theorem 1.4. The constant $\frac{1}{12}$ in the statement of Proposition 1.2 is optimal, in the following sense. For any $\varepsilon > 0$, there exists a choice of $a_1$, $a_2$ and $a_3$ for which $\gamma_{a_1,a_2,a_3} \le \frac{1}{12} + \varepsilon$.

In view of this theorem, (1.2) gives the best possible bounds on $\gamma_{a_1,a_2,a_3}$ that are independent of the coefficients $a_i$.
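The value $\frac{1}{3}$ mentioned above for $x + 2y = 3z$ on an interval can be approximated numerically. A minimal sketch, assuming $S = \{-m, \dots, m\}$ (our parametrisation, with $M = 2m$): for each pair $(y, z)$ the value $x = 3z - 2y$ is forced, so $T(S)$ is the number of pairs with $|3z - 2y| \le m$.

```python
def count_on_interval(m):
    """Solutions to x + 2y - 3z = 0 with x, y, z all in [-m, m]."""
    total = 0
    for y in range(-m, m + 1):
        for z in range(-m, m + 1):
            # x = 3z - 2y is determined; keep the pair if x lands in [-m, m]
            if abs(3 * z - 2 * y) <= m:
                total += 1
    return total


for m in (10, 50, 150):
    n = 2 * m + 1
    print(n, count_on_interval(m) / n ** 2)

ratio = count_on_interval(150) / 301 ** 2
```

For each $y$, the admissible $z$ form a window of length $2m/3$, which is why the density settles near $\frac{1}{3}$.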
The plan for this paper is as follows. In Section 2, we will record some additive combinatorial lemmas that we will need in order to establish Theorem 1.4. In Section 3, we will use these lemmas to prove Theorem 1.4.
One might also ask about generalising Theorem 1.4 to other settings. For instance, given a system of $m$ linear equations in $k$ variables (where we assume that $m \le k - 2$), can we prove an analogue of Theorem 1.4?
If $m = 1$, then an analogue of Proposition 1.2 holds for any value of $k \ge 3$. Set
$$\sigma_k = \int_{-\infty}^{\infty} \left(\frac{\sin \pi x}{\pi x}\right)^k dx. \tag{1.5}$$
Then, for any choice of coefficients $a_1, \dots, a_k$, there are sets $S$ with at least $\frac{\sigma_k}{k^{k-1}}|S|^{k-1} + O(|S|^{k-2})$ solutions to $a_1x_1 + \dots + a_kx_k = 0$. We will discuss (1.5) further in Section 4.
Furthermore, the corresponding analogue of Theorem 1.4 holds. For any $\varepsilon > 0$, there are choices of coefficients $a_1, \dots, a_k$ for which any large set $S$ must have no more than $\left(\frac{\sigma_k}{k^{k-1}} + \varepsilon\right)|S|^{k-1}$ solutions. For instance, since $\sigma_4 = \frac{2}{3}$, for any small positive $\varepsilon$ we can find coefficients $a_1$, $a_2$, $a_3$ and $a_4$ with the property that $T(S) \le \left(\frac{1}{96} + \varepsilon\right)|S|^3$ for all large $S$, where $T(S)$ counts the number of solutions to $a_1x_1 + a_2x_2 + a_3x_3 + a_4x_4 = 0$. We will discuss this in Section 4.
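The constants $\sigma_k/k^{k-1}$ can be computed exactly. The sketch below uses the probabilistic description of $\sigma_k$ from Section 4 (the density at 0 of a sum of $k$ independent uniform variables on $[-1/2, 1/2]$), evaluated with the standard Irwin–Hall formula; this route and the helper names are our choices, not the paper's. It recovers $\sigma_3/3^2 = \frac{1}{12}$ and $\sigma_4/4^3 = \frac{1}{96}$. The second function counts $(x_1, \dots, x_{k-1}) \in [-m, m]^{k-1}$ with $|x_1 + \dots + x_{k-1}| \le m$; assuming the $k$-variable construction uses dilated intervals as in Proposition 1.2, this is what the count of solutions reduces to, and its density should approach $\sigma_k$.

```python
from fractions import Fraction
from math import comb, factorial


def sigma(k):
    """Exact sigma_k = f_k(0), where f_k is the density of a sum of k independent
    Uniform[-1/2, 1/2] variables (Irwin-Hall density evaluated at its centre k/2)."""
    half = Fraction(k, 2)
    total = sum((-1) ** j * comb(k, j) * (half - j) ** (k - 1) for j in range(k // 2 + 1))
    return total / factorial(k - 1)


def reduced_count(k, m):
    """Number of (x_1, ..., x_{k-1}) in [-m, m]^(k-1) with |x_1 + ... + x_{k-1}| <= m,
    computed by repeatedly convolving the distribution of partial sums."""
    dist = {0: 1}
    for _ in range(k - 1):
        new = {}
        for s, c in dist.items():
            for x in range(-m, m + 1):
                new[s + x] = new.get(s + x, 0) + c
        dist = new
    return sum(c for s, c in dist.items() if abs(s) <= m)


for k in range(3, 7):
    print(k, sigma(k), sigma(k) / k ** (k - 1))

m = 40
print(reduced_count(4, m) / (2 * m + 1) ** 3, float(sigma(4)))
```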
On the other hand, the opposite is true in the case that $m > 1$. Indeed, it is possible to show that there is no constant $c > 0$ such that, for any system of 2 equations in 4 variables, there are large sets $S$ with at least $c|S|^2$ solutions to the system. We will prove this fact in Section 5.

Notation
As we have already noted, $T(S)$ will be the number of solutions to $a_1x_1 + a_2x_2 + a_3x_3 = 0$ in $S$. We can extend this by defining $T(S_1, S_2, S_3)$ to be the number of solutions to $a_1x_1 + a_2x_2 + a_3x_3 = 0$ where $x_i \in S_i$. We will use the notation $a \cdot S$ to denote the set $\{ax : x \in S\}$.
We will also make frequent use of the Vinogradov notation $f \ll g$ to mean that $f = O(g)$. When the $\ll$ is subscripted, we allow the implicit constant to depend on the subscripts.

This version of the paper replaces a previous version [1]. The argument used to prove Theorem 1.4 is replaced with a new argument which avoids appealing to the arithmetic regularity lemma (and can handle a wider class of equations), and the results of Section 5 are new to this version.

Additive combinatorial lemmas
In this section, we will collect some lemmas that will be necessary for the proof of Theorem 1.4.
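For any set $A \subseteq \mathbb{Z}$, let $\delta[A]$ be its growth under the differencing operator, $\delta[A] = \frac{|A - A|}{|A|}$. If $A$ and $B$ are two sets of integers, let the additive energy between $A$ and $B$, $E(A, B)$, be defined by
$$E(A, B) = \left|\{(a, a', b, b') \in A^2 \times B^2 : a + b = a' + b'\}\right|.$$
It is easy to see that this satisfies the following inequalities:
$$E(A, B) \le |A|^2|B|, \qquad E(A, B) \le |A||B|^2, \qquad E(A, B) \le (|A||B|)^{3/2}, \tag{2.1}$$
the third of which follows immediately from the first two.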
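For concreteness, the sketch below computes $\delta[A]$ and $E(A, B)$ on small sets and checks the three energy inequalities stated above. It assumes the standard definition of additive energy through equal sums $a + b = a' + b'$ (equal differences give the same count); the function names and sample sets are ours.

```python
from collections import Counter


def additive_energy(A, B):
    """E(A, B) = #{(a, a', b, b') in A^2 x B^2 : a + b = a' + b'}."""
    reps = Counter(a + b for a in A for b in B)  # reps[t] = #{(a, b) : a + b = t}
    return sum(c * c for c in reps.values())


def delta(A):
    """delta[A] = |A - A| / |A|."""
    return len({x - y for x in A for y in A}) / len(A)


A = set(range(30))               # an interval: small doubling, delta close to 2
B = {x * x for x in range(15)}   # squares: much less additive structure
E = additive_energy(A, B)
print(delta(A), delta(B), E)
```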
We will require the following lemma, which states that, when two sets $A$ and $B$ have $\delta[A]$ and $\delta[B]$ small, and if $E(A, B)$ is large, then $|A - B|$ is also small.
We will also require a weak form of a structure theorem due to Green and Sisask.
The quantity T (S 1 , S 2 , S 3 ) is related to the additive energy via the following lemma.
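Proof. For any $t \in \mathbb{Z}$, let $\mu(t)$ denote the number of ways of writing $t = a_1x_1 + a_2x_2$, for $x_i \in S_i$. Thus, by definition,
$$T(S_1, S_2, S_3) = \sum_{x_3 \in S_3} \mu(-a_3x_3).$$
Now, we see that
$$T(S_1, S_2, S_3) \le |S_3|^{1/2}\Big(\sum_t \mu(t)^2\Big)^{1/2} = |S_3|^{1/2}\, E(a_1 \cdot S_1, a_2 \cdot S_2)^{1/2},$$
the inequality following from Cauchy–Schwarz. This completes the proof of Lemma 2.3.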
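The representation-function identity $T(S_1, S_2, S_3) = \sum_{x_3 \in S_3} \mu(-a_3x_3)$ and the Cauchy–Schwarz step can be checked on a small example. In this sketch (names and sample sets are ours), `mu` plays the role of $\mu$, and $\sum_t \mu(t)^2$ is the additive energy of $a_1 \cdot S_1$ and $a_2 \cdot S_2$.

```python
from collections import Counter


def T_and_mu(S1, S2, S3, a1, a2, a3):
    """Return T(S1, S2, S3) computed via mu, together with mu itself."""
    mu = Counter(a1 * x1 + a2 * x2 for x1 in S1 for x2 in S2)  # mu[t] = #{a1*x1 + a2*x2 = t}
    T = sum(mu[-a3 * x3] for x3 in set(S3))  # Counter returns 0 for missing keys
    return T, mu


def T_brute(S1, S2, S3, a1, a2, a3):
    """Direct triple loop, for comparison."""
    return sum(1 for x1 in S1 for x2 in S2 for x3 in S3 if a1 * x1 + a2 * x2 + a3 * x3 == 0)


S1, S2, S3 = range(25), range(0, 60, 3), range(-10, 40)
a1, a2, a3 = 1, 2, -3
T, mu = T_and_mu(S1, S2, S3, a1, a2, a3)
energy = sum(c * c for c in mu.values())  # = E(a1 . S1, a2 . S2)
print(T, energy, (len(S3) * energy) ** 0.5)
```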
The following two facts are standard results in additive combinatorics.
Lemma 2.6. Suppose that $S_1, S_2, S_3 \subseteq \mathbb{Z}$ are sets with sizes $s_1$, $s_2$ and $s_3$, respectively. Then, we have the bound
$$T(S_1, S_2, S_3) \le \frac{s_1s_2 + s_1s_3 + s_2s_3}{4} + O(1). \tag{2.2}$$

Proof. We will first prove Lemma 2.6 in the case that $a_1$, $a_2$ and $a_3$ are all 1. Without loss of generality, assume $s_1 \le s_2 \le s_3$. Suppose first that $s_3 \ge s_1 + s_2$. In that case, we have
$$\begin{aligned} T(S_1, S_2, S_3) &\le s_1s_2 \\ &\le \frac{s_1s_2 + (s_1 + s_2)^2}{4} \\ &\le \frac{s_1s_2 + s_3(s_1 + s_2)}{4}. \end{aligned}$$
The first line follows from the trivial observation that, for each pair of $x \in S_1$ and $y \in S_2$, there can be at most one solution to $x + y + z = 0$ with $z \in S_3$. The third line follows from our assumption on $s_3$. Thus, (2.2) follows in this case. Now, suppose $s_3 < s_1 + s_2$. In this case, we may apply [11, Lemma 2], and (2.2) follows via an easy application of the Cauchy–Schwarz inequality. Finally, for arbitrary coefficients $a_1$, $a_2$ and $a_3$, observe that $T_{a_1,a_2,a_3}(S_1, S_2, S_3) = T_{1,1,1}(a_1 \cdot S_1, a_2 \cdot S_2, a_3 \cdot S_3)$, and that the dilates $a_i \cdot S_i$ have the same sizes $s_i$. This completes the proof of Lemma 2.6.
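Finally, we will require the following theorem of Bukh.

Theorem 2.7. Given two coprime integers $\lambda_1$ and $\lambda_2$, we have that, for any finite $S \subseteq \mathbb{Z}$,
$$|\lambda_1 \cdot S + \lambda_2 \cdot S| \ge (|\lambda_1| + |\lambda_2|)|S| - o(|S|).$$

Proof of Theorem 1.4

In this section, we will use the lemmas of Section 2 to prove Theorem 1.4. We must prove that, given a suitable choice of $a_1$, $a_2$ and $a_3$, all sufficiently large sets $S$ have $T(S) \le \left(\frac{1}{12} + \varepsilon\right)|S|^2$.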
Let $\varepsilon > 0$. Given our choice of $\varepsilon$, we must choose the values of the coefficients $a_1$, $a_2$ and $a_3$; we will do so later. Suppose that $S$ is a sufficiently large set. We will immediately apply the structure theorem, Theorem 2.2, to $S$, with $\varepsilon_1 = (\varepsilon/6)^4$. This gives us a decomposition $S = S_1 \cup \dots \cup S_n \cup S_0$. We will start by showing that the contribution to $T(S)$ from solutions to $a_1x_1 + a_2x_2 + a_3x_3 = 0$ with at least one of the $x_i$ taken from $S_0$ is small.
Lemma 3.1. The number of solutions to $a_1x_1 + a_2x_2 + a_3x_3 = 0$ in $S$ with at least one $x_i \in S_0$ is at most $\frac{\varepsilon}{2}|S|^2$.

Proof. The number of such solutions may be upper bounded by $T(S_0, S, S) + T(S, S_0, S) + T(S, S, S_0)$, and so it suffices to show that each term is no greater than $\frac{\varepsilon}{6}|S|^2$. This follows by applying Lemmas 2.3 and 2.5.

At this point, we must bound the number of solutions to $a_1x_1 + a_2x_2 + a_3x_3 = 0$ where each of $x_1$, $x_2$ and $x_3$ is taken from an $S_i$ with $i \ge 1$. To do this, we will start by restricting which triples $(i, j, k)$ can have the property that there are many solutions with $x_1 \in S_i$, $x_2 \in S_j$ and $x_3 \in S_k$. For instance, the fact that $\delta[S_1]$ is small, together with an assumption that $a_1$ and $a_2$ are coprime and $|a_1 + a_2|$ is large, will imply that there cannot be too many solutions with $x_1$, $x_2$ and $x_3$ all in $S_1$.
In particular, this will give us a fairly rigid structure on the collection of triples $S_i, S_j, S_k$ such that $T(S_i, S_j, S_k)$ can give a non-trivial contribution to $T(S, S, S)$. In order to quantify this structure, we will draw a labelled digraph $G$ whose vertices correspond to the $S_i$ with $i \ge 1$. We will draw an edge from $S_i$ to $S_j$ with label $\frac{a_1}{a_2}$ if and only if
$$T(S_i, S_j, S) \ge \frac{\varepsilon}{24K_1^2}|S|^2,$$
where $K_1$ is as in the statement of Theorem 2.2. Similarly, we will draw an edge from $S_i$ to $S_j$ with label $\frac{a_1}{a_3}$ if and only if $T(S_i, S, S_j) \ge \frac{\varepsilon}{24K_1^2}|S|^2$, and similarly for the other four possible labels.
In particular, observe that if there is an edge from $S_i$ to $S_j$ with label $x$, then there will be an edge from $S_j$ to $S_i$ with label $x^{-1}$. Our definition of $G$ does not necessarily preclude the existence of multiple edges between $S_i$ and $S_j$ (with different labels), or edges from $S_i$ to $S_i$. However, as part of the proof, we will show that this cannot happen, provided that we assume a suitable hypothesis on $a_1$, $a_2$ and $a_3$.
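First, we will show that $G$ captures almost all of the solutions to $a_1x_1 + a_2x_2 + a_3x_3 = 0$. Call a triple $(S_i, S_j, S_k)$ good if $G$ contains the edges $S_i \to S_j$, $S_i \to S_k$ and $S_j \to S_k$ with labels $\frac{a_1}{a_2}$, $\frac{a_1}{a_3}$ and $\frac{a_2}{a_3}$ respectively, together with the three reverse edges, and bad otherwise.

Lemma 3.2. The total number of solutions to $a_1x_1 + a_2x_2 + a_3x_3 = 0$ among all of the bad triples is at most $\frac{\varepsilon}{4}|S|^2$.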
Proof. There are six ways a triple $(S_i, S_j, S_k)$ can be bad. One such way is if there is no edge from $S_i$ to $S_j$ with label $\frac{a_1}{a_2}$.
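Let us count the total number of solutions among triples for which the $\frac{a_1}{a_2}$ edge is missing. All such solutions are counted by $\sum T(S_i, S_j, S)$, where the sum runs over pairs $(S_i, S_j)$ with no edge labelled $\frac{a_1}{a_2}$ from $S_i$ to $S_j$. Each term is below the threshold in the definition of $G$, and so the sum is at most $\frac{\varepsilon}{24}|S|^2$, since the number of pairs $S_i, S_j$ is bounded by $K_1^2$. Summing this over the six possible ways for a triple to be bad completes the proof of Lemma 3.2.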
In view of Lemmas 3.1 and 3.2, it remains to show that the number of solutions among the good triples is at most $\left(\frac{1}{12} + \frac{\varepsilon}{4}\right)|S|^2$, for a suitable choice of the coefficients $a_1$, $a_2$ and $a_3$. The values we will choose are $a_1 = 1$, $a_2 = M$ and $a_3 = M + 1$, where $M$ is a sufficiently large integer depending on $\varepsilon$. We can now prove the following lemma.

Lemma 3.3. With the values of $a_1$, $a_2$ and $a_3$ that we have chosen, the product of the labels along any cycle in $G$ must be 1.
Remark. This immediately tells us that G has no loops (edges from a vertex to itself). In view of the fact that an edge from S i to S j with label x is accompanied by an edge from S j to S i with label x −1 , this also tells us that there can be at most one edge from S i to S j .
Remark. We have chosen particular values of the $a_i$ for simplicity; indeed, we only need a single choice of coefficients to work in order to establish Theorem 1.4. However, the same argument is able to establish Lemma 3.3, and thus also Theorem 1.4, for a much wider class of equations. For example, whenever $a_1$, $a_2$ and $a_3$ are coprime and at least two of the three coefficients are large enough, the analogue of Lemma 3.3 holds, and thus $\gamma_{a_1,a_2,a_3} < \frac{1}{12} + \varepsilon$.
Conversely, it does not suffice for just one of the $a_i$ to be large. For example, if $a_1 = a_2 = 1$, then it can be shown that, for $S$ a slightly modified version of the set in Proposition 1.2, $T_{1,1,a_3}(S) > \frac{1}{5}|S|^2$ for any $a_3$.
Proof of Lemma 3.3. Suppose there is a cycle whose label product is not 1; consider a shortest such cycle. By minimality, such a cycle may have no repeated vertices, and thus must have at most $K_1$ vertices. Thus, without loss of generality, the cycle is $S_1, S_2, \dots, S_k, S_1$, where $S_i \to S_{i+1}$ has label $t_i$ (with $S_{k+1} = S_1$), and $k \le K_1$.
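By Lemma 2.3, we deduce a lower bound on the additive energy between the appropriate dilates of $S_i$ and $S_{i+1}$, for each $i$. Now, let us apply Lemma 2.1 to $S_i$ and $S_{i+1}$, which shows that the corresponding difference set is small. By inductively applying Lemma 2.4, we can then prove a bound of the same form relating $S_1$ and $S_{i+1}$. Thus, setting $i = k$, we obtain the bound (3.1), whose implied constant is controlled since $k \le K_1$. By hypothesis, $t_1t_2\cdots t_k \neq 1$. However, we know that $t_1t_2\cdots t_k$ can be written in the form $M^{e_1}(M + 1)^{e_2}$ for some integers $e_i$, not both zero. Suppose that $e_1$ is non-zero; the argument is similar if $e_2$ is non-zero.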
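Write $t_1t_2\cdots t_k = \frac{r}{s}$ for coprime integers $r$ and $s$; since $e_1 \neq 0$ and $M$ is coprime to $M + 1$, $M$ must divide $r$ or $s$, and so $|r| + |s| > M$. On the other hand, as a consequence of (3.1), we have shown that $|r \cdot S_1 - s \cdot S_1| \ll_\varepsilon |S_1|$, with an implied constant that does not depend on $M$. But, if $S_1$ is sufficiently large, this contradicts Theorem 2.7, which states that
$$|r \cdot S_1 - s \cdot S_1| \ge (|r| + |s|)|S_1| - o(|S_1|)$$
whenever $|S_1|$ is sufficiently large, provided that $M$ is large enough in terms of $\varepsilon$. This contradiction completes the proof of Lemma 3.3.

To complete the proof of Theorem 1.4, we just need to bound the number of solutions to $a_1x_1 + a_2x_2 + a_3x_3 = 0$ with $x_1$, $x_2$, $x_3$ taken from a good triple. The following lemma will achieve this.

Lemma 3.4. Let $a_1 = 1$, $a_2 = M$ and $a_3 = M + 1$ be as above. Then, the number of solutions to $a_1x_1 + a_2x_2 + a_3x_3 = 0$ taken from good triples is bounded above by $\left(\frac{1}{12} + \frac{\varepsilon}{4}\right)|S|^2$, whenever $|S|$ is large enough.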
Proof. We will start by defining a function $d$ on the indices $i \ge 1$, with the property that if $S_i \to S_j$ has label $t$, then $d(j) = t\,d(i)$.
One way we can do this is as follows. For each connected component $G'$ of $G$, choose the smallest value of $i$ such that $S_i$ is in $G'$, and set $d(i) = 1$. Then, for any other $j$ with $S_j$ in $G'$, $d(j)$ is determined by the product of the labels on any path from $S_i$ to $S_j$. Lemma 3.3 guarantees that this value does not depend on the path chosen. Now, for each $d$, let $R_d = \bigcup_{i : d(i) = d} S_i$. Suppose that $S_i, S_j, S_k$ is a good triple, in that order (so, for example, the label on $S_i \to S_j$ is $a_1/a_2$). Then, setting $d = a_1d(i)$, we have that
$$d(i) = \frac{d}{a_1}, \qquad d(j) = \frac{d}{a_2}, \qquad d(k) = \frac{d}{a_3}.$$
Therefore, all of the solutions coming from the good triple $S_i, S_j, S_k$ will be counted in $T(R_{d/a_1}, R_{d/a_2}, R_{d/a_3})$, and so an upper bound for the total number of solutions coming from good triples is
$$\sum_d T(R_{d/a_1}, R_{d/a_2}, R_{d/a_3}),$$
where the sum is taken over all $d$ such that all three of the sets $R_{d/a_i}$ exist (in particular, there can be no more than $n$ terms in the sum).
We may apply Lemma 2.6 to give an upper bound for this.
where the sum on the second line is over unordered pairs $d_1, d_2$ such that $d_1/d_2$ is equal to the ratio between two of the $a_i$; we write $d_1 \sim d_2$ in this case. The second inequality follows because if $d_1 \sim d_2$, then there is exactly one ratio $a_i/a_j$ such that $d_1/d_2 = a_i/a_j$. Thus, the term $|R_{d_1}||R_{d_2}|$ appears in at most one of the sums on the right-hand side of the first line. Finally, for $i = 0$, 1 and 2, define the quantity $X_i$ by … By our construction of $d$, each $|R_d|$ appears as a term in exactly one of the $X_i$. Furthermore, $d_1 \sim d_2$ only if $R_{d_1}$ and $R_{d_2}$ are in different sums $X_i$, and any term $|R_{d_1}||R_{d_2}|$ appears at most once in (3.6). Consequently, we have the upper bound
$$\frac{1}{4}\left(X_0X_1 + X_1X_2 + X_0X_2\right) + O(n) \le \frac{1}{12}|S|^2 + O(n),$$
the latter inequality following from an easy application of Cauchy–Schwarz, since $X_0 + X_1 + X_2 \le |S|$. Since $n \le K_1$, the $O(n)$ term is at most $\frac{\varepsilon}{4}|S|^2$ once $|S|$ is large. This completes the proof of Lemma 3.4.
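The function $d$ in the proof of Lemma 3.4 is a potential assignment on a labelled graph, which can be computed by breadth-first search. The sketch below is hypothetical (the function names, the choice $M = 10$ and the edge lists are ours): it roots each component at its smallest index with $d = 1$, propagates $d(j) = t\,d(i)$ along edges (adding each reverse edge with label $t^{-1}$), and reports a conflict exactly when some cycle has label product different from 1, which is the situation Lemma 3.3 rules out.

```python
from collections import deque
from fractions import Fraction


def assign_potentials(n, edges):
    """Assign d(i) for vertices 1..n so that every edge i -> j with label t has d(j) = t * d(i).

    `edges` is a list of (i, j, t); the reverse edge j -> i with label 1/t is added automatically.
    Raises ValueError if some cycle has label product different from 1."""
    adj = {i: [] for i in range(1, n + 1)}
    for i, j, t in edges:
        adj[i].append((j, t))
        adj[j].append((i, 1 / t))
    d = {}
    for root in range(1, n + 1):  # smallest index first, as in the text
        if root in d:
            continue
        d[root] = Fraction(1)
        queue = deque([root])
        while queue:
            i = queue.popleft()
            for j, t in adj[i]:
                if j not in d:
                    d[j] = t * d[i]
                    queue.append(j)
                elif d[j] != t * d[i]:
                    raise ValueError(f"cycle with label product != 1 through S_{i} -> S_{j}")
    return d


def is_consistent(n, edges):
    try:
        assign_potentials(n, edges)
        return True
    except ValueError:
        return False


M = 10
a1, a2, a3 = 1, M, M + 1
edges = [(1, 2, Fraction(a1, a2)), (2, 3, Fraction(a2, a3)), (1, 3, Fraction(a1, a3))]
d = assign_potentials(4, edges)  # vertex 4 is an isolated component
groups = {}
for i, di in d.items():
    groups.setdefault(di, []).append(i)  # the index sets behind each R_d
print(d, groups)
```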
We have now essentially proven Theorem 1.4. Indeed, any solution to a 1 x 1 + a 2 x 2 + a 3 x 3 = 0 must either have some x i in S 0 , or must come from a bad triple, or must come from a good triple. Combining Lemmas 3.1, 3.2 and 3.4 gives the result if |S| is large enough.

Equations in more than 3 variables
A fairly natural extension of Theorem 1.4 is to ask if a similar result holds for $k$-variable equations
$$a_1x_1 + \dots + a_kx_k = 0. \tag{4.1}$$
As before, let $T(S)$ be the number of solutions to (4.1) in $S$. Similarly, let $T(S_1, \dots, S_k)$ denote the number of solutions with $x_i$ taken from $S_i$. We have a trivial upper bound for $T(S)$, namely $T(S) \le |S|^{k-1}$. Before presenting our analogous example to Proposition 1.2, we require some notation and definitions. Let $I_x : \mathbb{R} \to \mathbb{R}$ denote the indicator function of a (real) interval of length $x$ centred at the origin, so $I_x(y) = 1$ if and only if $|y| \le \frac{x}{2}$, and $I_x(y) = 0$ otherwise.

Remark. In the introduction, we gave the following formula for $\sigma_k$:
$$\sigma_k = \int_{-\infty}^{\infty} \left(\frac{\sin \pi x}{\pi x}\right)^k dx.$$
The equivalence of these forms follows from taking a Fourier transform and applying the convolution identity; the details can be seen in [3]. See also [12], where it is shown that $\sigma_{2h}$ is the leading coefficient of the polynomial $\Psi_h(n)$.
Remark. $\sigma_k$ obeys a simple asymptotic (see, for example, [13], or [6] for more terms):
$$\sigma_k \sim \sqrt{\frac{6}{\pi k}}$$
as $k \to \infty$. We may interpret $\sigma_k$ probabilistically. If $f_k$ is the probability density function of a sum of $k$ independent random variables distributed uniformly on $[-1/2, 1/2]$, then $\sigma_k = f_k(0)$. Thus, the form of the asymptotic for $\sigma_k$ is not surprising, in view of the Central Limit Theorem: such a sum has variance $k/12$, and a normal density with this variance takes the value $\sqrt{6/(\pi k)}$ at 0. In particular, $\sigma_k = \Phi_k(1, \dots, 1)$.
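The asymptotic $\sigma_k \sim \sqrt{6/(\pi k)}$ can be compared with exact values. Rather than numerically integrating (1.5), this sketch computes $\sigma_k = f_k(0)$ exactly from the Irwin–Hall formula for the centred uniform sum (our choice of method and names) and prints the ratio to the Gaussian prediction, which should approach 1 as $k$ grows.

```python
from fractions import Fraction
from math import comb, factorial, pi, sqrt


def sigma(k):
    """Exact sigma_k: density at 0 of a sum of k independent Uniform[-1/2, 1/2] variables."""
    half = Fraction(k, 2)
    total = sum((-1) ** j * comb(k, j) * (half - j) ** (k - 1) for j in range(k // 2 + 1))
    return total / factorial(k - 1)


ratios = {k: float(sigma(k)) / sqrt(6 / (pi * k)) for k in (3, 5, 10, 20, 40)}
for k, r in ratios.items():
    print(k, float(sigma(k)), round(r, 4))
```

Exact rational arithmetic is used because the alternating Irwin–Hall sum suffers from severe cancellation in floating point for larger $k$.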
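Remark. There is an explicit formula for $\Phi$. In general, we have … (4.4), where $\omega(\varepsilon) = \sum_i \varepsilon_i$ and $\varepsilon \cdot t = \sum_i \varepsilon_i t_i$, and sgn denotes the sign function. This is established in [3]. For $k = 3$, we can write (for $t_1 \ge t_2 \ge t_3$)
$$\Phi(t_1, t_2, t_3) = \begin{cases} t_2t_3 & \text{if } t_1 \ge t_2 + t_3, \\ \frac{1}{4}\left(2(t_1t_2 + t_1t_3 + t_2t_3) - t_1^2 - t_2^2 - t_3^2\right) & \text{otherwise.} \end{cases} \tag{4.5}$$
In analogy with Proposition 1.2, we have the following.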
The proof of Proposition 4.3 will rely on the following fact, which states that, when the coefficients $a_i$ are all 1, long progressions behave somewhat like real intervals.

Proposition 4.4. … Then, the number of solutions to $x_1 + \dots + x_k = 0$ …

Proof. We may assume without loss of generality that the progressions $S_i$ have common difference 1. To prove Proposition 4.4, it suffices to use the following observation.
Up to an error which is at most $O_k((s_1 + \dots + s_k)^{k-2})$, this can be written as an integral … The two implications above allow us to show that, up to acceptable error, this is equal to …, which is equal to $\Phi(s_1, \dots, s_k)$; we omit the details.
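For $k = 3$, Proposition 4.4 can be tested against the two-case formula for $\Phi$ (the one labelled (4.5)). The sketch assumes $\Phi(t_1, t_2, t_3) = (I_{t_1} * I_{t_2} * I_{t_3})(0)$, which is the area of $\{(y, z) : |y| \le t_2/2,\ |z| \le t_3/2,\ |y + z| \le t_1/2\}$, and compares it with a brute-force count of solutions to $x_1 + x_2 + x_3 = 0$ in centred intervals of odd lengths $s_i$; the helper names are ours. On these examples the differences are tiny, well within the $O((s_1 + s_2 + s_3)^{k-2})$ error allowed.

```python
def phi3(t1, t2, t3):
    """(I_{t1} * I_{t2} * I_{t3})(0) via the two-case formula, after sorting t1 >= t2 >= t3."""
    t1, t2, t3 = sorted((t1, t2, t3), reverse=True)
    if t1 >= t2 + t3:
        return t2 * t3
    return (2 * (t1 * t2 + t1 * t3 + t2 * t3) - t1 ** 2 - t2 ** 2 - t3 ** 2) / 4


def centred(s):
    """The s integers {-(s-1)/2, ..., (s-1)/2}; s is assumed odd."""
    h = (s - 1) // 2
    return range(-h, h + 1)


def count3(s1, s2, s3):
    """Solutions to x1 + x2 + x3 = 0 with x_i in centred(s_i)."""
    S3 = set(centred(s3))
    return sum(1 for x in centred(s1) for y in centred(s2) if -(x + y) in S3)


cases = [(21, 21, 21), (41, 31, 11), (101, 61, 21), (301, 201, 151)]
for s in cases:
    print(s, count3(*s), phi3(*s))
```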
We are now ready to prove Proposition 4.3.
Proof of Proposition 4.3. As in Proposition 1.2, we will consider S as the union of k sets S 1 , . . . , S k , with the property that T (S 1 , . . . , S k ) is large.
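The way we will do this is as follows. Let $M$ be a large integer, which we assume to be divisible by $2k$. Define
$$S_i = \Big(\prod_{j \ne i} a_j\Big) \cdot [-M/(2k), M/(2k)] \quad (1 \le i \le k), \qquad S = S_1 \cup \dots \cup S_k.$$

Perhaps unsurprisingly, Theorem 1.4 also generalises to this setting.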
Theorem 4.5. Let $\varepsilon > 0$. Then, there exist coefficients $a_1, \dots, a_k$ with the property that, for any suitably large set $S$,
$$T(S) \le \left(\frac{\sigma_k}{k^{k-1}} + \varepsilon\right)|S|^{k-1}.$$

The proof of Theorem 4.5 is broadly similar to the proof of Theorem 1.4. There are two main places in which the argument slightly differs. Firstly, we must generalise Lemma 2.3 to give a bound for $T(S_1, \dots, S_k)$ in terms of $E(S_1, S_2)$.

Lemma 4.6. Suppose that $S_1, \dots, S_k \subseteq \mathbb{Z}$ are finite sets. Then, …

Proof. For any $t \in \mathbb{Z}$, let $\mu(t)$ denote the number of ways of writing $t = a_1x_1 + a_2x_2$, for $x_i \in S_i$. Define $\nu(t)$ to be the number of ways of writing $t = -a_3x_3 - \dots - a_kx_k$, for $x_i \in S_i$. Thus, by definition and by Cauchy–Schwarz, we see that
$$T(S_1, \dots, S_k) = \sum_t \mu(t)\nu(t) \le \Big(\sum_t \mu(t)^2\Big)^{1/2}\Big(\sum_t \nu(t)^2\Big)^{1/2}.$$
Finally, we observe that $\sum_t \nu(t)^2$ represents the number of solutions to the equation
$$a_3x_3 + \dots + a_kx_k = a_3y_3 + \dots + a_ky_k, \qquad x_i, y_i \in S_i,$$
and so we can bound it by $(|S_3| \cdots |S_k|)^{2 - \frac{1}{k-2}}$, by the same argument used in (2.1) to bound the energy.
Secondly, we will have to apply a $k$-variable analogue of Lemma 2.6. The analogue of this is the following.

Lemma 4.7. … (4.7)

Remark. This lemma is actually weaker than Lemma 2.6, where the error term was $O(1)$. The weaker error term here comes from our reduction to the real case using Proposition 4.4; an inductive proof would likely give an $O\big((\sum_i s_i)^{k-3}\big)$ error term. However, the $O\big((\sum_i s_i)^{k-2}\big)$ error term is sufficient for our purpose.
Remark. If $k$ is even, we can actually deduce a stronger version of (4.7) by using Hölder's inequality. We have … where the second line used Hölder's inequality along with the fact that $k$ is even. This is stronger than (4.7) via an application of the AM–GM inequality. It is unclear whether the stronger version holds in the case that $k$ is odd, although it is not too hard to establish for $k = 3$ by using (4.5). However, this stronger form is not necessary, so we only prove the version we need.
Proof of Lemma 4.7. First, observe that the statement of the lemma is unchanged if we assume without loss of generality that each $a_i$ is 1, since we may replace $S_i$ with $a_i \cdot S_i$. The first step in the proof is to apply [10, Theorem 1], which says that, in order to maximise $T(S_1, \dots, S_k)$, we may take each $S_i$ to be an interval of length $s_i$, roughly centred at the origin (depending on the parity of $s_i$). We may immediately apply Proposition 4.4, which says that … Thus, it suffices to prove that … This will follow if we can prove that, for positive real numbers $t_1, \dots, t_k$, … (4.8)

To prove (4.8), first observe that equality holds in the case that all of the $t_i$ are equal. Indeed, when $t_i = 1$ the relation follows from the definition of $\sigma_k$, and for other constant values of $t_i$ the equality follows by homogeneity. Set
$$\Theta(t_1, \dots, t_k) = t_1 \cdots t_k\,\big(I_{t_1^{-1}} * \dots * I_{t_k^{-1}}\big)(0).$$
To prove that $\Theta(t_1, \dots, t_k)$ achieves its maximum value (with $t_1 + \dots + t_k$ fixed) when all of the $t_i$ are equal, observe that it will suffice to prove the following claim.

Claim 1. If $t_1 + t_2$ is fixed (as well as each of $t_3, \dots, t_k$), then $\Theta(t_1, \dots, t_k)$ achieves its maximum when $t_1 = t_2$.
To see that this claim is sufficient, observe that we may repeatedly replace the largest and smallest of the $t_i$ with their average. In doing so, $\max_i t_i - \min_i t_i$ will tend to 0, and we can use the continuity of $\Theta$ to obtain the result.
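The smoothing step just described is easy to simulate for $k = 3$. This sketch assumes the normalisation $\Theta(t_1, t_2, t_3) = t_1t_2t_3\,(I_{t_1^{-1}} * I_{t_2^{-1}} * I_{t_3^{-1}})(0)$, which is consistent with the identity $t\,\Theta(t_1, t_2, t^{-1}) = t_1t_2\,(I_{t_1^{-1}} * I_{t_2^{-1}} * I_t)(0)$ used in the proof of Claim 2, and it evaluates the convolution with the $k = 3$ formula for $\Phi$; the function names and starting points are ours. If Claim 1 holds, the recorded values of $\Theta$ never decrease, and they converge to $\Theta$ at the mean, namely $\frac{3}{4}$ times the mean of the $t_i$.

```python
def phi3(t1, t2, t3):
    """(I_{t1} * I_{t2} * I_{t3})(0) for k = 3, after sorting t1 >= t2 >= t3."""
    t1, t2, t3 = sorted((t1, t2, t3), reverse=True)
    if t1 >= t2 + t3:
        return t2 * t3
    return (2 * (t1 * t2 + t1 * t3 + t2 * t3) - t1 ** 2 - t2 ** 2 - t3 ** 2) / 4


def theta3(t1, t2, t3):
    """Theta(t) = t1 * t2 * t3 * (I_{1/t1} * I_{1/t2} * I_{1/t3})(0) (assumed normalisation)."""
    return t1 * t2 * t3 * phi3(1 / t1, 1 / t2, 1 / t3)


def smooth(t, steps=60):
    """Repeatedly replace the largest and smallest entries by their average."""
    t = list(t)
    history = [theta3(*t)]
    for _ in range(steps):
        i_max = max(range(3), key=lambda i: t[i])
        i_min = min(range(3), key=lambda i: t[i])
        avg = (t[i_max] + t[i_min]) / 2
        t[i_max] = t[i_min] = avg
        history.append(theta3(*t))
    return t, history


starts = [(1, 2, 10), (0.3, 5, 1.1), (1, 1, 1.9), (7, 0.5, 0.5)]
results = {start: smooth(start) for start in starts}
for start, (t, history) in results.items():
    print(start, round(history[0], 4), round(history[-1], 4), 0.75 * sum(start) / 3)
```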
To prove Claim 1, recall the expression for $\Theta(t_1, \dots, t_k)$:
$$\Theta(t_1, \dots, t_k) = t_1 \cdots t_k\,\big(I_{t_1^{-1}} * I_{t_2^{-1}} * g\big)(0), \qquad \text{where } g = I_{t_3^{-1}} * \dots * I_{t_k^{-1}}.$$
Now, observe that $g$ may be written as a combination of intervals in the following sense: … for some function $h : \mathbb{R}_{>0} \to \mathbb{R}_{>0}$ with bounded support. (The exception is when $k = 3$, in which case $g$ is just a single interval. But that will not affect the remainder of the proof of Claim 1.) To see why this is the case, we may use induction. If $k = 4$, then suppose without loss of generality that $t_3 \ge t_4$. Then, we take $h(r) = t_3t_4$ if …, and 0 otherwise. For $k > 4$, it is easiest to apply the induction hypothesis to $I_{t_3^{-1}} * \dots * I_{t_{k-1}^{-1}}$, and then use a similar decomposition to the one we used for the $k = 4$ case. We omit the details.
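In view of this decomposition, proving Claim 1 may be reduced to the following claim.

Claim 2. Let $t_1, t_2, t > 0$, and set $u = \frac{t_1 + t_2}{2}$. Then, we have that
$$t_1t_2\,\big(I_{t_1^{-1}} * I_{t_2^{-1}} * I_t\big)(0) \le u^2\,\big(I_{u^{-1}} * I_{u^{-1}} * I_t\big)(0).$$

In fact, the easiest way to prove Claim 2 is via the following explicit formula for $(I_a * I_b * I_c)(0)$:
$$(I_a * I_b * I_c)(0) = \begin{cases} ab & \text{if } c \ge a + b, \\ \frac{1}{4}\left(2(ab + bc + ca) - a^2 - b^2 - c^2\right) & \text{otherwise,} \end{cases} \tag{4.9}$$
assuming that $c \ge a, b$ without loss of generality. Given (4.9), we can prove that $\Theta(a, b, c)$ is a concave function. If, for instance, $c^{-1} > a^{-1} + b^{-1}$, then $\Theta(a, b, c) = c$, which is clearly concave. When $a^{-1}$, $b^{-1}$ and $c^{-1}$ satisfy the triangle inequality, then
$$\Theta(a, b, c) = \frac{1}{4}\left(2(a + b + c) - \frac{bc}{a} - \frac{ca}{b} - \frac{ab}{c}\right).$$
We may prove that this is concave by computing the Hessian matrix and showing that it is negative semi-definite everywhere; for instance, by using Sylvester's criterion. We omit the details. In particular, $t\,\Theta(t_1, t_2, t^{-1}) = t_1t_2\,(I_{t_1^{-1}} * I_{t_2^{-1}} * I_t)(0)$ is concave as a function of $t_1$ and $t_2$, and it is symmetric in $t_1$ and $t_2$. Therefore,
$$t\,\Theta(t_1, t_2, t^{-1}) \le t\,\Theta(u, u, t^{-1}),$$
which is exactly the statement of Claim 2. This completes the proof of Claim 1, and thus Lemma 4.7.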
Armed with our more general Lemmas 4.6 and 4.7, we may use an argument similar to the proof of Theorem 1.4 in Section 3 in order to prove Theorem 4.5.
To see why, observe that if X i + X j is kept fixed, moving X i and X j closer together increases the value of the left-hand side without changing the right-hand side. Thus, the left-hand side is maximised when the X i are all the same, at which point equality occurs.
Putting all of this together, we learn that …, which gives the bound in the statement of Theorem 4.5 when $|S|$ is large enough.

Systems of more than one equation
Another way in which one might wish to extend Theorem 1.4 is to ask if a similar result holds for systems of m equations in k variables. One might imagine that a result of the following form ought to hold.
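Question. Suppose $k \ge m + 2$ and $m \ge 1$. Does there exist an explicit positive constant $\sigma_{m,k}$ with the following properties?
• For any non-degenerate system $A$ of $m$ equations in $k$ variables, there are arbitrarily large sets $S \subseteq \mathbb{Z}$ with at least $(\sigma_{m,k} - o(1))|S|^{k-m}$ $k$-tuples satisfying $A$.
• For any $\varepsilon > 0$, there are systems $A$ such that the number of $k$-tuples satisfying $A$ in any large $S \subseteq \mathbb{Z}$ is no more than $(\sigma_{m,k} + \varepsilon)|S|^{k-m}$.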
Thus, Theorems 1.4 and 4.5 tell us that $\sigma_{m,k}$ exists whenever $m = 1$, and that $\sigma_{1,k} = \frac{\sigma_k}{k^{k-1}}$. However, it turns out that when $m > 1$, not even the first of these properties holds, in the following sense.
Theorem 5.1. Let $\varepsilon > 0$. Then, there exists a non-degenerate system of two equations in four variables with the property that, for any large enough $S$, there are no more than $\varepsilon|S|^2$ solutions to the system in $S$.
Remark. It is easy to see that Theorem 5.1 implies the analogous result for any choice of $k, m$ with $k \ge m + 2$ and $m > 1$.
The goal of this section is to prove Theorem 5.1.
Proof. We will prove Theorem 5.1 for the following system:
$$x + y - z = 0, \qquad x + My - w = 0, \tag{5.1}$$
where $M$ is a sufficiently large constant (in terms of $\varepsilon$) to be chosen later. We will start by borrowing the following lemma, which appears as part of the proof of the Balog–Szemerédi–Gowers theorem.
Lemma 5.2 [14, Corollary 6.20]. Let $G$ be a bipartite graph with vertex sets $A$ and $B$ and edge set $E \subseteq A \times B$. Suppose $|E| \ge \varepsilon|A||B|$, for some $\varepsilon > 0$. Then, we can find subsets $A' \subseteq A$ and $B' \subseteq B$, with $|A'| \gg_\varepsilon |A|$ and $|B'| \gg_\varepsilon |B|$, such that, whenever $a \in A'$ and $b \in B'$, there are $\gg_\varepsilon |A||B|$ paths of length 3 from $a$ to $b$ in $G$.
Let $S$ be a sufficiently large set (in terms of $M$ and $\varepsilon$), and suppose that there are more than $\varepsilon|S|^2$ solutions to (5.1) in $S$. Consider the bipartite graph $G$ on vertex set $A \sqcup B$, where $A = B = S$; that is, both parts of $G$ are copies of $S$. Draw an edge from $a$ to $b$ if and only if there is a solution to (5.1) with $x = a$ and $y = b$; in other words, if $a + b$ and $a + Mb$ are both in $S$. In particular, $G$ has at least $\varepsilon|S|^2$ edges.
We may immediately apply Lemma 5.2 to $G$. This gives us sets $A' \subseteq A$ and $B' \subseteq B$ such that, for any $a \in A'$ and $b \in B'$, there are $\gg_\varepsilon |S|^2$ paths of length 3 in $G$ from $a$ to $b$.
Claim. These sets $A'$ and $B'$ satisfy $|A' + B'| \ll_\varepsilon |S|$ and $|A' + M \cdot B'| \ll_\varepsilon |S|$.
Proof of Claim. To prove this claim, we can use an argument similar to that used in the proof of the Balog–Szemerédi–Gowers theorem. The proofs that $|A' + B'| \ll_\varepsilon |S|$ and $|A' + M \cdot B'| \ll_\varepsilon |S|$ are similar, so we will only give the former.
Let $X$ denote the set of triples $(x, y, z)$ of elements of $(A + B) \cap S$ for which $x - y + z \in A' + B'$. We may trivially upper bound $|X|$; indeed, $|(A + B) \cap S| \le |S|$, so $|X| \le |S|^3$.
For a lower bound on $|X|$, consider an element $a + b$ of $A' + B'$, where $a \in A'$ and $b \in B'$. By our choice of $A'$ and $B'$, there are $\gg_\varepsilon |S|^2$ paths of length 3 from $a$ to $b$ in $G$. Each such path may be written $a \sim b'$, $b' \sim a'$, $a' \sim b$ for some $a' \in A$ and $b' \in B$. In other words, $a + b'$, $a' + b'$ and $a' + b$ are all in $S$. Now, $(a + b') - (a' + b') + (a' + b) = a + b$, so we have located a triple $x, y, z \in (A + B) \cap S$ with $x - y + z = a + b$. These triples will be different for different paths, and so there must be $\gg_\varepsilon |S|^2$ such triples.
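There are $|A' + B'|$ elements of $A' + B'$, each of which gives $\gg_\varepsilon |S|^2$ triples $(x, y, z) \in X$, and triples coming from different elements are distinct, since they have different values of $x - y + z$. Thus, we have that $|A' + B'|\,|S|^2 \ll_\varepsilon |X| \le |S|^3$, and thus $|A' + B'| \ll_\varepsilon |S|$, as required.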
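Let us now see how we may use this claim to complete the proof of Theorem 5.1. Lemma 2.4 immediately tells us that $|B' - M \cdot B'| \ll_\varepsilon |S|$, and thus, since $|B'| \gg_\varepsilon |S|$, that $|B' - M \cdot B'| \ll_\varepsilon |B'|$. This contradicts Theorem 2.7 (applied with $\lambda_1 = 1$ and $\lambda_2 = -M$), provided that $M$ is sufficiently large.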
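As a numerical illustration only (intervals are a very special case, so this is not evidence for Theorem 5.1 in general), the sketch below computes the density of solutions to the system (5.1), read as $z = x + y$ and $w = x + My$, in $S = \{-m, \dots, m\}$; the function name and parameters are ours. For $M = 1$ the two equations force $z = w$ and the density is about $\frac{3}{4}$, while for larger $M$ the condition $|x + My| \le m$ confines $y$ to a window of width about $2m/M$, so the density decays roughly like $1/M$.

```python
def system_density(m, M):
    """T(S) / |S|^2 for the system z = x + y, w = x + M*y with S = [-m, m] ∩ Z.

    A solution is determined by (x, y), and it lies in S^4 exactly when
    x + y and x + M*y both lie in [-m, m]."""
    n = 2 * m + 1
    count = sum(1 for x in range(-m, m + 1) for y in range(-m, m + 1)
                if abs(x + y) <= m and abs(x + M * y) <= m)
    return count / n ** 2


densities = {M: system_density(150, M) for M in (1, 5, 50)}
print(densities)
```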