Forbidden vector-valued intersections

We solve a generalised form of a conjecture of Kalai motivated by attempts to improve the bounds for Borsuk's problem. The conjecture can be roughly understood as asking for an analogue of the Frankl-Rödl forbidden intersection theorem in which set intersections are vector-valued. We discover that the vector world is richer in surprising ways: in particular, Kalai's conjecture is false, but we prove a corrected statement that is essentially best possible, and applies to a considerably more general setting. Our methods include the use of maximum entropy measures, VC-dimension, Dependent Random Choice and a new correlation inequality for product measures.


Introduction
Intersection theorems have been a central topic of Extremal Combinatorics since the seminal paper of Erdős, Ko and Rado [9], and the area has grown into a vast body of research (see [2], [4] or [19] for an overview). The Frankl-Rödl forbidden intersection theorem is a fundamental result of this type, which has had a wide range of applications to different areas of mathematics, including discrete geometry [12], communication complexity [28] and quantum computing [6].
To state their result we introduce the following notation. Given k, t ∈ [n] and a family A ⊂ [n]^(k) of k-subsets of [n], let A ×_t A be the set of all (A, B) ∈ A × A with |A ∩ B| = t.
Although the bounds from Conjecture 1.2 in general do not hold, it is still natural to ask whether we can find any (t, w)-intersection in such 'exponentially dense' subsets A ⊂ [n] k,s . If so, what is the optimal lower bound on |A × (t,w) A|? This paper investigates these questions; in particular, we give a natural correction to Conjecture 1.2.
Our results will apply to the following more general setting of vector-valued set 'sizes': given vectors V = (v_i : i ∈ [n]) in R^D, we define the V-size of A ⊂ [n] by |A|_V = Σ_{i∈A} v_i. We note that the Frankl-Rödl theorem concerns V-sizes where D = 1 and all v_i = 1, and the Kalai conjecture concerns V-sizes where D = 2 and v_i = (1, i).
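To make the definition concrete, here is a minimal sketch (helper names are ours, not from the paper) computing V-sizes for the Kalai vectors v_i = (1, i), for which the V-size of A records the pair (|A|, Σ_{i∈A} i):

```python
def v_size(A, V):
    """Coordinatewise sum of the vectors v_i over i in A (the V-size of A)."""
    D = len(next(iter(V.values())))
    return tuple(sum(V[i][d] for i in A) for d in range(D))

n = 10
kalai = {i: (1, i) for i in range(1, n + 1)}  # Kalai vectors v_i = (1, i)
print(v_size({2, 3, 5}, kalai))  # (3, 10): a 3-element set with element sum 10
```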

Vector-valued intersections
In order to prove our forbidden V-intersection theorem, we need to work over a general alphabet, where we associate a vector with each possible value of each coordinate, as follows.  We also introduce a class of norms on R D to account for the possibility that different coordinates of vectors in V may operate at different scales. In the following definition we think of R as a scaling; e.g. for the Kalai vectors (1, i), we take R = (1, n).
Our V-intersection theorem requires two properties of the set of vectors V. The first property, roughly speaking, says that any vector in Z^D can be efficiently generated by changing the values of coordinates, and that furthermore this holds even if a small set of coordinates is frozen, so that no coordinate is overly significant. To see why such a condition is necessary, suppose that D = 1 and almost all coordinates have only even values: then there are large families where all intersections have a fixed parity. Definition 1.7. Let V = (v^i_j) be an (n, J)-array in Z^D. We say that V is γ-robustly (R, k)-generating in Z^D if for any v ∈ Z^D with ‖v‖_R ≤ 1 and T ⊂ [n] with |T| ≤ γn there is S ⊂ [n] \ T with |S| ≤ k and j_i, j′_i ∈ J for all i ∈ S such that v = Σ_{i∈S} (v^i_{j_i} − v^i_{j′_i}). Note that if V = (v_i : i ∈ [n]), considered as an (n, {0, 1})-array, then Definition 1.7 says that for all such v and T there are disjoint S, S′ ⊂ [n] \ T with |S| + |S′| ≤ k such that v = Σ_{i∈S} v_i − Σ_{i∈S′} v_i.
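The binary special case at the end of Definition 1.7 can be checked by brute force on small examples; the following sketch (our own illustrative helper, not a method from the paper) searches for signed representations v = Σ_{i∈S} v_i − Σ_{i∈S′} v_i avoiding a frozen set T:

```python
from itertools import combinations, product

def generates(V, v, T, k):
    """Search for disjoint S, S' in [n]\\T with |S| + |S'| <= k and
    v = sum_{i in S} v_i - sum_{i in S'} v_i (binary case of Definition 1.7)."""
    idx = [i for i in range(len(V)) if i not in T]
    D = len(v)
    for size in range(1, k + 1):
        for support in combinations(idx, size):
            for signs in product((1, -1), repeat=size):
                if all(sum(s * V[i][d] for s, i in zip(signs, support)) == v[d]
                       for d in range(D)):
                    return True
    return False

V = [(1, i) for i in range(1, 7)]     # Kalai vectors v_1, ..., v_6
print(generates(V, (0, 1), {0}, 2))   # e.g. v_3 - v_2 = (0, 1), avoiding v_1
```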
We also make the following 'general position' assumption for arrays in R^D. We say that V is γ′-robustly (γ, R)-generic if for any X ⊂ [n] with |X| > γ′n, some I ⊂ X is (γ, R)-generic for V.
We are now in a position to state our main theorem. It shows that, under the above assumptions on V, there are only two obstructions to a set X = ({0, 1}^n)^V_z satisfying a supersaturation result as in Kalai's conjecture (case i): either (case ii) there is a small set B_full ⊂ X responsible for almost all w-intersections in X, or (case iii) there is a large set B_empty ⊂ X containing no w-intersections. Furthermore, in case ii we obtain optimal supersaturation relative to B_full.
Theorem 1.9. Suppose V is γ′_i-robustly (γ_i, R)-generic and γ_i-robustly (R, k)-generating for i = 1, 2. Let z, w ∈ Z^D with z ≠ w and let X = ({0, 1}^n)^V_z. Then one of the following holds: iii. There is B_empty ⊂ X with |B_empty| ≥ ⌊(1 − ε)^n |X|⌋ satisfying (B_empty × B_empty)^{V∩}_w = ∅. Furthermore, if ii holds and iii does not then any Remark 1.10. i. Theorem 1.9 applies to (t, w)-intersections in [n]_{k,s}, as we have shown above that its hypotheses hold for the Kalai vectors. ii. As indicated above, cases ii and iii of Theorem 1.9 may hold simultaneously (see counterexample 1 of Section 5). iii. The assumption that V is γ_1-robustly (R, k)-generating is redundant, as it is implied by γ_2-robustly (R, k)-generating, but the assumptions of γ′_i-robustly (γ_i, R)-generic for i = 1, 2 are incomparable, and our proof seems to require this 'multiscale general position'.
We have highlighted Theorem 1.9 as our main result for the sake of giving a clean combinatorial statement. However, we will in fact obtain considerably more general results in two directions, whose precise statements are postponed until later in the paper.
• Our most general result, Theorem 6.2, implies cross-intersection theorems for two or more families and applies to families of vectors over any finite alphabet.
• Theorem 1.9 leaves open the question of how many w-intersections are guaranteed in large subsets of X when case (ii) holds; this is answered by Theorem 11.1.
It is natural to ask under which conditions the alternate cases of Theorem 1.9 hold. These conditions are best understood in relation to our proof framework, so we postpone this discussion to section 1.4, after we have introduced the two principal components of the proof.

A probabilistic forbidden intersection theorem
A key paradigm of our approach is that V-intersection theorems often have equivalent formulations in terms of certain product measures (the maximum entropy measures described in the next subsection), and that the necessary condition for these theorems appears naturally as a condition on the product measures. (A similar idea arose in the new proof of the density Hales-Jewett theorem developed by the first Polymath project [26], although in this case the natural 'equal slices' distribution was not a product measure.) To illustrate this point, we recast the Frankl-Rödl theorem in such terms. Again we identify subsets of [n] with their characteristic vectors in {0, 1}^n, on which we introduce the product measure µ_q on ({0, 1} × {0, 1})^n, where q_{1,1} = t/n, q_{0,1} = q_{1,0} = (k − t)/n and q_{0,0} = (n − 2k + t)/n. It follows from our general large deviation principle in the next subsection (or is easy to see directly in this case) that the hypothesis of Theorem 1.1 is essentially equivalent to µ_p(A) > (1 − δ)^n and the conclusion to µ_q(A ×_t A) > (1 − ε)^n. Furthermore, the assumption on t can be rephrased as q_{j,j′} ≥ ε for all j, j′ ∈ {0, 1}, and this indicates the condition that we need in general.
Let us formalise the above discussion of product measures in a general context. Although we have only considered the cases where the 'alphabet' J is {0, 1} or {0, 1} × {0, 1}, we remark that it is essential for our arguments to work with general alphabets, as the proofs of our results even in the binary case rely on reductions that increase the alphabet size. Definition 1.11. Suppose p = (p^i_j : i ∈ [n], j ∈ J) with all p^i_j ∈ [0, 1] and Σ_{j∈J} p^i_j = 1 for all i ∈ [n]. The product measure µ_p on J^n is given, for a ∈ J^n, by µ_p(a) = Π_{i∈[n]} p^i_{a_i}. Given an (n, J)-array V and a measure µ on J^n, we write V(µ) = E_{a∼µ} V(a). Suppose µ_q is a product measure on (Π_{s∈S} J_s)^n, with q = (q^i_{j_1,...,j_S} : i ∈ [n], j_1 ∈ J_1, . . . , j_S ∈ J_S). For s ∈ [S] the s-marginal of µ_q is the product measure µ_{p_s} on J^n_s with (p_s)^i_j = Σ q^i_{j_1,...,j_S} for all i ∈ [n], j ∈ J_s, where the sum is over all (j_1, . . . , j_S) with j_s = j.
We say that µ_q has marginals (µ_{p_s} : s ∈ S). We say that µ_q is κ-bounded if all q^i_{j,j′} ∈ [κ, 1 − κ]. Note that if µ_q is κ-bounded then so are its marginals.
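As a concrete instance (parameters are ours, for illustration), the recast Frankl-Rödl measure above is a product measure on ({0, 1} × {0, 1})^n whose marginals, computed as in Definition 1.11, are both k/n-biased, and which is κ-bounded exactly when all four entries lie in [κ, 1 − κ]:

```python
from fractions import Fraction

def marginal(q, s):
    """s-marginal of a pair measure q on J1 x J2, as in Definition 1.11."""
    out = {}
    for (j1, j2), val in q.items():
        j = (j1, j2)[s]
        out[j] = out.get(j, 0) + val
    return out

def kappa_bounded(q, kappa):
    return all(kappa <= val <= 1 - kappa for val in q.values())

n, k, t = 12, 4, 2
q = {(1, 1): Fraction(t, n), (0, 1): Fraction(k - t, n),
     (1, 0): Fraction(k - t, n), (0, 0): Fraction(n - 2 * k + t, n)}
assert sum(q.values()) == 1                # q is a probability distribution
assert marginal(q, 0)[1] == Fraction(k, n) # each marginal is k/n-biased
assert kappa_bounded(q, Fraction(1, 6))    # every entry is at least 1/6 here
```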
A rough statement of our probabilistic forbidden intersection theorem (Theorem 1.14 below) is that if A has 'large measure' then the set of w-intersections in A has 'large measure'. We will combine this with an equivalence of measures discussed in the next subsection to deduce our main theorem. First we will highlight two special cases of Theorem 1.14 that are of independent interest. The first is the following result, which ignores the intersection conditions and is only concerned with the relationship between the measures of A and A × A; it is a new kind of correlation inequality (see Theorem 7.1 for a more general statement that applies to several families defined over general alphabets). Theorem 1.12. Let 0 < n^{-1}, δ ≪ κ, ε < 1 and let µ_q be a κ-bounded product measure on

Next we consider the problem of finding V-intersections that are close to w, which is also natural, and somewhat easier than finding V-intersections that are (exactly) w. We require some notation.
Theorem 1.13 naturally fits into the wide literature on forbidden L-intersections in extremal set theory (see [2], [4] or [19]). Here one aims to understand how large certain families of sets can be if all intersections between elements of A are restricted to lie in some set L. For example, the Erdős-Ko-Rado theorem [9] can be viewed as an L_0-intersection theorem for families A ⊂ [n]^(k), where L_0 = {l ∈ N : 1 ≤ l ≤ k}. Similarly, Katona's t-intersection theorem [22] can be viewed as an L_{≥t}-intersection theorem for families A ⊂ P[n], where L_{≥t} = {l ∈ N : l ≥ t}. Now we state our probabilistic forbidden intersection theorem: if V is robustly generated then Theorem 1.13 can be upgraded to find fixed V-intersections.

Maximum entropy and large deviations
Next we will discuss an equivalence of measures that will later combine with Theorem 1.14 to yield Theorem 1.9. Here we are guided by the maximum entropy principle (proposed by Jaynes [18] in the context of Statistical Mechanics) which suggests considering the distribution with maximum entropy subject to the constraints of our problem, as defined in the following lemma (the proof is easy, and will be given in Section 2).
We will show that µ^V_w is equivalent to the uniform measure on (J^n)^V_w, in the sense of exponential contiguity, defined as follows. (It is reminiscent of, but distinct from, the more well-known theory of contiguity, see [17, Section 9.6].) Definition 1.16. Let µ = (µ_n)_{n∈N} and µ′ = (µ′_n)_{n∈N}, where µ_n and µ′_n are probability measures on a finite set Ω_n for all n ∈ N. Let F = (F_n)_{n∈N} where each F_n is a set of subsets of Ω_n.
We say that µ′ exponentially dominates µ relative to F, and write µ ≼_F µ′. We say that µ and µ′ are exponentially contiguous relative to F, and write µ ≈_F µ′, if µ ≼_F µ′ and µ′ ≼_F µ.
If ∆ = (∆_n)_{n∈N} with each ∆_n ⊂ Ω_n then we write µ ≼_∆ µ′ if µ ≼_F µ′, where F_n is the set of all subsets of ∆_n; we define µ ≈_∆ µ′ similarly.
Note that ≼_F is a partial order and ≈_F is an equivalence relation. The following result establishes the required equivalence of measures under the same hypotheses as in the previous subsection. It can be regarded as a large deviation principle for conditioning x ∈ J^n on the event V(x) = w (see [8] for an overview of this area).
To apply Theorem 1.17 under combinatorial conditions, we will use the following lemma, which shows that µ_{p^V_w} is κ-bounded under our general position condition on V. (See also Section 4 for a more general result based on VC-dimension that applies to larger alphabets.) Alexander Barvinok remarked (personal communication) that similar results to Theorem 1.17 and Lemma 1.18 were obtained by Barvinok and Hartigan in [3]. Theorem 3 of [3] gives stronger bounds on |({0, 1}^n)^V_w| where applicable, but their assumptions are very different from ours (they assume bounds for quadratic forms of certain inertia tensors), and they also require that the vectors all operate at the 'same scale', so their results do not apply to the Kalai vectors. Although our bounds are weaker, our proofs are considerably shorter; furthermore, stronger bounds here would not give any improvements elsewhere in our paper, as they account for a term subexponential in n, while our working tolerance is up to a term exponential in n.

Supersaturation
We now give a brief overview of the strategy for combining the results of the previous two subsections to prove supersaturation, and also indicate the conditions that determine which case of Theorem 1.9 holds. Under the set-up of Theorem 1.9, a telegraphic summary of the argument is: where µ_q is chosen to optimise the lower bound on |(A × A)^{V∩}_w| implied by the final inequality. The best possible supersaturation bound (case i of Theorem 1.9) arises when Theorem 1.14 is applicable with µ_q equal to the maximum entropy measure that represents (X × X)^{V∩}_w: this case holds when µ_q is κ-bounded and has marginals close to µ_p := µ_{p^V_z}. Case ii of Theorem 1.9 holds if µ_q is κ-bounded but its marginals are not close to µ_p: then the marginal measure is concentrated on a small subset B_full of X, which is responsible for almost all w-intersections in X.
Lastly, case iii of Theorem 1.9 holds if µ_q is not κ-bounded. The key to understanding this case is the well-known Vapnik-Chervonenkis (VC) dimension [31], defined as follows. Definition 1.19. We say that A ⊂ J^n shatters X ⊂ [n] if for any (j_x : x ∈ X) ∈ J^X there is a ∈ A with a_x = j_x for all x ∈ X. The VC-dimension dim_VC(A) of A is the largest size of a subset of [n] shattered by A.
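Definition 1.19 can be checked directly by exhaustive search on small families (a brute-force sketch with our own helper names; this is not how the paper uses VC-dimension, only an illustration of the definition):

```python
from itertools import combinations, product

def vc_dimension(A, J, n):
    """Largest |X|, X subset of [n], such that A realises every pattern in J^X."""
    def shatters(X):
        patterns = {tuple(a[x] for x in X) for a in A}
        return len(patterns) == len(J) ** len(X)
    for d in range(n, -1, -1):
        if any(shatters(X) for X in combinations(range(n), d)):
            return d
    return 0

full_cube = list(product((0, 1), repeat=3))      # all of {0,1}^3
assert vc_dimension(full_cube, (0, 1), 3) == 3   # shatters [3]
assert vc_dimension([(0, 0, 0), (1, 1, 1)], (0, 1), 3) == 1
```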
To see why it is natural to consider the VC-dimension, consider the problem of finding an intersection of size n/3 among subsets of [n] of size 2n/3. The conditions of the Frankl-Rödl theorem are not satisfied, and indeed the conclusion is not true: viewing [n]^(2n/3) ×_{n/3} [n]^(2n/3) as a subset of ({0, 1} × {0, 1})^n, we see that no coordinate can take the value (0, 0), so there is not even a shattered set of size 1! Modifying this example in the obvious way, we see that it is natural to assume a VC-dimension bound that is linear in n. We also note that this example shows that the 'Frankl-Rödl analogue' of Conjecture 1.2 is not true, and hints towards a counterexample for Kalai's conjecture. More generally, we will prove that κ-boundedness of µ_q is roughly equivalent to the VC-dimension of (X × X)^{V∩}_w being large as a subset of ({0, 1} × {0, 1})^n (see Lemma 4.8). Case iii of Theorem 1.9 will apply when (X × X)^{V∩}_w has low VC-dimension. The above outline also gives some indication of how the values in Theorem 1.3 arise. As described above, the supersaturation conclusion desired by Conjecture 1.2 (case i of Theorem 1.9) needs µ_q to have marginals close to µ_p := µ_{p^V_z}. We can describe µ_q and µ_p explicitly using Lagrange multipliers: they are Boltzmann distributions (see Lemma 10.1). In general, it is not possible for one Boltzmann distribution to be a marginal of another, which explains why Conjecture 1.2 is generally false. An analysis of the special conditions under which this is possible gives rise to the characterisation of Γ in Theorem 1.3.
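The obstruction described above is easy to verify numerically at a scaled-down size (our choice of parameters n = 6, k = 2n/3 = 4, t = n/3 = 2): every pair of k-sets meeting in exactly t elements covers all of [n], so the coordinate value (0, 0) never occurs.

```python
from itertools import combinations

n, k, t = 6, 4, 2                      # scaled-down instance: k = 2n/3, t = n/3
families = list(combinations(range(n), k))
pairs = [(A, B) for A in families for B in families
         if len(set(A) & set(B)) == t]
assert pairs                           # such pairs do exist...
for A, B in pairs:
    # ...but each pair covers [n], so no coordinate ever takes the value (0, 0)
    assert set(A) | set(B) == set(range(n))
```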
The outline also suggests a possible characterisation of the optimal level of supersaturation in all cases (i.e. including those for which Kalai's conjecture fails). Any choice of µ_q satisfying the hypotheses of Theorem 1.14 with marginal distributions µ^V_z gives a lower bound on |(A × A)^{V∩}_w|, and the optimal such lower bound is obtained by taking such a measure with maximum entropy. Is this essentially tight? We will give a positive answer to this question by proving a matching upper bound in Section 11.
Finally, we remark that our method allows different vectors defining the sizes of intersections from those defining the sizes of sets in the family, i.e. V ′ -intersections in ({0, 1} n ) V z ; in Section 6.3 we show such an application to give a new proof of a theorem of Frankl and Rödl [11,Theorem 1.15] on intersection patterns in sequence spaces.

Organisation of the paper
In the next section we collect some probabilistic methods that will be used throughout the paper. We prove the large deviation principle (Theorem 1.17) in Section 3. In Section 4 we establish the connection between VC-dimension and boundedness of maximum entropy measures. Section 5 is expository: we give two concrete counterexamples to Kalai's Conjecture 1.2. Next we introduce a more general setting in Section 6, state our most general result (Theorem 6.2), and show that it implies our probabilistic intersection theorem (Theorem 1.14). In Section 7 we prove a correlation inequality needed for the proof of Theorem 6.2; as far as we are aware, the inequality is quite unlike other such inequalities in the literature. We prove Theorem 6.2 in Section 8, and then deduce our main theorem (Theorem 1.9) in Section 9. Our corrected form of Kalai's conjecture (Theorem 1.3) is proved in Section 10; we also show there, in much more generality, that supersaturation of the form conjectured by Kalai is rare. In Section 11 we give a complete characterisation of the optimal level of supersaturation in terms of a certain optimisation problem for measures. Lastly, in Section 12 we recast our results in terms of 'exponential continuity', a notion that arises naturally when comparing distributions according to exponential contiguity and may be interpreted in terms of robust statistics for social choice; this point and several potential directions for future research are addressed in the concluding remarks.

Notation
We identify subsets of a set with their characteristic vectors: A ⊂ X corresponds to a ∈ {0, 1}^X, where a_i = 1 ⇔ i ∈ A. The Hamming distance between vectors a and a′ in a product space J^n is d(a, a′) = |{i ∈ [n] : a_i ≠ a′_i}|. Given a set X, we write X^(k) = {A ⊂ X : |A| = k}. We write δ ≪ ε to mean that for any ε > 0 there exists δ_0 > 0 such that for any δ ≤ δ_0 the following statement holds. Statements with more constants are defined similarly. We write a = b ± c to mean b − c ≤ a ≤ b + c. Throughout the paper we omit floor and ceiling symbols where they do not affect the argument. All vectors appear in boldface.
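In code, these two conventions read as follows (an illustrative sketch only; helper names are ours):

```python
def char_vector(A, n):
    """Characteristic vector of A as a subset of an n-element ground set."""
    return tuple(1 if i in A else 0 for i in range(n))

def hamming(a, b):
    """Number of coordinates where a and b differ."""
    return sum(x != y for x, y in zip(a, b))

a = char_vector({0, 2}, 4)   # {0, 2} inside a 4-element ground set
b = char_vector({2, 3}, 4)
assert a == (1, 0, 1, 0)
assert hamming(a, b) == 2    # they differ in the first and last coordinates
```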

Probabilistic methods
In this section we gather several probabilistic methods that will be used throughout the paper: concentration inequalities, entropy, an application of Dependent Random Choice to the independence number of product graphs, and an alternative characterisation of exponential contiguity.

Concentration inequalities
We start with the well-known Chernoff bound (see e.g. [1, Appendix A]).
An easy consequence is the following concentration inequality for random sums of vectors: applying the Chernoff bound in each coordinate, the lemma follows from a union bound.
We will also use the following consequence of Azuma's martingale concentration inequality (see e.g. [25]). We say that f : J n → R is b-Lipschitz if for any a, a ′ ∈ J n differing only in a single coordinate we have |f (a) − f (a ′ )| ≤ b.

Entropy
In this subsection we record some basic properties of entropy (see [7] for an introduction to information theory). The entropy of a probability distribution p = (p_1, . . . , p_n) is H(p) = −Σ_{i∈[n]} p_i log_2 p_i. The entropy of a random variable X taking values in a finite set S is H(X) = H(p), where p = (p_s : s ∈ S) is the law of X, i.e. p_s = P(X = s). When p = (p, 1 − p) takes only two values we write H(p) = H((p, 1 − p)). Entropy is subadditive: if X = (X_1, . . . , X_n) then H(X) ≤ Σ_{i=1}^n H(X_i), with equality if and only if the X_i are independent. An equivalent reformulation is the following lemma.
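A quick numerical check of subadditivity and its equality case (our own sketch, not part of the paper's proofs): independent coordinates attain the bound, while correlated coordinates fall strictly below it.

```python
from math import log2

def H(p):
    """Entropy of a probability distribution, in bits."""
    return -sum(x * log2(x) for x in p if x > 0)

# X = (X1, X2) uniform on {00, 01, 10, 11}: independent, so H(X) = H(X1) + H(X2) = 2
assert abs(H([0.25] * 4) - 2.0) < 1e-12
# X uniform on {00, 11}: both marginals are fair bits, but H(X) = 1 < 1 + 1
assert abs(H([0.5, 0.5]) - 1.0) < 1e-12
```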
It is easy to deduce Lemma 1.15 from Lemma 2.4. Indeed, consider µ ∈ M^V_w with maximum entropy, and let µ_p be the product measure whose marginals agree with those of µ; then H(µ) ≤ H(µ_p), with equality if and only if µ = µ_p. As M^V_w is convex, uniqueness follows from strict concavity of the entropy function, which we will now explain. It is often convenient to use the notation L(p) = −p log_2 p; then L″(p) = −1/(p ln 2) < 0, so L is strictly concave. The following lemma is immediate from these formulae and the mean value form of Taylor's theorem. We deduce the following 'stability version' of the uniqueness of the maximum entropy measure, which quantifies the decrease in entropy in terms of distance from the maximiser. We conclude this subsection with a perturbation lemma.

Dependent Random Choice
We will use the following version of Dependent Random Choice (see [23, Lemma 11] for a proof and [10] for a comprehensive survey of the method). We write N(u, u′) for the set of common neighbours of u and u′ in a graph G.
The following is an immediate consequence of Lemma 2.8, applied with t = ⌈2/cε⌉.
We say that S ⊂ V(G) is independent if it contains no edges of G. The independence number α(G) of G is the maximum size of an independent set in G. Given graphs G_1, . . . , G_k, we write G_1 × · · · × G_k for their product. By repeated application of the previous lemma, we obtain the following corollary.
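For very small graphs the independence number just defined can be computed by exhaustive search (an illustrative sketch of the definition, not a method used in the paper):

```python
from itertools import combinations

def alpha(vertices, edges):
    """Independence number by brute force (feasible for small graphs only)."""
    vs = list(vertices)
    es = {frozenset(e) for e in edges}
    for size in range(len(vs), 0, -1):
        for S in combinations(vs, size):
            if not any(frozenset(p) in es for p in combinations(S, 2)):
                return size
    return 0

c5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
assert alpha(range(5), c5) == 2   # the 5-cycle: e.g. {0, 2} is maximum
```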

Exponential Contiguity
We conclude this section with an alternative characterisation of exponential contiguity.

Large deviations of fixed sums
In this section we prove Theorem 1.17. Our first lemma will be used to show that the maximum entropy measure is exponentially dominated by the uniform measure.
As these random variables are independent and EX = −H(µ_p), the bound on µ_p(B) follows from Chernoff's inequality.
Our next lemma gives a lower bound for point probabilities of maximum entropy measures, which implies an upper bound on the number of solutions of V(x) = w.
Our final lemma will give an approximate formula for the number of solutions of V(x) = w (as mentioned in the introduction, [3,Theorem 3] gives stronger bounds under different hypotheses). First we require a small set that efficiently generates Z D , as described by the following definition and associated lemma, which shows that such a set exists under the mild assumption of polynomial growth for the coordinate scale vector R (this will also be used later in Theorem 6.2).
Proof. We first note that the final statement of the lemma follows from the first: the latter gives the lower bound, as EV(x) = w when x ∼ µ p V w , and the upper bound follows from Lemma 3.2. It remains to prove the first statement of the lemma. Let F be the set of x ∈ J n such that there is We claim that µ p (F) > 1/2. First we assume the claim and deduce the lower bound. By double-counting pairs ( Note that log 2 µ p (F ′ ) ≤ log 2 |F ′ | − H(µ p ) + δ 2 n, and by Lemma 3.1 and the claim we have µ p ( To prove the claim, we consider x ∼ µ p and show that with probability at least 1/2 there is ). Let B 2 be the event that for some m we have |{t : This completes the proof of the claim, and so of the lemma.
We deduce Theorem 1.17, which states that under the hypotheses of the above lemmas, we have

Boundedness, feasibility and universal VC-dimension
In this section we will give several combinatorial characterisations of the boundedness condition on maximum entropy measures required in our probabilistic intersection theorem. The characterisations hold under the following 'multiscale general position' assumption, which extends Definition 1.8 to all finite alphabets (by 'multiscale' we mean that the parameter γ can be arbitrary, which is true of the Kalai vectors).
We say that a sequence (V n , R n ) of (n, J)-arrays and scalings is robustly generic if V n is γ ′ -robustly (γ, R n )-generic whenever n −1 ≪ γ ≪ γ ′ .
It will also be convenient to use the following sequence formulation of Definition 1.7.
Definition 4.2. We say that (V n , R n ) is robustly generating if there are γ > 0 and k, n 0 ∈ N such that V n is γ-robustly (R n , k)-generating for all n > n 0 .
Next we will define the combinatorial conditions that appear in our characterisation. We recall the definition of VC-dimension and also define a universal variant that will be important in the proof of Theorem 1.9 in section 9.
Next we give a feasibility condition, which can be informally understood as saying that we can solve any small perturbation of the equation V n (x) = z n .
Definition 4.4. Let (V_n, R_n, z_n) be a sequence of (n, J)-arrays, scalings and vectors in Z^D. We say (V_n, R_n, z_n) is λ-feasible if there is n_0 such that for any n > n_0, any z′_n ∈ Z^D with ‖z′_n − z_n‖_{R_n} ≤ λn, and any (n′, J)-array V′_{n′} obtained from V_n by deleting at most λn coordinates, we have (J^{n′})^{V′_{n′}}_{z′_n} ≠ ∅. Our final property appears to be a substantial weakening of our κ-boundedness condition, so it is quite surprising that it also gives a characterisation.
Definition 4.5. Suppose µ_p is a product measure on J^n. We say that µ_p is κ-dense if there are at least κn coordinates i ∈ [n] such that p^i_j ≥ κ for all j ∈ J. Now we can state the main theorem of this section. The sense of the equivalences in the statement is that the implied constants are bounded away from zero together. For example, the implication ii ⇒ i means that for any δ > 0 there is ε > 0 such that if µ^{V_n}_{z_n} is δ-dense then µ^{V_n}_{z_n} is ε-bounded. Theorem 4.6. Let (V_n, R_n) be a robustly generic and robustly generating sequence of (n, J)-arrays and scalings in Z^D, and (z_n) a sequence of vectors in Z^D. The following are equivalent: The main step in the proof of Theorem 4.6 is Lemma 4.8, which provides the implication iii ⇒ i. It also implies Lemma 1.18, as for binary vectors the following coarse version of the Sauer-Shelah theorem shows that linear VC-dimension is equivalent to exponential growth.
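Definition 4.5 is easy to check directly; the following sketch (our own helper, with hypothetical parameters) illustrates it for a binary alphabet:

```python
def is_kappa_dense(p, J, kappa):
    """Definition 4.5: at least kappa*n coordinates give every letter of J
    probability at least kappa."""
    n = len(p)
    good = sum(1 for i in range(n) if all(p[i][j] >= kappa for j in J))
    return good >= kappa * n

# 8 of 10 coordinates are balanced; the last 2 are frozen at the letter 0
p = [{0: 0.5, 1: 0.5}] * 8 + [{0: 1.0, 1: 0.0}] * 2
assert is_kappa_dense(p, (0, 1), 0.3)       # 8 >= 0.3 * 10 balanced coordinates
assert not is_kappa_dense(p, (0, 1), 0.9)   # no coordinate gives both letters 0.9
```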
The proof of Lemma 4.8 is immediate from the next two lemmas, which give the implications iii ⇒ ii and ii ⇒ i of Theorem 4.6.
Consider a product measure µ_{p′} where for some t > 0 we have p′^i

Proof of Theorem 4.6. It remains to prove the implications i ⇒ v and v ⇒ iv (note that iv ⇒ iii is trivial).
For i ⇒ v, let n^{-1} ≪ λ ≪ κ ≪ γ, k^{-1}, suppose V_n is γ-robustly (R_n, k)-generating, µ^{V_n}_{z_n} is κ-bounded, z′_n ∈ Z^D with ‖z′_n − z_n‖_{R_n} ≤ λn, and V′ is obtained from V_n by deleting S ⊂ [n] with |S| ≤ λn. Then V′ is R_n-bounded and (γ/2)-robustly (R_n, k)-generating. Also, the restriction For v ⇒ iv, let n^{-1} ≪ κ, and suppose (V_n, R_n, z_n) is κ-feasible. Fix S ⊂ [n] with |S| = κn and y ∈ J^S. We need to show that there is x ∈ (J^n)^{V_n}_{z_n} with x|_S = y. Let V′, V_0 be obtained from V_n by respectively deleting, retaining the coordinates of S. Let z′_n = z_n − V_0(y). Then ‖z′_n − z_n‖_{R_n} ≤ κn, so by definition of κ-feasibility we can find x′ ∈ (J^{n′})^{V′}_{z′_n}; combining x′ with y gives the required x. We conclude this section by noting the following lemma, which is immediate from the preceding proof and Lemma 4.8.
The first example also shows that cases ii and iii can hold simultaneously.

The general setting
In this section we state our most general result, Theorem 6.2; we will defer the proof to section 8. This is in fact the main result of the paper in some sense, as we will show in this section that it implies Theorem 1.14 (in a more general cross-intersection form). However, the hypothesis of 'transfers' in Theorem 6.2 appears to be quite strong at first sight, and it will take some work to show that it follows from the hypotheses of Theorem 1.14 (it is here that the idea of enlarging the alphabet comes into play). We state our result in the next subsection and then deduce Theorem 1.14 in the following subsection. A second application of Theorem 6.2 is given in subsection 6.3, where we use it to give a short proof of a theorem of Frankl and Rödl on forbidden intersection patterns.

Statement of the general theorem
Before stating our theorem, we require the following definition, which describes a situation in which, for any vector u in some specified set, there are many ways to choose a coordinate and two particular alterations of its value: one that does not change the associated vector, and one that changes it by u. We say that V has γ-robust transfers for U if it has transfers for (P, U) for some P such that |P_m| ≥ γn for all m ∈ [M].
Note that an (n, Π_{s∈S} J_s)-array in Z^D has transfers for (P, U) if it has them as an (n, J × L)-array, where J = Π_{s∈S′} J_s and L = Π_{s∈S\S′} J_s for some S′ ⊂ S.
We can now state our general theorem. (Recall that U exists by Lemma 3.4.)

Proof of Theorem 1.14

Now we assume Theorem 6.2 and prove Theorem 1.14; in fact we prove the more general cross-intersection theorem. The strategy is to fuse together suitable coordinates and enlarge the alphabet.
We let N = ⌊n/k⌋ and partition [n] into sets T_1, . . . , T_N, each of size k, and a remainder set R with 0 ≤ |R| ≤ k − 1, such that each S_{mj} ∪ S′_{mj} is contained in some T_i. We let P = (P_m : m ∈ [M]), where each P_m is the set of i ∈ [N] such that T_i contains some S_{mj} ∪ S′_{mj}. We start by reducing to the case R = ∅ and k | n. For R_s ⊂ R we let A^{R_s}_s = {A ∈ A_s : A ∩ R = R_s} for s = 1, 2. By the pigeonhole principle we can fix (R_1, R_2) so that µ_p ) according to some fixed bijection of T_i with [k]. We will apply Theorem 6.2 with N in place of n, with S = {1, 2} and J_1 = J_2 = {0, 1}^k, and A′_s (naturally identified) in place of A_s. We let W = (w^i_{J_1,J_2}) be the (N, {0, 1}^k × {0, 1}^k)-array in Z^D defined by w^i_{J_1,J_2} = Σ_{j∈J_1∩J_2} v_j for J_1, J_2 ⊂ T_i. Note that V∩(x, y) = W(x, y) for all x, y in {0, 1}^n (naturally identified).
We also note that W has transfers for (P, U). To see this, consider i ∈ P_m with S_{mj} ∪ S′_{mj} ⊂ T_i.
To summarise, after the above reductions, we have , so the theorem follows from Theorem 6.2.

Application to a theorem of Frankl and Rödl
In this subsection we give another application of Theorem 6.2, which illustrates an additional flexibility, namely that our method allows the vectors defining the sizes of intersections to differ from those defining the sizes of sets in the family. We will give a new proof of a theorem of Frankl and Rödl [11, Theorem 1.15] on intersection patterns in sequence spaces. (To align with notation from the rest of the paper, our notation differs from that of [11].) Given non-negative integers l_1, . . . , l_s with Σ_i l_i = n, let [n]_{l_1,...,l_s} denote the set of sequences in [s]^n in which each j ∈ [s] appears exactly l_j times. Given x ∈ [n]_{l_1,...,l_s} and y ∈ [n]_{k_1,...,k_t}, the intersection pattern of x and y is given by an s × t matrix M, with M_{j_1,j_2} = |{i ∈ [n] : x_i = j_1, y_i = j_2}|. Given A_1 ⊂ [n]_{l_1,...,l_s} and A_2 ⊂ [n]_{k_1,...,k_t}, we let A_1 ×_M A_2 denote the set of pairs (x, y) ∈ A_1 × A_2 with intersection pattern M.
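The intersection pattern is straightforward to compute directly (an illustrative sketch with our own helper name; alphabet letters are indexed from 0 here):

```python
def intersection_pattern(x, y, s, t):
    """s x t matrix M with M[j1][j2] = #{i : x_i = j1 and y_i = j2}."""
    M = [[0] * t for _ in range(s)]
    for xi, yi in zip(x, y):
        M[xi][yi] += 1
    return M

x = [0, 0, 1, 1, 2]                    # a sequence over a 3-letter alphabet
y = [0, 1, 1, 1, 0]                    # a sequence over a 2-letter alphabet
M = intersection_pattern(x, y, 3, 2)
assert M[1][1] == 2                    # x_i = 1 and y_i = 1 at positions 2, 3
assert sum(map(sum, M)) == len(x)      # the entries of M always sum to n
```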
We wish to emphasise two aspects of the above proof. Firstly, it is crucial that the arrays V_1, V_2 and V can differ. Secondly, the arrays V_i are not |J_i|^{-1}-robustly (γ, R)-generic for any γ > 0 for i = 1, 2, so we cannot apply Lemma 4.8, but we were able to see directly that µ_{p_1} and µ_{p_2} are κ-bounded. Thus Theorem 6.2 has useful consequences even for arrays that are not robustly generic.

Correlation on product sets
In this section we will prove the following correlation inequality which will be used in the proof of Theorem 6.2; it can also be interpreted as an exponential contiguity result for product measures (see Theorem 7.2).
Let M = E µq (f ). We claim that M ≤ (2δ + α)n ≤ 2αn. To see this, we apply a well-known concentration argument. For I ⊂ R, let This completes the proof in this case. Now we deduce the general case by induction on |S|. Suppose the theorem is known for |S| = k−1 and we wish to prove it for |S| = k. Fix s ∈ S and let S ′ = S \ {s}. We view ( s∈S J s ) n as (J s × J ′ ) n , where J ′ = s ′ ∈S ′ J s ′ . Let µ p ′ be the product measure on J ′n defined by µ p S ′ (x ′ ) = µ q ((J 1 ) n × {x ′ }). Then µ p ′ is κ-bounded and has marginals (µ p s ′ ) s ′ ∈S ′ , so by induction hypothesis, Also, we can view µ q as a product measure on (J s × J ′ ) n , with marginals µ ps and µ p ′ . Since from the |S| = 2 case of the theorem we obtain µ q ( s∈S A s ) ≥ (1 − ε) n , as required.
Next we will apply Theorem 7.1 to show exponential contiguity of µ_q and Π_{s∈S} µ_{p_s}, defined by (Π_{s∈S} µ_{p_s})(x_s : s ∈ S) = Π_{s∈S} µ_{p_s}(x_s). Here the subscript Π indicates exponential contiguity relative to product sets, i.e. we apply Definition 1.16 in the case Ω_n = (Π_{s∈S} J_s)^n and F = Π = (Π_n)_{n∈N}, where Π_n = {Π_{s∈S} A_{n,s} : all A_{n,s} ⊂ J^n_s}. Theorem 7.2. Let 0 < n^{-1} ≪ κ ≪ 1 and let µ_q be a κ-bounded product measure on (Π_{s∈S} J_s)^n with marginals (µ_{p_s} : s ∈ S). Then µ_q ≈_Π Π_{s∈S} µ_{p_s}.
Proof. As in the proof of Theorem 7.1, it suffices to consider the case $S = [2]$. By Theorem 7.1 we have $\mu_{p_1} \times \mu_{p_2} \preceq_\Pi \mu_q$. Conversely, consider $A_s \subset J_s^n$ for $s \in [2]$. By the Cauchy-Schwarz inequality, writing sums over $x_1 \in J_1^n$, $x_2 \in J_2^n$, we have $\mu_q(A_1 \times A_2)^2 = \big( \sum_{x_1, x_2} \mu_q(x_1, x_2) \prod_{s \in [2]} 1_{x_s \in A_s} \big)^2 \le \prod_{s \in [2]} \sum_{x_1, x_2} \mu_q(x_1, x_2) 1_{x_s \in A_s} = \prod_{s \in [2]} \mu_{p_s}(A_s)$, as required. We conclude this section by giving the easy deduction of Theorem 1.13 from Theorem 7.1.
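The Cauchy-Schwarz step above holds for any joint measure, not only $\kappa$-bounded product measures. A quick randomized check of the inequality $\mu_q(A_1 \times A_2)^2 \le \mu_{p_1}(A_1)\mu_{p_2}(A_2)$ (illustrative only):

```python
import random

random.seed(1)

# A joint measure q on a product space X x Y, with its two marginals p1, p2.
X, Y = range(4), range(5)
raw = {(x, y): random.random() for x in X for y in Y}
total = sum(raw.values())
q = {xy: v / total for xy, v in raw.items()}
p1 = {x: sum(q[(x, y)] for y in Y) for x in X}
p2 = {y: sum(q[(x, y)] for x in X) for y in Y}

def cs_holds(A, B):
    # q(A x B)^2 <= p1(A) * p2(B): Cauchy-Schwarz with weights q applied to
    # the indicator functions of A and B.
    lhs = sum(q[(x, y)] for x in A for y in B)
    return lhs ** 2 <= sum(p1[x] for x in A) * sum(p2[y] for y in B) + 1e-12

for _ in range(200):
    A = [x for x in X if random.random() < 0.5]
    B = [y for y in Y if random.random() < 0.5]
    assert cs_holds(A, B)
```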

Proof of the general theorem
In this section we prove Theorem 6.2. We start by reducing to the case |S| = 2.
Proof. First note that if $V$ has $\gamma$-robust transfers for $U$ then it has them as an $(n, L_1 \times L_2)$-array, where each $L_j = \prod_{s \in S_j} J_s$ for some partition $(S_1, S_2)$ of $S$. Now let $\mu_{p_{S_1}}$ denote the product measure on $L_1^n$ defined by Now we will prove a succession of special cases of Theorem 6.2, where the proof of each case builds on the previous cases, culminating in the proof of the general case. We assume without further comment that $S = \{1, 2\}$. We claim that we can fix $K$ with $K_m = (2\alpha \pm \kappa/4)|P_m|$ for all $m \in [M]$ such that $\mu_p(A \cap B_K) > (1 - \delta)^n$. Indeed, by assumption we have $\mu_p(A) > (1 - \delta)^{n/2}$. Also, for $a \sim \mu_p$ and $X_m = \sum_{i \in P_m} a_i$ we have $\mathbb{E}X_m = 2\alpha|P_m|$, so by Chernoff's inequality $\mathbb{P}(|X_m - \mathbb{E}X_m| > \kappa|P_m|/4) \le 2e^{-(\kappa|P_m|/4)^2/2|P_m|} \le 2e^{-\kappa^2 \gamma n/32}$. There are at most $n^M$ choices of $K$, so by a union bound and the pigeonhole principle there is some $K$ with all We note for any $a$ and $a'$ in $B_{K,z}$ that $V(a, a')$ is determined by the values $t_m = \sum_{i \in P_m} a_i a'_i$. Indeed, for each $i \in P_m$, as $u_m$ is an $i$-transfer, we may suppose that $v_i$ It remains to show that we can find such $a$ and $a'$. We consider the graph Write $F_{K_1, K_2}$ for the set of all $a \in \{0,1\}^n$ such that $\sum_{i \in B_j} a_i = K_j$ for $j = 1, 2$. As in the proof of Lemma 8.2, we write Consider the bipartite graph $G$ with parts We will now find $F \subset B \times B$ with $\mu_q(F) > (1 - \varepsilon/2)^n$ (also writing $\mu_q$ for its restriction to This will suffice to prove the lemma, as then Then $\mu_q(E) < 2De^{-\zeta^2 n/2}$ by Lemma 2.2. We choose $F = (B \times B) \setminus E$. By Theorem 7.1 we have $\mu_q(B \times B)$

It remains to show for fixed
. To see this, it suffices to verify the hypotheses of Lemma 8.2, applied with $N_G(b_1, b'_1)$ in place of $A$, restricting $\mu_q$ to $(\{0,1\} \times \{0,1\})^{B_2}$, and with $V_2$ in place of $V$. We note that $V_2$ has transfers for the same $(P, U)$, and $P = (P_m :$ replacing $\zeta$ by $2\zeta$ we see that all hypotheses hold, so the proof of the lemma is complete.
Note that $r_1, \dots, r_n$ are independent, so $r$ defines a product measure $\mu_{q''}$. For fixed $h := (S, r')$ and $j = 1, 2$ let Since $q'' = q$, we have $\mu_{p_j}(A_j) = \mathbb{E}_h(\nu(F^h_j))$ for $j = 1, 2$ and $\mu_q((A \times A)^V_w) = \mathbb{E}_h(\nu(F^h_w))$. In the remainder of the proof we will show that $\mathbb{P}_h(\nu(F^h_w) > (1 - \varepsilon/2)^n) > (1 - \delta')^n$, where $\delta \ll \delta' \ll \zeta$. This will imply the theorem. To achieve this, we will show that for 'good' $h$ we can apply Lemma 8.4 to $F^h_1$ and $F^h_2$, with uniform product measure and the array $X^h := (v^i_{j,j'} : i \in S, j, j' \in \{0,1\})$. First we define some bad events for $h$ and show that they are unlikely.
The last bad event is that we do not have robust transfers. Let $P^h = (P^h_m : m \in [M])$, where $P^h_m$ is the set of $i \in P_m$ such that $u_m$ is an $i$-transfer in $X^h$. Recalling that $u_m$ is an $i$-transfer in $V$ via $(0,1)$ and $(0,1)$ for all $i \in P_m$, we have $i \in P^h_m$ whenever $i \in S$, so $\mathbb{E}|P^h_m| \ge \kappa\gamma n$. By Chernoff's inequality, the bad event $B_3$ that some $|P^h_m| < \kappa\gamma n/2$ satisfies $\mathbb{P}(B_3) < 2Me^{-\kappa^2\gamma^2 n/8}$. Now let $G$ be the good event for $h$ that $\nu(F^h_1)\nu(F^h_2) > (1 - \delta')^n$. By Cauchy-Schwarz and Theorem 7.1 we have as required to prove the theorem.
Proof of Theorem 1.9
In this section we will prove Theorem 1.9. Let $X = (\{0,1\}^n)^V_z$, as in the statement of Theorem 1.9. The proof splits naturally into two pieces according to the VC-dimension of $(X \times X)^{V_\cap}_w$. The next subsection shows that in the case of large VC-dimension, case (i) or (ii) of Theorem 1.9 holds; the following subsection shows that case (iii) holds in the case of small VC-dimension.

Large VC-dimension
Here we implement the strategy discussed in subsection 1.4: we consider the maximum entropy measure $\mu_q$ that represents $(X \times X)^{V_\cap}_w$, and distinguish cases (i) and (ii) of Theorem 1.9 according to whether its marginals $\mu_{\bar p}$ are close to the maximum entropy measure $\mu_p$ representing $X$. Throughout this subsection we use the following notation.
where $0$ denotes the zero vector in $\mathbb{Z}^D$. Let $z, w \in \mathbb{Z}^D$ and $X = (\{0,1\}^n)^V_z$. We identify $(X \times X)^{V_\cap}_w$ with $(J^n)^{\mathcal{V}}_x$, where $x := (z, z, w)$. We define We denote the marginals of $\mu_q$ by $\mu_{\bar p}$ (both marginals are equal).
Next we show κ-boundedness of the above measures under our usual assumptions on V (and justify the final statement of the above definition).
We conclude with the main result of this subsection, that there is a large subset of X with no w-intersection.
Proof. We may assume $|X| \ge (1 + \varepsilon)^n$, as otherwise we can take $B_{\mathrm{empty}} = \emptyset$. Take $\alpha$ and $\xi$ such that Then $|X_0 \cup X_1| \ge |X|/2$ by Lemma 9.4. The remainder of the proof splits into two similar cases according to which $X_j$ is large; we will give full details for the case $j = 1$ and then indicate the necessary modifications for $j = 0$.
Next we will define $B_{\mathrm{empty}}$. We randomly select $S \subset [n]$ with $|S| = \xi n$, and let $C = \{x \in X'' : S \subset S_1(x)\}$. We say that $x \in C$ is isolated if there is no $x' \in C$ with $V_\cap(x, x') = w$. We let $B_{\mathrm{empty}}$ be the set of isolated $x \in C$. Then by definition we have $(B_{\mathrm{empty}} \times B_{\mathrm{empty}})^{V_\cap}_w = \emptyset$. Now we will show that $\mathbb{E}|B_{\mathrm{empty}}| \ge (1 - \varepsilon)^n |X|$. As $\mathbb{E}|C| = \binom{t}{\xi n}\binom{n}{\xi n}^{-1}|X''|$, $|X''| \ge (1 - \xi^{1/2})^n |X|$ and $\xi \ll \varepsilon$, it suffices to show $\mathbb{P}(x \text{ is isolated} \mid x \in C) \ge 1/2$ for all $x \in X'$. To see this, we condition on $x \in C$ and note that $S$ is equally likely to be any subset of $S_1(x)$ of size $\xi n$. Consider any $y := x'|_{S_1(x)} \in N^1_w(x)$: we have $|S_1(x) \setminus S_1(y)| \ge \xi n$ by definition of $X''$, and $x' \in C \Leftrightarrow S \subset S_1(y)$. For fixed $y$ we have $\mathbb{P}(S \subset S_1(y)) \le \binom{t - \xi n}{\xi n}\binom{t}{\xi n}^{-1} \le (1 - \xi)^{\xi n}$. By definition of $X_1$ there are at most $(1 + \alpha)^n$ choices of $y \in N^1_w(x)$, so taking a union bound, as $\alpha \ll \xi$, the probability that $x$ is not isolated given $x \in C$ is $o(1)$, so at most $1/2$, as required.
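The binomial-ratio estimate $\binom{t - \xi n}{\xi n}\binom{t}{\xi n}^{-1} \le (1 - \xi)^{\xi n}$ (valid since $t \le n$, so each factor $(t - \xi n - j)/(t - j)$ is at most $1 - \xi$) can be checked numerically. A small sketch with toy parameters (the values of $n$, $\xi$, $t$ below are hypothetical):

```python
from math import comb

def containment_prob(t: int, m: int) -> float:
    """P(S is contained in S_1(y)) when S is a uniform m-subset of a t-set
    of which at most t - m elements lie in S_1(y): C(t-m, m) / C(t, m)."""
    return comb(t - m, m) / comb(t, m)

n, xi = 100, 0.1
m = int(xi * n)            # |S| = xi * n
for t in [40, 60, 100]:    # t = |S_1(x)| <= n
    assert containment_prob(t, m) <= (1 - xi) ** m
```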
Similarly, if $|X_0| \ge |X|/4$, we define $X'$ and $X''$ in the same way for $X_0$, and let $C = \{x \in X'' : S \subset S_0(x)\}$. We use the same definition of $B_{\mathrm{empty}}$ as before, and bound the probability that $x$ is not isolated given $x \in C$ by taking a union bound over at most $(1 + \alpha)^n$ choices of $y := x'|_{S_0(x)} \in N^0_{z-w}(x)$. The remaining details of this case are the same, so we omit them.

Solution of Kalai's Conjecture
In this section we prove Theorem 1.3, which is our solution to Kalai's Conjecture 1.2. We give the proof in the first subsection, then generalise it in the following subsection to show that supersaturation of the type conjectured by Kalai is quite rare.

Proof of Theorem 1.3
As described in subsection 1.4, the supersaturation conclusion desired by Conjecture 1.2 (case (i) of Theorem 1.9) requires the maximum entropy measure $\mu_q$ that represents $(X \times X)^{V_\cap}_w$ to have marginals $\mu_{\bar p}$ close to $\mu_p$. Recall that in Definition 9.1 we constructed $\mu_q$ as $\mu^{\mathcal{V}}_x$, where $\mathcal{V}$ is a certain $(n, \{0,1\} \times \{0,1\})$-array in $\mathbb{Z}^{3D}$ and $x := (z, z, w)$. In this subsection we work with the Kalai vectors $V = (v_i)_{i \in [n]}$ with $v_i = (1, i)$, so $D = 2$. In the notation of Conjecture 1.2 we have $z = (k, s)$ and $w = (t, w)$. Sometimes we will indicate the dependence on $n$ as a subscript in our notation, e.g. writing $z_n = (k_n, s_n) = (\lfloor \alpha_1 n \rfloor, \alpha_2 n^2)$. Our proof will use the following concrete description of the maximum entropy measures as Boltzmann distributions.
Proof. By the theory of Lagrange multipliers, $p$ is a stationary point of the entropy subject to the given constraints; solving the resulting stationarity equations gives the stated formula.
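To make the Boltzmann description concrete, here is a minimal numerical sketch (illustrative only; the values of $n$, $k$, $s$ are toy parameters, and we assume the stated formula takes the logistic form $p_i = (1 + e^{-\lambda \cdot v_i})^{-1}$ for the Kalai vectors $v_i = (1, i)$). It fits the multipliers $\lambda \in \mathbb{R}^2$ by Newton's method on the strictly convex dual problem:

```python
import math

def sigma(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))

def fit_boltzmann(n: int, k: float, s: float, iters: int = 100):
    """Find lambda = (l1, l2) with p_i = sigma(l1 + l2*i), matching the
    constraints sum p_i = k and sum i*p_i = s (Kalai vectors v_i = (1, i))."""
    l1 = l2 = 0.0
    for _ in range(iters):
        ps = [sigma(l1 + l2 * i) for i in range(1, n + 1)]
        g1 = sum(ps) - k
        g2 = sum(i * p for i, p in zip(range(1, n + 1), ps)) - s
        w = [p * (1 - p) for p in ps]          # Hessian weights sigma'(.)
        h11 = sum(w)
        h12 = sum(i * wi for i, wi in zip(range(1, n + 1), w))
        h22 = sum(i * i * wi for i, wi in zip(range(1, n + 1), w))
        det = h11 * h22 - h12 * h12
        l1 -= (h22 * g1 - h12 * g2) / det      # Newton step on the dual
        l2 -= (h11 * g2 - h12 * g1) / det
    return l1, l2

l1, l2 = fit_boltzmann(10, 4.0, 18.0)
ps = [sigma(l1 + l2 * i) for i in range(1, 11)]
assert abs(sum(ps) - 4.0) < 1e-6
assert abs(sum(i * p for i, p in enumerate(ps, 1)) - 18.0) < 1e-6
```

The dual problem is strictly convex (its Hessian is the positive definite matrix computed above), which is why Newton's method converges and the maximum entropy measure is unique.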
Proof. We suppose that case (i) does not hold and prove that case (ii) holds. We can fix $\kappa > 0$ and a sequence $n_m \to \infty$ such that each $\mu_{q^{(n_m)}}$ is $\kappa$-bounded. By Lemma 10.1, we have $\pi_{n_m} \in \mathbb{R}^4$ such that $q^{(n_m)} = q^{(n_m)}_{\pi_{n_m}}$. By $\kappa$-boundedness, each $\pi_{n_m} \in [-C, C]^4$ for some $C = C(\kappa) \in \mathbb{R}$. By compactness of $[-C, C]^4$, we can pass to a convergent subsequence, so by relabelling we can assume $\pi_{n_m} \to \pi \in \mathbb{R}^4$.
The first equivalence is immediate from the above proof, and the second from Theorem 4.8.
We conclude this subsection with the solution to Kalai's conjecture.

Uniqueness in higher dimensions
In this subsection we illustrate how the method used to prove Theorem 1.3 can be applied in a broader context. Throughout this subsection we work with the following setting.
• Write $X_n = (\{0,1\}^n)^{V_n}_{z_n}$ and suppose that $|X_n| > (1 + \eta)^n$, where $\eta = \eta(\alpha) > 0$ is fixed.
• The arrays $V_n$ have a 'scaling limit': there is a positive measurable function $p : [0,1]^D \to \mathbb{R}$ with $\int_{[0,1]^D} p(x)\,dx = 1$ such that for any measurable set $B \subset [0,1]^D$ we have The assumption that $(V_n, z_n)$ is robustly generic is in fact redundant, as it can be shown to follow from the scaling limit assumption, but for the sake of brevity we omit this deduction. We say that $(\alpha, \beta)$ is $(n, \delta, \varepsilon)$-good if the corresponding $V_n$-intersection problem exhibits 'full supersaturation' analogous to that in Conjecture 1.2, i.e. any $A \subset X_n$ with $|A| \ge (1 - \delta)^n |X_n|$ satisfies the corresponding lower bound on $|(A \times A)^{V_n}_w|$. We will outline the proof of the following analogue of Theorem 1.3, which shows that if we exclude the case of 'uniformly random sets' (i.e. $\alpha = (1/2, \dots, 1/2)$) then 'full supersaturation' only occurs for one specific value of $\beta$.
Theorem 10.5. In the above setting, if $\alpha \neq (1/2, \dots, 1/2)$ then there is $\beta^* = \beta^*(\alpha) \in (0,1)^D$ such that for Similarly to the previous subsection, we wish to determine when $\mu_q = \mu^{\mathcal{V}}_x$ (with $\mathcal{V}$ and $x$ as in Definition 9.1) has marginals close to $\mu_p = \mu^V_z$ (here we are omitting the subscript $n$ from our notation). If these measures are $\kappa$-bounded, Lemma 10.1 gives $\lambda \in \mathbb{R}^D$ such that and $\pi_1, \pi_2 \in \mathbb{R}^D$ such that $q^i_{j,j'} = (q^{(n)}$ with $Z_{\pi_1,\pi_2}(x) = 1 + 2e^{\pi_1 \cdot x} + e^{\pi_2 \cdot x}$.
Again we study the marginal problem for $\mu_q$ and $\mu_p$ via the limit marginal problem of characterising $\lambda$ and $\pi$ such that $f_\lambda(x) = g_{\pi_1,\pi_2}$. The limit versions are $h(\lambda) = \alpha$ and $h^*(\pi_1, \pi_2) = \beta$, where Our next lemma is analogous to Lemma 10.2.
Lemma 10.6. (i) $h$ is a homeomorphism between $\mathbb{R}^D$ and $h(\mathbb{R}^D)$. (ii) For large $n$ we have $\mu^{V_n}_{z_n} = \mu_p$ for some $p = p^{(n)}_{\lambda^{(n)}}$, where $\lambda^{(n)} \to \lambda = h^{-1}(\alpha)$. We omit the proof of Lemma 10.6, as it is the same as that of Lemma 10.2, except in one detail which we will now check, namely that the principal minors of the Jacobian of $h$ are positive. To see this, note that the Jacobian $J$ has entries $J_{de} = \int_{[0,1]^D} x_d x_e f_\lambda(x)(1 + e^{\lambda \cdot x})^{-1} p(x)\,dx$. For any $y \in \mathbb{R}^D$ we have $y^T J y = \int_{[0,1]^D} |\langle x, y \rangle|^2 f_\lambda(x)(1 + e^{\lambda \cdot x})^{-1} p(x)\,dx$. As $f_\lambda$ and $p$ are positive, we have $y^T J y > 0$ whenever $y \neq 0$, as required. We also have the following analogue of Lemma 10.3; again, we omit the similar proof.
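The positivity argument is simply the observation that $J$ is a Gram matrix with positive weights. A quick numerical illustration (discretising the integral on a grid with $D = 2$; the particular $\lambda$, the logistic choice of $f_\lambda$ and the choice $p \equiv 1$ are hypothetical, since all that matters for the argument is positivity of the weights):

```python
import math
import random

random.seed(2)
D = 2
lam = (0.7, -0.4)   # an arbitrary (hypothetical) value of lambda

def weight(x):
    # w(x) = f_lam(x) * (1 + e^{lam.x})^{-1} * p(x), taking f_lam logistic
    # and p == 1; the argument only uses w(x) > 0.
    t = sum(l * xi for l, xi in zip(lam, x))
    return (1.0 / (1.0 + math.exp(-t))) / (1.0 + math.exp(t))

# Discretise [0,1]^2 and form J_{de} = sum_x x_d x_e w(x): a Gram matrix.
grid = [(a / 10, b / 10) for a in range(11) for b in range(11)]
J = [[sum(x[d] * x[e] * weight(x) for x in grid) for e in range(D)]
     for d in range(D)]

# y^T J y = sum_x w(x) <x, y>^2 >= 0, with equality only at y = 0,
# hence the principal minors of J are positive.
for _ in range(100):
    y = (random.uniform(-1, 1), random.uniform(-1, 1))
    quad = sum(J[d][e] * y[d] * y[e] for d in range(D) for e in range(D))
    assert quad >= 0
assert J[0][0] > 0 and J[0][0] * J[1][1] - J[0][1] * J[1][0] > 0
```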
The uniqueness in Theorem 10.5 is explained by the following lemma which solves the limit marginal problem.
We fix a partition of $[n]$ into sets $S_1, \dots, S_M$ so that Note that $|B| \ge |X|/n^M$. The following lemma shows that $\mu_{p'}$ is a good approximation to the maximum entropy measure $\mu_p$.
We use a similar construction of an empirical measure that represents $w$-intersections. Let $G$ be the graph with $V(G) = B$, where $AB \in E(G)$ if $|A \cap B|_V = w$. We define the type of $AB \in E(G)$ Note that each $\mu_{q(t)}$ has both marginals $\mu_{p'}$ and $\|V_\cap(\mu_{q(t)}) - w\| \le \|p - p'\|_1 \le \kappa n$. We can assume $H(\mu_q) < \log_2 |E(G)| - \varepsilon n/2$ for all $q \in Q$, otherwise the proof is complete. We fix a type $\bar t$ occurring at least $e(G)/n^{2M}$ times and set $\bar q = q(\bar t)$.
Then $H(\mu_{\bar q}) \ge \log_2(e(G)/n^{2M})$, so $\bar q \notin Q$ by (3). The following lemma will show that all empirical measures associated to edges of $G$ are close to $\bar q$; we will then use this and $\bar q \notin Q$ in Lemma 11.5 to find a large independent set in $G$, which will complete the proof of Theorem 11.1.
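The entropy lower bound here rests on the standard subadditivity of entropy: the empirical product measure of any family $F$ has entropy at least $H(\text{uniform on } F) = \log_2 |F|$. A small illustration for a family of $0/1$ vectors (the family below is randomly generated for the example):

```python
import math
import random

random.seed(3)
n = 8
# A random family F of 40 distinct 0/1 vectors of length n.
F = random.sample([tuple((v >> i) & 1 for i in range(n)) for v in range(2 ** n)], 40)

# Empirical coordinate frequencies and the binary entropy function.
p = [sum(x[i] for x in F) / len(F) for i in range(n)]
def h(t: float) -> float:
    return 0.0 if t in (0.0, 1.0) else -t * math.log2(t) - (1 - t) * math.log2(1 - t)

# Subadditivity of entropy: sum_i h(p_i) >= log2 |F|.
assert sum(h(pi) for pi in p) >= math.log2(len(F))
```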
For the proof we require the following bound analogous to (3) for a wider class of measures.

Proof.
We will obtain the required bound from (3) applied to a measure in $Q$ close to $\mu_{q'}$. Recall that $p'$ is $\kappa'$-bounded and $\|p - p'\|_1 \le \kappa n$. Consider $q''$ that minimises $\|q'' - q'\|_1$ subject to $\mu_{q''}$ being $\kappa$-bounded and having marginals $\mu_p$. For each $i$ we can construct $\{(q'')^i_{j,j'}\}$ from $\{(q')^i_{j,j'}\}$ by moving probability mass $|p_i - p'_i|$ to create the correct marginals, and moving a further mass of at most $2\kappa$ while maintaining the same marginals to ensure $\kappa$-boundedness. Therefore $\|q'' - q'\|_1 \le 6\kappa n$. Now we will perturb $q''$ to obtain $q \in Q$, i.e. we maintain $\kappa$-boundedness and the same marginals $\mu_p$, and obtain $V_\cap(\mu_q) = w$.
The following lemma completes the proof of Theorem 11.1.
Therefore, for any $U \subset T_1$ of size $u = \lceil 4\lambda^{1/2} n \rceil$, the family $A_U := \{B \in B^* : U \subset B\}$ forms an independent set in $G^*$. Consider a uniformly random choice of such $U$. For any $B \in B^*$, as $|B \cap T_1| \ge \kappa'|T_1|/2$ we have $\mathbb{P}(B \in A_U) \ge (\kappa'/4)^u \ge (1 - \lambda^{1/3})^n$, as $\lambda \ll \kappa'$. Therefore $\mathbb{E}_U|A_U| = \sum_{B \in B^*} \mathbb{P}(B \in A_U) \ge (1 - \varepsilon)^n |X|$. Thus for some $U$ we obtain an independent set $A_U$ of at least this size, which completes the proof of Case 1.

Concluding remarks
There are several natural directions in which to explore potential generalisations of our results: instead of associating vectors in Z D to each coordinate we may consider values in another (abelian) group G, and we may consider more general functions of the coordinate values, e.g. a (low degree) polynomial (e.g. a quadratic for application to the Borsuk conjecture) rather than a linear function (is there a 'local' version of Kim-Vu [24] polynomial concentration?). Even for linear functions in one dimension, our setting seems somewhat related to some open problems in Additive Combinatorics, such as the independence number of Paley graphs, but here our assumptions seem too restrictive (one cannot use transfers). We may also ask when better bounds hold, e.g. for G = Z/6Z we recall an open problem of Grolmusz [15]: is there a subexponential bound for set systems where the size of each set is divisible by 6 but each pairwise intersection is not divisible by 6?
Our results may be interpreted as giving robust statistics in the theory of social choice. Suppose that we represent a voter by an opinion vector $x \in J^n$, where each $x_i$ represents an opinion on the $i$th issue; for example, when $|J| = 2$ each issue could be a question with a yes/no answer. Then we can represent a population of voters by a probability measure $\mu$ on $J^n$, where $\mu(x)$ is the proportion of voters with opinion $x$. Now suppose that we want to compare two (or more) voters. One natural measure of comparison is to assign a score to each opinion and calculate the total score on opinions where they agree. If this is too simplistic, then we could assign score vectors in some $\mathbb{R}^D$, where $D$ is small enough to give a genuine compression of the data, but large enough to capture the varied nature of the issues: we compare $x$ and $x'$ according to $V_\cap(x, x')$. Taking the perspective of robust statistics (see [16]), it is natural to ask whether this statistic is sensitive to our uncertainty in the probability measure that represents the population as a whole: Theorem 12.2 (with the remark following it) gives one possible answer.
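The comparison statistic in the social-choice example can be computed directly. A small hypothetical sketch (taking the simplest reading, in which $V_\cap(x, x')$ totals the score vectors over the issues on which the two voters agree; the precise definition of $V_\cap$ for sequences is given earlier in the paper):

```python
def agreement_score(x, y, V):
    """Compare two opinion vectors by the total score vector over the
    issues on which they agree (one reading of V_cap(x, x'))."""
    D = len(V[0])
    return tuple(
        sum(V[i][d] for i in range(len(x)) if x[i] == y[i]) for d in range(D)
    )

# Three yes/no issues, each scored by a vector in R^2 (toy values).
V = [(1, 1), (1, 2), (1, 3)]
x, y = (1, 0, 1), (1, 1, 1)   # the voters agree on issues 0 and 2
assert agreement_score(x, y, V) == (2, 4)
```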