An effective equidistribution result for $SL(2,R)\ltimes(R^2)^{\oplus k}$ and application to inhomogeneous quadratic forms

Let $G=$SL$(2,R)\ltimes(R^2)^{\oplus k}$ and let $\Gamma$ be a congruence subgroup of SL$(2,Z)\ltimes(Z^2)^{\oplus k}$. We prove a polynomially effective asymptotic equidistribution result for special types of unipotent orbits in $\Gamma\backslash G$ which project to pieces of closed horocycles in SL$(2,Z)\backslash$SL$(2,R)$. As an application, we prove an effective quantitative Oppenheim type result for the quadratic form $(m_1-\alpha)^2+(m_2-\beta)^2-(m_3-\alpha)^2-(m_4-\beta)^2$, for $(\alpha,\beta)$ of Diophantine type, following the approach by Marklof [24] using theta sums.


Introduction
The results of M. Ratner on measure rigidity and equidistribution of orbits of a unipotent flow [32], [33] play a fundamental role in homogeneous dynamics. These results also have many applications outside of dynamics, ranging from problems in number theory to mathematical physics. In recent years there has been an increased interest in obtaining effective versions of Ratner's results in special cases, i.e., in providing an explicit rate of density or equidistribution for the orbits of a (non-horospherical) unipotent flow; cf. [12], [6], [28], [21], [39], [3], [31]. In particular, in [39] and [3], effective equidistribution results were obtained for orbits of a 1-parameter unipotent flow on $(\mathrm{SL}(2,\mathbb Z)\ltimes\mathbb Z^2)\backslash(\mathrm{SL}(2,\mathbb R)\ltimes\mathbb R^2)$, using Fourier analysis and methods from analytic number theory, and in the very recent paper [31], building on similar methods, effective equidistribution of diagonal translates of certain orbits in $(\mathrm{SL}(3,\mathbb Z)\ltimes\mathbb Z^3)\backslash(\mathrm{SL}(3,\mathbb R)\ltimes\mathbb R^3)$ was established. Our purpose in the present paper is to prove results of a similar nature for homogeneous spaces of the group $G=\mathrm{SL}(2,\mathbb R)\ltimes(\mathbb R^2)^{\oplus k}$ for $k\ge2$, and to apply these to derive an effective quantitative Oppenheim type result for a certain family of inhomogeneous quadratic forms of signature (2,2). Here $(\mathbb R^2)^{\oplus k}$ denotes the direct sum of $k$ copies of $\mathbb R^2$, each provided with the standard action of $\mathrm{SL}(2,\mathbb R)$.
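We now turn to a precise description of our setting. We represent vectors by column matrices. Throughout the paper we will identify $(\mathbb R^2)^{\oplus k}$ with $\mathbb R^{2k}$ so that the action of $G':=\mathrm{SL}(2,\mathbb R)$ is given by
$$\begin{pmatrix}a&b\\c&d\end{pmatrix}\begin{pmatrix}v_1\\v_2\end{pmatrix}=\begin{pmatrix}av_1+bv_2\\cv_1+dv_2\end{pmatrix},\qquad v_1,v_2\in\mathbb R^k.$$
The elements of $G=\mathrm{SL}(2,\mathbb R)\ltimes(\mathbb R^2)^{\oplus k}$ are then represented by pairs $(M,v)\in G'\times\mathbb R^{2k}$, with the multiplication law
$$(M,v)(M',v')=(MM',\,v+Mv').$$
Let
$$a(y)=\begin{pmatrix}\sqrt y&0\\0&1/\sqrt y\end{pmatrix}\quad\text{and}\quad u(x)=\begin{pmatrix}1&x\\0&1\end{pmatrix}\qquad(y>0,\ x\in\mathbb R).$$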
We will always view $G'=\mathrm{SL}(2,\mathbb R)$ as a subgroup of $G$ through $M\mapsto(M,0)$; in particular, $a(y)$ and $u(x)$ are also elements of $G$. We set $\overline\Gamma=\mathrm{SL}(2,\mathbb Z)\ltimes(\mathbb Z^2)^{\oplus k}$.
In our notation, this is the subgroup of all $(M,v)\in G$ with $M\in\mathrm{SL}(2,\mathbb Z)$ and $v\in\mathbb Z^{2k}$. Given a subgroup $\Gamma$ of $\overline\Gamma$ of finite index, we consider the homogeneous space
$$X=\Gamma\backslash G.$$
As we will detail below, this space is a torus bundle over a finite cover of the familiar 3-dimensional homogeneous space $\mathrm{SL}(2,\mathbb Z)\backslash\mathrm{SL}(2,\mathbb R)$ classifying unimodular lattices in $\mathbb R^2$. We fix $\mu$ to be the (left and right invariant) Haar measure on $G$, normalized so as to induce a probability measure on $X$, which we also denote by $\mu$.
For any $a,b\in\mathbb R^k$ we denote by $ab$ the standard scalar product, $ab=a_1b_1+\dots+a_kb_k$.
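The group structure just described can be sanity-checked numerically. The following is a minimal sketch, assuming the semidirect product law $(M,v)(M',v')=(MM',v+Mv')$ and the block action of $\mathrm{SL}(2,\mathbb R)$ on $v=(v_1,v_2)$ with $v_1,v_2\in\mathbb R^k$; the helper names (`act`, `mul`, `a_mat`, `u_mat`, `close`) are illustrative and not taken from the paper.

```python
import math

def matmul(M, Mp):
    """Product of two 2x2 matrices given as ((a, b), (c, d))."""
    (a, b), (c, d) = M
    (e, f), (g, h) = Mp
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def act(M, v):
    """Act with M on v = (v1, v2), v1 and v2 lists of length k (assumed block action)."""
    (a, b), (c, d) = M
    v1, v2 = v
    return ([a * s + b * t for s, t in zip(v1, v2)],
            [c * s + d * t for s, t in zip(v1, v2)])

def mul(g, gp):
    """Assumed group law (M, v)(M', v') = (MM', v + M v')."""
    (M, v), (Mp, vp) = g, gp
    w1, w2 = act(M, vp)
    return (matmul(M, Mp),
            ([s + t for s, t in zip(v[0], w1)],
             [s + t for s, t in zip(v[1], w2)]))

def embed(M, k):
    """View M in SL(2,R) as the element (M, 0) of G."""
    return (M, ([0.0] * k, [0.0] * k))

def a_mat(y, k):
    return embed(((math.sqrt(y), 0.0), (0.0, 1.0 / math.sqrt(y))), k)

def u_mat(x, k):
    return embed(((1.0, x), (0.0, 1.0)), k)

def flatten(g):
    M, v = g
    return [*M[0], *M[1], *v[0], *v[1]]

def close(g, gp, tol=1e-12):
    """Entrywise comparison of two group elements up to a tolerance."""
    return all(abs(s - t) <= tol for s, t in zip(flatten(g), flatten(gp)))
```

For instance, `close(mul(u_mat(x, k), a_mat(y, k)), mul(a_mat(y, k), u_mat(x / y, k)))` checks the commutation relation $u(x)a(y)=a(y)u(y^{-1}x)$, which is what turns the $x$-integration in Theorem 1.1 into an integral along a unipotent orbit.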
Theorem 1.1. Let $\Gamma$ be a subgroup of $\overline\Gamma=\mathrm{SL}(2,\mathbb Z)\ltimes(\mathbb Z^2)^{\oplus k}$ of finite index. Fix $\xi=\left(\begin{smallmatrix}\xi_1\\\xi_2\end{smallmatrix}\right)$ in $\mathbb R^{2k}$ subject to the condition that there does not exist any $m\in\mathbb Z^k\setminus\{0\}$ for which both $m\xi_1$ and $m\xi_2$ are integers. Then for any Borel probability measure $\lambda$ on $\mathbb R$ which is absolutely continuous with respect to the Lebesgue measure, and any bounded continuous function $f$ on $X=\Gamma\backslash G$,
$$\lim_{y\to0}\int_{\mathbb R}f\bigl(\Gamma(1_2,\xi)u(x)a(y)\bigr)\,d\lambda(x)=\int_Xf\,d\mu.\tag{1}$$

In view of the relation $u(x)a(y)=a(y)u(y^{-1}x)$, the integration in the left hand side of (1) is along an orbit of the unipotent flow $U^t:\Gamma g\mapsto\Gamma gu(t)$ ($t\in\mathbb R$) on $X$. Let $D:G\to G'$ be the natural projection sending $(M,v)$ to $M$; then $D(\Gamma)$ is a finite index subgroup of $\mathrm{SL}(2,\mathbb Z)$, and $D$ induces a projection map from $X$ to $X':=D(\Gamma)\backslash G'$, which we also call $D$; this realizes $X$ as a torus bundle over the space $X'$, which in turn is a finite cover of $\mathrm{SL}(2,\mathbb Z)\backslash\mathrm{SL}(2,\mathbb R)$. The orbits which appear in (1) are exactly those orbits of the flow $U^t$ which project to a closed horocycle in $X'$ around its cusp at $\infty$. Letting $y$ decrease towards zero means that we are considering expanding translates of the initial orbit $x\mapsto\Gamma(1_2,\xi)u(x)$.

Let us note that the condition imposed on $\xi$ in Theorem 1.1 cannot be weakened. Indeed, for any $m\in\mathbb Z^k\setminus\{0\}$, set
$$X_m=\bigl\{\Gamma(M,v)\in X\,:\,mv_1\in\mathbb Z\text{ and }mv_2\in\mathbb Z\bigr\},\qquad v=\begin{pmatrix}v_1\\v_2\end{pmatrix}.$$
This is a closed embedded submanifold of codimension 2 in $X$. If both $m\xi_1$ and $m\xi_2$ are integers then
$$\Gamma(1_2,\xi)u(x)a(y)\in X_m\quad\text{for all }x\in\mathbb R,\ y>0,\tag{2}$$
and therefore the curve certainly cannot become equidistributed in $X$, i.e. (1) fails for some $f$. (For example, consider any bounded continuous $f\ge0$ such that $f|_{X_m}\equiv0$ while $\int_Xf\,d\mu>0$.)

Marklof in [24, Thm. 5.7] proved Theorem 1.1 in the special case of $\xi_1=0$, and then in [26, Thm. 3.1] in the special case of $\xi_2=0$. Note that if $\xi_1=0$, the condition on $\xi_2$ in the theorem becomes that 1 together with the $k$ components of $\xi_2$ should be linearly independent over $\mathbb Q$ (and vice versa if $\xi_2=0$).
Our main results in the present paper are Theorem 1.2 and Theorem 1.3 below, which give effective versions of these two special cases of Theorem 1.1, under the further requirement that $\Gamma$ is a congruence subgroup of $\overline\Gamma$.
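To prepare for the statement of the main theorems we introduce some further notation. For a positive integer $N$, $\Gamma(N)$ denotes the principal congruence subgroup of $\mathrm{SL}(2,\mathbb Z)$ of level $N$:
$$\Gamma(N)=\Bigl\{\begin{pmatrix}a&b\\c&d\end{pmatrix}\in\mathrm{SL}(2,\mathbb Z)\,:\,\begin{pmatrix}a&b\\c&d\end{pmatrix}\equiv\begin{pmatrix}1&0\\0&1\end{pmatrix}\bmod N\Bigr\}.$$
We will consider $X=\Gamma\backslash G$ where $\Gamma$ is a subgroup of $\overline\Gamma$ of the form $\Gamma=\Gamma(N)\ltimes\mathbb Z^{2k}$. (The case of an arbitrary congruence subgroup of $\overline\Gamma$ can easily be reduced to the case of $\Gamma=\Gamma(N)\ltimes\mathbb Z^{2k}$, by using the fact that for any $q\in\mathbb Z^+$, the map $(M,v)\mapsto(M,qv)$ is an automorphism of $G$.) We introduce the following cuspidal height function, for $(M,v)\in G$:
$$\mathcal Y(M,v)=\sup\bigl\{\operatorname{Im}\gamma M(i)\,:\,\gamma\in\mathrm{SL}(2,\mathbb Z)\bigr\},$$
where in the right hand side we use the standard action of $G'=\mathrm{SL}(2,\mathbb R)$ on the Poincaré upper half plane $\mathbb H=\{\tau=u+iv\in\mathbb C:v>0\}$. Then $\mathcal Y(M,v)\ge\sqrt3/2$ for all $(M,v)\in G$. Note that $\mathcal Y(M,v)$ depends only on the coset $\Gamma(M,v)$, and in particular $\mathcal Y$ can be viewed as a function on $X$. Given $p_1,p_2,\ldots\in X$, we have $\mathcal Y(p_j)\to\infty$ if and only if the sequence $p_1,p_2,\ldots$ leaves all compact subsets of $X$.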
For $m\ge0$ and $a\in\mathbb R$, we let $C^m_a(X)$ be the space of all $m$ times continuously differentiable functions on $X$, all of whose derivatives up to order $m$ are $\ll\mathcal Y^{-a}$ throughout $X$. In more precise terms, let $\mathfrak g$ be the Lie algebra of $G$, and fix a basis $X_1,\ldots,X_{2k+3}$ of $\mathfrak g$ (we make a definite choice of this basis; cf. (18) below). Each $Y\in\mathfrak g$ can be realised as a left invariant differential operator on functions on $G$, and thus also a differential operator on $X=\Gamma\backslash G$, which we will also denote by $Y$. For any $f\in C^m(X)$, set
$$\|f\|_{C^m_a}:=\sum_{\deg D\le m}\ \sup_{p\in X}\mathcal Y(p)^a\,\bigl|(Df)(p)\bigr|,$$
where the sum is taken over all monomials $D$ in $X_1,\ldots,X_{2k+3}$ of degree $\le m$. In particular, $\|\cdot\|_{C^0_0}$ is the supremum norm. Then $C^m_a(X)$ is the space of all $f\in C^m(X)$ with $\|f\|_{C^m_a}<\infty$. For any integer $n\ge0$ and real numbers $a\ge0$ and $p\in[1,+\infty]$, we introduce the weighted Sobolev norm $S_{p,a,n}(h)$ on functions $h\in C^n(\mathbb R)$ through
$$S_{p,a,n}(h)=\sum_{j=0}^n\bigl\|(1+|x|)^a\,\partial^jh(x)\bigr\|_{L^p}.$$
Let us make some comments on these results. Firstly, note that for any fixed $\xi_2\in\mathbb R^k$ and $\beta>k$, one has $\delta_{\beta,\xi_2}(y^{-1/2})\to0$ as $y\to0$ if and only if $r\xi_2\notin\mathbb Q$ for all $r\in\mathbb Z^k\setminus\{0\}$. Hence Theorem 1.2 indeed gives an effective version of Theorem 1.1 in the special case when $\Gamma$ is a congruence subgroup of $\overline\Gamma$ and $\xi_1=0$. Similarly Theorem 1.3 gives an effective version of Theorem 1.1 when $\xi_2=0$. Secondly, as we will explain in Section 3 below (see especially Lemmata 3.1 and 3.3, and the relation (28)), for a sufficiently large $\beta$ and $\xi_2$ subject to a Diophantine condition, the majorant function $\delta_{\beta,\xi_2}(T)$ has a power rate decay in $T$ as $T\to\infty$. In particular for any $\varepsilon>0$, $\delta_{\beta,\xi_2}(T)\ll T^{\varepsilon-1}$ holds for all $\xi_2\in\mathbb R^k$ outside a set of Hausdorff dimension $<k$. Note that for any such $\beta$ and $\xi_2$, the bound in Theorem 1.2 decays like $y^{\frac14-\varepsilon}$ as $y\to0$. An analogous statement holds for Theorem 1.3.
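One should also note that the integral in Theorem 1.3 (but not the one in Theorem 1.2) runs over a closed orbit in $X$; indeed the point $\Gamma\bigl(1_2,\left(\begin{smallmatrix}\xi_1\\0\end{smallmatrix}\right)\bigr)u(x)$ is invariant under $x\mapsto x+N$, since $u(t)\left(\begin{smallmatrix}\xi_1\\0\end{smallmatrix}\right)=\left(\begin{smallmatrix}\xi_1\\0\end{smallmatrix}\right)$ for all $t$, and $u(N)\in\Gamma(N)$. Hence it is only natural that the bound obtained in Theorem 1.3 is invariant under translations of $h$.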
We have made no effort to optimize the dependence on the test functions f and h in the theorems; rather, we have simply imposed as much smoothness and decay of these as needed to comfortably reach the best decay rate with respect to y that our method can give.
The proofs of Theorems 1.2 and 1.3 are given in Sections 4-8; the basic approach is to use Fourier decomposition with respect to the torus fiber variable, just as in [39]; however there are several new difficulties that have to be tackled. In particular, the Γ ′ -orbits in Z 2k , which are used to partition the Fourier decomposition, are more complicated for k ≥ 2 than for k = 1: There are two types of orbits, which we call "A-orbits" and "B-orbits", where B-orbits only appear for k ≥ 2; cf. Sec. 4. Establishing cancelation in the contribution from the B-orbits requires a novel treatment, which we give in Sec. 8. The treatment of the A-orbits (cf. Sec. 7) becomes more delicate for k ≥ 2 than for k = 1, and this is where we need to require that the test function f decays sufficiently rapidly in the cusp (cf. the parameter "a" in Theorems 1.2 and 1.3); this is not needed for k = 1. Other differences versus [39] are that we consider congruence subgroups and not just Γ = SL(2, Z) ⋉ (Z 2 ) ⊕k itself, and the fact that the Diophantine conditions are more complicated in the present paper, as they concern vectors in R k . As will be seen, in the present paper we make crucial use of the assumptions in Theorems 1.2 and 1.3 that either ξ 1 = 0 or ξ 2 = 0. It is an interesting problem to seek a more general treatment so as to obtain an effective version of Theorem 1.1 for general ξ 1 , ξ 2 . We have some preliminary results on this problem and hope to return to it in a later paper.
We next turn to an application of Theorem 1.2: Following an approach introduced by Marklof in [24] using theta series, we will prove an effective quantitative Oppenheim type result for the inhomogeneous quadratic form
$$Q(x)=(x_1-\alpha)^2+(x_2-\beta)^2-(x_3-\alpha)^2-(x_4-\beta)^2\tag{9}$$
for a fixed vector $(\alpha,\beta)\in\mathbb R^2$ subject to Diophantine conditions. Recall that the original Oppenheim conjecture states that for any indefinite nondegenerate homogeneous quadratic form $Q$ in $n\ge3$ variables, not proportional to a rational form, $Q(\mathbb Z^n)$ is dense in $\mathbb R$. This was proved in celebrated work by Margulis [22]. An effective version of this result has more recently been obtained by Lindenstrauss and Margulis [21]. A quantitative (but non-effective) version of the Oppenheim conjecture for forms of signature $(p,q)$ with $p\ge3$ and $q\ge1$ was proved by Eskin, Margulis and Mozes [7], and extended to forms of signature (2,2) subject to a Diophantine condition in [8]. Similar quantitative results were later proved also for inhomogeneous quadratic forms by Margulis and Mohammadi [23]; in particular the result proved by Marklof [24] for the form $Q$ in (9) is a special case of the results in [23]; however the method of proof in [23] is different and does not involve theta series.
Effective quantitative results for indefinite forms in n ≥ 5 variables have been proved by Götze and Margulis [13]. However we are not aware of any previous effective quantitative results for forms in 3 or 4 variables.
One verifies easily that

We say that $\xi\in\mathbb R^k$ is $\kappa$-Diophantine if there exists a constant $c>0$ such that $\|q\xi-m\|\ge cq^{-\kappa}$ for all $q\in\mathbb Z^+$ and $m\in\mathbb Z^k$ (cf. [25, Sec. 1.5]²). We also say that $\xi$ is $[\kappa;c]$-Diophantine in this case. The smallest possible value for $\kappa$ is $\kappa=k^{-1}$, and on the other hand Lebesgue-almost every $\xi\in\mathbb R^k$ is $(k^{-1}+\varepsilon)$-Diophantine for every $\varepsilon>0$. In Section 3 we will also discuss a different (also standard) Diophantine condition, which is more directly connected to the decay properties of $\delta_{\beta,\xi}(T)$.
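To make the $[\kappa;c]$-Diophantine condition concrete, the following sketch estimates the best constant $c$ over a finite range of denominators. It assumes that $\|\cdot\|$ is the Euclidean norm on $\mathbb R^k$; the function name `diophantine_constant` and the cut-off `Q` are illustrative choices, and a finite search can of course only give an upper bound for the true constant.

```python
import math

def diophantine_constant(xi, kappa, Q):
    """Return min over 1 <= q <= Q of q**kappa * dist(q*xi, Z^k).

    xi is a list of k floats; the distance is Euclidean (an assumption),
    and is attained by rounding each coordinate of q*xi to the nearest integer.
    """
    best = math.inf
    for q in range(1, Q + 1):
        dist = math.sqrt(sum((q * x - round(q * x)) ** 2 for x in xi))
        best = min(best, q ** kappa * dist)
    return best
```

For $k=1$ and the golden-ratio value $\xi=(\sqrt5-1)/2$ with $\kappa=1$, the minimum over $q\le1000$ is attained at $q=1$ and equals $1-\xi=(3-\sqrt5)/2$, in line with the fact that badly approximable numbers are $1$-Diophantine.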
In Section 9 we prove the following effective quantitative Oppenheim result for the form $Q$:

Theorem 1.4. There exists an absolute constant $B>0$ such that for any $[\kappa;c]$-Diophantine vector $(\alpha,\beta)\in\mathbb R^2$ with $|\alpha|,|\beta|\le1$, any $f\in C^1_c(\mathbb R^4)$ with support contained in the unit ball centered at the origin, any $g\in C^3(\mathbb R)$ with $S_{1,2,3}(g)<\infty$, and any $T\ge1$, where the implied constant is absolute.

² Note that our $\kappa$ corresponds to "$\kappa-1$" in [25, Sec. 1.5]. Both of these conventions are common in the literature, and we made our choice so as to make the statement of Theorem 1.4 and later results as simple as possible.
The assumption in Theorem 1.4 that $\operatorname{supp}(f)$ is contained in the unit ball simplifies the statement of the theorem, but can easily be weakened by an a posteriori scaling argument; furthermore one can remove the assumption that $(\alpha,\beta)\in[-1,1]^2$, as long as $T$ is large compared to $\|(\alpha,\beta)\|$. Cf. Corollary 9.12 in Sec. 9.5.
As we will show in Section 9.5, by a standard approximation argument, Theorem 1.4 implies the following effective counting result. For real numbers $a<b$ and $T>0$, set (One could also replace the ball $\{\|x\|<T\}$ in (13) by a more general expanding region in $\mathbb R^4$; however in order to keep the presentation simple we will not elaborate on this.)

Corollary 1.5. There exists an absolute constant $B'>0$ such that for any $[\kappa;c]$-Diophantine vector $(\alpha,\beta)\in[-1,1]^2$ and any real numbers $a<b$ and $T\ge1$, where the implied constant is absolute.
Note that the right hand sides of (12) and (14) tend to zero as T → ∞ (keeping all other data fixed) whenever 1, α, β are linearly independent over Q and the vector (α, β) is κ-Diophantine for some κ. If (α, β) furthermore satisfies a Diophantine condition of the type discussed in Section 3 then we even have a power rate decay with respect to T in (12) and (14). In particular, by a result of Schmidt [35] (or [34]), we have a power rate decay with respect to T whenever α, β are algebraic numbers such that 1, α, β are linearly independent over Q; cf. Remark 4 below.
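The counting problem behind these statements can be explored by brute force for small $T$. The sketch below evaluates the form $Q$ from the abstract and counts lattice points; the normalization by $T^2$, the strict inequalities, and the exclusion of the diagonal $\{(m_1,m_2)=(m_3,m_4)\}$ (on which $Q$ vanishes identically) are assumptions made for illustration, since the precise definition (13) is not reproduced here. The function names are hypothetical.

```python
import itertools
import math

def Q(m, alpha, beta):
    """The inhomogeneous form (m1-a)^2 + (m2-b)^2 - (m3-a)^2 - (m4-b)^2."""
    m1, m2, m3, m4 = m
    return ((m1 - alpha) ** 2 + (m2 - beta) ** 2
            - (m3 - alpha) ** 2 - (m4 - beta) ** 2)

def count_N(alpha, beta, a, b, T):
    """Brute-force count of m in Z^4 with |m| < T and a < Q(m) < b, divided by T^2.

    Assumed conventions: Euclidean ball, strict inequalities, and removal of
    the diagonal (m1, m2) == (m3, m4).
    """
    R = int(math.floor(T))
    count = 0
    for m in itertools.product(range(-R, R + 1), repeat=4):
        if (m[0], m[1]) == (m[2], m[3]):
            continue  # exceptional diagonal, where Q vanishes identically
        if sum(x * x for x in m) >= T * T:
            continue
        if a < Q(m, alpha, beta) < b:
            count += 1
    return count / T ** 2
```

A volume computation in the variables $|x|^2\pm|y|^2$ suggests the limiting value $\frac{\pi^2}{2}(b-a)$ mentioned in Remark 2, but the brute-force count above is only feasible for small $T$, where fluctuations are still large.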
Remark 1. The actual powers for the decay with respect to $T$ which we obtain in Theorem 1.4 and Corollary 1.5 are quite small and depend strongly on the $a$ and $m$ appearing in the $C^m_a$-norm in Theorem 1.2 (which, as we remarked above, we have not attempted to optimize). Cf. Lemma 9.8 and Remark 13 below. It is an interesting problem to seek the maximal power $\eta$ such that the difference in (12) decays like $T^{-\eta}$, for any fixed $(\alpha,\beta)$ subject to an appropriate Diophantine condition and any sufficiently nice test functions $f$ and $g$.
Remark 2. The relation $\lim_{T\to\infty}N_{\alpha,\beta}(a,b,T)=\frac{\pi^2}{2}(b-a)$ also holds for $(\alpha,\beta)$ $\kappa$-Diophantine with $1,\alpha,\beta$ linearly dependent over $\mathbb Q$, except that for certain such pairs $\alpha,\beta$ the definition of $N_{\alpha,\beta}(a,b,T)$ in (13) has to be modified by removing one more exceptional subspace besides $\Delta$. This follows as a special case of the (ineffective) result of Margulis and Mohammadi [23, Theorem 1.9]³. The reason why Theorem 1.4 and Cor. 1.5 fail to give the desired limiting result in the case when $1,\alpha,\beta$ are linearly dependent over $\mathbb Q$ is that as a crucial step in the proof, Theorem 1.2 is applied with $\xi_2=\left(\begin{smallmatrix}\alpha\\\beta\end{smallmatrix}\right)$, and as we discussed in connection with Theorem 1.1 (cf. (2)), the asymptotic equidistribution therein fails when $1,\alpha,\beta$ are $\mathbb Q$-linearly dependent. This situation is discussed in [24, Appendix A], and as indicated there, and carried out in some special cases, it is possible to extend the proof method of [24] to the case of $\mathbb Q$-linear dependence, by utilizing equidistribution in the appropriate homogeneous submanifold of $\Gamma\backslash G$. It would be interesting to make this approach effective, i.e. to seek a satisfactory effective version of the statement that $\lim_{T\to\infty}N_{\alpha,\beta}(a,b,T)=\frac{\pi^2}{2}(b-a)$.

It should be noted that some Diophantine condition on $(\alpha,\beta)$ is certainly necessary in order for $\lim_{T\to\infty}N_{\alpha,\beta}(a,b,T)=\frac{\pi^2}{2}(b-a)$ to hold; cf. [24, Thm. 1.13 and Sec. 9]. By contrast, the non-quantitative result that $Q(\mathbb Z^4)$ is dense in $\mathbb R$, and in fact even $\liminf_{T\to\infty}N_{\alpha,\beta}(a,b,T)\ge\frac{\pi^2}{2}(b-a)$ for all $a<b$, is known to hold for all irrational vectors $(\alpha,\beta)$, that is, for all $(\alpha,\beta)\in\mathbb R^2\setminus\mathbb Q^2$. This is a special case of [23, Thm. 1.4].

³ The notion of $\xi\in\mathbb R^k$ being "$\kappa$-Diophantine" in [23] is different from the one which we have defined; however it is easy to verify that if $\xi$ is $\kappa$-Diophantine in the sense of [23, Def. 1.7] then $\xi$ is $(\kappa-1)$-Diophantine in our sense, and if $\xi$ is $\kappa$-Diophantine in our sense then $\xi$ is $k(\kappa+1)$-Diophantine in the sense of [23, Def. 1.7]. One also verifies by a direct computation that the form $Q$ in (9) with $(\alpha,\beta)\notin\mathbb Q^2$ admits at most one more exceptional subspace in the sense of [23, p. 124 (bottom)] besides $\Delta=\{(m_1,m_1):m_1\in\mathbb Z^2\}$, and such an exceptional subspace can only occur when $1,\alpha,\beta$ are linearly dependent over $\mathbb Q$.
Finally let us note that Theorem 1.4 implies an effective version of the main theorem of [24], which says that under explicit Diophantine conditions on (α, β) ∈ R 2 , the local two-point correlations of the sequence given by the values of Q 1 (m, n) = (m − α) 2 + (n − β) 2 , with (m, n) ∈ Z 2 , are those of a Poisson process -a result which partly confirms a conjecture of Berry and Tabor [1] on quantized integrable systems. For fixed (α, β) ∈ R 2 , denote by 0 ≤ λ 1 ≤ λ 2 ≤ · · · → ∞ the sequence of values of Q 1 (m, n) for (m, n) ∈ Z 2 , counted with multiplicity. One easily verifies that the asymptotic density of this sequence is π: For a given interval [a, b] ⊂ R, the pair correlation function is then defined as In Section 9.5 we will prove: Corollary 1.6. There exists an absolute constant B ′′ > 0 such that for any [κ; c]-Diophantine vector (α, β) ∈ R 2 , and any real numbers a < b and Λ ≥ 1, where the implied constant is absolute.
This corollary indeed gives an effective version of Marklof [24,Theorem 1.8], as well as of [25,Theorem 1.6] in the case k = 2, since the right hand side of (16) tends to zero as T → ∞ for any fixed κ-Diophantine vector (α, β) (any κ) such that 1, α, β are linearly independent over Q.
The main result in Marklof [25,Theorem 1.6] generalizes [24,Theorem 1.8] to the case of the local pair correlation density of the sequence m − α k (m ∈ Z k ) for any k ≥ 2 (and also for k = 2 it is a stronger result, since the Diophantine condition imposed on the vector (α, β) ∈ R 2 is weaker). Unfortunately it seems that Theorem 1.2 above cannot be used to prove an effective version of this more general result when k ≥ 3. The reason is that the key equidistribution result required, [25,Thm. 5.1], concerns the integral with σ = k 2 − 1, that is, the integral which appears in Theorem 1.2 but with the function h replaced by x → y σ h(y σ x). With this choice, the S ∞,2+ε,2 -norm in the right hand side of (7) grows rapidly as y → 0, making the bound useless. This failure may at first seem surprising, since the factor y σ means, when σ > 0, that we are considering a unipotent orbit expanding at a faster rate than for σ = 0, so the result can be expected to be easier (or at least not more difficult) to prove. However, there is a genuine difference between x near zero and x far from zero in the integrand in (17); for example, for any u(n) ∈ Γ, using u(n) 1 2 , 0 It is clear from this that if one would solve the aforementioned problem of proving an effective version of Theorem 1.1 in the general case with both ξ 1 , ξ 2 allowed to be non-zero, this can be expected to also lead to an effective version of [25,Thm. 5.1], and so, with further work, should also lead to an effective version of [25, Theorem 1.6] for general k ≥ 2.

Some notation
We use the standard notation $A=O(B)$ or $A\ll B$ meaning $|A|\le CB$ for some constant $C>0$. We shall also use $A\asymp B$ as a substitute for $A\ll B\ll A$. The implicit constant $C$ will always be allowed to depend on $k$ and $N$ without any explicit mention. If we wish to indicate that $C$ also depends on some other quantities $f,g,h$, we will use the notation $A=O_{f,g,h}(B)$ or $A\ll_{f,g,h}B$.

Recall from Section 1 that $G'=\mathrm{SL}(2,\mathbb R)$ and $G=G'\ltimes\mathbb R^{2k}$. Let $\mathfrak g$ be the Lie algebra of $G$; it may be naturally identified with the space $\mathfrak{sl}(2,\mathbb R)\oplus\mathbb R^{2k}$ (cf. [20, Prop. 1.124]). Using this notation, we fix the following basis of $\mathfrak g$: . (cf. Section 1). Given a function $f$ on $X=\Gamma\backslash G$, we will often view $f$ as a function on $G$ through $f(g)=f(\Gamma g)$, and for $R\in\overline\Gamma'$ we will write
$$f_R(g):=f(R^{-1}g).\tag{19}$$
Since $\Gamma'$ is normal in $\overline\Gamma'$, $f_R$ is also left $\Gamma$-invariant, i.e. $f_R$ can be viewed as a function on $X$.
Note also that $\|f_R\|_{C^m_a}=\|f\|_{C^m_a}$ for all $m\ge0$, $a\in\mathbb R$.

Linear form Diophantine conditions
Given real numbers $\kappa\ge k$ and $\alpha\ge1$, we say that a vector $\xi\in\mathbb R^k$ is $\kappa$-LFD (short for $\kappa$-linear form Diophantine) if there is a constant $c>0$ such that
$$\langle r\xi\rangle\ge c\,\|r\|^{-\kappa}\qquad\text{for all }r\in\mathbb Z^k\setminus\{0\},\tag{20}$$
and we say that $\xi$ is $(\kappa,\alpha)$-LFD if there is a constant $c>0$ such that
$$\langle jr\xi\rangle\ge c\,j^{-\alpha}\|r\|^{-\kappa}\qquad\text{for all }j\in\mathbb Z^+,\ r\in\mathbb Z^k\setminus\{0\}.\tag{21}$$
Recall here that for $x\in\mathbb R$, $\langle x\rangle$ denotes the distance to the nearest integer, and $r\xi$ is the scalar product, $r\xi=r_1\xi_1+\dots+r_k\xi_k$. The condition in (20) is very standard in the Diophantine approximation literature; however we are not aware of any discussion of the more general condition in (21). When (20) holds, we will say that $\xi$ is $[\kappa;c]$-LFD, and similarly when (21) holds, we will say that $\xi$ is $[(\kappa,\alpha);c]$-LFD. Note that being $[\kappa;c]$-LFD is equivalent to being $[(\kappa,\alpha);c]$-LFD for any $\alpha\ge\kappa$. Hence the notion of being $[(\kappa,\alpha);c]$-LFD is mainly relevant when $1\le\alpha<\kappa$, and in this case the condition (21) is equivalent to the same condition with $r$ restricted to being a primitive vector in $\mathbb Z^k$ (viz., a vector with $\gcd(r_1,\ldots,r_k)=1$).
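The two equivalences claimed at the end of the preceding paragraph follow from short computations. The derivation below is a sketch under the assumption that (20) reads $\langle r\xi\rangle\ge c\|r\|^{-\kappa}$ and (21) reads $\langle jr\xi\rangle\ge cj^{-\alpha}\|r\|^{-\kappa}$, with $\langle\cdot\rangle$ the distance to the nearest integer; the symbols $g$ and $r'$ are introduced here only for illustration.

```latex
% Assumed forms: (20) <r xi> >= c |r|^{-kappa},  (21) <j r xi> >= c j^{-alpha} |r|^{-kappa}.
\begin{align*}
&\text{(20)}\Rightarrow\text{(21) when }\alpha\ge\kappa:
&\langle jr\xi\rangle=\langle (jr)\xi\rangle
 &\ge c\,\|jr\|^{-\kappa}=c\,j^{-\kappa}\|r\|^{-\kappa}\ge c\,j^{-\alpha}\|r\|^{-\kappa};\\
&\text{(21)}\Rightarrow\text{(20)}\ (\text{take }j=1):
&\langle r\xi\rangle &\ge c\,\|r\|^{-\kappa};\\
&\text{primitive }r'\text{ suffice when }\alpha\le\kappa,\ r=gr',\ g\in\mathbb Z^+:
&\langle jr\xi\rangle=\langle (jg)r'\xi\rangle
 &\ge c\,(jg)^{-\alpha}\|r'\|^{-\kappa}
 \ge c\,j^{-\alpha}g^{-\kappa}\|r'\|^{-\kappa}=c\,j^{-\alpha}\|r\|^{-\kappa}.
\end{align*}
```

In the last line the inequality $g^{-\alpha}\ge g^{-\kappa}$ uses $\alpha\le\kappa$, which is why the reduction to primitive vectors is stated only in that range.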
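Proof. The set in the statement of the lemma contains the set of all $\xi\in\mathbb R^k$ which are not $\kappa$-LFD, and the latter set has (Hausdorff) dimension $k-1+\frac{k+1}{\kappa+1}$, cf. Bovey and Dodson [2]. Furthermore, taking $r=e_1$ in (21) we see that the set in the statement of the lemma contains the set of all $\xi\in\mathbb R^k$ for which $\xi_1$ is not $\alpha$-LFD, and this set has dimension $k-1+\frac{2}{\alpha+1}$. Hence it remains to prove that the dimension in the statement of the lemma is bounded above by $k-1+\max\bigl(\frac{k+1}{\kappa+1},\frac{2}{\alpha+1}\bigr)$. Set
$$\Delta_{j,r,m}=\bigl\{\xi\in[0,1)^k\,:\,|j\,r\xi-m|<j^{-\alpha}\|r\|^{-\kappa}\bigr\}.$$
Then every non-$(\kappa,\alpha)$-LFD $\xi$ in $[0,1)^k$ belongs to $\Delta_{j,r,m}$ for infinitely many $(j,r,m)\in\mathbb Z^+\times(\mathbb Z^k\setminus\{0\})\times\mathbb Z$. Note also that $\Delta_{j,r,m}=\emptyset$ unless $|m|\ll j\|r\|$, and for any $(j,r,m)\in\mathbb Z^+\times(\mathbb Z^k\setminus\{0\})\times\mathbb Z$, if we set $\ell=\ell_{j,r}=j^{-\alpha-1}\|r\|^{-\kappa-1}$ then the set $\Delta_{j,r,m}$ can be covered by $\ll\ell^{1-k}$ open hypercubes each having sides of length $\ll\ell$, with the normal to each face being parallel to a co-ordinate axis. If $s>k-1+\max\bigl(\frac{k+1}{\kappa+1},\frac{2}{\alpha+1}\bigr)$ then the total $s$-volume of the family of hypercubes obtained as $(j,r,m)$ runs through $\mathbb Z^+\times(\mathbb Z^k\setminus\{0\})\times\mathbb Z$ is
$$\ll\sum_{j\ge1}\ \sum_{r\in\mathbb Z^k\setminus\{0\}}j\|r\|\,\ell_{j,r}^{\,s-k+1}=\sum_{j\ge1}j^{1-(\alpha+1)(s-k+1)}\sum_{r\in\mathbb Z^k\setminus\{0\}}\|r\|^{1-(\kappa+1)(s-k+1)}<\infty.$$
Note also that for any $\delta>0$ there are only a finite number of non-empty sets $\Delta_{j,r,m}$ satisfying $\ell_{j,r}\ge\delta$; hence every non-$(\kappa,\alpha)$-LFD $\xi\in[0,1)^k$ is contained in the union of hypercubes in the above family restricted by $\ell_{j,r}<\delta$. It follows that for every $s>k-1+\max\bigl(\frac{k+1}{\kappa+1},\frac{2}{\alpha+1}\bigr)$, the $s$-dimensional outer Hausdorff measure of the set of all non-$(\kappa,\alpha)$-LFD $\xi$ in $[0,1)^k$ equals zero. This completes the proof.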
We will need the following auxiliary result.
Lemma 3.2. Let $\eta\in\mathbb R$, $c>0$, $\kappa\ge1$, and assume that $\langle j\eta\rangle\ge cj^{-\kappa}$ for all $j\in\mathbb Z^+$. Then (The bound is essentially optimal. Indeed, if $\langle j\eta\rangle\le cj^{-\kappa}$ holds for some $j$ then for $T=j^{1+\kappa}/c$, already the term $\frac{1}{j^2+Tj\langle j\eta\rangle}$ is bounded below by $\frac12(cT)^{-\frac{2}{1+\kappa}}$.)

Proof. We assume $cT>1$ since otherwise the bound is trivial. Note that the assumptions of the lemma imply that $\eta$ is irrational, and $0<c\le\langle\eta\rangle\le\frac12$. Thus $T>2$. Let $p_k/q_k$ be the $k$th convergent of the (simple) continued fraction expansion of $\eta$ (cf., e.g., [14, Ch. X]; in particular $1=q_0\le q_1<q_2<\cdots$). For any $\ell\ge1$ we have where the last bound follows from [29,Lemma 4.8 where we used the fact that $q_\ell$ is bounded below by the $\ell$th Fibonacci number.
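The continued fraction facts used in this proof are easy to experiment with. The sketch below computes the convergents $p_\ell/q_\ell$ from a list of partial quotients via the standard recursion $q_\ell=a_\ell q_{\ell-1}+q_{\ell-2}$, and lets one check the Fibonacci lower bound for $q_\ell$; the function names are illustrative, and the Fibonacci indexing $F_0=0$, $F_1=1$ is an assumption about the intended convention.

```python
def convergents(partial_quotients):
    """Return [(p_0, q_0), (p_1, q_1), ...] for the continued fraction
    [a_0; a_1, a_2, ...] given by the list of partial quotients."""
    p_prev, q_prev = 1, 0                 # (p_{-1}, q_{-1})
    p, q = partial_quotients[0], 1        # (p_0, q_0)
    out = [(p, q)]
    for a in partial_quotients[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        out.append((p, q))
    return out

def fibonacci(n):
    """n-th Fibonacci number with F_0 = 0, F_1 = 1 (assumed indexing)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

Since every partial quotient $a_\ell$ with $\ell\ge1$ is at least 1, the recursion gives $q_\ell\ge q_{\ell-1}+q_{\ell-2}$, and the bound is attained for the golden-ratio expansion $[0;1,1,1,\ldots]$, whose denominators are themselves Fibonacci numbers.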
We now give a result on the rate of decay of the majorant function δ β,ξ (T ) (cf. (5)), assuming that ξ is of an appropriate LFD type. In fact we consider the following slightly simpler majorant: Note that δ β,ξ (T ) and δ β,ξ (T ) decay with very similar rates, since Proof. Using Lemma 3.2 and the assumption that ξ is [(κ, α); c]-LFD, we have Multiplying by r −β and adding over all r ∈ Z k \ {0}, we obtain the stated bound.

Fourier decomposition with respect to the torus variable
We now start with the proof of Theorems 1.2 and 1.3. In this section, which generalizes [39, Sec. 4], we consider the Fourier decomposition of a given test function on $X$ with respect to the torus variable, and prove bounds on the resulting Fourier coefficients. Some parts of our discussion closely follow [39, Sec. 4], but there are also some new aspects that have to be considered; cf. in particular all of Section 4.2 below.
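To start with, we consider an arbitrary function $f\in C(\mathbb Z^{2k}\backslash G)$, where $\mathbb Z^{2k}$ is viewed as a subgroup of $G$ through $n\mapsto(1_2,n)$. We view $f$ as a function on $G$ by composing with the projection $G\to\mathbb Z^{2k}\backslash G$. Then $f(M,\xi)=f((1_2,n)(M,\xi))=f(M,\xi+n)$ for all $n\in\mathbb Z^{2k}$, which means that for any fixed $M\in G'$, $\xi\mapsto f(M,\xi)$ is a function on the torus $\mathbb T^{2k}=\mathbb Z^{2k}\backslash\mathbb R^{2k}$. We write $\widehat f(M,m)$ for the Fourier coefficients in the torus variable;
$$\widehat f(M,m)=\int_{\mathbb T^{2k}}f(M,\xi)\,e(-m\xi)\,d\xi\qquad(m\in\mathbb Z^{2k}),$$
with $e(x):=e^{2\pi ix}$. Here $d\xi$ denotes Lebesgue measure on $\mathbb R^{2k}$. Thus for $f\in C^2(\mathbb Z^{2k}\backslash G)$ we have
$$f(M,\xi)=\sum_{m\in\mathbb Z^{2k}}\widehat f(M,m)\,e(m\xi)\tag{30}$$
with absolute convergence uniformly 4 over $(M,\xi)$ in any compact subset of $G$. (Indeed, the

If $f$ is also invariant under some $T\in\overline\Gamma'=\mathrm{SL}(2,\mathbb Z)$, this leads to a corresponding invariance relation for $\widehat f(M,m)$:
$$\widehat f(TM,m)=\widehat f(M,{}^tTm)\qquad\text{for all }M\in G',\ m\in\mathbb Z^{2k},\tag{31}$$
where ${}^tT$ is the transpose of $T$.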
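Proof. We have
$$\widehat f(TM,m)=\int_{\mathbb T^{2k}}f(TM,\xi)\,e(-m\xi)\,d\xi=\int_{\mathbb T^{2k}}f(TM,T\xi)\,e(-m(T\xi))\,d\xi=\int_{\mathbb T^{2k}}f(M,\xi)\,e(-m(T\xi))\,d\xi,$$
where in the second equality we used the fact that $\xi\mapsto T\xi$ is a diffeomorphism of $\mathbb T^{2k}$ preserving $d\xi$, and in the last equality we used the fact that $f$ is left $T$-invariant. Using $m(T\xi)=({}^tTm)\xi$ we obtain (31).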
Because of Lemma 4.1, if $f\in C^2(\overline\Gamma\backslash G)$, then it is convenient to group the terms in (30) together according to the orbits for the action of $\overline\Gamma'$ on $\mathbb Z^{2k}$. We call an orbit for this action an A-orbit if it contains some element of the form $\left(\begin{smallmatrix}0\\r\end{smallmatrix}\right)$, where $r\in\mathbb Z^k\setminus\{0\}$. Every other non-zero orbit is called a B-orbit.
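Lemma 4.2. Every B-orbit contains an element $\eta=\left(\begin{smallmatrix}q\\r\end{smallmatrix}\right)\in\mathbb Z^{2k}$ with the property that there are some $1\le\ell_1<\ell_2\le k$ such that $r_j=0$ for all $j<\ell_1$, $q_j=0$ for all $j<\ell_2$, and $r_{\ell_1}>0$, $0\le r_{\ell_2}<|q_{\ell_2}|$.

Proof. Let $\eta=\left(\begin{smallmatrix}q\\r\end{smallmatrix}\right)$ be an element in a B-orbit. Then $\eta\ne0$, and we may take $\ell_1$ to be the smallest index for which $\left(\begin{smallmatrix}q_{\ell_1}\\r_{\ell_1}\end{smallmatrix}\right)\ne\left(\begin{smallmatrix}0\\0\end{smallmatrix}\right)$. After replacing $\eta$ by $T\eta$ for an appropriate $T\in\overline\Gamma'$ we can ensure that $q_{\ell_1}=0$ and $r_{\ell_1}>0$, while clearly still $q_j=r_j=0$ for all $j<\ell_1$. Now since $\eta$ is not in an A-orbit we cannot have $q_j=0$ for all $j$, and we take $\ell_2>\ell_1$ to be the smallest index for which $q_{\ell_2}\ne0$. Finally by replacing $\eta$ by $\left(\begin{smallmatrix}1&0\\x&1\end{smallmatrix}\right)\eta$ for an appropriate $x\in\mathbb Z$ we can make $0\le r_{\ell_2}<|q_{\ell_2}|$ hold, while $q_j$ and $r_j$ for $j<\ell_2$ remain unchanged.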
Let us fix, once and for all, a set of representatives $A_k,B_k\subset\mathbb Z^{2k}$ such that $A_k$ contains exactly one element from each A-orbit and $B_k$ contains exactly one element from each B-orbit, and furthermore each $\eta\in A_k$ is of the form $\eta=\left(\begin{smallmatrix}0\\r\end{smallmatrix}\right)$ and each $\eta\in B_k$ has the property described in Lemma 4.2.
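The normalization carried out in the proof of Lemma 4.2 is algorithmic, and can be sketched as follows. The code assumes that $T=\left(\begin{smallmatrix}a&b\\c&d\end{smallmatrix}\right)\in\mathrm{SL}(2,\mathbb Z)$ acts on $\eta=(q,r)$ with $q,r\in\mathbb Z^k$ by $(q,r)\mapsto(aq+br,\,cq+dr)$; the function names are hypothetical. It returns a representative of the stated form, or `None` when $\eta$ is zero or lies in an A-orbit (which, under this action, happens exactly when all the pairs $(q_j,r_j)$ are parallel).

```python
def ext_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b) >= 0."""
    if b == 0:
        return (abs(a), (1 if a >= 0 else -1), 0)
    g, x, y = ext_gcd(b, a % b)
    return (g, y, x - (a // b) * y)

def apply(T, q, r):
    """Apply T = ((a, b), (c, d)) to every column (q_j, r_j) simultaneously."""
    (a, b), (c, d) = T
    return ([a * s + b * t for s, t in zip(q, r)],
            [c * s + d * t for s, t in zip(q, r)])

def lemma_4_2_representative(q, r):
    """Normalize eta = (q, r) along the lines of the proof of Lemma 4.2.

    Returns (q', r') in the same SL(2,Z)-orbit with q'[l1] == 0, r'[l1] > 0
    and 0 <= r'[l2] < |q'[l2]|, or None if eta is zero or in an A-orbit.
    """
    k = len(q)
    l1 = next((l for l in range(k) if (q[l], r[l]) != (0, 0)), None)
    if l1 is None:
        return None
    g, x, y = ext_gcd(q[l1], r[l1])
    a1, b1 = q[l1] // g, r[l1] // g
    # ((b1, -a1), (x, y)) has determinant a1*x + b1*y = 1 and maps (q_l1, r_l1) to (0, g).
    q, r = apply(((b1, -a1), (x, y)), q, r)
    l2 = next((l for l in range(l1 + 1, k) if q[l] != 0), None)
    if l2 is None:
        return None  # all q_j vanish: eta lies in an A-orbit
    # Lower unipotent shift r_j -> r_j + shift*q_j bringing r[l2] into [0, |q[l2]|).
    shift = (r[l2] % abs(q[l2]) - r[l2]) // q[l2]
    return apply(((1, 0), (shift, 1)), q, r)
```

For example, `lemma_4_2_representative([2, 3], [4, 5])` returns `([0, 1], [2, 0])`, while any input with $k=1$ returns `None`, consistent with the absence of B-orbits in that case.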
The lemma implies that we can decompose Z 2k as a disjoint union of singleton sets as follows: where Γ ′ ∞ \Γ ′ denotes any set of representatives for the right cosets inside Γ ′ of the subgroup Grouping together the terms in (30) according to (32), and then applying Lemma 4.1, we get, for any f ∈ C 2 (Γ\G): If k = 1 then B k = ∅ and (34) can be seen to agree with [39,Lemma 4.1]. However B k is easily seen to be nonempty for every k ≥ 2.
We now wish to give a similar decomposition of a general function f ∈ C 2 (X). Recall that X = Γ\G and Γ = Γ ′ ⋉ Z 2k with Γ ′ = Γ(N ), a normal subgroup of Γ ′ = SL(2, Z). For any subgroup H of G ′ and any subset A ⊂ G ′ satisfying HA = A, we denote by H\A a set of representatives for the distinct cosets Ha (a ∈ A). We also write Γ ′ ∞ \Γ ′ /Γ ′ for a set of representatives for the double cosets of the form Γ One then verifies that R∈Γ ′ Using Γ ′ R = RΓ ′ and t (Rγ)η = t γ( t Rη) for γ ∈ Γ ′ , this formula is seen to provide a decomposition of Z 2k into orbits for the action of t Γ ′ = Γ ′ . In order to get a convenient corresponding partition of the sum in (30), recall (19), and note that for any This is proved by a computation similar to the proof of Lemma 4.1. Using Lemma 4.1 we get f (M, t γ t Rη) = f R (RγM, η) for all γ ∈ Γ ′ , or in other words: Now from (30), (35) and (36) we get: Note here that for any η ∈ A k and R ∈ Γ (36) and Lemma 4.3. However for η ∈ B k there is no such invariance present.
Proof. The left invariant differential operator corresponding to Y ∈ g is given by and hence by repeated integration by parts we have Hence Using Lemma 4.4 we immediately obtain bounds on derivatives of f (·, ·) with respect to the first variable. We express these in terms of Iwasawa co-ordinates, that is we write (by a slight abuse of notation)
We have the following analogue of Lemma 4.4.
Remark 6. As a consequence, for any 0 < β < 1 2 we have Proof. We write η = q r and T = a b c d . Repeated integration by parts gives (cf. the proof of Lemma 4.4): Hence if we write η (ℓ) := q ℓ r ℓ ∈ R 2 then we conclude that for each ℓ ∈ {1, . . . , k} and for Now fix a column vector v of T with the largest norm. Then T ≤ √ 2 v . By our definition of B k , η has the property described in Lemma 4.2, i.e. there are 1 ≤ ℓ 1 < ℓ 2 ≤ k such that r j = 0 for all j < ℓ 1 , q j = 0 for all j < ℓ 2 , and r ℓ 1 > 0, 0 ≤ r ℓ 2 < |q ℓ 2 |. In particular the vectors η (ℓ 1 ) and η (ℓ 2 ) are non-zero, hence both have length ≥ 1, and the angle between the lines Rη (ℓ 1 ) and Rη (ℓ 2 ) in R 2 is > π 4 . Hence the normal line to v in R 2 has an angle ≥ π 8 to at least one of the lines Rη (ℓ 1 ) and Rη (ℓ 2 ) , and it follows that at least one of the scalar products η (ℓ 1 ) v and η (ℓ 2 ) v has an absolute value ≥ sin( π 8 ) v . Hence using (43) we get Next let v ′ be the other column vector of T , and let α ∈ (0, π 2 ] be the angle between the lines Rv and Rv ′ ; then v v ′ sin α = 1, since det T = 1. Let ℓ ∈ {1, . . . , k} be the index for which η (ℓ) is maximal; then η ≤ √ k η (ℓ) . Now the normal line to η (ℓ) in R 2 must have an angle ≥ α 2 to at least one of the lines Rv and Rv ′ . Hence either Applying (43) for the appropriate column vector of T we get Together, (44) and (45) imply (42).
Using Iwasawa co-ordinates, the bound in Remark 6 can be expressed as follows, for any 0 < β < 1 2 (cf. Remark 5): Arguing again as in [39,Lemmas 4.3,4.4] we now obtain the following bound on derivatives.

Obtaining the leading term
Our task is to study the integral We may assume 0 < y ≤ 1 from the start, since (7) and (8) are otherwise trivial (indeed, the left hand sides of (7), (8) are always ≪ f C 0 0 S ∞,0,2 (h)). Decomposing f as in (37), we get that (47) is Here the change of order of summation and integration will be justified by an absolute convergence which holds for any f and h as in Theorem 1.2 or Theorem 1.3; cf. Lemmata 7.3 and 8.2 as well as (98), (99) below.
By the bound by Kim and Sarnak [19] towards the Ramanujan conjecture, the smallest non-zero eigenvalue of the Laplace operator on the hyperbolic surface $\Gamma(N)\backslash\mathbb H$ satisfies $\lambda_1\ge\frac{975}{4096}=\frac14-\bigl(\frac{7}{64}\bigr)^2$.

Remark 7. Note that in the more general setting of Theorem 1.1, we could have e.g. $\Gamma=\Lambda\ltimes(\mathbb Z^2)^{\oplus k}$ with $\Lambda$ being a non-congruence subgroup of $\mathrm{SL}(2,\mathbb Z)$. If we would seek to extend the present methods to that case, when carrying out this first step of using equidistribution on $\Lambda\backslash\mathrm{SL}(2,\mathbb R)$, we would obtain an analogue of (49) with an error term decaying as $O(y^{c(\Lambda)})$ for some $0<c(\Lambda)\le\frac12$. However in this case it is known that for certain choices of $\Lambda$ the spectral gap for $\Lambda\backslash\mathrm{SL}(2,\mathbb R)$ can be made arbitrarily small [36], meaning that there is no uniform lower bound on the exponent $c(\Lambda)$.

Cancellation in an exponential sum
In this section, we derive bounds on certain exponential sums which give nontrivial cancellations in various sums that arise frequently in our arguments in the rest of the paper. Recall that Γ ′ = SL(2, Z) and . For any given integer c ≡ c 0 mod N , we consider the following subsets: is a finite set. We introduce the symbol (1) to denote summation over all matrices in [Γ ′ ∞ \[R] ; c], and (2) to denote summation over all matrices in Note that the summation range in both (1) and (2) depend implicitly on c, N and R.
Remark 8. In the rest of this section we will assume $c \neq 0$. Note that we have an obvious bijection, Hence without loss of generality we may assume $c > 0$.
For any N, R, c as above with c > 0, and m, n ∈ Z, we introduce the following generalized Kloosterman sum: , both d mod cN and a mod cN are independent of the choice of coset representative. We begin by deriving bounds for the sums S(m, n; c; R, N ).
Proof. By a straightforward analysis one verifies that the map $\begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto (a \bmod cN,\ d \bmod cN)$ The formula (51) now follows since the map taking a, d ,

For $n$ a positive integer, we write $\sigma(n)$ for the number of (positive) divisors of $n$, and $\sigma_1(n)$ for their sum: $\sigma(n) = \sum_{d \mid n} 1$ and $\sigma_1(n) = \sum_{d \mid n} d$.
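For readers who want to experiment with the divisor functions just introduced, the following is a minimal Python sketch. The helper names `divisors`, `sigma` and `sigma1` are illustrative choices, not notation from the paper, and trial division is used only for simplicity.

```python
def divisors(n):
    """Return the positive divisors of n in increasing order (requires n >= 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    small, large = [], []
    d = 1
    while d * d <= n:
        if n % d == 0:
            small.append(d)
            if d != n // d:          # avoid listing a square root twice
                large.append(n // d)
        d += 1
    return small + large[::-1]


def sigma(n):
    """Number of positive divisors of n (the quantity written sigma(n) in the text)."""
    return len(divisors(n))


def sigma1(n):
    """Sum of the positive divisors of n (the quantity written sigma_1(n) in the text)."""
    return sum(divisors(n))
```

For instance, `sigma(12)` counts the six divisors 1, 2, 3, 4, 6, 12, and `sigma1(12)` adds them up.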
We now use the multiplicativity relation to prove that the generalized Kloosterman sums satisfy a Weil type bound (cf. (54)), and to give an explicit formula in the case n = 0.
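As a point of comparison, the following Python sketch evaluates the classical Kloosterman sum $S(m,n;c) = \sum_{d \bmod c,\ (d,c)=1} e((md + n\bar d)/c)$ and the classical Weil bound $\sigma(c)\,(m,n,c)^{1/2} c^{1/2}$. It is an illustration under stated assumptions only: it treats the classical sum, not the generalized sum $S(m,n;c;R,N)$ of this section, and the function names `kloosterman` and `weil_bound` are hypothetical.

```python
import cmath
from math import gcd, sqrt


def e(x):
    """The additive character e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def kloosterman(m, n, c):
    """Classical Kloosterman sum S(m, n; c) for an integer modulus c >= 2."""
    if c < 2:
        raise ValueError("this sketch assumes c >= 2")
    total = 0
    for d in range(1, c):
        if gcd(d, c) == 1:
            d_inv = pow(d, -1, c)    # modular inverse; needs Python >= 3.8
            total += e((m * d + n * d_inv) / c)
    return total


def weil_bound(m, n, c):
    """Classical Weil-type bound sigma(c) * gcd(m, n, c)^(1/2) * c^(1/2)."""
    num_divisors = sum(1 for d in range(1, c + 1) if c % d == 0)
    return num_divisors * sqrt(gcd(gcd(m, n), c)) * sqrt(c)
```

When $n = 0$ the classical sum reduces to a Ramanujan sum, so for a prime modulus $p$ not dividing $m$ the value is $-1$; this can be used as a quick sanity check of the implementation.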
We are now set to state and prove the main lemma in this section.
. Then for any subset $K \subset \mathbb{Z}$ and any $\alpha \in \mathbb{R}$, where $\overline{c_2}$ is a multiplicative inverse of $c_2$ mod $N$.
We remark that the sum in the left hand side of (58) is well-defined, since, ; c], both d and the congruence class of a modulo cN are independent of the choice of a coset representative.

Proof. Set
Note that since f ∈ C 4 ∩ L 1 (R×(R/Z)); the sum defining H(x 1 , x 2 ) is absolutely convergent for almost all (x 1 , x 2 ) ∈ R × (R/Z), and H ∈ L 1 (R 2 /Z 2 ). We will use the notation F j,k = ∂ j In order to get a stronger convergence statement, we note that This follows by integrating the inequality |f ( . Similarly, we have |f j,0 (r, x 2 )| ≤ R/Z (|f j,0 (r, s)| + |f j,1 (r, s)|) ds, and using this in (60), we obtain the following elementary Sobolev embedding type inequality: Using (61) and the fact that f j,k ∈ L 1 (R × (R/Z)) for j, k ≤ 1, we conclude that the sum in (59) is absolutely convergent for all (x 1 , x 2 ), uniformly over (x 1 , x 2 ) in any compact set. In particular, the function H(x 1 , x 2 ) is defined everywhere on R 2 /Z 2 , and is continuous.
Consider the Fourier coefficients of $H$, Note that for any $j \le 1$ and $k \le 2$, This follows by applying (60) to $F_{j,k}(x_1, N x_2)$ and using $F_{j,k}, F_{j+1,k} \in L^1(\mathbb{R} \times (\mathbb{R}/N\mathbb{Z}))$. We may now integrate by parts repeatedly in (62), using (63) to justify convergence, to obtain for any $0 \le j, k \le 2$ and any integers $m, n$ subject to $m \neq cN\alpha$ if $j > 0$ and $n \neq 0$ if $k > 0$. Using this formula for $j \in \{0, 2\}$ and $k = 2$ gives Similarly, using (64) for $j \in \{0, 2\}$ and $k = 0$, These bounds imply that the Fourier series of $H$ is absolutely convergent; and since $H$ is continuous, $H$ is in fact equal to its Fourier series at every point (cf., e.g., [11, Prop. 3.1.14]):

Now we consider the sum in the left hand side of (58). We have Here all sums are absolutely convergent, since the sum in (59) is absolutely convergent and $\sum^{(2)}$ runs over a finite set. Substituting (67) in the last sum, and using (50), we obtain

Now we bound the contribution from all terms with $n \neq 0$ in (69) using (65), (54) and $\sum_{n \neq 0} (m,n,c)^{1/2}\, n^{-2} \le \sum_{n \neq 0} |n|^{-3/2} \ll 1$, while the terms with $n = 0$ are handled using (62) and (55) when $m \in K$, and using (66) and (56) when $m \notin K$. In this way we obtain (58).
a c remains the same if we replace $c, R, \alpha$ by $-c, -R, -\alpha$; after this replacement, Lemma 6.3 applies to the sum.

Lemma 6.3 will suffice for most parts of our discussion. However, at one step in the treatment of the sum over $B_k$ in (48), we will need a more delicate estimate. The point here is to obtain a bound which only involves derivatives $\partial_{x_1}^{\ell_1} \partial_{x_2}^{\ell_2} F$ with $\ell_2$ as small as possible. Lemma 6.3 requires using $\ell_2 = 2$, but the following lemma will effectively allow us to take $\ell_2 = \frac12 + \varepsilon$. Cf. also Remark 12 below. We define a mixed $L^1$, $L^2$ norm for a function $F$ on $\mathbb{R} \times \mathbb{R}/N\mathbb{Z}$ as follows:

Hence as in the proof of Lemma 6.3, $H(x_1, x_2)$ in (59) is a well-defined continuous function on $\mathbb{R}^2/\mathbb{Z}^2$, and its Fourier coefficients $a_{m,n}$ satisfy (64) for any $j \le 2$, $k \le 1$; that is, This gives a relation between $a_{m,n}$ and the $n$-th Fourier coefficient of $F_{m,j,k}$. Using this relation for $j \in \{0, 2\}$ and applying Parseval's identity, for any $k \ge 0$ and $m \in \mathbb{Z}$, we get Using this bound, the Cauchy-Schwarz inequality $\sum_{n \neq 0} |a_{m,n}| \le \big(\sum_{n \neq 0} |n|^{-2}\big)^{1/2} \big(\sum_{n \neq 0} n^2 |a_{m,n}|^2\big)^{1/2}$, and (66), we conclude that the Fourier series of $H$ is absolutely convergent, and hence as in the proof of Lemma 6.3, we again have

Using (66) and (56) for $n = 0$, and the generalized Weil bound (54) for $n \neq 0$, we see that (72) is $|a_{m,n}|\,(n,c)$. $|a_{m,n}|^2 |n|^{1+\varepsilon}$. Now, since $0 < \varepsilon < 1$, we may apply Hölder's inequality with $p = \frac{2}{1-\varepsilon}$ and $q = \frac{2}{1+\varepsilon}$, to get Here in the last step we use the Parseval bound, (71), for $k = 0$ and $k = 1$. Furthermore, Hence for any $m$, $\sum_{n \neq 0} |a_{m,n}|\,(n,c)$ Using this bound in (73), we obtain (70).

The contribution from $A_k$-orbits

7.1. The case of Diophantine $\xi_2$. We next study the sum in the second line of (48). This sum will be bounded by a generalization of the method in [39]. We first prove a bound which is adequate for any $\xi = \binom{\xi_1}{\xi_2} \in \mathbb{R}^{2k}$ for which $\xi_2$ has good Diophantine properties. This bound will be used in the special case $\xi_1 = 0$ in the proof of Theorem 1.2. We note that we allow the special case $k = 1$ in the present section, to allow comparison with [39, Prop. 8.3]; cf. Remark 11 below.
Proposition 7.1. Fix an integer $m \ge \max(8, k+3)$ and real numbers $a \in \big(\frac k2 - \frac12, \frac m2 - 1\big)$ and $\varepsilon > 0$. Then for any (Recall that the majorant function $\delta_{\beta,\xi_2}(T)$ was introduced in (27).)

To start with the proof of Proposition 7.1, let us fix some $\eta = \binom{0}{r} \in A_k$ and $R = \begin{pmatrix} a_0 & b_0 \\ c_0 & d_0 \end{pmatrix} \in \Gamma'$. Using the notation introduced in Section 6, the corresponding inner sum in (74) can be written as The contribution from the terms with $c = 0$ can be bounded easily. Indeed, there are at most two such terms in (75), and by Lemma 4.4 and the remarks below (19), for any $b \in \mathbb{Z}$ we have Using this with $m = k + 1$ and adding over all $\eta \in A_k$, we see that the contribution from all the terms with $T = \begin{pmatrix} * & * \\ 0 & * \end{pmatrix}$ in the second line of (48) is $O\big(\|h\|_{L^1} \|f\|_{C^{k+1}_0}\, y^{(k+1)/2}\big)$, which is clearly subsumed by the bound in (74). Hence, from now on we focus on the terms with $c \neq 0$. The following lemma expresses the integral in (75) in the Iwasawa notation (cf. (40)).
in the sense that if either of the integrals is absolutely convergent then so is the other, and the equality holds.
Remark 10. In the case c < 0 one obtains exactly the same formula, except that We now prove that we have an absolute convergence in the left hand side of (74); this fact is important in order to justify the manipulations which we will carry out later. Lemma 7.3. Set m = max(3, k + 1). Then for any f ∈ C m 0 (X) and any h ∈ C 1 (R) with S 1,0,1 (h) < ∞, the expression is finite for all y > 0. If, furthermore, f ∈ C m a (X) for some a and m subject to a ≥ 0, a > k 2 − 1 and m > 2a + 2, then the expression in (77) stays bounded as y → 0. (Note that the lemma in particular applies to any f and h as in Proposition 7.1.) Proof. As previously, we write T = a b c d . The contribution from terms with c = 0 in (77) is treated by (76). Thus, we only consider the terms with c > 0; the terms with c < 0 can be dealt with similarly. By Lemma 7.2, and since Γ ′ ∞ \Γ ′ /Γ ′ is finite, it suffices to prove that for By Lemma 4.5 (and the observations below (19)), for any m ≥ 0 and a ∈ R ≥0 we have uniformly over u ∈ R. Using this bound for both m = 0 and a general m ≥ 0, we conclude We decompose the innermost sum in (78) in the same way as in (68), and then use the fact that which holds since |h(α)| ≤ α+1/2 α−1/2 (|h(x)| + |h ′ (x)|) dx for all α ∈ R. From the proof of Lemma 6.1 we also have Hence, we conclude that if f ∈ C m a (X) and S 1,0,1 (h) < ∞ then the left hand side of (78) is Assuming m > 2a + 2, we get (cf. Lemma 7.4 below): (Here m > 2a + 1 suffices for the first step, while m > 2a + 2 is needed to get the last bound.) The last sum converges provided that either m > k or 2a + 2 > k; and if 2a + 2 > k then it also stays bounded as y → 0.
In the proof above, we used the following bound, which we will need again later.
Lemma 7.4. Fix a ≥ 0 and m > 2a + 1. Then for any u > 0 and r ≥ 1 we have Proof. This is a straightforward case-by-case analysis.
We continue with the proof of Proposition 7.1. Using Lemma 7.2 and Remark 10, the sum in (75), excluding all terms with c = 0, can be expressed as Here the change of order of summation and integration is justified by Lemma 7.3. We will only deal with the first sum in (81); the second sum can be dealt with similarly (cf. Remark 9). By Lemma 6.3, for any positive integer c ≡ c 0 mod N and any θ ∈ (0, π), we have: Here we will use the bound (79). By a similar application of Lemma 4.5 as in (79), we have for any m ′ ∈ Z ≥6 and a ′ ∈ R ≥0 , uniformly over u ∈ R: Using these bounds, we conclude that the first sum in (81) is Lemma 7.5. Fix β > 1. Then for any α ∈ R and X > 0, Proof. If β ∈ Z then this is [39, Lemma 8.2] (with η = 1 and m = β + 1). The proof extends without changes to the case of an arbitrary real β > 1.
Lemma 7.6. For any X > 0 and β > 1 2 , Proof. ( (83)) is In order to obtain a bound on the left hand side of (74), we have to add over R running through the finite set Γ ′ ∞ \Γ ′ /Γ ′ , and add over all η ∈ A k , which means that r runs through a subset of Z k \ {0}. For this to give a satisfactory result, we have to assume 2a + 1 > k, while in the second bound we choose a ′ = max( k 2 − 11 4 , 0); with this choice, m ′ = max(8, k + 3) satisfies the condition m ′ > 2a ′ + 15 2 . Adding now over R and η, we conclude that the left hand side of (74  [39]) than Prop. 7.1, namely, essentially, "S 1,0,1+ε (h)" in place of S 1,0,2 (h). We have avoided this in the present paper for simplicity of presentation.
7.2. The case ξ 2 = 0. In this case, we prove the following bound.
Proposition 7.7. Fix an integer m ≥ max(8, k + 3) and real numbers a ∈ ( k 2 − 1 2 , m 2 − 1) and ε > 0. Then for any f ∈ C m a (X), h ∈ C 2 (R) with S 1,0,2 (h) < ∞, ξ = ξ 1 0 ∈ R 2k and 0 < y ≤ 1, we have (1) Note here that the error term in last line is the same as the last line in (82); hence it can be bounded as before; cf. (83), (86). In the remaining error term in (88) we have We can now argue as in the proof of Proposition 7.1, but instead of Lemma 7.5 using the simple bound which is valid under the assumption that m > 2a + 1, and for any fixed ε > 0. This leads to the conclusion that the contribution from the error terms in the last two lines of (88) to the left hand side of (87) is with a = k 2 − 1 2 , m = k + 1, a ′ = max( k 2 − 11 4 , 0) and m ′ = max(8, k + 3). This is clearly subsumed by the right hand side of (87). Now, it only remains to consider the first line in the right hand side of (88). The contribution from this line to the expression in the first line of (81) can be written as follows, after expressing the indicator function of c ≡ c 0 mod N as N −1 b mod N e(b(c − c 0 )/N ): where α := rξ 1 + b/N . We will use integration by parts to handle the sum over c. Thus we let We have the following bound, analogous to [39, Lemma 9.2].
For any m ≥ 0 and a ∈ R ≥0 , by Lemma 4.5 we have (in a similar way as in (79)) Using integration by parts in (89) (justified using (91) and B α (X) ≪ X 2 ), we have: Furthermore, Lemma 7.4 implies that for m > 2a + 1, Hence, also using Lemma 7.8 and jα = j(rξ 1 + b/N ) ≥ N −1 jN rξ 1 , we find that the expression in (89) is Here for each j ≥ U we use (U + X) 1+2a−m ≤ X 1+2a−m and min( 1 where the last bound is proved by splitting into the two cases U ≤ j jβ and U > j jβ and evaluating the integrals. The proof of the lemma is completed by adding up our bounds over all positive integers j, and noticing that j≥U j 2a−m ≪ (U + 1) 1+2a−m , which is bounded above by the contribution from j = 1 in the right hand side of (93).
Assuming now $m > 2a + 2$, using the lemma we get, via (92), that the expression in (89) is (Indeed, the last bound holds even if the last sum over $j$ is restricted to $j = N, 2N, 3N, \ldots$.) Finally we have to add this bound over all $R$ in the finite set Γ ′ ∞ \Γ ′ /Γ ′ , and over all $\eta \in A_k$, which means that $r$ runs through a subset of $\mathbb{Z}^k \setminus \{0\}$. Comparing with the definition (5), assuming now $a > \frac{k-1}{2}$ (viz., $2a + 1 > k$), we immediately find that the sum of the bound in (94) over all $r \in \mathbb{Z}^k$ with $0 < \|r\| < y^{-1/2}$ is On the other hand, for $r$ with $\|r\| \ge y^{-1/2}$, the sum over $j$ in (94) equals $\sum_{j=1}^{\infty} j^{-2} = \pi^2/6$, and hence the sum of the bound in (94) over all such $r$ is, assuming $m > k$ However this is subsumed by the bound (95), since $a > \frac{k-1}{2}$ and $\delta_{\mu,\xi}(T) \ge (T+1)^{-1}$ for all $T > 0$ (as is clear by taking $r = e_1$, $j = 1$ in (5)). Hence for any fixed $a > \frac{k-1}{2}$ and $m > 2a + 2$, $m \in \mathbb{Z}$, we have proved that the contribution from the first line in the right hand side of (88) to the left hand side of (87) is bounded by (95). This completes the proof of Proposition 7.7.
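The proof above rewrites the congruence condition $c \equiv c_0 \bmod N$ as the character average $N^{-1} \sum_{b \bmod N} e(b(c - c_0)/N)$. The following short Python sketch checks this orthogonality relation numerically; the function name `congruence_indicator` is a hypothetical choice made for illustration.

```python
import cmath


def e(x):
    """The additive character e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def congruence_indicator(c, c0, N):
    """Evaluate N^{-1} * sum_{b mod N} e(b (c - c0) / N).

    Up to floating point rounding this equals 1 when c = c0 (mod N)
    and 0 otherwise, by orthogonality of additive characters mod N.
    """
    return sum(e(b * (c - c0) / N) for b in range(N)) / N
```

Since the result is computed in floating point, comparisons should use a small tolerance rather than exact equality.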
Any $T = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \Gamma'$ with $c = 0$ can be expressed as $T = \varepsilon \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix}$, where $\varepsilon \in \{-1, 1\}$ and $n \in \mathbb{Z}$, and the contribution from these $T$ to the left hand side of (96) is wherein $R$ denotes the unique element in our chosen system of representatives Γ ′ /Γ ′ satisfying $R \equiv \varepsilon \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix} \bmod N$. Using (97) and $m \ge 2k + 1$, we get that the sum under consideration is This is clearly subsumed by the bound in (96).
Hence from now on we focus on the terms for $T = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ with $c \neq 0$ in the left hand side of (96). We will restrict to the case $c > 0$; the case $c < 0$ can be handled completely analogously. We fix some $\eta = \binom{q}{r} \in B_k$ and $R = \begin{pmatrix} a_0 & b_0 \\ c_0 & d_0 \end{pmatrix} \in \Gamma'$. Using Lemma 7.2, the inner sum can be expressed as: Let us first record a trivial upper bound on (100), variants of which will be used repeatedly below.
Proof. We overestimate the sum by letting $a, c, d$ run through all integer triples with $c > 0$ and $ad \equiv 1 \bmod c$. Using (97) we then get that the left hand side of (101) is where $v = v(y, c, \theta) = \frac{\sin^2\theta}{c^2 y}$ and $u_n = u_n(y, c, d, \theta) = n + \frac{\alpha}{c} - \frac{\sin 2\theta}{2c^2 y}$, with $\alpha = \alpha(c, d)$ being the unique integer between 1 and $c$ satisfying $\alpha d \equiv 1 \bmod c$. But here where we used the fact that $m > 2k \ge 2$. Furthermore, if $S_{\infty,2,0}(h) < \infty$ then we have Hence we obtain that (102) is as one verifies by treating the two cases $c^2 y \ge 1$ and $c^2 y < 1$ separately, and in the latter case, splitting the interval for $\theta$ into the parts $\{\theta : |\sin\theta| < c\sqrt{y}\}$ and $\{\theta : |\sin\theta| \ge c\sqrt{y}\}$. Now the lemma follows by using (106) in (105).
Adding the bound in Lemma 8.2 over all R ∈ Γ ′ /Γ ′ and η ∈ B k (again using m > 2k), we immediately see that the sum in the left hand side of (96) stays bounded as y → 0. In order to show that the sum actually decays as y → 0, we have to establish cancellation in (100). It will be convenient later (cf. Lemma 8.4 below) to note that we may restrict the integral in (100) to those θ ∈ (0, π) which satisfy y| cot θ| ≤ 1. Indeed, if y| cot θ| > 1 then | sin θ| < y, and we note that for any c ≥ 1 we have, with v = sin 2 θ c 2 y as in the proof of Lemma 8.2, Using this bound in place of (106) in the proof of Lemma 8.2, we conclude that the contribution from θ with y| cot θ| > 1 in (100) is ≪ f C 3m 0 S ∞,2,0 (h) η −m y m/2 . Adding this over R and η as in the left hand side of (96), we again obtain a bound which is (by far) subsumed by the bound in (96).
Let us also note that if $T = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ in (100) has $d = 0$ then necessarily $c = 1$, and inspecting the proof of Lemma 8.2 we see that the contribution from all such $T$ in (100) is hand side of (96), which is ok. Hence from now on we may consider the sum in (100) restricted by $d \neq 0$. Next we will make use of the approximation $\frac{a}{c} = \frac{1 + bc}{dc} \approx \frac{b}{d}$. The error in doing so is controlled by the following lemma.
Proof. For any $\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \Gamma'$ with $c, d \neq 0$ we have, letting $J$ be the interval with endpoints $\frac{a}{c} - \frac{\sin 2\theta}{2c^2 y}$ and $\frac{b}{d} - \frac{\sin 2\theta}{2c^2 y}$, and using $\frac{a}{c} - \frac{b}{d} = \frac{1}{dc}$ and (97), with $v = \frac{\sin^2\theta}{c^2 y}$ and $u = \frac{a}{c} - \frac{\sin 2\theta}{2c^2 y}$. (We used the crude bound $|d|^{-1} \le 1$, and the fact that $(u + \xi)^2 + 1 \asymp u^2 + 1$ for all $u \in \mathbb{R}$, $|\xi| \le 1$.) Hence, arguing as in the proof of Lemma 8.2, and using the same notation "$u_n$" as there, we find that the left hand side of (107) is The rest of the proof is very similar to Lemma 8.2, except that we now use in place of (106).
Adding the bound in Lemma 8.3 over all R ∈ Γ ′ /Γ ′ and η ∈ B k gives a bound , and this is subsumed by the bound in (96). Hence from now on we may replace a c by b d in (100). Restricting the summation to d > 0 (the case d < 0 being completely analogous), and writing I y := {θ ∈ (0, π) : y| cot θ| ≤ 1}, the resulting sum is: c>0, d>0 e (bq + dr)ξ 2 Replacing a, b, c, d by −b, d, −a, c in this sum gives, with R : a<0, c>0 e (dq + cr)ξ 2 where ( 1) is the same as (1) (cf. p. 16) but using R in place of R, and for any c ∈ Z + and θ ∈ (0, π), F c,θ (x 1 , x 2 ) is the function on R × (R/N Z) given by (Note that F c,θ also depends on N, y, R, η.) Using | f R (u, v, θ; η)| ≪ min(v, v −1 ) m/2 , cf. (97), we see that the sum defining F c,θ (x 1 , x 2 ) is absolutely convergent, and that F c,θ (x 1 , x 2 ) is continuous on R × (R/N Z). If F c,θ is sufficiently differentiable with the first few derivatives being in L 1,2 , then we may apply Lemma 6.4, to see that, for any 0 < ε < 1 2 , Bounds on the L 1,2 -norms of derivatives of F c,θ are provided by the following lemma.
We now turn to the proof of the bound (111). Using S ∞,a,ℓ 2 (h) < ∞, y| cot θ| ≤ 1 and (113), we see that (112) This bound is also valid when ℓ 2 = 0. Next, using the fact that R (u 2 It follows that the left hand side of (111), after squaring, is Using m > 2ℓ + 3 2 > 2ℓ + 1 + ε 2 , we find that the integral in the last line of (116) is Carrying out the addition over γ, we obtain the bound in (111).
Note that Lemma 8.4 also applies to give a bound on ∂ ℓ 1 , by Cauchy-Schwarz. However, in the case c √ y ≤ | sin θ|, we need to get rid of the ε-power in (111). Thus we prove: Lemma 8.5. For any integers ℓ 1 ≥ 0 and m > 2ℓ 1 +1, for any f ∈ C 3m+ℓ 1 0 (X) and Proof. Following the proof of Lemma 8.4, we see that the left hand side of (117) is ≪ 0 −∞ B 1 (s) ds, where B 1 (s) is given by (115) (with a = ℓ 2 = 0). This integral is bounded by a direct computation, and we obtain the bound in (117).
Using Lemma 8.6, it follows that (121), and hence also (108), is (We replaced "N qξ 2 " by qξ 2 through the same type of estimate as in (94).) Adding the last bound over R ∈ Γ ′ /Γ ′ and η ∈ B k , using η∈Z 2k \{0} η −m < ∞ and r∈Z k ( q r ) −m ≪ q k−m for every q ∈ Z k \{0}, and noticing that a and ε can be taken arbitrarily near 2 and 0, respectively, we obtain the bound in Proposition 8.1. This completes the proof of Proposition 8.1, and also of Theorem 1.2.
Remark 12. We now explain why we had to use Lemma 6.4 in place of Lemma 6.3 in the above proof of Proposition 8.1. One can prove a bound for the $L^1$-norm of $\partial_{x_1}^{\ell_1} \partial_{x_2}^{\ell_2} F_{c,\theta}$ which is very similar to the bound in Lemma 8.4, and in the case $c\sqrt{y} \le 1$ this leads to a bound for the corresponding integral over $0 < \theta < \pi$ with $y|\cot\theta| \le 1$. Multiplying this with $\sigma(c)^{3/2} \sqrt{c}$ and adding over $c$ (cf. (108), (110)) gives (if $\ell_2 > \frac12$) a bound $y^{(1-\ell_2)/2}$, which is insufficient. Indeed, Lemma 6.3 requires us to take $\ell_2$ as large as 2. Using instead the $L^{1,2}$-norm and Lemma 6.4 means that we can effectively take $\ell_2$ to be as small as $\frac12 + \varepsilon$, leading to the final bound $y^{\frac14 - \frac{\varepsilon}{2}}$. (One could sharpen Lemma 6.3 to a bound of the same style as in Lemma 6.4 but only involving the $L^1$-norm; this would allow us to use "$\ell_2 = 1 + \varepsilon$"; however this would still not be sufficient.)

8.2. The case $\xi_2 = 0$. The treatment in this case is quite a bit easier than for $\xi_1 = 0$. We prove the following bound:

Proposition 8.7. Let $k \ge 2$. Fix a real number $\varepsilon > 0$ and an integer $m \ge \max(7, 2k+1)$. For any $f \in C^{3m+2}_0(X)$, $h \in C^2(\mathbb{R})$ with $S_{1,0,2}(h) < \infty$, $\xi_1 \in \mathbb{R}^k$ and $0 < y \le 1$, we have Note that Theorem 1.3 follows from Proposition 8.7 together with Proposition 7.7 and the relations (48), (49).
Proof. The beginning of the proof of Proposition 8.1 carries over without changes; the first difference is that in place of (100) we get: Interchanging the roles of a and d in the summation, we see that (124) can be alternatively expressed as: where ν(x) = n∈Z h(x + nN ) (a function on R/N Z) and where ( 1) is the same as (1) (cf. p. 16) but using R : for any c and θ appearing above: Using (97) and writing v = sin 2 θ c 2 y , for any ℓ ≥ 0, we get: and thus and by Lemma 7.5 and Lemma 7.6 (using m ≥ 7), this is Adding this bound over R ∈ Γ ′ /Γ ′ and η ∈ B k , using r∈Z k ( q r ) −m ≪ q k−m for every q ∈ Z k \ {0}, and η∈Z 2k \{0} η −m < ∞ (these hold since m > 2k), we obtain the bound in Proposition 8.7. This also completes the proof of Theorem 1.3.

Application to a quantitative Oppenheim result
Our goal in this section is to prove Theorem 1.4, by making Marklof's approach from [24] effective. This will involve an application of Theorem 1.2 at a key step. 9.1. Set-up. Let H = {τ = u + iv ∈ C : v > 0}, the Poincaré upper half plane. Let k be a positive integer and let S(R k ) be the Schwartz space of functions on R k which, together with their derivatives, decrease rapidly at infinity. A central role in the approach of [24] is played by the Jacobi theta sum, Θ f (τ, φ; ξ). It is defined by the following formula, for any f ∈ S(R k ), where, for φ in any interval νπ < φ < (ν + 1)π (ν ∈ Z), f φ is given by the formula [24,.
Cf. [24, Lemmata 4.11, 4.12]. For the proof of Theorem 1.4, we will eventually specialize to $k = 2$: The starting point for the method developed in [24] is the following identity (see footnote 5), valid for any $f, g \in \mathcal{S}(\mathbb{R}^2)$, $h \in L^1(\mathbb{R})$, $T > 0$ and $\xi_2 \in \mathbb{R}^2$:

Footnote 5: Cf. [24, Sec. 2.3], where the identity (129) appears in the special case when $f(x) \equiv \psi_1(\|x\|^2)$, $g(x) \equiv \psi_2(\|x\|^2)$ and using a slightly different notation than in (129). Note that we write $\hat h(s) = \int_{\mathbb{R}} h(u) e(-su)\, du$ in (129), in line with previous definitions in our paper, whereas a different normalization of $\hat h$ is used in [24, p. 423 (top)].
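where $Q$ is the inhomogeneous quadratic form on $\mathbb{R}^4$ given by (9) with $\xi_2 = \binom{\alpha}{\beta} \in \mathbb{R}^2$, i.e., $Q(x) = (x_1 - \alpha)^2 + (x_2 - \beta)^2 - (x_3 - \alpha)^2 - (x_4 - \beta)^2$. The formula (129) follows by replacing $\Theta_f$ and $\Theta_g$ by their defining sums (cf. (126)) and changing the order of summation and integration.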
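To make the form $(m_1-\alpha)^2+(m_2-\beta)^2-(m_3-\alpha)^2-(m_4-\beta)^2$ concrete, here is a small brute-force Python sketch that evaluates it on integer points and counts how many lattice points in a ball give a value in an open interval. This is only an experiment for small radii: it uses a sharp cutoff instead of the smooth weights $f$, $g$, it does not remove any exceptional set of lattice points, and the names `Q` and `count_values` are hypothetical.

```python
from itertools import product


def Q(m, alpha, beta):
    """Value of the inhomogeneous form of signature (2, 2) at m = (m1, m2, m3, m4)."""
    m1, m2, m3, m4 = m
    return ((m1 - alpha) ** 2 + (m2 - beta) ** 2
            - (m3 - alpha) ** 2 - (m4 - beta) ** 2)


def count_values(T, a, b, alpha, beta):
    """Count m in Z^4 with |m| <= T (Euclidean norm) and a < Q(m) < b.

    Brute force over the cube [-T, T]^4, so the cost grows like T^4;
    intended for small T only.
    """
    radius = int(T)
    count = 0
    for m in product(range(-radius, radius + 1), repeat=4):
        if sum(x * x for x in m) <= T * T and a < Q(m, alpha, beta) < b:
            count += 1
    return count
```

With dyadic shifts such as $\alpha = 0.5$, $\beta = 0.25$ all values are exact in floating point, which makes small cases easy to verify by hand.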
The key step in [24] is then to determine the limit of the left hand side of (129) as T → ∞, by using the invariance properties of the function Θ f Θ g and an equidistribution result as in Theorem 1.1 above (with ξ 1 = 0); this is where we will apply our effective result, Theorem 1.2, instead. A central difficulty in [24] comes from the fact that the theta functions Θ f , Θ g are unbounded; thus one needs to truncate the function Θ f Θ g in the cusp before the equidistribution result can be applied, and then bound the error caused by the truncation. In fact it turns out that one picks up an explicit extra contribution from the part of the integral in (129) over a tiny interval |u| ≪ T −(1+ε) , whereas the error caused by the truncation for the remaining part of the integral can be proved to be appropriately small, provided that ξ 2 is Diophantine. The treatment of these matters in [24] is already in principle effective, and so our work concerning the truncation error will essentially only consist in keeping more explicit track on how the bounds in [24] depends on various parameters; cf. in particular Proposition 9.6 below. Also, for the application of Theorem 1.2, we require precise bounds on derivatives of the function Θ f Θ g ; this is worked out in Lemma 9.2 below. 9.2. Bounds for the derivatives of Θ f Θ g . Although we will eventually specialize to k = 2, we will consider a general k ∈ Z + as long as this causes no extra work. We will use the same notation S p,a,n as introduced in the introduction also for the corresponding weighted Sobolev norm of a function f ∈ C n (R k ) with k ≥ 2; namely Here we use standard multi-index notation, i.e. γ runs through k-tuples of nonnegative integers, |γ| = γ 1 + . . . + γ k and ∂ γ = ∂ γ 1 x 1 · · · ∂ γ k x k . It will be convenient to work with the Sobolev norms S 2,a,a on functions in C a (R k ), and we introduce the notation · L 2 a for these. 
Thus for any integer a ≥ 0 and f ∈ C a (R k ), We note that (cf., e.g., [10,Ch. 8 Combining this relation with the Plancherel Theorem we also have where f (y) = R k f (x)e(−xy) dx is the Fourier transform of f . In (133) and (134), the implied constants only depend on k and a.
Given f ∈ S(R k ), we view f φ (w) as a function on the space R k+1 , given by the coordinates (w, φ).
The following lemma corresponds to [24, lemma 4.3], but extended to arbitrary derivatives of f φ and with the implied constant made more precise.
Using Lemma 9.1, we now obtain bounds on arbitrary derivatives of the function Θ f Θ g ∈ C ∞ (G). Recall that we write ord(D)≤m to denote a sum over all monomials D of degree ≤ m in the fixed basis X 1 , . . . , X 3+2k of g (cf. (18)).
9.3. Bounds on the truncation error. Let us fix, once and for all, a C ∞ function g 1 : Note here that (±Γ ′ ∞ ) = ± 1 n 0 1 : n ∈ Z ; cf. (33). The function X Y is smooth and SL(2, Z)-invariant. For any τ ∈ H, there is (since Y ≥ 1) at most one term in the sum in (139) which gives a non-zero contribution. In particular, X Y (g) ∈ [0, 1] for all g ∈ G. Also, in terms of the cuspidal height function Y, cf. (3), we have X Y (g) = 0 whenever Y(g) ≤ Y and X Y (g) = 1 whenever Y(g) ≥ 2Y . Proof. Since X Y (and thus (DX Y )) is SL(2, Z)-invariant, it suffices to consider points (τ, φ, ξ) with τ = u + iv belonging to the standard fundamental domain for SL(2, Z), i.e. |u| ≤ 1 2 and |v| ≥ 1. Then we may in fact assume v > 1, since otherwise (τ, φ, ξ) is not in the support of X Y . However, for v > 1 we have: . This L ∞ -norm is clearly finite, and independent of Y .
As in [24, 6.4], [25, 6.3], we have the explicit formula The following lemma shows that for an appropriate choice of f * , the function F f * ,Y controls the error when truncating Θ f Θ g at height ≍ Y .
Lemma 9.5. Let $A > k$. Then for any $[\kappa; c]$-Diophantine $\alpha \in \mathbb{R}^k$, and any $D, T \ge 1$, Proof. Since $\alpha$ is $[\kappa; c]$-Diophantine, $\|d\alpha + m\| \ge c\, d^{-\kappa}$ for all integers $d \ge 1$ and $m \in \mathbb{Z}^k$. Also for each fixed $d$ there is at most one $m \in \mathbb{Z}^k$ in the box $-d\alpha + (-\frac12, \frac12)^k$, and in particular there is at most one $m \in \mathbb{Z}^k$ with $\|d\alpha + m\| < \frac12$. Hence for $d \in \{1, \ldots, D\}$, where in the last inequality we use the fact that $D^{\kappa}/c > 1$ (note that $\alpha$ being $[\kappa; c]$-Diophantine implies $c \le \frac12$). Adding the above bound over $d = 1, \ldots, D$, we obtain that the left hand side of (143) is $\ll D^{A\kappa+1} (cT)^{-A}$.
To prove another bound on the same sum, for any fixed $b \in \mathbb{Z}$, we start by considering the set The distance between any two distinct points in this set is bounded from below by $\min\{ T\|q\alpha + n\| : q \in \mathbb{Z},\ n \in \mathbb{Z}^k,\ |q| \le (cT)^{1/\kappa},\ [q \neq 0 \text{ or } n \neq 0] \} \ge \min\big(T, \min$ where the first inequality holds since $\alpha$ is $[\kappa; c]$-Diophantine. (Note also that there is no double representation in (144), i.e. $T((b + d)\alpha + m)$ is an injective function of $(d, m) \in \mathbb{Z} \times \mathbb{Z}^k$.) Hence for any $R \ge 1$, $M_b$ contains $\ll_k R^k$ points with $\|x\| \le R$, and so by a standard dyadic decomposition we have Now by appropriate choices of $b$, the sum in (143) can be majorized by $1 + D(cT)^{-1/\kappa}$ sums as in (145).
We have thus proved that the left hand side of (143) is always ≪ D Aκ+1 (cT ) −A and also ≪ 1 + D(cT ) −1/κ . Splitting into cases depending on which bound is strongest, we obtain the statement in (143).
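The Diophantine condition used in the proof above, $\|d\alpha + m\| \ge c\, d^{-\kappa}$ for all integers $d \ge 1$ and $m \in \mathbb{Z}^k$, can be explored numerically. The Python sketch below assumes $\|\cdot\|$ is the Euclidean norm and computes, for a finite range $1 \le d \le D$, the largest constant $c$ compatible with the inequality. A finite check of this kind proves nothing about the true Diophantine type of $\alpha$; the function names are hypothetical.

```python
from math import sqrt


def dist_to_lattice(x):
    """Euclidean distance from x in R^k to the nearest point of Z^k.

    Rounding each coordinate separately is optimal for the Euclidean norm.
    """
    return sqrt(sum((t - round(t)) ** 2 for t in x))


def empirical_constant(alpha, kappa, D):
    """Return the minimum over 1 <= d <= D of d^kappa * dist(d * alpha, Z^k).

    This is the largest c for which |d*alpha + m| >= c * d^(-kappa)
    holds for every d <= D and every integer vector m.
    """
    return min(d ** kappa * dist_to_lattice([d * a for a in alpha])
               for d in range(1, D + 1))
```

For example, `empirical_constant([sqrt(2), sqrt(3)], 0.5, 200)` returns a positive number, while any rational vector gives 0 as soon as `D` reaches a common denominator.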
The following proposition is an effective version of [24,Prop. 6.5], and is the central result needed to bound the error caused by truncating the function Θ f Θ g in the integral (129). We here specialize to the case k = 2; the case k ≥ 3 involves in principle the same computations however there are several differences in the detailed analysis (cf. [25,Prop. 6.4]).
Proof. Without loss of generality, let us assume that $f$ is positive and even, i.e., $f \ge 0$ and $f(-w) = f(w)$. Recall the expansion (141), and note that the terms with $g_Y(v)$ vanish since $v \le Y$; hence we are left with The contribution from the first sum in (147) to the integral in (146) is We discuss the three integrals in (150) one by one. Firstly, note that $I_{1,c} \neq \emptyset$ implies $c \le C_1 := \big((2H)^{-(\kappa + A^{-1})} c_0\, v^{-\frac12}\big)^{1/(\kappa + 1 + A^{-1})}$, and for each such $c$, $t \in I_{1,c}$ implies √ Turning to the integral over $I_{2,c}$, the fact that $t \in I_{2,c}$ forces $\sqrt{t^2 + 1} > (C_1/c)^{\kappa + 1 + A^{-1}}$, with $C_1$ as above. Therefore, we see that Finally for the integral over $I_{3,c}$ we have, using only Hence we obtain the bound in (146).
Next, we note that Proposition 9.6 can be extended in a straightforward manner to the case of functions h which do not have compact support but decay appropriately at infinity: Then for any f ∈ C(R 2 ) with S ∞,A,0 (f ) < ∞ and any function h : for j ≥ 1; then apply Proposition 9.7 to bound the contribution from each function h j , using supp h j ⊂ [−2 j , 2 j ] and h j L ∞ ≪ S ∞,2,0 (h)2 −2j . 9.4. Proof of Theorem 1.4. We are now ready to give the proof of Theorem 1.4. The first step is to give an effective rate for the convergence of the integral in (129) to its limit; this is obtained in Proposition 9.10 below. The proof of this proposition is divided into two lemmas, Lemma 9.8 which concerns the part of the integral where u is not very near zero, and Lemma 9.9 which concerns the remaining part. These two lemmas are (in principle) effective versions of [24,Cor. 7.4] and [24,Lemma 8.3], respectively.
Throughout this section, we let Γ = Γ(2) ⋉ Z 4 , and G = SL(2, R) ⋉ (R 2 ) ⊕2 . Recall that Θ f Θ g is a left Γ 2 invariant function on G, with Γ 2 as in (128); thus in particular, it is left invariant under Γ = Γ(2)⋉Z 4 . As always, we let µ be the probability measure on Γ\G induced by an appropriately normalized Haar measure on G (which we also denote by µ). Lemma 9.8. Let f, g ∈ S(R 2 ) and h ∈ C 2 (R), and assume S ∞,3,2 (h) < ∞. Let ξ 2 ∈ R 2 be [κ; c]-Diophantine. Then for any v ∈ (0, 1] and any real number B subject to we have Remark 13. As will be seen in the proof, the (quite small) power 1 127κ which we obtain in (153) depends strongly on which C m a -norm of the test function (i.e., F below) is required to bound in the effective equidistribution result of Theorem 1.2. Since we did not make any effort to optimize the a and m in Theorem 1.2, we do not attempt to optimize the decay rate with respect to v in Lemma 9.8 nor in Lemma 9.9 or Proposition 9.10. Instead we focus on giving results which are simple to state, yet explicit.
Also by (156) we have Here one computes, by a standard unfolding argument (cf. [25, 6.2]), We combine the above bounds with (155), where we also use the fact that δ β,ξ 2 (T ) ≥ T −1 (∀T ≥ 1), which follows by just considering the terms corresponding to r = ±e 1 and j = 1 in (5). We then get, with a = 167: .
We now continue with the proof of Theorem 1.4. Take f ∈ C 1 c (R 4 ) with support contained in the unit ball centered at the origin. We wish to go from (166) to an asymptotic formula for N α,β (f, g, T ). Fix, once and for all, a function φ ∈ C ∞ c (R 2 ) with support contained in the unit ball centered at the origin and satisfying R 2 φ(x) dx = 1. Then for an appropriate number 0 < η < 1 (to be fixed below) we define φ η ∈ C ∞ c (R 2 ) by φ η (x) := η −2 φ(η −1 x), and set f := f * (φ η ⊗ φ η ).
Then define f j ∈ C 1 c (R 4 ) through f j (x) := ϕ j ( x )f (x). Then f (x) = ∞ j=0 f j (x), and it follows that and (To prove the first relation one uses (10); the change of order of summation is justified since we have absolute convergence; ∞ j=0 m∈Z 4 \∆ f j (T −1 m)g(Q(m)) < ∞. This absolute convergence follows from f j L ∞ ≪ S ∞,3,0 (f )2 −3j and the fact that the support of f j is contained in the ball of radius 2 j+1 about the origin, combined with the bound m∈Z 4 \∆ m ≤S g(Q(m)) ≪ S 2 for S large, which follows from (173) by the argument in the proof of Lemma 9.11. The justification of the second relation in (176) is similar but easier.) We also set f j := δ 2 j+1 f j .
Finally we give the proofs of Corollaries 1.5 and 1.6 stated in the introduction.
Remark 15. Of course, the bound in (180) is often wasteful regarding the dependence on a, b. However, recall that we have to keep η, η ′ ≪ 1 in order for the first bound in (180) to be valid, and our main aim in Corollary 1.5 was to give a reasonably simple statement of a general bound with an absolute implied constant, and with a power rate decay with respect to T for any fixed (α, β) subject to a Diophantine condition.
Proof of Corollary 1.6. This can again be derived from Theorem 1.4 by an approximation argument; however it is easier to argue directly from (164), since there m 1 and m 2 appear shifted by ξ 2 , which is exactly what we need. Indeed, let χ : R 2 → {0, 1} be the characteristic function of the open unit ball centered at the origin and let χ (−b/2,−a/2) be the characteristic function of the interval (−b/2, −a/2); then for g 1 = g 2 = χ and h = χ (−b/2,−a/2) , the left hand side of (164) is exactly equal to πR 2 [a, b](T 2 ) (cf. (15)). Now the corollary follows by a similar approximation argument as in the proof of Corollary 1.5.