The Sum-of-Squares Algorithmic Paradigm in Statistics

(STATS 314a: "Advanced Statistical Theory", Spring 2022)

The best way to contact me is by email. Please be sure to include "STATS 314" in the subject line.

Prerequisites: The most important prerequisite is mathematical maturity. I expect a solid background in probability (the material in STATS 116 is technically sufficient, but ideally students will have built upon this knowledge in other courses). Coursework in algorithms (such as CS161) or machine learning (such as STATS/CS 229 or STATS 214) is helpful as well---I will not assume this knowledge, but I will cover the necessary background at a fast pace.


This course will introduce and explore Sum-of-Squares (SoS) algorithms in the context of statistics. In recent years, the powerful SoS "proofs-to-algorithms" paradigm has led to numerous breakthroughs in algorithmic statistics, yielding polynomial-time algorithms with provable guarantees for previously unreachable problems including learning latent variable models (via tensor decomposition), clustering, numerous problems in robust statistics, and more.

The course will begin with introductory lectures designed to bring the uninitiated up to speed: we'll introduce the SoS "proofs-to-algorithms" paradigm, as well as the underlying tools from convex optimization (semidefinite programming). We will then cover a number of applications, including clustering, robust mean estimation, robust regression, mean-field approximations of Ising models, and tensor decompositions for learning latent variable models. Beyond these applications, an important question is how to make these algorithms not only polynomial-time but practical, and we will see the current best efforts on this front. Finally, we'll also explore the ways in which SoS has informed the study of information-computation tradeoffs, i.e., the interaction of computational and statistical resources.
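To give a concrete taste of the "proofs-to-algorithms" idea before the course begins: a certificate that a polynomial is nonnegative can be phrased as a positive semidefinite Gram matrix, which is exactly the kind of object a semidefinite program searches for. Below is a minimal numpy sketch for an illustrative polynomial (the polynomial and Gram matrix are my own toy choices, not taken from the lectures):

```python
import numpy as np

# p(x) = x^2 + 2x + 2, written as v(x)^T Q v(x) with v(x) = (1, x)
# and a Gram matrix Q.
Q = np.array([[2.0, 1.0],
              [1.0, 1.0]])

# Q being positive semidefinite certifies p(x) >= 0 for all x.
assert np.all(np.linalg.eigvalsh(Q) >= 0)

# A Cholesky factorization Q = L L^T makes the certificate an
# explicit sum of squares: p(x) = sum_j (L^T v(x))_j^2.
L = np.linalg.cholesky(Q)

def p(x):
    return x**2 + 2 * x + 2

def p_as_sos(x):
    v = np.array([1.0, x])
    return np.sum((L.T @ v) ** 2)

# Sanity check: the sum-of-squares form agrees with p everywhere.
for x in np.linspace(-3, 3, 7):
    assert np.isclose(p(x), p_as_sos(x))
print("SoS certificate verified")
```

Here the matrix Q is written down by hand; in the course we will see how semidefinite programming finds such certificates automatically, and how "pseudo-distribution" duals of these programs yield algorithms.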


We will cover the following topics, roughly partitioned into units as follows:

  1. Introduction to the sum-of-squares paradigm, semidefinite programming, and information-computation tradeoffs, via illustrative examples.
  2. Clustering. Clustering mixtures of Gaussians and other distributions with SoS certificates of hypercontractivity.
  3. Global correlation rounding & Ising models. A general technique for rounding sum-of-squares relaxations, with application to mean-field approximations in Ising models.
  4. Robust statistics. Brief reprise of robust mean estimation. Robust linear regression, and list-decodable linear regression. Heavy-tailed mean estimation.
  5. Tensor decomposition. Tensor decomposition is a useful primitive in learning latent variable models, such as dictionary learning and topic models. SoS algorithms provide the best guarantees known for tensor decomposition in a number of settings.
  6. Practical implementations of SoS. SoS provides strong guarantees in polynomial time, but because these algorithms rely on solving large semidefinite programs, running them "off the shelf" is impractical. We'll see some cases in which researchers have developed practical versions of SoS algorithms.
  7. Lower Bounds. For many problems in algorithmic statistics, no polynomial-time algorithms are known, even when the problems are known to be information-theoretically solvable. Lower bounds against sum-of-squares algorithms give evidence that these problems may be computationally hard. The study of SoS lower bounds has informed the study of information-computation tradeoffs, leading to the "low-degree conjecture;" we will survey these developments.
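As a one-dimensional warm-up for the robust statistics unit: the empirical mean can be moved arbitrarily far by a small fraction of corrupted samples, while the median cannot. The snippet below (a toy demo with made-up contamination values, not an SoS algorithm) illustrates the failure mode that the robust estimators in unit 4 are designed to avoid in high dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps = 1000, 0.1          # sample size and fraction of corrupted points

# Clean samples from N(0, 1); the true mean is 0.
x = rng.standard_normal(n)

# An adversary replaces an eps-fraction of the sample with a large value.
x[: int(eps * n)] = 1000.0

print(abs(np.mean(x)))      # the empirical mean is dragged far from 0
print(abs(np.median(x)))    # the median stays close to the true mean
```

In one dimension the median suffices, but naive coordinate-wise medians lose dimension-dependent factors in high dimensions; closing that gap efficiently is where the SoS machinery enters.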

Below is a preliminary schedule. It will be updated as we progress through the course.

Date Topic Notes
3/28 Introduction: robust mean estimation. [notes] online
3/30 Semidefinite programming and the SoS algorithm [notes] online
4/04 Matrix and tensor completion Section 2.2 of this survey covers the lecture content. Handwritten notes available on Canvas, until typed lecture notes are posted.
4/06 Community detection in block models & information-computation tradeoffs [notes] Emmanuel Abbe's survey is a great reference, with a thorough (though five-year-old) bibliography.
4/11 Clustering mixtures of Gaussians [notes]
- Homework 0 available on Canvas, due Monday April 18 at 5:00pm Pacific.
4/13 Clustering 2, certifiable hypercontractivity (same notes as previous lecture)
4/18 Global correlation rounding [notes] Guest lecture by Frederic Koehler
4/20 Mean-field approximations in Ising Models Guest lecture by Frederic Koehler, see this paper
4/25 Robust linear regression [notes]
4/27 List-decodable linear regression The lecture is based on the paper of Karmalkar, Klivans, and Kothari. The concurrent paper of Raghavendra and Yau obtains similar results.
5/02 List-decodable regression cont. See the survey article of Monique Laurent for details on how to certify the non-negativity of univariate polynomials over an interval (section 3.6).
5/04 Tensor decomposition for learning latent variable models (quasi-polynomial time dictionary learning) See this paper on quasi-polynomial time SoS algorithms for dictionary learning via tensor decomposition, and a representative paper (one of several) on using tensor decompositions to learn latent variable models.
5/09 Efficient SoS algorithms for decomposing random overcomplete tensors Until the new lecture notes are up, see these notes from last year's STATS 319. The relevant references are this paper by Ge and Ma and this paper by Ma, Shi, and Steurer. Also see Tropp's matrix concentration survey.
5/11 Making SoS practical 1: spectral algorithms for overcomplete tensor decomposition The lecture was based on this recent paper of Ding, d'Orsi, Liu, Tiegel, and Steurer, which builds on this paper and this paper.
5/16 Making SoS practical 2: fast SDP solvers See this paper about solving constraint satisfaction problems and this paper about robust mean estimation as two examples.
5/18 SoS & lower bounds 1: 3-XOR Lecture notes from the course by Barak and Steurer.
5/23 SoS & lower bounds 2: planted clique and pseudocalibration Notes on the lower bound for planted clique from the course by Barak and Steurer. See also this survey (section 3) on pseudocalibration, and this survey and Sam Hopkins' thesis for a conjecture about the computational complexity of hypothesis testing.
5/25 Student Presentations
5/30 NO CLASS: Memorial Day
6/01 Student Presentations
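For students who want a hands-on preview of the tensor decomposition lectures (5/04 and 5/09): the classical precursor to the SoS algorithms is Jennrich's simultaneous-diagonalization algorithm, which recovers the components of a third-order tensor when they are linearly independent. A minimal numpy sketch (my own illustrative implementation, not code from the course; this is not the SoS algorithm, which handles the harder overcomplete regime):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Ground-truth components: the columns of A, linearly independent w.h.p.
A = rng.standard_normal((d, d))

# The third-order tensor T = sum_i a_i (x) a_i (x) a_i.
T = np.einsum("ji,ki,li->jkl", A, A, A)

# Jennrich's algorithm: contract T along the third mode with two random
# vectors g and h, giving M_g = A diag(A^T g) A^T and likewise M_h.
g, h = rng.standard_normal(d), rng.standard_normal(d)
M_g = np.einsum("jkl,l->jk", T, g)
M_h = np.einsum("jkl,l->jk", T, h)

# M_g M_h^{-1} = A diag((a_i . g)/(a_i . h)) A^{-1}, so its eigenvectors
# are the components a_i, up to scaling.
vals, vecs = np.linalg.eig(M_g @ np.linalg.inv(M_h))
vecs = np.real(vecs)

# Each true component should be parallel to some recovered eigenvector.
cosines = np.abs(
    (A / np.linalg.norm(A, axis=0)).T @ (vecs / np.linalg.norm(vecs, axis=0))
)
print(cosines.max(axis=1))
```

The random contractions give distinct eigenvalue ratios with probability 1, which is what makes the eigendecomposition identify the components; the SoS algorithms covered in lecture extend recovery to overcomplete and noisy settings where this argument breaks down.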
Materials & Resources

Additional recommended readings and videos will be posted on a per-lecture basis, in the "notes" field of the schedule above.