A study on structure recovery and the broadcasting problem by Vasilki Velona

Vasiliki Velona defended her PhD thesis A study on structure recovery and the broadcasting problem (pdf), supervised by Gabor Lugosi (ICREA and UPF) and Juanjo Rué (IMTech and DMAT), on 17 September 2021 within the UPC doctoral program in Applied Mathematics at the FME. Currently she is a post-doctoral researcher at the Hebrew University of Jerusalem working with Ori Gurel-Gurevich and Ohad Feldheim.

Thesis summary

The thesis contains a study of two problems of combinatorial statistics. The first one is structure-recovering for partial correlation graphs and the second one is the broadcasting problem on certain families of random recursive trees. In a precise language, the structure-recovery problem that we study is the following: given access to individual entries of a covariance matrix S, the objective is to learn the support of the inverse of S (let us denote the inverse by K) using only a small fraction of the entries of S. We call this problem ‘structure recovery’ since the zero-entries of K define the adjacency matrix of a graph (the so-called partial correlation graph). As an example of why this is an important graph to learn, consider that S corresponds to a Gaussian random vector (X1, . . . , Xn). Then an entry sij is zero if and only if Xi and Xj are independent given the rest of the random variables. A series of algorithms is proposed to address the aforementioned question, assuming that our graphs satisfy certain sparsity conditions. The sparsity that is assumed is related to how much our graph resembles a tree; in particular we deal with trees, graphs of two small connected components, and graphs of small treewidth. The proposed algorithms can also be used to estimate K and not only learn the partial correlation graph. Moreover, they can be used to invert any symmetric positive definite matrix since the analysis can be detached from its statistical connection and impact. The motivation for the use of covariance entries is that S might be too large to be stored, as it often happens in statistical settings. In fact, our goal is to learn the partial correlation graph using a sub-quadratic number of queries, since a quadratic time is needed just to write down and store the covariance matrix — this is the starting point for a big part of the literature. The desired complexity bounds are achieved through our analysis.

Moving to the second problem under study, we consider a broadcasting process on a graph to be the propagation of a message (let us say a bit value in {0, 1}) from one node to all the rest, possibly corrupted. Our goal is to guess the initial message. We consider that our graph is a tree created dynamically at times 0, 1, . . . , n, in a way that at time i the i-th vertex enters the system and attaches with an edge to an existing vertex j (we then write i ∼ j). We are interested in the case where i attaches uniformly at random to an existing vertex (uniform attachment) or where i attaches to a vertex with probability proportional to the outdegree of an existing vertex, plus some parameter β > 0. The broadcasting process we consider is one where vertex 0 (the root) has a bit value that is propagated correctly to its neighbours with probability 1 − q and incorrectly with probability q. The broadcasting problem under study can be formulated in this way: given access to a random tree produced by either uniform attachment or preferential attachment and the bit values of the vertices, but without observing the time labels of the vertices, recover the bit of vertex zero. In a more difficult variant, we answer the same question given only the bits of vertices with outdegree zero (the leaves). In both variants of the problem in both models, we
characterize the values of q for which the optimal reconstruction method has a probability of error bounded away from 1/2. We also show that the probability of error is bounded by a constant times q. Two simple reconstruction rules are analyzed in detail. One of them is the simple majority vote, the other is the bit value of the centroid of the tree (or the closest leaf to the centroid). We also analyze a third reconstruction rule which is more complex but works for all q where reconstruction is theoretically possible.

Highlighted publication: Gábor Lugosi, Jakub Truszkowski, Vasiliki Velona and Piotr Zwiernik. “Learning partial correlation graphs and graphical models by covariance queries”, Journal of Machine Learning Research 22, September 2021, 1-41 (pdf).