# Optimality guarantees for distributed statistical estimation

@article{Duchi2014OptimalityGF, title={Optimality guarantees for distributed statistical estimation}, author={John C. Duchi and Michael I. Jordan and Martin J. Wainwright and Yuchen Zhang}, journal={arXiv: Information Theory}, year={2014} }

Large data sets often require performing distributed statistical estimation, with a full data set split across multiple machines and limited communication between machines. To study such scenarios, we define and study some refinements of the classical minimax risk that apply to distributed settings, comparing to the performance of estimators with access to the entire data. Lower bounds on these quantities provide a precise characterization of the minimum amount of communication required to… Expand

#### Figures and Topics from this paper

#### 46 Citations

Communication-Efficient Distributed Learning of Discrete Probability Distributions

- 2017

We initiate a systematic investigation of distribution learning (density estimation) when the data is distributed across multiple servers. The servers must communicate with a referee and the goal is… Expand

Probabilistic Error Upper Bounds For Distributed Statistical Estimation

- 2017

The size of modern datasets has spurred interest in distributed statistical estimation. We consider a scenario in which randomly drawn data is spread across a set of machines, and the task is to… Expand

Converses for distributed estimation via strong data processing inequalities

- Mathematics, Computer Science
- 2015 IEEE International Symposium on Information Theory (ISIT)
- 2015

We consider the problem of distributed estimation, where local processors observe independent samples conditioned on a common random parameter of interest, map the observations to a finite number of… Expand

Information-Theoretic Lower Bounds on Bayes Risk in Decentralized Estimation

- Mathematics, Computer Science
- IEEE Transactions on Information Theory
- 2017

We derive lower bounds on the Bayes risk in decentralized estimation, where the estimator does not have direct access to the random samples generated conditionally on the random parameter of… Expand

Distributed Statistical Estimation and Rates of Convergence in Normal Approximation

- Mathematics, Computer Science
- Electronic Journal of Statistics
- 2019

It is shown that one of the key benefits of the divide-and-conquer strategy is robustness, an important characteristic for large distributed systems, and connections between performance of these distributed algorithms and the rates of convergence in normal approximation are established. Expand

Distributed linear regression by averaging

- Mathematics
- 2018

Distributed statistical learning problems arise commonly when dealing with large datasets. In this setup, datasets are partitioned over machines, which compute locally, and communicate short… Expand

Efficient Estimation for Generalized Linear Models on a Distributed System with Nonrandomly Distributed Data

- Computer Science, Mathematics
- 2020

This work develops a Pseudo-Newton-Raphson algorithm for efficient estimation of generalized linear models on a distributed system with nonrandomly distributed data and develops a likelihood ratio test for hypothesis testing based on the one-step estimator. Expand

Wyner-Ziv Estimators: Efficient Distributed Mean Estimation with Side Information

- Computer Science, Mathematics
- AISTATS
- 2021

Without any probabilistic assumptions on the underlying data, the problem of distributed mean estimation where the server has access to side information is studied and Wyner-Ziv estimators are proposed, which are communication and computationally efficient and near-optimal when an upper bound for the distance between the side information and the data is known. Expand

Communication Complexity of Distributed Statistical Algorithms

- Computer Science
- 2015

This paper constructs bounds on the minimax risk under loss functions when statistical estimation is performed in a distributed environment and with communication constraints. We treat this problem… Expand

Distributed Learning with Sublinear Communication

- Computer Science, Mathematics
- ICML
- 2019

By slightly relaxing the standard boundedness assumptions for linear models, a family of algorithms that combine mirror descent with randomized sparsification/quantization of iterates are obtained and this result extends to the general stochastic convex optimization model. Expand

#### References

SHOWING 1-10 OF 45 REFERENCES

Information-theoretic lower bounds for distributed statistical estimation with communication constraints

- Computer Science, Mathematics
- NIPS
- 2013

Lower bounds on minimax risks for distributed statistical estimation under a communication budget are established for several problems, including various types of location models, as well as for parameter estimation in regression models. Expand

Communication-efficient algorithms for statistical optimization

- Computer Science, Mathematics
- 2012 IEEE 51st IEEE Conference on Decision and Control (CDC)
- 2012

A sharp analysis of this average mixture algorithm is provided, showing that under a reasonable set of conditions, the combined parameter achieves mean-squared error that decays as O(N-1 + (N/m)-2). Expand

Local Privacy, Data Processing Inequalities, and Statistical Minimax Rates

- Mathematics, Computer Science
- 2013

This work proves bounds on information-theoretic quantities, including mutual information and Kullback-Leibler divergence, that depend on the privacy guarantees, and provides a treatment of several canonical families of problems: mean estimation, parameter estimation in fixed-design regression, multinomial probability estimation, and nonparametric density estimation. Expand

Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling

- Mathematics, Computer Science
- IEEE Transactions on Automatic Control
- 2012

This work develops and analyze distributed algorithms based on dual subgradient averaging and provides sharp bounds on their convergence rates as a function of the network size and topology, and shows that the number of iterations required by the algorithm scales inversely in the spectral gap of thenetwork. Expand

Universal decentralized estimation in a bandwidth constrained sensor network

- Computer Science
- IEEE Transactions on Information Theory
- 2005

This correspondence investigates optimal local estimation and final fusion schemes under the constraint that the communication from each sensor to the fusion center must be a one-bit message. Expand

Statistical estimation : asymptotic theory

- Mathematics
- 1981

when certain parameters in the problem tend to limiting values (for example, when the sample size increases indefinitely, the intensity of the noise ap proaches zero, etc.) To address the problem of… Expand

Decentralized Detection'

- 1993

Consider a set of sensors that receive observations from the environment and transmit finite-valued messages to a fusion center that makes a final decision on one out of M alternative hypotheses. The… Expand

Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers

- Computer Science
- Found. Trends Mach. Learn.
- 2011

It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. Expand

Distributed Learning, Communication Complexity and Privacy

- Computer Science, Mathematics
- COLT
- 2012

General upper and lower bounds on the amount of communication needed to learn well are provided, showing that in addition to VC- dimension and covering number, quantities such as the teaching-dimension and mistake-bound of a class play an important role. Expand

An information statistics approach to data stream and communication complexity

- Computer Science, Mathematics
- J. Comput. Syst. Sci.
- 2004

An improved lower bound for the multi-party set-disjointness problem in the general communication complexity model, and a nearly optimal lower bound in the one-way communication model. Expand