# Paper Digest: COLT 2014 Highlights

Readers can also browse this highlight article on our console, which allows users to filter papers by keyword and find related papers.

The Annual Conference on Learning Theory (COLT) focuses on theoretical aspects of machine learning and related topics.

To help the community quickly catch up on the work presented at this conference, the Paper Digest team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to **sign up for our free daily paper digest service** to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to get updates on new conference digests.

Paper Digest Team

team@paperdigest.org

#### TABLE 1: COLT 2014 Papers

Title | Authors | Highlight | |
---|---|---|---|

1 | Preface | Maria Florina Balcan, Csaba Szepesv�ri | Preface |

2 | Open Problem: Tightness of maximum likelihood semidefinite relaxations | Afonso S. Bandeira, Yuehaw Khoo, Amit Singer | As an illustrative example, we focus on the generalized Procrustes problem. |

3 | Open Problem: A (Missing) Boosting-type Convergence Result for AdaBoost.MH with Factorized Multi-class Classifiers | Bal�zs K�gl | In this open problem paper we take a step back to the basic setup of boosting generic multi-class factorized (Hamming) classifiers (so no trees), and state the classical problem of boosting-like convergence of the training error. |

4 | Open Problem: Finding Good Cascade Sampling Processes for the Network Inference Problem | Manuel Gomez-Rodriguez, Le Song, Bernhard Schoelkopf | Information spreads across social and technological networks, but often the network structures are hidden and we only observe the traces left by the diffusion processes, called cascades. |

5 | Open Problem: Tensor Decompositions: Algorithms up to the Uniqueness Threshold? | Aditya Bhaskara, Moses Charikar, Ankur Moitra, Aravindan Vijayaraghavan | Open Problem: Tensor Decompositions: Algorithms up to the Uniqueness Threshold? |

6 | Open Problem: The Statistical Query Complexity of Learning Sparse Halfspaces | Vitaly Feldman | We propose a potentially easier question: what is the query complexity of this learning problem in the statistical query (SQ) model of Kearns (1998). |

7 | Open Problem: Online Local Learning | Paul Christiano | The question we pose is: how general is this phenomenon? |

8 | Open Problem: Shifting Experts on Easy Data | Manfred K. Warmuth, Wouter M. Koolen | In the full information setting, the FlipFlop algorithm by De Rooij et al. (2014) combines the best of the iid optimal Follow-The-Leader (FL) and the worst-case-safe Hedge algorithms, whereas in the bandit information case SAO by Bubeck and Slivkins (2012) competes with the iid optimal UCB and the worst-case-safe EXP3. |

9 | Open Problem: Efficient Online Sparse Regression | Satyen Kale | We provide one natural formulation as an online sparse regression problem with squared loss, and ask whether it is possible to achieve sublinear regret with efficient algorithms (i.e. polynomial running time in the natural parameters of the problem). |

10 | Distribution-independent Reliable Learning | Varun Kanade, Justin Thaler | We study several questions in the \emphreliable agnostic learning framework of Kalai et al. (2009), which captures learning tasks in which one type of error is costlier than other types. |

11 | Learning without concentration | Shahar Mendelson | We obtain sharp bounds on the convergence rate of Empirical Risk Minimization performed in a convex class and with respect to the squared loss, without any boundedness assumptions on class members or on the target. |

12 | Uniqueness of Ordinal Embedding | Matth�us Kleindessner, Ulrike Luxburg | Uniqueness of Ordinal Embedding |

13 | Bayes-Optimal Scorers for Bipartite Ranking | Aditya Krishna Menon, Robert C. Williamson | We address the following seemingly simple question: what is the Bayes-optimal scorer for a bipartite ranking risk? |

14 | Multiarmed Bandits With Limited Expert Advice | Satyen Kale | We consider the problem of minimizing regret in the setting of advice-efficient multiarmed bandits with expert advice. |

15 | Learning Sparsely Used Overcomplete Dictionaries | Alekh Agarwal, Animashree Anandkumar, Prateek Jain, Praneeth Netrapalli, Rashish Tandon | We consider the problem of learning sparsely used overcomplete dictionaries, where each observation is a sparse combination of elements from an unknown overcomplete dictionary. |

16 | Community Detection via Random and Adaptive Sampling | Se-Young Yun, Alexandre Proutiere | In this paper, we consider networks consisting of a finite number of non-overlapping communities. |

17 | A second-order bound with excess losses | Pierre Gaillard, Gilles Stoltz, Tim van Erven | A second-order bound with excess losses |

18 | Logistic Regression: Tight Bounds for Stochastic and Online Optimization | Elad Hazan, Tomer Koren, Kfir Y. Levy | In this paper we investigate the question of whether these smoothness and convexity properties make the logistic loss preferable to other widely considered options such as the hinge loss. |

19 | Higher-Order Regret Bounds with Switching Costs | Eyal Gofer | This work examines online linear optimization with full information and switching costs (SCs) and focuses on regret bounds that depend on properties of the loss sequences. |

20 | The Complexity of Learning Halfspaces using Generalized Linear Methods | Amit Daniely, Nati Linial, Shai Shalev-Shwartz | We study the performance of this approach in the problem of (agnostically and improperly) learning halfspaces with margin γ. |

21 | Optimal learners for multiclass problems | Amit Daniely, Shai Shalev-Shwartz | In this paper we seek for a generic optimal learner for \emphmulticlass prediction. |

22 | Stochastic Regret Minimization via Thompson Sampling | Sudipto Guha, Kamesh Munagala | Our goal in this paper is to make progress towards understanding the empirical success of this policy. |

23 | Approachability in unknown games: Online learning meets multi-objective optimization | Shie Mannor, Vianney Perchet, Gilles Stoltz | We revisit the classical setting and consider the setting where the player has a preference relation between target sets: she wishes to approach the smallest (“best”) set possible given the observed average payoffs in hindsight. |

24 | Belief propagation, robust reconstruction and optimal recovery of block models | Elchanan Mossel, Joe Neeman, Allan Sly | We consider the problem of reconstructing sparse symmetric block models with two blocks and connection probabilities a/n and b/n for inter- and intra-block edge probabilities respectively. |

25 | Sample Compression for Multi-label Concept Classes | Rahim Samei, Pavel Semukhin, Boting Yang, Sandra Zilles | For a specific extension of the notion of VC-dimension to multi-label classes, we prove that every maximum multi-label class of dimension d has a sample compression scheme in which every sample is compressed to a subset of size at most d. |

26 | Finding a most biased coin with fewest flips | Karthekeyan Chandrasekaran, Richard Karp | We study the problem of learning a most biased coin among a set of coins by tossing the coins adaptively. |

27 | Volumetric Spanners: an Efficient Exploration Basis for Learning | Elad Hazan, Zohar Karnin, Raghu Meka | We define a novel geometric notion of exploration basis with low variance called volumetric spanners, and give efficient algorithms to construct such bases. |

28 | lil� UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits | Kevin Jamieson, Matthew Malloy, Robert Nowak, S�bastien Bubeck | The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples. |

29 | An Inequality with Applications to Structured Sparsity and Multitask Dictionary Learning | Andreas Maurer, Massimiliano Pontil, Bernardino Romera-Paredes | An Inequality with Applications to Structured Sparsity and Multitask Dictionary Learning |

30 | On the Complexity of A/B Testing | Emilie Kaufmann, Olivier Capp�, Aur�lien Garivier | When the distribution of the outcomes are Gaussian, we prove that the complexity of the fixed-confidence and fixed-budget settings are equivalent, and that uniform sampling of both alternatives is optimal only in the case of equal variances. |

31 | Elicitation and Identification of Properties | Ingo Steinwart, Chlo� Pasin, Robert Williamson, Siyu Zhang | We extend existing results to characterize the elicitability of properties in a general setting. |

32 | The sample complexity of agnostic learning under deterministic labels | Shai Ben-David, Ruth Urner | For any d, we present classes of VC-dimension d that are learnable from \tilde O(d/ε)-many samples and classes that require samples of size Ω(d/ε^2). |

33 | Density-preserving quantization with application to graph downsampling | Morteza Alamgir, G�bor Lugosi, Ulrike Luxburg | We consider the problem of vector quantization of i.i.d. samples drawn from a density p on \mathbbR^d. |

34 | A Convex Formulation for Mixed Regression with Two Components: Minimax Optimal Rates | Yudong Chen, Xinyang Yi, Constantine Caramanis | We consider the mixed regression problem with two components, under adversarial and stochastic noise. |

35 | Efficiency of conformalized ridge regression | Evgeny Burnaev, Vladimir Vovk | In this paper we explore the degree to which this additional requirement of efficiency is satisfied in the case of Bayesian ridge regression; we find that asymptotically conformal prediction sets differ little from ridge regression prediction intervals when the standard Bayesian assumptions are satisfied. |

36 | Most Correlated Arms Identification | Che-Yu Liu, S�bastien Bubeck | We study the problem of finding the most mutually correlated arms among many arms. |

37 | Fast matrix completion without the condition number | Moritz Hardt, Mary Wootters | We give the first algorithm for Matrix Completion that achieves running time and sample complexity that is polynomial in the rank of the unknown target matrix, \emphlinear in the dimension of the matrix, and \emphlogarithmic in the condition number of the matrix. |

38 | Learning Coverage Functions and Private Release of Marginals | Vitaly Feldman, Pravesh Kothari | We study the problem of approximating and learning coverage functions. |

39 | Computational Limits for Matrix Completion | Moritz Hardt, Raghu Meka, Prasad Raghavendra, Benjamin Weitz | On the technical side, we contribute several new ideas on how to encode hard combinatorial problems in low-rank optimization problems. |

40 | Robust Multi-objective Learning with Mentor Feedback | Alekh Agarwal, Ashwinkumar Badanidiyuru, Miroslav Dud�k, Robert E. Schapire, Aleksandrs Slivkins | We present an algorithm with a vanishing regret compared with the optimal possible improvement, and show that our regret bound is the best possible. |

41 | Uniqueness of Tensor Decompositions with Applications to Polynomial Identifiability | Aditya Bhaskara, Moses Charikar, Aravindan Vijayaraghavan | Given the importance of Kruskal’s theorem in the tensor literature, we expect that our robust version will have several applications beyond the settings we explore in this work. |

42 | New Algorithms for Learning Incoherent and Overcomplete Dictionaries | Sanjeev Arora, Rong Ge, Ankur Moitra | This paper presents a polynomial-time algorithm for learning overcomplete dictionaries; the only previously known algorithm with provable guarantees is the recent work of Spielman et al. (2012) who who gave an algorithm for the undercomplete case, which is rarely the case in applications. |

43 | Online Linear Optimization via Smoothing | Jacob Abernethy, Chansoo Lee, Abhinav Sinha, Ambuj Tewari | We present a new optimization-theoretic approach to analyzing Follow-the-Leader style algorithms, particularly in the setting where perturbations are used as a tool for regularization. |

44 | Learning Mixtures of Discrete Product Distributions using Spectral Decompositions | Prateek Jain, Sewoong Oh | In this paper, we introduce a polynomial time/sample complexity method for learning a mixture of r discrete product distributions over {1, 2, …, \ell}^n, for general \ell and r. |

45 | Localized Complexities for Transductive Learning | Ilya Tolstikhin, Gilles Blanchard, Marius Kloft | We give a preliminary analysis of the localized complexities for the prominent case of kernel classes. |

46 | On the Consistency of Output Code Based Learning Algorithms for Multiclass Learning Problems | Harish G. Ramaswamy, Balaji Srinivasan Babu, Shivani Agarwal, Robert C. Williamson | In this paper, we consider the question of statistical consistency of such methods. |

47 | Edge Label Inference in Generalized Stochastic Block Models: from Spectral Theory to Impossibility Results | Jiaming Xu, Laurent Massouli�, Marc Lelarge | We propose a computationally efficient spectral algorithm and show it allows for asymptotically correct inference when the average node degree could be as low as logarithmic in the total number of nodes. |

48 | Lower Bounds on the Performance of Polynomial-time Algorithms for Sparse Linear Regression | Yuchen Zhang, Martin J. Wainwright, Michael I. Jordan | Under a standard assumption in complexity theory (NP not in P/poly), we demonstrate a gap between the minimax prediction risk for sparse linear regression that can be achieved by polynomial-time algorithms, and that achieved by optimal algorithms. |

49 | Follow the Leader with Dropout Perturbations | Tim Van Erven, Wojciech Kotlowski, Manfred K. Warmuth | We consider online prediction with expert advice. |

50 | Lipschitz Bandits: Regret Lower Bound and Optimal Algorithms | Stefan Magureanu, Richard Combes, Alexandre Proutiere | For discrete Lipschitz bandits, we derive asymptotic problem specific lower bounds for the regret satisfied by any algorithm, and propose OSLB and CKL-UCB, two algorithms that efficiently exploit the Lipschitz structure of the problem. |

51 | Sample Complexity Bounds on Differentially Private Learning via Communication Complexity | Vitaly Feldman, David Xiao | In this work we analyze the sample complexity of classification by differentially private algorithms. |

52 | Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations | H. Brendan McMahan, Francesco Orabona | We study algorithms for online linear optimization in Hilbert spaces, focusing on the case where the player is unconstrained. |

53 | Principal Component Analysis and Higher Correlations for Distributed Data | Ravi Kannan, Santosh Vempala, David Woodruff | We present algorithms for two illustrative problems on massive data sets: (1) computing a low-rank approximation of a matrix A=A^1 + A^2 + \ldots + A^s, with matrix A^t stored on server t and (2) computing a function of a vector a_1 + a_2 + \ldots + a_s, where server t has the vector a_t; this includes the well-studied special case of computing frequency moments and separable functions, as well as higher-order correlations such as the number of subgraphs of a specified type occurring in a graph. |

54 | Compressed Counting Meets Compressed Sensing | Ping Li, Cun-Hui Zhang, Tong Zhang | By observing that natural signals (e.g., images or network data) are often nonnegative, we propose a framework for nonnegative signal recovery using \em Compressed Counting (CC). |

55 | The Geometry of Losses | Robert C. Williamson | In doing so we show a formal connection between proper losses and norms. |

56 | Resourceful Contextual Bandits | Ashwinkumar Badanidiyuru, John Langford, Aleksandrs Slivkins | We design the first algorithm for solving these problems that improves over a trivial reduction to the non-contextual case. |

57 | The More, the Merrier: the Blessing of Dimensionality for Learning Large Gaussian Mixtures | Joseph Anderson, Mikhail Belkin, Navin Goyal, Luis Rademacher, James Voss | In this paper we show that very large mixtures of Gaussians are efficiently learnable in high dimension. |

58 | Near-Optimal Herding | Nick Harvey, Samira Samadi | We present a new polynomial-time algorithm that solves the sampling problem with error O\left(\sqrtd \log^2.5|\mathcalX| / t \right) assuming that \mathcalX is finite. |

59 | Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians | Constantinos Daskalakis, Gautam Kamath | One of our main contributions is an improved and generalized algorithm for selecting a good candidate distribution from among competing hypotheses. |

60 | Online Learning with Composite Loss Functions | Ofer Dekel, Jian Ding, Tomer Koren, Yuval Peres | We study a new class of online learning problems where each of the online algorithm’s actions is assigned an adversarial value, and the loss of the algorithm at each step is a known and deterministic function of the values assigned to its recent actions. |

61 | Online Non-Parametric Regression | Alexander Rakhlin, Karthik Sridharan | We establish optimal rates for online regression for arbitrary classes of regression functions in terms of the sequential entropy introduced in (Rakhlin et al., 2010). |