腾讯 AI Lab 官网

腾讯 AI Lab 官网
论文

On Quadratic Convergence of DC Proximal Newton Algorithm in Nonconvex Sparse Learning

We propose a DC proximal Newton algorithm for solving nonconvex regularized sparse learning problems in high dimensions. Our proposed algorithm integrates the proximal Newton algorithm with multi-stage convex relaxation based on difference of convex (DC) programming, and enjoys both strong computational and statistical guarantees. Specifically, by leveraging a sophisticated characterization of sparse modeling structures/assumptions (i.e., local restricted strong convexity and Hessian smoothness), we prove that within each stage of convex relaxation, our proposed algorithm achieves (local) quadratic convergence, and eventually obtains a sparse approximate local optimum with optimal statistical properties after only a few convex relaxations. Numerical experiments are provided to support our theory...

NIPS · Dec 2017

Learning Non-Gaussian Multi-Index Model via Second-Order Stein’s Method

We consider estimating the parametric components of semiparametric multi-index models in high dimensions. To bypass the requirements of Gaussianity or elliptical symmetry of covariates in existing methods, we propose to leverage a second-order Stein’s method with score function-based corrections. We prove that our estimator achieves a near-optimal statistical rate of convergence even when the score function or the response variable is heavy-tailed. To establish the key concentration results, we develop a data-driven truncation argument that may be of independent interest. We supplement our theoretical findings with simulations...

NIPS · Dec 2017

Homotopy Parametric Simplex Method for Sparse Learning

High dimensional sparse learning has imposed a great computational challenge to large scale data analysis. In this paper, we are interested in a broad class of sparse learning approaches formulated as linear programs parametrized by a regularization factor, and solve them by the parametric simplex method (PSM). Our parametric simplex method offers significant advantages over other competing methods: (1) PSM naturally obtains the complete solution path for all values of the regularization parameter; (2) PSM provides a high precision dual certificate stopping criterion; (3) PSM yields sparse solutions through very few iterations, and the solution sparsity significantly reduces the computational cost per iteration. Particularly, we demonstrate the superiority of PSM over various sparse learning approaches, including Dantzig selector for sparse linear regression, LAD-Lasso for sparse robust linear regression, CLIME for sparse precision matrix estimation, sparse differential network estimation, and sparse Linear Programming Discriminant (LPD) analysis. We then provide sufficient conditions under which PSM always outputs sparse solutions such that its computational performance can be significantly boosted. Thorough numerical experiments are provided to demonstrate the outstanding performance of the PSM method...

NIPS · Dec 2017

Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent

Most distributed machine learning systems nowadays, including TensorFlow and CNTK, are built in a centralized fashion. One bottleneck of centralized algorithms lies on high communication cost on the central node. Motivated by this, we ask, can decentralized algorithms be faster than its centralized counterpart?..

NIPS · Dec 2017

Efficient Optimization for Linear Dynamical Systems with Applications to Clustering and Sparse Coding

Linear Dynamical Systems (LDSs) are fundamental tools for modeling spatio-temporal data in various disciplines. Though rich in modeling, analyzing LDSs is not free of difficulty, mainly because LDSs do not comply with Euclidean geometry and hence conventional learning techniques can not be applied directly. In this paper, we propose an efficient projected gradient descent method to minimize a general form of a loss function and demonstrate how clustering and sparse coding with LDSs can be solved by the proposed method efficiently. To this end, we first derive a novel canonical form for representing the parameters of an LDS, and then show how gradient-descent updates through the projection on the space of LDSs can be achieved dexterously. In contrast to previous studies, our solution avoids any approximation in LDS modeling or during the optimization process. Extensive experiments reveal the superior performance of the proposed method in terms of the convergence and classification accuracy over state-of-the-art techniques...

NIPS · Dec 2017

Predicting Scene Parsing and Motion Dynamics in the Future

The ability of predicting the future is important for intelligent systems, e.g. autonomous vehicles and robots to plan early and make decisions accordingly. Future scene parsing and optical flow estimation are two key tasks that help agents better understand their environments as the former provides dense semantic information, i.e. what objects will be present and where they will appear, while the latter provides dense motion information, i.e. how the objects will move. In this paper, we propose a novel model to simultaneously predict scene parsing and optical flow in unobserved future video frames. To our best knowledge, this is the first attempt in jointly predicting scene parsing and motion dynamics. In particular, scene parsing enables structured motion prediction by decomposing optical flow into different groups while optical flow estimation brings reliable pixel-wise correspondence to scene parsing. By exploiting this mutually beneficial relationship, our model shows significantly better parsing and motion prediction results when compared to well-established baselines and individual prediction models on the large-scale Cityscapes dataset. In addition, we also demonstrate that our model can be used to predict the steering angle of the vehicles, which further verifies the ability of our model to learn latent representations of scene dynamics...

NIPS · Dec 2017

Mixture-Rank Matrix Approximation for Collaborative Filtering

Low-rank matrix approximation (LRMA) methods have achieved excellent accuracy among today’s collaborative filtering (CF) methods. In existing LRMA methods, the rank of user/item feature matrices is typically fixed, i.e., the same rank is adopted to describe all users/items. However, our studies show that submatrices with different ranks could coexist in the same user-item rating matrix, so that approximations with fixed ranks cannot perfectly describe the internal structures of the rating matrix, therefore leading to inferior recommendation accuracy. In this paper, a mixture-rank matrix approximation (MRMA) method is proposed, in which user-item ratings can be characterized by a mixture of LRMA models with different ranks. Meanwhile, a learning algorithm capitalizing on iterated condition modes is proposed to tackle the non-convex optimization problem pertaining to MRMA. Experimental studies on MovieLens and Netflix datasets demonstrate that MRMA can outperform six state-of-the-art LRMA-based CF methods in terms of recommendation accuracy...

NIPS · Dec 2017

Geometric Descent Method for Convex Composite Minimization

In this paper, we extend the geometric descent method recently proposed by Bubeck, Lee and Singh to tackle nonsmooth and strongly convex composite problems. We prove that our proposed algorithm, dubbed geometric proximal gradient method (GeoPG), converges with a linear rate (1-1⁄√κ) and thus achieves the optimal rate among first-order methods, where κ is the condition number of the problem. Numerical results on linear regression and logistic regression with elastic net regularization show that GeoPG compares favorably with Nesterov’s accelerated proximal gradient method, especially when the problem is ill-conditioned...

NIPS · Dec 2017