Biblio
Deep learning is a highly effective machine learning technique for large-scale problems. The optimization of nonconvex functions in deep learning literature is typically restricted to the class of first-order algorithms. These methods rely on gradient information because of the computational complexity associated with the second derivative Hessian matrix inversion and the memory storage required in large scale data problems. The reward for using second derivative information is that the methods can result in improved convergence properties for problems typically found in a non-convex setting such as saddle points and local minima. In this paper we introduce TRMinATR - an algorithm based on the limited memory BFGS quasi-Newton method using trust region - as an alternative to gradient descent methods. TRMinATR bridges the disparity between first order methods and second order methods by continuing to use gradient information to calculate Hessian approximations. We provide empirical results on the classification task of the MNIST dataset and show robust convergence with preferred generalization characteristics.
Sequential recommendation algorithms aim to predict users' future behavior given their historical interactions. A recent line of work has achieved state-of-the-art performance on sequential recommendation tasks by adapting ideas from metric learning and knowledge-graph completion. These algorithms replace inner products with low-dimensional embeddings and distance functions, employing a simple translation dynamic to model user behavior over time. In this paper, we propose TransFM, a model that combines translation and metric-based approaches for sequential recommendation with Factorization Machines (FMs). Doing so allows us to reap the benefits of FMs (in particular, the ability to straightforwardly incorporate content-based features), while enhancing the state-of-the-art performance of translation-based models in sequential settings. Specifically, we learn an embedding and translation space for each feature dimension, replacing the inner product with the squared Euclidean distance to measure the interaction strength between features. Like FMs, we show that the model equation for TransFM can be computed in linear time and optimized using classical techniques. As TransFM operates on arbitrary feature vectors, additional content information can be easily incorporated without significant changes to the model itself. Empirically, the performance of TransFM significantly increases when taking content features into account, outperforming state-of-the-art models on sequential recommendation tasks for a wide variety of datasets.