text2vec - Modern Text Mining Framework for R
Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), and similarities. This package provides a source-agnostic streaming API that lets researchers analyze document collections larger than available RAM. All core functions are parallelized to benefit from multicore machines.
glove, latent-dirichlet-allocation, natural-language-processing, text-mining, topic-modeling, vectorization, word-embeddings, word2vec, cpp
13.69 score 870 stars 27 dependents 1.6k scripts 8.4k downloads
RestRserve - A Framework for Building HTTP API
Makes it easy to create high-performance, full-featured HTTP APIs from R functions. Provides high-level classes such as 'Request', 'Response', 'Application', and 'Middleware' to streamline server-side application development. Out of the box, it serves requests using the 'Rserve' package, but it is flexible enough to integrate with other HTTP servers such as 'httpuv'.
http-server, openapi, rest-api, swagger-ui, cpp
9.28 score 296 stars 1 dependent 107 scripts 615 downloads
rsparse - Statistical Learning on Sparse Matrices
Implements many algorithms for statistical learning on sparse matrices: matrix factorizations, matrix completion, elastic net regressions, and factorization machines. 'rsparse' also enhances the 'Matrix' package by providing methods for multithreaded <sparse, dense> matrix products and native slicing of sparse matrices in Compressed Sparse Row (CSR) format.
Algorithms for regression problems:
1) Elastic Net regression via Follow The Proximally-Regularized Leader (FTRL) Stochastic Gradient Descent (SGD), as per McMahan et al. (<doi:10.1145/2487575.2488200>)
2) Factorization Machines via SGD, as per Rendle (2010, <doi:10.1109/ICDM.2010.127>)
Algorithms for matrix factorization and matrix completion:
1) Weighted Regularized Matrix Factorization (WRMF) via Alternating Least Squares (ALS), paper by Hu, Koren, Volinsky (2008, <doi:10.1109/ICDM.2008.22>)
2) Maximum-Margin Matrix Factorization via ALS, paper by Rennie, Srebro (2005, <doi:10.1145/1102351.1102441>)
3) Fast Truncated Singular Value Decomposition (SVD), Soft-Thresholded SVD, Soft-Impute matrix completion via ALS, paper by Hastie, Mazumder et al. (2014, <doi:10.48550/arXiv.1410.2596>)
4) Linear-Flow matrix factorization, from 'Practical linear models for large-scale one-class collaborative filtering' by Sedhain, Bui, Kawale et al. (2016, ISBN:978-1-57735-770-4)
5) GlobalVectors (GloVe) matrix factorization via SGD, paper by Pennington, Socher, Manning (2014, <https://aclanthology.org/D14-1162/>)
The package is reasonably fast and memory efficient: it can work with large datasets with millions of rows and millions of columns. This is particularly useful for practitioners working on recommender systems.
collaborative-filtering, factorization-machines, matrix-completion, matrix-factorization, recommender-system, sparse-matrices, svd, openblas, cpp, openmp
9.17 score 180 stars 28 dependents 55 scripts 7.1k downloads
mlapi - Abstract Classes for Building 'scikit-learn' Like API
Provides 'R6' abstract classes for building machine learning models with a 'scikit-learn'-like API. 'scikit-learn' (<https://scikit-learn.org/>) is a popular module for the 'Python' programming language whose design has become a de facto industry standard for machine learning tasks.
5.48 score 28 dependents 7 scripts 7.2k downloads
sparsepp - 'Rcpp' Interface to 'sparsepp'
Provides an interface to 'sparsepp', a fast, memory-efficient hash map. It is derived from Google's excellent 'sparsehash' implementation. We believe 'sparsepp' provides an unparalleled combination of performance and memory usage, and will outperform your compiler's unordered_map on both counts. Only Google's 'dense_hash_map' is consistently faster, at the cost of much greater memory usage (especially when the final size of the map is not known in advance).
hashmap, header-only, rcpp, sparsepp
4.04 score 11 stars 2 scripts 283 downloads
sparsio - I/O Operations with Sparse Matrices
Fast 'SVMlight' reader and writer. 'SVMlight' is the most commonly used format for storing sparse matrices (possibly with some target variable) on disk. For additional information about the 'SVMlight' format, see <http://svmlight.joachims.org/>.
svmlight, cpp
3.74 score 11 stars 10 scripts 206 downloads
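For readers unfamiliar with the 'SVMlight' format mentioned in the sparsio entry, the sketch below illustrates its basic line layout: a target value followed by `index:value` pairs for the non-zero entries of one row, with an optional `#` comment. It is a language-neutral illustration written in Python, not the sparsio API; the helper names `parse_svmlight_line` and `format_svmlight_line` are hypothetical, and ranking-mode `qid:` tokens are assumed absent.

```python
def parse_svmlight_line(line):
    """Parse one SVMlight-style line into (target, {feature_index: value})."""
    # Drop an optional trailing comment introduced by '#'.
    body = line.split("#", 1)[0].strip()
    tokens = body.split()
    target = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        # Each remaining token is "index:value" for one non-zero entry.
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return target, features


def format_svmlight_line(target, features):
    """Serialize (target, features) back to one line, indices ascending."""
    pairs = " ".join(f"{i}:{features[i]:g}" for i in sorted(features))
    return f"{target:g} {pairs}".rstrip()


# Two sparse rows: only non-zero columns are stored.
rows = ["1 1:0.5 3:2 # first document", "-1 2:1.5"]
parsed = [parse_svmlight_line(r) for r in rows]
```

Because only non-zero entries are written, the file size grows with the number of non-zeros rather than with the full matrix dimensions, which is why the format suits sparse data.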