Package: text2vec 0.6.6

text2vec: Modern Text Mining Framework for R

Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), and similarity computation. The package provides a source-agnostic streaming API that lets researchers analyze document collections larger than available RAM. All core functions are parallelized to benefit from multicore machines.

Authors: Dmitriy Selivanov [aut, cre, cph], Manuel Bickel [aut, cph], Qing Wang [aut, cph]

text2vec_0.6.6.tar.gz
text2vec_0.6.6.zip (r-4.7), text2vec_0.6.6.zip (r-4.6), text2vec_0.6.6.zip (r-4.5)
text2vec_0.6.6.tgz (r-4.6-x86_64), text2vec_0.6.6.tgz (r-4.6-arm64), text2vec_0.6.6.tgz (r-4.5-x86_64), text2vec_0.6.6.tgz (r-4.5-arm64)
text2vec_0.6.6.tar.gz (r-4.7-arm64), text2vec_0.6.6.tar.gz (r-4.7-x86_64), text2vec_0.6.6.tar.gz (r-4.6-arm64), text2vec_0.6.6.tar.gz (r-4.6-x86_64)
text2vec_0.6.6.tgz (r-4.6-emscripten)
manual.pdf | manual.html
card.svg | card.png
text2vec/json (API)
NEWS

# Install 'text2vec' in R:
install.packages('text2vec', repos = c('https://dselivanov.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/dselivanov/text2vec/issues

Uses libs:
  • c++ – GNU Standard C++ Library v3
Datasets: movie_review (IMDB movie reviews)

glove, latent-dirichlet-allocation, natural-language-processing, text-mining, topic-modeling, vectorization, word-embeddings, word2vec, cpp

13.69 score 870 stars 27 packages 1.6k scripts 8.4k downloads 4 mentions 42 exports 14 dependencies

Last updated from: 0b31bdd81f. Checks: 13 OK. Indexed: yes.

Target                Result  Time
linux-devel-arm64     OK      249
linux-devel-x86_64    OK      204
source / vignettes    OK      269
linux-release-arm64   OK      176
linux-release-x86_64  OK      242
macos-release-arm64   OK      115
macos-release-x86_64  OK      426
macos-oldrel-arm64    OK      115
macos-oldrel-x86_64   OK      230
windows-devel         OK      205
windows-release       OK      170
windows-oldrel        OK      188
wasm-release          OK      118

Exports: as.lda_c, BNS, char_tokenizer, check_analogy_accuracy, coherence, Collocations, combine_vocabularies, create_dtm, create_tcm, create_vocabulary, dist2, fit, fit_transform, GlobalVectors, GloVe, hash_vectorizer, idir, ifiles, ifiles_parallel, itoken, itoken_parallel, jsPCA_robust, LatentDirichletAllocation, LatentSemanticAnalysis, LDA, LSA, normalize, pdist2, perplexity, postag_lemma_tokenizer, prepare_analogy_questions, prune_vocabulary, psim2, RelaxedWordMoversDistance, RWMD, sim2, space_tokenizer, split_into, TfIdf, vocab_vectorizer, vocabulary, word_tokenizer

Dependencies: data.table, digest, float, lattice, lgr, Matrix, MatrixExtra, mlapi, R6, Rcpp, RcppArmadillo, RhpcBLASctl, rsparse, stringi

Analyzing Texts with the text2vec package

Rendered from text-vectorization.Rmd using knitr::knitr on May 25 2026.

Last update: 2023-11-13
Started: 2016-01-10

GloVe Word Embeddings

Rendered from glove.Rmd using knitr::knitr on May 25 2026.

Last update: 2023-11-13
Started: 2016-01-10

Readme and manuals

Help Manual

Help page | Topics
Converts document-term matrix sparse matrix to 'lda_c' format | as.lda_c
BNS | BNS
Checks accuracy of word embeddings on the analogy task | check_analogy_accuracy
Coherence metrics for topic models | coherence
Collocations model. | Collocations
Combines multiple vocabularies into one | combine_vocabularies
Document-term matrix construction | create_dtm, create_dtm.itoken, create_dtm.itoken_parallel
Term-co-occurrence matrix construction | create_tcm, create_tcm.itoken, create_tcm.itoken_parallel
Creates a vocabulary of unique terms | create_vocabulary, create_vocabulary.character, create_vocabulary.itoken, create_vocabulary.itoken_parallel, vocabulary
Pairwise Distance Matrix Computation | dist2, distances, pdist2
re-export rsparse::GloVe | GlobalVectors, GloVe
Creates iterator over text files from the disk | idir, ifiles, ifiles_parallel
Iterators (and parallel iterators) over input objects | itoken, itoken.character, itoken.iterator, itoken.list, itoken_parallel, itoken_parallel.character, itoken_parallel.iterator, itoken_parallel.list
(numerically robust) Dimension reduction via Jensen-Shannon Divergence & Principal Components | jsPCA_robust
Creates Latent Dirichlet Allocation model. | LatentDirichletAllocation, LDA
Latent Semantic Analysis model | LatentSemanticAnalysis, LSA
IMDB movie reviews | movie_review
Matrix normalization | normalize
Perplexity of a topic model | perplexity
Prepares list of analogy questions | prepare_analogy_questions
Printing Vocabulary | print.text2vec_vocabulary
Prune vocabulary | prune_vocabulary
Creates Relaxed Word Movers Distance (RWMD) model | RelaxedWordMoversDistance, RWMD
Pairwise Similarity Matrix Computation | psim2, sim2, similarities
Split a vector for parallel processing | split_into
text2vec | text2vec-package, text2vec
TfIdf | TfIdf
Simple tokenization functions for string splitting | char_tokenizer, postag_lemma_tokenizer, space_tokenizer, tokenizers, word_tokenizer
Vocabulary and hash vectorizers | hash_vectorizer, vectorizers, vocab_vectorizer