NEWS.md
* `tknz_sent()` and `preprocess()` now have different implementations on Windows and UNIX OSs (since the previous C++ implementation has unpredictable behaviour on Windows, see #30). This fix also introduced minor changes in the `tknz_sent()` output in some corner cases (e.g. `tknz_sent("")` now returns `character(0)`, whereas it used to return `""`).
* `perplexity()` gets a new argument `exp` that allows returning the cross-entropy per word, rather than the perplexity (its exponential).
* `perplexity.character()` gets a new argument `detailed` that allows returning, alongside the total perplexity of the input document, the cross-entropies and word lengths of individual sentences. Closes #28.
* `?kgram_freqs`.
* R requirements 3.5 -> 4.0.
* `SystemRequirements: C++11` (see this tidyverse blog post)
* `verbose` arguments now default to `FALSE`.
* `probability()`, `perplexity()` and `sample_sentences()` are restricted to accept only `language_model` class objects as their `model` argument.
* `.preprocess` and `.tknz_sent` arguments to be ignored in `process_sentences()`.
* `max_lines` and `batch_size` arguments in `kgram_freqs.connection()`.
* `dictionary`.
* `dictionary()` with batch processing and non-trivial size constraints on vocabulary size.
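
As a reading aid for the `exp` argument of `perplexity()` mentioned above, the relation between cross-entropy per word and perplexity can be sketched as follows. The notation ($H$, $W$, $w_i$, $c_i$, $\mathrm{PP}$) is introduced here for illustration only and is not taken from the package. Natural logarithms are assumed, as "its exponential" suggests, and $W$ stands for the number of word tokens scored by the model.

$$
H = -\frac{1}{W}\sum_{i=1}^{W} \ln P(w_i \mid c_i), \qquad \mathrm{PP} = \exp(H)
$$

For instance, under these assumptions a cross-entropy of $H = 2$ nats per word corresponds to a perplexity of $e^{2} \approx 7.39$.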