kgrams v0.1.2 on CRAN

Summary

Version 0.1.2 of my R package kgrams was just accepted by CRAN. The package provides tools for training and evaluating k-gram language models in R, including several probability smoothing techniques, perplexity computation, random text generation and more.

Short demo

library(kgrams)
# Get k-gram frequency counts from Shakespeare's "Much Ado About Nothing"
freqs <- kgram_freqs(kgrams::much_ado, N = 4)

# Build a modified Kneser-Ney 4-gram model with discount parameters D1, D2, D3
mkn <- language_model(freqs, smoother = "mkn", D1 = 0.25, D2 = 0.5, D3 = 0.75)

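# Sample sentences from the language model at different temperatures t:
# t = 1 samples from the model probabilities as they are, t < 1 favors the
# most likely words, and t > 1 flattens the distribution towards random words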
set.seed(840)
sample_sentences(model = mkn, n = 3, max_length = 10, t = 1)

[1] "i have studied eight or nine truly by your office [...] (truncated output)"
[2] "ere you go : <EOS>"                                                        
[3] "don pedro welcome signior : <EOS>"

sample_sentences(model = mkn, n = 3, max_length = 10, t = 0.1)

[1] "i will not be sworn but love may transform me [...] (truncated output)" 
[2] "i will not fail . <EOS>"                                                
[3] "i will go to benedick and counsel him to fight [...] (truncated output)"

sample_sentences(model = mkn, n = 3, max_length = 10, t = 10)

[1] "july cham's incite start ancientry effect torture tore pains endings [...] (truncated output)"   
[2] "lastly gallants happiness publish margaret what by spots commodity wake [...] (truncated output)"
[3] "born all's 'fool' nest praise hurt messina build afar dancing [...] (truncated output)"

NEWS

Overall Software Improvements

The package’s test suite has been greatly extended.
Improved error and warning conditions for invalid arguments.
Re-enabled compiler diagnostics as per CRAN policy (#19).

API Changes

verbose arguments now default to FALSE.
probability(), perplexity() and sample_sentences() now accept only objects of class language_model as their model argument.
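A minimal sketch of what the restriction on the model argument means in practice, assuming the v0.1.2 API as described in these notes. It rebuilds the 4-gram model from the demo, evaluates a sentence with perplexity() and probability(), and then passes the raw kgram_freqs object in place of a language_model, which should now be rejected. The exact condition class and message are not stated in these notes, so the sketch only checks that an error is raised.

```r
library(kgrams)

# Rebuild the modified Kneser-Ney 4-gram model from the demo above
freqs <- kgram_freqs(kgrams::much_ado, N = 4)
mkn <- language_model(freqs, smoother = "mkn", D1 = 0.25, D2 = 0.5, D3 = 0.75)

# Valid usage: 'model' is a language_model object
pp <- perplexity("i will not fail .", model = mkn)
p_sent <- probability("i will not fail .", model = mkn)

# Invalid usage since v0.1.2: a kgram_freqs object is not a language_model,
# so the call is expected to fail; the error is captured instead of stopping
res <- tryCatch(
  perplexity("i will not fail .", model = freqs),
  error = function(e) e
)
rejected <- inherits(res, "error")
```

Since verbose now defaults to FALSE, these calls should also run without printing progress information.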

New features

as_dictionary(NULL) now returns an empty dictionary.
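A short sketch of the new behavior, together with the print method added in this release. It assumes that query() can be used on dictionaries to test whether words are included, returning a logical vector; the specific words are just examples.

```r
library(kgrams)

# Since v0.1.2, coercing NULL gives an empty dictionary
empty <- as_dictionary(NULL)
print(empty)  # uses the print method for dictionaries added in v0.1.2

# No word is contained in the empty dictionary
in_empty <- query(empty, c("love", "benedick"))

# For comparison, a dictionary built from a character vector of words
words <- as_dictionary(c("love", "benedick"))
in_words <- query(words, c("love", "benedick"))
```

This makes NULL a convenient starting value, for example when a dictionary is optional.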

Bug Fixes

Fixed a bug causing the .preprocess and .tknz_sent arguments of process_sentences() to be ignored.
Fixed incorrect defaults for the max_lines and batch_size arguments of kgram_freqs.connection().
Added a print method for class dictionary.
Fixed a bug causing invalid results in dictionary() when batch processing was combined with non-trivial constraints on vocabulary size.

Other

Updated the maintainer’s email address.

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{gherardi2021,
  author = {Gherardi, Valerio},
  title = {Kgrams V0.1.2 on {CRAN}},
  date = {2021-11-13},
  url = {https://vgherard.github.io/posts/2021-11-13-kgrams-v012-released/kgrams-v012-released.html},
  langid = {en}
}

For attribution, please cite this work as:

Gherardi, Valerio. 2021. “Kgrams V0.1.2 on CRAN.” November 13, 2021. https://vgherard.github.io/posts/2021-11-13-kgrams-v012-released/kgrams-v012-released.html.
