Prune M-gram frequency tables or Stupid Back-Off prediction tables for an M-gram model to a smaller order N.

prune(object, N, ...)

# S3 method for sbo_kgram_freqs
prune(object, N, ...)

# S3 method for sbo_predtable
prune(object, N, ...)

Arguments

object

An sbo_kgram_freqs or an sbo_predtable class object.

N

a length one positive integer. N-gram order of the new object, smaller than the order of the input object.

...

further arguments passed to or from other methods.

Value

an object of the same class as the input object.

Details

This generic function prunes M-gram frequency tables or M-gram models, represented by sbo_kgram_freqs and sbo_predtable objects respectively, to objects of a smaller N-gram order, N < M. For k-gram frequency objects, the frequency tables for k > N are simply dropped. For sbo_predtable objects, the predictions coming from the nested N-gram model are retained instead. In both cases, all attributes other than the k-gram order (such as the corpus preprocessing function, or the lambda penalty in Stupid Back-Off training) are left unchanged.
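
The "drop the tables for k > N, keep everything else" behaviour described for frequency objects can be illustrated with a small base R sketch. It is only a toy model: the list layout, the attribute names `N` and `dict`, and the helper `toy_prune()` are all hypothetical and are not the internal representation or API of sbo_kgram_freqs objects.

```r
# Toy stand-in for a k-gram frequency object (hypothetical layout, NOT sbo's).
# Element k of the list holds the k-gram counts; metadata lives in attributes.
toy_freqs <- structure(
  list(
    c("the" = 5L, "cat" = 2L, "sat" = 1L),   # 1-gram counts
    c("the cat" = 2L, "cat sat" = 1L),       # 2-gram counts
    c("the cat sat" = 1L)                    # 3-gram counts
  ),
  N = 3L,
  dict = c("the", "cat", "sat")
)

# Hypothetical helper: keep the tables for k <= N and carry over every
# attribute except the order, which is set to the new value.
toy_prune <- function(object, N) {
  M <- attr(object, "N")
  stopifnot(is.numeric(N), length(N) == 1, N >= 1, N < M)
  pruned <- object[seq_len(N)]               # subsetting drops the attributes
  attributes(pruned) <- attributes(object)   # so restore them explicitly
  attr(pruned, "N") <- as.integer(N)         # only the order changes
  pruned
}

small <- toy_prune(toy_freqs, N = 2)
length(small)        # number of retained tables
attr(small, "dict")  # dictionary carried over unchanged
```

For prediction tables the situation differs, as noted above: the retained predictions are those of the nested N-gram model, so this sketch should not be read as a description of the sbo_predtable method.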

Author

Valerio Gherardi

Examples

# Drop k-gram frequencies for k > 2
freqs <- twitter_freqs
summary(freqs)
#> k-gram frequency table
#>
#> Order (N): 3
#> Dictionary size: 1000 words
#>
#> # of unique 1-grams: 1002
#> # of unique 2-grams: 71608
#> # of unique 3-grams: 247080
#>
#> Object size: 4.7 Mb
#>
#> See ?predict.sbo_kgram_freqs for usage help.
freqs <- prune(freqs, N = 2)
summary(freqs)
#> k-gram frequency table
#>
#> Order (N): 2
#> Dictionary size: 1000 words
#>
#> # of unique 1-grams: 1002
#> # of unique 2-grams: 71608
#>
#> Object size: 0.9 Mb
#>
#> See ?predict.sbo_kgram_freqs for usage help.

# Extract a 2-gram model from a larger 3-gram model
pt <- twitter_predtable
summary(pt)
#> Next-word prediction table from Stupid Back-off N-gram model
#>
#> Order (N): 3
#> Dictionary size: 1000 words
#> Back-off penalization (lambda): 0.4
#> Maximum number of predictions (L): 3
#>
#> Object size: 1.4 Mb
#>
#> See ?predict.sbo_predictor for usage help.
#>
pt <- prune(pt, N = 2)
summary(pt)
#> Next-word prediction table from Stupid Back-off N-gram model
#>
#> Order (N): 2
#> Dictionary size: 1000 words
#> Back-off penalization (lambda): 0.4
#> Maximum number of predictions (L): 3
#>
#> Object size: 0.1 Mb
#>
#> See ?predict.sbo_predictor for usage help.
#>