Information on available k-gram continuation probability smoothers.

List of smoothers currently supported by kgrams:

  • "ml": Maximum Likelihood estimate (Markov 1913).

  • "add_k": Add-k smoothing (Dale and Laplace 1995; Lidstone 1920; Johnson 1932; Jeffreys 1998).

  • "abs": Absolute discounting (Ney and Essen 1991).

  • "wb": Witten-Bell smoothing (Bell et al. 1990; Witten and Bell 1991).

  • "kn": Interpolated Kneser-Ney (Kneser and Ney 1995; Chen and Goodman 1999).

  • "mkn": Interpolated modified Kneser-Ney (Chen and Goodman 1999).

  • "sbo": Stupid Backoff (Brants et al. 2007).

Usage

smoothers()

info(smoother)

Arguments

smoother

A string. The code name of a probability smoother, as listed by smoothers().

Value

smoothers() returns a character vector containing the code names of the probability smoothers available in kgrams. info(smoother) returns NULL invisibly; it is called for its side effect of printing information on the selected smoothing technique, such as its name, code name, parameters and parameter constraints.

References

Bell TC, Cleary JG, Witten IH (1990). Text compression. Prentice-Hall, Inc.

Brants T, Popat AC, Xu P, Och FJ, Dean J (2007). “Large Language Models in Machine Translation.” In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 858--867. https://aclanthology.org/D07-1090/.

Chen SF, Goodman J (1999). “An empirical study of smoothing techniques for language modeling.” Computer Speech & Language, 13(4), 359--394.

Dale AI, Laplace P (1995). Philosophical essay on probabilities. Springer.

Jeffreys H (1998). The theory of probability. OUP Oxford.

Johnson WE (1932). “Probability: The deductive and inductive problems.” Mind, 41(164), 409--423.

Kneser R, Ney H (1995). “Improved backing-off for M-gram language modeling.” In 1995 International Conference on Acoustics, Speech, and Signal Processing, volume 1, 181--184.

Lidstone GJ (1920). “Note on the general case of the Bayes-Laplace formula for inductive or a posteriori probabilities.” Transactions of the Faculty of Actuaries, 8, 182--192.

Markov AA (1913). “Essai d'une Recherche Statistique Sur le Texte du Roman Eugene Oneguine.” Bull. Acad. Imper. Sci. St. Petersburg, 7.

Ney H, Essen U (1991). “On smoothing techniques for bigram-based natural language modelling.” In Acoustics, Speech, and Signal Processing, IEEE International Conference on, 825--828. IEEE Computer Society.

Witten IH, Bell TC (1991). “The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression.” IEEE Transactions on Information Theory, 37(4), 1085--1094.

Author

Valerio Gherardi

Examples

# List available smoothers
smoothers()
#> [1] "ml"    "add_k" "abs"   "kn"    "mkn"   "sbo"   "wb"   

# Get information on smoother "kn", i.e. Interpolated Kneser-Ney
info("kn")
#> Interpolated Kneser-Ney
#>  * code: 'kn'
#>  * parameters: D
#>  * constraints: 0 <= D <= 1
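
# A minimal sketch (not part of the original examples): validate a code name
# against smoothers() before asking for its description. It assumes the kgrams
# package is installed; `available`, `code` and `res` are hypothetical variable
# names introduced only for this illustration.
library(kgrams)
available <- smoothers()
code <- "sbo"
if (code %in% available) {
  # info() is called for its printed output; per the Value section above,
  # it returns NULL invisibly, so `res` carries no information.
  res <- info(code)
  stopifnot(is.null(res))
}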