Return Begin-Of-Sentence, End-Of-Sentence and Unknown-Word special tokens.
EOS()
BOS()
UNK()
A string representing the corresponding special token.
EOS(), BOS() and UNK() return the internal representations of the End-Of-Sentence, Begin-Of-Sentence and Unknown-Word tokens, respectively. The actual return values are irrelevant: the functions exist only to simplify queries of k-gram counts and probabilities that involve the special tokens, as shown in the examples.
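library(kgrams)
# k-gram frequency table of order 2, built from a single sentence
f <- kgram_freqs("a b b a b", 2)
# Counts of the special tokens themselves
query(f, c(BOS(), EOS(), UNK()))
#> [1] 1 1 0
# Add-k smoothed model with k = 1
m <- language_model(f, "add_k", k = 1)
# Probabilities of "a" and "b" given BOS as context
probability(c("a", "b") %|% BOS(), m)
#> [1] 0.4 0.2
# Probability of the sentence "a b b a" terminated by EOS
probability("a b b a" %+% EOS(), m)
#> [1] 0.002721088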
# The actual values of BOS(), EOS() and UNK() are irrelevant
c(BOS(), EOS(), UNK())
#> [1] "___BOS___" "___EOS___" "___UNK___"