A simple text preprocessing utility.
```r
preprocess(input, erase = "[^.?!:;'\\w\\s]", lower_case = TRUE)
```
Argument | Description |
---|---|
input | a character vector. |
erase | a length-one character vector: a regular expression matching the parts of the text to be erased from `input`. The default removes every character that is not a word character (letter, digit or underscore), white space, an apostrophe, or one of the punctuation characters `.?!:;`. |
lower_case | a length-one logical vector. If `TRUE`, converts all text to lower case. |
Value: a character vector containing the processed output.
Author: Valerio Gherardi
```r
preprocess("Hi @ there! I'm using `sbo`.")
#> [1] "hi there! i'm using sbo."
```
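To make the roles of `erase` and `lower_case` concrete, here is a minimal base-R approximation of the documented behaviour. It is a hypothetical sketch, not the package's actual implementation: the name `preprocess_sketch` is invented for illustration, and the whitespace-collapsing step is an assumption inferred from the example output above (erasing `@` would otherwise leave two spaces between "hi" and "there").

```r
# Hypothetical base-R approximation of preprocess(); NOT the sbo implementation.
preprocess_sketch <- function(input, erase = "[^.?!:;'\\w\\s]", lower_case = TRUE) {
  stopifnot(is.character(input))
  # Delete every match of `erase`. perl = TRUE so that \w and \s are
  # understood inside the bracket expression (PCRE syntax).
  out <- gsub(erase, "", input, perl = TRUE)
  # Assumption: collapse runs of whitespace left behind by erased characters.
  out <- gsub("\\s+", " ", out, perl = TRUE)
  # Optionally convert everything to lower case.
  if (lower_case) out <- tolower(out)
  out
}

preprocess_sketch("Hi @ there! I'm using `sbo`.")
#> [1] "hi there! i'm using sbo."
preprocess_sketch("Hi @ there!", lower_case = FALSE)
#> [1] "Hi there!"
```

Because `gsub()` and `tolower()` are vectorised, the sketch processes every element of a character vector in one call. Passing a different `erase` pattern changes only the deletion step; case handling is controlled independently by `lower_case`.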