A simple text preprocessing utility.

preprocess(input, erase = "[^.?!:;'\\w\\s]", lower_case = TRUE)

Arguments

input

a character vector.

erase

a length one character vector. Regular expression matching the parts of text to be erased from input. The default removes any character that is not a word character (letter, digit or underscore), white space, an apostrophe, or one of the punctuation characters ".?!:;".

lower_case

a length one logical vector. If TRUE, converts all text to lower case.
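To make the default erase pattern and the lower_case flag concrete, here is a minimal base-R sketch of the behaviour the arguments describe. The name preprocess_sketch is hypothetical. The sketch assumes PCRE matching (perl = TRUE) and erasing before lower-casing. It is an approximation, not the package's actual implementation, and it does not collapse any extra white space left behind by erased characters.

```r
# Hypothetical base-R approximation of the documented arguments;
# NOT the package's own implementation.
preprocess_sketch <- function(input, erase = "[^.?!:;'\\w\\s]", lower_case = TRUE) {
  stopifnot(is.character(input))
  # Remove every match of 'erase'; perl = TRUE so that \w and \s
  # are understood inside the bracket expression.
  out <- gsub(erase, "", input, perl = TRUE)
  # Optionally convert everything to lower case.
  if (lower_case) out <- tolower(out)
  out
}
```

For example, preprocess_sketch(c("a#b_c", "Stop!")) returns c("ab_c", "stop!"): the "#" is erased while the underscore survives, because \w matches it.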

Value

a character vector containing the processed output.

Author

Valerio Gherardi

Examples

preprocess("Hi @ there! I'm using `sbo`.")
#> [1] "hi there! i'm using sbo."