Not sure if this is intended behaviour or not. If it IS intended, I think the documentation is misleading.
The termExtraction function has a "remove.terms" argument with the following description:
#' @param remove.terms is a character vector. It contains a list of additional terms to delete from the documents before term extraction. The default is \code{remove.terms = NULL}.
However, this is not what actually occurs: the terms are removed after term extraction, from the resulting list of terms. The distinction matters for bi-grams. If I want to remove the word "learning", the bi-gram "management learning" will still exist, because remove.terms is applied to the extracted list of terms rather than to the documents beforehand, which would have prevented "management learning" from being formed in the first place.
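To make the distinction concrete, here is a minimal base-R sketch of the two orderings. It is not the package's code: `make_ngrams`, the sample sentence, and the exact-match filtering are all assumptions made for illustration.

```r
# Hypothetical helper (not from the package): build word n-grams from one string.
make_ngrams <- function(text, n = 2) {
  words <- strsplit(tolower(text), "\\s+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

doc <- "management learning improves performance"
remove_terms <- c("learning")

# (a) Removal AFTER extraction: filter the extracted term list by exact match.
#     The uni-gram "learning" is dropped, but bi-grams containing it survive.
terms_after <- c(make_ngrams(doc, 1), make_ngrams(doc, 2))
terms_after <- terms_after[!terms_after %in% remove_terms]

# (b) Removal BEFORE extraction: delete the words from the document first,
#     so no n-gram containing "learning" can be formed at all.
words <- strsplit(tolower(doc), "\\s+")[[1]]
doc_clean <- paste(words[!words %in% remove_terms], collapse = " ")
terms_before <- c(make_ngrams(doc_clean, 1), make_ngrams(doc_clean, 2))

print(terms_after)
print(terms_before)
```

In this sketch, option (a) keeps "management learning", while option (b) never produces it. Note that (b) also creates the bi-gram "management improves", which joins two words that were not adjacent in the original text.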
The relevant part of the code is below in the extractNgrams function.
Hi,
thanks for your remarks.
The terms are removed after term extraction, and this is intended behaviour. This way, it is possible to create n-grams containing a certain word and then decide whether to remove only some of those n-grams, or to remove only the single word while keeping the n-grams that use it.
You're right, the documentation is misleading and we will correct it.
Thanks
Massimo