traditional corpus. The next section discusses such a metric, known as Term
Frequency-Inverse Document Frequency (TFIDF). It is computed entirely from
the fetched documents and tracks the importance of the terms occurring
in each of them.
Note that the set of fetched documents may change constantly over time. Consider the
case of a web search engine, in which each fetched document corresponds to
a matching web page in a search result. Documents are added, modified,
or removed, and the metrics and indices must be updated
accordingly. Additionally, word distributions can change over time, which
reduces the effectiveness of classifiers and filters (such as spam filters) unless they
are retrained.
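
To make the updating requirement concrete, here is a minimal sketch of an index that keeps its document frequencies current as documents are added, modified, or removed. It is an illustration only: the class name TfidfIndex, the whitespace tokenizer, and the particular weighting (term count divided by document length, multiplied by the natural log of N over the document frequency) are assumptions made for this example and may differ from the definition given in the next section.

```python
import math
from collections import Counter


class TfidfIndex:
    """Hypothetical toy index; not taken from the source text."""

    def __init__(self):
        self.docs = {}             # doc_id -> Counter of term counts
        self.doc_freq = Counter()  # term -> number of documents containing it

    def add(self, doc_id, text):
        # Re-adding an existing id is treated as a modification:
        # the old version's contribution is withdrawn first.
        if doc_id in self.docs:
            self.remove(doc_id)
        counts = Counter(text.lower().split())  # assumed tokenizer
        self.docs[doc_id] = counts
        self.doc_freq.update(counts.keys())     # +1 per distinct term

    def remove(self, doc_id):
        counts = self.docs.pop(doc_id)
        for term in counts:
            self.doc_freq[term] -= 1
            if self.doc_freq[term] == 0:
                del self.doc_freq[term]

    def tfidf(self, doc_id, term):
        counts = self.docs[doc_id]
        total = sum(counts.values())
        if total == 0 or term not in counts:
            return 0.0
        tf = counts[term] / total
        # Assumed variant: idf = ln(N / df), with no smoothing.
        idf = math.log(len(self.docs) / self.doc_freq[term])
        return tf * idf


index = TfidfIndex()
index.add("a", "spam offer spam")
index.add("b", "meeting offer")
print(index.tfidf("a", "spam"))   # "spam" appears only in document a
print(index.tfidf("a", "offer"))  # "offer" appears in every document
index.remove("b")
print(index.tfidf("a", "spam"))   # the score shifts once the collection changes
```

Because only the per-term document counts are adjusted on each change, the scores always reflect the current collection without rescanning every document. Retraining a classifier on shifted word distributions is a separate step that this sketch does not cover.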