카지노 사이트

Text Quantification

Fabrizio Sebastiani

October 29, 2014 – Afternoon

Tutorial notes

watch video

Abstract:

In recent years it has been pointed out that, in a number of applications involving (text) classification, the final goal is not determining which class (or classes) individual unlabelled data items belong to, but determining the prevalence (or “relative frequency”) of each class in the unlabelled data. The latter task is known as quantification.
Assume a market research agency runs a poll in which they ask the question “What do you think of the recent ad campaign for product X?” Once the poll is complete, they may want to classify the resulting textual answers according to whether they belong or not to the class LovedTheCampaign. The agency is likely not interested in whether a specific individual belongs to the class LovedTheCampaign, but in knowing how many respondents belong to it, i.e., in knowing the prevalence of the class. In other words, the agency is interested not in classification, but in quantification. Essentially, quantification is classification tackled at the aggregate (rather than at the individual) level.
The research community has recently shown a growing interest in tackling quantification as a task in its own right. One of the reasons is that, since the goal of quantification is different than that of classification, quantification requires evaluation measures different than for classification. A second, related reason is that using a method optimized for classification accuracy is suboptimal when quantification accuracy is the real goal. A third reason is the growing awareness that quantification is going to be more and more important; with the advent of big data, more and more application contexts are going to spring up in which we will simply be happy with analyzing data at the aggregate (rather than at the individual) level.
The goal of this tutorial is to introduce the audience to the problem of quantification, to the techniques that have been proposed for solving it, to the
metrics used to evaluate them, and to the problems that are still open in the area.

Instructor:

top of the page