October 29, 2014 – Afternoon
In recent years it has been pointed out that, in a number of applications involving (text) classification, the final goal is not determining which class (or classes) individual unlabelled data items belong to, but determining the prevalence (or “relative frequency”) of each class in the unlabelled data. The latter task is known as quantification.
Assume a market research agency runs a poll in which they ask the question “What do you think of the recent ad campaign for product X?” Once the poll is complete, they may want to classify the resulting textual answers according to whether or not they belong to the class LovedTheCampaign. The agency is likely not interested in whether a specific individual belongs to the class LovedTheCampaign, but in knowing how many respondents belong to it, i.e., in knowing the prevalence of the class. In other words, the agency is interested not in classification, but in quantification. Essentially, quantification is classification tackled at the aggregate (rather than at the individual) level.
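As a rough illustration of what "prevalence" means here, the sketch below takes per-respondent class labels and reduces them to relative frequencies. This naive approach is often called "classify and count" in the quantification literature; it is shown only as a baseline, not as the method the tutorial advocates. The function name and the label data are made up for the example.

```python
from collections import Counter


def classify_and_count(predicted_labels):
    """Estimate class prevalence as the fraction of items assigned to each class."""
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {label: n / total for label, n in counts.items()}


# Hypothetical per-respondent outputs of some classifier (made-up data).
predictions = ["LovedTheCampaign", "Other", "LovedTheCampaign", "Other", "Other"]

prevalence = classify_and_count(predictions)
print(prevalence)
```

The agency in the example would look only at the resulting dictionary of prevalences, never at the individual labels.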
The research community has recently shown a growing interest in tackling quantification as a task in its own right. One of the reasons is that, since the goal of quantification is different from that of classification, quantification requires evaluation measures different from those used for classification. A second, related reason is that using a method optimized for classification accuracy is suboptimal when quantification accuracy is the real goal. A third reason is the growing awareness that quantification is going to be more and more important; with the advent of big data, more and more application contexts are going to spring up in which we will be content to analyze data at the aggregate (rather than at the individual) level.
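A small made-up example may help show why classification accuracy and quantification accuracy can diverge. The two hypothetical classifiers below have the same accuracy, yet one recovers the true prevalence exactly (its errors cancel out) while the other overestimates it. The absolute error between estimated and true prevalence is used here purely as an illustrative measure; it is an assumption of this sketch, not a measure named in the abstract.

```python
def accuracy(true, pred):
    """Fraction of items whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def prevalence(labels, positive=1):
    """Relative frequency of the positive class."""
    return sum(label == positive for label in labels) / len(labels)


# Made-up ground truth: 4 positives out of 10 items.
true_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Hypothetical classifier A: one false negative and one false positive.
pred_a = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
# Hypothetical classifier B: two false positives, no false negatives.
pred_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

true_prev = prevalence(true_labels)
results = {}
for name, pred in [("A", pred_a), ("B", pred_b)]:
    results[name] = {
        "accuracy": accuracy(true_labels, pred),
        "abs_error": abs(prevalence(pred) - true_prev),
    }
    print(name, results[name])
```

Both classifiers make two mistakes, so a classification-oriented measure cannot tell them apart, while a prevalence-oriented measure clearly prefers A.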
The goal of this tutorial is to introduce the audience to the problem of quantification, to the techniques that have been proposed for solving it, to the metrics used to evaluate them, and to the problems that are still open in the area.
- Fabrizio Sebastiani, Principal Scientist, Qatar Computing Research Institute
Fabrizio Sebastiani is a Principal Scientist at QCRI; he was (until June 2014) a Senior Researcher at the Italian National Council of Research (from which he is currently on leave), and (until February 2006) an Associate Professor at the Department of Pure and Applied Mathematics of the University of Padova, Italy. His main current research interests are at the intersection of information retrieval, machine learning, and human language technologies, with particular emphasis on text classification, information extraction, opinion mining, and their applications.