The 5th International Symposium
on Intelligent Data Analysis

Berlin, Germany
August 28-30, 2003


Keynote Talks

A Virtual Cup-Tester for Coffee Quality Assessment

Giacomo Della Riccia
Department of Mathematics and Informatics
University of Udine
Via delle Scienze, 206
33100-Udine, Italy

Abstract: In industrial coffee routine, sensory analysis is still the ultimate tool for assessing overall quality. A panel of assessors, who may be either professional cup-testers or naive consumers who have received basic training, assigns a score, called Merit, usually on a discrete scale, to specially prepared cups of "espresso" coffee. Such a practice, which involves roasting and brewing, is time-consuming and rather expensive; moreover, cup-testing sessions cannot be too long or frequent during the day because fatigue develops after the first dozen cups or so, causing possible distortions in the sensory evaluations. The coffee industry is therefore greatly interested in calibrating instruments on sensory data to perform an automatic screening of "bad" coffees, which could be rejected before the panel evaluation. The two classes, "good" and "bad" coffees, are determined by a cutoff value of the Merit variable indicated by the panel. In the last 7 years, the Illycaffé company has used a NIRSystem 6500 (Foss Tecator) to collect the near-infrared transflectance spectra of several thousand raw coffee samples before the testing procedure. Using this database, we correlated spectra and Merit with multivariate regression algorithms normally adopted in chemometrics. With the predicted Merit scale derived from that calibration, we devised a Bayes classifier. Thus, if a new sample is classified as "good", it is processed and submitted to the judgement of the panel, whereas if the sample is declared "bad", it is immediately discarded. In fact, we combined nearest-neighbour and Bayes classification techniques to obtain local classifiers which perform much better than a single global classifier, as will be seen from the operating curves shown during the talk.
The approach we developed is non-destructive, economical and rapid, and the operating curves allow decision-makers to adjust the classifier threshold so as to balance the amount of time the panel could save against the risk of losing "good" coffees, according to the daily working load of the panel.
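The screening rule described above can be sketched in a few lines. This is a minimal illustration, not Illycaffé's actual pipeline: it assumes samples are represented by small feature vectors (standing in for reduced spectral data), labels each new sample from the class frequencies of its k nearest neighbours (a local Bayes-style decision), and exposes the threshold that the decision-makers would tune along the operating curve. The function names, toy data, and parameter values are all illustrative assumptions.

```python
# Hedged sketch of a local nearest-neighbours + Bayes screening rule.
# Feature vectors, k, and the threshold are illustrative assumptions.
import math

def classify_local(sample, training, k=5, threshold=0.5):
    """Label `sample` 'good' or 'bad' from the class frequencies of its
    k nearest neighbours: a local decision with a tunable threshold."""
    def dist(a, b):
        # Euclidean distance in the (reduced) spectral feature space
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    neighbours = sorted(training, key=lambda t: dist(t[0], sample))[:k]
    p_good = sum(1 for _, label in neighbours if label == "good") / k
    # Raising `threshold` discards more samples (more panel time saved,
    # higher risk of losing "good" coffees); lowering it does the opposite.
    return "good" if p_good >= threshold else "bad"

# Toy training data: (feature vector, panel label from the Merit cutoff)
train = [([0.1, 0.2], "good"), ([0.15, 0.25], "good"),
         ([0.8, 0.9], "bad"), ([0.85, 0.8], "bad"),
         ([0.2, 0.1], "good")]

print(classify_local([0.12, 0.22], train, k=3))  # → good
print(classify_local([0.82, 0.85], train, k=3))  # → bad
```

Sweeping `threshold` from 0 to 1 and recording, at each value, the fraction of panel work saved against the fraction of "good" samples wrongly discarded traces out an operating curve of the kind referred to in the abstract.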

Distributed Learning for the Analysis of Extreme Data Sets

Lawrence O. Hall
Department of Computer Science and Engineering, ENB 118
University of South Florida
4202 E. Fowler Ave.
Tampa, FL 33620-9951, USA

Abstract: An extreme data set is one which contains millions to billions of examples. The size of such a data set is a challenge to modern machine learning/data mining algorithms. Example data sets are those of credit card fraud, network intrusion detection, protein structure analysis, and very large-scale simulations. Often these data sets are highly skewed, with the class of interest occurring relatively rarely, e.g. security breaches on a computer network. Sometimes organizations would like to share solutions to problems, but not the data that describes the problems, e.g. credit card fraud. Building a distributed set of models on tractable subsets of an extreme data set results in an ensemble of classifiers that may be used to classify new examples. This talk describes the process of building a distributed ensemble of classifiers, approaches to dealing with skewed class distributions, and the accuracy one may expect from an ensemble. The approach discussed is straightforward to apply and quite scalable, and it results in a model for data analysis which is typically equivalent to or better than a single classifier built on all the data.
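The ensemble idea in the abstract can be illustrated with a deliberately simple base learner. This is a sketch under stated assumptions, not the system presented in the talk: each data owner trains a one-feature threshold rule (a decision stump, chosen here only for brevity) on its own chunk, and only the trained models are shared and combined by majority vote, so the raw data never has to be pooled. The learner, the toy data, and the single-feature representation are all assumptions for illustration.

```python
# Hedged sketch of a distributed ensemble: one simple model per data
# chunk, combined by majority vote. The stump learner and toy data are
# illustrative assumptions, not the talk's actual system.

def train_stump(chunk):
    """Learn a one-feature threshold rule from (x, label) pairs."""
    best_thr, best_dir, best_acc = 0.0, 1, -1.0
    for thr in {x for x, _ in chunk}:
        for direction in (1, -1):  # 1: predict positive when x >= thr
            correct = sum(
                ((x >= thr) if direction == 1 else (x < thr)) == bool(y)
                for x, y in chunk)
            acc = correct / len(chunk)
            if acc > best_acc:
                best_thr, best_dir, best_acc = thr, direction, acc
    return best_thr, best_dir

def ensemble_predict(models, x):
    """Majority vote over the chunk-trained models."""
    votes = sum(
        1 if ((x >= thr) if d == 1 else (x < thr)) else 0
        for thr, d in models)
    return int(votes > len(models) / 2)

# Toy "extreme" data split into three chunks that never leave their owners
chunks = [
    [(0.1, 0), (0.2, 0), (0.9, 1), (0.8, 1)],
    [(0.15, 0), (0.3, 0), (0.7, 1), (0.95, 1)],
    [(0.05, 0), (0.25, 0), (0.85, 1), (0.75, 1)],
]
models = [train_stump(c) for c in chunks]

print(ensemble_predict(models, 0.9))  # → 1
print(ensemble_predict(models, 0.1))  # → 0
```

Because only `(threshold, direction)` pairs are exchanged, the scheme matches the sharing scenario mentioned in the abstract: organizations exchange solutions without exchanging the data that describes the problem.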


Last updated: Mon Aug 18 15:25:04 CEST 2003