Keynote Talks
A Virtual Cup-Tester for Coffee Quality Assessment
Giacomo Della Riccia
Department of Mathematics and Informatics
University of Udine
Via delle Scienze, 206
33100-Udine, Italy
dlrca@uniud.it
Abstract:
In routine industrial coffee practice, sensory analysis is still the
ultimate tool for assessing overall quality. A panel of assessors,
who may be either professional cup-testers or naive consumers who have
received basic training, assigns a score, called Merit, usually on a
discrete scale, to specially prepared cups of "espresso" coffee. Such
a practice, which involves roasting and brewing, is time-consuming and
rather expensive; moreover, cup-testing sessions cannot be too long or
frequent during the day because fatigue develops after the first dozen
cups or so, causing possible distortions in the sensory evaluations.
Thus the coffee industry is greatly interested in calibrating
instruments on sensory data to perform an automatic screening of
"bad" coffees, which could be rejected before the panel evaluation.
The two classes, "good" and "bad" coffees, are determined by a cutoff
value of the variable Merit indicated by the panel. In the last 7 years,
the Illycaffé company has used a NIRSystem 6500 (Foss Tecator) to
collect the near-infrared transflectance spectra of several thousand
raw coffee samples before the testing procedure. Using this database,
we correlated spectra and Merit by Multivariate Regression algorithms
normally adopted in Chemometrics. With the predicted Merit scale
derived from that calibration, we devised a Bayes classifier. Thus,
if a new sample is classified as "good", it is processed and submitted
to the judgement of the panel, whereas if the sample is declared to be
"bad", it is immediately discarded. Actually we combined "nearest
neighbours" and Bayes classifiers techniques to obtain local
classifiers which perform much better than a single global classifier,
as it will be seen from the operation curves shown during the talk.
The approach we developed is non-destructive, economical and rapid,
and the operating curves allow decision-makers to adjust the
classifier threshold in order to balance the amount of time the panel
could save against the risk of losing "good" coffees, according to
the daily workload of the panel.
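
The short Python sketch below (not the authors' actual system) is
included to illustrate the two-step idea described in the abstract:
a chemometrics-style multivariate regression predicts Merit from NIR
spectra, and a simple Bayes rule on the predicted Merit, with an
adjustable threshold, screens "good" from "bad" samples. The synthetic
data and the specific choices of PLS regression and Gaussian
class-conditional densities are illustrative assumptions, not details
taken from the talk.

# Minimal sketch, assuming PLS regression and Gaussian class densities.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from scipy.stats import norm

rng = np.random.default_rng(0)

# Synthetic stand-ins for the real data: NIR spectra and panel Merit scores.
X = rng.normal(size=(200, 700))          # spectra: samples x wavelengths
merit = rng.integers(4, 10, size=200)    # panel Merit scores (discrete scale)
cutoff = 6                               # panel cutoff: "good" if Merit >= cutoff

# 1) Calibrate predicted Merit on the spectra.
pls = PLSRegression(n_components=10).fit(X, merit)
pred = pls.predict(X).ravel()

# 2) Fit Gaussian class-conditional densities of predicted Merit and
#    classify by the Bayes rule with an adjustable threshold.
good, bad = pred[merit >= cutoff], pred[merit < cutoff]
mu_g, sd_g = good.mean(), good.std()
mu_b, sd_b = bad.mean(), bad.std()
p_good = len(good) / len(pred)           # class prior

def classify(predicted_merit, threshold=1.0):
    # Raising the threshold discards more samples (saves panel time) at the
    # risk of losing some good coffees; lowering it does the opposite.
    odds = (p_good * norm.pdf(predicted_merit, mu_g, sd_g)) / \
           ((1 - p_good) * norm.pdf(predicted_merit, mu_b, sd_b))
    return "good" if odds >= threshold else "bad"

new_sample = rng.normal(size=(1, 700))
print(classify(pls.predict(new_sample).item()))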
Distributed Learning for the Analysis of Extreme Data Sets
Lawrence O. Hall
Department of Computer Science and Engineering, ENB 118
University of South Florida
4202 E. Fowler Ave.
Tampa, Fl 33620-9951, USA
hall@csee.usf.edu
Abstract:
An extreme data set is one which contains millions to billions of
examples. The size of the data set is a challenge to modern machine
learning/data mining algorithms. Example data sets are those of
credit card fraud, network intrusion detection, protein structure
analysis, and very large-scale simulations. Often these data sets are
highly skewed with the class of interest occurring relatively rarely,
e.g. security breaches on a computer network. Sometimes organizations
would like to share solutions to problems, but not the data that
describes the problems, e.g. credit card fraud. Building a
distributed set of models on tractable subsets of extreme data
results in an ensemble of classifiers that may be used to classify new
examples. This talk describes the process of building a distributed
ensemble of classifiers, approaches to dealing with skewed class
distributions and the accuracy one may expect from an ensemble.
The approach discussed is straightforward to apply and quite scalable,
and it results in a model for data analysis that is typically
equivalent to or better than a single classifier built on all the data.
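
The short Python sketch below (not the speaker's implementation) is
included to illustrate the general recipe described in the abstract:
train one classifier per tractable subset of a large, skewed data set,
rebalance the rare class inside each subset, and combine the models by
voting. The synthetic data, the undersampling scheme, the subset count
and the use of decision trees are illustrative assumptions.

# Minimal sketch, assuming decision trees and majority voting.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for an "extreme", skewed data set: ~1% positives.
X = rng.normal(size=(100_000, 20))
y = (rng.random(100_000) < 0.01).astype(int)

def train_ensemble(X, y, n_subsets=10):
    # One classifier per disjoint subset; each subset could live on a
    # different site, so only the models, not the data, are shared.
    models = []
    for idx in np.array_split(rng.permutation(len(y)), n_subsets):
        yi = y[idx]
        # Handle the skewed class distribution inside each subset by
        # undersampling the majority class (one common option).
        pos, neg = idx[yi == 1], idx[yi == 0]
        keep_neg = rng.choice(neg, size=min(len(neg), 5 * max(len(pos), 1)),
                              replace=False)
        sel = np.concatenate([pos, keep_neg])
        models.append(DecisionTreeClassifier(max_depth=8).fit(X[sel], y[sel]))
    return models

def predict_ensemble(models, X_new):
    # Majority vote over the distributed models.
    votes = np.stack([m.predict(X_new) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)

models = train_ensemble(X, y)
print(predict_ensemble(models, rng.normal(size=(5, 20))))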