Tutorial Sessions/Invited Talks
All tutorials and invited talks are free to registered
attendees of all conferences held at WORLDCOMP'11.
Those who are interested in attending one or more of
the tutorials should sign up on site at the conference
registration desk in Las Vegas. A complete and current
list of WORLDCOMP Tutorials can be found
here.
In addition to
tutorials at other conferences, DMIN'11 aims to
provide a set of tutorials dedicated to Data Mining
topics. The 2007 key tutorial was given
by Prof. Eamonn
Keogh on Time Series Clustering. The 2008 key tutorial
was presented by Mikhail Golovnya (Senior Scientist,
Salford Systems, USA) on Advanced Data Mining
Methodologies. DMIN'09 provided four
tutorials presented by Prof. Nitesh V. Chawla on Data Mining with
Sensitivity to Rare Events and Class Imbalance,
Prof. Asim Roy on Autonomous Machine
Learning, Dan Steinberg (CEO of Salford Systems)
on Advanced
Data Mining Methodologies, and Peter Geczy on Emerging Human-Web
Interaction Research. DMIN'10 hosted a tutorial
presented by Prof. Vladimir Cherkassky on Advanced
Methodologies for Learning with Sparse Data. He was a
keynote speaker as well (Predictive Data Modeling and
the Nature of Scientific Discovery). In addition, Peter
Geczy presented a tutorial on Web Mining.
DMIN'11
will host the following tutorials/invited
talks:
Tutorial A |
Speaker: |
Gary M. Weiss, Fordham University, USA |
|
Topic: |
Smart Phone-Based Sensor Data
Mining |
Webpage |
http://www.cis.fordham.edu/faculty/Gary-Weiss.html |
Date & Time |
Tuesday, July 19,
6:00-8:30pm
(new) |
Location |
Ballroom 1 |
Description |
Smart phones have
exploded in popularity in recent years and are
now the most common computing devices, having
surpassed personal computers. While smart phones,
and other related devices such as tablet
computers, now run sophisticated operating
systems and include substantial processing power
and memory, they are more than computing and
communication devices: they are sophisticated
sensors. This becomes clear when you realize
that these devices typically contain a GPS
sensor, an acceleration sensor (accelerometer),
an audio sensor (microphone), an image sensor
(camera), a light sensor, a direction sensor
(compass), a proximity sensor, a temperature
sensor, and a pressure sensor. The availability of these
sensors in mass-marketed mobile devices creates
exciting new opportunities for data mining and
data mining applications. In this tutorial I
will survey the data mining applications that
can be built using these sensors, the data
mining methods used to extract information from
these sensors, and the practical and
architectural issues that relate to data mining
of sensor data from devices with relatively
limited resources (e.g., battery life). I will
also discuss how sensor data from a population
of smart phones can be pooled (crowdsourcing) to
provide useful knowledge and interesting
applications. This tutorial is intended for
anyone interested in the topic; those from
other research areas (e.g., wireless networks)
should also be able to learn much from it. |
Short Bio |
Gary Weiss is a
faculty member in the department of Computer and
Information Science at Fordham University. He
earned his B.S. degree from Cornell University,
his M.S. degree from Stanford University, and
his Ph.D. from Rutgers University. Prior to
coming to Fordham he worked for over 15 years at
AT&T Bell Labs and AT&T Labs. Until recently,
his research has focused on how various
real-world factors, such as class imbalance,
affect the ability to learn from data. This led
to several KDD workshops on Utility-Based Data
Mining and a special issue of the Data Mining
and Knowledge Discovery journal on this topic.
For the past two years Dr. Weiss has led a dozen
students on the WISDM (Wireless Sensor Data
Mining) project. Recent work has focused on
mining accelerometer data from smart phones and
this has led to publications on cell phone-based
activity recognition and cell phone-based
biometric identification. Dr. Weiss has
published over forty papers in the areas of
machine learning and data mining as well as
several in the area of expert systems and
object-oriented programming. |
Tutorial B |
Speaker: |
Michael Mahoney, Stanford University, USA |
|
Topic: |
Geometric Tools for Identifying Structure in
Large Social and Information Networks |
Webpage |
http://cs.stanford.edu/people/mmahoney/ |
Date & Time |
Monday, July 18,
5:45-8:15pm |
Location |
Platinum Room |
Description |
Abstract
The tutorial will
cover recent algorithmic and statistical work on
identifying and exploiting "geometric" structure
in large informatics graphs such as large social
and information networks. Such tools (e.g.,
Principal Component Analysis and related
non-linear dimensionality reduction methods) are
popular in many areas of machine learning and
data analysis due to their relatively nice
algorithmic properties and
their connections with regularization and
statistical inference. These tools are not,
however, immediately applicable in many large
informatics graph applications, since graphs are
more combinatorial objects and because of the
noise and sparsity patterns of many real-world
networks. Recent theoretical and empirical work has
begun to remedy this, and in doing so it has
already elucidated several surprising and
counterintuitive properties of very large
networks. Topics include: underlying theoretical
ideas; tips to bridge the theory-practice gap;
empirical observations; and the usefulness of
these tools for such diverse applications as
community detection, routing, inference, and
visualization.
Audience
This tutorial
will provide an opportunity for the data
analysis community, including both
mathematically oriented researchers and
practitioners, to learn about recent algorithmic
advances for dealing with
very large social and information networks. Many
of these algorithmic tools have implicit
geometric properties associated with them; and
these geometric properties often have implicit
statistical properties and consequences that
indicate where these tools are more or less
useful in real-world applications. As such, this
tutorial should be of interest and accessible
to a large fraction of the data analysis
community, including established
researchers who have worked in this or
related areas, researchers whose
interests are not directly in the topic of the
tutorial, and graduate students and postdocs, as
well as junior and more senior researchers.
Many of the algorithmic and statistical
techniques to be discussed have a strong overlap
with seemingly different problems and questions
in statistics, optimization, numerical analysis,
and machine learning; these connections will be
highlighted throughout. Relatedly, many of these
questions have been studied by researchers in
theoretical computer science, scientific
computing, statistics, machine learning, and
data analysis; the complementary aspects of
these different approaches, including their
applicability to solving real-world problems
from different application domains, will be
emphasized. Depending on one's background, one
can expect to benefit in different ways from the
tutorial. In particular, practitioners of
machine learning and data analysis should gain
just enough insight into the theoretical
underpinnings of relevant algorithms to see how
and why algorithms work well or fail to work
well in real-world settings;
application-oriented theorists should gain
insight into how the inner workings of
algorithms have practical implications for
machine learning and data analysis on large
networks, as well as learn about interesting
theoretical problems raised by recent empirical
findings; and knowledgeable members of the
data analysis community should gain a broad
overview of the area of large-scale graph mining
and network analysis, including where data
analysis methods with which they are familiar
are well-suited or ill-suited. |
Short Bio |
Michael Mahoney is
currently at Stanford University. His research
interests focus on algorithmic and statistical
aspects of algorithms for large-scale data
problems in scientific and Internet applications.
Currently, he is working on geometric network
analysis; developing approximate computation and
regularization methods for large informatics
graphs; and applications to community detection,
clustering, and information dynamics in large
social and information networks. He has also
worked on randomized matrix algorithms and
applications in genetics and medical imaging. He
has been a faculty member at Yale University and
a researcher at Yahoo, and his PhD was in
computational statistical mechanics at Yale
University. |
Invited Talks
Invited Talk A |
Speaker: |
Peter Geczy, AIST, Japan |
|
Topic: |
Data Mining and Privacy: Water and Fire? |
Date & Time |
Tuesday, July 19,
1:20-2:20pm (+ 40-minute buffer) |
Location |
Ballroom 1 |
Description |
Data mining research and
practice have been experiencing extraordinary
growth over the past decade, and so have privacy
concerns. Progress in data mining has been
pushing the envelope of the depth of information
and knowledge that can be extracted from vast
amounts of data, increasingly exposing your
innermost characteristics, behaviors, and habits.
Advanced data mining techniques and analytics
have been significantly benefiting organizations
in both the commercial and noncommercial sectors,
yet they also provide an unprecedented potential
for abuse. Is the interplay of data mining and
privacy a conflict in the making? This pertinent
matter has been approached in various ways.
Privacy-preserving data mining has been tackling
the issue from algorithmic and technological
angles. Laws and regulations enacted by countries
have been addressing the issue from legislative
angles. Best practices and codes of conduct
instituted by commercial and international bodies
have been exploring self-regulatory angles.
Bridging data mining and privacy requires an
interdisciplinary endeavor. We will concisely
survey the status quo and highlight selected
promising directions. |
Short Bio |
Dr. Peter Geczy is a
chief scientist at The National Institute of
Advanced Industrial Science and Technology (AIST).
He also held positions at The Institute of
Physical and Chemical Research (RIKEN) and The
Research Center for Future Technologies. His
interdisciplinary scientific interests encompass
domains of data and web mining, human
interactions and behavior, social intelligence
technologies, privacy, information systems,
knowledge management and engineering, artificial
intelligence, and adaptable systems. His recent
research focus also extends to the spheres of
service science, engineering, management, and
computing. He received several awards in
recognition of his accomplishments. Dr. Geczy
has been serving on various professional boards
and committees, and has been a distinguished
speaker in academia and industry. |
Invited Talk B |
Speaker: |
Nitesh V. Chawla, University of Notre Dame, USA |
|
Topic: |
Connecting the
dots for personalized healthcare |
Webpage |
http://www.cse.nd.edu/~nchawla/ |
Date & Time |
Monday, July 18,
1:20-2:20pm (+ 40-minute buffer) |
Location |
Ballroom 1 |
Description |
Proactive
personalized medicine is expected to bring
fundamental changes, offering recommendations for
lifestyle adjustments and treatments to help
patients avoid diseases they are at high risk of
developing in the future. Due to common genetic, molecular,
environmental, and lifestyle-based individual
risk factors, most diseases do not occur in
isolation. No matter how unique our medical
experiences, chances are that other patients
among millions have experienced genetic and
environmental risk factors that closely mirror
ours. In this talk, I will present our work that
builds a comprehensive recommendation system,
called CARE (Collaborative Assessment and
Recommendation Engine), which pulls in the experience
of millions of patients to assess which diseases an
individual patient is at high risk of developing.
I will also present our work on multi-relational
representation of disease networks using both
genetic knowledge, based on previously
discovered gene-disease associations, and
phenotypic data from real patient histories. |
Short Bio |
Nitesh Chawla is
an Assistant Professor in the Department of
Computer Science and Engineering at the
University of Notre Dame. He directs the Data
Inference Analysis and Learning Lab (DIAL) and
co-directs the Interdisciplinary Center for
Network Science and Applications (iCenSA) at
Notre Dame. His research is primarily focused on
machine learning, data mining, and social and
dynamic networks. His work has led to
applications in various domains including
biology, medicine, finance, security, social
science, fraud detection, intrusion detection,
and text categorization. He is on the editorial
board of IEEE Transactions on Systems, Man and
Cybernetics Part B. He has received various
awards and acknowledgements. He received the NAE
FIE New Faculty Fellowship in 2005. His current
research is supported by NSF, DOD, NWICG, NIJ,
and industry sponsors. |