General Schedule

5:00 pm - 9:00 pm Registration (Medinas A Foyer)
7:30 am - 8:00 pm Registration (Medinas A Foyer)
9:00 am - 5:00 pm Full Day Workshop W1 - ADKDD'08 (Casablanca B)
Full Day Workshop W2 - WEBKDD'08 (Casablanca A)
Full Day Workshop W3 - Sensor-KDD (Rabat B)
Full Day Workshop W4 - PinKDD'08 (Rabat A)
Full Day Workshop W5 - SNA-KDD (Tangier)
Full Day Workshop W13 - Multimedia Data Mining (Kenitra B)
9:00 am - 12:00 pm Half Day Workshop W6 - KDD CUP and Mining Medical data (Kenitra A)
Half Day Workshop W7 - Multiple Information Sources (Casablanca F)
Half Day Workshop W11 - BIOKDD08 (Agadir)
Half Day Workshop W12 - Mining for Business Applications (Fez)
9:00 am - 12:00 pm Tutorial - Mining Massive RFID, Trajectory, and Traffic Data Sets (Casablanca H)
Tutorial - Predictive Modeling with Social Networks (Baraka)
Tutorial - Mining Uncertain and Probabilistic Data: Problems, Challenges, Methods, and Applications (Casablanca C)
Tutorial - Detecting Clusters in Moderate-to-High Dimensional Data: Subspace Clustering, Pattern-based Clustering, and Correlation Clustering (Casablanca G)
10:00 am - 10:30 am Coffee Break (Medinas A Foyer, Medinas D)
12:00 pm - 2:00 pm Lunch (on your own)
2:00 pm - 5:30 pm Half Day Workshop W8 - Large Scale Recommender Systems and NetFlix Prize (Fez)
Half Day Workshop W10 - Mining using Matrices and Tensors (Agadir)
2:00 pm - 5:00 pm Tutorial - Blogosphere: Research Issues, Applications, and Tools (Kenitra A)
Tutorial - Graph Mining and Graph Kernels (Baraka)
Tutorial - Applied Text Mining (Casablanca C)
3:00 pm - 3:30 pm Coffee Break (Medinas A Foyer, Medinas D)
6:00 pm - 6:15 pm Opening Remarks (Casablanca North)
6:15 pm - 6:45 pm Award Presentations (Casablanca North)
6:45 pm - 7:30 pm Innovation Award Talk (Casablanca North)
7:30 am - 8:00 pm Registration (Medinas A Foyer)
7:30 am - 9:00 am Continental Breakfast (Medinas A Foyer, Medinas D)
8:00 am - 6:00 pm Exhibits (Casablanca South)
9:00 am - 10:00 am Plenary Invited Talk - Trevor Hastie (Casablanca North)
10:00 am - 10:30 am Coffee Break (Medinas Foyer, Casablanca South)
10:30 am - 12:30 pm Combined Session 1: Topic Modeling
Combined Session 2: Data Integration
Research Session 1: Social Networks
Research Session 2: Text Mining
12:30 pm - 2:00 pm Conference Lunch (Casablanca North)
Sponsored by Microsoft adCenter Labs
2:00 pm - 3:35 pm Research Session 3: Statistical Methods
Research Session 4: Graph Mining
Research Session 5: Classification
Industry Session 1: Invited Talk & Exploiting Location Information and Geo-mining
Invited Talk - Thore Graepel
3:35 pm - 4:00 pm Coffee Break (Medinas Foyer, Casablanca South)
4:00 pm - 5:20 pm Research Session 6: Rank and Metric Learning
Research Session 7: Clustering and Distance Functions
Research Session 8: Streams and Evolving Data
Industrial Session 2: Social Networks
6:15 pm - 8:45 pm Poster Reception I & Demo Session (Casablanca North)
Sponsored by Oracle
7:30 am - 5:00 pm Registration (Medinas A Foyer)
7:30 am - 9:00 am Continental Breakfast (Medinas A Foyer, Medinas D)
8:00 am - 6:00 pm Exhibits (Casablanca South)
9:00 am - 10:00 am Plenary Invited Talk - Michael Schwarz (Casablanca North)
10:00 am - 10:30 am Coffee Break (Medinas Foyer, Casablanca South)
10:30 am - 12:05 pm Research Session 9: Active and Semi-supervised Learning
Research Session 10: Discovery and Detection
Research Session 11: Pattern Mining
Industrial Session 3: Invited Talk & Visual Analytics
Invited Talk - Udo Miletzki
12:05 pm - 2:00 pm SIGKDD Business Lunch
Sponsored by Yahoo!
2:00 pm - 3:20 pm Research Session 12: Feature Selection
Research Session 13: Collaborative Filtering and Matrices
Research Session 14: Sequence Data
2:00 pm - 3:20 pm Panel - Social Networks: Looking Ahead (Kenitra)
3:20 pm - 3:50 pm Coffee Break (Medinas Foyer, Casablanca South)
3:50 pm - 5:10 pm Research Session 15: SIGKDD Dissertation Award Winners & Privacy
Research Session 16: Prediction Models
Combined Session 3: Performance and Scale
Industry Session 4: Medical Data Mining
5:15 pm - 6:15 pm KDD Transfer Meeting (Agadir)
(SIGKDD-2008 and SIGKDD-2009 Organizers only)
5:30 pm - 8:00 pm Poster Reception II & Demo Session (Ballroom)
Sponsored by Netflix
8:00 pm - 9:30 pm Program Committee Dinner (Wynn Las Vegas)
7:30 am - 9:00 am Continental Breakfast (Medinas A Foyer, Medinas D)
9:00 am - 10:00 am Plenary Invited Talk - Jitendra Malik (Casablanca North)
10:00 am - 10:30 am Coffee Break (Medinas A Foyer, Medinas D)
10:30 am - 12:10 pm Combined Session 4: Text Mining
Research Session 17: Partially Supervised Learning
Research Session 18: Matrix Methods
Industry Session 5: Search and Commerce
12:10 pm - 12:30 pm Closing Remarks (Casablanca North)

Invited Talks

Trevor Hastie, Stanford University

Regularization Paths and Coordinate Descent

Chair: Sunita Sarawagi

In a statistical world faced with an explosion of data, regularization has become an important ingredient. In many problems, we have many more variables than observations, and the lasso penalty and its hybrids have become increasingly useful. This talk presents some effective algorithms based on coordinate descent for fitting large-scale regularization paths for a variety of problems. Joint work with Rob Tibshirani and Jerome Friedman.
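
For readers unfamiliar with the technique named in the abstract, the following is a minimal sketch of cyclic coordinate descent for the lasso, with a warm-started path over a grid of penalties. It is not the speakers' implementation. It assumes the objective (1/(2n)) * ||y - X b||^2 + lam * ||b||_1, and the function names (soft_threshold, lasso_coordinate_descent, lasso_path) are hypothetical names chosen for illustration.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100, beta=None):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 (assumed objective)
    by cycling over the coordinates of b. X is a list of rows."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p if beta is None else list(beta)
    # Residual r = y - X beta, kept up to date after every coordinate move.
    r = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    # (1/n) * x_j^T x_j for each column j.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # (1/n) * x_j^T (partial residual with coordinate j removed)
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def lasso_path(X, y, lambdas, n_sweeps=100):
    """Fit from the largest penalty to the smallest, warm-starting each fit
    from the previous solution."""
    path, beta = [], None
    for lam in sorted(lambdas, reverse=True):
        beta = lasso_coordinate_descent(X, y, lam, n_sweeps, beta)
        path.append((lam, beta))
    return path
```

Calling lasso_path(X, y, [1.0, 0.1, 0.01]) returns a list of (penalty, coefficients) pairs ordered from the largest penalty to the smallest. Warm starts are the reason a whole path can cost little more than a single fit: each solution is usually close to the previous one.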

Michael Schwarz, Yahoo! Research

Internet Advertising and Optimal Auction Design

Chair: Ying Li

We characterize the optimal (revenue maximizing) auction for sponsored search advertising. We show that a search engine's optimal reserve price is independent of the number of bidders. Using simulations, we consider the changes that result from a search engine's choice of reserve price and from changes in the number of participating advertisers.
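
The independence of the reserve price from the number of bidders can be illustrated in a much simpler setting than the one the talk addresses. The sketch below is an assumption-laden toy, not the sponsored-search model from the abstract: it considers a single-item second-price auction with n bidders whose values are drawn independently and uniformly from [0, 1], for which the expected revenue at reserve r has the closed form (n - 1)/(n + 1) + r^n - 2n r^(n+1)/(n + 1). The function names are hypothetical.

```python
def expected_revenue(r, n):
    """Expected seller revenue in a single-item second-price auction with
    reserve r and n bidders with i.i.d. Uniform[0, 1] values (toy assumption)."""
    return (n - 1) / (n + 1) + r ** n - 2 * n * r ** (n + 1) / (n + 1)


def best_reserve(n, steps=100):
    """Grid search for the revenue-maximising reserve on {0, 1/steps, ..., 1}."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda r: expected_revenue(r, n))


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, best_reserve(n), round(expected_revenue(best_reserve(n), n), 4))
```

In this toy model the derivative of the revenue with respect to r is n * r^(n-1) * (1 - 2r), which vanishes at r = 1/2 for every n, so the grid search selects the same reserve regardless of how many bidders take part, while the revenue itself grows with n.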

Jitendra Malik, UC Berkeley

The Future of Image Search

Chair: Bing Liu

There are billions of images on the Internet. Today, searching for a desired image is largely based on textual data such as filename or associated text on the web page; not much use is made of the image content. There are good reasons for this. The field of content-based image retrieval, which emerged during the 1990s, focused primarily on color and texture cues. These were easier to model than shape, but they turned out to be much less useful than originally hoped. I shall review some of the recent developments in the field of visual object recognition in the computer vision community that offer greater promise. Much better image features for characterizing shape, advances in machine learning techniques, and the availability of large amounts of training data lie at the heart of these approaches.

Thore Graepel, Microsoft Research

Large Scale Data Analysis and Modeling in Online Services and Advertising

The last five years have seen a tremendous growth in online search, advertising and gaming services. Today, it is extremely important to analyse large collections of user interaction data as a first step in building predictive models for these services. In this talk we will report on two applications of large scale data analysis performed at Microsoft Research and how they guided model development:

  1. We will present the unique challenges involved in building a new advertisement ranking algorithm, starting with the near real-time analysis of click-through logs covering weeks of data. For this task, we created type-safe and very fast procedures to build a data store of click-through meta-information about users and advertisements, which then guided the development of features for training a Bayesian click-through estimation algorithm. We will discuss how system issues such as memory consumption and algorithmic performance influenced the modelling process. We will also discuss the issue of scientific programming languages capable of dealing with CPU-intensive tasks while allowing rapid prototyping, and give a quick overview of F#, a functional programming language ideally suited for this task.
  2. In the second half, we will give an insight into the data analysis and modelling tasks that went into the development of Halo 3's online ranking and matchmaking algorithm. At its core, Halo 3 uses the well-known TrueSkill ranking and matchmaking system, but before its launch we performed thousands of simulations of ranking behaviour on over 3,000,000 players, varying speed of convergence, skill-level display and other parameters. We will also discuss the limitations of this simulation and present results on how the running online part of the game today compares with the simulations. As of today, over 800,000 unique players play over 2,000,000 Halo 3 matches every 24 hours.

Udo Miletzki, Siemens AG

The Genesis of Postal OCR and Beyond

We provide an overview of the world's largest industrial OCR application: Postal Address Reading. We will talk about its humble beginnings, elaborate on how it evolved rapidly into high-tech machinery, and discuss its future prospects. Some prominent historical, system, methodological, cultural and social aspects will also be illuminated.

Every day, millions of mail pieces are automatically sorted and distributed by a powerful fleet of readers, which recognize millions of characters and words per second and recombine them into meaningful and valid addresses. Cheques and paper forms will vanish sooner or later, since they can be completely replaced by electronic cash flow and e-forms. Mail, however, will persist and even grow in volume for three good reasons. First, mail is conjoined with goods and material, especially in the era of web shopping. Second, postal services are the only service that spans the whole world, reaching even its most remote places. Third, postal services will undergo a hybridization process, in which mail and email fuse into hybrid mail. Hybrid mail will reach the recipient in the appropriate form according to the recipient's preferences, no matter whether it was sent as a letter, fax or email.