Readings in Media Processing:

Multimedia Data Compression and Data Mining 

CSc 80000, Spring 2007

Professor Zhigang Zhu
Department of Computer Science
The City College of New York  and Graduate Center
The City University of New York (CUNY)

Time: Tuesday 6:30 - 8:30 pm
Room: 3209
Credits: 3.0

Office Hours: Tuesday 4:30 - 6:00 pm Rm 4439

Course Update Information

January 30, 2007. First day of class. Assignment 1 is due in a week. Please send it via email to me at , with your full name and the last 4 digits of your ID, and include "CSc 80000 Data Mining" (exactly, please) in your Subject line. Otherwise I may not be able to receive it.

February 01, 2007. Assignments and Suggested Reading Topics

February 27, 2007. Please check out the Reading/Project Presentation Schedule. Everyone: please prepare for the 8-minute fast-forward presentation on March 6, 2007. Please send me your PPT slides before 2:00 pm that day. Otherwise you will need to bring your flash drive to use on my machine.

May 17, 2007. You may find the slides of all the presentations in the Reading/Project Presentation Schedule.

May 17, 2007. Final Grading

Course Description

Data Mining has become one of the most exciting and fastest growing fields in computer science. Data Mining refers to various techniques that can be used to uncover hidden information in a database. The data to be mined may be complex multimedia data, including text, graphics, video, audio and bioinformatics data. Data Mining has evolved from several areas, including databases, artificial intelligence, machine learning, pattern recognition and multimedia information retrieval, and can be applied to the exploration of hidden information in web, text, image, audio, video and bioinformatics data.

Multimedia data mining is also related to multimedia data compression. Data compression is the technique of reducing redundancies in data representation in order to decrease data storage requirements, and hence the communication load when the data are transmitted through a network. If the compressed data are properly indexed, this may also improve the performance of mining data in a large compressed database. This is particularly useful when a data mining system involves interactivity.
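As a small illustration of the redundancy idea above, the following sketch compares how well a general-purpose lossless compressor does on highly repetitive data versus data with little redundancy. It is not course material: it assumes Python's standard zlib module (DEFLATE, which combines LZ77-style matching with Huffman coding), and the helper name compression_ratio and the sample inputs are made up for the example.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)


# Highly redundant input: the same two bytes repeated 5000 times.
redundant = b"ab" * 5000

# Input with little redundancy: pseudo-random bytes from a fixed seed.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(10_000))

ratio_redundant = compression_ratio(redundant)
ratio_noisy = compression_ratio(noisy)

# Lossless compression: decompressing recovers the original bytes exactly.
restored = zlib.decompress(zlib.compress(redundant))

print(f"redundant data ratio: {ratio_redundant:.3f}")
print(f"noisy data ratio:     {ratio_noisy:.3f}")
```

The repetitive input shrinks to a small fraction of its size, while the pseudo-random input barely shrinks at all, since there is almost no redundancy for the compressor to remove.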

This course is designed to provide graduate students with an introduction to multimedia data compression and data mining concepts and tools and, to some extent, the connections between them. In addition, students in the class will explore the literature on state-of-the-art research and development in advanced topics such as web mining, image/video mining and bioinformatics.

Course Organization

The course will consist of lectures by the instructor and presentations by students on their reading and project assignments.

Part I. Introduction (Lectures)

1. Introduction, Related Topics & Course Organization (pdf)

2. Overview of Multimedia Data Compression (pdf)

3. Overview of Multimedia Data Mining (pdf)

Part II. Data Mining Core Techniques (Readings)

1. Classification: Bayesian, KNN, ID3, ANN, rule-based

2. Clustering: hierarchical, partitional, clustering in large databases

3. Association Rules: basic and advanced algorithms

Part III. Multimedia Data Compression (Readings)

1. Information theory concepts

2. Data compression issues: models, measures and algorithms

3. Text compression:  LZ77, LZ78, LZW algorithms

4. Image compression: principles, JPEG, JPEG2000

5. Video compression: MPEG-2, MPEG-4/7

Part IV. Multimedia Data Mining (Readings)

1. Text Mining: keyword-based, text retrieval, similarity-based, etc.

2. Web Mining: content, structure and usage

3. Image/Video Mining: CBIR, video event detection

4. Bioinformatics: biology preliminaries, information aspects, microarray data clustering


Students who take the course for credit will be required
(1) to finish 2 assignments consisting mainly of paperwork (20%);
(2) to attend class lectures, presentations and discussions (20%);
(3) to give two presentations (40 minutes and 10 minutes, respectively) to the class on their reading assignments (30%); and
(4) to submit a report of a synthetic proposal on their readings and/or designs, and to give a presentation (1 hour) in class (30%).

Textbook, References and Readings


Reading Topics

Suggested Reading Topics

Other  References on Multimedia Data Mining/Compression (to be updated):

Copyright @ Zhigang Zhu ( zhu at ), 2007.