CSc 59866 - CAPSTONE I
Prof. Esther Levin
Fall 2005
Spoken Dialog Systems and Voice XML
Time: MW 12:15 pm – 1:45 pm
Place: NAC 7-312
Professor: Prof. Esther Levin
Office Hours: Mon 2:00-4:00
Email: esther@cs.ccny.cuny.edu
Phone: 212 650-5626
Spoken dialogue systems enable users to interact with computer systems via natural and intelligent dialogues, as they would with human agents. Development of such systems requires a wide range of speech and language technologies, including automatic speech recognition (ASR), to convert audio signals of human speech into text strings, natural language and dialogue processing (NLP), to determine the meanings and intentions of the recognized utterances and to generate a cooperative response to them, and text-to-speech synthesis (TTS), to convert the system utterance into actual speech output.
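The chain of technologies described above can be pictured as a series of functions, one per stage. The Python sketch below is illustrative only: the function names (`recognize`, `understand`, `respond`, `synthesize`), the `Interpretation` structure, and the keyword-matching logic are assumptions made for this example, and every stage is a stub standing in for a real ASR, NLP, or TTS component.

```python
from dataclasses import dataclass


@dataclass
class Interpretation:
    """Meaning extracted from one recognized utterance (hypothetical structure)."""
    intent: str
    text: str


def recognize(audio: bytes) -> str:
    """ASR stub: a real recognizer would decode an audio signal into text.

    The 'audio' here is assumed to be UTF-8 text so that the sketch can run.
    """
    return audio.decode("utf-8").lower().strip()


def understand(text: str) -> Interpretation:
    """NLP stub: keyword spotting stands in for real language understanding."""
    if "order" in text:
        return Interpretation("order_book", text)
    if "about" in text or "information" in text:
        return Interpretation("book_info", text)
    return Interpretation("unknown", text)


def respond(meaning: Interpretation) -> str:
    """Dialogue stub: pick a cooperative system utterance for the intent."""
    replies = {
        "order_book": "Which book would you like to order?",
        "book_info": "Which book would you like to hear about?",
    }
    return replies.get(meaning.intent, "Sorry, I did not understand. Could you rephrase?")


def synthesize(text: str) -> bytes:
    """TTS stub: a real synthesizer would return audio samples."""
    return text.encode("utf-8")


def handle_turn(audio_in: bytes) -> bytes:
    """One user turn: ASR -> NLP -> response generation -> TTS."""
    return synthesize(respond(understand(recognize(audio_in))))
```

Calling `handle_turn(b"I want to ORDER a book")` passes the input through all four stages in order. In a real system each stub would be replaced by a full component, but the data flow between them would stay the same.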
VoiceXML is the HTML of the voice web, the open standard markup language for voice applications. VoiceXML harnesses the massive web infrastructure developed for HTML to make it easy to create and deploy voice applications. Like HTML, VoiceXML has opened up huge business opportunities: The Economist even says that "VoiceXML could yet rescue telecoms carriers from their folly in stringing so much optical fibre around the world."
While HTML assumes a graphical web browser with display, keyboard, and mouse, VoiceXML assumes a voice browser with audio output, audio input, and keypad input. Audio input is handled by the voice browser's speech recognizer. Audio output consists both of recordings and speech synthesized by the voice browser's text-to-speech system.
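To make the voice-browser model concrete, here is a small, hypothetical VoiceXML 2.0 document held in a Python string. It plays a recording with synthesized text as a fallback, then fills one field by voice or by keypad. The form name, field name, choices, and the file name `welcome.wav` are invented for illustration. The Python wrapper only checks that the markup is well-formed XML and lists the keypad options. It is not a voice browser and does not validate against the VoiceXML schema.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical one-field dialog; names, choices, and the audio file are made up.
DOCUMENT = """<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="genre_menu">
    <block>
      <prompt>
        <!-- Recorded audio; the enclosed text is synthesized if the file cannot be played. -->
        <audio src="welcome.wav">Welcome to the library.</audio>
      </prompt>
    </block>
    <field name="genre">
      <prompt>Say fiction or history, or press 1 or 2.</prompt>
      <!-- Each option can be spoken or chosen with a keypad (DTMF) key. -->
      <option dtmf="1" value="fiction">fiction</option>
      <option dtmf="2" value="history">history</option>
      <filled>
        <prompt>You chose <value expr="genre"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
"""


def summarize_fields(document: str) -> dict:
    """Parse the document and map each field name to its {dtmf key: value} options."""
    root = ET.fromstring(document.encode("utf-8"))  # raises ParseError if malformed
    ns = {"v": VXML_NS}
    summary = {}
    for field in root.iterfind(".//v:field", ns):
        summary[field.get("name")] = {
            option.get("dtmf"): option.get("value")
            for option in field.iterfind("v:option", ns)
        }
    return summary
```

The `<audio>` element covers recorded output, the `<prompt>` text is handed to the text-to-speech system, and each `<option>` accepts either speech or a keypad press, mirroring the three channels described above.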
VoiceXML takes advantage of several trends:
We will be designing, developing, testing, and deploying spoken dialog systems for a variety of applications. In particular, we will be designing an Automatic Reader Advisor for the New York Public Library: a system that can automatically provide information about books and complete book orders through natural spoken dialog.
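One common way to structure a task such as completing a book order is a frame-based (slot-filling) dialog: the system keeps asking for whatever piece of information is still missing, then confirms. The Python sketch below illustrates that idea only. It is not the design prescribed for this project, and the slot names (`title`, `author`, `branch`) and prompt wording are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical slots for a book order; a real system may need different ones.
PROMPTS = {
    "title": "What is the title of the book?",
    "author": "Who is the author?",
    "branch": "Which branch should hold it for you?",
}


def next_prompt(frame: Dict[str, Optional[str]]) -> str:
    """Return the question for the first unfilled slot, or a confirmation."""
    for slot, question in PROMPTS.items():
        if not frame.get(slot):
            return question
    return "Ordering {title} by {author} for pickup at {branch}. Is that correct?".format(**frame)


def run_dialog(answers: Iterable[str]) -> Tuple[Dict[str, Optional[str]], List[Tuple[str, str]]]:
    """Simulate a dialog in which each answer fills the slot that was just asked for."""
    frame: Dict[str, Optional[str]] = {slot: None for slot in PROMPTS}
    transcript: List[Tuple[str, str]] = []
    remaining = iter(answers)
    while True:
        transcript.append(("system", next_prompt(frame)))
        open_slots = [slot for slot in PROMPTS if not frame[slot]]
        if not open_slots:
            return frame, transcript  # every slot filled; last turn is the confirmation
        answer = next(remaining, None)
        if answer is None:
            return frame, transcript  # no more user input
        transcript.append(("user", answer))
        frame[open_slots[0]] = answer
```

For example, `run_dialog(["Moby Dick", "Herman Melville", "Main Branch"])` asks three questions in turn and ends with a confirmation prompt. In a deployed system the answers would come from the speech recognizer rather than from a list.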
Upon the successful completion of this project a student should be able to:
In this project-based course, students are grouped into teams to work on projects involving the design, implementation, and testing of spoken dialog systems. The capstone course will last two semesters. In the first semester, we will study key technologies involved in this multi-disciplinary field. The second semester will focus on implementation of exciting real-world dialog systems using the Voice XML platform.
The course material will be entirely self-contained.
Fall 2005: There will be 5-8 assignments totaling 50%-80% of the final grade. Some of the assignments will be research to be presented in class. The rest of the points are for 1 or 2 exams. Attendance is mandatory.
McTear, Michael, Spoken Dialogue Technology - Towards the Conversational User Interface. Springer Verlag, 2004
D. Jurafsky and J.H. Martin, SPEECH and LANGUAGE PROCESSING: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice-Hall, ISBN: 0-13-095069-6, 2000.
R. Duda, P. Hart, D. Stork, "Pattern Classification", second edition, 2000.
L.R. Rabiner and B.W. Juang, Fundamentals of Speech Recognition, Prentice-Hall, ISBN: 0-13-015157-2, 1993.
L.R. Rabiner and R.W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, ISBN: 0-13-213603-1, 1978.
X. Huang, A. Acero, and H.W. Hon, Spoken Language Processing - A Guide to Theory, Algorithm, and System Development, Prentice Hall, ISBN: 0-13-022616-5, 2001.
· Aug 29-th: Assignment 1 is posted. Due Sep 12-th in class.
· Aug 29-th: No lab meeting on Wed Sep 7-th.
· Sep 12-th: Our next meeting (Wed Sep 14-th) will be held in NAC 5/126.
· Sep 16-th: Assignment 2 is posted. Due Oct 3-rd.
· Sep 22-nd: Grades for Assignment 1 are posted.
· Sep 24-th: Please complete the Leeds Online Tutorial on HMMs (except Forward and Forward/Backward Algorithm) before the lecture on Sep 28-th.
· Sep 25-th: Assignment 3 is posted. Due Oct 12-th.
· Sep 29-th: No lab meeting on Mon Oct 3-rd.
· Oct 4-th: Assignment 4 is posted. Due Oct 24-th.
· Oct 10-th: Grades for Assignment 2 are posted.
Weeks | Topic
1 |
2-7 | Pattern Recognition: Introduction to Pattern Recognition, Bayesian Classifiers, Hidden Markov Models, Tutorial on HMMs
8-14 | Spoken Dialog Systems, Speech Recognition, Grammars, Dialog Management, Spoken Dialog Engineering
Allen, James, Natural language understanding - 2nd ed. - Redwood City, Calif.; Wokingham : Benjamin/Cummings, 1995. - 0805303340
Anderson, E. et al., Early adopter VoiceXML. Wrox Press Ltd, 2001.
Beasley, Rick. - Voice application development with VoiceXML - Indianapolis, Ind. : Sams, 2001. - 0672321386
Bernsen, Niels Ole. - Designing interactive speech systems : from first ideas to user testing - London : Springer, 1998. - 3540760482
Cole, Ron. Survey of the state of the art in human language technology - Cambridge : Cambridge University Press, 1997. - (Studies in natural language processing ; v.12-13). - 0521592771
Jurafsky, Dan, 1962-. - Speech and language processing : an introduction to natural language process. - Upper Saddle River, N.J.: Prentice Hall; London : Prentice-Hall Internation, 2000. - 0130950696
Larson, J.A. VoiceXML: Introduction to Developing Speech Applications. Prentice Hall Professional, 2002 - 0130092622
Maier, E. - Dialogue processing in spoken language systems: ECAI '96 workshop, Budapest. - Berlin; London : Springer, 1997. - (Lecture notes in computer science. Lecture notes in artificial intelligence). - 3540631755
Markowitz, Judith A. - Using speech recognition - Upper Saddle River, N.J.; London : Prentice Hall, 1996. - 0131863215
Roe, D.B. & Wilpon, J.G. Voice communication between humans and machines - Washington, D.C.: National Academy Press, 1994. - 0309049881
Miller, M. VoiceXML: 10 Projects to Voice Enable Your Web Site. John Wiley & Sons, Inc., 2002 - 0471207373
Smith, Ronnie W. - Spoken natural language dialog systems : a practical approach - New York; Oxford : Oxford University Press, 1994. - 0195091876
Natural Language Processing course from University of Ulster
VoiceXML development platform: Bevocal Café
Other VoiceXML developer resources
W3C Dialog Requirements for Voice Markup Languages (http://www.w3.org/TR/voice-dialog-reqs/)
Developer.com (Voice) (http://www.developer.com/voice/)
The XML Cover Pages VoiceXML Forum (Voice Extensible Markup Language Forum)
Voice Services: What sorts of voice applications are best suited for VoiceXML? Here are a few ideas. (http://www.voicexml.org/tutorials/intro6.html)
Sites with sample applications and demos:
Nuance Communications: http://www.nuance.com
Apple: http://www.apple.com/macos/speech/
Scansoft: http://www.scansoft.com/
AAAI Workshop on Miscommunication in Dialogue, August 1996
CONVERSA - voice enabling technologies
CSLU Home Page (Center for Spoken Language Understanding, Oregon)
LIMSI: Projects on spoken language (France)
Speech enabled agents - Microsoft Research
Natural Interactive Systems Laboratory (NIS), Odense University, Denmark
SIGDIAL - special interest group of ACL for dialogue and discourse
Speech Applications Project (Sun Microsystems)
Spoken Language Systems Group (MIT)
TRAINS Project Home Page (University of Rochester)
Verbmobil (Large project based in Germany on spoken language and dialogue)
Waxholm dialog project (Sweden)
http://www.nuance.com/solutions/utilities/index.html
http://www.nuance.com/solutions/bankingcredit/index.html
http://www.scansoft.com/network/solutions/
http://www-306.ibm.com/software/pervasive/tech/demos/voice_server_demo.shtml (download Flash demo - WSVdemo.exe)
http://www.voicegenie.com/Phone_Demos.htm?5.0.0.0 (Flash demos: ATM locator, Taxi booking, email reader)