CIS 530 -- Introduction to Natural Language Processing -- 2009
Instructor Mitch Marcus
Office: Levine 503
mitch (AT) cis.upenn.edu
Office Hours: TBA
Teaching Assistant Constantine Lignos
Office: IRCS 410
lastname (AT) seas.upenn.edu
Office Hours: TBA
Class Schedule:
Tuesday & Thursday, 4:30pm to 6:00pm, Moore 212
Course Administrator: Cheryl Hickey, 502 Levine, 215-898-3538, cherylh (AT) cis.upenn.edu
COURSE STRUCTURE
Web Page:
http://www.seas.upenn.edu/~cis530/
- Textbooks:
- Jurafsky & Martin, SPEECH and LANGUAGE PROCESSING: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition
- Chris Manning & Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999. (available online from the Penn campus)
- Steven Bird, Ewan Klein, and Edward Loper, Natural Language Processing with Python. Available online from the NLTK site.
- Various supplementary readings.
- Homework:
- Homework will be distributed in lecture and posted on the web page.
- Late homework will be penalized based on the number of weekdays (or fractions thereof) that have passed since the homework was due:
1 day late: 20% penalty
2 days late: 30% penalty
3 days late: 50% penalty
More than 3 days late: no credit
For example, if a homework is due on Thursday at 4PM, you will receive a 20% penalty if you turn it in before 4PM Friday, a 30% penalty before 4PM Monday, a 50% penalty before 4PM Tuesday, and no credit after 4PM Tuesday.
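The late policy above can be sketched in Python, the language used in this course. This is only an illustration, not the official grading procedure. The names `add_weekdays` and `late_penalty` are hypothetical, the due date is assumed to fall on a weekday, and a submission made exactly at a deadline is assumed to count as meeting it.

```python
from datetime import datetime, timedelta

# Fraction of credit lost, indexed by the number of weekdays late.
PENALTIES = {0: 0.0, 1: 0.20, 2: 0.30, 3: 0.50}


def add_weekdays(start, n):
    """Return `start` advanced by n weekdays, skipping Saturday and Sunday."""
    current = start
    while n > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            n -= 1
    return current


def late_penalty(due, submitted):
    """Return the penalty fraction (0.0 to 1.0) for a submission time.

    A fraction of a weekday counts as a full day late, so we look for the
    first weekday deadline (due + 0, 1, 2, or 3 weekdays) that was met.
    """
    for days_late in range(4):
        if submitted <= add_weekdays(due, days_late):
            return PENALTIES[days_late]
    return 1.0  # more than 3 weekdays late: no credit


# Example: a homework due Thursday, Oct. 8th, 2009, at 4PM.
due = datetime(2009, 10, 8, 16, 0)
friday_noon = datetime(2009, 10, 9, 12, 0)
monday_noon = datetime(2009, 10, 12, 12, 0)
print(late_penalty(due, friday_noon))  # 0.2
print(late_penalty(due, monday_noon))  # 0.3
```

Under these assumptions, a weekend submission falls in the same bracket as one turned in before 4PM on the following Monday.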
CLASS MODULES
- Links to classroom slides will appear below.
Lecture notes are in Microsoft PowerPoint format. You can view them with either Microsoft PowerPoint or the free Microsoft PowerPoint Viewer on Windows.
Module 1: Introduction & Word-Based Methods
- Course Introduction
- Introduction and Syllabus [Slides]
- A Sample of Applications, NLP as Cognitive Science [Slides]
- Introduction to Python
[Slides]
- Introduction to Information Theory
[Slides]
- N-Gram Word-Based Models of Syntax
[Slides: PowerPoint, PDF]
- Word Distributions
- Smoothing & Backoff
- Word Classes and Part of Speech Tagging
[Slides I: PowerPoint, PDF]
[Slides II: PowerPoint, PDF]
- Tag Set Design
- Hidden Markov Models
- Transformation-based Learning
- Speech Recognition
[Slides]
- Why is Speech Recognition hard?
- HMMs for speech
Module 2: Parsing
- Introduction to Syntactic Analysis
- Context Free Models for English Syntax
[Slides]
- Basic CF Parsing Algorithms
[Slides]
- Enriched Models for NL Syntax
- The inadequacy of CF Models
- Feature Structures and Unification
[Slides]
- Tree Adjoining Grammars
[Slides]
- Statistical Parsing of CFGs
- Probabilistic CFGs
- Generative Statistical Models
- Discriminative Models for Parsing
Module 3: Meaning
- A Practical Introduction to Semantics
- Lexical Semantics
- Word Sense Disambiguation: Decision Lists, SVMs
- Logical Form and Semantics
- Introduction to Logical Form
- Mapping from Syntactic Structures to LF
- Practical Methods: Information Extraction and Named Entity Recognition
- Practical Methods: Naive Bayes for Spam Filtering; SVMs, Perceptrons
- Discourse & Pragmatics
- Text Coherence & Discourse Structure
- Discourse Intentions in Human-Robot Language
Module 4: Putting the Pieces Together
- Machine Translation
- Statistical Translation: The state of the art
- Multi-document Summarization
HOMEWORK ASSIGNMENTS
General Information (Using Python, NLTK, Coding Standards, etc.)
How to submit:
- 1. Connect to eniac.seas.upenn.edu.
- 2. Type the command 'turnin -c cis530 -p hwx filename'.
- 3. If the system asks you to choose a section, type 'ALL'.
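As a small illustration, the submission command in step 2 could be assembled in Python before being typed or run on eniac. This is a sketch under assumptions: it reads the 'x' in 'hwx' as a placeholder for the assignment number, and the helper name `build_turnin_command` and the file name `hw1.py` are hypothetical. It only builds and prints the command; it does not run `turnin`.

```python
import shlex


def build_turnin_command(hw_number, filename, course="cis530"):
    """Build the turnin command as an argument list.

    Assumption: 'hwx' in the instructions means 'hw' followed by the
    assignment number (for example, 'hw1' for Assignment I).
    """
    return ["turnin", "-c", course, "-p", f"hw{hw_number}", filename]


# Hypothetical example for Assignment I.
command = build_turnin_command(1, "hw1.py")
print(shlex.join(command))  # turnin -c cis530 -p hw1 hw1.py
```

Keeping the command as a list of arguments means a file name containing spaces is quoted correctly when printed with `shlex.join`.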
Assignment I
- Due: Thursday, Oct. 8th, 2009, 4PM.
Suggested reading: NLTK Book, Ch. 1-2
Solution
Assignment II
- Due: Thursday, Oct. 29th, 2009, 4PM.
Assignment III
- Due: Thursday, Nov. 12th, 2009, 4PM.
Data
OTHER RESOURCES
Python Resources
For more information, please contact mitch (AT) cis.upenn.edu
Final for 2009