CIS 530 -- Introduction to Natural Language Processing -- 2009



Instructor
Mitch Marcus 
Office: Levine 503 
mitch (AT) cis.upenn.edu 
Office Hours: TBA

Teaching Assistant
Constantine Lignos 
Office: IRCS 410 
lastname (AT) seas.upenn.edu 
Office Hours: TBA  
 

Class Schedule: Tuesday & Thursday, 4:30pm to 6:00pm, Moore 212

Course Administrator: Cheryl Hickey, 502 Levine, 215-898-3538, cherylh (AT) cis.upenn.edu

COURSE STRUCTURE


Web Page:
http://www.seas.upenn.edu/~cis530/

Textbooks:
  • Jurafsky & Martin, SPEECH and LANGUAGE PROCESSING: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition
  • Chris Manning & Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999. (available online from the Penn campus)
  • Steven Bird, Ewan Klein, and Edward Loper, Natural Language Processing with Python. Available online from the NLTK site.
  • Various supplementary readings.


  • Homework:
    Homework will be distributed in lecture and posted on the web page.
    Late homework will be penalized based on the number of weekdays (or fractions thereof) that have passed since the homework was due:
    1 day late: 20% penalty
    2 days late: 30% penalty
    3 days late: 50% penalty
    More than 3 days late: no credit
    For example, if a homework is due on Thursday at 4PM, turning it in before 4PM Friday incurs a 20% penalty, before 4PM Monday a 30% penalty, and before 4PM Tuesday a 50% penalty; you will receive no credit if it is turned in after 4PM Tuesday.
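
    The late policy can be sketched in Python as a rough illustration. This is not an official grading tool: the function names `next_weekday_deadline` and `late_penalty` are hypothetical, and the sketch assumes that the due date falls on a weekday and that a submission made exactly at the deadline counts as on time.

```python
from datetime import datetime, timedelta

# Penalty schedule from the policy: weekdays late -> fraction of credit deducted.
PENALTIES = {0: 0.0, 1: 0.20, 2: 0.30, 3: 0.50}


def next_weekday_deadline(moment):
    """Return the same clock time on the next weekday (skipping Sat/Sun)."""
    moment += timedelta(days=1)
    while moment.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        moment += timedelta(days=1)
    return moment


def late_penalty(due, submitted):
    """Return the fraction of credit deducted (1.0 means no credit).

    Assumption: a partial weekday counts as a full day late, and
    `due` itself falls on a weekday.
    """
    days_late = 0
    cutoff = due
    while submitted > cutoff:
        days_late += 1
        if days_late > 3:
            return 1.0  # more than 3 weekdays late: no credit
        cutoff = next_weekday_deadline(cutoff)
    return PENALTIES[days_late]


# Usage: homework due Thursday, Oct. 8th, 2009, 4PM.
due = datetime(2009, 10, 8, 16, 0)
print(late_penalty(due, datetime(2009, 10, 9, 10, 0)))   # Friday morning
print(late_penalty(due, datetime(2009, 10, 12, 15, 0)))  # Monday before 4PM
print(late_penalty(due, datetime(2009, 10, 13, 17, 0)))  # Tuesday after 4PM
```

    Under these assumptions a weekend submission falls into the period that ends at 4PM Monday, so it is treated the same as a Monday submission before 4PM.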


    CLASS MODULES

    Links to classroom slides will appear below.

    Lecture Notes are in Microsoft PowerPoint format. You can view them with either Microsoft PowerPoint or the free Microsoft PowerPoint Viewer on Windows.

    Module 1: Introduction & Word-Based Methods  
    • Course Introduction
      • Introduction and Syllabus [Slides]
      • A Sample of Applications, NLP as Cognitive Science [Slides]
    • Introduction to Python [Slides]
    • Introduction to Information Theory [Slides]
    • N-Gram Word-Based Models of Syntax [Slides: PowerPoint, PDF]
      • Word Distributions
      • Smoothing & Backoff
    • Word Classes and Part of Speech Tagging [Slides I: PowerPoint, PDF] [Slides II: PowerPoint, PDF]
      • Tag Set Design
      • Hidden Markov Models
      • Transformation-based Learning
    • Speech Recognition [Slides]
      • Why is Speech Recognition hard?
      • HMMs for speech


    Module 2: Parsing
    • Introduction to Syntactic Analysis
    • Context Free Models for English Syntax [Slides]
    • Basic CF Parsing Algorithms [Slides]
    • Enriched Models for NL Syntax
      • The inadequacy of CF Models
      • Feature Structures and Unification [Slides]
      • Tree Adjoining Grammars [Slides]
    • Statistical Parsing of CFGs
      • Probabilistic CFGs
      • Generative Statistical Models
      • Discriminative Models for Parsing


    Module 3: Meaning
    • A Practical Introduction to Semantics
      • Lexical Semantics
        • Word Sense Disambiguation: Decision Lists, SVMs
      • Logical Form and Semantics
        • Introduction to Logical Form
        • Mapping from Syntactic Structures to LF
      • Practical Methods: Information Extraction and Named Entity Recognition
      • Practical Methods: Naive Bayes for Spam Filtering; SVMs, Perceptrons
    • Discourse & Pragmatics
      • Text Coherence & Discourse Structure
      • Discourse Intentions in Human-Robot Language


    Module 4: Putting the Pieces Together
    • Machine Translation
      • Statistical Translation: The state of the art
    • Multi-document Summarization


    HOMEWORK ASSIGNMENTS

    General Information (Using Python, NLTK, Coding Standards, etc.)

    How to submit:
    • 1. Connect to eniac.seas.upenn.edu
    • 2. Type the command 'turnin -c cis530 -p hwx filename'
    • 3. If the system asks you to choose a section, type 'ALL'


    Assignment I
    • Due: Thursday, Oct. 8th, 2009, 4PM.
      Suggested reading: NLTK Book, Ch. 1-2
      Solution


    Assignment II
    • Due: Thursday, Oct. 29th, 2009, 4PM.


    Assignment III
    • Due: Thursday, Nov. 12th, 2009, 4PM.
      Data




    OTHER RESOURCES

    Python Resources




    For more information, please contact mitch (AT) cis.upenn.edu


    Final for 2009