Internet search, speech recognition, machine translation, question answering, information retrieval, and biological sequence analysis are all at the forefront of this century's information revolution. In addition to their use of machine learning, these technologies rely heavily on classic statistical estimation techniques. Yet most CS and engineering undergraduate programs do not prepare students in this area beyond an introductory probability & statistics course. This course is designed to address this gap.
The goal of "Language and Statistics" is to ground the data-driven techniques used in language technologies in sound statistical methodology. We start by formulating various language technology problems in both an information theoretic framework (the source-channel paradigm) and a Bayesian framework (the Bayes classifier). We then discuss the statistical properties of words, sentences, documents and whole languages, and the various computational formalisms used to represent language. These discussions naturally lead to specific concepts in statistical estimation.
Topics include: Zipf's distribution and type-token curves; point estimators, maximum likelihood estimation, bias and variance, sparseness, smoothing and clustering; interpolation, shrinkage, and backoff; entropy, cross entropy and mutual information; decision tree models applied to language; latent variable models and the EM algorithm; hidden Markov models; exponential models and the maximum entropy principle; semantic modeling and dimensionality reduction; probabilistic context-free grammars and syntactic language models.
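To give a flavor of the first topic, here is a minimal illustrative sketch (not course-provided code; "corpus.txt" is a placeholder for any plain-text corpus) that checks Zipf's rank-frequency relationship and traces a type-token curve:

```python
from collections import Counter

# Read and crudely tokenize a corpus; "corpus.txt" is a placeholder file name.
with open("corpus.txt", encoding="utf-8") as f:
    tokens = f.read().lower().split()

# Zipf's law predicts frequency ~ C / rank: the r-th most frequent word
# occurs roughly 1/r as often as the most frequent one, so rank * freq
# should stay roughly constant across the top-ranked words.
counts = Counter(tokens)
for rank, (word, freq) in enumerate(counts.most_common(10), start=1):
    print(f"rank {rank:2d}  freq {freq:6d}  rank*freq {rank * freq:7d}  {word}")

# Type-token curve: number of distinct word types as a function of the
# number of tokens read. It keeps growing, which is why sparseness and
# smoothing are central to statistical language modeling.
seen = set()
for i, tok in enumerate(tokens, start=1):
    seen.add(tok)
    if i % 10000 == 0:
        print(f"{i} tokens -> {len(seen)} types")
```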
“Language and Statistics” is designed for LTI and other SCS graduate students, but others are also welcome. Upper-level CS undergraduates who have taken it have done well, although they found it challenging.
The master's-level version of this course (11-661) does not require the course project.
Instructor: Bhiksha Raj
TAs:
Lecture: Monday and Wednesday, 12:00 noon - 1:20 pm
Location: Gates-Hillman Complex GHC 4307
Recitation: TBD
Office hours:
Prerequisites: strong quantitative aptitude; familiarity and comfort with basic undergraduate-level probability; some programming skill.
This course is worth 12 units.
Component | Details | Maximum
---|---|---
Assignments | 7 assignments | 70% of grade
Final project | 1 project (11-761 only; for 11-661 students, the assignment grades are renormalized to 100%) | 30% of grade
Extensions: If you have an unavoidable conflict that prevents you from completing an assignment on time (such as travel to a conference or a medical emergency), email the TA in charge of that assignment as soon as you become aware of the problem, briefly stating the circumstances and how much more time you need. The TA in charge is authorized to grant an extension as long as you requested it promptly. Do not send extension requests to the instructor or to the other TAs.
Assignments turned in late without prior approval will incur a 25% penalty per 24-hour period or any part thereof; for example, an assignment submitted 30 hours late loses 50% of its maximum score.
The course will not follow a specific book, but will draw from a number of sources; we list relevant books at the end of this page. We will also post links to relevant reading material for each class, and students are expected to familiarize themselves with it before class. The readings will sometimes be arcane and difficult to understand; if so, do not worry: we will present simpler explanations in class.
We will use Piazza for discussions. Here is the link. Please sign up.
Lectures | Dates | Topics | Required Reading (due BEFORE class) | Lecture notes/Slides | Additional readings, if any | Assignments
---|---|---|---|---|---|---
1,2 | August 29, September 5 | | [MS] 1.1 - 1.3, 2.1 | | | HW 1 out on 09/08
3,4,5 | September 10, September 12, September 17 | | [MS] 1.4 | | | HW 1 due on 09/14
6,7 | September 19, September 24, September 26 | | [MS] 6.2.1 | | [mD] ch. 6, esp. 6.5 | HW 2 out on 09/23; LaTeX source available here
 | October 1 | Class cancelled | | | |
8 | October 3 | | [MS] 6.2.2 | Slides included in the Maximum Likelihood Estimation deck | Good-Turing | HW 2 due on 09/30
9 | October 5 @ NSH 3002 (make-up class) | | | Slides included in the Maximum Likelihood Estimation deck | Chen & Goodman 98 (pp. 1-21) | HW 3 out on 10/08
10,11 | October 8, October 10 | | | Slides included in the Maximum Likelihood Estimation deck | More advanced EM notes by John Lafferty |
12,13 | October 15, October 17 | | | | |
14,15 | October 22, October 24 | | | | |
16,17 | October 29, October 31 | | | | |
18,19 | November 5, November 7 | | [MS] ch. 9 | | Larry Rabiner's classic HMM tutorial |
20 | November 12 | | [MS] ch. 9 | | Larry Rabiner's classic HMM tutorial |
 | November 15 - November 17 | | | | |
21,22,23 | November 14, November 19, November 26 | | | | |
24 | November 28 | | [MS] 11.1 - 11.4 | | |
25,26 | December 3, December 5 | | | | | HW 7 due on 12/07
Abbreviation | Book/Paper
---|---
[MS] | Manning and Schütze, Foundations of Statistical Natural Language Processing.
[BCW] | Bell, Cleary and Witten, Text Compression.
[mD] | Morris DeGroot, Probability and Statistics, 2nd edition.
[sK] | Slava M. Katz, "Estimation of probabilities from sparse data for the language model component of a speech recognizer", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 35, no. 3, pp. 400-401, 1987.
[BBDM] | L. Bahl, P. Brown, P. de Souza and R. Mercer, "A tree-based statistical language model for natural language speech recognition", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 7, pp. 1001-1008, 1989.
[rR] | Roni Rosenfeld, "A maximum entropy approach to adaptive statistical language modeling", Computer Speech and Language, vol. 10, pp. 187-228, 1996.