CS 395T Computational Statistics with Application to Bioinformatics
CAM 383M Statistical and Discrete Methods for Scientific Computing

Course data:
Department: The University of Texas at Austin, Department of Computer Sciences
Instructor: Professor William H. Press
Now being given: Spring, 2011
Previously offered: Spring, 2008; Spring, 2009; Spring, 2010.
Meets for lectures: MW 1:30 - 3:00 PM in PAI 3.14, and occasionally F 1:30 - 3:00 PM in WEL 4.224
Meets for discussion section and computational workshop: F 1:30 - 3:00 PM in WEL 4.224

Current Information for Spring 2011:
For up-to-date information see the CS395T/CAM383M Discussion Forum. The rest of this page has information about the course in general.

Course description:
This is a practical course in applying (mostly) modern statistical techniques to (mostly) real data, particularly bioinformatic data and large data sets. There is only a small amount of theorem proving; the emphasis is on efficient computation and concise coding, mostly in MATLAB (where we learn various data-parallel language idioms) and C++ (which we learn to interface seamlessly to MATLAB for convenience and computational power).

Topics covered:
Topics covered include probability theory and Bayesian inference; univariate distributions; Central Limit Theorem; generation of random deviates; tail (p-value) tests; multiple hypothesis correction; empirical distributions; model fitting; error estimation; contingency tables; multivariate normal distributions; phylogenetic clustering; Gaussian mixture models; EM methods; maximum likelihood estimation; Markov Chain Monte Carlo; principal component analysis; dynamic programming; hidden Markov models; performance measures for classifiers; support vector machines; Wiener filtering; wavelets; multidimensional interpolation; information theory.

Lecture notes and concept list:
A detailed course outline from Spring, 2010, with links to complete lecture notes (PDF slide files), is available. The list of concepts associated with each lecture is provided as a study guide. Earlier versions from 2009 and 2008 are also available.

(Instructors at other institutions may obtain PowerPoint versions of these files on request.)

Prerequisites:
Graduate standing, or upper-division undergraduate standing with consent of instructor. A mathematics background that includes at least undergraduate multivariable calculus and linear algebra is assumed, as is some programming experience in MATLAB, C++, and/or Java (or, possibly, Mathematica, Fortran, or C). A previous course in undergraduate-level probability and statistics is helpful, but not required.

Texts:
There is no required text. However, many lectures will utilize methods in Numerical Recipes, Third Edition. Enrolled students will be provided with a free electronic subscription to this book, as well as access to its source code.

Some other relevant books are:

Course requirements:
Enrolled students are expected to attend lectures. Occasional (not regular) problem sets or computer exercises will be assigned. Students are expected to contribute to the course wiki. A student project or paper is required (it may be collaborative). There are no written exams, but there will be individual final oral interviews (20-30 min.) covering the lecture material.

2010 course forum:
The 2010 course discussion forum has additional materials, links to references, and discussion contributed by students. (It also includes the archived 2009 course forum.)

2008 course wiki:
The 2008 course wiki has an earlier version of the lecture notes, with some discussion threads and contributions by students. (Unfortunately, some of these are hard to find, since they are attached to individual slide links in the lectures.)