Instructor: Fei Lu [ feilu## ( ## = @math.jhu.edu) ]
Class meets: Tu/Th, 10:30-11:45, Maryland 202
Office Hours: Tu/Th, 10:00-10:30 and 11:45-12:30, Krieger 218
Textbooks (see the course Plan for more):
CS02: Felipe Cucker and Steve Smale. On the mathematical foundations of learning. Bulletin of the American Mathematical Society, 39(1):1-49, 2002.
DGL96: Luc Devroye, László Györfi, and Gábor Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1996.
Mur22intro, Mur22adv: Kevin Murphy. Probabilistic Machine Learning: An Introduction and Probabilistic Machine Learning: Advanced Topics. MIT Press, 2022 and 2023.
BN06: Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
SF12: Robert E. Schapire and Yoav Freund. Boosting: Foundations and Algorithms. MIT Press, 2012 (open access).
Syllabus: This course provides an introduction to three topics: learning kernels in operators arising from interacting particle systems, probability theory for time series classification, and time series modeling with neural networks. The course focuses on modeling time series data by combining statistical/machine learning theory with dynamical systems. The underlying theme is a probabilistic perspective on learning, viewing time series and dynamical systems as descriptions of stochastic processes.
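To make the first topic concrete, here is a minimal sketch, assuming the standard first-order interacting particle model common in the kernel-learning literature (the lecture notes may use different notation or normalization): the state of particle $i$ evolves by

$$
\dot{X}_i(t) = \frac{1}{N}\sum_{j=1}^{N} \phi\big(\|X_j(t)-X_i(t)\|\big)\,\big(X_j(t)-X_i(t)\big), \qquad i=1,\dots,N,
$$

and the interaction kernel $\phi$ is the unknown to be estimated nonparametrically from trajectory data $\{X_i^{m}(t_l)\}$ by minimizing an empirical least-squares loss,

$$
\mathcal{E}(\phi) = \frac{1}{MLN}\sum_{m=1}^{M}\sum_{l=1}^{L}\sum_{i=1}^{N}\Big\|\dot{X}_i^{m}(t_l) - \frac{1}{N}\sum_{j=1}^{N}\phi\big(\|X_j^{m}(t_l)-X_i^{m}(t_l)\|\big)\big(X_j^{m}(t_l)-X_i^{m}(t_l)\big)\Big\|^2 .
$$

The coercivity, identifiability, and regularization (DARTR) weeks in the schedule below concern when this loss determines $\phi$ and how to stabilize the resulting inverse problem.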
Grading: class participation (60%), presentation (40%)
| Week | Topic |
|---|---|
| | Nonparametric learning of kernels in operators |
| 8/29, 31 | Plan overview; review of classical learning theory (LecNote1) |
| 9/5, 7 | Review of classical learning theory (LecNote1); finitely many particles (LecNote2) |
| 9/12, 14 | Coercivity condition (LecNote2); mean-field equations: construction of the loss function (LecNote4) |
| 9/19, 21 | Mean-field equations: identifiability (LecNote4); regularization: DARTR (LecNote5) |
| 9/26, 28 | Regularization: DARTR (LecNote5); small noise analysis (LecNote6) |
| 10/3, 5 | Small noise analysis: proof (LecNote6); Bayesian perspective and measures on infinite-dimensional spaces |
| | Time series classification: Keras and the TSC website |
| 10/10, 12 | Probability theory for pattern recognition: DGL96, Chapters 1-2 |
| 10/17, 19 | Linear discrimination: DGL96, Chapter 4 |
| 10/24, 26 | Nearest neighbor rules: DGL96, Chapter 5 (a minimal 1-NN sketch appears after the schedule) |
| 10/31, 11/2 | Consistency: DGL96, Chapter 6; boosting: slides by Rob Schapire, survey by Schapire (2012), and XGBoost (Chuhuan Huang) |
| 11/7, 9 | Convergence of AdaBoost and its rate (SF12, Boosting: Foundations and Algorithms); TSC: ROCKET paper and code (Yantao Wu) |
| 11/14, 16 | Random forests; TSC: ResNet |
| | Time series modeling |
| 11/20-24 | No class: Thanksgiving break |
| 11/28, 30 | Pangu-Weather in Nature: Bi et al., Accurate medium-range global weather forecasting with 3D neural networks; GraphCast in Science: Lam et al., Learning skillful medium-range global weather forecasting (DeepMind blog); Transformer: Vaswani et al. (2017), Attention Is All You Need |
| 12/5, 7 | Transformers for TS modeling: Geneva and Zabaras (2022), Transformers for modeling physical systems; survey: Zeng et al. (2022), Are Transformers Effective for Time Series Forecasting? (AAAI 2023); Neural ODEs: Chen et al. (2018), Neural Ordinary Differential Equations; Deep Implicit Layers tutorial; Neural CDEs: Kidger, Morrill, Foster, and Lyons (2020), Neural Controlled Differential Equations for Irregular Time Series |
| 12/12 | Deep state-space models: Rangapuram et al. (2018), Deep State Space Models for Time Series Forecasting; nonlinear deep SSMs: Gedon et al. (2021), Deep State Space Models for Nonlinear System Identification; structured state-space sequence model (S4): Gu, Goel, Saab, and Ré, Structured State Spaces for Sequence Modeling (S4) |
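For the time-series-classification weeks (nearest neighbor rules, DGL96 Chapter 5), the sketch below shows a minimal k-nearest-neighbor classifier for equal-length time series under the Euclidean metric. It is a generic illustration, not code from the lecture notes; the array shapes, the toy sinusoid data, and the `knn_predict` name are assumptions made for this example.

```python
import numpy as np

def knn_predict(train_X, train_y, test_X, k=1):
    """Classify each test series by majority vote among its k nearest
    training series in the Euclidean metric (equal-length series assumed).

    train_X : (n_train, T) array of training time series
    train_y : (n_train,) array of integer class labels
    test_X  : (n_test, T) array of test time series
    """
    preds = np.empty(len(test_X), dtype=train_y.dtype)
    for i, x in enumerate(test_X):
        # Euclidean distance from x to every training series
        dists = np.linalg.norm(train_X - x, axis=1)
        # indices of the k closest training series
        nearest = np.argsort(dists)[:k]
        # majority vote among their labels
        labels, counts = np.unique(train_y[nearest], return_counts=True)
        preds[i] = labels[np.argmax(counts)]
    return preds

if __name__ == "__main__":
    # toy data: two classes of noisy sinusoids with different frequencies
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 50)
    make = lambda freq, n: np.sin(2 * np.pi * freq * t) + 0.2 * rng.standard_normal((n, t.size))
    train_X = np.vstack([make(1, 20), make(3, 20)])
    train_y = np.array([0] * 20 + [1] * 20)
    test_X = np.vstack([make(1, 5), make(3, 5)])
    print(knn_predict(train_X, train_y, test_X, k=1))
```

With k=1 and the Euclidean distance this is the simplest member of the family analyzed in DGL96; the TSC benchmarks discussed later in the course typically replace the raw Euclidean metric with elastic distances or with learned features such as ROCKET or ResNet.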