Principles and Techniques in Data Science
Instructor: Mahdi Dolati | Certificate: Official (bilingual)
Term: Summer 2025 | Prerequisite: Python Programming
Schedule: Sunday and Tuesday 16:30-18:00 | Format: Online class
General Objective
This course aims to equip students to develop data-driven solutions to a range of problems. To that end, students will study the mathematical and statistical prerequisites for such approaches and learn the principles and steps of data-driven problem solving: data analysis and visualization, statistical and probabilistic modeling, statistical inference, and decision-making under uncertainty. By applying these methods to real-world problems, students will also encounter the challenges of implementing them in practice.
Topics
- Data Analysis
  - Introduction to the data science lifecycle
  - Data generation (questionnaires, census, controlled experiments)
  - Data collection and aggregation (data standardization, tabular data representation, filtering and aggregating data)
  - Data cleaning (outlier management, missing values, encoding and vector space representation)
  - Exploratory data analysis
  - Data visualization
  - Pattern recognition and hypothesis generation through data visualization
  - Understanding pitfalls in data analysis (data bias, insufficient features, confusing correlation with causation)
  - Hypothesis testing and p-value manipulation
- Statistical Data Modeling
  - Introduction to modeling steps (cost function, parameter learning, prediction, decision theory)
  - Model generalization capability and its evaluation using cost functions
  - Training, validation, and test data separation
  - Overfitting, cross-validation, regularization
  - Optimization methods (gradient descent, Newton's method, momentum-based methods)
  - Probabilistic and Bayesian modeling
  - Statistical inference, model learning using estimation theory, prediction using trained models
  - Decision theory
  - Bias-variance tradeoff
  - Curse of dimensionality
- Statistical Modeling in Practice
  - High-dimensional data visualization using t-SNE
  - Feature extraction and selection
  - Feature quantization using decision trees
  - Linear classification methods
  - Classification using decision trees
  - Classifier evaluation
- Machine Learning Engineering in Production
  - Introduction to MLOps: end-to-end learning, continuous learning, data drift, concept drift, feature store, pipelines
  - Data lifecycle in production environments
  - Learning lifecycles and pipelines in production environments
  - Deployment of learning systems in production environments
Assessment
- Exams: Midterm and final exams (50% of grade)
- Assignments and Project: Three theoretical assignments and one practical project to be submitted during the semester (50% of grade)
References
- Principles and Techniques of Data Science, UC Berkeley, Fall 2022.
- J. Grus, Data Science from Scratch, O’Reilly, 2019.
- G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning, Springer, 2017.
- C. O’Neil, R. Schutt, Doing Data Science, O’Reilly, 2013.
- W. McKinney, Python for Data Analysis, O’Reilly, 2012.