Principles and Techniques in Data Science
| Instructor: Mahdi Dolati | Certificate: Official | 
| Term: Summer 2025 | Prerequisite: Python Programming | 
| Schedule: Sunday and Tuesday 16:30-18:00 | Online Class: Online Class | 
General Objective
This course aims to equip students to develop data-driven solutions to a range of problems. To that end, students learn the mathematical and statistical prerequisites for such approaches, along with the principles and steps of a data-driven solution: data analysis and visualization, statistical and probabilistic modeling, statistical inference, and decision-making under uncertainty. By applying these methods to real-world problems, students also encounter the challenges of putting them into practice.
Topics
- Data Analysis
  - Introduction to the data science lifecycle
  - Data generation (questionnaires, census, controlled experiments)
  - Data collection and aggregation (data standardization, tabular data representation, filtering and aggregating data)
  - Data cleaning (outlier management, missing values, encoding and vector space representation)
  - Exploratory data analysis
  - Data visualization
  - Pattern recognition and hypothesis generation through data visualization
  - Understanding pitfalls in data analysis (data bias, insufficient features, confusing correlation with causation)
  - Hypothesis testing and p-value manipulation
 
- Statistical Data Modeling
  - Introduction to modeling steps (cost function, parameter learning, prediction, decision theory)
  - Model generalization capability and its evaluation using cost functions
  - Training, validation, and test data separation
  - Overfitting, cross-validation, regularization
  - Optimization methods (gradient descent, Newton's method, momentum-based methods)
  - Probabilistic and Bayesian modeling
  - Statistical inference, model learning using estimation theory, prediction using trained models
  - Decision theory
  - Bias-variance tradeoff
  - Curse of dimensionality
 
- Statistical Modeling in Practice
  - High-dimensional data visualization using t-SNE
  - Feature extraction and selection
  - Feature quantization using decision trees
  - Linear classification methods
  - Classification using decision trees
  - Classifier evaluation
 
- Machine Learning Engineering in Production
  - Introduction to MLOps: end-to-end learning, continuous learning, data drift, concept drift, feature store, pipelines
  - Data lifecycle in production environments
  - Learning lifecycles and pipelines in production environments
  - Deployment of learning systems in production environments
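
As a rough illustration of the "Training, validation, and test data separation" and "cross-validation" topics listed above, the following is a minimal sketch of k-fold index splitting in plain Python. The function name `k_fold_indices`, its parameters, and the sample sizes are illustrative assumptions, not part of the course material.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration only.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


# Made-up example: 10 samples split into 3 folds.
folds = list(k_fold_indices(n_samples=10, k=3))
```

Each sample appears in exactly one validation fold, so averaging a model's validation cost over the folds gives an estimate of generalization that uses every sample for both training and validation.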
 
 
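For the "cost function" and "gradient descent" topics listed above, here is a minimal sketch that fits a line y ≈ w*x + b by gradient descent on the mean squared error. The function name `fit_line_gd`, the learning rate, the step count, and the data are all made up for illustration.

```python
def fit_line_gd(xs, ys, lr=0.05, steps=2000):
    """Fit y ≈ w*x + b by gradient descent on the mean squared error.

    Hypothetical helper for illustration only.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current model on every sample.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(errors^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up, noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line_gd(xs, ys)
```

On this data the learning rate is small enough for the iteration to converge, so `w` and `b` approach 2 and 1. A learning rate that is too large would make the updates diverge instead.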
Assessment
- Exams: Midterm and final exams (50% of grade)
- Assignments and Project: Three theoretical assignments and one practical project to be submitted during the semester (50% of grade)
 
References
- Principles and Techniques of Data Science, UC Berkeley, Fall 2022.
- J. Grus, Data Science from Scratch, O’Reilly, 2019.
- G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning, Springer, 2017.
- C. O’Neil, R. Schutt, Doing Data Science, O’Reilly, 2013.
- W. McKinney, Python for Data Analysis, O’Reilly, 2012.