Probability and Statistics for Data Science

Carlos Fernandez-Granda, New York University
Hardback, July 2025, ISBN 9781009180085, c. $120.00 USD. Also available in paperback.

    This self-contained guide introduces two pillars of data science, probability theory and statistics, side by side. It illuminates the connections between statistical techniques and the probabilistic concepts they build on, such as how random variables relate to both nonparametric and parametric models. Other topics covered include hypothesis testing, principal component analysis, correlation, and regression. Examples throughout the book use real-world datasets to demonstrate concepts in practice and to confront readers with fundamental challenges in data science, such as overfitting, the curse of dimensionality, and causal inference. Python code reproducing these examples is available on the book's website, along with videos, slides, and solutions to exercises. This accessible book is ideal for undergraduate and graduate students, data science practitioners, and others interested in the theoretical concepts underlying data science methods.

    • Focuses on the topics most relevant to data science in practice
    • Emphasizes intuition and concrete applications while remaining mathematically rigorous, with proofs of all mathematical statements
    • Includes 200 exercises and 100 videos so readers can practice and review concepts

    Product details

    July 2025
    Hardback
    9781009180085
    700 pages
    254 × 178 mm
    Not yet published; available from July 2025

    Table of Contents

    • Preface
    • Introduction and Overview
    • 1. Probability
    • 2. Discrete variables
    • 3. Continuous variables
    • 4. Multiple discrete variables
    • 5. Multiple continuous variables
    • 6. Discrete and continuous variables
    • 7. Averaging
    • 8. Correlation
    • 9. Estimation of population parameters
    • 10. Hypothesis testing
    • 11. Principal component analysis and low-rank models
    • 12. Regression and classification
    • A. Datasets
    • References
    • Index