Large-Scale Data Analytics with Python and Spark

A Hands-on Guide to Implementing Machine Learning Solutions

Authors: Isaac Triguero, University of Nottingham; Mikel Galar, Public University of Navarre
Published: November 2023
Format: Paperback
ISBN: 9781009318259

Price: $39.99 USD (Paperback)

Description

Based on the authors' extensive teaching experience, this hands-on graduate-level textbook teaches how to carry out large-scale data analytics and design machine learning solutions for big data. With a focus on fundamentals, this extensively class-tested textbook walks students through the key principles and paradigms for working with large-scale data, introduces the main frameworks for large-scale data analytics (Hadoop, Spark), and explains how to implement machine learning to exploit big data. It is unique in covering the principles that aspiring data scientists need to know, without detail that can overwhelm. Real-world examples, hands-on coding exercises, and labs combine with exceptionally clear explanations to maximize student engagement. Well-defined learning objectives, exercises with online solutions for instructors, lecture slides, and an accompanying suite of lab exercises of increasing difficulty in Jupyter Notebooks offer a coherent and convenient teaching package. The book is an ideal teaching resource for courses on large-scale data analytics with machine learning in computer science and data science departments.

Engages students and supports instructors in teaching large-scale data analytics and ML
Encourages hands-on learning and fosters reflective thinking with explanations, code, real examples, and exam-style exercises
Introduces the key principles of big data platforms rather than attempting to cover all technical aspects, to avoid overwhelming students
Provides lab assignments to assess student progress, designed to run on standard computers without expensive big data infrastructures

Reviews & endorsements

'With the growing ubiquity of large and complex datasets, MapReduce and Spark's dataflow programming models have become mission-critical skills for data scientists, data engineers, and ML engineers. Triguero and Galar leverage their extensive teaching experience on this topic to deliver this tour de force deep dive into both the technical concepts and programming knowhow needed for such modern large-scale data analytics. They interleave intuitive exposition of the concepts and examples from data engineering and classical ML pipelines with well-thought-out hands-on code and outputs. This book not only shows how all this knowledge is useful in practice today but also sets up the reader to be able to successfully 'generalize' to future workloads.' Arun Kumar, University of California, San Diego

Product details

Length: 422 pages
Dimensions: 245 × 170 × 20 mm
Weight: 0.78kg
Availability: Available

Contents

Part I. Understanding and Dealing with Big Data:
1. Introduction
2. MapReduce
Part II. Big Data Frameworks:
3. Hadoop
4. Spark
5. Spark SQL and DataFrames
Part III. Machine Learning for Big Data:
6. Machine Learning with Spark
7. Machine Learning for Big Data
8. Implementing Classical Methods: k-means and Linear Regression
9. Advanced Examples: Semi-supervised, Ensembles, Deep Learning Model Deployment
