About the Author
Joel Grus is a research engineer at the Allen Institute for Artificial Intelligence. Previously he worked as a software engineer at Google and a data scientist at several startups. He lives in Seattle, where he regularly attends data science happy hours.
Features & Highlights
To really learn data science, you should not only master the tools—data science libraries, frameworks, modules, and toolkits—but also understand the ideas and principles underlying them. Updated for Python 3.6, this second edition of Data Science from Scratch shows you how these tools and algorithms work by implementing them from scratch.
If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist. Packed with new material on deep learning, statistics, and natural language processing, this updated book shows you how to find the gems in today's messy glut of data.
Get a crash course in Python
Learn the basics of linear algebra, statistics, and probability—and how and when they’re used in data science
Collect, explore, clean, munge, and manipulate data
Dive into the fundamentals of machine learning
Implement models such as k-nearest neighbors, Naïve Bayes, linear and logistic regression, decision trees, neural networks, and clustering
Explore recommender systems, natural language processing, network analysis, MapReduce, and databases.
Customer Reviews
Rating Breakdown
★★★★★ 30% (169)
★★★★ 25% (141)
★★★ 15% (84)
★★ 7% (39)
★ 23% (129)
Most Helpful Reviews
Rating: 1.0 out of 5
AGUCRGUUQ4HQY5JFW4Y3...
✓ Verified Purchase
Wow. Figures contain no color.
Apologies to the author for the low rating. After purchasing this book, I expected color, not grayscale, in the (many) visualizations, especially given the content. Color is important to understanding the material.
I'm not sure if the author had a say in this, or this is O'Reilly's doing, but I would respectfully suggest that if leaving out color in the visualizations was intentional, it's not the way to go.
Based on this, I regretfully suggest that you DO NOT purchase the book. In my view, the book is significantly diminished because of this omission, and I think you'll agree, especially if you purchased at full price as I did.
Does the Kindle version have color? I'm not sure. I wish I could still purchase PDFs, but that option was discontinued some time ago.
I'm starting to think this is my last O'Reilly book purchase. Perhaps the intent is to move us all to Safari subscriptions.
113 people found this helpful
Rating: 3.0 out of 5
AHJQNEZYQAUUURIRMZNL...
✓ Verified Purchase
Annoying Author
The author is clearly technically competent but not a great teacher. His instinct is to write code as concisely as possible at all times, the coder's equivalent of stuffing sentences with SAT words to impress everyone with how smart you are. I bought this book to learn data science, not to unpack one-line code snippets that do eight things at once.
A better book would focus on the student's learning, with less emphasis on how clever the author is at writing code.
72 people found this helpful
Rating: 5.0 out of 5
AEAOSK7QXQPOQOPJ4NZG...
✓ Verified Purchase
Good Coverage of the "Bare Metal" of basic Data Science
If you need a good broad brush to learn from, the second revision (in monochrome) is the book for you!
Yes, there is numpy, pandas, and a host of other packages and frameworks available to perform many of the examples of what is explained in the book. But you need to broaden your knowledge with this material that touches the "bare metal" of Data Science.
Excellent use is made of clear, concise verbiage to make things "black and white." (Save the color images and other crutches for the boardroom stakeholders!)
33 people found this helpful
Rating: 5.0 out of 5
AGKDU7Z3XLDZ734ZLJPD...
✓ Verified Purchase
Amazing introduction to Data Science
Let me start this review by explaining clearly who this book is for: anyone who has had some form of introduction (even a brief one) to programming in Python, algebra, statistics, and probability will find this book a great introduction to data science. While the author does a great job of providing a crash course on these topics (and I even learned a thing or two here and there), I can see the content being a bit overwhelming if this is your first contact with these subjects. However, if you meet the requirements I mentioned above, you'll find this book a breeze! Joel does a good job of explaining the topics with his signature brand of humor, keeping the read entertaining even in the most advanced sections. I'd even say this is a must-read if you are considering going into machine learning, since it teaches you a thing or two on that topic as well. Please keep in mind that the book is monochrome. If that bothers you, consider the electronic version.
TLDR: If you're looking for a concise introduction to data science and have a bit of knowledge of basic Python, algebra, statistics and probability, look no further than this book! Otherwise, come back once you've picked up those tools and you'll feel right at home :)
22 people found this helpful
Rating: 5.0 out of 5
AEM3HE5W3NMZT7BKWZX2...
✓ Verified Purchase
Easily the best data science book
This is the first book to finally succeed Segaran's Programming Collective Intelligence. It consistently teaches concepts with real-world, plausible examples. The author's humor is subtly felt throughout the book and keeps things entertaining.
16 people found this helpful
Rating: 3.0 out of 5
AHP4C6GDXDCBHHGFDMB5...
✓ Verified Purchase
Python prereq
It appears to be a good book. I believe that if the author clearly stated that he assumes the reader already has some basic Python, readers would be able to make an informed purchase decision.
10 people found this helpful
Rating: 5.0 out of 5
AGFY7GSERTKSD3FTF2WF...
✓ Verified Purchase
Exactly what the title says
If you're looking to get started with data science but are confused about which material to use (videos, books, a specific online class, etc.), start with this book and see where it takes you. It gives you a great starting point, and afterwards you will know enough to make an educated choice about which educational resource to use next.
6 people found this helpful
Rating: 5.0 out of 5
AGJWR6TRRHQNQQYBQGDB...
✓ Verified Purchase
The BEST book for learning how many data science functions work under the hood - START HERE!
Did you see something on the news about ChatGPT, Stable Diffusion, or some other big development that made you want to look into machine learning?
Maybe you truly plan on entering data science as a field but don't know where to start?
Or perhaps you've seen one of the author's brilliant/hilarious talks about why he doesn't like Jupyter Notebooks, or about answering the infamous "FizzBuzz" programming interview question using TensorFlow neural networks (seriously, look up Joel Grus on YouTube).
If you know a little bit of Python, a little bit of relevant math, and want to go into any data science or machine learning path, then this book is a must-have. It certainly won't be the only resource you'll need, but it helps you get the most out of other content you'll likely look into later (like how to code up a machine learning pipeline, or maybe a large language model if you're really adventurous).
Far too many machine learning lessons out there just tell you to import certain Python libraries (scikit-learn for example) and start using them without giving you any basic understanding of how those imported functions even work to begin with. Even to this day there are still college courses and coding bootcamps that ask you to download a Jupyter Notebook file and just hit "Shift + Enter" and look at the output.
You're not going to learn how to code that way!!!
Joel Grus does an excellent job of filling in this gap by teaching you more Python than what a statistics professional would usually know and more math than what a typical software developer would know. And that's key if you want to go into a field that relies on both.
All the information for Python and math that you need to get started is here. It's 27 chapters that get you familiar with Python and how to use it, as well as the math used in data science and ML (linear algebra, probability and statistics, algorithms, etc).
You eventually learn enough of both as you go through the chapters to start applying what you learn for some real-world usage.
I've had this book for years and it's still as useful as when it first came out. The only exception I've seen is that the Twitter API tutorial in the book no longer works now that Twitter charges for access to that feature. The tutorial is still good for learning how APIs are put to use.
Once you've read this book and have gotten familiar with all it has to offer, your next step will probably involve looking into a book about how to actually use pre-built data science libraries (like what you find in the Anaconda distribution of Python).
This book may turn out to be heavily responsible for my first startup, but that's a story for later.
3 people found this helpful
Rating: 3.0 out of 5
AGZBYMF7ZOOKT7NWHTXZ...
✓ Verified Purchase
Useful but not extraordinary
I'm ambivalent about this one. On one hand, the idea of implementing the major ML and data science algorithms bottom-up, using only Python's base library, is great because you get a deeper understanding. From this point of view the book is worth reading. However, the theory is quite rushed, the mathematics could have been presented separately as formulas rather than only as code, and it lacks the graphical illustrations that would help you understand visually. Lazy from this point of view. Overall it's worth reading, but don't expect miracles.