Kareem Carr Data Scientist

Stats PhD student @Harvard • I tweet about data science: what it is, how to do it better, why it matters for the rest of society.


Book Recommendations:

BOOK: Statistical Inference BEST FOR: People with a solid math background including linear algebra and a few calculus courses. We use this in my graduate program. It's a solid textbook that covers most of the basics. https://t.co/seAqP8Ug9L (from X)


George Casella, Roger L. Berger

This book builds theoretical statistics from the first principles of probability theory. Starting from the basics of probability, the authors develop the theory of statistical inference using techniques, definitions, and concepts that are statistical and are natural extensions and consequences of previous concepts. This book can be used for readers who have a solid mathematics background. It can also be used in a way that stresses the more practical uses of statistical theory, being more concerned with understanding basic statistical concepts and deriving reasonable statistical procedures for a variety of situations, and less concerned with formal optimality investigations.

BOOK: Introduction to Probability BEST FOR: People who want to learn some statistics theory but haven't done much calculus or linear algebra. It's a solid introduction to probability theory but doesn't cover much practical statistics. EXTRAS: Free YouTube lectures by the author https://t.co/6hGNAt6Qb5 (from X)


Joseph K. Blitzstein, Jessica Hwang

Developed from celebrated Harvard statistics lectures, Introduction to Probability provides essential language and tools for understanding statistics, randomness, and uncertainty. The book explores a wide variety of applications and examples, ranging from coincidences and paradoxes to Google PageRank and Markov chain Monte Carlo (MCMC). Additional application areas explored include genetics, medicine, computer science, and information theory. The authors present the material in an accessible style and motivate concepts using real-world examples. Throughout, they use stories to uncover connections between the fundamental distributions in statistics and conditioning to reduce complicated problems to manageable pieces. The book includes many intuitive explanations, diagrams, and practice problems. Each chapter ends with a section showing how to perform relevant simulations and calculations in R, a free statistical software environment.

The second edition adds many new examples, exercises, and explanations to deepen understanding of the ideas, clarify subtle concepts, and respond to feedback from many students and readers. New supplementary online resources have been developed, including animations and interactive visualizations, and the book has been updated to dovetail with these resources. Supplementary material is available on Joseph Blitzstein’s website, www.stat110.net. The supplements include solutions to selected exercises; additional practice problems; handouts, including review material and sample exams; animations and interactive visualizations created in connection with the edX online version of Stat 110; and links to lecture videos available on iTunes U and YouTube. There is also a complete instructor's solutions manual available to instructors who require the book for a course.

BOOK: Think Stats (@AllenDowney) BEST FOR: Programmers, computer scientists and Python users. This book teaches you statistics concepts by showing you how to code them from first principles. The one downside is you don't learn state-of-the-art statistics packages. https://t.co/bpnxqNHqOY (from X)

If you know how to program, you have the skills to turn data into knowledge, using tools of probability and statistics. This concise introduction shows you how to perform statistical analysis computationally, rather than mathematically, with programs written in Python. By working with a single case study throughout this thoroughly revised book, you’ll learn the entire process of exploratory data analysis, from collecting data and generating statistics to identifying patterns and testing hypotheses. You’ll explore distributions, rules of probability, visualization, and many other tools and concepts. New chapters on regression, time series analysis, survival analysis, and analytic methods will enrich your discoveries.

You’ll learn to: develop an understanding of probability and statistics by writing and testing code; run experiments to test statistical behavior, such as generating samples from several distributions; use simulations to understand concepts that are hard to grasp mathematically; import data from most sources with Python, rather than rely on data that’s cleaned and formatted for statistics tools; and use statistical inference to answer questions about real-world data.
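As a rough illustration of what "statistical analysis computationally, rather than mathematically" can look like, here is a minimal sketch that is not taken from the book. It uses only Python's standard library to simulate the sampling distribution of a mean; the function name `simulate_sample_means`, the exponential distribution, and the sample sizes are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_sample_means(n, trials, seed=0):
    """Draw `trials` samples of size `n` from an exponential distribution
    with mean 1 and return the mean of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


means = simulate_sample_means(n=50, trials=2000)

# The average of the sample means should sit near the true mean (1.0),
# and their spread should be near 1 / sqrt(50), roughly 0.14.
print("mean of sample means:", round(statistics.fmean(means), 3))
print("std dev of sample means:", round(statistics.stdev(means), 3))
```

Changing `n` and rerunning shows the spread of the sample means shrinking as the sample size grows, the kind of result that is tedious to derive by hand but quick to check by simulation.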

BOOK: R for Data Science (@hadleywickham and @StatGarrett) BEST FOR: People who want to get their hands dirty but don't know where to start. Teaches you how to work with data using the R programming language without bogging you down with a lot of theory. EXTRAS: Free ebook https://t.co/2NTFEVus7j (from X)


Hadley Wickham, Mine Cetinkaya-Rundel, Garrett Grolemund

Use R to turn data into insight, knowledge, and understanding. With this practical book, aspiring data scientists will learn how to do data science with R and RStudio, along with the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Even if you have no programming experience, this updated edition will have you doing data science quickly. You'll learn how to import, transform, and visualize your data and communicate the results. And you'll get a complete, big-picture understanding of the data science cycle and the basic tools you need to manage the details. Updated for the latest tidyverse features and best practices, new chapters show you how to get data from spreadsheets, databases, and websites. Exercises help you practice what you've learned along the way.

You'll understand how to: Visualize (create plots for data exploration and communication of results); Transform (discover variable types and the tools to work with them); Import (get data into R and in a form convenient for analysis); Program (learn R tools for solving data problems with greater clarity and ease); and Communicate (integrate prose, code, and results with Quarto).

BOOK: The Art of Statistics (@d_spiegel) BEST FOR: People who want a deep dive into the details of when and why people use specific statistical techniques without a lot of math. https://t.co/SXmj2aCDvj (from X)

In this "important and comprehensive" guide to statistical thinking (New Yorker), discover how you can use data and mathematics to gain a better understanding of life’s biggest problems. The age of big data has made statistical literacy more important than ever. In The Art of Statistics, David Spiegelhalter shows how to apply statistical reasoning to real-world problems. Whether we're analyzing preventative medical screening or the terrible crime sprees of serial killers, Spiegelhalter teaches us how to clarify questions, assumptions, and expectations and, most importantly, how to interpret the answers we receive. Combining the incomparable insight of an expert with the playful enthusiasm of an aficionado, The Art of Statistics is the definitive guide to the power of data. "A call to arms for greater societal data literacy . . . a reminder that there are passionate, self-aware statisticians who can argue eloquently that their discipline is needed now more than ever." -- Financial Times