The Art of R Programming: A Tour of Statistical Software Design

1st Edition

Price: $12.22
Format: Paperback
Pages: 404
Publisher: No Starch Press
Publication Date:
ISBN-13: 978-1593273842
Dimensions: 8.5 x 0.91 x 11 inches
Weight: 1.65 pounds

Description

Amazon.com Review

From the Author: Why Use R for Your Statistical Work?

As the Cantonese say, yauh peng, yauh leng, which means “both inexpensive and beautiful.” Why use anything else? R has a number of virtues:

  • It is a public-domain implementation of the widely regarded S statistical language, and the R/S platform is a de facto standard among professional statisticians.
  • It is comparable, and often superior, in power to commercial products in most of the significant senses: variety of operations available, programmability, graphics, and so on.
  • It is available for the Windows, Mac, and Linux operating systems.
  • In addition to providing statistical operations, R is a general-purpose programming language, so you can use it to automate analyses and create new functions that extend the existing language features.
  • R includes a library of several thousand user-contributed packages.
  • It incorporates features found in object-oriented and functional programming languages.
  • R is capable of producing beautiful graphics for your presentations, reports, or articles.
About the Author

Norman Matloff is a professor of computer science (and was formerly a professor of statistics) at the University of California, Davis. His research interests include parallel processing and statistical regression, and he is the author of a number of widely used Web tutorials on software development. He has written articles for the New York Times, the Washington Post, Forbes Magazine, and the Los Angeles Times, and is the co-author of The Art of Debugging (No Starch Press).

Features & Highlights

  • R is the world's most popular language for developing statistical software: Archaeologists use it to track the spread of ancient civilizations, drug companies use it to discover which medications are safe and effective, and actuaries use it to assess financial risks and keep economies running smoothly.
  • The Art of R Programming takes you on a guided tour of software development with R, from basic types and data structures to advanced topics like closures, recursion, and anonymous functions. No statistical knowledge is required, and your programming skills can range from hobbyist to pro.
  • Along the way, you'll learn about functional and object-oriented programming, running mathematical simulations, and rearranging complex data into simpler, more useful formats. You'll also learn to:
      • Create artful graphs to visualize complex data sets and functions
      • Write more efficient code using parallel R and vectorization
      • Interface R with C/C++ and Python for increased speed or functionality
      • Find new packages for text analysis, image manipulation, and thousands more
      • Squash annoying bugs with advanced debugging techniques
  • Whether you're designing aircraft, forecasting the weather, or you just need to tame your data, The Art of R Programming is your guide to harnessing the power of statistical computing.

Customer Reviews

Rating Breakdown

★★★★★ 60% (197)
★★★★ 25% (82)
★★★ 15% (49)
★★ 7% (23)
★ -7% (-22)

Most Helpful Reviews

✓ Verified Purchase

Excellent guide to the R language

There are hundreds of R books, but this is the best one to address the core problem of learning to *program* in R. As reviewer Jason notes, R is used by several audiences with varying needs, but anyone who uses R for long must come to terms with learning to program it. This is the book for that.

What Matloff does is to lay out the essentials of the R language (or S, if you prefer) in depth but in a readable fashion, with well-chosen examples that reinforce learning about the language itself (as opposed to focusing on statistics or data analysis).

I'm a long-time (12 years) R user, which is my platform for analytics every day, and I have programmed in a variety of languages from C to Perl. I have long missed the fact that there is nothing for R comparable to Kernighan & Ritchie ("K&R", The C Programming Language) or similar programming classics; finally there is. Matloff is not quite as beautiful and elegant as K&R (and to be fair, is not in their position as the language creator) but this book has similar goals and comes reasonably close.

I think there are two primary audiences for this book: those who are learning R from a computer science or programming background; and statisticians and others who use the programming language and want a thorough exposition. In my case, for instance, despite having written perhaps 100k lines of R code over the years, there remained areas where I was uneasy (e.g., exactly how do lists relate to data frames). Matloff sets it all straight, in friendly, readable fashion. Even in rudimentary chapters, I learned shortcuts and miscellaneous functions that are quite useful. The examples throughout are more "CS-like" than statistical, which is highly advantageous for this topic.

In addition to the tutorial content, it is well-suited as a quick reference. It doesn't aim to be comprehensive from a function point of view (which is almost impossible, and what R Help is for), but it is comprehensive from a programming conceptual point of view.

In short, if you program R, and unless you're a member of R-Core, then I believe you'll enjoy this, will learn something, and will refer back to it repeatedly.
255 people found this helpful
✓ Verified Purchase

Valuable addition to R bookshelf

Jason's juxtaposition of "data analysts" and "serious R programmers" strikes me as a little unfair, but I see what he means. Consider yourself a "serious R programmer" (SRP), and buy this book, if you are interested in the following aspects of R:

Variable scope - Chapter 7
User-defined classes - Ch 9
Debugging - Ch 13
Profiling and performance (mostly, vectorization) - Ch 14
Interfacing with C/C++ and Python - Ch 15
Parallel computation ("pure R" approach using "snow" package, and C++-aided approach using "OpenMP" library) - Ch 16

I have not seen the material of Chapters 15-16 in any other R reference; the other topics have shown up elsewhere - in "R in a Nutshell", for example - but get more attention here. The chapters would have been much shorter if written in a "Nutshell" style; however, I do not automatically consider a verbose, user-friendly writing style a negative.

The early chapters introduce R in a way similar to other books - except for (a) eschewing discussion of the language's statistical repertoire, which makes sense given the "programming" focus, and (b) showing a greater interest in the "matrix" class - and although they do it quite nicely (this said, let me ask the author to reconsider his "extended examples"), I would not recommend "Art of R Programming" to non-SRPs, and would point them to Robert Kabacoff's "R in Action" or (the E-Z version) Paul Teetor's "R Cookbook" instead.

Overall, while the book did not quite click for me - I am a "data analyst" and at present do not have much "need for speed" (cf. C/C++); on the other hand, I would like a firmer grasp on R's OOP, but here, "Art of R Programming" only whets one's appetite - I cannot deny its quality and unique value for budding SRPs. If there was any wavering between four and five stars on my part, the appreciation of how pretty and inexpensive the book is tipped the scales.
66 people found this helpful
✓ Verified Purchase

OK but somewhat disorganized

This book's main strength is also its greatest weakness: it tries to be too much of everything to everyone. The author obviously is a great R programmer (as he will demonstrate way too much), having a master's degree in CS and teaching R at college. However, he is often too clever by half, adding irrelevant examples of overly complex and somewhat convoluted code. I think he is doing this more out of love for the language than to show off, but the effect is the same: much of the book comes off as disorganized and too complex for the beginner/intermediate R user to be helpful given the topic discussed. I will say that anybody who buys this book will find something about it to like, so it is a useful addition to any R library.

Reiterating the main theme, the book is very desultory, especially when compared to a great book like "R Tutorial and Exercise Solution" by Chi Yau, which is organized properly. In the first few chapters of The Art of R Programming, the author will lay out and explain some basic concepts and code examples, then on the next page he is showing how to manipulate various data frames with 12-20 lines of complex code. I'm not sure what audience is reading introductory chapters and would find this abstruse and erudite code useful at all given the basic chapter concepts. Also, the chapter layout itself seems odd, as salient and trivial topics get uneven treatment relative to their importance in the real world. As an engineer and a holder of a CS degree myself, it isn't as if the code is too complex per se; it's just too complex and superfluous given the topic discussed.

The author would have been much better served saving the fancy coding to advanced topics in which it would have been more relevant later in the book.
57 people found this helpful
✓ Verified Purchase

A Programmer's Introduction to R

The uniformly good reviews for "The Art Of R Programming" led me to read it, and I'm glad I did. I've used R casually for years as a sort of "secret weapon" to quickly analyze a few million data points, graph them, and draw useful conclusions, all before someone could load the data into a SQL database. I've long believed that R is a clean, well-designed language for data analysis that was missing a good introductory text for programmers. R's type system, lexical structure, run time mechanics, and functional nature make it one of the best designed languages around, but this also seems to be one of the best kept secrets in the software community. Until I read "The Art of R Programming" I'd never come across material on R that introduced R as a programming language. Most of what I saw presented it as a statistical toolbox that you could, almost accidentally, program.

However, be warned that the book is not rigorous, either as an introduction or a reference. It is concise, easy to read, and much is driven by case studies to show you how to do things. But it often left me uneasy as a software engineer. For example, it states that R uses "lazy evaluation" when a more accurate statement would be that it simply evaluates function arguments lazily. The description of the run time object environment is clunky: evaluation contexts, closures, and recursion are treated separately. It does not entirely explain how symbol look up works for functions (you won't learn why "sum
55 people found this helpful
✓ Verified Purchase

Great at times, terrible at others

I came to this book knowing next to nothing about R. I'm an experienced programmer, but my knowledge of statistics is not as deep as it should be, and rusty.

The book does a great job at times of explaining how the various R functions work, as well as concepts such as "vectorized" functions. A bit of code is shown, and then there is a lot of explanation that describes what it does, and why. Sometimes, the phrasing could use improvement, and I found myself perhaps struggling to master a concept longer than I should have, but it was enough to get the job done.

Then I got about a quarter of the way through the book and hit an extended example of applying logistic regression. First, the code included a tilde operator, which had not been mentioned anywhere in the book before that. Next, it called a function, glm, without explaining what it does, and it showed the results, and said, "Sure enough, we get a 2-by-8 matrix, with the jth column given the pair of estimated B[i] values obtained when we do a logistic regression using the jth explanatory variable."

In effect, the book suddenly shifted from an explain-it-all-as-we-go text to a we-assume-you-know-statistics-as-well-as-exotic-R-operators-and-functions text. I am completely unable to understand this example until and unless I dig into both the related concepts in statistics, and the R-related syntax. I can't blame the book too much for my lack of knowledge in statistics, but I can say that it was careful to provide explanations on some much simpler statistical concepts earlier. As far as the R syntax, I don't think there is any excuse for that. It also turns out that the caret operator in this context is not at all what a programmer would expect it to be--no coverage of that either.

Somewhat later was a very long example on a Discrete Event Simulator. Here, as in so many other places, the author likes cryptic variable names such as rw, evntty, inspt and appin. If you were to study the code long enough, you would eventually understand what all of these meant. But it's sloppy and irritating and makes the job of understanding the code much harder.

Not long after this, he makes a comment on recursion that made me burst out laughing:

"It's fairly abstract. I knew that the graduate student [who had asked him for advice on writing a function], as a fine mathematician, would take to recursion like a fish to water.... But many programmers find it tough."

What I, a mere dim-as-a-20-watt-bulb programmer, find tough, is a plethora of cryptic variable names. Recursion, not so much. I followed his example with ease. Maybe if I were a math graduate student I could understand those variables!

I've also been disappointed with how little attention the book gives to the fundamental differences between some of R's "families" of functions, such as apply, lapply, sapply, and tapply, or lm and glm. There is a brief hand-waving comment and then off we go. This is unfortunate, especially since, in my view, the built-in R help is often impenetrable and written more as a technical spec than a clear explanation.

I have pushed on to subsequent chapters, and learned more from the book. But be forewarned that it has a tendency to shift suddenly and without warning from a from-the-ground-up perspective to a we're-all-experienced-R-users perspective.

One other comment, as others have noted here, the publisher really should have included data files so that readers could play along with the examples.
50 people found this helpful
✓ Verified Purchase

Good from cover to cover

I'm always very wary of books about programming that have titles in the form "The Art of ... Programming", but this book is good despite the title. Matloff is a clear and thoughtful writer who takes the reader through their first steps with R (which has a syntax that requires learning, as it is nothing like other languages that a regular programmer would have encountered).

I did find, however, the comparisons with C programming annoying in the first part of the book. The author continuously goes on about "if you're a C programmer" and then some comparison to C. I didn't find this helpful (and I am a C programmer) and I think it could have been safely left out. A good example of this is on page 12, where it says "Matrices are indexed using double subscripting, much as in C/C++, although subscripts start from 1 instead of 0." So pretty much not like C/C++. That's a good example of how the C interludes don't help the new reader.

Just occasionally the author gets ahead of himself. Early on in the book he introduces matrices and on page 28 does a matrix addition in the form m + 10:13. He hasn't explained how that addition is going to work.

However, these complaints are pretty minor. The book does a good job of taking you from knowing nothing about R to working with complex programs and data. The chapter on S3 and S4 classes is particularly welcome, but I think it could have been more in depth and earlier in the book. They are an important topic.

Overall this is a very good book to learn R from and has enough depth that the experienced R user will find useful things in the later chapters.
33 people found this helpful
✓ Verified Purchase

Little more than a (basic) reference

After reading this book from cover to cover, I wasn't sure who the intended audience was for this book. Perhaps it tries to pander to "everyone", which probably explains why it fails in both breadth and depth.

For an R beginner with little programming experience, this book CAN serve as a compilation of many relevant topics. However, I believe you will soon move on to other references once you start to code seriously. R in a Nutshell, for example, will have much more longevity as part of your R references.

If you are looking to pick up R with a coding background, this book does not do R justice in explaining its functional nature or any of the more nuanced aspects of WHY R works the way it does. John Chambers's "Software for Data Analysis" and Hadley Wickham's upcoming book (major parts of which are free on GitHub) are much better suited for that.

If you are an intermediate R user, this book will not advance you to the next stage. Many of the functions written feel "hacked together" rather than carefully optimized. In fact, many times the author even suggests that there is perhaps faster or more efficient code than the code he provided, which raises the question "why should I bother learning this code, then?"

Aside from these, there are also some major issues:
(1) Organization (lack of): As other reviewers have mentioned, this book is not well-organized. If you are truly an R beginner, reading it from start to finish will be impossible (since it skips around so much). You will find yourself either skipping back and forth or googling, or both.
(2) Depth (lack of): I found myself asking "why?" way too many times when reading this book (basically several times every page), and had to dig deeper to find out for myself.
(3) Inculcates some pretty bad R programming habits, such as using 'T' and 'F' instead of TRUE and FALSE, very confusing naming conventions, etc.
(4) Price: for something that is no more than a very basic reference with little insight into the nature of R programming, I think this book is over-priced.

Reviewer's background: Statistics PhD student, frequent R user
20 people found this helpful
✓ Verified Purchase

... due to the high reviews and boy am I disappointed. I'm a grad student in computer engineering at ...

I bought this book due to the high reviews and boy am I disappointed. I'm a grad student in computer engineering at a top program so I'm by no means dumb. But this book is a horrible book if you are new to R. For one thing, the code examples it gives are overly complicated, with poor explanations, which prevents one from actually learning R. Some simple functions are given very few examples, which leaves a lot of ambiguity about how to use them. I had to figure out a lot of stuff myself by experimenting with R to learn exactly how some details work.

But perhaps the worst part is that the author will use concepts in examples that aren't covered until a few chapters later. That makes the code even harder to understand. You come across some new notation that you've never seen before (with no explanation) and you think you missed it earlier on, when in fact it's not your fault at all. That is just poor organization, unacceptable for a programmer. Also, a lot of the code the author gives is not optimized (and the author even admits this a lot of the time), so you spend a lot of time trying to figure out how a program works when you haven't even mastered how all the basic functions work.

I found myself learning way more R watching YouTube videos than from this book, with far less pain. Maybe I will come back to some of the skipped portions of this book after I'm more experienced with R, but for now it's worthless for an R beginner.
17 people found this helpful
✓ Verified Purchase

The BEST book on R Programming out there.

Anyone seeking to learn R faces two major challenges: (1) learning how to swim in the sea of information (R packages, books, websites, blog posts, message boards, etc.) that threatens to drown a newbie, and (2) coming to grips with the structure, syntax and features of the language itself. Having some idea of what one wants to do with R is clearly an important first step that will set the path of learning. R, an open source computer language, is the premier software system for statistical computing. Not only can any statistical idea be expressed in R, it is likely that someone in the open source community has already written a function to accomplish or at least facilitate any statistical analysis a working statistician or data scientist might be contemplating.

R functions are organized into libraries or packages that usually relate to some particular statistical task. Assuming something like an average of 20 functions per package, the 3400 available contributed packages[1] offer over 68,000 routines to read in data, manipulate it, analyze it, and visualize the results. No one could possibly become familiar with all of these. But, because R is an interpreted (instant feedback) language that encourages experimentation, some serious, sophisticated statistical analyses can be accomplished by stringing together the appropriate functions into a script. If one's interest in R is only to perform some particular analysis, then a beginner’s best bet might be to select one of the 100 or so books or blogs on doing statistics with R that provide relevant sample code, and cut and paste to get a workable script. There is no shame in this. That is why all the open source authors went to the trouble of packaging up their work.

However, if a person really wants to be able to speak the R language and become a competent R programmer then, at the present time, one can find no better guide than Norman Matloff’s The Art of R Programming. Professor Matloff is a statistician and a computer scientist with a considerable amount of teaching experience. His book is no mere programming reference guide. It is a carefully crafted sequence of lessons that start at the beginning and work up to some fairly advanced topics including a lucid account of object-oriented programming in R, a presentation of the rudiments of TCP/IP operations and a discussion of R programming for the internet, examples of parallel programming with R, and a discussion spanning several chapters of how to write production-level R code that includes methods and advice on debugging R code, writing efficient R code, and interfacing R with other languages. Other distinguishing features of the book are brief examples showcasing a large number of functions (including rare gems such as D() for symbolic differentiation) that indicate the power and scope of R, and over thirty “Extended Examples”, each of which is a credible study in writing careful, professional code. The most captivating aspect of the book, however, is Matloff’s thoughtful manner of exposition. R’s rich, compact syntax can be challenging the first time around. Matloff knows where the difficulties are. His presentations of R’s various features and functions begin from a point of view that anticipates obstacles that are likely to confound someone going down the R path for the first time and guides the novice around them. I expect that The Art of R Programming will appeal to a diverse audience of aspiring R programmers.
16 people found this helpful
✓ Verified Purchase

Sloppy

The author provides a decent enough basic overview of commonly used R features and does elaborate on some of the internals and best-practices to create efficient code, but I have a particular peeve against including example code that does not work.

How hard could it be to actually try to run the code before publishing, just to see if it functions without errors? This includes not only the printed examples, but also Matloff's downloadable code.
14 people found this helpful