Customer Reviews


30 Reviews
5 star: (15)
4 star: (6)
3 star: (4)
2 star: (1)
1 star: (4)
 
 
 
 
 
 
 

The most helpful favorable review

7 of 8 people found the following review helpful
4.0 out of 5 stars Solid high-level intro
I bought this book for a project at work, to prototype a log analysis system using Hadoop. I haven't bought very many technical books in the last few years, but the quality of most online documentation for Hadoop is poor and books seemed like a better option. This book is considered the "bible" for Hadoop. It was useful, and I kept it open on my desk for quite a while...
Published 4 months ago by Kenneth Cognoscente

versus

The most helpful critical review
74 of 85 people found the following review helpful
1.0 out of 5 stars Useless as a Tutorial
I bought this book as a very experienced programmer but no prior experience with Hadoop, which I need to come up to speed on for a new project. I am extremely disappointed in the book and feel I wasted my money. If there's one thing you want from a book on a new technology, it's the ability to get a basic "Hello World" equivalent program running, from which you can then...
Published 9 months ago by Frustrated Hadoop Learner



74 of 85 people found the following review helpful
1.0 out of 5 stars Useless as a Tutorial, September 19, 2012
Amazon Verified Purchase
This review is from: Hadoop: The Definitive Guide (Paperback)
I bought this book as a very experienced programmer with no prior experience with Hadoop, which I need to come up to speed on for a new project. I am extremely disappointed in the book and feel I wasted my money. If there's one thing you want from a book on a new technology, it's the ability to get a basic "Hello World" equivalent program running, from which you can then start iterating. This book completely falls down on this most basic requirement - when you get to the very first example program in the book, it tells you that you need to first compile a bunch of example code from the book's website. That shouldn't be required, but ok, whatever. Then when you go to the book's website, you are told that you first need to install a bunch of extra stuff covered later in the book before you can compile the libraries apparently needed to get anything at all to run. This really makes no sense at all - there's no way I should have to read all the later chapters to figure out what these things are in order to get my very first example program running. Tossed it into the trash and went off in search of a resource done by someone who understands how to structure a tutorial properly.


30 of 34 people found the following review helpful
3.0 out of 5 stars Fell short of my expectations. Source of much frustration., August 25, 2012
Amazon Verified Purchase
This review is from: Hadoop: The Definitive Guide (Paperback)
I had read all the positive reviews and really had high hopes for the book, waited for the 3rd edition thinking it would be current, but I've mainly felt frustration in reading it once past the first few chapters.

References to the Bible in other reviews are apt. The book is a mishmash of chapters with a wide variety of styles and intents. The writing giving the overview is great. But other chapters are a reference manual dump with little motivation. Other chapters tried to be guided tutorials, but lacked important details (or were outdated by changes). Wish it could have been written with a clearer editorial point of view, or better organized in sections with similar purposes.

Keeping up with such a fast-moving project with a paperback book is no doubt a difficult task. I didn't feel the book did a good job of dealing with the changes that happened with the shift to 1.x.

Most frustrating were the mentions of the "book's website" as a source of up-to-date information. Which website? (hadoopbook.com, oreilly.com, github.com). Wouldn't it make sense to use a URL instead of the phrase "book's website?"

Minor complaint: I don't like the code listings without filenames.

Expect to spend a lot of time looking for stuff on the web that should have been included in the book or at least documented with concrete URLs.

There are certainly examples of truly fine technical writing in the book. Just wish that level could have been maintained throughout the book.


7 of 7 people found the following review helpful
3.0 out of 5 stars Good general guide, poor if looking for detail, poor if looking for Hadoop 2.0 information, January 6, 2013
By Al (California)
Amazon Verified Purchase
This review is from: Hadoop: The Definitive Guide (Paperback)
If you're looking to learn about what Hadoop is, all of the buzzwords/terms you've heard about (e.g., HDFS, MapReduce), and get an overview of software in the Hadoop ecosystem (Pig, Hive, etc.), this is a good book that will give you a good overview and pointers in the right direction.

However, the book isn't going to give you a lot of detail on programming MapReduce and things like that.

In other words, it's a good breadth book, not a good depth book. So YMWV depending on what you're looking for.

I bought the previous edition of this book and gave it 4 stars. I bought this newer edition looking for information about Hadoop 2.0, YARN, and all of the new stuff coming out. It provided a little bit of information about this, but overall was lacking in these details. So I notched it down 1 star because of that. It was just too much duplicate information from the prior edition.


19 of 24 people found the following review helpful
1.0 out of 5 stars poorly organized and hard to get examples working, October 12, 2012
Amazon Verified Purchase
This review is from: Hadoop: The Definitive Guide (Paperback)
I purchased this book a few months ago based on many earlier 5-star reviews. I had high hopes that it would be as good as those reviewers claimed. However, the book is actually unbelievably poorly organized - essentially written in a spaghetti fashion. Yes - it contains a lot of information about Hadoop, but with three basic issues: 1) examples are trivial and hard to get working due to insufficient, unclear or no procedures; 2) many subjects (e.g. streaming) are spread over several chapters and readers have to stitch them together after reading all relevant chapters; and 3) many statements are either inaccurate or lack supporting data. Ironically, one has to apply MapReduce to all the subjects in order to sort them into a more logical order. I look forward to the 4th edition with significant quality improvement.


7 of 8 people found the following review helpful
4.0 out of 5 stars Solid high-level intro, February 11, 2013
Amazon Verified Purchase
This review is from: Hadoop: The Definitive Guide (Paperback)
I bought this book for a project at work, to prototype a log analysis system using Hadoop. I haven't bought very many technical books in the last few years, but the quality of most online documentation for Hadoop is poor and books seemed like a better option. This book is considered the "bible" for Hadoop. It was useful, and I kept it open on my desk for quite a while as I worked to get the infrastructure set up. Consider it a high-level intro to lots of different Hadoop topics, and you'll be happy with it. Just don't expect it to answer all of your questions. You'll probably still end up doing a lot of digging through other online sources, because the Hadoop ecosystem is large and complicated, and no book can really cover all of it. Besides this book, I also bought Hadoop In Action (not quite as big as this book, but a useful counter-point) and Data Intensive Text Processing With MapReduce (which gave me a good intro to the Map Reduce algorithm, but wasn't that useful once I had a general idea what was going on).


34 of 48 people found the following review helpful
1.0 out of 5 stars Too Many Foundational Inaccuracies, September 4, 2012
This review is from: Hadoop: The Definitive Guide (Paperback)
I read the book with attention mainly to Hadoop's underlying premises and platform architecture, and note that this review focuses on the book itself, not the subject of Hadoop in general.

Firstly, I agree with the reviewer noting the book's a "mishmash". It's rather unorganized and thus presented poorly in that it delivers a series of ad-hoc "how tos". After three editions, this should have been remedied.

But what I feel is the largest shortcoming is that, while the author certainly seems to demonstrate deep knowledge of Hadoop and its related projects, he makes numerous assertions about underlying platform concepts that are either unsubstantiated or completely incorrect. Given the complexities and efforts expected of large-scale, distributed systems, this is a critical weakness.

For example, page 3 under "Data Storage and Analytics" (and available under "Look Inside") illustrates a naïve and incorrect understanding of disk performance; research "understanding IOPS" to understand why this is. Ironically, actual and not theoretical performance would likely be worse than what he outlines so had he provided perhaps just a tad more accuracy, he would not only have maintained credibility, but also in turn made a stronger case for the limitations of disk I/O (albeit rotating in this context). This is not to split hairs since, and by his own statement, the focal point of Hadoop is mitigating mass storage and processing scalability bottlenecks, and Hadoop is the focal point of the book. Foundational knowledge, such as how to measure disk performance, in the problem itself is expected.

His knowledge of RAID concepts is also demonstrably quite lacking, and various RAID levels have to date been the standard mechanism to speed up disk I/O and mitigate the consequences of disk failure. HDFS has its own counterpart to RAID so a definitive guide to Hadoop must provide a definitive understanding of RAID. Again, this is squarely within the scope of the book so to expect the author to understand the topic is not unreasonable, but unfortunately here too his credibility suffers.

Page 3 also describes "how RAID works", but even that statement is inherently inaccurate. In practice, "RAID" itself isn't an absolute term and must be accompanied by a level, and certain levels serve completely different purposes (research "RAID levels"); his comment would be accurate rephrased as "how RAID 1 (mirroring) works". Later, in chapter 9, he does in fact refer to RAID 0, but then states that RAID 0 "is" (as opposed to "may be") slower than JBOD with HDFS. Regardless of whether that could be the case or not, it's presented as fact and, inexplicably, he offers a hyperlink to an email outlining a brief, one-off experiment as "proof". This is far from scientific or objective; to extrapolate a single cause from such an "experiment" is tantamount to junk science. The authors of the experiment's results themselves didn't even offer it as conclusive.

He also makes careless logical and mathematical generalizations, like in the following statement: "[i]n JBOD, disk operations are independent, so the average speed of operations is greater than that of the slowest disk.". That is not a true statement because if all of the disks are the same speed (however that's measured...) then the mean speed and each disk's speed would be equal. Furthermore, "[d]isk performance often shows considerable variation in practice, even for disks of the same model.". Period. End of story. No evidence, no citations, not even a logical proof. Nothing. A completely subjective and baseless assertion that the reader is expected to simply accept. This pattern unfortunately permeates the entire text.
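
The equal-speed counterexample to the quoted JBOD statement can be checked with a few lines of arithmetic. This is only a minimal sketch: the throughput figures (in MB/s) are hypothetical values chosen for illustration, not measurements and not numbers from the book, and the function name is made up here.

```python
from statistics import mean


def average_exceeds_slowest(disk_speeds_mb_s):
    """Return True if the mean disk speed is strictly greater than the slowest disk."""
    return mean(disk_speeds_mb_s) > min(disk_speeds_mb_s)


# Hypothetical throughput figures in MB/s (assumed, not measured).
identical_disks = [100, 100, 100]
mixed_disks = [80, 100, 120]

if __name__ == "__main__":
    print("identical disks:", average_exceeds_slowest(identical_disks))
    print("mixed disks:", average_exceeds_slowest(mixed_disks))
```

The strict "greater than" only holds when at least one disk is faster than the slowest one; with identical disks the mean and the minimum coincide.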

His recommendation of JBOD, however, applies only to a certain class of Hadoop servers and for another he does in fact recommend RAID. Whether that reflects general consensus, I don't know, but after claiming that JBOD under HDFS outperforms RAID 0, he adds that JBOD is superior also because "if a disk fails in a JBOD configuration, HDFS can continue to operate without the failed disk, whereas with RAID, failure of a single disk causes the whole array (and hence the node) to become unavailable.". I'm sure that gave a chuckle to those who possess even the most basic understanding of RAID levels and level nesting. And besides the proposition being simply false at face value, it also logically contradicts his suggestion a few paragraphs prior that RAID should be used, albeit for a certain server role, but used nevertheless. Whether he's gaming his RAID explanations to suit a particular purpose or he's playing fast-and-loose with terms he doesn't understand is unclear, but what is clear is that his information is unreliable.

Another example includes asserting a SAN impacts data center bandwidth. With virtually no exceptions, SANs are over dedicated fiber channels, not "the network", and thus "network bandwidth" potentially being a bottleneck, as he describes, is completely inapplicable.

He refers to a "1 GB" switch in several places and we're left to assume it's actually "1 Gbps". Similarly, references to "rational" rather than "relational" databases appear repeatedly early in the book. Misprints or not, they further erode credibility.

"Linear scalability" through parallel processing is a repeated reference, but at any scale--from multicore to thousand-node grids--engineers know that Amdahl's Law proves this is simply not possible. "Less non-linear" or a similar description would be accurate and not mislead the reader to believe doubling compute doubles speedup.

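The Amdahl's Law point can be made concrete with a short sketch of the standard formula, speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the work and n is the number of workers. The value p = 0.95 below is an assumed, illustrative figure, not one taken from the book or from any Hadoop workload, and the function name is made up here.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Theoretical speedup under Amdahl's Law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    p = 0.95  # assumed parallel fraction, for illustration only
    for n in (1, 2, 4, 8, 1024):
        print(f"{n:>5} workers -> {amdahl_speedup(p, n):.2f}x speedup")
```

Whenever some fraction of the work stays serial (p below 1), the speedup is bounded by 1 / (1 - p), so each doubling of workers yields less than a doubling of speedup.
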
Ultimately, I'm disappointed in the extremely limited depth the author demonstrates in understanding distributed systems and even simple computing fundamentals. Perhaps these topics have been rushed and perhaps other flaws are attributable to the publisher, but they are so central to the subject that to speak to them at all requires speaking to them intelligently and scientifically. Because the author unfortunately indicates little of either, I cannot recommend this book and will instead seek credibility on the subject elsewhere.


8 of 11 people found the following review helpful
5.0 out of 5 stars The Bible, but not a Tutorial, December 11, 2012
By rICh (boulder, co)
Amazon Verified Purchase
This is the best reference out there regarding Hadoop, but do not mistake it for a tutorial -- it's not really meant to be read cover to cover. If you want that, I've heard good things about Chuck Lam's "Hadoop in Action".

Now that you know what it isn't, here's what this book is:
A comprehensive, "roll up your sleeves, here's some Java" deep dive into Hadoop. It covers the basics as well as advanced topics and a brief tour of the supporting projects (like Hive, Pig, etc.). No single book will do Hadoop justice, but this book is the best attempt so far. If you only have enough cheddar to buy a single book, this is the one you should own.


1 of 1 people found the following review helpful
2.0 out of 5 stars Not for beginners, June 18, 2013
Amazon Verified Purchase
This review is from: Hadoop: The Definitive Guide (Paperback)
I feel like this book is very hard to follow and not organized well. This book is better as a reference book than as a book to teach you about Hadoop. I had taken a Hadoop training course before reading this and I still had a difficult time following along. With that said, with the combination of the training course and this book, I was able to pass the Cloudera Certified Developer for Apache Hadoop (CCDH) certification exam. I gave this book 2 stars because I believe there were some key pieces of knowledge in this book that were asked about on the exam.


3.0 out of 5 stars Could be better organized, July 6, 2013
By Jay Urbain (Bayside, WI. USA)
Amazon Verified Purchase
The book has a lot of useful information; however, it is poorly organized.

I believe what's needed is a more task-based approach to presenting this material, i.e., step by step.

I found myself hunting around for information, not being able to find it, and just using Google.


1 of 2 people found the following review helpful
4.0 out of 5 stars Good Overview of a Complex Technology, May 12, 2013
By Scott McFarland (Manassas, VA United States)
Amazon Verified Purchase
This review is from: Hadoop: The Definitive Guide (Paperback)
This must be a tough book to write. It's an attempt to describe all components of an ecosystem, not to give a cookbook for a particular tool or framework. I think that some of the worse reviews on Amazon are by people who don't have the time and the disposition to pursue a real understanding of Hadoop.

Some sections of this are better than others, but on the whole the book is a solid resource that gives a reader a good high-level understanding of the technology. I am glad that I bought it, and I have a better understanding of Hadoop for having read it.



This product

Hadoop: The Definitive Guide by Tom White (Paperback - May 26, 2012)
$49.99 $27.89