Chapter 2. MapReduce
MapReduce is a programming model for data processing. The model is simple, yet not too simple to express useful programs in. Hadoop can run MapReduce programs written in various languages; in this chapter, we look at the same program expressed in Java, Ruby and Python. Most importantly, MapReduce programs are inherently parallel, thus putting very large-scale data analysis into the hands of anyone with enough machines at their disposal. MapReduce comes into its own for large datasets, so let’s start by looking at one.