What Exactly is Hadoop & MapReduce?

In a nutshell, Hadoop is an open source framework that enables the distributed processing of large amounts of data over multiple servers. In effect it is a distributed file system tailored to the storage needs of big data analysis. In lieu of holding all of the data required on one big expensive machine, Hadoop offers a scalable solution of incorporating more drives and data sources as the need arises.

Having the storage capacity for big data analyses in place is instrumental, but equally important is having the means to process data from the distributed sources. This is where Map Reduce comes into play.

Map Reduce is a programming model introduced by Google for processing and generating large data sets on clusters of computers. This video from IBM Analytics does an excellent job of presenting a clear concise description of what Map Reduce accomplishes.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s