[hadoop][mapreduce]Hadoop MapReduce Can Transform How You Build Top-Ten Lists

Thursday, May 23, 2013

Hadoop MapReduce Can Transform How You Build Top-Ten Lists | Datastream
http://www.greenplum.com/blog/topics/hadoop/how-hadoop-mapreduce-can-transform-how-you-build-top-ten-lists

"To find a top ten list with only one MapReduce job, we’re going to set up a tournament in our Hadoop cluster. The tournament is pretty simple:

1. Each mapper finds its local top ten list and sends that list to the reducer.

2. The reducer finds the top ten from the finalists sent from the mappers."
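The two tournament steps quoted above can be sketched in plain Python. This is only a local simulation under my own assumptions, not the article's Hadoop code: the function names (`mapper_top_n`, `reducer_top_n`, `run_tournament`) and the `(key, score)` record shape are made up for illustration, and each "mapper" is just a list slice.

```python
import heapq
import random

TOP_N = 10  # size of the list we want; the article uses ten


def mapper_top_n(records, n=TOP_N):
    """Step 1: a mapper keeps only its local top n records by score."""
    # records are hypothetical (key, score) tuples
    return heapq.nlargest(n, records, key=lambda r: r[1])


def reducer_top_n(finalists_per_mapper, n=TOP_N):
    """Step 2: the single reducer picks the top n among all finalists."""
    all_finalists = [r for finalists in finalists_per_mapper for r in finalists]
    return heapq.nlargest(n, all_finalists, key=lambda r: r[1])


def run_tournament(splits, n=TOP_N):
    """Simulate one job: every input split goes to one mapper, then one reducer."""
    return reducer_top_n([mapper_top_n(split, n) for split in splits], n)


if __name__ == "__main__":
    rng = random.Random(0)
    records = [(f"user{i}", rng.randint(0, 1_000_000)) for i in range(10_000)]
    splits = [records[i::4] for i in range(4)]  # pretend there are 4 mappers
    for key, score in run_tournament(splits):
        print(key, score)
```

The result is correct because any record in the global top ten must also be in the top ten of whichever split it lives in, so no mapper can discard a true winner.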

"Running a job like this to compute your top ten list will be far more efficient than the alternative of sorting the data first, then taking the top ten. The amount of data being moved over the network to the reducer in the top ten job here is minuscule in comparison to all of the data in a sort."
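To get a feel for the "minuscule" claim, here is a rough back-of-the-envelope sketch. The function name `shuffled_records` and the cluster numbers in the usage note are hypothetical, and it counts records only, ignoring record size and combiner or compression effects.

```python
def shuffled_records(total_records, num_mappers, top_n=10):
    """Return (tournament, full_sort) record counts sent to the reduce side.

    tournament: upper bound, each mapper sends at most top_n records.
    full_sort:  every input record crosses the shuffle.
    """
    tournament = min(total_records, num_mappers * top_n)
    full_sort = total_records
    return tournament, full_sort


if __name__ == "__main__":
    tournament, full_sort = shuffled_records(1_000_000_000, 1_000)
    print(tournament, full_sort, full_sort // tournament)
```

With an assumed one billion input records and 1,000 mappers, the tournament ships at most 10,000 records to the reducer, while a full sort ships all one billion.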

It is an old article, but it still contains plenty of tricks for processing data the MapReduce way.
