http://www.greenplum.com/blog/topics/hadoop/how-hadoop-mapreduce-can-transform-how-you-build-top-ten-lists
"To find a top ten list with only one MapReduce job, we’re going to set up a tournament in our Hadoop cluster. The tournament is pretty simple:
1. Each mapper finds its local top ten list and sends that list to the reducer.
2. The reducer finds the top ten from the finalists sent from the mappers."
"Running a job like this to compute your top ten list will be far more efficient than the alternative of sorting the data first, then taking the top ten. The amount of data being moved over the network to the reducer in the top ten job here is minuscule in comparison to all of the data in a sort."
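The tournament in the quote can be sketched in plain Python. This is only a local simulation under stated assumptions, not Hadoop code: the names `map_top_n`, `reduce_top_n`, and `top_n`, and the `(key, score)` record shape, are hypothetical and do not come from the original article.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical record shape: (key, score). The article does not specify one.
Record = Tuple[str, int]


def map_top_n(split: Iterable[Record], n: int = 10) -> List[Record]:
    """Mapper side: keep only the local top n records of one input split."""
    return heapq.nlargest(n, split, key=lambda record: record[1])


def reduce_top_n(finalists: Iterable[List[Record]], n: int = 10) -> List[Record]:
    """Reducer side: pick the global top n from the mappers' finalists."""
    merged = [record for local_top in finalists for record in local_top]
    return heapq.nlargest(n, merged, key=lambda record: record[1])


def top_n(splits: Iterable[Iterable[Record]], n: int = 10) -> List[Record]:
    """Simulate the single job: one mapper per split, then one reducer."""
    return reduce_top_n([map_top_n(split, n) for split in splits], n)
```

With M mappers, the reducer in this sketch sees at most 10 × M records regardless of input size, which is the reason the quoted passage gives for the job moving far less data over the network than a full sort.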
It is an old article, but it still offers plenty of tricks for processing data with a MapReduce mindset.