Friday, January 4, 2013

MapReduce implementation with Hadoop

Hadoop is an open source framework that supports distributed big data applications. One of its main features is an implementation of the MapReduce algorithm, and it exposes an API for plugging in your own map and reduce logic. To run your first Hadoop job you need to implement the generic Mapper and Reducer interfaces, shown in the classic word count example below: the mapper turns each input line into (word, 1) pairs, and the reducer sums the counts per word.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class Map extends MapReduceBase implements
  Mapper<LongWritable, Text, Text, IntWritable> {

 // Reused across calls to avoid allocating new objects per record.
 private final IntWritable one = new IntWritable(1);
 private Text word = new Text();

 public void map(LongWritable key, Text value,
   OutputCollector<Text, IntWritable> output, Reporter reporter)
   throws IOException {

  // Split the line on whitespace and emit (word, 1) for each token.
  String line = value.toString();
  StringTokenizer tokenizer = new StringTokenizer(line);
  while (tokenizer.hasMoreTokens()) {
   word.set(tokenizer.nextToken());
   output.collect(word, one);
  }
 }
}
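
For a single input line the mapper emits one (word, 1) pair per token; the byte-offset key supplied by the framework is ignored. For example:

input record:  (0, "to be or not to be")
emitted pairs: (to, 1), (be, 1), (or, 1), (not, 1), (to, 1), (be, 1)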
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class Reduce extends MapReduceBase implements
  Reducer<Text, IntWritable, Text, IntWritable> {

 public void reduce(Text key, Iterator<IntWritable> values,
   OutputCollector<Text, IntWritable> output, Reporter reporter)
   throws IOException {

  // Sum all partial counts collected for this word.
  int sum = 0;
  while (values.hasNext()) {
   sum += values.next().get();
  }

  output.collect(key, new IntWritable(sum));
 }
}
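
Between the two phases the framework sorts and groups the mapper's output by key, so each reducer call sees a word together with an iterator over all of its counts. Continuing the example above:

grouped input: (be, [1, 1]), (not, [1]), (or, [1]), (to, [1, 1])
final output:  (be, 2), (not, 1), (or, 1), (to, 2)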
The last thing you need to do is wire them together with a job configuration:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class Job {

 public static void main(String[] args) throws Exception {

  JobConf conf = new JobConf(Job.class);
  conf.setJobName("Hadoop-Workshop-GKOLPU");

  // Types of the key/value pairs produced by the reducer.
  conf.setOutputKeyClass(Text.class);
  conf.setOutputValueClass(IntWritable.class);

  conf.setMapperClass(Map.class);
  conf.setCombinerClass(Reduce.class);
  conf.setReducerClass(Reduce.class);

  conf.setInputFormat(TextInputFormat.class);
  conf.setOutputFormat(TextOutputFormat.class);

  // Input and output paths come from the command line.
  FileInputFormat.setInputPaths(conf, new Path(args[0]));
  FileOutputFormat.setOutputPath(conf, new Path(args[1]));

  JobClient.runJob(conf);
 }
}
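
Note that Reduce is registered as the combiner as well: because summing is associative and commutative, the same class can safely pre-aggregate counts on the map side before the shuffle, cutting down network traffic. Once the three classes are compiled and packaged (the jar name below is just an assumption), the job can be submitted with the standard hadoop jar command:

hadoop jar wordcount.jar Job /path/to/input /path/to/output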
