Friday, January 4, 2013

Hadoop. MapReduce File OutputFormat.

Hadoop provides a rich API for customizing MapReduce jobs. We almost always want the results in some specific format. To get that, we just need to do two things: implement the OutputFormat interface (or extend one of its implementations) and supply a matching RecordWriter. See the code below.

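import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.util.Progressable;

public class EmailXmlOutputFormat extends FileOutputFormat<Text, Text> {

 @Override
 public RecordWriter<Text, Text> getRecordWriter(FileSystem ignored,
                   JobConf job, String name, Progressable progress)
                   throws IOException {
  // Resolve the task-specific output file and open it for writing
  Path file = FileOutputFormat.getTaskOutputPath(job, name);
  FileSystem fs = file.getFileSystem(job);
  FSDataOutputStream fileOut = fs.create(file, progress);
  return new EmailXmlRecordWriter(fileOut);
 }
}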
To complete the code, implement RecordWriter:

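import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.Reporter;

public class EmailXmlRecordWriter implements RecordWriter<Text, Text> {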
 private static final String utf8 = "UTF-8";

 private DataOutputStream out;

 public EmailXmlRecordWriter(DataOutputStream out) 
    throws IOException {
  this.out = out;
  out.writeBytes("\n");
 }

 public synchronized void write(Text key, Text value) 
    throws IOException {

  boolean nullKey = key == null;
  boolean nullValue = value == null;

  if (nullKey && nullValue) {
   return;
  }
 
  //TODO
  //put your code here 
  //write to out stream

 }

 public synchronized void close(Reporter reporter) 
    throws IOException {
  try {
   out.writeBytes("\n");
  } finally {
   out.close();
  }
 }
}
First, code the constructor and the close method. Then put your own logic into write. Done.
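As a rough illustration of what the write logic might look like, here is a minimal sketch that runs without Hadoop on the classpath. It is an assumption-laden stand-in, not the original implementation: the class name EmailXmlWriterSketch, the element names emails and email, and the from attribute are all hypothetical, and plain String is used in place of Hadoop's Text so the example stays self-contained.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for EmailXmlRecordWriter; uses String instead of Text.
class EmailXmlWriterSketch {
    private final DataOutputStream out;

    EmailXmlWriterSketch(DataOutputStream out) throws IOException {
        this.out = out;
        writeUtf8("<emails>\n"); // assumed root element, opened once per file
    }

    synchronized void write(String key, String value) throws IOException {
        // Same guard as in the post: skip records with no key and no value
        if (key == null && value == null) {
            return;
        }
        StringBuilder sb = new StringBuilder("<email");
        if (key != null) {
            sb.append(" from=\"").append(escape(key)).append('"');
        }
        sb.append('>');
        if (value != null) {
            sb.append(escape(value));
        }
        sb.append("</email>\n");
        writeUtf8(sb.toString());
    }

    synchronized void close() throws IOException {
        try {
            writeUtf8("</emails>\n"); // close the assumed root element
        } finally {
            out.close();
        }
    }

    private void writeUtf8(String s) throws IOException {
        out.write(s.getBytes(StandardCharsets.UTF_8));
    }

    // Escape the characters that are unsafe in XML text and attribute values
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }
}
```

In the real RecordWriter the same body would take Text arguments and call toString() on them; writing explicit UTF-8 bytes instead of writeBytes avoids mangling non-ASCII characters.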
