- Compile the MapReduce code and build a jar file
- Upload the jar to the remote server via FTP
- Connect to the server with an SSH client
- Prepare the input data in HDFS (usually only once)
- Submit the job with the ‘hadoop jar…’ command
- Copy the MapReduce output from HDFS to the remote server's local file system
- Download the output to the local machine
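This Java sketch chains steps like these so they run one after another and stop at the first failure. The names (`StepPipeline`, `add`, `command`) are hypothetical and not part of the plugin. Each step is assumed to signal success with exit code 0, as command-line tools conventionally do.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntSupplier;

// Hypothetical sketch: each manual step becomes a named action returning an exit code.
class StepPipeline {
    // LinkedHashMap keeps the steps in the order they were added (names must be unique).
    private final Map<String, IntSupplier> steps = new LinkedHashMap<>();

    StepPipeline add(String name, IntSupplier action) {
        steps.put(name, action);
        return this;
    }

    // Wraps an external command as a step; inheritIO() forwards its output to this console.
    static IntSupplier command(String... cmd) {
        return () -> {
            try {
                return new ProcessBuilder(cmd).inheritIO().start().waitFor();
            } catch (IOException e) {
                return -1; // the command could not be started
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        };
    }

    // Runs the steps in order and returns the names of the completed ones,
    // stopping at the first non-zero exit code.
    List<String> run() {
        List<String> done = new ArrayList<>();
        for (Map.Entry<String, IntSupplier> e : steps.entrySet()) {
            int code = e.getValue().getAsInt();
            if (code != 0) {
                System.err.println("Step failed: " + e.getKey() + " (exit code " + code + ")");
                break;
            }
            done.add(e.getKey());
        }
        return done;
    }

    public static void main(String[] args) {
        List<String> done = new StepPipeline()
            .add("build jar", () -> 0)
            .add("upload jar", () -> 0)
            .add("submit job", () -> 1) // simulated failure
            .add("download output", () -> 0)
            .run();
        System.out.println(done); // prints [build jar, upload jar]
    }
}
```

In real use a step would wrap an actual command, for example `.add("build jar", StepPipeline.command("mvn", "package"))`.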
As I mentioned, doing all of these steps manually takes time, so I looked for a faster and easier way. After a short search I didn't find any existing tool or solution, so I decided the easiest and best way was to develop something myself. The question was how. I could write a standalone application, but it still would not be comfortable enough. It would require a few extra steps from the user, such as switching from the Java IDE to another window and finding the jar in the file system. I thought an Eclipse plugin might be better. Everything would be in one place, but it would also have a weakness: it could not be used outside of Eclipse. My next thought was Maven. Integrated with … everything could be used from the console. Importantly, developing plugins for Maven is easy and pleasant, so I took this idea and started development.
The result of my work looks really good. Now I submit my job with a single click. I have a Maven run configuration for the hadoop:execute goal set up in Eclipse, and the plugin is configured in the project's pom.xml like this:
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-hadoop-plugin</artifactId>
          <version>0.0.1-SNAPSHOT</version>
          <configuration>
            <host>ipAddress</host>
            <login>login</login>
            <password>password</password>
            <outputDir>output30</outputDir>
            <hdfsOutputDir>/books/users/gkolpu/output30</hdfsOutputDir>
            <hdfsInputDir>/books/input</hdfsInputDir>
            <className>org.gkolpu.hadoop.BookWordCounter</className>
            <jarName>WordCounter.jar</jarName>
          </configuration>
        </plugin>
      </plugins>
    </build>
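Here is one way the configuration values above could map onto the remote commands. This is a sketch under assumptions, not the plugin's actual code. The class `HadoopJobCommands` is hypothetical, and the job's main class is assumed to take the HDFS input and output paths as its two arguments. `hadoop fs -get` is used to copy the job output from HDFS to the server's local file system.

```java
import java.util.List;

// Hypothetical value class mirroring the <configuration> elements of the POM.
final class HadoopJobCommands {
    final String jarName, className, hdfsInputDir, hdfsOutputDir, outputDir;

    HadoopJobCommands(String jarName, String className, String hdfsInputDir,
                      String hdfsOutputDir, String outputDir) {
        // Trim defensively in case the XML values carry surrounding whitespace.
        this.jarName = jarName.trim();
        this.className = className.trim();
        this.hdfsInputDir = hdfsInputDir.trim();
        this.hdfsOutputDir = hdfsOutputDir.trim();
        this.outputDir = outputDir.trim();
    }

    // Assumption: the main class reads input and output paths from its arguments.
    List<String> submitCommand() {
        return List.of("hadoop", "jar", jarName, className, hdfsInputDir, hdfsOutputDir);
    }

    // Copies the job output from HDFS into the remote server's local file system.
    List<String> fetchCommand() {
        return List.of("hadoop", "fs", "-get", hdfsOutputDir, outputDir);
    }

    public static void main(String[] args) {
        HadoopJobCommands c = new HadoopJobCommands("WordCounter.jar",
            "org.gkolpu.hadoop.BookWordCounter", "/books/input",
            "/books/users/gkolpu/output30", "output30");
        // Joined with spaces, e.g. to send over SSH (assumes paths contain no spaces).
        System.out.println(String.join(" ", c.submitCommand()));
        System.out.println(String.join(" ", c.fetchCommand()));
    }
}
```

Keeping each command as a list of arguments rather than one string makes it easy to hand to `ProcessBuilder` locally or to join for a remote shell.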
When I run my Maven hadoop:execute goal from Eclipse, I see all the logs, coloured, in the IDE console.
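Log streaming can be sketched as reading the remote command's output line by line and forwarding each line as soon as it arrives. The sketch assumes the SSH client library exposes the remote command's standard output as a `java.io.InputStream`. `RemoteLogForwarder` and `forward` are hypothetical names. Inside a Maven plugin, the sink could be a Maven logger instead of `System.out`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical helper: forwards remote output to a local sink, line by line.
final class RemoteLogForwarder {
    // Blocks until the stream ends; returns the number of lines forwarded.
    static int forward(InputStream remoteOut, Consumer<String> sink) throws IOException {
        int lines = 0;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(remoteOut, StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                sink.accept(line); // each line is shown as soon as it is read
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // A byte array stands in for the remote command's stdout.
        InputStream fake = new ByteArrayInputStream(
            "sample line 1\nsample line 2\n".getBytes(StandardCharsets.UTF_8));
        forward(fake, line -> System.out.println("[remote] " + line));
    }
}
```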
When the job finishes on the remote server, the MapReduce output is downloaded automatically to the target directory.
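This sketch shows where the downloaded files could land. It assumes the plugin resolves `outputDir` relative to the project's `target` directory. The names `OutputLocator`, `prepareLocalOutput` and `partFiles` are hypothetical. Hadoop writes one output file per reducer, named `part-r-00000`, `part-r-00001` and so on (`part-00000` with the older API), so the sketch filters on the `part-` prefix.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for placing and finding downloaded MapReduce output.
final class OutputLocator {
    // Resolves <projectDir>/target/<outputDir> and creates it if it does not exist.
    static Path prepareLocalOutput(Path projectDir, String outputDir) throws IOException {
        Path target = projectDir.resolve("target").resolve(outputDir.trim()).normalize();
        return Files.createDirectories(target);
    }

    // Lists the reducer output files (part-*) in a directory, sorted by name.
    static List<Path> partFiles(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.filter(p -> p.getFileName().toString().startsWith("part-"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the project root.
        Path project = Files.createTempDirectory("project");
        Path out = prepareLocalOutput(project, "output30");
        Files.write(out.resolve("part-r-00000"), "example\t1\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(partFiles(out));
    }
}
```

Keeping the output under `target` means `mvn clean` removes it together with the other build artifacts.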
The output files can easily be opened in IDE editors.
You can find this plugin with its source code in my GitHub repository. There is only one goal so far, but the project is still under development and other Maven goals are planned.
https://github.com/gkolpuc/maven-hadoop-plugin