Thursday, November 3, 2011
Load a Resource from Classpath in Groovy
Let's look at how we can load a resource from the classpath in Groovy when Groovy is used as part of a Java application. Put your resource beside the Groovy class and do the following:
ReflectionUtils.getCallingClass(0).getResourceAsStream("testGfile.txt")
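For comparison, the same lookup from the Java side can be sketched roughly as below. This is only an illustration, not code from the application itself: the class name ClasspathResourceSketch and the helper readResource are made up for this example. Class.getResourceAsStream resolves a relative name against the package of the class it is called on, which is why the resource has to sit beside that class.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ClasspathResourceSketch {

    // Hypothetical helper: reads a resource that sits next to 'anchor' on the classpath.
    // Returns null when the resource cannot be found.
    static String readResource(Class<?> anchor, String name) {
        try (InputStream in = anchor.getResourceAsStream(name)) {
            if (in == null) {
                return null; // not on the classpath beside this class
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new IllegalStateException("Could not read resource " + name, e);
        }
    }

    public static void main(String[] args) {
        // "testGfile.txt" is the file name used in the post; it must be placed
        // in the same package directory as this class for the lookup to succeed.
        String content = readResource(ClasspathResourceSketch.class, "testGfile.txt");
        System.out.println(content == null ? "resource not found" : content);
    }
}
```

If the lookup returns null, the usual cause is that the file was not copied to the output directory next to the compiled class.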
Monday, October 31, 2011
Write Nutch Plugin Code in Groovy
In this post, let us look at how we can integrate groovy into nutch code base so that one can write plugin code in groovy.
Below are the steps
1. Edit "src/plugin/myplugin/ivy.xml" of your plugin to include the Groovy jar. After the edit, the dependencies section of the file should look like this:
<dependencies>
  <dependency org="org.codehaus.groovy" name="groovy-all" rev="1.7.4"/>
</dependencies>
2. Edit "src/plugin/myplugin/plugin.xml" to include the Groovy jar. After the edit, the runtime section of the file should look like this:
<runtime>
  <library name="my-plugin.jar">
    <export name="*"/>
  </library>
  <library name="groovy-all-1.7.4.jar"/>
</runtime>
3. Edit "ivy/ivy.xml" and add the following to the dependencies section:
<dependency org="org.codehaus.groovy" name="groovy-all" rev="1.7.4"/>
4. Edit "src/plugin/build-plugin.xml" and add the following taskdef:
<taskdef name="groovyc" classname="org.codehaus.groovy.ant.Groovyc">
  <classpath refid="classpath"/>
</taskdef>
Add the following target:
<target name="groovyCompile">
  <echo message="Compiling groovy classes in plugin: ${name}"/>
  <groovyc srcdir="${src.dir}" includes="**/*.groovy" destdir="${build.classes}">
    <classpath refid="classpath"/>
  </groovyc>
</target>
Then modify the target named "compile" to depend on "groovyCompile". Change
<target name="compile" depends="init,deps-jar, resolve-default">
to
<target name="compile" depends="init,deps-jar, resolve-default, groovyCompile">
That's it. Now you should be able to add Groovy classes to your plugin and use them in Java classes. Also, if you are working in Eclipse, do not forget to add groovy-all-1.7.4.jar to your classpath.
Creating Nutch Distribution with Custom Plugin Code and Running
In this post, we are going to talk about how we can build Nutch with custom plugin code. Look here to see how we can create our custom plugin. After the plugin development is done, follow the steps below to make a distribution.
- Go to the Nutch project folder (let's assume it is "~/workspaces/nutch").
- Run "ant tar".
- The above command creates "~/workspaces/nutch/dist" (here you can find the distribution nutch-1.3.tar.gz).
- nutch-1.3.tar.gz is also unpacked into the folder "~/workspaces/nutch/dist/nutch-1.3".
- Change directory to "~/workspaces/nutch/dist/nutch-1.3/runtime/local".
- Verify that the folder "urls" exists in this directory. If not, create it. Add some seed URLs to it.
- While developing, the "plugin.folders" property in "conf/nutch-site.xml" has a value of "./src/plugins". This does not work when you are working with a distribution. Change the value of this property to wherever the plugin jars are located. By default, these jars are in the "~/workspaces/nutch/dist/nutch-1.3/runtime/local/plugins" folder. Since we are already in the folder "local", change the value of the "plugin.folders" property in "conf/nutch-site.xml" to "./plugins". If you forget to do this, you might see an error like this:
Caused by: java.lang.RuntimeException: x point org.apache.nutch.net.URLNormalizer not found.
at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:122)
at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:70)
- Now run Nutch using a command like "bin/nutch crawl urls -dir crawl -depth 2 -topN 4". A good explanation of what everything means in this command can be found here.
That's it...
Thursday, October 27, 2011
Java Regular Expressions Test Harness
Regular expressions are best understood by trying them out in a Java program with different inputs and analyzing the output while reading the definitions.
The Java Regex page provides a class that we can use to try out regular expressions with different inputs. The issue with that program is that it does not work in Eclipse, because System.console() is not available when a program is launched from Eclipse. If you are experiencing that issue, the program below, which reads from System.in instead, can be used to try them out.
package org.apache.nutch.util;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTestHarness {

    public static void main(String[] args) {
        try {
            // Read from System.in so that the program also works inside Eclipse.
            BufferedReader console = new BufferedReader(new InputStreamReader(System.in));
            System.out.println("Enter regular expression: ");
            String regex = console.readLine();
            // Type "exit" (or close the input stream) to stop.
            while (regex != null && !regex.equalsIgnoreCase("exit")) {
                Pattern pattern = Pattern.compile(regex);
                System.out.println("Enter input string to search: ");
                String input = console.readLine();
                if (input == null) {
                    break;
                }
                Matcher matcher = pattern.matcher(input);
                boolean found = false;
                while (matcher.find()) {
                    System.out.println("I found the text '" + matcher.group()
                            + "' starting at index " + matcher.start()
                            + " and ending at index " + matcher.end());
                    found = true;
                }
                if (!found) {
                    System.out.println("No match found.");
                }
                System.out.println("Enter regular expression: ");
                regex = console.readLine();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
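If you would rather check a pattern without typing into a prompt, the same Matcher loop can be wrapped in a small method. The sketch below is only an illustration; the class name RegexFindAll and the method findAll are made up for this example. It collects every match together with its start index and its (exclusive) end index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFindAll {

    // Hypothetical helper: returns one entry per match, formatted as 'text' [start,end).
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            // end() is the index just past the last matched character
            hits.add("'" + matcher.group() + "' [" + matcher.start() + "," + matcher.end() + ")");
        }
        return hits;
    }

    public static void main(String[] args) {
        // The pattern and input here are arbitrary sample values.
        for (String hit : findAll("a+", "caab a")) {
            System.out.println(hit);
        }
    }
}
```

For the sample values above, the greedy quantifier a+ matches the run "aa" at indexes 1 to 3 and the single "a" at indexes 5 to 6.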
Sunday, October 23, 2011
Parse String to Java Date and Format Date to Localized String
In this post, let's look into how we can convert a String to a Date object in Java and vice versa.
A solid understanding of the Date class is essential to understand this post or anything related to Java dates. I have written a post explaining the basics of the Date class.
I have a string "october 21, 2011 19:08". I would like to convert this to a Date object. How can I do this? Also, how can I convert that Date back to a localized String?
First, let's look at the code.
1: import java.text.ParseException;
2: import java.text.SimpleDateFormat;
3: import java.util.Date;
4: import java.util.TimeZone;
5:
6: public class DateTest {
7:
8: public static void main(String[] args) {
9: try {
10: String dateStr = "october 21, 2011 19:08";
11: String pattern = "MMMM dd, yyyy HH:mm";
12: TimeZone timezoneOfDateStr = TimeZone.getTimeZone("Asia/Kolkata");
13:
14: SimpleDateFormat sd = new SimpleDateFormat(pattern);
15: sd.setTimeZone(timezoneOfDateStr);
16: Date date = sd.parse(dateStr);
17:
18: System.out.println("Converting : ");
19: System.out.println(" Date String: "+dateStr);
20: System.out.println(" pattern: "+pattern);
21: System.out.println(" time zone: Asia/kolkata");
22: System.out.println();
23: System.out.println("Converted.");
24: System.out.println("System Stored the above Date as : "+date.getTime());
25: System.out.println();
26: String patternWithTimezone = "MMMM dd, yyyy HH:mm zzzz";
27: SimpleDateFormat sdWithTZ = new SimpleDateFormat(patternWithTimezone);
28: sdWithTZ.setTimeZone(timezoneOfDateStr);
29: System.out.println("Formatting the date back to human readable format in the same timezone");
30: System.out.println(" here it is: "+sdWithTZ.format(date));
31:
32: System.out.println();
33: sdWithTZ.setTimeZone(TimeZone.getTimeZone("GMT"));
34: System.out.println("Formatting the date to human readable format in GMT");
35: System.out.println(" here it is: "+sdWithTZ.format(date));
36: } catch (ParseException e) {
37: e.printStackTrace();
38: }
39: }
40:
41: }
42:
Output
Converting : 
 Date String: october 21, 2011 19:08
 pattern: MMMM dd, yyyy HH:mm
 time zone: Asia/kolkata

Converted.
System Stored the above Date as : 1319204280000

Formatting the date back to human readable format
 here it is: October 21, 2011 19:08 India Standard Time

Formatting the date to human readable format in GMT
 here it is: October 21, 2011 13:38 Greenwich Mean Time
Explanation
- Line 10: The date string that needs to be converted.
- Line 11: The pattern of the date string defined in line 10. To understand how to build this pattern, read the table in the Javadoc of the SimpleDateFormat class, which explains the pattern letters and how they are interpreted (for example, M is the month, y the year, d the day of the month, H the hour, m the minute, z the time zone).
- Line 12: The time zone of the above date. Why a time zone? A date and time by itself does not tell the whole story (for example, Oct 21, 2011 11:00 in New York is the same instant as Oct 21, 2011 08:00 in Los Angeles). So just saying Oct 21, 2011 19:08 is not enough; we also need to tell the system which time zone that time belongs to. In this case, we are saying that the date belongs to the "Asia/Kolkata" time zone, that is, India Standard Time.
- Lines 14, 15, 16: Create a SimpleDateFormat instance. Tell it what pattern the string is in and what time zone it belongs to. Parse the date string and get a Date object.
- Lines 18 to 25: Print some details, including the date in milliseconds. Why in milliseconds? Read Java Date Explanation.
- Lines 26 to 30: Format the date to human readable format in India Standard Time and print.
- Lines 32 to 35: Format the date to human readable format in GMT and print.
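The same round trip can be condensed into two small helpers. This is a sketch under a few assumptions rather than a replacement for the program above: the class name DateRoundTrip and the helper names are made up, parsing is pinned to Locale.ENGLISH so that the month name is recognized regardless of the machine's default locale, and the French pattern is just one example of a localized format.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateRoundTrip {

    // Parse 'text' written in 'pattern', interpreting it in the given time zone.
    static Date parse(String text, String pattern, String zoneId) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
        format.setTimeZone(TimeZone.getTimeZone(zoneId));
        try {
            return format.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    // Format the instant held by 'date' for a given time zone and locale.
    static String format(Date date, String pattern, String zoneId, Locale locale) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, locale);
        format.setTimeZone(TimeZone.getTimeZone(zoneId));
        return format.format(date);
    }

    public static void main(String[] args) {
        Date date = parse("october 21, 2011 19:08", "MMMM dd, yyyy HH:mm", "Asia/Kolkata");
        System.out.println(date.getTime());
        System.out.println(format(date, "MMMM dd, yyyy HH:mm", "GMT", Locale.ENGLISH));
        // Same instant, rendered with French month names
        System.out.println(format(date, "dd MMMM yyyy HH:mm", "GMT", Locale.FRENCH));
    }
}
```

Passing the Locale explicitly is the design choice worth noting: the time zone decides which wall-clock time is shown, while the locale only decides how month and day names are spelled.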
Java Date Explanation
In this post, let's look into some basics of the Date class, which creates a lot of confusion not only among beginners but even among experienced programmers. That is understandable, given that even the Sun folks did not get it right when they initially wrote the Date class. That is the reason the Date class has the explanation below in its Javadoc.
Prior to JDK 1.1, the class Date had two additional functions. It allowed the interpretation of dates as year, month, day, hour, minute, and second values. It also allowed the formatting and parsing of date strings. Unfortunately, the API for these functions was not amenable to internationalization.
What I am going to discuss below is very important. It will make your understanding of dates a whole lot easier. Pay special attention.
When you think of a Date, think of milliseconds since the epoch (January 1, 1970, 00:00:00 GMT). A Date in Java is just a wrapper around a milliseconds value. Let me clarify this with an example.
import java.util.Date;
import java.util.TimeZone;

public class DateTest {

    public static void main(String[] args) {
        // Create each Date while a different default time zone is in effect.
        TimeZone.setDefault(TimeZone.getTimeZone("GMT-5"));
        Date gmtMinusFiveDate = new Date();

        TimeZone.setDefault(TimeZone.getTimeZone("GMT+1"));
        Date gmtPlusOneDate = new Date();

        TimeZone.setDefault(TimeZone.getTimeZone("GMT+7"));
        Date gmtPlusSevenDate = new Date();

        // Print the raw millisecond value wrapped by each Date.
        System.out.println("GMT-5 date milliseconds: " + gmtMinusFiveDate.getTime());
        System.out.println("GMT+1 date milliseconds: " + gmtPlusOneDate.getTime());
        System.out.println("GMT+7 date milliseconds: " + gmtPlusSevenDate.getTime());
    }
}
Output
GMT-5 date milliseconds: 1319391112399
GMT+1 date milliseconds: 1319391112400
GMT+7 date milliseconds: 1319391112400
Above, we created Date objects under different default time zones (GMT-5, GMT+1, and GMT+7) and printed their millisecond values.
Surprised that all the Date instances print practically the same value for milliseconds? (The 1 millisecond difference between the 1st and 2nd lines of the output is only because the Date objects are created at slightly different moments in the program.) Shouldn't they be hours apart? Not really. This is how it works.
Let's say we created a Date instance on a machine running in the Arizona time zone (GMT-7) on Oct 21, 2011 at 06:00. Think of it as if the following happens (although in reality the machine always keeps time as milliseconds counted in GMT from midnight, Jan 1, 1970):
- Convert the time to GMT. So, Oct 21, 2011 06:00 will be converted to Oct 21, 2011 13:00 (remember GMT-7, so just add 7 hours)
- Calculate number of milliseconds elapsed from Jan 1, 1970 to Oct 21, 2011 13:00 (lets say )
- Create a Date object that is a wrapper around this milliseconds value.
That is the reason why, regardless of where the machine is running (or what the default time zone is), a Date object only represents the number of milliseconds since January 1, 1970, 00:00:00 GMT. That's it. That is the reason all three lines in the output show the same value.
A solid understanding of the above concept is essential when dealing with dates in Java. If you did not understand it, go back and read it one more time. Once you understand this concept, everything else should be easy to follow.
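The Arizona example can be checked with a few lines of code. The sketch below is only an illustration; the class name EpochMillisDemo and the helper millisFor are made up for this example. It builds the same wall-clock reading in two zones and shows that 06:00 in GMT-7 and 13:00 in GMT wrap the very same millisecond value.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class EpochMillisDemo {

    // Hypothetical helper: milliseconds since the epoch for a wall-clock time in a zone.
    // 'month' uses the Calendar constants (Calendar.JANUARY is 0).
    static long millisFor(String zoneId, int year, int month, int day, int hour, int minute) {
        Calendar calendar = new GregorianCalendar(TimeZone.getTimeZone(zoneId));
        calendar.clear(); // zero out seconds and milliseconds as well
        calendar.set(year, month, day, hour, minute, 0);
        return calendar.getTimeInMillis();
    }

    public static void main(String[] args) {
        long arizona = millisFor("GMT-7", 2011, Calendar.OCTOBER, 21, 6, 0);
        long gmt = millisFor("GMT", 2011, Calendar.OCTOBER, 21, 13, 0);
        System.out.println("06:00 in GMT-7: " + arizona);
        System.out.println("13:00 in GMT:   " + gmt);
        System.out.println("Same instant?   " + (arizona == gmt));
    }
}
```

Wrapping either value in new Date(millis) gives Date objects that are equal, which is exactly the point made above: the time zone matters when reading or printing a date, not when storing it.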
See my post about Parse String to Java Date and vice versa
Wednesday, October 12, 2011
Configure SSH for multiple remote machines without the need for defining parameters every time
I often connect to remote machines, and giving the SSH key location, hostname, username, and so on every time is painful. Isn't there a simple way to configure it?
Yes, there is.
1. Create a .ssh directory under home directory if it does not exist.
$ mkdir -p ~/.ssh
2. Open a file named config (create it if it does not exist)
$ vi ~/.ssh/config
3. Configure hosts in this file. Copy the following code and replace the values accordingly.
#Demo server configuration
Host demo
HostName 12.14.134.123
User john
IdentityFile ~/.ssh/ssh_key
#Test server configuration
Host test
HostName 12.14.134.124
User john
IdentityFile ~/.ssh/ssh_key
4. To connect to remote machine.
$ ssh demo
5. To copy a file from demo server to your machine
$ scp demo:~/file1.txt ~/Downloads/
MapReduce Simplified
What is it?
MapReduce is a programming model that simplifies processing huge amounts of data on a large number of machines. This programming model was introduced in the paper published by Google's Jeffrey Dean and Sanjay Ghemawat. More details at http://labs.google.com/papers/mapreduce.html.
Why?
Programmers without any experience in parallel programming and distributed systems can easily write programs that process huge data sets on a large number of machines using this model.
When?
When you need to process lots of data.
How?
Write map and reduce functions and feed them to the MapReduce framework. The framework takes care of slicing and distributing the work to multiple machines, processing, handling failures, and giving the result back.
Since a picture is worth a thousand words, let's see how we can take 1,000,000 text documents, look through them, and find how many times each word is used. Actually, wait a minute: to keep this example simple, let's do just 2 documents. But huge data is where this programming model shines. I borrowed this example from the Hadoop tutorial at http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
1) In the first step, we feed these two documents to the master node. The master node splits the data and sends it to worker nodes, which perform the map function to process the data.
2) The worker nodes process the data and give the result back to the master node. The master node then collects the data from all the worker nodes, arranges it by key, and sends it to other worker nodes to perform the reduce function.
3) After performing the reduce function, the worker nodes give the result back to the master node. The master node then processes the resulting data and gives the final result (which is the count of all words from both documents).
When you think about huge amounts of data, leveraging a cluster of machines using the MapReduce programming model is an efficient way to deal with it.
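To make the three steps concrete, here is a minimal single-machine sketch of the word count in plain Java. It does not use Hadoop or any real MapReduce framework; the class WordCountSketch, its method names, and the two sample documents are assumptions made for this illustration. It only mimics the flow: map emits (word, 1) pairs, a shuffle groups them by key, and reduce sums each group.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Map step: emit a (word, 1) pair for every word in one document.
    static List<Map.Entry<String, Integer>> map(String document) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : document.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle step: arrange the emitted pairs by key (done by the master in the description).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), key -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: sum all the counts collected for one word.
    static int reduce(List<Integer> counts) {
        int total = 0;
        for (int count : counts) {
            total += count;
        }
        return total;
    }

    // Runs map on every document, then shuffle, then reduce on every key.
    static Map<String, Integer> wordCount(List<String> documents) {
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String document : documents) {
            allPairs.addAll(map(document)); // on a cluster, each call could run on a different worker
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(allPairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Two small sample documents, chosen for this sketch.
        List<String> documents = Arrays.asList("Hello World Bye World", "Hello Hadoop Goodbye Hadoop");
        System.out.println(wordCount(documents));
    }
}
```

In a real cluster the framework runs the map and reduce calls on different machines and handles failures; the functions themselves stay about this small, which is the appeal of the model.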
Wednesday, September 14, 2011
Most useful vi editor ( vim editor ) commands
Backspace and arrow keys behave differently in vi editor. Better alternative is vim editor.
Install vim editor as below
$sudo apt-get install vim
Alright, before we start, vim editor has two modes.
- Command Mode: This is the default mode. When the vim editor is first started, it enters this mode. Command mode is useful for tasks like copying, deleting, searching, etc.
- Insert Mode: This is the mode for entering text
Since vim is in command mode initially, to get to insert mode, all you need to do is, hit key i
To get back to command mode from insert mode, hit key Esc
Note: there are other keys that can change the editor from command mode to insert mode, but to keep this post simple, I am not going to discuss them.
Now to the commands
Open File
$vim sample_file.txt
Change to insert mode by hitting the key i and type few lines so that we can use some other commands.
Now, come back to command mode by hitting Esc
Save file and quit
:wq
Quit modified file without saving
:q!
Quit unmodified file
:q
Great. Now you can open a file, type some text, save and exit.
Let's open the file again with the vim editor and try some more commands. I am assuming that you typed in at least 10 lines.
Copy a line
yy
Copy 5 lines
5yy (you get the idea)
paste the copied line/s from buffer
p
delete a line
dd
delete 5 lines (homework :) )
Great, now you can copy and paste too. Saves a lot of time, doesn't it? Now the most useful command
Search
:/word (after hitting enter, use n to go to next match)
Case-Insensitive Search
:/\cword
Go to the end of file
:$
That's it. These are the basic commands that you need to know for doing some simple things with vim editor.
Friday, September 2, 2011
jps can not detect the tomcat process? not a problem!
Before going any further, the first thing you need to do is check which version of the JDK is being used by Tomcat. Is it Java 6 Update 23 or Java 6 Update 24? Then you are at the right place.
Why this problem?
It is caused by a bug in the JVM. I will not go into the details. If you are interested, please look at the links below.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50518
http://stackoverflow.com/questions/6287926/jps-not-showing-tomcat-process
How to fix it?
Solution 1
Use any other version than the above mentioned ones
Solution 2
If, for some reason, you have to keep the same version, do the following.
- Find where Tomcat is saving the monitoring related data.
This normally gets saved in the directory given by the java.io.tmpdir system property. An easy way to figure it out is to search for directories starting with hsperf on your machine.
Something like the command below would work on Linux.
$ find / -name "hsperf*"
- Let's say it is found at /tmp/tomcat6_tmp/hsperfdata_root. Then point jps at that temp directory:
$ jps -J-Djava.io.tmpdir=/tmp/tomcat6_tmp
- For any other JVM monitoring tool to work, the tool needs to know where the directory found above is. For example, for jstat to give details about your Tomcat process (let's say the process id / vmid is 12345):
$jstat -gc 12345 -J-Djava.io.tmpdir=/tmp/tomcat6_tmp
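If you want to check from Java which temp directory a JVM uses and whether it contains any monitoring data, something like the sketch below can help. It is only an illustration; the class name HsperfLocator and the method findHsperfDirs are made up for this example. Note that it reports the java.io.tmpdir of the JVM that runs this program, which is not necessarily the one Tomcat was started with.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class HsperfLocator {

    // Hypothetical helper: lists the hsperfdata_* directories directly under 'tmpDir'.
    static List<File> findHsperfDirs(File tmpDir) {
        List<File> result = new ArrayList<>();
        File[] children = tmpDir.listFiles();
        if (children == null) {
            return result; // not a directory, or not readable
        }
        for (File child : children) {
            if (child.isDirectory() && child.getName().startsWith("hsperfdata_")) {
                result.add(child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Pass a directory as the first argument, or fall back to this JVM's temp directory.
        String path = args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir");
        System.out.println("Looking in: " + path);
        for (File dir : findHsperfDirs(new File(path))) {
            System.out.println(dir.getAbsolutePath());
        }
    }
}
```

Running it with /tmp/tomcat6_tmp as the argument should list the hsperfdata_root directory from the example above, assuming the current user is allowed to read that folder.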
Happy monitoring...
Wednesday, August 31, 2011
Create new local git repo and add existing code to it
1. Create the repo
$ mkdir /dev/repos/proj.git
$ cd /dev/repos/proj.git
$ git init --bare
2. Add existing code to it
$ cd /dev/existingdir
$ git init
Add any directories/files to be ignored to .gitignore in this directory. Create .gitignore if it does not exist.
$ git add .
$ git commit -m "message"
$ git remote add origin /dev/repos/proj.git
$ git remote update
$ git push origin master
Tuesday, August 30, 2011
Linux Commands Cheat Sheet
List services with ports, pids, and addresses
$netstat -tulpn
List processes and sort them by fourth column
$ps -e | sort -b -k4