About these ads

Desktop Goodies: Houjuu Nue Picture Frame

27 08 2009

You know what? Fasting month is a lot of hassle. I work 50% less efficiency and I often get lightheaded when I yawn. Just read an article in the news of how Indonesian people tend to spend more on foods during the fasting month rather than normal days. Ah, the irony. I thought Ramadhan is an occasion where we get to feel how the unfortunates feel when they can’t buy foods and drinks. Seems that we’re fasting for not to empathize with our brothers and sisters who are less fortunate. Anyway, I can’t concentrate anymore in my workplace so I decided to make this Rainmeter skin for entertaining myself. Then, 40 minutes till I can get home and 1 hour and 40 minutes till I can get to eat and drink again.

8-27-2009 4-11-26 PMSkin Name: Houjuu Nue Picture Frame

File Size: 167 KB

Color Theme: Magenta

Download: http://adf.ly/KhuwI

Settings: You need to configure nue_LandscapeFrame.ini before getting the skin to display your pictures.First, change the “ImagePath” variable to the address/ directory which you want the skin to display your picture. Don’t forget to add a backslash (\) at the end of the directory (Example: C:\Pictures\).If you want the application to crawl the subfolders of the current directory, put the “Subfolders” variable value to “1″. Now, to set the length of time on changing between pictures, you need to set “Update” value (in terms of miliseconds).

Note: A Rainmeter application can be downloaded in my previous post. Don’t forget to extract and move the folder to your Rainmeter skin directory. Happy downloading!

About these ads




Counting Word Occurrence in a TextFile Using Efficient Storage For Faster Retrieval

27 08 2009

The previous day I applied for a position as a JAVA developer for Harita Mineral. I passed my second try in logic test after failing it in the first attempt. So, the next step was the technical test, where I was being judged on how I solve a problem by coding the real deal. The technical test for the company I applied was:

Goal of the Technical Test:
We are to see how you organize your thought, construct your code in accordance to good Object Oriented Programming’s Practice, and how you use data structure to store your data or search algorithm (if you use any) within the given problem.

Test:
You are to write a small Java Program to read and parse words from any given html file then store the words into the memory. User can search any words that he/she wishes to and the program will find the words from the memory and output the result of the search. Read the requirement below for more details.

Requirement:
1. User can enter file that they want the program to read. The file must be in the same directory as the class file.
2. User can enter word to be search as many times as they want
3. User can enter numerical number to give sign to program that they want to exit
4. Program will read and parse words in the given filename by user and store them in memory to search (output the measure time of reading and parsing words)
5. Program will output the number of occurrence of the word that users want to search if the word search exist in memory
6. Program will output the number of milliseconds needed to search the given word.
7. No database usage allowed.
8. Only 1 java file allowed
9. Only JDK Standard Edition (1.6) with standard API you can use to construct the program. (read as = no outside jars of standard issue JDK allowed)
10. Write the program within maximum 150 lines of code (don’t use any comment)

———
Example of Expected Output for the program:

C:>java RunTest
enter filename to read => overview-summary.html (user presses ENTER)
Total unique words = 1000 words – it takes 170 ms
enter word to search => java (user presses ENTER)
Found word java with total 20 occurrences in file overview-summary.html – it takes 90 ms
enter word to search => xml (user presses ENTER)
Found word xml with total 5 occurrences in file overview-summary.html – it takes 100 ms
enter word to search => harita (user presses ENTER)
Not found word harita in file overview-summary.html – it takes 200 ms
enter word to search => 1 (user presses ENTER)
Thank you for using RunTest
C:>

You have 3 days to submit your Java code starting from today.
You may ask questions to clarify any of the instructions written here.

So, to tackle this problem effectively we need to elaborate it into small pieces of tasks:

1. Deciding the correct library to use

We need a function to read input from keyboard as well as reading input from a file. Therefore, we need an input/ output library to handle this function which resides in the java.io. Next is to break the text we acquire from the defined file to only accept words. StreamTokenizer is a fast way to do this since you can define what characters to be a word, number, or symbol. StreamTokenizer also resides in java.io. Now, to store the words into a storage, we can use the library from java.util. The choices are many such as ArrayList, LinkedList, HashTable, HashMap, and so on. I personally choose HashMap since it can promote faster retrieval and element storing. I’ll be explaining more on this later.

So, in the first step, we got ourselves the code like this:

import java.io.*;
import java.util.*;

public class WordCount {
   public static void main (String[] args){

   }
}

2. Reading user input for what file to be read

Next, comes the problem to read user’s keyboard input. We need System.in to access the input from the command line and a converter called InputStreamReader to convert the input into readable format for JAVA and BufferedReader to convert it further to human readable format by calling the readline function in the BufferedReader class. Oh yeah, we need to catch an exception in accessing the readline function so you need a try method for that.

Anyway, here’s the code that you should have now:

public class WordCount {
   public static void main (String[] args){
      String inputText = "";
      System.out.print("Enter a filename to read: ");
      InputStreamReader converter = new InputStreamReader(System.in);
      BufferedReader in = new BufferedReader(converter);
      try {
         inputText = in.readLine();
      }

      catch(Exception e)  {
         System.out.println(""+e);
      }
   }
}

3. Reading the user-defined textFile

This is an easy one. You can use FileReader to access the specific file. However, you might want to also store it in the BufferedReader for the usage of StreamTokenizer class for defining delimiter to break down the sentence into words.

Therefore, the code now should be like this:

public class WordCount {
   public static void main (String[] args){
      String inputText = "";
      System.out.print("Enter a filename to read: ");
      InputStreamReader converter = new InputStreamReader(System.in);
      BufferedReader in = new BufferedReader(converter);
      try {
         inputText = in.readLine();
         BufferedReader inputFile = new BufferedReader(new FileReader(inputText));
      }

      catch(Exception e)  {
         System.out.println(""+e);
      }
   }
}

4. Breaking down the sentence into words

This is one of the core function that need’s to be sorted out, so read carefully. StreamTokenizer class provide a handy amount of features to store our input. The first you want to do is to put the file that has been read by the system in the BufferedReader to a newly invoked StreamTokenizer class. Next is to define what characters you want to consider as words and be stored in a public variable of StreamTokenizer called “sval”. To do this, reset all the characters by using resetSyntax function to put all the characters to ordinary characters. Then, use wordChars function to define which characters to be words. In this case, use wordChars(‘A’, ‘Z’) and wordChars(‘a’,'z’) to define all words to be alphabetic only. Now, to iterate all the words that has been stored, you can use nextToken function and check whether the token is a word by checking whether the value of token is StreamTokenizer.TT_WORD (constant type to indicate that the token is a word).

You now should have the code like this:

public class WordCount {
   public static void main (String[] args){
      String inputText = "";
      System.out.print("Enter a filename to read: ");
      InputStreamReader converter = new InputStreamReader(System.in);
      BufferedReader in = new BufferedReader(converter);
      try {
         inputText = in.readLine();
         BufferedReader inputFile = new BufferedReader(new FileReader(inputText));
         StreamTokenizer st = new StreamTokenizer(inputFile);
         st.resetSyntax();
         st.wordChars('A','Z');
         st.wordChars('a','z');

         int found = StreamTokenizer.TT_WORD;
         String keyBuffer = "";
         while ((found != StreamTokenizer.TT_EOF))   {
            found = st.nextToken ();
            if(found == StreamTokenizer.TT_WORD) {
               String token = st.sval;
            }
         }
      }

      catch(Exception e)  {
         System.out.println(""+e);
      }
   }
}

5. Storing the retrieved word into HashMap

We’re using HashMap because of the easiness and efficiency usage of this particular storage. HashMap introduce the concept of key and value, where to access a value that has been stored into the HashMap, you need a certain key value. Think of it as a door, where if you want to access to what inside the door, you’ll need a certain key to open the door. Therefore, we can use the word as the key and the number of occurence of that particular word as the value to the corresponding key.

Now, we need to always check whether the key (the word) has already stored in the HashMap. We can use the get function in the HashMap class to check whether key has already been registered to the HashMap. If the get function returned null, it means that it is a new word. Use the put function to store that word as the key and “1″ as the number of occurence of the value of the corresponding value.

On the other hand, if the get function do not return null, that means that the key has already been registered to the HashMap. Therefore we need to override the value of that particular key by using the put function. The brilliant thing about HashMap that it only allows unique keys to be registered. Any put function to an existing key will resulted in the value of the key to be overwrite.

Now, the code you have been doing so far, should be like this:

public class WordCount {
   public static void main (String[] args){
      String inputText = "";
      HashMap<String,Integer> wordTank = new HashMap<String,Integer>();
      System.out.print("Enter a filename to read: ");
      InputStreamReader converter = new InputStreamReader(System.in);
      BufferedReader in = new BufferedReader(converter);
      try {
         inputText = in.readLine();
         BufferedReader inputFile = new BufferedReader(new FileReader(inputText));
         StreamTokenizer st = new StreamTokenizer(inputFile);
         st.resetSyntax();
         st.wordChars('A','Z');
         st.wordChars('a','z');

         int found = StreamTokenizer.TT_WORD;
         String keyBuffer = "";
         while ((found != StreamTokenizer.TT_EOF))   {
            found = st.nextToken ();
            if(found == StreamTokenizer.TT_WORD) {
               String token = st.sval;
               if(wordTank.get(token) == null) {
                  wordTank.put(token,1);
               }
               else {
                  wordTank.put(token,wordTank.get(token)+1);
               }
            }
         }
         System.out.println("Total Unique words: " + wordTank.size());
      }

      catch(Exception e)  {
         System.out.println(""+e);
      }
   }
}

6. Retrieving a word from user input

The next requirement is to provide a search feature to check whether a word that is typed by a user exists in the storage. If it exists, display the occurrence of that particular word. By now you should’ve guess that we need another readLine function to read user’s keyboard input. Since the search feature is repeated until the user types a numeric number to quit, put the next line code on an infinite loop and parse the user input into integer by using Integer.parseInt(string variable) function. If the input is a number, the system quits, while non-numeric will bring you to the retrievement of the word that is inputted by the user.

To retrieve the word in the storage, use the usual get function in the HashMap. If it returns null, that means the word that is being searched is not there and vice versa. So far, you should have this code now:

public class WordCount {
   public static void main (String[] args){
      String inputText = "";
      HashMap<String,Integer> wordTank = new HashMap<String,Integer>();
      System.out.print("Enter a filename to read: ");
      InputStreamReader converter = new InputStreamReader(System.in);
      BufferedReader in = new BufferedReader(converter);
      try {
         inputText = in.readLine();
         BufferedReader inputFile = new BufferedReader(new FileReader(inputText));
         StreamTokenizer st = new StreamTokenizer(inputFile);
         st.resetSyntax();
         st.wordChars('A','Z');
         st.wordChars('a','z');

         int found = StreamTokenizer.TT_WORD;
         String keyBuffer = "";
         while ((found != StreamTokenizer.TT_EOF))   {
            found = st.nextToken ();
            if(found == StreamTokenizer.TT_WORD) {
               String token = st.sval;
               if(wordTank.get(token) == null) {
                  wordTank.put(token,1);
               }
               else {
                  wordTank.put(token,wordTank.get(token)+1);
               }
            }
         }
         System.out.println("Total Unique words: " + wordTank.size());
         Integer occurrence;
         while(true) {
            System.out.print("Enter word to search => " );
            inputText = in.readLine();
            try {
               Integer.parseInt(inputText);
               System.out.println("Thanks for using WordCount");
               break;
            }
            catch(Exception e) {}
            occurrence = wordTank.get(inputText);
               if(occurrence != null) {
                  System.out.println("Found word " + inputText + " with total occurrences of " +
                     occurrence);
               }

               else{
                  System.out.println("Can't found word " + inputText + " in the file.");
               }
          }
      }

      catch(Exception e)  {
         System.out.println(""+e);
      }
   }
}

7. Calculate your algorithm performance

Now, you need to test whether HashMap and StreamTokenizer can provide you faster storing and retrieving element in an array-like  structure. Use System.currentMillis() to get the current millisecond and put it between the start and the end of coding block that you want to analyze.

Now, the final code should be like this:

public class WordCount {
   public static void main (String[] args){
      String inputText = "";
      HashMap<String,Integer> wordTank = new HashMap<String,Integer>();
      System.out.print("Enter a filename to read: ");
      InputStreamReader converter = new InputStreamReader(System.in);
      BufferedReader in = new BufferedReader(converter);
      try {
         inputText = in.readLine();
         long startTime = System.currentTimeMillis();
         long endTime;
         BufferedReader inputFile = new BufferedReader(new FileReader(inputText));
         StreamTokenizer st = new StreamTokenizer(inputFile);
         st.resetSyntax();
         st.wordChars('A','Z');
         st.wordChars('a','z');

         int found = StreamTokenizer.TT_WORD;
         String keyBuffer = "";
         while ((found != StreamTokenizer.TT_EOF))   {
            found = st.nextToken ();
            if(found == StreamTokenizer.TT_WORD) {
               String token = st.sval;
               if(wordTank.get(token) == null) {
                  wordTank.put(token,1);
               }
               else {
                  wordTank.put(token,wordTank.get(token)+1);
               }
            }
         }
         System.out.println("Total Unique words: " + wordTank.size() + " it takes " +
            (System.currentTimeMillis()-startTime) + " ms\n");
         Integer occurrence;
         while(true) {
            System.out.print("Enter word to search => " );
            inputText = in.readLine();
            try {
               Integer.parseInt(inputText);
               System.out.println("Thanks for using WordCount");
               break;
            }
            catch(Exception e) {}
            startTime = System.currentTimeMillis();
            occurrence = wordTank.get(inputText);
            endTime = System.currentTimeMillis();
               if(occurrence != null) {
                  System.out.println("Found word " + inputText + " with total occurrences of " +
                     occurrence + " in the file. It takes " + (endTime-startTime) + " ms.");
               }

               else{
                  System.out.println("Can't found word " + inputText + " in the file. It takes " +
                     (endTime-startTime) + " ms.");
               }
          }
      }

      catch(Exception e)  {
         System.out.println(""+e);
      }
   }
}

Test that with your desires text file. How fast did the algorithm perform in compare to using standard array and manual tokenizing?








Follow

Get every new post delivered to your Inbox.

Join 985 other followers