Java: Compute Source Code Similarity Based on Jaccard Similarity Coefficient

Compute Source Code Similarity

I have recently tried analyzing source code with various methods. In this post, I will show you one result: computing source code similarity based on the Jaccard similarity coefficient. The Jaccard similarity coefficient is a common and quick technique for computing the similarity between two documents. You can find more details in Jaccard index or MinHash.

My similarity computation strategy is as follows:

1. Calculate a hash value for each line of each file.
2. Create a set for each file containing the hashes calculated in the previous step.
3. Compute the Jaccard similarity coefficient for all combinations of the files.

Source Code

Here are the code snippets for computing source code similarity based on the Jaccard similarity coefficient.

Jaccard Similarity

    package com.dukesoftware.utils.text;

    import java.util.HashSet;
    import java.util.Set;

    /**
     * Jaccard similarity coefficient http://en.wikipedia.org/wiki/MinHash
     */
    public class JaccardSimlarity<T> {

        priv
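
As a rough illustration of the three-step strategy above, here is a minimal sketch. It is not the JaccardSimlarity class from this post. The class name LineJaccardSketch, its method names, the use of String.hashCode() on trimmed lines, and the choice to return 1.0 for two empty sets are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LineJaccardSketch {

    // Steps 1 and 2: hash every (trimmed) line and collect the hashes in a set.
    static Set<Integer> lineHashes(List<String> lines) {
        Set<Integer> hashes = new HashSet<>();
        for (String line : lines) {
            hashes.add(line.trim().hashCode());
        }
        return hashes;
    }

    // Step 3: |A ∩ B| / |A ∪ B|. Two empty sets are treated as identical (assumption).
    static <T> double jaccard(Set<T> a, Set<T> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        // In-memory stand-ins for the lines of three source files.
        List<List<String>> files = Arrays.asList(
                Arrays.asList("int a = 1;", "int b = 2;", "return a + b;"),
                Arrays.asList("int a = 1;", "int c = 3;", "return a + b;"),
                Arrays.asList("String s = \"x\";"));
        // Compare every pair of files once.
        for (int i = 0; i < files.size(); i++) {
            for (int j = i + 1; j < files.size(); j++) {
                double score = jaccard(lineHashes(files.get(i)), lineHashes(files.get(j)));
                System.out.println("file" + i + " vs file" + j + ": " + score);
            }
        }
    }
}
```

Hashing lines keeps the sets small, at the cost of treating two different lines with the same hash as equal.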

Java 8: Process Each Line in File

Here is the code for iterating over each line in a file. You simply pass a lambda expression that processes each line.

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.function.Consumer;

    private static void processLine(File file, Consumer<String> lineProcessor) throws IOException {
        try (FileReader in = new FileReader(file);
             BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                lineProcessor.accept(line);
            }
        }
    }

The code below is an example usage of the processLine method. It simply prints out the lines in the file.

    public static void main(String[] args) throws IOException {
        processLine(new File("c:/temp/test.txt"), System.out::println);
    }
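
To show how a lambda other than System.out::println might be plugged in, here is a small self-contained sketch. The class name ProcessLineSketch and the helper collectNonBlankLines are hypothetical. processLine follows the method in the post, made package-visible so that the sketch compiles on its own.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class ProcessLineSketch {

    // Same logic as the processLine method in the post.
    static void processLine(File file, Consumer<String> lineProcessor) throws IOException {
        try (FileReader in = new FileReader(file);
             BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                lineProcessor.accept(line);
            }
        }
    }

    // Hypothetical helper: gathers every non-blank line into a list.
    static List<String> collectNonBlankLines(File file) {
        List<String> lines = new ArrayList<>();
        try {
            processLine(file, line -> {
                if (!line.trim().isEmpty()) {
                    lines.add(line);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary file so the example does not depend on c:/temp.
        File temp = File.createTempFile("process-line", ".txt");
        temp.deleteOnExit();
        Files.write(temp.toPath(), Arrays.asList("first", "", "second"));
        System.out.println(collectNonBlankLines(temp));
    }
}
```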

Java: Remove Comments, Annotations and Extra Spaces From Source Code

I have written Java code for removing comments, annotations and extra spaces from Java source code.

    package com.dukesoftware.utils.text;

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.io.IOException;

    import com.dukesoftware.utils.io.IOUtils;

    public class JavaSourceUtils {

        public static void main(String[] args) throws FileNotFoundException, IOException {
            String cleanedUpSource = removeCommentAndAnnotation(IOUtils.userDirectory(
                "workspace2/DukeSoftwareUtils/src/main/java/com/dukesoftware/utils/text/JavaSourceUtils.java"
            ));
            System.out.println(cleanedUpSource);
        }

        public static final String removeCommentAndAnnotation(File file) throws IOException {
            try(BufferedReader br = new BufferedReader(new FileReader(file))) {
                return removeCommentAndAnnotation(br);
            }
        }

        private final static int COD

Java: Count String Occurrence

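Here is the code for counting how many times the string given as the second parameter appears in the string given as the first parameter.

    public final static int countOccurrence(String src, String search) {
        // Guard: an empty search string would otherwise never terminate,
        // because indexOf("", fromIndex) never returns a negative value.
        if (search.isEmpty()) return 0;
        for (int count = 0, fromIndex = 0;;) {
            fromIndex = src.indexOf(search, fromIndex);
            if (fromIndex < 0) return count;
            fromIndex++; // advance by one character, so overlapping matches are counted
            count++;
        }
    }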
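
Because the method above moves the search position forward by only one character after each hit, it counts overlapping matches. The sketch below contrasts that behavior with a non-overlapping variant. The class name CountOccurrenceSketch, both method names, and the decision to return 0 for an empty search string are assumptions made for this example.

```java
public class CountOccurrenceSketch {

    // Advances by one character after each hit, so overlapping matches are counted.
    static int countOverlapping(String src, String search) {
        if (search.isEmpty()) {
            return 0; // assumption: an empty search string counts as no match
        }
        int count = 0;
        int fromIndex = 0;
        while ((fromIndex = src.indexOf(search, fromIndex)) >= 0) {
            count++;
            fromIndex++;
        }
        return count;
    }

    // Hypothetical variant: skips past the whole match, so matches never overlap.
    static int countNonOverlapping(String src, String search) {
        if (search.isEmpty()) {
            return 0;
        }
        int count = 0;
        int fromIndex = 0;
        while ((fromIndex = src.indexOf(search, fromIndex)) >= 0) {
            count++;
            fromIndex += search.length();
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOverlapping("aaaa", "aa"));    // prints 3
        System.out.println(countNonOverlapping("aaaa", "aa")); // prints 2
    }
}
```

Which variant is appropriate depends on whether "aa" should be found three times or twice in "aaaa".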

Java: Read Exif Metadata Using metadata-extractor

In this post, I will show you a code snippet for extracting JPEG Exif metadata using metadata-extractor.

Code

    package com.dukesoftware.image;

    import com.drew.imaging.ImageMetadataReader;
    import com.drew.imaging.ImageProcessingException;
    import com.drew.metadata.Directory;
    import com.drew.metadata.Metadata;
    import com.drew.metadata.Tag;
    import java.io.File;
    import java.io.IOException;

    public class MetaDataExtractor {

        public static void main(String[] args) throws IOException {
            printMetaData(new File("c:/temp/metadata_eample.jpg"));
        }

        private static void printMetaData(File file) throws IOException {
            try {
                Metadata drewmetadata = ImageMetadataReader.readMetadata(file);
                for (Directory directory : drewmetadata.getDirectories()) {
                    System.out.println("==="+directory.getClass().getName()+"===");
                    for (Tag tag : directory.getTags()) {
                        System.out.p

Java: Read Exif Metadata Using Sanselan

Recently I have been investigating image libraries in Java. In this post, I will show you a code snippet for extracting JPEG Exif metadata using Sanselan.

Code

    package com.dulesoftware.image;

    import java.io.File;
    import java.io.IOException;
    import java.util.List;

    import org.apache.sanselan.ImageReadException;
    import org.apache.sanselan.Sanselan;
    import org.apache.sanselan.common.IImageMetadata;
    import org.apache.sanselan.common.ImageMetadata.Item;
    import org.apache.sanselan.formats.jpeg.JpegImageMetadata;
    import org.apache.sanselan.formats.tiff.TiffField;
    import org.apache.sanselan.formats.tiff.TiffImageMetadata;

    public class SanselanMetadataExample {

        public static void main(String[] args) throws Exception {
            printMeatadata(new File("c:/temp/metadata_eample.jpg"));
        }

        public static void printMeatadata(File file) throws ImageReadException, IOException {
            IImageMetadata sanselanmetadata = Sanselan.getMetadata(file);
            if (sanselan

Java: Extract Metadata Using Apache Commons Imaging

Introduction

In the previous post, I showed the available tags in the ExifTagConstants, TiffTagConstants and GpsTagConstants classes. In this post, I will show you a code snippet for extracting metadata from a JPEG using those tags. In order to run the example code, you should download commons-imaging.jar from here. Somehow the official homepage of Commons Imaging doesn't provide commons-imaging.jar!!

Code

    package com.dulesoftware.image;

    import java.io.File;
    import java.io.IOException;
    import java.lang.reflect.Field;
    import java.lang.reflect.Modifier;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.stream.Collectors;

    import org.apache.commons.imaging.ImageReadException;
    import org.apache.commons.imaging.Imaging;
    import org.apache.commons.imaging.common.IImageMetadata;
    import org.apache.commons.imaging.common.IImageMetadata.IImageMetadataItem;
    import org.apache.commons.imaging.formats.jpeg.JpegImageMetadata;
    import org.apache.commons.imaging.formats.t

Java: Available Tags for Extracting Metadata Using Apache Commons Imaging

Introduction

In the post Java: Read Image Metadata by Java Image IO, I demonstrated a code snippet for reading image metadata using only the standard Java imageio library. The problem is that the standard image library cannot read JPEG Exif metadata in a human-readable format (it can only be extracted as byte[] data). In this post, I will show you the code for reading JPEG Exif metadata using the Apache Commons Imaging library. In order to run the example code, you should download commons-imaging-1.0-SNAPSHOT.jar from commons-imaging.

Basics of Reading Exif Data

Extracting JPEG Exif metadata can be done with the findEXIFValue method in the JpegImageMetadata class. The method takes a TagInfo as an argument and returns a TiffField. The available TagInfo static fields (constants) are mainly defined in the following classes in the org.apache.commons.imaging.formats.tiff.constants package:

- ExifTagConstants
- TiffTagConstants
- GpsTagConstants

Available TagInfo in ExifTagConstants, TiffTagConstants, Gps

Java: Read Image Metadata by Java Image IO

Introduction

I have googled and written a small code snippet for parsing and printing image metadata in Java. The biggest problem with this program (or maybe with the Java standard imageio library) is that it cannot read JPEG Exif data. I have found the following solutions for this problem:

- Java Advanced Image API (Plugin) <- Looks quite obsolete. I also found the project on java.net - jai-imageio, but again it has not been maintained for a long time :(
- Apache Commons Imaging: Tested in Java: Available Tags for Extracting Metadata Using Apache Commons Imaging
- Sanselan: Tested in Java: Read Exif Metadata Using Sanselan
- metadata-extractor: Tested in Java: Read Exif Metadata Using metadata-extractor
- J-Exiftool

Both libraries have not been updated recently, but I guess that is because they don't need any more enhancements. I will investigate and re-post how to use these image libraries soon... In this post, I only show you the code snippet for parsing and printing image metadata using Java standard Image API

Java: Extract Gif Images From an Animated Gif Image

In this post, I will show you the code for extracting the frame images from an animated GIF image. I will show you two types of code:

- Simply extracting the GIF images stored in the original animated GIF image
- Creating GIF images by rendering and accumulating each GIF frame stored in the original GIF image

Code for Extracting Gif Image From An Animated Gif

Java Code

    package com.dukesoftware.utils.image;

    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.IOException;

    import javax.imageio.ImageIO;
    import javax.imageio.ImageReader;
    import javax.imageio.stream.ImageInputStream;

    public class GifImageUtils {

        public static void main(String[] args) throws IOException {
            String sourceGif = "c:/temp/SmallFullColourGIF.gif";
            saveAnimatedGifFramePartsToImage(sourceGif, "c:/temp/gif_parts");
        }

        public static void saveAnimatedGifFramePartsToImage(String input, String outDir) throws IOException {
            ImageReade

Java: Unsigned Right Shift Operator

When you work on image pixel manipulation, you may use bit shift operators as below:

    // assume a, r, g, b are each at most 8 bits (0-255)
    int pixel = (a << 24) | (r << 16) | (g << 8) | b;

Now here is a simple question: how would you recover the original a, r, g, b values from the pixel int? You may think of the code below:

    int aa = (pixel & 0xff000000) >> 24;
    int rr = (pixel & 0xff0000) >> 16;
    int gg = (pixel & 0xff00) >> 8;
    int bb = (pixel & 0xff);

Actually, the above code is wrong. Do you know where? The wrong part is the right shift operator used for getting the value of "aa". In Java, the leftmost bit of an int is the sign bit. If you apply the right shift operator (>>) to an int whose leftmost bit is 1, the vacated bits on the left are filled with 1 to preserve the sign. In the above case, you should use the unsigned right shift operator (>>>), which fills the vacated bits with 0. So the answer
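
The difference between the two operators can be checked with a small sketch. The class name ShiftSketch and the helper method names are hypothetical; only the shift expressions come from the text above.

```java
public class ShiftSketch {

    // Packs four 0-255 channel values into one ARGB int.
    static int pack(int a, int r, int g, int b) {
        return (a << 24) | (r << 16) | (g << 8) | b;
    }

    // Signed shift: the sign bit is copied into the vacated bits.
    static int alphaWithSignedShift(int pixel) {
        return (pixel & 0xff000000) >> 24;
    }

    // Unsigned shift: the vacated bits are filled with 0.
    static int alphaWithUnsignedShift(int pixel) {
        return (pixel & 0xff000000) >>> 24;
    }

    public static void main(String[] args) {
        int pixel = pack(255, 16, 32, 48);
        System.out.println(alphaWithSignedShift(pixel));   // prints -1
        System.out.println(alphaWithUnsignedShift(pixel)); // prints 255
    }
}
```

Shifting first and masking afterwards, as in (pixel >> 24) & 0xff, also yields 255, because the mask discards the sign-extended bits.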

Java: org.w3c.dom.Node to Formatted Xml String

    package com.dukesoftware.xml;

    import java.io.StringWriter;

    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerException;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;

    import org.w3c.dom.Node;

    public class XmlUtils {

        public static String toString(Node node) throws TransformerException {
            StringWriter writer = new StringWriter();
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.transform(new DOMSource(node), new StreamResult(writer));
            return writer.toString();
        }
    }
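
As a usage illustration, the sketch below parses an XML string into a DOM Document and formats it with the same transformer setup. The class name XmlToStringSketch and the helper prettyPrint are hypothetical. The exact indentation and XML declaration in the output depend on the JDK's transformer implementation, so no expected output is shown.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XmlToStringSketch {

    // Same logic as XmlUtils.toString in the post.
    static String toString(Node node) throws TransformerException {
        StringWriter writer = new StringWriter();
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.transform(new DOMSource(node), new StreamResult(writer));
        return writer.toString();
    }

    // Hypothetical helper: parses an XML string and returns the formatted form.
    static String prettyPrint(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return toString(document);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or format XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(prettyPrint("<root><child>text</child></root>"));
    }
}
```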

Java: Spelling Correction Program

I found a high-performance spelling correction algorithm in How to Write a Spelling Corrector, so I would like to introduce it here. The linked page also explains the theoretical background and the code, so it should be a useful reference. I have rewritten the Java version of the code from here in my own way.

    package com.dukesoftware.spellcorrector;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class SpellCorrector {

        public static void main(String args[]) throws IOException {
            Map<String, Integer> nWords = new HashMap<String, Integer>() {{
                put("spell", 1);
            }};
            SpellCorrector spellCorrector = new SpellCorrector(nWords);
            System.out.println(spellCorrector.correct("superl"));
        }

        private final Map<String, Integer> nWords;
        private final String[] a_to_z;

        public SpellCorrector(Map<String, Integer> nWords) throws IOException {
            this.nWords = nWords;
            this.a_to_z = createAtoZStringArray();
        }

        private Str

Google App Engine Java: Setup Test Configuration

If you want to test a class that depends on Google App Engine infrastructure, such as data storage functionality with "PersistenceManagerFactory", you should set up LocalServiceTestHelper before testing. I won't go into much detail, but the code below shows the minimum test setup. This code was enough for me to test my data access object functionality.

    package com.dukesoftware.gaej.test;

    import org.junit.AfterClass;
    import org.junit.BeforeClass;
    import org.junit.Test;

    import com.google.appengine.tools.development.testing.LocalDatastoreServiceTestConfig;
    import com.google.appengine.tools.development.testing.LocalServiceTestHelper;

    public class GoogleAppEngineTest {

        private static final LocalServiceTestHelper helper =
            new LocalServiceTestHelper(new LocalDatastoreServiceTestConfig());

        @BeforeClass
        public static void setUp() {
            helper.setUp();
        }

        @AfterClass
        public static void tearDown() {
            he

Google App Engine Task Queue - Using DeferredTask

When to Use Google App Engine Task Queue?

You should use a task queue when you need to do something time-consuming, such as storing or updating a large amount of data in the datastore. In Google App Engine, an HTTP request that does not return before the timeout simply fails. The timeout configured on Google App Engine is 60 seconds. Please see https://developers.google.com/appengine/articles/deadlineexceedederrors . So, in order to respond to the client quickly, you should use Task Queue for tasks that take a long time.

Push Tasks to Task Queue

I read the official Task Queue documentation and tried it out. What frustrated me is that the document only explains how to create a task with parameters, push the task to the task queue, and pass it to the worker servlet; i.e. the actual heavy work is handled in the worker servlet. Of course that is fine, but I felt it was a somewhat indirect way to push tasks to the queue. What I wanted to do was create a task object and push it to the queue in one single

Java: Get All Static Fields Defined in Class by Reflection

If you want to get all static fields in a class, use the code snippet below. The key parts are the "Modifier.isStatic" and "getDeclaredFields" methods.

    package com.dukesoftware.reflection;

    import java.lang.reflect.Field;
    import java.lang.reflect.Modifier;
    import java.util.Arrays;
    import java.util.List;
    import java.util.stream.Collectors;

    public class ReflectionUtils {

        public static List<Field> getStaticFields(Class<?> clazz) {
            Field[] declaredFields = clazz.getDeclaredFields();
            return Arrays.stream(declaredFields)
                .filter(field -> Modifier.isStatic(field.getModifiers()))
                .collect(Collectors.toList());
        }
    }
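
The following sketch applies the same filter to a small sample class to show which fields are kept. The names StaticFieldsSketch, Sample and staticFieldNames are hypothetical. Note that getDeclaredFields does not guarantee any particular order and does not include inherited fields, so the example collects the names into a set.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class StaticFieldsSketch {

    // Hypothetical sample class with two static fields and one instance field.
    static class Sample {
        static int COUNT = 0;
        static final String NAME = "sample";
        int instanceField;
    }

    // Same logic as ReflectionUtils.getStaticFields in the post.
    static List<Field> getStaticFields(Class<?> clazz) {
        return Arrays.stream(clazz.getDeclaredFields())
                .filter(field -> Modifier.isStatic(field.getModifiers()))
                .collect(Collectors.toList());
    }

    // Collects only the names, ignoring declaration order.
    static Set<String> staticFieldNames(Class<?> clazz) {
        return getStaticFields(clazz).stream()
                .map(Field::getName)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // Expected to contain COUNT and NAME, but not instanceField.
        System.out.println(staticFieldNames(Sample.class));
    }
}
```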

Java Reflection: Getter and Setter Method

Reflective Getter and Setter Method

In this post, I will show you Java code for obtaining the getter and setter methods of a given object. The code is below:

    import java.lang.reflect.Method;

    public final static Method getSetterMethod(Object o, String propertyName, Class<?> paramterType) throws SecurityException, NoSuchMethodException {
        return o.getClass().getMethod("set"+toUpperFirstChar(propertyName), paramterType);
    }

    public final static Method getGetterMethod(Object o, String propertyName) throws SecurityException, NoSuchMethodException {
        return o.getClass().getMethod("get"+toUpperFirstChar(propertyName));
    }

    public final static String toUpperFirstChar(String str) {
        if(str.isEmpty()) return "";
        return str.substring(0, 1).toUpperCase()+str.substring(1, str.length());
    }

Where to Use Reflective Getter and Setter?

You may have a question - "Where should we use reflective getter and setter methods?" If you already know where to use them, you can skip the fo
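
To make the lookup concrete, here is a minimal sketch that sets a property through the reflected setter and reads it back through the reflected getter. The class name ReflectiveAccessSketch, the Person bean and the roundTrip helper are hypothetical; the three lookup methods follow the code in the post. Class.getMethod only finds public methods, so the bean and its accessors are declared public.

```java
import java.lang.reflect.Method;

public class ReflectiveAccessSketch {

    // Hypothetical bean used only for this example.
    public static class Person {
        private String name;

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }
    }

    static Method getSetterMethod(Object o, String propertyName, Class<?> parameterType)
            throws NoSuchMethodException {
        return o.getClass().getMethod("set" + toUpperFirstChar(propertyName), parameterType);
    }

    static Method getGetterMethod(Object o, String propertyName) throws NoSuchMethodException {
        return o.getClass().getMethod("get" + toUpperFirstChar(propertyName));
    }

    static String toUpperFirstChar(String str) {
        if (str.isEmpty()) {
            return "";
        }
        return str.substring(0, 1).toUpperCase() + str.substring(1);
    }

    // Sets "name" through the setter and reads it back through the getter.
    static String roundTrip(String value) {
        try {
            Person person = new Person();
            getSetterMethod(person, "name", String.class).invoke(person, value);
            return (String) getGetterMethod(person, "name").invoke(person);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Alice")); // prints Alice
    }
}
```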

The Simplest Setup to Use Amazon Product API in Java

In this post, I will explain the simplest setup for using the Amazon Product API in Java. I know Amazon provides a SOAP interface and that we can automatically generate SOAP code for accessing the Amazon Product API. But sometimes that is too much for light users. So I will show you the simplest way to use the Amazon Product API, using SignedRequestsHelper. I hope this post helps your application development ;)

Preparation

- Create an AWS account
- Get an AWS key, secret key and associate tag: I think you can create them from http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSGettingStartedGuide/AWSCredentials.html
- Get SignedRequestsHelper: You can download it from https://code.google.com/p/amazon-product-advertising-api-sample/source/browse/src/com/amazon/advertising/api/sample/SignedRequestsHelper.java

Java Code Example

Congratulations! After you finish the above preparation, you can send requests to the Amazon Product API!! The following code shows how to use SignedRequestsHelper. The key points of the examp

Switch Html View Based on Devices Using Filter

Introduction

In this post, I will explain how to switch the view of a web application based on the device, such as PC, smartphone or tablet. The basic strategy is to check the user agent sent from these devices and build the HTML specific to the device. The key issue is how to check the user agent of all incoming HTTP requests efficiently and elegantly. If you use a web application framework such as the Java Spring Framework or the PHP Symfony framework, the solution is to use a "Filter", which every HTTP request passes through. Let me show you an example implementation using the Spring framework.

Basic Strategy

In the filter:

- check if "view_mode" is in the user's cookies and, if so, set the view mode based on that cookie value
- if the cookie does not exist, set the view mode based on the user agent and publish the corresponding view mode cookie
- set the selected view mode as an HttpServletRequest attribute.

In the controller, set the view template based on the view mode in the HttpServletRequest attribute
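
The decision logic of this strategy can be sketched in plain Java without the Servlet or Spring APIs. Everything in the sketch is an assumption made for illustration: the class name ViewModeSketch, the ViewMode values, and in particular the very rough user agent checks. A real filter would read the "view_mode" cookie and the User-Agent header from the HttpServletRequest, pass them to a method like resolve, and store the result as a request attribute.

```java
import java.util.Locale;

public class ViewModeSketch {

    enum ViewMode { PC, SMARTPHONE, TABLET }

    // Very rough user agent heuristics, for illustration only (assumption).
    static ViewMode fromUserAgent(String userAgent) {
        if (userAgent == null) {
            return ViewMode.PC;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        if (ua.contains("ipad")) {
            return ViewMode.TABLET;
        }
        if (ua.contains("android")) {
            return ua.contains("mobile") ? ViewMode.SMARTPHONE : ViewMode.TABLET;
        }
        if (ua.contains("iphone")) {
            return ViewMode.SMARTPHONE;
        }
        return ViewMode.PC;
    }

    // The cookie value wins when it names a known mode; otherwise fall back to the user agent.
    static ViewMode resolve(String viewModeCookie, String userAgent) {
        if (viewModeCookie != null) {
            try {
                return ViewMode.valueOf(viewModeCookie.trim().toUpperCase(Locale.ROOT));
            } catch (IllegalArgumentException e) {
                // Unknown cookie value: ignore it and use the user agent instead.
            }
        }
        return fromUserAgent(userAgent);
    }

    public static void main(String[] args) {
        String iphone = "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X)";
        System.out.println(resolve(null, iphone));     // prints SMARTPHONE
        System.out.println(resolve("tablet", iphone)); // prints TABLET
    }
}
```

Keeping the decision in a pure method like this makes it easy to unit test separately from the filter.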