Wednesday, May 18, 2011

Diffing Android Applications

Hi !

During the last month, I tried to implement a new module to do a diff between two android applications. Current tools, like BinDiff, patchdiff, turbodiff ... don't support this feature.

If you would like to diff two applications, the classical first step is to extract the Control Flow Graph, and it's already done in Androguard. After that, you must be able to compare two graphs (the basic blocks are nodes), and it's not an easy task :) But in our laboratory we have worked on topics like graph matching algorithm, by using the information in nodes with the Normalized Compression Distance. The general idea is if you compress an object X, the length of the compression is close to the length of the compression of an object XX. So, if two objects are not quite different, the result of the NCD will be close to 0.0, otherwise if two objects are totaly different, the result will be close to 1.0.

In the repository of Androguard, I added a C library (called "libsimilarity") with python wrapper (with ctypes) to perform such operations, and to switch dynamically the compression algorithm (for now, ZLIB, BZ2, LZMA, XZ, SNAPPY are supported) :

Of course it's not easy to use the NCD, for example, if you try to compare images you will have bad results, and this is the same for assembly instructions or bytecodes instructions, this is why you must have a filter to transform your original information into a neutral form.

Between two applications, the first step is to sort methods to find matching methods (close methods or identical methods). By using Snappy, it's possible to compare globaly and quickly methods with the NCD. So, for a method we have a checksum on instructions to remove identical methods, and we use the signature algorithm to extract a simple representation of a method :

in order to filter and identify close methods to perform the second step. When we have close methods, we extract the CFG of each method, and we identify close basic blocks with the previous algorithm, but by using a simple filter on instructions. When all basic blocks are associated, we apply the longest common subsequence algorithm to extract which instructions have been added or removed.

The final step is to visualize the differences between the two applications. Classicaly, tools used two CFG to display differences, but last year at, Aureliano Calvo from Core Security Technologies shows a tool called aureliax which transforms two CFG into one to display differences. So we have used the same idea in our differences visualization algorithm.

For example, we can patch a java source code of a simple application, and add a new method :

And with the Androdiff tool, it's possible to view differences between the original application, and the new one with previous algorithms :

where only one instruction has been added (with no modification of the control flow) (green color for added instructions, red color for remove instructions) :

or where instructions change the CFG :

or when we have a new method :

For now, the tool is available on the Androguard webpage, all parameters can be changed to add/change filtering or similarity algorithms, but we need to improve it with more examples and documentations.

The whitepaper which describes all algorithms is coming soon, "stay tuned" for new examples :)
See ya !


  1. Excellent tool....Need to try this one out..I'll post my comments once I've tried this tool..In the meantime, thanks for posting this detailed post.

  2. Very beautifully explained.Thanks for such a useful piece of information.
    Android App Design