Tuesday, May 31, 2011

Similarity of android applications or "rip-off indicator"

Hi!

Using the algorithms described in the previous post, it's possible to detect whether an application is very close to another one. This can be really useful in many situations, such as:
  • checking whether your application has been stolen: for now it's very easy to rip off an application from the Android Market, crack/re-package it with smali/baksmali/apktool, and re-inject it into the market,
  • checking whether your obfuscator actually does its job (ProGuard? :)),
  • identifying the new methods injected by malware.
We can measure similarity between:
  • methods (like described in the previous post),
  • Android/Java API,
  • strings,
  • constants (integer, float, ...),
  • variable initialisations,
  • exceptions,
  • control flow graph,
  • fill-array-data payloads,
  • ...
For now, the program (androsim.py, available in the androguard repository) uses only the first three points (methods, Android/Java API and strings), and it calculates the inclusion similarity (as a percentage) of the first application inside the second one.
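To make "inclusion similarity" concrete, here is a minimal sketch on plain feature sets (for example the strings or API calls of each application). It only illustrates the idea that the measure is asymmetric: it answers "how much of the first app is found in the second", not the reverse. The function name `inclusion_similarity` and the toy feature sets are hypothetical; this is not androsim.py's actual code.

```python
def inclusion_similarity(first, second):
    """Hypothetical helper: percentage of the features of `first` also found in `second`."""
    first, second = set(first), set(second)
    if not first:
        # Assumed convention: an empty feature set is trivially included.
        return 100.0
    return 100.0 * len(first & second) / len(first)


# Hypothetical feature sets (API calls + strings) extracted from two apps.
app1 = {"Landroid/app/Activity;->onCreate", "Hello"}
app2 = {"Landroid/app/Activity;->onCreate", "Hello",
        "Landroid/telephony/SmsManager;->sendTextMessage", "http://example.com"}

print(inclusion_similarity(app1, app2))  # app1 fully contained in app2
print(inclusion_similarity(app2, app1))  # only half of app2 found in app1
```

Because the score is normalised by the size of the first set, a repackaged application that adds code on top of the original still includes 100% of the original.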

Let's see it in action and look at the first results:

With two nearly identical applications:

The display is very basic; you can see:
  • DIFF METHODS: how many methods were actually compared because they are similar but not identical,
  • NEW METHODS: how many methods are totally new in the second application,
  • MATCH METHODS: how many methods matched perfectly,
  • DELETE METHODS: how many methods of the first application have been deleted (they are missing from the second one).
The last two lines are:
  • the marks used to calculate the final score (0.0 is a good mark, 1.0 a bad one),
  • the final percentage score (100.0 means that the applications are the same).
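The four categories and the per-method marks can be illustrated with a small sketch. It assumes, as a stand-in for the algorithms of the previous post, a compression-based distance (normalized compression distance with Python's `zlib`) between byte "signatures" of methods. The function names `ncd` and `compare_methods`, the 0.4 threshold, the 1.0 mark for deleted methods and the averaging used for the final score are all hypothetical choices, not androsim.py's real implementation.

```python
import zlib


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: close to 0.0 for similar inputs, near 1.0 for unrelated ones."""
    cx, cy = len(zlib.compress(x, 9)), len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)


def compare_methods(first, second, threshold=0.4):
    """Hypothetical comparison of two apps given as {method_name: signature_bytes}.

    Returns (categories, score) where score is a percentage (100.0 = same app).
    """
    result = {"MATCH": [], "DIFF": [], "NEW": [], "DELETE": []}
    marks = []
    used = set()  # methods of `second` already paired with a method of `first`

    for name, body in first.items():
        # 1. Identical signature somewhere in the second app -> perfect match.
        exact = next((n for n, b in second.items() if b == body and n not in used), None)
        if exact is not None:
            used.add(exact)
            result["MATCH"].append(name)
            marks.append(0.0)
            continue
        # 2. Otherwise compare with the remaining methods using the distance.
        candidates = [(ncd(body, b), n) for n, b in second.items() if n not in used]
        if candidates:
            dist, best = min(candidates)
            if dist <= threshold:
                used.add(best)
                result["DIFF"].append(name)
                marks.append(dist)
                continue
        # 3. No close method left: considered deleted (worst mark, an assumption).
        result["DELETE"].append(name)
        marks.append(1.0)

    # Methods of the second app never paired with anything are new.
    result["NEW"] = [n for n in second if n not in used]
    score = 100.0 * (1.0 - sum(marks) / len(marks)) if marks else 100.0
    return result, score


# Usage: a toy app and the same app with one injected method.
app = {"onCreate": b"invoke-super return-void",
       "onClick": b"const-string invoke-virtual return-void"}
infected = dict(app, sendSms=b"invoke-static SmsManager sendTextMessage")
categories, score = compare_methods(app, infected)
print(categories, score)
```

Pairing each method at most once keeps a single method of the second application from "absorbing" several methods of the first, and exact matches are checked before the distance because a real compressor rarely returns exactly 0.0, even for identical inputs.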

With two different applications:

With an original application and the same application "obfuscated" by ProGuard:

After the "obfuscation", the applications are almost identical (the small differences come from ProGuard's optimizer) because there is no real obfuscation in this software... but is there a good Java obfuscator? :)

With infected applications:

These two applications have been infected by malware, which is why we see:
  • identical methods,
  • new methods (the methods of the malware!!).
As I said, the algorithm still needs to be improved with new tests, documentation... but I think it's possible to do the same thing with classical assembly applications, because the algorithm is very generic.

The whitepaper describing all the algorithms is coming soon; "stay tuned" for new examples :)

See ya !!!

1 comment:

  1. Hi,
    Very interesting tool, I'm trying to learn how to use it.

    About this comparison feature, I think that giving an indicator like percentage of matches/diffs/new methods could be useful (just a simple suggestion)

    Thank you for creating this tool (and for your GO lessons by the way ;-D )