Monthly Archives: January 2019

Simple Python diffing script

One of the first things to do when analyzing an n-day vulnerability in open-source software is to locate the patched code. Sometimes this is a trivial step: you just look through the commits in the project's git repository. In other cases, however, the security patch is deliberately obscured, or there is no repository available at all.

When a folder contains many files, possibly spread across subfolders, identifying the changed files one by one by hand is not only boring but also inefficient.

At first I looked for existing software that solves this problem, but after one disappointing attempt I decided to write a simple Python script that does the job.

The script is not intended to be perfect, and it has some limitations by design, but it provides an extendable baseline. I believe there are better solutions to this problem, so if you have a tip, don't hesitate to leave a comment about what you use.
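As one illustration of such an alternative, Python's standard library has filecmp.dircmp, which can walk two directory trees and report changed and missing entries. The sketch below is not the script from this post, and the helper name changed_files is made up for illustration. One caveat: dircmp compares files shallowly by default, so two files with the same type, size and modification time are assumed equal, and it can miss a change that a hash comparison would catch.

```python
import filecmp
import os


def changed_files(dir1, dir2):
    """Yield (kind, relative_path) pairs for entries that differ between two trees.

    Hypothetical helper for illustration; relies on filecmp.dircmp's default
    (shallow) comparison, so it is weaker than comparing content hashes.
    """
    def walk(cmp, rel):
        # Files present on both sides whose contents were judged different.
        for name in cmp.diff_files:
            yield ("changed", os.path.join(rel, name))
        # Entries (files or directories) that exist on one side only.
        for name in cmp.left_only:
            yield ("only in dir1", os.path.join(rel, name))
        for name in cmp.right_only:
            yield ("only in dir2", os.path.join(rel, name))
        # Recurse into subdirectories that exist on both sides.
        for name, sub in cmp.subdirs.items():
            yield from walk(sub, os.path.join(rel, name))

    yield from walk(filecmp.dircmp(dir1, dir2), "")
```

Calling `for kind, path in changed_files(old_dir, new_dir): print(kind, path)` would list every differing entry relative to the two roots.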

Here is an example of the output of cmpfolder.py from a real-world case:

python cmpfolder.py _Drupal/2018-075/gathercontent/ _Drupal/2018-075/gathercontent-2/ -d
_Drupal/2018-075/gathercontent/gathercontent.install   <--->  _Drupal/2018-075/gathercontent-2/gathercontent.install
441,448d440
<  * Fix permissions on views.
<  */
< function gathercontent_update_7311() {
<   ctools_include('object-cache');
<   ctools_object_cache_clear('view', 'mapping');
< }
< 
< /**

_Drupal/2018-075/gathercontent/gathercontent.module   <--->  _Drupal/2018-075/gathercontent-2/gathercontent.module
540c540
<             $handler = entity_translation_get_handler('node', $node->value());
---
>             $handler = entity_translation_get_handler('node', $node);

_Drupal/2018-075/gathercontent/gathercontent.info   <--->  _Drupal/2018-075/gathercontent-2/gathercontent.info
37,38c37,38
< ; Information added by Drupal.org packaging script on 2018-11-28
< version = "7.x-3.5"
---
> ; Information added by Drupal.org packaging script on 2018-08-07
> version = "7.x-3.4"
41c41
< datestamp = "1543419184"
---
> datestamp = "1533634986"

_Drupal/2018-075/gathercontent/views/gathercontent.views_default.inc   <--->  _Drupal/2018-075/gathercontent-2/views/gathercontent.views_default.inc
21,22c21
<   $handler->display->display_options['access']['type'] = 'perm';
<   $handler->display->display_options['access']['perm'] = 'administer gathercontent';
---
>   $handler->display->display_options['access']['type'] = 'none';
263,264c262
<   $handler->display->display_options['access']['type'] = 'perm';
<   $handler->display->display_options['access']['perm'] = 'administer gathercontent';
---
>   $handler->display->display_options['access']['type'] = 'none';

The script

import os
import hashlib
import argparse
import subprocess

parser = argparse.ArgumentParser()
parser.add_argument("dir1", help="directory1", type=str, action="store")
parser.add_argument("dir2", help="directory2", type=str, action="store")
parser.add_argument("-d", action="store_true", required=False, help="perform diff on files")

args = parser.parse_args()

def getFileHash(path):
    # Hash the file in chunks so large files are not read into memory at once.
    h = hashlib.sha256()
    with open(path, "rb") as fH:
        for chunk in iter(lambda: fH.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def findDiff(src, dst):
    srcContent = set(os.listdir(src))
    dstContent = set(os.listdir(dst))

    # Walk the union of both listings so entries missing on either side are reported.
    for i in sorted(srcContent | dstContent):
        srcPth = os.path.join(src, i)
        dstPth = os.path.join(dst, i)

        if i not in srcContent:
            print("{0} is missing".format(srcPth))
        elif i not in dstContent:
            print("{0} is missing".format(dstPth))
        elif os.path.isfile(srcPth) and os.path.isfile(dstPth):
            if getFileHash(srcPth) != getFileHash(dstPth):
                print("\n{0}   <--->  {1}".format(srcPth, dstPth))
                if args.d:
                    # Pass the paths as a list (no shell), so spaces and
                    # shell metacharacters in file names are handled safely.
                    subprocess.call(["diff", srcPth, dstPth])
        elif os.path.isdir(srcPth) and os.path.isdir(dstPth):
            findDiff(srcPth, dstPth)
        else:
            print("{0} and {1} differ in type".format(srcPth, dstPth))

findDiff(args.dir1, args.dir2)
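
The -d option relies on an external diff binary, which may not be available on every system (Windows, for example). A possible fallback, sketched here under the assumption that the compared files are text, is the standard library's difflib module. The helper name diff_text is hypothetical, and the output is in unified format rather than the normal diff format shown in the example output above.

```python
import difflib


def diff_text(path_a, path_b):
    """Return a unified diff of two text files as one string ("" if identical).

    Hypothetical helper for illustration. Undecodable bytes are replaced
    rather than raising, so binary files produce noisy but harmless output.
    """
    with open(path_a, encoding="utf-8", errors="replace") as fa:
        a_lines = fa.readlines()
    with open(path_b, encoding="utf-8", errors="replace") as fb:
        b_lines = fb.readlines()

    # unified_diff yields header and hunk lines lazily; join them into one string.
    return "".join(
        difflib.unified_diff(a_lines, b_lines, fromfile=path_a, tofile=path_b)
    )
```

In the script, the call to the external diff could then be swapped for something like `print(diff_text(srcPth, dstPth))`.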