QA Design Gurus: Huge Logs’ Comparison made easy!

May 10, 2016



Log comparison, or log analysis, is an integral part of verifying test results. It can be a daily routine when testing the runtime behaviour of applications, databases or drivers, or when verifying the logs generated by automation frameworks.

But this process becomes tedious and cumbersome when the log files get huge (say 30K-40K lines). Though there are many handy text comparison tools like ‘Beyond Compare’ and ‘WinMerge’, the sheer length of the file makes the comparison a visual pain and prone to human error.

The complexity rises further if the files under comparison differ in their alignment. Log files generally contain dates, build numbers, machine names and the names of the databases, servers or apps they connect to. These differ from run to run and add to the list of differences that the diff tools show.

“Divide and Rule” could be the mantra for simplifying things quickly in such cases. As a quick workaround, one can split the huge file on a pattern into small, manageable files so that comparing them with the tools gets easier. This minimizes the impact of differences in the alignment or order of the text in the files under comparison, because a shift in one chunk no longer throws off everything that follows it.
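To make the idea concrete, here is a minimal Python sketch of splitting a log on a pattern. It is only an illustration, not how any particular splitter tool works: the function names `split_log` and `write_chunks`, the `chunk_0001.log` naming scheme and the example marker pattern are all assumptions made for this example.

```python
import re
from pathlib import Path


def split_log(lines, start_pattern):
    """Group lines into chunks; a new chunk starts at every line matching start_pattern."""
    marker = re.compile(start_pattern)
    chunks = []
    current = []
    for line in lines:
        # Close the running chunk when a new marker line shows up.
        # The 'current' check avoids an empty first chunk.
        if marker.search(line) and current:
            chunks.append(current)
            current = []
        current.append(line)
    if current:
        chunks.append(current)
    return chunks


def write_chunks(chunks, out_dir, prefix="chunk"):
    """Write each chunk to out_dir as prefix_0001.log, prefix_0002.log, and so on."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, chunk in enumerate(chunks, start=1):
        path = out / f"{prefix}_{index:04d}.log"
        path.write_text("".join(chunk), encoding="utf-8")
        paths.append(path)
    return paths
```

Assuming each test case in the log starts with a line like `=== TEST ...`, one would read both logs with `readlines()`, call `split_log(lines, r"^=== TEST")` on each, write the chunks into two folders and then point the diff tool at the folders. Anything before the first marker ends up in its own leading chunk.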



There are, again, any number of file splitter tools available. “GSplit” is one such tool that I tried, and it was simple and easy to use. It can split based on a pattern, or based on the number of lines we intend to put in each chunk, and so on. The good part is that it can also unite the pieces back into one big file.
The tool provides a command line option too, and hence can be integrated into automation or continuous integration.

More details about the tool here.

Every problem has multiple solutions, and which one to pick depends on any number of factors. Sometimes we want an easier, quicker way of solving the problem (though not one recommended for the long term); at other times we want to take the time to research, investigate and find the most robust solution possible. The “Divide and Rule” strategy for this problem of huge file comparison belongs to the short term, ‘get it quick’ category of solutions.

Ideally, we should avoid generating such huge files where possible, especially from an automation framework. Where that is unavoidable, or the logging cannot be changed, we may also look at long term options in which the human eye is not involved in the comparison at all. Instead, we should just get filtered, semantic, rule based differences between the two files: all automated!
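As a rough idea of what such a rule based comparison might look like, the Python sketch below masks the values that change on every run, drops lines we do not care about, and only then diffs the two logs with the standard `difflib` module. The rules in `MASK_RULES` and `IGNORE_RULES`, and the log format they assume (ISO-style timestamps, `build 1234`, `host=...`), are purely hypothetical and would have to be adapted to the real logs.

```python
import difflib
import re

# Hypothetical masking rules (pattern, placeholder); adapt them to your log format.
MASK_RULES = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}"), "<TIMESTAMP>"),
    (re.compile(r"\bbuild[ #:]*\d+(?:\.\d+)*", re.IGNORECASE), "<BUILD>"),
    (re.compile(r"\bhost=\S+"), "host=<HOST>"),
]

# Hypothetical ignore rules: a line matching any of these is dropped entirely.
IGNORE_RULES = [
    re.compile(r"^\s*$"),      # blank lines
    re.compile(r"\bDEBUG\b"),  # debug chatter
]


def normalize(lines):
    """Drop ignored lines and replace run-specific values with placeholders."""
    result = []
    for line in lines:
        line = line.rstrip("\n")
        if any(rule.search(line) for rule in IGNORE_RULES):
            continue
        for pattern, placeholder in MASK_RULES:
            line = pattern.sub(placeholder, line)
        result.append(line)
    return result


def filtered_diff(old_lines, new_lines):
    """Return unified-diff lines for the differences that survive the rules."""
    return list(
        difflib.unified_diff(
            normalize(old_lines),
            normalize(new_lines),
            fromfile="old",
            tofile="new",
            lineterm="",
            n=0,  # no context lines, only the changes themselves
        )
    )
```

An empty list from `filtered_diff` means the two logs agree once the rules are applied, so an automated run can simply fail whenever the list is non-empty and print it. Keeping the rules as plain data means a new noisy field only needs one more entry in the list.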



Your Thoughts??
