Basics: Binary files comparing

Preface...

Construction of a small-size patch module is closely related to the number of changes brought to the new version versus an old one, and also to the efficiency of the methods, capable to determine these changes and present them in a compact form. Well-known algorithms of text data comparing cannot help in comparing of the binary data when the structure of files is completely unknown in advance, and the nature of these changes can be poorly predicted. Attempt to apply LCS-search algorithms (Largest Common Subsequence) also have shown their incompetence. Comparison of binary files requires a special approach which is successfully implemented in PatchFactory3. In comparison with the previous versions of the program (1.x and 2.x) we have managed to greatly increase the quality and the speed of comparing process - presented in version 3 of PatchFactory.

 

The basis of the algorithm is the ability to find concurrence in compared files at a level of octet-byte subsequences. Such byte-oriented nature of the used algorithm allows to make the efficiency of its work independent of a file format. The size of the resulting difference file, in the view of time spent on its construction is assumed as the main criterion of patch-building algorithm efficiency.

 

Key features of new algorithm:

· Significant improvement of algorithm quality parameters (generated difference files are smaller, work speed is greater);
· Special optimization providing considerable raise of comparison quality for executable modules (exe, dll);
· Comparing files with size up to 2^63 bytes;
· Flexible control under ratio "time/result" with the help of different comparing methods selection;
· Setting utilized resources constraints (memory size, process priority);
· Caching of comparing results. Storage of comparing results in intermediate files allows to significantly reduce the time of patch building.

 

The comparing algorithm is implemented as a separate console program dfbuild.exe In PatchFactory v3. Accepting as entrance parameters the file to be compared (old and new), dfbuild constructs a difference between them and saves it in a df-file.

 

The table below contains the examples (implemented for well-known software products main exe-files), illustrating the efficiency of df-files in comparison with LZMA-compression. Percentage values show the ratio of the LZMA-compressed file and df-file sizes to the size of a new file.

 

 

 

File

New file size, bytes

LZMA-compression

DF-file size

bytes

%

bytes

%

RAR.exe
[3.40 beta3] [3.42]

297 472

123 733

41.6

10 291

3.5

Winamp.exe
[5.04] [5.05]

980 480

303 275

30.9

12 240

1.2

TheBat.exe
[3.4.0.933] [3.5.0.1013]

8 955 464

2 632 520

29.4

1 493 827

16.7

HelpMan.exe
[3.4] [3.5]
(Help&Manual)

5 139 456

1 704 393

32.2

284 577

5.5

SPECCTRA.exe
[10.2] [15.2]
(SPECCTRA ShapeBased Automation Software

13 307 978

4 190 654

31.5

1 245 494

9.4

Fireworks.exe
[7.0.0.288] [7.0.2.295]

14 696 448

4 861 690

33.1

400 728

2.7

 

Converted from CHM to HTML with chm2web Standard 2.75 (unicode)