Skip to content
Snippets Groups Projects
Commit 48ad5ab7 authored by Alex Deymo's avatar Alex Deymo
Browse files

Select SuffixArray size at runtime.

The suffix array is used to search for strings from the new file in the
old file. The size of each element of the suffix array must be big
enough to hold the length of the old file. In most of the cases we use,
if not all of them, the old file is smaller than 2 GiB, which allows to
use 32-bit integers for the suffix array.

The previous code decided whether to use 64-bit or 32-bit integers
based on the large-file surpport, even in 32-bit computers architectures
which would not work in those cases; and would also use twice as much
memory when constructing the suffix array of inputs smaller than 2 GiB.

This patch selects the suffix array size based on the input size,
effectively using half the memory in the most common case. This also
improves the runtime performance due to lower cache pressure.

As a side effect, this patch hides the suffix array implementation
details from the public interface. This removes the limitation where
the caller of libbsdiff and libbsdiff needed to be compiled with the
same large-file support flags; and also hides the dependency on
libdivsufsort from the public interface.

While doing so, instead of re-implementing the suffix array search
algorithm (which had a history of bugs and corner cases) we re-use the
one from libdivsufsort, adapted to the bsdiff algorithm use-case.

The result is about 40% faster when diffing 10MiB files, and the patch
size is within 0.1%, due to differences in the position selected
whenever more than one match is available.

Bug: 34220646
Test: Added unittests for the suffix array.

Change-Id: Ib9dc85a83bbf10157a3937c5687e5663ab0d20d5
parent 52b6bae3
No related branches found
No related tags found
No related merge requests found
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment