Select SuffixArray size at runtime.
The suffix array is used to search for strings from the new file in the old file. The size of each element of the suffix array must be big enough to hold the length of the old file. In most of the cases we use, if not all of them, the old file is smaller than 2 GiB, which allows to use 32-bit integers for the suffix array. The previous code decided whether to use 64-bit or 32-bit integers based on the large-file surpport, even in 32-bit computers architectures which would not work in those cases; and would also use twice as much memory when constructing the suffix array of inputs smaller than 2 GiB. This patch selects the suffix array size based on the input size, effectively using half the memory in the most common case. This also improves the runtime performance due to lower cache pressure. As a side effect, this patch hides the suffix array implementation details from the public interface. This removes the limitation where the caller of libbsdiff and libbsdiff needed to be compiled with the same large-file support flags; and also hides the dependency on libdivsufsort from the public interface. While doing so, instead of re-implementing the suffix array search algorithm (which had a history of bugs and corner cases) we re-use the one from libdivsufsort, adapted to the bsdiff algorithm use-case. The result is about 40% faster when diffing 10MiB files, and the patch size is within 0.1%, due to differences in the position selected whenever more than one match is available. Bug: 34220646 Test: Added unittests for the suffix array. Change-Id: Ib9dc85a83bbf10157a3937c5687e5663ab0d20d5
Showing
- Android.bp 2 additions, 0 deletionsAndroid.bp
- Makefile 3 additions, 1 deletionMakefile
- bsdiff.cc 39 additions, 75 deletionsbsdiff.cc
- bsdiff.gyp 2 additions, 0 deletionsbsdiff.gyp
- include/bsdiff/bsdiff.h 12 additions, 19 deletionsinclude/bsdiff/bsdiff.h
- include/bsdiff/suffix_array_index_interface.h 36 additions, 0 deletionsinclude/bsdiff/suffix_array_index_interface.h
- suffix_array_index.cc 174 additions, 0 deletionssuffix_array_index.cc
- suffix_array_index.h 22 additions, 0 deletionssuffix_array_index.h
- suffix_array_index_unittest.cc 79 additions, 0 deletionssuffix_array_index_unittest.cc
Loading
Please register or sign in to comment