Opened 11 years ago

Last modified 11 years ago

#45 new enhancement

File diff performance patch (reduced disk IO and wall time)

Reported by: Alex Harper
Owned by: ben
Priority: normal
Milestone: 0.12
Component: bbackupd
Version: trunk
Keywords:
Cc:


The enclosed patch (tested against SVN revision 2104) changes the file diff logic with the following enhancements:

  • Files are read no more than twice (versus read again and again for every block size).
  • Before performing a rolling checksum, each server-side block is first checked (by MD5) at its previous location in the file. If the block has not changed or moved, the rolling checksum is skipped for that block.
  • Rolling checksums are searched in total-file-coverage order (size times number of blocks), favoring larger blocks in the final recipe.
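
For readers without the patch to hand, the second and third points can be sketched roughly as follows. This is a minimal illustration in Python, not code from the patch or from Box Backup: the names `Block`, `unchanged_blocks` and `sizes_by_coverage` are hypothetical, the file is held in memory for simplicity, and breaking coverage ties in favor of the larger block size is an assumption.

```python
import hashlib
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    """A server-side block as known from the previous upload (hypothetical type)."""
    offset: int  # position of the block in the previous version of the file
    size: int    # block length in bytes
    md5: bytes   # MD5 digest of the block contents


def unchanged_blocks(data: bytes, blocks: List[Block]) -> Tuple[List[Block], List[Block]]:
    """Split blocks into those still found at their old offset and the rest.

    Only the second list would need the more expensive rolling checksum search.
    """
    same, to_search = [], []
    for b in blocks:
        chunk = data[b.offset:b.offset + b.size]
        if len(chunk) == b.size and hashlib.md5(chunk).digest() == b.md5:
            same.append(b)
        else:
            to_search.append(b)
    return same, to_search


def sizes_by_coverage(blocks: List[Block]) -> List[int]:
    """Order block sizes by total coverage (size * count), largest first.

    Ties are broken in favor of the larger block size (an assumption).
    """
    coverage = defaultdict(int)
    for b in blocks:
        coverage[b.size] += b.size
    return sorted(coverage, key=lambda s: (coverage[s], s), reverse=True)
```

In this sketch, a block that passes the MD5 check at its old offset never reaches the rolling checksum stage, and the remaining blocks would be searched one size at a time in the order returned by `sizes_by_coverage`.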

In my testing, these changes reduce file diff wall time by a factor of 2 to 10 and make the diff process CPU bound (instead of IO bound).

The patch creates no new dependencies; it is only an algorithmic change. The code passes the existing unit tests. Additionally, I have tested it with my personal data for 3 weeks on OS X (i386) without incident.

Attachments (2)

BackupFileDiff.patch (46.7 KB) - added by Alex Harper 11 years ago.
BackupFileDiff.2361.patch (48.3 KB) - added by chris 10 years ago.
new version of 2361 patch, add testbbackupd fix

Change History (3)

comment:1 Changed 11 years ago by Alex Harper

Note: Patch not yet attached because of a problem with Trac attachment permissions. I've reported this to the admin, and will attach the patch ASAP.

Changed 11 years ago by Alex Harper

Attachment: BackupFileDiff.patch added

Changed 10 years ago by chris

Attachment: BackupFileDiff.2361.patch added

new version of 2361 patch, add testbbackupd fix
