Opened 12 years ago

Last modified 11 years ago

#8 new task

Improve handling of directories with many files

Reported by: Martin Ebourne Owned by: chris
Priority: normal Milestone: 0.12
Component: box libraries Version: trunk
Keywords: Cc:


Both the backup client and the server housekeeping code use inefficient directory indexing algorithms. As a result, excessive CPU usage is seen when handling directories containing many thousands of files.

Change History (2)

comment:1 Changed 12 years ago by ben

bbackupd: When a directory contains many files, rather too many compares are performed when searching for its entry in the directory listing retrieved from the server. This scales logarithmically.

bbstored: During housekeeping, bbstored reads every directory. Within each directory the contents are only scanned linearly a couple of times, so overall it should scale linearly with the number of directories (the number of files in them being far less important).

We do need to move to a reference-counted store to avoid all this scanning. But on reviewing the code, I don't think the excessive CPU usage on the server is due to inefficient handling of large directories.

comment:2 Changed 11 years ago by chris

Milestone: 0.20 → 0.12
Owner: set to chris
Version: 0.10 → trunk

bbackupd gets slower and slower when backing up a directory with many files. The problem appears to be that the directory is rewritten after each file is added, which is O(n²) in the number of files in the directory.
