Ticket #8 (new task)
Improve handling of directories with many files
| Reported by: | martin | Owned by: | chris |
|---|---|---|---|
| Priority: | normal | Milestone: | 0.12 |
| Component: | box libraries | Version: | trunk |
| Keywords: | | Cc: | |
Description
Both the backup client and the server housekeeping code use inefficient directory indexing algorithms. As a result, handling directories that contain many thousands of files causes excessive CPU usage.
Change History
comment:2 Changed 4 years ago by chris
- Owner set to chris
- Version changed from 0.10 to trunk
- Milestone changed from 0.20 to 0.12
bbackupd gets slower and slower when backing up a directory with many files. The problem appears to be that the directory is rewritten after each file is added, which is O(n^2) in the number of files in the directory.
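
To illustrate why rewriting the directory after every added file is quadratic, here is a small cost model in Python. It is a sketch, not bbackupd code: the function names are hypothetical, and it only counts how many directory entries would be serialized, assuming each rewrite writes out every entry currently in the directory.

```python
def entries_written_rewrite_each_time(n_files: int) -> int:
    """Entries serialized if the whole directory is rewritten after every add."""
    total = 0
    entries_in_dir = 0
    for _ in range(n_files):
        entries_in_dir += 1
        total += entries_in_dir  # one full rewrite of the directory
    return total


def entries_written_single_rewrite(n_files: int) -> int:
    """Entries serialized if the directory is written once after all adds."""
    return n_files


for n in (1_000, 10_000):
    print(n, entries_written_rewrite_each_time(n), entries_written_single_rewrite(n))
```

For n files the first count is n(n+1)/2, so a directory with ten times as many files costs roughly a hundred times as much rewriting work under this model.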

bbackupd : When a directory has lots of files, there are rather too many compares going on when searching for each file's entry in the directory listing retrieved from the server. This scales logarithmically.
bbstored : When housekeeping, bbstored reads every directory. Within each directory, the contents are scanned linearly only a couple of times, so overall it should scale linearly with the number of directories (the number of files in each directory is far less important).
We do need to move to a reference-counted store to avoid all this scanning. But on review of the code, I don't think the excessive CPU usage on the server is due to inefficient handling of large directories.
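
One way to cut the number of compares in the bbackupd lookup described above is to index the server's directory listing by name once and then look each local file up in that index. The Python sketch below is illustrative only. It assumes, for the sake of comparison, that the baseline is a plain in-order scan of the listing (the comment above does not say which search bbackupd uses), and the listing contents and function names are made up.

```python
def compares_with_scan(listing, name):
    """Count string compares needed to find `name` by scanning in order."""
    compares = 0
    for entry_name in listing:
        compares += 1
        if entry_name == name:
            break
    return compares


def build_index(listing):
    """Map each entry name to its position so a lookup is one hash probe."""
    return {entry_name: position for position, entry_name in enumerate(listing)}


# Hypothetical listing standing in for what the server returns.
listing = [f"file{i:05d}" for i in range(2000)]

# Baseline: look up every file by scanning the listing.
scan_total = sum(compares_with_scan(listing, name) for name in listing)

# Alternative: build the index once, then each lookup is a dict access.
index = build_index(listing)
found_all = all(listing[index[name]] == name for name in listing)

print(scan_total, len(index), found_all)
```

The index costs one pass over the listing to build and extra memory proportional to the number of entries, which is the trade-off against repeated scanning.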