source: box/trunk/docs/api-notes/backup_encryption.txt @ 2478

Revision 2478, 4.8 KB checked in by chris, 3 years ago (diff)

Rearrangement of api-notes directory.

TITLE Encryption in the backup system

This document explains how everything is encrypted in the backup system, and points to the functions which need reviewing to ensure that they actually follow this scheme.


SUBTITLE Security objectives

The crypto system is designed to keep the following things secret from an attacker who has full access to the server:

* The names of the files and directories
* The contents of files and directories
* The exact size of files

Things which are not secret are:

* Directory hierarchy and number of files in each directory
* How the files change over time
* Approximate size of files


SUBTITLE Keys

There are four separate keys used:

* Filename
* File attributes
* File block index
* File data

and an additional secret for file attribute hashes.

The cipher is Blowfish in CBC mode in all cases except the file data. All Blowfish keys are the maximum length of 448 bits, since the key size only affects the setup time and this is done very infrequently.

The file data is encrypted with AES in CBC mode, with a 256 bit key (max length). Blowfish is used elsewhere because the larger block size of AES (128 bits, against 64 bits for Blowfish), while more secure, would be terribly space inefficient. Note that Blowfish may also be used for the file data when older versions of OpenSSL are in use, and for backwards compatibility with older versions.
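To make the space argument concrete, here is a small Python sketch (an illustration only, not code from the backup system) comparing the PKCS-padded size of a short item under the two block sizes. The block sizes are properties of the ciphers; the helper name and the 5-byte example are made up for the illustration.

```python
# Illustrative only: PKCS-padded sizes for the two cipher block sizes.
BLOWFISH_BLOCK = 8   # bytes (64-bit block)
AES_BLOCK = 16       # bytes (128-bit block)


def padded_len(n: int, block_size: int) -> int:
    """Length of n bytes after PKCS padding, which always adds 1..block_size bytes."""
    return (n // block_size + 1) * block_size


if __name__ == "__main__":
    item = b"a.txt"  # hypothetical 5-byte item, e.g. a short filename
    for label, block in (("Blowfish", BLOWFISH_BLOCK), ("AES", AES_BLOCK)):
        print(label, padded_len(len(item), block))
```

For the 5-byte item this gives 8 bytes with Blowfish and 16 bytes with AES. Where a random IV is stored alongside the data, the IV is also one block long, so that cost doubles as well.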

The keys are generated using "openssl rand", and a 1k file of key material is stored in /etc/box/bbackupd. The configuration scripts make this readable only by root.
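As a rough illustration of this step, the following Python sketch writes 1k of random key material to a file that only its owner can read. It is not what the configuration scripts do (they use "openssl rand"); the use of Python's secrets module, the helper name and the file name are all assumptions made for the example.

```python
import os
import secrets
import stat
import tempfile

KEY_MATERIAL_SIZE = 1024  # "1k" of key material


def write_key_material(path: str) -> None:
    """Write random key material to a new file, accessible by the owner only."""
    data = secrets.token_bytes(KEY_MATERIAL_SIZE)
    # Create the file with mode 0600 from the start, so it is never world-readable.
    # O_EXCL makes this fail rather than overwrite an existing key file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "example-keys.raw")  # hypothetical file name
        write_key_material(path)
        print(os.path.getsize(path), oct(stat.S_IMODE(os.stat(path).st_mode)))
```

Refusing to overwrite an existing file is a deliberate choice in this sketch: losing the key material makes every existing backup unreadable.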

Code for review: BackupClientCryptoKeys_Setup()
in lib/backupclient/BackupClientCryptoKeys.cpp


SUBTITLE Filenames

Filenames need to be secret from the attacker, but they need to be compared on the server so that it can determine whether or not a file is a new version of an old file.

For this reason the same Initialisation Vector is used for every filename, so that the same filename encrypted twice has the same binary representation.

Filenames use standard PKCS padding as implemented by OpenSSL. They are preceded by two bytes of header which describe the length and the encoding.
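For reviewers unfamiliar with PKCS padding, this Python sketch shows the byte layout for an 8-byte (Blowfish) block. In the real code the padding is applied by OpenSSL, so these helpers are illustrative stand-ins. The two-byte header is left out, because its exact layout is not described here.

```python
def pkcs_pad(data: bytes, block_size: int = 8) -> bytes:
    """Append n bytes, each of value n, where 1 <= n <= block_size."""
    n = block_size - (len(data) % block_size)
    return data + bytes([n]) * n


def pkcs_unpad(padded: bytes, block_size: int = 8) -> bytes:
    """Remove PKCS padding, rejecting malformed input."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("length is not a positive multiple of the block size")
    n = padded[-1]
    if n < 1 or n > block_size or padded[-n:] != bytes([n]) * n:
        raise ValueError("bad padding")
    return padded[:-n]


if __name__ == "__main__":
    print(pkcs_pad(b"a.txt"))     # 5 bytes of data plus 3 bytes of value 3
    print(pkcs_pad(b"12345678"))  # already block-aligned: a full extra block is added
```

Because the padding is a fixed function of the input, it does not disturb the property required above: with the same key and the same IV, the same filename always produces the same ciphertext.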

Code for review: BackupStoreFilenameClear::EncryptClear()
in lib/backupclient/BackupStoreFilenameClear.cpp


SUBTITLE File attributes

These are kept secret as well, since they reveal information, especially as they contain the target names of symbolic links.

To encrypt, a random Initialisation Vector is chosen. This is stored first, followed by the attribute data encrypted with PKCS padding.
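The stored layout can be sketched in Python as follows. The encryption itself is omitted (the ciphertext is treated as an opaque byte string), and the function names are hypothetical. The 8-byte IV length follows from Blowfish's block size, since in CBC mode the IV is one block.

```python
import os
from typing import Tuple

BLOCK_SIZE = 8  # Blowfish block size in bytes; a CBC IV is one block


def new_iv() -> bytes:
    """Pick a fresh random IV for one encryption."""
    return os.urandom(BLOCK_SIZE)


def join_iv(iv: bytes, ciphertext: bytes) -> bytes:
    """Stored form: the IV first, then the encrypted attribute data."""
    if len(iv) != BLOCK_SIZE:
        raise ValueError("IV must be exactly one block")
    return iv + ciphertext


def split_iv(stored: bytes) -> Tuple[bytes, bytes]:
    """Recover (iv, ciphertext) from the stored form before decrypting."""
    # PKCS padding guarantees at least one ciphertext block after the IV.
    if len(stored) < 2 * BLOCK_SIZE or len(stored) % BLOCK_SIZE != 0:
        raise ValueError("stored attributes too short or not block-aligned")
    return stored[:BLOCK_SIZE], stored[BLOCK_SIZE:]
```

Unlike filenames, attributes never need to be compared on the server in encrypted form, so a fresh IV per encryption costs nothing except the extra 8 bytes.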

Code for review: BackupClientFileAttributes::EncryptAttr()
in lib/backupclient/BackupClientFileAttributes.cpp


SUBTITLE File attribute hashes

To detect and update file attributes efficiently, the file status change time is not used, as this would give spurious results and cause unnecessary updates to the server. Instead, a hash of user id, group id, and mode is used.

To avoid revealing details about attributes:

1) The filename is added to the hash, so that an attacker cannot determine whether or not two files have identical attributes.

2) A secret is added to the hash, so that an attacker cannot compare attributes between accounts.

The hash used is the first 64 bits of an MD5 hash.
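A minimal Python sketch of such a hash is given below. The inputs and the 64-bit truncation come from the description above. The serialisation (three 32-bit big-endian integers, then the filename, then the secret), the order of the fields and the function name are assumptions, and will not match the byte layout used by the real code.

```python
import hashlib
import struct


def attribute_hash(uid: int, gid: int, mode: int,
                   filename: bytes, secret: bytes) -> bytes:
    """Return the first 64 bits (8 bytes) of an MD5 digest over the inputs."""
    h = hashlib.md5()
    # Hypothetical serialisation: three 32-bit big-endian integers.
    h.update(struct.pack(">III", uid, gid, mode))
    h.update(filename)  # equal attributes on different files hash differently
    h.update(secret)    # hashes cannot be compared between accounts
    return h.digest()[:8]


if __name__ == "__main__":
    secret = b"per-account secret"  # hypothetical value
    a = attribute_hash(1000, 1000, 0o644, b"a.txt", secret)
    b = attribute_hash(1000, 1000, 0o644, b"b.txt", secret)
    print(a.hex(), b.hex(), a != b)
```

Two files with identical owner, group and mode yield unrelated hashes, and changing any one of the three attributes changes the hash for that file.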


SUBTITLE File block index

Files are encoded in blocks, so that the rsync algorithm can be used on them. The data is compressed before encryption. These small blocks don't give the best possible compression, but there is no alternative because the server can't see their contents.
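The cost of compressing block by block can be seen with a short Python experiment. This only illustrates the general effect: zlib is used as a stand-in compressor, and the 4096-byte block size and the sample data are arbitrary choices for the example, not values taken from the backup system.

```python
import zlib


def compressed_size_whole(data: bytes) -> int:
    """Compressed size when the data is compressed in one piece."""
    return len(zlib.compress(data))


def compressed_size_in_blocks(data: bytes, block_size: int) -> int:
    """Total compressed size when each block is compressed independently."""
    return sum(len(zlib.compress(data[i:i + block_size]))
               for i in range(0, len(data), block_size))


if __name__ == "__main__":
    sample = b"The quick brown fox jumps over the lazy dog.\n" * 400
    print("whole: ", compressed_size_whole(sample))
    print("blocks:", compressed_size_in_blocks(sample, 4096))
```

Each block starts with an empty dictionary and carries its own stream overhead, so the per-block total comes out larger than the single-stream size for redundant data like this.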

The file contains a number of blocks, which contain among other things:

* Size of the block when it's not compressed
* MD5 checksum of the block
* RollingChecksum of the block

We don't want the attacker to know the size, so the first must not be stored in the clear. (Because of compression and padding, the stored data only reveals the size approximately.)

When the block is only a few bytes long, the latter two reveal its contents with only a moderate amount of work. So these need to be encrypted too.

In the header of the index, a 64 bit number is chosen. The sensitive parts of the block are then encrypted, without padding, with an Initialisation Vector of this 64 bit number + the block index.

If a block from a previous version of a file is included in a new version of the file, the same checksum data will be encrypted again, but with a different IV. An eavesdropper will be able to easily find out which data has been re-encrypted, but the plaintext is not revealed.
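The IV derivation described above can be sketched in Python as follows. The wrap-around modulo 2**64 and the big-endian byte order are assumptions, as is the function name. The real code should be checked for both.

```python
MASK_64 = (1 << 64) - 1


def block_iv(iv_base: int, block_index: int) -> bytes:
    """IV for one index entry: the 64-bit base plus the block's position."""
    # Assumptions: overflow wraps modulo 2**64, and the IV is big-endian.
    return ((iv_base + block_index) & MASK_64).to_bytes(8, "big")


if __name__ == "__main__":
    base = 0x0123456789ABCDEF  # hypothetical value from the index header
    for index in range(3):
        print(index, block_iv(base, index).hex())
```

Within one file every entry gets a distinct IV, since the indices are distinct and far fewer than 2**64. A block carried over into a new version is encrypted under that version's base and its new position, which is why its IV normally differs from the one used before.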

Code for review: BackupStoreFileEncodeStream::Read() (IV base chosen about half-way through)
BackupStoreFileEncodeStream::EncodeCurrentBlock() (encrypt index entry)
in lib/backupclient/BackupStoreFileEncodeStream.cpp


SUBTITLE File data

As above, the file is split into chunks and compressed.

Then a random Initialisation Vector is chosen and stored first, followed by the compressed file data encrypted using PKCS padding.
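The following Python sketch estimates what the server sees for one chunk under this layout, assuming AES's 16-byte block. It performs no encryption: it only adds up the IV and the padded length of the compressed data. zlib is a stand-in compressor and the function name is hypothetical.

```python
import zlib

AES_BLOCK = 16  # bytes; in CBC mode the IV is also one block


def stored_chunk_size(chunk: bytes) -> int:
    """Bytes stored for one chunk: IV, then PKCS-padded compressed data."""
    compressed_len = len(zlib.compress(chunk))  # stand-in compressor
    padded_len = (compressed_len // AES_BLOCK + 1) * AES_BLOCK
    return AES_BLOCK + padded_len


if __name__ == "__main__":
    for text in (b"hello", b"hello world, hello world"):
        print(len(text), stored_chunk_size(text))
```

The result is always a multiple of 16, so the server learns the compressed size only to within one block, and the uncompressed size only through whatever the compression ratio happens to be. This matches the objective of hiding exact file sizes while accepting that approximate sizes are visible.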

Code for review: BackupStoreFileEncodeStream::EncodeCurrentBlock()
in lib/backupclient/BackupStoreFileEncodeStream.cpp