Opened 12 years ago

Closed 12 years ago

#9 closed defect (fixed)

SSL connection may time out when backing up large data sets

Reported by: chris
Owned by: chris
Priority: normal
Milestone: 0.11
Component: bbackupd
Version: 0.10
Keywords: ssl timeout client reload scanning directories
Cc:

Description

Tobias Balle-Petersen reports that when backing up 1 TB of files, the client must be reloaded before each backup; otherwise the backup fails with an SSL timeout error. He has KeepAliveTime enabled and set low (10 seconds).

It appears that the client can spend large amounts of time scanning (locally) for changed files without contacting the server, and during these long delays, the connection can time out. Reloading the client causes it to discard cached information and request data from the store more often, which avoids the timeout.

I suspect that the root of the problem is that KeepAliveTime is only honoured while diffing individual files, and not while scanning directories. In a large data set where few files change between backup runs, a timeout may occur while scanning directories. Sending KeepAlive messages regularly while scanning directories may fix this problem.
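The idea of sending keepalives during a directory scan can be sketched as below. This is only an illustration, not the bbackupd code: `ScanEntries`, the `now` clock callback and the `sendKeepAlive` callback are hypothetical names, and the clock is injected so the behaviour can be checked without waiting.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical sketch: walk a list of directory entries and invoke a
// keepalive callback whenever keepAliveTime has elapsed since the last
// contact with the server. A zero keepAliveTime disables keepalives.
// Returns the number of keepalives sent.
int ScanEntries(const std::vector<std::string>& entries,
                std::chrono::seconds keepAliveTime,
                const std::function<Clock::time_point()>& now,
                const std::function<void()>& sendKeepAlive)
{
    int sent = 0;
    Clock::time_point lastContact = now();
    for (const std::string& entry : entries) {
        (void)entry; // a real client would examine the entry here
        const Clock::time_point t = now();
        if (keepAliveTime.count() > 0 && t - lastContact >= keepAliveTime) {
            sendKeepAlive();
            lastContact = t;
            ++sent;
        }
    }
    return sent;
}
```

Checking the elapsed time once per entry keeps the cost low while bounding the silent period to roughly KeepAliveTime plus the time spent on a single entry.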

Change History (26)

comment:1 Changed 12 years ago by chris

Please review [1141] [1142].

comment:2 Changed 12 years ago by chris

(In [1145]) Added tests for timers with zero interval, which should never expire (refs #9)

comment:3 Changed 12 years ago by chris

(In [1154]) Use a static pointer rather than a static object, to allow it to be freed in Timers::Cleanup, removing a reported memory leak (refs #9)
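The pattern described in [1154] might look roughly like the sketch below. `TimerRegistry` and its members are hypothetical stand-ins, not the real Timers class; the point is only that a heap-allocated container behind a static pointer can be released explicitly, whereas a static object lives until process exit.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of the pattern, not the actual Timers class.
class TimerRegistry {
public:
    // Allocate the container on first use.
    static void Init() {
        if (!spTimers) spTimers = new std::vector<int>();
    }
    // Free the container explicitly so nothing is still allocated
    // when a leak checker inspects the heap.
    static void Cleanup() {
        delete spTimers;
        spTimers = nullptr;
    }
    static bool IsInitialised() { return spTimers != nullptr; }
    static void Add(int timerId) {
        if (spTimers) spTimers->push_back(timerId);
    }
    static std::size_t Count() { return spTimers ? spTimers->size() : 0; }

private:
    static std::vector<int>* spTimers;
};

std::vector<int>* TimerRegistry::spTimers = nullptr;
```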

comment:4 Changed 12 years ago by chris

(In [1162]) Initialise cross-platform timers on all platforms, remove win32-specific code (refs #9)

comment:5 Changed 12 years ago by chris

(In [1165]) Initialise timers in all unit tests (refs #9)

comment:6 Changed 12 years ago by chris

(In [1170]) Add ExtendedLogFile option to bbackupd config (refs #9)

comment:7 Changed 12 years ago by chris

(In [1173])

  • Allow Daemons to be created more than once per process
  • Don't initialise signal handler until after fork, in case the parent is actually a unit test or another complex application
  • Don't exit(0) in the parent, for the same reason (refs #9)

comment:8 Changed 12 years ago by chris

(In [1174]) Add missing newlines to protocol logging to a file (refs #9)

comment:9 Changed 12 years ago by chris

(In [1175]) Separate ReadPidFile() out from LaunchServer() in test code (refs #9)

comment:10 Changed 12 years ago by chris

(In [1179]) Added test for keepalives being sent (refs #9)

comment:11 Changed 12 years ago by chris

(In [1180])

  • Fix timer expiry calculation when timers expire in the past
  • Fix handling of timers which never expire (zero deadline) (refs #9)
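The two cases fixed in [1180] can be illustrated with a small helper. This is a sketch under assumptions, not the committed code: `NextTimerDelay` is a hypothetical name, times are plain microsecond counts, and clamping an overdue timer to the smallest positive delay is one possible policy (chosen here because a zero interval passed to setitimer() disarms the timer rather than firing it).

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch. All times are in microseconds. A deadline of
// zero means "never expires". Returns the delay until the earliest
// pending expiry, or -1 if no timer is pending.
std::int64_t NextTimerDelay(std::int64_t now,
                            const std::vector<std::int64_t>& deadlines)
{
    std::int64_t best = -1;
    for (std::int64_t deadline : deadlines) {
        if (deadline == 0) {
            continue; // zero deadline: this timer never expires
        }
        std::int64_t delay = deadline - now;
        if (delay <= 0) {
            delay = 1; // already due: fire as soon as possible
        }
        if (best < 0 || delay < best) {
            best = delay;
        }
    }
    return best;
}
```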

comment:12 Changed 12 years ago by chris

Component: changed from bbackupctl to bbackupd

(In [1171])

  • Use gettimeofday() to increase accuracy of GetCurrentBoxTime() on platforms which support it. Fixes busy waits for 1 second in backup client when time for next backup is not on a 1 second boundary (which it never is).

(In [1176])

  • Moved intercept code to a library module to allow it to be used by test/bbackupd as well.
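To illustrate the busy wait mentioned under [1171]: with a clock that only resolves whole seconds, the remaining wait rounds down to zero during the final second, so a loop that sleeps for "the remaining time" spins without sleeping. The sketch below shows one plausible reading of the problem; `CurrentTimeMicros`, `MicrosUntil` and `WholeSecondsUntil` are hypothetical helpers, not the Box Backup functions.

```cpp
#include <cstdint>
#include <sys/time.h>

// Current wall-clock time in microseconds since the epoch,
// using gettimeofday() for sub-second resolution.
std::uint64_t CurrentTimeMicros()
{
    struct timeval tv;
    gettimeofday(&tv, nullptr);
    return static_cast<std::uint64_t>(tv.tv_sec) * 1000000u +
           static_cast<std::uint64_t>(tv.tv_usec);
}

// Microseconds left before target, or 0 if target has passed.
std::uint64_t MicrosUntil(std::uint64_t now, std::uint64_t target)
{
    return target > now ? target - now : 0;
}

// The same wait truncated to whole seconds, as a one-second clock
// would see it. This is 0 throughout the last second before target.
std::uint64_t WholeSecondsUntil(std::uint64_t now, std::uint64_t target)
{
    return MicrosUntil(now, target) / 1000000u;
}
```

For a target 0.4 seconds away, `WholeSecondsUntil` gives 0 while `MicrosUntil` gives 400000, so only the finer-grained value yields a usable sleep interval.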

comment:13 Changed 12 years ago by chris

(In [1181]) Added a test for diff termination if MaximumDiffingTime is exceeded (refs #3, refs #9)

comment:14 Changed 12 years ago by chris

(In [1184]) Replace old-style setitimers for KeepAliveTime and MaximumDiffingTime with new Timer objects. (refs #3, refs #9)

comment:15 Changed 12 years ago by chris

Also required: [1182], [1183].

comment:16 Changed 12 years ago by chris

(In [1185]) Search for dlfcn.h and dlsym() (needed for new intercept code) (refs #3, refs #9)

comment:17 Changed 12 years ago by chris

(In [1187]) Added a header file for including in test/bbackupd/testbbackupd.cpp and other modules which might need intercepts in future.

Added opendir/readdir and lstat hook capability.

(refs #3, refs #9)

comment:18 Changed 12 years ago by chris

(In [1190]) Added tests for keepalives while scanning large directories. (refs #3, refs #9)

comment:19 Changed 12 years ago by chris

(In [1191]) Moved KeepAlive timer to BackupClientContext object.

Made timeout initialisation non-static, and a property of the context object. (perhaps should be in rParams, I know).

(refs #3, refs #9)

comment:20 Changed 12 years ago by chris

(In [1192]) Send keepalives when needed while scanning large directories (refs #3, refs #9)

comment:21 Changed 12 years ago by chris

(In [1193]) Make the timer test reliable by using nanosleep() instead of sleep(), since sleep() may use signals and interfere with SIGALRM, and also cannot be resumed if interrupted by a signal. (refs #3, refs #9).

comment:22 Changed 12 years ago by chris

(In [1194]) Fixed a race condition caused by rescheduling in signal handler (refs #3, refs #9)

comment:23 Changed 12 years ago by chris

(In [1198]) Fix more deadlocks by minimising the amount of stuff that the signal handler does. (refs #3, refs #9)
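A common way to minimise the work done in a signal handler is to have it only set a flag and to do everything else from normal code. The sketch below shows that general technique; it is not taken from [1198], and `OnTimerSignal`, `InstallTimerHandler` and `ConsumeTimerFired` are hypothetical names.

```cpp
#include <csignal>

namespace {
// The only state touched by the handler. sig_atomic_t can be written
// safely from a signal handler.
volatile std::sig_atomic_t gTimerFired = 0;
}

// The handler does nothing except record that the signal arrived.
extern "C" void OnTimerSignal(int)
{
    gTimerFired = 1;
}

// Install the handler for the given signal. Returns false on failure.
bool InstallTimerHandler(int signum)
{
    return std::signal(signum, OnTimerSignal) != SIG_ERR;
}

// Called from normal (non-signal) context. Returns true if the signal
// has fired since the last call and clears the flag. Rescheduling,
// locking and memory allocation belong in the caller, not the handler.
// A signal arriving between the check and the clear is not counted
// separately, so callers should recheck all timers after each wakeup.
bool ConsumeTimerFired()
{
    if (!gTimerFired) {
        return false;
    }
    gTimerFired = 0;
    return true;
}
```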

comment:24 Changed 12 years ago by chris

(In [1246]) Don't do things with essential side effects inside ASSERT() macros (refs #3, refs #9)
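The hazard addressed by [1246] can be shown with a small example. `SKETCH_ASSERT` is a hypothetical macro standing in for ASSERT() in a build where assertions are compiled out, and `RemoveLast` is an invented operation with an essential side effect.

```cpp
#include <vector>

// Simulates a release build, where assertion macros expand to nothing
// and their argument is never evaluated.
#define SKETCH_ASSERT(cond) ((void)0)

// Hypothetical operation with an essential side effect.
static bool RemoveLast(std::vector<int>& v)
{
    if (v.empty()) {
        return false;
    }
    v.pop_back();
    return true;
}

// Wrong: the call disappears together with the assertion.
void DrainUnsafe(std::vector<int>& v)
{
    SKETCH_ASSERT(RemoveLast(v));
}

// Right: perform the operation first, then assert on its result.
void DrainSafe(std::vector<int>& v)
{
    bool removed = RemoveLast(v);
    SKETCH_ASSERT(removed);
    (void)removed; // avoid an unused-variable warning when asserts are off
}
```

With assertions disabled, `DrainUnsafe` leaves the vector untouched while `DrainSafe` still removes an element.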

comment:25 Changed 12 years ago by chris

Status: changed from new to assigned

Asked Tobias to check whether those issues are fixed in chris/merge.

comment:26 Changed 12 years ago by chris

Resolution: set to fixed
Status: changed from assigned to closed

I believe these issues are resolved with the new timer code. Please let me know if not.
