Replication Command Log Archival

The idea behind this task is to create "snapshots" (archived Command Log segments) that can be applied to a provisioned server on startup, much as the SQL output of the drizzledump program can be loaded into a server.

Concepts

The Command Log is an append-only log: writes only ever happen at its tail, which is tracked by an atomic<off_t> that the CommandLog::apply() method advances through the log file as it writes Command GPB (Google Protocol Buffers) messages.
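
As a rough illustration of the tail-offset idea, the C++ sketch below shows how an apply()-style method could reserve space at the end of the log by calling fetch_add() on a std::atomic<off_t> and then write into the reserved region with pwrite(). The class name SketchCommandLog, the 8-byte length prefix, the temporary file standing in for command.log, and the reserve-then-write ordering are all assumptions made for this example. It is not Drizzle's actual CommandLog code, and the serialized Command GPB message is treated as an opaque std::string.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <stdexcept>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical stand-in for an append-only command log (NOT Drizzle's
// CommandLog). Records are framed as [uint64_t length][payload].
class SketchCommandLog {
public:
  // An anonymous temporary file stands in for command.log.
  SketchCommandLog() : file_(std::tmpfile()), tail_(0) {
    if (file_ == nullptr) throw std::runtime_error("tmpfile failed");
  }
  ~SketchCommandLog() { std::fclose(file_); }
  SketchCommandLog(const SketchCommandLog&) = delete;
  SketchCommandLog& operator=(const SketchCommandLog&) = delete;

  // Appends one serialized Command message; returns the offset it starts at.
  off_t apply(const std::string& serialized_command) {
    const uint64_t len = serialized_command.size();
    const off_t total = static_cast<off_t>(sizeof(len) + len);

    // fetch_add returns the old tail, so concurrent callers reserve
    // disjoint regions and the tail only ever increases.
    const off_t offset = tail_.fetch_add(total);
    assert(offset >= 0);

    std::string record(sizeof(len), '\0');
    std::memcpy(&record[0], &len, sizeof(len));  // host byte order
    record += serialized_command;

    // Write into the reserved region; earlier regions are never rewritten.
    const ssize_t written =
        pwrite(fileno(file_), record.data(), record.size(), offset);
    if (written != static_cast<ssize_t>(record.size()))
      throw std::runtime_error("pwrite failed");
    return offset;
  }

  off_t tail() const { return tail_.load(); }

private:
  std::FILE* file_;
  std::atomic<off_t> tail_;
};
```

Because fetch_add() hands each caller the previous tail value, two concurrent appends can never overlap, and no byte below the tail is written twice. One consequence of reserving before writing is that the tail can briefly run ahead of the bytes actually written, which a reader has to take into account.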

This has two consequences:

  1. Once a Command message is written to the "active" command.log file, it can never be updated.
  2. The atomic<off_t> tracking the tail only ever increases (it is monotonically increasing).

Given these two conditions, it seems feasible to construct "archives" or "snapshots" without locking: a reader that records the tail offset at some moment can rely on every byte before that offset staying unchanged, so the snapshot reproduces the server's exact dataset as of that point in time.
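
The lock-free read side can be sketched the same way: capture the tail offset once, then parse only the records that lie entirely below it. The helper names append_record() and read_snapshot(), the [uint64_t length][payload] framing, and the in-memory std::string standing in for the log file are hypothetical. The sketch also assumes every record below the captured offset has been completely written; with a reserve-then-write appender, the cut would need to come from a position known to be fully written.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Appends one [uint64_t length][payload] record (host byte order) to an
// in-memory image of the log. Hypothetical framing, for illustration only.
void append_record(std::string& image, const std::string& payload) {
  const uint64_t len = payload.size();
  image.append(reinterpret_cast<const char*>(&len), sizeof(len));
  image += payload;
}

// Parses every record in [0, cut), where `cut` is the tail offset captured
// once when the snapshot started. This models the file as a std::string;
// against the real append-only file, bytes below `cut` never change, so the
// scan would need no lock while writers keep appending past the cut.
std::vector<std::string> read_snapshot(const std::string& image, uint64_t cut) {
  if (cut > image.size())
    throw std::out_of_range("cut is past the end of the log");
  std::vector<std::string> records;
  uint64_t pos = 0;
  while (cut - pos >= sizeof(uint64_t)) {
    uint64_t len = 0;
    std::memcpy(&len, image.data() + pos, sizeof(len));
    pos += sizeof(len);
    if (len > cut - pos) throw std::runtime_error("record straddles the cut");
    records.emplace_back(image, pos, len);  // copy the payload bytes
    pos += len;
  }
  if (pos != cut) throw std::runtime_error("cut is not on a record boundary");
  return records;
}
```

Records appended after the cut are simply never visited, which is what lets the snapshot stay consistent without coordinating with writers.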

Snapshot Process

The archival/snapshot process consists of the following steps:

  1. Determine the global transaction identifier that corresponds to the snapshot's target timestamp (the point in time up to which changes must be included)
  2. Read through the active command log, building an index of it up to the last transaction to be included in the snapshot
  3. Use the index to do the following, in order:
    1. Determine the definition of each table in each schema as of the last transaction included in the snapshot
      • For each schema:
        • For each table in schema:
          1. Build an in-memory primary key index for the table, containing all key values
          2. Purge all keys for deleted records
          3. Construct a single Command message containing one InsertRecord submessage per key, each holding the latest values of all columns for that record
    2. Determine the last "change record" for each record in each schema table
    3. Add that change record to the archive as a Command message with an InsertRecord submessage
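
Steps 1 and 2 amount to a search over an index of the log. A minimal sketch, assuming the index holds one hypothetical IndexEntry per committed transaction and that commit timestamps never decrease in log order, is a binary search for the last transaction committed at or before the target time; its end offset then marks where the scan of the active log can stop. The struct names, fields, and timestamp unit are illustrative assumptions, not part of Drizzle.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <optional>
#include <vector>

// Hypothetical index entry: one per committed transaction, in log order.
struct IndexEntry {
  uint64_t transaction_id;
  uint64_t commit_timestamp;  // unit is an assumption (e.g. microseconds)
  uint64_t end_offset;        // log offset just past the transaction's last message
};

struct SnapshotCut {
  uint64_t last_transaction_id;  // step 1: last transaction in the snapshot
  uint64_t end_offset;           // step 2: stop scanning the log here
};

// Returns the last transaction committed at or before `as_of`, or
// std::nullopt if the snapshot would be empty. Requires `index` to be
// sorted by commit_timestamp (non-decreasing).
std::optional<SnapshotCut> find_cut(const std::vector<IndexEntry>& index,
                                    uint64_t as_of) {
  // upper_bound yields the first entry committed strictly after `as_of`.
  auto after = std::upper_bound(
      index.begin(), index.end(), as_of,
      [](uint64_t t, const IndexEntry& e) { return t < e.commit_timestamp; });
  if (after == index.begin()) return std::nullopt;
  const IndexEntry& last = *std::prev(after);
  return SnapshotCut{last.transaction_id, last.end_offset};
}
```

Ties are resolved in favour of the later transaction, so every transaction stamped exactly at the target time is included.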
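
Step 3 is essentially log compaction: replay the changes in log order into a per-table map keyed by primary key, drop keys whose latest change is a delete, and emit one insert per surviving row. The sketch below uses hypothetical ChangeRecord and InsertRecord structs in place of the real Command and InsertRecord GPB messages, assumes every insert or update carries the complete row image, and does not handle updates that change a row's primary key or the grouping of InsertRecords into Command messages.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for decoded Command messages.
enum class Op { Insert, Update, Delete };

struct ChangeRecord {
  Op op;
  std::string table;                // e.g. "schema.table"
  std::string primary_key;          // serialized key value(s)
  std::vector<std::string> values;  // full row image (ignored for Delete)
};

struct InsertRecord {
  std::string table;
  std::string primary_key;
  std::vector<std::string> values;
};

// Collapses an ordered change stream into one InsertRecord per live row:
// later changes overwrite earlier ones, and deletes purge the key.
std::vector<InsertRecord> compact(const std::vector<ChangeRecord>& changes) {
  // table -> (primary key -> latest row image)
  std::map<std::string, std::map<std::string, std::vector<std::string>>> live;
  for (const ChangeRecord& c : changes) {
    auto& rows = live[c.table];
    if (c.op == Op::Delete)
      rows.erase(c.primary_key);
    else
      rows[c.primary_key] = c.values;
  }
  std::vector<InsertRecord> out;
  for (const auto& [table, rows] : live)
    for (const auto& [key, values] : rows)
      out.push_back({table, key, values});
  return out;
}
```

Using std::map keeps the output ordered by table and key, which makes the archive's contents deterministic for a given log prefix.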