drizzle
Profile
Search
 
Hosted by The Rackspace Cloud
Automation Documentation

This page serves to document the interface, command line options, and configuration file format and options for the Drizzle Automation Project.

Contents

Installing drizzle-automation

These installation instructions are Debian-centric. For those users on other platforms, please feel free to update this wiki with instructions. Thanks.

Dependencies


Dbt2

% git clone git://osdldbt.git.sourceforge.net/gitroot/osdldbt/dbt2
% cd dbt2 (or whatever the current version is>
% rm -f CMakeCache.txt	&& cmake CMakeLists.txt -DDBMS="drizzle" && make  (build tools)
% export PATH=$PATH:<dbt2 home>/bin
% dbt2-datagen -w <number of warehouses> -d <location to store the data> --drizzle   (generate the data)


Note the number of warehouses from the generated data must be less than or equal to the number specified in the configuration file

Doxygen

The DOXY command requires installation of the Doxygen package. On Debian systems, this can be installed with:

$> sudo apt-get install doxygen

LCOV

The LCOV command requires installation of the LCOV package and requires a modern version of the GNU compiler collection with support for the gprof profiling options and coverage suite commonly called GCOV.

#> sudo apt-get install lcov

In addition, a small fix is necessary to the genhtml program in order for the LCOV generation to HTML files to work properly.

As root, open up the /usr/bin/genhtml Perl script in an editor. Edit the lines below(may be changed from version to other try to search for the following lines) as follows.

From this:

	open(SOURCE_HANDLE, "<".$source_filename)
		or die("ERROR: cannot open $source_filename for reading!\n");

to this:

	open(SOURCE_HANDLE, "<".$source_filename)
		or return;

Profiling

The SYSBENCH command can run with a profiler to profile code run during a benchmark run. So, you should first install the drizzle-sysbench project as documented above.

For profiling with Valgrind, you will want to install valgrind:

$> sudo apt-get install valgrind

Many folks like to analyze cachegrind output using the excellent graphical calltree analysis tool, KCachegrind:

$> sudo apt-get install kcachegrind

Note that KCachegrind is a local analysis tool, and is not required on the machine running profiling.

For profiling with OProfile, you should install oprofile:

$> sudo apt-get install oprofile

Random Query Generator

The Random Query Generator requires installation of the randgen repository, execute the following in your bzr repository directory

bzr branch lp:randgen

See the the Random Query Generator Wiki for details on how to run tests. After installing the Drizzle Automation suite as described at Automation_Documentation#Installing_the_Automation_Suite, make sure and edit the following configuration parameters if necessary in /etc/drizzle-automation/randgen/randgen.cnf. The randgen_tests section specifies the name of the test and arguments to run that test, a few examples are below

[run]
# location of the  randgen repository
randgen_home= /home/drizzle/repos/randgen
[randgen_tests]
basic= --queries=50 --threads=7 --gendata=conf/drizzle.zz --grammar=conf/drizzle.yy --reporters=ErrorLog,Backtrace --validators=ErrorMessageCorruption --engine=Innodb
combinations= --queries=1000 --threads=10 --gendata=conf/combinations_drizzle.zz --grammar=conf/combinations.yy --reporters=ErrorLog,Backtrace -validators=ErrorMessageCorruption --engine=Innodb
database_consistency_after_recovery= --queries=100 --threads=5 --gendata=conf/transactions.zz --grammar=conf/transactions.yy --reporters=RecoveryConsistency -validators=ErrorMessageCorruption --engine=Innodb
database_integrity_after_recovery= --queries=100 --threads=5 --gendata=conf/drizzle.zz --grammar=conf/drizzle.yy --reporters=Recovery -validators=ErrorMessageCorruption --engine=Innodb
limit_validator= --queries=1000 --threads=10 --gendata=conf/drizzle.zz --grammar=conf/subquery_semijoin_drizzle.yy --reporters=ErrorLog,Backtrace --validators=Limit --engine=Innodb
many_indexes= --queries=5 --threads=2 --gendata=conf/many_indexes_drizzle.zz --grammar=conf/many_indexes_drizzle.yy --reporters=ErrorLog,Backtrace --validators=ErrorMessageCorruption --engine=Innodb
optimized_semijoin_compare= --queries=1000 --threads=10 --gendata=conf/drizzle.zz --grammar=conf/subquery_semijoin_drizzle.yy --reporters=ErrorLog,Backtrace --validators=ResultsetComparator --engine=Innodb

You will also need to make sure you have the following perl modules installed on your machine


Note that for now you must modify drizzle.xs by deleting the following two lines until this bug is fixed in DBD::drizzle:

  1. === modified file 'drizzle.xs'
  2. --- old/drizzle.xs      2009-08-05 13:42:30 +0000
  3. +++ new/drizzle.xs      2010-05-25 21:14:56 +0000
  4. @@ -134,8 +134,6 @@ void _admin_internal(drh,dbh,command,dbn
  5.  
  6.   if (strEQ(command, "shutdown"))
  7.      (void) drizzle_shutdown(con, &res, DRIZZLE_SHUTDOWN_DEFAULT, &retval);
  8. - else if (strEQ(command, "refresh"))
  9. -    (void) drizzle_refresh(con, &res, DRIZZLE_REFRESH_LOG, &retval);
 10.   else if (strEQ(command, "createdb"))


SLOCCount

The SLOC command requires installation of David Wheeler's SLOCcount package. On Debian systems, this can be installed with:

 $> sudo apt-get install sloccount

You then need to modify /usr/bin/get_sloc at line 355, change the arbitrary line length from 77 to 100. This was forcing a new line after 77 characters which was causing a problem in how we parse the results.

Sqlbench and Crashme

The sqlbench and crashme command requires installation of the sql-bench repository, execute the following in your bzr repository directory

bzr branch lp:sql-bench

You will also need to make sure you have the following perl modules installed on your machine


In the crashme script at line 1568 replace these two lines

	  ["TO_DAYS","to_days",
	   "to_days(make_date(1996,01,01))",729024,0],

with:

	  ["TO_DAYS-drizzle","to_days",
	   "to_days(make_date(1996,01,01))",2450084,0],

Sysbench

The SYSBENCH command requires that the Drizzle library header be installed. Simply build Drizzle and do:

$> sudo make install

This will install the right headers.

The SYSBENCH command also requires installation of the drizzle-sysbench program, which can be installed from source like so:

$> bzr branch lp:~drizzle-developers/sysbench/trunk  drizzle-sysbench
$> cd drizzle-sysbench
$> ./autogen.sh && ./configure && make
$> sudo make install

Make sure sysbench is then in your path

Installing the Statistic Collection Database

We store statistical information in a local mysql database (so be sure that MySQL is installed locally).

Create a new database named "drizzle_stats" (the default name in the default configuration file)

$> mysqladmin --user=root --password create drizzle_stats
<Enter root pass>

Now create the requisite user:

$> mysql --user=root --password
mysql> GRANT ALL ON drizzle_stats.* TO drizzle@localhost IDENTIFIED BY 'drizzle';

Finally, install the database tables by copying the following into a mysql client after changing to the drizzle_stats database:

$> mysql --user=root --password
mysql> use drizzle_stats
Database changed.
CREATE TABLE sloc_stats (
  server VARCHAR(20) NOT NULL
, version VARCHAR(50) NOT NULL
, directory VARCHAR(50) NOT NULL
, language VARCHAR(20) NOT NULL
, run_date DATETIME NOT NULL
, count INT NOT NULL
, PRIMARY KEY (server, version, directory, language)
) ENGINE=InnoDB;

CREATE TABLE lcov_stats (
  server VARCHAR(20) NOT NULL
, version VARCHAR(50) NULL
, run_date DATETIME NOT NULL
, dir_name VARCHAR(255) NOT NULL
, coverage_percent DECIMAL(5,2) NOT NULL
, PRIMARY KEY (server, version, run_date, dir_name)
) ENGINE=InnoDB;

CREATE TABLE bench_config (
  config_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
, name VARCHAR(255) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE bench_runs (
  run_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
, config_id INT NOT NULL
, server VARCHAR(20) NOT NULL
, version VARCHAR(50) NULL
, run_date DATETIME NOT NULL
) ENGINE=InnoDB;

CREATE TABLE bench_show_status_history (
  run_id INT NOT NULL
, concurrency INT NOT NULL
, variable_name VARCHAR(255) NOT NULL
, value INT NOT NULL
, PRIMARY KEY (run_id, variable_name, concurrency)
) ENGINE=InnoDB;

CREATE TABLE sysbench_run_iterations (
  run_id INT NOT NULL
, concurrency INT NOT NULL
, iteration INT NOT NULL
, tps DECIMAL(13,2) NOT NULL
, read_write_req_per_second DECIMAL(13,2) NOT NULL
, deadlocks_per_second DECIMAL(5,2) NOT NULL
, min_req_latency_ms DECIMAL(10,2) NOT NULL
, avg_req_latency_ms DECIMAL(10,2) NOT NULL
, max_req_latency_ms DECIMAL(10,2) NOT NULL
, 95p_req_latency_ms DECIMAL(10,2) NOT NULL
, PRIMARY KEY (run_id, concurrency, iteration)
) ENGINE=InnoDB;

CREATE TABLE sqlbench_run_iterations (
 run_id INT NOT NULL
, operation_name VARCHAR(40) NOT NULL
, seconds DECIMAL(5,2) NOT NULL
, usr DECIMAL(5,2) NOT NULL
, sys DECIMAL(5,2) NOT NULL
, cpu DECIMAL(5,2) NOT NULL
, tests INT NOT NULL
, engine VARCHAR(20) NOT NULL
, PRIMARY KEY (run_id, operation_name)
) ENGINE=InnoDB;

CREATE TABLE drizzleslap_run_iterations (
run_id INT NOT NULL
, engine_name VARCHAR(40) NOT NULL
, test_name VARCHAR(40) NOT NULL
, queries_avg DECIMAL(8,3) NOT NULL
, queries_min DECIMAL(8,3) NOT NULL
, queries_max DECIMAL(8,3) NOT NULL
, total_time DECIMAL(8,3) NOT NULL
, stddev DECIMAL(8,3) NOT NULL
, iterations INT NOT NULL
, concurrency INT NOT NULL
, concurrency2 INT NOT NULL
, queries_per_client INT NOT NULL
, PRIMARY KEY (run_id, test_name, concurrency)
) ENGINE=InnoDB;

CREATE TABLE dbt2_run_iterations (
run_id INT NOT NULL
, tpm DECIMAL(6,2) NOT NULL
, connections INT NOT NULL
, test_time INT NOT NULL
, rollbacks INT NOT NULL
, warehouses INT NOT NULL
, PRIMARY KEY (run_id, connections, test_time)
) ENGINE=InnoDB;

Installing the Automation Suite

Execute the following in your home user folder:

 bzr init-repo --2a drizzle-automation
 cd drizzle-automation
 bzr branch lp:drizzle-automation trunk
 cd trunk
 sudo python setup.py install
 sudo mkdir -p /etc/drizzle-automation
 sudo cp -r drizzle/automation/config/* /etc/drizzle-automation

Note that until we fix drizzle-automation, machines should add a cronjob to clean up old repositories, something like the following to remove repositories older than 30 days

 0 23 * * * find /home/drizzle/repos -type d -name 'staging-crashme-crashme-r*' -mtime +10 -exec rm -rf {} \;  

Customizing Your Automation configuration

The above command will install a default automation.cnf file in /etc/drizzle-automation/automation.cnf. You should open that file up in an editor and change settings as needed.

The drizzle-automation interface

At a bare minimum, to get drizzle-automation to perform an action, you need to provide it with a command. The command is one of the commands detailed below.

The drizzle-automation program is called like so:

$> drizzle-automation $COMMAND

where $COMMAND is one of the supported automation commands (see below).

Of course, there are numerous command line interface options, which are covered in a section below, and the vast majority of those command line options can be defaulted to a value based on values you can put in a configuration file. The configuration files are documented also below.

Sandboxes

When you execute drizzle-automation, the program creates a sandbox for your process to play in. You can override this and specify a directory by using the --use-root-sandbox option (see example below). In this way, you are guaranteed that when you run your process, it will not interfere with previous run data, built servers, configurations, etc.

A sandbox is simply a separate bzr branch which houses a specific version of code you are testing with a specific run of drizzle-automation.

A sandbox is created in the main repository directory (see the --repo-dir option or [defaults][repo-dir] configuration file option) and under the parent branch for the command you are running and the branch-nick you are processing (see --branch option or [defaults][command][branch] configuration file option). The sandbox's specific branch will be named r$REVNO.

In this way, the repository will contain a tree of branches which are isolated from each other, for each command, branch, and revision.

It's easier to see with an example. Let's assume you have set the following configuration options:

# repository directory         (--bzr-repo-dir)
bzr_repo_dir= /home/jpipes/repos/drizzle/

And you call drizzle-automation with the following:

$> drizzle-automation --bzr-branch=lp:drizzle --bzr-revision=943 lcov

In the above call to drizzle-automation, we are asking it to execute the lcov command, for the lp:drizzle branch, at revision 943.

If it didn't exist already, drizzle-automation will create a bzr repository at:

/home/drizzle/repos/drizzle

drizzle-automation will automatically create a bzr branch at:

/home/drizzle/repos/drizzle/drizzle-lcov-r943

If it didn't already exist, there will also be a bzr branch created at:

/home/drizzle/repos/drizzle/drizzle-lcov

which will be the "parent" branch of drizzle-lcov-r943. This drizzle-lcov may or may not be at the drizzle branch's revision 943, but drizzle-lcov-r943 is guaranteed to be at revision 943 of the drizzle branch.

As you will see below, you can specify a range of revisions on the command line. In this case, drizzle-automation will create a sandbox for each revision. So, for example, let's say you execute the following:

$> drizzle-automation --branch=lp:drizzle --revision=943..946 lcov

then, drizzle-automation will create the following sandboxes, performing the LCOV command in each sandbox it creates:

/home/drizzle/repos/drizzle/drizzle-lcov-r943
/home/drizzle/repos/drizzle/drizzle-lcov-r944
/home/drizzle/repos/drizzle/drizzle-lcov-r945
/home/drizzle/repos/drizzle/drizzle-lcov-r946

In this way, it is easy to tell which revisions have been processed for a specific command just by looking at the repository directory itself. In addition, the sandboxing enables drizzle-automation to be re-run at a specific revision in case build or automation fails. Failures in one sandbox won't affect the running of the automation scripts against other revisions.

A revision range can end in the word HEAD to signal to drizzle-automation to run revisions all the way up to the HEAD of the desired branch, like so:

$> ./drizzle-automation --bzr-branch=lp:drizzle --bzr-revision=1000..HEAD lcov

The commands

Crashme

The crashme command is a perl script that runs a set of tests which check the limits of your database. crashme comes with the sqlbench repository and thus when you run the crashme target from drizzle-automation, it needs to know the location of the sql-bench repository which is specified in the drizzle-automation configuration file.

After installing the Drizzle Automation suite as described here, make sure and edit configuration parameters if necessary in /etc/drizzle-automation/crashme/crashme.cnf. Currently this only contains the location of sql-bench home but can be used in the future to add more variables as needed.

Example usage:

 $> drizzle-automation  --bzr-revision=1125 --bench-config-name=crashme crashme

After building the specified or current version of Drizzle, drizzle-automation then cd's to the sql-bench repository and runs the script "crash-me". An example of how the script is run:

./crash-me --server=drizzled --connect-options=port=9306 --dir=limits-1125 --force

The results are stored in a file name in the sqlbench_home/limits-bzr_revision directory The results file name is "server_name.cfg" For example: "limits-1125/drizzled.cfg"

No information is currently stored in the database, the test either passes or fails if it crashes the server

Dbt2

Dbt2 is an OLTP transactional performance test. It simulates a wholesale parts supplier where several workers access a database, update customer information and check on parts inventories.

After installing the Drizzle Automation suite as described at Automation_Documentation#Installing_the_Automation_Suite, make sure and edit configuration parameters if necessary in /etc/drizzle-automation/dbt2/dbt2.cnf for runtime variables and in /etc/drizzle-automation/automation.cnf for email notifications

To run the command, do the following:

$> drizzle-automation --bzr-branch=lp:drizzle --bzr-revision=1300 --bench-config-name=dbt2 dbt2

This will grab the r1300 revision of the trunk branch of drizzle, create a sandbox for it if not already created, and run the dbt2 tests against that revision, and store the results in the local database. The tests and its arguments are specified in the /etc/drizzle-automation/automation.cnf configuration file.

The results for each run are stored by concurrency and can be found at <working_dir>/dbt2-<bzr revision>/<concurrency>/report.txt. For example /home/drizzle/repos/staging-dbt2-dbt2-r1251/dbt2-1251/2/report.txt

An example results file would look like:

                         Response Time (s)
 Transaction      %    Average :    90th %        Total        Rollbacks      %
------------  -----  ---------------------  -----------  ---------------  -----
    Delivery   3.59      0.038 :     0.072           56                0   0.00
   New Order  43.73      0.022 :     0.046          683                5   0.73
Order Status   3.14      0.003 :     0.003           49                0   0.00
     Payment  38.80      0.010 :     0.026          606                0   0.00
 Stock Level   4.10      0.014 :     0.019           64                0   0.00

119.30 new-order transactions per minute (NOTPM)
5.7 minute duration
0 total unknown errors
5 rollback transactions
-27 second(s) ramping up
11 second(s) cooling down

Querying the dbt2 Results

Once run, you can query the database with the following sql query which retrieves results for revisions 1251 to 1260:

SELECT r.version, r.run_id, r.run_date, i.connections, i.tpm
          FROM bench_config c
          NATURAL JOIN bench_runs r
          NATURAL JOIN dbt2_run_iterations i
          WHERE r.server = 'drizzled' 
          AND r.version BETWEEN 'staging-1251' AND 'staging-1260'
          GROUP BY r.version, r.run_id, i.connections
          ORDER BY i.connections, r.version DESC, r.run_id DESC

Doxy

To produce Doxygen code documentation on your branch, simply do:

$> drizzle-automation --bzr-branch=lp:drizzle --bzr-revision=1000 --no-rsync doxy

This will create a sandbox for the r1000 revision of the trunk drizzle branch, and run Doxygen on all source files.

Syncing Doxygen Generated HTML to a Remote Server

If you need to rsync the resulting documentation to a server, remove the --no-rsync option and add the following to your /etc/drizzle-automation/automation.cnf file.

[doxy]
ssh_user= $YOUR_SSH_USERNAME
rsync_dir= $TARGET_LOCATION

For instance, to sync to the main drizzle.org Doxygen sub-site, we use the following:

[doxy]
ssh_user= jpipes
rsync_dir= drizzle.org:web/doxygen/

Drizzleslap

Drizzleslap is a diagnostic program designed to emulate client load for a Drizzle server and to report the timing of each stage. It works as if multiple clients are accessing the server.

To run the command, do the following:

$> drizzle-automation --bzr-branch=lp:drizzle --bzr-revision=1300 --bench-config-name=drizzleslap drizzleslap

This will grab the r1200 revision of the trunk branch of drizzle, create a sandbox for it if not already created, and run the drizzleslap tests against that revision, and store the results in the local database. The tests and its arguments are specified in the /etc/drizzle-automation/drizzleslap/drizzleslap.cnf configuration file.

The results are stored in a csv file in /tmp/drizzleslap-<bzr version>.csv For example: "/tmp/drizzleslap-1300.csv

An example csv file would look like:

nnodb,write-commit-scale,0.141,0.121,0.262,1.413,0.043,10,1,1,1000
innodb,write-commit-scale,14.090,11.798,16.529,140.905,1.929,10,100,100,1000
.......

Querying the drizzleslap Results

Once run, you can query the database with the following sql query which retrieves results for revisions 1218 to 1221:

SELECT r.version, r.run_id, r.run_date, i.test_name, i.concurrency, avg(i.total_time) as total
          FROM bench_config c
          NATURAL JOIN bench_runs r
          NATURAL JOIN drizzleslap_run_iterations i
          WHERE r.server = 'drizzled' 
          AND r.version BETWEEN 'staging-1218' AND 'staging-1221'
          GROUP BY r.version, r.run_id, i.test_name, i.concurrency
          ORDER BY i.test_name, r.version DESC, r.run_id DESC, i.concurrency

LCOV

The LCOV command does the following every time you run it:

  1. Configures and builds the Drizzle server with code coverage and profiling enabled
  2. Runs the Drizzle test suite
  3. Collects GCOV code coverage information for all source files in specified directories
  4. Builds HTML reports for all the source code coverage data
  5. Syncs the reports with a remote server (optional)
  6. Stores the code coverage data in a database (optional)

To run the command, do the following:

$> drizzle-automation --bzr-branch=lp:drizzle --bzr-revision=1200 --no-rsync lcov

This will grab the r1200 revision of the trunk branch of drizzle, create a sandbox for it if not already created, and run the code coverage against that revision, and store the results in the local database.

Syncing LCOV Results with a Remote Server

To sync the generated HTML coverage reports with a remote server, leave off the --no-rsync option from the command line. Before running, you must ensure that your automation.cnf file contains enough information for the automation suite to know where to rsync the results to. So, if not already there, add the following to the configuration file:

[lcov]
directories= client,drizzled,plugin
ssh_user= $YOUR_SSH_USER
rsync_dir= $TARGET_LOCATION

The "directories" is a comma-separated list of the directories in the Drizzle source tree you wish to run coverage reports against. The ssh_user is a user which has SSH key access to the $TARGET_LOCATION.

As an example, here is what we use for syncing the LCOV reports on Drizzle.org:

[lcov]
directories= drizzled,mystrings,mysys
ssh_user= jpipes
rsync_dir= drizzle.org:web/lcov/

Querying Results of LCOV Runs

By far the most useful benefit of the LCOV processing is the generated HTML reports, however the Drizzle Automation suite also records basic coverage information for your directories in the local database. This is useful if you want to track the coverage over time. You can query the local database like so:

mysql> select * from lcov_stats;
+----------+------------+---------------------+-----------+------------------+
| server   | version    | run_date            | dir_name  | coverage_percent |
+----------+------------+---------------------+-----------+------------------+
| drizzled | trunk-1033 | 2009-05-24 14:39:03 | drizzled  |            73.80 | 
| drizzled | trunk-1033 | 2009-05-24 14:39:03 | mystrings |           100.00 | 
| drizzled | trunk-1033 | 2009-05-24 14:39:03 | mysys     |            83.90 | 
+----------+------------+---------------------+-----------+------------------+

Randgen

The Randgen command runs the Random Query Generator which is a tool for testing the Drizzle and MySQL server by means of randomly generated data and queries. Both the data and the queries are highly configurable and various different scenarios can be tested.

$> drizzle-automation --bzr-branch=lp:drizzle --bzr-revision=1150 --bench-config-name=randgen randgen

This will create a sandbox for the r1150 revision of the trunk drizzle branch, and run the randgen tests specified in the /etc/drizzle-automation/randgen/randgen.cnf file.

SLOC

The SLOC command runs the SLOCcount program from David Wheeler against a configurable set of directories in a branch. It processes the output of the SLOCcount program and can store the processed results in a database.

Example usage:

$> drizzle-automation --bzr-branch=lp:drizzle --bzr-revision=900..HEAD sloc

Would run SLOCcount against revisions 900 through the latest revision in the trunk branch.

The information produced by drizzle-automation for the SLOC command is as follows:

For each directory in the [sloc][directories] configuration variable, and for each language SLOCcount finds in the directory, drizzle-automation collects the number of lines of source code.

By default, this command stores the count of source lines for each directory and language in the following directories:

drizzled, mystrings, mysys

Querying the SLOCCount Results

Once run, you can query the database like so:

jpipes@serialcoder:~/repos/drizzle-automation/trunk$ mysql --user=root drizzle_stats
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 56
Server version: 5.0.67-0ubuntu6 (Ubuntu)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> select * from sloc_stats;
+----------+------------+-----------+----------+---------------------+--------+
| server   | version    | directory | language | run_date            | count  |
+----------+------------+-----------+----------+---------------------+--------+
| drizzled | trunk-1033 | drizzled  | cpp      | 2009-05-24 13:38:57 | 109637 | 
| drizzled | trunk-1033 | drizzled  | perl     | 2009-05-24 13:38:57 |    146 | 
| drizzled | trunk-1033 | drizzled  | sh       | 2009-05-24 13:38:57 |    446 | 
| drizzled | trunk-1033 | drizzled  | yacc     | 2009-05-24 13:38:57 |   5184 | 
| drizzled | trunk-1033 | mystrings | cpp      | 2009-05-24 13:38:57 |  21441 | 
| drizzled | trunk-1033 | mysys     | cpp      | 2009-05-24 13:38:57 |  11954 | 
+----------+------------+-----------+----------+---------------------+--------+
6 rows in set (0.00 sec)

Sqlbench

The sqlbench command runs the sql bench benchmark.

Example usage:

$> drizzle-automation  --bzr-revision=1125 --bench-config-name=sqlbench sqlbench

After installing the Drizzle Automation suite as described at Automation_Documentation#Installing_the_Automation_Suite, make sure and edit configuration parameters if necessary in /etc/drizzle-automation/sqlbench/sqlbench.cnf. Currently this only contains the location of sql-bench home but can be used in the future to add more variables as needed.

After building the specified or current version of Drizzle, drizzle-automation then cd's to the sql-bench repository and runs the script "run-all-tests". An example of how the script is run:

./run-all-tests --server=drizzled --dir=results-1125 --log --connect-options=port=9306 
--bzr-repo=/home/drizzle/repos/sql-bench --machine=orisndriz05

The results are stored in a file name in the sqlbench_home/results-bzr_revision directory The results file name is "RUN-server_name-bzr_revision-hostname" For example: "results-1125/RUN-drizzled-1125-orisndriz0"

The results file has a section that looks like the following:

Operation                  seconds   usr     sys     cpu     tests
alter_table_add          13.00    0.01    0.00    0.01     100
alter_table_drop         12.00    0.00    0.00    0.00      91
connect                       2.00    0.84    0.36    1.20    2000
.......

drizzle-automation starts grabbing lines at the alter_table_add line until the end of the file

Querying the sqlbench Results

Once run, you can query the database with the following sql query which retrieves results for revisions 1118 to 1120:

SELECT r.version, i.operation_name, i.seconds, i.usr, i.sys, i.cpu
    FROM bench_config c 
    NATURAL JOIN bench_runs r
    NATURAL JOIN sqlbench_run_iterations i
    WHERE r.server = 'drizzled'
    AND i.engine = "innodb"
    AND r.version IN ('staging-1118', 'staging-1119','staging-1120')
    GROUP BY i.operation_name, r.version
    ORDER BY i.operation_name, r.version;

Sysbench

The Sysbench command builds the Drizzle server, configures it for a benchmark run, runs benchmarks using drizzle-sysbench against the server, and stores the results in a database.

It is the most complex of the automation commands, and requires some setup time to run properly.

Creating A Sysbench Configuration File

The first step in preparing to run benchmarks on your machine is to create a sysbench configuration file. I've created two sample configuration files you can use in config/sysbench/. One is for a read-only sysbench workload, and the other is for a read-write workload.

To copy the sample configuration files (RECOMMENDED over creating new ones), simple do:

$> cd ~/repos/drizzle-automation/trunk # or wherever you keep the automation suite...
$> sudo mkdir -p /etc/drizzle-automation/sysbench
$> sudo cp config/sysbench/* /etc/drizzle-automation/sysbench

This will put two configuration files in your /etc/drizzle-automation/sysbench directory:

  1. innodb_1000K_readonly.cnf
  2. innodb_1000K_readwrite.cnf

Let's take a peak at the sample innodb_1000K_readonly configuration file:

[drizzled]

# Options provided directly to the drizzled server
# when the server is started.

innodb_buffer_pool_size= 128M
innodb_log_file_size= 64M
innodb_log_buffer_size= 8M
innodb_thread_concurrency= 0
innodb_additional_mem_pool_size= 16M
table_open_cache= 4096
table_definition_cache= 4096

[mysqld]

# Options provided directly to the mysqld server
# when the server is started.

innodb_buffer_pool_size= 128M
innodb_log_file_size= 64M
innodb_log_buffer_size= 8M
innodb_thread_concurrency= 0
innodb_additional_mem_pool_size= 16M
character_set_server= utf8
table_open_cache= 4096
open_files_limit= 4096

[run]

iterations= 3
concurrency= 2,4,8,16,32,64,128,256,512,1024,2048

[sysbench]

# Options given to the sysbench program on every iteration

max-time= 60
max-requests= 0 
test= oltp 
db-ps-mode= disable 
drizzle-table-engine= innodb 
oltp-read-only= on 
oltp-table-size= 1000000

You will note that the default concurrency levels go up to 2048 concurrent connections for the readonly workload and up to 1024 concurrent connections on the readwrite workload.

On Linux and OpenSolaris boxes, you will get errors running high-concurrency workloads unless you change the file descriptor limits.

Changing ulimits on Linux

Add the following two lines to /etc/security/limits.conf:

soft    nofile            8192
hard   nofile            8192
Changing ulimits on OpenSolaris

Add the following two lines to /etc/system and then restart your system:

set rlim_fd_cur=8192
set rlim_fd_max=8192

8192 is pretty much arbitrary but should give you lots of room for open file descriptors...

Running the Command

To run sysbench through the automation suite, simply do:

$> drizzle-automation --bzr-branch=lp:drizzle \
--bzr-revision=1000 --bench-config-name=innodb_1000K_readonly sysbench

The above will run the configuration in /etc/drizzle-automation/sysbench/innodb_1000K_readonly.cnf on the r1000 revision of the trunk drizzle branch.

Querying for the Run Results

Once a sysbench run has completed successfully, you can query the local database for the results. Here is a quick SQL command which will give you a summary of the run results. Simply replace the $variables with your specific needs, as shown in the example below:

SELECT r.version, c.name, i.concurrency, AVG(i.tps) as TPS
FROM bench_config c 
NATURAL JOIN bench_runs r
NATURAL JOIN sysbench_run_iterations i
WHERE r.server = 'drizzled'
AND r.version = $version_name
GROUP BY r.version, c.name, i.concurrency;

$version_name will be the branch name and the revision separated by a hyphen if you are running in BZR processing mode (the default).

For instance, to show the results of the "stage" branch at revision 1032, I could do this:

mysql>  SELECT r.version, c.name, i.concurrency, AVG(i.tps) as TPS
    ->  FROM bench_config c 
    ->  NATURAL JOIN bench_runs r
    ->  NATURAL JOIN sysbench_run_iterations i
    ->  WHERE r.server = 'drizzled'
    ->  AND r.version = 'stage-1032'
    ->  GROUP BY r.version, c.name, i.concurrency;
+------------+------------------------+-------------+-------------+
| version    | name                   | concurrency | TPS         |
+------------+------------------------+-------------+-------------+
| stage-1032 | innodb_1000K_readonly  |           2 | 1085.490000 | 
| stage-1032 | innodb_1000K_readonly  |           4 | 1419.210000 | 
| stage-1032 | innodb_1000K_readonly  |           8 | 1949.573333 | 
| stage-1032 | innodb_1000K_readonly  |          16 | 2594.200000 | 
| stage-1032 | innodb_1000K_readonly  |          32 | 3383.833333 | 
| stage-1032 | innodb_1000K_readonly  |          64 | 3920.556667 | 
| stage-1032 | innodb_1000K_readonly  |         128 | 3927.426667 | 
| stage-1032 | innodb_1000K_readonly  |         256 | 3069.143333 | 
| stage-1032 | innodb_1000K_readonly  |         512 | 1982.510000 | 
| stage-1032 | innodb_1000K_readonly  |        1024 | 1206.643333 | 
| stage-1032 | innodb_1000K_readonly  |        2048 |  523.296667 | 
| stage-1032 | innodb_1000K_readwrite |           2 |  555.843333 | 
| stage-1032 | innodb_1000K_readwrite |           4 |  896.146667 | 
| stage-1032 | innodb_1000K_readwrite |           8 | 1129.113333 | 
| stage-1032 | innodb_1000K_readwrite |          16 | 1748.230000 | 
| stage-1032 | innodb_1000K_readwrite |          32 | 2168.320000 | 
| stage-1032 | innodb_1000K_readwrite |          64 | 2129.303333 | 
| stage-1032 | innodb_1000K_readwrite |         128 | 1771.130000 | 
| stage-1032 | innodb_1000K_readwrite |         256 | 1282.536667 | 
| stage-1032 | innodb_1000K_readwrite |         512 |  872.956667 | 
| stage-1032 | innodb_1000K_readwrite |        1024 |  511.540000 | 
+------------+------------------------+-------------+-------------+
21 rows in set (0.01 sec)

Profiling the Server During Benchmarking

There are a number of options for profiling the server with Valgrind and OProfile while running sysbench. This section details the process of profiling with these tools. Make sure you have installed the profiling tools you wish to use. Please refer to the installation section above.

The basic process of profiling involves running a regular SYSBENCH command automation run and adding a --sysbench-profiler=PROFILER option to the command, like so:

$> drizzle-automation --bench-config-name=innodb_1000K_readonly --profiler=callgrind sysbench

The above would do callstack analysis and profiling during a sysbench run of the innodb_1000K_readonly sysbench configuration.

The currently available profiler adapters are:

Memory Checking with Valgrind

To run memory checking against the server during a sysbench run, simply do:

$> drizzle-automation --bzr-branch=lp:drizzle \
                                 --bzr-revision=1000 \
                                 --bench-config-name=innodb_readonly_1000K \
                                 --profiler=memcheck sysbench

This will perform the innodb_1000K_readonly sysbench configuration while profiling with Valgrind's memory checking system. The output will be stored in the sandbox created during processing as memcheck.out. In the above example, the results would be stored in:

/$BZR_REPO_DIR/trunk-sysbench-r1000/memcheck.out

And would look very similar to this:

jpipes@serialcoder:~/repos/drizzle/trunk-sysbench-r1039$ cat memcheck.out 
==2093== Memcheck, a memory error detector.
==2093== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==2093== Using LibVEX rev 1854, a library for dynamic binary translation.
==2093== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==2093== Using valgrind-3.3.1-Debian, a dynamic binary instrumentation framework.
==2093== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==2093== For more details, rerun with: -v
==2093== 
==2093== My PID = 2093, parent PID = 1.  Prog and args are:
==2093==    /home/jpipes/repos/drizzle/trunk-sysbench-r1039/drizzled/drizzled
==2093==    --port=9306
==2093==    --basedir=/home/jpipes/repos/drizzle/trunk-sysbench-r1039
==2093==    --datadir=/home/jpipes/repos/drizzle/trunk-sysbench-r1039/var
==2093==    --innodb_buffer_pool_size=64M
==2093==    --innodb_log_file_size=32M
==2093==    --innodb_thread_concurrency=0
==2093==    --innodb_additional_mem_pool_size=16M
==2093==    --innodb_log_buffer_size=8M
==2093==    --table_open_cache=256
==2093==    --table_definition_cache=256
==2093== 
==2093== Conditional jump or move depends on uninitialised value(s)
==2093==    at 0x63FEF0: rw_lock_x_lock_func (sync0rw.ic:292)
==2093==    by 0x6B6ED2: log_group_checkpoint (log0log.c:1782)
==2093==    by 0x6B6F7F: log_groups_write_checkpoint_info (log0log.c:1890)
==2093==    by 0x64EB57: recv_recovery_from_checkpoint_start_func (log0recv.c:452)
==2093==    by 0x6B4B7A: innobase_start_or_create_for_mysql (srv0start.c:1526)
==2093==    by 0x61C379: _ZL13innobase_initR14PluginRegistry (ha_innodb.cc:1884)
==2093==    by 0x5092A4: _ZL17plugin_initializeP13st_plugin_int (sql_plugin.cc:525)
==2093==    by 0x50AA8E: plugin_init(int*, char**, int) (sql_plugin.cc:626)
==2093==    by 0x4180A4: _ZL22init_server_componentsv (drizzled.cc:1556)
==2093==    by 0x419D60: main (drizzled.cc:1782)
<lots snipped>
==2093== 
==2093== LEAK SUMMARY:
==2093==    definitely lost: 6,536 bytes in 39 blocks.
==2093==    indirectly lost: 2,881 bytes in 19 blocks.
==2093==      possibly lost: 2,432 bytes in 8 blocks.
==2093==    still reachable: 14,061,796 bytes in 300 blocks.
==2093==         suppressed: 0 bytes in 0 blocks.
==2093== Reachable blocks (those to which a pointer was found) are not shown.
==2093== To see them, rerun with: --leak-check=full --show-reachable=yes
Call Stack Profiling with Callgrind
.......
Cache Miss and Branch Prediction Analysis with Cachegrind

.....

Command line options

--bench-config-name

The --bench-config-name option is REQUIRED when running the crashme, dbt2, drizzleslap, sqlbench and sysbench commands, and is ignored if you specify it with any other command.

--bench-config-name indicates to drizzle-automation which "named" benchmark configuration you wish to use when running the BENCH command. A named benchmark configuration is simply a file in your /etc/drizzle-automation/<command>/ directory which contains bench-specific configuration information.

As an example please see the section above on running the SYSBENCH command for more details about what goes into a sysbench configuration file.

--bzr-repo-dir

Specifies the root BZR repository which drizzle-automation uses when processing branches. You may also specify a repository directory in a configuration file's [defaults] section.

Examples:

$> drizzle-automation --bzr-repo-dir=/home/myusername/repos/drizzle lcov

or in a configuration file:

[defaults]
bzr_repo_dir= /home/myusername/repos/drizzle

--bzr-branch

This command-line option is OPTIONAL. If missing, drizzle-automation looks for the [defaults][bzr_branch] configuration variable and uses that.

The --bzr-branch option specifies the name of the BZR branch which the drizzle-automation process will operate against.

For example, assume that we have a BZR repository at /home/username/repos/drizzle.

If we wanted to run the SLOC command for revision 967 in the "lp:~mordred/drizzle/my-experimental" branch, we would execute:

$> drizzle-automation --bzr-branch=lp:~mordred/drizzle/my-experimental --bzr-revision=967 sloc

--bzr-revision

This command-line option is OPTIONAL. If missing, the HEAD revision for the specified branch is processed.

A single BZR revision to process, or a range of revisions.

To specify a single revision, do:

$> drizzle-automation --bzr-revision=$REVNO $COMMAND

where $REVNO is a number corresponding to the BZR revno you want.

Processing a Range of BZR Revisions

To process a range of revisions, do:

$> drizzle-automation --bzr-revision=$REVNO_START..$REVNO_END $COMMAND

For instance, to run LCOV against revisions 931 through 950 inclusive, do:

$> drizzle-automation --bzr-revision=931..951 lcov

A special revision specifier of HEAD indicated to run up to the latest revision. It can be used in a revision range, too. For instance:

$> drizzle-automation --bzr-revision=950..HEAD lcov

would run LCOV automation processes for all revisions 950 up to whatever the latest revision is.

Finally, there is an option to have the automation suite query the local database for the last revision it has processed for a command and run up to a specific revision:

$> drizzle-automation --bzr-revision=LAST..HEAD lcov

would run LCOV automation processes for the next revision AFTER the last revision logged as completed in the local database, up to the HEAD revision in the branch. This is useful for automating processes where you don't know the last complete processed revision number...

--dry-run

--dry-run is a command-line option ONLY.

Tells drizzle-automation to not actually run the automation process requested, but only perform validation of configuration files, options specified, and the necessary sandbox checks.

--force

--force is a command-line option ONLY.

Forces drizzle-automation to continue processing even when it encounters an error.

--log-file

This command-line option is OPTIONAL. If missing, drizzle-automation looks for any of the following configuration file variables:

[defaults][log-file]
[$command][log-file]

If it fails to find any specified log-file, logging is output to stdout.

The --log-file option specifies a file to log output of the drizzle-automation program. You may also force logging to stdout by specifying "stdout" as the file name.

Example:

$> drizzle-automation --log-file=/tmp/automation.log -r987 --branch=trunk lcov

would run the LCOV command for the trunk branch at revision 987 and all output would be sent to the file at /tmp/automation.log

--no-pull

--no-pull is a command-line option ONLY.

Tells drizzle-automation to not actually try to pull source code from the launchpad bzr repository

--no-rsync

--no-rsync is a command-line option ONLY.

Tells drizzle-automation to NOT sync any results for the Doxygen or LCOV commands to a remote server.

--no-store-db

--no-store-db is a command-line option ONLY.

Tells drizzle-automation to NOT log any results of the command to a local database

--profiler

Name of the profiling tool to use for profiling the server during a benchmark run. The current options are:


--report

Provides the ability to run a report for previous runs of a benchmark. This is only for benchmarks that record information in a database (dbt2, drizzleslap, sysbench and sqlbench). For example:

% drizzle-automation  --report-name=dbt2 --bench-config-name=dbt2 -r1300 report

would print to standard out the normal report that is generated from running dbt2 for revision 1300. You can add --with-email-report if you want to send the information to the configuration specific email distribution list.

--sysbench-profiler-log-file

The location to put output from the profiler selected with the --sysbench-profiler option.

--use-root-sandbox

Use the value of bzr_repo_dir for builds, the default is false. An example for our hudson builds:

% drizzle-automation dbt2 --with-email-report --bench-config-name=dbt2 --no-pull --use-root-sandbox --bzr-branch=staging --bzr-repo-dir=$WORKSPACE

Which says to use the location of $WORKSPACE to do the build and also don't pull from the repository, assuming it is already up to date since Hudson takes care of updating from the repository

--with-email-report

--with-email-report is a command-line option ONLY

Tells drizzle-automation to email a report (currently on SYSBENCH regression report) upon successful completion of the run to recipients in the [$COMMAND][report_email] configuration variable comma-separated list of email addresses.

The drizzle-automation configuration files

The configuration files are where you will want to place most of your options. There is a sample automation.cnf configuration file in the /config directory of the source tree which you should copy to /etc/drizzle-automation/automation.cnf and edit to your needs.

drizzle-automation searches the following configuration files when running automation processes:

/etc/drizzle-automation/automation.cnf
/etc/drizzle-automation/$COMMAND.cnf
~/$COMMAND.cnf
$CWD/$COMMAND.cnf