This is the workspace for Padraig O'Sullivan, who is working on the Google Summer of Code project to re-implement INFORMATION_SCHEMA for Drizzle. Jay Pipes is mentoring Padraig.
Contents |
Before beginning work on extracting the current I_S implementation to be a plugin, it will be useful to document clearly how the current implementation works. By documenting its current design and implementation, ideas for a new design may become more apparent.
Another interesting exercise to perform for informational purposes at this stage of the project will be to look at other open-source database systems to see how data dictionaries are implemented in those systems. For example, PostgreSQL may be educational to look at. While the ideas will probably not directly apply in the case of Drizzle, we still believe it would be a worthwhile exercise to perform.
In order to measure any impact on performance, it is first necessary to have a baseline set of benchmarks at varying concurrency levels. We'll then compare the performance of the final codebase against these baselines to track our progress. We can use the existing Drizzle Automation project to perform these baseline benchmarks.
Once the current I_S implementation has been clearly documented along with the issues associated with the current implementation, we aim to extract this implementation and make it a plugin. A number of sub-steps will be involved in this stage:
Once I_S has been extracted to a plugin, this should be merged to the mainline branch. Of course, this is dependent on code review from the Drizzle development team and the outcome of regression testing. The next step of the project cannot be proceeded to until this stage has been completed.
Once I_S has been extracted into a plugin, work on a new design for I_S can begin. This stage of the project will likely require significant input from the wider community. While it is likely that Padraig can draw up a design himself, many other members of the community who have faced issues with the current I_S design have ideas on a new design for I_S. It is highly encouraged that input from the community be obtained for this stage of the project by sending an email to the mailing list asking for input.
Once discussion has occured with the community, a design will be decided on (after the community has at least given it some approval). This design will be clearly documented before any implementation effort proceeds. Again, feedback will be sought on this design document before implementation proceeds.
Once a design has been decided upon, an implementation of that design can be performed.
We have split the project in to a number of phases which correspond very tightly to the overall project goals listed in Section 1. A separate sub-section will be provided below with details on each separate phase of the project along with the following metrics: 1) estimated time to completion (in work days) and 2) estimated completion date.
Please note that all dates are purely estimates but we are cautiously optimistic that they can be met. We feel that having a deadline to aim for can make it easier for the developer to organise and plan his/her schedule and provides a clear goal to aim for.
Phase 1 of the project involves documenting how the current I_S implementation works. This will involve examining source code files, reading documentation, and stepping through execution of the server with gdb. Once this phase has been completed, we aim to have a complete understanding of how the current I_S implementation works along with the issues that are associated with this implementation. This phase will also include some time to look at the method which PostgreSQL uses to manage its system catalog as we believe that some valuable lessons can be learned from performing this exercise.
Estimated Time To Completion: 7 days
Estimated Completion Date: 05/29
The second phase of the project involves creating a new I_S plugin for Drizzle. This will involve becoming familiar with the plugin system in Drizzle and understanding the process of how to create a new plugin. Basically, for this phase of the project, the existing I_S implementation will not be radically changed; we only aim to extract what is currently there in order to create a new I_S plugin.
This phase will also involve regression testing in order to make sure that we have not introduced any substantial regression by extracting I_S into a plugin.
Estimated Time To Completion: 21 days
Estimated Completion Date: 06/20
Phase 3 of the project involves developing a new design for I_S in Drizzle. This phase of the project will not involve much development effort but will require a lot of documentation. Feedback from the community and project mentor is crucial for this phase of the project. It is likely that the new I_S design will require several rounds of revisions after the initial design is documented.
This phase of the project can also include some prototype implementations which will help with phase 4 when the actual implementation of the design is performed. These prototype implementations can occur in parallel with the I_S design so that the developer is not concentrating solely on documentation during this phase of the project.
Estimated Time To Completion: 28 days
Estimated Completion Date: 07/20
Phase 4 of the project is the implementation of the design specified during phase 3 of the project. If the design is well thought out and clearly specified, implementation should not be too difficult. Of course, any new code should be reviewed by several member of the Drizzle developer team before it is merged to trunk. Since this practice is part of the Drizzle development model, this fact does not need to be re-iterated further.
Estimated Time To Completion: 21 days
Estimated Completion Date: 08/15
Finally, phase 5 of the project is to ensure that the new implementation is rigorously tested. This will involve the creation of new test cases if necessary (which will also be inherent to phase 4 of the project) to obtain the highest code coverage possible of the new I_S implementation. This phase will also involve regression testing to determine what kind of performance the new design obtains.
During this phase of the project, the project's mentor should ensure that the documentation created as part of the project is acceptable. If modifications are needed to any documentation produced, this phase of the project would be a good time for the developer to respond to those requests.
Estimated Time To Completion: 7 days
Estimated Completion Date: 08/24
In this sub-section, we proceed to describe how I_S is currently implemented in Drizzle.
Currently, an I_S table is represented by an InfoSchemaTable structure. This structure is defined in drizzled/table.h as follows:
struct InfoSchemaTable { const char* table_name; ST_FIELD_INFO *fields_info; /* Create information_schema table */ Table *(*create_table) (Session *session, TableList *table_list); /* Fill table with data */ int (*fill_table) (Session *session, TableList *tables, COND *cond); /* Handle fileds for old SHOW */ int (*old_format) (Session *session, struct InfoSchemaTable *schema_table); int (*process_table) (Session *session, TableList *tables, Table *table, bool res, LEX_STRING *db_name, LEX_STRING *table_name); int idx_field1, idx_field2; bool hidden; uint32_t i_s_requested_object; /* the object we need to open(Table | VIEW) */ };
The fields_info member of InfoSchemaTable represents the fields of a particular I_S table. This structure is defined as follows (its definition is also found in drizzled/table.h):
typedef struct st_field_info { /** This is used as column name. */ const char* field_name; /** For string-type columns, this is the maximum number of characters. Otherwise, it is the 'display-length' for the column. */ uint32_t field_length; /** This denotes data type for the column. For the most part, there seems to be one entry in the enum for each SQL data type, although there seem to be a number of additional entries in the enum. */ enum enum_field_types field_type; int value; /** This is used to set column attributes. By default, columns are @c NOT @c NULL and @c SIGNED, and you can deviate from the default by setting the appopriate flags. You can use either one of the flags @c MY_I_S_MAYBE_NULL and @cMY_I_S_UNSIGNED or combine them using the bitwise or operator @c |. Both flags are defined in table.h. */ uint32_t field_flags; // Field atributes(maybe_null, signed, unsigned etc.) const char* old_name; /** This should be one of @c SKIP_OPEN_TABLE, @c OPEN_FRM_ONLY or @c OPEN_FULL_TABLE. */ uint32_t open_method; } ST_FIELD_INFO;
The current list of tables in I_S is defined statically at the bottom of the drizzled/show.cc file as an array called schema_tables. An example of the first few entries of this array is:
InfoSchemaTable schema_tables[]= { {"CHARACTER_SETS", charsets_fields_info, create_schema_table, fill_schema_charsets, make_character_sets_old_format, 0, -1, -1, 0, 0}, {"COLLATIONS", collation_fields_info, create_schema_table, fill_schema_collation, make_old_format, 0, -1, -1, 0, 0}, {"COLLATION_CHARACTER_SET_APPLICABILITY", coll_charset_app_fields_info, create_schema_table, fill_schema_coll_charset_app, 0, 0, -1, -1, 0, 0}, {"COLUMNS", columns_fields_info, create_schema_table, get_all_tables, make_columns_old_format, get_schema_column_record, 1, 2, 0, OPTIMIZE_I_S_TABLE}, ...
The information on the fields for each table is also defined statically in the drizzled/show.cc file. The fields for each table is represented as a separate array of ST_FIELD_INFO structures. As an example, the fields for the PLUGINS I_S table is represented as the array plugin_fields_info which looks like so:
ST_FIELD_INFO plugin_fields_info[]= { {"PLUGIN_NAME", NAME_CHAR_LEN, DRIZZLE_TYPE_VARCHAR, 0, 0, "Name", SKIP_OPEN_TABLE}, {"PLUGIN_VERSION", 20, DRIZZLE_TYPE_VARCHAR, 0, 0, 0, SKIP_OPEN_TABLE}, {"PLUGIN_STATUS", 10, DRIZZLE_TYPE_VARCHAR, 0, 0, "Status", SKIP_OPEN_TABLE}, {"PLUGIN_AUTHOR", NAME_CHAR_LEN, DRIZZLE_TYPE_VARCHAR, 0, 1, 0, SKIP_OPEN_TABLE}, {"PLUGIN_DESCRIPTION", 65535, DRIZZLE_TYPE_VARCHAR, 0, 1, 0, SKIP_OPEN_TABLE}, {"PLUGIN_LICENSE", 80, DRIZZLE_TYPE_VARCHAR, 0, 1, "License", SKIP_OPEN_TABLE}, {0, 0, DRIZZLE_TYPE_VARCHAR, 0, 0, 0, SKIP_OPEN_TABLE} };
When research initially started on the current I_S implementation, we were under the impression that the I_S would be created during server startup (as is common in many relational databases e.g. PostgreSQL). We were not aware of the fact that temporary tables were created during query execution and so spent some time looking at what happens with regards to I_S during server startup.
Looking at the main() function in drizzled/drizzled.cc, we were not able to find much occuring with regards to I_S functionality. That being said, there is a call to the add_status_vars() function from within the init_common_variables() function which adds server status variables to the dynamic list of status variables that are shown by 'SHOW STATUS'.
Thus, we are not paying much attention to server startup for the moment as we don't feel much related to I_S occurs during this stage. If we discover more information, we will update this section appropriately.
WORK IN PROGRESS
Query processing on an I_S table is best explained using an example query. Thus, we will use the following query for demonstration purposes:
SELECT constraint_name FROM key_column_usage;
The key_column_usage table is defined as follows in the schema_tables array described above:
{"KEY_COLUMN_USAGE", key_column_usage_fields_info, create_schema_table,
get_all_tables, 0, get_schema_key_column_usage_record, 4, 5, 0,
OPEN_TABLE_ONLY},
Below is a rough outline of the call-graph for this query (TODO: get a better diagram for this) where the indentation is intended to indicate what calls what (functions which we are not interested in are omitted):
JOIN::exec
get_schema_tables_result
get_all_tables
get_schema_table_idx
get_table_open_method
get_lookup_field_values
make_db_list
find_files
make_table_name_list
find_files
make_table_list
open_normal_and_derived_tables
open_tables
mysql_schema_table
create_schema_table
create_tmp_table
get_new_handler
get_schema_key_column_usage_record
store_key_column_usage
schema_table_store_record
close_tables_for_reopen
Basically, the query is issued against temporary tables which are created and populated during query execution. Now, we will attempt to step through the above execution and expand a little more on what happens.
Every query issued against an I_S table goes through JOIN::exec() (as does every query in general) which is the method that executes a query plan produced by the optimizer (TODO: expand on what the optimizer currently does with an I_S query). Within JOIN::exec(), a call to get_schema_tables_result() is performed. When the query being executed is on an I_S table, query execution will stop after this method call i.e. queries on I_S tables are satisfied by the get_schema_tables_result() function.
Within the get_schema_tables_result() function, there are some optimizations which are specific to the get_all_tables() function (only when session->lex->desribe is set). First, a check is performed to determine whether this I_S table was already processed. If it has been, then the corresponding temporary table does not need to be populated again (it may however need to be refreshed).
Assuming that the I_S table has not already been processed, a call to the fill_table member (which is a function pointer) of the InfoSchemaTable structure is made. We can determine what function will be called based on the definition of each InfoSchemaTable in the schema_tables array. The specific query we are investigating is on the key_column_usage table whose fill_table member is set to get_all_tables() (the fill_table member is set to get_all_tables() for a number of I_S tables).
The get_all_tables() function fills the I_S tables whose data are retrieved from storage engines. The first task performed by this function is to determine the method by which the files needed for populating the I_S table should be opened. Since we are walking through get_all_tables() in this case, the only method available is to open both data and index files (we will discuss the other possible methods later).
Next, a call to get_lookup_field_values() is performed. This function calculates lookup values from WHERE conditions. It in turn calls the calc_lookup_values_from_cond() function if the query is not a SHOW command. However, since this specific query we are investigating does not have a WHERE condition, it does not do anything of significance.
The next relevant function called from get_all_tables() is the make_db_list() function. This function creates the list of database names. It does this by calling the find_files() function.
find_files() does exactly as specified in its name - it finds files in a given directory. The directory it is given to search in is the Drizzle data directory (specified in drizzle_data_home). It does this by calling the my_dir() function which opens a directory and then proceeds to read all entries within that directory. Whether it searches for sub-directories or files is specified by the input parameter dir; if this parameter is true then it will search for sub-directories, otherwise it will search for files. In this case, since dir is true (when called from make_db_list), find_files() searches for sub-directories. For each sub-directory it finds within the Drizzle data directory, it converts the sub-directory name to a table name (which in this invocation of find_files() really represents a database name) (by calling filename_to_tablename) and then stores that name within the current Session object.
Next, after populating the list of database names, the get_all_tables() function will iterate through each database. For each database, it will make a call to the make_table_name_list() function that creates the list of table names within a specific database. Similar in spirit to make_db_list(), it does this by calling the find_files() function.
This invocation of find_files() is made with the dir parameter to find_files() set to false. Since dir is false in find_files(), it will search for files within a directory. The directory that find_files() is told to search in will be the Drizzle data directory concatenated with the database name that get_all_tables() is currently working with. For each file it finds within the given database directory, it converts the file name to a table name (by calling filename_to_tablename) and then stores that name within the current Session object.
Once the list of tables within a database have been determined, the get_all_tables() function iterates through that list. Next, a few optimizations are present for queries which are SHOW commands but these are not relevant for this specific query (we will attempt to document these optimizations at a later stage). Thus, the next relevant function called from get_all_tables() is the open_normal_and_derived_tables() function. This function opens all tables that are specified as an input parameter and processes derived tables.
open_normal_and_derived_tables() first calls the open_tables() function. The open_tables() function has specific logic to check whether the table being opened is an I_S table. If it is an I_S table, it calls the mysql_schema_table() function to process it. The mysql_schema_table() function is where the temporary table will be created for the I_S table. It does this by calling the create_table member of the InfoSchemaTable structure. We can determine what function will be called based on the definition of each InfoSchemaTable in the schema_tables array. The specific query we are investigating is on the key_column_usage table whose create_table member is set to create_schema_table() (as it is for the majority of I_S tables).
The create_schema_table() function first gets the list of fields that are needed for the specific I_S table by accessing the fields_info member of the InfoSchemaTable structure. Next, it loops through each field and creates a new Item based on the type of each field and adds each newly created item to a list of fields. Once this has been done, a Tmp_Table_Param object is created to store a description of the temporary table to create that is passed as a parameter to the create_tmp_table() function that is called next. At this stage, a temporary table has been created with the appropriate fields but it has not been populated with data yet.
After the call to open_normal_and_derived_tables() is completed and the temporary table has been created, the next important function call is to the process_table member of the InfoSchemaTable structure. Again, we can determine what function will be called based on the definition of each InfoSchemaTable in the schema_tables array. The specific query we are investigating is on the key_column_usage table whose process_table member is set to get_schema_key_column_usage_record().
The get_schema_key_column_usage_record() function is specific to the key_column_usage table. It calls the store_key_column_usage() function that stores information on key column usage for the current table that the get_all_tables() function is on (remember that get_all_tables() is currently iterating over each table in each database) in the temporary table that was created.
After all databases and tables have been traversed, the actual select query on the I_S table will read from the temporary table that was created.
I_S tables are currently implemented as temporary tables which are created and populated with data during query execution. This causes a number of issues with queries on the I_S. Preferably, the metadata that the I_S generates should be stored as a set of tables in the database. By keeping the metadata in the same format as the data, the system is made both more compact and simpler to use: users can employ the same language and tools to investigate the metadata that they use for other data, and the internal system code for managing the metadata is largely the same as the code for managing other tables [1].
Creating temporary tables and populating them with data during query execution can have a significant impact on performance. Evidence of this fact can be seen in several bugs that have been filed against the current I_S implementation due to its performance issues e.g. [2]. The manner by which these temporary tables are created is also likely to become a severe bottleneck when a large number of databases and tables are present. Remember that the Drizzle data directory is first searched for sub-directories which represent databases and then each sub-directory is searched for files which represent tables in each database. This searching is performed by making calls to my_dir() which is just a wrapper around opendir() and readdir() calls. This can obviously wind up being very inefficient if a Drizzle data directory contains many files.
TODO: create a simple benchmark (some small SQL script) to show how bad performance can get.
A very rough start has been made on extracting the current I_S implementation to be a plugin. This is not meant to be code that will be merged. This is being done in order for the author to become more familiar with Drizzle's plugin system and also to get a better feel for what an interface for an I_S plugin should be. After all, only after trying to create an I_S plugin can we see what issues it brings about.
The main branch for this work is:
lp:~posulliv/drizzle/info-schema-plugin
At the moment (as of 06/16), 1 patch related to this work has been merged (from the branch lp:~posulliv/drizzle/refactor-info-schema-class). This initial work was a re-factoring of the InfoSchemaTable structure. The InfoSchemaTable structure is now a fully fledged class where all data members are private. Accessors have been provided in order for clients to access the information in this class. We believe this effort was necessary since building a plugin on a data structure which was not very suitable would have been un-desirable.
Now that the InfoSchemaClass has been re-factored, a very small patch has been created that just extracts the PROCESSLIST table. The PROCESSLIST table is now part of the info schema plugin and has been removed from the show.cc file. We are still waiting for feedback on this patch. Once feedback is received, we can proceed to extract more of the I_S tables into the plugin we have created (named info_schema).
A branch which serves a proof-of-concept has also been created. This branch is too large to propose for merging as a single patch. In this branch, all of the I_S tables have been extracted into a plugin. This branch can be obtained at:
lp:~posulliv/drizzle/full-i-s-plugin
WORK IN PROGRESS - this is really just a stream of thoughts at the moment that I am writing down somewhere...
We believe the main topic of this project is metadata management. A method for storing metadata (i.e. a system catalog) efficiently needs to be devised for Drizzle. Once this is in place, we believe that we can develop an I_S using this system catalog as a base. First, we shall define what we are referring to when we say system catalog.
In [3], a repository (also referred to as a dictionary or catalog) is defined as something which holds descriptions of the objects in a system and records the interdependencies among those objects. Some of the key properties of a repository are:
Any system catalog we design should possess the properties listed above if possible. Initially, we are thinking that a set of base tables shall be used as the store for the system catalog. These base tables will exist in a particular database designated for this purpose e.g. SYSTEM. These base tables should be created when the server starts for the first time. On subsequent server starts, the base tables should be opened with the data that already exists within them. What exactly will be stored in these tables and their structure can be decided on at a later stage when these ideas have been fleshed out more.
An interface will need to be provided to these base tables. We need to be able to retrieve information from them and update them with information. Since Drizzle is based on a micro-kernel design, we believe message passing is likely to be the best way to accomplish this. An interface needs to be defined that receives and sends messages (GPB messages is the likely tool to use here). These messages would represent changes in the metadata. For example, if a new table is created a message should be created and sent to whatever module implements the interface to the base tables.
Plugins can register as observers of this metadata so that whenever an event occurs related to metadata, an appropriate message will be sent to each observer. Utilizing this approach, we could write I_S as some sort of plugin which registers as an observer of these metadata events and updates the information it is storing appropriately.
Ideally, I would like the I_S to be implemented as views defined on top of the base tables. I feel that this would be the best approach as we would not need to worry about updates to the metadata in this case since when a query is issued on an I_S view, the appropriate data will be materialized.
Of course, this means we need support for views in Drizzle. Views are likely to be added as a plugin in the not too distance future...
We will briefly expand (without going in to much detail) on how I_S is implemented in PostgreSQL as it is an open-source relational DBMS with a rich academic heritage that implements the I_S SQL standard [4].
Basically, the I_S in PostgreSQL consists of a set of views that are defined on the PostgreSQL system catalogs which are specific to PostgreSQL. Since I_S is defined in the SQL standard, the views defined on the system catalogs remain stable and do not contain information about PostgreSQL specific features. The system catalog base tables are defined in header files contained in the src/include/catalog directory in a PostgreSQL source tree and internal methods for accessing and manipulating these system catalogs are implemented in C files in the src/backend/catalog directory in a PostgreSQL source tree. One other file which is of importance is the information_schema.sql file under the src/backend/catalog directory. This file actually creates the SQL commands for creating the various I_S views defined in the SQL standard.
The system catalog is created when a server is initialized and is stored under the SYSTEM tablespace in PostgreSQL. The I_S views defined in the SQL standard are created during server initialization also. Since these views are dynamic, they provide the latest information contained in the system catalog base tables when a query is issued against them. Also, since the I_S is implemented as views over base tables, the internal system code for managing the metadata is largely the same as the code for managing other tables. Thus, the optimizer can play an important role in optimizing queries issued against an I_S view.
We believe the Oracle DBMS follows a similar approach by defining a set of views over some base tables which contain metadata. Oracle differs however in that it does not provide an I_S as defined in the SQL standard. The set of views defined in Oracle are specific to that particular database and are referred to as the data dictionary. Extensive information can be gleaned from these views defined in Oracle's data dictionary. All data dictionary views in Oracle start with v$ and are defined over base tables which all start with x$. These base tables are hidden from users of the database and database users and administrators are discouraged from reading directly from these base tables since they are highly de-normalized.
[1] Joseph M. Hellerstein and Michael Stonebraker. Anatomy of a Database System. In Readings in Database Systems, 4th Edition. 2005.
[2] http://bugs.mysql.com/bug.php?id=19588
[3] Jim Gray and Andreas Reuter. Transaction Processing: Concepts and Techniques. 1993.
[4] http://www.postgresql.org/docs/current/interactive/information-schema.html