In this post we look at methods to define data marts in the Informix Warehouse Accelerator (IWA). IWA is a software appliance for the Informix database server, using in-memory, columnar data storage for business intelligence applications doing OLAP. For such workloads IWA typically can achieve a query acceleration of 10 - 100 times faster than running the workload within the Informix database server.
Part 3: Defining lWA Data Marts
In the last post we used the IWA Example Database to demonstrate creation and administration of a data mart. This was quite easy because we used a ready-made data mart definition. Additionally, the example database itself is pretty simple. It serves well for explaining and understanding some basic concepts, but real use case scenarios are much more complex. While the administration steps for data marts always remain simple enough, finding the proper definition for a data mart can be quite difficult. And if the data mart is to accelerate some report that consists of hundreds of queries this does not make things easier. Not all of the queries may be acceleratable. Figuring out which ones are and from there the tables and columns needed in the data mart can be a herculean task. Sometimes such reports are 'grown' over many years and there may be nobody around knowing all the details.
One approach is to include all tables with all their columns in the data mart - or at least all columns of supported data types. However, this probably is not optimal and may even not be possible because the resulting data mart may be too big. Often there are some lengthy description columns never used by the typical OLAP queries. Such columns are prime candidates for omission from a data mart, making the data mart smaller and the data loading faster. Identifying these columns is a problem, especially when there are hundreds or even thousands of tables in the database.
Workload analysis attacks the problem from the other side: Analyze a given workload (e.g. one or several reports), filter all the queries that can be accelerated and determine just the tables and columns that these queries need. For this task we use built-in functionality, called "query probing", in the Informix database server. The "query probing" does the filtering of acceleratable queries and automatically collects all the necessary information for building a data mart definition, including the references between tables. This is especially useful when no primary-foreign relational constraints are defined, which often is the case in a data warehouse.
We can further combine the "query probing" with the SQL Tracing feature. This also collects timing information on the individual queries in a workload. Obviously, we need to actually execute the workload in order to get the timings. Once we have all the information from the "query probing" and the SQL Tracing, we can see which of the long running queries can be accelerated. We may even be able to determine long running queries that cannot be accelerated and perhaps can change them in some way or other to make them acceleratable.
Finally, we use the integrated function "ifx_probe2mart()" to automatically create a data mart definition from the collected query probing data. From there it is then straightforward to create the data mart itself. The data mart load step after the creation will show how big the data mart is and whether at all it fits into IWA's memory. If the data mart turns out to be too big, we can scrutinize the data mart definition to determine some column or table that we can remove as a compromise. We can also search in the collected SQL Trace statements for that table to see, which queries need the table or column we would like to remove and what run time they had. After all, we may not want to remove something that is needed by the longest running query and thus making it non-acceleratable.
Workload Analysis Example
Let's put workload analysis to work in a very simple example. We again use the IWA Example Database named "iwadb" from the last post. As workload we use the three example queries in the files "q1.sql", "q2.sql" and "q3.sql" that come with it. Following are the steps:
0. Preparation: Update statistics and create a logging database named "martdb" for the data mart definition:
1. Clear probing data possibly left over from previous probings, turn on query probing and run the workload without actually executing it:
2. Check for collected probing data:
3. From the collected probing data create a data mart definition for a data mart with name "new_mart" in the logging database "martdb":
4. Extract the data mart definition to an XML file named "new_mart.xml":
Few thoughts on the above example:
We need to perform the preparation step (0) only once, even if later we want to adapt the data mart definition to a changed query workload, e.g. when new queries get added to the workload.
Probing data is kept in memory only. After a restart of the Informix server all previously collected probing data will be gone.
For the actual probing we run the statements to set the environment and the actual workload in the same dbaccess session. This is necessary, because the shown environment settings are valid only for the duration of the session. Otherwise, it is also possible to set the variable "use_dwa" for a specific (existing) session. Alternatively, the setting can be placed in a specific "sysdbopen()" procedure.
As we probe the workload without really executing it, we do not use SQL Tracing in this example. The SQL Tracing anyway would not produce timing information.
The command "onstat -g probe" basically shows the raw probing information with little formatting to make it 'readable'. The command is useful to just check if any probing data was collected. In depth interpretation of the data is something for Tech Support.
To perform the procedure "ifx_probe2Mart()", we need to be connected to the logging
database that we created in the preparation step. If a definition for the data mart already exists, any newly probed tables or columns will be added to the existing definition.
Therefore, if we want to get rid of an existing data mart definition in the logging database, we have to manually delete it with an SQL statement like the following:
When extracting the data mart definition as XML file, we use the '!' to avoid that the function "lotofile()" adds a string to make the file name unique between repeated runs.
In the last blog post I promised to touch on more conceptual details and administration tasks for data marts in IWA. Today's topic of defining a data mart does not exactly fit my promise, but at least it is an important aspect of the overall concept. I will certainly return to more details in a future blog post.
Advisory Development Engineer
Connect with me on LinkedIn