Compression of TimeSeries data was first introduced in 12.10.xC3. The feature applies compression algorithms to sensor data to maximize storage efficiency, but it was limited to numeric types (smallint, integer, bigint, real, and float). That worked well for applications that captured only numeric sensor data, but many use cases also capture non-numeric data. This latest enhancement to TS compression supports applications that capture non-numeric data from sensors.
Sensor Data Compression with Additional Data Types
There are currently 6 compression options for the numeric data types: q(), ls(), lb(), qls(), qlb(), and n(), where n() means no compression. Previously, if a TS value had a subtype that contained any other data type (besides the required timestamp column), TS compression could not be used.
The main goal of this enhancement in 12.10.xC10 was to support compression on time series data that includes one or more columns of type lvarchar. This feature is not meant to compress the lvarchar column itself.
As the design of the feature took shape, it was clear that supporting the lvarchar type as a non-compressed column within the time series was more complicated than supporting most other types. The good news is that when we completed this feature we were able to support all the types that are supported in a TimeSeries element, namely:
CHAR and NCHAR Considerations
The CHAR and NCHAR data types can be treated in two ways: 1) a variable length string with no space padding or 2) a fixed length, space padded, string.
If you have a CHAR(30) holding the string 'text', the value is the word 'text' followed by 26 spaces. By default, TS trims the trailing spaces and stores what is essentially a variable length string: a length followed by the 4 letters of the text. If the column has a maximum length under 256, one byte is used for the length; otherwise 2 bytes are used.
In this example, the compression buffer stores 5 bytes by default: a 1-byte length followed by the word 'text'. Compared with the 30-byte padded value, this saves 25 bytes.
Alternatively, let's look at a CHAR(2) field. It may hold a code that is never padded with spaces. By default, we store a 1-byte length followed by the 1-byte or 2-byte value, for a total of 2 or 3 bytes. A user who is aware of this scenario can pass an option to the n() compression parameter, n(1), to change the behavior of CHAR and NCHAR to a fixed length, space padded string. In this case TS stores a 2-byte value, since the length is not stored, which may save 1 byte per row. This option can be applied to any CHAR or NCHAR column.
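The byte counts above can be checked with a small model. This is only an illustrative sketch of the arithmetic described in the text, not the TimeSeries implementation: the function name `stored_bytes` and its `fixed` flag are hypothetical, and it assumes single-byte characters.

```python
def stored_bytes(value: str, declared_len: int, fixed: bool = False) -> int:
    """Estimate the bytes stored for one CHAR(declared_len) value.

    Hypothetical model of the rules described in the text; assumes
    one byte per character.
    """
    if len(value) > declared_len:
        raise ValueError("value is longer than the declared column length")
    if fixed:
        # Fixed-length, space-padded mode (the n(1) behavior described above):
        # the full declared width is stored and no length prefix is needed.
        return declared_len
    # Default mode: trailing spaces are trimmed and a length prefix is stored.
    trimmed = value.rstrip(" ")
    prefix = 1 if declared_len < 256 else 2
    return prefix + len(trimmed)


print(stored_bytes("text", 30))           # CHAR(30) holding 'text', default mode
print(stored_bytes("AB", 2))              # CHAR(2) code, default mode
print(stored_bytes("AB", 2, fixed=True))  # CHAR(2) code, fixed-length mode
```

Under these assumptions the three calls give 5, 3, and 2 bytes, matching the figures in the text: the default mode wins for long, mostly padded columns, while the fixed-length mode wins for short columns that are always full.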
If you use swing door compression, you need space in the internal buffer to store at least the current row and a saved row. However, as more rows are compressed, you may see a better compression ratio.
The original buffer for compression was fixed at just under 4K. An lvarchar column declared with no length specification gets the default maximum length of 2048 bytes. Using swing door compression, we immediately found that the buffer did not have enough space for 2 rows. Buffer management needed work, including determining what buffer size works best.
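A rough calculation shows why the original buffer was too small. The sketch below is an assumption-laden lower bound, not the actual buffer accounting: it counts only the 2048-byte lvarchar payload per row (ignoring the timestamp, any length prefix, and other columns), takes 4K to mean 4096 bytes, and uses hypothetical function names.

```python
LVARCHAR_DEFAULT_MAX = 2048  # default maximum length cited in the text


def rows_that_fit(buffer_bytes: int, row_bytes: int) -> int:
    """Number of whole worst-case rows that fit in the buffer."""
    return buffer_bytes // row_bytes


def fits_swing_door(buffer_bytes: int, row_bytes: int) -> bool:
    """Swing door compression needs the current row plus a saved row."""
    return rows_that_fit(buffer_bytes, row_bytes) >= 2


# Assumption: 4095 bytes is the largest size that is still "just under 4K".
old_buffer = 4095
print(rows_that_fit(old_buffer, LVARCHAR_DEFAULT_MAX))
print(fits_swing_door(old_buffer, LVARCHAR_DEFAULT_MAX))
```

Even with this optimistic row size, any buffer smaller than 4096 bytes holds only one worst-case row, so the two-row requirement cannot be met.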
With this feature, advanced users can specify the buffer size. If none is specified, TimeSeries picks a default buffer size equal to the usable page size of the container. TimeSeries checks that the buffer size is reasonable for the given time series.
With this feature, we have introduced a new compression option - bc() - for buffer control. You may only specify it once in a compression string. It takes on several forms: