First release
30 Dec 2016
Leonardo Silvestri

Status

This is an alpha release: it is expected to contain bugs and may be missing potentially important features.

Concepts

ztsdb strives for simplicity and a small code footprint. This first release weighs in at about 22k lines of code (C++ and R unit tests add another ~16k).

Time-series updates are designed with speed in mind. On slightly older server hardware (~2011 Xeon E7-4850 2GHz, 1333MHz DDR3), fewer than two cores can reliably sustain more than 100,000 updates/second on a 3-column time-series while computing aggregation statistics on these updates and handling queries without noticeable latency. On the same processor, data transfer between two instances runs at a little less than 1GB/second, which means that a time-series of 1 billion rows and 3 columns is transferred in roughly 50s.
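As a back-of-the-envelope check on the transfer figure, the sketch below estimates the raw payload and the average rate implied by a 50s transfer. The storage layout is an assumption made only for this illustration (8 bytes per value and an 8-byte time index per row); the text above does not state it, and the function names are hypothetical.

```python
def payload_bytes(rows: int, columns: int,
                  bytes_per_value: int = 8, bytes_per_index: int = 8) -> int:
    """Raw size of a time-series: one index entry plus `columns` values per row.

    The 8-byte widths are assumptions, not documented ztsdb facts.
    """
    return rows * (columns * bytes_per_value + bytes_per_index)


def implied_rate_gb_per_s(total_bytes: int, seconds: float) -> float:
    """Average transfer rate in GB/second (1 GB = 1e9 bytes)."""
    return total_bytes / seconds / 1e9


size = payload_bytes(1_000_000_000, 3)   # 1 billion rows, 3 columns
rate = implied_rate_gb_per_s(size, 50)   # transferred in roughly 50s
print(f"{size / 1e9:.0f} GB at {rate:.2f} GB/s")
```

Under these assumptions the payload is about 32GB and the 50s figure corresponds to an average of about 0.64GB/second; a different value width or wire format would change both numbers.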

ztsdb can be tightly integrated with R: queries can be written unquoted on the R command line. For this, we take advantage of R's lazy evaluation. Since the ztsdb query and manipulation language is a subset of R, a query is first parsed by R; instead of having R evaluate it, the query is then parsed again to build a ztsdb abstract syntax tree (AST). This AST is sent to the appropriate ztsdb instance, where it is evaluated. The result is then sent back to R and translated into one of R's data types. This makes a very efficient workflow possible: R provides a rich statistical analysis environment, while ztsdb provides the services for handling very large amounts of time-series data. The integration is tight enough that ztsdb feels more like an extension than a separate entity.
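The parse-locally, evaluate-remotely idea can be sketched in a few lines. This is only an analogy in Python, not ztsdb's R integration: it uses Python's standard `ast` module in place of R's parser, and the names `build_query`, `evaluate` and `remote_store` are hypothetical. The point it shows is that the client builds a tree without evaluating anything, and that names such as `price` are resolved only on the side that holds the data.

```python
import ast
import operator

# Binary operators the toy "remote instance" knows how to evaluate.
_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}


def build_query(source: str) -> ast.Expression:
    """Client side: parse the query text into a tree; nothing is evaluated."""
    return ast.parse(source, mode="eval")


def evaluate(node: ast.AST, store: dict):
    """Server side: walk the tree and evaluate it against the server's data."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, store)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return store[node.id]  # names are looked up remotely, not locally
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        return _BINOPS[type(node.op)](evaluate(node.left, store),
                                      evaluate(node.right, store))
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


# `price` and `size` exist only in the remote store, never on the client.
remote_store = {"price": 101.5, "size": 4}
tree = build_query("price * size + 1")
result = evaluate(tree, remote_store)
```

A real client would also serialize the tree before sending it and convert the returned value into a native type; both steps are left out here to keep the sketch short.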

ztsdb has a very flexible mechanism to run code on any remote instance. It is easy to capture local data (from R or from a ztsdb instance) and send it as part of query code to be executed on other instances (see in particular the escape operator). It is even possible to nest such queries, and query an instance on behalf of another instance. This provides a solid mechanism for load and data distribution.
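Capturing local data into a query before it is shipped can be sketched as a tree rewrite. The example below is a Python analogy under stated assumptions: the marker `esc(...)` is a hypothetical stand-in for an escape operator and is not ztsdb syntax, and `InlineEscapes` and `prepare_query` are illustrative names. Each marked name is replaced by its local value, so the query that leaves the client already carries the data.

```python
import ast


class InlineEscapes(ast.NodeTransformer):
    """Replace every esc(name) call with the local value bound to name."""

    def __init__(self, local_values: dict):
        self.local_values = local_values

    def visit_Call(self, node: ast.Call):
        self.generic_visit(node)  # rewrite nested escapes first
        if (isinstance(node.func, ast.Name) and node.func.id == "esc"
                and len(node.args) == 1
                and isinstance(node.args[0], ast.Name)):
            value = self.local_values[node.args[0].id]
            return ast.copy_location(ast.Constant(value=value), node)
        return node


def prepare_query(source: str, local_values: dict) -> ast.Expression:
    """Parse the query and inline escaped local values before sending it."""
    tree = ast.parse(source, mode="eval")
    tree = InlineEscapes(local_values).visit(tree)
    return ast.fix_missing_locations(tree)


# `factor` lives on the client; `price` lives on the remote instance.
tree = prepare_query("price * esc(factor)", {"factor": 3})
shipped = ast.unparse(tree)  # text form of what would be sent (Python 3.9+)

# Stand-in for remote evaluation; eval is used here only for the demo.
remote_result = eval(compile(tree, "<query>", "eval"),
                     {"__builtins__": {}}, {"price": 10})
```

Because the substitution happens on the tree rather than on the query text, the same step could be repeated by an instance that forwards a query to another instance, which is one way to picture the nested queries described above.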

Of course, these qualities of compactness, speed and flexibility come with trade-offs. From the ACID set, ztsdb can only claim ID, namely isolation and durability. Atomicity and consistency are left to the user. Additionally, distribution has to be solved by the user's architecture.

Collaboration

ztsdb is provided with the hope that it will be useful. I am extremely interested in all feedback, and I will be particularly grateful to you if you can explain your use case and why you found ztsdb was or was not suitable. Use cases and feedback will drive future development. Any collaboration is highly encouraged and welcome.

In order to maximize the ease of access for both users and developers, the project will have strong documentation, including details about the implementation.