ztsdb is a fast, small and lightweight multi-user NoSQL column-store database management system designed and optimized for updating, storing and handling time-series data. Its query and manipulation language is based on the R programming language and allows complex selections of data within a time-series or across multiple time-series. It is free software licensed under the GPLv3.

At a glance

  • query and manipulation language based on the R programming language
  • the fundamental data types are arrays and time-series and all can be defined as persistent
  • seamless integration with R: ztsdb queries can be freely intermixed with R
  • coherent and rich representation of date/time with nanosecond precision and built-in time-zone awareness
  • seamless connectivity with other ztsdb instances
  • continuous/streaming updates
  • n-dimensional time-series
  • C and C++ interfaces for data insert
  • C++ interface for query and data update
  • dynamically extensible
  • licensing: GPLv3

Examples

Query and manipulation language based on R

Being mostly a subset of R, the language should feel very familiar to R users. This example shows the creation of a persistent time-series (any ztsdb data object can be either in-memory or persistent). The time-series is created and stored in "/tmp/my_first_zts", and can be updated:

  • via an external C++ process and/or
  • via a connection created on the R command line and/or
  • via another ztsdb instance

### the time-series' index is a vector of time of length 0:
idx <- as.nanotime(NULL)
### the data is a matrix of double with 0 rows and 3 columns:
data <- matrix(NaN, 0, 3, dimnames=list(NULL, c("one", "two", "three")))

z <<- zts(idx, data, file="/tmp/my_first_zts")

Seamless integration with R

The following code can be run directly on the R command line. It creates a connection to a ztsdb instance and then uses the ? operator to execute the query on the connected instance. The results are translated to R types (in particular ztsdb time-series are translated to data.table-based time-series). Let's suppose that the time-series z in the previous example was created on a local instance listening on port 15001 and that it is populated with data.

con1 <- connection(host="127.0.0.1", port=15001)  # create a connection to a specified DB instance

con1 ? head(z)         # get the first 6 observations of time-series 'z'
con1 ? tail(z, 10)     # get the last 10 observations of time-series 'z'

con1 ? z[|+2016-01-01 00:00:00 UTC -> 2017-01-01 00:00:00 UTC-|, ]  # get the specified time slice

Coherent and rich representation of date/time

The following temporal built-in types exist: nanotime, nanoival, nanoduration and nanoperiod. They have nanosecond precision with operations which, when relevant, are time-zone aware. Arithmetic and set operations are defined on these types as well as other functions such as sequence generation and alignment.

start <- |.2016-03-01 00:00:00 America/New_York.|      # date time constant
p     <- as.nanoperiod("1m1d/-12:00:00")               # period (variable length), 1 month + 1 day - 12 hours
end   <- `+`(start, p, tz="America/New_York")          # time-zone is relevant with a period
ivl   <- nanoival(start, end)                          # interval constant (by default does not contain 'end')

march_and_a_bit_more <- z[ivl, ]                       # subset the elements of 'z' that lie in 'ivl'
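
The comment on 'ivl' above notes that an interval excludes its end by default. The C++ sketch below illustrates that half-open convention on a nanosecond-precision index. It is not ztsdb code: the names TimePoint, Interval and rows_in are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not ztsdb types.
using Nanos = std::chrono::nanoseconds;
using TimePoint = std::chrono::time_point<std::chrono::system_clock, Nanos>;

struct Interval {
    TimePoint start;  // included
    TimePoint end;    // excluded (half-open interval)

    bool contains(TimePoint t) const { return start <= t && t < end; }
};

// Return the positions of the index entries that fall inside 'ivl'.
std::vector<std::size_t> rows_in(const std::vector<TimePoint>& idx,
                                 const Interval& ivl) {
    assert(ivl.start <= ivl.end);
    std::vector<std::size_t> rows;
    for (std::size_t i = 0; i < idx.size(); ++i) {
        if (ivl.contains(idx[i])) rows.push_back(i);
    }
    return rows;
}
```

With this convention an observation stamped exactly at 'end' is left out, so two adjacent intervals never select the same observation twice.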

seq is a fast sequence generation function which is, when relevant, time-zone aware. The following example generates a daily sequence and combines it with a copy shifted by 12 hours to obtain a sequence with a 12-hour step:


start <- |.2000-01-01 12:00:00 Europe/London.|
end   <- |.2016-01-01 12:00:00 Europe/London.|

day_seq  <- seq(from=start, to=end, by=as.nanoperiod("1d"), tz="Europe/London")
twelve_hour_seq <- union(day_seq, day_seq - as.nanoduration("12:00:00"))

For more information on the seq function see Generating sequences.
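
The construction used above (a daily sequence united with a copy of itself shifted by 12 hours) can be sketched in C++ as follows. This is a simplified illustration, not ztsdb code: it uses fixed 24-hour durations and therefore ignores time zones and daylight saving, which the period-based seq call handles. The helper names make_seq, seq_union and twelve_hour_seq are hypothetical, and including the 'to' bound in the sequence is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <iterator>
#include <vector>

using Nanos = std::chrono::nanoseconds;

// Generate from, from+by, ... up to and including 'to' (assumed inclusive).
std::vector<Nanos> make_seq(Nanos from, Nanos to, Nanos by) {
    assert(by > Nanos::zero());
    std::vector<Nanos> out;
    for (Nanos t = from; t <= to; t += by) out.push_back(t);
    return out;
}

// Sorted set union of two sorted sequences; common elements appear once.
std::vector<Nanos> seq_union(const std::vector<Nanos>& a,
                             const std::vector<Nanos>& b) {
    std::vector<Nanos> out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
}

// A daily sequence combined with a copy shifted back by 12 hours.
std::vector<Nanos> twelve_hour_seq(Nanos from, Nanos to) {
    const Nanos day = std::chrono::hours(24);
    const Nanos half_day = std::chrono::hours(12);
    const std::vector<Nanos> days = make_seq(from, to, day);
    std::vector<Nanos> shifted;
    for (Nanos t : days) shifted.push_back(t - half_day);
    return seq_union(days, shifted);
}
```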

align is a powerful function that allows matching up unaligned data as well as building aggregated data. In the following example we calculate the median of each minute:

### calculate the median of each minute for the observations in 'z':
one_minute <- as.nanoduration("00:01:00")
minutes <- seq(from=start, to=end, by=one_minute)
a <- align(z, minutes, -one_minute, method="median")

For more information on the align function see Align operations.
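
To make the idea of aggregating onto alignment points concrete, here is a small C++ sketch of a per-window median. It is not ztsdb's implementation. The names Obs, median and align_median are hypothetical, and the window boundaries (observations in (t - window, t] for each alignment point t) and the NaN result for empty windows are assumptions made for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// One observation: a timestamp in nanoseconds and a single value.
struct Obs {
    std::int64_t t_ns;
    double value;
};

// Median of a non-empty vector (taken by value because it is reordered).
double median(std::vector<double> v) {
    assert(!v.empty());
    const std::size_t mid = v.size() / 2;
    std::nth_element(v.begin(), v.begin() + mid, v.end());
    if (v.size() % 2 == 1) return v[mid];
    // Even count: average the two middle values.
    const double lower = *std::max_element(v.begin(), v.begin() + mid);
    return (lower + v[mid]) / 2.0;
}

// For each alignment point t, take the median of the observations whose
// timestamp lies in (t - window_ns, t]. Empty windows yield NaN.
// A plain double loop is used for clarity, not for speed.
std::vector<double> align_median(const std::vector<Obs>& obs,
                                 const std::vector<std::int64_t>& points,
                                 std::int64_t window_ns) {
    assert(window_ns > 0);
    std::vector<double> out;
    for (std::int64_t t : points) {
        std::vector<double> bucket;
        for (const Obs& o : obs) {
            if (o.t_ns > t - window_ns && o.t_ns <= t) bucket.push_back(o.value);
        }
        out.push_back(bucket.empty() ? std::nan("") : median(bucket));
    }
    return out;
}
```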

For more information on the usage of temporal types see Arithmetic operations on temporal types.

Seamless connectivity with other ztsdb database instances

An expression on the right side of the query operator ? is evaluated on the remote instance unless it is escaped. The escape operator ++ moves the evaluation of the escaped sequence to the local instance on which the connection is defined. Together with nesting, this allows for complex queries on remote instances that may in turn query other instances to evaluate the query.

### the code below can be run unchanged both on the R command line and on the ztsdb command line
port1 <- 15001
port2 <- 15002
con1 <- connection("127.0.0.1", port1)  # create a connection to instance 1
con1 ? (a <<- matrix(1:3e7, 1e7, 3))    # create on instance 1 a 10000000x3 matrix

con1 ? (port2 <<- ++port2)     # define variable 'port2' on instance 1 using local variable 'port2'
con1 ? (con2 <<- connection("127.0.0.1", port2))  # create a connection to instance 2 on instance 1
con1 ? (ivl <<- ++ivl)         # define variable 'ivl' on instance 1

con1 ? con2 ? z[++ivl, ]       # ask instance 1 to get from instance 2 the subset of 'z' in interval 'ivl' 

Additionally, since a query is an expression like any other, queries can be directly embedded inside other expressions; multiple queries in a single expression will dispatch in parallel.

con1 <- connection("192.168.0.1", 15001)  # create a connection to instance 1
con2 <- connection("192.168.0.2", 15002)  # create a connection to instance 2
x <- (con1 ? y) + (con2 ? z)   # with 'y' and 'z' defined respectively on instance 1 and instance 2

Continuous/streaming update

ztsdb timers allow repeated execution of code at a predefined interval. A timer is executed in its own context, just like any incoming request. As it shares the same global environment with other contexts, it has access to any variable defined therein (including, of course, time-series). Using timers enables the continuous updating of time-series (and more generally of any variable) based on changes happening to other time-series (or other variables). The example below shows the code to set up a continuous calculation of the minute mean based on a time-series z that we suppose is being continuously updated at sub-minute granularity.

### create a couple of zts of size 0x3; 'z' on which we suppose sub-minute granularity
### updates, and 'mmean' where we will put the minute-means calculated from 'z':
data  <-  matrix(0, 0, 3, dimnames=list(NULL, c("a","b","c")))
idx   <-  as.nanotime(NULL)
z     <<- zts(idx, data)
mmean <<- zts(idx, data)

ten_ms <- as.nanoduration(1e7)
t1 <- timer(ten_ms,
            loop = {
                ### if a minute is complete, calculate its mean and add it to 'mmean':
                current_minute <- floor(tail(zts.idx(z), 1), "minute")
                if (current_minute >= last_minute + one_minute) {
                    last_minute <- current_minute
                    m <- align(z, last_minute, -one_minute, method="mean")
                    rbind(--mmean, m)
                }
                ### keep the size of z reasonable:
                if (nrow(z) > 2e6) {
                    zts.resize(--z, start=nrow(z)-1e6)
                }
            },
            once = {
                one_minute  <- as.nanoduration("00:01:00")
                last_minute <- floor(Sys.time(), "minute")
            })
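
The "has a minute completed?" test inside the loop body can be sketched on its own in C++. The struct below is a hypothetical illustration, not part of ztsdb: it keeps the last processed minute between ticks and uses std::chrono::floor (C++17) to truncate the newest timestamp to its minute.

```cpp
#include <cassert>
#include <chrono>

using Nanos = std::chrono::nanoseconds;
using Minutes = std::chrono::minutes;

// State kept between timer ticks (mirrors 'last_minute' in the example above).
struct MinuteTracker {
    Minutes last_minute;

    // Called on every tick with the timestamp of the newest observation,
    // expressed as a non-negative offset from some epoch. Returns true, and
    // advances the state, when the newest observation lies at least one
    // whole minute after the last processed minute.
    bool minute_completed(Nanos newest_obs) {
        assert(newest_obs >= Nanos::zero());
        const Minutes current = std::chrono::floor<Minutes>(newest_obs);
        if (current >= last_minute + Minutes(1)) {
            last_minute = current;
            return true;
        }
        return false;
    }
};
```

Because the state only advances when a new minute is detected, ticks that arrive every 10 ms within the same minute do no aggregation work.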

N-dimensional time-series

Each time observation can be associated with an n-dimensional array. In the following example, a random 10x2 matrix is created and the rolling covariance over a window of 4 observations is calculated. The rollcov function returns the covariance between each pair of columns, thus returning a 10x2x2 time-series.

one_second <- as.nanoduration("00:00:01")
start <- |.2015-08-06 06:38:01 America/New_York.|
idx   <- seq(from=start, length.out=10, by=one_second)
data  <- runif(matrix(0, 10, 2))   # random matrix 10x2
z     <- zts(idx, data)
rollcov(z, z, 4)
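
To show why the result has one 2x2 slice per observation, the following C++ sketch computes a rolling covariance matrix over column-wise data. It is an illustration only, not ztsdb's rollcov: the names cov, Cube and rolling_cov are hypothetical, and the use of the sample covariance (division by n - 1) and of NaN for rows without a full window are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sample covariance of x[first, first+n) and y[first, first+n).
double cov(const std::vector<double>& x, const std::vector<double>& y,
           std::size_t first, std::size_t n) {
    assert(n >= 2 && x.size() == y.size() && first + n <= x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t i = first; i < first + n; ++i) {
        mx += x[i];
        my += y[i];
    }
    mx /= n;
    my /= n;
    double s = 0.0;
    for (std::size_t i = first; i < first + n; ++i) {
        s += (x[i] - mx) * (y[i] - my);
    }
    return s / (n - 1);
}

// out[r][a][b] is the covariance of columns a and b over the 'window'
// rows ending at row r. Rows without a full window are filled with NaN.
using Cube = std::vector<std::vector<std::vector<double>>>;

Cube rolling_cov(const std::vector<std::vector<double>>& cols,
                 std::size_t window) {
    assert(!cols.empty() && window >= 2);
    const std::size_t nrow = cols[0].size();
    const std::size_t ncol = cols.size();
    Cube out(nrow, std::vector<std::vector<double>>(
                       ncol, std::vector<double>(ncol, std::nan(""))));
    for (std::size_t r = window - 1; r < nrow; ++r) {
        for (std::size_t a = 0; a < ncol; ++a) {
            for (std::size_t b = 0; b < ncol; ++b) {
                out[r][a][b] = cov(cols[a], cols[b], r + 1 - window, window);
            }
        }
    }
    return out;
}
```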

C/C++ interface for data update

In the following C++ example, an append message msg is created and then immediately written to a socket:

  auto data = std::vector<double>(ncols);
  std::iota(data.begin(), data.end(), 1);

  // create the append message:
  const auto now = std::chrono::system_clock::now();
  auto msg = arr::make_append_msg(varname, std::vector<Global::dtime>{now}, data);

  // write the message to socket 'fd':
  ssize_t wres = write(fd, msg.first.get(), msg.second);

See a complete example in C++.

See a complete example in C.
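
A single write call on a socket may transfer fewer bytes than requested, so a client that sends messages as above may want to loop until the whole buffer is written. The helper below is a minimal POSIX sketch of that loop. The name write_all is hypothetical and is not part of the ztsdb interface.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Write exactly 'len' bytes from 'buf' to 'fd', retrying after short
// writes and after interruption by a signal (EINTR). Returns true when
// everything was written, false on any other error or if no progress
// can be made.
bool write_all(int fd, const void* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    const char* p = static_cast<const char*>(buf);
    std::size_t done = 0;
    while (done < len) {
        const ssize_t n = write(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted: retry
            return false;
        }
        if (n == 0) return false;  // no progress: give up
        done += static_cast<std::size_t>(n);
    }
    return true;
}
```

Assuming msg.first.get() yields a pointer to the message bytes and msg.second their count, as the write call above suggests, the call would become write_all(fd, msg.first.get(), msg.second).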

Dynamically extensible

The following C++ code defines a function that computes the cosine of the cosine of its argument:

#include <cmath>
#include "valuevar.hpp"
#include "env.hpp"

extern shared_ptr<BaseFrame> global;

val::Value _cos2(const vector<pair<string, val::Value>>& v) {
  return cos(cos(get<double>(v[0].second)));
}

val::VBuiltinG g(global, "cos2", "function (x) NULL\n", _cos2);

It can then be dynamically loaded in ztsdb like this:

dyn.load("libsotest.so")

The cos2 function is then available just like any built-in function.
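
In the example above the function becomes available simply because a global object is defined in the loaded library. One common way to obtain this effect is static self-registration, sketched below in self-contained C++. Whether ztsdb's VBuiltinG works exactly this way is an assumption. The names registry, Registrar, cos_of_cos and call are hypothetical and unrelated to the ztsdb headers.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry standing in for an interpreter's table of built-ins.
using Fn = std::function<double(double)>;

std::map<std::string, Fn>& registry() {
    static std::map<std::string, Fn> table;  // constructed on first use
    return table;
}

// An object whose constructor registers a function under a name.
struct Registrar {
    Registrar(const std::string& name, Fn fn) {
        assert(!name.empty());
        registry()[name] = std::move(fn);
    }
};

double cos_of_cos(double x) { return std::cos(std::cos(x)); }

// The constructor of this global runs when the containing program or
// shared library is loaded, which makes "cos2" visible to name lookups.
static Registrar cos2_registrar("cos2", cos_of_cos);

// Look a function up by name and call it; throws if the name is unknown.
double call(const std::string& name, double x) {
    return registry().at(name)(x);
}
```

Keeping the table in a function-local static avoids depending on the initialization order of globals across translation units.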