Retrieving data in Python

We provide a Python function, query, that fetches time series data from InfluxDB server and returns pandas.DataFrame object.

Setting up

The query function uses requests, an HTTP client library for humans, to access the database. Since the function returns pandas.DataFrame object, we need pandas as well. To install them:

$ sudo apt install python-requests python-pandas

or if you prefer pip:

$ pip install requests pandas

Please make sure installing them on your correct Python environment. This guide is written with python v3.5, requests v2.14.2, and pandas v0.20.1.

Provided function

query(domain, api_key[, time_filter=''][, device='//'][, location='//'][, sensor='//'][, include_network_sensors=False][, channel='//'][, agg_func=None][, agg_interval=None][, do_unstack=True][, convert_timestamp=True][, timezone='UTC'])

fetches time series data with matching parameters from the database server and converts into pandas.DataFrame object

Parameters
  • domain (str) – The domain part of the URL

  • api_key (str) – Secret key for authenticating with the server

  • time_filter (str) – Time filter expression written in InfluxQL, the default value '' ignores time

  • device (str) – InfluxQL regular expression to select devices, the default value '//' matches all devices

  • location (str) – InfluxQL regular expression to select locations, the default value '//' matches all locations

  • sensor (str) – InfluxQL regular expression to select sensors, the default value '//' matches all sensors

  • include_network_sensors (bool) – True - allows returning network-related data, False (default) - filters out all network-related data

  • channel (str) – InfluxQL regular expression to select channels, the default value '//' matches all channels

  • agg_func (str) – InfluxQL aggregate function evaluated on the selected data, no aggregate function is applied by default

  • agg_interval (str) – InfluxQL time interval value, used with agg_func

  • do_unstack (bool) – True (default) - return time series in individual columns, False - return all time series in a single column

  • convert_timestamp (bool) – True (default) - convert timestamps into date time objects, False - return timestamps in unix time in milliseconds

  • timezone (str) – Timezone name, used with convert_timestamp

Returns

object containing time series data

Type

pandas.DataFrame

Download

To download the source of decentlab.py, please use the Raw or Download ZIP button on the GitHubGist page.

Demonstration script

The following Python script demonstrates the main functionality of query function with our demo.decentlab.com server. Before running the script, please download decentlab.py from our gist repository by clicking the Raw and save it in your Python environment.

from decentlab import query
df = query(domain='demo.decentlab.com',
           api_key='eyJrIjoiclhMRFFvUXFzQXpKVkZydm52b0VMRVg3M3U2b3VqQUciLCJuIjoiZGF0YS1xdWVyeS1hcGktZGVtby0yIiwiaWQiOjF9',
           time_filter='time > now() - 1d',
           device='/26[67]/',
           sensor='/(temperature)|(humidity)/',
           agg_func='mean',
           agg_interval='7m')

The above example will retrieve the data from demo.decentlab.com server, matching the following criteria:

  • devices with ID containing 266 or 267 (will match also 2671, 12670, 4266, etc)

  • sensor names containing either temperature or humidity

  • compute the mean of groups divided by 7 min interval

  • received in the last one day

Note that the head function shows only the first five rows. The result of the above script:

>>> df.head()

series               266.sht-humidity  266.sht-temperature  267.sensirion-sht21-humidity  267.sensirion-sht21-temperature
time
2017-02-06 15:04:00         31.261963            21.919326                           NaN                              NaN
2017-02-06 15:11:00         30.666870            21.951501                           NaN                              NaN
2017-02-06 15:18:00         30.445618            22.101653                           NaN                              NaN
2017-02-06 15:25:00         30.251068            22.133828                           NaN                              NaN
2017-02-06 15:32:00         30.209106            22.144553                           NaN                              NaN

Another example demonstrates different possibilities.

from decentlab import query
df = query(domain='demo.decentlab.com',
           api_key='eyJrIjoiclhMRFFvUXFzQXpKVkZydm52b0VMRVg3M3U2b3VqQUciLCJuIjoiZGF0YS1xdWVyeS1hcGktZGVtby0yIiwiaWQiOjF9',
           time_filter = 'time >= \'2016-12-01 00:00:00\' AND time < \'2017-01-01 00:00:00\'',
           device = '/^266$/',
           sensor = '/^sht-/',
           timezone = 'Europe/Zurich',
           do_unstack = False)

In this example, we obtain data received from only 266. ^ and $ indicates the beginning and ending of regular expression so that no other ID containing 266, such as 1266, 2660, or _266a, matches the expression. In addition, all sensors starting with sht- are selected. The casting was disabled to obtain single column output data.

>>> df.head()
                                                          value
time                             series
2016-12-01 01:02:43.769000+01:00 266.sht-humidity     27.462524
                                 266.sht-temperature  21.018418
2016-12-01 01:07:43.762000+01:00 266.sht-humidity     27.462524
                                 266.sht-temperature  21.007693
2016-12-01 01:12:43.763000+01:00 266.sht-humidity     27.508301

>>> df.tail()
                                                          value
time                             series
2017-01-01 00:47:02.358000+01:00 266.sht-temperature  20.417813
2017-01-01 00:52:02.356000+01:00 266.sht-humidity     25.044006
                                 266.sht-temperature  20.407087
2017-01-01 00:57:02.364000+01:00 266.sht-humidity     25.044006
                                 266.sht-temperature  20.407087

Here, the data received in the December of 2016 is returned. Please be careful with absolute time value in a query because it is always interpreted in UTC, and the output is also in UTC by default. For instance, the output timezone is Europe/Zurich in this example, and therefore the time values are beyond 2016-12-31 23:59:59, which is in the timezone of UTC+01 (winter time).