Retrieving data in Python¶
We provide a Python function, query
, that fetches time series data from InfluxDB server and returns pandas.DataFrame object.
Setting up¶
The query
function uses requests, an HTTP client library for humans, to access the database. Since the function returns pandas.DataFrame object, we need pandas as well. To install them:
$ sudo apt install python-requests python-pandas
or if you prefer pip:
$ pip install requests pandas
Please make sure installing them on your correct Python environment. This guide is written with python v3.5, requests v2.14.2, and pandas v0.20.1.
Provided function¶
-
query
(domain, api_key[, time_filter=''][, device='//'][, location='//'][, sensor='//'][, include_network_sensors=False][, channel='//'][, agg_func=None][, agg_interval=None][, do_unstack=True][, convert_timestamp=True][, timezone='UTC'])¶ fetches time series data with matching parameters from the database server and converts into pandas.DataFrame object
- Parameters
domain (str) – The domain part of the URL
api_key (str) – Secret key for authenticating with the server
time_filter (str) – Time filter expression written in InfluxQL, the default value
''
ignores timedevice (str) – InfluxQL regular expression to select devices, the default value
'//'
matches all deviceslocation (str) – InfluxQL regular expression to select locations, the default value
'//'
matches all locationssensor (str) – InfluxQL regular expression to select sensors, the default value
'//'
matches all sensorsinclude_network_sensors (bool) –
True
- allows returning network-related data,False
(default) - filters out all network-related datachannel (str) – InfluxQL regular expression to select channels, the default value
'//'
matches all channelsagg_func (str) – InfluxQL aggregate function evaluated on the selected data, no aggregate function is applied by default
agg_interval (str) – InfluxQL time interval value, used with
agg_func
do_unstack (bool) –
True
(default) - return time series in individual columns,False
- return all time series in a single columnconvert_timestamp (bool) –
True
(default) - convert timestamps into date time objects,False
- return timestamps in unix time in millisecondstimezone (str) – Timezone name, used with convert_timestamp
- Returns
object containing time series data
- Type
Download¶
To download the source of decentlab.py, please use the Raw
or Download ZIP
button on the GitHubGist page.
Demonstration script¶
The following Python script demonstrates the main functionality of query
function with our demo.decentlab.com server. Before running the script, please download decentlab.py from our gist repository by clicking the Raw
and save it in your Python environment.
from decentlab import query
df = query(domain='demo.decentlab.com',
api_key='eyJrIjoiclhMRFFvUXFzQXpKVkZydm52b0VMRVg3M3U2b3VqQUciLCJuIjoiZGF0YS1xdWVyeS1hcGktZGVtby0yIiwiaWQiOjF9',
time_filter='time > now() - 1d',
device='/26[67]/',
sensor='/(temperature)|(humidity)/',
agg_func='mean',
agg_interval='7m')
The above example will retrieve the data from demo.decentlab.com server, matching the following criteria:
devices with ID containing
266
or267
(will match also 2671, 12670, 4266, etc)sensor names containing either
temperature
orhumidity
compute the mean of groups divided by 7 min interval
received in the last one day
Note that the head
function shows only the first five rows. The result of the above script:
>>> df.head()
series 266.sht-humidity 266.sht-temperature 267.sensirion-sht21-humidity 267.sensirion-sht21-temperature
time
2017-02-06 15:04:00 31.261963 21.919326 NaN NaN
2017-02-06 15:11:00 30.666870 21.951501 NaN NaN
2017-02-06 15:18:00 30.445618 22.101653 NaN NaN
2017-02-06 15:25:00 30.251068 22.133828 NaN NaN
2017-02-06 15:32:00 30.209106 22.144553 NaN NaN
Another example demonstrates different possibilities.
from decentlab import query
df = query(domain='demo.decentlab.com',
api_key='eyJrIjoiclhMRFFvUXFzQXpKVkZydm52b0VMRVg3M3U2b3VqQUciLCJuIjoiZGF0YS1xdWVyeS1hcGktZGVtby0yIiwiaWQiOjF9',
time_filter = 'time >= \'2016-12-01 00:00:00\' AND time < \'2017-01-01 00:00:00\'',
device = '/^266$/',
sensor = '/^sht-/',
timezone = 'Europe/Zurich',
do_unstack = False)
In this example, we obtain data received from only 266. ^
and $
indicates the beginning and ending of regular expression so that no other ID containing 266
, such as 1266
, 2660
, or _266a
, matches the expression. In addition, all sensors starting with sht-
are selected. The casting was disabled to obtain single column output data.
>>> df.head()
value
time series
2016-12-01 01:02:43.769000+01:00 266.sht-humidity 27.462524
266.sht-temperature 21.018418
2016-12-01 01:07:43.762000+01:00 266.sht-humidity 27.462524
266.sht-temperature 21.007693
2016-12-01 01:12:43.763000+01:00 266.sht-humidity 27.508301
>>> df.tail()
value
time series
2017-01-01 00:47:02.358000+01:00 266.sht-temperature 20.417813
2017-01-01 00:52:02.356000+01:00 266.sht-humidity 25.044006
266.sht-temperature 20.407087
2017-01-01 00:57:02.364000+01:00 266.sht-humidity 25.044006
266.sht-temperature 20.407087
Here, the data received in the December of 2016 is returned. Please be careful with absolute time value in a query because it is always interpreted in UTC, and the output is also in UTC by default. For instance, the output timezone is Europe/Zurich in this example, and therefore the time values are beyond 2016-12-31 23:59:59, which is in the timezone of UTC+01 (winter time).