Usage
To use pydrill in a project:
from pydrill.client import PyDrill
drill = PyDrill(host='localhost', port=8047)
You can also initialize via environment variables such as:
PYDRILL_HOST
PYDRILL_PORT
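As a minimal sketch, the two variables can also be resolved explicitly before the client is constructed. The helper `drill_settings` is hypothetical and not part of pydrill. Its fallback values mirror the constructor defaults shown in the class signature.

```python
import os


def drill_settings(env=None):
    """Resolve Drill host and port from PYDRILL_HOST / PYDRILL_PORT.

    Hypothetical helper, not part of pydrill. Falls back to the
    constructor defaults ('localhost', 8047) when a variable is unset.
    """
    env = os.environ if env is None else env
    host = env.get("PYDRILL_HOST", "localhost")
    port = int(env.get("PYDRILL_PORT", 8047))  # environment values are strings
    return {"host": host, "port": port}


# A plain dict stands in for os.environ so the sketch is reproducible.
settings = drill_settings({"PYDRILL_HOST": "drill.internal", "PYDRILL_PORT": "8048"})
print(settings)  # {'host': 'drill.internal', 'port': 8048}

# With pydrill installed, the result can be passed straight through:
#     drill = PyDrill(**settings)
```

Passing the mapping in as an argument keeps the helper easy to test without touching the real process environment.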
To use Drill PAM authentication, pass the credentials through the auth parameter:
drill = PyDrill(auth='drill_user:drill_password')
To enable a specific storage plugin:
drill.storage_enable('mongo')
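The same call can also disable a plugin through its `value` argument. The sketch below toggles several plugins in one pass. `FakeDrill` is a stand-in that only records calls so the example runs without a Drill server, `set_plugins` is a hypothetical helper, and the plugin names are illustrative.

```python
class FakeDrill:
    """Stand-in for a PyDrill client; records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def storage_enable(self, name, value=True, timeout=10):
        # Mirrors the documented signature of PyDrill.storage_enable.
        self.calls.append((name, value))


def set_plugins(client, enabled=(), disabled=()):
    """Enable and disable storage plugins in one pass (hypothetical helper)."""
    for name in enabled:
        client.storage_enable(name)  # value defaults to True
    for name in disabled:
        client.storage_enable(name, value=False)


drill = FakeDrill()
set_plugins(drill, enabled=["mongo"], disabled=["hive"])
print(drill.calls)  # [('mongo', True), ('hive', False)]
```

With a real client, `set_plugins` would be called with a `PyDrill` instance in place of `FakeDrill`.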
To view all queries that have completed or are still running:
drill.profiles()
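The profile listing can be combined with `profile_cancel` to stop work in progress. The sketch below rests on two assumptions that should be checked against your installation: that the returned result exposes the parsed JSON as a `.data` attribute, and that the payload lists running queries under `runningQueries` with a `queryId` field each. `FakeDrill`, `FakeResult`, and `cancel_running` are hypothetical, and the query ids are placeholders rather than real UUIDs.

```python
class FakeResult:
    """Stand-in for a result object; assumes parsed JSON lives in .data."""

    def __init__(self, data):
        self.data = data


class FakeDrill:
    """Stand-in for a PyDrill client with canned profile data."""

    def __init__(self):
        self.cancelled = []

    def profiles(self, timeout=10):
        # Assumed payload layout; placeholder ids instead of real UUIDs.
        return FakeResult({
            "runningQueries": [{"queryId": "q-1"}, {"queryId": "q-2"}],
            "finishedQueries": [{"queryId": "q-0"}],
        })

    def profile_cancel(self, query_id, timeout=10):
        self.cancelled.append(query_id)


def cancel_running(client):
    """Cancel every query reported as running and return the ids (hypothetical)."""
    running = client.profiles().data.get("runningQueries", [])
    ids = [entry["queryId"] for entry in running]
    for query_id in ids:
        client.profile_cancel(query_id)
    return ids


drill = FakeDrill()
cancelled = cancel_running(drill)
print(cancelled)  # ['q-1', 'q-2']
```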
To check if Drill is running:
if drill.is_active():
    pass  # replace with your code
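When Drill may still be starting up, the check can be wrapped in a small retry loop. This is a sketch under stated assumptions: `wait_until_active` is a hypothetical helper that works with any object exposing `is_active()`, and `FlakyDrill` simulates a server that becomes reachable after a few failed checks.

```python
import time


def wait_until_active(client, attempts=5, delay=1.0, sleep=time.sleep):
    """Poll client.is_active() up to `attempts` times (hypothetical helper).

    Returns True as soon as a check succeeds, False if all of them fail.
    """
    for attempt in range(attempts):
        if client.is_active():
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no need to wait after the final attempt
    return False


class FlakyDrill:
    """Stand-in client that reports inactive for the first `failures` checks."""

    def __init__(self, failures):
        self.failures = failures

    def is_active(self):
        if self.failures > 0:
            self.failures -= 1
            return False
        return True


print(wait_until_active(FlakyDrill(failures=2), delay=0))  # True
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real waiting.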
Running a query only requires the SQL string:
employees = drill.query('''
SELECT * FROM cp.`employee.json` LIMIT 5
''')
for employee in employees:
    print(employee)
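Individual columns can be pulled out of the iterated rows with plain Python. The sketch below assumes each row behaves like a dict keyed by column name and that numeric values may arrive as strings, so it converts them explicitly. `numeric_column` is a hypothetical helper, and the sample rows are made up to stand in for a query result.

```python
def numeric_column(rows, name):
    """Yield one column as floats, skipping rows where it is missing."""
    for row in rows:
        value = row.get(name)
        if value is not None:
            yield float(value)


# Made-up rows standing in for iteration over drill.query(...).
employees = [
    {"full_name": "Employee A", "salary": "80000.0"},
    {"full_name": "Employee B", "salary": "20000"},
    {"full_name": "Employee C"},  # no salary recorded
]

salaries = list(numeric_column(employees, "salary"))
print(max(salaries))  # 80000.0
```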
If you would rather not write SQL queries by hand, try pydrill_dsl: https://pypi.python.org/pypi/pydrill_dsl
Support for pandas:
# pandas dataframe
df = employees.to_dataframe()
print(df[df['salary'] > 20000])
Supported API calls
-
class pydrill.client.PyDrill(host='localhost', port=8047, trasport_class=<class 'pydrill.transport.Transport'>, connection_class=<class 'pydrill.connection.requests_conn.RequestsHttpConnection'>, auth=None, **kwargs)
>>> drill = PyDrill(host='localhost', port=8047)
>>> drill.is_active()
True
-
metrics(timeout=10)
Get the current memory metrics.
- Parameters
timeout – int
- Returns
pydrill.client.Result
-
options(timeout=10)
List the name, default, and data type of the system and session options.
- Parameters
timeout – int
- Returns
pydrill.client.Result
-
plan(sql, timeout=10)
- Parameters
sql – string
timeout – int
- Returns
pydrill.client.ResultQuery
-
profile(query_id, timeout=10)
Get the profile of the query that has the given query ID.
- Parameters
query_id – The UUID of the query in standard UUID format that Drill assigns to each query.
timeout – int
- Returns
pydrill.client.Result
-
profile_cancel(query_id, timeout=10)
Cancel the query that has the given query ID.
- Parameters
query_id – The UUID of the query in standard UUID format that Drill assigns to each query.
timeout – int
- Returns
pydrill.client.Result
-
profiles(timeout=10)
Get the profiles of running and completed queries.
- Parameters
timeout – int
- Returns
pydrill.client.Result
-
query(sql, timeout=10)
Submit a query and return results.
- Parameters
sql – string
timeout – int
- Returns
pydrill.client.ResultQuery
-
stats(timeout=10)
Get Drillbit information, such as port numbers.
- Parameters
timeout – int
- Returns
pydrill.client.Stats
-
storage(timeout=10)
Get the list of storage plugin names and configurations.
- Parameters
timeout – int
- Returns
pydrill.client.Result
-
storage_delete(name, timeout=10)
Delete a storage plugin configuration.
- Parameters
name – The name of the storage plugin configuration to delete.
timeout – int
- Returns
pydrill.client.Result
-
storage_detail(name, timeout=10)
Get the definition of the named storage plugin.
- Parameters
name – The assigned name in the storage plugin definition.
timeout – int
- Returns
pydrill.client.Result
-
storage_enable(name, value=True, timeout=10)
Enable or disable the named storage plugin.
- Parameters
name – The assigned name in the storage plugin definition.
value – Either True (to enable) or False (to disable).
timeout – int
- Returns
pydrill.client.Result
-
storage_update(name, config, timeout=10)
Create or update a storage plugin configuration.
- Parameters
name – The name of the storage plugin configuration to create or update.
config – Overwrites the existing configuration, if there is any, and therefore must include all required attributes and definitions.
timeout – int
- Returns
pydrill.client.Result
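Because `storage_update` overwrites the whole configuration, a change is safest when applied to a complete copy of the existing definition. The sketch below illustrates that pattern. `FakeDrill` and `with_workspace` are hypothetical, the plugin name `dfs_logs` is made up, and the configuration keys are assumed to follow Drill's file-system plugin layout, so verify them against your Drill version.

```python
import copy


class FakeDrill:
    """Stand-in for a PyDrill client; stores configs instead of sending them."""

    def __init__(self):
        self.saved = {}

    def storage_update(self, name, config, timeout=10):
        # Mirrors the documented signature of PyDrill.storage_update.
        self.saved[name] = config


def with_workspace(config, workspace, location):
    """Return a complete copy of `config` with one workspace added or replaced."""
    new_config = copy.deepcopy(config)  # never mutate the caller's dict
    new_config.setdefault("workspaces", {})[workspace] = {
        "location": location,
        "writable": False,
    }
    return new_config


# Assumed shape of a file-system plugin configuration.
base = {"type": "file", "enabled": True, "connection": "file:///", "workspaces": {}}

drill = FakeDrill()
drill.storage_update("dfs_logs", with_workspace(base, "logs", "/var/log"))
print(sorted(drill.saved["dfs_logs"]))  # ['connection', 'enabled', 'type', 'workspaces']
```

Copying first means the full set of required attributes is always sent, while the original dictionary stays untouched for reuse.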
-