Usage

To use pydrill in a project:

from pydrill.client import PyDrill

drill = PyDrill(host='localhost', port=8047)

You can also configure the connection through environment variables:

PYDRILL_HOST
PYDRILL_PORT
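
As a rough illustration of that fallback, the sketch below resolves the host and port from the two variables by hand. The helper name drill_settings_from_env is hypothetical and not part of pydrill; the defaults are the ones shown in the constructor signature.

```python
import os


def drill_settings_from_env(environ=None):
    """Return (host, port) read from PYDRILL_HOST / PYDRILL_PORT.

    Hypothetical helper, not part of pydrill. Falls back to the
    constructor defaults (localhost, 8047) when a variable is unset.
    """
    environ = os.environ if environ is None else environ
    host = environ.get("PYDRILL_HOST", "localhost")
    # Environment values are strings, so the port is converted explicitly.
    port = int(environ.get("PYDRILL_PORT", 8047))
    return host, port
```

The pair can then be passed on explicitly, for example PyDrill(host=host, port=port).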

To use Drill PAM authentication, pass the credentials through the auth parameter:

drill = PyDrill(auth='drill_user:drill_password')
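
To keep credentials out of source code, the user:password string can be assembled at runtime. This is only a sketch: the helper auth_from_env and the variable names DRILL_USER and DRILL_PASSWORD are assumptions made for illustration, not something pydrill reads on its own.

```python
import os


def auth_from_env(environ=None):
    """Build the 'user:password' string expected by the auth parameter.

    Hypothetical helper. DRILL_USER and DRILL_PASSWORD are illustrative
    variable names, not ones pydrill itself looks up.
    Returns None when credentials are missing, which matches auth=None.
    """
    environ = os.environ if environ is None else environ
    user = environ.get("DRILL_USER")
    password = environ.get("DRILL_PASSWORD")
    if not user or password is None:
        return None
    return "{0}:{1}".format(user, password)
```

The result can be handed to the constructor as PyDrill(auth=auth_from_env()).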

To enable a specific storage plugin:

drill.storage_enable('mongo')

You can view all queries which were executed or are running:

drill.profiles()

To check if Drill is running:

if drill.is_active():
    your_code
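
When Drill may still be starting up, a single check can be too strict. The sketch below polls is_active() a few times before giving up. The helper wait_until_active and its attempts/delay parameters are hypothetical; it relies only on is_active(timeout=...) returning a boolean.

```python
import time


def wait_until_active(drill, attempts=5, delay=1.0, timeout=2):
    """Poll drill.is_active() until it returns True or attempts run out.

    `drill` is any object exposing is_active(timeout=...), such as a
    PyDrill client. Hypothetical helper, not part of pydrill.
    """
    for attempt in range(attempts):
        if drill.is_active(timeout=timeout):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # pause before the next probe
    return False
```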

Running a query only requires the SQL text:

employees = drill.query('''
  SELECT * FROM cp.`employee.json` LIMIT 5
''')

for employee in employees:
    print(employee)
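
The same pattern can be wrapped in two small helpers, sketched here under the assumption that the query result is iterable row by row (as the loop above suggests). Both sample_query and fetch_rows are hypothetical names, not part of pydrill.

```python
def sample_query(path, limit=5, workspace="cp"):
    """Build a 'SELECT * ... LIMIT n' statement for a file in a workspace.

    Hypothetical helper. The path is wrapped in backticks, as in the
    cp.`employee.json` example, so it must not contain one itself.
    """
    if "`" in path:
        raise ValueError("path must not contain backticks")
    return "SELECT * FROM {0}.`{1}` LIMIT {2}".format(workspace, path, int(limit))


def fetch_rows(drill, sql, timeout=10):
    """Run `sql` through drill.query() and return the rows as a list."""
    return [row for row in drill.query(sql, timeout=timeout)]
```

For example, fetch_rows(drill, sample_query('employee.json')) would run the query shown above and collect its rows.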

If you would rather not write SQL queries by hand, try pydrill_dsl: https://pypi.python.org/pypi/pydrill_dsl

Support for pandas:

# pandas dataframe
df = employees.to_dataframe()
print(df[df['salary'] > 20000])

Supported API calls

class pydrill.client.PyDrill(host='localhost', port=8047, trasport_class=<class 'pydrill.transport.Transport'>, connection_class=<class 'pydrill.connection.requests_conn.RequestsHttpConnection'>, auth=None, **kwargs)

>>> drill = PyDrill(host='localhost', port=8047)
>>> drill.is_active()
True

is_active(timeout=2)

Check whether Drill is running.

Parameters

timeout – int

Returns

boolean

metrics(timeout=10)

Get the current memory metrics.

Parameters

timeout – int

Returns

pydrill.client.Result

options(timeout=10)

List the name, default, and data type of the system and session options.

Parameters

timeout – int

Returns

pydrill.client.Result

perform_request(method, url, params=None, body=None)

plan(sql, timeout=10)

Parameters
  • sql – string

  • timeout – int

Returns

pydrill.client.ResultQuery

profile(query_id, timeout=10)

Get the profile of the query that has the given query ID.

Parameters
  • query_id – The UUID of the query in standard UUID format that Drill assigns to each query.

  • timeout – int

Returns

pydrill.client.Result

profile_cancel(query_id, timeout=10)

Cancel the query that has the given query ID.

Parameters
  • query_id – The UUID of the query in standard UUID format that Drill assigns to each query.

  • timeout – int

Returns

pydrill.client.Result

profiles(timeout=10)

Get the profiles of running and completed queries.

Parameters

timeout – int

Returns

pydrill.client.Result
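
profiles() and profile_cancel() can be combined to stop everything that is still running. The sketch below rests on two assumptions that this page does not state: that the returned Result exposes the parsed response as a .data attribute, and that the data holds a runningQueries list whose entries carry a queryId, as in Drill's REST profile listing. The helper name cancel_running_queries is hypothetical.

```python
def cancel_running_queries(drill, timeout=10):
    """Cancel every query reported as running and return their IDs.

    Hypothetical helper. Assumes profiles().data is a dict shaped like
    {"runningQueries": [{"queryId": "..."}, ...], ...}; verify this
    against your pydrill and Drill versions before relying on it.
    """
    profiles = drill.profiles(timeout=timeout)
    running = profiles.data.get("runningQueries", [])
    cancelled = []
    for entry in running:
        query_id = entry["queryId"]
        drill.profile_cancel(query_id, timeout=timeout)
        cancelled.append(query_id)
    return cancelled
```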

query(sql, timeout=10)

Submit a query and return results.

Parameters
  • sql – string

  • timeout – int

Returns

pydrill.client.ResultQuery

stats(timeout=10)

Get Drillbit information, such as port numbers.

Parameters

timeout – int

Returns

pydrill.client.Stats

storage(timeout=10)

Get the list of storage plugin names and configurations.

Parameters

timeout – int

Returns

pydrill.client.Result

storage_delete(name, timeout=10)

Delete a storage plugin configuration.

Parameters
  • name – The name of the storage plugin configuration to delete.

  • timeout – int

Returns

pydrill.client.Result

storage_detail(name, timeout=10)

Get the definition of the named storage plugin.

Parameters
  • name – The assigned name in the storage plugin definition.

  • timeout – int

Returns

pydrill.client.Result

storage_enable(name, value=True, timeout=10)

Enable or disable the named storage plugin.

Parameters
  • name – The assigned name in the storage plugin definition.

  • value – Either True (to enable) or False (to disable).

  • timeout – int

Returns

pydrill.client.Result
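
Because the same call both enables and disables a plugin, it lends itself to a temporary switch. The context manager below is a minimal sketch using only the storage_enable(name, value, timeout) signature documented above. The name plugin_enabled is hypothetical, and the sketch assumes the plugin should always end up disabled afterwards.

```python
from contextlib import contextmanager


@contextmanager
def plugin_enabled(drill, name, timeout=10):
    """Enable storage plugin `name` for the duration of a with-block.

    Hypothetical helper, not part of pydrill. The plugin is disabled
    again on exit, even if the body raises.
    """
    drill.storage_enable(name, value=True, timeout=timeout)
    try:
        yield drill
    finally:
        drill.storage_enable(name, value=False, timeout=timeout)
```

Typical use would be: with plugin_enabled(drill, 'mongo'): followed by the queries that need the plugin.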

storage_update(name, config, timeout=10)

Create or update a storage plugin configuration.

Parameters
  • name – The name of the storage plugin configuration to create or update.

  • config – Overwrites the existing configuration if there is any, and therefore must include all required attributes and definitions.

  • timeout – int

Returns

pydrill.client.Result
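
Since the configuration is overwritten as a whole, it helps to build it in one place. The sketch below assembles a complete file-system plugin definition. The keys follow the layout Drill uses for file plugins, but the values, the workspace name tmp, and the helper file_plugin_config are illustrative assumptions. This page does not say whether pydrill expects the dict bare or wrapped as {"name": ..., "config": ...} (the shape Drill's REST API uses), so check that against your version.

```python
def file_plugin_config(connection="file:///", workspace_path="/tmp"):
    """Build a complete file-system storage plugin configuration.

    Hypothetical helper. Keys follow Drill's file plugin layout; the
    values are illustrative, not defaults taken from pydrill.
    """
    return {
        "type": "file",
        "enabled": True,
        "connection": connection,
        "workspaces": {
            "tmp": {
                "location": workspace_path,
                "writable": True,
                "defaultInputFormat": None,
            },
        },
        "formats": {
            "json": {"type": "json", "extensions": ["json"]},
        },
    }
```

A call might then look like drill.storage_update('my_files', file_plugin_config()), where my_files is a placeholder plugin name.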

threads(timeout=10)

Get the status of threads.

Parameters

timeout – int

Returns

pydrill.client.Result