id | title | sidebar_label |
---|---|---|
infra_config |
Infra Config |
Infra Config |
:::caution THIS GUIDE IS ONLY FOR INFRASTRUCTURE SETUP, PLEASE READ THE GENERAL CONFIG TO CONFIGURE ENTITIES SUCH AS QUERY ENGINE & ACCESS PERMISSION. :::
Eventhrough Querybook can be launched without any configuration, it is absolutely required for more powerful infrastructure/flexible customization. In this guide we will walkthrough different kind of environment settings you can set for Querybook. You can see all possible options and default values in this repo by checking out querybook/config/querybook_default_config.yaml
.
There are two ways to pass custom configs to Querybook. The first way is using a custom config yaml file. You can write out the file and then pass it through querybook using docker volumes. For example you can add this in the docker-compose file:
- path_to_my_custom_config.yaml:/opt/querybook/querybook/config/querybook_config.yaml
Otherwise you can also pass the environment variable directly when launching the docker image. The order of precedence for a config settings is as the follows:
- Environment variables (highest priority)
- querybook_config.yaml
- querybook_default_config.yaml (lowest priority)
FLASK_SECRET_KEY
: (required): This is the secret key that's used for securely signing the cookie. See https://flask.palletsprojects.com/en/1.1.x/config/#SECRET_KEY for more details.
FLASK_CACHE_CONFIG
(optional): This can be used to provide caching for API endpoints and internal logic. Follow https://pythonhosted.org/Flask-Cache/ for more details. You should provide a serialized JSON dictionary to be passed into the config.
DATABASE_CONN
(required): A sqlalchemy connection string to the database. Please check here https://docs.sqlalchemy.org/en/13/core/engines.html for formatting.
DATABASE_POOL_SIZE
(optional, defaults to 10): The size of the connection pool. See https://docs.sqlalchemy.org/en/13/core/pooling.html#sqlalchemy.pool.QueuePool.params.pool_size for more details.
DATABASE_POOL_RECYCLE
(optional, defaults to 3600): Number of seconds until database connection in pool gets recycled. See https://docs.sqlalchemy.org/en/13/core/pooling.html#setting-pool-recycle for more details.
REDIS_URL
(required): Connection string required to connect the redis instance. See https://www.digitalocean.com/community/cheatsheets/how-to-connect-to-a-redis-database for more details.
ELASTICSEARCH_HOST
(required): Connection string to elasticsearch host.
ELASTICSEARCH_CONNECTION_TYPE
(optional, defaults to naive): Setting this to naive
will connect to elasticsearch as is. If set to aws
, it will use boto3 to get auth and then connect to elasticsearch.
RESULT_STORE_TYPE
(optional, defaults to db): This configures where the query results/logs will be stored.
- db: This will keep the query results in the default db set by DATABASE_CONN
- s3: This will upload the query results on AWS S3
- file: This will save the query results as csv files in the host
The following settings are only relevant if you are using db
, note that all units are in bytes::
DB_MAX_UPLOAD_SIZE
(optional, defaults to 5242880): The max size of the result that can be retained, any row that exceeds the size limit will be truncated.
The following settings are only relevant if you are using s3
or gcs
(Google Cloud Storage), note that all units are in bytes:
STORE_BUCKET_NAME
(optional): The Bucket nameSTORE_PATH_PREFIX
(optional, defaults to ''): Key/Blob prefix for Querybook's stored results/logsSTORE_MIN_UPLOAD_CHUNK_SIZE
(optional, defaults to 10485760): The chunk size when uploadingSTORE_MAX_UPLOAD_CHUNK_NUM
(optional, defaults to 10000): The number of chunks that can be uploaded, you can determine the maximum upload size by multiplying this with chunk size.STORE_READ_SIZE
(optional, defaults to 131072): The size of chunk when reading from store.STORE_MAX_READ_SIZE
(optional, defaults to 5242880): The max size of file Querybook will read for users to view.
The following settings are only relevant if you are using s3
and your S3 bucket requires signature V4:
S3_BUCKET_S3V4_ENABLED
(optional, defaults to false):true
, if you want to enable signature v4.AWS_REGION
(optional, defaults to us-east-1): AWS region of your S3 bucket.
LOG_LOCATION
(optional): By default server logs goes to stderr. Supply a log path if you want the log to appear in a file.
EVENT_LOGGER_NAME
(optional, defaults to "null"): This configures where the event logs will be stored.
- null: This is the default logger, which does nothing and disregards the logs.
- console: This will print the event logs to the console. Could be used for debugging purpose.
- db: This will save the event logs to table **event_logs** in querybook mysql db.
You can also add addtional loggers in the event logger plugin. See Add Event Logger guide for more details.
STATS_LOGGER_NAME
(optional, defaults to "null"): This configures what stats logger to be used.
- null: This is the default logger, which does nothing and disregards the logs.
- console: This will print the stats logs to the console. Could be used for debugging purpose.
You need to add your own stats logger plugin to use it. See Add Stats Logger guide for more details.
AUTH_BACKEND
(optional, defaults to app.auth.password_auth): Python path to the authentication file. By default Querybook provides:
- app.auth.password_auth: Username/password based on registering on Querybook.
- app.auth.oauth_auth: Oauth based authentication.
You can also supply any custom authentication added in the auth plugin. See "Add Auth" and "Plugins" guide for more details.
the next few configurations are only relevant if you are using OAuth based authentication:
OAUTH_CLIENT_ID
(required)OAUTH_CLIENT_SECRET
(required)OAUTH_AUTHORIZATION_URL
(required): Url for oauth redirectionOAUTH_TOKEN_URL
(required): Url to get the oauth tokenOAUTH_USER_PROFILE
(required): Url to get the user profile
for LDAP authentication:
LDAP_CONN
(required)LDAP_USE_TLS
(optional, defaults toFalse
)LDAP_USE_BIND_USER
(optional, defaults toFalse
)- If
False
: Direct LDAP login- Additional configuration:
LDAP_USER_DN
(required) DN with {} for username/etc (ex.uid={},dc=example,dc=com
)
- Login flow:
- Direct login using formatted
LDAP_USER_DN
+ password
- Direct login using formatted
- Additional configuration:
- If
True
: Advanced LDAP login using bind user-
Additional configuration:
LDAP_BIND_USER
(required) Name of a bind userLDAP_BIND_PASSWORD
(required) Password of a bind userLDAP_SEARCH
(required) LDAP search base (ex.ou=people,dc=example,dc=com
)LDAP_FILTER
(optional) LDAP filter condition (ex.(departmentNumber=01000)
)LDAP_UID_FIELD
(optional) Field that matches the username when searching for the account to bind to (defaults touid
)LDAP_EMAIL_FIELD
: (optional) Field that matches the user email (default tomail
)LDAP_LASTNAME_FIELD
: (optional) Field that matches the user surname (default tosn
)LDAP_FIRSTNAME_FIELD
: (optional) Field that matches the user given name (default togivenName
)LDAP_FULLNAME_FIELD
: (optional) Field that matches the user full/common name (default tocn
)
-
Login flow:
- Initialized connection for the bind user.
- Searching the login user using the bind user in LDAP dictionary based on
LDAP_SEARCH
andLDAP_FILTER
. - The login user credentials are tested in direct login.
- If the previous steps were OK, the user is passed on.
-
- If
If you want to force the user to login again after a certain time, you can the following variable:
LOGS_OUT_AFTER
(optional, defaults to 0): Force user to log out after they have logged in for X number of seconds. If 0 then never expire the log in. In both cases the re-login is required if the browser is closed.
By default Querybook supports email and slack notifications for sharing DataDocs and Query completions.
PUBLIC_URL
(optional, defaults to localhost): The public url to access Querybook's website, used in oauth, email and slack.
QUERYBOOK_SLACK_TOKEN
(optional): Put the Bot User OAuth Access Token from slack here. See https://api.slack.com/docs/oauth#bots for more details.
EMAILER_CONN
(optional, defaults to localhost:22): Location of the emailer server
QUERYBOOK_EMAIL_ADDRESS
(optional, required for email): Origin address when sending emails