Sling CLI Connection Management
Sling CLI's connection management helps you easily interact with many connections quickly and efficiently, all from your shell environment.
Introduction to Sling CLI
Sling CLI is a command line tool which allows easy and efficient movement of data (Extract & Load) from/to Databases and Storage Platforms. It's trivial to get started, you can simply run pip install sling
if you have Python's pip installed. Or you can download the binary for your machine here.
Connection Credentials
In order to use sling
, we must first configure connection credentials, and Sling CLI looks for them in various places. This allows a “plug & play” nature if you are already using another tool such as dbt
, or have connection URLs set in environment variables. It is however recommended to use Sling’s env.yaml
file as it allows a more consistent and flexible experience.
Sling Env File
The first time you run the sling
command, the .sling
folder is created in the current user’s home directory (~/.sling
), which in turn holds a file called env.yaml
. The structure for the Sling’s Env file is simple, you put your connections’ credential under the connections
key as shown below:
connections:
marketing_pg:
url: 'postgres://...'
ssh_tunnel: 'ssh://...' # optional
# or dbt profile styled
marketing_pg:
type: postgres
host: [hostname]
user: [username]
password: [password]
port: [port]
dbname: [database name]
schema: [dbt schema]
ssh_tunnel: 'ssh://...'
finance_bq:
type: bigquery
method: service-account
project: [GCP project id]
dataset: [the name of your dbt dataset]
keyfile: [/path/to/bigquery/keyfile.json]
# global variables, available to all connections at runtime (optional)
variables:
aws_access_key: '...'
aws_secret_key: '...'
Please see here for all the accepted connection types and their respective data point needed.
When using the sling conns list
command with Sling Env credentials, the SOURCE
column will show as sling env yaml
.
Environment variables
If you’d rather use environment variables, it suffices to set them in your shell environment the usual way:
# Mac / Linux
export MY_PG='postgresql://user:mypassw@pg.host:5432/db1'
export MY_SNOWFLAKE='snowflake://user:mypassw@sf.host/db1'
export ORACLE_DB='oracle://user:mypassw@orcl.host:1521/db1'
# Windows Powershell
set MY_PG 'postgresql://user:mypassw@pg.host:5432/db1'
set MY_SNOWFLAKE 'snowflake://user:mypassw@sf.host/db1'
set ORACLE_DB 'oracle://user:mypassw@orcl.host:1521/db1'
When using the sling conns list
command with environment variables, the SOURCE
column will show as env variable
.
DBT Profiles
dbt
is another popular tool that many data professionals use on a daily basis, and supporting existing local profiles allows easy cross-use. The typical location for the dbt credentials are in the ~/dbt/profiles.yml
file. See here for more details.
If you have dbt
credentials in place and use the sling conns list
command, the SOURCE
column will show as dbt profiles yaml
.
The conns
Sub-Command
Now that you have credentials set, sling offers a conns
sub-command to interact with the connections. We can perform the following operations: list
, test
and discover
.
$ sling conns -h
conns - Manage local connections
Usage:
conns [discover|list|test]
Subcommands:
discover list available streams in connection
list list local connections detected
test test a local connection
Flags:
--version Displays the program version string.
-h --help Displays help with available flag, subcommand, and positional value parameters.
Listing Connections
It's convenient to see and list all connections available in our environment. We can simply run the sling conns list
command. Here is an example:
$ sling conns list
+----------------------+------------------+-------------------+
| CONN NAME | CONN TYPE | SOURCE |
+----------------------+------------------+-------------------+
| AWS_S3 | FileSys - S3 | sling env yaml |
| AZURE_STORAGE | FileSys - Azure | sling env yaml |
| BIGQUERY | DB - BigQuery | sling env yaml |
| BIONIC_DB1 | DB - PostgreSQL | dbt profiles yaml |
| BTD_S3 | FileSys - S3 | sling env yaml |
| CLICKHOUSE | DB - Clickhouse | sling env yaml |
| DEMO_POSTGRES | DB - PostgreSQL | sling env yaml |
| SNOWFLAKE | DB - Snowflake | env variable |
| STEAMPIPE | DB - PostgreSQL | sling env yaml |
+----------------------+------------------+-------------------+
Testing Connections
The Sling CLI tool also allows testing connections. Once we know the connection name, we can use the sling conns test
command:
$ sling conns test -h
test - test a local connection
Usage:
test [name]
Positional Variables:
name The name of the connection to test (Required)
Flags:
--version Displays the program version string.
-h --help Displays help with available flag, subcommand, and positional value parameters.
Here is an actual example:
$ sling conns test MSSQL
6:42PM INF success!
Discovering Connection Streams
This is another nifty sub-command that allows one to see which data streams are available for sling
is read from for a particular connection: the sling conns discover
command.
$ sling conns discover -h
discover - list available streams in connection
Usage:
discover [name]
Positional Variables:
name The name of the connection to test (Required)
Flags:
--version Displays the program version string.
-h --help Displays help with available flag, subcommand, and positional value parameters.
-f --filter filter stream name by pattern (e.g. account_*)
--folder discover streams in a specific folder (for file connections)
--schema discover streams in a specific schema (for database connections)
For database connections, it will list the available tables and views. For storage connections, it will list the non-recursive file objects located in the specified source folder. Below are some examples.
Database Connections
$ sling conns discover CLICKHOUSE
6:57PM INF Found 68 streams:
- "default"."docker_logs"
- "default"."sling_docker_logs"
- "system"."aggregate_function_combinators"
- "system"."asynchronous_metric_log"
- "system"."asynchronous_metrics"
- "system"."build_options"
- "system"."clusters"
....
If we want to filter for a specific shema, we can do:
$ sling conns discover CLICKHOUSE --schema default
8:29PM INF Found 2 streams:
- "default"."docker_logs"
- "default"."sling_docker_logs"
Storage Connections
$ sling conns discover AWS_S3
6:52PM INF Found 7 streams:
- s3://my-sling-bucket/logging/
- s3://my-sling-bucket/part.01.0001.csv
- s3://my-sling-bucket/sling/
- s3://my-sling-bucket/temp/
- s3://my-sling-bucket/test.fs.write/
- s3://my-sling-bucket/test/
- s3://my-sling-bucket/test_1000.csv
If we want to see the files in a sub-folder, we can do this:
$ sling conns discover AWS_S3 --folder s3://my-sling-bucket/logging/
6:55PM INF Found 1 streams:
- s3://my-sling-bucket/logging/1/1.log.gz
Running EL Tasks
Now that your connections are set, you are ready to run some Extract and Load tasks! We cover this in detail in a separate post, you can read about it here. From the command line, you can also run sling run -e
which will print a bunch of examples.
Conclusion
All things considered, Sling CLI makes it easy to manage and interact with various types of connections from your shell. If you have any questions or suggestions, feel free to contact us here.