Sling CLI Connection Management

Introduction to Sling CLI

Sling CLI is a command line tool which allows easy and efficient movement of data (Extract & Load) from/to Databases and Storage Platforms. It's trivial to get started, you can simply run pip install sling if you have Python's pip installed. Or you can download the binary for your machine here.

Connection Credentials

In order to use sling, we must first configure connection credentials, and Sling CLI looks for them in various places. This allows a “plug & play” nature if you are already using another tool such as dbt, or have connection URLs set in environment variables. It is however recommended to use Sling’s env.yaml file as it allows a more consistent and flexible experience.

Sling Env File

The first time you run the sling command, the .sling folder is created in the current user’s home directory (~/.sling), which in turn holds a file called env.yaml. The structure for the Sling’s Env file is simple, you put your connections’ credential under the connections key as shown below:

connections:
  marketing_pg:
    url: 'postgres://...' 
    ssh_tunnel: 'ssh://...' # optional

  # or dbt profile styled
  marketing_pg:
    type: postgres        
    host: [hostname]      
    user: [username]      
    password: [password]  
    port: [port]          
    dbname: [database name]
    schema: [dbt schema]  
    ssh_tunnel: 'ssh://...' 

  finance_bq:
    type: bigquery
    method: service-account
    project: [GCP project id]
    dataset: [the name of your dbt dataset]
    keyfile: [/path/to/bigquery/keyfile.json]

# global variables, available to all connections at runtime (optional)
variables:
  aws_access_key: '...'
  aws_secret_key: '...'

Please see here for all the accepted connection types and their respective data point needed.

When using the sling conns list command with Sling Env credentials, the SOURCE column will show as sling env yaml.

Environment variables

If you’d rather use environment variables, it suffices to set them in your shell environment the usual way:

# Mac / Linux
export MY_PG='postgresql://user:mypassw@pg.host:5432/db1'
export MY_SNOWFLAKE='snowflake://user:mypassw@sf.host/db1'
export ORACLE_DB='oracle://user:mypassw@orcl.host:1521/db1'

# Windows Powershell
set MY_PG 'postgresql://user:mypassw@pg.host:5432/db1'
set MY_SNOWFLAKE 'snowflake://user:mypassw@sf.host/db1'
set ORACLE_DB 'oracle://user:mypassw@orcl.host:1521/db1'

When using the sling conns list command with environment variables, the SOURCE column will show as env variable.

DBT Profiles

dbt is another popular tool that many data professionals use on a daily basis, and supporting existing local profiles allows easy cross-use. The typical location for the dbt credentials are in the ~/dbt/profiles.yml file. See here for more details.

If you have dbt credentials in place and use the sling conns list command, the SOURCE column will show as dbt profiles yaml.

The `conns` Sub-Command

Now that you have credentials set, sling offers a conns sub-command to interact with the connections. We can perform the following operations: list, test and discover.

$ sling conns -h
conns - Manage local connections

  Usage:
    conns [discover|list|test]

  Subcommands:
    discover   list available streams in connection
    list       list local connections detected
    test       test a local connection

  Flags:
       --version   Displays the program version string.
    -h --help      Displays help with available flag, subcommand, and positional value parameters.

Listing Connections

It's convenient to see and list all connections available in our environment. We can simply run the sling conns list command. Here is an example:

$ sling conns list
+----------------------+------------------+-------------------+
| CONN NAME            | CONN TYPE        | SOURCE            |
+----------------------+------------------+-------------------+
| AWS_S3               | FileSys - S3     | sling env yaml    |
| AZURE_STORAGE        | FileSys - Azure  | sling env yaml    |
| BIGQUERY             | DB - BigQuery    | sling env yaml    |
| BIONIC_DB1           | DB - PostgreSQL  | dbt profiles yaml |
| BTD_S3               | FileSys - S3     | sling env yaml    |
| CLICKHOUSE           | DB - Clickhouse  | sling env yaml    |
| DEMO_POSTGRES        | DB - PostgreSQL  | sling env yaml    |
| SNOWFLAKE            | DB - Snowflake   | env variable      |
| STEAMPIPE            | DB - PostgreSQL  | sling env yaml    |
+----------------------+------------------+-------------------+

Testing Connections

The Sling CLI tool also allows testing connections. Once we know the connection name, we can use the sling conns test command:

$ sling conns test -h
test - test a local connection

  Usage:
    test [name]

  Positional Variables:
    name   The name of the connection to test (Required)
  Flags:
       --version   Displays the program version string.
    -h --help      Displays help with available flag, subcommand, and positional value parameters.

Here is an actual example:

$ sling conns test MSSQL
6:42PM INF success!

Discovering Connection Streams

This is another nifty sub-command that allows one to see which data streams are available for sling is read from for a particular connection: the sling conns discover command.

$ sling conns discover -h
discover - list available streams in connection

  Usage:
    discover [name]

  Positional Variables:
    name   The name of the connection to test (Required)
  Flags:
       --version   Displays the program version string.
    -h --help      Displays help with available flag, subcommand, and positional value parameters.
    -f --filter    filter stream name by pattern (e.g. account_*)
       --folder    discover streams in a specific folder (for file connections)
       --schema    discover streams in a specific schema (for database connections)

For database connections, it will list the available tables and views. For storage connections, it will list the non-recursive file objects located in the specified source folder. Below are some examples.

Database Connections

$ sling conns discover CLICKHOUSE
6:57PM INF Found 68 streams:
 - "default"."docker_logs"
 - "default"."sling_docker_logs"
 - "system"."aggregate_function_combinators"
 - "system"."asynchronous_metric_log"
 - "system"."asynchronous_metrics"
 - "system"."build_options"
 - "system"."clusters"
 ....

If we want to filter for a specific shema, we can do:

$ sling conns discover CLICKHOUSE --schema default
8:29PM INF Found 2 streams:
 - "default"."docker_logs"
 - "default"."sling_docker_logs"

Storage Connections

$ sling conns discover AWS_S3
6:52PM INF Found 7 streams:
 - s3://my-sling-bucket/logging/
 - s3://my-sling-bucket/part.01.0001.csv
 - s3://my-sling-bucket/sling/
 - s3://my-sling-bucket/temp/
 - s3://my-sling-bucket/test.fs.write/
 - s3://my-sling-bucket/test/
 - s3://my-sling-bucket/test_1000.csv

If we want to see the files in a sub-folder, we can do this:

$ sling conns discover AWS_S3 --folder s3://my-sling-bucket/logging/
6:55PM INF Found 1 streams:
 - s3://my-sling-bucket/logging/1/1.log.gz

Running EL Tasks

Now that your connections are set, you are ready to run some Extract and Load tasks! We cover this in detail in a separate post, you can read about it here. From the command line, you can also run sling run -e which will print a bunch of examples.

Conclusion

All things considered, Sling CLI makes it easy to manage and interact with various types of connections from your shell. If you have any questions or suggestions, feel free to contact us here.