Want to Ingest Files into a Database? Use Sling to lessen the pain.

Find out how you can bulk load CSV, TSV, JSON and XML files with Sling CLI without having to manually define the destination table DDL.

Ingesting Data Files Today

For as long as data practitioners have been loading files into relational database management systems (RDBMS), it has been necessary to first create the destination table manually, via Data Definition Language (DDL). The effort involved is anything but consistent: it can range from fairly quick to extremely tedious, especially for massive files with lots of columns. And what about unstructured or semi-structured files? When the data is deeply nested, it can be a huge pain to extract the values into proper columns (and determine their proper types).

Manually Defining the DDL

Let's go through a typical scenario today. Normally, a tool that lets you ingest a CSV file will first sample the first hundred or so rows and have you confirm the column types. Now imagine the file has 50+ columns! OK, no biggie, you confirm the column types and the tool starts ingesting the 10-million-row file. It's chugging along well, and 2 million rows in, column 46, which was typed as an integer, suddenly encounters a string value, and the database errors out. Does that sound familiar? Dealing with this is a nightmare, since types determined from a sample cannot be guaranteed to hold for the rest of the rows in the file. Thankfully, there is a nifty tool called Sling.

Using Sling CLI to Load files into your Databases

Sling CLI is a powerful tool that allows you to efficiently move data from files into databases, from databases to databases, and from databases to files. It also accepts content via standard input (stdin) and can output via standard output (stdout). In the next few sections, we will go over a few use cases of ingesting files into different databases using Sling CLI. You'll notice that we never have to define a table DDL: Sling does this for us automatically, continuously profiling the column values and adjusting the column types as required.
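
For example, a database-to-database copy uses the very same flags you will see for the file loads below; here is a minimal sketch (the connection names refer to entries in the env.yaml shown later, and the table names are placeholders):

# minimal sketch: copying a table from PostgreSQL into Snowflake, using the same flags seen throughout this post
$ sling run --src-conn POSTGRES --src-stream public.my_table --tgt-conn SNOWFLAKE --tgt-object public.my_table_copy --mode full-refresh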

Furthermore, each database system has its own bulk-loading methodology. For example, PostgreSQL has the COPY command, where the client streams in records; Snowflake has the COPY INTO command, where Snowflake ingests quickly from a stage location; and BigQuery has its own APIs for bulk loading (such as from Google Cloud Storage, or locally from an SDK client stream). Sling follows each of these methodologies so that data is ingested as efficiently as possible, in a streaming fashion (holding only a small amount in memory).
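
For a sense of what Sling is automating, below is a rough manual equivalent of the PostgreSQL path, using psql's \copy to drive the COPY protocol from the client. It is only an illustration with placeholder connection details, not what Sling runs verbatim:

# illustration only: streaming a local CSV into PostgreSQL over the COPY protocol
# (connection string and file path are placeholders)
$ psql "postgresql://user:password@localhost:5432/mydb" -c "\copy public.my_table FROM '/tmp/test_file.csv' WITH (FORMAT csv, HEADER true)"

Note that this manual route still assumes public.my_table already exists with the right column types, which is exactly the step Sling takes care of.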

OK, let us switch gears and get into the meat. We will go over the following use cases:

  • Loading a Google Cloud Storage CSV file into a PostgreSQL Database
  • Loading a Local TSV File (via stdin) into a Snowflake Database
  • Loading DigitalOcean Spaces (S3) JSON files into a BigQuery Database

But first things first: we must install Sling on our machine and configure our credentials.

Installing sling on our machine

Since Sling is built in Go, it is offered as a single binary for whichever system you are using.

# On Mac
brew install slingdata-io/sling/sling

# On Windows Powershell
scoop bucket add org https://github.com/slingdata-io/scoop-sling.git
scoop install sling

# Using Python Wrapper via pip
pip install sling

If you are on a Linux system or prefer to download the binary manually, please head over here. Once installed, you should be able to run the sling command.

Configuring our credentials

The next step is to set up our credentials file (env.yaml), which needs to be located in the Sling home folder. The env.yaml file is a central place to store all our local credentials. If you want to try one of the use cases demonstrated here, simply make sure the needed connections are in your env.yaml file. For our demonstration purposes, the env.yaml file below contains the following connections:

  • DO_SPACES: an S3-compatible bucket file system offered by DigitalOcean. This would also work with any other S3-compatible product, such as Cloudflare R2, MinIO, Backblaze B2, and many others, including AWS S3.
  • GOOGLE_STORAGE: a Google Cloud Storage bucket
  • POSTGRES: a PostgreSQL database. This could be any PostgreSQL database, such as Amazon RDS, Supabase, Fly.io, DigitalOcean, or your own local instance!
  • SNOWFLAKE: a Snowflake warehouse instance.
  • BIGQUERY: a BigQuery warehouse instance.
env.yaml
connections:
  DO_SPACES:
    type: s3
    bucket: sling-bucket
    endpoint: nyc3.digitaloceanspaces.com
    access_key_id: "my_access_key_id"
    secret_access_key: "my_secret_access_key"

  GOOGLE_STORAGE:
    type: gs
    bucket: sling-bucket
    gc_key_file: /Users/fritz/.sling/sling-project-123-ce219ceaef9512.json

  POSTGRES:
    type: postgres
    username: fritz
    host: 10.13.123.234 
    password: "mypass123"
    port: 5432
    database: postgres_db
    schema: public

  BIGQUERY:
    type: bigquery
    project: sling-project-123
    location: US
    dataset: public
    gc_key_file: /Users/fritz/.sling/sling-project-123-ce219ceaef9512.json

  SNOWFLAKE:
    type: snowflake
    account: tzb01234.us-east-1
    username: sling
    password: "mypass123"
    database: sling
    schema: public
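
As an aside, connections do not have to live in env.yaml: Sling can also pick them up from environment variables in URL form. The exact URL parameters vary by connector, so treat the example below as a sketch and check the docs for your database:

# sketch: defining the same PostgreSQL connection as an environment variable (URL form)
$ export POSTGRES='postgresql://fritz:mypass123@10.13.123.234:5432/postgres_db'
$ sling conns test POSTGRES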

Once the credentials are set, we can list our connections and test them with the sling conns command:

$ sling conns list
+------------------+------------------+-------------------+
| CONN NAME        | CONN TYPE        | SOURCE            |
+------------------+------------------+-------------------+
| BIGQUERY         | DB - BigQuery    | sling env yaml    |
| DO_SPACES        | FileSys - S3     | sling env yaml    |
| GOOGLE_STORAGE   | FileSys - Google | sling env yaml    |
| POSTGRES         | DB - PostgreSQL  | sling env yaml    |
| SNOWFLAKE        | DB - Snowflake   | sling env yaml    |
+------------------+------------------+-------------------+

$ sling conns test BIGQUERY
3:53PM INF success!

$ sling conns test DO_SPACES
3:54PM INF success!

...

Great, now that we have tested our connections, let's go through a few examples. Remember, if you are facing any issues accessing your connections or getting them to test successfully, feel free to email us (support @ slingdata.io).

Loading a Google Cloud Storage CSV file into a PostgreSQL Database

Let us say that we need to load files from Google Cloud Storage (GCS) into a PostgreSQL instance. The first step is to set up our credentials file, as described above.

Now let us test our connections.

$ sling conns test GOOGLE_STORAGE
10:52AM INF success!

$ sling conns test POSTGRES
10:52AM INF success!

Let us discover the available file streams with the sling conns discover command:

$ sling conns discover GOOGLE_STORAGE
10:52AM INF Found 2 streams:
 - gs://sling-bucket/LargeDataset.csv
 - gs://sling-bucket/test_file.csv

Great, let's run our task. We want to copy the CSV file gs://sling-bucket/test_file.csv into table public.my_table, so we can run this:

$ sling run --src-conn GOOGLE_STORAGE --src-stream gs://sling-bucket/test_file.csv --tgt-conn POSTGRES --tgt-object public.my_table --mode full-refresh
11:01AM INF connecting to target database (postgres)
11:01AM INF reading from source file system (gs)
11:01AM INF writing to target database [mode: full-refresh]
11:01AM INF streaming data
11:01AM INF dropped table public.my_table
11:01AM INF created table public.my_table
11:01AM INF inserted 18 rows in 1 sec
11:01AM INF execution succeeded

We could run the same task using a YAML config file, and also enable debug mode (with the -d flag) for more verbose logging:

# Let's create our task configuration file
$ echo '
source:
  conn: GOOGLE_STORAGE
  stream: "gs://sling-bucket/test_file.csv"
target:
  conn: POSTGRES
  object: public.my_table
mode: full-refresh
' > /tmp/my_task.yaml

# we are also providing the `-d` flag for debug mode.
# this allows us to have more logging details.
$ sling run -d -c /tmp/my_task.yaml
11:04AM DBG type is file-db
11:04AM INF connecting to target database (postgres)
11:04AM INF reading from source file system (gs)
11:04AM DBG reading datastream from gs://sling-bucket/test_file.csv
11:04AM INF writing to target database [mode: full-refresh]
11:04AM DBG drop table if exists public.my_table_tmp cascade
11:04AM DBG table public.my_table_tmp dropped
11:04AM DBG create table if not exists public.my_table_tmp ("id" bigint,
"first_name" text,
"last_name" text,
"email" text,
"target" bool,
"create_dt" timestamp,
"rating" numeric)
11:04AM INF streaming data
11:04AM DBG select count(1) cnt from public.my_table_tmp
11:04AM DBG drop table if exists public.my_table cascade
11:04AM DBG table public.my_table dropped
11:04AM INF dropped table public.my_table
11:04AM DBG create table if not exists public.my_table ("id" integer,
"first_name" varchar(255),
"last_name" varchar(255),
"email" varchar(255),
"target" bool,
"create_dt" timestamp,
"rating" numeric)
11:04AM INF created table public.my_table
11:04AM DBG insert into "public"."my_table" ("id", "first_name", "last_name", "email", "target", "create_dt", "rating") select "id", "first_name", "last_name", "email", "target", "create_dt", "rating" from "public"."my_table_tmp"
11:04AM DBG inserted rows into `public.my_table` from temp table `public.my_table_tmp`
11:04AM INF inserted 18 rows in 1 secs [17 r/s]
11:04AM DBG drop table if exists public.my_table_tmp cascade
11:04AM DBG table public.my_table_tmp dropped
11:04AM INF execution succeeded

Notice that since we enabled debug mode with -d, we can see how the table DDL has been automatically determined:

create table if not exists public.my_table ("id" integer,
"first_name" varchar(255),
"last_name" varchar(255),
"email" varchar(255),
"target" bool,
"create_dt" timestamp,
"rating" numeric)

If you'd like to see more details on configuration options, see here. If you are thinking of syncing many tables, it's probably best to use a replication configuration file (also a YAML file), as sketched below.
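
Here is a hedged sketch of what such a replication file could look like for the two CSV files discovered earlier, reusing this post's connections. The keys (source, target, defaults, streams) follow the general shape described in the Sling docs; the exact schema, and the -r flag used to run it, are assumptions worth verifying against the docs and sling run --help.

# create a replication config covering both CSV files discovered earlier
$ echo '
source: GOOGLE_STORAGE
target: POSTGRES
defaults:
  mode: full-refresh
streams:
  "gs://sling-bucket/test_file.csv":
    object: public.my_table
  "gs://sling-bucket/LargeDataset.csv":
    object: public.large_dataset
' > /tmp/my_replication.yaml

# run every stream defined in the replication file
$ sling run -r /tmp/my_replication.yaml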

Ingesting a Local TSV File (via stdin) into a Snowflake Database

Ingesting a file via stdin is super easy. As expected, you can just pipe your file stream, Unix-style, and execute a sling run command specifying the target connection and destination object details.

# let's first test our connection
$ sling conns test SNOWFLAKE
11:09AM INF success!

# Let's pipe our file into a new table
$ cat /tmp/accounts.tsv | sling run --tgt-conn SNOWFLAKE --tgt-object sling.accounts --mode full-refresh
11:09AM INF connecting to target database (snowflake)
11:09AM INF reading from stream (stdin)
11:09AM INF writing to target database [mode: full-refresh]
11:09AM INF streaming data
11:09AM INF dropped table sling.accounts
11:09AM INF created table sling.accounts
11:09AM INF inserted 94 rows in 5 secs
11:09AM INF execution succeeded

Notice how we didn't specify the delimiter. Whether it's a comma-delimited file (CSV) or a tab-delimited file (TSV), Sling will attempt to auto-detect the delimiter (when it's a |, ,, <tab> or ;). If you'd like to specify the delimiter manually, you can do so by adding the --src-options flag, as shown below:

$ cat /tmp/accounts.tsv | sling run --tgt-conn SNOWFLAKE --tgt-object sling.accounts --mode full-refresh --src-options '{"delimiter": "\t"}'
11:09AM INF connecting to target database (snowflake)
11:09AM INF reading from stream (stdin)
11:09AM INF writing to target database [mode: full-refresh]
11:09AM INF streaming data
11:09AM INF dropped table sling.accounts
11:09AM INF created table sling.accounts
11:09AM INF inserted 94 rows in 5 secs
11:09AM INF execution succeeded

To get a full list of advanced options, please head over here.

Loading DigitalOcean Spaces (S3) JSON files into a BigQuery Warehouse

In our third example, we will ingest JSON files from a DigitalOcean Spaces bucket into a BigQuery instance. Let's first discover our files, filtering to a specific folder with the --folder flag.

$ sling conns discover DO_SPACES --folder s3://sling-bucket/02-05-2022
4:36PM INF Found 21 streams:
 - s3://sling-bucket/02-05-2022/1644049120.1uXKxCrhN2WGAt2fojy6k2fqDSb.26879111-e935-4957-a99e-6b6241e6a84a.json.gz
 - s3://sling-bucket/02-05-2022/1644049299.1uXKxCrhN2WGAt2fojy6k2fqDSb.0185f4a2-fe46-4ea6-bb2b-58cee66fa14d.json.gz
 - s3://sling-bucket/02-05-2022/1644049390.1uXKxCrhN2WGAt2fojy6k2fqDSb.323c8d1b-bc66-4d17-8a26-6fde546cf54f.json.gz
 - s3://sling-bucket/02-05-2022/1644049480.1uXKxCrhN2WGAt2fojy6k2fqDSb.73cf9b8c-be06-431d-98bd-c440932d530d.json.gz
 - s3://sling-bucket/02-05-2022/1644049571.1uXKxCrhN2WGAt2fojy6k2fqDSb.68fbbf20-8a87-4836-95c5-5a4b762c97cb.json.gz
 - s3://sling-bucket/02-05-2022/1644049662.1uXKxCrhN2WGAt2fojy6k2fqDSb.23397ceb-c55b-46d9-bdf7-063ce6c54cdd.json.gz
 - s3://sling-bucket/02-05-2022/1644049752.1uXKxCrhN2WGAt2fojy6k2fqDSb.d23a1079-330e-4009-a18e-ca3ac1d087f2.json.gz
 - s3://sling-bucket/02-05-2022/1644049933.1uXKxCrhN2WGAt2fojy6k2fqDSb.be115fa8-4a75-4353-a5d7-9b1ec9e0e792.json.gz
 - s3://sling-bucket/02-05-2022/1644050114.1uXKxCrhN2WGAt2fojy6k2fqDSb.191e971e-7a18-4faa-8ccd-ddca99fe934c.json.gz
 - s3://sling-bucket/02-05-2022/1644063457.1uXKxCrhN2WGAt2fojy6k2fqDSb.e68674d0-d14c-4277-aef2-88bf778f6bb5.json.gz
 - s3://sling-bucket/02-05-2022/1644063818.1uXKxCrhN2WGAt2fojy6k2fqDSb.8545e07c-3881-4dcd-9976-b860ed85d7c9.json.gz
 - s3://sling-bucket/02-05-2022/1644064268.1uXKxCrhN2WGAt2fojy6k2fqDSb.695c8516-6d8c-46ce-ac01-9d89b4dd13e8.json.gz
 - s3://sling-bucket/02-05-2022/1644064359.1uXKxCrhN2WGAt2fojy6k2fqDSb.cd481db0-2066-4179-ba8c-8dd5f45e190e.json.gz
 - s3://sling-bucket/02-05-2022/1644097266.1uXKxCrhN2WGAt2fojy6k2fqDSb.7b06270d-5c24-40fe-958b-f2175f8018f3.json.gz
 - s3://sling-bucket/02-05-2022/1644097446.1uXKxCrhN2WGAt2fojy6k2fqDSb.18e17223-b55e-4283-a7ce-84af1f525435.json.gz
 - s3://sling-bucket/02-05-2022/1644097537.1uXKxCrhN2WGAt2fojy6k2fqDSb.4bd41934-7be2-4b10-8539-580c54092f73.json.gz
 - s3://sling-bucket/02-05-2022/1644097628.1uXKxCrhN2WGAt2fojy6k2fqDSb.49facabd-6a6f-410d-b4fc-2aebf8ccdad8.json.gz
 - s3://sling-bucket/02-05-2022/1644098354.1uXKxCrhN2WGAt2fojy6k2fqDSb.865bf8b5-7361-4779-8199-6ec20543058d.json.gz
 - s3://sling-bucket/02-05-2022/1644099094.1uXKxCrhN2WGAt2fojy6k2fqDSb.2af2cfb4-8371-4938-b560-e282f8c94d16.json.gz
 - s3://sling-bucket/02-05-2022/1644099185.1uXKxCrhN2WGAt2fojy6k2fqDSb.538e1f27-306f-4942-b79d-c012b7e9518f.json.gz
 - s3://sling-bucket/02-05-2022/1644099275.1uXKxCrhN2WGAt2fojy6k2fqDSb.55cf698c-41c7-4f1b-89df-2567506cd02b.json.gz

Great, now let's ingest all 21 JSON files in that folder, without flattening the nested columns (with debug mode enabled, flag -d):

$ sling run -d --src-conn DO_SPACES --src-stream s3://sling-bucket/02-05-2022/ --tgt-conn BIGQUERY --tgt-object public.json_files --mode full-refresh
4:39PM DBG type is file-db
4:39PM INF connecting to target database (bigquery)
4:39PM INF reading from source file system (s3)
4:39PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644049120.1uXKxCrhN2WGAt2fojy6k2fqDSb.26879111-e935-4957-a99e-6b6241e6a84a.json.gz
4:39PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644049299.1uXKxCrhN2WGAt2fojy6k2fqDSb.0185f4a2-fe46-4ea6-bb2b-58cee66fa14d.json.gz
4:39PM INF writing to target database [mode: full-refresh]
4:39PM DBG drop table if exists public.json_files_tmp
4:39PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644049390.1uXKxCrhN2WGAt2fojy6k2fqDSb.323c8d1b-bc66-4d17-8a26-6fde546cf54f.json.gz
4:39PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644049480.1uXKxCrhN2WGAt2fojy6k2fqDSb.73cf9b8c-be06-431d-98bd-c440932d530d.json.gz
4:39PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644049571.1uXKxCrhN2WGAt2fojy6k2fqDSb.68fbbf20-8a87-4836-95c5-5a4b762c97cb.json.gz
4:39PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644049662.1uXKxCrhN2WGAt2fojy6k2fqDSb.23397ceb-c55b-46d9-bdf7-063ce6c54cdd.json.gz
4:39PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644049752.1uXKxCrhN2WGAt2fojy6k2fqDSb.d23a1079-330e-4009-a18e-ca3ac1d087f2.json.gz
4:39PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644049933.1uXKxCrhN2WGAt2fojy6k2fqDSb.be115fa8-4a75-4353-a5d7-9b1ec9e0e792.json.gz
4:39PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644050114.1uXKxCrhN2WGAt2fojy6k2fqDSb.191e971e-7a18-4faa-8ccd-ddca99fe934c.json.gz
4:39PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644063457.1uXKxCrhN2WGAt2fojy6k2fqDSb.e68674d0-d14c-4277-aef2-88bf778f6bb5.json.gz
4:39PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644063818.1uXKxCrhN2WGAt2fojy6k2fqDSb.8545e07c-3881-4dcd-9976-b860ed85d7c9.json.gz
4:39PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644064268.1uXKxCrhN2WGAt2fojy6k2fqDSb.695c8516-6d8c-46ce-ac01-9d89b4dd13e8.json.gz
4:39PM DBG table public.json_files_tmp dropped
4:39PM DBG create table public.json_files_tmp (`data` json)
4:39PM INF streaming data
4:39PM INF importing into bigquery via local storage
4:39PM DBG writing to /var/folders/49/1zc24t595j79t5mw7_t9gtxr0000gn/T/bigquery/public.json_files_tmp/2022-11-26T163956.037
4:39PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644064359.1uXKxCrhN2WGAt2fojy6k2fqDSb.cd481db0-2066-4179-ba8c-8dd5f45e190e.json.gz
4:39PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644097266.1uXKxCrhN2WGAt2fojy6k2fqDSb.7b06270d-5c24-40fe-958b-f2175f8018f3.json.gz
4:39PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644097446.1uXKxCrhN2WGAt2fojy6k2fqDSb.18e17223-b55e-4283-a7ce-84af1f525435.json.gz
4:39PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644097537.1uXKxCrhN2WGAt2fojy6k2fqDSb.4bd41934-7be2-4b10-8539-580c54092f73.json.gz
4:39PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644097628.1uXKxCrhN2WGAt2fojy6k2fqDSb.49facabd-6a6f-410d-b4fc-2aebf8ccdad8.json.gz
4:39PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644098354.1uXKxCrhN2WGAt2fojy6k2fqDSb.865bf8b5-7361-4779-8199-6ec20543058d.json.gz
4:39PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644099094.1uXKxCrhN2WGAt2fojy6k2fqDSb.2af2cfb4-8371-4938-b560-e282f8c94d16.json.gz
4:39PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644099185.1uXKxCrhN2WGAt2fojy6k2fqDSb.538e1f27-306f-4942-b79d-c012b7e9518f.json.gz
4:39PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644099275.1uXKxCrhN2WGAt2fojy6k2fqDSb.55cf698c-41c7-4f1b-89df-2567506cd02b.json.gz
4:39PM DBG Loading /var/folders/49/1zc24t595j79t5mw7_t9gtxr0000gn/T/bigquery/public.json_files_tmp/2022-11-26T163956.037/part.01.0001.csv.gz
4:40PM DBG select count(1) cnt from public.json_files_tmp
4:40PM DBG drop table if exists public.json_files
4:40PM DBG table public.json_files dropped
4:40PM INF dropped table public.json_files
4:40PM DBG create table public.json_files (`data` json)
4:40PM INF created table public.json_files
4:40PM DBG insert into `public`.`json_files` (`data`) select `data` from `public`.`json_files_tmp`
4:40PM DBG inserted rows into `public.json_files` from temp table `public.json_files_tmp`
4:40PM INF inserted 27 rows in 22 secs [1 r/s]
4:40PM DBG drop table if exists public.json_files_tmp
4:40PM DBG table public.json_files_tmp dropped
4:40PM INF execution succeeded

Notice that the created table has only one column, since by default Sling does not flatten nested values:

create table public.json_files (`data` json)
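
Even without flattening, the payload is still reachable from BigQuery via its JSON functions. Here is a small illustration with the bq CLI, assuming the project and dataset from the env.yaml above and an event key in the payload (as the flattened run below confirms):

# illustration: extracting one field from the single unflattened JSON column
$ bq query --use_legacy_sql=false 'SELECT JSON_VALUE(data, "$.event") AS event FROM `sling-project-123.public.json_files` LIMIT 5'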

Now let's ingest all the files again, this time adding {"flatten": true} to the --src-options flag in order to flatten the nested keys into their own columns:

$ sling run -d --src-conn DO_SPACES --src-stream s3://sling-bucket/02-05-2022/ --src-options '{"flatten": true}' --tgt-conn BIGQUERY --tgt-object public.json_files --mode full-refresh
4:49PM DBG type is file-db
4:49PM INF connecting to target database (bigquery)
4:49PM INF reading from source file system (s3)
4:49PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644049120.1uXKxCrhN2WGAt2fojy6k2fqDSb.26879111-e935-4957-a99e-6b6241e6a84a.json.gz
4:49PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644049299.1uXKxCrhN2WGAt2fojy6k2fqDSb.0185f4a2-fe46-4ea6-bb2b-58cee66fa14d.json.gz
4:49PM INF writing to target database [mode: full-refresh]
4:49PM DBG drop table if exists public.json_files_tmp
4:49PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644049390.1uXKxCrhN2WGAt2fojy6k2fqDSb.323c8d1b-bc66-4d17-8a26-6fde546cf54f.json.gz
4:49PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644049480.1uXKxCrhN2WGAt2fojy6k2fqDSb.73cf9b8c-be06-431d-98bd-c440932d530d.json.gz
4:49PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644049571.1uXKxCrhN2WGAt2fojy6k2fqDSb.68fbbf20-8a87-4836-95c5-5a4b762c97cb.json.gz
4:49PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644049662.1uXKxCrhN2WGAt2fojy6k2fqDSb.23397ceb-c55b-46d9-bdf7-063ce6c54cdd.json.gz
4:49PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644049752.1uXKxCrhN2WGAt2fojy6k2fqDSb.d23a1079-330e-4009-a18e-ca3ac1d087f2.json.gz
4:49PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644049933.1uXKxCrhN2WGAt2fojy6k2fqDSb.be115fa8-4a75-4353-a5d7-9b1ec9e0e792.json.gz
4:49PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644050114.1uXKxCrhN2WGAt2fojy6k2fqDSb.191e971e-7a18-4faa-8ccd-ddca99fe934c.json.gz
4:49PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644063457.1uXKxCrhN2WGAt2fojy6k2fqDSb.e68674d0-d14c-4277-aef2-88bf778f6bb5.json.gz
4:49PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644063818.1uXKxCrhN2WGAt2fojy6k2fqDSb.8545e07c-3881-4dcd-9976-b860ed85d7c9.json.gz
4:49PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644064268.1uXKxCrhN2WGAt2fojy6k2fqDSb.695c8516-6d8c-46ce-ac01-9d89b4dd13e8.json.gz
4:49PM DBG table public.json_files_tmp dropped
4:49PM DBG create table public.json_files_tmp (`anonymousid` string,
`context__library__name` string,
`context__library__version` string,
`event` string,
`messageid` string,
`originaltimestamp` timestamp,
`properties__application` string,
`properties__cmd` string,
`properties__error` string,
`properties__job_end_time` timestamp,
`properties__job_mode` string,
`properties__job_rows_count` int64,
`properties__job_src_type` string,
`properties__job_start_time` timestamp,
`properties__job_status` string,
`properties__job_tgt_type` string,
`properties__job_type` string,
`properties__os` string,
`properties__version` string,
`receivedat` timestamp,
`request_ip` string,
`rudderid` string,
`sentat` timestamp,
`timestamp` string,
`type` string,
`userid` string)
4:49PM INF streaming data
4:49PM INF importing into bigquery via local storage
4:49PM DBG writing to /var/folders/49/1zc24t595j79t5mw7_t9gtxr0000gn/T/bigquery/public.json_files_tmp/2022-11-26T163956.037
4:49PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644064359.1uXKxCrhN2WGAt2fojy6k2fqDSb.cd481db0-2066-4179-ba8c-8dd5f45e190e.json.gz
4:49PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644097266.1uXKxCrhN2WGAt2fojy6k2fqDSb.7b06270d-5c24-40fe-958b-f2175f8018f3.json.gz
4:49PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644097446.1uXKxCrhN2WGAt2fojy6k2fqDSb.18e17223-b55e-4283-a7ce-84af1f525435.json.gz
4:49PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644097537.1uXKxCrhN2WGAt2fojy6k2fqDSb.4bd41934-7be2-4b10-8539-580c54092f73.json.gz
4:49PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644097628.1uXKxCrhN2WGAt2fojy6k2fqDSb.49facabd-6a6f-410d-b4fc-2aebf8ccdad8.json.gz
4:49PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644098354.1uXKxCrhN2WGAt2fojy6k2fqDSb.865bf8b5-7361-4779-8199-6ec20543058d.json.gz
4:49PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644099094.1uXKxCrhN2WGAt2fojy6k2fqDSb.2af2cfb4-8371-4938-b560-e282f8c94d16.json.gz
4:49PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644099185.1uXKxCrhN2WGAt2fojy6k2fqDSb.538e1f27-306f-4942-b79d-c012b7e9518f.json.gz
4:49PM DBG reading datastream from s3://sling-bucket/02-05-2022/1644099275.1uXKxCrhN2WGAt2fojy6k2fqDSb.55cf698c-41c7-4f1b-89df-2567506cd02b.json.gz
4:49PM DBG Loading /var/folders/49/1zc24t595j79t5mw7_t9gtxr0000gn/T/bigquery/public.json_files_tmp/2022-11-26T163956.037/part.01.0001.csv.gz
4:50PM DBG select count(1) cnt from public.json_files_tmp
4:50PM DBG drop table if exists public.json_files
4:50PM DBG table public.json_files dropped
4:50PM INF dropped table public.json_files
4:50PM DBG create table public.json_files (`anonymousid` string,
`context__library__name` string,
`context__library__version` string,
`event` string,
`messageid` string,
`originaltimestamp` timestamp,
`properties__application` string,
`properties__cmd` string,
`properties__error` string,
`properties__job_end_time` timestamp,
`properties__job_mode` string,
`properties__job_rows_count` int64,
`properties__job_src_type` string,
`properties__job_start_time` timestamp,
`properties__job_status` string,
`properties__job_tgt_type` string,
`properties__job_type` string,
`properties__os` string,
`properties__version` string,
`receivedat` timestamp,
`request_ip` string,
`rudderid` string,
`sentat` timestamp,
`timestamp` string,
`type` string,
`userid` string)
4:50PM INF created table public.json_files
4:50PM DBG inserted rows into `public.json_files` from temp table `public.json_files_tmp`
4:50PM INF inserted 27 rows in 22 secs [1 r/s]
4:50PM DBG drop table if exists public.json_files_tmp
4:50PM DBG table public.json_files_tmp dropped
4:50PM INF execution succeeded

Notice that the created table now has a column for each nested key, with a __ separator between parent and child keys. Nifty, huh!?

create table public.json_files (`anonymousid` string,
`context__library__name` string,
`context__library__version` string,
`event` string,
`messageid` string,
`originaltimestamp` timestamp,
`properties__application` string,
`properties__cmd` string,
`properties__error` string,
`properties__job_end_time` timestamp,
`properties__job_mode` string,
`properties__job_rows_count` int64,
`properties__job_src_type` string,
`properties__job_start_time` timestamp,
`properties__job_status` string,
`properties__job_tgt_type` string,
`properties__job_type` string,
`properties__os` string,
`properties__version` string,
`receivedat` timestamp,
`request_ip` string,
`rudderid` string,
`sentat` timestamp,
`timestamp` string,
`type` string,
`userid` string)
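
Those flattened columns are then directly queryable like any other BigQuery columns, for example (again assuming the project and dataset from the env.yaml above):

# illustration: selecting a few of the flattened columns with the bq CLI
$ bq query --use_legacy_sql=false 'SELECT event, properties__job_status, properties__job_rows_count FROM `sling-project-123.public.json_files` LIMIT 5'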

Conclusion

We have demonstrated how easy it is to use the Sling CLI tool to load files from various storage sources. There are even more possibilities and configurations; check out our docs site for more details. If you have any questions, issues or comments, feel free to email us at support @ slingdata.io.