Load Data into StarRocks from Any Database

See how to use Sling to load data into StarRocks from MySQL

Introduction

Let's look at how we can easily export load data into a StarRocks from most major databases with Sling, a versatile CLI data integration tool which allows you to quickly extract and load data right from the terminal.

Sling is a tool with a goal of making the experience of ingesting data a positive, even pleasant one. Sling focuses on 3 of data types interfaces:

  • From File Systems to Databases

  • From Databases to Databases

  • From Databases to File Systems

The list of connections that Sling supports continues to grow. You can see the full list here, but it supports all the major platforms including Clickhouse, DuckDB, Google BigQuery, Google BigTable, MariaDB, MongoDB, MotherDuck, MySQL, Oracle, PostgreSQL, Prometheus, Redshift, Snowflake, SQL Server, SQLite, StarRocks and Trino.

StarRocks

StarRocks is a powerful distributed, columnar storage database system designed for real-time analytics. One of the key features is that it supports several table type designs which can meet varying business requirements:

  • Duplicate Key table: each of the records as a separate row
  • Aggregate table: the aggregated record as a row
  • Primary Key table: only the most recently loaded record as a row (based on key)

This makes it suitable for many analytics use cases, such as business intelligence, ad hoc querying, real-time events and even machine learning / AI-driven data processing.

Load Data with Sling

First, let us install Sling. See here on details how to do so. It is usually a simple command, such as:

# On Mac
brew install slingdata-io/sling/sling

# On Windows Powershell
scoop bucket add org https://github.com/slingdata-io/scoop-sling.git
scoop install sling

# On Linux
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_linux_amd64.tar.gz' \
  && tar xf sling_linux_amd64.tar.gz \
  && rm -f sling_linux_amd64.tar.gz \
  && chmod +x sling

You should be able to run the sling command at this point.


Next, we'll set up StarRocks. If you don't have a StarRocks instance running, you can quickly launch a development instance on your machine with docker and load data into it.

docker run --rm -p 9030:9030 -p 8030:8030 -p 8040:8040 -it starrocks/allin1-ubuntu

Now we can create a connection for our StarRocks database with Sling and test connectivity. It is important to set the fe_url so we can Stream Load into StarRocks See here for more details on configuration.

$ sling conns set starrocks url="starrocks://root:@localhost:9030/sling" fe_url="http://root:@localhost:8030"
8:01PM INF connection `starrocks` has been set in /Users/me/.sling/env.yaml. Please test with `sling conns test starrocks`

# test our connection
$ sling conns test starrocks
8:01PM INF success!

Great. In this tutorial, we will be extracting data from a MySQL database. We can set up the connection as shown below (see here for more details).

$ sling conns set mysql url="mysql://admin:password@localhost:3306/mysql"
8:02PM INF connection `mysql` has been set in /Users/me/.sling/env.yaml. Please test with `sling conns test mysql`

# test our connection
$ sling conns test mysql
8:02PM INF success!

# discover our tables
$ sling conns discover mysql
+-----+--------------------+----------------------------------------------+-------+---------+
|   # | SCHEMA             | NAME                                         | TYPE  | COLUMNS |
+-----+--------------------+----------------------------------------------+-------+---------+
|   1 | information_schema | ADMINISTRABLE_ROLE_AUTHORIZATIONS            | table |       9 |
|   2 | information_schema | APPLICABLE_ROLES                             | table |       9 |
|   3 | information_schema | CHARACTER_SETS                               | table |       4 |
|   4 | information_schema | CHECK_CONSTRAINTS                            | table |       4 |
|   5 | information_schema | COLLATIONS                                   | table |       7 |
|   6 | information_schema | COLLATION_CHARACTER_SET_APPLICABILITY        | table |       2 |
.....

Great, we're ready to load some data into StarRocks!

Replication

We'll be using a replication to define what tables Sling should load. Replications are the best way to use sling in a reusable manner. The defaults key allows reusing your inputs with the ability to override any of them in a particular stream. Both YAML or JSON files are accepted.

# replication.yaml
source: mysql
target: starrocks

defaults:
    object: main.{stream_schema}_{stream_table}
    mode: full-refresh

streams:
  # all tables in schema `mysql`
  mysql.*:

 # only one table with specific duplicate keys
  finance.account_sales:
    mode: truncate
    target_options:
      table_keys:
        duplicate: [account_id, sale_id]

We can run the replication like this:

sling run -r replication.yaml

Output


09:14PM INF Sling Replication [5 streams] | mysql -> starrocks

09:14PM INF [1 / 5] running stream "mysql"."accounts" 09:14PM INF connecting to source database (mysql) 09:14PM INF connecting to target database (starrocks) 09:14PM INF reading from source database 09:14PM INF writing to target database [mode: full-refresh] 09:14PM INF streaming data 09:14PM INF importing into StarRocks via stream load 09:14PM INF created table main.mysql_accounts 09:14PM INF inserted 62347 rows into main.mysql_accounts in 10 secs [6234 r/s] [20.0 MB] 09:14PM INF execution succeeded

09:14PM INF [2 / 5] running stream "mysql"."orders" 09:14PM INF connecting to source database (mysql) 09:14PM INF connecting to target database (starrocks) 09:14PM INF reading from source database 09:14PM INF writing to target database [mode: full-refresh] 09:14PM INF streaming data 09:14PM INF importing into StarRocks via stream load 09:14PM INF created table main.mysql_orders 09:14PM INF inserted 716540 rows into main.mysql_orders in 40 secs [17,973 r/s] [120 MB] 09:14PM INF execution succeeded

.......

09:15PM INF [5 / 5] running stream "finance"."account_sales" 09:15PM INF connecting to source database (mysql) 09:15PM INF connecting to target database (starrocks) 09:15PM INF reading from source database 09:15PM INF writing to target database [mode: full-refresh] 09:15PM INF streaming data 09:15PM INF importing into StarRocks via stream load 09:15PM INF created table main.finance_account_sales 09:15PM INF inserted 11718 rows into main.finance_account_sales in 0 secs [14,126 r/s] [2.0 MB] 09:15PM INF execution succeeded

09:15PM INF Sling Replication Completed in 1m 34s | mysql -> starrocks | 5 Successes | 0 Failures

So easy! See many more examples here: docs.slingdata.io/sling-cli/run/examples

Conclusion

We went over on how easy it was to install Sling, and export data from MySQL into StarRocks. Feel free to check out other examples here: docs.slingdata.io.