See how to conveniently import JSON Files into MySQL

Need to flatten and ingest your JSON files? See how we can load JSON files into MySQL from your local drive using Sling.

MySQL and JSON

MySQL has been around for a long time, yet it remains a widely used open-source relational database management system (RDBMS). Its key features include high-performance data handling at scale, support for multiple storage engines, and easy integration with other software and applications.

JSON, on the other hand, is a lightweight, human-readable data interchange format based on a subset of the JavaScript programming language. A key advantage of JSON is that it is easy to work with, for both developers and machines. It is a hierarchical format: data is organized into a tree-like structure with nested elements, which makes it easy to represent complex data structures and relationships, and to manipulate and query them. It is used everywhere these days!

Your Loading Situation

So you are in a situation where you need not only to load your JSON files into a MySQL database, but also to flatten each file. Sling can help with this task: it is a command-line tool that efficiently transfers data between files and databases.

Sling can flatten your JSON file, auto-create your table DDL and then load the data into the table from your local drive. This can be a quick and easy way to get your data into a format that is ready for analysis and processing.

Installing Sling CLI

Sling is very easy to install, no matter which operating system you are using. It's written in the Go programming language, so it compiles into a single binary file. It also supports many databases. Please see here for the full list of compatible connectors.

# On Mac
brew install slingdata-io/sling/sling

# On Windows PowerShell
scoop bucket add org https://github.com/slingdata-io/scoop-sling.git
scoop install sling

# On Linux
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_linux_amd64.tar.gz' \
  && tar xf sling_linux_amd64.tar.gz \
  && rm -f sling_linux_amd64.tar.gz \
  && chmod +x sling

Please see here for additional installation options (such as downloading binaries). There is also a Python wrapper library, which is useful if you prefer interacting with Sling inside of Python. Once installed, we should be able to run the sling command.

Loading from our Local Drive

Let us assume that we want to ingest the following JSON array file, which includes nested objects:

[
 {
   "_id": "638f4cab1c024be3cadd3ca5",
   "isActive": true,
   "balance": "3,148.57",
   "picture": "http://placehold.it/32x32",
   "age": 35,
   "name": "Joann Kim",
   "company": {
     "name": "PROXSOFT",
     "email": "joannkim@proxsoft.com",
     "phone": "+1 (836) 517-2388",
     "address": "951 Ellery Street, Norwood, Palau, 2947",
     "about": "Labore id et sunt cupidatat dolore aute. Sit laborum nulla pariatur nisi dolore consectetur ex exercitation cupidatat ex ex reprehenderit duis. Eiusmod ut aliquip laborum enim proident ex cupidatat ut velit qui amet dolor tempor enim.\r\n",
     "registered": "2020-01-15T02:07:00 +03:00",
     "latitude": -60.00954,
     "longitude": -55.92312
   },
   "tags": [
     "elit",
     "veniam"
   ]
 },
...
]
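
Before loading, it can help to confirm that a file really is a JSON array of objects, like the one above. The following is a minimal Python sketch that is independent of Sling; the function name summarize_records and the shortened sample record are hypothetical, chosen only for illustration.

```python
import json


def summarize_records(text):
    """Parse a JSON array of objects; return (record count, sorted top-level keys)."""
    records = json.loads(text)
    if not isinstance(records, list) or not all(isinstance(r, dict) for r in records):
        raise ValueError("expected a JSON array of objects")
    # Collect every top-level key that appears in any record.
    keys = sorted({key for record in records for key in record})
    return len(records), keys


# A shortened, hypothetical version of the record shown above.
sample = """
[
  {
    "_id": "638f4cab1c024be3cadd3ca5",
    "age": 35,
    "company": {"name": "PROXSOFT"},
    "tags": ["elit", "veniam"]
  }
]
"""

count, keys = summarize_records(sample)
print(count, keys)  # 1 ['_id', 'age', 'company', 'tags']
```

To check a real file, pass its contents to summarize_records instead of the sample string.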

In our example, our file will be located at path /tmp/records.json. If you'd like to ingest several similarly structured files inside a folder (say /path/to/my/folder), you can pass the folder path instead of the file path, and Sling will read all files in the folder. Just make sure to add the file:// prefix. See below.
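
If you are unsure what the prefixed form of a path looks like, Python's standard library can build it for you. This is a small sketch, assuming absolute POSIX-style paths; the helper name to_file_url is hypothetical. Note that as_uri() percent-encodes special characters such as spaces, and whether Sling expects that encoding is an assumption to verify for such paths.

```python
from pathlib import PurePosixPath


def to_file_url(path):
    """Return a file:// URL for an absolute POSIX path (raises ValueError if relative)."""
    return PurePosixPath(path).as_uri()


print(to_file_url("/tmp/records.json"))   # file:///tmp/records.json
print(to_file_url("/path/to/my/folder"))  # file:///path/to/my/folder
```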

# first let's set our MYSQL connection. Sling can pick up connection URLs from environment variables
$ export MYSQL='mysql://admin:password@mysql.host:3306/mysql'

# let's check and test our MYSQL connection
$ sling conns list
+------------+-------------+---------------+
| CONN NAME  | CONN TYPE   | SOURCE        |
+------------+-------------+---------------+
| MYSQL      | DB - MySQL  | env variable  |
+------------+-------------+---------------+

$ sling conns test MYSQL
11:19PM INF success!

# awesome, now we can run our task
$ sling run --src-stream file:///tmp/records.json --tgt-conn MYSQL --tgt-object mysql.records --mode full-refresh
11:19PM INF connecting to target database (mysql)
11:19PM INF reading from source file system (file)
11:19PM INF writing to target database [mode: full-refresh]
11:19PM INF streaming data
11:19PM INF dropped table mysql.records
11:19PM INF created table mysql.records
11:19PM INF inserted 500 rows in 2 secs
11:19PM INF execution succeeded

How easy was that? Now let's run it again with debug mode enabled (flag -d) to see the created table DDL, and this time let's pipe in the data with the cat command:

$ cat /tmp/records.json | sling run -d --tgt-conn MYSQL --tgt-object mysql.records --mode full-refresh
11:20PM INF connecting to target database (mysql)
11:20PM INF reading from stream (stdin)
11:20PM INF writing to target database [mode: full-refresh]
11:20PM DBG drop table if exists mysql.records_tmp
11:20PM DBG table mysql.records_tmp dropped
11:20PM DBG create table if not exists mysql.records_tmp (`data` json)
11:20PM INF streaming data
11:20PM DBG select count(1) cnt from mysql.records_tmp
11:20PM DBG drop table if exists mysql.records
11:20PM DBG table mysql.records dropped
11:20PM INF dropped table mysql.records
11:20PM DBG create table if not exists mysql.records (`data` json)
11:20PM INF created table mysql.records
11:20PM DBG insert into `mysql`.`records` (`data`) select `data` from `mysql`.`records_tmp`
11:20PM DBG inserted rows into `mysql.records` from temp table `mysql.records_tmp`
11:20PM INF inserted 500 rows in 2 secs [170 r/s]
11:20PM DBG drop table if exists mysql.records_tmp
11:20PM DBG table mysql.records_tmp dropped
11:20PM INF execution succeeded

We can see that the DDL used was create table if not exists mysql.records (`data` json): without flattening, each record is stored whole in a single JSON column.

Flattening our JSON Records

This time, let's flatten the records when ingesting. When we flatten, Sling will create individual columns for each of the keys in the record. We can do so by adding --src-options 'flatten: true' as a flag. See here for all options:

$ sling run -d --src-stream file:///tmp/records.json --src-options 'flatten: true' --tgt-conn MYSQL --tgt-object mysql.records --mode full-refresh
11:23PM INF connecting to target database (mysql)
11:23PM INF reading from source file system (file)
11:23PM DBG reading datastream from /tmp/records.json
11:23PM INF writing to target database [mode: full-refresh]
11:23PM DBG drop table if exists mysql.records_tmp
11:23PM DBG table mysql.records_tmp dropped
11:23PM DBG create table if not exists mysql.records_tmp (`_id` mediumtext,
`age` bigint,
`balance` mediumtext,
`company__about` mediumtext,
`company__address` mediumtext,
`company__email` mediumtext,
`company__latitude` decimal(30,9),
`company__longitude` decimal(30,9),
`company__name` mediumtext,
`company__phone` mediumtext,
`company__registered` mediumtext,
`isactive` char(5),
`name` mediumtext,
`picture` mediumtext,
`tags` json,
`_sling_loaded_at` bigint)
11:23PM INF streaming data
11:23PM DBG select count(1) cnt from mysql.records_tmp
11:23PM DBG drop table if exists mysql.records
11:23PM DBG table mysql.records dropped
11:23PM INF dropped table mysql.records
11:23PM DBG create table if not exists mysql.records (`_id` mediumtext,
`age` bigint,
`balance` mediumtext,
`company__about` mediumtext,
`company__address` mediumtext,
`company__email` mediumtext,
`company__latitude` decimal(30,9),
`company__longitude` decimal(30,9),
`company__name` mediumtext,
`company__phone` mediumtext,
`company__registered` mediumtext,
`isactive` varchar(255),
`name` mediumtext,
`picture` mediumtext,
`tags` json,
`_sling_loaded_at` bigint)
11:23PM INF created table mysql.records
11:23PM DBG insert into `mysql`.`records` (`_id`, `age`, `balance`, `company__about`, `company__address`, `company__email`, `company__latitude`, `company__longitude`, `company__name`, `company__phone`, `company__registered`, `isactive`, `name`, `picture`, `tags`, `_sling_loaded_at`) select `_id`, `age`, `balance`, `company__about`, `company__address`, `company__email`, `company__latitude`, `company__longitude`, `company__name`, `company__phone`, `company__registered`, `isactive`, `name`, `picture`, `tags`, `_sling_loaded_at` from `mysql`.`records_tmp`
11:23PM DBG inserted rows into `mysql.records` from temp table `mysql.records_tmp`
11:23PM INF inserted 500 rows in 3 secs [151 r/s]
11:23PM DBG drop table if exists mysql.records_tmp
11:23PM DBG table mysql.records_tmp dropped
11:23PM INF execution succeeded

Amazing, we can see the DDL now is:

create table if not exists mysql.records (`_id` mediumtext,
`age` bigint,
`balance` mediumtext,
`company__about` mediumtext,
`company__address` mediumtext,
`company__email` mediumtext,
`company__latitude` decimal(30,9),
`company__longitude` decimal(30,9),
`company__name` mediumtext,
`company__phone` mediumtext,
`company__registered` mediumtext,
`isactive` varchar(255),
`name` mediumtext,
`picture` mediumtext,
`tags` json,
`_sling_loaded_at` bigint)
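
The column names in this DDL hint at how the flattening behaves: keys of nested objects are joined to their parent with a double underscore (company__name), names are lowercased (isActive becomes isactive), and arrays such as tags stay in a single JSON column. The Python sketch below only imitates that naming scheme to make it concrete. It is not Sling's implementation, and the function name flatten and the shortened record are hypothetical.

```python
import json


def flatten(record, parent="", sep="__"):
    """Flatten nested dicts into one level; keep lists as JSON text."""
    flat = {}
    for key, value in record.items():
        # Join the parent and child key, lowercased, e.g. company + name -> company__name.
        column = (f"{parent}{sep}{key}" if parent else key).lower()
        if isinstance(value, dict):
            flat.update(flatten(value, column, sep))
        elif isinstance(value, list):
            # Arrays are not expanded; they are kept whole as JSON text.
            flat[column] = json.dumps(value)
        else:
            flat[column] = value
    return flat


# A shortened, hypothetical version of the sample record.
record = {
    "_id": "638f4cab1c024be3cadd3ca5",
    "isActive": True,
    "company": {"name": "PROXSOFT", "latitude": -60.00954},
    "tags": ["elit", "veniam"],
}

print(sorted(flatten(record)))
# ['_id', 'company__latitude', 'company__name', 'isactive', 'tags']
```

The sketch does not add the `_sling_loaded_at` column, which appears in the DDL above but is not part of the source file.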

Conclusion

As demonstrated, Sling makes it easy to load and flatten JSON files into MySQL. It is also compatible with a wide range of storage systems: you can ingest not only JSON files, but also CSV and XML files, as well as data from various database systems. If you have any questions or comments, or are facing issues, please feel free to email us at support @ slingdata.io.