Sling API Specs: Pull Data From Any REST API With YAML
Configure REST API extraction the same way you configure database replication

A technical data engineer / passionate technologist who loves tackling interesting problems with code, exploration & persistence.
Introducing Sling API Specs
Sling has always made it easy to move data between databases and file systems. Postgres to Snowflake, S3 to BigQuery, MySQL to Iceberg. One YAML file, one command, done. But when people asked us about pulling data from REST APIs, the answer used to be "you'll need a separate tool for that."
Not anymore.
API specs are Sling's way of describing a REST API in YAML. Once you have a spec, the API behaves like any other source. You point a replication at it, and Sling pulls the data into Postgres, Snowflake, S3, or whatever target you want. Same sling run command. Same configuration style. Same incremental sync, deduplication, and column typing you already use.
This post walks through what an API spec looks like, how it works under the hood, and how to use the official Stripe spec to load Stripe data into a database in about three minutes.
Why YAML for APIs
REST APIs all do roughly the same thing: send an HTTP request, parse the response, follow pagination links, repeat. The boring parts are nearly identical across providers. What changes is the URL pattern, the auth method, the JSON shape, and whatever pagination convention that particular vendor settled on years ago.
A spec captures those differences in one file. Once it's written, you don't think about it again.
name: "Example API"
description: "A minimal Sling API spec"
defaults:
state:
base_url: https://api.example.com/v1
request:
headers:
Accept: "application/json"
Authorization: "Bearer {secrets.api_token}"
endpoints:
users:
request:
url: '{state.base_url}/users'
parameters:
page: '{state.page}'
limit: '{state.limit}'
state:
page: 1
limit: 100
pagination:
next_state:
page: '{state.page + 1}'
stop_condition: "length(response.records) < state.limit"
response:
records:
jmespath: "data.users[]"
primary_key: ["id"]
A few things to notice:
defaultsapplies to every endpoint unless an endpoint overrides it. Auth headers go here so you don't repeat them.stateis mutable per endpoint. Each request can read and update it, which is how pagination works.pagination.next_statedescribes how to advance.stop_conditiondescribes when to stop.response.records.jmespathtells Sling where the actual rows live in the JSON payload.primary_keylets Sling deduplicate when you sync incrementally.
That's most of the surface area. There's more (queues, iterate blocks, OAuth2, sync state, response rules), but you can write a working spec without any of it.
A real example: Stripe
Sling ships with an official Stripe spec, so you don't actually have to write one yourself. But the Stripe spec is also a good way to see how the moving parts fit together, because Stripe touches almost every feature of the format. Cursor pagination, incremental sync, parent-child endpoints, queues, response processors.
Here's the top of stripe.yaml:
name: "Stripe API"
description: "API for extracting data from Stripe payment processing platform."
queues:
- credit_note_ids
- customer_ids
- invoice_ids
defaults:
state:
base_url: https://api.stripe.com/v1
limit: 100
anchor_date: '{coalesce(date_parse(inputs.anchor_date), date_add(now(), -1, "year"))}'
anchor_unix: '{date_format(state.anchor_date, "%s")}'
request:
method: "GET"
headers:
Accept: "application/json"
Authorization: Bearer {secrets.api_key}
Stripe-Version: '{coalesce(secrets.api_version, "2023-10-16")}'
parameters:
limit: '{state.limit}'
starting_after: '{state.starting_after}'
created[gt]: '{state.created_gt}'
rate: 20
concurrency: 5
response:
records:
jmespath: "data[]"
primary_key: ["id"]
pagination:
next_state:
starting_after: '{jmespath(response.records, "[-1].id")}'
stop_condition: jmespath(response.json, "has_more") == false || length(response.records) == 0
The defaults block does a lot of work. Every endpoint inherits the auth header, the rate limit (20 req/sec), the concurrency (5 parallel requests), the pagination strategy (Stripe's starting_after cursor), and the basic response shape. Endpoints below this can be tiny.
Look at customer:
customer:
description: "Retrieve list of customers"
docs: https://docs.stripe.com/api/customers
state:
created_gt: '{coalesce(sync.last_created, state.anchor_unix)}'
sync: [last_created]
request:
url: '{state.base_url}/customers?expand[]=data.discount'
response:
records:
jmespath: "data[]"
primary_key: ["id"]
update_key: "created"
processors:
- expression: "date_parse(record.created)"
output: "record.created_date"
- expression: "record.created"
output: "state.last_created"
aggregation: "maximum"
- expression: "record.id"
output: "queue.customer_ids"
A handful of things are happening in this block:
created_gtreads fromsync.last_created, which Sling persists between runs. First run uses the anchor date (one year ago by default). Every run after picks up where the last left off.processorsrun on each record. The first one parses Stripe's Unix timestamp into a real date. The second tracks the maximumcreatedvalue so the next run knows where to resume.- The third processor pushes every customer ID onto a queue, which a different endpoint can consume.
The queue trick is how you handle parent-child relationships. customer_balance_transaction reads from that queue:
customer_balance_transaction:
description: "Retrieve customer balance transactions"
docs: https://docs.stripe.com/api/customers/balance_transactions
iterate:
over: "queue.customer_ids"
into: "state.customer_id"
request:
url: '{state.base_url}/customers/{state.customer_id}/balance_transactions'
response:
processors:
- expression: "state.customer_id"
output: "record.customer_id"
For each customer ID that came out of the parent endpoint, Sling fires a request to fetch that customer's balance transactions, then stamps the customer ID onto each row so you can join later. No glue scripts, no Python loop. The spec describes the relationship and Sling executes it.
Running it
You don't need to write any of this to use Stripe. The spec ships with Sling. You just point a connection at it.
Add an entry to ~/.sling/env.yaml:
connections:
stripe:
type: api
spec: stripe
secrets:
api_key: sk_live_xxxxxxxxxxxxxxxx
my_postgres:
url: postgres://user:pass@host:5432/mydb
Test the connection:
$ sling conns test stripe
This downloads the latest Stripe spec from the Sling registry, runs a single-record probe against every endpoint, and reports back. On my machine it took about 90 seconds against a real Stripe account. The output looks like this:
INF testing endpoint: "customer_subscription_update"
INF testing endpoint: "customer_update"
INF testing endpoint: "invoice_update"
INF testing endpoint: "payment_method_update"
INF testing endpoint: "payout_update"
INF testing endpoint: "refund_update"
INF testing endpoint: "invoice_line_item"
INF testing endpoint: "setup_attempt"
INF testing endpoint: "subscription_item"
INF success!
You can list available endpoints:
$ sling conns discover stripe
+----+------------------------------+
| # | ENDPOINT |
+----+------------------------------+
| 1 | account |
| 2 | charge |
| 3 | coupon |
| 4 | credit_note |
| 5 | customer |
| 6 | customer_balance_transaction |
| 7 | dispute |
| 8 | event |
| 9 | invoice |
| 10 | invoice_item |
...
Now write a replication. This file syncs every Stripe endpoint into Postgres tables under the stripe schema:
source: stripe
target: my_postgres
defaults:
object: stripe.{stream_name}
mode: incremental
primary_key: [id]
streams:
'*':
The '*' wildcard means "every endpoint." Sling figures out which ones support incremental sync (most of them) and which don't (account info, plans, products), and behaves accordingly.
$ sling run -r stripe_to_postgres.yaml
If you want a single endpoint instead, you can skip the replication file:
$ sling run --src-conn stripe --src-stream customer \
--tgt-conn my_postgres --tgt-object stripe.customer \
--mode full-refresh
I ran this against a small test account and got 5 rows in about a second:
INF connecting to source api (stripe)
INF writing to target file system (file)
INF wrote 5 rows to file:///tmp/stripe-customers.csv in 1 secs [3 r/s]
INF execution succeeded
That's the whole loop. There's no client library to install and no pagination loop to write yourself.
Forking the Stripe spec
The Stripe spec has 30+ endpoints, but maybe you only care about charges and customers. Or maybe Stripe shipped a new endpoint last week and you don't want to wait. You can fork the spec.
After running sling conns test stripe once, the spec lives at ~/.sling/api/specs/stripe.yaml. Copy it:
$ cp ~/.sling/api/specs/stripe.yaml ./my_stripe.spec.yaml
Edit it however you want. Then point your connection at the local file:
connections:
stripe:
type: api
spec: file://./my_stripe.spec.yaml
secrets:
api_key: sk_live_xxxxxxxxxxxx
Specs can also live on S3, GitHub, or any HTTP URL. If your team has internal APIs, drop the spec in a Git repo and reference it by URL. Everyone gets the same definition.
What else specs can do
The Stripe walkthrough covers the common case: a vendor REST API with cursor pagination and a stable schema. Specs handle the weirder cases too:
- OAuth2 flows. Client credentials, refresh tokens, on-the-fly token exchange. The auth section in a spec can describe a multi-step login dance.
- Dynamic endpoints. When the list of resources isn't known until you query the API (think Airtable bases, Salesforce custom objects), you can generate endpoints at runtime.
- Response rules. Retry on 429s with exponential backoff. Stop on a specific error string. Skip records that match a condition.
- Multiple response formats. JSON is the default but specs can parse CSV, XML, and JSON Lines too.
- Setup and teardown sequences. Some APIs need you to create a job, poll it, and then download a result file. Specs can describe that flow as a sequence of requests.
The full reference is in the API spec docs. The structure, authentication, request, response, queue, and dynamic-endpoint pages each cover one corner of the format.
The official spec library
We're maintaining a growing list of official specs you can reference by ID. The current set includes Stripe, HubSpot, Salesforce, GitHub, Shopify, Airtable, Polygon, AWS SES, dbt Cloud, Gmail, and dozens more. The list is at docs.slingdata.io/connections/api-connections. The source for every official spec is in the sling-api-spec GitHub repo if you want to read them or contribute.
If a spec you need isn't there, you can write one yourself, or open a GitHub issue requesting it. We've been adding new ones every couple of weeks.
Why this matters
Most data teams I talk to have at least one Python script somewhere that hits a vendor API, paginates through results, and dumps them in a database. It usually started as a quick fix and turned into a thing somebody owns. The script breaks when the API adds a field, or when the auth token rotates, or when the cron host gets reimaged.
Specs are an attempt to make that kind of script unnecessary. Describe the API once. Run it like any other Sling job. When something breaks, fix the YAML, not the codebase.
You'll need a CLI Pro license or Platform Plan to use API specs, since they're a Pro feature. Everything else in Sling stays open source.
Try the Stripe spec on a test account and let me know what you think. The full docs are at docs.slingdata.io/concepts/api-specs, and the #sling Discord is the fastest way to ask questions or request a new connector.





