Generate API from CSV

Most applications communicate using HTTP APIs, and a vast amount of data is stored in (or can only be exported to) CSV files. How can we get our applications and data talking?

With tap you can upload a CSV and generate a fully managed, documented, secured, monitored, and performant API in minutes.

Let's use tap to generate an HTTP API from an interesting CSV. We'll use Scotland's open contracting public sector procurement data, which is published as a collection of CSVs.

Get set up

open tap and sign in or create your (free) account. This could take a minute to set up your storage bucket and domain if it's your first time signing in.

Next, download the CSV containing all Scotland's tenders (public contracts) hosted by Open Contracting.

Note: We've extracted the file to keep the tutorial streamlined, but you can find the latest data through the "All time" CSV download on the Open Contracting site, which includes a main.csv file containing the tenders.

Add data

After signing in you'll be on the Data page.

Click Create data + to open the data creation form.

Data are the "sources" for all data processing in tap. You can upload any CSV, JSON(L), Parquet, XLSX, or ODS document. JSON documents will be converted into a single-line JSONL file, and XLSX and ODS documents will be extracted into separate CSV sheets.

Note: already have a bunch of data in object storage? Tap works with any S3-compatible object store so talk to us about bringing your own data lake.

Let's call the data ocds_scotland_tenders (for Open Contracting Data Standard).

Click Upload file and select the downloaded CSV. The upload might take a few seconds.

Once the upload completes, tap will automatically try to preview the file. You'll notice that there are a lot of columns and a lot of null values, and we'll improve that later.

For now, click Save to import the selected data.

Once saved, you'll see tap "materialising" the data. This creates an optimised copy of the data in Parquet format to accelerate queries.

Materialisation should only take a few seconds.

Write queries

Switch to the Models page.

Click Create model + to open the model creation form.

Models are SQL SELECT queries that you can use to analyse, transform, join, and enrich your data. Your queries can select and join any of your Data or other Models. tap's SQL engine is powered by Apache DataFusion, a cutting edge, open source, and high performance query engine. See the complete SQL reference for full details.

You can directly download query results (as CSV, JSONL, or Parquet), or save the model for use in other models or to publish as an API (described later).

Click the bar on the right-hand side to open the catalogue. This is a handy reference of all the Data and Models in your tenant, along with details of their schema. Each entry has a clipboard button to copy the SQL identifier for the relation/field.

The SQL editor also has auto-complete. Start typing SELECT * FROM data. (including the trailing period) and it will suggest our ocds_scotland_tenders data set.

Play around with the query if you like, but for the purpose of the tutorial we'll call our model ocds_scotland_tenders_complete and use the following query:

SELECT
data.ocds_scotland_tenders._link,
to_char(data.ocds_scotland_tenders.date, '%Y-%m-%d') AS date,
data.ocds_scotland_tenders.ocid,
data.ocds_scotland_tenders.tender_title AS title,
data.ocds_scotland_tenders.buyer_id,
data.ocds_scotland_tenders."tender_mainProcurementCategory" AS category
FROM
data.ocds_scotland_tenders
WHERE
data.ocds_scotland_tenders.tender_status = 'complete'

See the SQL reference for full documentation of the SQL syntax and features. Note some key features of this query:

  • You can exclude, rename, or introduce new columns based on arbitrary SQL expressions.
  • You can filter the results to prune invalid values or produce a subset of rows. In this case we're only looking at completed tenders.
  • You can use any of the many functions to format and process values.

Go ahead and Preview the query results, and if they're looking good click Save to save the model.

tap understands the connections between data, models, and APIs and prevents changes that would break these data transformation pipelines.

Note: want more features built on data lineage? Talk to ustap is under active development and we want your feedback.

Publish APIs

Let's now switch to the APIs page and publish our model.

Click Create API + to open the API creation form.

APIs allow you to share records from your queries as JSON over HTTPS, secured by API key, with configurable filters that clients can use to select only the data they need. Every programming language and many no- and low-code tools have built-in support for making HTTPS requests and processing JSON data, so publishing an API is a great way of eliminating friction from using data in your applications and tools.

We'll use the path /ocds/scotland/tenders/complete, our ocds_scotland_tenders_complete model will be pre-selected as its our only one.

For filters, we'll uncheck Use defaults and reconfigure them a bit:

  • Remove the date filter.
  • Change the title filter type to Loose text equal.

See the filter reference for more details about the available filter types.

Note: want more filters? Get in touch – we want to help with your use case.

Hit Preview to get a sample of the JSON records.

Now let's go ahead and Save it. After a brief period of materialisation, our API is now ready to use.

Note: your API will be hosted on an auto-generated domain for your tenant. Want to use a custom domain? Let us know!

Open the documentation for your new API in a new tab, but before we explore it let's grab an API key. Switch to the API keys page.

Access to your APIs is secured with API keys. You can manage API keys in tap and distribute them to applications (or 3rd parties) based on what and who should have access to which APIs.

Press Create API key + and give your key an appropriate label. Labels can't be changed later, but you can always create new keys. We recommend using an email address or some other value that clearly identifies the user or application.

Press Create to create your key.

The Password will only be shown once on this screen, but you can always create new keys.

We can now go to the API documentation tab (opened before) to view the Swagger UI for your domain.

Click the Authorize button in the top-right, paste in your API key username and password, and click Authorize to confirm.

You can then Close the authorization box.

Press Try it out and scroll down to see the Execute button. Press the button and see the response appear with the first page of 100 rows. Experiment with the limit, cursor, and filters to see how they behave.

And that's it!

You've just generated a fully managed, documented, secured, monitored, and performant API from a CSV and some SQL.

Looks good? Missing something you need? We need your feedback.

Let us know if you need to get your data and applications talking, or you want to save money on complex, inefficient data platforms and integration tooling. We'd love to discuss and see how tap could help.

tap is built by DigitalSociety – we build bespoke digital tools for complex requirements in web applications, data engineering and cloud.