Data Types
tap uses Arrow, and thus the Arrow type system, for query
execution. The SQL types from
sqlparser-rs
are mapped to Arrow data types according to the following table.
This mapping occurs when defining the schema in a CREATE EXTERNAL TABLE command or when performing a SQL CAST operation.
You can see the corresponding Arrow type for any SQL expression using
the arrow_typeof function. For example:
select arrow_typeof(interval '1 month');
+---------------------------------------------------------------------+
| arrow_typeof(IntervalMonthDayNano("79228162514264337593543950336")) |
+---------------------------------------------------------------------+
| Interval(MonthDayNano) |
+---------------------------------------------------------------------+
You can cast a SQL expression to a specific Arrow type using the arrow_cast function
For example, to cast the output of now() to a Timestamp with second precision:
select arrow_cast(now(), 'Timestamp(Second, None)');
+---------------------+
| now() |
+---------------------+
| 2023-03-03T17:19:21 |
+---------------------+
Character Types
| SQL DataType | Arrow DataType |
|---|---|
CHAR | Utf8 |
VARCHAR | Utf8 |
TEXT | Utf8 |
STRING | Utf8 |
Numeric Types
| SQL DataType | Arrow DataType |
|---|---|
TINYINT | Int8 |
SMALLINT | Int16 |
INT or INTEGER | Int32 |
BIGINT | Int64 |
TINYINT UNSIGNED | UInt8 |
SMALLINT UNSIGNED | UInt16 |
INT UNSIGNED or INTEGER UNSIGNED | UInt32 |
BIGINT UNSIGNED | UInt64 |
FLOAT | Float32 |
REAL | Float32 |
DOUBLE | Float64 |
DECIMAL(precision, scale) | Decimal128(precision, scale) |
Date/Time Types
| SQL DataType | Arrow DataType |
|---|---|
DATE | Date32 |
TIME | Time64(Nanosecond) |
TIMESTAMP | Timestamp(Nanosecond, None) |
INTERVAL | Interval(IntervalMonthDayNano) |
Boolean Types
| SQL DataType | Arrow DataType |
|---|---|
BOOLEAN | Boolean |
Binary Types
| SQL DataType | Arrow DataType |
|---|---|
BYTEA | Binary |
You can create binary literals using a hex string literal such as
X'1234' to create a Binary value of two bytes, 0x12 and 0x34.
Unsupported SQL Types
| SQL Data Type | Arrow DataType |
|---|---|
UUID | Not yet supported |
BLOB | Not yet supported |
CLOB | Not yet supported |
BINARY | Not yet supported |
VARBINARY | Not yet supported |
REGCLASS | Not yet supported |
NVARCHAR | Not yet supported |
CUSTOM | Not yet supported |
ARRAY | Not yet supported |
ENUM | Not yet supported |
SET | Not yet supported |
DATETIME | Not yet supported |
Supported Arrow Types
The following types are supported by the arrow_typeof function:
| Arrow Type |
|---|
Null |
Boolean |
Int8 |
Int16 |
Int32 |
Int64 |
UInt8 |
UInt16 |
UInt32 |
UInt64 |
Float16 |
Float32 |
Float64 |
Utf8 |
LargeUtf8 |
Binary |
Timestamp(Second, None) |
Timestamp(Millisecond, None) |
Timestamp(Microsecond, None) |
Timestamp(Nanosecond, None) |
Time32 |
Time64 |
Duration(Second) |
Duration(Millisecond) |
Duration(Microsecond) |
Duration(Nanosecond) |
Interval(YearMonth) |
Interval(DayTime) |
Interval(MonthDayNano) |
FixedSizeBinary(<len>) (e.g. FixedSizeBinary(16)) |
Decimal128(<precision>, <scale>) e.g. Decimal128(3, 10) |
Decimal256(<precision>, <scale>) e.g. Decimal256(3, 10) |
Looks good? Missing something you need? We need your feedback.
Let us know if you need to get your data and applications talking, or you want to save money on complex, inefficient data platforms and integration tooling. We'd love to discuss and see how tap could help.
tap is built by DigitalSociety – we build bespoke digital tools for complex requirements in web applications, data engineering and cloud.