Data Types
tap uses Arrow, and thus the Arrow type system, for query
execution. The SQL types from
sqlparser-rs
are mapped to Arrow data types according to the following table.
This mapping occurs when defining the schema in a CREATE EXTERNAL TABLE
command or when performing a SQL CAST
operation.
You can see the corresponding Arrow type for any SQL expression using
the arrow_typeof
function. For example:
select arrow_typeof(interval '1 month');
+---------------------------------------------------------------------+
| arrow_typeof(IntervalMonthDayNano("79228162514264337593543950336")) |
+---------------------------------------------------------------------+
| Interval(MonthDayNano) |
+---------------------------------------------------------------------+
You can cast a SQL expression to a specific Arrow type using the arrow_cast
function
For example, to cast the output of now()
to a Timestamp
with second precision:
select arrow_cast(now(), 'Timestamp(Second, None)');
+---------------------+
| now() |
+---------------------+
| 2023-03-03T17:19:21 |
+---------------------+
Character Types
SQL DataType | Arrow DataType |
---|---|
CHAR | Utf8 |
VARCHAR | Utf8 |
TEXT | Utf8 |
STRING | Utf8 |
Numeric Types
SQL DataType | Arrow DataType |
---|---|
TINYINT | Int8 |
SMALLINT | Int16 |
INT or INTEGER | Int32 |
BIGINT | Int64 |
TINYINT UNSIGNED | UInt8 |
SMALLINT UNSIGNED | UInt16 |
INT UNSIGNED or INTEGER UNSIGNED | UInt32 |
BIGINT UNSIGNED | UInt64 |
FLOAT | Float32 |
REAL | Float32 |
DOUBLE | Float64 |
DECIMAL(precision, scale) | Decimal128(precision, scale) |
Date/Time Types
SQL DataType | Arrow DataType |
---|---|
DATE | Date32 |
TIME | Time64(Nanosecond) |
TIMESTAMP | Timestamp(Nanosecond, None) |
INTERVAL | Interval(IntervalMonthDayNano) |
Boolean Types
SQL DataType | Arrow DataType |
---|---|
BOOLEAN | Boolean |
Binary Types
SQL DataType | Arrow DataType |
---|---|
BYTEA | Binary |
You can create binary literals using a hex string literal such as
X'1234'
to create a Binary
value of two bytes, 0x12
and 0x34
.
Unsupported SQL Types
SQL Data Type | Arrow DataType |
---|---|
UUID | Not yet supported |
BLOB | Not yet supported |
CLOB | Not yet supported |
BINARY | Not yet supported |
VARBINARY | Not yet supported |
REGCLASS | Not yet supported |
NVARCHAR | Not yet supported |
CUSTOM | Not yet supported |
ARRAY | Not yet supported |
ENUM | Not yet supported |
SET | Not yet supported |
DATETIME | Not yet supported |
Supported Arrow Types
The following types are supported by the arrow_typeof
function:
Arrow Type |
---|
Null |
Boolean |
Int8 |
Int16 |
Int32 |
Int64 |
UInt8 |
UInt16 |
UInt32 |
UInt64 |
Float16 |
Float32 |
Float64 |
Utf8 |
LargeUtf8 |
Binary |
Timestamp(Second, None) |
Timestamp(Millisecond, None) |
Timestamp(Microsecond, None) |
Timestamp(Nanosecond, None) |
Time32 |
Time64 |
Duration(Second) |
Duration(Millisecond) |
Duration(Microsecond) |
Duration(Nanosecond) |
Interval(YearMonth) |
Interval(DayTime) |
Interval(MonthDayNano) |
FixedSizeBinary(<len>) (e.g. FixedSizeBinary(16) ) |
Decimal128(<precision>, <scale>) e.g. Decimal128(3, 10) |
Decimal256(<precision>, <scale>) e.g. Decimal256(3, 10) |
Looks good? Missing something you need? We need your feedback.
Let us know if you need to get your data and applications talking, or you want to save money on complex, inefficient data platforms and integration tooling. We'd love to discuss and see how tap could help.
tap is built by DigitalSociety – we build bespoke digital tools for complex requirements in web applications, data engineering and cloud.