add VECTOR doc (#18791)
* add VECTOR doc
taroface authored and mdlinville committed Aug 9, 2024
1 parent ed1c166 commit 0f50cdb
Showing 4 changed files with 102 additions and 0 deletions.
1 change: 1 addition & 0 deletions src/current/_includes/v24.2/misc/enterprise-features.md
@@ -7,6 +7,7 @@ Feature | Description
[Multi-Region Capabilities]({% link {{ page.version.version }}/multiregion-overview.md %}) | Row-level control over where your data is stored to help you reduce read and write latency and meet regulatory requirements.
[PL/pgSQL]({% link {{ page.version.version }}/plpgsql.md %}) | Use a procedural language in [user-defined functions]({% link {{ page.version.version }}/user-defined-functions.md %}) and [stored procedures]({% link {{ page.version.version }}/stored-procedures.md %}) to improve performance and enable more complex queries.
[Node Map]({% link {{ page.version.version }}/enable-node-map.md %}) | Visualize the geographical distribution of a cluster by plotting its node localities on a world map.
[`VECTOR` type]({% link {{ page.version.version }}/vector.md %}) | Represent data points in multi-dimensional space, using fixed-length arrays of floating-point numbers.

## Recovery and streaming

6 changes: 6 additions & 0 deletions src/current/_includes/v24.2/sidebar-data/sql.json
@@ -1015,6 +1015,12 @@
"urls": [
"/${VERSION}/uuid.html"
]
},
{
"title": "<code>VECTOR</code>",
"urls": [
"/${VERSION}/vector.html"
]
}
]
},
1 change: 1 addition & 0 deletions src/current/v24.2/data-types.md
@@ -33,6 +33,7 @@ Type | Description | Example
[`TSQUERY`]({% link {{ page.version.version }}/tsquery.md %}) | A list of lexemes and operators used in [full-text search]({% link {{ page.version.version }}/full-text-search.md %}). | `'list' & 'lexem' & 'oper' & 'use' & 'full' & 'text' & 'search'`
[`TSVECTOR`]({% link {{ page.version.version }}/tsvector.md %}) | A list of lexemes with optional integer positions and weights used in [full-text search]({% link {{ page.version.version }}/full-text-search.md %}). | `'full':13 'integ':7 'lexem':4 'list':2 'option':6 'posit':8 'search':15 'text':14 'use':11 'weight':10`
[`UUID`]({% link {{ page.version.version }}/uuid.md %}) | A 128-bit hexadecimal value. | `7f9c24e8-3b12-4fef-91e0-56a2d5a246ec`
[`VECTOR`]({% link {{ page.version.version }}/vector.md %}) | A fixed-length array of floating-point numbers. | `[1.0, 0.0, 0.0]`

## Data type conversions and casts

94 changes: 94 additions & 0 deletions src/current/v24.2/vector.md
@@ -0,0 +1,94 @@
---
title: VECTOR
summary: The VECTOR data type stores fixed-length arrays of floating-point numbers, which represent data points in multi-dimensional space.
toc: true
docs_area: reference.sql
---

{% include enterprise-feature.md %}

{{site.data.alerts.callout_info}}
{% include feature-phases/preview.md %}
{{site.data.alerts.end}}

The `VECTOR` data type stores fixed-length arrays of floating-point numbers, which represent data points in multi-dimensional space. Vector search is common in AI applications, such as those built on large language models (LLMs), that rely on vector representations of data.

For details on the operators supported for `VECTOR` values, refer to [Syntax](#syntax). For the list of supported `VECTOR` functions, refer to [Functions and Operators]({% link {{ page.version.version }}/functions-and-operators.md %}#pgvector-functions).

{{site.data.alerts.callout_info}}
`VECTOR` functionality is compatible with the [`pgvector`](https://github.com/pgvector/pgvector) extension for PostgreSQL. Vector indexing is **not** supported at this time.
{{site.data.alerts.end}}

## Syntax

A `VECTOR` value is expressed as an [array]({% link {{ page.version.version }}/array.md %}) of [floating-point numbers]({% link {{ page.version.version }}/float.md %}) enclosed in square brackets. The number of elements in the array is the number of dimensions of the vector. For example, the following `VECTOR` has 3 dimensions:

~~~
[1.0, 0.0, 0.0]
~~~

You can specify the number of dimensions when defining a `VECTOR` column. Every value stored in the column must then have exactly that many dimensions. For example:

~~~ sql
ALTER TABLE foo ADD COLUMN bar VECTOR(3);
~~~
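To make the dimension check concrete, the following is a minimal plain-Python sketch, not CockroachDB's implementation, of how a `VECTOR(3)` column might validate its input. The `parse_vector` helper is hypothetical: it parses the bracketed text form shown above and rejects values whose element count differs from the declared number of dimensions.

~~~ python
from typing import List, Optional


def parse_vector(text: str, dims: Optional[int] = None) -> List[float]:
    """Hypothetical helper: parse a '[x, y, ...]' literal into floats.

    If dims is given, reject values that do not have exactly that many
    elements, mimicking the constraint a VECTOR(dims) column enforces.
    """
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError(f"malformed vector literal: {text!r}")
    # float() tolerates surrounding spaces and raises ValueError for
    # empty or non-numeric elements.
    values = [float(part) for part in body[1:-1].split(",")]
    if dims is not None and len(values) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(values)}")
    return values


print(parse_vector("[1.0, 0.0, 0.0]", dims=3))
try:
    parse_vector("[1.0, 0.0]", dims=3)
except ValueError as err:
    print("rejected:", err)
~~~

The error text in this sketch is illustrative only; CockroachDB reports its own error message when a value has the wrong number of dimensions.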

The following operators are supported for `VECTOR` values:

- `=` (equals). Compare vectors for equality in filtering and conditional queries.
- `<>` (not equal to). Compare vectors for inequality in filtering and conditional queries.
- `<->` (L2 distance). Calculate the Euclidean distance between two vectors, as used in [nearest neighbor search](https://en.wikipedia.org/wiki/Nearest_neighbor_search) and clustering algorithms.
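- `<#>` (negative inner product). Calculate the negative of the [inner product](https://en.wikipedia.org/wiki/Inner_product_space) of two vectors, as used in similarity searches where the inner product represents a similarity score. Because the result is negated, sorting in ascending order returns the vectors with the largest inner product first.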
- `<=>` (cosine distance). Calculate the [cosine distance](https://en.wikipedia.org/wiki/Cosine_similarity) between vectors, such as in text and image similarity measures where the orientation of vectors is more important than their magnitude.
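The arithmetic behind the three distance operators can be sketched in plain Python. This minimal model assumes the operators follow the standard `pgvector` definitions: `<->` is the Euclidean distance, `<#>` is the inner product multiplied by -1, and `<=>` is 1 minus the cosine similarity. The Python function names are illustrative, and returning NaN for a zero vector is a choice made for this sketch.

~~~ python
import math
from typing import Sequence


def _check_dims(a: Sequence[float], b: Sequence[float]) -> None:
    # Distance operators are only defined for vectors of equal dimension.
    if len(a) != len(b):
        raise ValueError(f"different vector dimensions {len(a)} and {len(b)}")


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Model of `a <-> b`: Euclidean (L2) distance."""
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Model of `a <#> b`: the inner product multiplied by -1."""
    _check_dims(a, b)
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Model of `a <=> b`: 1 minus the cosine similarity."""
    _check_dims(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norms == 0.0:
        # Cosine similarity is undefined for a zero vector; this sketch returns NaN.
        return float("nan")
    return 1.0 - dot / norms


a = [1.0, 0.0, 0.0]
b = [0.9, 0.1, 0.0]
print(l2_distance(a, b), negative_inner_product(a, b), cosine_distance(a, b))
# Scaling a vector changes its L2 distance but not its cosine distance.
print(l2_distance(a, [2.0, 0.0, 0.0]), cosine_distance(a, [2.0, 0.0, 0.0]))
~~~

The last line illustrates why cosine distance suits comparisons where orientation matters more than magnitude: doubling a vector moves it by an L2 distance of 1.0, while its cosine distance from the original stays 0.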

## Size

The size of a `VECTOR` value is variable, but we recommend keeping values under 1 MB to maintain performance. Above that threshold, [write amplification]({% link {{ page.version.version }}/architecture/storage-layer.md %}#write-amplification) and other factors may cause significant performance degradation.
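As a rough, back-of-the-envelope check, the sketch below estimates the raw payload of a `VECTOR` value. It assumes 4 bytes per dimension (single-precision floats, as `pgvector` uses), treats 1 MB as 1,048,576 bytes, and ignores any per-value encoding overhead, so CockroachDB's actual stored size will differ somewhat.

~~~ python
BYTES_PER_DIMENSION = 4  # assumption: single-precision (32-bit) floats
SIZE_GUIDELINE_BYTES = 1024 * 1024  # assumption: "1 MB" read as 1 MiB


def approx_vector_bytes(dims: int) -> int:
    """Approximate raw payload of one VECTOR value, ignoring encoding overhead."""
    return dims * BYTES_PER_DIMENSION


def max_dims_under_guideline(limit: int = SIZE_GUIDELINE_BYTES) -> int:
    """Largest dimension count whose raw payload stays within `limit` bytes."""
    return limit // BYTES_PER_DIMENSION


print(approx_vector_bytes(3))     # the 3-dimension examples on this page
print(approx_vector_bytes(1536))  # a hypothetical 1,536-dimension value
print(max_dims_under_guideline())
~~~

Under these assumptions, a 1,536-dimension value is about 6 KB, well below the 1 MB guideline.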

## Functions

For the list of supported `VECTOR` functions, refer to [Functions and Operators]({% link {{ page.version.version }}/functions-and-operators.md %}#pgvector-functions).

## Example

Create a table with a `VECTOR` column, specifying `3` dimensions:

{% include_cached copy-clipboard.html %}
~~~ sql
CREATE TABLE items (
    category STRING,
    vector VECTOR(3),
    INDEX (category)
);
~~~

Insert some sample data into the table:

{% include_cached copy-clipboard.html %}
~~~ sql
INSERT INTO items (category, vector) VALUES
    ('electronics', '[1.0, 0.0, 0.0]'),
    ('electronics', '[0.9, 0.1, 0.0]'),
    ('furniture', '[0.0, 1.0, 0.0]'),
    ('furniture', '[0.0, 0.9, 0.1]'),
    ('clothing', '[0.0, 0.0, 1.0]');
~~~

Use the [`<->` operator](#syntax) to sort rows in the `electronics` category by their Euclidean (L2) distance from `[1.0, 0.0, 0.0]`, so that the most similar vectors are returned first:

{% include_cached copy-clipboard.html %}
~~~ sql
SELECT category, vector FROM items WHERE category = 'electronics' ORDER BY vector <-> '[1.0, 0.0, 0.0]' LIMIT 5;
~~~

~~~
category | vector
--------------+--------------
electronics | [1,0,0]
electronics | [0.9,0.1,0]
~~~
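The following plain-Python sketch reproduces the query above over the same five sample rows to show why the results come out in this order. It models `WHERE category = 'electronics' ORDER BY vector <-> '[1.0, 0.0, 0.0]' LIMIT 5` as a filter, a sort by Euclidean distance, and a slice; it illustrates the semantics only, not how CockroachDB executes the query.

~~~ python
import math

# The sample rows inserted above, as (category, vector) pairs.
rows = [
    ("electronics", [1.0, 0.0, 0.0]),
    ("electronics", [0.9, 0.1, 0.0]),
    ("furniture", [0.0, 1.0, 0.0]),
    ("furniture", [0.0, 0.9, 0.1]),
    ("clothing", [0.0, 0.0, 1.0]),
]
query = [1.0, 0.0, 0.0]


def l2_distance(a, b):
    """Euclidean distance, as computed by the <-> operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# WHERE category = 'electronics' ORDER BY vector <-> query LIMIT 5
result = sorted(
    (row for row in rows if row[0] == "electronics"),
    key=lambda row: l2_distance(row[1], query),
)[:5]

for category, vector in result:
    print(category, vector, round(l2_distance(vector, query), 4))
~~~

`[1,0,0]` is at distance 0 from the query vector and `[0.9,0.1,0]` is at a distance of about 0.1414 (the square root of 0.02), so it sorts second. The `furniture` and `clothing` rows are excluded by the `WHERE` clause before sorting, which is why `LIMIT 5` returns only two rows.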

## See also

- [Functions and Operators]({% link {{ page.version.version }}/functions-and-operators.md %}#pgvector-functions)
- [Data Types]({% link {{ page.version.version }}/data-types.md %})
