docs: Rephrase README.md
Anush008 committed Jun 25, 2024
1 parent 12c8f88 commit cafb26d
Showing 10 changed files with 107 additions and 188 deletions.
31 changes: 31 additions & 0 deletions CONFLUENT.md
@@ -0,0 +1,31 @@
# Usage with Confluent Cloud

## Installation

1) Download the latest connector zip file from [GitHub Releases](https://github.com/qdrant/qdrant-kafka/releases).

2) Set up an environment and cluster on Confluent Cloud, and create a topic to produce messages to.

3) Navigate to the `Connectors` section of the Confluent cluster and click `Add Plugin`. Upload the zip file with the following information:

<img width="687" alt="Screenshot 2024-06-26 at 1 51 26 AM" src="https://github.com/qdrant/qdrant-kafka/assets/46051506/876bcef5-d862-40c6-a0e7-838f1586f222">

4) Once the plugin is installed, navigate to the connector and set the following configuration values.

<img width="899" alt="Screenshot 2024-06-26 at 1 45 57 AM" src="https://github.com/qdrant/qdrant-kafka/assets/46051506/3999976e-a89a-4a49-b53c-a2e5aee68441">

Replace the placeholder values with your credentials.
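The values to set mirror the self-hosted setup. A minimal sketch, assuming the connector exposes the same configuration keys as the `qdrant-kafka.properties` example in [KAFKA.md](https://github.com/qdrant/qdrant-kafka/blob/main/KAFKA.md):

```properties
# Illustrative values only; substitute your Qdrant Cloud URL and API key.
qdrant.grpc.url=https://xyz-example.eu-central.aws.cloud.qdrant.io:6334
qdrant.api.key=<paste-your-api-key-here>
topics=topic_0
```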

5) Add the Qdrant instance host to the allowed networking endpoints.

<img width="764" alt="Screenshot 2024-06-26 at 2 46 16 AM" src="https://github.com/qdrant/qdrant-kafka/assets/46051506/8aefd9c3-0584-4aa5-a70c-37c859f6ee1b">

6) Start the connector.

## Usage

You can now produce messages to the configured topic, and they'll be written to the configured Qdrant instance.

<img width="1271" alt="Screenshot 2024-06-26 at 2 50 56 AM" src="https://github.com/qdrant/qdrant-kafka/assets/46051506/3d798780-f236-4ac6-aea0-2b266dda4dba">

Refer to the [message formats](https://github.com/qdrant/qdrant-kafka/blob/main/README.md#message-formats) for the available options when producing messages.
44 changes: 44 additions & 0 deletions KAFKA.md
@@ -0,0 +1,44 @@
# Usage with Self-Hosted Kafka

## Installation

1) Download the latest connector zip file from [GitHub Releases](https://github.com/qdrant/qdrant-kafka/releases).

2) Refer to the first 3 steps of the [Kafka Quickstart](https://kafka.apache.org/quickstart#quickstart_download) to set up a local Kafka instance and create a topic named `topic_0`.
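For example, once the broker is running, the topic can be created with the script bundled in the Kafka distribution:

```sh
bin/kafka-topics.sh --create --topic topic_0 --bootstrap-server localhost:9092
```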

3) Navigate to the Kafka installation directory.

4) Unzip and copy the `qdrant-kafka-xxx` directory to your Kafka installation's `libs` directory.

5) Update the `connect-standalone.properties` file in your Kafka installation's `config` directory.

```properties
key.converter.schemas.enable=false
value.converter.schemas.enable=false
plugin.path=libs/qdrant-kafka-xxx
```

6) Create a `qdrant-kafka.properties` file in your Kafka installation's `config` directory.
```properties
name=qdrant-kafka
connector.class=io.qdrant.kafka.QdrantSinkConnector
qdrant.grpc.url=https://xyz-example.eu-central.aws.cloud.qdrant.io:6334
qdrant.api.key=<paste-your-api-key-here>
topics=topic_0
```
7) Start Kafka Connect with the configured properties.
```sh
bin/connect-standalone.sh config/connect-standalone.properties config/qdrant-kafka.properties
```
8) You can now produce messages to the `topic_0` topic, and they'll be written to the configured Qdrant instance.

```sh
bin/kafka-console-producer.sh --topic topic_0 --bootstrap-server localhost:9092
> { "collection_name": "{collection_name}", "id": 1, "vector": [ 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 ], "payload": { "name": "kafka", "description": "Kafka is a distributed streaming platform", "url": "https://kafka.apache.org/" } }
```
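To verify the writes, you can fetch the collection info from Qdrant. A sketch, assuming the instance's REST API is reachable on port 6333; `points_count` should grow as messages arrive:

```sh
# Illustrative check; replace the host, API key, and collection name with your own.
curl -H "api-key: <paste-your-api-key-here>" \
  "https://xyz-example.eu-central.aws.cloud.qdrant.io:6333/collections/{collection_name}"
```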

Refer to the [message formats](https://github.com/qdrant/qdrant-kafka/blob/main/README.md#message-formats) for the available options when producing messages.
69 changes: 23 additions & 46 deletions README.md
@@ -1,60 +1,29 @@
# Qdrant Kafka Connector
# Qdrant Connector with Self-Hosted Kafka

Use Qdrant as a sink destination in [Kafka connect](https://docs.confluent.io/platform/current/connect/index.html). Supports streaming dense/sparse vectors into Qdrant collections.
Use Qdrant as a sink destination in [Kafka Connect](https://docs.confluent.io/platform/current/connect/index.html). Supports streaming dense/sparse vectors into Qdrant collections.

## Installation

- Download the latest connector zip file from [GitHub Releases](https://github.com/qdrant/qdrant-kafka/releases).

- Refer to the first 3 steps of the [Kafka Quickstart](https://kafka.apache.org/quickstart#quickstart_download) to set up a local Kafka instance and create a topic named `topic_0`.

- Navigate to the Kafka installation directory.

- Unzip and copy the `qdrant-kafka-xxx` directories to the `libs` directory of your Kafka installation.

- Update the `connect-standalone.properties` file in the `config` directory of your Kafka installation.

```properties
key.converter.schemas.enable=false
value.converter.schemas.enable=false
plugin.path=libs/qdrant-kafka-xxx
```

- Create a `qdrant-kafka.properties` file in the `config` directory of your Kafka installation.

```properties
name=qdrant-kafka
connector.class=io.qdrant.kafka.QdrantSinkConnector
qdrant.grpc.url=https://xyz-example.eu-central.aws.cloud.qdrant.io:6334
qdrant.api.key=<paste-your-api-key-here>
topics=topic_0
```

- Start the connector with the configured properties.

```sh
bin/connect-standalone.sh config/connect-standalone.properties config/qdrant-kafka.properties
```

## Usage

> [!IMPORTANT]
> Before loading the data using this connector, a collection has to be [created](https://qdrant.tech/documentation/concepts/collections/#create-a-collection) in advance with the appropriate vector dimensions and configurations.
> Qdrant collections have to be [created](https://qdrant.tech/documentation/concepts/collections/#create-a-collection) in advance with the appropriate vector dimensions and configurations.
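For instance, a collection matching the 8-dimensional sample vectors used in these examples could be created through Qdrant's REST API. A sketch; adjust the host, vector size, and distance metric to your data:

```sh
curl -X PUT "http://localhost:6333/collections/{collection_name}" \
  -H "Content-Type: application/json" \
  -d '{"vectors": {"size": 8, "distance": "Cosine"}}'
```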
You can now produce messages with the following command to the `topic_0` topic you created and they'll be streamed to the configured Qdrant instance.
Learn to use the connector with:

```sh
bin/kafka-console-producer.sh --topic topic_0 --bootstrap-server localhost:9092
> { "collection_name": "{collection_name}", "id": 1, "vector": [ 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 ], "payload": { "name": "kafka", "description": "Kafka is a distributed streaming platform", "url": "https://kafka.apache.org/" } }
```
- [Kafka on Confluent Cloud](https://github.com/qdrant/qdrant-kafka/blob/main/CONFLUENT.md)

- [Self-hosted Kafka](https://github.com/qdrant/qdrant-kafka/blob/main/KAFKA.md)

## Message Formats

This sink connector supports ingesting multiple named/unnamed, dense/sparse vectors.
This sink connector supports messages with multiple dense/sparse vectors.

_Click each to expand._

<details>
<summary><b>Unnamed/Default vector</b></summary>

Reference: [Creating a collection with a default vector](https://qdrant.tech/documentation/concepts/collections/#create-a-collection).

```json
{
"collection_name": "{collection_name}",
@@ -80,7 +49,9 @@ _Click each to expand._
</details>
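For reference, a complete unnamed-vector message has this shape (a sketch with illustrative values; the `id` may be an unsigned integer or a UUID string):

```json
{
  "collection_name": "{collection_name}",
  "id": 1,
  "vector": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
  "payload": {
    "name": "kafka",
    "url": "https://kafka.apache.org/"
  }
}
```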

<details>
<summary><b>Named vector</b></summary>
<summary><b>Named multiple vectors</b></summary>

Reference: [Creating a collection with multiple vectors](https://qdrant.tech/documentation/concepts/collections/#collection-with-multiple-vectors).

```json
{
@@ -121,11 +92,12 @@ _Click each to expand._
<details>
<summary><b>Sparse vectors</b></summary>

Reference: [Creating a collection with sparse vectors](https://qdrant.tech/documentation/concepts/collections/#collection-with-sparse-vectors).

```json
{
"collection_name": "{collection_name}",
"id": 1,
"shard_key_selector": [5235],
"vector": {
"some-sparse": {
"indices": [
@@ -167,11 +139,16 @@ _Click each to expand._
<details>
<summary><b>Combination of named dense and sparse vectors</b></summary>

Reference:

- [Creating a collection with multiple vectors](https://qdrant.tech/documentation/concepts/collections/#collection-with-multiple-vectors).

- [Creating a collection with sparse vectors](https://qdrant.tech/documentation/concepts/collections/#collection-with-sparse-vectors).

```json
{
"collection_name": "{collection_name}",
"id": "a10435b5-2a58-427a-a3a0-a5d845b147b7",
"shard_key_selector": ["some-key"],
"vector": {
"some-other-dense": [
0.1,
6 changes: 3 additions & 3 deletions archive/manifest.json
@@ -1,8 +1,8 @@
{
"name": "qdrant-kafka",
"version": "${project.version}",
"title": "Qdrant Sink Connector for Apache Kafka",
"description": "The official Kafka Sink Connector for Qdrant.",
"title": "Qdrant Connector for Apache Kafka",
"description": "Connector to use Qdrant as a sink destination in Kafka Connect.",
"owner": {
"username": "qdrant",
"name": "Qdrant",
@@ -36,7 +36,7 @@
},
"logo": "assets/qdrant_logo.png",
"documentation_url": "https://github.com/qdrant/qdrant-kafka/blob/main/README.md",
"source_url": "https://github.com/qdrant/qdrant-kafka/tree/main",
"source_url": "https://github.com/qdrant/qdrant-kafka/",
"docker_image": {},
"license": [
{
4 changes: 3 additions & 1 deletion build.gradle
@@ -45,8 +45,9 @@ dependencies {
implementation "org.apache.kafka:connect-api:$kafkaVersion"
implementation 'io.qdrant:client:1.9.1'
implementation 'io.grpc:grpc-protobuf:1.59.0'
implementation "io.grpc:grpc-netty-shaded:1.59.0"
implementation 'com.google.guava:guava:33.2.1-jre'
implementation 'com.fasterxml.jackson.core:jackson-databind:2.17.1'
implementation 'com.fasterxml.jackson.core:jackson-databind:2.14.2'
implementation 'com.google.protobuf:protobuf-java-util:3.25.3'
implementation 'org.slf4j:slf4j-api:2.0.13'

@@ -85,6 +86,7 @@ spotless {
}

shadowJar {
relocate 'io.grpc', 'shadow.grpc'
mergeServiceFiles()
archiveClassifier.set('')
}
48 changes: 0 additions & 48 deletions message_samples/combination.json

This file was deleted.

38 changes: 0 additions & 38 deletions message_samples/named_sparse_vector.json

This file was deleted.

31 changes: 0 additions & 31 deletions message_samples/named_vector.json

This file was deleted.

20 changes: 0 additions & 20 deletions message_samples/unnamed_vector.json

This file was deleted.

@@ -1,6 +1,7 @@
/* (C)2024 */
package io.qdrant.kafka;

import java.util.UUID;
import org.junit.jupiter.api.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@@ -54,7 +55,8 @@ public void testSparseVector() throws Exception {
int sparseVecCount = randomPositiveInt(100);

for (int i = 0; i < pointsCount; i++) {
writeSparseVector(sparseVecCollection, i, sparseVecName, sparseVecCount);
String uuid = UUID.randomUUID().toString();
writeSparseVector(sparseVecCollection, uuid, sparseVecName, sparseVecCount);
}

waitForPoints(sparseVecCollection, pointsCount);
