Collection tag (#2266)

* feat: collection metadata filter (#2211)

* feat: add dataset collection tags (#2231)

* dataset page

* workflow page

* move

* fix

* add plus filter

* fix

* fix

* fix

* perf: collection tag code

* fix: collection tags (#2249)

* fix

* fix

* fix tags of dataset page

* fix tags of workflow page

* doc

* add comments

* fix: collection tags (#2264)

* fix: metadata filter

* feat: search filter

---------

Co-authored-by: heheer <[email protected]>
Co-authored-by: heheer <[email protected]>
3 people authored Aug 5, 2024
1 parent 56f6e69 commit fe71efb
Showing 46 changed files with 1,916 additions and 114 deletions.
Binary file added docSite/assets/imgs/collection-tags-1.png
Binary file added docSite/assets/imgs/collection-tags-2.png
Binary file added docSite/assets/imgs/collection-tags-3.png
2 changes: 1 addition & 1 deletion docSite/content/zh-cn/docs/course/chat_input_guide.md
@@ -4,7 +4,7 @@ description: "FastGPT 对话问题引导"
 icon: "code"
 draft: false
 toc: true
-weight: 350
+weight: 108
 ---

 ![](/imgs/questionGuide.png)
50 changes: 50 additions & 0 deletions docSite/content/zh-cn/docs/course/collection_tags.md
@@ -0,0 +1,50 @@
---
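title: "Knowledge Base Collection Tags"
description: "How to use FastGPT knowledge base collection tags"
icon: "developer_guide"
draft: false
toc: true
weight: 108
---

Knowledge base collection tags are a feature exclusive to the FastGPT commercial edition. They let you tag and categorize the data collections in a knowledge base so that its data is easier to manage.

In addition, you can add a collection filter when the knowledge base is searched during Q&A, which makes the search more precise.

| | | |
| --------------------- | --------------------- | --------------------- |
| ![](/imgs/collection-tags-1.png) | ![](/imgs/collection-tags-2.png) | ![](/imgs/collection-tags-3.png) |

## Basic tag operations

Tags are managed on the knowledge base detail page. The available operations are:

- Create a tag
- Rename a tag
- Delete a tag
- Assign one tag to multiple data collections
- Add multiple tags to one data collection

You can also use tags to filter the list of data collections.

## Knowledge base search: collection filter

When searching a knowledge base, you can fill in the "Collection filter" field to narrow the search by tag. An example of the expected format:

```json
{
  "tags": {
    "$and": ["Tag 1", "Tag 2"],
    "$or": ["When $and tags are present, $and takes effect and $or is ignored"]
  },
  "createTime": {
    "$gte": "Use the YYYY-MM-DD HH:mm format; matches collections created after this time",
    "$lte": "Use the YYYY-MM-DD HH:mm format; matches collections created before this time; can be combined with $gte"
  }
}
```

Two points to note when filling in this field:

- A tag value can be a tag name of type `string`, or `null`. `null` matches data collections that have no tags set.
- Tag filtering supports two condition types, `$and` and `$or`. If both are set, only `$and` takes effect.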
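
The filter format documented above can be illustrated with a small in-memory evaluator. This is only a sketch: the commit does not show how the commercial edition evaluates the filter, so the type names (`CollectionFilterMatch`, `CollectionLike`), the helper functions, the inclusive treatment of `$gte`/`$lte`, and the handling of an empty `$and` array as "not set" are all assumptions made for illustration.

```typescript
// Hypothetical types for the collection filter JSON; not taken from the FastGPT source.
type TagValue = string | null; // null stands for "collection has no tags"

interface CollectionFilterMatch {
  tags?: { $and?: TagValue[]; $or?: TagValue[] };
  createTime?: { $gte?: string; $lte?: string };
}

// Minimal in-memory stand-in for a dataset collection.
interface CollectionLike {
  id: string;
  tags: string[];
  createTime: Date;
}

// Parse "YYYY-MM-DD HH:mm" as local time by turning it into "YYYY-MM-DDTHH:mm".
function parseFilterTime(value: string): number {
  return new Date(value.trim().replace(" ", "T")).getTime();
}

function matchesFilter(col: CollectionLike, filter: CollectionFilterMatch): boolean {
  const hasTag = (tag: TagValue): boolean =>
    tag === null ? col.tags.length === 0 : col.tags.indexOf(tag) !== -1;

  const and = filter.tags?.$and ?? [];
  const or = filter.tags?.$or ?? [];
  // $and wins when both are given; $or is only consulted when $and is absent or empty (assumption).
  if (and.length > 0) {
    if (!and.every(hasTag)) return false;
  } else if (or.length > 0) {
    if (!or.some(hasTag)) return false;
  }

  const time = col.createTime.getTime();
  const gte = filter.createTime?.$gte;
  const lte = filter.createTime?.$lte;
  // Bounds are treated as inclusive here (assumption based on the operator names).
  if (gte !== undefined && time < parseFilterTime(gte)) return false;
  if (lte !== undefined && time > parseFilterTime(lte)) return false;
  return true;
}

// Return the ids of all collections that pass the filter.
function filterCollectionIds(cols: CollectionLike[], filter: CollectionFilterMatch): string[] {
  return cols.filter((col) => matchesFilter(col, filter)).map((col) => col.id);
}
```

With this sketch, a filter such as `{ tags: { $and: ["x", "y"], $or: ["x"] } }` selects only collections carrying both `x` and `y`, because the `$or` list is ignored once `$and` is present.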
17 changes: 17 additions & 0 deletions packages/global/core/dataset/api.d.ts
@@ -74,6 +74,23 @@ export type ExternalFileCreateDatasetCollectionParams = ApiCreateDatasetCollecti
filename?: string;
};

/* ================= tag ===================== */
export type CreateDatasetCollectionTagParams = {
datasetId: string;
tag: string;
};
export type AddTagsToCollectionsParams = {
originCollectionIds: string[];
collectionIds: string[];
datasetId: string;
tag: string;
};
export type UpdateDatasetCollectionTagParams = {
datasetId: string;
tagId: string;
tag: string;
};

/* ================= data ===================== */
export type PgSearchRawType = {
id: string;
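
`AddTagsToCollectionsParams` carries both `originCollectionIds` and `collectionIds`. The commit does not show how the server consumes them. One plausible reading, stated here purely as an assumption, is that the first list holds the collections that had the tag before the edit and the second holds the collections that should have it afterwards. Under that assumption, the changes to apply follow from a simple list difference. `diffTagAssignment` is a hypothetical helper, not part of the diff.

```typescript
// Copy of the type added in this diff, repeated so the sketch is self-contained.
type AddTagsToCollectionsParams = {
  originCollectionIds: string[];
  collectionIds: string[];
  datasetId: string;
  tag: string;
};

// Hypothetical helper: which collections gain the tag and which lose it.
function diffTagAssignment(params: AddTagsToCollectionsParams): { add: string[]; remove: string[] } {
  const before = params.originCollectionIds;
  const after = params.collectionIds;
  return {
    // In the new list but not in the old one: the tag must be added.
    add: after.filter((id) => before.indexOf(id) === -1),
    // In the old list but not in the new one: the tag must be removed.
    remove: before.filter((id) => after.indexOf(id) === -1)
  };
}
```

Sending both lists would let the server compute this difference without first loading the current assignments, though whether FastGPT does so is not visible in this commit.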
18 changes: 18 additions & 0 deletions packages/global/core/dataset/type.d.ts
@@ -69,6 +69,13 @@ export type DatasetCollectionSchemaType = {
};
};

export type DatasetCollectionTagsSchemaType = {
_id: string;
teamId: string;
datasetId: string;
tag: string;
};

export type DatasetDataIndexItemType = {
defaultIndex: boolean;
dataId: string; // pg data id
@@ -144,6 +151,17 @@ export type DatasetItemType = Omit<DatasetSchemaType, 'vectorModel' | 'agentMode
permission: DatasetPermission;
};

/* ================= tag ===================== */
export type DatasetTagType = {
_id: string;
tag: string;
};

export type TagUsageType = {
tagId: string;
collections: string[];
};

/* ================= collection ===================== */
export type DatasetCollectionItemType = CollectionWithDatasetType & {
sourceName: string;
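
The two tag types added above pair naturally: `DatasetTagType` names a tag and `TagUsageType` lists where a tag is used. As a rough sketch of how a client might combine them, the hypothetical helper below builds a lookup from collection to tag names. It assumes that `collections` holds collection ids, which the diff does not state explicitly.

```typescript
// Copies of the types added in this diff, repeated so the sketch is self-contained.
type DatasetTagType = { _id: string; tag: string };
type TagUsageType = { tagId: string; collections: string[] };

// Hypothetical helper: map each collection id to the names of the tags applied to it.
function tagsByCollection(
  tags: DatasetTagType[],
  usages: TagUsageType[]
): Record<string, string[]> {
  const nameById: Record<string, string> = {};
  for (const t of tags) nameById[t._id] = t.tag;

  const result: Record<string, string[]> = {};
  for (const usage of usages) {
    const tagName = nameById[usage.tagId];
    if (tagName === undefined) continue; // skip usages that reference an unknown tag
    for (const collectionId of usage.collections) {
      const names = result[collectionId] || [];
      names.push(tagName);
      result[collectionId] = names;
    }
  }
  return result;
}
```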
1 change: 1 addition & 0 deletions packages/global/core/workflow/constants.ts
@@ -85,6 +85,7 @@ export enum NodeInputKeyEnum {
datasetSearchUsingExtensionQuery = 'datasetSearchUsingExtensionQuery',
datasetSearchExtensionModel = 'datasetSearchExtensionModel',
datasetSearchExtensionBg = 'datasetSearchExtensionBg',
collectionFilterMatch = 'collectionFilterMatch',

// concat dataset
datasetQuoteList = 'system_datasetQuoteList',
19 changes: 19 additions & 0 deletions packages/global/core/workflow/template/system/datasetSearch.ts
@@ -90,6 +90,25 @@ export const DatasetSearchModule: FlowNodeTemplateType = {
{
...Input_Template_UserChatInput,
toolDescription: '需要检索的内容' // "The content to search for"
},
{
key: NodeInputKeyEnum.collectionFilterMatch,
renderTypeList: [FlowNodeInputTypeEnum.JSONEditor, FlowNodeInputTypeEnum.reference],
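label: '集合元数据过滤', // "Collection metadata filter"
valueType: WorkflowIOValueTypeEnum.object,
isPro: true,
// The description reads: "Filtering by tags and creation time is currently supported. Fill it in using the following format:"
description: `目前支持标签和创建时间过滤,需按照以下格式填写: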
{
"tags": {
"$and": ["标签 1","标签 2"],
"$or": ["有 $and 标签时,and 生效,or 不生效"]
},
"createTime": {
"$gte": "YYYY-MM-DD HH:mm 格式即可,集合的创建时间大于该时间",
"$lte": "YYYY-MM-DD HH:mm 格式即可,集合的创建时间小于该时间,可和 $gte 共同使用"
}
}
`
}
],
outputs: [
1 change: 1 addition & 0 deletions packages/global/core/workflow/type/io.d.ts
@@ -52,6 +52,7 @@ export type FlowNodeInputItemType = InputComponentPropsType & {

// render components params
canEdit?: boolean; // dynamic inputs
isPro?: boolean; // Pro version field
};

export type FlowNodeOutputItemType = {
4 changes: 1 addition & 3 deletions packages/service/common/vectorStore/controller.d.ts
@@ -26,9 +26,7 @@ export type EmbeddingRecallProps = {
   datasetIds: string[];

   forbidCollectionIdList: string[];
-  // forbidEmbIndexIdList: string[];
-  // similarity?: number;
-  // efSearch?: number;
+  filterCollectionIdList?: string[];
 };
 export type EmbeddingRecallCtrlProps = EmbeddingRecallProps & {
   vector: number[];
41 changes: 36 additions & 5 deletions packages/service/common/vectorStore/milvus/class.ts
@@ -213,19 +213,50 @@ export class MilvusCtrl {
   };
   embRecall = async (props: EmbeddingRecallCtrlProps): Promise<EmbeddingRecallResponse> => {
     const client = await this.getClient();
-    const { teamId, datasetIds, vector, limit, forbidCollectionIdList, retry = 2 } = props;

+    const {
+      teamId,
+      datasetIds,
+      vector,
+      limit,
+      forbidCollectionIdList,
+      filterCollectionIdList,
+      retry = 2
+    } = props;

+    // Forbid collection
+    const formatForbidCollectionIdList = (() => {
+      if (!filterCollectionIdList) return forbidCollectionIdList;
+      const list = forbidCollectionIdList
+        .map((id) => String(id))
+        .filter((id) => !filterCollectionIdList.includes(id));
+      return list;
+    })();
     const forbidColQuery =
-      forbidCollectionIdList.length > 0
-        ? `and (collectionId not in [${forbidCollectionIdList.map((id) => `"${String(id)}"`).join(',')}])`
+      formatForbidCollectionIdList.length > 0
+        ? `and (collectionId not in [${formatForbidCollectionIdList.map((id) => `"${id}"`).join(',')}])`
         : '';

+    // filter collection id
+    const formatFilterCollectionId = (() => {
+      if (!filterCollectionIdList) return;
+      return filterCollectionIdList
+        .map((id) => String(id))
+        .filter((id) => !forbidCollectionIdList.includes(id));
+    })();
+    const collectionIdQuery = formatFilterCollectionId
+      ? `and (collectionId in [${formatFilterCollectionId.map((id) => `"${id}"`)}])`
+      : ``;
+    // Empty data
+    if (formatFilterCollectionId && formatFilterCollectionId.length === 0) {
+      return { results: [] };
+    }

     try {
       const { results } = await client.search({
         collection_name: DatasetVectorTableName,
         data: vector,
         limit,
-        filter: `(teamId == "${teamId}") and (datasetId in [${datasetIds.map((id) => `"${String(id)}"`).join(',')}]) ${forbidColQuery}`,
+        filter: `(teamId == "${teamId}") and (datasetId in [${datasetIds.map((id) => `"${id}"`).join(',')}]) ${collectionIdQuery} ${forbidColQuery}`,
         output_fields: ['collectionId']
       });

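
The `embRecall` change above reconciles two lists before building the Milvus filter: ids named in `filterCollectionIdList` are dropped from the forbid list, and ids named in `forbidCollectionIdList` are dropped from the allow-list, with an early empty result when nothing is left to search. The sketch below restates that logic as a pure function so it can be exercised without a Milvus client. `RecallFilterInput` and `buildMilvusFilter` are hypothetical names, and returning `null` stands in for the early `{ results: [] }` return.

```typescript
// Hypothetical input shape; the field names mirror EmbeddingRecallProps from the diff.
interface RecallFilterInput {
  teamId: string;
  datasetIds: string[];
  forbidCollectionIdList: string[];
  filterCollectionIdList?: string[];
}

// Returns the Milvus boolean expression, or null when the allow-list ends up empty
// (the diff returns `{ results: [] }` in that case without querying).
function buildMilvusFilter(input: RecallFilterInput): string | null {
  const { teamId, datasetIds, forbidCollectionIdList, filterCollectionIdList } = input;
  const quote = (ids: string[]): string => ids.map((id) => `"${id}"`).join(",");

  // Ids named in the allow-list are dropped from the forbid list.
  const forbid = filterCollectionIdList
    ? forbidCollectionIdList.filter((id) => filterCollectionIdList.indexOf(id) === -1)
    : forbidCollectionIdList;
  // Ids named in the original forbid list are dropped from the allow-list.
  const allow = filterCollectionIdList
    ? filterCollectionIdList.filter((id) => forbidCollectionIdList.indexOf(id) === -1)
    : undefined;

  if (allow && allow.length === 0) return null;

  const clauses = [
    `(teamId == "${teamId}")`,
    `and (datasetId in [${quote(datasetIds)}])`,
    allow ? `and (collectionId in [${quote(allow)}])` : "",
    forbid.length > 0 ? `and (collectionId not in [${quote(forbid)}])` : ""
  ];
  return clauses.filter((clause) => clause !== "").join(" ");
}
```

For example, with `forbidCollectionIdList = ["c1", "c2"]` and `filterCollectionIdList = ["c2", "c3"]`, the id `c2` is removed from both lists, so the expression keeps `collectionId in ["c3"]` and `collectionId not in ["c1"]`. An id that appears in both lists is therefore still excluded from the search.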
45 changes: 38 additions & 7 deletions packages/service/common/vectorStore/pg/class.ts
@@ -119,14 +119,44 @@ export class PgVectorCtrl {
     }
   };
   embRecall = async (props: EmbeddingRecallCtrlProps): Promise<EmbeddingRecallResponse> => {
-    const { teamId, datasetIds, vector, limit, forbidCollectionIdList, retry = 2 } = props;

+    const {
+      teamId,
+      datasetIds,
+      vector,
+      limit,
+      forbidCollectionIdList,
+      filterCollectionIdList,
+      retry = 2
+    } = props;

+    // Get forbid collection
+    const formatForbidCollectionIdList = (() => {
+      if (!filterCollectionIdList) return forbidCollectionIdList;
+      const list = forbidCollectionIdList
+        .map((id) => String(id))
+        .filter((id) => !filterCollectionIdList.includes(id));
+      return list;
+    })();
     const forbidCollectionSql =
-      forbidCollectionIdList.length > 0
-        ? `AND collection_id NOT IN (${forbidCollectionIdList.map((id) => `'${String(id)}'`).join(',')})`
-        : 'AND collection_id IS NOT NULL';
-    // const forbidDataSql =
-    //   forbidEmbIndexIdList.length > 0 ? `AND id NOT IN (${forbidEmbIndexIdList.join(',')})` : '';
+      formatForbidCollectionIdList.length > 0
+        ? `AND collection_id NOT IN (${formatForbidCollectionIdList.map((id) => `'${id}'`).join(',')})`
+        : '';

+    // Filter by collectionId
+    const formatFilterCollectionId = (() => {
+      if (!filterCollectionIdList) return;

+      return filterCollectionIdList
+        .map((id) => String(id))
+        .filter((id) => !forbidCollectionIdList.includes(id));
+    })();
+    const filterCollectionIdSql = formatFilterCollectionId
+      ? `AND collection_id IN (${formatFilterCollectionId.map((id) => `'${id}'`).join(',')})`
+      : '';
+    // Empty data
+    if (formatFilterCollectionId && formatFilterCollectionId.length === 0) {
+      return { results: [] };
+    }

     try {
       // const explan: any = await PgClient.query(
@@ -150,6 +180,7 @@ export class PgVectorCtrl {
           from ${DatasetVectorTableName}
           where team_id='${teamId}'
             AND dataset_id IN (${datasetIds.map((id) => `'${String(id)}'`).join(',')})
+            ${filterCollectionIdSql}
             ${forbidCollectionSql}
           order by score limit ${limit};
         COMMIT;`
6 changes: 4 additions & 2 deletions packages/service/core/dataset/collection/schema.ts
@@ -106,8 +106,10 @@ try {
     updateTime: -1
   });

-  // get forbid
-  // DatasetCollectionSchema.index({ teamId: 1, datasetId: 1, forbid: 1 });
+  // Tag filter
+  DatasetCollectionSchema.index({ teamId: 1, datasetId: 1, tags: 1 });
+  // create time filter
+  DatasetCollectionSchema.index({ teamId: 1, datasetId: 1, createTime: 1 });
 } catch (error) {
   console.log(error);
 }