Scope RAG embeddings by document ownership #77

Merged: rktclgh merged 3 commits into develop from issue/be-rag-scoped-documents on May 28, 2026
Conversation

rktclgh (Owner) commented May 28, 2026

Summary

  • split personal reservation RAG data into owner-scoped reservation_* documents
  • add doc_type/visibility/owner/event/reservation metadata to pgvector chunks and scoped search helpers
  • route personal chat to user-private docs and event-info chat to event-first public search
  • trigger reservation reindex on reservation create/update/cancel
  • keep /api/rag/search/test public-only to avoid leaking private chunks

Tests

  • ./gradlew test --no-daemon

Co-authored-by: OmX <omx@oh-my-codex.dev>

Summary by CodeRabbit

Release Notes

  • New Features

    • Added document visibility control enabling public and private search result filtering
    • Included user reservation information in searchable documents for comprehensive context
    • Implemented smart search routing that prioritizes event-related queries
  • Improvements

    • Enhanced document chunking to better preserve content structure and formatting
    • Improved search accuracy through document type-based filtering capabilities

Reservations were embedded inside broad user documents, so private booking questions had weak retrieval boundaries and event questions could be polluted by booth/user chunks. This splits reservation history into owner-scoped reservation documents, keeps public event documents first for event-info questions, and carries scope metadata into pgvector rows.

Constraint: Personal reservation answers must only search the authenticated user's own private documents

Constraint: Event questions need event documents to rank ahead of booth documents that repeat event names

Rejected: Keep reservation snippets inside user_* documents | mixed private documents made per-reservation refresh and retrieval precision too coarse

Confidence: high

Scope-risk: moderate

Directive: Do not add new RAG document types without setting doc_type, visibility, and owner/event metadata on every chunk

Tested: ./gradlew test --no-daemon

Co-authored-by: OmX <omx@oh-my-codex.dev>
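
The directive above (every chunk must carry doc_type, visibility, and owner metadata) could be enforced with a small guard at ingest time. The following is a minimal sketch, not the PR's code: `ChunkScopeValidator` and its `validate` method are hypothetical names, and only the `PUBLIC` and `USER_PRIVATE` visibility values are taken from this PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical guard; it does not mirror the real Chunk class in this PR. */
final class ChunkScopeValidator {

    /** Returns a list of problems; an empty list means the scope metadata is usable. */
    static List<String> validate(String docType, String visibility, Long ownerUserId) {
        List<String> problems = new ArrayList<>();
        if (isEmpty(docType)) {
            problems.add("doc_type is missing");
        }
        // Only the two visibility values that appear in this PR are accepted.
        if (!"PUBLIC".equals(visibility) && !"USER_PRIVATE".equals(visibility)) {
            problems.add("visibility must be PUBLIC or USER_PRIVATE");
        }
        // A private chunk without an owner can never match an owner-scoped search.
        if ("USER_PRIVATE".equals(visibility) && ownerUserId == null) {
            problems.add("USER_PRIVATE chunks need an owner_user_id");
        }
        return problems;
    }

    private static boolean isEmpty(String value) {
        return value == null || value.trim().isEmpty();
    }
}
```

Rejecting a chunk with problems, rather than defaulting its scope, keeps a mis-tagged private document from being stored as public.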
@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@coderabbitai

coderabbitai Bot commented May 28, 2026

Warning

Review limit reached

@rktclgh, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 46 minutes and 31 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9246dc0a-6d7f-428e-afaf-12741430a3cc

📥 Commits

Reviewing files that changed from the base of the PR and between 6ec586f and 7a04740.

📒 Files selected for processing (3)
  • src/main/java/com/fairing/fairplay/ai/orchestrator/AiChatOrchestrator.java
  • src/main/java/com/fairing/fairplay/ai/rag/service/RagChatService.java
  • src/test/java/com/fairing/fairplay/ai/rag/service/RagChatServiceScopeTest.java
Walkthrough

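This PR adds user reservation documents to the RAG system and implements metadata-based document scoping and query-aware search routing. The schema, domain models, search infrastructure, document generation, service routing, and event-driven indexing work together so that private reservation data and public event documents are each retrieved in the appropriate scope.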

Changes

User Reservation RAG Documents with Type-Based Scoping

Layer / File(s) Summary
Domain Models and Schema Initialization
src/main/java/com/fairing/fairplay/ai/rag/domain/Chunk.java, src/main/java/com/fairing/fairplay/ai/rag/domain/Document.java, src/main/java/com/fairing/fairplay/ai/rag/event/RagDocumentType.java, src/main/java/com/fairing/fairplay/ai/rag/config/RagPgVectorSchemaInitializer.java
Adds docType, visibility, ownerUserId, eventId, boothId, and reservationId metadata fields to Chunk and Document; adds the USER_RESERVATION enum value to RagDocumentType; adds metadata columns to the rag_chunks table and backfills existing rows based on doc_id patterns (indexes included).
Type-Filtered Search Repository Layer
src/main/java/com/fairing/fairplay/ai/rag/repository/RagChunkRepository.java, src/main/java/com/fairing/fairplay/ai/rag/repository/PgVectorRagRepository.java
Adds searchPublicSimilarByTypes, searchUserSimilarByTypes, searchPublicKeywordByTypes, and searchUserKeywordByTypes to the repository interface; the implementation applies visibility='PUBLIC' or owner_user_id = ? scoping together with a dynamic doc_type IN (...) filter; save and read queries now include the metadata columns.
Document Chunking and Ingestion
src/main/java/com/fairing/fairplay/ai/rag/service/ChunkingService.java, src/main/java/com/fairing/fairplay/ai/rag/service/DocumentIngestService.java
Adds a ChunkingService overload that chunks from a Document object; propagates docType, visibility, and owner/related ID metadata to the generated chunks; DocumentIngestService now calls chunking with the full Document object.
Reservation Document Indexing
src/main/java/com/fairing/fairplay/ai/rag/service/ComprehensiveRagDataLoader.java, src/main/java/com/fairing/fairplay/reservation/repository/ReservationRepository.java
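Implements reservation document generation and batch loading in ComprehensiveRagDataLoader; buildReservationDocument(Reservation) builds a document that includes reservation metadata (schedule, ticket, status, price); adds ReservationRepository.findAllForRag() to load relations with fetch joins; adds the public method loadSingleReservation(Long) for reindexing a single reservation.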
Vector Search Service Routing
src/main/java/com/fairing/fairplay/ai/rag/service/VectorSearchService.java
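searchUserData now delegates to searchUserPrivate; adds searchPublicEventsFirst for event-first public search (merging in a searchPublicOnly fallback when results fall short); adds the mergeByPriority helper, which removes duplicates and then merges by priority; defines the PUBLIC_EVENT_TYPES and USER_PRIVATE_TYPES type constants.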
RAG Indexing Event Publishing and Routing
src/main/java/com/fairing/fairplay/ai/rag/service/RagIndexingEventPublisher.java, src/main/java/com/fairing/fairplay/ai/rag/service/RagReindexEventListener.java
Adds reservationChanged(reservationId) and reservationDeleted(reservationId) to RagIndexingEventPublisher; RagReindexEventListener routes USER_RESERVATION events to loadSingleReservation() or deleteDocument("reservation_" + id).
Chat Service and API Integration
src/main/java/com/fairing/fairplay/ai/rag/service/RagChatService.java, src/main/java/com/fairing/fairplay/ai/rag/controller/RagController.java
RagChatService uses a combination of searchUserPrivate and searchPublicEventsFirst for personal-information queries; adds the isEventInformationQuery(String) helper to detect event-information queries; RagController.testSearch now calls searchPublicOnly and adds a docId field to the response.
Reservation Service RAG Event Publishing
src/main/java/com/fairing/fairplay/reservation/service/ReservationService.java
Changes the RAG indexing event published in the reservation create/update/cancel flows from userDataChanged(userId) to reservationChanged(reservationId) (5 call sites).
Comprehensive Test Coverage
src/test/java/com/fairing/fairplay/ai/rag/service/ChunkingServiceTest.java, src/test/java/com/fairing/fairplay/ai/rag/service/ComprehensiveRagDataLoaderUserScopeTest.java, src/test/java/com/fairing/fairplay/ai/rag/service/DocumentIngestServiceTest.java, src/test/java/com/fairing/fairplay/ai/rag/service/RagChatServiceScopeTest.java, src/test/java/com/fairing/fairplay/ai/rag/service/RagLoadJobServiceTest.java, src/test/java/com/fairing/fairplay/ai/rag/service/RagReindexEventListenerTest.java, src/test/java/com/fairing/fairplay/reservation/service/ReservationServiceAuthorizationTest.java
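Verifies that chunking preserves section boundaries; verifies reservation document generation and metadata propagation; verifies that document scope metadata propagates to chunks; verifies routing of event-information and personal-reservation queries and which search methods are invoked; verifies USER_RESERVATION event routing; verifies reservation service event publishing.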
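
The "Vector Search Service Routing" row mentions a mergeByPriority helper that deduplicates and merges results by priority. As a rough illustration of that idea (not the PR's implementation), the sketch below merges event-first hits with fallback hits. It assumes results are identified by a string chunk id and that a result limit is passed in; both are assumptions, since the real signature is not shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: the chunk-id strings and the limit parameter are assumptions. */
final class PriorityMergeSketch {

    /** Keeps primary hits first, appends unseen fallback hits, and stops at the limit. */
    static List<String> mergeByPriority(List<String> primary, List<String> fallback, int limit) {
        // LinkedHashSet drops duplicates while preserving insertion (priority) order.
        Set<String> merged = new LinkedHashSet<>();
        for (String id : primary) {
            if (merged.size() < limit) merged.add(id);
        }
        for (String id : fallback) {
            if (merged.size() < limit) merged.add(id);
        }
        return new ArrayList<>(merged);
    }
}
```

Because primary hits are inserted first, a booth chunk that repeats an event name can only fill slots the event documents left empty.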

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • rktclgh/FairPlay_BE#29: The metadata columns and indexes this PR adds to RagPgVectorSchemaInitializer extend the pgvector schema initialization introduced in PR #29.
  • rktclgh/FairPlay_BE#66: The reservation document routing in RagIndexingEventPublisher/RagReindexEventListener builds on the RAG event publishing and handling infrastructure fleshed out in PR #66.

Poem

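🐰 The day reservations become documents and search grows smarter
Holding the boundary that separates public from private
Events and reservations dance together
Type-based filtering brings tailor-made answers as a gift! 🎁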

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 15.45% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly and concisely describes the main change: scoping RAG embeddings by document ownership.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.


rktclgh and others added 2 commits May 28, 2026 15:33
The chatbot previously allowed meta-requests to flow into the model, so users could ask for hidden prompts or server resource analysis and receive fabricated or sensitive-looking answers. Add a pre-LLM safety boundary for prompt/system/server-secret requests and reinforce both RAG and fallback system prompts.

Constraint: Chatbot answers must stay within public FairPlay event data and the authenticated user's own reservation data

Rejected: Prompt-only hardening | direct blocking prevents risky requests from reaching search or LLM paths

Confidence: high

Scope-risk: narrow

Directive: Keep safety-boundary requests out of LLM calls; do not rely only on wording inside the system prompt

Tested: ./gradlew test --tests com.fairing.fairplay.ai.rag.service.RagChatServiceScopeTest --no-daemon

Tested: ./gradlew test --no-daemon

Co-authored-by: OmX <omx@oh-my-codex.dev>
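
The pre-LLM safety boundary described in this commit can be pictured as a check that runs before any search or model call. This is a minimal sketch under stated assumptions: the class name, the marker phrases, and the refusal text are all hypothetical, and the real detection rules in RagChatService are not shown in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical input guard; marker phrases and refusal text are illustrative. */
final class InputSafetyBoundary {

    private static final List<String> BLOCKED_MARKERS = Arrays.asList(
            "system prompt", "hidden prompt", "environment variable", "api key", "server resource");

    static final String REFUSAL =
            "I can only help with FairPlay events and your own reservations.";

    /** Returns a canned refusal when the question targets prompts or server internals. */
    static Optional<String> intercept(String question) {
        if (question == null) return Optional.empty();
        String normalized = question.toLowerCase(Locale.ROOT);
        for (String marker : BLOCKED_MARKERS) {
            if (normalized.contains(marker)) {
                // Short-circuit here so the request never reaches search or the LLM.
                return Optional.of(REFUSAL);
            }
        }
        return Optional.empty();
    }
}
```

A caller would return the refusal immediately when `intercept` yields a value and continue to retrieval only when it is empty, which matches the directive to keep such requests out of LLM calls.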
The local model should still answer FairPlay-domain help questions when a precise RAG match is unavailable, but it must never leak server resources, prompts, environment values, or secrets. Keep direct safety-boundary input blocking, allow constrained FairPlay-only fallback prompts, and add output inspection that replaces sensitive-looking responses before they reach users.

Constraint: Non-FairPlay questions remain out of scope

Constraint: Server resources and secrets must not leave the AI chat path even when produced by the local model

Rejected: Refuse every no-context question | too conservative for normal FairPlay usage guidance

Rejected: Prompt-only protection | unsafe model output needs a final response gate

Confidence: high

Scope-risk: narrow

Directive: Preserve both input and output guards when changing chat fallback behavior

Tested: ./gradlew test --tests com.fairing.fairplay.ai.rag.service.RagChatServiceScopeTest --no-daemon

Tested: ./gradlew test --no-daemon

Co-authored-by: OmX <omx@oh-my-codex.dev>
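
The output inspection described in this commit acts as a final gate on whatever the local model produced. The sketch below shows one way such a gate might look; the regular expression, class name, and replacement message are assumptions for illustration and are not taken from the PR.

```java
import java.util.regex.Pattern;

/** Hypothetical response gate; the pattern and replacement text are illustrative. */
final class OutputSafetyGate {

    // Matches "key = value" style fragments for a few secret-looking names.
    private static final Pattern SENSITIVE = Pattern.compile(
            "(?i)(api[_-]?key|password|secret|token)\\s*[:=]\\s*\\S+");

    static final String SAFE_REPLACEMENT =
            "Sorry, I can't share that. Please ask about FairPlay events or your reservations.";

    /** Replaces the whole response when it looks like it contains a secret. */
    static String inspect(String modelOutput) {
        if (modelOutput == null) return SAFE_REPLACEMENT;
        return SENSITIVE.matcher(modelOutput).find() ? SAFE_REPLACEMENT : modelOutput;
    }
}
```

Replacing the entire response, rather than redacting the matched fragment, avoids leaking surrounding context at the cost of occasional false positives.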

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/main/java/com/fairing/fairplay/ai/rag/controller/RagController.java (2)

193-198: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

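The topK parameter is not applied to the actual search.

safeTopK is computed, but searchPublicOnly(query) always uses the service's default, so the topK reported in the response can easily diverge from the number of results actually retrieved. If this parameter is going to stay, add a topK overload to the service and pass the value through from here.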

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/java/com/fairing/fairplay/ai/rag/controller/RagController.java`
around lines 193 - 198, testSearch computes safeTopK but never passes it to the
vector search, causing mismatch; update usage so the controller calls a service
method that accepts topK (e.g., add or use
vectorSearchService.searchPublicOnly(String query, int topK)) and pass safeTopK
from RagController.testSearch, or add an overload to VectorSearchService that
accepts the topK parameter and forwards it to the underlying search
implementation; ensure the service method and the controller parameter name
safeTopK are used so the returned result count matches the requested topK.
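
To make the requested change concrete, here is a minimal sketch of clamping topK and forwarding it to a search overload. The default of 5, the cap of 20, and the in-memory list standing in for the vector search are all assumptions; the real bounds and the body of searchPublicOnly are not shown in this review.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: the bounds and the list-based "search" are assumptions. */
final class TopKSketch {

    static final int DEFAULT_TOP_K = 5;  // assumed service default
    static final int MAX_TOP_K = 20;     // assumed upper bound

    /** Clamps a caller-supplied topK into [1, MAX_TOP_K]; null falls back to the default. */
    static int safeTopK(Integer requested) {
        if (requested == null) return DEFAULT_TOP_K;
        return Math.max(1, Math.min(requested, MAX_TOP_K));
    }

    /** Stand-in for a searchPublicOnly(query, topK) overload that honors the limit. */
    static List<String> searchPublicOnly(List<String> rankedPublicHits, int topK) {
        int end = Math.min(Math.max(topK, 0), rankedPublicHits.size());
        return new ArrayList<>(rankedPublicHits.subList(0, end));
    }
}
```

With the limit passed through, the topK echoed in the response and the number of hits returned come from the same value.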

199-203: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

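The public test response can leak the size of the private chunk set.

This endpoint restricts the hits themselves to public-only, but if the totalChunks value it returns is a count over the whole index rather than the public scope, it exposes the number of USER_PRIVATE documents as well. That conflicts with the goal of keeping /api/rag/search/test public-only, so this value should either be replaced with a public-scope count or removed from the response.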

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/java/com/fairing/fairplay/ai/rag/controller/RagController.java`
around lines 199 - 203, The response is leaking private chunk counts via
result.getTotalChunks(); in RagController (the handler for /api/rag/search/test)
either remove the "totalChunks" field from the ResponseEntity or replace it with
a public-only count computed from the search result (e.g., count chunks whose
scope is public by filtering result.getChunks() or use a service method that
returns public-only totals) instead of returning result.getTotalChunks(); ensure
the response uses the same public-only scope as the endpoint and update any
references to "totalChunks" accordingly.
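
One of the two options above, a public-only count, might look like the following sketch. It assumes each chunk exposes its visibility as a string; the class and method names are hypothetical, and in practice the count would more likely come from a scoped SQL query than from an in-memory list.

```java
import java.util.List;

/** Illustrative only: counts chunks whose visibility is PUBLIC. */
final class PublicChunkCount {

    static long countPublic(List<String> chunkVisibilities) {
        long count = 0;
        for (String visibility : chunkVisibilities) {
            // Null or USER_PRIVATE entries are deliberately excluded from the total.
            if ("PUBLIC".equals(visibility)) count++;
        }
        return count;
    }
}
```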
🧹 Nitpick comments (1)
src/test/java/com/fairing/fairplay/reservation/service/ReservationServiceAuthorizationTest.java (1)

333-375: ⚡ Quick win

The test method names no longer match what they verify.

These tests now verify reservationChanged, so the method names should refer to reservation reindexing instead of UserDataReindex.

Suggested diff
-    void createReservationPublishesUserDataReindexAfterReservationMutation() {
+    void createReservationPublishesReservationReindexAfterReservationMutation() {
...
-    void cancelReservationPublishesUserDataReindexAfterReservationMutation() {
+    void cancelReservationPublishesReservationReindexAfterReservationMutation() {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/test/java/com/fairing/fairplay/reservation/service/ReservationServiceAuthorizationTest.java`
around lines 333 - 375, Rename the two test methods so their names reflect that
they verify reservation reindexing via reservationChanged() rather than
"UserDataReindex": change
createReservationPublishesUserDataReindexAfterReservationMutation to something
like createReservationPublishesReservationReindexAfterReservationMutation (or
createReservationPublishesReservationChangedEvent) and change
cancelReservationPublishesUserDataReindexAfterReservationMutation to
cancelReservationPublishesReservationReindexAfterReservationMutation (or
cancelReservationPublishesReservationChangedEvent); update any references or
annotations if needed so ReservationServiceAuthorizationTest still runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@src/main/java/com/fairing/fairplay/ai/rag/config/RagPgVectorSchemaInitializer.java`:
- Around line 57-65: The backfill SQL in RagPgVectorSchemaInitializer currently
sets visibility for doc_id LIKE 'reservation_%' but does not populate
owner_user_id, breaking owner-scoped searches; update the backfill logic in
RagPgVectorSchemaInitializer to also set owner_user_id for reservation_* entries
by adding a CASE branch that extracts the owner id from doc_id (e.g.,
regexp_replace(doc_id, '^reservation_', '')::bigint when non-empty) or, if more
reliable, join to the reservations table to fetch the user id and assign it to
owner_user_id; ensure this change is applied alongside the existing visibility
CASE so reservation_* rows get both visibility = 'USER_PRIVATE' and the correct
owner_user_id.

In
`@src/main/java/com/fairing/fairplay/ai/rag/repository/PgVectorRagRepository.java`:
- Around line 103-104: The user-scope search calls to searchSimilarByScope
(e.g., the call returning searchSimilarByScope(queryEmbedding, topK, threshold,
"AND owner_user_id = ?", userId)) are missing a visibility constraint; change
the condition string to include visibility = 'USER_PRIVATE' (e.g., "AND
visibility = 'USER_PRIVATE' AND owner_user_id = ?") and apply the same update to
all other user-scoped invocations of searchSimilarByScope (the other occurrences
around the user-search methods), so the repository only returns documents that
are both owned by the user and marked USER_PRIVATE.

In `@src/main/java/com/fairing/fairplay/ai/rag/service/ChunkingService.java`:
- Around line 30-35: The overload chunkDocument(String docId, String content)
creates chunks without metadata which breaks scope-filtered searches; update it
to populate default metadata (set docType to PUBLIC_MISC and visibility to
PUBLIC) before delegating to chunkDocument(Document) or mark this overload as
internal/test-only (e.g., package-private) to avoid accidental public use;
locate the overload method chunkDocument(String, String) and either construct a
Document with the required defaults or change its visibility, ensuring
consistency with the other chunkDocument(Document) path.

In
`@src/main/java/com/fairing/fairplay/ai/rag/service/ComprehensiveRagDataLoader.java`:
- Around line 322-330: In loadSingleReservation(Long reservationId) apply the
same user filter used by loadUserReservations: fetch the Reservation entity
(used by buildReservationDocument), call isIndexableUser(reservation.getUser())
and if it returns false call documentIngestService.deleteDocument("reservation_"
+ reservationId) and return the appropriate LoadResult without
building/ingesting a USER_PRIVATE document; only call
buildReservationDocument(...) and proceed with ingestion when
isIndexableUser(...) is true. Reference methods: loadSingleReservation,
loadUserReservations, isIndexableUser, buildReservationDocument, and
documentIngestService.deleteDocument to locate the change.

In `@src/main/java/com/fairing/fairplay/ai/rag/service/RagChatService.java`:
- Around line 164-179: The current isEventInformationQuery method only matches
compound Korean words (e.g., "행사정보", "행사일정") so natural queries like "송도 맥주축제
언제야" are missed; update isEventInformationQuery to also detect event nouns
(e.g., "축제", "페스티벌", "공연", "박람회", "행사") combined with information intent tokens
(e.g., "언제", "어디", "장소", "시간", "가격", "요금", "문의", "연락처", "전화", "이메일") by
normalizing the input (lowercase/trim) and checking for either direct compound
matches OR (containsAny(eventNouns) && containsAny(infoKeywords)); keep
null-check and reuse the existing compact/normalized string logic in
isEventInformationQuery to locate where to change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6b2a08cb-8b85-4705-ae9b-ee5ba89fbdba

📥 Commits

Reviewing files that changed from the base of the PR and between ca3661c and 6ec586f.

📒 Files selected for processing (23)
  • src/main/java/com/fairing/fairplay/ai/rag/config/RagPgVectorSchemaInitializer.java
  • src/main/java/com/fairing/fairplay/ai/rag/controller/RagController.java
  • src/main/java/com/fairing/fairplay/ai/rag/domain/Chunk.java
  • src/main/java/com/fairing/fairplay/ai/rag/domain/Document.java
  • src/main/java/com/fairing/fairplay/ai/rag/event/RagDocumentType.java
  • src/main/java/com/fairing/fairplay/ai/rag/repository/PgVectorRagRepository.java
  • src/main/java/com/fairing/fairplay/ai/rag/repository/RagChunkRepository.java
  • src/main/java/com/fairing/fairplay/ai/rag/service/ChunkingService.java
  • src/main/java/com/fairing/fairplay/ai/rag/service/ComprehensiveRagDataLoader.java
  • src/main/java/com/fairing/fairplay/ai/rag/service/DocumentIngestService.java
  • src/main/java/com/fairing/fairplay/ai/rag/service/RagChatService.java
  • src/main/java/com/fairing/fairplay/ai/rag/service/RagIndexingEventPublisher.java
  • src/main/java/com/fairing/fairplay/ai/rag/service/RagReindexEventListener.java
  • src/main/java/com/fairing/fairplay/ai/rag/service/VectorSearchService.java
  • src/main/java/com/fairing/fairplay/reservation/repository/ReservationRepository.java
  • src/main/java/com/fairing/fairplay/reservation/service/ReservationService.java
  • src/test/java/com/fairing/fairplay/ai/rag/service/ChunkingServiceTest.java
  • src/test/java/com/fairing/fairplay/ai/rag/service/ComprehensiveRagDataLoaderUserScopeTest.java
  • src/test/java/com/fairing/fairplay/ai/rag/service/DocumentIngestServiceTest.java
  • src/test/java/com/fairing/fairplay/ai/rag/service/RagChatServiceScopeTest.java
  • src/test/java/com/fairing/fairplay/ai/rag/service/RagLoadJobServiceTest.java
  • src/test/java/com/fairing/fairplay/ai/rag/service/RagReindexEventListenerTest.java
  • src/test/java/com/fairing/fairplay/reservation/service/ReservationServiceAuthorizationTest.java

Comment on lines +57 to +65
    visibility = CASE
        WHEN doc_id LIKE 'user\\_%' ESCAPE '\\' OR doc_id LIKE 'reservation\\_%' ESCAPE '\\' THEN 'USER_PRIVATE'
        ELSE COALESCE(visibility, 'PUBLIC')
    END,
    owner_user_id = CASE
        WHEN doc_id LIKE 'user\\_%' ESCAPE '\\'
            THEN NULLIF(regexp_replace(doc_id, '^user_', ''), '')::bigint
        ELSE owner_user_id
    END

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

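The reservation_* backfill omits owner_user_id, which can break user-scoped search.

The current backfill changes reservation_* rows to USER_PRIVATE but does not populate owner_user_id. Existing reservation chunks are then permanently excluded from searches that use the user condition (owner_user_id = ?). Please add owner-mapping logic (pattern parsing or join-based enrichment) for reservation_* as well.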
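
A note on the two options: elsewhere in this PR the document id is built as "reservation_" + reservationId, so the numeric suffix appears to be a reservation id rather than a user id. If so, casting the suffix directly into owner_user_id would store the wrong value, and a lookup against reservation data would be needed. The sketch below illustrates that lookup in memory; the `reservationOwners` map is a hypothetical stand-in for a join to the reservations table.

```java
import java.util.Map;
import java.util.OptionalLong;

/** Illustrative only: the map stands in for a reservation-to-user join. */
final class ReservationOwnerBackfill {

    private static final String PREFIX = "reservation_";

    /** Parses the reservation id out of a "reservation_<id>" doc_id, if well-formed. */
    static OptionalLong reservationIdOf(String docId) {
        if (docId == null || !docId.startsWith(PREFIX)) return OptionalLong.empty();
        try {
            return OptionalLong.of(Long.parseLong(docId.substring(PREFIX.length())));
        } catch (NumberFormatException e) {
            return OptionalLong.empty(); // empty or non-numeric suffix
        }
    }

    /** Resolves the owner through the reservation id instead of reusing the suffix. */
    static OptionalLong ownerUserIdOf(String docId, Map<Long, Long> reservationOwners) {
        OptionalLong reservationId = reservationIdOf(docId);
        if (!reservationId.isPresent()) return OptionalLong.empty();
        Long owner = reservationOwners.get(reservationId.getAsLong());
        return owner == null ? OptionalLong.empty() : OptionalLong.of(owner);
    }
}
```

Rows whose owner cannot be resolved stay without an owner_user_id and are better reindexed or deleted than guessed.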

Comment on lines +103 to +104
return searchSimilarByScope(queryEmbedding, topK, threshold, "AND owner_user_id = ?", userId);
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

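User-scoped search lacks a visibility='USER_PRIVATE' condition, so the scope boundary is weak.

Personal queries should only look up "my private documents", but currently only owner_user_id is checked. Please enforce visibility = 'USER_PRIVATE' as well across all user-search methods.

Suggested fix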
- return searchSimilarByScope(queryEmbedding, topK, threshold, "AND owner_user_id = ?", userId);
+ return searchSimilarByScope(
+     queryEmbedding, topK, threshold,
+     "AND visibility = 'USER_PRIVATE' AND owner_user_id = ?",
+     userId
+ );

- return searchKeywordByScope(query, topK, "AND owner_user_id = ?", userId);
+ return searchKeywordByScope(
+     query, topK,
+     "AND visibility = 'USER_PRIVATE' AND owner_user_id = ?",
+     userId
+ );

- return "AND owner_user_id = ? AND doc_type IN (" + placeholders(docTypes) + ")";
+ return "AND visibility = 'USER_PRIVATE' AND owner_user_id = ? AND doc_type IN (" + placeholders(docTypes) + ")";

Also applies to: 139-140, 277-279

Comment on lines 30 to +35
public List<Chunk> chunkDocument(String docId, String content) {
    return chunkDocument(Document.builder()
            .docId(docId)
            .content(content)
            .build());
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

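The existing overload can create chunks without metadata.

The chunkDocument(String docId, String content) path does not populate docType/visibility, which can cause omissions or incorrect behavior in scope-filtered search. Please either enforce minimum defaults (PUBLIC_MISC, PUBLIC) or restrict this overload to internal/test-only use.

Suggested fix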
 public List<Chunk> chunkDocument(String docId, String content) {
     return chunkDocument(Document.builder()
         .docId(docId)
         .content(content)
+        .docType("PUBLIC_MISC")
+        .visibility("PUBLIC")
         .build());
 }

Also applies to: 54-64

Comment on lines +322 to +330
public LoadResult loadSingleReservation(Long reservationId) {
    try {
        Document document = inReadOnlyTransaction(() -> reservationRepository.findByIdForResponse(reservationId)
                .map(this::buildReservationDocument)
                .orElse(null));

        if (document == null) {
            documentIngestService.deleteDocument("reservation_" + reservationId);
            return new LoadResult("UserReservation", 1, 1, 0);

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

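Single-reservation reindexing should apply the same user filter as the bulk path.

loadUserReservations() only lets indexable users through, but this method goes straight to buildReservationDocument(), so reservations belonging to deleted or system users can be re-ingested as USER_PRIVATE documents. Reservation change events usually take this single-item path, so the live index state will diverge from the result of a full load. This method should also apply isIndexableUser(reservation.getUser()) and delete the existing reservation_* document when the check fails.

Possible fix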
     public LoadResult loadSingleReservation(Long reservationId) {
         try {
             Document document = inReadOnlyTransaction(() -> reservationRepository.findByIdForResponse(reservationId)
+                .filter(reservation -> reservation.getUser() != null && isIndexableUser(reservation.getUser()))
                 .map(this::buildReservationDocument)
                 .orElse(null));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- public LoadResult loadSingleReservation(Long reservationId) {
-     try {
-         Document document = inReadOnlyTransaction(() -> reservationRepository.findByIdForResponse(reservationId)
-                 .map(this::buildReservationDocument)
-                 .orElse(null));
-         if (document == null) {
-             documentIngestService.deleteDocument("reservation_" + reservationId);
-             return new LoadResult("UserReservation", 1, 1, 0);
+ public LoadResult loadSingleReservation(Long reservationId) {
+     try {
+         Document document = inReadOnlyTransaction(() -> reservationRepository.findByIdForResponse(reservationId)
+                 .filter(reservation -> reservation.getUser() != null && isIndexableUser(reservation.getUser()))
+                 .map(this::buildReservationDocument)
+                 .orElse(null));
+         if (document == null) {
+             documentIngestService.deleteDocument("reservation_" + reservationId);
+             return new LoadResult("UserReservation", 1, 1, 0);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/fairing/fairplay/ai/rag/service/ComprehensiveRagDataLoader.java`
around lines 322 - 330, In loadSingleReservation(Long reservationId) apply the
same user filter used by loadUserReservations: fetch the Reservation entity
(used by buildReservationDocument), call isIndexableUser(reservation.getUser())
and if it returns false call documentIngestService.deleteDocument("reservation_"
+ reservationId) and return the appropriate LoadResult without
building/ingesting a USER_PRIVATE document; only call
buildReservationDocument(...) and proceed with ingestion when
isIndexableUser(...) is true. Reference methods: loadSingleReservation,
loadUserReservations, isIndexableUser, buildReservationDocument, and
documentIngestService.deleteDocument to locate the change.
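
The filter-then-delete flow described in this comment can be sketched in isolation. This is only an illustration under stated assumptions: `SketchUser`, `SketchReservation`, the in-memory `index` map, and the rule inside `isIndexableUser` (deleted and system users are excluded) are hypothetical stand-ins, not the project's real classes or its actual eligibility rule.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the project's user entity.
final class SketchUser {
    final boolean deleted;
    final boolean system;
    SketchUser(boolean deleted, boolean system) {
        this.deleted = deleted;
        this.system = system;
    }
}

// Hypothetical stand-in for the project's reservation entity.
final class SketchReservation {
    final long id;
    final SketchUser user;
    SketchReservation(long id, SketchUser user) {
        this.id = id;
        this.user = user;
    }
}

final class SingleReservationReindexSketch {
    // Stand-in for the vector store: document id -> document text.
    final Map<String, String> index = new HashMap<>();

    // Assumed rule: deleted and system users are not indexable.
    static boolean isIndexableUser(SketchUser user) {
        return user != null && !user.deleted && !user.system;
    }

    // Returns true if a document was (re)indexed, false if any stale one was removed.
    // 'found' may be null, which models a reservation that no longer exists.
    boolean reindex(long reservationId, SketchReservation found) {
        String docId = "reservation_" + reservationId;
        Optional<String> document = Optional.ofNullable(found)
                .filter(r -> isIndexableUser(r.user))   // same gate as the bulk path
                .map(r -> "reservation " + r.id);       // placeholder for buildReservationDocument
        if (!document.isPresent()) {
            index.remove(docId);                        // drop the stale private document
            return false;
        }
        index.put(docId, document.get());
        return true;
    }
}
```

Because a missing reservation and a filtered-out owner both end in the same delete branch, a reservation whose owner later becomes non-indexable is removed from the index on its next change event.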

Comment on lines +164 to +179
    private boolean isEventInformationQuery(String question) {
        if (question == null) return false;

        String compact = question.toLowerCase().replaceAll("\\s+", "");
        return compact.contains("행사정보") ||
                compact.contains("이벤트정보") ||
                compact.contains("공연정보") ||
                compact.contains("축제정보") ||
                compact.contains("박람회정보") ||
                compact.contains("행사일정") ||
                compact.contains("이벤트일정") ||
                compact.contains("행사장소") ||
                compact.contains("이벤트장소") ||
                compact.contains("문의처") ||
                compact.contains("관리자연락처") ||
                compact.contains("관리자이메일");

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

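With this classifier, common event questions rarely reach the event-first path.

It currently matches only compounds such as 행사정보 ("event info"), 행사일정 ("event schedule"), and 행사장소 ("event venue"), so natural queries like 송도 맥주축제 언제야 ("When is the Songdo Beer Festival?"), 부산 불꽃축제 어디서 해 ("Where is the Busan Fireworks Festival held?"), or OO 페스티벌 가격 알려줘 ("Tell me the price of the OO festival") mostly fall through to general public search. The event-document-first ranking that is the core of this PR then does not apply, and booth/review chunks can rank ahead again. It is safer to broaden the match to combinations of an event noun with an informational keyword such as 언제/어디/장소/시간/가격/문의 (when/where/venue/time/price/inquiry).

Possible fix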
     private boolean isEventInformationQuery(String question) {
         if (question == null) return false;

         String compact = question.toLowerCase().replaceAll("\\s+", "");
-        return compact.contains("행사정보") ||
+        boolean mentionsEvent = compact.contains("행사") ||
+               compact.contains("이벤트") ||
+               compact.contains("공연") ||
+               compact.contains("축제") ||
+               compact.contains("박람회") ||
+               compact.contains("페스티벌");
+        boolean asksEventInfo = compact.contains("정보") ||
+               compact.contains("일정") ||
+               compact.contains("장소") ||
+               compact.contains("위치") ||
+               compact.contains("시간") ||
+               compact.contains("언제") ||
+               compact.contains("가격") ||
+               compact.contains("문의") ||
+               compact.contains("연락처") ||
+               compact.contains("이메일");
+        return compact.contains("행사정보") ||
                compact.contains("이벤트정보") ||
                compact.contains("공연정보") ||
                compact.contains("축제정보") ||
                compact.contains("박람회정보") ||
                compact.contains("행사일정") ||
                compact.contains("이벤트일정") ||
                compact.contains("행사장소") ||
                compact.contains("이벤트장소") ||
                compact.contains("문의처") ||
                compact.contains("관리자연락처") ||
-               compact.contains("관리자이메일");
+               compact.contains("관리자이메일") ||
+               (mentionsEvent && asksEventInfo);
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/java/com/fairing/fairplay/ai/rag/service/RagChatService.java` around
lines 164 - 179, The current isEventInformationQuery method only matches
compound Korean words (e.g., "행사정보", "행사일정") so natural queries like "송도 맥주축제
언제야" are missed; update isEventInformationQuery to also detect event nouns
(e.g., "축제", "페스티벌", "공연", "박람회", "행사") combined with information intent tokens
(e.g., "언제", "어디", "장소", "시간", "가격", "요금", "문의", "연락처", "전화", "이메일") by
normalizing the input (lowercase/trim) and checking for either direct compound
matches OR (containsAny(eventNouns) && containsAny(infoKeywords)); keep
null-check and reuse the existing compact/normalized string logic in
isEventInformationQuery to locate where to change.
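
The `containsAny(eventNouns) && containsAny(infoKeywords)` idea from the prompt above could look roughly like the following standalone sketch. The class name `EventQueryClassifierSketch`, the `containsAny` helper, and the exact keyword lists are assumptions assembled from the keywords mentioned in this comment; they are not the code that was merged.

```java
import java.util.List;
import java.util.Locale;

final class EventQueryClassifierSketch {
    // Event nouns named in the review comment.
    private static final List<String> EVENT_NOUNS =
            List.of("행사", "이벤트", "공연", "축제", "박람회", "페스티벌");

    // Information-intent keywords named in the review comment.
    private static final List<String> INFO_KEYWORDS =
            List.of("정보", "일정", "장소", "위치", "시간", "언제", "어디",
                    "가격", "요금", "문의", "연락처", "전화", "이메일");

    // Phrases from the original method that matched without an event noun.
    private static final List<String> STANDALONE =
            List.of("문의처", "관리자연락처", "관리자이메일");

    // True if 'text' contains at least one of the given substrings.
    static boolean containsAny(String text, List<String> needles) {
        for (String needle : needles) {
            if (text.contains(needle)) {
                return true;
            }
        }
        return false;
    }

    static boolean isEventInformationQuery(String question) {
        if (question == null) return false;

        // Same normalization as the original: lowercase, then strip all whitespace.
        String compact = question.toLowerCase(Locale.ROOT).replaceAll("\\s+", "");
        return containsAny(compact, STANDALONE)
                || (containsAny(compact, EVENT_NOUNS) && containsAny(compact, INFO_KEYWORDS));
    }
}
```

The original compounds such as 행사정보 are still matched, since each one is an event noun followed by an info keyword. Plain substring matching is broad, so a real change would probably want tests for false positives, for example personal reservation questions that happen to mention an event name.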

@rktclgh rktclgh merged commit e95085f into develop May 28, 2026
1 of 2 checks passed