Document management - powered by OpenAI #88

@XuanXuanxuannn

Feature Summary

This feature automatically organizes documents in Google Drive. It scans the root folder for PDF, DOCX, and TXT files and uses an AI-powered keyword extractor to pick a meaningful "bucket" name from each filename. It then creates a folder with that name if one doesn't already exist and moves the file into it. Once configured, the process is fully hands-off.
Related to milestone 8

Possible Solutions/Approaches

  • One implementation is the current Python script:
    1. Authenticate with Google Drive via OAuth.
    2. List all untrashed files in the root.
    3. Download each file locally.
    4. Call the OpenAI API (GPT-3.5-Turbo) with a prompt asking it to pick an existing keyword or propose a new one based on the filename.
    5. Create or locate the corresponding Drive folder.
    6. Move the file into that folder and clean up the local temp file.
  • A previously considered alternative was purely rule-based grouping (e.g. regex on course codes, dates, etc.), but it lacked flexibility for ad-hoc or unlabeled filenames.
  • The current script filters on file extension (.pdf, .docx, .txt) to decide which files to process, so file types can be added or ignored by editing the extension list.
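
The Drive side of the steps above (list root files, find or create the folder, move the file) could look roughly like the sketch below. It is not the script from this issue: the function names and the `SUPPORTED_EXTENSIONS` constant are hypothetical, and `service` is assumed to be a Drive v3 client already built with `googleapiclient.discovery.build("drive", "v3", credentials=...)` after the OAuth step.

```python
import os

# Hypothetical constant: edit this set to add or ignore file types.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}
FOLDER_MIME = "application/vnd.google-apps.folder"


def is_supported(filename):
    """Return True if the filename has one of the supported extensions."""
    return os.path.splitext(filename)[1].lower() in SUPPORTED_EXTENSIONS


def list_root_files(service):
    """List untrashed, non-folder files in the Drive root that pass the extension filter."""
    query = f"'root' in parents and trashed = false and mimeType != '{FOLDER_MIME}'"
    files, page_token = [], None
    while True:
        resp = service.files().list(
            q=query,
            fields="nextPageToken, files(id, name)",
            pageToken=page_token,
        ).execute()
        files.extend(f for f in resp.get("files", []) if is_supported(f["name"]))
        page_token = resp.get("nextPageToken")
        if not page_token:
            return files


def find_or_create_folder(service, keyword):
    """Return the ID of the root-level folder named `keyword`, creating it if missing."""
    # Backslashes and single quotes must be escaped inside a Drive query string.
    escaped = keyword.replace("\\", "\\\\").replace("'", "\\'")
    query = (
        f"name = '{escaped}' and mimeType = '{FOLDER_MIME}' "
        "and 'root' in parents and trashed = false"
    )
    matches = service.files().list(q=query, fields="files(id)").execute().get("files", [])
    if matches:
        return matches[0]["id"]
    # With no explicit parent, Drive places the new folder in the user's My Drive root.
    body = {"name": keyword, "mimeType": FOLDER_MIME}
    return service.files().create(body=body, fields="id").execute()["id"]


def move_file(service, file_id, folder_id):
    """Move a file by adding the target folder and removing its current parents."""
    current = service.files().get(fileId=file_id, fields="parents").execute()
    service.files().update(
        fileId=file_id,
        addParents=folder_id,
        removeParents=",".join(current.get("parents", [])),
        fields="id, parents",
    ).execute()
```

Passing `service` in as a parameter keeps the Drive logic separate from authentication and from the keyword prompt, which fits the developer story of changing file types or prompts without touching the core Drive calls.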

Relevant User Stories

  • As a user, I want my loose documents to be automatically grouped into meaningful folders, so I don’t have to manually sort dozens of files each week.
  • As a developer, I want a clear, extensible script that I can customize with additional file types or smarter AI prompts without rewriting the core Drive logic.

Acceptance Criteria (Key Requirements)

  • The script must authenticate with Google Drive via OAuth and persist credentials in token.pickle.
  • It must create a new Drive folder if none exists matching the keyword, or reuse an existing one.
  • It must delete any local temporary files after moving.
  • It must log each move operation to stdout for monitoring.
  • Errors (Drive API failures, OpenAI timeouts, file-I/O issues) must be caught and reported without halting the entire run.
  • The solution should be extensible to support additional file types or deeper content-based classification in future iterations.
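
One way to satisfy the logging, cleanup, and error-isolation criteria together is a per-file loop like the sketch below. It is a minimal illustration under stated assumptions, not the existing script: `process_all` and the `organize_one(file, temp_path)` callable are hypothetical names, where `organize_one` is assumed to download, classify, and move a single file and return the destination folder name.

```python
import logging
import os
import sys
import tempfile

# Log to stdout so each move can be monitored, as the criteria require.
logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("drive_organizer")  # hypothetical logger name


def process_all(files, organize_one):
    """Run organize_one on every file, reporting failures without stopping the run.

    Returns (moved, failed): names that were moved, and (name, error message) pairs.
    """
    moved, failed = [], []
    for f in files:
        # Create the temp file here so cleanup happens even if organize_one fails.
        fd, temp_path = tempfile.mkstemp()
        os.close(fd)
        try:
            folder_name = organize_one(f, temp_path)
            log.info("Moved %s -> %s", f["name"], folder_name)
            moved.append(f["name"])
        except Exception as exc:
            # Broad on purpose: Drive API, OpenAI, and file-I/O errors all land here.
            log.error("Failed to organize %s: %s", f["name"], exc)
            failed.append((f["name"], str(exc)))
        finally:
            if os.path.exists(temp_path):
                os.remove(temp_path)
    return moved, failed
```

Returning the `failed` list alongside `moved` lets a caller print a summary or retry only the files that hit an error.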
