Summarize and Chat

This repository contains the source code for the summarize-and-chat project, a unified framework for document summarization and chat with LLMs. It aims to address the challenge of building a scalable document summarization solution while enabling natural-language interaction through a chat interface.

Core features include:

Support for a range of document lengths and formats (PDF, DOCX, PPTX, TXT, VTT, audio) and various types of content
Support for open-source LLMs served by an OpenAI-compatible inference engine
An intuitive user interface for file upload, summary generation, and chat
Summarization:
- Insert, paste, or upload your files and preview them
- Choose how you want to summarize (custom prompts, chunk size, page range for documents, or time range for audio)
- Adjust the summary length
- Get your summary in seconds and download it
Chat with your doc: ask any question about your document for deeper analysis
- Auto-generated questions from the doc
- Get answers, with their sources, in seconds
Insight analysis:
- Select two or more docs
- Write a prompt to compare the selected docs or identify insights across them
Speech-to-text conversion
Support for PDF parsers: PyPDF, PDFMiner, PyMuPDF
APIs compatible with Cohere's Summarize API
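
The chunk-size option listed above implies that long documents are split before they reach the model. The following is a minimal, hypothetical sketch of one common approach, map-reduce summarization: each chunk is summarized independently, then the partial summaries are summarized once more. It assumes character-based chunking and a caller-supplied `summarize` function standing in for the LLM call; `chunk_text` and `map_reduce_summarize` are illustrative names, not functions from this project, and the project's actual strategy may differ.

```python
from typing import Callable, List


def chunk_text(text: str, chunk_size: int, overlap: int = 0) -> List[str]:
    """Split text into character windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters of context.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def map_reduce_summarize(
    text: str,
    summarize: Callable[[str], str],  # hypothetical stand-in for an LLM call
    chunk_size: int = 2000,
    overlap: int = 0,
) -> str:
    chunks = chunk_text(text, chunk_size, overlap)
    if not chunks:
        return ""
    # Map step: summarize every chunk independently (these calls could run in parallel).
    partials = [summarize(chunk) for chunk in chunks]
    if len(partials) == 1:
        return partials[0]
    # Reduce step: summarize the concatenated partial summaries once more.
    return summarize("\n".join(partials))


if __name__ == "__main__":
    # Toy "model" that keeps the first two characters; replace with a real LLM call.
    fake_llm = lambda s: s[:2]
    print(map_reduce_summarize("abcdefghij", fake_llm, chunk_size=5))  # prints: ab
```

This sketch performs a single reduce step; if the combined partial summaries can exceed the model's context window, a real pipeline would typically repeat the reduce step until the text fits.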

Disclaimer

Be aware that LLMs pose inherent vulnerabilities and risks, as illustrated by the OWASP Top 10 for Large Language Model Applications. We strongly encourage customers to pay attention to OWASP guidance and the NIST AI Risk Management Framework to build safe and robust AI systems.

What is included

The summarize-and-chat project includes three components:

summarization-client: An Angular/Clarity web application for content management, summary generation, and chat.
summarization-server: A FastAPI gateway server that manages core application functions, including access control, the document ingestion pipeline, summarization with LangChain, and improved RAG with LlamaIndex over a PGVector store.
stt-service (speech-to-text): A microservice that converts audio to text using faster-whisper, a reimplementation of OpenAI's Whisper model.
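
The summarization-server description mentions RAG over a PGVector store. As a rough illustration of the retrieval step only, the following pure-Python sketch ranks pre-embedded chunks by cosine similarity to a query embedding and assembles a prompt that asks the model to cite its sources. The embeddings, the prompt wording, and the names `top_k_chunks` and `build_prompt` are hypothetical; the real server delegates this work to LlamaIndex and PGVector rather than code like this.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated instead of dividing by zero
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],  # (chunk text, chunk embedding)
    k: int = 3,
) -> List[Tuple[str, float]]:
    # Score every chunk against the query and keep the k most similar ones.
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in chunks]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def build_prompt(question: str, retrieved: List[Tuple[str, float]]) -> str:
    # Number each source so the model's answer can point back to it.
    context = "\n\n".join(f"[{i + 1}] {text}" for i, (text, _) in enumerate(retrieved))
    return (
        "Answer the question using only the sources below and cite them by number.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    # Toy 2-dimensional embeddings; real embeddings come from an embedding model.
    docs = [("chunk A", [1.0, 0.0]), ("chunk B", [0.0, 1.0]), ("chunk C", [0.9, 0.1])]
    hits = top_k_chunks([1.0, 0.0], docs, k=2)
    print(build_prompt("What does chunk A say?", hits))
```

In pgvector, the equivalent ranking is usually pushed into SQL with the cosine-distance operator, for example `ORDER BY embedding <=> :query LIMIT k`, where `<=>` returns 1 minus cosine similarity, so smaller values mean closer matches.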

Quick start

For the development environment and build configuration, see the build documentation (BUILD.md).

Contributing

The summarize-and-chat project team welcomes contributions from the community. Before you start working with the summarize-and-chat project, please read our Contributor License Agreement. All contributions to this repository must be signed as described on that page. Your signature certifies that you wrote the patch or have the right to pass it on as an open-source patch. For more detailed information, refer to CONTRIBUTING_CLA.md.

Bugs and feature requests

Have a bug or a feature request? Please first read the issue guidelines and search for existing and closed issues. If your problem or idea is not addressed yet, please open a new issue.

Copyright and license

The project is licensed under the terms of the Apache 2.0 license.

vmware/summarize-and-chat
