Commit 2604dc3: refactoring

1 parent: 9936827

17 files changed: +2336 additions, -383 deletions

Dockerfile

+31 (new file)

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install system dependencies including SSH client and PDF generation requirements
RUN apt-get update && apt-get install -y \
    build-essential \
    openssh-client \
    libpango-1.0-0 \
    libharfbuzz0b \
    libpangoft2-1.0-0 \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first to leverage Docker cache
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application
COPY . .

# Create logs directory with proper permissions
RUN mkdir -p logs && chmod 777 logs

# Create cache directory for sampling wizard
RUN mkdir -p .cache && chmod 777 .cache

# Expose Streamlit port
EXPOSE 8501

# Command to run the application
CMD ["streamlit", "run", "streamlit_app.py", "--server.address=0.0.0.0"]
```

LICENSE

+21 (new file)

```text
MIT License

Copyright (c) 2024 QuerySight

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```

README.md

+176 -57

```diff
@@ -1,89 +1,208 @@
 # QuerySight: ClickHouse Log-Driven dbt Project Enhancer
-# [WORK IN PROGRESS]
 
-This project analyzes ClickHouse query logs and dbt project structure to suggest improvements for optimizing the most common queries and data patterns.
+QuerySight is an advanced Streamlit-powered analytics platform designed to revolutionize dbt project performance monitoring through intelligent insights and interactive optimization tools.
 
-## Project Structure
-
-The project consists of the following main components:
+## Features
 
-- `main.py`: The main script that combines all components and runs the analysis process.
-- `utils/data_acquisition.py`: Module for retrieving and preprocessing query logs from ClickHouse.
-- `utils/dbt_analyzer.py`: Module for analyzing the dbt project structure.
-- `utils/ai_suggester.py`: Module for generating improvement suggestions using the OpenAI API.
+- 📊 Streamlit web interface with AI-driven performance insights
+- 🔍 ClickHouse log parsing for real-time data transformation analysis
+- 🤖 Intelligent performance optimization recommendations
+- 📈 Advanced dbt workflow tracking and diagnostic capabilities
+- 🧠 Machine learning-enhanced query improvement suggestions
+- 💡 AI-powered proposal management system
+- 🎯 Smart sampling wizard for efficient data analysis
+- 📦 Intelligent cache management for improved performance
 
-## Requirements
+## Prerequisites
 
-- Python 3.7+
-- clickhouse-driver
-- PyYAML
-- openai
-- streamlit (optional, in case of using web interface)
+- Python 3.11+
+- ClickHouse database instance
+- OpenAI API key
+- dbt project
 
 ## Installation
 
-1. Clone the repository:
-git clone https://github.com/yourusername/clickhouse-dbt-optimizer.git
-cd clickhouse-dbt-optimizer
-
-2. Install dependencies:
-`pip install -r requirements.txt`
+1. Clone the repository:
+```bash
+git clone https://github.com/codeium/querysight.git
+cd querysight
+```
+
+2. Install dependencies:
+The project uses the following main packages:
+- clickhouse-driver: For ClickHouse database connectivity
+- openai: For AI-powered suggestions
+- pandas: For data analysis
+- python-dotenv: For environment variable management
+- pyyaml: For dbt project configuration parsing
+- reportlab: For PDF report generation
+- streamlit: For the web interface
+- trafilatura: For web content extraction
+- twilio: For notifications (optional)
+
+You can install all dependencies using:
+```bash
+pip install -r requirements.txt
+```
+
+## Configuration
+
+1. Set up your environment variables:
+- `OPENAI_API_KEY`: Your OpenAI API key
+- `DBT_PROJECT_PATH`: Path to your dbt project (when using Docker)
+
+2. Configure ClickHouse connection:
+- Host
+- Port
+- Username
+- Password
+- Database
+
+3. Cache Directory:
+- The application uses a `.cache` directory for the sampling wizard
+- This is automatically created in Docker, or you can create it manually:
+```bash
+mkdir -p .cache
+chmod 777 .cache
+```
 
 ## Usage
 
-### Console interface
+1. Start the Streamlit application:
+```bash
+streamlit run streamlit_app.py
+```
+
+2. Access the web interface at `http://localhost:8501`
+
+3. In the sidebar:
+- Enter your dbt project path
+- Configure date range for analysis
+- Provide ClickHouse credentials
+- Enter your OpenAI API key
+
+4. Use the "Analyze and Suggest" button to:
+- Analyze query patterns
+- Get AI-powered optimization suggestions
+- Generate performance reports
+
+5. Manage improvement proposals:
+- Generate new proposals for specific query patterns
+- View and organize saved proposals
+- Track implementation progress
+
+## Docker Deployment
 
-Run the `main.py` script with the necessary arguments:
-`python main.py --dbt-project /path/to/dbt/project --start-date 2023-01-01 --end-date 2023-12-31 --openai-api-key your_openai_api_key`
+### Prerequisites
+- Docker
+- Docker Compose
 
-Arguments:
-- `--dbt-project`: Path to the dbt project
-- `--start-date`: Start date for query analysis (YYYY-MM-DD)
-- `--end-date`: End date for query analysis (YYYY-MM-DD)
-- `--openai-api-key`: OpenAI API key
+### Quick Start
 
-### Streamlit web interface
-Run the Streamlit app:
-`streamlit run streamlit_app.py`
-Your default web browser should automatically open to `http://localhost:8501`. If it doesn't, you can manually open this URL.
+1. Clone the repository:
+```bash
+git clone https://github.com/codeium/querysight.git
+cd querysight
+```
 
-Use the sidebar to input your configuration:
-- Enter the path to your dbt project
-- Select the start and end dates for query analysis
-- Input your OpenAI API key
-- Provide your ClickHouse credentials
+2. Set up environment variables:
+```bash
+cp .env.example .env
+```
+Edit the `.env` file with your configuration:
+- Set your ClickHouse credentials
+- Add your OpenAI API key
 
-Click the "Analyze and Suggest" button to start the analysis process.
+3. Build and run with Docker Compose:
+```bash
+docker compose up -d
+```
 
+The application will be available at `http://localhost:8501`
 
-## Workflow
+### Docker Configuration
 
-1. The script will prompt for ClickHouse credentials.
-2. Retrieves and preprocesses query logs from ClickHouse.
-3. Analyzes queries to identify common patterns.
-4. Analyzes the dbt project structure.
-5. Generates improvement suggestions using the OpenAI API.
-6. Outputs suggestions for dbt project improvements.
+The application is containerized with the following components:
+- QuerySight web application (Streamlit)
+- ClickHouse database
 
-## Modules
+Key features of the Docker setup:
+- Automatic database initialization
+- Volume persistence for logs and database data
+- Environment variable configuration
+- Exposed ports:
+- 8501: Streamlit web interface
+- 9000: ClickHouse native interface
+- 8123: ClickHouse HTTP interface
 
-### ClickHouseDataAcquisition
+### Maintenance
 
-Responsible for retrieving query logs from ClickHouse, preprocessing the data, and analyzing queries.
+- View logs:
+```bash
+docker compose logs -f querysight
+```
 
-### DBTProjectAnalyzer
+- Stop the application:
+```bash
+docker compose down
+```
 
-Analyzes the dbt project structure, including models, sources, and macros.
+- Reset everything (including volumes):
+```bash
+docker compose down -v
+```
 
-### AISuggester
+## Components
 
-Uses the OpenAI API to generate improvement suggestions based on query analysis and dbt structure.
+### Data Acquisition
+- `utils/data_acquisition.py`: Handles ClickHouse query log retrieval and analysis
 
-## Security
+### dbt Analysis
+- `utils/dbt_analyzer.py`: Analyzes dbt project structure and dependencies
 
-- Do not store ClickHouse credentials or OpenAI API key in the code. Use environment variables or a secure secret storage.
-- Ensure you have the necessary permissions to access ClickHouse query logs.
+### AI Suggestions
+- `utils/ai_suggester.py`: Generates intelligent optimization suggestions using OpenAI
+
+### PDF Reports
+- `utils/pdf_generator.py`: Creates detailed PDF reports of analysis and suggestions
+
+## Project Structure
+
+```
+querysight/
+├── streamlit_app.py # Main Streamlit application
+├── utils/ # Core functionality modules
+│ ├── ai_suggester.py # AI-powered optimization suggestions
+│ ├── cache_manager.py # Caching system for performance
+│ ├── config.py # Configuration management
+│ ├── data_acquisition.py # ClickHouse data retrieval
+│ ├── dbt_analyzer.py # dbt project analysis
+│ ├── logger.py # Logging configuration
+│ ├── pdf_generator.py # Report generation
+│ └── sampling_wizard.py # Smart data sampling
+├── Dockerfile # Container definition
+├── docker-compose.yml # Container orchestration
+├── requirements.txt # Python dependencies
+└── pyproject.toml # Project metadata and tools config
+```
+
+## Security Considerations
+
+- Store sensitive credentials securely
+- Use environment variables for API keys
+- Ensure proper access controls for ClickHouse
+- Regular security updates for dependencies
 
 ## Contributing
 
-Please create issues to report problems or suggest new features. Pull requests are welcome!
+Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
+
+1. Fork the repository
+2. Create your feature branch (`git checkout -b feature/amazing-feature`)
+3. Commit your changes (`git commit -m 'Add some amazing feature'`)
+4. Push to the branch (`git push origin feature/amazing-feature`)
+5. Open a Pull Request
+
+## License
+
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
```
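The README lists "Analyze query patterns" as a core step, and the removed Workflow section described analyzing queries "to identify common patterns". One common way to do this is to group logged queries that differ only in their literal values. The Python sketch below illustrates the idea under stated assumptions: `normalize_query` and `top_query_patterns` are hypothetical names, not code from `utils/data_acquisition.py`, and the regex-based literal stripping is deliberately simple (it does not parse SQL, so unusual quoting can produce odd shapes). ClickHouse's built-in `normalizeQuery` function performs a similar normalization server-side over `system.query_log`.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical helpers for illustration; not taken from this commit.
_STRING_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'")   # '...' with backslash escapes
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")    # standalone ints/decimals, not col1
_PLACEHOLDER_LIST = re.compile(r"\(\s*\?(?:\s*,\s*\?)*\s*\)")
_WHITESPACE = re.compile(r"\s+")


def normalize_query(sql: str) -> str:
    """Replace literals with '?' so queries that differ only in values share one shape."""
    shape = _STRING_LITERAL.sub("?", sql)
    shape = _NUMBER_LITERAL.sub("?", shape)
    # Collapse any all-placeholder list, e.g. IN (?, ?, ?), to a single (?).
    shape = _PLACEHOLDER_LIST.sub("(?)", shape)
    shape = _WHITESPACE.sub(" ", shape).strip()
    return shape.lower()


def top_query_patterns(queries: Iterable[str], limit: int = 10) -> List[Tuple[str, int]]:
    """Count normalized query shapes and return the most frequent ones."""
    counts = Counter(normalize_query(q) for q in queries)
    return counts.most_common(limit)


if __name__ == "__main__":
    sample = [
        "SELECT * FROM orders WHERE id = 42",
        "select *  from orders where id = 7",
        "SELECT * FROM orders WHERE status IN ('new', 'paid')",
    ]
    # Prints [('select * from orders where id = ?', 2), ('select * from orders where status in (?)', 1)]
    print(top_query_patterns(sample))
```

The most frequent shapes are natural candidates to compare against dbt models when looking for materialization or pre-aggregation opportunities.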

docker-compose.yml

+36 (new file)

```yaml
services:
  querysight:
    build: .
    network_mode: "host"
    ports:
      - "8501:8501"
    environment:
      - CLICKHOUSE_HOST=localhost
      - CLICKHOUSE_PORT=${CLICKHOUSE_PORT:-9000}
      - CLICKHOUSE_USER=${CLICKHOUSE_USER:-default}
      - CLICKHOUSE_PASSWORD=${CLICKHOUSE_PASSWORD}
      - CLICKHOUSE_DATABASE=${CLICKHOUSE_DATABASE:-default}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - DBT_PROJECT_PATH=/app/dbt_project
      - PYTHONUNBUFFERED=1
    volumes:
      - ./logs:/app/logs
      - ~/.ssh:/root/.ssh:ro
      - ${DBT_PROJECT_PATH}:/app/dbt_project:ro
      - ./.cache:/app/.cache

  # Optional ClickHouse service (uncomment if you need a local ClickHouse instance)
  # clickhouse:
  #   image: clickhouse/clickhouse-server:latest
  #   ports:
  #     - "8123:8123"  # HTTP port
  #     - "9000:9000"  # Native port
  #   volumes:
  #     - clickhouse_data:/var/lib/clickhouse
  #   environment:
  #     - CLICKHOUSE_USER=${CLICKHOUSE_USER:-default}
  #     - CLICKHOUSE_PASSWORD=${CLICKHOUSE_PASSWORD}
  #     - CLICKHOUSE_DB=${CLICKHOUSE_DATABASE:-default}
#
#volumes:
#  clickhouse_data:
```
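The compose file passes ClickHouse settings to the container as `CLICKHOUSE_*` environment variables, using Compose's `${VAR:-default}` syntax, which substitutes the default when the variable is unset or empty. A minimal Python sketch of how the application could read those variables back with the same fallbacks follows. `ClickHouseSettings` and `load_clickhouse_settings` are hypothetical names; the commit's `utils/config.py` is not shown here and may work differently. Because the service uses `network_mode: "host"`, `localhost` inside the container refers to the Docker host's own network.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ClickHouseSettings:
    """Hypothetical settings container mirroring the CLICKHOUSE_* variables."""

    host: str
    port: int
    user: str
    password: str
    database: str


def _env_or_default(env: Mapping[str, str], name: str, default: str) -> str:
    # Same rule as Compose's ${VAR:-default}: use the default if unset OR empty.
    value = env.get(name)
    return value if value else default


def load_clickhouse_settings(env: Optional[Mapping[str, str]] = None) -> ClickHouseSettings:
    """Build settings from an environment mapping (defaults to os.environ)."""
    source: Mapping[str, str] = os.environ if env is None else env
    return ClickHouseSettings(
        host=_env_or_default(source, "CLICKHOUSE_HOST", "localhost"),
        port=int(_env_or_default(source, "CLICKHOUSE_PORT", "9000")),
        user=_env_or_default(source, "CLICKHOUSE_USER", "default"),
        # ${CLICKHOUSE_PASSWORD} has no default in the compose file, so an
        # unset variable simply yields an empty password.
        password=source.get("CLICKHOUSE_PASSWORD", ""),
        database=_env_or_default(source, "CLICKHOUSE_DATABASE", "default"),
    )


if __name__ == "__main__":
    settings = load_clickhouse_settings()
    # Print the connection target without echoing the password.
    print(f"{settings.user}@{settings.host}:{settings.port}/{settings.database}")
```

The field names were chosen to line up with the `host`, `port`, `user`, `password`, and `database` keyword arguments accepted by clickhouse-driver's `Client`, so the settings could be unpacked into a client with `dataclasses.asdict`.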
