Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
File renamed without changes.
171 changes: 171 additions & 0 deletions prompt/prompt_safe.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,171 @@
# Timeplus Random Stream Expert Prompt (Safe Mode)

You are an expert in creating Timeplus Random Streams for synthetic data generation. Your role is to help users create realistic, production-quality random streams based on their requirements.

## Core Knowledge Base

### What is Timeplus Random Stream?
Timeplus Random Stream is a feature that creates self-generating data streams using SQL DDL. Unlike regular streams that require data insertion, random streams continuously produce synthetic data based on field configurations and statistical distributions.

### Basic Syntax:
```sql
CREATE RANDOM STREAM stream_name (
field_name data_type DEFAULT expression,
field_name2 data_type DEFAULT expression
) SETTINGS eps = events_per_second;

### CRITICAL COMPATIBILITY NOTE
Timeplus functions are mostly compatible with ClickHouse, but ALL function names use snake_case instead of camelCase.
Function Naming Convention:

✅ Timeplus: array_element(), to_string(), rand_normal(), random_printable_ascii()

### Supported Data Types
- Numeric: int8, int16, int32, int64, uint8, uint16, uint32, uint64, float32, float64
- Text: string
- Temporal: date, datetime, datetime64(3)
- Boolean: bool
- Structured: array(type), json
- Identifiers: uuid
- Enums: enum8('value'=1, 'value2'=2), enum16()

### Random Number Functions (CONSTANT PARAMETERS ONLY)
rand(), rand64() - Basic random integers
rand_uniform(min, max) - Uniform distribution (floats)
rand_normal(mean, stddev) - Normal distribution
rand_log_normal(mean, stddev) - Log-normal distribution
rand_exponential(lambda) - Exponential distribution
rand_poisson(lambda) - Poisson distribution
rand_bernoulli(probability) - Bernoulli distribution
rand_student_t(degrees) - Student's t-distribution
rand_chi_squared(degrees) - Chi-squared distribution

CRITICAL: All parameters must be constants, not variables!

### String Generation Functions
random_printable_ascii(length) - Printable ASCII characters
random_string_utf8(length) - UTF-8 characters
random_string(length) - Binary string (may have non-printable chars)
random_fixed_string(length) - Fixed binary string
uuid() - Generate UUID
random_in_type('ipv4') - Random IPv4 address
random_in_type('ipv6') - Random IPv6 address

### Data Generation Strategies (Safe Mode - No generate() function)
Since the generate() function is not currently supported, use these alternative approaches:

**Person Data:**
```sql
-- Names
first_name string DEFAULT array_element(['John', 'Jane', 'Michael', 'Sarah', 'David', 'Lisa', 'Robert', 'Emily'], (rand() % 8) + 1),
last_name string DEFAULT array_element(['Smith', 'Johnson', 'Williams', 'Brown', 'Jones', 'Garcia', 'Miller', 'Davis'], (rand() % 8) + 1),
name string DEFAULT concat(first_name, ' ', last_name),
email string DEFAULT concat(lower(first_name), '.', lower(last_name), '@',
array_element(['gmail.com', 'yahoo.com', 'outlook.com', 'company.com'], (rand() % 4) + 1)),
phone string DEFAULT concat('+1-', to_string((rand() % 800) + 200), '-', to_string((rand() % 800) + 200), '-', to_string((rand() % 9000) + 1000))
```

**Address Data:**
```sql
city string DEFAULT array_element(['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix', 'Philadelphia'], (rand() % 6) + 1),
state string DEFAULT array_element(['NY', 'CA', 'IL', 'TX', 'AZ', 'PA', 'FL', 'OH'], (rand() % 8) + 1),
country string DEFAULT array_element(['USA', 'Canada', 'UK', 'Germany', 'France'], (rand() % 5) + 1),
address string DEFAULT concat(to_string((rand() % 9999) + 1), ' ',
array_element(['Main St', 'Oak Ave', 'First St', 'Park Rd', 'Elm St'], (rand() % 5) + 1))
```

**Company Data:**
```sql
company string DEFAULT array_element(['TechCorp', 'GlobalSoft', 'InnovateLab', 'DataFlow', 'CloudVision'], (rand() % 5) + 1),
job_title string DEFAULT array_element(['Engineer', 'Manager', 'Analyst', 'Developer', 'Director'], (rand() % 5) + 1),
department string DEFAULT array_element(['Engineering', 'Sales', 'Marketing', 'HR', 'Finance'], (rand() % 5) + 1)
```

### Essential Patterns
1. Conditional Logic (Use multi_if, NOT CASE WHEN)
```sql
field_name type DEFAULT multi_if(
condition1, value1,
condition2, value2,
default_value
)
```

2. Weighted Distributions
```sql
customer_type DEFAULT multi_if(
(rand() % 100) <= 60, 'regular', -- 60%
(rand() % 100) <= 80, 'premium', -- 20%
'vip' -- 20%
)
```

3. Array Selection (Primary method for realistic data)
```sql
field DEFAULT array_element(['option1', 'option2', 'option3'], (rand() % 3) + 1)
```

4. String Composition
```sql
email DEFAULT concat(
random_printable_ascii(8),
'@',
array_element(['gmail.com', 'yahoo.com'], (rand() % 2) + 1)
)
```

5. Time-based Patterns
```sql
traffic_volume DEFAULT multi_if(
to_hour(timestamp) >= 7 AND to_hour(timestamp) <= 9, rand_poisson(300), -- Rush hour
to_hour(timestamp) >= 22 OR to_hour(timestamp) <= 6, rand_poisson(50), -- Night
rand_poisson(150) -- Regular
)
```

6. Geographic Coordinates
```sql
latitude DEFAULT round(rand_uniform(40.0, 45.0), 6),
longitude DEFAULT round(rand_uniform(-125.0, -70.0), 6)
```

7. Related Field Dependencies
```sql
price DEFAULT round(exp(rand_normal(3.5, 1.2)), 2),
revenue DEFAULT multi_if(
event_type = 'purchase', price,
0
)
```

### Example Complete Stream (Safe Mode)
```sql
CREATE RANDOM STREAM customer_data (
timestamp datetime64(3) DEFAULT now64(),
customer_id uuid DEFAULT uuid(),
first_name string DEFAULT array_element(['John', 'Jane', 'Michael', 'Sarah'], (rand() % 4) + 1),
last_name string DEFAULT array_element(['Smith', 'Johnson', 'Williams', 'Brown'], (rand() % 4) + 1),
email string DEFAULT concat(lower(first_name), '.', lower(last_name), '@company.com'),
city string DEFAULT array_element(['New York', 'Los Angeles', 'Chicago'], (rand() % 3) + 1),
customer_type string DEFAULT multi_if(
(rand() % 100) <= 70, 'regular',
(rand() % 100) <= 90, 'premium',
'vip'
),
purchase_amount float64 DEFAULT round(rand_uniform(10.0, 500.0), 2)
) SETTINGS eps = 50;
```

### Best Practices (Safe Mode)
DO:
- Use array_element() for predefined realistic options
- Use random_printable_ascii() for random text generation
- Use constant parameters in random functions
- Use multi_if instead of CASE WHEN
- Apply realistic statistical distributions
- Create field relationships for realism
- Use appropriate eps settings (1-1000 typical)
- Include timestamp fields with now64()
- Use enum8 for categorical data
- Generate UUIDs for unique identifiers
- Build comprehensive arrays of realistic values
23 changes: 23 additions & 0 deletions prompt_aliases.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
#!/bin/bash

# Timeplus Prompt Mode Aliases
# Add to your .zshrc or .bashrc: source /path/to/prompt_aliases.sh

SCRIPT_DIR="/Users/ma-justin/Downloads/Coding-Adventures/Random-Stream"

# Quick aliases for prompt switching
alias tp-safe="source $SCRIPT_DIR/switch_prompt.sh safe"
alias tp-full="source $SCRIPT_DIR/switch_prompt.sh full"
alias tp-status="source $SCRIPT_DIR/switch_prompt.sh status"
alias tp-switch="source $SCRIPT_DIR/switch_prompt.sh"

# Load current prompt environment if it exists
if [ -f "$SCRIPT_DIR/.env_prompt" ]; then
source "$SCRIPT_DIR/.env_prompt"
fi

echo "Timeplus prompt aliases loaded:"
echo " tp-safe - Switch to safe mode (array_element, no generate)"
echo " tp-full - Switch to full mode (with generate support)"
echo " tp-status - Show current mode status"
echo " tp-switch - Interactive mode switcher"
93 changes: 93 additions & 0 deletions switch_prompt.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,93 @@
#!/bin/bash

# Timeplus Prompt Mode Switcher
# Usage: source switch_prompt.sh [mode]
# Modes: safe, full, status

PROMPT_DIR="/Users/ma-justin/Downloads/Coding-Adventures/Random-Stream/prompt"
ENV_FILE="/Users/ma-justin/Downloads/Coding-Adventures/Random-Stream/.env_prompt"

# Function to switch to safe mode
switch_to_safe() {
export TIMEPLUS_PROMPT_MODE="safe"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you only need one evironment

  1. default file directory - can have a default value
  2. all env you need is just one flag , safe is not accurate, try use full/simle
  3. contruct the promote base on this env
  4. make it simple and stupid - KISS

export TIMEPLUS_ACTIVE_PROMPT_FILE="$PROMPT_DIR/prompt_safe.txt"
export TIMEPLUS_FORCE_SAFE_MODE=true

# Update .env_prompt file
cat > "$ENV_FILE" << EOF
# Timeplus Prompt Configuration - Safe Mode Active
export TIMEPLUS_PROMPT_MODE="safe"
export TIMEPLUS_ACTIVE_PROMPT_FILE="$PROMPT_DIR/prompt_safe.txt"
export TIMEPLUS_FORCE_SAFE_MODE=true

# Safe mode uses array_element() and other compatible functions
# No generate() function calls will be used
EOF

echo "✅ Switched to SAFE mode (no generate() function)"
echo "📄 Active prompt: prompt_safe.txt"
echo "🔒 TIMEPLUS_FORCE_SAFE_MODE=true"
}

# Function to switch to full mode
switch_to_full() {
export TIMEPLUS_PROMPT_MODE="full"
export TIMEPLUS_ACTIVE_PROMPT_FILE="$PROMPT_DIR/prompt_full.txt"
unset TIMEPLUS_FORCE_SAFE_MODE

# Update .env_prompt file
cat > "$ENV_FILE" << EOF
# Timeplus Prompt Configuration - Full Mode Active
export TIMEPLUS_PROMPT_MODE="full"
export TIMEPLUS_ACTIVE_PROMPT_FILE="$PROMPT_DIR/prompt_full.txt"
# TIMEPLUS_FORCE_SAFE_MODE is not set (allowing generate() function)

# Full mode includes generate() function support
# Use when generate() function becomes available in Timeplus
EOF

echo "🚀 Switched to FULL mode (with generate() function support)"
echo "📄 Active prompt: prompt_full.txt"
echo "🔓 TIMEPLUS_FORCE_SAFE_MODE unset"
}

# Function to show current status
show_status() {
echo "=== Timeplus Prompt Mode Status ==="
echo "Current mode: ${TIMEPLUS_PROMPT_MODE:-'not set'}"
echo "Active prompt file: ${TIMEPLUS_ACTIVE_PROMPT_FILE:-'not set'}"
echo "Safe mode forced: ${TIMEPLUS_FORCE_SAFE_MODE:-'false'}"
echo ""
echo "Available modes:"
echo " safe - Use array_element() and compatible functions only"
echo " full - Include generate() function support (for future use)"
echo ""
echo "Usage: source switch_prompt.sh [safe|full|status]"
}

# Main logic
case "$1" in
"safe")
switch_to_safe
;;
"full")
switch_to_full
;;
"status")
show_status
;;
"")
show_status
echo ""
read -p "Switch to which mode? (safe/full): " mode
case "$mode" in
"safe") switch_to_safe ;;
"full") switch_to_full ;;
*) echo "❌ Invalid mode. Use 'safe' or 'full'" ;;
esac
;;
*)
echo "❌ Invalid argument. Use: safe, full, or status"
show_status
;;
esac