GameSense is a specialized language model that converts unstructured gaming conversations into structured, actionable data. It listens to how gamers talk and extracts valuable information that can power recommendations, support systems, and analytics.
Input: Gamers' natural language about games from forums, chats, reviews, etc.
Output: Structured data with categorized information about games, platforms, preferences, etc.
Here's a concrete example from our training data:
"Dirt: Showdown from 2012 is a sport racing game for the PlayStation, Xbox, PC rated E 10+ (for Everyone 10 and Older). It's not available on Steam, Linux, or Mac."
```
inform(
    name[Dirt: Showdown],
    release_year[2012],
    esrb[E 10+ (for Everyone 10 and Older)],
    genres[driving/racing, sport],
    platforms[PlayStation, Xbox, PC],
    available_on_steam[no],
    has_linux_release[no],
    has_mac_release[no]
)
```
This structured output can be used to:
- Answer specific questions about games ("Is Dirt: Showdown available on Mac?")
- Track trends in gaming discussions
- Power recommendation engines
- Extract user opinions and sentiment
- Build gaming knowledge graphs
- Enhance customer support
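To make these downstream uses concrete, here is a minimal, hypothetical sketch (not part of the GameSense codebase) of how an application might turn a meaning representation like the one above into a plain Python dict before feeding it to a recommendation engine or support bot:

```python
import re

def parse_meaning_representation(mr: str) -> dict:
    """Parse a meaning representation such as
    'inform(name[Dirt: Showdown], release_year[2012], ...)'
    into the dialogue act and its slot/value pairs.
    Illustrative helper only, not part of the GameSense codebase.
    """
    act, _, body = mr.partition("(")
    slots = dict(re.findall(r"(\w+)\[([^\]]*)\]", body))
    return {"act": act.strip(), "slots": slots}

example = (
    "inform(name[Dirt: Showdown], release_year[2012], "
    "available_on_steam[no], has_mac_release[no])"
)
parsed = parse_meaning_representation(example)
print(parsed["slots"]["has_mac_release"])  # -> "no"
```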
GameSense listens to gaming chats, forum posts, customer support tickets, social media, and other sources where gamers communicate. As gamers discuss different titles, features, opinions, and issues, GameSense:
- Recognizes gaming jargon across different genres and communities
- Extracts key information about games, platforms, features, and opinions
- Structures this information into a standardized format
- Makes it available for downstream applications
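A minimal inference sketch of that flow is shown below. It assumes a fine-tuned GameSense checkpoint has been exported to a local directory (`./gamesense-model` is a placeholder), and the prompt format is illustrative; the real format is defined by the `generate_and_tokenize_prompt` utility used during training.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./gamesense-model"  # assumption: local path to the exported model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Illustrative prompt: ask the model to convert free-form text into a meaning representation
prompt = (
    "Convert the following sentence into a meaning representation:\n"
    "I can't get Dirt: Showdown to run on my Mac.\n"
    "Output: "
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```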
Monitor conversations across Discord, Reddit, and other platforms to track what games are being discussed, what features players care about, and emerging trends.
When a player says: "I can't get Dirt: Showdown to run on my Mac," GameSense identifies:
- The specific game (Dirt: Showdown)
- The platform issue (Mac)
- The fact that the game doesn't support Mac (from structured knowledge)

With this information, a support system can immediately tell the player about the platform incompatibility.
When a player has been discussing racing games for PlayStation with family-friendly ratings, GameSense can help power recommendations for similar titles they might enjoy.
By understanding the context of gaming conversations, GameSense can better identify toxic behavior while recognizing harmless gaming slang.
GameSense uses Parameter-Efficient Fine-Tuning (PEFT) to customize powerful foundation models for understanding gaming language:
- Start with a base model such as Microsoft's Phi-2 or Llama 3.1
- Fine-tune it on the gem/viggo dataset of structured gaming conversations
- Use LoRA adapters for efficient training
- Evaluate on gaming-specific benchmarks
- Deploy to production environments
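The core of this approach is attaching small LoRA adapters to the base model so only a fraction of the parameters are trained. The sketch below shows the general pattern with the Hugging Face `peft` library; the rank, dropout, and target modules are illustrative defaults, not the exact values in the GameSense configs.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

lora_config = LoraConfig(
    r=16,                 # rank of the low-rank update matrices
    lora_alpha=32,        # scaling factor for the LoRA updates
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj"],  # assumption: attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```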
- Customer Support Automation: Understand and respond to player issues with context-aware solutions
- Community Moderation: Detect toxic language with nuanced understanding of gaming communication
- Player Insights: Extract actionable intelligence from forums, chats, and reviews
- Recommendation Systems: Power personalized game and content suggestions
- In-Game Assistants: Create NPCs or helpers that understand player intentions
- Python 3.8+
- GPU with at least 24GB VRAM (for full model training)
- ZenML installed and configured
- Neptune.ai account for experiment tracking (optional)
- Set up your Neptune.ai credentials if you want to use Neptune for experiment tracking:

  ```bash
  # Set your Neptune project name and API token as environment variables
  export NEPTUNE_PROJECT="your-neptune-workspace/your-project-name"
  export NEPTUNE_API_TOKEN="your-neptune-api-token"
  ```

- Install GameSense:

  ```bash
  # Set up a Python virtual environment
  python3 -m venv .venv
  source .venv/bin/activate

  # Install requirements
  pip install -r requirements.txt
  ```

- Run the end-to-end pipeline:

  ```bash
  # For single-GPU training
  python run.py --config orchestrator_finetune.yaml

  # For multi-GPU acceleration
  python run.py --config orchestrator_finetune.yaml --accelerate
  ```
Warning
All pipeline steps call `clean_gpu_memory(force=True)` at the beginning. This ensures that memory left over from previous steps is properly cleared.

This might affect other GPU processes running in the same environment. If you don't want to clean GPU memory between steps, you can remove these utility calls from the steps.
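For context, here is a rough sketch of what a `clean_gpu_memory(force=True)` utility typically does; the actual implementation lives in the project's utils and may differ.

```python
import gc
import torch

def clean_gpu_memory(force: bool = False) -> None:
    """Release cached GPU memory between pipeline steps."""
    if not force:
        return
    gc.collect()                      # drop Python references first
    if torch.cuda.is_available():
        torch.cuda.empty_cache()      # return cached blocks to the driver
        torch.cuda.ipc_collect()      # clean up inter-process CUDA handles
```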
The trained model will be automatically stored in your ZenML artifact store, ready for deployment.
GameSense offers flexible configuration to meet your specific gaming platform needs:
Choose from powerful foundation models:
- Microsoft Phi-2: Lightweight yet powerful for most gaming applications (default)
- Llama 3.1: Advanced capabilities for complex gaming interactions
```bash
# To use Llama 3.1 instead of Phi-2
python run.py --config configs/llama3-1_finetune_local.yaml
```
Tip
To finetune the Llama 3.1 base model, use the alternative configuration files provided in the `configs` folder:

- For remote finetuning: `llama3-1_finetune_remote.yaml`
- For local finetuning: `llama3-1_finetune_local.yaml`
By default, GameSense uses the gem/viggo dataset, which contains structured gaming information like:
| gem_id | meaning_representation | target | references |
|---|---|---|---|
| viggo-train-0 | inform(name[Dirt: Showdown], release_year[2012], esrb[E 10+ (for Everyone 10 and Older)], genres[driving/racing, sport], platforms[PlayStation, Xbox, PC], available_on_steam[no], has_linux_release[no], has_mac_release[no]) | Dirt: Showdown from 2012 is a sport racing game for the PlayStation, Xbox, PC rated E 10+ (for Everyone 10 and Older). It's not available on Steam, Linux, or Mac. | [Dirt: Showdown from 2012 is a sport racing game for the PlayStation, Xbox, PC rated E 10+ (for Everyone 10 and Older). It's not available on Steam, Linux, or Mac.] |
| viggo-train-1 | inform(name[Dirt: Showdown], release_year[2012], esrb[E 10+...]) | Dirt: Showdown is a sport racing game... | [Dirt: Showdown is a sport racing game...] |
You can also train on your own gaming conversations by formatting them in a similar structure and updating the configuration.
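If you want to inspect the default dataset yourself, the standard Hugging Face `datasets` API is enough. The sketch below assumes the Hub id `GEM/viggo`; the column names match the table above, though the actual loading logic in GameSense lives in the `prepare_datasets` step and may differ in detail.

```python
from datasets import load_dataset

dataset = load_dataset("GEM/viggo")  # assumption: dataset id on the Hugging Face Hub

print(dataset)                                         # train / validation / test splits
print(dataset["train"][0]["target"])                   # the natural-language sentence
print(dataset["train"][0]["meaning_representation"])   # the structured inform(...) output
```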
For faster training on high-end hardware:
- Multi-GPU Training: Distribute training across multiple GPUs using Distributed Data Parallelism (DDP)
- Mixed Precision: Optimize memory usage without sacrificing quality
```bash
# Enable distributed training across all available GPUs
python run.py --config orchestrator_finetune.yaml --accelerate
```
Under the hood, the finetuning step will spin up an accelerated job using Hugging Face Accelerate, which will run on all available GPUs.
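The sketch below illustrates the kind of loop Accelerate wraps: the same code runs unchanged on one GPU, multiple GPUs (DDP), or CPU, because `Accelerator` prepares the model, optimizer, and dataloader for whatever hardware the job was launched on. It is illustrative only, not the project's actual training loop.

```python
from accelerate import Accelerator

def train_with_accelerate(model, optimizer, train_dataloader, epochs: int = 1):
    """Sketch of a training loop that Accelerate distributes across available GPUs."""
    accelerator = Accelerator()  # reads the launch config (processes, precision, etc.)
    model, optimizer, train_dataloader = accelerator.prepare(
        model, optimizer, train_dataloader
    )
    for _ in range(epochs):
        for batch in train_dataloader:
            optimizer.zero_grad()
            loss = model(**batch).loss
            accelerator.backward(loss)  # handles gradient sync and mixed precision
            optimizer.step()
```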
For production deployment, GameSense can be trained and served on cloud infrastructure:
- Set up your cloud environment:

  - Register an orchestrator or step operator with GPU access (at least 24GB VRAM)
  - Register a remote artifact store and container registry
  - To access GPUs with sufficient VRAM, you may need to increase your GPU quota (AWS, GCP, Azure)
  - If the CUDA version on your GPU instance is incompatible with the default Docker image, modify it in the configuration file. See available PyTorch images

  ```bash
  # Register a complete stack with GPU support
  zenml stack register gamesense-stack -o <ORCHESTRATOR_NAME> \
      -a <ARTIFACT_STORE_NAME> \
      -c <CONTAINER_REGISTRY_NAME> \
      [-s <STEP_OPERATOR_NAME>]
  ```

- Launch remote training:

  ```bash
  # For cloud-based training
  python run.py --config configs/llama3-1_finetune_remote.yaml
  ```
To fine-tune GameSense on your specific gaming platform's data:
- Prepare your dataset: Format your gaming conversations, support tickets, or forum posts
- Update the configuration: Modify the `dataset_name` parameter in your config file
- Adjust tokenization: If needed, customize the `generate_and_tokenize_prompt` function
For detailed instructions on data preparation, see our data customization guide.
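As a rough guide, a `generate_and_tokenize_prompt`-style function builds a training prompt from one dataset row and tokenizes it. The version below is an illustrative stand-in; the real implementation lives in `utils/tokenizer.py` and its prompt template may differ.

```python
def generate_and_tokenize_prompt(sample: dict, tokenizer, max_length: int = 512) -> dict:
    """Build a training prompt from one dataset row and tokenize it (illustrative)."""
    prompt = (
        "Convert the following sentence into a meaning representation:\n"
        f"{sample['target']}\n"
        f"Output: {sample['meaning_representation']}"
    )
    tokenized = tokenizer(
        prompt,
        truncation=True,
        max_length=max_length,
        padding="max_length",
    )
    tokenized["labels"] = tokenized["input_ids"].copy()  # causal LM: labels mirror inputs
    return tokenized
```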
GameSense includes built-in evaluation using industry-standard metrics:
- ROUGE Scores: Measure how well the model can generate natural language from structured data
- Gaming-Specific Benchmarks: Evaluate understanding of gaming terminology
- Automatic Model Promotion: Only deploy models that meet quality thresholds
All metrics are tracked in the ZenML dashboard for easy monitoring and comparison.
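The evaluation-and-promotion idea looks roughly like the sketch below, using the `evaluate` library for ROUGE. The threshold and the example strings are illustrative; the real logic lives in `steps/evaluate_model.py` and `steps/promote.py`.

```python
import evaluate

rouge = evaluate.load("rouge")

predictions = ["Dirt: Showdown is a 2012 sport racing game for PlayStation, Xbox and PC."]
references = ["Dirt: Showdown from 2012 is a sport racing game for the PlayStation, Xbox, PC."]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # e.g. {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}

# Promote only if the model clears a quality bar (0.5 is an illustrative threshold).
if scores["rougeL"] > 0.5:
    print("Model meets the promotion threshold.")
```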
GameSense follows a modular architecture for easy customization:
```
├── configs                          # Configuration profiles for different deployment scenarios
│   ├── orchestrator_finetune.yaml   # Default local or remote orchestrator configuration
│   └── remote_finetune.yaml         # Default step operator configuration
├── materializers                    # Custom data handlers for gaming-specific content
│   └── directory_materializer.py    # Custom materializer to push directories to the artifact store
├── pipelines                        # Core pipeline definitions
│   └── train.py                     # Finetuning and evaluation pipeline
├── steps                            # Individual pipeline components
│   ├── evaluate_model.py            # Gaming-specific evaluation metrics
│   ├── finetune.py                  # Model customization for gaming terminology
│   ├── log_metadata.py              # Helper step for model metadata logging
│   ├── prepare_datasets.py          # Gaming data processing
│   └── promote.py                   # Production deployment logic
├── utils                            # Utility functions
│   ├── callbacks.py                 # Custom callbacks
│   ├── loaders.py                   # Loaders for models and data
│   ├── logging.py                   # Logging helpers
│   └── tokenizer.py                 # Load and tokenize
└── run.py                           # CLI tool to run pipelines on ZenML Stack
```
To fine-tune GameSense on your specific gaming platform's data:
- Format your dataset: Prepare your gaming conversations in a structured format similar to gem/viggo
- Update the configuration: Point to your dataset in the config file
- Run the pipeline: GameSense will automatically process and learn from your data
The `prepare_data` step handles:

- Loading, tokenizing, and storing the dataset from an external source to your artifact store
- Loading datasets from Hugging Face (requires `train`, `validation`, and `test` splits by default)
- Tokenization via the `generate_and_tokenize_prompt` utility function
For custom data sources, you'll need to prepare the splits in a Hugging Face dataset format. The step returns paths to the stored datasets (`train`, `val`, and `test_raw` splits), with the test set tokenized later during evaluation.
You can structure conversations from:
- Game forums
- Support tickets
- Discord chats
- Streaming chats
- Reviews
- Social media posts
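A minimal sketch of turning such conversations into the Hugging Face format the pipeline expects is shown below. The column names follow the gem/viggo table above; the file names and the Hub repository id are assumptions for illustration.

```python
import json
from datasets import Dataset, DatasetDict

def load_rows(path):
    """Each line: {"target": "...", "meaning_representation": "inform(...)"}."""
    with open(path) as f:
        return [json.loads(line) for line in f]

dataset = DatasetDict({
    "train": Dataset.from_list(load_rows("my_platform_train.jsonl")),        # assumed file
    "validation": Dataset.from_list(load_rows("my_platform_validation.jsonl")),
    "test": Dataset.from_list(load_rows("my_platform_test.jsonl")),
})

# Push to the Hub (or save locally), then point dataset_name in your config at this id.
dataset.push_to_hub("your-org/your-gaming-dataset")  # assumed repository id
```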
For learning more about how to use ZenML to build your own MLOps pipelines, refer to our comprehensive ZenML documentation.
If you don't have access to a GPU, you can still run this project with the CPU-only configuration. We've made several optimizations to make this project work on CPU, including:
- Smaller batch sizes for reduced memory footprint
- Fewer training steps
- Disabled GPU-specific features (quantization, bf16, etc.)
- Using smaller test datasets for evaluation
- Special handling for Phi-3.5 model caching issues on CPU
To run the project on CPU:
```bash
python run.py --config phi3.5_finetune_cpu.yaml
```
Note that training on CPU will be significantly slower than training on a GPU. The CPU configuration uses:
- A smaller model (`phi-3.5-mini-instruct`), which is more CPU-friendly
- Reduced batch size and increased gradient accumulation steps
- Fewer total training steps (50 instead of 300)
- Half-precision (float16) where possible to reduce memory usage
- Smaller dataset subsets (100 training samples, 20 validation samples, 10 test samples)
- Special compatibility settings for Phi models running on CPU
For best results, we recommend:
- Using a machine with at least 16GB of RAM
- Being patient! LLM training on CPU is much slower than on GPU
- If you still encounter memory issues, try reducing the `max_train_samples` parameter even further in the config file
Some large language models like Phi-3.5 have caching mechanisms that are optimized for GPU usage and may encounter issues when running on CPU. Our CPU configuration includes several workarounds:
- Disabling KV caching for model generation
- Using the `torch.float16` data type to reduce memory usage
- Disabling flash attention, which isn't needed on CPU
- Using standard AdamW optimizer instead of 8-bit optimizers that require GPU
These changes allow the model to run on CPU with less memory and avoid compatibility issues, although at the cost of some performance.
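As a rough illustration, these workarounds are typically applied when loading the model, along the lines of the sketch below; the exact settings used by `phi3.5_finetune_cpu.yaml` may differ.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct",
    torch_dtype=torch.float16,    # half precision to reduce memory usage
    attn_implementation="eager",  # flash attention isn't needed (or available) on CPU
    use_cache=False,              # disable KV caching to avoid CPU compatibility issues
)
```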