oracle-samples/heatwave-ml

HeatWave AutoML examples and performance benchmarks

HeatWave is an integrated, massively parallel, high-performance, in-memory query accelerator for MySQL Database Service that accelerates performance of MySQL by orders of magnitude for analytics and mixed workloads. It is the only service that enables you to run OLTP and OLAP workloads simultaneously and directly from your MySQL database, without any changes to your applications. This eliminates the need for complex, time-consuming, and expensive data movement and integration with a separate analytics database. Your applications connect to the HeatWave cluster through standard MySQL protocols.

HeatWave users currently do not have an easy way to create machine learning models for the data in their database, or to generate predictions and explanations from those models. Such users are often database experts but relatively new to machine learning, and can benefit from products that streamline the creation and use of machine learning models. HeatWave AutoML is the product that addresses this need.

Required Services:

  1. Oracle Cloud Infrastructure
  2. MySQL Database Service and HeatWave

Getting started

  1. Provision a MySQL Database Service instance and add a HeatWave cluster.
  2. Clone this repository and change into its directory:
       git clone https://github.com/oracle-samples/heatwave-ml.git
       cd heatwave-ml
  3. Create a Python virtual environment and activate it:
       python3.8 -m venv py_heatwaveml
       source py_heatwaveml/bin/activate
  4. Install the necessary Python packages:
       pip install pandas numpy unlzw3 scikit-learn pyreadr
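After the install step, it can be useful to confirm that every package is importable from the active virtual environment. The following is a minimal sketch, not part of the repository: the `missing_packages` helper and the `REQUIRED` mapping are hypothetical names introduced here for illustration. It relies only on the standard library and on the fact that `scikit-learn` is imported under the module name `sklearn`.

```python
from importlib.util import find_spec

# Map each pip distribution name from the install step to its import name.
# scikit-learn is the one that differs: it is imported as "sklearn".
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "unlzw3": "unlzw3",
    "scikit-learn": "sklearn",
    "pyreadr": "pyreadr",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it with the virtual environment activated; an empty result means the environment is ready for the notebooks and examples.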

Python Notebooks

To help customers get started with HeatWave ML and showcase its capabilities, we have prepared a set of Jupyter notebooks. Each notebook focuses on a simple, practical application of HeatWave ML components and walks you through a solution. Here is the list of existing notebooks, with links to each.

| Description | Link |
| --- | --- |
| Demonstrates the application of ML_GENERATE for content generation using data from the 2024 Olympic Games | HeatWave, MySQL AI |
| Demonstrates the application of ML_GENERATE for summarization using data from the 2024 Olympic Games | HeatWave, MySQL AI |
| Showcases the use of ML_RAG for Retrieval Augmented Generation (RAG) and HEATWAVE_CHAT for natural language interactions using data from the 2024 Olympic Games | HeatWave, MySQL AI |
| Trains a model to predict whether a bank customer will subscribe to a term deposit | HeatWave, MySQL AI |
| Predicts the price of a diamond based on its characteristics and prior prices of other diamonds | HeatWave, MySQL AI |
| Trains an unsupervised anomaly detection model. Here, "unsupervised" means the model is trained without using the "Class" label (fraudulent or legitimate); it relies on the inherent patterns and structure of the transaction data to identify deviations from the norm. | HeatWave, MySQL AI |
| Builds a personalized movie recommendation system using the MovieLens 100K dataset | HeatWave, MySQL AI |
| Builds and evaluates a forecasting model using the synthetic Electricity Consumption dataset | HeatWave, MySQL AI |
| Builds a LangChain chatbot using HeatWave GenAI, showing how HeatWave GenAI can be used with any LangChain application | HeatWave |
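To give a feel for what the ML_GENERATE notebooks do, the sketch below builds a parameterized `SELECT sys.ML_GENERATE(...)` statement in Python. It is an illustration under assumptions, not code from the notebooks: `generate_query` is a hypothetical helper, the `%s` placeholders assume a DB-API driver that uses that parameter style (for example mysql-connector-python), and only the `task` option is passed, leaving the model choice to the server default. Check the ML_GENERATE documentation for your HeatWave version before relying on the exact options.

```python
# Tasks named in the notebook list above.
SUPPORTED_TASKS = ("generation", "summarization")


def generate_query(prompt, task="generation"):
    """Return (sql, params) for a parameterized ML_GENERATE call.

    The prompt and task are passed as bound parameters so that the
    driver handles quoting and escaping of user-supplied text.
    """
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"unsupported task: {task!r}")
    sql = "SELECT sys.ML_GENERATE(%s, JSON_OBJECT('task', %s));"
    return sql, (prompt, task)


if __name__ == "__main__":
    sql, params = generate_query(
        "Summarize the results of the 2024 Olympic Games.",
        task="summarization",
    )
    print(sql)
    print(params)
```

With an open connection, the pair would typically be passed as `cursor.execute(sql, params)`.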

SQL examples

SQL code to run training, prediction, and scoring on a variety of common machine learning classification and regression datasets.

| Example | Description | #Rows (Training Set) | #Features |
| --- | --- | --- | --- |
| airlines | Predict Flight Delays | 377568 | 8 |
| bank_marketing | Direct marketing – Banking Products | 31648 | 17 |
| cnae-9 | Documents with free text business descriptions of Brazilian companies | 757 | 857 |
| connect-4 | 8-ply positions in the game of connect-4 in which neither player has won yet – predict win/loss | 47290 | 161 |
| fashion_mnist | Clothing classification problem | 60000 | 785 |
| nomao | Active learning is used to efficiently detect data that refer to a same place based on Nomao browser | 24126 | 119 |
| numerai | Data is cleaned, regularized and encrypted global equity data | 67425 | 22 |
| higgs | Monte Carlo Simulations | 10500000 | 29 |
| census | Determine if a person makes > $50k | 32561 | 15 |
| titanic | Survival Status of individuals | 917 | 14 |
| creditcard | Identify fraudulent transactions | 199364 | 30 |
| appetency | Predict the propensity of customers to buy new products | 35000 | 230 |
| black_friday | Customer purchases on Black Friday | 116774 | 10 |
| diamonds | Predict price of a diamond | 37758 | 10 |
| mercedes | Time the car took to pass testing | 2946 | 377 |
| news_popularity | Predict the number of shares of article in social networks (popularity) | 27750 | 60 |
| nyc_taxi | Predict tip amount for NYC taxi cab | 407284 | 15 |
| twitter | The popularity of a topic on social media | 408275 | 78 |
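The train, predict, and score flow that these SQL examples cover can be sketched as a short sequence of HeatWave AutoML routine calls. The Python helper below only assembles the statements as strings; it is a rough illustration, not the repository's scripts. The function name `automl_statements`, the `_train` / `_test` / `_predictions` table suffixes, and the schema and column names in the usage example are all assumptions made here. The optional trailing arguments of the `sys.ML_*` routines differ between HeatWave versions, so check the documentation for yours.

```python
import re

# Allow only simple identifiers, since they are interpolated into SQL text.
_IDENTIFIER = re.compile(r"^[A-Za-z0-9_]+$")


def automl_statements(schema, table, target,
                      task="classification", metric="accuracy"):
    """Return the SQL statements for a train / predict / score run.

    Assumes (hypothetically) that the data is already loaded into
    <schema>.<table>_train and <schema>.<table>_test.
    """
    for name in (schema, table, target, task, metric):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"invalid identifier: {name!r}")

    train = f"{schema}.{table}_train"
    test = f"{schema}.{table}_test"
    preds = f"{schema}.{table}_predictions"
    return [
        # Train a model; its handle is stored in the @model session variable.
        f"CALL sys.ML_TRAIN('{train}', '{target}', "
        f"JSON_OBJECT('task', '{task}'), @model);",
        # Load the trained model before using it.
        "CALL sys.ML_MODEL_LOAD(@model, NULL);",
        # Write predictions for the test table into a new table.
        f"CALL sys.ML_PREDICT_TABLE('{test}', @model, '{preds}', NULL);",
        # Score the model on the labeled test table.
        f"CALL sys.ML_SCORE('{test}', '{target}', @model, "
        f"'{metric}', @score, NULL);",
        "SELECT @score;",
    ]


if __name__ == "__main__":
    # Hypothetical schema and target column names.
    for stmt in automl_statements("ml_data", "census", "revenue"):
        print(stmt)
```

For a regression dataset such as diamonds, the same helper could be called with `task="regression"` and `metric="r2"`.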

Contributing

This project welcomes contributions from the community. Before submitting a pull request, please review our contribution guide.

Security

Please consult the security guide for our responsible security vulnerability disclosure process.

License

Copyright (c) 2025 Oracle and/or its affiliates.

Released under the Universal Permissive License v1.0 as shown at https://oss.oracle.com/licenses/upl/.
