GitHub - 51nb/marble: A high performance in-memory hive sql engine based on Apache Calcite

Marble is a high-performance in-memory Hive SQL engine based on Apache Calcite.
It can help you migrate Hive SQL scripts to a real-time computing system.
It also provides a convenient Table API to help you build custom SQL engines.

You may want another similar project: direct-spark-sql

Build and run tests

Requirements

Java 1.8 as a build JDK
Maven

1. Build marble

cd marble
mvn clean install -DskipTests

(Optional) If you need to modify the Calcite patches, build the calcite-patch project first:

git clone https://github.com/51nb/calcite-patch.git
cd calcite-patch
mvn clean install -DskipTests

In the long term, we hope to merge these patches into Calcite.

2. Import the marble project into your IDE, but do not import calcite-patch as a submodule of the marble project.

3. Run the tests TableEnvTest and HiveTableEnvTest.

Usage

Maven dependency

        <dependency>
            <groupId>org.codehaus.janino</groupId>
            <artifactId>janino</artifactId>
            <version>3.0.11</version>
        </dependency>
        <dependency>
            <groupId>org.codehaus.janino</groupId>
            <artifactId>commons-compiler</artifactId>
            <version>3.0.11</version>
        </dependency>
        <dependency>
            <groupId>com.u51.marble</groupId>
            <artifactId>marble-table-hive</artifactId>
            <version>1.0.0</version>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.calcite</groupId>
                    <artifactId>calcite-core</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.calcite</groupId>
                    <artifactId>calcite-linq4j</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.codehaus.janino</groupId>
                    <artifactId>janino</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.codehaus.janino</groupId>
                    <artifactId>commons-compiler</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

API Overview

// Enable the SQL plan cache with a size of 200.
TableEnv.enableSqlPlanCacheSize(200);

TableEnv tableEnv = HiveTableEnv.getTableEnv();

// Build DataTables from different kinds of input.
DataTable t1 = tableEnv.fromJavaPojoList(pojoList);
DataTable t2 = tableEnv.fromJdbcResultSet(resultSet);
DataTable t3 = tableEnv.fromRowListWithSqlTypeMap(rowList, sqlTypeMap);

// Register the tables under a sub-schema, then query them.
tableEnv.addSubSchema("test");
tableEnv.registerTable("test", "t1", t1);
tableEnv.registerTable("test", "t2", t2);
DataTable queryResult = tableEnv.sqlQuery("select * from test.t1 join test.t2 on t1.id=t2.id");
// Named resultRows so it does not clash with the input rowList above.
List<Map<String, Object>> resultRows = queryResult.toMapList();

Enabling the plan cache is recommended when the same SQL query is executed repeatedly:

TableEnv.enableSqlPlanCacheSize(200);
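
To illustrate why a plan cache helps with repeated queries, here is a minimal sketch of a bounded cache keyed by SQL text. It is not Marble's implementation: the class name SqlPlanCacheSketch, the getOrPlan method, and the least-recently-used eviction policy are all assumptions made for this example, and the "planner" is just a function supplied by the caller.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical bounded plan cache keyed by SQL text (illustration only, not Marble's class). */
final class SqlPlanCacheSketch<P> {
  private final Map<String, P> plans;
  private int misses = 0;

  SqlPlanCacheSketch(final int maxSize) {
    // accessOrder=true keeps entries ordered from least to most recently used.
    this.plans = new LinkedHashMap<String, P>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, P> eldest) {
        // Evict the least recently used plan once the cache grows past maxSize.
        return size() > maxSize;
      }
    };
  }

  /** Returns the cached plan for this SQL text, planning it only on a cache miss. */
  synchronized P getOrPlan(String sql, Function<String, P> planner) {
    P plan = plans.get(sql);
    if (plan == null) {
      misses++;
      plan = planner.apply(sql);
      plans.put(sql, plan);
    }
    return plan;
  }

  synchronized int misses() {
    return misses;
  }

  synchronized int size() {
    return plans.size();
  }
}
```

With a cache like this, the expensive parse, validate, and optimize steps run once per distinct SQL string, and later executions of the same string reuse the stored plan. Keying on the raw SQL text means that queries differing only in whitespace or literals count as different entries.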

TableEnv is the main Table API for executing SQL queries on a data set.
It can be used to:

convert a Java POJO List or a JDBC ResultSet to a DataTable
register a DataTable in TableEnv's catalog
add sub-schemas and customized functions to TableEnv's catalog
execute a SQL query to get the result as a DataTable
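
As a rough picture of what converting a POJO list into tabular rows involves, the sketch below turns each object into a map from field name to value using reflection. This is an illustration under stated assumptions, not how Marble builds a DataTable: PojoRowsSketch, toRows, and the User class are hypothetical names introduced here.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that flattens POJOs into rows (illustration only). */
final class PojoRowsSketch {
  private PojoRowsSketch() {
  }

  /** Maps each POJO to a row: one entry per non-static declared field. */
  static List<Map<String, Object>> toRows(List<?> pojos) {
    List<Map<String, Object>> rows = new ArrayList<>();
    for (Object pojo : pojos) {
      Map<String, Object> row = new LinkedHashMap<>();
      for (Field field : pojo.getClass().getDeclaredFields()) {
        // Skip static and compiler-generated fields; they are not column data.
        if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
          continue;
        }
        field.setAccessible(true);
        try {
          row.put(field.getName(), field.get(pojo));
        } catch (IllegalAccessException e) {
          throw new IllegalStateException("Cannot read field " + field.getName(), e);
        }
      }
      rows.add(row);
    }
    return rows;
  }

  /** Example POJO used only for this sketch. */
  static final class User {
    private final long id;
    private final String name;

    User(long id, String name) {
      this.id = id;
      this.name = name;
    }
  }
}
```

Each field name plays the role of a column name and each object becomes one row, which is the same shape that the toMapList() call in the API overview returns for a query result.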

TableEnv supports Calcite's SQL dialect by default; see its SQL reference.
The goal of HiveTableEnv is to support Hive SQL as far as possible. Developers can also use a TableConfig to create a new TableEnv that supports other SQL dialects (MysqlTableEnv, PostgreTableEnv, etc.).

Supported Hive SQL features

Hive-specific keywords and operators
all UDFs and UDAFs
some UDTFs
implicit type casting

Load customized UDFs and UDAFs by package name:

HiveTableEnv.registerHiveFunctionPackages("com.u51.data.hive.udf");

Benchmark

The benchmark module contains some benchmark tests that compare Flink, Spark, and Marble on simple SQL queries.

Design

The diagram how_marble_customized_calcite.jpg in the repository shows how Marble customizes Calcite in the SQL processing flow.
You can find more details in calcite-patch's commit history. Marble currently uses Calcite 1.18.0.

The main type mapping between calcite and hive is:

CalciteSqlType	JavaStorageType	HiveObjectInspector
BIGINT	Long	LongObjectInspector
INTEGER	Int	IntObjectInspector
DOUBLE	Double	DoubleObjectInspector
DECIMAL	BigDecimal	HiveDecimalObjectInspector
VARCHAR	String	StringObjectInspector
DATE	Int	DateObjectInspector
TIMESTAMP	Long	TimestampObjectInspector
ARRAY	List	StandardListObjectInspector
......	......	......
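
The first two columns of the table can be read as a lookup from SQL type name to Java storage class. The sketch below writes that lookup out and adds two small conversions. It assumes that the Int for DATE and the Long for TIMESTAMP follow Calcite's usual internal representation (days since the Unix epoch and milliseconds since the Unix epoch); the class and method names are hypothetical, and the rows elided by "......" in the table are left out.

```java
import java.math.BigDecimal;
import java.time.Instant;
import java.time.LocalDate;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical lookup mirroring the table above (illustration only). */
final class TypeMappingSketch {
  /** Calcite SQL type name to the Java class used to store its values. */
  static final Map<String, Class<?>> JAVA_STORAGE_TYPES;

  static {
    Map<String, Class<?>> m = new LinkedHashMap<>();
    m.put("BIGINT", Long.class);
    m.put("INTEGER", Integer.class);
    m.put("DOUBLE", Double.class);
    m.put("DECIMAL", BigDecimal.class);
    m.put("VARCHAR", String.class);
    m.put("DATE", Integer.class);    // assumed: days since 1970-01-01
    m.put("TIMESTAMP", Long.class);  // assumed: milliseconds since the epoch
    m.put("ARRAY", List.class);
    JAVA_STORAGE_TYPES = Collections.unmodifiableMap(m);
  }

  private TypeMappingSketch() {
  }

  /** Converts a calendar date to the assumed int storage form (epoch days). */
  static int dateToStorage(LocalDate date) {
    return Math.toIntExact(date.toEpochDay());
  }

  /** Converts an instant to the assumed long storage form (epoch milliseconds). */
  static long timestampToStorage(Instant instant) {
    return instant.toEpochMilli();
  }
}
```

Under these assumptions, a DATE column handed to a Hive UDF has to be translated between the int storage value and whatever the DateObjectInspector expects, which is the kind of bridging the third column of the table refers to.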

Roadmap

Improve compatibility with Hive SQL. (high priority)
Submit patches to Calcite to make it easier to upgrade calcite-core. Related issues: CALCITE-2282, CALCITE-2973, CALCITE-2992. (high priority)
Implement UDTFs in a generic way. (high priority)
Constant folding for Hive UDFs. (low priority)
Use a customized SQL planner to replace the default PlannerImpl. (low priority)
Run TPC-DS queries with a customized scale. (low priority)
Vectorized UDF execution. (experimental)
Distributed broadcast join. (experimental)
Cost-based optimizer. (experimental)

For more, see the project's issues.

Contributing

Contributions are welcome. Please use the Calcite-idea-code-style.xml file under the marble directory to reformat code, and make sure the Maven checkstyle plugin validation passes after building the source.

License

This library is distributed under the terms of the Apache License 2.0.
