SQuery-spark

A Clojure library for querying and data processing on Apache Spark.

Rationale

SQL is a query DSL.
Python, Java, and Scala are general-purpose programming languages.

Clojure can do both in one language.

Why Clojure

Using Clojure macros, we can build a DSL.
It is also a general-purpose programming language.
DSL code and normal Clojure code can be combined.
It is made for the JVM and can use Java libraries.
It is dynamic, simple, and practical.
It is functional, which makes data processing easy.
Its syntax supports nested pipelines, not just vertical pipelines.

Design goals

Have Clojure-like syntax and Clojure names
(see SQuery, the MongoDB query language that is based on Clojure syntax).
Use a macro so that Clojure operators can be used inside queries without namespace-qualified names.
Be as compact and simple as possible.
Be programmable, not code in strings like SQL.
Be simpler than all the alternatives (Java, Scala, Python), including SQL.

Overall, it should feel as if Clojure were a query language for Spark.

Example

This example is very simple; in more complicated queries the difference is much bigger.

SQuery

(q df
   {:isExpensive 
    (and (= :StockCode "DOT") (or (> :UnitPrice 600) (substring? "POSTAGE" :description)))}
   ((true? :isExpensive))
   [:UnitPrice :isExpensive])

Scala

import org.apache.spark.sql.functions.col

val DOTCodeFilter = col("StockCode") === "DOT"
val priceFilter = col("UnitPrice") > 600
val descripFilter = col("Description").contains("POSTAGE")
df.withColumn("isExpensive", DOTCodeFilter.and(priceFilter.or(descripFilter)))
  .where("isExpensive")
  .select("unitPrice", "isExpensive")

Python

from pyspark.sql.functions import col, instr

DOTCodeFilter = col("StockCode") == "DOT"
priceFilter = col("UnitPrice") > 600
descripFilter = instr(col("Description"), "POSTAGE") >= 1
df.withColumn("isExpensive", DOTCodeFilter & (priceFilter | descripFilter))\
  .where("isExpensive")\
  .select("unitPrice", "isExpensive")

SQL

SELECT UnitPrice, (StockCode = 'DOT' AND
  (UnitPrice > 600 OR instr(Description, 'POSTAGE') >= 1)) AS isExpensive
FROM dfTable
WHERE (StockCode = 'DOT' AND (UnitPrice > 600 OR instr(Description, 'POSTAGE') >= 1))
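
The four versions above encode the same boolean condition. As a rough illustration only, here is a minimal sketch in plain Python (no Spark) that applies that condition to a few made-up rows. The rows and the `is_expensive` helper are hypothetical and are not part of SQuery or of any dataset used by this project.

```python
def is_expensive(row: dict) -> bool:
    """StockCode is 'DOT' and (UnitPrice > 600 or Description contains 'POSTAGE')."""
    return row["StockCode"] == "DOT" and (
        row["UnitPrice"] > 600 or "POSTAGE" in row["Description"]
    )


# Made-up rows for illustration, not taken from any real dataset.
rows = [
    {"StockCode": "DOT", "UnitPrice": 700.0, "Description": "DOTCOM POSTAGE"},
    {"StockCode": "DOT", "UnitPrice": 50.0, "Description": "DOTCOM POSTAGE"},
    {"StockCode": "DOT", "UnitPrice": 50.0, "Description": "MANUAL"},
    {"StockCode": "22423", "UnitPrice": 900.0, "Description": "CAKESTAND"},
]

# Rough equivalent of: add isExpensive, keep rows where it is true,
# then select UnitPrice and isExpensive.
result = [
    {"UnitPrice": r["UnitPrice"], "isExpensive": True}
    for r in rows
    if is_expensive(r)
]
print(result)
```

Only the first two rows pass: the third fails both inner conditions, and the fourth has a different StockCode.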

SQuery mixed with Clojure

An example of computing the working days, and their count, between two dates.

(q t1
   [{:date1 (date "2006-11-09" "yyyy-MM-dd")}
    {:date2 (date "2006-12-09" "yyyy-MM-dd")}]
   {:diff (days-diff :date2 :date1)}
   {:working-dates  (reduce (fn [v t]
                              (let [dt (add-days :date1 (int t))
                                    dt-n (day-of-week dt)]
                                (if- (and (not= dt-n 1) (not= dt-n 7))
                                  (conj v dt)
                                  v)))
                            (date-array [])
                            (range :diff))}
   {:working-days-count (count :working-dates)}
   (show false))
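
As a cross-check of what the query above is meant to compute, here is a minimal sketch in plain Python (no Spark). It assumes that `range` covers the offsets 0 up to `diff - 1`, and that `day-of-week` follows Spark's `dayofweek` numbering, where 1 is Sunday and 7 is Saturday. The function name `working_dates` is hypothetical and not part of SQuery.

```python
from datetime import date, timedelta


def working_dates(date1: date, date2: date) -> list:
    """Return the weekdays from date1 up to (not including) date2."""
    diff = (date2 - date1).days            # plays the role of days-diff
    result = []
    for t in range(diff):                  # offsets 0 .. diff - 1
        dt = date1 + timedelta(days=t)     # plays the role of add-days
        # isoweekday(): Monday=1 .. Sunday=7, so 6 and 7 are the weekend.
        # Assumed to match skipping 1 (Sunday) and 7 (Saturday) in the query.
        if dt.isoweekday() not in (6, 7):
            result.append(dt)
    return result


dates = working_dates(date(2006, 11, 9), date(2006, 12, 9))
print(len(dates))
print(dates[0], dates[-1])
```

The loop plays the role of the `reduce` in the query: it starts from an empty collection and appends each date that does not fall on a weekend.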

Learn

Examples are added by solving SQL problems with SQuery.
The database used for data storage is MongoDB, but you can use any.
MongoDB is used because SQuery also runs on MongoDB, so both can be queried with almost the same syntax.

SQL Practice Problems: 57 beginning, intermediate, and advanced challenges for you to solve using a “learn-by-doing” approach, by Sylvia Moestl Vasilik (completed)
Spark: The Definitive Guide: Big Data Processing Made Simple (partially completed)
SQL Cookbook: Query Solutions and Techniques for All SQL Users (partially completed)

*SQuery is a work in progress, so some examples might give compile errors if SQuery has changed since they were written.

Usage

Don't use it yet. It is under construction and changes constantly. Use it for testing only.
For now, SQL-style queries using the DataFrame API are supported, and RDD support has been added.

Repository: tkaryadis/squery-spark
