feat & docs: add database source sqlite3 and add document of bulk copy in sql server #54

Merged
merged 6 commits into from
Oct 8, 2024
1 change: 1 addition & 0 deletions README.md
@@ -32,6 +32,7 @@ This data synchronization tool has the synchronization capability for the follow
| | DB2 LUW | √ | √ | [Read](datax/plugin/reader/db2/README.md)、[Write](datax/plugin/writer/db2/README.md) |
| | SQL Server | √ | √ | [Read](datax/plugin/reader/sqlserver/README.md)、[Write](datax/plugin/writer/sqlserver/README.md) |
| | Oracle | √ | √ | [Read](datax/plugin/reader/oracle/README.md)、[Write](datax/plugin/writer/oracle/README.md) |
| | Sqlite3 | √ | √ | [Read](datax/plugin/reader/sqlite3/README.md)、[Write](datax/plugin/writer/sqlite3/README.md) |
| Unstructured Data Stream | CSV | √ | √ | [Read](datax/plugin/reader/csv/README.md)、[Write](datax/plugin/writer/csv/README.md) |
| | XLSX(excel) | √ | √ | [Read](datax/plugin/reader/xlsx/README.md)、[Write](datax/plugin/writer/xlsx/README.md) |

15 changes: 14 additions & 1 deletion README_USER.md
@@ -92,6 +92,7 @@ The configurations for `reader` and `writer` are as follows:
| | DB2 LUW | √ | √ | [Read](datax/plugin/reader/db2/README.md), [Write](datax/plugin/writer/db2/README.md) |
| | SQL Server | √ | √ | [Read](datax/plugin/reader/sqlserver/README.md), [Write](datax/plugin/writer/sqlserver/README.md) |
| | Oracle | √ | √ | [Read](datax/plugin/reader/oracle/README.md), [Write](datax/plugin/writer/oracle/README.md) |
| | Sqlite3 | √ | √ | [Read](datax/plugin/reader/sqlite3/README.md), [Write](datax/plugin/writer/sqlite3/README.md) |
| Unstructured Stream | CSV | √ | √ | [Read](datax/plugin/reader/csv/README.md), [Write](datax/plugin/writer/csv/README.md) |
| | XLSX (excel) | √ | √ | [Read](datax/plugin/reader/xlsx/README.md), [Write](datax/plugin/writer/xlsx/README.md) |

@@ -186,7 +187,19 @@ datax -c examples/postgrescsv/config.json
datax -c examples/postgresxlsx/config.json
```

##### 2.1.2.10 Other Synchronization Examples
##### 2.1.2.10 Synchronizing with sqlite3

* Before use, download the SQLite build for your platform from the [SQLite Download Page](https://www.sqlite.org/download.html).
* Note: On Windows, set `path=%path%;/opt/sqlite/sqlite3.dll`.
* Initialize the database using `cmd/datax/examples/sqlite3/init.sql` **for testing purposes**.
* In `examples/sqlite3/config.json`, `url` is the path of the sqlite3 database file. On Windows it can be `E:\sqlite3\test.db`; on Linux it can be `/sqlite3/test.db`.
* Run the sqlite3 synchronization command:

```bash
datax -c examples/sqlite3/config.json
```
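
The initialization step above can also be done without the `sqlite3` command-line shell. The following is a minimal sketch, assuming Python 3 with its standard `sqlite3` module is available. The `init_test_db` and `count_rows` helpers and the temporary database path are illustrative and not part of go-etl; the SQL is copied from `cmd/datax/examples/sqlite3/init.sql`.

```python
import os
import sqlite3
import tempfile

# Contents of cmd/datax/examples/sqlite3/init.sql, inlined so the sketch is self-contained.
INIT_SQL = """
drop table if exists "type_table";
create table "type_table" (
    "t_integer" integer,
    "t_real" real,
    "t_text" text
);
insert into "type_table" values (1, 1.01, 123456);
drop table if exists "type_table_copy";
create table "type_table_copy" (
    "t_integer" integer,
    "t_real" real,
    "t_text" text
);
"""


def init_test_db(path):
    """Create (or reset) the test tables in the database file at `path`."""
    conn = sqlite3.connect(path)
    try:
        conn.executescript(INIT_SQL)
        conn.commit()
    finally:
        conn.close()


def count_rows(path, table):
    """Return the number of rows in `table`; handy for checking the copy after a sync."""
    conn = sqlite3.connect(path)
    try:
        # The table name is a trusted constant here, so quoting it is sufficient.
        return conn.execute(f'select count(*) from "{table}"').fetchone()[0]
    finally:
        conn.close()


# Hypothetical location; in practice use the same path as `url` in examples/sqlite3/config.json.
db_path = os.path.join(tempfile.mkdtemp(), "test.db")
init_test_db(db_path)
print(count_rows(db_path, "type_table"))       # source table row count
print(count_rows(db_path, "type_table_copy"))  # target table row count before the sync
```

After a single successful run of `datax -c examples/sqlite3/config.json` against a freshly initialized database, one would expect `count_rows` to report the same number for both tables.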

##### 2.1.2.11 Other Synchronization Examples

In addition to the above examples, all data sources listed in the go-etl features can be used interchangeably. Configurations can be set up for data sources such as MySQL to PostgreSQL, MySQL to Oracle, Oracle to DB2, etc.

11 changes: 10 additions & 1 deletion README_USER_zh-CN.md
@@ -92,6 +92,7 @@ data -c config.json
| | DB2 LUW | √ | √ | [Read](datax/plugin/reader/db2/README_zh-CN.md), [Write](datax/plugin/writer/db2/README_zh-CN.md) |
| | SQL Server | √ | √ | [Read](datax/plugin/reader/sqlserver/README_zh-CN.md), [Write](datax/plugin/writer/sqlserver/README_zh-CN.md) |
| | Oracle | √ | √ | [Read](datax/plugin/reader/oracle/README_zh-CN.md), [Write](datax/plugin/writer/oracle/README_zh-CN.md) |
| | Sqlite3 | √ | √ | [Read](datax/plugin/reader/sqlite3/README.md), [Write](datax/plugin/writer/sqlite3/README.md) |
| Unstructured Stream | CSV | √ | √ | [Read](datax/plugin/reader/csv/README_zh-CN.md), [Write](datax/plugin/writer/csv/README_zh-CN.md) |
| | XLSX (excel) | √ | √ | [Read](datax/plugin/reader/xlsx/README_zh-CN.md), [Write](datax/plugin/writer/xlsx/README_zh-CN.md) |

@@ -187,7 +188,15 @@ datax -c examples/postgrescsv/config.json
datax -c examples/postgresxlsx/config.json
```

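##### 2.1.2.10 Other Synchronization Examples
##### 2.1.2.10 Synchronizing with sqlite3

* Before use, download the corresponding [SQLite driver](https://www.sqlite.org/download.html).
* Note: On Windows, set `path=%path%;/opt/sqlite/sqlite3.dll`.
* Initialize the database using `cmd/datax/examples/sqlite3/init.sql` **for testing purposes**.
* In `examples/sqlite3/config.json`, `url` is the path of the sqlite3 database file. On Windows it can be `E:\sqlite3\test.db`; on Linux it can be `/sqlite3/test.db`.
* Run the sqlite3 synchronization command: `datax -c examples/sqlite3/config.json`

##### 2.1.2.11 Other Synchronization Examples

In addition to the above examples, all data sources listed in the go-etl features can be used interchangeably. Configurations can also be set up for data sources such as MySQL to PostgreSQL, MySQL to Oracle, Oracle to DB2, and so on.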

1 change: 1 addition & 0 deletions README_zh-CN.md
@@ -31,6 +31,7 @@ go-etl will provide the following ETL capabilities:
| | DB2 LUW | √ | √ | [Read](datax/plugin/reader/db2/README_zh-CN.md), [Write](datax/plugin/writer/db2/README_zh-CN.md) |
| | SQL Server | √ | √ | [Read](datax/plugin/reader/sqlserver/README_zh-CN.md), [Write](datax/plugin/writer/sqlserver/README_zh-CN.md) |
| | Oracle | √ | √ | [Read](datax/plugin/reader/oracle/README_zh-CN.md), [Write](datax/plugin/writer/oracle/README_zh-CN.md) |
| | Sqlite3 | √ | √ | [Read](datax/plugin/reader/sqlite3/README.md), [Write](datax/plugin/writer/sqlite3/README.md) |
| Unstructured Stream | CSV | √ | √ | [Read](datax/plugin/reader/csv/README_zh-CN.md), [Write](datax/plugin/writer/csv/README_zh-CN.md) |
| | XLSX (excel) | √ | √ | [Read](datax/plugin/reader/xlsx/README_zh-CN.md), [Write](datax/plugin/writer/xlsx/README_zh-CN.md) |

47 changes: 47 additions & 0 deletions cmd/datax/examples/sqlite3/config.json
@@ -0,0 +1,47 @@
{
    "core": {
        "container": {
            "job": {
                "id": 1,
                "sleepInterval": 100
            }
        }
    },
    "job": {
        "content": [
            {
                "reader": {
                    "name": "sqlite3reader",
                    "parameter": {
                        "column": ["*"],
                        "connection": {
                            "url": "E:\\Sqlite3\\test.db",
                            "table": {
                                "db": "main",
                                "name": "type_table"
                            }
                        },
                        "where": ""
                    }
                },
                "writer": {
                    "name": "sqlite3writer",
                    "parameter": {
                        "writeMode": "insert",
                        "column": ["*"],
                        "connection": {
                            "url": "E:\\Sqlite3\\test.db",
                            "table": {
                                "db": "main",
                                "name": "type_table_copy"
                            }
                        },
                        "batchTimeout": "1s",
                        "batchSize": 1000
                    }
                },
                "transformer": []
            }
        ]
    }
}
15 changes: 15 additions & 0 deletions cmd/datax/examples/sqlite3/init.sql
@@ -0,0 +1,15 @@
drop table if exists "type_table";
create table "type_table" (
    "t_integer" integer,
    "t_real" real,
    "t_text" text
);

insert into "type_table" values (1, 1.01, 123456);

drop table if exists "type_table_copy";
create table "type_table_copy" (
    "t_integer" integer,
    "t_real" real,
    "t_text" text
);
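
Note that `init.sql` inserts the unquoted number `123456` into the `t_text` column. SQLite's type affinity rules convert it to text on storage. The small sketch below, which assumes Python's standard `sqlite3` module and an in-memory database rather than the real `test.db`, makes the stored types visible:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table "type_table" ("t_integer" integer, "t_real" real, "t_text" text)')
conn.execute('insert into "type_table" values (1, 1.01, 123456)')

# typeof() reports the storage class SQLite actually used for each value.
stored_types = conn.execute(
    'select typeof("t_integer"), typeof("t_real"), typeof("t_text") from "type_table"'
).fetchone()
stored_row = conn.execute('select * from "type_table"').fetchone()
conn.close()

print(stored_types)
print(stored_row)
```

The third value comes back as a string, so a reader of this table should expect `t_text` to carry text even though the insert statement used a numeric literal.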
47 changes: 47 additions & 0 deletions cmd/datax/tools/testData/sqlite3.json
@@ -0,0 +1,47 @@
{
    "core": {
        "container": {
            "job": {
                "id": 1,
                "sleepInterval": 100
            }
        }
    },
    "job": {
        "content": [
            {
                "reader": {
                    "name": "sqlite3reader",
                    "parameter": {
                        "column": ["*"],
                        "connection": {
                            "url": "E:\\Sqlite3\\test.db",
                            "table": {
                                "db": "main",
                                "name": "type_table"
                            }
                        },
                        "where": ""
                    }
                },
                "writer": {
                    "name": "sqlite3writer",
                    "parameter": {
                        "writeMode": "insert",
                        "column": ["*"],
                        "connection": {
                            "url": "E:\\Sqlite3\\test.db",
                            "table": {
                                "db": "main",
                                "name": "type_table_copy"
                            }
                        },
                        "batchTimeout": "1s",
                        "batchSize": 1000
                    }
                },
                "transformer": []
            }
        ]
    }
}
2 changes: 1 addition & 1 deletion datax/README.md
@@ -105,7 +105,7 @@ The Task combines *plugin.BaseTask and implements the following methods:
#### 3.1.4 Command Generation

```bash
cd tools/go-etl/plugin
cd tools/datax/plugin
# Adds a new Reader named Mysql. The -p command can be in any case and is used to specify the name of the Reader. If -d is added, it means the original template will be deleted.
go run main.go -t reader -p Mysql
```
2 changes: 1 addition & 1 deletion datax/README_zh-CN.md
@@ -111,7 +111,7 @@ The Task combines *plugin.BaseTask and implements the following methods:
#### 3.1.4 Command Generation

```bash
cd tools/go-etl/plugin
cd tools/datax/plugin
# Adds a new reader named Mysql. The -p option can be in any case and specifies the name of the reader. If -d is added, the original template is deleted.
go run main.go -t reader -p Mysql
```
2 changes: 1 addition & 1 deletion datax/plugin/reader/csv/resources/plugin.json
@@ -2,5 +2,5 @@
"name" : "csvreader",
"developer":"Breeze0806",
"opener":"csv",
"description":""
"description":"CsvReader leverages the os and encoding/csv standard libraries to read files"
}
2 changes: 1 addition & 1 deletion datax/plugin/reader/oracle/resources/plugin.json
@@ -2,5 +2,5 @@
"name" : "oraclereader",
"developer":"Breeze0806",
"dialect":"oracle",
"description":""
"description":"OracleReader connects to remote Oracle databases using Oracle Instant Client via github.com/godror/godror"
}
15 changes: 14 additions & 1 deletion datax/plugin/reader/oracle/resources/plugin_job_template.json
@@ -1,6 +1,19 @@
{
"name": "oraclereader",
"parameter": {

"connection": {
"url": "connectString=\"192.168.15.130:1521/xe\" heterogeneousPool=false standaloneConnection=true",
"table": {
"schema":"TEST",
"name":"SRC"
}
},
"username": "system",
"password": "oracle",
"column": ["*"],
"split" : {
"key":"id"
},
"where": ""
}
}
151 changes: 151 additions & 0 deletions datax/plugin/reader/sqlite3/README.md
@@ -0,0 +1,151 @@
# Sqlite3Reader Plugin Documentation

## Quick Introduction

The Sqlite3Reader plugin reads data from Sqlite3 databases. Under the hood, Sqlite3Reader opens the Sqlite3 database file using `github.com/mattn/go-sqlite3` and executes the corresponding SQL statements to query data from it.

## Implementation Principles

Sqlite3Reader opens the Sqlite3 database file using `github.com/mattn/go-sqlite3` and generates SQL queries from the user-provided configuration. The queries are run against the database, and the returned results are assembled into an abstract dataset using go-etl's custom data types, which is then passed to the downstream Writer.
Sqlite3Reader implements the queries by following the query process defined in dbmsreader and calling go-etl's custom DBWrapper from `storage/database`. DBWrapper encapsulates many interfaces of `database/sql` and abstracts the database dialect as Dialect. For sqlite3, the Dialect implementation provided by `storage/database/sqlite3` is used.
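
As a rough illustration of how configuration turns into a query, the sketch below builds a `select` statement from the `column`, `connection.table`, and `where` settings. It is a simplified stand-in written in Python, not go-etl's DBWrapper or Dialect code; the `build_select` and `quote_ident` functions and their quoting rules are assumptions made for this example.

```python
def quote_ident(name):
    """Quote an identifier for sqlite3: wrap in double quotes and double any embedded quote."""
    return '"' + name.replace('"', '""') + '"'


def build_select(parameter):
    """Build a select statement from a reader `parameter` block (hypothetical helper)."""
    table = parameter["connection"]["table"]
    # Column entries are placed in the select list as written (assumption of this sketch).
    columns = ", ".join(parameter["column"])
    sql = f"select {columns} from {quote_ident(table['db'])}.{quote_ident(table['name'])}"
    where = parameter.get("where", "").strip()
    if where:
        sql += f" where {where}"
    return sql


reader_parameter = {
    "column": ["*"],
    "connection": {
        "url": "E:\\Sqlite3\\test.db",
        "table": {"db": "main", "name": "type_table"},
    },
    "where": "",
}
print(build_select(reader_parameter))
```

An empty `where` adds no condition, which matches the example configuration where `where` is the empty string.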

## Functionality Description

### Configuration Example

A job configured to synchronize data from a Sqlite3 database to the local system:

```json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "sqlite3reader",
                    "parameter": {
                        "column": [
                            "*"
                        ],
                        "connection": {
                            "url": "E:\\Sqlite3\\test.db",
                            "table": {
                                "db": "main",
                                "name": "type_table"
                            }
                        },
                        "where": ""
                    }
                }
            }
        ]
    }
}
```

### Parameter Explanation

#### url

- Description: The path of the sqlite3 database file.
- Required: Yes
- Default: None

#### table

Describes the sqlite3 table information.

##### name

- Description: The name of the sqlite3 table.
- Required: Yes
- Default: None

#### column

- Description: The set of column names to synchronize from the configured table, described as a JSON array. `"*"` selects all columns, for example `["*"]`.

Column pruning is supported: users can select specific columns for export.

Column reordering is supported: columns can be exported in an order different from the table schema.

Constants are supported, provided they follow sqlite3 syntax.

- Required: Yes
- Default: None
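
The three capabilities above can be tried directly against SQLite. The sketch below assumes Python's standard `sqlite3` module and assumes that column entries end up in the select list as written, which is a guess about the generated SQL rather than something confirmed from go-etl's source. The `select_columns` helper is hypothetical.

```python
import sqlite3


def select_columns(conn, table, columns):
    """Run a select whose column list is the configured `column` array, joined as written."""
    sql = f'select {", ".join(columns)} from "{table}"'
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute('create table "type_table" ("t_integer" integer, "t_real" real, "t_text" text)')
conn.execute("insert into \"type_table\" values (1, 1.01, '123456')")

all_columns = select_columns(conn, "type_table", ["*"])                  # every column
pruned = select_columns(conn, "type_table", ["t_text"])                  # column pruning
reordered = select_columns(conn, "type_table", ["t_text", "t_integer"])  # column reordering
with_constant = select_columns(conn, "type_table", ["t_integer", "'fixed'", "42"])  # constants
conn.close()

print(all_columns, pruned, reordered, with_constant)
```

String constants need single quotes in sqlite3 syntax, which is why the JSON entry for a text constant would be written as `"'fixed'"`.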

#### split

##### key

- Description: The splitting key for the sqlite3 table. The splitting key must be of type bigInt/string/time, and splitting assumes that the data is evenly distributed over the splitting key.
- Required: No
- Default: None

##### timeAccuracy

- Description: The smallest time unit of a time-typed splitting key for the sqlite3 table: day (for dates), min (for minutes), s (for seconds), ms (for milliseconds), us (for microseconds), or ns (for nanoseconds).
- Required: No
- Default: None

##### range

###### type
- Description: Mainly used to configure the default value type of the splitting key for the sqlite3 table, with values being bigInt/string/time. Here, it will check the type of the splitting key in the table, so please make sure the type is correct.
- Required: No
- Default: None

###### left
- Description: Mainly used to configure the default maximum value of the splitting key for the sqlite3 table.
- Required: No
- Default: None

###### right
- Description: Mainly used to configure the default minimum value of the splitting key for the sqlite3 table.
- Required: No
- Default: None
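
To see why an evenly distributed splitting key matters, here is a minimal sketch of dividing a `bigInt` key range into contiguous chunks that could be read independently. The `split_range` and `chunk_where` functions and the chunk count are hypothetical and only illustrate the idea; go-etl's actual splitting logic may differ.

```python
def split_range(minimum, maximum, parts):
    """Split the closed interval [minimum, maximum] into at most `parts` closed sub-intervals."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    total = maximum - minimum + 1
    parts = min(parts, total)  # never create empty chunks
    base, extra = divmod(total, parts)
    chunks = []
    start = minimum
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        end = start + size - 1
        chunks.append((start, end))
        start = end + 1
    return chunks


def chunk_where(key, chunk):
    """Render one chunk as a where condition on the splitting key."""
    low, high = chunk
    return f'"{key}" >= {low} and "{key}" <= {high}'


for chunk in split_range(1, 10, 3):
    print(chunk_where("t_integer", chunk))
```

Each chunk covers a similar span of key values, so the chunks hold a similar number of rows only if the key values are spread evenly. If most rows cluster in one span, that chunk dominates the run time.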

#### where

- Description: Mainly used to configure the where condition for the select statement.
- Required: No
- Default: None

#### querySql

- Description: In some business scenarios the `where` configuration item is not sufficient to describe the filtering conditions, so users can supply a custom SQL statement through this item. For example, to join several tables before synchronizing the data, use `select a,b from table_a join table_b on table_a.id = table_b.id`.
When `querySql` is configured, Sqlite3Reader ignores `table`, `column`, and `where` and uses the content of this item directly; `querySql` takes priority over those options.
- Required: No
- Default: None
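
The priority rule can be summarized in a few lines. This is a sketch only: it assumes `querySql` is a string inside the `parameter` block, and the `effective_query` function is a hypothetical illustration rather than go-etl code.

```python
def effective_query(parameter):
    """Return `querySql` if it is set, else a select built from table, column and where."""
    query_sql = parameter.get("querySql", "").strip()
    if query_sql:
        # table, column and where are ignored entirely when querySql is present.
        return query_sql
    table = parameter["connection"]["table"]["name"]
    sql = f'select {", ".join(parameter["column"])} from "{table}"'
    where = parameter.get("where", "").strip()
    return f"{sql} where {where}" if where else sql


base_parameter = {
    "column": ["*"],
    "connection": {"table": {"db": "main", "name": "type_table"}},
    "where": "t_integer > 0",
}
join_sql = "select a,b from table_a join table_b on table_a.id = table_b.id"
custom_parameter = dict(base_parameter, querySql=join_sql)

print(effective_query(base_parameter))    # built from table, column and where
print(effective_query(custom_parameter))  # querySql wins
```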

#### trimChar

- Description: Whether to remove leading and trailing spaces for the char type in sqlite3.
- Required: No
- Default: false

### Type Conversion

Sqlite3Reader currently supports most sqlite3 types, but a few types are not supported, so please check your types carefully.

Below is a list of type conversions that Sqlite3Reader performs for sqlite3 types:

| go-etl Type | sqlite3 Data Type |
| ----------- | ----------------- |
| bigInt      | INTEGER           |
| decimal     | REAL, NUMERIC     |
| string      | TEXT              |
| bytes       | BLOB              |
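
One way to check a table against this list before configuring a job is to read its declared column types. The sketch below assumes Python's standard `sqlite3` module; the `go_etl_column_types` helper is hypothetical, and the mapping is simply the table above written as a dictionary.

```python
import sqlite3

# Mapping taken from the table above: sqlite3 declared type -> go-etl type.
GO_ETL_TYPES = {
    "INTEGER": "bigInt",
    "REAL": "decimal",
    "NUMERIC": "decimal",
    "TEXT": "string",
    "BLOB": "bytes",
}


def go_etl_column_types(conn, table):
    """Map each column of `table` to a go-etl type, or None if the declared type is not listed."""
    result = {}
    # pragma table_info yields (cid, name, declared type, notnull, default, pk) per column.
    for _cid, name, declared, *_rest in conn.execute(f'pragma table_info("{table}")'):
        result[name] = GO_ETL_TYPES.get(declared.upper())
    return result


conn = sqlite3.connect(":memory:")
conn.execute('create table "t" ("a" integer, "b" numeric, "c" text, "d" blob, "e" datetime)')
column_types = go_etl_column_types(conn, "t")
conn.close()
print(column_types)
```

A `None` entry marks a declared type that does not appear in the table above and therefore deserves a closer look; it does not by itself prove that the reader rejects the column.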

## Performance Report

To be tested.

## Constraints and Limitations

### Database Encoding Issues
Currently, only the utf8 character set is supported.

### Data Type Limitations
The NUMERIC data type does not support high-precision real numbers.
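
The limitation can be observed in SQLite itself. The sketch below assumes Python's standard `sqlite3` module and an in-memory database; it shows SQLite's storage behavior only and says nothing about how go-etl's `decimal` type handles the value afterwards.

```python
import sqlite3
from decimal import Decimal

original = "0.1234567890123456789012345"  # 25 significant digits

conn = sqlite3.connect(":memory:")
conn.execute('create table "n" ("v" numeric)')
conn.execute('insert into "n" values (?)', (original,))
storage_class, stored = conn.execute('select typeof("v"), "v" from "n"').fetchone()
conn.close()

print(storage_class)                        # storage class chosen by SQLite
print(Decimal(original) - Decimal(stored))  # difference introduced by floating-point storage
```

Because a NUMERIC column converts a well-formed real literal to a floating-point value, digits beyond floating-point precision are lost at write time. Values that must keep every digit are safer in a TEXT column.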

## FAQ