|
1 |
| -# dt-sql-parser |
| 1 | +# CONTRIBUTING |
2 | 2 |
|
3 |
| -## Get Start |
| 3 | +English | [简体中文](./CONTRIBUTING-zh_CN.md) |
4 | 4 |
|
5 |
| -installing the dependencies after cloned project: |
| 5 | +## Development |
6 | 6 |
|
7 |
| -```bash |
8 |
| -yarn install |
9 |
| -``` |
| 7 | +> [!TIP] |
| 8 | +> Before starting, make sure a local Java environment is set up; otherwise you will not be able to generate the parser files from the grammar files. You can check by running `java --version`. |
10 | 9 |
|
11 |
| -- test |
| 10 | +- **Install dependencies** |
12 | 11 |
|
13 |
| -```bash |
14 |
| -yarn test |
15 |
| -``` |
| 12 | + ```bash |
| 13 | + pnpm install |
| 14 | + ``` |
16 | 15 |
|
17 |
| -## Compile the grammar sources |
| 16 | +- **Compile g4 Files** |
18 | 17 |
|
19 |
| -Compile one language: |
| 18 | + ```bash |
| 19 | + # Compile all g4 files |
| 20 | + pnpm antlr4 |
| 21 | + # Compile for a specific language |
| 22 | + pnpm antlr4 --lang mysql |
| 23 | + ``` |
20 | 24 |
|
21 |
| -```bash |
22 |
| -yarn antlr4 --lang=mysql |
23 |
| -``` |
| 25 | +- **Run Unit Tests** |
24 | 26 |
|
25 |
| -Compile all languages: |
| 27 | + ```bash |
| 28 | + pnpm test |
| 29 | + ``` |
26 | 30 |
|
27 |
| -```bash |
28 |
| -yarn antlr4 --all |
29 |
| -``` |
| 31 | +- **Run Benchmark Tests** |
30 | 32 |
|
31 |
| -## Branch Organization |
| 33 | + ```bash |
| 34 | + pnpm benchmark |
| 35 | + ``` |
32 | 36 |
|
33 |
| -## Source Code Organization |
| 37 | +## Directory Overview |
| 38 | + |
| 39 | +- `src/grammar`: Contains g4 files (grammar files) |
| 40 | +- `src/lib`: Generated files from g4 grammar (produced by running `pnpm antlr4`) |
| 41 | +- `src/parser`: Implementations of SQL Parser classes |
| 42 | +- `src/parser/common`: Base classes and utility methods for SQL Parsers |
| 43 | +- `test`: Unit tests |
| 44 | +- `benchmark`: Benchmark tests |
| 45 | + |
| 46 | +## How to Add a New SQL Language |
| 47 | + |
| 48 | +1. **Add New Grammar Files** |
| 49 | + |
| 50 | + Add the new g4 grammar file to `src/grammar/<SQL name>`. Name the file in PascalCase. The grammar rules within the file should adhere to the following: |
| 51 | + |
| 52 | + - The root rule should be named `program`. |
| 53 | + - Support parsing multiple SQL statements. |
| 54 | + - Enable [case-insensitive options](https://github.com/antlr/antlr4/blob/dev/doc/options.md#caseinsensitive) (if the SQL language is case-insensitive). |
| 55 | + - Lexer rules for all keywords should be prefixed with `KW_` (e.g., `KW_SELECT: 'SELECT';`). This makes keyword rules easy to distinguish from other tokens when implementing autocomplete. |
| 56 | + |
| 57 | +2. **Generate Files from Grammar** |
| 58 | + |
| 59 | + Run the following command to generate files from the new grammar: |
| 60 | + |
| 61 | + ```bash |
| 62 | + pnpm antlr4 --lang <SQL name> |
| 63 | + ``` |
| 64 | + |
| 65 | + Check that the corresponding Lexer, Parser, Listener, and Visitor files are generated in the `src/lib/<SQL name>/` directory. |
| 66 | + |
| 67 | +3. **Implement SQL Parser Class** |
| 68 | + |
| 69 | + Create a file `src/parser/<SQL name>/index.ts` and implement the corresponding SQL Parser class. This class should extend the `BasicSQL` base class. Initially, implement only the `createLexerFromCharStream` and `createParserFromTokenStream` methods; other methods can be left empty for now. |
| 70 | + |
| 71 | +4. **Add Basic Unit Tests** |
| 72 | + |
| 73 | + Add basic unit tests in `test/parser/<SQL name>` for: |
| 74 | + |
| 75 | + - Lexer |
| 76 | + - Visitor |
| 77 | + - Listener |
| 78 | + - `parser.validate` method |
| 79 | + |
| 80 | + You can reference tests from other SQL parsers. |
| 81 | + |
| 82 | +5. **SQL Syntax Unit Tests** |
| 83 | + |
| 84 | + Add unit tests for SQL syntax in the `test/parser/<SQL name>/syntax` directory. Ensure coverage of **all** SQL syntax rules. It is recommended to add tests based on the official grammar documentation to ensure accuracy. |
| 85 | + |
| 86 | +6. **Implement SQLSplitListener** |
| 87 | + |
| 88 | + Implement the `SQLSplitListener` and add the `splitListener` getter in the SQL Parser class. Also, add unit tests for the `parser.splitSQLByStatement` method, which splits SQL into individual statements. |
| 89 | + |
| 90 | +7. **Autocomplete Features** |
| 91 | + |
| 92 | + Implement methods `processCandidates` and `preferredRules` for autocomplete functionality. Familiarize yourself with [antlr4-c3](https://github.com/mike-lischke/antlr4-c3). Then, add autocomplete-related unit tests in `test/parser/<SQL name>/suggestion`. |
| 93 | + |
| 94 | +8. **Context Information Collection** |
| 95 | + |
| 96 | + Implement the `SQLEntityCollector` class and the `createEntityCollector` method in the SQL Parser class to collect SQL context information. This enhances the autocomplete functionality. For more details, see [this discussion](https://github.com/DTStack/dt-sql-parser/discussions/250#discussioncomment-8215715). |
| 97 | + |
| 98 | + Then, add tests for entity collection methods in `test/parser/<SQL name>/contextCollect`. |
| 99 | + |
| 100 | +## Sources for Grammar Files |
| 101 | + |
| 102 | +SQL grammar files can be quite complex. If you want to add a new SQL language to dt-sql-parser, it is not recommended to start from scratch. Consider the following sources, listed in order of preference: |
| 103 | + |
| 104 | +1. **Official SQL Repositories**: |
| 105 | + |
| 106 | + Some official SQL repositories use Antlr4 for SQL parsing. You can find the corresponding grammar files in their source code. For example: |
| 107 | + - [TrinoSQL](https://github.com/trinodb/trino/blob/385/core/trino-parser/src/main/antlr4/io/trino/sql/parser/SqlBase.g4) |
| 108 | + - [SparkSQL](https://github.com/apache/spark/blob/v3.5.0/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4) |
| 109 | + |
| 110 | + Grammar files from official repositories are generally the most reliable, stable, and performant. |
| 111 | + |
| 112 | +2. **grammars-v4 Repository**: |
| 113 | + |
| 114 | + This is the official grammar file repository maintained by Antlr. It includes a variety of SQL grammar files. The files here are typically reliable. |
| 115 | + |
| 116 | +3. **Community/Other Open Source Repositories**: |
| 117 | + |
| 118 | + Grammar files obtained from the community or other open source repositories may be less reliable and often require significant time to fix grammar issues. |
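
The `KW_` naming rule in step 1 of "How to Add a New SQL Language" is easier to appreciate with a small example. The following is a minimal sketch, not dt-sql-parser's actual implementation: the `symbolicNames` table and the `keywordsFromCandidates` helper are hypothetical names introduced here for illustration. It assumes the autocomplete step receives a list of candidate token types and needs to turn only the keyword tokens into suggestions.

```typescript
// Hypothetical table of symbolic token names, indexed by token type, of the
// kind a lexer generated from a grammar following the KW_ convention might
// expose. These values are illustrative and not taken from dt-sql-parser.
const symbolicNames: (string | null)[] = [
    null,
    "KW_SELECT",
    "KW_FROM",
    "KW_WHERE",
    "IDENTIFIER",
    "SEMICOLON",
];

// Hypothetical helper: keep only the candidates whose symbolic name starts
// with "KW_" and strip the prefix to obtain the keyword text to suggest.
function keywordsFromCandidates(
    candidateTokenTypes: number[],
    names: (string | null)[],
): string[] {
    const prefix = "KW_";
    const keywords: string[] = [];
    for (const tokenType of candidateTokenTypes) {
        const name = names[tokenType];
        if (name && name.startsWith(prefix)) {
            keywords.push(name.slice(prefix.length));
        }
    }
    return keywords;
}
```

With this table, `keywordsFromCandidates([1, 2, 4, 5], symbolicNames)` keeps `SELECT` and `FROM` and drops the identifier and punctuation tokens. Without a shared prefix, the same filtering would need a hand-maintained list of keyword token types for every SQL language.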