Commit fd1c9e1

Merge pull request #3677 from ClickHouse/linter_spellcheck
restructure linting and checks
2 parents 5cd5e1e + 7d2c55a commit fd1c9e1

59 files changed, +150 -230 lines changed

.github/workflows/check-build.yml

+5 -10
@@ -24,15 +24,10 @@ jobs:
 if: matrix.check_type == 'spellcheck'
 run: sudo apt-get update && sudo apt-get install -y aspell aspell-en
 - name: Set up Python
-if: matrix.check_type == 'kbcheck'
-uses: actions/setup-python@v5
-with:
-python-version: '3.x'
-- name: Install dependencies
 if: matrix.check_type == 'kbcheck'
 run: |
-python -m pip install --upgrade pip
-pip install -r 'scripts/knowledgebase-checker/requirements.txt'
+curl -Ls https://astral.sh/uv/install.sh | sh
+uv python install 3.12
 - name: Setup md-lint environment
 if: matrix.check_type == 'md-lint'
 uses: actions/setup-node@v3
@@ -47,13 +42,13 @@ jobs:
 id: check_step
 run: |
 if [[ "${{ matrix.check_type }}" == "spellcheck" ]]; then
-./scripts/check-doc-aspell
+yarn check-spelling
 exit_code=$?
 elif [[ "${{ matrix.check_type }}" == "kbcheck" ]]; then
-./scripts/knowledgebase-checker/knowledgebase_article_checker.py --kb-dir="knowledgebase"
+yarn check-kb
 exit_code=$?
 elif [[ "${{ matrix.check_type }}" == "md-lint" ]]; then
-yarn markdownlint-cli2 --config ./scripts/.markdownlint-cli2.yaml 'docs/**/*.md'
+yarn check-markdown
 exit_code=$?
 fi

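The workflow step above picks one yarn script per `matrix.check_type`. As a rough illustration only, the same mapping can be sketched as a standalone shell function. This assumes just the three check types and script names visible in the diff; the function name `check_command` is hypothetical and not part of the commit, and the sketch prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the commit): map a check type to the
# yarn script that the updated workflow invokes for it.
check_command() {
  case "$1" in
    spellcheck) echo "yarn check-spelling" ;;
    kbcheck)    echo "yarn check-kb" ;;
    md-lint)    echo "yarn check-markdown" ;;
    *)          echo "unknown check type: $1" >&2; return 1 ;;
  esac
}

# Print the mapping for each check type named in the workflow matrix.
for check_type in spellcheck kbcheck md-lint; do
  printf '%s -> %s\n' "$check_type" "$(check_command "$check_type")"
done
```

An unknown check type returns a non-zero status, whereas the workflow's `if/elif` chain as shown has no fallback branch.
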
README.md

+13 -1
@@ -110,6 +110,12 @@ We recommend to install rsync in order to only copy what is needed, however the

 Running `yarn copy-clickhouse-repo-docs` without any arguments will pull in the latest docs changes from github.

+To check spelling and markdown is correct locally run:
+
+```bash
+yarn check-style
+```
+
 ### Notes {#notes}

 Here are some things to keep in mind when building a local copy of the ClickHouse docs site.
@@ -134,7 +140,13 @@ Check out the GitHub docs for a refresher on [how to create a pull request](http

 ### Style guidelines

-For documentation style guidelines, see ["Style guide"](/contribute/style-guide.md).
+For documentation style guidelines, see ["Style guide"](/contribute/style-guide.md).
+
+To check spelling and markdown is correct locally run:
+
+```bash
+yarn check-style
+```

 ### Generating documentation from source code

contribute/contrib-writing-guide.md

+35 -165
@@ -45,12 +45,11 @@ sudo npm install --global yarn

 note: if the Node version available in your distro is old (`<=v16`), you can use [nvm](https://github.com/nvm-sh/nvm#installing-and-updating) to pick a specific one.

-for example to use node 18:
+for example to use the appropriate version of node:

 ```bash
 curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash
-nvm install 18
-nvm use 18
+nvm use
 ```

 #### OSx {#osx}
@@ -104,6 +103,27 @@ yarn start
 # not, make them, and you will see the page update as you save the changes.
 ```

+## Checking style standards {#check-style}
+
+Users can check that spelling and markdown is correct with the following commands:
+
+```bash
+yarn check-spelling
+yarn check-markdown
+```
+
+Knowledge base articles can be checked for structure (e.g. front matter and correct tags) with:
+
+```bash
+yarn check-kb
+```
+
+These commands can all be run with a single command:
+
+```bash
+yarn check-style
+```
+
 ## Placeholder files {#placeholder-files}
 Some of the markdown content is generated from other files; here are some examples:

@@ -128,13 +148,13 @@ pages as:

 #### link to another doc {#link-to-another-doc}
 ```md title="foo"
-[async_insert](/docs/operations/settings/settings.md)
+[async_insert](/operations/settings/settings.md)
 ```

 #### link to an anchor within another doc {#link-to-an-anchor-within-another-doc}

 ```md
-[async_insert](/docs/operations/settings/settings.md/#async-insert)
+[async_insert](/operations/settings/settings.md/#async-insert)
 ```

 Note the initial `/`, the `.md` and the slash between the `.md` and the `#async-insert` in the second example.
@@ -152,17 +172,21 @@ https://clickhouse.com/docs/install/
 should be replaced with

 ```md
-/docs/getting-started/install.md
+/getting-started/install.md
 ```

+Note a `/docs` prefix is not required.
+
 If you look closely, the path on disk does not match the URL. The URLs can be changed by setting the `slug` in the markdown file frontmatter.

 ## Introduce your topic {#introduce-your-topic}
+
 The default first page in a folder (category in Docusaurus terminology) is a list of the pages in that folder. If you would like an intro or overview page, then add that page to `docs/en/coverpages/`. This example is addring an Architecture folder, so the full filename would be `docs/en/coverpages/architecture.md`

 The next step depends on the location of the file in the nav. In this example, architecture is at the root level:

 ### Cover pages {#cover-pages}
+
 Intros, cover pages, summaries--whatever you want to call them; it is important to provide the reader with a summary of a section
 of the docs. The summary should let them know if they are in the right place. Also include at the bottom of the summary a link to relted content (blogs, videos, etc.)
 The cover page is specified in `sidebars.js`, specifically with a `link` of type `doc`:
@@ -186,6 +210,7 @@ The cover page is specified in `sidebars.js`, specifically with a `link` of type
 ```

 ## Save time with reusable content {#save-time-with-reusable-content}
+
 Many of the pages in the docs have a section toward the top about gathering the connection details for your ClickHouse service. Sending people off to look at a page to learn about the connection details and then having them forget what they were doing is a poor user experience. If you can just include the right instructions in the current doc, please do so. In general there are two interfaces that people will use when integrating some 3rd party software, or even `clickhouse client` to ClickHouse:

 - HTTPS
@@ -213,10 +238,12 @@ Here is how the above renders:
 ## Avoid sending readers in circles {#avoid-sending-readers-in-circles}

 ### Links can be overdone {#links-can-be-overdone}
+
 Every time you mention a feature or product you may be tempted to link to it. Don't do it. When peole see links
 they can be tempted to visit them, and quite often there is no need for them to go to the linked content. If you mention a technique and you need the reader to learn it right then, add a link. If they should read about it later then add a link down at the bottom of the doc in a **What's next** section.

 ### Include content in the current doc instead {#include-content-in-the-current-doc-instead}
+
 If you find yourself wanting to send the reader to another doc to perform a task before they perform the main task that you are writing about, then maybe that prerequisite task should be included in the current doc instead so the reader is not clicking back and forth. It may be time to create a snippet pull the content from the other doc into a snippet file and include it in the current doc and the other doc that you pulled it from (see [above](#save-time-with-reusable-content)).

 ## Avoid multiple pages for a single topic {#avoid-multiple-pages-for-a-single-topic}
@@ -355,125 +382,7 @@ When writing docs about a new feature it helps to be able to use the new feature
 ```bash
 curl https://ClickHouse.com/ | sh
 ```
-
-### Run unreleased builds in Docker {#run-unreleased-builds-in-docker}
-
-```bash
-docker pull clickhouse/clickhouse-server:head
-```
-
-```bash
-docker run -d \
---cap-add=SYS_NICE \
---cap-add=NET_ADMIN \
---cap-add=IPC_LOCK \
---name some-clickhouse-server \
---ulimit nofile=262144:262144 \
-clickhouse/clickhouse-server:head
-```
-
-## Tests: A great source of details {#tests-a-great-source-of-details}
-
-If you want to run the tests from the `ClickHouse/tests` directory you either need a full release, a CI build, or to compile yourself. [How to get the binaries](https://clickhouse.com/docs/development/build/#you-dont-have-to-build-clickhouse)
-
-### Extracting build from RPMs {#extracting-build-from-rpms}
-
-If you want to extract the binary files from RPMs to use with the test `runner`, you can use `cpio`
-
-```bash
-mkdir 22.12
-mv cl*rpm 22.12/
-export CHDIR=`pwd`/22.12
-cd $CHDIR
-rpm2cpio ./clickhouse-server-22.12.1.1738.x86_64.rpm | \
-cpio -id --no-absolute-filenames
-
-rpm2cpio ./clickhouse-client-22.12.1.1738.x86_64.rpm | \
-cpio -id --no-absolute-filenames
-
-rpm2cpio ./clickhouse-common-static-22.12.1.1738.x86_64.rpm | \
-cpio -id --no-absolute-filenames
-```
-
-### Modify the ClickHouse server config {#modify-the-clickhouse-server-config}
-
-If you are running the ClickHouse server process and not using the standard
-directories of `/etc/clickhouse-server` for configs and `/var` for the data directories
-then you will need to edit the config.
-
-Create an override dir:
-```bash
-mkdir $CHDIR/etc/clickhouse-server/config.d
-```
-
-This is a sample `$CHDIR/etc/clickhouse-server/config.d/dirs.xml`
-file that overrides the default config:
-
-```xml
-<clickhouse>
-<logger>
-<level>error</level>
-<log>/home/droscigno/Downloads/22.12/usr/var/log/clickhouse-server/clickhouse-server.log</log>
-<errorlog>/home/droscigno/Downloads/22.12/usr/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
-</logger>
-<path>/home/droscigno/Downloads/22.12/usr/var/lib/clickhouse/</path>
-<tmp_path>/home/droscigno/Downloads/22.12/usr/var/lib/clickhouse/tmp/</tmp_path>
-<user_files_path>/home/droscigno/Downloads/22.12/usr/var/lib/clickhouse/user_files/</user_files_path>
-<user_directories>
-<local_directory>
-<path>/home/droscigno/Downloads/22.12/usr/var/lib/clickhouse/access/</path>
-</local_directory>
-</user_directories>
-<format_schema_path>/home/droscigno/Downloads/22.12/usr/var/lib/clickhouse/format_schemas/</format_schema_path>
-</clickhouse>
 ```
-### Run ClickHouse {#run-clickhouse}
-
-From $CHDIR/usr/bin:
-
-```bash
-./clickhouse-server -C ../../etc/clickhouse-server/config.xml
-```
-
-### Command to run tests: {#command-to-run-tests}
-
-These examples use env vars for the directory names:
-
-- $DOCS is the parent directory of the ClickHouse repo
-- $CHDIR is the parent directory of the extracted ClickHouse download files
-
-For SQL tests:
-
-The command to run the tests needs:
-
-- The $PATH to use, with $CHDIR/usr/bin added to the $PATH
-- The path to `clickhouse-test`
-- The name of the test to run
-
-For example, to run the test `01428_hash_set_nan_key`:
-```bash
-PATH=$CHDIR/usr/bin/:$PATH \
-$DOCS/ClickHouse/tests/clickhouse-test \
-01428_hash_set_nan_key
-```
-
-To see the queries that were run:
-```bash
-PATH=$CHDIR/usr/bin/$PATH \
-clickhouse-client -q \
-"select query from system.query_log ORDER BY event_time FORMAT Vertical"
-```
-
-For integration tests:
-```bash
-cd $DOCS/ClickHouse/tests/integration/
-./runner -n 5 \
---src-dir $DOCS/ClickHouse/src \
---binary $CHDIR/usr/bin/clickhouse-server \
---cleanup-containers \
---command bash
-```
-
 ## How to change code highlighting? {#how-to-change-code-highlighting}

 Code highlighting is based on the language chosen for your code blocks. Specify the language when you start the code block:
@@ -495,7 +404,7 @@ If you need a language supported then open an issue in [ClickHouse-docs](https:/

 At the moment there’s no easy way to do just that, but you can consider:

-- To hit the Watch button on top of GitHub web interface to know as early as possible, even during pull request. Alternative to this is `#github-activity` channel of [public ClickHouse Slack](https://clickhouse.com/slack).
+- To hit the "Watch" button on top of GitHub web interface to know as early as possible, even during pull request. Alternative to this is `#github-activity` channel of [public ClickHouse Slack](https://clickhouse.com/slack).
 - Some search engines allow to subscribe on specific website changes via email and you can opt-in for that for https://clickhouse.com.

 ## Embedding videos {#embedding-videos}
@@ -534,46 +443,7 @@ And here is a Vimeo example from the Cloud landing page:

 ## Algolia {#algolia}

-The docs are crawled daily. The configuration for the crawler is in the docs-private repo
-as the crawler config contains a key that is used to manage the Algolia account. If you need to modify the crawler configuration log in to crawler.algolia.com and edit the configuration in the
-UI. Once the updated configuration is tested, update the configuration stored in the docs-private repo.
-
-**Note**
-
-Comments added to the config get removed by the Algolia editor :( The best practice would be to add your comments to the PR used to update the config in docs-private.
-
-### Doc search tweaks {#doc-search-tweaks}
-We use [Docsearch](https://docsearch.algolia.com/) from Algolia; there is not much for you to do to have the docs you write added to the search. Every Monday, the Algolia crawler updates our index.
-
-If a search is not finding the page that you expect, then have a look at the Markdown for that page. For example, a search for `UDF` was returning a bunch of changelog entries, but not the page specifically for user defined functions. This was the Markdown for the page:
-
-```md
----
-slug: /sql-reference/statements/create/function
-sidebar_position: 38
-sidebar_label: FUNCTION
----
-
-# CREATE FUNCTION
-
-Creates a user defined function from a lambda expression.
-```
-
-And this was the change to improve the search results (add the expected search terms to the H1 heading):
-
-```md
----
-slug: /sql-reference/statements/create/function
-sidebar_position: 38
-sidebar_label: FUNCTION
----
-
-# CREATE FUNCTION &mdash; user defined function (UDF)
-
-Creates a user defined function from a lambda expression.
-```
-
-Note: The docs are crawled each morning. If you make a change and want the docs re-crawled sooner, open an issue in clickhouse-docs.
+The docs are crawled daily. This is achieved by parsing the markdown and inserting it into Algolia. We maintain a list of authoritative search results to measure relevancy. See [Search README](../scripts/search/README.md) for further details.

 ## Tools that you might like {#tools-that-you-might-like}

docs/_snippets/_system_table_cloud.md

+1 -1
@@ -1,3 +1,3 @@
 :::note Querying in ClickHouse Cloud
 The data in this system table is held locally on each node in ClickHouse Cloud. Obtaining a complete view of all data, therefore, requires the `clusterAllReplicas` function. See [here](/operations/system-tables/overview#system-tables-in-clickhouse-cloud) for further details.
-:::
+:::

docs/about-us/about-faq-index.md

+1 -1
@@ -16,4 +16,4 @@ description: 'Landing page'
 | [Is it possible to deploy ClickHouse with separate storage and compute?](/faq/operations/deploy-separate-storage-and-compute) |
 | [Questions about ClickHouse use cases](/faq/use-cases) |
 | [Can I use ClickHouse as a key-value storage?](/faq/use-cases/key-value) |
-| [Can I use ClickHouse as a time-series database?](/faq/use-cases/time-series) |
+| [Can I use ClickHouse as a time-series database?](/faq/use-cases/time-series) |

docs/best-practices/_snippets/_bulk_inserts.md

+1 -1
@@ -8,4 +8,4 @@ If you're unable to batch data client-side, ClickHouse supports asynchronous ins

 :::tip
 Regardless of the size of your inserts, we recommend keeping the number of insert queries around one insert query per second. The reason for that recommendation is that the created parts are merged to larger parts in the background (in order to optimize your data for read queries), and sending too many insert queries per second can lead to situations where the background merging can't keep up with the number of new parts. However, you can use a higher rate of insert queries per second when you use asynchronous inserts (see asynchronous inserts).
-:::
+:::

docs/best-practices/minimize_optimize_joins.md

+1 -1
@@ -66,4 +66,4 @@ For optimal performance:
 * Avoid more than 3–4 joins per query.
 * Benchmark different algorithms on real data - performance varies based on JOIN key distribution and data size.

-For more on JOIN optimization strategies, JOIN algorithms, and how to tune them, refer to the[ ClickHouse documentation](/guides/joining-tables) and this [blog series](https://clickhouse.com/blog/clickhouse-fully-supports-joins-part1).
+For more on JOIN optimization strategies, JOIN algorithms, and how to tune them, refer to the[ ClickHouse documentation](/guides/joining-tables) and this [blog series](https://clickhouse.com/blog/clickhouse-fully-supports-joins-part1).

docs/chdb/guides/query-remote-clickhouse.md

+1 -1
@@ -186,4 +186,4 @@ FROM Python(df)
 0 0.693855 2024-09-19 0.000003 2020-02-09
 ```

-If you want to learn more about querying Pandas DataFrames, see the [Pandas DataFrames developer guide](querying-pandas.md).
+If you want to learn more about querying Pandas DataFrames, see the [Pandas DataFrames developer guide](querying-pandas.md).

docs/chdb/guides/querying-apache-arrow.md

+1 -1
@@ -173,4 +173,4 @@ round(avg(avg_lat_down_ms), 2): 554355225.76
 round(avg(avg_lat_up_ms), 2): 552843178.3
 round(avg(tests), 2): 6.31
 round(avg(devices), 2): 2.88
-```
+```

docs/chdb/guides/querying-parquet.md

+1 -1
@@ -177,4 +177,4 @@ chdb.query(query, 'DataFrame')
 ```

 Interestingly, there are more 5 star reviews than all the other ratings combined!
-It looks like people like the products on Amazon or, if they don't, they just don't submit a rating.
+It looks like people like the products on Amazon or, if they don't, they just don't submit a rating.
