Provide config parameter description in the notebook
Signed-off-by: Constantin M Adam <[email protected]>
cmadam committed Dec 4, 2024
1 parent 2a01b1b commit 54ced0e
Showing 2 changed files with 12 additions and 10 deletions.
10 changes: 5 additions & 5 deletions transforms/universal/doc_id/doc_id.ipynb
@@ -34,11 +34,11 @@
"jp-MarkdownHeadingCollapsed": true
},
"source": [
- "##### **** Configure the transform parameters. The set of dictionary keys holding DocQualityTransform configuration for values are as follows: \n",
- "* text_lang - specifies language used in the text content. By default, \"en\" is used.\n",
- "* doc_content_column - specifies column name that contains document text. By default, \"contents\" is used.\n",
- "* bad_word_filepath - specifies a path to bad word file: local folder (file or directory) that points to bad word file. You don't have to set this parameter if you don't need to set bad words.\n",
- "#####"
+ "##### **** Configure the transform parameters. The set of dictionary keys holding DocIDTransform configuration for values are as follows: \n",
+ "* doc_column - specifies name of the column containing the document (required for ID generation)\n",
+ "* hash_column - specifies name of the column created to hold the string document id, if None, id is not generated\n",
+ "* int_id_column - specifies name of the column created to hold the integer document id, if None, id is not generated\n",
+ "* start_id - an id from which ID generator starts () "
]
},
{
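For reviewers, the four doc_id keys described in the new notebook text can be illustrated with a small, self-contained sketch. Only the key names come from the diff; the values, the helper `add_doc_ids`, and the use of SHA-256 for the string id are assumptions made for illustration, not the transform's actual implementation.

```python
import hashlib

# Key names are taken from the notebook text; the values are illustrative.
doc_id_params = {
    "doc_column": "contents",          # column holding the document text
    "hash_column": "hash_column",      # set to None to skip the string id
    "int_id_column": "int_id_column",  # set to None to skip the integer id
    "start_id": 5,                     # first integer id to hand out
}


def add_doc_ids(rows, params):
    """Hypothetical stand-in for the transform: annotate each row with ids."""
    next_id = params["start_id"]
    annotated = []
    for row in rows:
        new_row = dict(row)
        if params["hash_column"] is not None:
            text = row[params["doc_column"]]
            # Assumption: a SHA-256 hex digest serves as the string document id.
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            new_row[params["hash_column"]] = digest
        if params["int_id_column"] is not None:
            new_row[params["int_id_column"]] = next_id
            next_id += 1
        annotated.append(new_row)
    return annotated


rows = [{"contents": "first document"}, {"contents": "second document"}]
result = add_doc_ids(rows, doc_id_params)

# Setting a column name to None suppresses that id, as the notebook text states.
no_hash = add_doc_ids(rows, {**doc_id_params, "hash_column": None})
```

With these values the integer ids start at `start_id` and increase by one per row, while the string id depends only on the document text.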
12 changes: 7 additions & 5 deletions transforms/universal/ededup/ededup.ipynb
@@ -34,11 +34,13 @@
"jp-MarkdownHeadingCollapsed": true
},
"source": [
- "##### **** Configure the transform parameters. The set of dictionary keys holding DocQualityTransform configuration for values are as follows: \n",
- "* text_lang - specifies language used in the text content. By default, \"en\" is used.\n",
- "* doc_content_column - specifies column name that contains document text. By default, \"contents\" is used.\n",
- "* bad_word_filepath - specifies a path to bad word file: local folder (file or directory) that points to bad word file. You don't have to set this parameter if you don't need to set bad words.\n",
- "#####"
+ "##### **** Configure the transform parameters. The set of dictionary keys holding EdedupTransform configuration for values are as follows: \n",
+ "* doc_column - specifies name of the column containing documents\n",
+ "* doc_id_column - specifies the name of the column containing a document id\n",
+ "* use_snapshot - specifies that ededup execution starts with a set of pre-existing hashes, enabling incremental\n",
+ "execution\n",
+ "* snapshot_directory - specifies the directory for reading snapshots. If not provided, the default is\n",
+ "`output_folder/snapshot`"
]
},
{
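The ededup keys, and in particular what `use_snapshot` means for incremental execution, can be sketched in the same way. Only the key names come from the diff; the helper `exact_dedup`, the in-memory set standing in for a snapshot, and the SHA-256 hashing are assumptions for illustration and do not reproduce the transform's code or its snapshot file format.

```python
import hashlib

# Key names are taken from the notebook text; the values are illustrative.
ededup_params = {
    "doc_column": "contents",        # column holding the documents
    "doc_id_column": "document_id",  # column holding a document id
    "use_snapshot": True,            # start from pre-existing hashes
    "snapshot_directory": None,      # per the text, defaults to output_folder/snapshot
}


def exact_dedup(rows, params, snapshot_hashes=None):
    """Hypothetical exact dedup: keep the first row seen for each content hash."""
    # Assumption: a snapshot is modelled here as an in-memory set of hashes.
    if params["use_snapshot"] and snapshot_hashes:
        seen = set(snapshot_hashes)
    else:
        seen = set()
    kept = []
    for row in rows:
        text = row[params["doc_column"]]
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(row)
    return kept, seen


batch1 = [
    {"document_id": 1, "contents": "alpha"},
    {"document_id": 2, "contents": "beta"},
    {"document_id": 3, "contents": "alpha"},
]
batch2 = [
    {"document_id": 4, "contents": "beta"},
    {"document_id": 5, "contents": "gamma"},
]

# First run without a snapshot, second run seeded with the first run's hashes.
kept1, hashes1 = exact_dedup(batch1, {**ededup_params, "use_snapshot": False})
kept2, hashes2 = exact_dedup(batch2, ededup_params, snapshot_hashes=hashes1)

kept1_ids = [row[ededup_params["doc_id_column"]] for row in kept1]
kept2_ids = [row[ededup_params["doc_id_column"]] for row in kept2]
```

In this sketch the second batch drops "beta" because its hash is already present from the first run, which is the incremental behaviour the `use_snapshot` description refers to.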