From d1ae9a68f66813b384afeb6d43831714aa94d691 Mon Sep 17 00:00:00 2001 From: NAIR BENREKIA Nour Eddine Yassine INNOV/IT-S Date: Mon, 31 Mar 2025 10:33:04 +0200 Subject: [PATCH] Adapt Core API tutorials to Khiops V11 --- ...in, Evaluate and Deploy a Classifier.ipynb | 55 ++++++++++--------- ...sifier on a Star Multi-Table Dataset.ipynb | 51 +++++++++-------- ...r on a Snowflake Multi-Table Dataset.ipynb | 21 ++++--- Core Basics 4 - Train a Coclustering.ipynb | 22 +++++--- 4 files changed, 82 insertions(+), 67 deletions(-) diff --git a/Core Basics 1 - Train, Evaluate and Deploy a Classifier.ipynb b/Core Basics 1 - Train, Evaluate and Deploy a Classifier.ipynb index 4adebd2..0f6e528 100644 --- a/Core Basics 1 - Train, Evaluate and Deploy a Classifier.ipynb +++ b/Core Basics 1 - Train, Evaluate and Deploy a Classifier.ipynb @@ -32,7 +32,7 @@ " print(\"\")\n", "\n", "\n", - "# If there are any issues you may Khiops status with the following command\n", + "# If there are any issues, you may print the Khiops status with the following command:\n", "# kh.get_runner().print_status()" ] }, @@ -43,12 +43,12 @@ "## Training a Classifier\n", "We'll train a classifier for the `Iris` dataset. This is a classical dataset containing the data of different plants belonging to the genus _Iris_. It contains 150 records, 50 for each of three variants of _Iris_: _Setosa_, _Virginica_ and _Versicolor_. The records for each sample contain the length and width of its petal and sepal. The standard task for this dataset is to construct a classifier for the type of _Iris_ taking as inputs the length and width characteristics.\n", "\n", - "Now to train a classifier with Khiops we use two types of files:\n", + "Now to train a classifier with Khiops, we use two types of files:\n", "- A plain-text delimited data file (for example a `csv` file)\n", "- A _dictionary_ file which describes the schema of the above data table (`.kdic` file extension)\n", "\n", "\n", - "Let's save into variables the locations of these files for the `Iris` dataset and then take a look at their contents:" + "Let's save the locations of these files for the `Iris` dataset into variables and then take a look at their contents:" ] }, { @@ -70,7 +70,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Note that the _Iris_ variant information is in the column `Class`. Now let's specify directory to save our results:" + "Note that the _Iris_ variant information is in the column `Class`. Now let's specify the path to the analysis report file:" ] }, { @@ -79,8 +79,9 @@ "metadata": {}, "outputs": [], "source": [ - "iris_results_dir = os.path.join(\"exercises\", \"Iris\")\n", - "print(f\"Iris results directory: {iris_results_dir}\")" + "analysis_report_file_path_Iris = os.path.join(\"exercises\", \"Iris\", \"AnalysisReport.khj\")\n", + "\n", + "print(f\"Iris analysis report file path: {analysis_report_file_path_Iris}\")" ] }, { @@ -88,8 +89,8 @@ "metadata": {}, "source": [ "We are now ready to train the classifier with the Khiops function `train_predictor`. This method returns a tuple containing the location of two files:\n", - "- the modeling report (`AllReports.khj`): A JSON file containing information such as the informativeness of each variable, those selected for the model and performance metrics.\n", - "model's _dictionary_ file (`Modeling.kdic`): This file is an enriched version of the initial dictionary file that contains the model. It can be used to make predictions on new data."
+ "- the modeling report (`AnalysisReport.khj`): A JSON file containing information such as the informativeness of each variable, those selected for the model and performance metrics. It is saved into `analysis_report_file_path_Iris` variable that we just defined.\n", + "- model's _dictionary_ file (`AnalysisReport.model.kdic`): This file is an enriched version of the initial dictionary file that contains the model. It can be used to make predictions on new data." ] }, { @@ -103,7 +104,7 @@ " dictionary_name=\"Iris\",\n", " data_table_path=iris_data_file,\n", " target_variable=\"Class\",\n", - " results_dir=iris_results_dir,\n", + " analysis_report_file_path=analysis_report_file_path_Iris,\n", " max_trees=0, # by default Khiops constructs 10 decision tree variables\n", ")\n", "print(f\"Iris report file: {iris_report}\")\n", @@ -114,7 +115,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "You can verify that the result files were created in `iris_results_dir`. In the next sections, we'll use the file at `iris_report` to assess the models' performances and the file at `iris_model_kdic` to deploy it. Now we can see the report with the Khiops Visualization app:" + "Note that `iris_report` (the first element of the tuple returned by train_predictor) is identical to `analysis_report_file_path_Iris`. \n", + "\n", + "In the next sections, we'll use the file at `iris_report` to assess the models' performances and the file at `iris_model_kdic` to deploy it. Now we can have a look at the report with the Khiops Visualization app:" ] }, { @@ -133,9 +136,9 @@ "source": [ "### Exercise\n", "\n", - "We'll repeat the examples on this notebook with the `Adult` dataset. It contains characteristics of the adult population in USA such as age, gender and education and its task is to predict the variable `class`, which indicates if the individual earns `more` or `less` than 50,000 dollars.\n", + "We'll repeat the previous steps on the `Adult` dataset. This dataset contains characteristics of the adult population in USA such as age, gender and education and its task is to predict the variable `class`, which indicates if the individual earns `more` or `less` than 50,000 dollars.\n", "\n", - "Let's start by putting into variables the paths for the `Adult` dataset:" + "Let's start by putting, into variables, the paths for the `Adult` dataset:" ] }, { @@ -173,7 +176,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We now save the results directory for this exercise:" + "We now specify the path to the analysis report file for this exercise:" ] }, { @@ -182,8 +185,11 @@ "metadata": {}, "outputs": [], "source": [ - "adult_results_dir = os.path.join(\"exercises\", \"Adult\")\n", - "print(f\"Adult results directory: {adult_results_dir}\")" + "analysis_report_file_path_Adult = os.path.join(\n", + " \"exercises\", \"Adult\", \"AnalysisReport.khj\"\n", + ")\n", + "\n", + "print(f\"Adult analysis report file path: {analysis_report_file_path_Adult}\")" ] }, { @@ -191,7 +197,7 @@ "metadata": {}, "source": [ "#### Train a classifier for the `Adult` database\n", - "Note the name of the target variable is `class` (**in lower case!**). Do not forget to set `max_trees=0`. Save the resulting file locations into the variables `adult_report` and `adult_model_kdic` and print them" + "Note the name of the target variable is `class` (**in lower case!**). Do not forget to set `max_trees=0`. Save the resulting file locations into the variables `adult_report` and `adult_model_kdic` and print them." 
] }, { @@ -207,7 +213,7 @@ " dictionary_name=\"Adult\",\n", " data_table_path=adult_data_file,\n", " target_variable=\"class\",\n", - " results_dir=adult_results_dir,\n", + " analysis_report_file_path=analysis_report_file_path_Adult,\n", " max_trees=0,\n", ")\n", "print(f\"Adult report file: {adult_report}\")\n", @@ -239,7 +245,7 @@ "source": [ "## Accessing a Classifiers' Basic Evaluation Metrics\n", "\n", - "We access the classifier's evaluation metrics by loading file at `iris_report` file with the Khiops function `read_analysis_results_file`:" + "We access the classifier's evaluation metrics by loading the file at `iris_report` with the Khiops function `read_analysis_results_file`:" ] }, { @@ -292,7 +298,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "These objects are of class `PredictorPerformance` and have `accuracy` and `auc` attributes for these metrics:" + "These objects are of class `PredictorPerformance`. They expose `accuracy` and `auc` attributes:" ] }, { @@ -376,7 +382,7 @@ "metadata": {}, "source": [ "## Deploying a Classifier\n", - "We are going to deploy the `Iris` classifier we have just trained on the same dataset (normally we would do this on new data). We saved the model in the file `iris_model_kdic`. This file is usually large and incomprehensible, so you should know what you are doing before editing it. Just this time let's take a quick look at its contents:" + "We are going to deploy the `Iris` classifier we have just trained on the same dataset (normally we would do this on new data). We saved the model in the file `iris_model_kdic`. This file is usually large and incomprehensible, so you should know what you are doing before editing it. Let's take a quick look at its contents:" ] }, { @@ -392,12 +398,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Note that the modeling dictionary contains 5 used variables:\n", - "- `Class` : The original target of the dataset\n", + "Note that the modeling dictionary contains 4 used variables:\n", "- `PredictedClass` : The class with the highest probability according to the model\n", "- `ProbClassIris-setosa`, `ProbClassIris-versicolor`, `ProbClassIris-virginica`: The probabilities of each class according to the model\n", "\n", - "These will be the columns of the output table when deploying the model:" + "These will be the columns of the table obtained after deploying the model. This table will be saved at `iris_deployment_file`."
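Before moving on to deployment, the metric-access flow migrated above reads as follows in one place. This is a minimal sketch assuming the khiops-python core API names used in these notebooks (`read_analysis_results_file`, the `train_evaluation_report` attribute and its `get_snb_performance()` accessor); it is an illustration, not an exact notebook cell:

```python
# Minimal sketch (assumed khiops-python core API, V11 names): read the
# analysis report written by train_predictor and print the Selective
# Naive Bayes (SNB) metrics discussed above.
from khiops import core as kh

# iris_report is the report path returned by kh.train_predictor
results = kh.read_analysis_results_file(iris_report)

# The report holds one evaluation sub-report per evaluated split;
# get_snb_performance() returns a PredictorPerformance object.
snb_performance = results.train_evaluation_report.get_snb_performance()

print(f"accuracy: {snb_performance.accuracy}")
print(f"auc:      {snb_performance.auc}")
```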
] }, { @@ -406,7 +411,7 @@ "metadata": {}, "outputs": [], "source": [ - "iris_deployment_file = os.path.join(iris_results_dir, \"iris_deployment.txt\")\n", + "iris_deployment_file = os.path.join(\"exercises\", \"Iris\", \"iris_deployment.txt\")\n", "kh.deploy_model(\n", " iris_model_kdic,\n", " dictionary_name=\"SNB_Iris\",\n", @@ -434,7 +439,7 @@ }, "outputs": [], "source": [ - "adult_deployment_file = os.path.join(adult_results_dir, \"adult_deployment.txt\")\n", + "adult_deployment_file = os.path.join(\"exercises\", \"Adult\", \"adult_deployment.txt\")\n", "kh.deploy_model(\n", " adult_model_kdic,\n", " dictionary_name=\"SNB_Adult\",\n", diff --git a/Core Basics 2 - Train a Classifier on a Star Multi-Table Dataset.ipynb b/Core Basics 2 - Train a Classifier on a Star Multi-Table Dataset.ipynb index 430c05e..e5a7dce 100644 --- a/Core Basics 2 - Train a Classifier on a Star Multi-Table Dataset.ipynb +++ b/Core Basics 2 - Train a Classifier on a Star Multi-Table Dataset.ipynb @@ -78,9 +78,9 @@ "```\n", "The `HeadlineId` variable is special because it is a _key_ that links a particular headline to its words (a 1:n relation).\n", "\n", - "*Note: There are other methods more appropriate for this text-mining problem. This multi-table setup is only for pedagogical purporses.*\n", + "*Note: There are other methods more appropriate for this text-mining problem. This multi-table setup is only used for pedagogical purposes.*\n", "\n", - "To train a classifier with Khiops in this multi-table setup, this schema must be codified in the dictionary file. Let's check the contents of the `HeadlineSarcasm` dictionary file:" + "To train a classifier with Khiops in this multi-table setup, this schema must be coded in a dictionary file. Let's check the contents of the `HeadlineSarcasm` dictionary file:" ] }, { @@ -101,11 +101,11 @@ "metadata": {}, "source": [ "As in the single-table case the `.kdic`file describes the schema for both tables, but note the following differences:\n", - "- The dictionary for the table `Headline` is prefixed by the `Root` keyword to indicate that is the main one.\n", - "- For both tables, their dictionary names are followed by `(HeadlineId)` to indicate that `HeadlineId` is the key of these tables.\n", - "- The schema for the main table contains an extra special variable defined with the statement `Table(Words) HeadlineWords`. This is, in addition to sharing the same key variable, is necessary to indicate the `1:n` relationship between the main and secondary table.\n", + "- The dictionary for the table `Headline` is prefixed by the `Root` keyword. Here it is optional; it simply tags `Headline` as the main dictionary, the one representing the statistical instances.\n", + "- For both tables, dictionary names are followed by `(HeadlineId)` to indicate that `HeadlineId` is their key.\n", + "- The schema of the main table contains an extra special variable defined with the statement `Table(Words) HeadlineWords`.
Together with the shared key variable, this statement indicates the `1:n` relationship between the main and secondary tables.\n", "\n", - "Now let's store the location main and secondary tables and peek their contents:" + "Now let's store the locations of the main and secondary tables and peek at their contents:" ] }, { @@ -117,7 +117,7 @@ "sarcasm_headlines_file = os.path.join(\"data\", \"HeadlineSarcasm\", \"Headlines.txt\")\n", "sarcasm_words_file = os.path.join(\"data\", \"HeadlineSarcasm\", \"HeadlineWords.txt\")\n", "\n", - "print(f\"HeadlineSarcasm main table file: {sarcasm_headlines_file}\")\n", + "print(f\"HeadlineSarcasm main table file location: {sarcasm_headlines_file}\")\n", "print(\"\")\n", "peek(sarcasm_headlines_file, n=3)\n", "\n", @@ -133,12 +133,12 @@ "The call to the `train_predictor` will be very similar to the single-table case but there are some differences. \n", "\n", "The first is that we must pass the path of the extra secondary data table. This is done with the `additional_data_tables` parameter that is a Python dictionary containing key-value pairs for each table. More precisely:\n", - "- keys describe *data paths* of secondary tables. In this case only ``Headline`HeadlineWords``\n", - "- values describe the *file paths* of secondary tables. In this case only the file path we stored in `sarcasm_words_file`\n", + "- keys describe *data paths* of secondary tables. In this case, the only one is ``HeadlineWords``\n", + "- values describe the *file paths* of secondary tables. In this case, the only one is the file path we stored in `sarcasm_words_file`\n", "\n", - "*Note: For understanding what data paths are see the \"Multi-Table Tasks\" section of the Khiops `core.api` documentation*\n", + "*Note: To understand what data paths are, please check the \"Multi-Table Tasks\" section of the Khiops `core.api` documentation*\n", "\n", - "Secondly, we specify how many features/aggregates Khiops will create with its multi-table AutoML mode. For the `HeadlineSarcasm` dataset Khiops can create features such as:\n", + "Secondly, we must specify how many features/aggregates Khiops will create (at most) with its multi-table AutoML mode. For the `HeadlineSarcasm` dataset Khiops can create features such as:\n", "- *Number of different words in the headline* \n", "- *Most common word in the headline before the third one*\n", "- *Number of times the word 'the' appears*\n", @@ -146,7 +146,7 @@ "It will then evaluate, select and combine the created features to build a classifier.
We'll ask to create `1000` of these features (the default is `100`).\n", "\n", - "With these considerations, let's setup the some extra variables and train the classifier:" + "With these considerations, let's now train the classifier:" ] }, { @@ -155,15 +155,17 @@ "metadata": {}, "outputs": [], "source": [ - "sarcasm_results_dir = os.path.join(\"exercises\", \"HeadlineSarcasm\")\n", + "analysis_report_file_path_Sarcasm = os.path.join(\n", + " \"exercises\", \"HeadlineSarcasm\", \"AnalysisReport.khj\"\n", + ")\n", "\n", "sarcasm_report, sarcasm_model_kdic = kh.train_predictor(\n", " sarcasm_kdic,\n", " dictionary_name=\"Headline\", # This must be the main/root dictionary\n", " data_table_path=sarcasm_headlines_file, # This must be the data file for the main table\n", " target_variable=\"IsSarcasm\",\n", - " results_dir=sarcasm_results_dir,\n", - " additional_data_tables={\"Headline`HeadlineWords\": sarcasm_words_file},\n", + " analysis_report_file_path=analysis_report_file_path_Sarcasm,\n", + " additional_data_tables={\"HeadlineWords\": sarcasm_words_file},\n", " max_constructed_variables=1000, # by default Khiops constructs 100 variables for AutoML multi-table\n", " max_trees=0, # by default Khiops constructs 10 decision tree variables\n", ")\n", @@ -192,7 +194,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "*Note: In the multi-table case, the input tables must be sorted by their key column in lexicographical order. To do this you may use the Khiops `sort_data_table` function or your favorite software. The examples of this tutorial have their tables pre-sorted.*" + "*Note: In the multi-table case, the input tables must be sorted by their key column in lexicographical order. To do this, you may use the Khiops `sort_data_table` function. The examples of this tutorial have their tables pre-sorted.*" ] }, { @@ -201,7 +203,7 @@ "source": [ "### Exercise time!\n", "\n", - "Repeat the previous steps with the `AccidentsSummary` dataset. It describes the characteristics of traffic accidents that happened in France in 2018. It has two tables with the following schema:\n", + "Repeat the previous steps with the `AccidentsSummary` dataset. This dataset describes the characteristics of traffic accidents that happened in France in 2018. It has two tables with the following schema:\n", "```\n", "+---------------+\n", "|Accidents |\n", @@ -220,7 +222,7 @@ " +---1:n--->|... |\n", " +---------------+\n", "```\n", - "So for each accident we have its characteristics (such as `Gravity` or `Light` conditions) and those of each involved vehicle (its `Direction` or `PassengerNumber`). The main task for this dataset is to predict the variable `Gravity` that has two possible values:`Lethal` and `NonLethal`.\n", + "For each accident, we have its characteristics (such as `Gravity` or `Light` conditions) and those of each involved vehicle (its `Direction` or `PassengerNumber`). 
The main task for this dataset is to predict the variable `Gravity`, which has two possible values: `Lethal` and `NonLethal`.\n", "\n", "We first save the paths of the `AccidentsSummary` dictionary file and data table files into variables:" ] @@ -275,7 +277,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We now save the results directory for this exercise:" + "We now define the path of the modeling report for this exercise:" ] }, { @@ -284,8 +286,9 @@ "metadata": {}, "outputs": [], "source": [ - "accidents_results_dir = os.path.join(\"exercises\", \"AccidentSummary\")\n", - "print(f\"AccidentsSummary exercise results directory: {accidents_results_dir}\")" + "analysis_report_file_path_Accidents = os.path.join(\n", + " \"exercises\", \"AccidentSummary\", \"AnalysisReport.khj\"\n", + ")" ] }, { @@ -297,7 +300,7 @@ "\n", "Do not forget:\n", "- The target variable is `Gravity`\n", - "- The key for the `additional_data_tables` parameter is ``Accident`Vehicles`` and its value that of `vehicles_data_file`\n", + "- The key for the `additional_data_tables` parameter is ``Vehicles`` and its value is that of `vehicles_data_file`\n", "- Set `max_trees=0`" ] }, @@ -314,8 +317,8 @@ " dictionary_name=\"Accident\",\n", " data_table_path=accidents_data_file,\n", " target_variable=\"Gravity\",\n", - " results_dir=accidents_results_dir,\n", - " additional_data_tables={\"Accident`Vehicles\": vehicles_data_file},\n", + " analysis_report_file_path=analysis_report_file_path_Accidents,\n", + " additional_data_tables={\"Vehicles\": vehicles_data_file},\n", " max_constructed_variables=1000,\n", " max_trees=0,\n", ")\n", diff --git a/Core Basics 3 - Train a Classifier on a Snowflake Multi-Table Dataset.ipynb b/Core Basics 3 - Train a Classifier on a Snowflake Multi-Table Dataset.ipynb index accf3d2..70ca6b5 100644 --- a/Core Basics 3 - Train a Classifier on a Snowflake Multi-Table Dataset.ipynb +++ b/Core Basics 3 - Train a Classifier on a Snowflake Multi-Table Dataset.ipynb @@ -43,7 +43,7 @@ "source": [ "### Training a Multi-Table Classifier\n", "\n", - "We'll train a multi-table classifier on a extension of dataset `AccidentsSummary` that we used in the previous notebook _Sklearn Basics 2_. This dataset `Accidents` contains two additional tables `Place` and `User` and is organized in the following relational snowflake schema:\n", + "We'll train a multi-table classifier on an extension of the `AccidentsSummary` dataset that we used in the previous notebook _Core Basics 2_. This dataset, `Accidents`, contains two additional tables `Place` and `User` and is organized according to the following relational snowflake schema:\n", "\n", "```\n", "Accident\n", @@ -119,10 +119,10 @@ "source": [ "#### Train a classifier for the `Accidents` database with 1000 variables\n", "\n", - "The call to the train_predictor is exactly the same as seen before on the exercice of the previous notebook _Sklearn Basics 2_. The only difference is the extension of the dictionary `additional_data_tables`, which contains paths of the additional tables, with two new paths:\n", + "The call to the `train_predictor` function is exactly the same as seen in the previous notebook _Core Basics 2_.
The only difference is that the dictionary `additional_data_tables` is extended with two new data paths:\n", "\n", - "- Path of entity `Place` is ``Accident`Place``.\n", - "- Path of table `User` is ``Accident`Vehicles`Users``.\n", + "- The path of the entity `Place` is ``Place``.\n", + "- The path of the table `User` is ``Vehicles/Users``.\n", "\n", "\n", "Same as previously, we'll ask Khiops to create 1000 additional features with its multi-table AutoML mode.\n", @@ -140,17 +140,20 @@ "metadata": {}, "outputs": [], "source": [ - "accidents_results_dir = os.path.join(\"exercises\", \"Accidents\")\n", + "analysis_report_file_path_Accidents = os.path.join(\n", + " \"exercises\", \"Accidents\", \"AnalysisReport.khj\"\n", + ")\n", + "\n", "accidents_report, accidents_model_kdic = kh.train_predictor(\n", " accidents_kdic,\n", " dictionary_name=\"Accident\",\n", " data_table_path=accidents_data_file,\n", " target_variable=\"Gravity\",\n", - " results_dir=accidents_results_dir,\n", + " analysis_report_file_path=analysis_report_file_path_Accidents,\n", " additional_data_tables={\n", - " \"Accident`Vehicles\": vehicles_data_file,\n", - " \"Accident`Place\": places_data_file,\n", - " \"Accident`Vehicles`Users\": users_data_file,\n", + " \"Vehicles\": vehicles_data_file,\n", + " \"Place\": places_data_file,\n", + " \"Vehicles/Users\": users_data_file,\n", " },\n", " max_constructed_variables=1000,\n", " max_trees=0,\n", diff --git a/Core Basics 4 - Train a Coclustering.ipynb b/Core Basics 4 - Train a Coclustering.ipynb index 90ea629..e460399 100644 --- a/Core Basics 4 - Train a Coclustering.ipynb +++ b/Core Basics 4 - Train a Coclustering.ipynb @@ -79,14 +79,16 @@ "metadata": {}, "outputs": [], "source": [ - "countries_results_dir = os.path.join(\"exercises\", \"CountriesByOrganization\")\n", + "coclustering_report_file_path_CountriesByOrganization = os.path.join(\n", + " \"exercises\", \"CountriesByOrganization\", \"CoclusteringResults.khcj\"\n", + ")\n", "\n", "countries_cc_report = kh.train_coclustering(\n", " countries_kdic,\n", " dictionary_name=\"CountriesByOrganization\",\n", " data_table_path=countries_data_file,\n", " coclustering_variables=[\"Country\", \"Organization\"],\n", - " results_dir=countries_results_dir,\n", + " coclustering_report_file_path=coclustering_report_file_path_CountriesByOrganization,\n", " field_separator=\";\",\n", ")" ] }, @@ -137,10 +139,10 @@ "metadata": {}, "source": [ "### Exercise\n", - "We'll build a coclustering for the `Tokyo2021` dataset. It is extracted for the `Athletes` table of the [Tokyo 2021 Kaggle dataset](https://www.kaggle.com/arjunprasadsarkhel/2021-olympics-in-tokyo) and each record contains three variables:\n", - "- `Name`: the name of a competing athlete\n", - "- `Country`: the country (or organization) it represents\n", - "- `Discipline`: the athletes discipline\n", + "We'll build a coclustering for the `Tokyo2021` dataset, which contains the `Athletes` table extracted from the [Tokyo 2021 Kaggle dataset](https://www.kaggle.com/arjunprasadsarkhel/2021-olympics-in-tokyo), where each athlete is described by three variables:\n", + "- `Name`: the name of the competing athlete\n", + "- `Country`: the country (or organization) of the athlete\n", + "- `Discipline`: the athlete's discipline\n", "\n", "The idea for this exercise is to make a coclustering between `Country` and `Discipline` and see which countries resemble the most in terms of the athletes they bring to the Olympics.
\n", "\n", @@ -155,7 +157,9 @@ "source": [ "tokyo_kdic = os.path.join(\"data\", \"Tokyo2021\", \"Athletes.kdic\")\n", "tokyo_data_file = os.path.join(\"data\", \"Tokyo2021\", \"Athletes.csv\")\n", - "tokyo_results_dir = os.path.join(\"exercises\", \"Tokyo2021\")" + "coclustering_report_file_path_Tokyo2021 = os.path.join(\n", + " \"exercises\", \"Tokyo2021\", \"CoclusteringResults.khcj\"\n", + ")" ] }, { @@ -201,7 +205,7 @@ " dictionary_name=\"Athletes\",\n", " coclustering_variables=[\"Country\", \"Discipline\"],\n", " data_table_path=tokyo_data_file,\n", - " results_dir=tokyo_results_dir,\n", + " coclustering_report_file_path=coclustering_report_file_path_Tokyo2021,\n", " field_separator=\",\",\n", ")" ] @@ -266,7 +270,7 @@ "outputs": [], "source": [ "tokyo_discipline_clusters_file = os.path.join(\n", - " \"exercises\", \"Tokyo2021\", \"CountryClusters.txt\"\n", + " \"exercises\", \"Tokyo2021\", \"DisciplineClusters.txt\"\n", ")\n", "\n", "kh.extract_clusters(\n",