Commit e701ae1

Update README.md
Updated to be accurate to the most recent version
1 parent 533f116 commit e701ae1

1 file changed

+27
-13
lines changed
README.md

Lines changed: 27 additions & 13 deletions
@@ -6,33 +6,47 @@ Note: As of right now, only supervised learning support is being developed. Unsu
66

77
## How To Use: Returning Data Object
88

9-
To return a data object, the user will call the **ML_Data_Process** function. This function has 5* parameters:
9+
To return a data object, the user will call the **ML_Data_Process** function. This function has 6 parameters:
1010

11-
1. SearchProperties
12-
13-
These are the properties that the user wants to *search* for. For example, say a user is interested in searching for materials whose color property is red. This should be specified in this parameter
14-
15-
2. FeatureProperties
11+
1. FeatureProperties
1612

1713
These are the properties the user wants to use as the features of their data. To continue the example from before, they may only want materials whose color property is red but want the properties that serve as the features for their model to be something such as molar mass, stress, solubility, etc. They should specify those properties here.
1814

19-
3. LabelProperties
15+
2. LabelProperties
2016

2117
These are the properties that serve as the labels for the model, similar to the features.
2218

23-
4. Preprocess
19+
3. Library
2420

25-
This parameter is where the user can specify if they want the function to preprocess the data for them. It can be set to False, in which case the function will do no preprocessing and only return a data object compatible with the user's machine learning library (if supported). If the user does want the function to preprocess the data for them, they can set Preprocess to a list of Boolean variables representing what types of preprocessing they want to do. As of right now, this list is of the form [Normalization, Outlier Removal, One Hot Encoding]. In other words, if the user is dealing with numeric data and wants it to be normalized and the outliers to be removed, they would set it to [True, True, False]. Or if they were dealing with categorical data and needed it to be one hot encoded, they would set it to [False, False, True].
21+
This is where the user specifies which of the supported machine learning libraries they are using. As of now, support is being developed for Scikit-learn, TensorFlow, and PyTorch. The library is specified using an enum named ML_Library. For example, a user working with PyTorch would pass in the value ML_Library.PYTORCH.
2622

27-
5. Library
23+
4. SearchParameter
2824

29-
This is where the user specifies which of the supported machine learning libraries they are using. As of now, support is being developed for Scikit-learn, Tensorflow, and Pytorch. This is specified using an enum named ML_Library. For example, if the user is using pytorch, they would pass in the value ML_Library.PYTORCH
25+
This should be a substring present in the names of all the materials the user wants to go through. For example, a user interested only in materials that contain "carb" in their name would set SearchParameter to "carb". The default value for this parameter is an empty string ''.
26+
27+
5. Preprocess
28+
29+
This parameter specifies whether the function should preprocess the data. The default value is False, in which case no preprocessing is done and the function only returns a data object compatible with the user's machine learning library (if supported). To enable preprocessing, set Preprocess to a list of Boolean values indicating which types of preprocessing to apply. As of right now, this list is of the form [Normalization, Outlier Removal, One Hot Encoding]. For example, a user dealing with numeric data who wants it normalized and the outliers removed would set it to [True, True, False], while a user dealing with categorical data that needs to be one-hot encoded would set it to [False, False, True].
30+
31+
6. Limit
32+
33+
This parameter controls how many materials the search function will go through, *not how many final materials there are*. After the search function gathers that many materials, many may have to be removed because they lack the specified properties, so the final data set can be smaller than Limit. By default this is set to 10000.
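
The six parameters above can be summarized as a call signature. The sketch below is a placeholder, not the project's implementation: it only mirrors the documented parameter names, order, and defaults, and unpacks the Preprocess list into its three flags. The enum member names other than PYTORCH, the returned dictionary, and the validation checks are assumptions made for illustration.

```python
from enum import Enum


class ML_Library(Enum):
    # Only PYTORCH is named in the README; the other member names are assumed.
    SCIKIT_LEARN = "scikit-learn"
    TENSORFLOW = "tensorflow"
    PYTORCH = "pytorch"


def ML_Data_Process(FeatureProperties, LabelProperties, Library,
                    SearchParameter="", Preprocess=False, Limit=10000):
    """Placeholder mirroring the documented parameters and defaults.

    It performs no search; it only validates and echoes the arguments.
    """
    if not isinstance(Library, ML_Library):
        raise TypeError("Library must be an ML_Library member")
    if Preprocess is False:
        # Default: no preprocessing of any kind.
        flags = [False, False, False]
    else:
        flags = list(Preprocess)
        if len(flags) != 3:
            raise ValueError("Preprocess must be False or a list of 3 Booleans")
    # Documented order: [Normalization, Outlier Removal, One Hot Encoding].
    normalize, remove_outliers, one_hot = flags
    return {
        "features": list(FeatureProperties),
        "labels": list(LabelProperties),
        "library": Library,
        "search": SearchParameter,
        "normalize": normalize,
        "remove_outliers": remove_outliers,
        "one_hot": one_hot,
        "limit": Limit,
    }


# Numeric features from materials whose name contains "carb",
# normalized and with outliers removed, for use with PyTorch.
request = ML_Data_Process(
    ["molar mass", "solubility"],
    ["stress"],
    ML_Library.PYTORCH,
    SearchParameter="carb",
    Preprocess=[True, True, False],
)
print(request)
```

Leaving SearchParameter, Preprocess, and Limit out of the call falls back to the documented defaults ('', False, and 10000).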
3034

3135
## How to Use: Exporting
3236

3337
There is another function in this code called **ML_Data_Save**. Instead of returning a data object compatible with a machine learning library, it saves the data to a CSV file on the user's machine. This is useful if, for example, the user wants to analyze the data in R instead.
3438

35-
The use of this function is almost identical to the previous function. It has the same parameters except for Preprocess and Library. They function the same as in the previous function. The user calls this function with their parameters and it will save the material data into a csv file named "material_data"
39+
This function has 3 parameters:
40+
41+
1. Properties
42+
43+
These are the properties that the user wants to save as part of their data, similar to FeatureProperties and LabelProperties.
44+
45+
2. Limit
46+
47+
This is the same as in ML_Data_Process.
48+
49+
3. SearchParameter
3650

37-
NOTE: These features are still early in development, and as such do not currently function and use is heavily subject to change
51+
This is the same as in ML_Data_Process.
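
To illustrate how Properties, Limit, and SearchParameter might interact, here is a toy sketch using Python's standard csv module. It is not the project's code: the `MATERIALS` list, the `save_materials` helper, and the choice to apply Limit after the name match are all assumptions. It shows why the number of saved rows can be smaller than Limit: materials are gathered first, and those missing a requested property are dropped afterwards.

```python
import csv
import io

# Toy stand-in for the material source; names and values are invented.
MATERIALS = [
    {"name": "calcium carbonate", "molar mass": 100.09, "solubility": 0.013},
    {"name": "silicon carbide", "molar mass": 40.10},  # lacks solubility
    {"name": "sodium chloride", "molar mass": 58.44, "solubility": 360.0},
    {"name": "sodium bicarbonate", "molar mass": 84.01, "solubility": 96.0},
]


def save_materials(out, Properties, Limit=10000, SearchParameter=""):
    """Write matching materials as CSV rows to the file-like object `out`."""
    # Step 1: gather up to Limit materials whose name contains the substring.
    searched = [m for m in MATERIALS if SearchParameter in m["name"]][:Limit]
    # Step 2: drop any gathered material that lacks a requested property.
    kept = [m for m in searched if all(p in m for p in Properties)]
    writer = csv.writer(out)
    writer.writerow(["name", *Properties])
    for m in kept:
        writer.writerow([m["name"], *(m[p] for p in Properties)])
    return len(searched), len(kept)


buffer = io.StringIO()  # an open file could be passed instead
searched, kept = save_materials(
    buffer, ["molar mass", "solubility"], Limit=2, SearchParameter="carb"
)
print(searched, kept)
print(buffer.getvalue())
```

With Limit=2, two materials are gathered but only one is written, because "silicon carbide" has no solubility value in the toy data.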
3852
