README.md
+27 -13 (27 additions & 13 deletions)
@@ -6,33 +6,47 @@ Note: As of right now, only supervised learning support is being developed. Unsu
6
6
7
7
## How To Use: Returning Data Object
8
8
9
-
To return a data object, the user will call the **ML_Data_Process** function. This function has 5* parameters:
9
+
To return a data object, the user will call the **ML_Data_Process** function. This function has 6 parameters:
10
10
11
-
1. SearchProperties
12
-
13
-
These are the properties that the user wants to *search* for. For example, say a user is interested in searching for materials whose color property is red. This should be specified in this parameter
14
-
15
-
2. FeatureProperties
11
+
1. FeatureProperties
16
12
17
13
These are the properties the user wants to use as the features of their data. To continue the example from before, they may only want materials whose color property is red but want the properties that serve as the features for their model to be something such as molar mass, stress, solubility, etc. They should specify those properties here.
18
14
19
-
3. LabelProperties
15
+
2. LabelProperties
20
16
21
17
These are the properties that serve as the labels for the model, similar to the features.
22
18
23
-
4. Preprocess
19
+
3. Library
24
20
25
-
This parameter is where the user can specify if they want the function to preprocess the data for them. It can be set to False, in which case the function will do no preprocessing and only return a data object compatible with the user's machine learning library (if supported). If the user does want the function to preprocess the data for them, they can set Preprocess to a list of Boolean variables representing what types of preprocessing they want to do. As of right now, this list is of the form [Normalization, Outlier Removal, One Hot Encoding]. In other words, if the user is dealing with numeric data and wants it to be normalized and the outliers to be removed, they would set it to [True, True, False]. Or if they were dealing with categorical data and needed it to be one hot encoded, they would set it to [False, False, True].
21
+
This is where the user specifies which of the supported machine learning libraries they are using. As of now, support is being developed for scikit-learn, TensorFlow, and PyTorch. The library is specified using an enum named ML_Library. For example, if the user is using PyTorch, they would pass in the value ML_Library.PYTORCH.
26
22
27
-
5. Library
23
+
4. SearchParameter
28
24
29
-
This is where the user specifies which of the supported machine learning libraries they are using. As of now, support is being developed for Scikit-learn, Tensorflow, and Pytorch. This is specified using an enum named ML_Library. For example, if the user is using pytorch, they would pass in the value ML_Library.PYTORCH
25
+
This should be a substring present in the names of all the materials the user wants to go through. For example, if the user is only interested in materials that contain "carb" in their name, they would set SearchParameter to "carb". The default value for this parameter is an empty string ''.
26
+
27
+
5. Preprocess
28
+
29
+
This parameter is where the user can specify whether they want the function to preprocess the data for them. The default value is False, meaning that, if it is not specified, no preprocessing is done and the function only returns a data object compatible with the user's machine learning library (if supported). If the user wants the function to preprocess the data, they can set Preprocess to a list of Boolean values representing which types of preprocessing to apply. As of right now, this list is of the form [Normalization, Outlier Removal, One Hot Encoding]. In other words, if the user is dealing with numeric data and wants it to be normalized and the outliers removed, they would set it to [True, True, False]. If they are dealing with categorical data that needs to be one-hot encoded, they would set it to [False, False, True].
30
+
31
+
6. Limit
32
+
33
+
This parameter controls how many materials the search function will go through, *not how many final materials there are*. After the search function gathers that many materials, many of them are removed because they lack the specified properties, so the final count is usually smaller than Limit. By default this is set to 10000.
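
To make the parameter semantics above concrete, here is a minimal Python sketch. It is not the project's implementation of ML_Data_Process. The `ML_Library` member names other than `PYTORCH`, the helpers `describe_preprocess` and `select_materials`, and the in-memory `materials` dictionary are all assumptions made for illustration. It also assumes that Limit caps the materials gathered by the search before those lacking the requested properties are dropped.

```python
from enum import Enum, auto


class ML_Library(Enum):
    """Stand-in for the README's enum; only PYTORCH is named in the text."""
    SCIKIT_LEARN = auto()  # assumed member name
    TENSORFLOW = auto()    # assumed member name
    PYTORCH = auto()


# Order of the flags as described for the Preprocess parameter.
PREPROCESS_STEPS = ("normalization", "outlier removal", "one hot encoding")


def describe_preprocess(preprocess=False):
    """Return the names of the preprocessing steps a Preprocess value enables."""
    if preprocess is False:
        return []  # default: no preprocessing
    if len(preprocess) != len(PREPROCESS_STEPS):
        raise ValueError("Preprocess must be False or a list of 3 booleans")
    return [step for step, on in zip(PREPROCESS_STEPS, preprocess) if on]


def select_materials(materials, properties, search_parameter="", limit=10000):
    """Hypothetical search: gather up to `limit` name matches, then drop
    materials that lack any of the requested properties."""
    gathered = [name for name in materials if search_parameter in name][:limit]
    return [
        name for name in gathered
        if all(prop in materials[name] for prop in properties)
    ]


# Placeholder data: material name -> available properties (values are dummies).
materials = {
    "carbon steel": {"molar mass": 1.0, "stress": 2.0},
    "carbide": {"molar mass": 3.0},
    "calcium carbonate": {"molar mass": 4.0, "stress": 5.0},
    "iron": {"molar mass": 6.0, "stress": 7.0},
}

steps = describe_preprocess([True, True, False])
kept = select_materials(materials, ["molar mass", "stress"], "carb", limit=2)
print(steps)
print(kept)
```

With `limit=2`, the search gathers two materials whose names contain "carb", but only one of them has both requested properties, so the final list is shorter than the limit.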
30
34
31
35
## How to Use: Exporting
32
36
33
37
There is another function in this code called **ML_Data_Save**. Instead of returning a data object compatible with a machine learning library, it saves the data to a CSV file on the user's machine. This is useful if the user wants to analyze the data with other tools, for example in R.
34
38
35
-
The use of this function is almost identical to the previous function. It has the same parameters except for Preprocess and Library. They function the same as in the previous function. The user calls this function with their parameters and it will save the material data into a csv file named "material_data"
39
+
This function has 3 parameters:
40
+
41
+
1. Properties
42
+
43
+
These are the properties that the user wants to save as part of their data, similar to FeatureProperties and LabelProperties.
44
+
45
+
2. Limit
46
+
47
+
This is the same as in ML_Data_Process.
48
+
49
+
3. SearchParameter
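
As a rough illustration of the export step, the following sketch writes a few material records to a CSV file using Python's standard `csv` module. The helper `save_material_data`, the record layout, and the placeholder values are assumptions; the project's ML_Data_Save may work differently. The file name `material_data` is borrowed from the earlier wording of this README.

```python
import csv
import os
import tempfile


def save_material_data(rows, properties, path):
    """Write one CSV row per material, keeping only the requested properties.

    Hypothetical helper, not the project's ML_Data_Save.
    """
    columns = ["name", *properties]
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        for row in rows:
            writer.writerow({key: row[key] for key in columns})


# Placeholder records; the values carry no physical meaning.
rows = [
    {"name": "carbon steel", "molar mass": 1.0, "stress": 2.0, "color": "grey"},
    {"name": "calcium carbonate", "molar mass": 3.0, "stress": 4.0, "color": "white"},
]

# Write into a temporary directory so the example leaves no stray files behind.
out_path = os.path.join(tempfile.mkdtemp(), "material_data.csv")
save_material_data(rows, ["molar mass", "stress"], out_path)

# Read the file back to confirm what was written.
with open(out_path, newline="") as handle:
    saved = list(csv.reader(handle))
print(saved)
```

Only the columns listed in `properties` (plus the material name) end up in the file, which mirrors how the Properties parameter is described above.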
36
50
37
-
NOTE: These features are still early in development, and as such do not currently function and use is heavily subject to change