# Discretization Algorithms
Many machine learning techniques can be applied only to data sets composed of categorical attributes, yet many data sets include continuous variables. Cyni Toolbox provides a discretization tool that can be used either as a preliminary step toward network inference or as an independent feature. The tool is available in the Cyni Toolbox tab of the Cytoscape control panel. Clicking this tab displays a panel like the one shown in the figure below. At the bottom of the Cyni Toolbox panel there are three tabs; click the Discretize Data tab to access the data discretization tools.
In the Discretize Data tab, two drop-down boxes select the two main settings of this feature: the first selects the technique to use, and the second selects the data table containing the continuous values to be discretized. Once both are chosen, the Cyni panel is filled with the parameters for the chosen options. The following techniques are available in Cyni Toolbox.
## Equal Frequency/Width Algorithm
The equal-width discretization algorithm determines the minimum and maximum values of the attribute to be discretized and then divides that range into a user-defined number of intervals of equal width. The equal-frequency algorithm also determines the minimum and maximum values of the attribute, sorts all values in ascending order, and divides the range into a user-defined number of intervals so that every interval contains the same number of sorted values (as nearly as possible). Cyni combines these two possibilities in a single algorithm. The figure shows the Cyni panel where the parameters for this technique can be chosen.
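The two algorithms can be sketched in Python as follows. This is a minimal illustration that assumes a plain list of numbers as input. It is not Cyni Toolbox's own code, and the function names are hypothetical. The page does not say how Cyni places a value that falls exactly on a cut point, or how it handles ties in equal-frequency mode. The boundary rule used below, where a value on a cut point goes to the upper interval, is therefore an assumption.

```python
from bisect import bisect_right
from typing import List, Sequence


def equal_width_thresholds(values: Sequence[float], intervals: int) -> List[float]:
    """Return the intervals - 1 cut points that split [min, max] into equal-width parts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / intervals
    return [lo + width * i for i in range(1, intervals)]


def equal_frequency_thresholds(values: Sequence[float], intervals: int) -> List[float]:
    """Return cut points so each interval holds about the same number of values."""
    ordered = sorted(values)  # ascending order, as the algorithm requires
    n = len(ordered)
    # The i-th cut point is the sorted value at position i * n / intervals.
    return [ordered[(i * n) // intervals] for i in range(1, intervals)]


def assign_interval(value: float, thresholds: Sequence[float]) -> int:
    """Map a value to a 0-based interval index.

    Assumption: a value equal to a cut point is placed in the upper interval.
    """
    return bisect_right(thresholds, value)


values = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
width_cuts = equal_width_thresholds(values, 3)
freq_cuts = equal_frequency_thresholds(values, 3)
print(width_cuts, [assign_interval(v, width_cuts) for v in values])  # [4.0, 7.0] [0, 0, 0, 1, 1, 2]
print(freq_cuts, [assign_interval(v, freq_cuts) for v in values])    # [3.0, 5.0] [0, 0, 1, 1, 2, 2]
```

With these six values split into three intervals, equal width places three, two and one values in the intervals, because the value 10 stretches the range. Equal frequency places two values in each interval.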
Once the input parameters have been selected, clicking Apply produces a new column for each column chosen for discretization. Each new column has the same name as the original column, prefixed with “nominal.”. The input parameters for this technique are:
- Intervals: The number of intervals.
- Use Equal Frequency: Lets users choose between the equal-width and the equal-frequency algorithm. If not selected, the equal-width algorithm is used.
- Apply same discretization thresholds for all selected attributes: The technique can be applied either independently to each selected column/attribute, or to all selected columns/attributes treated as one long column, in which case the generated intervals are common to all of them. If not selected, each selected attribute gets its own intervals.
- Numerical Attributes: The list of columns/attributes to be discretized. This multiple-selection list shows the names of the columns in the selected table that contain numerical values; every column whose name is selected will be discretized.
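The sketch below illustrates the Apply same discretization thresholds option and the “nominal.” naming of the output columns. It makes several assumptions. The table is a dict mapping column names to lists of numbers, and only the equal-width variant is shown, for brevity. Each output cell holds a 0-based interval index. The page does not describe the labels Cyni actually writes into the new columns, so that encoding, like the function names, is hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def equal_width_thresholds(values: Sequence[float], intervals: int) -> List[float]:
    """Cut points that split [min, max] of the values into equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / intervals
    return [lo + width * i for i in range(1, intervals)]


def discretize_columns(
    table: Dict[str, List[float]],
    selected: Sequence[str],
    intervals: int,
    same_thresholds: bool,
) -> Dict[str, List[int]]:
    """Create one "nominal."-prefixed column per selected numerical column.

    Assumption: each cell holds a 0-based interval index; Cyni's actual labels may differ.
    """
    shared: List[float] = []
    if same_thresholds:
        # Treat all selected columns as one long column, so the cut points are common.
        pooled = [v for name in selected for v in table[name]]
        shared = equal_width_thresholds(pooled, intervals)
    result: Dict[str, List[int]] = {}
    for name in selected:
        # Without the option, each column gets cut points from its own values.
        cuts = shared if same_thresholds else equal_width_thresholds(table[name], intervals)
        # Assumption: a value equal to a cut point falls into the upper interval.
        result["nominal." + name] = [bisect_right(cuts, v) for v in table[name]]
    return result


table = {"a": [0.0, 1.0, 2.0], "b": [10.0, 20.0, 30.0]}
print(discretize_columns(table, ["a", "b"], 2, same_thresholds=False))
print(discretize_columns(table, ["a", "b"], 2, same_thresholds=True))
```

Take two intervals as an example. With independent thresholds, both columns get the pattern [0, 1, 1]. With pooled thresholds there is a single cut point at 15, so every value of column a falls into the first interval.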
## Manual Discretization
Cyni Toolbox also lets users define their own thresholds to create the desired discrete data. This option is available for a limited number of intervals, and the thresholds that separate the intervals must be specified. Thresholds can be entered in any order; the algorithm sorts all of them before using them. It is not possible to choose separate thresholds for different groups of numerical attributes. Therefore, once the input parameters have been selected, clicking Apply produces a new column for each column chosen for discretization, and the chosen thresholds are used for all selected attributes. Each new column has the same name as the original column, prefixed with “nominal.”.
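Manual discretization amounts to sorting the user's thresholds and locating each value among them. The sketch below makes several assumptions. The table is a dict mapping column names to lists of numbers, and k thresholds define k + 1 intervals. A value equal to a threshold falls into the upper interval. These points, along with the function name, are illustrative and not documented Cyni behavior.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def manual_discretize(
    table: Dict[str, List[float]],
    selected: Sequence[str],
    thresholds: Sequence[float],
) -> Dict[str, List[int]]:
    """Apply one user-defined set of thresholds to every selected column."""
    # Thresholds may be given in any order, so sort them before use.
    cuts = sorted(thresholds)
    # Assumption: k thresholds define k + 1 intervals (indices 0..k), and a value
    # equal to a threshold falls into the upper interval.
    return {
        "nominal." + name: [bisect_right(cuts, v) for v in table[name]]
        for name in selected
    }


table = {"x": [0.5, 1.5, 2.5, 3.5], "y": [3.0, 0.0, 1.0, 2.0]}
print(manual_discretize(table, ["x", "y"], [3.0, 1.0, 2.0]))
```

Because the thresholds are sorted first, passing them as [3.0, 1.0, 2.0] or as [1.0, 2.0, 3.0] produces the same columns. The same thresholds are applied to every selected column.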
The input parameters for this technique are:
- Number of Intervals: The number of intervals.
- Threshold X: A threshold used to create the intervals. Its value can be set with the slider or by typing the desired value in the text box.
- Numerical Attributes: The list of columns/attributes to be discretized. This multiple-selection list shows the names of the columns in the selected table that contain numerical values; every column whose name is selected will be discretized.