===============================================================
KnowledgeFlow GUI Quick Primer
===============================================================

What's new in the KnowledgeFlow:

Components can now be grouped together under a "meta" component. Start
by placing some components on the layout and connecting them
together. Then select a subset of components with the mouse by holding
down the left button and dragging a rectangle around them. You will
then be asked whether you wish to group the selected components and
for a name to give the group. The selected components will then be
replaced by a single icon on the layout. All grouped components can
still be configured, and connections made, by right-clicking on the
icon to display a pop-up menu. At the moment, meta components can't
form part of another group (this functionality will be added in a
later release). Eventually, functionality will be added to allow the
user to store custom groups in a user toolbar for reuse.

Introduction:

The KnowledgeFlow provides an alternative to the Explorer as a
graphical front end to Weka's core algorithms. The KnowledgeFlow is a
work in progress, so some of the functionality from the Explorer is not
yet available. On the other hand, there are things that can be done in
the KnowledgeFlow but not in the Explorer.

The KnowledgeFlow presents a "data-flow" inspired interface to
Weka. The user can select Weka components from a toolbar, place them
on a layout canvas and connect them together in order to form a
"knowledge flow" for processing and analyzing data. At present, all of
Weka's classifiers, filters, clusterers, loaders and savers are
available in the KnowledgeFlow along with some extra tools.

The KnowledgeFlow can handle data either incrementally or in batches
(the Explorer handles batch data only). Of course, learning from data
incrementally requires a classifier that can be updated on an
instance-by-instance basis. Currently in Weka there are four
classifiers that can handle data incrementally: NaiveBayesUpdateable,
IB1, IBk and LWR (locally weighted regression). There is also one meta
classifier, RacedIncrementalLogitBoost, that can use any regression
base learner to learn from discrete class data incrementally.
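To illustrate what "updated on an instance-by-instance basis" means, here is a minimal Java sketch of a Naive Bayes style classifier for nominal attributes (values coded as integers 0..k-1) that only ever sees one instance at a time. The class IncrementalNaiveBayesSketch and its methods are hypothetical and are not Weka's NaiveBayesUpdateable; in Weka itself, updateable classifiers implement the weka.classifiers.UpdateableClassifier interface instead.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Weka's NaiveBayesUpdateable) of a classifier that
 * learns one instance at a time. Attributes and the class are nominal and
 * encoded as integer indices.
 */
class IncrementalNaiveBayesSketch {
    private final int numClasses;
    private final int[] numValues;          // number of distinct values per attribute
    private final double[] classCounts;     // how often each class has been seen
    private final double[][][] valueCounts; // [attribute][class][value]

    IncrementalNaiveBayesSketch(int numClasses, int[] numValues) {
        this.numClasses = numClasses;
        this.numValues = numValues.clone();
        this.classCounts = new double[numClasses];
        this.valueCounts = new double[numValues.length][numClasses][];
        for (int a = 0; a < numValues.length; a++) {
            for (int c = 0; c < numClasses; c++) {
                valueCounts[a][c] = new double[numValues[a]];
            }
        }
    }

    /** Update the model with one labelled instance; earlier data is never revisited. */
    void update(int[] attributes, int classValue) {
        classCounts[classValue]++;
        for (int a = 0; a < attributes.length; a++) {
            valueCounts[a][classValue][attributes[a]]++;
        }
    }

    /** Class probability distribution using Laplace (add-one) smoothing. */
    double[] distribution(int[] attributes) {
        double total = Arrays.stream(classCounts).sum();
        double[] dist = new double[numClasses];
        double norm = 0.0;
        for (int c = 0; c < numClasses; c++) {
            double p = (classCounts[c] + 1) / (total + numClasses);
            for (int a = 0; a < attributes.length; a++) {
                p *= (valueCounts[a][c][attributes[a]] + 1) / (classCounts[c] + numValues[a]);
            }
            dist[c] = p;
            norm += p;
        }
        for (int c = 0; c < numClasses; c++) {
            dist[c] /= norm;
        }
        return dist;
    }

    /** Index of the most probable class. */
    int predict(int[] attributes) {
        double[] dist = distribution(attributes);
        int best = 0;
        for (int c = 1; c < dist.length; c++) {
            if (dist[c] > dist[best]) best = c;
        }
        return best;
    }

    public static void main(String[] args) {
        // One binary attribute, two classes; rows arrive one at a time as in a stream.
        IncrementalNaiveBayesSketch nb = new IncrementalNaiveBayesSketch(2, new int[] {2});
        int[][] stream = {{0}, {0}, {1}, {1}};
        int[] labels = {0, 0, 1, 1};
        for (int i = 0; i < stream.length; i++) {
            nb.update(stream[i], labels[i]);
        }
        System.out.println(nb.predict(new int[] {0}));
        System.out.println(nb.predict(new int[] {1}));
    }
}
```

Because update() only increments counts, each instance can be discarded once it has been processed, which is what makes learning from a stream possible.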

Features of the KnowledgeFlow:

* intuitive data-flow style layout
* process data in batches or incrementally
* process multiple batches or streams in parallel! (each separate flow
  executes in its own thread)
* chain filters together
* view models produced by classifiers for each fold in a cross validation
* visualize performance of incremental classifiers during
  processing (scrolling plots of classification accuracy, RMS error,
  predictions, etc.)
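The scrolling plots mentioned above show performance over recent predictions rather than over the whole stream. The sketch below keeps accuracy and RMS error over a sliding window of the last windowSize predictions; the class name, the window size and the simplified per-instance error (1 minus the probability given to the true class) are illustrative assumptions, not how Weka's IncrementalClassifierEvaluator computes its figures.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of the values a scrolling strip chart could plot while an
 * incremental classifier runs: accuracy and RMS error over the most recent
 * windowSize predictions.
 */
class SlidingWindowPerformance {
    private final int windowSize;
    private final Deque<Boolean> correct = new ArrayDeque<>();
    private final Deque<Double> squaredErrors = new ArrayDeque<>();
    private int numCorrect = 0;
    private double sumSquaredError = 0.0;

    SlidingWindowPerformance(int windowSize) {
        if (windowSize <= 0) throw new IllegalArgumentException("windowSize must be positive");
        this.windowSize = windowSize;
    }

    /**
     * Record one prediction. probabilityOfActual is the probability the classifier
     * gave to the true class; the assumed error for the instance is 1 - that value.
     */
    void record(boolean wasCorrect, double probabilityOfActual) {
        double err = 1.0 - probabilityOfActual;
        correct.addLast(wasCorrect);
        squaredErrors.addLast(err * err);
        if (wasCorrect) numCorrect++;
        sumSquaredError += err * err;
        // Once the window is full, forget the oldest prediction.
        if (correct.size() > windowSize) {
            if (correct.removeFirst()) numCorrect--;
            sumSquaredError -= squaredErrors.removeFirst();
        }
    }

    /** Fraction of correct predictions currently in the window. */
    double accuracy() {
        return correct.isEmpty() ? 0.0 : (double) numCorrect / correct.size();
    }

    /** Root mean squared error over the window (guarded against rounding below zero). */
    double rmsError() {
        return correct.isEmpty() ? 0.0 : Math.sqrt(Math.max(0.0, sumSquaredError) / correct.size());
    }

    public static void main(String[] args) {
        SlidingWindowPerformance perf = new SlidingWindowPerformance(4);
        boolean[] outcomes = {true, false, true, true, true, true};
        double[] probs = {0.9, 0.2, 0.8, 0.7, 1.0, 0.6};
        for (int i = 0; i < outcomes.length; i++) {
            perf.record(outcomes[i], probs[i]);
            // Each line is one new point on the scrolling plot.
            System.out.printf("after %d: accuracy=%.2f rmse=%.3f%n",
                    i + 1, perf.accuracy(), perf.rmsError());
        }
    }
}
```

A windowed measure reacts quickly when the data stream changes, whereas a cumulative average would hide recent behaviour behind everything seen so far.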

Components available in the KnowledgeFlow:

DataSources:
  All of Weka's loaders are available

DataSinks:
  All of Weka's savers are available

Filters:
  All of Weka's filters are available

Classifiers:
  All of Weka's classifiers are available

Clusterers:
  All of Weka's clusterers are available

Evaluation:
  TrainingSetMaker - make a data set into a training set
  TestSetMaker - make a data set into a test set
  CrossValidationFoldMaker - split any data set, training set or test set
    into folds
  TrainTestSplitMaker - split any data set, training set or test set into
    a training set and a test set
  ClassAssigner - assign a column to be the class for any data set, training
    set or test set
  ClassValuePicker - choose a class value to be considered as the "positive"
    class. This is useful when generating data for ROC style curves (see
    ModelPerformanceChart below)
  ClassifierPerformanceEvaluator - evaluate the performance of batch
    trained/tested classifiers
  IncrementalClassifierEvaluator - evaluate the performance of incrementally
    trained classifiers
  ClustererPerformanceEvaluator - evaluate the performance of batch
    trained/tested clusterers
  PredictionAppender - append classifier predictions to a test set. For
    discrete class problems, it can append either predicted class labels or
    probability distributions

Visualization:
  DataVisualizer - component that can pop up a panel for visualizing data in
    a single large 2D scatter plot
  ScatterPlotMatrix - component that can pop up a panel containing a matrix of
    small scatter plots (clicking on a small plot pops up a large scatter
    plot)
  AttributeSummarizer - component that can pop up a panel containing a matrix
    of histogram plots, one for each of the attributes in the input data
  ModelPerformanceChart - component that can pop up a panel for visualizing
    threshold (i.e. ROC style) curves
  TextViewer - component for showing textual data. Can show data sets,
    classification performance statistics, etc.
  GraphViewer - component that can pop up a panel for visualizing tree-based
    models
  StripChart - component that can pop up a panel that displays a scrolling
    plot of data (used for viewing the online performance of incremental
    classifiers)
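ModelPerformanceChart plots threshold curves for the class value chosen with ClassValuePicker. As a rough sketch of the underlying idea, the Java code below sorts instances by the predicted probability of the positive class and lowers the threshold one distinct score at a time, emitting a (false positive rate, true positive rate) point at each step. The class RocSketch and its methods are hypothetical and are not Weka's own threshold-curve code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the data behind a threshold (ROC style) curve.
 * scores[i] is the predicted probability that instance i belongs to the
 * "positive" class; isPositive[i] says whether it really does.
 */
class RocSketch {
    /** Returns {falsePositiveRate, truePositiveRate} points, starting at (0,0). */
    static List<double[]> rocPoints(double[] scores, boolean[] isPositive) {
        int n = scores.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        // Descending score: lowering the threshold admits instances in this order.
        Arrays.sort(order, (a, b) -> Double.compare(scores[b], scores[a]));

        int totalPos = 0;
        for (boolean p : isPositive) if (p) totalPos++;
        int totalNeg = n - totalPos;

        List<double[]> points = new ArrayList<>();
        points.add(new double[] {0.0, 0.0});
        int tp = 0;
        int fp = 0;
        for (int k = 0; k < n; k++) {
            int i = order[k];
            if (isPositive[i]) tp++; else fp++;
            // Emit a point only after all instances tied at this score are counted.
            boolean lastOfTie = k == n - 1 || scores[order[k + 1]] != scores[i];
            if (lastOfTie) {
                points.add(new double[] {
                    totalNeg == 0 ? 0.0 : (double) fp / totalNeg,
                    totalPos == 0 ? 0.0 : (double) tp / totalPos});
            }
        }
        return points;
    }

    /** Area under the curve using the trapezoidal rule. */
    static double auc(List<double[]> points) {
        double area = 0.0;
        for (int k = 1; k < points.size(); k++) {
            double[] p = points.get(k - 1);
            double[] q = points.get(k);
            area += (q[0] - p[0]) * (q[1] + p[1]) / 2.0;
        }
        return area;
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.8, 0.7, 0.4, 0.3, 0.1};
        boolean[] positive = {true, true, false, true, false, false};
        List<double[]> pts = rocPoints(scores, positive);
        for (double[] p : pts) System.out.printf("FPR=%.3f TPR=%.3f%n", p[0], p[1]);
        System.out.printf("AUC=%.3f%n", auc(pts));
    }
}
```

Tied scores are counted together before a point is emitted, so instances with equal scores produce a diagonal segment rather than an optimistic staircase.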


---------------

Launching the KnowledgeFlow:

The Weka GUI Chooser window is used to launch Weka's graphical
environments. Select the button labeled "KnowledgeFlow" to start the
KnowledgeFlow. Alternatively, you can launch the KnowledgeFlow from a
terminal window by typing "java weka.gui.beans.KnowledgeFlow" (this
requires weka.jar to be on your CLASSPATH).

At the top of the KnowledgeFlow window are seven tabs: DataSources,
DataSinks, Filters, Classifiers, Clusterers, Evaluation and
Visualization. The names are pretty much self-explanatory.

EXAMPLE:
-----------------
Setting up a flow to load an arff file (batch mode) and
perform a cross validation using J48 (Weka's C4.5 implementation).

First start the KnowledgeFlow.

Next click on the DataSources tab and choose "ArffLoader" from the
toolbar (the mouse pointer will change to crosshairs).

Next place the ArffLoader component on the layout area by clicking
somewhere on the layout (a copy of the ArffLoader icon will appear on
the layout area).

Next specify an arff file to load by first right clicking the mouse
over the ArffLoader icon on the layout. A pop-up menu will appear.
Select "Configure" under "Edit" in this menu and browse to the
location of your arff file. Alternatively, you can double-click on
the icon to bring up the configuration dialog (if the component in
question has one).

Next click the "Evaluation" tab at the top of the window and choose the
"ClassAssigner" component (which allows you to choose which column is
the class) from the toolbar. Place this on the layout.

Now connect the ArffLoader to the ClassAssigner: first right click
over the ArffLoader and select "dataSet" under "Connections" in
the menu. A "rubber band" line will appear. Move the mouse over the
ClassAssigner component and left click - a red line labeled "dataSet"
will connect the two components.

Next right click over the ClassAssigner and choose "Configure" from
the menu. This will pop up a window from which you can specify which
column is the class in your data (last is the default).

Next grab a "CrossValidationFoldMaker" component from the Evaluation
toolbar and place it on the layout. Connect the ClassAssigner to the
CrossValidationFoldMaker by right clicking over "ClassAssigner",
selecting "dataSet" from under "Connections" in the menu, and then
left clicking on the CrossValidationFoldMaker.

Next click on the "Classifiers" tab at the top of the window and
scroll along the toolbar until you reach the "J48" component in the
"trees" section. Place a J48 component on the layout.

Connect the CrossValidationFoldMaker to J48 TWICE by first choosing
"trainingSet" and then "testSet" from the pop-up menu for the
CrossValidationFoldMaker (each time, left click on J48 to complete
the connection).
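Both connections are needed because, for every fold, the CrossValidationFoldMaker emits a training set and a matching test set. The hypothetical FoldSketch class below shows that structure using row indices only; the seed, the simple round-robin (unstratified) assignment of shuffled rows to folds, and all names are assumptions made for illustration, not a description of how Weka's component builds its folds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch of a k-fold split: every row lands in exactly one test
 * set, and each fold's training set is made of all the remaining rows.
 */
class FoldSketch {
    /** One fold: the rows to train on and the rows to test on. */
    static final class Fold {
        final List<Integer> train;
        final List<Integer> test;

        Fold(List<Integer> train, List<Integer> test) {
            this.train = train;
            this.test = test;
        }
    }

    static List<Fold> makeFolds(int numRows, int k, long seed) {
        if (k < 2 || k > numRows) {
            throw new IllegalArgumentException("need 2 <= k <= numRows");
        }
        // Shuffle row indices so folds do not simply follow file order.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < numRows; i++) rows.add(i);
        Collections.shuffle(rows, new Random(seed));

        List<Fold> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            List<Integer> train = new ArrayList<>();
            List<Integer> test = new ArrayList<>();
            for (int pos = 0; pos < numRows; pos++) {
                // The row at shuffled position pos is tested in fold (pos % k) only.
                if (pos % k == f) test.add(rows.get(pos));
                else train.add(rows.get(pos));
            }
            folds.add(new Fold(train, test));
        }
        return folds;
    }

    public static void main(String[] args) {
        // Each pair corresponds to what travels along "trainingSet" and "testSet".
        for (Fold fold : makeFolds(10, 3, 42L)) {
            System.out.println("trainingSet=" + fold.train + "  testSet=" + fold.test);
        }
    }
}
```

Since a model is trained once per fold, J48 produces k separate trees in this flow, one for each training set.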

Next go back to the "Evaluation" tab and place a
"ClassifierPerformanceEvaluator" component on the layout. Connect J48
to this component by selecting the "batchClassifier" entry from the
pop-up menu for J48.

Next go to the "Visualization" toolbar and place a "TextViewer"
component on the layout. Connect the ClassifierPerformanceEvaluator to
the TextViewer by selecting the "text" entry from the pop-up menu for
ClassifierPerformanceEvaluator.

Now start the flow executing by selecting "Start loading" from the
pop-up menu for ArffLoader. Depending on how big the data set is and
how long cross validation takes, you will see some animation from some
of the icons in the layout (J48's tree will "grow" in the icon and the
ticks will animate on the ClassifierPerformanceEvaluator). You will
also see some progress information in the "Status" bar and "Log" at
the bottom of the window.

When finished, you can view the results by choosing "Show results"
from the pop-up menu for the TextViewer component.
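The results in the TextViewer summarize the test results from all folds together rather than reporting each fold separately. A minimal sketch of that pooling step, assuming classes are coded as integers and that each fold delivers arrays of actual and predicted labels, is shown below; PooledEvaluationSketch is a hypothetical name and does not reproduce Weka's own evaluation code or its output format.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: pools (actual, predicted) class labels from every
 * cross-validation fold into one confusion matrix and summary.
 */
class PooledEvaluationSketch {
    private final int[][] confusion; // rows = actual class, columns = predicted class

    PooledEvaluationSketch(int numClasses) {
        confusion = new int[numClasses][numClasses];
    }

    /** Add the test-set results produced by one fold's model. */
    void addFold(int[] actual, int[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("actual and predicted must have the same length");
        }
        for (int i = 0; i < actual.length; i++) {
            confusion[actual[i]][predicted[i]]++;
        }
    }

    /** Number of test instances seen over all folds. */
    int total() {
        int n = 0;
        for (int[] row : confusion) for (int count : row) n += count;
        return n;
    }

    /** Number of correct predictions (the diagonal of the confusion matrix). */
    int correct() {
        int n = 0;
        for (int c = 0; c < confusion.length; c++) n += confusion[c][c];
        return n;
    }

    double accuracy() {
        int n = total();
        return n == 0 ? 0.0 : (double) correct() / n;
    }

    String summary() {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("Correctly classified: %d of %d (%.2f%%)%n",
                correct(), total(), 100.0 * accuracy()));
        sb.append("Confusion matrix (rows = actual, columns = predicted):\n");
        for (int[] row : confusion) {
            sb.append(Arrays.toString(row)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PooledEvaluationSketch eval = new PooledEvaluationSketch(2);
        eval.addFold(new int[] {0, 0, 1}, new int[] {0, 1, 1}); // test set of fold 1
        eval.addFold(new int[] {1, 0}, new int[] {1, 0});       // test set of fold 2
        System.out.print(eval.summary());
    }
}
```

Pooling works because cross validation tests every instance exactly once, so the combined counts describe the whole data set.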

Other cool things to add to this flow: connect a TextViewer and/or a
GraphViewer to J48 in order to view the textual or graphical
representations of the trees produced for each fold of the cross
validation (this is something that is not possible in the Explorer).
-----------------------------
