1 | =============================================================== |
---|
2 | KnowledgeFlow GUI Quick Primer |
---|
3 | =============================================================== |
---|
4 | |
---|
5 | What's new in the KnowledgeFlow: |
---|
6 | |
---|
7 | Components can now be grouped together under a "meta" component. Start |
---|
8 | by placing some components on the layout and connecting them |
---|
9 | together. Then select a subset of components with the mouse by holding |
---|
10 | down the left button and dragging the resulting rectangle. You will |
---|
11 | then be asked whether you wish to group the selected components and |
---|
12 | for a name to give the group. The selected components will then be |
---|
13 | replaced by a single icon on the layout. All grouped beans can still |
---|
14 | be configured and connections made by right-clicking on the icon to |
---|
15 | display a pop-up menu. At the moment meta components can't form part |
---|
16 | of another group (this functionality will be added in a later |
---|
17 | release). Eventually, functionality will be added to allow the user |
---|
18 | to store custom groups in a user toolbar for reuse. |
---|
19 | |
---|
20 | Introduction: |
---|
21 | |
---|
22 | The KnowledgeFlow provides an alternative to the Explorer as a |
---|
23 | graphical front end to Weka's core algorithms. The KnowledgeFlow is a |
---|
24 | work in progress so some of the functionality from the Explorer is not |
---|
25 | yet available. On the other hand, there are things that can be done in |
---|
26 | the KnowledgeFlow but not in the Explorer. |
---|
27 | |
---|
28 | The KnowledgeFlow presents a "data-flow" inspired interface to |
---|
29 | Weka. The user can select Weka components from a tool bar, place them |
---|
30 | on a layout canvas and connect them together in order to form a |
---|
31 | "knowledge flow" for processing and analyzing data. At present, all of |
---|
32 | Weka's classifiers, filters, clusterers, loaders and savers are |
---|
33 | available in the KnowledgeFlow along with some extra tools. |
---|
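The "data-flow" idea described above can be pictured as components that
push their output to whatever is connected downstream. The following is
a minimal sketch in plain Java, not Weka's own API: the names
RowListener, FlowSource, UpperCaseFilter and CollectingSink are
hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical listener interface: a component that accepts a batch of rows.
interface RowListener {
    void acceptRows(List<String> rows);
}

// Source component: holds data and pushes it to every connected listener.
class FlowSource {
    private final List<RowListener> listeners = new ArrayList<>();
    private final List<String> rows;

    FlowSource(List<String> rows) {
        this.rows = rows;
    }

    void connect(RowListener listener) {
        listeners.add(listener);
    }

    void start() {
        for (RowListener listener : listeners) {
            listener.acceptRows(rows);
        }
    }
}

// Filter component: transforms what it receives and forwards the result.
class UpperCaseFilter implements RowListener {
    private final List<RowListener> listeners = new ArrayList<>();

    void connect(RowListener listener) {
        listeners.add(listener);
    }

    @Override
    public void acceptRows(List<String> rows) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            out.add(row.toUpperCase());
        }
        for (RowListener listener : listeners) {
            listener.acceptRows(out);
        }
    }
}

// Sink component: stores whatever reaches the end of the flow.
class CollectingSink implements RowListener {
    final List<String> received = new ArrayList<>();

    @Override
    public void acceptRows(List<String> rows) {
        received.addAll(rows);
    }
}

class FlowSketch {
    public static void main(String[] args) {
        FlowSource source = new FlowSource(Arrays.asList("sunny", "rainy"));
        UpperCaseFilter filter = new UpperCaseFilter();
        CollectingSink sink = new CollectingSink();

        // "Connecting" two components means registering one as a listener.
        source.connect(filter);
        filter.connect(sink);

        // Starting the source pushes the data through the whole chain.
        source.start();
        System.out.println(sink.received);
    }
}
```

Connecting components on the layout plays the same role as the
connect() calls here, and starting the loader corresponds to start().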
34 | |
---|
35 | The KnowledgeFlow can handle data either incrementally or in batches |
---|
36 | (the Explorer handles batch data only). Of course, learning from data |
---|
37 | incrementally requires a classifier that can be updated on an |
---|
38 | instance-by-instance basis. The classifiers in Weka that can handle |
---|
39 | data incrementally include NaiveBayesUpdateable, IB1, IBk and LWR |
---|
40 | (locally weighted regression). There is also one meta classifier, |
---|
41 | RacedIncrementalLogitBoost, that can make use of any regression base |
---|
42 | learner to learn from discrete class data incrementally. |
---|
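To make "updated on an instance-by-instance basis" concrete, here is a
small sketch in plain Java. It is an illustration only and does not use
Weka: MajorityLearner is a hypothetical toy learner that predicts the
most frequent class seen so far, and the test-then-train loop is one
common way of scoring an incremental learner, assumed here for the
example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy learner: predicts the most frequent label seen so far.
class MajorityLearner {
    private final Map<String, Integer> counts = new HashMap<>();

    // Incremental update: absorb one labelled instance at a time.
    void update(String label) {
        counts.merge(label, 1, Integer::sum);
    }

    // Returns null until at least one instance has been seen.
    String predict() {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}

class IncrementalSketch {
    // Test-then-train: predict each instance first, then learn from it.
    static double accuracy(String[] labels) {
        MajorityLearner learner = new MajorityLearner();
        int correct = 0;
        for (String actual : labels) {
            if (actual.equals(learner.predict())) {
                correct++;
            }
            learner.update(actual);
        }
        return labels.length == 0 ? 0.0 : (double) correct / labels.length;
    }

    public static void main(String[] args) {
        String[] stream = {"yes", "yes", "no", "yes"};
        System.out.println("accuracy so far: " + accuracy(stream));
    }
}
```

A batch learner would need the whole data set before it could predict
anything; the learner above is usable after every single update() call,
which is what makes a running performance plot possible.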
43 | |
---|
44 | Features of the KnowledgeFlow: |
---|
45 | |
---|
46 | * intuitive data flow style layout |
---|
47 | * process data in batches or incrementally |
---|
48 | * process multiple batches or streams in parallel! (each separate flow |
---|
49 | executes in its own thread) |
---|
50 | * chain filters together |
---|
51 | * view models produced by classifiers for each fold in a cross validation |
---|
52 | * visualize performance of incremental classifiers during |
---|
53 | processing (scrolling plots of classification accuracy, RMS error, |
---|
54 | predictions etc.) |
---|
55 | |
---|
56 | Components available in the KnowledgeFlow: |
---|
57 | |
---|
58 | DataSources: |
---|
59 | All of Weka's loaders are available |
---|
60 | |
---|
61 | DataSinks: |
---|
62 | All of Weka's savers are available |
---|
63 | |
---|
64 | Filters: |
---|
65 | All of Weka's filters are available |
---|
66 | |
---|
67 | Classifiers: |
---|
68 | All of Weka's classifiers are available |
---|
69 | |
---|
70 | Clusterers: |
---|
71 | All of Weka's clusterers are available |
---|
72 | |
---|
73 | Evaluation: |
---|
74 | TrainingSetMaker - make a data set into a training set |
---|
75 | TestSetMaker - make a data set into a test set |
---|
76 | CrossValidationFoldMaker - split any data set, training set or test set |
---|
77 | into folds |
---|
78 | TrainTestSplitMaker - split any data set, training set or test set into |
---|
79 | a training set and a test set |
---|
80 | ClassAssigner - assign a column to be the class for any data set, training |
---|
81 | set or test set |
---|
82 | ClassValuePicker - choose a class value to be considered as the "positive" |
---|
83 | class. This is useful when generating data for ROC style curves (see |
---|
84 | below) |
---|
85 | ClassifierPerformanceEvaluator - evaluate the performance of batch |
---|
86 | trained/tested classifiers |
---|
87 | IncrementalClassifierEvaluator - evaluate the performance of incrementally |
---|
88 | trained classifiers |
---|
89 | ClustererPerformanceEvaluator - evaluate the performance of batch |
---|
90 | trained/tested clusterers |
---|
91 | PredictionAppender - append classifier predictions to a test set. For |
---|
92 | discrete class problems, can either append predicted class labels or |
---|
93 | probability distributions |
---|
94 | |
---|
95 | Visualization: |
---|
96 | DataVisualizer - component that can pop up a panel for visualizing data in |
---|
97 | a single large 2D scatter plot |
---|
98 | ScatterPlotMatrix - component that can pop up a panel containing a matrix of |
---|
99 | small scatter plots (clicking on a small plot pops up a large scatter |
---|
100 | plot) |
---|
101 | AttributeSummarizer - component that can pop up a panel containing a matrix |
---|
102 | of histogram plots - one for each of the attributes in the input data |
---|
103 | ModelPerformanceChart - component that can pop up a panel for visualizing |
---|
104 | threshold (i.e. ROC style) curves. |
---|
105 | TextViewer - component for showing textual data. Can show data sets, |
---|
106 | classification performance statistics etc. |
---|
107 | GraphViewer - component that can pop up a panel for visualizing tree based |
---|
108 | models |
---|
109 | StripChart - component that can pop up a panel that displays a scrolling |
---|
110 | plot of data (used for viewing the online performance of incremental |
---|
111 | classifiers) |
---|
112 | |
---|
113 | |
---|
114 | --------------- |
---|
115 | |
---|
116 | Launching the KnowledgeFlow: |
---|
117 | |
---|
118 | The Weka GUI Chooser window is used to launch Weka's graphical |
---|
119 | environments. Select the button labeled "KnowledgeFlow" to start the |
---|
120 | KnowledgeFlow. Alternatively, you can launch the KnowledgeFlow from a |
---|
121 | terminal window by typing "java weka.gui.beans.KnowledgeFlow". |
---|
122 | |
---|
123 | At the top of the KnowledgeFlow window are seven tabs: DataSources, |
---|
124 | DataSinks, Filters, Classifiers, Clusterers, Evaluation and |
---|
125 | Visualization. The names are largely self-explanatory. |
---|
126 | |
---|
127 | EXAMPLE: |
---|
128 | ----------------- |
---|
129 | Setting up a flow to load an arff file (batch mode) and |
---|
130 | perform a cross validation using J48 (Weka's C4.5 implementation). |
---|
131 | |
---|
132 | First start the KnowledgeFlow. |
---|
133 | |
---|
134 | Next click on the DataSources tab and choose "ArffLoader" from the |
---|
135 | toolbar (the mouse pointer will change to cross hairs). |
---|
136 | |
---|
137 | Next place the ArffLoader component on the layout area by clicking |
---|
138 | somewhere on the layout (a copy of the ArffLoader icon will appear on |
---|
139 | the layout area). |
---|
140 | |
---|
141 | Next specify an arff file to load by first right clicking the mouse |
---|
142 | over the ArffLoader icon on the layout. A pop-up menu will |
---|
143 | appear. Select "Configure" under "Edit" in the list from this menu and |
---|
144 | browse to the location of your arff file. Alternatively, you can |
---|
145 | double-click on the icon to bring up the configuration dialog (if |
---|
146 | the component in question has one). |
---|
147 | |
---|
148 | Next click the "Evaluation" tab at the top of the window and choose the |
---|
149 | "ClassAssigner" (which lets you choose which column is the class) |
---|
150 | component from the toolbar. Place this on the layout. |
---|
151 | |
---|
152 | Now connect the ArffLoader to the ClassAssigner: first right click |
---|
153 | over the ArffLoader and select the "dataSet" under "Connections" in |
---|
154 | the menu. A "rubber band" line will appear. Move the mouse over the |
---|
155 | ClassAssigner component and left click - a red line labeled "dataSet" |
---|
156 | will connect the two components. |
---|
157 | |
---|
158 | Next right click over the ClassAssigner and choose "Configure" from |
---|
159 | the menu. This will pop up a window from which you can specify which |
---|
160 | column is the class in your data (last is the default). |
---|
161 | |
---|
162 | Next grab a "CrossValidationFoldMaker" component from the Evaluation |
---|
163 | toolbar and place it on the layout. Connect the ClassAssigner to the |
---|
164 | CrossValidationFoldMaker by right clicking over "ClassAssigner" and |
---|
165 | selecting "dataSet" from under "Connections" in the menu. |
---|
166 | |
---|
167 | Next click on the "Classifiers" tab at the top of the window and |
---|
168 | scroll along the toolbar until you reach the "J48" component in the |
---|
169 | "trees" section. Place a J48 component on the layout. |
---|
170 | |
---|
171 | Connect the CrossValidationFoldMaker to J48 TWICE by first choosing |
---|
172 | "trainingSet" and then "testSet" from the pop-up menu for the |
---|
173 | CrossValidationFoldMaker. |
---|
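The two connections are needed because each fold yields both a training
set and a test set. The sketch below shows that idea in plain Java under
simplifying assumptions: FoldSketch is a hypothetical helper, instances
are assigned to folds by index with no shuffling or stratification, and
it is not meant to reproduce how CrossValidationFoldMaker itself
partitions the data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: assigns instance i to fold (i % k).
class FoldSketch {
    // Indices held out for testing in the given fold.
    static List<Integer> testIndices(int n, int k, int fold) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (i % k == fold) {
                out.add(i);
            }
        }
        return out;
    }

    // Indices used for training in the given fold (everything else).
    static List<Integer> trainIndices(int n, int k, int fold) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (i % k != fold) {
                out.add(i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int n = 10; // number of instances
        int k = 5;  // number of folds
        for (int fold = 0; fold < k; fold++) {
            // One (trainingSet, testSet) pair is produced per fold.
            System.out.println("fold " + fold
                    + " train=" + trainIndices(n, k, fold)
                    + " test=" + testIndices(n, k, fold));
        }
    }
}
```

Every instance appears in exactly one test set, and the classifier is
trained and evaluated once per pair.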
174 | |
---|
175 | Next go back to the "Evaluation" tab and place a |
---|
176 | "ClassifierPerformanceEvaluator" component on the layout. Connect J48 |
---|
177 | to this component by selecting the "batchClassifier" entry from the |
---|
178 | pop-up menu for J48. |
---|
179 | |
---|
180 | Next go to the "Visualization" toolbar and place a "TextViewer" |
---|
181 | component on the layout. Connect the ClassifierPerformanceEvaluator to |
---|
182 | the TextViewer by selecting the "text" entry from the pop-up menu for |
---|
183 | ClassifierPerformanceEvaluator. |
---|
184 | |
---|
185 | Now start the flow executing by selecting "Start loading" from the |
---|
186 | pop-up menu for ArffLoader. Depending on how big the data set is and |
---|
187 | how long cross validation takes you will see some animation from some |
---|
188 | of the icons in the layout (J48's tree will "grow" in the icon and the |
---|
189 | ticks will animate on the ClassifierPerformanceEvaluator). You will |
---|
190 | also see some progress information in the "Status" bar and "Log" at |
---|
191 | the bottom of the window. |
---|
192 | |
---|
193 | When finished you can view the results by choosing "Show results" from |
---|
194 | the pop-up menu for the TextViewer component. |
---|
195 | |
---|
196 | Other cool things to add to this flow: connect a TextViewer and/or a |
---|
197 | GraphViewer to J48 in order to view the textual or graphical |
---|
198 | representations of the trees produced for each fold of the cross |
---|
199 | validation (this is something that is not possible in the Explorer). |
---|
200 | ----------------------------- |
---|
201 | |
---|