===============================================================
KnowledgeFlow GUI Quick Primer
===============================================================

What's new in the KnowledgeFlow:

Components can now be grouped together under a "meta" component. Start
by placing some components on the layout and connecting them
together. Then select a subset of components with the mouse by holding
down the left button and dragging out a selection rectangle. You will
then be asked whether you wish to group the selected components and
what name to give the group. The selected components will then be
replaced by a single icon on the layout. Each grouped bean can still
be configured, and connections can still be made, by right-clicking on
the icon to display a pop-up menu. At the moment meta components can't
form part of another group (this functionality will be added in a
later release). Eventually, functionality will be added to allow the
user to store custom groups in a user toolbar for reuse.

Introduction:

The KnowledgeFlow provides an alternative to the Explorer as a
graphical front end to Weka's core algorithms. The KnowledgeFlow is a
work in progress, so some of the functionality from the Explorer is not
yet available. On the other hand, there are things that can be done in
the KnowledgeFlow but not in the Explorer.

The KnowledgeFlow presents a "data-flow" inspired interface to
Weka. The user can select Weka components from a toolbar, place them
on a layout canvas and connect them together in order to form a
"knowledge flow" for processing and analyzing data. At present, all of
Weka's classifiers, filters, clusterers, loaders and savers are
available in the KnowledgeFlow along with some extra tools.
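
The data-flow idea above can be sketched in plain Java. This is only an
illustration under stated assumptions: FlowStep and FlowDemo are
hypothetical names invented here, not Weka classes, and the real
KnowledgeFlow beans communicate through their own event and listener
types. The sketch shows one step forwarding its output to whatever has
been connected downstream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical illustration only: FlowStep and FlowDemo are NOT Weka classes.
// A step does some work on its input and forwards the result to every
// downstream step that has been connected to it.
class FlowStep<I, O> implements Consumer<I> {
    private final Function<I, O> work;
    private final List<Consumer<O>> downstream = new ArrayList<>();

    FlowStep(Function<I, O> work) {
        this.work = work;
    }

    // Connect a downstream step; returns it so connections can be chained.
    <T extends Consumer<O>> T connect(T next) {
        downstream.add(next);
        return next;
    }

    @Override
    public void accept(I input) {
        O output = work.apply(input);
        for (Consumer<O> next : downstream) {
            next.accept(output);
        }
    }
}

class FlowDemo {
    // Builds source -> filter -> viewer and pushes one batch through it.
    static List<String> run(List<String> rows) {
        List<String> viewed = new ArrayList<>();
        FlowStep<List<String>, List<String>> source = new FlowStep<>(batch -> batch);
        FlowStep<List<String>, List<String>> filter = new FlowStep<>(batch -> {
            List<String> out = new ArrayList<>();
            for (String row : batch) {
                out.add(row.toUpperCase());
            }
            return out;
        });
        FlowStep<List<String>, Integer> viewer = new FlowStep<>(batch -> {
            viewed.addAll(batch);
            return batch.size();
        });
        source.connect(filter).connect(viewer);
        source.accept(rows);
        return viewed;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("sunny", "rainy")));
    }
}
```

Calling source.accept(...) plays the role of "Start loading" on a data
source: the batch travels along each connection in turn until it reaches
the viewer at the end of the chain.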

The KnowledgeFlow can handle data either incrementally or in batches
(the Explorer handles batch data only). Of course, learning from data
incrementally requires a classifier that can be updated on an
instance-by-instance basis. Classifiers in Weka that can currently
handle data incrementally include NaiveBayesUpdateable, IB1, IBk and
LWR (locally weighted regression). There is also one meta classifier -
RacedIncrementalLogitBoost - that can use any regression base
learner to learn from discrete class data incrementally.
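
To make "updated on an instance-by-instance basis" concrete, here is a
minimal sketch of a learner that absorbs one labelled instance at a
time. It is a toy 1-nearest-neighbour learner in the spirit of IB1, not
Weka's implementation; the class name IncrementalNearestNeighbour and
its methods are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Weka's IB1: a 1-nearest-neighbour learner
// that is updated one instance at a time instead of from a whole batch.
class IncrementalNearestNeighbour {
    private final List<double[]> features = new ArrayList<>();
    private final List<String> labels = new ArrayList<>();

    // Incremental update: absorb a single labelled instance.
    void update(double[] x, String label) {
        features.add(x.clone());
        labels.add(label);
    }

    // Predict the label of the closest stored instance, using squared
    // Euclidean distance. Returns null if nothing has been seen yet.
    String predict(double[] x) {
        String best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < features.size(); i++) {
            double[] stored = features.get(i);
            double distance = 0.0;
            for (int j = 0; j < x.length; j++) {
                double diff = x[j] - stored[j];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = labels.get(i);
            }
        }
        return best;
    }
}
```

The learner is usable after every call to update(), which is what lets
a flow interleave prediction and training on a stream of instances.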

Features of the KnowledgeFlow:

* intuitive data flow style layout
* process data in batches or incrementally
* process multiple batches or streams in parallel! (each separate flow
  executes in its own thread)
* chain filters together
* view models produced by classifiers for each fold in a cross validation
* visualize performance of incremental classifiers during
  processing (scrolling plots of classification accuracy, RMS error,
  predictions etc.)
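
The running statistics mentioned in the last feature can be maintained
with constant memory. The sketch below is an assumption-laden
illustration, not Weka's IncrementalClassifierEvaluator: RunningEvaluator
is a hypothetical class, and the RMS error here is simplified to use only
the probability the classifier assigned to the true class.

```java
// Hypothetical sketch, not Weka's IncrementalClassifierEvaluator.
// Keeps running totals so accuracy and RMS error can be read after
// every instance, as a scrolling plot would need.
class RunningEvaluator {
    private long seen;
    private long correct;
    private double squaredErrorSum;

    // wasCorrect: whether the predicted class matched the true class.
    // probabilityOfTrueClass: probability the classifier gave the true
    // class (simplifying assumption: error = 1 - that probability).
    void record(boolean wasCorrect, double probabilityOfTrueClass) {
        seen++;
        if (wasCorrect) {
            correct++;
        }
        double error = 1.0 - probabilityOfTrueClass;
        squaredErrorSum += error * error;
    }

    // Fraction of instances classified correctly so far (0 if none seen).
    double accuracy() {
        return seen == 0 ? 0.0 : (double) correct / seen;
    }

    // Root mean squared error so far (0 if none seen).
    double rmsError() {
        return seen == 0 ? 0.0 : Math.sqrt(squaredErrorSum / seen);
    }
}
```

A chart component would call accuracy() and rmsError() after each
record() and append the values to its plot.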

Components available in the KnowledgeFlow:

DataSources:
  All of Weka's loaders are available

DataSinks:
  All of Weka's savers are available

Filters:
  All of Weka's filters are available

Classifiers:
  All of Weka's classifiers are available

Clusterers:
  All of Weka's clusterers are available

Evaluation:
  TrainingSetMaker - make a data set into a training set
  TestSetMaker - make a data set into a test set
  CrossValidationFoldMaker - split any data set, training set or test set
    into folds
  TrainTestSplitMaker - split any data set, training set or test set into
    a training set and a test set
  ClassAssigner - assign a column to be the class for any data set, training
    set or test set
  ClassValuePicker - choose a class value to be considered as the "positive"
    class. This is useful when generating data for ROC style curves (see
    below)
  ClassifierPerformanceEvaluator - evaluate the performance of batch
    trained/tested classifiers
  IncrementalClassifierEvaluator - evaluate the performance of incrementally
    trained classifiers
  ClustererPerformanceEvaluator - evaluate the performance of batch
    trained/tested clusterers
  PredictionAppender - append classifier predictions to a test set. For
    discrete class problems, can either append predicted class labels or
    probability distributions
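
What "split into folds" means for a component such as
CrossValidationFoldMaker can be sketched as follows. This is a
simplified illustration, not Weka's code: FoldSketch and its methods are
hypothetical, instances are represented only by their indices, and no
shuffling or stratification is done.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of fold creation; not Weka's CrossValidationFoldMaker.
class FoldSketch {
    // Returns the test indices for each fold. Instances are dealt out
    // round-robin, so fold sizes differ by at most one.
    static List<List<Integer>> testFolds(int numInstances, int numFolds) {
        if (numFolds < 2 || numFolds > numInstances) {
            throw new IllegalArgumentException("need 2 <= numFolds <= numInstances");
        }
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < numFolds; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < numInstances; i++) {
            folds.get(i % numFolds).add(i);
        }
        return folds;
    }

    // The training set for a fold is every instance not in its test set.
    static List<Integer> trainingIndices(int numInstances, List<Integer> testFold) {
        List<Integer> train = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            if (!testFold.contains(i)) {
                train.add(i);
            }
        }
        return train;
    }
}
```

Each fold yields one training set and one test set, which matches the
two connections ("trainingSet" and "testSet") that a fold maker offers
to a downstream classifier.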

Visualization:
  DataVisualizer - component that can pop up a panel for visualizing data in
    a single large 2D scatter plot
  ScatterPlotMatrix - component that can pop up a panel containing a matrix of
    small scatter plots (clicking on a small plot pops up a large scatter
    plot)
  AttributeSummarizer - component that can pop up a panel containing a matrix
    of histogram plots - one for each of the attributes in the input data
  ModelPerformanceChart - component that can pop up a panel for visualizing
    threshold (i.e. ROC style) curves.
  TextViewer - component for showing textual data. Can show data sets,
    classification performance statistics etc.
  GraphViewer - component that can pop up a panel for visualizing tree based
    models
  StripChart - component that can pop up a panel that displays a scrolling
    plot of data (used for viewing the online performance of incremental
    classifiers)
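
As background for the threshold (ROC style) curves shown by
ModelPerformanceChart, and for why ClassValuePicker asks for a
"positive" class, the sketch below computes a single point on such a
curve. It is an illustration only: ThresholdPoint is a hypothetical
class, and the inputs are assumed to be the predicted probability of the
positive class for each test instance.

```java
// Hypothetical sketch: one point of a threshold (ROC style) curve.
class ThresholdPoint {
    // Returns {falsePositiveRate, truePositiveRate} when every instance
    // with positiveProbability >= threshold is predicted positive.
    static double[] rocPoint(double[] positiveProbability,
                             boolean[] isPositive,
                             double threshold) {
        int truePositives = 0;
        int falsePositives = 0;
        int positives = 0;
        int negatives = 0;
        for (int i = 0; i < positiveProbability.length; i++) {
            if (isPositive[i]) {
                positives++;
            } else {
                negatives++;
            }
            if (positiveProbability[i] >= threshold) {
                if (isPositive[i]) {
                    truePositives++;
                } else {
                    falsePositives++;
                }
            }
        }
        double tpr = positives == 0 ? 0.0 : (double) truePositives / positives;
        double fpr = negatives == 0 ? 0.0 : (double) falsePositives / negatives;
        return new double[] {fpr, tpr};
    }
}
```

Sweeping the threshold from 1 down to 0 and plotting each point traces
out the curve; changing which class value counts as positive changes the
curve.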


---------------

Launching the KnowledgeFlow:

The Weka GUI Chooser window is used to launch Weka's graphical
environments. Select the button labeled "KnowledgeFlow" to start the
KnowledgeFlow. Alternatively, you can launch the KnowledgeFlow from a
terminal window by typing "java weka.gui.beans.KnowledgeFlow".

At the top of the KnowledgeFlow window are seven tabs: DataSources,
DataSinks, Filters, Classifiers, Clusterers, Evaluation and
Visualization. The names are largely self-explanatory.

EXAMPLE:
-----------------
Setting up a flow to load an arff file (batch mode) and
perform a cross validation using J48 (Weka's C4.5 implementation).

First start the KnowledgeFlow.

Next click on the DataSources tab and choose "ArffLoader" from the
toolbar (the mouse pointer will change to cross hairs).

Next place the ArffLoader component on the layout area by clicking
somewhere on the layout (a copy of the ArffLoader icon will appear on
the layout area).

Next specify an arff file to load by first right-clicking the mouse
over the ArffLoader icon on the layout. A pop-up menu will
appear. Select "Configure" under "Edit" in this menu and
browse to the location of your arff file. Alternatively, you can
double-click on the icon to bring up the configuration dialog (if
the component in question has one).

Next click the "Evaluation" tab at the top of the window and choose the
"ClassAssigner" component (which lets you choose which column is the
class) from the toolbar. Place this on the layout.

Now connect the ArffLoader to the ClassAssigner: first right-click
over the ArffLoader and select "dataSet" under "Connections" in
the menu. A "rubber band" line will appear. Move the mouse over the
ClassAssigner component and left-click - a red line labeled "dataSet"
will connect the two components.

Next right-click over the ClassAssigner and choose "Configure" from
the menu. This will pop up a window in which you can specify which
column is the class in your data (last is the default).

Next grab a "CrossValidationFoldMaker" component from the Evaluation
toolbar and place it on the layout. Connect the ClassAssigner to the
CrossValidationFoldMaker by right-clicking over "ClassAssigner" and
selecting "dataSet" from under "Connections" in the menu.

Next click on the "Classifiers" tab at the top of the window and
scroll along the toolbar until you reach the "J48" component in the
"trees" section. Place a J48 component on the layout.

Connect the CrossValidationFoldMaker to J48 TWICE by first choosing
"trainingSet" and then "testSet" from the pop-up menu for the
CrossValidationFoldMaker.

Next go back to the "Evaluation" tab and place a
"ClassifierPerformanceEvaluator" component on the layout. Connect J48
to this component by selecting the "batchClassifier" entry from the
pop-up menu for J48.

Next go to the "Visualization" toolbar and place a "TextViewer"
component on the layout. Connect the ClassifierPerformanceEvaluator to
the TextViewer by selecting the "text" entry from the pop-up menu for
ClassifierPerformanceEvaluator.

Now start the flow executing by selecting "Start loading" from the
pop-up menu for ArffLoader. Depending on how big the data set is and
how long cross validation takes, you will see some animation from some
of the icons in the layout (J48's tree will "grow" in the icon and the
ticks will animate on the ClassifierPerformanceEvaluator). You will
also see some progress information in the "Status" bar and "Log" at
the bottom of the window.

When finished, you can view the results by choosing "Show results" from
the pop-up menu for the TextViewer component.

Other cool things to add to this flow: connect a TextViewer and/or a
GraphViewer to J48 in order to view the textual or graphical
representations of the trees produced for each fold of the cross
validation (this is something that is not possible in the Explorer).
-----------------------------