Subject: problem statement |
Author: Bhasker , Ajay
| [ Next Thread |
Previous Thread |
Next Message |
Previous Message
]
Date Posted: 02:28:44 07/21/05 Thu
Author Host/IP: NoHost/202.71.147.34
Optimizing Decision Trees by knowledge-based Sequence Finding
- Ajay Subramaniam V
- Bhasker V K
ABSTRACT
A new approach to optimizing search in decision trees (in the form of data matrices/knowledge base/other structures ) by prioritizing decision paths with regard to their performance in obtaining the end result.This is first done by analyzing operations in traditional decision tree path finding algorithms. In recursive programming,the questions asked at the root node and the subsequent path are static in order.Hence the worst case scenario shows that in order to derive the required results,the path might lead to the maximum depth of the tree.A new metric has been introduced to supplement the prioritizing of decision paths called the “purity weight”.Therefore the solution to this problem is related to how to dynamically change the order of questions ,so that significant questions or decision can be dealt first. This is done by filtering or disqualifying sequences or paths from a global buffer or list of probable sequences called the “training set” . While this method is used to filter unlikely paths,the the new metric supplements the discovery and identification of likely paths.Linked lists or other structures are required to separate the probable sequences from the filtered ones.
The process can be categorized into :
o Keeping a knowledge base of the paths /sequences ,specified by the user.
o Finding the question /decision to ask ,based updated training set and purity weight.
o Updating the “training set ” by disqualifying unlikey sequences
This project also hopes to compare and quantify the findings of this method with traditional methods.
Applications of this Project
This method of intelligent decision path-finding algorithms, can be used in forecasting information systems such as Weather analyzing systems, Cancer and disease prediction systems,and so on.
Eg : Weather Analysing system
A data matrix of the sequences of weather parameters to show if the day is ideal or not .From the knowledge base gathered, the search can be optimized to allow the user to find out if the conditions are ideal or not ,based on lesser information provided by reducing time and following paths based on priority instead of hard coded order of interaction.
eg : if temperature->low ,wind = high = ideal
but if temperature->low ,wind = high , pressure ->high = not ideal
Clearly traditional systems following static order of decision making,will take more un needed time to calculate the result which can be substantially imporoved by asking the significant questions first.
Explanation of a Sequence/Pattern
· A particular series or pattern of Boolean values for decisions that make up an end result as suggested as suggested by the user.
Eg: temperature->high , humidity->high = not ideal
Therefore variables in the sequence should be capable of holding the decisions required ,their values and the end result.
· Should also have purity weight at each decision
o Purity weight of each decision
o Purity weight of first decision to indicate depth
o End result for each sequence / pattern(ideal or not ideal)
· Capabilities must also exist so that part of the sewunce could be identified and forecasting could be done before end result or new sequences are obtained.
Purpose of Purity Weight
· The purity weight is used as a supplementary indicator for choosing the decision path with higher success probability.
· They will only be called upon when two or more sequences share parts of the sequence,and hence at this point the sequence with the lesser purity weight will be chosen.
Operation of Dequalifying Sequences
The training set (probable sequences under study initially ) can be filtered by:
· Keeping linked list or stack for each decision
· All sequences that have true for that decision will be appended into the corresponding list.
Eg: SEG ID #10
temperature->high , humidity->high = not ideal
will have the linked entries as :
temperature linked list : {10 , …}
humidity linked list : {10 , …}
· Hence for a decision given by the input, the time for searching and canceling probable sequences can be reduced by iteratively removing sequences (identified from the SEQ ID#) ,from a global list of sequences to search from the data matrix/knowledge base/sequence structure
[
Next Thread |
Previous Thread |
Next Message |
Previous Message
] |
|