| Title: | Harvest the Classification Tree |
| Version: | 1.1 |
| Date: | 2015-07-30 |
| Author: | Bingyuan Liu/Yan Yuan/Qian Shi |
| Maintainer: | Bingyuan Liu <adler1016@gmail.com> |
| Depends: | R (≥ 3.0.1) |
| Imports: | rpart,stats |
| Description: | Applies the Harvest classification tree algorithm, a modification of the classic classification tree. A harvested tree deletes redundant rules from the tree, which leads to a simpler and more efficient tree model. The algorithm was first used in drug discovery, but it also performs well on other kinds of data, especially when the region of a class is disconnected. This package also extends the basic harvest classification tree algorithm to handle both continuous and categorical variables. To learn more about the harvest classification tree algorithm, see http://www.stat.ubc.ca/Research/TechReports/techreports/220.pdf. |
| License: | GPL-2 |
| Packaged: | 2015-07-30 17:13:18 UTC; Bingyuan |
| NeedsCompilation: | no |
| Repository: | CRAN |
| Date/Publication: | 2015-07-31 00:54:59 |
Harvest the classification tree
Description
Applies the Harvest classification tree algorithm, a modification of the classic classification tree. A harvested tree deletes redundant rules from the tree, which leads to a simpler and more efficient tree model. The algorithm was first used in drug discovery, but it also performs well on other kinds of data, especially when the region of a class is disconnected. This package also extends the basic harvest classification tree algorithm to handle both continuous and categorical variables.
To learn more about the harvest classification tree algorithm, see http://www.stat.ubc.ca/Research/TechReports/techreports/220.pdf.
Details
| Package: | Harvest.Tree |
| Type: | Package |
| Version: | 1.1 |
| Date: | 2015-07-30 |
| License: | GPL-2 |
The main function of the package is 'harvest'. It analyzes data stored in a data frame whose first column holds the class of the response and whose second through last columns hold the explanatory variables. The 'predict' function predicts the class of unclassified data from a trained model. The 'harfunc' function is the fundamental building block of 'harvest'; it analyzes data that have already been classified by the rpart function (a traditional classification tree). See the help files of these three functions for more information.
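The expected data layout can be checked before fitting. The following is a minimal base-R sketch; the helper check_harvest_layout() and the toy data frame are hypothetical illustrations rather than part of Harvest.Tree, and the check assumes the response is coded 0/1, as described for 'harvest'.

```r
# Hypothetical helper (not part of Harvest.Tree): checks that a data frame
# follows the layout described above, i.e. the response in column 1 and the
# explanatory variables in columns 2..ncol. Returns the number of
# explanatory variables.
check_harvest_layout <- function(df) {
  stopifnot(is.data.frame(df), ncol(df) >= 2)
  y <- df[[1]]
  # Assumes a binary 0/1 class membership in the first column
  if (!all(y[!is.na(y)] %in% c(0, 1))) {
    stop("first column must hold class membership coded 0/1")
  }
  ncol(df) - 1
}

# Toy data frame: response first, then two explanatory variables
toy <- data.frame(
  y  = c(0, 1, 1, 0),
  x1 = factor(c("21", "22", "21", "23")),
  x2 = c(1.5, 2.0, 0.3, 4.1)
)
num.var <- check_harvest_layout(toy)
num.var  # 2
```

A data frame whose first column contains anything other than 0, 1 or NA makes the helper stop with an error instead of returning a count.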
Author(s)
Bingyuan Liu, Yan Yuan, Qian Shi
Maintainer: Bingyuan Liu <adler1016@gmail.com>
Bound of rules
Description
This function takes a rule set and outputs the lower and upper bounds of each rule.
Usage
extrule(myrules, varname)
Arguments
myrules |
A three-column matrix, as output by the function "hughs.path.rpart" |
varname |
the names of x variables |
Value
A p × 2 matrix, where p is the length of varname. The first column is the lower bound and the second column is the upper bound; the default lower bound is -Inf and the default upper bound is Inf. Rows correspond to the x variables, in the order in which they appear in the data matrix given to rpart.
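The bounds can be illustrated with a small sketch. The column layout of the "hughs.path.rpart" output is not documented here, so the code assumes, hypothetically, that each row holds a variable name, a direction ("<" or ">=") and a cutoff. rule_bounds() is an illustrative stand-in for extrule, not its actual implementation; when several rules constrain the same variable, it keeps the tightest bound.

```r
# Hypothetical sketch (not the package's extrule()): derive lower/upper
# bounds per variable from a rule set. Assumes each row of 'rules' is
# c(variable, direction, cutoff) with direction either "<" or ">=".
rule_bounds <- function(rules, varname) {
  # Start every variable at the defaults (-Inf, Inf)
  bounds <- matrix(c(-Inf, Inf), nrow = length(varname), ncol = 2,
                   byrow = TRUE,
                   dimnames = list(varname, c("lower", "upper")))
  for (i in seq_len(nrow(rules))) {
    v   <- rules[i, 1]
    cut <- as.numeric(rules[i, 3])
    if (rules[i, 2] == ">=") {
      # A ">=" rule raises the lower bound (keep the tightest one)
      bounds[v, "lower"] <- max(bounds[v, "lower"], cut)
    } else {
      # A "<" rule lowers the upper bound (keep the tightest one)
      bounds[v, "upper"] <- min(bounds[v, "upper"], cut)
    }
  }
  bounds
}

rules <- rbind(c("x3", ">=", "1.2"),
               c("x3", "<",  "4.5"),
               c("x1", "<",  "24"))
b <- rule_bounds(rules, c("x1", "x2", "x3"))
b
```

Variables that no rule mentions (x2 here) keep the default bounds -Inf and Inf, and rows follow the order of varname.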
A harvested classification tree
Description
Basic function that applies the harvest algorithm to the training data set and determines whether any nodes of the classic classification tree can be harvested.
Usage
harfunc(rpart.object, data, varname, sig = 0.95)
Arguments
rpart.object |
the classification tree fitted to the training data by the rpart function (a traditional classification tree). |
data |
the original training data, where 'y' stores the class membership |
varname |
the names of the explanatory variables |
sig |
significance level (default 0.95) |
Value
A list containing the original classification result, the likelihood improvement and the harvested classification result.
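The likelihood improvement and the significance level sig can be related through a standard likelihood-ratio test. The sketch below is one generic way to do this and may differ from what harfunc computes internally: it compares the Bernoulli log-likelihood when a candidate node gets its own class-1 rate with the log-likelihood when all points share a single rate, and treats twice the improvement as chi-squared with one degree of freedom (an assumption). The functions bern_loglik(), loglik_improvement() and is_significant() are hypothetical.

```r
# Bernoulli log-likelihood of k class-1 points out of n at the MLE rate k/n
# (terms of the form 0 * log(0) are treated as 0)
bern_loglik <- function(k, n) {
  p <- k / n
  ll <- 0
  if (k > 0) ll <- ll + k * log(p)
  if (k < n) ll <- ll + (n - k) * log(1 - p)
  ll
}

# Hypothetical: log-likelihood improvement from modelling a node
# (k1 class-1 points out of n1) separately from the remaining points
# (k0 out of n0), instead of pooling everything at a single rate.
loglik_improvement <- function(k1, n1, k0, n0) {
  bern_loglik(k1, n1) + bern_loglik(k0, n0) - bern_loglik(k1 + k0, n1 + n0)
}

# Likelihood-ratio test at level 'sig' (assumed 1 degree of freedom)
is_significant <- function(improvement, sig = 0.95) {
  2 * improvement > qchisq(sig, df = 1)
}

imp <- loglik_improvement(k1 = 18, n1 = 20, k0 = 30, n0 = 280)
is_significant(imp)
```

With 18 of 20 class-1 points in the node against 30 of 280 elsewhere, twice the improvement far exceeds qchisq(0.95, 1) (about 3.84), so is_significant() returns TRUE. When the node and the rest have the same rate, the improvement is zero.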
A harvested classification tree
Description
The main function of the package, which builds a harvested classification tree from the training data.
Usage
harvest(training, num.var, numeric.info, sig = 0.95)
Arguments
training |
the original data, with the class membership 'y' (0 or 1) in the first column and the explanatory variables in the second through last columns. |
num.var |
number of explanatory variables |
numeric.info |
a vector giving the positions of the continuous explanatory variables |
sig |
significance level (default 0.95) |
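Because numeric.info holds the positions of the continuous explanatory variables (counted among the explanatory columns, so in the call harvest(training, 4, 3) the value 3 refers to x3), both it and num.var can be derived from the data frame. A minimal sketch, assuming continuous variables are stored as numeric columns and categorical ones as factors; the toy data frame is hypothetical and only mirrors the columns of 'training'.

```r
# Toy data frame in the layout harvest() expects: response first,
# explanatory variables afterwards (same column types as 'training')
toy <- data.frame(
  y  = c(0, 1, 1, 0, 1),
  x1 = factor(c("21", "22", "21", "23", "22")),
  x2 = factor(c("40-49", "50-69", "75+", "40-49", "39-")),
  x3 = c(0.8, 1.7, 2.4, 0.1, 3.3),
  x4 = factor(c("2004", "2005", "2005", "2010", "2007"))
)

explanatory <- toy[-1]            # drop the response column
num.var <- ncol(explanatory)      # number of explanatory variables
# Positions, among the explanatory variables, of the numeric columns
numeric.info <- unname(which(vapply(explanatory, is.numeric, logical(1))))

num.var       # 4
numeric.info  # 3
```

These values match the arguments used in the Examples for the 'training' data set.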
Details
The function returns the harvested tree model. Missing values are allowed and are handled by the function. To predict with the trained tree model, use the predict function in this package.
Value
An object of class "harvest" containing the following elements for each node (nodes are listed in the order in which they were harvested):
'rule' the rule constraints defining the node
'total' the total number of data points in the node
'1' the number of data points in the node that belong to class 1
'logchange' the improvement in log likelihood gained by deleting the redundant rules of the node
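The 'total' and '1' counts are simple tallies over the points in each node. The sketch below assumes, hypothetically, that node membership is available as a logical matrix with one column per node, already in harvesting order; the inputs are illustrative and are not taken from a harvest object.

```r
# Hypothetical inputs: class membership y (0/1) and a logical matrix
# 'member' whose column j flags the points falling in node j
y <- c(1, 0, 1, 1, 0, 0, 1, 0)
member <- cbind(
  node1 = c(TRUE,  FALSE, TRUE,  TRUE,  FALSE, FALSE, FALSE, FALSE),
  node2 = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE,  TRUE,  TRUE)
)

node_summary <- data.frame(
  node  = colnames(member),
  total = unname(colSums(member)),      # points in each node
  "1"   = unname(colSums(member * y)),  # class-1 points in each node
  check.names = FALSE, stringsAsFactors = FALSE
)
# Share of class-1 points in each node
node_summary$rate <- node_summary[["1"]] / node_summary$total
node_summary
```

Multiplying the logical matrix by y zeroes out class-0 points, so the column sums count only class-1 points in each node.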
Examples
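library(Harvest.Tree)
data(training)
# 4 explanatory variables (x1 to x4); only the 3rd (x3) is continuous
fit <- harvest(training, num.var = 4, numeric.info = 3)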
Predictions from a harvested tree
Description
The function predict predicts the class membership of a new data set from a harvested classification model built on the training data.
Usage
predict(harfunc.object, data, num.var)
Arguments
harfunc.object |
the output of the harfunc function. |
data |
test data |
num.var |
number of explanatory variables |
Details
To run the predict function, a harvested classification tree trained by the harvest function is required.
Value
pred.mat, a data frame storing the prediction results, with the following columns:
belong the node to which the data point belongs
possibility the probability that the data point belongs to class 1
predict the predicted class, based simply on whether possibility is larger than 0.5
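The structure of pred.mat can be sketched in base R. This is a hypothetical illustration, not the package's predict method: it assumes each test point has already been assigned to a node ('belong', NA when it falls in no harvested node) and that node_rate holds each node's class-1 proportion from the training data. The 0.5 threshold follows the description of the predict column.

```r
# Hypothetical node probabilities and node assignments for four test points
node_rate <- c(node1 = 0.9, node2 = 0.3)
belong <- c("node1", "node2", NA, "node1")

# Look up each point's node probability (NA for unassigned points)
possibility <- unname(node_rate[belong])

pred.mat <- data.frame(
  belong      = belong,
  possibility = possibility,
  predict     = as.integer(possibility > 0.5),  # class 1 if probability > 0.5
  stringsAsFactors = FALSE
)
pred.mat
```

How the package treats points that fall outside every harvested node is not stated here; the sketch simply propagates NA.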
Ranking of nodes
Description
Ranks the harvested nodes by p-value, from lowest to highest.
Usage
rank.nodes(harfunc.object)
Arguments
harfunc.object |
an object of class "harfunc" |
Value
the ranked harvested nodes
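Ranking by p-value amounts to sorting the nodes in ascending order of p-value. The sketch below assumes that reading of "lower p value" and uses hypothetical p-values in a plain data frame; it does not reproduce the internals of rank.nodes or the structure of a harfunc object.

```r
# Hypothetical per-node p-values (not produced by the package here)
nodes <- data.frame(node    = c("node1", "node2", "node3"),
                    p.value = c(0.030, 0.001, 0.200),
                    stringsAsFactors = FALSE)

# Sort so that the node with the smallest p-value comes first
ranked <- nodes[order(nodes$p.value), ]
ranked$rank <- seq_len(nrow(ranked))
ranked
```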
A logical matrix for a terminal node
Description
Return a logical matrix of the rule sets which define a terminal node
Usage
rulesets(noden, newsim, varn, nodenumb)
Arguments
noden |
a terminal node defined by a set of rules, from function "treemat" |
newsim |
data to be harvested |
varn |
x variable names |
nodenumb |
all the labels of terminal nodes |
Value
An n × nn logical matrix, where n is the number of data points to be harvested and nn is the number of rules defining the terminal node. Each column corresponds to the node defined by a single variable/rule and is named after that variable. Note that the original terminal node is simply the intersection of these nodes.
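The intersection property can be sketched with a small logical matrix. The data, the rules and the variable names below are hypothetical; the point is that a row belongs to the terminal node only when every column is TRUE, and that deleting a rule (as the harvest algorithm does with redundant rules) widens the node to the intersection of the remaining columns.

```r
# Hypothetical data to be harvested, with two x variables
newsim <- data.frame(x1 = c(22, 25, 27, 23),
                     x3 = c(0.5, 2.0, 3.5, 4.8))

# One column per variable; each column is the node defined by that
# variable's rule(s) alone (rules made up for illustration)
rule_mat <- cbind(x1 = newsim$x1 < 26,
                  x3 = newsim$x3 >= 1.0 & newsim$x3 < 4.5)

# The terminal node is the intersection of all single-rule nodes
in_node <- apply(rule_mat, 1, all)

# Deleting the rule on x1 keeps only the remaining columns
keep <- colnames(rule_mat) != "x1"
in_node_without_x1 <- apply(rule_mat[, keep, drop = FALSE], 1, all)

in_node             # FALSE TRUE FALSE FALSE
in_node_without_x1  # FALSE TRUE TRUE FALSE
```

Using drop = FALSE keeps a one-column result as a matrix, so apply() still works when only one rule remains.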
training
Description
A simulated data set of symptoms of breast cancer patients
Usage
data(training)
Format
A data frame with 300 observations on the following 5 variables.
y a numeric vector
x1 a factor with levels 21, 22, 23, 24, 25, 26, 27, 28, 29
x2 a factor with levels 39-, 40-49, 50-69, 70-74, 75+
x3 a numeric vector
x4 a factor with levels 2004, 2005, 2006, 2007, 2008, 2009, 2010
Source
simulated data for breast cancer diagnosis