Package cc.mallet.types
Class GainRatio
- java.lang.Object
-
- cc.mallet.types.SparseVector
-
- cc.mallet.types.FeatureVector
-
- cc.mallet.types.RankedFeatureVector
-
- cc.mallet.types.GainRatio
-
- All Implemented Interfaces:
AlphabetCarrying, ConstantMatrix, Vector, java.io.Serializable
public class GainRatio extends RankedFeatureVector
List of features along with their thresholds, sorted in descending order of gain ratio: the ratio of (1) the information gained by splitting instances on the feature at its associated threshold value to (2) the split information. The calculations do not take instance weights into consideration.
To create an instance of GainRatio from an InstanceList, one must do the following:
    InstanceList ilist = ...
    ...
    GainRatio gr = GainRatio.createGainRatio(ilist);
J. R. Quinlan "Improved Use of Continuous Attributes in C4.5" ftp://ftp.cs.cmu.edu/project/jair/volume4/quinlan96a.ps
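The ratio described above can be illustrated with a small, self-contained sketch. It is not MALLET's implementation: the class and method names are hypothetical, instances are reduced to plain arrays, and the split convention (values less than or equal to the threshold go left) is an assumption made for the example. As in the class description, instance weights are ignored.

```java
// Illustrative sketch only; not MALLET's implementation. All names are hypothetical.
final class GainRatioSketch {

    /** Shannon entropy, in bits, of a vector of non-negative counts. */
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        if (total == 0) return 0.0;
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /**
     * Gain ratio of the binary split "value <= threshold" versus "value > threshold"
     * (an assumed convention). labels[i] is a class index in [0, numLabels).
     */
    static double gainRatio(double[] values, int[] labels, int numLabels, double threshold) {
        int[] all = new int[numLabels];
        int[] left = new int[numLabels];
        int[] right = new int[numLabels];
        int nLeft = 0;
        for (int i = 0; i < values.length; i++) {
            all[labels[i]]++;
            if (values[i] <= threshold) {
                left[labels[i]]++;
                nLeft++;
            } else {
                right[labels[i]]++;
            }
        }
        int n = values.length;
        int nRight = n - nLeft;
        if (nLeft == 0 || nRight == 0) return 0.0; // degenerate split carries no information

        double pLeft = (double) nLeft / n;
        double pRight = (double) nRight / n;
        // (1) information gained by the split
        double infoGain = entropy(all) - pLeft * entropy(left) - pRight * entropy(right);
        // (2) split information: entropy of the partition sizes themselves
        double splitInfo = entropy(new int[] {nLeft, nRight});
        return infoGain / splitInfo;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 2.0, 3.0, 4.0};
        int[] labels = {0, 0, 1, 1};
        // Prints 1.0: the threshold 2.5 separates the two classes perfectly.
        System.out.println(gainRatio(values, labels, 2, 2.5));
        // A less useful threshold gives a smaller ratio.
        System.out.println(gainRatio(values, labels, 2, 1.5));
    }
}
```

Dividing by the split information penalizes thresholds that carve off very uneven partitions, which is the motivation for ranking by gain ratio rather than by raw information gain.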
- Author:
- Gary Huang ghuang@cs.umass.edu
- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class cc.mallet.types.RankedFeatureVector
RankedFeatureVector.Factory, RankedFeatureVector.PerLabelFactory
-
-
Field Summary
Modifier and Type Field Description
static double
log2
-
Fields inherited from class cc.mallet.types.SparseVector
hasInfinite, indices, values
-
-
Constructor Summary
Modifier Constructor Description
protected
GainRatio(Alphabet dataAlphabet, double[] gainRatios, double[] splitPoints, double baseEntropy, LabelVector baseLabelDistribution, int numSplitPointsForBestFeature, int minNumInsts)
-
Method Summary
Modifier and Type Method Description
protected static java.lang.Object[]
calcGainRatios(InstanceList ilist, int[] instIndices, int minNumInsts)
Calculates gain ratios for all (feature, split point) pairs and returns them in an array together with the optimal split points, the overall entropy, the overall label distribution, and the number of split points of the split feature.
static GainRatio
createGainRatio(InstanceList ilist)
Constructs a GainRatio object.
static GainRatio
createGainRatio(InstanceList ilist, int[] instIndices, int minNumInsts)
Constructs a GainRatio object.
double
getBaseEntropy()
LabelVector
getBaseLabelDistribution()
double
getMaxValuedThreshold()
int
getNumSplitPointsForBestFeature()
double
getThresholdAtRank(int rank)
static int[]
sortInstances(InstanceList ilist, int[] instIndices, int featureIndex)
-
Methods inherited from class cc.mallet.types.RankedFeatureVector
getIndexAtRank, getMaxValue, getMaxValuedIndex, getMaxValuedIndexIn, getMaxValuedObject, getMaxValuedObjectIn, getMaxValueIn, getObjectAtRank, getRank, getRank, getValueAtRank, printByRank, printByRank, printLowerK, printTopK, set, setRankOrder, setRankOrder, setRankOrder, setReverseRankOrder
-
Methods inherited from class cc.mallet.types.FeatureVector
alphabetsMatch, cloneMatrix, cloneMatrixZeroed, contains, getAlphabet, getAlphabets, getObjectIndices, location, newFeatureVector, toSimpFile, toString, toString, value
-
Methods inherited from class cc.mallet.types.SparseVector
absNorm, addTo, addTo, arrayCopyFrom, arrayCopyFrom, arrayCopyInto, dotProduct, dotProduct, dotProduct, dotProduct, extendedDotProduct, extendedDotProduct, getDimensions, getIndices, getNumDimensions, getValues, incrementValue, indexAtLocation, infinityNorm, isBinary, isInfinite, isNaN, isNaNOrInfinite, location, makeBinary, makeNonBinary, map, numLocations, oneNorm, plusEqualsSparse, plusEqualsSparse, print, removeDuplicates, setAll, setValue, setValueAtLocation, singleIndex, singleSize, singleToIndices, singleValue, sortIndices, timesEquals, timesEqualsSparse, timesEqualsSparse, timesEqualsSparseZero, twoNorm, value, value, valueAtLocation, vectorAdd
-
-
-
-
Constructor Detail
-
GainRatio
protected GainRatio(Alphabet dataAlphabet, double[] gainRatios, double[] splitPoints, double baseEntropy, LabelVector baseLabelDistribution, int numSplitPointsForBestFeature, int minNumInsts)
-
-
Method Detail
-
calcGainRatios
protected static java.lang.Object[] calcGainRatios(InstanceList ilist, int[] instIndices, int minNumInsts)
Calculates gain ratios for all (feature, split point) pairs and returns an array of:
1. gain ratios (each element is the max gain ratio of a feature for those split points with at least average gain)
2. the optimal split point for each feature
3. the overall entropy
4. the overall label distribution of the given instances
5. the number of split points of the split feature
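The restriction to split points "with at least average gain" in the first element can be sketched as follows. This is a hedged illustration, not the code of calcGainRatios: the class and method names are hypothetical, and the sketch assumes the average is taken over the candidate split points of a single feature, which the description above does not spell out.

```java
// Illustrative sketch only; names are hypothetical and this is not MALLET's code.
final class AverageGainFilterSketch {

    /**
     * Among the candidate split points of one feature, returns the index of the
     * candidate with the highest gain ratio, considering only candidates whose
     * information gain is at least the average gain. Returns -1 if none qualifies.
     */
    static int bestSplit(double[] infoGains, double[] splitInfos) {
        if (infoGains.length == 0) return -1;

        double sum = 0.0;
        for (double g : infoGains) sum += g;
        double avgGain = sum / infoGains.length;

        int best = -1;
        double bestRatio = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < infoGains.length; i++) {
            // Small tolerance so that equal gains are not rejected by rounding error.
            if (infoGains[i] < avgGain - 1e-12) continue;
            if (splitInfos[i] <= 0.0) continue; // ratio undefined for a degenerate split
            double ratio = infoGains[i] / splitInfos[i];
            if (ratio > bestRatio) {
                bestRatio = ratio;
                best = i;
            }
        }
        return best;
    }
}
```

With information gains {0.05, 0.30, 0.40} and split informations {0.05, 0.90, 1.00}, the first candidate has the largest raw ratio but falls below the average gain of 0.25, so the sketch selects the third candidate. The filter keeps a split with negligible gain from winning only because its split information is tiny.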
-
sortInstances
public static int[] sortInstances(InstanceList ilist, int[] instIndices, int featureIndex)
-
createGainRatio
public static GainRatio createGainRatio(InstanceList ilist)
Constructs a GainRatio object.
-
createGainRatio
public static GainRatio createGainRatio(InstanceList ilist, int[] instIndices, int minNumInsts)
Constructs a GainRatio object.
-
getMaxValuedThreshold
public double getMaxValuedThreshold()
- Returns:
- the threshold of the (feature, threshold) pair with the maximum gain ratio
-
getThresholdAtRank
public double getThresholdAtRank(int rank)
- Returns:
- the threshold of the (feature, threshold) pair with the given rank
-
getBaseEntropy
public double getBaseEntropy()
-
getBaseLabelDistribution
public LabelVector getBaseLabelDistribution()
-
getNumSplitPointsForBestFeature
public int getNumSplitPointsForBestFeature()
-
-