Class GainRatio

  • All Implemented Interfaces:
    AlphabetCarrying, ConstantMatrix, Vector, java.io.Serializable

    public class GainRatio
    extends RankedFeatureVector
    A list of features and their associated thresholds, sorted in descending order of gain ratio: the ratio of (1) the information gained by splitting the instances on the feature at its associated threshold value to (2) the split information of that split.

    The calculations do not take instance weights into account.
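    The ratio described above can be illustrated with a minimal, self-contained sketch. It is not the MALLET implementation: the class name GainRatioSketch, the dense double[] feature values, the integer label encoding, and the "value <= threshold goes left" convention are all assumptions made for illustration.

```java
public class GainRatioSketch {
    static final double LOG2 = Math.log(2);

    // Entropy, in bits, of a histogram of counts.
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        if (total == 0) return 0.0;
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * Math.log(p) / LOG2;
        }
        return h;
    }

    // Gain ratio of the binary split "value <= threshold" (assumed convention).
    // Every instance counts equally, i.e. instance weights are ignored.
    static double gainRatio(double[] values, int[] labels, int numLabels, double threshold) {
        int[] all = new int[numLabels];
        int[] left = new int[numLabels];
        int[] right = new int[numLabels];
        int nLeft = 0;
        for (int i = 0; i < values.length; i++) {
            all[labels[i]]++;
            if (values[i] <= threshold) {
                left[labels[i]]++;
                nLeft++;
            } else {
                right[labels[i]]++;
            }
        }
        int n = values.length;
        int nRight = n - nLeft;
        if (nLeft == 0 || nRight == 0) return 0.0; // degenerate split carries no information

        // (1) information gain: base entropy minus the size-weighted entropy of the two sides
        double gain = entropy(all)
                - ((double) nLeft / n) * entropy(left)
                - ((double) nRight / n) * entropy(right);
        // (2) split information: entropy of the partition sizes themselves
        double splitInfo = entropy(new int[] { nLeft, nRight });
        return gain / splitInfo;
    }

    public static void main(String[] args) {
        double[] values = { 1.0, 2.0, 3.0, 4.0 };
        int[] labels = { 0, 0, 1, 1 };
        System.out.println(gainRatio(values, labels, 2, 2.5));
    }
}
```

    In this toy data a threshold of 2.5 separates the two labels perfectly into equal halves, so both the gain and the split information equal one bit; a threshold of 1.5 yields a lower ratio.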

    To create an instance of GainRatio from an InstanceList, one must do the following:

    InstanceList ilist = ...
    ...
    GainRatio gr = GainRatio.createGainRatio(ilist);

    J. R. Quinlan, "Improved Use of Continuous Attributes in C4.5", ftp://ftp.cs.cmu.edu/project/jair/volume4/quinlan96a.ps

    Author:
    Gary Huang ghuang@cs.umass.edu
    See Also:
    Serialized Form
    • Field Detail

      • log2

        public static final double log2
    • Constructor Detail

      • GainRatio

        protected GainRatio​(Alphabet dataAlphabet,
                            double[] gainRatios,
                            double[] splitPoints,
                            double baseEntropy,
                            LabelVector baseLabelDistribution,
                            int numSplitPointsForBestFeature,
                            int minNumInsts)
    • Method Detail

      • calcGainRatios

        protected static java.lang.Object[] calcGainRatios​(InstanceList ilist,
                                                           int[] instIndices,
                                                           int minNumInsts)
        Calculates gain ratios for all (feature, split point) pairs and returns an array of:
           1.  the gain ratios (each element is the maximum gain ratio of a feature,
               taken over those split points with at least average gain)
           2.  the optimal split point for each feature
           3.  the overall entropy
           4.  the overall label distribution of the given instances
           5.  the number of split points of the best (split) feature.
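        The "at least average gain" filter in item 1 can be sketched as follows. This is an illustrative sketch rather than the MALLET code: the class BestSplitSketch and its method are hypothetical, and it assumes the average is taken over exactly the candidate split points passed in.

```java
public class BestSplitSketch {
    // Returns the index of the candidate with the highest gain ratio among
    // those whose information gain is at least the average gain of all
    // candidates, or -1 if there are no candidates.
    static int bestSplit(double[] gains, double[] gainRatios) {
        if (gains.length == 0) return -1;
        double sum = 0.0;
        for (double g : gains) sum += g;
        double averageGain = sum / gains.length;

        int best = -1;
        for (int i = 0; i < gains.length; i++) {
            if (gains[i] < averageGain) continue; // filtered out: below-average gain
            if (best == -1 || gainRatios[i] > gainRatios[best]) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        double[] gains = { 0.1, 0.5, 0.6 };
        double[] gainRatios = { 0.9, 0.4, 0.3 };
        System.out.println(bestSplit(gains, gainRatios));
    }
}
```

        The filter guards against split points that have a high ratio only because their split information is very small: in the example, the first candidate has the largest ratio but is rejected because its gain is below average.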
           
      • sortInstances

        public static int[] sortInstances​(InstanceList ilist,
                                          int[] instIndices,
                                          int featureIndex)
      • createGainRatio

        public static GainRatio createGainRatio​(InstanceList ilist)
        Constructs a GainRatio object.
      • createGainRatio

        public static GainRatio createGainRatio​(InstanceList ilist,
                                                int[] instIndices,
                                                int minNumInsts)
        Constructs a GainRatio object.
      • getMaxValuedThreshold

        public double getMaxValuedThreshold()
        Returns:
        the threshold of the (feature, threshold) pair with the maximum gain ratio
      • getThresholdAtRank

        public double getThresholdAtRank​(int rank)
        Returns:
        the threshold of the (feature, threshold) pair with the given rank
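        The relationship between a rank and its threshold can be pictured with a small sketch. It assumes, for illustration only, that gain ratios and split points are held in parallel arrays indexed by feature and that rank 0 is the highest gain ratio; the class RankedThresholdSketch and its methods are hypothetical and do not reflect how RankedFeatureVector stores its data.

```java
import java.util.Arrays;

public class RankedThresholdSketch {
    // Feature indices ordered by descending gain ratio (rank 0 = best).
    static Integer[] rankByGainRatio(double[] gainRatios) {
        Integer[] order = new Integer[gainRatios.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(gainRatios[b], gainRatios[a]));
        return order;
    }

    // Threshold of the feature that sits at the given rank.
    static double thresholdAtRank(double[] gainRatios, double[] splitPoints, int rank) {
        return splitPoints[rankByGainRatio(gainRatios)[rank]];
    }

    public static void main(String[] args) {
        double[] gainRatios = { 0.2, 0.9, 0.5 };
        double[] splitPoints = { 1.5, 3.0, 7.25 };
        // Rank 0 corresponds to the feature with the maximum gain ratio.
        System.out.println(thresholdAtRank(gainRatios, splitPoints, 0));
    }
}
```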
      • getBaseEntropy

        public double getBaseEntropy()
      • getBaseLabelDistribution

        public LabelVector getBaseLabelDistribution()
      • getNumSplitPointsForBestFeature

        public int getNumSplitPointsForBestFeature()