Class NaiveBayesTrainer

  • All Implemented Interfaces:
    Boostable, ClassifierTrainer.ByIncrements<NaiveBayes>, ClassifierTrainer.ByInstanceIncrements<NaiveBayes>, AlphabetCarrying, java.io.Serializable

    public class NaiveBayesTrainer
    extends ClassifierTrainer<NaiveBayes>
    implements ClassifierTrainer.ByInstanceIncrements<NaiveBayes>, Boostable, AlphabetCarrying, java.io.Serializable
    Class used to generate a NaiveBayes classifier from a set of training data. In a Bayes classifier, p(Classification|Data) = p(Data|Classification)p(Classification)/p(Data)

    To compute the likelihood:
    p(Data|Classification) = p(d1,d2,...,dn | Classification)
    Naive Bayes assumes that all of the data are conditionally independent given the Classification:
    p(d1,d2,...,dn | Classification) = p(d1|Classification)p(d2|Classification)...p(dn|Classification)
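
    The factorization above can be illustrated with a minimal, self-contained sketch. It does not use MALLET; the class name NaiveBayesPosteriorSketch and its parameters are hypothetical, chosen only for illustration. It multiplies the prior by the per-feature likelihoods (in log space, to avoid underflow) and normalizes, which plays the role of dividing by p(Data).

```java
/** Hypothetical helper, not part of MALLET: applies Bayes' rule under the naive independence assumption. */
final class NaiveBayesPosteriorSketch {

    /**
     * priors[c]          = p(Classification = c)
     * likelihoods[c][f]  = p(feature f | Classification = c)
     * featureCounts[f]   = number of times feature f occurs in the instance
     * Assumes at least one class assigns nonzero probability to the instance.
     */
    static double[] posterior(double[] priors, double[][] likelihoods, int[] featureCounts) {
        int numClasses = priors.length;
        double[] logScores = new double[numClasses];
        double max = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < numClasses; c++) {
            // log p(c) + sum over features of count * log p(feature | c)
            double score = Math.log(priors[c]);
            for (int f = 0; f < featureCounts.length; f++) {
                if (featureCounts[f] == 0) {
                    continue; // absent features contribute nothing (also avoids 0 * -Infinity)
                }
                score += featureCounts[f] * Math.log(likelihoods[c][f]);
            }
            logScores[c] = score;
            max = Math.max(max, score);
        }
        // Normalize: subtracting the max before exponentiating keeps the values in range.
        double[] result = new double[numClasses];
        double sum = 0.0;
        for (int c = 0; c < numClasses; c++) {
            result[c] = Math.exp(logScores[c] - max);
            sum += result[c];
        }
        for (int c = 0; c < numClasses; c++) {
            result[c] /= sum;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] p = posterior(
                new double[] {0.5, 0.5},
                new double[][] {{0.8, 0.2}, {0.4, 0.6}},
                new int[] {2, 1});
        System.out.println(p[0] + " " + p[1]);
    }
}
```

    With the values in main, the unnormalized scores are 0.5 * 0.8 * 0.8 * 0.2 = 0.064 and 0.5 * 0.4 * 0.4 * 0.6 = 0.048, so the first class receives 0.064 / 0.112 = 4/7 of the posterior mass.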

    As with other classifiers in Mallet, NaiveBayes is implemented as two classes: a trainer and a classifier. The NaiveBayesTrainer produces estimates of the various p(dn|Classification) and constructs a NaiveBayes classifier with those estimates.
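
    The method documentation on this page states that the estimators default to a Multinomial.LaplaceEstimator. As a rough illustration of how counts can become probability estimates, the sketch below uses the textbook add-one (Laplace) form. It is an assumption that this matches MALLET's estimator in detail; the class LaplaceSketch is hypothetical and does not use MALLET.

```java
/** Hypothetical helper, not part of MALLET: textbook add-one (Laplace) smoothing. */
final class LaplaceSketch {

    /** Returns p[i] = (counts[i] + 1) / (total + counts.length). */
    static double[] estimate(int[] counts) {
        long total = 0;
        for (int count : counts) {
            total += count;
        }
        // One pseudo-count per outcome is added to the denominator.
        double denominator = total + counts.length;
        double[] probabilities = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            probabilities[i] = (counts[i] + 1) / denominator;
        }
        return probabilities;
    }

    public static void main(String[] args) {
        // The feature that was never observed still gets a nonzero estimate.
        double[] p = estimate(new int[] {3, 0, 1});
        System.out.println(p[0] + " " + p[1] + " " + p[2]);
    }
}
```

    The point of the smoothing is that a feature never seen with a class does not force the whole product p(d1|Classification)p(d2|Classification)... to zero.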

    A call to train() or incrementalTrain() produces a NaiveBayes classifier that can be used to classify instances. A call to incrementalTrain() does not throw away the internal state of the trainer; subsequent calls to incrementalTrain() train by extending the previous training set.

    A NaiveBayesTrainer can be persisted using serialization.

    Author:
    Andrew McCallum mccallum@cs.umass.edu
    See Also:
    NaiveBayes, Serialized Form
    • Constructor Detail

      • NaiveBayesTrainer

        public NaiveBayesTrainer(NaiveBayes initialClassifier)
      • NaiveBayesTrainer

        public NaiveBayesTrainer(Pipe instancePipe)
      • NaiveBayesTrainer

        public NaiveBayesTrainer()
    • Method Detail

      • setDocLengthNormalization

        public NaiveBayesTrainer setDocLengthNormalization(double d)
      • getDocLengthNormalization

        public double getDocLengthNormalization()
      • getFeatureMultinomialEstimator

        public Multinomial.Estimator getFeatureMultinomialEstimator()
        Get the MultinomialEstimator instance used to specify the type of estimator for features.
        Returns:
        estimator to be cloned on next call to train() or first call to incrementalTrain()
      • setFeatureMultinomialEstimator

        public NaiveBayesTrainer setFeatureMultinomialEstimator(Multinomial.Estimator me)
        Set the Multinomial.Estimator used for features. The estimator is cloned internally, and the clone maintains the counts used to generate probability estimates the next time train() or an initial incrementalTrain() is run. Defaults to a Multinomial.LaplaceEstimator.
        Parameters:
        me - to be cloned on next call to train() or first call to incrementalTrain()
      • getPriorMultinomialEstimator

        public Multinomial.Estimator getPriorMultinomialEstimator()
        Get the MultinomialEstimator instance used to specify the type of estimator for priors.
        Returns:
        estimator to be cloned on next call to train() or first call to incrementalTrain()
      • setPriorMultinomialEstimator

        public NaiveBayesTrainer setPriorMultinomialEstimator(Multinomial.Estimator me)
        Set the Multinomial.Estimator used for priors. The estimator is cloned internally, and the clone maintains the counts used to generate probability estimates the next time train() or an initial incrementalTrain() is run. Defaults to a Multinomial.LaplaceEstimator.
        Parameters:
        me - to be cloned on next call to train() or first call to incrementalTrain()
      • train

        public NaiveBayes train(InstanceList trainingList)
        Create a NaiveBayes classifier from a set of training data. The trainer uses counts of each feature in an instance's feature vector to estimate p(feature|Labeling). The internal state of the trainer is thrown away (by a call to reset()) when train() returns, so each call to train() is completely independent of any other.
        Specified by:
        train in class ClassifierTrainer<NaiveBayes>
        Parameters:
        trainingList - The InstanceList to be used to train the classifier. Within each instance the data slot is an instance of FeatureVector and the target slot is an instance of Labeling
        validationList - Currently unused
        testSet - Currently unused
        evaluator - Currently unused
        initialClassifier - Currently unused
        Returns:
        The NaiveBayes classifier as trained on the trainingList
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • incrementalTrain

        Create a NaiveBayes classifier from a set of training data and the previous state of the trainer. Subsequent calls to incrementalTrain() add to the state of the trainer. An incremental training session should consist only of calls to incrementalTrain() and have no calls to train().
        Parameters:
        trainingList - The InstanceList to be used to train the classifier. Within each instance the data slot is an instance of FeatureVector and the target slot is an instance of Labeling
        validationList - Currently unused
        testSet - Currently unused
        evaluator - Currently unused
        initialClassifier - Currently unused
        Returns:
        The NaiveBayes classifier as trained on the trainingList and the previous trainingLists passed to incrementalTrain()
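
        The difference between train() and incrementalTrain() described above can be sketched with a toy trainer that only counts labels. This is not MALLET code: CountingTrainerSketch, its String[] input, and the unsmoothed relative-frequency estimate are all simplifying assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy trainer, not part of MALLET: contrasts independent and incremental training. */
final class CountingTrainerSketch {
    private final Map<String, Integer> labelCounts = new HashMap<>();

    /** Independent training: state is discarded before and after, so calls never interact. */
    Map<String, Double> train(String[] labels) {
        reset();
        accumulate(labels);
        Map<String, Double> priors = estimatePriors();
        reset();
        return priors;
    }

    /** Incremental training: counts from earlier calls are kept and extended. */
    Map<String, Double> incrementalTrain(String[] labels) {
        accumulate(labels);
        return estimatePriors();
    }

    private void reset() {
        labelCounts.clear();
    }

    private void accumulate(String[] labels) {
        for (String label : labels) {
            labelCounts.merge(label, 1, Integer::sum);
        }
    }

    /** Unsmoothed relative frequencies, kept simple on purpose. */
    private Map<String, Double> estimatePriors() {
        int total = 0;
        for (int count : labelCounts.values()) {
            total += count;
        }
        Map<String, Double> priors = new HashMap<>();
        for (Map.Entry<String, Integer> entry : labelCounts.entrySet()) {
            priors.put(entry.getKey(), entry.getValue() / (double) total);
        }
        return priors;
    }

    public static void main(String[] args) {
        CountingTrainerSketch trainer = new CountingTrainerSketch();
        System.out.println(trainer.incrementalTrain(new String[] {"spam", "ham"}));
        // The second call sees the counts from the first call as well.
        System.out.println(trainer.incrementalTrain(new String[] {"spam", "spam"}));
    }
}
```

        In this sketch a call to train() in the middle of an incremental session would wipe the accumulated counts, which mirrors why a session should not mix the two kinds of call.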