Class NaiveBayesTrainer
- java.lang.Object
-
- cc.mallet.classify.ClassifierTrainer<NaiveBayes>
-
- cc.mallet.classify.NaiveBayesTrainer
-
- All Implemented Interfaces:
Boostable, ClassifierTrainer.ByIncrements<NaiveBayes>, ClassifierTrainer.ByInstanceIncrements<NaiveBayes>, AlphabetCarrying, java.io.Serializable
public class NaiveBayesTrainer extends ClassifierTrainer<NaiveBayes> implements ClassifierTrainer.ByInstanceIncrements<NaiveBayes>, Boostable, AlphabetCarrying, java.io.Serializable
Class used to generate a NaiveBayes classifier from a set of training data. In a Bayes classifier,
p(Classification|Data) = p(Data|Classification) p(Classification) / p(Data)
To compute the likelihood:
p(Data|Classification) = p(d1,d2,...,dn | Classification)
Naive Bayes makes the assumption that all of the data are conditionally independent given the Classification:
p(d1,d2,...,dn | Classification) = p(d1|Classification) p(d2|Classification) ... p(dn|Classification)
As with other classifiers in Mallet, NaiveBayes is implemented as two classes: a trainer and a classifier. The NaiveBayesTrainer produces estimates of the various p(dn|Classification) and constructs a NaiveBayes classifier with those estimates.
A call to train() or trainIncremental() produces a NaiveBayes classifier that can be used to classify instances. A call to trainIncremental() does not throw away the internal state of the trainer; subsequent calls to trainIncremental() train by extending the previous training set.
A NaiveBayesTrainer can be persisted using serialization.
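The decision rule described above can be sketched in plain Java. This is a minimal illustration and not MALLET code: the class NaiveBayesSketch, its methods, and the probabilities in example() are all hypothetical. It works in log space and drops p(Data), which is the same for every class and so does not affect which class scores highest.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch of the naive Bayes decision rule; not MALLET code.
final class NaiveBayesSketch {
    // log p(Classification), keyed by label
    private final Map<String, Double> logPrior = new LinkedHashMap<>();
    // log p(d | Classification), keyed by label, then by feature
    private final Map<String, Map<String, Double>> logLikelihood = new LinkedHashMap<>();

    void setPrior(String label, double p) {
        logPrior.put(label, Math.log(p));
    }

    void setLikelihood(String label, String feature, double p) {
        logLikelihood.computeIfAbsent(label, k -> new LinkedHashMap<>()).put(feature, Math.log(p));
    }

    // Unnormalized log posterior: log p(C) + sum_i log p(d_i | C).
    // Assumes every feature in data has an estimate for the given label.
    double logScore(String label, List<String> data) {
        double score = logPrior.get(label);
        for (String d : data) {
            score += logLikelihood.get(label).get(d);
        }
        return score;
    }

    // Returns the label with the highest unnormalized log posterior.
    String classify(List<String> data) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : logPrior.keySet()) {
            double s = logScore(label, data);
            if (s > bestScore) {
                bestScore = s;
                best = label;
            }
        }
        return best;
    }

    // Made-up probabilities, for illustration only.
    static NaiveBayesSketch example() {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.setPrior("spam", 0.4);
        nb.setPrior("ham", 0.6);
        nb.setLikelihood("spam", "free", 0.5);
        nb.setLikelihood("spam", "money", 0.4);
        nb.setLikelihood("spam", "meeting", 0.1);
        nb.setLikelihood("ham", "free", 0.1);
        nb.setLikelihood("ham", "money", 0.1);
        nb.setLikelihood("ham", "meeting", 0.8);
        return nb;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = example();
        System.out.println(nb.classify(Arrays.asList("free", "money"))); // prints "spam"
        System.out.println(nb.classify(Arrays.asList("meeting")));       // prints "ham"
    }
}
```

For the first input the unnormalized scores are 0.4 * 0.5 * 0.4 = 0.08 for spam and 0.6 * 0.1 * 0.1 = 0.006 for ham, so spam wins.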
- Author:
- Andrew McCallum mccallum@cs.umass.edu
- See Also:
NaiveBayes, Serialized Form
-
-
Nested Class Summary
Nested Classes
Modifier and Type Class Description
static class NaiveBayesTrainer.Factory
-
Nested classes/interfaces inherited from class cc.mallet.classify.ClassifierTrainer
ClassifierTrainer.ByActiveLearning<C extends Classifier>, ClassifierTrainer.ByIncrements<C extends Classifier>, ClassifierTrainer.ByInstanceIncrements<C extends Classifier>, ClassifierTrainer.ByOptimization<C extends Classifier>
-
-
Field Summary
-
Fields inherited from class cc.mallet.classify.ClassifierTrainer
finishedTraining, validationSet
-
-
Constructor Summary
Constructors
Constructor Description
NaiveBayesTrainer()
NaiveBayesTrainer(NaiveBayes initialClassifier)
NaiveBayesTrainer(Pipe instancePipe)
-
Method Summary
Modifier and Type Method Description
boolean alphabetsMatch(AlphabetCarrying object)
Alphabet getAlphabet()
Alphabet[] getAlphabets()
NaiveBayes getClassifier()
double getDocLengthNormalization()
Multinomial.Estimator getFeatureMultinomialEstimator()
  Get the Multinomial.Estimator instance used to specify the type of estimator for features.
Multinomial.Estimator getPriorMultinomialEstimator()
  Get the Multinomial.Estimator instance used to specify the type of estimator for priors.
NaiveBayesTrainer setDocLengthNormalization(double d)
NaiveBayesTrainer setFeatureMultinomialEstimator(Multinomial.Estimator me)
  Set the Multinomial.Estimator used for features.
NaiveBayesTrainer setPriorMultinomialEstimator(Multinomial.Estimator me)
  Set the Multinomial.Estimator used for priors.
java.lang.String toString()
NaiveBayes train(InstanceList trainingList)
  Create a NaiveBayes classifier from a set of training data.
NaiveBayes trainIncremental(Instance instance)
NaiveBayes trainIncremental(InstanceList trainingInstancesToAdd)
  Create a NaiveBayes classifier from a set of training data and the previous state of the trainer.
-
Methods inherited from class cc.mallet.classify.ClassifierTrainer
getValidationInstances, isFinishedTraining, setValidationInstances
-
-
-
-
Constructor Detail
-
NaiveBayesTrainer
public NaiveBayesTrainer(NaiveBayes initialClassifier)
-
NaiveBayesTrainer
public NaiveBayesTrainer(Pipe instancePipe)
-
NaiveBayesTrainer
public NaiveBayesTrainer()
-
-
Method Detail
-
getClassifier
public NaiveBayes getClassifier()
- Specified by:
getClassifier
in class ClassifierTrainer<NaiveBayes>
-
setDocLengthNormalization
public NaiveBayesTrainer setDocLengthNormalization(double d)
-
getDocLengthNormalization
public double getDocLengthNormalization()
-
getFeatureMultinomialEstimator
public Multinomial.Estimator getFeatureMultinomialEstimator()
Get the Multinomial.Estimator instance used to specify the type of estimator for features.
- Returns:
- estimator to be cloned on next call to train() or first call to trainIncremental()
-
setFeatureMultinomialEstimator
public NaiveBayesTrainer setFeatureMultinomialEstimator(Multinomial.Estimator me)
Set the Multinomial.Estimator used for features. The estimator is cloned internally, and the clone maintains the counts used to generate probability estimates the next time train() or an initial trainIncremental() is run. Defaults to a Multinomial.LaplaceEstimator().
- Parameters:
me
- to be cloned on next call to train() or first call to trainIncremental()
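To show why a smoothing estimator is a sensible default, here is a minimal plain-Java sketch of add-one (Laplace) smoothing over feature counts. It is an illustration under assumptions, not MALLET's implementation: LaplaceSketch and its methods are hypothetical names, and the exact smoothing constant used by Multinomial.LaplaceEstimator should be checked against the MALLET source.

```java
// Hypothetical sketch of add-one (Laplace) smoothing; not MALLET's Multinomial.LaplaceEstimator.
final class LaplaceSketch {
    private final double[] counts; // one count per feature index
    private double total;          // sum of all counts

    LaplaceSketch(int numFeatures) {
        this.counts = new double[numFeatures];
    }

    // Record that the feature at index was observed amount times.
    void increment(int index, double amount) {
        counts[index] += amount;
        total += amount;
    }

    // Smoothed estimate: (count + 1) / (total + numFeatures).
    // An unseen feature gets a small nonzero probability instead of zero,
    // so it cannot zero out the whole product of likelihoods.
    double probability(int index) {
        return (counts[index] + 1.0) / (total + counts.length);
    }

    public static void main(String[] args) {
        LaplaceSketch est = new LaplaceSketch(3);
        est.increment(0, 3);
        est.increment(1, 1);
        // Feature 2 is never observed.
        for (int i = 0; i < 3; i++) {
            System.out.println(est.probability(i));
        }
    }
}
```

With counts 3, 1, and 0 over three features, the estimates are 4/7, 2/7, and 1/7, which still sum to 1.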
-
getPriorMultinomialEstimator
public Multinomial.Estimator getPriorMultinomialEstimator()
Get the Multinomial.Estimator instance used to specify the type of estimator for priors.
- Returns:
- estimator to be cloned on next call to train() or first call to trainIncremental()
-
setPriorMultinomialEstimator
public NaiveBayesTrainer setPriorMultinomialEstimator(Multinomial.Estimator me)
Set the Multinomial.Estimator used for priors. The estimator is cloned internally, and the clone maintains the counts used to generate probability estimates the next time train() or an initial trainIncremental() is run. Defaults to a Multinomial.LaplaceEstimator().
- Parameters:
me
- to be cloned on next call to train() or first call to trainIncremental()
-
train
public NaiveBayes train(InstanceList trainingList)
Create a NaiveBayes classifier from a set of training data. The trainer uses counts of each feature in an instance's feature vector to estimate p(feature|Labeling), which the classifier combines into p(Labeling|features). The internal state of the trainer is thrown away (by a call to reset()) when train() returns. Each call to train() is completely independent of any other.
- Specified by:
train
in class ClassifierTrainer<NaiveBayes>
- Parameters:
trainingList
- The InstanceList to be used to train the classifier. Within each instance the data slot is an instance of FeatureVector and the target slot is an instance of Labeling
validationList
- Currently unused
testSet
- Currently unused
evaluator
- Currently unused
initialClassifier
- Currently unused
- Returns:
- The NaiveBayes classifier as trained on the trainingList
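The contrast between train(), which discards trainer state, and incremental training, which accumulates it, can be sketched with a toy counter. This is a hypothetical stand-in, not MALLET code: CountingTrainerSketch only counts labels, and clearing state at the start of train() as well as on return is an assumption made so that each call is independent.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy trainer that only counts labels; not MALLET code.
final class CountingTrainerSketch {
    private final Map<String, Integer> labelCounts = new HashMap<>();

    // Batch training: each call is independent of any other.
    Map<String, Integer> train(String[] labels) {
        labelCounts.clear();                  // assumption: start from a clean state
        add(labels);
        Map<String, Integer> result = new HashMap<>(labelCounts);
        labelCounts.clear();                  // state is thrown away on return
        return result;
    }

    // Incremental training: counts from earlier calls are kept and extended.
    Map<String, Integer> trainIncremental(String[] labels) {
        add(labels);
        return new HashMap<>(labelCounts);
    }

    private void add(String[] labels) {
        for (String label : labels) {
            labelCounts.merge(label, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        CountingTrainerSketch batch = new CountingTrainerSketch();
        batch.train(new String[] {"a", "a", "b"});
        System.out.println(batch.train(new String[] {"a"}).get("a"));            // prints 1

        CountingTrainerSketch inc = new CountingTrainerSketch();
        inc.trainIncremental(new String[] {"a", "b"});
        System.out.println(inc.trainIncremental(new String[] {"a"}).get("a"));   // prints 2
    }
}
```

The second batch call sees only its own data, while the second incremental call reflects everything passed so far.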
-
trainIncremental
public NaiveBayes trainIncremental(InstanceList trainingInstancesToAdd)
Create a NaiveBayes classifier from a set of training data and the previous state of the trainer. Subsequent calls to trainIncremental() add to the state of the trainer. An incremental training session should consist only of calls to trainIncremental() and have no calls to train().
- Specified by:
trainIncremental
in interface ClassifierTrainer.ByIncrements<NaiveBayes>
- Parameters:
trainingInstancesToAdd
- The InstanceList to be used to train the classifier. Within each instance the data slot is an instance of FeatureVector and the target slot is an instance of Labeling
validationList
- Currently unused
testSet
- Currently unused
evaluator
- Currently unused
initialClassifier
- Currently unused
- Returns:
- The NaiveBayes classifier as trained on trainingInstancesToAdd and the instance lists previously passed to trainIncremental()
-
trainIncremental
public NaiveBayes trainIncremental(Instance instance)
- Specified by:
trainIncremental
in interface ClassifierTrainer.ByInstanceIncrements<NaiveBayes>
-
toString
public java.lang.String toString()
- Overrides:
toString
in class java.lang.Object
-
alphabetsMatch
public boolean alphabetsMatch(AlphabetCarrying object)
-
getAlphabet
public Alphabet getAlphabet()
- Specified by:
getAlphabet
in interface AlphabetCarrying
-
getAlphabets
public Alphabet[] getAlphabets()
- Specified by:
getAlphabets
in interface AlphabetCarrying
-
-