All Classes (Mallet 2 API)

Class	Description
AbstractTopicReports
AccuracyCoverage	Methods for calculating and displaying accuracy vs. coverage.
AccuracyCoverageEvaluator	Constructs Accuracy-coverage graph using confidence values to sort Fields.
AccuracyEvaluator	Accuracy of a clustering is (truePositive + trueNegative) / (numberPairwiseComparisons)
AdaBoost	AdaBoost Robert E.
AdaBoostM2	AdaBoostM2
AdaBoostM2Trainer	This version of AdaBoost can handle multi-class problems.
AdaBoostTrainer	This version of AdaBoost should be used only for binary classification.
Addable
AddClassifierTokenPredictions	This pipe uses a Classifier to label each token (i.e., using 0-th order Markov assumption), then adds the predictions as features to each token.
AddClassifierTokenPredictions.TokenClassifiers	This inner class represents the trained token classifiers.
AgglomerativeNeighbor	A `Neighbor` created by merging two clusters of the original Clustering.
AGIS
AllPairsIterator	Iterate over all pairs of Instances.
Alphabet	A mapping between integers and objects where the mapping in each direction is efficient.
AlphabetCarrying	An interface for objects that contain one or more Alphabets.
AlphabetFactory
Array2FeatureVector	Converts a Java array of numerical types to a FeatureVector, where the Alphabet is the data array index wrapped in an Integer object.
ArrayDataAndTargetIterator
ArrayIterator
ArrayListSequence<E>
ArrayListUtils
ArraySequence<E>
ArrayUtils	Static utility methods for arrays (like java.util.Arrays, but more useful).
AStar	Created by IntelliJ IDEA.
AStarNode	Created by IntelliJ IDEA.
AStarState	Created by IntelliJ IDEA.
AugmentableFeatureVector
AugmentableFeatureVectorAddConjunctions	Add specified conjunctions to each instance.
AugmentableFeatureVectorLogScale	Given an AugmentableFeatureVector, set those values greater than or equal to 1 to log(value)+1.
BackTrackLineSearch
BaggingClassifier
BaggingTrainer	Bagging Trainer.
BalancedWinnow	Classification methods of BalancedWinnow algorithm.
BalancedWinnowTrainer	An implementation of the training methods of a BalancedWinnow on-line classifier.
BCubedEvaluator	Evaluate a Clustering using the B-Cubed evaluation metric.
BiNormalSeparation	Bi-Normal Separation is a feature weighting algorithm introduced in: An Extensive Empirical Study of Feature Selection Metrics for Text Classification, George Forman, Journal of Machine Learning Research, 3:1289--1305, 2003.
BiNormalSeparation.Factory	Factory class.
BIOTokenizationFilter	Created: Nov 12, 2004
BIOTokenizationFilterWithTokenIndices
Boostable	This interface is a tag indicating that the classifier attends to the InstanceList.getInstanceWeight() weights when training.
BranchingPipe	Deprecated.
BshInterpreter
BulkLoader	This class reads through a single file, breaking each line into data and (optional) name and label fields.
C45	A C4.5 Decision Tree classifier.
C45.Node
C45Trainer	A C4.5 decision tree learner, approximately.
CachedDotTransitionIterator	TransitionIterator that caches dot products.
CachedMetric	Stores a hash for each object being compared for efficient computation.
CacheStaleIndicator	Indicates when the value/gradient during training becomes stale.
Calo2Classify	Classify documents, run trials, print statistics from a vector file.
ChainedInstanceIterator	Deprecated.
CharSequence2CharNGrams	Transform a character sequence into a token sequence of character N grams.
CharSequence2TokenSequence	Pipe that tokenizes a character sequence.
CharSequenceArray2TokenSequence	Transform an array of character Sequences into a token sequence.
CharSequenceLexer
CharSequenceLowercase	Replace the data string or string buffer with a lowercased version.
CharSequenceNoDiacritics	A string normalizer which performs the following steps: (1) Unicode canonical decomposition (`Form#NFD`); (2) removal of diacritical marks; (3) Unicode canonical composition (`Form#NFC`).
CharSequenceRemoveHTML	This pipe removes HTML from a CharSequence.
CharSequenceRemoveUUEncodedBlocks
CharSequenceReplace	Given a string, repeatedly look for matches of the regex, and replace the entire match with the given replacement string.
CharSequenceReplaceHtmlEntities	Be careful here: this pipe must be applied before `CharSequenceLowercase` because it is case sensitive.
CharSubsequence	Given a string, return only the portion of the string inside a regex parenthesized group.
Classification	The result of classifying a single instance.
Classification2ConfidencePredictingFeatureVector	Pipe features from underlying classifier to the confidence prediction instance list
Classifier	Abstract parent of all Classifiers.
Classifier2Info	Diagnostic facilities for a classifier.
ClassifierAccuracyEvaluator
ClassifierEnsemble	Classifier for an ensemble of classifiers, combined with learned weights.
ClassifierEnsembleTrainer
ClassifierEvaluator
ClassifierTrainer<C extends Classifier>	Each ClassifierTrainer trains one Classifier based on various interfaces for consuming training data.
ClassifierTrainer.ByActiveLearning<C extends Classifier>	For active learning, in which this trainer will select certain instances and request that the Labeler instance label them.
ClassifierTrainer.ByIncrements<C extends Classifier>	For various kinds of online learning by batches, where training instances are presented, consumed for learning immediately.
ClassifierTrainer.ByInstanceIncrements<C extends Classifier>	For online learning that can operate on one instance at a time.
ClassifierTrainer.ByOptimization<C extends Classifier>
ClassifierTrainer.Factory<CT extends ClassifierTrainer<? extends Classifier>>	Instances of a Factory know how to create new ClassifierTrainers to apply to new Classifiers.
ClassifyingNeighborEvaluator	A `NeighborEvaluator` that is backed by a `Classifier`.
Clusterer	An abstract class for clustering a set of points.
Clustering
ClusteringEvaluator	Evaluates a predicted Clustering against a true Clustering.
ClusteringEvaluators	A list of `ClusteringEvaluators`.
Clusterings
Clusterings2Clusterer
Clusterings2Clusterer.ClusteringPipe
Clusterings2Clusterings
Clusterings2Info
ClusteringScorer	Assign a score to a Clustering.
ClusterSampleIterator	Sample clusters of Instances.
ClusterUtils	Utility functions for Clusterings.
ColorUtils	Utilities for dealing with RGB-style colors.
CommandOption
CommandOption.Boolean
CommandOption.Double
CommandOption.DoubleArray
CommandOption.File
CommandOption.Integer
CommandOption.IntegerArray
CommandOption.List
CommandOption.ListProviding	For objects that can provide CommandOption.Lists (which can be merged into other lists).
CommandOption.Object
CommandOption.ObjectFromBean
CommandOption.Set
CommandOption.SpacedStrings
CommandOption.String
ConcatenatedInstanceIterator
ConfidenceCorrectorEvaluator	Calculates the effectiveness of "constrained viterbi" in propagating corrections in one segment of a sequence to other segments.
ConfidenceEvaluator
ConfidenceEvaluator.EntityConfidence	a simple class to store a confidence score and whether or not this labeling is correct
ConfidencePredictingClassifier
ConfidencePredictingClassifierTrainer
ConfidenceTokenizationFilter	Created: Oct 26, 2005
ConfusionMatrix	Calculates and prints confusion matrix, accuracy, and precision for a given classification trial.
ConjugateGradient
ConllNer2003Sentence2TokenSequence	Reads a data file in CoNLL 2003 format, and makes some simple transformations.
ConllNer2003Sentence2TokenSequence
ConstantMatrix
ConstrainedForwardBackwardConfidenceEstimator	Estimates the confidence of a `Segment` extracted by a `Transducer` by performing a "constrained lattice" calculation.
ConstrainedViterbiTransducerCorrector	Corrects a subset of the `Segment`s produced by a `Transducer`.
ConstraintsOptimizableByPR	Optimizable for E-step/I-projection in Posterior Regularization (PR).
CoordinateDescent
CopyCallable	This task copies topic-word counts from a global array to a local thread-specific array.
CountMatches
CountMatchesAlignedWithOffsets
CountMatchesMatching
CountsToFeatureSequencePipe
CRF	Represents a CRF model.
CRF.Factors	A simple, transparent container to hold the parameters or sufficient statistics for the CRF.
CRF.State
CRF.TransitionIterator
CRFCacheStaleIndicator	Indicates when the value/gradient becomes stale based on updates to CRF's parameters.
CRFExtractor	Created: Oct 12, 2004
CRFOptimizableByBatchLabelLikelihood	Implements label likelihood gradient computations for batches of data, can be easily parallelized.
CRFOptimizableByBatchLabelLikelihood.Factory
CRFOptimizableByEntropyRegularization	A CRF objective function that is the entropy of the CRF's predictions on unlabeled data.
CRFOptimizableByGE	Optimizable for CRF using Generalized Expectation constraints that consider either a single label or a pair of labels of a linear chain CRF.
CRFOptimizableByGradientValues	A CRF objective function that is the sum of multiple objective functions that implement Optimizable.ByGradientValue.
CRFOptimizableByKL	M-step/M-projection for PR.
CRFOptimizableByLabelLikelihood	An objective function for CRFs that is the label likelihood plus a Gaussian or hyperbolic prior on parameters.
CRFOptimizableByLabelLikelihood.Factory
CRFTrainerByEntropyRegularization	A CRF trainer that maximizes the log-likelihood plus a weighted entropy regularization term on unlabeled data.
CRFTrainerByGE	Trains a CRF using Generalized Expectation constraints that consider either a single label or a pair of labels of a linear chain CRF.
CRFTrainerByL1LabelLikelihood	CRF trainer that implements L1-regularization.
CRFTrainerByLabelLikelihood	Unlike ClassifierTrainer, TransducerTrainer is not "stateless" between calls to train.
CRFTrainerByLikelihoodAndGE
CRFTrainerByPR	Posterior regularization trainer.
CRFTrainerByStochasticGradient	Trains CRF by stochastic gradient.
CRFTrainerByThreadedLabelLikelihood
CRFTrainerByValueGradients	A CRF trainer that can combine multiple objective functions, each represented by an Optimizable.ByGradientValue.
CRFWriter	Saves a trained model to specified filename.
CrossValidationIterator	An iterator which splits an `InstanceList` into n-folds and iterates over the folds for use in n-fold cross-validation.
Csv2Array	Converts a string of comma separated values to an array.
Csv2Classify	Command line tool for classifying a sequence of instances directly from text input, without creating an instance list.
Csv2FeatureVector	Converts a string of the form `feature_1:val_1 feature_2:val_2 ...`
Csv2Vectors	Command line import tool for loading a sequence of instances from a single file, with one instance per line of the input file.
CsvIterator	This iterator, perhaps more properly called a Line Pattern Iterator, reads through a file and returns one instance per line, based on a regular expression.
DBBulkLoader	This class reads through two files (data and metadata), tokenizing metadata for use as a label vector.
DBInstanceIterator
DBInstanceStore
DecisionTree	Decision Tree classifier.
DecisionTree.Node
DecisionTreeTrainer	A decision tree learner, roughly ID3, but only to a fixed given depth in all branches.
DecisionTreeTrainer.Factory
DefaultTokenizationFilter	Created: Nov 12, 2004
DenseMatrix
DenseVector
Directory2FileIterator	Convert a File object representing a directory into a FileIterator which iterates over files in the directory matching a pattern and which extracts a label from each file path to become the target field of the instance.
DirectoryFilter
Dirichlet	Various useful functions related to Dirichlet distributions.
Dirichlet.Estimator
Dirichlet.MethodOfMomentsEstimator
DMRCallable	A parallel Dirichlet-multinomial regression topic model runnable task.
DMRInferencer
DMRLoader	This class loads data into the format for the MALLET Dirichlet-multinomial regression (DMR).
DMROptimizable
DMRTopicModel
DocumentClassifier
DocumentExtraction	Created: Oct 12, 2004
DocumentLengths
DocumentViewer	Diagnosis class that outputs HTML pages that allow you to view errors on a more global per-instance basis.
DoubleList
DownsampleLabelWords	This class implements the method from "Authorless Topic Models" by Thompson and Mimno, COLING 2018.
Element
EmptyInstanceIterator
EnronMessage2TokenSequence
EntropyLattice	Runs subsequence constrained forward-backward to compute the entropy of label sequences.
EuclideanDistance
EvaluateTopics
ExactMatchComparator	Created: Nov 23, 2004
ExpGain
ExpGain.Factory
Extraction	The results of doing information extraction.
ExtractionConfidenceEstimator	Estimates the confidence in the labeling of a LabeledSpan.
ExtractionEvaluator	Created: Oct 8, 2004
Extractor	Generic interface for objects that do information extraction.
FeatureConjunction
FeatureConjunction.List
FeatureConstraintUtil	Utility functions for creating feature constraints that can be used with GE training.
FeatureCooccurrenceCounter
FeatureCounter	Efficient, compact, incremental counting of features in an alphabet.
FeatureCountPipe	Pruning low-count features can be a good way to save memory and computation.
FeatureCounts
FeatureCounts.Factory
FeatureCountTool
FeatureDocFreqPipe	Pruning low-count features can be a good way to save memory and computation.
FeatureInducer
FeatureSelectingClassifierTrainer	Adaptor for adding feature selection to a classifier trainer.
FeatureSelection
FeatureSelector
FeatureSequence	An implementation of `Sequence` that ensures that every Object in the sequence has the same class.
FeatureSequence2AugmentableFeatureVector	Convert the data field from a feature sequence to an augmentable feature vector.
FeatureSequence2FeatureVector	Convert the data field from a feature sequence to a feature vector.
FeatureSequenceConvolution
FeatureSequenceWithBigrams	A FeatureSequence with a parallel record of bigrams, kept in a separate dictionary
FeaturesInWindow
FeaturesOfFirstMention
FeatureTransducer
FeatureValueString2FeatureVector
FeatureVector	A subset of an `Alphabet` in which each element of the subset has an associated value.
FeatureVectorConjunctions	Include in the FeatureVector conjunctions of all its features.
FeatureVectorSequence
FeatureVectorSequence2FeatureVectors	Given instances with a FeatureVectorSequence in the data field, break up the sequence into the individual FeatureVectors, producing one FeatureVector per Instance.
FeatureWindow	Adds all features of tokens in the window to the center token.
Field	Created: Oct 12, 2004
FieldCleaner	Interface for functions that are used to clean up field values after extraction has been performed.
FieldComparator	Interface for functions that compares extracted values of a field to see if they match.
FileIterator	An iterator that generates instances from an initial directory or set of directories.
FileListIterator	An iterator that generates instances for a pipe from a list of filenames.
Filename2CharSequence	Given a filename contained in a string, read in contents of file into a CharSequence.
FileUriIterator
FileUtils	Contains static utilities for manipulating files.
FilterEmptyFeatureVectors
FirstOrderClusterExample	Illustrates use of a supervised clustering method that uses features over clusters.
FixedVocabTokenizer	A simple unicode tokenizer that accepts sequences of letters as tokens.
FSTConstraintUtil	Expectation constraint utilities for fst package.
GainRatio	List of features along with their thresholds sorted in descending order of the ratio of (1) information gained by splitting instances on the feature at its associated threshold value, to (2) the split information.
GammaAverageConfidenceEstimator	Calculates the confidence in an extracted segment by taking the average of P(s_i|o) for each state in the segment.
GammaProductConfidenceEstimator	Calculates the confidence in an extracted segment by taking the product of eP(s_i|o) for each state in the segment.
GEConstraint	Interface for GE constraint that considers either one or two states.
GELattice	Runs the dynamic programming algorithm of [Mann and McCallum 08] for computing the gradient of a Generalized Expectation constraint that considers a single label of a linear chain CRF.
GradientAscent
GradientBracketLineOptimizer
GradientGain
GradientGain.Factory
Graph	Framework for standard graph.
Graph2	Methods for a 2-D graph
GraphItem	Holds data for a point on a graph
GreedyAgglomerative	Greedily merges Instances until convergence.
GreedyAgglomerativeByDensity	Greedily merges Instances until convergence.
HashedSparseVector
HierarchicalLDA
HierarchicalLDATUI
HierarchicalPAM	Hierarchical PAM, where each node in the DAG has a distribution over all topics on the next level and one additional "node-specific" topic.
HierarchicalTokenizationFilter	Tokenization filter that will create nested spans based on a hierarchical labeling of the data.
HillClimbingClusterer	A Clusterer that iteratively improves a predicted Clustering using a `NeighborEvaluator`.
HMM	A Hidden Markov Model.
HMM.State
HMM.TransitionIterator
HMMTrainerByLikelihood
IDSorter	This class contains a comparator for use in sorting integers that have associated floating point values.
IndexedSparseVector
InferTopics
InfiniteDistance
InfoGain
InfoGain.Factory
Input2CharSequence	Pipe that can read from various kinds of text sources (either URI, File, or Reader) into a CharSequence
Instance	A machine learning "example" to be used in training, testing or performance of various machine learning algorithms.
InstanceAccuracyEvaluator	Reports the percentage of instances for which the entire predicted sequence was correct.
InstanceList	A list of machine learning instances, typically used for training or testing of a machine learning algorithm.
InstanceListPrinter
InstanceListTrimFeaturesByCount	Unimplemented.
InstanceListTUI
InstanceWithConfidence	Helper class to store confidence of an Instance.
InvalidOptimizableException	Exception thrown by optimization algorithms, when the problem is usually due to a problem with the given Maximizable instance.
InvertedIndex
IoUtils
IsolatedSegmentTransducerCorrector	Corrects a subset of the `Segment`s produced by a `Transducer`.
JSONTopicReports
KBestClusterer	Return the K best predicted Clusterings
KLGain
KMeans	KMeans Clusterer. Clusters the points into k clusters by minimizing the total intra-cluster variance.
Label
LabelAlphabet	A mapping from arbitrary objects (usually String's) to integers (and corresponding Label objects) and back.
LabelDistributionEvaluator	Prints predicted and true label distribution.
LabeledLDA	LabeledLDA
LabeledSpan	Created: Oct 12, 2004
LabeledSpans	Created: Oct 31, 2004
Labeler
Labeling	A distribution over possible labels for an instance.
Labelings	A collection of labelings, either for a multi-label problem (all labels are part of the same label dictionary), or a factorized labeling, (each label is part of a different dictionary).
Labels	Usually some distribution over possible labels for an instance.
LabelSequence
LabelsSequence	A simple `Sequence` implementation where all of the elements must be Labels.
LabelVector
LatticeViewer	Created: Oct 31, 2004
LDA	Deprecated. Use ParallelTopicModel instead.
LDAHyper	Deprecated. Use ParallelTopicModel instead, which uses substantially faster data structures even for non-parallel operation.
LDAStream
LeastSquares
LengthBins	A feature approximating string length.
Lexer
LexiconMembership
LimitedMemoryBFGS
LinearRegression
LinearRegressionTrainer
LineGroupIterator	Iterate over groups of lines of text, separated by lines that match a regular expression.
LineGroupString2TokenSequence
LineIterator
LineOptimizer	Optimize, constrained to move parameters along the direction of a specified line.
LineOptimizer.ByGradient
ListMember	Checks membership in a lexicon in a text file.
LogNumber
LongRegexMatches	Matches a regular expression which spans several tokens.
MakeAmpersandXMLFriendly	Converts & to &amp; in the tokens of a token sequence.
MalletLogger
MalletProgressMessageLogger	Created by IntelliJ IDEA.
ManhattenDistance
MarginalProbEstimator	An implementation of topic model marginal probability estimators presented in Wallach et al., "Evaluation Methods for Topic Models", ICML (2009)
Maths
Matrix
Matrixn	Implementation of Matrix that allows arbitrary number of dimensions.
MatrixOps	A class of static utility functions for manipulating arrays of double.
MaxEnt	Maximum Entropy (AKA Multivariate Logistic Regression) classifier.
MaxEntConfidenceEstimator	Estimates the confidence of a `Segment` extracted by a `Transducer` using a `MaxEnt` classifier to classify segments as "correct" or "incorrect." xxx needs some interface work
MaxEntFLGEConstraints	Abstract expectation constraint for use with Generalized Expectation (GE).
MaxEntFLPRConstraints	Abstract expectation constraint for use with Posterior Regularization (PR).
MaxEntGEConstraint	Interface for expectation constraints for use with Generalized Expectation (GE).
MaxEntGERangeTrainer	Training of MaxEnt models with labeled features using Generalized Expectation Criteria.
MaxEntGETrainer	Training of MaxEnt models with labeled features using Generalized Expectation Criteria.
MaxEntKLFLGEConstraints	Expectation constraint for use with GE.
MaxEntL1Trainer
MaxEntL2FLGEConstraints	Expectation constraint for use with GE.
MaxEntL2FLPRConstraints	Expectation constraint for use with Posterior Regularization (PR).
MaxEntOptimizableByGE
MaxEntOptimizableByLabelDistribution
MaxEntOptimizableByLabelLikelihood
MaxEntPRConstraint	Interface for expectation constraints for use with Posterior Regularization (PR).
MaxEntPRTrainer	Penalty (soft) version of Posterior Regularization (PR) for training MaxEnt.
MaxEntRangeL2FLGEConstraints	Expectation constraint for use with GE.
MaxEntSequenceConfidenceEstimator	Estimates the confidence of a `Sequence` extracted by a `Transducer` using a `MaxEnt` classifier to classify Sequences as "correct" or "incorrect." xxx needs some interface work.
MaxEntShell	Simple wrapper for training a MALLET maxent classifier.
MaxEntTrainer	The trainer for a Maximum Entropy classifier.
MaxLattice	The interface to classes implementing the Viterbi algorithm, finding the best sequence of states for a given input sequence.
MaxLatticeDefault	Default, full dynamic programming version of the Viterbi "Max-(Product)-Lattice" algorithm.
MaxLatticeDefault.Factory
MaxLatticeFactory
MCMaxEnt	Maximum Entropy classifier.
MCMaxEntTrainer	The trainer for a Maximum Entropy classifier.
MedoidEvaluator	Uses a `Classifier` over pairs of `Instances` to score `Neighbor`.
MedoidEvaluator.Average
MedoidEvaluator.CombiningStrategy	Specifies how to combine a set of pairwise scores into a cluster-wise score.
MedoidEvaluator.Maximum
MedoidEvaluator.Minimum
MEMM	A Maximum Entropy Markov Model.
MEMM.State
MEMM.TransitionIterator
MEMMTrainer	Trains and evaluates a `MEMM`.
MergeCallable	This task copies topic-word counts from a global array to a local thread-specific array.
Metric
MinHeap	Created by IntelliJ IDEA.
Minkowski
MinSegmentConfidenceEstimator	Estimates the confidence of an entire sequence by the least confidence segment.
MostFrequentClassAssignmentTrainer	A Classifier Trainer to be used with MostFrequentClassifier.
MostFrequentClassifier	A Classifier that will return the most frequent class label based on a training set.
MUCEvaluator	Evaluate a Clustering using the MUC evaluation metric.
MultFileToSequences
MultiInstanceList	An implementation of InstanceList that logically combines multiple instance lists so that they appear as one list without copying the original lists.
Multinomial	A probability distribution over a set of features represented as a `FeatureVector`.
Multinomial.Estimator	A hierarchy of classes used to produce estimates of probabilities, in the form of a Multinomial, from counts associated with the elements of an Alphabet.
Multinomial.LaplaceEstimator	An MEstimator with m set to 1.
Multinomial.Logged	A Multinomial in which the values associated with each feature index fi is Math.log(probability[fi]) instead of probability[fi].
Multinomial.MAPEstimator	Unimplemented, but the MEstimators are.
Multinomial.MEstimator	An Estimator in which probability estimates in a Multinomial are generated by adding a constant m (specified at construction time) to each count before dividing by the total of the m-biased counts.
Multinomial.MLEstimator	An MEstimator with m set to 0.
MultinomialHMM	Latent Dirichlet Allocation.
MultiSegmentationEvaluator	Evaluates a transducer model, computes the precision, recall and F1 scores; considers segments that span across multiple tokens.
MVNormal	Tools for working with multivariate normal distributions
NaiveBayes	A classifier that classifies instances according to the NaiveBayes method.
NaiveBayesEMTrainer
NaiveBayesTrainer	Class used to generate a NaiveBayes classifier from a set of training data.
NaiveBayesTrainer.Factory
NBestViterbiConfidenceEstimator	Estimates the confidence of an entire sequence by the probability that one of the Viterbi paths rank 2->N is correct.
Neighbor	A Clustering and a modified version of that Clustering.
NeighborEvaluator	Scores the value of changing the current `Clustering` to the modified `Clustering` specified in a `Neighbor` object.
NeighborIterator	Sample Instances with data objects equal to `Neighbor`s.
NEPipes
NGramPreprocessor	This pipe changes text to lowercase, removes common XML entities (quot, apos, lt, gt), and replaces all punctuation except the - character with whitespace.
NodeClusterSampleIterator	Samples merges of a singleton cluster with another (possibly non-singleton) cluster.
NonNegativeMatrixFactorization
Noop	A pipe that does nothing to the instance fields but which has side effects on the dictionary.
NoopTransducerTrainer	A TransducerTrainer that does no training, but simply acts as a container for a Transducer; for use in situations that require a TransducerTrainer, such as the TransducerEvaluator methods.
NormalizedDotProductMetric	Computes 1 - [(x·y) / sqrt((x·x) * (y·y))], aka 1 - cosine similarity.
NPTopicModel	A non-parametric topic model that uses the "minimal path" assumption to reduce bookkeeping.
NullLabel	Object that carries a LabelAlphabet.
OffsetConjunctions
OffsetFeatureConjunction
OffsetPropertyConjunctions
OneLabelGEConstraints	A set of constraints on distributions over single labels conditioned on the presence of input features.
OneLabelKLGEConstraints	A set of constraints on distributions over consecutive labels conditioned on input features.
OneLabelL2GEConstraints	A set of constraints on distributions over consecutive labels conditioned on input features.
OneLabelL2IndPRConstraints	A set of constraints on individual input feature label pairs.
OneLabelL2PRConstraints	A set of constraints on distributions over single labels conditioned on the presence of input features.
OneLabelL2RangeGEConstraints	A set of constraints on individual input feature label pairs.
Optimizable
Optimizable.ByBatchGradient
Optimizable.ByCombiningBatchGradient
Optimizable.ByGISUpdate
Optimizable.ByGradient
Optimizable.ByGradientValue
Optimizable.ByHessian
Optimizable.ByValue
Optimizable.ByVotedPerceptron
OptimizableCollection
OptimizationException	General exception thrown by optimization algorithms when there is an optimization-specific problem.
Optimizer
Optimizer.ByBatches	Deprecated.
OptimizerEvaluator	Callback interface that allows optimizer clients to perform some operation after every iteration.
OptimizerEvaluator.ByBatchGradient
OptimizerEvaluator.ByGradient
OrthantWiseLimitedMemoryBFGS	Implementation of orthant-wise limited memory quasi Newton method for optimizing convex L1-regularized objectives.
PagedInstanceList	An InstanceList which avoids OutOfMemoryErrors by saving Instances to disk when there is not enough memory to create a new Instance.
PairF1Evaluator	Evaluates two clusterings using pairwise comparisons.
PairSampleIterator	Sample pairs of Instances.
PairwiseEvaluator	Uses a `Classifier` over pairs of `Instances` to score `Neighbor`.
PairwiseEvaluator.Average
PairwiseEvaluator.CombiningStrategy	Specifies how to combine a set of pairwise scores into a cluster-wise score.
PairwiseEvaluator.Maximum
PairwiseEvaluator.Minimum
PairwiseMatrix	2-D upper-triangular matrix.
PairwiseScorer	For each pair of Instances, if the pair is predicted to be in the same cluster, increment the total by the evaluator's score for merging the two.
PAM4L	Four Level Pachinko Allocation with MLE learning, based on Andrew's Latent Dirichlet Allocation.
ParallelTopicModel	Simple parallel threaded implementation of LDA, following Newman, Asuncion, Smyth and Welling, Distributed Algorithms for Topic Models JMLR (2009), with SparseLDA sampling scheme and data structure from Yao, Mimno and McCallum, Efficient Methods for Topic Model Inference on Streaming Document Collections, KDD (2009).
ParenGroupIterator	Iterator that takes a Reader, breaks up the input into top-level parenthesized expressions.
PartiallyRankedFeatureVector
PartiallyRankedFeatureVector.Factory
PartiallyRankedFeatureVector.PerLabelFactory
PatternMatchIterator	Iterates over matching regular expressions.
PerClassAccuracyEvaluator	Determines the precision, recall and F1 on a per-class basis.
PerDocumentF1Evaluator	Created: Oct 8, 2004
PerFieldF1Evaluator	Created: Oct 8, 2004
PerLabelFeatureCounts
PerLabelFeatureCounts.Factory
PerLabelInfoGain
PerLabelInfoGain.Factory
Pipe	The abstract superclass of all Pipes, which transform one data type to another.
PipedInstanceWithConfidence	Helper class to store confidence of an Instance.
PipeException
PipeExtendedIterator	Deprecated.
PipeInputIterator	Deprecated.
PipeUtils	Created: Aug 28, 2005
PlainLogFormatter
PolylingualTopicModel	Latent Dirichlet Allocation for loosely parallel corpora in arbitrary languages
PRAuxClassifier	Auxiliary model (q) for E-step/I-projection in PR training.
PRAuxClassifierOptimizable	Optimizable for training auxiliary model (q) for E-step/I-projection in PR training.
PRAuxiliaryModel	Auxiliary model (q) for E-step/I-projection in Posterior Regularization (PR).
PRConstraint	Interface for PR constraint that considers either one or two states.
PrintInput	Print the data field of each instance.
PrintInputAndTarget	Print the data and target fields of each instance.
PrintTokenSequenceFeatures	Print properties of the token sequence in the data field and the corresponding value of any token in a token sequence or feature in a feature sequence in the target field.
PrintUtilities	A simple utility class that lets you very simply print an arbitrary component.
PriorityQueue	Created by IntelliJ IDEA.
ProgressMessageLogFormatter	Format ProgressMessages destined for screen.
ProgressMessageLogRecord	A log message that is to be written in place (no newline) if the message is headed for the user's terminal.
PropertyHolder	Author: saunders Created Nov 15, 2005 Copyright (C) Univ.
PropertyList
PunctuationIgnoringComparator	Created: Nov 23, 2004
QBCSequenceConfidenceEstimator	Estimates the confidence of an entire sequence by the "disagreement" among a committee of CRFs.
QR
QueueElement	Created by IntelliJ IDEA.
RandomAssignmentTrainer	A Classifier Trainer to be used with RandomClassifier.
RandomClassifier	A Classifier that will return a randomly selected class label.
RandomConfidenceEstimator	Randomly assigns values between 0-1 to the confidence of a `Segment`.
RandomEvaluator	Randomly scores `Neighbor`s.
RandomFeatureVectorIterator
Randoms
RandomSequenceConfidenceEstimator	Estimates the confidence of an entire sequence randomly.
RandomTokenSequenceIterator
RankedFeatureVector
RankedFeatureVector.Factory
RankedFeatureVector.PerLabelFactory
RankingNeighborEvaluator	Uses a `Classifier` that scores an array of `Neighbor`s.
RankMaxEnt	Rank Maximum Entropy classifier.
RankMaxEntTrainer	The trainer for a `RankMaxEnt` classifier.
Record
Record	Created: Oct 12, 2004
RegexFieldCleaner	A field cleaner that removes all occurrences of a given regex.
RegexFileFilter
RegexMatches
Regression
RegressionImporter	Load data suitable for linear and Poisson regression
Replacement
Replacer	This class replaces ngrams as specified in the configuration files.
ROCData	Tracks ROC data for instances in `Trial` results.
RTopicModel	A wrapper for a topic model to be used from the R statistical package through rJava.
RunCRFPipe
SaveDataInSource	Set the source field of each instance to its data field.
SearchNode	Created by IntelliJ IDEA.
SearchState	Created by IntelliJ IDEA.
SearchState.NextStateIterator	Iterator over the states with transitions from a given state.
Segment	Represents a labelled chunk of a `Sequence` segmented by a `Transducer`, usually corresponding to some object extracted from an input `Sequence`.
SegmentationEvaluator
SegmentIterator	Iterates over `Segment`s extracted by a `Transducer` for some `InstanceList`.
SegmentProductConfidenceEstimator	Estimates the confidence of an entire sequence by combining the output of a segment confidence estimator for each segment.
SelectiveFileLineIterator	Very similar to the SimpleFileLineIterator, but skips lines that match a regular expression.
SelectiveSGML2TokenSequence	Similar to `SGML2TokenSequence`, except that only the tags listed in `allowedTags` are converted to `Label`s.
SelfTransitionGEConstraint	GE Constraint on the probability of self-transitions in the FST.
Sequence<E>
SequenceConfidenceInstance	Stores a `Sequence` and a PropertyList, used when extracting features from a Sequence in a pipe for confidence prediction
SequencePair<I,O>
SequencePairAlignment<I,O>
SequencePrintingPipe	Created: Jul 6, 2005
Sequences	Utility methods for cc.mallet.types.Sequence and similar classes.
SerialPipes	Convert an instance through a sequence of pipes.
SGML2TokenSequence	Converts a string containing simple SGML tags into a data TokenSequence of words, paired with a target TokenSequence containing the SGML tags in effect for each word.
ShallowTransducerTrainer	Deprecated. Use `NoopTransducerTrainer` instead
SimpleFileLineIterator
SimpleLDA	A simple implementation of Latent Dirichlet Allocation using Gibbs sampling.
SimpleTagger	This class's main method trains, tests, or runs a generic CRF-based sequence tagger.
SimpleTagger.SimpleTaggerSentence2FeatureVectorSequence	Converts an external encoding of a sequence of elements with binary features to a `FeatureVectorSequence`.
SimpleTaggerSentence2StringTokenization	This extends `SimpleTaggerSentence2TokenSequence` to use `StringTokenization`s for use with the extract package.
SimpleTaggerSentence2TokenSequence	Converts an external encoding of a sequence of elements with binary features to a `TokenSequence`.
SimpleTaggerWithConstraints	Version of SimpleTagger that trains CRFs with expectation constraints rather than labeled data.
SimpleTokenizer	A simple unicode tokenizer that accepts sequences of letters as tokens.
SingleInstanceIterator
SourceLocation2TokenSequence	Read from File or BufferedReader in the data field and produce a TokenSequence.
Span	A sub-section of a document, either linear or two-dimensional.
SparseMatrixn	Implementation of Matrix that allows an arbitrary number of dimensions.
SparseVector	A vector that allocates memory only for non-zero values.
StateLabelMap	Maps states in the lattice to labels.
StateToInstances	Sometimes you have a topic sampling state, but not the original instance list file.
StatFunctions
StochasticMetaAscent
StringAddNewLineDelimiter	Pipe that can add special text between lines to explicitly represent line breaks.
StringArrayIterator
StringEditFeatureVectorSequence
StringEditVector
StringIterator	Java implementation of Jonathan Wood's "Text Parsing Helper Class".
StringKernel	Computes a similarity metric between two strings, based on counts of common subsequences of characters.
StringList2FeatureSequence	Convert a list of strings into a feature sequence
Strings	Static utility methods for Strings
StringSpan	A sub-section of a linear string.
StringTokenization
SumLattice	Interface to perform forward-backward during training of a transducer.
SumLatticeBeam
SumLatticeBeam.Factory
SumLatticeConstrained
SumLatticeDefault	Default, full dynamic programming implementation of the Forward-Backward "Sum-(Product)-Lattice" algorithm
SumLatticeDefault.Factory
SumLatticeDefaultCachedDot	Default, full dynamic programming implementation of the Forward-Backward "Sum-(Product)-Lattice" algorithm
SumLatticeFactory	Provides factory methods to create inference engine for training a transducer.
SumLatticeKL	Lattice for M-step/M-projection in PR.
SumLatticePR	Lattice for E-step/I-projection in PR.
SumLatticeScaling
SumLatticeScaling.Factory
SVD
SvmLight2Classify	Command line tool for classifying a sequence of instances directly from text input, without creating an instance list.
SvmLight2FeatureVectorAndLabel	This Pipe converts a line in SVMLight format to a Mallet instance with FeatureVector data and Label target.
SvmLight2Vectors	Command line import tool for loading a sequence of instances from an SVMLight feature-value pair file, with one instance per line of the input file.
Target2BIOFormat	Creates a `LabelSequence` out of a `TokenSequence` that is the target of an `Instance`.
Target2Double	Convert object in the target field into a floating-point numeric type
Target2FeatureSequence	Convert a token sequence in the target field into a feature sequence in the target field.
Target2Integer	Convert object in the target field into an integer numeric type
Target2Label	Convert object in the target field into a label in the target field.
Target2LabelSequence	Convert a token sequence in the target field into a label sequence in the target field.
TargetRememberLastLabel	For each position in the target, remember the last non-background label.
TargetStringToFeatures
TestAlphabet	Created: Nov 24, 2004
TestAStar	Created by IntelliJ IDEA.
TestAugmentableFeatureVector	Created: Dec 30, 2004
TestBiNormalSeparation
TestCharSequenceNoDiacritics
TestCharSequenceReplaceHtmlEntities
TestClassifiers
TestClusteringEvaluators	Examples drawn from Luo, "On Coreference Resolution Performance Metrics", HLT 2005.
TestCRF	Tests for CRF training.
TestCRF.TestCRF2String
TestCRF.TestCRFTokenSequenceRemoveSpaces
TestDocumentExtraction	Created: Oct 12, 2004
TestFeatureSequence
TestFeatureTransducer
TestFeatureVector
TestHashedSparseVector
TestIndexedSparseVector
TestInstanceList
TestInstanceListWeights
TestInstancePipe
TestInstancePipe.Array2ArrayIterator
TestIterators
TestIterators	Unit Test for PipeInputIterators Created: Thu Feb 26 14:27:15 2004
TestLabelAlphabet	Created: Nov 24, 2004
TestLabelsSequence	Created: Sep 21, 2004
TestLabelVector
TestMaths	Created: Oct 31, 2004
TestMatrix
TestMatrixn	Created: Aug 30, 2004
TestMatrixOps
TestMaxEntTrainer
TestMEMM	Tests for MEMM training.
TestMEMM.TestMEMMTokenSequenceRemoveSpaces
TestMultinomial
TestNaiveBayes
TestOffsetConjunctions
TestOffsetFeatureConjunctions
TestOptimizable	Contains static methods for testing subclasses of Maximizable and Maximizable.ByGradient.
TestOptimizer	Unit Test for class TestMaximizer.java Created: Mon Apr 26 19:54:25 2004
TestPagedInstanceList	Created: Apr 19, 2005
TestPatternMatchIterator
TestPerDocumentF1Evaluator	Created: Nov 18, 2004
TestPipeUtils	Created: Aug 28, 2005
TestPriorityQueue	Created by IntelliJ IDEA.
TestPropertyList
TestQR
TestRandom	Created: Jan 19, 2005
TestRankedFeatureVector
TestSequencePrintingPipe	Created: Jul 8, 2005
TestSerializable	Static utility for testing serializable classes in MALLET.
TestSGML2TokenSequence
TestSGML2TokenSequence.Array2ArrayIterator
TestSpacePipe	Unit Test for class TestSpacePipe.java Created: Thu Feb 26 14:56:55 2004
TestSparseMatrixn	Created: Aug 30, 2004
TestSparseVector
TestStaticParameters
TestStaticParameters.Factory
TestStringIterator
TestStrings	Created: Jan 19, 2005
TestSumNegLogProb2
TestToken
TestTokenSequence2PorterStems
Text
Text2Classify	Command line tool for classifying a sequence of instances directly from text input, without creating an instance list.
Text2Clusterings
Text2Vectors	Convert document files into vectors (a persistent instance list).
ThreadedOptimizable	An adaptor for optimizables based on batch values/gradients.
Timing	A class for timing things.
Token	A representation of a piece of text, usually a single word, to which we can attach properties.
Token2FeatureVector	Convert the property list on a token into a feature vector.
TokenAccuracyEvaluator	Evaluates a transducer model based on predictions of individual tokens.
TokenFirstPosition
Tokenization
TokenizationFilter	Created: Nov 12, 2004
TokenSequence	A sequence of `Token`s, each a piece of text (usually a single word) to which we can attach properties.
TokenSequence2FeatureSequence	Convert the token sequence in the data field of each instance to a feature sequence.
TokenSequence2FeatureSequenceWithBigrams	Convert the token sequence in the data field of each instance to a feature sequence that preserves bigram information.
TokenSequence2FeatureVectorSequence	Convert the token sequence in the data field of each instance to a feature vector sequence.
TokenSequence2PorterStems
TokenSequence2Tokenization	Heuristically converts a simple token sequence into a Tokenization that can be used with all the extract package goodies.
TokenSequenceDocHeader
TokenSequenceLowercase	Convert the text in each token in the token sequence in the data field to lower case.
TokenSequenceMatchDataAndTarget	Run a regular expression over the text of each token; replace the text with the substring matching one regex group; create a target TokenSequence from the text matching another regex group.
TokenSequenceNGrams	Convert the token sequence in the data field to a token sequence of ngrams.
TokenSequenceParseFeatureString	Convert the string in each field `Token.text` to a list of Strings (space delimited).
TokenSequenceRemoveNonAlpha	Remove tokens that contain non-alphabetic characters.
TokenSequenceRemoveStopPatterns	Remove tokens from the token sequence in the data field whose text matches any of a set of regular expressions.
TokenSequenceRemoveStopwords	Remove tokens from the token sequence in the data field whose text is in the stopword list.
TokenText
TokenTextCharNGrams
TokenTextCharPrefix
TokenTextCharSuffix
TokenTextNGrams
TopicalNGrams	Like Latent Dirichlet Allocation, but with integrated phrase discovery.
TopicAssignment	This class combines a sequence of observed features with a sequence of hidden "labels".
TopicInferencer
TopicModel
TopicModelDiagnostics
TopicReports
TopicTrainer	Create a simple LDA topic model, with some reporting options.
TrainCRF
TrainHMM
Transducer	A base class for all sequence models, analogous to `classify.Classifier`.
Transducer.Incrementor	Methods to be called by inference methods to indicate partial counts of sufficient statistics.
Transducer.State	An abstract class used to represent the states of the transducer.
Transducer.TransitionIterator	An abstract class to iterate over the states of the transducer.
TransducerConfidenceEstimator	Abstract class that estimates the confidence of a `Segment` extracted by a `Transducer`.
TransducerCorrector	Interface for transducerCorrectors, which correct a subset of the `Segment`s produced by a `Transducer`.
TransducerEvaluator	An abstract class to evaluate a transducer model.
TransducerExtractionConfidenceEstimator	Estimates the confidence in the labeling of a LabeledSpan using a TransducerConfidenceEstimator.
TransducerSequenceConfidenceEstimator	Abstract class that estimates the confidence of a `Sequence` extracted by a `Transducer`. Note that this is different from `TransducerConfidenceEstimator`, which estimates the confidence for a single `Segment`.
TransducerTrainer	An abstract class to train and evaluate a transducer model.
TransducerTrainer.ByIncrements
TransducerTrainer.ByInstanceIncrements
TransducerTrainer.ByOptimization
Trial	Stores the results of classifying a collection of Instances, and provides many methods for evaluating the results.
TrieLexiconMembership
TUI
TUI
TwoLabelGEConstraints	A set of constraints on distributions over pairs of consecutive labels conditioned on the presence of input features.
TwoLabelKLGEConstraints	A set of constraints on distributions over consecutive labels conditioned on input features.
TwoLabelL2GEConstraints	A set of constraints on distributions over consecutive labels conditioned on input features.
Univariate
UnlabeledFileIterator	An iterator that generates instances from an initial directory or set of directories.
UriUtils
ValueString2FeatureVector
Vector	Deprecated.
Vectors2Classify	Classify documents, run trials, print statistics from a vector file.
Vectors2FeatureConstraints	Create "feature constraints" from data for use in GE training.
Vectors2Info	Diagnostic facilities for a vector file.
Vectors2Topics	Perform topic analysis in the style of LDA and its variants.
Vectors2Vectors	A command-line tool for manipulating InstanceLists.
VectorStats
ViterbiConfidenceEstimator	Estimates the confidence of an entire sequence by the probability of the Viterbi path normalized by the probability of the entire lattice.
ViterbiRatioConfidenceEstimator	Estimates the confidence of an entire sequence by the ratio of the probabilities of the first and second best Viterbi paths.
ViterbiWriter	Prints the input instances along with the features and the true and predicted labels to a file.
WeightedTopicModel
Winnow	Classification methods of the Winnow2 algorithm.
WinnowTrainer	An implementation of the training methods of a Winnow2 on-line classifier.
WordEmbeddingCallable
WordEmbeddings
WordTransformation
WordVectors
WorkerCallable	A parallel topic model callable task.
WorkerRunnable	A parallel topic model runnable task.