All Classes
Class |
Description |
AbstractTopicReports |
|
AccuracyCoverage |
Methods for calculating and displaying accuracy vs. coverage.
|
AccuracyCoverageEvaluator |
Constructs an accuracy-coverage graph, using confidence values to sort Fields.
|
AccuracyEvaluator |
Accuracy of a clustering is (truePositive + trueNegative) / (numberPairwiseComparisons)
|
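As a minimal sketch of the pairwise formula above (not MALLET's implementation; the class name `PairwiseAccuracySketch` is hypothetical), each clustering is given as an array of cluster ids. A pair counts toward the numerator when both clusterings agree, either by putting the two instances together (true positive) or by keeping them apart (true negative).

```java
/** Hypothetical sketch of pairwise clustering accuracy; not MALLET's API. */
final class PairwiseAccuracySketch {
    /**
     * truth[i] and predicted[i] are cluster ids for instance i.
     * Returns (truePositive + trueNegative) / numberPairwiseComparisons.
     */
    static double accuracy(int[] truth, int[] predicted) {
        if (truth.length != predicted.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        int n = truth.length;
        long agree = 0; // TP + TN
        long pairs = 0; // n * (n - 1) / 2
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                boolean sameTrue = truth[i] == truth[j];
                boolean samePred = predicted[i] == predicted[j];
                if (sameTrue == samePred) {
                    agree++; // together in both (TP) or apart in both (TN)
                }
                pairs++;
            }
        }
        // Defined as 1.0 when there are no pairs (an assumption of this sketch).
        return pairs == 0 ? 1.0 : (double) agree / pairs;
    }
}
```

For truth `{0, 0, 1, 1}` and prediction `{0, 0, 0, 1}`, one pair is a true positive and two are true negatives out of six pairs, giving 0.5.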
AdaBoost |
AdaBoost
Robert E.
|
AdaBoostM2 |
AdaBoostM2
|
AdaBoostM2Trainer |
This version of AdaBoost can handle multi-class problems.
|
AdaBoostTrainer |
This version of AdaBoost should be used only for binary classification.
|
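For orientation, one round of textbook binary AdaBoost reweighting is sketched below. This is the standard formulation, not necessarily what AdaBoostTrainer does internally, and `AdaBoostRoundSketch` is a hypothetical name. The weak learner's vote is alpha = 0.5 * ln((1 - err) / err). Misclassified instances are scaled up by exp(alpha), correctly classified ones are scaled down by exp(-alpha), and the weights are then renormalized.

```java
/** Hypothetical sketch of one round of binary AdaBoost reweighting; not MALLET's API. */
final class AdaBoostRoundSketch {
    /**
     * Updates weights in place given which instances the weak learner got right,
     * and returns the weak learner's vote weight alpha = 0.5 * ln((1 - err) / err).
     * Assumes the weights sum to 1 and the weighted error lies in (0, 0.5).
     */
    static double reweight(double[] weights, boolean[] correct) {
        double err = 0.0;
        for (int i = 0; i < weights.length; i++) {
            if (!correct[i]) err += weights[i];
        }
        if (err <= 0.0 || err >= 0.5) {
            throw new IllegalArgumentException("weighted error must be in (0, 0.5): " + err);
        }
        double alpha = 0.5 * Math.log((1.0 - err) / err);
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            // Misclassified instances gain weight; correctly classified ones lose it.
            weights[i] *= Math.exp(correct[i] ? -alpha : alpha);
            total += weights[i];
        }
        for (int i = 0; i < weights.length; i++) {
            weights[i] /= total; // renormalize to a distribution
        }
        return alpha;
    }
}
```

With this update, the misclassified instances always end up holding exactly half of the total weight. That is what pushes the next weak learner toward the examples the current one got wrong.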
Addable |
|
AddClassifierTokenPredictions |
This pipe uses a Classifier to label each token (i.e., using a 0th-order Markov assumption),
then adds the predictions as features to each token.
|
AddClassifierTokenPredictions.TokenClassifiers |
This inner class represents the trained token classifiers.
|
AgglomerativeNeighbor |
A Neighbor created by merging two clusters of the original
Clustering.
|
AGIS |
|
AllPairsIterator |
Iterate over all pairs of Instances.
|
Alphabet |
A mapping between integers and objects where the mapping in each
direction is efficient.
|
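A two-way mapping of this kind is typically built from a hash map for object-to-index lookups and an array list for index-to-object lookups, so that both directions take constant (expected) time. The sketch below shows that idea under that assumption. The class and method names are hypothetical and are not Alphabet's actual API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a two-way object/index map; not MALLET's Alphabet API. */
final class IndexMapSketch<T> {
    private final Map<T, Integer> toIndex = new HashMap<>(); // object -> index, O(1) expected
    private final List<T> toObject = new ArrayList<>();      // index -> object, O(1)

    /** Returns the index of obj, assigning the next free index if obj is new. */
    int indexOf(T obj) {
        Integer idx = toIndex.get(obj);
        if (idx == null) {
            idx = toObject.size();
            toIndex.put(obj, idx);
            toObject.add(obj);
        }
        return idx;
    }

    /** Returns the object stored at the given index. */
    T objectAt(int index) {
        return toObject.get(index);
    }

    int size() {
        return toObject.size();
    }
}
```

Indices are assigned densely in insertion order. This is what lets the integer side be used directly as a feature or label position in a vector.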
AlphabetCarrying |
An interface for objects that contain one or more Alphabets.
|
AlphabetFactory |
|
Array2FeatureVector |
Converts a Java array of numerical types to a FeatureVector, where each Alphabet entry is
the data array index wrapped in an Integer object.
|
ArrayDataAndTargetIterator |
|
ArrayIterator |
|
ArrayListSequence<E> |
|
ArrayListUtils |
|
ArraySequence<E> |
|
ArrayUtils |
Static utility methods for arrays
(like java.util.Arrays, but more useful).
|
AStar |
|
AStarNode |
|
AStarState |
|
AugmentableFeatureVector |
|
AugmentableFeatureVectorAddConjunctions |
Add specified conjunctions to each instance.
|
AugmentableFeatureVectorLogScale |
Given an AugmentableFeatureVector, set those values greater than
or equal to 1 to log(value)+1.
|
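The rule above can be shown on a plain array. This sketch assumes "log" means the natural logarithm, which the description does not state. The class name is hypothetical, and the code works on a `double[]` rather than an AugmentableFeatureVector.

```java
/** Hypothetical sketch of the log-scaling rule on a plain array; not MALLET's API. */
final class LogScaleSketch {
    /** Replaces every value v >= 1 with log(v) + 1 (natural log, assumed); smaller values are kept. */
    static void logScale(double[] values) {
        for (int i = 0; i < values.length; i++) {
            if (values[i] >= 1.0) {
                values[i] = Math.log(values[i]) + 1.0;
            }
        }
    }
}
```

Because log(1) + 1 = 1, the mapping is continuous at the threshold. Values near 1 barely change, while large counts are strongly compressed.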
BackTrackLineSearch |
|
BaggingClassifier |
|
BaggingTrainer |
Bagging Trainer.
|
BalancedWinnow |
Classification methods of the BalancedWinnow algorithm.
|
BalancedWinnowTrainer |
An implementation of the training methods of a BalancedWinnow
on-line classifier.
|
BCubedEvaluator |
Evaluate a Clustering using the B-Cubed evaluation metric.
|
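As a reference point, the standard B-Cubed metric computes a precision and a recall for every instance and then averages them. For each instance, precision is the fraction of its predicted cluster that shares its true cluster, and recall is the fraction of its true cluster that shares its predicted cluster. The sketch below follows that common unweighted definition. It is not taken from BCubedEvaluator, and the class name is hypothetical.

```java
/** Hypothetical sketch of unweighted B-Cubed precision and recall; not MALLET's API. */
final class BCubedSketch {
    /** Returns {precision, recall}, each averaged over all instances (assumes at least one). */
    static double[] score(int[] truth, int[] predicted) {
        int n = truth.length;
        double precision = 0.0, recall = 0.0;
        for (int i = 0; i < n; i++) {
            int predSize = 0, trueSize = 0, overlap = 0;
            for (int j = 0; j < n; j++) {
                boolean samePred = predicted[j] == predicted[i];
                boolean sameTrue = truth[j] == truth[i];
                if (samePred) predSize++;
                if (sameTrue) trueSize++;
                if (samePred && sameTrue) overlap++;
            }
            // Each count includes instance i itself, so no denominator is zero.
            precision += (double) overlap / predSize;
            recall += (double) overlap / trueSize;
        }
        return new double[] {precision / n, recall / n};
    }
}
```

For truth `{0, 0, 1, 1}` and prediction `{0, 0, 0, 1}`, this yields a precision of 2/3 and a recall of 0.75.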
BiNormalSeparation |
Bi-Normal Separation is a feature weighting algorithm introduced in:
An Extensive Empirical Study of Feature Selection Metrics for Text Classification,
George Forman, Journal of Machine Learning Research, 3:1289-1305, 2003.
|
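For reference, the metric as defined in Forman's paper (restated here from the paper, not from MALLET's source) scores a feature f by the distance between its true-positive and false-positive rates on the standard normal scale. In the formula, F^{-1} is the inverse cumulative distribution function of the standard normal, tp and fp count the positive and negative documents containing f, and pos and neg are the class totals.

```latex
\mathrm{BNS}(f) = \left| F^{-1}(\mathit{tpr}) - F^{-1}(\mathit{fpr}) \right|,
\qquad \mathit{tpr} = \frac{tp}{pos}, \qquad \mathit{fpr} = \frac{fp}{neg}
```

Since F^{-1} diverges at 0 and 1, implementations usually clip the two rates slightly away from those endpoints before applying it.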
BiNormalSeparation.Factory |
Factory class.
|
BIOTokenizationFilter |
Created: Nov 12, 2004
|
BIOTokenizationFilterWithTokenIndices |
|
Boostable |
This interface is a tag indicating that the classifier attends to the
InstanceList.getInstanceWeight() weights when training.
|
BranchingPipe |
Deprecated. |
BshInterpreter |
|
BulkLoader |
This class reads through a single file, breaking each line
into data and (optional) name and label fields.
|
C45 |
A C4.5 Decision Tree classifier.
|
C45.Node |
|
C45Trainer |
A C4.5 decision tree learner, approximately.
|
CachedDotTransitionIterator |
TransitionIterator that caches dot products.
|
CachedMetric |
Stores a hash for each object being compared for efficient
computation.
|
CacheStaleIndicator |
Indicates when the value/gradient during training becomes stale.
|
Calo2Classify |
Classify documents, run trials, print statistics from a vector file.
|
ChainedInstanceIterator |
Deprecated. |
CharSequence2CharNGrams |
Transform a character sequence into a token sequence of character n-grams.
|
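Character n-gram extraction amounts to sliding a window of width n across the text. The sketch below returns plain strings rather than a MALLET TokenSequence, and the class name is hypothetical. It indexes UTF-16 `char`s, so a supplementary character (a surrogate pair) could be split across n-grams.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of character n-gram extraction; not MALLET's API. */
final class CharNGramSketch {
    /** Returns every contiguous substring of length n, left to right. */
    static List<String> ngrams(CharSequence text, int n) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        List<String> grams = new ArrayList<>();
        for (int start = 0; start + n <= text.length(); start++) {
            grams.add(text.subSequence(start, start + n).toString());
        }
        return grams;
    }
}
```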
CharSequence2TokenSequence |
Pipe that tokenizes a character sequence.
|
CharSequenceArray2TokenSequence |
Transform an array of character sequences into a token sequence.
|
CharSequenceLexer |
|
CharSequenceLowercase |
Replace the data string or string buffer with a lowercased version.
|
CharSequenceNoDiacritics |
A string normalizer which performs the following steps:
Unicode canonical decomposition (Form#NFD),
removal of diacritical marks, and
Unicode canonical composition (Form#NFC).
|
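The three steps map directly onto `java.text.Normalizer` plus a regular expression. This is a sketch, not the class's code, and the class name is hypothetical. It assumes that "diacritical marks" means every character in Unicode general category M, which is broader than the Combining Diacritical Marks block alone.

```java
import java.text.Normalizer;

/** Hypothetical sketch of the NFD -> strip marks -> NFC steps; not MALLET's API. */
final class NoDiacriticsSketch {
    static String strip(CharSequence input) {
        // 1. Canonical decomposition splits e.g. "é" into "e" + a combining acute accent.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // 2. Remove all combining marks (Unicode general category M).
        String unmarked = decomposed.replaceAll("\\p{M}", "");
        // 3. Canonical composition recombines any decomposed sequences that remain.
        return Normalizer.normalize(unmarked, Normalizer.Form.NFC);
    }
}
```

For example, `strip("café naïve")` yields `"cafe naive"`.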
CharSequenceRemoveHTML |
This pipe removes HTML from a CharSequence.
|
CharSequenceRemoveUUEncodedBlocks |
|
CharSequenceReplace |
Given a string, repeatedly look for matches of the regex, and
replace the entire match with the given replacement string.
|
CharSequenceReplaceHtmlEntities |
|
CharSubsequence |
Given a string, return only the portion of the string inside a regex parenthesized group.
|
Classification |
The result of classifying a single instance.
|
Classification2ConfidencePredictingFeatureVector |
Pipes features from the underlying classifier to
the confidence-prediction instance list.
|
Classifier |
Abstract parent of all Classifiers.
|
Classifier2Info |
Diagnostic facilities for a classifier.
|
ClassifierAccuracyEvaluator |
|
ClassifierEnsemble |
Classifier for an ensemble of classifiers, combined with learned weights.
|
ClassifierEnsembleTrainer |
|
ClassifierEvaluator |
|
ClassifierTrainer<C extends Classifier> |
Each ClassifierTrainer trains one Classifier based on various interfaces for consuming training data.
|
ClassifierTrainer.ByActiveLearning<C extends Classifier> |
For active learning, in which this trainer will select certain instances and
request that the Labeler instance label them.
|
ClassifierTrainer.ByIncrements<C extends Classifier> |
For various kinds of online learning by batches, where training instances are presented
and consumed for learning immediately.
|
ClassifierTrainer.ByInstanceIncrements<C extends Classifier> |
For online learning that can operate on one instance at a time.
|
ClassifierTrainer.ByOptimization<C extends Classifier> |
|
ClassifierTrainer.Factory<CT extends ClassifierTrainer<? extends Classifier>> |
Instances of a Factory know how to create new ClassifierTrainers to apply to new Classifiers.
|
ClassifyingNeighborEvaluator |
|
Clusterer |
An abstract class for clustering a set of points.
|
Clustering |
|
ClusteringEvaluator |
Evaluates a predicted Clustering against a true Clustering.
|
ClusteringEvaluators |
|
Clusterings |
|
Clusterings2Clusterer |
|
Clusterings2Clusterer.ClusteringPipe |
|
Clusterings2Clusterings |
|
Clusterings2Info |
|
ClusteringScorer |
Assign a score to a Clustering.
|
ClusterSampleIterator |
Sample clusters of Instances.
|
ClusterUtils |
Utility functions for Clusterings.
|
ColorUtils |
Utilities for dealing with RGB-style colors.
|
CommandOption |
|
CommandOption.Boolean |
|
CommandOption.Double |
|
CommandOption.DoubleArray |
|
CommandOption.File |
|
CommandOption.Integer |
|
CommandOption.IntegerArray |
|
CommandOption.List |
|
CommandOption.ListProviding |
For objects that can provide CommandOption.Lists (which can be merged into other lists).
|
CommandOption.Object |
|
CommandOption.ObjectFromBean |
|
CommandOption.Set |
|
CommandOption.SpacedStrings |
|
CommandOption.String |
|
ConcatenatedInstanceIterator |
|
ConfidenceCorrectorEvaluator |
Calculates the effectiveness of "constrained viterbi" in
propagating corrections in one segment of a sequence to other
segments.
|
ConfidenceEvaluator |
|
ConfidenceEvaluator.EntityConfidence |
A simple class to store a confidence score and whether or not this
labeling is correct.
|
ConfidencePredictingClassifier |
|
ConfidencePredictingClassifierTrainer |
|
ConfidenceTokenizationFilter |
Created: Oct 26, 2005
|
ConfusionMatrix |
Calculates and prints a confusion matrix, accuracy,
and precision for a given classification trial.
|
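The quantities named above can all be read off a single count matrix. In this sketch, rows index the true class and columns the predicted class. Accuracy is the diagonal share of all counts, and the precision of class c is the diagonal entry divided by its column sum. The class name and the row/column convention are assumptions, not ConfusionMatrix's API.

```java
/** Hypothetical sketch of a confusion matrix with accuracy and per-class precision; not MALLET's API. */
final class ConfusionSketch {
    /** counts[t][p] = number of instances with true class t predicted as class p. */
    static int[][] matrix(int[] truth, int[] predicted, int numClasses) {
        int[][] counts = new int[numClasses][numClasses];
        for (int i = 0; i < truth.length; i++) {
            counts[truth[i]][predicted[i]]++;
        }
        return counts;
    }

    /** Fraction of all instances that lie on the diagonal. */
    static double accuracy(int[][] counts) {
        long correct = 0, total = 0;
        for (int t = 0; t < counts.length; t++) {
            for (int p = 0; p < counts.length; p++) {
                total += counts[t][p];
                if (t == p) correct += counts[t][p];
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    /** Precision of class c: correct predictions of c divided by all predictions of c. */
    static double precision(int[][] counts, int c) {
        long predictedC = 0;
        for (int t = 0; t < counts.length; t++) predictedC += counts[t][c];
        return predictedC == 0 ? 0.0 : (double) counts[c][c] / predictedC;
    }
}
```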
ConjugateGradient |
|
ConllNer2003Sentence2TokenSequence |
Reads a data file in CoNLL 2003 format, and makes some simple
transformations.
|
ConllNer2003Sentence2TokenSequence |
|
ConstantMatrix |
|
ConstrainedForwardBackwardConfidenceEstimator |
Estimates the confidence of a Segment extracted by a Transducer by performing a "constrained lattice"
calculation.
|
ConstrainedViterbiTransducerCorrector |
|
ConstraintsOptimizableByPR |
Optimizable for E-step/I-projection in Posterior Regularization (PR).
|
CoordinateDescent |
|
CopyCallable |
This task copies topic-word counts from a global array to a local
thread-specific array.
|
CountMatches |
|
CountMatchesAlignedWithOffsets |
|
CountMatchesMatching |
|
CountsToFeatureSequencePipe |
|
CRF |
Represents a CRF model.
|
CRF.Factors |
A simple, transparent container to hold the parameters or sufficient statistics for the CRF.
|
CRF.State |
|
CRF.TransitionIterator |
|
CRFCacheStaleIndicator |
Indicates when the value/gradient becomes stale based on updates to CRF's
parameters.
|
CRFExtractor |
Created: Oct 12, 2004
|
CRFOptimizableByBatchLabelLikelihood |
Implements label-likelihood gradient computations for batches of data; these can be
easily parallelized.
|
CRFOptimizableByBatchLabelLikelihood.Factory |
|
CRFOptimizableByEntropyRegularization |
A CRF objective function that is the entropy of the CRF's
predictions on unlabeled data.
|
CRFOptimizableByGE |
Optimizable for CRF using Generalized Expectation constraints that
consider either a single label or a pair of labels of a linear chain CRF.
|
CRFOptimizableByGradientValues |
A CRF objective function that is the sum of multiple
objective functions that implement Optimizable.ByGradientValue.
|
CRFOptimizableByKL |
M-step/M-projection for PR.
|
CRFOptimizableByLabelLikelihood |
An objective function for CRFs that is the label likelihood plus a Gaussian or hyperbolic prior on parameters.
|
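For the Gaussian case, a common textbook form of such a penalized objective and its gradient is shown below. This is a hedged restatement, not MALLET's code. Here σ² is the prior variance (a hyperparameter), f_k is feature k summed over sequence positions, and the hyperbolic prior is not shown.

```latex
\mathcal{O}(\theta) = \sum_{i=1}^{N} \log p_\theta\!\left(y^{(i)} \mid x^{(i)}\right)
  - \sum_{k} \frac{\theta_k^2}{2\sigma^2}
\qquad
\frac{\partial \mathcal{O}}{\partial \theta_k}
  = \sum_{i=1}^{N} f_k\!\left(x^{(i)}, y^{(i)}\right)
  - \sum_{i=1}^{N} \mathbb{E}_{p_\theta(y \mid x^{(i)})}\!\left[ f_k\!\left(x^{(i)}, y\right) \right]
  - \frac{\theta_k}{\sigma^2}
```

The gradient is the empirical feature count minus the model's expected count, with the prior pulling each weight toward zero.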
CRFOptimizableByLabelLikelihood.Factory |
|
CRFTrainerByEntropyRegularization |
A CRF trainer that maximizes the log-likelihood plus
a weighted entropy regularization term on unlabeled
data.
|
CRFTrainerByGE |
Trains a CRF using Generalized Expectation constraints that
consider either a single label or a pair of labels of a linear chain CRF.
|
CRFTrainerByL1LabelLikelihood |
CRF trainer that implements L1-regularization.
|
CRFTrainerByLabelLikelihood |
Unlike ClassifierTrainer, TransducerTrainer is not "stateless" between calls
to train.
|
CRFTrainerByLikelihoodAndGE |
|
CRFTrainerByPR |
Posterior regularization trainer.
|
CRFTrainerByStochasticGradient |
Trains CRF by stochastic gradient.
|
CRFTrainerByThreadedLabelLikelihood |
|
CRFTrainerByValueGradients |
A CRF trainer that can combine multiple objective functions, each represented
by an Optimizable.ByGradientValue.
|
CRFWriter |
Saves a trained model to a specified filename.
|
CrossValidationIterator |
An iterator which splits an InstanceList into n-folds and iterates
over the folds for use in n-fold cross-validation.
|
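The fold logic can be sketched on instance indices alone. The indices are split into n contiguous folds whose sizes differ by at most one. Iteration f then tests on fold f and trains on all other folds. This is a hypothetical helper, not CrossValidationIterator's API, and it assumes the data has already been shuffled if its order is not random.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of n-fold index splitting; not MALLET's CrossValidationIterator API. */
final class FoldsSketch {
    /** Splits indices 0..size-1 into numFolds contiguous folds whose sizes differ by at most one. */
    static List<List<Integer>> folds(int size, int numFolds) {
        if (numFolds < 2 || numFolds > size) {
            throw new IllegalArgumentException("need 2 <= numFolds <= size");
        }
        List<List<Integer>> result = new ArrayList<>();
        int next = 0;
        for (int f = 0; f < numFolds; f++) {
            // The first (size % numFolds) folds get one extra index.
            int foldSize = size / numFolds + (f < size % numFolds ? 1 : 0);
            List<Integer> fold = new ArrayList<>();
            for (int k = 0; k < foldSize; k++) fold.add(next++);
            result.add(fold);
        }
        return result;
    }

    /** Training indices for iteration f: every index not in fold f. */
    static List<Integer> trainingIndices(List<List<Integer>> folds, int f) {
        List<Integer> train = new ArrayList<>();
        for (int g = 0; g < folds.size(); g++) {
            if (g != f) train.addAll(folds.get(g));
        }
        return train;
    }
}
```

With 10 instances and 3 folds, the folds have sizes 4, 3 and 3, and every instance is used for testing exactly once.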
Csv2Array |
Converts a string of comma-separated values to an array.
|
Csv2Classify |
Command line tool for classifying a sequence of
instances directly from text input, without
creating an instance list.
|
Csv2FeatureVector |
Converts a string of the form
feature_1:val_1 feature_2:val_2 ...
|
Csv2Vectors |
Command line import tool for loading a sequence of
instances from a single file, with one instance
per line of the input file.
|
CsvIterator |
This iterator, perhaps more properly called a Line Pattern Iterator,
reads through a file and returns one instance per line,
based on a regular expression.
|
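The per-line parsing can be sketched with a single regular expression whose three groups hold the instance name, label and data. The pattern below resembles the default commonly used with MALLET's import tools, but treat it and the group layout as assumptions and check your version's documentation. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of regex-based line parsing into name/label/data; not MALLET's API. */
final class LinePatternSketch {
    // Group 1 = name, group 2 = label, group 3 = data (assumed layout).
    static final Pattern LINE = Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$");

    /** Returns {name, label, data}, or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) return null;
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }
}
```

Because the separators are `[\s,]*`, the same pattern accepts both whitespace-separated lines such as `doc1 sports The game was great` and comma-separated ones such as `doc1,sports,The game`.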
DBBulkLoader |
This class reads through two files (data and metadata),
tokenizing metadata for use as a label vector.
|
DBInstanceIterator |
|
DBInstanceStore |
|
DecisionTree |
Decision Tree classifier.
|
DecisionTree.Node |
|
DecisionTreeTrainer |
A decision tree learner, roughly ID3, but only to a fixed given depth in all branches.
|
DecisionTreeTrainer.Factory |
|
DefaultTokenizationFilter |
Created: Nov 12, 2004
|
DenseMatrix |
|
DenseVector |
|
Directory2FileIterator |
Convert a File object representing a directory into a FileIterator which
iterates over files in the directory matching a pattern and which extracts
a label from each file path to become the target field of the instance.
|
DirectoryFilter |
|
Dirichlet |
Various useful functions related to Dirichlet distributions.
|
Dirichlet.Estimator |
|
Dirichlet.MethodOfMomentsEstimator |
|
DMRCallable |
A parallel Dirichlet-multinomial regression topic model runnable task.
|
DMRInferencer |
|
DMRLoader |
This class loads data into the format for the MALLET
Dirichlet-multinomial regression (DMR).
|
DMROptimizable |
|
DMRTopicModel |
|
DocumentClassifier |
|
DocumentExtraction |
Created: Oct 12, 2004
|
DocumentLengths |
|
DocumentViewer |
Diagnosis class that outputs HTML pages that allow you to view errors on a more
global, per-instance basis.
|
DoubleList |
|
DownsampleLabelWords |
This class implements the method from "Authorless Topic Models"
by Thompson and Mimno, COLING 2018.
|
Element |
|
EmptyInstanceIterator |
|
EnronMessage2TokenSequence |
|
EntropyLattice |
Runs subsequence constrained forward-backward to compute the entropy of label
sequences.
|
EuclideanDistance |
|
EvaluateTopics |
|
ExactMatchComparator |
Created: Nov 23, 2004
|
ExpGain |
|
ExpGain.Factory |
|
Extraction |
The results of doing information extraction.
|
ExtractionConfidenceEstimator |
Estimates the confidence in the labeling of a LabeledSpan.
|
ExtractionEvaluator |
Created: Oct 8, 2004
|
Extractor |
Generic interface for objects that do information extraction.
|
FeatureConjunction |
|
FeatureConjunction.List |
|
FeatureConstraintUtil |
Utility functions for creating feature constraints that can be used with GE training.
|
FeatureCooccurrenceCounter |
|
FeatureCounter |
Efficient, compact, incremental counting of features in an alphabet.
|
FeatureCountPipe |
Pruning low-count features can be a good way to save memory and computation.
|
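FeatureCountPipe and FeatureDocFreqPipe share the same summary. Judging only from their names (an inference the descriptions do not confirm), the first counts total occurrences and the second counts the number of documents that contain each feature. The sketch below computes both kinds of count and prunes features below a threshold. All names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of count-based feature pruning; not MALLET's API. */
final class FeaturePruneSketch {
    /** Counts total occurrences, or documents containing each feature if documentFrequency is true. */
    static Map<String, Integer> count(List<List<String>> docs, boolean documentFrequency) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> doc : docs) {
            // A set collapses repeats within one document, giving document frequency.
            Iterable<String> features = documentFrequency ? new LinkedHashSet<String>(doc) : doc;
            for (String f : features) counts.merge(f, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the features whose count is at least minCount; all others would be pruned. */
    static Set<String> keep(Map<String, Integer> counts, int minCount) {
        Set<String> kept = new LinkedHashSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) kept.add(e.getKey());
        }
        return kept;
    }
}
```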
FeatureCounts |
|
FeatureCounts.Factory |
|
FeatureCountTool |
|
FeatureDocFreqPipe |
Pruning low-count features can be a good way to save memory and computation.
|
FeatureInducer |
|
FeatureSelectingClassifierTrainer |
Adaptor for adding feature selection to a classifier trainer.
|
FeatureSelection |
|
FeatureSelector |
|
FeatureSequence |
An implementation of Sequence that ensures that every
Object in the sequence has the same class.
|
FeatureSequence2AugmentableFeatureVector |
Convert the data field from a feature sequence to an augmentable feature vector.
|
FeatureSequence2FeatureVector |
Convert the data field from a feature sequence to a feature vector.
|
FeatureSequenceConvolution |
|
FeatureSequenceWithBigrams |
A FeatureSequence with a parallel record of bigrams, kept in a separate dictionary
|
FeaturesInWindow |
|
FeaturesOfFirstMention |
|
FeatureTransducer |
|
FeatureValueString2FeatureVector |
|
FeatureVector |
A subset of an Alphabet in which each element of the subset has an associated value.
|
FeatureVectorConjunctions |
Include in the FeatureVector conjunctions of all its features.
|
FeatureVectorSequence |
|
FeatureVectorSequence2FeatureVectors |
Given instances with a FeatureVectorSequence in the data field, break up the sequence into
the individual FeatureVectors, producing one FeatureVector per Instance.
|
FeatureWindow |
Adds all features of tokens in the window to the center token.
|
Field |
Created: Oct 12, 2004
|
FieldCleaner |
Interface for functions that are used to clean up field values after
extraction has been performed.
|
FieldComparator |
Interface for functions that compare extracted values of a field to see
if they match.
|
FileIterator |
An iterator that generates instances from an initial
directory or set of directories.
|
FileListIterator |
An iterator that generates instances for a pipe from a list of filenames.
|
Filename2CharSequence |
Given a filename contained in a string, read in contents of file into a CharSequence.
|
FileUriIterator |
|
FileUtils |
Contains static utilities for manipulating files.
|
FilterEmptyFeatureVectors |
|
FirstOrderClusterExample |
Illustrates use of a supervised clustering method that uses
features over clusters.
|
FixedVocabTokenizer |
A simple Unicode tokenizer that accepts sequences of letters
as tokens.
|
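The letters-only rule can be expressed with the Unicode letter category `\p{L}`, so accented and non-Latin letters stay inside tokens while digits and punctuation act as separators. This sketch covers only the tokenizing rule. Whatever "fixed vocab" restriction FixedVocabTokenizer applies is not shown, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a letters-only Unicode tokenizer; not MALLET's API. */
final class LetterTokenizerSketch {
    // \p{L} matches any Unicode letter, so "wörld" stays one token.
    private static final Pattern LETTERS = Pattern.compile("\\p{L}+");

    static List<String> tokenize(CharSequence text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = LETTERS.matcher(text);
        while (m.find()) tokens.add(m.group());
        return tokens;
    }
}
```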
FSTConstraintUtil |
Expectation constraint utilities for fst package.
|
GainRatio |
List of features along with their thresholds sorted in descending order of
the ratio of (1) information gained by splitting instances on the
feature at its associated threshold value, to (2) the split information.
|
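The ratio described above can be computed for a single candidate split. The sketch uses the standard C4.5-style definitions. (1) The information gain is the parent's class entropy minus the size-weighted entropy of the branches. (2) The split information is the entropy of the branch sizes themselves. It is a sketch of those definitions, not GainRatio's code, and the class name is hypothetical.

```java
/** Hypothetical sketch of the gain ratio for one candidate split; not MALLET's API. */
final class GainRatioSketch {
    /** Entropy in bits of a vector of counts. */
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /** childCounts[b][k] = number of instances of class k that fall into branch b. */
    static double gainRatio(int[][] childCounts) {
        int numClasses = childCounts[0].length;
        int[] parent = new int[numClasses];
        int[] branchSizes = new int[childCounts.length];
        int total = 0;
        for (int b = 0; b < childCounts.length; b++) {
            for (int k = 0; k < numClasses; k++) {
                parent[k] += childCounts[b][k];
                branchSizes[b] += childCounts[b][k];
            }
            total += branchSizes[b];
        }
        // (1) Information gain: parent entropy minus size-weighted branch entropy.
        double gain = entropy(parent);
        for (int b = 0; b < childCounts.length; b++) {
            gain -= (double) branchSizes[b] / total * entropy(childCounts[b]);
        }
        // (2) Split information: entropy of the branch-size distribution itself.
        double splitInfo = entropy(branchSizes);
        return splitInfo == 0.0 ? 0.0 : gain / splitInfo;
    }
}
```

Dividing by the split information penalizes thresholds that fragment the data into many small branches. A split that perfectly separates two equal classes into two equal branches scores 1.0, and one that leaves the class mix unchanged scores 0.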
GammaAverageConfidenceEstimator |
Calculates the confidence in an extracted segment by taking the
average of P(s_i|o) for each state in the segment.
|
GammaProductConfidenceEstimator |
Calculates the confidence in an extracted segment by taking the
product of P(s_i|o) for each state in the segment.
|
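Reading the two estimator descriptions literally, let P(s_i|o) be the posterior marginal probability of the state at position i given the observation sequence o. The two confidence scores for a segment s then amount to the following (a restatement of the descriptions, not taken from the implementation):

```latex
\mathrm{conf}_{\mathrm{avg}}(s) = \frac{1}{|s|} \sum_{i \in s} P(s_i \mid o),
\qquad
\mathrm{conf}_{\mathrm{prod}}(s) = \prod_{i \in s} P(s_i \mid o)
```

The product shrinks as segments get longer, while the average does not. The two scores can therefore rank segments of different lengths differently.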
GEConstraint |
Interface for GE constraint that considers
either one or two states.
|
GELattice |
Runs the dynamic programming algorithm of [Mann and McCallum 08] for
computing the gradient of a Generalized Expectation constraint that
considers a single label of a linear chain CRF.
|
GradientAscent |
|
GradientBracketLineOptimizer |
|
GradientGain |
|
GradientGain.Factory |
|
Graph |
Framework for standard graph.
|
Graph2 |
Methods for a 2-D graph
|
GraphItem |
Holds data for a point on a graph
|
GreedyAgglomerative |
Greedily merges Instances until convergence.
|
GreedyAgglomerativeByDensity |
Greedily merges Instances until convergence.
|
HashedSparseVector |
|
HierarchicalLDA |
|
HierarchicalLDATUI |
|
HierarchicalPAM |
Hierarchical PAM, where each node in the DAG has a distribution over all topics on the
next level and one additional "node-specific" topic.
|
HierarchicalTokenizationFilter |
Tokenization filter that will create nested spans based on a hierarchical labeling of the data.
|
HillClimbingClusterer |
A Clusterer that iteratively improves a predicted Clustering using
a NeighborEvaluator.
|
HMM |
A Hidden Markov Model.
|
HMM.State |
|
HMM.TransitionIterator |
|
HMMTrainerByLikelihood |
|
IDSorter |
This class contains a comparator for use in sorting
integers that have associated floating point values.
|
IndexedSparseVector |
|
InferTopics |
|
InfiniteDistance |
|
InfoGain |
|
InfoGain.Factory |
|
Input2CharSequence |
Pipe that can read from various kinds of text sources
(either URI, File, or Reader) into a CharSequence
|
Instance |
A machine learning "example" to be used in training, testing or
performance of various machine learning algorithms.
|
InstanceAccuracyEvaluator |
Reports the percentage of instances for which the entire predicted sequence was
correct.
|
InstanceList |
A list of machine learning instances, typically used for training
or testing of a machine learning algorithm.
|
InstanceListPrinter |
|
InstanceListTrimFeaturesByCount |
Unimplemented.
|
InstanceListTUI |
|
InstanceWithConfidence |
Helper class to store confidence of an Instance.
|
InvalidOptimizableException |
Exception thrown by optimization algorithms when the problem is most likely
due to the given Maximizable instance.
|
InvertedIndex |
|
IoUtils |
|
IsolatedSegmentTransducerCorrector |
|
JSONTopicReports |
|
KBestClusterer |
Return the K best predicted Clusterings
|
KLGain |
|
KMeans |
KMeans Clusterer
Clusters the points into k clusters by minimizing the total intra-cluster
variance.
|
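The usual way to minimize total intra-cluster variance is Lloyd's iteration. It alternates between assigning each point to its nearest center and moving each center to the mean of its points, stopping when no assignment changes. This finds a local optimum only. The sketch is not MALLET's KMeans. The class name is hypothetical, and the "first k points" initialization is a simple deterministic choice (real implementations usually seed randomly or with k-means++).

```java
import java.util.Arrays;

/** Hypothetical sketch of Lloyd's k-means iteration; not MALLET's KMeans API. */
final class KMeansSketch {
    /** Returns a cluster index for each point; requires 1 <= k <= points.length. */
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int dim = points[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) centers[c] = points[c].clone(); // first k points as seeds
        int[] assign = new int[points.length];
        Arrays.fill(assign, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: nearest center by squared Euclidean distance.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                double bestDist = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double d = 0.0;
                    for (int j = 0; j < dim; j++) {
                        double diff = points[i][j] - centers[c][j];
                        d += diff * diff;
                    }
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (assign[i] != best) { assign[i] = best; changed = true; }
            }
            if (!changed) break; // converged: no point moved
            // Update step: each center moves to the mean of its points.
            double[][] sums = new double[k][dim];
            int[] sizes = new int[k];
            for (int i = 0; i < points.length; i++) {
                sizes[assign[i]]++;
                for (int j = 0; j < dim; j++) sums[assign[i]][j] += points[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (sizes[c] == 0) continue; // an empty cluster keeps its old center
                for (int j = 0; j < dim; j++) centers[c][j] = sums[c][j] / sizes[c];
            }
        }
        return assign;
    }
}
```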
Label |
|
LabelAlphabet |
A mapping from arbitrary objects (usually Strings) to integers
(and corresponding Label objects) and back.
|
LabelDistributionEvaluator |
Prints predicted and true label distribution.
|
LabeledLDA |
LabeledLDA
|
LabeledSpan |
Created: Oct 12, 2004
|
LabeledSpans |
Created: Oct 31, 2004
|
Labeler |
|
Labeling |
A distribution over possible labels for an instance.
|
Labelings |
A collection of labelings, either for a multi-label problem (all
labels are part of the same label dictionary), or a factorized
labeling (each label is part of a different dictionary).
|
Labels |
Usually some distribution over possible labels for an instance.
|
LabelSequence |
|
LabelsSequence |
A simple Sequence implementation where all of the
elements must be Labels.
|
LabelVector |
|
LatticeViewer |
Created: Oct 31, 2004
|
LDA |
Deprecated.
|
LDAHyper |
Deprecated.
|
LDAStream |
|
LeastSquares |
|
LengthBins |
A feature approximating string length.
|
Lexer |
|
LexiconMembership |
|
LimitedMemoryBFGS |
|
LinearRegression |
|
LinearRegressionTrainer |
|
LineGroupIterator |
Iterate over groups of lines of text, separated by lines that
match a regular expression.
|
LineGroupString2TokenSequence |
|
LineIterator |
|
LineOptimizer |
Optimize, constrained to move parameters along the direction of a specified line.
|
LineOptimizer.ByGradient |
|
ListMember |
Checks membership in a lexicon in a text file.
|
LogNumber |
|
LongRegexMatches |
Matches a regular expression which spans several tokens.
|
MakeAmpersandXMLFriendly |
Converts & to &amp; in the tokens of a token sequence.
|
MalletLogger |
|
MalletProgressMessageLogger |
|
ManhattenDistance |
|
MarginalProbEstimator |
An implementation of topic model marginal probability estimators
presented in Wallach et al., "Evaluation Methods for Topic Models", ICML (2009)
|
Maths |
|
Matrix |
|
Matrixn |
Implementation of Matrix that allows arbitrary
number of dimensions.
|
MatrixOps |
A class of static utility functions for manipulating arrays of
double.
|
MaxEnt |
Maximum Entropy (AKA Multinomial Logistic Regression) classifier.
|
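At prediction time, a maximum entropy classifier scores each class with a weighted sum of features plus a bias, then normalizes with the softmax to get p(c | x). The sketch below shows that computation on dense arrays. It is a generic sketch and does not use MaxEnt's actual parameter layout. The class name and array shapes are assumptions.

```java
/** Hypothetical sketch of multinomial logistic regression scoring; not MALLET's MaxEnt API. */
final class SoftmaxSketch {
    /**
     * weights[c][j] is the weight of feature j for class c; bias[c] is the class bias.
     * Returns p(c | x) = exp(score_c) / sum over c' of exp(score_c').
     */
    static double[] classify(double[][] weights, double[] bias, double[] x) {
        int numClasses = weights.length;
        double[] scores = new double[numClasses];
        double max = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < numClasses; c++) {
            double s = bias[c];
            for (int j = 0; j < x.length; j++) s += weights[c][j] * x[j];
            scores[c] = s;
            max = Math.max(max, s);
        }
        double sum = 0.0;
        for (int c = 0; c < numClasses; c++) {
            scores[c] = Math.exp(scores[c] - max); // subtract the max for numerical stability
            sum += scores[c];
        }
        for (int c = 0; c < numClasses; c++) scores[c] /= sum;
        return scores;
    }
}
```

Subtracting the largest score before exponentiating leaves the probabilities unchanged but avoids overflow when scores are large.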
MaxEntConfidenceEstimator |
Estimates the confidence of a Segment extracted by a Transducer using a MaxEnt classifier to classify segments
as "correct" or "incorrect". Needs some interface work.
|
MaxEntFLGEConstraints |
Abstract expectation constraint for use with Generalized Expectation (GE).
|
MaxEntFLPRConstraints |
Abstract expectation constraint for use with Posterior Regularization (PR).
|
MaxEntGEConstraint |
Interface for expectation constraints for use with Generalized Expectation (GE).
|
MaxEntGERangeTrainer |
Training of MaxEnt models with labeled features using
Generalized Expectation Criteria.
|
MaxEntGETrainer |
Training of MaxEnt models with labeled features using
Generalized Expectation Criteria.
|
MaxEntKLFLGEConstraints |
Expectation constraint for use with GE.
|
MaxEntL1Trainer |
|
MaxEntL2FLGEConstraints |
Expectation constraint for use with GE.
|
MaxEntL2FLPRConstraints |
Expectation constraint for use with Posterior Regularization (PR).
|
MaxEntOptimizableByGE |
|
MaxEntOptimizableByLabelDistribution |
|
MaxEntOptimizableByLabelLikelihood |
|
MaxEntPRConstraint |
Interface for expectation constraints for use with Posterior Regularization (PR).
|
MaxEntPRTrainer |
Penalty (soft) version of Posterior Regularization (PR) for training MaxEnt.
|
MaxEntRangeL2FLGEConstraints |
Expectation constraint for use with GE.
|
MaxEntSequenceConfidenceEstimator |
Estimates the confidence of a Sequence extracted by a Transducer using a MaxEnt classifier to classify Sequences
as "correct" or "incorrect". Needs some interface work.
|
MaxEntShell |
Simple wrapper for training a MALLET maxent classifier.
|
MaxEntTrainer |
The trainer for a Maximum Entropy classifier.
|
MaxLattice |
The interface to classes implementing the Viterbi algorithm,
finding the best sequence of states for a given input sequence.
|
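The Viterbi recurrence behind this interface can be sketched for an HMM-style model in log space. For each time step and state, it keeps the best score of any path ending there, along with a back-pointer to the predecessor that achieved it. At the end, it follows the back-pointers from the best final state. A CRF lattice would use (unnormalized) transition and feature weights instead of log probabilities, but the dynamic program is the same. The class name and array layout are assumptions, not MaxLattice's API.

```java
/** Hypothetical sketch of the Viterbi algorithm in log space; not MALLET's MaxLattice API. */
final class ViterbiSketch {
    /**
     * logStart[s], logTrans[from][to], logEmit[s][symbol]; obs is the observed symbol
     * sequence (length >= 1). Returns the highest-scoring state sequence.
     */
    static int[] bestPath(double[] logStart, double[][] logTrans, double[][] logEmit, int[] obs) {
        int numStates = logStart.length, len = obs.length;
        double[][] delta = new double[len][numStates]; // best log score ending in state s at time t
        int[][] back = new int[len][numStates];        // best predecessor, for backtracking
        for (int s = 0; s < numStates; s++) delta[0][s] = logStart[s] + logEmit[s][obs[0]];
        for (int t = 1; t < len; t++) {
            for (int s = 0; s < numStates; s++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int prev = 0; prev < numStates; prev++) {
                    double score = delta[t - 1][prev] + logTrans[prev][s];
                    if (score > best) { best = score; arg = prev; }
                }
                delta[t][s] = best + logEmit[s][obs[t]];
                back[t][s] = arg;
            }
        }
        // Pick the best final state, then follow the back-pointers.
        int[] path = new int[len];
        for (int s = 1; s < numStates; s++) {
            if (delta[len - 1][s] > delta[len - 1][path[len - 1]]) path[len - 1] = s;
        }
        for (int t = len - 1; t > 0; t--) path[t - 1] = back[t][path[t]];
        return path;
    }
}
```

The running time is O(T · S²) for T observations and S states, compared with O(S^T) for enumerating every state sequence.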
MaxLatticeDefault |
Default, full dynamic programming version of the Viterbi "Max-(Product)-Lattice" algorithm.
|
MaxLatticeDefault.Factory |
|
MaxLatticeFactory |
|
MCMaxEnt |
Maximum Entropy classifier.
|
MCMaxEntTrainer |
The trainer for a Maximum Entropy classifier.
|
MedoidEvaluator |
|
MedoidEvaluator.Average |
|
MedoidEvaluator.CombiningStrategy |
Specifies how to combine a set of pairwise scores into a
cluster-wise score.
|
MedoidEvaluator.Maximum |
|
MedoidEvaluator.Minimum |
|
MEMM |
A Maximum Entropy Markov Model.
|
MEMM.State |
|
MEMM.TransitionIterator |
|
MEMMTrainer |
Trains and evaluates a MEMM.
|
MergeCallable |
This task copies topic-word counts from a global array to a local
thread-specific array.
|
Metric |
|
MinHeap |
|
Minkowski |
|
MinSegmentConfidenceEstimator |
Estimates the confidence of an entire sequence by the least
confidence segment.
|
MostFrequentClassAssignmentTrainer |
A Classifier Trainer to be used with MostFrequentClassifier.
|
MostFrequentClassifier |
A Classifier that will return the most frequent class label based on a training set.
|
MUCEvaluator |
Evaluate a Clustering using the MUC evaluation metric.
|
MultFileToSequences |
|
MultiInstanceList |
An implementation of InstanceList that logically combines multiple instance
lists so that they appear as one list without copying the original lists.
|
Multinomial |
A probability distribution over a set of features represented as a FeatureVector.
|
Multinomial.Estimator |
A hierarchy of classes used to produce estimates of probabilities, in
the form of a Multinomial, from counts associated with the elements
of an Alphabet.
|
Multinomial.LaplaceEstimator |
An MEstimator with m set to 1.
|
Multinomial.Logged |
A Multinomial in which the value associated with each feature index fi is
Math.log(probability[fi]) instead of probability[fi].
|
Multinomial.MAPEstimator |
Unimplemented, but the MEstimators are.
|
Multinomial.MEstimator |
An Estimator in which probability estimates in a Multinomial
are generated by adding a constant m (specified at construction time)
to each count before dividing by the total of the m-biased counts.
|
Multinomial.MLEstimator |
An MEstimator with m set to 0.
|
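The MEstimator rule above adds m to each count and divides by the sum of the biased counts. Laplace smoothing and the maximum-likelihood estimate are its m = 1 and m = 0 special cases. The sketch below implements just that arithmetic on a plain array. The class name is hypothetical, and it is not the Multinomial.Estimator API.

```java
/** Hypothetical sketch of the m-estimate described for Multinomial.MEstimator; not MALLET's API. */
final class MEstimateSketch {
    /** p_i = (count_i + m) / sum_j (count_j + m); m = 1 is Laplace, m = 0 is the ML estimate. */
    static double[] estimate(double[] counts, double m) {
        double total = 0.0;
        for (double c : counts) total += c + m;
        double[] p = new double[counts.length];
        for (int i = 0; i < counts.length; i++) p[i] = (counts[i] + m) / total;
        return p;
    }
}
```

With counts {3, 1}, m = 1 gives {4/6, 2/6} and m = 0 gives {0.75, 0.25}. A positive m keeps unseen features from receiving zero probability.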
MultinomialHMM |
Latent Dirichlet Allocation.
|
MultiSegmentationEvaluator |
Evaluates a transducer model, computes the precision, recall and F1 scores;
considers segments that span across multiple tokens.
|
MVNormal |
Tools for working with multivariate normal distributions
|
NaiveBayes |
A classifier that classifies instances according to the NaiveBayes method.
|
NaiveBayesEMTrainer |
|
NaiveBayesTrainer |
Class used to generate a NaiveBayes classifier from a set of training data.
|
NaiveBayesTrainer.Factory |
|
NBestViterbiConfidenceEstimator |
Estimates the confidence of an entire sequence by the probability
that one of the Viterbi paths ranked 2 through N is correct.
|
Neighbor |
A Clustering and a modified version of that Clustering.
|
NeighborEvaluator |
Scores the value of changing the current Clustering to the
modified Clustering specified in a Neighbor object.
|
NeighborIterator |
Sample Instances with data objects equal to Neighbors.
|
NEPipes |
|
NGramPreprocessor |
This pipe changes text to lowercase, removes common XML entities (quot, apos, lt, gt), and replaces all punctuation
except the - character with whitespace.
|
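The three steps can be sketched with two regular expressions. The sketch assumes that the entities appear literally as `&quot;`, `&apos;`, `&lt;` and `&gt;` and are deleted outright, and that "punctuation" means ASCII punctuation (`\p{Punct}`). Both are assumptions, and the class name is hypothetical.

```java
import java.util.Locale;

/** Hypothetical sketch of the three normalization steps; not MALLET's NGramPreprocessor API. */
final class NGramCleanSketch {
    static String clean(String text) {
        // 1. Lowercase (locale-independent).
        String s = text.toLowerCase(Locale.ROOT);
        // 2. Remove the entities &quot; &apos; &lt; &gt;.
        s = s.replaceAll("&(quot|apos|lt|gt);", "");
        // 3. Replace ASCII punctuation other than '-' with whitespace (class intersection).
        return s.replaceAll("[\\p{Punct}&&[^-]]", " ");
    }
}
```

Keeping `-` means hyphenated compounds such as `world-wide` survive as single tokens.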
NodeClusterSampleIterator |
Samples merges of a singleton cluster with another (possibly
non-singleton) cluster.
|
NonNegativeMatrixFactorization |
|
Noop |
A pipe that does nothing to the instance fields but which has side effects on the dictionary.
|
NoopTransducerTrainer |
A TransducerTrainer that does no training, but simply acts as a container for a Transducer;
for use in situations that require a TransducerTrainer, such as the TransducerEvaluator methods.
|
NormalizedDotProductMetric |
Computes
1 - [<x,y> / sqrt(<x,x> * <y,y>)]
aka 1 - cosine similarity
|
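The distance above (one minus cosine similarity) can be computed on dense arrays in a single pass that accumulates the three inner products. This sketch is not NormalizedDotProductMetric's API; the class name is hypothetical, and the sketch rejects zero vectors, for which the ratio is undefined.

```java
/** Hypothetical sketch of 1 - cosine similarity on dense vectors; not MALLET's Metric API. */
final class CosineDistanceSketch {
    static double distance(double[] x, double[] y) {
        double dot = 0.0, xx = 0.0, yy = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            xx += x[i] * x[i];
            yy += y[i] * y[i];
        }
        // 1 - <x,y> / sqrt(<x,x> * <y,y>); undefined for zero vectors.
        if (xx == 0.0 || yy == 0.0) throw new IllegalArgumentException("zero vector");
        return 1.0 - dot / Math.sqrt(xx * yy);
    }
}
```

Orthogonal vectors are at distance 1 and parallel ones at distance 0. For non-negative feature vectors such as word counts, the distance stays within [0, 1].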
NPTopicModel |
A non-parametric topic model that uses the "minimal path" assumption
to reduce bookkeeping.
|
NullLabel |
Object that carries a LabelAlphabet.
|
OffsetConjunctions |
|
OffsetFeatureConjunction |
|
OffsetPropertyConjunctions |
|
OneLabelGEConstraints |
A set of constraints on distributions over single
labels conditioned on the presence of input features.
|
OneLabelKLGEConstraints |
A set of constraints on distributions over consecutive
labels conditioned on input features.
|
OneLabelL2GEConstraints |
A set of constraints on distributions over consecutive
labels conditioned on input features.
|
OneLabelL2IndPRConstraints |
A set of constraints on individual input feature label pairs.
|
OneLabelL2PRConstraints |
A set of constraints on distributions over single
labels conditioned on the presence of input features.
|
OneLabelL2RangeGEConstraints |
A set of constraints on individual input feature label pairs.
|
Optimizable |
|
Optimizable.ByBatchGradient |
|
Optimizable.ByCombiningBatchGradient |
|
Optimizable.ByGISUpdate |
|
Optimizable.ByGradient |
|
Optimizable.ByGradientValue |
|
Optimizable.ByHessian |
|
Optimizable.ByValue |
|
Optimizable.ByVotedPerceptron |
|
OptimizableCollection |
|
OptimizationException |
General exception thrown by optimization algorithms when there
is an optimization-specific problem.
|
Optimizer |
|
Optimizer.ByBatches |
Deprecated. |
OptimizerEvaluator |
Callback interface that allows optimizer clients to perform some operation after every iteration.
|
OptimizerEvaluator.ByBatchGradient |
|
OptimizerEvaluator.ByGradient |
|
OrthantWiseLimitedMemoryBFGS |
Implementation of the orthant-wise limited-memory quasi-Newton method for
optimizing convex L1-regularized objectives.
|
PagedInstanceList |
An InstanceList which avoids OutOfMemoryErrors by saving Instances
to disk when there is not enough memory to create a new
Instance.
|
PairF1Evaluator |
Evaluates two clusterings using pairwise comparisons.
|
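Pairwise evaluation treats every pair of instances as a binary decision ("same cluster" or not) and scores the predicted clustering against the true one with precision, recall and F1. The sketch below uses the standard pairwise definitions. Returning 0 when a denominator is zero is this sketch's own convention, and the class name is hypothetical.

```java
/** Hypothetical sketch of pairwise precision, recall and F1 between two clusterings; not MALLET's API. */
final class PairwiseF1Sketch {
    /** Returns {precision, recall, f1}; a pair is "positive" when placed in the same cluster. */
    static double[] score(int[] truth, int[] predicted) {
        long tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < truth.length; i++) {
            for (int j = i + 1; j < truth.length; j++) {
                boolean sameTrue = truth[i] == truth[j];
                boolean samePred = predicted[i] == predicted[j];
                if (samePred && sameTrue) tp++;
                else if (samePred) fp++; // together in prediction, apart in truth
                else if (sameTrue) fn++; // apart in prediction, together in truth
            }
        }
        double precision = tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
        double recall = tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
        double f1 = precision + recall == 0 ? 0.0 : 2 * precision * recall / (precision + recall);
        return new double[] {precision, recall, f1};
    }
}
```

For truth `{0, 0, 1, 1}` and prediction `{0, 0, 0, 1}`, there is one true-positive pair, two false positives and one false negative. That gives a precision of 1/3, a recall of 1/2 and an F1 of 0.4.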
PairSampleIterator |
Sample pairs of Instances.
|
PairwiseEvaluator |
|
PairwiseEvaluator.Average |
|
PairwiseEvaluator.CombiningStrategy |
Specifies how to combine a set of pairwise scores into a
cluster-wise score.
|
PairwiseEvaluator.Maximum |
|
PairwiseEvaluator.Minimum |
|
PairwiseMatrix |
2-D upper-triangular matrix.
|
PairwiseScorer |
For each pair of Instances, if the pair is predicted to be in the same
cluster, increment the total by the evaluator's score for merging the two.
|
PAM4L |
Four-level Pachinko Allocation with MLE learning,
based on Andrew's Latent Dirichlet Allocation.
|
ParallelTopicModel |
Simple parallel threaded implementation of LDA,
following Newman, Asuncion, Smyth and Welling, "Distributed Algorithms for Topic Models",
JMLR (2009), with SparseLDA sampling scheme and data structure from
Yao, Mimno and McCallum, "Efficient Methods for Topic Model Inference on Streaming Document Collections", KDD (2009).
|
ParenGroupIterator |
Iterator that takes a Reader and breaks up the input into
top-level parenthesized expressions.
|
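Splitting input into top-level parenthesized expressions needs only a depth counter. A group starts when depth goes from 0 to 1 and ends when depth returns to 0. This sketch reads from a `CharSequence` rather than a Reader and uses a hypothetical class name. It also makes its own choices for malformed input: text outside groups, stray closing parentheses and an unclosed trailing group are all ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of splitting text into top-level parenthesized groups; not MALLET's API. */
final class ParenGroupSketch {
    /** Returns each balanced top-level "( ... )" group, in order. */
    static List<String> groups(CharSequence input) {
        List<String> result = new ArrayList<>();
        int depth = 0, start = -1;
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            if (ch == '(') {
                if (depth == 0) start = i; // a new top-level group begins
                depth++;
            } else if (ch == ')' && depth > 0) {
                depth--;
                if (depth == 0) result.add(input.subSequence(start, i + 1).toString());
            }
        }
        return result; // an unclosed trailing group is dropped
    }
}
```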
PartiallyRankedFeatureVector |
|
PartiallyRankedFeatureVector.Factory |
|
PartiallyRankedFeatureVector.PerLabelFactory |
|
PatternMatchIterator |
Iterates over matches of a regular expression.
|
PerClassAccuracyEvaluator |
Determines the precision, recall and F1 on a per-class basis.
|
PerDocumentF1Evaluator |
Created: Oct 8, 2004
|
PerFieldF1Evaluator |
Created: Oct 8, 2004
|
PerLabelFeatureCounts |
|
PerLabelFeatureCounts.Factory |
|
PerLabelInfoGain |
|
PerLabelInfoGain.Factory |
|
Pipe |
The abstract superclass of all Pipes, which transform one data type to another.
|
PipedInstanceWithConfidence |
Helper class to store confidence of an Instance.
|
PipeException |
|
PipeExtendedIterator |
Deprecated. |
PipeInputIterator |
Deprecated. |
PipeUtils |
Created: Aug 28, 2005
|
PlainLogFormatter |
|
PolylingualTopicModel |
Latent Dirichlet Allocation for loosely parallel corpora in arbitrary languages
|
PRAuxClassifier |
Auxiliary model (q) for E-step/I-projection in PR training.
|
PRAuxClassifierOptimizable |
Optimizable for training auxiliary model (q) for E-step/I-projection in PR training.
|
PRAuxiliaryModel |
Auxiliary model (q) for E-step/I-projection in Posterior Regularization (PR).
|
PRConstraint |
Interface for PR constraint that considers
either one or two states.
|
PrintInput |
Print the data field of each instance.
|
PrintInputAndTarget |
Print the data and target fields of each instance.
|
PrintTokenSequenceFeatures |
Print properties of the token sequence in the data field and the corresponding value
of any token in a token sequence or feature in a feature sequence in the target field.
|
PrintUtilities |
A simple utility class that lets you easily print
an arbitrary component.
|
PriorityQueue |
|
ProgressMessageLogFormatter |
Format ProgressMessages destined for screen.
|
ProgressMessageLogRecord |
A log message that is to be written in place (no newline)
if the message is headed for the user's terminal.
|
PropertyHolder |
Author: saunders Created Nov 15, 2005 Copyright (C) Univ.
|
PropertyList |
|
PunctuationIgnoringComparator |
Created: Nov 23, 2004
|
QBCSequenceConfidenceEstimator |
Estimates the confidence of an entire sequence by the
"disagreement" among a committee of CRFs.
|
QR |
|
QueueElement |
|
RandomAssignmentTrainer |
A Classifier Trainer to be used with RandomClassifier.
|
RandomClassifier |
A Classifier that will return a randomly selected class label.
|
RandomConfidenceEstimator |
Randomly assigns a value between 0 and 1 as the confidence of a Segment.
|
RandomEvaluator |
|
RandomFeatureVectorIterator |
|
Randoms |
|
RandomSequenceConfidenceEstimator |
Estimates the confidence of an entire sequence randomly.
|
RandomTokenSequenceIterator |
|
RankedFeatureVector |
|
RankedFeatureVector.Factory |
|
RankedFeatureVector.PerLabelFactory |
|
RankingNeighborEvaluator |
|
RankMaxEnt |
Rank Maximum Entropy classifier.
|
RankMaxEntTrainer |
|
Record |
|
Record |
Created: Oct 12, 2004
|
RegexFieldCleaner |
A field cleaner that removes all occurrences of a given regex.
|
RegexFileFilter |
|
RegexMatches |
|
Regression |
|
RegressionImporter |
Load data suitable for linear and Poisson regression
|
Replacement |
|
Replacer |
This class replaces ngrams as specified in the configuration files.
|
ROCData |
Tracks ROC data for instances in Trial results.
|
RTopicModel |
A wrapper for a topic model to be used from the R statistical package through rJava.
|
RunCRFPipe |
|
SaveDataInSource |
Set the source field of each instance to its data field.
|
SearchNode |
|
SearchState |
|
SearchState.NextStateIterator |
Iterator over the states with transitions from a given state.
|
Segment |
Represents a labelled chunk of a Sequence segmented by a
Transducer, usually corresponding to some object extracted
from an input Sequence.
|
SegmentationEvaluator |
|
SegmentIterator |
|
SegmentProductConfidenceEstimator |
Estimates the confidence of an entire sequence by combining the
output of a segment confidence estimator for each segment.
|
SelectiveFileLineIterator |
Very similar to the SimpleFileLineIterator,
but skips lines that match a regular expression.
|
SelectiveSGML2TokenSequence |
Similar to SGML2TokenSequence, except that only the tags
listed in allowedTags are converted to Labels.
|
SelfTransitionGEConstraint |
GE Constraint on the probability of self-transitions in the FST.
|
Sequence<E> |
|
SequenceConfidenceInstance |
Stores a Sequence and a PropertyList, used when extracting
features from a Sequence in a pipe for confidence prediction
|
SequencePair<I,O> |
|
SequencePairAlignment<I,O> |
|
SequencePrintingPipe |
Created: Jul 6, 2005
|
Sequences |
Utility methods for cc.mallet.types.Sequence and similar classes.
|
SerialPipes |
Convert an instance through a sequence of pipes.
|
SGML2TokenSequence |
Converts a string containing simple SGML tags into a data TokenSequence of words,
paired with a target TokenSequence containing the SGML tags in effect for each word.
|
ShallowTransducerTrainer |
Deprecated.
|
SimpleFileLineIterator |
|
SimpleLDA |
A simple implementation of Latent Dirichlet Allocation using Gibbs sampling.
|
SimpleTagger |
This class's main method trains, tests, or runs a generic CRF-based
sequence tagger.
|
SimpleTagger.SimpleTaggerSentence2FeatureVectorSequence |
|
SimpleTaggerSentence2StringTokenization |
|
SimpleTaggerSentence2TokenSequence |
Converts an external encoding of a sequence of elements with binary
features to a TokenSequence.
|
SimpleTaggerWithConstraints |
Version of SimpleTagger that trains CRFs with expectation constraints
rather than labeled data.
|
SimpleTokenizer |
A simple Unicode tokenizer that accepts sequences of letters
as tokens.
|
SingleInstanceIterator |
|
SourceLocation2TokenSequence |
Read from the File or BufferedReader in the data field and produce a TokenSequence.
|
Span |
A sub-section of a document, either linear or two-dimensional.
|
SparseMatrixn |
Implementation of Matrix that allows an arbitrary
number of dimensions.
|
SparseVector |
A vector that allocates memory only for non-zero values.
|
StateLabelMap |
Maps states in the lattice to labels.
|
StateToInstances |
Recreates an instance list from a topic sampling state, for cases where
you have the sampling state but not the original instance list file.
|
StatFunctions |
|
StochasticMetaAscent |
|
StringAddNewLineDelimiter |
Pipe that adds special text between lines to explicitly
represent line breaks.
|
StringArrayIterator |
|
StringEditFeatureVectorSequence |
|
StringEditVector |
|
StringIterator |
Java implementation of Jonathan Wood's "Text Parsing Helper Class".
|
StringKernel |
Computes a similarity metric between two strings, based on counts
of common subsequences of characters.
|
StringList2FeatureSequence |
Convert a list of strings into a feature sequence
|
Strings |
Static utility methods for Strings
|
StringSpan |
A sub-section of a linear string.
|
StringTokenization |
|
SumLattice |
Interface to perform forward-backward during training of a transducer.
|
SumLatticeBeam |
|
SumLatticeBeam.Factory |
|
SumLatticeConstrained |
|
SumLatticeDefault |
Default, full dynamic programming implementation of the Forward-Backward "Sum-(Product)-Lattice" algorithm
|
SumLatticeDefault.Factory |
|
SumLatticeDefaultCachedDot |
Default, full dynamic programming implementation of the Forward-Backward "Sum-(Product)-Lattice" algorithm
|
SumLatticeFactory |
Provides factory methods to create inference engines for training a transducer.
|
SumLatticeKL |
Lattice for M-step/M-projection in PR.
|
SumLatticePR |
Lattice for E-step/I-projection in PR.
|
SumLatticeScaling |
|
SumLatticeScaling.Factory |
|
SVD |
|
SvmLight2Classify |
Command line tool for classifying a sequence of instances directly from text
input, without creating an instance list.
|
SvmLight2FeatureVectorAndLabel |
This Pipe converts a line in SVMLight format to
a Mallet instance with FeatureVector data and
Label target.
|
SvmLight2Vectors |
Command line import tool for loading a sequence of
instances from an SVMLight feature-value pair file, with one instance
per line of the input file.
|
Target2BIOFormat |
|
Target2Double |
Convert the object in the target field into a floating-point numeric type.
|
Target2FeatureSequence |
Convert a token sequence in the target field into a feature sequence in the target field.
|
Target2Integer |
Convert the object in the target field into an integer numeric type.
|
Target2Label |
Convert object in the target field into a label in the target field.
|
Target2LabelSequence |
Convert a token sequence in the target field into a label sequence in the target field.
|
TargetRememberLastLabel |
For each position in the target, remember the last non-background
label.
|
TargetStringToFeatures |
|
TestAlphabet |
Created: Nov 24, 2004
|
TestAStar |
|
TestAugmentableFeatureVector |
Created: Dec 30, 2004
|
TestBiNormalSeparation |
|
TestCharSequenceNoDiacritics |
|
TestCharSequenceReplaceHtmlEntities |
|
TestClassifiers |
|
TestClusteringEvaluators |
Examples drawn from Luo, "On Coreference Resolution Performance
Metrics", HLT 2005.
|
TestCRF |
Tests for CRF training.
|
TestCRF.TestCRF2String |
|
TestCRF.TestCRFTokenSequenceRemoveSpaces |
|
TestDocumentExtraction |
Created: Oct 12, 2004
|
TestFeatureSequence |
|
TestFeatureTransducer |
|
TestFeatureVector |
|
TestHashedSparseVector |
|
TestIndexedSparseVector |
|
TestInstanceList |
|
TestInstanceListWeights |
|
TestInstancePipe |
|
TestInstancePipe.Array2ArrayIterator |
|
TestIterators |
|
TestIterators |
Unit Test for PipeInputIterators
Created: Thu Feb 26 14:27:15 2004
|
TestLabelAlphabet |
Created: Nov 24, 2004
|
TestLabelsSequence |
Created: Sep 21, 2004
|
TestLabelVector |
|
TestMaths |
Created: Oct 31, 2004
|
TestMatrix |
|
TestMatrixn |
Created: Aug 30, 2004
|
TestMatrixOps |
|
TestMaxEntTrainer |
|
TestMEMM |
Tests for MEMM training.
|
TestMEMM.TestMEMMTokenSequenceRemoveSpaces |
|
TestMultinomial |
|
TestNaiveBayes |
|
TestOffsetConjunctions |
|
TestOffsetFeatureConjunctions |
$Id: TestOffsetFeatureConjunctions.java,v 1.1 2007/10/22 21:37:57 mccallum Exp $
|
TestOptimizable |
Contains static methods for testing subclasses of
Maximizable and Maximizable.ByGradient.
|
TestOptimizer |
Unit Test for class TestMaximizer.java
Created: Mon Apr 26 19:54:25 2004
|
TestPagedInstanceList |
Created: Apr 19, 2005
|
TestPatternMatchIterator |
|
TestPerDocumentF1Evaluator |
Created: Nov 18, 2004
|
TestPipeUtils |
Created: Aug 28, 2005
|
TestPriorityQueue |
|
TestPropertyList |
|
TestQR |
|
TestRandom |
Created: Jan 19, 2005
|
TestRankedFeatureVector |
|
TestSequencePrintingPipe |
Created: Jul 8, 2005
|
TestSerializable |
Static utility for testing serializable classes in MALLET.
|
TestSGML2TokenSequence |
|
TestSGML2TokenSequence.Array2ArrayIterator |
|
TestSpacePipe |
Unit Test for class TestSpacePipe.java
Created: Thu Feb 26 14:56:55 2004
|
TestSparseMatrixn |
Created: Aug 30, 2004
|
TestSparseVector |
|
TestStaticParameters |
|
TestStaticParameters.Factory |
|
TestStringIterator |
|
TestStrings |
Created: Jan 19, 2005
|
TestSumNegLogProb2 |
|
TestToken |
|
TestTokenSequence2PorterStems |
|
Text |
|
Text2Classify |
Command line tool for classifying a sequence of
instances directly from text input, without
creating an instance list.
|
Text2Clusterings |
|
Text2Vectors |
Convert document files into vectors (a persistent instance list).
|
ThreadedOptimizable |
An adaptor for optimizables based on batch values/gradients.
|
Timing |
A class for timing things.
|
Token |
A representation of a piece of text, usually a single word, to
which we can attach properties.
|
Token2FeatureVector |
Convert the property list on a token into a feature vector.
|
TokenAccuracyEvaluator |
Evaluates a transducer model based on predictions of individual tokens.
|
TokenFirstPosition |
|
Tokenization |
|
TokenizationFilter |
Created: Nov 12, 2004
|
TokenSequence |
A sequence of Tokens, i.e., pieces of text (usually single words) to which properties can be attached.
|
TokenSequence2FeatureSequence |
Convert the token sequence in the data field of each instance to a feature sequence.
|
TokenSequence2FeatureSequenceWithBigrams |
Convert the token sequence in the data field of each instance to a feature sequence that
preserves bigram information.
|
TokenSequence2FeatureVectorSequence |
Convert the token sequence in the data field of each instance to a feature vector sequence.
|
TokenSequence2PorterStems |
|
TokenSequence2Tokenization |
Heuristically converts a simple token sequence into a Tokenization
that can be used with all the extract package goodies.
|
TokenSequenceDocHeader |
|
TokenSequenceLowercase |
Convert the text in each token in the token sequence in the data field to lower case.
|
TokenSequenceMatchDataAndTarget |
Run a regular expression over the text of each token; replace the
text with the substring matching one regex group; create a target
TokenSequence from the text matching another regex group.
|
TokenSequenceNGrams |
Convert the token sequence in the data field to a token sequence of ngrams.
|
TokenSequenceParseFeatureString |
Convert the string in each token's Token.text field to a list
of space-delimited Strings.
|
TokenSequenceRemoveNonAlpha |
Remove tokens that contain non-alphabetic characters.
|
TokenSequenceRemoveStopPatterns |
Remove tokens from the token sequence in the data field whose text matches any of a set of regular expressions.
|
TokenSequenceRemoveStopwords |
Remove tokens from the token sequence in the data field whose text is in the stopword list.
|
TokenText |
|
TokenTextCharNGrams |
|
TokenTextCharPrefix |
|
TokenTextCharSuffix |
|
TokenTextNGrams |
|
TopicalNGrams |
Like Latent Dirichlet Allocation, but with integrated phrase discovery.
|
TopicAssignment |
This class combines a sequence of observed features
with a sequence of hidden "labels".
|
TopicInferencer |
|
TopicModel |
|
TopicModelDiagnostics |
|
TopicReports |
|
TopicTrainer |
Create a simple LDA topic model, with some reporting options.
|
TrainCRF |
|
TrainHMM |
|
Transducer |
A base class for all sequence models, analogous to classify.Classifier.
|
Transducer.Incrementor |
Methods to be called by inference methods to indicate partial counts of sufficient statistics.
|
Transducer.State |
An abstract class used to represent the states of the transducer.
|
Transducer.TransitionIterator |
An abstract class to iterate over the states of the transducer.
|
TransducerConfidenceEstimator |
|
TransducerCorrector |
Interface for TransducerCorrectors, which correct a subset of the
Segments produced by a Transducer.
|
TransducerEvaluator |
An abstract class to evaluate a transducer model.
|
TransducerExtractionConfidenceEstimator |
Estimates the confidence in the labeling of a LabeledSpan using a
TransducerConfidenceEstimator.
|
TransducerSequenceConfidenceEstimator |
|
TransducerTrainer |
An abstract class to train and evaluate a transducer model.
|
TransducerTrainer.ByIncrements |
|
TransducerTrainer.ByInstanceIncrements |
|
TransducerTrainer.ByOptimization |
|
Trial |
Stores the results of classifying a collection of Instances,
and provides many methods for evaluating the results.
|
TrieLexiconMembership |
|
TUI |
|
TUI |
|
TwoLabelGEConstraints |
A set of constraints on distributions over pairs of consecutive
labels conditioned on the presence of input features.
|
TwoLabelKLGEConstraints |
A set of constraints on distributions over consecutive
labels conditioned on input features.
|
TwoLabelL2GEConstraints |
A set of constraints on distributions over consecutive
labels conditioned on input features.
|
Univariate |
|
UnlabeledFileIterator |
An iterator that generates instances from an initial
directory or set of directories.
|
UriUtils |
|
ValueString2FeatureVector |
|
Vector |
Deprecated.
|
Vectors2Classify |
Classify documents, run trials, print statistics from a vector file.
|
Vectors2FeatureConstraints |
Create "feature constraints" from data for use in GE training.
|
Vectors2Info |
Diagnostic facilities for a vector file.
|
Vectors2Topics |
Perform topic analysis in the style of LDA and its variants.
|
Vectors2Vectors |
A command-line tool for manipulating InstanceLists.
|
VectorStats |
|
ViterbiConfidenceEstimator |
Estimates the confidence of an entire sequence by the probability
of the Viterbi path normalized by the probability of the entire
lattice.
|
ViterbiRatioConfidenceEstimator |
Estimates the confidence of an entire sequence by the ratio of the
probabilities of the first and second best Viterbi paths.
|
ViterbiWriter |
Prints the input instances along with the features and the true and
predicted labels to a file.
|
WeightedTopicModel |
|
Winnow |
Classification methods of the Winnow2 algorithm.
|
WinnowTrainer |
An implementation of the training methods of a
Winnow2 on-line classifier.
|
WordEmbeddingCallable |
|
WordEmbeddings |
|
WordTransformation |
|
WordVectors |
|
WorkerCallable |
A parallel topic model callable task.
|
WorkerRunnable |
A parallel topic model runnable task.
|