cc.mallet.types (Mallet 2 API)

Fundamental MALLET types, such as FeatureVector, Instance, and Label.

Interface Summary
Interface	Description
AlphabetCarrying	An interface for objects that contain one or more Alphabets.
CachedMetric	Stores a hash for each object being compared for efficient computation.
ConstantMatrix
Labeler
Labeling	A distribution over possible labels for an instance.
Matrix
Metric
PartiallyRankedFeatureVector.Factory
PartiallyRankedFeatureVector.PerLabelFactory
PropertyHolder	Author: saunders Created Nov 15, 2005 Copyright (C) Univ.
RankedFeatureVector.Factory
RankedFeatureVector.PerLabelFactory
Sequence<E>
Vector	Deprecated.

Class Summary
Class	Description
Alphabet	A mapping between integers and objects where the mapping in each direction is efficient.
AlphabetFactory
ArrayListSequence<E>
ArraySequence<E>
AugmentableFeatureVector
BiNormalSeparation	Bi-Normal Separation is a feature weighting algorithm introduced in: An Extensive Empirical Study of Feature Selection Metrics for Text Classification, George Forman, Journal of Machine Learning Research, 3:1289-1305, 2003.
BiNormalSeparation.Factory	Factory class.
ChainedInstanceIterator	Deprecated.
CrossValidationIterator	An iterator which splits an `InstanceList` into n folds and iterates over the folds for use in n-fold cross-validation.
DenseMatrix
DenseVector
Dirichlet	Various useful functions related to Dirichlet distributions.
Dirichlet.Estimator
Dirichlet.MethodOfMomentsEstimator
EuclideanDistance
ExpGain
ExpGain.Factory
FeatureConjunction
FeatureConjunction.List
FeatureCounter	Efficient, compact, incremental counting of features in an alphabet.
FeatureCounts
FeatureCounts.Factory
FeatureInducer
FeatureSelection
FeatureSelector
FeatureSequence	An implementation of `Sequence` that ensures that every Object in the sequence has the same class.
FeatureSequenceWithBigrams	A FeatureSequence with a parallel record of bigrams, kept in a separate dictionary.
FeatureVector	A subset of an `Alphabet` in which each element of the subset has an associated value.
FeatureVectorSequence
GainRatio	List of features along with their thresholds sorted in descending order of the ratio of (1) information gained by splitting instances on the feature at its associated threshold value, to (2) the split information.
GradientGain
GradientGain.Factory
HashedSparseVector
IDSorter	This class contains a comparator for use in sorting integers that have associated floating point values.
IndexedSparseVector
InfiniteDistance
InfoGain
InfoGain.Factory
Instance	A machine learning "example" to be used in training, testing or performance of various machine learning algorithms.
InstanceList	A list of machine learning instances, typically used for training or testing of a machine learning algorithm.
InstanceListTUI
InvertedIndex
KLGain
Label
LabelAlphabet	A mapping from arbitrary objects (usually Strings) to integers (and corresponding Label objects) and back.
Labelings	A collection of labelings, either for a multi-label problem (all labels are part of the same label dictionary) or for a factorized labeling (each label is part of a different dictionary).
Labels	Usually some distribution over possible labels for an instance.
LabelSequence
LabelsSequence	A simple `Sequence` implementation where all of the elements must be Labels.
LabelVector
LogNumber
ManhattenDistance
Matrixn	Implementation of Matrix that allows an arbitrary number of dimensions.
MatrixOps	A class of static utility functions for manipulating arrays of double.
Minkowski
MultiInstanceList	An implementation of InstanceList that logically combines multiple instance lists so that they appear as one list without copying the original lists.
Multinomial	A probability distribution over a set of features represented as a `FeatureVector`.
Multinomial.Estimator	A hierarchy of classes used to produce estimates of probabilities, in the form of a Multinomial, from counts associated with the elements of an Alphabet.
Multinomial.LaplaceEstimator	An MEstimator with m set to 1.
Multinomial.Logged	A Multinomial in which the value associated with each feature index fi is Math.log(probability[fi]) instead of probability[fi].
Multinomial.MAPEstimator	Unimplemented, but the MEstimators are.
Multinomial.MEstimator	An Estimator in which probability estimates in a Multinomial are generated by adding a constant m (specified at construction time) to each count before dividing by the total of the m-biased counts.
Multinomial.MLEstimator	An MEstimator with m set to 0.
NormalizedDotProductMetric	Computes 1 - [<x,y> / sqrt(<x,x> * <y,y>)], aka 1 - cosine similarity.
NullLabel	Object that carries a LabelAlphabet.
PagedInstanceList	An InstanceList which avoids OutOfMemoryErrors by saving Instances to disk when there is not enough memory to create a new Instance.
PartiallyRankedFeatureVector
PerLabelFeatureCounts
PerLabelFeatureCounts.Factory
PerLabelInfoGain
PerLabelInfoGain.Factory
RankedFeatureVector
ROCData	Tracks ROC data for instances in `Trial` results.
SequencePair<I,O>
SequencePairAlignment<I,O>
SingleInstanceIterator
SparseMatrixn	Implementation of Matrix that allows an arbitrary number of dimensions.
SparseVector	A vector that allocates memory only for non-zero values.
StringEditFeatureVectorSequence
StringEditVector
StringKernel	Computes a similarity metric between two strings, based on counts of common subsequences of characters.
Token	A representation of a piece of text, usually a single word, to which we can attach properties.
TokenSequence	A sequence of Tokens, each a piece of text (usually a single word) to which we can attach properties.
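
The Multinomial.MEstimator row above describes adding a constant m to each count before dividing by the total of the m-biased counts. The following is a minimal sketch of that arithmetic in plain Java. It does not use the MALLET classes and is not taken from the MALLET source; the class name `MEstimateSketch` and the method `estimate` are hypothetical names chosen for illustration.

```java
import java.util.Arrays;

// Illustrative sketch only: NOT MALLET's Multinomial.MEstimator implementation.
// MEstimateSketch and estimate(...) are hypothetical names.
class MEstimateSketch {

    // Returns p[i] = (counts[i] + m) / sum_j (counts[j] + m).
    static double[] estimate(double[] counts, double m) {
        double total = 0.0;
        for (double c : counts) {
            total += c + m; // total of the m-biased counts
        }
        if (total <= 0.0) {
            // Assumption of this sketch: refuse to divide by a non-positive total.
            throw new IllegalArgumentException("m-biased counts must sum to a positive value");
        }
        double[] probs = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            probs[i] = (counts[i] + m) / total;
        }
        return probs;
    }

    public static void main(String[] args) {
        double[] counts = {3, 1, 0};
        // m = 0 corresponds to the maximum-likelihood case (MLEstimator row).
        System.out.println(Arrays.toString(estimate(counts, 0.0)));
        // m = 1 corresponds to the Laplace case (LaplaceEstimator row).
        System.out.println(Arrays.toString(estimate(counts, 1.0)));
    }
}
```

With counts {3, 1, 0}, m = 0 yields the raw proportions 3/4, 1/4, 0, while m = 1 yields 4/7, 2/7, 1/7, so the unseen third feature no longer receives zero probability.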

Package cc.mallet.types
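
The NormalizedDotProductMetric entry in the Class Summary describes a distance of 1 minus cosine similarity. As a rough illustration of that formula on plain `double[]` arrays, here is a small standalone sketch. It is not MALLET's implementation and does not use MALLET's vector types; the class `CosineDistanceSketch`, its methods, and the choice to reject zero vectors are all assumptions made for this example.

```java
// Illustrative sketch only: NOT MALLET's NormalizedDotProductMetric source.
// CosineDistanceSketch, dot(...) and distance(...) are hypothetical names.
class CosineDistanceSketch {

    // Inner product <a,b> of two equal-length dense vectors.
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Computes 1 - <x,y> / sqrt(<x,x> * <y,y>).
    static double distance(double[] x, double[] y) {
        double norm = Math.sqrt(dot(x, x) * dot(y, y));
        if (norm == 0.0) {
            // Assumption of this sketch: a zero vector has no direction, so reject it.
            throw new IllegalArgumentException("cosine distance is undefined for a zero vector");
        }
        return 1.0 - dot(x, y) / norm;
    }

    public static void main(String[] args) {
        // Orthogonal vectors: cosine similarity 0, so the distance is 1.
        System.out.println(distance(new double[]{1, 0}, new double[]{0, 1}));
        // Parallel vectors: cosine similarity 1, so the distance is 0.
        System.out.println(distance(new double[]{1, 2}, new double[]{2, 4}));
    }
}
```

Because the dot product is normalized by both vector lengths, scaling either input by a positive constant leaves the distance unchanged.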