AddClassifierTokenPredictions |
This pipe uses a Classifier to label each token (i.e., using a 0th-order Markov assumption),
then adds the predictions as features to each token.
|
AddClassifierTokenPredictions.TokenClassifiers |
This inner class represents the trained token classifiers.
|
Array2FeatureVector |
Converts a Java array of numerical types to a FeatureVector, where the
Alphabet is the data array index wrapped in an Integer object.
|
AugmentableFeatureVectorAddConjunctions |
Add specified conjunctions to each instance.
|
AugmentableFeatureVectorLogScale |
Given an AugmentableFeatureVector, set those values greater than
or equal to 1 to log(value)+1.
|
BranchingPipe |
Deprecated. |
CharSequence2CharNGrams |
Transform a character sequence into a token sequence of character n-grams.
|
CharSequence2TokenSequence |
Pipe that tokenizes a character sequence.
|
CharSequenceArray2TokenSequence |
Transform an array of character sequences into a token sequence.
|
CharSequenceLowercase |
Replace the data string or string buffer with a lowercased version.
|
CharSequenceNoDiacritics |
A string normalizer that performs the following steps:
Unicode canonical decomposition (Form.NFD)
Removal of diacritical marks
Unicode canonical composition (Form.NFC)
|
CharSequenceRemoveHTML |
This pipe removes HTML from a CharSequence.
|
CharSequenceRemoveUUEncodedBlocks |
|
CharSequenceReplace |
Given a string, repeatedly look for matches of the regex, and
replace the entire match with the given replacement string.
|
CharSequenceReplaceHtmlEntities |
|
CharSubsequence |
Given a string, return only the portion of the string inside a regex parenthesized group.
|
Classification2ConfidencePredictingFeatureVector |
Pipe features from the underlying classifier to
the confidence prediction instance list.
|
CountsToFeatureSequencePipe |
|
Csv2Array |
Converts a string of comma separated values to an array.
|
Csv2FeatureVector |
Converts a string of the form
feature_1:val_1 feature_2:val_2 ... to a FeatureVector.
|
Directory2FileIterator |
Convert a File object representing a directory into a FileIterator which
iterates over files in the directory matching a pattern and which extracts
a label from each file path to become the target field of the instance.
|
FeatureCountPipe |
Pruning low-count features can be a good way to save memory and computation.
|
FeatureDocFreqPipe |
Pruning low-count features can be a good way to save memory and computation.
|
FeatureSequence2AugmentableFeatureVector |
Convert the data field from a feature sequence to an augmentable feature vector.
|
FeatureSequence2FeatureVector |
Convert the data field from a feature sequence to a feature vector.
|
FeatureSequenceConvolution |
|
FeatureValueString2FeatureVector |
|
FeatureVectorConjunctions |
Include in the FeatureVector conjunctions of all its features.
|
FeatureVectorSequence2FeatureVectors |
Given instances with a FeatureVectorSequence in the data field, break up the sequence into
the individual FeatureVectors, producing one FeatureVector per Instance.
|
Filename2CharSequence |
Given a filename contained in a string, read in contents of file into a CharSequence.
|
FilterEmptyFeatureVectors |
|
FixedVocabTokenizer |
A simple Unicode tokenizer that accepts sequences of letters
as tokens.
|
Input2CharSequence |
Pipe that can read from various kinds of text sources
(a URI, File, or Reader) into a CharSequence.
|
InstanceListTrimFeaturesByCount |
Unimplemented.
|
LineGroupString2TokenSequence |
|
MakeAmpersandXMLFriendly |
Convert & to &amp; in the tokens of a token sequence.
|
NGramPreprocessor |
This pipe changes text to lowercase, removes common XML entities (quot, apos, lt, gt), and replaces all punctuation
except the - character with whitespace.
|
Noop |
A pipe that does nothing to the instance fields but which has side effects on the dictionary.
|
Pipe |
The abstract superclass of all Pipes, which transform one data type to another.
|
PipeUtils |
Created: Aug 28, 2005
|
PrintInput |
Print the data field of each instance.
|
PrintInputAndTarget |
Print the data and target fields of each instance.
|
PrintTokenSequenceFeatures |
Print properties of the token sequence in the data field and the corresponding value
of any token in a token sequence or feature in a feature sequence in the target field.
|
SaveDataInSource |
Set the source field of each instance to its data field.
|
SelectiveSGML2TokenSequence |
Similar to SGML2TokenSequence, except that only the tags
listed in allowedTags are converted to Labels.
|
SerialPipes |
Convert an instance through a sequence of pipes.
|
SGML2TokenSequence |
Converts a string containing simple SGML tags into a data TokenSequence of words,
paired with a target TokenSequence containing the SGML tags in effect for each word.
|
SimpleTaggerSentence2StringTokenization |
|
SimpleTaggerSentence2TokenSequence |
Converts an external encoding of a sequence of elements with binary
features to a TokenSequence.
|
SimpleTokenizer |
A simple Unicode tokenizer that accepts sequences of letters
as tokens.
|
SourceLocation2TokenSequence |
Read from the File or BufferedReader in the data field and produce a TokenSequence.
|
StringAddNewLineDelimiter |
Pipe that adds special text between lines to explicitly
represent line breaks.
|
StringIterator |
Java implementation of Jonathan Wood's "Text Parsing Helper Class".
|
StringList2FeatureSequence |
Convert a list of strings into a feature sequence.
|
SvmLight2FeatureVectorAndLabel |
This Pipe converts a line in SVMLight format to
a Mallet instance with FeatureVector data and
Label target.
|
Target2Double |
Convert object in the target field into a floating-point numeric type
|
Target2FeatureSequence |
Convert a token sequence in the target field into a feature sequence in the target field.
|
Target2Integer |
Convert object in the target field into an integer numeric type
|
Target2Label |
Convert object in the target field into a label in the target field.
|
Target2LabelSequence |
Convert a token sequence in the target field into a label sequence in the target field.
|
TargetRememberLastLabel |
For each position in the target, remember the last non-background
label.
|
TargetStringToFeatures |
|
Token2FeatureVector |
Convert the property list on a token into a feature vector.
|
TokenSequence2FeatureSequence |
Convert the token sequence in the data field of each instance to a feature sequence.
|
TokenSequence2FeatureSequenceWithBigrams |
Convert the token sequence in the data field of each instance to a feature sequence that
preserves bigram information.
|
TokenSequence2FeatureVectorSequence |
Convert the token sequence in the data field of each instance to a feature vector sequence.
|
TokenSequence2PorterStems |
|
TokenSequenceLowercase |
Convert the text in each token in the token sequence in the data field to lower case.
|
TokenSequenceMatchDataAndTarget |
Run a regular expression over the text of each token; replace the
text with the substring matching one regex group; create a target
TokenSequence from the text matching another regex group.
|
TokenSequenceNGrams |
Convert the token sequence in the data field to a token sequence of ngrams.
|
TokenSequenceParseFeatureString |
Convert the string in each Token.text field to a list
of Strings (space delimited).
|
TokenSequenceRemoveNonAlpha |
Remove tokens that contain non-alphabetic characters.
|
TokenSequenceRemoveStopPatterns |
Remove tokens from the token sequence in the data field whose text matches any of a set of regular expressions.
|
TokenSequenceRemoveStopwords |
Remove tokens from the token sequence in the data field whose text is in the stopword list.
|
ValueString2FeatureVector |
|