Package cc.mallet.pipe
Class SimpleTaggerSentence2TokenSequence
- java.lang.Object
  - cc.mallet.pipe.Pipe
    - cc.mallet.pipe.SimpleTaggerSentence2TokenSequence
- All Implemented Interfaces: AlphabetCarrying, java.io.Serializable
- Direct Known Subclasses: SimpleTaggerSentence2StringTokenization
public class SimpleTaggerSentence2TokenSequence extends Pipe
Converts an external encoding of a sequence of elements with binary features to a TokenSequence. If target processing is on (training or labeled test data), it extracts element labels from the external encoding to create a target LabelSequence. Two external encodings are supported:
- A String containing lines of whitespace-separated tokens.
- A String[][].
- See Also: Serialized Form
Field Summary
Fields:
- protected boolean setTokensAsFeatures
Constructor Summary
Constructors:
- SimpleTaggerSentence2TokenSequence(): Creates a new SimpleTaggerSentence2TokenSequence instance.
- SimpleTaggerSentence2TokenSequence(boolean inc): Creates a new SimpleTaggerSentence2TokenSequence instance that includes tokens as features if and only if the supplied argument is true.
Method Summary
Methods:
- protected java.lang.String makeText(java.lang.String[] in): Returns the first String in the array, or "" if the array has length 0.
- protected java.lang.String[][] parseSentence(java.lang.String sentence): Parses a string representing a sequence of rows of tokens into an array of arrays of tokens.
- Instance pipe(Instance carrier): Takes an Instance with data of type String or String[][] and creates an Instance whose data is a TokenSequence.
Methods inherited from class cc.mallet.pipe.Pipe
alphabetsMatch, getAlphabet, getAlphabets, getDataAlphabet, getInstanceId, getTargetAlphabet, instanceFrom, instancesFrom, instancesFrom, isDataAlphabetSet, isTargetProcessing, newIteratorFrom, preceedingPipeDataAlphabetNotification, preceedingPipeTargetAlphabetNotification, precondition, readResolve, setDataAlphabet, setOrCheckDataAlphabet, setOrCheckTargetAlphabet, setTargetAlphabet, setTargetProcessing
Constructor Detail
-
SimpleTaggerSentence2TokenSequence
public SimpleTaggerSentence2TokenSequence()
Creates a new SimpleTaggerSentence2TokenSequence instance. By default, tokens are included as features.
-
SimpleTaggerSentence2TokenSequence
public SimpleTaggerSentence2TokenSequence(boolean inc)
Creates a new SimpleTaggerSentence2TokenSequence instance that includes tokens as features if and only if the supplied argument is true.
-
-
Method Detail
-
parseSentence
protected java.lang.String[][] parseSentence(java.lang.String sentence)
Parses a string representing a sequence of rows of tokens into an array of arrays of tokens.
- Parameters: sentence - a String
- Returns: the corresponding array of arrays of tokens.
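The conversion described above can be sketched without MALLET on the classpath. This is a minimal illustration, not the library's implementation: the class name ParseSentenceSketch is hypothetical, and skipping blank lines and collapsing runs of whitespace are assumptions that the page does not confirm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ParseSentenceSketch {
    // Hypothetical stand-in for parseSentence: one row per non-blank line,
    // one token per whitespace-separated field.
    static String[][] parseSentence(String sentence) {
        List<String[]> rows = new ArrayList<>();
        for (String line : sentence.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines carry no tokens and are skipped
            }
            rows.add(trimmed.split("\\s+"));
        }
        return rows.toArray(new String[0][]);
    }

    public static void main(String[] args) {
        String[][] rows = parseSentence("a b\nc d e");
        System.out.println(Arrays.deepToString(rows)); // prints [[a, b], [c, d, e]]
    }
}
```

Under these assumptions, the String "a b\nc d e" and the String[][] {{a,b},{c,d,e}} describe the same input.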
-
makeText
protected java.lang.String makeText(java.lang.String[] in)
Returns the first String in the array, or "" if the array has length 0.
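The documented contract is small enough to restate as code. The following is a sketch of that contract only, in a hypothetical class named MakeTextSketch; it is not copied from the MALLET source, and how the real method treats a null array is not stated on this page.

```java
final class MakeTextSketch {
    // Follows the documented contract: first element, or "" for an empty array.
    static String makeText(String[] in) {
        return in.length > 0 ? in[0] : "";
    }

    public static void main(String[] args) {
        System.out.println(makeText(new String[] {"c", "d", "e"})); // prints c
        System.out.println(makeText(new String[0]).isEmpty());      // prints true
    }
}
```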
-
pipe
public Instance pipe(Instance carrier)
Takes an Instance with data of type String or String[][] and creates an Instance whose data is a TokenSequence. Each Token in the sequence gets the text of its corresponding line and one feature of value 1 for each "Feature" in the line. For example, if the String[][] is {{a,b},{c,d,e}} (and target processing is off) then the text would be "a b" for the first token and "c d e" for the second. Also, the features "a" and "b" would be set for the first token and "c", "d" and "e" for the second. When target processing is on, the last element in the String[] for the current token is taken as the target (label), so in the previous example "b" would be the label of the first token.
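How a row divides into features and a label can be illustrated with plain collections. This is a rough sketch under stated assumptions: RowSplitSketch, features and label are hypothetical names, no MALLET types are used, and the effect of setTokensAsFeatures is deliberately left out because this page does not spell it out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RowSplitSketch {
    // Hypothetical helper: the features of one row. With target processing on,
    // the last element is the label and is excluded from the features.
    static List<String> features(String[] row, boolean targetProcessing) {
        int n = targetProcessing ? row.length - 1 : row.length;
        return new ArrayList<>(Arrays.asList(row).subList(0, Math.max(n, 0)));
    }

    // Hypothetical helper: the label of one row, which is its last element.
    static String label(String[] row) {
        if (row.length == 0) {
            throw new IllegalArgumentException("row has no label");
        }
        return row[row.length - 1];
    }

    public static void main(String[] args) {
        String[][] rows = {{"a", "b"}, {"c", "d", "e"}};
        for (String[] row : rows) {
            System.out.println("off: " + features(row, false)
                + " | on: " + features(row, true) + " -> " + label(row));
        }
    }
}
```

For the row {a,b}, the sketch yields features [a, b] with target processing off, and features [a] with label b when it is on.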
-
-