Package cc.mallet.pipe
Class SimpleTaggerSentence2TokenSequence
- java.lang.Object
- cc.mallet.pipe.Pipe
- cc.mallet.pipe.SimpleTaggerSentence2TokenSequence
- All Implemented Interfaces: AlphabetCarrying, java.io.Serializable
- Direct Known Subclasses: SimpleTaggerSentence2StringTokenization
public class SimpleTaggerSentence2TokenSequence extends Pipe
Converts an external encoding of a sequence of elements with binary features to a TokenSequence. If target processing is on (training or labeled test data), it extracts element labels from the external encoding to create a target LabelSequence. Two external encodings are supported:
- A String containing lines of whitespace-separated tokens.
- A String[][].
- See Also: Serialized Form
Field Summary
- Modifier and Type: protected boolean
- Field: setTokensAsFeatures
Constructor Summary
- SimpleTaggerSentence2TokenSequence(): Creates a new SimpleTaggerSentence2TokenSequence instance.
- SimpleTaggerSentence2TokenSequence(boolean inc): Creates a new SimpleTaggerSentence2TokenSequence instance which includes tokens as features iff the supplied argument is true.
Method Summary
- protected java.lang.String makeText(java.lang.String[] in): Returns the first String in the array, or "" if the array has length 0.
- protected java.lang.String[][] parseSentence(java.lang.String sentence): Parses a string representing a sequence of rows of tokens into an array of arrays of tokens.
- Instance pipe(Instance carrier): Takes an instance with data of type String or String[][] and creates an Instance of type TokenSequence.
Methods inherited from class cc.mallet.pipe.Pipe
alphabetsMatch, getAlphabet, getAlphabets, getDataAlphabet, getInstanceId, getTargetAlphabet, instanceFrom, instancesFrom, instancesFrom, isDataAlphabetSet, isTargetProcessing, newIteratorFrom, preceedingPipeDataAlphabetNotification, preceedingPipeTargetAlphabetNotification, precondition, readResolve, setDataAlphabet, setOrCheckDataAlphabet, setOrCheckTargetAlphabet, setTargetAlphabet, setTargetProcessing
Constructor Detail
-
SimpleTaggerSentence2TokenSequence
public SimpleTaggerSentence2TokenSequence()
Creates a new SimpleTaggerSentence2TokenSequence instance. By default, tokens are included as features.
-
SimpleTaggerSentence2TokenSequence
public SimpleTaggerSentence2TokenSequence(boolean inc)
Creates a new SimpleTaggerSentence2TokenSequence instance which includes tokens as features iff the supplied argument is true.
-
-
Method Detail
-
parseSentence
protected java.lang.String[][] parseSentence(java.lang.String sentence)
Parses a string representing a sequence of rows of tokens into an array of arrays of tokens.
- Parameters: sentence - a String
- Returns: the corresponding array of arrays of tokens.
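The row-and-token structure described for parseSentence can be illustrated with a small standalone sketch. This is not MALLET's implementation: the class name ParseSentenceSketch is hypothetical, and the choice to split rows on "\n" and tokens on runs of whitespace is an assumption based only on the "lines of whitespace-separated tokens" wording in the class description.

```java
import java.util.Arrays;

final class ParseSentenceSketch {
    // Hypothetical stand-in for parseSentence (assumed behavior, not the library code):
    // one row per line, one token per whitespace-separated field.
    static String[][] parseSentence(String sentence) {
        String[] lines = sentence.split("\n");
        String[][] tokens = new String[lines.length][];
        for (int i = 0; i < lines.length; i++) {
            // trim() avoids an empty leading token when a line starts with whitespace
            tokens[i] = lines[i].trim().split("\\s+");
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[][] rows = parseSentence("a b\nc d e");
        System.out.println(Arrays.deepToString(rows)); // prints [[a, b], [c, d, e]]
    }
}
```

Under these assumptions, the String encoding "a b\nc d e" and the String[][] encoding {{"a","b"},{"c","d","e"}} describe the same sequence.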
-
makeText
protected java.lang.String makeText(java.lang.String[] in)
Returns the first String in the array, or "" if the array has length 0.
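The contract stated for makeText is small enough to sketch directly. The class name MakeTextSketch below is hypothetical, and the method is a standalone illustration of the documented behavior rather than the library source.

```java
final class MakeTextSketch {
    // Mirrors the documented contract: first element, or "" for an empty array.
    static String makeText(String[] in) {
        return in.length > 0 ? in[0] : "";
    }

    public static void main(String[] args) {
        System.out.println(makeText(new String[] {"a", "b"}));  // prints a
        System.out.println(makeText(new String[0]).isEmpty());  // prints true
    }
}
```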
-
pipe
public Instance pipe(Instance carrier)
Takes an instance with data of type String or String[][] and creates an Instance of type TokenSequence. Each Token in the sequence gets the text of its line and one feature of value 1 for each "Feature" in the line. For example, if the String[][] is {{a,b},{c,d,e}} (and target processing is off), then the text would be "a b" for the first token and "c d e" for the second. Also, the features "a" and "b" would be set for the first token and "c", "d" and "e" for the second. When target processing is on, the last element in the String[] for the current token is taken as the target (label), so in the previous example "b" would have been the label of the first token.
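The feature and label handling described above can be sketched without any MALLET types. Everything named here (PipeSketch, Row, convert) is hypothetical; the sketch only reproduces the documented rule that every field becomes a feature of value 1 and that the last field becomes the label when target processing is on. It ignores the setTokensAsFeatures flag and does not build token text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PipeSketch {
    // Hypothetical holder for one converted row; not a MALLET class.
    static final class Row {
        final Map<String, Double> features = new LinkedHashMap<>();
        String label; // stays null when target processing is off
    }

    // Assumed reading of the description: with target processing on, the last
    // field is the label and the remaining fields are binary features.
    static List<Row> convert(String[][] rows, boolean targetProcessing) {
        List<Row> out = new ArrayList<>();
        for (String[] fields : rows) {
            Row row = new Row();
            int nFeatures = fields.length;
            if (targetProcessing) {
                if (fields.length < 1) {
                    throw new IllegalArgumentException("missing label");
                }
                nFeatures = fields.length - 1;
                row.label = fields[nFeatures];
            }
            for (int f = 0; f < nFeatures; f++) {
                row.features.put(fields[f], 1.0);
            }
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] rows = {{"a", "b"}, {"c", "d", "e"}};
        for (Row r : convert(rows, true)) {
            System.out.println(r.features.keySet() + " -> " + r.label);
        }
        // prints:
        // [a] -> b
        // [c, d] -> e
    }
}
```

Calling convert(rows, false) on the same input leaves every label null and sets features "a", "b" on the first row and "c", "d", "e" on the second, matching the example in the description.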
-
-