Class SimpleTaggerSentence2TokenSequence

  • All Implemented Interfaces:
    AlphabetCarrying, java.io.Serializable
    Direct Known Subclasses:
    SimpleTaggerSentence2StringTokenization

    public class SimpleTaggerSentence2TokenSequence
    extends Pipe
    Converts an external encoding of a sequence of elements with binary features to a TokenSequence. If target processing is on (training or labeled test data), it extracts element labels from the external encoding to create a target LabelSequence. Two external encodings are supported:
    1. A String containing lines of whitespace-separated tokens.
    2. A String[][].

    Both represent rows of tokens. When target processing is on, the last token in each row is the label of the sequence element that the row represents. All other tokens in the row (or all tokens in the row, when target processing is off) are the names of features that are on for that sequence element.

    See Also:
    Serialized Form
    • Field Detail

      • setTokensAsFeatures

        protected boolean setTokensAsFeatures
    • Constructor Detail

      • SimpleTaggerSentence2TokenSequence

        public SimpleTaggerSentence2TokenSequence()
        Creates a new SimpleTaggerSentence2TokenSequence instance. By default, tokens are included as features.
      • SimpleTaggerSentence2TokenSequence

        public SimpleTaggerSentence2TokenSequence(boolean inc)
        Creates a new SimpleTaggerSentence2TokenSequence instance that includes tokens as features if and only if the supplied argument is true.
    • Method Detail

      • parseSentence

        protected java.lang.String[][] parseSentence(java.lang.String sentence)
        Parses a string representing a sequence of rows of tokens into an array of arrays of tokens.
        Parameters:
        sentence - a String
        Returns:
        the corresponding array of arrays of tokens.
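        The parsing step described above can be sketched in plain Java without depending on the library. This is a minimal illustration, not the class's actual implementation: the class name ParseSentenceSketch, the method name parse, the choice to split rows on line breaks and tokens on runs of whitespace, and the skipping of blank lines are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the parsing step; not the library's code.
class ParseSentenceSketch {

    // Splits the input into rows (one per non-blank line) and each row
    // into whitespace-separated tokens.
    static String[][] parse(String sentence) {
        List<String[]> rows = new ArrayList<>();
        for (String line : sentence.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: a blank line does not produce a row
            }
            rows.add(trimmed.split("\\s+"));
        }
        return rows.toArray(new String[0][]);
    }

    public static void main(String[] args) {
        // Two rows: the first with two tokens, the second with three.
        String[][] rows = parse("a b\nc d e\n");
        for (String[] row : rows) {
            System.out.println(String.join(" ", row));
        }
    }
}
```

        Running main prints the two rows "a b" and "c d e", one per line, which corresponds to the String[][] {{a,b},{c,d,e}}.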
      • makeText

        protected java.lang.String makeText(java.lang.String[] in)
        Returns the first String in the array, or "" if the array has length 0.
      • pipe

        public Instance pipe(Instance carrier)
        Takes an Instance with data of type String or String[][] and produces an Instance whose data is a TokenSequence. Each Token in the sequence gets the text of its line and one feature of value 1 for each feature named in the line. For example, if the String[][] is {{a,b},{c,d,e}} and target processing is off, the text would be "a b" for the first token and "c d e" for the second, and the features "a" and "b" would be set for the first token and "c", "d" and "e" for the second. When target processing is on, the last element in the String[] for the current token is taken as the target (label) instead, so in the same example "b" would be the label of the first token.
        Overrides:
        pipe in class Pipe
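        The {{a,b},{c,d,e}} example can be traced with a small standalone sketch. It is not the library's code and does not use Instance, Token, or TokenSequence: the class PipeRowsSketch, its fields, and the use of plain maps and lists are hypothetical stand-ins, and rejecting an empty row when target processing is on is an assumption. Token text is left out; only the features and labels are modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of how rows map to features and labels.
class PipeRowsSketch {

    // One feature map per row; every feature that is "on" has value 1.0.
    final List<Map<String, Double>> features = new ArrayList<>();
    // One label per row when target processing is on, otherwise empty.
    final List<String> labels = new ArrayList<>();

    PipeRowsSketch(String[][] rows, boolean targetProcessing) {
        for (String[] row : rows) {
            int nFeatures = row.length;
            if (targetProcessing) {
                if (row.length < 1) {
                    // assumption: a row without a label is an error
                    throw new IllegalArgumentException("Missing label");
                }
                nFeatures = row.length - 1;   // last token is the label
                labels.add(row[nFeatures]);
            }
            Map<String, Double> on = new LinkedHashMap<>();
            for (int f = 0; f < nFeatures; f++) {
                on.put(row[f], 1.0);
            }
            features.add(on);
        }
    }

    public static void main(String[] args) {
        String[][] rows = {{"a", "b"}, {"c", "d", "e"}};
        PipeRowsSketch off = new PipeRowsSketch(rows, false);
        System.out.println(off.features + " " + off.labels);
        PipeRowsSketch on = new PipeRowsSketch(rows, true);
        System.out.println(on.features + " " + on.labels);
    }
}
```

        With target processing off, every token in a row becomes a feature and no labels are collected. With it on, the first row contributes the feature "a" with label "b", and the second row contributes the features "c" and "d" with label "e".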