Class TokenSequenceMatchDataAndTarget

  • All Implemented Interfaces:
    AlphabetCarrying, java.io.Serializable

    public class TokenSequenceMatchDataAndTarget
    extends Pipe
    implements java.io.Serializable
    Runs a regular expression over the text of each token, replaces the token's text with the substring captured by one regex group (dataGroup), and creates a target TokenSequence from the text captured by another regex group (targetGroup).

    For example, if you have a data file with one token per line, and each line also carries that token's label, you can first build a TokenSequence in which the Token.getText() of each token is the full text of one line, then run this pipe to separate the target information from the data information. To process the following,

             BACKGROUND Then
             PERSON Mr.
             PERSON Smith
             BACKGROUND said
             ...
             
    use new TokenSequenceMatchDataAndTarget (Pattern.compile ("([A-Z]+) (.*)"), 2, 1). Here group 2 (the word) is kept as the token text, and group 1 (the label) becomes the target.
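    The group selection described above can be sketched with java.util.regex alone, without MALLET on the classpath. This is a minimal illustration, not the pipe's actual implementation: the class MatchDataAndTargetSketch and its split method are hypothetical names, plain string lists stand in for TokenSequence, and skipping lines that do not fully match the pattern is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; not part of MALLET.
class MatchDataAndTargetSketch {

    // Returns two parallel lists: index 0 holds the data strings,
    // index 1 holds the target strings.
    static List<List<String>> split(List<String> lines, Pattern regex,
                                    int dataGroup, int targetGroup) {
        List<String> data = new ArrayList<>();
        List<String> target = new ArrayList<>();
        for (String line : lines) {
            Matcher m = regex.matcher(line);
            // Assumption of this sketch: the whole line must match,
            // and lines that do not match are skipped.
            if (m.matches()) {
                data.add(m.group(dataGroup));
                target.add(m.group(targetGroup));
            }
        }
        List<List<String>> result = new ArrayList<>();
        result.add(data);
        result.add(target);
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "BACKGROUND Then", "PERSON Mr.", "PERSON Smith", "BACKGROUND said");
        // Same pattern and group numbers as in the example above:
        // group 2 is the data, group 1 is the target.
        List<List<String>> result =
                split(lines, Pattern.compile("([A-Z]+) (.*)"), 2, 1);
        System.out.println("data:   " + result.get(0));
        // prints: data:   [Then, Mr., Smith, said]
        System.out.println("target: " + result.get(1));
        // prints: target: [BACKGROUND, PERSON, PERSON, BACKGROUND]
    }
}
```

    Note the argument order: dataGroup comes before targetGroup, so the label group (1) is passed last even though it appears first in the pattern.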
    Author:
    Andrew McCallum mccallum@cs.umass.edu
    See Also:
    Serialized Form
    • Constructor Detail

      • TokenSequenceMatchDataAndTarget

        public TokenSequenceMatchDataAndTarget(java.util.regex.Pattern regex,
                                               int dataGroup,
                                               int targetGroup)
      • TokenSequenceMatchDataAndTarget

        public TokenSequenceMatchDataAndTarget(java.lang.String regex,
                                               int dataGroup,
                                               int targetGroup)
    • Method Detail

      • pipe

        public Instance pipe(Instance carrier)
        Description copied from class: Pipe
        Really this should be 'protected', but isn't for historical reasons.
        Overrides:
        pipe in class Pipe