Class SGML2TokenSequence

  • All Implemented Interfaces:
    AlphabetCarrying, java.io.Serializable

    public class SGML2TokenSequence
    extends Pipe
    implements java.io.Serializable
    Converts a string containing simple SGML tags into a data TokenSequence of words, paired with a target TokenSequence containing, for each word, the SGML tag in effect for that word. It does not handle nested SGML tags, nor does it gracefully handle malformed SGML.
    Author:
    Andrew McCallum mccallum@cs.umass.edu
    See Also:
    Serialized Form
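The conversion described above can be sketched in plain Java. This is a hypothetical stand-alone illustration, not MALLET's source: the class name SgmlTagSketch, the tag pattern </?([^>]*)>, the whitespace token regex \S+ and the background tag "O" are all assumptions chosen for the example. Like the class, the sketch tracks only the most recent tag, so it does not handle nesting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-alone sketch; not the MALLET implementation. */
public class SgmlTagSketch {
    // Assumed tag syntax: an opening <TAG> or closing </TAG>; group 1 is the tag name.
    private static final Pattern TAG = Pattern.compile("</?([^>]*)>");

    /** Parallel lists: words.get(i) is labeled with tags.get(i). */
    public static final class Result {
        public final List<String> words = new ArrayList<>();
        public final List<String> tags = new ArrayList<>();
    }

    public static Result convert(String text, String tokenRegex, String backgroundTag) {
        Pattern token = Pattern.compile(tokenRegex);
        Result result = new Result();
        String current = backgroundTag;   // tag in effect for the current run of text
        int textStart = 0;
        Matcher m = TAG.matcher(text);
        while (true) {
            boolean found = m.find();
            int textEnd = found ? m.start() : text.length();
            // Tokenize the text between the previous tag (or the start) and this tag.
            Matcher t = token.matcher(text.substring(textStart, textEnd));
            while (t.find()) {
                result.words.add(t.group());
                result.tags.add(current);
            }
            if (!found) {
                break;
            }
            // A closing tag reverts to the background tag; an opening tag replaces
            // the current one. No stack is kept, so nesting information is lost.
            current = m.group().startsWith("</") ? backgroundTag : m.group(1);
            textStart = m.end();
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = convert("Call <PER>Ada Lovelace</PER> today", "\\S+", "O");
        for (int i = 0; i < r.words.size(); i++) {
            System.out.println(r.words.get(i) + "\t" + r.tags.get(i));
        }
    }
}
```

With nested input such as <A>x <B>y</B> z</A>, the sketch's closing </B> reverts to the background tag rather than to A, so z is labeled O. That is one plausible way the nesting limitation shows up in practice.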
    • Constructor Detail

      • SGML2TokenSequence

        public SGML2TokenSequence(CharSequenceLexer lexer,
                                  java.lang.String backgroundTag,
                                  boolean saveSource)
      • SGML2TokenSequence

        public SGML2TokenSequence(CharSequenceLexer lexer,
                                  java.lang.String backgroundTag)
      • SGML2TokenSequence

        public SGML2TokenSequence(java.lang.String regex,
                                  java.lang.String backgroundTag)
      • SGML2TokenSequence

        public SGML2TokenSequence()
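The constructors take either a CharSequenceLexer or a regex, together with a backgroundTag. Assuming the regex describes what a token looks like (so tokens are the regex matches, not the gaps between them), the choice of pattern decides how the text between tags is split into words. The hypothetical helper TokenRegexSketch below uses only java.util.regex rather than CharSequenceLexer and compares two example patterns on the same text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; illustrates token regexes, not MALLET's lexer. */
public class TokenRegexSketch {
    /** Returns every match of tokenRegex in text, in order of appearance. */
    public static List<String> tokens(String text, String tokenRegex) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(tokenRegex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "U.S.-based firm, est. 1999";
        // Whitespace-delimited tokens: punctuation stays attached to words.
        System.out.println(tokens(text, "\\S+"));
        // Runs of letters or runs of digits: punctuation is dropped and "U.S.-based" splits apart.
        System.out.println(tokens(text, "\\p{L}+|\\p{N}+"));
    }
}
```

The first pattern yields [U.S.-based, firm,, est., 1999] and the second [U, S, based, firm, est, 1999]; in the tagged setting, every one of those tokens would receive the same tag as its surrounding span.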
    • Method Detail

      • pipe

        public Instance pipe(Instance carrier)
        Description copied from class: Pipe
        Really this should be 'protected', but isn't for historical reasons.
        Overrides:
        pipe in class Pipe
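pipe takes an Instance carrier and returns an Instance. Going by the class description, the carrier's data starts as a string and ends up as a sequence of words, with the target holding one tag per word. The sketch below models that contract with a hypothetical Carrier class that has only data and target fields; it is not MALLET's Instance, and returning the same mutated object is an assumption made so that stages can be chained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CarrierPipeSketch {
    /** Hypothetical stand-in for an Instance: only data and target. */
    public static final class Carrier {
        public Object data;
        public Object target;
        public Carrier(Object data) { this.data = data; }
    }

    private static final Pattern TAG = Pattern.compile("</?([^>]*)>");  // assumed tag syntax
    private static final Pattern WORD = Pattern.compile("\\S+");        // assumed token regex

    /** Replaces carrier.data with words and sets carrier.target to their tags. */
    public static Carrier pipe(Carrier carrier, String backgroundTag) {
        if (!(carrier.data instanceof CharSequence)) {
            throw new IllegalArgumentException("data must be a CharSequence");
        }
        String text = carrier.data.toString();
        List<String> words = new ArrayList<>();
        List<String> tags = new ArrayList<>();
        String current = backgroundTag;
        int start = 0;
        Matcher m = TAG.matcher(text);
        boolean found;
        do {
            found = m.find();
            int end = found ? m.start() : text.length();
            Matcher w = WORD.matcher(text.substring(start, end));
            while (w.find()) {
                words.add(w.group());
                tags.add(current);
            }
            if (found) {
                // Flat tags only: a closing tag always reverts to the background tag.
                current = m.group().startsWith("</") ? backgroundTag : m.group(1);
                start = m.end();
            }
        } while (found);
        carrier.data = words;    // the input text is replaced by its tokens
        carrier.target = tags;   // one tag per token, in the same order
        return carrier;          // same object, so a caller can pass it to the next stage
    }

    public static void main(String[] args) {
        Carrier c = pipe(new Carrier("<LOC>Amherst</LOC> is in Massachusetts"), "O");
        System.out.println(c.data + " / " + c.target);
    }
}
```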
      • main

        public static void main(java.lang.String[] args)