Package cc.mallet.pipe
Class SGML2TokenSequence
- java.lang.Object
  - cc.mallet.pipe.Pipe
    - cc.mallet.pipe.SGML2TokenSequence
- All Implemented Interfaces: AlphabetCarrying, java.io.Serializable
public class SGML2TokenSequence extends Pipe implements java.io.Serializable
Converts a string containing simple SGML tags into a data TokenSequence of words, paired with a target TokenSequence containing the SGML tag in effect for each word. It does not handle nested SGML tags, and it does not handle malformed SGML gracefully.
- Author:
- Andrew McCallum mccallum@cs.umass.edu
- See Also:
- Serialized Form
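The pairing of words with tags described above can be illustrated with a small standalone sketch. It does not use MALLET and is not the class's actual implementation: the names SgmlTagSketch, Tagged, and tokenize are hypothetical, the whitespace-delimited word pattern is an assumption standing in for a lexer, and "O" is only an example background tag. Like the class, it assumes flat, well-formed tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of MALLET.
public class SgmlTagSketch {

    /** Parallel lists: tags.get(i) is the tag in effect for words.get(i). */
    public static final class Tagged {
        public final List<String> words = new ArrayList<>();
        public final List<String> tags = new ArrayList<>();
    }

    // Matches <NAME> or </NAME>; group 1 is "/" for a closing tag, group 2 is the name.
    private static final Pattern TAG = Pattern.compile("<(/?)([^>]*)>");

    // Assumption: a word is any run of non-whitespace characters.
    private static final Pattern WORD = Pattern.compile("\\S+");

    public static Tagged tokenize(String text, String backgroundTag) {
        Tagged out = new Tagged();
        String current = backgroundTag; // tag in effect outside any SGML element
        int pos = 0;
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            // Words before this tag belong to the tag currently in effect.
            addWords(text.substring(pos, m.start()), current, out);
            // An opening tag takes effect; a closing tag restores the background tag.
            current = m.group(1).isEmpty() ? m.group(2) : backgroundTag;
            pos = m.end();
        }
        addWords(text.substring(pos), current, out);
        return out;
    }

    private static void addWords(String segment, String tag, Tagged out) {
        Matcher w = WORD.matcher(segment);
        while (w.find()) {
            out.words.add(w.group());
            out.tags.add(tag);
        }
    }

    public static void main(String[] args) {
        Tagged t = tokenize("Meet <PER>Jane Doe</PER> in <LOC>Boston</LOC> today", "O");
        for (int i = 0; i < t.words.size(); i++) {
            System.out.println(t.words.get(i) + "\t" + t.tags.get(i));
        }
    }
}
```

Because a closing tag simply restores the background tag, nested tags lose the outer tag once the inner one closes, which is one way to see why a flat scheme like this cannot represent nesting.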
Constructor Summary
Constructors
- SGML2TokenSequence()
- SGML2TokenSequence(CharSequenceLexer lexer, java.lang.String backgroundTag)
- SGML2TokenSequence(CharSequenceLexer lexer, java.lang.String backgroundTag, boolean saveSource)
- SGML2TokenSequence(java.lang.String regex, java.lang.String backgroundTag)
-
Method Summary
Methods
- static void main(java.lang.String[] args)
- Instance pipe(Instance carrier): Really this should be 'protected', but isn't for historical reasons.
-
Methods inherited from class cc.mallet.pipe.Pipe
alphabetsMatch, getAlphabet, getAlphabets, getDataAlphabet, getInstanceId, getTargetAlphabet, instanceFrom, instancesFrom, instancesFrom, isDataAlphabetSet, isTargetProcessing, newIteratorFrom, preceedingPipeDataAlphabetNotification, preceedingPipeTargetAlphabetNotification, precondition, readResolve, setDataAlphabet, setOrCheckDataAlphabet, setOrCheckTargetAlphabet, setTargetAlphabet, setTargetProcessing
Constructor Detail
-
SGML2TokenSequence
public SGML2TokenSequence(CharSequenceLexer lexer, java.lang.String backgroundTag, boolean saveSource)
-
SGML2TokenSequence
public SGML2TokenSequence(CharSequenceLexer lexer, java.lang.String backgroundTag)
-
SGML2TokenSequence
public SGML2TokenSequence(java.lang.String regex, java.lang.String backgroundTag)
-
SGML2TokenSequence
public SGML2TokenSequence()