Package cc.mallet.pipe
Class TokenSequenceMatchDataAndTarget
- java.lang.Object
  - cc.mallet.pipe.Pipe
    - cc.mallet.pipe.TokenSequenceMatchDataAndTarget
- All Implemented Interfaces:
- AlphabetCarrying, java.io.Serializable
public class TokenSequenceMatchDataAndTarget extends Pipe implements java.io.Serializable
Runs a regular expression over the text of each token, replaces the token's text with the substring matched by one regex group, and creates a target TokenSequence from the text matched by another regex group. This is useful when a data file contains one token per line and the label also appears on that line: first build a TokenSequence in which each token's Token.getText() is the text of one line, then run this pipe to separate the target information from the data information. For example, to process the following,
  BACKGROUND Then
  PERSON Mr.
  PERSON Smith
  BACKGROUND said
  ...
use new TokenSequenceMatchDataAndTarget(Pattern.compile("([A-Z]+) (.*)"), 2, 1).
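The group arguments in the example above can be illustrated with plain java.util.regex, without MALLET. The following is a minimal sketch, not the library's implementation: the class name MatchDataAndTargetSketch and its split helper are hypothetical, and the choices to require a full match with Matcher.matches() and to skip tokens that do not match are assumptions made for illustration. It shows that with dataGroup 2 and targetGroup 1, the label becomes the target and the rest of the line becomes the data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone sketch (not MALLET code): splits each token's text into a
// data part and a target part using two groups of one regular expression.
class MatchDataAndTargetSketch {

    /**
     * Returns one {data, target} pair for every token text that the regex
     * matches in full. Skipping non-matching texts is an assumption of
     * this sketch, not documented behavior of the pipe.
     */
    static List<String[]> split(Pattern regex, int dataGroup, int targetGroup,
                                List<String> tokenTexts) {
        List<String[]> pairs = new ArrayList<>();
        for (String text : tokenTexts) {
            Matcher m = regex.matcher(text);
            if (m.matches()) {
                pairs.add(new String[] { m.group(dataGroup), m.group(targetGroup) });
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Same pattern and group numbers as the example in the class description.
        Pattern regex = Pattern.compile("([A-Z]+) (.*)");
        List<String> lines = Arrays.asList(
                "BACKGROUND Then", "PERSON Mr.", "PERSON Smith", "BACKGROUND said");
        for (String[] pair : split(regex, 2, 1, lines)) {
            System.out.println("data=" + pair[0] + " target=" + pair[1]);
        }
    }
}
```

Running main prints one line per input, starting with data=Then target=BACKGROUND. Swapping the two group numbers would swap the roles, so the label would end up as the data and the word as the target.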
- Author:
- Andrew McCallum mccallum@cs.umass.edu
- See Also:
- Serialized Form
Constructor Summary
Constructors
- TokenSequenceMatchDataAndTarget(java.lang.String regex, int dataGroup, int targetGroup)
- TokenSequenceMatchDataAndTarget(java.util.regex.Pattern regex, int dataGroup, int targetGroup)
Method Summary
Modifier and Type    Method                    Description
Instance             pipe(Instance carrier)    Really this should be 'protected', but isn't for historical reasons.
Methods inherited from class cc.mallet.pipe.Pipe
alphabetsMatch, getAlphabet, getAlphabets, getDataAlphabet, getInstanceId, getTargetAlphabet, instanceFrom, instancesFrom, instancesFrom, isDataAlphabetSet, isTargetProcessing, newIteratorFrom, preceedingPipeDataAlphabetNotification, preceedingPipeTargetAlphabetNotification, precondition, readResolve, setDataAlphabet, setOrCheckDataAlphabet, setOrCheckTargetAlphabet, setTargetAlphabet, setTargetProcessing