Package cc.mallet.extract
Class HierarchicalTokenizationFilter
- java.lang.Object
  - cc.mallet.extract.HierarchicalTokenizationFilter
- All Implemented Interfaces:
TokenizationFilter
public class HierarchicalTokenizationFilter extends java.lang.Object implements TokenizationFilter
Tokenization filter that creates nested spans based on a hierarchical labeling of the data. The labels should be of the form LBL1[|LBLk]*. For example, the labeling

A    A|B    A|B|C    A|B|C    A|B    A    A
w1   w2     w3       w4       w5     w6   w7

will result in LabeledSpans like

<A>w1 <B>w2 <C>w3 w4</C> w5</B> w6 w7</A>

Also, labels of the form <B-field> force a new instance of the field to begin, even if that field is already active. The prefix I- is ignored, so BIO labeling can be used.

Created: Nov 12, 2004
- Version:
  $Id: HierarchicalTokenizationFilter.java,v 1.1 2007/10/22 21:37:44 mccallum Exp $
- Author:
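The nesting rule described above can be illustrated with a small standalone sketch. It is not the MALLET implementation and uses no MALLET classes: the class name NestedSpanSketch, the render method, and the background label "O" are hypothetical, and the sketch assumes that the B- and I- prefixes are attached to individual field names inside a label. It renders the spans as inline tags rather than building LabeledSpans.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of cc.mallet.extract.
class NestedSpanSketch {

    // Drops an "I-" or "B-" prefix from a single field name (assumed prefix placement).
    static String strip(String part) {
        return (part.startsWith("I-") || part.startsWith("B-")) ? part.substring(2) : part;
    }

    // Renders tokens with inline tags that show the nested spans implied by the labels.
    // labels[i] has the form LBL1[|LBLk]*; the background label means "no field active".
    static String render(String[] labels, String[] tokens, String background) {
        List<String> active = new ArrayList<>(); // open fields, outermost first
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            String[] parts = labels[i].equals(background)
                    ? new String[0]
                    : labels[i].split("\\|");

            // Count how many of the currently open fields continue on this token.
            int keep = 0;
            while (keep < active.size() && keep < parts.length) {
                String p = parts[keep];
                if (p.startsWith("B-")) break;               // B- forces a new instance
                if (!strip(p).equals(active.get(keep))) break;
                keep++;
            }

            // Close fields that do not continue, innermost first.
            for (int j = active.size() - 1; j >= keep; j--) {
                out.append("</").append(active.remove(j)).append('>');
            }
            if (i > 0) out.append(' ');

            // Open the fields that start on this token, outermost first.
            for (int j = keep; j < parts.length; j++) {
                String name = strip(parts[j]);
                active.add(name);
                out.append('<').append(name).append('>');
            }
            out.append(tokens[i]);
        }
        // Close whatever is still open at the end of the sequence.
        for (int j = active.size() - 1; j >= 0; j--) {
            out.append("</").append(active.remove(j)).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] labels = {"A", "A|B", "A|B|C", "A|B|C", "A|B", "A", "A"};
        String[] tokens = {"w1", "w2", "w3", "w4", "w5", "w6", "w7"};
        // prints: <A>w1 <B>w2 <C>w3 w4</C> w5</B> w6 w7</A>
        System.out.println(render(labels, tokens, "O"));
    }
}
```

Under these assumptions, the label sequence A, B-A yields two separate A spans, while B-A, I-A yields a single A span covering both tokens.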
Constructor Summary
Constructors
- HierarchicalTokenizationFilter()
- HierarchicalTokenizationFilter(java.util.regex.Pattern ignorePattern)
Method Summary
- Modifier and Type: LabeledSpans
- Method: constructLabeledSpans(LabelAlphabet dict, java.lang.Object document, Label backgroundTag, Tokenization input, Sequence seq)
- Description: Converts the sequence of labels into a set of labeled spans.
Method Detail
constructLabeledSpans
public LabeledSpans constructLabeledSpans(LabelAlphabet dict, java.lang.Object document, Label backgroundTag, Tokenization input, Sequence seq)
Description copied from interface: TokenizationFilter
Converts the sequence of labels into a set of labeled spans. Essentially, this converts the output of sequence labeling into an extraction output.
- Specified by:
  constructLabeledSpans in interface TokenizationFilter
- Returns: