Class HierarchicalTokenizationFilter

  • All Implemented Interfaces:
    TokenizationFilter

    public class HierarchicalTokenizationFilter
    extends java.lang.Object
    implements TokenizationFilter
    Tokenization filter that creates nested spans based on a hierarchical labeling of the data. Each label should have the form LBL1[|LBLk]*, that is, one or more field names separated by |, listed from the outermost field to the innermost. For example,
       A   A|B   A|B|C   A|B|C  A|B  A   A
       w1  w2    w3      w4     w5   w6  w7
     
    will result in LabeledSpans like <A>w1 <B>w2 <C>w3 w4</C> w5</B> w6 w7</A>. In addition, a label of the form <B-field> forces a new instance of the field to begin, even if that field is already active. The prefix I- is ignored, so BIO labeling can be used.
    Created: Nov 12, 2004
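The labeling scheme described above can be illustrated with a small standalone sketch. This is not the MALLET implementation and does not use any MALLET classes: the class name NestedSpanSketch and the methods render and stripPrefix are hypothetical, and the output is a plain tagged string rather than LabeledSpan objects. It assumes that a field stays open while consecutive tokens share the same leading fields, that a B- prefix always starts a new instance, and that an I- prefix is simply stripped.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical helper, not part of MALLET.
public class NestedSpanSketch {

    /** Renders words and their hierarchical labels as nested, tag-delimited spans. */
    public static String render(String[] labels, String[] words) {
        List<String> active = new ArrayList<>(); // currently open fields, outermost first
        StringBuilder out = new StringBuilder();

        for (int t = 0; t < words.length; t++) {
            String[] parts = labels[t].split("\\|");

            // Count how many leading fields remain open for this token.
            // A B- prefix forces a new instance, so it ends the shared prefix.
            int keep = 0;
            while (keep < parts.length && keep < active.size()
                    && !parts[keep].startsWith("B-")
                    && active.get(keep).equals(stripPrefix(parts[keep]))) {
                keep++;
            }

            // Close fields that are no longer active, innermost first.
            for (int i = active.size() - 1; i >= keep; i--) {
                out.append("</").append(active.remove(i)).append('>');
            }
            if (t > 0) {
                out.append(' ');
            }

            // Open the fields that begin at this token.
            for (int i = keep; i < parts.length; i++) {
                String field = stripPrefix(parts[i]);
                active.add(field);
                out.append('<').append(field).append('>');
            }
            out.append(words[t]);
        }

        // Close whatever is still open at the end of the sequence.
        for (int i = active.size() - 1; i >= 0; i--) {
            out.append("</").append(active.get(i)).append('>');
        }
        return out.toString();
    }

    /** Drops a leading B- or I- so that BIO-style labels map to plain field names. */
    private static String stripPrefix(String label) {
        return (label.startsWith("B-") || label.startsWith("I-")) ? label.substring(2) : label;
    }

    public static void main(String[] args) {
        String[] labels = {"A", "A|B", "A|B|C", "A|B|C", "A|B", "A", "A"};
        String[] words = {"w1", "w2", "w3", "w4", "w5", "w6", "w7"};
        // Prints: <A>w1 <B>w2 <C>w3 w4</C> w5</B> w6 w7</A>
        System.out.println(render(labels, words));
    }
}
```

Under the same assumptions, the labels B-A, B-A produce two separate A spans, while B-A, I-A produce a single A span covering both tokens.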
    Version:
    $Id: HierarchicalTokenizationFilter.java,v 1.1 2007/10/22 21:37:44 mccallum Exp $
    Author:
    • Constructor Detail

      • HierarchicalTokenizationFilter

        public HierarchicalTokenizationFilter()
      • HierarchicalTokenizationFilter

        public HierarchicalTokenizationFilter(java.util.regex.Pattern ignorePattern)