Class TokenSequenceRemoveStopwords

  • All Implemented Interfaces:
    AlphabetCarrying, java.io.Serializable

    public class TokenSequenceRemoveStopwords
    extends Pipe
    implements java.io.Serializable
    Removes from the token sequence in the data field every token whose text appears in the stopword list.
    Author:
    Andrew McCallum mccallum@cs.umass.edu
    See Also:
    Serialized Form
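The filtering described above can be illustrated with a minimal, self-contained sketch. It does not use MALLET classes and is not the library's implementation: the class `StopwordFilterSketch`, the method `removeStopwords`, and the three-word stoplist are hypothetical names and data chosen for illustration. It also assumes that the stoplist entries are stored in lower case and that case-insensitive matching lower-cases each token before lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not part of MALLET.
class StopwordFilterSketch {

    // Returns the tokens whose text is not in the stoplist, in original order.
    // Assumption: stoplist entries are lower case, so a case-insensitive
    // lookup lower-cases the token text first.
    static List<String> removeStopwords(List<String> tokens,
                                        Set<String> stoplist,
                                        boolean caseSensitive) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String key = caseSensitive ? token : token.toLowerCase(Locale.ROOT);
            if (!stoplist.contains(key)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stoplist = new HashSet<>(Arrays.asList("the", "of", "a"));
        List<String> tokens = Arrays.asList("The", "history", "of", "a", "city");

        // Case-insensitive: "The" matches "the" and is removed.
        System.out.println(removeStopwords(tokens, stoplist, false)); // [history, city]

        // Case-sensitive: "The" does not match "the" and is kept.
        System.out.println(removeStopwords(tokens, stoplist, true));  // [The, history, city]
    }
}
```

In this sketch the surviving tokens are returned unchanged, so a case-insensitive match affects only the lookup, not the text of the tokens that remain.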
    • Field Detail

      • stopwordsDutch

        public static final java.lang.String[] stopwordsDutch
    • Constructor Detail

      • TokenSequenceRemoveStopwords

        public TokenSequenceRemoveStopwords(boolean caseSensitive,
                                            boolean markDeletions)
      • TokenSequenceRemoveStopwords

        public TokenSequenceRemoveStopwords(boolean caseSensitive)
      • TokenSequenceRemoveStopwords

        public TokenSequenceRemoveStopwords()
      • TokenSequenceRemoveStopwords

        public TokenSequenceRemoveStopwords(java.io.File stoplistFile,
                                            java.lang.String encoding,
                                            boolean includeDefault,
                                            boolean caseSensitive,
                                            boolean markDeletions)
        Load a stoplist from a file.
        Parameters:
        stoplistFile - The file to load
        encoding - The encoding of the stoplist file (e.g. UTF-8)
        includeDefault - Whether to include the standard MALLET English stoplist
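How the stoplistFile, encoding, and includeDefault parameters might interact can be sketched as follows. This is a stand-alone illustration rather than MALLET code: `StoplistLoaderSketch`, `loadStoplist`, and the three-word `DEFAULT_WORDS` array are hypothetical stand-ins (the real default English stoplist is far longer), and the whitespace-separated file format is an assumption.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not part of MALLET.
class StoplistLoaderSketch {

    // Placeholder for a built-in default list (assumption: three words only).
    static final String[] DEFAULT_WORDS = {"a", "an", "the"};

    // Reads whitespace-separated words from the file using the given encoding
    // and optionally merges them with the placeholder default list.
    static Set<String> loadStoplist(File stoplistFile,
                                    String encoding,
                                    boolean includeDefault) throws IOException {
        Set<String> stoplist = new HashSet<>();
        if (includeDefault) {
            stoplist.addAll(Arrays.asList(DEFAULT_WORDS));
        }
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(stoplistFile), encoding))) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {   // skip blank lines
                        stoplist.add(word);
                    }
                }
            }
        }
        return stoplist;
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary stoplist so the example runs on its own.
        File file = File.createTempFile("stoplist", ".txt");
        file.deleteOnExit();
        Files.write(file.toPath(), "de het\neen\n".getBytes(StandardCharsets.UTF_8));

        System.out.println(loadStoplist(file, "UTF-8", false).size()); // 3
        System.out.println(loadStoplist(file, "UTF-8", true).size());  // 6
    }
}
```

Passing the encoding explicitly matters for stoplists with non-ASCII words, since the platform default charset may differ from the one the file was saved in.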
      • TokenSequenceRemoveStopwords

        public TokenSequenceRemoveStopwords(java.io.InputStream stoplistStream,
                                            java.lang.String encoding,
                                            boolean includeDefault,
                                            boolean caseSensitive,
                                            boolean markDeletions)