Package cc.mallet.pipe
Class TokenSequenceRemoveStopwords
- java.lang.Object
  - cc.mallet.pipe.Pipe
    - cc.mallet.pipe.TokenSequenceRemoveStopwords
- All Implemented Interfaces:
AlphabetCarrying, java.io.Serializable
public class TokenSequenceRemoveStopwords extends Pipe implements java.io.Serializable
Remove tokens from the token sequence in the data field whose text is in the stopword list.
- Author:
- Andrew McCallum mccallum@cs.umass.edu
- See Also:
- Serialized Form
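The filtering behaviour described above can be sketched in plain Java. This is an illustration only, not MALLET's implementation: the class StopwordFilterSketch and its filter method are hypothetical names, tokens are modelled as plain strings rather than MALLET tokens, and the sketch assumes that case-insensitive matching means lower-casing each token before looking it up in a stoplist whose entries are already lower-case. The markDeletions option is not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for illustration; not part of MALLET.
class StopwordFilterSketch {
    private final Set<String> stoplist = new HashSet<>();
    private final boolean caseSensitive;

    // Assumption: stoplist entries are supplied in lower case.
    StopwordFilterSketch(boolean caseSensitive, String... words) {
        this.caseSensitive = caseSensitive;
        stoplist.addAll(Arrays.asList(words));
    }

    // Returns the tokens whose text is not in the stoplist, in original order.
    List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            // Assumed semantics: when not case sensitive, compare the lower-cased text.
            String key = caseSensitive ? token : token.toLowerCase(Locale.ROOT);
            if (!stoplist.contains(key)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

Under these assumptions, a case-insensitive filter with the stoplist {"the", "of"} drops both "The" and "of", while a case-sensitive one keeps "The".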
-
Field Summary
Fields (Modifier and Type, Field):
- static java.lang.String[] stopwordsDutch
-
Constructor Summary
Constructors (Constructor, Description):
- TokenSequenceRemoveStopwords()
- TokenSequenceRemoveStopwords(boolean caseSensitive)
- TokenSequenceRemoveStopwords(boolean caseSensitive, boolean markDeletions)
- TokenSequenceRemoveStopwords(java.io.File stoplistFile, java.lang.String encoding, boolean includeDefault, boolean caseSensitive, boolean markDeletions)
  Load a stoplist from a file.
- TokenSequenceRemoveStopwords(java.io.InputStream stoplistStream, java.lang.String encoding, boolean includeDefault, boolean caseSensitive, boolean markDeletions)
-
Method Summary
All Methods, Instance Methods, Concrete Methods (Modifier and Type, Method, Description):
- TokenSequenceRemoveStopwords addStopWords(java.io.File wordlist)
  Add whitespace-separated tokens in file "wordlist" to the stoplist.
- TokenSequenceRemoveStopwords addStopWords(java.lang.String[] words)
- Instance pipe(Instance carrier)
  Really this should be 'protected', but isn't for historical reasons.
- TokenSequenceRemoveStopwords removeStopWords(java.io.File wordlist)
  Remove whitespace-separated tokens in file "wordlist" from the stoplist.
- TokenSequenceRemoveStopwords removeStopWords(java.lang.String[] words)
- TokenSequenceRemoveStopwords setCaseSensitive(boolean flag)
- TokenSequenceRemoveStopwords setMarkDeletions(boolean flag)
-
Methods inherited from class cc.mallet.pipe.Pipe
alphabetsMatch, getAlphabet, getAlphabets, getDataAlphabet, getInstanceId, getTargetAlphabet, instanceFrom, instancesFrom, instancesFrom, isDataAlphabetSet, isTargetProcessing, newIteratorFrom, preceedingPipeDataAlphabetNotification, preceedingPipeTargetAlphabetNotification, precondition, readResolve, setDataAlphabet, setOrCheckDataAlphabet, setOrCheckTargetAlphabet, setTargetAlphabet, setTargetProcessing
-
Constructor Detail
-
TokenSequenceRemoveStopwords
public TokenSequenceRemoveStopwords(boolean caseSensitive, boolean markDeletions)
-
TokenSequenceRemoveStopwords
public TokenSequenceRemoveStopwords(boolean caseSensitive)
-
TokenSequenceRemoveStopwords
public TokenSequenceRemoveStopwords()
-
TokenSequenceRemoveStopwords
public TokenSequenceRemoveStopwords(java.io.File stoplistFile, java.lang.String encoding, boolean includeDefault, boolean caseSensitive, boolean markDeletions)
Load a stoplist from a file.
- Parameters:
stoplistFile - The file to load
encoding - The encoding of the stoplist file (e.g. UTF-8)
includeDefault - Whether to include the standard MALLET English stoplist
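The role of the encoding parameter can be illustrated with a small stand-alone reader. This is a sketch under stated assumptions, not the constructor's actual code: StoplistReaderSketch and readStoplist are hypothetical names, and the whitespace-separated file format is assumed here because that is how addStopWords(java.io.File) describes its input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of MALLET.
class StoplistReaderSketch {
    // Decodes the stream with the named encoding and collects
    // whitespace-separated words (assumed format) in reading order.
    static Set<String> readStoplist(InputStream in, String encoding) {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, Charset.forName(encoding)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {  // skip blank lines
                        words.add(word);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }
}
```

Passing the wrong encoding name would decode non-ASCII stopwords into different strings, so they would no longer match the token text; that is why the encoding is an explicit argument in this sketch.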
-
TokenSequenceRemoveStopwords
public TokenSequenceRemoveStopwords(java.io.InputStream stoplistStream, java.lang.String encoding, boolean includeDefault, boolean caseSensitive, boolean markDeletions)
-
Method Detail
-
setCaseSensitive
public TokenSequenceRemoveStopwords setCaseSensitive(boolean flag)
-
setMarkDeletions
public TokenSequenceRemoveStopwords setMarkDeletions(boolean flag)
-
addStopWords
public TokenSequenceRemoveStopwords addStopWords(java.lang.String[] words)
-
removeStopWords
public TokenSequenceRemoveStopwords removeStopWords(java.lang.String[] words)
-
removeStopWords
public TokenSequenceRemoveStopwords removeStopWords(java.io.File wordlist)
Remove whitespace-separated tokens in file "wordlist" from the stoplist.
-
addStopWords
public TokenSequenceRemoveStopwords addStopWords(java.io.File wordlist)
Add whitespace-separated tokens in file "wordlist" to the stoplist.
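The setters and the add/remove methods all declare TokenSequenceRemoveStopwords as their return type, which suggests they are meant to be chained. The sketch below shows that pattern on a hypothetical stand-in class (FluentStoplistSketch and isStopWord are illustrative names, not MALLET API), assuming each method returns the receiver.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for illustration; not part of MALLET.
class FluentStoplistSketch {
    private final Set<String> stoplist = new HashSet<>();

    // Adds the given words and returns this object so calls can be chained.
    FluentStoplistSketch addStopWords(String[] words) {
        stoplist.addAll(Arrays.asList(words));
        return this;
    }

    // Removes the given words from the stoplist and returns this object.
    FluentStoplistSketch removeStopWords(String[] words) {
        stoplist.removeAll(Arrays.asList(words));
        return this;
    }

    boolean isStopWord(String word) {
        return stoplist.contains(word);
    }
}
```

With this shape, a call such as new FluentStoplistSketch().addStopWords(new String[] {"a", "b"}).removeStopWords(new String[] {"b"}) leaves only "a" on the list.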
-