Package cc.mallet.util
Class DBBulkLoader
- java.lang.Object
  - cc.mallet.util.DBBulkLoader
public class DBBulkLoader extends java.lang.Object
This class reads through two files (data and metadata), tokenizing metadata for use as a label vector.
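The pairing of a data file with a metadata file can be pictured with a minimal sketch in plain Java. It is not DBBulkLoader's actual implementation: it assumes the two files are aligned line by line and that metadata is whitespace-delimited, and the names `PairedReaderSketch`, `LabeledText`, and `readPairs` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PairedReaderSketch {

    // Hypothetical container: one line of data plus the label tokens from its metadata line.
    static final class LabeledText {
        final String text;
        final List<String> labels;

        LabeledText(String text, List<String> labels) {
            this.text = text;
            this.labels = labels;
        }
    }

    // Assumption: line i of the metadata source describes line i of the data source.
    static List<LabeledText> readPairs(BufferedReader data, BufferedReader metadata)
            throws IOException {
        List<LabeledText> pairs = new ArrayList<>();
        String dataLine;
        while ((dataLine = data.readLine()) != null) {
            String metaLine = metadata.readLine();
            if (metaLine == null) {
                throw new IOException("metadata has fewer lines than data");
            }
            // Assumption: metadata tokens are separated by whitespace.
            String trimmed = metaLine.trim();
            List<String> labels = trimmed.isEmpty()
                    ? new ArrayList<String>()
                    : Arrays.asList(trimmed.split("\\s+"));
            pairs.add(new LabeledText(dataLine, labels));
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader data = new BufferedReader(new StringReader("first doc\nsecond doc\n"));
        BufferedReader meta = new BufferedReader(new StringReader("sports 2019\nnews\n"));
        for (LabeledText pair : readPairs(data, meta)) {
            System.out.println(pair.labels + " -> " + pair.text);
        }
    }
}
```

In a real MALLET pipeline the label tokens would be turned into a label vector by pipes rather than kept as strings; the sketch only shows the pairing step.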
-
-
Field Summary
protected static java.util.logging.Logger logger
-
Constructor Summary
DBBulkLoader()
-
Method Summary
static void generateStoplist(SimpleTokenizer prunedTokenizer, NGramPreprocessor preprocessor)
Read the data from inputFiles, then write every word that occurs fewer than pruneCount.value times to the pruned word file.
static void main(java.lang.String[] args)
static void writeInstanceList(java.util.ArrayList<Pipe> pipes)
-
-
-
Method Detail
-
generateStoplist
public static void generateStoplist(SimpleTokenizer prunedTokenizer, NGramPreprocessor preprocessor) throws java.io.IOException
Read the data from inputFiles, then write every word that occurs fewer than pruneCount.value times to the pruned word file.
- Parameters:
prunedTokenizer - the tokenizer that will be used to write instances
- Throws:
java.io.IOException
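The pruning rule described for generateStoplist can be illustrated with a small, self-contained sketch. It does not use MALLET's SimpleTokenizer or NGramPreprocessor. It assumes whitespace tokenization and that the threshold applies to total occurrences across all input (the source does not say whether counts are per document), and the names `StoplistSketch` and `rareWords` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StoplistSketch {

    // Returns, in sorted order, every word whose total count is below pruneCount.
    static List<String> rareWords(List<String> documents, int pruneCount) {
        // TreeMap keeps the words sorted so the output order is deterministic.
        Map<String, Integer> counts = new TreeMap<>();
        for (String doc : documents) {
            // Assumption: tokens are separated by whitespace.
            for (String token : doc.split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        List<String> rare = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() < pruneCount) {
                rare.add(entry.getKey());
            }
        }
        return rare;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("the cat sat", "the dog sat", "the bird flew");
        // With pruneCount = 2, words seen only once are pruned.
        System.out.println(rareWords(docs, 2)); // prints [bird, cat, dog, flew]
    }
}
```

The returned list plays the role of the pruned word file's contents; writing it to disk (for example with `java.nio.file.Files.write`) is omitted to keep the sketch short.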
-
writeInstanceList
public static void writeInstanceList(java.util.ArrayList<Pipe> pipes) throws java.lang.Exception
- Throws:
java.lang.Exception
-
main
public static void main(java.lang.String[] args) throws java.lang.Exception
- Throws:
java.lang.Exception