Package cc.mallet.util
Class DBBulkLoader
java.lang.Object
    cc.mallet.util.DBBulkLoader

public class DBBulkLoader extends java.lang.Object
This class reads through two files (data and metadata), tokenizing metadata for use as a label vector.
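
The paired-file reading described above can be sketched roughly as follows. This is an illustration only and does not use MALLET's API: it assumes each file holds one document per line, that line N of the metadata file belongs to line N of the data file, and that metadata is tokenized on whitespace. The names PairedFileSketch, LabeledDoc, and readPaired are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PairedFileSketch {

    /** One document: its text plus the labels from the matching metadata line. */
    static final class LabeledDoc {
        final String text;
        final List<String> labels;

        LabeledDoc(String text, List<String> labels) {
            this.text = text;
            this.labels = labels;
        }
    }

    /**
     * Reads both sources in lockstep, one line at a time.
     * Assumption: reading stops as soon as either source runs out of lines.
     */
    static List<LabeledDoc> readPaired(Reader data, Reader metadata) throws IOException {
        List<LabeledDoc> docs = new ArrayList<>();
        try (BufferedReader d = new BufferedReader(data);
             BufferedReader m = new BufferedReader(metadata)) {
            String text;
            String meta;
            while ((text = d.readLine()) != null && (meta = m.readLine()) != null) {
                // Assumed tokenization: split the metadata line on whitespace.
                List<String> labels = new ArrayList<>();
                String trimmed = meta.trim();
                if (!trimmed.isEmpty()) {
                    labels.addAll(Arrays.asList(trimmed.split("\\s+")));
                }
                docs.add(new LabeledDoc(text, labels));
            }
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        List<LabeledDoc> docs = readPaired(
                new StringReader("first document\nsecond document\n"),
                new StringReader("news sports\nweather\n"));
        for (LabeledDoc doc : docs) {
            System.out.println(doc.labels + " <- " + doc.text);
        }
    }
}
```

The sketch takes Reader arguments rather than file paths so it can be exercised with in-memory strings; a FileReader could be passed in the same way.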
Field Summary
Modifier and Type                            Field     Description
protected static java.util.logging.Logger    logger

Constructor Summary
Constructor       Description
DBBulkLoader()

Method Summary
Modifier and Type    Method
static void          generateStoplist(SimpleTokenizer prunedTokenizer, NGramPreprocessor preprocessor)
                     Read the data from inputFiles, then write every word that occurs fewer than pruneCount.value times to the pruned word file.
static void          main(java.lang.String[] args)
static void          writeInstanceList(java.util.ArrayList<Pipe> pipes)

Method Detail
generateStoplist
public static void generateStoplist(SimpleTokenizer prunedTokenizer, NGramPreprocessor preprocessor) throws java.io.IOException
Read the data from inputFiles, then write every word that occurs fewer than pruneCount.value times to the pruned word file.
Parameters:
prunedTokenizer - the tokenizer that will be used to write instances
Throws:
java.io.IOException
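
The pruning rule in generateStoplist (collect every word that occurs fewer than a threshold number of times) can be illustrated with a small standalone sketch. It does not call MALLET: the class StoplistSketch, the method rareWords, and the tokenization (lowercase, split on non-letter runs) are assumptions made for the example, as is the choice to count total occurrences rather than the number of documents containing a word. The pruneCount parameter stands in for pruneCount.value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

public class StoplistSketch {

    /** Returns, in sorted order, every word that occurs fewer than pruneCount times. */
    static List<String> rareWords(List<String> documents, int pruneCount) {
        // First pass: count total occurrences of each token across all documents.
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : documents) {
            // Assumed tokenization: lowercase, then split on runs of non-letters.
            for (String token : doc.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        // Second pass: keep the words whose count falls below the threshold.
        TreeSet<String> rare = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() < pruneCount) {
                rare.add(entry.getKey());
            }
        }
        return new ArrayList<>(rare);
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("the cat sat", "the dog sat", "the bird flew");
        // "the" occurs 3 times and "sat" twice, so both survive a threshold of 2.
        System.out.println(rareWords(docs, 2)); // prints [bird, cat, dog, flew]
    }
}
```

In a real run the resulting list would be written to the pruned word file, one word per line, rather than printed.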

writeInstanceList
public static void writeInstanceList(java.util.ArrayList<Pipe> pipes) throws java.lang.Exception
Throws:
java.lang.Exception

main
public static void main(java.lang.String[] args) throws java.lang.Exception
Throws:
java.lang.Exception