Package cc.mallet.topics.tui
Class DMRLoader
java.lang.Object
    cc.mallet.topics.tui.DMRLoader
- All Implemented Interfaces:
java.io.Serializable
public class DMRLoader extends java.lang.Object implements java.io.Serializable
This class loads data into the format used by the MALLET Dirichlet-multinomial regression (DMR) topic model. DMR topic models learn topic assignments conditioned on observed features.
The input consists of two files, one for text and one for features. The "text" file contains one document per line. This class tokenizes the text and removes stopwords.
The "features" file contains whitespace-delimited features in this format:
blue heavy width=12.08
Features without explicit values ("blue" and "heavy" in the example) are set to 1.0.
- See Also:
Serialized Form
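The feature-line convention described above can be illustrated with a small, self-contained sketch. It is not MALLET's own parsing code: the class `FeatureLineSketch` and the method `parseFeatures` are hypothetical names introduced only for this example, and the handling of `name=value` tokens is an assumption based on the format shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of MALLET.
public class FeatureLineSketch {

    // Parses one whitespace-delimited feature line into name -> value pairs.
    // Tokens without "=" get the default value 1.0, as described above.
    static Map<String, Double> parseFeatures(String line) {
        Map<String, Double> features = new LinkedHashMap<>();
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank line: no features
            }
            int eq = token.indexOf('=');
            if (eq < 0) {
                features.put(token, 1.0);
            } else {
                String name = token.substring(0, eq);
                double value = Double.parseDouble(token.substring(eq + 1));
                features.put(name, value);
            }
        }
        return features;
    }

    public static void main(String[] args) {
        // prints {blue=1.0, heavy=1.0, width=12.08}
        System.out.println(parseFeatures("blue heavy width=12.08"));
    }
}
```

A `LinkedHashMap` is used here only so that the printed order matches the order of the input line.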
Constructor Summary
Constructors
DMRLoader()
Method Summary
Modifier and Type    Method
void    load(java.io.File wordsFile, java.io.File featuresFile, java.io.File instancesFile)
static void    main(java.lang.String[] args)
static java.io.BufferedReader    openReader(java.io.File file)
Method Detail
-
openReader
public static java.io.BufferedReader openReader(java.io.File file) throws java.io.IOException
- Throws:
java.io.IOException
-
load
public void load(java.io.File wordsFile, java.io.File featuresFile, java.io.File instancesFile) throws java.io.IOException, java.io.FileNotFoundException
- Throws:
java.io.IOException
java.io.FileNotFoundException
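As a rough sketch of how the two input files for `load` might be prepared, the following example writes a small text file and a matching features file, then checks that they have the same number of lines. It assumes that line N of the features file describes document N of the text file, which the description above does not state explicitly. The class `DmrInputSketch`, its methods, and the file names are hypothetical and not part of MALLET.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of MALLET.
public class DmrInputSketch {

    // Writes the given lines to a file, one per line, in UTF-8.
    static void writeLines(Path file, String... lines) {
        try {
            Files.write(file, Arrays.asList(lines), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumption: the two files are aligned line by line, so they
    // should contain the same number of lines.
    static boolean sameLineCount(Path wordsFile, Path featuresFile) {
        try {
            List<String> docs = Files.readAllLines(wordsFile, StandardCharsets.UTF_8);
            List<String> feats = Files.readAllLines(featuresFile, StandardCharsets.UTF_8);
            return docs.size() == feats.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a two-document example in a temporary directory.
    static boolean demo() {
        try {
            Path dir = Files.createTempDirectory("dmr-sketch");
            Path words = dir.resolve("texts.txt");
            Path features = dir.resolve("features.txt");
            writeLines(words, "the quick brown fox", "a slow green turtle");
            writeLines(features, "blue heavy width=12.08", "green width=3.5");
            // With MALLET on the classpath, the files could then be passed to
            // new DMRLoader().load(words.toFile(), features.toFile(), instancesFile),
            // where instancesFile is the java.io.File given as the third argument.
            return sameLineCount(words, features);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints true
    }
}
```

Checking the line counts before loading is a cheap way to catch a features file that has drifted out of step with the text file.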
-
main
public static void main(java.lang.String[] args) throws java.io.FileNotFoundException, java.io.IOException
- Throws:
java.io.FileNotFoundException
java.io.IOException