Package cc.mallet.topics
Class NPTopicModel
- java.lang.Object
  - cc.mallet.topics.NPTopicModel
- All Implemented Interfaces:
java.io.Serializable
public class NPTopicModel extends java.lang.Object implements java.io.Serializable
A non-parametric topic model that uses the "minimal path" assumption to reduce bookkeeping.
- Author:
- David Mimno
- See Also:
- Serialized Form
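One reading of the "minimal path" assumption, offered as a sketch rather than as this class's confirmed implementation: each document contributes at most one count to a topic's global total, no matter how many of its tokens use that topic. The global statistics then reduce to the number of documents using each topic (compare the docsPerTopic and totalDocTopics fields), and they change only when a document's local count for a topic moves between zero and one. The class MinimalPathCounts and its methods are hypothetical names, and a plain HashMap stands in for the HPPC IntIntHashMap.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of MALLET: minimal-path bookkeeping for a corpus. */
public class MinimalPathCounts {
    // topic -> number of documents with at least one token assigned to that topic
    private final Map<Integer, Integer> docsPerTopic = new HashMap<>();
    // sum of docsPerTopic over all topics
    private int totalDocTopics = 0;

    /** Record that one token of a document is assigned to the given topic. */
    public void addToken(int[] localTopicCounts, int topic) {
        if (localTopicCounts[topic] == 0) {
            // First token of this document on this topic:
            // the only moment the global counts change.
            docsPerTopic.merge(topic, 1, Integer::sum);
            totalDocTopics++;
        }
        localTopicCounts[topic]++;
    }

    /** Record that one token of a document leaves the given topic. */
    public void removeToken(int[] localTopicCounts, int topic) {
        if (localTopicCounts[topic] == 0) {
            throw new IllegalStateException("document has no tokens on topic " + topic);
        }
        localTopicCounts[topic]--;
        if (localTopicCounts[topic] == 0) {
            // Last token of this document on this topic: undo the single global count.
            int remaining = docsPerTopic.merge(topic, -1, Integer::sum);
            if (remaining == 0) {
                docsPerTopic.remove(topic);
            }
            totalDocTopics--;
        }
    }

    public int docsUsing(int topic) {
        return docsPerTopic.getOrDefault(topic, 0);
    }

    public int totalDocTopics() {
        return totalDocTopics;
    }

    public static void main(String[] args) {
        MinimalPathCounts counts = new MinimalPathCounts();
        int[] docA = new int[3]; // local topic counts for one document
        int[] docB = new int[3];
        counts.addToken(docA, 1);
        counts.addToken(docA, 1); // same document, same topic: global counts unchanged
        counts.addToken(docB, 1);
        System.out.println("documents using topic 1: " + counts.docsUsing(1));
        System.out.println("total document-topic pairs: " + counts.totalDocTopics());
    }
}
```

Under this scheme the per-token cost of maintaining global statistics is a single comparison against zero, which is one way to understand the reduced bookkeeping the description refers to.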
-
-
Field Summary
Fields
Modifier and Type                                 Field                 Description
protected double                                  alpha
protected Alphabet                                alphabet
protected double                                  beta
protected double                                  betaSum
protected java.util.ArrayList<TopicAssignment>    data
static double                                     DEFAULT_BETA
protected com.carrotsearch.hppc.IntIntHashMap     docsPerTopic
protected java.text.NumberFormat                  formatter
protected double                                  gamma
protected int                                     maxTopic
protected int                                     numTopics
protected int                                     numTypes
protected boolean                                 printLogLikelihood
protected Randoms                                 random
int                                               showTopicsInterval
protected com.carrotsearch.hppc.IntIntHashMap     tokensPerTopic
protected LabelAlphabet                           topicAlphabet
protected int                                     totalDocTopics
protected com.carrotsearch.hppc.IntIntHashMap[]   typeTopicCounts
int                                               wordsPerTopic
-
Constructor Summary
Constructors
Constructor                                              Description
NPTopicModel(double alpha, double gamma, double beta)
-
Method Summary
Methods
Modifier and Type   Method                                                                                Description
void                addInstances(InstanceList training, int initialTopics)
static void         main(java.lang.String[] args)
void                printState(java.io.File f)
void                printState(java.io.PrintStream out)
void                sample(int iterations)
protected void      sampleTopicsForOneDoc(FeatureSequence tokenSequence, FeatureSequence topicSequence)
void                setRandomSeed(int seed)
void                setTopicDisplay(int interval, int n)
java.lang.String    topWords(int numWords)
-
-
-
Field Detail
-
data
protected java.util.ArrayList<TopicAssignment> data
-
alphabet
protected Alphabet alphabet
-
topicAlphabet
protected LabelAlphabet topicAlphabet
-
maxTopic
protected int maxTopic
-
numTopics
protected int numTopics
-
numTypes
protected int numTypes
-
alpha
protected double alpha
-
gamma
protected double gamma
-
beta
protected double beta
-
betaSum
protected double betaSum
-
DEFAULT_BETA
public static final double DEFAULT_BETA
- See Also:
- Constant Field Values
-
typeTopicCounts
protected com.carrotsearch.hppc.IntIntHashMap[] typeTopicCounts
-
tokensPerTopic
protected com.carrotsearch.hppc.IntIntHashMap tokensPerTopic
-
docsPerTopic
protected com.carrotsearch.hppc.IntIntHashMap docsPerTopic
-
totalDocTopics
protected int totalDocTopics
-
showTopicsInterval
public int showTopicsInterval
-
wordsPerTopic
public int wordsPerTopic
-
random
protected Randoms random
-
formatter
protected java.text.NumberFormat formatter
-
printLogLikelihood
protected boolean printLogLikelihood
-
-
Constructor Detail
-
NPTopicModel
public NPTopicModel(double alpha, double gamma, double beta)
- Parameters:
alpha - this parameter balances the local document topic counts with the global distribution over topics.
gamma - this parameter is the weight on a completely new, never-before-seen topic in the global distribution.
beta - this parameter controls the variability of the topic-word distributions.
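To make the roles of the three parameters concrete, the following sketch computes unnormalized sampling weights for a single token under a common minimal-path formulation of a hierarchical Dirichlet process topic model. The formulas are an assumption chosen to match the parameter descriptions above, not a transcription of sampleTopicsForOneDoc. The class TopicWeightSketch and all of its method and argument names are hypothetical.

```java
/** Hypothetical illustration, not part of MALLET. */
public class TopicWeightSketch {

    /**
     * Assumed weight of an existing topic for one token of one document.
     *
     * localCount     tokens in this document currently assigned to the topic
     * docsForTopic   documents that use the topic at least once
     * totalDocTopics sum of docsForTopic over all topics
     * typeTopicCount times this word type is assigned to the topic, corpus-wide
     * tokensForTopic all tokens assigned to the topic, corpus-wide
     */
    public static double existingTopicWeight(double alpha, double gamma, double beta,
            int numTypes, int localCount, int docsForTopic, int totalDocTopics,
            int typeTopicCount, int tokensForTopic) {
        double betaSum = beta * numTypes;
        // Share of the global distribution held by this topic; gamma reserves
        // the remaining mass for a never-before-seen topic.
        double globalShare = docsForTopic / (totalDocTopics + gamma);
        // alpha balances the document's own counts against the global share.
        double docPart = localCount + alpha * globalShare;
        // beta smooths the topic's word distribution.
        double wordPart = (typeTopicCount + beta) / (tokensForTopic + betaSum);
        return docPart * wordPart;
    }

    /** Assumed weight of opening a completely new topic for the token. */
    public static double newTopicWeight(double alpha, double gamma,
            int numTypes, int totalDocTopics) {
        double globalShare = gamma / (totalDocTopics + gamma);
        // An empty topic has no counts, so the smoothed word probability
        // (0 + beta) / (0 + beta * numTypes) reduces to 1 / numTypes.
        return alpha * globalShare / numTypes;
    }

    public static void main(String[] args) {
        double existing = existingTopicWeight(1.0, 1.0, 0.01, 1000, 3, 5, 9, 2, 100);
        double fresh = newTopicWeight(1.0, 1.0, 1000, 9);
        System.out.println("existing topic weight: " + existing);
        System.out.println("new topic weight:      " + fresh);
    }
}
```

In this formulation a larger alpha pulls each document toward the global topic distribution, a larger gamma makes new topics more likely, and a larger beta flattens each topic's distribution over word types.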
-
-
Method Detail
-
setTopicDisplay
public void setTopicDisplay(int interval, int n)
-
setRandomSeed
public void setRandomSeed(int seed)
-
addInstances
public void addInstances(InstanceList training, int initialTopics)
-
sample
public void sample(int iterations) throws java.io.IOException
- Throws:
java.io.IOException
-
sampleTopicsForOneDoc
protected void sampleTopicsForOneDoc(FeatureSequence tokenSequence, FeatureSequence topicSequence)
-
topWords
public java.lang.String topWords(int numWords)
-
printState
public void printState(java.io.File f) throws java.io.IOException
- Throws:
java.io.IOException
-
printState
public void printState(java.io.PrintStream out)
-
main
public static void main(java.lang.String[] args) throws java.io.IOException
- Throws:
java.io.IOException
-
-