Package cc.mallet.topics
Class SimpleLDA
- java.lang.Object
-
- cc.mallet.topics.SimpleLDA
-
- All Implemented Interfaces:
java.io.Serializable
public class SimpleLDA extends java.lang.Object implements java.io.Serializable
A simple implementation of Latent Dirichlet Allocation using Gibbs sampling. This code is slower than the regular Mallet LDA implementation, but provides a better starting place for understanding how sampling works and for building new topic models.
- Author:
- David Mimno, Andrew McCallum
- See Also:
- Serialized Form
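As an illustration of the sampling this class is meant to expose, the following is a minimal sketch of one collapsed Gibbs sweep over a single document. It is not the Mallet source: plain int arrays stand in for FeatureSequence, the class name GibbsSketch is hypothetical, and the update rule is the standard textbook one, (alpha + docTopicCount) * (beta + typeTopicCount) / (betaSum + tokensPerTopic). The relation betaSum = beta * numTypes is an assumption. Array names follow the fields documented below.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical stand-alone sketch; not part of cc.mallet.topics.
final class GibbsSketch {

    /**
     * Resamples the topic of every token in one document, in place.
     * tokens[i] is a word-type index and topics[i] is its current topic.
     */
    static void sampleTopicsForOneDoc(int[] tokens, int[] topics,
                                      int[][] typeTopicCounts, int[] tokensPerTopic,
                                      double alpha, double beta, Random random) {
        int numTopics = tokensPerTopic.length;
        int numTypes = typeTopicCounts.length;
        double betaSum = beta * numTypes; // assumption: symmetric prior over word types

        // Count how many tokens of this document are assigned to each topic.
        int[] oneDocTopicCounts = new int[numTopics];
        for (int topic : topics) {
            oneDocTopicCounts[topic]++;
        }

        double[] scores = new double[numTopics];
        for (int position = 0; position < tokens.length; position++) {
            int type = tokens[position];
            int oldTopic = topics[position];

            // Remove the current token from all counts.
            oneDocTopicCounts[oldTopic]--;
            typeTopicCounts[type][oldTopic]--;
            tokensPerTopic[oldTopic]--;

            // Unnormalized conditional probability of each topic for this token.
            double sum = 0.0;
            for (int topic = 0; topic < numTopics; topic++) {
                scores[topic] = (alpha + oneDocTopicCounts[topic])
                        * (beta + typeTopicCounts[type][topic])
                        / (betaSum + tokensPerTopic[topic]);
                sum += scores[topic];
            }

            // Draw a topic in proportion to its score.
            double u = random.nextDouble() * sum;
            int newTopic = 0;
            while (newTopic < numTopics - 1 && u >= scores[newTopic]) {
                u -= scores[newTopic];
                newTopic++;
            }

            // Add the token back under its new topic.
            topics[position] = newTopic;
            oneDocTopicCounts[newTopic]++;
            typeTopicCounts[type][newTopic]++;
            tokensPerTopic[newTopic]++;
        }
    }

    public static void main(String[] args) {
        int[] tokens = {0, 1, 2, 1, 0};
        int[] topics = {0, 1, 0, 1, 0};
        int[][] typeTopicCounts = new int[3][2];
        int[] tokensPerTopic = new int[2];
        for (int i = 0; i < tokens.length; i++) {
            typeTopicCounts[tokens[i]][topics[i]]++;
            tokensPerTopic[topics[i]]++;
        }
        Random random = new Random(42);
        for (int sweep = 0; sweep < 10; sweep++) {
            sampleTopicsForOneDoc(tokens, topics, typeTopicCounts, tokensPerTopic, 0.5, 0.01, random);
        }
        System.out.println(Arrays.toString(topics));
    }
}
```

Each token is removed from the counts before its scores are computed and added back afterwards, so the count arrays stay consistent with the topic assignments after every step.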
-
-
Field Summary
Fields (Modifier and Type, Field)
protected double alpha
protected Alphabet alphabet
protected double alphaSum
protected double beta
protected double betaSum
protected java.util.ArrayList<TopicAssignment> data
static double DEFAULT_BETA
protected java.text.NumberFormat formatter
protected int numTopics
protected int numTypes
protected int[] oneDocTopicCounts
protected boolean printLogLikelihood
protected Randoms random
int showTopicsInterval
protected int[] tokensPerTopic
protected LabelAlphabet topicAlphabet
protected int[][] typeTopicCounts
int wordsPerTopic
-
Constructor Summary
Constructors
SimpleLDA(int numberOfTopics)
SimpleLDA(int numberOfTopics, double alphaSum, double beta)
SimpleLDA(int numberOfTopics, double alphaSum, double beta, Randoms random)
SimpleLDA(LabelAlphabet topicAlphabet, double alphaSum, double beta, Randoms random)
-
Method Summary
Methods (Modifier and Type, Method)
void addInstances(InstanceList training)
Alphabet getAlphabet()
java.util.ArrayList<TopicAssignment> getData()
int getNumTopics()
LabelAlphabet getTopicAlphabet()
int[] getTopicTotals()
int[][] getTypeTopicCounts()
static void main(java.lang.String[] args)
double modelLogLikelihood()
void printDocumentTopics(java.io.File file, double threshold, int max)
void printState(java.io.File f)
void printState(java.io.PrintStream out)
void sample(int iterations)
protected void sampleTopicsForOneDoc(FeatureSequence tokenSequence, FeatureSequence topicSequence)
void setRandomSeed(int seed)
void setTopicDisplay(int interval, int n)
java.lang.String topWords(int numWords)
void write(java.io.File f)
-
-
-
Field Detail
-
data
protected java.util.ArrayList<TopicAssignment> data
-
alphabet
protected Alphabet alphabet
-
topicAlphabet
protected LabelAlphabet topicAlphabet
-
numTopics
protected int numTopics
-
numTypes
protected int numTypes
-
alpha
protected double alpha
-
alphaSum
protected double alphaSum
-
beta
protected double beta
-
betaSum
protected double betaSum
-
DEFAULT_BETA
public static final double DEFAULT_BETA
- See Also:
- Constant Field Values
-
oneDocTopicCounts
protected int[] oneDocTopicCounts
-
typeTopicCounts
protected int[][] typeTopicCounts
-
tokensPerTopic
protected int[] tokensPerTopic
-
showTopicsInterval
public int showTopicsInterval
-
wordsPerTopic
public int wordsPerTopic
-
random
protected Randoms random
-
formatter
protected java.text.NumberFormat formatter
-
printLogLikelihood
protected boolean printLogLikelihood
-
-
Constructor Detail
-
SimpleLDA
public SimpleLDA(int numberOfTopics)
-
SimpleLDA
public SimpleLDA(int numberOfTopics, double alphaSum, double beta)
-
SimpleLDA
public SimpleLDA(int numberOfTopics, double alphaSum, double beta, Randoms random)
-
SimpleLDA
public SimpleLDA(LabelAlphabet topicAlphabet, double alphaSum, double beta, Randoms random)
-
-
Method Detail
-
getAlphabet
public Alphabet getAlphabet()
-
getTopicAlphabet
public LabelAlphabet getTopicAlphabet()
-
getNumTopics
public int getNumTopics()
-
getData
public java.util.ArrayList<TopicAssignment> getData()
-
setTopicDisplay
public void setTopicDisplay(int interval, int n)
-
setRandomSeed
public void setRandomSeed(int seed)
-
getTypeTopicCounts
public int[][] getTypeTopicCounts()
-
getTopicTotals
public int[] getTopicTotals()
-
addInstances
public void addInstances(InstanceList training)
-
sample
public void sample(int iterations) throws java.io.IOException
- Throws:
java.io.IOException
-
sampleTopicsForOneDoc
protected void sampleTopicsForOneDoc(FeatureSequence tokenSequence, FeatureSequence topicSequence)
-
modelLogLikelihood
public double modelLogLikelihood()
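This page does not say what modelLogLikelihood() computes. One standard quantity for a collapsed Gibbs sampler is the joint log probability of the words and their topic assignments, which is a product of Dirichlet-multinomial terms for topics within documents and for words within topics. The sketch below computes that quantity under symmetric priors. Whether it matches the value returned by this method is an assumption, and the class and method names are hypothetical. It uses the identity log Gamma(x + n) - log Gamma(x) = sum of log(x + i) for i = 0..n-1, so no log-gamma routine is needed.

```java
// Hypothetical stand-alone sketch; not part of cc.mallet.topics.
final class LogLikelihoodSketch {

    /** Returns log Gamma(x + n) - log Gamma(x) for an integer n >= 0. */
    static double logRising(double x, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += Math.log(x + i);
        }
        return sum;
    }

    /**
     * Joint log probability of words and topic assignments.
     * docTopicCounts[d][t] counts tokens of document d assigned to topic t;
     * typeTopicCounts[w][t] counts tokens of word type w assigned to topic t.
     */
    static double logLikelihood(int[][] docTopicCounts, int[][] typeTopicCounts,
                                double alpha, double beta) {
        int numTypes = typeTopicCounts.length;
        int numTopics = typeTopicCounts[0].length;
        double alphaSum = alpha * numTopics; // assumption: symmetric document-topic prior
        double betaSum = beta * numTypes;    // assumption: symmetric topic-word prior

        double logLikelihood = 0.0;

        // Topics within each document.
        for (int[] topicCounts : docTopicCounts) {
            int docLength = 0;
            for (int count : topicCounts) {
                logLikelihood += logRising(alpha, count);
                docLength += count;
            }
            logLikelihood -= logRising(alphaSum, docLength);
        }

        // Words within each topic.
        for (int topic = 0; topic < numTopics; topic++) {
            int tokensInTopic = 0;
            for (int type = 0; type < numTypes; type++) {
                logLikelihood += logRising(beta, typeTopicCounts[type][topic]);
                tokensInTopic += typeTopicCounts[type][topic];
            }
            logLikelihood -= logRising(betaSum, tokensInTopic);
        }
        return logLikelihood;
    }

    public static void main(String[] args) {
        int[][] docTopicCounts = {{1, 0}};
        int[][] typeTopicCounts = {{1, 0}, {0, 0}};
        System.out.println(logLikelihood(docTopicCounts, typeTopicCounts, 0.5, 0.5));
    }
}
```

For a corpus with a single token, two topics, and two word types with alpha = beta = 0.5, the result is log(0.5) + log(0.5): the prior probability of the topic times the prior probability of the word.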
-
topWords
public java.lang.String topWords(int numWords)
-
printDocumentTopics
public void printDocumentTopics(java.io.File file, double threshold, int max) throws java.io.IOException
- Parameters:
file - The filename to print to
threshold - Only print topics with proportion greater than this number
max - Print no more than this many topics
- Throws:
java.io.IOException
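The threshold and max parameters can be read as a filter over the topics of one document, ordered by proportion. The sketch below shows that selection on a plain array of per-topic token counts. It is illustrative only: the class and method names are hypothetical, and computing a proportion as count divided by document length, with no smoothing, is an assumption about how proportions are defined.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not part of cc.mallet.topics.
final class TopicFilterSketch {

    /**
     * Returns the indices of the topics to print for one document:
     * largest proportion first, proportion strictly greater than threshold,
     * and no more than max entries.
     */
    static List<Integer> selectTopics(int[] topicCounts, double threshold, int max) {
        int docLength = 0;
        for (int count : topicCounts) {
            docLength += count;
        }

        // Order topic indices by descending count (ties keep index order).
        List<Integer> order = new ArrayList<>();
        for (int topic = 0; topic < topicCounts.length; topic++) {
            order.add(topic);
        }
        order.sort((a, b) -> Integer.compare(topicCounts[b], topicCounts[a]));

        List<Integer> kept = new ArrayList<>();
        for (int topic : order) {
            if (kept.size() >= max) {
                break; // print no more than max topics
            }
            double proportion = docLength == 0 ? 0.0 : (double) topicCounts[topic] / docLength;
            if (proportion <= threshold) {
                break; // sorted, so every remaining topic is at or below the threshold too
            }
            kept.add(topic);
        }
        return kept;
    }

    public static void main(String[] args) {
        int[] topicCounts = {1, 6, 3};
        System.out.println(selectTopics(topicCounts, 0.2, 5));
    }
}
```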
-
printState
public void printState(java.io.File f) throws java.io.IOException
- Throws:
java.io.IOException
-
printState
public void printState(java.io.PrintStream out)
-
write
public void write(java.io.File f)
-
main
public static void main(java.lang.String[] args) throws java.io.IOException
- Throws:
java.io.IOException
-
-