Package cc.mallet.topics
Class PolylingualTopicModel
java.lang.Object
    cc.mallet.topics.PolylingualTopicModel

All Implemented Interfaces:
java.io.Serializable
public class PolylingualTopicModel extends java.lang.Object implements java.io.Serializable

Latent Dirichlet Allocation for loosely parallel corpora in arbitrary languages

Author:
David Mimno, Andrew McCallum
See Also:
Serialized Form
-
-
Nested Class Summary
Modifier and Type, Class:
class PolylingualTopicModel.TopicAssignment
-
Field Summary
Modifier and Type, Field:
protected double[] alpha
protected Alphabet[] alphabets
protected double alphaSum
protected double[] betas
protected double[] betaSums
int burninPeriod
protected java.util.ArrayList<PolylingualTopicModel.TopicAssignment> data
static double DEFAULT_BETA
protected int[] docLengthCounts
protected java.text.NumberFormat formatter
protected int iterationsSoFar
protected double[][] languageCachedCoefficients
protected int[] languageMaxTypeCounts
protected double[] languageSmoothingOnlyMasses
protected int[][] languageTokensPerTopic
protected int[][][] languageTypeTopicCounts
protected java.lang.String modelFilename
int numIterations
protected int numStopwords
protected int numTopics
protected int[] oneDocTopicCounts
int optimizeInterval
protected boolean printLogLikelihood
protected Randoms random
protected int saveModelInterval
int saveSampleInterval
protected int saveStateInterval
int showTopicsInterval
protected java.lang.String stateFilename
protected LabelAlphabet topicAlphabet
protected int topicBits
protected int[][] topicDocCounts
protected int topicMask
protected int[] vocabularySizes
int wordsPerTopic
-
Constructor Summary
Constructor:
PolylingualTopicModel(int numberOfTopics)
PolylingualTopicModel(int numberOfTopics, double alphaSum)
PolylingualTopicModel(int numberOfTopics, double alphaSum, Randoms random)
PolylingualTopicModel(LabelAlphabet topicAlphabet, double alphaSum, Randoms random)
-
Method Summary
Modifier and Type, Method, Description:
void addInstances(InstanceList[] training)
void estimate()
void estimate(int iterationsThisRound)
java.util.ArrayList<PolylingualTopicModel.TopicAssignment> getData()
TopicInferencer getInferencer(int language)
    Return a tool for estimating topic distributions for new documents
int getNumTopics()
MarginalProbEstimator getProbEstimator(int language)
    Return a tool for estimating the marginal probability of new documents
LabelAlphabet getTopicAlphabet()
void loadTestingIDs(java.io.File testingIDFile)
static void main(java.lang.String[] args)
double modelLogLikelihood()
void optimizeBetas()
void printDocumentTopics(java.io.File f)
void printDocumentTopics(java.io.PrintWriter pw)
void printDocumentTopics(java.io.PrintWriter pw, double threshold, int max)
void printState(java.io.File f)
void printState(java.io.PrintStream out)
void printTopWords(java.io.File file, int numWords, boolean useNewLines)
void printTopWords(java.io.PrintStream out, int numWords, boolean usingNewLines)
static PolylingualTopicModel read(java.io.File f)
protected void sampleTopicsForOneDoc(PolylingualTopicModel.TopicAssignment topicAssignment, boolean shouldSaveState)
void setBurninPeriod(int burninPeriod)
void setModelOutput(int interval, java.lang.String filename)
void setNumIterations(int numIterations)
void setOptimizeInterval(int interval)
void setRandomSeed(int seed)
void setSaveState(int interval, java.lang.String filename)
    Define how often and where to save the state
void setTopicDisplay(int interval, int n)
void write(java.io.File serializedModelFile)
-
-
-
Field Detail
-
data
protected java.util.ArrayList<PolylingualTopicModel.TopicAssignment> data
-
topicAlphabet
protected LabelAlphabet topicAlphabet
-
numStopwords
protected int numStopwords
-
numTopics
protected int numTopics
-
topicMask
protected int topicMask
-
topicBits
protected int topicBits
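This page does not describe topicMask and topicBits. One plausible reading, assumed here and not confirmed by this page, is that they support packing a count and a topic index into a single int: topicBits is the number of low-order bits reserved for the topic index, and topicMask is the all-ones mask over those bits. The class and method names below are hypothetical and are not part of MALLET; the sketch only illustrates how such a pair of values could be derived from numTopics and used.

```java
final class TopicBitsSketch {

    /** Smallest all-ones mask that covers every topic index in [0, numTopics). */
    static int topicMask(int numTopics) {
        if (Integer.bitCount(numTopics) == 1) {
            // Exact power of two: indices 0 .. numTopics-1 fit in numTopics-1.
            return numTopics - 1;
        }
        // Otherwise round up to the next power of two, minus one.
        return Integer.highestOneBit(numTopics) * 2 - 1;
    }

    /** Number of low-order bits the mask occupies. */
    static int topicBits(int numTopics) {
        return Integer.bitCount(topicMask(numTopics));
    }

    /** Pack a non-negative count above the topic index (assumed layout). */
    static int pack(int count, int topic, int topicBits) {
        return (count << topicBits) + topic;
    }

    /** Recover the topic index from a packed value. */
    static int topicOf(int packed, int topicMask) {
        return packed & topicMask;
    }

    /** Recover the count from a packed value. */
    static int countOf(int packed, int topicBits) {
        return packed >> topicBits;
    }

    public static void main(String[] args) {
        int numTopics = 50;
        int mask = topicMask(numTopics);
        int bits = topicBits(numTopics);
        int packed = pack(7, 49, bits);
        System.out.println("mask=" + mask + " bits=" + bits
                + " topic=" + topicOf(packed, mask)
                + " count=" + countOf(packed, bits));
    }
}
```

Under this layout the largest count that can be stored shrinks as the number of topics grows, since both values share one 32-bit int.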
-
alphabets
protected Alphabet[] alphabets
-
vocabularySizes
protected int[] vocabularySizes
-
alpha
protected double[] alpha
-
alphaSum
protected double alphaSum
-
betas
protected double[] betas
-
betaSums
protected double[] betaSums
-
languageMaxTypeCounts
protected int[] languageMaxTypeCounts
-
DEFAULT_BETA
public static final double DEFAULT_BETA
See Also:
Constant Field Values
-
languageSmoothingOnlyMasses
protected double[] languageSmoothingOnlyMasses
-
languageCachedCoefficients
protected double[][] languageCachedCoefficients
-
oneDocTopicCounts
protected int[] oneDocTopicCounts
-
languageTypeTopicCounts
protected int[][][] languageTypeTopicCounts
-
languageTokensPerTopic
protected int[][] languageTokensPerTopic
-
docLengthCounts
protected int[] docLengthCounts
-
topicDocCounts
protected int[][] topicDocCounts
-
iterationsSoFar
protected int iterationsSoFar
-
numIterations
public int numIterations
-
burninPeriod
public int burninPeriod
-
saveSampleInterval
public int saveSampleInterval
-
optimizeInterval
public int optimizeInterval
-
showTopicsInterval
public int showTopicsInterval
-
wordsPerTopic
public int wordsPerTopic
-
saveModelInterval
protected int saveModelInterval
-
modelFilename
protected java.lang.String modelFilename
-
saveStateInterval
protected int saveStateInterval
-
stateFilename
protected java.lang.String stateFilename
-
random
protected Randoms random
-
formatter
protected java.text.NumberFormat formatter
-
printLogLikelihood
protected boolean printLogLikelihood
-
-
Constructor Detail
-
PolylingualTopicModel
public PolylingualTopicModel(int numberOfTopics)
-
PolylingualTopicModel
public PolylingualTopicModel(int numberOfTopics, double alphaSum)
-
PolylingualTopicModel
public PolylingualTopicModel(int numberOfTopics, double alphaSum, Randoms random)
-
PolylingualTopicModel
public PolylingualTopicModel(LabelAlphabet topicAlphabet, double alphaSum, Randoms random)
-
-
Method Detail
-
loadTestingIDs
public void loadTestingIDs(java.io.File testingIDFile) throws java.io.IOException
Throws:
java.io.IOException
-
getTopicAlphabet
public LabelAlphabet getTopicAlphabet()
-
getNumTopics
public int getNumTopics()
-
getData
public java.util.ArrayList<PolylingualTopicModel.TopicAssignment> getData()
-
setNumIterations
public void setNumIterations(int numIterations)
-
setBurninPeriod
public void setBurninPeriod(int burninPeriod)
-
setTopicDisplay
public void setTopicDisplay(int interval, int n)
-
setRandomSeed
public void setRandomSeed(int seed)
-
setOptimizeInterval
public void setOptimizeInterval(int interval)
-
setModelOutput
public void setModelOutput(int interval, java.lang.String filename)
-
setSaveState
public void setSaveState(int interval, java.lang.String filename)
Define how often and where to save the state
Parameters:
interval - Save a copy of the state every interval iterations.
filename - Save the state to this file, with the iteration number as a suffix
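The two setSaveState parameters can be illustrated with a small standalone sketch. It assumes that a save happens whenever the iteration count is a multiple of the interval, that an interval of 0 disables saving, and that the suffix is joined to the filename with a dot; none of these details is stated on this page. The class and method names are hypothetical and are not part of MALLET.

```java
final class SaveStateSketch {

    /** True when a state file would be written at this iteration (assumed rule). */
    static boolean shouldSave(int iteration, int interval) {
        // Assumption: a non-positive interval means "never save".
        return interval > 0 && iteration % interval == 0;
    }

    /** File name for the state saved at this iteration (assumed "." separator). */
    static String stateFileFor(String filename, int iteration) {
        return filename + "." + iteration;
    }

    public static void main(String[] args) {
        int interval = 100;
        String filename = "state.gz"; // hypothetical file name
        for (int iteration = 1; iteration <= 300; iteration++) {
            if (shouldSave(iteration, interval)) {
                System.out.println(stateFileFor(filename, iteration));
            }
        }
    }
}
```

Because the iteration number is appended to the name, each saved state goes to its own file and earlier states are not overwritten.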
-
addInstances
public void addInstances(InstanceList[] training)
-
estimate
public void estimate() throws java.io.IOException
Throws:
java.io.IOException
-
estimate
public void estimate(int iterationsThisRound) throws java.io.IOException
Throws:
java.io.IOException
-
optimizeBetas
public void optimizeBetas()
-
sampleTopicsForOneDoc
protected void sampleTopicsForOneDoc(PolylingualTopicModel.TopicAssignment topicAssignment, boolean shouldSaveState)
-
printTopWords
public void printTopWords(java.io.File file, int numWords, boolean useNewLines) throws java.io.IOException
Throws:
java.io.IOException
-
printTopWords
public void printTopWords(java.io.PrintStream out, int numWords, boolean usingNewLines)
-
printDocumentTopics
public void printDocumentTopics(java.io.File f) throws java.io.IOException
Throws:
java.io.IOException
-
printDocumentTopics
public void printDocumentTopics(java.io.PrintWriter pw)
-
printDocumentTopics
public void printDocumentTopics(java.io.PrintWriter pw, double threshold, int max)
Parameters:
pw - A print writer
threshold - Only print topics with proportion greater than this number
max - Print no more than this many topics
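The effect of threshold and max can be sketched on a single document's topic proportions. The sketch assumes topics are considered in descending order of proportion, which this page does not state; the class and method names are hypothetical and are not part of MALLET.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DocTopicFilterSketch {

    /**
     * Topic indices that would be printed for one document: at most max
     * topics, each with proportion strictly greater than threshold,
     * highest proportion first (the ordering is an assumption).
     */
    static List<Integer> topicsToPrint(double[] proportions, double threshold, int max) {
        Integer[] order = new Integer[proportions.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort topic indices by descending proportion.
        Arrays.sort(order, (a, b) -> Double.compare(proportions[b], proportions[a]));

        List<Integer> kept = new ArrayList<>();
        for (int topic : order) {
            // Stop at the cap, or once proportions fall to the threshold or below.
            if (kept.size() >= max || proportions[topic] <= threshold) {
                break;
            }
            kept.add(topic);
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] proportions = {0.10, 0.60, 0.05, 0.25};
        System.out.println(topicsToPrint(proportions, 0.08, 2));
    }
}
```

With the proportions above, a threshold of 0.08 excludes topic 2, and a max of 2 then keeps only the two largest remaining topics.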
-
printState
public void printState(java.io.File f) throws java.io.IOException
Throws:
java.io.IOException
-
printState
public void printState(java.io.PrintStream out)
-
modelLogLikelihood
public double modelLogLikelihood()
-
getInferencer
public TopicInferencer getInferencer(int language)
Return a tool for estimating topic distributions for new documents
-
getProbEstimator
public MarginalProbEstimator getProbEstimator(int language)
Return a tool for estimating the marginal probability of new documents
-
write
public void write(java.io.File serializedModelFile)
-
read
public static PolylingualTopicModel read(java.io.File f) throws java.lang.Exception
Throws:
java.lang.Exception
-
main
public static void main(java.lang.String[] args) throws java.io.IOException
Throws:
java.io.IOException
-
-