Package cc.mallet.topics
Class TopicModelDiagnostics
java.lang.Object
    cc.mallet.topics.TopicModelDiagnostics
public class TopicModelDiagnostics extends java.lang.Object
-
-
Nested Class Summary
Nested Classes
Modifier and Type    Class                                Description
class                TopicModelDiagnostics.TopicScores
-
Field Summary
Fields
Modifier and Type    Field                      Description
static double[]      DEFAULT_DOC_PROPORTIONS
static int           FIFTY_PERCENT_INDEX
static int           TWO_PERCENT_INDEX
-
Constructor Summary
Constructors
Constructor                                                         Description
TopicModelDiagnostics(ParallelTopicModel model, int numTopWords)
-
Method Summary
-
-
-
Field Detail
-
TWO_PERCENT_INDEX
public static final int TWO_PERCENT_INDEX
- See Also:
- Constant Field Values
-
FIFTY_PERCENT_INDEX
public static final int FIFTY_PERCENT_INDEX
- See Also:
- Constant Field Values
-
DEFAULT_DOC_PROPORTIONS
public static final double[] DEFAULT_DOC_PROPORTIONS
-
-
Constructor Detail
-
TopicModelDiagnostics
public TopicModelDiagnostics(ParallelTopicModel model, int numTopWords)
-
-
Method Detail
-
collectDocumentStatistics
public void collectDocumentStatistics()
-
getCodocumentMatrix
public int[][] getCodocumentMatrix(int topic)
-
getTokensPerTopic
public TopicModelDiagnostics.TopicScores getTokensPerTopic(int[] tokensPerTopic)
-
getDocumentEntropy
public TopicModelDiagnostics.TopicScores getDocumentEntropy(int[] tokensPerTopic)
-
getDistanceFromUniform
public TopicModelDiagnostics.TopicScores getDistanceFromUniform()
-
getEffectiveNumberOfWords
public TopicModelDiagnostics.TopicScores getEffectiveNumberOfWords()
-
getDistanceFromCorpus
public TopicModelDiagnostics.TopicScores getDistanceFromCorpus()
Low-quality topics may be very similar to the global distribution.
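One way to make "similar to the global distribution" concrete is the Kullback-Leibler divergence between a topic's word distribution and the corpus-wide word distribution. The sketch below is illustrative only: the class name DistanceFromCorpusSketch, the method klDivergence, and the toy probabilities are assumptions made for this example, and the formula is not confirmed to be the one this method uses internally.

```java
public class DistanceFromCorpusSketch {

    /**
     * KL divergence D(topic || corpus) in nats.
     * Assumes both arrays are probability distributions over the same vocabulary.
     * Words with zero topic probability contribute nothing to the sum.
     */
    static double klDivergence(double[] topic, double[] corpus) {
        if (topic.length != corpus.length) {
            throw new IllegalArgumentException("distributions must cover the same vocabulary");
        }
        double sum = 0.0;
        for (int w = 0; w < topic.length; w++) {
            if (topic[w] > 0.0) {
                sum += topic[w] * Math.log(topic[w] / corpus[w]);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical three-word vocabulary.
        double[] corpus  = {0.5, 0.3, 0.2};
        double[] generic = {0.5, 0.3, 0.2};   // mirrors the corpus exactly
        double[] focused = {0.05, 0.05, 0.9}; // concentrates on one word

        System.out.println("generic topic: " + klDivergence(generic, corpus));
        System.out.println("focused topic: " + klDivergence(focused, corpus));
    }
}
```

Under this measure a topic that simply mirrors the corpus scores zero, while a topic concentrated on a few distinctive words scores higher, so low values would flag candidates for review.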
-
getTokenDocumentDiscrepancies
public TopicModelDiagnostics.TopicScores getTokenDocumentDiscrepancies()
-
getWordLengthScores
public TopicModelDiagnostics.TopicScores getWordLengthScores()
Low-quality topics often have lots of unusually short words.
-
getWordLengthStandardDeviation
public TopicModelDiagnostics.TopicScores getWordLengthStandardDeviation()
Low-quality topics often have lots of unusually short words.
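The two word-length diagnostics above can be illustrated with a mean and a standard deviation over the character lengths of a topic's top words. This is a minimal sketch under stated assumptions: the class WordLengthSketch, its method names, and the sample word lists are hypothetical, and any normalization this class applies (for example, comparing against other topics) is not reproduced here.

```java
import java.util.Arrays;
import java.util.List;

public class WordLengthSketch {

    /** Mean character length of the given words. */
    static double meanLength(List<String> words) {
        if (words.isEmpty()) {
            throw new IllegalArgumentException("need at least one word");
        }
        double total = 0.0;
        for (String word : words) {
            total += word.length();
        }
        return total / words.size();
    }

    /** Population standard deviation of the character lengths. */
    static double lengthStandardDeviation(List<String> words) {
        double mean = meanLength(words);
        double squaredDiffs = 0.0;
        for (String word : words) {
            double diff = word.length() - mean;
            squaredDiffs += diff * diff;
        }
        return Math.sqrt(squaredDiffs / words.size());
    }

    public static void main(String[] args) {
        // Hypothetical top words for two topics.
        List<String> contentTopic = Arrays.asList("economy", "inflation", "markets", "currency");
        List<String> fillerTopic = Arrays.asList("the", "of", "a", "to");

        System.out.println("content topic mean length: " + meanLength(contentTopic));
        System.out.println("filler topic mean length:  " + meanLength(fillerTopic));
        System.out.println("content topic length sd:   " + lengthStandardDeviation(contentTopic));
    }
}
```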
-
getCoherence
public TopicModelDiagnostics.TopicScores getCoherence()
-
getRank1Percent
public TopicModelDiagnostics.TopicScores getRank1Percent()
-
getDocumentPercentRatio
public TopicModelDiagnostics.TopicScores getDocumentPercentRatio(int numeratorIndex, int denominatorIndex)
-
getDocumentPercent
public TopicModelDiagnostics.TopicScores getDocumentPercent(int i)
-
getExclusivity
public TopicModelDiagnostics.TopicScores getExclusivity()
Low-quality topics may have words that are also prominent in other topics.
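A simple way to quantify "also prominent in other topics" is to compare a word's probability in one topic with its total probability across all topics, then average that share over the topic's top words. The sketch below assumes a hypothetical topic-by-word probability matrix and hypothetical names (ExclusivitySketch, exclusivity); it is an illustration of the idea, not a statement of how this method computes its score.

```java
public class ExclusivitySketch {

    /**
     * Average, over the given top words, of
     * P(word | topic) / sum over all topics k of P(word | k).
     * Values near 1 mean the words belong almost entirely to this topic;
     * values near 1 / numTopics mean they are shared evenly.
     */
    static double exclusivity(double[][] topicWordProbs, int topic, int[] topWords) {
        if (topWords.length == 0) {
            throw new IllegalArgumentException("need at least one top word");
        }
        double total = 0.0;
        for (int word : topWords) {
            double acrossTopics = 0.0;
            for (double[] row : topicWordProbs) {
                acrossTopics += row[word];
            }
            total += topicWordProbs[topic][word] / acrossTopics;
        }
        return total / topWords.length;
    }

    public static void main(String[] args) {
        // Hypothetical model: 2 topics over a 3-word vocabulary.
        double[][] probs = {
            {0.7, 0.2, 0.1},
            {0.1, 0.2, 0.7}
        };
        // Word 0 is mostly owned by topic 0; word 1 is shared equally.
        System.out.println("topic 0, word 0: " + exclusivity(probs, 0, new int[] {0}));
        System.out.println("topic 0, word 1: " + exclusivity(probs, 0, new int[] {1}));
    }
}
```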
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
-
toXML
public java.lang.String toXML()
-
main
public static void main(java.lang.String[] args) throws java.lang.Exception
- Throws:
java.lang.Exception
-
-