Class SimpleLDA

  • All Implemented Interfaces:
    java.io.Serializable

    public class SimpleLDA
    extends java.lang.Object
    implements java.io.Serializable
    A simple implementation of Latent Dirichlet Allocation (LDA) using Gibbs sampling. It is slower than the regular Mallet LDA implementation, but it is a better starting point for understanding how the sampler works and for building new topic models.
    Author:
    David Mimno, Andrew McCallum
    See Also:
    Serialized Form
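
    To make the description above concrete, the following is a minimal, self-contained sketch of one collapsed Gibbs sampling sweep for LDA. It is not Mallet's source. The class GibbsSketch and its methods are hypothetical, documents are plain int arrays of word-type ids, and the field names only mirror the fields listed below. The relations alpha = alphaSum / numTopics and betaSum = beta * numTypes are assumptions (symmetric priors), not something this page states.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative sketch only; GibbsSketch is a hypothetical class, not part of Mallet.
class GibbsSketch {
    final int numTopics;
    final double alpha;             // assumed: alphaSum / numTopics
    final double beta;
    final double betaSum;           // assumed: beta * numTypes
    final int[][] typeTopicCounts;  // indexed [type][topic]
    final int[] tokensPerTopic;     // total tokens assigned to each topic
    final Random random;

    GibbsSketch(int numTopics, int numTypes, double alphaSum, double beta, long seed) {
        this.numTopics = numTopics;
        this.alpha = alphaSum / numTopics;
        this.beta = beta;
        this.betaSum = beta * numTypes;
        this.typeTopicCounts = new int[numTypes][numTopics];
        this.tokensPerTopic = new int[numTopics];
        this.random = new Random(seed);
    }

    /** Assigns a uniformly random topic to every token and fills the count tables. */
    int[][] initialize(int[][] docs) {
        int[][] topics = new int[docs.length][];
        for (int d = 0; d < docs.length; d++) {
            topics[d] = new int[docs[d].length];
            for (int i = 0; i < docs[d].length; i++) {
                int t = random.nextInt(numTopics);
                topics[d][i] = t;
                typeTopicCounts[docs[d][i]][t]++;
                tokensPerTopic[t]++;
            }
        }
        return topics;
    }

    /** Resamples the topic of every token in one document, updating counts in place. */
    void sampleDocument(int[] tokens, int[] topics) {
        int[] oneDocTopicCounts = new int[numTopics];
        for (int t : topics) oneDocTopicCounts[t]++;

        double[] weights = new double[numTopics];
        for (int i = 0; i < tokens.length; i++) {
            int type = tokens[i];
            int oldTopic = topics[i];

            // Remove the token's current assignment from all counts.
            oneDocTopicCounts[oldTopic]--;
            typeTopicCounts[type][oldTopic]--;
            tokensPerTopic[oldTopic]--;

            // Unnormalized conditional probability of each topic for this token.
            double sum = 0.0;
            for (int t = 0; t < numTopics; t++) {
                weights[t] = (alpha + oneDocTopicCounts[t])
                        * (beta + typeTopicCounts[type][t])
                        / (betaSum + tokensPerTopic[t]);
                sum += weights[t];
            }

            // Draw a topic in proportion to its weight.
            double u = random.nextDouble() * sum;
            int newTopic = 0;
            while (newTopic < numTopics - 1 && u >= weights[newTopic]) {
                u -= weights[newTopic];
                newTopic++;
            }

            // Add the token back under its new assignment.
            topics[i] = newTopic;
            oneDocTopicCounts[newTopic]++;
            typeTopicCounts[type][newTopic]++;
            tokensPerTopic[newTopic]++;
        }
    }

    public static void main(String[] args) {
        int[][] docs = { {0, 1, 0, 1}, {2, 3, 2, 3}, {0, 1, 2} };
        GibbsSketch model = new GibbsSketch(2, 4, 1.0, 0.01, 42L);
        int[][] topics = model.initialize(docs);
        for (int sweep = 0; sweep < 50; sweep++) {
            for (int d = 0; d < docs.length; d++) {
                model.sampleDocument(docs[d], topics[d]);
            }
        }
        System.out.println(Arrays.toString(model.tokensPerTopic));
    }
}
```

    Each token is removed from the counts before its conditional distribution is computed and added back afterwards, so the count tables always sum to the number of tokens in the corpus.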
    • Field Detail

      • numTopics

        protected int numTopics
      • numTypes

        protected int numTypes
      • alpha

        protected double alpha
      • alphaSum

        protected double alphaSum
      • beta

        protected double beta
      • betaSum

        protected double betaSum
      • oneDocTopicCounts

        protected int[] oneDocTopicCounts
      • typeTopicCounts

        protected int[][] typeTopicCounts
      • tokensPerTopic

        protected int[] tokensPerTopic
      • showTopicsInterval

        public int showTopicsInterval
      • wordsPerTopic

        public int wordsPerTopic
      • formatter

        protected java.text.NumberFormat formatter
      • printLogLikelihood

        protected boolean printLogLikelihood
    • Constructor Detail

      • SimpleLDA

        public SimpleLDA​(int numberOfTopics)
      • SimpleLDA

        public SimpleLDA​(int numberOfTopics,
                         double alphaSum,
                         double beta)
      • SimpleLDA

        public SimpleLDA​(int numberOfTopics,
                         double alphaSum,
                         double beta,
                         Randoms random)
      • SimpleLDA

        public SimpleLDA​(LabelAlphabet topicAlphabet,
                         double alphaSum,
                         double beta,
                         Randoms random)
    • Method Detail

      • getAlphabet

        public Alphabet getAlphabet()
      • getNumTopics

        public int getNumTopics()
      • setTopicDisplay

        public void setTopicDisplay​(int interval,
                                    int n)
      • setRandomSeed

        public void setRandomSeed​(int seed)
      • getTypeTopicCounts

        public int[][] getTypeTopicCounts()
      • getTopicTotals

        public int[] getTopicTotals()
      • addInstances

        public void addInstances​(InstanceList training)
      • sample

        public void sample​(int iterations)
                    throws java.io.IOException
        Throws:
        java.io.IOException
      • modelLogLikelihood

        public double modelLogLikelihood()
      • topWords

        public java.lang.String topWords​(int numWords)
      • printDocumentTopics

        public void printDocumentTopics​(java.io.File file,
                                        double threshold,
                                        int max)
                                 throws java.io.IOException
        Parameters:
        file - The file to write to
        threshold - Only print topics with proportion greater than this number
        max - Print no more than this many topics
        Throws:
        java.io.IOException
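
        The threshold and max parameters described above can be read as a filter over one document's topic proportions. The sketch below shows that reading on plain arrays. DocTopicFilterSketch and selectTopics are hypothetical names, and the proportions here are raw counts divided by document length; whether the actual method smooths them with alpha is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the Mallet implementation of printDocumentTopics.
class DocTopicFilterSketch {

    /**
     * Returns topic ids ordered by descending proportion, keeping only topics
     * whose proportion is strictly greater than threshold, and at most max topics.
     */
    static List<Integer> selectTopics(int[] topicCounts, double threshold, int max) {
        int total = 0;
        for (int c : topicCounts) total += c;

        List<Integer> selected = new ArrayList<>();
        if (total == 0) return selected; // empty document: nothing to report

        // Order topic ids from most to least frequent in this document.
        List<Integer> order = new ArrayList<>();
        for (int t = 0; t < topicCounts.length; t++) order.add(t);
        order.sort((a, b) -> Integer.compare(topicCounts[b], topicCounts[a]));

        for (int t : order) {
            if (selected.size() >= max) break;
            double proportion = (double) topicCounts[t] / total;
            if (proportion <= threshold) break; // sorted, so the rest are no larger
            selected.add(t);
        }
        return selected;
    }

    public static void main(String[] args) {
        int[] counts = {5, 1, 4}; // proportions 0.5, 0.1, 0.4
        System.out.println(selectTopics(counts, 0.15, 5));
    }
}
```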
      • printState

        public void printState​(java.io.File f)
                        throws java.io.IOException
        Throws:
        java.io.IOException
      • printState

        public void printState​(java.io.PrintStream out)
      • write

        public void write​(java.io.File f)
      • main

        public static void main​(java.lang.String[] args)
                         throws java.io.IOException
        Throws:
        java.io.IOException