Class TopicInferencer

  • All Implemented Interfaces:
    java.io.Serializable
    Direct Known Subclasses:
    DMRInferencer

    public class TopicInferencer
    extends java.lang.Object
    implements java.io.Serializable
    See Also:
    Serialized Form
    • Field Detail

      • numTopics

        protected int numTopics
      • topicMask

        protected int topicMask
      • topicBits

        protected int topicBits
      • numTypes

        protected int numTypes
      • alpha

        protected double[] alpha
      • beta

        protected double beta
      • betaSum

        protected double betaSum
      • typeTopicCounts

        protected int[][] typeTopicCounts
      • tokensPerTopic

        protected int[] tokensPerTopic
      • smoothingOnlyMass

        protected double smoothingOnlyMass
      • cachedCoefficients

        protected double[] cachedCoefficients
    • Constructor Detail

      • TopicInferencer

        public TopicInferencer(int[][] typeTopicCounts,
                               int[] tokensPerTopic,
                               Alphabet alphabet,
                               double[] alpha,
                               double beta,
                               double betaSum)
      • TopicInferencer

        public TopicInferencer()
    • Method Detail

      • setRandomSeed

        public void setRandomSeed(int seed)
      • getSampledDistribution

        public double[] getSampledDistribution(Instance instance,
                                               int numIterations,
                                               int thinning,
                                               int burnIn)
        Use Gibbs sampling to infer a topic distribution for the instance. Each token is initialized to its most probable topic (or one of them, if several tie). With zero iterations, the method returns exactly this initial topic distribution.

        Inference does not adjust the type-topic counts: P(w|t) is clamped.
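
        The behavior described above can be illustrated with a simplified, dense sketch. It is not the library's implementation: the class name ClampedGibbsSketch, the infer helper, and the plain int[][] count layout are hypothetical, and the sketch assumes that "most probable topic" means the topic with the highest count for the token's type, that a sample is saved when iteration > burnIn and (iteration - burnIn) is a multiple of thinning (with thinning >= 1), and that the saved samples are smoothed with alpha and normalized.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical, simplified sketch of inference with clamped type-topic counts.
class ClampedGibbsSketch {

    static double[] infer(int[] tokens, int[][] typeTopicCounts, int[] tokensPerTopic,
                          double[] alpha, double beta, double betaSum,
                          int numIterations, int thinning, int burnIn, long seed) {
        int numTopics = alpha.length;
        Random random = new Random(seed);
        int[] topics = new int[tokens.length];
        int[] localTopicCounts = new int[numTopics];

        // Initialize each token to a topic with the highest count for its type.
        for (int i = 0; i < tokens.length; i++) {
            int best = 0;
            for (int t = 1; t < numTopics; t++) {
                if (typeTopicCounts[tokens[i]][t] > typeTopicCounts[tokens[i]][best]) {
                    best = t;
                }
            }
            topics[i] = best;
            localTopicCounts[best]++;
        }

        double[] result = new double[numTopics];
        double[] weights = new double[numTopics];
        double sum = 0.0;

        for (int iteration = 1; iteration <= numIterations; iteration++) {
            for (int i = 0; i < tokens.length; i++) {
                int type = tokens[i];
                // Only the document-level counts change; the model counts are read-only.
                localTopicCounts[topics[i]]--;
                double total = 0.0;
                for (int t = 0; t < numTopics; t++) {
                    weights[t] = (alpha[t] + localTopicCounts[t])
                            * (beta + typeTopicCounts[type][t]) / (betaSum + tokensPerTopic[t]);
                    total += weights[t];
                }
                // Draw a topic in proportion to its weight.
                double u = random.nextDouble() * total;
                int newTopic = 0;
                while (newTopic < numTopics - 1 && u >= weights[newTopic]) {
                    u -= weights[newTopic];
                    newTopic++;
                }
                topics[i] = newTopic;
                localTopicCounts[newTopic]++;
            }
            // Assumed saving schedule: skip burn-in, then keep every thinning-th state.
            if (iteration > burnIn && (iteration - burnIn) % thinning == 0) {
                for (int t = 0; t < numTopics; t++) {
                    result[t] += alpha[t] + localTopicCounts[t];
                    sum += alpha[t] + localTopicCounts[t];
                }
            }
        }

        // If nothing was saved (for example, zero iterations), use the current state.
        if (sum == 0.0) {
            for (int t = 0; t < numTopics; t++) {
                result[t] = alpha[t] + localTopicCounts[t];
                sum += result[t];
            }
        }
        for (int t = 0; t < numTopics; t++) {
            result[t] /= sum;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] tokens = {0, 1, 0};                  // word-type indices of one document
        int[][] typeTopicCounts = {{5, 1}, {0, 4}};
        int[] tokensPerTopic = {5, 5};
        double[] alpha = {0.5, 0.5};
        // Zero iterations: the result is the smoothed initial assignment.
        double[] initial = infer(tokens, typeTopicCounts, tokensPerTopic,
                alpha, 0.01, 0.02, 0, 1, 0, 42L);
        System.out.println(Arrays.toString(initial)); // prints [0.625, 0.375]
    }
}
```

        In this sketch typeTopicCounts and tokensPerTopic are never written to, which is what clamping P(w|t) amounts to: only the per-document topic counts are resampled.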

      • writeInferredDistributions

        public void writeInferredDistributions(InstanceList instances,
                                               java.io.File distributionsFile,
                                               int numIterations,
                                               int thinning,
                                               int burnIn,
                                               double threshold,
                                               int max)
                                        throws java.io.IOException
        Infer topics for the provided instances and write distributions to the provided file.
        Parameters:
        instances - The instances for which to infer topics
        distributionsFile - The file to which the distributions are written
        numIterations - The total number of iterations of sampling per document
        thinning - The number of iterations between saved samples
        burnIn - The number of iterations before the first saved sample
        threshold - The minimum proportion of a given topic that will be written
        max - The maximum number of topics to report per document
        Throws:
        java.io.IOException
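
        How threshold and max might interact can be sketched as follows. This is an illustration under assumptions, not the method's actual code: the class TopicReportSketch and the helper topicsToReport are hypothetical, topics are assumed to be reported in order of decreasing proportion, and a negative or oversized max is assumed to mean "no limit".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: choose which topics of one document would be reported.
class TopicReportSketch {

    static List<Integer> topicsToReport(double[] distribution, double threshold, int max) {
        int numTopics = distribution.length;
        // Assumption: an out-of-range max means "report up to all topics".
        int limit = (max < 0 || max > numTopics) ? numTopics : max;

        // Sort topic indices by descending proportion (stable, so ties keep index order).
        Integer[] order = new Integer[numTopics];
        for (int t = 0; t < numTopics; t++) {
            order[t] = t;
        }
        Arrays.sort(order, (a, b) -> Double.compare(distribution[b], distribution[a]));

        // Keep at most `limit` topics, stopping at the first one below the threshold.
        List<Integer> reported = new ArrayList<>();
        for (int i = 0; i < limit; i++) {
            if (distribution[order[i]] < threshold) {
                break;
            }
            reported.add(order[i]);
        }
        return reported;
    }

    public static void main(String[] args) {
        double[] distribution = {0.1, 0.6, 0.05, 0.25};
        System.out.println(topicsToReport(distribution, 0.1, 2)); // prints [1, 3]
    }
}
```

        Under these assumptions, threshold filters out small proportions while max caps the number of topics listed per document, so either limit can be the one that ends the list.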
      • read

        public static TopicInferencer read(java.io.File f)
                                    throws java.lang.Exception
        Throws:
        java.lang.Exception
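
        Because the class implements java.io.Serializable, a file-based read like this one is plausibly a thin wrapper around standard Java object serialization. The sketch below shows that round trip with a hypothetical stand-in class (SerializedModelSketch, with its own write and read helpers); it assumes nothing about the actual file format beyond plain ObjectOutputStream and ObjectInputStream usage.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a serializable model; not the library's class.
class SerializedModelSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    final int numTopics;
    final double[] alpha;

    SerializedModelSketch(int numTopics, double[] alpha) {
        this.numTopics = numTopics;
        this.alpha = alpha;
    }

    // Write this object to a file with standard Java serialization.
    void write(File f) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(f))) {
            oos.writeObject(this);
        }
    }

    // Read an object back; the cast fails if the file holds a different class.
    static SerializedModelSketch read(File f) throws Exception {
        try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream(f))) {
            return (SerializedModelSketch) ois.readObject();
        }
    }
}
```

        A reader of this kind can fail with an IOException (missing or corrupt file), a ClassNotFoundException, or a ClassCastException, which is one reason a signature might declare the broad java.lang.Exception.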