DownsampleLabelWords (Mallet 2 API)

java.lang.Object
- cc.mallet.transform.DownsampleLabelWords

```
public class DownsampleLabelWords
extends java.lang.Object
```
This class implements the method from "Authorless Topic Models" by Thompson and Mimno, COLING 2018. The goal is to reduce the frequency of words that are unusually associated with a particular label. This is useful as a pre-processing step for topic modeling because it reduces the correlation of topics with known class labels. The problem comes up most often in fiction, where topics tend to simply reproduce lists of characters. The input is a labeled feature sequence of the sort used for topic modeling. Unlike the regular topic modeling system, labels are required, since the method needs something to correlate words against. The output is another feature sequence with some word tokens removed. Note that some words may disappear from the corpus entirely, but they will still be present in the alphabet. The code takes one parameter, equivalent to a p-value for the null hypothesis that a word occurs no more frequently in one category than in the collection as a whole.
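
The idea described above can be illustrated with a minimal, self-contained sketch. It is not the Mallet implementation and does not use the Mallet API: the class `DownsampleSketch`, its method names, and the `zCutoff` argument are all hypothetical. It also assumes a normal approximation to the binomial distribution as the significance test, with a one-sided z cutoff standing in for the p-value parameter; the test used by the actual class may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration only; not part of Mallet.
final class DownsampleSketch {

    /**
     * Fraction of a word's tokens to keep within one label.
     * Null hypothesis: the word is no more frequent in this label than in
     * the collection as a whole. ASSUMPTION: significance is judged with a
     * normal approximation to the binomial, using a one-sided z cutoff.
     */
    static double keepProbability(int wordInLabel, int labelTotal,
                                  int wordInCorpus, int corpusTotal,
                                  double zCutoff) {
        double p = (double) wordInCorpus / corpusTotal;       // collection-wide rate
        double expected = labelTotal * p;                      // expected count in label
        double sd = Math.sqrt(labelTotal * p * (1.0 - p));     // binomial std. deviation
        double limit = expected + zCutoff * sd;                // largest unsurprising count
        if (wordInLabel <= limit) {
            return 1.0;                                        // not unusual: keep every token
        }
        return limit / wordInLabel;                            // thin down toward the limit
    }

    /**
     * Removes tokens at random according to per-word keep probabilities.
     * Words absent from the map are always kept.
     */
    static List<String> downsample(List<String> tokens,
                                   Map<String, Double> keepProb,
                                   Random random) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            double keep = keepProb.getOrDefault(token, 1.0);
            if (random.nextDouble() < keep) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

In this sketch a word whose count within a label is at or below the cutoff is left untouched, while an over-represented word (a character name concentrated in one novel, for example) is thinned until its expected count in that label no longer looks unusual. As in the description above, a removed word stays in the vocabulary even if none of its tokens survive.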

Author:

David Mimno

- Constructor Summary
  
  Constructors
  Constructor Description
  
  DownsampleLabelWords()
- Method Summary
  
  Modifier and Type Method Description
  
  static void main(java.lang.String[] args)
  - Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail
- DownsampleLabelWords
```
public DownsampleLabelWords()
```

Method Detail

main

public static void main(java.lang.String[] args)
                 throws java.io.FileNotFoundException,
                        java.io.IOException

Throws:

- java.io.FileNotFoundException
- java.io.IOException