Package cc.mallet.classify
Class FeatureConstraintUtil
- java.lang.Object
-
- cc.mallet.classify.FeatureConstraintUtil
-
public class FeatureConstraintUtil extends java.lang.Object
Utility functions for creating feature constraints that can be used with Generalized Expectation (GE) training.
Author:
- Gregory Druck gdruck@cs.umass.edu
-
-
Constructor Summary
Constructor / Description
FeatureConstraintUtil()
-
Method Summary
Modifier and Type / Method / Description

static double[][]
getFeatureLabelCounts(InstanceList list, boolean useValues)

static java.util.HashMap<java.lang.Integer,java.util.ArrayList<java.lang.Integer>>
labelFeatures(InstanceList list, java.util.ArrayList<java.lang.Integer> features)

static java.util.HashMap<java.lang.Integer,java.util.ArrayList<java.lang.Integer>>
labelFeatures(InstanceList list, java.util.ArrayList<java.lang.Integer> features, boolean reject)
Label features using the heuristic described in "Learning from Labeled Features using Generalized Expectation Criteria" (Gregory Druck, Gideon Mann, Andrew McCallum).

static java.util.HashMap<java.lang.Integer,double[]>
readConstraintsFromFile(java.lang.String filename, InstanceList data)
Reads feature constraints from a file, whether they are stored using strings or indices.

static java.util.HashMap<java.lang.Integer,double[]>
readConstraintsFromFileIndex(java.lang.String filename, InstanceList data)
Reads feature constraints stored using indices from a file.

static java.util.HashMap<java.lang.Integer,double[]>
readConstraintsFromFileString(java.lang.String filename, InstanceList data)
Reads feature constraints stored using strings from a file.

static java.util.HashMap<java.lang.Integer,double[][]>
readRangeConstraintsFromFile(java.lang.String filename, InstanceList data)
Reads range constraints stored using strings from a file.

static java.util.ArrayList<java.lang.Integer>
selectFeaturesByInfoGain(InstanceList list, int numFeatures)
Select features with the highest information gain.

static java.util.ArrayList<java.lang.Integer>
selectTopLDAFeatures(int numSelFeatures, ParallelTopicModel lda, Alphabet alphabet)
Select top features in LDA topics.

static java.util.HashMap<java.lang.Integer,double[]>
setTargetsUsingData(InstanceList list, java.util.ArrayList<java.lang.Integer> features)

static java.util.HashMap<java.lang.Integer,double[]>
setTargetsUsingData(InstanceList list, java.util.ArrayList<java.lang.Integer> features, boolean normalize)

static java.util.HashMap<java.lang.Integer,double[]>
setTargetsUsingData(InstanceList list, java.util.ArrayList<java.lang.Integer> features, boolean useValues, boolean normalize)
Set target distributions using estimates from data.

static java.util.HashMap<java.lang.Integer,double[]>
setTargetsUsingFeatureVoting(java.util.HashMap<java.lang.Integer,java.util.ArrayList<java.lang.Integer>> labeledFeatures, InstanceList trainingData)
Set target distributions using the feature voting heuristic described in "Learning from Labeled Features using Generalized Expectation Criteria" (Gregory Druck, Gideon Mann, Andrew McCallum).

static java.util.HashMap<java.lang.Integer,double[]>
setTargetsUsingHeuristic(java.util.HashMap<java.lang.Integer,java.util.ArrayList<java.lang.Integer>> labeledFeatures, int numLabels, double majorityProb)
Set target distributions using the "Schapire" heuristic described in "Learning from Labeled Features using Generalized Expectation Criteria" (Gregory Druck, Gideon Mann, Andrew McCallum).
Method Detail
-
readRangeConstraintsFromFile
public static java.util.HashMap<java.lang.Integer,double[][]> readRangeConstraintsFromFile(java.lang.String filename, InstanceList data)
Reads range constraints stored using strings from a file. The format of each line can be either:
feature_name (label_name:lower_probability,upper_probability)+
or
feature_name (label_name:probability)+
Constraints are only added for feature-label pairs that are present.
Parameters:
filename - File with feature constraints.
data - InstanceList used for alphabets.
Returns:
Constraints.
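The two line formats above can be illustrated with a small standalone parser. This is a minimal sketch, not MALLET's implementation: the class RangeConstraintLine and its parse method are hypothetical, labels are keyed by name rather than by alphabet index, and treating a single probability as a range with equal lower and upper bounds is an assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of MALLET.
class RangeConstraintLine {
    // Parses "feature_name (label_name:lower,upper)+" or "feature_name (label_name:probability)+".
    // Returns label name -> [lower, upper]. Only labels present on the line get an entry.
    // Assumption: a single probability p is stored as the degenerate range [p, p].
    static Map<String, double[]> parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        Map<String, double[]> ranges = new LinkedHashMap<>();
        for (int i = 1; i < tokens.length; i++) {   // tokens[0] is the feature name
            String[] pair = tokens[i].split(":");
            if (pair.length != 2) {
                throw new IllegalArgumentException("Expected label:value but got " + tokens[i]);
            }
            String[] bounds = pair[1].split(",");
            double lower = Double.parseDouble(bounds[0]);
            double upper = Double.parseDouble(bounds[bounds.length - 1]);
            ranges.put(pair[0], new double[] {lower, upper});
        }
        return ranges;
    }

    public static void main(String[] args) {
        Map<String, double[]> ranges = parse("goal sports:0.8,1.0 politics:0.1");
        for (Map.Entry<String, double[]> e : ranges.entrySet()) {
            System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
        }
    }
}
```

In the real method the label names would be resolved against the target alphabet of the supplied InstanceList; the sketch skips that lookup to stay self-contained.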
-
readConstraintsFromFile
public static java.util.HashMap<java.lang.Integer,double[]> readConstraintsFromFile(java.lang.String filename, InstanceList data)
Reads feature constraints from a file, whether they are stored using strings or indices.
Parameters:
filename - File with feature constraints.
data - InstanceList used for alphabets.
Returns:
Constraints.
-
readConstraintsFromFileString
public static java.util.HashMap<java.lang.Integer,double[]> readConstraintsFromFileString(java.lang.String filename, InstanceList data)
Reads feature constraints stored using strings from a file. Each line has the format:
feature_name (label_name:probability)+
Labels that do not appear get probability 0.
Parameters:
filename - File with feature constraints.
data - InstanceList used for alphabets.
Returns:
Constraints.
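As a rough illustration of the string format, the sketch below turns one line into a dense target distribution in which unlisted labels keep probability 0. StringConstraintLine is a hypothetical name, and the list of label names stands in for the target alphabet; neither is part of MALLET.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of MALLET.
class StringConstraintLine {
    // Parses "feature_name (label_name:probability)+" into a dense distribution
    // indexed by the position of each label in `labels`.
    // Labels that do not appear on the line keep the default probability 0.0.
    static double[] parse(String line, List<String> labels) {
        String[] tokens = line.trim().split("\\s+");
        double[] target = new double[labels.size()];
        for (int i = 1; i < tokens.length; i++) {   // tokens[0] is the feature name
            String[] pair = tokens[i].split(":");
            if (pair.length != 2) {
                throw new IllegalArgumentException("Expected label:probability but got " + tokens[i]);
            }
            int labelIndex = labels.indexOf(pair[0]);
            if (labelIndex < 0) {
                throw new IllegalArgumentException("Unknown label: " + pair[0]);
            }
            target[labelIndex] = Double.parseDouble(pair[1]);
        }
        return target;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("sports", "politics", "weather");
        System.out.println(Arrays.toString(parse("goal sports:0.9 politics:0.1", labels)));
    }
}
```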
-
readConstraintsFromFileIndex
public static java.util.HashMap<java.lang.Integer,double[]> readConstraintsFromFileIndex(java.lang.String filename, InstanceList data)
Reads feature constraints stored using indices from a file. Each line has the format:
feature_index label_0_prob label_1_prob ... label_n_prob
Here each label must appear.
Parameters:
filename - File with feature constraints.
data - InstanceList used for alphabets.
Returns:
Constraints.
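The index format can be sketched in the same spirit. The helper below is hypothetical (IndexConstraintLine is not a MALLET class); it enforces the rule that every label must appear by checking the token count against an assumed numLabels argument.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of MALLET.
class IndexConstraintLine {
    // Parses "feature_index label_0_prob label_1_prob ... label_n_prob" and stores
    // the distribution under the feature index. Every label must have a probability.
    static void parseInto(String line, int numLabels, Map<Integer, double[]> constraints) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != numLabels + 1) {
            throw new IllegalArgumentException(
                "Expected " + numLabels + " probabilities but got " + (tokens.length - 1));
        }
        int featureIndex = Integer.parseInt(tokens[0]);
        double[] target = new double[numLabels];
        for (int li = 0; li < numLabels; li++) {
            target[li] = Double.parseDouble(tokens[li + 1]);
        }
        constraints.put(featureIndex, target);
    }

    public static void main(String[] args) {
        Map<Integer, double[]> constraints = new HashMap<>();
        parseInto("12 0.7 0.2 0.1", 3, constraints);
        System.out.println(Arrays.toString(constraints.get(12)));
    }
}
```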
-
selectFeaturesByInfoGain
public static java.util.ArrayList<java.lang.Integer> selectFeaturesByInfoGain(InstanceList list, int numFeatures)
Select features with the highest information gain.
Parameters:
list - InstanceList for computing information gain.
numFeatures - Number of features to select.
Returns:
List of features with the highest information gains.
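For readers unfamiliar with the criterion, the following standalone sketch ranks features by information gain over binary feature presence. It is a simplified illustration under stated assumptions, not the MALLET code path: InfoGainSelector is a hypothetical class, documents are a boolean presence matrix rather than an InstanceList, and the data set is assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration only; not part of MALLET.
class InfoGainSelector {
    // Shannon entropy in bits of a vector of counts; zero counts contribute nothing.
    static double entropy(double[] counts) {
        double total = 0;
        for (double c : counts) total += c;
        if (total == 0) return 0;
        double h = 0;
        for (double c : counts) {
            if (c > 0) {
                double p = c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // Information gain of feature fi: H(Y) - P(f) H(Y|f) - P(not f) H(Y|not f).
    static double infoGain(boolean[][] present, int[] labels, int numLabels, int fi) {
        double[] all = new double[numLabels];
        double[] with = new double[numLabels];
        double[] without = new double[numLabels];
        int numWith = 0;
        for (int d = 0; d < labels.length; d++) {
            all[labels[d]]++;
            if (present[d][fi]) {
                with[labels[d]]++;
                numWith++;
            } else {
                without[labels[d]]++;
            }
        }
        double pWith = (double) numWith / labels.length;
        return entropy(all) - pWith * entropy(with) - (1 - pWith) * entropy(without);
    }

    // Returns the indices of the numFeatures features with the highest information gain.
    static List<Integer> selectTop(boolean[][] present, int[] labels, int numLabels, int numFeatures) {
        int total = present[0].length;
        double[] gains = new double[total];
        List<Integer> indices = new ArrayList<>();
        for (int fi = 0; fi < total; fi++) {
            gains[fi] = infoGain(present, labels, numLabels, fi);
            indices.add(fi);
        }
        indices.sort(Comparator.comparingDouble((Integer fi) -> -gains[fi]));  // descending gain
        return new ArrayList<>(indices.subList(0, Math.min(numFeatures, total)));
    }

    public static void main(String[] args) {
        boolean[][] present = {
            {true, true, true}, {true, false, true}, {false, true, true}, {false, false, false}
        };
        int[] labels = {0, 0, 1, 1};
        System.out.println(selectTop(present, labels, 2, 2));
    }
}
```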
-
selectTopLDAFeatures
public static java.util.ArrayList<java.lang.Integer> selectTopLDAFeatures(int numSelFeatures, ParallelTopicModel lda, Alphabet alphabet)
Select top features in LDA topics.
Parameters:
numSelFeatures - Number of features to select.
lda - ParallelTopicModel that provides the LDA model.
alphabet - The vector dataset alphabet, which may be different from the alphabet of the sequence dataset.
Returns:
ArrayList with the int indices of the selected features.
-
setTargetsUsingData
public static java.util.HashMap<java.lang.Integer,double[]> setTargetsUsingData(InstanceList list, java.util.ArrayList<java.lang.Integer> features)
-
setTargetsUsingData
public static java.util.HashMap<java.lang.Integer,double[]> setTargetsUsingData(InstanceList list, java.util.ArrayList<java.lang.Integer> features, boolean normalize)
-
setTargetsUsingData
public static java.util.HashMap<java.lang.Integer,double[]> setTargetsUsingData(InstanceList list, java.util.ArrayList<java.lang.Integer> features, boolean useValues, boolean normalize)
Set target distributions using estimates from data.
Parameters:
list - InstanceList used to estimate targets.
features - List of features for constraints.
normalize - Whether to normalize by feature counts.
Returns:
Constraints (map of feature index to target), with targets set using estimates from supplied data.
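A minimal sketch of estimating targets from labeled data follows. The class DataTargets and its data layout (one feature-to-value map per document) are hypothetical. The reading of useValues as "accumulate feature values instead of counting presence" is an assumption, since the parameter is not documented here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of MALLET.
class DataTargets {
    // docs.get(d) maps feature index -> feature value for document d; labels[d] is its label index.
    // Assumption: useValues accumulates feature values, otherwise each occurrence counts as 1.
    // If normalize is true, each feature's label counts are divided by their sum.
    static Map<Integer, double[]> estimate(List<Map<Integer, Double>> docs, int[] labels, int numLabels,
                                           List<Integer> features, boolean useValues, boolean normalize) {
        Map<Integer, double[]> targets = new HashMap<>();
        for (int fi : features) {
            double[] counts = new double[numLabels];
            double sum = 0;
            for (int d = 0; d < docs.size(); d++) {
                Double value = docs.get(d).get(fi);
                if (value != null) {
                    double increment = useValues ? value : 1.0;
                    counts[labels[d]] += increment;
                    sum += increment;
                }
            }
            if (normalize && sum > 0) {
                for (int li = 0; li < numLabels; li++) counts[li] /= sum;
            }
            targets.put(fi, counts);
        }
        return targets;
    }

    public static void main(String[] args) {
        List<Map<Integer, Double>> docs = new ArrayList<>();
        Map<Integer, Double> d0 = new HashMap<>(); d0.put(0, 2.0); docs.add(d0);
        Map<Integer, Double> d1 = new HashMap<>(); d1.put(0, 1.0); d1.put(1, 1.0); docs.add(d1);
        int[] labels = {0, 1};
        Map<Integer, double[]> targets = estimate(docs, labels, 2, Arrays.asList(0), true, true);
        System.out.println(Arrays.toString(targets.get(0)));
    }
}
```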
-
setTargetsUsingHeuristic
public static java.util.HashMap<java.lang.Integer,double[]> setTargetsUsingHeuristic(java.util.HashMap<java.lang.Integer,java.util.ArrayList<java.lang.Integer>> labeledFeatures, int numLabels, double majorityProb)
Set target distributions using the "Schapire" heuristic described in "Learning from Labeled Features using Generalized Expectation Criteria" (Gregory Druck, Gideon Mann, Andrew McCallum).
Parameters:
labeledFeatures - HashMap of feature indices to lists of label indices for that feature.
numLabels - Total number of labels.
majorityProb - Probability mass divided among majority labels.
Returns:
Constraints (map of feature index to target distribution), with target distributions set using heuristic.
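The parameter description suggests the following reading of the heuristic: majorityProb is split evenly among the labels assigned to a feature, and the remaining mass is split evenly among all other labels. The sketch below implements that reading. HeuristicTargets is a hypothetical class, and the uniform fallback when no labels or all labels are assigned is an assumption rather than documented behavior.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of MALLET.
class HeuristicTargets {
    // Splits majorityProb evenly among the majority labels and (1 - majorityProb)
    // evenly among the remaining labels. Assumes majorityLabels has no duplicates.
    // Assumption: if no labels or every label is a majority label, return a uniform distribution.
    static double[] target(List<Integer> majorityLabels, int numLabels, double majorityProb) {
        double[] dist = new double[numLabels];
        int numMajority = majorityLabels.size();
        if (numMajority == 0 || numMajority == numLabels) {
            Arrays.fill(dist, 1.0 / numLabels);
            return dist;
        }
        Arrays.fill(dist, (1.0 - majorityProb) / (numLabels - numMajority));
        for (int li : majorityLabels) {
            dist[li] = majorityProb / numMajority;
        }
        return dist;
    }

    public static void main(String[] args) {
        // One majority label out of three, with 0.9 of the mass assigned to it.
        System.out.println(Arrays.toString(target(Arrays.asList(0), 3, 0.9)));
    }
}
```

A caller would apply this once per entry of labeledFeatures to build the map of feature index to target distribution.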
-
setTargetsUsingFeatureVoting
public static java.util.HashMap<java.lang.Integer,double[]> setTargetsUsingFeatureVoting(java.util.HashMap<java.lang.Integer,java.util.ArrayList<java.lang.Integer>> labeledFeatures, InstanceList trainingData)
Set target distributions using the feature voting heuristic described in "Learning from Labeled Features using Generalized Expectation Criteria" (Gregory Druck, Gideon Mann, Andrew McCallum).
Parameters:
labeledFeatures - HashMap of feature indices to lists of label indices for that feature.
trainingData - InstanceList to use for computing expectations with feature voting.
Returns:
Constraints (map of feature index to target distribution), with target distributions set using feature voting.
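One possible reading of feature voting is sketched below: each labeled feature present in a document votes for its labels, the votes are normalized into a per-document label distribution, and each feature's target is the normalized sum of those distributions over the documents containing it. This is a simplified illustration only. FeatureVoting is a hypothetical class, documents are reduced to sets of feature indices, and details such as feature values or already-labeled instances are ignored.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of MALLET.
class FeatureVoting {
    // labeledFeatures maps feature index -> label indices; each doc is the set of features it contains.
    static Map<Integer, double[]> targets(Map<Integer, List<Integer>> labeledFeatures,
                                          List<Set<Integer>> docs, int numLabels) {
        Map<Integer, double[]> sums = new HashMap<>();
        for (Integer fi : labeledFeatures.keySet()) sums.put(fi, new double[numLabels]);
        for (Set<Integer> doc : docs) {
            // Step 1: labeled features in the document vote for their labels.
            double[] votes = new double[numLabels];
            double totalVotes = 0;
            for (Map.Entry<Integer, List<Integer>> entry : labeledFeatures.entrySet()) {
                if (doc.contains(entry.getKey())) {
                    for (int li : entry.getValue()) {
                        votes[li] += 1.0;
                        totalVotes += 1.0;
                    }
                }
            }
            if (totalVotes == 0) continue;  // no labeled feature present, so no votes
            // Step 2: credit the normalized votes to every labeled feature in the document.
            for (Integer fi : labeledFeatures.keySet()) {
                if (doc.contains(fi)) {
                    double[] sum = sums.get(fi);
                    for (int li = 0; li < numLabels; li++) sum[li] += votes[li] / totalVotes;
                }
            }
        }
        // Step 3: normalize each feature's accumulated votes into a distribution.
        for (double[] sum : sums.values()) {
            double z = 0;
            for (double v : sum) z += v;
            if (z > 0) {
                for (int li = 0; li < numLabels; li++) sum[li] /= z;
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> labeled = new HashMap<>();
        labeled.put(0, Arrays.asList(0));
        labeled.put(1, Arrays.asList(1));
        List<Set<Integer>> docs = Arrays.asList(
            new HashSet<>(Arrays.asList(0)), new HashSet<>(Arrays.asList(0, 1)));
        System.out.println(Arrays.toString(targets(labeled, docs, 2).get(0)));
    }
}
```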
-
labelFeatures
public static java.util.HashMap<java.lang.Integer,java.util.ArrayList<java.lang.Integer>> labelFeatures(InstanceList list, java.util.ArrayList<java.lang.Integer> features, boolean reject)
Label features using the heuristic described in "Learning from Labeled Features using Generalized Expectation Criteria" (Gregory Druck, Gideon Mann, Andrew McCallum).
Parameters:
list - InstanceList used to compute statistics for labeling features.
features - List of features to label.
reject - Whether to reject labeling features.
Returns:
Labeled features, HashMap mapping feature indices to list of labels.
-
labelFeatures
public static java.util.HashMap<java.lang.Integer,java.util.ArrayList<java.lang.Integer>> labelFeatures(InstanceList list, java.util.ArrayList<java.lang.Integer> features)
-
getFeatureLabelCounts
public static double[][] getFeatureLabelCounts(InstanceList list, boolean useValues)