Package cc.mallet.classify
Class FeatureConstraintUtil
java.lang.Object
    cc.mallet.classify.FeatureConstraintUtil
public class FeatureConstraintUtil
extends java.lang.Object

Utility functions for creating feature constraints that can be used with GE (generalized expectation) training.

Author:
Gregory Druck gdruck@cs.umass.edu
Constructor Summary
FeatureConstraintUtil()
Method Summary
All methods are static.

static double[][] getFeatureLabelCounts(InstanceList list, boolean useValues)

static java.util.HashMap<java.lang.Integer,java.util.ArrayList<java.lang.Integer>> labelFeatures(InstanceList list, java.util.ArrayList<java.lang.Integer> features)

static java.util.HashMap<java.lang.Integer,java.util.ArrayList<java.lang.Integer>> labelFeatures(InstanceList list, java.util.ArrayList<java.lang.Integer> features, boolean reject)
    Label features using the heuristic described in "Learning from Labeled Features using Generalized Expectation Criteria" by Gregory Druck, Gideon Mann, and Andrew McCallum.

static java.util.HashMap<java.lang.Integer,double[]> readConstraintsFromFile(java.lang.String filename, InstanceList data)
    Reads feature constraints from a file, whether they are stored using strings or indices.

static java.util.HashMap<java.lang.Integer,double[]> readConstraintsFromFileIndex(java.lang.String filename, InstanceList data)
    Reads feature constraints stored using indices from a file.

static java.util.HashMap<java.lang.Integer,double[]> readConstraintsFromFileString(java.lang.String filename, InstanceList data)
    Reads feature constraints stored using strings from a file.

static java.util.HashMap<java.lang.Integer,double[][]> readRangeConstraintsFromFile(java.lang.String filename, InstanceList data)
    Reads range constraints stored using strings from a file.

static java.util.ArrayList<java.lang.Integer> selectFeaturesByInfoGain(InstanceList list, int numFeatures)
    Select features with the highest information gain.

static java.util.ArrayList<java.lang.Integer> selectTopLDAFeatures(int numSelFeatures, ParallelTopicModel lda, Alphabet alphabet)
    Select top features in LDA topics.

static java.util.HashMap<java.lang.Integer,double[]> setTargetsUsingData(InstanceList list, java.util.ArrayList<java.lang.Integer> features)

static java.util.HashMap<java.lang.Integer,double[]> setTargetsUsingData(InstanceList list, java.util.ArrayList<java.lang.Integer> features, boolean normalize)

static java.util.HashMap<java.lang.Integer,double[]> setTargetsUsingData(InstanceList list, java.util.ArrayList<java.lang.Integer> features, boolean useValues, boolean normalize)
    Set target distributions using estimates from data.

static java.util.HashMap<java.lang.Integer,double[]> setTargetsUsingFeatureVoting(java.util.HashMap<java.lang.Integer,java.util.ArrayList<java.lang.Integer>> labeledFeatures, InstanceList trainingData)
    Set target distributions using the feature voting heuristic described in "Learning from Labeled Features using Generalized Expectation Criteria" by Gregory Druck, Gideon Mann, and Andrew McCallum.

static java.util.HashMap<java.lang.Integer,double[]> setTargetsUsingHeuristic(java.util.HashMap<java.lang.Integer,java.util.ArrayList<java.lang.Integer>> labeledFeatures, int numLabels, double majorityProb)
    Set target distributions using the "Schapire" heuristic described in "Learning from Labeled Features using Generalized Expectation Criteria" by Gregory Druck, Gideon Mann, and Andrew McCallum.

Method Detail
readRangeConstraintsFromFile
public static java.util.HashMap<java.lang.Integer,double[][]> readRangeConstraintsFromFile(java.lang.String filename, InstanceList data)
Reads range constraints stored using strings from a file. The format can be either:
    feature_name (label_name:lower_probability,upper_probability)+
    feature_name (label_name:probability)+
Constraints are added only for the feature-label pairs that are present.
Parameters:
filename - File with feature constraints.
data - InstanceList used for alphabets.
Returns:
Constraints.
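To make the two line formats concrete, here is a minimal stand-alone parser for a single line. It is a sketch, not MALLET's implementation: a plain `Map<String, Integer>` stands in for the target alphabet of `data`, tokens are assumed to be whitespace-separated, and each label's bounds are stored as `{lower, upper}`, with `NaN` marking labels the line does not mention (a hypothetical encoding; the page only says constraints are added for the pairs that are present). Treating a lone probability `p` as the range `[p, p]` is also an assumption.

```java
import java.util.Arrays;
import java.util.Map;

class RangeConstraintLineParser {
    /**
     * Parses one line of the form
     *   feature_name (label_name:lower,upper)+   or   feature_name (label_name:prob)+
     * Returns bounds[numLabels][2] = {lower, upper}; labels absent from the line stay NaN.
     * labelIndex is a hypothetical stand-in for the data's target alphabet.
     */
    static double[][] parseLine(String line, Map<String, Integer> labelIndex) {
        String[] tokens = line.trim().split("\\s+");
        double[][] bounds = new double[labelIndex.size()][2];
        for (double[] row : bounds) {
            Arrays.fill(row, Double.NaN); // NaN marks "no constraint for this label"
        }
        // tokens[0] is the feature name; the remaining tokens are label:value pairs.
        for (int t = 1; t < tokens.length; t++) {
            String[] pair = tokens[t].split(":", 2);
            if (pair.length != 2) {
                throw new IllegalArgumentException("Expected label:value, got " + tokens[t]);
            }
            Integer li = labelIndex.get(pair[0]);
            if (li == null) {
                throw new IllegalArgumentException("Unknown label: " + pair[0]);
            }
            if (pair[1].contains(",")) {
                // Explicit range: lower,upper
                String[] range = pair[1].split(",");
                bounds[li][0] = Double.parseDouble(range[0]);
                bounds[li][1] = Double.parseDouble(range[1]);
            } else {
                // Single probability: treated here as the degenerate range [p, p]
                double p = Double.parseDouble(pair[1]);
                bounds[li][0] = p;
                bounds[li][1] = p;
            }
        }
        return bounds;
    }

    /** Returns the feature name, i.e. the first token of the line. */
    static String featureName(String line) {
        return line.trim().split("\\s+")[0];
    }
}
```

For example, `goal sports:0.6,0.9 politics:0.05` bounds `sports` to [0.6, 0.9], fixes `politics` at 0.05, and leaves every other label unconstrained.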
-
readConstraintsFromFile
public static java.util.HashMap<java.lang.Integer,double[]> readConstraintsFromFile(java.lang.String filename, InstanceList data)
Reads feature constraints from a file, whether they are stored using strings or indices.
Parameters:
filename - File with feature constraints.
data - InstanceList used for alphabets.
Returns:
Constraints.
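This page does not say how `readConstraintsFromFile` tells the two layouts apart. One plausible check, shown here purely as an assumption and not as the method's actual logic, relies on the fact that only the string layout uses `label_name:probability` pairs: if any token after the first one on the first non-empty line contains a colon, the file is treated as string-based.

```java
import java.util.List;

class ConstraintFormatSniffer {
    enum Format { STRING, INDEX }

    /**
     * Hypothetical check: the string layout writes label_name:probability pairs,
     * while the index layout writes bare numbers. The first non-empty line decides.
     */
    static Format detect(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] tokens = trimmed.split("\\s+");
            // Look past the first token (feature name or feature index) for a colon.
            for (int t = 1; t < tokens.length; t++) {
                if (tokens[t].indexOf(':') >= 0) {
                    return Format.STRING;
                }
            }
            return Format.INDEX;
        }
        throw new IllegalArgumentException("No non-empty lines to inspect");
    }
}
```

The lines of a file can be obtained with `java.nio.file.Files.readAllLines(path)` and passed straight to `detect`.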
-
readConstraintsFromFileString
public static java.util.HashMap<java.lang.Integer,double[]> readConstraintsFromFileString(java.lang.String filename, InstanceList data)
Reads feature constraints stored using strings from a file. The format is:
    feature_name (label_name:probability)+
Labels that do not appear get probability 0.
Parameters:
filename - File with feature constraints.
data - InstanceList used for alphabets.
Returns:
Constraints.
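A minimal sketch of parsing one line of the string format into a target vector, again with a hypothetical `Map<String, Integer>` standing in for the target alphabet. Java zero-initializes the array, which gives the documented behavior that labels not listed on the line get probability 0. Splitting each pair at its last colon is an assumption of this sketch, and it does not renormalize the values it reads; whether the real method does is not stated here.

```java
import java.util.Map;

class StringConstraintLineParser {
    /**
     * Parses "feature_name (label_name:probability)+" into a target vector
     * indexed by label. Labels not mentioned on the line keep probability 0.
     * labelIndex is a hypothetical stand-in for the data's target alphabet.
     */
    static double[] parseLine(String line, Map<String, Integer> labelIndex) {
        String[] tokens = line.trim().split("\\s+");
        double[] target = new double[labelIndex.size()]; // all entries start at 0.0
        for (int t = 1; t < tokens.length; t++) {
            int colon = tokens[t].lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Expected label:probability, got " + tokens[t]);
            }
            String label = tokens[t].substring(0, colon);
            Integer li = labelIndex.get(label);
            if (li == null) {
                throw new IllegalArgumentException("Unknown label: " + label);
            }
            target[li] = Double.parseDouble(tokens[t].substring(colon + 1));
        }
        return target;
    }
}
```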
-
readConstraintsFromFileIndex
public static java.util.HashMap<java.lang.Integer,double[]> readConstraintsFromFileIndex(java.lang.String filename, InstanceList data)
Reads feature constraints stored using indices from a file. The format is:
    feature_index label_0_prob label_1_prob ... label_n_prob
Here every label must appear.
Parameters:
filename - File with feature constraints.
data - InstanceList used for alphabets.
Returns:
Constraints.
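For the index format, here is a sketch that parses one line and enforces the rule that every label must appear by rejecting lines that do not carry exactly one probability per label. The `Parsed` holder class and the exception-based check are illustrative choices, not part of MALLET.

```java
class IndexConstraintLineParser {
    /** Result of parsing one line: the feature index and its per-label target. */
    static final class Parsed {
        final int featureIndex;
        final double[] target;

        Parsed(int featureIndex, double[] target) {
            this.featureIndex = featureIndex;
            this.target = target;
        }
    }

    /**
     * Parses "feature_index label_0_prob label_1_prob ... label_n_prob".
     * Every label must appear, so the line must hold exactly numLabels probabilities.
     */
    static Parsed parseLine(String line, int numLabels) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != numLabels + 1) {
            throw new IllegalArgumentException(
                "Expected " + numLabels + " probabilities, found " + (tokens.length - 1));
        }
        int featureIndex = Integer.parseInt(tokens[0]);
        double[] target = new double[numLabels];
        for (int li = 0; li < numLabels; li++) {
            target[li] = Double.parseDouble(tokens[li + 1]);
        }
        return new Parsed(featureIndex, target);
    }
}
```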
-
selectFeaturesByInfoGain
public static java.util.ArrayList<java.lang.Integer> selectFeaturesByInfoGain(InstanceList list, int numFeatures)
Select features with the highest information gain.
Parameters:
list - InstanceList for computing information gain.
numFeatures - Number of features to select.
Returns:
List of features with the highest information gain.
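Information gain measures how much knowing whether a feature is present reduces uncertainty about the label: IG(f) = H(Y) - P(f) H(Y | f) - P(not f) H(Y | not f). The sketch below applies this textbook definition (base-2 logarithms) to a toy dataset in which each instance is a set of feature indices, then keeps the `numFeatures` highest-scoring features. It does not use MALLET's classes, and that `selectFeaturesByInfoGain` scores features in exactly this way is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

class InfoGainSelector {
    /** Entropy in bits of a vector of label counts. */
    static double entropy(double[] counts) {
        double total = 0;
        for (double c : counts) total += c;
        if (total == 0) return 0;
        double h = 0;
        for (double c : counts) {
            if (c > 0) {
                double p = c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /** IG(f) = H(Y) - P(f) H(Y|f) - P(not f) H(Y|not f), based on feature presence. */
    static double infoGain(List<Set<Integer>> instances, int[] labels, int numLabels, int feature) {
        double[] all = new double[numLabels], with = new double[numLabels], without = new double[numLabels];
        for (int i = 0; i < instances.size(); i++) {
            all[labels[i]]++;
            if (instances.get(i).contains(feature)) with[labels[i]]++;
            else without[labels[i]]++;
        }
        double n = instances.size();
        double nWith = 0;
        for (double c : with) nWith += c;
        return entropy(all) - (nWith / n) * entropy(with) - ((n - nWith) / n) * entropy(without);
    }

    /** Returns the indices of the numFeatures features with the highest information gain. */
    static List<Integer> select(List<Set<Integer>> instances, int[] labels, int numLabels,
                                int numAllFeatures, int numFeatures) {
        double[] gains = new double[numAllFeatures];
        List<Integer> order = new ArrayList<>();
        for (int f = 0; f < numAllFeatures; f++) {
            gains[f] = infoGain(instances, labels, numLabels, f);
            order.add(f);
        }
        // Stable sort, highest gain first; ties keep ascending feature order.
        order.sort(Comparator.comparingDouble((Integer f) -> gains[f]).reversed());
        return new ArrayList<>(order.subList(0, Math.min(numFeatures, order.size())));
    }
}
```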
-
selectTopLDAFeatures
public static java.util.ArrayList<java.lang.Integer> selectTopLDAFeatures(int numSelFeatures, ParallelTopicModel lda, Alphabet alphabet)
Select top features in LDA topics.
Parameters:
numSelFeatures - Number of features to select.
lda - ParallelTopicModel that provides the LDA model.
alphabet - The vector dataset alphabet, which may be different from the alphabet of the sequence dataset used by the LDA model.
Returns:
ArrayList with the int indices of the selected features.
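The page does not describe the selection order. One natural strategy, assumed here for illustration, is to walk down the ranked word lists of all topics rank by rank (round-robin across topics), map each word into the vector alphabet, skip words that are missing there or already chosen, and stop once `numSelFeatures` indices are collected. Plain lists and a `Map<String, Integer>` are hypothetical stand-ins for the topic model's ranked words and for `alphabet`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class LdaFeatureSelector {
    /**
     * topWords.get(t) lists topic t's words from most to least probable.
     * vectorAlphabet maps a word to its index in the vector dataset's alphabet.
     */
    static List<Integer> select(int numSelFeatures, List<List<String>> topWords,
                                Map<String, Integer> vectorAlphabet) {
        List<Integer> selected = new ArrayList<>();
        int maxRank = 0;
        for (List<String> words : topWords) maxRank = Math.max(maxRank, words.size());
        // Round-robin: rank 0 of every topic, then rank 1 of every topic, and so on.
        for (int rank = 0; rank < maxRank; rank++) {
            for (List<String> words : topWords) {
                if (rank >= words.size()) continue;
                Integer fi = vectorAlphabet.get(words.get(rank));
                // Skip words unknown to the vector alphabet and words already selected.
                if (fi != null && !selected.contains(fi)) {
                    selected.add(fi);
                    if (selected.size() == numSelFeatures) return selected;
                }
            }
        }
        return selected; // fewer than requested if the topics run out of usable words
    }
}
```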
-
setTargetsUsingData
public static java.util.HashMap<java.lang.Integer,double[]> setTargetsUsingData(InstanceList list, java.util.ArrayList<java.lang.Integer> features)
-
setTargetsUsingData
public static java.util.HashMap<java.lang.Integer,double[]> setTargetsUsingData(InstanceList list, java.util.ArrayList<java.lang.Integer> features, boolean normalize)
-
setTargetsUsingData
public static java.util.HashMap<java.lang.Integer,double[]> setTargetsUsingData(InstanceList list, java.util.ArrayList<java.lang.Integer> features, boolean useValues, boolean normalize)
Set target distributions using estimates from data.
Parameters:
list - InstanceList used to estimate targets.
features - List of features for constraints.
useValues - Whether to count feature values rather than feature occurrences.
normalize - Whether to normalize by feature counts.
Returns:
Constraints (map of feature index to target), with targets set using estimates from the supplied data.
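A minimal sketch of the estimation step, assuming the feature-label counts have already been collected into a `double[numFeatures][numLabels]` matrix (the kind of table the `getFeatureLabelCounts` signature suggests, though this page does not document it). With `normalize` set, each selected row is divided by the feature's total count, which yields an estimate of p(label | feature); otherwise the raw counts are returned. Giving a never-observed feature a uniform target is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DataTargetEstimator {
    /**
     * featureLabelCounts[f][y] holds (possibly value-weighted) co-occurrence counts of
     * feature f with label y. With normalize = true each selected row is divided by the
     * feature's total count, giving an estimate of p(y | f).
     */
    static Map<Integer, double[]> targets(double[][] featureLabelCounts, List<Integer> features,
                                          boolean normalize) {
        Map<Integer, double[]> constraints = new HashMap<>();
        for (int fi : features) {
            double[] row = featureLabelCounts[fi].clone(); // leave the input matrix untouched
            if (normalize) {
                double total = 0;
                for (double c : row) total += c;
                for (int y = 0; y < row.length; y++) {
                    // Assumption: a feature never seen in the data gets a uniform target.
                    row[y] = total > 0 ? row[y] / total : 1.0 / row.length;
                }
            }
            constraints.put(fi, row);
        }
        return constraints;
    }
}
```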
-
setTargetsUsingHeuristic
public static java.util.HashMap<java.lang.Integer,double[]> setTargetsUsingHeuristic(java.util.HashMap<java.lang.Integer,java.util.ArrayList<java.lang.Integer>> labeledFeatures, int numLabels, double majorityProb)
Set target distributions using the "Schapire" heuristic described in "Learning from Labeled Features using Generalized Expectation Criteria" by Gregory Druck, Gideon Mann, and Andrew McCallum.
Parameters:
labeledFeatures - HashMap from feature indices to the list of label indices for each feature.
numLabels - Total number of labels.
majorityProb - Probability mass divided among the majority labels.
Returns:
Constraints (map of feature index to target distribution), with target distributions set using the heuristic.
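The `majorityProb` description suggests the following shape for one feature's target, sketched here under assumptions: the majority mass is split evenly over the feature's labels, the remaining `1 - majorityProb` is split evenly over all other labels, and a feature labeled with every label gets a uniform target. The sketch also assumes each label list is non-empty and free of duplicates.

```java
import java.util.Arrays;
import java.util.List;

class SchapireHeuristic {
    /**
     * Target for one labeled feature: majorityProb is split evenly among the feature's
     * labels and the remaining 1 - majorityProb is split evenly among all other labels.
     * If the feature is labeled with every label, the target is uniform.
     */
    static double[] target(List<Integer> featureLabels, int numLabels, double majorityProb) {
        double[] dist = new double[numLabels];
        int k = featureLabels.size();
        if (k == numLabels) {
            Arrays.fill(dist, 1.0 / numLabels);
            return dist;
        }
        // Start every label at its share of the leftover mass...
        Arrays.fill(dist, (1.0 - majorityProb) / (numLabels - k));
        // ...then overwrite the feature's own labels with their share of the majority mass.
        for (int li : featureLabels) {
            dist[li] = majorityProb / k;
        }
        return dist;
    }
}
```

For instance, with 4 labels and `majorityProb = 0.9`, a feature labeled only with label 1 gets 0.9 on label 1 and 0.1 / 3 on each of the others. Applying `target` to every entry of `labeledFeatures` yields a map shaped like the method's return value.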
-
setTargetsUsingFeatureVoting
public static java.util.HashMap<java.lang.Integer,double[]> setTargetsUsingFeatureVoting(java.util.HashMap<java.lang.Integer,java.util.ArrayList<java.lang.Integer>> labeledFeatures, InstanceList trainingData)
Set target distributions using the feature voting heuristic described in "Learning from Labeled Features using Generalized Expectation Criteria" by Gregory Druck, Gideon Mann, and Andrew McCallum.
Parameters:
labeledFeatures - HashMap from feature indices to the list of label indices for each feature.
trainingData - InstanceList to use for computing expectations with feature voting.
Returns:
Constraints (map of feature index to target distribution), with target distributions set using feature voting.
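Feature voting can be sketched in two steps, under simplifying assumptions that this page does not state: features are binary (present or absent), and each labeled feature casts one vote for each of its labels. First, every instance gets a label distribution from the votes of the labeled features it contains. Second, a labeled feature's target is the normalized sum of the distributions of the instances that contain it. Instances with no labeled features are skipped.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FeatureVoting {
    /**
     * instances: each instance as the set of feature indices it contains.
     * labeledFeatures: feature index -> labels assigned to that feature.
     */
    static Map<Integer, double[]> targets(Map<Integer, List<Integer>> labeledFeatures,
                                          List<Set<Integer>> instances, int numLabels) {
        // Running sum of instance label distributions for each labeled feature.
        Map<Integer, double[]> sums = new HashMap<>();
        for (int fi : labeledFeatures.keySet()) {
            sums.put(fi, new double[numLabels]);
        }
        for (Set<Integer> instance : instances) {
            // Step 1: votes from the labeled features present in this instance.
            double[] votes = new double[numLabels];
            double numVotes = 0;
            for (Map.Entry<Integer, List<Integer>> entry : labeledFeatures.entrySet()) {
                if (instance.contains(entry.getKey())) {
                    for (int li : entry.getValue()) {
                        votes[li] += 1;
                        numVotes += 1;
                    }
                }
            }
            if (numVotes == 0) {
                continue; // no labeled feature here, so nothing to contribute
            }
            // Step 2: add the normalized vote distribution to each labeled feature present.
            for (Map.Entry<Integer, double[]> entry : sums.entrySet()) {
                if (instance.contains(entry.getKey())) {
                    double[] sum = entry.getValue();
                    for (int li = 0; li < numLabels; li++) {
                        sum[li] += votes[li] / numVotes;
                    }
                }
            }
        }
        // Normalize each feature's accumulated row into a distribution.
        for (double[] row : sums.values()) {
            double total = 0;
            for (double v : row) total += v;
            for (int li = 0; li < numLabels; li++) {
                row[li] = total > 0 ? row[li] / total : 1.0 / numLabels; // unseen feature: uniform (assumption)
            }
        }
        return sums;
    }
}
```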
-
labelFeatures
public static java.util.HashMap<java.lang.Integer,java.util.ArrayList<java.lang.Integer>> labelFeatures(InstanceList list, java.util.ArrayList<java.lang.Integer> features, boolean reject)
Label features using the heuristic described in "Learning from Labeled Features using Generalized Expectation Criteria" by Gregory Druck, Gideon Mann, and Andrew McCallum.
Parameters:
list - InstanceList used to compute statistics for labeling features.
features - List of features to label.
reject - Whether the labeler may decline (reject) labeling some features.
Returns:
Labeled features: a HashMap mapping feature indices to lists of labels.
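This page does not spell out the labeling heuristic, so the sketch below uses a hypothetical rule purely for illustration: estimate p(label | feature) from feature-label counts and assign every label whose probability is at least half that of the most probable label. With `reject` set, it declines to label features that would receive more than half of all labels. Both the factor-of-two cutoff and the rejection rule are assumptions and are not confirmed to match the paper's heuristic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OracleFeatureLabeler {
    /**
     * Hypothetical labeling rule: give feature f every label whose estimated p(y | f)
     * is at least half that of its most probable label. With reject = true, features
     * that would receive more than half of all labels are left unlabeled.
     */
    static Map<Integer, List<Integer>> label(double[][] featureLabelCounts, List<Integer> features,
                                             boolean reject) {
        Map<Integer, List<Integer>> labeled = new HashMap<>();
        for (int fi : features) {
            double[] counts = featureLabelCounts[fi];
            int numLabels = counts.length;
            double max = 0;
            for (double c : counts) max = Math.max(max, c);
            if (max == 0) continue; // feature never observed: nothing to go on
            List<Integer> labels = new ArrayList<>();
            for (int li = 0; li < numLabels; li++) {
                // Comparing raw counts is equivalent to comparing p(y|f): same normalizer.
                if (counts[li] >= max / 2) labels.add(li);
            }
            if (reject && labels.size() > numLabels / 2) continue; // too uninformative
            labeled.put(fi, labels);
        }
        return labeled;
    }
}
```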
-
labelFeatures
public static java.util.HashMap<java.lang.Integer,java.util.ArrayList<java.lang.Integer>> labelFeatures(InstanceList list, java.util.ArrayList<java.lang.Integer> features)
-
getFeatureLabelCounts
public static double[][] getFeatureLabelCounts(InstanceList list, boolean useValues)
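`getFeatureLabelCounts` has no description here. Going by its name and `double[][]` return type, a reasonable guess is a feature-by-label count table. The sketch below builds one from sparse instances (parallel index and value arrays), each with a single hard label, adding the feature value when `useValues` is true and 1 otherwise. All of this is an assumption about the method's behavior.

```java
class FeatureLabelCounter {
    /**
     * Builds counts[f][y]: for every instance with label y and every feature f it
     * contains, add the feature's value (useValues = true) or 1 (useValues = false).
     * indices[i] and values[i] are the sparse feature indices and values of instance i.
     */
    static double[][] counts(int[][] indices, double[][] values, int[] labels,
                             int numFeatures, int numLabels, boolean useValues) {
        double[][] counts = new double[numFeatures][numLabels];
        for (int i = 0; i < indices.length; i++) {
            for (int loc = 0; loc < indices[i].length; loc++) {
                double v = useValues ? values[i][loc] : 1.0;
                counts[indices[i][loc]][labels[i]] += v;
            }
        }
        return counts;
    }
}
```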