Package cc.mallet.fst
Class CRF
- java.lang.Object
  - cc.mallet.fst.Transducer
    - cc.mallet.fst.CRF
- All Implemented Interfaces:
java.io.Serializable
- Direct Known Subclasses:
MEMM
public class CRF extends Transducer implements java.io.Serializable
Represents a conditional random field (CRF) model.
- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes
- static class CRF.Factors
  A simple, transparent container to hold the parameters or sufficient statistics for the CRF.
- static class CRF.State
- protected static class CRF.TransitionIterator
-
Nested classes/interfaces inherited from class cc.mallet.fst.Transducer
Transducer.Incrementor
-
-
Field Summary
Fields
- protected int cachedNumParametersStamp
- protected java.util.ArrayList<FeatureInducer> featureInducers
- protected FeatureSelection[] featureSelections
- protected FeatureSelection globalFeatureSelection
- protected java.util.ArrayList<CRF.State> initialStates
- protected Alphabet inputAlphabet
- protected java.util.HashMap<java.lang.String,CRF.State> name2state
- protected int numParameters
- protected Alphabet outputAlphabet
- protected CRF.Factors parameters
- protected java.util.ArrayList<CRF.State> states
- protected int weightsStructureChangeStamp
- protected int weightsValueChangeStamp
-
Fields inherited from class cc.mallet.fst.Transducer
CERTAIN_WEIGHT, IMPOSSIBLE_WEIGHT, inputPipe, outputPipe
-
-
Method Summary
Methods
- void addFullyConnectedStates(java.lang.String[] stateNames)
  Add a group of states that are fully connected with each other, with parameters equal to zero, and labels on their outgoing arcs the same as their destination state names.
- void addFullyConnectedStatesForBiLabels()
- void addFullyConnectedStatesForLabels()
- void addFullyConnectedStatesForThreeQuarterLabels(InstanceList trainingSet)
- void addFullyConnectedStatesForTriLabels()
- java.lang.String addOrderNStates(InstanceList trainingSet, int[] orders, boolean[] defaults, java.lang.String start, java.util.regex.Pattern forbidden, java.util.regex.Pattern allowed, boolean fullyConnected)
  Assumes that the CRF's output alphabet contains Strings.
- void addSelfTransitioningStateForAllLabels(java.lang.String name)
- void addStartState()
- void addStartState(java.lang.String name)
- void addState(java.lang.String name, double initialWeight, double finalWeight, java.lang.String[] destinationNames, java.lang.String[] labelNames)
  By default, gives separate parameters to each transition.
- void addState(java.lang.String name, double initialWeight, double finalWeight, java.lang.String[] destinationNames, java.lang.String[] labelNames, java.lang.String[] weightNames)
- void addState(java.lang.String name, double initialWeight, double finalWeight, java.lang.String[] destinationNames, java.lang.String[] labelNames, java.lang.String[][] weightNames)
- void addState(java.lang.String name, java.lang.String[] destinationNames)
  Add a state with parameters equal to zero, and labels on outgoing arcs the same as their destination state names.
- void addStatesForBiLabelsConnectedAsIn(InstanceList trainingSet)
  Add states to create a second-order Markov model on labels, adding only those transitions that occur in the given trainingSet.
- void addStatesForHalfLabelsConnectedAsIn(InstanceList trainingSet)
  Add as many states as there are labels, but don't create separate weights for each source-destination pair of states.
- void addStatesForLabelsConnectedAsIn(InstanceList trainingSet)
  Add states to create a first-order Markov model on labels, adding only those transitions that occur in the given trainingSet.
- void addStatesForThreeQuarterLabelsConnectedAsIn(InstanceList trainingSet)
  Add as many states as there are labels, but don't create separate observational-test-weights for each source-destination pair of states; instead have all the incoming transitions to a state share the same observational-feature-test weights.
- void evaluate(TransducerEvaluator eval, InstanceList testing)
  Deprecated.
- void freezeWeights(int weightsIndex)
  Freezes a set of weights to their current values.
- void freezeWeights(java.lang.String weightsName)
  Freezes a set of weights to their current values.
- double[] getDefaultWeights()
- Alphabet getInputAlphabet()
- int getNumParameters()
- Alphabet getOutputAlphabet()
- double getParameter(int sourceStateIndex, int destStateIndex, int featureIndex)
  Only gets the parameter from the first group of parameters.
- double getParameter(int sourceStateIndex, int destStateIndex, int featureIndex, int weightIndex)
- CRF.Factors getParameters()
- double getParametersAbsNorm()
- Transducer.State getState(int index)
- CRF.State getState(java.lang.String name)
- SparseVector[] getWeights()
- SparseVector getWeights(int weightIndex)
- SparseVector getWeights(java.lang.String weightName)
- int getWeightsIndex(java.lang.String weightName)
- java.lang.String getWeightsName(int weightIndex)
- int getWeightsStructureChangeStamp()
- int getWeightsValueChangeStamp()
- void induceFeaturesFor(InstanceList instances)
  When the CRF has done feature induction, these new feature conjunctions must be created in the test or validation data in order for them to take effect.
- java.util.Iterator initialStateIterator()
- boolean isTrainable()
- boolean isWeightsFrozen(int weightsIndex)
- protected CRF.State newState(java.lang.String name, int index, double initialWeight, double finalWeight, java.lang.String[] destinationNames, java.lang.String[] labelNames, java.lang.String[][] weightNames, CRF crf)
- int numStates()
- Sequence[] predict(InstanceList testing)
  Deprecated.
- void print()
- void print(java.io.PrintWriter out)
- void setAsStartState(CRF.State state)
- void setDefaultWeight(int widx, double val)
- void setDefaultWeights(double[] w)
- void setFeatureSelection(int weightIdx, FeatureSelection fs)
- void setParameter(int sourceStateIndex, int destStateIndex, int featureIndex, double value)
  Only sets the parameter from the first group of parameters.
- void setParameter(int sourceStateIndex, int destStateIndex, int featureIndex, int weightIndex, double value)
- void setWeights(int weightsIndex, SparseVector transitionWeights)
- void setWeights(SparseVector[] m)
- void setWeights(java.lang.String weightName, SparseVector transitionWeights)
- void setWeightsDimensionAsIn(InstanceList trainingData)
- void setWeightsDimensionAsIn(InstanceList trainingData, boolean useSomeUnsupportedTrick)
- void setWeightsDimensionDensely()
- void unfreezeWeights(java.lang.String weightsName)
  Unfreezes a set of weights.
- void weightsStructureChanged()
  This method should be called whenever the CRF's weights (parameters) have their structure/arity/number changed.
- void weightsValueChanged()
  This method should be called whenever the CRF's weights (parameters) are changed.
- void write(java.io.File f)
-
Methods inherited from class cc.mallet.fst.Transducer
averageTokenAccuracy, canIterateAllTransitions, generatePath, getInputPipe, getMaxLatticeFactory, getOutputPipe, getSumLatticeFactory, isGenerative, label, less_efficient_sumLogProb, no_longer_needed_sumNegLogProb, setMaxLatticeFactory, setSumLatticeFactory, stateIndexOfString, sumLogProb, transduce, transduce
-
-
-
-
Field Detail
-
inputAlphabet
protected Alphabet inputAlphabet
-
outputAlphabet
protected Alphabet outputAlphabet
-
states
protected java.util.ArrayList<CRF.State> states
-
initialStates
protected java.util.ArrayList<CRF.State> initialStates
-
name2state
protected java.util.HashMap<java.lang.String,CRF.State> name2state
-
parameters
protected CRF.Factors parameters
-
globalFeatureSelection
protected FeatureSelection globalFeatureSelection
-
featureSelections
protected FeatureSelection[] featureSelections
-
featureInducers
protected java.util.ArrayList<FeatureInducer> featureInducers
-
weightsValueChangeStamp
protected int weightsValueChangeStamp
-
weightsStructureChangeStamp
protected int weightsStructureChangeStamp
-
cachedNumParametersStamp
protected int cachedNumParametersStamp
-
numParameters
protected int numParameters
-
-
Method Detail
-
getInputAlphabet
public Alphabet getInputAlphabet()
-
getOutputAlphabet
public Alphabet getOutputAlphabet()
-
weightsStructureChanged
public void weightsStructureChanged()
This method should be called whenever the CRF's weights (parameters) have their structure/arity/number changed.
-
weightsValueChanged
public void weightsValueChanged()
This method should be called whenever the values of the CRF's weights (parameters) are changed.
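The stamp fields listed above (weightsValueChangeStamp, weightsStructureChangeStamp, cachedNumParametersStamp) suggest a counter-based scheme in which callers compare stamps to decide whether cached values are stale. The following is a minimal sketch of that general idea, not MALLET's implementation; the class StampedWeightsSketch and all of its members are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a model with stamped weights; this is not MALLET code.
class StampedWeightsSketch {
    private final List<double[]> weightSets = new ArrayList<>();
    private int structureChangeStamp = 0;      // bumped when the number or shape of weights changes
    private int valueChangeStamp = 0;          // bumped when any weight value changes
    private int cachedNumParametersStamp = -1; // structure stamp at which numParameters was computed
    private int numParameters = 0;
    int recomputations = 0;                    // how many times the cached count was rebuilt

    void addWeightSet(int size) {
        weightSets.add(new double[size]);
        structureChangeStamp++;                // plays the role of weightsStructureChanged()
    }

    void setWeight(int set, int index, double value) {
        weightSets.get(set)[index] = value;
        valueChangeStamp++;                    // plays the role of weightsValueChanged()
    }

    // Recounts parameters only if the structure changed since the last count.
    int getNumParameters() {
        if (cachedNumParametersStamp != structureChangeStamp) {
            int n = 0;
            for (double[] w : weightSets) {
                n += w.length;
            }
            numParameters = n;
            cachedNumParametersStamp = structureChangeStamp;
            recomputations++;
        }
        return numParameters;
    }

    int getValueChangeStamp() { return valueChangeStamp; }

    int getStructureChangeStamp() { return structureChangeStamp; }

    public static void main(String[] args) {
        StampedWeightsSketch w = new StampedWeightsSketch();
        w.addWeightSet(3);
        w.addWeightSet(2);
        System.out.println("parameters: " + w.getNumParameters());
        w.setWeight(0, 1, 0.5); // a value change alone does not invalidate the count
        System.out.println("parameters: " + w.getNumParameters()
                + ", recomputations: " + w.recomputations);
    }
}
```

In this sketch a value change leaves the cached parameter count valid, while a structure change forces a recount on the next call.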
-
newState
protected CRF.State newState(java.lang.String name, int index, double initialWeight, double finalWeight, java.lang.String[] destinationNames, java.lang.String[] labelNames, java.lang.String[][] weightNames, CRF crf)
-
addState
public void addState(java.lang.String name, double initialWeight, double finalWeight, java.lang.String[] destinationNames, java.lang.String[] labelNames, java.lang.String[][] weightNames)
-
addState
public void addState(java.lang.String name, double initialWeight, double finalWeight, java.lang.String[] destinationNames, java.lang.String[] labelNames, java.lang.String[] weightNames)
-
addState
public void addState(java.lang.String name, double initialWeight, double finalWeight, java.lang.String[] destinationNames, java.lang.String[] labelNames)
By default, gives separate parameters to each transition.
-
addState
public void addState(java.lang.String name, java.lang.String[] destinationNames)
Add a state with parameters equal to zero, and labels on outgoing arcs the same as their destination state names.
-
addFullyConnectedStates
public void addFullyConnectedStates(java.lang.String[] stateNames)
Add a group of states that are fully connected with each other, with parameters equal to zero, and labels on their outgoing arcs the same as their destination state names.
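To make the topology concrete, here is a small sketch of what "fully connected, with arc labels equal to destination names" means as a data structure. It uses plain maps rather than MALLET types; the class FullyConnectedSketch and the method fullyConnect are hypothetical, and the sketch assumes that self-transitions are included.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: models states and arcs with plain maps, not with CRF.State.
class FullyConnectedSketch {
    // Returns, for each source state, a map from destination state name to arc label.
    // Every state is connected to every state (itself included, by assumption),
    // and each arc's label is the name of its destination.
    static Map<String, Map<String, String>> fullyConnect(String[] stateNames) {
        Map<String, Map<String, String>> arcs = new LinkedHashMap<>();
        for (String source : stateNames) {
            Map<String, String> outgoing = new LinkedHashMap<>();
            for (String destination : stateNames) {
                outgoing.put(destination, destination);
            }
            arcs.put(source, outgoing);
        }
        return arcs;
    }

    public static void main(String[] args) {
        String[] names = {"B", "I", "O"};
        Map<String, Map<String, String>> arcs = fullyConnect(names);
        for (Map.Entry<String, Map<String, String>> e : arcs.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().keySet());
        }
    }
}
```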
-
addFullyConnectedStatesForLabels
public void addFullyConnectedStatesForLabels()
-
addStartState
public void addStartState()
-
addStartState
public void addStartState(java.lang.String name)
-
setAsStartState
public void setAsStartState(CRF.State state)
-
addStatesForLabelsConnectedAsIn
public void addStatesForLabelsConnectedAsIn(InstanceList trainingSet)
Add states to create a first-order Markov model on labels, adding only those transitions that occur in the given trainingSet.
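As an illustration of "only those transitions that occur in the training set", the sketch below collects the observed label-to-label transitions from a list of label sequences. It works on plain lists of strings rather than an InstanceList, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative only: label sequences are plain string lists, not MALLET instances.
class ObservedTransitionsSketch {
    // Maps each label seen in the data to the set of labels that directly follow it.
    static Map<String, Set<String>> observedTransitions(List<List<String>> labelSequences) {
        Map<String, Set<String>> next = new TreeMap<>();
        for (List<String> seq : labelSequences) {
            for (int i = 0; i < seq.size(); i++) {
                // Every label becomes a state, even if it has no outgoing transition.
                Set<String> successors = next.computeIfAbsent(seq.get(i), k -> new TreeSet<>());
                if (i + 1 < seq.size()) {
                    successors.add(seq.get(i + 1));
                }
            }
        }
        return next;
    }

    public static void main(String[] args) {
        List<List<String>> data = Arrays.asList(
                Arrays.asList("B", "I", "O"),
                Arrays.asList("O", "B", "I", "I"));
        System.out.println(observedTransitions(data));
    }
}
```

With this data the transition O to I never appears, so a model built this way would have no arc for it.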
-
addStatesForHalfLabelsConnectedAsIn
public void addStatesForHalfLabelsConnectedAsIn(InstanceList trainingSet)
Add as many states as there are labels, but don't create separate weights for each source-destination pair of states. Instead, all the incoming transitions to a state share the same weights.
-
addStatesForThreeQuarterLabelsConnectedAsIn
public void addStatesForThreeQuarterLabelsConnectedAsIn(InstanceList trainingSet)
Add as many states as there are labels, but don't create separate observational-test-weights for each source-destination pair of states; instead have all the incoming transitions to a state share the same observational-feature-test weights. However, do create a separate default feature for each transition (which acts as an HMM-style transition probability).
-
addFullyConnectedStatesForThreeQuarterLabels
public void addFullyConnectedStatesForThreeQuarterLabels(InstanceList trainingSet)
-
addFullyConnectedStatesForBiLabels
public void addFullyConnectedStatesForBiLabels()
-
addStatesForBiLabelsConnectedAsIn
public void addStatesForBiLabelsConnectedAsIn(InstanceList trainingSet)
Add states to create a second-order Markov model on labels, adding only those transitions that occur in the given trainingSet.
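In a second-order model each state stands for a pair of consecutive labels, and a transition from (a,b) to (b,c) exists only if the label trigram a, b, c was observed. The sketch below illustrates this on plain string lists. The comma-joined state names and all class and method names are assumptions made for the example, not a statement about how MALLET names its states.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative only: a state is the string "previousLabel,currentLabel" (an assumed naming).
class BiLabelTransitionsSketch {
    // Maps each observed pair state to the pair states that follow it in the data.
    static Map<String, Set<String>> biLabelTransitions(List<List<String>> labelSequences) {
        Map<String, Set<String>> transitions = new TreeMap<>();
        for (List<String> seq : labelSequences) {
            for (int i = 2; i < seq.size(); i++) {
                String source = seq.get(i - 2) + "," + seq.get(i - 1);
                String destination = seq.get(i - 1) + "," + seq.get(i);
                transitions.computeIfAbsent(source, k -> new TreeSet<>()).add(destination);
            }
        }
        return transitions;
    }

    public static void main(String[] args) {
        List<List<String>> data = Arrays.asList(Arrays.asList("O", "B", "I", "O"));
        System.out.println(biLabelTransitions(data));
    }
}
```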
-
addFullyConnectedStatesForTriLabels
public void addFullyConnectedStatesForTriLabels()
-
addSelfTransitioningStateForAllLabels
public void addSelfTransitioningStateForAllLabels(java.lang.String name)
-
addOrderNStates
public java.lang.String addOrderNStates(InstanceList trainingSet, int[] orders, boolean[] defaults, java.lang.String start, java.util.regex.Pattern forbidden, java.util.regex.Pattern allowed, boolean fullyConnected)
Assumes that the CRF's output alphabet contains Strings. Creates an order-n CRF with input predicates and output labels given by trainingSet, and with order, connectivity, and weights given by the remaining arguments.
- Parameters:
trainingSet - the training instances
orders - an array of increasing non-negative numbers giving the orders of the features for this CRF. The largest number n is the Markov order of the CRF. States are n-tuples of output labels. Each of the other numbers k in orders represents a weight set shared by all destination states whose last (most recent) k labels agree. If orders is null, an order-0 CRF is built.
defaults - If non-null, it must be the same length as orders, with true positions indicating that the weight set for the corresponding order contains only the weight for a default feature; otherwise, the weight set has weights for all features built from input predicates.
start - The label that represents the context of the start of a sequence. It may also be used for sequence labels. If no label of this name exists, one will be added. Connections will be added between the start label and all other labels, even if fullyConnected is false. This argument may be null, in which case no special start state is added.
forbidden - If non-null, specifies what pairs of successive labels are not allowed, both for constructing order-n states and for transitions. A label pair (u,v) is not allowed if u + "," + v matches forbidden.
allowed - If non-null, specifies what pairs of successive labels are allowed, both for constructing order-n states and for transitions. A label pair (u,v) is allowed only if u + "," + v matches allowed.
fullyConnected - Whether to include all allowed transitions, even those not occurring in trainingSet.
- Returns:
- The name of the start state.
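The forbidden and allowed patterns are tested against the string u + "," + v for a pair of successive labels. The sketch below shows one way such a check can be written with java.util.regex. The class and method names are hypothetical, and the use of full-string matching via Matcher.matches() is an assumption about what "matches" means here.

```java
import java.util.regex.Pattern;

// Illustrative only: a stand-alone check, not the code used inside addOrderNStates.
class LabelPairFilterSketch {
    // Returns true if the successive label pair (u, v) passes both filters.
    // A null pattern means that filter is not applied.
    static boolean pairAllowed(String u, String v, Pattern forbidden, Pattern allowed) {
        String pair = u + "," + v;
        if (forbidden != null && forbidden.matcher(pair).matches()) {
            return false;
        }
        if (allowed != null && !allowed.matcher(pair).matches()) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Example constraint for BIO-style tags: "O" may not be followed by an "I-" tag.
        Pattern forbidden = Pattern.compile("O,I-.*");
        System.out.println(pairAllowed("O", "I-PER", forbidden, null));
        System.out.println(pairAllowed("B-PER", "I-PER", forbidden, null));
    }
}
```

Because the pair is joined with a comma, label names that themselves contain commas would make such patterns ambiguous.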
-
getState
public CRF.State getState(java.lang.String name)
-
setWeights
public void setWeights(int weightsIndex, SparseVector transitionWeights)
-
setWeights
public void setWeights(java.lang.String weightName, SparseVector transitionWeights)
-
getWeightsName
public java.lang.String getWeightsName(int weightIndex)
-
getWeights
public SparseVector getWeights(java.lang.String weightName)
-
getWeights
public SparseVector getWeights(int weightIndex)
-
getDefaultWeights
public double[] getDefaultWeights()
-
getWeights
public SparseVector[] getWeights()
-
setWeights
public void setWeights(SparseVector[] m)
-
setDefaultWeights
public void setDefaultWeights(double[] w)
-
setDefaultWeight
public void setDefaultWeight(int widx, double val)
-
isWeightsFrozen
public boolean isWeightsFrozen(int weightsIndex)
-
freezeWeights
public void freezeWeights(int weightsIndex)
Freezes a set of weights to their current values. Frozen weights are used for labeling sequences (as in transduce), but are not modified by the train methods.
- Parameters:
weightsIndex - Index of the weight set to freeze.
-
freezeWeights
public void freezeWeights(java.lang.String weightsName)
Freezes a set of weights to their current values. Frozen weights are used for labeling sequences (as in transduce), but are not modified by the train methods.
- Parameters:
weightsName - Name of the weight set to freeze.
-
unfreezeWeights
public void unfreezeWeights(java.lang.String weightsName)
Unfreezes a set of weights, so that the train methods may modify them again. While frozen, weights are used for labeling sequences (as in transduce), but are not modified by the train methods.
- Parameters:
weightsName - Name of the weight set to unfreeze.
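The freeze and unfreeze behavior described above can be pictured as a per-weight-set flag that a training step consults before updating, while scoring reads every weight set regardless. The following is a simplified sketch of that idea with dense arrays and a made-up update rule; FrozenWeightsSketch and its methods are hypothetical and do not reflect MALLET's trainers.

```java
// Illustrative only: dense weight sets with a frozen flag per set.
class FrozenWeightsSketch {
    private final double[][] weights;
    private final boolean[] frozen;

    FrozenWeightsSketch(double[][] initial) {
        weights = new double[initial.length][];
        for (int s = 0; s < initial.length; s++) {
            weights[s] = initial[s].clone();
        }
        frozen = new boolean[initial.length];
    }

    void freeze(int set) { frozen[set] = true; }

    void unfreeze(int set) { frozen[set] = false; }

    boolean isFrozen(int set) { return frozen[set]; }

    // A made-up training step: moves each unfrozen weight set along the given direction.
    void applyUpdate(double[][] direction, double stepSize) {
        for (int s = 0; s < weights.length; s++) {
            if (frozen[s]) {
                continue; // frozen sets keep their current values
            }
            for (int i = 0; i < weights[s].length; i++) {
                weights[s][i] += stepSize * direction[s][i];
            }
        }
    }

    // Scoring uses a weight set whether or not it is frozen.
    double score(int set, double[] features) {
        double sum = 0.0;
        for (int i = 0; i < features.length; i++) {
            sum += weights[set][i] * features[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        FrozenWeightsSketch model = new FrozenWeightsSketch(new double[][] {{1, 1}, {2, 2}});
        model.freeze(0);
        model.applyUpdate(new double[][] {{1, 1}, {1, 1}}, 0.5);
        System.out.println(model.score(0, new double[] {1, 1}));
        System.out.println(model.score(1, new double[] {1, 1}));
    }
}
```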
-
setFeatureSelection
public void setFeatureSelection(int weightIdx, FeatureSelection fs)
-
setWeightsDimensionAsIn
public void setWeightsDimensionAsIn(InstanceList trainingData)
-
setWeightsDimensionAsIn
public void setWeightsDimensionAsIn(InstanceList trainingData, boolean useSomeUnsupportedTrick)
-
setWeightsDimensionDensely
public void setWeightsDimensionDensely()
-
getWeightsIndex
public int getWeightsIndex(java.lang.String weightName)
-
numStates
public int numStates()
- Specified by:
numStates in class Transducer
-
getState
public Transducer.State getState(int index)
- Specified by:
getState in class Transducer
-
initialStateIterator
public java.util.Iterator initialStateIterator()
- Specified by:
initialStateIterator in class Transducer
-
isTrainable
public boolean isTrainable()
-
getWeightsValueChangeStamp
public int getWeightsValueChangeStamp()
-
getWeightsStructureChangeStamp
public int getWeightsStructureChangeStamp()
-
getParameters
public CRF.Factors getParameters()
-
getParametersAbsNorm
public double getParametersAbsNorm()
-
setParameter
public void setParameter(int sourceStateIndex, int destStateIndex, int featureIndex, double value)
Only sets the parameter from the first group of parameters.
-
setParameter
public void setParameter(int sourceStateIndex, int destStateIndex, int featureIndex, int weightIndex, double value)
-
getParameter
public double getParameter(int sourceStateIndex, int destStateIndex, int featureIndex)
Only gets the parameter from the first group of parameters.
-
getParameter
public double getParameter(int sourceStateIndex, int destStateIndex, int featureIndex, int weightIndex)
-
getNumParameters
public int getNumParameters()
-
predict
@Deprecated public Sequence[] predict(InstanceList testing)
Deprecated. This method is deprecated.
-
evaluate
@Deprecated public void evaluate(TransducerEvaluator eval, InstanceList testing)
Deprecated. This method is deprecated.
-
induceFeaturesFor
public void induceFeaturesFor(InstanceList instances)
When the CRF has done feature induction, these new feature conjunctions must be created in the test or validation data in order for them to take effect.
-
print
public void print()
- Overrides:
print in class Transducer
-
print
public void print(java.io.PrintWriter out)
-
write
public void write(java.io.File f)
-
-