Package cc.mallet.extract
Class CRFExtractor
- java.lang.Object
  - cc.mallet.extract.CRFExtractor
- All Implemented Interfaces:
Extractor, java.io.Serializable
public class CRFExtractor extends java.lang.Object implements Extractor
Created: Oct 12, 2004
- Version: $Id: CRFExtractor.java,v 1.1 2007/10/22 21:37:44 mccallum Exp $
- Author:
- See Also: Serialized Form
Constructor Summary
- CRFExtractor(CRF crf)
- CRFExtractor(CRF crf, Pipe tokpipe)
- CRFExtractor(CRF crf, Pipe tokpipe, TokenizationFilter filter)
- CRFExtractor(CRF crf, Pipe tokpipe, TokenizationFilter filter, java.lang.String backgroundTag)
- CRFExtractor(java.io.File crfFile)
Method Summary
- Extraction extract(Tokenization spans): Performs extraction from an object that has already been tokenized.
- Extraction extract(InstanceList ilist): Assumes Instance.source contains the Tokenization object.
- Extraction extract(java.lang.Object o): Performs extraction given a raw object.
- Extraction extract(java.util.Iterator<Instance> source): Performs extraction on a set of raw documents.
- java.lang.String getBackgroundTag()
- CRF getCrf()
- Pipe getFeaturePipe(): Returns the pipe used by this extractor to prepare tokenized input for the extraction algorithm.
- Alphabet getInputAlphabet(): Returns an alphabet of the features used by the extractor.
- LabelAlphabet getTargetAlphabet(): Returns an alphabet of the labels used by the extractor.
- TokenizationFilter getTokenizationFilter()
- Pipe getTokenizationPipe(): Returns the pipe used by this extractor to tokenize the input.
- Sequence pipeInput(java.lang.Object input)
- InstanceList pipeInstances(java.util.Iterator<Instance> source)
- void setFeaturePipe(Pipe featurePipe)
- void setTokenizationPipe(Pipe tokenizationPipe): Sets the pipe used by this extractor for tokenization.
- void slicePipes(int num): Transfers some Pipes from the feature pipe to the tokenization pipe.
Constructor Detail
-
CRFExtractor
public CRFExtractor(CRF crf)
-
CRFExtractor
public CRFExtractor(java.io.File crfFile) throws java.io.IOException
- Throws:
java.io.IOException
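This constructor loads its CRF from a file and can throw java.io.IOException. The page does not state the file format; assuming the file holds a CRF written with standard Java object serialization, the loading step would resemble the sketch below. TinyModel and ModelFileSketch are hypothetical stand-ins, not MALLET classes.

```java
import java.io.*;

// Hypothetical stand-in for a trained model; a real CRF holds far more state.
final class TinyModel implements Serializable {
    private static final long serialVersionUID = 1L;
    final String[] labels;

    TinyModel(String[] labels) {
        this.labels = labels;
    }
}

final class ModelFileSketch {
    // Writes the model with standard Java serialization.
    static void save(TinyModel model, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(model);
        }
    }

    // Reads the model back, as a file-based constructor presumably must.
    static TinyModel load(File file) throws IOException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (TinyModel) in.readObject();
        } catch (ClassNotFoundException e) {
            // Report a missing class as an IOException, matching the constructor's throws clause.
            throw new IOException("Model class not found", e);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("model", ".ser");
        file.deleteOnExit();
        save(new TinyModel(new String[] {"O", "PERSON"}), file);
        System.out.println(load(file).labels[1]); // prints PERSON
    }
}
```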
-
CRFExtractor
public CRFExtractor(CRF crf, Pipe tokpipe, TokenizationFilter filter)
-
CRFExtractor
public CRFExtractor(CRF crf, Pipe tokpipe, TokenizationFilter filter, java.lang.String backgroundTag)
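The filter and backgroundTag arguments appear to decide together how the CRF's per-token labels become extracted fields. The sketch below is not MALLET's TokenizationFilter code; it illustrates one plausible grouping rule under that assumption: consecutive tokens sharing a label are merged into one field, and tokens labelled with the background tag belong to no field. The classes ExtractedField and SpanGroupingSketch, and the tag value "O", are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical extracted field: a label plus a token range [start, end).
final class ExtractedField {
    final String label;
    final int start;
    final int end;

    ExtractedField(String label, int start, int end) {
        this.label = label;
        this.start = start;
        this.end = end;
    }

    @Override
    public String toString() {
        return label + "[" + start + "," + end + ")";
    }
}

final class SpanGroupingSketch {
    // Merges runs of identical labels into fields, skipping runs of the background tag.
    static List<ExtractedField> group(String[] labels, String backgroundTag) {
        List<ExtractedField> fields = new ArrayList<>();
        int i = 0;
        while (i < labels.length) {
            String label = labels[i];
            int j = i + 1;
            // Extend the run while the label stays the same.
            while (j < labels.length && labels[j].equals(label)) {
                j++;
            }
            if (!label.equals(backgroundTag)) {
                fields.add(new ExtractedField(label, i, j));
            }
            i = j;
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] labels = {"PERSON", "PERSON", "O", "O", "ORG"};
        System.out.println(group(labels, "O"));
    }
}
```

With the labels in main, this prints [PERSON[0,2), ORG[4,5)].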
-
Method Detail
-
extract
public Extraction extract(java.lang.Object o)
Description copied from interface: Extractor
Performs extraction given a raw object. The object will be passed through the Extractor's pipe.
-
extract
public Extraction extract(Tokenization spans)
Description copied from interface: Extractor
Performs extraction from an object that has already been tokenized. This method will pass spans through the extractor's pipe.
-
pipeInstances
public InstanceList pipeInstances(java.util.Iterator<Instance> source)
-
extract
public Extraction extract(InstanceList ilist)
Assumes Instance.source contains the Tokenization object.
-
extract
public Extraction extract(java.util.Iterator<Instance> source)
Description copied from interface: Extractor
Performs extraction on a set of raw documents. The Instances output from source will be passed through both the tokenization pipe and the feature extraction pipe.
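The extract variants differ mainly in how much preprocessing they perform. As a minimal sketch, assuming each variant is a composition of a tokenization stage, a feature stage, and the CRF's labelling step, the relationship looks like this. Static methods stand in for the two Pipes, a one-line capitalization rule stands in for the trained CRF, and every name in PipelineSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class PipelineSketch {
    // Stand-in for the tokenization pipe: raw text to whitespace-separated tokens.
    static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Stand-in for the feature pipe: one feature string per (non-empty) token.
    static List<String> features(List<String> tokens) {
        List<String> feats = new ArrayList<>();
        for (String t : tokens) {
            feats.add(Character.isUpperCase(t.charAt(0)) ? "CAPITALIZED" : "LOWER");
        }
        return feats;
    }

    // Stand-in for the trained CRF: maps each feature to a label.
    static List<String> label(List<String> feats) {
        List<String> labels = new ArrayList<>();
        for (String f : feats) {
            labels.add(f.equals("CAPITALIZED") ? "NAME" : "O");
        }
        return labels;
    }

    // Analogue of extract(Tokenization): input is already tokenized, so only the feature stage runs.
    static List<String> extractTokenized(List<String> tokens) {
        return label(features(tokens));
    }

    // Analogue of extract(Object): raw input passes through both stages.
    static List<String> extractRaw(String text) {
        return extractTokenized(tokenize(text));
    }

    // Analogue of extract(Iterator<Instance>): every raw document passes through both stages.
    static List<List<String>> extractAll(Iterator<String> docs) {
        List<List<String>> results = new ArrayList<>();
        while (docs.hasNext()) {
            results.add(extractRaw(docs.next()));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(extractRaw("Alice met Bob"));
    }
}
```

Here main prints [NAME, O, NAME]; the split between extractRaw and extractTokenized mirrors the choice between handing the extractor a raw object and handing it a Tokenization.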
-
getTokenizationFilter
public TokenizationFilter getTokenizationFilter()
-
getBackgroundTag
public java.lang.String getBackgroundTag()
-
getTokenizationPipe
public Pipe getTokenizationPipe()
Description copied from interface: Extractor
Returns the pipe used by this extractor to tokenize the input. The type of Instance this pipe expects is specific to the individual extractor. This pipe will return an Instance whose data is a Tokenization.
- Specified by:
getTokenizationPipe in interface Extractor
- Returns:
- a pipe
-
setTokenizationPipe
public void setTokenizationPipe(Pipe tokenizationPipe)
Description copied from interface: Extractor
Sets the pipe used by this extractor for tokenization. The pipe should take a raw object and convert it into a Tokenization. The pipe cc.mallet.pipe.CharSequence2TokenSequence is an example of a pipe that could be used here.
- Specified by:
setTokenizationPipe in interface Extractor
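A tokenization pipe turns raw text into tokens. The sketch below is not CharSequence2TokenSequence; it is a minimal regex tokenizer that also records each token's character offsets, on the assumption that downstream extraction needs to map fields back to the source text. The pattern and the names TokenSpan and RegexTokenizerSketch are illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical token record: the token text plus its [start, end) character offsets.
final class TokenSpan {
    final String text;
    final int start;
    final int end;

    TokenSpan(String text, int start, int end) {
        this.text = text;
        this.start = start;
        this.end = end;
    }

    @Override
    public String toString() {
        return text + "@" + start + "-" + end;
    }
}

final class RegexTokenizerSketch {
    // A run of letters, a run of digits, or any other single non-space character.
    private static final Pattern TOKEN = Pattern.compile("\\p{L}+|\\p{N}+|\\S");

    static List<TokenSpan> tokenize(CharSequence text) {
        List<TokenSpan> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(new TokenSpan(m.group(), m.start(), m.end()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Call Bob, 555."));
    }
}
```

For "Call Bob, 555." this prints [Call@0-4, Bob@5-8, ,@8-9, 555@10-13, .@13-14].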
-
getFeaturePipe
public Pipe getFeaturePipe()
Description copied from interface: Extractor
Returns the pipe used by this extractor to prepare tokenized input for the extraction algorithm. The pipe takes an Instance and converts it into a form usable by the particular extraction algorithm; it expects the Instance's data field to be a Tokenization. For example, such pipes often perform feature extraction. The type of raw object expected by the pipe depends on the particular subclass of extractor.
- Specified by:
getFeaturePipe in interface Extractor
- Returns:
- a pipe
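Feature pipes typically describe each token with a set of string-valued features. A minimal sketch of that step follows; the feature names (WORD=, CAPITALIZED, HAS_DIGIT, PREV_WORD=) and the class FeatureSketch are hypothetical and do not describe the features of any particular trained CRF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class FeatureSketch {
    // Returns one list of feature names per token.
    static List<List<String>> features(List<String> tokens) {
        List<List<String>> all = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String tok = tokens.get(i);
            List<String> f = new ArrayList<>();
            // Lowercased word identity.
            f.add("WORD=" + tok.toLowerCase(Locale.ROOT));
            // Orthographic features.
            if (!tok.isEmpty() && Character.isUpperCase(tok.charAt(0))) {
                f.add("CAPITALIZED");
            }
            if (tok.chars().anyMatch(Character::isDigit)) {
                f.add("HAS_DIGIT");
            }
            // Context feature from the previous token, or a start marker.
            String prev = (i == 0) ? "<START>" : tokens.get(i - 1).toLowerCase(Locale.ROOT);
            f.add("PREV_WORD=" + prev);
            all.add(f);
        }
        return all;
    }

    public static void main(String[] args) {
        System.out.println(features(Arrays.asList("Room", "42")));
    }
}
```

This prints [[WORD=room, CAPITALIZED, PREV_WORD=<START>], [WORD=42, HAS_DIGIT, PREV_WORD=room]].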
-
setFeaturePipe
public void setFeaturePipe(Pipe featurePipe)
-
getInputAlphabet
public Alphabet getInputAlphabet()
Description copied from interface: Extractor
Returns an alphabet of the features used by the extractor. The alphabet maps strings describing the features to indices.
- Specified by:
getInputAlphabet in interface Extractor
- Returns:
- the input alphabet
-
getTargetAlphabet
public LabelAlphabet getTargetAlphabet()
Description copied from interface: Extractor
Returns an alphabet of the labels used by the extractor. Labels include entity types (such as PERSON) and slot names (such as EMPLOYEE-OF).
- Specified by:
getTargetAlphabet in interface Extractor
- Returns:
- the target alphabet
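Both alphabets map strings to dense integer indices and back. The sketch below shows that idea with one hypothetical class, AlphabetSketch, assuming indices are assigned in first-seen order and that growth can be stopped after training so unseen entries map to -1 instead of being added. It is not MALLET's Alphabet or LabelAlphabet; the label strings echo the PERSON and EMPLOYEE-OF examples in the description of getTargetAlphabet.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AlphabetSketch {
    private final Map<String, Integer> index = new HashMap<>();
    private final List<String> entries = new ArrayList<>();
    private boolean growing = true;

    // Returns the index of entry, adding it if the alphabet is still growing.
    // Returns -1 for an unseen entry once growth has stopped.
    int lookupIndex(String entry) {
        Integer i = index.get(entry);
        if (i != null) {
            return i;
        }
        if (!growing) {
            return -1;
        }
        index.put(entry, entries.size());
        entries.add(entry);
        return entries.size() - 1;
    }

    // Maps an index back to its string.
    String lookupEntry(int i) {
        return entries.get(i);
    }

    int size() {
        return entries.size();
    }

    void stopGrowth() {
        growing = false;
    }

    public static void main(String[] args) {
        AlphabetSketch labels = new AlphabetSketch();
        for (String l : new String[] {"O", "PERSON", "EMPLOYEE-OF"}) {
            labels.lookupIndex(l);
        }
        labels.stopGrowth();
        // Decode a hypothetical predicted index sequence back to label strings.
        int[] predicted = {1, 0, 2};
        StringBuilder out = new StringBuilder();
        for (int i : predicted) {
            out.append(labels.lookupEntry(i)).append(' ');
        }
        System.out.println(out.toString().trim()); // PERSON O EMPLOYEE-OF
    }
}
```

The same structure serves for features (strings such as WORD=smith) and for labels; only the target side is decoded back to strings when reporting extractions.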
-
getCrf
public CRF getCrf()
-
slicePipes
public void slicePipes(int num)
Transfers some Pipes from the feature pipe to the tokenization pipe. The feature pipe must be a SerialPipes. This will destructively modify the CRF object of the extractor. This is useful if you have a CRF that has been trained from a single pipe, which you need to split up into feature and tokenization pipes.
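The description does not say which pipes move. Assuming num counts pipes taken from the front of the feature SerialPipes, and that they are appended in order to the tokenization stage, the bookkeeping can be sketched with plain lists of stage names. SlicePipesSketch and the stage names are hypothetical; like the method described above, the sketch mutates the feature stages in place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SlicePipesSketch {
    // Ordered stage names standing in for the two pipes.
    final List<String> tokenizationStages = new ArrayList<>();
    final List<String> featureStages = new ArrayList<>();

    // Moves the first num feature stages, in order, to the end of the tokenization stages.
    void slicePipes(int num) {
        if (num < 0 || num > featureStages.size()) {
            throw new IllegalArgumentException("num out of range: " + num);
        }
        List<String> moved = featureStages.subList(0, num);
        tokenizationStages.addAll(moved);
        // Clearing the subList view removes those stages from featureStages itself.
        moved.clear();
    }

    public static void main(String[] args) {
        SlicePipesSketch s = new SlicePipesSketch();
        // One combined pipe sequence, as a CRF trained from a single pipe might have.
        s.featureStages.addAll(Arrays.asList("readText", "tokenize", "tokenFeatures", "toFeatureVectors"));
        s.slicePipes(2);
        System.out.println(s.tokenizationStages + " | " + s.featureStages);
    }
}
```

This prints [readText, tokenize] | [tokenFeatures, toFeatureVectors].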
-
pipeInput
public Sequence pipeInput(java.lang.Object input)