Package cc.mallet.extract
Class StringTokenization
- java.lang.Object
  - java.util.AbstractCollection<E>
    - java.util.AbstractList<E>
      - java.util.ArrayList<Token>
        - cc.mallet.types.TokenSequence
          - cc.mallet.extract.StringTokenization
-
- All Implemented Interfaces:
Tokenization, Sequence, java.io.Serializable, java.lang.Cloneable, java.lang.Iterable<Token>, java.util.Collection<Token>, java.util.List<Token>, java.util.RandomAccess
public class StringTokenization extends TokenSequence implements Tokenization
- See Also:
- Serialized Form
-
-
Constructor Summary
Constructors
Constructor Description
StringTokenization(java.lang.CharSequence seq)
Creates an empty StringTokenization.
StringTokenization(java.lang.CharSequence string, CharSequenceLexer lexer)
Creates a tokenization of the given string.
-
Method Summary
All Methods Instance Methods Concrete Methods
Modifier and Type Method Description
java.lang.Object getDocument()
Returns the document of which this is a tokenization.
Span getSpan(int i)
Span subspan(int firstToken, int lastToken)
Returns a span formed by concatenating the spans from firstToken (inclusive) to lastToken (exclusive).
-
Methods inherited from class cc.mallet.types.TokenSequence
add, addAll, getNumericProperty, getProperties, getProperty, hasProperty, removeLast, setNumericProperty, setProperty, toFeatureSequence, toFeatureVector, toString, toStringShort
-
Methods inherited from class java.util.ArrayList
add, add, addAll, addAll, clear, clone, contains, ensureCapacity, equals, forEach, get, hashCode, indexOf, isEmpty, iterator, lastIndexOf, listIterator, listIterator, remove, remove, removeAll, removeIf, removeRange, replaceAll, retainAll, set, size, sort, spliterator, subList, toArray, toArray, trimToSize
-
-
-
-
Constructor Detail
-
StringTokenization
public StringTokenization(java.lang.CharSequence seq)
Creates an empty StringTokenization.
-
StringTokenization
public StringTokenization(java.lang.CharSequence string, CharSequenceLexer lexer)
Creates a tokenization of the given string. Tokens are added from all the matches of the given lexer.
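As a rough illustration of "tokens are added from all the matches of the given lexer", the sketch below uses only `java.util.regex` and does not call MALLET. `OffsetSpan` and `RegexTokenizationSketch` are hypothetical names introduced here, and the regex stands in for whatever pattern a `CharSequenceLexer` would apply; the actual constructor may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a token span: character offsets into the document.
final class OffsetSpan {
    final int start; // inclusive character offset
    final int end;   // exclusive character offset

    OffsetSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // Text covered by this span in the given document.
    String text(CharSequence doc) {
        return doc.subSequence(start, end).toString();
    }
}

// Assumption: a lexer behaves like a regex matcher that yields one token per match.
final class RegexTokenizationSketch {
    // Collects one span per regex match, in document order.
    static List<OffsetSpan> tokenize(CharSequence doc, Pattern tokenPattern) {
        List<OffsetSpan> spans = new ArrayList<>();
        Matcher m = tokenPattern.matcher(doc);
        while (m.find()) {
            spans.add(new OffsetSpan(m.start(), m.end()));
        }
        return spans;
    }
}
```

Calling `RegexTokenizationSketch.tokenize("the quick fox", Pattern.compile("\\w+"))` yields three spans, each recording where its token sits in the original string, so text between tokens (here, whitespace) is not part of any span.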
-
-
Method Detail
-
subspan
public Span subspan(int firstToken, int lastToken)
Description copied from interface: Tokenization
Returns a span formed by concatenating the spans from firstToken to lastToken. In more detail:
- The start of the new span will be the start index of getSpan(firstToken).
- The end of the new span will be the start index of getSpan(lastToken).
- Unless firstToken == lastToken, the new span will completely include getSpan(firstToken).
- The new span will never intersect getSpan(lastToken).
- If firstToken == lastToken, then the new span contains no text.
- Specified by:
subspan in interface Tokenization
- Parameters:
firstToken - The index of the first token in the new span (inclusive). This is an index of a token, *not* an index into the document.
lastToken - The index of the token just past the end of the new span (exclusive). This is an index of a token, *not* an index into the document.
- Returns:
A span into this tokenization's document
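The offset rules documented for subspan can be sketched without MALLET as follows. This is a minimal illustration of the documented contract, not the library's implementation: `SubspanSketch` is a hypothetical name, tokens are represented only by their start offsets, and the choice to end at the document length when lastToken equals the token count is an assumption the documentation does not state.

```java
// Hypothetical helper illustrating the documented subspan contract.
final class SubspanSketch {
    // Returns {start, end} character offsets of the span covering tokens
    // firstToken (inclusive) to lastToken (exclusive).
    static int[] subspan(int[] tokenStarts, int docLength, int firstToken, int lastToken) {
        if (firstToken < 0 || firstToken > lastToken || lastToken > tokenStarts.length) {
            throw new IndexOutOfBoundsException(
                "firstToken=" + firstToken + ", lastToken=" + lastToken);
        }
        // Documented rule: the span starts at the start index of token firstToken.
        // Assumption: with no token at that index, fall back to the document end.
        int start = firstToken < tokenStarts.length ? tokenStarts[firstToken] : docLength;
        // Documented rule: the span ends at the start index of token lastToken.
        // Assumption: with no token at lastToken, the span runs to the document end.
        int end = lastToken < tokenStarts.length ? tokenStarts[lastToken] : docLength;
        return new int[] {start, end};
    }
}
```

For the document "the quick fox" with tokens starting at offsets 0, 4 and 10, `subspan(starts, 13, 0, 2)` gives offsets 0 to 10, which covers "the quick " including the whitespace before the excluded token, and `subspan(starts, 13, 1, 1)` gives an empty span at offset 4.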
-
getSpan
public Span getSpan(int i)
- Specified by:
getSpan in interface Tokenization
-
getDocument
public java.lang.Object getDocument()
Description copied from interface: Tokenization
Returns the document of which this is a tokenization.
- Specified by:
getDocument in interface Tokenization
-
-