Package cc.mallet.extract
Interface Tokenization
- All Superinterfaces:
Sequence
- All Known Implementing Classes:
StringTokenization
public interface Tokenization extends Sequence
Method Summary
Modifier and Type    Method                         Description
java.lang.Object     getDocument()                  Returns the document of which this is a tokenization.
Span                 getSpan(int i)
Span                 subspan(int start, int end)    Returns a span formed by concatenating the spans from start to end.
Method Detail
getDocument
java.lang.Object getDocument()
Returns the document of which this is a tokenization.
getSpan
Span getSpan(int i)
subspan
Span subspan(int start, int end)
Returns a span formed by concatenating the spans from start to end. In more detail:
- The start of the new span will be the start index of getSpan(start).
- The end of the new span will be the start index of getSpan(end).
- Unless start == end, the new span will completely include getSpan(start).
- The new span will never intersect getSpan(end).
- If start == end, then the new span contains no text.
- Parameters:
start - The index of the first token in the new span (inclusive). This is an index of a token, *not* an index into the document.
end - The index of the token immediately after the last token in the new span (exclusive). This is an index of a token, *not* an index into the document.
- Returns:
A span into this tokenization's document.
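The half-open token range described for subspan can be illustrated with a minimal, self-contained sketch. It does not use MALLET: the names SubspanSketch, CharSpan and tokenize are hypothetical stand-ins, and treating an index equal to the token count as the end of the document is an assumption that the description above does not cover.

```java
import java.util.ArrayList;
import java.util.List;

public class SubspanSketch {
    // Hypothetical stand-in for a Span: a half-open character range [start, end) in a document.
    static final class CharSpan {
        final String document;
        final int start;
        final int end;

        CharSpan(String document, int start, int end) {
            this.document = document;
            this.start = start;
            this.end = end;
        }

        String text() {
            return document.substring(start, end);
        }
    }

    // Hypothetical tokenizer: splits on space characters and records one CharSpan per token.
    static List<CharSpan> tokenize(String document) {
        List<CharSpan> tokens = new ArrayList<>();
        int i = 0;
        while (i < document.length()) {
            if (document.charAt(i) == ' ') {
                i++;
                continue;
            }
            int j = i;
            while (j < document.length() && document.charAt(j) != ' ') {
                j++;
            }
            tokens.add(new CharSpan(document, i, j));
            i = j;
        }
        return tokens;
    }

    // Follows the documented rules: the result starts at the start index of token `start`
    // and ends at the start index of token `end`, so token `end` is never included.
    // Assumption (not in the description): an index equal to tokens.size() maps to the
    // end of the document.
    static CharSpan subspan(String document, List<CharSpan> tokens, int start, int end) {
        if (start < 0 || end < start || end > tokens.size()) {
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        }
        int from = start < tokens.size() ? tokens.get(start).start : document.length();
        int to = end < tokens.size() ? tokens.get(end).start : document.length();
        return new CharSpan(document, from, to);
    }

    public static void main(String[] args) {
        String doc = "the quick brown fox";
        List<CharSpan> tokens = tokenize(doc);
        // Tokens 1 and 2, up to (but not including) the start of token 3.
        System.out.println("[" + subspan(doc, tokens, 1, 3).text() + "]"); // prints "[quick brown ]"
        // start == end yields a span with no text.
        System.out.println("[" + subspan(doc, tokens, 2, 2).text() + "]"); // prints "[]"
    }
}
```

Because the span ends at the start index of token end, the first result keeps the whitespace that precedes "fox". A concrete implementation may choose to end at the last included token instead, so it is worth checking the implementing class before relying on that detail.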