Package cc.mallet.types
Class PagedInstanceList
- java.lang.Object
  - java.util.AbstractCollection<E>
    - java.util.AbstractList<E>
      - java.util.ArrayList<Instance>
        - cc.mallet.types.InstanceList
          - cc.mallet.types.PagedInstanceList
- All Implemented Interfaces:
AlphabetCarrying, java.io.Serializable, java.lang.Cloneable, java.lang.Iterable<Instance>, java.util.Collection<Instance>, java.util.List<Instance>, java.util.RandomAccess
public class PagedInstanceList extends InstanceList
An InstanceList that avoids OutOfMemoryErrors by saving Instances to disk when there is not enough memory to create a new Instance. It implements a fixed-size paging scheme in which each page on disk stores instancesPerPage Instances. So, while the number of Instances per page is constant, the size in bytes of each page may vary. Using this class instead of InstanceList means the number of Instances you can store is essentially limited only by disk size (and patience).
The paging scheme is optimized for the most frequent case of looping through the InstanceList from index 0 to n. Instances are assigned to pages in index order: instances 0 through instancesPerPage-1 are stored together on page 1, instances instancesPerPage through 2*instancesPerPage-1 are on page 2, and so on. This way, Instances adjacent in the list will usually be on the same page.
- Author:
- Aron Culotta culotta@cs.umass.edu
- See Also:
InstanceList, Serialized Form
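The index arithmetic implied by the paging description can be sketched in a few lines. This is a minimal illustration, not MALLET code: the class PageIndexSketch and its methods are hypothetical names, and pages are numbered from 0 here for convenience. It assumes instances are laid out contiguously with instancesPerPage per page, as the description states.

```java
// Sketch only: models the index-to-page arithmetic implied by the class description.
// PageIndexSketch and its methods are hypothetical and are not part of MALLET.
final class PageIndexSketch {

    /** Page that would hold the instance at index (pages numbered from 0). */
    static int pageOf(int index, int instancesPerPage) {
        return index / instancesPerPage;
    }

    /** Position of the instance inside its page. */
    static int offsetInPage(int index, int instancesPerPage) {
        return index % instancesPerPage;
    }

    /** Number of pages needed for n instances; the last page may be partly full. */
    static int pageCount(int n, int instancesPerPage) {
        return (n + instancesPerPage - 1) / instancesPerPage;
    }

    public static void main(String[] args) {
        int instancesPerPage = 100;
        for (int index : new int[] {0, 99, 100, 250}) {
            System.out.println("instance " + index
                + " -> page " + pageOf(index, instancesPerPage)
                + ", offset " + offsetInPage(index, instancesPerPage));
        }
    }
}
```

Because neighbouring indices map to the same page except at page boundaries, a loop from index 0 to n touches each page once.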
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class cc.mallet.types.InstanceList
InstanceList.CrossValidationIterator, InstanceList.StratifiedCrossValidationIterator
-
-
Field Summary
-
Fields inherited from class cc.mallet.types.InstanceList
TARGET_PROPERTY
-
-
Constructor Summary
Constructors
Constructor    Description
PagedInstanceList(Pipe pipe, int numPages, int instancesPerPage)
PagedInstanceList(Pipe pipe, int numPages, int instancesPerPage, java.io.File swapDir)    Creates a PagedInstanceList that keeps "numPages" pages of "instancesPerPage" instances each in memory and swaps the remaining pages to disk in directory "swapDir".
-
Method Summary
Modifier and Type    Method    Description
boolean    add(Instance instance)    Appends the instance to this list.
void    clear()
InstanceList    cloneEmpty()
Instance    get(int index)    Returns the Instance at the specified index.
boolean    getCollectGarbage()
int    getSwapIns()
long    getSwapInTime()
int    getSwapOuts()
long    getSwapOutTime()
java.util.Iterator<Instance>    iterator()
static InstanceList    load(java.io.File file)    Constructs a new InstanceList, deserialized from file.
Instance    set(int index, Instance instance)    Replaces the Instance at position index with a new one.
void    setCollectGarbage(boolean b)
InstanceList    shallowClone()
int    size()
InstanceList[]    split(java.util.Random r, double[] proportions)    Shuffles the elements of this list among several smaller lists.
-
Methods inherited from class cc.mallet.types.InstanceList
add, add, add, add, addAll, addAll, addThruPipe, addThruPipe, clone, cloneEmptyInto, crossValidationIterator, crossValidationIterator, getAlphabet, getAlphabets, getDataAlphabet, getDataClass, getFeatureSelection, getInstanceWeight, getInstanceWeight, getPerLabelFeatureSelection, getPipe, getTargetAlphabet, getTargetClass, hideSomeLabels, hideSomeLabels, remove, remove, removeSources, removeTargets, sampleWithInstanceWeights, sampleWithReplacement, sampleWithWeights, save, setFeatureSelection, setInstance, setInstanceWeight, setInstanceWeight, setPerLabelFeatureSelection, setPipe, shuffle, split, splitInOrder, splitInOrder, splitInTwoByModulo, stratifiedSplit, stratifiedSplitInOrder, subList, subList, targetLabelDistribution, unhideAllLabels
-
Methods inherited from class java.util.ArrayList
contains, ensureCapacity, equals, forEach, hashCode, indexOf, isEmpty, lastIndexOf, listIterator, listIterator, remove, removeAll, removeIf, removeRange, replaceAll, retainAll, sort, spliterator, toArray, toArray, trimToSize
-
-
-
-
Constructor Detail
-
PagedInstanceList
public PagedInstanceList(Pipe pipe, int numPages, int instancesPerPage, java.io.File swapDir)
Creates a PagedInstanceList that keeps "numPages" pages of "instancesPerPage" instances each in memory and swaps the remaining pages to disk in directory "swapDir".
- Parameters:
pipe - instance pipe
numPages - number of pages to keep in memory
instancesPerPage - number of Instances to store in each page
swapDir - where the pages on disk live.
-
PagedInstanceList
public PagedInstanceList(Pipe pipe, int numPages, int instancesPerPage)
-
-
Method Detail
-
split
public InstanceList[] split(java.util.Random r, double[] proportions)
Shuffles the elements of this list among several smaller lists. Overrides InstanceList.split to add instances in original order, to prevent thrashing.
- Overrides:
split in class InstanceList
- Parameters:
proportions - A list of numbers (not necessarily summing to 1) which, when normalized, correspond to the proportion of elements in each returned sublist.
r - The source of randomness to use in shuffling.
- Returns:
one InstanceList for each element of proportions
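One way to combine a random assignment with in-order insertion is sketched below. This is an assumption about how such a split could work, not the MALLET implementation: OrderedSplitSketch is a hypothetical class that operates on plain indices instead of Instances. It normalizes the proportions, shuffles only the sublist labels, and then visits the indices from 0 to n-1 so that a paged list would be read sequentially.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch only: hypothetical helper, not part of MALLET.
final class OrderedSplitSketch {

    /** Splits indices 0..n-1 into sublists sized by the normalized proportions. */
    static List<List<Integer>> split(int n, Random r, double[] proportions) {
        double total = 0.0;
        for (double p : proportions) {
            total += p;
        }

        // Give each position a sublist label; label counts follow the cumulative proportions.
        int[] label = new int[n];
        int last = proportions.length - 1;
        double cumulative = 0.0;
        int start = 0;
        for (int k = 0; k <= last; k++) {
            cumulative += proportions[k] / total;
            int end = (k == last) ? n : (int) Math.min(n, Math.round(cumulative * n));
            for (int i = start; i < end; i++) {
                label[i] = k;
            }
            start = end;
        }

        // Fisher-Yates shuffle of the labels: which sublist an index lands in is random.
        for (int i = n - 1; i > 0; i--) {
            int j = r.nextInt(i + 1);
            int tmp = label[i];
            label[i] = label[j];
            label[j] = tmp;
        }

        // Indices are still visited in original order, so source pages are read sequentially.
        List<List<Integer>> parts = new ArrayList<>();
        for (int k = 0; k <= last; k++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            parts.get(label[i]).add(i);
        }
        return parts;
    }
}
```

Each returned sublist holds its indices in ascending order, which is the property that avoids jumping back and forth between pages of the source list.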
-
add
public boolean add(Instance instance)
Appends the instance to this list. Note that since memory for the Instance has already been allocated, no check is made to catch OutOfMemoryError.
- Specified by:
add in interface java.util.Collection<Instance>
- Specified by:
add in interface java.util.List<Instance>
- Overrides:
add in class InstanceList
- Returns:
true if successful
-
get
public Instance get(int index)
Returns the Instance at the specified index. If this Instance is not in memory, swaps a block of instances back into memory.
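The swap-in behaviour of get, and the kind of counter that getSwapIns() and getSwapOuts() report, can be modelled with a small stand-alone class. This is a sketch under stated assumptions, not the MALLET implementation: PagedListSketch is hypothetical, it stores Strings instead of Instances, a HashMap stands in for the swap directory, and least-recently-used eviction is an assumed policy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: hypothetical model of a paged list, not part of MALLET.
final class PagedListSketch {
    private final int numPages;          // pages allowed in memory at once
    private final int instancesPerPage;  // items stored in each page
    private final Map<Integer, List<String>> disk = new HashMap<>();  // stands in for swapDir
    // Access-ordered map: the first key is the least recently used page (assumed policy).
    private final LinkedHashMap<Integer, List<String>> memory = new LinkedHashMap<>(16, 0.75f, true);
    private int size = 0;
    private int swapIns = 0;
    private int swapOuts = 0;

    PagedListSketch(int numPages, int instancesPerPage) {
        if (numPages < 1 || instancesPerPage < 1) {
            throw new IllegalArgumentException("numPages and instancesPerPage must be positive");
        }
        this.numPages = numPages;
        this.instancesPerPage = instancesPerPage;
    }

    void add(String item) {
        residentPage(size / instancesPerPage).add(item);
        size++;
    }

    String get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return residentPage(index / instancesPerPage).get(index % instancesPerPage);
    }

    int getSwapIns() { return swapIns; }

    int getSwapOuts() { return swapOuts; }

    /** Returns the requested page, first bringing it into memory if necessary. */
    private List<String> residentPage(int pageId) {
        List<String> page = memory.get(pageId);
        if (page != null) {
            return page;
        }
        if (memory.size() >= numPages) {
            // Make room by writing the least recently used page to "disk".
            Integer eldest = memory.keySet().iterator().next();
            disk.put(eldest, memory.remove(eldest));
            swapOuts++;
        }
        page = disk.remove(pageId);
        if (page == null) {
            page = new ArrayList<>();  // brand-new page, nothing to read back
        } else {
            swapIns++;
        }
        memory.put(pageId, page);
        return page;
    }
}
```

In this model a sequential scan swaps each page in once, while alternating between indices on different pages with only one resident page swaps on every access, which is the thrashing pattern the class description warns about.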
-
set
public Instance set(int index, Instance instance)
Replaces the Instance at position index with a new one. Note that this is the only sanctioned way of changing an Instance.
- Specified by:
set in interface java.util.List<Instance>
- Overrides:
set in class InstanceList
-
getCollectGarbage
public boolean getCollectGarbage()
-
setCollectGarbage
public void setCollectGarbage(boolean b)
-
shallowClone
public InstanceList shallowClone()
- Overrides:
shallowClone in class InstanceList
-
cloneEmpty
public InstanceList cloneEmpty()
- Overrides:
cloneEmpty in class InstanceList
-
clear
public void clear()
- Specified by:
clear in interface java.util.Collection<Instance>
- Specified by:
clear in interface java.util.List<Instance>
- Overrides:
clear in class InstanceList
-
getSwapIns
public int getSwapIns()
-
getSwapInTime
public long getSwapInTime()
-
getSwapOuts
public int getSwapOuts()
-
getSwapOutTime
public long getSwapOutTime()
-
size
public int size()
-
iterator
public java.util.Iterator<Instance> iterator()
-
load
public static InstanceList load(java.io.File file)
Constructs a new InstanceList, deserialized from file. If the string value of file is "-", then deserialize from System.in.
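The "-" convention described for load can be illustrated with standard library calls only. This is a minimal sketch of the source selection, not the MALLET code: LoadSourceSketch and its open method are hypothetical, and the deserialization step itself is left out.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Sketch only: hypothetical helper, not part of MALLET.
final class LoadSourceSketch {

    /** Returns System.in when the file's string value is "-", otherwise a stream over the file. */
    static InputStream open(File file) {
        if (file.toString().equals("-")) {
            return System.in;
        }
        try {
            return new FileInputStream(file);
        } catch (FileNotFoundException e) {
            // Rethrown unchecked so that callers of this sketch need no throws clause.
            throw new UncheckedIOException(e);
        }
    }
}
```

With this convention a serialized list can be piped into a program on standard input by passing new File("-") where a file is expected.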
-
-