Package cc.mallet.types
Class PagedInstanceList
- java.lang.Object
-
- java.util.AbstractCollection<E>
-
- java.util.AbstractList<E>
-
- java.util.ArrayList<Instance>
-
- cc.mallet.types.InstanceList
-
- cc.mallet.types.PagedInstanceList
-
- All Implemented Interfaces:
AlphabetCarrying, java.io.Serializable, java.lang.Cloneable, java.lang.Iterable<Instance>, java.util.Collection<Instance>, java.util.List<Instance>, java.util.RandomAccess
public class PagedInstanceList extends InstanceList
An InstanceList which avoids OutOfMemoryErrors by saving Instances to disk when there is not enough memory to create a new Instance. It implements a fixed-size paging scheme, where each page on disk stores instancesPerPage Instances. So, while the number of Instances per page is constant, the size in bytes of each page may vary. Using this class instead of InstanceList means the number of Instances you can store is essentially limited only by disk size (and patience).
The paging scheme is optimized for the most frequent case of looping through the InstanceList from index 0 to n. Instances 0 through instancesPerPage-1 are stored together on page 1, instances instancesPerPage through 2*instancesPerPage-1 are on page 2, and so on. This way, instances adjacent in the instances list will usually be in the same page.
- Author:
- Aron Culotta culotta@cs.umass.edu
- See Also:
InstanceList, Serialized Form
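The index-to-page mapping described above can be sketched in plain Java. This is an illustration only: PageMath and its methods are hypothetical names, not part of MALLET, and the sketch assumes pages are numbered from 0 and filled in index order.

```java
// Hypothetical helper, not part of MALLET: maps a list index to a page,
// assuming each page holds exactly instancesPerPage consecutive instances.
final class PageMath {
    private PageMath() {}

    /** 0-based page number that would hold the instance at the given index. */
    static int pageOf(int index, int instancesPerPage) {
        if (index < 0 || instancesPerPage <= 0) {
            throw new IllegalArgumentException(
                "index must be >= 0 and instancesPerPage must be > 0");
        }
        return index / instancesPerPage;
    }

    /** Position of the instance inside its page. */
    static int offsetInPage(int index, int instancesPerPage) {
        pageOf(index, instancesPerPage); // reuse the argument checks
        return index % instancesPerPage;
    }

    /** Number of pages needed to hold n instances (ceiling division). */
    static int pagesNeeded(int n, int instancesPerPage) {
        if (n < 0 || instancesPerPage <= 0) {
            throw new IllegalArgumentException(
                "n must be >= 0 and instancesPerPage must be > 0");
        }
        return (n + instancesPerPage - 1) / instancesPerPage;
    }
}
```

Under these assumptions, indices 999 and 1000 fall on different pages when instancesPerPage is 1000, so a loop from index 0 to n crosses a page boundary only once every instancesPerPage steps.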
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class cc.mallet.types.InstanceList
InstanceList.CrossValidationIterator, InstanceList.StratifiedCrossValidationIterator
-
-
Field Summary
-
Fields inherited from class cc.mallet.types.InstanceList
TARGET_PROPERTY
-
-
Constructor Summary
Constructor | Description
PagedInstanceList(Pipe pipe, int numPages, int instancesPerPage) |
PagedInstanceList(Pipe pipe, int numPages, int instancesPerPage, java.io.File swapDir) | Creates a PagedInstanceList where "instancesPerPage" instances are swapped to disk in directory "swapDir" if the amount of free system memory drops below "minFreeMemory" bytes
-
Method Summary
Modifier and Type | Method | Description
boolean | add(Instance instance) | Appends the instance to this list.
void | clear() |
InstanceList | cloneEmpty() |
Instance | get(int index) | Returns the Instance at the specified index.
boolean | getCollectGarbage() |
int | getSwapIns() |
long | getSwapInTime() |
int | getSwapOuts() |
long | getSwapOutTime() |
java.util.Iterator<Instance> | iterator() |
static InstanceList | load(java.io.File file) | Constructs a new InstanceList, deserialized from file.
Instance | set(int index, Instance instance) | Replaces the Instance at position index with a new one.
void | setCollectGarbage(boolean b) |
InstanceList | shallowClone() |
int | size() |
InstanceList[] | split(java.util.Random r, double[] proportions) | Shuffles the elements of this list among several smaller lists.
-
Methods inherited from class cc.mallet.types.InstanceList
add, add, add, add, addAll, addAll, addThruPipe, addThruPipe, clone, cloneEmptyInto, crossValidationIterator, crossValidationIterator, getAlphabet, getAlphabets, getDataAlphabet, getDataClass, getFeatureSelection, getInstanceWeight, getInstanceWeight, getPerLabelFeatureSelection, getPipe, getTargetAlphabet, getTargetClass, hideSomeLabels, hideSomeLabels, remove, remove, removeSources, removeTargets, sampleWithInstanceWeights, sampleWithReplacement, sampleWithWeights, save, setFeatureSelection, setInstance, setInstanceWeight, setInstanceWeight, setPerLabelFeatureSelection, setPipe, shuffle, split, splitInOrder, splitInOrder, splitInTwoByModulo, stratifiedSplit, stratifiedSplitInOrder, subList, subList, targetLabelDistribution, unhideAllLabels
-
Methods inherited from class java.util.ArrayList
contains, ensureCapacity, equals, forEach, hashCode, indexOf, isEmpty, lastIndexOf, listIterator, listIterator, remove, removeAll, removeIf, removeRange, replaceAll, retainAll, sort, spliterator, toArray, toArray, trimToSize
-
-
-
-
Constructor Detail
-
PagedInstanceList
public PagedInstanceList(Pipe pipe, int numPages, int instancesPerPage, java.io.File swapDir)
Creates a PagedInstanceList where "instancesPerPage" instances are swapped to disk in directory "swapDir" if the amount of free system memory drops below "minFreeMemory" bytes.
- Parameters:
pipe - instance pipe
numPages - number of pages to keep in memory
instancesPerPage - number of Instances to store in each page
swapDir - where the pages on disk live
-
PagedInstanceList
public PagedInstanceList(Pipe pipe, int numPages, int instancesPerPage)
-
-
Method Detail
-
split
public InstanceList[] split(java.util.Random r, double[] proportions)
Shuffles the elements of this list among several smaller lists. Overrides InstanceList.split to add instances in original order, to prevent thrashing.
- Overrides:
split in class InstanceList
- Parameters:
proportions - A list of numbers (not necessarily summing to 1) which, when normalized, correspond to the proportion of elements in each returned sublist.
r - The source of randomness to use in shuffling.
- Returns:
one InstanceList for each element of proportions
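One way to combine a random split with in-order insertion is to shuffle the sublist assignments rather than the elements, then walk the source list once from front to back. The sketch below is a plain-Java illustration of that idea and is not taken from the MALLET source; OrderedSplit, normalize, and the rounding rule for sublist sizes are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not MALLET code: random membership, sequential reads.
final class OrderedSplit {
    private OrderedSplit() {}

    /** Scales the proportions so that they sum to 1. */
    static double[] normalize(double[] proportions) {
        double sum = 0;
        for (double p : proportions) {
            sum += p;
        }
        if (sum <= 0) {
            throw new IllegalArgumentException("proportions must sum to a positive value");
        }
        double[] out = new double[proportions.length];
        for (int i = 0; i < proportions.length; i++) {
            out[i] = proportions[i] / sum;
        }
        return out;
    }

    static <T> List<List<T>> split(List<T> items, Random r, double[] proportions) {
        double[] norm = normalize(proportions);
        int n = items.size();

        // One sublist label per source index; counts follow the cumulative proportions.
        List<Integer> labels = new ArrayList<>(n);
        double cumulative = 0;
        for (int k = 0; k < norm.length; k++) {
            cumulative += norm[k];
            int end = (k == norm.length - 1)
                    ? n
                    : (int) Math.min(n, Math.round(cumulative * n));
            while (labels.size() < end) {
                labels.add(k);
            }
        }

        // Shuffle which sublist each index goes to, not the items themselves.
        Collections.shuffle(labels, r);

        List<List<T>> out = new ArrayList<>();
        for (int k = 0; k < norm.length; k++) {
            out.add(new ArrayList<>());
        }
        // Single pass in original order, so a paged source is read sequentially.
        for (int i = 0; i < n; i++) {
            out.get(labels.get(i)).add(items.get(i));
        }
        return out;
    }
}
```

Because the source is read from index 0 upward, each page of a paged list would be needed only once during the split, and every returned sublist keeps its elements in their original relative order.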
-
add
public boolean add(Instance instance)
Appends the instance to this list. Note that since memory for the Instance has already been allocated, no check is made to catch OutOfMemoryError.
- Specified by:
add in interface java.util.Collection<Instance>
- Specified by:
add in interface java.util.List<Instance>
- Overrides:
add in class InstanceList
- Returns:
true if successful
-
get
public Instance get(int index)
Returns the Instance at the specified index. If this Instance is not in memory, swaps a block of instances back into memory.
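The swap-on-miss behavior of get, together with counters in the spirit of getSwapIns() and getSwapOuts(), can be modeled with a small toy class. Everything here is an assumption made for illustration: ToyPagedList is not MALLET code, a HashMap stands in for the swap directory, and least-recently-used eviction is chosen for simplicity rather than because the library is known to use it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of paging, not MALLET code. The "disk" map stands in for files
// in a swap directory; eviction policy (LRU) is an assumption of this sketch.
final class ToyPagedList {
    private final int numPages;          // pages allowed in memory at once
    private final int instancesPerPage;  // items stored on each page
    private final Map<Integer, List<String>> disk = new HashMap<>();
    // Access-ordered map: the first entry is the least recently used page.
    private final LinkedHashMap<Integer, List<String>> memory =
            new LinkedHashMap<>(16, 0.75f, true);
    private int size = 0;
    private int swapIns = 0;
    private int swapOuts = 0;

    ToyPagedList(int numPages, int instancesPerPage) {
        if (numPages < 1 || instancesPerPage < 1) {
            throw new IllegalArgumentException("numPages and instancesPerPage must be >= 1");
        }
        this.numPages = numPages;
        this.instancesPerPage = instancesPerPage;
    }

    boolean add(String item) {
        page(size / instancesPerPage).add(item);
        size++;
        return true;
    }

    String get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return page(index / instancesPerPage).get(index % instancesPerPage);
    }

    int size() { return size; }
    int getSwapIns() { return swapIns; }
    int getSwapOuts() { return swapOuts; }

    /** Returns the page, swapping it in (and another page out) if necessary. */
    private List<String> page(int id) {
        List<String> p = memory.get(id);
        if (p != null) {
            return p;
        }
        if (memory.size() >= numPages) {
            Map.Entry<Integer, List<String>> eldest = memory.entrySet().iterator().next();
            Integer evictedId = eldest.getKey();
            List<String> evicted = eldest.getValue();
            memory.remove(evictedId);
            disk.put(evictedId, evicted);
            swapOuts++;
        }
        p = disk.remove(id);
        if (p == null) {
            p = new ArrayList<>(); // brand-new page, nothing to read back
        } else {
            swapIns++;
        }
        memory.put(id, p);
        return p;
    }
}
```

In this model, a front-to-back scan swaps each page in at most once, whereas cycling between indices that live on more pages than fit in memory can trigger a swap on nearly every call. That difference is why sequential access is the case the paging scheme favors.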
-
set
public Instance set(int index, Instance instance)
Replaces the Instance at position index with a new one. Note that this is the only sanctioned way of changing an Instance.
- Specified by:
set in interface java.util.List<Instance>
- Overrides:
set in class InstanceList
-
getCollectGarbage
public boolean getCollectGarbage()
-
setCollectGarbage
public void setCollectGarbage(boolean b)
-
shallowClone
public InstanceList shallowClone()
- Overrides:
shallowClone in class InstanceList
-
cloneEmpty
public InstanceList cloneEmpty()
- Overrides:
cloneEmpty in class InstanceList
-
clear
public void clear()
- Specified by:
clear in interface java.util.Collection<Instance>
- Specified by:
clear in interface java.util.List<Instance>
- Overrides:
clear in class InstanceList
-
getSwapIns
public int getSwapIns()
-
getSwapInTime
public long getSwapInTime()
-
getSwapOuts
public int getSwapOuts()
-
getSwapOutTime
public long getSwapOutTime()
-
size
public int size()
-
iterator
public java.util.Iterator<Instance> iterator()
-
load
public static InstanceList load(java.io.File file)
Constructs a new InstanceList, deserialized from file. If the string value of file is "-", then deserialize from System.in.
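The "-" convention can be illustrated with standard Java serialization alone. The sketch below is not the MALLET implementation; LoadSketch, open, load, and save are hypothetical names, and the stored object is any Serializable rather than an InstanceList.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch, not MALLET code: read a serialized object from a file,
// or from standard input when the file's string value is "-".
final class LoadSketch {
    private LoadSketch() {}

    /** Chooses the input source: System.in for "-", otherwise the named file. */
    static InputStream open(File file) throws IOException {
        if ("-".equals(file.toString())) {
            return System.in;
        }
        return new BufferedInputStream(new FileInputStream(file));
    }

    /** Deserializes one object from the chosen source. */
    static Object load(File file) throws IOException, ClassNotFoundException {
        InputStream raw = open(file);
        try {
            ObjectInputStream in = new ObjectInputStream(raw);
            return in.readObject();
        } finally {
            if (raw != System.in) {
                raw.close(); // leave standard input open for the caller
            }
        }
    }

    /** Serializes one object to a file so that load can read it back. */
    static void save(Serializable object, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(object);
        }
    }
}
```

Treating "-" as standard input lets a serialized list be piped from another process instead of being written to an intermediate file first.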
-
-