Class FileIterator

  • All Implemented Interfaces:
    java.util.Iterator<Instance>
    Direct Known Subclasses:
    FileUriIterator

    public class FileIterator
    extends java.lang.Object
    implements java.util.Iterator<Instance>
    An iterator that generates instances from an initial directory or set of directories. The iterator will recurse through sub-directories. Each filename becomes the data field of an instance, and the result of a user-specified regular expression pattern applied to the filename becomes the target value of the instance.

    In document classification, the file name in the data field is commonly processed further by one or more pipes until it contains a feature vector. The pattern applied to the file name is often used to extract a directory name that serves as the true label of the instance; this label is kept in the target field.
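
The behavior described above can be pictured with a minimal standard-library sketch. It is not the class's actual implementation: `FileLabelSketch`, `collect`, `label`, and `PARENT_DIR` are hypothetical names introduced for illustration, and the sketch assumes the pattern is matched against the full path with `/` as the separator.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration; not part of the documented API.
class FileLabelSketch {

    // Assumed pattern: captures the name of the directory that directly contains the file.
    static final Pattern PARENT_DIR = Pattern.compile("([^/]+)/[^/]+$");

    /** Recursively collects files under dir; a null filter accepts every file. */
    static void collect(File dir, FileFilter filter, List<File> out) {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // dir is not a directory, does not exist, or is unreadable
        }
        Arrays.sort(children); // deterministic order for the example
        for (File child : children) {
            if (child.isDirectory()) {
                collect(child, filter, out); // recurse into sub-directories
            } else if (filter == null || filter.accept(child)) {
                out.add(child);
            }
        }
    }

    /** Returns the first capturing group found in the path, or null if there is no pattern or no match. */
    static String label(File file, Pattern pattern) {
        if (pattern == null) {
            return null;
        }
        String path = file.getPath().replace(File.separatorChar, '/');
        Matcher m = pattern.matcher(path);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        List<File> files = new ArrayList<>();
        collect(root, null, files);
        for (File f : files) {
            // The file plays the role of the data field, the label that of the target field.
            System.out.println(f.getPath() + " -> " + label(f, PARENT_DIR));
        }
    }
}
```

Run against a directory laid out as one sub-directory per class, each printed line pairs a file with the name of its containing directory.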

    Author:
    Andrew McCallum mccallum@cs.umass.edu
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static java.util.regex.Pattern ALL_DIRECTORIES
      Use all the directory names in the filename as label names.
      static java.util.regex.Pattern FIRST_DIRECTORY
      Use the first directory in the filename as the label name.
      static java.util.regex.Pattern LAST_DIRECTORY
      Use the last directory in the filename as the label name.
      static java.lang.String sep  
      static java.util.regex.Pattern STARTING_DIRECTORIES
      Use the directories specified in the constructor as label names, optionally removing the common prefix of all starting directories.
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
        FileIterator​(java.io.File directory)  
        FileIterator​(java.io.File[] directories, java.io.FileFilter fileFilter, java.util.regex.Pattern targetPattern)  
      protected FileIterator​(java.io.File[] directories, java.io.FileFilter fileFilter, java.util.regex.Pattern targetPattern, boolean removeCommonPrefix)
      Constructs a FileIterator that supplies the filenames found within the initial directories as instances.
        FileIterator​(java.io.File[] directories, java.util.regex.Pattern targetPattern)
      Iterate over Files that pass the fileFilter test, setting...
        FileIterator​(java.io.File[] directories, java.util.regex.Pattern targetPattern, boolean removeCommonPrefix)  
        FileIterator​(java.io.File directory, java.io.FileFilter fileFilter)  
        FileIterator​(java.io.File directory, java.io.FileFilter fileFilter, java.util.regex.Pattern targetPattern)  
        FileIterator​(java.io.File directory, java.io.FileFilter fileFilter, java.util.regex.Pattern targetPattern, boolean removeCommonPrefix)  
        FileIterator​(java.io.File directory, java.util.regex.Pattern targetPattern)  
        FileIterator​(java.io.File directory, java.util.regex.Pattern targetPattern, boolean removeCommonPrefix)  
        FileIterator​(java.lang.String directory)  
        FileIterator​(java.lang.String[] directories, java.io.FileFilter ff)  
        FileIterator​(java.lang.String[] directories, java.lang.String targetPattern)  
        FileIterator​(java.lang.String[] directories, java.util.regex.Pattern targetPattern)  
        FileIterator​(java.lang.String[] directories, java.util.regex.Pattern targetPattern, boolean removeCommonPrefix)  
        FileIterator​(java.lang.String directory, java.io.FileFilter filter)  
        FileIterator​(java.lang.String directory, java.util.regex.Pattern targetPattern)  
        FileIterator​(java.lang.String directory, java.util.regex.Pattern targetPattern, boolean removeCommonPrefix)  
    • Field Detail

      • sep

        public static final java.lang.String sep
      • STARTING_DIRECTORIES

        public static final java.util.regex.Pattern STARTING_DIRECTORIES
        Use the directories specified in the constructor as label names, optionally removing the common prefix of all starting directories.
      • FIRST_DIRECTORY

        public static final java.util.regex.Pattern FIRST_DIRECTORY
        Use the first directory in the filename as the label name.
      • LAST_DIRECTORY

        public static final java.util.regex.Pattern LAST_DIRECTORY
        Use the last directory in the filename as the label name.
      • ALL_DIRECTORIES

        public static final java.util.regex.Pattern ALL_DIRECTORIES
        Use all the directory names in the filename as label names.
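
To make the difference between the directory patterns concrete, here is a small sketch using stand-in regexes. These are assumptions written for illustration, not the values of the fields documented above, which this page does not show; the class and constant names are hypothetical, and paths are assumed to use `/` as the separator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-ins; the real field values may differ.
class DirectoryLabelPatterns {

    // First directory component of the path (an optional leading slash is skipped).
    static final Pattern FIRST = Pattern.compile("^/?([^/]+)/");
    // Directory that directly contains the file.
    static final Pattern LAST = Pattern.compile("([^/]+)/[^/]+$");
    // Everything before the final path component.
    static final Pattern ALL = Pattern.compile("^(.*)/[^/]+$");

    /** Applies the pattern with find() and returns group 1, or null when nothing matches. */
    static String extract(Pattern pattern, String path) {
        Matcher m = pattern.matcher(path);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String path = "corpus/spam/a.txt";
        System.out.println(extract(FIRST, path)); // corpus
        System.out.println(extract(LAST, path));  // spam
        System.out.println(extract(ALL, path));   // corpus/spam
    }
}
```

A path with no directory component, such as `a.txt`, matches none of these stand-ins, so `extract` returns null for it.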
    • Constructor Detail

      • FileIterator

        protected FileIterator​(java.io.File[] directories,
                               java.io.FileFilter fileFilter,
                               java.util.regex.Pattern targetPattern,
                               boolean removeCommonPrefix)
        Constructs a FileIterator that supplies the filenames found within the initial directories as instances.
        Parameters:
        directories - Array of directories to collect files from
        fileFilter - an object implementing the FileFilter interface that decides which files to accept. May be null.
        targetPattern - regex Pattern applied to the filename; on a match, its first parenthesized group is taken as the target value of the generated instance. The pattern is applied to the directory with the matcher.find() method. If null, all instances will have a null target.
        removeCommonPrefix - boolean that modifies the behavior of the STARTING_DIRECTORIES pattern: when true, the common prefix of all initially specified directories is removed, leaving the remainder of each filename as the target value.
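
One way to picture the effect of removeCommonPrefix is the sketch below. It is an assumption-laden illustration rather than the library's code: `CommonPrefixSketch` and its methods are hypothetical, paths are assumed to use `/`, and trimming the prefix back to the last separator is a choice made here so that labels are not cut mid-name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of common-prefix removal; not the documented implementation.
class CommonPrefixSketch {

    /** Longest shared leading part of all paths, trimmed back to the last '/'. */
    static String commonPrefix(List<String> paths) {
        if (paths.isEmpty()) {
            return "";
        }
        String prefix = paths.get(0);
        for (String p : paths) {
            int max = Math.min(prefix.length(), p.length());
            int i = 0;
            while (i < max && prefix.charAt(i) == p.charAt(i)) {
                i++;
            }
            prefix = prefix.substring(0, i);
        }
        // Avoid splitting a directory name, e.g. "sport" vs "science" sharing "s".
        return prefix.substring(0, prefix.lastIndexOf('/') + 1);
    }

    /** Removes the prefix from the path when present; otherwise returns the path unchanged. */
    static String stripPrefix(String prefix, String path) {
        return path.startsWith(prefix) ? path.substring(prefix.length()) : path;
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList("data/news/sport", "data/news/science");
        String prefix = commonPrefix(dirs);
        for (String dir : dirs) {
            // With the prefix removed, the remainder would serve as the target value.
            System.out.println(dir + " -> " + stripPrefix(prefix, dir));
        }
    }
}
```

For the two starting directories in `main`, the shared prefix is `data/news/`, so the remaining labels are `sport` and `science`.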
      • FileIterator

        public FileIterator​(java.io.File[] directories,
                            java.io.FileFilter fileFilter,
                            java.util.regex.Pattern targetPattern)
      • FileIterator

        public FileIterator​(java.io.File[] directories,
                            java.util.regex.Pattern targetPattern)
        Iterate over Files that pass the fileFilter test, setting...
      • FileIterator

        public FileIterator​(java.io.File[] directories,
                            java.util.regex.Pattern targetPattern,
                            boolean removeCommonPrefix)
      • FileIterator

        public FileIterator​(java.lang.String[] directories,
                            java.io.FileFilter ff)
      • FileIterator

        public FileIterator​(java.lang.String[] directories,
                            java.lang.String targetPattern)
      • FileIterator

        public FileIterator​(java.lang.String[] directories,
                            java.util.regex.Pattern targetPattern)
      • FileIterator

        public FileIterator​(java.lang.String[] directories,
                            java.util.regex.Pattern targetPattern,
                            boolean removeCommonPrefix)
      • FileIterator

        public FileIterator​(java.io.File directory,
                            java.io.FileFilter fileFilter,
                            java.util.regex.Pattern targetPattern)
      • FileIterator

        public FileIterator​(java.io.File directory,
                            java.io.FileFilter fileFilter,
                            java.util.regex.Pattern targetPattern,
                            boolean removeCommonPrefix)
      • FileIterator

        public FileIterator​(java.io.File directory,
                            java.io.FileFilter fileFilter)
      • FileIterator

        public FileIterator​(java.io.File directory,
                            java.util.regex.Pattern targetPattern)
      • FileIterator

        public FileIterator​(java.io.File directory,
                            java.util.regex.Pattern targetPattern,
                            boolean removeCommonPrefix)
      • FileIterator

        public FileIterator​(java.lang.String directory,
                            java.util.regex.Pattern targetPattern)
      • FileIterator

        public FileIterator​(java.lang.String directory,
                            java.util.regex.Pattern targetPattern,
                            boolean removeCommonPrefix)
      • FileIterator

        public FileIterator​(java.io.File directory)
      • FileIterator

        public FileIterator​(java.lang.String directory)
      • FileIterator

        public FileIterator​(java.lang.String directory,
                            java.io.FileFilter filter)
    • Method Detail

      • getFileArray

        public java.util.ArrayList<java.io.File> getFileArray()
      • stringArray2FileArray

        public static java.io.File[] stringArray2FileArray​(java.lang.String[] sa)
      • next

        public Instance next()
        Specified by:
        next in interface java.util.Iterator<Instance>
      • remove

        public void remove()
        Specified by:
        remove in interface java.util.Iterator<Instance>
      • nextFile

        public java.io.File nextFile()
      • hasNext

        public boolean hasNext()
        Specified by:
        hasNext in interface java.util.Iterator<Instance>