Class StringIterator

  • All Implemented Interfaces:
    java.util.Iterator<java.lang.Character>

    public final class StringIterator
    extends java.lang.Object
    implements java.util.Iterator<java.lang.Character>
    Java implementation of Jonathan Wood's "Text Parsing Helper Class".
    See Also:
    Text Parsing Helper Class
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static char CR  
      static char LF  
      static char SPACE  
    • Constructor Summary

      Constructors 
      Constructor Description
      StringIterator​(java.lang.String text)  
    • Method Summary

      Modifier and Type Method Description
      java.lang.String extract​(int start)
      Extracts a substring from the specified range of the current text.
      java.lang.String extract​(int start, int end)
      Extracts a substring from the specified range of the current text.
      boolean hasNext()  
      static boolean isApostrophe​(char c)
      Check if a character is an apostrophe.
      static boolean isArrow​(char c)
      Check if a character is an arrow symbol.
      static boolean isBlank​(java.lang.String s)
      Check if a string is blank.
      static boolean isBracket​(char c)
      Check if a character is a bracket.
      static boolean isCapitalized​(java.lang.String s)
      Check if a string is capitalized.
      static boolean isCjkSymbol​(char c)
      Check if a character is a CJK symbol.
      static boolean isCurrency​(char c)
      Check if a character is a currency symbol.
      static boolean isDoubleQuotationMark​(char c)
      Check if a character is a double quotation mark.
      boolean isEndOfText()
      Indicates if the current position is at the end of the current document.
      static boolean isGeneralPunctuation​(char c)
      Check if a character is a Unicode punctuation character.
      static boolean isHyphen​(char c)
      Check if a character is a hyphen.
      static boolean isLeftBracket​(char c)
      Check if a character is a left bracket.
      static boolean isListMark​(char c)
      Check if a character is a list mark.
      static boolean isLowerCase​(java.lang.String s)
      Check if a string is lower case.
      static boolean isPunctuation​(char c)
      Check if a character is a standard ASCII punctuation character.
      static boolean isQuotationMark​(char c)
      Check if a character is a quotation mark.
      static boolean isRightBracket​(char c)
      Check if a character is a right bracket.
      static boolean isSeparatorMark​(char c)
      Check if a character is a separator.
      static boolean isSingleQuotationMark​(char c)
      Check if a character is a single quotation mark.
      static boolean isTerminalMark​(char c)
      Check if a character is a final mark.
      static boolean isUpperCase​(java.lang.String s)
      Check if a string is upper case.
      static boolean isWhitespace​(int c)
      Check if a character is whitespace.
      static java.lang.String join​(java.util.List<java.lang.String> strings, char separator)
      Join a list of strings.
      void moveAhead()
      Moves the current position ahead by one character.
      void moveAhead​(int ahead)
      Moves the current position ahead by the specified number of characters.
      void movePast​(char[] chars)
      Moves to the next occurrence of any character that is not one of the specified characters.
      void movePastWhitespace()
      Moves the current position to the next character that is not whitespace.
      void moveTo​(char c)
      Moves to the next occurrence of the specified character.
      void moveTo​(char[] chars)
      Moves to the next occurrence of any one of the specified characters.
      void moveTo​(java.lang.String s)
      Moves to the next occurrence of the specified string.
      void moveToEndOfLine()
      Moves the current position to the first character that is part of a newline.
      void moveToWhitespace()
      Moves the current position to the next character that is whitespace.
      java.lang.Character next()  
      static java.lang.String normalize​(java.lang.String text)
      Normalize quotation marks and apostrophes.
      char peek()
      Returns the character at the current position, or a null character if the current position is at the end of the document.
      char peek​(int ahead)
      Returns the character at the specified number of characters beyond the current position, or a null character if the specified position is at the end of the document.
      int position()  
      int remaining()  
      static java.lang.String removeDiacriticalMarks​(java.lang.String s)
      A string normalizer which performs the following steps: (1) Unicode canonical decomposition (Normalizer.Form.NFD), (2) removal of diacritical marks, (3) Unicode canonical composition (Normalizer.Form.NFC).
      void reset​(java.lang.String text)
      Sets the current document and resets the current position to the start of it.
      java.lang.String string()  
      static java.lang.String trimLeft​(java.lang.String s)
      Remove whitespace prefix from string.
      static java.lang.String trimRight​(java.lang.String s)
      Remove whitespace suffix from string.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
      • Methods inherited from interface java.util.Iterator

        forEachRemaining, remove
    • Constructor Detail

      • StringIterator

        public StringIterator​(java.lang.String text)
    • Method Detail

      • normalize

        public static java.lang.String normalize​(java.lang.String text)
        Normalize quotation marks and apostrophes.
        Parameters:
        text - document.
        Returns:
        A normalized text.
      • trimLeft

        public static java.lang.String trimLeft​(java.lang.String s)
        Remove whitespace prefix from string.
        Parameters:
        s - string.
        Returns:
        the string without leading whitespace.
      • trimRight

        public static java.lang.String trimRight​(java.lang.String s)
        Remove whitespace suffix from string.
        Parameters:
        s - string.
        Returns:
        the string without trailing whitespace.
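The one-sided trims described above can be sketched with plain JDK calls. This is a minimal illustration, not the library's code: the class name `TrimSketch` is hypothetical, and the use of `Character.isWhitespace` as the whitespace test is an assumption.

```java
final class TrimSketch {
    // Assumption: "whitespace" means Character.isWhitespace; the real class may use a wider test.
    static String trimLeft(String s) {
        int start = 0;
        // Skip leading whitespace characters.
        while (start < s.length() && Character.isWhitespace(s.charAt(start))) {
            start++;
        }
        return s.substring(start);
    }

    static String trimRight(String s) {
        int end = s.length();
        // Walk back over trailing whitespace characters.
        while (end > 0 && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(0, end);
    }
}
```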
      • isBlank

        public static boolean isBlank​(java.lang.String s)
        Check if a string is blank.
        Parameters:
        s - string.
        Returns:
        true if and only if s is only made of whitespace characters.
      • isCapitalized

        public static boolean isCapitalized​(java.lang.String s)
        Check if a string is capitalized.
        Parameters:
        s - string.
        Returns:
        true if and only if s starts with an upper case character and all other characters are lower case.
      • isUpperCase

        public static boolean isUpperCase​(java.lang.String s)
        Check if a string is upper case.
        Parameters:
        s - string.
        Returns:
        true if and only if s is only made of upper case characters.
      • isLowerCase

        public static boolean isLowerCase​(java.lang.String s)
        Check if a string is lower case.
        Parameters:
        s - string.
        Returns:
        true if and only if s is only made of lower case characters.
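The three case checks above could be written roughly as follows. This is a hedged sketch: the class name `CaseSketch` is hypothetical, and treating the empty string as matching none of the checks is an assumption the documentation does not confirm.

```java
final class CaseSketch {
    // Assumption: an empty string is neither upper case, lower case, nor capitalized.
    static boolean isUpperCase(String s) {
        return !s.isEmpty() && s.chars().allMatch(Character::isUpperCase);
    }

    static boolean isLowerCase(String s) {
        return !s.isEmpty() && s.chars().allMatch(Character::isLowerCase);
    }

    static boolean isCapitalized(String s) {
        // First character upper case, every remaining character lower case.
        return !s.isEmpty()
                && Character.isUpperCase(s.charAt(0))
                && s.chars().skip(1).allMatch(Character::isLowerCase);
    }
}
```

Under this reading, digits and punctuation count as neither upper nor lower case, so a string such as "Hello!" would not be reported as capitalized.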
      • removeDiacriticalMarks

        public static java.lang.String removeDiacriticalMarks​(java.lang.String s)
        A string normalizer which performs the following steps:
        1. Unicode canonical decomposition (Normalizer.Form.NFD)
        2. Removal of diacritical marks
        3. Unicode canonical composition (Normalizer.Form.NFC)
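The three steps above map directly onto `java.text.Normalizer`. The sketch below is one plausible way to chain them; the class and method names are hypothetical, and using the regex category `\p{M}` to identify diacritical marks is an assumption (the real method may use a narrower character range).

```java
import java.text.Normalizer;

final class DiacriticsSketch {
    // Hypothetical stand-in for removeDiacriticalMarks, not the library's actual code.
    static String stripDiacritics(String s) {
        // 1. Canonical decomposition: U+00E9 becomes 'e' followed by U+0301.
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // 2. Remove combining marks (assumption: Unicode general category M).
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // 3. Canonical composition of whatever remains.
        return Normalizer.normalize(stripped, Normalizer.Form.NFC);
    }
}
```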
      • isWhitespace

        public static boolean isWhitespace​(int c)
        Check if a character is whitespace. This method takes into account Unicode space characters.
        Parameters:
        c - character as a Unicode code point.
        Returns:
        true if c is a space character.
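One way to take Unicode space characters into account is to combine the two JDK predicates, since `Character.isWhitespace` excludes non-breaking spaces while `Character.isSpaceChar` includes them. This is only a sketch under that assumption; the class name is hypothetical and the actual method may test a different set of code points.

```java
final class WhitespaceSketch {
    // Assumption: whitespace = Java whitespace (tab, newline, ...) or any Unicode space separator.
    static boolean isWhitespace(int codePoint) {
        return Character.isWhitespace(codePoint) || Character.isSpaceChar(codePoint);
    }
}
```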
      • isPunctuation

        public static boolean isPunctuation​(char c)
        Check if a character is a standard ASCII punctuation character.
        Parameters:
        c - character.
        Returns:
        true if and only if c is a punctuation character.
      • isGeneralPunctuation

        public static boolean isGeneralPunctuation​(char c)
        Check if a character is a Unicode punctuation character.
        Parameters:
        c - character.
        Returns:
        true if and only if c is a punctuation character.
      • isCjkSymbol

        public static boolean isCjkSymbol​(char c)
        Check if a character is a CJK symbol.
        Parameters:
        c - character.
        Returns:
        true if and only if c is a CJK symbol.
      • isCurrency

        public static boolean isCurrency​(char c)
        Check if a character is a currency symbol.
        Parameters:
        c - character.
        Returns:
        true if and only if c is a currency symbol.
      • isArrow

        public static boolean isArrow​(char c)
        Check if a character is an arrow symbol.
        Parameters:
        c - character.
        Returns:
        true if and only if c is an arrow symbol.
      • isHyphen

        public static boolean isHyphen​(char c)
        Check if a character is a hyphen.
        Parameters:
        c - character.
        Returns:
        true if and only if c is a hyphen.
      • isApostrophe

        public static boolean isApostrophe​(char c)
        Check if a character is an apostrophe.
        Parameters:
        c - character.
        Returns:
        true if and only if c is an apostrophe.
      • isListMark

        public static boolean isListMark​(char c)
        Check if a character is a list mark.
        Parameters:
        c - character.
        Returns:
        true if and only if c is a list mark.
      • isTerminalMark

        public static boolean isTerminalMark​(char c)
        Check if a character is a final mark.
        Parameters:
        c - character.
        Returns:
        true if and only if c is a final mark.
      • isSeparatorMark

        public static boolean isSeparatorMark​(char c)
        Check if a character is a separator.
        Parameters:
        c - character.
        Returns:
        true if and only if c is a separator.
      • isQuotationMark

        public static boolean isQuotationMark​(char c)
        Check if a character is a quotation mark.
        Parameters:
        c - character.
        Returns:
        true if and only if c is a quotation mark.
      • isSingleQuotationMark

        public static boolean isSingleQuotationMark​(char c)
        Check if a character is a single quotation mark.
        Parameters:
        c - character.
        Returns:
        true if and only if c is a single quotation mark.
      • isDoubleQuotationMark

        public static boolean isDoubleQuotationMark​(char c)
        Check if a character is a double quotation mark.
        Parameters:
        c - character.
        Returns:
        true if and only if c is a double quotation mark.
      • isBracket

        public static boolean isBracket​(char c)
        Check if a character is a bracket.
        Parameters:
        c - character.
        Returns:
        true if and only if c is a bracket.
      • isLeftBracket

        public static boolean isLeftBracket​(char c)
        Check if a character is a left bracket.
        Parameters:
        c - character.
        Returns:
        true if and only if c is a left bracket.
      • isRightBracket

        public static boolean isRightBracket​(char c)
        Check if a character is a right bracket.
        Parameters:
        c - character.
        Returns:
        true if and only if c is a right bracket.
      • join

        public static java.lang.String join​(java.util.List<java.lang.String> strings,
                                            char separator)
        Join a list of strings. Similar to Guava's Joiner.on(separator).join(strings).
        Returns:
        a string.
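For readers without Guava at hand, an equivalent join can be sketched with the JDK's `String.join`. The class name `JoinSketch` is hypothetical and this is not necessarily how the method is implemented.

```java
import java.util.List;

final class JoinSketch {
    // Sketch only: String.join inserts the separator between consecutive elements.
    static String join(List<String> strings, char separator) {
        return String.join(String.valueOf(separator), strings);
    }
}
```

One difference worth knowing: `String.join` renders a null element as the text "null", whereas Guava's `Joiner` throws a `NullPointerException` unless told how to handle nulls.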
      • reset

        public void reset​(java.lang.String text)
        Sets the current document and resets the current position to the start of it.
      • hasNext

        public boolean hasNext()
        Specified by:
        hasNext in interface java.util.Iterator<java.lang.Character>
      • next

        public java.lang.Character next()
        Specified by:
        next in interface java.util.Iterator<java.lang.Character>
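The `hasNext`/`next` pair carries no description here. A minimal sketch of how an `Iterator<Character>` over a string usually behaves is shown below; `CharIteratorSketch` is a hypothetical class, and throwing `NoSuchElementException` at the end of the text follows the general `Iterator` contract rather than anything this page states.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

final class CharIteratorSketch implements Iterator<Character> {
    private final String text;
    private int position = 0;

    CharIteratorSketch(String text) {
        this.text = text;
    }

    @Override
    public boolean hasNext() {
        // There is a next character as long as the position is inside the text.
        return position < text.length();
    }

    @Override
    public Character next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Return the current character, then advance by one.
        return text.charAt(position++);
    }
}
```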
      • isEndOfText

        public boolean isEndOfText()
        Indicates if the current position is at the end of the current document.
        Returns:
        true if and only if the current position has reached the end of the document, false otherwise.
      • peek

        public char peek​(int ahead)
        Returns the character at the specified number of characters beyond the current position, or a null character if the specified position is at the end of the document.
        Parameters:
        ahead - The number of characters beyond the current position.
        Returns:
        The character at the specified position.
      • peek

        public char peek()
        Returns the character at the current position, or a null character if the current position is at the end of the document.
        Returns:
        The character at the current position.
      • moveAhead

        public void moveAhead()
        Moves the current position ahead by one character.
      • moveAhead

        public void moveAhead​(int ahead)
        Moves the current position ahead by the specified number of characters.
        Parameters:
        ahead - The number of characters to move ahead.
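The interplay of `peek`, `moveAhead` and `isEndOfText` can be illustrated with a small cursor over a string. `CursorSketch` is a hypothetical class; clamping the position to the text bounds and returning `'\0'` for out-of-range peeks are assumptions consistent with the descriptions above, not confirmed behavior.

```java
final class CursorSketch {
    private final String text;
    private int pos = 0;

    CursorSketch(String text) {
        this.text = text;
    }

    boolean isEndOfText() {
        return pos >= text.length();
    }

    // Look at a character without moving; '\0' when the target is outside the text.
    char peek(int ahead) {
        int index = pos + ahead;
        return (index >= 0 && index < text.length()) ? text.charAt(index) : '\0';
    }

    char peek() {
        return peek(0);
    }

    // Assumption: the position is clamped so it never leaves [0, text.length()].
    void moveAhead(int ahead) {
        pos = Math.max(0, Math.min(pos + ahead, text.length()));
    }

    void moveAhead() {
        moveAhead(1);
    }

    int position() {
        return pos;
    }
}
```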
      • string

        public java.lang.String string()
      • position

        public int position()
      • remaining

        public int remaining()
      • extract

        public java.lang.String extract​(int start)
        Extracts a substring from the specified range of the current text.
      • extract

        public java.lang.String extract​(int start,
                                        int end)
        Extracts a substring from the specified range of the current text.
      • moveTo

        public void moveTo​(java.lang.String s)
        Moves to the next occurrence of the specified string.
        Parameters:
        s - String to find.
      • moveTo

        public void moveTo​(char c)
        Moves to the next occurrence of the specified character.
        Parameters:
        c - Character to find.
      • moveTo

        public void moveTo​(char[] chars)
        Moves to the next occurrence of any one of the specified characters.
        Parameters:
        chars - Array of characters to find.
      • movePast

        public void movePast​(char[] chars)
        Moves to the next occurrence of any character that is not one of the specified characters.
        Parameters:
        chars - Array of characters to move past.
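The `moveTo` and `movePast` variants can be thought of as searches that return a new position. The sketch below expresses them as static helpers on a hypothetical `MoveToSketch` class. Stopping at the end of the text when nothing matches is an assumption, because this page does not say what happens in that case.

```java
final class MoveToSketch {
    // Position of the next occurrence of c at or after pos, or text.length() if none (assumption).
    static int moveTo(String text, int pos, char c) {
        int index = text.indexOf(c, pos);
        return index < 0 ? text.length() : index;
    }

    // Same idea for a substring.
    static int moveTo(String text, int pos, String s) {
        int index = text.indexOf(s, pos);
        return index < 0 ? text.length() : index;
    }

    // Position of the next character that is one of chars.
    static int moveTo(String text, int pos, char[] chars) {
        int i = pos;
        while (i < text.length() && !contains(chars, text.charAt(i))) {
            i++;
        }
        return i;
    }

    // Position of the next character that is NOT one of chars.
    static int movePast(String text, int pos, char[] chars) {
        int i = pos;
        while (i < text.length() && contains(chars, text.charAt(i))) {
            i++;
        }
        return i;
    }

    private static boolean contains(char[] chars, char c) {
        for (char candidate : chars) {
            if (candidate == c) {
                return true;
            }
        }
        return false;
    }
}
```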
      • moveToEndOfLine

        public void moveToEndOfLine()
        Moves the current position to the first character that is part of a newline.
      • moveToWhitespace

        public void moveToWhitespace()
        Moves the current position to the next character that is whitespace.
      • movePastWhitespace

        public void movePastWhitespace()
        Moves the current position to the next character that is not whitespace.
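A typical use of `movePastWhitespace`, `moveToWhitespace` and `extract` together is splitting text into whitespace-separated tokens. The following self-contained sketch mimics that pattern with a hypothetical `TokenizeSketch` class. It reimplements the three operations in simplified form, using `Character.isWhitespace` as an assumed whitespace test, and does not call the real `StringIterator`.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenizeSketch {
    private final String text;
    private int pos = 0;

    TokenizeSketch(String text) {
        this.text = text;
    }

    boolean isEndOfText() {
        return pos >= text.length();
    }

    // Advance to the next character that is not whitespace (or to the end of the text).
    void movePastWhitespace() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
    }

    // Advance to the next character that is whitespace (or to the end of the text).
    void moveToWhitespace() {
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
    }

    // Assumption: end is exclusive, as in String.substring.
    String extract(int start, int end) {
        return text.substring(start, end);
    }

    static List<String> tokens(String text) {
        TokenizeSketch it = new TokenizeSketch(text);
        List<String> out = new ArrayList<>();
        it.movePastWhitespace();
        while (!it.isEndOfText()) {
            int start = it.pos;            // token begins here
            it.moveToWhitespace();         // token ends just before the next whitespace
            out.add(it.extract(start, it.pos));
            it.movePastWhitespace();       // skip the gap before the next token
        }
        return out;
    }
}
```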