Yume (2004-03-25)

uniq utility.

Usage

   Discard all duplicate lines from INPUT (or stdin), writing the
   remaining lines to OUTPUT (or stdout).  Ordering of the original
   input is preserved.  If "-" is used for INPUT, stdin is used.  "-"
   cannot be used for OUTPUT.
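
   The default filtering described above can be pictured with the
   following Python sketch.  It only illustrates the documented
   behavior and is not Yume's actual code; the function name
   dedupe_keep_order is made up for this example.

```python
def dedupe_keep_order(lines):
    """Return lines with later duplicates removed, keeping first-seen order."""
    seen = set()   # every distinct line met so far (unlimited history)
    kept = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return kept


print(dedupe_keep_order(["dup", "keep", "dup"]))  # ['dup', 'keep']
```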

   -i          Ignore case (default: case sensitive).
   -c          Include number of occurrences with output (default: no).

   -h N        Set history size to N lines (default: infinite).

               To emulate original uniq behavior, use "-h 1".

               If N is negative, only the first -N lines are kept in
               history.  For example, to filter out contents of
               ignore.txt from input.txt (tail strips the ignore.txt
               lines, which are printed first):

                  n=$(( $(wc -l < ignore.txt) ))
                  cat ignore.txt input.txt | \
                  ./yume -h -$n | \
                  tail -n +$(( n + 1 ))
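
               The effect of "-h" can be sketched in Python as
               follows.  This is an illustration under assumptions,
               not Yume's implementation: it assumes a positive N
               means "the N most recent input lines" and that every
               input line enters the history whether or not it was
               printed.  The name filter_with_history is hypothetical.

```python
from collections import deque


def filter_with_history(lines, n=None):
    """Drop lines that match something in the history.

    n is None: unlimited history (the documented default).
    n > 0:     assumption, history is the n most recent input lines.
    n < 0:     history is frozen after the first -n input lines.
    n == 0:    not covered by the manual; treated as unlimited here.
    """
    history = deque(maxlen=n) if (n is not None and n > 0) else []
    frozen_size = -n if (n is not None and n < 0) else None

    kept = []
    for line in lines:
        if line not in history:
            kept.append(line)
        # Assumption: every input line enters the history, printed or not.
        if frozen_size is None or len(history) < frozen_size:
            history.append(line)
    return kept


merged = ["a", "b", "x", "a", "x", "b", "y"]   # 2 ignore lines + 5 input lines
print(filter_with_history(merged, n=-2)[2:])   # ['x', 'x', 'y']
```

               The [2:] slice plays the role of the tail step in the
               shell example: it removes the ignore lines that are
               printed first.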

   -l expr     Set expression for left marker (default: start of line).
   -r expr     Set expression for right marker (default: end of line).

               Expressions are character-based strings, executed after
               the initial (default) expressions.  See the "Marker
               expressions" section below.

   -f N        Skip first N fields.

               Equivalent to "-l ^S(sS)M -r $", where M = N - 1.
               "-f 0" is the same as "-f 1".

   -s N        Skip first N characters.

               Equivalent to "-l ^M -r $", where M = N - 1.

               "-s 0" is the same as "-s 1".

   -w N        Compare only the first N characters.

               Equivalent to "-l ^ -r ^M", where M = N - 1.

   -k          Use the entire line for comparison when a marker
               expression fails ("-l", "-r", "-f", "-s", or "-w").
               This is more consistent with uniq's behavior.

               Default is to drop the line completely.

   -u          Print only unique lines.
   -d          Print only duplicate lines.

               Default behavior is to print all unique lines, along
               with the first occurrence of each duplicated line:

                  Input lines =  foo bar bar baz
                  Output lines = foo bar     baz

               When "-u" is specified, none of the duplicate lines
               will be printed:

                  Input lines =  foo bar bar baz
                  Output lines = foo         baz

               When "-d" is specified, only the first occurrence of
               each duplicated line will be printed:

                  Input lines =  foo bar bar baz
                  Output lines =     bar

               When both "-u" and "-d" are used, "-u" is in effect.
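
               The three output modes can be summarized with a small
               Python sketch.  It assumes unlimited history and reads
               the whole input before printing, which may differ from
               how Yume works internally; the function and parameter
               names are invented for this example.

```python
from collections import Counter


def select(lines, unique_only=False, duplicates_only=False):
    """Mimic the default, "-u" and "-d" output modes on a list of lines."""
    lines = list(lines)
    counts = Counter(lines)          # how often each line occurs overall
    seen = set()
    out = []
    for line in lines:
        if line in seen:
            continue                 # only a first occurrence is ever printed
        seen.add(line)
        repeated = counts[line] > 1
        if unique_only:              # "-u" wins when both flags are given
            if not repeated:
                out.append(line)
        elif duplicates_only:
            if repeated:
                out.append(line)
        else:
            out.append(line)
    return out


data = ["foo", "bar", "bar", "baz"]
print(select(data))                        # ['foo', 'bar', 'baz']
print(select(data, unique_only=True))      # ['foo', 'baz']
print(select(data, duplicates_only=True))  # ['bar']
```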

   For options which take a parameter, the space between the flag and
   the argument is optional, e.g. "-h1" and "-h 1" are the same.

Marker expressions

   +        Set current direction to left to right.
   -        Set current direction to right to left.

   ^        Set current position to start of line, and set direction
            to left to right.

   $        Set current position to end of line, and set direction to
            right to left.

   @        Use current position for the other marker.

            This allows selecting a region more easily, when the
            markers are nearby.  For example: "-r ^,@," will select
            the region between the first and second commas.

            Note that the left marker is always set before the right
            one, so setting something like '-l $-2@-2' probably won't
            do what you expect.  Always use '@' in the right marker
            expression for more predictable behavior.

   ( ) N    Repeat enclosed expression N times.

            ")" not followed by a positive integer results in a silent

            Note that "(expr)0" has the same effect as "(expr)1".

            Repeat count is limited by the size of a signed integer
            (2^31 - 1).
            Nesting is limited to 8 levels.

   (digit)  Skip forward (digit) number of characters.

   s        Search forward for next whitespace (isspace).
   d        Search forward for next decimal digit (isdigit).
   a        Search forward for next alpha character (isalpha).
   i        Search forward for next alphanumeric character (isalnum).
   S        Search forward for next non-whitespace.
   D        Search forward for next non-digit.
   A        Search forward for next non-alpha character.
   I        Search forward for next non-alphanumeric character.
   /(char)  Search forward for next (char).
   (char)   Search forward for next (char), unless it matches any
            other opcodes.

            If the same character search is executed twice, the cursor
            is moved one character forward first.  Thus -1-1- and ---
            have the same effect.

            It might not always be what you expect.  For example,
            "aiaiai" might not move the cursor at all, because Yume
            sees each opcode as a different character.

   Initial expression for left marker:    ^
   Initial expression for right marker:   $

   Marker expressions are used to generalize functions that were in
   the original uniq.  There are no plans to expand this expression
   set; users should preprocess text files in a higher level language
   for that (such as perl/sed/awk).

   Errors in executing the expression (e.g. failed character searches,
   unmatched parentheses) result in the line being completely ignored,
   unless "-k" is specified.  Dropped lines still count as lines, so
   history parameters are still in effect.


Examples

   Make Yume run like uniq:

      yume -h1 -k

   Count number of hits for unique visitors in Apache log:

      yume -c -r '^ ' /var/log/httpd/access_log

   Ignore local visitors in Apache log:

      echo 127.0.0. > ignore.txt
      echo 192.168.0. >> ignore.txt
      cat ignore.txt /var/log/httpd/access_log | \
      yume -h -2 -r '^...'

   Filter duplicate messages in kernel log, ignore timestamps:

      yume -l ':: ' /var/log/messages

   See the different instances of SSHD that started:

      yume -r '^/s/s/h/d[@]' /var/log/secure

   Find all unique words 12 characters or longer:

      yume -l '(i)12^' /usr/share/dict/words

Error messages

   read(%s)          Cannot open file for reading.
   write(%s)         Cannot open file for writing.
   %s?               Unrecognized option.
   %s __?            Not enough arguments for option.
   out of memory     Out of memory.

   I/O errors are silently ignored.


Differences from uniq

   1. Feature Extension: By default, history is of infinite length
      instead of just one line.  This causes Yume to eliminate
      duplicate lines globally instead of just nearby duplicate lines.
      Specify "-h" to override this behavior.

         Input:      uniq:          yume:          yume -h1:
         dup         dup            dup            dup
         keep        keep           keep           keep
         dup         dup                           dup

      Comparing every line against every other line is not a major
      speed penalty -- all existing lines are indexed by their CRCs,
      and the CRCs are used as a quick way to reject different lines.

      The main reason to use Yume over uniq is this infinite history
      feature, so that you don't have to run 'sort|uniq' and lose the
      ordering of lines.
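
      The CRC indexing mentioned above might look roughly like the
      Python sketch below.  The manual does not say which CRC variant
      or data structure Yume uses, so zlib.crc32, the dictionary of
      buckets and the class name LineIndex are all assumptions made
      for illustration.

```python
import zlib


class LineIndex:
    """Set of lines, bucketed by CRC-32 so most comparisons are skipped."""

    def __init__(self):
        self.buckets = {}   # crc value -> list of lines with that crc

    def add_if_new(self, line: bytes) -> bool:
        """Store the line and return True if it was not seen before."""
        crc = zlib.crc32(line)
        bucket = self.buckets.setdefault(crc, [])
        # A different CRC proves the lines differ.  Only lines sharing a
        # CRC need a full comparison, which also covers CRC collisions.
        if line in bucket:
            return False
        bucket.append(line)
        return True


index = LineIndex()
print([index.add_if_new(l) for l in (b"dup", b"keep", b"dup")])
# [True, True, False]
```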

   2. Feature Extension: Yume adds generalized marker expressions
      ("-l" and "-r") that are not in uniq, and uses them to generalize
      the existing options ("-s", "-w" and "-f").  This allows for
      more complex filtering without invoking an external program.

   3. uniq allows "-s", "-w" and "-f" options to be used together, but
      Yume treats them as mutually exclusive (and thus the option
      specified later overrides the earlier ones).  To achieve the
      same effect, use marker expressions instead.

         Input:      uniq -f1 -s2:  yume -f1 -s2:  yume -lSs2:
         1 x dup     1 x dup        1 x dup        1 x dup
         23 y dup                   23 y dup

   4. For field ("-f") comparisons, Yume starts on the first
      non-whitespace in the proper field, while uniq starts on the
      first whitespace.  I am keeping Yume's incompatible behavior
      because I think that's usually more useful.  The original uniq
      behavior can be emulated with marker expressions.

         Input:      uniq -f1:      yume -f1:      yume -lSs:
         123    dup  123    dup     123    dup     123    dup
         456 dup     456 dup                       456 dup
         789 dup

      Also, if there aren't enough fields, uniq treats the line like
      an empty line (the first is printed, the rest are marked as
      duplicates).  Yume drops the line completely (not even the first
      line is printed).
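
      The difference in where the comparison starts can be sketched
      in Python.  Both functions are hypothetical helpers written for
      this illustration; they model the behavior shown in the table
      above and are not taken from either program's source.

```python
def uniq_field_key(line, n):
    """Skip n fields the way uniq does: blanks, then non-blanks, n times."""
    pos = 0
    for _ in range(n):
        while pos < len(line) and line[pos].isspace():
            pos += 1
        while pos < len(line) and not line[pos].isspace():
            pos += 1
    return line[pos:]   # begins at the whitespace after field n


def yume_field_key(line, n):
    """Start at the first non-whitespace character of the next field.

    Returns None when there is no such field, modelling a dropped line.
    """
    rest = uniq_field_key(line, n).lstrip()
    return rest if rest else None


for text in ["123    dup", "456 dup", "789 dup"]:
    print(repr(uniq_field_key(text, 1)), repr(yume_field_key(text, 1)))
# '    dup' 'dup'
# ' dup' 'dup'
# ' dup' 'dup'
```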

   5. Yume ignores end of line sequences before filtering, while uniq
      treats them as significant.  Thus two lines that differ only by
      end of line sequence are treated as duplicates by Yume, but not
      by uniq.  There is no way to override this behavior.

         Input:      uniq:          yume:
         dup\r       dup\r          dup\r
         dup\n       dup\n
         dup\r\n     dup\r\n

      This might not be obvious:

         Input:      uniq:          yume:
         \n\r\r      \n\r\r         \n

      uniq seems to treat the file as one line, but Yume sees three
      lines, and removes the two duplicate lines.
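
      A Python sketch that reproduces the Yume column of both tables
      is shown below.  It is only a model of the described behavior;
      the function name dedupe_ignoring_eol is made up, and it assumes
      that each kept line is written back with its original
      terminator, as the tables suggest.

```python
def dedupe_ignoring_eol(data: bytes) -> bytes:
    """Remove duplicate lines, comparing them without their terminators."""
    seen = set()
    out = []
    # bytes.splitlines splits on \n, \r and \r\n; keepends=True keeps the
    # original terminator attached so it can be written back unchanged.
    for raw in data.splitlines(keepends=True):
        key = raw.rstrip(b"\r\n")   # compare without the terminator
        if key not in seen:
            seen.add(key)
            out.append(raw)
    return b"".join(out)


print(dedupe_ignoring_eol(b"dup\rdup\ndup\r\n"))  # b'dup\r'
print(dedupe_ignoring_eol(b"\n\r\r"))             # b'\n'
```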

   6. If the last line in the original file does not end in a newline,
      uniq outputs an extra newline, but Yume doesn't.  There is no way
      to override this behavior.

         Input:      uniq:          yume:
         line<EOF>   line\n<EOF>    line<EOF>

   7. Long option style (e.g. --unique instead of -u) is not
      supported.

   8. GNU's extension ("-D") is not supported.

   9. There is no command line help or version display, but options
      are mostly compatible with the uniq ones (except for
      incompatibilities mentioned above).  Keep this manual around or
      use "man uniq" instead.


   Yume uniq Yume uniq Yume uniq... try saying that 3 times fast ^_^;

   More features -> more pixels -> better looking ASCII art.

   Template is based on Kikuchi Yume from "Mahoutsukai ni taisetsu na
   koto", but actually all of the code was written while listening to
   Kokoro Toshokan OST...
