Yume (2004-03-25)

uniq utility.


Synopsis

    yume [OPTION] [INPUT [OUTPUT]]


Description

    Discard all duplicate lines from INPUT (or stdin), writing the
    remaining lines to OUTPUT (or stdout).  The ordering of the original
    input is preserved.

    If "-" is used for INPUT, stdin is read.  "-" cannot be used for
    OUTPUT.

    -i      Ignore case (default: case sensitive).

    -c      Include number of occurrences with output (default: no).

    -h N    Set history size to N lines (default: infinite).  To emulate
            original uniq behavior, use "-h 1".

            If N is negative, only the first -N lines are kept in
            history.  For example, to filter out contents of ignore.txt
            from input.txt (the final tail skips the ignore.txt lines
            themselves, which come first in the output):

                cat ignore.txt input.txt | \
                    ./yume -h -`wc -l < ignore.txt` | \
                    tail -n +$((`wc -l < ignore.txt` + 1))

    -l expr Set expression for left marker (default: start of line).

    -r expr Set expression for right marker (default: end of line).

            Expressions are character based strings, executed after the
            initial (default) expressions.  See the marker expressions
            section below.

    -f N    Skip first N fields.  Equivalent to "-l ^S(sS)M -r $", where
            M = N - 1.  "-f 0" is the same as "-f 1".

    -s N    Skip first N characters.  Equivalent to "-l ^M -r $", where
            M = N - 1.  "-s 0" is the same as "-s 1".

    -w N    Compare only the first N characters.  Equivalent to
            "-l ^ -r ^M", where M = N - 1.

    -k      Use the entire line for comparison when a marker expression
            fails ("-l", "-r", "-f", "-s", or "-w").  This is more
            consistent with uniq's behavior.  Default is to drop the
            line completely.

    -u      Print only unique lines.

    -d      Print only duplicate lines.

            Default behavior is to print all unique lines, along with
            the first line of each group of duplicate lines:

                Input lines  = foo bar bar baz
                Output lines = foo bar baz

            When "-u" is specified, none of the duplicate lines will be
            printed:

                Input lines  = foo bar bar baz
                Output lines = foo baz

            When "-d" is specified, only the first line of each group of
            duplicate lines will be printed:

                Input lines  = foo bar bar baz
                Output lines = bar

            When both "-u" and "-d" are used, "-u" is in effect.
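
The default, "-u" and "-d" output modes described above can be modeled
in a few lines of Python.  This is only a sketch of the documented
behavior with infinite history, not Yume's actual code: the function
name filter_lines and the mode strings are made up for illustration,
and the sketch reads the whole input before printing, which Yume may
well not do.

```python
from collections import Counter


def filter_lines(lines, mode="default"):
    """Model the default, "-u" ("unique") and "-d" ("duplicate") modes.

    Hypothetical helper for illustration; assumes infinite history.
    """
    counts = Counter(lines)      # how often each line occurs overall
    seen = set()                 # lines already handled
    out = []
    for line in lines:
        if line in seen:
            continue             # later copies are never printed
        seen.add(line)
        duplicated = counts[line] > 1
        if mode == "unique" and duplicated:
            continue             # "-u": drop every line that repeats
        if mode == "duplicate" and not duplicated:
            continue             # "-d": drop lines that never repeat
        out.append(line)
    return out


words = ["foo", "bar", "bar", "baz"]
print(filter_lines(words))               # ['foo', 'bar', 'baz']
print(filter_lines(words, "unique"))     # ['foo', 'baz']
print(filter_lines(words, "duplicate"))  # ['bar']
```

The three printed lists match the three Input/Output examples given for
the default, "-u" and "-d" cases.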

    For options which take a parameter, the space between the flag and
    the argument is optional, e.g. "-h1" and "-h 1" are the same.


Marker expressions

    +       Set current direction to left to right.

    -       Set current direction to right to left.

    ^       Set current position to start of line, and set direction to
            left to right.

    $       Set current position to end of line, and set direction to
            right to left.

    @       Use current position for the other marker.  This allows
            selecting a region more easily, when the markers are nearby.
            For example:

                "-r ^,@,"

            will select the region between the first and second commas.

            Note that the left marker is always set before the right
            one, so setting something like '-l $-2@-2' probably won't do
            what you expect.  Always use '@' in the right marker
            expression for more predictable behavior.

    ( ) N   Repeat enclosed expression N times.  ")" not followed by a
            positive integer results in a silent error.  Note that
            "(expr)0" has the same effect as "(expr)1".  Repeat count is
            limited by the size of a signed integer (2^31).  Nesting is
            limited to 8 levels.

    (digit) Skip forward (digit) number of characters.

    s       Search forward for next whitespace (isspace).
    d       Search forward for next decimal digit (isdigit).
    a       Search forward for next alpha character (isalpha).
    i       Search forward for next alphanumeric character (isalnum).

    S       Search forward for next non-whitespace.
    D       Search forward for next non-digit.
    A       Search forward for next non-alpha character.
    I       Search forward for next non-alphanumeric character.

    /(char) Search forward for next (char).

    (char)  Search forward for next (char), unless it matches any other
            opcodes.

    If the same character search is executed twice, the cursor is moved
    one character forward first.  Thus -1-1- and --- have the same
    effect.  It might not always be what you expect.  For example,
    "aiaiai" might not move the cursor at all, because Yume sees each
    opcode as a different character.

    Initial expression for left marker:  ^
    Initial expression for right marker: $

    Marker expressions are used to generalize functions that were in the
    original uniq.  There are no plans to expand this expression set;
    users should preprocess text files in a higher level language for
    that (such as perl/sed/awk).

    Errors in executing the expression (e.g. failed character searches,
    unmatched parentheses) result in the line being completely ignored,
    unless "-k" is specified.  Dropped lines still count as lines, so
    history parameters are still in effect.


Examples

    Make Yume run like uniq:

        yume -h1 -k

    Count number of hits for unique visitors in Apache log:

        yume -c -r '^ ' /var/log/httpd/access_log

    Ignore local visitors in Apache log:

        echo 127.0.0. > ignore.txt
        echo 192.168.0. >> ignore.txt
        cat ignore.txt /var/log/httpd/access_log | \
            yume -h -2 -r '^...'

    Filter duplicate messages in kernel log, ignore timestamps:

        yume -l ':: ' /var/log/messages

    See the different instances of SSHD that started:

        yume -r '^/s/s/h/d[@]' /var/log/secure

    Find all unique words 12 characters or longer:

        yume -l '(i)12^' /usr/share/dict/words


Error messages

    read(%s)        Cannot open file for reading.
    write(%s)       Cannot open file for writing.
    %s?             Unrecognized option.
    %s __?          Not enough arguments for option.
    out of memory   Out of memory.

    I/O errors are silently ignored.


Incompatibilities

    1.  Feature Extension: By default, history is of infinite length
        instead of just one line.  This causes Yume to eliminate
        duplicate lines globally instead of just nearby duplicate lines.
        Specify "-h" to override this behavior.

            Input:      uniq:       yume:       yume -h1:
            dup         dup         dup         dup
            keep        keep        keep        keep
            dup         dup                     dup

        Comparing every line against every other line is not a major
        speed penalty: all existing lines are indexed by their CRCs, and
        the CRCs are used as a quick way to reject different lines.

        The main reason to use Yume over uniq is this infinite history
        feature, so that you don't have to run 'sort|uniq' and lose the
        ordering of lines.
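
The CRC-indexed lookup described under incompatibility 1 can be
sketched in Python as follows.  This is an illustration under
assumptions, not Yume's implementation: zlib.crc32 stands in for
whatever CRC Yume computes, and the function name dedup_keep_order and
the dictionary-of-lists index are hypothetical.

```python
import zlib
from collections import defaultdict


def dedup_keep_order(lines):
    """Drop repeated lines, keeping first occurrences in input order.

    Hypothetical sketch of global ("infinite history") de-duplication.
    """
    index = defaultdict(list)    # CRC -> distinct lines seen with that CRC
    kept = []
    for line in lines:
        crc = zlib.crc32(line.encode("utf-8"))
        bucket = index[crc]
        # Lines with different CRCs cannot be equal, so only the lines
        # that share this CRC need a full string comparison.
        if line in bucket:
            continue             # seen before: a duplicate
        bucket.append(line)
        kept.append(line)
    return kept


print(dedup_keep_order(["dup", "keep", "dup"]))   # ['dup', 'keep']
```

The output keeps "dup" ahead of "keep", which a 'sort|uniq' pipeline
would not guarantee.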

    2.  Feature Extension: Yume adds generalized marker expressions
        ("-l" and "-r") that are not in uniq, and uses them to
        generalize the existing options ("-s", "-w" and "-f").  This
        allows for more complex filtering without invoking an external
        program.

    3.  uniq allows the "-s", "-w" and "-f" options to be used together;
        Yume treats them as mutually exclusive (and thus the option
        specified later overrides the earlier ones).  To achieve the
        same effect, use marker expressions instead.

            Input:      uniq -f1 -s2:   yume -f1 -s2:   yume -lSs2:
            1 x dup     1 x dup         1 x dup         1 x dup
            23 y dup                    23 y dup

    4.  For field ("-f") comparisons, Yume starts on the first
        non-whitespace in the proper field, while uniq starts on the
        first whitespace.  I am keeping Yume's incompatible behavior
        because I think that's usually more useful.  The original uniq
        behavior can be emulated with marker expressions.

            Input:      uniq -f1:   yume -f1:   yume -lSs:
            123 dup     123 dup     123 dup     123 dup
            456 dup     456 dup                 456 dup
            789 dup

        Also, if there aren't enough fields, uniq treats the line like
        an empty line (the first is printed, the rest are marked as
        duplicates).  Yume drops the line completely (not even the first
        line is printed).

    5.  Yume ignores end of line sequences before filtering; uniq treats
        them as significant.  Thus two lines that differ by end of line
        only will be treated by Yume as duplicates, but not by uniq.
        There is no way to override this behavior.

            Input:      uniq:       yume:
            dup\r       dup\r       dup\r
            dup\n       dup\n
            dup\r\n     dup\r\n

        This might not be obvious:

            Input:      uniq:       yume:
            \n\r\r      \n\r\r      \n

        uniq seems to treat the file as one line, but Yume sees three
        lines, and removes the two duplicate lines.

    6.  If the last line in the original file does not end in a newline,
        uniq outputs an extra newline; Yume doesn't.  There is no way to
        override this behavior.

            Input:      uniq:       yume:
            line        line\n      line

    7.  Long-style options (e.g. --unique instead of -u) are not
        supported.

    8.  GNU's extension ("-D") is not supported.

    9.  There is no command line help or version display, but options
        are mostly compatible with the uniq ones (except for the
        incompatibilities mentioned above).  Keep this manual around or
        use "man uniq" instead.


Miscellaneous

    Yume uniq Yume uniq Yume uniq... try saying that 3 times fast ^_^;

    More features -> more pixels -> better looking ASCII art.

    Template is based on Kikuchi Yume from "Mahoutsukai ni taisetsu na
    koto", but actually all of the code was written while listening to
    Kokoro Toshokan OST...


--
omoikane@uguu.org - http://uguu.org/