note: There are extensive documents on each of the following commands. This blog post is not meant to show the exhaustive features of any of the commands. Instead, this is a blog post that shows my most common usages of my most commonly used commands. If you don't know linux commands well, and you find yourself needing to grab some data, this blog post might give you a bit of guidance.
Let's start with some sample documents. Let's assume that I have 2 files showing orders that are being placed with a third party and the responses the third party sends.
order.out.log
8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:23:45 112, 1, Joy of Clojure, Hardcover, 29.99
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99
order.in.log
8:22:20 111, Order Complete
8:23:50 112, Order sent to fulfillment
8:24:20 113, Refund sent to processing
cat
cat - concatenate files and print on the standard output

The cat command is simple, as the following example shows.
jfields$ cat order.out.log
8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:23:45 112, 1, Joy of Clojure, Hardcover, 29.99
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99

As the description shows, you can also use it to concatenate multiple files.
jfields$ cat order.*
8:22:20 111, Order Complete
8:23:50 112, Order sent to fulfillment
8:24:20 113, Refund sent to processing
8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:23:45 112, 1, Joy of Clojure, Hardcover, 29.99
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99

If I want to view my log files, I can concatenate them and print them to standard out, as the example above shows. That's cool, but things could be a bit more readable.
sort
sort - sort lines of text files

Using sort is an obvious choice here.
jfields$ cat order.* | sort
8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:22:20 111, Order Complete
8:23:45 112, 1, Joy of Clojure, Hardcover, 29.99
8:23:50 112, Order sent to fulfillment
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:24:20 113, Refund sent to processing

As the example above shows, my data is now sorted. With small sample files you can probably get away with reading everything; however, any real production log will have plenty of lines you don't care about. You'll want a way to filter the results of piping cat to sort.
grep
grep, egrep, fgrep - print lines matching a pattern

Let's pretend that I only care about finding an order for PofEAA. Using grep, I can limit my results to PofEAA transactions.
jfields$ cat order.* | sort | grep Patterns
8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99

Assume that an issue occurred with the refund on order 113, and you want to see all data related to that order - grep is your friend again.
jfields$ cat order.* | sort | grep ":\d\d 113, "
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:24:20 113, Refund sent to processing

You'll notice that I put a bit more than "113" in my regex for grep. That's because 113 could also appear in a product title or a price. With a few extra characters, I can limit the results to strictly the transactions I'm looking for.
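As an aside, \d is a Perl-style shorthand that not every grep understands; with plain grep the portable spelling is [0-9]. A minimal sketch of the same filter, with the sample out log recreated inline so the snippet is self-contained:

```shell
# Recreate the sample out log from the post above
cat > order.out.log <<'EOF'
8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:23:45 112, 1, Joy of Clojure, Hardcover, 29.99
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99
EOF

# ":[0-9][0-9] 113, " anchors 113 to the seconds of the timestamp, so a
# "113" in a title or price can't sneak into the results
grep ":[0-9][0-9] 113, " order.out.log
```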
Now that we've sent the order details on to refunds, we also want to send the daily totals of sales and refunds on to the accounting team. They've asked for each line item for PofEAA, but they only care about the quantity and price. What we need to do is cut out everything we don't care about.
cut
cut - remove sections from each line of files

Using grep again, we can grab the appropriate lines. Once we have what we need, we can cut each line into pieces and rid ourselves of the unnecessary data.
jfields$ cat order.* | sort | grep Patterns
8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99
jfields$ cat order.* | sort | grep Patterns | cut -d"," -f2,5
1, 39.99
-1, 39.99

At this point we've reduced the data to what accounting is looking for, so it's time to paste it into a spreadsheet and be done with that task.
Using cut is helpful for tracking down problems, but if you're generating an output file you'll often want something more complicated. Let's assume that accounting also needs to know the order ids for building some type of reference documentation. We can get the information using cut, but the accounting team wants the order id at the end of the line, surrounded by single quotes. (For the record, you might be able to do this with cut; I've never tried.)
sed
sed - a stream editor, used to perform basic text transformations on an input stream

The following example shows how we can use sed to transform our lines in the requested way; cut is then used to remove the unnecessary data.
jfields$ cat order.* | sort | grep Patterns \
> | sed s/"[0-9\:]* \([0-9]*\)\, \(.*\)"/"\2, '\1'"/
1, Patterns of Enterprise Architecture, Kindle edition, 39.99, '111'
-1, Patterns of Enterprise Architecture, Kindle edition, 39.99, '113'
jfields$ cat order.* | sort | grep Patterns \
> | sed s/"[0-9\:]* \([0-9]*\)\, \(.*\)"/"\2, '\1'"/ | cut -d"," -f1,4,5
1, 39.99, '111'
-1, 39.99, '113'

There's a bit going on in that example regex, but nothing too complicated. The regex does the following things:
- remove the timestamp
- capture the order number
- remove the comma and space after the order number
- capture the remainder of the line
Once we've captured the data we need, we can use \1 and \2 to reorder and output it in our desired format. We also include the requested single quotes, and add our own comma to keep the format consistent. Finally, we use cut to remove the superfluous data.
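The same substitution can be seen in isolation on a single line (double quotes around the sed script sidestep the single-quote escaping; the annotated pieces match the list above):

```shell
line="8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99"

# [0-9:]*     - eats the timestamp
# \([0-9]*\)  - captures the order number as \1
# ", "        - drops the comma and space after the order number
# \(.*\)      - captures the remainder of the line as \2
echo "$line" | sed "s/[0-9:]* \([0-9]*\), \(.*\)/\2, '\1'/"
# -> 1, Patterns of Enterprise Architecture, Kindle edition, 39.99, '111'
```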
Now you're in trouble. You've demonstrated that you can slice up a log file in fairly short order, and the CIO needs a quick report of the total number of book transactions broken down by book.
uniq
uniq - removes duplicate lines from a sorted file

(We'll assume that other types of transactions can take place, so we filter the out log for 'Kindle' and 'Hardcover'.)
The following example shows how to grep for only book related transactions, cut unnecessary information, and get a counted & unique list of each line.
jfields$ cat order.out.log | grep "\(Kindle\|Hardcover\)" | cut -d"," -f3 | sort | uniq -c
1 Joy of Clojure
2 Patterns of Enterprise Architecture

Had the requirements been a bit simpler - say, "get me a list of all books with transactions" - uniq also would have been the answer.
jfields$ cat order.out.log | grep "\(Kindle\|Hardcover\)" | cut -d"," -f3 | sort | uniq
Joy of Clojure
Patterns of Enterprise Architecture

All of these tricks work well if you know where to find the file you need; however, sometimes you'll find yourself in a deeply nested directory structure without any hints as to where to go. If you're lucky enough to know the name of the file you need (or you have a decent guess), you shouldn't have any trouble finding it.
find
find - search for files in a directory hierarchy

In the above examples we've been working with order.in.log and order.out.log. On my box those files live in my home directory. The following example shows how to find them from a higher level, without even knowing the full filename.
jfields$ find /Users -name "order*"
/Users/jfields/order.in.log
/Users/jfields/order.out.log

Find has plenty of other options, but this does the trick for me about 99% of the time.
Along the same lines, once you find a file you need, you're not always going to know what's in it and how you want to slice it up. Piping the output to standard out works fine when the output is short; however, when there's a bit more data than what fits on a screen, you'll probably want to pipe the output to less.
less
less - allows forward & backward movement within a file

As an example, let's go all the way back to our simple cat | sort example. If you execute the following command you'll end up in less, with your in & out logs merged and sorted. Within less you can forward search with "/" and backward search with "?". Both searches take a regex.
jfields$ cat order* | sort | less

While in less you can try /113.*, which will highlight all transactions for order 113. You can also try ?.*112, which will highlight all timestamps associated with order 112. Finally, you can use 'q' to quit less.
The linux* command line is rich, and somewhat intimidating. However, with the previous 8 commands you should be able to get quite a few log slicing tasks completed - without having to drop into your favorite scripting language.
* okay, possibly Unix, that's not the point
I far prefer tac to cat; it's cat reversed, and starts at the bottom of the file vs. the top.
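For anyone who hasn't met it, a quick sketch of tac (it's part of GNU coreutils, so it may not be installed by default on BSD/macOS systems):

```shell
# tac prints its input with the line order reversed
printf 'first\nsecond\nthird\n' | tac
# -> third
#    second
#    first
```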
Another one I use all the time is "tail", especially "tail -f" to watch a log file.
Mike, I used to use tail, but I tend to use less and shift+f to basically tail within less.
I would add xargs as the 9th command you should know.
It allows you to complete the story:
$ find /Users -name "order*" | xargs cat | grep "\(Kindle\|Hardcover\)" | cut -d"," -f3 | sort | uniq
sort | uniq is a useless pipe operation.
sort -u already covers that.
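A quick sketch of that equivalence - sort -u deduplicates while sorting, so the extra uniq process is unnecessary when you don't need counts:

```shell
printf 'b\na\nb\nc\n' | sort | uniq   # three lines: a, b, c
printf 'b\na\nb\nc\n' | sort -u       # identical output, one process fewer
```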
@Uri Nativ xargs is cool, but in that case you don't need it, because find has a command-line option called -exec
$ find /Users -name "order*" -exec cat {} \; | grep "\(Kindle\|Hardcover\)" | cut -d"," -f3 | sort | uniq
@Martin Holzhauser Also, GNU find has '-exec [command] {} +', if you don't want [command] executed for each result of find (e.g. xargs-like command iteration).
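A sketch of the difference, using hypothetical demo files: with \; the command runs once per matched file, while with + the filenames are batched into as few invocations as possible, much like xargs.

```shell
mkdir -p demo
printf 'kindle order\n' > demo/a.log
printf 'hardcover order\n' > demo/b.log

# one cat process per file:
find demo -name '*.log' -exec cat {} \;
# one cat process for the whole batch:
find demo -name '*.log' -exec cat {} +
```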
@Jay Fields I also switched from tail to less. Note, you can 'less +F ' on the command line to start in 'follow' (shell alias, etc.).
Why do 'sort | uniq'? Just do 'sort -u'.
ReplyDeletesort | uniq
may be superfluous, but
sort | uniq -c
is definitely not. (Granted, not what he posted, but it is a pattern I use often.)
whatever | sort | uniq -c | sort -rn
for a listing of "most often occurring string" to "least often"
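A self-contained sketch of that frequency-count pattern:

```shell
# count occurrences of each line, most frequent first
printf 'apple\nbanana\napple\ncherry\napple\nbanana\n' \
  | sort | uniq -c | sort -rn
# top line is "3 apple" (uniq -c pads the count with leading spaces)
```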
I find ack to work much, much better than any combination of find, cat, and grep. Try it out!
ReplyDelete$ find /Users -name "order*" -exec grep "\(Kindle\|Hardcover\)" {}; | cut -d"," -f3 | sort | uniq
Useless Use Of Cat! (UUOC)
Very useful for someone coming from Windows and trying to get into Linux for development. This is, effectively, awesome. Thanks!
Don't forget: xargs, nm, ldd, strings, awk, strace, file
ReplyDeleteThe best command is man: Type 'man man' to learn how to use it and 'man -k keyword' to search for commands that you need.
I'd add awk to this list as well. The grep + sed + awk trio has been a durable cohort over the decades. What does awk bring to the table? Where sed's forte is pattern substitution, awk takes over for extracting and transforming structured data (e.g. adding a line at the end of a report with the sum total, or transposing two columns). Awk also adds functions and some data types. You'd think that once you need a data type you should just go to Ruby or Python, but you'll be pleasantly surprised at how much you can get done quickly with awk.
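As a sketch of the kind of summing awk is good at - assuming the order.out.log layout from the post, where splitting on ', ' puts the quantity in field 2 and the price in field 5:

```shell
# Recreate the sample out log from the post
cat > order.out.log <<'EOF'
8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:23:45 112, 1, Joy of Clojure, Hardcover, 29.99
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99
EOF

# quantity * price, accumulated across all lines; the refund's -1 subtracts
awk -F', ' '{ total += $2 * $5 } END { printf "net sales: %.2f\n", total }' order.out.log
# -> net sales: 29.99
```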
ReplyDeleteThanks Rambo,
Chelimsky is a big fan of awk as well, so I'm sure I'll get plenty of experience with it over the next year.
@Martin Holzhauer, thanks for the tip. The nice thing with xargs is that its syntax is just easier to memorize :)
I don't know - my favourite command on Linux is "python", which is an awesome substitute for all of these commands ...
ReplyDeleteAll these commands are ugly.
- Yves
These commands are nice, but if you use them often, you need many more options. For example with uniq, I have one with the following set of options, so you can count and do uniq over only a certain part of the strings:
? : show this text and exit
n : Compare only first n characters or words
Bstring : Block: ignore text from (hex) string at any line
C : Compress white space before compare
E : separator for /SS modes is empty line i.o. "------"
F : Give First line of equal lines (default)
FF : as above, prepend a line count
FFF : as above, add linecounts that are in input
FFFF : as above, but give the number of lines too
H : Human: flush output often (slower but more interactive)
I : Ignore case. Note: consider /Yc to control accented chars
K : Keep trailing spaces and tabs (default: strip)
L : Give Last line of equal lines instead of first one
LL : as above, prepend a line count
LLL : as above, add linecounts that are in input
LLLL : as above, but give the number of lines too
Example:
sortm file1 | uniq /ll > temp.u
sortm file2 | uniq /ll >> temp.u
(add more sortm | uniq operations if wanted)
sortm /wb2 temp.u | uniq /lll > resultfile
M : give info Message at exit
N[w] : add line Numbers [w chars wide, w=1..9 [6]]
NOTE: /N output is not compatible for /FFF or /LLL
O : output Only first line of equal lines (use with /S)
P[w] : give Position number in /S modes [w chars wide, w=1..9 [4]]
Qn : Text between Quotes is seen as a word
n=0: none; 1: single; 2: double; 3: both quotes
QQn : As Qn, but quote doubling assumed
R : Reverse: output lines that would be skipped
S : output lines that compare the Same
SS : as above, but Separate the compare regions
SSS : as above, but give also unique lines
T : keep Tabs (default: convert to spaces)
U : output only lines that are unique
W : compare first words instead of chars
Yc : Extended upper/lower case chars: c= I:IBM; W:Win/ISO8859; N:None [N]
Z : Zombie: Output numbers from /ll modes in parsable format
Two of my favorites not listed here are vi/vim and md5sum. md5sum is a great time saver when you need to compare a large group of files. This command - md5sum * | sort | less - puts all of the identical files in a directory together, regardless of name.
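A sketch of the trick with throwaway files (md5sum is GNU coreutils; on macOS the analogous command is md5): identical contents produce identical hashes, so sorting on the hash groups duplicates together.

```shell
printf 'identical contents\n' > first.txt
printf 'identical contents\n' > second.txt
printf 'different contents\n' > third.txt

# the two duplicates end up on adjacent lines
md5sum first.txt second.txt third.txt | sort
```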
Thanks for this list! Since it's getting popular, I think it would be best to mention that running grep first, and then sort, will generally be faster, especially for very large files.
(Note that if you grep first, though, some grep options like -C will not make sense.)
diff, cmp, cp -a, ls -altr, locate,
I am not aware of a distribution of Linux which uses /Users for home directories.
ReplyDeleteIf looking at log files while they are being written, tee is good.
I was expecting make, gcc, strace, gdb, ltrace, but these are all basics that are, indeed, a must to know.
You can't always install emacs. vi/vim are essential tools, even if you prefer to use another editor.
I like your list, and would also add vi/vim to it. Sometimes you cannot install your favorite editor, and vi works - albeit awkwardly if you don't know it well - when you are in a pinch.
The cat utility is mostly useless here, as you can always pass the files as arguments to the next utility in the pipe.
Using xargs in combination with find is far preferable to using the -exec arg of find. The syntax after the -exec arg is ugly and awkward, whereas xargs brings elegance and simplicity.
But xargs isn't only useful with find. Any time you need to convert a list of data in the pipe stream (files, strings, whatever) into arguments for a command later in the pipe stream, xargs is the tool to do it.
Let's use find/xargs anyway. Let's say, for example, you have a bunch of log files you no longer want, log files with "foo" in the name of the file, and they're all under your current directory. The following will get rid of those files:
find . -type f | egrep 'foo.*log$' | xargs rm
What if you want to remove a bunch of log files where "foo" is not in the name of the file, but rather in the contents of the file? We use xargs twice in the pipe stream to accomplish that, along with an example of awk:
find . -type f -name "*.log" | xargs egrep foo | awk -F: '{print $1}' | sort -u | xargs rm
The commands grep and egrep each have 2 ways of operating on data. The first case is where they're receiving string data from stdin:
... | egrep pattern
The second case is where they're given a set of filenames to search through:
egrep pattern filename [filename...]
In the first find example above, the list of filenames which were piped into egrep came through stdin, and they were thus treated as strings. In the second find example above, the list of filenames which were piped into xargs egrep got treated as arguments to egrep (because of the magic of xargs), and therefore the filenames represented the actual files, and their contents were searched.
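A tiny demonstration of the two cases, with a hypothetical hit.log:

```shell
printf 'needle inside\n' > hit.log

# case 1: grep sees the *string* "hit.log" on stdin and matches the name,
# so searching for the file's contents finds nothing
echo hit.log | grep needle
# case 2: xargs turns the string into an argument, so grep opens the file
echo hit.log | xargs grep needle
```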
I love your blog, Jay Fields!
Excellent introduction! Thanks a lot.