Tuesday, August 28, 2012

8 Linux Commands Every Developer Should Know

Every developer, at some point in their career, will find themselves looking for some information on a Linux* box. I don't claim to be an expert; in fact, I claim to be very under-skilled when it comes to Linux command line mastery. However, with the following 8 commands I can get pretty much anything I need done.

Note: There are extensive documents on each of the following commands. This blog post is not meant to show the exhaustive features of any of the commands. Instead, it shows my most common usages of my most commonly used commands. If you don't know Linux commands well, and you find yourself needing to grab some data, this blog post might give you a bit of guidance.

Let's start with some sample documents. Let's assume that I have 2 files showing orders that are being placed with a third party and the responses the third party sends.
order.out.log
8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:23:45 112, 1, Joy of Clojure, Hardcover, 29.99
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99

order.in.log
8:22:20 111, Order Complete
8:23:50 112, Order sent to fulfillment
8:24:20 113, Refund sent to processing
cat
cat - concatenate files and print on the standard output
The cat command is simple, as the following example shows.
jfields$ cat order.out.log 
8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:23:45 112, 1, Joy of Clojure, Hardcover, 29.99
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99
As the description shows, you can also use it to concatenate multiple files.
jfields$ cat order.* 
8:22:20 111, Order Complete
8:23:50 112, Order sent to fulfillment
8:24:20 113, Refund sent to processing
8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:23:45 112, 1, Joy of Clojure, Hardcover, 29.99
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99
If I want to view my log files, I can concatenate them and print them to standard out, as the example above shows. That's cool, but things could be a bit more readable.
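
One detail worth noticing in the output above: order.in.log is printed before order.out.log. The shell expands order.* into a sorted list of file names before cat ever runs, so the order is alphabetical, not chronological. Here is a minimal sketch of that, assuming mktemp is available and using a throwaway directory with only the first line of each sample file:

```shell
# Work in a scratch directory so no real files are touched (assumption: mktemp exists).
cd "$(mktemp -d)"
printf '8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99\n' > order.out.log
printf '8:22:20 111, Order Complete\n' > order.in.log

# The shell expands the glob before cat runs; echo shows the resulting argument list.
expanded=$(echo order.*)
echo "$expanded"

# cat prints the files in exactly that order, so the "in" log comes first.
first_line=$(cat order.* | head -n 1)
echo "$first_line"
```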

sort
sort - sort lines of text files
Using sort is an obvious choice here.
jfields$ cat order.* | sort
8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:22:20 111, Order Complete
8:23:45 112, 1, Joy of Clojure, Hardcover, 29.99
8:23:50 112, Order sent to fulfillment
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:24:20 113, Refund sent to processing
As the example above shows, my data is now sorted. With small sample files, you can probably deal with reading the entire file. However, any real production log is likely to have plenty of lines that you don't care about. You're going to want a way to filter the results of piping cat to sort.
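
A caveat: plain sort compares lines as text. That works for the sample logs because every timestamp starts with 8, but a line stamped 10:01:00 would sort before 8:22:20. The sketch below uses two hypothetical log lines (order 114 is made up) and tells sort to split on ":" and compare the hour, minute, and second numerically:

```shell
# Two hypothetical log lines: one after 10 o'clock, one before.
lines=$(printf '10:01:00 114, Order Complete\n8:22:20 111, Order Complete\n')

# Plain sort compares text, so "10:..." lands before "8:...".
text_sorted=$(printf '%s\n' "$lines" | sort)

# Split fields on ":" and compare hour, minute, and second as numbers instead.
time_sorted=$(printf '%s\n' "$lines" | sort -t: -k1,1n -k2,2n -k3,3n)

printf '%s\n' "$time_sorted"
```

The third key still contains the rest of the line, but a numeric comparison only reads the leading digits, so the seconds are what get compared.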

grep
grep, egrep, fgrep - print lines matching a pattern
Let's pretend that I only care about finding an order for PofEAA. Using grep I can limit my results to PofEAA transactions.
jfields$ cat order.* | sort | grep Patterns
8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99
Assume that an issue occurred with the refund on order 113, and you want to see all data related to that order - grep is your friend again.
jfields$ cat order.* | sort | grep ":[0-9][0-9] 113, "
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:24:20 113, Refund sent to processing
You'll notice that I put a bit more than "113" in my regex for grep. This is because 113 can also come up in a product title or a price. With a few extra characters, I can limit the results to strictly the transactions I'm looking for.
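
To see the difference those extra characters make, here is a small sketch. The order 114 line is hypothetical, added only so that "113" shows up in a price. The pattern uses [0-9] bracket expressions, which any grep should accept:

```shell
# Two sample lines plus one hypothetical order (114) whose price contains "113".
log=$(cat <<'EOF'
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:24:20 113, Refund sent to processing
8:25:02 114, 1, Some Expensive Book, Hardcover, 113.00
EOF
)

# A bare "113" also matches the price on order 114.
loose=$(printf '%s\n' "$log" | grep -c "113")

# Anchoring on the end of the timestamp and the trailing ", " matches only the order id.
strict=$(printf '%s\n' "$log" | grep -c ":[0-9][0-9] 113, ")

echo "loose=$loose strict=$strict"
```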

Now that we've sent the order details on to refunds, we also want to send the daily totals of sales and refunds on to the accounting team. They've asked for each line item for PofEAA, but they only care about the quantity and price. What we need to do is cut out everything we don't care about.

cut
cut - remove sections from each line of files
Using grep again, we can see that we grab the appropriate lines. Once we grab what we need, we can cut the line up into pieces, and rid ourselves of the unnecessary data.
jfields$ cat order.* | sort | grep Patterns
8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99
jfields$ cat order.* | sort | grep Patterns | cut -d"," -f2,5
 1, 39.99
 -1, 39.99
At this point we've reduced our data down to what accounting is looking for, so it's time to paste it into a spreadsheet and be done with that task.
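
If accounting only wants the net total, the spreadsheet step can probably be skipped. The sketch below is an assumption about what they need, and it uses awk, which is not one of the 8 commands in this post. It multiplies quantity by price for each line that cut produced and sums the results:

```shell
# The two PofEAA lines exactly as cut -d"," -f2,5 emits them.
fields=$(printf ' 1, 39.99\n -1, 39.99\n')

# Split on commas, multiply quantity by price, and sum across all lines.
net=$(printf '%s\n' "$fields" | awk -F, '{ total += $1 * $2 } END { printf "%.2f\n", total }')

echo "$net"
```

The sale and the refund cancel out, so the net for PofEAA in the sample data is zero.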

Using cut is helpful in tracking down problems, but if you're generating an output file you'll often want something more complicated. Let's assume that accounting also needs to know the order ids for building some type of reference documentation. We can get the information using cut, but the accounting team wants the order id to be at the end of the line, and surrounded in single quotes. (for the record, you might be able to do this with cut, I've never tried)

sed
sed - A stream editor. A stream editor is used to perform basic text transformations on an input stream.
The following example shows how we can use sed to transform our lines in the requested way, and then cut is used to remove unnecessary data.
jfields$ cat order.* | sort | grep Patterns \
>| sed s/"[0-9\:]* \([0-9]*\)\, \(.*\)"/"\2, '\1'"/
1, Patterns of Enterprise Architecture, Kindle edition, 39.99, '111'
-1, Patterns of Enterprise Architecture, Kindle edition, 39.99, '113'
jfields$ cat order.* | sort | grep Patterns \
>| sed s/"[0-9\:]* \([0-9]*\)\, \(.*\)"/"\2, '\1'"/ | cut -d"," -f1,4,5
1, 39.99, '111'
-1, 39.99, '113'
There's a bit going on in that example regex, but nothing too complicated. The regex does the following things
  • remove the timestamp
  • capture the order number
  • remove the comma and space after the order number
  • capture the remainder of the line
There's a bit of noise in there (quotes and slashes), but that's to be expected when you're working on the command line.

Once we've captured the data we need, we can use \1 & \2 to reorder and output the data in our desired format. We also include the requested single quotes, and add our own comma to keep our format consistent. Finally, we use cut to remove the superfluous data.
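
If the backslashes bother you, the same substitution can be sketched with sed -E, which switches to extended regexes so the capture groups are written as plain parentheses. This is an alternative spelling, not the command used above, and it assumes your sed accepts -E (GNU and BSD sed both do):

```shell
line="8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99"

# -E enables extended regexes, so groups are ( ) instead of \( \).
# Group 1 is the order id, group 2 is the rest of the line after ", ".
moved=$(printf '%s\n' "$line" | sed -E "s/^[0-9:]+ ([0-9]+), (.*)/\2, '\1'/")

echo "$moved"
```

Quoting the whole expression once in double quotes also lets the single quotes around the order id pass through untouched.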

Now you're in trouble. You've demonstrated that you can slice up a log file in fairly short order, and the CIO needs a quick report of the total number of book transactions broken down by book.

uniq
uniq - removes duplicate lines from a sorted file
(we'll assume that other types of transactions can take place, so we 'filter' our out file for 'Kindle' and 'Hardcover')

The following example shows how to grep for only book related transactions, cut unnecessary information, and get a counted & unique list of each line.
jfields$ cat order.out.log | grep "\(Kindle\|Hardcover\)" | cut -d"," -f3 | sort | uniq -c
   1  Joy of Clojure
   2  Patterns of Enterprise Architecture
Had the requirements been a bit simpler, say "get me a list of all books with transactions", uniq also would have been the answer.
jfields$ cat order.out.log | grep "\(Kindle\|Hardcover\)" | cut -d"," -f3 | sort | uniq
 Joy of Clojure
 Patterns of Enterprise Architecture
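
The sort before uniq in both commands matters: uniq only collapses duplicate lines that are next to each other. A minimal sketch, using the three titles in the order they appear in order.out.log (leading spaces trimmed for readability):

```shell
# Titles in file order: the two PofEAA lines are separated by Joy of Clojure.
titles=$(printf 'Patterns of Enterprise Architecture\nJoy of Clojure\nPatterns of Enterprise Architecture\n')

# Without sort, the duplicates are not adjacent, so uniq keeps all three lines.
unsorted_count=$(printf '%s\n' "$titles" | uniq | wc -l | tr -d ' ')

# With sort first, the duplicates sit next to each other and collapse into one.
sorted_count=$(printf '%s\n' "$titles" | sort | uniq | wc -l | tr -d ' ')

echo "without sort: $unsorted_count lines, with sort: $sorted_count lines"
```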
All of these tricks work well, if you know where to find the file you need; however, sometimes you'll find yourself in a deeply nested directory structure without any hints as to where you need to go. If you're lucky enough to know the name of the file you need (or you have a decent guess) you shouldn't have any trouble finding what you need.

find
find - search for files in a directory hierarchy
In our above examples we've been working with order.in.log and order.out.log. On my box those files exist in my home directory. The following example shows how to find those files from a higher level, without even knowing the full filename.
jfields$ find /Users -name "order*"
/Users/jfields/order.in.log
/Users/jfields/order.out.log
Find has plenty of other options, but this does the trick for me about 99% of the time.
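
Two small things make that command behave predictably: quoting the pattern, so find (not the shell) expands the wildcard, and optionally adding -type f to skip directories. Here is a sketch against a hypothetical directory tree built in a scratch location; every file name other than the two order logs is made up:

```shell
# Build a small hypothetical tree in a scratch directory (assumption: mktemp exists).
root=$(mktemp -d)
mkdir -p "$root/jfields/logs/archive"
touch "$root/jfields/order.in.log" "$root/jfields/order.out.log"
touch "$root/jfields/logs/archive/orders-2012.txt" "$root/jfields/notes.txt"

# The quoted pattern is matched against each file name at any depth.
# -type f limits the results to regular files.
found=$(find "$root" -type f -name "order*" | sort)

printf '%s\n' "$found"
```

Note that "order*" is a prefix match on the file name, so orders-2012.txt in the nested archive directory is found as well.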

Along the same lines, once you find a file you need, you're not always going to know what's in it and how you want to slice it up. Piping the output to standard out works fine when the output is short; however, when there's a bit more data than what fits on a screen, you'll probably want to pipe the output to less.

less
less - allows forward & backward movement within a file
As an example, let's go all the way back to our simple cat | sort example. If you execute the following command you'll end up in less, with your in & out logs merged and sorted. Within less you can forward search with "/" and backward search with "?". Both searches take a regex.
jfields$ cat order* | sort | less
While in less you can try /113.*, which will highlight all transactions for order 113. You can also try ?.*112, which will highlight all timestamps associated with order 112. Finally, you can use 'q' to quit less.

The Linux command line is rich, and somewhat intimidating. However, with the previous 8 commands, you should be able to get quite a few log slicing tasks completed - without having to drop to your favorite scripting language.

* okay, possibly Unix, that's not the point

32 comments:

  1. Andrew 9:15 AM

    I far prefer tac to cat, it's cat reversed, and starts at the bottom of the file vs. the top.

  2. Another one I use all the time is "tail", especially "tail -f" to watch a log file.

  3. Mike, I used to use tail, but I tend to use less and shift+f to basically tail within less.

  4. I would add xargs as the 9th command you should know.
    It allows you to complete the story:
    $ find /Users -name "order*" | xargs cat | grep "\(Kindle\|Hardcover\)" | cut -d"," -f3 | sort | uniq

  5. sort | uniq is a useless pipe operation.

    sort -u already covers that.

  6. @Uri Nativ xargs is cool, but in that case you don't need it, because find has a command-line option called -exec

    $ find /Users -name "order*" -exec cat {} \; | grep "\(Kindle\|Hardcover\)" | cut -d"," -f3 | sort | uniq

  7. This comment has been removed by the author.

  8. @Martin Holzhauser Also, GNU find has '-exec [command] {} +', if you don't want [command] executed for each result of find (e.g. xargs-like command iteration).

    @Jay Fields I also switched from tail to less. Note, you can 'less +F ' on the command line to start in 'follow' (shell alias, etc.).

    (HTML tags killed my last comment.. no comment editing w/ blogger.com?)

  9. Anonymous 7:27 AM

    Why do 'sort | uniq'. Just do 'sort -u'

  10. sort | uniq

    may be superfluous, but

    sort | uniq -c

    is definitely not. (Granted, not what he posted, but it is a pattern I use often.)

    whatever | sort | uniq -c | sort -rn

    for a listing of "most often occurring string" to "least often"

  11. Carlo Barbieri 8:31 AM

    I find ack to work much much better than any combination of find cat and grep. Try it out!

  12. $ find /Users -name "order*" -exec grep "\(Kindle\|Hardcover\)" {} \; | cut -d"," -f3 | sort | uniq

    Useless Use Of Cat! (UUOC)

  13. Very useful for someone trying to get into Linux for developing, coming from Windows. This is effectively, awesome. Thanks!

  14. Don't forget: xargs, nm, ldd, strings, awk, strace, file

  15. The best command is man: Type 'man man' to learn how to use it and 'man -k keyword' to search for commands that you need.

  16. I'd add awk to this list, as well. The grep + sed + awk trio has been a durable cohort over the decades. What does awk bring to the table? Where sed's forte is pattern substitution, awk takes over for doing extraction and transforming of structured data (e.g. adding a line at the end of a report with the sum total, or transposing two columns). Awk also adds functions and some data types. You'd think that once you need a data type you should just go to Ruby or Python, but you'll be pleasantly surprised at how much you can get done quickly with awk.

  17. Thanks Rambo,
    Chelimsky is a big fan of awk as well, so I'm sure I'll get plenty of experience with it over the next year.

  18. @Martin Holzhauer, thanks for the tip. The nice thing with xargs is that its syntax is just easier to memorize :)

  19. Anonymous 2:32 AM

    I don't know - my favourite command on Linux is "python", which is an awesome substitute for all of these commands ...

  20. Anonymous 3:20 AM

    All these commands are ugly.

    - Yves

  21. Anonymous 8:20 AM

    These commands are nice, but if you use them often, you need many more options. For example with uniq, I have one with the next set of options, so you can count and do uniq over only a certain part of the strings:

    ? : show this text and exit
    n : Compare only first n characters or words
    Bstring : Block: ignore text from (hex) string at any line
    C : Compress white space before compare
    E : separator for /SS modes is empty line i.o. "------"
    F : Give First line of equal lines (default)
    FF : as above, prepend a line count
    FFF : as above, add linecounts that are in input
    FFFF : as above, but give the number of lines too
    H : Human: flush output often (slower but more interactive)
    I : Ignore case. Note: consider /Yc to control accented chars
    K : Keep trailing spaces and tabs (default: strip)
    L : Give Last line of equal lines instead of first one
    LL : as above, prepend a line count
    LLL : as above, add linecounts that are in input
    Example:
    sortm file1 | uniq /ll > temp.u
    sortm file2 | uniq /ll >> temp.u
    (add more sortm | uniq operations if wanted)
    sortm /wb2 temp.u | uniq /lll > resultfile
    LLLL : as above, but give the number of lines too
    M : give info Message at exit
    N[w] : add line Numbers [w chars wide, w=1..9 [6]]
    NOTE: /N output is not compatible for /FFF or /LLL
    O : output Only first line of equal lines (use with /S)
    P[w] : give Position number in /S modes [w chars wide, w=1..9 [4]]
    Qn : Text between Quotes is seen as a word
    n=0: none; 1: single; 2: double; 3: both quotes
    QQn : As Qn, but quote doubling assumed
    R : Reverse: output lines that would be skipped
    S : output lines that compare the Same
    SS : as above, but Separate the compare regions
    SSS : as above, but give also unique lines
    T : keep Tabs (default: convert to spaces)
    U : output only lines that are unique
    W : compare first words instead of chars
    Yc : Extended upper/lower case chars: c= I:IBM; W:Win/ISO8859; N:None [N]
    Z : Zombie: Output numbers from /ll modes in parsable format

  22. Two of my favorites not listed here are vi/vim and md5sum. md5sum is a great time saver when you need to compare a large group of files. This command - md5sum *|sort|less puts all of the identical files in a directory together regardless of name.

  23. Thanks for this list! Since it's getting popular, I think it would be best to mention that running grep first, and then sort, will generally be faster, especially for very large files.
    (Note that if you grep first, though, some of grep options like -C will not make sense)

  24. Anonymous 11:27 AM

    diff, cmp, cp -a, ls -altr, locate,

  25. I am not aware of a distribution of Linux which uses /Users for home directories.

  26. If looking at log files while they are being written, tee is good.

  27. Anonymous 8:52 AM

    I was expecting make, gcc, strace, gdb, ltrace, but these are all basics that are, indeed, a must to know.

  28. You can't always install emacs. vi/vim are essential tools, even if you prefer to use another editor.

  29. I like your list, and would also add vi/vim to it. Sometimes you cannot always install your favorite editor, and vi works, albeit awkwardly if you don't know it well, when you are in a pinch.

  30. The cat utility is mostly useless, as you can always pass its arguments to the next utility in the pipe.

  31. Using xargs in combination with find is far preferable to using the -exec arg of find. The syntax after the -exec arg is ugly and awkward, whereas xargs brings elegance and simplicity.

    But xargs isn't only useful with find. Any time you need to convert a list of data in the pipe stream (files, strings, whatever) into arguments for a command later in the pipe stream, xargs is the tool to do it.

    Let's use find/xargs anyway. Let's say, for example, you have a bunch of log files you no longer want, log files with "foo" in the name of the file, and they're all under your current directory. The following will get rid of those files:

    find . -type f | egrep 'foo.*log$' | xargs rm

    What if you want to remove a bunch of log files where "foo" is not in the name of the file, but rather in the contents of the file? We use xargs twice in the pipe stream to accomplish that, along with an example of awk:

    find . -type f -name "*.log" | xargs egrep foo | awk -F: '{print $1}' | sort -u | xargs rm

    The commands grep and egrep each have 2 ways of operating on data. The first case is where they're receiving string data from stdin:

    ... | egrep pattern

    The second case is where they're given a set of filenames to search through:

    egrep pattern filename [filename...]

    In the first find example above, the list of filenames which were piped into egrep came through stdin, and they were thus treated as strings. In the second find example above, the list of filenames which were piped into xargs egrep got treated as arguments to egrep (because of the magic of xargs), and therefore the filenames represented the actual files, and their contents were searched.

    I love your blog, Jay Fields!

  32. Anonymous 8:38 AM

    Excellent introduction! Thanks a lot.

