Tuesday, August 28, 2012

8 Linux Commands Every Developer Should Know

Every developer, at some point in their career, will find themselves looking for some information on a Linux* box. I don't claim to be an expert, in fact, I claim to be very under-skilled when it comes to linux command line mastery. However, with the following 8 commands I can get pretty much anything I need, done.

note: There are extensive documents on each of the following commands. This blog post is not meant to show the exhaustive features of any of the commands. Instead, this is a blog post that shows my most common usages of my most commonly used commands. If you don't know linux commands well, and you find yourself needing to grab some data, this blog post might give you a bit of guidance.

Let's start with some sample documents. Let's assume that I have 2 files showing orders that are being placed with a third party and the responses the third party sends.
order.out.log
8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:23:45 112, 1, Joy of Clojure, Hardcover, 29.99
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99

order.in.log
8:22:20 111, Order Complete
8:23:50 112, Order sent to fulfillment
8:24:20 113, Refund sent to processing
cat
cat - concatenate files and print on the standard output
The cat command is simple, as the following example shows.
jfields$ cat order.out.log 
8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:23:45 112, 1, Joy of Clojure, Hardcover, 29.99
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99
As the description shows, you can also use it to concatenate multiple files.
jfields$ cat order.* 
8:22:20 111, Order Complete
8:23:50 112, Order sent to fulfillment
8:24:20 113, Refund sent to processing
8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:23:45 112, 1, Joy of Clojure, Hardcover, 29.99
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99
If I wanted to view my log files I can concatenate them and print them to standard out, as the example above shows. That's cool, but things could be a bit more readable.

sort
sort - sort lines of text files
Using sort is an obvious choice here.
jfields$ cat order.* | sort
8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:22:20 111, Order Complete
8:23:45 112, 1, Joy of Clojure, Hardcover, 29.99
8:23:50 112, Order sent to fulfillment
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:24:20 113, Refund sent to processing
As the example above shows, my data is now sorted. With small sample files, you can probably deal with reading the entire file. However, any real production log is likely to have plenty of lines that you don't care about. You're going to want a way to filter the results of piping cat to sort.

grep
grep, egrep, fgrep - print lines matching a pattern
Let's pretend that I only care about finding an order for PofEAA. Using grep I can limit my results to PofEAA transactions.
jfields$ cat order.* | sort | grep Patterns
8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99
Assume that an issue occurred with the refund on order 113, and you want to see all data related to that order - grep is your friend again.
jfields$ cat order.* | sort | grep ":\d\d 113, "
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:24:20 113, Refund sent to processing
You'll notice that I put a bit more than "113" in my regex for grep. This is because 113 can also come up in a product title or a price. With a few extra characters, I can limit the results to strictly the transactions I'm looking for.

Now that we've sent the order details on to refunds, we also want to send the daily totals of sales and refunds on to the accounting team. They've asked for each line item for PofEAA, but they only care about the quantity and price. What we need to do is cut out everything we don't care about.

cut
cut - remove sections from each line of files
Using grep again, we can see that we grab the appropriate lines. Once we grab what we need, we can cut the line up into pieces, and rid ourselves of the unnecessary data.
jfields$ cat order.* | sort | grep Patterns
8:22:19 111, 1, Patterns of Enterprise Architecture, Kindle edition, 39.99
8:24:19 113, -1, Patterns of Enterprise Architecture, Kindle edition, 39.99
jfields$ cat order.* | sort | grep Patterns | cut -d"," -f2,5
 1, 39.99
 -1, 39.99
At this point we've reduced our data down to what accounting is looking for, so it's time to paste it into a spreadsheet and be done with that task.

Using cut is helpful in tracking down problems, but if you're generating an output file you'll often want something more complicated. Let's assume that accounting also needs to know the order ids for building some type of reference documentation. We can get the information using cut, but the accounting team wants the order id to be at the end of the line, and surrounded in single quotes. (for the record, you might be able to do this with cut, I've never tried)

sed
sed - A stream editor. A stream editor is used to perform basic text transformations on an input stream.
The following example shows how we can use sed to transform our lines in the requested way, and then cut is used to remove unnecessary data.
jfields$ cat order.* | sort | grep Patterns \
>| sed s/"[0-9\:]* \([0-9]*\)\, \(.*\)"/"\2, '\1'"/
1, Patterns of Enterprise Architecture, Kindle edition, 39.99, '111'
-1, Patterns of Enterprise Architecture, Kindle edition, 39.99, '113'
lmp-jfields01:~ jfields$ cat order.* | sort | grep Patterns \
>| sed s/"[0-9\:]* \([0-9]*\)\, \(.*\)"/"\2, '\1'"/ | cut -d"," -f1,4,5
1, 39.99, '111'
-1, 39.99, '113'
There's a bit going on in that example regex, but nothing too complicated. The regex does the following things
  • remove the timestamp
  • capture the order number
  • remove the comma and space after the order number
  • capture the remainder of the line
There's a bit of noise in there (quotes and slashes), but that's to be expected when you're working on the command line.

Once we've captured the data we need, we can use \1 & \2 to reorder and output the data in our desired format. We also include the requested double quotes, and add our own comma to keep our format consistent. Finally, we use cut to remove the superfluous data.

Now you're in trouble. You've demonstrated that you can slice up a log file in fairly short order, and the CIO needs a quick report of the total number of book transactions broken down by book.

uniq
uniq - removes duplicate lines from a uniqed file
(we'll assume that other types of transactions can take place and 'filter' our in file for 'Kindle' and 'Hardcover')

The following example shows how to grep for only book related transactions, cut unnecessary information, and get a counted & unique list of each line.
jfields$ cat order.out.log | grep "\(Kindle\|Hardcover\)" | cut -d"," -f3 | sort | uniq -c
   1  Joy of Clojure
   2  Patterns of Enterprise Architecture
Had the requirements been a bit simpler, say "get me a list of all books with transactions", uniq also would have been the answer.
jfields$ cat order.out.log | grep "\(Kindle\|Hardcover\)" | cut -d"," -f3 | sort | uniq
 Joy of Clojure
 Patterns of Enterprise Architecture
All of these tricks work well, if you know where to find the file you need; however, sometimes you'll find yourself in a deeply nested directory structure without any hints as to where you need to go. If you're lucky enough to know the name of the file you need (or you have a decent guess) you shouldn't have any trouble finding what you need.

find
find - search for files in a directory hierarchy
In our above examples we've been working with order.in.log and order.out.log. On my box those files exist in my home directory. The following example shows how to find those files from a higher level, without even knowing the full filename.
jfields$ find /Users -name "order*"
Users/jfields/order.in.log
Users/jfields/order.out.log
Find has plenty of other options, but this does the trick for me about 99% of the time.

Along the same lines, once you find a file you need, you're not always going to know what's in it and how you want to slice it up. Piping the output to standard out works fine when the output is short; however, when there's a bit more data than what fits on a screen, you'll probably want to pipe the output to less.

less
less - allows forward & backward movement within a file
As an example, let's go all the way back to our simple cat | sort example. If you execute the following command you'll end up in less, with your in & out logs merged and sorted. Within less you can forward search with "/" and backward search with "?". Both searches take a regex.
jfields$ cat order* | sort | less
While in less you can try /113.*, which will highlight all transactions for order 113. You can also try ?.*112, which will highlight all timestamps associated with order 112. Finally, you can use 'q' to quit less.

The linux command line is rich, and someone intimidating. However, with the previous 8 commands, you should be able to get quite a few log slicing tasks completed - without having to drop to your favorite scripting language.

* okay, possibly Unix, that's not the point