Wednesday, March 4, 2009

Challenge: unix command lines and quoting

Here's the first straight-to-blog challenge! Post your solutions and questions in the comments.

Misunderstandings about how the shell treats a command cause a lot of pain for unix users. That's a shame, since the rules are pretty easy to learn. Here's some basic command line anatomy.

You can think of a unix command as a series of words separated by one or more whitespace characters. So these commands are all equivalent:

ls /tmp /etc

ls       /tmp /etc

     ls /tmp      /etc

The first word in a command is the program to run. Well, it's usually the first word, unless you've decided to set one or more variables first:

FOO=hello echo $FOO

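But that's a more advanced thing to do, and is rarely used. (It's also subtler than it looks: the shell expands $FOO before the assignment takes effect, so the command above prints a blank line unless FOO was already set. The assignment only changes the environment of the program being run.)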
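Here's a small sketch of that subtlety, assuming FOO isn't set to anything beforehand. It borrows two things that haven't been covered yet: $( ) captures a command's output, and sh -c runs a second shell that reads FOO from its own environment. The names early and late are just made up for the illustration.

```shell
unset FOO

# The shell expands $FOO before the assignment applies, so echo gets no argument.
early=$(FOO=hello echo $FOO)

# The child shell finds FOO in its environment and expands it itself.
late=$(FOO=hello sh -c 'echo "$FOO"')

echo "early=[$early] late=[$late]"    # prints: early=[] late=[hello]
```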

The command to run is usually a compiled binary found somewhere in your PATH variable. That is, when you type ls, the shell looks through each of the directories in your PATH for a program called ls. The "which" program lets you specify a command and tells you which directory of your path it's found in. Simple challenge: figure out where ls and gimp live on your system.

(Trivia: The command might also be a "shell built-in". For example, although there exists a binary /bin/echo, bash has its own copy of echo inside it, and when you run echo, it does what you wanted directly, instead of running the program in /bin for you. You can see there's a difference by running "echo --help" and then "/bin/echo --help")
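If you're curious whether a given command is a built-in, the shell's own "type" command will tell you. This is only a sketch: the exact wording and the path reported for ls vary from system to system.

```shell
type echo    # in bash: "echo is a shell builtin"
type ls      # usually a path such as /bin/ls, depending on your system
```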

If you "echo $PATH", you'll see that your path probably doesn't include the directory called ".", which means "the directory where I happen to be right now." If you used DOS back in the olden days, you probably got used to having the current directory in your path, since you could do things like "cd c:\wp" then "wp.exe" to start WordPerfect.

Unix doesn't include . in your path for a very good reason: many unix machines are shared between people with different degrees of access. If you had "." as the first directory in your path and ran "ls" in somebody else's home directory, and that person had put a program called "ls" there, you'd run their copy of ls instead of the one you expected. That could allow them to take over your account, since you'd be running a command of their choosing with your permissions. Even if you put . at the end of your path, so that ls gets run from the normal place, some malicious person might put a command called, say, "la" in their home directory, in hopes that you might mistype "ls".

Therefore, whenever you want to run a script or program you've written yourself, you need to tell the shell where to find it, even if it's in the current directory. That way, you're being explicit about whether to run a system command or something extra you've written or downloaded.

So, let's create a very simple script: in your favorite text editor, create a file in your home directory called hello.sh that contains simply:

echo hi

Then run "cd" by itself to change to your home directory. Use "chmod 755 hello.sh" to make it executable (more on that in a future challenge), then run it from the current directory with "./hello.sh"

The ./ tells the shell to find it in the current directory. Another common instance in which people specify the full path to a program is when starting or stopping system services. The directory /etc/init.d contains scripts for starting and stopping most of the things on your system like the graphical environment (/etc/init.d/gdm) and the ssh server (/etc/init.d/ssh).
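To see the effect of ./ for yourself without touching your home directory, here's a sketch that does the same thing in a scratch directory. It assumes your PATH doesn't contain "." and that there's no other hello.sh anywhere on your PATH; the variable names bare and explicit are made up for the illustration.

```shell
cd "$(mktemp -d)"          # scratch directory made just for this experiment
echo echo hi > hello.sh    # writes the two words "echo hi" into hello.sh
chmod 755 hello.sh

# Without ./ the shell only searches PATH, so the script isn't found.
bare=$(hello.sh 2>/dev/null || echo "not found")

# With ./ the shell is told exactly where the script lives.
explicit=$(./hello.sh)

echo "bare: $bare"
echo "explicit: $explicit"
```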

Once you've specified the program to run, the remaining words are passed into the program as arguments. If you change your hello.sh to "echo $1" instead of "echo hi", the special variable $1 will get replaced with the first argument you call it with:

$ ./hello.sh foo
foo
$ ./hello.sh foo bar
foo

Likewise, $2 through $9 get set to the further arguments that get passed. There's also $*, which expands to all the arguments passed in. The name of the program itself even gets passed in as $0, and some programs take advantage of that: if you rename them or create symlinks to them with particular names, they behave differently.
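Here's a rough sketch that prints a few of these special variables. The script name args.sh is made up, and the "cat > file <<'EOF'" construct is just a way to create the file without opening an editor. I've also added a #!/bin/sh first line, which hello.sh doesn't have, so that the script runs the same way under any shell.

```shell
cd "$(mktemp -d)"    # scratch directory

# Everything up to the line containing only EOF is written to args.sh as-is.
cat > args.sh <<'EOF'
#!/bin/sh
echo "name: $0"
echo "first: $1"
echo "all: $*"
EOF
chmod 755 args.sh

./args.sh foo bar baz
# name: ./args.sh
# first: foo
# all: foo bar baz
```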

Mini challenge: change hello.sh so that it works like this:

$ ./hello.sh Archibald slippery
Hello Archibald, its nice to meet you on this slippery day.

What fun! But what if you wanted to list "Archibald Q. Wentsocket" as the person to greet? This is where quoting comes in. Putting double quotes (") or single quotes (') around a group of words joins them together into a single argument:

$ ./hello.sh "Archibald Q. Wentsocket" goatlike
Hello Archibald Q. Wentsocket, its nice to meet you on this goatlike day.
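One way to convince yourself that the quotes really do change the number of arguments is to count them. This sketch uses a made-up shell function called count_args and the special variable $#, which holds the number of arguments passed in (neither appears elsewhere in this lesson).

```shell
# Hypothetical helper: prints how many arguments it was called with.
count_args() {
    echo $#
}

unquoted=$(count_args Archibald Q. Wentsocket goatlike)
quoted=$(count_args "Archibald Q. Wentsocket" goatlike)

echo "unquoted: $unquoted"    # unquoted: 4
echo "quoted: $quoted"        # quoted: 2
```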

What's the difference between those, you ask? Well, before the shell executes your command, it does variable and wildcard (or "glob") expansion on the command line:

$ NAME=Archibald
$ ADJECTIVE=succulent
$ ./hello.sh $NAME $ADJECTIVE
Hello Archibald, its nice to meet you on this succulent day.

Wildcard expansion allows you to use * to fill in for any number of letters in a filename. Here the shell finds only one file in the current directory starting with he and ending with .sh, so it substitutes he*.sh with hello.sh before running the command:

$ echo he*.sh
hello.sh

$ ./hello.sh he*.sh
Hello hello.sh, its nice to meet you on this day.

If you ever find yourself on a unix system so broken that even ls doesn't work, you can use this to your advantage: instead of "ls" to list the files in the current directory, use "echo *".

You can also use ? to substitute for a single character in a filename: he??o.sh would match hello.sh, heplo.sh, and he37o.sh.
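To play with * and ? safely, you can make a few empty files in a scratch directory and let echo show you what each pattern expands to. The file names below are the ones from the paragraph above, plus a made-up help.txt for contrast.

```shell
cd "$(mktemp -d)"    # scratch directory
touch hello.sh heplo.sh he37o.sh help.txt

echo he*.sh      # all three .sh files
echo he??o.sh    # the same three: each has exactly two characters between he and o.sh
echo he?p*       # help.txt only
```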

When you use " to quote an argument, any variables inside the quoted string still get expanded like normal, but wildcards do not. Try these:

$ ./hello.sh "$NAME $ADJECTIVE"
$ ./hello.sh "you simpering idiot. This script is called he??o.sh, and also" foolish

When you use single quotes, the quoted string is taken /literally/, with no expansion. Replace the " with ' in the above and see what happens.
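Here's the same idea boiled down to plain echo, as a sketch for how the two kinds of quotes treat a variable. The names double and single are made up for the illustration.

```shell
NAME=Archibald

double=$(echo "Hello $NAME")    # the variable is expanded inside double quotes
single=$(echo 'Hello $NAME')    # single quotes keep the text exactly as typed

echo "$double"    # Hello Archibald
echo "$single"    # Hello $NAME
```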

There's another way to quote things: the backslash. Putting \ in front of a character makes the shell treat that one character literally. It's useful when you want a " or ' to appear in an argument, or when you only want to "escape" a single character:

$ ./hello.sh \$NAME $ADJECTIVE
$ ./hello.sh he\?\?o.sh intransigent
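A quick sketch of escaping single characters, using plain echo so it doesn't depend on how you wrote hello.sh. The variable names are made up for the illustration.

```shell
NAME=Archibald

escaped=$(echo \$NAME is $NAME)                # only the first $ is escaped
apostrophe=$(echo it\'s nice to meet you)      # a literal ' in the output
quote=$(echo say \"hi\")                       # literal " characters in the output

echo "$escaped"       # $NAME is Archibald
echo "$apostrophe"    # it's nice to meet you
echo "$quote"         # say "hi"
```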

Bonus challenge: (this requires some of the skills from prior challenges that haven't been posted on the blog yet) Write a shell script that does different things depending on the name of the script. Create symlinks to the script to exploit these behaviors. Now create a hard link to the script (also using ln), and explain how a hard link differs from a symlink.

Find needles in haystacks

Leftover from last time: create a file with 10 megabytes of random data using dd. (Hint: use the bs option and count=1)

Then, download http://lunkwill.org/random-bytes, which is 10MB of random data I created using dd. I've hidden a plain text message inside that's at least 20 characters long. Good luck finding it with just a text editor. Use the 'strings' command to find it for you. (You may also want to use wget to download the file instead of your normal web browser.)

Now use od |head to print out the data in the file in a more readable format. od -x is the most common way to use it, since it prints out the values (except for the address in the first column) in hex instead of octal. Also, od -c will show you the printable characters printably.
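If od's output looks cryptic on a big file, it can help to try it on a tiny input first. This is just a sketch using printf to produce three known bytes; I've left out the expected output for od -x because the way bytes are paired up depends on your machine.

```shell
# Three bytes: h, i, and a newline.
chars=$(printf 'hi\n' | od -c | head -n 1)
echo "$chars"             # the offset column, then h, i and \n shown as characters

printf 'hi\n' | od -x     # the same bytes as two-byte hex values
```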

Tricky part: use od, grep and wc to count the number of FF bytes (bytes with decimal value 255) in the file.

Hint: use --format to print out single bytes in hex, --width to print out only one byte per line, and check out the -v and -A options.

Today I used od -x with a serial remote control receiver, so that I could see what bytes were output when I pressed various buttons. I used minicom to set /dev/ttyUSB0 to 1200 baud, 8N1, then did control-a then q to exit without resetting the port settings. Then od -x /dev/ttyUSB0 showed me what it heard from the port (at least, after it had heard a few lines worth of data).

Challenge #1: Waste disk space

The other day, I gave an old hard drive to my cousin so he can build a linux machine. I wanted to erase the disk first, but it was taking too long, so I told him that once he gets unix installed, I'd show him how to use dd to blank the rest of the disk.

So that's the challenge: create a file full of nulls (zeros) that uses up all the available space on the local disk (not the netfiler). As "df" will show you, /tmp will be on the local disk.

As yourself (not as root), use dd to read from /dev/zero and write to a file in /tmp until it runs out of space and dies. Then delete it before your system starts freaking out.
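If you'd like to warm up before filling a whole disk, here's a deliberately tiny sketch that stops after 4 kilobytes because of the count option. The scratch file comes from mktemp rather than a name you pick in /tmp, and the variable names are made up. The real challenge is working out what to change so that dd keeps going until the disk is full.

```shell
scratch=$(mktemp)    # a throwaway file

# Read zeros and write 4 blocks of 1024 bytes each; dd's statistics are discarded.
dd if=/dev/zero of="$scratch" bs=1024 count=4 2>/dev/null

size=$(wc -c < "$scratch")
echo "wrote $size bytes"    # 4096 bytes

rm "$scratch"
```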

For extra credit, fill it with random data and tell us how it compares with /dev/zero for speed and CPU usage (xosview and top can help with this). Can you get dd to tell you its current write speed on demand using "kill"?

Maybe these challenges should all come with an implicit spoiler warning: post your solution once you get it, and if you haven't got it, don't peek at everybody else's answers.

Welcome to the challenge!

Some of my coworkers were interested in learning unix, and so we created an internal mailing list where I could post regular, short lessons on unix along with simple challenges to apply what they learned. It worked out well enough that I wanted to make it available to my friends and family outside of work.

Since the challenges tend to build on one another, it's probably best to start from the beginning and solve the challenges before moving on to the later ones.

And let me know if I'm going too fast, but also poke around on the internet to find answers to things I don't explicitly explain. A big part of a new skill like this is knowing where to quickly get answers to your questions, so I'll try to give you the vocabulary for what to search for in the challenges.