Wednesday, March 4, 2009

Challenge: unix command lines and quoting

Here's the first straight-to-blog challenge! Post your solutions and questions in the comments.

Misunderstandings about how the shell treats a command cause a lot of pain for unix users. That's a shame, since the rules are pretty easy to learn. Here's some basic command line anatomy.

You can think of a unix command as a series of words separated by one or more whitespace characters. So these commands are all equivalent:

ls /tmp /etc

ls       /tmp /etc

     ls /tmp      /etc
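
To see the word splitting for yourself, here's a minimal sketch. count_args is a made-up helper (not a standard command) that just reports how many words the shell handed it:

```shell
# count_args is a hypothetical helper: it prints the number of arguments it received.
count_args() {
    echo "$#"
}

count_args /tmp /etc
count_args       /tmp      /etc
```

Both calls print 2, since the extra whitespace between words is thrown away before the command ever runs.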

The first word in a command is the program to run. Well, it's usually the first word, unless you've decided to set one or more variables first:

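FOO=hello printenv FOO

(Careful: "FOO=hello echo $FOO" would not print hello. The shell replaces $FOO on the command line before the assignment takes effect, so echo never sees the new value. printenv, on the other hand, reads the variable from its own environment.) That's a more advanced thing to do, so don't worry about it for now.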
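
If you're curious how long such a variable lasts, here's a small sketch, assuming a POSIX-style shell. GREETING is a made-up variable name, and "sh -c" simply starts a child shell that reads the variable itself:

```shell
# GREETING is a hypothetical variable used only for illustration.
unset GREETING

# The assignment is placed in the environment of this one command only.
in_child=$(GREETING=hello sh -c 'echo "$GREETING"')
echo "child saw: $in_child"

# Afterwards, the variable is still unset in the current shell.
echo "parent sees: ${GREETING:-nothing}"
```

The child prints hello, while the parent shell never gets the variable at all.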

The command to run is usually a compiled binary found somewhere in your PATH variable. That is, when you type ls, the shell looks through each of the directories in your PATH for a program called ls. The "which" program lets you specify a command and tells you which directory of your path it's found in. Simple challenge: figure out where ls and gimp live on your system.
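
The lookup itself can be sketched in a few lines of shell. This is only an approximation of what the shell does internally, and find_in_path is a made-up name, not a real command:

```shell
# find_in_path is a hypothetical helper that walks PATH roughly the way the shell does.
# The body runs in a subshell (note the parentheses), so changing IFS does not leak out.
find_in_path() (
    name=$1
    set -f      # treat each PATH entry literally while splitting
    IFS=:       # PATH entries are separated by colons
    for dir in $PATH; do
        candidate="${dir:-.}/$name"    # an empty entry means the current directory
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            echo "$candidate"
            exit 0
        fi
    done
    exit 1
)

find_in_path sh
```

The first directory that contains an executable file with the right name wins, which is why the order of directories in PATH matters.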

(Trivia: The command might also be a "shell built-in". For example, although there exists a binary /bin/echo, bash has its own copy of echo inside it, and when you run echo, it does what you wanted directly, instead of running the program in /bin for you. You can see there's a difference by running "echo --help" and then "/bin/echo --help")
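
You can also ask the shell directly how it would treat a name. A quick sketch, assuming bash or another POSIX-style shell (the exact wording of the output varies between shells):

```shell
# "type" reports how the shell would interpret a command name.
type echo

# "command -v" prints just the name for a built-in, or a full path for a program on disk.
command -v echo
```

In bash, the first command reports that echo is a shell builtin.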

If you "echo $PATH", you'll see that your path probably doesn't include the directory called ".", which means "the directory where I happen to be right now." If you used DOS back in the olden days, you probably got used to having the current directory in your path, since you could do things like "cd c:\wp" then "wp.exe" to start WordPerfect. Unix leaves . out of your path for a very good reason: many unix machines are shared between people with different degrees of access. If you had "." as the first directory in your path and ran "ls" in somebody else's home directory, and that person had put a program called "ls" there, you'd be running their copy of ls instead of the one you expected. That could allow them to take over your account, since you'd be running a command of their choosing with your permissions. Even if you put . at the end of your path, so that ls gets run from the normal place, a malicious person might put a command called, say, "la" in their home directory, in hopes that you might mistype "ls".
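
If you'd like to check your own path for this, here's a rough sketch. path_has_dot is a made-up helper; it also treats an empty entry (two colons in a row, or a colon at either end) as the current directory, which is how POSIX shells read it:

```shell
# path_has_dot is a hypothetical helper: succeeds if the given path list
# contains "." or an empty entry, both of which mean the current directory.
path_has_dot() {
    case ":$1:" in
        *:.:*|*::*) return 0 ;;
        *)          return 1 ;;
    esac
}

if path_has_dot "$PATH"; then
    echo "Your PATH includes the current directory"
else
    echo "Your PATH does not include the current directory"
fi
```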

Therefore, whenever you want to run a script or program you've written yourself, you need to tell the shell where to find it, even if it's in the current directory. That way, you're being explicit about whether to run a system command or something extra you've written or downloaded.

So, let's create a very simple script: in your favorite text editor, create a file in your home directory called hello.sh that contains simply:

echo hi

Then run "cd" by itself to change to your home directory. Use "chmod 755 hello.sh" to make it executable (more on that in a future challenge), then run it from the current directory with "./hello.sh"
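
Here are the same steps as one sketch you can paste into a terminal. It uses a throwaway directory from mktemp (my assumption, to avoid cluttering your home directory) and writes the file with printf instead of a text editor:

```shell
# Work in a scratch directory rather than the home directory.
workdir=$(mktemp -d)
cd "$workdir"

# Create the one-line script and make it executable.
printf 'echo hi\n' > hello.sh
chmod 755 hello.sh

# Run it from the current directory and show what it printed.
output=$(./hello.sh)
echo "$output"
```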

The ./ tells the shell to find it in the current directory. Another common instance in which people specify the full path to a program is when starting or stopping system services. The directory /etc/init.d contains scripts for starting and stopping most of the things on your system like the graphical environment (/etc/init.d/gdm) and the ssh server (/etc/init.d/ssh).

Once you've specified the program to run, the remaining words are passed into the program as arguments. If you change your hello.sh to "echo $1" instead of "echo hi", the special variable $1 will get replaced with the first argument you call it with:

$ ./hello.sh foo
foo
$ ./hello.sh foo bar
foo

Likewise, $2 through $9 get set to the further arguments that get passed. There's also $*, which expands to all the arguments passed in. The name of the program itself even gets passed in as $0, and some programs take advantage of that: if you rename them or create symlinks to them with particular names, they behave differently.
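
Shell functions receive arguments the same way scripts do, which makes them handy for experimenting. A minimal sketch; show_args is a made-up name, and $# (the number of arguments) is an extra not covered above:

```shell
# show_args is a hypothetical helper that reports the arguments it was given.
show_args() {
    echo "count: $#"
    echo "first: $1"
    echo "second: $2"
    echo "all: $*"
}

show_args foo bar baz
```

One difference from a script: inside a function, $0 still holds the name of the shell or script, not the name of the function.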

Mini challenge: change hello.sh so that it works like this:

$ ./hello.sh Archibald slippery
Hello Archibald, it's nice to meet you on this slippery day.

What fun! But what if you wanted to list "Archibald Q. Wentsocket" as the person to greet? This is where quoting comes in. Putting double quotes " or single quotes ' around a group of words groups them together into a single argument:

$ ./hello.sh "Archibald Q. Wentsocket" goatlike
Hello Archibald Q. Wentsocket, it's nice to meet you on this goatlike day.
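
To see exactly where the shell draws the line between arguments, here's a small sketch. print_args is a made-up helper, and it relies on "$@", which hands over each argument one at a time:

```shell
# print_args is a hypothetical helper: it prints each argument on its own line, in brackets.
print_args() {
    for arg in "$@"; do
        echo "[$arg]"
    done
}

# Without quotes, the name arrives as three separate arguments.
print_args Archibald Q. Wentsocket goatlike

# With quotes, the whole name is a single argument.
print_args "Archibald Q. Wentsocket" goatlike
```

The first call prints four bracketed lines, the second only two.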

What's the difference between double and single quotes, you ask? Well, before the shell executes your command, it does variable and wildcard (or "glob") expansion on the command line:

$ NAME=Archibald
$ ADJECTIVE=succulent
$ ./hello.sh $NAME $ADJECTIVE
Hello Archibald, it's nice to meet you on this succulent day.

Wildcard expansion allows you to use * to stand in for any number of characters (including none) in a filename. Here the shell finds only one file in the current directory starting with he and ending with .sh, so it substitutes he*.sh with hello.sh before running the command:

$ echo he*.sh
hello.sh

$ ./hello.sh he*.sh
Hello hello.sh, it's nice to meet you on this day.

If you ever find yourself on a unix system so broken that even ls doesn't work, you can use this to your advantage: instead of "ls" to list the files in the current directory, use "echo *".

You can also use ? to substitute for a single character in a filename: he??o.sh would match hello.sh, heplo.sh, and he37o.sh.
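
Here's a sketch you can use to watch both wildcards at work. It creates a few empty files in a scratch directory; the extra names help.sh and notes.txt are my own additions for contrast:

```shell
# Set up a scratch directory with some empty files to match against.
workdir=$(mktemp -d)
cd "$workdir"
touch hello.sh heplo.sh he37o.sh help.sh notes.txt

# The shell replaces each pattern with the matching filenames before echo runs.
echo he*.sh
echo he??o.sh

# Count the matches by loading them into the positional parameters.
set -- he*.sh
star_count=$#
set -- he??o.sh
question_count=$#
echo "star matched $star_count files, question marks matched $question_count"
```

The * pattern picks up all four .sh files, while he??o.sh skips help.sh because that name is one character too short.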

When you use " to quote an argument, any variables inside the quoted string still get expanded as usual, but wildcards do not. Try these:

$ ./hello.sh "$NAME $ADJECTIVE"
$ ./hello.sh "you simpering idiot. This script is called he??o.sh, and also" foolish

When you use single quotes, the quoted string is taken /literally/, with no expansion. Replace the " with ' in the above and see what happens.
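
A minimal sketch of how the two kinds of quotes treat a variable, using a made-up variable PET rather than the ones above (so the exercise is still yours to try):

```shell
# PET is a hypothetical variable used only for illustration.
PET=ferret

double="I have a $PET"   # double quotes: the variable is expanded
single='I have a $PET'   # single quotes: the text is kept exactly as typed

echo "$double"
echo "$single"
```

The first echo prints "I have a ferret"; the second prints the dollar sign and the name PET untouched.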

There's another way to quote things: the backslash. It's useful when you want a " or ' to appear in an argument, or when you only want to "escape" a single character:

$ ./hello.sh \$NAME $ADJECTIVE
$ ./hello.sh he\?\?o.sh intransigent
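
A few more backslash escapes, as a sketch. The variable names here are made up; each one just holds a string so you can see what survived:

```shell
# A backslash before $ stops the variable from being expanded.
literal=\$HOME

# A backslash before ' lets an apostrophe appear outside of any quotes.
apostrophe=It\'s

# Inside double quotes, \" produces a literal double quote.
quoted="She said \"hi\""

echo "$literal"
echo "$apostrophe"
echo "$quoted"
```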

Bonus challenge: (this requires some of the skills from prior challenges that haven't been posted on the blog yet) Write a shell script that does different things depending on the name of the script. Create symlinks to the script to exploit these behaviors. Now create a hard link to the script (also using ln), and explain how a hard link differs from a symlink.
