Some Unix Tips
Here are some Unix commands and shortcuts that I have found useful,
especially when working with large files of data.
Some of these commands work only in the csh or tcsh shells. However,
those shells are the default on the Math Department computers, and
most likely you are set up to use these shells.
You can check this by doing
"finger login", where "login" is your login name.
Shell shortcuts
-
!!
Repeats the most recent command.
-
!t
Repeats the most recent command beginning with "t". Similarly for
other letters, or strings of initial letters in a command. This is
very handy when going through a compiling/running/editing cycle.
For example, if you work on a latex document, after giving
the full command names "latex file", "xdvi file", "dvips file", during
the first cycle, you can then repeat the steps using the shortcuts "!l",
"!x", "$d". (Be careful, though; if you type "!r" instead of "!t", you
may inadvertently repeat the last delete command.)
-
dvips !$
This executes dvips with the last argument of the most recent command.
For example, after doing "xdvi verylongfilename", the above
command will run the file "verylongfilename" through dvips. Similarly, after
editing a file with "vi verylongfilename", you can print out the file
by saying "lpr !$".
-
history
Gives a list of the most recent commands. Useful if you want to
repeat a command, but don't remember the exact form you used.
The number of commands saved in this way can be changed by adding
a line such as "set history = 1000" in the .cshrc file.
-
history | grep 'perl'
List all commands within the saved command history matching 'perl'.
-
!1729
Repeat command number 1729 in the history listing.
-
Up/down arrows
As an alternative to the exclamation mark commands, you can use
the up/down arrows to move back and forth in the command history and
repeat/edit a command.
Pipes and Redirection
One of the most useful features of Unix is the ability to "pipe" the
output of one command into another command, and to "redirect" the
output to a file.
Pipes are denoted by vertical bars (|); redirections are denoted by
"greater than" signs (>). Spaces around these symbols are not
required, though for clarity, you might want to surround the symbols
by spaces.
Redirection of output is a very simple concept: instead of displaying
its output on the screen, the program writes it to a file.
A Unix pipe works much like a plumber's pipe: it takes the output of
the command to the left of the pipe symbol (|) and uses this as the
input of the command to the right of the pipe symbol.
Multiple pipes can be stacked together.
Also, you can combine history shortcuts with pipes. For example,
if, after displaying a sorted file with the "sort" command,
you find that the file is too big to fit on one screen, the command
"!! | less" will redisplay the file, one screen at a time.
The following
examples illustrate the use of pipes. Most are
self-explanatory; "file" stands for a generic filename.
-
sort file | less
Sort the file and
display the sorted file, one page at a time. (If you prefer the
standard command "more", you can use this instead of "less".
"less" is an enhanced version of "more" - for example,
it allows you to move backwards
and forwards in the file; to exit "less", use "q".)
-
sort file | lpr
Sort the file and send the sorted file to the printer.
-
sort file > file.sorted
Sort the file and create a new file, "file.sorted", with the contents
of the sorted file.
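The difference between ">" and "|" can be seen in a minimal sketch (the
file name and its contents are invented):

```shell
# Make a small sample file:
printf 'pear\napple\nbanana\n' > fruit.txt
# Redirection: write the sorted output to a new file
sort fruit.txt > fruit.sorted
# Pipe: feed the sorted output into another command (here, a line count)
sort fruit.txt | wc -l
```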
Tools for analyzing data files
The following are some handy tools for analyzing data files.
-
sort: Sort files.
-
grep: Extract lines matching a given pattern from a file.
-
sed: Perform simple substitutions and other modifications
on lines of a file.
-
awk: Extract specific fields (columns) from a file (plus much
more).
-
perl: The ultimate geek tool.
Can do all of the above, plus much more.
The first three are part of any standard Unix distribution and come
with man pages that you can consult. Perl comes with multiple man
pages; there is also an elaborate online documentation system (accessed
with "perldoc") and an enormous amount of information available, both
online and in print.
See the separate
Perl Tips page for more.
While learning the full power of these tools takes time
and effort (years in the
case of Perl), it is easy to learn enough to be able to use
these utilities on the command line for simple data analysis tasks.
To illustrate this, assume you have a file of numbers, three per line,
separated by blanks (this is the most convenient format for the above
utilities), like the following:
123 398 17359
317 19 2909
39 -399 -5789
49 33 200
255 33 -378
Here is how you could accomplish various tasks using one of the
mentioned utilities. (Here "file" is assumed to be the filename.
Recall that you can save the output of each of these commands to a
file by appending "> file.out" to the command, or page through the
output by appending "| less".)
-
sort -n file
Sort the file by its first column. The "-n" option ensures a numeric
(as opposed to lexicographic) sort.
-
sort -k 2 -n file
Sort the file by second column. The "-k" option here denotes the
column used as sort key.
-
grep '33' file
Extract all lines containing the string "33"
(in the above example, lines 4 and 5).
-
grep -c '33' file
Same, but display only the number of matching lines (2 in the
example), not the lines themselves. This is useful for analyzing
large files of output data. For example, if a sequence of one
million integers is saved in a file, one per line, "grep -c '^0$' file"
will display the number of 0's in that sequence. (The "^" and "$"
anchor the pattern, so that lines merely containing a 0, such as "10",
are not counted.)
-
grep -c '-' *.out
The same kind of count, here of lines containing a minus sign (i.e.,
negative entries), applied to all files in the current directory
matching "*.out". For each file there is an output line of the form
"filename:x", where "x" is the number of matching lines in that file.
-
sort -n file | cat -n
Sort the file, then prepend line numbers to each line. This results in
the following:
1 39 -399 -5789
2 49 33 200
3 123 398 17359
4 255 33 -378
5 317 19 2909
This can be useful if you want to count the number of lines
in which the first entry is in a given range: simply subtract the
line numbers corresponding to the beginning and end of the range.
-
perl -lpe's/\s+/,/g' file
Replace any run of blank space by a comma. This converts the above file
to one in which the fields are separated by commas (a common
spreadsheet format):
123,398,17359
317,19,2909
39,-399,-5789
49,33,200
255,33,-378
-
sort -n file | uniq -c | sort -n
This counts the number of occurrences of each line, then displays the
distinct lines of the file,
along with the corresponding counts in increasing order.
This is useful for files containing multiple identical lines.
For example, suppose a
file consists of lines each containing a single number from a given
set, say {0,1,2}. Then the above command would show how many times
each of the numbers 0, 1, 2 occurs, and would display these counts in
increasing order.
-
perl -lane'print $F[1]' file
Print second column of the file.
The "e" option indicates that the string is to be interpreted as a
perl script. The "l" option ensures proper end-of-line handling. The
"a" option causes perl to autosplit each line into an array of fields
$F[0], $F[1], ..., with blank space acting as default field separator.
(Note that in Perl, array indices start at 0, so the first array element
has index 0.) The "n" option makes *not* printing the default, so that
only material specified by an explicit print command gets printed. If
"p" is specified instead of "n", each line gets printed after any
editing commands specified by "e" have been executed.
-
perl -lane'print $F[-1]' file
Print the last column of the file. In Perl, negative array indices
denote array elements counted from the right. Thus $F[-1] denotes the
last field (column), $F[-2] the second last, etc.
-
perl -lane'print "$F[2],$F[1]"' file
Print the second and third columns of the file in reverse order,
separated by a comma. (Note that a bare "print $F[2],$F[1]" would print
the two fields with no separator between them; hence the quoted string.
Something like this may be useful to generate a file that can be
imported into gnuplot, gp, etc.)
-
perl -pe 's/3/1/g' file
Replace each occurrence of the digit 3 in the file by 1. (The "g"
modifier stands for a global substitute/replace operation. Without it,
only the first occurrence in each line would be substituted.)
-
perl -i.bak -pe 's/3/1/g' file
The same, but with "in place" editing. The substitution is performed on
the file itself, and the original version of the file is saved onto
a file with extension ".bak". (Saving the original version onto a backup
file is a safety mechanism; the name of the backup file can be changed by
replacing the string ".bak" by something else. If no such string is
provided in the "-i" option, then the file is modified without backing
up.)
-
More Perl Tips:
See the separate
Perl Tips page.
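awk appears in the list of tools above but was not demonstrated. Here
is a sketch of two common uses, recreating part of the sample data from
above so that the commands are self-contained:

```shell
# Recreate (part of) the sample data file used above:
printf '123 398 17359\n317 19 2909\n39 -399 -5789\n' > file
# Print the second column (awk's counterpart of "perl -lane'print $F[1]'"):
awk '{ print $2 }' file
# Sum the third column; the END block runs after the last line is read:
awk '{ sum += $3 } END { print sum }' file
```

Note that awk numbers fields from 1 ($1, $2, ...), whereas Perl's
autosplit array starts at index 0.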
Miscellany
-
fmt file
Reformat file by wrapping overlong lines and filling short lines.
-
fmt -s file
Wrap overlong lines, but do not fill short lines. Thus, all linebreaks
that were present in the original version are preserved.
-
nroff file
Reformat file and also justify lines to have a uniform width, mimicking
typeset output. Very impressive. (nroff is part of the *roff family
of tools, which used to be the standard typesetting tool in Unix, and is
still used to format Unix man pages.)
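A quick way to see "fmt" at work (the sample text is made up; the "-w"
width option is the GNU fmt spelling, and exact line breaks may differ
slightly between implementations):

```shell
# Put one overlong line into a file:
printf 'the quick brown fox jumps over the lazy dog again and again\n' > notes.txt
# Wrap it at (roughly) 30 columns:
fmt -w 30 notes.txt
```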
Last modified: Mon 20 Jul 2009 09:43:54 AM CDT
A.J. Hildebrand