What is sed?
— WikiWikiWeb: Sed Language
sed - a Stream EDitor
sed is a UNIX utility which processes a text file one line at a time. It has RegularExpression based string manipulation, a hold buffer, and some basic flow control. Amazing things can be done with these BearSkinsAndStoneKnives.
Yes, I guess
sed is a stone knife or bear-skin, in that it's one of those ancient Unix utilities with a reputation for being powerful in the right context, but a bit difficult to wield. Whether or not the reputation is justified, it's good to know a little about sed because it's everywhere in Unix/Linux, and it proves useful in many situations.
My main use case for sed is when I find myself thinking, "Gee, I just need to run a regular expression over this file or bunch of text."
In addition to replacing, it can delete certain text or lines, or insert blank lines. The way sed changes files is called non-interactive editing: all change instructions are defined up front, and then sed applies them to the input, line by line.
This makes it suited to be part of shell scripts or other automated workflows, and handy for one-time changes such as data cleanup.
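Here's a quick sketch of those other operations: deleting lines, inserting blank lines, and stacking up several instructions ahead of time. The file sample.txt and its contents are made up for this illustration.

```shell
# Hypothetical sample input, created only for this illustration.
printf 'alpha\n# a comment\nbeta\n' > sample.txt

# d deletes every line matching the address (here: lines that start with #).
sed '/^#/d' sample.txt

# G appends the (empty) hold buffer to each line, which double-spaces the output.
sed G sample.txt

# Several instructions can be given up front with -e;
# sed applies them, in order, to each line of input.
sed -e '/^#/d' -e 's/alpha/ALPHA/' sample.txt
```

None of these commands touch sample.txt itself; each one prints its result to standard output.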
How to learn sed
Check out Sed Examples by Sasikala for a nice overview of sed features and uses. Each of the posts contains examples organized by feature and command, which makes it easy to find something specific to the task at hand.
There's a DigitalOcean tutorial: The Basics of Using the Sed Stream Editor to Manipulate Text in Linux. However, it's my opinion that some of the best sed info is found on websites of a certain vintage.
Here's a simple example for replacing a certain string value in some data. Given a text file:
> cat trends.txt
old and busted fashions
old and busted hats
music that is old and busted
all old and busted everything
Perform the substitution on all lines of the original file with command s, and redirect the output to a new file:
> sed 's/old and busted/new hotness/' trends.txt > new-trends.txt
The new file looks exactly like the original file, but with our phrase replaced:
> cat new-trends.txt
new hotness fashions
new hotness hats
music that is new hotness
all new hotness everything
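One detail that's easy to miss: s replaces only the first match on each line unless you add the g flag. The input line in this sketch is made up to show the difference, and the variable names are just for illustration.

```shell
# Made-up input with the phrase appearing twice on one line.
line='old and busted hats, old and busted shoes'

# Without g: only the first occurrence on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/old and busted/new hotness/')
echo "$first_only"

# With g: every occurrence on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/old and busted/new hotness/g')
echo "$all"
```

In trends.txt the phrase shows up once per line, so the plain s command was enough there.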
I encountered some wild Pokemon data that needed a bit of cleanup:
As the sample shows, some of the property values in stats are formatted as strings, and some are numbers. It's not only speed; all of the stats are formatted inconsistently throughout the file, which makes the data harder to use. It's possible the database or import script could coerce the values for us, but let's make changes to the source file itself so that the data will be consistent no matter how we use it.
Here is the incantation to clean it up:
> sed -E '/[[:space:]]*("id"|"height")/!s/"([[:digit:]]+\.*[[:digit:]]*)"/\1/'
- The RegEx part before ! tells sed to ignore a line if it has either "id" or "height" [allowing for whitespace before the attribute]
  - This is because ids and heights have numbers in their values, but the nature of the data indicates that they should remain formatted as strings.
- The substitution command s/"( ...symbols... )"/\1/ does this:
  - Match on patterns that look like numbers inside quotation marks
  - Group the part inside the quotation marks with parentheses
  - Replace the whole match with the captured group (\1), which drops the quotation marks
- The flag -E enables extended (modern) RegEx syntax
In practice, the full command would also indicate the original and output filenames, as seen in the simple replace example earlier in this post. The end result is that all number values are represented as Numbers, not Strings, except for those properties where string representation is appropriate for number-like values.
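To make that concrete, here is a sketch of the full command run against a small made-up record. The filenames pokemon-sample.json and pokemon-clean.json, along with the field names and values, are assumptions for this illustration and not the actual data file.

```shell
# Hypothetical sample record; field names and values are assumptions for this sketch.
cat > pokemon-sample.json <<'EOF'
{
  "id": "25",
  "name": "Pikachu",
  "height": "0.4",
  "stats": {
    "hp": "35",
    "attack": 55,
    "speed": "90"
  }
}
EOF

# Skip lines containing "id" or "height"; on every other line,
# strip the quotation marks around a number-like value.
sed -E '/[[:space:]]*("id"|"height")/!s/"([[:digit:]]+\.*[[:digit:]]*)"/\1/' pokemon-sample.json > pokemon-clean.json

cat pokemon-clean.json
```

In the cleaned file, hp and speed lose their quotation marks, attack was already a number, and id, height, and name are left alone. Since there is no g flag, only the first quoted number on a line is changed, which is fine when each property sits on its own line as it does here.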
Bottom line: it's good to know about sed, what it does, and how it can be applied to everyday problems. It's a widely available utility, worth keeping in your Unix toolbox, even if it appears at first to have all the user-friendliness of an old flint blade.