eareese.com

About sed

What is sed?

sed - a Stream EDitor
sed is a UNIX utility which processes a text file one line at a time. It has RegularExpression based string manipulation, a hold buffer, and some basic flow control. Amazing things can be done with these BearSkinsAndStoneKnives.
WikiWikiWeb: Sed Language

Yes, I guess sed is a stone knife or bear-skin, in that it's one of those ancient [1] Unix utilities with a reputation for being powerful in the right context, but a bit difficult to wield. Whether or not the reputation is justified, it's good to know a little about sed because it's everywhere in Unix/Linux, and it proves useful in many situations.

My main use case for sed is when I find myself thinking, "Gee, I just need to run a regular expression over this file or bunch of text."

In addition to replacing text, sed can delete certain text or lines, or insert blank lines. The way sed changes files is called non-interactive editing: all change instructions are defined up front, and then sed applies them to the input, line by line.

This makes it suited to be part of shell scripts or other automated workflows, and handy for one-time changes such as data cleanup.
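As a quick sketch of the delete and blank-line features mentioned above (the file names and sample text here are made up for illustration, not taken from any real project):

```shell
# Made-up sample input for this sketch
printf 'keep one\n# a comment\nkeep two\n' > notes.txt

# d deletes every line that matches the address (here: lines starting with #)
sed '/^#/d' notes.txt > no-comments.txt

# G appends the (empty) hold buffer to each line, which double-spaces the output
sed G no-comments.txt > spaced.txt

cat spaced.txt
```

Both instructions are fixed before sed reads any input, which is what makes them easy to drop into a script.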

How to learn sed

Check out Sed Examples by Sasikala for a nice overview of sed features and uses. Each of the posts contains examples organized by feature and command, which makes it easy to find something specific to the task at hand.

There's a DigitalOcean tutorial: The Basics of Using the Sed Stream Editor to Manipulate Text in Linux. However, it's my opinion that some of the best sed info is found on websites of a certain vintage.

Example usage

Here's a simple example for replacing a certain string value in some data. Given a text file:

> cat trends.txt
old and busted fashions
old and busted hats
music that is old and busted
all old and busted everything

Perform the substitution on all lines of the original file with the s command, and redirect the output to a new file:

> sed 's/old and busted/new hotness/' trends.txt > new-trends.txt

The new file looks exactly like the original file, but with our phrase replaced:

> cat new-trends.txt
new hotness fashions
new hotness hats
music that is new hotness
all new hotness everything
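One detail worth knowing: by default, s replaces only the first match on each line. That was enough for trends.txt, where the phrase appears once per line. Here is a small sketch of the difference the g flag makes, using a made-up line that contains the phrase twice:

```shell
# Made-up input for this sketch: the phrase appears twice on one line
line='old and busted hats, old and busted shoes'

# Without a flag, s replaces only the first match on each line
first_only=$(printf '%s\n' "$line" | sed 's/old and busted/new hotness/')

# With the g flag, s replaces every match on the line
every_match=$(printf '%s\n' "$line" | sed 's/old and busted/new hotness/g')

printf '%s\n%s\n' "$first_only" "$every_match"
# new hotness hats, old and busted shoes
# new hotness hats, new hotness shoes
```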

Case study

I encountered some wild Pokémon data that needed a bit of cleanup:

{
  "id": "040",
  "name": "Wigglytuff",
  "img": "http://img.pokemondb.net/artwork/wigglytuff.jpg",
  "type": ["Normal"],
  "stats": {
    "hp": "140",
    "attack": "70",
    "defense": 45,
    "spattack": "75",
    "spdefense": "50",
    "speed": 45
  },
  "moves": {
    // ...
  }
  // ...
}

As the sample shows, some of the property values in stats are formatted as strings, and some are numbers. It's not only defense and speed: all of the stats are formatted inconsistently throughout the file, which makes the data harder to use. It's possible the database or import script could coerce the values for us, but let's make changes to the source file itself so that the data will be consistent no matter how we use it.

Here is the incantation to sed:

> sed -E '/[[:space:]]*("id"|"height")/!s/"([[:digit:]]+\.*[[:digit:]]*)"/\1/'
  • The RegEx part before ! is an address: it matches any line containing "id" or "height" (with optional whitespace before the attribute). The ! negates the address, so sed skips those lines and runs the substitution on all the others.
    • This is because ids and heights have numbers in their values, but the nature of the data indicates that they should remain formatted as strings.
  • The substitution command s/"( ...symbols... )"/\1/ does this:
    • Match on patterns that look like numbers inside quotation marks
    • Group the part inside the quotation marks with parentheses
    • Replace the whole match with the captured group, \1, which drops the quotation marks
    • Examples: "7.25" becomes 7.25, "10" becomes 10
  • The flag -E is for modern/extended RegEx format

In practice, the full command would also indicate the original and output filenames, as seen in the simple replace example earlier in this post. The end result is that all number values are represented as Numbers, not Strings, except for those properties where string representation is appropriate for number-like values.
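Here is a sketch of what that full invocation might look like. The file names pokemon.json and pokemon-clean.json are hypothetical, and the sample data is a trimmed-down stand-in for the real file (the height value in particular is made up):

```shell
# Hypothetical file name and trimmed-down sample data for this sketch
cat > pokemon.json <<'EOF'
{
  "id": "040",
  "height": "1.0",
  "stats": {
    "hp": "140",
    "defense": 45,
    "spattack": "75"
  }
}
EOF

# Same command as above, with input and output file names added
sed -E '/[[:space:]]*("id"|"height")/!s/"([[:digit:]]+\.*[[:digit:]]*)"/\1/' pokemon.json > pokemon-clean.json

cat pokemon-clean.json
```

The id and height lines come through untouched, while "140" and "75" lose their quotation marks and the already-numeric 45 stays as it was.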

Summary

Bottom line: it's good to know about sed, what it does, and how it can be applied to everyday problems. It's a widely available utility, worth keeping in your Unix toolbox, even if it appears at first to have all the user-friendliness of an old flint blade.

Links


  1. sed is older than I am, so that's prehistoric from my perspective. If you originated in the 80s, then it predates you, too:

    "sed" stands for Stream EDitor. Sed is a non-interactive editor, written by the late Lee E. McMahon in 1973 or 1974. A brief history of sed's origins may be found in an early history of the Unix tools, at http://www.columbia.edu/~rh120/ch106.x09.

    The sed FAQ, Section 2

