Unsound & Incomplete

Phrase-Aware Text Wrapping in HTML

Text is more readable when wrapping happens between phrases rather than in the middle of them. Consider the following wrapped title:

Bar
Brawlers Beat
Up Man, Then Eat
Him

If you’re like me, you intuitively assume that, when a line ends, the last phrase in that line also ends. The wrapping shown above violates that assumption: the meaning of the last word in each line depends on the first word in the next. You may feel like you’re stumbling through the text as you revise your interpretation of the words “bar,” “beat,” and “eat” when proceeding from each line to the next.

The same text is more readable when lines are wrapped to avoid breaking up phrases:

Bar Brawlers
Beat Up Man,
Then Eat Him

I call the above a “phrase-aware” wrapping, since lines break only between phrases, never inside them. Below, I’ll show two approaches to enforcing phrase-aware wrapping in HTML.

(Updated April 2024: I learned from Ellen Lupton’s excellent Thinking with Type that this is known as “breaking for sense.”)

Constraining with Non‑Breaking Spaces

A simple way to prevent undesirable wrapping is to use non-breaking spaces, represented in HTML with &nbsp;, between words that should always appear on the same line:

Bar&nbsp;Brawlers Beat&nbsp;Up&nbsp;Man,
Then&nbsp;Eat&nbsp;Him

You should use non-breaking spaces judiciously: if there are too few places where a line can break, the line will overflow its container, which is usually worse than wrapping in a bad spot.
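If the phrase boundaries are already known, inserting the entities can be scripted. The following is a minimal sketch in Python, assuming each phrase is supplied as a list of words; the function name join_phrases is hypothetical and not part of any library.

```python
from html import escape


def join_phrases(phrases):
    """Join the words inside each phrase with &nbsp; and the phrases
    themselves with ordinary (breakable) spaces."""
    return " ".join(
        "&nbsp;".join(escape(word) for word in phrase)
        for phrase in phrases
    )


# Hand-chosen phrase boundaries for the example title.
title = [["Bar", "Brawlers"], ["Beat", "Up", "Man,"], ["Then", "Eat", "Him"]]
print(join_phrases(title))
```

The only places left where the browser may break a line are the ordinary spaces between phrases, so fewer and longer phrases mean a higher risk of overflow.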

(Incidentally, the title of this section uses a non-breaking hyphen, ‑ (written &#8209; in HTML), to avoid wrapping in the middle of “non-breaking.”)

More Flexibility: Hierarchical Wrapping with Flexbox

Non-breaking spaces provide only limited control: you can say which words must not be separated by a line break, but you cannot express a preference among the remaining possible breaks. What if you instead want to express a constraint like “keep this clause entirely on one line if possible; otherwise, keep the noun and verb phrases each on one line”? Nested flexboxes can express hierarchical wrapping preferences like this:

<head>
  <style type="text/css">
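    /* Each wrap unit is an inline flex container: its children share a
       line when they fit and wrap between children when they don't. */
    span.wrap-unit {
        display: inline-flex;
        flex-wrap: wrap;
    }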
  </style>
</head>
<body>
  <span class="wrap-unit">
    <span class="wrap-unit">Bar Brawlers&nbsp;</span>
    <span class="wrap-unit">
      <span class="wrap-unit">Beat Up&nbsp;</span>
      Man,&nbsp;
    </span>
    <span class="wrap-unit">
      Then&nbsp;
      <span class="wrap-unit">Eat Him</span>
    </span>
  </span>
</body>

Note that we still have to insert a non-breaking space after every piece of text but the last: a flex container does not render ordinary whitespace between its items, so otherwise there would be no space between adjacent words.

The above HTML still allows for the same wrapping we achieved with non-breaking spaces:

Bar Brawlers
Beat Up Man,
Then Eat Him

It also allows for more breaks, like wrapping between “then” and “eat him”:

Bar Brawlers
Beat Up
Man,
Then
Eat Him

What I Actually Do

In theory, both of the above approaches are automatable. For example, you could build a constituency parse of the text to be wrapped using nltk.parse, then either extract phrases from the parse tree (to insert non-breaking spaces) or generate a flexbox hierarchy that mirrors the parse tree.
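As a rough illustration of the flexbox half of that idea, here is a minimal Python sketch that turns a phrase tree into nested wrap-unit spans. It skips the parsing step entirely: the tree is a hand-written stand-in for a constituency parse, encoded as nested lists, and the name render_wrap_units is hypothetical. It assumes the span.wrap-unit style shown earlier.

```python
from html import escape


def render_wrap_units(node, is_last=True):
    """Render a phrase tree as nested wrap-unit spans.

    A node is either a string (a run of text) or a list of child nodes
    (a phrase that may wrap between its children).
    """
    if isinstance(node, str):
        # Every text run except the very last gets a trailing &nbsp;,
        # since ordinary whitespace between flex items is not rendered.
        return escape(node) + ("" if is_last else "&nbsp;")
    children = [
        render_wrap_units(child, is_last and i == len(node) - 1)
        for i, child in enumerate(node)
    ]
    return '<span class="wrap-unit">' + "".join(children) + "</span>"


# Hand-written stand-in for a parse of the example title.
tree = [["Bar Brawlers"], [["Beat Up"], "Man,"], ["Then", ["Eat Him"]]]
print(render_wrap_units(tree))
```

For this tree the output has the same span structure as the hand-written markup above, just without the indentation. Mapping a real parse tree onto this nested-list shape (and deciding which constituents deserve their own wrap unit) is the part left open here.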

In practice, though, the amount of text I actually want to constrain wrapping for is small: just titles and section headers, which, to my eye, are the worst offenders when wrapped poorly. (Perhaps simply because they’re physically the largest!) And, while hierarchical wrapping is a nicety, avoiding the very worst wraps with non-breaking spaces seems to have a better return on investment. So I just manually insert non-breaking spaces, and only in title and section header text.