July 10, 2022

Phrase-Aware Text Wrapping in HTML

Text is more readable when wrapping happens between phrases rather than in the middle of them. Consider the following wrapped title:

Bar
Brawlers Beat
Up Man, Then Eat
Him

If you’re like me, you intuitively assume that when a line ends, the last phrase on that line ends too. The wrapping shown above violates that assumption: the meaning of the last word in each line depends on the first word of the next. You may feel like you’re stumbling through the text as you revise your interpretation of the words “bar,” “beat,” and “eat” from one line to the next.

The same text is more readable when lines are wrapped as follows:

Bar Brawlers
Beat Up Man,
Then Eat Him

I’d call the above a “phrase-aware” wrapping, since text wrapping is done in a way that avoids breaking up phrases. Below, I’ll show two approaches to enforcing phrase-aware wrapping in HTML.

Constraining with Non-Breaking Spaces

A simple way to prevent undesirable wrapping is to use non-breaking spaces, represented in HTML with &nbsp;, between words that should always appear on the same line:

Bar&nbsp;Brawlers Beat&nbsp;Up&nbsp;Man, Then&nbsp;Eat&nbsp;Him

You should use non-breaking spaces judiciously: if there are too few places where a line can break, the line will overflow its container, which is usually worse than wrapping in a bad spot.

More Flexibility: Hierarchical Wrapping with Flexbox

Non-breaking spaces provide only coarse control: a pair of words either can be separated or can’t. What if you instead want to express constraints like “keep this clause entirely on one line if possible; otherwise, keep the noun and verb phrases each on one line”? You can do this with nested flexboxes that express hierarchical wrapping preferences:

TODO: Add CSS
<span class="wrap-unit">
  <span class="wrap-unit">Bar Brawlers&nbsp;</span>
  <span class="wrap-unit">
    <span class="wrap-unit">Beat Up&nbsp;</span>
    Man,&nbsp;
  </span>
  <span class="wrap-unit">
    Then&nbsp;
    <span class="wrap-unit">Eat Him</span>
  </span>
</span>

Note that we still have to end every piece of text but the last with a non-breaking space. A flex container doesn’t render whitespace-only text between its items, so otherwise there would be no whitespace between the spans.
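The markup above relies on a wrap-unit class whose styling isn't given here. The following is a minimal sketch of what it could look like, assuming each unit should be a wrapping flex container; the property choices are my assumption, not necessarily the stylesheet the markup was written against.

```css
/* Assumed styling for the wrap-unit class used in the markup above. */
.wrap-unit {
  display: inline-flex; /* make each unit a flex container sized by its content */
  flex-wrap: wrap;      /* children that don't fit move to the next line as whole units */
}
```

With rules like these, a browser should first try to move a whole child unit to the next line, and only break inside a unit once that unit no longer fits on a line by itself. That gives the "keep the clause together if possible, otherwise keep its phrases together" behavior described above.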

What I Actually Do

In theory, both of the above approaches are automatable. For example, you could create a constituency parse for text to be wrapped using nltk.parse and then either extract phrases from the parse tree (to insert non-breaking spaces) or create a flexbox hierarchy mirroring the parse tree, if you wanted to go the flexbox route.
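Whatever produces the parse, the conversion to markup is small. Here is a rough sketch, assuming the parse has already been reduced to nested arrays in which a string is a run of words and an array is a phrase. That shape is hypothetical, as are the function names; it is not a format that nltk emits.

```javascript
// Hypothetical parse-tree shape (not an nltk format): a string is a run of
// words, an array is a phrase made of strings and/or nested phrases.
const title = [["Bar Brawlers"], [["Beat Up"], "Man,"], ["Then", ["Eat Him"]]];

// Escape the characters that are special in HTML text content.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Collect all words under a node into one space-separated string.
function flattenText(node) {
  return typeof node === "string" ? node : node.map(flattenText).join(" ");
}

// Approach 1: glue the words of each top-level phrase with &nbsp;, leaving
// ordinary (breakable) spaces only between top-level phrases.
function renderNbsp(tree) {
  return tree
    .map((phrase) => escapeHtml(flattenText(phrase)).split(" ").join("&nbsp;"))
    .join(" ");
}

// Approach 2: mirror the tree as nested wrap-unit spans. Every text run except
// the very last one gets a trailing &nbsp; so adjacent units stay separated.
function renderWrapUnits(node, isLast = true) {
  if (typeof node === "string") {
    return escapeHtml(node) + (isLast ? "" : "&nbsp;");
  }
  const inner = node
    .map((child, i) => renderWrapUnits(child, isLast && i === node.length - 1))
    .join("");
  return `<span class="wrap-unit">${inner}</span>`;
}

const nbspTitle = renderNbsp(title);
const spanTitle = renderWrapUnits(title);
```

For the example title, renderNbsp reproduces the non-breaking-space version shown earlier. renderWrapUnits produces the same nesting as the hand-written spans, just without the indentation between tags.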

In practice, though, the amount of text I actually want to constrain wrapping for is small: just titles and section headers, which, to my eye, are the worst offenders when wrapped poorly. (Perhaps simply because they’re physically the largest!) And, while hierarchical wrapping is a nicety, avoiding the very worst wraps with non-breaking spaces seems to have a better return on investment. So I just manually insert non-breaking spaces, and only in title and section header text.