Strategies of text segmentation for (comparative) distant reading

Why segmentation

According to the Hyperscience knowledge base, "segmentation in NLP involves breaking down a larger piece of text into smaller, meaningful units such as sentences or paragraphs." Before analysing a text, you may want to carry out one basic level of segmentation: dividing it into several larger sections, which can also mark shifts in content.

Author- or publisher-defined segmentation

A basic, more structural approach to segmentation is to divide a text into sections that are already predefined by the author or publisher. This can be, for instance, volumes or chapters. You can also separate an appendix from the main text, or put references or endnotes into a separate file.
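As a rough illustration of chapter-based splitting, the following Python sketch cuts a plain-text file at chapter headings. It assumes that every chapter starts with a line such as "Chapter 1" or "CHAPTER IV"; the function name `split_into_chapters` and the heading pattern are illustrative assumptions, and real editions often need a different pattern.

```python
import re

# Assumed heading format: a line that starts with "Chapter" followed by
# a number or numeral, e.g. "Chapter 1" or "CHAPTER IV". Adjust as needed.
HEADING = re.compile(r"^chapter[ \t]+\w+.*$", re.IGNORECASE | re.MULTILINE)


def split_into_chapters(text):
    """Return a list of (heading, body) pairs, one per chapter.

    Text before the first heading (e.g. a preface) is ignored here.
    """
    matches = list(HEADING.finditer(text))
    chapters = []
    for i, match in enumerate(matches):
        # A chapter body runs until the next heading or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        chapters.append((match.group().strip(), body))
    return chapters


sample = "Preface text.\n\nChapter 1\nIt began.\n\nChapter 2\nIt ended.\n"
chapters = split_into_chapters(sample)
```

Each body can then be written to its own file or passed on to your analysis tool of choice.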

Alternative segmentation approaches

More complex, content-related segmentations are also possible and can help you gain additional insights into a text, especially when it has several narrative layers.

Introduction, main section, ending

One approach is to treat the introduction (if there is one) as a separate section and to split the ending or conclusion from the rest of the text as well, on the assumption that the introduction sets the stage for what happens afterwards and that the conclusion offers some form of resolution. This can be especially interesting when you expect a plot twist or a major change in tone, e.g. in a crime novel.
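If the text is already available as a list of chapters, this three-way split can be sketched in a few lines of Python. The function name `split_intro_main_ending` and the default of one opening and one closing chapter are assumptions made for illustration; how many chapters count as introduction or ending is an interpretive decision that depends on the text.

```python
def split_intro_main_ending(chapters, n_intro=1, n_ending=1):
    """Divide a list of chapters into introduction, main section and ending.

    n_intro and n_ending are the number of chapters treated as
    introduction and ending (assumed values; choose them per text).
    """
    if n_intro < 0 or n_ending < 0:
        raise ValueError("n_intro and n_ending must not be negative")
    if n_intro + n_ending > len(chapters):
        raise ValueError("not enough chapters for the requested split")
    cut = len(chapters) - n_ending
    return {
        "introduction": chapters[:n_intro],
        "main": chapters[n_intro:cut],
        "ending": chapters[cut:],
    }


# Placeholder chapter texts, for illustration only.
parts = split_intro_main_ending(["ch1", "ch2", "ch3", "ch4"])
```

The three resulting groups can then be compared with each other, for example with regard to vocabulary or tone.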

Dividing the main plot from one or more sub-plots

Dividing the main plot from one or more sub-plots can help you explore differences in tone and character description between these sections. This is especially interesting when a sub-plot (in a literary work) is used to offer a different world view than the main plot. One example of a novel with contrasting main and sub-plots is Anna Karenina.

Dividing text by setting or location

If your text has different settings or locations, these can serve as another interesting division marker and can help you explore differences in the description of spaces and the people moving within them. In the novel Dracula, for instance, you could separate sections set in Dracula's castle from sections set on the ship, in the asylum, or in the houses of the different characters.
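Settings usually cannot be detected reliably by a simple rule, so one practical route is to label each section by hand and then merge sections that share a label. The sketch below assumes you already have a list of sections and a parallel list of labels you assigned yourself; the helper name `group_by_label`, the labels, and the placeholder texts are hypothetical and do not reflect an actual mapping of any novel.

```python
from collections import defaultdict


def group_by_label(segments, labels):
    """Merge all segments that share a label into one text per label."""
    if len(segments) != len(labels):
        raise ValueError("each segment needs exactly one label")
    grouped = defaultdict(list)
    for segment, label in zip(segments, labels):
        grouped[label].append(segment)
    # Join the segments of each label, separated by a blank line.
    return {label: "\n\n".join(parts) for label, parts in grouped.items()}


# Hypothetical hand-made annotations, for illustration only.
segments = ["First section.", "Second section.", "Third section."]
labels = ["castle", "ship", "castle"]
by_setting = group_by_label(segments, labels)
```

The same helper would work for any other manually assigned label, such as the plot strand or the central character of a section.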

Segmentation based on central characters

In texts whose sections focus clearly on different (groups of) characters, creating segments dedicated to these individual characters is another option. This segmentation strategy is especially recommended for texts with point-of-view narration, such as the A Song of Ice and Fire fantasy series.

Segmentation by speech type

If your text includes narrated text as well as direct speech, you may want to use segmentation to separate them for a comparative analysis.
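A first approximation of this split can be made with a regular expression, as in the Python sketch below. It assumes that direct speech is always enclosed in straight (" ") or curly (“ ”) double quotation marks and is never nested; texts that use single quotation marks, dashes, or guillemets for speech need a different pattern. The function name `split_speech` is an illustrative choice.

```python
import re

# Assumption: speech is enclosed in straight or curly double quotes.
QUOTED = re.compile(r'"([^"]*)"|“([^”]*)”')


def split_speech(text):
    """Return (list of direct-speech passages, remaining narration)."""
    # findall yields one tuple per match; only one group is non-empty.
    speech = [straight or curly for straight, curly in QUOTED.findall(text)]
    # Remove the quoted passages and tidy up the leftover whitespace.
    narration = QUOTED.sub(" ", text)
    narration = re.sub(r"\s+", " ", narration).strip()
    return speech, narration


speech, narration = split_speech('"Come in," she said. He hesitated. "Not yet."')
```

Because quotation marks are also used for titles or emphasis, it is worth checking a sample of the results by hand before comparing the two parts.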

Isolating quotes from other works

Another segmentation strategy is to isolate quotes from other works that may be used in your text. Quoting extensively from other literary works is, for example, very common in pre-20th-century novels.

Segmentation by genre

This segmentation strategy often overlaps with isolating quotes. Segmentation by genre means that you consider whether your text belongs to just one literary genre or combines several. For instance, a novel can contain poetry or newspaper articles (including fictitious ones). Conversely, a newspaper article can contain song lyrics. In the novel Frankenstein, several poems (by other authors) are quoted:

Did I request thee, Maker, from my clay
To mould me man? Did I solicit thee
From darkness to promote me?

—Paradise Lost.

Like one who, on a lonely road,
Doth walk in fear and dread,
And, having once turned round, walks on,
And turns no more his head;
Because he knows a frightful fiend
Doth close behind him tread.

[Coleridge’s “Ancient Mariner.”]

We rest; a dream has power to poison sleep.
We rise; one wand’ring thought pollutes the day.
We feel, conceive, or reason; laugh or weep,
Embrace fond woe, or cast our cares away;
It is the same: for, be it joy or sorrow,
The path of its departure still is free.
Man’s yesterday may ne’er be like his morrow;
Nought may endure but mutability!

——The sounding cataract
Haunted him like a passion: the tall rock,
The mountain, and the deep and gloomy wood,
Their colours and their forms, were then to him
An appetite; a feeling, and a love,
That had no need of a remoter charm,
By thought supplied, or any interest
Unborrow’d from the eye.

[Wordsworth’s “Tintern Abbey”.]