SONNETCAST

Dating the Sonnets — With Miro Roman

LISTEN TO THE SONNETCAST
SPECIAL EPISODE ON
DATING THE SONNETS WITH MIRO ROMAN
In this special episode, Sebastian Michael is joined by architect, author, and coder Miro Roman to talk about their experiment in applying a machine learning approach to comparing the full text of William Shakespeare's Sonnets with the full text of his plays and narrative poems, to examine whether such a methodology can confirm the rare word analysis previously carried out by MacDonald P Jackson and others towards dating the sonnets.

[THE BACKGROUND: MACDONALD P JACKSON'S RESEARCH AND WHY THE QUESTION OF DATING THE SONNETS LINGERS]

For today’s episode – which is the penultimate one of our podcast before we get to the summary and conclusion – I wanted to circle back to a question we have touched on before, and examine it in a new light and from a new perspective: Dating the Sonnets.

And that is why for today’s episode I am joined by my good friend and sometime colleague Miro Roman, because the research methodology he has developed and employed in his work does offer such a fundamentally different approach.

I shall introduce him to you in a moment and he will be able to tell us what exactly his approach is and why it is so different from a traditional – or perhaps we should say from a 20th century – approach. But first let me give you the background to this and outline why the question of dating the sonnets lingers.

You may recall, if you have been listening to this podcast – or otherwise let me briefly fill you in – that in our conversation with Sir Stanley Wells and Paul Edmondson about The Order of the Sonnets, we wondered how they had arrived at their putative chronology for the composition of the sonnets, and they told us that they had done so largely based on the research carried out by MacDonald P Jackson.

MacDonald P Jackson is a New Zealand scholar – he is Professor Emeritus of English at the University of Auckland – who in turn built on research previously done by Hieatt, Hieatt and Prescott and others, and what their approach has in common is that they conducted rare word analyses of William Shakespeare’s works to see how the sonnets relate to the plays.

The rationale behind this is that we can, we believe, date the plays fairly accurately, because for these we have entries in the Stationers’ Register, records of first performance dates, and other external evidence; and so while, as with everything else concerning Shakespeare, there can hardly be any absolute certainty, there is something approximating a scholarly consensus about the timeline of the plays.

Now, the contention with rare word analysis is that an author will go through phases in their creative output, with their vocabulary continuously evolving and changing over time, and so if a new word enters their work, it should make itself known in different pieces that are being written in the same period. That in itself is something of a supposition rather than a hard fact or rule, but other types of research also suggest that it is indeed the case, and it is also worth bearing in mind that we are talking about an incredibly dynamic period, right at the very beginning of Modern English, when new words and expressions are being coined virtually every day.

The reason MacDonald P Jackson and those before him concentrated on rare words was that they simply did not have the computing power needed to analyse entire bodies of vocabulary in the 1970s, 80s and 90s, apart from which – and this really is more relevant to addressing the authorship question of literary works than their dating – it was also quite generally believed that what makes a person’s writing individual and specific is their use of unusual words rather than, as turns out to be the case, their use of language overall.

We do discuss this in some detail with Professor Gabriel Egan in our conversation about Computational Approaches to Shakespeare, and so if you are interested, then I can certainly recommend you also have a listen to that episode.


Now, because a sonnet is a short piece of poetry consisting of roughly a hundred words – give or take about twenty – it is not really possible to do useful research with individual sonnets. For this reason MacDonald P Jackson adopted a practice already established by Hieatt, Hieatt, and Prescott that groups the sonnets into four zones. According to this:

1) Zone 1 contains the Fair Youth Sonnets numbered 1-60
2) Zone 2 contains the Fair Youth Sonnets numbered 61-103
3) Zone 3 contains the Fair Youth Sonnets numbered 104-126, and
4) Zone 4 contains the Dark Lady Sonnets numbered 127 to 154, including the Anne Shakespeare Sonnet 145 and the two allegorical poems at the end.

Although these four zones are of unequal size, with Zone 1 containing 60 poems, Zone 2, 43, Zone 3, 23, and Zone 4, 28, this broad allocation was considered sufficient to say something about how the rare words that occur in each zone compare with those in the plays, and therefore how the zones relate to the plays.

What MacDonald P Jackson found, and what therefore forms the current understanding of the chronology of composition of the sonnets, which Edmondson and Wells have followed, is that Zone 3 in particular, which contains the latter part of the Fair Youth Sonnets, was composed later than the others and later than had previously been assumed, with its composition stretching right into the early 17th century. As MacDonald P Jackson asserts in his paper: “The main conclusion, which is unlikely to be overturned, is that the majority, if not all, of the last twenty-odd sonnets to the Friend were written in the 17th century.”

I found this most intriguing, because some of the sonnets in this group do in fact seem to relate to events that happened in the early 1600s, most prominently Sonnet 107, which many scholars believe makes near-direct reference to the death of Queen Elizabeth I in 1603, and Sonnets 123, 124, and 125, which may contain more covert references to the coronation of King James I that followed. Others in the group, though, particularly Sonnets 104, 105, and 106, do not suggest such a late date of composition.

Edmondson and Wells took Jackson’s findings further and also rearranged some of the other zones for their decoratively presented edition, which also includes sonnets that did not form part of the collection originally published in 1609; but, as Professor Gabriel Egan observes in our subsequent conversation, the only strong claim Jackson makes relates to Zone 3.


Looking at only the rare words in a poet and playwright’s vocabulary is, of course, by necessity severely limiting. We are taking into account words “occurring in the sonnets and in at least three but not more than seven other Shakespeare works,” as Jackson puts it, which means we are pre-determining, not quite arbitrarily but on the basis of highly particular decision making, what matters to us and what doesn’t.
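Jackson's selection criterion, as quoted here, amounts to a simple filter over a word index. The sketch below is a minimal illustration in Python, assuming each work has already been tokenised into a list of lowercase words; the function name `rare_word_links` and the toy data are hypothetical, and the sketch does not reproduce how Jackson handled spelling variants, inflected forms, or the weighting of links.

```python
from collections import defaultdict


def rare_word_links(sonnet_words, other_works, low=3, high=7):
    """Return words that occur in the sonnets and in at least `low` but not
    more than `high` other works, each mapped to the sorted names of those works.

    sonnet_words: iterable of words from the sonnets (or from one zone).
    other_works: dict mapping a work's name to its list of words.
    """
    # Index every word by the set of works it appears in.
    works_by_word = defaultdict(set)
    for name, words in other_works.items():
        for word in set(words):
            works_by_word[word].add(name)

    result = {}
    for word in set(sonnet_words):
        works = works_by_word.get(word, set())
        # Too common (more than `high` works) or too rare (fewer than `low`) is skipped.
        if low <= len(works) <= high:
            result[word] = sorted(works)
    return result
```

The narrowness the paragraph describes is visible in the two parameters: every word outside the `low`..`high` window is simply discarded before any comparison is made.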

Following our conversation with Professor Gabriel Egan and considering his significant insights into how it is not necessarily rare words but the way language is being used overall that defines an author’s voice, and seeing how far computing has come along in the quarter century since Jackson published his research in 2001, I therefore wondered whether with today’s computing power and machine learning algorithms we can confirm Jackson’s findings, or if not, whether we would be forced to refute them.


[THE HEADLINE SUMMARY OF OUR FINDINGS USING FULL TEXT COMPARISON]

And the short answer to this is – here comes the spoiler alert, listen away if you don’t want to know the preliminary result of our early phase research – we can’t.

Here, in summary – think of this as an Executive Summary, or what in an academic paper will be the basis for an Abstract – is what we did, what we expected to find, and what we did find:

Rather than identifying and then searching for and comparing specific words, we looked at how the entire vocabulary of the sonnets compares to the entire vocabulary of the plays and the other works by William Shakespeare.

Since our text comparison was going to be done by an algorithm, we subdivided the large Zones 1 and 2 into Sub-Zones 1A, 1B, 1C, 2A, and 2B, so as to get seven similarly sized groups rather than four unequally sized ones.
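The episode does not say exactly where the sub-zones were split, so the following Python sketch simply assumes contiguous, near-equal splits. The boundaries it produces are therefore hypothetical, as is the helper name `split_range`. It also confirms the arithmetic of the original four zones: 60, 43, 23, and 28 poems, 154 in total.

```python
def split_range(first, last, parts):
    """Split sonnet numbers first..last (inclusive) into `parts` contiguous
    groups whose sizes differ by at most one."""
    numbers = list(range(first, last + 1))
    size, extra = divmod(len(numbers), parts)
    groups, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)  # earlier groups absorb the remainder
        groups.append(numbers[start:end])
        start = end
    return groups


# Hypothetical boundaries: evenly split Zones 1 and 2, Zones 3 and 4 unchanged.
zones = dict(zip(["1A", "1B", "1C"], split_range(1, 60, 3)))
zones.update(zip(["2A", "2B"], split_range(61, 103, 2)))
zones["3"] = list(range(104, 127))
zones["4"] = list(range(127, 155))

for label, group in zones.items():
    print(f"Zone {label}: Sonnets {group[0]}-{group[-1]} ({len(group)} poems)")
```

Under this assumption the seven groups hold between 20 and 28 poems each, compared with a spread of 23 to 60 across the original four zones.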

We also split the plays into their acts, again to be able to deal with files of a comparable size, and we included in our comparison the long narrative poems Venus and Adonis and The Rape of Lucrece, as well as The Phoenix and the Turtle, and indeed A Lover’s Complaint.
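Splitting a play into acts can be as simple as cutting the text at its act headings. This Python sketch assumes, hypothetically, a plain-text edition in which every act begins on its own line with a heading such as `ACT I` or `ACT 2`. Real editions vary, and any front matter before the first heading (a cast list, for instance) is ignored here.

```python
import re

# Assumption: an act heading is a line starting "ACT" followed by a Roman or Arabic numeral.
ACT_HEADING = re.compile(r"^ACT\s+([IVX]+|\d+)\b", re.MULTILINE)


def split_into_acts(play_text):
    """Return a list of (act_label, act_text) pairs in the order they occur."""
    matches = list(ACT_HEADING.finditer(play_text))
    acts = []
    for i, match in enumerate(matches):
        # Each act runs from the end of its heading to the start of the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(play_text)
        acts.append((match.group(1), play_text[match.end():end].strip()))
    return acts
```

For example, `split_into_acts("ACT I\nWho's there?\nACT II\nNay, answer me.\n")` yields one entry per act, each of which can then be compared as a separate file.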

What we were looking for are ‘affinities’ between these texts. We will explain in more detail in a moment what we mean by affinities, but in principle, what we would expect is:

  1. The acts of the plays to cluster together: they inherently share a vocabulary with character and place names, themes and concerns, and so we would not want to see the five acts of Hamlet, for example, disperse all over the place, as it were.
  2. The sonnets to cluster together in principle: with the exception of Zone 3, which MacDonald P Jackson has extended to the early 1600s, we would hope to see the sonnets occupy a similar space on any projection or metaphorical ‘landscape’.
  3. The sonnets to largely cluster around plays that were written between 1593 and 1596: this is the time frame that had generally been assumed for the composition of the majority of the sonnets, and that MacDonald P Jackson also still confirms for some of the other zones.
  4. Zone 3 to form an exception and to settle around later plays, stretching right into the early 1600s.

We would further expect plays of a kind to show a tendency to cluster together, so that the histories should broadly have a greater affinity with other histories than with the comedies, for example; and we would also expect plays and poems that sit thematically close to the sonnets to show a greater affinity with the sonnets than those which don’t. This means that a play whose central theme is love, such as Romeo and Juliet, should show a greater than average affinity with the sonnets, on the basis of its theme as well as of its date of composition.

What we actually find is that:

  1. the acts of a play do cluster together, much as hoped and expected;
  2. the sonnets do cluster together, also as hoped and expected;
  3. the sonnets do largely cluster around plays that were written between 1593 and 1596, just as hoped and expected.

All of which is good news in so far as it suggests the methodology is sound and the instrument we’re employing works: everything thus far is as expected.

But:

  4. Zone 3 behaves no differently from the other zones, and shows no more discernible affinity with any of the later plays than any of the other zones do.

This comes as a surprise.

What we also find, interestingly, is that the sonnets appear to display a pronounced affinity with two plays we wouldn’t expect: Timon of Athens and The Tempest. The reasons for this are not immediately obvious and we will want and need to examine this further, but one possible explanation that does offer itself is that William Shakespeare, as has frequently been suggested, returned to the sonnets and revised them while curating them into a coherent collection around the time just before publication.

And so based on our research so far – and I say ‘so far’, because we will deepen and refine it, and we fully intend to publish our findings once they are ready – we cannot confirm the conclusion reached by MacDonald P Jackson and others that Zone 3 of the sonnets – that is, Sonnets 104-126 – is definitely of a later date of composition than the other sonnets.

Nor – I should stress – can we refute it. What we can say and suggest at this point is that further research needs to be carried out, that the question of dating the sonnets has not as yet been conclusively answered, and that any attempt at dating them is and must for the time being remain entirely, as Edmondson and Wells note in their edition, “conjectural.”


[INTRODUCING MIRO ROMAN]

Now then, if you want to understand better what we did and how we went about it, then please stay with us and meet Miro Roman:

Dr Miro Roman is an Assistant Professor at studio2 at UIBK, the University of Innsbruck, Austria, where he founded the research platform House of Coded Objects, and the design studio0more. He is also a senior lecturer at Meteora at the Chair for Digital Architectonics at ETH Zurich.

As an architect, coder, and scholar, he focuses on the intersection of artificial intelligence, big data, social media, and information technologies with design, architecture, art, and fashion.

Roman explores, designs, codes, and writes about the world while playing with 'a lot' – ‘all’ the objects, books, and images; clouds, avatars, streams, lists, indexes, and pixels: what is this abundance of information about, and how does it shape our world? To navigate and surf these vast flows, he codes and articulates synthetic alphabets. For Roman, computation is a form of literacy.

Roman has a longstanding research interest in the exchanges between art and science, particularly through multimodal projects supported by artificial intelligence and big data. He is both an author and a coder, writing books and creating applications that explore these intersections.

Together with Alice_ch3n81, Roman authored A Play Among Books (Birkhäuser, 2022), a book based on his PhD thesis that explores the infinite flow of online books to discuss architecture. This work combines humanistic expression with scientific rigour in a digital framework. As a byproduct of the book, Roman developed the computational library Xenotheka and its search engine, Ask Alice, which have been instrumental in establishing design and research methodologies at the House of Coded Objects (UIBK). These instruments also inform his teaching at Meteora (ETH) and studio0more (UIBK).

Roman was also a guest lecturer at ATTP – the Research Unit for Architecture Theory and Philosophy of Technics at TU Wien (University of Technology Vienna) and a researcher at the Future Cities Laboratory, part of the interdisciplinary Singapore ETH Centre, where he co-edited A Quantum City (Birkhäuser, 2015). This book invites readers to explore the vast indexes of the world and embark on a journey to (re)discover the city.

Roman has lectured widely at institutions including TU Berlin, ENSA Paris Malaquais, ZHdK Zurich, UF School of Architecture Florida, Southeast University Nanjing, SAUP SZU Shenzhen, Academy of Architecture Mendrisio, Politecnico di Milano, URA Singapore, and the University of Indonesia. His research has been featured in publications such as A+U, Arch+, and Oris, and his work has been exhibited at venues like the Taipei Fine Arts Museum, London Design Biennale, and the Museum of Contemporary Art in Zagreb.

And I should point out that Miro and I have worked on several projects together, and we jointly taught a seminar at ETH Zurich at Professor Ludger Hovestadt’s Chair for Digital Architectonics, and both he and I have worked very closely also with Professor Vera Bühlmann at her Research Unit for Architecture Theory and Philosophy of Technics, where I am still a guest lecturer today.

[THE CONVERSATION]


SEBASTIAN MICHAEL:

Welcome, Miro, to Sonnetcast, and thank you for joining us.

The first question I have for you is, of course, what is this instrument that we have now been using? How does it come about, and what is the background to this way of working with machine learning algorithms?


MIRO ROMAN:

The background story for me, the most interesting moment in my research development, was this: I was developing these things [at] ETH, and we were discussing text, we were discussing computing, all of these topics. And then we somehow came to the idea of the image of a book. So what could an image of a book be? For me, this was a fascinating question. I could not even get my head around what it could be, or how to approach it.

And then, in parallel to this, there was something Umberto Eco said in one of his interviews for Spiegel: that developed or sophisticated culture, in today's sense, characterises objects by their properties.

So, for instance, if you go to buy a camera, or if you go to buy a car, in principle you buy these objects by their physical characteristics. So it has this amount of kilos, it has this amount of [horsepower], if it's a car and this amount of cubic centimetres; or if it's a camera, it has this aperture and then it has this kind of lens, it has this specific sensor and so on.

Or even if you describe a person: male or female, this age or that age, this nationality or that nationality, the place of birth and so on. And then Umberto Eco says: suppose an alien came and you wanted to describe, for instance, what water is. If you have a scientist describing it, the scientist would say, okay, it's H2O: two hydrogen atoms and one oxygen atom.

But if you have another way of describing it, for instance if a layman had to describe it, someone who is very open, they would say: water is life. Water is death. Water is a lake. Water is a sea. Water is blood. Water is ice. Water is steam. Water is liquid.

So these things would just unfold in another way. And this other way looks much more capacious. And what information technology has brought to the table is that we can actually work with this. So we can compute these complicated clouds of words. And through them we are able to get [a] kind of consistency. So we are able to compare objects based on their textual descriptions, but not in a way that we reduce them to specific points, but in a way that we use the whole cloud as a computing material. 

This is what we have, for instance, with Amazon and with Google, and now even more with LLMs and ChatGPT and these kinds of large language models.

For instance, with Amazon, if you buy this book and this book and this book, it looks at which books you like, at what you click. And then it looks at what other people who clicked in the same way clicked next. Through this it creates a kind of cloud of associations among books, based on how other people looked at them. So this is one of the strategies for characterising an object in terms of information.
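The "people who clicked this also clicked that" idea can be illustrated with simple co-occurrence counts. This is a toy Python sketch of the general technique only, not Amazon's actual system; the function names `co_occurrence` and `also_liked` and the sample baskets are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def co_occurrence(baskets):
    """For each item, count how often every other item appears in the same basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # Every ordered pair of distinct items in the basket is one association.
        for a, b in permutations(set(basket), 2):
            counts[a][b] += 1
    return counts


def also_liked(counts, item, k=3):
    """Items most often found alongside `item`, strongest association first."""
    return [other for other, _ in counts.get(item, Counter()).most_common(k)]


baskets = [
    ["Sonnets", "Venus and Adonis"],
    ["Sonnets", "Venus and Adonis", "Lucrece"],
    ["Sonnets", "Hamlet"],
]
counts = co_occurrence(baskets)
print(also_liked(counts, "Sonnets", k=1))  # ['Venus and Adonis']
```

As Miro says, nothing in the counts records *why* two items go together; the meaning stays implicit in the links.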

So you just create these relationships. But the meaning of it, what these people thought and their knowledge about the relationship of books and so on is implicit in the link, but it's never explicit. So it's there. You know why you did it. And then if you have a long line of books, you get the feeling of what this person is about. So if you see, for instance, fifty books that a person has in their library, you can kind of start guessing what kind of cars they have, what kind of bicycles, how they dress, where they live, what's their social status. I think you could easily profile – and it would be interesting to see how correct this is – but I think this would be quite precise.

And the beauty of Google is that it was actually the first one that did this… this was the revolution in search engines. Before Google, you had a kind of manual ranking of things: you would say this is the most important, this is less important, and there would be a person who decided it. What Google said was: we will forget these fixed ontology systems. What we will do instead is a flexible ontology which is updated nonstop. So as people look for certain things, the ranking of those things goes up; when they get less popular, the ranking goes down; and the ranking is just their relationships to other things. In mathematical terms, this is based on Markov chains.
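The best-known published description of Google's early ranking is PageRank. Strictly speaking, it ranks pages by the links between them rather than by search popularity, scoring each page by the stationary distribution of a "random surfer" Markov chain. Below is a minimal power-iteration sketch in Python using the commonly cited damping factor of 0.85. Google's production ranking combines many more signals, and the graph here is a made-up example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate the stationary distribution of the random-surfer Markov chain.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Otherwise the surfer follows one of the page's links at random.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
print(sorted(ranks, key=ranks.get, reverse=True))  # ['C', 'A', 'B']
```

The ranking is "just the relationships": page C comes first only because two other pages point to it, and A inherits importance from C.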

So now, taking this method a little further, something I have been working on for the last ten years is how to work with large quantities of text. How have writing and text changed once we can access all the books at the same time, which is exactly what ChatGPT, large language models, or textual databases do?

And in this sense, we approached also now what you brought to the table, Shakespeare and the sonnets. So we tried to put the sonnets in different constellations and then look how particular sonnets come next to each other or how they behave.

Now, just to break down the method or the instrument in principle: we take a collection of books, a collection of texts. In the collection, we look for all the individual words. So we first create a dictionary of the collection, the dictionary of the library.

And in this dictionary we record how each word relates to all the texts. So it's a big matrix in which every word is set against every text, and if a word doesn't appear in a text, we say it appears zero times. So each word gets a series of numbers, a vector, which shows how it behaves in that specific library.
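The word-by-text matrix described here can be sketched in a few lines of Python. Read row by row, it gives each word a vector across the texts. Read column by column, it gives each text a vector across the whole dictionary, which is what gets compared when texts are placed on a map. The tokeniser (lowercase letters, straight apostrophes kept inside words) and the use of raw counts are assumptions: the episode does not say how words were normalised or weighted. The function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into words, keeping inner apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def term_document_matrix(texts):
    """Build the library's dictionary and a word-by-text count matrix.

    texts maps a text name to its full text. Returns (vocabulary, names, matrix),
    where matrix[i][j] counts vocabulary[i] in text names[j]; absent words count 0.
    """
    names = list(texts)
    counts = [Counter(tokenize(texts[name])) for name in names]
    vocabulary = sorted(set().union(*counts))
    # Counter returns 0 for a missing word, giving the zeros in the matrix.
    matrix = [[c[word] for c in counts] for word in vocabulary]
    return vocabulary, names, matrix


vocab, names, matrix = term_document_matrix({"A": "love is love", "B": "time and love"})
for word, row in zip(vocab, matrix):
    print(word, row)  # e.g. "love [2, 1]"
```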

So in this library, once we have these vectors, books come close to each other depending on how we organise the library. If the library is a line, they will all have to form a line. But if the library is a rectangular library, they will find their positions on it, and they will find those positions based on those numbers. So this is very similar to what we have in 2D or 3D coordinate systems.
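A self-organising map works by giving every node of a grid (a line, or a rectangle) a weight vector. It then repeatedly pulls the best-matching node and its grid neighbours towards each input, so that similar texts end up in nearby cells. The following is a minimal pure-Python sketch. Its Gaussian neighbourhood, linearly decaying learning rate and radius, and random initialisation are illustrative assumptions, as is the name `train_som`; none of them describes the configuration used in Xenotheka. In practice, the input vectors would be the texts' word-count columns, normalised for text length.

```python
import math
import random


def train_som(vectors, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a small self-organising map; return each input's (row, col) cell.

    vectors: equal-length lists of floats, e.g. length-normalised word counts.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One randomly initialised weight vector per grid node.
    nodes = {(r, c): [rng.random() for _ in range(dim)]
             for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2

    def best_match(v):
        # The node whose weights are nearest to v (squared Euclidean distance).
        return min(nodes, key=lambda n: sum((w - x) ** 2 for w, x in zip(nodes[n], v)))

    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks, floored at 0.5
        for v in rng.sample(vectors, len(vectors)):  # inputs in shuffled order
            br, bc = best_match(v)
            for (r, c), w in nodes.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2 * radius ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
    return [best_match(v) for v in vectors]
```

Calling it with `rows=1` gives the "library in a line". A larger `rows` gives the rectangular library, where each text's returned cell is its position on the map.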

So they find their place in a specific space. And through this we can see how close they are to each other, according to the vocabulary they have.
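One common way to measure how close two texts are "according to the vocabulary they have" is the cosine similarity of their word-count vectors. It compares the proportions in which words are used rather than the raw length of the texts. The episode does not state which distance measure the instrument uses, so treat this Python sketch as an illustrative alternative rather than the method itself.

```python
import math
from collections import Counter


def cosine_similarity(counts_a, counts_b):
    """Cosine of the angle between two word-count vectors.

    1.0 means the words are used in identical proportions; 0.0 means no shared words.
    """
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as unrelated to everything
    return dot / (norm_a * norm_b)
```

For example, `cosine_similarity(Counter("love is love".split()), Counter("love love is".split()))` is 1.0 (up to rounding), because word order plays no part in this measure.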


SEBASTIAN MICHAEL:

And so the important thing here is that the instrument – I keep calling it the instrument because you don't have an actual name for it, or do you have a name for it?


MIRO ROMAN:

Yes, it's an algorithm. The algorithm is called a self-organising map. In my specific case I use this algorithm and put it into a library which I call Xenoteka, and the search engine for this library is called Alice, Alice_ch3n81. But these are very specific instruments. Just as ChatGPT is called ChatGPT and DeepSeek is called DeepSeek, I have something called Alice, but the scope of Alice is much smaller. These big players have one to two billion tokens, and I have twenty to thirty thousand tokens.


SEBASTIAN MICHAEL:

Yes. But the essential principle here is, of course, that the instrument does not understand the text. It does not try to make semantic sense of it. Much as you say, when we use any of these machine learning algorithms, they don't actually make sense of the text. They use a great deal of data to compare entities that are already out there, and then come up with similar or relevant responses.


MIRO ROMAN:

Yes, this is what I fell in love with in these things: the patterns, the statistics and repetitions in texts, in a very interesting way, indirectly and implicitly, just through the patterns, somehow capture something that we call 'meaning', something we call 'sense', something we call 'semantics', and so on.

But they don't capture this explicitly, with definitions and so on: they capture it through the way words relate to each other. They know, for instance, that these words relate to each other more often than those ones do. So with ChatGPT and all these LLMs, what they do is just predict the next word. And if you think about any language, English for instance has, I would say, around 100,000 words at most, or if we stretch it, 200,000 with all the technical vocabulary. Of these, we use around 20,000 in general, and in daily speech we use around 6,000 to 7,000 if we are good. So the span is from 7,000 to 100,000.

But even if you look at 100,000: choosing one out of 100,000 is not so complicated. It's not that you are choosing one in thirty billion; you are choosing one out of a hundred thousand. And if you have a kind of spectrum, you can very quickly remove eighty percent of these hundred thousand, and you are left with the remaining twenty percent. Then it gets quite subtle how you choose, and there is no right and wrong here. Some words are more probable, some less probable, but it's always statistics. In a sense, these machines have no clue how a human thinks. These are just patterns. And it's fascinating that these patterns are somehow able to capture this kind of soft stuff. For me this is quite beautiful.
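The idea of narrowing the choice of the next word to a small set of probable candidates can be illustrated with a toy bigram model. This is a deliberate simplification and an assumption for illustration: LLMs such as ChatGPT use neural networks over subword tokens and long contexts, not raw word-pair counts. The `top_k` cut-off stands in for "removing most of the vocabulary".

```python
from collections import Counter, defaultdict


def bigram_counts(tokens):
    """For each word, count which words follow it in the training text."""
    follows = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        follows[current][following] += 1
    return follows


def next_word_distribution(follows, word, top_k=20):
    """Keep only the top_k most frequent continuations and turn their counts into probabilities."""
    candidates = follows.get(word, Counter()).most_common(top_k)
    total = sum(n for _, n in candidates)
    return {w: n / total for w, n in candidates} if total else {}


tokens = "thy beauty and thy love and thy beauty".split()
follows = bigram_counts(tokens)
print(next_word_distribution(follows, "thy"))  # beauty ≈ 0.67, love ≈ 0.33
```

No rule says "beauty" is right; it is simply the more frequent continuation in the data the model has seen.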

And I think this is today's revolution in the way we approach everything. Until now we had a huge gap between the humanities and the hard sciences, precisely because we didn't know how to measure language. So the humanities were always on the side of religion, of art, of all the things to which we cannot put numbers. Now we can finally put numbers to all the words, and we can start doing mathematics with words.

But what is happening now is that we have actually destabilised science a little, because it's not that language became scientific; I think it's rather the other way round: science got another understanding of what it might mean to be scientific.

I think this will have deep implications for how our civilisation recalibrates what it means to do science. For me, for instance, an appropriate way of thinking about this scientifically would be to think of a measurement as an act of conversing, as a conversation, or as asking a question.


SEBASTIAN MICHAEL:

Yes. That brings me on to my next question: maybe you can explain a bit what you have done with this approach before, in your Play Among Books and in what you've used the methodology for at the chair, because that may lead us into why I thought it might also be applicable to the purpose I have in mind.


MIRO ROMAN:

So what I did was a funny experiment, already ten years ago now. I was doing a PhD in architecture, and I was interested in the question: how can we write about architecture if we have all the books in the world? So I went to the big libraries of the time, like Anna's Archive today or Library Genesis of yesterday, Memory of the World, the kinds of libraries that hold a lot of scanned books, and I downloaded, I think, a random twenty or thirty thousand books. So I had thirty thousand books, and I had no clue what they were about. I developed, or rather used, this algorithm called self-organising maps; we were at the chair of Ludger Hovestadt [at] ETH, all working on this in different ways, through images, through models, through texts and so on. I was working with text, and I said, okay, can I find in this library of thirty thousand books the books that talk about architecture, without ever opening them? That was an interesting question.

And within this question was the question of the image of a book: how you give a book, so to say, a fingerprint. But a fingerprint which is not… – one cannot think of a fingerprint as something that is only yours. Your fingerprint is always, in this sense, somehow in relationship to all the other fingerprints.

So in this space of twenty thousand books, if we create these dictionaries and if we know a few books which are about architecture... – you always need a specific set of data whose subject you know, and other data whose subject you don't know. Then, for instance, you look for keywords, you look for these affiliations, and so on. Through two or three distillations of this process, I was able to get six to seven hundred books that come closest to talking about architecture. So nothing is ever explicit.
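One plausible reading of the "distillation" described here, sketched under assumptions: each book is a word-count vector, unlabelled books are ranked by cosine similarity to the centroid of the known architecture books, and each round re-centres on the seeds plus the current selection. The exact procedure is not specified in the conversation, so the ranking rule and the `keep` and `rounds` parameters are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def distil(library, seeds, keep, rounds=3):
    """Rank unlabelled books by similarity to the known ones, over several rounds.

    library: book id -> word-count vector; seeds: ids known to be on-topic.
    Each round re-centres on the seeds plus the current selection.
    """
    unknown = [book for book in library if book not in seeds]
    selection = []
    for _ in range(rounds):
        centre = centroid([library[b] for b in list(seeds) + selection])
        ranked = sorted(unknown, key=lambda b: cosine(library[b], centre), reverse=True)
        selection = ranked[:keep]
    return selection
```

The result is a ranking by degree of similarity rather than a yes/no verdict, which fits the point that nothing here is ever explicit.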

I think this is the difference between today's world and yesterday's: yesterday's world was zero or one; today's is between zero and one.

Or, to think about it another way: yesterday's world was about finding the truth, and today's world is about talking about truth as an object. It just shifts a little. But that's it.

So the idea was to go from a collection of books you don't know towards a specific subset, and this worked wonderfully. I was able to recognise six hundred books on architecture in this collection. Then, out of these six hundred books, I said, okay, now I will derive six different profiles, six different approaches to architecture. So I clustered those books again with an algorithm, and I got six different profiles. For instance, one profile could be people who are interested in cities; one would be people interested in the history of architecture; one would lean more towards the theory of architecture; one more towards, I don't know, some kind of activism, the sustainability people; and another, for instance, more towards gender studies or feminism. You got these profiles, and then you could use those books to see what the main concepts were for this group or that group, and so on.
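The episode does not say which clustering algorithm produced the six profiles. As a stand-in, here is plain k-means in Python, a common choice for this kind of grouping; with `k=6` it would return six groups of books, each summarised by a centre vector whose largest entries point to that group's characteristic vocabulary. The function names are illustrative.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=100, seed=0):
    """Plain k-means: returns a cluster label for each vector and the k centres."""
    rng = random.Random(seed)
    centres = [list(v) for v in rng.sample(vectors, k)]  # k distinct starting points
    labels = None
    for _ in range(iterations):
        # Assign every vector to its nearest centre.
        new_labels = [min(range(k), key=lambda j: squared_distance(v, centres[j]))
                      for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Move each centre to the mean of its members.
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:  # keep the old centre if a cluster empties
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres
```

Naming the groups ("cities", "history", "theory") is still the interpreter's job, which is exactly the point made next.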

But the thing with these methods is always that you get words and you get data, but the meaning, the way you talk about it, comes from your own knowledge and your own ability to correlate these things.

So it's not objective in the classical sense, where you get something... – but I think actually it's always like this. It's always about the stories that you tell on top of the things that you see.


SEBASTIAN MICHAEL:

Of course it is. Yes. I suppose even with traditional statistical methods, if you count populations, if you count opinions in populations, and if you count behaviours, you will still have to interpret them and you still have to be wary of imposing your own intention on the data that you receive. I think that's inherent in analysis.

Now, there is a question I imagine somebody would have. When I had a conversation with Professor Gabriel Egan – and I refer to him a couple of times because he works with computational approaches to literature – I asked him whether he thought AI would change things greatly, and he said, well, there are two particular limitations to AI as far as this field of study is concerned.

The first one is that in literature, specifically in old literature, in this case the English Renaissance literature of the 16th and 17th centuries, there is limited data. We have only so many texts, and unless a treasure trove is found with another thousand plays or another five hundred collections of rich poetry, that's the stock we have: we have x number of words from that period that are known, and we don't expect to find huge additional libraries. Nobody assumes we will go digging and find a richly equipped library with actual texts that we can understand, so there's a limit to how much data there is. And when you work with the sonnets specifically, or with pieces of poetry such as the sonnets, you have the additional problem that they are very short texts.

But interestingly – and this is the more salient point in the context of what we're talking about – he says the other problem AI poses for scientifically minded research, for the analysis and study of literature, is that you cannot explain what it does. It doesn't show you its steps.

In other words, if you take the plays of William Shakespeare and any program that counts the number of times certain words appear, you can count them and say: this is what I've done. I've counted the words, and these are my parameters: I want words that appear at least five times but at most twenty times. That's the next step, and you can explain it. Then you can say these words appear in these plays and in these sonnets, so we have a correlation there, and that can be explained. The same goes for various other approaches, whether semi-manual and semi-computational or by writing a program. He says that his students, when they do tasks or projects, are not just encouraged but required to show their thinking and their programming. And they can, because they learn programming. They can say: here is the script I wrote, here is what the computer has been instructed to do, and this is what comes out of it.
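Egan's example of an explainable, step-by-step procedure can be written as a short script. The thresholds of five and twenty come from the example above; the function names and the tokenizer are assumptions for illustration.

```python
import re
from collections import Counter


def band_words(text, low=5, high=20):
    """Words occurring at least `low` and at most `high` times in `text`."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return {word for word, n in counts.items() if low <= n <= high}


def shared_band_words(text_a, text_b, low=5, high=20):
    """Band words two texts have in common: a transparent, checkable correlation."""
    return band_words(text_a, low, high) & band_words(text_b, low, high)
```

Every step here can be stated and checked by someone else, which is the transparency Egan asks of his students.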

And then he says, with machine learning and with AI, the problem is we don't know how it does it. It does it, but we don't see how exactly.

So the question, if we are going to use a machine learning approach like this, has two parts. First, how reliable would we say it is? Because we are not telling the instrument how to do this: when we give the instrument our texts, we instruct it to compare each text with all the others. We put certain texts in a mix with each other and let it run several times. But we don't say, go and look for this theme or that theme, or for this vocabulary or that. So the two parts are: how reliable or dependable is this approach; and if somebody listened to this and said, well, I want to see whether I come to similar results, how would they replicate it? What would somebody have to do to carry out essentially the same experiment, if we want to call it that? But I have a feeling we're in slightly the wrong mindset there.


MIRO ROMAN:

On the first part, the data limitation: of course, yes, these things work better the more data there is. All AI is based on data, and the more data you have, the better things get. But now another aspect is becoming interesting. LLMs like ChatGPT now work with the totality of text: they are trained on practically all the texts that have ever been produced.

What becomes interesting now, and I think this will become more and more important, is how you gather data and which data you select. Of course, if there is no data you have a problem, and then you can use synthetic data and so on. But I think figuring out which data you want will become an important new field, because this creates the profile.

So let's not look at this as a limitation. This is the core of the thing: how do you gather data, and which data do you take? And then what you can do, for instance, is get very specific data on things.

Like we did: we took only Shakespeare's work. One way is to look at Shakespeare's work as a world in itself; then the problem is how much of it you have. But the other way is to look at Shakespeare's work in different contexts. You can look at Shakespeare in the context of ChatGPT, which is the whole of humanity, or at least its Western side. Or you can look at Shakespeare in the context of the whole Renaissance; or of the whole Renaissance plus, for instance, the Enlightenment. Or you can look at the whole Renaissance and, for instance, just at the relationship between Shakespeare and Homer.

So I think the way we gather data and what we consider a data set will become important, because it can give a lens or a focal point to big LLMs. Today, for instance, writing a nice sentence is no longer a problem, which sounds strange, especially to people like you who are writers. But machines can write nice sentences. Now, I would say, the writer of the 21st century is someone who puts together constellations of different AIs and different systems, through which they can get something they like.

The other part of the question is how we can tell whether this is correct, or how we can evaluate it, when we don't know the process. The thing is that these processes are, in principle, very simple. It's just that they are done on so many elements simultaneously that we cannot follow the scale. We know in principle exactly what's happening, but when this ‘exactly what's happening’ is scaled to millions or billions or trillions of elements, all crisscrossed together, these strange things happen. And this is simply because we cannot follow them mentally.

But what we can say is: okay, this cloud is similar to that cloud, not a hundred percent, but eighty percent. These things are always probabilistic.

Two examples. There was Kolmogorov, who worked on probability and statistics, and he asked the question: how much information is contained in Anna Karenina? That's an interesting question. And he came back with an interesting, slightly disturbing answer. He said the amount of information contained in Anna Karenina is equal to the smallest amount of information a machine needs to reproduce the text. So the amount of information in Anna Karenina would be, roughly, the amount of information you send to a printer.
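Kolmogorov's measure (the length of the shortest program that reproduces a text) cannot be computed exactly, but the size of a compressed file gives an upper bound on it, up to a constant. The sketch below uses Python's standard `zlib` module as a practical stand-in; it is an approximation, not the measure itself, and the sample strings are arbitrary.

```python
import zlib


def compressed_size(text):
    """Bytes zlib needs (at maximum compression) to store the text.

    This upper-bounds the text's Kolmogorov complexity, up to a constant
    (the size of the decompressor), and is only an approximation of it.
    """
    return len(zlib.compress(text.encode("utf-8"), 9))


repetitive = "ab" * 500                              # 1,000 characters, one repeated pattern
varied = " ".join(str(i * i) for i in range(300))    # similar length, little repetition
print(len(repetitive), compressed_size(repetitive))
print(len(varied), compressed_size(varied))
```

The repetitive string shrinks to a handful of bytes while the varied one needs far more: "information" here is purely mechanical, with meaning left entirely out of the picture.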

This is interesting in the sense that the computer can capture something that is Anna Karenina, in whatever sense. But the problem is that Anna Karenina then becomes a function: a combination of letters that makes up its text, and everything else is lost. It becomes a kind of dry engineering thing.

The other interesting figure of that period, Claude Shannon, relaxed the whole situation, because he said there is a difference between transmitting a message and what the message means.

So meaning is out of the game, and if meaning is out of the game, all the humanists can relax, because all the nice stuff the humanists like is not in jeopardy. We can do acrobatics. We can do all these strange things with information and data: cut it, move it this way and that. But the meaning is intact, because the meaning is always in the context. The meaning is with the person who reads, with the knowledge that person has, with the culture in which that person grew up, and so on. Meaning is always with the one who sends the message and the one who receives it. But the message itself is meaningless: if people don't know how to read, they will not get the message.

And the message and the meaning have an interesting relationship. When I was researching the notion of truth a little (where is the truth of an object?), one of the most interesting articulations I found was that the truth of an object lies between the object and the group of experts who look at it. So it depends. Of course it depends: different cultures and different people will give different meanings to things. Nevertheless, the object stays the same. But you cannot say the meaning has nothing [to do] with the object. Of course it has something [to do] with the object, because the object has a certain material, but the meaning is a different beast. It lives in a different space. So how we justify these things, or give them credibility, is really an open question.

So these are not spiritual methods. They are mathematical methods, so they are precise. The difficulty in understanding them is just the scale at which they work: the correlations that can be made across a lot of data are just beyond… – our brains cannot follow this logic. But the beauty is that computers can.

But then we need to situate these things in contexts that are familiar to us, and through our knowledge we can tell other stories. So I would look at this computational work, and for instance at this experiment we did with Shakespeare, more in terms of: can we tell another interesting, unexpected story through what we did, and somehow contribute to the existing discussion?

Because we cannot claim that, in our case with Shakespeare, this experiment is precise down to the year: 1591, 1592, 1593. I think even if you asked Shakespeare, if he were alive, he would have trouble answering that question. It would be like asking: why do you love a particular person? You have a lover, a person you want to spend your… – why do you love this person? You cannot answer this question. I don't know why; I just love the whole thing about this person. Or if you're a writer, it's not that you write from five to six. You write all the time. Even when you sleep, you are with these things. So I think if we asked Shakespeare which sonnets are affiliated with what, he would say, you know, these sonnets are from here, those sonnets are from there, and then I wrote this one at this time and that one at that time, and I moved it here and I moved it there. So it would be a very complicated story.

Shakespeare was working in a theatre. He had collaborators. So is ‘Shakespeare’ really a name only for him, or is it a name for the whole group that collaborated with him? I would say the answer is, again, twofold: it's both. Sometimes it's him and sometimes it's the group. I don't know, what do you think?


SEBASTIAN MICHAEL:

Oh, this is one of the big, big questions that is discussed a great deal: how much of Shakespeare is really Shakespeare? Well, it comes in two layers: on the one hand, how much of Shakespeare's works were actually penned by him, and how much of the works were direct collaborations where somebody else actually wrote part of a play? And we seem to be finding more and more that a lot of these plays were collaborations, even plays we previously had thought of as being entirely canonical and entirely Shakespeare's.

But even the plays, or even the acts that Shakespeare himself pens and that are completely from his own hand, are, of course, immersed in a culture of collaboration and of other playwrights’ writing. And we know that he liberally quotes other people not because he wants to reference them, but because he hears a good sentence and he uses it.

And it's such a dynamic environment. It's also, of course, the very beginning of Modern English: a time when English comes into its own, and he's working in an extremely dynamic city. London in the 1590s grows from about 150,000 people to 200,000. The average age is something like 25. It's full of young people. It's a very happening place. It is absolutely not stale, not ancient, not old-fashioned. It's really happening. And these playwrights know each other, because at any given time there are maybe a dozen or so who are really working, and they work with each other. They have competing companies, but sometimes an actor moves from one company to another, then two companies get folded into one, and then one company starts using another company's playwright.

And the plays are owned by the company, not by the author. So when Shakespeare writes a play, it's owned by the Lord Chamberlain's Men, who later become the King's Men. So the company owns the play, and his name as the author of the pieces that people flock to… – and they flock to the theatre in their thousands, which is difficult for us to appreciate. The Globe Theatre in London today has a capacity of, I think, about a thousand, even though it's built on the model of the original Globe. But the original Globe, we believe, had a capacity of some two thousand, maybe even three thousand people. It was huge. It was like an arena, and it was filled several times a week.

And so it was an unbelievably popular art form and form of entertainment in an extremely dynamic environment. And so… –


MIRO ROMAN:

If we were to put it in today's terms, this would be on the level of Netflix. These were things people enjoyed; they were not considered some kind of crazy high culture, something a bit rarefied that people cannot penetrate. This was for everybody.


SEBASTIAN MICHAEL:

Absolutely. And the great marvel of William Shakespeare in particular, but also of some of his contemporaries like Marlowe, is that they managed to write for the masses, for people who had very little schooling, who couldn't read, and who did not consider themselves literary-minded. They were able to write for those people and at the same time for the court and for the very highly educated, people with university educations who appreciated wordplay, witticisms, and double and triple meanings in the language. It was an extraordinary skill to combine these factors.

But to circle back to the point you made: they are, of course, immersed in an environment in which language is being born and created. So today we credit William Shakespeare with contributing x number of words to our vocabulary, and it's quite a large number for a single individual. But we slightly make the mistake of thinking that he thought of all these words by himself. He was, of course, in a culture where these words were suddenly becoming currency.

And that is absolutely true. What is also absolutely true is something you mentioned earlier, which we need to bear in mind with everything we're discussing: William Shakespeare might be amused by our approach. He does not write a sonnet on a given date so that it stays there, fixed in eternity, belonging forever to the 12th of April 1594. He is, of course, creating a body of work, and particularly as it comes towards publication, he may well go back to the texts and rework them; in fact, even what we did seems to point towards there being such a phase.

And so it may well be that a sonnet was written at a particular moment for a particular reason, be it emotional, be it external, be it because he needed it or because he wanted to express something. There are big debates about the extent to which it is lived experience that prompts him to write these sonnets, and the extent to which it's just poetic imagination, but that's a whole separate issue.

But we can definitely assume that a poem written at a certain moment for a certain reason, whatever that reason is, later gets absorbed into the collection and reworked and reimagined, because at that point in the proceedings something else takes priority. For example, he may decide: actually, this would sound better if I used this word or this rhyme instead. And so the idea of dating sonnets may in itself be misguided to some extent.

The reason it interests us is that for a long time it had been assumed that the sonnets belonged to an early phase of William Shakespeare's writing, probably around 1592–93, or 1594 at the latest, and then stopped.

And one of the reasons this was the predominant view was that for a long time, especially in the 19th and 20th centuries, and particularly during the Victorian and immediately post-Victorian era, people were extremely uncomfortable with the sonnets, with their very existence, because they speak of William Shakespeare's love, affection and admiration for a young man.

And so one way of dealing with this was to say, well, that was a short period in his life. It was a folly. It was something that a very young Shakespeare…


MIRO ROMAN:

…got confused a little bit…


SEBASTIAN MICHAEL:

…got a bit confused. Yes. And we can forgive him that because later he wrote the great plays, he wrote Hamlet and Othello and Macbeth and King Lear, and that kind of redeems him, so we can forgive him this folly.

And then, interestingly, building on the work of earlier researchers, it was MacDonald P Jackson who was the first person to state categorically that, as far as he can tell, some of the sonnets stretch right into the early 1600s, so that their creation spans a much longer period than had been assumed.

And that is significant for two reasons that come to mind straight away: one, that William Shakespeare would, if that's the case, have been writing sonnets over a much longer period; and two, that the relationships they deal with would have lasted much longer, which makes them much more significant. If a friendship or relationship of whatever nature, whether a physical relationship or one that was at one point physical and then wasn't, continues over several years, that inherently makes it a more significant connection than if it lasts only two or three months, or a year or two.

And you mentioned the example of Netflix a moment ago, and indeed any comparable setup that produces these dramas. Take something like Slow Horses, which I more or less binge-watched the other day because I thought it was so good. Have you seen it? It's absolutely wonderful. It is so beautifully created, it has so much character and so much intelligence, and it's so entertaining as well. The language is great and everything, and it's not a single-author piece, of course. The idea may stem from, I don't even know what the source material is, but certainly if you look at the credits, at who wrote these episodes and who directed and produced them, there's always a team. It's never just the one screenwriter.


MIRO ROMAN:

That’s why I think that when we look at these things, the approach should be this. You have some kind of thing, be it Slow Horses or Shakespeare or whatever, and if you look at it from the perspective of 18th-century, Enlightenment authorship, you get one story. You could argue, yes, of course we can find the people who are most relevant to the success of this story. That's one way of looking: trying to find the person you think contributed most. And because this person has 32% of the authorship by your definition, and everyone else has less than 32%, you would say, okay, it's the one with 32%, so he is the author. This would be an Enlightenment understanding of the author, in the mathematical manner of, for instance, the structuralists, or of Chomsky.

But if you take another perspective and say, okay, let's look at it more as a cloud, as a brand and so on, you get a completely different narrative of what this thing is about. Neither of these perspectives is here to be judged as the better one; otherwise you are again playing the Enlightenment game where you say this method is more accurate than that method. Rather, let's look at it like this: this method gives the object one face, and that method gives it another. The more methods we have to communicate with this object, the better the object gets and the better the stories it will tell us. I think this is a strategic move that lets you incorporate all the different perspectives, and things become, in my understanding, more capacious and rather nicer than if you try to get at a kind of truth of the thing, which looks a little obsolete today.


SEBASTIAN MICHAEL:

Yes. And incidentally, I can't remember off the top of my head who said or wrote it, but somebody pointed out to me that, great as Shakespeare undoubtedly is (and we know he's great because he keeps speaking to us, we keep relating to him, and he gets performed in all the languages across the world four hundred years after his death, which means he obviously has something to say), notwithstanding all that, one of the reasons we may think of him as being as great as he is, is that we think of him as great, and that we have so much of him.

We have such a big body of work, such a large quantity of words, so many plays by him, that he dominates the English Renaissance. And because his texts survive, thanks to the First Folio, his work becomes almost a centre of gravity, a gravitational object like a little planet, or a big planet, or a little sun around which the others circle. If the First Folio hadn't been published posthumously by his friends, and we only had the quartos, some good, some bad, we would lack half of his plays, I believe, and he might never have reached the status he has: not because he wasn't good, but because we wouldn't know of him; we wouldn't have the material.


Now, I think what I want to do is move a little towards talking about what I've done, or what we've done together, and what I then did with the renderings you made for me; it's really excellent to be able to see this in the context of our discussion.

We wanted to compare how the sonnets relate to William Shakespeare's plays. We adopted and refined an approach previous researchers had already used. One of the problems with the sonnets, as I mentioned earlier, is that they're very short texts. Each consists of fourteen lines of ten or eleven syllables, so individually they are too short to compare with the plays. But what people did before us was to group the sonnets into zones, and the first thing I did was divide some of these zones into sub-zones so that each zone or sub-zone would have a similar, comparable number of words, because the original zones, as they had been defined, were of unequal sizes.

And then we also treated the plays not as complete plays but as acts because, roughly speaking (and this is of course not a precise thing), a zone of sonnets, a group of perhaps 15 to 20 sonnets, is reasonably comparable in size to an act of one of Shakespeare's plays. So we divided the plays into the established acts. These are not actually authorial: the act divisions were imposed on the plays later, but they are given, and people today generally accept them as the way we stage or read these plays. And we grouped the sonnets into zones of comparable size.

And that then gives us a number of files that we have that we can compare with each other, within reason.
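The grouping step described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions of our own: the function name `split_into_subzones`, the greedy strategy of adding consecutive sonnets until a target word count is reached, and the rule that folds a short remainder into the previous group are all hypothetical, not the procedure actually used in the study.

```python
from typing import List


def split_into_subzones(sonnets: List[str], target_words: int) -> List[List[str]]:
    """Group consecutive sonnets so that each group holds roughly target_words words.

    Hypothetical greedy strategy: keep adding sonnets until the running word
    count reaches the target, then start a new group.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    count = 0
    for sonnet in sonnets:
        current.append(sonnet)
        count += len(sonnet.split())
        if count >= target_words:
            groups.append(current)
            current, count = [], 0
    if current:
        # Fold a short remainder into the previous group instead of leaving a tiny chunk.
        if groups and count < target_words // 2:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return groups


# Usage with dummy 100-word "sonnets" (placeholders, not real text).
dummy = ["word " * 100] * 23
print([len(g) for g in split_into_subzones(dummy, 1000)])  # [10, 13]
```

The remainder rule is one way to avoid producing a chunk much smaller than the others, which matters for the reason Miro gives next.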


MIRO ROMAN:

Just to add to this: this is important because the discrepancy between the size of one sonnet and one play is big. For the algorithms, it's important to have similar sizes. They don't have to be the same, but if, for instance, they are all between, I don't know, one thousand and one and a half thousand words, that's perfect. But if one is twenty words and another five thousand, then the small ones will always cluster together: the size will dominate the clustering. If we bring them to relatively similar sizes, the relevance of size becomes much smaller. And we got to around two hundred elements: when we cut up the texts and grouped the sonnets, we got 203, I think, different small texts.
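Miro's point about size can be seen with toy numbers. The sketch below is an illustration under our own assumptions: it uses plain Euclidean distance between raw word counts, with invented counts for three tiny "texts". Two short texts with different vocabulary come out close together, while a short and a long text with identical word proportions come out far apart, so length dominates. Converting counts to relative frequencies (the hypothetical `relative` helper, a common remedy but not the one described here) removes that effect; the speakers instead equalised the sizes of the chunks themselves.

```python
import math
from collections import Counter


def euclidean(a: Counter, b: Counter) -> float:
    """Euclidean distance between two word-count vectors (missing words count as 0)."""
    vocabulary = set(a) | set(b)
    return math.sqrt(sum((a[w] - b[w]) ** 2 for w in vocabulary))


def relative(counts: Counter) -> Counter:
    """Convert raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return Counter({w: n / total for w, n in counts.items()})


# Hypothetical toy counts, not data from the study.
short_love = Counter({"love": 2, "time": 1})
long_love = Counter({"love": 200, "time": 100})  # same proportions, 100 times longer
short_war = Counter({"war": 2, "blood": 1})      # different words, same length

print(euclidean(short_love, short_war))  # small: the two short texts look alike
print(euclidean(short_love, long_love))  # large: length dominates
print(euclidean(relative(short_love), relative(long_love)))  # 0.0
```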


SEBASTIAN MICHAEL:

Yes. And, this is important to note as well, we included the narrative poems: alongside the plays and the sonnets, we included Venus and Adonis, The Rape of Lucrece, The Phoenix and the Turtle, and A Lover's Complaint, the last of these as a separate text even though, of course, it was published in the same volume as the sonnets.

And what we then tried to look for was what I keep calling affinities. You described this very well earlier: you said we effectively project one piece's vocabulary onto another's, or one book's vocabulary onto another book's. And we want to create a picture, almost a sort of clustering. And the instrument puts out these hit maps…


MIRO ROMAN:

They show a set of data, and in this particular map they show how some of the data contrasts with the rest. They say, okay, there is something about this data: either a higher number of elements or whichever measure you took. In our case, it was the frequency with which words are repeated. It's important to know that the basic counting is very simple: we just count how many times each word appears. It's also important to note that if, for instance, we have the words ‘play’, ‘plays’, ‘playing’ and ‘played’, we condense all of them to ‘play’. So we lose a little bit, because some are nouns, some verbs, some are in the past or the future tense, and so on. But this we are willing to sacrifice; it was a decision we made. When working with data, you always have to normalise it: you have to do something with it, polish it a little, and so on. There is always a certain amount that you lose. But that's how it is.
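A rough sketch of the counting and condensing step follows. The conversation doesn't say which tool reduced ‘plays’, ‘playing’ and ‘played’ to ‘play’, so this sketch swaps in a deliberately crude suffix-stripping rule of our own (`crude_stem`, a hypothetical helper); an established stemmer or lemmatiser would handle far more cases. It also shows the kind of loss Miro mentions: the rule mangles some words, for example turning ‘indeed’ into ‘inde’.

```python
import re
from collections import Counter

# Crude suffix stripping, checked in this order. This is a hypothetical stand-in
# for a real stemmer or lemmatiser, used only for illustration.
SUFFIXES = ("'s", "ing", "ed", "s")


def crude_stem(word: str, min_stem: int = 4) -> str:
    """Strip the first matching suffix if at least min_stem letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def word_frequencies(text: str) -> Counter:
    """Lower-case the text, split it into words, condense word forms, and count them."""
    tokens = (t.strip("'") for t in re.findall(r"[a-z']+", text.lower()))
    return Counter(crude_stem(t) for t in tokens if t)


print(word_frequencies("Play, plays; playing! PLAYED."))  # Counter({'play': 4})
```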


SEBASTIAN MICHAEL:

Yes. And I think that would be considered to be normal practice, I imagine, for this sort of approach.

So first of all, we ran everything together. In other words, we put all the sonnets, all the plays and all the other texts together. And we found, much as expected, that the sonnets cluster together and the plays cluster together. But because we wanted to see whether different zones of sonnets behave differently, we took the different zones of the sonnets and ran them separately. And we did this three times for each zone so as to get a little bit of consistency, so that we didn't have one-off hits: we wanted to know whether a pattern emerges.

And then I took these hit maps and I looked at which plays settle where around each zone.


MIRO ROMAN:

Just to give some context for what happens: we took these 203, or roughly 200, different small texts, and this gives a kind of grid that looks, let's say, like a chessboard. On top of this chessboard we projected all the texts, and then we looked at which texts come closer to each other: which texts ‘like’ each other, so to speak. This is based on the word counts from the dictionaries.
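The conversation doesn't name the algorithm behind the grid, but projecting texts onto a chessboard-like grid so that texts with similar word frequencies land in nearby cells is what a self-organising map (Kohonen map) does. Assuming, for illustration only, that the instrument works along those lines, here is a minimal NumPy sketch. The function names, grid size, learning schedule and the three-word toy vectors are all our own assumptions; re-running with a different `seed` gives a different rendering, which is one reason to repeat each run.

```python
import numpy as np


def best_matching_unit(weights: np.ndarray, x: np.ndarray) -> tuple:
    """(row, col) of the grid cell whose weight vector is closest to x."""
    sq_dist = ((weights - x) ** 2).sum(axis=-1)
    row, col = np.unravel_index(np.argmin(sq_dist), sq_dist.shape)
    return int(row), int(col)


def train_som(data: np.ndarray, rows: int = 6, cols: int = 6,
              iterations: int = 2000, seed: int = 0) -> np.ndarray:
    """Train a basic self-organising map; returns weights of shape (rows, cols, features)."""
    rng = np.random.default_rng(seed)  # a different seed gives a different rendering
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every cell, shape (rows, cols, 2).
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    sigma_start, sigma_end = max(rows, cols) / 2, 0.5  # neighbourhood radius
    lr_start, lr_end = 0.5, 0.01                       # learning rate
    for t in range(iterations):
        frac = t / iterations
        sigma = sigma_start + (sigma_end - sigma_start) * frac
        lr = lr_start + (lr_end - lr_start) * frac
        x = data[rng.integers(len(data))]
        bmu = np.array(best_matching_unit(weights, x))
        # Cells near the best-matching unit on the grid are pulled towards x the most.
        grid_sq_dist = ((grid - bmu) ** 2).sum(axis=-1)
        influence = np.exp(-grid_sq_dist / (2 * sigma ** 2))
        weights += lr * influence[..., None] * (x - weights)
    return weights


def project(weights: np.ndarray, data: np.ndarray) -> list:
    """Place each text in the cell of its best-matching unit."""
    return [best_matching_unit(weights, x) for x in data]


# Hypothetical relative word frequencies over a three-word vocabulary.
texts = np.array([
    [1.00, 0.00, 0.00], [0.90, 0.10, 0.00], [0.95, 0.00, 0.05],  # group A
    [0.00, 0.00, 1.00], [0.00, 0.10, 0.90], [0.05, 0.00, 0.95],  # group B
])
positions = project(train_som(texts), texts)
print(positions)
```

With these toy vectors the texts of group A should land in cells near one another and away from those of group B.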

So parts of the texts that have a similar dictionary and similar word frequencies come and cluster closer together. Then, for each zone, we looked at which other texts came close to it. For instance, for Zone 3 in the first run we had The Two Gentlemen of Verona, Act III, Venus and Adonis, and The Comedy of Errors. Then, to check in a little more detail, we did a second run, and again Venus and Adonis was the closest; we took the three closest ones, which were Venus and Adonis, A Midsummer Night's Dream and The Rape of Lucrece. By this we get a feeling for where each zone feels most comfortable. And then we did this for all the zones.
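Picking the closest texts to a zone in each run, and then checking which ones recur across runs, can be sketched as follows. The function names, the use of Chebyshev ("chessboard") distance on the grid, alphabetical tie-breaking, and the placeholder text names and positions are all assumptions for illustration; they are not results from the study.

```python
from collections import Counter
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def grid_distance(a: Cell, b: Cell) -> int:
    """Chebyshev distance: diagonal neighbours count as adjacent."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def closest_texts(run: Dict[str, Cell], zone: str, k: int = 3) -> List[str]:
    """Names of the k texts nearest to `zone` in one run (ties broken alphabetically)."""
    origin = run[zone]
    others = sorted(
        (name for name in run if name != zone),
        key=lambda name: (grid_distance(origin, run[name]), name),
    )
    return others[:k]


def recurring_neighbours(runs: List[Dict[str, Cell]], zone: str, k: int = 3) -> Counter:
    """Count in how many runs each text appears among the k nearest to `zone`."""
    tally: Counter = Counter()
    for run in runs:
        tally.update(closest_texts(run, zone, k))
    return tally


# Placeholder grid positions for two runs (invented, not results from the study).
runs = [
    {"zone_3": (2, 2), "text_A": (2, 3), "text_B": (3, 3), "text_C": (0, 0), "text_D": (4, 4)},
    {"zone_3": (5, 5), "text_A": (5, 6), "text_B": (0, 0), "text_C": (9, 9), "text_D": (6, 5)},
]
print(recurring_neighbours(runs, "zone_3"))
```

Here `text_A` and `text_C` are among the three closest in both runs, while `text_B` and `text_D` each appear once; texts that recur across runs are the ones worth taking seriously.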


SEBASTIAN MICHAEL:

Exactly. What we receive is a visual output: almost grid-like hit maps in which we can see the clusters, and the clusters are quite clear in the sense that they immediately form a picture.

But to be able to analyse this more clearly, I then applied a scoring system, counting how often each play settles around each zone of the sonnets. The assumption being that if in three renderings a text… Let's take Zone 3, actually, because it's interesting. What we find is that Zone 3 of the sonnets has closest to it Romeo and Juliet, a play, and Venus and Adonis and The Rape of Lucrece, both narrative poems; then, not quite so close but also in its vicinity, we find A Midsummer Night's Dream, Love's Labour's Lost, The Two Gentlemen of Verona, which you mentioned, and Troilus and Cressida.

So these are the plays that cluster around Zone 3. We did this for all the zones, three times for each zone. And because what we get is a two-dimensional map, I determined a way of measuring proximity. You could argue about it, but it's not entirely arbitrary; there is a logic behind it. My rationale was that the instrument puts out cells: you have a cell with the title of a zone of the sonnets and, around it, the titles of the plays and of the other texts, always in cells, almost like a grid. My contention is that if you have a zone of the sonnets, say Zone 3, then directly around it there are eight cells that can be occupied by other texts. In only one or two cases was the zone's own cell also occupied by another text; that would be the closest proximity you can get. Then, in the next 'circle of affinity', as I would call it, you have 16 cells that could be occupied, and in the third circle around it you have 24. You could continue this, but I didn't, because beyond three circles I felt the proximity was no longer really of interest.
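The cell counts quoted above follow from the geometry of a square grid, assuming a cell's circle is given by its Chebyshev distance from the zone's cell at (r_0, c_0), so that diagonal neighbours count as adjacent (which is what yields eight cells in the first circle). Circle k is the (2k+1) × (2k+1) square around the zone minus the (2k−1) × (2k−1) square inside it:

```latex
k(r, c) = \max\bigl(|r - r_0|,\; |c - c_0|\bigr)

N_k = (2k+1)^2 - (2k-1)^2 = 8k
\quad\Rightarrow\quad N_1 = 8,\; N_2 = 16,\; N_3 = 24
```

Together the three circles offer 8 + 16 + 24 = 48 cells, that is, a 7 × 7 block minus the zone's own cell.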

To give this a numerical weight, I then quite literally counted the occurrences of each other text's title around each zone of the sonnets and gave them a score, so that a text appearing directly next to Zone 3 would get a high score. I did this three times for each zone and then created tables counting how often these texts appear in the vicinity of the zones of the sonnets, and gave them a weight, if you like: a weight of proximity, determined by how many opportunities a text would have to settle in each of these three circles of affinity.
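The exact weighting isn't spelled out here, so the sketch below takes one plausible reading as an explicit assumption: an appearance in circle k counts for the inverse of the number of cells in that circle (1/8, 1/16, 1/24), a text sharing the zone's own cell counts 1, and anything beyond the third circle counts nothing. The names `ring_weight` and `proximity_scores`, the input layout and the toy grid positions are hypothetical. Exact fractions keep the arithmetic easy to check by hand.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
MAX_RING = 3  # only the first three circles of affinity are scored


def ring(a: Cell, b: Cell) -> int:
    """Which circle of affinity b lies in around a (0 = same cell)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def ring_weight(k: int) -> Fraction:
    """Hypothetical weight: the inverse of the number of cells in circle k (8k cells).

    Sharing the zone's own cell (k == 0) is treated as a single opportunity.
    """
    return Fraction(1) if k == 0 else Fraction(1, 8 * k)


def proximity_scores(runs: List[Dict[str, Cell]], zone: str) -> Dict[str, Fraction]:
    """Sum the weights of every appearance of each text within MAX_RING of `zone`."""
    scores: Dict[str, Fraction] = defaultdict(Fraction)
    for run in runs:
        origin = run[zone]
        for name, cell in run.items():
            if name == zone:
                continue
            k = ring(origin, cell)
            if k <= MAX_RING:
                scores[name] += ring_weight(k)
    return dict(scores)


# Invented positions for two runs, purely to show the arithmetic.
runs = [
    {"zone_3": (3, 3), "text_A": (3, 4), "text_B": (5, 5), "text_C": (9, 9)},
    {"zone_3": (1, 1), "text_A": (1, 1), "text_B": (4, 1)},
]
for name, score in sorted(proximity_scores(runs, "zone_3").items()):
    print(name, score)
# text_A 9/8   (1/8 in the first run + 1 for sharing the cell in the second)
# text_B 5/48  (1/16 + 1/24)
```

`text_C` lies six circles away and so receives no score at all.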

Once this was done, I put the plays on a timeline, since we have a fairly well-established, accepted chronology for the plays, and placed the zones of the sonnets alongside the plays they cluster with. The resulting timeline shows, pretty much as expected, that the sonnets cluster strongly around Romeo and Juliet, A Midsummer Night's Dream, The Two Gentlemen of Verona and Titus Andronicus, and also a bit around King John and Love's Labour's Lost.

What we would expect is that Zones 1, 2 and 4 cluster around there; and what we would expect from the previous research is that Zone 3 shows distinctly different behaviour and reaches further into the later plays. And it really doesn't. What several zones of the sonnets do (not Zone 3, however, but Zones 1, 2 and 4) is show a larger-than-expected affinity with Timon of Athens, and I have no explanation for that at the moment. We did not expect that. Timon of Athens is considered to be a problem play; it's not considered one of the greats. And interestingly, some also group around The Tempest, which is firmly established as a late play of Shakespeare's, later even than the publication of the sonnets.

But Zones 1, 2 and 4 have sonnets that group around Timon of Athens and The Tempest as well as around the earlier plays, and Zone 1 in particular has a strong, heavy cluster around Timon of Athens. That's unexpected and interesting, but it doesn't show the same pattern as MacDonald P. Jackson's research, which suggests that Zone 3 should reach right into the 1600s. We can't confirm that.

We can't refute it either. Certainly, I don't think what we've carried out so far is strong enough to say categorically that the rare-word analysis is wrong. The rare-word analysis has found something that has been replicated; it's just that with our approach, projecting entire texts onto entire texts, we do not find that Zone 3 behaves differently.


MIRO ROMAN:

And then what? How would you argue about Timon of Athens and this cluster, which is kind of separated, on the other side? Can we make a story for this as well?


SEBASTIAN MICHAEL:

This is the big question, and I think it's precisely what we need to examine next: why there is this affinity between the sonnets and Timon, and indeed The Tempest. These are two things we don't have a story for yet, other than the possibility, which other scholars have put forward, that Shakespeare writes the sonnets, lets them rest, or continues writing for a while and then lets them rest, and then picks them up again because he feels it's time to put them together as a collection for publication.

And we have good reasons to assume that it was Shakespeare who put the collection together. We don't know whether it was he who wanted them published at that time, but the collection is curated in a way that suggests an authorial hand, or certainly somebody who understood the sonnets well and wanted them structured in precisely this way: it's not accidental or random.

This project and its website are a work in progress.
©2022-25  |   SONNETCAST – WILLIAM SHAKESPEARE'S SONNETS RECITED, REVEALED, RELIVED