Open Shakespeare Blog

Shakespeare Quarterly part II

Here, for those interested, is my response to Professor Andrew Murphy’s article in the Shakespeare Quarterly:

“I am a member of the Open Shakespeare Project (www.openshakespeare.org – not to be confused with Open Source Shakespeare) and found this article extremely interesting. I feel that your conclusion points towards many of the approaches to Shakespeare that our project incorporates, and that are part of a more ‘social’ approach to Shakespeare.

It occurs to me that as well as spreading Shakespeare to a far larger audience, cheap editions of Shakespeare are also a godsend for students, who may write their thoughts all over their pages without fear of ruining something expensive. If all these scribbles were collected, a formidable body of knowledge of Shakespeare would be available, as would an evolving record of responses to this writer.

Our site has recently acquired the ability for anyone to annotate Shakespeare’s works, and soon will add the capacity to attribute, tag, sort, and hide the annotations made. With this we hope to create an ‘open’ edition of Shakespeare’s plays that would grow along similar lines to Wikipedia, harnessing the power of the internet to bring many minds to bear upon a single subject.

Such problems as found with the OSS still pose difficulties for us: we have to use Moby as a source text since all others, including (lamentably) the wordhoard text, are under copyrights that conflict with our Open license. Nevertheless, just as textual problems are flagged up in a critical edition with a footnote, so too could such problems be drawn to the reader’s attention through annotation. As Whitney Trettien’s article points out, the web comes into its own when it is an ‘expressive medium’ itself, and not one which, like the OSS, unthinkingly delivers content.

Essentially, ISE already has this kind of thinking process, displaying an editor’s annotation on each text right down to the textual variants. It even has the ability to sort such annotations. However, the problems you identify – different kinds of editing, slow progress, uneven quality – all inevitably result, I feel, from the fact that each text only has a single editor. More editors would speed progress but it is not, of course, a given that more editors would improve quality. Wikipedia is still notorious for its occasional inaccuracies.

Nevertheless, such inaccuracies can be resolved by the same process that generates them. If anyone can annotate, so anyone can also review annotation and improve it. I realise that this is a rather utopian position and that people can as easily vandalise as beautify, but I feel it to be a more tenable one than that held by the websites here. The internet allows for unprecedented levels of input as well as appreciation, and such potential is not exploited by the sites reviewed in this article.

Talking of input and appreciation brings me to one further aspect of these sites that interests me, namely how easily one can print from them. The OSS shines in this respect, but attempting to print an ISE facsimile is rather more difficult. I must also admit that printing from an annotated text at The Open Shakespeare Project is currently impossible: the tool only went live fairly recently, and the site is still very much under construction. One day we hope to harness the accumulated and peer-reviewed annotations of many to produce a printed text, and thus complete a cycle between internet and ‘real world’ Shakespeare.

Such a cycle is ignored at the peril of digital scholarship, for it is the mix of real events and online responses to them that makes Facebook so addictive. Other addictive qualities, such as the relatively small time commitment and the chance to interact with other users, could be profitably replicated by internet Shakespeare projects. After all, anything capable of sustaining those involved in the long task of making productive use of Shakespeare is always welcome and need not be to the detriment of academic rigour.”

Here is the author’s reply:

James: thanks very much for this thoughtful and very interesting response to the review. I’ve had a quick look at your site and think it’s very interesting. It seems to me that you really are pushing forward with a Web 2.0 approach to things, making your site a good deal more interactive than the three I review here. I like the idea of building up a ‘database’ of annotations – and you’re right, of course: textual annotation might be a way round the problems of having to use an outdated source text. I still tend to worry about Wikipedia as a model, however. I always like to tell my students stories of humorous examples of deliberate tampering with Wikipedia, as a way of warning them off using it in their research (perhaps you may know what happened to Thierry Henry’s page, after France put Ireland out of the World Cup?). Will OSP be entirely ‘user governed’, or will you have some sort of ‘top down’ quality control mechanisms? Andy

The discussion raises some interesting issues. How bitesize and user friendly is our website? To what extent should ‘Open Shakespeare’ be user-governed? Any comments and suggestions you may have will be very welcome.


Posted: April 6th, 2010 | Author: James Harriman-Smith | Filed under: Community, Musings, Publicity, Technical, Texts | No Comments »

Annotation is here!

The fabled ability to annotate any text of Shakespeare is now part of the Open Shakespeare website! Massive thanks to Nick for all his work on something far too complex for me to even describe its complexity (apparently there were difficulties with there being ‘no TextRange in the DOM’).

Here’s how to get annotating:

  1. Click ‘read texts’ on the homepage.
  2. Scroll down to find your play of choice in the list and click on ‘annotate’.
  3. Find the line you wish to annotate, then highlight it, then click on the little notepad that appears.
  4. In the newly-present dialogue box, type your words of wisdom.
  5. Press enter to save your annotation and close the dialogue box.

Work has already begun on Hamlet, but feel free to annotate wherever you wish.

As to what you should write in an annotation, we currently have no guidelines: shorter is usually better, and, obviously, offensive comments will be removed – but apart from that, all insights and explications are very welcome.

Improvements to come include: restricting editing and deletion to the owner of each annotation, showing user information on annotations, the ability to filter annotations, and the capacity to use markdown in each comment.
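
For the technically curious, here is a minimal sketch of how owner-only deletion and filtering might be modelled. It is not the code behind the site: the `Annotation` fields, the `AnnotationStore` class and its method names are all hypothetical, chosen only to illustrate the planned behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """One note attached to a line of text (all field names are hypothetical)."""
    owner: str      # user who created the annotation
    play: str       # e.g. "hamlet"
    line_ref: str   # e.g. "1.ii.129"
    note: str
    tags: set = field(default_factory=set)


class AnnotationStore:
    """In-memory collection of annotations, for illustration only."""

    def __init__(self):
        self._items = []

    def add(self, annotation):
        self._items.append(annotation)
        return annotation

    def delete(self, annotation, user):
        # Only the owner may delete; anyone else gets a PermissionError.
        if annotation.owner != user:
            raise PermissionError(f"{user} does not own this annotation")
        self._items.remove(annotation)

    def filter(self, play=None, owner=None, tag=None):
        # Each criterion is optional; None means "do not filter on this".
        return [
            a for a in self._items
            if (play is None or a.play == play)
            and (owner is None or a.owner == owner)
            and (tag is None or tag in a.tags)
        ]
```

Editing could be restricted with the same ownership check that `delete` uses here.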


Posted: March 16th, 2010 | Author: James Harriman-Smith | Filed under: Community, News, Releases, Technical, Texts | No Comments »

Editions

There’s a famous line in Hamlet: “O that this too too solid flesh would melt” (1.ii.129). Not only is it the start of an agonised soliloquy in which Hamlet tortures himself over his mother’s apparent desire for her dead husband’s brother, but it is also a line over which many generations of scholars have wrangled. You see, there are several different editions of Hamlet: a first quarto printed in 1603, and then another in 1604, before the folio edition appeared in 1623. The quartos (so named for being the size of a quarter of a sheet of paper) would normally be used for any critical text because they are the earliest. Unfortunately, the quartos for Hamlet are so corrupt that they can’t really be trusted. Nevertheless…they still might contain passages that are more correct than the folio, composed after Shakespeare’s death, ever could be.

To return to that line of Hamlet: the folio has ‘solid flesh’, but the first quarto has ‘sallied flesh’, and the second quarto has either ‘sallied’ or ‘sullied’. Each variant changes the way we see Hamlet.

But what does this have to do with Open Shakespeare? Well, this little example shows how important it is to have a reliable text for each play, especially now that we will be annotating and one day producing critical editions from them. Currently, we have the Gutenberg text of the first folio, although, like many other first folios, this text is actually a hodgepodge of other first folios recomposed sometime in the 18th Century. We also have the Moby Shakespeare, so called for the man who produced the most widely circulated digital version of Shakespeare’s plays – but without saying what edition he used…

Having consulted with a few professors here in Cambridge (credit where it’s due: the info about composite folios comes from Prof. Kerrigan), it appears that there is a first folio actually in Cambridge. If we could find a way of digitising it, this would be a great benefit to Open Shakespeare, establishing, if not a ‘perfect’ text (which, once the Globe and Shakespeare’s own playtexts burnt down during a performance of Henry VIII, could never now be possible), at least one with some historical authority.

I have no idea how we will digitise the Cambridge folio, so any suggestions would be welcome. I heard once that a young Arthur Miller, in order to hone his play-writing skills, copied out almost all of Shakespeare’s plays by hand. So, if you’re an aspiring playwright with lots of time on your hands, do get in touch.


Posted: March 15th, 2010 | Author: James Harriman-Smith | Filed under: Musings, Technical, Texts | 1 Comment »

XML and the Natural Language Toolkit

I’ve been playing with the nltk (natural language toolkit) and the really useful Jon Bosak XML annotated corpus these days, and these are some of the graphs I’ve been able to produce after analysing the speech of the main characters of the play (characters that say more than 100 lines):

[Figure: exclamations and interrogations]

Here we can see that Macduff is screaming a lot, and that when everybody talks it is never to question, but to assert… Poor Macbeth and Lady Macduff question everything, while Lady Macbeth questions just as much as she asserts.

Regarding the number of words in the play, Macbeth is by far the one who talks the most:

[Figure: amount of words spoken by main characters]

But what about lexical variety? In this next graph, we can see the variety of the words:

[Figure: Macbeth - lexical variety]

Here we can see the variety of each character’s speech.

The brownish words are said just once per character. The light greens are words that repeat in their speech, and the dark greens are repetitions of the light green words. I still need to take more measurements to see if this is actually the way everybody speaks: by repeating a lot of small words with just some new words once in a while. (There are more words that appear just once than words you will repeat through most of your speech! Think about it!)
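
For anyone who wants to try something similar, here is a rough sketch of how such counts could be gathered. It is not the code that produced the graphs above: it uses only the Python standard library rather than nltk, the `SAMPLE` snippet is a tiny made-up stand-in written in the SPEECH / SPEAKER / LINE markup of the Bosak files, and the function name `speaker_stats` and its tokenisation rules are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Tiny stand-in for one of Jon Bosak's XML plays; the real files use the same
# SPEECH / SPEAKER / LINE elements but are of course much longer.
SAMPLE = """
<PLAY>
  <SPEECH><SPEAKER>MACBETH</SPEAKER>
    <LINE>Is this a dagger which I see before me,</LINE>
    <LINE>The handle toward my hand? Come, let me clutch thee.</LINE>
  </SPEECH>
  <SPEECH><SPEAKER>LADY MACBETH</SPEAKER>
    <LINE>Out, damned spot! out, I say!</LINE>
  </SPEECH>
</PLAY>
"""


def speaker_stats(xml_text):
    """Return {speaker: stats} with punctuation counts and word statistics."""
    words = defaultdict(list)
    marks = defaultdict(Counter)
    for speech in ET.fromstring(xml_text).iter("SPEECH"):
        speakers = [s.text for s in speech.findall("SPEAKER")]
        for line in speech.findall("LINE"):
            text = "".join(line.itertext())
            for speaker in speakers:
                marks[speaker]["!"] += text.count("!")
                marks[speaker]["?"] += text.count("?")
                # Crude tokenisation: lowercase, strip surrounding punctuation.
                tokens = [w.strip(".,;:!?'\"-").lower() for w in text.split()]
                words[speaker].extend(t for t in tokens if t)
    stats = {}
    for speaker, toks in words.items():
        counts = Counter(toks)
        stats[speaker] = {
            "words": len(toks),
            "exclamations": marks[speaker]["!"],
            "questions": marks[speaker]["?"],
            # Lexical variety: distinct words divided by total words.
            "variety": len(counts) / len(toks),
            # Words this speaker says exactly once.
            "said_once": sum(1 for c in counts.values() if c == 1),
        }
    return stats
```

Calling `speaker_stats` on the full text of a play would give one entry per speaker, which could then be filtered to the main characters and plotted.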


Posted: February 26th, 2010 | Author: adalovelace | Filed under: Technical, Texts | 2 Comments »

OCRing Shakespeare Entry from Encyclopaedia Britannica 11th Edition

One of the next things we want to do for Open Shakespeare is provide an open introduction to his works. The obvious idea for this was to use the Shakespeare entry in the 11th ed of the Encyclopaedia Britannica as detailed in this ticket:

http://p.knowledgeforge.net/shakespeare/trac/ticket/24

We’ve now written code to grab the relevant tiffs off wikimedia:

http://p.knowledgeforge.net/shakespeare/svn/trunk/src/shakespeare/src/eb.py

You can also find them online (28 pages) starting at:

http://upload.wikimedia.org/wikipedia/commons/scans/EB1911_tiff/VOL24%20SAINTE-CLAIRE%20DEVILLE-SHUTTLE/ED4A800.TIF

The next step is to OCR this stuff (after that we can move on to proofing, whether by ourselves or via http://pgdp.net). When we first had a stab at this back in April we tried using gocr. Unfortunately the results were so bad that they were unusable. Recently an old OCR engine of HP’s has been released as open source under the name of tesseract:

http://code.google.com/p/tesseract-ocr/

We’re going to have a go using this — though if there is anyone out there with access to an alternative system we’d love to hear about it.
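
As a rough idea of what the OCR step could look like once the TIFFs are on disk, here is a small sketch that shells out to the tesseract command line (`tesseract <image> <output base>`). The helper names `tesseract_command` and `ocr_page` are hypothetical and this is not the code in our repository; it also assumes the `tesseract` binary is on the PATH and simply returns None when it is not.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(tiff_path, out_base):
    """Build the basic `tesseract <image> <output base>` invocation."""
    return ["tesseract", str(tiff_path), str(out_base)]


def ocr_page(tiff_path, out_dir):
    """OCR one scanned page and return the path of the text file written.

    Returns None when the tesseract binary is not installed.
    """
    if shutil.which("tesseract") is None:
        return None
    tiff_path = Path(tiff_path)
    out_base = Path(out_dir) / tiff_path.stem
    subprocess.run(tesseract_command(tiff_path, out_base), check=True)
    # tesseract appends ".txt" to the output base itself.
    return Path(str(out_base) + ".txt")
```

Running `ocr_page` over each of the 28 page scans in turn would leave one text file per page ready for proofing.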


Posted: August 14th, 2007 | Author: admin | Filed under: Technical, Texts | No Comments »

Annotation is Working!

After another push over the last few days I’ve got the web annotation system for Open Shakespeare operational (we’ve been hacking on this on and off since back in December).

To see the system in action visit:

http://demo.openshakespeare.org/view?name=phoenix_and_the_turtle_gut&format=annotate

Quite a bit of effort has been made to decouple the annotation system from Open Shakespeare so that it can be easily reused elsewhere. You can find the code for the annotation system (nicknamed annotater) here:

http://p.knowledgeforge.net/shakespeare/svn/annotater/trunk/

There are still some substantial issues with the Open Shakespeare implementation, the most obvious of which are:

a) large texts bring the javascript to its knees (The Phoenix and the Turtle is the shortest of Shakespeare’s works, which is why I’m using it).

b) security/user authentication for annotation adding/editing/deleting

But the basic system is working.


Posted: April 10th, 2007 | Author: admin | Filed under: News, Technical | 3 Comments »

Improvements to the Concordance

One of the main items scheduled for v0.4 of Open Shakespeare is improving the responsiveness of the concordance. With the v0.3 codebase, and just the sonnets as test material, loading the list of words for the concordance alone took around 24s on my laptop. This is because even with a single text there are already over 18,000 items in the concordance and we were having to read through all of these to generate the list of words. Some recent commits (e.g. r:72) have gone some way to improving this responsiveness (loading the word list now takes 3s compared to 24s) but the result is not entirely satisfactory (printing full statistics takes 13s compared to 40s previously). One obvious way to go further is to use caching, either of individual web pages or of particular key parts such as all the distinct words occurring in the concordance (caching works because the concordance only changes when new texts are added, which will usually only happen once, when the system is first initialised).

Relatedly, r:74 is a first step towards filtering the concordance, in this case to exclude roman numerals and various non-words. Doing this made me think about whether the concordance should be storing actual words or just stems: for example, it does not seem to make much sense to have different entries for kill, kills, killed etc. Using a stemming algorithm such as the Porter stemmer (which I notice has a nice Python implementation directly available) we can easily stem each word as we go along. This would have several benefits, one of the most prominent being a dramatic reduction in the basic dictionary size (i.e. the number of distinct words in the concordance).
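
Here is a small sketch of filtering and stemming together. Two loud caveats: this is not the code from r:74, and `naive_stem` is a deliberately crude suffix stripper standing in for the Porter stemmer, not an implementation of it. The names `ROMAN`, `is_word`, `naive_stem` and `stems` are invented for the example.

```python
import re

# Matches well-formed roman numerals (i, iv, xii, ...), case-insensitively.
ROMAN = re.compile(
    r"^(?=[mdclxvi])m*(c[md]|d?c{0,3})(x[cl]|l?x{0,3})(i[xv]|v?i{0,3})$",
    re.IGNORECASE,
)

SUFFIXES = ("ing", "ed", "s")  # toy rule set, far cruder than Porter's


def is_word(token):
    """Reject non-alphabetic tokens and roman numerals (but keep the pronoun 'I')."""
    if not token.isalpha():
        return False
    return token.lower() == "i" or not ROMAN.match(token)


def naive_stem(word):
    """Strip one common suffix, keeping at least three letters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stems(tokens):
    """Return the set of distinct stems among the tokens that pass the filter."""
    return {naive_stem(t) for t in tokens if is_word(t)}
```

With this, kill, kills, killed and killing collapse into the single entry "kill". The roman numeral test is blunt: a real word such as "mix" is also a valid numeral and would be dropped, and elided forms like "kill'd" fail the alphabetic check, so a real filter would need a list of exceptions.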


Posted: January 3rd, 2007 | Author: admin | Filed under: Technical | 1 Comment »

Adding Web-Based Annotation Support

We intend to add annotation/commentary support to the open shakespeare web demo either in this release or next. As a first step we’ve been looking to see what (open-source) web-based annotation systems are already out there. Below is our list of what we’ve been able to find so far (if you know of more please post a comment). After examining several of these in some detail the one we’re going to try out properly is marginalia (if you’re interested our current efforts to do this including writing a python wsgi annotation service backend can be found here in the subversion repository).

  1. stet: javascript annotation system used for gpl v3 comments system

    • http://en.wikipedia.org/wiki/Stet_(software)
    • Bit of a hack at present and did not seem designed for external reuse (when I last looked the README was fairly emphatic that this was very alpha with little documentation)
  2. commentary: javascript based wsgi middleware developed by ian bicking

    • http://pythonpaste.org/commentary/
    • Rather hacked together (apparently he coded it in a week). Had problems getting it working locally and no documentation to help in adaptation. Seems to be unmaintained (demo site is currently down) which is perhaps not surprising given how many other projects Ian has on the go.
    • One nice feature is that you don’t seem to have to mess with the underlying web pages you want to add comments to (this only works if you are sitting on top of another wsgi application)
  3. marginalia: javascript library and spec for adding web annotation to pages

    • http://www.geof.net/code/annotation/
    • javascript code seems well factored and understandable and docs are good
  4. annotea: W3C project based on RDF

    • http://www.w3.org/2001/Annotea/
    • Been around a long time and now seems to be inactive
    • Server and client support rather lacking. No simple interface based on, e.g., javascript — you have to write a special client yourself — which is a major drawback
    • That said the protocol is well-documented and so writing a client (or a server) shouldn’t be that hard (other than having to mess around with rdf in javascript …)
    • The Schema seems reasonable
    • xpointer based which according to the marginalia site is a problem
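
To give a flavour of what a WSGI annotation service backend involves, here is a bare-bones sketch. It is not the backend in our subversion repository and does not follow the marginalia spec: the `/annotations` path, the JSON fields and the `call` helper are assumptions made for illustration, and the notes are held in memory only.

```python
import io
import json
from wsgiref.util import setup_testing_defaults

ANNOTATIONS = []  # in-memory only; a real service would persist these


def annotation_app(environ, start_response):
    """GET /annotations lists notes; POST /annotations stores a JSON note."""
    method = environ["REQUEST_METHOD"]
    if environ.get("PATH_INFO") != "/annotations":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    if method == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        note = json.loads(environ["wsgi.input"].read(length))
        note["id"] = len(ANNOTATIONS) + 1
        ANNOTATIONS.append(note)
        status, body = "201 Created", note
    elif method == "GET":
        status, body = "200 OK", ANNOTATIONS
    else:
        start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
        return [b"method not allowed"]
    start_response(status, [("Content-Type", "application/json")])
    return [json.dumps(body).encode("utf-8")]


def call(app, method, path, payload=None):
    """Drive the app in-process, without opening a socket."""
    raw = json.dumps(payload).encode("utf-8") if payload is not None else b""
    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(raw)),
        "wsgi.input": io.BytesIO(raw),
    }
    setup_testing_defaults(environ)  # fills in the remaining required keys
    seen = {}
    body = b"".join(app(environ, lambda status, headers: seen.update(status=status)))
    return seen["status"], body
```

Because a WSGI application is just a callable, the same `annotation_app` could be served with `wsgiref.simple_server` for local experiments or mounted behind other WSGI middleware.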

Posted: December 18th, 2006 | Author: admin | Filed under: Technical | No Comments »

An Open Knowledge Foundation Project | Contact Us | (c) Open Knowledge Foundation
All material available under CC 'by' license v3.0 (all jurisdictions) | This Content and Data is Open
