Computational Literary Analysis on Jane Eyre

About the project

This is the final project for the course Introduction to Computational Literary Analysis at the 2019 UC Berkeley Summer School, taught by Jonathan Reeve. Course repository link: https://github.com/JonathanReeve/course-computational-literary-analysis

Introduction

Jane Eyre is a novel by the English writer Charlotte Brontë, published under the pen name “Currer Bell” on 16 October 1847. It follows the experiences of its eponymous heroine, including her growth to adulthood and her love for Mr. Rochester, the brooding master of Thornfield Hall. The novel revolutionised prose fiction by being the first to focus on its protagonist’s moral and spiritual development through an intimate first-person narrative, in which actions and events are coloured by a psychological intensity. Charlotte Brontë has been called the “first historian of the private consciousness” and the literary ancestor of writers like Proust and Joyce. The book contains elements of social criticism, with a strong sense of Christian morality at its core, and is considered by many to be ahead of its time because of Jane’s individualistic character and how the novel approaches the topics of class, sexuality, religion and feminism (“Jane Eyre”, 2019).

The novel has been adapted into a number of other forms, including theatre, film, television and opera, and these adaptations are significant rewritings and reinterpretations of it. In this project we therefore analyze both the text of the novel and one of its film adaptations: the movie directed by Cary Fukunaga, starring Mia Wasikowska and Michael Fassbender, released in the United States on 11 March 2011, with a screenplay by Moira Buffini based on the novel (“Jane Eyre (2011 film)”, 2019). Because we cannot analyze the video directly, we analyze the movie’s subtitles and reviews instead.

To build the corpus, we use corpus DB to retrieve the novel text from Project Gutenberg, and we wrote a tiny crawler with the requests library to collect the movie reviews from IMDb. The movie subtitles were downloaded from zimuku.

Hypothesis 1: The movie is partly faithful to its original

Film adaptations of literature have become increasingly popular. They present classic stories in a new medium and help keep the literature alive. Some adaptations tell the original story faithfully, while others change it completely. To find out how faithful this adaptation is, we track how sentiment changes over the course of the novel text and of the movie subtitles, and compare the two. As the two figures below show, the sentiment varies constantly in both. There are some similarities. At the beginning of both versions the sentiment is negative, because both describe Jane’s tragic childhood. At the end of both, Jane marries Rochester, whom she deeply loves; this is a happy ending, and the sentiment scores of both versions are positive. However, there are also many differences between the two figures. One likely reason is that the movie rearranges the timeline as a narrative technique to give the audience a better experience, so the order of events differs between the two versions.
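The comparison described above can be sketched as follows. This is only a minimal illustration, not the project's actual code: the tiny lexicon TOY_LEXICON and the function sentiment_trajectory are hypothetical names, and a real analysis would use a full sentiment analyzer. Splitting each text into the same number of chunks puts the novel and the subtitles, which differ greatly in length, on a common axis.

```python
import re

# Hypothetical toy lexicon, for illustration only; a real analysis would
# rely on a full sentiment lexicon or analyzer.
TOY_LEXICON = {"happy": 1.0, "love": 1.0, "joy": 1.0,
               "sad": -1.0, "cruel": -1.0, "fear": -1.0}

def sentiment_trajectory(text, n_chunks=10, lexicon=TOY_LEXICON):
    """Return the mean lexicon score of each of n_chunks equal-sized slices."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return [0.0] * n_chunks
    scores = []
    for i in range(n_chunks):
        # Integer arithmetic keeps the slices contiguous and nearly equal.
        start = i * len(words) // n_chunks
        end = (i + 1) * len(words) // n_chunks
        chunk = words[start:end]
        total = sum(lexicon.get(w, 0.0) for w in chunk)
        scores.append(total / len(chunk) if chunk else 0.0)
    return scores
```

Calling the function with the same n_chunks on the novel text and on the subtitle text gives two lists of equal length that can be plotted against each other.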

Hypothesis 2: Good movie reviews are usually critical

On IMDb, anyone can rate a movie and publish a review, so reviews usually come with a rating. Generally speaking, one would expect high-rating reviews to be positive and low-rating reviews to be negative. The computational sentiment analysis in the figure below confirms this on the whole: the slope of the fitted line is positive. However, some of the residuals are large. This suggests that, whatever the rating, the reviews tend to be critical, discussing both the pros and the cons of the movie. The source of the reviews matters here: IMDb sorts reviews by helpfulness by default, and the crawler only fetches the top 25, so the reviews in the sample are the ones readers found most helpful. Therefore, good movie reviews are usually critical.
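A fitted line and its residuals can be computed with ordinary least squares, as in the sketch below. It assumes, for illustration, that the sentiment score is on the x-axis and the rating on the y-axis; the function names fit_line and residuals are hypothetical, and the original analysis may have used a plotting or statistics library instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residuals(xs, ys, slope, intercept):
    """Observed value minus the value predicted by the fitted line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]
```

A positive slope means that sentiment and rating rise together overall, while a large residual marks a review whose rating is far from what its sentiment score alone would predict.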

Hypothesis 3: Computers can also write reviews

With the development of natural language processing, natural-language generation has gone from ridiculous to sensible. Natural-language generation (NLG) is a software process that transforms structured data into natural language. It can be used to generate short blurbs of text in interactive conversations (a chatbot), which might even be read out loud by a text-to-speech system (“Natural-language generation”, 2019). It can also be used to generate reviews. In other words, computers can write reviews too.

Many models can be used for text generation, such as Markov chains and Long Short-Term Memory (LSTM) networks. The simplest is the Markov chain model. A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event (“Markov chain”, 2019). In other words, it generates the next word based on a fixed number of preceding words. However, because the model is so simple, relying only on word statistics and ignoring other significant factors, the text it generates is meaningless.
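A word-level Markov chain generator can be sketched in a few lines. This is a minimal illustration under assumptions, not the project's actual code: the names build_chain and generate are hypothetical, the input is assumed to be an already tokenized list of words, and the order parameter is the number of preceding words that make up a state.

```python
import random
from collections import defaultdict

def build_chain(words, order=2):
    """Map each tuple of `order` consecutive words to the words that follow it."""
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain

def generate(chain, seed, length=50, rng=None):
    """Extend `seed` (a tuple of words) by up to `length` sampled words."""
    rng = rng or random.Random()
    out = list(seed)
    order = len(seed)
    for _ in range(length):
        followers = chain.get(tuple(out[-order:]))
        if not followers:
            break  # the current state never occurred in the training text
        out.append(rng.choice(followers))
    return " ".join(out)
```

Because a follower is stored once per occurrence, rng.choice samples each next word in proportion to how often it followed that state in the training text. Nothing in the model looks further back than the state itself, which is why the output loses coherence so quickly.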

Because this result is far from ideal, we looked for a better way to generate reviews. One of the best solutions at present is GPT-2, a text-generation model developed by OpenAI and released in February 2019. GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset of 8 million web pages (about 40GB of Internet text) with a simple objective: predict the next word, given all of the previous words within some text. The diversity of the dataset causes this simple goal to contain naturally occurring demonstrations of many tasks across diverse domains. The dataset emphasizes diversity of content and was built by scraping content from the Internet. To preserve document quality, it uses only pages that have been curated or filtered by humans: specifically, outbound links from Reddit that received at least 3 karma. This can be thought of as a heuristic indicator of whether other users found the link interesting (whether educational or funny), leading to higher data quality than other similar datasets, such as CommonCrawl (Radford, 2019). Thanks to a better algorithm and a much larger, higher-quality dataset, GPT-2 writes reviews almost like a human.

One of the IMDb reviews starts with “‘Jane Eyre’ has”, so we use the two models to generate two different reviews that start with the same words. The human-written review and the two results generated by the Markov chain model and GPT-2 from the same beginning are shown below.

Review written by human

“Jane Eyre” has been my favorite book since I was 11 years old. The tale of a feisty orphan-girl-turned-governess who finds true love in a spooky mansion and ultimately redeems a tormented hero has made it to the top of every “Best Love Stories” list since it was first published in 1847, and with good reason. It’s the perfect Gothic novel, melding mystery, horror, and the classic medieval castle setting with heart-stopping romance. There have been at least 18 film versions of “Jane Eyre” and 9 made-for-television movies–27 in all! …

authorsyriejames on IMDb


Markov Model

” Jane Eyre ” has n’t well. I’ve seen most of the book, which is individualistic, screaming mad women, Jane Eyre ” I could see the music which is beautifully filmed, serious expression and this production somehow, but cold as Jane and from 2006 Masterpiece Theater version of the Rivers ( although I think the story locking any of Mr. Rochester waxing eloquent about his pants on Mrs. Fairfax. And who knows that her gravitas to show how versatile this film ) And who ‘s given the chops to the book. Hopefully, the characters. Fukunaga ‘s impossible to effectively portrays the images from sympathetic yet appropriately stern Jamie, this one thing to me think that he also appreciate that is not disappoint ? Her generosity to use tears.

GPT-2

“Jane Eyre” has been on many American television sets, it is perhaps because it is a well-regarded classic that its many imitators are more commonly accepted than those of its original source. The series’ most famous character, Mr. Darcy, and many others (including the title characters themselves), are based on real-life characters, such as John Darcy (played by Thomas Cromwell in the play and by the late Richard Burton in the film) and Mrs. Darcy (played by Jane Harrison), and the two are often seen together in the show’s third act; they have an excellent rapport. While it bears little resemblance to the original play, “Jane Eyre” is still a very good book about a young lady trying to be accepted in this world. There are also many famous illustrations by William Hogarth that were designed specifically for the show; this graphic novel captures all of the characters’ various styles and clothing.

Hypothesis 4: The color of Jane Eyre is #9B6A6E

Assume that the color of Jane Eyre is the average of all the colors that appear in the novel. For each color word that appears in the novel, we record its term frequency in a color dictionary. A color can be represented by three values, RGB, so each color maps to a vector in a three-dimensional RGB vector space. If we add up these color vectors, each weighted by its term frequency, and divide the sum by the total frequency, we obtain a single color vector. Writing $c_i$ for the RGB vector of the $i$-th color word and $f_i$ for its term frequency, the formula is

$$\bar{c} = \frac{\sum_i f_i \, c_i}{\sum_i f_i}$$

In other words, we compute a weighted average of all the colors in the novel. However, the result is very likely to look grey, because averaging pulls the RGB values toward the middle. Removing the colors that have no saturation makes the result brighter. Finally, we convert the vector back to a color. The result is #9B6A6E, the color shown below. It could be used in the design of a book cover, a movie poster and so on.
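The procedure above can be sketched as follows. This is a minimal illustration under assumptions: the small table COLOR_RGB and the function average_color are hypothetical, and the project presumably used a much larger list of color names. A color counts as having no saturation when its HSV saturation is zero, which covers black, white and the greys.

```python
import colorsys

# Hypothetical color table, for illustration only; a real run would need
# an RGB value for every color word found in the novel.
COLOR_RGB = {
    "red": (255, 0, 0), "green": (0, 128, 0), "blue": (0, 0, 255),
    "white": (255, 255, 255), "black": (0, 0, 0), "grey": (128, 128, 128),
}

def average_color(counts, color_rgb=COLOR_RGB, drop_unsaturated=True):
    """Frequency-weighted mean of RGB vectors, returned as a hex string."""
    total = 0
    sums = [0, 0, 0]
    for name, freq in counts.items():
        r, g, b = color_rgb[name]
        _, saturation, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if drop_unsaturated and saturation == 0:
            continue  # skip black, white and greys
        total += freq
        for k, channel in enumerate((r, g, b)):
            sums[k] += freq * channel
    if total == 0:
        raise ValueError("no saturated colors to average")
    return "#{:02X}{:02X}{:02X}".format(*(round(v / total) for v in sums))
```

Here counts is the color dictionary of term frequencies, for example {"red": 3, "blue": 1, "white": 10}. With drop_unsaturated left at its default the white entries are ignored, so they cannot pull the result toward grey.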

Conclusion

Jane Eyre is a classic novel that has been recreated in many different forms, including theatre and film. The movie adaptation of Jane Eyre is a significant recreation: it tells the story well while applying narrative techniques specific to film. Audiences write reviews to evaluate the movie and share their opinions. Whether or not the reviewer likes the movie, a good review is usually critical, showing the movie’s pros and cons rather than blindly overpraising or belittling it. Since people can write movie reviews, why can’t computers? With an advanced model such as OpenAI’s GPT-2, computers can write reviews much as people do. Returning to the novel, to find a color that represents it, we collect all the colors that appear in the text and calculate their weighted average. The result is #9B6A6E, which looks like a mix of purple and red. This is my computational literary analysis of Jane Eyre.

Works Cited

Jane Eyre. (2019). Retrieved 19 August 2019, from https://en.wikipedia.org/wiki/Jane_Eyre

Jane Eyre (2011 film). (2019). Retrieved 19 August 2019, from https://en.wikipedia.org/wiki/Jane_Eyre_(2011_film)

Natural-language generation. (2019). Retrieved 19 August 2019, from https://en.wikipedia.org/wiki/Natural-language_generation

Markov chain. (2019). Retrieved 19 August 2019, from https://en.wikipedia.org/wiki/Markov_chain

Radford, A. (2019). Better Language Models and Their Implications. Retrieved 19 August 2019, from https://openai.com/blog/better-language-models/

Source code is here.
