The Lions of Sicily is the first of two novels by Stefania Auci narrating the history of the Florio family: entrepreneurs, aficionados and visionaries, adventurers and art lovers who pushed Sicily into modernity.
Their story starts in 1799, when Paolo Florio, after the catastrophic earthquake that ruined his homeland, decided to move with his family from Bagnara Calabra to Palermo.
From the moment they arrived in Palermo on their schifazzo, Paolo and Ignazio Florio looked ahead, ambitious and restless, striving to reach the top. And they succeeded: their grocery store in Via dei Materassai soon became the best in Palermo, then the two brothers entered the sulphur business, bought properties and lands from fallen aristocrats, created their own shipping company...
However, it was Paolo's son, Vincenzo Florio Senior, who achieved fame and prestige among the European elite. A man of great business acumen, in 1830 he acquired shares in the Arenella tuna fishery in Palermo and finally obtained the fishery itself in 1838. In the meantime he took over other tuna fisheries along the coasts of Palermo and Trapani.
He invented a revolutionary method of preserving tuna, in oil and in cans, that is still used today; in his winery a poor wine, Marsala, became prestigious enough to be worthy of a king.
Palermo was the theatre of the Florios' business growth. The city watched the family with admiration, but its pride was tempered by envy: those successful men would always remain immigrants, facchini, portarobbe. What Palermo did not know was that the Florios were driven precisely by a strong desire for social redemption.
Through generations, new ties with increasingly important people, investments and clever intuitions, the Florios became the uncrowned rulers of Sicily.
Stefania Auci's novel intertwines the Florios' business and social rise with their private lives, against the background of the most troubled years of Italian history, from the 1818 movements to Garibaldi's landing in Sicily.
The aim of the present project is to understand more deeply and clearly the relationships and connections that led the Florios, a family of fishermen, to create a business empire in the 19th century.
To do so, I thought network analysis would be the most suitable approach because, in Franco Moretti's words, "once you make a network of a play (of a novel, in this case), you stop working on the play proper, and work on a model instead: you reduce the text to characters and interactions, abstract them from anything else [...], a model allows you to see the underlying structures of a complex object. It's like an X-ray".
As Franco Moretti stated in the Stanford Literary Lab Pamphlet about network theory and plot analysis, in recent years literary studies have experienced the rise of "quantitative evidence". While for centuries the basic task of literary scholarship has been the close reading of texts, nowadays, for some academics, literary study no longer always requires scholars to read books. This new approach to literature depends on computers to produce new insights.
A quantitative and partially automatic analysis of literature is among the aims of the modern "scriptorium" founded in 2010 by M. Jockers and F. Moretti, the Stanford Literary Laboratory. Their research activities also include plot analysis based on network theories.
The power of network analysis in the field of literature is evidenced by the rapid rise of work and interest in the field in recent years. Network extraction and analysis has been performed on subjects as varied as the Marvel universe (Alberich et al., 2002, Marvel Universe looks almost like a real social network), Les Misérables (Newman and Girvan, 2004, Finding and evaluating community structure in networks), and ancient Greek tragedies (Rydberg-Cox, 2011, Social networks and the language of Greek tragedy). Elson et al. (2010, Extracting social networks from literary fiction) have looked at debunking comparative literature theories by examining networks for sixty 19th-century novels.
For my analysis, I decided to create two separate networks: one up to the death of Ignazio Florio (1776-1828), with the two brothers, Paolo and Ignazio, as the main nodes, and another one up to the end of the book, that is, up to the death of Vincenzo Florio (1799-1868).
In the next sections of the documentation I will present all the steps I followed to build the networks and I will try to answer two main research questions:
The Text Encoding Initiative (TEI) is a consortium which collectively develops and maintains a standard for the representation of texts in digital form. Its chief deliverable is a set of Guidelines which specify encoding methods for machine-readable texts, chiefly in the humanities, social sciences and linguistics. Since 1994, the TEI Guidelines have been widely used by libraries, museums, publishers, and individual scholars to present texts for online research, teaching, and preservation.
The TEI scheme is formulated as an application of the Extensible Markup Language (XML), so the entire novel "I leoni di Sicilia" was transcribed through supervised OCR and imported into an XML document.
I decided to extract from the text the characters mentioned, the places where the action takes place and the Florios' properties, such as houses, enterprises or particularly meaningful objects.
In order to accomplish this task, the following elements were used in the XML document: particDesc and settingDesc in the header, combined with persName, placeName and objectName in the body, referenced with @ref; and xenodata, which allows easy inclusion of metadata from non-TEI schemes.
The entire novel has been annotated through the use of the TEI, highlighting the main characters, places and objects, in order to extract these elements at a later stage and create two networks showing their interconnections.
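To make the relationship between header and body concrete, here is a minimal sketch that resolves each @ref in the body to the label declared in the header. The TEI fragment, its ids and its sentence are invented for illustration (it is not a complete, valid TEI document), and the sketch uses the standard-library ElementTree parser rather than the parser used in the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI fragment: ids, names and the sentence are illustrative only.
TEI_SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <profileDesc>
      <particDesc>
        <listPerson>
          <person xml:id="Paolo"><persName>Paolo Florio</persName></person>
        </listPerson>
      </particDesc>
      <settingDesc>
        <listPlace>
          <place xml:id="Palermo"><placeName>Palermo</placeName></place>
        </listPlace>
      </settingDesc>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <p><persName ref="#Paolo">Paolo</persName> arrives in
         <placeName ref="#Palermo">Palermo</placeName>.</p>
    </body>
  </text>
</TEI>"""

TEI_NS = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

root = ET.fromstring(TEI_SAMPLE)

# Map "#id" -> canonical label, read from the header descriptions.
labels = {}
for person in root.iter(TEI_NS + "person"):
    labels["#" + person.get(XML_ID)] = person.find(TEI_NS + "persName").text
for place in root.iter(TEI_NS + "place"):
    labels["#" + place.get(XML_ID)] = place.find(TEI_NS + "placeName").text

# Resolve every mention in the body through its @ref attribute.
body = root.find(f"{TEI_NS}text/{TEI_NS}body")
mentions = [
    (el.get("ref"), labels[el.get("ref")])
    for el in body.iter()
    if el.get("ref") is not None
]
print(mentions)
```

Each mention in the running text carries only a pointer, while the canonical name lives once in the header, so different surface forms of the same character collapse onto a single node.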
Undoubtedly the same goal could have been reached without the TEI scheme: for instance, I could have used simple HTML <span> tags with different classes to distinguish between people, places and objects.
But I decided to annotate the entire novel using the TEI scheme in order to keep the project "open": across the humanities, the Text Encoding Initiative framework for XML has become the gold standard for scholarly editions of texts. The implementation of Linked Open Data (LOD) also lays the ground for a future enriched digital edition, through which it will be possible to demonstrate the potential of the LOD approach in the creation of openly available, scientifically reliable, linked, digital texts and its usefulness to a range of users.
At the same time I decided to use the TEI for an apparently opposite reason: to demonstrate that a TEI-encoded file does not stand in a one-to-one relationship with a Web-based ‘digital edition’ as its output. The TEI is also used for the creation of many other resources. For example, while the recommendations in Chapter 10 ‘Manuscript Description’ are useful when properly describing a manuscript as metadata for a digital edition, they are also used by libraries for fully detailed manuscript catalogues. There are also modules in the TEI for dictionaries, linguistic corpora, and graphs, networks, and trees.
The two marked-up XML documents are available at the following links:
I decided to create a Python function, called "create_networks", to extract from the two TEI-annotated XML files all the characters, places and main objects of the novel, which are the nodes of the two networks, while their interactions are the edges.
In literary texts the definition of connections between characters can vary a lot depending on a multitude of factors: the overall structure of the text, the genre, and even the plot. Characters can be considered connected whenever there are dialogues between them or when they appear in the same sentence (this circumstance being either a direct interaction between two characters or an indirect one). After a first attempt at tackling this problem, I found it difficult to consider the coexistence of characters inside a sentence a sufficient condition for defining a relation between them: while testing the Python script used to detect them in isolated sentences, many times there would be only one character, making it impossible to infer any relation whatsoever. Hence, I have considered the coexistence of characters (and places and objects) in the same paragraph as the condition for establishing relationships between them.
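The paragraph-level criterion can be illustrated with a short sketch. It assumes that every pair of entities sharing a paragraph is linked and that repeated co-occurrences are counted as an edge weight; both choices, as well as the sample data and the name cooccurrence_edges, are assumptions made for illustration and are not necessarily what "create_networks" does.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: the @ref values found in each paragraph, in reading order.
paragraph_refs = [
    ["#Paolo", "#Ignazio1", "#Palermo"],
    ["#Paolo"],                          # a single entity yields no edge
    ["#Ignazio1", "#Paolo", "#Paolo"],   # repeated mentions are collapsed
]

def cooccurrence_edges(paragraphs):
    """Count undirected edges between entities that share a paragraph."""
    weights = Counter()
    for refs in paragraphs:
        unique = sorted(set(refs))       # sorting gives one canonical order per pair
        for source, target in combinations(unique, 2):
            weights[(source, target)] += 1
    return weights

edges = cooccurrence_edges(paragraph_refs)
for (source, target), weight in sorted(edges.items()):
    print(source, target, weight)
```

With sentence-level co-occurrence the second paragraph's situation (one entity, no edge) would be the norm, which is the reason given above for moving to paragraphs.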
Having specified the criteria used to create the two networks, let's look in detail at how the function "create_networks" works. It takes as input an XML file and the paths of two CSV files to be written, one for the nodes and one for the edges.
For parsing XML files I used BeautifulSoup (https://beautiful-soup-4.readthedocs.io/en/latest/#), a Python library for pulling data out of HTML and XML files, together with the lxml parser.
First, after importing BeautifulSoup, the function opens the input XML file like a regular file and passes its content to BeautifulSoup together with the parser of choice.
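from bs4 import BeautifulSoup
def create_networks(xml_file, nodes_sheet, edges_sheet):
    # Read the XML file and parse its content into the variable "data".
    # features='lxml' selects lxml's HTML parser, which lowercases tag names:
    # this is why the tags are searched below as 'persname', 'placename', etc.
    with open(xml_file, "r", encoding="utf-8") as tei:
        data = BeautifulSoup(tei, features='lxml')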
I then applied the process below to all three classes, "Person", "Place" and "Object/Property".
I used the find_all() method, which looks through a tag’s descendants and retrieves all those that match the filters passed as arguments, here the tag names.
    #CLASS PERSON
    #list of all the 'persname' tags found inside the paragraphs of the original TEI file
    pers_list = []
    for paragraph in data.find_all('p'):
        pers = paragraph.find_all('persname')
        for x in pers:
            pers_list.append(x)
    #list with the value of the 'ref' attribute of each 'persname' tag
    ref_pers_list = []
    for x in pers_list:
        ref_pers_list.append(x.get("ref"))
    #dictionary with the 'ref' values as keys ('#' + xml:id of each 'person')
    #and the contents of the corresponding 'persname' tags as values
    pers_dict = dict()
    for person in data.find_all('person'):
        pers_ref_from_id = '#' + person.get('xml:id')
        pers_label = person.persname.text
        pers_dict[pers_ref_from_id] = pers_label
    #list with just the contents of the persname tags
    pers_label_list = []
    for ref in ref_pers_list:
        if ref in pers_dict:
            pers_label_list.append(pers_dict[ref])
    #list of tuples with the values ('ref', 'label', 'type') of each person
    tuples_nodes_pers = []
    for item in pers_list:
        tuples_nodes_pers.append((item.get('ref'), pers_dict[item.get('ref')], "person"))
After repeating the same steps for places and objects, I obtained three lists of tuples (tuples_nodes_pers, tuples_nodes_place, tuples_nodes_prop), one for each class, containing the values 'ref', 'label' and 'type' of each Person, Place and Object, in the form: ('#Ignazio1', 'Ignazio Florio', 'person').
These lists of tuples were used to write the nodes' CSV files, as shown below.
I used the writer function of the csv module, which returns a writer object responsible for converting the user’s data into delimited strings on the given file-like object.
    #CSV - NODES_SHEET
    import csv
    with open(nodes_sheet, 'w', encoding='UTF-8', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(["Id", "Label", "Type"])
        for item_1 in tuples_nodes_pers:
            writer.writerow(item_1)
        for item_2 in tuples_nodes_place:
            writer.writerow(item_2)
        for item_3 in tuples_nodes_prop:
            writer.writerow(item_3)
The process of creating the CSV files containing the networks' edges was a bit longer. The code below ends with the creation of the list tpl_lst1, which contains tuples, each of which is a link between two nodes of a network.
    #list, for each paragraph, of all the 'persname', 'placename' and 'objectname' tags
    #of the original TEI file that appear in that paragraph
    p_list = []
    for p in data.find_all('p'):
        persons = p.find_all('persname')
        places = p.find_all('placename')
        objects = p.find_all('objectname')
        if persons or places or objects:
            p_list.append([persons + places + objects])
    #removing empty lists from p_list, which is a list of lists (using a list comprehension)
    N = []
    p_list = [[ele for ele in sub if ele != N] for sub in p_list]
    #replacement of the tags with their corresponding ref values
    for x in p_list:
        for y in x:
            i = 0
            while i < len(y):
                y[i] = y[i].get("ref")
                i += 1
    #list of tuples containing the ref values of all the nodes that appear in the same paragraph
    tuples_edges = []
    for x in p_list:
        for y in x:
            if len(y) > 0:
                tuples_edges.append(tuple(y))
    #list of (source, target) pairs: the first node of each paragraph is paired with every
    #node of that paragraph (its pair with itself is a self-loop, removed later in the analysis)
    tpl_lst1 = list()
    for paragraph in tuples_edges:
        for i in range(len(paragraph)):
            tpl1 = tuple((paragraph[0], paragraph[i]))
            tpl_lst1.append(tpl1)
The list tpl_lst1 was then used to write the CSV files containing the edges between the nodes, as shown in the next cell:
    #CSV - EDGES_SHEET
    import csv
    with open(edges_sheet, 'w', encoding='UTF-8', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(["Source", "Target"])
        for item in tpl_lst1:
            writer.writerow(item)
The python script with the function "create_networks" can be seen here.
The obtained csv files can be checked at the following links:
Then the CSV files have been passed to Gephi for the visualisation of the two networks, following the tutorial at this link: https://gephi.org/users/quick-start/.
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
#The graph built in Gephi was exported as GraphML and is read here with networkx
G1 = nx.read_graphml('Ignazio_Paolo.graphml')
#Removal of the nodes named "Source" and "Target" (the CSV header row, if it was imported as data)
#and of the self-loop edges, that is, edges with the same node at both ends
G1.remove_nodes_from(["Source", "Target"])
G1.remove_edges_from(list(nx.selfloop_edges(G1)))
pos = nx.spring_layout(G1)
plt.figure(figsize=(20,20))
node_size=50
nx.draw_networkx(G1, pos=pos, with_labels=False, node_size=node_size)
Franco Moretti, in the Stanford Lit Lab Pamphlet, Network theory, plot analysis, wrote "when discussing this figure (the protagonist), literary theory usually turns to concepts of ‘consciousness’ and ‘interiority’[...]. When a group of researchers applied network theory to the Marvel comics series, however, their view of the protagonist made no reference to interiority; the protagonist was simply ‘the character that minimized the sum of the distances to all other vertices’; in other words, the centre of the network".
So I decided to compute some centrality measures in order to analyze the main nodes of the network from different perspectives.
Recalling that the degree of a node in an undirected graph is simply the number of neighbors it has, degree centrality is based on the assumption that important nodes have many connections.
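As a small worked example of the definition, the sketch below computes degree centrality by hand as degree divided by n - 1, the largest degree a node can have in a simple graph of n nodes. The toy graph is invented for illustration and the function is a plain-Python stand-in, not the networkx implementation.

```python
# Hypothetical toy graph stored as an undirected adjacency dictionary.
graph = {
    "#Ignazio1": {"#Paolo", "#Vincenzo1", "#Palermo"},
    "#Paolo": {"#Ignazio1", "#Palermo"},
    "#Vincenzo1": {"#Ignazio1"},
    "#Palermo": {"#Ignazio1", "#Paolo"},
}

def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adjacency)
    return {node: len(neighbours) / (n - 1) for node, neighbours in adjacency.items()}

centrality = degree_centrality(graph)
ranking = sorted(centrality, key=centrality.get, reverse=True)
print(ranking[0], centrality[ranking[0]])
```

A value of 1.0 means the node is directly linked to every other node in the graph.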
degreeCent = nx.degree_centrality(G1)
node_color = [20000.0 * G1.degree(v) for v in G1]
node_size = [v * 2000 for v in degreeCent.values()]
plt.figure(figsize=(40, 40))
nx.draw_networkx(G1, pos=pos, with_labels=False,
node_color=node_color,
node_size=node_size)
plt.axis('off')
sorted(degreeCent, key=degreeCent.get, reverse=True)[:5]
['#Ignazio1', '#Vincenzo1', '#Paolo', '#Palermo', '#mogliePaolo']
The degree distribution of a graph is the probability distribution of the degrees over the entire network.
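A minimal sketch of this definition, on an invented toy graph: the distribution is obtained by counting how many nodes have each degree and dividing by the number of nodes.

```python
from collections import Counter

# Hypothetical toy graph stored as an undirected adjacency dictionary.
graph = {
    "#Ignazio1": {"#Paolo", "#Vincenzo1", "#Palermo"},
    "#Paolo": {"#Ignazio1", "#Palermo"},
    "#Vincenzo1": {"#Ignazio1"},
    "#Palermo": {"#Ignazio1", "#Paolo"},
}

# Degree of every node, then the share of nodes having each degree value.
degrees = [len(neighbours) for neighbours in graph.values()]
counts = Counter(degrees)
distribution = {k: counts[k] / len(degrees) for k in sorted(counts)}
print(distribution)
```

The probabilities sum to 1; in character networks a few hubs with high degree and many nodes with low degree are a common shape for this distribution.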
degree_sequence = sorted((d for n, d in G1.degree()), reverse=True)
dmax = max(degree_sequence)
fig = plt.figure("Degree of Ignazio_Paolo graph", figsize=(8, 8))
# Create a gridspec for adding subplots of different sizes
axgrid = fig.add_gridspec(5, 4)
#1. The subgraph of connected components
ax0 = fig.add_subplot(axgrid[0:3, :])
Gcc = G1.subgraph(sorted(nx.connected_components(G1), key=len, reverse=True)[0])
pos = nx.spring_layout(Gcc, seed=10396953)
nx.draw_networkx_nodes(Gcc, pos, ax=ax0, node_size=20)
nx.draw_networkx_edges(Gcc, pos, ax=ax0, alpha=0.4)
ax0.set_title("Connected components of G")
ax0.set_axis_off()
#2. The degree-rank plot for the Graph
ax1 = fig.add_subplot(axgrid[3:, :2])
ax1.plot(degree_sequence, "b-", marker="o")
ax1.set_title("Degree Rank Plot")
ax1.set_ylabel("Degree")
ax1.set_xlabel("Rank")
#3. The degree histogram
ax2 = fig.add_subplot(axgrid[3:, 2:])
ax2.bar(*np.unique(degree_sequence, return_counts=True))
ax2.set_title("Degree histogram")
ax2.set_xlabel("Degree")
ax2.set_ylabel("# of Nodes")
fig.tight_layout()
plt.show()
Closeness centrality is based on the assumption that important nodes are close to other nodes. It is calculated as the reciprocal of the average shortest-path length from the given node to all other nodes.
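The computation can be sketched in plain Python with a breadth-first search. The example assumes a connected, unweighted graph; the four-node path and the helper names are invented for illustration, and networkx applies an additional rescaling when a graph is disconnected, which is not reproduced here.

```python
from collections import deque

# Hypothetical path graph a - b - c - d.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

def shortest_path_lengths(adjacency, source):
    """Breadth-first search returning the hop distance to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def closeness_centrality(adjacency, node):
    """(n - 1) divided by the sum of distances, i.e. 1 / average distance."""
    total = sum(shortest_path_lengths(adjacency, node).values())
    return (len(adjacency) - 1) / total if total > 0 else 0.0

closeness = {node: closeness_centrality(graph, node) for node in graph}
print(closeness)
```

The inner nodes b and c score higher than the endpoints because their average distance to the rest of the path is shorter.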
closeCent = nx.closeness_centrality(G1)
node_color = [20000.0 * G1.degree(v) for v in G1]
node_size = [v * 100 for v in closeCent.values()]
plt.figure(figsize=(40, 40))
nx.draw_networkx(G1, with_labels=False,
node_color=node_color,
node_size=node_size)
plt.axis('off')
sorted(closeCent, key=closeCent.get, reverse=True)[:5]
['#Ignazio1', '#Vincenzo1', '#Paolo', '#Palermo', '#mogliePaolo']
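Betweenness centrality assumes that important nodes connect other nodes: it measures how often a node lies on the shortest paths between pairs of other nodes.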
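To show what is being counted, here is a compact plain-Python sketch of Brandes' algorithm for unweighted, undirected graphs, normalised by the number of node pairs. The toy graph is invented for illustration and the function is a didactic stand-in rather than the networkx implementation.

```python
from collections import deque

# Hypothetical toy graph: "#Vincenzo1" reaches the others only through "#Ignazio1".
graph = {
    "#Ignazio1": {"#Paolo", "#Vincenzo1", "#Palermo"},
    "#Paolo": {"#Ignazio1", "#Palermo"},
    "#Vincenzo1": {"#Ignazio1"},
    "#Palermo": {"#Ignazio1", "#Paolo"},
}

def betweenness_centrality(adjacency):
    """Brandes' algorithm for unweighted, undirected graphs (normalised)."""
    score = {v: 0.0 for v in adjacency}
    for s in adjacency:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back towards s.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(adjacency)
    # Every unordered pair was visited twice; dividing by (n - 1)(n - 2)
    # both halves the count and normalises by the (n - 1)(n - 2) / 2 pairs.
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: value * scale for v, value in score.items()}

betweenness = betweenness_centrality(graph)
print(betweenness)
```

In this toy graph two of the three pairs that exclude "#Ignazio1" can only communicate through that node, so its score is 2/3 while every other node scores 0.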
betCent = nx.betweenness_centrality(G1)
node_color = [20000.0 * G1.degree(v) for v in G1]
node_size = [v * 10000 for v in betCent.values()]
plt.figure(figsize=(40, 40))
nx.draw_networkx(G1, with_labels=False,
node_color=node_color,
node_size=node_size)
plt.axis('off')
sorted(betCent, key=betCent.get, reverse=True)[:5]
['#Ignazio1', '#Vincenzo1', '#Paolo', '#inglesi', '#Palermo']
#dictionary with the results of all the centrality measures applied, for each node
final_dict = dict()
for x in betCent.keys():
    new_dic = dict()
    new_dic['betCent'] = betCent[x]
    new_dic['degreeCent'] = degreeCent[x]
    new_dic['closeCent'] = closeCent[x]
    final_dict[x] = new_dic
import pandas as pd
df = pd.DataFrame(final_dict)
df = df.transpose()
df
Node | betCent | degreeCent | closeCent
---|---|---|---
#Ignazio1 | 0.326797 | 0.476821 | 0.627200
#nipoteIgn | 0.044634 | 0.125828 | 0.462586
#Paolo | 0.153740 | 0.291391 | 0.535532
#Vincenzo1 | 0.209453 | 0.357616 | 0.572998
#mogliePaolo | 0.064645 | 0.185430 | 0.480132
... | ... | ... | ...
#zolfo | 0.000000 | 0.006623 | 0.359789
#tritacortice | 0.000000 | 0.013245 | 0.394443
#china | 0.000857 | 0.026490 | 0.409525
#Assunta | 0.000000 | 0.006623 | 0.380433
#tonnaraArenella | 0.000000 | 0.019868 | 0.396691
152 rows × 3 columns
import matplotlib.ticker
plt.rcParams["figure.figsize"] = [7, 7]
plt.rcParams["figure.autolayout"] = True
fig, ax = plt.subplots()
df['degreeCent'].value_counts().plot( kind='kde', ax=ax, logx=True, bw_method=0.05)
df['closeCent'].plot( kind='kde', ax=ax, logx=True, color='red')
df['betCent'].plot(kind='kde', ax=ax, logx= True, color = 'green')
plt.show()
This graph shows the probability density functions obtained through the pandas 'kde' plot, passing as input the results obtained for each of the three centrality measures applied in my investigation. In particular, pandas.DataFrame.plot.kde generates a Kernel Density Estimate plot using Gaussian kernels. In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function (PDF) of a random variable; here, each computed centrality measure is treated as such a variable.
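What a Gaussian KDE does can be sketched by hand: one Gaussian bump of width h is centred on every observation and the bumps are averaged. The sample values and the bandwidth below are invented for illustration, and the code is a plain-Python stand-in rather than the estimator pandas uses internally.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return f(x) = 1 / (n * h) * sum_i K((x - x_i) / h) with a Gaussian kernel K."""
    n = len(samples)

    def density(x):
        total = 0.0
        for xi in samples:
            u = (x - xi) / bandwidth
            total += math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
        return total / (n * bandwidth)

    return density

# Hypothetical centrality values, for illustration only.
values = [0.05, 0.10, 0.10, 0.45]
density = gaussian_kde(values, bandwidth=0.05)

# Trapezoidal integration over [-1, 2]: a density should integrate to about 1.
grid = [-1.0 + i * 0.001 for i in range(3001)]
area = sum((density(a) + density(b)) / 2 * (b - a) for a, b in zip(grid, grid[1:]))
print(round(area, 3), density(0.10) > density(0.30))
```

The curve is high where observations cluster (around 0.10 here) and low in the gaps, which is how the plot should be read for each centrality measure.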
In particular:
Analysing the graph obtained from the centrality measures applied to Ignazio and Paolo Florio's network, we can infer some information:
The ability to detect and study communities is central in network analysis. To find communities in my networks, I used the Louvain algorithm, available in Gephi. It is an unsupervised algorithm, meaning that it does not require the number of communities or their sizes as input; it is divided into two phases, Modularity Optimization and Community Aggregation, which are repeated until there are no more changes in the network and a (local) maximum of modularity is reached.
Modularity is a value between −0.5 (non-modular clustering) and 1 (fully modular clustering) that measures the relative density of edges inside communities with respect to edges outside communities. Each resulting community is therefore formed by nodes that are more densely connected to each other than to the rest of the network.
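As a worked example of the quantity being optimised, the sketch below evaluates the usual modularity formula, Q = sum over communities of [L_c / m - (d_c / 2m)^2], where m is the number of edges, L_c the number of edges inside community c and d_c the sum of the degrees of its nodes. The graph (two triangles joined by one edge) and the partitions are invented for illustration; the sketch only scores a given partition and does not implement the Louvain search itself.

```python
def modularity(edges, community_of):
    """Modularity of a partition of an unweighted, undirected graph."""
    m = len(edges)
    inside = {}       # L_c: edges with both ends in community c
    degree_sum = {}   # d_c: sum of the degrees of the nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(
        inside.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )

# Hypothetical graph: two triangles (a, b, c) and (d, e, f) joined by the edge c - d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(round(q_two, 4), round(q_one, 4))
```

Splitting the graph along the single bridge gives a clearly positive score, while putting every node in one community gives 0, which is why the algorithm prefers the split.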
communities = set()
for u in G1.nodes():
    communities.add(G1.nodes[u]['Modularity Class'])
communities
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
#dictionary mapping each modularity class to the labels of the nodes that belong to it
comm_dic = {c: [] for c in sorted(communities)}
for u in G1.nodes():
    comm_dic[G1.nodes[u]['Modularity Class']].append(G1.nodes[u]['label'])
comm_dic
{0: ['Ignazio Florio', 'Michele (garzone)', 'aristocratica (cliente)', 'Maurizio Reggio', 'mastro Salvatore (spedizionista)', 'notaio Leone', 'Salvatore Burgarello', 'Vincenzo Mazza', 'Pietro Ugo delle Favare', 'Francesco (capo dei commessi)', 'Ignazio Messina', 'via dei Materassai', 'mandamento di Castellammare', 'magazzino via dei Materassai', 'magazzino palazzo Steri', 'ufficio commerciale dei Florio', 'drogheria Florio', 'china in polvere', 'schooner "Assunta"', "tonnara dell'Arenella"], 1: ['Vincenzo Florio', 'Raffaele Barbaro', 'i francesi', 'Benjamin Ingham', 'don Sorce', 'Margherita Conticello', 'Abraham Gibbs', 'John Woodhouse', 'James Hopps', 'Joseph Whitaker', 'Isabella Pillitteri', 'baronessa Pillitteri', 'doganiere', 'Sicilia', 'via della Tavola Tonda', 'Marsiglia', "piazzetta di Sant'Eligio", 'Leeds', 'Londra', 'sommacco', 'zolfo', 'macchina trita cortice'], 2: ['Paolo Florio', 'Rosa Bellantoni', 'Giovanna Saffiotti', 'Vincenzo Saffiotti', 'donna povera (cliente)', 'Curatolo', 'Peppino', 'Antonino Gagliano', 'Calabria', 'Palazzo Steri', "piazza Sant'Oliva", 'fede nuziale di Rosa Bellantoni', 'San Francesco di Paola'], 3: ['bagnaroti', 'Cala di Palermo', 'porta Doganella', 'porta Calcina', 'porta Carbone', 'chiesa di Santa Maria di Porto Salvo', 'chiesa di San Mamiliano', "chiesa dell'Annunziata", 'chiesa di San Giorgio dei Genovesi', 'chiesa di Santa Maria di Piedigrotta', 'Castello a Mare', 'Lazzaretto', 'piano San Giacomo', 'chiesa di Piedigrotta', 'via degli Argentieri', 'Scialle di Giuseppina'], 4: ['Vittoria', 'Giuseppina Saffiotti Florio', 'Francesco', 'Mattia Florio Barbaro', 'Paolo Barbaro', 'Anna Barbaro', 'Emiddio Barbaro', 'mastro Filippo', 'Mariuccia Colosimo', 'Rosa (domestica)', 'Orsola (donna di servizio)', 'cerusico Caruso', 'Giuseppe Barbaro', 'Olimpia', 'Pietro Spoliti', 'Marianna (cuoca)', 'notaio Serretta', 'Bagnara Calabra', 'Contrada Pietraliscia', 'chiesa di Santa Maria la Nova', 'chiesa di San Giacomo', 'via San Sebastiano', 
'Inghilterra', 'India', 'Cina', 'Mar Tirreno', 'campagna', 'Mistretta', 'casa Florio Bagnara', 'Aromateria'], 5: ['Don Bottari', 'Gnazì Canzoneri', 'figlia Canzoneri', 'Carmelo Saguto', 'Gulì', 'Mimmo Russello', 'Vincenzo Romano', 'Giuseppe Pajno', 'Guglielmo Li Vigni', 'Niccolò Raffo', 'Venanzio Canzoneri', 'Gaspare Pizzimenti'], 6: ['Contrada Granaro'], 7: ['Capo Marturano'], 8: ['barone (cliente)', 'i napoletani', 'i siciliani', 'Salvatore Leone', 'Palermo', 'Messina', 'Perù', 'Marsala', 'Licata', 'Canicattì', 'Alcamo', 'Girgenti', 'casa Florio-via dei Materassai(Palermo)', 'cortice'], 9: ['monte Pellegrino'], 10: ['Luigi XVI di Borbone', 'Maria Antonietta', 'Ferdinando di Borbone-Due Sicilie', "Maria Carolina d'Asburgo-Lorena", 'gli inglesi', 'Napoleone', 'Regno di Napoli', 'Francia', 'Mar Mediterraneo', 'Napoli', 'Venezia', 'Livorno', 'Genova', "chiesa di Sant'Andrea degli Amalfitani", 'Spagna', 'Malta', 'Turchia', 'Egitto', 'Tunisi'], 11: ['vicolo della Neve', "via dell'Alloro", 'strada dei Zagarellai']}
Examining the obtained communities, I can infer something about the main characters of the first part of the novel, that is, the main nodes of the first network:
I then repeated all the previous steps for the analysis of the second network, the one generated from the chapters "Zolfo"-"Epilogo".
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
#The graph built in Gephi was exported as GraphML and is read here with networkx
G2 = nx.read_graphml('Vincenzo.graphml')
#Removal of the nodes named "Source" and "Target" (the CSV header row, if it was imported as data)
#and of the self-loop edges, that is, edges with the same node at both ends
G2.remove_nodes_from(["Source", "Target"])
G2.remove_edges_from(list(nx.selfloop_edges(G2)))
pos = nx.spring_layout(G2)
plt.figure(figsize=(20,20))
node_size=50
nx.draw_networkx(G2, pos=pos, with_labels=False, node_size=node_size)
degreeCent = nx.degree_centrality(G2)
node_color = [20000.0 * G2.degree(v) for v in G2]
node_size = [v * 2000 for v in degreeCent.values()]
plt.figure(figsize=(40, 40))
nx.draw_networkx(G2, pos=pos, with_labels=False,
node_color=node_color,
node_size=node_size)
plt.axis('off')
sorted(degreeCent, key=degreeCent.get, reverse=True)[:5]
['#Vincenzo1', '#Giulia', '#Ignazio2', '#Palermo', '#mogliePaolo']
degree_sequence = sorted((d for n, d in G2.degree()), reverse=True)
dmax = max(degree_sequence)
fig = plt.figure("Degree of Vincenzo graph", figsize=(8, 8))
# Create a gridspec for adding subplots of different sizes
axgrid = fig.add_gridspec(5, 4)
#1. The subgraph of connected components
ax0 = fig.add_subplot(axgrid[0:3, :])
Gcc = G2.subgraph(sorted(nx.connected_components(G2), key=len, reverse=True)[0])
pos = nx.spring_layout(Gcc, seed=10396953)
nx.draw_networkx_nodes(Gcc, pos, ax=ax0, node_size=20)
nx.draw_networkx_edges(Gcc, pos, ax=ax0, alpha=0.4)
ax0.set_title("Connected components of G")
ax0.set_axis_off()
#2. The degree-rank plot for the Graph
ax1 = fig.add_subplot(axgrid[3:, :2])
ax1.plot(degree_sequence, "b-", marker="o")
ax1.set_title("Degree Rank Plot")
ax1.set_ylabel("Degree")
ax1.set_xlabel("Rank")
#3. The degree histogram
ax2 = fig.add_subplot(axgrid[3:, 2:])
ax2.bar(*np.unique(degree_sequence, return_counts=True))
ax2.set_title("Degree histogram")
ax2.set_xlabel("Degree")
ax2.set_ylabel("# of Nodes")
fig.tight_layout()
plt.show()
closeCent = nx.closeness_centrality(G2)
node_color = [20000.0 * G2.degree(v) for v in G2]
node_size = [v * 1000 for v in closeCent.values()]
plt.figure(figsize=(40, 40))
nx.draw_networkx(G2, with_labels=False,
node_color=node_color,
node_size=node_size)
plt.axis('off')
sorted(closeCent, key=closeCent.get, reverse=True)[:5]
['#Vincenzo1', '#Giulia', '#Palermo', '#Ignazio2', '#GP']
betCent = nx.betweenness_centrality(G2)
node_color = [20000.0 * G2.degree(v) for v in G2]
node_size = [v * 10000 for v in betCent.values()]
plt.figure(figsize=(40, 40))
nx.draw_networkx(G2, with_labels=False,
node_color=node_color,
node_size=node_size)
plt.axis('off')
sorted(betCent, key=betCent.get, reverse=True)[:5]
['#Vincenzo1', '#Giulia', '#Palermo', '#Carlo', '#Borbone']
#dictionary with the results of all the centrality measures applied, for each node
final_dict = dict()
for x in betCent.keys():
    new_dic = dict()
    new_dic['betCent'] = betCent[x]
    new_dic['degreeCent'] = degreeCent[x]
    new_dic['closeCent'] = closeCent[x]
    final_dict[x] = new_dic
import pandas as pd
df = pd.DataFrame(final_dict)
df = df.transpose()
df
Node | betCent | degreeCent | closeCent
---|---|---|---
#mogliePaolo | 0.044803 | 0.130890 | 0.458034
#Vincenzo1 | 0.666127 | 0.607330 | 0.689531
#Olimpia | 0.000000 | 0.010471 | 0.416122
#IgnazioMessina | 0.005299 | 0.047120 | 0.444186
#pirati | 0.004735 | 0.015707 | 0.411638
... | ... | ... | ...
#mulino | 0.000000 | 0.010471 | 0.423503
#corriere | 0.000000 | 0.005236 | 0.408994
#privativa | 0.000279 | 0.026178 | 0.432127
#istitutoCredito | 0.000000 | 0.010471 | 0.409871
#villaOlivuzza | 0.000000 | 0.010471 | 0.416122
192 rows × 3 columns
import matplotlib.ticker
plt.rcParams["figure.figsize"] = [7, 7]
plt.rcParams["figure.autolayout"] = True
fig, ax = plt.subplots()
df['degreeCent'].value_counts().plot( kind='kde', ax=ax, logx=True, bw_method=0.05)
df['closeCent'].plot( kind='kde', ax=ax, logx=True, color='red')
df['betCent'].plot(kind='kde', ax=ax, logx= True, color = 'green')
plt.show()
Also in this case:
Analysing the graph obtained from the centrality measures applied to Vincenzo Florio's network, we can infer some information:
communities = set()
for u in G2.nodes():
    communities.add(G2.nodes[u]['Modularity Class'])
communities
{0, 1, 2, 3, 4, 5, 6, 7}
#dictionary mapping each modularity class to the labels of the nodes that belong to it
comm_dic = {c: [] for c in sorted(communities)}
for u in G2.nodes():
    comm_dic[G2.nodes[u]['Modularity Class']].append(G2.nodes[u]['label'])
comm_dic
{0: ['Vincenzo Florio', 'Ignazio Messina', 'pirati', 'comandante Miloro', 'gli americani', 'Benjamin Ingham', 'i palermitani', 'Mercurio Nasca di Montemaggiore', 'servitore', 'factotum', 'notaio Michele Tamajo', 'Giuseppe Calabrese', 'i napoletani', 'Tommaso Portalupi', 'Giovanni Portalupi', 'Giulia Portalupi', 'baronessa Pillitteri', 'barone Morillo', 'Antonia Portalupi', 'Antonietta', 'duchessa Alessandra Spadafora', 'John Woodhouse', 'Spilateri', 'principe di Castelforte', 'priore di San Martino delle Scale', 'Carlo Giachery', 'Luigi Giachery', 'notaio Avellone', 'duca di Cumia', 'prete battesimo', 'Carmelo Caratozzolo', 'Vito', 'notaio Caldara', 'Pallavicini', 'Vito Cordova', 'Saro Ernandez', 'fratelli Sgroi', 'Carlo Filangeri principe di Satriano', 'Ferdinando II', 'i siciliani', 'I Borbone', 'gli austriaci', 'Giovanni Caruso', 'Luigi Nicola de Majo', 'Michele', 'Giuseppe La Masa', 'Rosolino Pilo', 'Ruggero Settimo', 'Pasquale Calvi', 'Nicolò Turrisi Colonna', 'barone Pietro Riso', 'Pietro Rossi', 'Vincenzo Caruso', 'Vincenzo Cassisi', 'Sebastiano Camarrone', 'i Savoia', 'Francesco Merle', 'Giuseppe Garibaldi', 'Vittorio Emanuele II', 'Francecso Crispi', 'notaio Quattrocchi', 'Ignazio Florio jr', 'Cala di Palermo', 'Brasile', 'Stati Uniti', 'Palermo', 'Palazzo Steri', 'via della Zecca Regia', "via dell'Alloro", 'Milano', 'Lombardia', 'Sicilia', 'teatro Carolino', 'Piemonte', 'casa dei Portalupi', 'Cassaro', 'mandamento di Castellammare', 'via della Zecca Regia', 'via dei Chiavettieri', 'Genova', 'Roma', 'Parigi', 'Veneto', 'Europa', 'via degli Argentieri', 'Monreale', 'America', 'via Bambinai', 'porta di San Giorgio', 'Palazzo dei Normanni', 'Palazzo delle Finanze', 'Noviziato', 'Porta Felice', 'Caltanissetta', 'Palazzo di Città', 'Mar Mediterraneo', 'Glasgow', 'quartiere Bocccadifalco', 'convento della Gancia', 'Toscana', 'Emilia', 'Alcamo', 'Partinico', 'drogheria Florio', 'nave "Anna"', "tonnara dell'Arenella", 'Casa Commerciale Florio', 'marsala (vino)', 
'zolfo', 'ufficio commerciale dei Florio', 'terreno di zolfo', 'appartamento di Giulia', 'villa a San Lorenzo', 'casa emergenza colera', 'Società dei battelli a vapore siciliani', 'tonnara di Favignana', 'Palazzina dei Quattro Pizzi', 'Fonderia Oretea', 'piroscafo "Palermo"', 'piroscafo "Indépendant"', 'compagnia di navigazione "Ignazio e Vincenzo Florio"', 'Banco Regio', 'casa padronale-cantina Marsala', 'mulino per il sommacco', 'il Corriere siciliano', 'privativa del servizio postale', 'Istituto di Credito per gli affari commerciali in Sicilia'], 1: ['i calabresi', 'chiesa di San Giovanni dei Napoletani'], 2: ['Raffaele Barbaro', 'gli inglesi', 'Marsala', 'isole Egadi', 'Francia', 'terreno per costruire una cantina', 'cantina Florio', 'società tra Raffaele Barbaro e la Ignazio e Vincenzo Florio'], 3: ['Giuseppina Saffiotti Florio', 'Olimpia', 'Ignazio Florio', 'le cameriere', 'Isabella Pillitteri', 'Mattia Florio Barbaro', 'Paolo Florio', 'Paolo Barbaro', 'don Saverio', 'Ninetta', 'Luisa', 'via dei Materassai', 'Bagnara Calabra', 'piano San Giacomo', 'Calabria', 'fede nuziale di Rosa Bellantoni', "tonnara di San Nicolò l'Arena", 'tonnara di Vergine Maria', "tonnara dell'isola delle Femmine", 'casa in Via dei Materassai 53', 'Scialle di Giuseppina'], 4: ['Francesco di Giorgio', 'Lorenzo Lugaro', 'sommacco', 'magazzino via dei Materassai'], 5: ['Angelina', 'Giuseppina', 'Lucia (domestica)', 'Ignazio Florio (figlio di Vincenzo)', 'Mademoiselle Brigitte', 'Luigi De Pace', 'madre di Luigi De Pace', 'i Trigona', "Giovanna d'Ondes", 'Vincenzo Florio (figlio di Ignazio)', 'Marsiglia', 'Favignana', 'ufficio a Piano San Giacomo', "villino Florio all'Olivuzza"], 6: ['Augusto Merle', "Romualdo Trigona di Sant'Elia", 'Giuseppe Lanza di Trabia', 'Gabriele Chiaramonte Bordonaro', 'Stefania Branciforte', 'Laura Naselli', 'Joseph Whitaker', 'Salvatore De Pace', 'Sophia', 'Willie Whitaker', 'Napoli'], 7: ['i garibaldini', 'Porta Termini', 'via Maqueda', 'palazzo Ajutamicristo', 
'chiostro della Magione']}
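To check which community a given character or place falls into, a dictionary shaped like comm_dic can be inverted. The following is only a minimal sketch: comm_excerpt is a hand-picked excerpt of the output above, and node_to_comm is a hypothetical helper name that is not part of the original notebook.

```python
# Small excerpt of the community dictionary printed above
# (only a few labels per community, chosen for illustration).
comm_excerpt = {
    0: ['Vincenzo Florio', 'Benjamin Ingham', 'Palermo'],
    2: ['Raffaele Barbaro', 'Marsala', 'cantina Florio'],
    3: ['Giuseppina Saffiotti Florio', 'Ignazio Florio', 'Paolo Florio'],
}

def node_to_comm(comm_dic):
    """Invert {community_id: [labels]} into {label: community_id}."""
    return {label: comm_id
            for comm_id, labels in comm_dic.items()
            for label in labels}

lookup = node_to_comm(comm_excerpt)
print(lookup['Ignazio Florio'], lookup['Vincenzo Florio'])
```

With the full comm_dic in place of the excerpt, the same lookup would show at a glance that the first generation of the family (Paolo, Ignazio, Giuseppina) sits in a different community from Vincenzo Florio.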
I also drew some conclusions from the analysis of this second network:
The main question that came to my mind after the network analysis is: "Was this entire study useful for answering my initial research questions?".
Undoubtedly yes: thanks to this project I was able to see more clearly the organization of the novel's plot and the real importance (or rather, centrality) of the main characters, and I also obtained an easily understandable view of the intricate relationships between people and places. According to Moretti in Graphs, Maps, Trees, the quantitative approach to literature allows you to grasp the entire system as a whole.
Only a close reading of the novel, however, makes it possible to understand the system in its details.
For instance, the secret love between Ignazio Florio, Paolo's brother, and Giuseppina Florio, Paolo's wife, is a silenced constant in the first part of "The Lions of Sicily". It comes to light in the actions of the two characters only once in the entire novel: not enough to create a strong and significant link between the two characters' nodes in the network.
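The effect can be illustrated with a toy example. This is a minimal sketch, assuming that the weight of an edge counts the scenes in which two characters interact. The interaction counts below are invented for illustration only and are not taken from the novel or from the project data.

```python
from collections import Counter

# Hypothetical interactions: one (character, character) pair per scene.
# The numbers of scenes are invented, purely to illustrate the point.
interactions = (
    [("Paolo Florio", "Ignazio Florio")] * 5
    + [("Paolo Florio", "Giuseppina Saffiotti Florio")] * 4
    + [("Ignazio Florio", "Giuseppina Saffiotti Florio")] * 1
)

# Edge weight = number of shared scenes; frozenset makes each pair undirected.
weights = Counter(frozenset(pair) for pair in interactions)

def weight(a, b):
    """Return the weight of the undirected edge between a and b."""
    return weights[frozenset((a, b))]

# The edge with the lowest weight is the weakest tie in the toy network.
weakest = min(weights, key=weights.get)
print(sorted(weakest), weights[weakest])
```

Under this assumption, a relationship that surfaces in a single scene ends up as the weakest tie, however central it is to the story.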