Graph Components, Centrality, and Link Analysis
The first post on this blog introduced a graph with all the verses in the standard works connected by their annotated cross-references. As I mentioned there, the graph contains 41,995 verses and 45,985 cross-references. However, the graph is composed of many disconnected components: most verses are impossible to reach by following cross-references from any given starting verse.
Graph Components
Consider the group of nodes shown here:
Most of the edges are reciprocal (that is, there are edges pointing in both directions). However, no edges connect this group to the larger graph: this is the complete set of connections for these verses, and no verse in the standard works outside this group references any verse in it. In graph theory terminology, the group is a weakly connected component: if we ignore edge direction, every verse in it can be reached from every other, but no verse outside it can be reached.
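For readers who want to reproduce this kind of grouping, here is a minimal sketch (not the code from the GitHub repository mentioned at the end of this post) that finds weakly connected components with a union-find structure. It assumes the graph is given as a list of verse IDs plus a list of directed `(source, target)` reference pairs; the function name and the toy letter IDs are hypothetical.

```python
from collections import defaultdict


def weakly_connected_components(nodes, edges):
    """Group nodes into weakly connected components (edge direction is ignored).

    nodes: iterable of node IDs; include every verse so singletons are kept.
    edges: iterable of directed (source, target) pairs.
    Returns a list of node sets, largest first.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Walk up to the root, halving the path as we go to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        # Register any endpoint that was missing from `nodes`.
        parent.setdefault(u, u)
        parent.setdefault(v, v)
        # Any edge, whichever way it points, merges the two endpoints' groups.
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    groups = defaultdict(set)
    for n in list(parent):
        groups[find(n)].add(n)
    return sorted(groups.values(), key=len, reverse=True)


# Toy example with hypothetical verse IDs: A and B reference each other,
# C references B, D references E, and F has no references at all.
components = weakly_connected_components(
    ["A", "B", "C", "D", "E", "F"],
    [("A", "B"), ("B", "A"), ("C", "B"), ("D", "E")],
)
print([sorted(c) for c in components])  # [['A', 'B', 'C'], ['D', 'E'], ['F']]
```

Because direction is ignored, `C` ends up in the same component as `A` and `B` even though nothing references it. For a graph of this size, NetworkX’s `weakly_connected_components` function is a ready-made alternative.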
Most of the full cross-reference graph is made up of small, disconnected components like this one. In fact, more than half of the verses (53%) are singletons with no incoming or outgoing edges: they do not reference any other verse, and no other verse references them. Here’s a histogram showing the full distribution of component sizes (note that both axes use a logarithmic scale):
The frequency of components with a given size drops rapidly as the size increases. There are 22,243 singletons, 1159 two-node components, 342 three-node components, and so on. The bar on the far right is the one we’re most interested in; it represents a single component containing 15,004 nodes and 40,011 edges. This component contains 36% of the verses in the standard works (76% of the verses that have any incoming or outgoing references) and 87% of the cross-references.
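The histogram and percentages above come down to simple bookkeeping over the list of components. The sketch below assumes components are given as a list of node sets (for example, the output of a weakly-connected-components routine) and edges as directed `(source, target)` pairs; the function names are hypothetical. The last lines recompute the shares from the counts reported in this post.

```python
from collections import Counter


def size_histogram(components):
    """Number of components of each size (the data behind a log-log histogram)."""
    return Counter(len(c) for c in components)


def largest_component_shares(components, edges):
    """Fractions of nodes and edges that fall in the largest component.

    components: list of node sets covering every node, singletons included.
    edges: list of directed (source, target) pairs.
    Assumes no self-references, so every size-1 component is a verse with no edges.
    """
    total_nodes = sum(len(c) for c in components)
    nodes_with_edges = sum(len(c) for c in components if len(c) > 1)
    largest = max(components, key=len)
    # Both endpoints of an edge always lie in the same weak component,
    # so checking the source is enough.
    largest_edges = sum(1 for source, _ in edges if source in largest)
    return {
        "nodes": len(largest) / total_nodes,
        "nodes_with_edges": len(largest) / nodes_with_edges,
        "edges": largest_edges / len(edges),
    }


# Recomputing the shares from the counts reported above:
print(round(15_004 / 41_995, 2))             # 0.36 of all verses
print(round(15_004 / (41_995 - 22_243), 2))  # 0.76 of verses with any reference
print(round(40_011 / 45_985, 2))             # 0.87 of cross-references
```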
Since this component is by far the largest and most interesting subgraph, subsequent analysis of the cross-reference graph is focused only on this subgraph.
Which nodes are most important?
One way to think about the cross-reference graph is as an information network. Verses contain information—explanations of doctrine, accounts of historical events, prophecies, etc.—and cross-references are a mechanism for propagating that information throughout the scriptures. If many authors cite the same verse, we might assume that verse is more “important” than a verse that is rarely referenced.
Before going further, I should clarify that we are talking about “importance” in the graph theoretical sense. We certainly hope that the graph structure approximates the doctrinal basis of the scriptures and highlights important topics, events, and teachings (much like the scripture mastery program, which has manually chosen a set of important nodes according to those criteria). However, this isn’t guaranteed; sparsity and bias in the existing references could leave us with significant gaps that future work to propose new cross-references could help to address.
Centrality
There are many ways of assessing the importance of nodes in graphs, variously referred to as centrality metrics. We have already explored one of these: degree centrality, which simply counts the number of incoming references (technically this is “in-degree” centrality). More advanced methods also consider the importance of the referrers: a node can be important while having few references if those references come from other important nodes. One method in this class is PageRank, which Google developed to rank web pages in its search results.
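Both measures are short to write down. The sketch below computes in-degree and a basic power-iteration PageRank in plain Python. The damping factor of 0.85 is a common default, not a value stated in this post, and nodes with no outgoing references spread their score evenly over all nodes (one standard way to handle such “dangling” nodes); the function names and toy graph are illustrative.

```python
def in_degree(nodes, edges):
    """Degree centrality as used here: the number of incoming references."""
    counts = {n: 0 for n in nodes}
    for _, target in edges:
        counts[target] += 1
    return counts


def pagerank(nodes, edges, damping=0.85, tol=1e-10, max_iter=1000):
    """Basic PageRank by power iteration; scores sum to 1."""
    nodes = list(nodes)
    n = len(nodes)
    out_links = {u: [] for u in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Score held by nodes with no outgoing references is spread evenly.
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            if out_links[u]:
                # Each node splits its score equally among everything it references.
                share = damping * rank[u] / len(out_links[u])
                for v in out_links[u]:
                    new_rank[v] += share
        delta = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: A, B, and D all reference C; C references A.
nodes = ["A", "B", "C", "D"]
edges = [("A", "C"), ("B", "C"), ("D", "C"), ("C", "A")]
print(in_degree(nodes, edges))  # {'A': 1, 'B': 0, 'C': 3, 'D': 0}
scores = pagerank(nodes, edges)
print(sorted(nodes, key=scores.get, reverse=True))  # ['C', 'A', 'B', 'D']
```

Here `A` has only one incoming reference but ranks far above `B` and `D`, because that reference comes from the top-ranked node `C`. Dividing by `len(out_links[u])` is what makes PageRank discount references from verses that cite many others.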
The table below shows the most important verses as measured by degree and PageRank centrality. While many of the same verses are ranked highly by both methods, there are notable differences. The rank correlation (Kendall’s 𝜏) over all the nodes in the graph is 0.73 (with perfect correlation being 1.0).
Verse | Rank (Degree) | Rank (PageRank) |
---|---|---|
D&C 1:38 | 1 | 1 |
D&C 17:1 | 1 | 7 |
1 Ne. 17:35 | 3 | 2 |
Hel. 12:3 | 3 | 4 |
D&C 1:14 | 5 | 6 |
Moses 6:57 | 6 | 9 |
2 Ne. 25:20 | 6 | 18 |
1 Ne. 19:10 | 6 | 27 |
D&C 88:63 | 9 | 8 |
D&C 1:16 | 9 | 17 |
2 Ne. 9:28 | 11 | 10 |
Mosiah 4:26 | 22 | 3 |
2 Ne. 9:37 | 112 | 5 |
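Because degree ranks contain many ties, a tie-corrected version of Kendall’s 𝜏 is the natural choice for comparing the two rankings. The sketch below implements the tau-b variant directly; it is an illustration, and the post does not say which variant or library produced the 0.73 figure. SciPy’s `scipy.stats.kendalltau` computes tau-b by default and is far faster on a graph of this size than this O(n²) loop.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equal-length sequences.

    Pairs tied in only one sequence shrink the denominator (the tau-b tie
    correction); pairs tied in both sequences are ignored.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    denom = sqrt((pairs + tied_x_only) * (pairs + tied_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Degree and PageRank ranks for the 13 verses in the table above. This only
# measures agreement among these rows, not the all-node value of 0.73.
degree_rank = [1, 1, 3, 3, 5, 6, 6, 6, 9, 9, 11, 22, 112]
pagerank_rank = [1, 7, 2, 4, 6, 9, 18, 27, 8, 17, 10, 3, 5]
print(round(kendall_tau_b(degree_rank, pagerank_rank), 3))
```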
The disparity in ranks for 2 Ne. 9:37 is quite remarkable. Although it has only 12 incoming references, it is ranked above 2 Ne. 9:28 (with 20 incoming references) by PageRank. This can be explained by noting that the PageRank algorithm divides the influence of each node across all of its outgoing connections. I like the example given by Mark Newman:
For instance, websites like Amazon or eBay link to the web pages of thousands of manufacturers and sellers; if I’m selling something on Amazon it might link to me. Amazon is an important website, and would have high centrality by any sensible measure, but should I therefore be considered important by association? Most people would say not: I am only one of many that Amazon links to and its contribution to the centrality of my page will get diluted as a result. (Newman, Mark. Networks. Oxford University Press, 2018. p. 165.)
Similarly, we wouldn’t expect a verse to benefit much from being referenced by a verse that also references many others. In the cross-reference graph, 2 Ne. 9:28 is referenced by 2 Ne. 26:20. Since 2 Ne. 26:20 is fairly highly ranked (146th), we might expect it to contribute significantly to the score of 2 Ne. 9:28. However, 2 Ne. 26:20 also references 16 other verses, so its influence is split 17 ways and each verse it references receives much less than we would otherwise expect. (This feature of PageRank will be especially important as we begin proposing new connections.)
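The dilution is visible in the standard PageRank update. In its basic form (ignoring the special handling of nodes with no outgoing edges), the score of a node $v$ is given below, where $N$ is the number of nodes, $d$ is a damping factor (0.85 is a common choice, though the value used for this post is not stated), and $L(u)$ is the number of outgoing edges of $u$:

```latex
PR(v) = \frac{1 - d}{N} + d \sum_{u \to v} \frac{PR(u)}{L(u)}
```

Since 2 Ne. 26:20 references 17 verses (2 Ne. 9:28 plus 16 others), $L = 17$ and its contribution to 2 Ne. 9:28 is $d \cdot PR(\text{2 Ne. 26:20}) / 17$; with $d = 0.85$, that is exactly $0.05 \cdot PR(\text{2 Ne. 26:20})$.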
Hubs and Authorities
Another useful link analysis algorithm is hyperlink-induced topic search (HITS). This algorithm introduces the concepts of hubs and authorities: a hub is a node that points to many authoritative sources (like an entry in the Topical Guide), while an authority is a node that serves as a source of truth. In practice, these definitions are circular (authorities are recognized because they are referenced by important hubs, and hubs emerge by pointing to many authorities), so the algorithm iterates until it finds a self-consistent solution. Note that these labels are not mutually exclusive; a node can be both a hub and an authority. Additionally, unlike PageRank, HITS does not divide a node’s influence by its number of outgoing edges.
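A minimal sketch of the HITS iteration is shown below, assuming the graph is given as verse IDs plus directed `(source, target)` pairs. Scores are rescaled to unit Euclidean length each round; implementations differ on normalization (some rescale to sum to 1), so treat this as illustrative rather than the exact routine behind the tables in this post. The function name and toy graph are hypothetical.

```python
from math import sqrt


def hits(nodes, edges, tol=1e-10, max_iter=1000):
    """Hub and authority scores by power iteration; each score vector has unit length."""
    nodes = list(nodes)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        # A node's authority is the sum of the hub scores of nodes referencing it.
        new_auth = dict.fromkeys(nodes, 0.0)
        for source, target in edges:
            new_auth[target] += hub[source]
        # A node's hub score is the sum of the authority scores of nodes it
        # references. There is no division by out-degree here, unlike PageRank.
        new_hub = dict.fromkeys(nodes, 0.0)
        for source, target in edges:
            new_hub[source] += new_auth[target]
        # Rescale so the scores don't grow without bound.
        auth_norm = sqrt(sum(a * a for a in new_auth.values())) or 1.0
        hub_norm = sqrt(sum(h * h for h in new_hub.values())) or 1.0
        new_auth = {n: a / auth_norm for n, a in new_auth.items()}
        new_hub = {n: h / hub_norm for n, h in new_hub.items()}
        delta = sum(abs(new_hub[n] - hub[n]) + abs(new_auth[n] - auth[n]) for n in nodes)
        hub, auth = new_hub, new_auth
        if delta < tol:
            break
    return hub, auth


# Hypothetical toy graph: H1 and H2 each reference A1 and A2; X references only A1.
nodes = ["H1", "H2", "X", "A1", "A2"]
edges = [("H1", "A1"), ("H1", "A2"), ("H2", "A1"), ("H2", "A2"), ("X", "A1")]
hub, auth = hits(nodes, edges)
print(max(auth, key=auth.get), max(hub, key=hub.get))  # A1 H1
```

`X` still earns a nonzero hub score from its single reference, and a node that points at many authorities accumulates their full scores rather than a divided share.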
Running HITS on the cross-reference graph identifies a set of hubs and authorities that we can compare to the nodes identified by degree centrality and PageRank:
Hubs | Authorities |
---|---|
1 Ne. 19:10 | Mosiah 7:19 |
2 Ne. 26:12 | 1 Ne. 19:10 |
D&C 19:27 | 2 Ne. 25:20 |
Mosiah 7:19 | Mosiah 7:27 |
1 Ne. 13:42 | 2 Ne. 26:12 |
2 Ne. 19:6 | Alma 11:39 |
D&C 8:3 | 2 Ne. 10:3 |
D&C 18:26 | 3 Ne. 11:14 |
Moro. 7:22 | D&C 19:27 |
D&C 18:6 | 1 Ne. 13:42 |
Five verses appear in both lists: 1 Ne. 19:10, 2 Ne. 26:12, D&C 19:27, Mosiah 7:19, and 1 Ne. 13:42. Surprisingly, only two verses are repeated from the earlier centrality analyses: 1 Ne. 19:10 and 2 Ne. 25:20.
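These overlaps are easy to double-check with set operations. The lists below are transcribed from the two tables in this post; the variable names are illustrative.

```python
# Verse lists copied from the tables in this post.
hubs = ["1 Ne. 19:10", "2 Ne. 26:12", "D&C 19:27", "Mosiah 7:19", "1 Ne. 13:42",
        "2 Ne. 19:6", "D&C 8:3", "D&C 18:26", "Moro. 7:22", "D&C 18:6"]
authorities = ["Mosiah 7:19", "1 Ne. 19:10", "2 Ne. 25:20", "Mosiah 7:27", "2 Ne. 26:12",
               "Alma 11:39", "2 Ne. 10:3", "3 Ne. 11:14", "D&C 19:27", "1 Ne. 13:42"]
centrality_table = ["D&C 1:38", "D&C 17:1", "1 Ne. 17:35", "Hel. 12:3", "D&C 1:14",
                    "Moses 6:57", "2 Ne. 25:20", "1 Ne. 19:10", "D&C 88:63", "D&C 1:16",
                    "2 Ne. 9:28", "Mosiah 4:26", "2 Ne. 9:37"]

authority_set = set(authorities)
# Verses that are both hubs and authorities, in hub-rank order.
both = [v for v in hubs if v in authority_set]
# Verses from either HITS list that also appear in the degree/PageRank table.
shared_with_centrality = sorted((set(hubs) | authority_set) & set(centrality_table))
print(both)
print(shared_with_centrality)  # ['1 Ne. 19:10', '2 Ne. 25:20']
```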
Thus we see
There are definitely some strong themes in the verses we’ve highlighted in this post. First, the Book of Mormon and the Doctrine and Covenants clearly make up the majority of the structurally important nodes in the cross-reference graph. This isn’t surprising, considering the role of those volumes in the restoration of the gospel of Jesus Christ and the likely emphases of the people who compiled the original references (the Old and New Testaments also have lower relative reference counts than the restoration scriptures). Some of this imbalance may reflect systematic differences between the volumes (e.g., in how they were created and compiled), but I hope that future work to propose new cross-references can help to reduce this bias.
Returning to the nodes we’ve identified, many of these verses refer to the role of prophets and the mission of Jesus Christ. There is repeated emphasis on the exodus of Israel from Egypt and the title of Jehovah as the God of Israel. Notably, one of the authorities is 3 Ne. 11:14, where the Lord extends an invitation:
Arise and come forth unto me, that ye may thrust your hands into my side, and also that ye may feel the prints of the nails in my hands and in my feet, that ye may know that I am the God of Israel, and the God of the whole earth, and have been slain for the sins of the world.
I’m delighted to see such fundamental doctrines highlighted by our analysis of the cross-reference graph, and I hope that additional interesting and insightful patterns will emerge as we continue digging.
The code used for the analysis and figures in this post is available on GitHub. Additional images were generated with Cytoscape.
Updates:
- January 4, 2021: Updated the reference parsing code to handle multiple verses in the same chapter (e.g. “1 Ne. 3:7, 10” now creates edges to 1 Ne. 3:7 and 1 Ne. 3:10). This changed the edge count from 45,786 to 45,985 and resulted in multiple changes to the set and order of nodes chosen by the various node ranking algorithms.
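The parsing change described in this update can be illustrated with a short sketch. It assumes references look like `Book chapter:verse, verse, ...` within a single chapter; verse ranges and references spanning chapters are not handled, and the pattern and function name are hypothetical rather than the repository’s actual parser.

```python
import re

# Hypothetical pattern: a book name, a chapter, then one or more comma-separated verses.
REFERENCE = re.compile(r"^(?P<book>.+?)\s+(?P<chapter>\d+):(?P<verses>\d+(?:\s*,\s*\d+)*)$")


def expand_reference(ref):
    """Expand "1 Ne. 3:7, 10" into ["1 Ne. 3:7", "1 Ne. 3:10"]."""
    match = REFERENCE.match(ref.strip())
    if match is None:
        raise ValueError(f"unrecognized reference: {ref!r}")
    book, chapter = match.group("book"), match.group("chapter")
    # Every verse number after the colon belongs to the same book and chapter.
    return [f"{book} {chapter}:{int(verse)}" for verse in match.group("verses").split(",")]


print(expand_reference("1 Ne. 3:7, 10"))  # ['1 Ne. 3:7', '1 Ne. 3:10']
```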
© Copyright 2020–2022 Steven Kearnes. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.