[Back to Keyring Analysis Page]

Explanation of this Keyring Analysis

This analysis inspired by the original work of Neal McBurnett, whose work can be found at http://bcn.boulder.co.us/~neal/pgpstat/. Neal's original reports on the PGP keyring began in January 1996, and gave one of the first good views into both the math and social aspects behind the global web of trust. Thanks Neal.

Before you can get anything from this report, you'll need to have a decent understanding of the principles behind public key cryptography, and on a concept known as the "web of trust". I suggest some background reading from the following sites:

Please do _not_ proceed to use public key cryptography without some basic knowledge of how it works, or you risk jeapordizing the security of those that might put their trust in you!

The "strong set"

This analysis focuses on the web of trust, and in particular something called a "strongly connected" set. Consider that every key in the world is some point in a graph. Signing another key creates a one-directional edge/vector from the 'signer' to the 'signee'. A strongly connected set is one in which one can follow these edges from any point in the set and get to any other point in the set.

For instance, if key A signs key B, B signs C, and C signs A, these keys represent a strongly connected set because from any one of the points, any of the other points is reachable. It does not have to be a single 'hop' away, but may go through other points. A cannot reach C directly, but can reach B, which can reach C, so A can get to C (and has a path length of 2 to do so). A could also sign C directly, and the set would remain strongly connected, and A would now have a quicker path to C.

Worldwide, there are thousands of small strongly connected sets between groups of friends. As people exchange keys between these sets, they merge and form larger and larger strongly connected sets of keys. Because of the way the math and sociodynamics work, there ends up being one large strongly connected set in the world, with the size of the next-largest set being much much smaller. This happens because as soon as another strongly connected set begins to grow, it becomes more and more likely that someone in that set will exchange signatures with someone in the large set and join the two. Therefore, I refer to one "strong set" worldwide, which is by far the largest strongly connected set of keys.

A measure of trust

There are a variety of metrics one could apply to this set, but I've chosen initially to measure the "mean shortest distance" (MSD) to each key. Since every key is reachable from every other in the strong set, it is possible to find out the shortest distance (number of hops) to any given key from any other key. Averaging these distances gives the MSD to that key from every other key in the strong set.

It is desirable to have as short as possible an MSD to your key, as that means that on average, people can reach your key quickly through signatures, and thus your key is relatively more trusted than a key with a higher MSD.

NOTE: This does not mean that you should universally trust keys with a low MSD. This is merely a relative measurement for statistical purposes.

The MSD has the property of being no more than 1 higher than your lowest signature. In the worst case, every key in the strong set could reach you by getting to that key, plus 1 hop to get to you. It also encourages the joining of keys that are separated by great distances in the graph, as it will make you a highway of sorts for shortest paths between keys in those groups. In the end, it encourages an overall tightening of the world graph, shortening distances between key owners.

Bad measures of trust

There are other ways to measure how "trusted" a key is, many of them very misleading.

First of all, any measure of trust that does not include your own key, and signatures you've made and received, is bad. Just because I've published something that claims that key X is relatively trusted, does not mean that you should trust it. Your personal web of trust is really the only good measure of trust.

That said, there are other pitfalls to avoid. It would be very easy to measure trust based on the number of signatures a key has received. This has the disadvantage of assigning the same level of trust to every signature out there. For instance, if I create my own key and use it to sign my other key, is that as valuable as if Phil Zimmermann had signed my original key? The MSD method of measuring calculates a truer measure of distances than simple signature counting.

You could also measure the MSD from a given key to every other key out there. This would give some measure of how close you are to others. While this is potentially valuable, it potentially has the very dangerous drawback of encouraging frivolous signatures. I could sign every key on the net and make myself the best "signer" in the world, but this would only serve as a circumvention of the system. This is why I only measure MSD to a key, rather than from it. This properly encourages verifying yourself to others. Signing others' keys is still very important, as it increases your personal web of trust.

The algorithm specifics

The code behind this analysis isn't terribly long, nor is it packaged terribly well, but it serves its purpose and does so quickly. It is a short shell script and perl script for keyring pre-processing, and a few hundred line C program for the actual analysis. Everything is copyright (c)2001 M. Drew Streib and released under the GPL. You can find the code here.

The first task is to find the strong set. I do this by first finding a key that I assume to be in the strong set, in this case mine. Since I was able to trace signatures to and from some of the world's famous encryption experts, this assumption is pretty safe (and the resulting analysis proves it to be true). Note that this will work from any key in the strong set without affecting results, as this is only used to find the strong set in the first place, and not as the basis for any measurement.

I performed this same test beginning with keys from Phil Zimmermann and Ted Ts'o, and got the exact same results, as expected.

From this starting point, I perform a depth-first search (DFS), marking nodes as "connected" as I reach them. I perform this search twice, once looking for signatures to me, and once looking for keys I can reach with my own signatures. The intersection of these two sets gives the strong set. I have thus proved that I can reach those nodes, and they can reach me, and thus they can all reach each other as well (at a minimum through me). The efficiency of this search is O(n).

After finding the strong set, it is necessary to find the mean distance to each node from every other node. This is done with a modified BFS, in which shortest distances are stored. The efficiency of this modified search is O(n). These distances are then simply averaged to get one node's MSD.

This second search is repeated for each key, giving the basis for the overall ranking and the average MSD calculation. The overall efficiency of finding the set of MSDs then, is O(n^2).

 

All original content on this site is copyright (c)2000-2113 M. Drew Streib and licensed under the OpenContent License unless otherwise noted.