Distributed Identity Management in the PGP
Web of Trust
Daniel Margolis <dmargoli@seas.upenn.edu>
Faculty Advisor: Professor Matt Blaze <blaze@cis.upenn.edu>
11 April, 2006
1 Introduction
A web of trust is a conceptual model used to determine the authenticity of keys
in a cryptographic system. The term is most closely associated with PGP (and
other OpenPGP-compatible applications), which is by far the most prevalent
real-world example of such a model. In the case of PGP, a web of trust serves
as a distributed replacement for the centralized structure of a traditional PKI,
in which institutionalized certificate authorities (e.g. VeriSign and the other
companies that sign commercial SSL certificates) serve the same key-verification role. However,
the web of trust is not merely a poor man’s certificate authority; the web of
trust can function where a certificate authority is infeasible due to expense or
lack of infrastructure as well as in situations where there is no way for a single
authority to properly verify claims (which may extend well beyond key-identity
pairings). Yet the manual web of trust system, by which users are assumed to
be capable of finding trusted paths between themselves and the key they wish
to verify, and of determining trust by observing the chain of signatures, leaves
much to be desired.
Current state of the art in the PGP system involves the use of centralized
key servers to facilitate distribution of keys and key signatures (endorsements
by one principal of other key-identity pairings). Yet these servers do no authen-
tication, making the information contained upon them as meaningful as that on
any public wiki or bulletin board. Users are left to their own devices to deter-
mine how much trust to place in the signatures of other principals; anecdotal
observation shows that for many, simply verifying that the key in question exists
on the key server is taken as sufficient evidence of its validity. Hardly better
is the practice of trusting keys that are signed by a large number of people;
endorsements of a key by other unverified keys bear little or no weight. Yet
without automated tools for evaluating the trust of keys in the web of trust
in a principled way, users have no feasible alternative to these bad practices.
In this project, I hope to determine if it is possible to calculate trust for non-
adjacent keys using the strength of the signing paths between the querying key
and the target key (i.e. the client key and the key for which the client wishes
to determine validity). In such a system, a server, possessing a database of
participating keys and signatures, would return all relevant signing paths (to be
verified offline by the client) and a computation of the trust thus imparted to the
target key. Clients should be able to determine a numerical score indicating the
trustworthiness of that key, and to determine the relative validity of conflicting
keys (in the event of multiple keys claiming the same identity, as with a malicious
impostor).
2 Related Work
A large amount of data exists on the real-world properties of the PGP web of
trust [6, 1]. More interesting is Michael Reiter and Stuart Stubblebine’s paper
detailing the notion of using independent trust paths to verify the identity of
a principal in web of trust systems[3], which included the creation of a Web
service to find independent paths in the PGP web of trust. However, Reiter
and Stubblebine mainly explore efficient methods of finding those independent
paths, and only discuss the value of those paths in authentication as a means of
finding hopefully independent testimony about a principal’s identity. While that
is an interesting discussion, as a practical approach I believe it to be limited,
since non-disjoint paths are also of value, and a more nuanced approach in
which edges are weighted with the strength of the trust relationship allows a
far more sophisticated method of determining the likelihood of a key’s validity
than simply comparing the testimony from each independent path.
Reiter and Stubblebine also, however, attempt to arrive at a set of required
properties a trust metric must satisfy to be useful, and evaluate much of the
existing research on the subject. In the process, they identify many of the flaws
in existing or proposed systems, and help to clarify what it means to measure
trust.[2] However, again, they do not arrive at a complete or implemented sys-
tem; in this research, I hope to implement and test a system that meets many,
if not all, of the requirements laid out in their work.
Beth, Borcherding, and Klein describe a formal approach to trust very simi-
lar to my own, in which the web of trust is viewed as a directed weighted graph
with edge weights in the range [0, 1] as probabilities and featuring a distinction
between what they call “direct trust” and “recommendation trust,” respectively
equivalent to what I refer to as “key trust” and “principal trust” (detailed in
Section 4.1).[8]
Raph Levien has implemented perhaps the most interesting real-world ex-
periment in applying sophisticated trust metrics with his work on the online
developer community Advogato. Advogato implements a novel trust metric
based on a model of network flow as a means of certifying users as one of three
levels (“Master,” “Journeyer,” and “Apprentice”); this has proven an effective
measure against the spam, trolling, and pollution that infest many other pub-
lic online communities. Levien has also done much to identify weaknesses in
existing systems and the properties desired in a useful trust metric.[7]
Finally, it is worth noting that many other real-world examples exist of
implicit trust metrics; in many cases, their success relies upon the quality of
those metrics. eBay’s feedback system can be seen as a naive approach to trust;
in the eBay system, ratings are explicitly viewed as identical regardless of source,
allowing an individual with many positive reviews from many new members to
appear as trusted as one with an equal number of positives from well established
buyers and sellers. This also allows a sort of positive-feedback black market to
develop, in which users can potentially conduct many inexpensive sham sales
in order to exchange positive feedback. A more nuanced approach eBay might
take would be to weight more heavily positive reviews from users with more
positives of their own.
Web sites such as Slashdot.org similarly attempt to approximate a measure
of trust by allowing randomly selected users–“moderators”–to rate posts;
after a number of highly rated posts, a user is allowed to post at a higher
level (higher-level posts being more visible and more widely read). As an added
measure of security, moderators are themselves moderated by other users; the
possibility of being a moderator in the future depends upon one’s moderations
being validated by the “meta-moderators.”
In the more abstract sense, Google’s PageRank algorithm, and even the
peer recommendation systems of Amazon’s suggestions or the news aggrega-
tor Findory.com can be viewed as distributed trust metrics. PageRank allows
highly ranked Websites to themselves impart more meaningful suggestions (in
the form of links), a property similar to Levien’s network flow approach (though
algorithmically different), while Amazon and Findory primarily value the
implicit suggestions of users similar to the user to whom items are being
suggested; in this case, suggestions carry more weight in proportion to how
similar two users appear to be.
Other possible applications are described in Section 5.
3 Technical Approach
3.1 Technical Overview
The implementation of this system can be divided into the following compo-
nents:
• A central key server used to distribute keys and certificates (endorsements
of identity and trustworthiness) efficiently (see the note following this list)
• A client with the ability to retrieve and verify offline key signing paths,
and to calculate trust paths
Note: Centralizing certificate distribution incurs a substantial risk, even with clients doing offline verification of each signature. A malicious server can withhold information, so that in the event of conflicting claims to an identity, though the false claim may not have a very high trust rating, the user may not know that there is a valid claim being laid as well, and thus default to the untrusted–and forged–key. An interesting addition which may help mitigate this problem would be a server-less peer-to-peer key distribution network. Such a network, however, faces significant performance challenges; in order to be particularly resilient to hosts going offline, all participants may replicate some or all of the available certificate database, which, in the test environment, is already approximately 20 megabytes. In any case, this addition falls closer to the topic of reliable distributed data replication, such as is the goal of the Freenet Project (http://freenet.sf.net/), and it will not be discussed in this paper any further.
The server must be able to return to the client all keys on all paths between
the source and target keys, as well as being able to handle the traditional key-
server duties of storing and returning individual keys and key signatures. This
interaction is shown in Figure 1.
[Figure 1: Server-client diagram. The client issues a paths query for a target principal (name, e-mail address, etc.) to the server and receives the query results; signature verification and trust calculation then run on the client, producing the resulting trust levels.]
The actual trust calculation may be done at either the server or client level,
but it is critical that the client validate all signatures on all paths independently
of the server; in doing so, the client prevents the server from ever over-reporting
trust (note that since it can always hide the existence of certain trust paths
entirely, the server can effectively under-report trust).
For the actual trust calculation itself, I have implemented a standard inter-
face to a library providing this functionality; the goal of this is to allow multiple
algorithms to be swapped in so as to test which give the best results.
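By way of illustration, such an interface can be as small as a single method; the names in this sketch are hypothetical, not taken from the actual implementation:

    class TrustMetric:
        """Interface for a pluggable trust-calculation algorithm
        (hypothetical names, for illustration only)."""
        def rank(self, graph, source, target):
            """Return the trust rank of target as seen from source.
            graph maps each key to a dict of {neighbor: edge weight}."""
            raise NotImplementedError

    # Each algorithm in Section 4.3 would then be a subclass, e.g.:
    #     score = StrongestPath().rank(web, my_key, target_key)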
Additionally, I have implemented test code to provide a simulated envi-
ronment for comparing trust calculation algorithms; this is detailed further in
Section 4.4.
3.2 Technical Implementation
The implementation is done in the Python language, targeting the official Python
interpreter version 2.4. Python allows for extremely rapid application development, platform agnosticism, and the use of compiled object code (commonly
written in C) for modules where performance is critical.
The database that actually stores keys and certificates is implemented using
the SQLObject library for Python, allowing full compatibility with MySQL,
PostgreSQL, SQLite, Firebird, Sybase, MaxDB, and MSSQL. Development
has been primarily done with SQLite.
Server-client communication is achieved using the XML-RPC protocol, with
functionality provided by the Python SimpleXMLRPCServer library, included
in the standard Python distribution. Since XML-RPC is based upon HTTP,
the server can either be run as a standalone TCP server or as a CGI script (both
modes of operation are supported).
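For concreteness, a minimal sketch of the transport follows; the method name and its arguments are illustrative only, not the implementation's actual RPC interface:

    # Server side (Python 2 standard library)
    from SimpleXMLRPCServer import SimpleXMLRPCServer

    def find_paths(source_fingerprint, target_identity):
        # Hypothetical handler: look up all signing paths between the
        # querying key and the target identity in the certificate database.
        return []

    server = SimpleXMLRPCServer(("localhost", 8000))
    server.register_function(find_paths)
    server.serve_forever()

    # Client side:
    #     import xmlrpclib
    #     proxy = xmlrpclib.ServerProxy("http://localhost:8000/")
    #     paths = proxy.find_paths(my_fingerprint, "alice@example.org")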
Implementing the necessary cryptographic algorithms is beyond the scope of
this project. I initially used GnuPG (the GNU Privacy Guard, an open source
cryptography application) for this purpose; this provided full OpenPGP com-
patibility. However, GnuPG’s lack of easy integration required the spawning of
a child process for each function call, introducing practical limitations in perfor-
mance. GnuPG was also highly limited in terms of easy key storage. As a result,
while GnuPG compatibility is still largely maintained (though a given installa-
tion must either use GnuPG or not), I have largely abandoned active support
of it in favor of a “homebrew” public key cryptographic system implemented
using the PyCrypto library. This provides higher performance, better integra-
tion, and far easier key generation and storage, which is a boon for testing and
development. The system retains full functionality as before, save that users
must generate a separate key pair for use (as opposed to using their existing
GnuPG keys).
The trust certificates–which are used to represent an indication of trust im-
parted by a principal for another key and principal–are cryptographically signed
ASCII data of the format shown in Figure 2. The signee identity is the name or
e-mail address associated with the signee (it is not a key fingerprint, but rather
the name by which the signee is known). The signee key fingerprint corresponds
to the key being signed; the signing key is known implicitly by the cryptographic
signature itself. The key trust represents a value, between 0 and 1, indicating
the signer’s belief that the indicated key is valid (i.e. that that key is indeed the
key corresponding to that identity). The principal trust represents a value, be-
tween 0 and 1, indicating the signer’s belief that the individual indicated by the
given key makes valid judgements. The separation of these two ratings, while
an added burden for the end user, is important for distinguishing between two
separate notions of trust–one may trust a principal (if he is who he says he is)
to make valid judgements, and thus trust all his judgements; alternatively, one
may fully believe a principal to be who he says he is, but not trust his judge-
ments about other principals. This distinction is discussed in greater detail in
Section 4.1.
[Figure 2: Certificate Format. Fields: signee identity, signee fingerprint, key trust, principal trust.]
Certificates are signed by the principal granting them, and the signature is
stored on the server. When clients receive a chain of certificates, then, in order
to verify that chain the client must verify that each signature is valid and that
each signature was granted by the recipient of the previous certificate (in other
words, that the signee of the preceding certificate is the signer of this certificate).
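A sketch of that chain check follows. The certificate fields mirror Figure 2; the verify_signature helper is hypothetical, standing in for the PyCrypto-based signature check:

    def verify_chain(certificates, verify_signature):
        """Check a chain of trust certificates.
        certificates: objects with signer_fingerprint, signee_fingerprint,
        and signature fields, ordered from the querying principal toward
        the target.
        verify_signature: hypothetical callback wrapping the actual
        cryptographic verification of one certificate's signature."""
        previous = None
        for cert in certificates:
            if not verify_signature(cert):
                return False  # a forged or corrupted certificate
            # The signee of the preceding certificate must be the signer
            # of this one, or the chain is broken.
            if previous is not None and \
               previous.signee_fingerprint != cert.signer_fingerprint:
                return False
            previous = cert
        return True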
3.3 Technical Challenges
Implementing the basic server-client architecture is itself fairly trivial. Python
features a convenient library implementation of the XML-RPC protocol. Sim-
ilarly, third party Python cryptography libraries make implementing the cryp-
tographic features fairly straightforward.
The primary technical challenges were in implementing an efficient algorithm
for calculating trust and creating a realistic simulation environment to test
these algorithms in. The biggest challenge, though, was simply implementing
a resilient and useful algorithm for computing trust to begin with. In order
to measure success in dealing with this particular challenge, I have used the
simulation environment to measure what degree of trust a malicious user must
be granted before he is able to successfully convince other users to trust a false
key, and how many bad trust ratings must be given out (or how many users
must be tricked into trusting the wrong person) in order to have a false key be
trusted. This provides a measure of the strength of the various algorithms for computing trust.
The algorithm details are further discussed in Section 4, while testing details
are further discussed in Section 4.4.
3.4 Compatibility
The implementation aims for compatibility with POSIX platforms. Develop-
ment has been done primarily on MacOS X and Linux, making those the best
supported platforms. However, given the cross-platform nature of Python (and
the supporting modules and database servers), compatibility with any platform
fully supported by the Python interpreter and gcc/glibc (PyCrypto and SQLOb-
ject both rely upon compiled C code) should be feasible with little modification.
4 Algorithm Design
4.1 Concept of Trust
In order to design and analyze our trust algorithms, we must first construct a
meaningful concept of trust. What is it we are trying to measure, and what
represents success? Further, what sorts of attacks are we trying to prevent, and
what is the goal of most attackers?
By trust, we really mean how much faith we can place in a given identity-key
pairing. In any such application of our system, we are trying to map unique and
unforgeable identifiers (in this case public keys) to ambiguous and potentially-
contested identifiers (names, e-mail addresses, etc). The goal, then, of the trust
metric is to indicate the strength of a given such mapping. However, while that
is the sole external application of trust, internally there is a second sort of trust
at work. Whenever a referral is passed from one principal to another–whenever
a user analyzes the signature given by another user–a second notion of trust
comes into play, that sort of trust being how meaningful the testimony of this
principal actually is. To put it differently, in a trust path A → B → C, where
A is the starting principal and C the target identity, the edge from A to B
indicates how much faith to put in B’s recommendation, while the edge from B
to C indicates how much faith to put in the identity of C. The first sort of trust
I call “principal trust”–since it represents how much trust to give the statements
of a principal, as opposed to merely his identity–while I call the second “key
trust”–how much to trust his key’s claim of belonging to his identity. A similar
distinction is made by Beth, Borcherding, and Klein.[8]
The meaning of a numerical trust ranking is more elusive. Beth, Borcherd-
ing, and Klein describe them as “numbers of positive/negative experiences” in
dealing with the given principal[8], but this seems little more than a vague re-
statement of the same problem. How are these experiences prioritized? If there
is one good experience and one bad, can the good experience be said to be,
arbitrarily, twice as important as the bad one, or can the bad one be said to
be 50% more significant in judging trustworthiness? Looking at trust rankings
as probabilities lends more conceptual clarity, though it itself leaves some ques-
tions unanswered. For instance, should a user make a Bayesian assumption that
if he has not interacted with a principal, he should give that principal a trust
rating of 0.5? If a principal is given a low principal trust rating, does this imply
that the opposite of whatever that principal claims is likely to be true? Clearly,
neither of these are practical assumptions. In answer to the first question, we
would rather assume no trust at all than assume a probability of 0.5 that we
can trust any principal on whom we have no background (not to mention the
algorithmic challenges of assuming everyone should start with a ranking of 0.5).
To the second, we again make the assumption that we should default to no trust
at all; if a principal has a very low principal trust rating, it does not mean that
the negative of his claims is always true (or else an untrusted attacker could
easily circumvent the metric merely by telling the truth!), but rather that he
lies and his testimony is entirely insignificant.
4.2 Desired Properties
Before attempting to create the actual algorithm to be used, it is helpful to
informally enumerate the properties that must be shared by any candidate al-
gorithms. Reiter and Stubblebine identify eight design principles they believe to
be fundamental requirements of any effective trust metric, covering the meaning
of trust ratings, efficient implementations, and so forth[2]; in this section, how-
ever, I intend solely to discuss properties that make a given algorithm a useful
trust metric. These desired properties include:
1. Longer paths are weaker
2. Paths passing through weak edges are weaker
3. Non-disjoint paths are of little value (i.e. two paths that share many nodes
are less significant than two disjoint paths that assert the same thing)
4. Disjoint paths are significant
5. Attack resilience–a single bad principal trust rating should only be able
to do damage in proportion to the original error (see the note below)

Note: Levien describes this as resulting from the “bottleneck property”: that adding more target nodes reachable through a single bad node should not result in more trust being afforded said target nodes (otherwise, those target nodes, all perhaps “sock puppets” of a single malicious user, could be used to all direct their aggregate trust at an impostor).[7] This “bottleneck property” is an extension of property 2 above, however.
Or, from the perspective of a user, we wish to trust the recommendations of
users we trust, and of users who are trusted by users we trust, but we do not
wish to give more trust to a user we don’t directly know than we do to the user
who connects us. We also wish to consider the testimony of multiple trusted
users, but we do not wish to consider as meaningful the testimony of multiple
users originating from a single referral (non-disjoint paths); if that one referral
is bad, these two users may in fact be the same (malicious) user (i.e. “sock
puppets”).
Performance is also a serious consideration, and many of these algorithms
have to be bounded by recursion or traversal limits to cope with the size of the
web (the test web, comprising the strongly connected set from the PGP public
keyservers, contains over 1,000 keys).
4.3 Tested Designs
4.3.1 Strongest Path
This algorithm is a modification of Dijkstra’s shortest-path algorithm, where
we instead find the strongest path from the start key to the end identity. The
strength of a path is the product of the edge weights comprising this path, i.e.
for path P = {e_1, e_2, ..., e_n}, Weight(P) = ∏_{i=1}^{n} e_i. A pseudocode
implementation of this algorithm is given in Algorithm 1 below.
Algorithm 1 Strongest Path
Assume the web as a connected graph G = (V, E), where V is the set of principals
and E is the set of weighted edges.

    f(node):
        for each child in children(node):
            if child.rank < node.rank * weight(node, child):
                child.rank <- node.rank * weight(node, child)
                if child != destination:
                    f(child)

Note that in the implemented version, for performance reasons recursion is
bounded by an arbitrary threshold below which node weights are not evaluated
(i.e. if a node can only be reached by a path with a weight less than the set
threshold, all paths passing through that node are assumed to be of weight
zero).

This algorithm satisfies properties 1, 2, and 3. However, it completely disregards all paths but the strongest, making the (perhaps naive) assumption that,
presented with a given decision, trust can be evaluated based solely on the testimony of the most trusted principals. However, this algorithm is also resilient
in that its primary deficiency results only in assigning too little trust, and it
preserves the restriction that more trust than is granted a neighbor can never
be granted the recommendations of that neighbor (i.e. it never trusts what a
principal says more than it trusts the principal himself), satisfying property 5.
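For concreteness, here is a compact Python rendering of the same idea (a sketch of my own, not the project's code); it replaces the bounded recursion above with a priority queue, which is valid because edge weights in [0, 1] make path strength non-increasing:

    import heapq

    def strongest_path(graph, source, target):
        """Return the weight of the strongest path from source to target.
        graph maps each node to a dict of {neighbor: edge weight in [0, 1]}."""
        best = {source: 1.0}
        heap = [(-1.0, source)]          # max-heap via negated ranks
        while heap:
            neg_rank, node = heapq.heappop(heap)
            rank = -neg_rank
            if node == target:
                return rank              # first pop of the target is optimal
            if rank < best.get(node, 0.0):
                continue                 # stale queue entry
            for child, weight in graph.get(node, {}).items():
                new_rank = rank * weight
                if new_rank > best.get(child, 0.0):
                    best[child] = new_rank
                    heapq.heappush(heap, (-new_rank, child))
        return 0.0                       # no path found

    # Example: two equally strong paths A->B->D and A->C->D
    web = {"A": {"B": 0.8, "C": 0.8}, "B": {"D": 0.7}, "C": {"D": 0.7}}
    print strongest_path(web, "A", "D")  # 0.56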
4.3.2 Disjoint Paths
This is an algorithm that, rather than consider merely the strongest path, con-
siders all disjoint paths from the start node to the end node. Each path is given
a weight that is the product of the edge weights that comprise it (i.e. given a
path P = {e_1, e_2, ..., e_n}, Weight(P) = ∏_{i=1}^{n} e_i). The individual path weights are
then combined into a total weight. This is done by summing the weights (though
perhaps some other undiscovered approach would be more effective here). This
algorithm thus satisfies all four properties. However, this algorithm is still weak,
possibly weaker than the strongest path approach. Specifically, this algorithm
allows many weak disjoint paths to sum to a value greater than a single highly
trusted path, allowing a user who has trusted multiple “sock puppets” to be eas-
ily deceived. However, since all paths must be disjoint, the only way for a user
to be deceived is if he himself has trusted multiple malicious users; he is invul-
nerable to a neighbor who has done so. Assuming equal trust given to malicious
users who deceive our starting point and to valid users, then, and assuming the
paths going through the malicious users are of equal or greater weight than the
“real” paths, we can state that the amount of damage “sock puppets” who are
mistakenly trusted by a valid user can do is linearly bounded by the number of
“sock puppets” directly granted trust (i.e. the number who directly neighbor
the deceived user); the requirement that paths be disjoint means that malicious
users must repeatedly be granted false trust in order to increase the amount of
trust they can trick their victim into assigning.
Also note that this algorithm no longer limits final trust ratings to the range
[0, 1].
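The prose above does not fix the order in which disjoint paths are extracted; the sketch below makes one plausible greedy choice, repeatedly taking the strongest remaining path and deleting its interior nodes to enforce vertex-disjointness. Note that this greedy order does not necessarily maximize the final sum; it is an illustration only.

    import heapq

    def strongest_route(graph, source, target):
        """Like strongest_path above, but also returns the route taken."""
        best = {source: 1.0}
        prev = {source: None}
        heap = [(-1.0, source)]
        while heap:
            neg_rank, node = heapq.heappop(heap)
            rank = -neg_rank
            if node == target:
                route = []
                while node is not None:
                    route.append(node)
                    node = prev[node]
                route.reverse()
                return rank, route
            if rank < best.get(node, 0.0):
                continue
            for child, weight in graph.get(node, {}).items():
                new_rank = rank * weight
                if new_rank > best.get(child, 0.0):
                    best[child] = new_rank
                    prev[child] = node
                    heapq.heappush(heap, (-new_rank, child))
        return None

    def disjoint_paths_trust(graph, source, target):
        """Sum the weights of vertex-disjoint paths, strongest first."""
        g = dict((node, dict(nbrs)) for node, nbrs in graph.items())
        total = 0.0
        while True:
            found = strongest_route(g, source, target)
            if found is None:
                return total
            weight, route = found
            total += weight
            interior = route[1:-1]
            if not interior:             # direct edge: consume it
                g.get(source, {}).pop(target, None)
            for node in interior:        # remove interior nodes entirely
                g.pop(node, None)
            for nbrs in g.values():
                for node in interior:
                    nbrs.pop(node, None)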
4.3.3 Network Flow
Levien, with Advogato.org, uses a network flow approach to trust modelling in
which principals are able to grant certificates in proportion to their network
capacity. The web of trust is converted to a graph in which edges are weighted
by a capacity that is proportionate to the edge’s distance from “seed” nodes
(i.e. fully trusted nodes). These capacities are assigned arbitrarily based on
the breadth-first distance the given edge is from the seed node (i.e. those of
distance 1 have capacity x, those of distance 2 have capacity y : y < x, etc.).
The maximum flow to a node is then computed using the Ford-Fulkerson
method (in this case, the Edmonds-Karp implementation). In the Advogato
system, certificates can only be granted at one of three levels (“Apprentice,”
“Journeyer,” and “Master”), and principals can only grant certificates at or
below their own level. As such, a principal has a different total capacity for
each certificate level (the capacity for certificates he cannot grant is 0). This
has the property that the number of incorrect certificates that can be given out
is proportional to the number of “confused” nodes.[7]
This approach is very interesting, but the precise implementation used by
Levien is limited in our model. Since our edges are already weighted with
something analogous to a capacity (principal trust is, like capacity in the above
approach, a metric for how much trust a given principal can allocate), we can
dispense with the arbitrary capacity assignments, as well as with the fixed seed
node (since the seed is, in our system, whoever the querying principal is). The
result, however, works nearly identically to Advogato. One interesting feature
is that while trust rankings are limited to the range [0, 1], the ultimate weight
accorded an end point is not (this is true of the disjoint paths method, as well;
in practice, however, such results are highly unlikely to occur, as shown by the
real-world tests). For example, in the graph shown in Figure 3, the final
rank (or capacity) of D is 1.4.
[Figure 3: Sample Network Flow Input. A directed graph with edges A→B (0.8), B→D (0.7), A→C (0.8), C→D (0.7).]
This algorithm can be shown to formally hold to the “bottleneck property,”
as Levien terms it, satisfying property number 2–that paths are only as strong
as their weakest edge. However, since nodes can allocate all of their “capacity”
to other nodes, trust does not degrade over longer paths, failing property 1
(e.g. path P = {0.5, 0.5} has the same strength as path P = {0.5, 0.5, 0.5}, and
so on). While non-disjoint paths are of some significance in the network flow
model, they are hampered by the “bottleneck property,” limiting or negating
any harm that can be done by false non-disjoint paths. This approach fully
satisfies property 4; disjoint paths are all of equal significance. And by the
“bottleneck property,” this algorithm satisfies property 5 as well as any other algorithm
discussed here, since this results in linear bounding of how much trust can be
manipulated as the result of poorly placed trust (either of a higher ranking than
deserved, or of many principals being trusted more than they deserve).
It is possible that a modified version of this algorithm could be designed
to degrade capacities along long paths, enforcing property 1 in a way not done
by the standard network flow algorithm. In the unmodified version, however,
this failing could be considered the primary–indeed, the only–failing of this
algorithm.
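A sketch of the core computation follows, again my own rendering rather than the project's code: Edmonds-Karp maximum flow (breadth-first search for augmenting paths) over the principal-trust weights treated as capacities, using the same graph encoding as the earlier sketches.

    from collections import deque

    def network_flow_trust(graph, source, sink):
        """Maximum flow from source to sink, treating principal-trust
        edge weights as capacities (Edmonds-Karp)."""
        residual = {}
        for u, nbrs in graph.items():
            for v, cap in nbrs.items():
                residual.setdefault(u, {})
                residual[u][v] = residual[u].get(v, 0.0) + cap
                residual.setdefault(v, {}).setdefault(u, 0.0)  # reverse edge
        total = 0.0
        while True:
            # Breadth-first search for an augmenting path.
            parent = {source: None}
            queue = deque([source])
            while queue and sink not in parent:
                u = queue.popleft()
                for v, cap in residual.get(u, {}).items():
                    if cap > 1e-12 and v not in parent:
                        parent[v] = u
                        queue.append(v)
            if sink not in parent:
                return total             # no augmenting path remains
            # Find the bottleneck capacity along the path...
            v, bottleneck = sink, 1e9
            while parent[v] is not None:
                u = parent[v]
                bottleneck = min(bottleneck, residual[u][v])
                v = u
            # ...and push that much flow along it.
            v = sink
            while parent[v] is not None:
                u = parent[v]
                residual[u][v] -= bottleneck
                residual[v][u] += bottleneck
                v = u
            total += bottleneck

    # On the Figure 3 graph this returns 1.4 for A -> D:
    #     web = {"A": {"B": 0.8, "C": 0.8}, "B": {"D": 0.7}, "C": {"D": 0.7}}
    #     network_flow_trust(web, "A", "D")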
4.4 Test Results
Potential algorithms can be evaluated in two general ways. First, they can be
evaluated, as above, based on which of the desired characteristics they fulfill.
This is a good way to determine theoretical weaknesses, such as whether a
malicious user can create many “sock puppets” and assign false trust, or whether
a principal may be trusted less than it should be because not all paths to it
are considered. Second, potential algorithms can be evaluated based on the
distribution of their behavior in a real-world test environment, specifically, the
distribution of weights assigned to actual paths.
The first approach is pursued above in the abstract analysis of each algo-
rithm. For the second approach, a test web of trust was constructed from a
snapshot of the PGP Web of Trust’s strongly connected set as found on the
pgp.net keyserver network. Signing relationships were imported into a test web,
and edge weights randomly assigned in a normal distribution with a mean of 0.5
and a standard deviation of 0.15 (bounded by 0 and 1). This distribution was
picked in the hopes that it would resemble the real-world distribution found in
a deployed system.
(It seems plausible to speculate that in real-world usage there might be two separate distributions: one of rankings tightly clustered near 1.0, representative of individuals who have directly exchanged keys over highly trusted media (e.g. face-to-face), the other a broader distribution centered at the lower end. However, for testing purposes, the simpler normal distribution should be sufficient.)
For performance's sake, a portion of the principals was then randomly selected as start and end points, the paths between them ranked, and the aggregate statistics analyzed. Of specific interest are the mean rank, the range of ranks, and the standard deviation.
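The weight assignment itself is nearly a one-liner with Python's random module; the sketch below assumes simple clamping at the bounds, since the text does not specify how the bounding was performed:

    import random

    def random_trust_weight():
        """Draw an edge weight from N(0.5, 0.15), bounded to [0, 1]."""
        return min(1.0, max(0.0, random.normalvariate(0.5, 0.15)))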
                     Strongest Path   Disjoint Paths   Network Flow
Max                  0.202            0.957            12.380
Min                  0.0              0.0              0.0
Mean                 0.011            0.032            3.138
Standard Deviation   0.028            0.111            1.799

Table 1: Test Results
A good algorithm should ideally present a
fairly broad distribution, with a high maximum (strongly trusted destinations)
and a high standard deviation (implying significant differences are presented
between lesser ranked paths and highly ranked paths). Assuming that the algo-
rithm satisfies our previously stated desired properties, lower ranked paths are
lower ranked for the proper reasons and higher ranked similarly so, thus we can
assume from a high standard deviation that discrimination between weak and
strong paths is fairly good. Note that this metric is weak, however, without
external analysis of the ways in which the algorithm discriminates (hence the
previous discussion).
4.4.1 Real-world Test Results
Here are presented the results from the real-world tests. These were run against
a sample of 257 keys and identities–20% of the total–randomly selected from the web of
trust generated from the PGP strong-set residing on the pgp.net keyservers.
We can see in Table 1 that the strongest path algorithm generates the
lowest distribution of weights, highly skewed at the lower end. This is clearly
due to such a fast degradation of trust in relation to path length (the average
path length in the test runs was approximately 6.29; search traversals were lim-
ited to paths of length 10); as such, the mean and standard deviation are both
low–this is undesirable. The disjoint paths algorithm fares slightly better;
degradation of paths is somewhat offset by the disjoint paths summing together
(the average number of disjoint paths in the test runs was approximately 5.41).
Note, too, that the maximum trust reported by this algorithm is still less than
1; this is not a restriction of the approach; rather, because the average weight of a
single path (as computed in this algorithm and the strongest path approach)
is less than 1 divided by the average number of disjoint paths, values exceed-
ing 1 are highly unlikely. Finally, the network flow algorithm, as expected,
generated the greatest values (and greatest range of values). This is a highly
desirable characteristic, but it results primarily from disregarding path length
as a consideration (by failing to degrade capacity over path length).
[Figure 4: Distribution of Trusts]
Next we can observe the correlation of trust ratings with path length. As
we can see in Figure 5, the strongest path and disjoint paths algorithms are
heavily biased in favor of shorter paths. The network flow algorithm shows
very slight correlation, appearing to favor longer paths. This most likely stems
from longer paths being coincident with more non-disjoint paths, which allow
more augmenting paths to bypass bottlenecks (thus allowing a greater capacity
to these paths). This correlation should be considered a minor flaw, however,
since it is contrary to our preferred bias in favor of shorter paths.
[Figure 5: Trust Versus Path Length]
Finally, in Figure 6, we see the plot of trust versus the number of disjoint
paths. There is no notable correlation in our strongest path algorithm (nor
should there be, since only the strongest path is considered). More troubling,
perhaps, the disjoint paths algorithm appears to show little correlation, either.
The network flow algorithm shows little correlation (which may be contrary
to what one would have expected, since disjoint paths are augmenting paths
that increase capacity in the network).
[Figure 6: Trust Versus Number of Disjoint Paths]
4.5 Other Unimplemented Algorithms
4.5.1 Bayesian Belief Network
This approach treats each trust rating as a probability. Principal trusts are
assumed to be the probability with which the granting principal believes the
recipient is an accurate certificate granter (i.e. the probability that certificates
granted by him can be trusted), while key trusts are assumed to be the proba-
bility that that principal is in fact who he claims he is. Note that a low principal
trust cannot be interpreted as an assertion that the principal always lies (i.e.
that the negative of his assertions are always true); rather, it is an assertion
that the principal’s certificates are meaningless. The following example illus-
trates this point.
Assume principals A, B, and C. The principal trust of A for B is x, and the
key trust of B for C is y. From the perspective of A querying about the identity
of C, the probability of C’s identity being valid (P(C)) can be evaluated
as a conditional probability dependent upon the probability of B’s assertion
being valid (P(B)). Thus P(C|B) = y and P(C|¬B) = 0.5 (because we cannot
assume that ¬B → ¬C, i.e. we cannot assume B always lies). We can conclude,
then, that P(C) = P(B)×P(C|B)+P(¬B)×P(C|¬B) = x×y +(1−x)×0.5.
This algorithm does satisfy the four desired properties. However, because
of the Bayesian assumption of a probability of 0.5 of untrusted principals lying,
rankings will tend to center around 0.5. Nodes to which there is no path will also be
ranked as 0.5, since no certifying path implies an unknown level of trust, which in
a Bayesian interpretation implies a probability of 0.5 for each possibility (honest
or not). This is notably different from the previously-discussed algorithms, in
which the mean rankings were far lower.
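The conditional-probability step above extends naturally along a chain of referrals. A small sketch (mine, not part of the described implementation), where all ratings but the last are principal trusts and the last is the key trust for the target:

    def chain_belief(ratings):
        """Propagate belief along a referral chain.
        ratings: edge trust values along the path; all but the last are
        principal trusts, the last is the key trust for the target."""
        belief = 1.0                      # full faith in one's own testimony
        for r in ratings:
            # P(valid) = P(prev valid) * r + P(prev invalid) * 0.5
            belief = belief * r + (1.0 - belief) * 0.5
        return belief

    # The A -> B -> C example with x = 0.9, y = 0.8:
    #     chain_belief([0.9, 0.8])  # 0.9 * 0.8 + 0.1 * 0.5 = 0.77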
4.5.2 Dempster-Shafer
Dempster-Shafer theory provides a neater means of coping with the problem of
ambiguity in ratings (when treated as probabilities, as described above, in which
we end up centering “unknown” trusts at 0.5). Unlike Bayesian probability the-
ory, Dempster-Shafer allows us to indicate the beliefs of various assertions (in
this case, trust ratings) in ways that don’t necessarily correspond to Bayesian
probabilities. Specifically, Dempster-Shafer requires that the probabilities as-
signed to all subsets of the set of considered outcomes sum to 1, rather than
that the probabilities assigned to all elements of the outcome set sum to 1. For
example, consider the possibility that a principal is truthful or not truthful, i.e.
{T, ¬T} is the set of possible outcomes. In a Bayesian approach we require that
P(T) + P(¬T) = 1; this has the effect that if we don’t know anything about a
principal and don’t wish to assert anything (specifically, we do not wish to assert
that the principal is always dishonest, since this would affect our conclusions as
much as assuming the principal is always honest), we must assign a probability
of 0.5 to both T and ¬T. In the Dempster-Shafer approach, however, we re-
quire that P(T) + P(¬T) + P({T, ¬T}) = 1, which in this application allows a
principal to spread the strength of his claim between three more fine-grained as-
sertions: that a principal is trustworthy, that a principal is not trustworthy, and
that a principal might be either. In this case, trust ratings might be expanded
to occupy the range [-1, 1], allowing 0 to be equivalent to P({T, ¬T}) = 1, -1 to
be equivalent to P(¬T) = 1, and 1 to be equivalent to P(T) = 1. Note that this
range enforces the further restriction that one cannot assert, say, P(T) = 0.5
and P(¬T) = 0.5.
Dempster-Shafer may provide a more elegant way of formally computing
probabilities from the web of trust as an evidentiary model than is possible
with Bayesian probability theory. This approach is detailed further in [4].
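To make this concrete, below is a minimal sketch of Dempster's rule of combination over the frame {T, ¬T}, together with the [-1, 1] rating mapping described above; this is an illustration of the standard rule, not an algorithm taken from the paper:

    def rating_to_mass(r):
        """Map a trust rating in [-1, 1] to a mass function over
        'T' (trustworthy), 'F' (not trustworthy), 'TF' (either)."""
        if r >= 0:
            return {'T': r, 'F': 0.0, 'TF': 1.0 - r}
        return {'T': 0.0, 'F': -r, 'TF': 1.0 + r}

    def combine(m1, m2):
        """Dempster's rule of combination for two mass functions."""
        conflict = m1['T'] * m2['F'] + m1['F'] * m2['T']
        norm = 1.0 - conflict            # renormalize away conflicting mass
        return {
            'T':  (m1['T'] * m2['T'] + m1['T'] * m2['TF'] + m1['TF'] * m2['T']) / norm,
            'F':  (m1['F'] * m2['F'] + m1['F'] * m2['TF'] + m1['TF'] * m2['F']) / norm,
            'TF': (m1['TF'] * m2['TF']) / norm,
        }

    # Two independent, mildly positive testimonies reinforce each other:
    #     combine(rating_to_mass(0.6), rating_to_mass(0.6))
    #     -> {'T': 0.84, 'F': 0.0, 'TF': 0.16}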
5 Future Research
Trust metrics are a field of ongoing research. Due to the lack of clearly de-
fined success–what does it really mean to trust, and when is it being allocated
properly?–it is unlikely that anyone will invent a single perfect algorithm that
will answer such questions once and for all. I believe the attempts to apply prob-
ability to trust metrics by conceptualizing them as belief networks are sound
and provide particularly promising approaches to determining trust. Yet I also
believe future research would do well to aim to better understand how such sys-
tems are actually used in practice, and what threats are in fact most severe. For
example, in the theoretical approach, all nodes are generally considered to be
equivalent; in practice, one might consider that identities that are “high value”
targets–that are particularly worth trying to impersonate–are also more likely
to be more validated (in the way that, for instance, Phil Zimmermann is at the
center of the PGP strong set). Observations like these may prove useful, yet are
easily overlooked without real-world implementations and experimentation.
As such, the direction I would most like to take future development is to
aim for practical usage and, especially, OpenPGP compatibility. Few of the
similar research projects discussed above have culminated in actual deployed
infrastructure, and doing so could be a highly interesting experiment.
Further opportunities also exist for other applications of trust metrics. On-
line communities are one previously discussed possibility; a particularly inter-
esting possibility, heavily lobbied for by Raph Levien (creator of the Advogato
community) is to apply his trust metrics to Wikipedia. Most interesting is the
suggestion to use not a global trust metric (of the sort used on Advogato or that
is analogous to PageRank and the like), but rather to use a subjective metric like
those detailed in this paper; in such a system, users would be presented with
the version of the page closest to their own beliefs, creating an encyclopedia
that embraces a subjective, rather than objective, notion of accuracy. Further,
even Internet protocols like DNS and BGP seem ripe for applications of this
technology; in short, any attempt to map ambiguous, forgeable names to unique identifiers,
especially where the notion of a correct matching is subjective, is a possible
application of trust metrics.
6 Conclusions
In this research, I have shown that it is possible to implement a practical in-
frastructure to allow distributed key signing and exchange to be verified with
sophisticated trust metrics. I have also presented three trust metrics and
evaluated their relative strength and usefulness; all three show themselves to be
practical approaches to the problem of determining key validity in a web of
trust without trusted third parties. To continue this research, one could add
full OpenPGP compatibility to the implemented infrastructure to make it of
more value to existing PGP users. One could further research good trust met-
rics, implement the other proposed metrics, and design better approaches for
evaluating metrics. One could also implement trust metrics in a variety of other
settings, as previously described. In this research, at least, we have seen the
steps to a practical implementation and the evaluation of that implementation.
References
[1] Neal McBurnett. PGP Web of Trust Statistics, April 2004. URL http://bcn.boulder.co.us/~neal/pgpstat/.

[2] Michael K. Reiter and Stuart G. Stubblebine. Toward Acceptable Metrics of Authentication. IEEE Symposium on Security and Privacy, May 1997.

[3] Michael K. Reiter and Stuart G. Stubblebine. Resilient Authentication Using Path Independence. IEEE Transactions on Computers 47(12), December 1998. URL http://www.ece.cmu.edu/~reiter/papers/1998/ToC.pdf.

[4] Bin Yu and Munindar P. Singh. An Evidential Model of Distributed Reputation Management. North Carolina State University, July 2002. URL http://www.csc.ncsu.edu/faculty/mpsingh/papers/mas/aamas-02-trust.pdf.

[5] Henk P. Penning. Computing Shortest Paths in WOTs. University of Utrecht. URL http://www.cs.uu.nl/people/henkp/henkp/pgp/pathfinder/mk_path.cgi.

[6] Henk P. Penning. Analysis of the Strong Set in the PGP Web of Trust. University of Utrecht, September 2005. URL http://www.cs.uu.nl/people/henkp/henkp/pgp/pathfinder/plot/.

[7] Raph Levien. Attack Resistant Trust Metrics. UC Berkeley, July 2004. URL http://www.levien.com/thesis/compact.pdf.

[8] Thomas Beth, Malte Borcherding and Birgit Klein. Valuation of Trust in Open Networks. European Symposium on Research in Computer Security, 1994.