
Distributed Identity Management in the PGP Web of Trust

Daniel Margolis <dmargoli@seas.upenn.edu>

Faculty Advisor: Professor Matt Blaze <blaze@cis.upenn.edu>

11 April, 2006

1 Introduction

A web of trust is a conceptual model used to determine the authenticity of keys

in a cryptographic system. The term is most closely associated with PGP (and

other OpenPGP-compatible applications), which is by far the most prevalent

real-world example of such a model. In the case of PGP, a web of trust serves

as a distributed replacement for the centralized structure of a traditional PKI,

in which institutionalized certificate authorities (e.g. VeriSign et al. who sign

commercial SSL certificates) serve to verify keys in the same manner. However,

the web of trust is not merely a poor man’s certificate authority; the web of

trust can function where a certificate authority is infeasible due to expense or

lack of infrastructure as well as in situations where there is no way for a single

authority to properly verify claims (which may extend well beyond key-identity

pairings). Yet the manual web of trust system, by which users are assumed to

be capable of finding trusted paths between themselves and the key they wish

to verify, and of determining trust by observing the chain of signatures, leaves

much to be desired.

Current state of the art in the PGP system involves the use of centralized

key servers to facilitate distribution of keys and key signatures (endorsements

by one principal of other key-identity pairings). Yet these servers do no authen-

tication, making the information contained upon them as meaningful as that on

any public wiki or bulletin board. Users are left to their own devices to deter-

mine how much trust to place in the signatures of other principals; anecdotal

observation shows that for many, simply verifying that the key in question exists

on the key server is taken as sufficient evidence of its validity. Hardly better

is the practice of trusting keys that are signed by a large number of people;

endorsements of a key by other unverified keys bear little or no weight. Yet

without automated tools for determining the trust of keys in the web of trust

in an established way, users have no feasible alternatives to these bad practices.

In this project, I hope to determine if it is possible to calculate trust for non-

adjacent keys using the strength of the signing paths between the querying key


and the target key (i.e. the client key and the key for which the client wishes

to determine validity). In such a system, a server, possessing a database of

participating keys and signatures, would return all relevant signing paths (to be

verified offline by the client) and a computation of the trust thus imparted to the

target key. Clients should be able to determine a numerical score indicating the

trustworthiness of that key, and to determine the relative validity of conflicting

keys (in the event of multiple keys claiming the same identity, as with a malicious

impostor).

2 Related Work

A large amount of data exists on the real-world properties of the PGP web of

trust [6, 1]. More interesting is Michael Reiter and Stuart Stubblebine’s paper

detailing the notion of using independent trust paths to verify the identity of

a principal in web of trust systems[3], which included the creation of a Web

service to find independent paths in the PGP web of trust. However, Reiter

and Stubblebine mainly explore efficient methods of finding those independent

paths, and only discuss the value of those paths in authentication as a means of

finding hopefully independent testimony about a principal’s identity. While that

is an interesting discussion, as a practical approach I believe it to be limited,

since non-disjoint paths are also of value, and a more nuanced approach in

which edges are weighted with the strength of the trust relationship allows a

far more sophisticated method of determining the likelihood of a key’s validity

than simply comparing the testimony from each independent path.

Reiter and Stubblebine also, however, attempt to arrive at a set of required

properties a trust metric must satisfy to be useful, and evaluate much of the

existing research on the subject. In the process, they identify many of the flaws

in existing or proposed systems, and help to clarify what it means to measure

trust.[2] However, again, they do not arrive at a complete or implemented sys-

tem; in this research, I hope to implement and test a system that meets many,

if not all, of the requirements laid out in their work.

Beth, Borcherding, and Klein describe a formal approach to trust very simi-

lar to my own, in which the web of trust is viewed as a directed weighted graph

with edge weights in the range [0, 1] as probabilities and featuring a distinction

between what they call “direct trust” and “recommendation trust,” respectively

equivalent to what I refer to as “key trust” and “principal trust” (detailed in

Section 4.1).[8]

Raph Levien has implemented perhaps the most interesting real-world ex-

periment in applying sophisticated trust metrics with his work on the online

developer community Advogato. Advogato implements a novel trust metric

based on a model of network flow as a means of certifying users as one of three

levels (“Master,” “Journeyer,” and “Apprentice”); this has proven an effective

measure against the spam, trolling, and pollution that infest many other pub-

lic online communities. Levien has also done much to identify weaknesses in

existing systems and the properties desired in a useful trust metric.[7]


Finally, it is worth noting that many other real-world examples exist of

implicit trust metrics; in many cases, their success relies upon the quality of

those metrics. eBay’s feedback system can be seen as a naive approach to trust;

in the eBay system, ratings are explicitly viewed as identical regardless of source,

allowing an individual with many positive reviews from many new members to

appear as trusted as one with an equal number of positives from well established

buyers and sellers. This also allows a sort of positive-feedback black market to

develop, in which users can potentially conduct many inexpensive sham sales

in order to exchange positive feedback. A more nuanced approach eBay might

take would be to weight more heavily positive reviews from users with more

positives of their own.

Websites such as Slashdot.org similarly attempt to approximate a meaning of trust by allowing randomly selected users (“moderators”) to rate posts; after a number of highly rated posts, a user is allowed to post at a higher level (higher-level posts being more visible and more widely read). As an added

measure of security, moderators are themselves moderated by other users; the

possibility of being a moderator in the future depends upon one’s moderations

being validated by the “meta-moderators.”

In the more abstract sense, Google’s PageRank algorithm, and even the

peer recommendation systems of Amazon’s suggestions or the news aggrega-

tor Findory.com can be viewed as distributed trust metrics. PageRank allows

highly ranked Websites to themselves impart more meaningful suggestions (in

the form of links), a property similar to Levien’s network flow approach (though

algorithmically different), while Amazon and Findory primarily value the implicit suggestions of users similar to the user to whom items are being suggested; in this case, suggestions mean more in proportion to how similar two users appear to be.

Other possible applications are described in Section 5.

3 Technical Approach

3.1 Technical Overview

The implementation of this system can be divided into the following compo-

nents:

• A central key server used to distribute keys and certificates (endorsements of identity and trustworthiness) efficiently¹

• A client with the ability to retrieve and verify offline key signing paths, and to calculate trust paths

[Figure 1: Server-client diagram. The client sends a paths query naming a target principal (name, e-mail address, etc.) to the server; the server returns the query results; the client then performs signature verification and trust calculation, yielding the resulting trust levels.]

¹Centralizing certificate distribution incurs a substantial risk, even with clients doing offline verification of each signature. A malicious server can withhold information, so that in the event of conflicting claims to an identity, though the false claim may not have a very high trust rating, the user may not know that there is a valid claim being laid as well, and thus default to the untrusted (and forged) key. An interesting addition which may help mitigate this problem would be a server-less peer-to-peer key distribution network. Such a network, however, faces significant performance challenges; in order to be particularly resilient to hosts going offline, all participants may replicate some or all of the available certificate database, which, in the test environment, is already approximately 20 megabytes. In any case, this addition falls closer to the topic of reliable distributed data replication, such as is the goal of the Freenet Project (http://freenet.sf.net/), and it will not be discussed in this paper any further.

The server must be able to return to the client all keys on all paths between

the source and target keys, as well as being able to handle the traditional key-

server duties of storing and returning individual keys and key signatures. This

interaction is shown in Figure 1.

The actual trust calculation may be done at either the server or client level,

but it is critical that the client validate all signatures on all paths independently

of the server; in doing so, the client prevents the server from ever over-reporting

trust (note that since it can always hide the existence of certain trust paths

entirely, the server can effectively under-report trust).

For the actual trust calculation itself, I have implemented a standard inter-

face to a library providing this functionality; the goal of this is to allow multiple

algorithms to be swapped in so as to test which give the best results.
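The paper does not give this interface's actual shape; purely as an illustration of the pluggable design (all names here are hypothetical), it might resemble:

    from abc import ABC, abstractmethod

    class TrustMetric(ABC):
        # Hypothetical interface; the implementation's real names are not given.
        @abstractmethod
        def score(self, graph, source, target):
            """Return the trust imparted to target from source's point of
            view, given a graph of verified certificates."""

    # A concrete algorithm (e.g. strongest path) would subclass TrustMetric,
    # letting the client or server swap implementations for comparison.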

Additionally, I have implemented test code to provide a simulated envi-

ronment for comparing trust calculation algorithms; this is detailed further in

Section 4.4.

3.2 Technical Implementation

The implementation is done in the Python language, targeting the official Python

interpreter version 2.4. Python allows for extremely rapid application development, platform agnosticism, and the use of compiled object code (commonly

written in C) for modules where performance is critical.

The database that actually stores keys and certificates is implemented using

the SQLObject library for Python, allowing full compatibility with MySQL,

PostgreSQL, SQLite, Firebird, Sybase, MAX DB, and MSSQL. Development

has been primarily done with SQLite.

Server-client communication is achieved using the XML-RPC protocol, with

functionality provided by the Python SimpleXMLRPCServer library, included

in the standard Python distribution. Since XML-RPC is based upon HTTP,

the server can either be run as a standalone TCP server or as a CGI script (both

modes of operation are supported).
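As a rough sketch of the standalone mode (the class lives in the SimpleXMLRPCServer module in the Python 2.4 targeted here; the import below uses the modern module path, and the registered function is hypothetical):

    from xmlrpc.server import SimpleXMLRPCServer

    def find_paths(source_fingerprint, target_identity):
        # placeholder: look up all signing paths in the certificate database
        return []

    server = SimpleXMLRPCServer(("localhost", 8000))
    server.register_function(find_paths, "find_paths")
    server.serve_forever()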

Implementing the necessary cryptographic algorithms is beyond the scope of

this project. I initially used GnuPG (the GNU Privacy Guard, an open source

cryptography application) for this purpose; this provided full OpenPGP com-

patibility. However, GnuPG’s lack of easy integration required the spawning of

a child process for each function call, introducing practical limitations in perfor-

mance. GnuPG was also highly limited in terms of easy key storage. As a result,

while GnuPG compatibility is still largely maintained (though a given installa-

tion must either use GnuPG or not), I have largely abandoned active support

of it in favor of a “homebrew” public key cryptographic system implemented

using the PyCrypto library. This provides higher performance, better integra-

tion, and far easier key generation and storage, which is a boon for testing and

development. The system retains full functionality as before, save that users

must generate a separate key pair for use (as opposed to using their existing

GnuPG keys).

The trust certificates, which represent an indication of trust imparted by a principal for another key and principal, are cryptographically signed

ASCII data of the format shown in Figure 2. The signee identity is the name or

e-mail address associated with the signee (it is not a key fingerprint, but rather

the name by which the signee is known). The signee key fingerprint corresponds

to the key being signed; the signing key is known implicitly by the cryptographic

signature itself. The key trust represents a value, between 0 and 1, indicating

the signer’s belief that the indicated key is valid (i.e. that that key is indeed the

key corresponding to that identity). The principal trust represents a value, be-

tween 0 and 1, indicating the signer’s belief that the individual indicated by the

given key makes valid judgements. The separation of these two ratings, while

an added burden for the end user, is important for distinguishing between two

separate notions of trust–one may trust a principal (if he is who he says he is)

to make valid judgements, and thus trust all his judgements; alternatively, one

may fully believe a principal to be who he says he is, but not trust his judge-

ments about other principals. This distinction is discussed in greater detail in

Section 4.1.

Certificates are signed by the principal granting them, and the signature is

stored on the server. When clients receive a chain of certificates, then, in order

to verify that chain the client must verify that each signature is valid and that

each signature was granted by the recipient of the previous certificate (in other

words, that the signee of the preceding certificate is the signer of this certificate).

[Figure 2: Certificate Format]
    signee identity
    signee fingerprint
    key trust
    principal trust
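To make the chain check concrete, here is a minimal sketch using the fields of Figure 2; the field names, the signer fingerprint accessor, and the verify_signature helper are assumptions rather than the implementation's actual API:

    from dataclasses import dataclass

    @dataclass
    class Certificate:
        signee_identity: str       # name or e-mail by which the signee is known
        signee_fingerprint: str    # fingerprint of the key being endorsed
        key_trust: float           # in [0, 1]
        principal_trust: float     # in [0, 1]
        signer_fingerprint: str    # recovered from the cryptographic signature
        signature: bytes

    def verify_chain(chain, verify_signature):
        # Each certificate must be signed by the key endorsed (as signee)
        # by the certificate before it, and every signature must be valid.
        for prev, cert in zip(chain, chain[1:]):
            if cert.signer_fingerprint != prev.signee_fingerprint:
                return False
        return all(verify_signature(cert) for cert in chain)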

3.3 Technical Challenges

Implementing the basic server-client architecture is itself fairly trivial. Python

features a convenient library implementation of the XML-RPC protocol. Sim-

ilarly, third party Python cryptography libraries make implementing the cryp-

tographic features fairly straightforward.

The primary technical challenges were in implementing an efficient algorithm

for calculating trust and creating a realistic simulation environment to test

these algorithms in. The biggest challenge, though, was simply implementing

a resilient and useful algorithm for computing trust to begin with. In order

to measure success in dealing with this particular challenge, I have used the

simulation environment to measure what degree of trust a malicious user must

be granted before he is able to successfully convince other users to trust a false

key, and how many bad trust ratings must be given out (or how many users

must be tricked into trusting the wrong person) in order to have a false key be

trusted. This demonstrates the strength of the various algorithms for computing this trust.

The algorithm details are further discussed in Section 4, while testing details

are further discussed in Section 4.4.

3.4 Compatibility

The implementation aims for compatibility with POSIX platforms. Develop-

ment has been done primarily on MacOS X and Linux, making those the best

supported platforms. However, given the cross-platform nature of Python (and

the supporting modules and database servers), compatibility with any platform

fully supported by the Python interpreter and gcc/glibc (PyCrypto and SQLOb-

ject both rely upon compiled C code) should be feasible with little modification.

4 Algorithm Design

4.1 Concept of Trust

In order to design and analyze our trust algorithms, we must first construct a

meaningful concept of trust. What is it we are trying to measure, and what


represents success? Further, what sorts of attacks are we trying to prevent, and

what is the goal of most attackers?

By trust, we really mean how much faith we can place in a given identity-key

pairing. In any such application of our system, we are trying to map unique and

unforgeable identifiers (in this case public keys) to ambiguous and potentially-

contested identifiers (names, e-mail addresses, etc). The goal, then, of the trust

metric is to indicate the strength of a given such mapping. However, while that

is the sole external application of trust, internally there is a second sort of trust

at work. Whenever a referral is passed from one principal to another–whenever

a user analyzes the signature given by another user–a second notion of trust

comes into play, that sort of trust being how meaningful the testimony of this

principal actually is. To put it differently, in a trust path A → B → C, where

A is the starting principal and C the target identity, the edge from A to B

indicates how much faith to put in B’s recommendation, while the edge from B

to C indicates how much faith to put in the identity of C. The first sort of trust

I call “principal trust”–since it represents how much trust to give the statements

of a principal, as opposed to merely his identity–while I call the second “key

trust”–how much to trust his key’s claim of belonging to his identity. A similar

distinction is made by Beth, Borcherding, and Klein.[8]

The meaning of a numerical trust ranking is more elusive. Beth, Borcherd-

ing, and Klein describe them as “numbers of positive/negative experiences” in

dealing with the given principal[8], but this seems little more than a vague re-

statement of the same problem. How are these experiences prioritized? If there

is one good experience and one bad, can the good experience be said to be,

arbitrarily, twice as important as the bad one, or can the bad one be said to

be 50% more significant in judging trustworthiness? Looking at trust rankings

as probabilities lends more conceptual clarity, though it itself leaves some ques-

tions unanswered. For instance, should a user make a Bayesian assumption that

if he has not interacted with a principal, he should give that principal a trust

rating of 0.5? If a principal is given a low principal trust rating, does this imply

that the opposite of whatever that principal claims is likely to be true? Clearly,

neither of these are practical assumptions. In answer to the first question, we

would rather assume no trust at all than assume a probability of 0.5 that we

can trust any principal on whom we have no background (not to mention the

algorithmic challenges of assuming everyone should start with a ranking of 0.5).

To the second, we again make the assumption that we should default to no trust

at all; if a principal has a very low principal trust rating, it does not mean that

the negative of his claims is always true (or else an untrusted attacker could
easily circumvent the metric merely by telling the truth!), but rather that he

lies and his testimony is entirely insignificant.

4.2 Desired Properties

Before attempting to create the actual algorithm to be used, it is helpful to

informally enumerate the properties that must be shared by any candidate al-

gorithms. Reiter and Stubblebine identify eight design principles they believe to


be fundamental requirements of any effective trust metric, covering the meaning

of trust ratings, efficient implementations, and so forth[2]; in this section, how-

ever, I intend solely to discuss properties that make a given algorithm a useful

trust metric. These desired properties include:

1. Longer paths are weaker

2. Paths passing through weak edges are weaker

3. Non-disjoint paths are of little value (i.e. two paths that share many nodes

are less significant than two disjoint paths that assert the same thing)

4. Disjoint paths are significant

5. Attack resilience: a single bad principal trust rating should only be able to do damage in proportion to the original error²

²Levien describes this as resulting from the “bottleneck property”: that adding more target nodes reachable through a single bad node should not result in more trust being afforded said target nodes (otherwise, those target nodes, all perhaps “sock puppets” of a single malicious user, could be used to all direct their aggregate trust at an impostor).[7] This “bottleneck property” is an extension of property 2 above, however.

Or, from the perspective of a user, we wish to trust the recommendations of

users we trust, and of users who are trusted by users we trust, but we do not

wish to give more trust to a user we don’t directly know than we do to the user

who connects us. We also wish to consider the testimony of multiple trusted

users, but we do not wish to consider as meaningful the testimony of multiple

users originating from a single referral (non-disjoint paths); if that one referral

is bad, these two users may in fact be the same (malicious) user (i.e. “sock

puppets”).

Performance is also a serious consideration, and many of these algorithms

have to be bounded by recursion or traversal limits to cope with the size of the

web (the test web, comprising the strongly connected set from the PGP public

keyservers, contains over 1,000 keys).

4.3 Tested Designs

4.3.1 Strongest Path

This algorithm is a modification of Dijkstra’s shortest-path algorithm, where

we instead find the strongest path from the start key to the end identity. The

strength of a path is the product of the edge weights comprising this path, i.e.

for path P = {e_1, e_2, ..., e_n}, Weight(P) = ∏_{i=1}^{n} e_i. An implementation of this algorithm is sketched below.

Algorithm 1 Strongest Path

Assume the web is a connected graph G = (V, E), where V is the set of principals and E is the set of weighted edges.

    def strongest(graph, rank, node, destination):
        # rank maps each node to the strongest path strength found so far;
        # call with rank = {source: 1.0} (absent entries default to 0.0)
        for child, weight in graph.get(node, {}).items():
            candidate = rank[node] * weight
            if rank.get(child, 0.0) < candidate:    # relax the edge
                rank[child] = candidate
                if child != destination:
                    strongest(graph, rank, child, destination)

Note that in the implemented version, for performance reasons recursion is bounded by an arbitrary threshold below which node weights are not evaluated (i.e. if a node can only be reached by a path with a weight less than the set threshold, all paths passing through that node are assumed to be of weight zero).

This algorithm satisfies properties 1, 2, and 3. However, it completely disregards all paths but the strongest, making the (perhaps naive) assumption that, presented with a given decision, trust can be evaluated based solely on the testimony of the most trusted principals. However, this algorithm is also resilient

in that its primary deficiency results only in assigning too little trust, and it

preserves the restriction that more trust than is granted a neighbor can never

be granted the recommendations of that neighbor (i.e. it never trusts what a

principal says more than it trusts the principal himself), satisfying property 5.

4.3.2 Disjoint Paths

This is an algorithm that, rather than consider merely the strongest path, con-

siders all disjoint paths from the start node to the end node. Each path is given

a weight that is the product of the edge weights that comprise it (i.e. given a

path P = {e_1, e_2, ..., e_n}, Weight(P) = ∏_{i=1}^{n} e_i). The individual path weights are

then combined into a total weight. This is done by summing the weights (though

perhaps some other undiscovered approach would be more effective here). This

algorithm thus satisfies all four properties. However, this algorithm is still weak,

possibly weaker than the strongest path approach. Specifically, this algorithm

allows many weak disjoint paths to sum to a value greater than a single highly

trusted path, allowing a user who has trusted multiple “sock puppets” to be eas-

ily deceived. However, since all paths must be disjoint, the only way for a user

to be deceived is if he himself has trusted multiple malicious users; he is invul-

nerable to a neighbor who has done so. Assuming equal trust given to malicious

users who deceive our starting point and to valid users, then, and assuming the

paths going through the malicious users are of equal or greater weight than the

“real” paths, we can state that the amount of damage “sock puppets” who are

mistakenly trusted by a valid user can do is linearly bounded by the number of

“sock puppets” directly granted trust (i.e. the number who directly neighbor

the deceived user); the requirement that paths be disjoint means that malicious


users must repeatedly be granted false trust in order to increase the amount of

trust they can trick their victim into assigning.

Also note that this algorithm no longer limits final trust ratings to the range

[0, 1].
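The paper does not specify how the disjoint path set is selected; the sketch below greedily extracts the strongest remaining path (a priority-queue variant of the search in Algorithm 1) and deletes its interior nodes so that subsequent paths are node-disjoint:

    import heapq

    def strongest_path(graph, source, target):
        # Dijkstra-style search maximizing the product of edge weights;
        # valid because weights lie in [0, 1], so strength never increases.
        best, prev = {source: 1.0}, {}
        heap = [(-1.0, source)]
        while heap:
            neg, node = heapq.heappop(heap)
            if node == target:                 # reconstruct the path
                path = [node]
                while path[-1] != source:
                    path.append(prev[path[-1]])
                return path[::-1]
            if -neg < best.get(node, 0.0):
                continue                       # stale heap entry
            for child, w in graph.get(node, {}).items():
                cand = -neg * w
                if cand > best.get(child, 0.0):
                    best[child], prev[child] = cand, node
                    heapq.heappush(heap, (-cand, child))
        return None

    def disjoint_paths_trust(graph, source, target):
        g = {v: dict(nbrs) for v, nbrs in graph.items()}
        total = 0.0
        while True:
            path = strongest_path(g, source, target)
            if path is None:
                return total
            weight = 1.0
            for a, b in zip(path, path[1:]):
                weight *= g[a][b]
            total += weight                    # sum the disjoint path weights
            for v in path[1:-1]:               # remove interior nodes
                g.pop(v, None)
                for nbrs in g.values():
                    nbrs.pop(v, None)
            if len(path) == 2:                 # direct edge: consume it
                g[source].pop(target, None)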

4.3.3 Network Flow

Levien, with Advogato.org, uses a network flow approach to trust modelling in

which principals are able to grant certificates in proportion to their network

capacity. The web of trust is converted to a graph in which edges are weighted

by a capacity that is proportionate to the edge’s distance from “seed” nodes

(i.e. fully trusted nodes). These capacities are assigned arbitrarily based on

the breadth-first distance the given edge is from the seed node (i.e. those of

distance 1 have capacity x, those of distance 2 have capacity y : y < x, etc.).

The maximum capacity of a node is then measured using the Ford-Fulkerson

method (or in this case, the Edmonds-Karp implementation). In the Advogato

system, certificates can only be granted at one of three levels (“Apprentice,”

“Journeyer,” and “Master”), and principals can only grant certificates at or

below their own level. As such, a principal has a different total capacity for

each certificate level (the capacity for certificates he cannot grant is 0). This

has the property that the number of incorrect certificates that can be given out

is proportional to the number of “confused” nodes.[7]

This approach is very interesting, but the precise implementation used by

Levien is limited in our model. Since our edges are already weighted with

something analogous to a capacity (principal trust is, like capacity in the above

approach, a metric for how much trust a given principal can allocate), we can

dispense with the arbitrary capacity assignments, as well as with the fixed seed

node (since the seed is, in our system, whoever the querying principal is). The

result, however, works nearly identically to Advogato. One interesting feature

is that while trust rankings are limited to the range [0, 1], the ultimate weight

accorded an end point is not (this is true of the disjoint paths method, as well;

in practice, however, such results are highly unlikely to occur as shown in the

real-world tests). For example, in the graph shown below in Figure 3, the final

rank (or capacity) of D is 1.4.

This algorithm can be shown to formally hold to the “bottleneck property,”

as Levien terms it, satisfying property number 2–that paths are only as strong

as their weakest edge. However, since nodes can allocate all of their “capacity”

to other nodes, trust does not degrade over longer paths, failing property 1

(e.g. path P = {0.5, 0.5} has the same strength as path P = {0.5, 0.5, 0.5}, and

so on). While non-disjoint paths are of some significance in the network flow

model, they are hampered by the “bottleneck property,” limiting or negating

any harm that can be done by false non-disjoint paths. This approach fully

satisfies property 4; disjoint paths are all of equal significance. And by the

“bottleneck property,” this algorithm satisfies property 5 as well as any others

discussed here, since this results in linear bounding of how much trust can be

manipulated as the result of poorly placed trust (either of a higher ranking than

deserved, or of many principals being trusted more than they deserve).

[Figure 3: Sample Network Flow Input. Edges: A → B (0.8), A → C (0.8), B → D (0.7), C → D (0.7).]

It is possible that a modified version of this algorithm could be designed

to degrade capacities along long paths, enforcing property 1 in a way not done

by the standard network flow algorithm. In the unmodified version, however,

this failing could be considered the primary–indeed, the only–failing of this

algorithm.
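As a compact sketch of this computation (Edmonds-Karp, i.e. BFS augmenting paths, with principal-trust values used directly as capacities; the graph literal encodes Figure 3 as reconstructed above):

    from collections import deque

    def max_flow(capacity, source, sink):
        # residual[u][v] is the remaining capacity on u -> v
        residual = {}
        for u, nbrs in capacity.items():
            for v, c in nbrs.items():
                residual.setdefault(u, {})[v] = c
                residual.setdefault(v, {}).setdefault(u, 0.0)
        flow = 0.0
        while True:
            prev, queue = {source: None}, deque([source])
            while queue and sink not in prev:  # BFS for an augmenting path
                u = queue.popleft()
                for v, c in residual.get(u, {}).items():
                    if c > 1e-12 and v not in prev:
                        prev[v] = u
                        queue.append(v)
            if sink not in prev:
                return flow
            edges, v = [], sink                # walk back to the source
            while prev[v] is not None:
                edges.append((prev[v], v))
                v = prev[v]
            bottleneck = min(residual[u][v] for u, v in edges)
            for u, v in edges:                 # push flow along the path
                residual[u][v] -= bottleneck
                residual[v][u] += bottleneck
            flow += bottleneck

    web = {"A": {"B": 0.8, "C": 0.8}, "B": {"D": 0.7}, "C": {"D": 0.7}}
    print(max_flow(web, "A", "D"))             # 1.4, the rank of D above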

4.4 Test Results

Potential algorithms can be evaluated in two general ways. First, they can be

evaluated, as above, based on which of the desired characteristics they fulfill.

This is a good way to determine theoretical weaknesses, such as whether a

malicious user can create many “sock puppets” and assign false trust, or whether

a principal may be trusted less than it should be because not all paths to it

are considered. Second, potential algorithms can be evaluated based on the

distribution of their behavior in a real-world test environment, specifically, the

distribution of weights assigned to actual paths.

The first approach is pursued above in the abstract analysis of each algo-

rithm. For the second, approach, a test web of trust was constructed from a

snapshot of the PGP Web of Trust’s strongly connected set as found on the

pgp.net keyserver network. Signing relationships were imported into a test web,

and edge weights randomly assigned in a normal distribution with a mean of 0.5

and a standard deviation of 0.15 (bounded by 0 and 1). This distribution was

picked in the hopes that it would resemble the real-world distribution found in

a deployed system.³

A portion of principals (limited for performance’s sake) was then

randomly selected as start and end points, the paths between them ranked, and

the aggregate statistics analyzed. Of specific interest are the mean rank, the range of ranks, and the standard deviation.

³It seems plausible to speculate that in real-world usage, there might be two separate distributions: one would be of rankings tightly clustered near 1.0, representative of individuals who have directly exchanged keys over highly trusted media (e.g. face-to-face); the other, a broader distribution centering at the lower end. However, for testing purposes, the simpler normal distribution should be sufficient.

                         Strongest Path   Disjoint Paths   Network Flow
    Max                  0.202            0.957            12.380
    Min                  0.0              0.0              0.0
    Mean                 0.011            0.032            3.138
    Standard Deviation   0.028            0.111            1.799

Table 1: Test Results

A good algorithm should ideally present a

fairly broad distribution, with a high maximum (strongly trusted destinations)

and a high standard deviation (implying significant differences are presented

between lesser ranked paths and highly ranked paths). Assuming that the algo-

rithm satisfies our previously stated desired properties, lower ranked paths are

lower ranked for the proper reasons and higher ranked similarly so, thus we can

assume from a high standard deviation that discrimination between weak and

strong paths is fairly good. Note that this metric is weak, however, without

external analysis of the ways in which the algorithm is discriminating (hence the

previous discussion).
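As a small illustration of the edge-weight assignment described above (assuming out-of-range draws are clipped to the bounds; the text does not say whether they are clipped or redrawn):

    import random

    def simulated_edge_weight(mean=0.5, sd=0.15):
        # normal draw, bounded to the legal trust range [0, 1]
        return min(1.0, max(0.0, random.gauss(mean, sd)))

    weights = [simulated_edge_weight() for _ in range(1000)]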

4.4.1 Real-world Test Results

Here are presented the results from the real-world tests. These were run against

a sample of 257 keys and identities (20% of the total), randomly selected from the web of

trust generated from the PGP strong-set residing on the pgp.net keyservers.

We can see in Table 1 that the strongest path algorithm generates the

lowest distribution of weights, highly skewed at the lower end. This is clearly

due to such a fast degradation of trust in relation to path length (the average

path length in the test runs was approximately 6.29; search traversals were lim-

ited to paths of length 10); as such, the mean and standard deviation are both

low–this is undesirable. The disjoint paths algorithm fares slightly better;

degradation of paths is somewhat offset by the disjoint paths summing together

(the average number of disjoint paths in the test runs was approximately 5.41).

Note, too, that the maximum trust reported by this algorithm is still less than

1; this is not a restriction of this approach, but because the average weight of a

single path (as computed in this algorithm and the strongest path approach)

is less than 1 divided by the average number of disjoint paths, values exceed-

ing 1 are highly unlikely. Finally, the network flow algorithm, as expected,

generated the greatest values (and greatest range of values). This is a highly

desirable characteristic, but it results primarily from disregarding path length

as a consideration (by failing to degrade capacity over path length).

Next we can observe the correlation of trust ratings with path length. As

we can see in Figure 5, the strongest path and disjoint paths algorithms are

heavily biased in favor of shorter paths. The network flow algorithm shows

very slight correlation, appearing to favor longer paths. This most likely stems

from longer paths being coincident with more non-disjoint paths, which allow


more augmenting paths to bypass bottlenecks (thus allowing a greater capacity

to these paths). This correlation should be considered a minor flaw, however,

since it is contrary to our preferred bias in favor of shorter paths.

Finally, in Figure 6, we see the plot of trust versus the number of disjoint

paths. There is no notable correlation in our strongest path algorithm (nor

should there be, since only the strongest path is considered). More troubling,

perhaps, the disjoint paths algorithm appears to show little correlation, either.

The network flow algorithm shows little correlation (which may be contrary

to what one would have expected, since disjoint paths are augmenting paths

that increase capacity in the network).

[Figure 4: Distribution of Trusts]

[Figure 5: Trust Versus Path Length]

[Figure 6: Trust Versus Number of Disjoint Paths]

4.5 Other Unimplemented Algorithms

4.5.1 Bayesian Belief Network

This approach treats each trust rating as a probability. Principal trusts are

assumed to be the probability with which the granting principal believes the

recipient is an accurate certificate granter (i.e. the probability that certificates

granted by him can be trusted), while key trusts are assumed to be the proba-

bility that that principal is in fact who he claims he is. Note that a low principal

trust cannot be interpreted as an assertion that the principal always lies (i.e.

that the negative of his assertions are always true); rather, it is an assertion

that the principal’s certificates are meaningless. The following example illus-

trates this point.

Assume principals A, B, and C. The principal trust of A for B is x, and the

key trust of B for C is y. From the perspective of A querying about the identity

of C, the probability of C’s identity being valid (P(C)) can be evaluated

as a conditional probability dependent upon the probability of B’s assertion

being valid (P(B)). Thus P(C|B) = y and P(C|¬B) = 0.5 (because we cannot

assume that ¬B → ¬C, i.e. we cannot assume B always lies). We can conclude,

then, that P(C) = P(B)×P(C|B)+P(¬B)×P(C|¬B) = x×y +(1−x)×0.5.
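This chaining rule is simple to state in code; a one-function illustration (names hypothetical):

    def chained_belief(x, y, prior=0.5):
        # P(C) = P(B) * P(C|B) + P(not B) * P(C|not B), with P(C|not B) = prior
        return x * y + (1 - x) * prior

    print(chained_belief(0.9, 0.8))   # 0.9*0.8 + 0.1*0.5 = 0.77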

This algorithm does satisfy the first four desired properties. However, because
of the Bayesian assumption of a probability of 0.5 that untrusted principals lie,
rankings will tend to center around 0.5. Nodes to which there is no path will also be

ranked as 0.5, since no certifying path implies an unknown level of trust, which in

a Bayesian interpretation implies a probability of 0.5 for each possibility (honest

or not). This is notably different from the previously-discussed algorithms, in

which the mean rankings were far lower.

4.5.2 Dempster-Shafer

Dempster-Shafer theory provides a neater means of coping with the problem of

ambiguity in ratings (when treated as probabilities, as described above, in which

we end up centering “unknown” trusts at 0.5). Unlike Bayesian probability the-

ory, Dempster-Shafer allows us to indicate the beliefs of various assertions (in

this case, trust ratings) in ways that don’t necessarily correspond to Bayesian


probabilities. Specifically, Dempster-Shafer requires that the probabilities as-

signed to all subsets of the set of considered outcomes sum to 1, rather than

that the probabilities assigned to all elements of the outcome set sum to 1. For

example, consider the possibility that a principal is truthful or not truthful, i.e.

{T, ¬T} is the set of possible outcomes. In a Bayesian approach we require that

P(T) + P(¬T) = 1; this has the effect that if we don’t know anything about a

principal and don’t wish to assert anything (specifically, we do not wish to assert

that the principal is always dishonest, since this would affect our conclusions as

much as assuming the principal is always honest), we must assign a probability

of 0.5 to both T and ¬T. In the Dempster-Shafer approach, however, we re-

quire that P(T) + P(¬T) + P({T, ¬T}) = 1, which in this application allows a

principal to spread the strength of his claim between three more fine-grained as-

sertions: that a principal is trustworthy, that a principal is not trustworthy, and

that a principal might be either. In this case, trust ratings might be expanded

to occupy the range [-1, 1], allowing 0 to be equivalent to P({T, ¬T}) = 1, -1 to

be equivalent to P(¬T) = 1, and 1 to be equivalent to P(T) = 1. Note that this

range enforces the further restriction that one cannot assert, say, P(T) = 0.5

and P(¬T) = 0.5.
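One plausible encoding of such a rating r in [-1, 1] as a Dempster-Shafer mass assignment (this mapping is an illustration, not something the text specifies):

    def rating_to_masses(r):
        # r = 1: all mass on T; r = -1: all mass on not-T;
        # r = 0: all mass on {T, not-T}, i.e. complete uncertainty
        m_t = max(r, 0.0)
        m_not_t = max(-r, 0.0)
        return m_t, m_not_t, 1.0 - m_t - m_not_t

Note that under this encoding m(T) and m(¬T) can never both be positive, which realizes the restriction just mentioned.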

Dempster-Shafer may provide a more elegant way of formally computing

probabilities from the web of trust as an evidentiary model than is possible

with Bayesian probability theory. This approach is detailed further in [4].

5 Future Research

Trust metrics are a field of ongoing research. Due to the lack of clearly de-

fined success–what does it really mean to trust, and when is it being allocated

properly?–it is unlikely that anyone will invent a single perfect algorithm that

will answer such questions once and for all. I believe the attempts to apply prob-

ability to trust metrics by conceptualizing them as belief networks are sound

and provide particularly promising approaches to determining trust. Yet I also

believe future research would do well to aim to better understand how such sys-

tems are actually used in practice, and what threats are in fact most severe. For

example, in the theoretical approach, all nodes are generally considered to be

equivalent; in practice, one might consider that identities that are “high value”

targets–that are particularly worth trying to impersonate–are also more likely

to be more validated (in the way that, for instance, Phil Zimmermann is at the

center of the PGP strong set). Observations like these may prove useful, yet are

easily overlooked without real-world implementations and experimentation.

As such, the direction I would most like to take future development is to

aim for practical usage and, especially, OpenPGP compatibility. Few of the

similar research projects discussed above have culminated in actual deployed

infrastructure, and doing so could be a highly interesting experiment.

Further opportunities also exist for other applications of trust metrics. On-

line communities are one previously discussed possibility; a particularly inter-

esting possibility, heavily lobbied for by Raph Levien (creator of the Advogato


community) is to apply his trust metrics to Wikipedia. Most interesting is the

suggestion to use not a global trust metric (of the sort used on Advogato or that

is analogous to PageRank and the like), but rather to use a subjective metric like

those detailed in this paper; in such a system, users would be presented with

the version of the page closest to their own beliefs, creating an encyclopedia

that embraces a subjective, rather than objective, notion of accuracy. Further,

even Internet protocols like DNS and BGP seem ripe for applications of this

technology; in short, any attempt to map unique names to forgeable identities,

especially where the notion of a correct matching is subjective, is a possible

application of trust metrics.

6 Conclusions

In this research, I have shown that it is possible to implement a practical in-

frastructure to allow distributed key signing and exchange to be verified with

sophisticated trust metrics. I have also shown three trust metrics and have

shown their relative strength and usefulness; all three show themselves to be

practical approaches to the problem of determining key validity in a web of

trust without trusted third parties. To continue this research, one could add

full OpenPGP compatibility to the implemented infrastructure to make it of

more value to existing PGP users. One could further research good trust met-

rics, implement the other proposed metrics, and design better approaches for

evaluating metrics. One could also implement trust metrics in a variety of other

settings, as previously described. In this research, at least, we have seen the

steps to a practical implementation and the evaluation of that implementation.

References

[1] Neal McBurnett. PGP Web of Trust Statistics, April 2004. URL http://bcn.boulder.co.us/~neal/pgpstat/.

[2] Michael K. Reiter and Stuart G. Stubblebine. Toward Acceptable Metrics of Authentication. IEEE Symposium on Security and Privacy, May 1997.

[3] Michael K. Reiter and Stuart G. Stubblebine. Resilient Authentication Using Path Independence. IEEE Transactions on Computers 47 (12) (December 1998). URL http://www.ece.cmu.edu/~reiter/papers/1998/ToC.pdf.

[4] Bin Yu and Munindar P. Singh. An Evidential Model of Distributed Reputation Management, North Carolina State University, July 2002.

[5] Henk P. Penning. Computing Shortest Paths in WOTs, University of Utrecht.

[6] Henk P. Penning. Analysis of the Strong Set in the PGP Web of Trust, University of Utrecht, September 2005.

[7] Raph Levien. Attack Resistant Trust Metrics, UC Berkeley, July 2004.

[8] Thomas Beth, Malte Borcherding and Birgit Klein. Valuation of Trust in Open Networks. European Symposium on Research in Computer Security, 1994.