Saving Tor bandwidth using merkle trees

When you start any Tor application, like the popular Tor Browser, it first needs to “connect to the network”. In this phase Tor will contact a directory in the network and download a document referred to as “the consensus”: an hourly updated list of all relays on the network, currently roughly a megabyte in size. Tor needs to know which relays are in the network, or it would not be able to build paths with multiple hops: Tor connects to some relay on the network, then tells that relay to connect to another relay, et cetera, using the information in the consensus to find a suitable path.

But does it really need to know about all relays on the network, or can we split the document into smaller pieces and have the client use only a subset? This approach is sometimes referred to as “directory partitioning”. There has been quite a bit of research into it, and the conclusion is always that it is a bad idea, mostly because the specific set of relays a client uses reveals information about that client. Solving this problem matters: consensus distribution currently takes 2% of all Tor bandwidth, and this will easily grow to 50% if more people start using Tor.

Another proposed solution that somewhat addresses this problem is Torsk, an alternative directory protocol. Torsk meets most of the requirements for the directory problem in Tor, but it moves critical bits of trust to single entities, which can be a problem for an adversary-resistant network like Tor.

A while ago I was thinking about alternative ways of solving this problem and came up with one. It does not solve everything either, but through this blog post I hope to find the right people who can solve this problem for good.

The idea

The idea I would like to propose is to have the directories on the network do the path selection, instead of the client.

We can achieve this by having the client simply tell the directory “give me a random relay”. This is obviously insecure, as the directory could influence the selection by just returning a relay it controls, so we guard it with a special query: together with the request for a random relay, the client sends a randomly generated number, which on the directory side corresponds to exactly one relay.

For this to work, we have to define a scheme that resolves any random number in a certain range to a relay. This is easy: we take all relays on the network and scale their bandwidth estimates so that the sum equals the maximum value of our random number. Now all we need to do is assign a start and end position to every relay, and we have a number-to-relay mapping (a lookup on this mapping takes O(log N) time using a binary search).
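
To illustrate, here is a minimal sketch in Go of how such a mapping could work. The Relay structure and all names are mine, invented for this example, not anything Tor actually uses:

package main

import (
	"fmt"
	"sort"
)

// Relay is a simplified, hypothetical relay record.
type Relay struct {
	Name       string
	Bandwidth  uint64 // scaled bandwidth estimate
	Start, End uint64 // assigned range: Start <= n < End
}

// assignRanges gives every relay a contiguous [Start, End) range
// proportional to its bandwidth, so that all ranges together cover
// [0, sum). It returns the bandwidth sum, i.e. the upper bound for
// the client's random number.
func assignRanges(relays []Relay) uint64 {
	var pos uint64
	for i := range relays {
		relays[i].Start = pos
		pos += relays[i].Bandwidth
		relays[i].End = pos
	}
	return pos
}

// lookup finds the relay whose range contains n using a binary
// search, in O(log N) time.
func lookup(relays []Relay, n uint64) *Relay {
	i := sort.Search(len(relays), func(i int) bool { return relays[i].End > n })
	if i == len(relays) {
		return nil // n is outside [0, sum)
	}
	return &relays[i]
}

func main() {
	relays := []Relay{
		{Name: "TerokNor", Bandwidth: 43800},
		{Name: "IPredator", Bandwidth: 143000},
	}
	sum := assignRanges(relays)
	fmt.Println(sum)                        // 186800
	fmt.Println(lookup(relays, 82475).Name) // IPredator
}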

Of course this process runs on the directory, not on the client, so the client still has no idea whether the directory answered truthfully.

To secure all this, our process starts at the directory authorities, the (currently 9) special Tor directories that generate the consensus documents every hour. We split the list of relays out of the consensus, so that clients do not need to download it, and effectively sign this separate document by including its hash in the consensus. We also generate a merkle tree over this set of relays, including their start and end positions, and include the root hash of that tree in the consensus. Clients only download the consensus; relays on the network also download the list of relays, which they can verify using the hash published in the signed consensus.

Going back to our query, where the client sends a random number and the directory knows which relay that maps to, we now have the ability to protect against malicious directories: in its response, the directory returns the relay's information plus the hashes of the siblings in the merkle tree, so the client can recompute the merkle root and match it against the hash from the consensus. Since there can be multiple consensuses active at any time, the client includes in its query which consensus it wants a relay for.


I figured it may be a lot easier to understand this concept with an example of how I think it can work. There are two pieces to this puzzle: the consensus creation and the query.

In the consensus creation, instead of the document we create now, we publish a list of relays:

relay TerokNor ip=,[2001:1af8:4700:a012:1::1]:443 bandwidth=43800
relay IPredator ip= bandwidth=143000

After publishing, we add start/end pairs to each relay. In the example above TerokNor could get start=0,end=43800 and IPredator could get start=43800,end=186800. We hash the relay information together with their start/end pairs, and create a merkle tree:

      07513d
      /    \
 5e0cb4    3f0c27
TerokNor  IPredator
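
A rough sketch in Go of how a directory authority could build that tree. The exact hashing convention (for example how to handle an odd number of nodes on a level) is an assumption of mine, not a fixed part of the idea:

package main

import (
	"crypto/sha256"
	"fmt"
)

// merkleRoot folds a list of leaf hashes into a root hash by hashing
// sibling pairs level by level. An odd node at the end of a level is
// promoted unchanged; this is just one possible convention.
func merkleRoot(leaves [][]byte) []byte {
	level := leaves
	for len(level) > 1 {
		var next [][]byte
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) {
				next = append(next, level[i]) // odd node: promote as-is
				continue
			}
			h := sha256.New()
			h.Write(level[i])
			h.Write(level[i+1])
			next = append(next, h.Sum(nil))
		}
		level = next
	}
	return level[0]
}

func main() {
	// Each leaf covers a relay's information plus its start/end pair.
	a := sha256.Sum256([]byte("relay TerokNor ... start=0 end=43800"))
	b := sha256.Sum256([]byte("relay IPredator ... start=43800 end=186800"))
	fmt.Printf("root: %x\n", merkleRoot([][]byte{a[:], b[:]}))
}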

Our consensus consists of the root hash (07513d), the bandwidth sum (186800), the directory authority signatures, some unique identifier (which can also be the hash of the document), and validity information:

consensus-id 32b92d09b5
valid-from 2016-04-01 00:00:00
valid-until 2016-04-01 01:00:00
merkle-tree-root 07513d
bandwidth-sum 186800
signature moria1 74kLVzMD
signature gabelmoo Cq4Sa4hn

All directories on the network download both documents, but clients only download the consensus, which is now a small document of fixed length.

When a client wants to make a three-hop connection through the Tor network, it starts by connecting to its guard. Once connected, it sends a query that includes the consensus identifier and a random number between 0 and the bandwidth sum:

find 32b92d09b5 82475

The directory looks up that consensus, finds which relay corresponds to the random number, and collects the hashes the client will need to verify the result. It then returns all of this to the client:

found-relay IPredator ip= bandwidth=143000
hash 5e0cb4

Note that the included hash is the hash of the sibling in the merkle tree, not the hash of the relay itself. Once the client receives this information, it can compute the hash of the relay, and by using the hash values the directory included, it can compute the root hash of the merkle tree and confirm that the directory did not lie.
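
In Go, the client-side check could look roughly like this. How the left/right position of each sibling is encoded in the response is an open detail, so the isRight flags here are an assumption of mine:

package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// verifyProof recomputes the merkle root from the relay's leaf hash
// and the sibling hashes the directory returned, then compares it to
// the root hash from the consensus. isRight[i] tells whether
// siblings[i] sits to the right of the running hash.
func verifyProof(leaf []byte, siblings [][]byte, isRight []bool, root []byte) bool {
	cur := leaf
	for i, sib := range siblings {
		h := sha256.New()
		if isRight[i] {
			h.Write(cur)
			h.Write(sib)
		} else {
			h.Write(sib)
			h.Write(cur)
		}
		cur = h.Sum(nil)
	}
	return bytes.Equal(cur, root)
}

func main() {
	// In the example: the client hashes IPredator's entry itself, and
	// the directory supplied TerokNor's hash as the left sibling.
	leaf := sha256.Sum256([]byte("relay IPredator ... start=43800 end=186800"))
	sibling := sha256.Sum256([]byte("relay TerokNor ... start=0 end=43800"))

	// Recompute the expected root the same way the authorities did.
	h := sha256.New()
	h.Write(sibling[:])
	h.Write(leaf[:])
	root := h.Sum(nil)

	fmt.Println(verifyProof(leaf[:], [][]byte{sibling[:]}, []bool{false}, root)) // true
}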

On performance

There are a bunch of performance advantages to this scheme, such as the obvious reduction in bandwidth: instead of downloading the full set of relays every hour, a client only downloads two relays per circuit (middle+exit) plus some overhead for the hashes. Since the average client on the network does not build a thousand circuits per hour, this is a big win not only in global network bandwidth but also in Tor startup time.

On the directory side, including the directory authorities, the merkle tree scheme is low-cost: computing the tree takes linear time, and storing a sha256 merkle tree for ten thousand relays takes less than a megabyte (a binary tree over 10,000 leaves has fewer than 20,000 nodes, at 32 bytes per hash).


In this post I wanted to show that an adversary-resistant network such as Tor is possible without requiring clients to be aware of the full network. A lot of details and optimizations are left out for the purpose of making it easy to understand, and this post is not meant as a proposal to change Tor. If the Tor community is interested in a proposal, I will be happy to write one, including a lot more details and optimizations.

Please post your questions, if any, in the comments.

Implementing a Tor relay from scratch

The last few weeks I have been spending my free time on implementing a Tor OR (Onion Router) in the Go language. After a while I realized the language was not suited for the project, and stopped working on it. Today I am sharing what I learned, and will release the partially working code.

In this post I will refer to the network protocol as ‘Tor’, the original implementation as ‘CTor’, and my implementation as ‘GoTor’, in an attempt to reduce confusion between the different versions. For consistency all network speeds are noted as a combined upload plus download, so divide by two where you think it is relevant.


Tor is a peer-to-peer network protocol that provides privacy to its users. It does so by encrypting the data several times, and sending it through a path (‘circuit’) of multiple servers before it reaches the final destination. Each layer of encryption can only be decrypted by the server it is intended for, so that any node in the circuit only ever knows its immediate neighbors. The Tor Project's website has a better explanation of the network in case you're interested.

The network has a specification document, but carefully following it will result in an implementation that does not work (this has now been corrected). This posed some challenges, but luckily the CTor source code is relatively easy to read.

The goal

For a while I have been running Tor nodes using CTor. In November 2014 those nodes relayed a total of about 400 terabytes, which is an average of roughly 1200 Mbit per second. Each machine has cheap hardware to push the costs down, but the lack of AES-NI instructions on the CPUs causes a significant slowdown. CTor's implementation is essentially single-threaded, with only a very small amount of work being done in background threads, so the slowdown from the cheap hardware is immediately noticeable in throughput.

To maximize the amount of relayed data, it is normal to simply run multiple instances of the program, up to two per IP address. Running multiple instances of CTor on the same computer causes inefficiencies, such as needing more connections to other servers and trouble in balancing network throughput, so I set out to write a Tor relay in a way that uses all CPU cores. My final goal was to beat the Tor speed record, which was at roughly 200 megabytes per second.

The path

I started by just reading the specification documents, and quickly understood the basics of the network. I then decided on a language: for a while a colleague had been pushing me to write Go, so I picked Go. It took me about a day to pick up the basics and I got started.

First up: the TLS code. Instead of using OpenSSL, Go has its own TLS implementation called “crypto/tls”, apparently because agl__, one of the people working on the language, decided so. Lots of features are missing, from DH key exchange to extracting the ClientHello message. On top of that, a quick performance benchmark showed that this implementation was very slow compared to popular OpenSSL bindings for Go. So I went with OpenSSL. While I had to fork the code to add some extra bindings, it was mostly painless.

Then came the Tor handshake. There are four versions of the link protocol, and the specification requires implementing at least the first one, which is a real pain. Instead, I implemented the fourth version of the handshake, and sent a mail to the Tor developers requesting to drop support for the first and second versions. They agreed, and hopefully this week I will be finishing the patch for that.

So up to this point everything was great. I had spent only a few days and even though both the Tor protocol and the Go language were new to me, I had made significant progress. My local TorBrowser was able to connect to the relay and I saw the commands it was sending while trying to establish a circuit to the next server.

As soon as I started implementing the circuit logic, the first implementation issues appeared. Each connection had its own buffered channel (a queue), and a circuit bridged two of these channels, but when both ends of the circuit were writing to the other end simultaneously and both channels were full, the code would deadlock. I mitigated this by just increasing the buffer size, and planned to refactor the entire system later on.
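
A stripped-down sketch of the pattern that bit me; the buffer sizes and burst length here are chosen to trigger the deadlock immediately, not taken from the actual GoTor code:

package main

// Each side of the circuit forwards cells into the other side's
// buffered channel. If both sides push a burst that exceeds the
// peer's free buffer space before draining their own inbox, both
// sends block forever and the Go runtime reports a deadlock.
func main() {
	aToB := make(chan int, 1)
	bToA := make(chan int, 1)

	pump := func(in <-chan int, out chan<- int) {
		for {
			for i := 0; i < 2; i++ {
				out <- i // forward a burst of cells to the peer
			}
			for i := 0; i < 2; i++ {
				<-in // only then service the inbound queue
			}
		}
	}

	go pump(aToB, bToA)
	pump(bToA, aToB) // fatal error: all goroutines are asleep - deadlock!
}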

Fast-forward a bit to the moment I had written enough to have a nearly full implementation of the relay. Some minor features were still missing, but it was stable enough to put a node in the network, which I did. It very slowly started gaining consensus weight (it took more than a week to get any weight assigned at all), but data was flowing nicely and it sometimes hit a few megabit per second. A few fixes had to be made, but I was happy.

To get the testing up to speed, I moved the configuration and keys from a node that had been running CTor for months, and ran my node with them. Traffic quickly ramped up to 500 Mbit/s, which was already more than a single CTor node could handle, but it didn't really grow much beyond that.


On top of that, memory usage grew very quickly every time I started the program. I had to restart the node twice a day or it would start swapping. The growth was about 6 megabytes per minute on average, which is a lot less than the node relayed, but definitely suggested that there was a memory leak somewhere.

I mentioned the fast relay on the Tor IRC, and a fellow relay operator, TheCthulhu, offered to help with some testing hardware. He set up a server with my relay, and within a few days we had broken the Tor speed record with a nice 250 megabytes per second, effectively maxing out the network link. CPU usage was at a nice 60% across 12 cores. But his relay also suffered from the memory issues and had to be restarted every few days.

Meanwhile my own relay was still at only 500 Mbit/s, which was a quarter of the server’s capacity. After profiling the code and looking at some graphs, I realized that this was because of Go internals. I’ll talk more about this in a bit.

Then there was the overhead of using OpenSSL. During initial benchmarking it looked minimal, but it turns out Go assumes the worst of all C code: it refuses to run C code on a goroutine's stack. Instead, it allocates an entirely new stack just for C calls, and tells the scheduler to create a new thread in case the C code hangs. If 80% of the CPU time spent in your program is in C calls, your Go program won't hesitate to run 500 OS threads.
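
A small demo of that behavior. This is ordinary cgo, not GoTor code; it needs CGO_ENABLED=1 and a C compiler to build:

package main

/*
#include <unistd.h>
*/
import "C"

import (
	"fmt"
	"runtime/pprof"
	"time"
)

// Every goroutine that blocks inside a C call occupies its own OS
// thread, so a burst of concurrent C.sleep calls makes the runtime
// create roughly one thread per call.
func main() {
	before := pprof.Lookup("threadcreate").Count()
	for i := 0; i < 100; i++ {
		go func() {
			C.sleep(2) // blocks an OS thread for two seconds
		}()
	}
	time.Sleep(time.Second)
	after := pprof.Lookup("threadcreate").Count()
	fmt.Printf("threads created: %d before, %d after\n", before, after)
}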

In the end I decided to drop the project. It was a fun learning process, but the Go language made the final product too slow. By sharing this blog post I hope that others will learn about these issues, and that maybe some day the language will be adjusted.

Lessons learned

  • [cpu] Go cannot call C functions directly. Instead, with cgo it creates an entirely new stack and runs the C code from there, while assuming the code will block for locks or I/O. This causes a massive buildup of OS threads;
  • [cpu] To prevent the buildup of OS threads caused by cgo, you can limit concurrent calls using a lock or a buffered channel (see the sketch after this list). However, this causes a lot of futex system calls, slowing down the process;
  • [mem] For every OS thread, Go has its own memory allocation cache. This is good, as it avoids locks, but when combining this with all the threads spawned by cgo you will see quite a bit of memory overhead (this claim has been disputed in the comments, it’s possible I misread something in the Go source);
  • [mem] In Go’s I/O model, every connection must have its own goroutine with its own read buffer. Assuming normal TLS records of 16K and a goroutine stack size of 4K, this means that with only 10,000 active connections you will already be using 200 megabytes just because of these buffers;
  • [mem] Go’s default channel implementation uses an array and a mutex to control access. This means that a buffered queue with a capacity of 2000 slices will always be using 48KB, even if it’s empty. However, to be fair, this could be mitigated by using an alternate channel implementation that uses a linked list;
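
Here is the sketch referenced in the second [cpu] bullet: a buffered channel used as a counting semaphore to cap the number of simultaneous cgo calls, and therefore the number of extra OS threads. cryptoCall is a placeholder for an expensive C call, not a real binding:

package main

// cgoSlots is a counting semaphore: at most 8 goroutines can be
// inside a C call at once, which caps the number of extra OS threads.
var cgoSlots = make(chan struct{}, 8)

func limitedCgoCall(data []byte) {
	cgoSlots <- struct{}{}        // acquire a slot; blocks when all are taken
	defer func() { <-cgoSlots }() // release the slot on return
	cryptoCall(data)
}

// cryptoCall stands in for an expensive C function, e.g. an OpenSSL
// binding. Every goroutine waiting on cgoSlots contends on the
// channel's internal mutex, which is where the futex overhead
// mentioned in the bullet above comes from.
func cryptoCall(data []byte) {
	_ = data
}

func main() {
	done := make(chan struct{})
	for i := 0; i < 100; i++ {
		go func() {
			limitedCgoCall(make([]byte, 512))
			done <- struct{}{}
		}()
	}
	for i := 0; i < 100; i++ {
		<-done
	}
}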

In the end, I concluded that while Go is an amazing language, it's not that good if there is C code in performance-critical loops, if you have to deal with lots of connections, or if you have a lot of data flowing between goroutines. For most projects these restrictions are not an issue at all, but in my case I hit all three.

Lots of thanks to everyone who helped me during this short project. It was a lot of fun, and I am looking forward to working on CTor.


KPN Experia Box v8 and PPPoE passthrough

A few weeks ago I got a new internet connection, after switching from Online to KPN (both are ISPs). It also came with a new modem, and the old one wouldn’t let me connect anymore. That was acceptable though, as the old modem (Speedtouch 564v6) was horrible.

Sadly, KPN doesn’t exactly have a reputation for delivering quality modems either, and the modem I got from them can’t be configured. With “can’t be configured” I mean that it doesn’t allow me to turn off the firewall, turn off the DHCP server, use UPnP, change the DNS servers, etc. It doesn’t even have a telnet interface or any other advanced configuration interface.

After spending a few hours with it (and after several configuration resets), I discovered that KPN had edited the HTML of the web interface to hide a lot of the elements that allow you to configure the modem. With Firebug I was able to recover some settings, including the ability to turn off the DHCP server and use my own instead.

But I wasn’t there yet. UPnP still didn’t work properly, and with manual port forwarding I couldn’t get some applications to work (probably because the firewall was still active).

Since the modem was already connected to my normal router, Apple’s AirPort Extreme (nice stuff, but I wouldn’t buy it again), which has support for PPPoE, I decided to try using that. In theory, if I managed to get that to work, I would completely bypass the parts of the Experia Box that make it so annoying: it would disable the firewall and the DHCP server, let my own router handle UPnP, etc.

The web interface had a checkbox that said ‘PPPoE passthrough’. “Nice,” I thought. But after changing the settings of the router, it didn’t work. I hooked up my laptop (a Mac) directly to the modem, but it couldn’t connect either (the logs indicated that a timeout occurred after a connection was made). Maybe it was a Mac thing? I connected my Windows desktop as well, but the same thing happened there. Apparently ‘PPPoE passthrough’ was just a checkbox that didn’t do anything.

After several more hours of experimenting (I really needed this modem to work), I found a hidden web interface page that let me set up what I wanted in only a few clicks. Which is why I’m writing this article: I don’t want anyone else to have to go through this mess.

To enable PPPoE passthrough on the KPN Experia Box v8, reset the modem first (hold the reset button until it restarts), then go to and click Apply. Wait for the modem to restart and then configure your router (which should be the only device connected to your modem, preferably on LAN1). That’s all! The phone function still works, but the wireless router part of the modem should probably be turned off.

For Telfort users, the procedure is a lot more complicated, and I have not yet found the easiest way to do it. Read the comments for a few suggestions from readers.

iTunes Store? Never again.

Every now and then, I download some songs from the internet. I respect the authors of these songs, so I usually buy them from iTunes. But I also have the good habit of cleaning my computer every 2 to 3 months (Windows re-install), so I did that a few days ago.

Basically, what you expect from a web store like the iTunes Store is that it will automatically re-download the songs you had before you cleaned your PC. I mean, if Steam can do this with games of 5 GB, it shouldn’t be too much of a problem for Apple with songs of about 3 MB. Sure, we are talking about a lot more songs than games, but I don’t think many people buy 5 GB off the iTunes Store.

So, I logged into iTunes this afternoon, after installing it again, and clicked “Check for available downloads”. It told me I already had all my songs! Well, in my Purchased tab, I only saw one song, and that was a free song I had downloaded from the iTunes Store because I wanted to test something. So I checked the iTunes FAQ, and it told me I would have to back up all my songs!!

Isn’t it ridiculous that I have to re-buy all my songs? Is it really too much to expect the iTunes Store to re-download my purchases after I re-install my computer? Steam does it. The EA download manager does it. does it. And a way bigger company called Apple can’t do something as simple as this?

They force us to have DRM on the songs we purchase. They force us to back up everything we buy, but we can’t burn the songs to CDs if we want to listen to them. They can only be on five computers at the same time, and if you accidentally forget to deauthorize your computer before re-installing your PC, you can only have four computers at the same time. The iTunes Store really limits what you can do with the music.

I don’t think I’ll ever buy off the iTunes Store again. I will resume my old habits of downloading songs from Usenet and torrents, and if I respect the author enough, I will make a donation to them. Sorry Apple folks, but it’s 2009 and this kind of trickery isn’t wise.