The Tor network is used by people who want to maintain their online anonymity. Tor has been in the media quite a bit recently, so I thought it would be helpful to explain how Tor's peer-to-peer structure is set up, and to show how we can create a map of Tor relays and exit nodes.
Before we can start mapping out the network, we first need to know a bit about how Tor works. I'll try to leave the nitty-gritty details (such as the encryption specifics) out of this post and instead hit the major high-level concepts. For anything I miss, feel free to refer to the official specifications.
The goal of Tor is to provide online anonymity to both the visitor of a service (in the case of a standard client) and the provider (in the case of Hidden Services). It does this by routing all traffic from the client to the destination through a series of relays called a circuit. Relays are simply Tor clients configured to also act as routers for other clients in order to provide more bandwidth to the network. By default, Tor clients send traffic through a circuit of 3 relays before reaching the final destination.
Tor clients encrypt all their traffic so that each relay knows only two things: where the traffic came from immediately before it, and where the next stop for the traffic will be. This is done by encrypting the traffic once for each relay in the circuit, using a different key for each layer of encryption. This way, as each relay receives the traffic, it can strip off only one layer of encryption before forwarding the data to the next destination. If the relay is forwarding the data to another relay, all it will see is ciphertext. The only relay that sees the actual data being sent to the final destination is the exit relay.
[Figure: Traffic Encryption with Tor]
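To make the layering idea concrete, here is a toy sketch in Python. It is emphatically not Tor's real cryptography or cell format: the hash-based XOR cipher, the `wrap` and `peel` helpers, the relay names, and the keys are all made up for illustration. It only shows that each relay, holding one key, can peel exactly one layer and learn nothing but the next stop.

```python
import hashlib


def keystream(key, length):
    """Derive a toy keystream from a key (illustration only, NOT secure)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def toy_cipher(key, data):
    """XOR data with the keystream; applying it twice restores the input."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def wrap(payload, hops, destination):
    """Client side: add one layer per relay, innermost layer first.

    hops is a list of (relay_name, key) pairs in path order.
    Each layer carries the name of the next stop plus the inner data.
    """
    next_stops = [name for name, _ in hops[1:]] + [destination]
    cell = payload
    for (_, key), next_stop in zip(reversed(hops), reversed(next_stops)):
        cell = toy_cipher(key, next_stop.encode() + b"\n" + cell)
    return cell


def peel(cell, key):
    """Relay side: remove one layer, returning (next_stop, inner_data)."""
    plain = toy_cipher(key, cell)
    next_stop, _, inner = plain.partition(b"\n")
    return next_stop.decode(), inner
```

Wrapping a request for a hypothetical circuit of `guard`, `middle`, and `exit` and then calling `peel` once with each relay's key yields the next stop at every step, and only the last call returns the original payload.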
There are 3 types of Tor servers: "normal" non-exit relays, exit relays, and bridges. Both non-exit and exit relays are publicly listed for anyone to see (and map!). An exit relay has simply been configured to also act as the exit point for traffic as it leaves the Tor network. A bridge, however, is not entirely publicly listed. This is primarily to allow users in censored environments to access the Tor network via unpublished IP addresses.
So the question is: where is this public data located? Tor has a select few servers called Directory Authorities, which manage and maintain the information about relays. The locations of these servers are hardcoded into Tor clients. Each relay sends its information to the authorities every 18 hours in the form of a "server descriptor". A descriptor contains attributes of the relay such as its IP address, OS, Tor version, and uptime. Then, every hour, these authorities vote on and publish a list of microdescriptors (short summaries of the server descriptors) for all currently running relays in a document called the "consensus". This document can be found via HTTP at the following URL:
http://[directory authority IP]/tor/status-vote/current/consensus [example]
Along with its own expiration information and details about the directory authorities, the consensus contains the microdescriptors of the relays, which look like this:
This snippet has the microdescriptors for 3 relays. We can see these microdescriptors contain information such as the nickname, IP address, available bandwidth, any flags assigned, and the exit policy of the relay. We can tell that a relay is a valid exit relay if it has the "Exit" flag listed.
The full details of the descriptor and microdescriptor formats can be found in the specification. Now that we know where to find the raw data, let's see if we can parse out the interesting information.
The Hard Way
Now that we know where the relay information is located, we can create a Python script to pull down the consensus document, and parse out the nicknames, IP addresses, and whether or not the relay is an exit node:
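A minimal sketch of such a script follows. It assumes the standard consensus layout, where each relay has an `r` line (nickname, identity, digest, date, time, IP, ORPort, DirPort) followed by an `s` line listing its flags. The function names, the output file name `relays.json`, and the JSON keys `nickname`, `ip`, and `exit` are my own choices for illustration, and the directory authority address is left for you to supply.

```python
import json
import urllib.request


def fetch_consensus(authority, timeout=30):
    """Download the current consensus from a directory authority.

    authority is a "host:port" string that the caller must supply.
    """
    url = "http://%s/tor/status-vote/current/consensus" % authority
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_relays(consensus_text):
    """Extract nickname, IP address, and exit status for each relay."""
    relays = []
    current = None
    for line in consensus_text.splitlines():
        if line.startswith("r "):
            fields = line.split()
            if len(fields) < 9:
                # Not the layout assumed above; skip this entry.
                current = None
                continue
            current = {"nickname": fields[1], "ip": fields[6], "exit": False}
            relays.append(current)
        elif line.startswith("s ") and current is not None:
            # The "s" line lists the flags assigned to the preceding relay.
            current["exit"] = "Exit" in line.split()[1:]
    return relays


def save_relays(relays, path="relays.json"):
    """Write the parsed relays to a JSON file."""
    with open(path, "w") as handle:
        json.dump(relays, handle, indent=2)
```

Running it is a matter of calling `save_relays(parse_relays(fetch_consensus(authority)))` with the address of a directory authority.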
This script creates a JSON file that looks like this:
Then, we can use MaxMind's free GeoLite City database to obtain the latitude and longitude given the relay's IP address:
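As a rough sketch of this step, the code below assumes the third-party `geoip2` package and a GeoLite2 City database file (the successor to GeoLite City), which may differ from the library you end up using. The helper names and the `latitude` and `longitude` keys are illustrative. The lookup is passed in as a plain function so the merging logic can be tried without a database.

```python
def make_geoip2_lookup(database_path):
    """Build an ip -> (lat, lon) lookup backed by a GeoLite2 City file.

    Assumes the third-party geoip2 package is installed; it is imported
    here so the rest of the module works without it.
    """
    import geoip2.database
    import geoip2.errors

    reader = geoip2.database.Reader(database_path)

    def lookup(ip):
        try:
            location = reader.city(ip).location
        except geoip2.errors.AddressNotFoundError:
            return None
        if location.latitude is None or location.longitude is None:
            return None
        return location.latitude, location.longitude

    return lookup


def add_coordinates(relays, lookup):
    """Return copies of the relays with coordinates added.

    Relays whose IP address cannot be located are dropped.
    """
    located = []
    for relay in relays:
        coords = lookup(relay["ip"])
        if coords is None:
            continue
        entry = dict(relay)
        entry["latitude"], entry["longitude"] = coords
        located.append(entry)
    return located
```

With a database on disk, `add_coordinates(relays, make_geoip2_lookup(path))` produces the located list, where `path` points at wherever you saved the database file.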
The final output will give us a file like this:
Before throwing it all into a map, let's first see how the Tor project made it easy to get the same (and more!) data.
The Easy Way
Fortunately, the Tor project has already done the hard work for us. Written in Java, the Onionoo project provides a RESTful API into the data pulled from the consensus and more. In fact, instead of using the consensus, which contains the microdescriptors of running relays, it queries the full list of router descriptors from the directory authorities. Besides carrying more information, this list also includes the descriptors of non-running relays. We can see this list over HTTP at the following URL:
http://[directory authority IP]/tor/server/all [example]
Since the Onionoo API is already up and running, we can use a standard jQuery AJAX request to get the data, which we will then put on a map:
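The request itself is an ordinary HTTP GET, so it can also be sketched in Python to match the earlier script. This version assumes Onionoo's `details` document and the `type`, `running`, and `fields` query parameters, and that each relay entry may carry `nickname`, `flags`, `latitude`, and `longitude`. The function names and the shape of the returned records are my own.

```python
import json
import urllib.parse
import urllib.request

ONIONOO_DETAILS = "https://onionoo.torproject.org/details"


def build_details_url(base=ONIONOO_DETAILS):
    """Build a query for running relays, limited to the fields we map."""
    query = urllib.parse.urlencode(
        {
            "type": "relay",
            "running": "true",
            "fields": "nickname,or_addresses,flags,latitude,longitude",
        },
        safe=",",
    )
    return base + "?" + query


def fetch_details(url, timeout=60):
    """Fetch and decode the JSON document returned by Onionoo."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize(details):
    """Keep only relays that have coordinates, in a flat record format."""
    relays = []
    for relay in details.get("relays", []):
        if "latitude" not in relay or "longitude" not in relay:
            continue
        relays.append({
            "nickname": relay.get("nickname", "Unnamed"),
            "exit": "Exit" in relay.get("flags", []),
            "latitude": relay["latitude"],
            "longitude": relay["longitude"],
        })
    return relays
```

Calling `summarize(fetch_details(build_details_url()))` returns one record per located relay. Restricting `fields` keeps the response much smaller than the full details document.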
The Onionoo project provides quite a bit of data, so I'll leave it to you to explore the different possibilities. Here's an example of the data that we receive:
For now, let's throw this data into a pretty map.
Mapping the Data
Now that we have the data we need, let's use the latitude and longitude data from the MaxMind GeoIP database to place a marker for each relay on a map created with jVectorMap. This was just a preference; you are free to use any map library (such as MapBox). Here is the final HTML source:
You can find a working demo here. Note that it may take a few seconds for the data to be retrieved from the Onionoo API.
I hope this post not only showed you how we can use Tor relay information to create nice maps, but also gave a little more insight into how the Tor network structure works. I would recommend reading the specs if you want to know more, since this post just scratched the surface.
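Whatever the page looks like, jVectorMap ultimately wants a `markers` array of objects with a `latLng` pair and a `name`. As a hedged sketch of that shaping step, written in Python rather than the page's JavaScript, the helpers below convert relay records (assumed to have `nickname`, `exit`, `latitude`, and `longitude` keys) into that form. The function names and the `relayMarkers` variable are hypothetical.

```python
import json


def to_markers(relays):
    """Convert relay records into jVectorMap-style marker objects."""
    markers = []
    for relay in relays:
        label = relay["nickname"]
        if relay.get("exit"):
            # Make exit relays distinguishable in the marker tooltip.
            label += " (exit)"
        markers.append({
            "latLng": [relay["latitude"], relay["longitude"]],
            "name": label,
        })
    return markers


def markers_as_javascript(relays, variable="relayMarkers"):
    """Render the markers as a JavaScript variable declaration."""
    return "var %s = %s;" % (variable, json.dumps(to_markers(relays)))
```

The resulting declaration can be written to a small `.js` file and passed to the map's `markers` option, which avoids geolocating relays in the browser.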
As always, please leave any questions or comments below!