Challenges, priorities, and progress in anti-censorship technology at Tor
by phw | August 27, 2020
This blog post seeks to bring clarity to the modus operandi of the Tor Project in the anti-censorship space by providing a summary of the challenges we face, the priorities we focus on, and the progress we have made so far on our circumvention technology. Censorship circumvention is a complex and ever-evolving problem, and this blog post summarizes our approach to tackling it. Please feel free to ask any related question in the comments. Thanks to hanneloresx's translation, you can find a Chinese version of this blog post below.
Tor's Anti-Censorship space
In February 2019, we hired two engineers to focus on and advance anti-censorship technology at Tor. The anti-censorship team also includes several other people in the Tor community who contribute designs, code, insight into past work, infrastructure, documentation, and resources. The goal of Tor's anti-censorship team is to understand network censorship and build technology to circumvent it so that the Tor network can be accessible to everyone.
The state of circumvention
Some Internet Service Providers (ISPs) block the domain www.torproject.org, making it difficult for their users to download a copy of Tor Browser. Our service GetTor can help these users get Tor Browser despite this: simply send an email to gettor@torproject.org, which will automatically respond with alternative download links for Tor Browser. These download links point to GitHub, GitLab, the Internet Archive, and Google Drive. At least one of these hosting providers should be accessible to each of our users. For example, users from China can download Tor Browser from our GitHub mirror.
Once you have your copy of Tor Browser, you are ready to connect to the Tor network. Unfortunately, some ISPs interfere yet again, aided by technology that blocks the IP addresses of Tor relays, detects the Tor protocol dynamically by inspecting network traffic that crosses the ISP's perimeter (so-called deep packet inspection, or DPI), or both.
If you are unable to connect directly to the Tor network, you need to use bridges. Bridges are unlisted Tor relays and, depending on the bridge type, can obfuscate network traffic in a way that's more difficult for ISPs to detect. The simplest method of censorship circumvention in Tor Browser is to use our default bridges, a set of a dozen bridges that are part of Tor Browser. These bridges are essentially public, which is why more effective censorship systems such as China's Great Firewall (GFW) block them, but they are still effective in many places like Iran. Take a look at our Tor Browser manual to learn how to enable default bridges.
If you are unable to connect to our default bridges, you currently have three options:
Use Snowflake. Snowflake is currently only available in our Tor Browser alpha version but is on track to be part of Tor Browser stable. Our most recent changes added a new set of STUN servers, making Snowflake available in China and other places that block access to Google services. We are currently stress testing the system to handle more users as we move towards a stable release.
Use meek-azure. While meek-azure should work everywhere (including behind the GFW), it is overloaded and therefore slow. Microsoft's Azure CDN (which meek-azure is based on) is expensive, which is why we have to place a traffic cap on the meek-azure bridge.
What are our challenges?
Technological obstacles
A successful circumvention system consists of two components:
a network protocol (for example WebRTC, obfs4, or TLS) and
an endpoint to connect to (for example a Snowflake proxy, a CDN server for meek, or an obfs4 bridge).
Both the protocol and the endpoint must resist detection. The GFW is able to detect the obfs2 and obfs3 protocols on the wire, meaning that it can identify them simply by looking at the bytes that cross the national perimeter. The latest iteration in the obfs series, obfs4, currently still works in China in the sense that the GFW cannot (or chooses not to) block it by simply looking at bytes on the wire. Unfortunately, obfs4 being unblocked is not enough. One also needs an unblocked endpoint to connect to, and this is where the trouble starts. We currently use the service BridgeDB to hand out bridges to our users. After the user solves a CAPTCHA, BridgeDB will return up to three bridges. CAPTCHAs are not the only defense that prevents censors from learning all bridges, since in the age of deep learning CAPTCHAs represent but a small obstacle. Handing out bridges to users while preventing censors from learning them all remains a hard problem, but more on that later. Fortunately, most censoring countries are limited in the amount of time, money, and talent that they can dedicate to blocking the Tor network, which is why BridgeDB is still effective in many places.
Resource limitations
We are limited in the amount of time and money that we can dedicate to circumvention technology, which means that we are unable to address all the problems that we want to (and should) address. As a result, we need to think carefully about how we can best spend the limited time we have. Consider, for example, that there are numerous trivial fixes that could make Tor available again in places like China. Unfortunately, censors can often react just as swiftly and block Tor yet again, rendering such fixes a poor investment of our time. The key is to develop technology that is asymmetrically more difficult for a censor to block. A promising circumvention technology is one that takes us n hours to deploy and 2^n hours for a censor to block. Needless to say, it's not always clear in advance of deployment which technology would work best, which is why we seek to maximize our impact with limited resources by formulating and focusing on clear priorities, as explained below.
What are our priorities?
Our goal is to maximize the number of people that we help circumvent Internet censorship around the world. Internet censorship is a moving target, which means that technologies that worked great five years ago find themselves blocked by many ISPs today. This is why we need to constantly invest in new research and technology to stay ahead of censors.
Over the last year or so, we invested heavily in Snowflake, which originally started as a research project. Over the next few months, we aim to land Snowflake in the stable Tor Browser version, marking a significant milestone. We have been slowly ramping up its use and now have over 6,000 volunteer proxies helping users circumvent censorship and providing moving targets that are difficult for censors to enumerate and block.
BridgeDB, our existing bridge distribution system, is showing its age. It's heavily tailored towards a specific purpose, making it difficult to extend and generalize. We therefore started working on a more flexible and lightweight reimplementation. This reimplementation will provide the following benefits over the old BridgeDB:
Implement a feedback loop that hands out bridges to censorship measurement platforms such as OONI and feeds the resulting reachability information back into bridge distribution. That means that if a user from country X requests a bridge, we won't give them a bridge that's known to be blocked in country X.
Our reimplementation will periodically test bridges with the help of bridgestrap so that we won't be handing out bridges whose obfs4 port is firewalled, or that are otherwise broken.
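To make the two benefits above concrete, here is a minimal Python sketch of how reachability feedback and periodic self-tests could feed into bridge selection. It is an illustration under stated assumptions, not the actual BridgeDB reimplementation: the `Bridge` record, its field names, and the functions `record_measurement` and `select_bridges` are all hypothetical, and the addresses come from a documentation-only IP range.

```python
from dataclasses import dataclass, field


@dataclass
class Bridge:
    """A hypothetical bridge record; the field names are illustrative only."""
    address: str
    # Result of a periodic self-test (the role bridgestrap plays in the text).
    functional: bool = True
    # Country codes in which measurements reported this bridge as blocked.
    blocked_in: set = field(default_factory=set)


def record_measurement(bridge, country, reachable):
    """Feed one reachability measurement back into the bridge record."""
    if reachable:
        bridge.blocked_in.discard(country)
    else:
        bridge.blocked_in.add(country)


def select_bridges(bridges, country, limit=3):
    """Return up to `limit` working bridges not known to be blocked in `country`."""
    usable = [b for b in bridges if b.functional and country not in b.blocked_in]
    return usable[:limit]


bridges = [
    Bridge("192.0.2.1:443"),
    Bridge("192.0.2.2:443"),
    Bridge("192.0.2.3:443", functional=False),  # e.g. its obfs4 port is firewalled
    Bridge("192.0.2.4:443"),
]

# A measurement from (made-up) country "XX" reports the first bridge as blocked.
record_measurement(bridges[0], "XX", reachable=False)

chosen = select_bridges(bridges, "XX")
print([b.address for b in chosen])
```

A user asking from "XX" is never offered the bridge measured as blocked there, and nobody is offered the bridge that failed its self-test. A user from a different country can still receive the first bridge.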
We are working on building the Salmon bridge distribution system, which will help with the problem of endpoint blocking. Salmon was originally proposed in a research paper at PETS'17. The idea is that users have a "reputation score" that goes down when one of their bridges is blocked and it goes up if their assigned bridges remain unblocked. If a user's reputation score gets too low, the user gets blocked, and if it gets high enough, the user can invite others to the system. Take a look at this net4people thread for a crisp overview of Salmon.
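The reputation idea can be sketched in a few lines of Python. This is a toy model of the description above, not the algorithm from the Salmon paper: the thresholds, the step size of one point per event, and the names `User` and `update_reputation` are assumptions made up for illustration.

```python
from dataclasses import dataclass

# Made-up illustration values, not parameters taken from the Salmon paper.
BAN_THRESHOLD = -2
INVITE_THRESHOLD = 3


@dataclass
class User:
    name: str
    reputation: int = 0

    @property
    def banned(self):
        """A user whose score has dropped too low is blocked."""
        return self.reputation <= BAN_THRESHOLD

    @property
    def can_invite(self):
        """A user whose score is high enough may invite others."""
        return self.reputation >= INVITE_THRESHOLD


def update_reputation(user, bridge_blocked):
    """Lower the score if an assigned bridge got blocked, raise it otherwise."""
    if user.banned:
        return  # in this sketch, a blocked user stays blocked
    user.reputation += -1 if bridge_blocked else 1


alice = User("alice")
for _ in range(3):
    update_reputation(alice, bridge_blocked=False)

mallory = User("mallory")
for _ in range(2):
    update_reputation(mallory, bridge_blocked=True)

print(alice.name, alice.reputation, "can invite:", alice.can_invite)
print(mallory.name, mallory.reputation, "banned:", mallory.banned)
```

The asymmetry is the point of the design: a censor who registers as a user and blocks the bridges it is handed burns its own account after a few rounds, while honest users slowly earn the right to bring in people they trust.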
We have a roadmap that we revisit every three months to identify a handful of short-term goals. Take a look at our roadmap to see these goals.
How can you learn more or get involved?
Our lack of resources is our biggest challenge against powerful nation-state censors, so we are always looking for new contributors and collaborators. If you would like to learn more or get involved, take a look at the following:
Tor's anti-censorship team meets once a week, on Thursdays at 16:00 UTC in the #tor-meeting IRC channel. Joining a weekly meeting is the best way to meet us and get started on the team. Don't worry if you miss a meeting: we post our meeting logs to the tor-project mailing list. Most of us are also available in the #tor-dev and the #tor-project IRC channels on irc.oftc.net, so feel free to reach out any time. Every two weeks, we typically have a reading group right after our regular meeting. We use these reading groups to discuss research papers or software projects. Take a look at our meeting pad to learn what our next reading group is going to be about and feel free to join our discussion.
All of our software lives in the anti-censorship GitLab project. If you're interested in getting involved, take a look at issues that have the "First Contribution" label.
Finally, here is a list of past blog posts and presentations that provide more information:
Actually, it was in part something Roger D said that encouraged me to read up on statistics. I was surprised to find that many recent probability theory textbooks do mention the Bayes's Rule paradox he described in terms of firewalls, under various other names. The most common way of explaining it involves dragnet screening for cancer. Because the paradox is so relevant to so many issues in public affairs (and also to anonymity/cybersecurity issues), with the moderators' indulgence I repeat the discussion here:
Imagine a medical screening for cancer which is "quite accurate" in the sense of almost always returning a positive result when a person actually has cancer and almost never returning a positive result when a person does not have cancer.
Let
e = Pr(C) = probability random person has cancer
p = Pr(P|C) = probability a person with cancer will test positive
q = Pr(P|~C) = probability a healthy person will test positive (false positive result)
where naturally 0 < e,p,q < 1.
Put
a = p/q
and note that for any reasonable test a > 1.
Then from the definition of conditional probability
Pr(C|P)*Pr(P) = Pr(PC) = Pr(P|C)*Pr(C)
we obtain Bayes's Rule in the form
Pr(C|P) = Pr(P|C)/Pr(P)*Pr(C)
where Pr(C) is the prior estimate (before the test result comes back) and Pr(C|P) is the posterior estimate, and Pr(P|C)/Pr(P) is the Bayes factor used to update Pr(C) when we receive the test result for a particular person.
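Writing Pr(P) = p*e + q*(1-e) and dividing numerator and denominator by q, this becomes
Pr(C|P) = a*e/(1 + (a-1)*e)
which gives a one-parameter family of transformations (a is the parameter, e is the variable being transformed), which is precisely the group of Moebius transformations preserving the real line and fixing the points 0,1 on that line.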
The surprise is that even with q small (slightly greater than zero) and p large (slightly less than one), if e is small, Pr(C|P) is also small. (Most people who test positive do not in fact have cancer.)
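A short numerical check makes the surprise tangible. The Python sketch below uses the symbols e, p, and q defined above and expands Pr(P) as p*e + q*(1-e) by the law of total probability. The specific numbers (a test with p = 0.99 and q = 0.01) are illustrative assumptions, not figures for any real medical test.

```python
def posterior(e, p, q):
    """Pr(C|P) by Bayes's Rule, with Pr(P) = p*e + q*(1 - e)."""
    return p * e / (p * e + q * (1 - e))


# Illustrative numbers only: the test catches 99% of true cases and
# raises a false alarm for 1% of healthy people.
p, q = 0.99, 0.01

# The same test applied to populations in which the condition is
# common, uncommon, and rare.
for e in (0.5, 0.01, 0.001):
    print(f"prior {e:.3f} -> posterior {posterior(e, p, q):.3f}")
```

With these numbers the posterior falls from about 0.99 when half the population is affected, to 0.5 when one person in a hundred is, to roughly 0.09 when one in a thousand is. In the last case, about ten out of eleven positive results are false alarms even though the test itself has not changed.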
In the context of COVID-19 testing, this says that a dragnet screening for exposure to the SARS-CoV-2 virus, even using an "accurate" test, might still yield unreliable results if the actual incidence of exposure is small.
In the context of screening social media posts such as this for "potentially violent extremism", clearly e is very small, so even a "highly accurate" DHS I&A or state/local fusion center social media screening will flag almost entirely persons who are not in fact potentially violent extremists.
In all cases, rare phenomena (such as violent extremism) are essentially impossible to detect by dragnet screening.
(One can contemplate combining the results of many tests, but these will almost never be independent in the sense of probability theory; on the contrary, the combined results are not likely to be much more reliable than the individual test results, in the case of dragnet screening for a rare phenomenon.)
Abuse of statistics (by ignoring fundamental issues such as the one just explained) matters; to give just the most recent example, the NY Post reports that Drump is demanding that USG withhold federal funds from NYC, DC, Seattle, and Portland, on the grounds that the mayors of these cities have generally not expressed eagerness to shoot BLM protesters or presumed "anarchists" in cold blood (although they have been willing to deploy chemical weapons). Note that political violence is a rare phenomenon; possibly Drump has been warned that modern statistics does not enable any easy way to identify and proactively detain those supposedly "predisposed" to engage in political violence, because he has jumped over that (false) "scientific justification" [sic] for precrime screening by insisting on holding the entire populations of the targeted cities responsible for a rare phenomenon which sometimes occurs within city limits.
The proposal to defund DC is particularly startling because DC does not have statehood or political representation to try to defend itself from disaster. Also, of course, the federal government is located in DC, so it is hard to understand how USG could continue to function if that city had no funding for sewage disposal, bridge repair, road work, etc. Similarly, NYC is the largest city in the nation and the financial capital of the world, so it is hard to see how the global economy would continue to function if that city's infrastructure is destroyed.
It's enough to make one want to board a plane bound for DC, while wearing funereal black and a BLM face mask of course ;-/
I do not believe the hype which claims without much evidence that "AI" (really machine learning, really a kind of souped up regression) can weed out spam and detect disinformation by Rubots (and CIA & the family dog), but maybe if TP says you are using AI, some idgit corporate sponsor will be pleased?
Why would you jump to that strange conclusion? I said I doubted that AI will miraculously solve the problem of "web curation" which includes things like weeding out Rubots---- a generic term for any state-sponsored algorithmically assisted disinformation/disruption campaign, with apologies to Russians generally, but you know, RU does have a reputation ;-)
Anyone who is tempted to fall for this unsupported, sweeping (and false) claim would be well advised to read the book by Cathy O'Neil, Weapons of Math Destruction.
Many leading researchers in the area of machine learning have warned against the rush to embrace these methods as a panacea. From time to time publications such as wired.com offer lengthy articles by experts which show why machine learning cannot in fact solve every problem which currently imperils human existence.
Ah yes, you offer random book promotions and news websites as your own 'expertise' for calling things 'false claims' by making your own 'false claims'. You're 'falling' into the half-truth mantra.
Of course many complex problems are very hard to solve, yet not impossible. This is at the forefront of current technology and human knowledge, foresight is obviously vital.
Wired is a rather mainstream site, or in other words, 'corporate' with monetary interests. I would recommend reading https://sciencex.com/news/ instead.
Has the wiki on Trac been migrated to GitLab? If so, how on earth can anyone find it? Trac's front page has entry points to its wiki, but GitLab links to torproject.org subdomains. If Trac's wiki hasn't been migrated, I'm at least still able to read it on Trac, but I can't update or edit it.
I posted this already on the thread announcing Tor Browser 9.5.4, but it looks like no one's monitoring that thread, for some reason, so I'll post it here in the hopes it gets out sooner. Tor Browser 9.5.4 was released with an EXPIRED key. How could that happen, given Tor's security focus?
BTW, I know of many blogs on the web that aren't pre-moderated. Has the Tor blog been overwhelmed by spam? Seems like you could post-moderate. Promoting free speech means more than saying "go buy your own megaphone." It means, here, my friend, use my megaphone. Are there that many people posting hateful or ridiculous theories here? I'd bet most have comments related to TOR, even if you disagree with them.
Regarding moderation: it's not an awful lot of spam but it's a fair share of nonsensical and a large share of off-topic comments. I'm a fan of critical comments that hold us accountable – as long as they are polite. In an ideal world, we would already have a web forum so that folks don't have to use the comment section for off-topic discussions.
I want to say a bit more for the benefit of GPG newbies.
I think I understand what happened here and I will illustrate with another example from my own experience in the past few days.
I use Tails and every two months or so download the ISO image of the newest edition, currently Tails 4.10. Before burning a new DVD (and updating Tails USBs), naturally I verify the validity of the ISO using the detached signature. I keep copies of the Tails signing key in various encrypted devices/locations. When I use
gpg --verify *.iso.sig *.iso
to verify the ISO image, sometimes I see an "expired key" message. That happens because the "long term offline identity" signing key (used to make the successive detached signatures for the successive ISO images of each new Tails edition as it comes out) has been updated by Tails Project, but I have not yet imported the changes into all my copies of the public half of the signing keypair. When I check the latest version of the signing key to the best of my ability and import it, I no longer see that message, because now GPG knows that the expiration date of the key has been extended by Tails Project.
In more detail: the closely guarded signing key itself (first one listed below) is used to create subkeys which are then used to sign the detached signatures which users use to verify the ISO images. (Defense in depth.) Currently when using Tails 4.10 the command gpg --list-keys should show:
If anyone else gets a different result *please* speak up!
> In an ideal world, we would already have a web forum so that folks don't have to use the comment section for off-topic discussions.
Thank you for that. It is difficult for users not to suspect that the reason why this issue has gone unaddressed for so many years is that USG entities are threatening TP with being suddenly shut down by simultaneous heavily armed SWAT team raids on all the devs located in the USA. I worry that will happen anyway. Note that both Drump and Biden hate Tor, and contrary to the hopes of many worried by what Drump will do in his second term, Biden's character (and ties to offshore and federal LEAs) does not permit him to "move left". Time is very short to prepare for whatever is coming in 2021. As I said, I worry.
Keep in mind you can download updated keys through the normal Internet (clearnet) or through Tor Browser (Tor network to clearnet service) or from a .onion service or by configuring GPG or curl to proxy through the tor exe that starts on your local machine when you open Tor Browser. On Linux, parcimonie can automate it.
Refrain from downloading from SKS keyservers. [1] [2] I don't know if GPG has been patched or what version may be invulnerable.
> configuring GPG or curl to proxy through the tor exe that starts on your local machine when you open Tor Browser. On Linux, parcimonie can automate it.
Not familiar with parcimonie, but Tails automatically routes GPG requests (for key downloads and uploads from a key server) through Tor. Tails is a complete operating system plus computing environment which works out of the box but which is designed not to leak or store personal information. It is not perfect but well tested (the Snowden leaks show NSA could not break Tails as of May 2013).
Thanks for this post; it is encouraging to learn that it is still possible (but still too hard) to use Tor from behind GFC. I'd like to hear more about censorship in Russia, Belarus, Congo, etc.
> Don't worry if you miss a meeting: we post our meeting logs to the tor-project mailing list.
Activists may not wish to increase their personal risk by using email to contact an organization all the bad governments hate with a passion.
> Most of us are also available in the #tor-dev and the #tor-project IRC channels on irc.oftc.net, so feel free to reach out any time.
Unfortunately, some years ago Tails removed, without explanation, the invaluable anonymous chat (Pidgin) which made it possible to *sometimes* enter certain chat rooms and to use OTR to hold a private conversation. Unfortunately both of the chat rooms you mentioned blocked people who have not registered a chat account.
Please ask Tails to consider restoring chat (ideally using something with a less awful UX and better security than Pidgin) and if possible the anon feature.
I also believe it would be useful for at least some Tor devs to adopt Signal and/or post a number where they can be reached for e.g. reports of possible problems using Tor. I realize that this will be received by our enemies as an opportunity to spam you or worse but I hope you can figure out an effective way to lessen the risk from that kind of cheap shot.
> Microsoft Azure
The head of the Azure Big Data team has a private plane and, it seems, a propensity to circle the homes of at least one person helping to organize employee boycotts of M$, Amazon, Google secret surveillance projects. Good example of how activists can turn the tools of Surveillance Capitalism back upon the Surveillance State. Another good example:
theintercept.com
Doorbell Cameras Like Ring Give Early Warning of Police Searches, FBI Warned
Two leaked documents show how a monitoring tool used by police has been turned against them.
Sam Biddle
31 Aug 2020
> The rise of the internet-connected home security camera has generally been a boon to police, as owners of these devices can (and frequently do) share footage with cops at the touch of a button. But according to a leaked FBI bulletin, law enforcement has discovered an ironic downside to ubiquitous privatized surveillance: The cameras are alerting residents when police show up to conduct searches.
Not just Ring but dashcam cameras and many other tools can be repurposed for intrusion detection and realtime remote monitoring by trusted confederates, including devices over which we have reason to hope the owner has better control. Encourage everyone reading this to use their imagination! So far it looks like this is one area where we are gaining on the bad guys.
A new and extremely dangerous FBI tactic is to "turn" right wing extremists, including at least one who is described in FBI's own literature as a "terrorist", and then direct their new "asset" to commit felonies on behalf of the feds, apparently including illegal searches of the homes of suspected Antifa supporters. (FBI can easily obtain door keys by threatening people who legitimately have a copy, and apparently is willing to pass these onto known terrorists.) With a bit of ingenuity such illegal and flagrantly immoral operations can be documented. And then... hurrah for Secure Drop!
Cloudflare is demanding Javascript-enabled CAPTCHAs ubiquitously. What caused them to change from the regular ones that didn't require JS, and what can be done to encourage them to go back?
Security classification is often abused to enable censorship of government foul-ups and misconduct. Because Snowden relied upon GPG, OTR chat, and Tails-- which uses Tor--- it seems relevant to pass on some good news: an appeals court has just ruled that the NSA phone dragnet revealed in the very first published Snowden leak (which unveiled the Prism dragnet surveillance system) is illegal. This vindicates his decision to leak and one hopes will result in his immediate pardon. Indeed, one hopes that Reality Winner and all other USIC whistleblowers will also be pardoned, along with Julian Assange.
Comments
Please note that the comment area below has been archived.
What is the relationship between Tor anticensorship efforts and all the other anticensorship projects out there (Shadowsocks, V2Ray, Trojan-GFW, Lantern, GoodbyeDPI, Geneva, etc.)?
All these tools have the same goal but differ in their approach. Some implement a new obfuscation protocol layer and others manipulate transport layer fields to evade detection. Some require users to set up their own proxy instances and others are "ready to use." Over at the net4people GitHub forum, there are summaries of some of these systems:
If Tor Project is so against censorship why did they remove all the comments from the previous blog post? That seems like censorship to me.
Please restore the valuable discourse.
I addressed this concern here.
Are some comments still pending approval in recent previous articles?
Most recent articles tend to have a bunch of comments waiting for approval. I think most moderators process them in batches, so it sometimes takes a while.
That. Also this person might need to scroll a bit further down the blog page to https://blog.torproject.org/new-release-tor-browser-954 ?
Sing out if there's anything you can't find.
Google CAPTCHAs blocking Tor everywhere...
FWIW I have used Tor Browser (via Tails 4.10) in the past few days to visit many news sites without problems. However, there does appear to be an ongoing attack on at least some onions and Secure Drop sites operated by major news sites and prominent journalists.
Just implement hackerfactor suggestion changes. Some are very easy.
You keep telling people to get involved, but when they do, the Tor team just makes them stay away.
+1
██████ BUILD ██████ ████ ██████ ████
████ ██████ ████ BRIDGES ██████ ████
████ ████ NOT ████ ██████ ████ █████
██████ ████ ██████ WALLS ████ ██████
-
-
▂▃▄▅▆▇██ BRIDGES ██▇▆▅▄▃
I want this on a T-shirt! 8>
Could you put RSS support back into the browser? Removal of this key technology is very limiting. What kind of progress, if any, have you made in regards to AI?
AI as in artificial intelligence?
yes, https://github.com/tensorflow
I'm not aware of any Tor project that heavily relies on artificial intelligence. Off the top of my head, some research projects use machine learning classifiers to tell apart circumvention protocols from "normal" Internet traffic, e.g.: https://censorbib.nymity.ch/#Wang2015a
Another possibility is the use of clustering algorithms to identify Sybil relays in the Tor network, e.g.: https://nymity.ch/sybilhunting/
Nice, would like to see more on this subject. Quantum computing will change everything. The question is when it becomes mainstream.
> Nice, would like to see more on this subject.
Plus one.
It is possible that while not all Tor enthusiasts have easy access to Big Data processing tools like tensorflow (i.e. anyone can download but we cannot run it effectively without suitable hardware), making data collected and published by TP more easily available might assist enthusiasts in brainstorming how modern computer statistics could possibly assist TP in detecting Sybil attacks. To repeat something I have requested previously, I would love to see TP bring back to life torstatus, only publishing the data as a CSV file rather than as a web page. (Data wranglers can make the conversion of course, but why put up an unnecessary obstacle?) Plus, TP needs to make it easier to anonymously report potential problems or weird experiences. Email lists are quite unsuitable for this purpose.
> Another possibility is the use of clustering algorithms to identify Sybil relays in the Tor network, e.g.: https://nymity.ch/sybilhunting/
Could not find the mention of clustering algorithms on that page; could you say more?
I note that clustering algorithms long predate anything most people would call machine learning, but many simple techniques are easy to understand. Clear explanations of old style cluster algorithms (which still can be put to good use of course) can be found in Brian Everitt, Cluster Analysis, Cluster Press, a fortunately rare example of a book which contains not the slightest hint to its date of publication, but I'd guess 1980s (the author is still active in the field, I think).
Take a look at Section 4.3.2 in the following paper:
https://nymity.ch/sybilhunting/pdf/sybilhunting-sec16.pdf
We used single-linkage clustering to sort uptime matrices of Tor relays, making it easy to inspect the resulting visualisations for potential Sybils. Here are some examples:
https://nymity.ch/sybilhunting/uptime-visualisation/
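To give a feel for the idea, here is a minimal sketch of single-linkage clustering used to reorder the rows of a small uptime matrix. It is not the code from the paper: the relay names, the uptime bits, and the choice of Hamming distance between rows are all assumptions made for illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Number of hours in which two relays' online/offline state differs."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage_order(rows):
    """Naive agglomerative single-linkage clustering.

    Repeatedly merges the two clusters whose closest members are nearest,
    and returns the row indices in the order of the final merged cluster.
    """
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > 1:
        best = None
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            # Single linkage: cluster distance is the minimum pairwise distance.
            d = min(hamming(rows[a], rows[b]) for a in ci for b in cj)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


# Hypothetical data: 1 = relay was online in that hour, 0 = offline.
uptime = {
    "relayA": [1, 1, 0, 0, 1, 1, 0, 1],
    "relayB": [0, 1, 1, 1, 0, 1, 1, 0],
    "relayC": [1, 1, 0, 0, 1, 1, 0, 1],  # same pattern as relayA
    "relayD": [1, 0, 1, 0, 0, 0, 1, 1],
    "relayE": [1, 1, 0, 0, 1, 1, 1, 1],  # differs from relayA in one hour
}
names = list(uptime)
order = single_linkage_order([uptime[n] for n in names])

# Print the reordered matrix; similar rows now sit next to each other.
for idx in order:
    row = "".join("#" if up else "." for up in uptime[names[idx]])
    print(f"{names[idx]}  {row}")
```

Relays with near-identical uptime patterns end up in adjacent rows, which is what makes suspicious groups stand out visually once the matrix is drawn as an image.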
Thank you much for the links! It would be nice if I had some way to provide feedback in case I actually think of some useful new way to use clustering methods or another statistical technique (machine learning or otherwise) to help Tor Project protect and improve Tor products and network.
Actually, it was in part something Roger D said that encouraged me to read up on statistics. I was surprised to find that many recent probability theory textbooks do mention the Bayes's Rule paradox he described in terms of firewalls, under various other names. The most common way of explaining it involves dragnet screening for cancer. Because the paradox is so relevant to so many issues in public affairs (and also to anonymity/cybersecurity issues), with the moderators' indulgence I repeat the discussion here:
Imagine a medical screening for cancer which is "quite accurate" in the sense of almost always returning a positive result when a person actually has cancer and almost never returning a positive result when a person does not have cancer.
Let
e = Pr(C) = probability random person has cancer
p = Pr(P|C) = probability a person with cancer will test positive
q = Pr(P|~C) = probability a healthy person will test positive (false positive result)
where naturally 0 < e,p,q < 1.
Put
a = p/q
and note that for any reasonable test a > 1.
Then from the definition of conditional probability
Pr(C|P)*Pr(P) = Pr(PC) = Pr(P|C)*Pr(C)
we obtain Bayes's Rule in the form
Pr(C|P) = Pr(P|C)/Pr(P)*Pr(C)
where Pr(C) is the prior estimate (before the test result comes back) and Pr(C|P) is the posterior estimate, and Pr(P|C)/Pr(P) is the Bayes factor used to update Pr(C) when we receive the test result for a particular person.
This can be rewritten as
Pr(C|P) = p*e/(p*e+q*(1-e))
= a*e/((a-1)*e+1) = T_a(e)
which gives a one parameter family of transformations (a is the parameter, e is the variable being transformed) which is precisely the group of Moebius transformations preserving the real line and fixing the points 0,1 on that line.
The surprise is that even with q small (slightly greater than zero) and p large (slightly less than one), if e is small, Pr(C|P) is also small. (Most people who test positive do not in fact have cancer.)
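A quick numeric check makes the effect concrete. This is only a sketch: the values chosen for e, p, and q are made-up illustrations, not figures for any real test.

```python
def posterior(e, p, q):
    """Pr(C|P) from prevalence e, true-positive rate p, false-positive rate q."""
    return p * e / (p * e + q * (1 - e))


def T(a, e):
    """The same update written as the one-parameter map T_a(e), with a = p/q."""
    return a * e / ((a - 1) * e + 1)


# Illustrative (assumed) numbers: 1 person in 1000 is affected,
# the test catches 99% of true cases and wrongly flags 1% of healthy people.
e, p, q = 0.001, 0.99, 0.01
a = p / q

print(posterior(e, p, q))  # about 0.09: most positives are false positives
print(T(a, e))             # same value, via the T_a form

# Group property: applying T_b and then T_a equals T_(a*b).
# This is what two truly independent tests would give you.
print(T(2, T(3, 0.1)), T(6, 0.1))
```

With these numbers roughly nine out of ten people who test positive do not have the condition, even though the test looks "99% accurate" on paper.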
In the context of COVID-19 testing, this says that a dragnet screening for exposure to the SARS-CoV-2 virus even using an "accurate" test might still yield unreliable results if the actual incidence of exposure is small.
In the context of screening social media posts such as this for "potentially violent extremism", clearly e is very small, so even a "highly accurate" DHS I&A or state/local fusion center social media screening will flag almost entirely persons who are not in fact potentially violent extremists.
In all cases, rare phenomena (such as violent extremism) are essentially impossible to detect by dragnet screening.
(One can contemplate combining the results of many tests, but these will almost never be independent in the sense of probability theory; on the contrary, they tend to be correlated, so the combined results are not likely to be much more reliable than the individual test results, in the case of dragnet screening for a rare phenomenon.)
Abuse of statistics (by ignoring fundamental issues such as the one just explained) matters; to give just the most recent example, the NY Post reports that Drump is demanding that USG withhold federal funds from NYC, DC, Seattle, and Portland, on the grounds that the mayors of these cities have generally not expressed eagerness to shoot BLM protesters or presumed "anarchists" in cold blood (although they have been willing to deploy chemical weapons). Note that political violence is a rare phenomenon; possibly Drump has been warned that modern statistics does not enable any easy way to identify and proactively detain those supposedly "predisposed" to engage in political violence, because he has jumped over that (false) "scientific justification" [sic] for precrime screening by insisting on holding the entire populations of the targeted cities responsible for a rare phenomenon which sometimes occurs within city limits.
The proposal to defund DC is particularly startling because DC does not have statehood or political representation to try to defend itself from disaster. Also, of course, the federal government is located in DC, so it is hard to understand how USG could continue to function if that city had no funding for sewage disposal, bridge repair, road work, etc. Similarly, NYC is the largest city in the nation and the financial capital of the world, so it is hard to see how the global economy would continue to function if that city's infrastructure were destroyed.
It's enough to make one want to board a plane bound for DC, while wearing funereal black and a BLM face mask of course ;-/
Probably.
I do not believe the hype which claims without much evidence that "AI" (really machine learning, really a kind of souped up regression) can weed out spam and detect disinformation by Rubots (and CIA & the family dog), but maybe if TP says you are using AI, some idgit corporate sponsor will be pleased?
(Not a serious suggestion.)
I guess you don't believe in self driving cars either then.
Why would you jump to that strange conclusion? I said I doubted that AI will miraculously solve the problem of "web curation" which includes things like weeding out Rubots---- a generic term for any state-sponsored algorithmically assisted disinformation/disruption campaign, with apologies to Russians generally, but you know, RU does have a reputation ;-)
Semantics, your 'beliefs'. A neural network combined with good software and quantum computing can solve any complex problem, really.
Anyone who is tempted to fall for this unsupported, sweeping (and false) claim would be well advised to read the book by Cathy O'Neil, Weapons of Math Destruction.
Many leading researchers in the area of machine learning have warned against the rush to embrace these methods as a panacea. From time to time publications such as wired.com offer lengthy articles by experts which show why machine learning cannot in fact solve every problem which currently imperils human existence.
Ah yes, you offer random book promotions and news websites as your own 'expertise' for calling things 'false claims' by making your own 'false claims'. You're 'falling' into the half-truth mantra.
Of course many complex problems are very hard to solve, yet not impossible. This is at the forefront of current technology and human knowledge, foresight is obviously vital.
Let's please keep it on topic.
Wired is a rather mainstream site, or in other words, 'corporate' with monetary interests. I would recommend reading https://sciencex.com/news/ instead.
Mozilla removed RSS from Firefox with the release of version 64 in December 2018. https://support.mozilla.org/en-US/kb/feed-reader-replacements-firefox
Yes, to many people's disappointment. The irony is that they suggest insecure, privacy-invading add-ons like Pocket as a replacement.
There was a recent live stream from Tor on this subject too:
https://www.youtube.com/watch?v=aOOChyMCZH4
Just watched this. Wish the OONI site worked better without JavaScript.
Has the wiki on Trac been migrated to GitLab? If so, how on earth can anyone find it? Trac's front page has entry points to its wiki, but GitLab links to torproject.org subdomains. If Trac's wiki hasn't been migrated, I'm at least still able to read it on Trac, but I can't update or edit it.
For the most part, yes. Here's the anti-censorship team's wiki page:
https://gitlab.torproject.org/tpo/anti-censorship/team
Where are the following pages on GitLab?
I posted this already on the thread announcing Tor Browser 9.5.4, but it looks like no one's monitoring that thread, for some reason, so I'll post it here in the hopes it gets out sooner. Tor Browser 9.5.4 was released with an EXPIRED key. How could that happen, given Tor's security focus?
BTW, I know of many blogs on the web that don't pre-moderate. Has the Tor blog been overwhelmed by spam? Seems like you could post-moderate. Promoting free speech means more than saying "go buy your own megaphone." It means, here, my friend, use my megaphone. Are there that many people posting hateful or ridiculous theories here? I'd bet most have comments related to Tor, even if you disagree with them.
The key was updated a while ago. See the note in this blog post:
https://blog.torproject.org/new-release-tor-browser-954
Try refreshing your key ring.
Regarding moderation: it's not an awful lot of spam but it's a fair share of nonsensical and a large share of off-topic comments. I'm a fan of critical comments that hold us accountable – as long as they are polite. In an ideal world, we would already have a web forum so that folks don't have to use the comment section for off-topic discussions.
> The key was updated a while ago. See the note in this blog post:
https://blog.torproject.org/new-release-tor-browser-954
>
> Try refreshing your key ring.
I want to say a bit more for the benefit of GPG newbies.
I think I understand what happened here and I will illustrate with another example from my own experience in the past few days.
I use Tails and every two months or so download the ISO image of the newest edition, currently Tails 4.10. Before burning a new DVD (and updating Tails USBs), naturally I verify the validity of the ISO using the detached signature. I keep copies of the Tails signing key in various encrypted devices/locations. When I use
gpg --verify *.iso.sig *.iso
to verify the ISO image, sometimes I see an "expired key" message. That happens because the "long term offline identity" signing key (used to make the successive detached signatures for the successive ISO images of each new Tails edition as it comes out) has been updated by Tails Project, but I have not yet imported the changes into all my copies of the public half of the signing keypair. When I check the latest version of the signing key to the best of my ability and import it, I no longer see that message, because now GPG knows that the expiration date of the key has been extended by Tails Project.
In more detail: the closely guarded signing key itself (first one listed below) is used to create subkeys which are then used to sign the detached signatures which users use to verify the ISO images. (Defense in depth.) Currently when using Tails 4.10 the command gpg --list-keys should show:
pub rsa4096/0xDBB802B258ACD84F 2015-01-18 [C] [expires: 2022-01-19]
Key fingerprint = A490 D0F4 D311 A415 3E2B B7CA DBB8 02B2 58AC D84F
uid [ unknown] Tails developers (offline long-term identity key)
uid [ unknown] Tails developers
sub rsa4096/0xD21DAD38AF281C0B 2017-08-28 [S] [expires: 2022-01-19]
sub ed25519/0x90B2B4BD7AED235F 2017-08-28 [S] [expires: 2022-01-19]
sub rsa4096/0xA8B0F4E45B1B50E2 2018-08-30 [S] [expires: 2022-01-19]
If anyone else gets a different result *please* speak up!
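For anyone who wants to check several stored copies of a key quickly, here is a rough sketch that scans a listing in the format shown above and reports entries whose printed expiry date has passed. It assumes the human-readable layout quoted in this comment and compares whole dates only; the function name and regular expression are mine. For real scripting, gpg's machine-readable --with-colons output is the more robust thing to parse.

```python
import re
from datetime import date

# Matches lines such as:
#   pub rsa4096/0xDBB802B258ACD84F 2015-01-18 [C] [expires: 2022-01-19]
KEY_LINE = re.compile(
    r"^(pub|sub)\s+(\S+)\s+\d{4}-\d{2}-\d{2}\s+\[\w+\]\s+"
    r"\[expires: (\d{4}-\d{2}-\d{2})\]"
)


def expired_keys(listing, today):
    """Return (kind, key_id) for every pub/sub line whose expiry is before today."""
    found = []
    for line in listing.splitlines():
        match = KEY_LINE.match(line.strip())
        if match is None:
            continue  # uid lines, fingerprints, keys without an expiry, etc.
        kind, key_id, expires = match.groups()
        if date.fromisoformat(expires) < today:
            found.append((kind, key_id))
    return found


# Two lines copied from the listing quoted above.
listing = """\
pub rsa4096/0xDBB802B258ACD84F 2015-01-18 [C] [expires: 2022-01-19]
sub rsa4096/0xD21DAD38AF281C0B 2017-08-28 [S] [expires: 2022-01-19]
"""

print(expired_keys(listing, date(2020, 9, 1)))  # nothing has expired yet
print(expired_keys(listing, date(2022, 2, 1)))  # both entries are past expiry
```

If a copy of the key shows up as expired here while the project has since extended it, that copy simply needs the updated public key imported.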
> In an ideal world, we would already have a web forum so that folks don't have to use the comment section for off-topic discussions.
Thank you for that. It is difficult for users not to suspect that the reason why this issue has gone unaddressed for so many years is that USG entities are threatening TP with being suddenly shut down by simultaneous heavily armed SWAT team raids on all the devs located in the USA. I worry that will happen anyway. Note that both Drump and Biden hate Tor, and contrary to the hopes of many worried by what Drump will do in his second term, Biden's character (and ties to offshore and federal LEAs) does not permit him to "move left". Time is very short to prepare for whatever is coming in 2021. As I said, I worry.
Try refreshing your key ring…
Keep in mind you can download updated keys through the normal Internet (clearnet) or through Tor Browser (Tor network to clearnet service) or from a .onion service or by configuring GPG or curl to proxy through the tor exe that starts on your local machine when you open Tor Browser. On Linux, parcimonie can automate it. Refrain from downloading from SKS keyservers. [1] [2] I don't know if GPG has been patched or what version may be invulnerable.
> configuring GPG or curl to proxy through the tor exe that starts on your local machine when you open Tor Browser. On Linux, parcimonie can automate it.
Not familiar with parcimonie, but Tails automatically routes GPG requests (for key downloads and uploads from a key server) through Tor. Tails is a complete operating system plus computing environment which works out of the box but which is designed not to leak or store personal information. It is not perfect but well tested (the Snowden leaks show NSA could not break Tails as of May 2013).
Thanks for this post; it is encouraging to learn that it is still possible (but still too hard) to use Tor from behind GFC. I'd like to hear more about censorship in Russia, Belarus, Congo, etc.
> Don't worry if you miss a meeting: we post our meeting logs to the tor-project mailing list.
Activists may not wish to increase their personal risk by using email to contact an organization all the bad governments hate with a passion.
> Most of us are also available in the #tor-dev and the #tor-project IRC channels on irc.oftc.net, so feel free to reach out any time.
Unfortunately, some years ago Tails removed, without explanation, the invaluable anonymous chat (Pidgin) which made it possible to *sometimes* enter certain chat rooms and to use OTR to hold a private conversation. Unfortunately both of the chat rooms you mentioned blocked people who have not registered a chat account.
Please ask Tails to consider restoring chat (ideally using something with a less awful UX and better security than Pidgin) and if possible the anon feature.
I also believe it would be useful for at least some Tor devs to adopt Signal and/or post a number where they can be reached for e.g. reports of possible problems using Tor. I realize that this will be received by our enemies as an opportunity to spam you or worse but I hope you can figure out an effective way to lessen the risk from that kind of cheap shot.
> Microsoft Azure
The head of the Azure Big Data team has a private plane and, it seems, a propensity to circle the homes of at least one person helping to organize employee boycotts of M$, Amazon, Google secret surveillance projects. Good example of how activists can turn the tools of Surveillance Capitalism back upon the Surveillance State. Another good example:
Sam Biddle, "Doorbell Cameras Like Ring Give Early Warning of Police Searches, FBI Warned", theintercept.com, 31 Aug 2020. Two leaked documents show how a monitoring tool used by police has been turned against them.
> The rise of the internet-connected home security camera has generally been a boon to police, as owners of these devices can (and frequently do) share footage with cops at the touch of a button. But according to a leaked FBI bulletin, law enforcement has discovered an ironic downside to ubiquitous privatized surveillance: The cameras are alerting residents when police show up to conduct searches.
Not just Ring but dashcam cameras and many other tools can be repurposed for intrusion detection and realtime remote monitoring by trusted confederates, including devices over which we have reason to hope the owner has better control. Encourage everyone reading this to use their imagination! So far it looks like this is one area where we are gaining on the bad guys.
A new and extremely dangerous FBI tactic is to "turn" right wing extremists, including at least one who is described in FBI's own literature as a "terrorist", and then direct their new "asset" to commit felonies on behalf of the feds, apparently including illegal searches of the homes of suspected Antifa supporters. (FBI can easily obtain door keys by threatening people who legitimately have a copy, and apparently is willing to pass these onto known terrorists.) With a bit of ingenuity such illegal and flagrantly immoral operations can be documented. And then... hurrah for Secure Drop!
> Activists may not wish to increase their personal risk by using email to contact an organization all the bad governments hate with a passion.
Neither may developers! cypherpunks on GitLab when??
Cloudflare is demanding JavaScript-enabled CAPTCHAs ubiquitously. What caused them to change from the regular ones that didn't require JS, and what can be done to encourage them to go back?
Security classification is often abused to enable censorship of government foul-ups and misconduct. Because Snowden relied upon GPG, OTR chat, and Tails (which uses Tor), it seems relevant to pass on some good news: an appeals court has just ruled that the NSA phone dragnet revealed in the very first published Snowden leak (which unveiled the Prism dragnet surveillance system) is illegal. This vindicates his decision to leak and one hopes will result in his immediate pardon. Indeed, one hopes that Reality Winner and all other USIC whistleblowers will also be pardoned, along with Julian Assange.