Do Not Track (will be its own undoing)

There’s something called “Do Not Track,” or DNT for short. Your browser can send this message to web servers, and it indicates that you don’t want to be “tracked” — whatever that might mean.

This is all fine as long as everyone is free to completely ignore it and continue on with business as usual. However, the FTC has issued some guidelines for Do Not Track, and laws are brewing that threaten to make these guidelines into legal requirements.

The FTC proposal defines the intended “Do Not Track” policy effect in just a few simple words:

Do Not Track should prohibit all data collection, retention, and use.
–Page 5

That is, of course, totally impossible. Data collection, retention, and use are precisely what it means to run a business online. So an exception is allowed, specifically: “Exceptions are warranted when narrowly tailored to legitimate commercial interests that substantially outweigh privacy and enforcement interests.”

Ideally, as far as the spirit of the movement is concerned, “tracking” is what they call it when a business remembers something about you, and uses that memory in some future business interaction. So “don’t track me” would essentially mean, “Forget I was ever here.”

Here’s where it starts to get depressing

So, in the very best of scenarios, a DNT visitor would merely be a low-value user, one for whom the company’s preferred practices are not allowed. Companies would establish a tracking-free data flow for these second-class users, one which allows just enough tracking to prevent fraud, locate errors, and optimize server workloads, but not quite enough tracking to optimize recommendations, improve search accuracy, or increase content relevance. Inbound data would have to be segregated into “normal” data and “DNT-encumbered” data. Use and retention of DNT-encumbered data would be allowed if, and only if, “tailored to legitimate commercial interests that substantially outweigh privacy and enforcement interests.”

Establishing such an alternate data flow would be daunting and expensive, to say the least. Normally, big changes like this are tied to some business justification: the cost must be offset by a theoretical increase in profit. But in this case, the very purpose of DNT is to prevent the company from monetizing the visit. In other words, companies would be asked to spend time and money to cater to those few visitors who explicitly want to avoid generating revenue. Companies would have to spend money to make less.

This is a hard sell. The incentives simply don’t line up. By participating, companies stand only to lose. Indeed, the FTC admits that companies would not adopt such a policy without being forced to do so:

Given the diversity of online business models and businesses Do Not Track would affect, and given the consensus-based nature of the relevant trade associations, we believe voluntary comprehensive adoption will not occur.
–Page 13

And if voluntary participation is a bitter prospect, legislation doesn’t sweeten the deal at all. It just makes the bad parts worse. In the face of legislation, DNT-encumbered data isn’t just low-value; it’s now also potentially dangerous. When handling DNT-encumbered data, companies would have to worry about legal repercussions if by some mistake that data got mixed into the “normal” data stream.

Opting Out

If DNT never becomes law, then this argument is moot. DNT is dead before it begins.

But if DNT does become law, then laws will reflect the FTC’s timelines, not the readiness of online businesses. Furthermore, even companies that have DNT policies will be vulnerable to legal action if their policy is not “sufficiently compliant,” or if slip-ups occur. So the most sensible course of action for any business or site is simply to “opt out” of the DNT concept entirely. This is surprisingly easy to do.

Companies can apply a tiny change to their site that requires users to turn off the DNT header before continuing. This takes no more than 10 minutes to set up, and it guarantees compliance by avoiding the problem entirely. The solution looks like this:

RewriteEngine On
RewriteRule ^dnt\.html$ - [L]
RewriteCond %{HTTP:DNT} =1
RewriteRule .* /dnt.html [L]

That’s literally all there is to it. As for the DNT error page, you can expect the text to read something like this:

Error: Your browser is sending a Do Not Track header
While Foo Industries does not track users and will never sell your personal information, due to unfortunate complications with the laws surrounding “Do Not Track”, we can’t display this page to users who have this header set. In order to access this site, please follow the directions below to turn off this setting….

And done. Very few users will actually see this message, and they will only see it once. Problem solved, and business continues as usual. Within a year, DNT usage drops to zero.
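For sites not running Apache, the same gate translates directly into application code. Here is a hypothetical sketch (the function and names are mine, not from any real framework), mirroring the logic of the rewrite rules above:

```python
# A hypothetical sketch of the DNT gate outside Apache: if the request
# carries "DNT: 1", serve the explanation page instead of the requested
# content. All names here are illustrative.

DNT_PAGE = "Error: Your browser is sending a Do Not Track header"

def serve(path, headers):
    """Return the response body for a request with the given headers."""
    # Always let the explanation page itself through, mirroring the
    # pass-through RewriteRule for dnt.html.
    if path == "/dnt.html":
        return DNT_PAGE
    # Everyone else with the DNT header set gets the explanation page.
    if headers.get("DNT") == "1":
        return DNT_PAGE
    return "normal content for " + path

print(serve("/index.html", {"DNT": "1"}))  # the DNT explanation page
print(serve("/index.html", {}))            # normal content
```

The check is a single header comparison, which is why the whole scheme takes minutes rather than days to deploy.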

This might sound odd: rejecting visitors over a simple configuration preference. Presumably sites would want to increase traffic as much as possible. But in this special case, the visitors they exclude not only refuse to participate in monetization, but may also bring legal action against the site owner if the special treatment is not satisfactory. In other words, these are exceptionally expensive visitors to serve with little return, giving site owners little incentive to serve them at all. Like a EULA, DNT-blocking protects the company against unnecessary lawsuits at very low cost.

In other words, Do Not Track is doomed no matter how the enforcement shakes out.

Is Google Spying On Us All?

Back in 2012, someone posted a question to the IT Security StackExchange site under the title: Is Google Spying On Us All? It contained exactly the sort of uninformed techno-panic that you’d expect from a question with that title. I normally just ignore this type of bait, but I had some time to kill and something to say.

The response below is based on my original answer to that question.

What Sort of Spying?

Advertisers use what information they have to guess what ads you will want to see. In Google’s case, your search history is the best indicator they have, but ad clicks and ad impressions are also considered. In Amazon’s case your purchase and product browsing history is their best indicator, and you’ll probably notice that their suggestions closely mirror your recent history.

My own search and browsing habits tend to favor highly technical content; servers, programming, malware, etc. The ads I see when browsing under that profile therefore tend to also favor technical content: colocation, hosting, software, etc. This is totally Fine By Me™.

When I watch TV, I have to endure a depressing amount of ads about feminine incontinence, retirement homes, and herpes medication. But on the Internet, the ads are all software and servers. Do I think that’s creepy? Hell no. The fewer herpes ads the better, in my opinion.

Control Your Privacy

To be clear: I’m a strong proponent of online privacy. However, I manage my online privacy by controlling the information I make available online. I don’t expect others to maintain my privacy for me; the concept doesn’t even make sense. If you don’t want them to know something, then don’t tell them.

Telling someone your secrets and then demanding that they forget is a recipe for disaster on numerous fronts. From a security standpoint even the idea is absolutely absurd. Privacy is something you create, not something you demand.

If I don’t want a search associated with me, I use a private browsing session. Sure, I could use a service that promises to not remember what I tell them, but I would be an idiot if I were to depend on that promise. Remember Hushmail? Still, I actually prefer to use a service that allows me to craft my own online preference profile so that they can filter out all the crap I clearly don’t want.

Is What Google Does Legal?

So far, yes. I would hope that it remains so, since the unintended consequences of adding related legislation would be so far-reaching and unpredictable that they would devastate completely innocent Internet users and site operators. Internet regulation reliably makes things worse. So far we have yet to see a counter-example.

Does Google’s Policy Bother Me?

Of course not. If I buy an apple from a market, is it creepy for the vendor to ask me the next day whether I liked my apple? Do I think he’s spying on me? If I tell him I liked it, is it creepy for him to suggest that I buy more apples at a subsequent visit? No, of course not. It’s just good customer service.

If he tells the fruit vendor next door that he thinks I like apples, should that be illegal? Of course not: It’s his information to give, just like any conclusions I make about him are my information to share as I see fit.

Vendors online remember what we tell them just like vendors at your local market. My fruit vendor may remember that I visited his store even though I didn’t buy anything, and yet I don’t assume that he’s spying on me. I’m visiting him, not the other way around. Likewise, when I visit Google, I don’t think it’s spying for them to remember what I ask them.

Private By Association

The biggest problem with online privacy is the implicit and unstated belief that because I connect to the Internet from the privacy of my own home, anything I do on the Internet also happens in the privacy of my own home. This is lunacy. Everything you do on the Internet is absolutely public unless you can verifiably prove otherwise (which you can’t, by the way).

I’m sure your mother once told you never to put in writing anything you wouldn’t want to see on the front page of the newspaper. It’s old advice that is just as relevant today as ever, and it most certainly applies to email, text messages, Twitter, Facebook, and anywhere else you can state your opinion.

But the same principle applies to your behavior. Everything you do on the Internet is communicated to parties unknown, parties with whom you have absolutely no logical reason to trust your secrets. Even in the privacy of your own home, online activity is public: all of it, always — unless you can prove otherwise.

Privacy must start and end with you. That’s why it’s called privacy.

Yes, you do have privacy. Privacy is not dead, nor is it in danger. But you have to make it yourself, as you always have. By exercising discretion, by watching what you say and what you do, you create your own privacy. If you expect others to do it for you then the extent of your privacy is limited only to the details that no one else finds interesting.

RSA Made Simple

In response to a question, I wrote up a beginner-friendly implementation of RSA cryptography using Python. You specify your two prime numbers at the top, and then it derives the public and private key from that, and allows you to test out encryption/decryption using the keys you just generated.

Obviously the numbers this code deals in are way too small for it to be even remotely secure, but the concepts are all there and explained reasonably clearly with copious comments. This should be enough to help a beginner understand the basics of this algorithm.

Here’s the code:
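What follows is a minimal sketch of the algorithm described above: pick two primes, derive the keypair, and round-trip a message. This is my own toy reconstruction, not the original listing, and the numbers are hopelessly insecure:

```python
# Toy RSA demonstration. A sketch only: real keys use primes hundreds
# of digits long, and real implementations add padding and much more.

# 1. Pick two primes.
p = 61
q = 53

# 2. The modulus, shared by both the public and private key.
n = p * q                      # 3233

# 3. Euler's totient of n (for distinct primes p and q).
phi = (p - 1) * (q - 1)        # 3120

# 4. Public exponent e: any value coprime to phi. 17 works here.
e = 17

# 5. Private exponent d: the modular inverse of e mod phi, i.e.
#    (d * e) % phi == 1. Found by brute force for clarity.
d = next(x for x in range(2, phi) if (x * e) % phi == 1)   # 2753

def encrypt(m):
    """Encrypt integer m (must be < n) with the public key (e, n)."""
    return pow(m, e, n)

def decrypt(c):
    """Decrypt ciphertext c with the private key (d, n)."""
    return pow(c, d, n)

message = 65
cipher = encrypt(message)
plain = decrypt(cipher)
print(message, cipher, plain)   # 65 2790 65
```

The entire trick is in step 5: anyone can compute d from phi, but computing phi requires knowing the factors of n, and factoring large n is what keeps the private key private.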

Patent Nonsense

There’s been a bit of recent buzz about the patent system, specifically as it relates to software. The fight against software patents has been running hot for the better part of two decades, but it was only brought to the attention of the general public this past year, with the acceleration of patent lawsuits, patent-related corporate buyouts, and the associated news coverage.

I would argue that the deepest problem with “software patents”, if such a category actually exists, has nothing to do with the patentability of software as such, but rather is a reflection of the fact that the quality of patents being issued in relation to software is inexplicably terrible. It is as if the patent office simply forgot how to say “no” when faced with a bad patent idea.

We’ll start with some examples. The examples are important because they demonstrate something that is simply not allowed for patents, and yet is more common than not.

Example: 1-Click

Let’s start with the venerable 1-click purchasing patent from Amazon. This is patent number 5960411 for those of you following along at home. For illustration, we’ll first describe a purchase scenario that is (as far as I know) not patented, and presumably not considered patentable.

  1. The user logs in
  2. The user clicks a “buy” button
  3. The user is asked to confirm his purchasing decision
  4. The user is told that the purchase was successful

Now, let’s examine Amazon’s innovative invention on this front. Here’s their patented method:

  1. The user logs in
  2. The user clicks a “buy” button
  3. The user is told that the purchase was successful

In case it isn’t already depressingly clear, the only difference is the removal of step 3. Amazon actually patented not asking for purchase confirmation. Yes, that really is all there is to it — read the patent if you don’t believe me; all the other details simply deal with the mechanics of setting up a purchase transaction, details that are shared by all shopping-cart systems, even those that existed long before Amazon started business. In other words, every step of the process is part of existing prior art. The only difference is the removal of the confirmation step — which is traditionally included as a courtesy rather than to meet any technical requirement. This is very much like patenting not saying “thank you” at the supermarket in the interest of saving time.

And lest you think that this patent simply “slipped through” the system, this patent has been re-examined twice by the patent office, and has been determined to be valid.

Example: Automatic Linking

This next patent is owned by Apple, and is a cornerstone of their suit against Android manufacturers. This is patent number 5946647. This patent covers automatically generating links in text based on the content. So, for example, if text looks like a phone number, then Apple has patented automatically highlighting the phone number and performing some related activity if you click on it. Again, let that sink in for a moment: Apple hasn’t patented a way of highlighting the text or even a mechanism for determining which text to highlight. They just patented doing it, however you might get the job done.
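To see how ordinary the covered behavior is, consider a toy sketch of phone-number auto-linking. The code is entirely my own illustration (the pattern and tag format are arbitrary choices), and it is arguably exactly the sort of thing such a broad claim reads on:

```python
# A toy illustration (mine, not Apple's) of the behavior described in
# the patent: detect phone-number-like text and wrap it in a link.
import re

# A deliberately naive pattern for US-style numbers like 555-123-4567.
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

def autolink(text):
    """Wrap anything that looks like a phone number in an <a> tag."""
    return PHONE.sub(
        lambda m: '<a href="tel:{0}">{0}</a>'.format(m.group(0)),
        text,
    )

print(autolink("Call 555-123-4567 today"))
# Call <a href="tel:555-123-4567">555-123-4567</a> today
```

A dozen lines of regex-and-replace is the whole idea, yet the patent claims every mechanism that produces this effect, not this (or any) particular one.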

The patent goes into painstaking detail in describing the computer upon which the process will run, a description which encompasses every major computer made since 1960, with such banality as “…a computer having a memory storing actions, a system for causing the computer to perform an action on a structure identified in computer data….” etc. The whole patent is, in fact, this unremarkable. Like the Amazon patent, every element of every claim represents some common component that is already in public use. It describes components that already exist, performing actions that already are performed, using patterns that are already in use. This therefore raises the question: “Just what, exactly, did Apple invent here?”

The patent doesn’t cover any new device or process or method or mechanism or algorithm. Instead, it covers the use of existing technology for the purpose of applying patterns to data. There is no new insight and no new invention. The patent describes a general system of matching patterns to data — not (and this is critical) some specific system of matching patterns, but rather any system that is used for this purpose. It’s not an explanation of an invention; rather, it’s a description of how some future invention might be used.

The primary problem with this patent is the same as with the others, though here it may be more obvious. Specifically, the patent effectively covers an entire idea — this patent covers the very concept of automatically making text clickable, rather than some specific new device for accomplishing that goal. This should make you cringe a bit, because patenting ideas is explicitly disallowed in our system. You simply cannot patent ideas. And yet, in this case, as in the Amazon case, that is effectively what has happened.

Intellectual Flag-Planting

The central problem with these recent poor-quality patents (of which software-related patents comprise the bulk) is the fact that they don’t describe any invention. This statement should sound contradictory–or heretical at the very least–since patents, by definition, are supposed to describe an invention. In fact, nearly every patent uses the word “invention” to describe the application of its claims, as if naming it such would make it so.

Instead, these offending patents effectively cover any and every method that could be used to achieve a specific goal. Rather than patenting the mechanism, they effectively patent the purpose. Amazon 1-click doesn’t describe a specific way of processing purchases with a single click; instead it covers every mechanism of processing purchases with 1 click. Likewise, instead of covering a specific algorithm for doing context-driven text interaction, Apple’s patent covers the very concept of such interactions, encompassing every implementation by every vendor, no matter how they did it. In the face of these patents you cannot come up with an alternative mechanism for achieving the same goal, because no matter what your invention, it will still be patented. It’s not the mechanism that is patented — instead it’s the result.

Effectively, with these patents Apple, Amazon, and other similar companies are planting a flag in the territory of ideas and claiming the entire land for themselves rather than building upon it and claiming the structure. Without actually inventing anything, these companies lay claim to all inventions to come that solve a given problem. This would be like Thomas Edison patenting the very concept of a long-lasting lightbulb rather than patenting his specific design.

Following the previous analogy, and in the interest of clarity, I’ll call these non-invention patents “flag-planting” patents.

It’s worth pointing out that not all software-related patents are flag-planting. Some do, in fact, cover real inventions.

Example of an Invention-Specific Software Patent

As a contrast to the above, let me point out patent number 4405829. This patent covers RSA public-key cryptography. Notably, it covers the specific mechanism rather than the resulting effect. That is, the patent is specific enough to cover only the RSA algorithm itself, rather than public-key cryptography as a whole. At the time, no other public-key encryption algorithms existed. But by using the background knowledge gleaned from examining this patented algorithm, researchers were able to create the ElGamal and DSA algorithms, which use much of the same technology but could serve as royalty-free replacements for RSA for certain purposes.

This is how it should work — Rivest, Shamir and Adleman came across asymmetric encryption, but instead of laying a patent claim on the entire landscape, they built something new and patented their creation. This allowed others to build right next door using similar materials, but different inventions. Since others were free to solve the same problem without infringing on the patent, they were driven to invent alternate solutions.

Effect on Innovation

An important distinction between flag-planting patents and invention-specific patents is their effect on innovation. Flag-planting patents do not directly reflect any innovation, and more importantly, they lay claim to all future innovation for a given use. For example, if I come up with a novel way of doing 1-click purchasing, my invention will belong to Amazon. If I come up with a new and innovative mechanism for using patterns to make otherwise plain text interactive, then my invention will immediately become the property of Apple, as they have a patent that covers the entire landscape.

Conversely, the RSA patent actually encouraged innovation and invention in order to come up with alternate, unpatented ways of solving the same problems. Likewise, the LZW patent (not mentioned above) spawned the creation of the PNG image format as a patent-free alternative.

In the case of flag-planting patents, you’re left wondering what exactly was invented, because the patent doesn’t describe anything innovative or which might require any sort of time investment to come up with.

Unfortunately, flag-planting patents, which are disproportionately powerful in an anti-competitive sense, are becoming dramatically more common within the realm of software patent claims. The reason why is anyone’s guess, but it may have to do with the industry’s dependence on a set of complex though well-defined components, to the exclusion of all theoretical alternatives. This scenario allows a patent to claim the entire background infrastructure, and thereby describe the whole landscape of solutions as though it were a singular machine, without explicitly enumerating any unique mechanism used to solve the specific problem at hand. This way, the patent covers any real-world machine that solves the specific problem.

This isn’t a disaster waiting to happen; it’s a disaster that has already happened.