This is a personal tumbleblog, intended for random musings and snippets. I have a somewhat more structured travel and photo blog at disoriented.net, and a neglected vanity site at raingod.com.

Posts Tagged: risks

EFF Gets Straight Privacy Answers From Amazon About New "Silk" Tablet Browser

The Electronic Frontier Foundation has got some answers from Amazon about possible risks associated with the Silk browser technology. The responses are fairly encouraging: it sounds as if Amazon has thought about the issues and tried quite hard to respect user privacy.

There’s lots of excitement today about Amazon’s new Kindle devices, including an inexpensive tablet running Android. Most of that excitement focuses on the price point, but there’s something in the announcements that may be even more significant.

Amazon’s new devices will ship with a new web browser architecture named Silk, which Amazon describes as ‘cloud-accelerated’. What they mean by that is that Silk browsers will use Amazon’s EC2 essentially as a giant caching proxy. The company says that “many website requests will never leave the extended infrastructure of Amazon Web Services”.

The stated goal is to improve the user experience by delivering pages faster. But ‘cloud-acceleration’ also creates an entire class of users - which, given the aggressive pricing of the Kindle devices, is likely to be a large class - who will access the web through a chokepoint controlled by Amazon.

What are the implications? The first, for web developers, is that you need to pay close attention to anything in your configuration that affects caching, such as Cache-Control headers and ETags. Get it wrong, and Amazon will cheerfully serve up stale content to all its Silk users. It’s a rule of thumb when you’re writing a complex web app that each layer of caching can be a generous source of hard-to-understand bugs: Amazon’s new scheme has just dropped a giant layer of cache between you and a large number of users.
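To make the caching point concrete, here is a minimal sketch of a response that states its caching rules explicitly. The function names (`make_etag`, `respond`) and the 60-second lifetime are my own assumptions for illustration, not anything Amazon or any particular framework prescribes; only the `Cache-Control`, `ETag` and `If-None-Match` headers themselves are standard HTTP.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the body, so it changes whenever the content does."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(
    body: bytes,
    if_none_match: Optional[str] = None,
    max_age: int = 60,        # assumed lifetime, in seconds
    private: bool = False,    # True for per-user pages that a shared proxy must not keep
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) with explicit caching instructions."""
    if private:
        # Tell shared caches (such as an intermediary proxy) not to store this at all.
        return 200, {"Cache-Control": "private, no-store"}, body

    etag = make_etag(body)
    headers = {"Cache-Control": f"public, max-age={max_age}", "ETag": etag}
    if if_none_match == etag:
        # The cache already holds the current version: confirm it without resending.
        return 304, headers, b""
    return 200, headers, body
```

A cache that revalidates with the ETag it was given gets a cheap 304 while the page is unchanged, and a full 200 as soon as the content differs.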

We’re assuming, of course, that Amazon will play nice and honor the cache control directives you provide. But Amazon is also theoretically free to introduce some ‘optimizations’ of its own that could inadvertently cripple your site. Suppose the Amazon cache has a rule that says that if it doesn’t get a response from the remote server in 15 seconds, it falls back on the last cached version. Now suppose that your complex dynamic page takes 30 seconds to respond. In the best case, users are going to see a stale version of your page about half the time. Or if Amazon just imposes its own limit on the number of times per minute it goes to your site for new content, similar problems can arise. Amazon is likely to be careful not to do anything that will affect popular sites, but smaller producers may get shorter shrift.
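The timeout scenario is easy to model. The sketch below is purely hypothetical: `StaleFallbackProxy` is a name I made up, and there is no evidence that Silk behaves this way. It simply encodes the imagined rule "if the origin is slower than the timeout, serve the last cached copy", with latencies passed in as numbers rather than measured.

```python
from typing import Optional, Tuple


class StaleFallbackProxy:
    """Toy model of a proxy that falls back to its cache when the origin is slow."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.cached: Optional[str] = None  # last body successfully fetched

    def fetch(self, origin_latency_s: float, origin_body: str) -> Tuple[str, Optional[str]]:
        """Simulate one request; returns (outcome, body served to the user)."""
        if origin_latency_s <= self.timeout_s:
            # Origin answered in time: serve it and remember it.
            self.cached = origin_body
            return "fresh", origin_body
        if self.cached is not None:
            # Origin too slow: the user silently gets the old copy.
            return "stale", self.cached
        return "timeout", None


proxy = StaleFallbackProxy(timeout_s=15)
print(proxy.fetch(5, "version 1"))   # fast response is served and cached
print(proxy.fetch(30, "version 2"))  # slow response: the cached copy is served instead
```

Under this particular rule, a page that always takes 30 seconds would never refresh the cache at all, which is the kind of surprise a well-meaning optimization can produce.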

At this point, there’s no reason to think that Amazon won’t obey established rules for making proxies behave correctly, so these scenarios may never arise. The point is, however, that if Amazon does decide to implement any shortcuts for its own convenience, or for the perceived convenience of its users, it’s going to affect a lot of people.

Let’s get back to that chokepoint. What can Amazon do with it, aside from speeding up your web access? Well, the answer is ‘Pretty much anything’. At a stroke, Amazon has gained access to a giant chunk of information about how people use the web. It can see which pages are loaded, which links are followed, how long users spend on each site and how often they go there. It can ‘discover’ pages that would normally be hidden. It can track associations between different pieces of content - how likely visitors to site A are to also load site B - and use that to develop a web recommendation engine similar to the one already used on its shopping site. In essence, Amazon has just joined the big boys - Google, Microsoft, Facebook - as a member of an exclusive club of companies that have a panoptic overview of how the web is used. If you think that this isn’t part of the gameplan and that Amazon hasn’t made plans to leverage all the data it can now access, you’re dreaming.
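The "visitors to site A also load site B" calculation takes very little machinery once you can see the traffic. Here is a rough sketch using nothing but counting; the session data, the function names and the `.example` hostnames are all invented for illustration, and a real recommendation engine would be far more elaborate.

```python
from collections import Counter
from itertools import combinations


def co_visits(sessions):
    """Count how many sessions visited each site, and each pair of sites."""
    site_counts = Counter()
    pair_counts = Counter()
    for sites in sessions:
        unique = sorted(set(sites))  # ignore repeat visits within one session
        site_counts.update(unique)
        pair_counts.update(combinations(unique, 2))  # pairs come out in sorted order
    return site_counts, pair_counts


def likelihood(a, b, site_counts, pair_counts):
    """Fraction of sessions that visited site a and also visited site b."""
    if site_counts[a] == 0:
        return 0.0
    pair = tuple(sorted((a, b)))
    return pair_counts[pair] / site_counts[a]


# Invented browsing sessions, one list of hostnames per user session.
sessions = [
    ["a.example", "b.example"],
    ["a.example", "c.example"],
    ["a.example", "b.example", "c.example"],
    ["b.example"],
]
site_counts, pair_counts = co_visits(sessions)
print(likelihood("a.example", "b.example", site_counts, pair_counts))
```

The point is not the arithmetic but who is in a position to run it: only a party that sees every request can build these counts.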

It needn’t be limited to aggregate information, either. Kindle devices are set up to facilitate buying content from Amazon - like Apple’s iTunes, they’re as much about giving you easy access to the company’s store as they are about giving you access to the media you own. So the device knows who you are, which means that Amazon could make the connection between your Amazon profile and what you do on the web. Will Amazon make use of that information? In the short term, the most likely impact will be on your recommendations in Amazon’s webstore. But the temptation for Amazon to store everything for future use is going to be very strong.

The chokepoint represented by Silk’s ‘cloud-acceleration’ also opens the door to some real nightmare scenarios. One is that any collection of information about your browsing activity that’s held by a third party is potentially open to attack. If Amazon does collect and store this data, they should regard it as being at least as confidential as payment information, and secure it accordingly. But corners get cut and perfect security looks increasingly like a pipe dream. Sooner or later, some of that information is likely to leak. There’s also a risk of other information leakage. If Amazon does apply collaborative filtering to your web activity and use it to drive a recommendation engine, it’s not going to be long before they end up recommending pages that shouldn’t be public or URLs that insecurely encode user credentials (and yes, webmasters shouldn’t be putting credentials in URLs, and anything that you want to stay private should be behind access control - but, as I said, corners get cut).
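For webmasters wondering whether their own URLs fall into that last category, a crude check is easy to write. This is only a sketch: the list of suspicious parameter names is my own guess rather than any standard, and a clean result proves nothing about URLs that encode secrets in less obvious ways.

```python
from urllib.parse import parse_qsl, urlsplit

# Assumed, non-exhaustive list of query parameter names that often carry secrets.
SUSPECT_PARAMS = {"password", "passwd", "token", "session", "sessionid", "apikey", "api_key", "auth"}


def leaks_credentials(url: str) -> bool:
    """Return True if the URL visibly carries credentials that a proxy or log could record."""
    parts = urlsplit(url)
    # Credentials embedded as user:password@host.
    if parts.username or parts.password:
        return True
    # Credentials passed as query parameters.
    query = parse_qsl(parts.query, keep_blank_values=True)
    return any(name.lower() in SUSPECT_PARAMS for name, _ in query)


print(leaks_credentials("https://example.com/report?token=abc123"))
print(leaks_credentials("https://example.com/kittens?page=2"))
```

Anything this flags is something an intermediary that records URLs would see in full.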

If Amazon wanted to be truly evil, there’s a lot more that they could do. Actual censorship and on-the-fly replacement or insertion of content or advertising are just two of the possibilities. Similarly, if Amazon’s security fails to hold up and someone gets access to their servers or executes a successful DNS spoof, third parties could make use of that single point of access to perform some of the same tricks.

For Amazon, ‘cloud-acceleration’ through EC2 offers some major new business opportunities. For users of their devices, it offers a small amount of convenience and some potentially significant new risks.

Going back over my post about Google Social Search, which I wrote in haste last night after the new feature was pointed out to me by a somewhat agitated friend, it looks to me as if I may have been wrong about some of the pitfalls of the new system.

The potential privacy killer is the exposure of private second-order contacts. But re-reading Google’s documentation more closely today, it turns out that Google already has a notion of ‘public’ and ‘private’ contacts. ‘Private’ contacts include your Google chat list and Google contacts, and according to the documentation, these are not shared, and will not be used to “expand your social circle”. So it looks as if the sky may not be falling after all.

I apologize for misleading you all, and for maligning Google. It seems that they have learned something since Buzz.

But systems such as Social Search are not risk-free. Google’s position is that they don’t make anything public that wasn’t already public. That’s as it should be, but it’s worth bearing in mind that what Google is doing is to make obvious what’s already public. Yes, all the individual links that make up your implicit social graph may be ‘out there’, but most people won’t necessarily connect all the dots. Tools like Social Search take the complete picture and dump it in your lap.

It’s easy enough to dream up scenarios in which that can still turn around and bite you. Your strait-laced Aunt Hettie may enjoy visiting your personal website full of kitten pictures, unaware that you’re also an active member of a flourishing bondage’n’spanking online community. The day that you inadvertently create a graph link that spans your separate personae, Google Social Search is going to make all the connections and give Aunt Hettie something to think about over her breakfast coffee.
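The Aunt Hettie problem is just graph reachability. In the sketch below, the node names and the two helper functions are invented for illustration and say nothing about how Google actually builds its graph; the point is only that one new edge is enough to join two previously separate clusters.

```python
from collections import deque


def add_link(graph, a, b):
    """Record an undirected, publicly visible link between two nodes."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def reachable(graph, start):
    """Breadth-first search: everything that can be discovered starting from one node."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


graph = {}
add_link(graph, "aunt_hettie", "kitten_site")
add_link(graph, "kitten_site", "you_public")
add_link(graph, "you_pseudonym", "community_profile")
print("community_profile" in reachable(graph, "aunt_hettie"))  # personae still separate

add_link(graph, "you_public", "you_pseudonym")  # the one inadvertent link
print("community_profile" in reachable(graph, "aunt_hettie"))  # now connected
```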

You did it to yourself, says Google. All the information was there. We just put it all together. They’re right, but that doesn’t mean that it isn’t a problem. In general, people aren’t good at thinking about what you might call the calculus of privacy: what connects to what, who has permission to see what, and how they interact. Part of it is that we just don’t think that way yet. But part of it is that the rules keep changing. Just when you think you’ve got it figured out, Google (or whoever) will add a new way of inferring connections and suddenly the whole shape of the graph has changed in ways you never imagined.

There’s another problem. Tools for managing this new ball of wax are either non-existent or ill-adapted. Google says proudly “You control who is part of your circle”, and goes on to list ways that you can do that. But the suggestions seem to amount to changing the social graph itself by removing a person (or a network). If you detect a potential exposure, the recommended fix is to take a machete to your social network.

This seems unsatisfactory. Tools designed for one purpose - such as managing your social network - are usually inadequate for another - such as protecting your privacy or controlling your online persona. If your connection to your friend Joe reveals something about you that you don’t like, Google’s answer is that you should break that connection. But when you do that, you lose whatever functionality comes from the connection.

Let’s make that more concrete with an example (not a privacy example this time, but analogous problems exist in that space as well). Suppose Joe tends to write embarrassing drunken rants on every subject under the sun. Each time you do a search, Google’s Social Search feature brings up a couple of Joe’s inebriated screeds, which may not be what you want even when the boss isn’t looking over your shoulder. But Joe’s in your social graph, and the only way to get him out of there is to remove him from your chat contact list and your Gmail address book. To manage one feature - Social Search - you’re forced to reduce the utility of two others - chat and email. Surely that’s not the way it’s supposed to be.

Connections in the social graph are overloaded. Applications built on social networking such as Google Social Search assign a ‘meaning’ to those connections that may be quite different from the ‘meaning’ intended by the user. The connections that the user creates end up being used in ways that he or she did not anticipate or intend, yet there are no tools available to let the user correct or control the way that the graph is used or interpreted. The only tools provided are tools for editing the graph itself.
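To show what a less blunt tool might look like, here is a purely hypothetical sketch in which each connection records which features are allowed to use it. Nothing like this exists in Google's products as far as I know; the `Contact` class, the feature names and `contacts_for` are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    """A connection plus the set of features permitted to use it (hypothetical model)."""
    name: str
    allowed_uses: set = field(default_factory=lambda: {"chat", "email", "search"})


def contacts_for(feature, contacts):
    """Names of the contacts a given feature is allowed to draw on."""
    return [c.name for c in contacts if feature in c.allowed_uses]


joe = Contact("Joe")
ann = Contact("Ann")
joe.allowed_uses.discard("search")  # keep Joe for chat and email, drop him from search results

print(contacts_for("search", [joe, ann]))
print(contacts_for("chat", [joe, ann]))
```

The connection to Joe stays intact; only one interpretation of it is switched off, which is exactly the control that editing the graph itself cannot give you.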

It’s unrealistic to think that we can stop Google or Facebook or anyone else from adding new whizzbang features that stitch together what people reveal about themselves online and use it in ways that we never anticipated. It’s also unrealistic to think that we can ever predict the ramifications of putting any single piece of information out there (or, equally often, having it put out there by someone else). But there ought to be a middle ground between withdrawing from online life entirely and accepting that our online persona - the sum total of information that can be learned about us online - is completely out of our control.

If someone like Google wants to think about how to build tools to give users real, flexible control over their personal information, that will impress me a great deal more than their questionably-useful Social Search.