Wednesday, March 11, 2009

Deep packet inspection is not the same as snooping

"Speaking at a House of Lords event to mark the 20th anniversary of the invention of the World Wide Web, Berners-Lee said that deep packet inspection (DPI) was the electronic equivalent of opening people's mail."

No it's not. It's the equivalent of weighing mail in order to figure out how to best deliver packages. Small letters take one path through the postal system, large boxes take another. So-called "postal neutrality" laws would force the post office to route both letters and boxes the same, making the postal system less efficient.

Such "postal neutrality" laws would tilt the market in favor of delivery monopoly Federal Express. This is why the monopoly is pushing for such laws. In much the same way, monopolies like Google, eBay, and Amazon are pushing for net neutrality laws.

I'm joking about "postal neutrality", of course, but I'm not joking about net neutrality. People really do believe in regulating the Internet to help monopolies entrench themselves. People really do believe that "Vint Cerf" is some sort of wise-man saying what's good for the Internet, rather than simply a corporate shill for a monopoly (Vint Cerf is Google's most important lobbyist).

The great thing about our society is that you can encrypt your traffic if you don't want somebody to read it, and you can anonymize it through Tor for even more protection. It seems to me a better bet to ensure that these freedoms are preserved, rather than fighting for a world where governments and Google can read our e-mail, but the ISPs cannot.


On an unrelated note, I'm also amused by this article that explains Deep Packet Inspection. When discussing DPI, the article claims "until now, this wasn't possible with IDS/IPS or stateful firewalls. The different is that DPI has the ability to inspect traffic at layers 2 through 7".

This isn't true. I wrote the first IPS (BlackICE Guard, now IBM Proventia). It's full layer 7, at multi-gigabit speeds. For example, one of its signatures blocks e-mails with ZIP attachments where the ZIP file contains a filename that has more than 4 space characters followed by a ".exe" extension. (Viruses put lots of spaces in front of the .exe extension to prevent you from seeing it.) Proventia has to reassemble the TCP stream, parse layer 7 protocols like SMTP, and then parse RFC822 e-mail headers, MIME, BASE64 encoding, and finally the ZIP file format.

And, you know this is true because when the event fires, the full filename appears along with the event. This would be impossible without full 7 layer inspection.
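
To make the layering concrete, here is a rough Python sketch of the same check done offline on a complete message. This is not how Proventia is implemented (it works on reassembled TCP streams at line rate, and the code below skips TCP and SMTP entirely); the function name and the regular expression are my own assumptions for illustration.

```python
import email
import io
import re
import zipfile
from typing import List

# Assumed reading of the signature: more than 4 spaces right before ".exe".
PADDED_EXE = re.compile(r" {5,}\.exe$", re.IGNORECASE)


def suspicious_zip_members(raw_message: bytes) -> List[str]:
    """Return ZIP member names that hide an .exe extension behind spaces."""
    hits: List[str] = []
    msg = email.message_from_bytes(raw_message)      # RFC822 headers + MIME
    for part in msg.walk():
        if part.is_multipart():
            continue
        payload = part.get_payload(decode=True)      # undoes BASE64 encoding
        if not payload or not payload.startswith(b"PK"):
            continue                                 # not a ZIP archive
        try:
            with zipfile.ZipFile(io.BytesIO(payload)) as archive:
                names = archive.namelist()           # ZIP directory entries
        except zipfile.BadZipFile:
            continue
        hits.extend(name for name in names if PADDED_EXE.search(name))
    return hits
```

The returned list holds the full offending filenames, which is the same kind of decoded layer 7 detail that shows up in the event.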

The Proventia IPS does deeper layer 7 inspection than any of the DPI discussed in the "net neutrality" debate. It has done so since 1999. That's one of its selling features: it includes the 7 layer decoded information as part of its events (which no other IPS does).

The so-called "deep" packet inspection everyone is talking about is actually pretty shallow. While inspecting HTTP headers is certainly deeper than inspecting TCP headers, the ISPs still aren't capturing and indexing everyone's traffic -- at least, not any more than Google Analytics does already.

4 comments:

Anonymous said...

I find the posts about deep packet inspection to be a bit amusing. It was 8 years ago that I was working with then Nortel/Alteon systems that performed packet inspection for information in a sliding window over multiple packets for a single conversation. Doing this in a hardware-accelerated platform had great benefits: for one, speed, and for another, decisions based on signatures DEEP in a conversation (not just the packet).
The art of true deep packet inspection seems to have been lost by many who have reverted to basic header inspection. Are we doomed?

Anonymous said...

I would hope everyone is aware that it is possible to do session reassembly at or near end-points to do analysis and policy enforcement. I work at a company that does this on outbound traffic and we like to think of ourselves as doing deep-session inspection.

But to the general point and leaving TBL aside, I think most people in the net-neutrality crowd are concerned about DPI on common-carrier networks such as ISPs. If GE decides to do DPI on its internal network, that's GE's business. When Comcast does it to paying customers, those customers might want to know what the plan is.

That's the distinction. It's more a jurisdiction thing.

In any case, I'm quite bemused by all this. RFC 2474 is over a decade old now and every router and switch vendor has invested time getting this right. Now that we can identify the flows better, we can rate-limit, shape and police all we like and achieve better aggregate outcomes in the process.
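
For anyone who hasn't looked at RFC 2474, the marking it defines is tiny: the upper six bits of the old IPv4 TOS byte carry the DSCP, and codepoints of the form 'xxx000' are the Class Selector codepoints. A minimal Python sketch, with function names that are only illustrative:

```python
from typing import Optional, Tuple


def split_ds_field(tos_byte: int) -> Tuple[int, int]:
    """Split the 8-bit DS field into (DSCP, low two bits)."""
    if not 0 <= tos_byte <= 0xFF:
        raise ValueError("DS field is a single byte")
    dscp = tos_byte >> 2          # upper six bits: the codepoint
    low_bits = tos_byte & 0x03    # lower two bits: unused in RFC 2474
    return dscp, low_bits


def class_selector(dscp: int) -> Optional[int]:
    """Return n for a Class Selector codepoint 'xxx000', else None."""
    if dscp & 0b111:
        return None
    return dscp >> 3
```

A router or switch that rate-limits, shapes or polices per class is keying off nothing more than this value once the flow has been classified and marked.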

Anonymous said...

"It's the equivalent of weighing mail in order to figure out how to best deliver packages."

No, it's not. That would be like looking at the length of the email.

Deep inspection really IS like opening mail. If you make decisions about whether the mail is "business vs non-business" or "real-time voice vs bit-torrent" based on the contents, then something had to open the mail and examine (ie. read) it.

Your comment is that what we do now is not very complex, but it's a slippery slope. When deep inspection is commonly accepted, ISPs can do anything that they like with the data that they carry, including targeted advertising, etc.

Residential ISPs are already adding rate limiting for high-usage customers. That is a trend that will likely continue regardless of DPI.

Anonymous said...

People against net neutrality seem to be blaming companies like Google for traffic levels and wanting to make Google and others pay for this traffic. I think they miss the main point. The traffic is NOT Google's! Once I make a web request to a Google server, the traffic is MINE. *I'm* paying for the download speed, and *I* am paying for the download limits. What does it matter *where* I get that traffic from? All Google is doing is providing a service to users. Saying Google should pay for traffic and guaranteed access etc is like trying to make the postal company pay for delivering YOUR letters! (and this AFTER *YOU* have already paid to get it delivered!) In case I'm not being too clear... At what point does the traffic stop being "GOOGLE'S" and become "MINE"? If I am paying for everything coming TO me, and they are paying for everything going FROM them, then isn't that data being paid for TWICE?