Schools teach you the wrong way to write network code. They teach you the "synchronous" method: you send a request, wait for a response, then process the response. This doesn't scale to large programs that must interact with thousands of peers at gigabit speeds. These types of programs require "asynchronous" coding.
The problem is that while you are waiting for a response, you can't do anything else useful. You can't simultaneously interact with a second system, for example. Normally, this isn't a problem because computers respond so quickly that you don't notice the wait. You can also hide it by using multiple threads, but if you have 10 threads, then 10 slow systems will noticeably slow your code.
Asynchronous coding solves this problem by never waiting. Instead, the program sits in a loop processing events: either incoming packets or timeouts.
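To make the idea concrete, here is a minimal sketch of such a loop. The event types, the `struct event` layout, and the function names are all invented for this illustration; a real program would pull its events from the network and from a timer rather than from an array.

```c
#include <stddef.h>

/* Hypothetical event types: the two kinds of events named in the text. */
enum event_type { EV_PACKET, EV_TIMEOUT };

struct event {
    enum event_type type;
    int peer;              /* which remote system the event concerns */
};

struct counters {
    unsigned packets;      /* incoming packets handled */
    unsigned timeouts;     /* timeout events handled */
};

/* Handle one event and return immediately; nothing here waits on a peer. */
void dispatch(const struct event *ev, struct counters *c)
{
    switch (ev->type) {
    case EV_PACKET:
        c->packets++;
        break;
    case EV_TIMEOUT:
        c->timeouts++;
        break;
    }
}

/* The loop itself: take whatever event is next, whichever peer it is for. */
void run_event_loop(const struct event *events, size_t count,
                    struct counters *c)
{
    for (size_t i = 0; i < count; i++)
        dispatch(&events[i], c);
}
```

The point of the structure is that a slow peer simply produces no events for a while; it never holds up the events belonging to everyone else.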
Let's use a TCP connection as an example. As everyone knows, the client sends a SYN packet to the server, the server responds with a SYN-ACK, then the client sends an ACK. This SYN-SYNACK-ACK is known as the "three-way-handshake".
In synchronous code, you send a SYN, then stop and wait for a SYN-ACK. When you get a response packet, you first test it to make sure it conforms to the SYN-ACK you were expecting, otherwise you handle some sort of error.
In asynchronous code, the receive thread sits in an "event dispatch loop". It processes incoming packets. If an incoming SYN-ACK is received, it looks it up in a connection table to see if anybody has sent a SYN packet. If so, it dispatches the SYN-ACK as appropriate.
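A connection table of that kind might look roughly like the sketch below. The table size, the hash, and the names `record_syn()` and `synack_expected()` are assumptions made for this example, not taken from any particular tool.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 1024    /* arbitrary size chosen for this sketch */

/* One outstanding SYN: who we sent it to, and from which local port. */
struct conn {
    uint32_t remote_ip;
    uint16_t remote_port;
    uint16_t local_port;
    int      in_use;
};

struct conn_table {
    struct conn slots[TABLE_SIZE];
};

/* Simple hash over the connection identifiers. */
static size_t slot_for(uint32_t ip, uint16_t rport, uint16_t lport)
{
    return (ip ^ ((uint32_t)rport << 16) ^ lport) % TABLE_SIZE;
}

/* Record that a SYN was sent. Returns 0 on success, -1 if the table is full. */
int record_syn(struct conn_table *t, uint32_t ip, uint16_t rport, uint16_t lport)
{
    size_t start = slot_for(ip, rport, lport);
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        struct conn *c = &t->slots[(start + i) % TABLE_SIZE];
        if (!c->in_use) {
            c->remote_ip = ip;
            c->remote_port = rport;
            c->local_port = lport;
            c->in_use = 1;
            return 0;
        }
    }
    return -1;
}

/* Look up an incoming SYN-ACK. Returns 1 if a matching SYN was sent, else 0. */
int synack_expected(const struct conn_table *t, uint32_t ip,
                    uint16_t rport, uint16_t lport)
{
    size_t start = slot_for(ip, rport, lport);
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        const struct conn *c = &t->slots[(start + i) % TABLE_SIZE];
        if (!c->in_use)
            return 0;      /* an empty slot ends the probe sequence */
        if (c->remote_ip == ip && c->remote_port == rport &&
            c->local_port == lport)
            return 1;
    }
    return 0;
}
```

The sending side calls `record_syn()` as it transmits, and the event dispatch loop calls `synack_expected()` for each SYN-ACK that arrives. The table must start out zeroed. Entries are never removed in this sketch; a real table would also expire them on timeout.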
Imagine you are writing a port scanner, like nmap. One way you could write this is to launch many threads, where each one sends out a SYN packet, then stops and waits for the SYN-ACK. This code could generate thousands of packets per second.
Or, you could write your mapping program with two threads: one that does nothing but sends out SYN packets, and a second thread that receives SYN-ACKs in response. This code could generate a million packets per second.
Recently, a hacker released a TCP DoS tool called NKiller2. The tool uses asynchronous network code. It can appear confusing to people accustomed to synchronous programming. A synchronous coder might expect it to launch many threads, where each thread sends out a SYN and waits for responses for that one connection. This would be too slow - it would probably DoS itself creating too many threads before it was able to DoS the victim.
Instead, NKiller2 is written asynchronously. Conceptually it runs two threads: one that spews out SYN packets, and another that responds to incoming packets. This may not be obvious, because both are folded into the same thread of execution. The code has an event dispatch loop that looks like the following:
while () {
    . . .
    send_syn_probe(Target, Sniffer);
    . . .
    state = check_replies(Target, Sniffer, &reply);
    switch (state)
    {
    case S_SYNACK:
        send_probe(reply, Target, S_SYNACK);
    }
}
If you are used to synchronous programming, you might assume that the "send_syn_probe()" and "check_replies()" functions are related, that the code first sends a SYN then checks for a reply to that SYN. That's NOT what's going on.
Instead it's really running two threads, one that sits in a loop sending SYNs, and another that sits in a loop processing replies. The code just combines both into the same loop. You could put the "send_syn_probe()" function at the bottom of the loop, AFTER the "check_replies()", and the code would behave the same.
Or, you could create two versions of this program. Create one that sends SYNs, but has the "check_replies()" commented out. Create a second program with "send_syn_probe()" commented out, so that it only receives replies. Now run them both at the same time, and you'll get the same results as the original program.
This code also uses the technique of being completely "stateless". One way to write this code would be for it to create a small record for each connection. However, since it is creating millions of connections, it would need a large table in memory to track what each connection is doing. Instead, it's much simpler. It will reply to a SYN-ACK packet regardless of whether it sent a matching SYN packet.
That would be one (of many) easy ways to see if somebody is running this tool against you. Whenever you suspect somebody is DoSing you, send them a SYN-ACK packet out of the blue. If it's a normal, stateful system that tracks SYNs it sent, then the suspected attacker will respond with some sort of error. If it is a stateless, Internet-scale attacker, they will respond with a data packet.
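Here is a rough sketch of what "stateless" means in practice: everything needed to build the reply comes out of the incoming packet itself, so no table is consulted. The `struct tcp_fields` layout, the flag constants, and `stateless_reply()` are hypothetical names for this example and are not NKiller2's actual code.

```c
#include <stdint.h>

#define FLAG_SYN 0x02      /* TCP SYN bit */
#define FLAG_ACK 0x10      /* TCP ACK bit */

/* Minimal view of the TCP/IP fields that matter here (host byte order). */
struct tcp_fields {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint32_t seq, ack;
    uint8_t  flags;
};

/*
 * Build the ACK for an incoming SYN-ACK using nothing but the packet itself.
 * Returns 1 and fills *out if the packet is a SYN-ACK, otherwise returns 0.
 * Note what is missing: no lookup to check that we ever sent a SYN.
 */
int stateless_reply(const struct tcp_fields *in, struct tcp_fields *out)
{
    if ((in->flags & (FLAG_SYN | FLAG_ACK)) != (FLAG_SYN | FLAG_ACK))
        return 0;

    /* Swap the addresses and ports so the reply goes back to the sender. */
    out->src_ip   = in->dst_ip;
    out->dst_ip   = in->src_ip;
    out->src_port = in->dst_port;
    out->dst_port = in->src_port;

    out->seq   = in->ack;      /* the peer tells us our next sequence number */
    out->ack   = in->seq + 1;  /* acknowledge the peer's SYN */
    out->flags = FLAG_ACK;
    return 1;
}
```

Because the peer's acknowledgment number already says which sequence number it expects next, the sender does not have to remember what it put in the original SYN.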
Internet scale programming like this is all around us. When the Internet worms were ravaging the Internet, a common technique was to set up "tarpits". A tarpit would accept an incoming TCP connection, but never respond. The worm on the other end would stop and wait for a response. Since the tarpit would never respond, the worm would wait forever, stopping its spread. Some worms would launch a hundred threads, but each thread would eventually find a tarpit and be halted. (Note: I first tried this with the Morris Worm. It effectively slowed the worm down, but the worm would eventually time out connections and move on - the first worm was written better than most following worms.)
Another example of this is Internet-wide scanning. Kaminsky used this approach for scanning for DNS servers: have one thread spew out DNS packets, and a second thread receive the responses. I used the same technique for scanning for SNMP vulnerabilities. I wrote it for the military to scan Class A networks (with 16 million addresses), but it would scale to the entire Internet. My SNMP scanner was also stateless: it would accept any SNMP response regardless of whether it had actually sent the system a request. It was actually pretty interesting to see how many SNMP responses didn't match correctly with a request I sent (such as those from multi-homed hosts).
It works the other way around, too. IronPort used this approach to receiving large amounts of e-mail. They called the operating system they built around this idea "AsyncOS". (They also use this for sending spam).
Asynchronicity is why BlackICE/Proventia IPS is faster than application gateways. Fundamentally, they do the same thing: process application layer data and block it. However, BlackICE does this asynchronously, with a single thread. Application-layer gateways tend to be written synchronously, with a limited number of threads waiting for data.
Conclusion
They teach you synchronous coding in school because it's easy to understand. However, in order to write software to "Internet scale", you have to learn how to write asynchronous code. This applies to worms, DoS tools, port scanners, firewalls, IPS, e-mail gateways, and so on.
Friday, June 12, 2009
Wednesday, June 10, 2009
Why people don't get security
Posted by
Robert David Graham (@ErrataRob)
Security is only as strong as your weakest link.
Everyone has heard this. It seems obvious. Yet, people repeatedly fail at understanding it.
Recently, a startup called "StrongWebMail" offered a $10k competition to hack their CEO's webmail account. They gave you the CEO's password. Their hook is that they also authenticate by calling you back on the phone, so knowing the password isn't enough. Hackers broke in and claimed the reward using a typical cross-site-scripting attack.
When conceding, StrongWebMail said this:
It is important to note that the front end protection offered by StrongWebmail.com was not compromised. In fact, Lance [James] and his team were forced to find a way around the phone authentication. We are working with our email provider to solve this vulnerability and ensure that the backend email software is more secure.
This misses the point. The flaw used to crack the system wasn't something rare or unusual; it was the most common flaw in web applications. It is a type of flaw that was first exploited over a decade ago in webmail applications.
At the same time, all other webmail providers can fix flaws like this within hours, rather than waiting weeks for some other organization to fix the flaw.
This is like advertising you have elite commandos protecting the front door of your bank, yet leaving your back door open. Sure, no other bank has commandos, yet no other banks leave their back door open, either.
Nobody cares about the strength of your strongest feature. What people care about is the strength of your weakest feature. By this measure, StrongWebMail is less secure than any other e-mail system and you would be a fool to rely upon it. It doesn't matter how strong their strongest link is when they have so many weak links.
UPDATE:
By the way, the simple fact they had this contest in the first place means they cannot be trusted. It's a magic trick most frequently used by snake-oil salesmen.
UPDATE:
I misspelled the name in the first post. It should be "StrongWebMail" not "StrongMail", which refers to a completely different company.
Thursday, June 04, 2009
Why deep packet inspection is faster
Posted by
Robert David Graham (@ErrataRob)
Snort recently added a more complex NetBIOS, SMB, DCE-RPC protocol parser into its code. In other words, it added "deep packet inspection" (DPI) for these protocols.
This means Snort is now slower, right? If you've got an internal network full of these sorts of packets, shouldn't you be worried that your Snort boxes might be overloaded with this new deep-packet-inspection code?
Nope. Snort is now faster.
The reason is that deep packet inspection is actually FASTER than blindly searching traffic for patterns. The more you understand about the structure of a packet, the LESS work you have to do analyzing it for intrusions.
This was the curious thing we found with BlackICE/Proventia (the IDS/IPS that does more deep packet inspection than any competing product). As everyone knows, adding signatures to an IDS makes it slower. We found the reverse: as we added signatures, the product got faster. The reason was that as we added signatures, we also added more deep-packet-inspection logic. This meant we needed to do less work later on, so the product became faster.
This is why Snort still struggles at 1-gbps, whereas Proventia scales to 6-gbps: Proventia does more DPI.
Not all DPI will speed up code, of course. When DPI can be done in a single pass, then it will speed things up. Some DPI, though, requires you to backtrack, which further requires you to buffer old data so that you can backtrack to it. This is the case when looking for intrusions within Word documents. Also, decompression streams can be slow: a 1-gbps gzipped stream can easily expand out to 10-gbps worth of data. If you put Proventia in front of your servers sending out compressed HTTP traffic, you might want to turn off the decompression feature for that reason.
Also, a lot depends upon how you write your DPI logic. The Snort NetBIOS/DCE code isn't horrendously bad, but it's slower than it needs to be. For example, it uses the "ntohs()" function to swap bytes, which is a bad way of coding. Most DPI code, like that you find in e-mail servers, is a lot worse. That's why DPI is considered "slow": most programmers don't write DPI code well.
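The text above doesn't spell out what to use in place of "ntohs()". One widely used alternative, assumed here purely for illustration, is to assemble each field directly from the bytes of the packet. This gives the same result on any host byte order and doesn't require the field to be aligned in memory. The function names are made up for this sketch.

```c
#include <stdint.h>

/* Read a 16-bit big-endian (network order) field from a packet buffer. */
uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read a 16-bit little-endian field, the byte order SMB uses. */
uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

A parser written this way reads each field straight out of the packet at the offset where it finds it, for example `read_be16(packet + offset)`, without first copying the data into a struct.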
UPDATE
Consider this rule I downloaded from EmergingThreats.net.
alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (\
msg:"ET P2P ABC Torrent User-Agent (ABC/ABC-3.1.0)"; \
flow:to_server,established; \
content:"User-Agent\: ABC/ABC"; nocase; \
sid:2003475;)
This is blind to the HTTP protocol. It is slow, because it must search everything that goes across those ports. It's prone to false positives, because the pattern may exist for reasons unrelated to the original attack.
However, with hypothetical DPI extensions to Snort, you might write it like the following. Since it narrows the search for the pattern down to just that header field, it would be faster and less prone to false positives.
alert http $HOME_NET any -> $EXTERNAL_NET any (\
msg:"ET P2P ABC Torrent User-Agent (ABC/ABC-3.1.0)"; \
header.useragent:"ABC/ABC"; \
sid:2003475;)
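To show the difference between the two rules in code, here is a small sketch of both approaches. It is not how Snort implements either one. The function names are invented, and the HTTP parsing is simplified: it assumes the whole request header sits in one buffer.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive test: does buf (len bytes) start with the string s? */
static int starts_with_nocase(const char *buf, size_t len, const char *s)
{
    size_t n = strlen(s);
    if (len < n)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)buf[i]) != tolower((unsigned char)s[i]))
            return 0;
    return 1;
}

/* Blind approach: try the pattern at every offset of the payload. */
int blind_match(const char *payload, size_t len, const char *pattern)
{
    for (size_t off = 0; off < len; off++)
        if (starts_with_nocase(payload + off, len - off, pattern))
            return 1;
    return 0;
}

/*
 * Protocol-aware approach: walk the header lines and compare the value only
 * on the User-Agent line. It stops at the blank line that ends the headers,
 * so the body is never searched.
 */
int useragent_match(const char *payload, size_t len, const char *value)
{
    size_t pos = 0;
    while (pos < len) {
        const char *line = payload + pos;
        const char *eol = memchr(line, '\n', len - pos);
        size_t line_len = eol ? (size_t)(eol - line) : len - pos;

        if (line_len == 0 || (line_len == 1 && line[0] == '\r'))
            return 0;                   /* end of headers */

        if (starts_with_nocase(line, line_len, "User-Agent:")) {
            size_t i = strlen("User-Agent:");
            while (i < line_len && (line[i] == ' ' || line[i] == '\t'))
                i++;
            return starts_with_nocase(line + i, line_len - i, value);
        }
        pos += line_len + 1;            /* skip past the '\n' */
    }
    return 0;
}
```

If a request happens to carry the text "User-Agent: ABC/ABC" in its body, `blind_match()` fires, while `useragent_match()` does not, because it only ever looks at the real header field. It also examines far fewer bytes.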