Friday, January 15, 2016

The Rising Sophistication of Network Scanning


Gone are the days when computers didn't need firewalls. We are now living in an internet security arms race and your personal information lies in the balance. A good firewall goes a long way, but no firewall is infallible.

In this article I would like to show you a hidden system that is hard at work scanning thousands, maybe millions, of unsuspecting devices. And I'll show how this system efficiently harvests each device's personal IP address and hands it off to a scanner, which proceeds to run a port/security scan against each unsuspecting victim for vulnerabilities.

The first red flag came when I noticed a steady flow of unsolicited network scans being hurled at my devices. What was most puzzling was that the targeted devices had randomized IPv6 addresses that were not published in DNS or any public record. For all intents and purposes they were hidden safely within my lab network.

Imagine my surprise when the firewall started logging swaths of packets, from distant internet addresses, aimed directly at my hidden devices. They were being directly targeted.

So how did they know the unpublished IP addresses? These addresses are 128 bits in length, which is an unimaginable range of numbers. Unguessable is the only way to describe the randomized addresses that were in use on the devices.
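
To put "unguessable" into numbers, here is a back-of-the-envelope calculation. It assumes the usual /64 subnet, so that 64 of the 128 bits are random, and a hypothetical scanner that sends one million probes per second. Both figures are assumptions chosen for illustration.

```python
# Rough arithmetic: how long would a blind sweep of one /64 subnet take?
# PROBES_PER_SECOND is a made-up, generous figure for illustration only.
HOST_BITS = 64                      # randomized half of a 128-bit IPv6 address
PROBES_PER_SECOND = 1_000_000       # hypothetical scanner speed
SECONDS_PER_YEAR = 365 * 24 * 3600

candidates = 2 ** HOST_BITS
years_to_sweep = candidates / PROBES_PER_SECOND / SECONDS_PER_YEAR

print(f"{candidates:,} possible host addresses in one /64")
print(f"about {years_to_sweep:,.0f} years to try them all")
```

Under those assumptions a blind sweep of a single subnet would take roughly 585,000 years, which is why a scanner that finds these addresses within seconds must be learning them some other way.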

My next move was to search the firewall logs for any outbound traffic sent to these scanners that may have triggered the scans. Nothing—not a single outbound packet was sent to the scanners. The scanners had my hidden addresses even though my devices had never connected to them.

In this article I'll retrace my steps as I dissected this advanced system that harvests IP addresses and then passes them to another system for scanning at lightning speed.

Background

If you're one of the millions of Debian, Ubuntu, or Raspbian users, or if you have servers running these operating systems, and you have an IPv6 address, then it's pretty likely your device has already been scanned. The devices in my lab network that were targeted were all running the Debian-based Raspbian distribution of Linux.

Debian is a distribution of the Linux operating system. It’s known for its minimalist approach and huge success in supporting the ARM computer architecture. It runs on cloud servers, routers, and the newly popular $35 credit-card sized Raspberry Pi computers (they run Raspbian). It comes packaged with the “NTP time daemon”, preconfigured to query a specific set of time providers. 

I recently learned that other parts of the pool also contain harvesters, including the Red Hat NTP pool.

From Harvesting to Scan - how long does it take?

It takes less than five seconds for your address to be harvested and scanned. The entire scan takes less than one second and scans over 100 common TCP and UDP ports.

How did they get my randomized IPv6 address?

Is it registered in DNS? No. Guessable? No. Here’s how: My computer made a normal request to the public Network Time Protocol (NTP) pool to set its clock. Alongside NTP, the server must also run some sort of application or script that grabs source IPs of inbound packets. But I’m getting ahead of myself, and I still need to describe how I came to this conclusion.

How common is it to use the NTP?

Pretty much every computer these days sets its time to the atomic standard. Most often, a computer will use NTP to synchronize its clock to a public NTP server, as defined by the computer’s configuration.
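
For a sense of how little it takes to ask a server for the time, here is a minimal sketch of a simple NTP (SNTP) client request in Python. The function names are hypothetical and chosen for illustration; the code only builds and parses packets and is not how the ntp daemon itself is implemented.

```python
import struct

NTP_EPOCH_OFFSET = 2_208_988_800  # seconds between 1900-01-01 and 1970-01-01

def build_sntp_request() -> bytes:
    """Return a minimal 48-byte SNTP client request (version 4, mode 3)."""
    first_byte = (0 << 6) | (4 << 3) | 3   # LI=0, VN=4, Mode=3 (client)
    return bytes([first_byte]) + bytes(47)

def transmit_time_unix(reply: bytes) -> float:
    """Extract the server's transmit timestamp (bytes 40-47) as Unix time."""
    seconds, fraction = struct.unpack("!II", reply[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32
```

Sending those 48 bytes over UDP to port 123 of a time server is all a basic query needs. Like any packet, it carries the sender's source address, which is the detail that matters for the rest of this article.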

The address of the NTP server for each computer is typically preconfigured by the manufacturer or, in the case of a Raspberry Pi, it's spun into the Raspbian firmware image (/etc/ntp.conf).

Using IPv6 to get NTP time from Debian? You're probably getting scanned by Shodan.

NTP uses a concept of pools or groups of servers. Ntp.org aggregates the public pools and makes them available under names such as 0.debian.pool.ntp.org.

What is port scanning? It’s akin to walking through a parking lot and pulling on door handles to see which doors are unlocked.  Just as cars can contain valuables, computers can also contain valuables, including personal data.

Port scans can be used ethically, but they can also be used maliciously. For example, evil hackers use port scan results to identify potential victims based on the software detected during the scan. Researchers, on the other hand, use port scan data to compile reports about the internet as a whole, as seen on Shodan (link). These are the two main reasons to initiate a port scan.

Do these scans target me specifically? Well, no, not exactly. These scans do target Debian systems, but they do not specifically target individuals.

Millions of devices use the Debian NTP servers. A rising percentage of those devices have IPv6 addresses.

What's IPv6?

Internet Protocol version 6 (IPv6) is the latest version of the network protocol that carries your data across your home or office network and across the internet as a whole. It's a very important upgrade for the internet and it will enable the unbounded growth of network connected devices that has continued to accelerate over the last two decades or more.

Adoption of the new protocol is ramping up behind the scenes on networks around the world. For the most part, it is a transparent upgrade. You know there’s a sidewalk under your feet while you’re walking and you probably don’t care much whether it’s made of concrete or something else. The biggest selling point of IPv6 is the vast number of addresses it makes available, enough that you can connect your thermostat and refrigerator to the internet.

IPv6 offers a number of security features. Randomized host addressing is one, also known as temporary addresses or privacy extensions.

Randomized Addressing?

This refers to the so-called "privacy extensions", which provide a means for devices to assign themselves an IP address that is half home/network prefix and half random bits. This is supposed to cloak the device by making its address nearly unguessable. But as we all know, security through obscurity is no security at all. The operators of this scanner network are exploiting the fact that these privacy addresses are exposed any time the device makes an outgoing connection (such as setting its clock from an NTP server).
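
The "half prefix, half random bits" idea can be sketched in a few lines of Python. This is a simplified illustration only: real operating systems follow RFC 4941, which has additional rules, and the function name and the 2001:db8:: documentation prefix used in the usage note are placeholders.

```python
import ipaddress
import secrets

def random_temp_address(prefix: str) -> ipaddress.IPv6Address:
    """Combine a /64 network prefix with 64 random interface-identifier bits."""
    network = ipaddress.ip_network(prefix)
    if network.version != 6 or network.prefixlen != 64:
        raise ValueError("expected an IPv6 /64 prefix")
    interface_id = secrets.randbits(64)   # the unguessable half
    return ipaddress.IPv6Address(int(network.network_address) | interface_id)
```

Calling random_temp_address("2001:db8:1:2::/64") twice will almost certainly give two different addresses in the same subnet, which is the behavior the short address lifetimes in my test setup rely on.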

But temporary addresses are secure, right?

Yes and no. They certainly help: a temporary, random IPv6 address that doesn't contain your MAC address is a step in the right direction. But the device can still receive traffic on that temporary address during its lifetime. Both of the targeted devices in my lab had temporary addresses assigned. Both are behind a firewall and are not used by users for internet communication. The only services running were NTP and an occasional package manager update.
Also see RFC 5157 ("IPv6 Implications for Network Scanning").

Make Shodan Do The Dirty Work

It seems the scans are being carried out by way of Shodan’s scan API. The IPs that perform the scans are registered to Shodan hostnames. Below are a few examples (these are some of the scanners that spewed scan packets at my firewall):

2604:a880:0800:0010:0000:0000:0970:a001 = thor.scan6.shodan.io.
2604:a880:0800:0010:0000:0000:00fe:d001 = gateway.scan6.shodan.io.
2604:a880:0800:0010:0000:0000:0092:2001 = bone.scan6.shodan.io.
2604:a880:0800:0010:0000:0000:00fd:7001 = burger.scan6.shodan.io.
2604:a880:0800:0010:0000:0000:0089:c001 = rock.scan6.shodan.io.
2607:ff10:00c5:0509:bcde:00d0:fde8:e28d = ? Carinet ISP. This one is an oddball, only seen twice.
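
Mapping an address to a hostname like the ones above is a reverse DNS (PTR) lookup. The sketch below uses only Python's standard ipaddress module to show the shortened form of one scanner address and the ip6.arpa name such a lookup would query; it does not contact any DNS server, and ptr_name is a hypothetical helper name.

```python
import ipaddress

def ptr_name(address: str) -> str:
    """Return the ip6.arpa name that a reverse DNS (PTR) lookup would query."""
    return ipaddress.ip_address(address).reverse_pointer

scanner = "2604:a880:0800:0010:0000:0000:0970:a001"
print(ipaddress.ip_address(scanner).compressed)   # shortened notation
print(ptr_name(scanner))                          # nibble-reversed PTR name
```

To actually resolve the name you could pass the address to socket.gethostbyaddr, keeping in mind that DNS records change and today's answer may differ from what I saw.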

How exactly are the addresses harvested?

So clearly we can see that performing an NTP time sync against a harvester will result in a follow-up scan, but how are the harvesters capturing my IP information? Are they scraping the NTP log file, or perhaps a firewall log? In the interest of narrowing down the list of possibilities, I tried probing the NTP port directly, using netcat, against a known harvester. The result? I got scanned! So it seems a real NTP request is not even needed and a simple empty packet to the harvester’s NTP port is sufficient to get harvested. This suggests a firewall log scraper, a Scapy script, or another low-level means of IP harvesting, and also suggests that the NTP application itself on these time servers was probably not customized in any way.

New: On a whim I tried sending an empty packet to port 321 on an affected server (2604:a880:0400:00d0:0000:0000:0009:b00d) and lo and behold, it triggered a scan. So the harvesting seems to be completely independent of the NTP service that runs on the servers.
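
A rough Python equivalent of that kind of probe might look like the following. It assumes the probe is a single zero-length UDP datagram (NTP runs over UDP); the function name is hypothetical, and the host and port are whatever you choose to test against.

```python
import socket

def send_empty_datagram(host: str, port: int) -> None:
    """Send one zero-length UDP datagram to host:port (IPv4 or IPv6)."""
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_DGRAM
    )[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.sendto(b"", sockaddr)   # no payload at all, just headers
```

Because the datagram has no payload, nothing in it resembles an NTP request. The only information the receiving side gets is the packet's source address and port.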

What Can People Do With The Scan Data?

For starters, they can glean the ratio of IPv6 addresses which do and do not have a firewall, because if a device has a firewall, none of the scan probes will be answered (and without a firewall, some or all probes will be answered, if nothing else with a port-closed response). Furthermore, they learn which services on the clients are not protected by a firewall. Finally, and most disconcerting, is the possibility that they are also probing for vulnerabilities in the clients, in which case they would have a good idea how to load spyware, malware, or a virus onto the client should they choose to do so.

Progression of log analysis

Initially I had been correlating scans with harvesters using an embedded Splunk query, but I found it nearly impossible to take certain attributes of the primary query, adjust them slightly, and feed them into the phase-2 query that searches for the outgoing packets that triggered the scanner. By using the Splunk API and writing a small Python script, I was able to do exactly what I needed, and in relatively short time. The script took two days to write, and it runs for about 10 seconds for every day of firewall logs it analyzes. The results that it generates save me hours of manual work, and that’s a win in my book!

What method are they using for harvesting?

Most of my testing focuses on using the free NTP servers to get the time and then analyzing the scan that follows. I went a little further, though, and attempted to find out whether real NTP traffic on the port was required. As it turns out, it's not. Simply sending an empty packet to port 123 is enough to trigger a scan. So we can conclude that the IP harvesting operation is likely looking for targets at the packet level, as opposed to using some sort of custom NTP application. Chances are there is a Scapy script or a firewall log scraper running, which collects the IP addresses of incoming connections and then passes them to the port scanner, which resides elsewhere.
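
To show how little code packet-level harvesting would need, here is a purely hypothetical sketch. I have no visibility into what the harvesters actually run; this only demonstrates that a listener can record who sent a packet without ever looking at its contents. The function name and the 2048-byte buffer size are arbitrary.

```python
import socket

def record_sender(sock: socket.socket, seen: set) -> str:
    """Wait for one datagram, ignore its payload, and remember who sent it."""
    _payload, sender = sock.recvfrom(2048)
    source_ip = sender[0]     # sender is (ip, port, ...) for IPv4 and IPv6
    seen.add(source_ip)
    return source_ip
```

A loop around this function, bound to a UDP port, would collect every client address that touches the port, empty packet or not, which matches the behavior I observed.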

Test Server Setup

How did I configure my test server so that it uses a unique IPv6 address each time it contacts a harvester? This part is a bit technical and I'll provide samples below, but in general I increased the rate at which random IPv6 addresses expire and are thus regenerated and assigned. This is quite easy on Linux and Mac by tweaking sysctl. I don't recommend these settings on your desktop, however, because they might lead to your connections being disrupted if you're using an IP when its validity timer expires.

sysctl.conf

# BH - for testing. Turn off SLAAC.
net.ipv6.conf.all.autoconf=0
net.ipv6.conf.default.autoconf=0

# BH - privacy extensions - override the default 1-day long tempaddr validity time with something much smaller.
# This should help narrow down ipv6 address harvesting servers, among other things.
net.ipv6.conf.all.use_tempaddr=2
net.ipv6.conf.default.use_tempaddr=2
net.ipv6.conf.eth0.use_tempaddr=2
net.ipv6.conf.all.temp_prefered_lft=1200
net.ipv6.conf.all.temp_valid_lft=2400
net.ipv6.conf.default.temp_prefered_lft=1200
net.ipv6.conf.default.temp_valid_lft=2400
net.ipv6.conf.eth0.temp_prefered_lft=1200
net.ipv6.conf.eth0.temp_valid_lft=2400

The settings above result in addresses being expired and allocated once every 20 minutes. This is just above the lowest recommended values where the protocols could start to act wonky. This rate lends itself well to running tests once every 30 minutes, with each test getting its own unique IPv6 address.

And of course having a unique source address for each of my time requests made it trivial to backtrack which one triggered the scan - because the target of the scan would match the source of a time request, and the destination of the time request is the harvester.
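
That matching rule can be sketched as follows. This is an illustration of the idea, not the code of my script: the Packet record, the function name, and the 60-second window are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    time: float   # epoch seconds
    src: str      # source address of the outbound packet
    dst: str      # destination address (the candidate harvester)

def likely_triggers(scan_time, scan_target, outbound, window=60.0):
    """Return (lag, packet) pairs for outbound packets sent from the scanned
    address within `window` seconds before the scan, most recent first."""
    hits = [
        (scan_time - p.time, p)
        for p in outbound
        if p.src == scan_target and 0 <= scan_time - p.time <= window
    ]
    return sorted(hits, key=lambda hit: hit[0])
```

Because each temporary address is used for only one test, the list of candidates for any given scan is short, and the destination of the closest outbound packet is the prime suspect.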

Which NTP Servers Are Harvesting?

By default, a system running Debian comes pre-configured with four time server address pools: 0-3.debian.pool.ntp.org. It’s the 2.debian.pool.ntp.org hostname that currently offers a few IPv6 addresses:

$ host 2.debian.pool.ntp.org
<snip>
2.debian.pool.ntp.org has IPv6 address 2604:a880:400:d0::9:b002
2.debian.pool.ntp.org has IPv6 address 2604:a880:400:d0::9:b00e
2.debian.pool.ntp.org has IPv6 address 2001:470:e949:a::1
2.debian.pool.ntp.org has IPv6 address 2604:a880:1:20::a7:f004

The address ending in f004 is a known harvester. There are many more; I have confirmed over a dozen, listed below. I have another script running as we speak that walks through every address in the pool and makes a time request to see if it triggers a scan.

Only a few IPv6 addresses are returned by a DNS lookup against the pool, but they all get returned eventually. There are literally hundreds, maybe thousands of IPs that can be returned. This is an effective means of distributing NTP clients across the large number of available servers and is quite normal in networking. It does, however, make it somewhat difficult to obtain the full list of IPv6 NTP servers so we can check each one for harvesting behavior.
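
One way to build up that list is simply to repeat the lookup and keep the union of the answers. The sketch below assumes the system resolver returns AAAA records through getaddrinfo; the function names are hypothetical, and the resolver is passed in as a parameter so the logic can be tried without touching the network.

```python
import socket

def resolve_aaaa(hostname: str) -> set:
    """One DNS lookup: return the IPv6 addresses currently offered."""
    infos = socket.getaddrinfo(hostname, 123, socket.AF_INET6, socket.SOCK_DGRAM)
    return {info[4][0] for info in infos}

def collect_pool_addresses(hostname, rounds, resolve=resolve_aaaa):
    """Repeat the lookup `rounds` times and return the union of all answers."""
    seen = set()
    for _ in range(rounds):
        seen |= resolve(hostname)
    return seen
```

In practice you would need to pause between rounds, since a caching resolver will keep returning the same answer until the record's TTL expires.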

NTP is a funny protocol. In order to get the most accurate time reading, it visits a handful of different servers before making its final adjustment to the clock. This means that in a given synchronization period, it may visit half a dozen different time servers. This made it somewhat more difficult to identify which NTP server was harvesting and triggering the scans. I did my best to mitigate this challenge by using short tempaddr lifetimes and my nifty Detective script to do some heavy lifting.

The Detective Script

I wrote a Python script called detective.py (GitHub link) to help with the heavy lifting of log analysis. It analyzes firewall logs to uncover scan operations. It then transforms attributes from that primary search and executes a second query to identify, with great accuracy, which outbound packets most likely triggered each scan.

Sample application output:

SCAN DETECTED: startTime=2016-01-13T19:33:10.000+00:00 numPortsScanned=117 SRC=2604:a880:0800:0010:0000:0000:0970:a001 DST=my:prefix:my:prefix:7c0d:e6cb:8719:7d94 durationSeconds=0.0 startTimeEpoch=1452713590.0
                                                                          
One or more of these packets may have triggered the scan...
 lag=47.0s @2016-01-13T19:32:25.000+00:00 epoch=1452713545  my:prefix:my:prefix:7c0d:e6cb:8719:7d94:42539 -> 2604:a880:0001:0020:0000:0000:00a7:f007:123
 lag=7.0s @2016-01-13T19:33:05.000+00:00 epoch=1452713585  my:prefix:my:prefix:7c0d:e6cb:8719:7d94:59081 -> 2604:a880:0001:0020:0000:0000:00a7:f009:123

SCAN DETECTED: startTime=2016-01-13T20:01:19.000+00:00 numPortsScanned=110 SRC=2604:a880:0800:0010:0000:0000:0089:c001 DST=my:prefix:my:prefix:c10a:acd0:af40:c259 durationSeconds=0.0 startTimeEpoch=1452715279.0
                                                                          
One or more of these packets may have triggered the scan...
 lag=8.0s @2016-01-13T20:01:13.000+00:00 epoch=1452715273  my:prefix:my:prefix:c10a:acd0:af40:c259:58417 -> 2604:a880:0001:0020:0000:0000:00a7:f00c:123

SCAN DETECTED: startTime=2016-01-13T19:12:02.000+00:00 numPortsScanned=114 SRC=2604:a880:0800:0010:0000:0000:00ba:4001 DST=my:prefix:my:prefix:95b4:83a9:31a7:02e6 durationSeconds=0.0 startTimeEpoch=1452712322.0
                                                                          
One or more of these packets may have triggered the scan...
 lag=6.0s @2016-01-13T19:11:58.000+00:00 epoch=1452712318  my:prefix:my:prefix:95b4:83a9:31a7:02e6:52382 -> 2604:a880:0001:0020:0000:0000:00a7:f005:123
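
If you want to post-process output in this format, the summary lines are easy to pick apart because they are plain key=value pairs. The helper below is a hypothetical sketch written against the sample shown here and is not part of detective.py.

```python
def parse_scan_line(line: str) -> dict:
    """Turn a 'SCAN DETECTED: key=value ...' line into a dict of strings."""
    prefix = "SCAN DETECTED:"
    if not line.startswith(prefix):
        raise ValueError("not a scan summary line")
    fields = line[len(prefix):].split()
    return dict(field.split("=", 1) for field in fields)
```

All values come back as strings, so numeric fields such as numPortsScanned need an int() or float() conversion before you do arithmetic with them.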

Confirming Each Harvester

After using my handy Python script (GitHub link) to get a list of likely harvesters, I confirmed each one individually by requesting the time from it with ntpdate and checking for the immediate follow-up scan that ensues when a harvester is queried.

I also came up with some basic bash code to surgically probe each suspect while not letting ntp make other connections:

# Run as root. Probe one suspect at a time; while probing, drop every new
# outbound IPv6 connection that is not addressed to that suspect.
for I in 0 1 2 3 4 5 6 7 8 9 A B C D E F ; do
 IPADDR=2604:a880:0001:0020:0000:0000:00a7:f00${I}
 echo "Triggering $IPADDR"
 ip6tables -I OUTPUT -m state --state NEW ! -d "$IPADDR" -j DROP
 ip6tables -L OUTPUT -v -n
 ntpdate "$IPADDR"
 ip6tables -D OUTPUT 1 # Remove the rule inserted above
 sleep 1205 # Wait long enough for IPv6 temp address to refresh
done

This helped confirm a bunch of suspected harvesters. I came up with the following list:

 2604:a880:0400:00d0:0000:0000:0009:b001 (DNS: robot.data.shodan.io)
 2604:a880:0400:00d0:0000:0000:0009:b002
 2604:a880:0400:00d0:0000:0000:0009:b003
 2604:a880:0400:00d0:0000:0000:0009:b004
 2604:a880:0400:00d0:0000:0000:0009:b005
 2604:a880:0400:00d0:0000:0000:0009:b006
 2604:a880:0400:00d0:0000:0000:0009:b007
 2604:a880:0400:00d0:0000:0000:0009:b008 
 2604:a880:0400:00d0:0000:0000:0009:b009 
 2604:a880:0400:00d0:0000:0000:0009:b00a
 2604:a880:0400:00d0:0000:0000:0009:b00b
 2604:a880:0400:00d0:0000:0000:0009:b00c
 2604:a880:0400:00d0:0000:0000:0009:b00d
 2604:a880:0400:00d0:0000:0000:0009:b00e
 2604:a880:0400:00d0:0000:0000:0009:b00f 

 2604:a880:0001:0020:0000:0000:00a7:f001  (DNS: abend.data.shodan.io)
 2604:a880:0001:0020:0000:0000:00a7:f002
 2604:a880:0001:0020:0000:0000:00a7:f003
 2604:a880:0001:0020:0000:0000:00a7:f004
 2604:a880:0001:0020:0000:0000:00a7:f005
 2604:a880:0001:0020:0000:0000:00a7:f006
 2604:a880:0001:0020:0000:0000:00a7:f007
 2604:a880:0001:0020:0000:0000:00a7:f008
 2604:a880:0001:0020:0000:0000:00a7:f009
 2604:a880:0001:0020:0000:0000:00a7:f00a
 2604:a880:0001:0020:0000:0000:00a7:f00b
 2604:a880:0001:0020:0000:0000:00a7:f00c
 2604:a880:0001:0020:0000:0000:00a7:f00d
 2604:a880:0001:0020:0000:0000:00a7:f00e
 2604:a880:0001:0020:0000:0000:00a7:f00f

 2a03:b0c0:0003:00d0:0000:0000:0018:b001  (DNS: analog.data.shodan.io)
 2a03:b0c0:0003:00d0:0000:0000:0018:b002
 2a03:b0c0:0003:00d0:0000:0000:0018:b003
 2a03:b0c0:0003:00d0:0000:0000:0018:b004
 2a03:b0c0:0003:00d0:0000:0000:0018:b005
 2a03:b0c0:0003:00d0:0000:0000:0018:b006
 2a03:b0c0:0003:00d0:0000:0000:0018:b007
 2a03:b0c0:0003:00d0:0000:0000:0018:b008
 2a03:b0c0:0003:00d0:0000:0000:0018:b009
 2a03:b0c0:0003:00d0:0000:0000:0018:b00a
 2a03:b0c0:0003:00d0:0000:0000:0018:b00b
 2a03:b0c0:0003:00d0:0000:0000:0018:b00c
 2a03:b0c0:0003:00d0:0000:0000:0018:b00d
 2a03:b0c0:0003:00d0:0000:0000:0018:b00e
 2a03:b0c0:0003:00d0:0000:0000:0018:b00f

Analysis

The harvesters are using Digital Ocean IP ranges. By default, Digital Ocean provides "droplets" (their name for server or service instances) with an address range of 1-F (15 addresses) of which a server may assign one or all of the addresses. 

Each of the blocks of 15 above is probably a single server, each in a different network region.
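
The grouping can be checked mechanically: addresses that differ only in their last hex digit all fall inside the same /124 network. The sketch below uses Python's ipaddress module; the function name is hypothetical, and treating a /124 as "one server" is my assumption based on the allocation described above.

```python
import ipaddress
from collections import defaultdict

def group_by_block(addresses, prefix_len=124):
    """Group addresses by the /124 network (16 addresses) that contains them."""
    blocks = defaultdict(list)
    for text in addresses:
        block = ipaddress.ip_network(f"{text}/{prefix_len}", strict=False)
        blocks[block].append(ipaddress.ip_address(text))
    return dict(blocks)
```

Feeding the full harvester list through this function should produce three groups, one per hostname.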

GeoIP lookups (link) show the "robot" server(s) to be in San Francisco, the "abend" server(s) to be in New York, and the "analog" servers to be in Frankfurt, Germany.

What is the Intention Behind the Scans?

Unclear at this time. You could set up a honeypot server and trigger a scan against it and see what happens after it discovers one or more vulnerable services. If you do, please come back and tell me about your results :)

It’s quite likely the scan results are being saved to a massive database, as Shodan does in the IPv4 world. Check out the website to see for yourself.

Until we know the true intentions of those behind this operation, we can assume at the very least that the data can and will be reported on, and at worst that the scanned devices will be exploited and injected with malware.

How do I tell if I'm getting scanned?

It depends - do you have a firewall? If so, hopefully it logs dropped packets. There are typically two places a firewall would be installed to protect your computer: on the computer itself or on your gateway device. Comcast and other ISPs only certify hardware that blocks inbound IPv6, to protect you from this sort of attack. You would have to explicitly allow inbound connections in the router settings or be using an uncertified device that doesn't offer the same protection by default.
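
If your firewall does log dropped packets, a scan stands out as one remote address hitting many different ports on one of your addresses. The sketch below assumes log lines in the ip6tables LOG style, with SRC=, DST=, and DPT= fields; the function name and the threshold of 20 ports are arbitrary choices for illustration.

```python
import re
from collections import defaultdict

FIELD = re.compile(r"\b(SRC|DST|DPT)=(\S+)")

def find_port_scans(log_lines, min_ports=20):
    """Return {(src, dst): port_count} for pairs that hit many distinct ports."""
    ports = defaultdict(set)
    for line in log_lines:
        fields = dict(FIELD.findall(line))
        if {"SRC", "DST", "DPT"} <= fields.keys():
            ports[(fields["SRC"], fields["DST"])].add(fields["DPT"])
    return {pair: len(p) for pair, p in ports.items() if len(p) >= min_ports}
```

You could feed it the lines of your firewall log file and look up any reported source address with a reverse DNS query.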

Further Work

I’m working with the NTP pool maintainers to dig into this and follow the appropriate channels to ferret out any activity that’s against policies.

Contact

I can be reached by email at gmail.com name linuxbrad. 

Please leave feedback, suggestions, criticisms, etc. I would love to hear from you.