Project Kyoketsu Part 1
2024-02-21 03:23 UTC
Kyoketsu is the name of my new software project. Link: git.atherial.dev/aeth/kyoketsu
Kyoketsu is a distributed, client-to-client network observability system. It will leverage Go's potent parallelism to rapidly scan hosts concurrently within any given network or subnet, while using the torrent protocol to replicate and sync topology data across the Kyoketsu network in real time.
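To make the "rapid concurrent host scanning" part concrete, here's a minimal sketch of the pattern I have in mind: a pool of goroutines pulling addresses off a channel and attempting a TCP dial with a short timeout. The function names, addresses, and port below are placeholders, not Kyoketsu's actual code:

```go
package main

import (
	"fmt"
	"net"
	"sync"
	"time"
)

// scanHosts dials each address concurrently and reports the ones
// that accept a TCP connection on the given port.
func scanHosts(addrs []string, port string, workers int) []string {
	jobs := make(chan string)
	results := make(chan string)
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for ip := range jobs {
				target := net.JoinHostPort(ip, port)
				conn, err := net.DialTimeout("tcp", target, 500*time.Millisecond)
				if err != nil {
					continue // closed, filtered, or no host at all
				}
				conn.Close()
				results <- ip
			}
		}()
	}

	go func() {
		for _, ip := range addrs {
			jobs <- ip
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()

	var up []string
	for ip := range results {
		up = append(up, ip)
	}
	return up
}

func main() {
	hosts := []string{"192.168.1.1", "192.168.1.10", "192.168.1.20"}
	for _, ip := range scanHosts(hosts, "22", 64) {
		fmt.Println("listening:", ip)
	}
}
```

Spreading the dials across a worker pool is what lets Go chew through large address lists without one slow or silent host blocking everything behind it.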
I came up with the idea while reading the book Ethical Hacking by Daniel Graham. Chapter 4 is concerned with TCP shells and botnets, and the section that touches on botnets is what got me interested in the concept. I'd always heard the term but never stopped to learn more about what they are. A botnet is simply a network of machines running software that is predominantly used for malevolent purposes, giving an operator remote access to however many clients are on the bot-network. This allows for orchestrated endeavors such as a DDoS (Distributed Denial of Service) attack.
The Mirai botnet, which compromised over 350,000 smart home devices, was built in a pretty clever way: the clients that were part of the botnet would resolve the address of their C2 (Command and Control) server via DNS as opposed to a naked IP address. That gave the operators a tunnel out of any ISP blacklist the C2's IP landed on, since the DNS record could simply be pointed at a new address. Pretty neat, if you ask me.
Later in the chapter, Daniel talks about how botnets can follow either a client-server model or a peer-to-peer model. The idea of a peer-to-peer botnet got my gears turning, namely because it would allow something like a torrent network to exist between clients, with the added benefit of decentralized command and control by way of having no fixed server for each client to talk to.
The torrent protocol, a client-to-client data distribution method in which clients share data among each other with the assistance of ‘tracker’ servers, just so happens to be used by people distributing content because of its resilience against server isolation, DNS blackholes, and IP blacklists. As long as one client on a torrent network has a complete copy of the data being distributed (or ‘seeded’), all that client needs to do is let the tracker know it has the file and is ready to send it back out again. That resiliency scales with the number of clients.
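For reference, the standard tracker "announce" in the original BitTorrent spec is just an HTTP GET with a handful of query parameters. The sketch below only builds that URL; the tracker address and values are made up:

```go
package main

import (
	"fmt"
	"net/url"
)

// buildAnnounceURL assembles the kind of HTTP GET a BitTorrent client
// sends to a tracker to say "I have this data, here is where to reach me".
func buildAnnounceURL(tracker string, infoHash, peerID [20]byte, port int, left int64) string {
	v := url.Values{}
	v.Set("info_hash", string(infoHash[:])) // SHA-1 of the torrent's info dictionary
	v.Set("peer_id", string(peerID[:]))     // random 20-byte ID for this client
	v.Set("port", fmt.Sprintf("%d", port))  // port this client listens on for peers
	v.Set("uploaded", "0")
	v.Set("downloaded", "0")
	v.Set("left", fmt.Sprintf("%d", left)) // bytes still needed; 0 means "I'm a seeder"
	v.Set("event", "started")
	return tracker + "?" + v.Encode()
}

func main() {
	var infoHash, peerID [20]byte // placeholder values for the sketch
	fmt.Println(buildAnnounceURL("http://tracker.example.com/announce", infoHash, peerID, 6881, 0))
}
```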
Normally, one of the benefits of distributing data this way is that torrenting clients verify that the data they have received is the same as the data the other clients are sharing, by way of a checksum. Once a file is downloaded, the client asserts that the checksum of the file on disk matches the checksum advertised by the other clients. This is valuable when receiving files because it gives you the assurance that your data hasn't been tampered with or had any sort of code injected into it (so long as the original file wasn't infected to begin with).
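In Go, that verification boils down to hashing the file and comparing the result to the advertised digest. A quick sketch, with the file name and expected hash as placeholders:

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// verifyChecksum hashes a downloaded file and compares it to the checksum
// the other peers advertised; a mismatch means corruption or tampering.
func verifyChecksum(path, expectedHex string) (bool, error) {
	f, err := os.Open(path)
	if err != nil {
		return false, err
	}
	defer f.Close()

	h := sha1.New() // BitTorrent hashes SHA-1 per piece; the idea is the same for a whole file
	if _, err := io.Copy(h, f); err != nil {
		return false, err
	}
	return hex.EncodeToString(h.Sum(nil)) == expectedHex, nil
}

func main() {
	ok, err := verifyChecksum("ubuntu.iso", "2fd4e1c67a2d28fced849ee1bb76e7391b93eb12")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("checksum matches:", ok)
}
```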
The other half of this problem came from a chapter of a similar book (#TODO: remember name of book and chapter) that talks about how, when you try to scan large IP ranges to find out whether any given host is listening on some arbitrary port, you can quickly run into address ranges that are simply too big to be scanned in a meaningful way. Imagine you have a very large subnet, somewhere between a /8 and a /15 in CIDR notation. A /8 subnet can hold up to 16,777,214 hosts, while even a much smaller /15 can still hold up to 131,070. You likely will not need to scan 16,777,214 hosts, but if your company has 300,000 people in it, that's at least 300,000 machines that could be listening on some port. That's not even counting all the internal servers the company might be using, network devices, IP phones, IoT smart devices, company cell phones; the list goes on.

And while nmap is an irrefutably valuable tool with a long track record as a go-to for hackers and network administrators alike, one server wheezing through a bash script that parses nmap output from 300,000 port scans won't necessarily cut it.
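Those host counts fall straight out of the prefix length: an IPv4 /n subnet has 2^(32-n) addresses, minus the network and broadcast addresses. A tiny snippet to sanity-check the numbers above:

```go
package main

import "fmt"

// usableHosts returns the number of usable host addresses in an IPv4 subnet
// of the given prefix length (total addresses minus network and broadcast).
func usableHosts(prefix int) int {
	if prefix >= 31 { // /31 and /32 have no usable range in this sense
		return 0
	}
	return (1 << (32 - prefix)) - 2
}

func main() {
	for _, p := range []int{8, 15, 16, 24} {
		fmt.Printf("/%d -> %d hosts\n", p, usableHosts(p)) // /8 -> 16,777,214; /15 -> 131,070
	}
}
```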
Network discovery with a pool of hosts that big can be problematic because:
1. 300,000 IPs is a lot of addresses to scan with one server, let alone 16,777,214.
2. Best practice says that for wired connections, DHCP leases shouldn't exceed 8 days.
3. For wireless connections, it's 24 hours.
4. Time matters when analyzing a network: addresses get reassigned as leases expire, so having reliable data for as long as possible is paramount.
In conclusion, the goal of this project is to create a network topology state machine capable of analyzing and distributing data across itself in real time. The peer-to-peer model brings data resilience, since the current state can be rebuilt as long as a single client is running, and it means the network isn't limited by, or dependent on, a single C2 server.
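To give a rough picture of what that shared, rebuildable state might look like, here's a purely hypothetical shape for it: a map of hosts keyed by address, merged by last-seen time. The real Kyoketsu schema may end up looking nothing like this:

```go
package main

import (
	"fmt"
	"time"
)

// Host is a hypothetical record for one scanned machine.
type Host struct {
	Addr      string
	OpenPorts []int
	LastSeen  time.Time
}

// Topology is the shared state each peer would hold and sync with the others.
type Topology map[string]Host

// Merge folds another peer's view into ours, keeping whichever record is newer.
// Any single surviving peer can therefore rebuild the network's current state.
func (t Topology) Merge(other Topology) {
	for addr, h := range other {
		if cur, ok := t[addr]; !ok || h.LastSeen.After(cur.LastSeen) {
			t[addr] = h
		}
	}
}

func main() {
	mine := Topology{"10.0.0.5": {Addr: "10.0.0.5", OpenPorts: []int{22}, LastSeen: time.Now().Add(-time.Hour)}}
	theirs := Topology{"10.0.0.5": {Addr: "10.0.0.5", OpenPorts: []int{22, 443}, LastSeen: time.Now()}}
	mine.Merge(theirs)
	fmt.Println(mine["10.0.0.5"].OpenPorts) // the fresher view wins: [22 443]
}
```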
TO BE CONTINUED …
Next: What am I going to do about tracker servers?