Our First Hack Day—Black Hat

Doc Overell’s email arrived before the birds started chirping: “Some notes for Friday’s Hack day.” Opening the companion document revealed what we had all been waiting for: the brief for the first spider.io hack day—one day to spend scheming and implementing from the perspective of the enemy.

The Brief

In 12 hours from conception to proof-of-concept demo, build a product or service that can generate revenue from Internet traffic.

The Rules

The Schedule

The Day

The day kicked off at 08:20 with Simon in the chair for the brainstorming session, which lasted until 9am.

A clear favourite quickly emerged. Details will be omitted here; suffice it to say, it was a nemesis bot. It was so popular because we’d be creating our own worst nightmare: a distributed, legal way of crawling that is probably the hardest to detect and block.

The project was then generalised from a distributed bot to a job-distribution framework (not limited to crawling) that any service could plug into. However, the team was hungry for a target to test the product on, so why not kill two birds with one stone? Details of the precise application are also omitted.

The Architecture

Project agreed, Ashley took up the reins to design the architecture.

The system was split into 6 components:

JSON-RPC was agreed as the default spit and glue to hold the project together, with a thin wrapper converting it to other protocols where required, and the Crawler was to be implemented as a Python module for simplicity.
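To give a flavour of that glue, here is a minimal sketch of a JSON-RPC 2.0 exchange between two components. The method name `get_job`, the worker id, and the URLs are all hypothetical, not the actual hack-day API; this just shows the request/response shape the components would have traded.

```python
import json

def make_request(method, params, req_id):
    """Build a JSON-RPC 2.0 request string."""
    return json.dumps({"jsonrpc": "2.0", "method": method,
                       "params": params, "id": req_id})

def handle_request(raw, handlers):
    """Dispatch a raw JSON-RPC request to a handler and wrap the result."""
    msg = json.loads(raw)
    result = handlers[msg["method"]](**msg["params"])
    return json.dumps({"jsonrpc": "2.0", "result": result, "id": msg["id"]})

# Hypothetical JobTracker-side handler: hand the next URL to a crawler.
pending = ["http://example.com/a", "http://example.com/b"]
handlers = {"get_job": lambda worker_id: {"url": pending.pop(0),
                                          "worker": worker_id}}

request = make_request("get_job", {"worker_id": "crawler-1"}, 1)
response = json.loads(handle_request(request, handlers))
# response["result"] now carries the job for crawler-1
```

Because everything crosses the wire as JSON, a thin wrapper over these two functions is all that is needed to re-expose the same calls over another transport.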

Work Begins

Jobs were swiftly allocated to people and after a short break for provisions, coding began in earnest at 10:15 am.

A brief but much needed pizza break at 1:30pm topped everyone up with meat and cheese (with one vegetarian top-up). With the clock ticking, and much progress still to be made, people quickly got back to work.

Components started coming together around 6pm, with the completion of the crawler and the launch of the JobTracker on EC2. The crawler was quickly hooked up to the Scraper, which joined the JobTracker on EC2 at 7pm, with communication between the two tested. A simple centralised job requester was pulled together (literally at the 11th hour) as a POC that general Job requesters could be integrated into the system.

The final 45 minutes were spent tracking down type errors in JSON-RPC calls (integers sent as strings, etc.). The whole thing came together at 8:15pm, with us happily scraping CENSORED with distributed, seemingly legitimate traffic.
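That class of bug is easy to hit: JSON-RPC does not enforce parameter types, so one component sending `{"job_id": "42"}` where another expects an integer fails only at use time. A small coercion step at the receiving end catches the mismatch early. This is a sketch with hypothetical field names, not the hack-day code.

```python
import json

def coerce_params(params, schema):
    """Coerce incoming params against a {name: type} schema,
    converting mistyped values (e.g. "42" -> 42) where possible."""
    out = {}
    for name, expected in schema.items():
        value = params[name]
        if not isinstance(value, expected):
            value = expected(value)  # raises if genuinely incompatible
        out[name] = value
    return out

# A crawler sent job_id as a string; the tracker expects ints for both fields.
raw = json.loads('{"job_id": "42", "depth": 3}')
clean = coerce_params(raw, {"job_id": int, "depth": int})
# clean is now {"job_id": 42, "depth": 3}
```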

Conclusions

The day produced a very sophisticated bot, which it probably makes sense to maintain for future testing of our own detection systems—as well as to test how good any other bot-detection services might be. Details of the bot will be kept under lock and key at Spider Towers.

Some Hack-Day Images