Lessons Learned From Building a Twitter Spam Bot
A little less than 72 hours ago, I decided it was time to build up my Ruby chops. By that I mean I wanted to actually build something in Ruby. More importantly, I wanted to build something that was both interesting and worth showing to people. After a solid 10 minutes of consideration, I decided I wanted to make a Twitter bot. Given my interest in programming, I thought it would be fun to build a hybrid that combined Hacker News Bot and Reddit Programming. And that quickly, the idea behind @CoderNews was born.
So fast you’ll freak
I was very surprised at how easy it is to build such a thing. No wonder actual spam bots are always latching on to my account. In just a couple of hours, I whipped up a cluttered hack that aggregated stories from Hacker News and Proggit, pushed it to a virtual machine on my laptop, and watched as the automated tweets came pouring in. I even set up a public repository on GitHub so that the world could see my messy hackjob. Granted, I spent much of the day tweaking various pieces and fixing some bugs that I hadn’t accounted for, but the whole process was surprisingly fast.
The best way to screw up UTF-8 decoding is to not do it at all
The next morning, I went to my bot’s timeline and found two surprises. First, I had somehow tricked 12 people into following a bot that would be spamming them with links every 15 minutes. Second, a lot of the tweets contained garbage text. One of the biggest offenders was â€™. One of the APIs was sending me the HTML-escaped version of UTF-8 encoded text that had been decoded as CP-1252. That hunk of garbage should have been an apostrophe.
Fortunately, the API is open source, and I shot the creator an email. He replied, confirming what I had already concluded and saying that he hopes to push a fix some time soon. If any of you are savvy with Python and are interested in contributing a fix, the GitHub repository can be found here.
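Until a fix lands upstream, the damage can usually be undone on the receiving end. What follows is a minimal sketch, not the bot's actual code: the helper names are hypothetical, and it assumes the API returns the garbled characters as numeric HTML entities (for example &#226;&#8364;&#8482; for the broken apostrophe). The idea is to unescape the entities, re-encode the resulting characters as CP-1252 to recover the original bytes, and then read those bytes as the UTF-8 they were all along.

```ruby
# Hypothetical helpers for illustration; names are not from the bot's source.

# Turn numeric HTML entities (decimal or hex) back into characters.
# Assumption: the API escapes the garbled characters numerically.
def unescape_numeric_entities(text)
  text.gsub(/&#(?:x(\h+)|(\d+));/i) do
    codepoint = $1 ? $1.hex : $2.to_i
    [codepoint].pack("U")
  end
end

# Undo "UTF-8 bytes decoded as CP-1252" damage.
# Re-encoding to Windows-1252 recovers the original bytes; relabeling them
# as UTF-8 restores the intended characters. If either step fails, the text
# was probably not garbled this way, so it is returned unchanged.
def repair_mojibake(text)
  repaired = text.encode("Windows-1252").force_encoding("UTF-8")
  repaired.valid_encoding? ? repaired : text
rescue EncodingError
  text
end

garbled = "It&#226;&#8364;&#8482;s alive"
puts repair_mojibake(unescape_numeric_entities(garbled))
```

The check with valid_encoding? is a heuristic: text that was never garbled almost always fails the round trip and is left alone, but a rare string of legitimate accented characters could be misread as damage.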
You shall not pass
Upon returning home from work yesterday, I went to check on my bot. Because I hadn’t handled any sort of network errors, it had crashed at some point during the day. I went to restart it; no go. I then called up twitter.com in my browser, to find that it wouldn’t load. I temporarily shrugged this off, as Twitter is notorious for having random downtime. A few hours later, I noticed that I still could not get on Twitter. My TweetDeck client was failing to connect as well. Intrigued, I asked my roommate to try. He was able to connect via his mobile phone, but not through the connection from our house. Awesome. I quickly fired up curl to see if I could get any response. Nope. I wasn’t even getting a 4xx or 5xx HTTP response. The best I could do was ping.
It was at that point that I concluded that my bot had somehow pissed off the spam filters on Twitter and I was IP banned. Despite the fact that the API docs say up to 150 requests per hour are allowed (I was making about 8), I felt like this could be the only explanation. Hours had gone by at this point and the issue still existed, so I emailed their support. I explained the monster of a script that I had created, let them know that I had turned it off, and said that all I wanted was to be able to tweet again from my personal account. Ten minutes later, with no response from support, I was able to access Twitter again.
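Ban or no ban, the crash itself came from not handling network errors at all. Here is a minimal sketch of what that handling might look like, with a hypothetical with_retries helper that is not part of the bot's source. It retries a block a few times with exponentially growing pauses and gives up by re-raising the last error. The list of error classes is an assumption about what a flaky connection tends to raise in Ruby, and the sleeper parameter exists only so the pause can be swapped out in tests.

```ruby
require "socket"   # defines SocketError
require "timeout"  # defines Timeout::Error

# Assumed set of errors worth retrying: DNS failures, refused or reset
# connections (Errno::* are SystemCallError subclasses), and timeouts.
NETWORK_ERRORS = [SocketError, SystemCallError, Timeout::Error, IOError].freeze

# Hypothetical helper: run the block, retrying on network errors.
# Waits base_delay, then 2x, then 4x, ... seconds between attempts and
# re-raises the error once max_attempts has been reached.
def with_retries(max_attempts: 3, base_delay: 1.0, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield
  rescue *NETWORK_ERRORS
    raise if attempt >= max_attempts
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end

# Usage with a simulated outage that clears up on the third try.
tries = 0
status = with_retries(base_delay: 0.01) do
  tries += 1
  raise SocketError, "simulated outage" if tries < 3
  "posted after #{tries} tries"
end
puts status
```

Backing off between attempts also keeps a struggling bot from hammering the service, which seems wise for anything that already looks spammy.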
It’s all for the best, of course it is
It’s probably a good thing that my bot only lived to be 2 days old. It was, after all, very spammy. Also, it would have been cumbersome to try to keep a virtual machine constantly running on my laptop. If you’re interested in seeing the mess I created, feel free to fork the GitHub repository. I stand by every line in that script.