Lessons Learned From Building a Twitter Spam Bot

A little less than 72 hours ago, I decided it was time to build up my Ruby chops. By that I mean I wanted to actually build something in Ruby. More importantly, I wanted to build something that was both interesting and worth showing people. After a solid 10 minutes of consideration, I decided to make a Twitter bot. Given my interest in programming, I thought it would be fun to build a hybrid of Hacker News Bot and Reddit Programming. And just like that, the idea behind @CoderNews was born.

So fast you’ll freak

I was surprised at how easy it was to build such a thing. No wonder actual spam bots are always latching on to my account. In just a couple of hours, I whipped up a cluttered hack that aggregated stories from Hacker News and Proggit (Reddit's r/programming), pushed it to a virtual machine on my laptop, and watched the automated tweets come pouring in. I even set up a public repository on GitHub so the world could see my messy hackjob. Granted, I spent much of the day tweaking various pieces and fixing bugs I hadn't accounted for, but the whole process was surprisingly fast.
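The post doesn't show the bot's code, but the heart of an aggregator like this is small: merge the stories from each source, skip anything already tweeted, and squeeze title plus link into one tweet. Here is a hypothetical sketch of that core. `Story`, `merge_stories`, and `format_tweet` are invented names, the 140-character limit is the classic tweet length, and length is counted in raw characters, which ignores Twitter's own link shortening.

```ruby
require "set"

# Hypothetical story record; the real bot's data model isn't shown in the post.
Story = Struct.new(:title, :url, :source)

TWEET_LIMIT = 140

# Merge stories from several sources (e.g. Hacker News and Proggit),
# dropping any URL that has already been seen or tweeted.
def merge_stories(feeds, seen_urls)
  feeds.flatten.each_with_object([]) do |story, fresh|
    next if seen_urls.include?(story.url)
    seen_urls << story.url
    fresh << story
  end
end

# Build "title url", truncating the title with an ellipsis so the whole
# tweet stays within the limit.
def format_tweet(story, limit: TWEET_LIMIT)
  room = limit - story.url.length - 1 # one space separates title and URL
  return story.url if room < 2         # no space left for any title text
  title = story.title
  title = title[0, room - 1].rstrip + "…" if title.length > room
  "#{title} #{story.url}"
end

seen = Set.new
hn = [Story.new("Show HN: a thing", "http://a.example", :hn)]
proggit = [Story.new("A thing", "http://a.example", :proggit)]
merge_stories([hn, proggit], seen).each { |s| puts format_tweet(s) }
```

Keeping the set of seen URLs outside the function lets a long-running loop remember what it has already posted across every 15-minute cycle.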

The best way to screw up UTF-8 decoding is to not do it at all

The next morning, I checked my bot's timeline and found two surprises. First, I had somehow tricked 12 people into following a bot that would spam them with links every 15 minutes. Second, a lot of the tweets contained garbage text. One of the biggest offenders was â€™. One of the APIs was sending me HTML-escaped text that had been encoded as UTF-8 and then mis-decoded as CP-1252. That hunk of garbage should have been an apostrophe (’).
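This kind of damage is reversible when you know which wrong decoding happened: the apostrophe ’ is the UTF-8 bytes E2 80 99, and CP-1252 reads those three bytes as â, €, and ™. A minimal Ruby sketch of undoing it, assuming the API's "HTML-safe" output means numeric entities such as `&#8482;` and that the text went through exactly one UTF-8 to CP-1252 mis-decoding. `decode_numeric_entities` and `fix_mojibake` are hypothetical helpers, not part of the bot or the API.

```ruby
# Hypothetical helpers; the post doesn't show how (or whether) the bot
# repaired the text.

# Turn numeric HTML entities such as "&#8482;" or "&#x2122;" back into characters.
def decode_numeric_entities(text)
  text.gsub(/&#(\d+);|&#x(\h+);/i) do
    code = $1 ? $1.to_i : $2.to_i(16)
    code.chr(Encoding::UTF_8)
  end
end

# Undo UTF-8 bytes that were wrongly decoded as CP-1252: re-encode to
# CP-1252 to recover the original bytes, then read those bytes as UTF-8.
def fix_mojibake(text)
  repaired = text.encode(Encoding::Windows_1252).force_encoding(Encoding::UTF_8)
  # If the recovered bytes aren't valid UTF-8, the text wasn't mangled this way.
  repaired.valid_encoding? ? repaired : text
rescue Encoding::UndefinedConversionError
  # The text contains characters CP-1252 can't represent, so it can't be
  # the product of this mis-decoding; leave it alone.
  text
end

raw = "It&#226;&#8364;&#8482;s alive"
puts fix_mojibake(decode_numeric_entities(raw)) # prints: It’s alive
```

Because the sketch falls back to the original string whenever re-encoding fails or the result isn't valid UTF-8, text that was already correct passes through unchanged. The real fix, of course, belongs upstream in the API.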

Fortunately, the API is open source, so I shot the creator an email. He replied, confirming what I had already concluded, and said he hoped to push a fix soon. If any of you are savvy with Python and interested in contributing a fix, the project's repository is on GitHub.

You shall not pass

When I got home from work yesterday, I went to check on my bot. Because I hadn't handled any network errors, it had crashed at some point during the day. I tried to restart it; no go. I then pulled up twitter.com in my browser, only to find that it wouldn't load. At first I shrugged this off, since Twitter is notorious for random downtime. A few hours later, I still couldn't get on Twitter, and my TweetDeck client was failing to connect as well. Intrigued, I asked my roommate to try. He could connect from his mobile phone, but not through our home connection. Awesome. I quickly fired up curl to see if I could get any response. Nope. I wasn't even getting a 4xx or 5xx HTTP response. The only thing that worked was ping.
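A crash like this is usually avoided by wrapping each network call so that transient failures are retried instead of killing the process. A minimal sketch in Ruby, assuming the bot's per-cycle work can be passed in as a block; `with_retries` and the exact list of error classes are illustrative choices, not what the original script did.

```ruby
require "net/http"
require "socket"

# Errors that usually mean "the network hiccuped", not "the code is wrong".
# Net::OpenTimeout and Net::ReadTimeout are subclasses of Timeout::Error.
NETWORK_ERRORS = [
  SocketError, Timeout::Error, EOFError,
  Errno::ECONNREFUSED, Errno::ECONNRESET, Errno::EHOSTUNREACH, Errno::ETIMEDOUT
].freeze

# Run the block, retrying with exponential backoff (1s, 2s, 4s, ...) on
# network errors. Once max_attempts is used up, the error is re-raised so
# a persistent outage is still visible.
def with_retries(max_attempts: 5, base_delay: 1.0, sleeper: method(:sleep))
  attempt = 0
  begin
    attempt += 1
    yield
  rescue *NETWORK_ERRORS => e
    raise if attempt >= max_attempts
    delay = base_delay * (2**(attempt - 1))
    warn "network error (#{e.class}); retry #{attempt} in #{delay}s"
    sleeper.call(delay)
    retry
  end
end
```

In a bot's main loop, each fetch-and-tweet step would run as `with_retries { ... }`. The injectable `sleeper` exists only so the backoff can be tested without waiting. Note that retries only smooth over brief outages; against a block like the one described here, the bot would still give up after the last attempt, which is arguably the right behavior.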

At that point I concluded that my bot had somehow tripped Twitter's spam filters and gotten my IP address banned. Even though the API docs allow up to 150 requests per hour (I was making about 8), it seemed like the only explanation. Hours had gone by and the problem persisted, so I emailed Twitter support. I explained the monster of a script I had created, let them know I had turned it off, and said that all I wanted was to be able to tweet from my personal account again. Ten minutes later, with no response from support, I was able to access Twitter again.
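Staying under a documented request ceiling is easy to enforce on the client side. A minimal sketch of a sliding-window limiter in Ruby, using the 150-requests-per-hour figure from the API docs as its default; `SlidingWindowLimiter` is a hypothetical class, and since the bot was already making only about 8 requests an hour, something like this would not by itself have prevented the block described above.

```ruby
# Allow at most `limit` calls in any `window`-second span.
# Hypothetical; the original bot had no rate limiting.
class SlidingWindowLimiter
  def initialize(limit: 150, window: 3600,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @limit = limit
    @window = window
    @clock = clock
    @timestamps = []
  end

  # Returns true (and records the call) if another request fits in the window.
  def allow?
    now = @clock.call
    # Forget calls that have aged out of the window.
    @timestamps.shift while !@timestamps.empty? && now - @timestamps.first >= @window
    return false if @timestamps.size >= @limit
    @timestamps << now
    true
  end
end
```

Before each API call, the bot would check `limiter.allow?` and sleep if it returns false. A monotonic clock is used so that system clock adjustments can't shrink or stretch the window; the `clock` parameter is there so the limiter can be tested with a fake clock.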

It’s all for the best, of course it is

It’s probably a good thing that my bot only lived to be 2 days old.  It was, after all, very spammy.  Also, it would have been cumbersome to try to keep a virtual machine constantly running on my laptop.  If you’re interested in seeing the mess I created, feel free to fork the GitHub repository.  I stand by every line in that script.


3 Responses to Lessons Learned From Building a Twitter Spam Bot

  1. Eliot says:

    This was a great read. I’m sorry it had to go down though. I actually enjoyed the steady stream of interesting news while I was at work. Great work though, Matt. That’s awesome.

  2. GDubClay says:

    The proliferation of Twitter spam bots is partially attributed to Twitter’s laziness over the years of not offering any legitimate advertising options for smaller companies. Twitter has a very tight rope to walk across because the biggest advantage that Twitter has is its ecosystem – Twitter is going penny wise and pound foolish by trying to shut out their biggest supporters IMO. I think that Twitter squandered the last few years while Facebook was innovating and creating and emphasizing more of an actual method of making money. They created an intense ad network, the overall ecosystem developed with hundreds of companies listed at BuyFacebookFansReviews that do nothing other than promote Facebook pages. Meanwhile, Twitter has only enhanced its relationship with large advertisers and has completely deemphasized small businesses using Twitter to promote themselves by setting large minimum ad spends and only allowing large brands to have access to certain features. Twitter shutting out its 3rd party developers is going to decrease the amount that 3rd parties promote Twitter and that is risky for Twitter. While Twitter is popular now, that hangs on by a thread because MySpace and Digg used to be kings of the castle too and look what happened to them.
