Stats on Spam

February 23, 2010 in Personal Projects

I’ve now been collecting spam for a little over a year, for a purpose I do not know yet. I had more or less forgotten about the project until just recently, so I started harvesting some stats from the database.

Currently we have 729,255 records in the database. That’s 3 MB of data.

Month Spam Submissions
January, 2009 3
February, 2009 1
March, 2009 3
April, 2009 11689
May, 2009 28525
June, 2009 33900
July, 2009 35965
August, 2009 66018
September, 2009 89269
October, 2009 95155
November, 2009 71148
December, 2009 73367
January, 2010 97242
February, 2010 126986
Total 729271

As you can see, the numbers are growing exponentially.
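One rough way to check that claim is to look at month-over-month ratios: under exponential growth, each month's count divided by the previous month's would stay roughly constant and above 1. The sketch below uses the counts from the table, starting at April 2009 because the first three months are near zero. The name `growth_ratios` is made up for this illustration, and the February 2010 figure covers only part of the month, given the date of this post.

```python
# Monthly spam counts copied from the table above, April 2009 onward.
# January to March 2009 are left out: counts of 1 to 3 would make the
# first ratios meaninglessly large.
monthly = [
    ("2009-04", 11689),
    ("2009-05", 28525),
    ("2009-06", 33900),
    ("2009-07", 35965),
    ("2009-08", 66018),
    ("2009-09", 89269),
    ("2009-10", 95155),
    ("2009-11", 71148),
    ("2009-12", 73367),
    ("2010-01", 97242),
    ("2010-02", 126986),  # partial month
]


def growth_ratios(series):
    """Return (label, ratio) pairs: each count divided by the previous one."""
    return [
        (cur_label, cur / prev)
        for (_, prev), (cur_label, cur) in zip(series, series[1:])
    ]


ratios = growth_ratios(monthly)
for label, ratio in ratios:
    print(f"{label}: x{ratio:.2f}")
```

The ratios are far from constant, and November 2009 comes out below 1, since the count fell from 95155 to 71148.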

I actually find this whole thing really funny. The irony of software sending in vast amounts of such useless data is hard to miss.

[Figure: spam chart]

Hardest working IP addresses

There are 28,942 IP addresses that have submitted spam, so each IP address has submitted around 25 entries on average. The hardest-working IP addresses are as follows:

IP Spam Submissions
18022
17410
15772
14782
12933
12353
12269
12242
11838
11473
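The arithmetic behind these figures can be sketched in a few lines. The numbers are the ones quoted in this post; the variable names are made up for the illustration. It computes the average per IP address and the share of all records that came from the ten busiest addresses.

```python
# Figures quoted in the post.
total_records = 729_255
unique_ips = 28_942
top_ten = [18022, 17410, 15772, 14782, 12933, 12353, 12269, 12242, 11838, 11473]

# Average number of submissions per IP address.
mean_per_ip = total_records / unique_ips

# How much of the whole database the ten busiest addresses account for.
top_ten_total = sum(top_ten)
top_ten_share = top_ten_total / total_records

print(f"mean per IP: {mean_per_ip:.1f}")      # 25.2
print(f"top ten total: {top_ten_total}")      # 139094
print(f"top ten share: {top_ten_share:.1%}")  # 19.1%
```

So the ten busiest addresses account for roughly a fifth of all records, which means the average of about 25 says little about what a typical address submits.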

What’s next?

Now I’ve just got to find out what to do with all that “data”. I thought about creating an API for querying the database, but I wouldn’t really know what for, except for artistic purposes.

Anybody got any ideas?

edit: I added the IP address stats

  • Ronnie Schwartz

    I’ve just found this website and read about your project, and I think it’s a nice idea.

    Could you please consider sharing the code base at Github or Bitbucket?

    I’d love to help a bit with the API implementation and with the codebase in general, and I think some other guys would occasionally submit stuff.

    • Arnór Heiðar

      That’s a great idea!

      Unfortunately, the code is actually very ugly, hacked-together kind of stuff, so it’s not very publishable. I think the data is the main issue.

      There’s also the option of using other sources for data as well, so the front-end spam-collecting code is really not so important.

      I would love to publish any code written for the API, or any code used to read from the DB, etc.

  • John Haugeland

    A little confused about the meaning of the word “exponential,” are we?

    • Arnór Heiðar

      LOL, yes.