The Great SER Wiki Spam Clearout of 2020

From The Sonic Eats Rings Museum
Revision as of 09:04, 24 November 2020 by Sofox

Started in 2008, the SER Wiki was for a time a fun, and occasionally dramatic, part of the SER community.

By 2011, activity and interest in the wiki had decreased, but more importantly... spambots had started to take an interest.

That interest would only grow. Because the wiki ran on the popular MediaWiki software, many spam and SEO bots knew how to log in and make enough posts or profile edits to point links at whatever products they were promoting. A nuisance at first, the problem grew in the years after 2011 until the wiki was flooded with spam and the original content made up only a tiny proportion of it. Eventually the wiki was locked, ensuring that nobody, spambot or SERfer, could change a word on it again. Later it was taken offline, though Spazz made a backup of the wiki as it stood to an SQL file.

Over the years there was a desire to clean up the wiki. Sofox offered to do it, but phpMyAdmin struggled to upload the 1.7GB SQL file, and he gave up.

Years later, Sofox gained more programming and database experience, and realised his mistake had been trying to upload the file through a web interface when he should have done everything from the command line. Despite this revelation, he didn't do much with it.

In August 2020, in the now populated and somewhat active SER Discord server, talk turned to the wiki and there was a desire to see it restored. Sofox decided to give it another go!

He logged into his hosting server and, using the command line, imported the SQL file into an actual MySQL database. Next, he needed to find out which software the SER wiki had used. For some reason, a site called WikiApiary had a listing for the Sonic Eats Rings wiki that recorded its software version: MediaWiki 1.11.1. Sofox could have installed that exact version of MediaWiki on the new server, but decided to go with the latest stable version instead. MediaWiki was thankfully still going strong, but after 8 years of updates it was now at version 1.34.2. He installed a clean copy on the server, hooked it up to the database... and it didn't work.

So he went back and installed MediaWiki 1.11, hooking the original version up to the database instead. He found that version on MediaWiki's website (even though it wasn't directly linked), installed it and... it didn't work. A bunch of PHP errors. Clearly it was designed for an older version of PHP than the one provided by the server's host.

Back to the upgrade page (https://www.mediawiki.org/wiki/Manual:Upgrading), which seemed to imply that the database could be upgraded in one fell swoop. So he tried again with the most recent version: installing it directly on the server, setting it up, configuring... more tweaking... it worked!

Only, without the images... Spazz kindly gave Sofox access to the original images used on the SER wiki, so he could manually FTP them onto the new server. It worked. The wiki was up and running with its pages, data... and huge amounts of spam.

Sofox left it at that for a while, putting the wiki in read-only mode while he researched the spam problem. That while stretched into months, and he finally returned to it in November. It was dispiriting: there was just so much spam. He needed to understand the database and its relations, and get rid of the spam accounts, revisions, text, pages, etc. After trying different tools and doing lots of research, he came across a page with a lot of fantastic advice on it. Combining that with his own analysis of the database, Sofox gradually targeted all the accounts created after a certain date and deleted their associated contributions. He also had to rebuild and reset a few things afterwards.
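The selection step described above (flag every account registered after a cutoff, then treat all of its edits as spam) can be pictured with a small sketch. This is not Sofox's actual procedure and does not touch the real MediaWiki tables: the `User` and `Revision` classes, the `plan_cleanup` helper, the account names and the cutoff value are all hypothetical. The only assumptions about MediaWiki itself are that registration times are 14-digit `YYYYMMDDHHMMSS` timestamps and that very old accounts may have no recorded registration date (modelled here as `None` and kept).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    user_id: int
    name: str
    # MediaWiki-style 14-digit timestamp (YYYYMMDDHHMMSS); None for accounts
    # that predate registration tracking (assumption for this sketch).
    registration: Optional[str]


@dataclass
class Revision:
    rev_id: int
    page_id: int
    user_id: int


def plan_cleanup(users, revisions, cutoff):
    """Return (spam user ids, spam revision ids, page ids with no legitimate revisions).

    Hypothetical helper: it only works out what to delete; it deletes nothing.
    """
    # Fixed-width timestamps compare correctly as plain strings.
    spam_users = {
        u.user_id
        for u in users
        if u.registration is not None and u.registration >= cutoff
    }
    # Every edit made by a flagged account counts as spam.
    spam_revs = {r.rev_id for r in revisions if r.user_id in spam_users}
    # A page goes only if none of its revisions came from a legitimate account.
    all_pages = {r.page_id for r in revisions}
    kept_pages = {r.page_id for r in revisions if r.rev_id not in spam_revs}
    return spam_users, spam_revs, all_pages - kept_pages


# Example data (entirely made up).
users = [
    User(1, "LongtimeSERfer", "20080301120000"),
    User(2, "VeryOldAccount", None),
    User(3, "CheapPillsBot", "20120515080000"),
]
revisions = [
    Revision(10, 100, 1),  # legitimate edit
    Revision(11, 100, 3),  # spam edit on a legitimate page
    Revision(12, 200, 3),  # spam page created by a bot
    Revision(13, 300, 2),  # legitimate edit by an old account
]
result = plan_cleanup(users, revisions, cutoff="20110101000000")
print(result)  # ({3}, {11, 12}, {200})
```

Page 100 survives because it still has a legitimate revision, while page 200 existed only because of the bot. In a real MediaWiki database, removing rows like these would presumably leave some counters and pointers out of date, which may be the kind of thing the "rebuild and reset" work was for.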

The result? The database shrank from 1.7GB to just 23MB. The page count went from 27,389 to 1,452, and the Member List from 51,733 to 120.

Yeah, spam sucks.

But finally, the database was restored, the wiki unlocked, and the denizens of SER able to claim it for their own once more.


cleaning out the spambot accounts and their posts after the SER wiki had been left idle for years. Ultimately, reduced the entire wiki database size from 1.7GB to 23MB... just by deleting spam.