This content originally appeared on Perishable Press and was authored by Jeff Starr
My free WordPress plugin, Blackhole for Bad Bots, and the pro version, Blackhole Pro, are not compatible with cache plugins. This is stated on the plugin home page, installation page, support page, readme.txt, documentation, plugin settings page, help tab, and just about every other possible location.
Contents
- Caching can break things
- Search Console can help
- Blackhole Cleaner
- Example
- How to use
- Download
- Notes
- Questions
Caching can break things
Why doesn’t it work with caching? Because caching can prevent dynamic plugins and scripts from working correctly. Instead of serving dynamic content, sites with caching serve static (cached) pages, and that includes the Blackhole “trap” page and the “you have been banned” page. So when search engines crawl the site, they pick up the cached blocked pages along with the corresponding ?blackhole query parameter in the URLs. The cached blocked pages and ?blackhole URLs then show up in the search results.
So hopefully you have heeded the warning NOT to use any cache plugin together with Blackhole (either the free or the pro version). But if, for whatever reason, you used Blackhole on a site with caching, you may want to clean up any ?blackhole URLs that are appearing in the search results.
Search Console can help
It’s not easy to convince Google, Bing, and other search engines to change or update the pages and URLs they have collected in their search results. You can visit Search Console, or whatever tools a search engine provides, and manually request that specific pages/URLs be de-indexed, so they are removed and eventually replaced with current, correct versions. But updating, removing, and recrawling pages takes time, even in the best case. Fortunately, there is a simple plugin that can help facilitate the process.
Blackhole Cleaner
The Blackhole Cleaner plugin is a super simple plugin that does one thing and one thing only: it removes the ?blackhole parameter from all query strings on your site. So every time a bot or human visits your site by following a ?blackhole URL, the Blackhole Cleaner plugin removes the ?blackhole parameter from the query string via a simple redirect. All other query parameters are left untouched, so everything works exactly as it did before, just without the blackhole parameter in any URLs.
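To make the “remove one parameter, keep the rest” behavior concrete, here is a minimal, framework-free PHP sketch. It is not the Blackhole Cleaner source (the plugin itself relies on WordPress functions such as remove_query_arg()), and the function name strip_query_param() is hypothetical. It works on the raw query string, so the remaining parameters keep their original order and encoding.

```php
<?php
// Hypothetical helper for illustration; not the Blackhole Cleaner source.
// Removes every occurrence of one query parameter from a URL and keeps all
// other parameters exactly as written (same order, same encoding).
function strip_query_param(string $url, string $param = 'blackhole'): string
{
    // Set aside any #fragment so it can be re-attached unchanged.
    $fragment = '';
    $hashPos = strpos($url, '#');
    if ($hashPos !== false) {
        $fragment = substr($url, $hashPos);
        $url = substr($url, 0, $hashPos);
    }

    $qPos = strpos($url, '?');
    if ($qPos === false) {
        return $url . $fragment; // no query string, nothing to strip
    }

    $base = substr($url, 0, $qPos);
    $pairs = explode('&', substr($url, $qPos + 1));

    // Keep each non-empty name=value pair whose (decoded) name is not $param.
    $kept = array_filter($pairs, function (string $pair) use ($param): bool {
        $name = explode('=', $pair, 2)[0];
        return $pair !== '' && urldecode($name) !== $param;
    });

    $query = implode('&', $kept);
    return $base . ($query === '' ? '' : '?' . $query) . $fragment;
}

// prints https://example.com/ai-will-replace-the-internet/
echo strip_query_param('https://example.com/ai-will-replace-the-internet/?blackhole=1234567890'), "\n";
// prints https://example.com/page/?utm_source=news&p=2#top
echo strip_query_param('https://example.com/page/?utm_source=news&blackhole=123&p=2#top'), "\n";
```

Passing a different name as the second argument (for example 'whatever') covers a customized parameter. Editing the raw string, rather than round-tripping through parse_str() and http_build_query(), avoids re-encoding the other parameters; parse_str() also converts dots and spaces in parameter names to underscores.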
Example
For example, let’s say that you’re finding the following URLs included in search results:
https://example.com/ai-will-replace-the-internet/?blackhole=1234567890
https://example.com/ai-has-not-reached-singularity-yet-but-soon/?blackhole=0987654321
https://example.com/ai-will-require-complete-compliance-for-access/?blackhole=192837465
Notice the ?blackhole parameter in each URL? That’s what the Cleaner plugin removes. So after installing and activating Blackhole Cleaner, the above example URLs will get scrubbed clean:
https://example.com/ai-will-replace-the-internet/
https://example.com/ai-has-not-reached-singularity-yet-but-soon/
https://example.com/ai-will-require-complete-compliance-for-access/
How to use
The Blackhole Cleaner is not a one-stop, set-it-and-forget-it solution for dealing with unwanted indexed pages in search results. Instead, it is best used as part of a complete “clean up” strategy. The following strategy assumes that you want to keep using your cache plugin and remove the Blackhole plugin.
Step 1: Remove the Blackhole plugin via the WordPress Plugins screen by clicking the remove/delete button. This ensures that all options and plugin data are removed properly from the site database.
Step 2: Once the plugin is removed, clear/empty/reset the cache plugin. Different cache plugins and apps use different terminology for “resetting” the cache, but the point is to make sure that the cache no longer contains any previously cached pages or content. So dump or flush all of the currently cached pages, everything, and start over from scratch.
Step 3: Visit the search console for Google, Bing, and any other search engine that you care about (if it has one; not all do). The search console should provide a tool or some way to report any URLs and pages that should not be indexed. This is where you can let the search engines know about any pages that need to be re-crawled and re-indexed with fresh, current content. As mentioned, this process can take some time, but the Cleaner plugin can help facilitate it and speed things up a bit.
Step 4: Install the free Blackhole Cleaner plugin. There is a download link below. No changes need to be made to the plugin; it’s entirely plug-&-play, with no settings or configuration required.
Lastly, make sure to test well and keep an eye on things. Update the search console as needed, making sure to add any new ?blackhole pages that may appear in the search results. In time, the ?blackhole URLs will be replaced in the search index with clean URLs and fresh copies of your web pages.
At some point, when you no longer find any blackhole parameters in the search results, it should be fine to remove the Cleaner plugin.
Download
Here you can download the Blackhole Cleaner WordPress plugin for free.
After downloading, install and activate just like any other WordPress plugin.
Notes
Change response status code
By default, the Cleaner plugin redirects any ?blackhole URLs with a 301 “Permanent” response status. To change that to a 302 “Temporary” status, simply replace 301 with 302 in the plugin code. Take a look at the code; it is as simple and clear as I could make it 🙂 There is only one instance of 301, so you know exactly where to make the change.
If you would rather block any ?blackhole requests (instead of redirecting them), replace the following line:
wp_safe_redirect($url, '301');
..with this:
http_response_code(403);
Save changes and done. Remember to test well before going live with any changes. Note that it’s very bad practice to block users without giving them a chance, an explanation, or anything else. So only implement blocking in the Cleaner plugin if you know 100% what you are doing. Otherwise, leaving it set to redirect via 301 status is recommended.
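The three outcomes described in these notes (301, 302, or 403) can be summarized in a small, framework-free PHP sketch. The function cleaner_response() and its return shape are hypothetical and not part of the plugin; it only records which status code and headers each choice produces. One practical detail for real WordPress code: wp_safe_redirect() does not end the request by itself and is normally followed by exit;, and http_response_code(403) only sets the status code, so the rest of the page still renders unless the request is ended afterward.

```php
<?php
// Hypothetical, framework-free sketch; not the Blackhole Cleaner source.
// Describes the response for a request that carries the blackhole parameter.
function cleaner_response(string $cleanUrl, int $status = 301): array
{
    if ($status === 301 || $status === 302) {
        // 301 = permanent, 302 = temporary; both send the visitor (or bot)
        // to the cleaned URL via a Location header.
        return ['status' => $status, 'headers' => ['Location' => $cleanUrl]];
    }

    if ($status === 403) {
        // 403 = forbidden; no Location header, so there is no redirect.
        return ['status' => 403, 'headers' => []];
    }

    throw new InvalidArgumentException("Unsupported status code: $status");
}
```

For example, cleaner_response('https://example.com/ai-will-replace-the-internet/') describes the default 301 redirect, while passing 403 as the second argument describes blocking with no redirect target.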
Remove any other cleanup techniques
The Cleaner plugin is super lightweight and fast, and it replaces any previous techniques, such as this simple Apache/.htaccess directive:
<IfModule mod_rewrite.c>
RewriteCond %{QUERY_STRING} (blackhole) [NC]
RewriteRule (.*) - [F,L]
</IfModule>
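There are variations on the above code. As written, the above example blocks, with a 403 Forbidden response (the [F] flag), any request whose query string contains the text blackhole in any letter case (the [NC] flag). There are also redirect versions of the above code, and so forth. The point here is that, if you are using anything like the above code to modify or remove blackhole URLs, make sure to remove it before using the Cleaner plugin. Otherwise, you will get unexpected results; for example, the blocking rule above stops matching requests before WordPress, and therefore the Cleaner plugin, ever runs.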
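To see why the .htaccess rule is broader than a parameter check, the PHP sketch below approximates both conditions. Both function names are hypothetical. The first mimics RewriteCond %{QUERY_STRING} (blackhole) [NC], an unanchored, case-insensitive pattern that matches blackhole anywhere in the raw query string. The second approximates isset($_GET['blackhole']) using parse_str(), which parses a query string the same way PHP populates $_GET.

```php
<?php
// Hypothetical helpers for illustration only.

// Approximates: RewriteCond %{QUERY_STRING} (blackhole) [NC]
// The pattern is unanchored and case-insensitive, so it matches the text
// "blackhole" anywhere in the raw query string.
function htaccess_rule_matches(string $queryString): bool
{
    return preg_match('/blackhole/i', $queryString) === 1;
}

// Approximates: if (isset($_GET['blackhole']))
// True only when a parameter is actually named "blackhole" (case-sensitive).
function has_blackhole_param(string $queryString): bool
{
    parse_str($queryString, $params);
    return isset($params['blackhole']);
}

foreach (['blackhole=123', 's=blackhole', 'BlackHole=1', 'p=2'] as $qs) {
    printf(
        "%-14s rule: %-5s param: %s\n",
        $qs,
        var_export(htaccess_rule_matches($qs), true),
        var_export(has_blackhole_param($qs), true)
    );
}
```

A WordPress search such as ?s=blackhole, or a differently cased BlackHole=1, trips the rewrite rule but not the parameter check, which is one more reason to retire the old directive in favor of the plugin.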
Custom blackhole parameters
If you are using the pro version of Blackhole, there is an option to customize the blackhole query parameter. So if you did this and replaced blackhole with whatever, change the following lines in the Cleaner plugin:
First, change this line:
if (isset($_GET['blackhole'])) {
..to this:
if (isset($_GET['whatever'])) {
And then also change this line:
$url = remove_query_arg('blackhole', $url);
..to this:
$url = remove_query_arg('whatever', $url);
Save changes and done. Remember to test well before going live with any changes.
Questions?
Drop a comment below or reach me directly anytime via my contact form.