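After data is deleted from a website, some search engine spiders (Baidu and Sogou, for example) still foolishly come back every day to collect a 404. These brutes basically never read the robots.txt file. The 410 code says that a file is Gone, not just temporarily missing. This article introduces the 410 code and several ways to implement it (such as Rewrite). Original link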


Let’s all talk about HTTP error code 410.

As far as I can tell, it’s the forgotten stepchild of error 404 (Not Found). Error 410 means Gone, as in, a resource used to exist at this location, but now it’s gone. Not only is it gone, but I don’t know (or I don’t want to tell you) where it went. If I knew where it went, and I wanted to tell you, I would use status code 301 (Moved Permanently) and any smart client would simply follow the redirect to the new address. But 410 means Gone, no forwarding address. Train gone sorry.

Somewhere in my audience is an HTTP guru who can tell me if I’m getting this right.

Now, there is not a lot of information about error 410. Oh sure, you can search for http error 410 on Google and come up with lots of hits, but they’re all just pages that list all the error codes and give a brief description of each. No docs, no further explanation. I suppose because it addresses a condition that doesn’t come up very often. Also, we’ve all been brainwashed into believing that all resources should be permanent, which simply isn’t true.

Embracing HTTP error code 410 means embracing the impermanence of all things.

Now then, on to implementation. Scouring the Apache documentation, I’ve found several ways to specify that a resource is Gone. The first uses Redirect:

Redirect gone /path/to/resource

For example, if I put up a temporary page:

http://diveintomark.org/tmp/some-screenshot.png

…and I later wanted to delete it to save space (or whatever), I should put this in my .htaccess file:

Redirect gone /tmp/some-screenshot.png

The path is the virtual path of the resource on my server, not the full filename on disk, and not the full URL.
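To make the contrast with a permanent redirect concrete, here is a minimal sketch of both forms side by side. The paths and the example.com target are placeholders, not addresses from my site. Note that the gone form takes no target URL at all, because there is no forwarding address to give.

```apache
# Resource moved and the new location is known: clients receive 301.
# (Placeholder paths and host, for illustration only.)
Redirect permanent /tmp/old-screenshot.png http://example.com/images/screenshot.png

# Resource deleted with no forwarding address: clients receive 410.
Redirect gone /tmp/some-screenshot.png

# The numeric status works too; for non-3xx codes the URL argument is omitted.
Redirect 410 /tmp/another-screenshot.png
```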

You can also use RedirectMatch to match multiple files, using regular expressions. For instance, this would match all files in my tmp/ directory named something-screenshot.png:

RedirectMatch gone /tmp/.*-screenshot\.png
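One caveat: the pattern is a regular expression tested against the URL path, and as written it is not anchored, so it would also match something like /archive/tmp/foo-screenshot.png.bak. If that is not what you want, an anchored variant along these lines is tighter (a sketch, assuming the files sit directly in /tmp/ with no subdirectories):

```apache
# ^ and $ pin the match to the whole URL path;
# [^/]+ keeps the match inside /tmp/ itself rather than any subdirectory.
RedirectMatch gone ^/tmp/[^/]+-screenshot\.png$
```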

The third option is to use mod_rewrite, which allows you to use complex conditionals to decide when to serve up the 410 Gone error. For example, I have a mobile edition that contains an index page and stripped-down pages of the most recent 5 articles. These article pages are not meant to be permanent; the whole thing acts like an RSS feed, except that it’s split across several pages because that’s how mobile devices expect it. Each page has its own separate address, but it only lives for a short time. I also can’t reuse the same URLs over and over, like always putting the articles in /mobile/1 through /mobile/5, because that would confuse AvantGo’s caching proxies.

So after articles fall off the mobile index page, I delete them from my server, and I want to serve up the appropriate error code to proxies and robots that come looking for them later. From what I can tell, 410 is the perfect error code for this. I don’t want to manually maintain a list of Redirect rules, though, so I use mod_rewrite:

# mod_rewrite must be switched on before any rules are evaluated
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule mobile/[0-9]{6}\.html$ - [G,L]

In English, this says:

  1. if there is a request for a file that doesn’t exist (that’s the !-f test),
  2. and the file is in the mobile/ directory and named any-six-digits + .html (that’s the regular expression),
  3. then use HTTP error code 410 Gone (that’s the G flag)
  4. and immediately return without further ado (that’s the L flag; the current Apache documentation says G already implies it, so it is harmless but redundant).

This is not a perfect solution; if you randomly type arbitrary URLs of pages that never existed but that match the pattern, you’ll get error 410 instead of the more proper 404. But it does cover the more likely case of a proxy or search engine coming back to a resource it previously spidered, and finding that it no longer exists. (Update: read the comments for some possible solutions to this problem.)
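One possible way to tell deleted pages apart from pages that never existed is to leave a marker behind at deletion time. The sketch below is an assumption for illustration, not the setup described above: it supposes that the .htaccess file sits in the document root, and that whenever /mobile/123456.html is deleted, an empty file is created at gone-markers/mobile/123456.html under the document root. The gone-markers directory name is hypothetical.

```apache
RewriteEngine On

# The requested file itself must no longer exist...
RewriteCond %{REQUEST_FILENAME} !-f
# ...and a marker must exist for it ($1 is the path captured by the rule below).
RewriteCond %{DOCUMENT_ROOT}/gone-markers/$1 -f
# Only then answer 410 Gone; any other missing page falls through to a normal 404.
RewriteRule ^(mobile/[0-9]{6}\.html)$ - [G,L]
```

The trade-off is that you now keep one tiny file per deleted page, and you would probably want to block direct requests to the marker directory, but an arbitrary made-up URL that matches the pattern gets the more proper 404.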

When a client requests a page that you have marked as 410 Gone, Apache generates a default error page that looks like this:

Gone

The requested resource
/path/to/requested/resource
is no longer available on this server and there is no forwarding address. Please remove all references to this resource.

…which is fine as far as it goes, but it’s about as aesthetically pleasing as the default 404 Not Found error. However, you can create a custom 410 Gone page, in much the same way you can create a custom 404 Not Found page, by using the ErrorDocument directive in your .htaccess file:

ErrorDocument 410 /path/to/custom/page

Again, this is the virtual path on your server, which is probably just the web address without your domain name. (It could also be a fully-qualified URL to a remote machine, but in that case, the client would not receive HTTP error code 410; they would receive a redirect status code instead. So it’s probably best to keep it local, so clients get both the custom page and the intended 410 error code.)
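For reference, here are the two local forms side by side, as a small sketch. The /errors/gone.html path and the message text are placeholders, and the quoted-string form shown is the Apache 2.x syntax.

```apache
# Local document: the client gets the custom page together with status 410.
ErrorDocument 410 /errors/gone.html

# Alternative: a short inline message instead of a separate file.
# (Use one form or the other; a later directive overrides an earlier one.)
ErrorDocument 410 "This page has been removed and there is no forwarding address."
```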

Other possible uses for 410:

  • Temporary pages (like screenshots)
  • Deleted blog posts
  • A trial site on a server that offers hosted sites free for 30 days, after the trial period has expired and the user hasn’t paid to continue it
  • User sites on hosted servers like Geocities, after the user has been deleted
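The last case, for example, might look something like this on the hosting server. This is a hypothetical sketch: the account name olduser and the /~user/ URL layout are assumptions.

```apache
# Everything under the deleted user's site, including the bare /~olduser, is Gone.
RedirectMatch gone ^/~olduser(/.*)?$
```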

I don’t know. It’s not a very common situation, and it’s not a very common error code. Which is probably why no one has written a tutorial about it before. I’m not even totally convinced that I’m using it correctly, although I am convinced that there are people reading this who know more about it than I do, and who have an opinion about whether I’m using it correctly.
