HTTP Content Checks

About HTTP Content Checks

HTTP Content Checks are very similar to HTTP Checks, except that in addition to checking the status code and response time, they look for a specific string in the data returned in the response.

HTTP is the transfer protocol for web pages and is also used by some other web-based technologies. It is a request-response protocol: the client, usually a browser, sends a request to a server, and the server sends back a response, ideally with the information the user wanted. The response includes a header component and a content component. The content component is the part people see in their browser; it's the part that contains the HTML.

The HTTP Content check makes an HTTP GET request for the target URL and verifies that the response code is between 200 and 399, that the response time is within the configured threshold, and that the page matches the content string or regex configured in the check. This check type supports both positive and negative checks: it can be configured to pass if the page includes the content, or to pass if the page does not include the content.
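
The pass/fail rules described above can be sketched as a small function. This is only an illustration under stated assumptions, not the service's actual implementation: the function name `evaluate_content_check` and its parameters are hypothetical, and the 200 to 399 range is treated as inclusive.

```python
def evaluate_content_check(status_code, elapsed_seconds, body, content,
                           must_contain=True, timeout_seconds=5.0):
    """Return True if a response would pass, following the rules above.

    All names here are hypothetical; this only mirrors the documented logic.
    """
    # The status code must be between 200 and 399 (assumed inclusive).
    if not 200 <= status_code <= 399:
        return False
    # The response time must be within the configured threshold.
    if elapsed_seconds > timeout_seconds:
        return False
    # With no content string configured, only status and timing are checked.
    if not content:
        return True
    found = content in body
    # Positive check: pass when found. Negative check: pass when absent.
    return found if must_contain else not found
```

For instance, a page that returns status 200 in 0.3 seconds and contains the configured footer text passes a positive check, while the same page fails a negative check for that text.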

When to use HTTP Content Checks

HTTP Content Checks are a key piece of a good website monitoring strategy. Use an HTTP Content check any time you need to monitor whether a web page is responding with specific content. In most cases, the content you configure the check to look for should come from a part of the page that doesn't change. This is particularly important when you are monitoring dynamic sites (especially blogs) and sites with content management systems. Footer sections of pages are often a good choice.

Optionally, instead of looking for an exact match string in the returned content, you can use a regular expression (regex).
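
The difference between the two matching modes can be illustrated with a short sketch. It uses Python's `re` module as a stand-in; the regex flavor the service actually uses is not stated here, so treat that as an assumption. The helper name `content_matches` is hypothetical.

```python
import re

def content_matches(body, pattern, use_regex=False):
    """Return True if the pattern is found anywhere in the body."""
    if use_regex:
        # Regex mode: the pattern is interpreted as a regular expression.
        return re.search(pattern, body) is not None
    # Exact mode: the pattern is treated as a literal substring.
    return pattern in body

body = "Silent mode is enabled"
literal_hit = content_matches(body, "Silent")                   # literal match
regex_hit = content_matches(body, "[Ss]ilent", use_regex=True)  # regex match
literal_miss = content_matches(body, "[Ss]ilent")               # brackets taken literally
```

In exact mode, the characters `[Ss]` are searched for as written, so a pattern intended as a regex will not match unless regex mode is selected.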

Negative content checks are useful for monitoring dynamic content blocks for errors. For example, if you have a feed or block on your page that shows the latest stories, that part of the page might have errors or problems that are independent of the rest of the page loading properly. In those situations you can use the "Does not contain" configuration for HTTP Content checks to make sure that the page does not include "Database connection error" or "0 New Stories". HTTP Content checks are also useful for monitoring status pages. For example, some servers have a status page that lists "OK" for several components of the services on that server, and an "Error" status for a specific service if it has problems. You can configure an HTTP Content check to pass if the content returned by the page does not include the word "Error".
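
As a rough illustration of the status page case, the sketch below applies a "Does not contain" rule to two made-up status page bodies. The page contents and the helper name `passes_negative_check` are hypothetical, and the comparison is case-sensitive here, which may or may not match the service's behavior.

```python
def passes_negative_check(body, forbidden):
    """Pass only when the forbidden string is absent from the body."""
    return forbidden not in body

# Hypothetical status page output, for illustration only.
healthy_page = "web: OK\ndatabase: OK\nqueue: OK\n"
degraded_page = "web: OK\ndatabase: OK\nqueue: Error\n"

healthy_result = passes_negative_check(healthy_page, "Error")
degraded_result = passes_negative_check(degraded_page, "Error")
```

The healthy page passes because "Error" never appears, while the degraded page fails as soon as one component reports "Error".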

Using HTTP Content Checks

To set up an HTTP Content check:

Select HTTP Content from the Check type drop down.
Give it a friendly label to identify this check in lists and notifications.
Enable Automated Diagnostics if you'd like detailed technical info about the failure that may help you troubleshoot a failure.
Set how often you want the check to run in the Check Frequency field.
Enter the target URL you want to check. It must be a valid URL, starting with either http:// or https://. Any valid URL will work fine, including URLs with basic authentication, port numbers, and query strings. Usually you can just go to the page you want checked and copy and paste the URL from the address bar in your browser. If you use the authentication feature for this check, or otherwise include confidential information in your checks, please keep our Terms of Service in mind and limit the access provided by the credentials you use. The following examples are all valid (although these are fictitious, so don't actually use these URLs in your checks):
- http://www.example.com
- http://couchdb.example.com:5984
- https://www.example.com/foo/bar.html
- http://www.example.com/foo/bar.html?this=that&eggs=green
- https://sam:[email protected]/foo/bar.html?this=that&eggs=green
- http://[2606:c700:4020:11::53:4a3b]/
To force an IPv6 resolution for the FQDN in your URL, change the dropdown from "(Default IP resolution)" to "Force IPv6 resolution". If you're unsure, the default is what you want.
If you'd like to verify that particular words appear or do not appear on the webpage being checked, type the words you're checking for into the optional 'Content String' field and set it to either 'Contains' or 'Does not contain'. The website's content is not inspected if this field is left blank.
You can use the 'Exact Match'/'Regex' dropdown to have the check treat the 'Content String' field as a regular expression. Exclude the opening and closing slashes ('/') from your regex. Example: '[Ss]ilent', not '/[Ss]ilent/'.
If you want the check to follow redirects and evaluate the redirect URL, change the 'Redirects:' drop down to "Follow redirects".
Set a timeout. The default of 5 seconds works fine for most situations.
Set the Sensitivity. High is usually appropriate. Some web hosts have intermittent periods of time when they don't respond quickly. For a visitor this might not be a big deal if it is intermittent, but it might mean you'll get more "FAIL" responses than you anticipate. In those cases, set the Sensitivity lower.
Set the notifications for this check. More information about notifications.
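
Before entering a target URL in the steps above, it can help to confirm locally that it parses the way you expect. The sketch below uses Python's standard `urllib.parse.urlsplit`; the helper name `describe_url` is hypothetical, and this is not how the service itself validates URLs.

```python
from urllib.parse import urlsplit

def describe_url(url):
    """Split a URL and return the parts relevant to a check target."""
    parts = urlsplit(url)
    # The target must start with http:// or https://.
    if parts.scheme not in ("http", "https"):
        raise ValueError("URL must start with http:// or https://")
    if not parts.hostname:
        raise ValueError("URL has no host")
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,  # brackets are stripped from IPv6 literals
        "port": parts.port,      # None when no explicit port is given
        "path": parts.path,
        "query": parts.query,
    }

with_port = describe_url("http://couchdb.example.com:5984")
with_query = describe_url("http://www.example.com/foo/bar.html?this=that&eggs=green")
ipv6 = describe_url("http://[2606:c700:4020:11::53:4a3b]/")
```

Note that the IPv6 example only parses because the address is wrapped in brackets; the parsed host is returned without them.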

Other considerations

This check doesn't care what the rest of the data returned by the page looks like. In fact, the data doesn't even need to be HTML. It just has to contain the string or regex match the check is looking for. This means that the check is useful for checking XML or JSON content as well as standard web pages. It can also look for HTML tags, but the system does some filtering in order to protect against XSS attacks, so there are some specific strings that you can't use for this check. Typically you can work around this by using parts of the string that don't look like an XSS attack.
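
Because the match is made against the raw response text, whitespace in JSON output matters for an exact string. The sketch below shows this with a made-up JSON payload, and shows how a regex can tolerate formatting differences. The payload and pattern are hypothetical, and Python's `re` module is used only as a stand-in for the service's regex engine.

```python
import re

# Hypothetical JSON bodies: same data, different whitespace.
spaced_body = '{"status": "ok", "queue_depth": 3}'
compact_body = '{"status":"ok","queue_depth":3}'

# An exact string matches only the formatting it was written for.
exact = '"status": "ok"'
exact_in_spaced = exact in spaced_body
exact_in_compact = exact in compact_body

# A regex that allows optional whitespace around the colon matches both.
pattern = r'"status"\s*:\s*"ok"'
regex_in_spaced = re.search(pattern, spaced_body) is not None
regex_in_compact = re.search(pattern, compact_body) is not None
```

If the server might change how it serializes its output, a tolerant regex like this is usually a safer choice than an exact string.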

There is a 3MB limit for this check on the data received from the server. If your URL returns more than 3MB of data, this check will fail.
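
To get a feel for how a size cap like this behaves, the following sketch reads a stream in chunks and stops as soon as the cap is exceeded. It is only an illustration: the helper name `read_capped` is hypothetical, and interpreting "3MB" as 3 * 1024 * 1024 bytes is an assumption.

```python
import io

MAX_BYTES = 3 * 1024 * 1024  # assumption: "3MB" read as 3 MiB

def read_capped(stream, limit=MAX_BYTES, chunk_size=64 * 1024):
    """Read a binary stream fully, raising ValueError if it exceeds limit."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        total += len(chunk)
        if total > limit:
            # Stop early instead of buffering an oversized response.
            raise ValueError("response body exceeds size limit")
        chunks.append(chunk)
    return b"".join(chunks)

small_body = read_capped(io.BytesIO(b"x" * 1000))
```

A body at or under the limit is returned whole; anything larger raises an error, which corresponds to the check failing.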

When following redirects, there is a limit of 4 redirects, after which the check fails with the error "Too many redirects".

The timeout threshold applies to all redirect requests and responses, not just the final HTTP request.
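
The redirect limit and shared timeout can be sketched as a loop with a single deadline. This is a minimal sketch under assumptions, not the service's implementation: `fetch` is a hypothetical callable returning `(status_code, location)`, the timeout is treated as cumulative across the whole chain, and up to 4 redirects are allowed before the next one fails.

```python
import time

MAX_REDIRECTS = 4  # limit stated above
REDIRECT_CODES = (301, 302, 303, 307, 308)

def follow_redirects(url, fetch, timeout_seconds=5.0, clock=time.monotonic):
    """Follow redirects using fetch(url) -> (status_code, location_or_None).

    One deadline covers every request in the chain (an assumption).
    """
    deadline = clock() + timeout_seconds
    redirects = 0
    while True:
        status, location = fetch(url)
        if clock() > deadline:
            raise TimeoutError("timeout exceeded across the redirect chain")
        if status in REDIRECT_CODES and location:
            redirects += 1
            if redirects > MAX_REDIRECTS:
                raise RuntimeError("Too many redirects")
            url = location
            continue
        # Not a redirect: this is the response that gets evaluated.
        return url, status

# Hypothetical redirect chain, keyed by URL, for illustration only.
chain = {
    "http://www.example.com": (301, "https://www.example.com"),
    "https://www.example.com": (302, "https://www.example.com/home"),
    "https://www.example.com/home": (200, None),
}
final = follow_redirects("http://www.example.com", chain.__getitem__)
```

Because the deadline is set once before the first request, a slow hop early in the chain leaves less time for the remaining hops.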

IPv6 URLs require the bracket formatting such as http://[2606:c700:4020:11::53:4a3b]/

SSL/TLS certs are not validated by this check, so it will work fine with expired and self-signed certs. You'll want to add an SSL check to verify your SSL certs and get warnings before they expire.

SSLv3/TLS1.0 are not supported.