
We’re all familiar with the feeling of panic that arrives after receiving an alert saying your site is down.

But ever felt confused?

Here’s a common question about Testomato we’ve been hearing from some of our users:

“Why is Testomato reporting that my site is down when I can access it?”

Testomato may occasionally report your site as down or unavailable, but everything looks fine when you go to investigate the problem for yourself.

This can show up in your reports for a few reasons, so we’ve put together a short guide to explain what might cause it and why Testomato sends you an alert that your site is down when it’s not.

Location Matters

Uptime Monitoring in Testomato is done externally rather than internally. This means that Testomatobot tests and monitors your site or server from a location that is outside the local network where your server is hosted.

In some cases, network latency issues can cause a project to be unavailable to Testomato for a short period of time even though you are still able to access it.

To help avoid this problem, we retest your project if Testomatobot is unable to connect to your site or server. If we’re still unable to access your project on the second attempt, we’ll send you an alert that your site is DOWN.

The reason we try to wait is to avoid sending alerts about false timeouts or short-term issues that don’t impact your visitors. Think of it as a safety precaution: we don’t want to bother you until we’ve determined there is an actual problem.
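The retest rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Testomato’s actual code: the `probe` callable and the `check_with_retest` name are hypothetical stand-ins for "try to reach the site once".

```python
from typing import Callable


def check_with_retest(probe: Callable[[], bool], attempts: int = 2) -> str:
    """Report "DOWN" only if every attempt fails; one success means "UP"."""
    for _ in range(attempts):
        if probe():
            return "UP"
    return "DOWN"


# Hypothetical probe: the first attempt hits a short network hiccup,
# the retest succeeds, so no alert would be sent.
results = iter([False, True])
status = check_with_retest(lambda: next(results))
print(status)  # UP
```

A probe that fails on both attempts returns "DOWN", which is the point where an alert would go out.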

We’ve also started adding more testing locations in Testomato to help minimize the influence of our connection on your project’s test results.

How to Evaluate the Problem

You can dig deeper into the details of an outage by analyzing the HTTP header response for the request made to your website.

To view recent results, click the gear icon of a failed test, and then click HTTP header response at the bottom of the page.

This will open a more detailed report containing the traceroute and the content of the returned data:

[Screenshot: HTTP header response report]

For information about older incidents, just click on an issue in your Issues Timeline located in the Reports tab of a project’s dashboard.

This report will show you the error, the response time, the length of the issue, as well as the response headers for the request.

Understanding Your Results

When outages are short (i.e. under 2 minutes), they are most likely caused by a temporary issue somewhere between Testomatobot and your site. The cause of this type of issue can be hard to determine.

Here are good places to start when trying to discover the exact source of a problem:

Timeouts

Timeouts make up over 70% of the short incidents we see reported by Testomato.

Most of the time, timeouts occur during peak hours when you have high visitor traffic. However, another common reason is that your hosting service or server reaches a concurrent connection limit.

The number of concurrent connections is how many requests your website is handling at the same moment. When someone clicks on a page and then submits a form on your website, these are two separate consecutive actions that use two connections, one after the other.

However, when 50 people click on the same page at the same time, these are considered 50 concurrent connections. After the page loads, the connection closes.

Hosting services often limit the number of concurrent connections to a site, that is, how many visitors can connect to your hosting account at the same time, to avoid overloading the server with processes.

Testomato opens a connection for every test that it runs on your site, and each one counts toward this limit. In some cases this can cause a page to be unavailable when we try to test it.
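To make the limit concrete, here is a toy model in Python. It assumes a host that allows 50 connections at once (the number from the example above); the `ConnectionLimiter` class is hypothetical and does not represent any real hosting service.

```python
import threading


class ConnectionLimiter:
    """Toy model of a host that allows at most `limit` open connections."""

    def __init__(self, limit: int) -> None:
        self._slots = threading.BoundedSemaphore(limit)

    def connect(self) -> bool:
        # Non-blocking: returns False when every slot is already in use.
        return self._slots.acquire(blocking=False)

    def disconnect(self) -> None:
        # A page finished loading, so its slot is free again.
        self._slots.release()


host = ConnectionLimiter(limit=50)
visitors = [host.connect() for _ in range(50)]  # 50 visitors at the same time
monitor = host.connect()                        # connection 51, e.g. a test
print(all(visitors), monitor)  # True False

host.disconnect()               # one visitor's page finishes loading
retry = host.connect()
print(retry)  # True
```

The 51st connection is turned away even though the site works for the 50 visitors already connected, which is how a test can fail while the site still looks fine to you.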

What can you do to avoid this problem?

We have a few suggestions:

  • Increase your timeout period.
  • Set a longer delay period between subsequent tests to avoid an overlap in tests from Testomato.

To adjust these periods, visit the Settings tab in your project dashboard:

[Screenshot: project Settings tab]

You can delay your notifications from your account Settings:

[Screenshot: notification delay in account Settings]

HTTP Status Codes

Roughly 15% of short incidents are server responses with an error code. Many of these codes are general, which can make it harder to track down the error on your website.

Here’s a quick look at the three most common status codes we see in Testomato and possible reasons you might see them:

  • 503 Service Unavailable – Your backend was unable to handle all the requests made to your site. As a result, your proxy will start returning 503 errors.
  • 500 Internal Server Error – This is a very general server error, so the best place to check for more clues about the problem is your application log. These errors are often connected with availability issues and timeouts related to your database server, or other important services your application needs to run.
  • 502 Bad Gateway – These errors are usually protocol problems between your proxy and backend server, often during maintenance.
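If you log the status codes from your own checks, a small lookup like the one below can turn them into a first hint about where to look. This is a minimal sketch: the hint texts paraphrase the list above, and the `HINTS` table and `describe` function are hypothetical names, not part of Testomato.

```python
# Hints paraphrased from the list above; adjust them to your own setup.
HINTS = {
    500: "general server error, check the application log",
    502: "proxy and backend could not talk to each other",
    503: "backend could not handle all incoming requests",
}


def describe(status: int) -> str:
    """Return a short hint for an HTTP status code."""
    if status in HINTS:
        return f"{status}: {HINTS[status]}"
    if 500 <= status <= 599:
        return f"{status}: other server error"
    return f"{status}: not a server error"


print(describe(503))  # 503: backend could not handle all incoming requests
print(describe(504))  # 504: other server error
```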

PHP Fatal Error – Out of Memory

About 5% of the reported incidents in Testomato are related to PHP errors. You can find full details about each error in your Issues Timeline located in your project’s Reports tab.

However, the PHP error we see the most in Testomato is Fatal Error: Out Of Memory. This is particularly common if your application processes large amounts of data.

This message means your application is trying to use more memory than the system has available.

We suggest evaluating the other processes that run on your server and how much memory they consume. Increase your system memory accordingly, and then optimize greedy programs or move them (like your database) to another server.

If you’re running on Apache, you might want to take a second look at how it’s configured. Try getting rid of unnecessary modules, setting a lower MaxClients, and limiting MaxRequestsPerChild to prevent memory leaks. You could also try migrating to a web server with better memory management, such as nginx or IIS.
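When picking a lower MaxClients value, a common rule of thumb (not something specific to Testomato) is to divide the memory left over for Apache by the average size of one Apache process. The sketch below does that arithmetic; the function name and all the numbers are made up for illustration, so measure your own server before changing anything.

```python
def estimate_max_clients(total_mb: int, reserved_mb: int, per_process_mb: int) -> int:
    """Rough upper bound on Apache processes that fit in the remaining memory."""
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    available_mb = total_mb - reserved_mb
    # Always allow at least one process, even on a very tight budget.
    return max(1, available_mb // per_process_mb)


# Illustrative numbers only: 4096 MB of RAM, 1536 MB reserved for the
# operating system and database, about 40 MB per Apache process.
estimate = estimate_max_clients(total_mb=4096, reserved_mb=1536, per_process_mb=40)
print(estimate)  # 64
```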

Download Errors

Download errors are usually caused by network, SSL, or DNS problems.

Let’s take a look at the most common errors we send to Testomato users:

  1. “Failed to connect to host (Connection refused)”: This message means that Testomatobot was unable to connect to your server because the connection was refused, typically by a firewall. If you filter your traffic, check your firewall configuration and be sure to allow access to Testomatobot’s IP addresses.
  2. “Downloading content failed (Connection reset by peer)”: This message means the connection was terminated before the page could finish downloading. This is often related to maintenance work on your server, such as a software upgrade or a server restart by the server administrator. It could also be caused by general server failures like segmentation faults.
  3. “Failed to connect to host (No route to host)”: This message means there was a routing problem and Testomatobot was unable to connect to your server. Unfortunately, this could occur anywhere between your server and Testomato, but the good news is that this is usually related to a short-term network problem.
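If you want to reproduce these checks from your own script, the three messages above line up with standard operating system errors that Python exposes as exceptions. The sketch below maps them to the messages quoted in the list. It is an assumption about how the categories correspond, not a description of how Testomatobot works internally, and the `classify` function is a hypothetical helper.

```python
import errno


def classify(exc: OSError) -> str:
    """Map a low-level network error to one of the messages quoted above."""
    if isinstance(exc, ConnectionRefusedError):
        return "Failed to connect to host (Connection refused)"
    if isinstance(exc, ConnectionResetError):
        return "Downloading content failed (Connection reset by peer)"
    if exc.errno == errno.EHOSTUNREACH:
        return "Failed to connect to host (No route to host)"
    return f"Unclassified error: {exc}"


# Simulated errors, so the example runs without touching the network.
print(classify(ConnectionRefusedError()))
print(classify(ConnectionResetError()))
print(classify(OSError(errno.EHOSTUNREACH, "No route to host")))
```

In a real check you would wrap the connection attempt in a `try` block and pass the caught `OSError` to `classify`.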

Image credit: Garrett Knoll

Have more questions about short-term incidents and Testomato?

Feel free to contact our team any time at support@testomato.com. You can also join us on Facebook or Twitter.