How does the status monitoring website work under the hood?

alexdeathway@programming.dev · 10 months ago

How does the status monitoring website work under the hood?

echo64@lemmy.world · 10 months ago

The answer the status service websites will give you: “we automatically detect outages by performing HTTP requests and checking the responses for errors.”

The actual answer: some overworked developer gets woken up at 3 am by PagerDuty and manually sets the status website to an outage state.

doeknius_gloek@discuss.tchncs.de · 10 months ago

You should check out Uptime Kuma, which offers different monitor types. It should give you a good starting point for your own implementation, or you may find that Uptime Kuma already covers your use case.

SteveTech@programming.dev · 10 months ago

A lot of external status services just send an HTTP request to a certain URL: if it succeeds, the site is up; if it errors or times out, it’s down. If you aren’t using HTTP, they usually also let you check whether a TCP port completes the usual handshake.
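A minimal sketch of those two check types, using only the Python standard library. The names `http_check` and `tcp_check` and the 5-second default timeout are illustrative assumptions, not how any particular status service implements it.

```python
import http.client
import socket
import urllib.request


def http_check(url: str, timeout: float = 5.0) -> bool:
    """Up if the URL answers with a status below 400, down otherwise."""
    try:
        # urlopen follows redirects and raises HTTPError for 4xx/5xx responses
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses;
        # HTTPException covers garbled responses
        return False


def tcp_check(host: str, port: int, timeout: float = 5.0) -> bool:
    """Up if a TCP handshake with host:port completes within the timeout."""
    try:
        # create_connection returns once the three-way handshake has completed
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, unreachable, DNS failure or timed out
        return False
```

Calling something like `http_check("https://example.com/")` or `tcp_check("example.com", 22)` on a schedule and recording the booleans is the core of the up/down logic described above.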

The response time can also be used to check whether a site is running slower than usual, and if you need it, you can usually specify which response code counts as success.
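A sketch that adds the two refinements from this paragraph: timing the request and requiring a specific status code. `timed_check`, `CheckResult` and the 1-second `slow_after` threshold are hypothetical names and values chosen for illustration; a real service would tune the threshold per site.

```python
import http.client
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    up: bool               # status matched the required code
    status: Optional[int]  # HTTP status received, or None if no response at all
    elapsed: float         # seconds until the response (or the failure)
    slow: bool             # elapsed exceeded the slow_after threshold


def timed_check(url: str, expected_status: int = 200,
                timeout: float = 5.0, slow_after: float = 1.0) -> CheckResult:
    status: Optional[int] = None
    start = time.monotonic()
    try:
        # urlopen follows redirects, so this is the status of the final response
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # 4xx/5xx still count as an answer with a status code
    except (OSError, http.client.HTTPException):
        pass  # refused, DNS failure, timeout or malformed response: no status
    elapsed = time.monotonic() - start
    return CheckResult(up=status == expected_status, status=status,
                       elapsed=elapsed, slow=elapsed > slow_after)
```

Passing `expected_status=404` (for example) lets you monitor an endpoint that is supposed to return something other than 200.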

That said, I wouldn’t be surprised if GitHub also has per-server analytics it can use to estimate load, but Instatus would work as described above.

If you’re looking for search terms, these are often called health checks. For example, Docker can be set up with the HEALTHCHECK instruction in the Dockerfile to poll a container’s web server every few minutes and mark it as unhealthy if it stops replying.
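As a rough illustration, here is a small Python script that could serve as the health check command. It assumes the image has `python3`, that the script is copied to `/healthcheck.py`, and that the app listens on port 8080 (all hypothetical). Docker treats exit code 0 as healthy and 1 as unhealthy, and marks the container unhealthy after `--retries` consecutive failures (3 by default), so a Dockerfile line such as `HEALTHCHECK --interval=2m --timeout=5s CMD python3 /healthcheck.py http://127.0.0.1:8080/` would poll it every two minutes.

```python
import http.client
import sys
import urllib.request


def healthcheck(url: str, timeout: float = 3.0) -> int:
    """Return a Docker-style exit code: 0 = healthy, 1 = unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if resp.status == 200 else 1
    except (OSError, http.client.HTTPException):
        # Refused connection, timeout, HTTP error status or garbled reply
        return 1


if __name__ == "__main__" and len(sys.argv) == 2:
    # Invoked as: python3 healthcheck.py http://127.0.0.1:8080/
    sys.exit(healthcheck(sys.argv[1]))
```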

TCB13@lemmy.world · 10 months ago

Simple: send a GET or HEAD request to the monitored website with a 3-second timeout. If you get a 200 response code, you can assume the website is online and OK.

Why HEAD? Because:

“if a URL might produce a large download, a HEAD request could read its Content-Length header to check the filesize without actually downloading the file. (…) A response to a HEAD method should not have a body”

Using HEAD instead of GET means your code doesn’t have to download your front page just to get the status. This speeds things up and reduces bandwidth usage.
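A standard-library sketch of that rule of thumb: a HEAD request with a 3-second timeout, where only a 200 counts as up. `head_status` and `is_up` are hypothetical helper names.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit


def head_status(url: str, timeout: float = 3.0) -> Optional[int]:
    """Send a HEAD request and return the status code, or None if it fails."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    try:
        conn.request("HEAD", path)
        # Only the status line and headers come back; there is no body to download
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None  # refused, timed out, DNS failure or garbled response
    finally:
        conn.close()


def is_up(url: str, timeout: float = 3.0) -> bool:
    """A 200 within the timeout means the site is online and OK."""
    return head_status(url, timeout) == 200
```

One caveat: some servers do not implement HEAD and answer with 405 Method Not Allowed, so falling back to GET in that case is a common workaround.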

Note: web servers may also return redirect response codes such as 301 or 308. In that case you usually make a follow-up request to the URL the server pointed you to and check whether it returns 200. Some HTTP libraries have built-in support for this and will follow the redirect for you if you set a simple boolean option.
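The redirect handling can also be written by hand, which makes the follow-up requests explicit. This sketch uses HEAD and caps the number of hops (the limit of 5 is an arbitrary assumption); `final_status` is a hypothetical name. Higher-level clients do this for you: `urllib.request.urlopen` follows redirects automatically, and the third-party `requests` library exposes it as the `allow_redirects` flag.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def _head(url: str, timeout: float) -> Tuple[int, Optional[str]]:
    """One HEAD request; returns (status, Location header or None)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def final_status(url: str, timeout: float = 3.0,
                 max_redirects: int = 5) -> Tuple[str, Optional[int]]:
    """Follow redirects; return (final URL, status), with None meaning failure."""
    for _ in range(max_redirects + 1):
        try:
            status, location = _head(url, timeout)
        except (OSError, http.client.HTTPException):
            return url, None
        if status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the current URL
            url = urljoin(url, location)
            continue
        return url, status
    return url, None  # too many redirects: treat as a failure
```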

nutsack@lemmy.world · 10 months ago

curl -k https://example.com/healthcheck

towerful@programming.dev · 10 months ago

A web service can be monitored passively.
The status system checks DNS records, pings the IP addresses, and sends a GET request to check that it gets a 200 response. Further metrics such as ping and response times can be monitored and flagged when they are too high, which indicates heavy load.
Uptime Kuma is a FOSS project that is popular among self-hosters.
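A sketch of the DNS step plus a latency measurement. Real ICMP ping needs raw sockets, which usually require elevated privileges, so this version substitutes the time taken by a TCP handshake; that swap, the port 443 default and the function names are assumptions for illustration.

```python
import socket
import time
from typing import Dict, Optional, Set


def resolve(hostname: str) -> Set[str]:
    """DNS step: every address the name resolves to, or an empty set on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None, type=socket.SOCK_STREAM)
    except (OSError, UnicodeError):
        return set()  # socket.gaierror (NXDOMAIN, no resolver) is an OSError
    return {sockaddr[0] for _family, _type, _proto, _canon, sockaddr in infos}


def connect_latency(ip: str, port: int, timeout: float = 3.0) -> Optional[float]:
    """Seconds taken by a TCP handshake to ip:port, or None if it fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def passive_report(hostname: str, port: int = 443, timeout: float = 3.0) -> Dict[str, object]:
    """Resolve the name, then time a handshake to each address it returned."""
    addresses = resolve(hostname)
    return {
        "dns_ok": bool(addresses),
        "latency_by_ip": {ip: connect_latency(ip, port, timeout) for ip in sorted(addresses)},
    }
```

Latencies that creep upward across checks are the “too high, indicating heavy load” signal mentioned above.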

A web service can also actively report for monitoring. The service monitors its own CPU/RAM/network usage, database connections, cache misses, and so on. If you are load balancing, an additional service is needed to aggregate all these results and decide when performance is degraded because too many nodes are offline or overloaded.
Tools like Prometheus and Netdata can handle the metrics.
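The aggregation step could look something like this sketch. The `NodeReport` fields and the thresholds (90% CPU counts as overloaded, 25% bad nodes means degraded, 75% means outage) are made-up values for illustration, not taken from Prometheus, Netdata or any real status page.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class NodeReport:
    name: str
    reachable: bool     # did the node report in at all?
    cpu_percent: float  # self-reported CPU usage, 0-100


def overall_status(reports: Sequence[NodeReport], overload_cpu: float = 90.0,
                   degraded_ratio: float = 0.25, outage_ratio: float = 0.75) -> str:
    """Collapse per-node reports into 'operational', 'degraded' or 'outage'."""
    if not reports:
        return "outage"  # nobody reported in at all
    bad = sum(1 for r in reports if not r.reachable or r.cpu_percent >= overload_cpu)
    bad_ratio = bad / len(reports)
    if bad_ratio >= outage_ratio:
        return "outage"
    if bad_ratio >= degraded_ratio:
        return "degraded"
    return "operational"
```

For example, with four nodes where one has stopped reporting, the bad ratio is 0.25 and the page would show “degraded”.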

Or, which I think is how a lot of these actually work, just report it manually. I’ve seen quite a few companies report a green status despite having fairly huge issues.

xmunk@sh.itjust.works · 10 months ago

Nearly always it’s by “pinging”, which may or may not actually use ping. Some server somewhere queries the monitored server every minute to see whether it responds with a 200. The better status pages will also exercise various routes and report whether individual parts of the service are available.
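A sketch of the per-route variant, polling once a minute. The component names and routes in `ROUTES` are hypothetical; a real status page would map its own components (API, login, web UI and so on) to URLs that exercise them.

```python
import http.client
import time
import urllib.request
from typing import Callable, Dict

# Hypothetical component -> route map; a real status page would list its own routes.
ROUTES: Dict[str, str] = {
    "website": "/",
    "api": "/api/health",
    "login": "/login",
}


def route_ok(url: str, timeout: float = 5.0) -> bool:
    """True if the route answers with a 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        return False


def check_components(base_url: str, routes: Dict[str, str] = ROUTES,
                     timeout: float = 5.0) -> Dict[str, bool]:
    """Request each route once and report which parts of the site are available."""
    base = base_url.rstrip("/")
    return {name: route_ok(base + path, timeout) for name, path in routes.items()}


def poll_forever(base_url: str, interval: float = 60.0,
                 on_result: Callable[[Dict[str, bool]], None] = print) -> None:
    """Re-run the checks every `interval` seconds (once a minute by default)."""
    while True:
        started = time.monotonic()
        on_result(check_components(base_url))
        # Sleep only for what is left of the interval so the schedule doesn't drift
        time.sleep(max(0.0, interval - (time.monotonic() - started)))
```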

Haatveit@beehaw.org · 10 months ago

I can’t give an authoritative answer (not my domain), but I think there are two ways these things are done.

The first is just observing the page or service as an external entity: request a page or hit an endpoint and track whether you get a response (if not, it must be down), and, as a very naive measure of load, track the response time. This is easy in the sense that you need no special access to the target, but it’s also limited in its accuracy.
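For the naive load measure, one simple approach is to compare each response time with a rolling median of recent ones. The window of 20 samples, the 3x factor and the class name are arbitrary assumptions for this sketch.

```python
import statistics
from collections import deque


class LatencyBaseline:
    """Flags a response time as slow when it exceeds `factor` times the recent median."""

    def __init__(self, window: int = 20, factor: float = 3.0, min_samples: int = 5):
        self.samples: deque = deque(maxlen=window)  # most recent response times, in seconds
        self.factor = factor
        self.min_samples = min_samples  # don't judge until there is some history

    def observe(self, seconds: float) -> bool:
        """Record one response time and return True if it looks abnormally slow."""
        slow = (len(self.samples) >= self.min_samples
                and seconds > self.factor * statistics.median(self.samples))
        self.samples.append(seconds)
        return slow
```

Slow samples are still added to the window here, so a sustained slowdown gradually becomes the new normal; excluding flagged samples would keep the baseline stricter.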

The second way, which is what your GitHub example is doing, is having access to special API endpoints for (or direct access to) performance metrics. Since the GitHub status page is literally run by GitHub, they obviously have easy access to any metric they could want. They probably (certainly) run services whose entire job is to produce reliable data for their status page.
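On the service side, one common pattern is an internal health endpoint that runs the service’s own checks and reports them in a machine-readable form. This is a hypothetical minimal version: `database_ok` and `cache_hit_ratio` are placeholders for whatever the service actually tracks, and the JSON fields are made up. Returning 503 when a dependency is down lets a plain external poller detect the problem from the status code alone.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

START_TIME = time.monotonic()


def database_ok() -> bool:
    # Hypothetical placeholder: a real service might run a cheap query such as "SELECT 1".
    return True


def cache_hit_ratio() -> float:
    # Hypothetical placeholder for a metric the service already tracks internally.
    return 0.97


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/healthcheck":
            self.send_error(404)
            return
        db_up = database_ok()
        body = json.dumps({
            "database": "ok" if db_up else "down",
            "cache_hit_ratio": cache_hit_ratio(),
            "uptime_seconds": round(time.monotonic() - START_TIME),
        }).encode()
        # 503 tells simple external pollers something is wrong even though we answered
        self.send_response(200 if db_up else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the endpoint on the given (hypothetical) port until interrupted."""
    ThreadingHTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```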

The minute details of each of these options are pretty open-ended; there are many ways to do it.

Just my 5¢ as a non-web developer.