Add selenium remote timeout
@@ -0,0 +1,49 @@
---
title: "Selenium Remote Driver Timeout"
date: 2019-01-18T17:30:17+01:00
author: James McDonald
type: post
categories:
- Tech
draft: true
---

Investigating an issue with a Selenium grid revealed some interesting
shenanigans. We were experiencing a problem where some (working) tests failed
and the Selenium grid was stuck with browsers apparently busy and jobs in the
queue. Sometimes the grid itself would become unresponsive.

After a bunch of investigation I managed to track down the source: the test
suite was setting the Selenium client's `read_timeout` to 15 seconds. Doesn't
sound so bad, right? So here's where it all goes bork...

The test job runs 8 tests in parallel, and more than one job can run at the
same time, so the grid may be serving 16, 24 or more test clients at once.

The interesting stuff starts when the 15 second timer is exceeded. The client
immediately gives up, marks the test as failed with a `ReadTimeout` and goes on
to the next test. But the grid doesn't know about that, so the job stays in its
queue. That wouldn't be too bad in itself, but unfortunately that's not the end
of it. When the job gets allocated a browser instance, it runs normally. Then,
as far as I can tell, the browser instance sits and waits politely. Presumably
it expects some client thread to come along and pick up the result, but the
client is long gone. So it sits. And waits. Until the `browserTimeout` reaper
comes along and stabs it.

Remember the client that went and started on the next test? That one might get
stuck in the queue too. And another, and another. And more from all the other
impatient threads running their own tests. Quickly, the browser pool is
saturated with stuck browsers waiting for clients that have wandered off. Add a
couple of hundred of these and you can jam up the whole grid queue to the point
where the grid service no longer responds at all.

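
To make the pile-up concrete, here is a rough sketch of the failure mode as a
toy queue simulation. It does not talk to Selenium at all: the `simulate`
function, the FIFO queue model and every number in the usage example (16
clients, 4 browsers, 20 second tests, 60 second `browserTimeout`) are made up
for illustration. It also assumes an abandoned session holds its browser for
the full `browserTimeout`, which is my reading of the behaviour described
above rather than something confirmed from the grid's source.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Outcome:
    passed: int = 0
    failed: int = 0
    orphaned_browser_seconds: float = 0.0


def simulate(clients, browsers, tests_per_client, test_seconds,
             read_timeout, browser_timeout):
    """Toy model of impatient clients queueing for a fixed pool of browsers."""
    # Min-heap of the times at which each browser next becomes free.
    free_at = [0.0] * browsers
    heapq.heapify(free_at)
    # Min-heap of (submit_time, client_id, tests_remaining), served in FIFO order.
    pending = [(0.0, cid, tests_per_client) for cid in range(clients)]
    heapq.heapify(pending)

    out = Outcome()
    while pending:
        submitted, cid, remaining = heapq.heappop(pending)
        # The request gets the first browser to become free.
        start = max(submitted, heapq.heappop(free_at))
        if start - submitted > read_timeout:
            # The client gave up while queued, but the request stays in the
            # queue and still claims a browser, which then idles until the
            # reaper kills it (assumption: a full browser_timeout).
            out.failed += 1
            out.orphaned_browser_seconds += browser_timeout
            heapq.heappush(free_at, start + browser_timeout)
            next_submit = submitted + read_timeout
        else:
            # The client was still waiting, so the test runs and releases
            # the browser when it finishes.
            out.passed += 1
            heapq.heappush(free_at, start + test_seconds)
            next_submit = start + test_seconds
        if remaining > 1:
            heapq.heappush(pending, (next_submit, cid, remaining - 1))
    return out


if __name__ == "__main__":
    # Hypothetical load: two jobs of 8 parallel tests sharing 4 browsers.
    result = simulate(clients=16, browsers=4, tests_per_client=10,
                      test_seconds=20, read_timeout=15, browser_timeout=60)
    print(result)
```

In this model every abandoned request costs the pool more browser time than a
successful test would have, so each timeout makes the next one more likely.
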
As an aside, it seems like the browsers get very upset by this situation. Chrome
in particular chews up multiple gigabytes whilst apparently doing nothing until
these jobs are finished. I'm not sure it's related, because browsers do love
them some RAMs at the best of times.

There might be several solutions to this, but I went for the simplest one. We
increased the timeout to 2 minutes (the default appears to be 1 minute, which
would probably also be fine). The nice, patient test clients leave plenty of
time for requests to be handled, get the responses they're looking for, and
nobody jams up anybody's queues. Lovely.
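
For a back-of-the-envelope check on whether a given timeout leaves enough
headroom, something like the following might do. It is only a sketch under
simplifying assumptions: requests are served first come, first served, every
test holds a browser for the same length of time, and the helper names and
example numbers are hypothetical rather than anything from Selenium.

```python
import math


def worst_case_queue_wait(clients, browsers, test_seconds):
    """Longest wait for a browser when all clients submit at the same moment."""
    # Clients are served in batches of `browsers`; the last batch has to wait
    # for every batch ahead of it to finish.
    batches = math.ceil(clients / browsers)
    return max(batches - 1, 0) * test_seconds


def timeout_is_safe(read_timeout, clients, browsers, test_seconds):
    """True if the client would still be waiting when its browser arrives."""
    return read_timeout > worst_case_queue_wait(clients, browsers, test_seconds)


if __name__ == "__main__":
    # Hypothetical load: two jobs of 8 parallel tests, 4 browsers, 20 s tests.
    for timeout in (15, 120):
        print(timeout, timeout_is_safe(timeout, clients=16, browsers=4,
                                       test_seconds=20))
```

Real tests vary in length and arrive unevenly, so treat the result as a lower
bound on the timeout and leave generous margin on top.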