
So this week’s post is a little bit different, and I have to confess myself surprised by the outcome. More on that later. First, the prompt for this week’s art was “a performance tester experiencing l’esprit de l’escalier”. The reason for that will become clear later. But I was delighted to see Performance Testing re-imagined like an episode of Dr Who. I do wish I’d walked down more ornate yet impossible staircases in my professional career. Even if I look more like a Sontaran than Dr Who.
Anyway, on to the attempt to use ChatGPT to set performance test objectives. You may recall that was the first stage of ChatGPT’s “plan” (well, to-do list) for running a test. This time, rather than just firing one question at it (him/her?) and dissecting the answer, I ended up having rather a long conversation with it.
It was long for two reasons. First, I’ve come to think of ChatGPT as like a precocious and highly intelligent 11-year-old that can’t wait to give you all of the answer, whether you want it or not. (“I don’t care what the genus is, does a camel have one hump or two?!!”) Second, I was trying to see if I could trick ChatGPT into saying something daft.
To keep this post within the vague realms of sanity (if only) I found I could simply link to the chat. So you can review it if you like, but I’ll summarise below.
Before we get going (yes I know, I see you looking at your watch), I have to say I *had* thought I’d got the measure of ChatGPT. I thought it was going to be great at generalities and hints – interview question answers – but poor on detail. Turns out, I was wrong…
I decided I would performance test that old stalwart, the demo flight booking website. I’m sure we all remember that fondly. So first I asked “How might I establish performance test objectives for testing a flight booking website?”. ChatGPT then gave me another huge to-do list, summarised here.
Understand User Expectations and Business Requirements, Define Key Performance Indicators (KPIs), Set Baseline Metrics, Load Testing Objectives, Stress Testing Objectives, Scalability Testing Objectives, User Experience Objectives, Set Priorities and Thresholds, Consider Technical and Environmental Factors, Document and Communicate Objectives, Review and Update Objectives Regularly.
Each of those topics had 4-6 subtopics. So, yeah, I agree with the list, but I was starting to think that if every item on the list spawns another list, we’re going to enter fractal planning hell. Just how long were all these tasks going to take? I asked it and it said “that depends on scope, complexity, resources, requirements, data, stakeholder availability, tools, test cycles and documentation”. (There was some screaming at this point.) It did however say that this phase of the project was 1-2 weeks for a small project, 2-4 weeks for a medium project and 1-2 months for a complex one. “Luxury” I said in my best Yorkshire accent. But hey, I got a straight answer that wasn’t a list.
So I thought I’d start with the first item in the new list it had given me in response to my question on the first item from last week’s list. I asked it how to identify User Demographics. Can you guess what happened? Indeed, a list with another 10 items to do. Number 2 was “conduct surveys”, which tells you what the list was like.
I threw up my hands in despair and asked “what I need to understand is the core processes quickly… I only have a week”. I got an 11 item list.
Finally – and this is my moment of l’esprit de l’escalier – I made the fatal mistake of suggesting an answer. This, I suspect, is the trap that shady psychics set: “a name beginning with D” “My dad?” “Yes! Exactly!”
Indeed ChatGPT was the same. I foolishly asked “I think maybe it’s basically looking for a flight and booking one. Does that seem reasonable?” and ChatGPT said “Absolutely, at its core, a flight booking website revolves around two primary processes: Searching for Flights, Booking Flights.”
There was a bit more to be fair. Then a list. Naturally. If I’ve learnt one thing about the mind of ChatGPT, it’s inordinately fond of lists. I realised later what I should have asked, hence the staircase. I’ll say more about that later.
Having got my core scenarios, I thought I’d get some other info. First I said the business had said the system had ten billion users, did that sound ok. It noticed that that exceeded global population and suggested there might have been a misunderstanding, which was nice. It told me how I might resolve that. In list form.
Then I said they meant ten billion hits per hour, and again it spotted the problem. There was a list to resolve it. Then I said, ok, that was per month not per hour. But Google Analytics said 1000 per hour. Who was right? There was another … need I tell you? But ChatGPT came down on the side of Google Analytics at least.
I then floated a 10ms response time – it baulked at that and gave me some interesting rules of thumb for response times. So I said we’d settle for 200ms and asked about expectations for CPU and memory. Again I got some actual figures, and only a short list. I’m not sure I agree with CPU spiking 100% at peak times, but other opinions are available I’m sure. Depends on the system (and if you want a list of those “depends” I can give you one.) The memory suggestions made sense and it even mentioned memory leaks.
So, I’ve got my load scenarios and my KPIs – both user and system. Seems like I’m getting somewhere at last. I thought I’d ask it what the mix should be between searching and buying. I got a list, but buried in the list was a nice 70:20:10 ratio of search, details, buy. I was quite impressed by that.
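For anyone who wants to turn that ratio into per-step targets, here is a minimal Python sketch. The function name `split_hits` is made up for illustration, and pairing the 70:20:10 ratio with the 1000 hits per hour from Google Analytics is an assumption for the example, not something ChatGPT produced.

```python
def split_hits(total_hits_per_hour, weights):
    """Split an hourly hit target across steps in proportion to their weights."""
    total_weight = sum(weights.values())
    return {
        step: total_hits_per_hour * weight / total_weight
        for step, weight in weights.items()
    }


# Assumed inputs: 1000 hits/hour overall, split 70:20:10 across the three steps.
mix = split_hits(1000, {"search": 70, "details": 20, "buy": 10})

for step, hits in mix.items():
    print(f"{step}: {hits:.0f} hits/hour")
```

The weights do not need to add up to 100; any ratio (10:1, say) works the same way.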
My last question then – where I was going out on a limb, assuming I’d get another to-do list – was this:
OK – I like your numbers. So I’m going to create 2 scripts, one to browse and one to buy. I’m going to aim for around 1000 hits per hour and have a ratio of 10:1 for browse to buy. Average response should be 200ms and cpu should be under 85%. I’ll make an average mix and negotiate with the business for a peak mix. One last thing – how many users do you think I should have for my average mix?
Lots of natural language, lots of assumptions needed from previous questions and a non sequitur of a question. The response it gave me – to use the vernacular of the internet – “blew my mind”.
Calculating Concurrent Users:
- Determine Hits per Second:
- 1000 hits/hour is approximately 1000/3600 hits/second, which is roughly 0.28 hits/second.
- Estimate Average Session Length:
- Assume an average session length where a user is active. This might be anywhere from a few minutes to potentially longer on a booking site. For this example, let’s say an average session is 10 minutes (600 seconds).
- Calculate Requests per Session:
- You’ll want to estimate how many hits (page loads or API calls) a typical user makes during a session. Given the browse-to-buy ratio, there will be more browsing hits than buying hits. Let’s assume that in a 10-minute session, a user might make 5 browsing requests and 0.5 buying requests on average (considering the 10:1 ratio).
- Calculate Concurrent Users:
- The formula for calculating concurrent users is: Concurrent Users = (Hits per Second) × (Session Length in Seconds)
- However, considering the number of requests a user makes during a session, we adjust the hits per second accordingly. If a user makes 5.5 requests per session (5 browse + 0.5 buy), then: Concurrent Users = (Hits per Second) × (Session Length in Seconds) / (Requests per Session)
- Concurrent Users = 0.28 × 600 / 5.5 ≈ 30.5
This calculation suggests you’d need around 30 virtual users to simulate your average load of 1000 hits per hour, given the assumptions. Since you can’t have half a user, you’d round up to 31.
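The arithmetic is easy to check in a few lines of Python. This is a minimal sketch rather than anything ChatGPT wrote: the function name `concurrent_users` is made up for illustration, and the 10-minute session and 5.5 requests per session are the same assumptions ChatGPT stated above.

```python
import math


def concurrent_users(hits_per_hour, session_seconds, requests_per_session):
    """Estimate concurrent users from a hit rate, using the formula quoted above."""
    hits_per_second = hits_per_hour / 3600
    # Sessions starting per second = hits per second / hits per session.
    sessions_per_second = hits_per_second / requests_per_session
    # Users active at once = session arrival rate x how long each session lasts.
    return sessions_per_second * session_seconds


# Assumed inputs: 1000 hits/hour, 600 s sessions, 5 browse + 0.5 buy requests.
users = concurrent_users(1000, 600, 5.5)

print(f"estimated concurrent users: {users:.1f}")
print(f"virtual users to configure: {math.ceil(users)}")
```

Using the unrounded 1000/3600 hits per second gives about 30.3 rather than 30.5 (ChatGPT rounded to 0.28 first), which still rounds up to 31 virtual users.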
Which was pretty amazing to me. There was barely any list even. Suddenly, and from the least well defined question, I was getting some really useful information. And really detailed information at that. What was going on?
So in the spirit of ChatGPT, and having spent a couple of days thinking about it, I have some thoughts, which I’ll put in list form:
- GIGO maybe also means Generality In, Generality Out. I suspect ChatGPT likes the detail and replies in kind.
- Although it’s using natural language, it’s a mistake to think of ChatGPT as fully parsing it. If I was in the ChatGPT role and someone was talking about performance testing a flight booking system, I’d know they were after an answer and not a process to get one. I’d say “2 scripts, search and buy, get stats from Google, say 500ms should be ok – check with the business, off you go”. ChatGPT can’t do that and is really giving me what I asked for – a process to get that.
- I’m starting to wonder if there’s something in these Prompt Wizard roles (and frankly wouldn’t mind bagging 250k a year).
So with all that in mind, I might take a side quest next week and explore the best way to get good info out of ChatGPT. Preferably without so many lists.
On the spirit of the staircase, I realised I should have asked something like “What kind of user activity would you expect on a flight booking service?”. Out of curiosity, and not wanting ChatGPT to be primed, I asked exactly the same question for a hotel booking service. I got this:
- Searching for Hotels: Users enter dates, destination, and possibly filters such as price range, hotel star rating, amenities, and more.
- Viewing Hotel Details: Users click on hotels to view detailed information like room options, amenities, reviews, and photos.
- Booking a Room: After selecting a room, users go through the process to book, providing personal information and payment details.
- Managing Reservations: Users may log in to view current and past reservations, cancel or modify bookings, and possibly check-in online.
- Writing Reviews: After their stay, users might come back to the service to leave reviews and rate their experience.
- Customer Support Interaction: For questions or issues, users might initiate chats or calls with customer support.
Not bad ChatGPT, not bad. Clearly, I need to go to ChatGPT Hogwarts.
See you next week after Defence against the Dark Arts. Happy System Smashing.
