A live TV broadcast relying on mobile apps and supporting online infrastructure. What could possibly go wrong?!
A challenging project with a WebAPI supporting up to 300,000 concurrent users at peak times.
What could possibly go wrong? As it turns out, nothing. The live show, mobile apps and supporting online components were a success and received a number of awards.
'Best Multiplatform Project' and 'Best App' - Broadcast Digital Awards
The Singer Takes it All was a live television show produced by Channel 4 and Endemol Shine. It was the world's first talent show to hand the fate of the contestants over to the viewers. The Glasgow-based agency Chunk was responsible for its online component.
Nominated for a BAFTA Digital Creativity Award
The project as a whole consisted of "karaoke" mobile apps (I didn't work on these) that allowed contestants to record their own video performances, which the public would watch and judge, deciding whether each "hopeful" was a hit or a miss. The top contestants at the end of the week then appeared on the live show, where the viewers would vote again to decide their fate.
My personal responsibility was the WebAPI, which was challenging in the sense that it had to be secure, fair and able to handle a large amount of traffic at peak times (during live broadcasts).
9.5k video performances uploaded, 21 million Hopeful votes cast
Because the voting API decided whether contestants would appear on the live show and potentially win money, it was under a lot of scrutiny. We had to do all we could to reduce the chances of spam voting and ensure there were no duplicate votes.
OAuth2 and bearer tokens were used to prove identity
Users could only use the apps after registering, via either Facebook or Twitter, using OAuth2. This helped eliminate, or at least reduce, spam accounts being set up purely for voting purposes.
After logging in, a short-lived auth token was issued (and automatically refreshed in the background), which allowed a user's access to be revoked if suspicious activity was detected.
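The mechanics of a short-lived, revocable token can be sketched with the standard library alone. This is a minimal illustration, not the production implementation: the signing key, the 15-minute TTL and the in-memory revocation set are all assumptions for the example.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"server-side-signing-key"   # hypothetical key, kept server-side
TOKEN_TTL = 15 * 60                   # assumed 15-minute lifetime
revoked_users = set()                 # users flagged for suspicious activity

def _sign(payload: bytes) -> str:
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

def issue_token(user_id: str) -> str:
    """Issue a signed token carrying the user id and an expiry timestamp."""
    payload = json.dumps({"sub": user_id, "exp": time.time() + TOKEN_TTL}).encode()
    body = base64.urlsafe_b64encode(payload).decode()
    return f"{body}.{_sign(payload)}"

def verify_token(token: str):
    """Return the user id, or None if the token is tampered, expired or revoked."""
    body, sig = token.rsplit(".", 1)
    payload = base64.urlsafe_b64decode(body)
    if not hmac.compare_digest(sig, _sign(payload)):
        return None                   # signature mismatch: tampered
    claims = json.loads(payload)
    if claims["exp"] < time.time() or claims["sub"] in revoked_users:
        return None                   # expired, or access has been revoked
    return claims["sub"]
```

Because the token is short-lived and re-verified on every request, adding a user to the revocation set locks them out within one token lifetime at most.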
Again, due to the nature of the show, the voting mechanism had to be fair. Videos had to be delivered to users in a random, evenly distributed order.
To achieve this, we assigned a random number to each contestant and used SQL to pull a random contestant from the database. To get around the problem of some contestants being assigned slightly less favorable random numbers (numbers at either end of the range), we re-assigned them frequently.
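The idea can be sketched with an in-memory SQLite table. The table layout, contestant names and wrap-around query are assumptions for illustration, but the two mechanisms match what's described above: a random sort key per contestant, and frequent re-assignment so nobody is stuck at an unlucky end of the range.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hopefuls (id INTEGER PRIMARY KEY, name TEXT, sort_key REAL)")
conn.executemany(
    "INSERT INTO hopefuls (name, sort_key) VALUES (?, ?)",
    [(n, random.random()) for n in ["Ann", "Ben", "Cat", "Dan"]])

def random_hopeful():
    """Pick a random point in [0, 1) and take the next contestant above it."""
    r = random.random()
    row = conn.execute(
        "SELECT name FROM hopefuls WHERE sort_key >= ? ORDER BY sort_key LIMIT 1",
        (r,)).fetchone()
    if row is None:  # r landed beyond the largest key: wrap to the smallest
        row = conn.execute(
            "SELECT name FROM hopefuls ORDER BY sort_key LIMIT 1").fetchone()
    return row[0]

def reshuffle():
    """Re-assign keys frequently so no contestant keeps an unfavourable number."""
    for (hid,) in conn.execute("SELECT id FROM hopefuls").fetchall():
        conn.execute("UPDATE hopefuls SET sort_key = ? WHERE id = ?",
                     (random.random(), hid))
```

Without the reshuffle, a contestant whose key sits just above a large gap in the range would be picked more often than their neighbours; re-rolling the keys averages that bias away.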
We knew when the peak times would be, so we could prepare in advance, but because there was going to be a sudden spike in traffic we had to contact Amazon to pre-warm an appropriately sized load balancer.
up to 300,000 concurrent users at peak times.
To reduce the load on the web servers and database, we exported infrequently updated and expensive-to-compute data (e.g. the current leaderboard) as JSON and stored it on S3. A single manifest file, which the mobile apps would poll, indicated whether any of the data had changed.
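The manifest scheme can be sketched like this. A plain dict stands in for the S3 bucket, and the file names and version numbers are invented for the example; the point is that clients hit one small manifest and only re-download files whose version has changed.

```python
import json

# Simulated S3 bucket: object key -> JSON body (stand-ins for the real objects)
bucket = {
    "manifest.json": json.dumps({"leaderboard.json": 1}),
    "leaderboard.json": json.dumps([{"name": "Ann", "votes": 120}]),
}

class AppClient:
    """Polls the single manifest and re-fetches only files whose version changed."""

    def __init__(self):
        self.seen = {}    # last version number seen per file
        self.cache = {}   # locally cached data

    def poll(self):
        manifest = json.loads(bucket["manifest.json"])
        fetched = []
        for key, version in manifest.items():
            if self.seen.get(key) != version:
                self.cache[key] = json.loads(bucket[key])
                self.seen[key] = version
                fetched.append(key)
        return fetched
```

Every poll after the first costs one tiny S3 GET for the manifest and nothing else until something actually changes, which is what keeps the web servers and database out of the hot path.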
To stress test the WebAPI, I developed a simple(ish) master/slave application. It was actually a bit of a challenge in itself to generate the volume of traffic required by the tests without a prohibitive number of slave web servers.
Due to the high number of requests and the amount of work performed by the slaves, I really had to be careful with threading and object allocation (to reduce time spent in GC).
Typical "User Journeys" were coded to mimic user behaviour in different parts of the app (including time spent thinking, watching a video, etc.). Each User Journey was then apportioned a percentage of the total traffic, so that different types of journey could run in tandem and provide a more comprehensive simulation, i.e. not exercising certain parts of the app in isolation.
The master coordinated the slaves by telling them when to start and stop and how to apportion the traffic. It also collected the stats sent back from the slaves and plotted a graph of response times. Whilst the tests were running, I kept an eye on the response times and on the performance of the servers via the AWS portal.
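The apportioning step each slave performs can be sketched as a weighted pick. The journey names and the 60/30/10 split below are hypothetical; the real journeys and percentages weren't published.

```python
import random

# Hypothetical traffic split across journey types (assumed for illustration)
JOURNEYS = {
    "browse_and_watch":   0.6,
    "watch_and_vote":     0.3,
    "upload_performance": 0.1,
}

def pick_journey(rng=random.random):
    """Weighted pick, so each slave runs the journey mix at its apportioned share."""
    r, cumulative = rng(), 0.0
    for name, share in JOURNEYS.items():
        cumulative += share
        if r < cumulative:
            return name
    return name  # guard against floating-point rounding when r is ~1.0
```

Each worker thread on a slave just loops: pick a journey this way, play it through with its think/watch delays, record the response times, repeat. Over thousands of iterations the traffic mix converges on the configured percentages.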
Documentation of the API was important as there were a number of remote developers consuming it.
Swagger was used for generating interactive documentation
The HTML/JS based tool Swagger was great for visualizing and exploring the API; it generated comprehensive interactive documentation which allowed the mobile developers to get up to speed quickly.
Swagger allows you to GET, POST, PUT and DELETE to your API and inspect the response.
It was an interesting project, giving insight into high-traffic cloud-based applications and the different tools and methodologies available to solve their problems.