A real-world case study on optimizing React SPA performance.
Website performance isn't just about load time. Providing a fast and responsive experience to user input is just as critical, especially for the productivity desktop applications that people use every day. The engineering team at Recruit Technologies went through a refactoring project to improve the user input performance of one of its web applications, AirSHIFT. This is how they did it.
Slow response, lower productivity
AirSHIFT is a desktop web application that helps shop owners, such as restaurants and cafes, manage the shift work of their staff members. Built with React, the single-page app provides rich client functionality including multiple shift schedule grid tables organized by day, week, month, and more.
As the Recruit Technologies engineering team added new features to the AirSHIFT app, they started getting more feedback on slow performance. AirSHIFT Engineering Manager Yosuke Furukawa said:
In a user research study, we were surprised when one of the store owners said that he would leave his seat to make coffee after clicking a button, just to kill time waiting for the shift table to load.
After conducting the research, the engineering team realized that many of its users were trying to load massive shift tables onto low-spec computers, such as a 1 GHz Celeron M laptop from 10 years ago.
The AirSHIFT application was blocking the main thread with expensive scripts, but the engineering team didn't realize how expensive the scripts were because they were being developed and tested on rich-spec computers with fast Wi-Fi connections.
After profiling the app in Chrome DevTools with CPU and network throttling enabled, the team realized that performance optimization was needed. AirSHIFT formed a working group to address the issue. Here are 5 things they focused on to make their app more responsive to user input.
1. Virtualize large tables
Displaying the shift table required several costly steps: building the virtual DOM and rendering it on the screen in proportion to the number of staff members and time slots. For example, if a restaurant has 50 working members and you want to check their monthly shift schedule, it would be a table of 50 (members) multiplied by 30 (days), resulting in 1,500 cell components to render. This is a very expensive operation, especially for low-spec devices. In reality, things were worse. From the investigation, they discovered that there were stores managing 200 staff members, requiring around 6,000 cell components in a single monthly table.
To reduce the cost of this operation, AirSHIFT virtualized the shift table. The application now only mounts the components inside the viewport and unmounts the components outside the screen.
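The core idea behind this kind of windowing can be sketched as a small calculation (an illustrative sketch, not AirSHIFT's actual code): given the scroll offset and the viewport size, derive the small range of rows that actually needs to be mounted.

```javascript
// Compute which rows of a fixed-row-height table are visible, plus a
// small overscan buffer so fast scrolling doesn't show blank rows.
// All names here are hypothetical, for illustration only.
function visibleRowRange(scrollTop, viewportHeight, rowHeight, rowCount, overscan = 2) {
  const first = Math.max(0, Math.floor(scrollTop / rowHeight) - overscan);
  const last = Math.min(
    rowCount - 1,
    Math.ceil((scrollTop + viewportHeight) / rowHeight) + overscan
  );
  return { first, last };
}

// A 200-row monthly table would mean thousands of cell components, but
// with a 600px-tall viewport and 40px rows only rows 0-17 get mounted.
const range = visibleRowRange(0, 600, 40, 200);
console.log(range); // { first: 0, last: 17 }
```

Libraries like react-virtualized and react-window perform this calculation on scroll and mount only the components in the resulting range.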
In this case, AirSHIFT used react-virtualized, as there were requirements around enabling complex two-dimensional grid tables. They are also exploring ways to convert the implementation to use the lighter react-window in the future.
Results
Table virtualization alone reduced scripting time by 6 seconds (in a 4x CPU throttled + Fast 3G throttled environment on a MacBook Pro). This was the most impactful performance improvement in the refactoring project.
2. Audit with the User Timing API
The AirSHIFT team then refactored the scripts that run on user input. The flame chart in the Performance panel of Chrome DevTools makes it possible to analyze what is actually happening in the main thread, but the AirSHIFT team found it easier to analyze application activity based on React's lifecycle.
React 16 provides its performance measurements through the User Timing API, which you can view in the Timings section of Chrome DevTools. AirSHIFT used the Timings section to find unnecessary logic running in React lifecycle events.
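The same API is available for your own measurements: wrap a suspect operation in mark/measure calls and the result appears alongside React's entries in the Timings section of a DevTools recording. A minimal sketch with hypothetical mark names (`performance` is a global in browsers and in Node.js 16+):

```javascript
// Mark the boundaries of an operation you want to audit.
performance.mark('shift-table-render:start');

// (the expensive rendering work would run here)
let total = 0;
for (let i = 0; i < 1e6; i++) total += i;

performance.mark('shift-table-render:end');

// The measure between the two marks shows up in the Timings section
// of a Chrome DevTools Performance recording.
const measure = performance.measure(
  'shift-table-render',
  'shift-table-render:start',
  'shift-table-render:end'
);
console.log(measure.name, measure.duration);
```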
Results
The AirSHIFT team found that an unnecessary React Tree Reconciliation was happening just before every route navigation. This meant that React was updating the shift table unnecessarily before the navigations. An unnecessary Redux state update was causing this problem. Fixing it saved around 750 ms of scripting time. AirSHIFT also made other micro-optimizations which eventually led to a total reduction of 1 second in scripting time.
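A common shape for this kind of fix is to make reducers identity-preserving: an action that is irrelevant to a slice of state returns the exact same state object, so react-redux's reference comparison lets connected components skip re-rendering and React skips reconciling the table. A minimal sketch, not AirSHIFT's actual reducer:

```javascript
// Hypothetical reducer for illustration. Returning the previous state
// object untouched (same reference) signals "nothing changed here",
// so connected components do not re-render.
function shiftTableReducer(state = { cells: [], route: '/' }, action) {
  switch (action.type) {
    case 'NAVIGATE':
      // Navigation must not touch the table data at all.
      return state.route === action.route
        ? state
        : { ...state, route: action.route };
    case 'SET_CELLS':
      // Skip the update when the payload is identical.
      return state.cells === action.cells
        ? state
        : { ...state, cells: action.cells };
    default:
      return state;
  }
}

const s1 = shiftTableReducer(undefined, { type: 'INIT' });
const s2 = shiftTableReducer(s1, { type: 'SET_CELLS', cells: s1.cells });
console.log(s1 === s2); // true: same reference, no re-render triggered
```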
3. Lazy load components and move expensive logic to web workers
AirSHIFT has a built-in chat application. Many store owners communicate with their staff members via chat while looking at the shift table, which means that a user may be typing a message while the table is loading. If the main thread is occupied with scripts rendering the table, user input could be janky.
To improve this experience, AirSHIFT now uses React.lazy and Suspense to show placeholders for table content while lazily loading the actual components.
The AirSHIFT team also migrated some of the expensive business logic within the lazily loaded components into web workers. This solved the user input jank problem by freeing up the main thread so that it could focus on responding to user input.
Typically developers face complexity in using workers, but this time Comlink did the heavy lifting for them. Below is the pseudocode of how AirSHIFT workerized one of the most expensive operations they had: calculating total labor cost.
In App.js, use React.lazy and Suspense to show fallback content while loading:

```js
import React, { lazy, Suspense } from 'react';

// Lazily load the Cost component so it doesn't block the initial render.
const Cost = lazy(() => import('./Cost'));

const Loading = () => (
  <div>Some fallback content to show while loading</div>
);

export default function App({ userInfo }) {
  return (
    <div>
      <Suspense fallback={<Loading />}>
        <Cost userInfo={userInfo} />
      </Suspense>
    </div>
  );
}
```
In the Cost component, use Comlink to run the calculation logic in a worker:

```js
import React, { useState, useEffect } from 'react';
import { proxy } from 'comlink';

// Wrap the worker once so its exported class can be called asynchronously.
const WorkerlizedCostCalc = proxy(new Worker('./WorkerlizedCostCalc.js'));

export default function Cost({ userInfo }) {
  const [cost, setCost] = useState(null);
  useEffect(() => {
    (async () => {
      const instance = await new WorkerlizedCostCalc();
      setCost(await instance.calc(userInfo));
    })();
  }, [userInfo]);
  return <p>{cost}</p>;
}
```
Implement the calculation logic that runs in the worker and expose it with Comlink:

```js
import { expose } from 'comlink';
import { someExpensiveCalculation } from './CostCalc.js';

expose({
  calc(userInfo) {
    return someExpensiveCalculation(userInfo);
  }
}, self);
```
Results
Although they only workerized a limited amount of logic as a trial, AirSHIFT shifted around 100 ms of its JavaScript from the main thread to the worker thread (simulated with 4x CPU throttling).
AirSHIFT is currently exploring whether they can lazy load other components and offload more logic to web workers to further reduce jank.
4. Establish a performance budget
With all of these optimizations in place, it was critical to make sure that the app stays performant over time. AirSHIFT now uses bundlesize to make sure the JavaScript and CSS file sizes do not exceed their budgets. Besides setting these basic budgets, they built a dashboard to show various percentiles of the shift table loading time to check whether the app is performant even in non-ideal conditions.
- Script completion time is now measured for every Redux event
- Performance data is collected in Elasticsearch
- The 10th, 25th, 50th, and 75th percentile performance of each event is visualized with Kibana
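For illustration, this is roughly the percentile math behind such a dashboard, using the nearest-rank method (in AirSHIFT's setup Elasticsearch computes this server-side; the sample numbers below are made up):

```javascript
// Nearest-rank percentile: the value below which at least p% of the
// sorted samples fall. Simplified for illustration.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Hypothetical shift table load times in milliseconds.
const loadTimes = [800, 950, 1200, 1500, 1800, 2100, 2600, 3200];
console.log(percentile(loadTimes, 50)); // 1500
console.log(percentile(loadTimes, 75)); // 2100
```

Tracking several percentiles rather than the average is what reveals how the app behaves for users on slow devices and networks.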
AirSHIFT is now monitoring the shift table loading event to make sure it completes within 3 seconds for 75th percentile users. This is an unenforced budget for now, but they are considering automatic notifications via Elasticsearch when they exceed their budget.
Results
In the graph above, you can see that AirSHIFT is now mainly hitting the 3 second budget for 75th percentile users and also loads the shift table in one second for 25th percentile users. By capturing RUM performance data from various conditions and devices, AirSHIFT can now check if a new feature version is really affecting the performance of the application or not.
5. Performance hackathons
While all of these performance optimization efforts were important and impactful, it's not always easy to get engineering and business teams to prioritize non-functional development. Part of the challenge is that some of these performance optimizations cannot be planned. They require experimentation and a trial and error mindset.
AirSHIFT is now conducting internal 1-day performance hackathons to allow engineers to focus only on performance-related work. In these hackathons they remove all limitations and respect the creativity of the engineers, which means that any implementation that contributes to speed is worth considering. To speed up the hackathon, AirSHIFT divides the group into small teams and each team competes to see who can get the biggest Lighthouse performance score improvement. Teams become very competitive! 🔥
Results
The hackathon approach is working well for them.
- Performance bottlenecks can be easily spotted by testing multiple approaches during the hackathon and measuring each with Lighthouse.
- After the hackathon, it's pretty easy to convince the team which optimization they should prioritize for the production release.
- It is also an effective way to advocate for the importance of speed. Every participant can understand the correlation between how you code and how it translates to performance.
A nice side effect was that many other engineering teams within Recruit took an interest in this practical approach and the AirSHIFT team is now facilitating multiple speed hackathons within the company.
Summary
It was definitely not the easiest journey for AirSHIFT to work on these optimizations, but it was certainly worth it. Now AirSHIFT is loading the shift table in 1.5 seconds at the median, which is a 6x improvement over its pre-project performance.
After the performance optimizations were released, one user said:
Thank you very much for speeding up the loading of the shift table. Organizing shift work is much more efficient now.