Methodology
The main objective is to measure the performance impact of different cache enhancements on a WordPress webpage, with TTFB as the main key performance indicator:
- WP Object Cache
- PHP opCache
- NGINX FastCGI Cache
Process
- Benchmarking tools in use: ApacheBench (ab) and Google Chrome (Developer Tools)
- Defining the BASE performance (without any cache) in terms of Reqs/sec and avg. TTFB.
- Measuring the impact of each cache
- Tests were performed within the same LAN, but from a different server and with SSL termination. This setup does not capture the real-world effect of distance between the user and the server. Distance can make a noticeable difference when serving visitors on other continents, but it has only a minor impact if your target users are within the same region or country.
- The cache was cleared and all key applications (NGINX, PHP-FPM 7.4 and Redis server) were restarted before each test.
- BASE performance numbers show the raw (uncached) performance of the software stack (Linux, PHP, WordPress...) running on the given server hardware.
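The procedure above can be sketched as a small helper that builds the commands for one benchmark run. This is only a sketch under assumptions: the systemd unit names (`nginx`, `php7.4-fpm`, `redis-server`) are assumed Ubuntu defaults, `build_run_plan` and `run_plan` are hypothetical names, and the cache-clearing step is left out because the exact method is not described here. In the tests the restart happened on the web server and `ab` ran from a different machine; both commands are shown in one plan only for brevity.

```python
import shlex
import subprocess

# Assumed systemd unit names (Ubuntu 20.04 defaults); adjust to your setup.
SERVICES = ["nginx", "php7.4-fpm", "redis-server"]


def build_run_plan(url, concurrency, seconds, services=SERVICES):
    """Return the commands for one benchmark run as argument lists."""
    # Restart the key applications so every run starts from the same state.
    restart = ["systemctl", "restart", *services]
    # ab flags: -c = number of concurrent requests, -t = max. seconds to benchmark
    bench = ["ab", "-c", str(concurrency), "-t", str(seconds), url]
    return [restart, bench]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: only prints the commands, nothing is executed.
    run_plan(build_run_plan("https://wp.klik-mall.com/", 12, 20))
```

Keeping the plan as plain argument lists makes it easy to review what would run before switching `dry_run` off.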
Server in use:
- LXC container
- 4x vCPU (Xeon E5-2690)
- 6GB RAM
- LEMP (Linux Ubuntu 20.04, NGINX, MySQL 8, PHP-FPM 7.4), Redis
- Database on Intel Optane
Why focus on TTFB?
One of the most important UX parameters for users browsing a webpage is its "perceived speed". How fast is the site loading? How fast does it respond to my clicks? One of the key indicators of that speed is TTFB (time to first byte).
In technical terms, it is the time that elapses until the user's browser starts receiving the server's response (headers and the HTML content of a page). With that information the browser can start preparing the page for display. TTFB consists of the time needed to establish a connection with the server hosting the webpage, the network distance between the user and the server and, often the biggest part, the time the server needs to prepare the (HTML) response, which depends on hardware speed, server load, code, cache, database lookups, file lookups...
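To make these phases concrete, here is a minimal sketch that times a single HTTPS request with Python's standard library. The function names `measure_ttfb` and `to_breakdown_ms` are hypothetical, and the "first byte" moment is approximated by the point where the status line and headers have been parsed, so the numbers will not match Chrome Developer Tools exactly.

```python
import http.client
import time


def to_breakdown_ms(t_start, t_connected, t_first_byte, t_done):
    """Convert four perf_counter() timestamps (seconds) into durations in ms."""
    return {
        "connect_ms": (t_connected - t_start) * 1000.0,
        "ttfb_ms": (t_first_byte - t_start) * 1000.0,
        "total_ms": (t_done - t_start) * 1000.0,
    }


def measure_ttfb(host, path="/", port=443, timeout=10.0):
    """Fetch https://host:port/path once and time the phases of the request."""
    conn = http.client.HTTPSConnection(host, port, timeout=timeout)
    try:
        t_start = time.perf_counter()
        conn.connect()                 # DNS lookup, TCP connect and TLS handshake
        t_connected = time.perf_counter()
        conn.request("GET", path)
        response = conn.getresponse()  # returns once status line and headers are read
        t_first_byte = time.perf_counter()
        response.read()                # download the rest of the body
        t_done = time.perf_counter()
    finally:
        conn.close()
    return to_breakdown_ms(t_start, t_connected, t_first_byte, t_done)
```

Calling `measure_ttfb("wp.klik-mall.com")` would return a dictionary with `connect_ms`, `ttfb_ms` and `total_ms`; the difference between `ttfb_ms` and `connect_ms` is roughly the time the server spent preparing the response.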
From a UX perspective, TTFB is the time leading up to the moment when a user gets the feeling "OK, something happened and I will see content very shortly". It is "passive time" for users: they are waiting and cannot do anything to speed things up. This time is critical for UX because patience is not something a webpage owner can count on. A site with a TTFB that is too long will be perceived as slow, laggy, unresponsive...
The lower the TTFB, the better.
Reducing TTFB can also be among the most difficult things to optimize for ordinary website owners, as they often do not have a deep understanding of the elements that affect loading times. They may not understand what a cache is and the tradeoffs that come with it, how good their site's code is and which parts are essential or unnecessary, what kind of hardware the site runs on, or how loaded the server hosting it is.
Benchmarks (reaching the highest reqs/sec)
$ ab -c 12 -t 20 https://wp.klik-mall.com/
Configuration | Reqs/sec (higher is better) | Relative to BASE perf. | Avg. TTFB in ms (lower is better) | Relative to BASE perf.
BASE performance | 9.29 | 100% | 1250 | 100%
WPobjectCache | 9.55 | 103% | 1183 | 94.64%
opCache | 34.34 | 370% | 343 | 27.44%
FastCGI | 1097.29 | 11812% | 5 | 0.40%
WPobjectCache + opCache | 35.92 | 387% | 328 | 26.24%
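The "Relative to BASE perf." columns are simply each value divided by the BASE value. A minimal sketch of that arithmetic, using the numbers from the table above (the function names are hypothetical):

```python
def relative_to_base(value, base):
    """Return value as a percentage of base."""
    return value / base * 100.0


# Results from the table above: configuration -> (reqs/sec, avg. TTFB in ms)
RESULTS = {
    "BASE performance": (9.29, 1250),
    "WPobjectCache": (9.55, 1183),
    "opCache": (34.34, 343),
    "FastCGI": (1097.29, 5),
    "WPobjectCache + opCache": (35.92, 328),
}


def relative_table(results, base_key="BASE performance"):
    """Map each configuration to (reqs/sec %, TTFB %) relative to the base row."""
    base_rps, base_ttfb = results[base_key]
    return {
        name: (
            round(relative_to_base(rps, base_rps)),       # whole percent
            round(relative_to_base(ttfb, base_ttfb), 2),  # two decimals
        )
        for name, (rps, ttfb) in results.items()
    }


if __name__ == "__main__":
    for name, (rps_pct, ttfb_pct) in relative_table(RESULTS).items():
        print(f"{name}: {rps_pct}% reqs/sec, {ttfb_pct:.2f}% TTFB")
```

Note that the base must come from the same test run as the value it is compared with; mixing runs with different concurrency gives misleading percentages.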
Notes & Observations
- WP object cache combined with Redis gave improvements well below expectations (only 3%), considering the speed and potential of Redis itself. Note that the WordPress database is on localhost of the tested container and is stored on Intel Optane, the fastest NVMe storage drive at the time of writing. Its performance is closer to RAM than to the spinning disks commonly used by cheap hosting providers, or even to some basic SSD storage. So running queries and fetching data on demand from a database on Optane does not seem to be much slower than reusing the results of those queries saved in Redis (the cache hit rate was ~100%). This could turn out quite differently with a more populated WordPress/WooCommerce site with a lot of products, posts and/or heavy plugins.
- PHP opCache alone boosted reqs/sec by ~3.7x and gives much better results than the WP object cache.
- CPU utilization while benchmarking: ~5% with FastCGI, ~20-30% with opCache and ~100% with just the WP object cache. PHP opCache removes the need to compile the code on every request, while FastCGI does not even "touch" the code of the application (WordPress).
- FastCGI improved TTFB by 99%+ (same LAN, but different servers) while serving 118x more reqs/sec than BASE performance. Our managed WordPress hosting is partly tuned for this, as we use two types of storage for FastCGI cache content: Intel Optane and ZFS. In this case the cache is on Optane NVMe, which means it is served from storage that has very low latency, is persistent, is cheaper than RAM and offers more capacity than the RAM available in the server.
- There is no "performance scaling" from combining all three cache methods, even though they are not mutually exclusive. Results are very similar to FastCGI alone because the user's request does not even reach PHP/WordPress, so the WP object cache and opCache are never used and cannot add to the overall improvement.
- Combining the WP object cache and opCache without FastCGI still makes sense, as it indicates the performance to expect on uncached pages; not all pages can be served from cache. That is why there is a separate test combining only these two types of cache.
- If you cannot use a FastCGI cache and can only rely on WordPress plugins, it makes a lot of sense to use one of the popular full-page cache solutions (W3 Total Cache, WP Rocket...).
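Why the three caches do not add up can be illustrated with a toy model of the request path. This is purely a sketch: the class `Stack`, its counters and the helper `simulate` are hypothetical and only mimic the order in which the layers are consulted, not how NGINX, PHP or Redis are implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Stack:
    """Toy request path: page cache -> PHP (opcode cache) -> object cache -> DB."""
    page_cache_enabled: bool = True
    page_cache: dict = field(default_factory=dict)
    opcode_cache: set = field(default_factory=set)
    object_cache: dict = field(default_factory=dict)
    counters: dict = field(
        default_factory=lambda: {"php_runs": 0, "compiles": 0, "db_queries": 0}
    )

    def handle(self, url):
        # 1. Full-page cache: a hit is answered right away, PHP never starts.
        if self.page_cache_enabled and url in self.page_cache:
            return self.page_cache[url]
        # 2. PHP runs; the opcode cache skips compilation of a known script.
        self.counters["php_runs"] += 1
        if "index.php" not in self.opcode_cache:
            self.counters["compiles"] += 1
            self.opcode_cache.add("index.php")
        # 3. Object cache: reuse a stored query result instead of asking the DB.
        if url not in self.object_cache:
            self.counters["db_queries"] += 1
            self.object_cache[url] = f"rows for {url}"
        html = f"<html>{self.object_cache[url]}</html>"
        if self.page_cache_enabled:
            self.page_cache[url] = html
        return html


def simulate(requests, page_cache_enabled):
    """Send the same URL `requests` times and return the layer counters."""
    stack = Stack(page_cache_enabled=page_cache_enabled)
    for _ in range(requests):
        stack.handle("/")
    return stack.counters
```

With the page cache enabled, `simulate(1000, True)` reports a single PHP run for 1000 requests, so the two inner caches have almost nothing to do. With `simulate(1000, False)` PHP runs 1000 times, and that is where the opcode and object caches earn their keep.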
Benchmarks (looking for low TTFB scenario)
Based on an eye test while clicking through the webpage, it "felt" like it was loading much faster than a 1250 ms TTFB. Looking at the TTFB timings in Chrome Developer Tools confirmed that feeling, as they were consistently and significantly lower than the avg. TTFB measured in Test 1.
So I decided to run another test, keeping the number of concurrent users no higher than the number of CPU cores.
$ ab -c 4 -t 20 https://wp.klik-mall.com/
Configuration | Reqs/sec (higher is better) | Relative to BASE perf. | Avg. TTFB in ms (lower is better) | Relative to BASE perf. | Single req. Chrome TTFB (ms) | Relative to BASE perf.
BASE performance | 9.20 | 100% | 426 | 100% | 424.97 | 100%
WPobjectCache | 9.27 | 101% | 424 | 99.53% | 400.13 | 94.15%
opCache | 31.87 | 346% | 120 | 28.17% | 123.8 | 29.13%
FastCGI | 699.31 | 7601% | 2 | 0.47% | 4.38 | 1.03%
WPobjectCache + opCache | 34.37 | 374% | 111 | 26.06% | 121 | 28.47%
Observations:
- BASE performance in Test 1 vs. Test 2 shows that the server can still process only 9-10 reqs/sec either way. But the avg. TTFB under the lighter load was significantly lower (426 ms vs. 1250 ms, or ~3x smaller). This matched the eye test and the browser TTFB timings.
- Max. achieved reqs/sec with FastCGI enabled was ~36% lower than in Test 1, so one can argue that it is a good tradeoff to have an avg. TTFB of 5 ms instead of 2 ms while being able to serve 1097.29 reqs/sec instead of 699.31. Those extra 3 ms will probably go unnoticed by users.
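The ~3x difference in BASE TTFB between the two tests is roughly what Little's law predicts for a load generator that keeps a fixed number of requests in flight: mean time per request = concurrency / throughput. A minimal sketch of that check with the numbers from both tests (the function name is hypothetical; the result includes connection time, so it sits slightly above the avg. TTFB in the tables):

```python
def time_per_request_ms(concurrency, reqs_per_sec):
    """Little's law: mean time per request = concurrency / throughput."""
    return concurrency / reqs_per_sec * 1000.0


if __name__ == "__main__":
    runs = [
        ("BASE, Test 1", 12, 9.29),
        ("BASE, Test 2", 4, 9.20),
        ("FastCGI, Test 1", 12, 1097.29),
        ("FastCGI, Test 2", 4, 699.31),
    ]
    for label, concurrency, rps in runs:
        print(f"{label}: {time_per_request_ms(concurrency, rps):.1f} ms")
```

For the BASE scenario this gives about 1291.7 ms at 12 concurrent requests and about 434.8 ms at 4. Since the server tops out at 9-10 reqs/sec either way, tripling the concurrency mostly triples the waiting time.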
Conclusions
- Among the tested options, FastCGI Cache is the best solution, and by a wide (wide) margin. This feature transforms a WordPress site from a bike into a Formula race car while burning much less fuel.
- Using the WP object cache and opCache can still be very beneficial, as there are always a couple of non-cached pages (bypassed by FastCGI) to be processed, like /cart, /wp-admin etc. It also helps after cache expiration, after a cache purge, while warming the cache...
- This page provides good starting points for tuning a server for hosting a WordPress site, as all three of these caches are non-exclusive and two of them (FastCGI and opCache) are totally independent of other WordPress plugins, which often cause issues by being incompatible with each other.
Benchmark results on BASE scenario
$ ab -c 12 -t 20 https://wp.klik-mall.com/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking wp.klik-mall.com (be patient)
Finished 187 requests

Server Software:        nginx/1.18.0
Server Hostname:        wp.klik-mall.com
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,2048,128
TLS Server Name:        wp.klik-mall.com

Document Path:          /
Document Length:        33058 bytes

Concurrency Level:      12
Time taken for tests:   20.126 seconds
Complete requests:      187
Failed requests:        0
Total transferred:      6265399 bytes
HTML transferred:       6181846 bytes
Requests per second:    9.29 [#/sec] (mean)
Time per request:       1291.478 [ms] (mean)
Time per request:       107.623 [ms] (mean, across all concurrent requests)
Transfer rate:          304.02 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        3    4   1.3      3      10
Processing:   404 1250 195.3   1230    2016
Waiting:      380 1223 193.4   1205    1990
Total:        411 1254 195.5   1233    2026

Percentage of the requests served within a certain time (ms)
  50%   1232
  66%   1294
  75%   1332
  80%   1375
  90%   1492
  95%   1598
  98%   1634
  99%   1997
 100%   2026 (longest request)
Benchmark results with NGINX FastCGI enabled
$ ab -c 12 -t 20 https://wp.klik-mall.com/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking wp.klik-mall.com (be patient)
Completed 5000 requests
Completed 10000 requests
Completed 15000 requests
Completed 20000 requests
Finished 21946 requests

Server Software:        nginx/1.18.0
Server Hostname:        wp.klik-mall.com
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,2048,128
TLS Server Name:        wp.klik-mall.com

Document Path:          /
Document Length:        33058 bytes

Concurrency Level:      12
Time taken for tests:   20.000 seconds
Complete requests:      21946
Failed requests:        0
Total transferred:      735362627 bytes
HTML transferred:       725536136 bytes
Requests per second:    1097.29 [#/sec] (mean)
Time per request:       10.936 [ms] (mean)
Time per request:       0.911 [ms] (mean, across all concurrent requests)
Transfer rate:          35906.21 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2    6   2.7      5      21
Processing:     1    5   7.6      4    1030
Waiting:        0    3   7.6      2    1029
Total:          4   11   8.1     10    1039

Percentage of the requests served within a certain time (ms)
  50%     10
  66%     12
  75%     13
  80%     14
  90%     16
  95%     18
  98%     19
  99%     20
 100%   1039 (longest request)
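When comparing many runs it helps to pull the headline numbers out of these reports automatically. Below is a minimal sketch that does this with regular expressions; the function name `parse_ab_output` is hypothetical and the patterns only cover the fields shown in the reports above. The avg. TTFB values in the result tables match the "Processing" mean (1250 ms and 5 ms), while the "Waiting" row is probably the closer match to a strict time to first byte.

```python
import re

# Excerpt of the BASE report above, used as sample input.
SAMPLE = """
Complete requests:      187
Failed requests:        0
Requests per second:    9.29 [#/sec] (mean)
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        3    4   1.3      3      10
Processing:   404 1250 195.3   1230    2016
Waiting:      380 1223 193.4   1205    1990
"""


def parse_ab_output(text):
    """Extract a few headline numbers from ApacheBench's text report."""
    def grab(pattern):
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"pattern not found: {pattern}")
        return float(match.group(1))

    return {
        "complete_requests": int(grab(r"Complete requests:\s+(\d+)")),
        "failed_requests": int(grab(r"Failed requests:\s+(\d+)")),
        "reqs_per_sec": grab(r"Requests per second:\s+([\d.]+)"),
        # "Connection Times" rows: min, mean, [+/-sd], median, max
        "processing_mean_ms": grab(r"Processing:\s+\d+\s+(\d+)"),
        "waiting_mean_ms": grab(r"Waiting:\s+\d+\s+(\d+)"),
    }


if __name__ == "__main__":
    print(parse_ab_output(SAMPLE))
```

Because the patterns match on any whitespace, the same function works whether the report is kept line by line or has been pasted as a single line.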