Webpage speed is almost always tied to caching. This article shows the improvements we can gain from different types of cache for a very important KPI: speed.

Methodology



The main objective is to measure the performance impact of three cache layers on a WordPress site, with TTFB as the main key performance indicator:

  1. WP Object Cache
  2. PHP opCache
  3. NGINX FastCGI Cache

Process

  • Benchmarking tools in use: ApacheBench (ab) and Google Chrome (Developer Tools).
  • Define the BASE performance (without any cache) in terms of reqs/sec and avg. TTFB.
  • Measure the impact of each cache.
  • Tests were run from a different server within the same LAN, with SSL termination. They therefore do not capture the real-world effect of distance between user and server, which can be noticeable when serving visitors on other continents but is minor when your target users are in the same region or country.
  • Caches were cleared and all key applications (NGINX, PHP-FPM 7.4 and Redis server) were restarted before each test.
  • BASE performance numbers show the raw (uncached) performance of the software stack (Linux, PHP, WordPress...) on the given server hardware.

Server in use:

  • LXC container
  • 4x vCPU (Xeon E5-2690)
  • 6GB RAM
  • LEMP (Linux Ubuntu 20.04, NGINX, MySQL 8, PHP-FPM 7.4), Redis
  • Database on Intel Optane

Why focus on TTFB?

One of the most important UX parameters for users browsing a webpage is its "perceived speed". How fast is the site loading? How fast is it responding to my clicks? One of the key performance indicators of speed is TTFB (time to first byte).

In technical terms, it is the moment when the user's browser starts receiving the server's response (headers and HTML content of a page). With that information the browser can start preparing things for display. The time up to that moment consists of establishing a connection to the server that hosts the webpage, the network distance between user and server and, often the biggest part, the server preparing the (HTML) response, which depends on hardware speed, server load, code, cache, database lookups, file lookups...
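As a rough illustration of what is being timed, a single-request TTFB can be approximated with Python's standard `http.client`. This is a minimal sketch and not the method behind the numbers in this article (those come from ab and Chrome); the function name `measure_ttfb` and its parameters are hypothetical. It counts connection setup (TCP and TLS) as part of TTFB and stops the clock once the status line and headers have arrived, so it slightly overestimates the true first byte.

```python
import http.client
import time


def measure_ttfb(host, port=443, path="/", use_tls=True, timeout=10.0):
    """Return (ttfb_ms, http_status) for one GET request.

    The timer starts before the connection is opened and stops as soon as
    the response status line and headers have been read.
    """
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = conn_cls(host, port, timeout=timeout)
    start = time.perf_counter()
    try:
        conn.request("GET", path)       # connects (TCP + TLS) and sends the request
        response = conn.getresponse()   # returns once status line + headers are in
        ttfb_ms = (time.perf_counter() - start) * 1000.0
        response.read()                 # drain the body; not part of TTFB
        return ttfb_ms, response.status
    finally:
        conn.close()


# Example call (needs network access to the tested host):
# ttfb_ms, status = measure_ttfb("wp.klik-mall.com")
```

A single request like this corresponds to the "single request in the browser" scenario, not to the averages ab reports under concurrent load.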

From a UX perspective, TTFB is the time leading up to the moment when users get the feeling "OK, something happened and I will see content very shortly". It is "passive time" on their side: they are waiting and cannot do anything to speed things up. This time is critical for UX, as patience is not something a webpage owner can count on. A site whose TTFB is too long will be perceived as slow, laggy, unresponsive...

The lower the TTFB, the better.

Reducing TTFB can also be among the most difficult optimizations for ordinary website owners, as they often do not have a deep understanding of the elements that affect loading times; do not understand what cache means and the tradeoffs that come with it; do not know, understand or care how good their site's code is and which parts are essential or unneeded; and do not know what kind of hardware their site runs on or how loaded the server hosting it is.

Benchmarks (reaching the highest reqs/sec)



$ ab -c 12 -t 20 https://wp.klik-mall.com/
Configuration             Reqs/sec             Relative to   Avg. TTFB (ms)      Relative to
                          (higher is better)   BASE PERF.    (lower is better)   BASE PERF.
BASE performance          9.29                 100%          1250                100%
WPobjectCache             9.55                 103%          1183                94.64%
opCache                   34.34                370%          343                 27.44%
FastCGI                   1097.29              11812%        5                   0.40%
WPobjectCache + opCache   35.92                387%          328                 26.24%
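The "Relative to BASE PERF." columns are simply each measured value divided by the BASE value. A minimal sketch of that arithmetic, using the Test 1 figures from the table (the names `TEST_1`, `relative_to_base` and `relative_table` are illustrative only):

```python
# Test 1 figures copied from the table: (reqs/sec, avg. TTFB in ms).
TEST_1 = {
    "BASE performance": (9.29, 1250),
    "WPobjectCache": (9.55, 1183),
    "opCache": (34.34, 343),
    "FastCGI": (1097.29, 5),
    "WPobjectCache + opCache": (35.92, 328),
}


def relative_to_base(value, base):
    """Return value as a percentage of base, rounded to two decimals."""
    return round(value / base * 100, 2)


def relative_table(results, base_name="BASE performance"):
    """Map each configuration to (reqs/sec %, TTFB %) relative to the base row."""
    base_rps, base_ttfb = results[base_name]
    return {
        name: (relative_to_base(rps, base_rps), relative_to_base(ttfb, base_ttfb))
        for name, (rps, ttfb) in results.items()
    }


for name, (rps_pct, ttfb_pct) in relative_table(TEST_1).items():
    print(f"{name:<26}{rps_pct:>10.0f}%{ttfb_pct:>10.2f}%")
```

For reqs/sec a value above 100% is an improvement; for TTFB a value below 100% is an improvement.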


[Charts: avg. TTFB (smaller is better) and reqs/sec (bigger is better)]

Notes & Observations

  • WP object cache combined with Redis gave improvements well below expectations (only 3%), considering the speed and potential of Redis itself. Note that the WordPress database is on localhost in the tested container and is stored on Intel Optane, the fastest NVMe storage drive at the time of writing. Its performance is closer to the speed of RAM than to that of the spinning disks commonly used on cheap hosting, or even some basic SSD storage. So running queries and fetching data "on demand" from a DB on Optane does not seem to be much slower than re-using the results of those queries saved in Redis (cache hit rate was ~100%). This could turn out quite differently with a more populated WordPress/WooCommerce site with a lot of products, posts and/or heavy plugins.
  • PHP opCache alone boosted performance by ~3.7x, a much better result than "WP object cache".
  • CPU utilization: ~5% with FastCGI, ~20-30% with opCache and ~100% while benchmarking with just WP object cache. PHP opCache removes the need to compile the code on every request, while FastCGI does not even "touch" the code of the application (WordPress).
  • FastCGI improved TTFB by 99%+ (same LAN, but different servers) while serving 118x more reqs/sec than BASE performance. Our managed WordPress hosting is partly tuned for this, as we use 2 types of storage for FastCGI cache content: Intel Optane and ZFS. In this case it is on Optane NVMe, which means the cache is served from storage that has very low latency, is persistent, is cheaper than RAM and has more capacity than the RAM available in the server.
  • There is no "performance scaling" from combining all three cache methods, even though they are not mutually exclusive. Results are very similar to FastCGI alone, because the user's request never reaches PHP/WordPress, so "WP object cache" and opCache are not used at all and cannot add to the overall improvement.
  • Combining "WP object cache" and opCache without FastCGI still makes sense, as it indicates the performance to expect on uncached pages, since not all pages can be served from cache. That is why there is a separate test combining only these 2 types of cache.
  • If you cannot use FastCGI cache and can only rely on WordPress plugins, it makes a lot of sense to use one of the popular full-page cache solutions (W3 Total Cache, WP Rocket...).

To put FastCGI performance into a different perspective:

FastCGI
  1. improved TTFB by ~99%,
  2. served 118x more reqs/sec than BASE performance,
  3. while using 95% less CPU.

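With FastCGI, if the measured 1097.29 reqs/sec were sustained, this setup could serve
about 3.95 million cached reqs/hour
or
about 94.8 million reqs/day (roughly 2.84 billion in a 30-day month).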
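The extrapolation is plain multiplication of the measured 1097.29 reqs/sec by the number of seconds in each period; roughly 3.96 million requests is what ~1100 reqs/sec yields in one hour. A minimal sketch (the function name and the 30-day month are assumptions made for the example), with the caveat that it is a theoretical ceiling that assumes the benchmark rate holds around the clock:

```python
SECONDS_PER_HOUR = 3600


def extrapolate(reqs_per_sec, days_per_month=30):
    """Scale a sustained requests-per-second rate to longer periods."""
    per_hour = reqs_per_sec * SECONDS_PER_HOUR
    per_day = per_hour * 24
    per_month = per_day * days_per_month
    return {"per_hour": per_hour, "per_day": per_day, "per_month": per_month}


capacity = extrapolate(1097.29)  # FastCGI result from the -c 12 benchmark
for period, total in capacity.items():
    print(f"{period}: {total / 1e6:,.2f} million requests")
```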

Benchmarks (looking for low TTFB scenario)



Based on an eye test while clicking around the site, it "felt" like it was loading much faster than a 1250ms TTFB would suggest. TTFB timings in Chrome Developer Tools confirmed that feeling: they were consistently and significantly lower than the avg. TTFB measured in Test 1 (the 12-connection benchmark above).

So I decided to run another test (Test 2), keeping the number of concurrent users no higher than the number of CPU cores (4).

$ ab -c 4 -t 20 https://wp.klik-mall.com/
Configuration             Reqs/sec             Relative to   Avg. TTFB (ms)      Relative to   Single req. Chrome   Relative to
                          (higher is better)   BASE PERF.    (lower is better)   BASE PERF.    TTFB (ms)            BASE PERF.
BASE performance          9.20                 100%          426                 100%          424.97               100%
WPobjectCache             9.27                 101%          424                 99.53%        400.13               94.15%
opCache                   31.87                346%          120                 28.17%        123.8                29.13%
FastCGI                   699.31               7601%         2                   0.47%         4.38                 1.03%
WPobjectCache + opCache   34.37                374%          111                 26.06%        121                  28.47%


Observations:

  • BASE performance in Test 1 vs. Test 2 shows that the server can still process only 9-10 reqs/sec, but the avg. TTFB under lighter load was significantly lower (426ms vs. 1250ms, or ~3x smaller). This matches the eye test and the browser TTFB timings.
  • Max. reqs/sec with FastCGI enabled was ~36% lower than in Test 1 (699.31 vs. 1097.29), so one can argue that the Test 1 tradeoff is a good one: an avg. TTFB of 5ms instead of 2ms in exchange for serving 1097.29 reqs/sec instead of 699.31. Those extra 3ms will probably go unnoticed by users.
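The gap between 1250ms and 426ms on the uncached site follows from simple queueing arithmetic. With the server saturated at ~9 reqs/sec and ab keeping a fixed number of requests in flight, Little's law says the mean time per request is roughly concurrency divided by throughput. A minimal sketch using the figures reported above (the function name is illustrative):

```python
def expected_time_per_request_ms(concurrency, reqs_per_sec):
    """Little's law: mean time in system = requests in flight / throughput."""
    return concurrency / reqs_per_sec * 1000.0


# (label, ab concurrency, measured reqs/sec) taken from the two BASE runs.
runs = [
    ("Test 1 BASE (-c 12)", 12, 9.29),
    ("Test 2 BASE (-c 4)", 4, 9.20),
]

for label, concurrency, rps in runs:
    ms = expected_time_per_request_ms(concurrency, rps)
    print(f"{label}: ~{ms:.0f} ms per request")
```

The results (about 1292ms and 435ms) are close to the measured 1250ms and 426ms averages, which suggests the higher TTFB in Test 1 is mostly time spent queueing behind other requests and not slower processing.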

Conclusions



  • Among the tested options, FastCGI Cache is the best solution, and by a wide (wide) margin. This feature transforms a WordPress site from a bike into a Formula 1 car while burning much less fuel.
  • Using "WP object cache" and opCache can still be very beneficial, as there are always some non-cached pages (bypassed by FastCGI) to be processed, such as /cart, /wp-admin etc. They also help after cache expirations and cache purges, and while warming the cache.
  • This page provides good starting points for tuning a server to host a WordPress site, as all three of these caches can be used together and 2 of them (FastCGI and opCache) are totally independent of other WordPress plugins, which often cause issues by being incompatible with each other.

 

Benchmark results on BASE scenario

$ ab -c 12 -t 20 https://wp.klik-mall.com/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking wp.klik-mall.com (be patient)
Finished 187 requests


Server Software:        nginx/1.18.0
Server Hostname:        wp.klik-mall.com
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,2048,128
TLS Server Name:        wp.klik-mall.com

Document Path:          /
Document Length:        33058 bytes

Concurrency Level:      12
Time taken for tests:   20.126 seconds
Complete requests:      187
Failed requests:        0
Total transferred:      6265399 bytes
HTML transferred:       6181846 bytes
Requests per second:    9.29 [#/sec] (mean)
Time per request:       1291.478 [ms] (mean)
Time per request:       107.623 [ms] (mean, across all concurrent requests)
Transfer rate:          304.02 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        3    4   1.3      3      10
Processing:   404 1250 195.3   1230    2016
Waiting:      380 1223 193.4   1205    1990
Total:        411 1254 195.5   1233    2026

Percentage of the requests served within a certain time (ms)
  50%   1232
  66%   1294
  75%   1332
  80%   1375
  90%   1492
  95%   1598
  98%   1634
  99%   1997
 100%   2026 (longest request)
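Figures like these can be pulled out of an ab report programmatically. The sketch below uses a hypothetical helper (`parse_ab_output` and its dictionary layout are not part of ab) that extracts the requests-per-second line and the "Connection Times" rows with regular expressions. As far as I understand ab's report, the "Waiting" row is the time between sending the request and receiving the first byte of the response, which makes it the row closest to TTFB without connection setup.

```python
import re

# Excerpt of the BASE report shown above.
SAMPLE_REPORT = """\
Requests per second:    9.29 [#/sec] (mean)
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        3    4   1.3      3      10
Processing:   404 1250 195.3   1230    2016
Waiting:      380 1223 193.4   1205    1990
Total:        411 1254 195.5   1233    2026
"""

ROW_NAMES = ("Connect", "Processing", "Waiting", "Total")


def parse_ab_output(text):
    """Extract reqs/sec and the Connection Times table from an ab report."""
    result = {}
    match = re.search(r"^Requests per second:\s+([\d.]+)", text, re.MULTILINE)
    if match:
        result["reqs_per_sec"] = float(match.group(1))
    for row in ROW_NAMES:
        # Columns are: min, mean, standard deviation, median, max (all in ms).
        pattern = rf"^{row}:\s+(\d+)\s+(\d+)\s+([\d.]+)\s+(\d+)\s+(\d+)"
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            result[row.lower()] = {
                "min": int(match.group(1)),
                "mean": int(match.group(2)),
                "sd": float(match.group(3)),
                "median": int(match.group(4)),
                "max": int(match.group(5)),
            }
    return result


parsed = parse_ab_output(SAMPLE_REPORT)
print(parsed["reqs_per_sec"], parsed["processing"]["mean"], parsed["waiting"]["mean"])
```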

Benchmark results with NGINX FastCGI enabled

$ ab -c 12 -t 20 https://wp.klik-mall.com/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking wp.klik-mall.com (be patient)
Completed 5000 requests
Completed 10000 requests
Completed 15000 requests
Completed 20000 requests
Finished 21946 requests


Server Software:        nginx/1.18.0
Server Hostname:        wp.klik-mall.com
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,2048,128
TLS Server Name:        wp.klik-mall.com

Document Path:          /
Document Length:        33058 bytes

Concurrency Level:      12
Time taken for tests:   20.000 seconds
Complete requests:      21946
Failed requests:        0
Total transferred:      735362627 bytes
HTML transferred:       725536136 bytes
Requests per second:    1097.29 [#/sec] (mean)
Time per request:       10.936 [ms] (mean)
Time per request:       0.911 [ms] (mean, across all concurrent requests)
Transfer rate:          35906.21 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2    6   2.7      5      21
Processing:     1    5   7.6      4    1030
Waiting:        0    3   7.6      2    1029
Total:          4   11   8.1     10    1039

Percentage of the requests served within a certain time (ms)
  50%     10
  66%     12
  75%     13
  80%     14
  90%     16
  95%     18
  98%     19
  99%     20
 100%   1039 (longest request)
