

How we optimized our Private Cloud
to provide high performance webhosting on a PHP / LEMP stack


This is an overview of our private webhosting cloud for clients who use the Klik MALL SaaS platform for their websites and webshops.

It can also serve as a blueprint for high-performance Wordpress, WooCommerce and Magento setups, as these also run on a PHP / LEMP stack.

 

Quick summary

To highlight the added value of a setup like this private cloud, here are the main differences compared to a common "all-in-one" shared/VPS webhosting package:

  • HAProxy is a special, multi-purpose server. First, it acts as a WAF (Web Application Firewall) that protects resources from usage abusers, useless bot traffic etc. Second, it acts as a load balancer that provides horizontal scaling for PHP code execution, NodeJS services etc. It also gives us the option to use smaller but faster, better-suited CPUs to provide a better UX with faster page loads.
  • Using the fastest CPU for PHP can reduce TTFB by 30%+ compared to common hosting CPUs. An important thing to note here is that the "best CPU for the job" might not be suitable for common all-in-one hosting packages, but it is a perfect fit in a cluster setup like the one described below.
  • Using Intel Optane NVMe for the database means using the single best storage drive for this workload. Having enough RAM and optimized queries are the two best things you can do for database performance; Intel Optane is next in that line.
  • A special filesystem (ZFS) for NAS/media files provides the best "bang for buck" CDN for static files.

We split the core tasks of serving a website/webshop and optimized each of them in terms of software & hardware to provide maximum performance plus expandability for bigger loads.

Continue if you want to find out more about how this setup works.



An overview of the cluster and its main servers.




Common path of a web request through the cloud:

  1. The firewall passes HTTP (80) and HTTPS (443) traffic to HAProxy.
  2. HAProxy checks the user's request against WAF rules (whitelist, blacklist, good/bad bots, usage rate...). If everything is OK, it selects a backend to process the request.
  3. The main app runs on the PHP backend servers, which are load-balanced in "Sticky Session" mode.
  4. Data is stored on the NAS/CDN server, FullPageCache storage, DB, Redis and others.
  5. PHP processes the request and returns a response to HAProxy, which compresses and encrypts the response for the user.
  6. One exception is the direct serving of static files via the CDN server. This path is: HAProxy > CDN > HAProxy > User.
  7. NodeJS servers are used for supporting services like exporting invoices (e.g. HTML2PDF) and are load-balanced in "Round Robin" mode.

 

 

HAProxy

Load balancer, WAF, Usage Rate Limiter...

HAProxy is well known as "The Reliable, High Performance TCP/HTTP Load Balancer". It is the central piece of our cloud and we've managed to use it for much more than just load balancing.

HEALTH MONITORING

HAProxy periodically checks the operational status of the backend servers. If a server stops responding correctly, HAProxy transfers its load to the other servers in that group.
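A minimal sketch of how such checks are typically configured in HAProxy (the server names, addresses and the /health URL are placeholders, not our actual setup):

    backend be_php
        # probe an application URL instead of just the TCP port
        option httpchk GET /health
        # mark a server down after 3 failed checks, up again after 2 successful ones
        default-server check inter 2s fall 3 rise 2
        server php1 10.0.0.11:80
        server php2 10.0.0.12:80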

LOAD BALANCING

For certain parts of the application we wanted the option of horizontal scaling (multiple backend PHP and NodeJS servers sharing load that can grow too high for a single server, or simply sharing it for better performance). Vertical scaling is often easier to set up, but it has harder limits and often exponentially higher pricing. Live LB demo
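As a sketch, the two balancing modes mentioned in the overview could look roughly like this in HAProxy (cookie-based stickiness is one common way to implement "Sticky Session" mode; names and addresses are placeholders):

    backend be_php
        balance roundrobin
        # "Sticky Session" mode: pin a visitor to one PHP server via a cookie
        cookie SRV insert indirect nocache
        server php1 10.0.0.11:80 check cookie php1
        server php2 10.0.0.12:80 check cookie php2

    backend be_node
        # "Round Robin" mode for stateless NodeJS services
        balance roundrobin
        server node1 10.0.0.21:3000 check
        server node2 10.0.0.22:3000 check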

SSL TERMINATION POINT

SSL certificates are issued by Let's Encrypt and used by HAProxy, which serves as the SSL termination point. This lets us move apps among backend servers without any extra certificate handling.
We also have the option to create a wildcard (*.domain) certificate when the DNS is hosted on Cloudflare.
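In HAProxy, termination boils down to binding the frontend with the certificates; a minimal sketch (the certificate directory path is a placeholder):

    frontend fe_web
        bind *:80
        # terminate TLS with all Let's Encrypt certs found in this directory, offer HTTP/2
        bind *:443 ssl crt /etc/haproxy/certs/ alpn h2,http/1.1
        # send plain-HTTP visitors to HTTPS
        http-request redirect scheme https unless { ssl_fc }
        default_backend be_php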

WEB APPLICATION FIREWALL

Used to protect well-known target URLs (e.g. wordpress/wp-admin, Magento/admin) that hackers and their bots often go after. Additionally, a usage counter detects potential brute-force attacks and stops them at an early stage (VideoGif Demo). Steps like this allow us to provide a better-secured admin area on e.g. Managed Wordpress Hosting.
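A simplified sketch of this kind of rule in HAProxy (the paths and the whitelist file are illustrative; the real rule set is more involved):

    frontend fe_web
        # well-known admin URLs that bots love to probe
        acl admin_url path_beg -i /wp-admin /wp-login.php /admin
        # allow only whitelisted source IPs to reach them
        acl trusted_src src -f /etc/haproxy/admin-whitelist.lst
        http-request deny deny_status 403 if admin_url !trusted_src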

USAGE RATE LIMITER

A dedicated tracker of usage by individual users, with rules for abuse detection & prevention (normal user, heavy user, abuser, potential DDoS) in place before requests even hit the main application servers.
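HAProxy's stick tables are the usual building block for this; a minimal sketch (the threshold and time window are illustrative, not our production values):

    frontend fe_web
        # track per-IP request rate over a 10 s sliding window
        stick-table type ip size 100k expire 10m store http_req_rate(10s)
        http-request track-sc0 src
        # reject clients that exceed 100 requests per 10 s
        http-request deny deny_status 429 if { sc_http_req_rate(0) gt 100 }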

"0 DOWNTIME" MIGRATIONS

HAProxy allows us to prepare a new server (software installation, data transfer and tests) in the background in a "production-like" environment. Redirecting the load from the current production server to the new one is easy and instant (zero downtime).
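One way to do the actual switch is HAProxy's runtime API over its admin socket; a sketch (backend/server names and the socket path are placeholders, and the socket must be enabled with "stats socket ... level admin"):

    # stop sending new traffic to the old production server, let existing sessions finish
    echo "set server be_php/php_old state drain" | socat stdio /var/run/haproxy.sock

    # bring the freshly prepared server into rotation
    echo "set server be_php/php_new state ready" | socat stdio /var/run/haproxy.sock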

REDUCING USELESS TRAFFIC by BOTS

According to HAProxy, "Bots make up nearly half the traffic on the Web" (Nov 28, 2018). We want "good bots" like Google & Bing to reach a website, crawl it, index it and provide relevant search-engine traffic. We do not want useless and often excessive bot traffic to cripple performance and worsen the UX of real users.
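One simple way to express this in HAProxy is a User-Agent blacklist combined with the rate limits above; a sketch (the list file is a placeholder):

    frontend fe_web
        # match known useless crawlers by User-Agent substring
        acl bad_bot hdr_sub(User-Agent) -i -f /etc/haproxy/bad-bots.lst
        http-request deny deny_status 403 if bad_bot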

ACL BASED BACKEND SERVER TARGETING

HAProxy running in HTTP mode gives us the ability to define different backend servers based on:
  • the root domain (e.g. klik-mall.com)
  • a subdomain (e.g. demo.klik-mall.com)
  • even the URL path, e.g. we can route "/admin" to a different set of servers if needed. This is a sort of horizontal scaling, or using more suitable servers for different parts of the app based on the URL (/erp, /shop, /cart, /blog etc.).

See the CDN storage section below for how this helps us build better solutions.
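A sketch of what such routing rules look like (hostnames, paths and backend names are illustrative):

    frontend fe_web
        acl host_demo  hdr(host)  -i demo.klik-mall.com
        acl path_admin path_beg   -i /admin /erp
        use_backend be_demo  if host_demo
        use_backend be_admin if path_admin
        default_backend be_php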

COMPRESSION, ENCRYPTION,...

HAProxy also provides:
  • SSL encryption for HTTPS traffic
  • HTML compression
  • HTTP/2
This gives us a single point of setup for all backend webpages and offloads these tasks from the main app servers. HAProxy's routing generally runs as a single-threaded, event-driven process, but tasks like these can benefit from multiple cores.
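Compression and the TLS/HTTP/2 setup are only a few lines of configuration (HTTP/2 comes from the "alpn h2,http/1.1" option on the bind line shown earlier; the MIME-type list below is illustrative):

    defaults
        mode http
        # compress text-based responses before they leave HAProxy
        compression algo gzip
        compression type text/html text/css text/javascript application/javascript application/json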

ACCESS LOGS

All HTTP/HTTPS requests pass through HAProxy, which makes overall monitoring a bit easier.
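Centralized access logging only needs a couple of directives; a minimal sketch:

    global
        log /dev/log local0

    defaults
        log global
        # log full HTTP request details (timers, status, backend/server chosen)
        option httplog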


 

Core Application on LEMP (Linux, NGINX, MySQL, PHP)



The core of the SaaS application (POS/register, webshops, CMS for website content) runs on a LEMP stack. Cheaper, basic hosting does not have much trouble serving cached content, but often struggles heavily to provide even decent performance for uncached content. It is a very different workload for an application to serve e.g. fully cached website pages compared to providing real-time stock status, personalized content for registered users, B2B pricing in a webshop, POS/register operations etc.

As new invoices and webshop orders cannot be cached, we needed to pick and tune the hardware & software to get excellent non-cached performance.


Tuning PHP for best performance

  • The latest and most performant version: PHP-FPM 7.4
  • OPcache enabled
  • Tuned PHP settings
  • Horizontal scaling over multiple servers with HAProxy
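As an illustration of the kind of PHP-FPM / OPcache tuning involved, here is a minimal sketch (the values are illustrative placeholders, not our production configuration):

    ; php.ini (OPcache)
    opcache.enable = 1
    opcache.memory_consumption = 256
    opcache.max_accelerated_files = 50000
    ; skip file stat() calls on every request; requires an OPcache reset on deploy
    opcache.validate_timestamps = 0

    ; PHP-FPM pool config: fixed worker count sized to the CPU cores
    pm = static
    pm.max_children = 16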

  • The Best CPU for PHP

Best processor for the (PHP) job



We've written about the Impact of CPU speed on websites running PHP in general and posted a comparison, Which are the fastest CPUs for PHP websites like Wordpress and Magento. The best CPUs for PHP based on PHPBench are Intel i9 gaming/workstation processors (i9-9900K, i9-9900KS, i9-10700K, i9-10900K). They can run at frequencies up to 5 GHz, which is almost 1 GHz faster than the fastest Xeon and AMD processors.

Using this CPU does require a specialized setup, as it does not support ECC memory. But in the quest for the shortest TTFB server responses, and with the option of horizontal scaling, the i9-9900K turned out great. Our tests have also shown that even a very light form of virtualization (LXC containers) shaves 10-15% off the CPU's PHP performance. A setup like this allows us to have dedicated servers with bare-metal PHP installations, if needed.

The future-proof option for keeping CPUs from becoming a bottleneck is horizontal scaling with the HAProxy load balancer. It allows us to add additional "best PHP servers for the job" and distribute the load among them.

Performance comparison of CPUs for PHP workload

LEGEND

1 Cheap webhosting
2 Common webhosting
3 VPS / above-avg. webhosting
4 Small, high-freq. servers
5 Specialized webhosting


"The Best CPU for PHP" notion is based on Phoronix OpenBenchmark PHP Bench results. Our i9-9900K score was 807.000 which puts it among the top results. All internal tests have shown it has a significant impact on the perfomance of PHP.

 

Cheap hosting is one of the main reasons you often see laggy Wordpress sites or (too) long checkouts on Magento: there are hundreds of websites on a single server, which often does not even have appropriate hardware for a PHP workload.



Tuning the MySQL Database

  • Lightweight virtualization (LXC) or Bare-metal
  • The latest and most performant version + Tuned Settings.
  • Enough RAM + CPU cores
  • Optimized DB Queries. Cached Queries Wherever Possible.

  • Best MySQL Storage Device

The database runs on Intel Optane NVMe - currently the fastest storage device for this type of workload, with near-RAM latency and 500,000 IOPS on 4K random reads/writes. It also offers 2.5 GB/s transfer speeds. This kind of hardware is as premium as it gets!
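To give an idea of the "tuned settings" mentioned above, here is a minimal my.cnf sketch (the values are illustrative placeholders sized for a machine with plenty of RAM and very fast NVMe, not our production configuration):

    [mysqld]
    # keep the hot dataset in RAM
    innodb_buffer_pool_size = 16G
    innodb_log_file_size    = 1G
    # bypass the OS page cache; InnoDB manages its own buffer pool
    innodb_flush_method     = O_DIRECT
    # allow far more background I/O on Optane-class storage
    innodb_io_capacity      = 10000
    innodb_io_capacity_max  = 20000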

If you are interested in semi-tech reviews:

 



Static files storage and Private CDN



For storing and serving static media files (images, pdfs, css...) we were looking into solutions with 3 main objectives:
  1. Lots of storage space
  2. Optimized for serving website static assets (media files)
  3. Price/Performance ratio

With an unlimited budget this would be much easier. But that is not the case, so we had to think this through a bit. Storage speeds go from lowest to highest in this order: HDD, SSD, NVMe, Optane, RAM.

As is most often the case, the price rises in the same direction, sometimes exponentially. Storing data in RAM is not a viable solution, as it is non-persistent storage, by far the most expensive, and the most limited in maximum size per server. Next in line is NVMe, and even though prices have fallen quite a bit, it is still too expensive for this kind of bulk storage.

The idea of using Intel Optane or other NVMe drives for old or even obsolete gallery pictures did not sit well with the goal of providing "best bang for buck" hosting. Could we have the best of both worlds (the $/size of HDDs and the speed of NVMe/RAM)?

ZFS

We ended up setting up a dedicated server with a very specific file system, mostly known and used in the enterprise world: ZFS ("Zettabyte File System"). You can read more about it in An Introduction to ZFS - A Place to Start (ServeTheHome, Aug 2020) or just Google it. It is by no means a perfect match for everything (learning curve; very specific hardware requirements; high RAM usage, e.g. min. 16 GB ECC RAM for our initial 10 TB goal; etc.). But there were 3 things we really liked about it:

  1. A 3-tiered storage system that can serve very big storage pools in the most efficient manner with regard to utilizing the available hardware (cache in RAM and NVMe).
  2. Software RAID capabilities that will always be bigger than even our wildest success scenarios would require.
  3. Backup capabilities like snapshots, efficient remote sync etc.
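A minimal sketch of what such a tiered pool looks like in practice (the pool name, device names and RAID level are placeholders, not our exact layout):

    # big HDD pool for capacity, with double-parity software RAID
    zpool create tank raidz2 sda sdb sdc sdd sde sdf
    # add an NVMe/Optane device as L2ARC (second-level read cache)
    zpool add tank cache nvme0n1
    # transparent compression and no access-time updates for static web assets
    zfs set compression=lz4 tank
    zfs set atime=off tank
    # the first-level cache (ARC) lives in RAM and is managed by ZFS automatically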



Speed on the web comes from cache, and ZFS has some unique strengths in this regard

 

For our CDN setup, ZFS provides a highly sophisticated algorithm for tracking "most frequently used" (MFU) and "most recently used" (MRU) files. It natively decides what to store in RAM (the fastest cache storage - ZFS ARC) and on NVMe (the 2nd fastest cache storage - ZFS L2ARC) for the most efficient and fastest serving of requested content.

If we were to chart a typical website's usage of its stored static media files, it would look something like this:

LEGEND

1 Files used all the time (logo, css)
2 Frequently used (media on home page, popular posts with galleries)
3 Rarely used (old blog posts, semi-popular webshop items)
4 Used once or almost never, or even never (old or inactive gallery pics, exported invoice PDFs, digital webshop items, visited only by the Google bot)
5 Unused storage space


This means that if we can keep a relatively small portion of all static files in a fast cache, it will seem to most users as if everything is on very fast storage.

So when someone visits a webpage on our hosting, we serve the logo, CSS and other frequently used static assets from RAM or Intel Optane. This is pretty much as fast as it gets, all while keeping the cost of webhosting close to average.

 

Serving static files directly from CDN server

We leveraged this NAS/CDN setup even further. By using HAProxy HTTP ACL rules, we set a different backend server target for cdn.* subdomains. With this we can serve static resources stored on our CDN/media storage directly, so the request never goes through the main app servers/PHP. A combination of a fast web server for static resources (NGINX) and a storage-efficient but performant ZFS setup gives great results for this task.

Example:

  1. https://cdn.klik-mall.com/docs/ ... klik-mall-private-cloud-hosting-019.png
  2. https://www.klik-mall.com/docs/ ... klik-mall-private-cloud-hosting-019.png
It's the same file, but we can serve static files via cdn.* subdomains more efficiently and reduce load on the main app servers at the same time. It also gives us the option to point the "cdn" subdomain's DNS at a premium global CDN like Cloudflare, KeyCDN, CloudFront/AWS etc.
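The HAProxy side of this is a small host-based rule; a sketch (the backend names are placeholders):

    frontend fe_web
        # anything requested on a cdn.* hostname goes straight to the NGINX/ZFS fileserver
        acl is_cdn hdr_beg(host) -i cdn.
        use_backend be_cdn if is_cdn
        default_backend be_php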

  • Dedicated NAS fileserver for static files running on ZFS
  • Utilizing cache for speed (RAM for ARC and Intel Optane as L2ARC)
  • cdn.* subdomains for direct serving of files and offloading main app. servers
  • Sophisticated backup options


  • Private CDN Solution

Other notes:

  • Klik MALL CDN storage is a NAS mainly used for media files (pic, pdf, mp4, zip...)
  • Most frequently used files (e.g. website logos, customized-prefs.css) are successfully served from RAM or Intel Optane as observed through our monitoring tools. Both storage types give extreme performance in terms of latency and throughput.
  • Less frequently and rarely used files are stored on the main HDD pool, which can store terabytes of data.
  • With the ZFS pool on a local 10 Gb network, high-performance Xeons and a configuration like ours, we conclude that the outbound internet connection will become a bottleneck long before the server does.
  • Though our main goal was CDN delivery via tiered storage, ZFS also has many other useful features for managing webhosting (snapshots, CoW - copy on write, optimal sync of remote and offsite backups etc.)

 

 

FPC (Full Page Cache) for very fast webpages



As speed on the web comes from cache (a cached database query, API call result, partial page content...), one of the biggest improvements in "webpage speed" and loading times can be achieved by serving pre-prepared webpage HTML content in the form of a full page cache. A common practice for this task is to store the HTML in RAM (e.g. Redis) or as a regular file on disk and - if applicable - serve it to the user without needing the app/PHP servers at all.

Because of our horizontal load-balancing setup for PHP servers, we wanted a shared storage solution so all servers can check whether an FPC entry already exists, and so we can be sure FPC content is really gone after a cache purge. Though Redis over the network would be a very good option, we set this storage up on ZFS to give it a try.

Considering how ZFS works:
  • we have a solution that is "as fast as RAM" for the most frequently used files (their content is the HTML of a webpage),
  • it preserves RAM for other workloads,
  • it offers much bigger storage space for FPC content than a typical Redis instance (ZFS ARC + L2ARC + HDD can reach terabytes),
  • saved FPC content is persistent, so there is no "warming up the cache", even after reboots,
  • the required monitoring is minimal, as ZFS does pretty much the whole job as long as there are enough resources.

ZFS could turn out to be sort of a "Full Page Cache on steroids".
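A minimal NGINX sketch of how such file-based FPC can be served, assuming the app writes pre-rendered pages to a shared ZFS dataset mounted at /mnt/fpc as <host><uri>/index.html (the paths, layout and PHP-FPM socket are hypothetical, not our exact implementation):

    server {
        listen 80;
        # the shared, ZFS-backed FPC dataset
        root /mnt/fpc;

        location / {
            # serve the pre-rendered HTML if it exists; otherwise hand the request to PHP
            try_files /$host$uri/index.html @app;
        }

        location @app {
            include fastcgi_params;
            fastcgi_param SCRIPT_FILENAME /var/www/app/public/index.php;
            fastcgi_pass unix:/run/php/php7.4-fpm.sock;
        }
    }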

 

 

How fast are webpages on this private cloud and Klik MALL CMS?



TTFB (Time To First Byte) is a good indicator of code and raw hardware performance. The biggest part of TTFB is often the time the server needs to prepare the response content for the user.

The chart displays a comparison of:
  • Average (TTFB) of websites in Germany
  • Recommendation by Google as to what TTFB time would provide a reasonably good user experience
  • TTFB of a common webpage on Klik MALL Platform
  • TTFB of a webpage utilizing Full Page Cache (FPC) on Klik MALL Platform

Chart values: 1700 ms (average webpage TTFB*), 1500 ms (recommended by Google*), ~120 ms (Klik MALL, no FPC), ~10 ms (Klik MALL, + FPC).



Links

* thinkwithgoogle.com: Google research on average webpage loading speeds
* Blog (SI): Klik MALL + Full Page Cache = izredno hitra spletna stran (an extremely fast website)
Example:

The website restavracija-mirje.si has a TTFB of ~100 ms without using FPC, per Google Analytics.

 

 



 