Web application performance optimization tips for CFML, PHP and other languages

Mon, Apr 15, 2013 at 11:05PM

It's taken me a long time to eliminate disk overhead from my web application, and I'm not done yet. Once you have an app that is CPU bound instead of disk bound, you can start to justify moving to a more efficient language. However, many database-driven web sites do a huge amount of disk access. It takes a lot of work to redesign them to operate mostly in memory and to have caches that flush intelligently. A lot of my learning and experimenting in the last few years has been about figuring out how to add layers of caching, replace inefficient code, and optimize the database.

Disk access is extremely slow even with modern hardware

In the past, I wouldn't have thought that simple file operations could slow a request down so much. File operations in the CFML language occur more often than you might realize. Simple things like fileexists(), expandpath(), cfinvoke, createobject() and cfinclude all have disk I/O overhead. It's not as bad as running an uncached database query, but it's still significant.

If your app has numerous files that are dynamically evaluated on each request, it will scale poorly under higher load. Disk latency can be 1000 times slower than direct memory access. Whenever possible, you want to figure out how to use memory instead. Fortunately, the CFML language makes that easy by letting you store data in the application scope and other shared memory scopes.

With fileexists(), I now sometimes store the result in the application scope so that the check doesn't keep happening. Often I can decide that fileexists() doesn't need to be evaluated again until the application scope cache is flushed. These seemingly small changes make a big difference under load because SATA hard drives handle heavy random access poorly. Even if you are using SAS or SSD drives, any disk I/O still adds significant latency that reduces scalability.
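Here is a minimal sketch of that idea, written in Python for illustration rather than CFML. The FileExistsCache class and its disk_checks counter are hypothetical names, and the dictionary plays the role of the application scope. Results are remembered until flush() is called, so repeated checks for the same path never touch the disk.

```python
import os
import threading


class FileExistsCache:
    """Remembers os.path.exists() results until flush() is called."""

    def __init__(self):
        self._results = {}
        self._lock = threading.Lock()
        self.disk_checks = 0  # counts real filesystem lookups, for illustration

    def exists(self, path):
        with self._lock:
            if path in self._results:
                return self._results[path]
        found = os.path.exists(path)  # the only place that touches the disk
        with self._lock:
            self._results[path] = found
            self.disk_checks += 1
        return found

    def flush(self):
        """Forget every cached result, e.g. after a deployment."""
        with self._lock:
            self._results.clear()


cache = FileExistsCache()
for _ in range(1000):
    cache.exists(os.curdir)  # 1000 checks, but only the first one hits the disk
```

The trade-off is the one described above: a file created or deleted after the first check is not noticed until the cache is flushed.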

Avoid using network filesystems when possible

File I/O slows my test system down a great deal because I store the source files outside the virtual machine. The virtual machine has a Samba mount for accessing the source files, which keeps the image small and lets various applications on the host system work with local files. I develop on Windows, but deploy to Linux. My virtual machine runs an identical configuration to the Linux production server.

Nginx, Railo and PHP access the source code through Samba, which takes up to 60 milliseconds to return a single response. That is huge!

On the production server, I avoid having public requests that depend on network filesystems, but I still find them useful for connecting virtual machines and remote servers when transferring files between them in various cronjobs.

However, let's say your development machine stores the source code on faster media than what is used in production, for example a solid state drive in development and a 7200 RPM cloud SAN or NFS mount in production. In that situation you don't discover the performance issues in your app until the code has been deployed. The slower storage adds a huge amount of latency to the app whenever disk operations are performed.

On my system, it helps to have the source code on a slower disk when I'm trying to optimize the application. The latency from using a 7200 RPM hard drive, RAID and Samba greatly exaggerates disk problems, similar to what you may experience in a cloud environment. I know that if I can eliminate these bottlenecks in such a non-ideal situation, the app should run faster anywhere else it is deployed.

I used to develop on my SSD drive, but that was hiding my performance issues. Sure, we could all deploy our applications to SSD drives if we had unlimited budgets, but SSD drives have historically been small and expensive. It is just starting to become affordable to put all data on SSD drives. I like to test in worst-case conditions so that I don't need to waste money on hardware. I want someone to be able to serve more traffic with less money spent. Hopefully, they will agree with my decisions once I start distributing my application to other developers.

Database optimization tips

Since the database is the most difficult thing to optimize, this section probably deserves its own article that is specific to the database software you are using. I use MySQL for most of the app. A lot of the optimizations I do with MySQL would be completely wrong with another database vendor. In general, you want to write queries that look at the least amount of data, that use an index to retrieve data, and that use the simplest internal algorithms to calculate the result. If you have complex calculations occurring in the database on every request, you should try to determine if those could be converted into a lookup table that can be read with a very simple query. I often convert lookup tables into objects in memory during application start-up to make it even faster.

One example in my application is zip code distance calculations between cities. Rather than run the complex trigonometry functions on every request, I pre-calculated all the distances within 100 miles of each city and stored them in a simple table that contains city_parent_id, city_id and distance. I only need to update the table when I add new cities to it. Often I can send a list of cities to the queries as an IN (list) search rather than do a join. This makes those queries many times faster than the original query that did the distance calculation and 1 or more joins.
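The approach can be sketched roughly as follows, using Python and an in-memory SQLite database in place of MySQL. The column names city_parent_id, city_id and distance come from the description above. The haversine formula, the sample cities, the listing table and the function names are assumptions made for the example, not the actual application code.

```python
import math
import sqlite3


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (haversine formula)."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Hypothetical sample data: city id -> (name, latitude, longitude)
cities = {
    1: ("Orlando", 28.54, -81.38),
    2: ("Daytona Beach", 29.21, -81.02),
    3: ("Tampa", 27.95, -82.46),
    4: ("Miami", 25.76, -80.19),
}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE city_distance (city_parent_id INTEGER, city_id INTEGER, distance REAL)")
db.execute("CREATE INDEX idx_parent ON city_distance (city_parent_id, distance)")
db.execute("CREATE TABLE listing (listing_id INTEGER, city_id INTEGER)")
db.executemany("INSERT INTO listing VALUES (?, ?)", [(101, 1), (102, 2), (103, 3), (104, 4)])

# One-time step: do the trigonometry once and keep only pairs within 100 miles.
for parent_id, (_, plat, plon) in cities.items():
    for city_id, (_, clat, clon) in cities.items():
        if parent_id == city_id:
            continue
        d = miles_between(plat, plon, clat, clon)
        if d <= 100:
            db.execute("INSERT INTO city_distance VALUES (?, ?, ?)", (parent_id, city_id, d))


def nearby_city_ids(parent_id, max_miles):
    """Per-request step: a plain indexed lookup, no trigonometry."""
    rows = db.execute(
        "SELECT city_id FROM city_distance"
        " WHERE city_parent_id = ? AND distance <= ? ORDER BY distance",
        (parent_id, max_miles),
    )
    return [row[0] for row in rows]


def listings_near(parent_id, max_miles):
    """Search listings with an IN (list) of city ids instead of a join."""
    city_ids = [parent_id] + nearby_city_ids(parent_id, max_miles)
    placeholders = ",".join("?" for _ in city_ids)
    rows = db.execute(
        "SELECT listing_id FROM listing WHERE city_id IN (" + placeholders + ")"
        " ORDER BY listing_id",
        city_ids,
    )
    return [row[0] for row in rows]
```

The expensive loop only runs again when a city is added. Everything a request does is a simple indexed read.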

Generally, you want to design databases in a relational way that minimizes the need to update redundant data in multiple tables. However, eliminating joins is a wise decision when you are optimizing a larger search table with hundreds of thousands of records. Most of our applications can often rely on simple ID lookups, but when you design a search application, you may need to query data from a number of tables and a number of text fields based on the user input. In our real estate applications, we have fields for searching city, view, frontage, subdivision and more. Many of those can have multiple values. This is a hard situation to optimize.

I found that the ultimate solution for this case was to generate a search table that is composed of just the fields that can be searched. The search query doesn't need to join on any other tables, and it uses the MySQL MEMORY storage engine, which can be up to 10 times faster than InnoDB for certain searches. When you encounter a complex situation like this, it is a good practice to add redundant data back into the table in order to eliminate joins. Typically, join operations are much less efficient when dealing with a larger number of records.

I have yet to find a faster alternative, though I'm investigating single and multiple value attributes in Sphinx to determine if it can run these searches faster than MySQL. I won't know until I've built a large portion of the database in Sphinx and learned more of its query syntax. I know that full-text search performance in Sphinx is much faster, but I haven't been able to confirm how fast attribute searching is on a complex search.

It is also possible to split tables into separate, smaller tables. This would improve performance in some situations. On my app, determining how to split the tables is a bit challenging because some sites require access to most of the data and others only to some of it, sometimes even across state boundaries. Currently, the index on data provider seems to handle this well enough, though I'm not sure how that will scale to millions of records. It is possible that all apps will need to split up tables as they approach millions of records so that the database doesn't need to look at as many records for many queries. Generally though, it is more efficient to store all data in the same table and rely on indexes until you can verify that separate tables would be faster. Apps that are designed to have a separate database for each domain, like WordPress, are often inefficient with disk resources. It's easier for the database to operate on fewer table files - or at least that is true with MySQL.

Objects vs Includes in CFML

In CFML, you can use either CFC or CFM files to write your application. CFM files incur more overhead than a CFC object cached in shared memory. The reason is that a cached CFC object pretty much only carries the overhead of a function call, whereas a cfinclude call has to translate the path information, verify the class is still up to date, and possibly do other work. On Railo, with the performance template inspect option set to Never, I got 2 to 3 times as many requests per second by simply converting a CFM file to a CFC file. A very fast CFML app will probably eliminate all the CFM files from its source code. However, if you don't cache the CFC instances, you won't benefit as much from doing this.

Also, a cached object doesn't need to use <cflock> if you duplicate() it on each request. This ensures the "this" and "variables" scopes are unique to a specific request for that object. duplicate() is MUCH faster than createobject(). So whenever you haven't spent the time to make a CFC thread-safe yet, just use duplicate(). This may waste memory, but memory access is so much faster than the disk access incurred with createobject(). Using the trusted cache doesn't completely fix createobject's overhead.

If your object has so much data in it that you can't duplicate it, no problem. Just put that data in an external struct in the application scope instead. When the data changes, build the new struct completely first and then replace the application scope key with it in one assignment, so that no other threads will read partially built data.
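Both ideas can be sketched in a few lines. This example uses Python rather than CFML: copy.deepcopy() stands in for duplicate(), a module-level dictionary stands in for the application scope, and the ListingSearch class and function names are hypothetical.

```python
import copy


class ListingSearch:
    """Stand-in for a component that is expensive to create and not thread-safe."""

    def __init__(self):
        # Imagine this setup required disk access, as createobject() does.
        self.defaults = {"per_page": 10, "sort": "price"}
        self.criteria = {}  # per-request state that must not be shared

    def set_criteria(self, **criteria):
        self.criteria.update(criteria)


# Shared cache, playing the role of the application scope.
application = {"listing_search": ListingSearch()}


def handle_request(city):
    # Copy the cached prototype so this request gets private state; no lock needed.
    search = copy.deepcopy(application["listing_search"])
    search.set_criteria(city=city)
    return search.criteria


def rebuild_city_lookup(rows):
    # Build the complete struct first, then publish it with a single assignment,
    # so other threads see either the old data or the new data, never a partial build.
    fresh = {}
    for city_id, name in rows:
        fresh[city_id] = name
    application["city_lookup"] = fresh
```

The cached prototype is never modified by a request, so it stays clean no matter how many requests copy it.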

Disk access optimization in PHP is near impossible

PHP has to make a "stat" call on every request and on every require/include call to verify the file hasn't changed. It will then load the script from the APC cache if it has been configured to do so. If the script is not in the cache, PHP will compile it again from disk. On a heavily loaded system, all the scripts might not fit into APC because many PHP servers are running many copies of files and many different applications. This can cause a lot of cache misses. You should do your best to determine how much memory should be allocated to APC and try to monitor this over time.

Eliminating the "stat" call would make PHP quite a bit faster. You CAN disable the "stat" call if you are sure your PHP scripts will not change until the server restarts, but a lot of PHP apps are not designed to support this and are very dynamic in nature. There are also some issues with FastCGI not updating this the same way. A lot of large PHP apps require the "stat" call to function correctly. PHP apps often generate PHP code as an optimization, and stat needs to be functioning for this cache to update correctly. Other software allows the user to perform updates frequently, such as installing WordPress plugins or updating to the latest version. This makes disabling "stat" impractical unless you have tight control of the PHP app.

So in the current state of things, PHP apps are wasting a lot of disk I/O on every request just to load the library and framework objects again before doing anything useful. That is the worst thing about PHP: disk I/O you can't get rid of.

There is no such thing as an application scope in PHP. The shared memory implementation in APC depends on streaming string serialization, which is incredibly inefficient compared to direct memory access.

The commercial version of Quercus PHP with the JIT compiler and Java implementation of APC cache gives you near Java level performance for loading the library and any stored objects.   This would reduce the framework overhead for an optimized PHP app to be near zero just like you can do with the CFML language already for free.

A full featured PHP app is slower than an equivalent full featured CFML app

A lot of developers falsely believe PHP is faster than CFML because simple benchmarks of small PHP apps, like a "hello world" test using CodeIgniter, seem so fast. But these kinds of tests are designed to include the minimal number of files and display minimal information. If a request uses the full set of PHP framework features, it will be significantly slower than the equivalent CFML, or it has to be somewhat less functional in order to minimize the gap in performance.

Despite this limitation, PHP still works fast enough for most small applications and helps people save money by reusing popular open source projects.

Assuming you can't optimize PHP and can't afford Quercus PHP, then I'd say the Railo CFML engine is the next best thing.

Railo specific optimizations that are superior to ColdFusion's features

In Railo, you can set the server to check for file changes only once per request instead of never. This gives you a good balance. But more importantly, you are able to eliminate 100% of the fileexists, createobject and cfinclude calls in a CFML app after the first time they load. Also, Railo archives are automatically treated as a trusted cache with virtually no disk overhead, so if you design your app appropriately, you'll be able to reach a much higher number of requests per second on a full featured app with Railo.

My app is getting crazy fast on Railo

I got the "hello world" on my framework down to just 1 ms yesterday by eliminating the last bits of disk access that occurred on each request. This means my app can do up to 1000 requests per second now when it doesn't call the database. Very cool.

General tips for optimizing a web application

So what should you focus on when trying to squeeze more performance out of your web application?
Before you start thinking about rewriting your application in another language to improve performance, you should try to improve the key items below. You'll probably find that you can take your existing app a lot further than you expected. Here is a list of ideas:

  • Optimize your database
  • Pre-process data into simpler look-up tables
  • Reduce the amount of data selected in a query to only what is required (i.e. list the "select" fields explicitly instead of using "select *")
  • Reduce the number of files you access in a single request
  • Cache remote resources locally whenever possible
  • Try to use shared memory features of your language for retrieving complex objects faster than you can with the database.
  • Redesign your app so that it doesn't require file system stat calls to determine if the scripts have changed, or use the trusted cache feature in ColdFusion - each language may use a different term for this.
  • Reduce the number of times you loop over data to the minimum possible.
  • Try to implement unbuffered queries or lazy initialization in your language to reduce CPU/memory usage.
  • Try to pass references to data instead of making copies whenever possible
  • Cache data that isn't dynamic as static data that gets served from memory or the disk.
  • Try to determine if you can redesign your pages to run dynamic parts in small ajax requests, and use static caching for the rest of the request.
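As one illustration of the query-related items above, here is a small sketch in Python with an in-memory SQLite database. The listing table, its columns and the iter_prices() function are hypothetical. It selects only the columns it needs and reads rows lazily through the cursor instead of loading the whole result set into a list first.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE listing (listing_id INTEGER, city TEXT, price INTEGER, remarks TEXT)")
db.executemany(
    "INSERT INTO listing VALUES (?, ?, ?, ?)",
    [(i, "Orlando", 100000 + i, "long text " * 50) for i in range(1, 1001)],
)


def iter_prices(min_price):
    """Yield (listing_id, price) pairs one at a time."""
    # Name the two columns that are needed; "SELECT *" would also drag the
    # large remarks column along for every row.
    cursor = db.execute(
        "SELECT listing_id, price FROM listing WHERE price >= ? ORDER BY listing_id",
        (min_price,),
    )
    # Iterating the cursor fetches rows as the caller consumes them,
    # rather than building the full result as a list up front.
    for listing_id, price in cursor:
        yield listing_id, price


first_three = []
for row in iter_prices(100500):
    first_three.append(row)
    if len(first_three) == 3:
        break  # the remaining rows are never turned into Python objects
```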

Before making any of these changes though, make sure you benchmark your app in different places and prioritize the optimization work to focus on the slowest code first. A small reduction in disk overhead can make a huge difference in the amount of traffic you can serve with your existing hardware.

A popular commercial tool for optimizing ColdFusion and Railo is FusionReactor. It helps you monitor the server and isolate various problems in a visual way. I also built a script that tracks request data in the application scope so that you can run a variety of reports on the execution times of recent requests. This helps me determine what is important to look at on the production server. You can also simulate load on the server by using tools such as ApacheBench, or by running code in a loop to isolate a section of the app.
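A request tracker along those lines can be quite small. The following is a rough Python sketch of the idea, not the script mentioned above. The names recent_requests, timed() and slowest_paths() are hypothetical, and a bounded deque plays the role of the application scope so that memory use stays fixed.

```python
import time
from collections import defaultdict, deque

# Keeps only the most recent 1000 requests; older entries fall off automatically.
recent_requests = deque(maxlen=1000)


def timed(path, handler):
    """Run a request handler and record how long it took, in milliseconds."""
    start = time.perf_counter()
    try:
        return handler()
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        recent_requests.append((path, elapsed_ms))


def slowest_paths(limit=5):
    """Report the paths with the highest average execution time."""
    totals = defaultdict(lambda: [0, 0.0])  # path -> [request count, total ms]
    for path, elapsed_ms in recent_requests:
        totals[path][0] += 1
        totals[path][1] += elapsed_ms
    averages = [(path, total / count) for path, (count, total) in totals.items()]
    return sorted(averages, key=lambda item: item[1], reverse=True)[:limit]
```

Wrapping each request as timed("/search", handler) and calling slowest_paths() from an admin page gives a quick view of where to focus first.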

Good luck speeding up your apps.  If you want to take advantage of our super fast technology, check out Jetendo CMS.
