News and Announcements

Making IP.Board more efficient

As part of our ongoing commitment to deliver the best community software on the market, we routinely run IP.Board through various tools designed to measure efficiency and resource usage, helping us to identify areas that could benefit from minor and major refactoring to make the software more efficient. We do this periodically with all of our software in order to ensure we have not introduced new code that will cause resource usage problems on your community. This is typically an unexciting task that does not garner much interest from the average user, but we thought some of you might enjoy hearing about some of the resource-based improvements we have made for 3.2.3.

If hearing about the nitty-gritty details that go into making the software used to power your community is of interest to you, read on. If not, and you just want to hear about upcoming features or service offerings, please skip this blog entry and stay tuned for next time!


What are we looking at?

When looking at the amount of resources our software uses, there are multiple points to consider. You have to keep an eye on memory usage, CPU usage and disk space usage. You have to consider the entire server stack too - MySQL, Apache (typically) and PHP, and of course the PHP code itself (e.g. IP.Board). There are some things within our control while developing the software, and there are some things that can only be controlled at the server level, so it is important to consider all angles (and server configurations), while tailoring the software to work in the widest array of environments possible. While we cannot control your MySQL or Apache configuration, some improvements can be made at the software level that will benefit everyone, regardless of your configuration, so we often look to these improvements first.

One tool that helps PHP developers profile their code and look for improvements is Xdebug, and it is the focus of this blog entry. When you use Xdebug to profile PHP code, it creates a file (called a "cachegrind" file) that you can then load into another software package in order to view the results. The most popular tools for viewing cachegrind files are KCachegrind (for Linux) and WinCacheGrind (for Windows). If you are not familiar with what this looks like, here's a quick screenshot to give you an idea (taken using WinCacheGrind, since I'm a Windows user):



Note: Times indicated are not representative of a normal page load. I run my local environment with development mode enabled which utilizes far more resources than normal, due to rebuilding of many caches, including skin cache files, on every page load. The screenshot is merely designed to give you an idea of the interface and types of information available with the tools used.


So...what did we find?

Well, it's rare that we find something big, because we routinely run our software through tools such as this. You generally only find huge areas that could stand improvement when working through major code refactoring (such as IP.Board 2.x to IP.Board 3.x). We did, however, find many smaller areas of improvement that can collectively mean big resource gains. If you save 50ms of processing on a page, and that page is hit 10,000 times in a day...well, you can do the math.
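To make that arithmetic concrete, here is a tiny back-of-the-envelope calculation. It is written in Python purely for illustration (IP.Board itself is PHP), and the only inputs are the 50ms and 10,000 hits mentioned above.

```python
# Back-of-the-envelope: processing time saved per day.
saved_per_hit_ms = 50      # milliseconds saved on each page load
hits_per_day = 10_000      # page loads per day

saved_seconds = saved_per_hit_ms * hits_per_day / 1000
saved_minutes = saved_seconds / 60

print(f"{saved_seconds:.0f} s per day (about {saved_minutes:.1f} minutes)")
```

That works out to 500 seconds, or a little over eight minutes of processing per day, for a single page type.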


Static caching

We found many functions that were called several times on each page load, very often with the same data, returning the same result. Without caching, these functions perform the same operations repeatedly, so we added static inline caching to store the results. Future calls with the same data can fetch the result from the cache instead of reprocessing it.

Some functions where we added static inline caching for 3.2.3:

  • IPSMember::unpackGroup()
  • IPSMember::makeProfileLink()
  • IPSLib::getEnabledApplications()
  • IPSMember::setUpModerator()


Unnecessary code processing

Every operation a PHP script has to perform consumes some level of resources; sometimes this is negligible and sometimes this is measurable and important. When code is executed that does not need to execute, however, it is simply a waste of resources, no matter how small that may be. In reviewing the profiling results, we found some code that was executing when it did not need to, and we removed or fixed it.

  • We found various array checks in the output class that were unnecessary, as the array was a class property and initialized when the class was instantiated into an object.
  • We were parsing some dates in the search results area twice, when we only needed to do so once.
  • We were parsing the post content in search results, even when we displayed the results as topics. Parsing post content is an expensive operation, so this unnecessary operation was particularly wasteful.
  • We found that one particular IP.SEO hook is outdated and provides no benefit as of 3.2.0. This will be investigated (and potentially removed) with the next release of IP.SEO: http://community.inv...topiclinks-hook
  • Several operations were running when displaying a mini-calendar which were unnecessary (because the results were never displayed in the skin).
  • The IP.Downloads board index latest files hook was loading the category helper class twice, unnecessarily. Similarly, some of the functions within this class were called multiple times, unnecessarily.
  • IP.Downloads was utilizing an unnecessary "GROUP BY" SQL clause. Generally, this requires a temp table to be created, and in this case it was unnecessary.
  • The UCP Manage Attachments page queried for attachments even if it had discovered there were none (through the SELECT COUNT(*)... query that is run first). We removed this second query when there are no attachments to retrieve.
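
The last item, skipping the second query when the count is zero, can be sketched like this. The example uses Python's built-in sqlite3 module for illustration only; the table layout, column names and helper functions are assumptions, not IP.Board's actual schema or code.

```python
import sqlite3

# Hypothetical schema; the table and column names are assumptions.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE attachments (id INTEGER PRIMARY KEY, member_id INTEGER, name TEXT)")
db.execute("INSERT INTO attachments (member_id, name) VALUES (1, 'photo.png')")

queries_run = 0


def run(sql, params):
    """Execute a query and count it, so the saving is visible."""
    global queries_run
    queries_run += 1
    return db.execute(sql, params).fetchall()


def list_attachments(member_id):
    total = run("SELECT COUNT(*) FROM attachments WHERE member_id = ?", (member_id,))[0][0]
    if total == 0:
        return []          # nothing to fetch, so skip the second query
    return run("SELECT id, name FROM attachments WHERE member_id = ?", (member_id,))


empty = list_attachments(2)     # member with no attachments: 1 query
queries_for_empty = queries_run
found = list_attachments(1)     # member with attachments: 2 queries
```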


More specific changes...

Additionally, we have found several more specific areas of improvement that we have corrected for 3.2.3. These areas of improvement, as a general rule, improve resource usage more than the items listed above, but are less trafficked and thus less likely to be noticeable. As such, your mileage may vary with these improvements depending on how your site is utilized by your members.

We build a cache of each item's 'like' data, and store this separately so that we do not need to query all of an item's like data on every page load ('like' data here means the follow/unfollow system in IP.Board and applications). We were caching this for one hour; however, the software does a good job of ensuring the cache is rebuilt as needed when records are deleted and so forth. Thus, we have increased this cache lifetime from one hour to one day, limiting how often the software needs to rebuild this cache.
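A rough sketch of why a longer lifetime is safe when the cache is also rebuilt on change: the Python class below is hypothetical (it is not IP.Board's implementation) and pairs a one-day expiry with an explicit invalidation hook.

```python
import time

ONE_DAY = 24 * 60 * 60   # cache lifetime in seconds (was one hour, i.e. 3600)


class LikeCache:
    """Illustrative time-limited cache; not IP.Board's actual class."""

    def __init__(self, ttl=ONE_DAY, clock=time.time):
        self.ttl = ttl
        self.clock = clock          # injectable so expiry can be tested
        self._store = {}            # item_id -> (expires_at, data)
        self.rebuilds = 0

    def get(self, item_id, rebuild):
        entry = self._store.get(item_id)
        if entry and entry[0] > self.clock():
            return entry[1]                      # still fresh
        self.rebuilds += 1
        data = rebuild(item_id)                  # the expensive query
        self._store[item_id] = (self.clock() + self.ttl, data)
        return data

    def invalidate(self, item_id):
        """Called when a record changes, so stale data is never served."""
        self._store.pop(item_id, None)


def load(item_id):
    """Stand-in for querying an item's like data."""
    return {"item": item_id, "followers": 3}


now = [0.0]
cache = LikeCache(clock=lambda: now[0])

cache.get(42, load)
now[0] += 2 * 60 * 60          # two hours later: still cached with a one-day TTL
cache.get(42, load)
rebuilds_after_two_hours = cache.rebuilds
cache.invalidate(42)           # e.g. a follower was removed
cache.get(42, load)
```

With a one-hour lifetime the second lookup would have triggered a rebuild; with one day, only the explicit invalidation does.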

A change that we made to one of our JSON encoding routines was causing significantly more resource usage than in previous versions. We reverted that change to match 3.2.2, saving nearly 100ms of processing on each page where this occurred.

We have a call in our registry destructor that saves topic markers back to the database so that they are not lost between page loads. This is necessary, of course, in order to maintain topic markers across different browsers and computers. We found, however, that this was occurring even when you were using an application that does not utilize the item marking system in our framework (for instance, when you are browsing IP.Calendar). By adding a check in the destructor, we save the software from having to load up the topic marker library and parse all topic markers just to save them back to the database when the application you are using does not need item marking capabilities.
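In outline, the added check looks something like the following. This is a Python sketch with hypothetical class and attribute names; an explicit shutdown() method stands in for the PHP destructor, since Python finalizers do not run at a predictable time.

```python
class Registry:
    """Illustrative registry; class and attribute names are hypothetical."""

    def __init__(self, app_uses_item_marking):
        self.app_uses_item_marking = app_uses_item_marking
        self.saved_markers = False

    def _save_topic_markers(self):
        # Stand-in for loading the marker library, parsing all markers
        # and writing them back to the database.
        self.saved_markers = True

    def shutdown(self):
        """Plays the role of the destructor at the end of a page load."""
        if self.app_uses_item_marking:      # the added check
            self._save_topic_markers()


forums = Registry(app_uses_item_marking=True)
calendar_app = Registry(app_uses_item_marking=False)
forums.shutdown()
calendar_app.shutdown()
print(forums.saved_markers, calendar_app.saved_markers)
```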

We discovered a minor bug with the reputation cache loading when the central comments class was utilized in some cases. In certain places, the reputation cache would not load correctly, and while the software largely corrected for this automatically, it meant using more resources than necessary to rebuild this cache because it did not load correctly the first time. By fixing this bug, we saved the software a lot of extra unnecessary processing (and fixed an unreported/undiscovered bug in the process :whistle: ).

We found some applications were not loading caches (from the cache_store table) that they were using. We added these to the initialization routines for the respective applications, saving database queries later on to fetch the caches individually.
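The saving comes from fetching several caches in one query during initialization rather than one query per cache later on. A minimal Python/sqlite3 sketch, assuming a simple two-column key/value layout (the real cache_store table and IP.Board's loader may differ):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical two-column layout; the real cache_store table may differ.
db.execute("CREATE TABLE cache_store (cs_key TEXT PRIMARY KEY, cs_value TEXT)")
db.executemany(
    "INSERT INTO cache_store VALUES (?, ?)",
    [("settings", "a"), ("group_cache", "b"), ("calendars", "c")],
)


def load_caches(keys):
    """Fetch every requested cache in a single query (keys must be non-empty)."""
    placeholders = ", ".join("?" for _ in keys)
    rows = db.execute(
        f"SELECT cs_key, cs_value FROM cache_store WHERE cs_key IN ({placeholders})",
        list(keys),
    )
    return dict(rows)


# An application declares the caches it needs during initialization...
caches = load_caches(["settings", "calendars"])
# ...so later code reads from memory instead of querying per cache.
print(caches["calendars"])
```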

We discovered that a few areas calling IPSMember::buildDisplayData() were not caching the member data as designed. Upon inspection, this was because the member data was joined onto the main queries, rather than fetched separately, so the check in buildDisplayData() that verifies the same information is being passed always failed. By refactoring how we pull members and call this method in a few places, we allow the software to cache the results and avoid parsing the same data multiple times. In most cases this means an extra database query but less PHP processing; the tradeoff is worth it in this case, as the PHP processing is more expensive.
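The failure mode can be reproduced in miniature. The Python sketch below is an assumption about how such a check might behave (compare the incoming row with the one cached last time), not the actual logic of buildDisplayData(); all names and fields are hypothetical.

```python
_display_cache = {}   # member_id -> (source_row, parsed_result)
parse_count = 0


def build_display_data(member):
    """Return parsed member data, reusing the cache only for identical input."""
    global parse_count
    cached = _display_cache.get(member["member_id"])
    if cached and cached[0] == member:       # same data as last time?
        return cached[1]
    parse_count += 1                         # expensive parsing happens here
    parsed = {"name": member["name"].title()}
    _display_cache[member["member_id"]] = (dict(member), parsed)
    return parsed


# Member data joined onto two different post rows: the extra columns differ,
# so the equality check fails and the member is parsed again.
build_display_data({"member_id": 7, "name": "brandon", "post_id": 1})
build_display_data({"member_id": 7, "name": "brandon", "post_id": 2})
parses_with_join = parse_count

# Fetching the member separately passes identical data every time.
member = {"member_id": 8, "name": "matt"}
build_display_data(member)
build_display_data(member)
parses_when_separate = parse_count - parses_with_join
```

Here the joined rows cause two parses for one member, while the separately fetched member is parsed once.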

In IP.Content, attachment parsing was happening in a loop for each record displayed in listings. While this is not a problem by itself, per se, we were able to refactor the code to parse all of the attachments at once, allowing us to run just one database query instead of one per record.
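A sketch of the batched approach, again in Python with sqlite3 and an invented schema (the table, columns and function name are assumptions for illustration): all record IDs in the listing go into one query, and the rows are grouped per record afterwards.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical schema for illustration only.
db.execute("CREATE TABLE attachments (id INTEGER PRIMARY KEY, record_id INTEGER, name TEXT)")
db.executemany(
    "INSERT INTO attachments (record_id, name) VALUES (?, ?)",
    [(1, "a.png"), (1, "b.pdf"), (3, "c.zip")],
)


def attachments_for(record_ids):
    """One query for all listed records, grouped by record afterwards."""
    grouped = {record_id: [] for record_id in record_ids}
    placeholders = ", ".join("?" for _ in record_ids)
    rows = db.execute(
        f"SELECT record_id, name FROM attachments "
        f"WHERE record_id IN ({placeholders}) ORDER BY id",
        list(record_ids),
    )
    for record_id, name in rows:
        grouped[record_id].append(name)
    return grouped


by_record = attachments_for([1, 2, 3])
print(by_record)
```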

Again, in IP.Content, we found that the topic posting library was being called when you were a moderator and unpublished or unapproved records were being displayed to you. The topic library was called in order to post the topic (that mirrors the article, or stores the comments, depending on your configuration); however, it was not necessary since the article is not yet visible. We added some simple checks in the code to avoid loading this library unnecessarily.

And last, but not least, we found a major improvement area for IP.Calendar. The mini-calendars displayed in IP.Calendar (and in IP.Content mini-calendar plugin blocks, and on the board index if you utilize the calendar mini-calendar hook) are very much static HTML. The only change that occurs in these mini-calendars is that the current day is bolded, if you are viewing the current month. We utilized some clever caching techniques (using the cache_store table) in order to save the HTML output that is generated, and then we reuse this output instead of rebuilding it repeatedly. The end result is that mini-calendars only need to rebuild once a day now, instead of on every single page load.
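The idea can be illustrated with Python's standard calendar module. This is only a sketch of the caching principle: an in-memory dictionary stands in for the cache_store table, and the markup comes from Python's HTMLCalendar rather than IP.Calendar's skin.

```python
import calendar
import datetime

_html_cache = {}     # (year, month, today) -> rendered HTML
renders = 0


def mini_calendar(year, month, today):
    """Return month HTML, rebuilding at most once per calendar day."""
    global renders
    key = (year, month, today)
    if key not in _html_cache:
        renders += 1
        html = calendar.HTMLCalendar().formatmonth(year, month)
        if (today.year, today.month) == (year, month):
            # Bold the current day; relies on HTMLCalendar's ">N<" cell markup.
            html = html.replace(f">{today.day}<", f"><b>{today.day}</b><", 1)
        _html_cache[key] = html
    return _html_cache[key]


today = datetime.date(2011, 10, 14)
for _ in range(1000):                       # 1,000 page loads on the same day
    html = mini_calendar(2011, 10, today)
```

Because the date is part of the cache key, the output is rebuilt automatically the first time the calendar is requested on a new day.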


Conclusion

IP.Board and our addon applications have large, complicated code-bases. We are beyond the stage, for the most part, where you will find silver-bullet resource hogs in the code that you can fix by adding a simple database index, or changing a couple lines of code. Instead, we are always on the hunt for areas of the code that are heavily utilized (such as library methods) as any small improvements in these areas will add up to significant gains based on the sheer number of calls to the methods.

The changes above may seem minor and unimportant, but the end result was that some pages used anywhere from 10ms to 200ms less processing time. When you multiply that by the number of times the pages are viewed in the course of a day, you start to see very real and useful improvements in loading time, without the loss of any existing functionality. These are resource improvements that benefit all sites, from the smallest to the largest, and we are glad we could implement these for 3.2.3 to help you make the most of your community.

32 Comments



Posted

Love it... :)
Great job...


Posted

Always nice to hear about "under the hood" improvements of this nature. Thanks for the detailed post. :)

..Al


Posted

This article needs more tags. ;)


Posted

Nice!

Although..
[quote]
Again, in IP.Content, we found that the topic posting library was being called when you were a moderator and unpublished or unapproved records were being displayed to you. The topic library was called in order to post the topic (that mirrors the article, or stores the comments, depending on your configuration), however it was not necessary since the article is not yet visible. We added some simple checks in the code to save from having to load this library unnecessarily.
[/quote]

Sounds like it may affect my Content Spy app :(





Posted

Great work! I appreciate your attention to these details.


Posted

nice


Posted

Very interesting article.

Thank you Brandon.


Posted

Wow. Great to see that with these tools all the IP.software will work much faster.

Of course, there are still many things that can also be optimized on the server side.
We, for example, use multiple nginx web servers that host all the static content. This made our Apache/PHP server much happier, since it only handles the dynamic IP.software pages.


Posted

Look after the ms and the s will look after themselves :)


Posted

Awesome, thanks for the hard work guys.


Posted

Excellent. Now all we need is for you to release 3.2.3 so we can enjoy the benefits.


Posted

Wow after reading this I am really looking forward to the release of 3.2.3 in order to enjoy these benefits.


Posted

That's great! I cannot wait for 3.2.3!


Posted

Awesome!


Posted

Sounds good, please work harder, don't make mistakes (introduce new bugs as you usually do) and release the next version asap for all your customers. And make the std editor more functional as other company already did. Thanks.


Posted

[quote name='Minos' timestamp='1318559856']
Sounds good, please work harder, don't make mistakes (introduce new bugs as you usually do) and release the next version asap for all your customers. And make the std editor more functional as other company already did. Thanks.
[/quote]


Posted

Wow, this was a fascinating read! It's great to get a real inner look at how a complex software platform like IP.Board is built and optimized. I'm currently taking a stab at building my own PHP framework of sorts, and this article has been very insightful to me as an aspiring developer.

I know some people might find behind-the-scenes "dev updates" like this boring, but I personally hope to see more of them in the future.


Posted

Here is one way you'll see a HUGE performance gain: stop resolving MANY-TO-MANY relationships using comma-delimited values in the database!

It's not good practice. Let the database do its job. Add a couple of indexes to the table, and it will process the many-to-many relationship faster than you can. You currently "speed it up" by caching the values, but it's still a dumb idea.


Posted

It's great to hear these are coming along. However, I wonder if there's any plans to handle scalability issues that extremely large forums experience. I know there's not many of us but we're currently doing more than 100m page views a month on minecraftforum. As a result we've ran into a series of db contention issues. Is there any plans to address these directly, or introduce a database driver capable of handling master-slave setups properly?

Some of the most contended tables are the cache table, item markers, and sessions.

If you could allow us to completely sidestep the usage of the cache table by specifying appropriate cache servers (memcached or redis for example), and help optimize frequent operations on the other two it'd help us scale much higher than we are at right now. And again full support for master-slave setups would be incredibly useful.


Posted

Congratulations. For me, you're making the best decision, which is taking your time to make the core of the application excellent.

This will be one of the best version updates ever, which sites with medium and high traffic will enjoy!

Regards


Posted

[quote name='Curse.com' timestamp='1318612602']
It's great to hear these are coming along. However, I wonder if there's any plans to handle scalability issues that extremely large forums experience. I know there's not many of us but we're currently doing more than 100m page views a month on minecraftforum. As a result we've ran into a series of db contention issues. Is there any plans to address these directly, or introduce a database driver capable of handling master-slave setups properly?

Some of the most contended tables are the cache table, item markers, and sessions.

If you could allow us to completely sidestep the usage of the cache table by specifying appropriate cache servers (memcached or redis for example), and help optimize frequent operations on the other two it'd help us scale much higher than we are at right now. And again full support for master-slave setups would be incredibly useful.
[/quote]


To be honest, database efficiency has been relatively stable in our testing of 3.2.x and most of the bottlenecks appear to be happening at the PHP level. Disk access is one such bottleneck (due to many includes on each page).

We have some plans with regards to item markers for a future version, but they will be pretty significant changes, and we did not feel comfortable introducing them in a point release.

With regards to cache_store - you can already use memcache (or eaccelerator, xcache, apc, etc.) for that. Submit a ticket if you need info, or if something is not working right or as you expect, please post an appropriate bug report or suggestion topic.

I would be interested in hearing any performance related changes you'd like to see, or issues you are noticing. Please send me a PM.


Posted

While I don't understand the technical aspects of what's detailed here, I get the gist of it - an increase in efficiency. Whooohoooo :)


Posted

Nice blog post :)


Posted

[quote name='Curse.com' timestamp='1318612602']
It's great to hear these are coming along. However, I wonder if there's any plans to handle scalability issues that extremely large forums experience. I know there's not many of us but we're currently doing more than 100m page views a month on minecraftforum. As a result we've ran into a series of db contention issues. Is there any plans to address these directly, or introduce a database driver capable of handling master-slave setups properly?

Some of the most contended tables are the cache table, item markers, and sessions.

If you could allow us to completely sidestep the usage of the cache table by specifying appropriate cache servers (memcached or redis for example), and help optimize frequent operations on the other two it'd help us scale much higher than we are at right now. And again full support for master-slave setups would be incredibly useful.
[/quote]


We do have clients that use master/slave setups, actually; it's just not in the core product.
