Making IP.Board more efficient

Posted by bfarber, in 3.2.x, Beta · 13 October 2011 · 7,012 views

Tags: optimization, ipb, board, resources

As part of our ongoing commitment to deliver the best community software on the market, we routinely run IP.Board through various tools designed to measure efficiency and resource usage, helping us to identify areas that could benefit from minor and major refactoring to make the software more efficient. We do this periodically with all of our software in order to ensure we have not introduced new code that will cause resource usage problems on your community. This is typically an unexciting task that does not garner much interest from the average user, but we thought some of you might enjoy hearing about some of the resource-based improvements we have made for 3.2.3.

If hearing about the nitty gritty details that go into making the software used to power your community is of interest to you, read on. If not, and you just want to hear about upcoming features or service offerings, please skip this blog entry and stay tuned for next time!


What are we looking at?

When looking at the amount of resources our software uses, there are multiple points to consider. You have to keep an eye on memory usage, CPU usage and disk space usage. You have to consider the entire server stack too - MySQL, Apache (typically) and PHP, and of course the PHP code itself (e.g. IP.Board). There are some things within our control while developing the software, and there are some things that can only be controlled at the server level, so it is important to consider all angles (and server configurations), while tailoring the software to work in the widest array of environments possible. While we cannot control your MySQL or Apache configuration, some improvements can be made at the software level that will benefit everyone, regardless of your configuration, so we often look to these improvements first.

One tool that helps PHP developers profile their code and look for improvements is Xdebug, and it is the focus of this blog entry. When you use Xdebug to profile PHP code, it creates a file (called a "cachegrind" file) that you can then load into another software package to view the results. The most popular tools for viewing cachegrind files are KCachegrind (for Linux) and WinCacheGrind (for Windows). If you are not familiar with what this looks like, here's a quick screenshot to give you an idea (taken using WinCacheGrind, since I'm a Windows user):

[Screenshot: profiling results viewed in WinCacheGrind]

Note: Times indicated are not representative of a normal page load. I run my local environment with development mode enabled which utilizes far more resources than normal, due to rebuilding of many caches, including skin cache files, on every page load. The screenshot is merely designed to give you an idea of the interface and types of information available with the tools used.


So...what did we find?

Well, it's rare that we find something big, because we routinely run our software through tools such as this. You generally only find huge areas that could stand improvement when working through major code refactoring (such as IP.Board 2.x to IP.Board 3.x). We did, however, find many smaller areas of improvement that collectively can mean big resource gains. If you save 50ms of processing on a page, and that page is hit 10,000 times in a day...well, you can do the math.
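
For anyone who would rather not do the math by hand, here is the arithmetic as a tiny Python snippet. The two figures are just the example numbers from the paragraph above, not measurements from a real site.

```python
# Example figures from the text above, not real measurements.
saved_ms_per_hit = 50
hits_per_day = 10_000

# Convert the total saving from milliseconds to seconds.
saved_seconds_per_day = saved_ms_per_hit * hits_per_day / 1000

print(saved_seconds_per_day)                 # 500.0 seconds
print(round(saved_seconds_per_day / 60, 1))  # 8.3 minutes
```

That is a little over eight minutes of processing time saved per day, from a single page.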


Static caching

We found many functions that were called several times on each page load, very often with the same data, returning the same result. Without caching, these functions have to perform the same operations repeatedly, so we added static inline caching to store the results. Future calls can then fetch the result from the cache instead of reprocessing the same data.

Some functions where we added static inline caching for 3.2.3:
  • IPSMember::unpackGroup()
  • IPSMember::makeProfileLink()
  • IPSLib::getEnabledApplications()
  • IPSMember::setUpModerator()


Unnecessary code processing

Every operation a PHP script has to perform consumes some level of resources; sometimes this is negligible and sometimes this is measurable and important. When code is executed that does not need to execute, however, it is simply a waste of resources, no matter how small that may be. In reviewing the profiling results, we found some code that was executing which simply did not need to, which we removed/fixed.
  • We found various array checks in the output class that were unnecessary, as the array was a class property and initialized when the class was instantiated into an object.
  • We were parsing some dates in the search results area twice, when we only needed to do so once.
  • We were parsing the post content in search results, even when we displayed the results as topics. Parsing post content is an expensive operation, so this unnecessary operation was particularly wasteful.
  • We found that one particular IP.SEO hook is outdated and provides no benefit as of 3.2.0. This will be investigated (and potentially removed) with the next release of IP.SEO: http://community.inv...topiclinks-hook
  • Several operations were running when displaying a mini-calendar which were unnecessary (because the results were never displayed in the skin).
  • The IP.Downloads board index latest files hook was loading the category helper class twice, unnecessarily. Similarly, some of the functions within this class were called multiple times, unnecessarily.
  • IP.Downloads was utilizing an unnecessary "GROUP BY" SQL clause. Generally, this requires a temp table to be created, and in this case it was unnecessary.
  • The UCP Manage Attachments page queried for attachments even when the SELECT COUNT(*)... query that is run first showed there were none. We now skip this second query when there are no attachments to retrieve.
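
The last item in the list above is easy to picture in code. The following is only a sketch in Python using an in-memory SQLite database; the `attachments` table, its columns and the `fetch_attachments` function are made up for the example and do not reflect the real IP.Board schema.

```python
import sqlite3


def fetch_attachments(conn, member_id):
    """Return a member's attachments, skipping the second query if there are none."""
    (total,) = conn.execute(
        "SELECT COUNT(*) FROM attachments WHERE member_id = ?", (member_id,)
    ).fetchone()
    if total == 0:
        # Nothing to retrieve, so do not bother the database again.
        return []
    return conn.execute(
        "SELECT id, filename FROM attachments WHERE member_id = ? ORDER BY id",
        (member_id,),
    ).fetchall()


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attachments (id INTEGER PRIMARY KEY, member_id INTEGER, filename TEXT)"
)
conn.execute("INSERT INTO attachments (member_id, filename) VALUES (1, 'a.png')")

print(fetch_attachments(conn, 1))  # [(1, 'a.png')]
print(fetch_attachments(conn, 2))  # []
```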


More specific changes...

Additionally, we have found several more specific areas of improvement that we have corrected for 3.2.3. These areas of improvement, as a general rule, improve resource usage more than the items listed above, but are less trafficked and thus less likely to be noticeable. As such, your mileage may vary with these improvements depending on how your site is utilized by your members.

We build a cache of each item's 'like' data, and store this separately so that we do not need to query all of an item's like data on every page load ('like' data here means the follow/unfollow system in IP.Board and applications). We were caching this for one hour. However, the software does a good job of ensuring the cache is rebuilt as needed when records are deleted and so forth, so we have increased the cache lifetime from one hour to one day, limiting how often the software needs to rebuild this cache.
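
As a rough illustration of why a longer lifetime is safe when the cache is also rebuilt on demand, here is a small time-based cache sketched in Python. The `TTLCache` class, the key format and the fake clock are all assumptions made for the example; IP.Board's real caching layer works differently.

```python
import time


class TTLCache:
    """Tiny time-based cache: entries expire after ttl_seconds or on invalidate()."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key, rebuild):
        """Return the cached value, calling rebuild() only if missing or expired."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = rebuild()
        self._store[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key):
        """Drop an entry early, e.g. when the underlying record changes."""
        self._store.pop(key, None)


# A fake clock lets the example run instantly.
now = [0.0]
rebuilds = []


def rebuild_like_data():
    rebuilds.append(1)
    return {"followers": 3}


cache = TTLCache(ttl_seconds=24 * 60 * 60, clock=lambda: now[0])
cache.get("topic:1", rebuild_like_data)   # first call builds the cache
now[0] = 3 * 60 * 60                      # three hours later
cache.get("topic:1", rebuild_like_data)   # still served from the cache
cache.invalidate("topic:1")               # a record was deleted
cache.get("topic:1", rebuild_like_data)   # rebuilt on demand
print(len(rebuilds))  # 2
```

With a one-hour lifetime the second lookup would have triggered a rebuild as well; with explicit invalidation in place, the longer lifetime only removes rebuilds that were not needed.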

A change that we made to one of our JSON encoding routines was causing significantly more resource usage compared to previous versions. We reverted that change to match 3.2.2, saving nearly 100ms of processing on each page where this occurred.

We have a call in our registry destructor that saves topic markers back to the database so that they are not lost between page loads. This is necessary, of course, in order to maintain topic markers across different browsers and computers. We found, however, that this was occurring even when you were using an application that does not utilize the item marking system in our framework (for instance, when you are browsing IP.Calendar). By adding a check in the destructor, we save the software from having to load the topic marker library and parse all topic markers, only to save them back to the database, when the application you are using does not need item marking capabilities.
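
In spirit, the check looks something like the Python sketch below. The `Registry` class, its `app_uses_item_marking` flag and the explicit `shutdown()` method are hypothetical; the real code is a PHP destructor, and the counter simply stands in for the expensive load, parse and save work.

```python
class Registry:
    """Hypothetical stand-in for the framework registry."""

    def __init__(self, app_uses_item_marking):
        self.app_uses_item_marking = app_uses_item_marking
        self.marker_saves = 0  # counts how often the expensive save ran

    def _save_item_markers(self):
        # Stands in for loading the marker library, parsing all markers
        # and writing them back to the database.
        self.marker_saves += 1

    def shutdown(self):
        """End-of-request cleanup (plays the role of the destructor)."""
        if not self.app_uses_item_marking:
            return  # nothing to persist for this application
        self._save_item_markers()


forums = Registry(app_uses_item_marking=True)
calendar_app = Registry(app_uses_item_marking=False)
forums.shutdown()
calendar_app.shutdown()
print(forums.marker_saves, calendar_app.marker_saves)  # 1 0
```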

We discovered a minor bug with the reputation cache loading when the central comments class was utilized in some cases. In certain places, the reputation cache would not load correctly, and while the software largely corrected for this automatically, it meant using more resources than necessary to rebuild this cache because it did not load correctly the first time. By fixing this bug, we saved the software a lot of extra unnecessary processing (and fixed an unreported/undiscovered bug in the process :whistle: ).

We found some applications were not loading caches (from the cache_store table) that they were using. We added these to the initialization routines for the respective applications, saving database queries later on to fetch the caches individually.

We discovered that a few areas calling IPSMember::buildDisplayData() were not caching the member data as designed. Upon inspection, this was because the member data was joined onto the main queries, rather than fetched separately, so the check in buildDisplayData() that verifies the same information is being passed always failed. By refactoring how we pulled members and called this method in a few places, we allow the software to cache the results and avoid parsing the same data multiple times. In most cases this means an extra database query, but less PHP processing - the tradeoff is worth it in this case, as the PHP processing is more expensive.

In IP.Content, attachment parsing was happening in a loop for each record that was being displayed in listings. While this is not a problem in itself, we were able to refactor the code to parse all of the attachments at once, allowing us to run just one database query instead of one per record.
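
The general idea of replacing one query per record with a single query can be sketched as follows. This is Python against an in-memory SQLite database, and the `attachments` table, its columns and the `attachments_for_records` function are invented for the example rather than taken from IP.Content.

```python
import sqlite3
from collections import defaultdict


def attachments_for_records(conn, record_ids):
    """Fetch attachments for every listed record with a single query."""
    grouped = defaultdict(list)
    if not record_ids:
        return grouped
    # One "?" placeholder per record id keeps the query parameterized.
    placeholders = ",".join("?" for _ in record_ids)
    sql = (
        "SELECT record_id, filename FROM attachments "
        f"WHERE record_id IN ({placeholders}) ORDER BY id"
    )
    for record_id, filename in conn.execute(sql, list(record_ids)):
        grouped[record_id].append(filename)
    return grouped


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attachments (id INTEGER PRIMARY KEY, record_id INTEGER, filename TEXT)"
)
conn.executemany(
    "INSERT INTO attachments (record_id, filename) VALUES (?, ?)",
    [(1, "a.png"), (1, "b.png"), (2, "c.pdf")],
)

by_record = attachments_for_records(conn, [1, 2, 3])
print(by_record[1])      # ['a.png', 'b.png']
print(by_record.get(3))  # None (record 3 has no attachments)
```

Displaying a listing of N records then costs one attachment query in total, rather than N.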

Again, in IP.Content, we found that the topic posting library was being called when you were a moderator and unpublished or unapproved records were being displayed to you. The topic library was called in order to post the topic (that mirrors the article, or stores the comments, depending on your configuration); however, it was not necessary since the article is not yet visible. We added some simple checks in the code to avoid loading this library unnecessarily.

And last, but not least, we found a major improvement area for IP.Calendar. The mini-calendars displayed in IP.Calendar (and in IP.Content mini-calendar plugin blocks, and on the board index if you utilize the calendar mini-calendar hook) are very much static HTML. The only change that occurs in these mini-calendars is that the current day is bolded, if you are viewing the current month. We utilized some clever caching techniques (using the cache_store table) in order to save the HTML output that is generated, and then we reuse this output instead of rebuilding it repeatedly. The end result is that mini-calendars only need to rebuild once a day now, instead of on every single page load.
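
A simplified version of the idea, sketched in Python, might look like this. The in-memory dictionary stands in for the cache_store table, and `render_month`, `mini_calendar` and the toy HTML they produce are hypothetical; the real mini-calendar output is of course more involved.

```python
import calendar
import datetime

_html_cache = {}  # (year, month) -> (built_on, html); stands in for cache_store
build_count = 0   # counts how often the HTML is actually rebuilt


def render_month(year, month, today):
    """Build the (toy) calendar HTML, bolding today's date if it is in this month."""
    global build_count
    build_count += 1
    cells = []
    for day in range(1, calendar.monthrange(year, month)[1] + 1):
        is_today = (year, month, day) == (today.year, today.month, today.day)
        cells.append(f"<b>{day}</b>" if is_today else str(day))
    return " ".join(cells)


def mini_calendar(year, month, today):
    """Return cached HTML, rebuilding only when the cached copy is from another day."""
    cached = _html_cache.get((year, month))
    if cached is not None and cached[0] == today:
        return cached[1]
    html = render_month(year, month, today)
    _html_cache[(year, month)] = (today, html)
    return html


today = datetime.date(2011, 10, 13)
for _ in range(3):
    html = mini_calendar(2011, 10, today)  # three "page loads" on the same day
print(build_count)           # 1
print("<b>13</b>" in html)   # True

mini_calendar(2011, 10, today + datetime.timedelta(days=1))  # next day
print(build_count)           # 2
```

Because the only thing that changes is which day is bolded, keying the cached HTML on the date it was built is enough to keep it correct.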


Conclusion

IP.Board and our addon applications have large, complicated code-bases. We are beyond the stage, for the most part, where you will find silver-bullet resource hogs in the code that you can fix by adding a simple database index, or changing a couple lines of code. Instead, we are always on the hunt for areas of the code that are heavily utilized (such as library methods) as any small improvements in these areas will add up to significant gains based on the sheer number of calls to the methods.

The above changes may seem minor and unimportant, but the end result was that some pages, following the changes noted above, utilized anywhere from 10ms to 200ms less processing time. When you multiply that by the number of times the pages are viewed in the course of a day, you start to see very real and useful improvements in loading time, without the loss of any existing functionality. These are resource improvements that benefit all sites, from the smallest to the largest, and we are glad we could implement these for 3.2.3 to help you make the most of your community.




Thanks. Looking forward to 3.2.3.
Love it... :)
Great job...
Always nice to hear about "under the hood" improvements of this nature. Thanks for the detailed post. :)

..Al
• Jay •
Oct 13 2011 10:44 AM
This article needs more tags. ;)
Nice!

Although..

Again, in IP.Content, we found that the topic posting library was being called when you were a moderator and unpublished or unapproved records were being displayed to you. The topic library was called in order to post the topic (that mirrors the article, or stores the comments, depending on your configuration), however it was not necessary since the article is not yet visible. We added some simple checks in the code to save from having to load this library unnecessarily.


Sounds like it may affect my Content Spy app :(
Great work! I appreciate your attention to these details.
nice
Very interesting article.

Thank you Brandon.
Wow. Great to see that with these tools all the IP.software will work much faster.

Of course there are still many things that can also be optimized on the server side.
We, for example, use multiple nginx web servers to host all the static content. This made our Apache/PHP server much happier since it's only handling the dynamic IP.software pages.
Look after the ms and the s will look after themselves :)
Dan Whitehouse
Oct 13 2011 02:48 PM
Awesome, thanks for the hard work guys.
Excellent. Now all we need is for you to release 3.2.3 so we can enjoy the benefits.
Wow after reading this I am really looking forward to the release of 3.2.3 in order to enjoy these benefits.
ThatForumGuy
Oct 13 2011 05:47 PM
That's great! I cannot wait for 3.2.3!
Awesome!
Sounds good, please work harder, don't make mistakes (introduce new bugs as you usually do) and release the next version asap for all your customers. And make the std editor more functional as other company already did. Thanks.

Sounds good, please work harder, don't make mistakes (introduce new bugs as you usually do) and release the next version asap for all your customers. And make the std editor more functional as other company already did. Thanks.

Any other stabs you wanted to take? xD
Wow, this was a fascinating read! It's great to get a real inner look at how a complex software platform like IP.Board is built and optimized. I'm currently taking a stab at building my own PHP framework of sorts, and this article has been very insightful to me as an aspiring developer.

I know some people might find behind-the-scenes "dev updates" like this boring, but I personally hope to see more of them in the future.
Here is one way you'll see a HUGE performance gain: stop resolving MANY-TO-MANY relationships using comma-delimited values in the database!

It's not good practice. Let the database do its job. Add a couple of indexes to the table, and it will process the many-to-many relationship faster than you can. You currently "speed it up" by caching the values, but it's still a dumb idea.
It's great to hear these are coming along. However, I wonder if there are any plans to handle scalability issues that extremely large forums experience. I know there aren't many of us but we're currently doing more than 100m page views a month on minecraftforum. As a result we've run into a series of db contention issues. Are there any plans to address these directly, or introduce a database driver capable of handling master-slave setups properly?

Some of the most contended tables are the cache table, item markers, and sessions.

If you could allow us to completely sidestep the usage of the cache table by specifying appropriate cache servers (memcached or redis for example), and help optimize frequent operations on the other two it'd help us scale much higher than we are at right now. And again full support for master-slave setups would be incredibly useful.
