Caching external data in PHP


Caching data. If you are a developer you must have heard about it somewhere. Is it really that important? There is only one thing I can say to that: yes!
There are many reasons to start caching data that has already been calculated or fetched. The most common one is keeping the owner of the data happy by saving them bandwidth and server capacity.

In this article I will show you how to cache data retrieved from an external service. The same technique can also be used to save local results.


Caching is also important for you! It can speed up your scripts, so your visitors aren’t stuck with long loading times (which will send them walking).
Imagine this: you have a very nice, powerful script that takes a while to load because it calculates a whole lot of stuff, say about 4 seconds per run. Maybe it’s time to ask whether that work is really necessary every time a page loads. Does the data really change that much?

A great example of this problem is getting your data from an external service. This always requires another server to respond to your request, which slows down your script. Wouldn’t it be better to fetch the information once and store it on your own server for the times you need it?

There is also something else to keep in mind: how often is the data updated, and how often do I need to refresh it? If you are loading a list of things happening right now, you might not want to cache anything, because the information is real time. But if you want, let’s say, a weekly report on something, then you only need to update the data once a week.
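The same idea works for local results, such as the slow 4-second calculation described above. As a minimal sketch, assuming the result can be stored as JSON, a small helper can run the expensive work only when the cached copy is older than a chosen number of seconds. The name cacheRemember() and the JSON storage format are illustrative assumptions, not part of the class built in this tutorial.

```php
<?php
// Minimal sketch of a "compute once, reuse until stale" helper.
// cacheRemember() is a hypothetical name, not part of any library.
function cacheRemember(string $file, int $ttl, callable $compute)
{
    // Serve the cached copy while it is younger than $ttl seconds.
    if (is_file($file) && filemtime($file) + $ttl > time()) {
        return json_decode(file_get_contents($file), true);
    }

    // Otherwise run the expensive work and store the result as JSON.
    $result = $compute();
    file_put_contents($file, json_encode($result), LOCK_EX);
    return $result;
}

// Example: a weekly report only needs recalculating once a week.
// $report = cacheRemember('cache/report.json', 60 * 60 * 24 * 7, 'buildReport');
```

For real-time data you would simply not use such a helper, or pass a very small $ttl.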

To explain how to create your own caching method I am going to use another Last.fm API function. We are going to save a user’s favorite artists of last week and refresh this data once a week.

Getting the data to work with

First things first: create a folder, for example with any FTP application. I put a folder called “cache” in the root of my script folder. Make sure this folder is readable and writable for PHP; most people just CHMOD the folder to 777.
We’ll be writing our cached data into this folder. Also create the file you’ll be caching to; I used an empty file called “lastfm.xml”. Make sure it is readable and writable as well.
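If you’d rather not create the folder and file by hand (or you need many cache files), PHP can prepare them itself. Here is a minimal sketch; the helper name prepareCacheFile() is hypothetical, and it assumes PHP is allowed to create folders next to your script. It gives a newly created empty file a modification time of 0 (1970), so a one-week age check treats it as stale and fetches data on the first run instead of keeping an empty cache for a week.

```php
<?php
// Sketch: create the cache folder and file from PHP instead of FTP.
// prepareCacheFile() is a hypothetical helper name.
function prepareCacheFile(string $filePath): bool
{
    $dir = dirname($filePath);

    // Create the folder (and any missing parent folders) if needed.
    if (!is_dir($dir) && !mkdir($dir, 0755, true)) {
        return false;
    }

    // touch() creates an empty file; mtime 0 marks it as already expired.
    if (!file_exists($filePath) && !touch($filePath, 0)) {
        return false;
    }

    return is_writable($filePath);
}

// Example: prepareCacheFile(__DIR__ . '/cache/lastfm.xml');
```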

Next, pick the service you’d like to use: an XML provider, an RSS feed or even your own script. It doesn’t matter what you are caching. In this tutorial I am going to work with an XML service.

Next up is creating a new PHP class that will handle the caching and the gathering of data. I created a file called “Caching.php” for the class to go in.

Take a look at this skeleton:

<?php
	/*
	 * Caching	A small PHP class to get data from Last.fm and cache it
	 */

	class Caching {

		private $filePath = "";
		private $apiURI = "";

		function __construct($filePath, $apiURI) {

		}

		function checkForRenewal() {

		}

		function getExternalInfo() {

		}

		function stripAndSaveFile($xml) {

		}

	}
?>

This piece of code will serve as the foundation of our class which will cache our information. We’ll be filling up each function step by step.

First we need a constructor that accepts the path to the file and a URI for the API. In this case, the XML data can be requested simply by calling a URL.
The constructor is executed the moment you instantiate the class.
A few things we’ll be doing in this function are:

  • Check the input variables.
  • Check if the local file needs to be updated.
  • Get new data if refresh is needed and save it.

I’ve set up the following piece of code for the constructor:

function __construct($filePath, $apiURI) {
	//check if the file path and API URI are specified; if not, stop here
	if (strlen($filePath) > 0 && strlen($apiURI) > 0) {
		//set the local file path and API path
		$this->filePath = $filePath;
		$this->apiURI = $apiURI;

		//does the file need to be updated?
		if ($this->checkForRenewal()) {

			//get the data you need
			$xml = $this->getExternalInfo();

			//only overwrite the cache when the API call succeeded
			if ($xml !== false) {
				//save the data to your file
				$this->stripAndSaveFile($xml);
			}
		}
		//otherwise there is no need to update the file

	} else {
		//a constructor's return value is ignored, so just report the problem
		echo "No file path and / or API URI specified.";
	}
}

The flow of this function is quite self-explanatory. Now we need to write the functions that the constructor calls.

Making the functions work

First up is the time check called “checkForRenewal”. This function looks at the file that was set in the constructor and checks whether it was last modified more than a week ago. To do this we compare two Unix timestamps: time() returns the current time and filemtime() returns the time the file was last modified, both in seconds. Because both are plain numbers of seconds, we can simply add the caching period to the modification time and compare the result with now.

Use the following code for the “checkForRenewal” function:

function checkForRenewal() {
	//set the caching time (in seconds)
	$cachetime = (60 * 60 * 24 * 7); //one week

	//a missing or empty cache file always needs fresh data
	if (!file_exists($this->filePath) || filesize($this->filePath) === 0) {
		return true;
	}

	//get the file time and add the caching time to it
	$filetimemod = filemtime($this->filePath) + $cachetime;

	//if the renewal date lies in the past, return true; else false (no need for update)
	return $filetimemod < time();
}

This function takes the time the file was last modified, adds the caching period to it and checks whether the result lies before the current time. The cache time is defined in seconds, so 60 seconds * 60 minutes * 24 hours * 7 days gives a week’s worth of seconds (604,800). You can adjust this to any interval you want; for debugging I suggest setting it to just 60 seconds.
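To see the renewal rule in isolation, here is a sketch of the same comparison as a standalone function you can test without the class. needsRenewal() and ONE_WEEK are hypothetical names. It also treats a missing or empty file as stale, which avoids the “stat failed” warning filemtime() raises when the cache file does not exist yet.

```php
<?php
// Sketch of the renewal rule as a standalone, testable function.
// needsRenewal() and ONE_WEEK are hypothetical names.
const ONE_WEEK = 60 * 60 * 24 * 7; // 604800 seconds

function needsRenewal(string $path, int $cacheTime = ONE_WEEK): bool
{
    // A missing or empty file always needs fresh data.
    if (!file_exists($path) || filesize($path) === 0) {
        return true;
    }

    // Stale once "last modified + cache time" lies in the past.
    return filemtime($path) + $cacheTime < time();
}

// For debugging, use a short period: needsRenewal('cache/lastfm.xml', 60);
```

PHP caches file status information within a request, so a long-running script that rewrites the file may need clearstatcache() before checking again.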

The next function gets the external data from the URI we specified in the constructor. I’ve written a post about reading XML in PHP before, so I’ll go over this function only briefly.

Reading the XML and saving it

In getExternalInfo() we make the API call and simply return the result as a SimpleXML object.

Take a look at the following code:

function getExternalInfo() {
	if ($xml = @simplexml_load_file($this->apiURI)) {
		return $xml;
	} else {
		return false;
	}
}

This must be one of the shortest PHP functions I have ever written: it is all you need to get the XML you want, as long as the service returns XML from a plain URL request. Loading a remote URL this way requires the allow_url_fopen setting to be enabled on your server.
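The @ in front of simplexml_load_file() silences warnings, which also hides why a request failed. Here is a sketch of an alternative, assuming you already have the response body as a string (for example from file_get_contents() or cURL). libxml_use_internal_errors() collects the parse errors so they can be logged. parseXmlOrNull() is a hypothetical helper name.

```php
<?php
// Sketch: parse an XML string and log problems instead of hiding them with @.
// parseXmlOrNull() is a hypothetical helper name.
function parseXmlOrNull(string $body): ?SimpleXMLElement
{
    // Collect libxml errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($body);

    if ($xml === false) {
        foreach (libxml_get_errors() as $error) {
            error_log('XML error: ' . trim($error->message));
        }
        libxml_clear_errors();
        $xml = null;
    }

    // Restore the previous error-handling mode.
    libxml_use_internal_errors($previous);
    return $xml;
}
```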

Next up is going through the XML and keeping only the parts you want to save in your own XML file.

Again, I’ll skip over some details of the next function; you can read more about reading XML in an older article.

Looking at the XML returned by Last.fm, I saw the first problem: unnecessary information. The artist name and play count are the only things I am interested in. Mbid? Don’t need that. Url? Not this time!
So the next function, “stripAndSaveFile”, will read the XML, build our own and save it to a file. Sounds simple enough, right?

Take a look at what I wrote:

function stripAndSaveFile($xml) {
	//put the artist nodes in a variable to keep things readable
	$artists = $xml->weeklyartistchart->artist;

	//build the new XML object for SimpleXML
	$output = new SimpleXMLElement("<artists></artists>");

	//get only the top 10 (or fewer if the chart is shorter)
	$total = min(10, count($artists));
	for ($i = 0; $i < $total; $i++) {

		//create a new artist
		$insert = $output->addChild("artist");

		//insert name and playcount children into the artist;
		//addChild() does not escape "&", so escape the name first
		$insert->addChild("name", htmlspecialchars((string) $artists[$i]->name));
		$insert->addChild("playcount", (string) $artists[$i]->playcount);

	}

	//save the XML in the cache
	file_put_contents($this->filePath, $output->asXML());
}

First, it puts the <artist> nodes in a new variable to keep everything readable. The next line creates a new SimpleXML object to work with.
new SimpleXMLElement("<artists></artists>") creates an XML document with <artists> as the root element. We’ll be filling this element with <artist> elements. Sounds logical? It sure does!

In the loop that follows we go through the first ten entries and create a new artist on each pass.

$insert is a new <artist> node added to the root. The <name> and <playcount> nodes are then added to that <artist> node, filled with the information we got from the Last.fm XML.

Phew! Now it’s time to save the file on your file system, which is what the last line of this function does: it exports the XML as a string and writes it to the file.
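Here is the same stripping step written as a self-contained function, so you can try it without calling Last.fm. This is a sketch: topArtistsXml() is a hypothetical name, and the sample XML only imitates the structure the tutorial reads ($xml->weeklyartistchart->artist). It uses foreach, so a chart with fewer than ten artists doesn’t cause errors, and it escapes “&” before calling addChild().

```php
<?php
// Sketch: keep only name and playcount of the first $limit artists.
// topArtistsXml() is a hypothetical helper name.
function topArtistsXml(SimpleXMLElement $xml, int $limit = 10): string
{
    $output = new SimpleXMLElement('<artists></artists>');

    // foreach simply ends early when the chart has fewer than $limit artists.
    $count = 0;
    foreach ($xml->weeklyartistchart->artist as $artist) {
        if ($count++ >= $limit) {
            break;
        }
        $insert = $output->addChild('artist');
        // addChild() does not escape "&", so escape the text first.
        $insert->addChild('name', htmlspecialchars((string) $artist->name));
        $insert->addChild('playcount', (string) $artist->playcount);
    }

    return $output->asXML();
}

// Sample input shaped like the chart the tutorial reads (made-up data).
$sample = new SimpleXMLElement(
    '<lfm status="ok"><weeklyartistchart user="example">'
    . '<artist rank="1"><name>Simon &amp; Garfunkel</name><mbid/><playcount>12</playcount><url/></artist>'
    . '<artist rank="2"><name>Air</name><mbid/><playcount>7</playcount><url/></artist>'
    . '</weeklyartistchart></lfm>'
);
$cached = topArtistsXml($sample);
```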

Making the call!

Up until now we haven’t actually used the class. So create a new script and put the following code in it:

<?php
	ini_set('display_errors', 1);
	error_reporting(E_ALL);

	include('Caching.php');
	$caching = new Caching("cache/lastfm.xml",
	"http://ws.audioscrobbler.com/2.0/?method=user.getweeklyartistchart&user=xgayax&api_key=b25b959554ed76058ac220b7b2e0a026");
?>

I’ve put the first two lines in to make sure errors are shown. If your server shows them by default, feel free to remove these lines.
Next, the script includes the class file and creates a new instance of the class, passing the two required parameters: the path to the cache XML and the URL of the Last.fm API.

If everything works, you won’t see anything on the page. Now open your cache file to check that the XML was saved.
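Once the cache file exists, the rest of your site can read it instead of calling Last.fm. Here is a minimal sketch, assuming the <artists><artist><name/><playcount/></artist></artists> layout written by stripAndSaveFile(). readCachedArtists() is a hypothetical helper that returns an empty array when the file is missing, empty or malformed.

```php
<?php
// Sketch: read the cached XML back into a plain PHP array.
// readCachedArtists() is a hypothetical helper name.
function readCachedArtists(string $filePath): array
{
    // simplexml_load_file() returns false for an empty or malformed file.
    $cache = is_file($filePath) ? @simplexml_load_file($filePath) : false;
    if ($cache === false) {
        return [];
    }

    $artists = [];
    foreach ($cache->artist as $artist) {
        $artists[] = [
            'name' => (string) $artist->name,
            'playcount' => (int) $artist->playcount,
        ];
    }
    return $artists;
}

// Example output on a page (escape names before echoing them as HTML):
// foreach (readCachedArtists('cache/lastfm.xml') as $a) {
//     echo htmlspecialchars($a['name']) . ' (' . $a['playcount'] . ' plays)<br>';
// }
```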

Congratulations! You just cached some information.

The source

Missed anything? Here is the full source in all its glory!

The Caching class:

<?php
	/*
	 * Caching	A small PHP class to get data from Last.fm and cache it
	 * Author:	Gaya Kessler
	 * URL:		http://www.gayadesign.com/
	 */

	class Caching {

		private $filePath = "";
		private $apiURI = "";

		function __construct($filePath, $apiURI) {
			//check if the file path and API URI are specified; if not, stop here
			if (strlen($filePath) > 0 && strlen($apiURI) > 0) {
				//set the local file path and API path
				$this->filePath = $filePath;
				$this->apiURI = $apiURI;

				//does the file need to be updated?
				if ($this->checkForRenewal()) {

					//get the data you need
					$xml = $this->getExternalInfo();

					//only overwrite the cache when the API call succeeded
					if ($xml !== false) {
						//save the data to your file
						$this->stripAndSaveFile($xml);
					}
				}
				//otherwise there is no need to update the file

			} else {
				//a constructor's return value is ignored, so just report the problem
				echo "No file path and / or API URI specified.";
			}
		}

		function checkForRenewal() {
			//set the caching time (in seconds)
			$cachetime = (60 * 60 * 24 * 7); //one week

			//a missing or empty cache file always needs fresh data
			if (!file_exists($this->filePath) || filesize($this->filePath) === 0) {
				return true;
			}

			//get the file time and add the caching time to it
			$filetimemod = filemtime($this->filePath) + $cachetime;

			//if the renewal date lies in the past, return true; else false (no need for update)
			return $filetimemod < time();
		}

		function getExternalInfo() {
			if ($xml = @simplexml_load_file($this->apiURI)) {
				return $xml;
			} else {
				return false;
			}
		}

		function stripAndSaveFile($xml) {
			//put the artist nodes in a variable to keep things readable
			$artists = $xml->weeklyartistchart->artist;

			//build the new XML object for SimpleXML
			$output = new SimpleXMLElement("<artists></artists>");

			//get only the top 10 (or fewer if the chart is shorter)
			$total = min(10, count($artists));
			for ($i = 0; $i < $total; $i++) {

				//create a new artist
				$insert = $output->addChild("artist");

				//insert name and playcount children into the artist;
				//addChild() does not escape "&", so escape the name first
				$insert->addChild("name", htmlspecialchars((string) $artists[$i]->name));
				$insert->addChild("playcount", (string) $artists[$i]->playcount);

			}

			//save the XML in the cache
			file_put_contents($this->filePath, $output->asXML());
		}

	}

?>

Calling the script:

<?php
	ini_set('display_errors', 1);
	error_reporting(E_ALL);

	include('Caching.php');
	$caching = new Caching($_SERVER['DOCUMENT_ROOT']."/scripts/caching/cache/lastfm.xml",
	"http://ws.audioscrobbler.com/2.0/?method=user.getweeklyartistchart&user=xgayax&api_key=b25b959554ed76058ac220b7b2e0a026");
?>


25 Comments

  1. Marco said: May 14, 2009 at 8:21 pm

    Whoah, that’s one detailed tutorial.

    One question: Why aren’t you using cURL for this? It’s way easier, multi-threaded/faster and optimized to do this kind of stuff.


  2. Gaya said: May 14, 2009 at 8:26 pm

    Hi Marco, thanks for the comment.
    I’d guess simplexml_load_file uses cURL, but I am not sure. This way the URL is converted into XML nodes and not just a string :)
    It’s to keep things easy I guess haha.


  3. Marco said: May 14, 2009 at 8:29 pm

    No, simplexml_load_file sadly doesn’t use cURL.

    You should look into cURL; the retrieved data can then be parsed with simplexml_load_string ;) . It’ll improve the performance drastically!

    One downside: cURL isn’t enabled on every server :) .

    Anyway, once again thanks for sharing this useful information. Keep up the great work mate!


  4. Gaya said: May 14, 2009 at 8:36 pm

    Ah I see. I’ve used cURL before, didn’t know it was so much faster haha!

    This version does make it a hell of a lot easier :P

    Thanks for commenting Marco! You also keep up the goods!


  5. Vincent Nguyen said: May 14, 2009 at 8:46 pm

    You can save a lot of typing with some tricks!
    strlen($filePath) > 0 && strlen($apiURI) > 0 => Simply use strlen($filePath) && strlen($apiURI)
    Or
    if ($filetimemod < time()) { return true; } else { return false; } => Simply use return $filetimemod < time();
    Anyway, very detailed tutorial!
    And you should add an extra parameter to define the “cache” time in method checkForRenewal()!


  6. Gaya said: May 14, 2009 at 8:54 pm

    Hi Vincent!

    Thanks for your two cents.

    Yeah, these tricks can shorten my code, but for me they don’t really improve the readability.

    I was thinking about making the cache time a parameter for the constructor. But then again… it would fit there since the class specifically caches Last.fm data.

    Anyway: Thanks for this useful comment!


  7. Evan said: May 14, 2009 at 10:41 pm

    I agree with the above poster about cURL. It does reduce load time and reduces system resources.

    This is great though Gaya, thanks for sharing!


  8. Gaya said: May 15, 2009 at 2:05 pm

    Thanks for the comment Evan :)
    I surely will try to compare this with cURL next time :)


  9. Sieb said: May 25, 2009 at 11:45 am

    New article!!!


  10. Gaya said: May 26, 2009 at 4:13 pm

    Patience Sieb :) Patience


  11. Dennis said: June 6, 2009 at 1:01 am

    Thanks for the great article. I Dugg it. You can too. http://digg.com/d1t4kO


  12. Gaya said: June 6, 2009 at 10:40 am

    Thanks Dennis! And thanks for the digg :D


  13. Josly said: July 7, 2009 at 7:36 am

    thanks for sharing!


  14. paris @ united worx web design said: July 28, 2009 at 11:35 am

    Nice tutorial, I did something similar recently. Since I have integrated Google Analytics with my CMS, I made sure to cache the results in the DB instead of the file system, and only update them if they are older than a day. So my application runs a sort of pseudo cron job once a day to refresh the data and otherwise just gets what’s in the DB cache.

    I actually store all the feed data as an array and store the serialized version of that array in the DB together with the date it was fetched.

    Although I need to improve it a bit: when you load the page and it needs to refresh the data from Google Analytics, it takes 5-6 sec, and that’s annoying. I should trigger the data refresh via AJAX so the application won’t have a delay at all.


  15. Rein Krul said: September 7, 2009 at 8:43 pm

    For the sake of re-usability I’d go with a generic caching class which you use for everything you need to cache. You could even make it abstract so you can make specialized caching mechanisms.


  16. Gaya said: September 7, 2009 at 9:31 pm

    Nice to hear from you Rein! Thanks for the comment. You are totally right!
    This is a small example of how to cache Last.fm data. But yes, making a basic class and then an XML-specific one would be MUCH better indeed.
    There is always room for expansion.


  17. waffl said: September 12, 2009 at 3:26 pm

    Hi. Firstly I would like to thank you for putting up a great in-depth tutorial. However I am having a problem adapting your script to my needs:

    In stripAndSaveFile function, how do I get a few data from an XML file? In my case, they are not in array unlike in your last.fm example but rather in separate unique tags. I’m sorry if I used the terms incorrectly as I am rather new in both programming & PHP.

    Thanks in advance.


  18. Philip John said: November 10, 2009 at 2:00 pm

    Absolutely fantastic effort, thanks for sharing! I have one issue with it; if the cache doesn’t already exist it isn’t created. I’m going to try and figure it out anyway but it’d be good to have that included.

    For example, I’m using it for a WordPress plugin that could end up generating hundreds of files in the cache (I also need to build a ‘clean up’ as well!) which obviously can’t be created manually!


  19. Gaya said: November 10, 2009 at 11:20 pm

    Thanks for the comment Philip. You can certainly build something to create the cache file. You could do a file_exists before trying to write and create a file if there isn’t any.

    Keep in mind that in PHP “safe mode” you can’t adjust the rights to a file, but there is always a solution right? ;)

    Let me know if you get that WP plugin done, I’d like to check that out :)


  20. Philip John said: November 16, 2009 at 8:22 pm

    Cheers Gaya, I’ll definitely shout when it’s done – it’ll be epic. I did wrap lines 20-33 in the following:
    if ($this->checkForRenewal()) {

    } else { // cache doesn’t exist
    //get the data you need
    $xml = $this->getExternalInfo();

    //save the data to your file
    $this->stripAndSaveFile($xml);
    }

    Thing is, file_put_contents() isn’t doing its job properly now! I get a “No such file or directory” error even though that function should create the file if it doesn’t exist…and the permissions are all fine. Ho-hum!


  21. Ben Chapman said: January 8, 2010 at 7:21 am

    Thanks heaps for this, still trying to get my head around PHP after years of ASP/SQL and 5 years of no development. Got it working pretty quickly, I did extend the class to take the cache time as an input variable as I needed to call it multiple times for different purposes (and different caching requirements).


  22. thom said: March 2, 2010 at 2:22 pm

    Hi Gaya,

    Great tutorial but from the outset I’ve been getting this:

    Warning: filemtime() [function.filemtime]: stat failed for /home/fhlinux007/t/thomcurtis.co.uk/user/htdocs/scripts/caching/cache/lastfm.xml in /home/fhlinux007/t/thomcurtis.co.uk/user/htdocs/facebook/Caching.php on line 46

    The code is identical and my permissions are set correctly. I don’t know what to do!


  23. thom said: March 2, 2010 at 2:23 pm

    Ignore the last comment, as soon as I posted it I saw what was up, but now my main page that calls Caching.php, loads blank.


  24. thom said: March 2, 2010 at 3:09 pm

    …and the XML hasn’t changed.


  25. dolido said: May 16, 2010 at 3:30 pm

    thanks 4 this post.

