Caching external data in PHP
Caching data. If you are a developer you must have heard about it somewhere. Is it really that important? There is only one thing I can say to that: yes!
There are a lot of reasons why you should start caching data that has been calculated. The most common reason is to keep the owner of the data happy by saving them bandwidth and server capacity.
In this article I will show you how to cache data from an external service. The same method can also be used to save local results.
Caching is also important for you! It can speed up your scripts, so you won’t bug your visitors with long loading times (which will make them walk away).
Imagine this: you have a very nice script which is very powerful but takes a while to load, since it calculates a whole lot of stuff. Say this action takes about 4 seconds. Maybe it’s time to question whether this action is really necessary every time a page is loaded. Will this data fluctuate that much?
A great example of this problem is getting your data from an external service. This always requires another server to respond to your request, slowing down the script. Wouldn’t it be better to get the information you need once and save it on your server for the times you need that data?
There is also something else to keep in mind: how often is the data updated, and how often do I need to refresh it? If you are loading a list of things going on right at this moment, then you might not want to cache anything, because the information is real time. But if you want, let’s say, a weekly report on something, then obviously you only need to update this data once a week.
To explain how to create your own caching method I am going to use another Last.fm API function. We are going to save last week’s favorite artists of a user and refresh this data once a week.
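The idea behind the last few paragraphs can be sketched in a few lines. This is only a rough illustration, not the class we build in this article: the function name cachedFetch and the parameters $cacheFile, $maxAgeSeconds and $producer are made up for this example.

```php
<?php
// Hypothetical helper (not part of the article's class): return the cached
// copy while it is fresh, otherwise rebuild it with $producer and store it.
function cachedFetch(string $cacheFile, int $maxAgeSeconds, callable $producer): string
{
    clearstatcache(true, $cacheFile);

    // A cache file younger than $maxAgeSeconds is served as-is.
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAgeSeconds) {
        return (string) file_get_contents($cacheFile);
    }

    // Otherwise do the slow work once and remember the result.
    $data = (string) $producer();
    file_put_contents($cacheFile, $data, LOCK_EX);

    return $data;
}
```

The slow part (a heavy calculation or a call to another server) goes into $producer, and the maximum age is where you answer the question of how often the data really changes.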
Getting the data to work with
First things first: create a folder using any FTP application. I put a folder called “cache” in the root of my script folder. Make sure this folder is writable and readable for PHP. Most people just CHMOD the folder to 777, although more restrictive permissions such as 755 are safer when PHP runs as the owner of the folder.
We’ll be writing our cached data in this folder. Also create the file you’ll be caching to; I used an empty file called “lastfm.xml”. Be sure to make it readable and writable.
Next you want to go and grab any service you’d like: an XML provider, an RSS feed or even your own script. It doesn’t matter what you are caching. In this tutorial I am going to work with an XML providing service.
Next up is to create a new PHP class which is going to handle our caching and the gathering of data. I created a file called “Caching.php” for the class to go in.
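If you would rather let PHP create the folder and the empty file, something like the following might do. It is a sketch under assumptions: ensureCacheFile is a made-up name, and the 0755 mode only works when PHP runs as the owner of the folder. Note that a freshly created file has a modification time of "now", which the week check later in this article would treat as fresh, so this sketch backdates the empty file.

```php
<?php
// Hypothetical helper: make sure the cache directory and cache file exist.
function ensureCacheFile(string $cacheFile): bool
{
    $dir = dirname($cacheFile);

    // 0755 is an assumption; it is enough when PHP runs as the directory owner.
    if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir)) {
        return false;
    }

    // Create an empty file on first use. Its modification time is set to 0
    // (1 January 1970) so the cache counts as expired right away.
    if (!is_file($cacheFile) && !touch($cacheFile, 0)) {
        return false;
    }

    clearstatcache(true, $cacheFile);

    return is_writable($cacheFile);
}
```

You could call ensureCacheFile("cache/lastfm.xml") once before creating the caching object.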
Take a look at this skeleton:
<?php
/*
* Caching A small PHP class to get data from Last.fm and cache it
*/
class Caching {
var $filePath = "";
var $apiURI = "";
function __construct() {
}
function checkForRenewal() {
}
function getExternalInfo() {
}
function stripAndSaveFile() {
}
}
?>
This piece of code will serve as the foundation of our class which will cache our information. We’ll be filling up each function step by step.
First we need a constructor which will accept the path to the file and a URI to the API. In this case, the XML data can be requested simply by calling a URL.
The constructor will be executed the moment you instantiate the class.
A few things we’ll be doing in this function are:
- Check the input variables.
- Check if the local file needs to be updated.
- Get new data if refresh is needed and save it.
I’ve set up the following piece of code for the constructor:
function __construct($filePath, $apiURI) {
//check if the file path and api URI are specified, if not: break out of construct.
if (strlen($filePath) > 0 && strlen($apiURI) > 0) {
//set the local file path and api path
$this->filePath = $filePath;
$this->apiURI = $apiURI;
//does the file need to be updated?
if ($this->checkForRenewal()) {
//get the data you need
$xml = $this->getExternalInfo();
//save the data to your file, but only if the request succeeded
if ($xml !== false) {
$this->stripAndSaveFile($xml);
}
} else {
//no need to update the file
return true;
}
} else {
echo "No file path and / or api URI specified.";
return false;
}
}
The process of this function is quite self-explanatory. So now we need to fill in all the functions that are called within the constructor.
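One thing to be aware of: PHP ignores whatever a constructor returns when it is called through new, so the return true and return false above never reach the calling script. If you want the caller to notice bad input, throwing an exception is one option. The class name CachingInputCheck below is made up for this sketch and only shows the input check, not the caching itself.

```php
<?php
// Hypothetical variant of the constructor's input check.
class CachingInputCheck
{
    public $filePath = "";
    public $apiURI = "";

    public function __construct($filePath, $apiURI)
    {
        // Refuse to build the object when one of the two inputs is empty.
        if (strlen($filePath) === 0 || strlen($apiURI) === 0) {
            throw new InvalidArgumentException("No file path and / or api URI specified.");
        }

        $this->filePath = $filePath;
        $this->apiURI = $apiURI;
    }
}

// The calling script decides what to do with the error.
try {
    new CachingInputCheck("", "");
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```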
Making the functions work
First up is the time check called “checkForRenewal”. In this function we check the file that was set in the constructor and see whether it was last modified more than a week ago. To accomplish this, we compare the times using time() and filemtime(). Both functions return a Unix timestamp in seconds, so we can simply add any number of seconds to the file time to check whether the file is a week old.
Use the following code for the “checkForRenewal” function:
function checkForRenewal() {
//set the caching time (in seconds)
$cachetime = (60 * 60 * 24 * 7); //one week
//get the file time
$filetimemod = filemtime($this->filePath) + $cachetime;
//if the renewal date is smaller than now, return true; else false (no need for update)
if ($filetimemod < time()) {
return true;
} else {
return false;
}
}
What this function does is get the time at which the file was last modified, add the set number of seconds to it, and see whether that time is earlier than now. The cache time is defined in seconds, so 60 seconds * 60 minutes * 24 hours * 7 days makes a week’s worth of seconds. You can adjust this to any time you want. For debugging I suggest setting the time to only 60 seconds.
The next function gets the external data from the URI we specified in the constructor. I’ve written a post about how to read XML in PHP before, so I’ll go over this function very briefly.
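Two things the function above does not handle are a cache file that is missing or still empty, and a cache time that differs per call. A possible standalone variant is sketched below. The name needsRenewal and the parameters $cacheTime and $now are assumptions for this example and are not part of the class in this article.

```php
<?php
// Hypothetical standalone version of checkForRenewal(). $cacheTime and $now
// are extra parameters that the article's method does not have.
function needsRenewal(string $filePath, int $cacheTime = 604800, ?int $now = null): bool
{
    // 604800 seconds = 60 * 60 * 24 * 7, one week.
    $now = $now ?? time();
    clearstatcache(true, $filePath);

    // A missing or empty cache file always needs to be filled.
    if (!is_file($filePath) || filesize($filePath) === 0) {
        return true;
    }

    // Renew when "last modified + cache time" lies in the past.
    return (filemtime($filePath) + $cacheTime) < $now;
}
```

Passing $now in explicitly is mainly handy for testing: you can pretend it is next week without waiting for it.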
Reading the XML and saving it
In getExternalInfo() we are going to make the API call and just return the result as a SimpleXML object.
Take a look at the following code:
function getExternalInfo() {
if ($xml = @simplexml_load_file($this->apiURI)) {
return $xml;
} else {
return false;
}
}
This must be one of the shortest PHP functions I have ever written. It is all you need to do in order to get the XML you want (if the service provides XML through simple URI calls).
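The @ in front of simplexml_load_file() hides every warning, so when something goes wrong you only get false back and no reason. If you fetch the response body yourself (for example with file_get_contents() or cURL), you could parse it with simplexml_load_string() and collect the libxml errors instead. The function name parseXmlOrFail is made up for this sketch.

```php
<?php
// Hypothetical helper: parse an XML string and report why parsing failed.
function parseXmlOrFail(string $body): SimpleXMLElement
{
    // Collect libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($body);
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        $first = $errors ? trim($errors[0]->message) : 'unknown error';
        throw new RuntimeException('Could not parse XML: ' . $first);
    }

    return $xml;
}
```

This keeps the function short while still telling you whether the service sent back broken XML or nothing at all.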
Next up is going through the XML and only keeping parts you want to save in your XML file.
Again, I am going to skip through some parts of the next function, because you can read more about reading XML in an older article.
Looking at the XML returned by Last.fm I saw the first problem: unnecessary information. The name of the artist and play count are the only things I am interested in. Mbid? Don’t need that. Url? Not this time!
So the next function, “stripAndSaveFile”, will read the XML, create our own XML and save it to a file. Sounds simple enough, right?
Take a look at what I wrote:
function stripAndSaveFile($xml) {
//put the artists in an array
$artists = $xml->weeklyartistchart->artist;
//building the xml object for SimpleXML
$output = new SimpleXMLElement("<artists></artists>");
//get only the top 10 (or fewer, if the chart is shorter)
for ($i = 0; $i < 10 && isset($artists[$i]); $i++) {
//create a new artist
$insert = $output->addChild("artist");
//insert name and playcount childs to the artist
$insert->addChild("name", $artists[$i]->name);
$insert->addChild("playcount", $artists[$i]->playcount);
}
//save the xml in the cache
file_put_contents($this->filePath, $output->asXML());
}
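First thing it does is put the <artist> nodes of the response in $artists.
new SimpleXMLElement("<artists></artists>") then creates a new, empty XML document with <artists> as its root node.
In the loop that comes next we go through the first ten entries and create a new artist node on every pass.
$insert is a new node that is inserted into the <artists> root. The name and playcount of the current artist are then added to it as children.
Pfew! Now it’s time to save the file on your file system, which is what the last line in this function does. It exports the XML as a string and saves it in the file.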
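One detail worth knowing: addChild() does not escape an ampersand in the value you pass in, so an artist name containing "&" can trigger a warning and end up cut off. Assigning the value as a property does escape it. The variant below is a sketch, not a replacement for the function above: buildArtistXml and $limit are names I made up, and the sample structure it expects is simply the one the code in this article reads.

```php
<?php
// Hypothetical variant of stripAndSaveFile() that returns the XML string
// instead of writing it to the cache file.
function buildArtistXml(SimpleXMLElement $xml, int $limit = 10): string
{
    $output = new SimpleXMLElement("<artists></artists>");
    $count = 0;

    // foreach stops by itself when the chart holds fewer than $limit artists.
    foreach ($xml->weeklyartistchart->artist as $artist) {
        if ($count++ >= $limit) {
            break;
        }

        $insert = $output->addChild("artist");
        // Property assignment escapes characters such as "&" in the value.
        $insert->name = (string) $artist->name;
        $insert->playcount = (string) $artist->playcount;
    }

    return (string) $output->asXML();
}
```

Saving then comes down to file_put_contents($this->filePath, buildArtistXml($xml)), just like the last line of the original function.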
Making the call!
Up until now we haven’t used the class once. So create a new script and put the following code in it:
<?php
ini_set('display_errors', 1);
error_reporting(E_ALL);
include('Caching.php');
$caching = new Caching("cache/lastfm.xml",
"http://ws.audioscrobbler.com/2.0/?method=user.getweeklyartistchart&user=xgayax&api_key=b25b959554ed76058ac220b7b2e0a026");
?>
I’ve put the first two lines in to ensure errors will be shown. If you have that enabled by default, feel free to remove these lines.
Next we include the class file and make a new instance of the class, giving it the two required parameters: the path to the cache XML and the URL to the Last.fm API.
You won’t see anything if everything went well. Now check your cache folder to see if your XML has been saved.
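The script above only fills the cache. To actually use the cached data on a page you still have to read the file back. A minimal sketch could look like this, assuming the <artists> structure written by stripAndSaveFile. The function name readCachedArtists is hypothetical.

```php
<?php
// Hypothetical helper: read the cached file back as an array of
// artist name => play count.
function readCachedArtists(string $cacheFile): array
{
    $artists = [];
    clearstatcache(true, $cacheFile);

    // Nothing to read when the cache file is missing or still empty.
    if (!is_file($cacheFile) || filesize($cacheFile) === 0) {
        return $artists;
    }

    $xml = simplexml_load_file($cacheFile);
    if ($xml === false) {
        return $artists; // the file does not contain valid XML
    }

    foreach ($xml->artist as $artist) {
        $artists[(string) $artist->name] = (int) $artist->playcount;
    }

    return $artists;
}
```

On a page you could then loop over readCachedArtists("cache/lastfm.xml") and print each name with htmlspecialchars() next to its play count.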
Congratulations! You just cached some information.
The source
Missed anything? Here is the source in its full glory!
The Caching class:
<?php
/*
* Caching A small PHP class to get data from Last.fm and cache it
* Author: Gaya Kessler
* URL: http://www.gayadesign.com/
*/
class Caching {
var $filePath = "";
var $apiURI = "";
function __construct($filePath, $apiURI) {
//check if the file path and api URI are specified, if not: break out of construct.
if (strlen($filePath) > 0 && strlen($apiURI) > 0) {
//set the local file path and api path
$this->filePath = $filePath;
$this->apiURI = $apiURI;
//does the file need to be updated?
if ($this->checkForRenewal()) {
//get the data you need
$xml = $this->getExternalInfo();
//save the data to your file, but only if the request succeeded
if ($xml !== false) {
$this->stripAndSaveFile($xml);
}
return true;
} else {
//no need to update the file
return true;
}
} else {
echo "No file path and / or api URI specified.";
return false;
}
}
function checkForRenewal() {
//set the caching time (in seconds)
$cachetime = (60 * 60 * 24 * 7); //one week
//get the file time
$filetimemod = filemtime($this->filePath) + $cachetime;
//if the renewal date is smaller than now, return true; else false (no need for update)
if ($filetimemod < time()) {
return true;
} else {
return false;
}
}
function getExternalInfo() {
if ($xml = @simplexml_load_file($this->apiURI)) {
return $xml;
} else {
return false;
}
}
function stripAndSaveFile($xml) {
//put the artists in an array
$artists = $xml->weeklyartistchart->artist;
//building the xml object for SimpleXML
$output = new SimpleXMLElement("<artists></artists>");
//get only the top 10 (or fewer, if the chart is shorter)
for ($i = 0; $i < 10 && isset($artists[$i]); $i++) {
//create a new artist
$insert = $output->addChild("artist");
//insert name and playcount childs to the artist
$insert->addChild("name", $artists[$i]->name);
$insert->addChild("playcount", $artists[$i]->playcount);
}
//save the xml in the cache
file_put_contents($this->filePath, $output->asXML());
}
}
?>
Calling the script:
<?php
ini_set('display_errors', 1);
error_reporting(E_ALL);
include('Caching.php');
$caching = new Caching($_SERVER['DOCUMENT_ROOT']."/scripts/caching/cache/lastfm.xml",
"http://ws.audioscrobbler.com/2.0/?method=user.getweeklyartistchart&user=xgayax&api_key=b25b959554ed76058ac220b7b2e0a026");
?>
24 Comments
-
Whoah, that’s one detailed tutorial.
One question: Why aren’t you using cURL for this? It’s way easier, multi-threaded/faster and optimized to do this kind of stuff..
-
Hi Marco, thanks for the comment.
I bet simplexml_load_file uses cURL, I am not sure. This way the URL is converted into XML nodes and not just the string :)
It’s to keep things easy I guess haha.
-
No, simplexml_load_file sadly doesn’t use cURL.
You should look into cURL and the retrieved data can be parsed with simplexml_load_file ;) . It’ll improve the performance drastically!
One downside: cURL isn’t enabled on every server :) .
Anyway, once again thanks for sharing this useful information. Keep up the great work mate!
-
Ah I see. I’ve used cURL before, didn’t know it was so much faster haha!
This version does make it a hell of a lot easier :P
Thanks for commenting Marco! You also keep up the goods!
-
You can save a lot of typing with some tricks!
strlen($filePath) > 0 && strlen($apiURI) > 0 => Simply use strlen($filePath) && strlen($apiURI)
Or
if ($filetimemod < time()) { return true; } else { return false; } => return $filetimemod < time();
Anyway, very detailed tutorial!
And you should add an extra parameter to define the “cache” time in method checkForRenewal()!
-
Hi Vincent!
Thanks for your two cents.
Yeah, these things can improve the length of my code, but for me it doesn’t really improve the readability.
I was thinking about making the cache time a parameter for the constructor. But then again… it would fit there since the class specifically caches the Last.fm.
Anyway: Thanks for this useful comment!
-
I agree with the above poster about cURL. It does reduce load time and reduces system resources.
This is great though Gaya, thanks for sharing!
-
Thanks for the comment Evan :)
I surely will try to compare this with cURL next time :)
-
New article!!!
-
Patience Sieb :) Patience
-
Thanks for the great article. I Dugg it. You can too. http://digg.com/d1t4kO
-
Thanks Dennis! And thanks for the digg :D
-
thanks for sharing!
-
Nice tutorial. I did something similar recently: since I have integrated Google Analytics with my CMS, I have made sure to cache the results in the DB instead of the file system, and only update them if they are older than a day. So my application will sort of run a pseudo cron job every day to refresh the data and then just get what’s in the DB’s cache.
I actually store all the feed data as an array and store the serialized version of that array in the DB together with the date it was fetched.
Although I need to improve it a bit, since when you load the page and it needs to refresh the data from Google Analytics it takes 5-6 seconds and it’s annoying. I should trigger the data refresh via Ajax so the application won’t have a delay at all.
-
For the sake of re-usability I’d go with a generic caching class which you use for everything you need to cache. You could even make it abstract so you can make specialized caching mechanisms.
-
Nice to hear from you Rein! Thanks for the comment. You are totally right!
This is a small example on how to cache Last.fm. But yes, make a basic class and then XML specific would be MUCH better indeed.
There is always room for expansion.
-
Hi. Firstly I would like to thank you for putting up a great in-depth tutorial. However I am having a problem adapting your script to my needs:
In stripAndSaveFile function, how do I get a few data from an XML file? In my case, they are not in array unlike in your last.fm example but rather in separate unique tags. I’m sorry if I used the terms incorrectly as I am rather new in both programming & PHP.
Thanks in advance.
-
Absolutely fantastic effort, thanks for sharing! I have one issue with it; if the cache doesn’t already exist it isn’t created. I’m going to try and figure it out anyway but it’d be good to have that included.
For example, I’m using it for a Wordpress plugin that could end up generating hundreds of files in the cache (I also need to build a ‘clean up’ as well!) which obviously can’t be created manually!
-
Thanks for the comment Philip. You can certainly build something to create the cache file. You could do a file_exists before trying to write and create a file if there isn’t any.
Keep in mind that in PHP “safe mode” you can’t adjust the rights to a file, but there is always a solution right? ;)
Let me know if you get that WP plugin done, I’d like to check that out :)
-
Cheers Gaya, I’ll definitely shout when it’s done – it’ll be epic. I did wrap lines 20-33 in the following;
if ($this->checkForRenewal()) {
…
} else { // cache doesn’t exist
//get the data you need
$xml = $this->getExternalInfo();
//save the data to your file
$this->stripAndSaveFile($xml);
}
Thing is, file_put_contents() isn’t doing its job properly now! I get a “No such file or directory” error even though that function should create the file if it doesn’t exist…and the permissions are all fine. Ho-hum!
-
Thanks heaps for this, still trying to get my head around PHP after years of ASP/SQL and 5 years of no development. Got it working pretty quickly, I did extend the class to take the cache time as an input variable as I needed to call it multiple times for different purposes (and different caching requirements).
-
Hi Gaya,
Great tutorial but from the outset I’ve been getting this:
Warning: filemtime() [function.filemtime]: stat failed for /home/fhlinux007/t/thomcurtis.co.uk/user/htdocs/scripts/caching/cache/lastfm.xml in /home/fhlinux007/t/thomcurtis.co.uk/user/htdocs/facebook/Caching.php on line 46
The code is identical and my permissions are set correctly. I don’t know what to do!
-
Ignore the last comment, as soon as I posted it I saw what was up, but now my main page that calls Caching.php, loads blank.
-
…and the XML hasn’t changed.

