
v3 Knowledgebase

Reduce the size of data tables with a cron job

When you first start your website, everything seems to respond quickly. You click on the dashboard and it pops right up. Then, a few months later when your 5,000th member enrolls, you start to notice a delay between requesting a page and getting it.

Part of the problem could be the size of the data tables. In the first week of your site there may be a few thousand records in, let's say, the feed table. A few months later, there may be 1,000,000. If complex SQL queries run against these records, the number of rows processed by a single call might be many times that number.

More sophisticated server options can address these problems, but in the short term it can help to keep the number of records down to a reasonable level.

I presume you don't want to go in manually and hack out records every few days. Something like that is better suited to a cron job. Cron jobs are bits of code that run periodically, generally to perform maintenance-type tasks. Fortunately, phpfox has a good cron system that you can easily use for this.
That being said, I have to admit that phpfox does not provide a good user interface to the cron system. For that purpose, I recommend an add-on, courtesy of AE::Brian, that you can find here: http://www.phpfox.com/addons/view/cronjob-manager/

This gives you a user-friendly interface to the cron system in the AdminCP. Here you can add a cron job and set how often you wish it to run. The code that you insert here is stored in the database table 'cron'. There is also a table called 'cron_log', which keeps a record of which cron jobs have completed and when they were run. As far as I know, you can only see this table with phpMyAdmin.
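
If you would rather not open phpMyAdmin every time just to peek at that log, a throwaway admin-only script could dump it using the same database library we will use below. Treat this as a sketch only: I select every column because I have not listed the cron_log fields here, and I have not double-checked the limit() call, so try it on a development site first.

PHP:
// Quick look at recent cron_log entries without phpMyAdmin.
// Selecting '*' because the exact column names are not documented here.
$aLogRows = Phpfox::getLib('phpfox.database')->select('*')
    ->from(Phpfox::getT('cron_log'))
    ->limit(20)
    ->execute('getSlaveRows');
print_r($aLogRows);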

Typically, the code that you insert into the cron field known as php_code is a call to an existing method in some module's service class. In the case of the mail cron job, it is:
PHP:
Phpfox::getService('mail.process')->cronDeleteMessages();


This is the preferred way of doing these jobs, but it is not the only way. Since we are operating within the phpfox system, we can code just about anything we could if we were building a new mod or creating a block. This lets us expand our capabilities somewhat.

Let us take the comments table as an example. One of the members asked about trimming the size of this table. Is there a way to use cron to do this? Well, there is no function within the comment services (that I am aware of) to specifically do this task. There is a delete function, but it involves a lot of checks and restrictions that a cron job may not be able to meet. We may need to code this one from scratch.

Let's look at what we need to do first. Peeking at the phpfox database in phpMyAdmin, I see a base table called comment with some auxiliary tables called comment_hash, comment_rating, and comment_text. I am not sure what comment_hash is for; it is empty in all of the databases to which I have access. Since it does not appear to be keyed to the main table, I won't worry about it.
The other two tables have a comment_id field and consequently are keyed to the main table, so if we delete an entry in the comment table it falls on us to also delete the connected entries in those tables. Otherwise they will just hang out there forever, mocking our attempts to streamline our database.
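
If you suspect a previous cleanup has already left such leftovers behind, a quick count of comment_text rows with no matching comment row will tell you. This is just a sketch that sticks to the same query builder calls used in the cron code below; the subquery does the matching inside MySQL.

PHP:
// Count comment_text rows whose parent comment row no longer exists.
$aOrphans = Phpfox::getLib('phpfox.database')->select('COUNT(*) AS total')
    ->from(Phpfox::getT('comment_text'))
    ->where('comment_id NOT IN (SELECT comment_id FROM ' . Phpfox::getT('comment') . ')')
    ->execute('getSlaveRow');
// $aOrphans['total'] is the number of orphaned rows hanging around.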

Now we can produce some actual code. We could just call the database library's delete function to remove records whose timestamp is older than a certain date, but the trouble is that the comment_text table has no timestamp. One way to deal with this is to first load an array of the comment_id's whose timestamp is older than the cut-off, then use that array to do a batch of deletes. The code for that would look like this:

PHP:
$iAgeInDays = 200;
$iTime = (Phpfox::getTime() - ($iAgeInDays * CRON_ONE_DAY));
// Collect the IDs of comments older than the cut-off.
$aCommentsToDelete = Phpfox::getLib('phpfox.database')->select('comment_id')
    ->from(Phpfox::getT('comment'))
    ->where('time_stamp < ' . $iTime)
    ->execute('getSlaveRows');
foreach ($aCommentsToDelete as $aComment)
{
    if (isset($aComment['comment_id']))
    {
        // Remove the comment and its related rows from all three tables.
        Phpfox::getLib('phpfox.database')->delete(Phpfox::getT('comment'), 'comment_id = ' . (int) $aComment['comment_id']);
        Phpfox::getLib('phpfox.database')->delete(Phpfox::getT('comment_text'), 'comment_id = ' . (int) $aComment['comment_id']);
        Phpfox::getLib('phpfox.database')->delete(Phpfox::getT('comment_rating'), 'comment_id = ' . (int) $aComment['comment_id']);
    }
}


The $iAgeInDays = 200 tells our cron job to delete records older than 200 days. The next part of the code builds an array of comment_id's that meet the deletion criteria. Then, in the foreach loop, we delete the matching records in turn from each of the three tables.
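
One thing to keep in mind: the very first run of this job may have months of backlog to chew through. If that worries you, you could cap each run and let the cron whittle the backlog away over several passes. This is only a sketch; it assumes the query builder's limit() call accepts a plain row count, so test it on a development site first.

PHP:
// Cap each cron run at 500 old comments so the first pass after enabling
// the job does not tie up the tables for too long.
$aCommentsToDelete = Phpfox::getLib('phpfox.database')->select('comment_id')
    ->from(Phpfox::getT('comment'))
    ->where('time_stamp < ' . $iTime)
    ->limit(500)
    ->execute('getSlaveRows');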

Another way to approach this is to let MySQL do the work. This is potentially faster, but it involves a more complex query of the sort that always tends to scare the heck out of me. Anyway, setting that aside, we could build a query that would look something like this:

PHP:
$iAgeInDays = 200;
$iTime = (Phpfox::getTime() - ($iAgeInDays * CRON_ONE_DAY));
// One multi-table DELETE that removes the old comments and their related rows.
$sScaryQuery = "DELETE c.*, ct.*, cr.*
    FROM " . Phpfox::getT('comment') . " c
    INNER JOIN " . Phpfox::getT('comment_text') . " ct
        ON c.comment_id = ct.comment_id
    LEFT JOIN " . Phpfox::getT('comment_rating') . " cr
        ON c.comment_id = cr.comment_id
    WHERE c.time_stamp < " . $iTime;
Phpfox::getLib('database')->query($sScaryQuery);


I have tried both of these methods on my dev site and they both appeared to work. If you write some "scary queries" like the one above, there are things you can do to minimize your risks. If you are using phpMyAdmin to test your SQL on a test database, you can substitute SELECT for the DELETE. That lets you view the records that would be deleted when you run it with the DELETE in place. If it returns the entire table, you can shudder a couple of times and then correct your code instead of reloading your database.
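
If you prefer to do that dry run from inside phpfox rather than phpMyAdmin, the same idea works with the query builder we used earlier. This is just a sketch: count the rows that the deletion criteria would catch before you ever point a DELETE at them.

PHP:
$iAgeInDays = 200;
$iTime = (Phpfox::getTime() - ($iAgeInDays * CRON_ONE_DAY));
// Dry run: count the comments the cron job would delete, without deleting anything.
$aDoomed = Phpfox::getLib('phpfox.database')->select('COUNT(*) AS total')
    ->from(Phpfox::getT('comment'))
    ->where('time_stamp < ' . $iTime)
    ->execute('getSlaveRow');
// If $aDoomed['total'] looks like the whole table, fix the condition first.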

You can write similar crons for other tables that tend to grow to unmanageable sizes. You will have to look at each case to see which tables are involved and whether trimming them will adversely affect any of phpfox's processes. For example, the comment cron we built above could leave a feed item pointing at a comment that no longer exists; clicking it would most likely just return something like "item not found", but it is preferable not to see these errors, so the feed table should probably be trimmed at least as far back as the age you set for the comments. A sketch of that follows.
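
Here is what such a feed trim might look like. Treat it strictly as a sketch: it assumes the feed table also records its age in a time_stamp column, so confirm the column name in phpMyAdmin (and check for any related feed tables keyed to it) before you let a cron run this.

PHP:
// Trim feed entries older than the same cut-off used for comments.
// Assumes the feed table has a time_stamp column; verify before using.
$iAgeInDays = 200;
$iTime = (Phpfox::getTime() - ($iAgeInDays * CRON_ONE_DAY));
Phpfox::getLib('phpfox.database')->delete(Phpfox::getT('feed'), 'time_stamp < ' . $iTime);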

Also, please do not run these crons on your main site without completely testing them on a development site. I show these for your education, but if you want to try them, it is at your own risk.