Saturday, August 1, 2009

Smaller History.db (r73)

There's been a longstanding issue with the "archive and delete history" functionality. While it does (if configured to do so) save aside a text-dump copy of the history as it should, the history.db file never actually shrinks in size! In theory it shouldn't necessarily grow too much either as it should be making use of all that pre-allocated space, it still bothers some people (including me).

With the r73 commit, this problem has been fixed. As part of its "delete" processing, PeerBlock now compacts the database file, removing all that pre-allocated space.


Technical Details

Turns out that sqlite3 - which we're using to handle our cache.db and history.db information - has a "quirk" in it such that when you delete rows from database tables it doesn't actually remove the memory/filespace allocated to them. It adds them to a free-list, but still keeps them lying around.

For more details on this, see section 9.1 of the SQLite Optimization FAQ.

This is probably efficient for certain operations, but it does leave this file unnecessarily large sometimes. For example if you're logging a whole lot of traffic on one day, but then don't do that again for a month or so (or ever!), the database file will forevermore be big enough to store that one crazy day's worth of traffic. Seems like a waste to me.

What you need to do in order to actually shrink the database is perform a "vacuum" command. This will release all those empty data structures on the free-list, so that the file will shrink a bit.

Now looking at the code for r73, you'll see that I'm performing the delete, then a commit, then the vacuum. I'm not entirely sure how the sqlite3x wrapper we use works (I looked into it a bit, but since I was able to get my code working didn't bother spending the time to fully grok it), but it appears as though once a commit is made you can't make any more commits. Trying to "stack" the delete and vacuum commands into one commit threw an exception. So this is the only way I could find to get it to work. I guess vacuum doesn't require a commit?

No comments:

Post a Comment