Indexing time periods using MySQL and GNU date

2012-02-28

Periodic data in a database is something i use quite a bit. The problem is using a date time column or a date and period column for each table is quite cumbersome. To make things easier create a time dimension table:

 
CREATE TABLE time (
time_id mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
date date NOT NULL,
tp tinyint(3) unsigned NOT NULL,
year smallint(5) unsigned NOT NULL,
month tinyint(3) unsigned NOT NULL,
day_of_year smallint(5) unsigned NOT NULL,
day_of_week tinyint(3) unsigned NOT NULL,
PRIMARY KEY (time_id),
KEY date (date) USING BTREE
);
 
This indexes every date, and tp column to a time_id. In this case tp is a number from 1 to 48 for half hour periods through out the day. The rest of the columns are use to make queries using time clauses on the data easier.
 
To populate this table, use the excellent gnu date command to specify a starting date and a count of the number of days and a format string that can be used to insert into the database. Here are the commands typed into bash to create the sql file to load into the table:

 

$ daycount=$((366*17))
$ tpcount=48
$ { for ((i=0;$i<$daycount;i=$i+1)); do
for ((j=1;j<=$tpcount;j=$j+1)); do
gdate -d "1996-01-01 + $i days" +"insert into time (date, tp, year, month, day_of_year, day_of_week) values ('%Y-%m-%d', $j, %Y, %m, %j, %u);";
done;
done } > time_data.sql
 
This creates a file with insert statements for 17 years starting from 1996-01-01 for every half hour. After loading into the database, a check is to see how many periods / year there are:
mysql> select year, count(1) from time group by year order by year;
+------+----------+
| year | count(1) |
+------+----------+
| 1996 | 17568 |
| 1997 | 17520 |
| 1998 | 17520 |
| 1999 | 17520 |
| 2000 | 17568 |
| 2001 | 17520 |
| 2002 | 17520 |
| 2003 | 17520 |
| 2004 | 17568 |
| 2005 | 17520 |
| 2006 | 17520 |
| 2007 | 17520 |
| 2008 | 17568 |
| 2009 | 17520 |
| 2010 | 17520 |
| 2011 | 17520 |
| 2012 | 17568 |
| 2013 | 576 |
+------+----------+
18 rows in set (0.11 sec)

Here the leap years have 48 more periods.
Given this price, quantity test data for the first 3 months of the year 2000:

create table price (time_id mediumint not null, price decimal(8,2));
create table quantity (time_id mediumint not null, quantity int);

insert into price select time_id, cast(rand()*1000 as decimal(8,2)) from dim.time where year = 2000 and month in (1,2,3);
insert into quantity select time_id, round(rand()*1000) from dim.time where year = 2000 and month in (1,2,3);

Queries like this make it easy to get information:

mysql> select date, tp, price*quantity cost from price natural join quantity natural join dim.time order by date,tp limit 3;
+------------+----+-----------+
| date | tp | cost |
+------------+----+-----------+
| 2000-01-01 | 1 | 111661.86 |
| 2000-01-01 | 2 | 452187.36 |
| 2000-01-01 | 3 | 57724.85 |
+------------+----+-----------+
3 rows in set (1.28 sec)
or:
select date, tp, a.price, b.price, round((b.price / a.price),2) * 100 price_change from price a, price b, dim.time where b.time_id = a.time_id + 48 and b.time_id = time.time_id limit 10;
+------------+----+--------+--------+--------------+
| date | tp | price | price | price_change |
+------------+----+--------+--------+--------------+
| 2000-01-02 | 1 | 979.49 | 923.78 | 94.00 |
| 2000-01-02 | 2 | 509.22 | 10.22 | 2.00 |
| 2000-01-02 | 3 | 607.63 | 279.77 | 46.00 |
| 2000-01-02 | 4 | 510.47 | 368.20 | 72.00 |
| 2000-01-02 | 5 | 729.49 | 1.70 | 0.00 |
| 2000-01-02 | 6 | 116.03 | 903.87 | 779.00 |
| 2000-01-02 | 7 | 391.66 | 514.25 | 131.00 |
| 2000-01-02 | 8 | 610.23 | 859.65 | 141.00 |
| 2000-01-02 | 9 | 876.16 | 755.51 | 86.00 |
| 2000-01-02 | 10 | 550.10 | 198.58 | 36.00 |
+------------+----+--------+--------+--------------+
 
As well as easy to query, it can make your data tables smaller by only having a integer of the right length  to specify the time.