MySQL optimisation

toastyman · 1 Jun 2010 at 12:17

Hi guys,

I'm developing a high performance, heavy traffic web application and I want to make sure that I get the database structure perfect before I begin. The database could contain several million records and be queried thousands of times each second so optimisation and speed is essential.

Basically the user can provide between one and five words. The idea is that it searches the database to find words that relate to the ones entered.

So for example, if the user enters 'London pizza' it would search for records that matched 'london' and 'pizza'. It doesn't need to look within sentences and the words will match exactly in alphabetical order.

I've created 3 possible database structures to cope with this - could anyone advise as to the best one?

SCENARIO 1 - using 2 tables, 1 inner join - I read somewhere that you should try to avoid joins in tables containing large numbers of records.. although this would minimise disk usage would it be as fast as the other scenarios?

Code:

'lookup' Table
ID | word1 | word 2 | word 3 | word 4 | word 5
---------------------------
1 | london | pizza
2 | london | pasta
3 | london | pizza | takeaway
4 | london | pizza | restaurant


'return' Table
ID | lookup_ID  | property
1 | 1 | fastfood
2 | 2 | italian
3 | 1 | takeaway
4 | 3 | bla
4 | 4 | blo


SELECT `return`.`property` FROM `return` INNER JOIN `lookup` ON `return`.`lookup_ID`=`lookup`.`id` WHERE `lookup`.`word1`='london' AND `lookup`.`word2`='pizza'

property
----------
fastfood
takeaway

SCENARIO 2 - using 2 tables, 2 queries - similar to scenario one but instead of using an inner join it uses 2 simple select queries. Would this be faster / more efficient?

Code:

'lookup' Table
ID | word1 | word 2 | word 3 | word 4 | word 5
---------------------------
1 | london | pizza
2 | london | pasta
3 | london | pizza | takeaway
4 | london | pizza | restaurant


'return' Table
ID | lookup_ID  | property
1 | 1 | fastfood
2 | 2 | italian
3 | 1 | takeaway
4 | 3 | bla
4 | 4 | blo


SELECT `ID` FROM `lookup` WHERE `word1`='london' AND `word2`='pizza'

ID
----------
1


SELECT `property` FROM `return` WHERE `lookup_ID`='1'

property
----------
fastfood
takeaway

SCENARIO 3 - using 1 table, 1 query - Additional disk space required but one simple select query. Would the speed of the query outweigh the cost of using one huge table?

Code:

'lookup' Table
ID | word1 | word 2 | word 3 | word 4 | word 5 | property
---------------------------
1 | london | pizza | | | | fastfood 
1 | london | pizza | | | | takeaway
2 | london | pasta | | | | italian
3 | london | pizza | takeaway | | | bla
4 | london | pizza | restaurant | | | blo

SELECT `property` FROM `lookup` WHERE `word1`='london' AND `word2`='pizza'

property
----------
fastfood
takeaway

Any tips / advice on the above would be brilliant. It might also be the case that there are alternative solutions that I haven't thought of. I'll also have to decide which fields to apply indexes to etc.

Thanks in advance!

daz · 1 Jun 2010 at 12:23

Use memcached if possible.

De-normalize your database structure (avoid joins!) - storage is cheap compared to processing time.

SCENARIO 3 - using 1 table, 1 query - Additional disk space required but one simple select query. Would the speed of the query outweigh the cost of using one huge table?

Yes, it would.

GravyMonster · 1 Jun 2010 at 12:34

#3 will be fastest of all. As for #1 and #2, if there is little data in the two tables, then #1 would be the quicker of them. As more data is entered (couple of million rows) then #2 will start to close the gap and become faster.

With #3 Just make sure the field sizes are as small as possible so that indexing remains fast when the amount of data ramps up.

toastyman · 1 Jun 2010 at 12:55

Cheers guys, thanks for clarifying that. I'll go with #3 then as disk space won't be much of an issue.

To get into more detail, could the table structure be optimised further? Would the order of the field names make any difference (selecting fields from the end of a table, returning a field from the beginning of the table)?

I presume that indexes should be applied to the 5 word fields, but I'm wondering whether they should be separate indexes or one big one.. a search could just be for field 1 and field 2, whereas another search could search all 5 fields. Would I use ADD INDEX(word1,word2,etc..) or ADD INDEX(word1), ADD INDEX (word2) etc..

Just had a look at memcached.. looks very interesting, I'll definitely look at using it (presuming I can get hold of a server with plenty of RAM).

Thanks for the info guys, much appreciated.