1. This site uses cookies. By continuing to use this site, you are agreeing to our use of cookies. Learn More.

MySQL optimisation

Discussion in 'HTML, Graphics & Programming' started by toastyman, 1 Jun 2010.

  1. toastyman

    Gangster

    Joined: 30 Dec 2005

    Posts: 421

    Hi guys,

    I'm developing a high performance, heavy traffic web application and I want to make sure that I get the database structure perfect before I begin. The database could contain several million records and be queried thousands of times each second so optimisation and speed is essential.

    Basically the user can provide between one and five words. The idea is that it searches the database to find words that relate to the ones entered.

    So for example, if the user enters 'London pizza' it would search for records that matched 'london' and 'pizza'. It doesn't need to look within sentences and the words will match exactly in alphabetical order.

    I've created 3 possible database structures to cope with this - could anyone advise as to the best one?



    SCENARIO 1 - using 2 tables, 1 inner join - I read somewhere that you should try to avoid joins in tables containing large numbers of records.. although this would minimise disk usage would it be as fast as the other scenarios?
    Code:
    'lookup' Table
    ID | word1 | word 2 | word 3 | word 4 | word 5
    ---------------------------
    1 | london | pizza
    2 | london | pasta
    3 | london | pizza | takeaway
    4 | london | pizza | restaurant
    
    
    'return' Table
    ID | lookup_ID  | property
    1 | 1 | fastfood
    2 | 2 | italian
    3 | 1 | takeaway
    4 | 3 | bla
    4 | 4 | blo
    
    
    SELECT `return`.`property` FROM `return` INNER JOIN `lookup` ON `return`.`lookup_ID`=`lookup`.`id` WHERE `lookup`.`word1`='london' AND `lookup`.`word2`='pizza'
    
    property
    ----------
    fastfood
    takeaway
    



    SCENARIO 2 - using 2 tables, 2 queries - similar to scenario one but instead of using an inner join it uses 2 simple select queries. Would this be faster / more efficient?
    Code:
    'lookup' Table
    ID | word1 | word 2 | word 3 | word 4 | word 5
    ---------------------------
    1 | london | pizza
    2 | london | pasta
    3 | london | pizza | takeaway
    4 | london | pizza | restaurant
    
    
    'return' Table
    ID | lookup_ID  | property
    1 | 1 | fastfood
    2 | 2 | italian
    3 | 1 | takeaway
    4 | 3 | bla
    4 | 4 | blo
    
    
    SELECT `ID` FROM `lookup` WHERE `word1`='london' AND `word2`='pizza'
    
    ID
    ----------
    1
    
    
    SELECT `property` FROM `return` WHERE `lookup_ID`='1'
    
    property
    ----------
    fastfood
    takeaway
    

    SCENARIO 3 - using 1 table, 1 query - Additional disk space required but one simple select query. Would the speed of the query outweigh the cost of using one huge table?
    Code:
    'lookup' Table
    ID | word1 | word 2 | word 3 | word 4 | word 5 | property
    ---------------------------
    1 | london | pizza | | | | fastfood 
    1 | london | pizza | | | | takeaway
    2 | london | pasta | | | | italian
    3 | london | pizza | takeaway | | | bla
    4 | london | pizza | restaurant | | | blo
    
    SELECT `property` FROM `lookup` WHERE `word1`='london' AND `word2`='pizza'
    
    property
    ----------
    fastfood
    takeaway
    

    Any tips / advice on the above would be brilliant. It might also be the case that there are alternative solutions that I haven't thought of. I'll also have to decide which fields to apply indexes to etc.

    Thanks in advance!
     
  2. daz

    Capodecina

    Joined: 18 Oct 2002

    Posts: 23,970

    Location: Bucks

    Use memcached if possible.

    De-normalize your database structure (avoid joins!) - storage is cheap compared to processing time.

    Yes, it would. :)
     
  3. GravyMonster

    Capodecina

    Joined: 18 Oct 2002

    Posts: 14,306

    Location: The land of milk & beans

    #3 will be fastest of all. As for #1 and #2, if there is little data in the two tables, then #1 would be the quicker of them. As more data is entered (couple of million rows) then #2 will start to close the gap and become faster.

    With #3 Just make sure the field sizes are as small as possible so that indexing remains fast when the amount of data ramps up.
     
  4. toastyman

    Gangster

    Joined: 30 Dec 2005

    Posts: 421

    Cheers guys, thanks for clarifying that. I'll go with #3 then as disk space won't be much of an issue.

    To get into more detail, could the table structure be optimised further? Would the order of the field names make any difference (selecting fields from the end of a table, returning a field from the beginning of the table)?

    I presume that indexes should be applied to the 5 word fields, but I'm wondering whether they should be separate indexes or one big one.. a search could just be for field 1 and field 2, whereas another search could search all 5 fields. Would I use ADD INDEX(word1,word2,etc..) or ADD INDEX(word1), ADD INDEX (word2) etc..

    Just had a look at memcached.. looks very interesting, I'll definitely look at using it (presuming I can get hold of a server with plenty of RAM).

    Thanks for the info guys, much appreciated.