[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
MATCH (col1,col2,...) AGAINST (expr [IN BOOLEAN MODE | WITH QUERY EXPANSION] ) |
As of Version 3.23.23, MySQL has support for full-text indexing
and searching. Full-text indexes in MySQL are an index of type
FULLTEXT
. FULLTEXT
indexes are used with MyISAM
tables
only and can be created from CHAR
, VARCHAR
,
or TEXT
columns at CREATE TABLE
time or added later with
ALTER TABLE
or CREATE INDEX
. For large datasets, it will be
much faster to load your data into a table that has no FULLTEXT
index, then create the index with ALTER TABLE
(or
CREATE INDEX
). Loading data into a table that already has a
FULLTEXT
index could be significantly slower.
Full-text searching is performed with the MATCH()
function.
mysql> CREATE TABLE articles ( -> id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY, -> title VARCHAR(200), -> body TEXT, -> FULLTEXT (title,body) -> ); Query OK, 0 rows affected (0.00 sec) mysql> INSERT INTO articles VALUES -> (NULL,'MySQL Tutorial', 'DBMS stands for DataBase ...'), -> (NULL,'How To Use MySQL Efficiently', 'After you went through a ...'), -> (NULL,'Optimizing MySQL','In this tutorial we will show ...'), -> (NULL,'1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'), -> (NULL,'MySQL vs. YourSQL', 'In the following database comparison ...'), -> (NULL,'MySQL Security', 'When configured properly, MySQL ...'); Query OK, 6 rows affected (0.00 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> SELECT * FROM articles -> WHERE MATCH (title,body) AGAINST ('database'); +----+-------------------+------------------------------------------+ | id | title | body | +----+-------------------+------------------------------------------+ | 5 | MySQL vs. YourSQL | In the following database comparison ... | | 1 | MySQL Tutorial | DBMS stands for DataBase ... | +----+-------------------+------------------------------------------+ 2 rows in set (0.00 sec) |
The MATCH()
function performs a natural language search for a string
against a text collection (a set of one or more columns included in
a FULLTEXT
index). The search string is given as the argument to
AGAINST()
. The search is performed in case-insensitive fashion.
For every row in the table, MATCH()
returns a relevance value,
that is, a similarity measure between the search string and the text in
that row in the columns named in the MATCH()
list.
When MATCH()
is used in a WHERE
clause (see example above)
the rows returned are automatically sorted with highest relevance first.
Relevance values are non-negative floating-point numbers. Zero relevance
means no similarity. Relevance is computed based on the number of words
in the row, the number of unique words in that row, the total number of
words in the collection, and the number of documents (rows) that contain
a particular word.
It is also possible to perform a boolean mode search. This is explained later in the section.
The preceding example is a basic illustration showing how to use the
MATCH()
function. Rows are returned in order of decreasing
relevance.
The next example shows how to retrieve the relevance values explicitly.
As neither WHERE
nor ORDER BY
clauses are present, returned
rows are not ordered.
mysql> SELECT id,MATCH (title,body) AGAINST ('Tutorial') FROM articles; +----+-----------------------------------------+ | id | MATCH (title,body) AGAINST ('Tutorial') | +----+-----------------------------------------+ | 1 | 0.64840710366884 | | 2 | 0 | | 3 | 0.66266459031789 | | 4 | 0 | | 5 | 0 | | 6 | 0 | +----+-----------------------------------------+ 6 rows in set (0.00 sec) |
The following example is more complex. The query returns the relevance
and still sorts the rows in order of decreasing relevance. To achieve
this result, you should specify MATCH()
twice. This will cause no
additional overhead, because the MySQL optimizer will notice that the
two MATCH()
calls are identical and invoke the full-text search
code only once.
mysql> SELECT id, body, MATCH (title,body) AGAINST -> ('Security implications of running MySQL as root') AS score -> FROM articles WHERE MATCH (title,body) AGAINST -> ('Security implications of running MySQL as root'); +----+-------------------------------------+-----------------+ | id | body | score | +----+-------------------------------------+-----------------+ | 4 | 1. Never run mysqld as root. 2. ... | 1.5055546709332 | | 6 | When configured properly, MySQL ... | 1.31140957288 | +----+-------------------------------------+-----------------+ 2 rows in set (0.00 sec) |
As of Version 4.1.1, full-text search supports query expansion (in particular, its variant "blind query expansion"). It is generally useful when a search phrase is too short, which often means that the user is relying on implied knowledge that the full-text search engine usually lacks. For example, a user searching for "database" may really mean that "MySQL", "Oracle", "DB2", and "RDBMS" all are phrases that should match "databases" and should be returned, too. This is implied knowledge. Blind query expansion (also known as automatic relevance feedback) works by performing the search twice, where the search phrase for the second search is the original search phrase concatenated with the few top found documents from the first search. Thus, if one of these documents contained the word "databases" and the word "MySQL", then the second search will find the documents that contain the word "MySQL" but not "database". Another example could be searching for books by Georges Simenon about Maigret, when a user is not sure how to spell "Maigret". Then, searching for "Megre and the reluctant witnesses" will find only "Maigret and the Reluctant Witnesses" without query expansion, but all books with the word "Maigret" on the second pass of a search with query expansion. Note: Because blind query expansion tends to increase noise significantly, by returning non-relevant documents, it's only meaningful to use when a search phrase is rather short.
MySQL uses a very simple parser to split text into words. A "word" is any sequence of characters consisting of letters, digits, `'', and `_'. Any "word" that is present in the stopword list or is just too short is ignored. The default minimum length of words that will be found by full-text searches is four characters. This can be changed as described in 13.7.2 Fine-tuning MySQL Full-text Search.
Every correct word in the collection and in the query is weighted according to its significance in the query or collection. This way, a word that is present in many documents will have lower weight (and may even have a zero weight), because it has lower semantic value in this particular collection. Otherwise, if the word is rare, it will receive a higher weight. The weights of the words are then combined to compute the relevance of the row.
Such a technique works best with large collections (in fact, it was carefully tuned this way). For very small tables, word distribution does not reflect adequately their semantic value, and this model may sometimes produce bizarre results.
mysql> SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('MySQL'); Empty set (0.00 sec) |
The search for the word MySQL
produces no results in the above
example, because that word is present in more than half the rows. As such,
it is effectively treated as a stopword (that is, a word with zero semantic
value). This is the most desirable behavior -- a natural language query
should not return every second row from a 1 GB table.
A word that matches half of rows in a table is less likely to locate relevant documents. In fact, it will most likely find plenty of irrelevant documents. We all know this happens far too often when we are trying to find something on the Internet with a search engine. It is with this reasoning that such rows have been assigned a low semantic value in this particular dataset.
As of Version 4.0.1, MySQL can also perform boolean full-text searches using
the IN BOOLEAN MODE
modifier.
mysql> SELECT * FROM articles WHERE MATCH (title,body) -> AGAINST ('+MySQL -YourSQL' IN BOOLEAN MODE); +----+------------------------------+-------------------------------------+ | id | title | body | +----+------------------------------+-------------------------------------+ | 1 | MySQL Tutorial | DBMS stands for DataBase ... | | 2 | How To Use MySQL Efficiently | After you went through a ... | | 3 | Optimizing MySQL | In this tutorial we will show ... | | 4 | 1001 MySQL Tricks | 1. Never run mysqld as root. 2. ... | | 6 | MySQL Security | When configured properly, MySQL ... | +----+------------------------------+-------------------------------------+ |
This query retrieved all the rows that contain the word MySQL
(note: the 50% threshold is not used), but that do not contain
the word YourSQL
. Note that a boolean mode search does not
automatically sort rows in order of decreasing relevance. You can
see this from result of the preceding query, where the row with the
highest relevance (the one that contains MySQL
twice) is listed
last, not first. A boolean full-text search can also work even without
a FULLTEXT
index, although it would be slow.
The boolean full-text search capability supports the following operators:
+
-
MATCH() ... AGAINST()
without the IN BOOLEAN
MODE
modifier.
< >
<
operator
decreases the contribution and the >
operator increases it.
See the example below.
( )
~
-
operator.
*
"
"
, matches only
rows that contain this phrase literally, as it was typed.
And here are some examples:
apple banana
+apple +juice
+apple macintosh
+apple -macintosh
+apple +(>turnover <strudel)
apple*
"some words"
13.7.1 Full-text Restrictions | ||
13.7.2 Fine-tuning MySQL Full-text Search | ||
13.7.3 Full-text Search TODO |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |