PostgreSQL 8.3beta1 Documentation
Chapter 12. Full Text Search
The ts_debug function allows easy testing of a full text search
configuration.
ts_debug([config_name], document TEXT) returns SETOF ts_debug
ts_debug displays information about every token of
document as produced by the
parser and processed by the configured dictionaries using the configuration
specified by config_name.
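As a minimal sketch, the call below runs ts_debug against the built-in configuration. It assumes pg_catalog.english is installed and reuses the sample sentence from the demonstration in this section:

```sql
-- Show every token of the document together with the dictionaries
-- consulted for it and the resulting lexemes.
-- Assumption: the built-in pg_catalog.english configuration exists.
SELECT *
FROM ts_debug('pg_catalog.english', 'The Brightest supernovaes');
```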
ts_debug's result type is defined as:
CREATE TYPE ts_debug AS (
    "Alias" text,
    "Description" text,
    "Token" text,
    "Dictionaries" regdictionary[],
    "Lexized token" text
);
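Because the column names are mixed case, and one of them contains a space, they have to be double-quoted whenever they are referenced. The following sketch filters on one of them. It assumes the built-in pg_catalog.english configuration and that whitespace tokens carry the alias blank, as in the sample output in this section:

```sql
-- Show only non-whitespace tokens and the dictionaries consulted for each.
-- Assumption: whitespace tokens are reported with the alias 'blank'.
SELECT "Token", "Dictionaries"
FROM ts_debug('pg_catalog.english', 'The Brightest supernovaes')
WHERE "Alias" <> 'blank';
```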
To demonstrate how ts_debug works, we first create a public.english
configuration and an ispell dictionary for the English language. You can
skip this setup step and experiment with the standard english
configuration instead.
CREATE TEXT SEARCH CONFIGURATION public.english ( COPY = pg_catalog.english );
CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE = ispell,
    DictFile = english,
    AffFile = english,
    StopWords = english
);
ALTER TEXT SEARCH CONFIGURATION public.english
    ALTER MAPPING FOR lword WITH english_ispell, english_stem;
SELECT * FROM ts_debug('public.english','The Brightest supernovaes');
 Alias |  Description  |    Token    |                  Dictionaries                   |            Lexized token
-------+---------------+-------------+-------------------------------------------------+--------------------------------------
 lword | Latin word    | The         | {public.english_ispell,pg_catalog.english_stem} | public.english_ispell: {}
 blank | Space symbols |             |                                                 |
 lword | Latin word    | Brightest   | {public.english_ispell,pg_catalog.english_stem} | public.english_ispell: {bright}
 blank | Space symbols |             |                                                 |
 lword | Latin word    | supernovaes | {public.english_ispell,pg_catalog.english_stem} | pg_catalog.english_stem: {supernova}
(5 rows)
In this example, the word Brightest was recognized by the parser as a Latin word (alias lword) and passed through the dictionaries public.english_ispell and pg_catalog.english_stem. It was recognized by public.english_ispell, which reduced it to bright. The word supernovaes is unknown to the public.english_ispell dictionary, so it was passed to the next dictionary, which recognized it. (In fact, pg_catalog.english_stem is a stemming dictionary and recognizes everything; that is why it is placed at the end of the dictionary stack.)
The word The was recognized by the public.english_ispell dictionary as a stop word (Section 12.4.1), which is why its lexized token is the empty set {}, and it will not be indexed.
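The stop-word case can be picked out of the ts_debug output directly. The query below is only a sketch: it assumes the public.english configuration created above, and that "Lexized token" always has the textual form dictionary: {lexemes} seen in the sample output, so that a stop word ends in ": {}".

```sql
-- List tokens that a dictionary recognized as stop words.
-- Assumption: "Lexized token" looks like 'dictionary: {lexemes}',
-- so an empty lexeme set is rendered as ': {}'.
SELECT "Token", "Lexized token"
FROM ts_debug('public.english', 'The Brightest supernovaes')
WHERE "Lexized token" LIKE '%: {}';
```

Judging by the sample output above, only the token The should be returned.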
You can always explicitly specify which columns you want to see:
SELECT "Alias", "Token", "Lexized token"
FROM ts_debug('public.english','The Brightest supernovaes');
 Alias |    Token    |            Lexized token
-------+-------------+--------------------------------------
 lword | The         | public.english_ispell: {}
 blank |             |
 lword | Brightest   | public.english_ispell: {bright}
 blank |             |
 lword | supernovaes | pg_catalog.english_stem: {supernova}
(5 rows)
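Since ts_debug returns an ordinary set of rows, its output can also be aggregated. As a small sketch, again assuming the public.english configuration created above, the following query counts tokens per token type:

```sql
-- Count how many tokens of each type the parser produced.
-- "tokens" is just an illustrative output column name.
SELECT "Alias", count(*) AS tokens
FROM ts_debug('public.english', 'The Brightest supernovaes')
GROUP BY "Alias"
ORDER BY "Alias";
```

For the five rows shown above, this should report two blank tokens and three lword tokens.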