
How to load the Greek stemmer and word breaker for SQL Server Full Text Search

Ever wondered how you can use full-text search (FTS) for Greek text in SQL Server? Out of the box, SQL Server doesn't provide a stemmer or word breaker for Greek, so FTS behaves much like a simple LIKE search. Fortunately, the same binary interfaces are used across Microsoft products, which means that you can use the stemmers and word breakers from other products to enable FTS in SQL Server, as long as you have a license for them!

As a technical exercise, you can use the Greek stemmer and word breaker from SharePoint Server, which are described in KB929912. All you have to do is add the appropriate registry entries to SQL Server. The generic process for adding new word breakers and stemmers to SQL Server is described in How To: Load Licensed Third Party Word Breakers. The steps are as follows:

  1. Copy the grclr.dll, grcste.dll and grcste.lex files from the C:\PROGRAM FILES\MICROSOFT OFFICE SERVERS\12.0\Bin\ folder to the C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Binn folder. SQL Server looks in this folder to locate stemmers and word breakers.
  2. Add the proper registry entries for Locale, WBreakerClass and StemmerClass. For convenience you can create a registry file with the proper values:
    Windows Registry Editor Version 5.00
    
    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSearch\CLSID]
    
    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSearch\CLSID\{1FB980F8-1764-4920-B8E5-89E341205B4A}]
    @="grclr.dll"
    
    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSearch\CLSID\{3E9A499D-1A5C-4ca8-B948-C5D18DC466B1}]
    @="grcste.dll"
    
    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSearch\Language]
    
    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSearch\Language\grc]
    "TsaurusFile"="tsgrc.xml"
    "Locale"=dword:00000408
    "WBreakerClass"="{1FB980F8-1764-4920-B8E5-89E341205B4A}"
    "StemmerClass"="{3E9A499D-1A5C-4ca8-B948-C5D18DC466B1}"
  3. Run exec sp_fulltext_service 'update_languages'; in SQL Server Management Studio so that SQL Server reloads its list of full-text languages and picks up Greek.
  4. Execute 
    select * from sys.fulltext_languages where lcid=1032
    to verify that Greek is actually detected.
    Note that you may have to restart the SQL Server Filter Daemon from Services before SQL Server actually starts using Greek stemming.
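
Seeing Greek listed in sys.fulltext_languages only shows that the registry entries were read. One way to check that the stemmer itself is being called is the sys.dm_fts_parser dynamic management function, which is available from SQL Server 2008. The following is a minimal sketch, assuming you are a member of the sysadmin role (the function requires it); the test word is simply the one used in the query later in this post.

```sql
-- Ask the full-text parser how it expands a Greek word for LCID 1032.
-- Arguments: query string, language LCID, stoplist id (0 = system stoplist),
-- accent sensitivity (0 = insensitive).
SELECT keyword, display_term, special_term, expansion_type
FROM sys.dm_fts_parser(N'FORMSOF(INFLECTIONAL, "Τρέχω")', 1032, 0, 0);
```

If the stemmer is loaded, the result should contain several rows for different inflected forms, with expansion_type set to 2 for inflectional expansions. A single row containing only the original word suggests that the stemmer was not picked up.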

After the four steps above, you can start using the CONTAINS and FREETEXT T-SQL predicates to search text and files in Greek. Running

select * from dbo.testwords where freetext([words],N'Τρέχω')

will return

ID, Words
--  ---------
10, Έτρεξα
11, Έτρεχα
15, Τρέχει
16, Τρέχεις
17, Τρέχω
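
The query above presumes that dbo.testwords already has a full-text index that uses the Greek language. A minimal sketch of such a setup follows. The column types, the PK_testwords key index name and the GreekCatalog catalog name are assumptions made for illustration, not details from the original test database, and only the rows shown in the result above are inserted.

```sql
-- Hypothetical test table; names and column types are assumptions.
CREATE TABLE dbo.testwords
(
    ID    int           NOT NULL CONSTRAINT PK_testwords PRIMARY KEY,
    Words nvarchar(100) NOT NULL
);

INSERT INTO dbo.testwords (ID, Words)
VALUES (10, N'Έτρεξα'), (11, N'Έτρεχα'), (15, N'Τρέχει'),
       (16, N'Τρέχεις'), (17, N'Τρέχω');

-- Hypothetical catalog to hold the index.
CREATE FULLTEXT CATALOG GreekCatalog;

-- LANGUAGE 1032 makes the indexer use the Greek word breaker and stemmer
-- for the Words column.
CREATE FULLTEXT INDEX ON dbo.testwords (Words LANGUAGE 1032)
    KEY INDEX PK_testwords
    ON GreekCatalog;

-- The same kind of search written with CONTAINS, asking explicitly
-- for inflectional forms and naming the query language.
SELECT ID, Words
FROM dbo.testwords
WHERE CONTAINS(Words, N'FORMSOF(INFLECTIONAL, "Τρέχω")', LANGUAGE 1032);
```

The index is populated in the background, so a query issued immediately after CREATE FULLTEXT INDEX may return no rows until the population has finished.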

WARNING: This article describes just a technical exercise! I do not know whether the Greek stemmer can actually be used outside a SharePoint installation. You should contact Microsoft Hellas to clarify whether and how you can use the stemmer.

And to think that when the Greek community asked on MS Connect for Greek FTS support, we were told that there were "no immediate plans to support Greek FTS"! It had already been available for over a year!

Published Monday, 25 January 2010, 6:14 PM by member Παναγιώτης Καναβός