YAZ  5.34.0
libstemmer.h File Reference


Typedefs

typedef unsigned char sb_symbol
 

Functions

const char ** sb_stemmer_list (void)
 
struct sb_stemmer * sb_stemmer_new (const char *algorithm, const char *charenc)
 
void sb_stemmer_delete (struct sb_stemmer *stemmer)
 
const sb_symbol * sb_stemmer_stem (struct sb_stemmer *stemmer, const sb_symbol *word, int size)
 
int sb_stemmer_length (struct sb_stemmer *stemmer)
 

Typedef Documentation

◆ sb_symbol

typedef unsigned char sb_symbol

Definition at line 8 of file libstemmer.h.

Function Documentation

◆ sb_stemmer_delete()

void sb_stemmer_delete ( struct sb_stemmer *  stemmer)

Delete a stemmer object.

This frees all resources allocated for the stemmer. After calling this function, the supplied stemmer may no longer be used in any way.

It is safe to pass a null pointer to this function - this will have no effect.

Definition at line 67 of file libstemmer.c.

References sb_stemmer::close, sb_stemmer::env, and free().

Referenced by main(), and sb_stemmer_new().

◆ sb_stemmer_length()

int sb_stemmer_length ( struct sb_stemmer *  stemmer)

Get the length of the result of the last stemmed word. This should not be called before sb_stemmer_stem() has been called.

Definition at line 92 of file libstemmer.c.

References sb_stemmer::env, and SN_env::l.

Referenced by stem_file().

◆ sb_stemmer_list()

const char** sb_stemmer_list ( void  )

Returns an array of the names of the available stemming algorithms. Note that these are the canonical names: aliases (i.e., other names for the same algorithm) are not included in the list. The list is terminated with a null pointer.

The list must not be modified in any way.

Definition at line 17 of file libstemmer.c.

References algorithm_names.
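As a rough sketch of how a null-terminated list like this can be walked, the helper below counts the entries of such an array. The name count_names is hypothetical and not part of libstemmer; in real use the argument would be the pointer returned by sb_stemmer_list().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of libstemmer): count the entries of a
 * NULL-terminated array of strings, such as the one sb_stemmer_list()
 * returns. The array is only read, never modified. */
size_t count_names(const char **list)
{
    size_t n = 0;

    assert(list != NULL);
    while (list[n] != NULL) {
        n++;
    }
    return n;
}

/* Intended use with the library (requires libstemmer.h):
 *     size_t n = count_names(sb_stemmer_list());
 */
```

The loop stops at the terminating null pointer, so no separate length is needed.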

◆ sb_stemmer_new()

struct sb_stemmer* sb_stemmer_new ( const char *  algorithm,
const char *  charenc 
)

Create a new stemmer object, using the specified algorithm, for the specified character encoding.

All algorithms will usually be available in UTF-8, but may also be available in other character encodings.

Parameters
algorithm    The algorithm name. This is either the English name of the algorithm, or the two- or three-letter ISO 639 code for the language. Note that case is significant in this parameter: the value should be supplied in lower case.
charenc    The character encoding. NULL may be passed as this value, in which case UTF-8 encoding will be assumed. Otherwise, the argument may be one of "UTF_8", "ISO_8859_1" (i.e., Latin 1), "CP850" (i.e., MS-DOS Latin 1) or "KOI8_R" (Russian). Note that case is significant in this parameter.
Returns
NULL if the specified algorithm is not recognised, or the algorithm is not available for the requested encoding. Otherwise, returns a pointer to a newly created stemmer for the requested algorithm. The returned pointer must be deleted by calling sb_stemmer_delete().
Note
NULL will also be returned if an out of memory error occurs.

Definition at line 35 of file libstemmer.c.

References sb_stemmer::close, stemmer_modules::close, sb_stemmer::create, stemmer_modules::create, stemmer_modules::enc, ENC_UNKNOWN, sb_stemmer::env, malloc(), modules, stemmer_modules::name, sb_getenc(), sb_stemmer_delete(), sb_stemmer::stem, and stemmer_modules::stem.

Referenced by main().
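Because the algorithm name is case-sensitive and expected in lower case, a caller that takes the name from user input may want to normalise it first. The following is a minimal sketch under that assumption; lowercase_ascii is a hypothetical helper, not part of libstemmer, and it handles ASCII names only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of libstemmer): lower-case an ASCII
 * string in place so that it can be passed as the `algorithm` argument
 * of sb_stemmer_new(), which expects lower-case names. */
void lowercase_ascii(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Cast to unsigned char: tolower() is undefined for negative values. */
        *s = (char)tolower((unsigned char)*s);
    }
}

/* Intended use with the library (requires libstemmer.h):
 *     char name[] = "English";
 *     lowercase_ascii(name);
 *     struct sb_stemmer *st = sb_stemmer_new(name, NULL);  // NULL => UTF-8
 *     if (st == NULL) { ... unknown algorithm, unsupported encoding,
 *                           or out of memory ... }
 *     ...
 *     sb_stemmer_delete(st);
 */
```

The encoding name should not be normalised this way, since the accepted values listed above are upper case.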

◆ sb_stemmer_stem()

const sb_symbol* sb_stemmer_stem ( struct sb_stemmer *  stemmer,
const sb_symbol *  word,
int  size 
)

Stem a word.

The return value is owned by the stemmer - it must not be freed or modified, and it will become invalid when the stemmer is called again, or if the stemmer is freed.

The length of the return value can be obtained using sb_stemmer_length().

If an out-of-memory error occurs, this will return NULL.

Definition at line 77 of file libstemmer.c.

References sb_stemmer::env, SN_env::l, SN_env::p, SN_set_current(), and sb_stemmer::stem.

Referenced by stem_file().
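Since the returned buffer is owned by the stemmer and becomes invalid on the next call, a caller that needs to keep a stem has to copy it. The sketch below shows one way to do that; copy_stem is a hypothetical helper, not part of libstemmer, and it takes plain unsigned char (the type that sb_symbol is a typedef for) so that it does not depend on the header.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of libstemmer): copy `len` bytes of a
 * stem into a freshly allocated, NUL-terminated buffer. Returns NULL if
 * `stem` is NULL or allocation fails. The caller must free() the result. */
char *copy_stem(const unsigned char *stem, int len)
{
    char *out;

    assert(len >= 0); /* a negative length would be a caller bug */
    if (stem == NULL) {
        return NULL;
    }
    out = malloc((size_t)len + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, stem, (size_t)len);
    out[len] = '\0';
    return out;
}

/* Intended use with the library (requires libstemmer.h):
 *     const sb_symbol *r = sb_stemmer_stem(st, word, size);
 *     char *kept = NULL;
 *     if (r != NULL) {
 *         kept = copy_stem(r, sb_stemmer_length(st));
 *     }
 *     ... `kept` stays valid after further sb_stemmer_stem() calls ...
 *     free(kept);
 */
```

The copy is taken before any further call on the same stemmer, which is the point at which the original pointer stops being valid.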