Elasticsearch

Properties

Keywords

Document, Index, partition, shards

Installation

Dev-tools

Discover

Create index, field mapping

Loading Data to ES using python

Query data using dev-tools

Query data using python (Create a search engine using python)

BM25 algorithm for searching
Lucene Software
Query Functionalities available - radial search, weighted search

Update data using python

List indices using python

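from elasticsearch import Elasticsearch

# The host is left blank in these notes; fill in your cluster's hostname
es = Elasticsearch([{'host': '', 'port': 9200}])

# get_alias() returns a dict keyed by index name
indices = list(es.indices.get_alias().keys())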

Mapping or Schema in ES

Querying in ES

term match

must vs should term search

exact match

radial search

Weighted search on different fields
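The query types listed above can be sketched as plain request bodies. This is a minimal illustration, not a complete reference: the helper names and the field names (`title`, `color`, `body`) are hypothetical, and reading "exact match" as a `term` query and "term match" as a `match` query is an assumption. Radial search is left out.

```python
def exact_match(field, value):
    # term query: matches the stored value exactly, without analysis
    return {"query": {"term": {field: value}}}


def text_match(field, text):
    # match query: the text is analyzed before it is compared with the field
    return {"query": {"match": {field: text}}}


def bool_query(must=None, should=None):
    # must clauses are required; should clauses are optional when a must
    # clause is present, and raise the score of documents that match them
    clauses = {}
    if must:
        clauses["must"] = must
    if should:
        clauses["should"] = should
    return {"query": {"bool": clauses}}


def weighted_search(text, field_weights):
    # multi_match with per-field boosts, written as "field^weight"
    fields = [f"{name}^{weight}" for name, weight in field_weights.items()]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


# Hypothetical example: "carpet" must appear in the title, red is preferred
carpet_query = bool_query(
    must=[{"match": {"title": "carpet"}}],
    should=[{"term": {"color": "red"}}],
)
weighted_query = weighted_search("red carpet", {"title": 3, "body": 1})
```

Each helper returns a dict that can be passed as the search body in dev-tools or through the Python client.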

Lucene Search Engine

Scoring in ES

The scoring function is a mathematical expression that assigns each document a value for its relevance to a query, so that matching documents can be ranked
Classic scoring in ES is based on TF-IDF, a measure widely used in document/information retrieval (recent ES versions default to BM25 instead)

score(q,d) = queryNorm(q) * coord(q,d) * Σ ( tf(t in d) * idf(t)² * t.getBoost() * norm(t,d) ) (t in q)

score(q,d): the relevance score of document d for query q
queryNorm(q): query normalization factor
- queryNorm = 1 / √sumOfSquaredWeights
- sumOfSquaredWeights is the sum of the squared IDF of each term in the query
- queryNorm aims to make the scores of different queries comparable. It is calculated once, at the beginning of each query, using the formula above
coord(q,d): query coordination factor
- coord = number of matching terms / total number of terms in the query; the summed term score is multiplied by this factor
- In a multi-term query, the coordination factor rewards documents that contain more of the query's terms
- The more query terms appear in a document, the more relevant it is likely to be
- For simplicity, say a query has three terms, “nice,” “red,” and “carpet,” each with a term score of 1.5
- A document that matches only “nice red” has a summed term score of 3.0, so it scores 3.0 * 2 / 3 = 2.0. A document that contains all three terms scores 4.5 * 3 / 3 = 4.5, so it ranks well above one that contains just two of them
tf(t in d): term frequency of term t in document d
- tf(t in d) = √frequency
- frequency is the number of times the term appears in the document
idf(t): inverse document frequency for term t
- idf = 1 + ln(numDocs / (docFreq + 1))
- numDocs is the number of documents in the index; docFreq is the number of documents that contain the term
- Inverse document frequency (IDF) assigns a low weight to terms that appear in many of the documents in the index
t.getBoost(): the boost applied to term t in the query
norm(t,d): the field-length norm
- norm = 1 / √numFieldTerms
- The value depends on the length of the document field in which the match was found: the shorter the field, the higher the norm
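The formulas above can be combined into a small calculation. This is a minimal sketch that follows the definitions in these notes literally (for example, sumOfSquaredWeights uses only the IDF values); it is not Lucene's actual implementation, and the function names, document statistics and sample documents are made up for illustration.

```python
import math


def tf(frequency):
    return math.sqrt(frequency)


def idf(num_docs, doc_freq):
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_norm(num_field_terms):
    return 1.0 / math.sqrt(num_field_terms)


def query_norm(idf_values):
    # 1 / sqrt(sum of squared IDF of each query term)
    return 1.0 / math.sqrt(sum(w * w for w in idf_values))


def coord(matching_terms, total_terms):
    return matching_terms / total_terms


def score(query_terms, term_freqs, num_field_terms, num_docs, doc_freqs, boosts=None):
    """Score one document field for a query, following the notes' formula."""
    boosts = boosts or {}
    idfs = {t: idf(num_docs, doc_freqs.get(t, 0)) for t in query_terms}
    matched = [t for t in query_terms if term_freqs.get(t, 0) > 0]
    if not matched:
        return 0.0
    norm = field_norm(num_field_terms)
    summed = sum(
        tf(term_freqs[t]) * idfs[t] ** 2 * boosts.get(t, 1.0) * norm
        for t in matched
    )
    return query_norm(idfs.values()) * coord(len(matched), len(query_terms)) * summed


# Hypothetical index statistics: 100 documents, each term found in 9 of them
query = ["nice", "red", "carpet"]
doc_freqs = {"nice": 9, "red": 9, "carpet": 9}

# Two four-term fields: one contains all three query terms, one only two
score_all = score(query, {"nice": 1, "red": 1, "carpet": 1}, 4, 100, doc_freqs)
score_two = score(query, {"nice": 1, "red": 1}, 4, 100, doc_freqs)
print(score_all > score_two)
```

The document that matches all three terms wins twice over: it sums one more term score, and its coordination factor is 3/3 instead of 2/3.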

Querying the same index twice can return documents in a different order

https://www.elastic.co/guide/en/elasticsearch/reference/current/consistent-scoring.html
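One mitigation described on the page linked above is to send a `preference` string with each search, so that repeated searches from the same user are routed to the same shard copies. A minimal sketch, assuming a Python client version that accepts `query=` as a keyword argument (older versions take `body={"query": ...}` instead); the function name and the `session_id` parameter are hypothetical.

```python
def search_with_preference(es, index, query, session_id):
    # The same preference string routes repeated searches to the same
    # shard copies, which keeps the scores (and the order) stable.
    # Custom preference values should not start with an underscore.
    return es.search(index=index, query=query, preference=session_id)
```

Called as `search_with_preference(es, "my-index", {"match": {"title": "carpet"}}, "user-42")`, where the index name and user id are placeholders.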

Updating a Document in ES

Reference for Further Reading


