Elasticsearch
Properties
Keywords
Document, Index, partition, shards
Installation
Dev-tools
Discover
Create index, field mapping
Loading data into ES using Python
Query data using dev-tools
Query data using Python (create a search engine using Python)
BM25 algorithms for searching
Lucene Software
Query Functionalities available - radial search, weighted search
Update data using Python
List indices using Python
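from elasticsearch import Elasticsearch
# Connect to the cluster; the host is left blank here, so supply your own.
es = Elasticsearch([{'host': '', 'port': 9200}])
# get_alias() returns a dict keyed by index name, so its keys are the index names.
indices = list(es.indices.get_alias().keys())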
Mapping or Schema in ES
Querying in ES
term match
must vs should term search
exact match
radial search
Weighted search on different fields
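The query types listed above can be sketched as request bodies in the Elasticsearch Query DSL, written here as Python dictionaries. This is a minimal sketch under assumptions: the field names (`title`, `body`, `status`, `location`) are hypothetical, "exact match" is read as a `term` query on a keyword field, and "radial search" is read as a `geo_distance` filter on a geo-point field.

```python
# Hypothetical field names throughout; adapt them to your own mapping.

# Term match: full-text match on an analyzed text field.
term_match = {"query": {"match": {"title": "red carpet"}}}

# Exact match: the term query compares against the stored value without analysis.
exact_match = {"query": {"term": {"status": "published"}}}

# must vs should: "must" clauses are required, "should" clauses only raise the score.
must_vs_should = {"query": {"bool": {
    "must": [{"match": {"title": "carpet"}}],
    "should": [{"match": {"body": "red"}}],
}}}

# Radial search (assumed to mean geo_distance): documents within 10 km of a point.
radial_search = {"query": {"bool": {"filter": {"geo_distance": {
    "distance": "10km",
    "location": {"lat": 40.73, "lon": -73.99},
}}}}}

# Weighted search: "title^3" boosts matches in title three times over body.
weighted_search = {"query": {"multi_match": {
    "query": "red carpet",
    "fields": ["title^3", "body"],
}}}
```

Each dictionary mirrors the JSON you would paste into dev-tools after a `GET <index>/_search` line.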
Lucene Search Engine
Scoring in ES
The scoring function is a mathematical expression that assigns each document a value reflecting its relevance to the query relative to the other documents.
The scoring described here is based on TF-IDF, a weighting scheme widely used in document and information retrieval.
score(q,d): the relevance score of document d for query q
queryNorm(q): query normalization factor
queryNorm = 1 / √sumOfSquaredWeights
sumOfSquaredWeights is computed by adding together the squared IDF of each term in the query.
The query normalization factor (QNF) aims to make the results of different queries comparable. It is calculated at the beginning of each query using the formula above.
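As a small illustration of the queryNorm formula, the sketch below computes it from a list of per-term IDF values. The function name `query_norm` and the sample IDF values are made up for the example.

```python
import math

def query_norm(idfs):
    """Return 1 / sqrt(sum of squared IDF values) for the terms of a query."""
    sum_of_squared_weights = sum(idf ** 2 for idf in idfs)
    return 1.0 / math.sqrt(sum_of_squared_weights)

# Hypothetical IDF values for a two-term query: 1 / sqrt(9 + 16) = 0.2
print(query_norm([3.0, 4.0]))
```

Because the factor is the same for every document scored by a given query, it does not change the ranking within that query.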
coord(q,d): query coordination factor
term score * number of matching terms / total number of terms in the query, where term score is the summed score of the matching terms
For a multi-term query, the coordination factor rewards documents that contain more of the query's terms: the more query terms appear in a document, the more relevant it is likely to be.
For simplicity, say you have a query with three terms, “nice,” “red,” and “carpet,” each with a score of 1.5.
A document that matches only “nice red” scores 3.0 * 2 / 3 = 2.0, while a document that contains all three terms scores 4.5 * 3 / 3 = 4.5, so it ranks well above documents that contain just two of the terms.
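The “nice red carpet” arithmetic can be reproduced with a few lines of Python. This is only a sketch of the calculation described above; `coord_score` is a hypothetical helper, not an Elasticsearch API.

```python
def coord_score(term_scores, matched_terms):
    """Sum the scores of the matched query terms, then scale the sum by
    the fraction of query terms that matched."""
    matched = [t for t in term_scores if t in matched_terms]
    total = sum(term_scores[t] for t in matched)
    return total * len(matched) / len(term_scores)

query = {"nice": 1.5, "red": 1.5, "carpet": 1.5}

print(coord_score(query, {"nice", "red"}))            # 3.0 * 2 / 3 = 2.0
print(coord_score(query, {"nice", "red", "carpet"}))  # 4.5 * 3 / 3 = 4.5
```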
tf(t in d): term frequency of term t in document d
tf(t in d) = √frequency
frequency is the number of times the term appears in the document.
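A minimal sketch of the term-frequency factor, assuming the document has already been split into tokens (real analysis in Elasticsearch also lowercases, stems, and so on, depending on the analyzer). The helper name `tf` and the sample sentence are illustrative.

```python
import math
from collections import Counter

def tf(term, tokens):
    """Square root of the number of times `term` occurs in `tokens`."""
    return math.sqrt(Counter(tokens)[term])

tokens = "the red carpet is a nice red carpet".split()

print(tf("red", tokens))   # "red" occurs twice, so sqrt(2), about 1.41
print(tf("blue", tokens))  # an absent term contributes 0.0
```

The square root dampens repetition: a term that appears four times counts twice as much as one that appears once, not four times as much.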
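idf(t): inverse document frequency for term t
idf(t) = 1 + ln(numDocs / (docFreq + 1))
numDocs is the number of documents in the index and docFreq is the number of documents that contain the term.
Inverse document frequency (IDF) assigns low weight to terms that appear in many of the documents in the index, since such common terms do little to distinguish one document from another.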
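To see how the IDF formula separates rare terms from common ones, the sketch below evaluates it for two hypothetical terms in an index of 1000 documents. The function name and the counts are assumptions made for the example.

```python
import math

def idf(num_docs, doc_freq):
    """Inverse document frequency: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

rare = idf(num_docs=1000, doc_freq=9)      # 1 + ln(100), about 5.61
common = idf(num_docs=1000, doc_freq=999)  # 1 + ln(1) = 1.0

print(rare, common)
```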
t.getBoost(): the boost applied to the query
norm(t,d): the field-length norm
norm = 1 / √numFieldTerms
The value of this factor depends on the length of the document field in which the match with the query was found: the shorter the field, the higher the norm.
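The factors defined above can be put together in one toy scorer. The way they are combined here follows Lucene's classic practical scoring function, score(q,d) = queryNorm(q) * coord(q,d) * Σ (tf(t in d) * idf(t)² * t.getBoost() * norm(t,d)); that combination is an assumption brought in from Lucene's documentation rather than something stated in these notes, and all function and parameter names are hypothetical.

```python
import math

def field_norm(num_field_terms):
    """Field-length norm: 1 / sqrt(number of terms in the field)."""
    return 1.0 / math.sqrt(num_field_terms)

def practical_score(query_terms, doc_tokens, num_docs, doc_freqs, boosts=None):
    """Toy TF-IDF score of one tokenized field against a list of query terms."""
    boosts = boosts or {}
    matched = [t for t in query_terms if t in doc_tokens]
    if not matched:
        return 0.0
    # idf(t) = 1 + ln(numDocs / (docFreq + 1)) for every query term
    idfs = {t: 1.0 + math.log(num_docs / (doc_freqs.get(t, 0) + 1))
            for t in query_terms}
    query_norm = 1.0 / math.sqrt(sum(v ** 2 for v in idfs.values()))
    coord = len(matched) / len(query_terms)
    norm = field_norm(len(doc_tokens))
    total = 0.0
    for t in matched:
        tf = math.sqrt(doc_tokens.count(t))
        total += tf * idfs[t] ** 2 * boosts.get(t, 1.0) * norm
    return query_norm * coord * total

# Hypothetical corpus statistics for the three query terms.
stats = {"nice": 50, "red": 20, "carpet": 5}
short_doc = ["nice", "red", "carpet"]
long_doc = short_doc + ["filler"] * 6

short_score = practical_score(["nice", "red", "carpet"], short_doc, 1000, stats)
long_score = practical_score(["nice", "red", "carpet"], long_doc, 1000, stats)
print(short_score > long_score)  # the shorter field has the higher norm
```

Both documents match all three terms equally often, so the only difference in their scores comes from the field-length norm.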
Querying the same index twice can return documents in a different order
Updating a Document in ES
Reference for Further Reading