Live Coding Spring, Kafka, & Elasticsearch: Personalized Search Results on Ranking and User Profile
Link |
https://springone.io/2021/sessions/spring-kafka-elasticsearch |
Author(s) |
Erdem Günay as CTO, Layermark |
Length |
26:42 |
Date |
13-09-2021 |
Language |
English 🇺🇸 |
Track |
Architecture |
Rating |
⭐⭐⭐⭐☆ |
-
✅ Informative overview of what Elasticsearch is capable of, although the live demo is impossible to follow but still impressive
-
⛔ The way result popularity in real-time was updated from Kafka was not clearly explained.
-
⛔ It’s not clear why Kafka figures in the demo if an easier approach could be used
-
⛔ Data structure should be shown as not everybody has experience with Elasticsearch (all because I guess mostly caused by lack of time)
ElasticSearch Analyzers
By default, ElasticSearch finds by an exact match, to enable easy search using a single letter ignoring accented characters (é
, í
, etc…) it is needed to use ElasticSearch Analyzers utilizing POST _analyze
with "tokenizer": "standard"
and "char_filter"
with pattern_replace
"type` to replace anything that is not an alphanumeric character.
To get rid of capital letters and non-ASCII characters, it is needed to add token:
"filter" : ["asciifolding", "lowercase"]
…and a specific edge-ngram
among them.
It is needed to delete the former index and recreate the index.
Each hit
has a _score
that sets the element order in the returned structure.
For search by fields, it is possible to add boosting, ex. boost the search of artist name artist_name
by a factor of 5 (exact match should have a bigger score) as artist_name^5
or artist_name.prefix^1
where are the generated tokens stored.
"fuzziness" : 1
enables to match other elements (ex. "query": "sezan"
would match Selena Gomez
with a low _score
since there is a partial match in individual letters from search (basically allows typos).
Boosting results by popularity
If the search is based on a single letter (s
), Shakira
might be placed below Selena Goméz
although the popularity says otherwise (I like Shakira more, though). It is needed to enable scoring on a search through POST /content/_search
and provide "script_score"
in "functions"
in "function_score"
in "query"
: `
"script" : {
"source" : "Math.max(((!doc['ranking'].empty )
? Math.log10(doc.['ranking'].value)
: 1), 1)",
"lang" : "painless"
}`
Assuming the popularity is updated programmatically (200 asynchronous hits by Kafka) it is needed to process the listen-event messages and place them in listen-event indices using: POST /listen-event-*/_search/
. User profiles are also getting generated (POST /user-profile/_search
).
Boosting by user behavior
If a particular user searches for a certain element, that element should be boosted in the search for that particular user only. Another function must be taken into account similarly as previous boosting:
"script" : {
"source" : "params.boosts.get(doc[params.artistIdFieldName].value)",
"lang" : "painless",
"params" : { ..
}
}