tl;dr; fvh is marginally faster than unified, and unified is a bit faster than plain.
When you send a full-text search query to Elasticsearch, you can specify whether, and how, it should highlight the matching terms with HTML tags. E.g.
The correct way to index data into <mark>Elasticsearch</mark> with (Python) <mark>elasticsearch</mark>-dsl
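For illustration, a request that asks for highlights like the one above could be built as a plain dict. This is only a sketch: `pre_tags`, `post_tags` and `fields` are standard highlight options, but the `match` query, the helper name `build_search_body` and the choice of the title field are assumptions for the example, not necessarily what my actual search code does.

```python
import json


def build_search_body(query_text):
    """Build a search body that wraps matches in the title with <mark> tags.

    Hypothetical helper: the query shape is an assumption for illustration.
    """
    return {
        "query": {"match": {"title": query_text}},
        "highlight": {
            # Elasticsearch defaults to <em>; override to get <mark> instead.
            "pre_tags": ["<mark>"],
            "post_tags": ["</mark>"],
            # An empty dict means "highlight this field with default settings".
            "fields": {"title": {}},
        },
    }


body = build_search_body("elasticsearch")
print(json.dumps(body, indent=2))
```

The resulting dict is what you would send as the JSON body of a `_search` request.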
Among other configuration options, you can pick one of 3 different highlighter algorithms: unified, plain, and fvh.
The last one, fvh, requires that you index more at index-time (in particular to add term_vector="with_positions_offsets" to the mapping). In a previous benchmark I did, the total document size on disk, as described by http://localhost:9200/_cat/indices?v, grew by 38%.
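As a rough sketch of what that mapping change looks like, here is an index-creation body with `term_vector` set on the text fields. The field names title and body come from this benchmark; the helper name `build_index_body` is made up for the example, and a real mapping would have more fields and analyzers than this.

```python
import json


def build_index_body(fields=("title", "body")):
    """Return an index-creation body whose text fields store term vectors.

    Hypothetical helper: a real mapping would carry more settings.
    """
    properties = {
        name: {"type": "text", "term_vector": "with_positions_offsets"}
        for name in fields
    }
    return {"mappings": {"properties": properties}}


index_body = build_index_body()
print(json.dumps(index_body, indent=2))
```

Existing documents have to be reindexed after a change like this, because term vectors are written at index-time.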
I bombarded my local Elasticsearch 7.7 instance with thousands of queries collected from logs. Some single-word, some multi-word. The fields it highlights are things like title (~5-50 words) and body (~100-2,000 words).
Basically, I edited the search query to test one highlighter type at a time. For example:
search_query = search_query.highlight(
- "title", fragment_size=120, number_of_fragments=1, type="unified"
+ "title", fragment_size=120, number_of_fragments=1, type="plain"
)
...etc.
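A timing loop for this kind of comparison might look roughly like the sketch below. It is not the script I used: `time_queries`, `summarize` and `fake_search` are hypothetical names, and `fake_search` is a stand-in so the example runs without a cluster. In a real run you would swap it for a function that executes the search with the given highlighter type.

```python
import statistics
import time

HIGHLIGHTER_TYPES = ("unified", "plain", "fvh")


def time_queries(run_search, queries, highlighter_type):
    """Return the wall-clock time of each query, in milliseconds."""
    timings = []
    for query in queries:
        start = time.perf_counter()
        run_search(query, highlighter_type)
        timings.append((time.perf_counter() - start) * 1000)
    return timings


def summarize(timings):
    """Reduce a list of timings to the mean and median."""
    return {
        "mean": statistics.mean(timings),
        "median": statistics.median(timings),
    }


def fake_search(query, highlighter_type):
    # Stand-in so the sketch runs without Elasticsearch; replace with a
    # real search call that passes type=highlighter_type to the highlighter.
    return {"query": query, "type": highlighter_type}


queries = ["elasticsearch", "python dsl", "index data"]
results = {
    name: summarize(time_queries(fake_search, queries, name))
    for name in HIGHLIGHTER_TYPES
}
for name, stats in results.items():
    print(f"{name.upper()}: MEAN {stats['mean']:.1f}ms MEDIAN {stats['median']:.1f}ms")
```

Timing around the whole call measures the round trip as the client sees it, not just the time Elasticsearch spends highlighting.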
After running 1,000 searches, 3 separate times for each highlighter type option, and timing each one, I recorded the following:
(milliseconds per query, lower is better)
UNIFIED: MEAN 18.1ms MEDIAN 19.0ms
PLAIN: MEAN 24.5ms MEDIAN 27.5ms
FVH: MEAN 16.1ms MEDIAN 17.6ms
Thin marginal win for fvh over unified.
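To put the gaps in relative terms, here is the arithmetic on the mean times from the benchmark. The helper name `percent_faster` is just for this example. It works out to roughly 11% less time per query for fvh compared to unified, and roughly 26% less for unified compared to plain.

```python
# Mean milliseconds per query, copied from the benchmark results.
means_ms = {"unified": 18.1, "plain": 24.5, "fvh": 16.1}


def percent_faster(fast, slow):
    """How much less time `fast` takes than `slow`, as a percentage of `slow`."""
    return (slow - fast) / slow * 100


fvh_vs_unified = percent_faster(means_ms["fvh"], means_ms["unified"])
unified_vs_plain = percent_faster(means_ms["unified"], means_ms["plain"])

print(f"fvh vs unified: {fvh_vs_unified:.1f}% less time per query")
print(f"unified vs plain: {unified_vs_plain:.1f}% less time per query")
```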
Conclusion
Conclusion? Or should I say "Caveats" instead? There's a lot more to it than raw speed. In this benchmark, it takes ~20 milliseconds to search 2 different indexes, each with a scoring function, and the indexes contain between 1,000 and 5,000 documents with hundreds of thousands of words. So the difference is pretty minor.
Each highlighter behaves slightly differently too, so you'd have to study the outcome a bit more carefully to get a better feel for whether it works the way you and your team prefer it to work.
If there's any conclusion, other than the boring usual "it depends on your setup and preferences", it's that the performance difference is noticeable but won't blow you away. It makes sense that fvh is a bit faster because you've paid for it by indexing more upfront (the offsets) at the expense of memory.