Standard Analyzer (default)
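- lowercases and splits on word boundaries (Unicode UAX #29); stop-word removal is supported but off by default, and emails/URLs are split apart (the uax_url_email tokenizer keeps them whole)
[a] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dog.com]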
Simple Analyzer
- lowercases and splits on any non-letter character
[a] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dog] [com]
Stop Analyzer
- lowercases, splits on non-letter characters, removes stop words
[quick] [brown] [fox] [jumped] [over] [lazy] [dog] [com]
Whitespace Analyzer
- splits on whitespace only; does not lowercase
[A] [Quick] [Brown] [Fox] [jumped] [over] [the] [Lazy@Dog.com]
Keyword Analyzer
- the entire input is kept as a single token
[A Quick Brown Fox jumped over the Lazy@Dog.com]
Language Analyzer
- language-specific analyzers (english, french, spanish and many more) that add stemming and language-specific stop words; the most sophisticated of all
Custom-defined Analyzer
- user-defined combination of character filters, a tokenizer and token filters
Analyzer - A pipeline of text filters: character filters, then a tokenizer, then token filters
Parsing the sentence "A Quick Brown Fox jumped over the Lazy@Dog.com"
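The analyzers above can be approximated as one pipeline: character filters, then a tokenizer, then token filters. The sketch below is a simplified pure-Python emulation, not Lucene's implementation. The letter tokenizer relies on Python's Unicode letter matching, the stop-word set is a small hypothetical subset of an English list, and `strip_tags` is a hypothetical, crude stand-in for an HTML-stripping char filter. The standard and language analyzers are left out because their grammar-based tokenization and stemming are too involved to reproduce faithfully here.

```python
import re
from typing import Callable, Dict, List, Sequence

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]

# Hypothetical, deliberately small stop-word set; real English lists are longer.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}


def analyze(text: str, tokenizer: Tokenizer,
            token_filters: Sequence[TokenFilter] = (),
            char_filters: Sequence[CharFilter] = ()) -> List[str]:
    """Run char filters on the raw text, tokenize, then apply token filters in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


# --- Tokenizers ---
def whitespace_tokenizer(text: str) -> List[str]:
    return text.split()


def letter_tokenizer(text: str) -> List[str]:
    # Runs of letters only: digits, '@', '.', spaces etc. all act as separators.
    return re.findall(r"[^\W\d_]+", text)


def keyword_tokenizer(text: str) -> List[str]:
    return [text]  # the whole input becomes one token


# --- Token filters ---
def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def stop(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t not in STOP_WORDS]


# --- Char filter (hypothetical stand-in for HTML stripping) ---
def strip_tags(text: str) -> str:
    return re.sub(r"<[^>]+>", " ", text)


ANALYZERS: Dict[str, dict] = {
    "simple": dict(tokenizer=letter_tokenizer, token_filters=[lowercase]),
    "stop": dict(tokenizer=letter_tokenizer, token_filters=[lowercase, stop]),
    "whitespace": dict(tokenizer=whitespace_tokenizer),
    "keyword": dict(tokenizer=keyword_tokenizer),
    # A "custom" analyzer is just another combination of the same building blocks.
    "my_custom": dict(char_filters=[strip_tags], tokenizer=whitespace_tokenizer,
                      token_filters=[lowercase, stop]),
}

SENTENCE = "A Quick Brown Fox jumped over the Lazy@Dog.com"
for name, parts in ANALYZERS.items():
    print(f"{name:>10}: {analyze(SENTENCE, **parts)}")
```

The simple and stop entries reproduce the token lists shown on the slide; swapping one building block (tokenizer or filter) is all it takes to get a different analyzer.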
Perfect for auto-complete! The grams reportedly amount to about 30% of the total parsed data, or so people on the web say...
Tokenizing "quick" with an n-gram tokenizer (gram lengths 1 to 5) generates these grams:
Length 1 (unigram): [ q, u, i, c, k ]
Length 2 (bigram): [ qu, ui, ic, ck ]
Length 3 (trigram): [ qui, uic, ick ]
Length 4 (four-gram): [ quic, uick ]
Length 5 (five-gram): [ quick ]
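The grams listed above come from sliding a window of each length across the word. A minimal sketch, assuming a minimum gram length of 1 and a maximum equal to the word length (in Elasticsearch's `ngram` tokenizer these bounds are the `min_gram` and `max_gram` settings, which default to 1 and 2, so the full listing needs a wider setting). The hypothetical `build_gram_index` helper shows why a fragment such as "uic" can find "quick", and the character count illustrates the extra storage the grams cost.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def ngrams(word: str, min_gram: int = 1, max_gram: int = 0) -> Dict[int, List[str]]:
    """Return every contiguous substring of `word`, grouped by length."""
    max_gram = max_gram or len(word)  # 0 means "up to the whole word"
    return {
        n: [word[i:i + n] for i in range(len(word) - n + 1)]
        for n in range(min_gram, max_gram + 1)
    }


def build_gram_index(words: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each gram to the set of words containing it (a toy inverted index)."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for word in words:
        for grams_of_len in ngrams(word).values():
            for gram in grams_of_len:
                index[gram].add(word)
    return index


grams = ngrams("quick")
for n, gs in grams.items():
    print(n, gs)

# Storage cost: 5 + 8 + 9 + 8 + 5 characters of grams for a 5-character word.
stored_chars = sum(len(g) for gs in grams.values() for g in gs)
print(stored_chars)  # 35

gram_index = build_gram_index(["quick", "quack", "brown"])
print(sorted(gram_index["qu"]))   # ['quack', 'quick']
print(sorted(gram_index["uic"]))  # ['quick']
```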
Supported data types:
[Diagram: Client and Router communicating over HTTP]
Field - Named key in a document; think of a column name in a SQL database
Term - An exact value indexed for a field (e.g. a token produced by the analyzer)
Document - Individual record, a collection of fields
Index - The "schemaless" container for a collection of documents (field mappings can be inferred dynamically)
Primary shard - An independent Lucene index; the only shard that accepts writes for its documents
Replica shard - A copy of a primary shard, used for faster reads and high availability of the data
Data node - Holds data shards and performs CRUD operations, search and aggregations
Master node - The elected master is the only node that can modify the cluster state, including index and shard configuration
Ingest node - Applies ingest pipelines to enrich documents before indexing
Coordinating node - The node that receives a request, forwards it to the relevant shards and merges their results (any node can play this role)
Machine Learning node - Runs machine learning jobs; if X-Pack is installed, at least one ML node is required to use the Machine Learning features in Kibana
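The split of work between primary shards and the coordinating node can be sketched as routing plus scatter-gather. In its basic form, Elasticsearch picks a document's shard as a hash of its routing value (the document `_id` by default) modulo the number of primary shards. This sketch uses `zlib.crc32` as a stand-in for Elasticsearch's own hash, so real shard numbers will differ, and its term-count score is a hypothetical stand-in for Lucene's relevance scoring. `NUM_PRIMARY_SHARDS`, `index_doc`, `shard_search` and `coordinate_search` are illustrative names, not an Elasticsearch API.

```python
import heapq
import zlib
from typing import Dict, List, Tuple

NUM_PRIMARY_SHARDS = 3  # hypothetical index setting for this sketch


def shard_for(doc_id: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Route a document: hash(routing value) % number of primary shards.
    zlib.crc32 stands in for Elasticsearch's real hash function."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


# Each shard holds its own documents: doc_id -> text.
shards: List[Dict[str, str]] = [{} for _ in range(NUM_PRIMARY_SHARDS)]


def index_doc(doc_id: str, text: str) -> None:
    """Writes go to the document's primary shard only."""
    shards[shard_for(doc_id)][doc_id] = text


def shard_search(shard: Dict[str, str], term: str, k: int) -> List[Tuple[int, str]]:
    """Local top-k on one shard; score = term count (hypothetical stand-in)."""
    hits = [(text.lower().split().count(term), doc_id) for doc_id, text in shard.items()]
    return heapq.nlargest(k, [hit for hit in hits if hit[0] > 0])


def coordinate_search(term: str, k: int = 2) -> List[Tuple[int, str]]:
    """Coordinating node: scatter the query to every shard, gather, merge top-k."""
    partials = [hit for shard in shards for hit in shard_search(shard, term, k)]
    return heapq.nlargest(k, partials)


index_doc("1", "fox fox fox")
index_doc("2", "quick brown fox")
index_doc("3", "lazy dog")
index_doc("4", "fox fox")

print(coordinate_search("fox", k=2))  # [(3, '1'), (2, '4')]
```

Asking each shard for its own top k is enough for the merged result to be a correct global top k, since any document in the global top k must also be in its shard's local top k.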
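Duplicating data across multiple n-gram indices trades disk space for speed
Inverted indices map each term directly to the documents containing it, so a lookup behaves like a hashmap: roughly O(1), assuming good distribution (Lucene actually uses a compact, sorted term dictionary)
It keeps as much as possible in memory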
Multi-tiered Caching
Nested AND, OR and NOT logic via bool queries (must, should, must_not)
Provides a rich query DSL
As Elasticsearch is newer, its plugin ecosystem is smaller than Apache Solr's
The documentation for Elasticsearch is absolutely brilliant: brief, to-the-point explanations with plenty of examples. An absolute treasure when it comes to writing DSL queries. Definitely worth going through the "Getting Started" section.
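The inverted-index lookup and the nested AND/OR/NOT logic above can be illustrated with a toy inverted index: a Python dict from (field, term) to the set of matching document ids, evaluated against a dict shaped like an Elasticsearch `bool` query (`must`, `should`, `must_not`, with `term` leaves). It is a sketch under simplifying assumptions: no scoring, no `filter` clause, and whitespace tokenization with lowercasing. A Python dict gives average O(1) lookups, which is the analogy rather than how Lucene stores terms. `DOCS`, `INDEX` and `evaluate` are hypothetical names.

```python
from collections import defaultdict
from typing import Any, Dict, Set, Tuple

DOCS = {
    "1": {"title": "Quick brown fox"},
    "2": {"title": "Lazy dog"},
    "3": {"title": "Quick lazy fox"},
    "4": {"title": "Brown dog"},
}

# Inverted index: (field, term) -> ids of documents containing that term.
INDEX: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
for doc_id, doc in DOCS.items():
    for field, text in doc.items():
        for term in text.lower().split():  # terms are stored lowercased
            INDEX[(field, term)].add(doc_id)

ALL_IDS = set(DOCS)


def evaluate(query: Dict[str, Any]) -> Set[str]:
    """Return ids matching a {"term": ...} or {"bool": ...} clause (no scoring)."""
    if "term" in query:
        field, value = next(iter(query["term"].items()))
        return set(INDEX.get((field, value), set()))  # .get avoids creating keys
    clauses = query["bool"]
    must = clauses.get("must", [])
    should = clauses.get("should", [])
    must_not = clauses.get("must_not", [])
    result = set(ALL_IDS)
    for clause in must:  # AND
        result &= evaluate(clause)
    if should and not must:  # OR: with no must clauses, at least one should must match
        result &= set().union(*(evaluate(c) for c in should))
    for clause in must_not:  # NOT
        result -= evaluate(clause)
    return result


# quick AND (fox OR dog) AND NOT lazy, with the OR nested as an inner bool query
query = {
    "bool": {
        "must": [
            {"term": {"title": "quick"}},
            {"bool": {"should": [{"term": {"title": "fox"}},
                                 {"term": {"title": "dog"}}]}},
        ],
        "must_not": [{"term": {"title": "lazy"}}],
    }
}
print(sorted(evaluate(query)))  # ['1']
```

When `must` clauses are present, this sketch ignores `should` for matching; in Elasticsearch those clauses would still influence scoring.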
Thank you for listening!