An Elastic technical review.
PART 1 <<<
1 - Overview
2 - Elastic Overview
3 - Compare to MDB
PART 2
4 - Strengths, in Haiku
PART 3
5 - Use Cases
6 - Final Takeaways
This post is so long that it had to be broken up into 3 parts. Use “View Whole Thread” to view it all.
OVERVIEW
As I said before, EVERY COMPANY MUST be a tech-driven company. This new tech landscape is driving the hyper-growth stories we are seeking here, as companies sprout up to be the “picks and shovels” plays that are supplying this gold rush of technological innovation. They are creating new tools that are enabling companies to solve their own problems themselves, that are cross-applicable to every business, across every industry. And luckily for us, it still seems like the early innings!
So I’ve had a series of posts, doing technical deep dives to try to isolate what companies are doing to make their service so sticky, as well as other posts about the tech behind companies’ products:
- Insights from Elastic conference. https://discussion.fool.com/insights-from-elastic-conference-340…
- Cutting through the FUD on MDB. https://discussion.fool.com/cutting-through-the-fud-on-mdb-34145…
- An Okta technical review. https://discussion.fool.com/an-okta-technical-review-34177580.as…
- MDB goes mobile. https://discussion.fool.com/mdb-goes-mobile-34194422.aspx
I have spoken on MDB twice before, and haven’t felt the need to dive very deep into the technical details of their product line, as it seems pretty easily understood – once you know what a NoSQL document store is, you know what MongoDB excels at. But let’s walk through their history a bit and where it has put them strategically. [Reminder, I call the company MDB to differentiate product from company. Elastic thankfully makes it easier so I refer to them by name. Downside is this means I use the word “elastic” over 200 times here.]
MDB started by making an open-source NoSQL database, then it sold support and tooling for that database to enterprises that were using it for either their internal database or as an embedded database within their products. Once cloud computing took hold, MDB then started providing a managed, vendor-neutral, cloud hosting service for its core database… one that its customers flocked to, for its scale, high availability (HA), ease of use, and the fact it completely saves them money by eliminating costs around infrastructure and ongoing maintenance. MDB’s approach has them creating a core platform around MongoDB of tools that reduce customer friction – for either self-hosted or for managed Atlas. They have apps for data exploration (Compass) and a mgmt interface (Ops Manager or Cloud Manager). They have SaaS tooling around Atlas service, like a serverless platform (Stitch), a cloud migration tool, and a visualization dashboard tool (Charts, in beta). And as I discussed before, it’s now increasing customer flexibility and increasing the applicability of its platform with its moves into being a synchronized mobile database (also in beta, but, finally, with a major acquisition to help them move faster).
Elastic has an incredibly similar storyline to MongoDB – the database and the company – but its technology stack, the solutions it provides, and the TAM it has are a bit tougher to understand. So today, we dive deep into Elastic, the suite of technologies that underpin its appeal to customers, and the new product lines spinning out from that core.
How do I know the company’s products so well? Besides being a software developer that works with a lot of databases and data feeds, I have worked with Elasticsearch (not the full ELK stack) for the past 4 years, using it as a vital piece of my architecture. More recently, I have run some parts of my stack within the AWS environment the past year (more for the data storage resources than compute). I’m about to try using managed Elasticsearch in AWS, and, besides it being a data store within my stack, I’m also about to start using ELK for APM and monitoring of my stack. [No brainer that I should have implemented long ago, it’s just I don’t have the time; too many other interesting projects (around data streaming) to do!]
Warning: There is a lot to like in Elastic, in ways that excite me beyond what MDB is doing. But for that, you’ll need to do a lot of reading below to get to the Final Thoughts. But don’t just jump there… I recommend the middle bits too! Dammit, don’t miss the haiku! My last deep dive into the tech behind a company was Okta – and this deep dive is even longer. For one, I know the company way better. For two, it was worth diving in deeper into their strengths and strategies.
ELASTIC OVERVIEW
Elastic is known for its suite of products it calls the Elastic Stack. (It was first called the “ELK” stack after the first 3 products, and it’s still mostly called that.) After starting as a company focused on its Elasticsearch search database, Elastic shifted gears early on towards being a solution for specific use cases, when it acquired companies with complementary tools, then integrated them into a platform around its core engine.
Their Getting Started docs describe the core well enough: Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows you to store, search, and analyze big volumes of data quickly and in near real time. It is generally used as the underlying engine/technology that powers applications that have complex search features and requirements. https://www.elastic.co/guide/en/elasticsearch/reference/curr…
Elastic Stack is:
- Elasticsearch, the search and analytics database at the core.
- Logstash, the data processing and transformation pipeline, for data ingestion into Elasticsearch.
- Kibana, the visual interface over Elasticsearch, with data visualization dashboards and a cluster & data management interface.
- Beats, light-weight data shippers utilized for transmitting monitoring data from network and systems, and ingesting them into Elasticsearch.
- “Features” (formerly X-Pack) are modules that enhance the capabilities of Elastic Stack, such as adding cluster monitoring, alerting, data security, reporting, machine learning (ML), and a visual presentation app called Canvas. [Not sure why Elastic decided to now blandly call it “Features”. I guess they let the new marketing intern have a go at it.]
The Product Line:
Elasticsearch (ES) is the open-source NoSQL database at the heart of the stack, which provides search and analytic capabilities over your data. It is built over the open-source Apache Lucene indexing engine (created by Doug Cutting, eventual creator of Hadoop), but with a focus on cluster capabilities to manage and search over ever-growing datasets. It is distributed software, making the engine as powerful as the cluster hardware it is installed on, and the cluster can easily expand over time. A developer uses a REST API interface or native Java libraries to store JSON data, and can then search or analyze that data via the query interface. Bottom line - if you are slicing and dicing over large data (hundreds of GB or more) or big data (hundreds of TB or more) for search or analytical purposes, Elasticsearch is ideal. It is popular, having 40k stars on Github and a high DB-Engines ranking (#7 overall and #1 for search). There are a few competitors in the open-source space… but as I discuss later in detail, the real competition to Elastic is elsewhere.
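To make the store-then-search flow a bit more concrete, here is a minimal sketch of the two REST calls involved. It only builds the requests in plain Python rather than sending them anywhere. The index name `articles`, the field names, and both helper functions are made up for illustration; the `_doc` and `_search` endpoints and the `match` query are standard Elasticsearch, though exact paths have shifted between major versions, so treat this as an assumption-laden sketch and not a reference.

```python
import json


def index_request(index, doc_id, document):
    """Build the REST call that stores one JSON document (hypothetical helper)."""
    return {
        "method": "PUT",
        "path": f"/{index}/_doc/{doc_id}",
        "body": json.dumps(document),
    }


def search_request(index, field, text, size=10):
    """Build a full-text search call using the query DSL's `match` query."""
    query = {"size": size, "query": {"match": {field: text}}}
    return {
        "method": "POST",
        "path": f"/{index}/_search",
        "body": json.dumps(query),
    }


# Hypothetical document and index name, for illustration only.
doc = {"title": "An Elastic technical review", "author": "muji"}
store = index_request("articles", 1, doc)
find = search_request("articles", "title", "elastic review")

print(store["method"], store["path"])
print(find["method"], find["path"], find["body"])
```

Everything in and out is JSON over HTTP, which is a big part of why it drops into just about any stack.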
Alternate open-source search-based databases on the market are:
- Apache Solr, which is also based on Lucene. But it pales in comparison, having only 2.5k vs 40k stars for ES on Github as a sign of its popularity, and #16 in the DB-Engines ranking. It has very little of the surrounding ecosystem of tools that ES has. It emerged out of CNET, and later was merged with the Apache Lucene project itself. A company, Lucidworks, was created to support it, and they created their own enterprise edition of Solr called Fusion. [Contrary to anecdotal commentary on a recent post, this is not a company I would look to as supplanting Elastic in any way. But yes, it is a direct competitor.]
- Apache Druid, an OLAP/BI analytics engine, which has 8k stars on Github. It’s also a clustered data engine but is a lot more convoluted & complex to run. It came out of work done at Metamarkets, a marketing analytics SaaS company, and now has an enterprise company, Imply, supporting it. Druid is a columnar store, which means it stores the values of each field together, rather than as separate rows. This allows for advanced analytics. [I’d keep my eye on it, but ultimately it isn’t gaining much momentum, likely due to the complexity of the cluster setup required.]
Other competition comes from other scalable NoSQL databases, like MongoDB or Cassandra. Developers may prefer and pick those, but their search and analytical capabilities pale in comparison to what Elasticsearch is capable of. Plus there is no tooling around those for doing ingest, like Logstash and Beats.
Kibana is the visual interface over Elasticsearch. At its core, it’s a visualization dashboard web app that allows you to rapidly graph ad-hoc queries against your ES data, and create persistent visualization dashboards. It is pretty similar to another open-source visualization dashboard, Grafana, but is more closely tied to having ES as the underlying database, and, unlike Grafana, provides a mgmt interface over your ES cluster and its data. Kibana also includes “out-of-the-box” dashboards for specific server apps you are monitoring with Beats. As you hook up monitoring over your server-side apps (say a PostgreSQL database or an Nginx web server) with Filebeat and Metricbeat, you can use Kibana’s dashboard templates honed for that specific application as a starting point, then customize them from there as desired. It has plug-in modules for time-series data visualization (Timelion), geospatial visualization over maps (Elastic Maps), and exposes dashboards over many of the X-Pack features like ML anomaly detection and APM monitoring.
Logstash is the ingestion piece that continuously reads in logs from various servers, transforms log entries into JSON objects, and ingests them into ES. It has a rich system of data pipeline steps, where you can convert, enrich, filter and transform log data prior to ingestion.
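To show what those pipeline steps amount to, here is a rough Python sketch of the idea. To be clear, this is not Logstash configuration or anything Elastic ships. It is just my illustration of a convert, filter, enrich flow that ends in JSON objects ready for ingestion. The log layout, the field names, and the `/health` filter rule are all assumptions I made up for the example.

```python
import json
import re

# Hypothetical access-log layout: "<ip> <METHOD> <path> <status> <millis>"
LINE_PATTERN = re.compile(
    r"(?P<ip>\S+) (?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3}) (?P<millis>\d+)"
)


def parse(line):
    """Convert: turn a raw log line into a dict, or None if it does not match."""
    match = LINE_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    event["status"] = int(event["status"])
    event["millis"] = int(event["millis"])
    return event


def keep(event):
    """Filter: drop health-check noise before it reaches the index."""
    return event["path"] != "/health"


def enrich(event):
    """Enrich: add a derived field that is handy to query on later."""
    event["is_error"] = event["status"] >= 500
    return event


def pipeline(lines):
    """Run each line through parse -> filter -> enrich, yielding JSON strings."""
    for line in lines:
        event = parse(line)
        if event is not None and keep(event):
            yield json.dumps(enrich(event), sort_keys=True)


raw_lines = [
    "10.0.0.5 GET /search 200 12",
    "10.0.0.9 GET /health 200 1",
    "10.0.0.7 POST /orders 503 840",
    "not a log line",
]
events = list(pipeline(raw_lines))
for item in events:
    print(item)
```

Of the four input lines, only the two real requests come out the other side: the health check is filtered and the unparseable line is dropped.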
Beats is a collection of light-weight “data shippers”, each specific to collecting a type of data feed from remote servers or devices. This includes a log file shipper, metric shipper, network traffic monitoring and more. They are installed as system agents on your servers, which allow you to continuously collect data and ingest it into ES. Each beat comes with a wide variety of server apps it can work with out-of-the-box. As mentioned before, Kibana on the other end includes sample dashboards for each server app, honed to that app’s metrics or logs.
The Beat flavors are:
- Filebeat for log file monitoring. It can tie into logs from server apps (like databases or web servers) for log monitoring across your infrastructure, and can do container log monitoring of Docker and Kubernetes.
- Metricbeat for real-time server metric monitoring, including metrics from system (like cpu/disk/memory usage, or temperature readings) or server app (like databases or web servers). It can perform real-time metric monitoring of your stack, and can do container metric monitoring of Docker and Kubernetes.
- Packetbeat for real-time network traffic & latency monitoring.
- Auditbeat/Winlogbeat for system-level auditing of Linux/Windows systems.
- Heartbeat for system uptime monitoring. Simpler & lighter uptime check than Metricbeat.
- Functionbeat for real-time serverless function monitoring (i.e. monitor your AWS Lambda function).
Elastic has built up a collection of CODE-FREE out-of-the-box solutions here within Beats and Kibana for monitoring use cases. Elastic has curated a list of log & metric Beats packs and sample Kibana dashboards for a wide-variety of popular server-side applications (like PostgreSQL, MySQL, MongoDB, Cassandra, Nginx, Redis, Kafka, Kubernetes, and even Elasticsearch itself). For each system within your architecture, like your PostgreSQL database, you can use Filebeat to pull in logs, and Metricbeat to pull in real-time metrics; then within Kibana, you can pull in the PostgreSQL-specific dashboard templates, so that you can start immediately visualizing those metrics and logs. From there, you can then customize the dashboards and the queries used on the visualizations and reports within, as desired.
Beats, with its focused modules and “out-of-the-box” dashboards within Kibana, forms a monitoring solution that has competition:
- InfluxDB, a time-series database, competes on certain use cases like monitoring. It has a TICK stack that is similar to the ELK stack, with tools for ingest and metric shippers. For visualization you must rely on outside tools like Grafana. The enterprise company that built it is InfluxData, and they, of course, offer managed cloud hosting. I spoke with an Elastic VP at their ElasticON conference last year, and he didn’t see customers picking InfluxDB over Elastic Stack. Perhaps a slanted view, as I do see InfluxDB in use at my work and in the field.
- The open-source Prometheus monitoring platform is popular. It too must be paired with the Grafana visual dashboard.
Features/X-Pack are modules within the Elastic Stack that enhance the platform, help focus Elastic Stack on specific use cases, and help manage your cluster. [I added marketing links to the higher-impact ones.]
Modules include:
- Monitoring module for cluster monitoring.
- Security module for role-based access security over your data, down to document/field level.
- Alerting module for cluster alerting & data monitoring via queries (get notifications on spotted errors).
- Reporting module to generate, schedule, email reports. Generate PDFs of your Kibana dashboards.
- SQL module to allow for ES querying via well-known SQL data querying language. (It limits the search capabilities from what you have in the API, but is helpful for developers.)
- Hadoop Connector to connect Elasticsearch directly to Hadoop for querying.
- Graph tool to explore relationships in your data (use cases: fraud detection, security analysis, user recommendations). https://www.elastic.co/products/stack/graph
- ML for performing machine learning over your data and visualizing results (use cases: detect anomalies, isolate patterns, pinpoint causes, demand forecasting). https://www.elastic.co/products/stack/machine-learning
- Canvas app for creating presentation visualizations from real-time queries (aka over live data). Great for real-time demo presentations and interactive info-graphics. https://www.elastic.co/products/stack/canvas
The X-Pack modules all used to be Premium, and only came with Enterprise Support tiers. However, back in April 2018 they changed their licensing. (MDB soon followed suit with their licensing change in October 2018.) They made a new pricing tier, Basic, that includes several modules that are now free. [More later on the licensing changes.]
There are a few other ancillary services and products that Elastic has:
- APM Server (server-side app) for collecting APM metrics from your application code and saving it to ES. Not officially a “Feature” module, as it is a stand-alone server-side app. Definitely not a Beat, either, as it requires code changes - you embed Elastic’s SDK into your code so it can start streaming live metrics from your app to the APM Server. APM may allow you to bypass needing to pull in app logs via Filebeat - it’s akin to hooking up Metricbeat directly into your app.
- Elastic Map Service to provide high-quality global and regional maps, with which you can overlay geospatial data in Kibana. https://www.elastic.co/elastic-maps-service
- Elastic Common Schema is an effort from Elastic to try to unify data schemas for common activities (logs, metrics, APM, networking data). It is an attempt to unify how to store like-data from different sources (say, Cisco’s firewall logs vs Fortinet’s). Elastic is hoping to convert various data sources into a single common schema, which allows you to simplify the search, analysis and visualization queries you do on that data.
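The common schema idea is easiest to see with a toy example, so here is a small Python sketch of it. The two vendors and their raw field names are invented (I am not reproducing any real firewall’s log format), and the mapping tables are my own assumption about how you might wire this up. The target names `source.ip`, `destination.ip` and `event.action` follow ECS-style dotted naming.

```python
# Hypothetical raw firewall records; the vendor-side field names are made up.
vendor_a_record = {"src": "10.1.1.4", "dst": "8.8.8.8", "act": "deny"}
vendor_b_record = {"srcip": "10.1.1.9", "dstip": "1.1.1.1", "action": "accept"}

# Per-vendor mapping from raw field name to a common, ECS-style field name.
FIELD_MAPS = {
    "vendor_a": {"src": "source.ip", "dst": "destination.ip", "act": "event.action"},
    "vendor_b": {"srcip": "source.ip", "dstip": "destination.ip", "action": "event.action"},
}


def normalize(vendor, record):
    """Rename a vendor's raw fields to the common schema, dropping unmapped ones."""
    mapping = FIELD_MAPS[vendor]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def blocked_sources(events):
    """One query over the common schema, regardless of which vendor sent the event."""
    return [event["source.ip"] for event in events if event["event.action"] == "deny"]


events = [
    normalize("vendor_a", vendor_a_record),
    normalize("vendor_b", vendor_b_record),
]
print(events)
print(blocked_sources(events))
```

The payoff is in `blocked_sources`: once everything is renamed on the way in, the search, dashboard or ML job only has to be written once.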
Getting Support:
There are 4 tiers of support subscriptions. https://www.elastic.co/subscriptions
- Open-Source (free) - includes ELK, APM Server, most of Beats, limited Elastic Map Service
- Basic (free) - includes free X-Pack modules (monitoring, SQL), APM Server UI, and all of Beats and Elastic Map Service
- Gold - provides biz hours support, plus includes rest of X-Pack (w/ only the basic Security, and no ML), and Elastic Monitoring service
- Platinum - includes 24/7/365 support, plus all the above, adding full ML & full Security modules (SSO, ACLs down to document field, encryption at rest), cross-cluster replication.
I griped above about how X-Pack is now just referred to as “Elastic Stack Features”, a kind of bland label instead of a product name. But after reviewing that pricing chart showing features that turn on and off per tier, they are clearly blending in these modular features now; they are now directly embedded within their product lines, like ML and Security both being heavily integrated across all of Elastic Stack. I think they are losing “X-Pack” because they don’t want to think of them as separate plug-ins – they are integrated modules, and that has only expanded across other parts of the Stack. Kibana, Beats and Logstash have internal modules that turn off and on. This is why the open-source purists are upset – there are proprietary modules embedded inside open-source packages. But, it seems Elastic is pretty clear about what is and is not included in each tier. Though it’s a bit too detailed, perhaps - they need some higher level overview of what modules are turning on and off. Support levels are spelled out clearly. You have to go Platinum tier for ML capabilities and advanced Security features. For Gold & Platinum support levels, they offer custom training and consulting services for additional fees.
Hosting:
As for hosting their stack, a customer has the typical options plus an extra one for running your own on-premise cloud:
- Host it themselves (self-managed, self-hosted), either on-premise or in the cloud (on EC2 or Docker instances). It’s complicated software to configure, but obviously doable. I’ve used it entirely for free thus far – but elsewhere in my company they just bought enterprise support to use Elastic Stack for monitoring infrastructure (system logs via Filebeat and metric collection via Metricbeat). It’s up to the customer if they need support and the added features enabled by Gold/Platinum tiers.
- Elastic Cloud is their hosting service, where Elastic can host and manage Elastic Stack clusters for you in your cloud-provider of choice. Elastic Cloud service came from their acquisition of Found in early 2015. MDB followed suit, and released Atlas service in mid-2016. Managed vendor-neutral cloud hosting is the big reason these companies are growing revenue so strongly.
- Elastic has a 3rd option, Elastic Cloud Enterprise, which allows a company to deploy Elastic Cloud onto its own infrastructure via Docker, and use it as an internal, on-premise cloud where they can create and manage multiple Elastic Stack clusters.
COMPARE TO MDB
There are many similarities between them …
- Both are focused around a core open-source NoSQL document store, accessed via JSON-based REST APIs or native libraries. Both engines are cluster-able and can be horizontally scaled easily (by adding more nodes to the cluster). Both data engines are built on replicated shards, which enable high availability (HA), resiliency and scale.
- Both founders created companies around that open-source database engine that provided enterprise support and continued adding features, tools and eventually platforms around their core database. Both then expanded to create managed vendor-neutral cloud hosting of their data engine (MongoDB Atlas vs Elastic Cloud). One strength for both over the cloud vendors: the fact that the authors and maintainers of these complicated clustered database engines are the ones best suited to running a managed instance. Put the experts in charge!
- Both provide platforms containing tools around that core database. Both have apps for managing the cluster, data exploration and visualization. When you buy MDB Enterprise Advanced subscription, beyond enterprise support you get to use their mgmt interface app (Ops Manager or Cloud Manager) for monitoring and backup, advanced modules for security & analytics, data visualization tool (Compass), and also get a commercial license to embed Mongo in your released product. When you buy an Elastic Stack Enterprise subscription, you get enterprise support as well as expanded capabilities of X-Pack plugins for security and ML.
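For anyone fuzzy on what “replicated shards” buys you in the first similarity above, here is a toy model in Python. It is a deliberate simplification and not how either engine actually allocates shards: the hash (crc32 as a stand-in), the round-robin placement, and the node names are all assumptions for illustration. The idea it shows is that each document hashes to one shard, and each shard has copies on different nodes so that losing a node loses no data.

```python
import zlib


def primary_shard(doc_id, num_primary_shards):
    """Route a document to a shard by hashing its id (crc32 as a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def place_shards(num_primary_shards, num_replicas, nodes):
    """Put every copy of a shard on a different node, round-robin.

    Returns {shard: [node_for_primary, node_for_replica_1, ...]}.
    """
    if num_replicas + 1 > len(nodes):
        raise ValueError("need at least one node per copy of a shard")
    layout = {}
    for shard in range(num_primary_shards):
        layout[shard] = [
            nodes[(shard + copy) % len(nodes)] for copy in range(num_replicas + 1)
        ]
    return layout


def surviving_shards(layout, failed_node):
    """Shards that still have at least one copy after a node is lost."""
    return [
        shard
        for shard, copies in layout.items()
        if any(node != failed_node for node in copies)
    ]


nodes = ["node-1", "node-2", "node-3"]
layout = place_shards(num_primary_shards=3, num_replicas=1, nodes=nodes)
print(layout)
print(surviving_shards(layout, "node-2"))
print(primary_shard("doc-42", 3))
```

With 3 shards, 1 replica each and 3 nodes, any single node can fail and every shard still has a live copy. That is the HA part; the scale part comes from spreading the shards over more nodes as the cluster grows.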
Yet, some major differences …
- MongoDB is a general-purpose document store with a wide set of use cases. Elasticsearch, having much better search & analytical capabilities and more flexible scaling, is a specific-to-purpose document store with a narrower [but expanding] set of use cases. If you manage a collection of data objects that has infrequent search or analytics needs, you pick MDB. For data that you need to slice and dice continually with queries to search and analyze it, you pick Elastic. And for data that is “ever growing”, you pick Elastic.
- Given this more limited set of use cases, Elastic has had to fight harder. They have rapidly expanded their product line by acquisition, adding tools and services that helped build their core database into a platform. MDB is building its platform itself, and IMHO is consequently moving way slower. Their new product Charts seems too little, too late – there are many other viz dashboarding tools that do this already (from open-source Grafana to proprietary Tableau).
- Both have an “Open Source” focus, and both try to address having cloud-providers become competitors, using their own database against them. However, they have different approaches on how to address their open-source licensing to combat competition. MDB is trying to prevent cloud-providers from running a hosted cloud service using MongoDB (making them direct competition against their own Atlas service), by changing the licensing on their core database (making the OSS purists angry). Elastic is keeping the core database fully open-source (Apache 2.0 license), but is changing the licensing and open-source strategy of their bundled ‘mostly free’ X-Pack modules (also making the OSS purists angry, but this time it’s about the fact it’s bundled in open-source ELK). Elastic seems content to let cloud-vendors be competitors to Elastic Cloud, and letting their feature-rich modules be the differentiator.
- Speaking of ecosystem, Elastic released a set of plug-ins for the ELK stack in 2016 that they called X-Pack, for non-core features like monitoring, security, alerting, and reporting. They started with all plug-ins being Premium (enterprise license required). Now the source code is publicly available (though not open-source), and these modules are free-to-use in their Basic tier. A subset of them are still premium and require an enterprise subscription. The Elastic Stack releases are bundling the open-source and Elastic licensed modules together. (Yes, this may cause some licensing confusion.)
- While both MDB and Elastic have a managed cloud-hosting service plus provide users support for their self-hosted & self-managed databases, Elastic has a 3rd option - Elastic Cloud Enterprise (also from the Found acquisition). It allows their Elastic Cloud product to be installed on your on-premise infrastructure, so you can easily manage multiple ELK clusters on an internal cloud.
- Elastic isn’t building a cloud side and an on-prem side to their platform like MDB is. It’s all Elastic Stack in the Elastic Cloud, just hosted at whatever cloud provider the customer desires, and managed by the finest experts one could find – thems that wrote it! There isn’t tooling appearing in Elastic Cloud that isn’t in core platform, unlike MDB with their Stitch serverless platform. However, the downside is that their Elastic Stack releases must bundle the proprietary modules side-by-side with the open-source products.
- One striking difference as I walked through the product line is the number of use cases it solves that DO NOT INVOLVE CODE. MDB is for developers only, to embed into their application stack. Elastic is for that, but also for non-developers to use without needing any custom development. IT can hook up Beats for monitoring infrastructure or network traffic. Enterprise users can feed in datasets with Logstash, for staff to query, visualize, or apply ML in Kibana. I expect this trend to continue, as it really opens up the applicability as to who can use the product line.
- Best of all, Elastic is making exciting moves that are moving their company beyond being a do-it-yourself tool provider. There is something afoot! [More on that soon under TAM section. Keep reading! But I’ll give you a hint, it rhymes with “class”.]
To be continued, in Part 2 (due to TMF post size limit)…
-muji
long ESTC (7%)