DB Engines Types
External Links
@[https://db-engines.com/]
@[https://db-engines.com/en/ranking]
@[https://dzone.com/articles/rant-there-is-no-nosql-data-storage-engine]
Consistency models
(in distributed databases)
ºFive consistency modelsº natively supported by the A.Cosmos DBºSQL APIº
(SQL API is default API):
- native support for wire protocol-compatible APIs for
popular databases is also provided including
ºMongoDB, Cassandra, Gremlin, and Azure Table storageº.
RºWARN:º These databases don't offer precisely defined consistency
models or SLA-backed guarantees for consistency levels.
They typically provide only a subset of the five consistency
models offered by A.Cosmos DB.
  - For SQL API|Gremlin API|Table API, the default consistency level
configured on theºA.Cosmos DB accountºis used.
Comparative Cassandra vs Cosmos DB:
Cassandra 4.x Cosmos DB Cosmos DB
(multi-region) (single region)
ONE, TWO, THREE Consistent prefix Consistent prefix
LOCAL_ONE Consistent prefix Consistent prefix
QUORUM, ALL, SERIAL Bounded stale.(def) Strong
Strong in Priv.Prev
LOCAL_QUORUM Bounded staleness Strong
LOCAL_SERIAL Bounded staleness Strong
Comparative MongoDB 3.4 vs Cosmos DB
MongoDB 3.4 Cosmos DB Cosmos DB
(multi-region) (single region)
Linearizable Strong Strong
Majority Bounded staleness Strong
Local Consistent prefix Consistent prefix
Distributed DB Eventual/Strong Consistency
@[https://hackernoon.com/eventual-vs-strong-consistency-in-distributed-databases-282fdad37cf7]
dbEngine: Data Topology compared
Key-value: wide-Column: Document Graph RDMS
o -- O o -- O O O O ... O O→O→O... (Hyper-Relational)
o -- O o -- O O O O ... O ↘ O → O ←─┐ Table1 Table2 Table3
o -- O o -- O O O O ... O O→O ↓ │ O ──┐ O ←─┐ O
... ... ↘ ┌─ O │ O │ O ←───→ O
O→O │ ↓ │ O └─→ O └─→ O
↘ │ O → O
O │ ↘ (Less powerful than
└────→ O Graph but easier to
Graph represent the mantain)
most general
CSV
csvkit
• csvkit: suite of command-line tools for converting to and working with CSV.
• By default, it sniffs CSV formats to deduce whether commas/tabs/spaces
  delimit fields, and performs type inference (converting text to numbers, dates,
  booleans, etc.).
  The inference system can be tuned.
$º$ in2csv data.xls ˃ data.csv º ← Convert Excel → CSV
$º$ in2csv data.json ˃ data.csv º ← Convert JSON → CSV
$º$ csvcut -n data.csv º ← Print column names
$º$ csvcut -c col_a,col_c \ º ← Select a subset of columns
$º data.csv ˃ new.csv º
$º$ csvcut -c col_c,col_a \ º ← Reorder cols:
$º data.csv ˃ new.csv º
$º$ csvgrep -c phone_number \ º ← Find rows with matching cells:
$º -r "555-555-\d{4}" \ º
$º data.csv ˃ new.csv º
$º$ csvjson data.csv ˃ data.json º ← Convert to JSON
$º$ csvstat data.csv º ← Generate summary statistics
$º$ SQL="" º
$º$ SQL="${SQL} SELECT name" º
$º$ SQL="${SQL} FROM data " º
$º$ SQL="${SQL} WHERE age ˃ 30 º"
$º$ csvsql --query "${SQL}" \ º← Query with SQL
$º data.csv ˃ new.csv º
$º$ csvsql \ º← IMPORT INTO PostgreSQL
$º --db postgresql:///... \ º
$º --insert data.csv º
$º$ sql2csv \ º← EXPORT FROM PostgreSQL
$º --db postgresql:///.. \ º
$º --query "select * from data" \ º
$º ˃ new.csv º
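  · Hedged alternative sketch (not csvkit itself): the same "SQL over CSV" idea
    using only the Python standard library (csv + in-memory sqlite3). The file
    name data.csv and the (name, age) columns are assumptions for illustration:

      import csv
      import sqlite3

      conn = sqlite3.connect(":memory:")         # throw-away in-memory database
      with open("data.csv", newline="") as fh:   # assumed input file
          rows = list(csv.DictReader(fh))        # header row gives the dict keys

      conn.execute("CREATE TABLE data (name TEXT, age INTEGER)")  # assumed schema
      conn.executemany("INSERT INTO data (name, age) VALUES (:name, :age)", rows)

      for (name,) in conn.execute("SELECT name FROM data WHERE age > 30"):
          print(name)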
RDBMS
RDBMS
ºRDBMSº
ºFeaturesº
- support E-R data model.
- all inserts in a table must match the table schema.
    (vs dangerous schema-less NoSQL engines).
- The minimum insert unit corresponds to a row in a table.
- The minimum update unit corresponds to a column in a row.
- basic operations are defined on the tables/relations through SQL:
- CRUD: Create/Read/Update/Delete
- set ops: union|intersect|difference
- subset selection defined by filters
- Projection of subset of table columns
- JOIN: combination of:
Cartesian_product+selection+projection
- TX ACID control
- user management
- ºHistoryº
- beginning of 1980s
- Most widely used DBMS
ºMain Examplesº
- PostgreSQL
- Oracle
- MySQL/MariaDB
- @[http://myrocks.io/] ← Built on top of key-value RocksDB.
optimized for SSD/Memory.
- SQLite : C Embedded/Server SQL engine.
             DQLite stands for (d)istributed SQLite.
- H2: Robust Java JVM embedded SQL engine (server mode also supported)
      that also allows in-memory (RAM-only) databases.
- TiDB
- MySQL compatible
- RAFT-distributed
- Rust/go written
- Features:
- "infinite" horizontal scalability
- strong consistency
- HA
- SQL Server
- DB2
- Hive (https://db-engines.com/en/system/Hive)
- Home: https://hive.apache.org/
- RºWARNº: No Foreign keys, NO ACID
-ºEventual Consistencyº
- data warehouse software facilitates reading,
writing, and managing large datasets residing in
distributed storage using SQL. (Hadoop,...)
- 2.0+ support Spark as execution engine.
- Implemented in Java
- supports analysis of large datasets
stored in Hadoop's HDFS and
compatible file systems such as
      Amazon S3 filesystem.
- SQL-like DML and DDL statements
    Traditional SQL queries must be implemented in the
    MapReduce Java API to execute SQL apps:
    - the necessary SQL abstraction is provided to
integrate SQL-like queries (HiveQL) into
the underlying Java without the need
for low-level queries.
- Hive aids portability of SQL-based apps
to Hadoop
- JDBC, ODBC, Thrift
- SQL "SUMMARY":
BEGINNER: INTERMEDIATE: ADVANCED:
======================== ================================ =======================
        · Types of SQL           · LIKE, AND/OR and DISTINCT      · JOINS ... UNIONS ...
        · Table, Rows, Columns   · NULLs and Dates                · OVER ... PARTITION BY ...
        · SELECT ... FROM ...    · COUNT, SUM, AVG ...            · STRUCT ... UNNEST ..
          WHERE ...              · GROUP BY ... HAVING...         · Regular Expressions
ORDER BY ... · Alias, Sub-queries and WITH ...AS ..
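        · The intermediate/advanced topics above as a tiny runnable sketch
          (Python + in-memory SQLite; table and column names are made up):

            import sqlite3

            db = sqlite3.connect(":memory:")
            db.executescript("""
                CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
                CREATE TABLE emp  (name TEXT, dept_id INTEGER, salary REAL);
                INSERT INTO dept VALUES (1,'IT'), (2,'HR');
                INSERT INTO emp  VALUES ('Ann',1,50000), ('Bob',1,40000), ('Eve',2,30000);
            """)

            query = """
                SELECT d.name, COUNT(*) AS headcount, AVG(e.salary) AS avg_salary
                  FROM emp e JOIN dept d ON d.id = e.dept_id   -- JOIN
                 GROUP BY d.name                               -- GROUP BY ... HAVING
                HAVING COUNT(*) > 1
                 ORDER BY avg_salary DESC"""
            for row in db.execute(query):
                print(row)                                     # ('IT', 2, 45000.0)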
Materialize: Stream 2 PSQL
@[https://github.com/MaterializeInc/materialize]
- Streaming database for real-time applications, accepting input data from
Kafka, CSV files, ... to query them using SQL.
- ask questions about live data
- Refreshed answers in millisecs.
- Designed to help interactively explore streaming data.
- perform data warehousing analytics against live relational data.
- increase the freshness and reduce the load of dashboards/monitoring tasks.
- provide correct/consistent answers with minimal latency.
(vs approximate answers/eventual consistency).
- recast SQL92 queries as dataflows.
- Supports a large fraction of PostgreSQL, and the team is actively working
  to support the built-in PostgreSQL functions.
dqlite.io
@[https://dqlite.io/]
• Dqlite (“distributed SQLite”) extends SQLite across a cluster of machines, with
automatic failover and high-availability to keep your application running. It
uses C-Raft, an optimised Raft implementation in C, to gain high-performance
  transactional consensus and fault tolerance while preserving SQLite’s
outstanding efficiency and tiny footprint.
SchemaSpy
@[http://schemaspy.org/]
• SchemaSpy generates HTML documentation from a database,
  including Entity-Relationship diagrams.
• Basic usage:
INPUT PROCESSING OUTPUT
========= ============= =================
running → schemaspy → html report with:
ddbb (+graphviz) - table/view/columns
└────┬────┘ - constrains/routines summary and stats.
· -ºE/R diagram.º
· - Orphan tables.
· -ºanomalies.º
·
┌──────────────┴─────────────────────┐
(JDK and graphviz pre-installed)
$º $ java -jar schemaspy-6.1.0.jar \ º
$º -t pgsql \ º
$º -dp postgresjdb.jar \ º
$º   -host $DDBB_HOST \            º
$º   -port $DDBB_PORT \            º
$º -u $DDBB_USER \ º
$º -p $PASS \ º
$º -o $OUTPUT_HTML_REPORT_DIR º
SchemaCrawler (TODO)
- "Unix friendly" generating mermaid diagram output.
Graph_DBMS
Graph DBMS
- represent data in graph structures of nodes (entities) and edges(relations).
- Relations are first-class citizens (vs foreign keys in SQL ddbbs,
"Relational DDBBs lack relationships, some queries are very easy, some
others require complex joins" ).
- BºModeling real world is easier than in SQL/RDBMs. º
  BºAlso, they are resilient to model/schema changes.º
- Sort of "hyper-relational" databases.
- Two properties should be considered in graph DDBBs:
└ºunderlying storageº:
- native optimized graph storage
   - Non-native (serialization to relational/document/key-value/... ddbbs).
└ºprocessing engineº:
-ºindex-free adjacencyº: ("true graph ddbb"), connected nodes physically
"point" to each other in the database.
BºMuch faster when doing graph-like queries.º Query performance
     remains (approximately) constant in "many-joins" scenarios, while RDBMS
     performance degrades exponentially with JOINs.
Time is proportional to the size of the path/s traversed by the query,
(vs subset of the overall table/s).
Rº(usually) they DON'T provide indexes on all nodes:º
Rº- Direct access to nodes based on attribute values is not possibleº
Rº or slower.º
-ºnon-index-free adjacencyº: External indexes are used to link nodes.
BºLower memory use.º
Native |·Infi.Graph ·OrientDB ·Neo4j
|·Titan ·Affinity ·*dex
^^^^^ |
Graph |
Processing | ·Franz Inc
vvvvv | ·Allegro Graph
|·FlockDB ·HyperGraphDB
└---------------------------------
non ← Graph → Native
Native Storage
- Graph Compute Engines execute graph algorithms against large datasets.
- Example algorithms include:
  - Identify clusters in the data
  - Answer "how many relationships does a node have on average"
  - Find "paths" to get from one node to another.
- It can be independent of the underlying storage or not.
- They can be classified by:
- in-memory/single machine (Cassovary,...)
  - Distributed: Pegasus, Giraph, ... Most of them based on the white-paper
    BºPregel: A system for large-scale graph processingº (2010 ACM)
@[https://dl.acm.org/doi/10.1145/1807167.1807184]
"Many practical computing problems concern large graphs. ... billions of
vertices, trillions of edges ... In this paper we present a
computational model suitable for this task. "
- GQL: ISO standard work in progress integrating the
ideas from openCypher (Neo4j contribution), PGQL, GSQL,
and G-CORE languages.
- @[https://www.gqlstandards.org/]
- OpenCypher support:
      - Neo4jDB          - Cypher for Spark/Gremlin
      - AgensGraph       - Memgraph
- RedisGraph - inGraph
- SAPHANA Graph - Cypher.PL
BºExamplesº
- Neo4j
- AGE: @[https://www.postgresql.org/about/news/2050/]
- multi-model graph database extension for PostgreSQL
- Support for Subset of Cypher Expressions through
@[https://www.opencypher.org/]
- AWS Neptune
- Azure Cosmos DB
- Datastax Enterprise
- OrientDB
- ArangoDB
- there areºtwo primary implementationsºboth consisting of
nodes(or "vertices") and directed edges (or "links").
  Data models look slightly different in each implementation.
Both allow properties (attribute/value pairs) to be
associated with nodes.
  -ºproperty graphs (PG)º
They also allow attributes for edges.
- no open standards exist for schema definition,
query languages, or data interchange formats.
-ºW3C Resource Description Framework (RDF)º
Node properties treated simply as more edges.
     (techniques exist to express edge properties in RDF).
- graphs can be represented as edges or "triples":
edge triple = ( starting point, label , end point) or
       edge triple = (subject, predicate, object)
(Also called "triple stores")
- RDF forms part of (a suite of) standardized W3C specs.
built on other web standards, collectively known as
BºSemantic Web, or Linked Dataº. They include:
- schema languages (RDFS and OWL):
- declarative query language (SPARQL)
- serialization formats
- Supporting specifications:
- mapping RDBMs to RDF graphs
-Bºstandardized framework for inferenceº
Bº(ex, drawing conclusions from data)º
- Eclipse rdf4j: https://rdf4j.org/
OOSS modular Java framework for working with RDF data:
parsing/storing/inferencing/querying of/over such data.
- easy-to-use API
- can be connected to all leading RDF storage solutions.
with Two out-of-the-box RDF databases (the in-memory store
and the native store).
- connect with SPARQL endpoints to leverage the power
of Linked Data and Semantic Web.
with SPARQL 1.1 query and update language
      - The framework offers a large set of tools and allows working with remote
        repositories using the exact same API as for local access.
- supports mainstream RDF file formats:
- RDF/XML, Turtle, N-Triples, N-Quads, JSON-LD, TriG and TriX.
- Used for example in semanticturkey:
http://semanticturkey.uniroma2.it/doc/dev/
PG resemble conventional data structures in an isolated
application or use case, whereas RDF are originally designed
to support interoperability and interchange across
independently developed applications.
- (@[https://www.allthingsdistributed.com/2019/12/power-of-relationships.html])
"...Because developers ultimately just want to do graphs, you
can choose to do fast Apache TinkerPop Gremlin traversals for
property graph or tuned SPARQL queries over RDF graphs..."
- TinkerPop:
@[http://tinkerpop.apache.org/]
open source, vendor-agnostic, graph computing framework.
When a data system is TinkerPop-enabled, users are able
to model their domain as a graph and analyze that graph
using the Gremlin graph traversal language.
...Sometimes an application is best served by an in-memory,
transactional graph database. Sometimes a multi-machine
distributed graph database will do the job or perhaps the
application requires both a distributed graph database for
real-time queries and, in parallel, a Big(Graph)Data processor
for batch analytics.
BºGraph Management Software Summaryº
- Exploitation:
- Visualization: Gephi, Cytoscape, LinkUrious
- APIs: Apache TinkerPop,
- DDBB:
- JanusGraph, Neo4j, DataStax, MarkLogic, ...
- Processing Systems:
- Spark, Apache Giraph, Oracle Labs PGX,
- Storage:
- Hadoop, Neo4j, Apache HBase, Cassandra.
RDF vs Graph
@[https://www.youtube.com/watch?v=t1Mn178sEYg]
  Both RDF and LPG are types of Graph databases.
RDF: w3c recomendation for data exchange. LPG: Label Property Graph (Neo4j, ...)
=========================================== ===========================================
· Developed in the "beginnings" of Web to add · Developed in Sweden (2000 - 2007)
semantics to web. · (e)fficient (graph native) storage
· Specialized RDF stores (triple/quad) stores · (f)ast query and traversal
arises, also called "Semantic Graph Databases". · (h)umane model: Close to the way we think
· understand and reason about the world.
· LPG can also be used as an efficient storage mechanism for
RDF data.
· Based on the notion of statement: · LPG focus on objects (vs RDF statements)
statement == triplet [subject,predicate,object]
  e.g.:                                          e.g.:
subject predicate object
========= ========== =======
ppl://ann ·· is a ········ person ┐ ┌─ "There is a person that is described by her name: Ann,
ppl://ann ·· user ID is ·· @ann ─┴···············┴─ her user ID: @ann, and a globally unique identifier: ppl://ann "
ppl://ann ·· name is ····· Ann Smith ··················· "There is a person with a unique identifier: ppl://dan"
ppl://dan ·· likes ······· ppl://ann ··················· "Dan likes Ann"
Bº"""...Basically it is an atomic decomposition of º
Bº the data model around facts"º
· • Vertices: · • Vertices:
Every statement produces two vertices in the graph. Unique Id + set of key-value pairs
- Some are uniquely identified by URIs: Resources
- Some are property values: Literals.
• Edges: • Edges:
Every statement produces an edge. Unique Id + set of key-value pairs
Uniquely identified by URIs.
• Vertices or Edges have RºNOºinternal structureº • Vertices and Edges Bºhave internal structureº
· Query Language: SparQL · Query Language: Cypher
prefix ms: ˂http://myschma.me/˃
prefix rdf: ˂http://www ... #˃
SELECT ?who ································ MATCH (who)-[:LIKES]-˃(a:Person)
{ WHERE a.name CONTAINS 'Ann'
?a rdf:type ms:Person . RETURN who
?a ms:name ?asName .
FILTER regex(?asName, 'Ann')
?who ms:likes ?a .
}
· RDF Stores (vs RDF model) is highly indexed, · Native GraphDBs (Neo4j,...?) excel with highly
causing "JOIN" problems. dynamic datasets and transactions UC where
data integrity is key
· RDF has no inference capabilities · Graph native can have inference capabilities
BºSemantics on RDF are just RULES called ONTOLOGIESº, e.g.:
an Bºoptional layerºon top of RDF data. Philip owns a Mercedes
""" ... They are difficult to debug and reasoning Juan is married to Mercedes
with ontology languages is intractable/
undecidable""" The inference engine can deduce that 1st
Mercedes is an instance of a car, and the
2nd is an instance of a person
Do graph DBs scale
@[https://dzone.com/articles/do-graph-databases-scale] [scalability][TODO]
analytics and rendering
BºNetWorkXº [data.visualization][data.analytics][ui]
https://networkx.org/documentation/stable/index.html
- Python package for the creation, manipulation, and
study of the structure, dynamics, and functions of complex networks.
- Gallery:
@[https://networkx.org/documentation/stable/auto_examples/index.html]
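  · Minimal NetworkX sketch (assumes "pip install networkx"); it answers the typical
    graph-compute questions mentioned earlier (clusters, average degree, paths)
    on a made-up toy graph:

      import networkx as nx

      G = nx.Graph()                                  # in-memory, single-machine graph
      G.add_edges_from([("ann", "dan"), ("ann", "bob"),
                        ("bob", "dan"), ("eve", "zoe")])

      print(nx.shortest_path(G, "dan", "bob"))        # path from one node to another
      print(sum(d for _, d in G.degree()) / G.number_of_nodes())   # average degree
      print(list(nx.connected_components(G)))         # "clusters": {ann,bob,dan}, {eve,zoe}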
BºGraphViz Graph Visualizationº
@[http://www.graphviz.org]
http://www.graphviz.org/gallery/
Query Distributed Graph DB
@[https://www.infoq.com/presentations/graph-query-distributed-execution/]
Dragon dist.GraphDB@FB
@[https://engineering.fb.com/2016/03/18/data-infrastructure/dragon-a-distributed-graph-query-engine/]
- Facebook Graph load:
Bºmillions of writes and billions of reads per secondº.
Bºthe information you see can’t be prepared in advance.º
- Facebook graph DB evolution:
memcached → MySQL → TAO → Dragon + TAO.
└┬┘
(T)he (A)ssociation (O)bjects Server
"RAM cache" of MySQL ,...
- A typical photo upload creates ~20 writes (edges) to MySQL, cached in RAM via TAO.
- Inverted indexing is a popular technique also used at Facebook. e.g.:
123 : Likes : Shakira Shakira: LikedBy: 123
222 : Likes : Shakira Shakira: LikedBy: 222
333 : Likes : Shakira Shakira: LikedBy: 333
└─┬─┘ └──┬──┘
Direct Inverted
Index Index
- indexing implies faster reads but slower writes, and is not used for small sets.
  Dragon optimizes using partial indexing (skipping indexing for small sets); see the toy sketch below.
BºThe combination of partial indexing techniques and a richer query language º
Bºsupporting filter/orderby operator allows us to index a system roughly 150x º
Bºlarger while serving 90 percent of queries from the cache.º
- Dragon isºbacked by a demand-filled, in-process key-value storeº, updated in
real time and eventually consistent.
- It uses a number of optimization techniques to conserve storage, improve
locality, and execute queries in 1/2ms with high availability and consistency.
- Dragon pushes many complex queries down closer to storage by profiting from
  knowledge about the social-graph data and users' behaviour.
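  · Toy Python sketch (not Facebook code) of the direct vs inverted index idea above;
    users and entities are made up:

      from collections import defaultdict

      likes = [("123", "Shakira"), ("222", "Shakira"), ("333", "Shakira")]

      direct   = defaultdict(set)   # user   -> entities the user likes
      inverted = defaultdict(set)   # entity -> users who like it ("LikedBy")
      for user, entity in likes:
          direct[user].add(entity)
          inverted[entity].add(user)        # partial indexing would skip small sets

      print(direct["123"])                  # fast "what does 123 like?"
      print(inverted["Shakira"])            # fast "who likes Shakira?" via the inverted index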
OWL
@[https://en.wikipedia.org/wiki/Web_Ontology_Language]
https://www.w3.org/wiki/Ontology_editors
The Web Ontology Language (OWL) is a family of knowledge
representation languages for authoring ontologies. Ontologies are a
formal way to describe taxonomies and classification networks,
essentially defining the structure of knowledge for various domains:
the nouns representing classes of objects and the verbs representing
relations between the objects. Ontologies resemble class hierarchies
in object-oriented programming but there are several critical
differences. Class hierarchies are meant to represent structures used
in source code that evolve fairly slowly (typically monthly
revisions) whereas ontologies are meant to represent information on
the Internet and are expected to be evolving almost constantly.
Similarly, ontologies are typically far more flexible as they are
meant to represent information on the Internet coming from all sorts
of heterogeneous data sources. Class hierarchies on the other hand
are meant to be fairly static and rely on far less diverse and more
structured sources of data such as corporate databases.[1]
http://owlgred.lumii.lv/
https://www.cognitum.eu/Semantics/FluentEditor/
KEY_VALUE
KEY-VALUE
ºKEY-VALUE STORESº
-ºsimplestºform of DBMS.
- store pairs of keys and values
-ºhigh performanceº
- not adequate for complex apps.
Low data consistency in distributed clusters,
when compared to Registry Based.
- value is not usually "big" (when compared
to wide-column DDBBs like Cassandra,..).
Not designed for high-data-layer apps, but for low-level software tasks.
- extended forms allow sorting the keys, enabling range queries as well as an
ordered processing of keys.
- Can evolve to document stores and wide column stores.
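· Hedged usage sketch with the redis-py client (Redis appears in the examples
  below; assumes a local Redis on the default port):

    import redis                              # pip install redis

    r = redis.Redis(host="localhost", port=6379, db=0)

    r.set("user:123:name", "Ann")             # plain key → value
    print(r.get("user:123:name"))             # b'Ann'

    # typed structures (one of the things Redis adds over a plain cache):
    r.lpush("recent:visits", "/home", "/about")
    r.hset("user:123", mapping={"name": "Ann", "age": "30"})
    print(r.lrange("recent:visits", 0, -1))
    print(r.hgetall("user:123"))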
ºExamplesº
-ºRedisº: Cache/RAM-Memory oriented. Not strong cluster consistency
like etcd / Consul / Zookeeper.
- Comparative: redis (cache/simple key-value ddbb)
vs
etcd (key-value Registry DDBB):
- etcd is designed with high availability in mind,
being distributed by default - with RAFT consensus algorithm -
and providing consistency through node failures.
redis doesn't.
- etcd is persisted to disk by default. Redis is not.
      (logs and snapshots are possible)
    - Redis/memcached is a blazing fast in-memory key-value store
- etcd read performance is good for most purposes, but 1-2 orders
of magnitude lower than redis. Write gap is even bigger.
- Comparative: redis (cache/simple key-value ddbb)
vs
memcached (cache/simple key-value ddbb)
- redis: able to store typed data (lists, hashes, ...)
      and supports a lot of different programming patterns out of
the box, while memcached doesn't.
-ºAmazon DynamoDBº
-ºMemcachedº
-ºGuava Cacheº
@[https://github.com/google/guava/wiki/CachesExplained]
- ºnon-distributedº easy-to-use Java library for data caching
- A Cache is similar to ConcurrentMap, but not quite the same. The most
fundamental difference is that a ConcurrentMap persists all elements that
are added to it until they are explicitly removed. A Cache on the other
hand is generally configured to evict entries automatically, in order to
constrain its memory footprint. In some cases a LoadingCache can be useful
even if it doesn't evict entries, due to its automatic cache loading.
-ºHazelcastº
- Designed for caching.
- Hazelcast clients, by default, will
connect to all cache cluster nodes
    and know about the cluster partition
    table to route requests to the correct
node.
- Java JCache compliant
- Advanced cache eviction algorithms
based on heuristics with sanitized O(1)
runtime behavior.
- LevelDB ("Ethereum State"): non-distributed, with
  focus on local-storage persistence.
- TiKV
- dev lang:ºRustº
- Incubation Kubernetes project
- used also for TiDB
-ºRocksDB:º(by Facebook)
- built on earlier work on LevelDB
- core building block for fast key-value server.
- Focus: storing data on flash drives.
- It has a Log-Structured-Merge-Database (LSM) design with
flexible tradeoffs between Write-Amplification-Factor (WAF),
Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF).
- It has multi-threaded compactions, making itºespecially suitableº
ºfor storing multiple terabytes of data in a singleº
ºlocal (vs distributed) database.º.
ºIt can be used for big-data processing on single-node infrastructureº
- Used by Geth/Besu/... and other Ethereum clients ...
RocksDB
@[https://raw.githubusercontent.com/facebook/rocksdb/gh-pages-old/intro.pdf]
Local (embedded in library with no network latency) key-value
database designed for high-performance:
- Prefer in order: RAM → SSD (focus of RocksDB) → HD → Network Storage
- Optimized for server loads (multiple clients reading/writing).
-BºBased on LevelDB, but 10x faster for writesº, thanks to
new architecture, index algorithm (bloom filters, ...)
- C++ library with up-to-date java wrappers.
RºWhat it is not?º
Rº- Not distributedº
Rº- No failoverº
Rº- Not highly available. data lost if machine breaks.º
(But highly available architectures can be made on top
     of it adding consensus protocols ... and losing performance).
- keys and values are byte streams.
- common operations are (see the Python sketch at the end of this section):
- Get(key)
- NewIterator()
- Put(key, val)
- Delete(key)
- SingleDelete(key).
- support for multi-operational transactions, both optimistic and pessimistic mode.
- RocksDB has a Write Ahead Log (WAL):
- Puts stored in an in-memory buffer called the memtable as
well as optionally inserted into WAL. On restart, it re-processes all
the transactions that were recorded in the log.
- Data Checksuming to detect corruptions for each SST file block.
BºA block, once written to storage, is never modified.º
- Use cases:
- "Many". Amongst others, it is used as core storage engine for Facebook's
Dragon, probably one of the biggest Graph DB in production.
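  · Hedged sketch with the python-rocksdb bindings (an assumption; the primary APIs
    are C++/Java). It exercises the basic Get/Put/Delete/iterator operations listed
    above on a purely local, embedded database:

      import rocksdb                             # pip install python-rocksdb (assumed)

      opts = rocksdb.Options(create_if_missing=True)
      db = rocksdb.DB("example.db", opts)        # local file, no server involved

      db.put(b"user:123", b"Ann")                # keys and values are byte streams
      print(db.get(b"user:123"))                 # b'Ann'

      it = db.iterkeys()                         # iterate keys in sorted order
      it.seek_to_first()
      print(list(it))

      db.delete(b"user:123")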
ZippyDB
@[https://www.infoq.com/news/2021/09/facebook-zippydb/]
@[https://engineering.fb.com/2021/08/06/-data/zippydb/]
- By Facebook Engineering.
- Internally it uses RocksDB on each node as "storage engine".
- general-purpose biggest key-value store @ Facebook.
- in production for more than six years.
- It offers flexibility to applications in terms of tunable durability/consistency/availability/latency.
- Use cases:
- metadata for distributed filesystems.
- counting events.
- app and/or product data.
- A ZippyDB deployment (named "tier") consists of compute and storage resources
ºspread across several regions worldwide.º
- Each deployment hosts multiple use cases in aºmulti-tenant fashionº.
- ZippyDB splits the data belonging to a use case into shards.
- Depending on configuration, it replicates each shard across
multiple regions for fault tolerance, Bºusing either Paxos or async replication.º
- A subset of replicas per shard is part of a quorum group, where data
is synchronously replicated (high durability and availability).
- The remaining replicas, if any, are configured as followers using
  asynchronous replication, allowing applications to have many in-region replicas
to support low-latency reads with relaxed consistency while keeping the
quorum size small for lower write latency.
RDF
RDF stores
@[https://db-engines.com/en/article/RDF+Stores]
@[https://en.wikipedia.org/wiki/Resource_Description_Framework]
- Resource Description Framework stores
ºdescribes information in triplets:º
º(subject,predicate,object)º
- Originally used for describing
IT-resources-metadata.
- Today often used in semantic web.
- RDF stores are a subclass of graph DBMS:
(subject,predicate,object)
^ ^ ^
node edge node
  but they offer specific methods
beyond general graph DBMS ones. Ex:
SPARQL, SQL-like query lang. for
RDF data, supported by most
RDF stores.
ºExamplesº
- MarkLogic
- Jena
- Virtuoso
- Amazon Neptune
- GraphDB
- Apache Rya:
JSON-LD
- format for mapping JSON data into the RDF semantic graph model, as defined by [JSON-LD].
  @[https://en.wikipedia.org/wiki/JSON-LD]
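  · Hedged sketch with Python's rdflib (not tied to any particular store above):
    it builds the (subject, predicate, object) triples shown earlier and queries
    them with SPARQL; the ppl:// and myschema.me URIs are made-up examples:

      from rdflib import Graph, Literal, Namespace, URIRef   # pip install rdflib

      MS = Namespace("http://myschema.me/")
      g = Graph()
      g.add((URIRef("ppl://ann"), MS.name,  Literal("Ann Smith")))  # (subject, predicate, object)
      g.add((URIRef("ppl://dan"), MS.likes, URIRef("ppl://ann")))

      q = """
          PREFIX ms: <http://myschema.me/>
          SELECT ?who WHERE {
            ?a   ms:name  ?name .
            ?who ms:likes ?a .
            FILTER regex(?name, "Ann")
          }"""
      for row in g.query(q):
          print(row.who)                          # → ppl://dan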
Apache Rya
@[https://searchdatamanagement.techtarget.com/news/252472464/Apache-Rya-matures-open-source-triple-store-database]
· Started by Adina Crainiceanu, associate professor of computer science at
the U.S. Naval Academy. Initial research paper published in 2012
· scalable cloud-based RDF triple store that supports fast-and-easy access to
data through SPARQL queries.
built on top of Apache Accumulo®. (MongoDB back-end also implemented).
· Rya uses novel storage methods, indexing schemes, and query
processing techniques thatºscale to billions of triples across º
ºmultiple nodes.º
· Rya's users include U.S. Navy (autonomous drones,...).
Search
Search
- "NoSQL" DBMS for search of content
- Optimized for:
- complex search expressions
- Full text search
- reducing words to stem
- Ranking and grouping of results
- Geospatial search
- Distributed search for high
scalability
ºExamplesº
- Apache Lucene: Powerful search java Library used as the base for
powerful solutions.
 - Elasticsearch, Solr: Both are built on top of Lucene,
   adding server, cluster and scalability features.
   - When compared, Elasticsearch is simpler to use
     and integrates with Kibana graphics.
   - Solr has better documentation.
- Splunk
- MarkLogic
- Sphinx
- Eclipse Hawk:
@[projects.eclipse.org/projects/modeling.hawk]
heterogeneous model indexing framework:
- indexes collections of models
transparently and incrementally into
NoSQL DDBB, which can be queried
efficiently.
- can mirror EMF, UML or Modelio
models (among others) into a Neo4j or
OrientDB graph, that can be queried
with native languages, or Hawk ones.
Hawk will watch models and update
the graph incrementally on change.
Time_Series
Time Series
- Managing time series data
- very high TX load:
- designed to efficiently collect,
store and query various time
series.
- Ex use-case:
    'SELECT SENSOR1_CPU_FREQUENCY / SENSOR2_HEAT'
    joins two time series based on the
    overlapping areas of time, providing a
    new time series
- Time/space constrained data can be of two basic types:
- Point in time/space
- Region in time/space
ºExamplesº
- Timescale.com
- PostgreSQL optimized for Time Series.
by modifying the insert path,
execution engine, and query planner
to "intelligently" process queries
across chunks.
- QuestDB:
- https://questdb.io/
- RelationalModel.
  - PostgreSQL wire protocol. (TODO)
- "Query 1.6B rows in millisecs"
- Relational Joins + Time-Series Joins
- Unlimited Transaction Size
- Cross Platform.
- Java Embedded.
- Telegraf TCP/UDP
- InfluxDB
- Kdb+
- Graphite. Simple system that will just:
- Store numeric time series data
- Render graphs of this data
It willºNOTº:
- collect data (Carbon needed)
    - render graphs (external apps exist)
- RRDtool
- Prometheus
- https://prometheus.io/
- See also: Cortex: horizontally scalable, highly available,
multi-tenant, long term storage for Prometheus.
- CNCF project,ºkubernetes friendlyº
- GoLang based:
- binaries statically linked
- easy to deploy.
  - Many client libraries (see the Python sketch after this examples list).
- monitoring metrics analyzer
and alerting
- Highly dimensional data model.
Time series are identified by a
metric name and a set of
key-value pairs.
- stores all data as time series:
streams of timestamped values
belonging to same metric and
same set of labeled dimensions.
- Multi-mode data visualization.
- Grafana integration
- Built-in expression browser.
- PromQL Query language allowing
to select+aggregate time-series
data in real time.
- result can either be shown as
graph, tabular data or consumed
by external HTTP API.
- Precise alerts based on PromQL.
- scaling:
- sharding
- federation
- "Singleton" servers relying only
on local storage.
(optionally remote storage).
- Uber M3
www.infoq.com/news/2018/08/uber-metrics-m3
- Large Scale Metrics Platform
""" built to replace Graphite+Carbon cluster,
and Nagios for alerting and Grafana for
dashboarding due to issues like
poor resiliency/clustering, operational
cost to expand the Carbon cluster,
and a lack of replication""".
ºFeaturesº
- cluster management, aggregation,
collection, storage management,
a distributed TSDB
- M3QL query language (with features
not available in PromQL).
- tagging of metrics.
- local/remote integration similar to
the Prometheus Thanos extension providing
cross-cluster federation, unlimited
storage and global querying across
    clusters.
- query engine: single global view
without cross region replication.
- Redis TimeSeries:
- By default Redis is just a key-value store.
@[https://redislabs.com/redis-enterprise/redis-time-series/]
RedisTimeSeries simplifies the use of Redis for time-series use
cases like IoT, stock prices, and telemetry.
- See also:
@[https://www.infoq.com/articles/redis-time-series-grafana-real-time-analytics/]
How to Use Redis TimeSeries with Grafana for Real-time Analytics
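· Hedged instrumentation sketch with the official Prometheus Python client
  (one of the "many client libraries" mentioned in the Prometheus entry above);
  metric names and port are made up. It exposes /metrics for the server to scrape:

    import random, time
    from prometheus_client import Counter, Gauge, start_http_server  # pip install prometheus-client

    REQS = Counter("app_requests_total", "Total requests served", ["endpoint"])
    TEMP = Gauge("sensor_temperature_celsius", "Last temperature reading")

    start_http_server(8000)        # http://localhost:8000/metrics for scraping

    while True:                    # fake workload producing time series
        REQS.labels(endpoint="/home").inc()
        TEMP.set(20 + random.random() * 5)
        time.sleep(1)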
TS Popularity
@[https://www.techrepublic.com/article/why-time-series-databases-are-exploding-in-popularity/]
M3DB
M3DB: (Uber) OOSS distributed timeseries database
- ability to shard its metrics into partitions,
replicate them by a factor of three,
and then evenly disperse the replicas across separate
failure domains.
Service_Registry
Service Registry
ºand Discoveryº
- Specialized key/value DBMS:
-ºtwo processes must existº:
-ºService registration processº
storing final-app-service
(host,port,...)
-ºService discovery processº
- let final-app-services query
the data
- other aspects to consider:
- auto-delete of non-available services
- Support for replicated services
- Remote API is provided
ºExamplesº
-ºZooKeeperº
  - originated in the Hadoop ecosystem.
    Now a core part of most cluster-based Apache
    projects (Hadoop, Kafka, Solr, ...)
- data-format similar to file system.
- cluster mode.
- Disadvantages:
- complex:
- Java plus big number of dependencies
- Still used by kafka for config but
      plans exist to replace it.
@[https://issues.apache.org/jira/browse/KAFKA-6598]
@[https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum] - 2019-11-06 -
-ºetcdº
- distributed versioned key/value store
- HTTPv2/gRPC remote API.
- based on previous experiences with ZooKeeper.
- hierarchical config.
- Very easy to deploy, setup and use
(implemented in go with self-executing binary)
- reliable data persistence
- very good doc
  - @[https://coreos.com/blog/introducing-zetcd]
- See etcd vs Consul vs ZooKeeper vs NewSQL
comparative at:
@[https://etcd.io/docs/v3.4.0/learning/why/]
and:
@[https://loneidealist.wordpress.com/2017/07/12/apache-zookeeper-vs-etcd3/]
Disadvantages:
    - needs to be combined with a few
      third-party tools for service discovery:
- etcd-registrator: keep updated list
of docker containers
- etcd-registrator-confd: keep updated
config files
- ...
- Core-component of Kubernetes for cluster
configuration.
-ºConsulº
- strongly consistent datastore
- multidatacenter gossip protocol
for dynamic clusters
- hierarchical key/value store
- adds the notion of app-service
(and app-service-data).
- "watches" can be used for:
- sending notifications of
data changes
- (HTTP, TTLs , custom)
health checks and
output-dependent
commands
- embedded service discovery:
no need to use third-party one
(like etcd). Discovery includes:
- node health checks
- app-services health checks
- ...
- Consul provides a built in framework
for service discovery.
(vs etcd basic key/value + custom code)
- Clients just register services and
perform discovery using the DNS
or HTTP interface.
- out of the box native support for
multiple datacenters.
- template-support for config files.
- Web UI: display all services and nodes,
monitor health checks, switch
from one datacenter to another.
- doozerd (TODO)
See also:
 - Comparison chart:
coreos.com/etcd/docs/latest/learning/why.html
etcd 101
"etcd" name refers to unix "/etc" folder and "d"istributed system.
- etcd is designed as a general substrate for large scale distributed systems.
These are systems that will never tolerate split-brain operation and are
willing to sacrifice availability to achieve this end. etcd stores metadata in
a consistent and fault-tolerant way. An etcd cluster is meant to provide
key-value storage with best of class stability, reliability, scalability and
performance.
@[https://github.com/etcd-io/etcd]
@[https://etcd.io/docs/v3.4.0/]
@[https://etcd.io/docs/v3.4.0/learning/]
- designed to:
reliably (99.999999%) store infrequently updated data
reliable watch queries.
- multiversion persistent key-value store of key-value pairs:
- inexpensive snapshots
- watch history events ("time travel queries").
- key-value store is effectively immutable unless
    store is compacted to shed the oldest versions.
Logical view
- flat binary key space.
- key space:
- lexically sorted index on byte string keys
→ soºrange queries are inexpensiveº.
- Each key/key-space maintains multiple revisions starting with version 1
- revisions are indexed as well
→ºranging over revisions with watchers is efficientº
- generation: key life-span, from creation to deletion.
1 key ←→ 1+ generation.
Physical view:
- data stored as key-value pairs in a persistent b+tree.
-ºkey of key-value pair is a 3-tuple (revision, sub, type)º.
^ ^
Unique (opt)
key-ID special key-type
in rev. (deleted key,..)
@[https://jepsen.io/analyses/etcd-3.4.3]
- In our tests, etcd 3.4.3 lived up to its claims for key-value operations: we
observed nothing but strict-serializable consistency for reads, writes, and
even multi-key transactions, during process pauses, crashes, clock skew,
network partitions, and membership changes. Strict-serializable behavior was
the default for key-value operations; performing reads with the serializable
flag allowed stale reads, as documented.
@[https://etcd.io/docs/v3.4.0/learning/design-client/]
- etcd v3+ is fully committed to (HTTPv2)gRPC.
For example: v3 auth has connection based authentication,
rather than v2's slower per-request authentication.
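· Hedged sketch with the community python-etcd3 client (an assumption; etcd's
  native API is gRPC). It shows the register + prefix-range pattern discussed
  in the Service Registry section; key names are made up:

    import etcd3                                # pip install etcd3 (assumed client)

    client = etcd3.client(host="localhost", port=2379)

    client.put("/services/api/node1", "10.0.0.5:8080")   # "register" a service
    client.put("/services/api/node2", "10.0.0.6:8080")

    value, meta = client.get("/services/api/node1")
    print(value)                                # b'10.0.0.5:8080'

    for value, meta in client.get_prefix("/services/api/"):   # keys are lexically
        print(meta.key, value)                                # sorted → cheap range scans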
Wide_Column Stores
Wide Column Stores
(also called extensible record stores)
-ºsort of two-dimensionalº ºkey-value stores.º
where key set as well as value can be "huge"
(thousands of columns per value)
- Compared to Service Registry DDBBs (Zookeeper,
  etcd3) that are also key-value stores:
- Ser.Registry ddbb scale much less (many
times a single node can suffice) but are
strongly consistent. Wide Column Stores
    can scale to tens of thousands of nodes but
consistency is eventual / optimistic.
- column names and record keys ºare not fixedº
└─────┬─────┘
(schema-free)
- not to be confused with column-oriented storage
  in RDBMS (the latter is an internal concept
  for improving performance by storing
  table-data column-by-column vs
  record-by-record).
ºExamplesº
-ºCassandraº
- SQL-like SELECT, DML and DDL statements (CQL)
  - handle large amounts of data across farms
of commodity servers, providing high availability
with no single point of failure. (peer-to-peer)
- Allows for different replication strategies
(sync/async for slow-secure/fast-less-secure)
cluster replication.
- logical infrastructure support for clusters
spanning multiple datacenters in different
continents.
-ºIntegrated Managed Data º
ºLayer Solutions with Spark,º
ºKafka and Elasticsearch. º
- PostgreSQL + cstore_fdw extension: columnar store extension for analytics
use cases where data is loaded in batches.
Cstore_fdw’s columnar nature delivers performance by only reading relevant
data from disk. It may compress data by 6 to 10 times
to reduce space requirements for data archive.
-ºScyllaº
- compatible with Cassandra (same CQL and Thrift protocols, [scalability]
and same SSTable file formats) with higher throughputs and
lower latencies (10x).
- C++14 (vs Cassandra Java)
  -ºcustom network managementºthat minimizes resource usage by º
ºbypassing the Linux kernel. No system call is required º
to complete a network request. Everything happens in userspace.
-ºSeastar async lib replacing threadsº
- sharded design by node:
- each CPU core handles a data-subset
- authors claim to achieve much better performance on modern NUMA SMP archs,
and to scale very well with the number of cores:
-ºUp to 2 million req/sec per machineº
-ºHBaseº. Apache alternative to Google BigTable
Internally uses skip-lists:
@[../programming_theory.html?query=e65f9917-5f27-4b78-8a5a-0653640b6b88]
-ºCosmosDBº
REF: Google bigTable-osdi06.pdf
-ºGoogle Spannerº (BigTable substitution)
@[https://cloud.google.com/spanner/]
- Features: Schema, SQL ,Consistency ,Availability ,Scalability ,Replication
Cassandra 101
RºWARN:º
- Why It's a Poor Choice For a Metadata Database for Object Stores
@[https://blog.min.io/the-trouble-with-cassandra-based-object-stores/]
- Cassandra excels at supporting write-heavy workloads,
  - Cassandra has limitations when supporting read-heavy workloads
    due to its eventual consistency model and lack of transactions;
    missing multi-table support like joins and subqueries can also limit its usefulness.
- Object storage needs are far simpler and different from what Cassandra
is built for.
  - RºWARNº: Because the implications of employing Cassandra as an object
    storage metadata database were not properly understood, many object
    storage vendors made it a foundational part of their architecture,
    keeping them from ever moving past simple archival workloads.
- NOTE: It is not unusual to see Cassandra clusters
    with 100+ nodes in production environments.
Cassandra whitepaper:
-ºdistributed key-value storeº
- developed at Facebook.
- data and processing spread out across many commodity servers
- highly available service without single point of failure
allowing replication even across multiple data centers.
- Possibility to choose synchronous/asynchronous replication
for each update.
-ºfully distributed DB, with no master DBº
- Reads are linearly scalable with the number of nodes.
  Cassandra scales (much) better than comparable systems
  like HBase, Voldemort, VoltDB, Redis, MySQL.
  Are writes linearly scalable?
- based on 2 core technologies:
- Google's Big Table
- Amazon's Dynamo
- 2 versions of Cassandra:
- Community Edition : distributed under the Apache™ License
- Enterprise Edition : distributed by Datastax
- Cassandra is a BASE system (vs an ACID system):
BASE == (B)asically (A)vailable, (S)oft state, (E)ventually consistent)
ACID == (A)tomicity, (C)onsistency, (I)solation, (D)urability.
Oº- BASE implies that the system is optimistic and accepts that º
Oº the database consistency will be in a state of fluxº
Oº- ACID is pessimistic and it forces consistency at the endº
Oº of every transaction.º
Replication
Strategy
1
↑
↓
1
cluster 1 ←→ 1+ Keyspace 1 ←→ N Column Family ←→ 0+ Row
└┬┘ └─────┬─────┘
Typically
just one sharing similar - collection of
structure sorted columns.
┌─ CQL lang. used to access data.
┌─┴───────────────────────────────────────────────────────────────┐
│OºKEYSPACEº: container for app.data, similar to a RDMS schema │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Column Family 1 │ │
│ └─────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Column Family 2 │ │
│ └─────────────────────────────────────────────────┘ │
│ ... │
│ ┌─────────┐ ┌─────────────────────────────────────────────────┐ │
│ │Settings │ │ Column Family N Col Col2 ... ColN │ │
│ ├─────────┤ │ Key1 Key2 ... KeyN │ │
│ │*Replica.│ │ ┌────────┐ ┌──────────────────────────────────┐ │ │
│ │ Strategy│ │ │Settings│ │RowKey1│Val1_1│Val2_1│ │ValN_1│ │ │ │
│ └─────────┘ │ │ │ │RowKey2│Val2_2│Val2_2│ │ValN_2│ │ │ │
│ │ └────────┘ │ ... │ │ │
│ │ └──────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
RowKey := Unique row UID
Used also to sharde/balance
load acrossºnodesº
ValX_Y := value ||
value collection
BASIC INFRASTRUCTURE ARCHITECTURE:
──────────────────────────────────
ºNodeº: (Cassandra peer instance) ºRackº:
- data storage (on File System - Set of nodes
or HDFS)
- The data is balanced
across nodes based
on the RowKey f each
Column Family.
- Commit log to ensure
data durability.
- Sequentially written automatically
replicated throughout
┌── Node "N" choosen cluster
│ based on RowKey ↑
↓ ┌──┴───┐
Input → Commit Log → Index+write →(memTable → Write to ──→ SSTable
@Node N toºmemTableº full) ºSSTableº Compaction
└──┬───┘ └──┬───┘ └───┬────┘
- in-memory structure - (S)orted (S)tring Table - Periodically consolidates
- "Sort-of" write-back in File System or HDFS storage by discarding
cache data marked for deletion
with a tombstone.
- repair mechanisms exits
to ensure consistency
of cluster
ºDatacenterº: ºClusterº:
- Logical Set of Racks. - 1+ Datacenters
- Ex: America, Europe,... - full set of nodes
- Replication set at which map to a single
Data Center bounds complete token ring
CQL
- similar syntax to SQL
- works with table data.
- Available Shells: cqlsh | DevCenter | JDBC/ODBC drivers
ºCoordinator Nodeº:
- Node where client connects for a read/write request
acting as a proxy between the client and Cassandra
internals.
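· Hedged sketch with the DataStax Python driver (assumes a reachable single-node
  cluster; the demo keyspace/table are made up):

    from cassandra.cluster import Cluster      # pip install cassandra-driver

    cluster = Cluster(["127.0.0.1"])           # contact point; any node can coordinate
    session = cluster.connect()

    session.execute("""
      CREATE KEYSPACE IF NOT EXISTS demo
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}""")
    session.execute("""
      CREATE TABLE IF NOT EXISTS demo.users (
        user_id text PRIMARY KEY,              -- RowKey: drives balancing across nodes
        name    text)""")

    session.execute("INSERT INTO demo.users (user_id, name) VALUES (%s, %s)",
                    ("123", "Ann"))
    row = session.execute("SELECT name FROM demo.users WHERE user_id = %s",
                          ("123",)).one()
    print(row.name)
    cluster.shutdown()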
Cassandra Key Components
Gossip
A peer-to-peer communication protocol in which nodes periodically exchange
state information about themselves and about other nodes they know about. This
is similar to the heart-beat mechanism in HDFS to get the status of each node by the
master.
A peer-to-peer communication protocol to share location and state
information about the other nodes in a Cassandra cluster. Gossip information is
also stored locally by each node to use immediately when a node restarts.
The gossip process runs every second and exchanges state messages with up
to three other nodes in the cluster. The nodes exchange information about
themselves and about the other nodes that they have gossiped about, so all
nodes quickly learn about all other nodes in the cluster.
A gossip message has a version associated with it, so that during a gossip
exchange, older information is overwritten with the most current state for a
particular node.
To prevent problems in gossip communications, use the same list of seed
nodes for all nodes in a cluster. This is most critical the first time a node
starts up. By default, a node remembers other nodes it has gossiped with
between subsequent restarts. The seed node designation has no purpose other
than bootstrapping the gossip process for new nodes joining the cluster. Seed
nodes are not a single point of failure, nor do they have any other special
purpose in cluster operations beyond the bootstrapping of nodes.
Note: In multiple data-center clusters, the seed list should include at least
one node from each data center (replication group). More than a single seed
node per data center is recommended for fault tolerance. Otherwise, gossip has
to communicate with another data center when bootstrapping a node. Making every
node a seed node is not recommended because of increased maintenance and
reduced gossip performance. Gossip optimization is not critical, but it is
recommended to use a small seed list (approximately three nodes per data
center).
Data distribution and replication
How data is distributed and factors influencing replication.
In Cassandra, Data is organized by table and identified by a primary key, which
determines which node the data is stored on. Replicas are copies of rows.
Factors influencing replication include:
Virtual nodes (Vnodes):
    Assign data ownership to many small token ranges per node instead of a single token per node.
Snitch:
    Defines a node's data center and rack and uses gossip for propagating this
    information to other nodes (GossipingPropertyFileSnitch is the one recommended
    for production).
    There are many types of snitches: dynamic snitching, SimpleSnitch,
    RackInferringSnitch, PropertyFileSnitch, GossipingPropertyFileSnitch,
    Ec2Snitch, Ec2MultiRegionSnitch, GoogleCloudSnitch, CloudstackSnitch.
TODO:
https://rene-ace.com/cassandra-101-understanding-what-is-cassandra/
- Seed node
- Snitch purpose
- topologies
- Coordinator node,
- replication factors,
- ...
Cassandra+Spark
@[https://es.slideshare.net/chbatey/1-dundee-cassandra-101]
Similar projects
- Voldemort: developed by LinkedIn.
@[https://www.project-voldemort.com/voldemort/]
BºWhat's newº
- Cassandra 4.0 (2020-10)
@[https://cassandra.apache.org/doc/latest/new/]
- Support for Java 11
- Virtual Tables
- Audit Logging
- All successful/failed login attempts
- All database command requests to CQL.
- high performant live query logging
- useful for live traffic capture and traffic replay.
- Chronicle-Queue used to rotate a log of queries
- Full Query Logging (FQL)
- Improved Internode Messaging
- Improved Streaming
- Streaming is the process used by nodes of a cluster to exchange data
in the form of SSTables.
- Transient Replication (experimental)
cstar
Cassandra Orchestration Tool by Spotify
@[https://github.com/spotify/cstar]
@[www.infoq.com/news/2018/10/spotify-cstar]
""" ... Cstar emerged from the necessity of running shell commands
on all host in a Cassandra cluster ....
Spotify fleet reached 3000 nodes. Example scripts were:
- Clear all snapshots
- Take a new snapshot (to allow a rollback)
- Disable automated puppet runs
- Stop the Cassandra process
- Run puppet from a custom branch of our git repo
in order to upgrade the package
- Start the Cassandra process again
- Update system.schema_columnfamilies to the JSON format
- Run `nodetool upgradesstables`, which depending on
the amount of data on the node, could take hours to
complete
- Remove the rollback snapshot
"""
$ pip3 install cstar
REF: @[https://github.com/spotify/cstar]
"""Why not simply use Ansible or Fabric?
Ansible does not have the primitives required to run things in a topology aware
fashion. One could split the C* cluster into groups that can be safely executed
in parallel and run one group at a time. But unless the job takes almost
exactly the same amount of time to run on every host, such a solution would run
with a significantly lower rate of parallelism, not to mention it would be
kludgy enough to be unpleasant to work with.
Unfortunately, Fabric is not thread safe, so the same type of limitations apply.
All involved machines are assumed to be some sort of UNIX-like system like
OS X or Linux with python3, and Bourne style shell."""
Dynamo vs Cassandra
https://sujithjay.com/data-systems/dynamo-cassandra/
Dynamo vs Cassandra : Systems Design of NoSQL Databases
State-of-the-art distributed databases represent a distillation of
years of research in distributed systems. The concepts underlying any
distributed system can thus be overwhelming to comprehend. This is
truer when you are dealing with databases without the strong
consistency guarantee. Databases without strong consistency
guarantees come in a range of flavours; but they are bunched under a
category called NoSQL databases.
Document_Stores
Document Stores
- also called document-oriented DBMS
- schema-free:
- different records may have
different columns
- values of individual columns
can have dif. types
- Columns can be multi-value
- Records can have a nested structure
- Often use internal notations,
mostly JSON.
- features:
- secondary indexes in (JSON) objects
ºExamplesº
- MongoDB @[#mongo_summary]
www.infoq.com/articles/Starting-With-MongoDB
14 Things I Wish I’d Known When Starting with
MongoDB
- Amazon DynamoDB
- Couchbase
- Cosmos DB
- CouchDB:
Note from https://www.xplenty.com/blog/couchdb-vs-mongodb/
- BºCAP theorem: CouchDB prioritizes availability, while MongoDB º
Bºprioritizes consistency.º
- Reviews: MongoDB seems to have somewhat better reviews than CouchDB.
MongoDB Summary
$ mongo ← Launch mongo shell
# show dbs ← list all databases
# use db_name ← create/ login to a database
# db.createCollection('collection01') ← create collection
# db.collection01.insert( [ ← Insert N documents
{ key1: "val11", key2:"val21", }, db.collection.insertOne()
{ key1: "val12", key2:"val22", }, db.collection.insertMany()
{ key1: "val13", key2:"val23", }
]);
# db.collection01.save( ← Upsert doc.
{ key1: "newVal", ... },
);
# db.collection01.update() ← Update 1+ docs. in collection based on
matching document and based on multi option
# db.collection01.update( ← updateOne()|updateMany()
{ key1: val11 }, ← query to match
{ $set: { key2 : val2} }, ← update with
{ multi: true} ← options
);
# db.collection01 ← Update single document.
.findOneAndUpdate(
filter,
update,
options) ← Options:
upsert=true|false
returnNewDocument: true: return new doc
(vs original).
upsert==true ⅋⅋ returnNewDocument==false:
→ return null.
# db.collecti01.findOne( { _id : 123 });← Find by ID
# db.collecti01.findOne( { key1 : val1}); by query
# db.collecti01.findOne( ← Find with projection
{ key1 : val1}, (limiting the fields to return)
{ key2: 1} ); ← returns id , key2 fields only
# db.collecti01.find( {...} ) ← Returns cursor with selected docs.
# db.collecti01.deleteOne( filter, opts) ← or deleteMany(filter, opts)
BºreadConcern levels:º
- Control consistency and isolation properties of data reads
( from replica sets ⅋ replica set shards ).
- local: returns data from instace without guaranteing that
its been written to majority of replica members
(Default for reads against primary)
  - available: similar to local, gives lowest latency for sharded collections.
(Default for reads against secondaries)
- majority: returns only if the data acknowledged by a majority of
the replica set members.
- linearizable: return data after all successful majority-acknowledged
writes
- snapshot: Only available for transactions on multi documents.
BºWrite Concern:º
- Control level of acknowledgment for a given write operation
sent to mongod and mongos (for sharded collections)
- fields spec:
- w : (number),
0 : No ACK requested
1 : (default) wait ACK from standalone mongod|primary(replica set)
2+: wait ACK from primary + given N-1 secondaries
- j : (bool)
      true =˃ return ACK only after real write onto on-disk journal
- wtimeout: millisecs before returning error.
RºWARNº: On failure MongoDB does not rollback the data,
data may eventually get stored
BºIndexesº
_id Index (Default index, can NOT be dropped)
1. Single Field:
2. Multikey Field:
3. Geospatial Index:
4. Text Indexes:
9. Aggregations
1. Aggregation Pipeline
2. Map-Reduce
3. Single Purpose
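· The same basic operations as the shell commands above, sketched with PyMongo
  (assumes a local mongod; db/collection names follow the shell example):

    from pymongo import MongoClient                  # pip install pymongo
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://localhost:27017")
    db  = client["db_name"]
    col = db["collection01"]

    col.insert_many([{"key1": "val11", "key2": "val21"},
                     {"key1": "val12", "key2": "val22"}])
    col.update_one({"key1": "val11"}, {"$set": {"key2": "newVal"}}, upsert=True)

    doc = col.find_one({"key1": "val11"}, {"key2": 1})   # projection: _id + key2 only
    print(doc)

    # write concern as described above: majority ACK + on-disk journal
    col_wc = db.get_collection("collection01",
                               write_concern=WriteConcern(w="majority", j=True, wtimeout=5000))
    col_wc.insert_one({"key1": "val13"})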
PouchDB (web/mobile)
@[https://pouchdb.com/]
- "...PouchDB is an open-source JavaScript database inspired by
Apache CouchDB that is designed to run well within the browser.
-ºPouchDB was created to help web developers build applications thatº
ºwork as well offline as they do online.º
It enables applications to store data locally while offline, then
synchronize it with CouchDB and compatible servers when the
application is back online, keeping the user's data in sync no matter
where they next login..."
Data(log)_collect
Data(log) collect
ºFluentDº |ºLogstashº
"Improved" logstat |- TheºLº in ELK
- https://www.fluentd.org/ |-*OS data Collector
- data collector for unified logging layer |- TODO:
- increasingly used Docker, GCP, | osquery.readthedocs.io/en/stable/
and Elasticsearch communities |- low-level instrumentation framework
- https://logz.io/blog/fluentd-logstash | with system analytics and monitoring
FluentD vs Logstash compared | both performant and intuitive.
ºFeaturesº |- osquery exposes an operating system
- unify data collection and consumption | as a high-performance SQL RDMS:
for better understanding of data. | - SQL tables represent concepts such
| as running processes, loaded kernel
| modules, open network connections,
Syslog Elasticsearch | browser plugins, hardware events
Apache/Nginx logs → → → MongoDB |ºFeatures:º
Mobile/Web app logs → → → Hadoop | - File Integrity Monitoring (FIM):
Sensors/IoT AWS, GCP, ... | - DNS
| - *1
*1:@[https://medium.com/palantir/osquery-across-the-enterprise-3c3c9d13ec55]
@[https://github.com/tldr-pages/tldr/blob/master/pages/common/logstash.md]
|ºPrometheus Node ExporteRº |ºOthersº
|- https://github.com/prometheus/node_exporter |- collectd
|- TODO: |- Dynatrace OneAgent
|- Datadog agent
|- New Relic agent
|- Ganglia gmond
|- ...
See also: Spring cloud Sleuth: @[../JAVA/java_map.html#spring_cloud_summary]
Distributed tracing for Spring Cloud compatible with Zipkin, HTrace log-based
(e.g. ELK) tracing.
Loki
@[https://grafana.com/loki]
- logging backend, optimized for Prometheus and Kubernetes
- optimized to search, visualize and explore your logs natively in Grafana.
Logreduce IA filter
Logreduce IA filter
- logreduce@pypi
- Quiet log noise with Python and machine learning
Event_Stream_Arch.
Event Stream Architectures
Distributed (Event) Stream Processors
REF:
- Stream_processing@Wikipedia
- Patterns for streaming realtime analytics
- ???
- Streaming Reactive Systems & Data Pipes w. Squbs
- Streaming SQL Foundations: Why I Love Streams+Tables
- Next Steps in Stateful Streaming with Apache Flink
- Kafka Streams - from the Ground Up to the Cloud
- Data Decisions with Real-Time Stream Processing
- Foundations of streaming SQL
- The Power of Distributed Snapshots in Apache Flink
- Panel: SQL over Streams, Ask the Experts
- Survival of the Fittest - Streaming Architectures
- Streaming for Personalization Datasets at Netflix
- < href="https://www.safaribooksonline.com/library/view/an-introduction-to/9781491934951/">An Introduction to Time Series with Team Apache
""" Apache Cassandra evangelist Patrick McFadin shows how to solve time-series data
problems with technologies from Team Apache: Kafka, Spark and Cassandra.
- Kafka: handle real-time data feeds with this "message broker"
- Spark: parallel processing framework that can quickly and efficiently
analyze massive amounts of data
- Spark Streaming: perform effective stream analysis by ingesting data
in micro-batches
- Cassandra: distributed database where scaling and uptime are critical
- Cassandra Query Language (CQL): navigate create/update your data and data-models
- Spark+Cassandra: perform expressive analytics over large volumes of data
|ºKafkaº@[./kafka_map.html] | ºSparkº (TODO)
|-@[https://kafka.apache.org] | @[http://spark.apache.org/]
|- scalable,persistent and fault-tolerant | - Zeppelin "Spark Notebook"(video):
| real-time log/event processing cluster. | @[https://www.youtube.com/watch?v=CfhYFqNyjGc]
|- It's NOT a real stream processor but a |
| broker/message-bus to store stream data |
| for comsuption. |
|- "Kafka stream" can be use to add |
| data-stream-processor capabilities |
|- Main use cases: |
| - real-time reliable streaming data | ºFlinkº TODO
| pipelines between applications | - Includes a powerful windowing system
| - real-time streaming applications | supports many types of windows:
| that transform or react to the | - Stream windows and win.aggregations are crucial
| streams of data | building block for analyzing data streams.
|- Each node-broker in the cluster has an |
| identity which can be used to find other |
| brokers in the cluster. The brokers also |
| need some type of a database to store |
| partition logs. |
|-@[http://kafka.apache.org/uses] |
| (Popular) Use cases |
| BºMessagingº: good replacement for traditional |
| message brokers decoupling processing from |
| data producers, buffering unprocessed messages, |
| ...) with better throughput, built-in |
| partitioning, replication, and fault-tolerance |
| BºWebsite Activity Trackingº(original use case) |
| Page views, searches, ... are published |
| to central topics with one topic per activity |
| type. |
| BºMetricsº operational monitoring data. |
| BºLog Aggregationº replacement. |
| - Group logs from different servers in a central |
| place. |
| - Kafka abstracts away the details of files and |
| gives a cleaner abstraction of logs/events as |
| a stream of messages allowing for lower-latency |
| processing and easier support for multiple data |
| sources and distributed data consumption. |
| When compared to log-centric systems like |
| Scribe or Flume, Kafka offers equally good |
| performance, stronger durability guarantees due |
| to replication, and much lower end-to-end |
| latency. |
| BºStream Processingº: processing pipelines |
| consisting of multiple stages, where raw input |
| data is consumed from Kafka topics and then |
| aggregated/enriched/transformed into new |
| topics for further consumption. |
| - Kafka 0.10.+ includes Kafka-Streams to |
|     simplify this pipeline processing.           |
| - alternative open source stream processing |
| tools include Apache Storm and Samza. |
| BºEvent Sourcing architectureº: state changes |
| are logged as a time-ordered sequence of |
| records. |
| BºExternal Commit Log for distributed systemsº: |
| - The log helps replicate data between |
| nodes and acts as a re-syncing mechanism |
| for failed nodes to restore their data. |
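  ºEx. (sketch): minimal Kafka producer in Javaº
  A hedged sketch of the "one topic per activity type" use case above, using the
  standard kafka-clients library; the broker address (localhost:9092) and the
  topic name ("page-views") are illustrative assumptions, not values from this doc.

      import java.util.Properties;
      import org.apache.kafka.clients.producer.KafkaProducer;
      import org.apache.kafka.clients.producer.ProducerRecord;

      public class PageViewProducer {
          public static void main(String[] args) {
              Properties props = new Properties();
              props.put("bootstrap.servers", "localhost:9092");   // assumed local broker
              props.put("key.serializer",
                        "org.apache.kafka.common.serialization.StringSerializer");
              props.put("value.serializer",
                        "org.apache.kafka.common.serialization.StringSerializer");
              try (KafkaProducer˂String, String˃ producer = new KafkaProducer˂˃(props)) {
                  // key = user id, value = page visited; consumers can replay the topic
                  producer.send(new ProducerRecord˂˃("page-views", "user-42", "/index.html"));
              }
          }
      }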
See also:
-Flink vs Spark Storm:
@[https://www.confluent.io/blog/apache-flink-apache-kafka-streams-comparison-guideline-users/]
Check also: Streaming Ledger. Stream ACID TXs on top of Flink
@[https://docs.confluent.io/current/schema-registry/index.html]
Confluent Schema Registry provides a serving layer for your metadata. It
provides a RESTful interface for storing and retrieving Apache Avro® schemas.
It stores a versioned history of all schemas based on a specified subject name
strategy, provides multiple compatibility settings and allows evolution of
schemas according to the configured compatibility settings and expanded Avro
support. It provides serializers that plug into Apache Kafka® clients that
handle schema storage and retrieval for Kafka messages that are sent in the
Avro format.
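  ºEx. (sketch): producer using the Schema Registry Avro serializerº
  A minimal, hedged sketch assuming the Confluent kafka-avro-serializer artifact, a
  broker on localhost:9092 and a registry on localhost:8081; the "users" topic and
  the User schema are made up for illustration.

      import java.util.Properties;
      import org.apache.avro.Schema;
      import org.apache.avro.generic.GenericData;
      import org.apache.avro.generic.GenericRecord;
      import org.apache.kafka.clients.producer.KafkaProducer;
      import org.apache.kafka.clients.producer.ProducerRecord;

      public class AvroRegistryProducer {
          private static final String USER_SCHEMA =
              "{\"type\":\"record\",\"name\":\"User\"," +
              "\"fields\":[{\"name\":\"name\",\"type\":\"string\"}]}";

          public static void main(String[] args) {
              Properties props = new Properties();
              props.put("bootstrap.servers", "localhost:9092");           // assumed broker
              props.put("schema.registry.url", "http://localhost:8081");  // assumed registry
              props.put("key.serializer",
                        "org.apache.kafka.common.serialization.StringSerializer");
              props.put("value.serializer",                               // registers/fetches the
                        "io.confluent.kafka.serializers.KafkaAvroSerializer"); // Avro schema
              Schema schema = new Schema.Parser().parse(USER_SCHEMA);
              GenericRecord user = new GenericData.Record(schema);
              user.put("name", "edwin");
              try (KafkaProducer˂String, GenericRecord˃ producer = new KafkaProducer˂˃(props)) {
                  producer.send(new ProducerRecord˂˃("users", "user-1", user));
              }
          }
      }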
- LogDevice (Facebook)
@[https://www.infoq.com/news/2018/09/logdevice-distributed-logstorage]
- LogDevice has been compared with other log storage systems
like Apache BookKeeper and Apache Kafka.
- The primary difference with Kafka
(@[https://news.ycombinator.com/item?id=17975328]
seems to be the decoupling of computation and storage
- Underlying storage based on RocksDB, a key value store
also open sourced by Facebook
Event Based
Architecture
@[https://www.infoq.com/news/2017/11/jonas-reactive-summit-keynote]
Jonas Boner ... talked about event driven services (EDA) and
event stream processing (ESP)... on distributed systems.
... background on EDA evolution over time:
- Tuxedo, Terracotta and Staged Event Driven Architecture (SEDA).
ºevents represent factsº
- Events drive autonomy in the system and help to reduce risk.
- increase loose coupling, scalability, resilience, and traceability.
- ... basically inverts the control flow in the system
- ... focus on the behavior of systems as opposed to the
structure of systems.
- TIP for developers:
ºDo not focus on just the "things" in the systemº
º(Domain Objects), but rather focus on what happens (Events)º
- Promise Theory:
- proposed by Mark Burgess
- use events to define the Bounded Context through
    the lens of promises.
quoting Greg Young:
"""Modeling events forces you to have a temporal focus on what’s going on in the
system. Time becomes a crucial factor of the system."""
""" Event Logging allows us to model time by treating event as a snapshot in time
and event log as our full history. It also allows for time travel in the
sense that we can replay the log for historic debugging as well as for
auditing and traceability. We can replay it on system failures and for data replication."""
Boner discussed the following patterns for event driven architecture:
- Event Loop
- Event Stream
- Event Sourcing
- CQRS for temporal decoupling
- Event Stream Processing
Event stream processing technologies like Apache Flink, Spark Streaming,
Kafka Streams, Apache Gearpump and Apache Beam can be used to implement these
design patterns.
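  ºEx. (sketch): Kafka Streams "consume → transform → publish" pipelineº
  A hedged sketch of the Event Stream Processing pattern above with Kafka Streams;
  the topic names ("raw-events", "enriched-events"), the application id and the
  broker address are assumptions for illustration only.

      import java.util.Properties;
      import org.apache.kafka.common.serialization.Serdes;
      import org.apache.kafka.streams.KafkaStreams;
      import org.apache.kafka.streams.StreamsBuilder;
      import org.apache.kafka.streams.StreamsConfig;
      import org.apache.kafka.streams.kstream.KStream;

      public class EnrichPipeline {
          public static void main(String[] args) {
              Properties props = new Properties();
              props.put(StreamsConfig.APPLICATION_ID_CONFIG, "enrich-pipeline");
              props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
              props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
              props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

              StreamsBuilder builder = new StreamsBuilder();
              KStream˂String, String˃ raw = builder.stream("raw-events");
              raw.mapValues(String::toUpperCase)     // stand-in for a real enrich/transform step
                 .to("enriched-events");             // new topic for further consumption

              new KafkaStreams(builder.build(), props).start();
          }
      }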
DDDesign, CQRS ⅋ Event Sourcing
Domain Driven Design, CQRS and Event Sourcing
@[https://www.kenneth-truyers.net/2013/12/05/introduction-to-domain-driven-design-cqrs-and-event-sourcing/]
Enterprise Patterns
ESB Architecture
 - Can be defined by the following features:
   - Monitoring of services/messages passed between them
   - Wire-protocol bridging between HTTP, AMQP, SOAP, gRPC, CSV files on a filesystem, ...
- Scheduling, mapping, QoS management, error handling, ..
- Data transformation
- Data pipelines
- Mule, JBoss Fuse (Camel + "etc..."), BizTalk, Apache ServiceMix, ...
REF: https://en.wikipedia.org/wiki/Enterprise_service_bus#/media/File:ESB_Component_Hive.png
^ Special App. Services
|
E | Process Automation BPEL, Workflow
n |
t | Application Adapters RFC, BABI, IDoc, XML-RPC, ...
e m |
r e | Application Data Consolidation MDM, OSCo, ...
p s |
r s | Application Data Mapping EDI, B2B
i a | _______________________________
s g | Business Application Monitoring
e e | _______________________________
| Traffic Monitoring Cockpit
S c |
e h | Special Message Services Ex. Test Tools
r a |
v n | Web Services WSDL, REST, CGI
i n |
c e | Protocol Conversion XML, XSL, DCOM, CORBA
e l |
| Message Consolidation N.N (data locks, multi-submit,...)
B |
u | Message Routing XI, WBI, BIZTALK, Seeburger
s |
| Message Service MQ Series, MSMQ, ...
Apache NiFi: Data Routing
@[https://nifi.apache.org]
 - no-code, drag-and-drop Web GUI to build pipelines for
   data routing+transformation (99% of NiFi users will never see a line of code)
- Deployed as a standalone application (vs framework, api or library)
- Support batch ETL but "prefers" data streams.
- Support for binary data and large files ("multi-GB" video files).
- Complex (and performant) transformations, enrichments and normalisations.
- out of the box it supports:
- Kafka, Elastic, HDFS, S3, Postgres, Mongo, etc.
- Generic sources/endpoints: TCP, HTTP, IMAP, ...
- scalable directed graphs of data routing/transformation/system mediation logic.
- UI to control design, feedback, monitoring
- natively clustered - it expects (but isn't required) to be deployed on
multiple hosts that work together as a cluster for
performance, availability and redundancy.
- Highly configurable:
- Loss tolerant vs guaranteed delivery
- Low latency vs high throughput
- Dynamic prioritization
- Flow can be modified at runtime
- Back pressure
- Data Provenance
- Track dataflow from beginning to end
- Designed for extension
- Build your own processors and more
- Enables rapid development and
effective testing
- Secure
- SSL, SSH, HTTPS, encrypted content, etc...
- Multi-tenant authorization and internal
authorization/policy management
- DevOps:
@[https://dzone.com/articles/setting-apache-nifi-on-docker-containers]
REFS:
- NiFi+Spark:
@[https://blogs.apache.org/nifi/entry/stream_processing_nifi_and_spark"]
   - NiFi "vs" Camel:   [comparative]
@[https://stackoverflow.com/questions/65625166/apache-camel-vs-apache-nifi]
GraphQL Engine
• Blazing fast, instant realtime GraphQL APIs for DB with fine
grained access control, also trigger webhooks on database events.
• Pole position (1st) in GitHub's "Top 32 Automatic API Open Source Projects":
  @[https://awesomeopensource.com/projects/automatic-api]
• Ex:
PostgreSQL data GraphQL subscription UI (Web, Mobile, ...)
ORDERS TABLE subscription fetchOrder { ·········─→ event pushed to update UI
============================== orders ( where:
order_id payment dispatched {id: {_eq: 'XX_11'}}) {
---------+--------+----------- id
XX-11 true true payment
---------+--------+----------- dispatched
... }
NoCoDB: DDBB to spreadsheet
@[https://www.nocodb.com/]
• Second position in GitHub's "Top 32 Automatic API Open Source Projects":
  @[https://awesomeopensource.com/projects/automatic-api]
• Open Source "Airtable Alternative"
• Turns any database into a smart spreadsheet.
Metabase
https://github.com/metabase/metabase
• simplest way to get business intelligence and analytics from a DDBB to everyone.
• Low-code chart (including geo-map) generation.
Datasette.io
• Spin up a JSON API for any data in minutes.
• Prototype and prove your ideas without building a custom backend.
Redash:
- "Make your company Data Driven"
- Redash is designed to enable anyone, regardless of the level of
technical sophistication, to harness the power of data big and small.
SQL users leverage Redash to explore, query, visualize, and share
data from any data sources. Their work in turn enables anybody in
their organization to use the data. Every day, millions of users at
thousands of organizations around the world use Redash to develop
insights and make data-driven decisions.
- Supported DDBBs:
Amazon Athena MemSQL
Amazon DynamoDB Microsoft Azure Data Warehouse / Synapse
Amazon Redshift Microsoft Azure SQL Database
Axibase Time Series Database Microsoft SQL Server
Cassandra MongoDB
ClickHouse MySQL
CockroachDB Oracle
CSV PostgreSQL
Databricks (Apache Spark) Presto
DB2 by IBM Prometheus
Druid Python
Elasticsearch Qubole
Google Analytics Rockset
Google BigQuery Salesforce
Google Spreadsheets ScyllaDB
Graphite Shell Scripts
Greenplum Snowflake
Hive SQLite
Impala TiDB
InfluxDB TreasureData
JIRA Vertica
    JSON                            Yandex AppMetrica
Apache Kylin Yandex Metrica
OmniSciDB (Formerly MapD)
Business Data charts
@[https://www.forbes.com/sites/bernardmarr/2017/07/20/the-7-best-data-visualization-tools-in-2017/#643a48726c30]
Apache Superset: [TODO]
@[https://www.phoronix.com/scan.php?page=news_item&px=Apache-Superset-Top-Level]
Superset is the project's big data visualization and business
intelligence web solution. Apache Superset allows for big data
exploration and visualization with data from a variety of databases
ranging from SQLite and MySQL to Amazon Redshift, Google BigQuery,
Snowflake, Oracle Database, IBM DB2, and a variety of other
compatible data sources.
Python based.
Superset is also cloud-native in the sense that it is flexible and
lets you choose the:
- web server (Gunicorn, Nginx, Apache),
- metadata database engine (MySQL, Postgres, MariaDB, etc),
- message queue (Redis, RabbitMQ, SQS, etc),
- results backend (S3, Redis, Memcached, etc),
- caching layer (Memcached, Redis, etc),
Superset also works well with services like NewRelic, StatsD and
DataDog.
Low-Level Data charts
|ºKibanaº (TODO)
|- "A Picture's Worth a Thousand Log Lines"
|- visualize (Elasticsearch) data and navigate
|  the Elastic Stack, e.g. understanding
| the impact rain might have on your quarterly
| numbers
ºGrafanaº (TODO)
@[https://grafana.com/]
- time series analytics
- Integrates with Prometheus queries
@[https://logz.io/blog/grafana-vs-kibana/]
- 7.0+ includes new plugin architecture, visualizations,
transformations, native trace support, ...
https://grafana.com/blog/2020/05/18/grafana-v7.0-released-new-plugin-architecture-visualizations-transformations-native-trace-support-and-more/
TOGAF
@[https://en.wikipedia.org/wiki/The_Open_Group_Architecture_Framework]
The Open Group Architecture Framework (TOGAF) is a framework for enterprise
architecture that provides an approach for designing, planning, implementing,
and governing an enterprise information technology architecture.[2] TOGAF is
a high level approach to design. It is typically modeled at four levels:
Business, Application, Data, and Technology. It relies heavily on
modularization, standardization, and already existing, proven technologies and
products.
TOGAF was developed starting 1995 by The Open Group, based on DoD's TAFIM. As
of 2016, The Open Group claims that TOGAF is employed by 80% of Global 50
companies and 60% of Fortune 500 companies.[3]
Summary from @[https://www.redhat.com/architect/TOGAF-importance]
• Organization created in 1996; 850+ global members as of 2021-11,
  advancing technology standards services, certifications,
  research, and training.
• The Open Group is behind many world-class architectural and digital
standards, including:
- Open Agile Architecture (O-AA) Standard
- ArchiMate Modeling Language
- Certifying body for the Unix trademark.
  - development and management of the
ºOpen Group Architecture Framework (TOGAF)º
key industry-standard for enterprise architecture.
( more than 110,000 TOGAF certifications issued in 155 countries)
"... organizations are finding the need to pull these disparate
projects together and are turning to TOGAF Standard and enterprise
architecture as the most proven and reliable methodology of achieving
this. Understanding and managing that big picture of an enterprise
has always been at the heart of TOGAF Standard, and it will be
exciting to see how practitioners apply it to this next stage of
digitalization ...."
"... What TOGAF doesn't do is tell you how to do your job:
TOGAF doesn't give us a playbook, instead, it realigns the way we
solve problems into a common framework, approach, and a language that
can be communicated across a team of architects, a practice, an
industry, and across our profession."
"...TOGAF works because it provides a framework to identify and promote
validated patterns across an evolving set of customer and
industry-specific architectures that are then evolved into deployable
solution blueprints."
Kogito
@[http://kie.org/]
- Kogito == Drools + jBPM + OptaPlanner
BºDroolsº
- Drools is a business rule management system with a
forward-chaining and backward-chaining inference based rules engine,
allowing fast and reliable evaluation of business rules and complex
event processing.
  Also used in other projects like https://www.shopizer.com/, an OSS
  Java e-commerce solution based on Spring/Hibernate/Drools/Elastic.
BºjBPMº
@[https://www.jbpm.org/]
- toolkit for building business applications to help automate business processes and decisions.
- jBPM originates from BPM (Business Process Management) but it has
evolved to enable users to pick their own path in business
automation. It provides various capabilities that simplify and
externalize business logic into reusable assets such as cases,
processes, decision tables and more.
- business processes (BPMN2)
- case management (BPMN2 and CMMN)
- decision management (DMN)
- business rules (DRL)
- business optimisation (Solver)
- jBPM can be used as standalone service or embedded in custom
service. It does not mandate any of the frameworks to be used, it can
be successfully used in
- traditional JEE applications - war/ear deployments
- SpringBoot, Thorntail, ... deployments
- standalone java programs
BºOptaPlannerº [optimization]
- OptaPlanner is an AIºconstraint solverº. It optimizes planning and
scheduling problems, such as the Vehicle Routing Problem, Employee
Rostering, Maintenance Scheduling, Task Assignment, School
Timetabling, Cloud Optimization, Conference Scheduling, Job Shop
Scheduling, Bin Packing and many more. Every organization faces such
challenges: assign a limited set of constrained resources (employees,
assets, time and/or money) to provide products or services.
OptaPlanner delivers more efficient plans, which reduce costs and
improve service quality.
Messaging Architecture
Summary
• Messaging represents a layer below the ESB in some cases.
  Messaging architectures focus on the reliable delivery of messages of different nature/lifetime.
  On top of them, ESBs add data management/processing/transformation.
• In general a message in message-driven architectures represents some information that
  *MUST* be processed, while an event in event-stream architectures ("Kafka", "Pulsar", fast
  distributed queues) represents more generic information that MAY or MAY NOT be processed.
  Event streams can most of the time be used to build a message-stream architecture on top,
  and they usually have better real-time performance and/or (big data) scalability.
  On the other side, conventional messaging architectures focus on guaranteeing the proper
  delivery and consumption of messages.
  For example, messages triggering transactions representing "money" that do not require
  big-data scaling are better handled by message architectures than by general event-stream
  architectures.
- Message classification by consumption model:
   ┌────────────────────────────────────────────────────────────────────────────────────────────┐
   │ publish/   │ Publishers send messages (commands) to be consumed-and-forgotten.             │
   │ subscribe  │ Events cannot be replayed after being received.                               │
   │            │ New consumers will not be able to retrieve past events.                       │
   │            │ Messages are usually interpreted and processed in the same way                │
   │            │ by all clients. They behave like a sort of async-and-remote function call,    │
   │            │ where subscribing clients act as the called function.                         │
   │            │ Subscribers are decoupled in time, but coupled in logic.                      │
   │────────────────────────────────────────────────────────────────────────────────────────────│
   │ Event      │ Publishers write new events (source-of-truth) in order to a long-lasting log. │
   │ streaming  │ Consumers don't need to subscribe, and they can read from any part of the     │
   │            │ event stream log, allowing reads to be replayed.                              │
   │            │ Different clients can interpret and process event stream logs differently     │
   │            │ and for different purposes.                                                    │
   │            │ Subscribers are decoupled in time and in logic.                               │
   └────────────────────────────────────────────────────────────────────────────────────────────┘
 - Message classification by message expiration-time validity:
   ┌──────────────────────────────────────────────────────────────────────┐
   │ MESSAGE VALID            │ MESSAGING SYSTEM NEEDS │ Consumption model│
   │──────────────────────────│────────────────────────│──────────────────│
   │ for a short time         │ fast delivery          │ publish/subscribe│
   │──────────────────────────│────────────────────────│──────────────────│
   │ until consumed           │ persistence            │ publish/subscribe│
   │──────────────────────────│────────────────────────│──────────────────│
   │ for repeated consumption │ persistence and index  │ event streaming  │
   └──────────────────────────────────────────────────────────────────────┘
 - Message classification by semantics:
   ┌────────────────────────────────────────────────────────────────────────────┐
   │ NATURE         │ BEHAVIOUR                   │ NOTES                        │
   │────────────────│────────────────────────────│──────────────────────────────│
   │ Command/Order  │ · leads to a state change  │ · Async resilient alternative│
   │                │ · Requires response/result │   to sync HTTP PUT/POST      │
   │────────────────│────────────────────────────│──────────────────────────────│
   │ Query          │ · Requires response/result │ · Async resilient alternative│
   │                │   sync/async (fire⅋forget) │   to sync HTTP GET           │
   │────────────────│────────────────────────────│──────────────────────────────│
   │ Event          │ · Doesn't require response │ · Used in Command-Query      │
   │                │                            │   Responsibility Segregation │
   └────────────────────────────────────────────────────────────────────────────┘
Ex messaging system:
SYSTEM BEST FOR SEMANTICS WORST FOR
┌───────────────────────────────────────────────────────┐
│MQ "Classic │ ·Durable │·Command │ ·Replayable │
│Broker" │ ·Volatile │·event │ │
│─────────────│──────────────│─────────│────────────────│
│KAFKA │ ·Replayable │·Event │ ·Volatile │
│ │ ·Scalability │ │ ·light hardware│
│ │ ·Decoupled │ │ │
│ │ components │ │ │
│─────────────│──────────────│─────────│────────────────│
│Qpid │ ·Volatile │·Command │ ·Durable │
│ │ │·Query │ ·Replayable │
└───────────────────────────────────────────────────────┘
   ┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
   │ Working    │ Description                  │ PROS                        │ CONS                     │
   │ Model      │                              │                             │                          │
   │──────────────────────────────────────────────────────────────────────────────────────────────────│
   │ queuing    │ pool of consumers may read   │ allows splitting processing │ queues aren't            │
   │            │ from server and each record  │ over multiple consumers,    │ multi─subscriber—once    │
   │            │ goes to one of them          │ scaling with ease           │ one process reads the    │
   │            │                              │                             │ data it's gone.          │
   │──────────────────────────────────────────────────────────────────────────────────────────────────│
   │ publish/   │ record broadcast             │ broadcast data to           │ no way of scaling        │
   │ subscribe  │ to all consumers.            │ multiple processes          │ since every message      │
   │            │                              │                             │ goes to every subscriber │
   └──────────────────────────────────────────────────────────────────────────────────────────────────┘
Bº##################º
Bº# Message Queues #º
Bº##################º
Defined by
- message oriented architecture
   - Persistence (or durability until consumption)
- queuing
- Routing: point-to-point / publish-and-subscribe
- No processing/transformation of message/data
Bº########º
Bº# AMQP #º (RabbitMQ, Apache ActiveMQ, Qpid, Solace, ...)
Bº########º
@[https://en.wikipedia.org/wiki/Advanced_Message_Queuing_Protocol]
- Open standard network protocol
• binary wire protocol designed as open replacement for existing
proprietary messaging middleware.
• strong enterprise adoption. (JP Morgan claims to process
1+ billion messages a day. Also used by NASA, Google,...)
- Often compared to JMS:
- JMS defines API interfaces, AMQP defines network protocol
- JMS has no requirement for how messages are formed and
transmitted and thus every JMS broker can implement the
messages in a different (incompatible) format.
AMQP publishes its specifications in a downloadable XML format,
allowing library maintainers to generate APIs driven by
the specs while also automating construction of algorithms to
marshal and demarshal messages.
- brokers implementations supporting it:
@[https://www.amqp.org/about/examples]
- Apache Qpid (TODO)
@[http://qpid.apache.org/]
- INETCO's AMQP protocol analyzer
@[http://www.inetco.com/resource-library/technology-amqp/]
- JORAM JMS + AMPQ
@[http://joram.ow2.org/] 100% pure Java implementation of JMS
- Kaazing's AMQPºWeb Clientº
@[http://kaazing.net/index.html]
- Azure Service Bus+ AMPQ
-RºWhat's wrong with AMQP?º. Extracted from
@[https://news.ycombinator.com/item?id=1657574]
""" ... My answer, as someone who recentlyºswitched an AMQP backend º
ºto using redisºfor queues and pub/sub broadcast: AMQP is wrong the [comparative]
same way RDBMSes or Java are "wrong." It's too heavy, too slow, or
too complex for a great-many use cases...
Bº The truth is that something with a radically reduced feature set, º
Bºthat's simpler to use and understand, is really all that's needed toº
Bºmeet a whole lot of "real world" use cases. º
- @[http://www.windowsazure.com/en-us/develop/net/how-to-guides/service-bus-amqp-overview/]
- JBoss A-MQ, built from @[http://qpid.apache.org/]
@[http://www.redhat.com/en/technologies/jboss-middleware/amq]
- IBM MQLight
@[https://developer.ibm.com/messaging/mq-light/]
- StormMQ
@[http://stormmq.com/]
a cloud hosted messaging service based on AMQP
- RabbitMQ (by VMware Inc)
@[http://www.rabbitmq.com/];
also supported by SpringSource
...
- Java JMS
• See also:
·@[../WebTechnologies/map.html#asyncapi_summary] standard to model [standard]
event-driven async APIs.
· "The many meanings of event-driven architecture,"
@[https://www.youtube.com/watch?v=STKCRSUsyP0] (By Martin Fowler)
Message Brokers
- Routing
 - (De-)multiplexing of messages from/into multiple messages for different recipients
- Durability
- Transformation (translation of message between formats)
- "things usually get blurry - many solutions are both (message queue and message
broker) - for example RabbitMQ or QDB. Samples for message queues are
Gearman, IronMQ, JMS, SQS or MSMQ."
Message broker examples are, Qpid, Open AMQ or ActiveMQ.
 - Kafka can also be used as a message broker, but that is not its main purpose
ActiveMQ
• Based on JAVA
Bº######################º
Bº# ActiveMQ "Classic" #º
Bº######################º
endlessly pluggable architecture serving many generations of
applications.
· Web interface.
· JMS 1.1 with full client implementation including JNDI
(RºNo JMS 2.0 supportº)
·ºHigh availabilityºusing shared storage
· Familiar JMS-based addressing model
· "Network of brokers" for distributing load
· KahaDB+JDBC options for persistence
·BºQUICK DEVELOPMENT SETUP:º
→ Setup Java/$JAVA_HOME
→ Download ActiveMQ Classic
→ Execute (apache-activemq-5.16.2/bin/)wrapper
→ Open http://localhost:8161/admin/index.jsp
→ "Queues" tab → "Create" button
DONE!!!
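     ·ºEx. (sketch): sending one message to the queue created aboveº
       A hedged JMS 1.1 sketch against the local broker from the quick setup; the
       queue name "TEST.QUEUE" and the default tcp://localhost:61616 transport are
       assumptions for illustration.

         import javax.jms.Connection;
         import javax.jms.MessageProducer;
         import javax.jms.Queue;
         import javax.jms.Session;
         import org.apache.activemq.ActiveMQConnectionFactory;

         public class QuickJmsSend {
             public static void main(String[] args) throws Exception {
                 ActiveMQConnectionFactory factory =
                     new ActiveMQConnectionFactory("tcp://localhost:61616");
                 Connection connection = factory.createConnection();
                 try {
                     connection.start();
                     Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                     Queue queue = session.createQueue("TEST.QUEUE");     // hypothetical queue name
                     MessageProducer producer = session.createProducer(queue);
                     producer.send(session.createTextMessage("hello from JMS 1.1"));
                 } finally {
                     connection.close();   // JMS 1.1 Connection is not AutoCloseable
                 }
             }
         }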
Bº####################º
Bº# ActiveMQ Artemis #º
Bº####################º
• Multiprotocol("polyglot") broker (AMQP, OpenWire, MQTT, STOMP, HTTP, ...)
• Single / multi-cluster.
• Java based
• next generation High-performance, non-blocking architecture
· Web interface.
· JMS 1.1, 2.0 with full client implementation including JNDI
·ºHigh availabilityºusing shared storage or network replication
· Simple + powerful protocol agnostic addressing model
· Flexible clustering for distributing load
· Advanced journal implementations for low-latency persistence
as well as JDBC
· High feature parity with ActiveMQ "Classic" to ease migration
· Very good and free documentation:
@[https://activemq.apache.org/components/artemis/documentation/latest/book.pdf]
RabbitMQ
• Erlang-based alternative. Quite popular since it served as the AMQP
  reference implementation.
@-[https://www.rabbitmq.com/getstarted.html]
• Based on Erlang (Rºvery reduced developer communityº)
• Multiprotocol ("polyglot") broker (AMQP, MQTT, STOMP, ...)
  e.g.: RabbitMQ Web STOMP exposes messaging in a browser
  through WebSockets.
• P: Producer, C: consumer
P → │││queue_name │││ ──→ C Work queues
└─→ C ------------------
┌→ │││queue1 │││ ──→ C
P → │X│ ─┤ Publish/Subscribe
└→ │││queue2 │││ ──→ C ------------------
┌→ (error) │││queue1 │││ ──→ C
P → │X│ ─┤ Routing
^ │ info -------
| └───────error──→ │││queue2 │││ ──→ C
warning
type=direct
┌→ *.orange.* → │││queue1 │││ ──→ C
P → │X│ ─┤ Topics
^ │ ------
| └─ *.*.rabbit → │││queue2 │││ ──→ C
lazy.*
type=topic
C ─→ Request: ──→ │││ request │││ ──→ Server Async RPC
^ reply_to=amqp.gen... │ ---------
│ correlation_id=abc │
│ │
└─── Reply ─── │││ response │││ ──────┘
correlation_id=abc
See @[#activemq_summary]
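  ºEx. (sketch): work-queue producer matching the first diagram aboveº
  A hedged sketch using the com.rabbitmq:amqp-client Java library; host, queue name
  and payload are illustrative assumptions.

      import java.nio.charset.StandardCharsets;
      import com.rabbitmq.client.Channel;
      import com.rabbitmq.client.Connection;
      import com.rabbitmq.client.ConnectionFactory;

      public class WorkQueueProducer {
          public static void main(String[] args) throws Exception {
              ConnectionFactory factory = new ConnectionFactory();
              factory.setHost("localhost");                       // assumed local broker
              Connection connection = factory.newConnection();
              Channel channel = connection.createChannel();
              // durable=true, exclusive=false, autoDelete=false, no extra arguments
              channel.queueDeclare("queue_name", true, false, false, null);
              // the default ("") exchange routes by queue name, as in the P → queue → C diagram
              channel.basicPublish("", "queue_name", null,
                                   "task payload".getBytes(StandardCharsets.UTF_8));
              channel.close();
              connection.close();
          }
      }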
OpenHFT: microSec Messaging storing to disk
@[https://github.com/OpenHFT/]
 - Chronicle-Queue: Microsecond messaging that stores everything to disk
- Chronicle-Accelerate: HFT meets Blockchain in Java platform
XCL is a new cryptocurrency project that, learning from the previous
Blockchain implementations, aims to solve the issues limiting adoption
by building an entirely new protocol that can scale to millions of
transactions per second, delivering consistent sub-second latency. Our
platform will leverage AI to control volatility and liquidity, require
low energy and simplify compliance with integrated KYC and AML support.
The XCL platform combines low latencies (sub millisecond), IoT transaction
rates (millions/s), open source AI volatility controls and blockchain for
transfer of value and exchange of value for virtual fiat and crypto
currencies. This system could be extended to other asset classes such as
securities and fixed income. It uses a federated services model and
regionalized payment systems making it more scalable than a blockchain
which requires global consensus.
   The platform makes use of Chronicle-Salt for encryption and Chronicle-Bytes,
   built on Chronicle Core’s direct memory and OS system-call access.
 - Chronicle-Logger: A sub-microsecond Java logger, supporting standard logging
   APIs such as SLF4J & Log4j
CQRS
REF: @[https://www.redhat.com/architect/pros-and-cons-cqrs]
• CQRS stands for Command Query Responsibility Segregation
• The data-processing architecture pattern separates a service's write
tasks from its read tasks:
• Separating write activity from read activities allows you to use the
  best database technology for the task at hand, for example, a stream-event
  engine (Kafka/Pulsar/...) for writing and a non-SQL database for reading.
• While reading and writing to the same database is acceptable for
small applications, distributed applications operating at web-scale
require a segmented approach.
• Typically there's more read activity than write activity. Also,
read activity is immutable. Thus, replicas dedicated to reading data
can be spread out over a variety of geolocations.
• This approach allows users to get the data that is closest to them,
  resulting in a more efficient and scalable solution.
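  ºEx. (sketch): the command/query split in codeº
  A minimal, hedged Java 17 sketch of the segregation itself (not a full framework):
  all type and method names are hypothetical; in a real web-scale deployment the
  write path would append to a stream-event engine and the read path would hit
  dedicated read replicas, instead of the in-process map used here for brevity.

      import java.util.Map;
      import java.util.concurrent.ConcurrentHashMap;

      interface OrderCommands {                      // write side: mutates state
          void placeOrder(String orderId, String product);
      }
      interface OrderQueries {                       // read side: immutable reads
          OrderView findOrder(String orderId);
      }
      record OrderView(String orderId, String product, boolean dispatched) {}

      class OrderService implements OrderCommands, OrderQueries {
          private final Map˂String, OrderView˃ readModel = new ConcurrentHashMap˂˃();

          @Override public void placeOrder(String orderId, String product) {
              // a real system would publish an event here; the read model would be
              // updated asynchronously by a projection consuming that event
              readModel.put(orderId, new OrderView(orderId, product, false));
          }
          @Override public OrderView findOrder(String orderId) {
              return readModel.get(orderId);          // safe to serve from replicas/caches
          }
      }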
Distributed cache
(Extracted from whitepaper by Christoph Engelbert, Soft.Arch.at Hazelcast)
cache hit: data is already available in the cache when requested
             (otherwise it is a cache miss)
- Caches are implemented as simple key-value stores for performance.
- Caching-First: term to describe the situation where you start thinking
about Caching itself as one of the main domains of your application.
ºUsageº
-ºReference Dataº
- normally small and used to speed up the dereferencing
of previously known, fixed number of elements (e.g.
states of the USA, abbreviations of elements,...).
-ºActive DataSetº
   - Grow to their maximum size and evict the oldest or least
     frequently used entries to stay within memory bounds.
Caching Strategies
ºCooperative (Distributed) cachingº
   different cluster-nodes work together to build a huge, shared cache.
   Usually an "intelligent" partitioning algorithm is used to balance
   load across cluster nodes.
- common approach when system requires large amounts of data to be cached
ºPartial Cachingº
- not all data is stored in the cache.
ºGeographical Cachingº
- located in chosen locations to optimize latency
- CDN (Content Delivery Network) is the best known example of this type of cache
- Works well when content changes less often.
ºPreemptive Cachingº
- mostly used in conjunction with a Geographical Cache
- Using a warm-up engine a Preemptive Cache is populated on startup
and tries to update itself based on rules or events.
- The idea behind this cache addition is to reload data from any
backend service or central cluster even before a requestor wants
to retrieve the element. This keeps access time to the cached
elements constant and prevents accesses to single elements from
becoming unexpectedly long.
- Can be difficult to implement properly and requires a lot of
knowledge of the cached domain and the update workflows
ºLatency SLA Cachingº
- It's able to maintain latency SLAs even if the cache is slow
    or overloaded. This type of cache can be built in two different ways.
   - Having a timeout to exceed before the system either requests
     the potentially cached element from the original source
     (in parallel to the already running cache request) or returns a
     simple default answer, using whatever returns first.
   - Always fire both requests in parallel and take whatever returns first
     (discouraged since it mostly dismisses the value of caching). Can make
     sense if multiple caching layers are available.
Caching Topologies
-ºIn-process:º
   - cache shares the application's memory space.
   - most often used in non-distributed systems.
- fastest possible access speed.
- Easy to build, but complex to grow.
-ºEmbedded Node Cachesº
- the application itself will be part of the cluster.
- kind of combination between an In-Process Cache and the
Cooperative Caching
- it can either use partitioning or full dataset replication.
   - CONS: Application and cache cannot be scaled independently
-ºClient-Server Cachesº
- these systems tend to be Cooperative Caches by having a
multi-server architecture to scale out and have the
same feature set as the Embedded Node Caches
but with the client layer on top.
- This architecture keeps separate clusters of the applications
using the cached data and the data itself, offering
the possibility to scale the application cluster and the
caching cluster independently.
Evict Strategies
 -ºLeast Frequently Usedº
   - values that are accessed the least number of times are
     removed on memory pressure.
   - each cache record must keep track of its accesses using
     a counter which is increment-only.
 -ºLeast Recently Usedº
   - values whose last use is furthest back in time
     are removed on memory pressure.
   - each record must keep track of its last access timestamp
 Other eviction strategies can be found at
 @[https://en.wikipedia.org/wiki/Cache_replacement_policies]
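 ºEx. (sketch): Least Recently Used eviction with plain Javaº
 A hedged in-process sketch: LinkedHashMap in access-order mode already keeps the
 eviction bookkeeping described above; maxEntries stands in for "memory pressure".

     import java.util.LinkedHashMap;
     import java.util.Map;

     public class LruCache˂K, V˃ extends LinkedHashMap˂K, V˃ {
         private final int maxEntries;

         public LruCache(int maxEntries) {
             super(16, 0.75f, true);            // accessOrder=true: get() refreshes recency
             this.maxEntries = maxEntries;
         }

         @Override
         protected boolean removeEldestEntry(Map.Entry˂K, V˃ eldest) {
             return size() ˃ maxEntries;        // evict the least recently used entry
         }
     }

     // usage: LruCache˂String, String˃ cache = new LruCache˂˃(10_000);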
Memcached
@[https://www.memcached.org/
- distributed memory object caching system
- Memcached servers are unaware of each other. There is no crosstalk, no
   synchronization, no broadcasting, no replication. Adding servers increases
the available memory. Cache invalidation is simplified, as clients delete
or overwrite data on the server which owns it directly
 - initially intended to speed up dynamic web applications by alleviating database load
ºMemcached-session-managerº
@[https://github.com/magro/memcached-session-manager]
tomcat HA/scalable/fault-tolerant session manager
- supports sticky and non-sticky configurations
- Failover is supported via migration of sessions
@[https://www.infoworld.com/article/3063161/why-redis-beats-memcached-for-caching.html]
Redis
@[https://redis.io/]
• in-memory data structure store, used as a key-value database, cache.
  Commonly used as a shared-session store in load-balanced REST APIs (e.g., Spring has
  zero-code/low-config options to "move" sessions to Redis in the spring-cloud
  module).
• Since it can also notify listeners of changes in its state, it can
  also be used as a message broker (this is the case, for example,
  in Kubernetes, where etcd implements an asynchronous message system
  amongst its components).
• supports data structures such as strings, hashes, lists, sets, sorted sets
with range queries, bitmaps, hyperloglogs and geospatial indexes with
radius queries
• Redis has built-in replication, Lua scripting, LRU eviction, transactions
and different levels of on-disk persistence, and provides high availability
via Redis Sentinel and automatic partitioning with Redis Cluster
• Redis Bºis also considered a lightweight alternative to "heavy" messagingº
 Bºsystems (AMQP, ...)º.
@[https://www.infoworld.com/article/3063161/why-redis-beats-memcached-for-caching.html]
@[https://www.infoq.com/news/2018/10/Redis-5-Released]
• redis supports a variety of data types all oriented around binary safe strings.
- binary-safe strings (most common)
- lists of strings
- unordered sets of strings
- hashes
- sorted sets of strings
- maps of strings
- redis works best with smaller values (100k or less):
Bºconsider chopping up bigger data into multiple keysº
•ºeach data value is associated to a keyºwhich can be used
to lookup the value from the cache.
• up to 500 mb values are possible, but increases
network latency and Rºcan cause caching and out-of-memory issuesº
Rºif cache isn't configured to expire old valuesº
• redis keys: binary safe strings.
- guidelines for choosing keys:
- avoid long keys. they take up more memory and require longer
lookup times because they have to be compared byte-by-byte.
     - prefer a hash of a big key to the big key itself.
     - maximum size: 512 MB, but much smaller keys should be used.
- prefer keys like "sport:football;date:2008-02-02" to
"fb:8-2-2". extra size and performance difference is
negligible.
• data in redisºis stored in nodes and clustersº
•ºclusters are sets of three or more nodesºyour dataset is split
across.
Bº#########################º
Bº# common redis commands #º
Bº#########################º
┌───────────────────────────────────────────────────────────────────────┐
│command │ description │
│─────────────────┼─────────────────────────────────────────────────────│
│ping │ping the server. returns "pong". │
│─────────────────┼─────────────────────────────────────────────────────│
│set [key] [value]│sets key/value in cache. Return "ok" on success. │
│─────────────────┼─────────────────────────────────────────────────────│
│get [key] │gets a value from cache. │
│─────────────────┼─────────────────────────────────────────────────────│
│exists [key] │returns '1' if key exists, '0' if it doesn't. │
│─────────────────┼─────────────────────────────────────────────────────│
│type [key] │returns type associated to value │
│─────────────────┼─────────────────────────────────────────────────────│
│incr [key] │increment the given value associated with key by '1'.│
│ │value must be integer|double. Return new value │
│─────────────────┼─────────────────────────────────────────────────────│
│incrby │ increment value (associated to key) by amount │
│ [key] [amount]│value must be integer|double. Return new value │
│─────────────────┼─────────────────────────────────────────────────────│
│del [key] │ deletes the value associated with the key. │
│─────────────────┼─────────────────────────────────────────────────────│
│flushdb │ delete all keys and values in the database. │
└───────────────────────────────────────────────────────────────────────┘
+ create batch , create transaction related ones [TODO]
-ºredis has a command-line tool (redis-cli)º: ex:
STRING VALUE INTEGER VALUE
============ | =============
˃ set key01 value01 ←ok | ˃ set key02 100 ←ok
˃ get key01 ←"value01" | ˃ get key02 ←(integer) 100
| ˃ incr key02 ←(integer) 101
| ˃ incrby key02 50 ←(integer) 151
˃ type key01 ←(string) | ˃ type key02 ←(integer)
˃ exists key01 ←(string) 1 |
˃ del key01 ←(string) 1 |
˃ exists key01 ←(string) 0 |
-ºadding an expiration timeº (key time-to-live (ttl))
after which key is automatically deleted from cache.
- expirations can be set using seconds or milliseconds precision
but expire-time resolution is always 1 millisecond.
   - expire information is persisted to disk, so if the server remains
     stopped, the stopped time is counted as consumed time at restart.
- expiration example:
returns
˃ set counter 100 ← OK
˃ºexpire counter 5º ← (integer) 1
˃ get counter 100
... wait ...
˃ get counter º(nil)º
• redis deployments can consists of:
ºsingle node º, ºmultiple nodeº, ºclusteredº
• Eval: Lua Scripts [TODO]
@[https://redis.io/commands/eval]
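 ºEx. (sketch): the same commands from Java (Jedis client)º
 A hedged sketch assuming the redis.clients:jedis library and a local single node;
 it mirrors the redis-cli session above.

     import redis.clients.jedis.Jedis;

     public class RedisQuickstart {
         public static void main(String[] args) {
             try (Jedis jedis = new Jedis("localhost", 6379)) {   // assumed local node
                 jedis.set("key01", "value01");
                 System.out.println(jedis.get("key01"));          // → value01
                 jedis.set("key02", "100");
                 jedis.incr("key02");                             // → 101
                 jedis.incrBy("key02", 50);                       // → 151
                 jedis.set("counter", "100");
                 jedis.expire("counter", 5);                      // auto-deleted after 5 s (TTL)
             }
         }
     }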
Hazelcast
in-memory
data grid
@[https://en.wikipedia.org/wiki/Hazelcast]
- based on Java
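 ºEx. (sketch): embedded-node data grid in a few linesº
 A hedged sketch with the com.hazelcast:hazelcast library; each JVM that runs this
 joins the same cluster and shares the "reference-data" map (name chosen for
 illustration).

     import java.util.Map;
     import com.hazelcast.core.Hazelcast;
     import com.hazelcast.core.HazelcastInstance;

     public class GridQuickstart {
         public static void main(String[] args) {
             HazelcastInstance hz = Hazelcast.newHazelcastInstance();   // joins/forms the cluster
             Map˂String, String˃ states = hz.getMap("reference-data");  // distributed map
             states.put("CA", "California");
             System.out.println(states.get("CA"));
             hz.shutdown();
         }
     }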
Ehcache
(terabyte)
cache
@[http://www.ehcache.org/]
- Can be used as tcp service (distributed cache) or process-embedded
TODO: Same API for local and distributed objects?
- open source, standards-based cache that boosts performance, offloads I/O
- Integrates with other popular libraries and frameworks
- It scales from in-process caching, all the way to mixed
in-process/out-of-process deployments withºterabyte-sized cachesº
ºExample Ehcache 3 APIº:
CacheManager cacheManager =
CacheManagerBuilder.newCacheManagerBuilder()
.withCache("preConfigured",
CacheConfigurationBuilder.newCacheConfigurationBuilder(Long.class, String.class,
ResourcePoolsBuilder.heap(100))
.build())
.build(true);
   Cache˂Long, String˃ preConfigured =
       cacheManager.getCache("preConfigured", Long.class, String.class);
Cache˂Long, String˃ myCache = cacheManager.createCache("myCache",
CacheConfigurationBuilder.newCacheConfigurationBuilder(Long.class, String.class,
ResourcePoolsBuilder.heap(100)).build());
myCache.put(1L, "da one!");
String value = myCache.get(1L);
cacheManager.close();
 (a simpler/lighter but less scalable solution could be to use Google Guava Cache)
JBoss Cache
@[http://jbosscache.jboss.org/]
Distributed Storage
SAN vs ...
@[https://serversuit.com/community/technical-tips/view/storage-area-network,-and-other-storage-methods.html]
Storage Area Network, and Other Storage Methods
Data Lake
@[https://en.wikipedia.org/wiki/Data_lake]
UStore!!!
@[https://arxiv.org/pdf/1702.02799.pdf]
Distributed Storage with rich semantics!!!
  Today's storage systems expose abstractions which are either too low-level
  (e.g., key-value store, raw block store), requiring developers to re-invent
  the wheel, or too high-level (e.g., relational databases, Git), lacking the
  generality to support many classes of applications. In this work, we propose
  and implement a general distributed data storage system, called UStore,
  which has rich semantics. UStore delivers three key properties, namely
  immutability, sharing and security, which unify and add value to many
  classes of today's applications, and which also open the door for new
  applications. By keeping the core properties within the storage, UStore
  helps reduce application development efforts while offering high performance
  at hand. The storage embraces current hardware trends as key enablers. It is
  built around a data structure similar to that of Git, a popular source code
  versioning system, but it also synthesizes many designs from distributed
  systems and databases. Our current implementation of UStore has better
  performance than general in-memory key-value storage systems, especially for
  version scan operations. We port and evaluate four applications on top of
  UStore: a Git-like application, a collaborative data science application, a
  transaction management application, and a blockchain application. We
  demonstrate that UStore enables faster development and the UStore-backed
  applications can have better performance than the existing implementations.
  """ Our current implementation of UStore has better performance than general
      in-memory key-value storage systems, especially for version scan
      operations. We port and evaluate four applications on top of UStore: a
      Git-like application, a collaborative data science application, a
      transaction management application, and a blockchain application. We
      demonstrate that UStore enables faster development and the UStore-backed
      applications can have better performance than the existing
      implementations. """
NFS considered harmful
@[http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html]
cluster-FS comparative
@[http://zgp.org/linux-tists/20040101205016.E5998@shaitan.lightconsulting.com.html]
Ceph (exabyte Soft.Defined Storage)
@[http://ceph.com/ceph-storage/]
Ceph’s RADOS provides you with extraordinary data storage scalability—
thousands of client hosts or KVMs accessing petabytes to
exabytes of data. Each one of your applications can use the object, block or
file system interfaces to the same RADOS cluster simultaneously, which means
your Ceph storage system serves as a flexible foundation for all of your
data storage needs. You can use Ceph for free, and deploy it on economical
commodity hardware. Ceph is a better way to store data.
By decoupling the namespace from the underlying hardware, object-based
storage systems enable you to build much larger storage clusters. You
can scale out object-based storage systems using economical commodity hardware
, and you can replace hardware easily when it malfunctions or fails.
Ceph’s CRUSH algorithm liberates storage clusters from the scalability and
performance limitations imposed by centralized data table mapping. It
replicates and re-balance data within the cluster dynamically—elminating this
tedious task for administrators, while delivering high-performance and
infinite scalability.
See more at: http://ceph.com/ceph-storage/#sthash.KNp2tGf5.dpuf
When Ceph is Not Enough
Josh Goldenhar, VP Product Marketing, Lightbits Labs
... In this 30-minute webinar, we'll discuss the origins of Ceph and why
it's a great solution for highly scalable, capacity optimized storage
pools. You’ll learn how and where Ceph shines but also where its
architectural shortcomings make Ceph a sub-optimal choice for today's
high performance, scale-out databases and other key web-scale
software infrastructure solutions.
Participants will learn:
• The evolution of Ceph
• Ceph applicability to infrastructures such as OpenStack,
OpenShift and other Kubernetes orchestration environments
Rº• Why Ceph can't meet the block storage challenges of modern, º
Rº scale-out, distributed databases, analytics and AI/ML workloadsº
Rº• Where Cephs falls short on consistent latency responseº
• Overcoming Ceph’s performance issues during rebuilds
Bº• How you can deploy high performance, low latency block storage inº
Bº the same environments Ceph integrates with, alongside your Ceph º
Bº deployment.º
GlusterFS
 - Software-defined distributed storage.
Tachyon memory-centric distributed FS
@[http://tachyon-project.org/index.html]
- memory-centric distributed file system enabling reliable file sharing at memory-speed
across cluster frameworks, such as Spark and MapReduce. It achieves high performance by leveraging
lineage information and using memory aggressively. Tachyon caches working set files in memory,
thereby avoiding going to disk to load datasets that are frequently read. This enables different
jobs/queries and frameworks to access cached files at memory speed.
Tachyon is Hadoop compatible. Existing Spark and MapReduce programs can run on top of it without
any code change. The project is open source (Apache License 2.0) and is deployed at multiple companies.
It has more than 40 contributors from over 15 institutions, including Yahoo, Intel, and Redhat.
The project is the storage layer of the Berkeley Data Analytics Stack (BDAS) and also part of the
Fedora distribution.
S3QL FUSE FS (Amazon S3, GCS, OpenStack,...)
- FUSE-based file system
- backed by several cloud storages:
   - such as Amazon S3, Google Cloud Storage, Rackspace CloudFiles, or OpenStack
@[http://xmodulo.com/2014/09/create-cloud-based-encrypted-file-system-linux.html]
- S3QL is one of the most popular open-source cloud-based file systems.
- full featured file system:
- unlimited capacity
- up to 2TB file sizes
- compression
- UNIX attributes
- encryption
- snapshots with copy-on-write
- immutable trees
- de-duplication
- hardlink/symlink support, etc.
- Any bytes written to an S3QL file system are compressed/encrypted
locally before being transmitted to cloud backend.
- When you attempt to read contents stored in an S3QL file system, the
corresponding objects are downloaded from cloud (if not in the local
cache), and decrypted/uncompressed on the fly.
Nexenta
@[https://nexenta.com/]
- Software-Defined Storage Product Family.
Minio.io
@[https://minio.io]
- Private Cloud Storage
- high performance distributed object storage server, designed for
large-scale private cloud infrastructure. Minio is widely deployed across the
world with over 146.6M+ docker pulls.
Why Object storage?
@[https://www.ibm.com/developerworks/library/l-nilfs-exofs/index.html]
Object storage is an interesting idea and makes for a much more scalable
system. It removes portions of the file system from the host and pushes them
into the storage subsystem. There are trade-offs here, but by distributing
portions of the file system to multiple endpoints, you distribute the workload,
making the object-based method simpler to scale to much larger storage systems.
Rather than the host operating system needing to worry about block-to-file
mapping, the storage device itself provides this mapping, allowing the host to
operate at the file level.
Object storage systems also provide the ability to query the available
metadata. This provides some additional advantages, because the search
capability can be distributed to the endpoint object systems.
Object storage has made a comeback recently in the area of cloud storage. Cloud
storage providers (which sell storage as a service) represent their storage as
objects instead of the traditional block API. These providers implement APIs
for object transfer, management, and metadata management.
Distributed Tracing
OpenTelemetry
@[https://opentelemetry.io/]
- Emerging suite of standards for (Cloud Native) Observability.
- integration with modern observability SaaS providers like
Honeycomb, Datadog, Stackdriver, ...
• OpenTelemetry Announces Roadmap for Metrics Specification:
@[https://www.infoq.com/news/2021/03/opentelemetry-metrics-roadmap/]
The OpenTelemetry project announced its roadmap for its metrics
specification. The roadmap includes a stable metrics API/SDK, metrics
data model and protocol, and compatibility with Prometheus.
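  ºEx. (sketch): creating a span with the OpenTelemetry Java APIº
  A hedged sketch assuming the io.opentelemetry:opentelemetry-api artifact and that
  an SDK/agent (e.g. the OpenTelemetry Java agent) was configured elsewhere to export
  to Honeycomb/Datadog/Zipkin/...; the span and attribute names are made up.

      import io.opentelemetry.api.GlobalOpenTelemetry;
      import io.opentelemetry.api.trace.Span;
      import io.opentelemetry.api.trace.Tracer;
      import io.opentelemetry.context.Scope;

      public class TracedHandler {
          public static void main(String[] args) {
              Tracer tracer = GlobalOpenTelemetry.getTracer("demo-instrumentation");
              Span span = tracer.spanBuilder("handle-request").startSpan();
              try (Scope scope = span.makeCurrent()) {          // downstream calls join this trace
                  span.setAttribute("http.route", "/orders");   // illustrative attribute
                  // ... business logic ...
              } finally {
                  span.end();                                   // marks the span finished
              }
          }
      }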
Zipkin
@[https://zipkin.io/]
Testing
Genesis Distributed Testing
@[https://docs.whiteblock.io/introduction_to_testing.html]
The following are types of tests one can perform on a distributed system:
- Functional Testing is conducted to test whether a system performs as it was
specified or in accordance with formal requirements
- Performance Testing tests the reliability and responsiveness of a system
under different types of conditions and scenarios
- Penetration Testing tests the system for security vulnerabilities
- End-to-End Testing is used to determine whether a system’s process flow
functions as expected
- Fuzzing is used to test how a system responds to unexpected, random, or
invalid data or inputs
Genesis is a versatile testing platform designed to automate the tests listed
above, making it faster and simpler to conduct them on distributed systems
where it was traditionally difficult to do so. Where Performance, End-to-End,
and Functional testing comprise the meat of Genesis’ services, other types of
testing are enabled through the deployment of services and sidecars on the
platform.
End-to-End tests can be designed by applying exit code checks for process
completion, success, or failure in tasks and phases, while Performance tests
can be conducted by analyzing data from tests on Genesis that apply a variety
of network conditions and combinations thereof. Functional tests can use a
combination of tasks, phases, supplemental services and sidecars, and network
conditions, among other tools.
These processes and tools are further described in this documentation.
Jepsen Project
@[https://jepsen.io/]
Jepsen is an effort to improve the safety of distributed databases, queues,
consensus systems, etc. We maintain an open source software library for systems
testing, as well as blog posts and conference talks exploring particular
systems’ failure modes. In each analysis we explore whether the system lives
up to its documentation’s claims, file new bugs, and suggest recommendations
for operators.
Jepsen pushes vendors to make accurate claims and test their software
rigorously, helps users choose databases and queues that fit their needs, and
teaches engineers how to evaluate distributed systems correctness for
themselves.
AAA
keycloak
@[https://www.keycloak.org/]
@[https://medium.com/keycloak/keycloak-essentials-86254b2f1872] [TODO]
@[https://medium.com/keycloak/keycloak-realm-client-configuration-dfd7c8583489]
• See also @[../WebTechnologies/map.html#keycloak_angular_openid]
"""Add authentication to applications and secure services with minimum fuss"""
- No need to deal with storing users or authenticating users.
- It's all available out of the box.
- Advanced features such as User Federation, Identity Brokering and Social Login.
Single-Sign On LDAP and Active Directory
Login once to multiple applications Connect to existing user directories
Standard Protocols Social Login
OpenID Connect, OAuth 2.0 Easily enable social login
and SAML 2.0
Identity Brokering Clustering
OpenID Connect or SAML 2.0 IdPs For scalability and availability
High Performance Themes
Lightweight, fast and scalable Customize look and feel
Extensible
Customize through code
Password Policies
Customize password policies
- AA using the Keycloak REST API
@[https://developers.redhat.com/blog/2020/11/24/authentication-and-authorization-using-the-keycloak-rest-api/]
@[https://github.com/edwin/java-keycloak-integration]
Ex Use Case:
- centralized ID platform for Indonesia’s Ministry of Education
single sign-on:
Identity Roles:
- Students (IDs)
- Teachers (IDs)
   - each user has different access and privileges at each school.
BºSETUP:º {
→ Creating Keycloak realm:
→ Add 'education' realm for ministry.
(click enable and create)
→ go to 'Roles' page:
→ choose tab "Realm Roles".
→ Create two roles:
- "teacher"
- "student"
→ (Clients page) click Create to add a client:
- create "jakarta-school" client and save
→ Go to jakarta-school details page → "Settings tab":
- Enter next client config:
Client ID: jakarta-school
Enabled: ON
Consent Required: OFF
BºClient Protocol: openid-connectº
Access Type: confidential
BºStandard Flow Enabled: ONº
            Implicit Flow Enabled: OFF
BºDirect Access Grants Enabled: ONº
Browser Flow: browser (← Scroll to bottom)
Direct Grant Flow: direct grant
→ Go to tab "Roles" tab → Add next Roles:
- "create-student-grade"
- "view-student-grade"
- "view-student-profile"
→ go to tab "Client Scopes" → Default Client Scopes and
add “roles” and “profile” to Assigned Default Client
Scopes.
→ go to "jakarta-school details" page → Mappers →
Create Protocol Mappers:
- Set mappers to display the client roles on the
Userinfo API:
Name : roles
Mapper Type : User Realm Role
Multivalued : ON
Token Claim Name : roles
Claim JSON Type : String
Add to ID token : OFF
Add to access token: OFF
Add to userinfo : ON
→ go to "Users page" → "Add user".
- create the new users and Save. Ex:
Username : edwin
BºEmail : edwin@redhat.comº
First Name : Edwin
Last Name : M
User Enabled : ON
BºEmail Verified: OFFº
→ Go to tab "Role Mappings" → Client Roles
(for each user in jakarta-school)
Check that roles perm. are enabled:
"create-student-grade"
"view-student-grade"
"view-student-profile".
}
BºSpring Boot Java App using Keycloak for AAº
- pom.xml:
˂?xml version="1.0" encoding="UTF-8"?˃
...
˂dependencies˃
...
˂dependency˃
˂groupId˃com.auth0˂/groupId˃
˂artifactId˃jwks-rsa˂/artifactId˃
˂version˃0.12.0˂/version˃
˂/dependency˃
˂dependency˃
˂groupId˃com.auth0˂/groupId˃
˂artifactId˃java-jwt˂/artifactId˃
˂version˃3.8.3˂/version˃
˂/dependency˃
˂/dependencies˃
˂/project˃
- Config. properties file:
server.error.whitelabel.enabled=false
spring.mvc.favicon.enabled=false
server.port = 8082
keycloak.client-id=jakarta-school
keycloak.client-secret=197bc3b4-64b0-452f-9bdb-fcaea0988e90
keycloak.scope=openid, profile
keycloak.authorization-grant-type=password
keycloak.authorization-uri=http://localhost:8080/auth/realms/education/protocol/openid-connect/auth
keycloak.user-info-uri=http://localhost:8080/auth/realms/education/protocol/openid-connect/userinfo
keycloak.token-uri=http://localhost:8080/auth/realms/education/protocol/openid-connect/token
keycloak.logout=http://localhost:8080/auth/realms/education/protocol/openid-connect/logout
keycloak.jwk-set-uri=http://localhost:8080/auth/realms/education/protocol/openid-connect/certs
keycloak.certs-id=vdaec4Br3ZnRFtZN-pimK9v1eGd3gL2MHu8rQ6M5SiE
logging.level.root=INFO
- Config: Get the Keycloak public certificate ID from:
Keycloak → Education → Keys → Active (with RSA-generated key selected).
- Integration code: (several Keycloak APIs are involved).
- logout API:
@Value("${keycloak.logout}")
private String keycloakLogout;
public void logout(String refreshToken) throws Exception {
MultiValueMap˂String, String˃ map = new LinkedMultiValueMap˂˃();
map.add("client_id",clientId);
map.add("client_secret",clientSecret);
map.add("refresh_token",refreshToken); // ← RºWARNº: Use refresh
// (vs access) token
HttpEntity˂MultiValueMap˂String, String˃˃ request = new HttpEntity˂˃(map, null);
restTemplate.postForObject(keycloakLogout, request, String.class);
}
- Check whether a bearer token is valid and active or not
@Value("${keycloak.user-info-uri}")
private String keycloakUserInfo;
private String getUserInfo(String token) {
MultiValueMap˂String, String˃ headers = new LinkedMultiValueMap˂˃();
headers.add("Authorization", token);
HttpEntity˂MultiValueMap˂String, String˃˃ request =
new HttpEntity˂˃(null, headers);
return restTemplate.postForObject(keycloakUserInfo, request, String.class);
}
- Authorization (== "role valid for a given API"?) alternatives:
- Alt 1. determine what role a bearer token brings by verifying it
against Keycloak’s userinfo API
       RºDrawback:º multiple roundtrip requests needed
following response from Keycloak is expected:
{
"sub": "ef2cbe43-9748-40e5-aed9-fe981e3082d5",
Oº"roles": [ "teacher" ],º
"name": "Edwin M",
"preferred_username": "edwin",
"given_name": "Edwin",
"family_name": "M"
}
- Alt 2: decode JWT bearer token and validate it.
     - the code first needs the public key used for signing the token:
(It requires a round trip but can be cached)
- Keycloak provides a JWKS endpoint. Ex:
$ curl -L -X GET 'http://.../auth/realms/education/protocol/openid-connect/certs'
{ "keys": [
{
"kid": "vdaec4Br3ZnRFtZN-pimK9v1eGd3gL2MHu8rQ6M5SiE", ← key id
"kty": "RSA",
"alg": "RS256",
"use": "sig",
"n": "4OPCc_LDhU6ADQj7cEgRei4....",← Oºpublic key usedº
"e": "AQAB" Oºfor decodingº
}
] }
- A sample decoded JWT token will look like:
{
"jti": "85edca8c-a4a6-4a4c-b8c0-356043e7ba7d",
"exp": 1598079154,
"nbf": 0,
"iat": 1598078854,
"iss": "http://localhost:8080/auth/realms/education",
"sub": "ef2cbe43-9748-40e5-aed9-fe981e3082d5",
"typ": "Bearer",
"azp": "jakarta-school",
"auth_time": 0,
"session_state": "f8ab78f8-15ee-403d-8db7-7052a8647c65",
"acr": "1",
"realm_access": { "roles": [ "teacher" ] },
"resource_access": {
"jakarta-school": {
"roles": [ "create-student-grade", "view-student-profile",
"view-student-grade" ]
}
},
"scope": "profile",
"name": "Edwin M",
"preferred_username": "edwin",
"given_name": "Edwin",
"family_name": "M"
}
Ex JWT validation code:
@GetMapping("/teacher")
HashMap teacher(
@RequestHeader("Authorization") String authHeader) {
try {
DecodedJWT jwt = JWT.decode(
authHeader.replace("Bearer", "").trim());
Jwk jwk = jwtService.getJwk();
Algorithm algorithm = Algorithm.RSA256(
(RSAPublicKey) jwk.getPublicKey(), null);
algorithm.verify(jwt); // ← Bºcheck signatureº
List˂String˃ roles = ((List)jwt.getClaim("realm_access").
asMap().get("roles"));
if(!roles.contains("teacher")) // ← Bºcheck rolesº
throw new Exception("not a teacher role");
Date expiryDate = jwt.getExpiresAt(); // ← Bºcheck expirationº
if(expiryDate.before(new Date()))
throw new Exception("token is expired");
return new HashMap() {{ put("role", "teacher"); }}; // validation OK
} catch (Exception e) {
throw new RuntimeException(e);
}
}
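  NOTE: the jwtService.getJwk() helper above is not shown. A minimal sketch,
  assuming the com.auth0:jwks-rsa library (same vendor as the java-jwt classes
  used above) and the keycloak.jwk-set-uri / keycloak.certs-id values from the
  properties file:
    import com.auth0.jwk.Jwk;
    import com.auth0.jwk.JwkProvider;
    import com.auth0.jwk.UrlJwkProvider;
    import org.springframework.beans.factory.annotation.Value;
    import java.net.URL;

    public class JwtService {
      @Value("${keycloak.jwk-set-uri}")
      private String jwkSetUri;  // ← Keycloak JWKS ("certs") endpoint
      @Value("${keycloak.certs-id}")
      private String certsId;    // ← "kid" of the realm's active RSA key

      public Jwk getJwk() throws Exception {
        JwkProvider provider = new UrlJwkProvider(new URL(jwkSetUri));
        return provider.get(certsId);  // ← select the key by its "kid"
        // (wrap the provider in a GuavaCachedJwkProvider to cache the key
        //  and avoid one round trip per request)
      }
    }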
FreeIPA
@[https://www.reddit.com/r/linuxadmin/comments/apbjtc/freeipa_groups_and_linux_usernames/]
Non-classified
API Management
@[https://www.datamation.com/applications/top-api-management-tools.html]
Behaviour
Driven
Development
(BDD)
- Cucumber, ...
@[https://cucumber.io/docs/guides/10-minute-tutorial/]
Infra vs App Monitoring
See also @[../JAVA/java_map.html#mdc_summary]
BºInfrastructure Monitoring:º
- Prometheus + Grafana
(Alternatives include Monit, Datadog, Nagios, Zabbix, ...)
BºApplication distributed tracing/Monitoring (End-to-End Request Tracing)º
- Jaeger (OpenTracing compatible), Zipkin, New Relic
(Alternatives include AppDynamics, Instana, ...)
-ºtrack operations inside and across different systemsº.
- Check, for example, how an incoming request affected the web
  server, database, app code, queue system, ..., all presented along
  a timeline.
- Especially valuable in distributed systems.
- It complements logs and metrics:
  - App request metrics warn about latencies in remote requests,
    while local traces only cover each isolated system.
  - Logs on each isolated system can explain why that system is "slow".
  - Infra monitoring warns about scarcity of CPU/memory/storage resources.
- See also notes on Spring Cloud Sleuth @[../JAVA/java_map.html] that
offers an interface to different tracing tech.stacks. (Jaeger,...)
• Related: https://github.com/dapr/dapr
Dapr: portable, event-driven, runtime for building distributed
applications across cloud and edge.
...by letting Dapr’s sidecar take care of the complex challenges such
as service discovery, message broker integration, encryption,
observability, and secret management, you can focus on business logic
and keep your code simple.
 ... usually a developer must add some code to instrument an
 application for this purpose and send the collected data to an
 external monitoring tool ... Having to maintain this code adds
 another burden, sometimes requiring understanding the monitoring
 tools' APIs, additional SDKs, ... and different cloud providers
 offer different monitoring solutions.
• Observability with Dapr:
• When building an application which leverages Dapr building blocks
to perform service-to-service calls and pub/sub messaging, Dapr
offers an advantage with respect to distributed tracing. Because this
inter-service communication flows through the Dapr sidecar, the
sidecar is in a unique position to offload the burden of
application-level instrumentation.
• Distributed tracing
Dapr can be configured to emit tracing data, and because Dapr does so
using widely adopted protocols such as the Zipkin protocol, it can be
easily integrated with multiple monitoring backends.
BºLogging How Toº:
→ Start by adding logs and infra monitoring
→ add application monitoring:
  (requires support from programming languages/libraries, developers and devOps teams)
- instrument code (Jaeger)
- add tracing to infrastructure components
- load balancers
→ deploy App tracing system itself.
(Ex.: Jaeger server, ...)
Bº(Bulk) Log Managementº
- Elastic Stack
(Alternative include Graylog, Splunk, Papertrail, ...)
Jaeger (App.Monit)
@[https://www.jaegertracing.io/]
@[https://logz.io/blog/zipkin-vs-jaeger/]
- Supports OpenTracing instrumentation libraries
  like @[https://github.com/opentracing-contrib].
- K8s Templates and Operators supported. [k8s]
@[https://developers.redhat.com/blog/2019/11/14/tracing-kubernetes-applications-with-jaeger-and-eclipse-che/]
- Jaeger addresses next problems:
- end-to-end distributed transaction monitoring/tracing.
troubleshooting complex distributed systems.
- performance and latency optimization.
- root cause analysis.
- service dependency analysis.
- distributed context propagation.
º"... As on-the-ground microservice practitioners are quickly realizing, theº
º majority of operational problems that arise when moving to a distributedº
º architecture are ultimately grounded in two areas: networking andº
º observability. It is simply an orders of magnitude larger problem to networkº
º and debug a set of intertwined distributed services versus a singleº
º monolithic application..."º
Jaeger Log Data Architecture Schema:
(Based on OpenTracing)
@[https://www.jaegertracing.io/docs/1.21/architecture/]
@[https://github.com/opentracing/specification/blob/master/specification.md]
ºSPANº ºTRACEº:
(logical work-unit) - data/execution path
---------------- through system components
- operation name (sort of directed acyclic
- start time graph of spans).
- duration.
- Spans may be nested
and ordered to model
causal relationships.
Unique ID → │A│
(Context) └─│B│
└─│C│─│D│
└········│E│
^^^^^^^^
barrier waiting for
│B│,│C│ results
········→ time line →···············
┐
├───────────── SPAN A ────────────┤ │TRACE
├──── SPAN B ───┤ │
├─ SPAN C ─┤ │
├─ SPAN D ─┤ │
├─ SPAN E ─┤ │
┘
• Jaeger Components Schema:
BºDATA INPUTº
Application jaeger-collector
└→ App Instrumetation ┌→ (Go Lang)
└→ OpenTracing API · + memory-queue ·┐
└→ Jaeger-client ···→ Jaeger ··┘ ·
agent ·
(Go lang) ·
·
Data Store ←··┘
(Cassandra,
Elastic Search,
Kafka, Memory)
^ ^
· ·
BºDATA QUERY (App. Monit)º · ·
· ·
Jaeger-UI ←····→ jaeger-query ←·········┘ ·
(Web React) (GO lang) ·
(Alt.1) ·
·
Spark ······································┘
(Alt.2)
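• Minimal instrumentation sketch (assuming the io.jaegertracing:jaeger-client
  and io.opentracing:opentracing-api libraries; the service, operation and tag
  names below are made up). It creates one SPAN as described above and reports
  it to the Jaeger agent configured through JAEGER_* environment variables:
    import io.jaegertracing.Configuration;
    import io.opentracing.Scope;
    import io.opentracing.Span;
    import io.opentracing.Tracer;

    public class TracedWorker {
      public static void main(String[] args) {
        // JAEGER_AGENT_HOST, JAEGER_SAMPLER_TYPE, ... are read from the environment
        Tracer tracer = Configuration.fromEnv("inventory-service").getTracer();

        Span span = tracer.buildSpan("load-inventory").start(); // ← SPAN: op.name + start time
        try (Scope ignored = tracer.activateSpan(span)) {
          span.setTag("warehouse", "eu-west");                  // ← contextual metadata
          // ... DB / downstream calls go here; their spans nest under this one,
          //     forming the TRACE (DAG of spans) shown above ...
        } finally {
          span.finish();                                        // ← duration = start..finish
        }
      }
    }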
Disruptor
Disruptor
See also
lmax-exchange.github.com
When Disruptor is not good fit
GraphQL
• GraphQL: Facebook-developed language that provides a powerful API to get only
the dataset you need in a single request, seamlessly combining data sources.
• REF: https://www.redhat.com/en/topics/api/what-is-graphql
• Related: the GraphiQL monorepo aims to give the GraphQL Community:
  - a to-specification official language service (see: API Docs)
  - a comprehensive LSP server and CLI service for use with IDEs
  - a codemirror mode
  - a monaco mode (in the works)
  - an example of how to use this ecosystem with GraphiQL
  - examples of how to implement or extend GraphiQL
Apollo GraphQL Middleware
@[https://www.apollographql.com/]
· Simplify app development by combining APIs, databases, and microservices into a
single data graph that you can query with GraphQL.
· @[https://www.infoq.com/news/2020/06/apollo-platform-graphql/]
Apollo Data Graph Platform: A GraphQL Middleware Layer for the Enterprise
Big Data
Ambari: Hadoop Cluster Provision
@[https://projects.apache.org/project.html?ambari]
Apache Ambari makes Hadoop cluster provisioning, managing, and monitoring dead simple.
Spark
@[http://spark.apache.org/]
- General cluster computing framework
- initially designed around the concept of Resilient Distributed Datasets (RDDs).
- RDDs enable data reuse by persisting intermediate results in memory
- Use case 1: fast computations for iterative algorithms.
- especially beneficial for work flows like machine learning
(Bºsame operation may be applied over and over againº)
- Use case 2: clean and/or transform data.
- Speed
- Run workloads 100x faster (compared to Hadoop?).
- Apache Spark achieves high performance for both batch
and streaming data, using a state-of-the-art DAG scheduler,
a query optimizer, and a physical execution engine.
- Ease of Use
- Write applications quickly in Java, Scala, Python, R, and SQL.
- Spark offers over 80 high-level operators that make it easy to
build parallel apps. And you can use it interactively from the
Scala, Python, R, and SQL shells.
- Example PythonBºDataFrame APIº.
|Bºdfº=ºspark.read.jsonº("logs.json") ← automatic schema inference
|Bºdfº.where("age ˃ 21")
| .select("name.first").show()
- Generality
- Combine SQL, streaming, and complex analytics.
- Spark powers a stack of libraries including SQL and DataFrames,
MLlib for machine learning, GraphX, and Spark Streaming.
You can combine these libraries seamlessly in the same application.
- Runs Everywhere
- Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone,
or in the cloud. It can access diverse data sources.
- You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN,
on Mesos, or on Kubernetes. Access data in HDFS, Alluxio, Apache Cassandra,
Apache HBase, Apache Hive, and hundreds of other data sources.
- Common applications for Spark include real-time marketing campaigns, online
product recommendations, cybersecurity analytics and machine log monitoring.
Apache Druid
- Complementary to Spark, Druid can be used to accelerate OLAP queries in Spark.
- Spark: indicated for BºNON-interactiveº latencies.
- Druid: indicated for Bº    interactiveº (sub-second) scenarios.
- Use-case:
- powering applications used by thousands of users where each query
must return fast enough such that users can interactively explore
through data.
- Druid fully indexes all data, and Bºcan act as a middle layer between º
BºSpark and final appº.
BºTypical production setupº:
data → Spark processing → Druid → Serve fast queries to clients
More info at:
https://www.linkedin.com/pulse/combining-druid-spark-interactive-flexible-analytics-scale-butani
Hadoop "vs" Spark
@[https://www.infoworld.com/article/3014440/big-data/five-things-you-need-to-know-about-hadoop-v-apache-spark.html]
Hadoop is essentially a distributed data infrastructure:
-It distributes massive data collections across multiple nodes
within a cluster of commodity servers
-It also indexes and keeps track of that data, enabling
big-data processing and analytics far more effectively
than was possible previously.
Spark, on the other hand, is a data-processing tool that operates on those
distributed data collections; it doesn't do distributed storage.
You can use one without the other:
- Hadoop includes not just a storage component, known as the
Hadoop Distributed File System, but also a processing component called
MapReduce, so you don't need Spark to get your processing done.
- Conversely, you can also use Spark without Hadoop. Spark does not come with
its own file management system, though, so it needs to be integrated with one
- if not HDFS, then another cloud-based data platform. Spark was designed for
Hadoop, however, so many agree they're better together.
Spark is generally a lot faster than MapReduce because of the way it processes
data. While MapReduce operates in steps, Spark operates on the whole data set
in one fell swoop:
"The MapReduce workflow looks like this: read data from the cluster, perform
an operation, write results to the cluster, read updated data from the
cluster, perform next operation, write next results to the cluster, etc.,"
explained Kirk Borne, principal data scientist at Booz Allen Hamilton.
Spark, on the other hand, completes the full data analytics operations
in-memory and in near real-time:
"Read data from the cluster, perform all of the requisite analytic
operations, write results to the cluster, done," Borne said.
Spark can be as much as 10 times faster than MapReduce for batch processing and
up to 100 times faster for in-memory analytics, he said.
You may not need Spark's speed. MapReduce's processing style can be just fine
if your data operations and reporting requirements are mostly static and you
can wait for batch-mode processing. But if you need to do analytics on
streaming data, like from sensors on a factory floor, or have applications that
require multiple operations, you probably want to go with Spark.
Most machine-learning algorithms, for example, require multiple operations.
Recovery: different, but still good.
Hadoop is naturally resilient to system faults or failures since data
are written to disk after every operation, but Spark has similar built-in
resiliency by virtue of the fact that its data objects are stored in something
called resilient distributed datasets distributed across the data cluster.
"These data objects can be stored in memory or on disks, and RDD provides full
recovery from faults or failures," Borne pointed out.
Hive
Data W.H. on top of Hadoop
@[https://projects.apache.org/project.html?hive]
- querying and managing utilities for large datasets
residing in distributed storage built on top of Hadoop
- easy data extract/transform/load (ETL)
* a mechanism to impose structure on a variety of data formats*
- access to files stored in Apache HDFS and other data storage
systems such as Apache HBase (TM)
HiveMQ Broker: MQTT+Kafka for IoT
@[https://www.infoq.com/news/2019/04/hivemq-extension-kafka-mqtt/]
Data Science
Orange
@[https://orange.biolab.si/screenshots/]
Features:
- Interactive Data Visualization
- Visual Programming
- Student-friendly.
- Add-ons
- Python Anaconda Friendly
$ conda config --add channels conda-forge
$ conda install orange3
$ conda install -c defaults pyqt=5 qt
- Python pip Friendly
$ pip install orange3
Example Architectures
observability:
logging
+ monitoring
+ tracing
Ex:ºFile Integrity Monitoring at scale: (RSA Conf)º
@[https://www.rsaconference.com/writable/presentations/file_upload/csv-r14-fim-and-system-call-auditing-at-scale-in-a-large-container-deployment.pdf]
Auditing log to gain insights at scale:
┌─→ Pagerduty
┌─→ Grafana ─┼─→ Email
Elastic │ └─→ Slack
Search ─┼─→ Kibana
│
└─→ Pre─processing ─→ TensorFlow
Alt1:
User │ go-audit- User space
land │ container app
───────├───── Netlink ───── Syscall iface ───────────
Kernel │ socket ^
│ ^ |
└─ Kauditd ───┘
Logging@Coinbase
@[https://www.infoq.com/news/2019/02/metrics-logging-coinbase]
Adidas Reference Architecture
Source @[https://adidas.github.io]
ºuser interfaceº
Working with the latest JavaScript technologies as well as building a stable
ecosystem is our goal. We are not only using one of many available frameworks,
we test them, we analyze our needs and we choose what has the best
balance between productivity and developer satisfaction.
adidas is moving most of the applications from the desktop to the browser, and
we are improving every day to achieve it in the least time possible:
standardization, good practices, continuous deployment and automated tasks are
part of our daily work.
Our stack has changed a lot since the 2015. Back then we used code which is
better not to speak about. Now, we power our web applications with frameworks
like React and Vue and modern tooling. Our de facto tool for shipping these
applications is webpack.
We love how JavaScript is changing everyday and we encourage the developers to
use its latest features. TypeScript is also a good choice for developing
consistent, highly maintainable applications.
• ºAPIº
Apiary is our collaborative platform for designing, documenting and governing
API's. It helps us to adopt API first approach with templates and best
practices. Platform is strongly integrated with our API guidelines and helps us
to achieve consistency of all our API's. Apiary also speeds up and simplifies
the development and testing phase by providing API mock services and automation
of API testing.
Mashery is an API platform that delivers key API management capabilities needed
for successful digital transformation initiatives. Its full range of
capabilities includes API creation, packaging, testing, and management of your
API's and community of users. API security is provided via an embedded or
optional on-premise API gateway.
Runscope is our continuous monitoring platform for all our API's we are
exposing. It monitors uptime, performance and validates the data. It is tightly
integrated with Slack and Opsgenie in order to alert and create an incident if
things are going the wrong way. Using Runscope we can detect any issues with our
API's super fast and act accordingly to achieve the highest possible SLA.
ºfast dataº
Apache Kafka is the core of our Fast Data Platform.
The more we work with it, the more versatile we perceived it was, covering more
use cases apart of event sourcing strategies.
It is designed to implement the pub-sub pattern in a very efficient way,
enabling Data Streaming cases, where multiple consumers can easily subscribe to
the same stream of information. The fan-out in the pub-sub pattern is
implemented in a really efficient way, allowing it to achieve high throughput
on both the production and the consumption side.
It also enables other capabilities like Data Extraction and Data Modeling, as
well as Stateful Event Processing. Together with Storm, Kafka Streams and the
Elastic stack, it provides the perfect toolset to integrate our digital
applications in a modern self-service way.
ºacidº
Jenkins has become the de facto standard for continuous integration/continuous
delivery projects.
ACID is a 100% Docker solution powered by Kubernetes that allows you to create a
Jenkins-as-a-Service platform. ACID provides an easy way to generate an isolated
Jenkins environment per team while keeping a shared basic configuration and a
central management dashboard.
Updating and maintaining team instances has never been so easy. Team freedom
provided out of the box, GitOps disaster-recovery capabilities and elastic
slave allocation are the key pillars.
ºtestingº
SerenityBDD is our default for automation in test solution. Built on top of
Selenium and Cucumber, it enables us to write cleaner and more maintainable
automated acceptance and regression tests in a faster way. With archetypes
available for both frontend (web) and backend (API), it is almost
straightforward to implement in most projects.
Supporting a wide type of protocols and application types, Neoload is our tool
of choice for performance testing. It provides an easy to use interface with
powerful recording capabilities, allowing manual implementation as well if the
user chooses. Since it supports dockerized load generators, it makes it easy to
create a shared infrastructure and integrate it in our continuous integration
chain.
ºmonitoringº
From the monitoring perspective, we are using different tools, trying to provide
end-to-end monitoring internally for better troubleshooting.
For system and alerting monitoring, we are using Prometheus, an open source
solution, easy for developers and completely integrated with Kubernetes. For
infrastructure monitoring we have Icinga with hundreds of custom scripts and
Graphite as the timeseries database. For APM, Instana is our solution, which
makes it easy and fast to monitor the applications and identify bottlenecks. We
use Runscope to test and monitor our APIs on a continuous schedule to give us
complete visibility into API problems.
Grafana is the visualization tool by default which is integrated with
Prometheus, Instana, Elasticsearch, Graphite, etc.
For logging, we rely on the ELK stack running on-premises and on AWS
Kubernetes, with custom components for processing and alerting that are able to
notify the teams about problems.
As the goal of monitoring is troubleshooting, all these tools are integrated
with Opsgenie, where we have defined escalation policies for alerting and
notifications. Incidents can be communicated to the end users via Statuspage.
ºmobileº
For mobile development, we stay away from hybrid solutions. Native is where we live.
We work with the latest and most concise, expressive and safe languages
available at the moment for android and iOS mobile phones: Kotlin and Swift.
Our agile mobile development process requires big doses of automation to test,
build and distribute reliably and automatically every single change in the app.
We rely on tools such as Jenkins and Fastlane to automate those steps.
100% Serverless AWS arch
What a typical 100% Serverless Architecture looks like in AWS!
@[https://medium.com/serverless-transformation/what-a-typical-100-serverless-architecture-looks-like-in-aws-40f252cd0ecb]
Example Data Store Arch
- Mix data store model based on:
- Document store:
- MongoDB : Store and manage objects directly as documents, keeping their structure intact.
                Avoids Object-Relational Mapping onto an RDBMS.
  - MongoDB-GridFS : storage system supporting efficient querying and storage of binary files
                (e.g. streamed videos).
                Avoids the need for separate blob storage structures.
- Graph database: (Hyper-relational DDBB)
  - Neo4j: allows complex queries over highly linked data, storing data as nodes and links.
- Spring Data module repositories: used to "match" the Java codebase with the underlying data store.
  - Allows mapping Java objects to MongoDB with minimal effort (see sketch below).
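  - Minimal sketch of such a Spring Data repository (assuming
    spring-boot-starter-data-mongodb; the Video class, its fields and the
    "videos" collection are illustrative only):
      import org.springframework.data.annotation.Id;
      import org.springframework.data.mongodb.core.mapping.Document;
      import org.springframework.data.mongodb.repository.MongoRepository;
      import java.util.List;

      @Document(collection = "videos")              // ← stored directly as a document,
      public class Video {                          //   no object-relational mapping
        @Id private String id;
        private String title;
        // getters/setters omitted
      }

      public interface VideoRepository extends MongoRepository˂Video, String˃ {
        List˂Video˃ findByTitleContaining(String fragment); // ← derived query,
      }                                                     //   no boilerplate needed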
Facebook
netconsole monit@scale
@[http://www.serverwatch.com/server-news/linuxcon-how-facebook-monitors-hundreds-of-thousands-of-servers-with-netconsole.html]
"""""" Facebook had a system in the past for monitoring
that used syslog-ng, but it was less than 60 percent reliable.
In contrast, Owens stated netconsole is highly scalable and can
handle enormous log volume with greater than 99.99 percent
reliability. """""""
Badoo
20 billion Events/day
@[https://www.infoq.com/news/2019/08/badoo-20-billion-events-per-day/]
Kafka
Summary
External Links
- Documentation:
@[http://kafka.apache.org/documentation/]
- Clients: (Java,C/C--,Python,Go,...)
@[https://cwiki.apache.org/confluence/display/KAFKA/Clients]
- Official JAVA JavaDoc API:
@[http://kafka.apache.org/11/javadoc/overview-summary.html]
- Papers, ppts, ecosystem, system tools, ...:
@[https://cwiki.apache.org/confluence/display/KAFKA/Index]
- Kafka Improvement Proposals (KIP)
@[https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals]
- Data pipelines with Kotlin+Kafka+(async)Akka
@[../JAVA/kotlin_map.html?query=data+pipelines+kafka]
- Cheatsheets:
@[https://lzone.de/cheat-sheet/Kafka]
@[https://ronnieroller.com/kafka/cheat-sheet]
@[https://abdulhadoop.wordpress.com/2017/11/06/kafka-command-cheat-sheet/]
@[https://github.com/Landoop/kafka-cheat-sheet]
Global Diagram
(one+ server in 1+ datacenters)
☞Bºbest practiceº: publishers must be unaware of underlying
LOG partitions and only specify a partition key:
  While LOG partitions are identifiable and can be written
  to directly, this is discouraged; higher-level
  constructs should be used instead.
┌────────┐ ┌──────────┐ ┌───────────┐
│PRODUCER│ 1 ←······→ 1 │TOPIC │ 1 ←···→ 0...N │SUBSCRIBERS│
│ ··→ stream of··→(CATEGORY)│ │ │
└────────┘ OºRECORDsº └──────────┘ └───────────┘
↑ ====== 1 1 ← Normally, 1←→1, but not necesarelly
| Oº·key º ↑ ↑
· Oº·value º · ·
· Oº·timeStamp º · ·
· · · GºCONSUMER GROUPº: cluster of consumers
· · ↓ | (vs single process consumer) º*1º
(optionally) producers · 1 ↓
can wait for ACK. ↓ CONSUMER | CONSUMER ←───┐
· 1 GROUP A | GROUP B │
· ┌───┬────────────────────┬──────────────────┐ --------- | --------------------
· │LOG│ │ ordered records │ Cli1 Cli2 | Cli3 Cli4 Cli5 Cli6 │
| ├───┘ │ ─────────────── │ | | | | | | |
└···→┌Partitionº0ºreplica ┌1 │ 0 1 2 3 4 5 │ ←┤···········←┘ · · · │
│└Partitionº0ºreplica │2┐│ 0 1 2 3 4 5 │ · · | · · ·
│┌Partition 1 replica ├1││ 0 1 2 3 │ · · | · · · │
│└Partition 1 replica │2┤│ 0 1 2 3 │ ·+···←┤ | · · ·
│┌Partitionº2ºreplica ├1││ ... │ · | | · · · │
│└Partitionº2ºreplica │2┤│ │ |···←┘ | · · ·
│┌Partition 3 replica ├1││ - Partitions are │ ←┴················←┘ · · │
│└Partition 3 replica │2┤│ independent and │ | · ·
│┌Partitionº4ºreplica ├1││ grow at differn. │ | · · │
│└Partitionº4ºreplica │2┤│ rates. │ | · ·
│┌Partition 5 replica ├1││ │ | · · │
│└Partition 5 replica │2┤│ - Records expire │ ←······················←┘ ·
│┌Partitionº6ºreplica └1││ and can not be │ | · │
│└Partitionº6ºreplica 2┘│ deleted manually │ ←···························←┘
└──↑───────────────────^─┴──────────────────┘ * num. of group instances │
│ │ ←── offset ─→ must be <= # partitions
│ └────────────┐ record.offset (sequential id) │
┌─────────────────────┴─────────────────┐ │ uniquelly indentifies the record
─ Partitions serve several purposes: │ within the partition. │
─ allow scaling beyond a single server. │
─ act as the ☞BºUNIT OF PARALLELISMº. │ - aGºCONSUMER GROUPºis a view (state, ┐ │
─ Partition.ºRETENTION POLICYº indicates │ position, or offset) of full LOG. │
how much time records will be available │ - consumer groups enable different ├───┘
for compsumption before being discarded.│ apps to have a different view of the │
-Bºnumber of partitions in an event hubº │ LOG, and to read independently at │
Bºdirectly relates to the number of │ their own pace and with their own ┘
Bºconcurrent readers expected.º │ offsets:
-OºTOTAL ORDER of events is just guaran-º│ ☞Bºin a stream processing Arch,º
Oºteed inside a partition.º │ Bºeach downstream applicationº
Messages sent by a producer to a given │ Bºequates to a consumer groupº
partition are guaran. to be appended │
in the order they were sent. │
┌───────────────────────────────┘
- Messages sent by a producer to a particular topic partition are guaranteed
to be appended in the order they are sent.
┌───────┴────────┐
PARTITION REPLICA: (fault tolerance mechanism)
└ Kafka allows producers to wait on acknowledgement so that a write
isn't considered complete until it is fully replicated and guaranteed
to persist even if the server written to fails, allowing to balance
replica consistency vs performance.
└ Each partition has one replica server acting as "leader" and 0+
  replica servers acting as "followers". The leader handles all read
and write requests for the partition while the followers passively
replicate the leader. If the leader fails, one of the followers
replaces it.
└ A server can be a leader for partition A and a follower for
partition B, providing better load balancing.
└ For a topic with replication factor N, we will tolerate up to N-1
server failures without losing any records committed to the log.
º*1:º
- Log partitions are (dynamically) divided over consumer instances so
that each client instance is the exclusive consumer of a "fair share"
of partitions at any point in time.
- The consumer group generalizes the queue and publish-subscribe:
- As with a queue the consumer group allows you to divide (scale) up
processing over a collection of processes (the members of the consumer group).
- As with publish-subscribe, Kafka allows you to broadcast messages to
multiple consumer groups
data-schema Support
Apache AVRO format
• Efficient, compact way to store data on disk.
• "Imported" from Hadoop's World.
• Efficient: binary format containing only data (vs JSON/JSON-schema, which repeat
  field names alongside the data). Libraries (avro-tools.jar,
  kafka-avro-console-consumer, ...) are used to write/read it.
• Schema never goes inside the message, only the "SchemaID".
• Support for (fast) Google’s Snappy Compression.
• Support forºon-demand deserialization of columns (vs full data)º.
• It's not a column store (unlike Parquet, which supports better compression and  [comparative]
  selection predicates), but it adapts better to big fragments of data.
@[https://www.slideshare.net/HadoopSummit/file-format-benchmark-avro-json-orc-parquet]
• Example (a GenericRecord write sketch also follows at the end of this section):
JAVA Class ←··→ Avro Schema
================= ======================
public class User { {
long id; "type": "record",
String name; "name": "User",
} "fields" : [
{"name": "id" , "type": "long" },
{"name": "name", "type": "string"}
]
}
• No data is null by default [million_dolar_mistake]
• Enterprise Patterns:
• Define global or by-product dictionary of Business events.
• Manage Avro schemas within a Source Control Management ( "Git" )
(Notice also Avro support for schema evolution)
"Dump" schemas and topic relations to Confluent Schema Registry
when new events are validated (or existing events updated).
event validation: Get sure they are really needed
(they are NOT a duplicated view of other existing event).
• Ideally each kafka topic corresponds to a single schema.
A (very opinionated) "HIERARCHICAL TOPIC NAMING" can be similar to:
${cluster}.${tenant}.${product_or_concern}.${recipient}.${version}
└────┬────┘
process, person or group in charge
of reacting to event.
  This hierarchy also helps in classifying different topics into
different "Service Levels" categories (scalability, partitions, replication
factor, retention time, ...)
@[https://shravan-kuchkula.github.io/kafka-schemas/#understand-why-data-schemas-are-a-critical-part-of-a-real-world-stream-processing-application]
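• Minimal write sketch for the "User" example above (assuming the plain
  org.apache.avro:avro library; "user.avro" is an illustrative path). Note that,
  unlike Kafka records carrying only a schema ID, an Avro container file stores
  the schema once in its header:
    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import java.io.File;

    public class AvroWriteExample {
      public static void main(String[] args) throws Exception {
        String schemaJson =
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"long\"},"
          + "{\"name\":\"name\",\"type\":\"string\"}]}";
        Schema schema = new Schema.Parser().parse(schemaJson);

        GenericRecord user = new GenericData.Record(schema);
        user.put("id", 42L);
        user.put("name", "alice");

        try (DataFileWriter˂GenericRecord˃ writer =
                 new DataFileWriter˂˃(new GenericDatumWriter˂˃(schema))) {
          writer.create(schema, new File("user.avro")); // ← schema written once (file header)
          writer.append(user);                          // ← each record: binary data only
        }
      }
    }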
Kafka Schema Registry
BºDZone Intro Summaryº
@[https://dzone.com/articles/kafka-avro-serialization-and-the-schema-registry]
by Jean-Paul Azar
Confluent Schema Registry:
- REST API for producers/consumers managing Avro Schemas:
- store schemas for keys and values of Kafka records.
- List schemas and schema-versions by subject.
- Return schema by version or ID.
- get the latest version of a schema.
- Check if a given schema is compatible with a certain version.
- Compatibility level include:
- backward: data written with old schema readable with new one.
- forward : data written with new schema is readable with old one
- full : backward + forward
      - none    : schema is stored but schema compatibility validation is disabled
(Rºnot recommendedº).
- configured Bºglobally or per subjectº.
- Compatibility settings can be set to support Bºevolution of schemasº.
- The Kafka Avro serialization project provides serializers
  that take schemas as input.
- Producers send the schema (unique) ID and consumers fetch (and cache)
the full schema from the Schema Registry.
- Producer will create a new Avro record (schema ID, data).
Kafka Avro Serializer will register (and cache locally) the associated
schema if needed, before serializing the record.
- Schema Registry ensures that producer and consumer see compatible
schemas and Bºautomatically "transform" between compatible schemas,º
Bºtransforming payload via Avro Schema Evolutionº.
BºSCHEMA EVOLUTIONº
Scenario:
- Avro schema modified after data has already been written to store
with old schema version.
OºIMPORTANTº: ☞ From Kafka perspective,Oºschema evolutionº happens only
Oºduring deserialization at the consumer (read)º:
If consumer’s schema is different from the producer’s schema,
and they are compatible, the value or key is automatically
modified during deserialization to conform to the consumer's
read schema if possible.
BºSchema Evolution: Allowed compatible Modification º
- change/add field's default value.
- add new field with default value.
- remove existing field with default value.
- change field's order attribute.
- add/remove field-alias (RºWARN:º can break consumers depending on the alias).
- change type → union-containing-original-type.
BºBest Patternº
- Provide a default value for fields in your schema.
- Never change a field's data type.
- Do NOT rename an existing field (use aliases instead).
Ex: Original schema v1:
{
"namespace": "com.cloudurable.phonebook",
"type": "record",
"name": "Employee",
"doc" : "...",
"fields": [
{"name": "firstName", "type": "string" },
{"name": "nickName" , "type": ["null", "string"] , "default" : null},
{"name": "lastName" , "type": "string" },
{"name": "age" , "type": "int" , "default": -1 },
{"name": "emails" , "type": {"type" : "array",
"items": "string"}, "default":[] },
{"name": "phoneNum" , "type":
[ "null",
{ "type": "record",
"name": "PhoneNum",
"fields": [
{"name": "areaCode" , "type": "string"},
{"name": "countryCode", "type": "string", "default" : ""},
{"name": "prefix" , "type": "string"},
{"name": "number" , "type": "string"}
]
}
]
},
{"name": "status" , "default" :"SALARY",
, "type": {
"type": "enum",
"name": "Status",
"symbols" : ["RETIRED", "SALARY",...]
}
}
]
}
- Schema Version 2:
"age" field, def. value -1, added
| KAFKA LOG |
Producer@v2 →|Employee@v2| ··→ consumer@v.1 ·····→ NoSQL Store
| | ^ ^
| | age field removed 'age' missing
| | @deserialization
| |
| |
| | consumer@ver.2 ←..... NoSQL Store
| | ^ ^
| | age set to -1 'age' missing
BºRegistry REST API Usageº
└ POST New Schema
$º$ curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \º
$º$ --data '{"schema": "{\"type\": …}’ \ º
$º$ http://localhost:8081/subjects/Employee/versions º
└ List all of the schemas:
$º$ curl -X GET http://localhost:8081/subjects º
Using Java OkHttp client:
      package com.cloudurable.kafka.schema;
      import okhttp3.*;
      import java.io.IOException;
      public class SchemaMain {
          private final static MediaType SCHEMA_CONTENT =
                  MediaType.parse("application/vnd.schemaregistry.v1+json");
          private final static String EMPLOYEE_SCHEMA = "{ \"schema\": \"" ...;
          private final static String BASE_URL = "http://localhost:8081";
          private final static OkHttpClient client = new OkHttpClient();

          private static void newCall(Request request) throws IOException {
              System.out.println(client.newCall(request).execute().body().string());
          }
          private static void putAndDumpBody (final String URL, RequestBody BODY) throws IOException {
              newCall(new Request.Builder().put(BODY).url(URL).build());
          }
          private static void postAndDumpBody(final String URL, RequestBody BODY) throws IOException {
              newCall(new Request.Builder().post(BODY).url(URL).build());
          }
          private static void getAndDumpBody(final String URL) throws IOException {
              newCall(new Request.Builder().url(URL).build());
          }

          public static void main(String... args) throws IOException {
              System.out.println(EMPLOYEE_SCHEMA);
              postAndDumpBody(                            // ← POST A NEW SCHEMA
                  BASE_URL + "/subjects/Employee/versions",
                  RequestBody.create( SCHEMA_CONTENT, EMPLOYEE_SCHEMA )
              );
              getAndDumpBody(BASE_URL + "/subjects");     // ← LIST ALL SCHEMAS
              getAndDumpBody(BASE_URL                     // ← SHOW ALL VERSIONS OF EMPLOYEE
                  + "/subjects/Employee/versions/");
              getAndDumpBody(BASE_URL                     // ← SHOW VERSION 2 OF EMPLOYEE
                  + "/subjects/Employee/versions/2");
              getAndDumpBody(BASE_URL                     // ← SHOW SCHEMA WITH ID 3
                  + "/schemas/ids/3");
              getAndDumpBody(BASE_URL                     // ← SHOW LATEST VERSION
                  + "/subjects/Employee/versions/latest");
              postAndDumpBody(                            // ← IS THIS SCHEMA REGISTERED?
                  BASE_URL + "/subjects/Employee",
                  RequestBody.create( SCHEMA_CONTENT, EMPLOYEE_SCHEMA )
              );
              postAndDumpBody(                            // ← TEST COMPATIBILITY
                  BASE_URL + "/compatibility/subjects/Employee/versions/latest",
                  RequestBody.create( SCHEMA_CONTENT, EMPLOYEE_SCHEMA )
              );
              getAndDumpBody(BASE_URL + "/config");       // ← TOP LEVEL CONFIG
              putAndDumpBody(                             // ← SET TOP LEVEL CONFIG
                  BASE_URL + "/config",                   //   (none|backward|forward|full)
                  RequestBody.create(SCHEMA_CONTENT,
                      "{\"compatibility\": \"none\"}")
              );
              putAndDumpBody(                             // ← SET CONFIG FOR EMPLOYEE SUBJECT
                  BASE_URL + "/config/Employee",
                  RequestBody.create(SCHEMA_CONTENT,
                      "{\"compatibility\": \"backward\"}")
              );
          }
      }
BºRUNNING SCHEMA REGISTRYº
$º$ CONFIG="etc/schema-registry/schema-registry.properties"º
$º$ cat ${CONFIG} º
$ºlisteners=http://0.0.0.0:8081 º
$ºkafkastore.connection.url=localhost:2181 º
$ºkafkastore.topic=_schemas º
$ºdebug=false º
$º$ .../bin/schema-registry-start ${CONFIG} º
BºWriting Producers/Consumers with Avro Serializers/Sche.Regº
└ start up the Sch.Reg. pointing to ZooKeeper(cluster).
└ Configure gradle:
plugins {
id "com.commercehub.gradle.plugin.avro" version "0.9.0"
} └─────────────┬──────────────────┘
// http://cloudurable.com/blog/avro/index.html
// transform Avro type → Java class
// Plugin supports:
// - Avro schema files (.avsc) ("Kafka")
// - Avro RPC IDL (.avdl)
// $º$ gradle buildº ← generate java classesº
group 'cloudurable'
version '1.0-SNAPSHOT'
apply plugin: 'java'
sourceCompatibility = 1.8
dependencies {
testCompile 'junit:junit:4.11'
compile 'org.apache.kafka:kafka-clients:0.10.2.0' ←
compile "org.apache.avro:avro:1.8.1" ← Avro lib
compile 'io.confluent:kafka-avro-serializer:3.2.1' ← Avro Serializer
compile 'com.squareup.okhttp3:okhttp:3.7.0'
}
repositories {
jcenter()
mavenCentral()
maven { url "http://packages.confluent.io/maven/" }
}
avro {
createSetters = false
fieldVisibility = "PRIVATE"
}
└ Setup producer to use GºSchema Registryº and BºKafkaAvroSerializerº
package com.cloudurable.kafka.schema;
import com.cloudurable.phonebook.Employee;
import com.cloudurable.phonebook.PhoneNum;
import io.confluent.kafka.serializers.KafkaAvroSerializerConfig;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.LongSerializer;
import io.confluent.kafka.serializers.KafkaAvroSerializer;
import java.util.Properties;
import java.util.stream.IntStream;
public class AvroProducer {
private static Producer˂Long, Employee˃ createProducer() {
        final String serClassName          = LongSerializer.class.getName();
        final String kafkaAvroSerClassName = KafkaAvroSerializer.class.getName();
        final String SCHEMA_REG_URL_CONFIG = KafkaAvroSerializerConfig.
                                                SCHEMA_REGISTRY_URL_CONFIG;
        final String VAL_SERI_CLASS_CONFIG = ProducerConfig.
                                                VALUE_SERIALIZER_CLASS_CONFIG;
        final Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG   , "localhost:9092");
        props.put(ProducerConfig.CLIENT_ID_CONFIG           , "AvroProducer"  );
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, serClassName    );
      Bºprops.put( VAL_SERI_CLASS_CONFIG , kafkaAvroSerClassName );º  ← CONFIGURE the KafkaAvroSerializer
      Gºprops.put( SCHEMA_REG_URL_CONFIG , "http://localhost:8081" );º ← Set Schema Registry URL
return new KafkaProducer˂˃(props);
}
private final static String TOPIC = "new-employees";
public static void main(String... args) {
Producer˂Long, Employee˃ producer = createProducer();
Employee bob = Employee.newBuilder().setAge(35)
.setFirstName("Bob").set...().build();
IntStream.range(1, 100).forEach(index->{
producer.send(new ProducerRecord˂˃(TOPIC, 1L * index, bob));
});
producer.flush();
producer.close();
}
}
└ Setup consumer to use GºSchema Registryº and BºKafkaAvroSerializerº
package com.cloudurable.kafka.schema;
import com.cloudurable.phonebook.Employee;
import io.confluent.kafka.serializers.KafkaAvroDeserializer;
import io.confluent.kafka.serializers.KafkaAvroDeserializerConfig;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.LongDeserializer;
import java.util.Collections;
import java.util.Properties;
import java.util.stream.IntStream;
public class AvroConsumer {
private final static String BOOTSTRAP_SERVERS = "localhost:9092";
private final static String TOPIC = "new-employees";
      private static Consumer˂Long, Employee˃ createConsumer() {
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
props.put(ConsumerConfig.GROUP_ID_CONFIG, "KafkaExampleAvroConsumer");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
LongDeserializer.class.getName());
//USE Kafka Avro Deserializer.
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
KafkaAvroDeserializer.class.getName());
//Use Specific Record or else you get Avro GenericRecord.
props.put(KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG, "true");
//Schema registry location.
props.put(KafkaAvroDeserializerConfig.SCHEMA_REGISTRY_URL_CONFIG,
"http://localhost:8081"); //<----- Run Schema Registry on 8081
return new KafkaConsumer<>(props);
}
public static void main(String... args) {
          final Consumer˂Long, Employee˃ consumer = createConsumer();
consumer.subscribe(Collections.singletonList(TOPIC));
IntStream.range(1, 100).forEach(index -> {
              final ConsumerRecords˂Long, Employee˃ records =
                      consumer.poll(100);
if (records.count() == 0) {
System.out.println("None found");
} else records.forEach(record -> {
Employee employeeRecord = record.value();
System.out.printf("%s %d %d %s \n", record.topic(),
record.partition(), record.offset(), employeeRecord);
});
});
}
}
     Notice that, just like with the producer, we have to tell the consumer
     where to find the Schema Registry and configure the Kafka Avro
     Deserializer (see the VALUE_DESERIALIZER_CLASS_CONFIG,
     SPECIFIC_AVRO_READER_CONFIG and SCHEMA_REGISTRY_URL_CONFIG properties
     in createConsumer() above).
     An additional step is setting SPECIFIC_AVRO_READER_CONFIG to "true" so
     the generated Employee object (a SpecificRecord) is used. If we did not,
     the consumer would get an Avro GenericRecord instead. To learn more about
     using GenericRecord and generating code from Avro, read the Avro Kafka
     tutorial, as it has examples of both.
https://docs.confluent.io/current/schema-registry/index.html
https://docs.confluent.io/current/schema-registry/schema_registry_tutorial.html
APIs
API Summary
@[https://medium.com/@stephane.maarek/the-kafka-api-battle-producer-vs-consumer-vs-kafka-connect-vs-kafka-streams-vs-ksql-ef584274c1e]
┌──────────────────────────────────────────────────────────────────┐
│ ┌───────────────┐ ┌──────────────┐ │
│ │Schema Registry│ │Control Center│ │
│ └───────────────┘ └──────────────┘ │
│ │
│ │
│ Kafka 2 ←·· Replicator ┐ ┌.. REST │
│ · · Proxy │
│ Event · · │
│ Source ┌─v────v──┐ │
│ (IoT,log, ─→ Producer ───→ ───→ Consumer ──→ "real time" │
│ ...) │ Kafka 1 │ action │
│ │ │ │
│ │ │ │
│ │ │ │
│ Data │ │ Target DDBB,│
│ Source ─→ Connect ───→ ───→ Connect ──→ S3,HDFS, SQL│
│ (DDBB, Source │ │ Sink MongoDB,... │
│ csv,...) └─^────^──┘ │
│ │ │ │
│ │ ┌───────────┐ │
│ Streams │KSQL Server│ │
│ API └───────────┘ │
└──────────────────────────────────────────────────────────────────┘
APIs Usage Context
────────────── ───────────────────────────────────────────────────────
Producer Apps directly injecting data into Kafka
────────────── ───────────────────────────────────────────────────────
Connect Source Apps inject data into CSV,DDBB,... Conn.Src API inject
such data into Kafka.
────────────── ───────────────────────────────────────────────────────
Streams/KSQL Apps consuming from Kafka topics and injecting back
into Kafka:
- KSQL : SQL declarative syntax
- Streams: "Complex logic" in programmatic java/...
────────────── ───────────────────────────────────────────────────────
Consumer Apps consuming a stream, and perform "real-time" action
on it (e.g. send email...)
────────────── ───────────────────────────────────────────────────────
Connect Sink Read a stream and store it into a target store
────────────── ───────────────────────────────────────────────────────
Producer API:
  └ Bºextremely simple to useº: send data and wait in the callback.
  └ RºLots of custom code needed for ETL-like appsº:
- How to track the source offsets?
(how to properly resume your producer in case of errors)
- How to distribute load for your ETL across many producers?
( Kafka Connect Source API recommended in those cases)
Connect Source API:
└ High level API built on top of the Producer API for:
- producer tasks Bºdistribution for parallel processingº
-Bºeasy mechanism to resume producersº
└Bº"Lot" of available connectorsº out of the box (zero-code).
Consumer API:
  └ BºKISS APIº: It uses Consumer Groups. Topics can be consumed in parallel.
   RºCare must be put into offset management and commits, as well º
   Rºas rebalances and idempotence constraints, even though       º
   Rºconsumers are otherwise really easy to write.                º
    BºPerfect for stateless workloadsº (notifications, ...)
  └ RºLots of custom code needed for ETL-like appsº
Connect Sink API:
└ built on top of the consumer API.
└Bº"Lot" of available connectorsº out of the box (zero-code).
Streams API:
└ Support for Java and Scala.
  └ It enables writing either:
-BºHigh Level DSLº(ApacheºSpark alikeº)
- Low Level API (ApacheºStorm alikeº).
  └ Complicated coding is still required, but producer/consumer handling
    is completely hidden,
   Bºletting you focus on the stream logicº
  └BºSupport for 'joins', 'aggregations', 'exactly-once' semantics.º
└Rºunit test can be difficultº (test-utils library to the rescue).
└ Use of state stores, when backed up by Kafka topics, will force
Rºprocessing a lot more messagesº, butBºadding support for resilient appsº.
KSQL :
└ ºwrapper on top of Kafka Streamsº.
└ It abstract away Stream coding complexity.
  └RºNo support for complex transformations, "exploding" arrays, ...º
(as of 2019-11, gaps can be filled)
Message Delivery Semantics
@[http://kafka.apache.org/documentation.html#semantics]
producer
@[http://kafka.apache.org/11/javadoc/index.html?org/apache/kafka/clients/producer/kafkaproducer.html]
- send streams of data to topics in the cluster.
- thread safe → sharing a singleton across threads will generally be faster.
- ex:
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import java.util.Properties;
    // pre-setup:
    final Properties kafkaconf = new Properties();
    kafkaconf.put("bootstrap.servers", "localhost:9092");
    kafkaconf.put("acks"             , "all");  // ← "all": slowest/safest setting: block until
                                                //   the full set of in-sync replicas of the
                                                //   partition acknowledges the record
    // set how to turn key|value ProducerRecord instances into bytes:
    kafkaconf.put("key.serializer"  , "org.apache.kafka.common.serialization.StringSerializer");
    kafkaconf.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    Producer˂String, String˃ producer = new KafkaProducer˂˃(kafkaconf);
    for (int i = 0; i < 100; i++) {
        producer.send(                            // ← send is async: the record is added to a
            new ProducerRecord˂String, String˃(   //   buffer of pending records and the call
                "my-topic",                       //   returns immediately
                Integer.toString(i),
                Integer.toString(i)
            )
        );
    }
    producer.close();                             // ← leak of resources if not done
  producer:
     · per-partition pool of buffer space holding records not yet transmitted,
       of size (config) batch.size
   + · background I/O thread turning
       "records" into (batched) network requests.
      ------------------------
      Requests are automatically retried unless
      config.retries == 0.
      RºWARNº: config.retries ˃ 0 opens up the possibility of duplicates.
      The Kafka 0.11+ idempotent mode ("enable.idempotence" setting)
      avoids that risk.
      RºWARNº: avoid application-level re-sends in that case.
      See more details in the official doc.
  Tuning: º"linger.ms"º tells the producer to wait up to that number of milliseconds
  before sending a request, in the hope that more records will arrive to fill up the
  same batch. This can increase throughput at the cost of an artificial delay
  of up to "linger.ms".
  Bº(similar to Nagle's algorithm in TCP)º.
  - The Kafka 0.11+ transactional producer allows clients to send messages to multiple
    partitions and topics atomically (producer.initTransactions, producer.beginTransaction,
    producer.send, producer.commitTransaction, producer.abortTransaction). See the sketch below.
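    Minimal transactional-producer sketch (the transactional.id, topic names and
    record values are illustrative):
      import org.apache.kafka.clients.producer.KafkaProducer;
      import org.apache.kafka.clients.producer.Producer;
      import org.apache.kafka.clients.producer.ProducerRecord;
      import org.apache.kafka.common.KafkaException;
      import java.util.Properties;

      Properties props = new Properties();
      props.put("bootstrap.servers", "localhost:9092");
      props.put("key.serializer"   , "org.apache.kafka.common.serialization.StringSerializer");
      props.put("value.serializer" , "org.apache.kafka.common.serialization.StringSerializer");
      props.put("transactional.id" , "my-tx-1");  // ← unique per producer instance;
                                                  //   implies enable.idempotence=true
      Producer˂String, String˃ producer = new KafkaProducer˂˃(props);
      producer.initTransactions();                // ← fences older producers with the same tx.id
      try {
          producer.beginTransaction();
          producer.send(new ProducerRecord˂˃("topic-a", "k1", "v1"));
          producer.send(new ProducerRecord˂˃("topic-b", "k2", "v2"));
          producer.commitTransaction();           // ← both records become visible atomically
      } catch (KafkaException e) {
          producer.abortTransaction();            // ← none become visible (for fatal errors
      }                                           //   like ProducerFencedException, close() instead)
      producer.close();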
config
see full list @ @[http://kafka.apache.org/documentation/#producerconfigs]
-ºcleanup.policyº : "delete"* or "compact" retention policy for old log segments.
-ºcompression.typeº : 'gzip', 'snappy', lz4, 'uncompressed'
  -ºindex.interval.bytesº: how frequently kafka adds an index entry to its offset index.
-ºleader.replication.throttled.replicasº: ex:
[partitionid]º:[brokerid],[partitionid]:[brokerid]:... or
wildcard '*' can be used to throttle all replicas for this topic.
-ºmax.message.bytesº : largest record batch size allowed
-ºmessage.format.versionº: valid apiversion ( 0.8.2, 0.9.0.0, 0.10.0, ...)
-ºmessage.timestamp º : max dif allowed between the timestamp when broker receives
º.difference.max.msº a message and timestamp specified in message.
-ºmessage.timestamp.typeº: "createtime"|"logappendtime"
  -ºmin.insync.replicasº : "n": minimum number of in-sync replicas that must acknowledge
                           a write (when the producer uses acks=all|-1) for it to be
                           considered successful.
-ºretention.msº : time we will retain a log before old log segments are discarded.
bºsla on how soon consumers must read their dataº.
Consumer
@[http://kafka.apache.org/26/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html]
@[http://kafka.apache.org/11/javadoc/index.html?org/apache/kafka/streams/KafkaStreams.html]
Class KafkaConsumer˂K,V˃
- client consuming records from cluster.
- transparently handles failure of Kafka brokers,
- transparently adapts as topic partitions it fetches migrate within the cluster.
- This client also interacts with the broker to
ºallow consumer-groups to load balance consumption º.
- consumer Rºis not thread-safeº.
- Compatible with brokers ver. 0.10.0+.
Basics:
- Consumer Position: offset of the next record that will be given out.
- Committed position: last offset stored securely. Should the process
fail and restart, this is the offset consumers will
recover to.
- Consumer can either automatically commit offsets periodically; or
commit it manually.(commitSync, commitAsync).
- Consumer Groups and Topic Subscriptions:
- consumer groups: consumer instances sharing same group.id creating a
pool of processes (same or remote machines) to split
the processing of records.
- Each consumer in a group can dynamically set the list of topics it
wants to subscribe to through one of the subscribe APIs.
- Kafka will deliver each message in the subscribed topics to one process
in each consumer group by balancing partitions between all members so that
Bºeach partition is assigned to exactly one consumer in the groupº.
    (It makes no sense to have more consumers than partitions)
Group rebalancing (map from partitions to consumer in group) occurs when:
- Consumer is added/removed/not-available*1 to pool.
*1 liveness detection time defined in "max.poll.interval.ms".
- partition is added/removed from cluster.
- new topic matching a subscribed regex is created.
(Consumer groups just allow to have independent parallel multiprocess
clients acting as a same app with no need to manual synchronization).
(additional consumers are actually quite cheap).
(See official doc for advanced topics on balancing consumer groups)
- Simple Example Consumer: let Kafka dynamically assign a fair share of
the partitions for subscribed-to topics
(See original doc for manual choosing partitions to consume from)
     Properties configProps = new Properties();
     configProps.setProperty("bootstrap.servers", "localhost:9092");
     configProps.setProperty("group.id", "test");
     configProps.setProperty("enable.auto.commit", "true");
     configProps.setProperty("auto.commit.interval.ms", "1000");
     configProps.setProperty("key.deserializer"  , "org.apache.kafka.common.serialization.StringDeserializer");
     configProps.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
     KafkaConsumer˂String, String˃ consumer = new KafkaConsumer˂˃(configProps);
     consumer.subscribe(Arrays.asList("foo", "bar"));
     while (true) {
         ConsumerRecords˂String, String˃ records = consumer.poll(Duration.ofMillis(100));
         for (ConsumerRecord˂String, String˃ record : records)
             System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
         if (isManualCommit /*enable.auto.commit == false*/) {
             ... ; consumer.commitSync(); ...
         }
     }
   Use consumer.seek(TopicPartition, long) to jump to a given offset (see sketch below).
Use seekToBeginning(Collection)/seekToEnd(Collection) to go to start/end of partition.
See original doc for how to read Transactional Messages.
See original doc for Multi-threaded Processing.
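   Minimal manual-assignment + seek sketch (topic name, partition and offset are
   illustrative; "consumer" is a KafkaConsumer built with the same configProps as
   above; note that assign() and subscribe() are mutually exclusive on the same
   consumer instance):
     import org.apache.kafka.common.TopicPartition;
     import java.util.Collections;

     TopicPartition p0 = new TopicPartition("foo", 0);
     consumer.assign(Collections.singletonList(p0)); // ← manual assignment: no consumer-group
                                                     //   rebalancing is involved
     consumer.seek(p0, 42L);                         // ← next poll() starts at offset 42
     // alternatives:
     // consumer.seekToBeginning(Collections.singletonList(p0));  ← replay from the start
     // consumer.seekToEnd      (Collections.singletonList(p0));  ← only new records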
Config
@[http://kafka.apache.org/documentation/#consumerconfigs]
@[http://kafka.apache.org/documentation/#newconsumerconfigs]
@[http://kafka.apache.org/documentation/#oldconsumerconfigs]
Connect
@[http://kafka.apache.org/11/javadoc/index.html?overview-summary.html]
(more info)
- allows reusable producers or consumers that connect Kafka
topics to existing applications or data systems.
Connectors List
@[https://docs.confluent.io/current/connect/connectors.html]
@[https://docs.confluent.io/current/connect/managing/connectors.html]
- Kafka connectors@github
@[https://github.com/search?q=kafka+connector]
-BºHTTP Sinkº
-BºFileStreamºs (Development and Testing)
-BºGitHub Sourceº
-BºJDBC (Source and Sink)º
- PostgresSQL Source (Debezium)
- SQL Server Source (Debezium)
-BºSyslog Sourceº
- AWS|Azure|GCD|Salesforce "*"
- ...
Config
@[http://kafka.apache.org/documentation/#connectconfigs]
AdminClient
@[http://kafka.apache.org/11/javadoc/index.html?org/apache/kafka/clients/admin/AdminClient.html]
- administrative client for Kafka, which supports managing and inspecting
topics, brokers, configurations and ACLs.
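   Minimal sketch (topic name, partition count and replication factor are
   illustrative):
     import org.apache.kafka.clients.admin.AdminClient;
     import org.apache.kafka.clients.admin.AdminClientConfig;
     import org.apache.kafka.clients.admin.NewTopic;
     import java.util.Collections;
     import java.util.Properties;

     public class AdminExample {
       public static void main(String[] args) throws Exception {
         Properties props = new Properties();
         props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
         try (AdminClient admin = AdminClient.create(props)) {
           admin.createTopics(Collections.singletonList(
               new NewTopic("new-employees", 6 /*partitions*/, (short) 2 /*replicas*/)))
                .all().get();                      // ← wait for the operation to complete
           admin.listTopics().names().get()
                .forEach(System.out::println);     // ← dump existing topic names
         }
       }
     }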
Config
@[http://kafka.apache.org/documentation/#adminclientconfigs]
Streams
@[http://kafka.apache.org/25/documentation/streams/]
See also:
@[http://kafka.apache.org/11/javadoc/index.html?org/apache/kafka/streams/KafkaStreams.html]
- Built on top o producer/consumer API.
 - simple, lightweight, embeddableBºclient libraryº, with support for
   real-time querying of app state, low-level Processor API
   primitives plus a high-level DSL.
- transparent load balancing of multiple instances of an application.
using Kafka partitioning model to horizontally scale processing
while maintaining strong ordering guarantees.
- Supports fault-tolerant local state, which enables very fast and
efficient stateful operations likeºwindowed joins and aggregationsº.
- Supports exactly-once processing semantics when there is a
client|Kafka failure.
- Employs one-record-at-a-time processing to achieve millisecond
processing latency, and supports event-time based windowing
operations with out-of-order arrival of records.
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;
import java.util.Arrays;
import java.util.Properties;
public class WordCountApplication {
public static void main(final String[] args) throws Exception {
final Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG ,
"wordcount-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG ,
"kafka-broker1:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,
Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,
Serdes.String().getClass());
final StreamsBuilder builder = new StreamsBuilder();
KStream˂String, String˃ txtLineStream
= builder.stream("TextLinesTopic");
KTable˂String, Long˃ wordCounts = txtLineStream
.flatMapValues(
textLine -˃
Arrays.asList(
textLine.toLowerCase().split("\\W+")
)
)
.groupBy ( (key, word) -˃ word )
.count ( Materialized.
˂ String, Long, KeyValueStore˂Bytes, byte[]˃ ˃
as("counts-store"));
wordCounts.
toStream().
to("WordsWithCountsTopic",
Produced.with(Serdes.String(), Serdes.Long()));
KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();
}
}
BºKafka Streams CORE CONCEPTSº
@[http://kafka.apache.org/25/documentation/streams/core-concepts]
  └ºStreamº    : graph of stream processors (nodes) that are
   ºProcessorº   connected by streams (edges).
   ºtopologyº
  └ºStreamº    : it represents a data set unbounded in size / time:
                 an ordered, replayable, fault-tolerant, immutable
                 record set.
  └ºStreamº    : node implementing a processing step to transform
   ºprocessorº   data in streams.
                 - It receives one input record at a time from
                   upstream processors and produces 1+ output records.
                 - Special processors:
                   - Source Processor: NO upstream processors,
                     just one or multiple Kafka topics as input.
                   - Sink Processor: no down-stream processors.
                     Output goes to a Kafka topic/external system.
└two ways to define the stream processing topology:
- Streams DSL : map, filter, join and aggregations
- Processor API: (low-level) Custom processors. It also allows
to interact with state stores.
└ºTime modelº: operations like windowing are defined based
on time boundaries. Common notions of time in
streams are:
                  - Event time     : time at which the event originally occurred.
                  - Ingestion time : time at which it is stored into a topic partition.
                  - Processing time: time at which it is processed; it may be
                    (milli)seconds after event time for real-time pipelines,
                    or hours later for batch pipelines.
                  - Kafka 0.10.x+ automatically embeds (event or ingestion) time.
                    The choice between event and ingestion time is made at broker
                    or topic level in configuration.
- Kafka Streams assigns a TS to every data record via the
TimestampExtractor interface allowing to describe the "progress"
of a stream with regards to time and are leveraged by
time-dependent operations such as window operations.
- time only "advances" when a new record arrives at the
processor. Concrete implementations of the TimestampExtractor
interface will provide different semantics to the
stream time definition.
- Finally, Kafka Streams sinks will also assign timestamps
in a way that depends on the context:
- When output is generated from some input record,
for example, context.forward(), TS is inherited from input.
- When new output record is generated via periodic
functions such as Punctuator#punctuate(), TS is defined
as current node internal time (context.timestamp()) .
- For aggregations, result update record TS is the max.
TS of all input records.
NOTE: default behavior can be changed in the Processor API
by assigning timestamps to output records explicitly
in "#forward()".
└ºAggregationº:
ºOperation º
INPUT OUTPUT
-------- ------
KStream → Aggregation → KTable
or KTable
^ ^ ^
DSL Ex: count/sum DSL object:
- new value is considered
to overwrite the old value
ºwith the same keyºin next
steps.
└ºWindowingº: - trackedºper record keyº.
- Available in ºStreams DSLº.
- window.ºgrace_periodº controls
ºhow long Streams clients will wait forº
ºout-of-order data records.º
- Records arriving "later" are discarded.
"late" == record.timestamp dictates it belongs
to a window, but current stream time
is greater than the end of the window
plus the grace period.
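· Illustrative Streams DSL sketch (topic name is a placeholder, reusing the
  builder of the WordCount example above): 5-minute windowed count with a
  1-minute grace period; records arriving after window-end + grace are dropped:

      import java.time.Duration;
      import org.apache.kafka.streams.kstream.TimeWindows;
      import org.apache.kafka.streams.kstream.Windowed;
      // ...
      KStream˂String, String˃ views = builder.stream("PageViewsTopic");
      KTable˂Windowed˂String˃, Long˃ viewsPerWindow = views
          .groupByKey()                                         // windows are tracked per record key
          .windowedBy(TimeWindows.of(Duration.ofMinutes(5))     // window size
                                 .grace(Duration.ofMinutes(1))) // wait for out-of-order records
          .count();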
└ºStates º: - Needed by some stream operations (stateful ops).
            -Bºstate storesº in the Streams API allow apps to
             store and query data, as needed by stateful operations.
- Every task in Kafka Streams embeds 1+ state stores
that can be accessed via APIs to store and query
data required for processing. They can be:
-ºpersistent key-value storeº:
-ºIn-memory hashmapº
- "another convenient data structure".
- Kafka Streams offers fault-tolerance and automatic
recovery for local state-stores.
- directºread-only queriesºof the state stores
  are provided to methods, threads, processes or
  applications external to the stream processing
  app through BºInteractive Queriesº, which expose the
  underlying read methods of the state stores
  (see sketch below).
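· Illustrative sketch querying the "counts-store" materialized by the
  WordCount example above (API as of Kafka 2.5; the word looked up is a
  placeholder):

      import org.apache.kafka.streams.state.QueryableStoreTypes;
      import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;
      // ...
      ReadOnlyKeyValueStore˂String, Long˃ store =
          streams.store("counts-store",
                        QueryableStoreTypes.˂String, Long˃keyValueStore());
      Long count = store.get("kafka");          // read-only point lookup
      store.all().forEachRemaining(kv -˃        // or iterate the whole store
          System.out.println(kv.key + " = " + kv.value));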
└ºProcessingº: - at-least-once delivery (default)
 ºGuaranteesº  - exactly-once processing semantics (Kafka 0.11+)
                 (processing.guarantee=exactly_once in config)
                 ^^^^^^^^^^
                 Kafka 0.11.0+ allows producers to send messages to
                 different topic partitions in a transactional and
                 idempotent manner.
                 More specifically, the Streams client API guarantees that
                 for any record read from the source Kafka topics,
                 its processing results will be reflected exactly once
                 in the output Kafka topic as well as in the
                 state stores for stateful operations.
                 (reading KIP-129 recommended)
└ºOut-of-Orderº:
ºHandlingº
- Within topic-partition:
- records with larger timestamps but smaller offsets
are processed earlier.
- Within a stream task processing "N" topic-partitions:
  - If the app is Rºnot configured to wait for all partitionsº
    Rºto contain some buffered dataºbefore picking the
    partition with the smallest timestamp for the next
    record, records fetched later from other partitions
    may carry smaller ("out-of-order") timestamps.
  "FIX": allow the application to wait longer,
  bookkeeping its state during the wait time,
  i.e. making a trade-off between latency,
  cost, and correctness.
  In particular, increase the windows' grace time.
Rº For Joins, some "out-of-order" data cannot yet be handled
   in Streams, even at the cost of extra latency and cost:
Config
See details: @[http://kafka.apache.org/documentation/#streamsconfigs]
-ºCore config:º
-ºapplication.id º: string unique within the Kafka cluster
-ºbootstrap.servers º: host1:port1,host2:port2
^^^^^^^^^^^^^^^^^^^^^^^
No need to list all hosts; a few are enough for the initial sync.
-ºreplication.factorº: int, Default:1
-ºstate.dir º: string, Default: /tmp/kafka-streams
-ºOther params:º
- cache.max.bytes.buffering: long, def: 10485760 (max bytes for buffering ºacross all threadsº)
- client.id            : ID prefix string used for the client IDs of internal consumer,
                         producer and restore-consumer, with pattern
                         '˂client.id˃-StreamThread-˂n˃-˂consumer|producer|restore-consumer˃'.
                         Default: ""
- default.deserialization.exception.handler
- default.key .serde : Default serializer / deserializer class
- default.value.serde
- default.production.exception.handler: Exception handling class
- default.timestamp.extractor
- max.task.idle.ms : long, Maximum amount of time a stream task will stay idle
when not all of its partition buffers contain records,
to avoid potential out-of-order record processing
across multiple input streams.
- num.standby.replicas: int (default to 0)
- num.stream.threads :
- processing.guarantee: at_least_once (default) | exactly_once.
                        ^^^^^^^^^^^^
                        - exactly_once requires 3+ brokers in production;
                          for development it can be relaxed by
                          tuning the broker settings:
- transaction.state.log.replication.factor
- transaction.state.log.min.isr.
- security.protocol : PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL.
- topology.optimization: [none*, all] Set whether Kafka Streams should optimize the topology
- application.server : Default: "", endpoint used for state store discovery and
interactive queries on this KafkaStreams instance.
- buffered.records.per.partition: int, default to 1000
- built.in.metrics.version
- commit.interval.ms : frequency to save the position of the processor.
(default to 100 for exactly_once or 30000 otherwise).
- connections.max.idle.ms: Default: 540000
- metadata.max.age.ms:
- metric.reporters :
- metrics.num.samples:
- metrics.recording.level: [INFO, DEBUG]
- metrics.sample.window.ms
- poll.ms : Amount of time in milliseconds to block waiting for input.
- receive.buffer.bytes: int, def: 32768, size of the TCP receive buffer (SO_RCVBUF) to use
when reading data. Set to -1 to use OS default.
- send.buffer.bytes   : int, def: 131072, size of the TCP send buffer (SO_SNDBUF ) to use
when sending data. Set to -1 to use OS default.
- reconnect.backoff.max.ms:
- reconnect.backoff.ms
- request.timeout.ms
- retries : Setting a value greater than zero will cause the client to
resend any request that fails with a potentially transient error.
- retry.backoff.ms :
- rocksdb.config.setter:
- state.cleanup.delay.ms:
- upgrade.from :
- windowstore.changelog.additional.retention.ms:
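· Illustrative sketch mapping the core parameters above to code (it belongs
  where the WordCount example above builds its Properties; broker list,
  application id and paths are placeholders):

      final Properties props = new Properties();
      props.put(StreamsConfig.APPLICATION_ID_CONFIG , "my-streams-app");  // unique within the cluster
      props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG,
                "kafka-broker1:9092,kafka-broker2:9092");                 // a few hosts suffice
      props.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 3);              // internal topics
      props.put(StreamsConfig.STATE_DIR_CONFIG , "/var/lib/kafka-streams");
      props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);
      props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG,
                StreamsConfig.EXACTLY_ONCE);                              // needs 3+ brokers in prod.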
Examples
@[https://github.com/confluentinc/kafka-streams-examples]
- Examples: Runnable Applications
- Examples: Unit Tests
- Examples: Integration Tests
- Docker Example: Kafka Music demo application
Example ºdocker-compose.ymlº:
└───────┬────────┘
Services launched:
zookeeper:
kafka:
schema-registry:
kafka-create-topics:
... kafka-topics --create --topic play-events ...
... kafka-topics --create --topic song-feed ...
kafka-music-data-generator: ← producer
kafka-music-application: ← consumer
- Examples: Event Streaming Platform
KSQL(ksqlDB)
@[https://www.confluent.io/]
@[https://www.confluent.io/blog/ksql-open-source-streaming-sql-for-apache-kafka/]
See also:
- Pull Queries and Connector Management Added to ksqlDB (KSQL) Event Streaming Database for Kafka (2019-12)
@[https://www.infoq.com/news/2019/12/ksql-ksqldb-streaming-database/]
- pull queries to allow for data to be read at a specific point in
time using a SQL syntax, and Connector management that enables direct
control and execution of connectors built to work with Kafka Connect.
- Until now KSQL has only been able to query continuous streams
("push queries"). Now it can also read current state of a materialized
view using pull queries:
These new queries can run with predictable low latency since the
materialized views are updated incrementally as soon as new
messages arrive.
- With the new connector management and its built-in support for a
range of connectors, it’s now possible to directly control and
execute these connectors with ksqlDB, instead of creating separate
solutions using Kafka, Connect, and KSQL to connect to external data
sources.
The motive for this feature is that the development team
believes building applications around event streams Rºwas too complexº.
BºInstead, they want to achieve the same simplicity as when buildingº
Bºapplications using relational databases. º
- Internally, ksqlDB architecture is based on a distributed commit
log used to synchronize the data across nodes. To manage state,
RocksDB is used to provide a local queryable storage on disk. A
commit log is used to update the state in sequences and to provide
failover across instances for high availability.
Zeebe Streams
@[https://www.infoq.com/news/2019/05/kafka-zeebe-streams-workflows]
• Kafka events are sometimes part of a business process with tasks
spread over several microservices. To handle complex business
processes a workflow engine can be used, but to match Kafka it must
meet the same scalability Kafka provides.
• Zeebe is a workflow engine currently developed and designed to
meet these scalability requirements.
Developer Tools
Pixy
@[https://github.com/mailgun/kafka-pixy]
Kafka-Pixy is a dual API (gRPC and REST) proxy for Kafka with
automatic consumer group control. It is designed to hide the
complexity of the Kafka client protocol and provide a stupid simple
API that is trivial to implement in any language.
Docker Img For Developers
@[https://github.com/lensesio/fast-data-dev]
- Apache Kafka docker image for developers; with Lenses (lensesio/box)
or Lenses.io's open source UI tools (lensesio/fast-data-dev). Have a
full fledged Kafka installation up and running in seconds and top it
off with a modern streaming platform (only for kafka-lenses-dev),
intuitive UIs and extra goodies. Also includes Kafka Connect, Schema
Registry, Lenses.io's Stream Reactor 25+ Connectors and more.
Ex SystemD integration file:
/etc/systemd/system/kafkaDev.service
|
| # Visit http://localhost:3030 to get into the fast-data-dev environment
|
| [Unit]
| Description=Kafka Lensesio
| After=docker.service
| Wants=network-online.target docker.socket
| Requires=docker.socket
|
| [Service]
| Restart=always
| # Create container if it doesn't exist (checked with "container inspect")
| ExecStartPre=/bin/bash -c "/usr/bin/docker container inspect lensesio/fast-data-dev 2> /dev/null || /usr/bin/docker run -d --name kafkaDev --net=host -v /var/backups/DOCKER_VOLUMES_HOME/kafkaDev:/data lensesio/fast-data-dev"
| ExecStart=/usr/bin/docker start -a kafkaDev
| ExecStop=/usr/bin/docker stop -t 20 kafkaDev
|
| [Install]
| WantedBy=multi-user.target
DevOps
Quick start
@[http://kafka.apache.org/documentation.html#quickstart]
BºPRE-SETUPº
Download tarball from mirror @[https://kafka.apache.org/downloads],
then untar like:
$º$ tar -xzf kafka_2.13-2.5.0.tgz ; cd kafka_2.13-2.5.0 º
- Start the server
$º$ bin/zookeeper-server-start.sh \                     º ← Start Zookeeper
$º    config/zookeeper.properties 1˃zookeeper.log 2˃⅋1 ⅋ º
$º$ bin/kafka-server-start.sh \                         º ← Start Kafka Server
$º    config/server.properties 1˃kafka.log 2˃⅋1 ⅋       º
$º$ bin/kafka-topics.sh --create --zookeeper \ º ← Create BºTESTº topic
$º localhost:2181 --replication-factor 1 \ º (Alt.brokers can be
$º --partitions 1 --topic TEST º configured to auto-create
└──┘ them when publishing to
non-existent ones)
$º$ bin/kafka-topics.sh --list --zookeeper localhost:2181º ← Check topic
$º$ TEST º ← Expected output
└──┘
$º$ bin/kafka-console-producer.sh --broker-list º ← Send some messages
$º localhost:9092 --topic TEST º
$ºThis is a message º
$ºThis is another message º
$ºCtrl+C                          º ← stop the producer
$º$ bin/kafka-console-consumer.sh --bootstrap-server \ º ← Start a consumer
$ºlocalhost:9092 --topic BºTESTº --from-beginning     º
$ºThis is a message º
$ºThis is another message º
********************************
* CREATING A 3 BROKERS CLUSTER *
********************************
$º$ cp config/server.properties config/server-1.propertiesº ← Add 2 new broker configs.
$º$ cp config/server.properties config/server-2.propertiesº
└───────────┬────────────┘
┌──────────────────────────────┤
┌───────────┴────────────┐ ┌──────────┴─────────────┐
ºconfig/server-1.properties:º ºconfig/server-2.properties:º
broker.id=1 broker.id=2 ← unique id
listeners=PLAINTEXT://:9093 listeners=PLAINTEXT://:9094
log.dir=/tmp/kafka-logs-1 log.dir=/tmp/kafka-logs-2 ← avoid overwrite
$º$ bin/kafka-server-start.sh config/server-1.properties ...º ← Start 2nd broker
$º$ bin/kafka-server-start.sh config/server-2.properties ...º ← Start 3rd broker
$º$ bin/kafka-topics.sh --create --zookeeper localhost:2181\º ← Create new topic with
$º --replication-factor 3 --partitions 1 º Bºreplication factor of 3º
$º --topic topic02 º
$º$ bin/kafka-topics.sh --describe \                   º ← Check which broker
$º --zookeeper localhost:2181 --topic topic02 º is doing what
(output will be similar to)
Topic: topic02 PartitionCount:1 ReplicationFactor:3 Configs: ← summary
Topic: topic02 Partition:0 Leader:1 Replicas: 1,2,0 Isr: 1,2,0 ← Partition 0
└┬┘
set of "in-sync" replicas.
(subset of "replicas" currently
alive and in sync with "leader")
*****************************************
* Using Kafka Connect to import/export *
* data using simple connectors *
*****************************************
BºPRE-SETUPº
$º$ echo -e "foo\nbar" > test.txt º ← Prepare (input)test data
- SETUP source/sink Connectors:
config/connect-file-source.properties ← unique connector id, connector class,
config/connect-file-sink.properties ...
$º$ bin/connect-standalone.sh \ º ← Start Bºtwo connectorsº running in
$º \ º standalone mode (dedicated process)
$º config/connect-standalone.properties \ º ← 1st param is common Kafka-Connect config
$º \ º (brokers, serialization format ,...)
$º  config/connect-file-source.properties \          º
$º config/connect-file-sink.properties º
└─────────────────┬────────────────────┘
examples in kafka distribution set the "pipeline" like:
"test.txt" → connect-test → sink connector → "test.sink.txt"
• See also: Ansible Install:
@[https://github.com/confluentinc/cp-ansible]
@[https://docs.confluent.io/current/installation/cp-ansible/index.html]
- Installs Confluent Platform packages.
- ZooKeeper
- Kafka
- Schema Registry
- REST Proxy
- Confluent Control Center
- Kafka Connect (distributed mode)
- KSQL Server
- systemd Integration
- Configuration options for:
plaintext, SSL, SASL_SSL, and Kerberos.
Configuration
• Broker Config:
@[http://kafka.apache.org/documentation/#configuration]
Main params:
- broker.id
- log.dirs
- zookeeper.connect
• Topics config: @[http://kafka.apache.org/documentation/#topicconfigs]
• Compaction: kafka.devops, kakfa.performance
http://kafka.apache.org/documentation.html#compaction
Strimzi k8s Operator
"Kafka on K8s in a few minutes"
• Strimzi: Red Hat OSS project that provides container images and
operators for running production-ready Kafka on k8s and OpenShift.
• K8s native experience, using kubectl to manage the Kafka cluster and GitOps.
• Monitoring and observability integration with Prometheus.
• TODO:
@[https://developers.redhat.com/blog/2019/06/06/accessing-apache-kafka-in-strimzi-part-1-introduction/]
@[https://developers.redhat.com/blog/2019/06/07/accessing-apache-kafka-in-strimzi-part-2-node-ports/]
@[https://developers.redhat.com/blog/2019/06/10/accessing-apache-kafka-in-strimzi-part-3-red-hat-openshift-routes/]
@[https://developers.redhat.com/blog/2019/06/11/accessing-apache-kafka-in-strimzi-part-4-load-balancers/]
@[https://developers.redhat.com/blog/2019/06/12/accessing-apache-kafka-in-strimzi-part-5-ingress/]
Kafka vs ...
Pulsar vs Kafka
- Pulsar is a younger project Bºinspired and informed by Kafkaº.
  Kafka, on the other hand, has a bigger community.
- Pulsar Webinars [resource]
@[https://streamnative.io/resource#pulsar]
BºPulsar "PRO"s over Kafkaº
Bº========================º
@[https://kafkaesque.io/7-reasons-we-choose-apache-pulsar-over-apache-kafka/]
@[https://kesque.com/5-more-reasons-to-choose-apache-pulsar-over-kafka/]
1. Streaming and queuing come together:
   Pulsar supports standard message queuing patterns, such as
   competing consumers, fail-over subscriptions, and easy message
   fan-out. It keeps track of each client's read position in the topic
   and stores that information in its high-performance distributed
   ledger, Apache BookKeeper, covering many of the use cases of a
   traditional queuing system, like RabbitMQ.
2. Simpler usage:
   If you don't need partitions you don't have to worry about them.
   "If you just need a topic, then use a topic". Do not worry about
   how many consumers the topic might have.
Pulsar subscriptions allow you to add as many consumers as you want
on a topic with Pulsar keeping track of it all. If your consuming
application can’t keep up, you just use a shared subscription to
distribute the load between multiple consumers.
Pulsar hasºpartitioned topicsºif you need them, but
ºonly if you need themº.
3. In Kafka, fitting a log on a single server becomes a challenge
   (disk fills up; remote copying of large logs can take a long time).
More info at "Adding a New Broker Results in Terrible Performance"
@[https://www.confluent.io/blog/stories-front-lessons-learned-supporting-apache-kafka/]
Apache Pulsar breaks logs into segments and distributes them across
multiple servers while the data is being written by using BookKeeper
as its storage layer.
This means that the Bºlog is never stored on a single serverº, so a
single server is never a bottleneck. Failure scenarios are easier to
deal with and Bºscaling out is a snap: Just add another server.º
BºNo rebalancing needed.º
4. Stateless Brokers:
In Kafka each broker contains the complete log for each of
its partitions. If load gets too high, you can't simply add
another broker. Brokers must synchronize state from other
brokers that contain replicas of its partitions.
In Pulsar brokers accept data from producers and send data
to consumers, but the data is stored in Apache BookKeeper.
If load gets high, just add another broker.
It starts up quickly and gets to work right away.
5. Geo-replication is a first-class feature in Pulsar.
(vs proprietary add-on).
BºConfiguring it is easy and it just works. No PhD needed.º
6. Consistently Faster
BºPulsar delivers higher throughput along with lower and moreº
Bºconsistent latency.º
7. All Apache Open Source
input and output connectors (Pulsar IO), SQL-based topic queries
(Pulsar SQL), schema registry,...
(vs Kafka open-source features controlled by a commercial entity)
8. Pulsar can have multiple tenants and those tenants can have
multiple namespaces to keep things all organized. Add to that
access controls, quotas, and rate-limiting for each namespace
and you can imagine a future where we can all get along using
just this one cluster.
(WiP for Kafka in KIP-37).
9. Replication
You want to make sure your messages never get lost. In Kafka
you configure 2 or 3 replicas of each message in case
something goes wrong.
In Kafka the leader stores the message and the followers make
a copy of it. Once enough followers acknowledge they’ve got it,
"Kafka is happy".
Pulsar uses a quorum model: It sends the message out to a
bunch of nodes, and once enough of them acknowledge they've
got it, "Pulsar is happy". Majority always wins, and all votes
are equal giving more consistent latency behavior over time.
@[https://kafkaesque.io/performance-comparison-between-apache-pulsar-and-kafka-latency/]
(Kafka quorum is also a WiP in KIP-250)
10.Tiered storage.
What if you'd like to store messages forever (event-sourcing)?
It can get expensive on primary high-performance SSD storage.
BºWith Pulsar tiered storage you can automatically push old messages º
Bºinto practically infinite, cheap cloud storage (S3) and retrieve themº
Bºjust like you do those newer, fresh-as-a-daisy messages.º
(Kafka describes this feature in KIP-405).
11.End-to-end encryption
The producer Java client can encrypt messages using keys shared with
the consumer (contents hidden from the broker).
In Kafka this is in feature-request state (KIP-317).
12.When a (stateless) Pulsar broker gets overloaded, Pulsar rebalances
   client requests automatically.
   It monitors broker usage of CPU, memory, and network (vs disk, since
   brokers are stateless) to make the balancing decision.
   No need to add a new broker until all brokers are at full capacity.
In Kafka load-balancing is done by installing another package
such as LinkedIn's Cruise Control or paying for Confluent's rebalancer
tool.
— See also: [TODO]
@[https://streamnative.io/blog/tech/pulsar-vs-kafka-part-1]
KubeMQ: Kafka alternative
@[https://dzone.com/articles/seamless-migration-from-kafka-to-kubemq?edition=699391]
• Non-free product:
  · Free for stand-alone single pod.
  · Professional for SMBs: deploy on up to 20 k8s clusters.
  · Professional for Dist. product: corporate license.
  · Enterprise: enhanced SLA support.
• Kafka's monolithic architecture is more suited for on-premise clusters or high-end multi-VM setups. Spinning up a multi-node cluster on a standalone workstation for testing purposes can be a challenge.
• KubeMQ messaging service is built from the ground up with Kubernetes in mind:
  - It is stateless and ephemeral.
  - When config changes are needed, nodes are shut down and replaced,
    requiring ºzeroº-configuration setup or post-install tweaking:
    A 'channel' is the only object developers need to create (forget about
    brokers, exchanges, orchestrators ... KubeMQ's Raft replaces ZooKeeper).
  - Container image is "just" 30MB, suitable for local development.
• KubeMQ message delivery patterns include:
• Pub/Sub with or without persistence
• Request/Reply (synchronous, asynchronous)
• At Most Once Delivery
• At Least Once Delivery
• Streaming patterns
• RPC
(Comparatively, Kafka only supports Pub/Sub with persistence and streaming.
RPC and Request/Reply patterns are not supported by Kafka at all).
• kubemqctl: cli interface analogous to kubectl.
• KubeMQ Sources allows connecting to an existing Kafka input topic to simplify
  migration from Kafka.
Akka+Kotlin:Data Pipes
@[https://www.kotlindevelopment.com/data-pipelines-kotlin-akka-kafka/]
- Objective:
  Create a "GitHub monitor": an analytics component that
  polls a service (GitHub), writes data into a message queue
  for later analysis and, after post-processing, updates
  statistics in a SQL database.
- Tooling: Akka Streams + Alpakka connector collection
- "Architecture":
1) polls GitHub Event API for kotlin activity
2) writes all events into a Kafka topic for later use
3) reads events from Kafka and filters out PushEvents
4) updates a Postgres database with:
- who pushed changes
- when
- into which repository.
- Akka summary: data moves from Sources to Sinks.
  (compare Observable and Subscriber in RxJava)
ºENTRY POINTº
(standard main function)
fun main(vararg args: String) {
val system = ActorSystem.create() // "boilerplate" for using Akka and Akka Streams
val materializer = ActorMaterializer.create(system)
val gitHubClient = GitHubClient(system, materializer) // instance used to poll the GitHub events API
val eventsProducer = EventsProducer(system, materializer) // instance used to write events into Kafka
val eventsConsumer = EventsConsumer(system) // instance used to read events from Kafka
val pushEventProcessor = PushEventProcessor(materializer) // instance used to filter PushEvents and update the database
eventsProducer.write(gitHubClient.events()) // put things in motion.
pushEventProcessor.run(eventsConsumer.read())
}
Each time we receive a response from GitHub, we parse it and send individual events downstream.
fun events(): Source˂JsonNode, NotUsed˃ =
poll().flatMapConcat { response -˃
response.nodesOpt
.map { nodes -˃ Source.from(nodes) }
.orElse(Source.empty())
}
ºEventsProducer and EventsConsumerº
(the "power" of Akka Streams and Alpakka)
- Akka-Streams-Kafka greatly reduces the amount of code
that we have to write for integrating with Kafka.
Publishing events into a Kafka topic look like:
fun write(events: Source˂JsonNode, NotUsed˃)
: CompletionStage˂Done˃
= events.map { // ← maps GitHub event to(Kafka)ProducerRecord
node -˃ ProducerRecord˂ByteArray, String˃
("kotlin-events",
objectMapper.writeValueAsString(node) // ← serialize JsonNode as a String
)
}.runWith( // ← connects Source Sink
Producer.plainSink(settings), materializer)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- defined@Akka Streams Kafka
- takes care of communicating with Kafka
- The other way around, reading from Kafka, is also super simple.
fun read(): Source˂JsonNode, NotUsed˃ =
Consumer.plainSource(
settings,
Subscriptions.assignmentWithOffset(
TopicPartition("kotlin-events", 0), 0L))
.map { record -˃ objectMapper.readTree(record.value()) }
.mapMaterializedValue { NotUsed.getInstance() }
At this point, we have a copy of GitHub's events feed for github.com/Kotlin stored in Kafka,
so we can time travel and run different analytics jobs on our local dataset.
ºPushEventProcessorº
we want to filter out PushEvents from the stream
and update a Postgres database with the results.
- Alpakka Slick (JDBC) Connector is used to connect to PostgreSQL.
fun createTableIfNotExists(): Source˂Int, NotUsed˃ {
val ddl =
"""
|CREATE TABLE IF NOT EXISTS kotlin_push_events(
| id BIGINT NOT NULL,
| name VARCHAR NOT NULL,
| timestamp TIMESTAMP NOT NULL,
| repository VARCHAR NOT NULL,
| branch VARCHAR NOT NULL,
| commits INTEGER NOT NULL
|);
|CREATE UNIQUE INDEX IF NOT EXISTS id_index ON kotlin_push_events (id);
""".trimMargin()
return Slick.source(session, ddl, { _ -˃ 0 })
}
- Similarly, the function to update the database looks like this.
fun Source˂PushEvent, NotUsed˃.updateDatabase() :
CompletionStage˂Done˃ =
createTableIfNotExists().flatMapConcat { this }
.runWith(Slick.sink˂PushEvent˃(session, 20, { event -˃
"""
|INSERT INTO kotlin_push_events
|(id, name, timestamp, repository, branch, commits)
|VALUES (
| ${event.id},
| '${event.actor.login}',
| '${Timestamp.valueOf(event.created_at)}',
| '${event.repo.name}',
| '${event.payload.ref}',
| ${event.payload.distinct_size}
|)
|ON CONFLICT DO NOTHING
""".trimMargin()
}), materializer)
We are almost done, what's left is filtering and mapping from JsonNode to
PushEvent and composing the methods together.
fun Source˂JsonNode, NotUsed˃.filterPushEvents(): Source˂PushEvent, NotUsed˃ =
filter { node -˃ node["type"].asText() == "PushEvent" }
.map { node -˃ objectMapper.convertValue(node, PushEvent::class.java) }
And finally, all the functions composed together look like this.
fun run(events: Source˂JsonNode, NotUsed˃): CompletionStage˂Done˃ =
events
.filterPushEvents()
.updateDatabase()
This is why we've used the extension methods above, so we can describe
transformations like this, simply chained together.
That's it, after running the app for a while (gradle app:run) we can see
the activities around different Kotlin repositories.
You can find the complete source code on GitHub.
A very nice property of using Akka Streams and Alpakka is that it makes
it really easy to migrate/reuse your code, e.g. in case you want to store data
in Cassandra later on instead of Postgres. All you would have to do is define
a different Sink with CassandraSink.create. Or if GitHub events would be
dumped in a file located in AWS S3 instead of published to Kafka, all you
would have to do is create a Source with S3Client.download(bucket, key). The
current list of available connectors is located here, and the list is growing.
Unordered
Unordered
• Security:
https://kafka.apache.org/documentation/#security
• Burrow Monit:
@[https://dzone.com/articles/kafka-monitoring-with-burrow]
• Best Pracites:
@[https://www.infoq.com/articles/apache-kafka-best-practices-to-optimize-your-deployment]
• Mirror Maker (geo-replica):
@[https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330]
Kafka's mirroring feature makes it possible to maintain a replica of an
existing Kafka cluster. The following diagram shows how to use the
MirrorMaker tool to mirror a source Kafka cluster into a target
(mirror) Kafka cluster. The tool uses a Kafka consumer to consume
messages from the source cluster, and re-publishes those messages to
the local (target) cluster using an embedded Kafka producer.
• Kafka Backup:
@[https://github.com/itadventurer/kafka-backup]
- Kafka Backup is a tool to back up and restore your Kafka data
including all (configurable) topic data and especially also consumer
group offsets. To the best of our knowledge, Kafka Backup is the only
viable solution to take a cold backup of your Kafka data and restore
it correctly.
• Lessons Learned:
@[https://www.confluent.io/blog/stories-front-lessons-learned-supporting-apache-kafka/]
- Under-replicated Partitions Continue to Grow Inexplicably
- Kafka Liveness Check and Automation Causes Full Cluster Down
- Adding a New Broker Results in Terrible Performance
• Faust (Python Streams):
@[https://github.com/robinhood/faust]
- Faust provides both stream processing and event processing, sharing
  similarity with tools such as Kafka Streams and Apache
  Spark/Storm/Samza/Flink.
ELK
ELK
ElasticSearch (Search Engine)
ELK Summary
-ºElasticSearch cluster solutions in practice mean:º
  - Deploying dozens of machines/containers.
  - Server tuning.
  - Mapping writing: i.e., deciding how the data is:
    - tokenized
    - analyzed
    - indexed
    depending on user requirements, internal limitations, and
    available hardware resources.
- Elasticsearch is the central piece of ELK (Elastic Search, Logstash, and Kibana)
- developed and maintained by Elastic.
- Elasticsearch:
  - Adds clustering and enterprise features on top of the Apache Lucene library.
  - Originally designed for text-search indexing;
    newer versions can be used as a general data-indexing solution.
Extracted from: @[https://docs.sonarqube.org/latest/requirements/requirements/]
""" ...the "data" folder houses the Elasticsearch indices on which a huge amount
of I/O will be done when the server is up and running. Great read ⅋ write
hard drive performance will therefore have a great impact on the overall
server performance.
"""
Cerebro: admin GUI
@[https://www.redhat.com/sysadmin/cerebro-webui-elk-cluster]
Elasticsearch comes as a set of blocks, and you, as a designer, are supposed
to glue them together. Yet, the way the software comes out of the box does not
cover everything. So, to me, it was not easy to see the cluster's heartbeat
all in one place. I needed something to give me an overview as well as allow me
to take action on basic things.
I wanted to introduce you to a helpful piece of software I found: Cerebro.
According to the Cerebro GitHub page:
Cerebro is an open source (MIT License) elasticsearch web admin tool built
using Scala, Play Framework, AngularJS, and Bootstrap.
Solr (ElasticSearch Alternative)
@[http://lucene.apache.org/solr/]
- blazing-fast, search platform built on Apache Lucene
- Doesn't include analytics engine like ElasticSearch
Logstash (Collector)
- Logstash: log pipeline (ingest/transform/load it into a store
like Elasticsearch)
(It is common to replace Logstash with Fluentd)
- Kibana: visualization layer on top of Elasticsearch.
- In production a few other pieces might be included, like:
- Kafka, Redis, NGINX, ....
logstash-plugins
https://github.com/logstash-plugins
Plugins that can extend Logstash's functionality.
logstash-codec-protobuf parsing Protobuf messages
logstash-integration-kafka
logstash-filter-elasticsearch
amazon-s3-storage
logstash-codec-csv
...
Logstash Alternatives
Graylog
- Log management tool based on Java, ElasticSearch and MongoDB.
  Graylog can be used to collect, index and analyze
  any server log from a centralized or distributed location. We can
  easily monitor any unusual activity for debugging applications and logs using
Graylog. Graylog provides a powerful query language, alerting abilities, a
processing pipeline for data transformation and much more. We also can extend
the functionality of Graylog through a REST API and Add-ons.
- gaining popularity in the Go community with
the introduction of the Graylog Collector Sidecar
written in Go.
- ... it still lags far behind the ELK stack.
- Composed under the hood of:
- Elasticsearch
- MongoDB
- Graylog Server.
- comes with alerting built into the open source version
and streaming, message rewriting, geolocation, ....
- Ex:
@[https://www.howtoforge.com/how-to-monitor-log-files-with-graylog-v31-on-debian-10/]
Fluentd
-WºIt is not a global log aggregation system but a local one.º
- Adopted by the CNCF as Incubating project.
- Recommended by AWS and Google Cloud.
- Common replacement for Logstash
- It acts as a local aggregator to collect all node logs and send them off to central
storage systems.
-Oº500+ plugins for quick and easy integrations withº
Oºdifferent data input/outputs. º
-Bºcommon choice in Kubernetes environments due to:º
Bº- low memory requirements (tens of megabytes) º
Bº (each pod has a Fluentd sidecar) º
Bº- high throughput. º
Kibana (UI)
Kibana Summary
- Analytical UI results extracted from ElasticSearch.
Kibana Alternatives
- Grafana: lower-level UI for infrastructure data visualization.
  It can ingest data directly from many sources.
  Kibana must use Logstash to ingest from non-Elasticsearch sources.
- Tableau: Very simple to use and ability to produce interactive visualizations.
- well suited to handling huge and very fast-changing datasets
- integrates with Hadoop, Amazon AWS, My SQL, SAP, Teradata, ....
- Qlikview: Tableau’s biggest competitor.
- FusionCharts: non-free JavaScript-based charting and visualization package with
  90+ different chart types;
  rather than having to start each new visualization from scratch, users
  can pick from "live" example templates.
- Highcharts: non-free (it can be used freely for trial, non-commercial or personal use).
- Datawrapper: increasingly popular choice to present charts and statistics.
- simple, clear interface
- very easy to upload csv data and create straightforward charts, also maps,
...
- Plotly : enables more complex and sophisticated visualizations.
  -BºIntegrates with analytics-oriented programming languages such as Python, R and Matlab.º
  - Built on top of the open source d3.js.
  - Free/non-free licence.
- support for APIs such as Salesforce.
- Sisense : full-stack analytics platform noted for its visualization capabilities:
  simple-to-use drag-and-drop interface.
  Dashboards can then be shared across organizations.
Non-Ordered
Non-Ordered
InfluxDB (Elas.Search Alt for TimeSeries)
@[https://logz.io/blog/influxdb-vs-elasticsearch/]
• awesome influxdb:
https://github.com/PoeBlu/awesome-influxdb
A curated list of awesome projects, libraries, tools, etc. related to InfluxDB
Tools whose primary or sole purpose is to feed data into InfluxDB.
- accelerometer2influx - Android application that takes the x-y-z axis metrics
from your phone accelerometer and sends the data to InfluxDB.
- agento - Client/server collecting near realtime metrics from Linux hosts
- aggregateD - A dogstatsD inspired metrics and event aggregation daemon for
InfluxDB
- aprs2influxdb - Interfaces ham radio APRS-IS servers and saves packet data
into an influxdb database
- Charmander - Charmander is a lab environment for measuring and analyzing
resource-scheduling algorithms
- gopherwx - a service that pulls live weather data from a Davis Instruments
Vantage Pro2 station and stores it in InfluxDB
- grade - Track Go benchmark performance over time by storing results in
InfluxDB
- Influx-Capacitor - Influx-Capacitor collects metrics from windows machines
using Performance Counters. Data is sent to influxDB to be viewable by grafana
- Influxdb-Powershell - Powershell script to send Windows Performance counters
to an InfluxDB Server
- influxdb-logger - SmartApp to log SmartThings device attributes to an
InfluxDB database
- influxdb-sqlserver - Collect Microsoft SQL Server metrics for reporting to
InfluxDB and visualize them with Grafana
- k6 - A modern load testing tool, using Go and JavaScript
- marathon-event-metrics - a tool for reporting Marathon events to InfluxDB
- mesos-influxdb-collector - Lightweight mesos stats collector for InfluxDB
- mqforward - MQTT to influxdb forwarder
- node-opcua-logger - Collect industrial data from OPC UA Servers
- ntp_checker - compares internal NTP sources and warns if the offset between
servers exceeds a definable (fraction of) seconds
- proc_to_influxdb - Console app to observe Windows process starts and stops
via InfluxDB
- pysysinfo_influxdb - Periodically send system information into influxdb (uses
python3 + psutil, so it also works under Windows)
- sysinfo_influxdb - Collect and send system (linux) info to InfluxDB
- snmpcollector - A full featured Generic SNMP data collector with Web
Administration Interface for InfluxDB
- Telegraf - (Official) plugin-driven server agent for reporting metrics into
InfluxDB
- tesla-streamer - Streams data from Tesla Model S to InfluxDB (rake task)
- traffic_stats - Acquires and stores statistics about CDNs controlled by
Apache Traffic Control
- vsphere-influxdb-go - Collect VMware vSphere, vCenter and ESXi performance
metrics and send them to InfluxDB
MFA OOSS
• Open source alternative for multi-factor authentication: privacyIDEA
  @[https://opensource.com/article/20/3/open-source-multi-factor-authentication]
Chrony (NTP replacement)
@[https://www.infoq.com/news/2020/03/ntp-chrony-facebook/]
Facebook’s Switch from ntpd to chrony for a More Accurate, Scalable NTP Service
Apache Beam
• Apache Beam provides an advanced unified programming model, allowing
you to implement batch and streaming data processing jobs that can
run on any execution engine.
• Allows to execute pipelines on multiple environments such as Apache
Apex, Apache Flink, Apache Spark among others.
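• Illustrative Java-SDK sketch of a batch word-count pipeline (file paths are
  placeholders; the actual execution engine is picked from the pipeline options):

      import java.util.Arrays;
      import org.apache.beam.sdk.Pipeline;
      import org.apache.beam.sdk.io.TextIO;
      import org.apache.beam.sdk.options.PipelineOptionsFactory;
      import org.apache.beam.sdk.transforms.Count;
      import org.apache.beam.sdk.transforms.FlatMapElements;
      import org.apache.beam.sdk.transforms.MapElements;
      import org.apache.beam.sdk.values.KV;
      import org.apache.beam.sdk.values.TypeDescriptors;

      public class BeamWordCount {
          public static void main(String[] args) {
              Pipeline p = Pipeline.create(
                  PipelineOptionsFactory.fromArgs(args).create());
              p.apply(TextIO.read().from("input.txt"))                      // bounded text source
               .apply(FlatMapElements.into(TypeDescriptors.strings())
                   .via((String line) -˃
                       Arrays.asList(line.toLowerCase().split("\\W+"))))    // split into words
               .apply(Count.perElement())                                   // KV˂word, count˃
               .apply(MapElements.into(TypeDescriptors.strings())
                   .via((KV˂String, Long˃ kv) -˃
                       kv.getKey() + ": " + kv.getValue()))
               .apply(TextIO.write().to("wordcounts"));
              p.run().waitUntilFinish();                                    // run on the chosen runner
          }
      }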
Apache Ignite
• Apache Ignite is a high-performance, integrated and distributed
in-memory platform for computing and transacting on large-scale data
sets in real-time, orders of magnitude faster than possible with
traditional disk-based or flash technologies.
• Can be used to dramatically speed up RDBMS (SQL) Online Analytics Processing (OLAP)
  and Online Transaction Processing (OLTP) workloads.
@[https://www.gridgain.com/resources/papers/accelerate-mysql-olap-oltp-use-cases]
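• Illustrative sketch of the Java API used as a distributed key/value cache
  (cache name is a placeholder; cluster discovery/config is left at defaults):

      import org.apache.ignite.Ignite;
      import org.apache.ignite.IgniteCache;
      import org.apache.ignite.Ignition;

      public class IgniteHello {
          public static void main(String[] args) {
              try (Ignite ignite = Ignition.start()) {            // start/join the cluster
                  IgniteCache˂Integer, String˃ cache =
                      ignite.getOrCreateCache("myCache");         // distributed in-memory cache
                  cache.put(1, "Hello");
                  System.out.println(cache.get(1));
              }
          }
      }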
Apache PrestoDB
@[https://prestodb.io/]
- distributed SQL query engine originally developed by Facebook.
- running interactive analytic queries against data sources of
all sizes ranging from gigabytes to petabytes.
- Engine can combine data from multiple sources (RDBMS, No-SQL,
  Hadoop) within a single query, with little to no performance
  degradation. Being used and developed by big-data giants
  such as Facebook, Twitter and Netflix guarantees a
  bright future for this tool.
- Designed and written from the ground up for interactive
analytics and approaches the speed of commercial data warehouses
while scaling to the size of organizations like Facebook.
- NOTE: Presto uses Apache Avro to represent data with schemas.
(Avro is "similar" to Google Protobuf/gRPC)
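- Illustrative JDBC sketch (host, catalog, schema and query are placeholders;
  driver artifact: com.facebook.presto:presto-jdbc):

      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.ResultSet;
      import java.sql.Statement;

      public class PrestoQuery {
          public static void main(String[] args) throws Exception {
              // catalog "hive" and schema "default" are placeholders
              try (Connection con = DriverManager.getConnection(
                       "jdbc:presto://presto-coordinator:8080/hive/default", "user", null);
                   Statement st = con.createStatement();
                   ResultSet rs = st.executeQuery("SELECT count(*) FROM some_table")) {
                  while (rs.next()) {
                      System.out.println(rs.getLong(1));
                  }
              }
          }
      }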
Google Colossus FS
@[https://www.systutorials.com/3202/colossus-successor-to-google-file-system-gfs/]
ddbb from scratch
• Writing a SQL database from scratch in Go: 3. indexes (notes.eatonphil.com)
  @[https://notes.eatonphil.com/database-basics-indexes.html]
Mattermost+Discourse
CERN switches from Facebook Workplace to Mattermost and Discourse
https://www.linuxadictos.com/cern-cambia-el-uso-de-facebook-workplace-por-mattermost-y-discourse.html
Intel Optane DC performance
http://mikaelronstrom.blogspot.com/2020/02/benchmarking-5-tb-data-node-in-ndb.html?m=1
Through the courtesy of Intel I have access to a machine with 6 TB of Intel
Optane DC Persistent Memory. This is memory that can be used both as
persistent memory in App Direct Mode or simply used as a very large
DRAM in Memory Mode.
Slides for a presentation of this are available at slideshare.net.
This memory can be bigger than DRAM, but has different characteristics
compared to DRAM. Due to these different characteristics, all accesses to this
memory go through a cache and here the cache is the entire DRAM in the
machine.
In the test machine there was a 768 GB DRAM acting as a cache for the
6 TB of persistent memory. When a miss happens in the DRAM cache
one has to go towards the persistent memory instead. The persistent memory
has higher latency and lower throughput. Thus it is important as a programmer
to ensure that your product can work with this new memory.
What one can expect performance-wise is that performance will be similar to
using DRAM as long as the working set is smaller than DRAM. As the working
set grows one expects the performance to drop a bit, but not in a very significant
manner.
We tested NDB Cluster using the DBT2 benchmark which is based on the
standard TPC-C benchmark but uses zero latency between transactions in
the benchmark client.
This benchmark has two phases, the first phase loads the data from 32 threads
where each threads loads one warehouse at a time. Each warehouse contains
almost 500.000 rows in a number of tables.
SOAP vs RESTful vs ...
SOAP vs RESTful vs JSON-RPC vs gRPC vs Drift
EventQL
eventql/eventql: Distributed "massively parallel" SQL query engine
https://github.com/eventql/eventql
n-gram index
"...Developed an indexing and search software, mostly in C.
The indexer is multi-threaded and computes a n-gram index
- stored in B-Trees - on hundreds of GB of data generated
every day in production. The associated search engine is also
a multi-threaded, efficient C program."
- In the fields of computational linguistics and probability, an n-gram
is a contiguous sequence of n items from a given sample of text or
speech. The items can be phonemes, syllables, letters, words or base
pairs according to the application. The n-grams typically are
collected from a text or speech corpus. When the items are words,
n-grams may also be called shingles.
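- Tiny illustrative sketch of character n-gram extraction (the unit such an
  index stores); class and method names are made up for the example:

      import java.util.ArrayList;
      import java.util.List;

      public class NGrams {
          static List˂String˃ ngrams(final String text, final int n) {
              final List˂String˃ out = new ArrayList˂˃();
              for (int i = 0; i + n ˂= text.length(); i++) {   // sliding window of size n
                  out.add(text.substring(i, i + n));
              }
              return out;
          }
          public static void main(String[] args) {
              System.out.println(ngrams("database", 3)); // [dat, ata, tab, aba, bas, ase]
          }
      }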
Genesis: end-to-end testing
Genesis: An End-To-End Testing & Development Platform For Distributed Systems
https://github.com/EntEthAlliance/genesis/branches
Outputting data to collect for later analysis is as simple as writing
a JSON object to stdout. You can review the data we collect from each
test on the test details page in the dashboard, build your own
metrics and analytics dashboards with Kibana, and write your own
custom data analysis and reports using Jupyter Notebooks.
fallacies of distributed computing
@[https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing]
The fallacies are[1]
The network is reliable;
Latency is zero;
Bandwidth is infinite;
The network is secure;
Topology doesn't change;
There is one administrator;
Transport cost is zero;
The network is homogeneous.
The effects of the fallacies
- Software applications are written with little error-handling on
networking errors. During a network outage, such applications may
stall or infinitely wait for an answer packet, permanently consuming
memory or other resources. When the failed network becomes available,
those applications may also fail to retry any stalled operations or
require a (manual) restart.
- Ignorance of network latency, and of the packet loss it can cause,
induces application- and transport-layer developers to allow
unbounded traffic, greatly increasing dropped packets and wasting
bandwidth.
- Ignorance of bandwidth limits on the part of traffic senders can
result in bottlenecks.
- Complacency regarding network security results in being blindsided
by malicious users and programs that continually adapt to security
measures.[2]
- Changes in network topology can have effects on both bandwidth and
latency issues, and therefore can have similar problems.
- Multiple administrators, as with subnets for rival companies, may
institute conflicting policies of which senders of network traffic
must be aware in order to complete their desired paths.
- The "hidden" costs of building and maintaining a network or subnet
are non-negligible and must consequently be noted in budgets to avoid
vast shortfalls.
- If a system assumes a homogeneous network, then it can lead to the
same problems that result from the first three fallacies.
Time Series + Spark
https://github.com/AmadeusITGroup/Time-Series-Library-with-Spark
NATS
Extracted from:
https://docs.baseline-protocol.org/baseline-protocol/packages/messaging
NATS is currently the default point-to-point messaging provider and
the recommended way for organizations to exchange secure protocol
messages. NATS was chosen due to its high-performance capabilities,
community/enterprise footprint, interoperability with other systems
and protocols (i.e. Kafka and MQTT) and its decentralized
architecture.
https://docs.nats.io/
The Importance of Messaging
Developing and deploying applications and services that communicate
in distributed systems can be complex and difficult. However there
are two basic patterns, request/reply or RPC for services, and event
and data streams. A modern technology should provide features to make
this easier, scalable, secure, location independent and observable.
Distributed Computing Needs of Today
A modern messaging system needs to support multiple communication
patterns, be secure by default, support multiple qualities of
service, and provide secure multi-tenancy for a truly shared
infrastructure. A modern system needs to include:
- Secure by default communications for microservices, edge
platforms and devices
- Secure multi-tenancy in a single distributed communication
technology
- Transparent location addressing and discovery
- Resiliency with an emphasis on the overall health of the system
- Ease of use for agile development, CI/CD, and operations, at scale
- Highly scalable and performant with built-in load balancing and
dynamic auto-scaling
- Consistent identity and security mechanisms from edge devices to
backend services
NATS is simple and secure messaging made for developers and
operators who want to spend more time developing modern applications
and services than worrying about a distributed communication system.
- Easy to use for developers and operators
- High-Performance
- Always on and available
- Extremely lightweight
- At Most Once and At Least Once Delivery
- Support for Observable and Scalable Services and Event/Data
Streams
- Client support for over 30 different programming languages
- Cloud Native, a CNCF project with Kubernetes and Prometheus
integrations
Use Cases
NATS can run anywhere, from large servers and cloud instances,
through edge gateways and even IoT devices. Use cases for NATS
include:
- Cloud Messaging
- Services (microservices, service mesh)
- Event/Data Streaming (observability, analytics, ML/AI)
- Command and Control
- IoT and Edge
- Telemetry / Sensor Data / Command and Control
- Augmenting or Replacing Legacy Messaging Systems
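· Illustrative sketch with the official Java client (io.nats:jnats);
  server URL and subject are placeholders:

      import io.nats.client.Connection;
      import io.nats.client.Dispatcher;
      import io.nats.client.Nats;
      import java.nio.charset.StandardCharsets;
      import java.time.Duration;

      public class NatsHello {
          public static void main(String[] args) throws Exception {
              Connection nc = Nats.connect("nats://localhost:4222");  // placeholder URL
              Dispatcher d = nc.createDispatcher(msg -˃               // async subscriber
                  System.out.println(
                      new String(msg.getData(), StandardCharsets.UTF_8)));
              d.subscribe("greetings");
              nc.publish("greetings",
                         "hello".getBytes(StandardCharsets.UTF_8));
              nc.flush(Duration.ofSeconds(1));                        // wait until delivered
              nc.close();
          }
      }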
Serverless Architecture
· Also known as (AWS non official name) "Lambda" architecture.
· LinkedIn drops Lambda Arch to remove complexity [TODO]
@[https://www.infoq.com/news/2020/12/linkedin-lambda-architecture/]
OpenFaas (k8s)
@[https://www.openfaas.com/]
· Serverless Functions, Made Simple. [_PM.low_code]
OpenFaaS® makes it simple to deploy both functions and existing code to Kubernetes.
CliG Dev
@[https://clig.dev/]
Command Line Interface Guidelines
Debezium: Reacting to RDBM row changes
@[https://developers.redhat.com/blog/2020/12/11/debezium-serialization-with-apache-avro-and-apicurio-registry/]
- Debezium is a set of distributed services that captures row-level
database changes so that applications can view and respond to them.
Debezium connectors record all events to a Red Hat AMQ Streams Kafka
cluster. Applications use AMQ Streams to consume change events.
- See also:
https://developers.redhat.com/articles/2021/07/30/avoiding-dual-writes-event-driven-applications#
Apache AirFlow
https://dzone.com/articles/apache-airflow-20-a-practical-jump-start?edition=676391
- Community-driven platform to programmatically author, schedule and monitor workflows.
- It is NOT a data-streaming solution. Tasks do not move data from one task to another.
- Workflows are expected to be mostly static or slowly changing, and to look
  similar from one run to the next.
- "Batteries included" for lot of external apps:
Amazon Elasticsearch Opsgenie Vertica
Apache Beam Exasol Oracle Yandex
Apache Cassandra Facebook Pagerduty Zendesk
Apache Druid File Transfer Protocol (FTP) Papermill
Apache HDFS Google Plexus
Apache Hive gRPC PostgreSQL
Apache Kylin Hashicorp Presto
Apache Livy Hypertext Transfer Protocol (HTTP) Qubole
Apache Pig Internet Message Access Protocol (IMAP) Redis
Apache Pinot Java Database Connectivity (JDBC) Salesforce
Apache Spark Jenkins Samba
Apache Sqoop Jira Segment
Celery Microsoft Azure Sendgrid
IBM Cloudant Microsoft SQL Server (MSSQL) SFTP
Kubernetes Windows Remote Management (WinRM) Singularity
Databricks MongoDB Slack
Datadog MySQL Snowflake
Dingding Neo4J SQLite
Discord ODBC SSH
Docker OpenFaaS Tableau
Telegram
- See also:
· https://github.com/BasPH/data-pipelines-with-apache-airflow
Jailer SQL "Explorer"
@[https://github.com/Wisser/Jailer]
• Jailer: database tool for database subsetting and relational data browsing.
• The Subsetter exports consistent, referentially intact row-sets
from relational databases, generates topologically sorted SQL-DML,
DbUnit datasets and hierarchically structured XML.
• The Data Browser allows bidirectional navigation through the
database by following foreign-key-based or user-defined relationships.
FEATURES
• Exports consistent and referentially intact row-sets from your
productive database and imports the data into your development and
test environment.
• Improves database performance by removing and archiving obsolete
data without violating integrity.
• Generates topologically sorted SQL-DML, hierarchically structured
XML and DbUnit datasets.
• Data Browsing. Navigate bidirectionally through the database by
following foreign-key-based or user-defined relationships.
• SQL Console with code completion, syntax highlighting and
database metadata visualization.
• A demo database is included with which you can get a first
impression without any configuration effort.
zstd fast lossless compression
Storage: Release Zstandard v1.4.7
https://github.com/facebook/zstd/releases/tag/v1.4.7
Zstandard, or zstd as short version, is a fast lossless compression
algorithm, targeting real-time compression scenarios at zlib-level
and better compression ratios. It's backed by a very fast entropy
stage, provided by Huff0 and FSE library.
New Object Storage Protocol Could Mean the End for POSIX
https://www.enterprisestorageforum.com/cloud-storage/object-storage-protocol-could-mean-the-end-for-posix.html
Kafka: Removing Zookeeper Dependency
https://www.infoq.com/podcasts/kafka-zookeeper-removing-dependency/
Noclassified Identity Technologies
• Modern Authentication (OpenID Connect, OAuth, SAML 2.0)
• Microsoft EMS, Azure Log Analytics Server, Active Directory
• Kerberos, Tiering Model, Microsoft PKI (HSM, Certificate Management)
• Venafi CMS, Zero Trust Concept, B2B Guest Concept
Dual graph
- Wikipedia
https://en.m.wikipedia.org/wiki/Dual_graph
https://en.m.wikipedia.org/wiki/Glossary_of_graph_theory_terms
eDelivery Message exchange protocol
- ISO approves eDelivery Message exchange protocol as International Standard
@[https://ec.europa.eu/cefdigital/wiki/display/CEFDIGITAL/2020/07/24/ISO+approves+eDelivery+message+exchange+protocol+as+International+Standard]
CAI: standard to protect images
- Content Authenticity Initiative
- Today, at Adobe MAX 2019, in collaboration with The New York Times
Company and Twitter, we announced the Content Authenticity Initiative
(CAI) to develop an industry standard for digital content attribution.
Eclipse APP4MC
@[https://www.eclipse.org/app4mc/]
Eclipse APP4MC is a platform for engineering embedded multi- and
many-core software systems. The platform enables the creation and
management of complex tool chains including simulation and
validation. As an open platform, proven in the automotive sector by
Bosch and their partners, it supports interoperability and
extensibility and unifies data exchange in cross-organizational
projects. Multi- and Many-Core Development Process Support The
Amalthea platform allows users to distribute data and tasks to the
target hardware platforms, with the focus on optimization of timing
and scheduling. It addresses the need for new tools and techniques to
make effective use of the level of parallelism in this environment.
RedHat Node.js Ref. Arch.
@[https://github.com/nodeshift/nodejs-reference-architecture]
Functional Components Development Operations
===================== =========== ==========
Web Framework Building good containers Health Checks
Template Engines Static Assets Monitoring/Metrics
Message Queuing Keeping up to date Monitoring
Internationalization Code Quality Metrics Collection
Accessibility Code Consistency Distributed Tracing
API Definition Testing Problem Determination
GraphQL Code Coverage Logging
Databases References to CI/CD Rollout
Authen+Author Npm Deployment
Data Caching Npm Proxy/Priv.Registry Containers
Scaling and Multi-threading Npm Publishing Serverless
Consuming Services Package Development Load-balancing
Node versions/images Secure Development Process Failure Handling
Transactions_handling
Apache Calcite
" The foundation for your next high-performance database "
• Industry-standard SQL parser, validator and JDBC driver:
• Query optimization:
Represent your query in relational algebra, transform using planning
rules, and optimize according to a cost model.
• "Any data, anywhere"
Connect to third-party data sources, browse metadata, and optimize by
pushing the computation to the data.
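• Illustrative sketch of the JDBC entry point ("jdbc:calcite:" is the in-process
  driver; the schema registered afterwards is hypothetical):

      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.util.Properties;
      import org.apache.calcite.jdbc.CalciteConnection;

      public class CalciteHello {
          public static void main(String[] args) throws Exception {
              Properties info = new Properties();
              info.setProperty("lex", "JAVA");                   // identifier/quoting rules
              Connection connection =
                  DriverManager.getConnection("jdbc:calcite:", info);
              CalciteConnection calcite =
                  connection.unwrap(CalciteConnection.class);
              // calcite.getRootSchema().add("hr", myCustomSchema); // ← hypothetical schema
              //    (e.g. a CSV or JDBC adapter); then query it with plain SQL
              //    through connection.createStatement().
              connection.close();
          }
      }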
CloudEvents
• Events are everywhere but event producers tend to describe events
differently and developers must constantly re-learn how to consume events.
This also limits the potential for libraries, tooling and infrastructure
to aid the delivery of event data across environments, like SDKs,
event routers or tracing systems.
• CloudEvents(CNCF) specification describes event data in common formats to
provide interoperability across services, platforms and systems.
• Core Specification:
CloudEvents v1.0.1 WIP
• Optional Specifications:
AMQP Protocol Binding v1.0.1 WIP
AVRO Event Format v1.0.1 WIP
HTTP Protocol Binding v1.0.1 WIP
JSON Event Format v1.0.1 WIP
Kafka Protocol Binding v1.0.1 WIP
MQTT Protocol Binding v1.0.1 WIP
NATS Protocol Binding v1.0.1 WIP
WebSockets Protocol Binding - WIP
Protobuf Event Format v1.0-rc1
Web hook v1.0.1 WIP
• Additional Documentation:
CloudEvents Adapters - WIP
CloudEvents SDK Requirements - WIP
Documented Extensions - WIP
Primer v1.0.1 WIP
Proprietary Specifications - WIP
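• Illustrative sketch with the CNCF Java SDK (io.cloudevents:cloudevents-core);
  id, type, source and payload are placeholders:

      import io.cloudevents.CloudEvent;
      import io.cloudevents.core.builder.CloudEventBuilder;
      import java.net.URI;
      import java.util.UUID;

      public class CloudEventExample {
          public static void main(String[] args) {
              CloudEvent event = CloudEventBuilder.v1()
                  .withId(UUID.randomUUID().toString())       // required attribute
                  .withType("com.example.order.created")      // placeholder event type
                  .withSource(URI.create("/orders/service"))  // placeholder source
                  .withDataContentType("application/json")
                  .withData("{\"orderId\":42}".getBytes())    // optional payload
                  .build();
              System.out.println(event);
              // the event is then written with a protocol binding (HTTP, Kafka,
              // AMQP, ...) provided by the matching SDK module.
          }
      }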
Search UI:
https://github.com/ProjectOpenSea/search-ui
https://www.elastic.co/enterprise-search/search-ui?size=n_6_n
• React library for the fast development of modern, engaging search experiences
  without re-inventing the wheel.
• Use it with Elastic App Search or Elastic Site Search to have a
search experience up and running in minutes.