evitaDB - quick database for e-commerce
Original name |
evitaDB - rychlá databáze pro e-commerce |
Author(s) |
Jan Novotný |
Length |
54:23 |
Date |
14-11-2023 |
Language |
Czech 🇨🇿 |
Rating |
⭐⭐⭐⭐⭐ |
-
✅ The speaker proved he was truly an author of evitaDB as his speech was fluent and purely informative.
"evitaDB offers what others don’t because it focuses on a smaller segment of the market."
Problem domain
-
Catalog huge e-commerce web applications
-
Big amount of connected data:
-
100k+ items and relations
-
Millions of prices and attributes
-
The vast majority of operations are read-only
-
-
Caching problem
-
High amount of combinations and relations, that cannot be cached at once
-
Requirements for accuracy and exactness of data (stored items, prices) that change rapidly
-
-
Requirements for low latency and high throughput
-
Repeating scenarios
Command Query Responsibility Segregation (CQRS)
It is a common pattern in this domain: Separated data and conceptual models for data writing and management (primary store) and reading (secondary store).
(More about CQRS at Microservices + Event Driven Architecture in tech. company | Sponsored talk from KB.)
evitaDB offers what others don’t because it focuses on a smaller segment of the market:
-
noSQL DB for JVM optimized for fast reads many times than PostgreSQL and ElasticSearch
-
ACID principles and entity relations
-
variable model for structured and non-structured data
-
REST, GraphQL, gRPC API
Run easily
-
Download Docker image (160MB in total where 40MB is the engine itself):
docker pull index.docker.io/evitadb/evitadb:latest
-
Download sample data (70MB):
wget https://evitadb.io/download/evita-demo-dataset.zip unzip -d data evita-demo-dataset.zip
-
Run the database:
# there is open issue https://github.com/docker/roadmap/issues/238 for Windows / MacOS # and you need to open ports manually docker run --name evitadb -i -p 5555:5555 -p 5556:5556 -p 5557:5557 \ -v "./data:/evita/data" \ -e "api.exposedOn=localhost" \ index.docker.io/evitadb/evitadb:latest
-
Open the browser and view the data:
firefox https://localhost:5555/lab
evitaLab
evitaDB organized data into catalogs (i.e. database schema in relation world) and entity collections (i.e. connected tables in relation world).
Data model: Entity
-
Entity header: Hierarchy placement and version.
-
Attributes set: Structured global / localized data resembling a table row in RDBMS.
-
Associated data: Non-structured JSON global / localized data.
-
Prices: Structured data (id, priceList, currency, priceWithoutTax, priceWithTax, taxRate, validity, sellable, innerRecordId).
-
Reference: Structured global / localized data, foreign keys and group placement. Relation-specific attributes set.
The entities are immutable and versioned, i. e upon editing entity a new one with higher version is created instead.
Schema definition
evita.updateCatalog(
"testCatalog",
session ->{ session
.defineEntitySchema ("Product")
.verifySchemaButAllow(
EvolutionMode.ADDING_ASSOCIATED_DATA,
EvolutionMode.ADDING_REFERENCES
)
.withoutHierarchy()
.withPrice()
.withLocale(Locale.ENGLISH, new Locale("cs", "CZ"))
.withAttribute("code", String.class, whichIs -> whichIs.unique())
.withAttribute("url", String.class, whichIs -> whichIs.unique().localized())
.withAttribute("oldEntityUrls", String[].class, whichIs -> whichIs.filterable().localized())
.withAttribute("name", String.class, whichIs -> whichIs.filterable().sortable())
.withAttribute("ean", String.class, whichIs -> whichIs.filterable())
.withAttribute("priority", Long.class, whichIs-> whichIs.sortable())
.withAttribute("validity", DateTimeRange.class, whichIs -> whichIs.filterable())
.withAttribute("quantity", BigDecimal.class, whichIs -> whichIs.filterable().indexDecimalPlaces(2))
.withAttribute("alias", Boolean.class, whichIs -> whichIs.filterable())
.withAssociatedData("referencedFiles", ReferencedFileSet.class)
.withAssociatedData("labels", Labels.class, whichIs ->w hichIs.localized())
.withReferenceToEntity(
"categories", "Category", Cardinality.ZERO_OR_MORE,
whichIs -> whichIs.indexed()
.withAttribute("categoryPriority", Long.class, thatIs -> thatIs.sortable())
.withReferenceToEntity(
"brand", "Brand", Cardinality.ZERO_OR_ONE, whichIs -> whichIs.faceted())
.withReferenceTo(
"stock", "stock", Cardinality.ZERO_OR_MORE, whichIs -> whichIs. faceted())
.updateVia(session);
@Entity(
allowedEvolution = {
EvolutionMode.ADDING_LOCALES,
EvolutionMode.ADDING_CURRENCIES
}
public interface Product {
@PrimaryKey
int getId();
@Attribute
@Nonnull
String getCode();
@Attribute(name = "manufacturedBefore", filterable = true)
@Nonnull
default int[] getYears() {
// the default implementation defines default value
return new int[] (1978, 2005, 2020);
@AssociatedData
@Nonnull
ReferencedFiles getReferencedFiles();
@ParentEntity
int getParentEntity();
@PriceForSale
PriceContract getSelling Price();
@Reference
Brand getMarketingBrand();
@Reference
Brand[] getLicensingBrands();
record ReferencedFiles(@Nonnull int... fileId) implements Serializable {}
interface Brand extends Serializable {
@ReferencedEntity
int getBrand();
@ReferencedEntityGroup
int getBrandGroup();
@Attribute
String getMarket();
}
}
The latter way is preferred as there is no requirement for de/serialization. The contextual autocomplete helps to learn the domain model.
The database knows its structure and generates REST API or gRPC based on the schema.
Queries
The goal was to keep the queries consistent among protocols:
query(
collection('Product'),
filterBy(
attributeEquals('status, 'ACTIVE'),
attributeGreaterThan('battery-life', 10),
or(
attributeIs('validity', NULL),
attributeInRangeNow('validity')
)
),
orderBy(
attributeNatural('orderedQuantity', DESC)
),
require(
entityFetch(
attributeContentAll()
)
)
)
query {
listProduct(
filterBy: {
attributeStatusEquals: "ACTIVE"
attributeBatteryLifeGreater Than: "10"
or: [
{ attributeValidityIs: NULL },
{ attributeValidityInRange: "2023-10-28T00:00:00+00:00" }
]
orderBy: {
attributeOrderedQuantityNatural: DESC
}
){
primaryKey
attributes {
code
status
batteryCapacity
batteryLife
}
}
}
Localization
query(
collection('Product'),
filterBy(
entityPrimaryKeyInSet(103885, 103911, 105715),
entityLocaleEquals('en')
),
orderBy(
attributeNatural('name', DESC)
),
require(
entityFetch(
attributeContent('name', 'descriptionShort')
)
)
)
query(
collection('Product'),
filterBy(
entityPrimaryKeyInSet(103885, 103911, 105715),
),
require(
entityFetch(
attributeContent('name', 'descriptionShort')
dataInLocales('cs', 'en')
)
)
)
Entity graph loading
It is an analogy to SQL JOIN clauses with these exceptions:
-
It is not needed to specify a binding key - it is always a primary key.
-
There is no problem ONE-TO-MANY cardinality like in the relation database that multiplies data in output.
It is possible to load an entity tree (graph) of referring entities into any depth.
query(
collection('Product'),
filterBy(
entityPrimaryKeyInSet (103885)
),
require(
entityFetch(
referenceContentWithAttributes(
'parameterValues',
entityFetch(
attributeContent('code'),
referenceContentWithAttributes(
'parameter',
entityFetch(
attributeContent('code')
)
)
)
)
)
)
)
Lazy-loaded is partly supported, planned but not given a priority as it brings another set of performance problems. The recommended approach is to know what we really need.
The entities in the tree can be further filtered and ordered.
query(
collection('Product'),
filterBy(
entityPrimaryKeyInSet(103885)
),
require(
entityFetch(
referenceContentWithAttributes(
'parameterValues',
filterBy(
entityHaving(
referenceHaving(
'parameter',
entityHaving(
attributeContains('code', 'r')
)
)
)
),
orderBy(
entityProperty(
attributeNatural('code', DESC)
)
),
entityFetch(
attributeContent('code'),
referenceContentWithAttributes(
'parameter',
entityFetch(
attributeContent('code')
)
)
)
)
)
)
)
query{
getProduct (primaryKey: 103885) {
primaryKey
attributes { code }
parameterValues(
filterBy: {
entityHaving: {
referenceParameterHaving: {
entityHaving: {
attributeCodeContains: "r"
}
}
}
}
orderBy: {
entityProperty: {
attributeCodeNatural: DESC
}
}
) {
referencedEntity {
primaryKey
attributes { code }
parameter {
referencedEntity {
primaryKey
attributes { code }
}
}
}
}
}
}
Parametrized queries (facets)
It enables to select a range of attribute values to find out what products make sense to the customer. Numbers in the brackets simplify navigation and prevent from excessively narrowing the range. Standard parameter values combinations are:
-
OR within the same group
-
AND across the groups
-
… and exceptions
The parameters can be both checkboxes and radio buttons. Facets can be at the reference level in evitaDB.
A good e-shot does not let you go into the dead-end, and has a solid parameterized search: There are huge number of combinations but only few make sense.
The process should navigate you.
query(
collection('Product'),
filterBy(
attributeEquals('status', 'ACTIVE'),
userFilter(
facetHaving(
'groups',
entityHaving(
attributeEquals('code', 'sale')
)
)
)
),
require(
facetSummary(
COUNTS,
entityFetch(
attributeContent('code')
),
entityGroupFetch(
attributeContent('code')
)
)
)
)
A single query not only eliminates a need for round trips and the context is kept together for optimized search where intermediate results can be reused. In contrary using multiple queries breaks such a search context.
Hierarchy structure
Menu types:
-
Mega menu.
-
Highly dynamic collapsible menu.
-
Simple breakdown of the nearest subcategories.
-
Menu with path to the actual category.
Further features:
-
An option to exclude empty categories.
-
(Transitively) count the items in the categories.
-
Subcategories count.
-
Subtree exclusion based on the expression / condition (for example private directory, Christmas, etc.).
query(
collection('Product'),
filterBy(
hierarchyWithin(
'categories',
attributeEquals('code', 'audio'),
excluding (
attributeEquals('code', 'wireless-headphones')
)
)
),
require(
hierarchyOfReference(
'categories',
parents(
'parentsWithTheirSiblings',
entityFetch(attributeContent('code')),
siblings(
entityFetch(attributeContent('code'))
)
),
fromRoot(
'roots',
entityFetch (attributeContent('code')),
stopAt(level(1)),
statistics (CHILDREN_COUNT, QUERIED_ENTITY_COUNT)
)
)
)
)
)
Finding selling price
The database supports different prices for different customers.
-
B2C prices - simple solution:
-
Strikethrough price (selected prices are not subject of further calculations).
-
Multi-currency support.
-
Price validity and configuration varies in certain and multiple time ranges.
-
Elimination of items with no price
-
-
B2B prices
-
Price with and without VAT.
-
Price list prioritization.
-
-
Sorting by dynamically calculated prices.
-
Aggregated prices:
-
Master / variants (a single variant price, for example a t-shirt with different sizes and colors, i.e. a relation database would suffer doing
GROUP BY
over millions of rows). -
Products sets (sum of price of the set)
-
The implementation has prices as first-class citizen at the entity levels so the whole model is pre-optimized for these use cases.
query(
collection('Product'),
filterBy(
entityPrimaryKeyInSet(113010),
priceValidInNow(),
priceInPricelists('employee-basic-price', 'basic'),
priceInCurrency ('CZK')
),
orderBy(
priceNatural(ASC)
),
require(
entityFetch(
attributeContent('code'),
priceContentRespectingFilter('reference')
)
)
)
Histograms
Histograms are appropriate of representing interval values, i.e. high-cardinality values.
Possible use cases for:
-
Selling price
-
Attributes
-
Facets
They suggest the user in what interval there are the most items available. The argument is a number of "columns" of the histogram that is optimal for FE rendering.
query(
collection('Product'),
filterBy(
priceValidInNow(),
priceInPricelists('basic'),
priceInCurrency('EUR'),
userFilter(
priceBetween(100, 2000),
attributeBetween(
'battery-capacity', 2000, 3000
)
)
),
require(
priceHistogram(10),
attributeHistogram (10, 'battery-capacity')
)
)
Java API
Remote evitaDB
It is possible to start up evitaDB as a Java application with exposed gRPC interface and a client connecting to the database. The communication is done via gRPC protocol.
For example, it is possible to connect to evitaDB running in Docker as a remote process.
var evita = new EvitaClient(
EvitaClientConfiguration.builder()
.host("localhost")
.port(5566)
.build());
Embedded evitaDB
The database can run in-memory. While this solution has no overhead of gRPC calls, it stores indices for search (not data themselves) for the sake of performance which takes a lot of RAM and if we need memory dump, we hardly find what we need as in-memory evitaDB would take 90% of the memory.
final Evita evita = new Evita (
EvitaConfiguration
.builder()
.build()
);
final ExternalApiServer externalApiServer = ExternalApiServer(
evita,
ApiOptions.builder()
.enable(GrpcProvider.CODE)
.enable(SystemProvider.CODE)
.build());
// open the API on configured ports
externalApiServer..start();
//close the server and the ports, then close evitaDB itself
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
externalApiServer.close();
evita.close();
}));
This solution is optimal for integration tests and also optimized for parallel tests run.
Example test run
Tests result (normally they run under one minute):
Tests finished in: 1m 10s
Evita statistics:
- instances created: 153
- simultaneous instances peak: 11
- entities created: 15264
- ports opened: 80
- simultaneously opened ports: 15
The video from the tests run contained most likely CPU and Memory live charts in IntelliJ Idea. |
License
Open-source friendly though not fully open-source license: Business Source License, Version 1.1.
The license causes the code older than 4 years becomes licensed under Apache License, Version 2.0.
Links
-
Docker Hub: https://hub.docker.com/r/evitadb/evitadb:
index.docker.io/evitadb/evitadb:latest
-
Maven: https://central.sonatype.com/artifact/io.evitadb/evita_db/0.8.ALPHA
<dependency> <groupId>io.evitadb</groupId> <artifactId>evita_db</artifactId> <version>0.9. ALPHA</version> <type>pom</type> </dependency>
-
Homepage: https://evitadb.io
-
Discord: https://bit.ly/evitaDB
The project is still in the EA (early access) mode, that’s why it is ALPHA .
|