Don’t be scared of benchmarks during development
| | |
| --- | --- |
| Original name | Nebojte se benchmarků při vývoji |
| Author(s) | Jan Novotný |
| Length | 45:53 |
| Date | 13-11-2023 |
| Language | Czech 🇨🇿 |
| Rating | ⭐⭐⭐⭐☆ |
- ✅ Explanation of why Moore’s Law ceases to apply. Highlighted pitfalls of data preparation for JMH tests. Nice overview of profiling and the tools for it.
- ⛔ I cannot imagine how to configure and run the JMH tests for integrations as was advised.
- ⛔ JMH Stack Profiling was not sufficiently explained.
"Moore’s Law ceases to apply as a single core performance converges to a limit defined by the laws of physics."
Moore’s Law ceases to apply as a single core performance converges to a limit defined by the laws of physics: Performance becomes a thing again!
The producers compensate for the performance limits by adding more threads.
Java Microbenchmark Harness (JMH) is a low-level test framework for writing simple and quick unit and integration performance tests.
Set-up
Maven dependencies:

- `org.openjdk.jmh:jmh-core` (scope `compile`)
- `org.openjdk.jmh:jmh-generator-annprocess` (scope `provided`)
Samples are available at https://github.com/openjdk/jmh/tree/master/jmh-samples/src/main/java/org/openjdk/jmh/samples

Run

There are three basic ways to run the tests:
- Using IntelliJ IDEA with the JMH plugin (click the green triangle and run it like a test).
- Using the main class: `org.openjdk.jmh.Main.main(args);`
- Using the command line, for which it’s necessary to use the Maven Shade Plugin (`org.apache.maven.plugins:maven-shade-plugin`) to create a fat JAR (also called an uberjar) that contains the performance tests:

```xml
<executions>
  <execution>
    <phase>package</phase>
    <goals>
      <goal>shade</goal>
    </goals>
    <configuration>
      <finalName>benchmarks</finalName>
      <transformers>
        <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
          <mainClass>org.openjdk.jmh.Main</mainClass>
        </transformer>
        <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
      </transformers>
      <filters>
        <filter>
          <!-- shading of a signed JAR fails without this: https://stackoverflow.com/q/999489/3764965 -->
          <artifact>*:*</artifact>
          <excludes>
            <exclude>META-INF/*.SF</exclude>
            <exclude>META-INF/*.DSA</exclude>
            <exclude>META-INF/*.RSA</exclude>
          </excludes>
        </filter>
      </filters>
    </configuration>
  </execution>
</executions>
```
Upon running `mvn clean install`, the JAR with the tests can be run: `java -jar target/benchmarks.jar`
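Besides running the shaded JAR or delegating to `org.openjdk.jmh.Main`, tests can also be launched programmatically through JMH’s `Runner` API. A minimal sketch (the `MyBenchmark` include pattern is a hypothetical placeholder):

```java
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class BenchmarkRunner {

    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include("MyBenchmark")  // regex selecting the benchmarks to run (hypothetical name)
                .forks(1)                // same effect as @Fork(1)
                .build();
        new Runner(options).run();       // executes the selected benchmarks
    }
}
```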
Configuration
Annotations
Annotations are available to configure the test, though they can be overridden with command-line arguments; a minimal annotated benchmark follows the list below.
- `@Benchmark` annotates the performance test itself.
- `@Warmup` defines the warm-up.
- `@Measurement` defines the benchmark measurement details.
- `@Fork` defines the forking. The test should run in a separate JVM (`@Fork(1)`) so that it does not influence the other tests and yields representative results.
- `@BenchmarkMode` and `@OutputTimeUnit` define the output.
- `@Setup(Level.XXX)` and `@TearDown(Level.XXX)` are for test and data preparation.
- `@State(Scope.XXX)` defines the test data scope.
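A minimal sketch putting these annotations together (the benchmark class and the measured operation are hypothetical illustrations, not from the talk):

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Warmup;

public class StringConcatBenchmark {

    @Benchmark                              // the measured method itself
    @BenchmarkMode(Mode.Throughput)         // report operations per time unit
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Warmup(iterations = 3, time = 1)       // 3 warm-up iterations, 1 second each
    @Measurement(iterations = 5, time = 1)  // 5 measured iterations, 1 second each
    @Fork(1)                                // run in one separate JVM
    public String concat() {
        // return the result so the JIT cannot eliminate the work as dead code
        return System.lineSeparator() + "JMH";
    }
}
```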
Lifecycle of the test
There are three levels: `Level.Trial`, `Level.Iteration`, and `Level.Invocation`.
- Trial is run once before (`@Setup(Level.Trial)`) and after (`@TearDown(Level.Trial)`) the whole test and is used for initializing and closing resources.
- Iteration is run before (`@Setup(Level.Iteration)`) and after (`@TearDown(Level.Iteration)`) each test iteration and is used for data preparation and clean-up (see the sketch under Data preparation below).
- Invocation is run before (`@Setup(Level.Invocation)`) and after (`@TearDown(Level.Invocation)`) each call of the test method (`@Benchmark`) itself and might have an impact on the test, so it’s recommended not to handle data in this phase unless it is quick compared to the test execution itself.
Data preparation
A stateful data object annotated with `@State` and a defined scope is passed as a formal parameter to the method annotated with `@Benchmark`, as in the sketch below.
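A minimal sketch combining the lifecycle hooks with state injection (the `Data` class, the array size, and the summing benchmark are hypothetical illustrations):

```java
import java.util.Random;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;

public class DataPreparationBenchmark {

    @State(Scope.Thread)
    public static class Data {
        int[] values;

        @Setup(Level.Trial)      // once before the whole run: allocate resources
        public void initTrial() {
            values = new int[1_000];
        }

        @Setup(Level.Iteration)  // before every iteration: regenerate the data
        public void refresh() {
            Random random = new Random(42);
            for (int i = 0; i < values.length; i++) {
                values[i] = random.nextInt();
            }
        }

        @TearDown(Level.Trial)   // once after the whole run: release resources
        public void close() {
            values = null;
        }
    }

    @Benchmark
    public long sum(Data data) {  // the @State object is injected as a parameter
        long sum = 0;
        for (int value : data.values) {
            sum += value;
        }
        return sum;
    }
}
```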
Scopes
Scopes define the liveness of the `@State` data object:
- `Scope.Thread` is the most used one; the data object is instantiated per thread and isolated.
- `Scope.Benchmark` creates one instance shared by all the tests and threads, so beware of synchronization and access.
- `Scope.Group` guarantees the instance is shared across all threads within the same group defined by `@Group`. This advanced scope is the least used and the hardest to configure correctly.
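A minimal sketch contrasting the two common scopes (the counter benchmarks are hypothetical illustrations):

```java
import java.util.concurrent.atomic.AtomicLong;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

public class ScopeBenchmark {

    @State(Scope.Thread)     // each benchmark thread gets its own instance
    public static class ThreadLocalState {
        long counter;
    }

    @State(Scope.Benchmark)  // one instance shared by all threads
    public static class SharedState {
        final AtomicLong counter = new AtomicLong();  // must be thread-safe
    }

    @Benchmark
    public long isolated(ThreadLocalState state) {
        return ++state.counter;                   // no contention: per-thread state
    }

    @Benchmark
    public long shared(SharedState state) {
        return state.counter.incrementAndGet();   // contention shows up here
    }
}
```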
Debugging
Debugging the test (especially the data preparation) does not work with a JVM fork (`@Fork(1)`) because the benchmark runs in a separate process. For this purpose, the fork has to be disabled (`@Fork(0)`).
Gotchas
Watch out for the test data: verify that fresh data are really supplied with each iteration, as we tend to assume, e.g. by refreshing them in `@Setup(Level.Iteration)`.
Profilers

Profilers reveal the reasoning behind bad test results, as we want to know why.

IntelliJ IDEA has its own profiler displaying a process and time diagram of the test method execution.
JMH Stack Profiling can reveal multithreading issues, for example:

- A single thread processes 2,878,000 ops/s; output:
  `Stack profiler: ...[Thread state distributions]... 75.0% RUNNABLE, 25.0% TIMED_WAITING`
- Twelve threads together process only 2,121,178 ops/s; output:
  `Stack profiler: ...[Thread state distributions]... 69.2% BLOCKED, 24.0% RUNNABLE, 6.7% TIMED_WAITING`

The reason is that the threads spend nearly 70% of their time in the `BLOCKED` state.
Types of profilers: stack profiling, Linux perf profiling (e.g. for L1 cache statistics), GC profiling, ClassProfiler (number of loaded/unloaded classes), CompilerProfiler (time dedicated to JIT compilation)…
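Profilers can be attached on the command line (e.g. `java -jar target/benchmarks.jar -prof stack` or `-prof gc`) or programmatically; a minimal sketch, reusing the hypothetical `MyBenchmark` selector from earlier:

```java
import org.openjdk.jmh.profile.GCProfiler;
import org.openjdk.jmh.profile.StackProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class ProfiledBenchmarkRunner {

    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include("MyBenchmark")            // hypothetical benchmark selector
                .addProfiler(StackProfiler.class)  // prints the thread state distributions
                .addProfiler(GCProfiler.class)     // prints allocation rates and GC activity
                .build();
        new Runner(options).run();
    }
}
```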
Best practices
- Define a problem, formulate a hypothesis, state the expectations, and perform an experiment.
- Ask the following: What am I trying to measure? Do I measure it correctly? Why is the result not what I expect? What is happening?
- Do not stick to absolute numbers; rather compare and use relative numbers, as the results depend on hardware, OS, surrounding code, etc.
- Use production-like data.
- Observe the `BLOCKED` thread time in multithreaded applications.
- Be careful with formulating and summarizing the results. It’s worth consulting with a colleague, reviewing the test, and involving more people.
- Observe the changes over time and use the CI/CD pipeline for the performance tests.