picobenchmarks - very short sequences of a few instructions. They measure the latency and throughput of individual instructions, usually in assembler or in simple, obvious C code. Sometimes the same instructions are run in a different order, to measure pipeline effects (see the first sketch below). Mainly of interest to compiler writers and low-level software developers. Often cannot be used to compare very different CPU architectures. Datasets usually fit entirely in the L1 cache or in registers. Example: xor ax, ax; xor bx, bx vs. xor ax, ax; mov bx, ax.

nanobenchmarks - a bit bigger, usually with some kernel-like structure: looping, vectorization of loops, implementations of basic functions such as math special functions or string functions. The dataset often fits in the L1 cache, and the access patterns are very regular and easy to prefetch, automatically or manually. Examples: LU decomposition of medium-size matrices, a strstr implementation, a sin(x) implementation, a float-to-string conversion, the classic Dhrystone benchmark, Fibonacci, Mandelbrot set generation.

synthetic - a type of nanobenchmark that measures key aspects of a computer system with an abstract workload. Often nonsense code that just mimics other programs statistically, in instruction mix and code patterns. Also often used for database, file-system, and storage tests with lots of small random operations. Examples: memory bandwidth tests, PostMark, stress-ng, Whetstone, NBench.

microbenchmarks - usually synthetic, and rarely exercise more than one or maybe two subsystems at a time. They can often be used to measure different systems, but are usually designed for comparing multiple implementations or algorithms that solve the same problem. Compiler optimizations can easily undo a lot of assumptions, so care needs to be taken (see the second sketch below). The measured code is often important in other applications, but results can be skewed by assembler optimizations (x86 SSE code paths, for instance, are more common than ARM ones) or by dedicated hardware acceleration (cryptography is a good example). Examples: FFT, N-Queens, Sieve of Eratosthenes, Blowfish, zlib compression, a small ray tracer, the A* algorithm, audio beat detection, a JSON parser.

minibenchmarks - fragments of real-world applications, reworked to be easy to run. More open code, with lots of conditions and jumps, and a high dependence on input data. They usually have no dataset or a very small one, or it is generated at runtime. Usually no I/O, with all input and output kept entirely in memory. Minibenchmarks rarely do real multithreading testing; instead they run multiple copies of the same benchmark with (almost) no shared state beyond maybe the input data (see the third sketch below). They exercise far more of the memory subsystem, branch prediction, speculation, etc. Code and data almost never fit in L1, but on some modern CPUs the code can fit in L2 or L3. Binary code is usually between 100 kB and 500 kB, rarely much bigger. Examples: the gcc compiler's parser, the sqlite3 database, a ray tracer with some complex shaders, structure from motion, HTML DOM manipulation from JavaScript, PDF rendering, video object detection. Geekbench, JavaScript Octane, and the SPEC suites are examples of such benchmarks. Minibenchmarks can be very deceiving, especially since many of them come in suites with dozens of subtests but often report a weighted geometric mean, which can be heavily influenced by a single subtest. The same minibenchmark can also be run with different input sizes, which can move it from compute-bound to memory-bound and test very different things.
macrobenchmarks - usually complex applications, often end-user applications with their own utility, where benchmarking is not a primary or even secondary purpose. The most realistic, with the highest number of variables. Big code, big datasets, I/O, networking, encryption, libraries, graphics, multithreading with complex synchronization, lots of random memory allocations, etc. They may require long runs to remove variance, warmups, or elaborate setup. They easily go out of date due to old APIs or the proprietary nature of some parts. Macrobenchmarks can often also be used as stress-test tools, to assess the limits of the system or the software; such runs often take hours, to ensure stability, measure tail latency (see the last sketch below), check for memory leaks, and so on. Code and data almost never fit even in the L2 or L3 caches, though some small sub-parts can be considered hot. Binary code is between 5 MB and 100 MB, with about half of it cold or dead and the other half actually executed. A lot of the code is complex due to glue, configuration, error handling, business logic, special-case handling, etc. Examples: Cyberpunk 2077, a PostgreSQL 17 stress test, Linux kernel compilation (highly parallel but varied), a GROMACS molecular dynamics simulation, the Blender Cycles renderer (embarrassingly parallel), a LaTeX run on a big, complex document, a complex web app in Python or PHP with MySQL, behind Apache and an nginx reverse proxy.
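
First sketch, for the picobenchmark entry: a minimal C program, assuming a 64-bit Linux/glibc toolchain, that compares one serial dependency chain of adds against four independent chains the pipeline can overlap. The empty __asm__ barriers (GCC/Clang syntax) emit no instructions; they only stop the compiler from folding the loops into a closed form.

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        const uint64_t N = 1000000000ull;

        /* Latency-bound: each add must wait for the previous one. */
        uint64_t a = 1;
        double t0 = now_sec();
        for (uint64_t i = 0; i < N; i++) {
            a += i;
            __asm__ volatile("" : "+r"(a));   /* keep the chain live */
        }
        double t1 = now_sec();

        /* Throughput-bound: four independent chains, same total adds. */
        uint64_t b0 = 1, b1 = 1, b2 = 1, b3 = 1;
        double t2 = now_sec();
        for (uint64_t i = 0; i < N; i += 4) {
            b0 += i; b1 += i; b2 += i; b3 += i;
            __asm__ volatile("" : "+r"(b0), "+r"(b1), "+r"(b2), "+r"(b3));
        }
        double t3 = now_sec();

        printf("dependent:   %.3f ns/add (checksum %llu)\n",
               (t1 - t0) / N * 1e9, (unsigned long long)a);
        printf("independent: %.3f ns/add (checksum %llu)\n",
               (t3 - t2) / N * 1e9, (unsigned long long)(b0 + b1 + b2 + b3));
        return 0;
    }

On a typical out-of-order core the independent version should come out severalfold faster per add, which is exactly the kind of effect this class of benchmark exists to expose.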
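
Second sketch, for the microbenchmark caveat about compiler optimizations: a sieve microbenchmark where a volatile sink forces each repetition's result to actually be produced, so the optimizer cannot delete or hoist the measured work. The repetition count and limit here are arbitrary choices, not anything prescribed above.

    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    #define LIMIT 1000000
    static unsigned char composite[LIMIT + 1];

    /* Sieve of Eratosthenes; returns the prime count so the result is used. */
    static int sieve(void) {
        int count = 0;
        memset(composite, 0, sizeof composite);
        for (long p = 2; p * p <= LIMIT; p++)
            if (!composite[p])
                for (long m = p * p; m <= LIMIT; m += p)
                    composite[m] = 1;
        for (long n = 2; n <= LIMIT; n++)
            count += !composite[n];
        return count;
    }

    int main(void) {
        const int reps = 100;
        volatile int sink = 0;      /* volatile: every store must happen */
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int r = 0; r < reps; r++)
            sink = sieve();         /* work cannot be elided or deduplicated */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("%d primes below %d, %.2f ms/rep\n", sink, LIMIT, ns / reps / 1e6);
        return 0;
    }

Without the volatile sink, an aggressive compiler is free to notice that nothing observes the sieve's output and to delete the loop entirely, producing an impressively meaningless near-zero time.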
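
Third sketch, for the minibenchmark pattern of running independent copies rather than truly shared-state multithreading: a minimal POSIX-threads harness (threads are an assumption here; many suites use separate processes instead) where each copy owns purely private state and the work loop is just a stand-in.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Build with: cc -O2 -pthread copies.c */

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    /* One copy of the "benchmark": purely private state, nothing shared. */
    static void *copy_main(void *arg) {
        unsigned seed = (unsigned)(size_t)arg;   /* private PRNG state */
        unsigned long acc = 0;
        for (long i = 0; i < 100000000L; i++) {  /* stand-in for real work */
            seed = seed * 1103515245u + 12345u;
            acc += seed >> 16;
        }
        return (void *)(size_t)(acc & 0xffff);   /* result escapes the thread */
    }

    int main(int argc, char **argv) {
        int n = argc > 1 ? atoi(argv[1]) : 4;    /* number of copies */
        pthread_t tid[64];
        if (n < 1 || n > 64) return 1;
        double t0 = now_sec();
        for (int i = 0; i < n; i++)
            pthread_create(&tid[i], NULL, copy_main, (void *)(size_t)(i + 1));
        for (int i = 0; i < n; i++)
            pthread_join(tid[i], NULL);
        double t1 = now_sec();
        printf("%d copies in %.2f s (%.2f copies/s)\n", n, t1 - t0, n / (t1 - t0));
        return 0;
    }

If copies/s stops scaling as n grows, the bottleneck is a shared resource (memory bandwidth, SMT, thermals), not synchronization, because there is no synchronization to contend on.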
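
Last sketch, for the macrobenchmark point about tail latency: a minimal percentile report. do_one_request() is a hypothetical placeholder for a single operation against the system under test, stubbed with busywork so the sketch runs standalone.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Hypothetical stand-in for one request against the system under test. */
    static void do_one_request(void) {
        volatile int x = 0;
        for (int i = 0; i < 1000; i++) x += i;
    }

    static int cmp_double(const void *a, const void *b) {
        double x = *(const double *)a, y = *(const double *)b;
        return (x > y) - (x < y);
    }

    int main(void) {
        enum { SAMPLES = 100000 };
        static double lat_ns[SAMPLES];
        struct timespec t0, t1;

        for (int i = 0; i < SAMPLES; i++) {
            clock_gettime(CLOCK_MONOTONIC, &t0);
            do_one_request();
            clock_gettime(CLOCK_MONOTONIC, &t1);
            lat_ns[i] = (t1.tv_sec - t0.tv_sec) * 1e9
                      + (t1.tv_nsec - t0.tv_nsec);
        }
        qsort(lat_ns, SAMPLES, sizeof lat_ns[0], cmp_double);
        printf("p50   %.0f ns\n", lat_ns[SAMPLES / 2]);
        printf("p99   %.0f ns\n", lat_ns[(int)(SAMPLES * 0.99)]);
        printf("p99.9 %.0f ns\n", lat_ns[(int)(SAMPLES * 0.999)]);
        return 0;
    }

A mean hides the outliers entirely; p99 and p99.9 are where the hours-long macrobenchmark runs mentioned above earn their keep, since rare slow requests only show up with enough samples.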