
Jetpack Microbenchmark: Code Performance Testing

I’ll explain how the Microbenchmark library works and show examples of how to use it. Hopefully it will help you evaluate performance and resolve disputes during code review.
 

 

If you need to check the execution time of the code, then the first thing that comes to mind looks something like this:

val startTime = System.currentTimeMillis()
// Execute the code we want to evaluate
val totalTime = System.currentTimeMillis() - startTime

This approach has several drawbacks:

  • It does not take into account the “warm-up” of the code under test.
  • It does not take into account the state of the device, for example, Thermal Throttling.
  • It produces only one result and gives no idea of the variance in execution time.
  • It makes it harder to isolate the code under test from everything else going on.

Therefore, estimating execution time is not as trivial as it may seem at first glance. There are existing solutions, for example, Firebase Performance Monitoring, but it is better suited to monitoring performance in production and is not ideal for isolated pieces of code.

Google’s Microbenchmark library handles this task much better.

 

What is Microbenchmark

Microbenchmark is a Jetpack library that lets you quickly measure the execution time of Kotlin and Java code. It can, to some extent, shield the final result from the influence of warm-up, throttling, and other factors, and it can generate reports to the console or to a JSON file. The tool can also be used in CI, allowing you to catch performance problems at an early stage.

This library provides the best results when profiling code that is used repeatedly. Good examples would be RecyclerView scrolling, data conversion, and so on.
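
For example, a minimal benchmark of a simple data conversion might look roughly like this (a sketch: the class name and the conversion are made up for illustration, and imports are omitted here, as in the examples below):

@RunWith(AndroidJUnit4::class)
class StringLengthBenchmark {
    @get:Rule val benchmarkRule = BenchmarkRule()

    @Test
    fun mapToLengths() {
        val input = List(10_000) { "item-$it" }
        benchmarkRule.measureRepeated {
            // This lambda is run many times; the library collects timing statistics for it
            input.map { it.length }
        }
    }
}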

It is also advisable to exclude the influence of caching, if there is any; this can be done by generating unique data before each run (see the sketch below). In addition, performance tests require specific build settings (for example, debuggable disabled), so the right approach is to put them in a separate module.
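
A fresh piece of input can be generated on every iteration with runWithTimingDisabled, which excludes that work from the measured time. A minimal sketch (the sorting workload is made up for illustration):

@Test
fun sortFreshInput() {
    benchmarkRule.measureRepeated {
        // Data generation is excluded from the measured time, so every run gets unique input
        val input = runWithTimingDisabled { List(1_000) { Random.nextInt() } }
        input.sorted()
    }
}

As for the separate module, its Gradle configuration might look roughly like this (a sketch in the Kotlin DSL; the plugin and dependency versions are assumptions, check the current Jetpack documentation):

// build.gradle.kts of the benchmark module
plugins {
    id("com.android.library")
    id("androidx.benchmark")
    id("org.jetbrains.kotlin.android")
}

android {
    defaultConfig {
        // The runner that launches IsolationActivity and performs the benchmark setup
        testInstrumentationRunner = "androidx.benchmark.junit4.AndroidBenchmarkRunner"
    }
    // Benchmarks must run against a non-debuggable build
    testBuildType = "release"
}

dependencies {
    androidTestImplementation("androidx.benchmark:benchmark-junit4:1.1.1")
}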

 

How Microbenchmark Works

Let’s see how the library works.

All benchmarks run inside IsolationActivity (the AndroidBenchmarkRunner class is responsible for launching it), where the initial configuration takes place.

It consists of the following steps:

  • A check that no other Activity is running the test. If a duplicate exists, the test fails with the error: Only one IsolationActivity should exist.
  • A check for Sustained Performance Mode support. In this mode the device can maintain a constant performance level, which improves the consistency of the results.
  • In parallel with the test, a BenchSpinThread thread is started with THREAD_PRIORITY_LOWEST priority so that at least one core is constantly loaded. This only works in combination with Sustained Mode.

In general terms, the job of a benchmark is to run the code from a test a number of times and measure the average time it takes. But there are certain subtleties. For example, the first runs take several times longer, because the code under test may have a dependency that spends a lot of time initializing. In some ways this is similar to a car engine, which needs time to warm up.

Before the measured runs, you need to make sure that everything is working normally and that the warm-up is complete. In the library code (WarmupManager.onNextIteration), warm-up is considered finished when successive runs of the test produce results within a certain error margin:

fun onNextIteration(durationNs: Long): Boolean {
    iteration++
    totalDuration += durationNs
            
    if (iteration == 1) {
        fastMovingAvg = durationNs.toFloat()
        slowMovingAvg = durationNs.toFloat()
        return false
    }
            
    fastMovingAvg = FAST_RATIO * durationNs + (1 - FAST_RATIO) * fastMovingAvg
    slowMovingAvg = SLOW_RATIO * durationNs + (1 - SLOW_RATIO) * slowMovingAvg
            
    // If fast moving avg is close to slow, the benchmark is stabilizing
    val ratio = fastMovingAvg / slowMovingAvg
            
    if (ratio < 1 + THRESHOLD && ratio > 1 - THRESHOLD) {
        similarIterationCount++
    } else {
        similarIterationCount = 0
    }
            
    if (iteration >= MIN_ITERATIONS && totalDuration >= MIN_DURATION_NS) {
        if (similarIterationCount > MIN_SIMILAR_ITERATIONS ||
            totalDuration >= MAX_DURATION_NS) {
            // benchmark has stabilized, or we're out of time
            return true
        }
    }
    
    return false
}

In addition to code warm-up, the library also implements Thermal Throttling detection. You should not let this state affect your tests, because throttling increases the average execution time.

Detecting overheating is much simpler than the warm-up logic in WarmupManager. The isDeviceThermalThrottled method compares the execution time of a small piece of work (a simple matrix translation) against a baseline measured on the first call; before each measurement a small ByteArray is copied to reset the cache state.

private fun measureWorkNs(): Long {
    // Access a non-trivial amount of data to try and 'reset' any cache state.
    // Have observed this to give more consistent performance when clocks are unlocked.
    copySomeData()
    val state = BenchmarkState()
    state.performThrottleChecks = false
    val input = FloatArray(16) { System.nanoTime().toFloat() }

    val output = FloatArray(16)
            
    while (state.keepRunningInline()) {
        // Benchmark a simple thermal
        Matrix.translateM(output, 0, input, 0, 1F, 2F, 3F)
    }
            
    return state.stats.min
}
        
/**
 * Called to calculate throttling baseline, will be ignored after first call.
 */
fun computeThrottleBaseline() {
    if (initNs == 0L) {
        initNs = measureWorkNs()
    }
}
        
/**
 * Makes a guess as to whether the device is currently thermal throttled based on performance
 * of single-threaded CPU work.
 */
fun isDeviceThermalThrottled(): Boolean {
    if (initNs == 0L) {
        // not initialized, so assume not throttled.
        return false
    }
            
    val workNs = measureWorkNs()
    return workNs > initNs * 1.10
}

This data is used during the actual measurement runs: it lets the library discard warm-up runs and runs affected by throttling (if any). By default, 50 measured runs are performed; this number and other constants can be changed if needed, for example via reflection, as in the snippet below. But be careful, as this can greatly affect how the library operates.

@Before
fun init() {
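    // REPEAT_COUNT is a static field of BenchmarkState, so the instance passed to set() is ignored.
    // Note that this touches library internals and may stop working in other versions.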
    val field = androidx.benchmark.BenchmarkState::class.java.getDeclaredField("REPEAT_COUNT")
    field.isAccessible = true
    field.set(benchmarkRule, GLOBAL_REPEAT_COUNT)
}

Let’s now work with the library as ordinary users and compare the JSON read and write speed of Gson and Kotlin Serialization.

@RunWith(AndroidJUnit4::class)
class KotlinSerializationBenchmark {
    private val context = ApplicationProvider.getApplicationContext<Context>()
    private val simpleJsonString = Utils.readJsonAsStringFromDisk(context, R.raw.simple)
        
    @get:Rule val benchmarkRule = BenchmarkRule()
        
    @Before
    fun init() {
        val field = androidx.benchmark.BenchmarkState::class.java.getDeclaredField("REPEAT_COUNT")
        field.isAccessible = true
        field.set(benchmarkRule, Utils.GLOBAL_REPEAT_COUNT)
    }
        
    @Test
    fun testRead() {
        benchmarkRule.measureRepeated {
            Json.decodeFromString<List<SmallObject>>(simpleJsonString ?: "")
        }
    }
        
    @Test
    fun testWrite() {
        val testObjects = Json.decodeFromString<List<SmallObject>>(simpleJsonString ?: "")
        benchmarkRule.measureRepeated {
            Json.encodeToString(testObjects)
        }
     }
}
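
For comparison, the Gson counterpart might look roughly like this (a sketch that reuses the same Utils, SmallObject, and R.raw.simple helpers from the example above; the REPEAT_COUNT override is omitted for brevity):

@RunWith(AndroidJUnit4::class)
class GsonBenchmark {
    private val context = ApplicationProvider.getApplicationContext<Context>()
    private val simpleJsonString = Utils.readJsonAsStringFromDisk(context, R.raw.simple)
    private val gson = Gson()
    private val listType = object : TypeToken<List<SmallObject>>() {}.type

    @get:Rule val benchmarkRule = BenchmarkRule()

    @Test
    fun testRead() {
        benchmarkRule.measureRepeated {
            gson.fromJson<List<SmallObject>>(simpleJsonString ?: "", listType)
        }
    }

    @Test
    fun testWrite() {
        val testObjects = gson.fromJson<List<SmallObject>>(simpleJsonString ?: "", listType)
        benchmarkRule.measureRepeated {
            gson.toJson(testObjects)
        }
    }
}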

 

To evaluate the test results, you can use the console in Android Studio or generate a report as a JSON file. The level of detail differs considerably: in the console you can only find out the average execution time, while the JSON file contains a full report with the time of each run (useful for plotting graphs) and other information.

Report output is configured in the Edit Run Configuration window, in the instrumentation extra params section. The parameter responsible for saving reports is called androidx.benchmark.output.enable. Here you can also configure passing values in from Gradle, which is useful when running on CI.
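
The same flag can be supplied from the build script instead of the run configuration, as an instrumentation runner argument (a sketch in the Gradle Kotlin DSL; the benchmarkOutput property in the commented-out lines is a hypothetical name for a value passed on the CI command line):

android {
    defaultConfig {
        // Enable saving benchmark reports to a JSON file
        testInstrumentationRunnerArguments["androidx.benchmark.output.enable"] = "true"

        // Hypothetical CI hookup, e.g. ./gradlew connectedCheck -PbenchmarkOutput=true
        // testInstrumentationRunnerArguments["androidx.benchmark.output.enable"] =
        //     (project.findProperty("benchmarkOutput") ?: "false").toString()
    }
}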

 

Settings for Running Performance Tests with Reports Enabled 
 

From now on, when you run tests, reports will be saved to the application directory, and the file name will correspond to the class name. You can see an example of the report structure here.

 

Conclusion

At Funcorp, this tool was used to find the best solution among JSON parsers, and in the end Kotlin Serialization won. At the same time, we really missed CPU and memory profiling during testing; we had to collect that data separately.

It may seem that the tool’s functionality is insufficient, its capabilities limited, and its scope of application very narrow. In general, that is true, but in some cases it can be very useful. Here are a few of them:

  • Evaluating the performance of the new library in your project.
  • Resolving disputes during code review, when you need to justify choosing one solution over another.
  • Collecting statistics and evaluating code quality over a long period of time when integrating with CI.

Microbenchmark also has a bigger sibling called Macrobenchmark, which is designed to evaluate UI operations such as app startup, scrolling, and animations. However, that’s a topic for a separate article.