Designing for Performance

In this document

Introduction
Optimize Judiciously
Avoid Creating Unnecessary Objects
Performance Myths
Prefer Static Over Virtual
Avoid Internal Getters/Setters
Use Static Final For Constants
Use Enhanced For Loop Syntax
Consider Package Instead of Private Access with Inner Classes
Use Floating-Point Judiciously
Know And Use The Libraries
Use Native Methods Judiciously
Closing Notes

An Android application will run on a mobile device with limited computing power and storage, and constrained battery life. Because of this, it should be efficient. Battery life is one reason you might want to optimize your app even if it already seems to run "fast enough". Battery life is important to users, and Android's battery usage breakdown means users will know if your app is responsible for draining their battery.

Note that although this document primarily covers micro-optimizations, these will almost never make or break your software. Choosing the right algorithms and data structures should always be your priority, but is outside the scope of this document.

Introduction

There are two basic rules for writing efficient code:

Don't do work that you don't need to do.
Don't allocate memory if you can avoid it.

Optimize Judiciously

This document is about Android-specific micro-optimization, so it assumes that you've already used profiling to work out exactly what code needs to be optimized, and that you already have a way to measure the effect (good or bad) of any changes you make. You only have so much engineering time to invest, so it's important to know you're spending it wisely.

(See Closing Notes for more on profiling and writing effective benchmarks.)

This document also assumes that you made the best decisions about data structures and algorithms, and that you've also considered the future performance consequences of your API decisions. Using the right data structures and algorithms will make more difference than any of the advice here, and considering the performance consequences of your API decisions will make it easier to switch to better implementations later (this is more important for library code than for application code).

(If you need that kind of advice, see Josh Bloch's Effective Java, item 47.)

One of the trickiest problems you'll face when micro-optimizing an Android app is that your app is pretty much guaranteed to be running on multiple hardware platforms: different versions of the VM, running on different processors, at different speeds. It's not even generally the case that you can simply say "device X is a factor F faster/slower than device Y" and scale your results from one device to others. In particular, measurement on the emulator tells you very little about performance on any device. There are also huge differences between devices with and without a JIT: the "best" code for a device with a JIT is not always the best code for a device without.

If you want to know how your app performs on a given device, you need to test on that device.

Avoid Creating Unnecessary Objects

Object creation is never free. A generational GC with per-thread allocation pools for temporary objects can make allocation cheaper, but allocating memory is always more expensive than not allocating memory.

If you allocate objects in a user interface loop, you will force periodic garbage collections, creating little "hiccups" in the user experience. The concurrent collector introduced in Gingerbread helps, but unnecessary work should always be avoided.

Thus, you should avoid creating object instances you don't need to. Some examples of things that can help:

If you have a method returning a string, and you know that its result will always be appended to a StringBuffer anyway, change your signature and implementation so that the function does the append directly, instead of creating a short-lived temporary object.
When extracting strings from a set of input data, try to return a substring of the original data, instead of creating a copy. You will create a new String object, but it will share the char[] with the data. (The trade-off being that if you're only using a small part of the original input, you'll be keeping it all around in memory anyway if you go this route.)
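A minimal sketch of the first suggestion, assuming a hypothetical `PointFormatter` helper whose result always ends up in a buffer. It uses `StringBuilder`, the unsynchronized counterpart of `StringBuffer`; the same signature change applies to either.

```java
// Hypothetical helper, used only to illustrate the signature change.
final class PointFormatter {
    private PointFormatter() {}

    // Before: builds and returns a temporary String that the caller
    // immediately appends to its own buffer and then discards.
    static String formatPoint(int x, int y) {
        return "(" + x + ", " + y + ")";
    }

    // After: appends straight into the caller's buffer, so no intermediate
    // String is created for the result.
    static void appendPoint(StringBuilder out, int x, int y) {
        out.append('(').append(x).append(", ").append(y).append(')');
    }
}
```

A caller that previously wrote `sb.append(formatPoint(3, 4))` would instead write `PointFormatter.appendPoint(sb, 3, 4)`.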

A somewhat more radical idea is to slice up multidimensional arrays into parallel one-dimensional arrays:

An array of ints is much better than an array of Integers, and this generalizes: two parallel arrays of ints are also a lot more efficient than an array of (int,int) objects. The same goes for any combination of primitive types.
If you need to implement a container that stores tuples of (Foo,Bar) objects, remember that two parallel Foo[] and Bar[] arrays are generally much better than a single array of custom (Foo,Bar) objects. (The exception, of course, is when you're designing an API for other code to access; in those cases, it's usually better to accept a small hit in speed in exchange for good API design. But in your own internal code, you should try to be as efficient as possible.)
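As an illustration of the parallel-array idea, here is a sketch of a hypothetical internal container of (x, y) int pairs. Instead of an array of small `Point` objects, it keeps two `int[]` arrays indexed together, so adding a pair allocates nothing except when the arrays grow.

```java
import java.util.Arrays;

// Hypothetical container of (x, y) pairs stored as two parallel int[] arrays
// rather than as an array of small Point objects.
final class PointList {
    private int[] xs = new int[8];
    private int[] ys = new int[8];
    private int size;

    void add(int x, int y) {
        if (size == xs.length) {
            // Grow both arrays together so their indices stay aligned.
            int newCapacity = xs.length * 2;
            xs = Arrays.copyOf(xs, newCapacity);
            ys = Arrays.copyOf(ys, newCapacity);
        }
        xs[size] = x;
        ys[size] = y;
        size++;
    }

    int size() { return size; }

    int x(int index) { checkIndex(index); return xs[index]; }

    int y(int index) { checkIndex(index); return ys[index]; }

    // Walks the primitive arrays directly; no per-element object is touched.
    long sumOfProducts() {
        long total = 0;
        for (int i = 0; i < size; i++) {
            total += (long) xs[i] * ys[i];
        }
        return total;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
    }
}
```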

Generally speaking, avoid creating short-term temporary objects if you can. Fewer objects created mean less-frequent garbage collection, which has a direct impact on user experience.

Performance Myths

Previous versions of this document made various misleading claims. We address some of them here.

On devices without a JIT, it is true that invoking methods via a variable with an exact type rather than an interface is slightly more efficient. (So, for example, it was cheaper to invoke methods on a HashMap map than a Map map, even though in both cases the map was a HashMap.) It was not the case that this was 2x slower; the actual difference was more like 6% slower. Furthermore, the JIT makes the two effectively indistinguishable.

On devices without a JIT, caching field accesses is about 20% faster than repeatedly accessing the field. With a JIT, field access costs about the same as local access, so this isn't a worthwhile optimization unless you feel it makes your code easier to read. (This is true of final, static, and static final fields too.)
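To make "caching field accesses" concrete, here is a sketch using a hypothetical `Histogram` class. The two methods compute the same result; the second copies the fields into locals once before the loop. Per the text above, that only pays off on devices without a JIT, so choose whichever form reads better.

```java
// Hypothetical class comparing repeated field reads with cached locals.
final class Histogram {
    private final int[] mBuckets;
    private int mScale = 3;

    Histogram(int bucketCount) {
        mBuckets = new int[bucketCount];
    }

    void increment(int bucket) {
        mBuckets[bucket]++;
    }

    // Reads mBuckets and mScale from the object on every iteration.
    long weightedTotalFieldAccess() {
        long total = 0;
        for (int i = 0; i < mBuckets.length; i++) {
            total += (long) mBuckets[i] * mScale;
        }
        return total;
    }

    // Copies both fields into locals once, then loops over the locals.
    long weightedTotalCachedLocals() {
        final int[] buckets = mBuckets;
        final int scale = mScale;
        long total = 0;
        for (int i = 0; i < buckets.length; i++) {
            total += (long) buckets[i] * scale;
        }
        return total;
    }
}
```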

Prefer Static Over Virtual

If you don't need to access an object's fields, make your method static. Invocations will be about 15%-20% faster. It's also good practice, because you can tell from the method signature that calling the method can't alter the object's state.
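A small sketch of the advice, with a hypothetical `Temperature` class: the conversion formula needs no instance fields, so it is declared `static`, and the instance method simply delegates to it.

```java
// Hypothetical example: a helper that needs no instance state is static.
final class Temperature {
    private final double mCelsius;

    Temperature(double celsius) {
        mCelsius = celsius;
    }

    // Instance method: it reads this object's field, so it stays virtual.
    double fahrenheit() {
        return celsiusToFahrenheit(mCelsius);
    }

    // Static helper: the signature alone shows it cannot read or modify
    // the fields of any Temperature instance.
    static double celsiusToFahrenheit(double celsius) {
        return celsius * 9.0 / 5.0 + 32.0;
    }
}
```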

Avoid Internal Getters/Setters

In native languages like C++ it's common practice to use getters (e.g. i = getCount()) instead of accessing the field directly (i = mCount). This is an excellent habit for C++, because the compiler can usually inline the access, and if you need to restrict or debug field access you can add the code at any time.

On Android, this is a bad idea. Virtual method calls are expensive, much more so than instance field lookups. It's reasonable to follow common object-oriented programming practices and have getters and setters in the public interface, but within a class you should always access fields directly.

Without a JIT, direct field access is about 3x faster than invoking a trivial getter. With the JIT (where direct field access is as cheap as accessing a local), direct field access is about 7x faster than invoking a trivial getter. This is true in Froyo, but will improve in the future when the JIT inlines getter methods.
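The pattern described above, sketched with a hypothetical `Inventory` class: outside callers use the public getter, while code inside the class reads and writes `mCount` directly instead of calling `getCount()` or a setter.

```java
// Hypothetical class: a public getter for other code, direct field
// access for code inside the class.
final class Inventory {
    private int mCount;

    // Part of the public interface, so it keeps the conventional accessor.
    public int getCount() {
        return mCount;
    }

    public void add(int n) {
        mCount += n; // direct access, not setCount(getCount() + n)
    }

    public boolean isLow(int threshold) {
        return mCount < threshold; // direct access, not getCount()
    }
}
```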

Use Static Final For Constants

Consider the following declaration at the top of a class:

static int intVal = 42;
static String strVal = "Hello, world!";

The compiler generates a class initializer method, called <clinit>, that is executed when the class is first used. The method stores the value 42 into intVal, and extracts a reference from the classfile string constant table for strVal. When these values are referenced later on, they are accessed with field lookups.

We can improve matters with the "final" keyword:

static final int intVal = 42;
static final String strVal = "Hello, world!";

The class no longer requires a <clinit> method, because the constants go into static field initializers in the dex file. Code that refers to intVal will use the integer value 42 directly, and accesses to strVal will use a relatively inexpensive "string constant" instruction instead of a field lookup. (Note that this optimization only applies to primitive types and String constants, not arbitrary reference types. Still, it's good practice to declare constants static final whenever possible.)
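One way to see that a `static final` primitive initialized with a constant expression is treated as a compile-time constant: the Java compiler accepts it as a `switch` case label, which a plain `static` field would not be. The class and names below are hypothetical; the sketch only illustrates the Java-language side, not the dex output.

```java
// Hypothetical class. MODE_FAST and GREETING are static final and
// initialized with constant expressions, so they are compile-time
// constants whose values the compiler substitutes at each use site.
final class Constants {
    static final int MODE_FAST = 42;
    static final String GREETING = "Hello, world!";

    // Not final: every read is a field lookup, and it could not be used
    // as a case label below.
    static int sMutableMode = 42;

    static String describe(int mode) {
        switch (mode) {
            case MODE_FAST: // legal only because MODE_FAST is a constant
                return GREETING;
            default:
                return "unknown";
        }
    }
}
```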

Use Enhanced For Loop Syntax

The enhanced for loop (also sometimes known as "for-each" loop) can be used for collections that implement the Iterable interface and for arrays. With collections, an iterator is allocated to make interface calls to hasNext() and next(). With an ArrayList, a hand-written counted loop is about 3x faster (with or without JIT), but for other collections the enhanced for loop syntax will be exactly equivalent to explicit iterator usage.

There are several alternatives for iterating through an array:

  static class Foo {
    int mSplat;
  }

  Foo[] mArray = ...

  public void zero() {
    int sum = 0;
    for (int i = 0; i < mArray.length; ++i) {
      sum += mArray[i].mSplat;
    }
  }

  public void one() {
    int sum = 0;
    Foo[] localArray = mArray;
    int len = localArray.length;
    for (int i = 0; i < len; ++i) {
      sum += localArray[i].mSplat;
    }
  }

  public void two() {
    int sum = 0;
    for (Foo a : mArray) {
      sum += a.mSplat;
    }
  }

zero() is slowest, because the JIT can't yet optimize away the cost of getting the array length once for every iteration through the loop.

one() is faster. It pulls everything out into local variables, avoiding the lookups. Only the array length offers a performance benefit.

two() is fastest for devices without a JIT, and indistinguishable from one() for devices with a JIT. It uses the enhanced for loop syntax introduced in version 1.5 of the Java programming language.

To summarize: use the enhanced for loop by default, but consider a hand-written counted loop for performance-critical ArrayList iteration.
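The summary above, sketched with a hypothetical `ListSums` helper: the enhanced for loop as the default (which, per the text, allocates an iterator for a collection), and a hand-written counted loop reserved for a performance-critical `ArrayList`. The counted loop takes an `ArrayList` specifically because indexed `get(i)` is only cheap on random-access lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper contrasting the two loop styles.
final class ListSums {
    private ListSums() {}

    // Default choice: enhanced for loop over any List.
    static int sumEnhanced(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Performance-critical ArrayList case: counted loop with the size
    // hoisted into a local, so no iterator is involved.
    static int sumCounted(ArrayList<Integer> values) {
        int sum = 0;
        for (int i = 0, n = values.size(); i < n; i++) {
            sum += values.get(i);
        }
        return sum;
    }
}
```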

(See also Effective Java item 46.)

Consider Package Instead of Private Access with Private Inner Classes

Consider the following class definition:

public class Foo {
  private class Inner {
    void stuff() {
      Foo.this.doStuff(Foo.this.mValue);
    }
  }

  private int mValue;

  public void run() {
    Inner in = new Inner();
    mValue = 27;
    in.stuff();
  }

  private void doStuff(int value) {
    System.out.println("Value is " + value);
  }
}

The key things to note here are that we define a private inner class (Foo$Inner) that directly accesses a private method and a private instance field in the outer class. This is legal, and the code prints "Value is 27" as expected.

The problem is that the VM considers direct access to Foo's private members from Foo$Inner to be illegal because Foo and Foo$Inner are different classes, even though the Java language allows an inner class to access an outer class' private members. To bridge the gap, the compiler generates a couple of synthetic methods:

/*package*/ static int Foo.access$100(Foo foo) {
  return foo.mValue;
}
/*package*/ static void Foo.access$200(Foo foo, int value) {
  foo.doStuff(value);
}

The inner class code calls these static methods whenever it needs to access the mValue field or invoke the doStuff method in the outer class. What this means is that the code above really boils down to a case where you're accessing member fields through accessor methods. Earlier we talked about how accessors are slower than direct field accesses, so this is an example of a certain language idiom resulting in an "invisible" performance hit.

If you're using code like this in a performance hotspot, you can avoid the overhead by declaring fields and methods accessed by inner classes to have package access, rather than private access. Unfortunately this means the fields can be accessed directly by other classes in the same package, so you shouldn't use this in public API.
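Applying that advice to the Foo example above gives something like the following sketch. `mValue` and `doStuff` drop `private` so Inner can reach them without synthetic accessors. Inner itself is also given package access here, on the assumption that a private inner class's implicit constructor would otherwise be private too and need a synthetic path of its own.

```java
// The earlier Foo example with package access instead of private access
// for the members that the inner class uses.
class Foo {
    /*package*/ class Inner { // package access, so its implicit constructor is too
        void stuff() {
            Foo.this.doStuff(Foo.this.mValue);
        }
    }

    /*package*/ int mValue; // was private

    public void run() {
        Inner in = new Inner();
        mValue = 27;
        in.stuff(); // prints "Value is 27"
    }

    /*package*/ void doStuff(int value) { // was private
        System.out.println("Value is " + value);
    }
}
```

The trade-off is the one described in the text: any class in the same package can now read `mValue` directly.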

Use Floating-Point Judiciously

As a rule of thumb, floating-point is about 2x slower than integer on Android devices. This is true on an FPU-less, JIT-less G1 and on a Nexus One with an FPU and the JIT. (Of course, the absolute speed difference between those two devices is about 10x for arithmetic operations.)

In speed terms, there's no difference between float and double on the more modern hardware. Space-wise, double is 2x larger. As with desktop machines, assuming space isn't an issue, you should prefer double to float.

Also, even for integers, some chips have hardware multiply but lack hardware divide. In such cases, integer division and modulus operations are performed in software, which is something to think about if you're designing a hash table or doing lots of math.
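For the hash-table case, one common technique (not one this document prescribes) is to keep the table size a power of two so the bucket index can be computed with a bitwise AND instead of a modulus. The sketch below uses a hypothetical `PowerOfTwoTable` class and keeps a `Math.floorMod` version alongside for comparison; for a power-of-two size the two agree for every int hash, including negative ones.

```java
// Hypothetical table-indexing helper: with a power-of-two capacity,
// hash & (capacity - 1) gives the same bucket as a non-negative modulus.
final class PowerOfTwoTable {
    private final int mMask;

    PowerOfTwoTable(int capacity) {
        // Validate once at construction, not on every lookup.
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two: " + capacity);
        }
        mMask = capacity - 1;
    }

    // Bucket index via AND mask: no division or modulus at lookup time.
    int indexFor(int hash) {
        return hash & mMask;
    }

    // Reference version using a (possibly software) modulus, for comparison.
    int indexForModulus(int hash) {
        return Math.floorMod(hash, mMask + 1);
    }
}
```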

Know And Use The Libraries

In addition to all the usual reasons to prefer library code over rolling your own, bear in mind that the system is at liberty to replace calls to library methods with hand-coded assembler, which may be better than the best code the JIT can produce for the equivalent Java. The typical example here is String.indexOf and friends, which Dalvik replaces with an inlined intrinsic. Similarly, the System.arraycopy method is about 9x faster than a hand-coded loop on a Nexus One with the JIT.
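The System.arraycopy comparison above, as a sketch with a hypothetical `CopyDemo` class: both methods produce the same copy, but the second hands the work to the library call `System.arraycopy(src, srcPos, dest, destPos, length)`, which the system is free to implement with optimized native code.

```java
// Hypothetical helpers comparing a hand-coded copy loop with the library call.
final class CopyDemo {
    private CopyDemo() {}

    // Hand-coded element-by-element copy.
    static int[] copyWithLoop(int[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    // Same result via System.arraycopy.
    static int[] copyWithArraycopy(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }
}
```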

(See also Effective Java item 47.)

Use Native Methods Judiciously

Native code isn't necessarily more efficient than Java. For one thing, there's a cost associated with the Java-native transition, and the JIT can't optimize across these boundaries. If you're allocating native resources (memory on the native heap, file descriptors, or whatever), it can be significantly more difficult to arrange timely collection of these resources. You also need to compile your code for each architecture you wish to run on (rather than relying on the JIT). You may even have to compile multiple versions for what you consider the same architecture: native code compiled for the ARM processor in the G1 can't take full advantage of the ARM in the Nexus One, and code compiled for the ARM in the Nexus One won't run on the ARM in the G1.

Native code is primarily useful when you have an existing native codebase that you want to port to Android, not for "speeding up" parts of a Java app.

If you do need to use native code, you should read our JNI Tips.

(See also Effective Java item 54.)

Closing Notes

One last thing: always measure. Before you start optimizing, make sure you have a problem. Make sure you can accurately measure your existing performance, or you won't be able to measure the benefit of the alternatives you try.

Every claim made in this document is backed up by a benchmark. The source to these benchmarks can be found in the code.google.com "dalvik" project.

The benchmarks are built with the Caliper microbenchmarking framework for Java. Microbenchmarks are hard to get right, so Caliper goes out of its way to do the hard work for you, and even detect some cases where you're not measuring what you think you're measuring (because, say, the VM has managed to optimize all your code away). We highly recommend you use Caliper to run your own microbenchmarks.

You may also find Traceview useful for profiling, but it's important to realize that it currently disables the JIT, which may cause it to misattribute time to code that the JIT may be able to win back. It's especially important after making changes suggested by Traceview data to ensure that the resulting code actually runs faster when run without Traceview.
