Android Dex File Structure
.dex: Dalvik Executable Format
Copyright © 2007 The Android Open Source Project
This document describes the layout and contents of .dex files, which are used to hold a set of class definitions and their associated adjunct data.
Guide To Types
Name | Description |
byte | 8-bit signed int |
ubyte | 8-bit unsigned int |
short | 16-bit signed int, little-endian |
ushort | 16-bit unsigned int, little-endian |
int | 32-bit signed int, little-endian |
uint | 32-bit unsigned int, little-endian |
long | 64-bit signed int, little-endian |
ulong | 64-bit unsigned int, little-endian |
sleb128 | signed LEB128, variable-length (see below) |
uleb128 | unsigned LEB128, variable-length (see below) |
uleb128p1 | unsigned LEB128 plus 1, variable-length (see below) |
LEB128
LEB128 ("Little-Endian Base 128") is a variable-length encoding for arbitrary signed or unsigned integer quantities. The format was borrowed from the DWARF3 specification. In a .dex file, LEB128 is only ever used to encode 32-bit quantities.
Each LEB128 encoded value consists of one to five bytes, which together represent a single 32-bit value. Each byte has its most significant bit set except for the final byte in the sequence, which has its most significant bit clear. The remaining seven bits of each byte are payload, with the least significant seven bits of the quantity in the first byte, the next seven in the second byte and so on. In the case of a signed LEB128 (sleb128), the most significant payload bit of the final byte in the sequence is sign-extended to produce the final value. In the unsigned case (uleb128), any bits not explicitly represented are interpreted as 0.
Bitwise diagram of a two-byte LEB128 value:
Byte | Top bit | Payload bits (most significant first) |
First byte | 1 | bit6 | bit5 | bit4 | bit3 | bit2 | bit1 | bit0 |
Second byte | 0 | bit13 | bit12 | bit11 | bit10 | bit9 | bit8 | bit7 |
The variant uleb128p1 is used to represent a signed value, where the representation is of the value plus one encoded as a uleb128. This makes the encoding of -1 (alternatively thought of as the unsigned value 0xffffffff), but no other negative number, a single byte. It is useful in exactly those cases where the represented number must either be non-negative or -1 (or 0xffffffff), and where no other negative values are allowed (or where large unsigned values are unlikely to be needed).
Here are some examples of the formats:
Encoded Sequence | As sleb128 | As uleb128 | As uleb128p1 |
00 | 0 | 0 | -1 |
01 | 1 | 1 | 0 |
7f | -1 | 127 | 126 |
80 7f | -128 | 16256 | 16255 |
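The decoding rules above can be sketched in a few lines of Python. This is a minimal illustration, not code from the specification: the function names and the `(value, size)` return convention are assumptions made for the example. It reproduces the rows of the examples table.

```python
from typing import Tuple


def decode_uleb128(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one uleb128 value; return (value, number of bytes consumed)."""
    result = 0
    for i in range(5):  # a 32-bit quantity needs at most five bytes
        byte = data[offset + i]
        result |= (byte & 0x7F) << (7 * i)  # the low seven bits are payload
        if byte & 0x80 == 0:  # a clear top bit marks the final byte
            return result, i + 1
    raise ValueError("LEB128 value is longer than five bytes")


def decode_sleb128(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one sleb128 value by sign-extending the top payload bit."""
    value, size = decode_uleb128(data, offset)
    sign_bit = 1 << (7 * size - 1)  # most significant payload bit of the final byte
    if value & sign_bit:
        value -= 1 << (7 * size)
    return value, size


def decode_uleb128p1(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one uleb128p1 value: the stored uleb128 is the value plus one."""
    value, size = decode_uleb128(data, offset)
    return value - 1, size
```

For instance, `decode_sleb128(bytes.fromhex("807f"))` yields `(-128, 2)`, matching the last row of the table.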
Overall File Layout
Name | Format | Description |
header | header_item | the header |
string_ids | string_id_item[] | string identifiers list. These are identifiers for all the strings used by this file, either for internal naming (e.g., type descriptors) or as constant objects referred to by code. This list must be sorted by string contents, using UTF-16 code point values (not in a locale-sensitive manner). |
type_ids | type_id_item[] | type identifiers list. These are identifiers for all types (classes, arrays, or primitive types) referred to by this file, whether defined in the file or not. This list must be sorted by string_id index. |
proto_ids | proto_id_item[] | method prototype identifiers list. These are identifiers for all prototypes referred to by this file. This list must be sorted in return-type (by type_id index) major order, and then by arguments (also by type_id index). |
field_ids | field_id_item[] | field identifiers list. These are identifiers for all fields referred to by this file, whether defined in the file or not. This list must be sorted, where the defining type (by type_id index) is the major order, field name (by string_id index) is the intermediate order, and type (by type_id index) is the minor order. |
method_ids | method_id_item[] | method identifiers list. These are identifiers for all methods referred to by this file, whether defined in the file or not. This list must be sorted, where the defining type (by type_id index) is the major order, method name (by string_id index) is the intermediate order, and method prototype (by proto_id index) is the minor order. |
class_defs | class_def_item[] | class definitions list. The classes must be ordered such that a given class's superclass and implemented interfaces appear in the list earlier than the referring class. |
data | ubyte[] | data area, containing all the support data for the tables listed above. Different items have different alignment requirements, and padding bytes are inserted before each item if necessary to achieve proper alignment. |
link_data | ubyte[] | data used in statically linked files. The format of the data in this section is left unspecified by this document; this section is empty in unlinked files, and runtime implementations may use it as they see fit. |
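The major/intermediate/minor ordering required for field_ids amounts to a lexicographic comparison of three indices. The sketch below illustrates that idea only. The `FieldId` type and its member names (`class_idx`, `name_idx`, `type_idx`) are hypothetical names chosen for this example, not definitions taken from the table above.

```python
from typing import List, NamedTuple


class FieldId(NamedTuple):
    """Hypothetical in-memory view of one field_ids entry (indices only)."""
    class_idx: int  # type_id index of the defining type (major order)
    name_idx: int   # string_id index of the field name (intermediate order)
    type_idx: int   # type_id index of the field type (minor order)


def field_ids_sorted(entries: List[FieldId]) -> bool:
    """Return True if the entries follow the ordering described for field_ids."""
    # Named tuples compare field by field, which is exactly the
    # major / intermediate / minor comparison.
    return all(a <= b for a, b in zip(entries, entries[1:]))
```

The same pattern would apply to method_ids, with the proto_id index as the minor key.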
Bitfield, String, and Constant Definitions
DEX_FILE_MAGIC
embedded in header_item
The constant array/string DEX_FILE_MAGIC is the list of bytes that must appear at the beginning of a .dex file in order for it to be recognized as such. The value intentionally contains a newline ("\n" or 0x0a) and a null byte ("\0" or 0x00) in order to help in the detection of certain forms of corruption. The value also encodes a format version number as three decimal digits, which is expected to increase monotonically over time as the format evolves.
ubyte[8] DEX_FILE_MAGIC = { 0x64 0x65 0x78 0x0a 0x30 0x33 0x35 0x00 }
= "dex\n035\0"
Note: At least a couple of earlier versions of the format have been used in widely available public software releases. For example, version 009 was used for the M3 releases of the Android platform (November-December 2007), and version 013 was used for the M5 releases of the Android platform (February-March 2008). In several respects, these earlier versions of the format differ significantly from the version described in this document.
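As a rough illustration of how a reader might check the magic and pull out the version digits, consider the sketch below. The function name `dex_version` and the choice to return `None` on a mismatch are assumptions made for this example.

```python
import re
from typing import Optional

# "dex", a newline, three decimal digits, then a null byte: 8 bytes in total.
_DEX_MAGIC_RE = re.compile(rb"dex\n(\d{3})\x00")


def dex_version(header: bytes) -> Optional[int]:
    """Return the format version from the first 8 bytes, or None if the magic is wrong."""
    match = _DEX_MAGIC_RE.match(header[:8])
    return int(match.group(1)) if match else None
```

A file whose newline was rewritten to "\r\n" in transit no longer matches, which is the kind of corruption the embedded newline is meant to expose.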
ENDIAN_CONSTANT and REVERSE_ENDIAN_CONSTANT
embedded in header_item
The constant ENDIAN_CONSTANT is used to indicate the endianness of the file in which it is found. Although the standard .dex format is little-endian, implementations may choose to perform byte-swapping. Should an implementation come across a header whose endian_tag is REVERSE_ENDIAN_CONSTANT instead of ENDIAN_CONSTANT, it would know that the file has been byte-swapped from the expected form.
uint ENDIAN_CONSTANT = 0x12345678;
uint REVERSE_ENDIAN_CONSTANT = 0x78563412;
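A minimal sketch of that check follows, assuming the caller has already located the four raw bytes of endian_tag in the header. The helper name `is_byte_swapped` is hypothetical.

```python
import struct

ENDIAN_CONSTANT = 0x12345678
REVERSE_ENDIAN_CONSTANT = 0x78563412


def is_byte_swapped(endian_tag_bytes: bytes) -> bool:
    """Read the 4 raw endian_tag bytes as a little-endian uint and classify them."""
    (tag,) = struct.unpack("<I", endian_tag_bytes)
    if tag == ENDIAN_CONSTANT:
        return False  # standard little-endian file
    if tag == REVERSE_ENDIAN_CONSTANT:
        return True   # file has been byte-swapped
    raise ValueError(f"unrecognized endian_tag: {tag:#010x}")
```

In a standard file the tag is stored as the bytes 78 56 34 12, so reading them as little-endian gives ENDIAN_CONSTANT.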
NO_INDEX
embedded in class_def_item and debug_info_item
The constant NO_INDEX is used to indicate that an index value is absent.
Note: This value isn't defined to be 0, because that is in fact typically a valid index.
Also Note: The chosen value for NO_INDEX is representable as a single byte in the uleb128p1 encoding.
uint NO_INDEX = 0xffffffff; // == -1 if treated as a signed int
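The single-byte claim can be checked with a small encoder. This is an illustrative sketch: the function names are assumptions, and wrapping the incremented value to 32 bits reflects the earlier statement that LEB128 in .dex files only encodes 32-bit quantities.

```python
NO_INDEX = 0xFFFFFFFF


def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as uleb128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte has its top bit clear
            return bytes(out)


def encode_uleb128p1(value: int) -> bytes:
    """Encode value + 1 as uleb128, wrapping to 32 bits so NO_INDEX becomes 0."""
    return encode_uleb128((value + 1) & 0xFFFFFFFF)
```

With this sketch, `encode_uleb128p1(NO_INDEX)` is the single byte 00, whereas plain `encode_uleb128(NO_INDEX)` takes five bytes.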
access_flags Definitions
embedded in class_def_item, field_item, method_item, and InnerClass
Bitfields of these flags are used to indicate the accessibility and overall properties of classes and class members.
Name | Value | For Classes (and InnerClass annotations) | For Fields | For Methods |
ACC_PUBLIC | 0x1 | public: visible everywhere | public: visible everywhere | public: visible everywhere |
ACC_PRIVATE | 0x2 | *private: only visible to defining class | private: only visible to defining class | private: only visible to defining class |
ACC_PROTECTED | 0x4 | *protected: visible to package and subclasses | protected: visible to package and subclasses | protected: visible to package and subclasses |
ACC_STATIC | 0x8 | *static: is not constructed with an outer this reference | static: global to defining class | static: does not take a this argument |
ACC_FINAL | 0x10 | final: not subclassable | final: immutable after construction | final: not overridable |
ACC_SYNCHRONIZED | 0x20 | | | synchronized: associated lock automatically acquired around call to this method. Note: This is only valid to set when ACC_NATIVE is also set. |
ACC_VOLATILE | 0x40 | | volatile: special access rules to help with thread safety | |
ACC_BRIDGE | 0x40 | | | bridge method, added automatically by compiler as a type-safe bridge |
ACC_TRANSIENT | 0x80 | | transient: not to be saved by default serialization | |
ACC_VARARGS | 0x80 | | | last argument should be treated as a "rest" argument by compiler |
ACC_NATIVE | 0x100 | | | native: implemented in native code |
ACC_INTERFACE | 0x200 | interface: multiply-implementable abstract class | | |
ACC_ABSTRACT | 0x400 | abstract: not directly instantiable | | abstract: unimplemented by this class |
ACC_STRICT | 0x800 | | | strictfp: strict rules for floating-point arithmetic |
ACC_SYNTHETIC | 0x1000 | not directly defined in source code | not directly defined in source code | not directly defined in source code |
ACC_ANNOTATION | 0x2000 | declared as an annotation class | | |
ACC_ENUM | 0x4000 | declared as an enumerated type | declared as an enumerated value | |
(unused) | 0x8000 | | | |
ACC_CONSTRUCTOR | 0x10000 | | | constructor method (class or instance initializer) |
ACC_DECLARED_SYNCHRONIZED | 0x20000 | | | declared synchronized. Note: This has no effect on execution (other than in reflection of this flag, per se). |
* Only allowed to be set for InnerClass annotations, and must never be set in a class_def_item.
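Because some bit values are shared (0x40 and 0x80 mean different things for fields and methods), a decoder needs to know what kind of item the flags belong to. The sketch below is one way to express the table as data. The function name, the `kind` strings, and the table layout are assumptions for illustration, and the name ACC_DECLARED_SYNCHRONIZED is inferred from the "declared synchronized" description of flag 0x20000.

```python
from typing import List

# (value, name, kinds of item for which the table gives the flag a meaning).
# For classes, ACC_PRIVATE, ACC_PROTECTED and ACC_STATIC apply only to
# InnerClass annotations; this sketch does not enforce that restriction.
_ACCESS_FLAGS = [
    (0x1, "ACC_PUBLIC", {"class", "field", "method"}),
    (0x2, "ACC_PRIVATE", {"class", "field", "method"}),
    (0x4, "ACC_PROTECTED", {"class", "field", "method"}),
    (0x8, "ACC_STATIC", {"class", "field", "method"}),
    (0x10, "ACC_FINAL", {"class", "field", "method"}),
    (0x20, "ACC_SYNCHRONIZED", {"method"}),
    (0x40, "ACC_VOLATILE", {"field"}),
    (0x40, "ACC_BRIDGE", {"method"}),
    (0x80, "ACC_TRANSIENT", {"field"}),
    (0x80, "ACC_VARARGS", {"method"}),
    (0x100, "ACC_NATIVE", {"method"}),
    (0x200, "ACC_INTERFACE", {"class"}),
    (0x400, "ACC_ABSTRACT", {"class", "method"}),
    (0x800, "ACC_STRICT", {"method"}),
    (0x1000, "ACC_SYNTHETIC", {"class", "field", "method"}),
    (0x2000, "ACC_ANNOTATION", {"class"}),
    (0x4000, "ACC_ENUM", {"class", "field"}),
    (0x10000, "ACC_CONSTRUCTOR", {"method"}),
    (0x20000, "ACC_DECLARED_SYNCHRONIZED", {"method"}),  # name inferred, see lead-in
]


def decode_access_flags(flags: int, kind: str) -> List[str]:
    """List the flag names set in `flags` for kind 'class', 'field' or 'method'."""
    return [name for value, name, kinds in _ACCESS_FLAGS
            if flags & value and kind in kinds]
```

For example, the value 0x40 decodes to ACC_VOLATILE for a field but to ACC_BRIDGE for a method.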
MUTF-8 (Modified UTF-8) Encoding
As a concession to easier legacy support, the .dex format encodes its string data in a de facto standard modified UTF-8 form, hereafter referred to as MUTF-8. This form is identical to standard UTF-8, except:
- Only the one-, two-, and three-byte encodings are used.
- Code points in the range U+10000 … U+10ffff are encoded as a surrogate pair, each of which is represented as a three-byte encoded value.
- The code point U+0000 is encoded in two-byte form.
- A plain null byte (value 0) indicates the end of a string, as is the standard C language interpretation.
The first two items above can be summarized as: MUTF-8 is an encoding format for UTF-16, instead of being a more direct encoding format for Unicode characters.
The final two items above make it simultaneously possible to i
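The four rules listed above can be sketched as a small encoder. This is an illustrative sketch only: the function name `encode_mutf8` is an assumption, and appending the terminating null byte to the result is a choice made for the example.

```python
def encode_mutf8(text: str) -> bytes:
    """Encode text as MUTF-8, followed by the terminating null byte."""
    out = bytearray()
    # Work on UTF-16 code units so that code points above U+FFFF
    # arrive as surrogate pairs, one unit at a time.
    utf16 = text.encode("utf-16-le", "surrogatepass")
    for i in range(0, len(utf16), 2):
        unit = utf16[i] | (utf16[i + 1] << 8)
        if unit != 0 and unit < 0x80:
            out.append(unit)  # one-byte form
        elif unit < 0x800:
            # two-byte form, which is also used for U+0000
            out += bytes([0xC0 | (unit >> 6), 0x80 | (unit & 0x3F)])
        else:
            # three-byte form; each half of a surrogate pair lands here
            out += bytes([0xE0 | (unit >> 12),
                          0x80 | ((unit >> 6) & 0x3F),
                          0x80 | (unit & 0x3F)])
    out.append(0)  # a plain null byte marks the end of the string
    return bytes(out)
```

With this sketch, an embedded U+0000 becomes the bytes c0 80, so the only plain zero byte in the output is the terminator.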