吾爱破解 - LCG - LSG | Android Cracking | Virus Analysis | www.52pojie.cn

[Other reposts] ARM Assembly 2. ARM Instructions, GNU Assembler Directives

t7sqynt3 posted on 2020-7-23 06:39
Last edited by t7sqynt3 on 2020-7-23 06:42

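Preface: Reverse engineering requires a foundation in assembly, so I am first trying to organize what I have learned about ARM assembly so far. My knowledge is limited and there may be omissions; corrections and discussion are welcome. This series is original work; please credit the source if you quote it. However, citing it in academic research is not recommended, because the content has not been peer reviewed.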

You should cite this article if you want to use it in your work.

Warning: this article may not be precise and professional, and it is not peer-reviewed.

Professionals are welcome to translate this article, since the author does not know the precise translated terminology.

2. ARM Instructions, GNU Assembler Directives

ARM instructions fall into three broad categories:

  • Data processing instructions: they operate only on registers (and immediates), and the result is written to a register.

  • Control flow instructions: unconditional or conditional branches, and function calls.

  • Data transfer instructions: load contents from memory into registers, and store contents of registers to memory.

The list below is not comprehensive, but it covers the instructions most commonly used when writing assembly code by hand. (Code you meet while reverse engineering may use a different mix.)

Data Processing Instructions

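The general format is instruction destination_reg, operand1_reg, operand2.

Operand2 can be a register (r0-r15), a shifted register, or a 32-bit immediate value. In ARM (A32) state the immediate must be expressible as an 8-bit value rotated right by an even number of bits; the assembler can also reach some further values by complementing or negating the constant and switching to the paired instruction (for example mov/mvn or add/sub).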
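The immediate rule can be checked mechanically. Below is a minimal sketch in Python, used here only as a modelling language. It assumes the classic ARM (A32) encoding, where an immediate is an 8-bit value rotated right by an even amount. The function names are hypothetical and not part of any toolchain.

```python
def rotate_right32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    value &= 0xFFFFFFFF
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def is_encodable_immediate(value):
    """True if `value` is some 8-bit number rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rotation in range(0, 32, 2):
        # Undo the rotation: rotating left by `rotation` must leave an 8-bit value.
        if rotate_right32(value, 32 - rotation) <= 0xFF:
            return True
    return False


def mov_strategy(value):
    """Name the single instruction that could load `value`, or None."""
    if is_encodable_immediate(value):
        return "mov"
    if is_encodable_immediate(~value):
        return "mvn"  # load the complemented constant with mvn instead
    return None       # needs another approach, e.g. ldr reg, =value
```

For instance, mov_strategy(0xFF000000) gives "mov", mov_strategy(-1) gives "mvn", and mov_strategy(0x101) gives None because its set bits span nine bit positions.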

  • mov: moves a value to a register

    mov destination_reg, source_reg or mov destination_reg, #imm8

    When -256 <= #imm8 <= 255, the assembler always succeeds (negative values in this range are emitted as mvn). Otherwise, the value must be producible by shifting, rotating, and/or complementing an 8-bit value. Many values do not meet this requirement, and the assembler reports an error for them.

    In assembly code, the immediate can be written as a bare number, e.g. 1, or with a # prefix, e.g. #1. Both are accepted by the GNU assembler for ARM.

  • mvn: moves the bitwise complement (one's complement) of a value to a register

    mvn destination_reg, source_reg or mvn destination_reg, #imm8

    #imm8 is typically a value <= 255. Otherwise, the same restrictions as for mov apply.

  • add: add two values

    add destination_reg, source1_reg, source2_reg or add destination_reg, source1_reg, #imm8

    dest = src1 + src2 or dest = src1 + #imm8

  • sub: subtract two values

    sub destination_reg, source1_reg, source2_reg or sub destination_reg, source1_reg, #imm8

    dest = src1 - src2 or dest = src1 - #imm8

  • rsb: reverse subtract (operand2 - operand1)

    rsb destination_reg, source1_reg, source2_reg or rsb destination_reg, source1_reg, #imm8

    dest = src2 - src1 or dest = #imm8 - src1

  • mul: multiply two registers

    mul destination_reg, source1_reg, <source2_reg>

    source2_reg is optional. If omitted, dest = dest * src1.

    Otherwise, dest = src1 * src2.

    Note that only the lower 32 bits of the product are stored in dest. These bits are the same whether the operands are treated as signed or unsigned, so mul serves both cases.

  • and: bitwise AND of two registers

    and destination_reg, source1_reg, source2_reg

    dest = src1 & src2

  • orr: bitwise OR of two registers

    orr destination_reg, source1_reg, source2_reg

    dest = src1 | src2

  • eor: bitwise exclusive OR of two registers (XOR)

    eor destination_reg, source1_reg, source2_reg

    dest = src1 ^ src2

  • bic: bitwise clear (AND NOT) of two registers

    bic destination_reg, source1_reg, source2_reg

    dest = src1 & ~src2

  • lsl: logical shift left

    lsl <destination_reg,> source1_reg, source2_reg or lsl <destination_reg,> source1_reg, #const

    0 <= #const <= 31

    dest = src1 << src2 or dest = src1 << #const

    If destination_reg is omitted, the result is stored in source1_reg.

  • lsr: logical shift right

    lsr <destination_reg,> source1_reg, source2_reg or lsr <destination_reg,> source1_reg, #const

    1 <= #const <= 32

    dest = (unsigned)src1 >> src2 or dest = (unsigned)src1 >> #const

    If destination_reg is omitted, the result is stored in source1_reg.

  • asr: arithmetic shift right

    asr <destination_reg,> source1_reg, source2_reg or asr <destination_reg,> source1_reg, #const

    1 <= #const <= 32

    dest = (signed)src1 >> src2 or dest = (signed)src1 >> #const

    If destination_reg is omitted, the result is stored in source1_reg.

  • ror: rotate right

    ror <destination_reg,> source1_reg, source2_reg or ror <destination_reg,> source1_reg, #const

    1 <= #const <= 31

    Bits shifted out of the low-order end re-enter at the high-order end.

    If destination_reg is omitted, the result is stored in source1_reg.

  • cmp: compare two values and set condition flags

    cmp source1_reg, source2_reg or cmp source1_reg, #imm8

    The requirements for #imm8 are the same as before.

Notice: many ARM CPUs, especially early ones, have no hardware support for division, so it is not discussed here.
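To make the less familiar operations concrete, here is a minimal sketch that models them on 32-bit values. Python stands in for the hardware, and the function names simply mirror the mnemonics; it is an illustration under those assumptions, not a reference model. Registers are treated as unsigned 32-bit integers, and to_signed32 is a hypothetical helper for reading a result as signed.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a signed (two's complement) integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def lsl(value, amount):
    return (value << amount) & MASK32          # bits shifted past bit 31 are lost


def lsr(value, amount):
    return (value & MASK32) >> amount          # zeros enter from the left


def asr(value, amount):
    # Python's >> on a negative int keeps the sign, like an arithmetic shift.
    return (to_signed32(value) >> amount) & MASK32


def ror(value, amount):
    value &= MASK32
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def rsb(src1, src2):
    return (src2 - src1) & MASK32              # operands reversed compared to sub


def bic(src1, src2):
    return src1 & ~src2 & MASK32               # clear the bits that are set in src2


def mul(src1, src2):
    return (src1 * src2) & MASK32              # only the low 32 bits are kept
```

For example, asr(0x80000000, 31) yields 0xFFFFFFFF while lsr(0x80000000, 31) yields 1, and mul(0xFFFFFFFF, 2) yields 0xFFFFFFFE, which reads as -2 when interpreted as signed.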

Control Flow Instructions

Condition Flags

Note: the content of this section and the one below ("Condition Flags" and "Branch instructions") is based on the ARM community post "Condition Codes 1: Condition Flags and Codes" by Jacob Bramley, published September 11, 2013.

Flag | Explanation
N | Set if the result is negative (a copy of bit 31 of the result).
Z | Set if the result is zero.
C | Set if the result of an unsigned operation overflows. For subtraction (including cmp), C is set when no borrow occurs.
V | Set if the result of a signed operation overflows.

Dedicated comparison instructions that set the flags (they are not the only instructions that do so):

  • cmp: works like subs (a sub that also sets the condition flags), but does not store the result.

  • cmn: works like adds, but does not store the result.

  • tst: works like ands, but does not store the result.

  • teq: works like eors, but does not store the result.
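How cmp and cmn derive the four flags can be sketched as follows. This is a simplified Python model with hypothetical function names, assuming 32-bit registers. It covers only the arithmetic comparisons; tst and teq update N and Z from the logical result and are left out.

```python
MASK32 = 0xFFFFFFFF


def flags_for_add(a, b):
    """Return (N, Z, C, V) for the 32-bit addition a + b, as adds/cmn would."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    n = result >> 31
    z = int(result == 0)
    c = int(full > MASK32)                      # carry out of bit 31
    # Signed overflow: both operands differ in sign from the result.
    v = ((a ^ result) & (b ^ result)) >> 31
    return n, z, c, v


def flags_for_sub(a, b):
    """Return (N, Z, C, V) for the 32-bit subtraction a - b, as subs/cmp would."""
    a &= MASK32
    b &= MASK32
    full = a + (~b & MASK32) + 1                # a - b computed as a + NOT(b) + 1
    result = full & MASK32
    n = result >> 31
    z = int(result == 0)
    c = int(full > MASK32)                      # C == 1 means "no borrow" (a >= b unsigned)
    # Signed overflow: operands differ in sign and the result's sign differs from a.
    v = ((a ^ b) & (a ^ result)) >> 31
    return n, z, c, v


def cmp(a, b):
    return flags_for_sub(a, b)


def cmn(a, b):
    return flags_for_add(a, b)
```

Comparing equal values, as in cmp(5, 5), gives (N, Z, C, V) = (0, 1, 1, 0). C is set here because the subtraction did not borrow.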

Branch instructions

Branch instructions are used for changing the order of instruction execution, or "jump".

For function calls, use branch and link: bl. For example, bl printf.

For conditional and unconditional branches, use the form bxx, where xx is one of the codes below:

Code | Meaning (for cmp or subs) | Flags Tested
eq | Equal. | Z==1
ne | Not equal. | Z==0
cs or hs | Unsigned higher or same (or carry set). | C==1
cc or lo | Unsigned lower (or carry clear). | C==0
mi | Negative. The mnemonic stands for "minus". | N==1
pl | Positive or zero. The mnemonic stands for "plus". | N==0
vs | Signed overflow. The mnemonic stands for "overflow set". | V==1
vc | No signed overflow. The mnemonic stands for "overflow clear". | V==0
hi | Unsigned higher. | (C==1) && (Z==0)
ls | Unsigned lower or same. | (C==0) || (Z==1)
ge | Signed greater than or equal. | N==V
lt | Signed less than. | N!=V
gt | Signed greater than. | (Z==0) && (N==V)
le | Signed less than or equal. | (Z==1) || (N!=V)
al (or omitted) | Always executed. | None.
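The table can be read as a set of predicates over the flags. The sketch below is a hypothetical Python model, not toolchain code. It evaluates a condition code after a simulated cmp a, b on 32-bit values, which makes the difference between the signed and unsigned codes visible.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(a, b):
    """Flags (N, Z, C, V) after a simulated `cmp a, b` on 32-bit values."""
    a &= MASK32
    b &= MASK32
    full = a + (~b & MASK32) + 1                # a - b as a + NOT(b) + 1
    result = full & MASK32
    n = result >> 31
    z = int(result == 0)
    c = int(full > MASK32)                      # set when no borrow occurs
    v = ((a ^ b) & (a ^ result)) >> 31          # signed overflow
    return n, z, c, v


# One predicate per condition code, taken directly from the table.
CONDITIONS = {
    "eq": lambda n, z, c, v: z == 1,
    "ne": lambda n, z, c, v: z == 0,
    "cs": lambda n, z, c, v: c == 1,
    "hs": lambda n, z, c, v: c == 1,
    "cc": lambda n, z, c, v: c == 0,
    "lo": lambda n, z, c, v: c == 0,
    "mi": lambda n, z, c, v: n == 1,
    "pl": lambda n, z, c, v: n == 0,
    "vs": lambda n, z, c, v: v == 1,
    "vc": lambda n, z, c, v: v == 0,
    "hi": lambda n, z, c, v: c == 1 and z == 0,
    "ls": lambda n, z, c, v: c == 0 or z == 1,
    "ge": lambda n, z, c, v: n == v,
    "lt": lambda n, z, c, v: n != v,
    "gt": lambda n, z, c, v: z == 0 and n == v,
    "le": lambda n, z, c, v: z == 1 or n != v,
    "al": lambda n, z, c, v: True,
}


def branch_taken(code, a, b):
    """Would `b<code>` be taken right after `cmp a, b`?"""
    return CONDITIONS[code](*cmp_flags(a, b))
```

With a = 0xFFFFFFFF and b = 1, branch_taken("hi", a, b) is True because a is the larger unsigned value. branch_taken("gt", a, b) is False because the same bit pattern reads as -1 when signed.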

Data Transfer Instructions

These instructions move data between the CPU and memory. They can transfer bytes, halfwords (2 bytes), or words (4 bytes), from registers to memory or from memory to registers.

  • ldr: load data from a memory address into a register

    ldr destination_reg, source_memory_address

  • str: store data from a register to a memory address

    str source_reg, destination_memory_address

The memory address operand can be:

  • =label (ldr only; a pseudo-instruction that loads the address of label)

  • =expression (ldr only; e.g. =0xffffffff)

  • [base_register<, #imm12>] (-4095 <= #imm12 <= 4095 for word and byte accesses in ARM state)

  • [base_register, offset_register]

Size variants of ldr and str:

  • ldrb or strb: load or store a byte.

  • ldrh or strh: load or store a halfword.

  • ldr or str: load or store a word.

  • ldrd or strd: load or store a doubleword. (The first register must be even-numbered; it holds the lower word, and the next register holds the upper word.)

Sign extension when loading a byte or halfword:

  • ldrsb: load a signed byte.

  • ldrsh: load a signed halfword.

Mind the alignment requirements for halfwords (2-byte), words (4-byte), and doublewords (8-byte) in memory. Misaligned accesses hurt performance and, depending on the core and its configuration, may fault.
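The difference between the zero-extending and sign-extending loads is easiest to see on concrete bytes. The following is a minimal Python sketch with hypothetical helper names. It assumes a little-endian memory image and models only what ends up in the 32-bit destination register.

```python
def load(memory, address, size, signed=False):
    """Read `size` bytes at `address` (little-endian) and widen to 32 bits."""
    raw = memory[address:address + size]
    value = int.from_bytes(raw, "little", signed=signed)
    return value & 0xFFFFFFFF                   # register contents as a 32-bit pattern


def ldrb(memory, address):
    return load(memory, address, 1)             # zero-extended byte


def ldrsb(memory, address):
    return load(memory, address, 1, signed=True)   # sign-extended byte


def ldrh(memory, address):
    return load(memory, address, 2)             # zero-extended halfword


def ldrsh(memory, address):
    return load(memory, address, 2, signed=True)   # sign-extended halfword


def ldr(memory, address):
    return load(memory, address, 4)             # full word


# Example memory image: four bytes starting at address 0.
MEMORY = bytes([0x80, 0xFF, 0x34, 0x12])
```

On this image, ldrb(MEMORY, 0) gives 0x00000080 while ldrsb(MEMORY, 0) gives 0xFFFFFF80, because bit 7 of the byte is copied into the upper 24 bits.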

GNU Assembler Directives

The list below is not meant to be comprehensive; it covers only the commonly used directives.

Target Hardware

  • .arch: the CPU architecture. e.g. .arch armv6.

  • .cpu: the CPU. e.g. .cpu cortex-a15.

Assembler Control

  • .section: assemble the following in section. e.g. .section .rodata.

  • .text: the text section, equivalent to .section .text.

  • .data: the data section.

  • .bss: the BSS section.

  • .align: align the following code or data within the section. .align x, where x is a non-negative integer, aligns to a multiple of 2 to the power of x bytes (on ARM the argument is an exponent, not a byte count).
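The effect of .align x on the location counter can be worked out with a little arithmetic. The sketch below is a hypothetical Python helper, assuming the power-of-two interpretation described for ARM. It rounds an address up to the next 2^x boundary and reports how many padding bytes that takes.

```python
def align_up(address, exponent):
    """Round `address` up to the next multiple of 2**exponent."""
    boundary = 1 << exponent
    return (address + boundary - 1) & ~(boundary - 1)


def padding_for(address, exponent):
    """Number of padding bytes `.align exponent` would insert at `address`."""
    return align_up(address, exponent) - address
```

For example, with the location counter at 0x1001, .align 2 moves it to 0x1004, inserting 3 bytes of padding. An address that is already aligned is left unchanged.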

Symbol

  • .global: make symbol visible to the linker. e.g. .global main.

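  • .extern: declare external functions or library functions. e.g. .extern printf. (GNU as treats every undefined symbol as external, so this mainly serves as documentation.)

  • .type: set the type of a symbol. e.g. .type main, %function.

  • .equ: set the value of a symbol. e.g. .equ SIZ, 1.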

Constant Definition

  • .byte: define byte data (8 bits).

  • .hword: define halfword data (2 bytes).

  • .word: define word data (4 bytes).

  • .quad: define doubleword data (8 bytes).

  • .single: a 4-byte single-precision float value.

  • .double: an 8-byte double-precision float value.

  • .skip: reserve the given number of bytes, filled with 0 by default.

  • .fill: emit repeated copies of a value of a given size. e.g. .fill 16, 4, 0xffffffff creates an array of 16 integers (4 bytes each) with value -1.

  • .ascii: an ascii string. NOT zero-terminated.

  • .asciz: a zero-terminated ascii string.
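To see what these directives place in the object file, the sketch below models their output as byte strings. It is written in Python with hypothetical emit_* helper names and assumes a little-endian target. It illustrates the layouts only and is not a description of the assembler's internals. emit_fill is a simplification meant for sizes up to 4 bytes.

```python
import struct


def emit_byte(value):
    return struct.pack("<B", value & 0xFF)            # .byte


def emit_hword(value):
    return struct.pack("<H", value & 0xFFFF)          # .hword


def emit_word(value):
    return struct.pack("<I", value & 0xFFFFFFFF)      # .word


def emit_quad(value):
    return struct.pack("<Q", value & 0xFFFFFFFFFFFFFFFF)  # .quad


def emit_single(value):
    return struct.pack("<f", value)                   # .single


def emit_double(value):
    return struct.pack("<d", value)                   # .double


def emit_skip(size, fill=0):
    return bytes([fill & 0xFF]) * size                # .skip size, fill


def emit_fill(repeat, size, value):
    # .fill repeat, size, value (modelled for size <= 4 only)
    mask = (1 << (8 * size)) - 1
    return (value & mask).to_bytes(size, "little") * repeat


def emit_ascii(text):
    return text.encode("ascii")                       # .ascii: no terminator


def emit_asciz(text):
    return text.encode("ascii") + b"\x00"             # .asciz: adds a zero byte
```

Under these assumptions, emit_fill(16, 4, 0xffffffff) produces 64 bytes of 0xff, and emit_asciz("hi") is one byte longer than emit_ascii("hi").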

Misc

  • .syntax: the syntax of the assembly code. Generally we use the modern syntax: .syntax unified.


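bc001 posted on 2020-7-23 07:24
Thanks for sharing, OP
az12az posted on 2020-7-23 08:16
Psyber posted on 2020-7-23 08:20
huohua1991 posted on 2020-7-23 08:37
Thanks for sharing, learning from this
ALL_IN posted on 2020-7-23 08:38
I don't understand it............
so_so_so posted on 2020-7-23 08:38

Thanks for sharing, OP
fuzzylogic posted on 2020-7-23 08:53
Low-level content like this is rare nowadays
sbuangke2019 posted on 2020-7-23 09:00
Could you provide it in Chinese?
skjsnb posted on 2020-7-23 09:10
Markdown is nice. Learning from this, bump!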