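First post on this board. I searched and couldn't find an existing discussion of this, so please go easy on me QQ
Easier-to-read blog version: http://tinyurl.com/y677eu2f
I recently picked up this little trick from a coworker and wanted to share it.
Tip: Prefer accessing registers through a structure whenever you can. It gives you the following benefits:
1. The compiler can optimize the base address calculation (with -O1), making the program more efficient
2. Easy to write, easy to read, easy to understand!
Here we go!
Letting the compiler optimize the base address calculation (with -O1) for a more efficient program
Basic concept: Placing C variables at specific addresses to access memory-mapped
peripherals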
The ARM compiler will normally use a ‘base register’ plus the immediate
offset field available in the load/store instruction to compile struct member
or specific array element access.
In the ARM instruction set, LDR/STR word/byte instructions have a 4KB range,
but LDRH/STRH instructions have a smaller immediate offset of 256 bytes.
Equivalent 16-bit Thumb instructions are much more restricted - LDR/STR have
a range of 32 words, LDRH/STRH have a range of 32 halfwords and LDRB/STRB
have a range of 32 bytes. However, 32-bit Thumb instructions offer a
significant improvement. Hence, it is important to group related peripheral
registers near to each other if possible. The compiler will generally do a
good job of minimising the number of instructions required to access the
array elements or structure members by using base registers.
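In short, the passage above says that the ARM compiler already uses a base register plus
an offset to access struct members and array elements. So if we describe a group of
registers at contiguous addresses as a struct or an array, accesses to them get the same
base-register treatment.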
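To make the array half of that concrete, here is a minimal sketch of my own (not from the ARM doc). The names REG_IDX_*, REG_COUNT and write_regs are hypothetical, and it assumes a block of 64-bit registers; the block pointer is passed in as a parameter so the code can be tried on a PC against a plain array instead of real hardware.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical register indices: index N sits at byte offset N * 8. */
enum { REG_IDX_BASE = 0, REG_IDX_A = 1, REG_IDX_B = 2, REG_IDX_C = 3, REG_COUNT };

static_assert(sizeof(uint64_t) == 8, "index-to-offset mapping assumes 8-byte registers");

/* All three stores go through the same pointer, so the compiler only
   needs the block address once and can use constant offsets for each store. */
void write_regs(volatile uint64_t *regs, uint64_t a, uint64_t b, uint64_t c)
{
    regs[REG_IDX_A] = a;
    regs[REG_IDX_B] = b;
    regs[REG_IDX_C] = c;
}
```

On a real target, regs would be the peripheral's base address cast to volatile uint64_t *. That part depends on your hardware, so it is left out here.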
An example makes this clearer. Suppose we write to A/B/C directly like this:
#define REG_BASE_ADDR (0x10000000FFFFF00)
#define REG_A (REG_BASE_ADDR + 0x8)
#define REG_B (REG_BASE_ADDR + 0x10)
#define REG_C (REG_BASE_ADDR + 0x18)
#define READ_REG(reg, val) val = *((volatile unsigned long *) (reg))
#define WRITE_REG(reg, val) *((volatile unsigned long *) (reg)) = val
void foo(unsigned long a_val, unsigned long b_val, unsigned long c_val){
WRITE_REG(REG_A, a_val);
WRITE_REG(REG_B, b_val);
WRITE_REG(REG_C, c_val);
}
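The assembly produced by Compiler Explorer (ARM64 GCC 8.2 -O2) is shown below
(https://godbolt.org/z/3MRiMJ)
You can see that the addresses of registers A/B/C each have to be computed separately (Line 2~10) before the values can be written.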
foo:
mov x5, 65288
mov x4, 65296
movk x5, 0xfff, lsl 16
movk x4, 0xfff, lsl 16
movk x5, 0x100, lsl 48
mov x3, 65304
movk x4, 0x100, lsl 48
movk x3, 0xfff, lsl 16
movk x3, 0x100, lsl 48
str x0, [x5]
str x1, [x4]
str x2, [x3]
ret
If we switch to the version below and access the registers through a structure instead (https://godbolt.org/z/g-eJmz)
#define REG_BASE_ADDR (0x10000000FFFFF00)
typedef struct
{
unsigned long BASE;
unsigned long REG_A;
unsigned long REG_B;
unsigned long REG_C;
} my_register;
#define READ_REG(reg, val) do{ \
volatile my_register *base = (my_register *) REG_BASE_ADDR; \
val = base->reg; \
} while(0)
#define WRITE_REG(reg, val) do{ \
volatile my_register *base = (my_register *) REG_BASE_ADDR; \
base->reg = val; \
} while(0)
void foo(unsigned long a_val, unsigned long b_val, unsigned long c_val){
WRITE_REG(REG_A, a_val);
WRITE_REG(REG_B, b_val);
WRITE_REG(REG_C, c_val);
}
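The generated assembly is below. The base address is computed only once and kept in base
register x3, and A/B/C are then written directly through offsets. That cuts the instruction
count by more than half (13 down to 6)! For a hard real-time context where every MCPS
counts, that makes a world of difference!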
foo:
mov x3, 268435200
movk x3, 0x100, lsl 48
str x0, [x3, 8]
str x1, [x3, 16]
str x2, [x3, 24]
ret
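One thing the struct approach quietly relies on is that the struct layout really matches the register offsets. Below is a rough sketch of how that could be checked at compile time. The names my_register_t and write_abc are hypothetical, and I use uint64_t instead of unsigned long (an assumption on my part) so the member size does not depend on the ABI. The function takes the block pointer as a parameter so it can be tested on a PC against an ordinary variable.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical register block mirroring the layout used in this post. */
typedef struct {
    volatile uint64_t BASE;   /* offset 0x00 */
    volatile uint64_t REG_A;  /* offset 0x08 */
    volatile uint64_t REG_B;  /* offset 0x10 */
    volatile uint64_t REG_C;  /* offset 0x18 */
} my_register_t;

/* The build fails if padding or a member type change shifts any offset. */
static_assert(offsetof(my_register_t, REG_A) == 0x08, "REG_A offset");
static_assert(offsetof(my_register_t, REG_B) == 0x10, "REG_B offset");
static_assert(offsetof(my_register_t, REG_C) == 0x18, "REG_C offset");

/* Same three stores as foo(), but through a pointer the caller supplies. */
void write_abc(my_register_t *regs, uint64_t a, uint64_t b, uint64_t c)
{
    regs->REG_A = a;
    regs->REG_B = b;
    regs->REG_C = c;
}
```

On the target you would pass (my_register_t *) REG_BASE_ADDR. On a PC you can pass the address of a local my_register_t and inspect its members afterwards.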
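Easy to write, easy to read, easy to understand!
Now for the second benefit. To me this one is just as important as program efficiency,
maybe even more so, and it is really what structures are for in the first place: describing
data in the way that is friendliest to engineers.
For example, suppose the spec for Register A looks like this (little-endian)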
Bit | 0 1 2 3 4 5 6 7 8