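The stm32f4discovery is a great platform for practicing OS development, but I
find it a pity that it has no MMU. The Raspberry Pi 2 fills exactly that gap.
However, I do not yet know how to use JTAG on the rpi2, so debugging is more
troublesome: I have to debug by pure thought.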
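I did not expect my second bare-metal rpi2 program to tackle the MMU already.
It sounds hardcore, but I think it is manageable; after all, I have accumulated
a fair amount of experience and knowledge. That is the power of accumulation.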
I wrote about the x86 MMU before ( http://goo.gl/hbVhMi ). Back then the MMU
was not the focus of my study; now, on a different platform, I am learning it
again from scratch.
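The Raspberry Pi 2 uses an ARM Cortex-A7, a relatively new architecture, and
most of the MMU material I found online covers ARMv6. DS-5 ships
startup_Cortex-A7/startup.s, which can serve as a reference; reading the code
alongside the manual speeds up learning.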
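I strongly recommend first reading sections 3.2 to 3.4 of
《一步步写嵌入式操作系统:ARM编程的方法与实践》 ( http://goo.gl/rvfq46 );
otherwise this article will probably be hard to follow. The theory is covered
in that book, so I will not explain how the ARMv6 MMU works. The book explains
it in great detail and, although out of print, is easy to find, so do not
worry. It targets ARMv6 but is still a valuable reference. Beyond the theory, I
will also show working code; you can think of this article as the ARMv7-A,
real-hardware (rpi2) version of the book's material.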
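Because this is ARMv7-A, you also need to study Chapter B3, Virtual Memory
System Architecture (VMSA), of the ARM Architecture Reference Manual ARMv7-A
and ARMv7-R edition ( http://goo.gl/WkRDdw ). I could not find a Chinese
translation, so I had to work through the English manual. If you have really
read the book, this chapter will be easier to understand. The two are largely
the same with small differences: ARMv7-A adds a few fields, so you still need
to consult the manual. The chapter runs to more than 200 pages, but you do not
need to read all of it to set up the MMU; about 10 pages were enough for my
test.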
There are four page sizes, listed below. I chose sections, which are 1MB each.
ARMv7-A has no 1KB page. A section mapping needs only one page table instead
of two levels of tables, which simplifies our learning.
Supersections: Consist of 16MB blocks of memory. Support for Supersections is
               optional, except that an implementation that includes the Large
               Physical Address Extension and supports more than 32 bits of
               Physical Address must also support Supersections to provide
               access to the entire Physical Address space.
Sections:      Consist of 1MB blocks of memory.
Large pages:   Consist of 64KB blocks of memory.
Small pages:   Consist of 4KB blocks of memory.
( https://goo.gl/bgFWcB )
fig 1. short-descriptor (first-level descriptor)
( https://goo.gl/JU1gEg )
fig 2. How a virtual address is translated to a physical address
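Fig 2 shows how a virtual address is translated into a physical address. Pages
62 and 63 of 《一步步写嵌入式操作系统:ARM编程的方法与实践》 explain this in
detail and include an example. The diagram is actually quite intuitive,
provided you have read those two pages; otherwise it will probably not make
sense.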
I want physical address 0x3F000000 to map to virtual address 0xc8000000. How
should the first-level descriptor be filled in? You may have heard the term
page table entry; it is the same thing as a first-level descriptor. I use the
terminology from the ARM manual.
https://github.com/descent/arm_os/blob/master/leeos/chapter3/part2/cortex_a_mmu.c
( https://goo.gl/aGJOXy )
I adapted this from the book's mmu.c https://goo.gl/HwXp4g (
https://github.com/descent/arm_os/blob/master/leeos/chapter3/part2/mmu.c ).
It can first print each first-level descriptor's address and contents.
tt_ex
1 #define PHYSICAL_IO_ADDR 0x3F000000
2 #define VIRTUAL_IO_ADDR 0xc8000000
3 translation table base : 30700000
4
5 0 (IO) ## pte_addr: 30703200, pte: 3f000de2
6 1 (IO) ## pte_addr: 30703204, pte: 3f100de2
7 2 (IO) ## pte_addr: 30703208, pte: 3f200de2
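Entries like the ones listed in tt_ex could be produced by a few lines of C.
This is only a minimal host-side sketch, not the code from cortex_a_mmu.c: the
helper names pte_addr, section_pte and print_io_entries are made up for
illustration, and SECTION_ATTR is simply the low bits (de2) read off the output
above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PHYSICAL_IO_ADDR 0x3F000000u
#define VIRTUAL_IO_ADDR  0xc8000000u
#define SECTION_ATTR     0x00000de2u  /* assumption: low bits seen in tt_ex */

/* Address of the first-level descriptor that covers va (hypothetical helper). */
static uint32_t pte_addr(uint32_t ttb, uint32_t va)
{
    assert((ttb & 0x3FFFu) == 0);      /* a 4096-entry table is 16KB aligned */
    return ttb | ((va >> 20) << 2);    /* index VA[31:20], 4 bytes per entry */
}

/* Section descriptor: 1MB-aligned physical base plus the attribute bits. */
static uint32_t section_pte(uint32_t pa)
{
    return (pa & 0xFFF00000u) | SECTION_ATTR;
}

/* Print n consecutive IO entries in the same format as tt_ex. */
static void print_io_entries(uint32_t ttb, unsigned n)
{
    for (unsigned i = 0; i < n; i++) {
        uint32_t off = (uint32_t)i << 20;   /* one section = 1MB */
        printf("%u (IO) ## pte_addr: %08x, pte: %08x\n", i,
               (unsigned)pte_addr(ttb, VIRTUAL_IO_ADDR + off),
               (unsigned)section_pte(PHYSICAL_IO_ADDR + off));
    }
}
```

Calling print_io_entries(0x30700000u, 3) on a PC is a cheap way to check the
arithmetic before writing anything into the real table.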
The translation table base register holds 30700000. Let us see how virtual
address 0xc8000000 maps to physical address 0x3F000000. Note that the first
first-level descriptor listed is not at 30700000 but at 30703200.
c8000000 => c80 00000
c80 << 2 = 3200
30700000 | 3200 = 30703200
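Line 5 of tt_ex is the first-level descriptor we are looking for; its content
is 3f000de2. Take the top 12 bits of 3f000de2 (3f0) as the section base and OR
them with the low 20 bits of c8000000 (00000): 3f000000 is the final answer.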
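The lookup just described can be replayed on a PC. The sketch below only
models the arithmetic of a section translation; the function names are
hypothetical, and it assumes the descriptor is a plain 1MB section
(bits[1:0]=10, not a supersection).

```c
#include <assert.h>
#include <stdint.h>

/* Index into the first-level table: VA[31:20]. */
static uint32_t l1_index(uint32_t va)
{
    return va >> 20;
}

/* Descriptor address = translation table base | (index << 2). */
static uint32_t l1_desc_addr(uint32_t ttb, uint32_t va)
{
    return ttb | (l1_index(va) << 2);
}

/* Section translation: PA = descriptor[31:20] followed by VA[19:0]. */
static uint32_t section_va_to_pa(uint32_t desc, uint32_t va)
{
    assert((desc & 0x3u) == 0x2u);     /* bits[1:0]=10 marks a section */
    return (desc & 0xFFF00000u) | (va & 0x000FFFFFu);
}
```

With ttb = 0x30700000 and va = 0xc8000000 this gives index c80, descriptor
address 30703200, and, for descriptor 3f000de2, physical address 3f000000,
matching the hand calculation.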
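So once this table is filled in correctly, physical and virtual addresses are
translated correctly, and you can map them however you like. In practice it is
a bit more complicated: the fields described at lines 218 to 232 of
ds-5/examples/DS-5Examples/startup_Cortex-A7/startup.s must also be filled in.
Please consult the manual for those.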
Related manual sections
B3.5.2 Memory attributes in the Short-descriptor translation table format
descriptors (page B3-1328)
The Domain field also requires: page B3-1362, and B4.1.43 DACR, Domain Access
Control Register, VMSA (page B4-1558)
Translation Table Base Register: page B4-1729
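To get a feel for those extra fields, a descriptor value can be unpacked on a
PC. The sketch below follows the bit layout documented in the comments at
startup.s lines 218 to 232; the struct and function names are invented for
illustration, and bit 9 is left at 0.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a short-descriptor section entry (hypothetical struct). */
struct section_fields {
    uint32_t base;    /* bits[31:20] section base address */
    uint32_t ng;      /* bit 17 */
    uint32_t s;       /* bit 16 */
    uint32_t ap2;     /* bit 15 */
    uint32_t tex;     /* bits[14:12] */
    uint32_t ap;      /* bits[11:10] */
    uint32_t domain;  /* bits[8:5] */
    uint32_t xn;      /* bit 4 */
    uint32_t c;       /* bit 3 */
    uint32_t b;       /* bit 2 */
};

static struct section_fields decode_section(uint32_t d)
{
    struct section_fields f;
    assert((d & 0x3u) == 0x2u);        /* bits[1:0]=10: 1MB section */
    f.base   = d >> 20;
    f.ng     = (d >> 17) & 1u;
    f.s      = (d >> 16) & 1u;
    f.ap2    = (d >> 15) & 1u;
    f.tex    = (d >> 12) & 7u;
    f.ap     = (d >> 10) & 3u;
    f.domain = (d >> 5) & 0xFu;
    f.xn     = (d >> 4) & 1u;
    f.c      = (d >> 3) & 1u;
    f.b      = (d >> 2) & 1u;
    return f;
}

static uint32_t encode_section(const struct section_fields *f)
{
    assert(f->base <= 0xFFFu && f->tex <= 7u && f->ap <= 3u);
    assert(f->domain <= 0xFu);
    return (f->base << 20) | (f->ng << 17) | (f->s << 16) | (f->ap2 << 15) |
           (f->tex << 12) | (f->ap << 10) | (f->domain << 5) |
           (f->xn << 4) | (f->c << 3) | (f->b << 2) | 0x2u;
}
```

Decoding 3f000de2 this way gives base 3f0, AP=11, Domain=1111 and XN, C, B all
0. The startup.s loop additionally sets XN, which turns the low bits into df2.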
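The rpi2 has 1GB of RAM and ARM addresses start at 0, so physical addresses run
from 0x00000000 up to 0x40000000 (exclusive). I point this out deliberately. It
may sound obvious, but it is not: memory addresses on embedded systems can
easily defy your expectations. On some SoCs the 1GB starts at 0x10000000 rather
than 0, and every vendor's SoC may differ. As in the book, we map DDR
one-to-one: address 0 is still address 0 after MMU translation, and address 99
is still address 99.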
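A one-to-one mapping is easy to model on a PC with a plain array standing in
for the translation table. This is only a sketch: map_identity and translate
are hypothetical names, and the attribute bits passed in are the caller's
choice, not necessarily what cortex_a_mmu.c uses for DDR.

```c
#include <assert.h>
#include <stdint.h>

#define L1_ENTRIES 4096u   /* 4096 sections of 1MB cover the 4GB space */

/* Fill identity section entries for [start, start + size). */
static void map_identity(uint32_t *table, uint32_t start, uint32_t size,
                         uint32_t attr)
{
    uint32_t first = start >> 20;
    uint32_t count = size >> 20;

    assert((start & 0xFFFFFu) == 0 && (size & 0xFFFFFu) == 0);
    assert(first + count <= L1_ENTRIES);
    for (uint32_t i = first; i < first + count; i++)
        table[i] = (i << 20) | attr;   /* VA section i -> PA section i */
}

/* Model of a section lookup: PA = descriptor[31:20] followed by VA[19:0]. */
static uint32_t translate(const uint32_t *table, uint32_t va)
{
    uint32_t desc = table[va >> 20];
    return (desc & 0xFFF00000u) | (va & 0x000FFFFFu);
}
```

After map_identity(table, 0, 0x40000000u, attr), translate(table, 99) returns
99, which is exactly the property described above.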
ds-5/examples/DS-5Examples/startup_Cortex-A7/startup.s
1 ;==================================================================
2 ; Copyright ARM Ltd 2005-2014. All rights reserved.
3 ;
4 ; Cortex-A7 Embedded example - Startup Code
5 ;==================================================================
6
7
8 ; Standard definitions of mode bits and interrupt (I & F) flags in PSRs
9
10 Mode_USR EQU 0x10
11 Mode_FIQ EQU 0x11
12 Mode_IRQ EQU 0x12
13 Mode_SVC EQU 0x13
14 Mode_ABT EQU 0x17
15 Mode_UND EQU 0x1B
16 Mode_SYS EQU 0x1F
17
18 I_Bit EQU 0x80 ; When I bit is set, IRQ is disabled
19 F_Bit EQU 0x40 ; When F bit is set, FIQ is disabled
20
21
22 PRESERVE8
23 AREA VECTORS, CODE, READONLY ; Name this block of code
24
25 ENTRY
26
27 ;==================================================================
28 ; Entry point for the Reset handler
29 ;==================================================================
30
31 EXPORT Start
32
33 Start
34
35 ;==================================================================
36 ; Exception Vector Table
37 ;==================================================================
38 ; Note: LDR PC instructions are used here, though branch (B) instructions
39 ; could also be used, unless the exception handlers are >32MB away.
40
41 Vectors
42 LDR PC, Reset_Addr
43 LDR PC, Undefined_Addr
44 LDR PC, SVC_Addr
45 LDR PC, Prefetch_Addr
46 LDR PC, Abort_Addr
47 LDR PC, Hypervisor_Addr
48 LDR PC, IRQ_Addr
49 LDR PC, FIQ_Addr
50
51 Reset_Addr DCD Reset_Handler
52 Undefined_Addr DCD Undefined_Handler
53 SVC_Addr DCD SVC_Handler
54 Prefetch_Addr DCD Prefetch_Handler
55 Abort_Addr DCD Abort_Handler
56 Hypervisor_Addr DCD Hypervisor_Handler
57 IRQ_Addr DCD IRQ_Handler
58 FIQ_Addr DCD FIQ_Handler
59
60
61 ;==================================================================
62 ; Exception Handlers
63 ;==================================================================
64
65 Undefined_Handler
66 B Undefined_Handler
67 SVC_Handler
68 B SVC_Handler
69 Prefetch_Handler
70 B Prefetch_Handler
71 Abort_Handler
72 B Abort_Handler
73 Hypervisor_Handler
74 B Hypervisor_Handler
75 IRQ_Handler
76 B IRQ_Handler
77 FIQ_Handler
78 B FIQ_Handler
79
80
81 ;==================================================================
82 ; Reset Handler
83 ;==================================================================
84 Reset_Handler FUNCTION {}
85
86 ;==================================================================
87 ; Disable caches, MMU and branch prediction in case they were left enabled from an earlier run
88 ; This does not need to be done from a cold reset
89 ;==================================================================
90
91 MRC p15, 0, r0, c1, c0, 0 ; Read CP15 System Control register
92 BIC r0, r0, #(0x1 << 12) ; Clear I bit 12 to disable I Cache
93 BIC r0, r0, #(0x1 << 2) ; Clear C bit 2 to disable D Cache
94 BIC r0, r0, #0x1 ; Clear M bit 0 to disable MMU
95 BIC r0, r0, #(0x1 << 11) ; Clear Z bit 11 to disable branch prediction
96 MCR p15, 0, r0, c1, c0, 0 ; Write value back to CP15 System Control register
97
98 ; The MMU is enabled later, before calling main(). Caches and branch prediction are enabled inside main(),
99 ; after the MMU has been enabled and scatterloading has been performed.
100
101 ;===================================================================
102 ; ACTLR.SMP Enables coherent requests to the processor.
103 ; You must ensure this bit is set to 1 before the caches and MMU are enabled, or any cache and TLB maintenance operations are performed.
104 ;===================================================================
105 MRC p15, 0, r0, c1, c0, 1 ; Read CP15 ACTLR
106 ORR r0, r0, #(1 << 6) ; set ACTLR.SMP bit
107 MCR p15, 0, r0, c1, c0, 1 ; Write CP15 ACTLR
108
109 ;==================================================================
110 ; Invalidate Data and Instruction TLBs and branch predictor in case they were left enabled from an earlier run
111 ; This does not need to be done from a cold reset
112 ;==================================================================
113
114 MOV r0,#0
115 MCR p15, 0, r0, c8, c7, 0 ; I-TLB and D-TLB invalidation
116 MCR p15, 0, r0, c7, c5, 6 ; BPIALL - Invalidate entire branch predictor array
117
118 ;==================================================================
119 ; Initialize Supervisor Mode Stack
120 ; Note stack must be 8 byte aligned.
121 ;==================================================================
122
123 IMPORT ||Image$$ARM_LIB_STACK$$ZI$$Limit|| ; Linker symbol from scatter file
124 LDR SP, =||Image$$ARM_LIB_STACK$$ZI$$Limit||
125
126 ;===================================================================
127 ; Set Vector Base Address Register (VBAR) to point to this application's vector table
128 ;===================================================================
129
130 LDR r0, =Vectors
131 MCR p15, 0, r0, c12, c0, 0
132
133 ;==================================================================
134 ; Cache Invalidation code for Cortex-A7
135 ; NOTE: Neither Caches, nor MMU, nor BTB need post-reset invalidation on Cortex-A7,
136 ; but forcing a cache invalidation, makes the code more portable to other CPUs (e.g. Cortex-A9)
137 ;==================================================================
138 ; Invalidate L1 Instruction Cache
139 MRC p15, 1, r0, c0, c0, 1 ; Read Cache Level ID Register (CLIDR)
140 TST r0, #0x3 ; Harvard Cache?
141 MOV r0, #0 ; SBZ
142 MCRNE p15, 0, r0, c7, c5, 0 ; ICIALLU - Invalidate instruction cache and flush branch target cache
143
144 ; Invalidate Data/Unified Caches
145
146 MRC p15, 1, r0, c0, c0, 1 ; Read CLIDR
147 ANDS r3, r0, #0x07000000 ; Extract coherency level
148 MOV r3, r3, LSR #23 ; Total cache levels << 1
149 BEQ Finished ; If 0, no need to clean
150
151 MOV r10, #0 ; R10 holds current cache level << 1
152 Loop1 ADD r2, r10, r10, LSR #1 ; R2 holds cache "Set" position
153 MOV r1, r0, LSR r2 ; Bottom 3 bits are the Cache-type for this level
154 AND r1, r1, #7 ; Isolate those lower 3 bits
155 CMP r1, #2
156 BLT Skip ; No cache or only instruction cache at this level
157
158 MCR p15, 2, r10, c0, c0, 0 ; Write the Cache Size selection register
159 ISB ; ISB to sync the change to the CacheSizeID reg
160 MRC p15, 1, r1, c0, c0, 0 ; Reads current Cache Size ID register
161 AND r2, r1, #7 ; Extract the line length field
162 ADD r2, r2, #4 ; Add 4 for the line length offset (log2 16 bytes)
163 LDR r4, =0x3FF
164 ANDS r4, r4, r1, LSR #3 ; R4 is the max number on the way size (right aligned)
165 CLZ r5, r4 ; R5 is the bit position of the way size increment
166 LDR r7, =0x7FFF
167 ANDS r7, r7, r1, LSR #13 ; R7 is the max number of the index size (right aligned)
168
169 Loop2 MOV r9, r4 ; R9 working copy of the max way size (right aligned)
170
171 Loop3 ORR r11, r10, r9, LSL r5 ; Factor in the Way number and cache number into R11
172 ORR r11, r11, r7, LSL r2 ; Factor in the Set number
173 MCR p15, 0, r11, c7, c6, 2 ; Invalidate by Set/Way
174 SUBS r9, r9, #1 ; Decrement the Way number
175 BGE Loop3
176 SUBS r7, r7, #1 ; Decrement the Set number
177 BGE Loop2
178 Skip ADD r10, r10, #2 ; increment the cache number
179 CMP r3, r10
180 BGT Loop1
181
182 Finished
183
184
185 ;===================================================================
186 ; Cortex-A7 MMU Configuration
187 ; Set translation table base
188 ;===================================================================
189
190 IMPORT ||Image$$VECTORS$$Base|| ; From scatter file
191 IMPORT ||Image$$TTB$$ZI$$Base|| ; From scatter file
192
193 ; Cortex-A7 supports two translation tables
194 ; Configure translation table base (TTB) control register cp15,c2
195 ; to a value of all zeros, indicates we are using TTB register 0.
196
197 MOV r0,#0x0
198 MCR p15, 0, r0, c2, c0, 2
199
200 ; write the address of our page table base to TTB register 0
201 LDR r0,=||Image$$TTB$$ZI$$Base||
202 MOV r1, #0x08 ; RGN=b01 (outer cacheable write-back cached, write allocate)
203 ; S=0 (translation table walk to non-shared memory)
204 ORR r1,r1,#0x40 ; IRGN=b01 (inner cacheability for the translation table walk is Write-back Write-allocate)
205
206 ORR r0,r0,r1
207 MCR p15, 0, r0, c2, c0, 0
208
209
210 ;===================================================================
211 ; PAGE TABLE generation
212
213 ; Generate the page tables
214 ; Build a flat translation table for the whole address space.
215 ; ie: Create 4096 1MB sections from 0x000xxxxx to 0xFFFxxxxx
216
217
218 ; 31 20 19 18 17 16 15 14 12 11 10 9 8 5 4 3 2 1 0
219 ; |section base address| 0 0 |nG| S |AP2| TEX | AP | P | Domain | XN | C B | 1 0|
220 ;
221 ; Bits[31:20] - Top 12 bits of VA is pointer into table
222 ; nG[17]=0 - Non global, enables matching against ASID in the TLB when set.
223 ; S[16]=0 - Indicates normal memory is shared when set.
224 ; AP2[15]=0
225 ; AP[11:10]=11 - Configure for full read/write access in all modes
226 ; TEX[14:12]=000
227 ; CB[3:2]= 00 - Set attributes to Strongly-ordered memory.
228 ; (except for the code segment descriptor, see below)
229 ; IMPP[9]=0 - Ignored
230 ; Domain[5:8]=1111 - Set all pages to use domain 15
231 ; XN[4]=1 - Execute never on Strongly-ordered memory
232 ; Bits[1:0]=10 - Indicate entry is a 1MB section
233 ;===================================================================
234 LDR r0,=||Image$$TTB$$ZI$$Base||
235 LDR r1,=0xfff ; loop counter
236 LDR r2,=2_00000000000000000000110111100010
237
238 ; r0 contains the address of the translation table base
239 ; r1 is loop counter
240 ; r2 is level1 descriptor (bits 19:0)
241
242 ; use loop counter to create 4096 individual table entries.
243 ; this writes from address 'Image$$TTB$$ZI$$Base' +
244 ; offset 0x3FFC down to offset 0x0 in word steps (4 bytes)
245
246 init_ttb_1
247 ORR r3, r2, r1, LSL#20 ; R3 now contains full level1 descriptor to write
248 ORR r3, r3, #2_0000000010000 ; Set XN bit
249 STR r3, [r0, r1, LSL#2] ; Str table entry at TTB base + loopcount*4
250 SUBS r1, r1, #1 ; Decrement loop counter
251 BPL init_ttb_1
252
253 ; In this example, the 1MB section based at '||Image$$VECTORS$$Base||' is setup specially as cacheable (write back mode).
254 ; TEX[14:12]=001 and CB[3:2]= 11, Outer and inner write back, write allocate normal memory.
255 LDR r1,=||Image$$VECTORS$$Base|| ; Base physical address of code segment
256 LSR r1, #20 ; Shift right to align to 1MB boundaries
257 ORR r3, r2, r1, LSL#20 ; Setup the initial level1 descriptor again
258 ORR r3, r3, #2_0000000001100 ; Set CB bits
259 ORR r3, r3, #2_1000000000000 ; Set TEX bit 12
260 STR r3, [r0, r1, LSL#2] ; str table entry
261
262 ;===================================================================
263 ; Setup domain control register - Enable all domains to client mode
264 ;===================================================================
265
266 MRC p15, 0, r0, c3, c0, 0 ; Read Domain Access Control Register
267 LDR r0, =0x55555555 ; Initialize every domain entry to b01 (client)
268 MCR p15, 0, r0, c3, c0, 0 ; Write Domain Access Control Register
269
270 IF {TARGET_FEATURE_NEON} || {TARGET_FPU_VFP}
271 ;==================================================================
272 ; Enable access to NEON/VFP by enabling access to Coprocessors 10 and 11.
273 ; Enables Full Access i.e. in both privileged and non privileged modes
274 ;==================================================================
275
276 MRC p15, 0, r0, c1, c0, 2 ; Read Coprocessor Access Control Register (CPACR)
277 ORR r0, r0, #(0xF << 20) ; Enable access to CP 10 & 11
278 MCR p15, 0, r0, c1, c0, 2 ; Write Coprocessor Access Control Register (CPACR)
279 ISB
280
281 ;==================================================================
282 ; Switch on the VFP and NEON hardware
283 ;=================================================================
284
285 MOV r0, #0x40000000
286 VMSR FPEXC, r0 ; Write FPEXC register, EN bit set
287 ENDIF
288
289
290 ;===================================================================
291 ; Enable MMU and branch to __main
292 ; Leaving the caches disabled until after scatter loading.
293 ;===================================================================
294
295 IMPORT __main ; Before MMU enabled import label to __main
296
297 LDR r12,=__main ; Save this in register for possible long jump
298
299 MRC p15, 0, r0, c1, c0, 0 ; Read CP15 System Control register
300 BIC r0, r0, #(0x1 << 12) ; Clear I bit 12 to disable I Cache
301 BIC r0, r0, #(0x1 << 2) ; Clear C bit 2 to disable D Cache
302 BIC r0, r0, #0x2 ; Clear A bit 1 to disable strict alignment fault checking
303 ORR r0, r0, #0x1 ; Set M bit 0 to enable MMU before scatter loading
304 MCR p15, 0, r0, c1, c0, 0 ; Write CP15 System Control register
305
306 ; Now the MMU is enabled, virtual to physical address translations will occur. This will affect the next
307 ; instruction fetch.
308 ;
309 ; The two instructions currently in the ARM pipeline will have been fetched before the MMU was enabled.
310 ; The branch to __main is safe because the Virtual Address (VA) is the same as the Physical Address (PA)
311 ; (flat mapping) of this code that enables the MMU and performs the branch
312
313 BX r12 ; Branch to __main C library entry point
314
315 ENDFUNC
316
317
318
319 ;==================================================================
320 ; Enable caches and branch prediction
321 ; This code must be run from a privileged mode
322 ;==================================================================
323
324 AREA ENABLECACHES, CODE, READONLY
325
326 EXPORT enable_caches
327
328 enable_caches FUNCTION
329
330 ;==================================================================
331 ; Enable caches and branch prediction
332 ;==================================================================
333
334 MRC p15, 0, r0, c1, c0, 0 ; Read System Control Register
335 ORR r0, r0, #(0x1 << 12) ; Set I bit 12 to enable I Cache
336 ORR r0, r0, #(0x1 << 2) ; Set C bit 2 to enable D Cache
337 ORR r0, r0, #(0x1 << 11) ; Set Z bit 11 to enable branch prediction
338 MCR p15, 0, r0, c1, c0, 0 ; Write System Control Register
339
340 BX lr
341
342 ENDFUNC
343
344
345 END
https://github.com/descent/arm_os/blob/master/leeos/chapter3/part2/cortex_a_mmu.c
( https://goo.gl/aGJOXy ) is the final result after I consulted
ds-5/examples/DS-5Examples/startup_Cortex-A7/startup.s for the other attribute
settings. It differs somewhat from startup.s; please read both alongside the
manual to understand them. If you are willing to read this article, I trust the
manual will not defeat you.
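Unlike the book, which runs its programs in an emulator, real hardware has many
details to take care of; fortunately someone has already done that work. My
test starts from the uart: uart05.c is the uart code, which can print
characters to minicom over the uart. It was forked from https://goo.gl/foUPWu (
https://github.com/dwelch67/raspberrypi ). Following the book's idea, I
modified this program to add the MMU and then print characters through the
translated address.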
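In uart05.c lines 64 and 77, 0x3f215040 is the physical address of the register
the uart uses to send a character, and 0xc8215040 is its virtual address after
MMU translation. So once the MMU is enabled, you must write to 0xc8215040 for
characters to go out on the uart correctly.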
First-level descriptors for the uart mapping
0 (IO) ## pte_addr: 30703200, pte: 3f000de2
1 (IO) ## pte_addr: 30703204, pte: 3f100de2
2 (IO) ## pte_addr: 30703208, pte: 3f200de2
You can work out for yourself whether 0xc8215040 maps to 0x3f215040.
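If you want to check your answer, the calculation can be replayed on a PC with
the three descriptors listed above. This is a host-side sketch only; the array
and function names are made up, and it assumes entry i sits at table index
0xc80 + i, as the pte_addr values suggest.

```c
#include <assert.h>
#include <stdint.h>

/* The three IO descriptors from the listing; entry i is table index 0xc80 + i. */
static const uint32_t io_pte[3] = { 0x3f000de2u, 0x3f100de2u, 0x3f200de2u };

/* Translate a virtual address inside the 3MB IO window (hypothetical helper). */
static uint32_t uart_va_to_pa(uint32_t va)
{
    uint32_t idx = (va >> 20) - 0xc80u;    /* which of the three sections */

    assert(idx < 3u);                      /* only the listed entries exist */
    return (io_pte[idx] & 0xFFF00000u) | (va & 0x000FFFFFu);
}
```

For 0xc8215040 the index is c82, so entry 2 (3f200de2) is used and the low 20
bits 15040 are carried over unchanged.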
uart05.c
1
2
//