
cortex-a: floating-point operations in an interrupt corrupt the thread's floating-point registers #8043

Open: wants to merge 2 commits into base: master
libcpu/arm/cortex-a/context_gcc.S (8 changes: 6 additions, 2 deletions)
@@ -97,7 +97,9 @@ rt_hw_context_switch:
 tst r6, #(1<<30)
 beq 1f
 vstmdb sp!, {d0-d15}
-vstmdb sp!, {d16-d31}
+#ifdef RT_USING_VFPD32 /* there are 32 double-precision registers to save */
Member:
These cases should correspond to different compiler options at build time. Which option corresponds to which case?

Contributor Author:
The floating-point unit comes in several versions. VFPv2, VFPv3-D16 and VFPv4-D16 have only registers D0-D15, while VFPv3-D32 and VFPv4-D32 have D0-D31. The chip I have implements VFPv4-D16, so I can only test D0-D15. Other RTOSes (such as ThreadX and FreeRTOS) save D0-D31 at this point, and I assumed their ports are correct. So I use RT_USING_VFPD32 to denote the case where the FPU has D0-D31. I did not give much thought to the D0-D15-only case when writing the code (every variant has the lower 16 registers), although a macro such as RT_USING_VFPD16 could also be used for it.

Member:
It would be best if this could be controlled by GCC's built-in macros.

Member:
Well, what I mean is: in a setup like this, what is the compiler option at build time? With different compiler options, GCC's predefined macros may also differ. If they do, using those built-in macros directly to tell the cases apart would be the most convenient, instead of defining our own macro.

Contributor Author:
Yes, as follows:
-mfpu=name
This specifies what floating-point hardware (or hardware emulation) is available on the target. Permissible names are: ‘auto’, ‘vfpv2’, ‘vfpv3’, ‘vfpv3-fp16’, ‘vfpv3-d16’, ‘vfpv3-d16-fp16’, ‘vfpv3xd’, ‘vfpv3xd-fp16’, ‘neon-vfpv3’, ‘neon-fp16’, ‘vfpv4’, ‘vfpv4-d16’, ‘fpv4-sp-d16’, ‘neon-vfpv4’, ‘fpv5-d16’, ‘fpv5-sp-d16’, ‘fp-armv8’, ‘neon-fp-armv8’ and ‘crypto-neon-fp-armv8’. Note that ‘neon’ is an alias for ‘neon-vfpv3’ and ‘vfp’ is an alias for ‘vfpv2’.

The setting ‘auto’ is the default and is special. It causes the compiler to select the floating-point and Advanced SIMD instructions based on the settings of -mcpu and -march.

If the selected floating-point hardware includes the NEON extension (e.g. -mfpu=neon), note that floating-point operations are not generated by GCC’s auto-vectorization pass unless -funsafe-math-optimizations is also specified. This is because NEON hardware does not fully implement the IEEE 754 standard for floating-point arithmetic (in particular denormal values are treated as zero), so the use of NEON instructions may lead to a loss of precision.

You can also set the fpu name at function level by using the target("fpu=") function attributes (see ARM Function Attributes) or pragmas (see Function Specific Option Pragmas).

Contributor Author:
However, what I found earlier is that there is no way to make this -mfpu option visible to .S files.

Contributor:
You could dump GCC's predefined macros twice, once with each set of compiler options, and then compare the two files to see whether they differ.

+vstmdb sp!, {d16-d31} @ save the upper 16 registers
+#endif
 vmrs r5, fpscr
 stmfd sp!, {r5}
 1:
@@ -237,7 +239,9 @@ rt_hw_context_switch_exit:
 beq 1f
 ldmfd sp!, {r5}
 vmsr fpscr, r5
-vldmia sp!, {d16-d31}
+#ifdef RT_USING_VFPD32 /* there are 32 double-precision registers to restore */
+vldmia sp!, {d16-d31} @ restore the upper 16 registers
+#endif
 vldmia sp!, {d0-d15}
 1:
 #endif
libcpu/arm/cortex-a/start_gcc.S (21 changes: 19 additions, 2 deletions)
@@ -458,13 +458,30 @@ vector_irq:

 #else
 stmfd sp!, {r0-r12,lr}
+#ifdef RT_USING_FPU
Contributor:
Interrupt handlers normally save and restore the FPU state explicitly themselves. Otherwise, to guard against a low-probability event, the system pays a heavy price on every IRQ in the form of many extra memory accesses. That is not reasonable.

So I suggest adding an FPU_SAVE/RESTORE API or assembly macros, rather than modifying the IRQ handler.

Contributor Author:
Agreed, that would be more reasonable. The programmer knows whether an interrupt handler uses floating point, so protecting the FPU state explicitly is the sensible choice.

Contributor Author:
But this approach also seems to have a problem: when a thread is created, rt_hw_stack_init has to be told to reserve stack space for the floating-point unit while initializing the stack, so the parameters of the related interface functions might need to change.

+vstmdb sp!, {d0-d15}
+#ifdef RT_USING_VFPD32 /* there are 32 double-precision registers to save */
+vstmdb sp!, {d16-d31} @ save the upper 16 registers
+#endif
+vmrs r5, fpscr
+stmfd sp!, {r5}
+#endif

 bl rt_interrupt_enter
 bl rt_hw_trap_irq
 bl rt_interrupt_leave

-/* if rt_thread_switch_interrupt_flag set, jump to
- * rt_hw_context_switch_interrupt_do and don't return */
+#ifdef RT_USING_FPU
+ldmfd sp!, {r5}
+vmsr fpscr, r5
+#ifdef RT_USING_VFPD32 /* there are 32 double-precision registers to restore */
+vldmia sp!, {d16-d31} @ restore the upper 16 registers
+#endif
+vldmia sp!, {d0-d15}
+#endif

+@ if rt_thread_switch_interrupt_flag set, jump to
+@ rt_hw_context_switch_interrupt_do and don't return
 ldr r0, =rt_thread_switch_interrupt_flag
 ldr r1, [r0]
 cmp r1, #1