I agree it's probably XIP cache misses taking the time. Also, even if the compiler didn't know that 258 % 240 == 18 and actually used the hardware divider, for example because one or both operand values were unknown at compile time, it would emit a call to a routine that does the division. That routine has additional overhead for setting up the divider registers, preserving state in case an interrupt happens during the division, etc., so it would still take considerably more than 8 cycles.
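To illustrate the difference, here is a minimal sketch in plain C. The function names are made up for this example and aren't from the SDK. My understanding is that on a Cortex-M0+ target the runtime version turns into a call to an EABI helper such as `__aeabi_uidivmod`, which the pico-sdk routes to its divider routine, but check the disassembly of your own build rather than taking that as given.

```c
#include <stdint.h>

/* Both operands are constants, so the compiler evaluates this itself.
 * The static assertion proves the value is known at compile time. */
_Static_assert(258 % 240 == 18, "folded at compile time");

/* Hypothetical example: with optimization on, this typically compiles
 * to "load the constant 18 and return"; no division happens at run time. */
uint32_t mod_constant(void) {
    return 258u % 240u;
}

/* Hypothetical example: the operands are only known at run time, so the
 * compiler cannot fold the result. On a core with no divide instruction
 * it has to emit a call to a library routine, and that call carries the
 * setup and state-saving overhead described above. */
uint32_t mod_runtime(uint32_t n, uint32_t d) {
    return n % d;
}
```

Both functions return the same value for the same inputs, e.g. `mod_runtime(258, 240)` gives 18 just like `mod_constant()`. Only the cost of getting there differs.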
Edit: the code for the divide routine is here: https://github.com/raspberrypi/pico-sdk ... /divider.S
Statistics: Posted by alastairpatrick — Tue Feb 06, 2024 12:37 am