If your ARM Cortex-M has a DWT, you can use the cycle counter to measure the cycles spent executing code. That could be used for delay loops or to measure execution time.
Some ARM Cortex-M have a DWT (Data Watchpoint and Trace) unit implemented, and it has a nice feature in that unit which counts the execution cycles. The DWT is usually implemented on most Cortex-M3, M4 and M7 devices, including e.g. the NXP Kinetis or LPC devices.