Direct Memory Access (DMA) uses memory controllers separate from the CPU to accelerate data movement between memory locations, or between peripherals and memory. The RP2040 has 12 DMA channels, which can stream an aggregate of over 100 megabytes/sec, in many cases without affecting CPU performance.
There is a huge number of options available for setting up a DMA transfer. You can think of a DMA channel controller as a separate, programmable processor whose main job is moving data. Memory on the RP2040 is arranged as a bus matrix, with separate memory bus masters for each ARM core and for the DMA system, and several memory bus targets that the masters access. Each bus target can be accessed on every machine cycle, so different masters can reach different targets at the same time.
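To make the "separate, programmable processor" view concrete, here is a small host-side model in C of what one channel does once it is configured: move a count of fixed-size elements, optionally advancing the read and write addresses after each one. This is a sketch under stated assumptions: the struct and function names (`dma_model_channel`, `dma_model_run`) are hypothetical and do not follow the RP2040 register layout. On real hardware the same choices are made with pico-sdk calls such as `channel_config_set_transfer_data_size`, `channel_config_set_read_increment`, `channel_config_set_write_increment`, and `dma_channel_configure`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified software model of one DMA channel's settings.
 * Field names are illustrative only, not RP2040 register names. */
typedef struct {
    const uint8_t *read_addr;  /* source address */
    uint8_t *write_addr;       /* destination address */
    uint32_t transfer_count;   /* number of elements still to move */
    size_t data_size;          /* element size in bytes: 1, 2, or 4 */
    bool incr_read;            /* advance the source after each element? */
    bool incr_write;           /* advance the destination after each element? */
} dma_model_channel;

/* Run the channel to completion, one element per step. In hardware this
 * loop costs the CPU nothing; here it is ordinary C for illustration. */
static void dma_model_run(dma_model_channel *ch) {
    assert(ch->data_size == 1 || ch->data_size == 2 || ch->data_size == 4);
    while (ch->transfer_count > 0) {
        memcpy(ch->write_addr, ch->read_addr, ch->data_size);
        if (ch->incr_read) ch->read_addr += ch->data_size;
        if (ch->incr_write) ch->write_addr += ch->data_size;
        ch->transfer_count--;
    }
}
```

With both increments enabled the channel behaves like `memcpy`. Turning off the write increment makes every element land on one address, which is how a channel feeds a peripheral FIFO; turning off the read increment instead fills a buffer with one repeated value.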
A page for the Cornell University course ECE4760 explains the principles and has code demonstrating the DMA capability.
Here we use the DMA subsystem to build a complete computing system that runs independently of the main ARM CPUs. The DMA machine relies on memory copying, transport-triggered operations, and self-modifying code. The program consists of a sequence of DMA block descriptors stored in an array. The implemented operations are Turing-complete and run at about the speed of an Arduino: the machine fetches and executes about 8 million DMA blocks per second.
There is a history of building a general-purpose CPU from memory moves alone. In 2013 Stephen Dolan published "mov is Turing-complete", showing that the x86 mov instruction by itself forms a one-opcode machine. The paper "Run-DMA" by Michael Rushanan and Stephen Checkoway shows how to do this with one version of ARM DMA (on the Raspberry Pi 2). The RP2040 DMA system has more transport-triggered functions, which makes such a machine a little easier to build. Joseph Primmer and I built a DMA processor using the Microchip PIC32 DMA system; there, addition and branching had to be based on table lookup. See DMA Weird machine.
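The idea of a program made only of block copies can be illustrated with a host-side simulation in C. This is a sketch under stated assumptions, not the RP2040 implementation: the memory map, the 12-byte descriptor format (source, destination, byte count), and every name (`machine_run`, `dma_add`, `ADD_TABLE`, and so on) are hypothetical. It shows two of the ideas above: a descriptor array executed as code, and self-modifying code, where two copies write the operand bytes into the source-address field of a later descriptor so that a third copy reads the sum out of an addition table, the same table-lookup approach described for the PIC32 machine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical memory map of a flat, byte-addressed memory. */
#define MEM_SIZE  0x20000u
#define PC_ADDR   0x0000u  /* 4-byte cell: address of the next descriptor */
#define A_ADDR    0x0010u  /* operand a */
#define B_ADDR    0x0011u  /* operand b */
#define C_ADDR    0x0012u  /* result c = (a + b) mod 256 */
#define PROG      0x0100u  /* descriptor array, 12 bytes per descriptor */
#define ADD_TABLE 0x10000u /* 64 KiB: mem[ADD_TABLE + a*256 + b] = (a + b) & 0xFF */

static uint8_t mem[MEM_SIZE];

/* Little-endian 32-bit access, independent of host byte order. */
static uint32_t rd32(uint32_t a) {
    return (uint32_t)mem[a] | (uint32_t)mem[a + 1] << 8 |
           (uint32_t)mem[a + 2] << 16 | (uint32_t)mem[a + 3] << 24;
}
static void wr32(uint32_t a, uint32_t v) {
    for (int i = 0; i < 4; i++) mem[a + i] = (uint8_t)(v >> (8 * i));
}

/* Store descriptor i as three words: source, destination, byte count. */
static void put_desc(int i, uint32_t src, uint32_t dst, uint32_t count) {
    uint32_t d = PROG + 12u * (uint32_t)i;
    wr32(d, src);
    wr32(d + 4, dst);
    wr32(d + 8, count);
}

/* The only operation is "copy count bytes". Fetch the descriptor at PC,
 * advance PC, then copy. Since PC and descriptors live in mem, a copy
 * into PC_ADDR acts as a jump and a copy into a descriptor rewrites code.
 * A count of 0 halts. Returns the number of copies executed. */
static int machine_run(void) {
    int steps = 0;
    for (;;) {
        uint32_t pc = rd32(PC_ADDR);
        uint32_t src = rd32(pc), dst = rd32(pc + 4), n = rd32(pc + 8);
        if (n == 0) return steps;
        wr32(PC_ADDR, pc + 12);
        assert(src + n <= MEM_SIZE && dst + n <= MEM_SIZE);
        memmove(&mem[dst], &mem[src], n);
        steps++;
    }
}

/* Load the addition table and a three-copy program, then run it.
 * The "machine" itself performs no arithmetic, only copies. */
static uint8_t dma_add(uint8_t a, uint8_t b) {
    for (uint32_t x = 0; x < 256; x++)
        for (uint32_t y = 0; y < 256; y++)
            mem[ADD_TABLE + x * 256 + y] = (uint8_t)(x + y);
    mem[A_ADDR] = a;
    mem[B_ADDR] = b;
    uint32_t lookup_src = PROG + 12u * 2;  /* source field of descriptor 2 */
    put_desc(0, A_ADDR, lookup_src + 1, 1); /* byte 1 of that field = a */
    put_desc(1, B_ADDR, lookup_src + 0, 1); /* byte 0 of that field = b */
    put_desc(2, ADD_TABLE, C_ADDR, 1);      /* c = mem[ADD_TABLE + a*256 + b] */
    put_desc(3, 0, 0, 0);                   /* halt */
    wr32(PC_ADDR, PROG);
    int steps = machine_run();
    assert(steps == 3);
    return mem[C_ADDR];
}
```

For example, `dma_add(200, 100)` returns 44, the sum modulo 256. Because the program counter is just another memory cell here, a descriptor that copies an address into `PC_ADDR` is an unconditional jump; a conditional branch would reuse the addition trick, looking the jump target up in a table indexed by the condition value.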
See the details here.