Why memory configuration matters

The default Alif VS Code template divides memory equally between the two Cortex-M55 cores and allocates modest stack/heap sizes suitable for simple examples like Blinky. A MobileNetV2 model with ExecuTorch needs significantly more:

  • The embedded model is approximately 3.7 MB (stored in MRAM/flash).
  • The ExecuTorch runtime, operator libraries, and application code add another 800 KB of code.
  • Inference requires approximately 7.6 MB of SRAM for memory pools and intermediate tensors.

You need to reconfigure the MRAM allocation, stack/heap sizes, and linker script to fit this workload.

Edit the memory region configuration

Open device/ensemble/RTE/Device/AE822FA0E5597LS0_M55_HP/app_mem_regions.h.

Change the following values from their defaults:

| Define | Default value | New value | Purpose |
| --- | --- | --- | --- |
| APP_MRAM_HE_BASE | 0x80000000 | 0x80580000 | Move HE core out of the way |
| APP_MRAM_HE_SIZE | 0x00200000 | 0x00000000 | Give HE core zero MRAM |
| APP_MRAM_HP_BASE | 0x80200000 | 0x80000000 | HP core starts at MRAM base |
| APP_MRAM_HP_SIZE | 0x00200000 | 0x00580000 | HP core gets full 5.5 MB |
| APP_HP_STACK_SIZE | 0x00002000 | 0x00004000 | 16 KB stack (doubled) |
| APP_HP_HEAP_SIZE | 0x00004000 | 0x00010000 | 64 KB heap (quadrupled) |

The default template splits MRAM 2 MB / 2 MB between the two cores. Since you’re only using the HP core, you give it the entire 5.5 MB of available MRAM. The increased stack and heap accommodate ExecuTorch’s initialization code, which uses more stack depth and a few small dynamic allocations.
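If you want the compiler to catch a mistyped value, a few compile-time checks can help. The following is a minimal sketch, not part of the Alif template: it repeats the new values from the table so that it compiles on its own, and the helper name hp_mram_end is hypothetical. In a real project you would include app_mem_regions.h instead of redefining the macros.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* New values from the table above, repeated here so the sketch
 * compiles standalone. In the project they live in app_mem_regions.h. */
#define APP_MRAM_HE_BASE   0x80580000u
#define APP_MRAM_HE_SIZE   0x00000000u
#define APP_MRAM_HP_BASE   0x80000000u
#define APP_MRAM_HP_SIZE   0x00580000u
#define APP_HP_STACK_SIZE  0x00004000u
#define APP_HP_HEAP_SIZE   0x00010000u

/* Hypothetical helper: first address past the HP core's MRAM window. */
static inline uint32_t hp_mram_end(void)
{
    return APP_MRAM_HP_BASE + APP_MRAM_HP_SIZE;
}

/* 5.5 MB = 5632 KB. */
static_assert(APP_MRAM_HP_SIZE == 5632u * 1024u,
              "HP core should own 5.5 MB of MRAM");

/* The HP window must end at or before the (empty) HE window. */
static_assert(APP_MRAM_HP_BASE + APP_MRAM_HP_SIZE <= APP_MRAM_HE_BASE,
              "HP and HE MRAM windows must not overlap");

static_assert(APP_HP_STACK_SIZE == 16u * 1024u, "stack should be 16 KB");
static_assert(APP_HP_HEAP_SIZE == 64u * 1024u, "heap should be 64 KB");
```

If any value is wrong, the build stops with the matching message instead of producing an image that fails at runtime.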

Edit the linker script

Open device/ensemble/RTE/Device/AE822FA0E5597LS0_M55_HP/linker_gnu_mram.ld.src.

Make the following three changes to this file.

Add SRAM1 to the zero-initialization table

The application code places the 4 MB planned memory pool in SRAM1. The C runtime startup code needs to zero-initialize this region. Find the .zero.table section:

    

        
        
#if __HAS_BULK_SRAM
    LONG (ADDR(.bss.at_sram0))
    LONG (SIZEOF(.bss.at_sram0)/4)
#endif

    

Add two lines for SRAM1 immediately after the SRAM0 entries, inside the same #if block:

    

        
        
#if __HAS_BULK_SRAM
    LONG (ADDR(.bss.at_sram0))
    LONG (SIZEOF(.bss.at_sram0)/4)
    LONG (ADDR(.bss.at_sram1))
    LONG (SIZEOF(.bss.at_sram1)/4)
#endif
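To see why each entry is an address followed by a size divided by 4, it helps to picture how startup code typically consumes the table. The following is a host-runnable sketch modeled on the common CMSIS-style approach, where each entry is a destination pointer and a length in 32-bit words. The type and function names are hypothetical, and the actual Alif startup code may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* One .zero.table entry: start address and length in 32-bit words.
 * This mirrors the LONG(ADDR(...)) / LONG(SIZEOF(...)/4) pair in the
 * linker script. On the target both fields are 32 bits wide; on a
 * 64-bit host the pointer is wider, which does not matter here. */
typedef struct {
    uint32_t *dest;
    uint32_t  wlen;
} zero_entry_t;

/* Walk the table and clear every listed region word by word. */
static void zero_regions(const zero_entry_t *table, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        for (uint32_t w = 0; w < table[i].wlen; ++w) {
            table[i].dest[w] = 0u;
        }
    }
}
```

A region that has no entry in the table is never visited by this loop, so a NOLOAD section such as .bss.at_sram1 would keep whatever the SRAM held at power-up.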

    

Add GOT sections to the data copy table

The precompiled ExecuTorch libraries use position-independent code (PIC), which relies on a Global Offset Table (GOT). The startup code must copy the GOT from flash to RAM; otherwise the table contains zeros and every indirect function call (including C++ vtable lookups) crashes with a BusFault.

Find the .data.at_dtcm section:

    

        
        
  .data.at_dtcm : ALIGN(8)
  {
    *(vtable)
    *(.data)
    *(.data*)
    *arm_common_tables*(.data* .rodata*)

    KEEP(*(.jcr*))

    . = ALIGN(8);

    

Add the GOT entries after KEEP(*(.jcr*)):

    

        
        
  .data.at_dtcm : ALIGN(8)
  {
    *(vtable)
    *(.data)
    *(.data*)
    *arm_common_tables*(.data* .rodata*)

    KEEP(*(.jcr*))

    /* GOT for PIC code in precompiled ExecuTorch libraries */
    *(.got)
    *(.got.plt)

    . = ALIGN(8);

    
Note

This issue can be difficult to diagnose. Without these two lines, the firmware boots and loads the model, but crashes with a BusFault when ExecuTorch calls a virtual function. The GOT stores addresses for indirect calls. If the startup code doesn’t copy it from flash to RAM, those lookups resolve to address zero and the CPU faults.
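The failure mode can be illustrated on a host machine with a simplified model. In this sketch, one array stands in for the GOT's load image in flash and another for its runtime copy in RAM. All names are hypothetical, and a real GOT holds data addresses as well as function addresses. The guard returns -1 where the real firmware would fault.

```c
#include <stddef.h>
#include <string.h>

typedef int (*handler_fn)(void);

/* Hypothetical function reached only through the table. */
static int classify(void)
{
    return 42;
}

/* Stand-in for the GOT's load image in flash (read-only). */
static const handler_fn got_flash_image[1] = { classify };

/* Stand-in for the GOT's runtime location in RAM.
 * Static storage starts out zeroed, like an uncopied GOT. */
static handler_fn got_ram[1];

/* What the startup copy does once .got is part of the copied section. */
static void copy_got_to_ram(void)
{
    memcpy(got_ram, got_flash_image, sizeof got_ram);
}

/* Indirect call through the RAM table. Returns -1 for an empty slot;
 * on the target the branch to address zero would fault instead. */
static int call_through_got(void)
{
    if (got_ram[0] == NULL) {
        return -1;
    }
    return got_ram[0]();
}
```

Calling call_through_got() before copy_got_to_ram() takes the empty-slot path; calling it afterwards reaches classify().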

Add SRAM section wildcards

The application code uses __attribute__((section(".bss.at_sram0"))) to place memory pools in SRAM. The stock linker script only has specific named sections for LCD and camera buffers. You need wildcard patterns to catch the ExecuTorch pools.
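As an illustration of what such declarations might look like, here is a minimal sketch. The buffer names and the IN_SECTION macro are hypothetical, and the pool sizes are taken from the memory layout summarized later in this section. The input buffer assumes a 1x3x224x224 float32 tensor, which matches the ~588 KB figure. The macro applies the attribute only on Arm builds so the file also compiles on a host.

```c
#include <stddef.h>
#include <stdint.h>

/* Apply the section attribute only when building for an Arm target.
 * On a host build the macro expands to nothing. */
#if defined(__arm__)
#define IN_SECTION(name) __attribute__((section(name), aligned(16)))
#else
#define IN_SECTION(name)
#endif

#define KIB(n) ((size_t)(n) * 1024u)
#define MIB(n) (KIB(n) * 1024u)

/* Hypothetical pools placed in SRAM0. */
static uint8_t method_pool[KIB(1536)] IN_SECTION(".bss.at_sram0");
static uint8_t temp_pool[KIB(1536)]   IN_SECTION(".bss.at_sram0");

/* Assumed input shape: 3 channels x 224 x 224, float32. */
static float input_buf[3 * 224 * 224] IN_SECTION(".bss.at_sram0");

/* Hypothetical planned memory pool placed in SRAM1. */
static uint8_t planned_pool[MIB(4)] IN_SECTION(".bss.at_sram1");

/* Bytes these buffers claim in each bank. */
static size_t sram0_bytes_used(void)
{
    return sizeof method_pool + sizeof temp_pool + sizeof input_buf;
}

static size_t sram1_bytes_used(void)
{
    return sizeof planned_pool;
}
```

Under these assumptions the SRAM0 buffers total 3,747,840 bytes, which fits in the 4 MB bank, and the planned pool fills SRAM1 exactly.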

Find the .bss.at_sram0 section:

    

        
        
  .bss.at_sram0 (NOLOAD) : ALIGN(8)
  {
    *(.bss.lcd_crop_and_interpolate_buf)
    *(.bss.lcd_frame_buf)
    *(.bss.camera_frame_buf)
    *(.bss.camera_frame_bayer_to_rgb_buf)
  } > SRAM0
#endif

    

Replace it with expanded SRAM0 wildcards and a new SRAM1 section:

    

        
        
  .bss.at_sram0 (NOLOAD) : ALIGN(8)
  {
    *(.bss.lcd_crop_and_interpolate_buf)
    *(.bss.lcd_frame_buf)
    *(.bss.camera_frame_buf)
    *(.bss.camera_frame_bayer_to_rgb_buf)
    *(.bss.at_sram0)
    *(.bss.at_sram0.*)
  } > SRAM0

  .bss.at_sram1 (NOLOAD) : ALIGN(8)
  {
    *(.bss.at_sram1)
    *(.bss.at_sram1.*)
  } > SRAM1
#endif

    

After these changes, the memory layout is:

| Region | Size | Usage |
| --- | --- | --- |
| MRAM | 5.5 MB | Code + model (~4.5 MB used) |
| ITCM | 256 KB | Fast code (~89% used) |
| DTCM | 1 MB | Stack (16 KB) + heap (64 KB) + GOT + data |
| SRAM0 | 4 MB | Method pool (1.5 MB) + temp pool (1.5 MB) + float input buffer (~588 KB) |
| SRAM1 | 4 MB | Planned memory buffers |

Configure the flash settings

The Security Toolkit needs a JSON configuration file that tells it where to load the binary in MRAM and which CPU should boot it.

Open (or create) .alif/M55_HP_cfg.json and set its contents to:

    

        
        
{
  "DEVICE": {
    "disabled" : false,
    "binary": "app-device-config.json",
    "version" : "0.5.00",
    "signed": true
  },
  "USER_APP": {
    "binary": "alif-img.bin",
    "mramAddress": "0x80000000",
    "version": "1.0.0",
    "cpu_id": "M55_HP",
    "flags": ["boot"],
    "signed": false
  }
}

    

The key fields are:

  • mramAddress: must match APP_MRAM_HP_BASE (0x80000000) from app_mem_regions.h.
  • cpu_id: M55_HP tells the bootloader to start the High-Performance core.
  • flags: ["boot"]: marks this application as the boot image.
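Because mramAddress and APP_MRAM_HP_BASE live in different files, they can drift apart. The following host-side sketch shows one way to compare the two. The function name is hypothetical, and the macro is repeated here only so the example compiles on its own.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Same value as in app_mem_regions.h, repeated for a standalone build. */
#define APP_MRAM_HP_BASE 0x80000000u

/* Returns true if the JSON string (for example "0x80000000") parses
 * completely as hexadecimal and equals APP_MRAM_HP_BASE. */
static bool mram_address_matches(const char *json_value)
{
    char *end = NULL;
    unsigned long parsed = strtoul(json_value, &end, 16);

    return end != json_value && *end == '\0' && parsed == APP_MRAM_HP_BASE;
}
```

For example, mram_address_matches("0x80000000") returns true, while the template's old HP base "0x80200000" returns false.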

You can view the completed versions of these edited files in the workshop repository for reference.

The memory layout and flash configuration are complete.

What you’ve learned and what’s next

You’ve reconfigured the memory regions to allocate the full 5.5 MB of MRAM to the HP core, modified the linker script to support GOT relocation and SRAM memory pools, and configured the Security Toolkit JSON file to boot the application from the correct MRAM address.

Next, you’ll prepare a test image for classification by converting it to the format the model expects.
