Memory, memory, always memory !!!

One of the things that reminds me of programming the computers of my childhood is how tight memory is on the ESP8266. Whether you use C++ with Arduino or JavaScript with Espruino, it’s a problem you are likely to meet sooner rather than later. Here I describe how I deal with it.

  • SRAM: F()/Progmem, define/const, cost of libraries,
  • DRAM: memory leaks, Stack, Heap, fragmentation, Strings, STL, etc.

Memory, Arduino and C++

I check my programs in the following steps:

Memory Leaks

Memory leaks are usually easy to spot and solve: the available memory keeps decreasing even though your program is not supposed to use any new memory.

  1. Find all memory allocation instructions (new/malloc) and verify that each one has a matching delete/free.
  2. If there are still leaks, I observe the free memory across iterations of a loop once the program has loaded everything. For example, does the server request/reply loop always start with the same amount of memory? I then disable components of the program one at a time to isolate the leak and see if it is still there.

On the ESP8266, you can keep an eye on the free DRAM heap with:

Serial.println(ESP.getFreeHeap(),DEC);
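
Step 2 above can be sketched as a small helper that remembers the first free-heap reading and reports how far later readings drift from it. This is only a minimal sketch: HeapWatch and update() are names I made up for illustration, and the class takes the reading as a plain number so that it does not depend on any board API.

```cpp
#include <cstdint>

// Hypothetical helper: remembers the free heap seen on the first call and
// reports how far later readings have drifted from that baseline.
class HeapWatch {
 public:
  // freeNow: current free heap in bytes, e.g. ESP.getFreeHeap() on the ESP8266.
  // Returns the bytes lost since the baseline (positive = memory went missing).
  int32_t update(uint32_t freeNow) {
    if (!hasBaseline_) {
      baseline_ = freeNow;
      hasBaseline_ = true;
    }
    return static_cast<int32_t>(baseline_) - static_cast<int32_t>(freeNow);
  }

 private:
  uint32_t baseline_ = 0;
  bool hasBaseline_ = false;
};
```

Called once per loop iteration with ESP.getFreeHeap(), a result that keeps growing over many iterations suggests a leak, while a value that returns to 0 suggests temporary allocations only.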

SRAM : globals/defines, F()/Progmem & libraries

We have little control over SRAM usage, which is the initial memory use. As long as you use the F function for your constant char strings and declare few or no global variables, you are OK. The rest mainly depends on the libraries included. See also this article on how to optimize your SRAM usage.

The SRAM is the memory filled when the program starts. The program itself is executed from the flash, but all global variables are stored in SRAM. In the Arduino IDE, the flash use is reported as "program storage space" and the SRAM use as "Global variables".

Sketch uses 297240 bytes (28%) of program storage space. Maximum is 1044464 bytes.
Global variables use 27432 bytes (33%) of dynamic memory, leaving 54488 bytes for local variables. Maximum is 81920 bytes.

a) What SRAM shows

It’s a bit tricky to know what is counted in SRAM. The toolchain only keeps the code and data that are actually referenced, so don’t expect an "#include" alone to add a whole library to SRAM: the cost only appears once your code calls the functions that use it.

b) Globals vs Defines

To declare constants, we have the choice between global const variables and defines. The compiler does some optimizations that blur the difference. Mainly, the difference is that global const variables may use memory if declared without PROGMEM, depending on the compiler. For char strings they always do, but for other variables it depends. Defines take memory each time they are used, so they may end up being more expensive than a const variable, unless the F function is used.

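// Example comparing define and global
#define MYSTRINGDEF "hello"
#define MYINTDEF 156
// global const equivalents (names chosen for illustration)
const char MYSTRINGCONST[] = "hello";
const int MYINTCONST = 156;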

I am not sure yet, but it seems that a single char or int value does not take any SRAM and is stored in flash.

String str="";
str+=' ';// this does not take SRAM
str+=" ";// this uses SRAM (it may not show clearly, as the compiler may group several literals in one block)

c) String literals and F function (and PROGMEM)

Any string used in your program requires two memory slots, one for the char sequence and another for the pointer to this sequence. I will spare you the details about the Harvard architecture of microcontrollers, but in practice this means that when the program starts, all the char sequences are loaded from flash into memory to be addressable, as they cannot be addressed directly in flash. The consequence is that every string literal consumes SRAM, unless you use PROGMEM or the F function.

The F function is a macro that tells the compiler to leave the string in flash and load it on demand. Serial.println and String understand it, but it does not behave like a normal char *. It can be converted to a normal char * by going through a String and calling c_str(). Here is an example:

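Serial.println("hello world");    // the literal takes SRAM
Serial.println(F("hello world")); // the literal stays in flash
String str1="hello";              // the literal takes SRAM
String str2=F("hello");           // the literal stays in flash (the String still copies it to the heap)

String tmp=String(F("hello"));    // keep the String alive while the pointer is in use
const char *cptr=tmp.c_str();     // convert to a normal char *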

The F function cannot be used in a global context; there you have to use PROGMEM, as shown here. The F() macro expands to the following expression:

#define F(string_literal) (reinterpret_cast<const __FlashStringHelper *>(((__extension__({static const char __c[] PROGMEM = ((string_literal)); &__c[0];})))))
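
For a string declared at global scope, a PROGMEM version might look like the sketch below. The names kGreeting and loadGreeting are made up for illustration, and the #else branch is only my assumption of a convenient fallback so that the same file also compiles on a PC, where there is no separate flash address space.

```cpp
#if defined(ARDUINO)
#include <Arduino.h>       // provides PROGMEM and strncpy_P
#else
#include <string.h>        // host fallback so the sketch also builds on a PC
#define PROGMEM
#define strncpy_P strncpy
#endif
#include <stddef.h>

// Global string kept in flash instead of SRAM (kGreeting is a made-up name).
const char kGreeting[] PROGMEM = "hello world";

// Copies the flash string into a caller-provided RAM buffer of n bytes,
// truncating if the buffer is too small.
void loadGreeting(char* out, size_t n) {
  if (n == 0) return;
  strncpy_P(out, kGreeting, n - 1);
  out[n - 1] = '\0';  // strncpy does not terminate when the source is longer
}
```

The RAM cost is then limited to the buffer you pass in, and only for as long as that buffer lives.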

d) Error "section type conflict"

If you use the F function, you may run, like I did, into this error:

error: __c causes a section type conflict with __c

I found an article explaining why, and suggesting the following way to circumvent it. As far as I understand, the number of variables sharing the same section attribute is limited. The workaround is to give the variables a different section attribute, whose name has to start with ".irom.text.", so that the limit is not reached. This has to be done in each header.

#define FX(string_literal) (reinterpret_cast<const __FlashStringHelper *>(((__extension__({static const char __c[] __attribute__((section(".irom.text.myheader"))) = ((string_literal)); &__c[0];})))))

In the following example, I redefine only the attribute inside each header. I use RF to make a normal char* out of it.

// general level
#define FX(string_literal) (reinterpret_cast<const __FlashStringHelper *>(((__extension__({static const char __c[] __attribute__((section(FTEMPLATE))) = ((string_literal)); &__c[0];})))))
// RF yields a char* that is only valid until the end of the enclosing statement
#define RF(x) String(FX(x)).c_str()
// inside each header
#undef FTEMPLATE
#define FTEMPLATE ".irom.text.myheader"

e) SRAM from libraries

Your main SRAM usage should come from the included libraries. I found this article useful for this topic. I tested a few ESP8266 libraries to see how much SRAM they take:

  • Serial : begin : 72 bytes, println : 432 bytes
  • String : constructor : 20 bytes
  • digitalRead/digitalWrite: 420 bytes
  • Ticker.h : attach_ms : 56 bytes
  • WiFi : begin : 12 bytes
  • ESP8266WebServer.h : 684 bytes
  • Wire: begin 24 bytes
  • SPIFFS : open : 688 bytes
  • std::vector : insert : 72 bytes
  • std::string : find : 72 bytes
  • std::map: empty instance : 256 bytes
  • Math.h : log : 12 bytes, exp : 52 bytes

These numbers show that the top SRAM users are SPIFFS (688 bytes), ESP8266WebServer.h (684 bytes), Serial.println (432 bytes) and digitalRead/digitalWrite (420 bytes). I am confident that this use is justified in these cases. There is nothing to do about it, but it is interesting to know. It is not true of every library, though: for the non-blocking DHT sensor library I use, I had to change a few source files to wrap the error message string literals in the F function.

 

DRAM : the Stack and the Heap

The DRAM is shared by two types of memory. One is the Stack, which stores arguments when functions are called and local variables when they are declared, and is freed when the functions return. The stack does not involve much management, except not declaring local variables that are too large and not recursing too deep. The other type is the Heap, which is used for every dynamic allocation made with new or malloc. Memory allocated on the heap is not freed when functions return and needs to be freed manually using delete or free. This is the type of memory that requires management and that creates leaks (see above).
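
To make the difference concrete, here is a minimal sketch with two made-up functions, sumOnStack and sumOnHeap. It only illustrates the lifetime rules described above and is not taken from my project.

```cpp
#include <cstddef>

// Stack: 'buf' is released automatically when the function returns.
int sumOnStack() {
  int buf[4] = {1, 2, 3, 4};
  int total = 0;
  for (int v : buf) total += v;
  return total;
}

// Heap with manual management: every new[] needs a matching delete[].
int sumOnHeap(std::size_t n) {
  int* buf = new int[n];
  int total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    buf[i] = static_cast<int>(i + 1);
    total += buf[i];
  }
  delete[] buf;  // forgetting this line is exactly the kind of leak described earlier
  return total;
}
```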

To keep an eye on the available heap memory I use the following. Be aware that stack use also decreases the available free heap, so this instruction reflects the usage of both Stack and Heap:

Serial.println(ESP.getFreeHeap(),DEC);

I regularly check the DRAM use of my program to see if it is as expected. I monitor how much DRAM my code uses, and how much the libraries use (in addition to SRAM). For instance, it allowed me to spot that my std::map<std::string, std::string> with 6 entries may use as much as 1 kB per map, which became one focus of my attempts to optimize memory use (see below).

a) Memory blocks minimal size of 16 bytes (and alignment)

The ESP8266 heap is only usable in blocks of 16 bytes. If you allocate 20 bytes, you’ll get 32 bytes (2 blocks). This matters if you plan to allocate small structures (see the next topic about fragmentation). The memory is also 4-byte aligned, which means that each structure/object takes a size that is a multiple of 4, unless the "packed" keyword is specified. If you use the packed keyword and unaligned structures, expect processing to be slower.
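
The rounding and the padding can be checked with a few lines of code. This is a sketch under the assumptions stated in the text (16-byte blocks, allocator bookkeeping ignored); allocatedSize, Sample and PackedSample are names I made up, and the packed attribute is the GCC/Clang spelling.

```cpp
#include <cstddef>
#include <cstdint>

// Heap block size assumed from the text above: 16 bytes on the ESP8266.
constexpr std::size_t kBlock = 16;

// Bytes actually consumed by a request of n bytes under that assumption
// (allocator bookkeeping overhead is ignored).
constexpr std::size_t allocatedSize(std::size_t n) {
  return ((n + kBlock - 1) / kBlock) * kBlock;
}

struct Sample {  // naturally aligned: 3 padding bytes follow 'flag'
  uint8_t flag;
  uint32_t value;
};

struct __attribute__((packed)) PackedSample {  // no padding, unaligned 'value'
  uint8_t flag;
  uint32_t value;
};

static_assert(allocatedSize(20) == 32, "20 bytes occupy two 16-byte blocks");
static_assert(sizeof(Sample) == 8, "1 + 3 padding + 4");
static_assert(sizeof(PackedSample) == 5, "1 + 4, no padding");
```

Note that under the 16-byte block assumption both structures would still cost one full block when allocated individually on the heap, so packing mostly pays off inside arrays or larger structures.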

b) Fragmentation

Allocating small structures may also create heap fragmentation, when many allocated small blocks are so dispersed in memory that they prevent allocating large blocks: even if the memory is available, it is unusably fragmented. I have not come across such a situation yet, but if you do, use memory pools to always allocate all your small objects close to each other in the same place.
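
A memory pool can be as simple as the fixed-size block pool sketched below. BlockPool is a hypothetical class written for this article, not a library type: all blocks live in one array inside the object, and free blocks are chained into a list, so small objects never scatter across the heap.

```cpp
#include <cstddef>

// Minimal fixed-size block pool (hypothetical). Each slot is either a link in
// the free list or BlockSize bytes of user data, never both at once.
template <std::size_t BlockSize, std::size_t BlockCount>
class BlockPool {
  static_assert(BlockCount > 0, "the pool needs at least one block");

 public:
  BlockPool() {
    // Thread every slot onto the free list.
    for (std::size_t i = 0; i < BlockCount; ++i) {
      slots_[i].next = (i + 1 < BlockCount) ? &slots_[i + 1] : nullptr;
    }
    free_ = &slots_[0];
  }

  // Returns a block of BlockSize bytes, or nullptr when the pool is exhausted.
  void* allocate() {
    if (free_ == nullptr) return nullptr;
    Slot* s = free_;
    free_ = s->next;
    return s;
  }

  // Gives a block obtained from allocate() back to the pool.
  void release(void* p) {
    if (p == nullptr) return;
    Slot* s = static_cast<Slot*>(p);
    s->next = free_;
    free_ = s;
  }

 private:
  union Slot {
    Slot* next;
    alignas(std::max_align_t) unsigned char bytes[BlockSize];
  };
  Slot slots_[BlockCount];
  Slot* free_;
};
```

Declaring the pool as a global (for example BlockPool<16, 32>) reserves its memory once at startup. The price is that unused blocks stay reserved, and each slot is rounded up to the platform's maximum alignment.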

c) Arduino String vs. Stl std::string

Strings are among the biggest memory consumers. Not only do literals require the F function to keep them out of memory, but both versions of the String object themselves consume a lot of heap memory. The main difference between the two is the growth rate.

These aspects are implementation dependent. The standard implementations of Arduino String and STL std::string for Arduino behave as follows:

  • When created empty, both objects take 32 bytes. The starting size is so big because, I think, there is an optimization for small strings: an internal fixed buffer that can live on the stack is used instead of the heap, which prevents fragmentation.
  • When started from a string literal, they take less memory.
  • The difference between the two appears when you start adding characters. Each allocator has a different strategy:
    • Arduino String just allocates an additional block (16 bytes on the ESP8266), minimizing the final memory usage.
    • STL std::string doubles the size each time, a strategy meant to minimize copying: 32, 64, 128, 256, 512, 1024, etc. If you use them a lot on Arduino like me, be careful that they don’t get out of hand, e.g. a large static string that is slightly above an allocation threshold: a 530-byte string takes 1024 bytes.
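
Since these numbers depend on the library implementation, it is worth measuring the growth on your own toolchain. The sketch below uses a made-up function, capacitySteps, that appends one character at a time to a std::string and records every capacity it passes through. I deliberately give no expected values, because they differ between standard libraries.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Appends one character at a time and records each new capacity, so the
// growth strategy of this particular std::string implementation shows up.
std::vector<std::size_t> capacitySteps(std::size_t finalLength) {
  std::string s;
  std::vector<std::size_t> steps{s.capacity()};
  for (std::size_t i = 0; i < finalLength; ++i) {
    s.push_back('x');
    if (s.capacity() != steps.back()) steps.push_back(s.capacity());
  }
  return steps;
}
```

Printing the result of capacitySteps(600) shows how much memory a 530-byte string really reserves. If you know the final length in advance, calling reserve() once on the std::string may avoid the intermediate steps.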

d) The problem with Strings, especially small ones

If you use a lot of short strings, you’ll end up with a lot of wasted memory. E.g. a table of 50 strings of 4 chars takes 50×32=1600 bytes when the data only uses 4×50=200 bytes. The starting heap is typically 45 kB, so it holds fewer than 30 such tables.

The solution? Don’t use strings en masse, or store char * and do the memory management yourself. I searched a lot for optimizations, but the search results are dominated by people who optimize for speed and for avoiding fragmentation. If you have a suggestion, I’ll be glad to read it, please comment below. I may have to write my own economical string that uses less memory. In theory, you only need 6 extra bytes to fit any 4-byte aligned data.
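
One way to do the memory management yourself, when the strings are known to be short, is a fixed-width table. The sketch below is an assumption-laden example: ShortTable, kWidth and kEntries are made-up names, and every entry is limited to 4 characters, matching the 50-strings example above.

```cpp
#include <cstddef>
#include <cstring>

// Fixed-width storage for short keys (hypothetical layout): every entry is
// kWidth chars plus a terminator, stored inline with no per-string heap block.
constexpr std::size_t kWidth = 4;
constexpr std::size_t kEntries = 50;

struct ShortTable {
  char data[kEntries][kWidth + 1] = {};

  // Stores at most kWidth characters of text in slot i; longer input is cut.
  void set(std::size_t i, const char* text) {
    if (i >= kEntries) return;
    std::strncpy(data[i], text, kWidth);
    data[i][kWidth] = '\0';
  }

  // Returns the text in slot i, or an empty string for an invalid index.
  const char* get(std::size_t i) const { return i < kEntries ? data[i] : ""; }
};

static_assert(sizeof(ShortTable) == kEntries * (kWidth + 1), "250 bytes in total");
```

The whole table takes 250 bytes instead of 1600, at the cost of a hard limit on the string length.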

e) Stl std::map & std::vector… of String/std::string

STL maps and vectors are also a bit expensive to use. Even empty, they take 64 bytes for maps and 24 bytes for vectors. Coupled with strings, it is a proper memory nightmare. A map of 5 entries uses 10 short strings and may take up to 1 kB. My nightmare became reality when I wanted to make tree maps of strings (maps of maps of strings).

I designed my code to be able to compare different options for map and vector storage, especially of strings. Until now I used an STL implementation, but I switched to a more economical alternative: to minimize memory, I sacrifice speed, store everything in one string and use string search to find my key again. For the map I made this SSMap & GenSSMap object source code. I have plans for vectors that I have not carried out yet. For tree maps, I already have this code.
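
To give an idea of the "everything in one string" approach, here is a minimal sketch. It is not the SSMap source code: FlatMap is a made-up class, and the ";key=value;" encoding is my assumption for the example, which only works if keys and values never contain ';' or '='.

```cpp
#include <cstddef>
#include <string>

// Hypothetical flat map: all pairs live in one string as ";key=value;key=value;".
// Assumes keys and values never contain ';' or '='.
class FlatMap {
 public:
  // Inserts the pair, replacing any previous value stored under the same key.
  void set(const std::string& key, const std::string& value) {
    erase(key);
    data_ += key + "=" + value + ";";
  }

  // Returns true and fills 'out' when the key exists.
  bool get(const std::string& key, std::string& out) const {
    std::size_t start = find(key);
    if (start == std::string::npos) return false;
    std::size_t valueBegin = start + key.size() + 2;  // skip ";key="
    std::size_t valueEnd = data_.find(';', valueBegin);
    out = data_.substr(valueBegin, valueEnd - valueBegin);
    return true;
  }

  // Removes the pair if present, keeping the ';' that precedes it.
  void erase(const std::string& key) {
    std::size_t start = find(key);
    if (start == std::string::npos) return;
    std::size_t end = data_.find(';', start + 1);
    data_.erase(start + 1, end - start);
  }

 private:
  // Position of the ';' that precedes "key=", or npos when the key is absent.
  std::size_t find(const std::string& key) const {
    return data_.find(";" + key + "=");
  }

  std::string data_ = ";";
};
```

Every lookup is a linear search through the string, which is the speed sacrificed for memory. In exchange, the only overhead per entry is two separator characters.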

The best solution I have found so far is to store everything in JSON format. I have some code ready to integrate as an alternative. I’ll add it here when it’s part of the git repository, and make a small comparison of memory usage between the three implementations, one STL and two home-made.

Espruino and JS

From what I remember, dealing with memory on Espruino involves several things:

  1. Visualize the memory: dump, sizeof, etc.
  2. Try to store things in the right encoding.
  3. Use variable names shorter than 4 letters.
  4. Keep your code short. Even when stored in the 44 kB available for code, the source code takes space.
  5. Switch to Arduino, because it’s much more efficient.

 
