How good is your memory?

Dec 2021 • 13 min read

In this post we’ll explore various memory health concerns in NodeJS and ways to solve them. This is a semi follow-up to NodeJS in Flames, which discussed CPU profiling of NodeJS applications using flame graphs.

It’s important to think about memory performance because, in simple terms, bad performance can cause hard-to-squash bugs, downtime, loss of revenue and other bad things that the Grinch would love at Christmas. Imagine shipping a great product and then having multiple incidents caused by memory exhaustion in your application. In Node it’ll look like this:

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0xa1a640 node::Abort() [node]
 2: 0xa1aa4c node::OnFatalError(char const*, char const*) [node]
 3: 0xb9a68e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xb9aa09 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xd57c85  [node]
 6: 0xd58316 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [node]
 7: 0xd64bd5 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [node]
 8: 0xd65a85 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 9: 0xd6853c v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
10: 0xd2ef5b v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin) [node]
11: 0x107158e v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [node]
12: 0x140de99  [node]

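How do you begin to think about solving this? To answer this question, this article covers the basics of memory management, how NodeJS allocates and frees memory, and a checklist of memory problems to look out for.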

Memory Management Basics

Memory management is the process of controlling and coordinating the way a software application accesses computer memory. There are two things to think about here: how memory is allocated and how it is freed.

For allocation, software typically uses two regions of memory: the stack and the heap. The stack stores static data (e.g. primitive values and references to objects), while the heap stores dynamic data (e.g. objects) and is the largest portion as a result.

To free unused heap memory, most modern programming languages perform Garbage Collection (GC). It’s one of the most common memory management techniques and runs periodically, typically triggered as memory fills up rather than on a fixed timer.

How NodeJS allocates memory

Node runs on the V8 engine, so the way it handles memory is based on the V8 memory structure. A running program is represented by memory allocated in the V8 process, called the Resident Set.
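To make the Resident Set a little more concrete, here is a minimal sketch that prints the numbers Node reports for its own process through `process.memoryUsage()`. The helper name `memorySnapshotMb` is made up for this example.

```javascript
// Return a snapshot of this process's memory, converted to megabytes.
function memorySnapshotMb() {
  const usage = process.memoryUsage();
  const toMb = (bytes) => Math.round((bytes / 1024 / 1024) * 100) / 100;
  return {
    rss: toMb(usage.rss),             // Resident Set: total memory held by the process
    heapTotal: toMb(usage.heapTotal), // heap memory reserved by V8
    heapUsed: toMb(usage.heapUsed),   // heap memory occupied by live and not-yet-collected objects
    external: toMb(usage.external),   // memory of C++ objects bound to JavaScript objects
  };
}

console.log(memorySnapshotMb());
```

The exact values will differ between machines and Node versions, so treat them as a relative signal rather than absolute truth.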

The heap memory has the largest area because it stores dynamic data. It is also where Garbage Collection takes place, which we’ll talk about in the next section.

Heap memory is divided into various segments called spaces. Each space consists of a set of pages, which are blocks of virtual memory that the OS maps to physical memory. In V8, each page has traditionally been 1MB in size, though the exact size depends on the V8 version. The main spaces are the New Space, Old Space, Large Object Space, Code Space and Map Space.
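If you want to see the spaces your own Node version reports, the built-in `v8` module exposes them. This is a small sketch; the helper name `heapSpaces` is an assumption for illustration, and the list of spaces varies between V8 versions.

```javascript
const v8 = require('v8');

// Return one summary row per V8 heap space, with sizes in kilobytes.
function heapSpaces() {
  return v8.getHeapSpaceStatistics().map((space) => ({
    name: space.space_name,
    sizeKb: Math.round(space.space_size / 1024),
    usedKb: Math.round(space.space_used_size / 1024),
  }));
}

console.table(heapSpaces());
```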

How NodeJS frees up memory

The summary of how Node frees up memory should hopefully sound familiar at this point: GC cycles! Garbage Collection routinely cleans up unused memory by clearing out objects that aren’t needed anymore. The New Space is managed by the Scavenger (Minor GC) and the Old Space is managed by Mark-Sweep and Mark-Compact (Major GC).
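You can watch minor and major GC cycles happen in a running process with the built-in `perf_hooks` module. The sketch below is one possible way to do it; the helper names `gcKindName` and `watchGc` are made up for this example.

```javascript
const { PerformanceObserver, constants } = require('perf_hooks');

// Map the numeric GC kind reported by Node to a readable label.
function gcKindName(kind) {
  switch (kind) {
    case constants.NODE_PERFORMANCE_GC_MINOR: return 'minor (Scavenger)';
    case constants.NODE_PERFORMANCE_GC_MAJOR: return 'major (Mark-Sweep/Mark-Compact)';
    case constants.NODE_PERFORMANCE_GC_INCREMENTAL: return 'incremental marking';
    case constants.NODE_PERFORMANCE_GC_WEAKCB: return 'weak callbacks';
    default: return `unknown (${kind})`;
  }
}

// Start reporting every GC cycle; call .disconnect() on the result to stop.
function watchGc(log = console.log) {
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      // Newer Node versions expose the kind on entry.detail, older ones on entry.kind.
      const kind = entry.detail ? entry.detail.kind : entry.kind;
      log(`GC ${gcKindName(kind)} took ${entry.duration.toFixed(2)}ms`);
    }
  });
  observer.observe({ entryTypes: ['gc'] });
  return observer;
}
```

Calling `watchGc()` near the start of your app should log a line for each cycle. Under normal load you would expect many short minor collections and far fewer major ones.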

What we haven’t discussed yet is how the GC algorithms determine whether an object is used/unused. From this great tour of V8 garbage collection, an object is unused if:

  1. It is not a root object. A root object is one that’s pointed to directly by V8, or a global object.
  2. It is not referenced by any live or root object.

This means your program only frees memory when objects become unreachable, so a healthy app has a steady cycle of objects going out of use. This is why using globals is discouraged: anything a global references stays reachable, so GC will never remove it. Such objects take up memory for the lifetime of your program, and even worse, if they continue to grow unbounded, your program will run out of heap memory and crash.
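The two rules above can be illustrated with a toy model. To be clear, this is not how V8 is implemented; it is only a sketch of the reachability idea, where `graph` and `findUnused` are hypothetical names and objects are represented by plain string ids.

```javascript
// Toy model of the reachability rule (NOT V8's actual implementation).
// `graph` maps an object id to the ids it references; `roots` lists root ids.
function findUnused(graph, roots) {
  const live = new Set();
  const stack = [...roots];
  while (stack.length > 0) {
    const id = stack.pop();
    if (live.has(id)) continue; // already visited
    live.add(id);
    for (const ref of graph[id] || []) stack.push(ref);
  }
  // Anything never reached from a root counts as unused.
  return Object.keys(graph).filter((id) => !live.has(id));
}

const graph = {
  global: ['cache'],
  cache: ['entryA'],
  entryA: [],
  tempB: ['tempC'], // tempB and tempC reference each other,
  tempC: ['tempB'], // but no root reaches either of them
};

console.log(findUnused(graph, ['global'])); // [ 'tempB', 'tempC' ]
```

Note that `entryA` survives only because the global `cache` still points at it, which is exactly how a growing global keeps memory alive.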

99 Problems but is Memory Management one of them?

Now that we understand the basics of how Node allocates and frees memory, let’s run through a checklist of possible issues we can have with memory.

#1 - Is NodeJS the cause of memory exhaustion?

This is an article about memory usage in NodeJS applications, so it’s easy to assume that Node will always be the cause of the problem, but it won’t. The correct first step is to get information about what’s eating up memory.

A couple of ways you can do this are:
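As one hedged example (not necessarily the approach the author had in mind), you can compare the memory held by the Node process with what the whole machine is using. The helper name `memoryShare` is made up, and inside containers the totals reported by `os` may not match the container's limit.

```javascript
const os = require('os');

// Compare this Node process's resident memory with the machine's totals.
function memoryShare() {
  const total = os.totalmem();
  const free = os.freemem();
  const rss = process.memoryUsage().rss;
  return {
    totalMb: Math.round(total / 1024 / 1024),
    usedBySystemMb: Math.round((total - free) / 1024 / 1024),
    usedByThisProcessMb: Math.round(rss / 1024 / 1024),
    processShareOfTotal: rss / total,
  };
}

console.log(memoryShare());
```

If the system is nearly out of memory while this process holds only a small share, the culprit is probably something else running on the box.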

#2 - Do you have a memory leak in Node? Is your app crashing?

A memory leak happens when your program allocates memory continuously without freeing it. It becomes a problem if the process running your program then runs out of available memory and crashes, leading to downtime [2].

Because the fine details of how memory is managed are abstracted away from you by Node, memory leaks can be tough to debug. However, if you understand how memory management works, you’re better equipped to solve them.

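Let’s take a look at this example of a server endpoint with a leak:

// Assumes an Express app; port 3001 matches the ApacheBench run later in the post.
const express = require('express');
const app = express();

// Module-level array: it stays reachable for the lifetime of the process.
const list = [];

app.use('/', function(req, res){
  // Every request appends a large entry that is never removed.
  list.push({
    "name": "hi",
    "arr":  new Array(100000)
  });
  res.status(200).send({message: "test"});
});

app.listen(3001);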

The reason for the memory leak here is that the list array grows with every request, and it won’t be garbage-collected because it’s a global variable that is never emptied. In the wild, it won’t always be this simple to figure out, e.g. your code may be fine, but a package you imported contains a nasty leak. Let’s talk about how to debug memory leaks below.

Using the chrome debugger

In most articles you will find on the internet, this will be the recommended method [3]. It applies if you can reproduce the leak locally or can tunnel your production instance to your localhost (e.g. ssh -L 8080:localhost:8080 admin@example.com).
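As a hedged aside, Node can also write a heap snapshot to disk with `v8.writeHeapSnapshot()`, and the resulting file can be loaded into the Memory tab of Chrome DevTools. The sketch below assumes a hypothetical helper called `dumpHeap` and writes to the OS temp directory.

```javascript
const v8 = require('v8');
const os = require('os');
const path = require('path');

// Write a heap snapshot and return the path of the generated file.
// Note: generating a snapshot blocks the event loop and needs extra memory.
function dumpHeap(dir = os.tmpdir()) {
  const file = path.join(dir, `heap-${process.pid}-${Date.now()}.heapsnapshot`);
  return v8.writeHeapSnapshot(file);
}

console.log(`Heap snapshot written to ${dumpHeap()}`);
```

Taking two snapshots a few minutes apart and comparing them in DevTools is a common way to spot which objects keep accumulating. Like core dumps, snapshots can contain sensitive data, so handle the files with care.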

I’ll summarise this method described in the Heroku article below:

I like this method because it uses a familiar environment. But depending on your setup, you might not be able to port forward your production instance to localhost for security or other reasons, and might not be able to reproduce the leak locally. The next method, using llnode, is useful if you have access to the instance and can run shell commands on it.

Using llnode (Linux)

llnode is described as an lldb plugin for Node.js and V8, which enables inspection of JavaScript states for insights into Node.js processes and their core dumps. It is super useful for running postmortems when Node processes crash, but can also be used to analyse running programs.

It’s my current favourite method for memory analysis because of the port-forwarding issue with the Chrome debugger. The use-cases are:

  1. An app that’s running out of memory quickly: we can enable core dumps, which will produce a core file when a crash happens. We can then inspect the memory state in the core file using llnode.
  2. An app that has consistent memory growth before being replaced: we can take a core dump from the running process (this will pause or slow down the Node process for a while, depending on how much memory it uses) and then analyse the core file with llnode.

The steps for using llnode are as follows:

With llnode in your arsenal, you can figure out these sorts of issues on the affected server, or copy the core dump to your local machine and run the steps as well. As long as you have the corresponding node executable for your program, you can run the diagnostics anywhere.

An important disclaimer is that core dump files contain everything that was in memory at the time they were generated, which could include sensitive data like keys and user passwords. You should treat them with care and take measures to secure them properly. They can also be very large depending on the application’s memory usage, so be aware of any disk space constraints you may have and remember to turn core dumps off when you’re done troubleshooting, especially if your servers aren’t ephemeral.

#3 - Is your application freeing up memory properly?

This isn’t really different from having a memory leak, except that it might be slower to surface and the crash would happen over a longer period of time. I just want to use this section to emphasize that your app doesn’t need to crash before you consider whether you’re using memory properly. Some sub-questions here are:

Consider this scenario: you need to move your app from one infrastructure to another, e.g. from Docker running on Kubernetes to a single process running on an AWS EC2 instance (please don’t do that). When running on Kubernetes, if your app’s memory grows steadily until it hits the limit, the pod will get replaced. But on a standalone server running one or two processes, you might not have that luxury out of the box.

For example, you would have to manage the process with supervisord to make sure it restarts when it crashes. You might not think of that right away, so you’re guaranteed at least one downtime (and continuous downtimes depending on how long it takes to restart the process on a crash).

To confirm if you’re freeing up memory, you can do the following:
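One way to check this, sketched here as an assumption rather than the author's original steps, is to sample `heapUsed` over time and see whether it ever comes back down. The names `summarizeHeapSamples` and `sampleHeap` are hypothetical.

```javascript
// Summarise heapUsed samples (in bytes) taken over time.
function summarizeHeapSamples(samples) {
  const first = samples[0];
  const last = samples[samples.length - 1];
  // Count how often a sample was lower than the one before it,
  // i.e. how often memory was visibly handed back.
  let drops = 0;
  for (let i = 1; i < samples.length; i++) {
    if (samples[i] < samples[i - 1]) drops++;
  }
  return { growthBytes: last - first, drops };
}

// Take `count` samples, one every `intervalMs`, then report a summary.
function sampleHeap(intervalMs, count, onDone) {
  const samples = [];
  const timer = setInterval(() => {
    samples.push(process.memoryUsage().heapUsed);
    if (samples.length >= count) {
      clearInterval(timer);
      onDone(summarizeHeapSamples(samples));
    }
  }, intervalMs);
  return timer;
}
```

For example, `sampleHeap(1000, 60, console.log)` would report after a minute. Steady growth with zero drops under constant load is a hint that objects are not being released, although a short window alone is not proof of a leak.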

#4 - Are you storing large objects in memory?

If you have large objects in memory, you run several risks including memory crashes, slower processing etc. It’s even worse if those objects aren’t being garbage-collected - you’ll get a memory crash faster than you can say Jack Sparrow.
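As a hypothetical illustration, reading a big file fully into memory creates one very large object, while reading it in fixed-size chunks keeps only a small buffer alive at a time. The sketch below counts lines that way; `countNewlines` is a made-up helper and the 64KB chunk size is an arbitrary choice.

```javascript
const fs = require('fs');

// Count newline characters in a file while holding only one chunk in memory.
function countNewlines(filePath, chunkSize = 64 * 1024) {
  const fd = fs.openSync(filePath, 'r');
  const buffer = Buffer.alloc(chunkSize);
  let count = 0;
  try {
    let bytesRead;
    // readSync returns 0 once the end of the file is reached.
    while ((bytesRead = fs.readSync(fd, buffer, 0, chunkSize, null)) > 0) {
      for (let i = 0; i < bytesRead; i++) {
        if (buffer[i] === 0x0a) count++; // 0x0a is "\n"
      }
    }
  } finally {
    fs.closeSync(fd); // always release the file descriptor
  }
  return count;
}
```

In a server you would more likely use the asynchronous stream APIs so the event loop is not blocked, but the memory idea is the same: process a piece, drop it, move on.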

I can think of two examples of when objects can get heavy and should be noted:

#5 - Are you using global variables?

The gist with global variables is that they won’t be garbage-collected (removed from memory). While it’s not recommended, you can get by with global variables as long as you don’t store large amounts of data in them, or you clean them up yourself.

For example if you are using a global variable to do some in-memory caching, you need to remember to reset the value where it’s applicable to your application. If you don’t do this, you’re bound to run out of memory eventually.
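A minimal sketch of such a cache is shown below, assuming that evicting the oldest entry is acceptable for your use case. `BoundedCache` is a hypothetical class written for this example, not a library API.

```javascript
// A small cache that evicts the oldest entry once `maxEntries` is exceeded.
class BoundedCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = new Map(); // a Map preserves insertion order
  }

  set(key, value) {
    // Deleting first moves a re-inserted key to the "newest" end.
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
  }

  get(key) {
    return this.entries.get(key);
  }

  get size() {
    return this.entries.size;
  }
}

const cache = new BoundedCache(2);
cache.set('a', 1);
cache.set('b', 2);
cache.set('c', 3); // evicts 'a'
console.log(cache.size, cache.get('a')); // 2 undefined
```

Because the size is capped, the global can live for the whole process without growing unbounded. Evicted values become unreachable and can be garbage-collected.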

In our example, the list array is a global variable. This is the reason the list continues to grow with every request until the crash happens. Since our aim was to collect a record of each request for some reason, we can modify the implementation to write to a storage layer instead. For simplicity, we can stream to a file:

const fs = require('fs');

app.use('/', function(req, res){
  // this will overwrite the file on every request, but
  // I want to keep it in this code block for brevity of
  // the example.
  const writableStream = fs.createWriteStream("out.txt");
  // Register the error handler before writing so no error goes unhandled.
  writableStream.on('error',  (error) => {
      console.log(`An error occurred while writing to the file. Error: ${error.message}`);
  });
  // end() writes the final chunk and closes the file descriptor.
  writableStream.end(JSON.stringify({
    "name": "hi",
    "arr":  new Array(100000)
  }));
  res.status(200).send({message: "test"});
});

And now the server can handle even double the number of requests without running out of memory:

$ ab -c 100 -n 2000 -k http://localhost:3001/
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 200 requests
Completed 400 requests
Completed 600 requests
Completed 800 requests
Completed 1000 requests
Completed 1200 requests
Completed 1400 requests
Completed 1600 requests
Completed 1800 requests
Completed 2000 requests
Finished 2000 requests

Conclusion

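When thinking about memory performance in your app, consider the following questions:

  1. Is NodeJS the cause of memory exhaustion?
  2. Do you have a memory leak in Node? Is your app crashing?
  3. Is your application freeing up memory properly?
  4. Are you storing large objects in memory?
  5. Are you using global variables?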

In this post, we’ve discussed memory usage in Node and a bunch of different stumbling blocks. There are a lot of articles about figuring out memory leaks using the Chrome debugger, so I didn’t spend too much time talking about that.

Troubleshooting with llnode is less popular however, and might be the perfect solution for some production scenarios. Hopefully you find that interesting and explore more. The code samples can be found in this repository. Let me know if you have any comments or concerns!

Thanks to Chidi, Ari and Ife for looking through drafts of this ❤️

Footnotes

  1. Prometheus doesn’t have an easy way to export process metrics, but if you can write an exporter for CPU and memory stats for Node, you can then visualise with Grafana and set up alerts with Prometheus. Sounds like a fun project ngl. 

  2. Why are memory leaks a problem? 

  3. Some examples of great resources on this are the Heroku article and the Auth0 article describing how to find leaks in browser JavaScript code.

  4. This will remove the limit on the size of files that can be created on the server, allowing core dumps to be created. Core dumps are disabled by default because they can get very large.

Hi! My name is Opeyemi. I am an SRE that cares about Observability, Performance and Dogs. You can learn more about me or send me a message on Twitter.