Memory usage and system crash

szymon

Active member
May 2, 2020
39
7
I'm looking for guidance on tuning PI and macOS Catalina so that my system stops crashing. If I happen to be working on the machine while it's processing then I can prevent a crash, but if I leave it overnight with a big job then I come back the next morning to the machine having restarted, which isn't good ;-). I'm not sure if this is a product defect or just something that needs tuning out, so I thought I'd put it under Bug Reports but feel free to dump it wherever is needed.

My machine is a 2019 16" MacBook Pro with 32 GB of memory, an 8-core i9 running at 2.4 GHz, and a 512 GB SSD. This is my primary work machine, and I just use it for PI on the side (as I don't have another machine with this kind of processing power). In general I am very happy with running PI on it. I limit PI to 15 of the 16 available CPU threads so that I can keep doing work in email clients, Web browsers, terminal emulators etc, and PI honours that very well: even when it's running at 1500% of my CPU (cool!) the rest of the system is responsive and runs well so I can keep working. Since this is a work machine, I am unable to plug in external disk drives. I am able to use a network based storage solution, which I do use to free space up on the local SSD as needed, but I cannot just go and plug in a 6 TB USB disk and be done with it; the USB ports on the machine are technically disabled by my work's hardening process and I cannot get around this.

I live in light-polluted London and am forced to take short exposures, so to get any meaningful integration I need to take a lot of images. I therefore routinely process large numbers of files (hundreds or even thousands). My primary imaging camera currently is an Altair 269C (the IMX269 sensor is pretty awesome by the way), which produces ~40 MB files; my imaging capture software NINA stores these natively as LZ4HC compressed XISF files which take up ~30 MB each.

When going through an integration process (assume I've prepared masters of my calibration frames to use), I need to take my lights and calibrate them (1 copy, which doubles the file size to 80 MB, or around 60 MB compressed), then debayer them (which multiplies file size by 3, so we're now up to 240 MB or 180 MB compressed), then go through Subframe Selector (another copy of the 240 MB files), then go through Star Alignment (another copy of the 240 MB files), then perform the actual integration. And that's skipping cosmetic correction and local normalisation (I don't consider myself proficient enough to bring those into my stacking workflow yet). So, counting the original light, each frame actually needs around 840 MB of disk space, or around 630 MB compressed. I run out of disk space very easily!
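The per-frame total can be checked with a few lines of arithmetic. This is only a sketch using the sizes quoted in the post; the 0.75 compression ratio is an assumption taken from the 30 MB / 40 MB figure for LZ4HC, and real ratios will vary from frame to frame. With these numbers the compressed total comes out at roughly 630 MB per frame.

```python
# Per-frame disk footprint in MB, using the sizes quoted in the post.
RAW_MB = 40                # one uncompressed light frame
COMPRESSION_RATIO = 0.75   # assumption: 30 MB / 40 MB observed with LZ4HC

stages = {
    "raw light": RAW_MB,
    "calibrated": RAW_MB * 2,             # calibration doubles the size
    "debayered": RAW_MB * 2 * 3,          # debayering triples it again
    "subframe selector": RAW_MB * 2 * 3,  # another full-size copy
    "registered": RAW_MB * 2 * 3,         # another full-size copy
}

total_mb = sum(stages.values())
total_compressed_mb = total_mb * COMPRESSION_RATIO

print(total_mb, total_compressed_mb)  # 840 630.0
```

Deleting or moving each stage's output before starting the next one lowers the peak, which is what the post goes on to describe.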

When trying to integrate large numbers of files, the PI process grows to huge memory sizes; I've seen 80 GB. Obviously that doesn't all fit in my 32 GB of memory, so it's using virtual memory, which macOS is pretty good at using. However this means that macOS virtual memory is fighting for disk resources with my image files. In addition, PI itself stores a bunch of temporary files, and these are also fighting for disk resources.

So -- we know so far that I am constrained on disk space. However, what happens if I start running out of disk space? From my observations, one of three things:

1. If PI "notices" that I am out of disk space first -- say it happens to be trying to write a file -- then it can handle it, and the way it does this can be configured (I usually have it set to ask the user, meaning that it pauses what it was doing and throws up an Abort or Ignore prompt, I can then clear up disk space, abort and restart -- a "retry" prompt would be great but I can live without it).

2. If the Operating System Virtual Memory runtime "notices" that I am out of disk space first, then it throws up a window saying "Your system has run out of memory" and offering me the opportunity to kill running processes. The first few times this happened I got very confused, because when I look at the list of running processes at this stage, then yes, PI is taking a huge amount of memory, but the OS shows that all as paged memory and usually shows a good 20 GB of memory free! It took me a while to realise that what it actually means is that the system has run out of hard disk space, meaning that the virtual memory can no longer accommodate further growth in the huge PI process. Note that if at this point I don't clear up disk space, then very soon the machine just crashes.

3. If neither of these "notice" that I am out of disk space, the machine just crashes. This has happened to me twice while I was using the machine (editing documents, browsing the Web, etc) while PI was working away in the background. Boom, just turns itself off and restarts. I've also had multiple crashes where I leave PI running overnight, but I don't know if those were case 2 or case 3 (since I'm not in front of the computer I don't know if it tried to tell me that it's running out of memory or not before rebooting the whole machine!).
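All three cases above start with free disk space quietly heading towards zero. One possible mitigation is a small watchdog that polls free space and raises a flag before either PI or the OS hits the wall. The sketch below is an illustration only: `watch`, its parameters and the threshold are hypothetical names and values, and it has not been tested against PixInsight.

```python
import shutil
import time


def free_gb(path="/"):
    """Free space, in GB, on the volume that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def watch(min_free_gb, probe=free_gb, interval_s=30.0, max_checks=None, on_low=print):
    """Poll free space until it drops below `min_free_gb`.

    Calls `on_low` once with a message and returns True when the threshold
    is crossed; returns False if `max_checks` polls pass without that.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        free = probe()
        if free < min_free_gb:
            on_low(f"Only {free:.1f} GB free (threshold {min_free_gb} GB)")
            return True
        checks += 1
        time.sleep(interval_s)
    return False
```

Calling `watch(40)` in a terminal before starting an overnight job would poll every 30 seconds. What `on_low` should do is a design choice: printing is the safest, while suspending the PI process (for example with `SIGSTOP`) might buy time to free space but is an untested idea here.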

Note that I do try and be careful with my disk space. I use a network drive to store images as I go through the steps (I'm not trying to do all of this with the automated scripts, I do each stage in turn and copy off or delete the files from previous stages). However while processing at an absolute minimum I still need to have available around 0.5 GB of disk space per image (e.g. in the Subframe Selector or for Star Alignment I need to have an input image of 240 MB and an output image of 240 MB). While integrating at a minimum I need the 240 MB per image, or 180 MB if compressed. I also plan to leave "enough" space, I hope, for the Operating System virtual memory, for the PI temporary files, and for anything else that might be needed, when I perform one of these large operations.
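The "enough space" judgement can be turned into a quick pre-flight check before starting a stage. This is a sketch under stated assumptions: the 240 MB input and output sizes come from the post, while `headroom_gb` (space reserved for swap and PI temporary files) is an arbitrary placeholder that would need tuning by experiment.

```python
import shutil


def stage_needs_gb(n_frames, input_mb=240, output_mb=240, headroom_gb=40):
    """Disk needed for one stage: all inputs, all outputs, plus headroom.

    headroom_gb is a guess at what swap and temporary files need.
    """
    return n_frames * (input_mb + output_mb) / 1000 + headroom_gb


def can_run(n_frames, path="/", **kwargs):
    """True if the volume holding `path` has enough free space for the stage."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= stage_needs_gb(n_frames, **kwargs)


print(stage_needs_gb(500))  # 280.0
```

On a 512 GB SSD, 500 frames at these sizes already claim more than half the disk for a single stage, which suggests splitting large jobs into batches.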

I consider it a bug that, rather than getting an "Out of Memory" message and crashing, the PI process seems to keep trying to run at all costs, to the extent of crashing the machine. I don't know if that's a "product bug", or just an artefact of how macOS works, or if by being storage constrained rather than physical memory constrained I'm just an edge case, or what, but clearly this behaviour is wrong :)

So, what I am looking for is some kind of explanation for how I can get around these crashes. I need a solution which is workable within my constraints (in particular one that doesn't require me to add more storage space to the machine, which I cannot do). Is there a way of limiting the memory size of the PI process? Either technically (by somehow telling it "don't use more than X GB") or through configuration (when you do your thing, don't do so many of Y, which will be more CPU intensive but will use less memory)? Is there a way of telling macOS to just fail a memory allocation if it's running low on space to create more virtual memory, rather than just allowing it to grow to 80 GB or more? Is there a way to stop the file sizes from growing so much? (I don't understand much about storage of astronomical data; I use compressed XISF because NINA supports it natively, but 40 MB of data turning into 250 MB files seems somewhat excessive!). Can I tell PI not to use "temporary files" at all? How can I lower memory (and thereby disk) usage at a cost of more CPU usage? Can I make this fail in a better way (e.g. if this situation happens, PI stops processing and calmly tells me "get more disk space")? What other suggestions or solutions can you provide?

Many thanks,

Frustrated of London.
 

pfile

PTeam Member
Nov 23, 2009
5,099
41
there might be some "sysctl" settings that can influence how the OSX memory system works, but more than likely this is a bug in how OSX handles out-of-memory problems. PI puts a lot of stress on machines of all flavors and you can see even today there's a linux user saying that the oom daemon is killing their PI process (which is probably what OSX should be doing rather than apparently eventually kernel panicking.)
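For reference, one `sysctl` value that is at least useful for watching the problem develop is `vm.swapusage`, which reports how much swap macOS has created and used. The sketch below only reads that value; it does not change any setting, and the output format it parses (sizes ending in `M`) is an assumption based on what recent macOS versions print.

```python
import re
import subprocess


def parse_swapusage(text):
    """Parse 'total = 2048.00M  used = 1371.50M  free = 676.50M' into MB floats."""
    pairs = re.findall(r"(total|used|free) = ([\d.]+)M", text)
    return {name: float(value) for name, value in pairs}


def read_swapusage():
    """macOS only: ask sysctl for the current swap totals."""
    result = subprocess.run(
        ["sysctl", "-n", "vm.swapusage"],
        capture_output=True, text=True, check=True,
    )
    return parse_swapusage(result.stdout)
```

Logging `read_swapusage()["used"]` every minute during an overnight run would show afterwards whether swap growth preceded a restart.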

one thing that might help is an external SSD for your working files, in fact on macrumors.com i see that sandisk has 2TB external usb3 SSDs for $280 US today.

the 'temporary' files are necessary as each step in the pipeline communicates with the other using those files on disk.

probably the #1 ram consumer in this flow is ImageIntegration, and you can tune how much memory it uses at the possible cost of execution time. i don't think you can limit StarAlignment but if you were to reduce the number of threads PI is allowed to use then it follows that it will be working on fewer images concurrently and therefore using less ram.
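The thread-count argument can be put into rough numbers. This is a deliberately naive model, not a description of how PI actually allocates memory: it assumes each worker thread holds a fixed number of full images at once (`buffers_per_thread`, a guess) and uses the 240 MB debayered image size from the first post.

```python
def working_set_gb(threads, image_mb=240, buffers_per_thread=2):
    """Rough peak RAM if every thread holds `buffers_per_thread` full images.

    buffers_per_thread=2 assumes one input and one output image per thread.
    """
    return threads * image_mb * buffers_per_thread / 1000


for threads in (15, 8, 4):
    print(threads, "threads ->", working_set_gb(threads), "GB")
```

Under these assumptions, halving the thread count roughly halves the working set, at the cost of a longer run.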

finally the macbooks do seem to have somewhat subpar thermal performance and crashes under high load seems to just be a "thing". apparently apple have not tuned the throttling properly to handle an extended period of high CPU. so limiting the # of threads might also help on that front.

rob
 

szymon

there might be some "sysctl" settings that can influence how the OSX memory system works, but more than likely this is a bug in how OSX handles out-of-memory problems. PI puts a lot of stress on machines of all flavors and you can see even today there's a linux user saying that the oom daemon is killing their PI process (which is probably what OSX should be doing rather than apparently eventually kernel panicking.)
Agreed; I'd be fine with it as a last resort killing the PI process! I do agree there's probably not much that you guys can do about that though :-\

one thing that might help is an external SSD for your working files, in fact on macrumors.com i see that sandisk has 2TB external usb3 SSDs for $280 US today.
Arrrrrggghhhh. I thought I was clear:
Since this is a work machine, I am unable to plug in external disk drives.
[...]
I cannot just go and plug in a 6 TB USB disk and be done with it; the USB ports on the machine are technically disabled by my work's hardening process and I cannot get around this.
[...]
I need a solution which is workable within my constraints (in particular one that doesn't require me to add more storage space to the machine, which I cannot do).
Yeah I'd love to just add more disk, but I can't! Strictly speaking I probably could hack a way around it, but I like my job and want to keep it... :)

probably the #1 ram consumer in this flow is ImageIntegration, and you can tune how much memory it uses at the possible cost of execution time. i don't think you can limit StarAlignment but if you were to reduce the number of threads PI is allowed to use then it follows that it will be working on fewer images concurrently and therefore using less ram.
Now this suggestion is very interesting. I hadn't thought about it that way. I will experiment in particular with running the overnight batch jobs at say 8 out of 16 threads, hopefully that will make a difference. Thank you!

finally the macbooks do seem to have somewhat subpar thermal performance and crashes under high load seems to just be a "thing". apparently apple have not tuned the throttling properly to handle an extended period of high CPU. so limiting the # of threads might also help on that front.
That was certainly true of my last MacBook, this one however seems significantly better. In any case it sits on a riser which has active cooling (because I needed that for my previous one), so hopefully that shouldn't be an issue here :)

-simon
 

pfile

Yeah I'd love to just add more disk, but I can't! Strictly speaking I probably could hack a way around it, but I like my job and want to keep it...
im sorry it was all just too much to read that closely, sorry i gave you a bad suggestion.

rob
 

szymon

im sorry it was all just too much to read that closely, sorry i gave you a bad suggestion.
No no it's all good, I appreciate your taking the time to reply. I'm going to try running with fewer threads and see if that helps, great suggestion :cool:
 

szymon

Ok, I have an update. It turns out that to stop this happening, I just need to not compress the XISF files that are going to be used for integration. This dramatically lowers the memory requirements for integration. Compression is good for every stage except integration/drizzle! So basically, don't have a compression hint for the output of star alignment registration and all will be good :cool:
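To confirm that the registered files really are uncompressed before starting an integration, the XISF header can be inspected directly. The sketch below relies on my reading of the XISF 1.0 monolithic layout (an 8-byte `XISF0100` signature, a 4-byte little-endian header length, 4 reserved bytes, then an XML header in which compressed blocks carry a `compression="codec:size..."` attribute). Treat that layout, the function names and the 1 MiB read limit as assumptions to verify against the specification.

```python
import re
import struct


def header_compression(data):
    """Return the first compression codec named in an XISF header, or None."""
    if data[:8] != b"XISF0100":
        raise ValueError("not a monolithic XISF file")
    (header_len,) = struct.unpack_from("<I", data, 8)  # bytes 12-15 are reserved
    header = data[16:16 + header_len].decode("utf-8", errors="replace")
    match = re.search(r'compression="([^":]+)', header)
    return match.group(1) if match else None


def file_compression(path, max_header=1 << 20):
    """Check one file; assumes the XML header fits in the first 1 MiB."""
    with open(path, "rb") as f:
        return header_compression(f.read(max_header))
```

Running `file_compression` over the Star Alignment output folder and looking for anything other than `None` would flag files that still carry a compression hint.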
 

pfile

ok, that's another thing i missed then. i think both FITS and XISF support incremental reading, meaning, PI can partially load the file to work on the set of pixel rows it is processing. however, if the file is compressed (or is a CR2 file) then i think it has to be read into memory in its entirety and decompressed, which dramatically increases the memory footprint. so by turning that off you are essentially letting the disk be a kind of virtual memory system for integration tasks.

there may not be anything that can be done about this since PI needs to see the whole pixel stack at once to do the normalization and rejection.
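The difference between the two read patterns can be shown with a toy example. This is only an illustration of the general idea, not PI's actual I/O code: `zlib` from the standard library stands in for LZ4HC, an in-memory buffer stands in for a file on disk, and the image dimensions are made up.

```python
import io
import zlib

ROWS, ROW_BYTES = 1000, 4096


def make_image(rows, row_bytes):
    """Fake image: every byte in row r has the value r % 256."""
    return b"".join(bytes([r % 256]) * row_bytes for r in range(rows))


def read_row_uncompressed(f, row, row_bytes):
    """Seek straight to one row, so only row_bytes are held in memory."""
    f.seek(row * row_bytes)
    return f.read(row_bytes)


def read_row_compressed(blob, row, row_bytes):
    """A block compressed as a whole must be inflated before a row is reachable."""
    full = zlib.decompress(blob)  # the entire image is now in memory
    return full[row * row_bytes:(row + 1) * row_bytes], len(full)


image = make_image(ROWS, ROW_BYTES)
row_a = read_row_uncompressed(io.BytesIO(image), 500, ROW_BYTES)
row_b, held = read_row_compressed(zlib.compress(image), 500, ROW_BYTES)

print(len(row_a), held)  # 4096 4096000
```

Both calls return the same row, but the compressed path held the whole image to get it. Multiplied across hundreds of frames in a stack, that is the kind of gap that could explain the memory growth described earlier in the thread.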

rob
 

szymon

Just crashing in computers sometimes has to do with temperature. What temperatures does the machine reach after eg. an hour of calculations?
It wasn't a temperature issue. My new MacBook handles temperatures very well -- but I keep it in the same place as my old MacBook, which means it's on a riser with a built in fan that is on, and it has the room fan right next to it. It also very happily performs other PI processing at 1500% CPU for long periods -- it's only when the memory becomes an issue that it dies.
 

szymon

ok, that's another thing i missed then. i think both FITS and XISF support incremental reading, meaning, PI can partially load the file to work on the set of pixel rows it is processing. however, if the file is compressed (or is a CR2 file) then i think it has to be read into memory in its entirety and decompressed, which dramatically increases the memory footprint. so by turning that off you are essentially letting the disk be a kind of virtual memory system for integration tasks.

there may not be anything that can be done about this since PI needs to see the whole pixel stack at once to do the normalization and rejection
That's absolutely fine. Not compressing the files just before integration is an absolutely fair and valid workaround. :). Thank you for your support!