This is Charles Anstey, the author of the paper in question. I have been trying to find a simpler way to explain what is going on and why sub-exposure length matters.
Let's consider each term separately, since the terms combine by superposition.
Target signal : Grows linearly with time, so it depends only on total exposure time, Tgt ∝ t. One 100-minute shot and a stack of one hundred 1-minute shots are equal.
Light pollution signal : Also grows linearly with time, so it depends only on total time, LP ∝ t. Again, one 100-minute shot and one hundred stacked 1-minute shots are equal. However, since it is a uniform pedestal, we can simply subtract it out of the image as a bias.
Light pollution noise : Grows as the square root of time, so it also depends only on total time, Elp ∝ √t. Once more, one 100-minute shot and one hundred stacked 1-minute shots are equal.
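To make the equivalence concrete, here is a minimal numeric sketch in Python. The rates are made-up placeholders, not measurements; the point is only that every term depends on sub_minutes * n_subs and never on how the total time is split:

```python
import math

# Made-up example rates (electrons per minute per pixel), purely illustrative.
TARGET_RATE = 2.0   # target signal rate
LP_RATE = 50.0      # light-pollution signal rate

def stack(sub_minutes, n_subs):
    """Signal and noise for a stack with a noiseless camera (no read noise)."""
    total = sub_minutes * n_subs
    target = TARGET_RATE * total            # linear in total time
    lp_bias = LP_RATE * total               # linear in total time; subtract as a bias
    lp_noise = math.sqrt(LP_RATE * total)   # shot noise grows as sqrt(total time)
    return target, lp_bias, lp_noise

print(stack(100, 1))   # one 100-minute sub:      (200.0, 5000.0, ~70.7)
print(stack(1, 100))   # a hundred 1-minute subs: identical numbers
```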
At this point, if we had a noiseless camera, there would be no advantage to shooting longer sub-frames, and the implications are pretty astounding. When advanced amateurs start using noiseless cameras, astro-imaging will take another huge leap forward in quality. At that point, DSO imagers will shoot 10-30 frames per second just like planetary imagers, collecting as many frames as they want and then stacking only the best of the best until all the desired detail has been attained. Look up the "Lucky camera" to see the results of such a camera.
However, current real-world "affordable" cameras all have read noise. Unlike the terms above, the total read noise in a stacked image depends on the number of frames stacked and grows as the square root of the frame count.
Read noise : RN × √(# sub-exposures). So one hundred 1-minute shots have far more total read noise than one 100-minute shot.
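Extending the same sketch with a hypothetical read noise per frame (7 e- RMS here, again just an illustrative value) shows exactly where the equivalence breaks:

```python
import math

LP_RATE = 50.0    # light-pollution electrons per minute per pixel (made up)
READ_NOISE = 7.0  # electrons RMS per frame (made up)

def total_noise(sub_minutes, n_subs):
    """Total stack noise: LP shot noise plus read noise from every frame."""
    lp_var = LP_RATE * sub_minutes * n_subs  # grows with total time only
    read_var = READ_NOISE**2 * n_subs        # grows with frame count
    return math.sqrt(lp_var + read_var)

print(total_noise(100, 1))   # 1 x 100 min:  sqrt(5000 + 49)   ~ 71.1 e-
print(total_noise(1, 100))   # 100 x 1 min:  sqrt(5000 + 4900) ~ 99.5 e-
```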
The goal of the formula is to balance the number of sub-exposures against read noise and light pollution. Shortening the sub-exposures and stacking more frames for the same total time adds noise, because total read noise grows with frame count while the target signal, fixed by total time, gains nothing. Lengthening the sub-exposures and stacking fewer frames gains essentially nothing either, because the faintest signal discernible above the total noise is already limited by total exposure time and dominated by LP noise, so lowering total read noise further will not help.
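As a rough illustration of that balance (the full derivation is in the paper; this sketch uses a generic "swamping" criterion rather than reproducing the paper's formula), one can solve for the shortest sub-exposure at which per-frame LP shot noise dominates read noise:

```python
def min_sub_length(read_noise, lp_rate, swamp_factor=10.0):
    """Shortest sub (minutes) whose per-frame LP shot-noise variance exceeds
    the read-noise variance by swamp_factor. At 10x, the stack's total noise
    stays within about 5% of the read-noise-free ideal: sqrt(1 + 1/10) ~ 1.049.
    """
    return swamp_factor * read_noise**2 / lp_rate

# With the made-up numbers above: 10 * 7**2 / 50 = 9.8 minutes per sub.
print(min_sub_length(read_noise=7.0, lp_rate=50.0))
```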
From a practical point of view, when you don't know how much total time you want to spend on an object, shooting longer sub-exposures lets you keep adding frames if you decide to keep at it. The only penalty is that if a frame is lost to bad guiding or some other error, you lose a larger piece of time. Use the formula to determine the practical upper limit on total time for a given sub-exposure length, and know that if you might put in even more total time, you need to use a longer sub-exposure to be safe.
The problem with John Smith's formula is that it is only valid if you are stacking exactly two frames. Stack enough frames and read noise eventually becomes significant enough that, for the same total time, you would get better results on the faintest signals by shooting fewer frames for longer.

The formula also implies that shooting under light pollution or a full moon somehow limits how deep you can image compared to dark skies. That is completely incorrect: an imager can create an identical result under the full moon compared to the darkest skies if they image with enough total time. The increase in total time may be a factor of 10 or more, but what else are you going to do with that night, nothing? (A sketch of the scaling follows below.)

Seeing and transparency determine the absolute limits of image quality and depth. You can't stack a bunch of 3.5" FWHM images and wind up with a 2.0" FWHM image, no matter how much total time. Don't skip great nights of seeing just because the moon is out.

The only caveat is that the camera has to have enough well depth to cover the LP signal and still have plenty of bits left over for the stretch to an 8-bit display. Cameras with only 8 or 12 bits are likely to be limited by excessive, or even moderate, LP.
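For the background-limited case, the required increase in total time follows directly from the terms above. Here is a minimal sketch with made-up sky rates; a 12x brighter sky simply means 12x the total time for the same result:

```python
# Once read noise is swamped, SNR = rate * T / sqrt(lp * T) = rate * sqrt(T / lp).
# Matching SNR across skies therefore requires T_bright / T_dark = lp_bright / lp_dark.

def time_multiplier(lp_bright, lp_dark):
    """Total-time factor needed under bright skies to match dark-sky SNR,
    assuming LP shot noise dominates (read noise already swamped)."""
    return lp_bright / lp_dark

# Made-up sky rates: full moon at 12x the dark-site background.
print(time_multiplier(lp_bright=600.0, lp_dark=50.0))  # -> 12.0
```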