Topic: PJSR is it Ok to use Settings as a file cache?

Offline mschuster

  • PTeam Member
  • PixInsight Jedi
  • *****
  • Posts: 1087
PJSR is it Ok to use Settings as a file cache?
« on: 2012 September 28 08:24:53 »
I have a batch script that calculates properties of image files. I'd like to cache the properties so that other instances of the script can reuse them and avoid the expensive recalculation. Is it OK to put this data in Settings? The key will be the md5 hash of the image file to avoid cache misses on image file renaming and copying. I expect to cache data for up to several hundred files, with roughly 1 KB of data per file.

Thanks,
Mike

Offline Juan Conejero

  • PTeam Member
  • PixInsight Jedi Grand Master
  • ********
  • Posts: 7111
    • http://pixinsight.com/
Re: PJSR is it Ok to use Settings as a file cache?
« Reply #1 on: 2012 September 28 10:33:22 »
Hi Mike,

Yes, you can use Settings for this purpose. I implemented ImageIntegration's cache in a similar way, and the Script Editor cache also uses the application's settings data.

When you write a key/value pair to the Settings object, you are storing the data in the application's settings file, which is a performance-critical object. I now consider this a design error, since populating the application's general settings with module and script data makes the settings file slower, more complex and harder to maintain. Eventually, I'll change this system to generate a separate settings file for each module and script. However, the changes will be transparent to existing modules and scripts, so you don't need to worry about this. Just write your cache data against Settings.
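To make the idea concrete, here is a minimal sketch of a per-file property cache layered on a key/value store. It is plain JavaScript, not PJSR code: MemoryStore is a hypothetical in-memory stand-in for the Settings object, and PropertyCache, the key prefix, and the JSON encoding of values are all assumptions made for illustration.

```javascript
// Hypothetical stand-in for a persistent key/value store.
// In a real script the Settings object would play this role.
class MemoryStore {
  constructor() { this.map = new Map(); }
  read(key) { return this.map.has(key) ? this.map.get(key) : null; }
  write(key, value) { this.map.set(key, value); }
}

class PropertyCache {
  // prefix namespaces the keys so they do not collide with other scripts' data.
  constructor(store, prefix) {
    this.store = store;
    this.prefix = prefix;
  }

  // Returns the cached properties for fileKey; on a miss, calls compute()
  // and stores its result as a JSON string.
  get(fileKey, compute) {
    const key = this.prefix + "/" + fileKey;
    const text = this.store.read(key);
    if (text !== null)
      return JSON.parse(text);
    const props = compute();
    this.store.write(key, JSON.stringify(props));
    return props;
  }
}

// Usage: the expensive calculation runs only on the first lookup.
const cache = new PropertyCache(new MemoryStore(), "MyBatchScript/cache");
let calls = 0;
const measure = () => { calls += 1; return { median: 0.12, noise: 0.003 }; };
const first = cache.get("d41d8cd98f00b204e9800998ecf8427e", measure);
const second = cache.get("d41d8cd98f00b204e9800998ecf8427e", measure);
```

Storing each entry as one short string under a script-specific prefix keeps the amount of data added to the shared settings file small and easy to find later.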

Thank you for asking. I appreciate the care you put into your implementations.
Juan Conejero
PixInsight Development Team
http://pixinsight.com/

Offline Juan Conejero

Re: PJSR is it Ok to use Settings as a file cache?
« Reply #2 on: 2012 September 28 10:48:16 »
Quote
The key will be the md5 hash of the image file to avoid cache misses on image file renaming and copying

By the way, if computing the MD5 digest is expensive (I think it is), you can build an efficient cache hash by concatenating the following items:

- The file's name and suffix (not the directory path, to make the hash location-independent).
- The file's last modification date and time, for example encoded as YYYYMMDDhhmmss.
- Optionally: The file's length in bytes.

This is what I do in the new File Explorer window to cache image thumbnails, statistics and keywords. You can get these items from the file's directory entry. The new FileInfo object in PI 1.8 will simplify retrieving these data. For now, I'd compute the cache hash with the file name only. Adding the file times and lengths in PI 1.8 will be trivial.
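The key described in the list above could be assembled roughly as follows. This is a sketch in plain JavaScript rather than PJSR code; makeCacheKey, baseName and timestamp are hypothetical helper names, and the "|" separator and the use of UTC for the timestamp are assumptions, not something the File Explorer is known to do.

```javascript
// Left-pads a number with zeros to the given width.
function pad(n, width) {
  return String(n).padStart(width, "0");
}

// Encodes a Date as YYYYMMDDhhmmss, using UTC fields (an assumption) so the
// result does not depend on the local time zone of the machine.
function timestamp(date) {
  return pad(date.getUTCFullYear(), 4) +
         pad(date.getUTCMonth() + 1, 2) +
         pad(date.getUTCDate(), 2) +
         pad(date.getUTCHours(), 2) +
         pad(date.getUTCMinutes(), 2) +
         pad(date.getUTCSeconds(), 2);
}

// Strips the directory part so the key is location-independent.
function baseName(path) {
  const i = Math.max(path.lastIndexOf("/"), path.lastIndexOf("\\"));
  return path.substring(i + 1);
}

// Builds the key from file name, modification time and (optionally) length.
function makeCacheKey(path, lastModified, sizeInBytes) {
  const parts = [baseName(path), timestamp(lastModified)];
  if (sizeInBytes !== undefined)
    parts.push(String(sizeInBytes));
  return parts.join("|");
}

const key = makeCacheKey("/data/lights/M31_001.fit",
                         new Date(Date.UTC(2012, 8, 28, 10, 48, 16)),
                         67108864);
// key is "M31_001.fit|20120928104816|67108864"
```

Moving the file to another directory leaves the key unchanged, while renaming it or touching its modification time produces a different key and therefore a cache miss.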

Offline mschuster

Re: PJSR is it Ok to use Settings as a file cache?
« Reply #3 on: 2012 October 23 18:57:14 »
FYI Juan, MD5 in PJSR is slow, at least 20x slower than a native tool. On a 64 MB file, MD5 in PJSR takes about 8 s on my MacBook Air. But I will use it anyway, as I think filename/mod date is not robust enough for my app.
Mike

Offline Juan Conejero

Re: PJSR is it Ok to use Settings as a file cache?
« Reply #4 on: 2012 October 24 11:10:20 »
JavaScript is an interpreted language, so it is no surprise that it is slower than native code, especially for routines doing a lot of bit-level work such as cryptographic hashing. However, 20x seems way excessive. Let's see how the new SpiderMonkey (SM) 1.8.7 engine improves on that.

Anyway, PJSR in PI 1.8 will have native MD5 and SHA1 routines available in the ByteArray and File objects, so this bottleneck won't exist.

Quote
I think filename/mod date is not robust enough for my app.

Why?

Offline mschuster

Re: PJSR is it Ok to use Settings as a file cache?
« Reply #5 on: 2012 October 24 16:07:08 »
Thanks Juan. I don't like the mod date; it is too easy to lose integrity, IMO (for example, through the annual time zone change, travel, backup/restore, or system tools that tweak file info for whatever purpose). Content hashing gives better assurance (Git relies on it, for example). Image files are big, so the additional integrity is costly, of course, but in my app the cost is probably OK.

Mike

Offline Juan Conejero

Re: PJSR is it Ok to use Settings as a file cache?
« Reply #6 on: 2012 October 25 08:50:17 »
Instead of computing one MD5 digest for the entire file, you can compute a set of them for a few small blocks. For example, three blocks of 100 KiB each located at the beginning, center and end of the file. The probability of a different file having the same three MD5 checksums at the same locations is virtually zero.
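A rough sketch of the three-block idea, written for Node.js rather than PJSR: it uses Node's crypto module for MD5 and works on an in-memory buffer, whereas a real script would seek and read only the sampled blocks from the file. The names sampleRanges and sampledDigest, the fallback to hashing the whole file when it is small, and prefixing the result with the file length are assumptions for illustration.

```javascript
const crypto = require("crypto");

// MD5 of a byte range, as a hex string (Node.js crypto API).
function md5Hex(bytes) {
  return crypto.createHash("md5").update(bytes).digest("hex");
}

// Returns [start, end) ranges for blocks at the beginning, center and end.
// Files no larger than three blocks are covered by a single whole-file range.
function sampleRanges(length, blockSize) {
  if (length <= 3 * blockSize)
    return [[0, length]];
  const mid = Math.floor((length - blockSize) / 2);
  return [[0, blockSize], [mid, mid + blockSize], [length - blockSize, length]];
}

// Concatenates the MD5 digests of the sampled blocks. The length prefix
// (an assumption) separates files that share the blocks but differ in size.
function sampledDigest(data, blockSize = 100 * 1024) {
  const digests = sampleRanges(data.length, blockSize)
    .map(([start, end]) => md5Hex(data.subarray(start, end)));
  return data.length + ":" + digests.join("");
}

// Usage on 1 MiB test buffers.
const a = Buffer.alloc(1024 * 1024, 1);
const b = Buffer.from(a);
b[b.length - 1] = 2;        // change inside the last sampled block
const c = Buffer.from(a);
c[200 * 1024] = 2;          // change outside all sampled blocks

const detected = sampledDigest(a) !== sampledDigest(b);
const missed = sampledDigest(a) === sampledDigest(c);
```

Here detected and missed are both true: the scheme hashes only 300 KiB regardless of file size, but a modification confined to the unsampled regions does not change the key, which is the trade-off against a full-file digest.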