Link to a reference for setting up a personal astrophotography database

rimcrazy

Member
Feb 15, 2020
5
0
I'm a noob at astrophotography, and in just a short time it is easy to see that one creates a plethora of files in just one night's shooting. I'm never one to reinvent the wheel if a good one exists. What I'm looking for is a structure for an astrophotography database that I can set up on my own personal servers. I have an Astrobin account, but that is more for show & tell, not for storing and organizing all of your source material. I'm fairly well versed in Linux and such and have a number of RAID servers in my home. I was thinking that before I got too far down the road in making a completely disorganized mess of my shots, it would be better to get some structure to my work. There are a number of ways of attacking this. On the low side, I could just keep a well-organized local file structure, run a local Git server on one of my network servers, and sync things as needed. This is probably the easiest to set up and maintain but the least useful in terms of searching and retrieving lots of information as the size grows and time goes on. Next I'd look at a MySQL server with a PHP front end for data entry and query retrieval. I'm not looking for anyone to solve my problem, just some input as to how others are solving/ignoring this, or whether there are examples on the net that others are using or have seen. I have to believe I'm not the first person looking to do this.

I'd also be interested in what the experienced astrophotographers do with a night's shooting... for example, once you have stacked/aligned and generated your flat/dark/bias frames, do you keep the original source files or just trash them? What about the source images themselves?

Sorry if this has been asked before on this forum. I did do a quick search and did not see anything pop up that looked similar.

Thanks much!
 

Geoff

Well-known member
Mar 6, 2020
73
26
I keep all the original files, the stacked masters and the final processed result(s). They are placed in a suitable folder (Astrophotos>ngc objects>ngc 253_october 2019). I may also keep a project here.

In a separate folder I keep all calibration masters in dated subfolders. (Astrophotos>Calibration>2020>January)

I don’t keep calibrated or registered lights. These can be reproduced from the data I have kept in the unlikely event that I would need to do this.
 

Linwood

Well-known member
Jul 28, 2020
125
9
I don’t keep calibrated or registered lights. These can be reproduced from the data I have kept in the unlikely event that I would need to do this.
I've been working at this also, wondering what the right things are to keep and what to delete.

In this workflow:

- Calibrate/integrate flats
- Calibrate lights
- Cosmetic correction
- Subframe to weight and approve/reject
- Alignment (+/- an extra step to build a master)
- Integrate
- Drizzle integrate

I am thinking I will stop after the subframe step (Blink may be in there also). This reduces the number of subs by getting rid of garbage, and the weighting (unless I change my mind) does not change. This way I can also delete all the flats, master flats, etc.

If I add more data I might rerun subframe on the whole group to identify the best frames for alignment, and to decide whether to get stricter and eliminate more subs overall.

But stopping after subframe allows me to save a lot of time on reprocess, as I never find any need to change anything in the prior steps -- I mean, your flats at that point are your flats, you can't recreate them to recalibrate.
 

Linwood

Well-known member
Jul 28, 2020
125
9
Sorry, I actually started to post for a different reason and managed to forget....

What I struggle with is what I did after the obvious stuff. If I go to reprocess, it is easy to figure out a new reference image, align image, etc. but...

Let's say I worked through the two dozen or so steps after integration -- pixelmath to combine to RGB, work after, TGV, Luminance synthesis (maybe), stretch (which ones, how much), luminance combine, curves, more curves, ...

There's a LOT of stuff I do after integration, and keeping a record is tedious and something I basically just do not do. I should, I don't. Some things are kept in the image history, but that gets lost when you create a new image say from combines of various sorts (at least some of them, I guess luminance can be added without losing history).

Does anyone have a good way?

And maybe as a suggestion: How about a "Pixinsight Diary".

I suspect most of us work on one target set in a session of work. Often we try things, undo, delete, go back, redo... Pixinsight knows all this, in fact is keeping most of it in individual image history. How about accumulating it all somewhere, wiping out the dead ends (e.g. if I do A->B->C then undo twice, who cares about B and C, they are gone).

Something we can save with the target, and if we want to reprocess, we can at least look and see how we got to that point. Heck... while parameters may change, at least in principle you could then make it a script and redo all the reprocessing at once?
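
A rough illustration of the "diary" idea, written in plain Python rather than against PixInsight's own scripting interface: the `Diary` class and the step names below are hypothetical, and nothing here reads real PixInsight history. It only shows the bookkeeping being described, where undone steps are dropped so the saved log contains just the path that led to the final image.

```python
from dataclasses import dataclass, field


@dataclass
class Diary:
    """Linear processing log that forgets undone steps (hypothetical helper)."""

    steps: list = field(default_factory=list)

    def apply(self, process, **params):
        # Record a step that was actually applied to the image.
        self.steps.append((process, params))

    def undo(self, count=1):
        # Undone steps are dead ends, so they are removed from the log.
        if count > 0:
            del self.steps[-count:]

    def as_text(self):
        # Render the surviving steps as a numbered list to save with the target.
        lines = []
        for number, (process, params) in enumerate(self.steps, start=1):
            settings = ", ".join(f"{key}={value}" for key, value in params.items())
            suffix = f" ({settings})" if settings else ""
            lines.append(f"{number}. {process}{suffix}")
        return "\n".join(lines)


diary = Diary()
diary.apply("A")
diary.apply("B")
diary.apply("C", amount=0.5)
diary.undo(2)                      # B and C are gone
diary.apply("D", iterations=3)
print(diary.as_text())
```

Because the surviving steps keep their parameters, the same list could in principle be replayed as a script, which is the second half of the suggestion.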
 

rimcrazy

Member
Feb 15, 2020
5
0
Thanks for the replies... reassures me I'm not nuts, this is a problem. I'm looking at a possible solution, working with my son, as he is a MySQL wizard and the problem here with copious astro photos is similar to one he's solving at his company. Unfortunately, if I get a solution I can't offer it up to the community as open source, as it is proprietary to his company's work, but I can share, in general, what we are looking at: a file-management type of solution where you essentially store every image taken as a record. No need to group in multiple folders, etc. Everything comes down to the picture file. You store it in the DB with a table that has enough fields that you can always query the DB to get what you need. As you figure out from Blink and other parts of the process that there are files/records not needed, you just delete these from the DB. What is nice about this is that over time you have a searchable DB, so you can recall with a query, say, all of the images you've taken of M43, or images taken with a specific camera or under specific sky conditions. If you put that data in, then you have it to query with to get the info back out.

So the workflow is basically: go shoot for an evening and, when done, dump it all into the DB. Sometimes you have the time to post-process right away, but say you don't and you just shoot for a week and stuff it all into the DB. Come the weekend, you do a query and define a task, like post-processing with PI, and the DB will get the files from a shoot, even over multiple nights, create some working folders on your PC/Mac, and dump the respective image records into them in an orderly fashion for post-processing. When you're done, you load the newly generated files that you want to keep and create a query that deletes the files/images you determined during your PI session are no longer needed, clearing the DB of these records. That is the thought at least.

This way, too, all of the images/records are in one location, and for storage convenience you can even keep the DB as a tarball so it takes less space. Since it's a single file I can then use my redundant NAS as a backup, and the DB can write to both so I have multiple backups of my DB across multiple machines.

I'm curious: in doing this, would it be necessary to query the FITS header for info, or to write custom entries into the FITS header? Is this something I can even do? I don't know enough about FITS to know.
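
On the header question: a FITS header is plain ASCII, stored as 80-character "cards" packed into 2880-byte blocks and terminated by an `END` card, so reading it needs nothing exotic. The sketch below is a minimal standard-library reader for the primary header only; the function names are made up for illustration, it ignores CONTINUE cards and other corner cases, and the demo header is synthetic. In practice a library such as astropy (`astropy.io.fits.getheader`) is the more robust route, and the format does allow custom keywords to be added.

```python
import io

CARD = 80      # each header card is 80 ASCII characters
BLOCK = 2880   # headers are padded to a multiple of 2880 bytes


def parse_value(text):
    """Convert the value field of one card to str, bool, int, or float."""
    text = text.strip()
    if text.startswith("'"):
        # String value: ends at the first single quote that is not doubled.
        end = 1
        while end < len(text):
            if text[end] == "'":
                if text[end + 1:end + 2] == "'":
                    end += 2
                    continue
                break
            end += 1
        return text[1:end].replace("''", "'").rstrip()
    text = text.split("/", 1)[0].strip()   # drop the trailing comment
    if text in ("T", "F"):
        return text == "T"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def read_primary_header(stream):
    """Return {keyword: value} for the primary header of a FITS byte stream."""
    header = {}
    while True:
        block = stream.read(BLOCK)
        if len(block) < BLOCK:
            raise ValueError("truncated FITS header")
        for start in range(0, BLOCK, CARD):
            card = block[start:start + CARD].decode("ascii")
            keyword = card[:8].strip()
            if keyword == "END":
                return header
            if card[8:10] == "= ":          # value cards only; skips COMMENT etc.
                header[keyword] = parse_value(card[10:])


# Synthetic header for demonstration; a real file would be opened with open(path, "rb").
cards = [
    "SIMPLE  =                    T / conforms to FITS",
    "OBJECT  = 'M42     '           / target name",
    "EXPTIME =                120.0 / seconds",
    "FILTER  = 'Ha'",
    "END",
]
raw = b"".join(c.ljust(CARD).encode("ascii") for c in cards).ljust(BLOCK, b" ")
demo_header = read_primary_header(io.BytesIO(raw))
print(demo_header)
```

Which keywords are present (OBJECT, FILTER, EXPTIME and so on) depends on the capture software, so any database built on them should tolerate missing values.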

Linwood, you bring up something I'd not thought of, but only because I've not done enough post-processing. I can easily relate to this, however, as I do a fair amount of 3D work and find the same issue: I sometimes need to keep a log of what I did to get from Point A to Point B because it is a complicated step that I know I'll never remember. LOL. Having a log method for recording steps makes a lot of sense, and it would be something you would like to keep with the records of a finished image. I'll have to think about that one. For that, if I get something, I might be able to share it, as it's really only useful for this kind of work.
 

fredvanner

Well-known member
Apr 17, 2019
828
90
70
Wells, Somerset, UK
This is not, of course, a new problem, nor a specifically astrophotographic problem (I have 20 years of unindexed family photos from 20 different cameras on my NAS...).
One problem is that all the main (astro)photography applications operate on files in a folder structure. Moving files into a database would be an archive activity, and would require user action to (a) select files for archive, and (b) select files for extraction if required for further processing. An alternative view is a "flat, structure-agnostic internet-style" model, whereby files are stored in any arbitrary folder structure that is convenient to the user, but are then indexed by a sophisticated tool that maintains an index database, using an intelligent search and metadata collection strategy (comparable to Google internet search indexing). Periodic scans would add/update/delete content. Search facilities could use complex metadata-based search criteria (e.g. "find all object='M42', filter=['Ha' or 'Oiii'], integratedframes>10, after 2015"). Subsets of files identified by metadata could be backed up and restored without specific reference to their location in the folder structure (e.g. "backup all object="M31" from 2010 to 2020").
Of course, this would not be easy...;)
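
To make the "leave the files where they are and index them" model a little more concrete, here is a minimal sketch using only Python's standard library. Everything in it is an assumption for illustration: the table layout, the `scan` and `find` helpers, and especially `harvest_from_name`, which pretends files are named TARGET_FILTER_DATE_NNN.ext instead of reading real header metadata. A periodic call to `scan` adds new files, refreshes changed ones, and drops rows for files that have disappeared.

```python
import os
import sqlite3

ASTRO_EXTENSIONS = {".fits", ".fit", ".xisf"}


def harvest_from_name(path):
    """Hypothetical naming convention: TARGET_FILTER_YYYY-MM-DD_NNN.ext"""
    parts = os.path.splitext(os.path.basename(path))[0].split("_")
    return dict(zip(("target", "filter_name", "date_obs"), parts))


def scan(db, root, harvest=harvest_from_name):
    """Bring the index in line with the files currently under root."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, mtime REAL, "
        "target TEXT, filter_name TEXT, date_obs TEXT)"
    )
    seen = set()
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            if os.path.splitext(name)[1].lower() not in ASTRO_EXTENSIONS:
                continue
            path = os.path.join(folder, name)
            seen.add(path)
            mtime = os.path.getmtime(path)
            row = db.execute(
                "SELECT mtime FROM files WHERE path = ?", (path,)
            ).fetchone()
            if row and row[0] == mtime:
                continue                      # unchanged since the last scan
            meta = harvest(path)
            db.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?)",
                (path, mtime, meta.get("target"),
                 meta.get("filter_name"), meta.get("date_obs")),
            )
    # Assumes a single root: any indexed path not seen this time was deleted.
    known = [row[0] for row in db.execute("SELECT path FROM files")]
    for path in known:
        if path not in seen:
            db.execute("DELETE FROM files WHERE path = ?", (path,))
    db.commit()


def find(db, target, filters, since):
    """Paths for one target, any of the given filters, dated on or after since."""
    marks = ", ".join("?" for _ in filters)
    sql = (
        f"SELECT path FROM files WHERE target = ? "
        f"AND filter_name IN ({marks}) AND date_obs >= ? ORDER BY path"
    )
    return [row[0] for row in db.execute(sql, (target, *filters, since))]
```

A call such as `find(db, "M42", ["Ha", "OIII"], "2016-01-01")` corresponds to the "after 2015" search above; ISO-formatted dates are used so that plain string comparison orders them correctly.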
 

Linwood

Well-known member
Jul 28, 2020
125
9
I do regular photography as well, and use Lightroom as a DAM (Digital Asset Manager, there's tons of stuff written under that phrase). But there I have tens of thousands of individual photos often with loose groupings to an event, but also lots of cross groupings to persons or organizations. Ability to query in various ways is useful.

I honestly do not see that here. Most of what I shoot may be one target a night, maybe two, interspersed with lots of nights of clouds. But more importantly all the cross connections one keeps track of in a DAM do not exist -- there's never a need to associate M31 shots with M42 shots because they share a common athlete or such. I won't be doing a "dump all the subs taken with this camera at this rotation angle for any targets" or similar meta queries, the way I do now with "Show me everywhere John Smith is in a shot also for University X".

Which is what your solution sounds more like, a database layer overtop to serve as a queryable index. Not a bad thing, just seems overkill. But even then I wonder if an existing DAM program may not be better than rolling your own.

As to keeping the files IN the database as opposed to a database ABOUT the files: Relational databases are generally not THAT good at keeping big binary blobs internally. And image data does not compress well, or quickly. Most commercial products layered on databases that store large binary objects maintain performance by storing the blobs in the file system and just point to them, essentially becoming OS file folders with a different look. Those that store them inside the database have all sorts of special handling for that data, partitioning its physical storage separate from columns with more modest data, etc. Getting it to work well, especially to perform well, is a challenge and art form. One of the most widely used examples is Lightroom -- it does not store any photo in its catalog despite using that terminology, it just points to them. If you do store them inside the database, I certainly would not start with MySQL for something that may quickly store terabytes; if free is needed maybe Postgresql. But I'd seriously consider leaving the files in the file system and just store metadata. (I am really new to astrophotography, but I did database and systems design all my life).
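
As a small sketch of the "database ABOUT the files" approach, the catalog below stores only a path and some metadata per frame, with no BLOB column, and a query then copies the matching files into working folders for processing. The table layout and the `make_catalog` and `stage` helpers are hypothetical, and SQLite is used here only because it ships with Python; the same pattern applies to MySQL or PostgreSQL.

```python
import os
import shutil
import sqlite3


def make_catalog(db):
    """Create a metadata-only catalog; the image data stays on disk."""
    db.execute(
        """CREATE TABLE IF NOT EXISTS frames (
               id         INTEGER PRIMARY KEY,
               path       TEXT UNIQUE NOT NULL,  -- pointer to the file
               frame_type TEXT NOT NULL,         -- light, flat, dark, bias
               target     TEXT,
               night      TEXT                   -- ISO date of the session
           )"""
    )


def stage(db, target, dest):
    """Copy every catalogued frame of one target into dest/<frame_type>/."""
    staged = []
    rows = db.execute(
        "SELECT path, frame_type FROM frames WHERE target = ? "
        "ORDER BY night, path",
        (target,),
    ).fetchall()
    for path, frame_type in rows:
        folder = os.path.join(dest, frame_type)
        os.makedirs(folder, exist_ok=True)
        # Note: files with the same name from different nights would collide;
        # a real tool would need to rename or add per-night subfolders.
        staged.append(shutil.copy2(path, folder))
    return staged
```

Deleting a row removes a frame from the catalog without touching the file, so culling and disk cleanup can be handled as separate, deliberate steps.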

The bigger long-term astrophotography challenge, I think, is minimizing space: individual subs are 120 MB, processed ones over 200 MB, compressed still 180 MB. And a typical target will have hundreds of original subs. Culling what I will never need, especially the not-quite-redundant copies, is my challenge. I mean, the original and the calibrated light are not the same; in theory I may want to improve on the calibration, in practice I won't. So to me they are redundant. So cull original, keep calibrated. Some may do the opposite. But through the long process chain (calibrated, cosmetics, subframed, aligned, integrated, drizzled) these ambiguities of "redundant" abound, and where and how you add more data varies. For example, if you keep originals rather than calibrated subs, you also need to keep master flats and darks; or are you keeping originals of those also? Then when you reprocess you need to match everything by date if recalibrating before combining for one big run afterwards.

Sorry, I do not mean to throw cold water on your database idea, but I do think before you put a lot of effort into it, it is worth trying to work through your use cases, see if it is worth the effort.
 

fredvanner

Well-known member
Apr 17, 2019
828
90
70
Wells, Somerset, UK
I am also a systems engineer with a software background including decades of database design, and I don't disagree with you at all. :)
The key point I was trying to make is that it is better to leave the files where they are, and maintain an index to them. If this was easy to do manually we would all be doing it; since this thread suggests that we aren't, I suggest that an index-building tool is a reasonable solution (possibly the only solution if the alternative is to manually process all your existing archive data). Since many astro files have lots of accessible metadata (e.g. in FITS or XISF headers), this tool could use harvested metadata (as well as file / folder names, file extensions, creation dates, etc.) to index the data.
 

rimcrazy

Member
Feb 15, 2020
5
0
All good points. I don't use Lightroom anymore for personal photos; I use Capture One, as I don't like the idea of Adobe locking up my organization of photos when I quit paying them. On the blobs, databases, etc., I would agree with all of that but..... the databases my son works with are genetic DNA files. Each record is huge.... the databases he creates typically span multiple servers in multiple physical sites with 100's of millions of records. Because I'm not doing the heavy lifting, I'm just going to be a test bed, if you will, for his back end; all I have to do is some custom PHP to make the form fit my function instead of his.... which is really not all that hard. Some background on the post: I had posted the original question and then was chatting with my son, describing the problem this weekend, and he goes "That, in principle, is exactly what we are doing right now. You want to be a test bed?" So that is where I'm at now. On the issue of exporting to a specific file structure, that is not as hard as it sounds. In the work they do for their customers they already have to do the same basic thing. They solve it with portable Docker containers that have scripts to do this task. Taking an existing one and modifying it for my needs is far simpler than doing one from scratch. So yea, doing it from scratch..... a major PITA and not trivial. Porting an existing one to my specific file structure, not too hard at all.

As mentioned, this is not a new problem. The astrophotography twist in its workflow is interesting in that, yes, you don't need to organize across multiple "athletes" as such, which makes it easier, but goodness gracious you create a ton of files in just one evening's successful shoot. This is what made me step back and go "Uh oh... I better get this organized and fast or I'm going to be in real trouble". An interesting side note on this: I had the opportunity to do some work with some professional astronomers here in Arizona. I was shocked to see what was, IMHO, abysmal record keeping for the data they collect to do their work. Goodness, they have scopes costing 10's of millions of dollars, and every person has their own way of keeping and organizing data, and it's splattered all over the place. It obviously works, but the engineer in me looks at that and just wonders at the waste, in that they don't keep data in such a fashion as to allow each other to share and use each other's work. I'm sure there is something here I don't understand, but whatever.

I appreciate the feedback very much. All of this being said, to Linwood's comment, I may in the end just create some Capture One catalogs and go with that. I like Capture One, and in terms of database and basic photo editing capabilities it's not an issue. Capture One vs Lightroom is just a Chevy vs Ford issue. Both are very capable programs and both have their +'s and -'s. I don't know, however, if it accepts a .fits file, but I will look into that.
 

Linwood

Well-known member
Jul 28, 2020
125
9
Will Capture One index XISF files?

My reaction was mostly to the idea of stuffing files into a database-- a metadata skimmer and organizer sounds quite interesting.
 

pfile

PTeam Member
Nov 23, 2009
6,170
180
i primarily use a filename based scheme to organize my raw subexposures. however i did at one point write a script to figure out what site/camera/telescope/rotator angle/filter/date a flat was taken and stuff that in a sqlite3 database. then it can be queried for a set of flats that match a particular light within a specified date and rotator tolerance and spit out a list of pathnames of flats. i also made a script to create a calibration icon from a list of files, so together i could generate a set of calibrated flat subs with PI. this all predates BPP and WBPP.

this was all using perl which i understand to be old news at this point. however there is a fits parser available for perl and that's what i used to extract most of the data about the file, then had some heuristics that would identify the site/overall project. and it was all under unix; not sure how this would be accomplished on windows. anyway i'm sure astropy contains a fits parser so one could also use python and sqlite3 for this.

so - i think a DB that contains all the metadata for lights/flats/darks/bias could be very useful, but as others have pointed out i don't think storing the files themselves in the database is a good idea.
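
The flat-matching query described above might look something like the following in Python with sqlite3. This is not the original Perl script, just a guess at the shape of it: the `flats` table, its column names, and the default tolerances are all assumptions, and matching on site, camera, and telescope would simply be extra `AND` conditions. Rotator angles are assumed to be in degrees within [0, 360), so the comparison wraps around zero.

```python
import sqlite3


def matching_flats(db, filter_name, rotator_deg, night,
                   max_days=3, max_rotation=1.0):
    """Return paths of flats close to a light in date and rotator angle."""
    sql = """
        SELECT path FROM flats
        WHERE filter_name = ?
          AND ABS(julianday(night) - julianday(?)) <= ?
          AND MIN(ABS(rotator_deg - ?), 360 - ABS(rotator_deg - ?)) <= ?
        ORDER BY path
    """
    args = (filter_name, night, max_days,
            rotator_deg, rotator_deg, max_rotation)
    return [row[0] for row in db.execute(sql, args)]


# Made-up rows for demonstration only.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE flats (path TEXT, filter_name TEXT, rotator_deg REAL, night TEXT)"
)
db.executemany(
    "INSERT INTO flats VALUES (?, ?, ?, ?)",
    [
        ("flats/2020-01-10/Ha_001.fits", "Ha", 359.6, "2020-01-10"),
        ("flats/2020-01-10/Ha_002.fits", "Ha", 90.0, "2020-01-10"),
        ("flats/2019-11-02/Ha_001.fits", "Ha", 0.2, "2019-11-02"),
        ("flats/2020-01-11/OIII_001.fits", "OIII", 0.1, "2020-01-11"),
    ],
)
# A light taken on 2020-01-12 through Ha with the rotator at 0.2 degrees.
flat_paths = matching_flats(db, "Ha", 0.2, "2020-01-12")
print(flat_paths)
```

The resulting list of pathnames is the kind of output that could then be fed to a calibration step.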
 

fredvanner

Well-known member
Apr 17, 2019
828
90
70
Wells, Somerset, UK
... since PI is scriptable and has all the facilities for metadata access, perhaps such a utility could be written as a PI script / process. This might allow really smart options, like celestial coordinate searches.
 

rimcrazy

Member
Feb 15, 2020
5
0
Will Capture One index XISF files?

My reaction was mostly to the idea of stuffing files into a database-- a metadata skimmer and organizer sounds quite interesting.
So from a very quick look, I don't think Capture One will index a .xisf file. They have a tool which you can use to look at it, but it won't index. Also, neither Capture One nor Lightroom will import a .fits file. There is a FITS Liberator plug-in for Photoshop, but that is all I found. Obviously we all use PI to edit our .fits files, but for storage LR or CO would have been nice.
 

pfile

PTeam Member
Nov 23, 2009
6,170
180
to my knowledge the only 3rd party filesystem support for XISF metadata is the Observer Pro finder plugin for macosx. it allows spotlight searches on xisf by parsing the FITS header data and adding it to the mac's spotlight database.

but i don't use windows so maybe there's something available on windows?