First attempt at an Azure guide...
Some brief notes on using Azure for PI:
Azure makes an excellent platform for certain PI activities because you can rent large amounts of compute and memory cheaply, for exactly as long as you need it. A few use cases stand out to me:
1. Creation of master/super bias files, where you may need to integrate hundreds of frames. On my Q6600 with only 4 GB of RAM this was impossible because I would run out of memory.
2. Image preprocessing and integration. For the reason noted above, this can be challenging if you have large numbers of files.
3. Certain computationally intensive operations such as TGVDenoise. I tested this on an image from my 100D (18 MP). On my Q6600, TGV took approx. 40 minutes to run; on an Azure machine (Standard D5_v2, 16 cores, 56 GB memory) it took 100 seconds!
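For scale, the TGVDenoise timings in point 3 work out to roughly a 24x speedup:

```shell
# TGVDenoise comparison above: Q6600 ~40 min vs Azure D5_v2 ~100 s.
echo "$(( (40 * 60) / 100 ))x faster"   # prints "24x faster"
```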
How I use Azure:
• I have an MSDN subscription which includes a monthly Azure allowance, so I can rent large amounts of compute without actually spending any money! You can of course get your own subscription on a PAYG basis if you don't have a monthly allowance.
• I also have an O365 subscription which includes 1 TB of OneDrive capacity. This plays an important role in hopping between my local machine and Azure. You could use any similar service; you just need a way of getting your data into the cloud.
Getting started:
• Within OneDrive I created an AP folder which holds all of my AP data (raw through to processed). I capture data straight into this folder structure, so it is uploaded to OneDrive in near real time.
• Within the Azure portal (http://portal.azure.com) create a virtual machine. For the OS I picked Windows 10 Enterprise N x64, and for size pick D2_v2 Standard. At this stage you don't want to pay for large amounts of compute, so keep the machine small. You could pick a smaller machine, but they are architecturally different (not using the latest Xeons) and can be quite slow!
• Once the machine is accessible, sign in, install PI, apply patches, etc. Sign into OneDrive and select your AP folder as a sync folder. By default this will go to the C drive, which works fine for me, although the OS partition can be somewhat slow. At this point you will notice the Azure internet pipe is so large that the download of all your AP data is disk limited!
• Once your data is fully in sync, check PI is working as you would expect, then power the machine off.
• Back in the Azure portal, resize the machine; I choose D5_v2 or D14_v2.
• Power it on and connect. You now have a 16-core PC at your disposal, billed hourly!
• For preprocessing and image integration I use the temporary D: disk (volatile) or a RAM disk (volatile) as the destination, not my OneDrive folder. I then copy the integrated images into my AP folder.
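The create / deallocate / resize cycle above can also be driven from the Azure CLI instead of the portal. A minimal sketch follows; the resource group and VM names are made up, and the Win2019Datacenter image alias stands in for the Windows 10 image used in the text (desktop images need a full image URN). It is printed as a dry run so the sequence is visible without an Azure login:

```shell
#!/bin/sh
# Dry-run sketch of the VM workflow above using the Azure CLI (az).
# RG/VM names and the image alias are illustrative assumptions.

RG=ap-rg
VM=pi-vm

run() { echo "az $*"; }   # swap 'echo "az $*"' for the real 'az "$@"' when ready

# 1. Create a small machine for setup (Standard_D2_v2).
run vm create --resource-group "$RG" --name "$VM" \
    --image Win2019Datacenter --size Standard_D2_v2

# 2. Install PI, sync OneDrive, then stop AND deallocate (stops compute billing).
run vm deallocate --resource-group "$RG" --name "$VM"

# 3. Resize to 16 cores for the heavy run, then start.
run vm resize --resource-group "$RG" --name "$VM" --size Standard_D5_v2
run vm start --resource-group "$RG" --name "$VM"
```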
Benefits/Tips:
• You can have a 16-core machine billed hourly. A D5_v2 machine costs approx. $2 per hour; building and running a 16-core machine at home would be a very expensive exercise.
• By leveraging the elasticity of Azure you can scale the machine up and down to minimise cost. There is no point paying for 16 cores unless you want to use them, and you don't have to.
• Remember to stop the virtual machine as soon as you are finished with it; otherwise you could incur a significant charge. If the machine shows "Stopped (deallocated)" it has no cost (apart from storage).
• Inbound data has no cost; outbound data does. You tend to take large amounts of incoming data and integrate it into very few, relatively small files, so you might ingest 10 GB of data and output 250 MB. This works brilliantly, but remember to keep the working directories out of OneDrive, otherwise you will start paying for significant amounts of outbound data. At this point the huge internet pipe will hurt you, because it will upload vast amounts of data in no time.
• Normal storage is very cheap in Azure, but beware of premium storage. I tried this and, whilst very fast (1.5 GB/s), it is expensive: 30 GB was costing me approx. $2.50 per day! Storage costs accrue whether the machine is running or not. The temp drive is almost as fast, but remember it doesn't persist between sessions. In many ways it is easier to use multiple RAM drives and then copy off the data you want to keep.
• If you want to minimise storage costs you can selectively sync directories (e.g. by target) so you only use storage as you need it. Because the Azure internet pipe is so fast this doesn't introduce significant delays, and I guess you could even sync your OneDrive folder to volatile memory if you were prepared to resync every time you powered the machine on.
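Since the "Stopped (deallocated)" state is what actually ends compute billing, it is worth checking the power state before walking away. A sketch of that check, with illustrative names and the live query left as a comment (a merely "stopped" machine still bills):

```shell
#!/bin/sh
# Verify a VM is deallocated, not just stopped. Names are assumptions.
# The real query would be:
#   az vm show -d --resource-group "$RG" --name "$VM" --query powerState -o tsv
# Compute billing stops only on "VM deallocated", not on "VM stopped".

RG=ap-rg
VM=pi-vm
state="VM stopped"   # simulated result for a machine shut down from inside the OS

if [ "$state" != "VM deallocated" ]; then
    echo "Still billed! Run: az vm deallocate --resource-group $RG --name $VM"
fi
```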
PI Benchmark Analysis:
• In general the Azure Dn_v2 machines are very capable, especially compared to a normal-spec desktop/laptop. However, I suspect ECC memory limits performance compared to the fastest desktop results, and the CPU clock speeds are also much lower.
http://pixinsight.com/benchmark/benchmark-report.php?sn=KOQG911N6EE0QT031HMCBNF7CD2511CZ
http://pixinsight.com/benchmark/benchmark-report.php?sn=G06X582RBL69239KF28JR12YEL7MJ4RF
• The cheaper An machines are slow; your normal PC is probably just as capable.
http://pixinsight.com/benchmark/benchmark-report.php?sn=EE98891LK5NXVK39J12E28U4I4LF7PCX
http://pixinsight.com/benchmark/benchmark-report.php?sn=L0G7I6GX4G2JDX542OQ5BBK330159I33
Feedback welcome...