Building for Cost Part 2

In the last post, we looked at ways to separate content across different pieces of Windows Azure to create a more cost-effective solution, using Windows Azure Web Sites (WAWS) for the site itself and Windows Azure Storage for images and other binary/static files. Although we're looking at this purely for cost efficacy, that approach frankly just makes sense from an availability and performance perspective, too.

Another fantastic service is Windows Azure Mobile Services (WAMS), but unsurprisingly, it has its own cost structure. Similar to WAWS, you can have up to 10 services for free. But the way these services are metered is completely different, and it might change the way we build an application.

First, let's discuss the free tier. The current metering, limited to 100 devices, means that using an authenticated client (that is, any client using the SDK) renders this tier useless for anything other than testing. Take my low-use application as an example: we're not incurring many API calls, but our device count is pretty high. When I designed the app, these meters weren't in place; if they had been, I'd have designed the app differently. Hopefully we'll see some changes here, as the current plan is a bit inflexible. What this means, though, is that for my app to keep working, I need to upgrade to the $25/mo plan.

By making a few specific decisions early on, we can build an application that takes advantage of both WAMS and WAWS. For example, WAMS custom APIs can be invoked over plain HTTP (a minimal sketch of such a call appears at the end of this post). I leveraged this in Brew Finder for several items: brewerydb's callback, live tiles, and web services that don't require authentication. Because these calls come from "non-devices," the dashboard shows 0 devices for this usage. I'm not exactly sure why it also shows zero bytes out (for the time window shown, the traffic may simply not be significant), but the 0 devices and the API call count are accurate.

We also have the option to deploy another WAWS with node.js if we'd like to stick with server-side JavaScript. Remember, the database can be shared by any number of WAWS, WAMS, and other services. So if you need to curb your API calls and don't need WAMS-specific features (like authentication), offload that work to a WAWS, where the calls aren't metered.

Summary

The biggest challenge in developing a cloud-backed app is guesstimating usage. The first step, though, is to understand what is being measured: API calls, devices, bandwidth, CPU time, and so on. Spreading the application architecture across multiple services for different functions (authentication, live tiles, etc.) is a great way to keep the application flexible so you can take advantage of the way each service is metered, bearing in mind this is always subject to change.

Having trouble estimating or getting started? Drop me a note or leave a comment on this post. I'd be happy to help you work through some ideas.
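As a footnote to the custom API point above: because a WAMS custom API is just an HTTP endpoint, any client can hit it without the Mobile Services SDK, and the request counts as coming from a "non-device." Here's a minimal sketch; the service name and endpoint below are hypothetical stand-ins, not Brew Finder's actual URLs:

    // Minimal sketch: calling a WAMS custom API over plain HTTP, no SDK.
    // "myservice" and "api/tiles" are hypothetical stand-ins.
    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    class CustomApiClient
    {
        static async Task Main()
        {
            using (var http = new HttpClient())
            {
                // With the API's permission level set to "everyone," no
                // X-ZUMO-* authentication headers are required.
                var response = await http.GetAsync(
                    "https://myservice.azure-mobile.net/api/tiles?id=42");
                response.EnsureSuccessStatusCode();
                Console.WriteLine(await response.Content.ReadAsStringAsync());
            }
        }
    }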

Building for Cost using Windows Azure

One of the great advantages of a green-field, cloud-based infrastructure is being able to tailor the app to specific features of the platform. Sometimes cost isn't an issue. But many times, particularly for indie devs, it is. Scale isn't the foremost concern here, and often dependability isn't, either; rather, it's flexibility and cost. In this post, I'm going to dissect a few ways to take advantage of Azure and save some coin in the process. Estimating Azure costs continues to be one of the most frequent questions I'm asked.

First, a couple of housekeeping issues. Windows Azure has a pricing calculator that shows the current prices and lets you turn the knobs and dials to forecast your bill. I'm writing this blog post on 7/24/2013, and since prices are dynamic, this post will date as prices change, so keep that in mind. Also, I'm doing rough math to keep this simple, and picking zone 1 for bandwidth costs (which is everywhere except East/Southeast Asia, where rates are slightly higher).

What hat are you wearing today?

I'm approaching this as an indie dev who wants to create application services and host content without incurring a big expense. I need reasonable reliability (no under-the-desk servers, but I don't need geo-load balancing, either), and I'm not willing to pay for 2+ servers to get an SLA. In short, my apps currently don't make enough money. The data points below are taken from my current apps, which you can see on my blog sidebar.

Windows Azure Virtual Machines cost $0.02/hr, or about $15/mo. If you're the kind of person who likes to install all kinds of stuff, this might be for you. It's a good option to keep in mind because, unlike most other offerings, it's completely customizable. I could load up this server and support all my apps if I wanted to, but for now, this isn't anything I'm interested in.

Now, let's look at Windows Azure Web Sites (WAWS). With WAWS, you can easily deploy web sites and apps, but you lose some of the customization. You can either spin up an app based off an image (like WordPress) or deploy your own web site. You can host up to 10 sites for free, or upgrade to shared mode, which is required for custom domains and costs about $10/mo. If we stick with free, we're metered to about 1 hour of CPU time per day (which means deploying your own chess engine might be a bad idea). This is also sub-metered to make sure we're not gobbling up too much in any given block of time; the dashboard for my Dark Skies / Windows 8 app shows these meters.

You generally don't have to worry about file system storage and memory usage. Your app is unlikely to change its size footprint, and you'd need a specialized case to warrant memory concerns. The free tier is also limited to 165 MB/day of outbound bandwidth. This is the one we need to watch. More users often mean more CPU, and can mean higher memory usage, but they will certainly linearly impact the bandwidth our app uses. In addition, I can see most of these values plotted for the last week on the dashboard, and this WAWS has a very predictable usage pattern. I'm not concerned about the outbound bandwidth yet, but that's by design. Knowing this value is metered strictly, it's best to keep images and other static files (such as JS files) elsewhere, such as in blob storage, as those will tear through the 165 MB/day limit.

So let's take a step back for a moment. What is this WAWS doing?
It's serving Windows 8 live tiles for Dark Skies. The ASP.NET MVC based site has an engine that determines the moon's rise/set times and passes this down to a view, which looks like:

    <tile>
      <visual>
        <binding template="TileWideSmallImageAndText03" branding="name">
          <image id="1" src="http://lightpollution.blob.core.windows.net/icons/80/waning_gibbous.png" alt="moon phase"/>
          <text id="1">Waning Gibbous Rises at 09:43 p.m. Sets at 09:41 a.m.⁺</text>
        </binding>
        <binding template="TileSquarePeekImageAndText04" branding="name">
          <image id="1" src="http://lightpollution.blob.core.windows.net/icons/150/waning_gibbous.png" alt="moon phase"/>
          <text id="1">Waning Gibbous Rises at 09:43 p.m., sets at 09:41 a.m.⁺</text>
        </binding>
      </visual>
    </tile>

A number of tiles get served from this site, and at this point in time, that's all it does. Pretty simple for an entire website. (A rough sketch of a controller serving a feed like this appears at the end of this section.) All of the moon images are stored in blob storage instead of in the WAWS. The tile XML is pulled by every Windows 8 app updating its live tile, which is about 1,100 requests per hour. The outbound text adds up to about 20 MB/day, or about 600 MB/mo. If we assume each of those requests also displays a 10 KB image like the tile above, that's 1,100 x 24 = 26,400 requests/day * 10 KB = 264,000 KB/day, or 264 MB/day. Ouch! Remember, we're limited to 165 MB/day on the free tier, so we can keep within that limit by placing the images in blob storage. Blob storage isn't free, though, so this presents a decision: do you put the images in blob storage, or keep them in the WAWS and upgrade to the shared tier? Here's a quick look at the shared tier.

Using a Shared WAWS

There's no limiting of bandwidth; we simply pay for what we use. In the example above, 264 MB/day is about 7.5 GB/mo. Looking at the data transfer rate page, the first 5 GB/mo is free, followed by $0.12/GB. Since our total outbound, including the raw text, puts us at about 8 GB/mo, we're only charged for 3 GB, spending $0.36/mo on bandwidth. Remember, too, that we upgraded to shared mode to unlock the restrictions, which is $9.68/mo. All said, we're looking at $10.04/mo to put the entire site in a single WAWS.
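As an aside, the controller behind a feed like this can be tiny. What follows is a minimal sketch under my own naming, not the actual Dark Skies code; the model values, view name, and cache policy are illustrative assumptions:

    // Minimal sketch of an MVC action serving live tile XML.
    // Not the actual Dark Skies code; names here are hypothetical.
    using System;
    using System.Web;
    using System.Web.Mvc;

    public class MoonPhaseModel
    {
        public string Phase { get; set; }
        public string Rises { get; set; }
        public string Sets { get; set; }
    }

    public class TileController : Controller
    {
        public ActionResult Moon()
        {
            // Stand-in values; the real engine computes rise/set times.
            var model = new MoonPhaseModel
            {
                Phase = "Waning Gibbous",
                Rises = "09:43 p.m.",
                Sets = "09:41 a.m."
            };

            // Tiles poll on a schedule, so allow downstream caching.
            Response.Cache.SetCacheability(HttpCacheability.Public);
            Response.Cache.SetMaxAge(TimeSpan.FromMinutes(30));
            Response.ContentType = "text/xml";

            // The "MoonTile" view emits the <tile> markup shown above,
            // with image URLs pointing at blob storage.
            return View("MoonTile", model);
        }
    }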
Using Storage

In this example, let's break out the images (as is currently done in production) and compare costs. Our WAWS is free, and looking at the meters, we're comfortably within the usage restrictions. The only thing we need to worry about is our blob storage account. Blob storage, if you aren't familiar with it, is a service that holds any kind of binary data, like files, images, and documents. It's scalable, it's an easy way to separate a website's code from its content, and it compares to Amazon's S3. (A sketch of uploading assets to blob storage appears at the end of this post.)

Looking at the storage pricing page, our images consume a few MB at most, so we can round this cost down to $0.01; it's impossibly small at this scale. Transactions are incurred for every GET request, at a cost of $0.01 per 100,000 transactions. Since we've got 26,400 requests/day, that's 792,000 requests/mo, or about $0.08. The outbound bandwidth remains unchanged.

All said, we're looking at $0.36/mo for bandwidth and $0.09/mo for storage, for a total of $0.45/mo.

Conclusion

Obviously, it's a big cost savings to stick with the free Windows Azure web site and offload the assets into blob storage. We don't have an SLA on the free tier, but then, if a request fails, it fails silently as the tile simply skips that update; the user isn't likely to notice. The difference gets a little more interesting when looking at Windows Azure Mobile Services, and I'll do that in the next post…
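For completeness, here's roughly what pushing the images into blob storage looks like with the .NET storage client (the WindowsAzure.Storage NuGet package). This is a minimal sketch; the account name, key, container, and file names are placeholders:

    // Minimal sketch: uploading a tile image to public blob storage.
    // Account name/key, container, and file names are placeholders.
    using System;
    using System.IO;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    class UploadIcons
    {
        static void Main()
        {
            var account = CloudStorageAccount.Parse(
                "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=<key>");
            var client = account.CreateCloudBlobClient();

            // Public "blob" access lets live tiles fetch images directly by URL.
            var container = client.GetContainerReference("icons");
            container.CreateIfNotExists();
            container.SetPermissions(new BlobContainerPermissions
            {
                PublicAccess = BlobContainerPublicAccessType.Blob
            });

            var blob = container.GetBlockBlobReference("80/waning_gibbous.png");
            blob.Properties.ContentType = "image/png";
            using (var stream = File.OpenRead("waning_gibbous.png"))
            {
                blob.UploadFromStream(stream);
            }
            Console.WriteLine("Uploaded to " + blob.Uri);
        }
    }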

Azure SLA Confusion

The Azure SLA is something that gets discussed quite a bit, but there's one point I see causing confusion. The SLA for Azure compute instances states:

For compute, we guarantee that when you deploy two or more role instances in different fault and upgrade domains, your internet facing roles will have external connectivity at least 99.95% of the time.

Some folks (for example, this post) incorrectly conclude that you need to deploy your solution across two or more datacenters to get this SLA. Actually, that's not true; you just need to make sure the instances are in different fault and upgrade domains, which is typically done by default. You can think of a fault domain as physical separation, such as a different rack: if there's a hardware failure on the server or switch, it only affects instances within the same fault domain. Upgrade domains are logical groupings that control how deployments are upgraded. For large deployments, you may have multiple upgrade domains so that all roles within an upgrade domain are upgraded as a group.

To illustrate this, I spun up 3 instances of Worldmaps running on my local Dev Fabric. I have an admin tool in the site that shows all current instances, their role, and their domain affiliation. The admin page uses the RoleEnvironment class to check the status of the roles (more on this in another post), but also displays their fault and upgrade domains. (A value of "0f" is fault domain 0, "0u" is upgrade domain 0, and so on; a simplified sketch of this appears at the end of this post.) By default, my three instances are in separate fault and upgrade domains that correspond to their instance number.

All of these instances are in the same datacenter, and as long as I have at least 2 instances in different fault and upgrade domains (the default behavior), I'm covered by the SLA. The principal advantage of keeping everything within the same datacenter is cost savings between roles, storage, and SQL Azure. Essentially, any bandwidth within the datacenter (for example, my webrole talking to SQL Azure or Azure Storage) incurs no bandwidth cost. If I move one of my roles to another datacenter, traffic between datacenters is charged. Note, however, that there are still transaction costs for Azure Storage.

This last fact brings up an interesting and potentially beneficial side effect. While I'm not trying to get into the scalability differences between Azure Table Storage and SQL Azure, strictly from a cost perspective, SQL Azure can be far more advantageous in some cases. As I mentioned in my last post, Azure Storage transaction costs might creep up and surprise you if you aren't doing your math. If you're using Azure Table Storage for session and authentication information and have a medium-volume site (say, fewer than 10 webroles, but that's just my off-the-cuff number; it really depends on what your applications are doing), SQL Azure represents a fixed cost, whereas Table Storage varies with traffic to your site.

For example, a small SQL Azure instance at $9.99/month is about $0.33/day. Azure Table transactions are $0.01 per 10,000. If each hit to your site made only one transaction to storage, you could serve 330,000 hits per day before reaching the same cost. Any more, and SQL Azure becomes more attractive, albeit with less scalability. In many cases you wouldn't need to go to table storage on every hit, but then again, you might make several transactions per hit, depending on what you're doing.
This is why profiling your application is important. More soon!
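For those curious, the domain listing on that admin page boils down to a few lines against the RoleEnvironment class. This is a simplified sketch of the idea, not the actual Worldmaps code:

    // Simplified sketch: listing each instance's fault/upgrade domain
    // via RoleEnvironment (must run inside the fabric, e.g. Dev Fabric).
    using System;
    using Microsoft.WindowsAzure.ServiceRuntime;

    public static class DomainReport
    {
        public static void Write()
        {
            foreach (var role in RoleEnvironment.Roles.Values)
            {
                foreach (RoleInstance instance in role.Instances)
                {
                    // Prints e.g. "...IN_0: 0f / 0u", meaning fault domain 0
                    // and upgrade (update) domain 0.
                    Console.WriteLine("{0}: {1}f / {2}u",
                        instance.Id, instance.FaultDomain, instance.UpdateDomain);
                }
            }
        }
    }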

Thoughts on Windows Azure Pricing…

There are a LOT of posts out there talking about Azure pricing. There's the Azure TCO Calculator, plus some good practices scattered around that demystify things. Some of those bear repeating here, but I also want to take you through my math on expenses: how you design your app can have serious consequences on your pricing. So let's get the basic pricing out of the way first (just for the Azure platform, not AppFabric or SQL Azure):

- Compute: $0.12/hour
- Storage: $0.15/GB stored per month
- Storage transactions: $0.01 per 10,000
- Data transfers: $0.10 in / $0.15 out per GB ($0.30 in / $0.45 out per GB in Asia)

Myth #1: If I shut down my application, I won't be charged.

Fact: You will be charged for all deployed applications, even if they aren't running. This is because the resources are allocated on deployment, not when the app is started. Therefore, always be sure to remove deployments that aren't running (unless you have a good reason to keep them there).

Myth #2: If my application is less CPU-intensive or idle, I will be charged less.

Fact: For compute hours, you are charged the same whether your app is at 100% CPU or idle. There's some confusion (and I was surprised by this, too) because Azure and cloud provisioning are often referred to as "consumption based" and, in this case incorrectly, compared to a utility like electricity. A better analogy is a hotel room: an Azure deployment reserves a set of resources, and like the hotel room, whether you use it or not doesn't change the rate.

On the plus side, compute hours are a fairly easy thing to calculate: it's the number of instances across all of your roles * $0.12 for small VM instances. A medium instance (2 cores) is $0.24, and so on.

Myth #3: There's no difference between a single medium instance and two small instances.

Fact: While there is no difference in compute price, there is a significant difference in that the two small instances offer better redundancy and scalability. It's the difference between scaling up and scaling out. The ideal scenario is an application that can add instances on demand, but the reality is that applications need to be written to support this.

In Azure, requests are load balanced across all instances of a given webrole, which complicates session and state management. Many organizations use what are called sticky sessions (or sticky persistence) when implementing their own load balancing: when a user visits a site, they continue to hit the same server for their entire session. The downside of this approach is that should the server go down, the user is redirected to another server and loses all state information. It's a viable solution in many scenarios, but not one that Azure load balancing offers.

Scaling up is done by increasing your VM size to medium (2 cores), large (4 cores), or XL (8 cores), with more RAM allocated at each level. The single instance becomes much more powerful, but you're limited by the hardware of a single machine.

In Azure, the machine keys are synchronized among instances, so there is no problem with cookies and authentication tokens, such as those in the ASP.NET membership providers. If you need session state, this is where things get more complicated. I will probably get zinged for saying this, but there is currently no good Azure-based session management solution. The ASP.NET providers sample in the SDK does have a Table Storage session state demo, but the performance isn't ideal.
There are a few other solutions out there, but currently the best bet is to avoid relying on session state and use cookies whenever possible.

Now, having said all this, the original purpose of the post: I want to make sure folks understand transaction costs with Azure Storage. Any time your application so much as thinks about storage, it's a transaction. Let's use my Worldmaps site as an example. This is not how it works today, but it very easily could have been. A user visits a blog that pulls an image from Worldmaps. Let's follow that through:

    Step  Action                                                    Transaction #
    1     User's browser requests image.                            -
    2     Worker role checks queue (empty).                         1
    3     First hit for a map (not in cache): stats/data pulled
          from storage.                                             2
    4     Application enqueues hit to an Azure queue.               3
    5     Application redirects user to blob storage for map file.  4
    6     Worker dequeues hit.                                      5
    7     Worker deletes message from queue.                        6

While step 3 happens only on the first hit for a given map, there are other transactions going on behind the scenes, and if you are using the Table Storage session state provider, it's another transaction per hit (possibly two, if session data is changed and needs to be written back to storage).

If Worldmaps does 200,000 map hits per day (not beyond the realm of possibility, but currently a bit high), then 200,000 * 6 = 1,200,000 storage transactions. They are sold in units of 10,000 transactions for $0.01, so that's 120 units, or $1.20 per day. Multiply that by 30 days, and it's about $36/mo for storage transactions alone, not counting bandwidth or compute time. I realized this early on, and as a result I significantly changed the way the application works.

Tips to save money:

- If you don't need durability, don't use Azure queues. Worldmaps switches between in-memory queues and Azure queues based on load, configuration, and task. Since queue operations are REST calls, you could also make a WCF call directly to another role.
- Consider scaling down worker roles by multithreading, particularly for IO-heavy roles. Also, a webrole's Run() method (not implemented) simply calls Thread.Sleep(-1), so why not override it to do processing? (A minimal sketch follows at the end of this post.) More on this soon…
- SQL Azure may be cheaper, depending on what you're doing, and potentially faster because of connection pooling.
- If you aren't interested in the CDN, use Azure Storage only for dynamic content.
- Don't forget about LocalStorage. While it's volatile, you can use it as a cache to serve items from the role instead of from storage.
- Nifty backoff algorithms are great, but implement them only to save transaction costs; they won't affect the compute charge.
- Take advantage of the many programs out there, such as the hours included in MSDN subscriptions.

Next up will be some tips on scaling down and maximizing the compute power of each instance.
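To make the Run() tip concrete, here's a minimal sketch of overriding a webrole's entry point so the web instance doubles as a worker. The ProcessNextHit method is a hypothetical stand-in for whatever dequeue/processing work your app needs:

    // Minimal sketch: a webrole's default Run() just blocks forever
    // (effectively Thread.Sleep(-1)); overriding it reclaims that thread.
    using System;
    using System.Threading;
    using Microsoft.WindowsAzure.ServiceRuntime;

    public class WebRole : RoleEntryPoint
    {
        public override void Run()
        {
            while (true)
            {
                try
                {
                    ProcessNextHit(); // hypothetical: drain a queue, update stats
                }
                catch (Exception ex)
                {
                    // Log and keep going: if Run() ever returns,
                    // the instance is recycled.
                    Console.Error.WriteLine(ex);
                }
                Thread.Sleep(TimeSpan.FromSeconds(5));
            }
        }

        private void ProcessNextHit()
        {
            // ... dequeue from an in-memory or Azure queue and do the work ...
        }
    }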
