SQL Azure Pricing Changes

SQL Azure just got some better pricing!  Here are the details:

Database Size | Price Per Database Per Month
0 to 100 MB | Flat $4.995
Greater than 100 MB to 1 GB | Flat $9.99
Greater than 1 GB to 10 GB | $9.99 for first GB, $3.996 for each additional GB
Greater than 10 GB to 50 GB | $45.954 for first 10 GB, $1.998 for each additional GB
Greater than 50 GB to 150 GB | $125.874 for first 50 GB, $0.999 for each additional GB

Notice the new 0 to 100 MB tier – finally, a good option for small databases, utility databases, blogs, etc.  Note, however, that when setting up a database, there is a maxsize property – currently, the maxsize can be set to 1 GB, 5 GB, 10 GB, and then in 10 GB increments up to 150 GB.  (The 1 GB and 5 GB sizes belong to the Web Edition, and the larger sizes are part of the Business Edition.  Both offer the same availability/scalability.)  So, if a database is set to a maxsize of 1 GB, as long as the size stays at or below 100 MB, the reduced pricing will be in effect.  The price is calculated daily based on the peak size of the database for that day, and amortized over the month.

This is a breakdown of the changes from the previous pricing model:

GB | Previous Pricing | New Pricing | New Price/GB | Total % Decrease
5 | $49.95 | $25.99 | $5.20 | 48%
10 | $99.99 | $45.99 | $4.60 | 54%
25 | $299.97 | $75.99 | $3.04 | 75%
50 | $499.95* | $125.99 | $2.52 | 75%
100 | $499.95* | $175.99 | $1.76 | 65%
150 | $499.95* | $225.99 | $1.51 | 55%

*Previous prices 50 GB and larger reflect the price cap of $499.95 announced December 12, 2011.

For more information, check out the Accounts and Billing in SQL Azure page.  Also, my colleague Peter Laudati has a nice write-up on the changes!
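To make the graduated tiers concrete, here's a small illustrative sketch (my own helper, not an official calculator – rounding partial GBs up to the next GB is an assumption on my part, and it assumes the peak size is the same every day, since billing is actually computed daily and amortized) that maps a peak database size to its monthly rate:

using System;

class SqlAzurePricing
{
    // Tier boundaries and rates from the table above; sizes are in GB.
    static double MonthlyRate(double sizeGB)
    {
        if (sizeGB <= 0.1) return 4.995;                                        // 0 to 100 MB
        if (sizeGB <= 1.0) return 9.99;                                         // 100 MB to 1 GB
        if (sizeGB <= 10.0) return 9.99 + 3.996 * Math.Ceiling(sizeGB - 1);     // 1 GB to 10 GB
        if (sizeGB <= 50.0) return 45.954 + 1.998 * Math.Ceiling(sizeGB - 10);  // 10 GB to 50 GB
        return 125.874 + 0.999 * Math.Ceiling(Math.Min(sizeGB, 150) - 50);      // 50 GB to 150 GB
    }

    static void Main()
    {
        // Derived from the tier rates; the comparison table above rounds these
        // figures slightly (e.g. $25.99 for 5 GB, $125.99 for 50 GB).
        Console.WriteLine("5 GB:  ${0:F2} / month", MonthlyRate(5));   // $25.97
        Console.WriteLine("50 GB: ${0:F2} / month", MonthlyRate(50));  // $125.87
    }
}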

Storing Data in Azure: SQL, Tables, or Blobs?

While building the back end to host our “Rock, Paper, Scissors in the cloud” game, we faced the question of where and how to store the log files for the games that are played.  In my last post, I explained a bit about the idea; in the game, log files are essential for tuning your bot to play effectively.  Just to give a quick example of what the top of a log file might look like: in this match, I (bhitney) was playing a house team (HouseTeam4) … each match is made up of potentially thousands of games, with one game per line.  From the game’s perspective, we only care about the outcome of the entire match, not the individual games within the match – but we need to store the log for the user.  There’s no right or wrong answer for storing data – but like everything else, understanding the pros and cons is the key.

Azure Tables

We immediately ruled out Azure Tables for the whole log file, simply because an entire log is too big for a single entity.  But what if we stored each game (each line of the log) in an Azure Table?  After all, Azure Tables shine at large, unstructured data.  This would be ideal because we could ask specific questions of the data – such as, “show me all games where…”.  Additionally, size is really not a problem we’d face – tables can scale to TBs.

But storing individual games isn’t a realistic option.  The number of matches played in a 100-player round is 4,950 (each pair of players plays one match, so 100 × 99 / 2).  Each match has around 2,000 games, so that means we’d be looking at 9,900,000 rows per round.  At a few hundred milliseconds per insert, it would take almost a month to insert that kind of info.  Even if we could get latency down to a blazing 10 ms, it would still take over a day to insert that amount of data.  (The quick back-of-the-envelope arithmetic is sketched in code at the end of this post.)  Cost-wise, it wouldn’t be too bad: about $10 per round for the transaction costs.

Blob Storage

Blob storage is a good choice as a file repository.  Latency-wise, we’d still be looking at about 15 minutes per round.  We almost went this route, but since we’re using SQL Azure anyway for players/bots, it seemed excessive to insert metadata into SQL Azure and then the log files into Blob Storage.  If we were playing with tens of thousands of people, that kind of scalability would be really important.  But what about Azure Drives?  We ruled drives out because we wanted the flexibility of multiple concurrent writers.

SQL Azure

Storing binary data in a database (even if that binary data is a text file) typically falls under the “guilty until proven innocent” rule.  Meaning: assume it’s a bad idea.  Still, this is the option we decided to pursue.  By using gzip compression on the text, the resulting binary was quite small and didn’t add significant overhead to the original query used to insert the match results to begin with.  Additionally, connection pooling makes those base inserts incredibly fast – much, much faster than blob/table storage.

One other side benefit to this approach is that we can serve the gzip stream without decompressing it.  This saves processing power on the web server, and also takes a 100–200 KB log file down to typically less than 10 KB, saving a great deal of latency and bandwidth cost.  Here’s a simple way to take some text (in our case, the log file) and get a byte array of the compressed data.
This can then be inserted into a varbinary(max) column (or the deprecated image type) in a SQL database:

using System.IO;
using System.IO.Compression;
using System.Text;

public static byte[] Compress(string text)
{
    // UTF-8 encode the text, then gzip it into an in-memory buffer.
    byte[] data = Encoding.UTF8.GetBytes(text);
    var stream = new MemoryStream();
    using (Stream ds = new GZipStream(stream, CompressionMode.Compress))
    {
        ds.Write(data, 0, data.Length);
    }

    // Disposing the GZipStream flushes the compressed bytes into the
    // underlying MemoryStream, so it's safe to read them out here.
    byte[] compressed = stream.ToArray();
    return compressed;
}

And to get that string back:

public static string Decompress(byte[] compressedText)
{
    try
    {
        if (compressedText == null || compressedText.Length == 0)
        {
            return string.Empty;
        }

        using (var ms = new MemoryStream(compressedText))
        using (var zip = new GZipStream(ms, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            // Stream the decompressed bytes out and decode them as UTF-8.
            zip.CopyTo(output);
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }
    catch
    {
        // Corrupt or non-gzip data: fail soft with an empty string.
        return string.Empty;
    }
}

In our case, though, we don’t really need to decompress the log file because we can let the client browser do that!  We have an HTTP handler for this, and quite simply it looks like:

context.Response.AddHeader("Content-Encoding", "gzip");
context.Response.ContentType = "text/plain";
context.Response.BinaryWrite(data.LogFileRaw); // the byte array
context.Response.End();

Naturally, the downside of this approach is that if a browser doesn’t accept gzip encoding, we don’t handle that gracefully.  Fortunately it’s not 1993 anymore, so that’s not a major concern.
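If graceful handling ever became necessary, one simple approach – just a sketch that reuses the Decompress helper and the data.LogFileRaw byte array above; checking the header this way is my assumption, not something the real handler does – is to look at the request’s Accept-Encoding header and fall back to server-side decompression:

string acceptEncoding = context.Request.Headers["Accept-Encoding"] ?? string.Empty;

if (acceptEncoding.IndexOf("gzip", StringComparison.OrdinalIgnoreCase) >= 0)
{
    // Normal path: the browser understands gzip, so send the stored bytes untouched.
    context.Response.AddHeader("Content-Encoding", "gzip");
    context.Response.ContentType = "text/plain";
    context.Response.BinaryWrite(data.LogFileRaw);
}
else
{
    // Fallback: decompress on the server and send plain text.
    context.Response.ContentType = "text/plain";
    context.Response.Write(Decompress(data.LogFileRaw));
}

context.Response.End();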
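Finally, circling back to the Azure Tables discussion: here’s the back-of-the-envelope arithmetic behind the “almost a month” and “about $10 per round” figures, as a tiny standalone sketch (serial inserts and one storage transaction per row are the simplifying assumptions):

using System;

class RoundMath
{
    static void Main()
    {
        // Numbers from the post: 100 players, every pair plays a match, ~2,000 games per match.
        const int players = 100;
        const int gamesPerMatch = 2000;

        int matches = players * (players - 1) / 2;        // 4,950 matches per round
        int rows = matches * gamesPerMatch;               // 9,900,000 games (rows) per round

        // Rough insert times at two latencies, assuming serial inserts.
        double daysAt250ms = rows * 0.250 / 86400;        // ~28.6 days
        double daysAt10ms = rows * 0.010 / 86400;         // ~1.1 days

        // Transaction cost at $0.01 per 10,000 storage transactions.
        double transactionCost = rows / 10000.0 * 0.01;   // ~$9.90 per round

        Console.WriteLine("{0:N0} matches, {1:N0} rows, {2:F1} days @250ms, {3:F1} days @10ms, ~${4:F2}",
            matches, rows, daysAt250ms, daysAt10ms, transactionCost);
    }
}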

SQL Azure Logins

SQL Azure currently has fairly limited management capabilities.  When you create a database, you receive an administrator account that is tied to your login (you can change the SQL Azure password, though).  Because there is no GUI for user management, there’s a temptation to use this account in all your applications, but I highly recommend you create users for your application that have limited access.

If you limit access to only stored procedures, you need to specify execute permissions.  Assuming you want your connection to have execute permissions on all stored procedures, I recommend a new role that has execute permissions.  That way, you can simply add users to this role and, as you add more stored procedures, it simply works.  To create this role, you can do something like this:

CREATE ROLE db_executor
GRANT EXECUTE TO db_executor

Now in the master database (currently, you need to do this in a separate connection – just saying ‘use master’ won’t work) you can create your login for the database:

CREATE LOGIN MyUserName WITH PASSWORD = 'Password';

In your application database, you need to create a user – in this case, we’ll just create a user with the same name as the login:

CREATE USER MyUserName FOR LOGIN MyUserName;

Next, we’ll specify the appropriate roles.  Depending on your needs, you may need only datareader.  I recommend db_owner only if necessary.

-- read/write/execute permissions
EXEC sp_addrolemember N'db_datareader', N'MyUserName'
EXEC sp_addrolemember N'db_datawriter', N'MyUserName'
EXEC sp_addrolemember N'db_executor', N'MyUserName'

-- only if you need dbo access:
EXEC sp_addrolemember N'db_owner', N'MyUserName'

You can continue to customize as necessary, as long as you are familiar with the appropriate T-SQL.
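To put the limited login to work, the application’s connection string just needs to reference it instead of the administrator account.  Here’s an illustrative sketch (the server and database names are placeholders, and the login@servername user format is the convention SQL Azure connection strings typically use):

// Placeholder server/database names – substitute your own.  The point is that the
// application connects as MyUserName (the limited login created above), not as
// the administrator account.
string connectionString =
    "Server=tcp:yourserver.database.windows.net,1433;" +
    "Database=MyApplicationDb;" +
    "User ID=MyUserName@yourserver;" +
    "Password=Password;" +
    "Encrypt=True;";

using (var connection = new System.Data.SqlClient.SqlConnection(connectionString))
{
    connection.Open();
    // Commands issued here run with only db_datareader, db_datawriter, and
    // db_executor rights – not as the server administrator.
}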

Azure SLA Confusion

The Azure SLA is something that gets discussed quite a bit, but there’s one point that I see causing a bit of confusion.  The SLA for Azure compute instances states: “For compute, we guarantee that when you deploy two or more role instances in different fault and upgrade domains, your internet facing roles will have external connectivity at least 99.95% of the time.”

Some folks (for example, this post) incorrectly conclude that you need to deploy your solution across two or more datacenters to get this SLA.  Actually, that’s not true – you just need to make sure the instances are in different fault and upgrade domains, which is typically done by default.  You can think of a fault domain as a physical separation in a different rack: if there’s a hardware failure on the server or switch, it only affects instances within the same fault domain.  Upgrade domains are logical groupings that control how deployments are upgraded.  For large deployments, you may have multiple upgrade domains so that all roles within an upgrade domain are upgraded as a group.

To illustrate this, I spun up 3 instances of Worldmaps running on my local Dev Fabric.  I have an admin tool in the site that shows all current instances, their role, and their domain affiliation.  The admin page uses the RoleEnvironment class to check the status of the roles (more on this in another post), but also displays their fault and upgrade domains.  (A value of “0f” is fault domain 0, “0u” is upgrade domain 0, and so on.)  So by default, my three instances are in separate fault and upgrade domains that correspond to their instance number.

All of these instances are in the same datacenter, and as long as I have at least 2 instances and ensure they have different fault and upgrade domains (which is the default behavior), I’m covered by the SLA.  The principal advantage of keeping everything within the same datacenter is cost savings between roles, storage, and SQL Azure.  Essentially, any bandwidth within the datacenter (for example, my web role talking to SQL Azure or Azure Storage) incurs no bandwidth cost.  If I move one of my roles to another datacenter, traffic between datacenters is charged.  Note, however, that there are still transaction costs for Azure Storage.

This last fact brings up an interesting and potentially beneficial side effect.  While I’m not trying to get into the scalability differences between Azure Table Storage and SQL Azure, from strictly a cost perspective it can be far more advantageous to go with SQL Azure in some cases.  As I mentioned in my last post, Azure Storage transaction costs might creep up and surprise you if you aren’t doing your math.  If you’re using Azure Table Storage for session and authentication information and have a medium-volume site (say, fewer than 10 web roles – but that’s just an off-the-cuff number; it really depends on what your applications are doing), SQL Azure represents a fixed cost, whereas Table Storage will vary based on traffic to your site.

For example, a small SQL Azure instance at $9.99/month works out to about $0.33/day.  Azure Table transactions are $0.01 per 10,000.  If each hit to your site made only one transaction to storage, you could have 330,000 hits per day before reaching the same cost.  Any more, and SQL Azure becomes more attractive, albeit with less scalability.  In many cases, it’s possible you wouldn’t need to go to table storage for every hit, but then again, you might make several transactions per hit, depending on what you’re doing.
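Here’s that last comparison as a quick sketch – one storage transaction per hit is just the simplifying assumption from the example, and the prices are the ones quoted above:

using System;

class StorageCostBreakEven
{
    static void Main()
    {
        const double sqlAzurePerMonth = 9.99;              // small SQL Azure database
        const double costPerTransaction = 0.01 / 10000;    // $0.01 per 10,000 transactions

        double sqlAzurePerDay = sqlAzurePerMonth / 30;     // ~$0.33/day

        // Assuming one storage transaction per hit, this is the daily traffic at
        // which Table Storage costs catch up to the fixed SQL Azure charge.
        double breakEvenHitsPerDay = sqlAzurePerDay / costPerTransaction;

        Console.WriteLine("Break-even: roughly {0:N0} hits/day", breakEvenHitsPerDay);
        // Prints about 333,000 – rounded to 330,000 in the text above.
    }
}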
This is why profiling your application is important. More soon!
