In part 1, I detailed some of the specifics of getting Rock, Paper, Azure (RPA) up and running in Windows Azure. In this post, I’ll start detailing some of the other considerations in the project – in many ways, this was a very real migration scenario of a reasonably complex application. (This post doesn’t contain any helpful info for playing the game, but if you’re interested in scalability or migration, read on!)

The first issue we had with the application was scalability. Every time players are added to the game, the scalability requirements of course increase. The original purpose of the engine wasn’t to be some big open-ended game played on the internet; I imagine the idea was to host small matches (10 or fewer players). While the game worked fine for fewer than 10 players, we started to hit some brick walls as we climbed to 15, and then some dead ends around 20 or so. This is not a failing of the original app design, because it was doing what it was intended to do.

In my past presentations on scalability and performance, the golden rule I always discuss is: you have to be able to benchmark and measure your performance. Whether it is 10 concurrent users or a million, there should always be some baseline metric for the application (requests/sec., load, etc.). In this case, we wanted to be able to quickly run (within a few minutes) a 100-player round, with capacity to handle 500 players.

The problem with reaching these numbers is that as the number of players goes up, the number of games played goes up drastically (N × (N − 1) / 2). Even for just 50 players, the curve looks like this:

Now imagine 100 or 500 players! The first step in increasing the scale was to pinpoint the two main problem areas we identified in the app. The primary one was the threading model around making a move. In an even match against another player, roughly 2,000 games will be played. The original code would spin up a thread for each _move_ for each game in the match.
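To make the round-robin math concrete, here is a quick sketch (in Python, just for illustration – the engine itself is C#) of how the match count grows with the player count:

```python
# In a round-robin, every player faces every other player once,
# so the number of matches is N * (N - 1) / 2 -- quadratic growth.
def match_count(players):
    return players * (players - 1) // 2

for n in (10, 50, 100, 500):
    print(n, "players ->", match_count(n), "matches")
# 100 players -> 4,950 matches; 500 players -> 124,750 matches.
```

Doubling the field roughly quadruples the work, which is why the jump from 10 players to 100 was so punishing.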
That means that for a single match, a total of 4,000 threads are created, and a 100-player round has 4,950 matches, or 19,800,000 threads! For 500 players, that number swells to 499,000,000. The advantage of that model, though, is that should a player go off into the weeds, the system can abort the thread and spin up a new one for the next game.

What we decided to do is create a single thread per player (instead of a thread per move). By implementing two wait handles in the class (specifically a ManualResetEvent and an AutoResetEvent) we can accomplish the same thing as the previous method. (You can see this implementation in the Player.cs file, in the DecisionClock class.) The obvious advantage here is that we go from nearly 20 million threads in a 100-player round to around 9,900 – still a lot, but a huge reduction. In the first tests, 5 to 10 player matches would take around 5+ minutes to complete. Extrapolating (we didn’t want to wait), a 100-player round would take well over a day. In the new model, it’s significantly faster – a 100-player round typically completes within a few minutes.

The next issue was multithreading the game loop itself. In the original implementation, games would be played in a loop that matched all players against each other, blocking on each iteration. Our first thought was to use the Parallel Extensions (or PFx) libraries built into .NET 4, kicking off each game as a Task. This did indeed work, but the problem was that the games are so CPU intensive that creating more than one thread per processor is a bad idea. If the system decided to context switch when it was your move, it could throw off the timing, and we saw a few timeouts from time to time. Since modifying the underlying thread pool’s thread count is generally a bad idea, we decided to implement a smart thread pool like the one here on The Code Project. With this, we have the ability to scale the number of threads dynamically based on a number of conditions.
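The real DecisionClock lives in Player.cs and is C#; as a rough sketch of the idea, here is a Python analog using two `threading.Event` objects in place of the ManualResetEvent/AutoResetEvent pair (the class shape and method names here are my own, not the engine’s):

```python
import threading

class DecisionClock:
    """One long-lived thread per player, coordinated with two events,
    instead of a new thread per move."""
    def __init__(self, bot):
        self.bot = bot                              # the player's decision function
        self.move_requested = threading.Event()     # signaled when it's this player's turn
        self.move_ready = threading.Event()         # signaled when the move is computed
        self.move = None
        self.stopped = False
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while not self.stopped:
            self.move_requested.wait()
            self.move_requested.clear()             # emulate AutoResetEvent behavior
            if self.stopped:
                break
            self.move = self.bot()
            self.move_ready.set()

    def get_move(self, timeout=1.0):
        """Ask the player's thread for a move; None on timeout (a stuck bot)."""
        self.move_ready.clear()
        self.move_requested.set()
        if self.move_ready.wait(timeout):
            return self.move
        return None

    def stop(self):
        self.stopped = True
        self.move_requested.set()                   # wake the thread so it can exit

clock = DecisionClock(lambda: "rock")
print(clock.get_move())  # rock
clock.stop()
```

The timeout on `get_move` preserves the one property the thread-per-move design had: a bot that wanders off into the weeds can be abandoned without hanging the game.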
The final issue was memory management, and this one was solved by design: the original engine (and Bot Lab) don’t store any results until the round is over. This means that all the log files really start to eat up RAM … again, not a problem for 10 or 20 players, but with 100-200+ players, the RAM just bogs everything down. The number of players in the Bot Lab is small enough that this wasn’t a concern there, and the game server handles it by design, using SQL Azure to record results as the games are played.

Next time in the deep dive series, we’ll look at a few other segments of the game. Until next time!
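The difference is simply streaming versus buffering. A minimal sketch of the idea (the sink here is a plain function standing in for a SQL Azure insert; nothing below is the engine’s actual code):

```python
# Instead of buffering every game log in memory until the round ends,
# hand each result to a sink as soon as the game finishes. Memory use
# stays flat regardless of how many games the round contains.
class ResultRecorder:
    def __init__(self, sink):
        self.sink = sink  # e.g. a database-insert function

    def record(self, player_a, player_b, winner):
        self.sink({"a": player_a, "b": player_b, "winner": winner})

rows = []
recorder = ResultRecorder(rows.append)
recorder.record("alice", "bob", "alice")
print(len(rows))  # 1 -- written immediately, nothing held for the round
```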
I recently sat down with Peter Laudati, my cloud colleague up in the NY/NJ area, and discussed Worldmaps and the migration to the cloud on Peter’s and Dmitry’s Connected Show podcast. Thanks, guys, for the opportunity!

Connected Show - Episode #40 – Migrating World Maps to Azure

A new year, a new episode. This time, the Connected Show hits 40! In this episode, guest Brian Hitney joins Peter to discuss how he migrated the My World Maps application to Windows Azure. Fresh off his Azure Firestarter tour through the eastern US, Brian talks about migration issues, scalability challenges, and blowing up shared hosting. Also, Dmitry and Peter rap about Dancing with the Stars, the Xbox 360 Kinect, Dmitry’s TWiT application for Windows Phone 7, and Dmitry’s outdoor adventures at 'Camp Gowannas'.

Show Link: http://bit.ly/dVrIXM
I’ve recently completed my migration from Webhost4Life! Woohoo! For a long time, I’ve felt a bit captive because they’ve hosted my blog, email for a few domains, etc., and it’s difficult to make the move. I’ll say this upfront: if you need $10/mo hosting, it’s hard to beat and easier than hosting at home. So right away, before I slam Webhost4Life too much: I understand that at that cheap a price, you can’t really expect the moon. If you’re hosting your kid’s sports league site, the neighborhood website, etc., it’s a nice option. But it ends there.

I’ve been using Webhost4Life since 2004, so I’m a long-time customer. I’ve had a few speed bumps along the way (like getting shut down with no notice due to high volume when I hosted Worldmaps on the site, or arguing over using too much file space when I was still way under the account limit). Webhost4Life also went through a migration back in February that was a bit painful. This past weekend, there was a solid 48-72 hour outage, and given the more than 48-hour response time after I submitted the ticket, it was simply easier to accept the outage and migrate elsewhere.

What bothered me about this outage in particular was that it was clear my files were being messed with. For example, each of my sites looked a little like this:

Here’s another:

Obviously, I last deployed the site back in the migration in late February. I noticed the outage late Friday, 5/7/2010, and it was probably down most of the day. So sometime earlier that day, my web.config files were messed with. Not only that, but each of my sites (some of which were legacy, and had no App_Themes or Ajax) had these added:

And, even further, my global.asax.cs file was modified and its namespace was changed, as if updated by a tool or opened in VS. Many folders had an App_Themes folder – all new. Things get a tad more interesting when I look into exactly what was changed (this is what leads me to believe it’s Webhost4Life, not a hacker, making the changes).
First, this is what my web.config looked like locally … I use three databases. The default and logging databases are the same server; the warehouse is my local server (at home), on the same box as SQL Server, where I store archives of the logging data:

Once deployed, the Warehouse server isn’t used at all; I just keep the setting there so the settings are side by side. When I opened the modified file, I saw this:

My connection strings were modified! First up, my Warehouse setting was something I only used locally at home – it seems some tool has simply replaced it. Also, the sql399 server was replaced with a VCNSQL86 connection. When I log into the Webhost4Life control panel, I see that VCNSQL86 is the correct server name – obviously at some point, the name was changed from sql399. I don’t have a problem with name changes, but I do have a problem with the files being changed for me without my being notified. In fact, I think a better approach is to just let my app die than to modify it for me. Shared hosting or not, I think someone going into the files without explicit permission is a violation. Besides, like most developers, I work locally and FTP my changes to the site, so any changes they made would be overwritten the next time I deploy.

Even though I redeployed the applications, all sites were still broken. Something still wasn’t working … the subdomain-to-folder feature wasn’t working correctly, which prevented the sites from starting. I know the site was working Thursday, so it had to be related to all these changes. After more than 48 hours, I did finally get a reply back on my original ticket, and the reply sums it up:

I have checked the domain 'structuretoobig.com' and sub domain 'blog.structuretoobig.com' and noticed that it is not pointing to our server.

I’ve asked for more details, so we’ll see what happens.
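The screenshots of the before/after files aren’t reproduced here, but the kind of edit involved looks roughly like this (a hypothetical sketch – database and credential details are elided, only the server names come from the post):

```xml
<!-- Hypothetical reconstruction; the post shows screenshots, not this exact markup. -->
<connectionStrings>
  <!-- As I deployed it: -->
  <add name="Default"
       connectionString="Server=sql399;Database=...;User Id=...;Password=..." />
</connectionStrings>

<!-- As found after the outage, rewritten without notice: -->
<connectionStrings>
  <add name="Default"
       connectionString="Server=VCNSQL86;Database=...;User Id=...;Password=..." />
</connectionStrings>
```

A hosting-side rename of the database server explains the edit, but silently rewriting customer files is exactly the wrong way to roll that out.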
The recent update to Windows Azure went quite well! The site is now using a single Azure web role, a single Azure worker role, Azure Queues for workload, and Azure Blobs for storage. It’s also using SQL Azure as the database. From a user’s point of view, not much has changed, but performance and scalability are much improved.

On the stats page, I implemented a few new stats. First up is the hourly breakdown of hits to a site. Below is Channel 9’s current breakdown – a neat way to tell when the traffic to your site is heaviest. In this case, C9 is busiest at 3pm GMT, or about 9am-4pm EST.

In addition, Worldmaps includes country breakdown information:

And Stumbler has been updated a bit, so be sure to check it out and watch traffic in real time!

Finally, there’s a change to the registration process. To add some scalability, Worldmaps now stores data in one of two schemes. The older scheme has been migrated to what is called a “plus” or enhanced account. The newer scheme is the default, and it stores data in a much more aggregated way. What determines how information is stored? It’s based on an invitation code on the Create Map form: if no invitation code is provided, the newer scheme is used; if a valid invite code is provided, the older, more detailed method is used. If you’d like an invite code, drop me some feedback.

What’s the difference? Currently, the difference is pretty small. On the stats page, the current number of unique IPs cannot be calculated, so it looks like so:

Future report options are a bit limited as well, but otherwise, all data (and Stumbler) is still available.
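The hourly-breakdown stat itself is simple aggregation: bucket each hit’s timestamp by hour of day and count. A minimal sketch (timestamps below are made up for illustration):

```python
from collections import Counter
from datetime import datetime

# Bucket hits by hour of day (GMT) to find when traffic is heaviest.
hits = [
    datetime(2010, 1, 4, 15, 12), datetime(2010, 1, 4, 15, 48),
    datetime(2010, 1, 4, 9, 5),   datetime(2010, 1, 5, 15, 30),
]
by_hour = Counter(h.hour for h in hits)
print(by_hour.most_common(1))  # [(15, 3)] -> busiest at 15:00 GMT
```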
For those of you with Worldmaps accounts, the Azure migration is (hopefully) underway. This process will take a while, in part due to the holiday vacation :) but also testing and whatnot. Starting today, the accounts section will be closed for changes until the migration is complete. Hopefully, the migration will not cause any breaking changes. However, the biggest possible change is to make sure you are accessing the service using www in the URL … for example, “http://www.myworldmaps.net….” If you’re leaving that out, it will have to be added in, or the map will not work after the migration, due to limitations with DNS CNAMEs in Azure (a CNAME can point www.myworldmaps.net at the Azure host, but not the bare domain). Stay tuned, and hope for a smooth ride and management approval :)
Back in 2004, I created my own blog engine (if you can call it that) as part of my site. At the time, it was a fun exercise, as I was new to RSS. Over time, like any other pet project, my desire to keep it up to date waned, and it eventually became more of a burden than anything else. There was a list of features I wanted to have – Live Writer support being one, plus a myriad of others. Why reinvent the wheel?
So I spent a considerable amount of time thinking about what to use as a new platform. I didn’t want to migrate to blogs.msdn.com. I looked at Community Server, and while it’s certainly a solid product, it wasn’t what I was looking for. Then I stumbled on blogengine.net, an open source blog engine built on ASP.NET. This was exactly what I was looking for! Robust enough that it solves the major issues (including Live Writer support), but simple enough that I can leverage, modify, and extend it to my heart’s content.
Like Worldmaps a few weeks back, the blog is now a separate entity at blog.structuretoobig.com, although I still have it integrated with some parts of my site. If you read the blog through my site, not too much has changed although you may wish to bookmark blog.structuretoobig.com. For RSS subscribers, the new RSS feed is:
The old RSS link will redirect here. If there are any hiccups in the migration process (broken feeds, duplicate entries, etc.), I apologize, but hopefully this is a one-time thing.
So far, I’m pretty impressed with blogengine.net, and the migration was fairly smooth. BlogEngine will import BlogML or RSS; however, I couldn’t get the RSS import to work. I downloaded another tool off CodePlex to convert RSS to BlogML, and then imported the BlogML file into blogengine. The results were really good. There might be a few small issues here and there, but as a whole, I was pleasantly surprised at how quick it was. So, if you’re looking for a blogging engine for your site, definitely check out blogengine.
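The RSS-to-BlogML conversion the CodePlex tool performs is conceptually just a mapping of RSS `<item>`s onto BlogML `<post>`s. A very rough sketch of the idea (the real BlogML schema has many more elements; the feed content and URL below are invented for illustration):

```python
import xml.etree.ElementTree as ET

# Toy RSS input standing in for a real feed.
rss = """<rss><channel>
  <item><title>First post</title><link>http://example.com/1</link>
        <description>Hello</description></item>
</channel></rss>"""

# Map each RSS <item> onto a BlogML-style <post>.
blogml = ET.Element("blog")
posts = ET.SubElement(blogml, "posts")
for item in ET.fromstring(rss).iter("item"):
    post = ET.SubElement(posts, "post",
                         {"post-url": item.findtext("link", "")})
    ET.SubElement(post, "title").text = item.findtext("title", "")
    ET.SubElement(post, "content").text = item.findtext("description", "")

print(ET.tostring(blogml, encoding="unicode"))
```

The awkward part in practice is everything RSS doesn’t carry (categories, comments, author metadata), which is why a dedicated converter beats rolling your own.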