Over the past several years, we have been designing and developing our systems in preparation for moving them “into the cloud”. Whether that meant Microsoft, Amazon, or someone else was unimportant, since the architecture needed to allow for high-availability and load-balanced deployments of our systems; the cloud-specific issues could be figured out later. About a year and a half ago, we deployed some minor systems to Azure and consumed some of its services (most importantly queueing and blob storage). Over the past month and a half, we’ve been making changes specific to Azure. And last weekend, a co-worker of mine (to whom I can’t express enough gratitude) and I spent a grueling 72 hours, beginning Friday morning, migrating all of our databases and systems to Azure. We learned a lot through our various successes and failures during this migration and in the time leading up to it.
For our system, we have a single set of internal WCF services hitting the database and half a dozen internal applications hitting those internal services. One of those internal applications is a set of externally-accessible WCF services, and on our customers’ behalf, we have some custom applications consuming those “public” services. Technologies/systems that we employ include the following:
- SQL Server (a 50GB database, 23GB of which is in a single table)
- SQL Server Reporting Services
- SQL Server Analysis Services
- SQL Server Integration Services
- .NET 4.0/MVC 2.0/Visual Studio 2010
- Claims-Based Authentication (via ADFS)
- Active Directory
- Probably some more that I’m forgetting. If they’re important, I’ll add them back here.
By the end of the weekend, we had successfully migrated all of the critical systems we had planned to move to Azure, and only a couple of non-critical apps still needed migration. We (temporarily) pulled the plug on one of our non-critical applications, partly due to migration difficulties and partly due to a pre-existing bug in it that needs to be fixed ASAP, so we decided to tackle both at once the following week, after getting some sleep. I can’t say the migration went without a hitch. While we had some unexpected major victories in some high-risk areas, we also had some unexpected major problems in some low-risk areas.
I’ll go over some specific experiences in follow-up posts, but here are the key points we took away from this experience that might help others. Some of these we knew about in advance and were prepared to deal with. Others caught us by surprise and caused problems for our migration.
- If you have a large database (anything more than 5GB), do a LOT of testing before you start migrating! Backups, dropping/recreating indexes on large tables, etc.! For instance, we have one table on which we can’t drop and recreate an index, and the default ways of creating backups take 8-10 hours for our database!
- When migrating your database to Azure, don’t use your local machine as your “base system”. Instead, upload a .bak backup file to Azure blob storage using a tool like Cerebrata’s Cloud Storage Studio (which uploads in small chunks, so you can easily recover from errors and improve upload speeds), create a medium-sized Azure Virtual Machine from a SQL Server Evaluation image, and base all of your data migration work there. Unless you get everything working perfectly on your very first try (unlikely), you’ll save a lot of time this way. And for just a couple of bucks (it literally cost us ~$2 for the entire weekend’s worth of VMs we used), it’s totally worth it!
- AUTOMATION!! Automation, automation, AUTOMATION! You’ll do the same things over and over again, so have a solid build server with automated build scripts for all of it! Do NOT deploy from Visual Studio or by any other manual process! A build server will most likely pay for itself before your 5th deployment, regardless of how complex or simple your system is!
- No heaps! You must have a primary key/clustered index on every single table. No exceptions! Period! Exclamation mark!
- Getting Data Sync up and running is a major pain in the ass! Azure’s Data Sync has stricter limitations than SQL Server’s Data Sync (for instance, computed columns don’t play nicely at all in Azure, but SQL Server has no problem with them). There are enough nuances, and they take so long to discover, that you can spend quite a bit of time just figuring this out. Then figuring out how to automate around those nuances is yet another topic of discussion, since the tools are so poor right now.
- Use the SQL Database Migration Wizard to migrate your data from your “base system” to an Azure database. But be gentle with it: it likes to crash, and that’s painful when it happens 3 hours into the process! Also, be aware that it turns NULL values in nullable boolean columns into FALSE and doesn’t play nicely with some special characters, so be prepared to deal with these nuances!
- Red Gate SQL Compare and SQL Data Compare are GREAT tools for making sure your database was migrated properly! SQL Data Compare cleans up the problems left by the SQL Database Migration Wizard very nicely, and SQL Compare gives you reassurance that indexes, foreign keys, etc. all migrated correctly.
- As I said before, test, test, test with your database! For us, 8-10 hour database backups were unacceptable. Our current solution to this problem is to use Red Gate’s Cloud Services Azure Backup service. With the non-transactionally-consistent backup option, we can get it to run in ~2 hours. Since we can have nightly maintenance windows, this works for us.
- Plan on migrating to MVC 3.0 if you want to run on Windows Server 2012 instances.
- If you’re changing opened endpoints in the Azure configuration (i.e., opening/closing holes in firewalls), you have to delete the entire deployment (not the service) and deploy again. Deploying over an existing deployment won’t apply the change, but it also won’t give you any errors. Several hours were wasted here!
- MiniProfiler is pretty awesome! But the awesomeness stops and things get very confusing if you have more than one instance of anything! Perhaps there’s a fix for this, but we haven’t found one yet.
- If you have more than just one production environment, it’s very handy to have different subscriptions to help you keep things organized! Use one subscription for Dev/QA/etc, one for Production, one for Demo, one for that really big customer who wants their own dedicated servers, etc. Your business folks will also appreciate this as it breaks billing up into those same groups. Money people like that.
- Extra Small instances are dirt cheap and can be quite handy! But don’t force things in there that won’t fit. We found that, with our SOA, Extra Small instances were sufficient for all but two of our roles. For the rest, we actually get much better performance with 7 (or fewer) Extra Small instances than with 2 Small instances, at a lower price: 1 Small costs the same as 6 Extra Small, so 2 Small cost the same as 12 Extra Small.
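The chunked-upload advice in the .bak tip works because Azure block blobs are assembled from individually uploaded blocks, so a failed piece can be resent on its own. Below is a minimal Python sketch of that idea under stated assumptions; it is not how Cloud Storage Studio works internally. `upload_block` is a hypothetical callable standing in for whatever actually sends a block (your storage SDK or REST call), and the 4 MB chunk size and three attempts are assumed defaults.

```python
import io
from typing import Callable, List

# Assumed defaults for illustration; tune for your connection.
CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB per block
MAX_ATTEMPTS = 3


def upload_in_chunks(
    stream: io.BufferedIOBase,
    upload_block: Callable[[str, bytes], None],  # hypothetical: sends one block
    chunk_size: int = CHUNK_SIZE,
    max_attempts: int = MAX_ATTEMPTS,
) -> List[str]:
    """Upload a stream as fixed-size blocks, retrying each block independently.

    A dropped connection costs one block, not the whole multi-gigabyte file.
    Returns the block ids in upload order.
    """
    block_ids: List[str] = []
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break  # end of stream
        block_id = f"block-{index:06d}"
        for attempt in range(1, max_attempts + 1):
            try:
                upload_block(block_id, data)
                break  # this block succeeded; move on to the next one
            except OSError:
                if attempt == max_attempts:
                    raise  # give up after the last attempt
        block_ids.append(block_id)
        index += 1
    return block_ids
```

With a real backup you would pass `open("backup.bak", "rb")` as the stream; committing the returned block ids in order into the final blob is a separate step left out here.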
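For the Migration Wizard’s NULL-to-FALSE behavior, a quick spot check can flag affected rows before reaching for SQL Data Compare. This is a hypothetical Python sketch: it assumes you have already read both copies of a table into dictionaries keyed by primary key (however you query them), and the column names in the usage are made up.

```python
from typing import Any, Dict, Hashable, List, Mapping, Tuple

Row = Mapping[str, Any]


def find_null_to_false(
    source: Dict[Hashable, Row],
    target: Dict[Hashable, Row],
    bool_columns: List[str],
) -> List[Tuple[Hashable, str]]:
    """Return (primary key, column) pairs where NULL in the source became FALSE in the target."""
    problems: List[Tuple[Hashable, str]] = []
    for key, src_row in source.items():
        dst_row = target.get(key)
        if dst_row is None:
            continue  # missing rows are a separate check
        for col in bool_columns:
            was_null = col in src_row and src_row[col] is None
            # Treat both False and 0 as "false", since drivers differ in how bit columns come back.
            now_false = col in dst_row and dst_row[col] is not None and not dst_row[col]
            if was_null and now_false:
                problems.append((key, col))
    return problems


# Hypothetical columns "IsActive" and "IsArchived" on a table keyed by integer id.
source = {1: {"IsActive": None, "IsArchived": True}, 2: {"IsActive": True, "IsArchived": None}}
target = {1: {"IsActive": False, "IsArchived": True}, 2: {"IsActive": True, "IsArchived": None}}
print(find_null_to_false(source, target, ["IsActive", "IsArchived"]))  # prints [(1, 'IsActive')]
```

Any hits can then be set back to NULL in the Azure copy (or handed to SQL Data Compare) before cutting over.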
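The Extra Small vs. Small comparison is easy to sanity-check. This tiny Python sketch only encodes the ratio stated in that tip (1 Small costs the same as 6 Extra Small); it doesn’t use actual Azure prices.

```python
# The only input is the ratio from the post: 1 Small instance costs the same as 6 Extra Small.
XS_PER_SMALL = 6


def cost_in_xs_units(extra_small: int = 0, small: int = 0) -> int:
    """Express a deployment's cost in units of one Extra Small instance."""
    return extra_small + small * XS_PER_SMALL


seven_xs = cost_in_xs_units(extra_small=7)  # 7 units
two_small = cost_in_xs_units(small=2)       # 12 units
print(f"7 Extra Small cost {seven_xs / two_small:.0%} of 2 Small")  # prints "7 Extra Small cost 58% of 2 Small"
```

So even at seven instances, the Extra Small setup costs a bit under 60% of the two-Small setup, and any count up to 11 stays cheaper.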
In the next post, we’ll go over the things that we did leading up to this migration to prepare for everything. From system architecture to avoiding SessionState like the plague and retry logic in our DAL, we’ll cover the things that we did to help (or we thought would help) make this an easier migration. And I will also highlight the things we didn’t do that I wish we had done!
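The DAL retry logic mentioned here isn’t shown in this post, so as a stand-in, here is a generic Python sketch of retrying transient failures with exponential backoff and jitter, the usual pattern for cloud database calls. It is not our actual DAL code. `TransientError` is a hypothetical marker exception; deciding which real driver errors count as transient (dropped connections, throttling) is an assumption you’d have to fill in yourself.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (dropped connection, throttling)."""


def with_retry(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (TransientError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation(), retrying transient failures with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Backoff doubles each time (0.5s, 1s, 2s, ...) plus up to 50% random jitter,
            # so many instances don't retry against the database in lockstep.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the helper testable without real waiting; in a DAL you would wrap each query or command call, for example `with_retry(lambda: run_query(...))`, where `run_query` is whatever your data layer already uses.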