Learn our strategy for migrating 75,000 lines of code over 5 months with no downtime
For a large portion of last year I was fully immersed in a software migration I hope is required only once a decade...or less. After successfully emerging from the challenge, and allowing a few months to replenish my joie de vivre, I’m excited to share our journey to Python 3, including how the strategy we used minimized risk and preserved equanimity throughout the process.
GQueues was started in 2008, when Python 2.5 was the only language available for Google App Engine, which had launched in beta earlier that year. So the entire backend of GQueues was written in Python 2. For years we blissfully ignored migrating to Python 3 (with all its breaking changes) because App Engine didn’t even offer it as an option until 2018, when they launched their second generation runtimes. The next couple of years flew by as we prioritized building highly requested features for our users. Before we knew it, the official Python 2 sunset date came and went (January 1, 2020), and it was time to accept reality and face the daunting migration that lay ahead.
One of the initial selling points of App Engine was the large number of services built into the platform – it provided everything needed to build a modern web app: a web framework, Datastore API, Search API, Memcache API, Task Queue API, Deferred API, Mail API – and GQueues used them all! In the second generation runtimes, the built-in services were removed from App Engine to increase app portability. So before we could migrate to Python 3, we first needed to migrate all of the built-in services to their corresponding unbundled Google Cloud replacements!
In 2020 we moved from the Search API to Elasticsearch on Google Kubernetes Engine, which was a 3-month project itself, with a strategy complex enough to warrant a blog post of its own. In 2021 another six weeks were spent migrating from the Memcache API to Google Cloud Memorystore for Redis. If you're looking to migrate legacy services to the latest runtimes, Google’s Wesley Chun has created a number of excellent codelabs and videos that walk you through most of the details.
At this point in our journey Google announced support for legacy bundled services on second generation runtimes through language-idiomatic libraries. THANK GOODNESS! We could now start the actual Python 3 migration without having to move the remaining services!
App Engine documentation suggests the following migration strategy to move an app to Python 3:
While this process may work for apps with small code bases or occasional users, it seemed way too risky for GQueues. Working months on updating code and then flipping a switch at the end felt irresponsible, and downright dangerous. Spending a significant amount of time upfront without knowing whether the migration would work is not a wise use of company resources. And the danger of updating the entire app all at once, potentially breaking everything for all users, was too much stress and anxiety to bear!
After much thought and consideration we settled on the following migration strategy:
This approach provided a number of advantages for us:
With a solid strategy in place, we could now start figuring out the implementation details to actually carry out the migration.
Since our approach centered around running Python 2 and 3 versions of the app in production simultaneously, we first had to figure out how to set this up for local development. We configured dev_appserver.py to support both runtimes and start services for our Python 2 app.yaml and Python 3 app.yaml files.
In our local environment we set up nginx as a reverse proxy to fill the same role as the Google Cloud Load Balancer. Then, instead of accessing dev_appserver.py services directly like before, requests went through nginx, which would forward to the Python 2 service or Python 3 service based on the URL. As we migrated features we updated nginx.conf to route traffic to the appropriate instance.
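The real routing lived in nginx.conf locally and in the load balancer in production, but the rule itself is small enough to sketch in Python. The prefixes, ports, and function name below are hypothetical and chosen only for illustration: any path under a migrated prefix goes to the Python 3 service, and everything else stays on Python 2.

```python
# Hypothetical values for illustration; the real prefixes and ports are not in the post.
PY2_BACKEND = "http://localhost:8080"
PY3_BACKEND = "http://localhost:8081"

# URL prefixes whose handlers have already been ported to Python 3.
MIGRATED_PREFIXES = ["/support/", "/api/export/"]


def choose_backend(path, migrated_prefixes=MIGRATED_PREFIXES):
    """Return the backend URL that should serve the given request path."""
    for prefix in migrated_prefixes:
        if path.startswith(prefix):
            return PY3_BACKEND
    # Anything not yet migrated keeps going to the Python 2 service.
    return PY2_BACKEND
```

Migrating a feature then amounts to adding its prefix to the list, which mirrors the small routing updates described above.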
At the time we discovered one annoyance running dev_appserver.py for Python 3: a new virtualenv is created and all dependencies are installed from scratch every time the server is started. This mimics what happens in production, but on a local machine it means initializing the server takes over a minute – painful! Fortunately, we found a patch that forces dev_appserver.py to use an existing virtualenv, which brought initialization down to 5 seconds. And then we only had to update the virtualenv when dependencies were changed in requirements.txt.
Since then, Google has added a "--python_virtualenv_path" flag to dev_appserver.py so you can use a persistent virtualenv now with no patch required!
Of course, before we could start porting code we had to learn what’s actually different in Python 3. There are tons of resources on the web covering the changes, like the cheat sheet from Python-Future, and notes from Guido himself. As we migrated the first files (our internal support tools), we began building our own reference sheet on what we needed to look for in our Python 2 code, and how it should be changed for Python 3. This included Python 3 language changes, as well as those required for switching to the Django web framework.
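To give a flavor of what such a reference sheet covers, here are a few well-known Python 2 to 3 changes written as runnable Python 3. This is not our actual sheet, just a small sample of the kind of items it contained; the variable names are made up.

```python
counts = {"inbox": 3, "done": 7}

# Python 2: counts.iteritems()   ->  Python 3: counts.items()
pairs = sorted(counts.items())

# Python 2: 7 / 2 == 3           ->  Python 3: 7 / 2 == 3.5, so use // for floor division
half = 7 // 2

# Python 2: str is bytes, unicode is text  ->  Python 3: str is text, bytes is separate
raw = "caf\u00e9".encode("utf-8")   # bytes
text = raw.decode("utf-8")          # str

# Python 2: except ValueError, e:  ->  Python 3: except ValueError as e:
try:
    int("not a number")
except ValueError as exc:
    error_message = str(exc)

# Python 2: print "done"         ->  Python 3: print("done")
print("done")
```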
After migrating the first handful of files we had most of the necessary changes documented in our reference. As with any migration project, accuracy and thoroughness are critical. So just like pilots who always follow their pre-flight checklist, we used our reference sheet as a checklist, reviewing each Python 2 file for every item in the sheet, to make sure nothing was missed.
Reading through all the backend code, some of which was written over ten years ago, it was tempting to refactor during the migration process. This would of course lengthen an already very long project, so we decided to refrain from refactoring unless absolutely necessary.
We did reorganize files when it made sense, and split code when required so there was only one class per file (which was definitely not always the case in earlier code). Many classes also had staticmethods, which we had to split out into their own utility modules to eliminate dependencies that would have made migrating individual features nearly impossible.
We chose Django as our new web framework because it's fairly similar to the webapp2 framework built for App Engine’s Python 2 runtime. So we didn't have to update any of our HTML templates, and most of the required updates were minor syntax changes. And we were able to simplify some other parts of the migration by writing custom middleware to wrap the requests.
For instance, webapp2 conveniently allowed cookies to be set on the request object, which we used throughout the Python 2 codebase. Django only supports setting cookies on the response object, but a small custom middleware provided the missing functionality, so no other cookie-related changes were needed.
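The middleware itself isn't reproduced here, but a minimal sketch of the idea might look like the following. It assumes Django's standard callable middleware shape (a class that receives get_response) and the response object's set_cookie method. The pending_cookies attribute and the request.set_cookie helper are hypothetical names added by this sketch, not part of Django.

```python
class RequestCookieMiddleware:
    """Let handlers queue cookies on the request, then copy them to the response."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Hypothetical attribute: cookies queued while the view runs.
        request.pending_cookies = []

        def set_cookie(key, value="", **kwargs):
            request.pending_cookies.append((key, value, kwargs))

        # Hypothetical helper so existing request-based call sites keep working.
        request.set_cookie = set_cookie

        response = self.get_response(request)

        # Apply every queued cookie to the real response object.
        for key, value, kwargs in request.pending_cookies:
            response.set_cookie(key, value, **kwargs)
        return response
```

With something like this listed in MIDDLEWARE, handler code that sets cookies on the request can stay as it was.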
Another custom middleware allowed us to easily route logs to Google Cloud Logging in production and to the console when running locally.
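As a rough illustration of the production-versus-local split (not our middleware), the sketch below uses only the standard logging module. It assumes that JSON lines written to stdout with severity and message fields are picked up as structured logs in production, and that the GAE_ENV environment variable is "standard" there. JsonFormatter and build_logger are hypothetical names.

```python
import json
import logging
import os
import sys


class JsonFormatter(logging.Formatter):
    """Format each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({"severity": record.levelname, "message": record.getMessage()})


def build_logger(name="app", in_production=None):
    """Return a logger that emits JSON in production and plain text locally."""
    if in_production is None:
        # Assumption: GAE_ENV is "standard" when running on App Engine.
        in_production = os.environ.get("GAE_ENV") == "standard"

    handler = logging.StreamHandler(sys.stdout)
    if in_production:
        handler.setFormatter(JsonFormatter())
    else:
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

    logger = logging.getLogger(name)
    logger.handlers = [handler]   # replace handlers so repeated calls don't duplicate output
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```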
As noted earlier, we had already moved from App Engine’s memcache to Memorystore for Redis, but we still needed to make a few Python 3 related changes to the SerializedRedis client wrapper we had written. Most importantly, we needed to ensure that values written to Redis by Python 2 code could be retrieved by Python 3 code, and vice versa, since both versions would be running and interacting with each other.
Since Python 3 handles strings and unicode differently than Python 2, we needed a way to identify the type of the value in Redis before decoding it. It turns out Google already solved this problem in their update to the legacy memcache bundled service, so we mimicked this approach of setting integer bit flags for all values going into Redis and decoding based on these flags.
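A simplified sketch of the flag idea, written for Python 3, is shown below. The flag values and function names are hypothetical and are not necessarily the ones used by Google's library or by our wrapper. How the flag is stored next to the payload in Redis is also left out.

```python
import pickle

# Hypothetical flag values, for illustration only.
FLAG_BYTES = 0
FLAG_TEXT = 1
FLAG_PICKLED = 2
FLAG_INT = 3
FLAG_BOOL = 4


def encode(value):
    """Return (payload_bytes, flag) for a value about to be written to the cache."""
    if isinstance(value, bool):          # check bool first: bool is a subclass of int
        return (b"1" if value else b"0"), FLAG_BOOL
    if isinstance(value, bytes):
        return value, FLAG_BYTES
    if isinstance(value, str):
        return value.encode("utf-8"), FLAG_TEXT
    if isinstance(value, int):
        return str(value).encode("ascii"), FLAG_INT
    # Everything else is pickled with a protocol both Python versions can read.
    return pickle.dumps(value, protocol=2), FLAG_PICKLED


def decode(payload, flag):
    """Rebuild the original value from its bytes and type flag."""
    if flag == FLAG_BYTES:
        return payload
    if flag == FLAG_TEXT:
        return payload.decode("utf-8")
    if flag == FLAG_INT:
        return int(payload.decode("ascii"))
    if flag == FLAG_BOOL:
        return payload == b"1"
    if flag == FLAG_PICKLED:
        return pickle.loads(payload)
    raise ValueError("unknown flag: %r" % (flag,))
```

Because the reader decides how to decode from the flag rather than by guessing from the bytes, a text value written by one runtime comes back as text on the other.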
Likewise, we also had to ensure that all pickled objects stored in Redis used protocol version 2, since that’s the highest version that’s compatible with both Python 2 and 3.
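In practice this means passing the protocol explicitly wherever objects are pickled, since Python 3 defaults to a newer protocol that Python 2 cannot read. A minimal sketch, with a hypothetical helper name:

```python
import pickle


def dumps_compatible(obj):
    """Pickle with protocol 2 so both Python 2 and Python 3 can load the result."""
    return pickle.dumps(obj, protocol=2)


payload = dumps_compatible({"queue": "Inbox", "count": 3})

# Protocol 2 and later start with the PROTO opcode (0x80) followed by the version byte.
version_byte = payload[1]
restored = pickle.loads(payload)
```

Going the other way, Python 3's pickle.loads accepts an encoding argument (for example "latin1" or "bytes") that controls how Python 2 str objects are decoded. Whether you need it depends on what was stored.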
Since we were migrating only one or two features at a time, Python 3 code was released to production every few days. We alerted our Customer Care team as changes went out so they could be on the lookout for any support issues that might be related to a recent update. We also tracked the progress of files migrated in GQueues itself and celebrated milestones in team meetings. 🥳
This approach worked amazingly well, and as the sole engineer on the project, I was able to migrate 266 files containing 75,000 lines of code over 5 months with 51 code releases to production. Porting code is not a difficult task per se, but this was definitely a grueling project with the sheer volume of changes and testing required. Whenever my mental energy waned, I took a break. If I pushed on, the chances of something being overlooked greatly increased with such a tedious undertaking. As these types of projects go, it was successful precisely because our users didn’t even know that it happened. It’s not quite the same joy as launching a highly requested feature to cheers from users, but the relief I felt with GQueues off an unsupported version of Python was enormous!
Without a big “switch-over” event, finishing the migration was surprisingly anti-climactic. And that was just fine with me. On the last day, there was very little difference between 98% of the app being served by Python 3 servers, and 100% after the final release. The strategy worked exactly as I hoped.
It turns out the most thrilling moment of the migration was release 26, when more of GQueues was running on Python 3 than Python 2. That’s when everything began to feel real. A close second was the pure joy I felt when I finally got to shut down the Python 2 servers and delete all the old code from the repository.
If you still have a Python 3 migration on the horizon, I hope this helps in figuring out an approach that will work best for your app. You have my empathy, and heartfelt encouragement: You can do this! For everyone else, we can wake up every day grateful that a Python 3 migration isn’t looming over our heads. 😅
I love building products! And Python. And dark chocolate. When I'm not leading the team at GQueues, I can be found running ultras on the trails of the Rocky Mountains.