At Amberbit, we have a legacy project which uses MongoDB. We've been working on it for a few years now. Over this time it has obviously grown and changed a lot, and at some point we started focusing more on its performance. The biggest issue we had was that we were using Mongoid and Moped, which not only caused memory leaks (spotted with rack-mini-profiler) but also hurt performance, as we needed to run really complicated queries that a non-relational database couldn't handle well. That's why we agreed on switching to PostgreSQL.
Step by step, I want to show you how to migrate from MongoDB to PostgreSQL. You don't need to use all of my solutions, but I promise they'll save you time and make your code cleaner.
Repositories
The key is to make sure that you won't lose any query, which could break some of your functionality. The first step is based on the recipe proposed by Andrzej Krzywda in Fearless Refactoring: he recommends creating repositories for each model you are using, or at least for the biggest and most complicated ones. By moving all your queries there, you can easily modify them later and change their syntax to the Active Record way. The same strategy applies to your tests.
class UsersRepository
  def find(id, attributes: nil)
    users.only(attributes).where(id: id).first
  end

  def find_by_email(email, attributes: nil)
    users.only(attributes).where(email: email)
  end

  def all(attributes: nil)
    users.only(attributes).all
  end

  def all_by_name(name, attributes: nil)
    users.only(attributes).where(name: name)
  end

  private

  def users
    @users ||= User
  end
end
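In practice, a call site changes from querying the model directly to going through the repository. A quick sketch (the controller params here are hypothetical):

# before
users = User.where(name: params[:name])

# after
users = UsersRepository.new.all_by_name(params[:name])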
To speed up moving queries to the repository, don't forget to grep for some commonly occurring patterns:
$ git grep ' User\.'
$ git grep ' user\.'
$ git grep '\.users'
Remember to find places where a model is called through an associated model if it's connected by has_many, has_one or belongs_to:
UserCommunity.find(id).users
and change it to:
def all_by_user_community(id, attributes: nil)
  users.only(attributes).where(user_community_id: id)
end
You should also move all create, update, delete and even save calls to the repository. It will help you later on while working on callbacks: it is much cleaner to run that logic in the repository instead of using before_save actions inside the model.
def update(user, attrs = {})
  user.update(attrs)
end

def save(user)
  user.save
end
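For instance, logic that used to live in a callback can be run explicitly in the repository method. A minimal sketch, assuming a hypothetical email normalization step:

def save(user)
  # was: before_save :normalize_email inside the model (hypothetical example)
  user.email = user.email.to_s.downcase
  user.save
end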
attributes
As you might have noticed, I'm passing an optional attributes parameter. It allows you to pass an array of the names of the fields you want to receive, which will speed up query execution in situations when you don't really need all the fields of your model.
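For example, a call that only needs a couple of fields could look like this (the field list is just an illustration):

users_repository.find(user_id, attributes: [:name, :email])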
size, length, count
It would also be good practice to move all count calls to the repository you are creating. In MongoDB there is no difference between using count, length or size. In PostgreSQL, there is a huge performance difference between them. Using length will load all the records each time and then count them in memory. When you use count, it won't load any data but will just execute a COUNT query. Performance-wise, it is best to use size: it will run a COUNT query when the records are not loaded yet, otherwise it will simply count the records already in memory.
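A quick illustration of the difference with an Active Record relation (the scope below is hypothetical):

users = User.where(active: true) # hypothetical scope

users.length # loads every matching record, then counts the array in Ruby
users.count  # always runs a SELECT COUNT(*) query, loads nothing
users.size   # COUNT query if not loaded yet, otherwise counts in memory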
Setup Postgres
When you're done with the first step, you can go ahead and start setting up PostgreSQL for your project. This is the simplest part. You need to add the pg and activerecord gems to your Gemfile, run bundle install, and prepare your database.yml file to keep your database connection configuration. I was afraid I'd have some problems with working on two different databases in one project, but it turns out that it's much easier than I expected. As I said, it's the easiest part :)
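A minimal sketch of that setup (the database name and credentials are assumptions; adjust to your environment):

# Gemfile
gem 'pg'
gem 'activerecord' # or re-enable the Active Record railtie if your app was generated with --skip-active-record

# config/database.yml
development:
  adapter: postgresql
  encoding: unicode
  database: myapp_development
  pool: 5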
Generate Postgres migration
Once you have your project set up with the proper gems and your model repositories are all prepared, you should tell your new database about the models you are going to add. Creating migrations is really easy, as you can use the rails generate ... command. However, you need to remember to specify the type of the migration, as we have configured two different databases. The command you need is:
$ rails generate active_record:migration MigrationName
When you generate your migration file, I recommend opening your Mongoid model file and copying all of the field definitions into the generated migration. Of course, you'll need to modify them to fit the proper Active Record migration syntax. You can also use a find/replace tool (vim: :%s/pattern/replacement/g).
field :name, type: String
will become
t.string :name
From my experience, you should also add a mongo_id string field to each model, which will help you later with migrating your existing data. Don't forget about your model relations, and deal with all of the Mongoid embedded documents by creating additional tables or moving some fields directly into the parent model. If necessary, create some join tables. After you finish, remember to run rake db:migrate. If something goes wrong, you can always do rake db:rollback to undo your last migration. I think you know the drill :)
Prepare your ActiveRecord models
I believe the best way to create the new models and a task for populating data from MongoDB to PostgreSQL is to create an additional directory like /models/mongodb. Next, copy there the models you are going to migrate and modify them by adding a namespace and the name of the collection your model should be pointing at:
class Mongodb::User
  include Mongoid::Document

  field :name
  field :email

  store_in collection: 'users' # remember to pluralize the name of your model
end
When you’re done with that, get back to your primary model file and modify it in the ActiveRecord way:
- Remove all include Mongoid::... and field :name lines.
- Add < ActiveRecord::Base after the class Name line.
- Modify your repositories to use Active Record syntax (change all nin, in, eq matchers, use joins, etc.).

If your file references some other models that you are not going to migrate now, you need to consider adding some custom methods that will make the proper connection and return the data from the referenced objects. From my experience, it's best not to split the migration: it's much easier to keep track of all model associations, and later to create the script for populating data, when you already have all the referenced models created.
def find(id, attributes: '*')
  users.select(attributes).where(id: id).first
end
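After these changes the primary model file becomes very thin. A sketch, assuming the example User model (the association is hypothetical, based on the earlier user_community_id example):

class User < ActiveRecord::Base
  # fields now come from the users table; no field :name lines needed
  belongs_to :user_community # hypothetical association
end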
Creating a task for data migration
The last step of the migration is to move all the data you have in MongoDB to your new Postgres database. To do that I used a few components:
- The Mongoid/Moped driver, to get the data directly from the database in hash format.
- A Postgres transaction, to make sure that the whole model migration passes without any problems. Any break or error during the migration triggers a rollback, so we won't need to check which records have already been created when trying to run it one more time after applying fixes.
- Plain SQL inserts. This is a much faster and cleaner way than using ActiveRecord. Also, if you have already created all the records properly, you don't need to validate them or run callbacks. You can use the Bulk Insert gem, which helps you generate all of the inserts in just one execution (it speeds things up by a lot).
ActiveRecord::Base.transaction do
  attributes = User.attribute_names
  User.bulk_insert do |pg_model|
    Mongodb::User.collection.find.to_a.each do |user|
      attrs = user.slice(*attributes).merge(mongo_id: user['_id'].to_s)
      pg_model.add(attrs)
    end
  end
end
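I'd wrap code like this in a rake task so it can be run per model; the namespace and task name below are just examples:

namespace :data_migration do
  desc 'Copy users from MongoDB to PostgreSQL'
  task users: :environment do
    # the transaction block from above goes here
  end
end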
If you are going to create some references, you should implement the proper migration in the same transaction. The details depend on whether the associated model has already been created or not.
So, if you need to join a user to a group, you'll need to migrate the group first, and then, in the user migration block, merge in additional attributes with the id of the group in the Postgres database. Use the mongo_id field to find the proper one:
attrs.merge(store_group_id: UserGroup.where(mongo_id: user['store_group_id'].to_s)
                                     .first.try(:id))
This is also the reason why I recommended creating the additional mongo_id field in each model.
Also, remember to properly migrate many-to-many relations, which were arrays in the Mongo database but now need an additional join table. After migrating both of the models (User and Tag), take one of them and create an additional loop:
UserTag.bulk_insert do |postgres|
  Mongodb::Tag.collection.find.each do |tag|
    tag_id = Tag.where(mongo_id: tag['_id'].to_s).first.try(:id)
    tag['user_ids'].each do |uid|
      postgres.add(user_id: User.where(mongo_id: uid.to_s).first.try(:id), tag_id: tag_id)
    end
  end
end
This is another situation where you definitely need your old Mongo id value. This kind of migration will probably be the longest one; for me, running it on ~20k records took around 3 minutes.
Conclusion
This transition was a time-consuming process, but we don't regret it. It not only made our service faster, but it also helped us clean up our code and make some good changes.
I bet there are more people who have already had to deal with this problem, so let me know about your experience and the solutions you worked out :)
Post by Rafał Maksymczuk
Rafał enjoys a range of technologies (Ruby, JavaScript), and has been working with us since 2013.