ActiveRecord+PostgreSQL mass-insertion benchmarks

In the past couple of weeks I've been researching databases and optimization techniques and I just wanted to share something I discovered.

I'm using:

  • Linux Mint 13
  • PostgreSQL 9.1 (default settings)
  • Ruby 1.9.3p194
  • ActiveRecord 3.2.6
  • an i5 760 with 4GB of RAM

I have an "items" table with 15 columns (5 string, 5 integer, 1 text, 2 date and 2 float) and I'm trying to add 100,000 items to the table.

Here are 4 different methods I've tried:

1. Plain ActiveRecord.create
Takes 262 seconds (2.62ms per item)

2. Using the activerecord-import gem (using models)
Create an array with 100,000 items created using "Item.new" and then run "Item.import items":
  items = []
  100_000.times { items << Item.new(...) }
  Item.import items
This technique takes 144 seconds (1.44ms per item) where the "Item.new" part takes 43.3 seconds and the "Item.import" part takes 100.7 seconds.

Improvement: ~55% compared to plain activerecord

3. Using the activerecord-import gem (using arrays)
Instead of creating models, just create an array of arrays:
  fields = %w{col1 col2 col3 ...}
  items = []
  100_000.times { items << ['value1', 'value2', ...] }
Item.import fields, items
This technique takes 54 seconds (0.54ms per item) where creating 100,000 arrays takes 0.22 seconds and the insert takes 53.8 seconds.

Improvement: ~21% compared to plain activerecord

4. Using the PostgreSQL COPY sql command
This time I created a CSV file and imported it using SQL (the CSV file has to be on the server's file system).
  File.open('/tmp/items.csv', 'w') do |f|
    100_000.times do { f.write(...) }
  end
  ActiveRecord::Base.connection.execute(
    "COPY items FROM '/tmp/items.csv' csv;")
This technique takes 2.22 seconds, 0.84 seconds to create the CSV and 1.38 for the "COPY" command.

Improvement: ~0.85% compared to plain activerecord

Summary
Technique Total time (seconds) Time per item (millisec) Improvement
Plain ActiveRecord2622.62
activerecord-import (models)1441.4455%
activerecord-import (arrays)540.5421%
COPY2.230.020.85%

You can see the code for this benchmark here: https://github.com/elentok/db-benchmarks/tree/master/psql_normaltable

Hope you found this useful,
David.

Comments

That was so cool! Nice improvement, thanks for showing your record here. Very informative! I'm impressed!

Popular posts from this blog

Restart the Windows File Sharing Service to fix weird problems

WPF, ImageSource and Embedded Resources

SharpDevelop dark color scheme