Elentok's Blog

About me

ActiveRecord+PostgreSQL mass-insertion benchmarks

In the past couple of weeks I've been researching databases and optimization techniques and I just wanted to share something I discovered.

I'm using:

  • Linux Mint 13
  • PostgreSQL 9.1 (default settings)
  • Ruby 1.9.3p194
  • ActiveRecord 3.2.6
  • an i5 760 with 4GB of RAM

I have an "items" table with 15 columns (5 string, 5 integer, 1 text, 2 date and 2 float) and I'm trying to add 100,000 items to the table.

Here are 4 different methods I've tried:

1. Plain ActiveRecord.create

Takes 262 seconds (2.62ms per item)

2. Using the activerecord-import gem (using models)

Create an array with 100,000 items created using "Item.new" and then run "Item.import items":

items = []
100_000.times { items << Item.new(...) }
Item.import items

This technique takes 144 seconds (1.44ms per item) where the "Item.new" part takes 43.3 seconds and the "Item.import" part takes 100.7 seconds.

Improvement: ~55% compared to plain activerecord

3. Using the activerecord-import gem (using arrays)

Instead of creating models, just create an array of arrays:

fields = %w{col1 col2 col3 ...}
items = []
100_000.times { items << ['value1', 'value2', ...] }
Item.import fields, items

This technique takes 54 seconds (0.54ms per item) where creating 100,000 arrays takes 0.22 seconds and the insert takes 53.8 seconds.

Improvement: ~21% compared to plain activerecord

4. Using the PostgreSQL COPY sql command

This time I created a CSV file and imported it using SQL (the CSV file has to be on the server's file system).

File.open('/tmp/items.csv', 'w') do |f|
  100_000.times do { f.write(...) } 

  "COPY items FROM '/tmp/items.csv' csv;")

This technique takes 2.22 seconds, 0.84 seconds to create the CSV and 1.38 for the "COPY" command.

Improvement: ~0.85% compared to plain activerecord


Technique Total time (seconds) Time per item (millisec) Improvement
Plain ActiveRecord 262 2.62
activerecord-import (models) 144 1.44 55%
activerecord-import (arrays) 54 0.54 21%
COPY 2.23 0.02 0.85%

You can see the code for this benchmark here.

Hope you found this useful, David.

Next:Global hotkeys for 8tracks on Mac