I had a task: Load data from xls and xml files, convert them, merge them and unload results into xls with specific format.
Tools\libs to use:
Python, sqlite3, win32com
Problems after implementation:
DB has ~ 25k rows and they have constantly being updated.
"INSERT OR REPLACE" sql operation way too slow for my set of data.
all processing takes 2 hrs 35 mins
Refactoring 1 and its results:
Move journal and temp storage into memory:
# http://www.sqlite.org/pragma.html#pragma_journal_modenow all processing takes "only" 49 minutes. it is better, but still is not acceptable.
PRAGMA journal_mode=MEMORY;
# http://www.sqlite.org/pragma.html#pragma_temp_store
PRAGMA temp_store=MEMORY;
Refactoring 2 and its results:
Move whole DB into the memory (i.e. use ":memory:" database)
Load DB from a disk to the memory when processing starts.
Save DB from the memory to a disk at the end of processing.
now all processing takes ~3 minutes.import sqlite3
import apsw
...
# http://apidoc.apsw.googlecode.com/hg/pysqlite.html # http://apidoc.apsw.googlecode.com/hg/backup.html#backup
EOF
3 minutes. Looking good :)
ReplyDeleteyep :)
Delete