I'm looking for a way to select one table row explicitly for one thread. I've written a crawler, that works with about 50 parallel processes. Every process has to take one row out of a table and process it.
CREATE TABLE `crawler_queue` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`url` text NOT NULL,
`class_id` tinyint(3) unsigned NOT NULL,
`server_id` tinyint(3) unsigned NOT NULL,
`proc_id` mediumint(8) unsigned NOT NULL,
`prio` tinyint(3) unsigned NOT NULL,
`inserted` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `proc_id` (`proc_id`),
KEY `app_id` (`app_id`),
KEY `crawler` (`class_id`,`prio`,`proc_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
Now my processes do the following:
- start DB transaction
- do a select like
SELECT * FROM crawler_queue WHERE class_id=2 AND prio=20 AND proc_id=0 ORDER BY id LIMIT 1 FOR UPDATE - then update this row with
UPDATE crawler_queue SET server_id=1,proc_id=1376 WHERE id=23892 - commit transaction
This should help that no other process can grab a row that is processed yet. Doing an EXPLAIN on the select shows
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE crawler_queue ref proc_id,crawler proc_id 3 const 617609 Using where
But the processes seem to cause too high parallelism, because sometimes I can see two types of errors/warnings in my log (every 5 minutes or so):
mysqli::query(): (HY000/1205): Lock wait timeout exceeded; try restarting transaction (in /var/www/db.php l
ine 81)
mysqli::query(): (40001/1213): Deadlock found when trying to get lock; try restarting transaction (in /var/www/db.php line 81)
My question is: can anybody point me in the right direction to minimize these locking problems? (in production state, the parallelism would be 3-4 times higher than now, so I assume, that there would be much more locking problems)
I modified SELECT to use index crawler by hint USE INDEX(crawler). My problem now are lockwait timeouts anymore (deadlocks disappeared).
EXPLAIN with USE INDEX() shows now (no. of rows is higher, because table contains more data now):
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE crawler_queue ref proc_id,crawler crawler 5 const,const,const 5472426 Using where