Member
From: York, England
Registered: 2005-11-04
Posts: 605
I've been thanked 11 times.
Offline
I've got a BIG database thats constantly being updated, organised, deleted and optimised via a cron.
Everything works quite impressively, except I can't help thinking I could speed things up alittle (or alot)
At present, I'm managing to get only 2 rows written per second on a direct INSERT.
2 rows per second is not that impressive, is it?
So I'm trying to optimise things alittle and I'm wondering what people's thoughts are on the following where speed is concerned. I don't want to make unnessecary changes, so I'd be grateful for your thoughts before I mess up a working (albeit slow) system.
I was using ->
check for existence of a unique key
if(exists){
update timestamp
} else {
insert ful row
}
this was slowing things right down so I'm looking into REPLACE and INSERT ... ON DUPLICATE KEY UPDATE
REPLACE seems to be quite slow, slower than my original code and I can't just update the timestamp, it updates the entire row by deleting it and rewriting it each time. I haven't tried INSERT ... ON DUPLICATE KEY UPDATE yet. What's peoples experience of this?
My field names are quite descriptive (eg: categoryName, uniqueIdentifier)
if every byte of data counts, would it be quicker to change these to: cN, uI ?
Shortening class names in css speeds file download right up, so maybe it'll work for MySQL and PHP?
There are some fields which aren't being used yet in this table. I added themas a forethought for a potential use of this data, but if this is slowing things down, I'l gladly drop them. Although, if it makes no difference, I'll leave it. Any thoughts on this?
I really need to turbo charge this entire operation. 2 per second is way below the minimum required to get anywhere near my goal. Ive done the maths and if I can speed it up to around 25 per second, then I'm in with a chance.
The numbers are: at present i get 1,209,600 done, but I'm aiming for 15,000,000 per full 7 days written and dealt with
Any further optimisation advice would be gratefuly appreciated.
Cheers
Moderator
From: Yorkshire, UK
Registered: 2006-08-19
Posts: 2793
I've been thanked 77 times.
Offline
how many indexes do you have on your table?
I put an index on every numeric field, and most varchar ones as well
When you optimise the table, is it with your own procedure or are you actually using the mySQL optimise function?
i'd also research stored procedures and mysql's own sceduling
Member
From: York, England
Registered: 2005-11-04
Posts: 605
I've been thanked 11 times.
Offline
Isn't it possible to have too many indexes?
I've got a primary key (auto increment, naturally) a Unique key (my generated checksum) and thats about it for now.
I thought indexing was more for select and update queries, surely more indexes would slow an insert down as it has to write to the indexes too?
Not that Im any kind of myqsl expert, so feel 100% free to laugh, point and prove me wrong.
I have messed with the shorter field name idea and hit a snag that Ive hit before but forgot ALL about. Its quite funny really, you'll chuckle.
I removed all unnessecary characters from my field names but tried to keep them descriptive enough to not have to keep a menu, eg "iCategoryID" is now "CatID" (you get the picture.
Anyway, i had one called "iprodDescription" so I shortened it doen to "Desc"
and bugger me, the file would run but wouldn't insert a damn thing. It'd run right through a million inserts without changing a thing. I wasn't getting any errors to the screen (or logged)
So, I made the script print the query to the screen as it passed it, copied and pasted it directly into SQL in phpmyadmin.
"You have an error, check the manual near....."
And it hit me as square in the jaw as it did the first time I made this mistake.
DESC is a reserved command, so if you name a field DESC (case doesn't matter) it wont insert, select, update a thing.
it took me 3 days last time to figure that out, this time it only took 1!!
Anyhoo, the field name shortening added abit of speed to proceedings (once I ironed out the flaw) and rewriting my php so it doesn't need to check for the existence of a row before it proceeds has got me up to 77 rows per second written. Every row is different, but they're all pretty big data wise.
Which is very nice indeed.
I might be able to flog a few more rows out of it yet though. When things stopped happening, i reverted to just inserting, so Im gonna have to try that "INSERT ON duplicate UPDATE" idea, but I read that it scans the entire table before doing anything......
Moderator
From: Yorkshire, UK
Registered: 2006-08-19
Posts: 2793
I've been thanked 77 times.
Offline
I thought indexing was more for select and update queries, surely more indexes would slow an insert down as it has to write to the indexes too?
You said it yourself - from your original post i assumed you'd be doing updates half the time, and replace intos all of the time
Member
From: York, England
Registered: 2005-11-04
Posts: 605
I've been thanked 11 times.
Offline
Ah, yes I did, apologies. it's doing more of an insert job tbh, updating just the timestamp on duplicates and deleting everything over a week old.
I'm playing with On duplicate UPDATE now. I'm managing many many more than the 77 I previously posted, although Im not sure exactly till Ive done a full run. Looking good though
Moderator
From: Yorkshire, UK
Registered: 2006-08-19
Posts: 2793
I've been thanked 77 times.
Offline
A guy I work with managed to get a 26-hour process down to 9 minutes by using stored procedures and MySQL's own schedular.
Given that the SQL for MySQL is more advanced than just data management (with if-else and mathematical functions) one can write a function purely in SQL that may otherwise have been written is PHP
One can also write User Defined Functions (UDFs) in C and compile them into MySQL for super-fast results
Member
From: York, England
Registered: 2005-11-04
Posts: 605
I've been thanked 11 times.
Offline
My god northie! You do have a bloated opinion of my skills and knowledge!
What you're suggesting is WAY beyond the scope of my abilities! but it makes sense, cut out the middlemen (php) and watch it fly. Unfortunately, I don't have the time to get to grips with anything too new. Deadlines are pressing (although still months away) and I really must get this bit running.
We don't all have access to a team of developers/designers/server admins/seo's/etc ya know 
However, i've got it running ace now (I think). benchmarks come in with a max of 773/sec and a min of 302/sec depending on the data size. My benchmarks are php driven, so I would imagine that taking them out could improve things further.
I've played with the indexes slightly too, and (lo and behold) Your suggesstion of indexing int fields seems to hold very true. Ive only added an index to one field, my "MID" field. its an int(11) that identifies the origin of the data. Adding that index increased data write by upto 100/sec.
Removing my auto_increment primary key helped a fair bit. I usually build a table with a row id by default, but I removed it and made my unique key the primary and blam!! got an extra 200+/sec out of that
I'm sure theres more I could do with my php. It all looks hacked and butchered now, so I'm going to clean everything up, add a few error handlers and give it a weeks free reign and I can see what happens.
| Never |


