| Home | Register | FAQ | Members List | Search | Today's Posts | Mark Forums Read |
|
|
#1 (permalink) |
|
.
Join Date: Aug 2007
Location: California
Posts: 110
|
php/mysql predicament
Here's my problem. I have a large database with one specific column of interest. Each row in this column contains a string. I need to count how many times a specific sub string appears in this column of strings. Make sense? Any one know the best way to do this? |
|
|
|
|
|
#4 (permalink) |
|
Grumpy old man
Join Date: Oct 2007
Location: North Japan
Posts: 1,128
|
I believe you can do it with something like: SELECT COUNT(*) FROM table_name WHERE where column_name LIKE '%substring%'; Please note, I haven't tested this. It should return a single row containing a number which is the number of times the substring appears in the column. The % signs are necessary. |
|
|
|
#5 (permalink) |
|
.
Join Date: Aug 2007
Location: California
Posts: 110
|
Thanks Hunch. That works pretty well. The only issue I'm having with it now is that it's not efficient. I should have specified that I will be cycling through a few hundred substrings to count the appearance of each. Using this method, the only way I can think to make it work is with a separate query for each substring. Is there any way I can select the entire column, store it in an array, and have php do the work of counting the appearance of the substrings within the array? |
|
|
|
#7 (permalink) | |
|
Grumpy old man
Join Date: Oct 2007
Location: North Japan
Posts: 1,128
|
Quote:
Absolutely you could. You could even implode() the array so that you're simply counting occurrences in one big string and loop your substrings through substr_count() - beware though - if you're searching for very short substrings this would be could potentially create false positives and it would be safer in that case to keep them as an array. I'm not sure I'd use FULLTEXT for this. It's useful for natural language searching, or if you wanted to create a mini search engine, but in this case I'm not sure it would provide much assistance. Just as a quick aside before this goes any further, you're not simply counting the number of occurrences of every different existing value in a column are you? If so, you could do it with a simple SELECT DISTINCT query. I thought it was worth checking before considering more elaborate options. One other thing that would help to work out the most efficient method is to get a rough idea of what kind of size of database we're talking about here. I'm working on a project at the moment where one of the tables grows at the rate of 30,000 rows a week, so we're dealing with literally row counts in the millions. MySQL is comfortably handling fairly complex queries against tables of this size. |
|
|
|
|
#8 (permalink) |
|
blam blam
Join Date: Aug 2006
Location: ann arbor, mi usa
Posts: 522
|
Hard to say without knowing more about your project, but it's starting to sound like you might need to reevaluate your table structure to optimize performance (i.e. breaking said column into multiple columns to improve indexing) |
|
|
|
#10 (permalink) | |||
|
.
Join Date: Aug 2007
Location: California
Posts: 110
|
Quote:
That would be nice. Unfotunately, each row is distinct. I'm counting like substrings within distinct rows. Table size is about 140,000 rows at this point. Quote:
If you don't mind, would you provide a quick example Hunch? Quote:
Believe me, I'd love to break this column down into multiple columns. It's exactly what I need, but that's not an option. This entire table is converted into mysql from an ancient foxpro table used by my company's archaic accounting software. Table structure is pretty much set in stone. Good advice though. |
|||
|
|
|
#12 (permalink) |
|
Grumpy old man
Join Date: Oct 2007
Location: North Japan
Posts: 1,128
|
I'm not sure how either of the restructuring (breaking into multiple columns/tables) would help you either given that you're still going to have the same number of entirely distinct fields. If you're considering doing it in code, rather than through the database, and efficiency is your goal, I'd perhaps start thinking about other alternatives to PHP. My first port of call would probably be C++ simply because it's pretty renowned for speed. Just out of interest, I banged a few keywords into Google to see if anyone had any benchmarks, and found this quite informative page. The results are conclusive (although probably to be expected) with the C++ CGI version being up to 70 times faster than PHP on occasions. Notice on the really large tests, only C++ can even handle that volume of data. PHP and ASP both wimp out. WrenSoft - Zoom Search Engine - Benchmarking PHP vs ASP vs Javascript vs CGI |
|
![]() |