Old 28-07-2006, 08:34   #1 (permalink)
Skidzy McFergus
Registered User
 
Join Date: Nov 2005
Posts: 57
php remove punctuation

Hi, I was wondering if there is a PHP function that removes all punctuation from a string of text. I can filter out full stops, commas etc. using regex, but thought surely there must be a function that handles all punctuation. I have looked through the manual but couldn't find anything.
Old 28-07-2006, 10:00   #2 (permalink)
MikeMackay
Everything is fine.
 
Join Date: Feb 2005
Location: Witham & London
Posts: 774
What about building a regexp that only allows a-z, 0-9 and spaces (if you want spaces allowed)? Have it delete everything else from the string; that should take care of removing absolutely everything that isn't a plain letter or digit.

Be careful with it though.
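
A minimal sketch of that allowlist idea, assuming PHP 7 or later. The function name strip_to_alnum is made up for illustration, and the second call shows one reason to be careful: accented letters are thrown away along with the punctuation.

```php
<?php
// Hypothetical helper: keep only ASCII letters, digits and spaces.
function strip_to_alnum(string $text): string
{
    // [^a-z0-9 ] matches anything outside the allowlist; the i flag
    // makes a-z cover upper case too. Matches are replaced with ''.
    return preg_replace('/[^a-z0-9 ]/i', '', $text);
}

echo strip_to_alnum("Hello, world! It's 2006."), "\n"; // Hello world Its 2006
echo strip_to_alnum("caf\u{E9}"), "\n";                // caf
```

Note that the apostrophe in "It's" is removed as well, so words can run together.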

- Mike
Old 01-08-2006, 06:32   #3 (permalink)
xml
Senior Member
 
Join Date: Sep 2004
Posts: 149
Skidzy McFergus,
Do you anticipate unicode characters in the string?

If so try this:
$cleanUnicodeStr = trim(preg_replace('#[^\p{L}\p{N}]+#u', ' ', $unicodeStr));

This replaces every run of characters that are not Unicode letters or digits with a single space. It will also clean up nasty Unicode punctuation and symbols such as smiley faces, musical symbols, fancy quotation marks and stars, while still allowing Japanese characters and other non-Latin letters.
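
A quick usage sketch of the line above. The sample string is made up, and it assumes the script file is saved as UTF-8 and that PCRE was built with Unicode property support.

```php
<?php
// Sample input (invented for illustration): Latin, a symbol, Japanese.
$unicodeStr = "Héllo, wörld! ★ 日本語のテキスト。";

// \p{L} = any Unicode letter, \p{N} = any Unicode number.
// Each run of anything else becomes one space; trim() drops the ends.
$cleanUnicodeStr = trim(preg_replace('#[^\p{L}\p{N}]+#u', ' ', $unicodeStr));

echo $cleanUnicodeStr, "\n"; // Héllo wörld 日本語のテキスト
```

Apostrophes and hyphens count as punctuation here too, so "don't" would come out as "don t".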
Old 01-08-2006, 06:48   #4 (permalink)
smallbeer
I Ain't Losing Any Sleep™
 
Join Date: Apr 2003
Posts: 5,237
or
Code:
$text = preg_replace('/\W/', ' ', $text);
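
Two things worth knowing about \W, shown in a small sketch with an invented sample string: it treats the underscore as a word character, and it swaps each character for its own space rather than collapsing runs. The second pattern, [\W_]+, is just one possible way to tidy that up.

```php
<?php
$text = "foo_bar, baz!";

// \W leaves the underscore alone and replaces the comma, the space
// and the exclamation mark with one space each.
$spaced = preg_replace('/\W/', ' ', $text);

// Adding _ to the class and + for runs gives single spaces instead.
$tidy = trim(preg_replace('/[\W_]+/', ' ', $text));

echo "[", $spaced, "]\n"; // [foo_bar  baz ]
echo "[", $tidy, "]\n";   // [foo bar baz]
```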
Old 01-08-2006, 06:50   #5 (permalink)
xml
Senior Member
 
Join Date: Sep 2004
Posts: 149
Does that support Unicode, smallbeer?
Old 01-08-2006, 07:15   #6 (permalink)
smallbeer
I Ain't Losing Any Sleep™
 
Join Date: Apr 2003
Posts: 5,237
I'm not exactly sure what you mean by support, but I guess you'd want to use a function that supports multibyte characters.

http://uk2.php.net/mb_ereg_replace
Old 01-08-2006, 08:42   #7 (permalink)
xml
Senior Member
 
Join Date: Sep 2004
Posts: 149
preg_replace supports Unicode if you use the "u" modifier.

Without the "u" modifier, the pattern and the string are treated as plain bytes, so a multibyte UTF-8 character is not recognised as a single item to be matched. Obviously, cutting part-way through a multibyte character will royally mess up the string, causing a few heartaches in an internationalised application.
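
A small sketch of the difference, assuming PHP 7 or later for the \u{...} escapes; the sample text is invented. The dot matches one byte without "u" and one whole character with it, so the match counts differ.

```php
<?php
// "naïve café": 10 characters, but ï and é take 2 bytes each in UTF-8.
$text = "na\u{EF}ve caf\u{E9}";

// preg_match_all() returns the number of matches found.
$byteCount = preg_match_all('/./', $text);  // byte mode
$charCount = preg_match_all('/./u', $text); // UTF-8 mode

echo $byteCount, "\n"; // 12
echo $charCount, "\n"; // 10
```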
Old 03-08-2006, 15:02   #8 (permalink)
Skidzy McFergus
Registered User
 
Join Date: Nov 2005
Posts: 57
Thanks for all that, will process in due course.