Tuesday, May 3, 2011

Regular expression to change user input

I am doing a search on my database. If the user enters eg. "foo bar" I want to display rows that have both "foo" AND "bar".

My SQL syntax looks like this: SELECT * FROM t1 WHERE MATCH (t1.foo_desc, t2.bar_desc) AGAINST ('+foo* +bar*')

How can I change the user input "foo bar" into "+foo* +bar*" using a regular expression in PHP?

From stackoverflow
  • $match = '+' . implode('* +', explode(' ', $input)) . '*';
    

    This assumes that the input isn't an empty string;

    Edit: As @Bart S points out, str_replace (or mb_str_replace if you're dealing with multibyte characters) would be even simpler...

    $match = '+' . str_replace(' ', '* +', $input) . '*';
    
    Bart S. : Why not use str_replace?
    soulmerge : +1 I don't understand why people always demand regular expressions for trivial tasks
    J. Steen : A programmer has a problem. He decides to solve this using regular expressions. Now he has two problems.
    Tomalak : +1 Thinking about it, I don't know either. ;-) However, *some* normalization/checking should be done in any case.
    Tomalak : @metaphor: Now that's a smart quote. I've never heard it before.
    Wil : Works, but explode parameters should be the other way around: explode($input, ' ') -> explode(' ', $input)
    Greg : Oops - fixed...
    Gumbo : What if there are two ore more spaces between two words? Your solution would replace those additional spaces too and turn `foo bar` (two space between) into "+foo* +* +bar*". So why not just a reg… ;)
  • You should use \b and \B to identify the word/not-word boundaries, and thus insert your +/*.

  • First, trim the user input and remove anything strange (punctuation, quotes, basically everything "\W", except white space of course).

    Then, substitute:

    (?<=\w)\b
    

    with

    "*"
    

    and:

    \b(?=\w)
    

    with

    "+"
    
  • For those interested in the MATCH...AGAINST syntax This article is a decent starting point if you haven't used it before.

  • I would use a regular expression to replace the spaces between two words by * +:

    '+' . preg_replace('/\s+/', '* +', trim($input)) . '*'
    

    Edit   Basically this is an implementation to what Tomalak said just with \s instead of \W.

0 comments:

Post a Comment