
Jordi Boggiano

Jordi Boggiano Passionate web developer, specialized in web performance and php. Partner at Nelmio, information junkie and speaker.


Major glob() fail

I just had the pleasure of discovering another of PHP's little quirks, and since it's been almost a year since my last post, I thought it would be a good occasion to write one.

While working on a personal project that lists a bunch of stuff on my hard drive, I found out that directories whose names contain square brackets ([ and ]) don't return any results, for the simple reason that glob() reads [stuff] as a character class, just like in regular expressions. Once you know this it makes perfect sense, but when you don't, the documentation is not much help. It does mention libc's glob() and Unix shells, but not everyone knows what that implies at first glance.

My first reaction when I noticed that those directories were missing was to try to escape the brackets with backslashes, which works on Unix systems but not on Windows, where the backslash is the directory separator. The cross-platform solution is to enclose the opening bracket in brackets of its own (i.e. [[]), which creates a character class containing only the opening bracket, so it matches a literal [.
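To make the behaviour concrete, here is a minimal sketch that reproduces the problem in a scratch directory and then applies the [[] trick. The directory name "demo [2009]" and the temp-dir layout are only illustrative assumptions, not something from my actual project.

```php
<?php
// Scratch directory under the system temp dir; names are purely illustrative.
$base = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'glob_demo_' . uniqid();
$dir = $base . DIRECTORY_SEPARATOR . 'demo [2009]';
mkdir($dir, 0777, true);
touch($dir . DIRECTORY_SEPARATOR . 'notes.txt');

// Unescaped: "[2009]" is read as a character class matching ONE of 2, 0 or 9,
// so the pattern looks for "demo 2", "demo 0" or "demo 9" and finds nothing.
// glob() may return false instead of an empty array, hence the "?: array()".
$naive = glob($dir . DIRECTORY_SEPARATOR . '*.txt') ?: array();

// Escaped: "[[]" is a class containing only "[", i.e. a literal bracket.
$escapedDir = str_replace('[', '[[]', $dir);
$fixed = glob($escapedDir . DIRECTORY_SEPARATOR . '*.txt') ?: array();

var_dump(count($naive));
var_dump(count($fixed));

// Clean up the scratch files.
unlink($dir . DIRECTORY_SEPARATOR . 'notes.txt');
rmdir($dir);
rmdir($base);
```

Note that the closing bracket needs no treatment: once the opening bracket is neutralised, the ] that follows no longer closes a class and is matched literally.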

I then wrote this glob_quote() function which, just like preg_quote(), escapes the metacharacters that glob() uses.

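function glob_quote($str) {
    // '[' must come first: str_replace() applies the pairs in order,
    // and the later replacements insert brackets of their own.
    $from = array('[', '*', '?');
    $to = array('[[]', '[*]', '[?]');
    return str_replace($from, $to, $str);
}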
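As a quick check of what the function produces, the sketch below runs it on a made-up path and also shows why the order of the replacement pairs matters. The sample strings are hypothetical; glob_quote() is repeated only so the snippet runs on its own.

```php
<?php
// glob_quote() as defined in the post, repeated so the snippet is standalone.
function glob_quote($str) {
    $from = array('[', '*', '?');
    $to = array('[[]', '[*]', '[?]');
    return str_replace($from, $to, $str);
}

// Every metacharacter ends up wrapped in its own single-character class.
$quoted = glob_quote('photos [2009]/what?*.jpg');
echo $quoted, "\n";

// Counter-example: with '*' handled before '[', the bracket that was just
// inserted for '*' gets escaped a second time and the pattern is broken.
$wrongOrder = str_replace(array('*', '['), array('[*]', '[[]'), 'a*');
echo $wrongOrder, "\n";
```

The first line printed is photos [[]2009]/what[?][*].jpg, which glob() matches against the literal name. The second is a[[]*], which no longer means "a followed by a literal asterisk".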

Another detail worth noting while I'm at it: the same problem occurs when you call glob('*.txt') and your cwd contains brackets, since in that case the cwd is prepended to the pattern. The solution is to escape the cwd as well, like so:
glob(glob_quote(getcwd()).DIRECTORY_SEPARATOR.'*.txt');
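One side effect of the call above is that the results come back as absolute paths. A possible way around that, sketched here with a hypothetical helper named glob_cwd() (not a PHP built-in), is to strip the cwd prefix again afterwards. Whether the bare relative call actually fails probably depends on the PHP build and platform, so the sketch only checks the escaped variant.

```php
<?php
// glob_quote() as defined in the post, repeated so the snippet is standalone.
function glob_quote($str) {
    return str_replace(array('[', '*', '?'), array('[[]', '[*]', '[?]'), $str);
}

// Hypothetical helper: glob relative to the cwd, with the cwd itself escaped.
function glob_cwd($pattern) {
    $prefix = getcwd() . DIRECTORY_SEPARATOR;
    $matches = glob(glob_quote(getcwd()) . DIRECTORY_SEPARATOR . $pattern) ?: array();
    // Strip the cwd prefix so results look like those of a relative glob().
    return array_map(function ($path) use ($prefix) {
        return substr($path, strlen($prefix));
    }, $matches);
}

// Scratch directory with brackets in its name (illustrative only).
$base = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'glob_cwd_' . uniqid();
$dir = $base . DIRECTORY_SEPARATOR . 'stuff [old]';
mkdir($dir, 0777, true);
touch($dir . DIRECTORY_SEPARATOR . 'a.txt');
touch($dir . DIRECTORY_SEPARATOR . 'b.txt');

$previous = getcwd();
chdir($dir);
$found = glob_cwd('*.txt');
chdir($previous);

print_r($found);

// Clean up the scratch files.
unlink($dir . DIRECTORY_SEPARATOR . 'a.txt');
unlink($dir . DIRECTORY_SEPARATOR . 'b.txt');
rmdir($dir);
rmdir($base);
```

Only the cwd goes through glob_quote() here; the pattern passed to the helper is left alone so that its wildcards keep working.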

That's it for today, so until next year...

December 02, 2009 // PHP
