List Files and Directories with PHP

Original article at phpmaster.com


In this article I’ll talk about a common task you might have experienced while developing a PHP application: listing files and directories. I’ll discuss several basic and advanced solutions, each having its pros and cons. First I’ll present three approaches that use some very basic PHP functions and then progress to more robust ones which make use of SPL Iterators.

For the purposes of the discussion, let’s assume a directory structure that looks like the one below:

\---manager
|
\---user
|   \---document.txt
|   \---data.dat
|   \---style.css
|---article.txt
|---master.dat
|---script.php
|---test.dat
|---text.txt

The Basic Solutions

The first set of approaches demonstrates the use of the glob() function, the combination of the opendir(), readdir(), and closedir() functions, and the scandir() function.

Using glob()

The first function to discuss is glob(), which lets us search for pathnames using wildcards common to the best-known shells. The function has two parameters:

  • $pattern (mandatory): The search pattern
  • $flags (optional): One or more flags as listed in the official documentation

Let’s see some examples! To search the current directory for all files and directories whose names end with .txt, you would write:

<?php
$filelist = glob("*.txt");

If you display $filelist, the output will be:

array (
  0 => 'article.txt',
  1 => 'text.txt'
)

If you want a list of files and directories that begin with “te”, the code to write is:

<?php
$filelist = glob("te*");

The output is:

array (
  0 => 'test.dat',
  1 => 'text.txt'
)

To get only the directories whose names contain “ma”, the code is:

<?php
$filelist = glob("*ma*", GLOB_ONLYDIR);

In this last example, the output is:

array (
  0 => 'manager'
)

Notice that the last example passes the GLOB_ONLYDIR constant as the optional second parameter, which is why the file master.dat is excluded. Although the glob() function is easy to use, it isn’t always flexible. For example, it doesn’t have a flag to retrieve only files (and not directories) that match a given pattern.
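Because glob() has no files-only flag, one workaround is to post-filter its result with is_file(). The sketch below wraps that in a hypothetical helper, globFilesOnly(); like the examples above, it assumes relative patterns are resolved against the current working directory.

```php
<?php
// globFilesOnly() is a hypothetical helper: glob() has no "files only" flag,
// so the matches are post-filtered with is_file().
function globFilesOnly($pattern)
{
    $matches = glob($pattern);
    // glob() can return false on error; treat that as "no matches"
    if ($matches === false) {
        return array();
    }
    // array_filter() preserves keys, so array_values() re-indexes the result
    return array_values(array_filter($matches, 'is_file'));
}

print_r(globFilesOnly("*ma*"));
```

Run from the sample directory, this should print only master.dat, the complement of the GLOB_ONLYDIR example.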

Using opendir() and readdir()

The second approach to read files and directories I’d like to discuss involves the functions opendir(), readdir(), and closedir().

opendir() opens the directory and returns a directory handle. Once you have the handle, you can use readdir(): each call returns the name of the next file or directory inside the opened directory, and once all the names have been read, it returns false. To close the handle, you use closedir().

Unlike glob(), this approach is a bit more involved since you don’t have parameters that help you filter the returned files and the directories. You have to perform post-filtering yourself to get what you want.

To draw a parallel with the glob() function, the following example retrieves a list of all the files and directories that start with “te”:

<?php
$filelist = array();
if ($handle = opendir(".")) {
    // Compare with false explicitly: an entry named "0" would otherwise end the loop
    while (false !== ($entry = readdir($handle))) {
        if (strpos($entry, "te") === 0) {
            $filelist[] = $entry;
        }
    }
    closedir($handle);
}

The resulting list contains the same entries as in the previous example.

But if you execute the code above and output the value of $entry as it runs, you’ll see it contains some odd-looking entries at times: “.” and “..”. These are two virtual directories you’ll find in each directory of the file system. They represent the current directory and the parent directory (the up-level folder) respectively.

The second example shows how to retrieve only the files contained in a given path.

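<?php
$filelist = array();
if ($handle = opendir(".")) {
    while (false !== ($entry = readdir($handle))) {
        // is_file() resolves $entry against the current working directory,
        // which works here because the directory being read is "."
        if (is_file($entry)) {
            $filelist[] = $entry;
        }
    }
    closedir($handle);
}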

As you might guess, the above code produces output like the following (readdir() does not sort entries, so the order depends on the filesystem):

array (
  0 => 'article.txt',
  1 => 'master.dat',
  2 => 'script.php',
  3 => 'test.dat',
  4 => 'text.txt'
)
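In the loops above, “.” and “..” are dropped only as a side effect of the filter, and readdir() returns entries in no particular order. When you want every real entry, a minimal sketch such as the hypothetical listEntries() helper below can skip the two virtual directories explicitly and sort the names for stable output:

```php
<?php
// listEntries() is a hypothetical helper built on opendir()/readdir()/closedir().
function listEntries($path)
{
    $entries = array();
    $handle = opendir($path);
    if ($handle === false) {
        return $entries; // opendir() has already emitted a warning
    }
    // Compare with false explicitly: a file named "0" is falsy but valid
    while (false !== ($entry = readdir($handle))) {
        if ($entry === '.' || $entry === '..') {
            continue; // skip the current and parent virtual directories
        }
        $entries[] = $entry;
    }
    closedir($handle);
    // readdir() order depends on the filesystem, so sort by name
    sort($entries, SORT_STRING);
    return $entries;
}

print_r(listEntries('.'));
```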

Using scandir()

And finally, I’d like to present the scandir() function. It has only one mandatory parameter: the path to read. It returns an array of the files and directories contained in the path, sorted alphabetically by default. Just like the last solution, to retrieve a subset of files and directories, you have to do post-filtering yourself. On the other hand, as you can see in the code below, this solution is more concise and doesn’t require you to manage a directory handle.

This example shows how to retrieve files and directories which start with the string “te”:

<?php
$entries = scandir(".");
$filelist = array();
foreach($entries as $entry) {
    if (strpos($entry, "te") === 0) {
        $filelist[] = $entry;
    }
}
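scandir() returns “.” and “..” as well. If you need every real entry rather than a filtered subset, array_diff() can drop the two virtual names. A minimal sketch; scanEntries() is a hypothetical name:

```php
<?php
// scanEntries() is a hypothetical helper around scandir().
function scanEntries($path)
{
    $entries = scandir($path); // sorted in ascending order by default
    if ($entries === false) {
        return array();
    }
    // Remove the virtual directories, then re-index from 0
    return array_values(array_diff($entries, array('.', '..')));
}

print_r(scanEntries('.'));
```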

Let’s use the SPL Iterators

Now let’s talk about some SPL iterators. But before going into depth about their use, let me introduce them and the SPL (Standard PHP Library). The SPL provides a series of classes for object-oriented data structures, iterators, file handlers, and other features.

One of the advantages is that iterators are classes, so you can extend them to better fit your needs. Another is that they have native methods that help with many common tasks, and you have them all in one place. Take FilesystemIterator versus readdir() as an example: both are used in a loop, but with readdir() each entry is nothing but a string, while with FilesystemIterator you get an object that can provide a lot of information about the file or directory (size, owner, permissions, and so on).

Of course, PHP can provide the same information through functions such as filesize() and fileowner(), but since PHP 5 the language has shifted toward an object-oriented approach. So my advice is to follow the newer best practices for the language. If you need more general information about SPL iterators, take a look at Using SPL Iterators.
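To make the point about metadata concrete, here is a sketch that collects a few properties from the SplFileInfo objects that FilesystemIterator yields with its default flags. describeEntries() and the array keys it returns are hypothetical; the methods called on each entry (isDir(), isFile(), getSize(), getPerms(), getOwner()) belong to SplFileInfo.

```php
<?php
// describeEntries() is a hypothetical helper; each $entry is an SplFileInfo.
function describeEntries($path)
{
    $rows = array();
    foreach (new FilesystemIterator($path) as $entry) {
        $rows[$entry->getFilename()] = array(
            'type'  => $entry->isDir() ? 'dir' : 'file',
            // Only report a size for regular files
            'size'  => $entry->isFile() ? $entry->getSize() : null,
            // getPerms() returns the whole mode; keep the last four octal digits
            'perms' => substr(sprintf('%o', $entry->getPerms()), -4),
            'owner' => $entry->getOwner(), // numeric user ID, as fileowner() returns
        );
    }
    ksort($rows); // iteration order is not guaranteed, so sort by name
    return $rows;
}

print_r(describeEntries('.'));
```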

In the rest of the article, I’ll show the use of FilesystemIterator, RecursiveDirectoryIterator, and GlobIterator. FilesystemIterator extends DirectoryIterator, while the other two extend FilesystemIterator. Their constructors take the same two parameters:

  • $path (mandatory): The path of the filesystem item to be iterated over
  • $flags (optional): One or more flags as listed in the official documentation

What actually differs in these iterators is the approach they use to navigate the given path.
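The $flags parameter changes what each iteration yields. As a sketch (pathsByName() is a hypothetical helper), the flags below make the key the file name and the value a plain path string instead of an SplFileInfo object. SKIP_DOTS is passed explicitly because, as of PHP 8.2, it is no longer added automatically when you supply your own flags.

```php
<?php
// pathsByName() is a hypothetical helper returning array(filename => path).
function pathsByName($path)
{
    $flags = FilesystemIterator::KEY_AS_FILENAME      // key(): "test.dat"
           | FilesystemIterator::CURRENT_AS_PATHNAME  // current(): "./test.dat"
           | FilesystemIterator::SKIP_DOTS;           // leave out "." and ".."
    // Names are unique within one directory, so no keys collide
    $paths = iterator_to_array(new FilesystemIterator($path, $flags));
    ksort($paths);
    return $paths;
}

print_r(pathsByName('.'));
```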

The FilesystemIterator

Using the FilesystemIterator is quite simple. To see it in action, I’ll show two examples. The first searches for all the files and directories whose names start with the string “te”, while the second uses another iterator, the RegexIterator, to find all the files and directories whose names end with “t.dat” or “t.php”. The RegexIterator filters another iterator based on a regular expression.

<?php
$iterator = new FilesystemIterator(".");
$filelist = array();
foreach($iterator as $entry) {
    if (strpos($entry->getFilename(), "te") === 0) {
        $filelist[] = $entry->getFilename();
    }
}

With the code above, the result is the same as in the previous examples.

The second example that uses the RegexIterator is:

<?php
$iterator = new FilesystemIterator(".");
// Each entry is matched through its string form, i.e. its path such as "./test.dat"
$filter = new RegexIterator($iterator, '/t\.(php|dat)$/');
$filelist = array();
foreach($filter as $entry) {
    $filelist[] = $entry->getFilename();
}

In this case the output is:

array (
  0 => 'script.php',
  1 => 'test.dat'
)
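Since iterators are classes, a filter you need often can also be packaged as a subclass instead of a regular expression. A minimal sketch, assuming you want only regular files (the case glob() has no flag for): FilesOnlyIterator is a hypothetical name; it extends SPL’s FilterIterator and overrides accept().

```php
<?php
// FilesOnlyIterator is a hypothetical FilterIterator subclass that accepts
// only entries whose SplFileInfo reports a regular file.
class FilesOnlyIterator extends FilterIterator
{
    public function accept(): bool
    {
        // current() is the SplFileInfo of the entry being considered
        return $this->current()->isFile();
    }
}

$filelist = array();
foreach (new FilesOnlyIterator(new FilesystemIterator('.')) as $entry) {
    $filelist[] = $entry->getFilename();
}
sort($filelist);
print_r($filelist);
```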

The RecursiveDirectoryIterator

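The RecursiveDirectoryIterator provides an interface for iterating recursively over filesystem directories. For this purpose it has methods such as hasChildren(), which tells whether the current entry is a directory, and getChildren(), which returns an iterator over that directory’s contents. Usually you don’t call them yourself: a RecursiveIteratorIterator calls them for you and walks the whole tree. To see RecursiveDirectoryIterator in action, I’ll rewrite the last example this way. Since no file inside user matches the pattern, the result is the same.

<?php
// SKIP_DOTS keeps "." and ".." out of the results
$directory = new RecursiveDirectoryIterator('.', FilesystemIterator::SKIP_DOTS);
// RecursiveIteratorIterator uses hasChildren() and getChildren() to descend into subdirectories
$iterator = new RecursiveIteratorIterator($directory);
// The pattern is matched against each entry's path, such as "./user/data.dat"
$filter = new RegexIterator($iterator, '/t\.(php|dat)$/');
$filelist = array();
foreach($filter as $entry) {
    $filelist[] = $entry->getFilename();
}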
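To see hasChildren() and getChildren() work across a whole tree rather than a single level, here is a sketch that drives them by hand with recursion. collectMatches() is a hypothetical helper; it records each match’s path relative to the starting directory with getSubPathname().

```php
<?php
// collectMatches() is a hypothetical helper that recurses manually.
function collectMatches(RecursiveDirectoryIterator $iterator, $regex, array &$found)
{
    foreach ($iterator as $entry) {
        if ($iterator->hasChildren()) {
            // The current entry is a directory: recurse into a child iterator
            collectMatches($iterator->getChildren(), $regex, $found);
        } elseif (preg_match($regex, $entry->getFilename())) {
            // Path relative to the starting directory, including any subdirectory prefix
            $found[] = $iterator->getSubPathname();
        }
    }
}
```

From the sample directory, calling it as collectMatches(new RecursiveDirectoryIterator('.', FilesystemIterator::SKIP_DOTS), '/t\.(php|dat)$/', $found) should fill $found with script.php and test.dat, because no file under user matches the pattern.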

The GlobIterator

The GlobIterator iterates through the file system in a similar way to the glob() function, so its first parameter can include wildcards. The code below shows the usual example using the GlobIterator.

<?php
$iterator = new GlobIterator("te*");
$filelist = array();
foreach($iterator as $entry) {
    $filelist[] = $entry->getFilename();
}
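GlobIterator also implements Countable, so you can ask how many entries matched without counting them yourself. A small sketch; globSummary() and the shape of the array it returns are hypothetical.

```php
<?php
// globSummary() is a hypothetical helper around GlobIterator.
function globSummary($pattern)
{
    $iterator = new GlobIterator($pattern);
    $names = array();
    foreach ($iterator as $entry) {
        $names[] = $entry->getFilename();
    }
    // count() comes from the Countable interface that GlobIterator implements
    return array('count' => $iterator->count(), 'names' => $names);
}
```

From the sample directory, globSummary("te*") should report a count of 2, for test.dat and text.txt.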

Conclusions

In this article I’ve illustrated different ways to achieve the same goal: how to retrieve and filter files and directories in a given path. These are some key points to remember:

  • The function glob() is a one-line solution and allows filtering, but it isn’t very flexible.
  • The solution using opendir(), readdir(), and closedir() is a bit verbose and needs post-filtering, but it is more flexible.
  • The function scandir() requires post-filtering as well but doesn’t need to manage the handle.
  • If you want to use an OOP approach, you should use the SPL iterators. Moreover, you can extend their classes to fit your needs.
  • While the GlobIterator can do pre-filtering, the other iterators can achieve the same conveniently by using the RegexIterator.

Do you know of other approaches to achieve the goal? If so, please share them with us. Knowledge sharing is always welcome.
