In this article I’ll talk about a common task you might have experienced while developing a PHP application: listing files and directories. I’ll discuss several basic and advanced solutions, each having its pros and cons. First I’ll present three approaches that use some very basic PHP functions and then progress to more robust ones which make use of SPL Iterators.
For the purposes of the discussion, let’s assume a directory structure that looks like the one below:
\---manager
|   \---user
|       \---document.txt
|       \---data.dat
|       \---style.css
|---article.txt
|---master.dat
|---script.php
|---test.dat
|---text.txt
The Basic Solutions
The first set of approaches demonstrates the use of the function glob(), a combination of the functions opendir(), readdir(), and closedir(), and the function scandir().
Using glob()
The first function to discuss is glob(), which allows us to perform a search for pathnames using wildcards common to the best-known shells. The function has two parameters:
- $pattern (mandatory): The search pattern
- $flags (optional): One or more flags as listed in the official documentation
Let’s see some examples! To search the current directory for all files and directories whose names end with .txt, you would write:
<?php $filelist = glob("*.txt");
If you display $filelist, the output will be:
array ( 0 => 'article.txt', 1 => 'text.txt' )
If you want a list of files and directories that begin with “te”, the code to write is:
<?php $filelist = glob("te*");
The output is:
array ( 0 => 'test.dat', 1 => 'text.txt' )
To get a list of only the directories whose names contain “ma”, the code is:
<?php $filelist = glob("*ma*", GLOB_ONLYDIR);
In this last example, the output is:
array ( 0 => 'manager' )
Notice that the last example makes use of the GLOB_ONLYDIR constant as the optional second parameter. As you can see, the file called master.dat is excluded because of it. Although the glob() function is easy to use, it isn’t always flexible enough. For example, it doesn’t have a flag to retrieve only files (and not directories) that match a given pattern.
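A common workaround is to let glob() do the pattern matching and then post-filter the result with is_file(). The sketch below assumes PHP 7.0 or later (for the type declarations); globFilesOnly() is a hypothetical helper name, not a built-in function.

```php
<?php
// Hypothetical helper: like glob(), but keeps only regular files.
function globFilesOnly(string $pattern): array
{
    // glob() may return false on error, so fall back to an empty array.
    $matches = glob($pattern) ?: array();

    // array_filter() preserves the original keys; array_values() renumbers them from 0.
    return array_values(array_filter($matches, 'is_file'));
}
```

With the sample structure, globFilesOnly("*ma*") should return only master.dat, because manager is a directory and is filtered out.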
Using opendir() and readdir()
The second approach to reading files and directories I’d like to discuss involves the functions opendir(), readdir(), and closedir().
opendir() opens the directory and returns a directory handle. Once you have the handle, you can use readdir(): with each invocation, it returns the name of the next file or directory inside the opened directory. When all the names have been retrieved, the function returns false. To close the handle, you use closedir().
Unlike glob(), this approach is a bit more involved, since there are no parameters to help you filter the returned files and directories. You have to perform post-filtering yourself to get what you want.
To parallel the glob() function, the following example retrieves a list of all the files and directories that start with “te”:
<?php
$filelist = array();
if ($handle = opendir(".")) {
    // Compare strictly with false: an entry named "0" would otherwise end the loop
    while (false !== ($entry = readdir($handle))) {
        if (strpos($entry, "te") === 0) {
            $filelist[] = $entry;
        }
    }
    closedir($handle);
}
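The output contains the same entries as the previous example, although readdir() returns them in whatever order the file system provides, so they are not necessarily sorted.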
But if you execute the code above and output the value of $entry as it runs, you’ll see that at times it contains some odd-looking entries: “.” and “..”. These are two virtual directories you’ll find in every directory of the file system. They represent the current directory and the parent directory (the up-level folder), respectively.
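Because “.” and “..” show up in every listing, a readdir() loop usually skips them explicitly. Here is a minimal sketch, assuming PHP 7.0 or later; listEntries() is a hypothetical helper name, and the final sort() is added only because readdir() does not guarantee any order.

```php
<?php
// Hypothetical helper: lists the entries of $dir without "." and "..".
function listEntries(string $dir): array
{
    $entries = array();
    $handle = opendir($dir);
    if ($handle === false) {
        return $entries; // the directory could not be opened
    }

    // Compare with false explicitly: a file literally named "0" is falsy in PHP.
    while (false !== ($entry = readdir($handle))) {
        if ($entry === '.' || $entry === '..') {
            continue; // skip the current and parent directory
        }
        $entries[] = $entry;
    }
    closedir($handle);

    sort($entries, SORT_STRING); // readdir() order depends on the file system
    return $entries;
}
```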
The second example shows how to retrieve only the files contained in a given path.
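<?php
$filelist = array();
if ($handle = opendir(".")) {
    while (false !== ($entry = readdir($handle))) {
        // is_file() resolves $entry relative to the current working directory,
        // which is the same directory opened here (".")
        if (is_file($entry)) {
            $filelist[] = $entry;
        }
    }
    closedir($handle);
}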
As you might guess, using the above code produces the following output:
array ( 0 => 'article.txt', 1 => 'master.dat', 2 => 'script.php', 3 => 'test.dat', 4 => 'text.txt' )
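In the example above, is_file($entry) works only because the directory being read is also the current working directory. For any other path, readdir() still returns bare names, so you need to join them with the directory before testing them. A sketch, assuming PHP 7.0 or later; filesOnly() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns the names of the regular files directly inside $dir.
function filesOnly(string $dir): array
{
    $files = array();
    if ($handle = opendir($dir)) {
        while (false !== ($entry = readdir($handle))) {
            // Build the full path; "." and ".." are directories, so they are skipped too.
            if (is_file($dir . DIRECTORY_SEPARATOR . $entry)) {
                $files[] = $entry;
            }
        }
        closedir($handle);
    }
    sort($files, SORT_STRING);
    return $files;
}
```

Calling filesOnly("/some/other/dir") then behaves the same regardless of the script’s working directory.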
Using scandir()
And finally, I’d like to present the scandir() function. It has only one mandatory parameter: the path to read. It returns an array of the files and directories contained in the path, sorted alphabetically by default and including “.” and “..”. Just like the last solution, to retrieve a subset of files and directories you have to do the post-filtering yourself. On the other hand, as you can see in the code below, this solution is more concise and doesn’t require you to manage a directory handle.
This example shows how to retrieve files and directories which start with the string “te”:
<?php
$entries = scandir(".");
$filelist = array();
foreach($entries as $entry) {
    if (strpos($entry, "te") === 0) {
        $filelist[] = $entry;
    }
}
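Since scandir() returns a plain array, the post-filtering can also be written with PHP’s array functions instead of a loop. The sketch below assumes PHP 7.0 or later; scanStartingWith() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: entries of $dir whose names start with $prefix.
function scanStartingWith(string $dir, string $prefix): array
{
    $entries = scandir($dir); // sorted alphabetically, includes "." and ".."
    if ($entries === false) {
        return array();
    }

    // Drop the two virtual directories.
    $entries = array_diff($entries, array('.', '..'));

    // Keep only the names that begin with the prefix.
    $matches = array_filter($entries, function ($entry) use ($prefix) {
        return strpos($entry, $prefix) === 0;
    });

    return array_values($matches); // renumber keys from 0
}
```

With the sample structure, scanStartingWith(".", "te") should give test.dat and text.txt, already in alphabetical order.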
Let’s use the SPL Iterators
Now let’s talk about some SPL Iterators. But before going deeper into their use, let me introduce them and the SPL. The SPL (Standard PHP Library) provides a series of classes for object-oriented data structures, iterators, file handlers, and other features.
One advantage is that iterators are classes, so you can extend them to better fit your needs. Another is that they have native methods that help with many of the common tasks you might face, all gathered in one place. Take FilesystemIterator and readdir() as an example: both are used in a loop, but with readdir() each entry is nothing but a string, while with FilesystemIterator each entry is an object that can give you a lot of information about the file or directory (size, owner, permissions, and so on).
Of course, PHP can provide the same information through functions like filesize() and fileowner(), but since PHP 5 the language has increasingly embraced OOP. So my advice here is to follow the newer best practices of the language. If you need more general information about SPL Iterators, take a look at Using SPL Iterators.
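To make the difference with readdir() concrete, here is a sketch that reads a few details from the SplFileInfo objects that a FilesystemIterator yields by default. It assumes PHP 7.0 or later; describeEntries() and the array layout it returns are hypothetical. getSize(), getPerms(), and getOwner() are SplFileInfo methods, the object counterparts of filesize(), fileperms(), and fileowner().

```php
<?php
// Hypothetical helper: collects a few details about each entry in $dir.
function describeEntries(string $dir): array
{
    $details = array();

    // Default flags: SplFileInfo objects as values, "." and ".." skipped.
    foreach (new FilesystemIterator($dir) as $entry) {
        $details[$entry->getFilename()] = array(
            'type'  => $entry->isDir() ? 'dir' : 'file',
            'size'  => $entry->isFile() ? $entry->getSize() : null, // bytes
            'perms' => substr(sprintf('%o', $entry->getPerms()), -4), // e.g. "0644"
            'owner' => $entry->getOwner(), // numeric user ID
        );
    }

    ksort($details);
    return $details;
}
```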
As mentioned in the introduction, I’ll show the use of FilesystemIterator, RecursiveDirectoryIterator, and GlobIterator.
The first of them inherits from DirectoryIterator, while the others inherit from FilesystemIterator. They all have the same constructor, which takes just two parameters:
- $path (mandatory): The path of the filesystem item to be iterated over
- $flags (optional): One or more flags as listed in the official documentation
What actually differs in these iterators is the approach they use to navigate the given path.
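The $flags parameter changes what each loop iteration receives. As a sketch, the following asks FilesystemIterator for file names as keys and path strings as values instead of SplFileInfo objects. It assumes PHP 7.0 or later; mapNamesToPaths() is a hypothetical helper name, while the constants are real FilesystemIterator flags.

```php
<?php
// Hypothetical helper: maps each entry's file name to its full path string.
function mapNamesToPaths(string $dir): array
{
    // SKIP_DOTS is listed explicitly: since PHP 8.2 it is no longer
    // forced on when you pass your own flags.
    $flags = FilesystemIterator::KEY_AS_FILENAME
        | FilesystemIterator::CURRENT_AS_PATHNAME
        | FilesystemIterator::SKIP_DOTS;

    $iterator = new FilesystemIterator($dir, $flags);

    // iterator_to_array() keeps the keys, so we get name => path.
    $map = iterator_to_array($iterator);
    ksort($map, SORT_STRING);
    return $map;
}
```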
The FilesystemIterator
Using the FilesystemIterator is quite simple. To see it in action, I’ll show two examples. The first searches for all the files and directories whose names start with the string “te”, while the second uses another iterator, the RegexIterator, to find all the files and directories whose names end with “t.dat” or “t.php”. The RegexIterator is used to filter another iterator based on a regular expression.
<?php
$iterator = new FilesystemIterator(".");
$filelist = array();
foreach($iterator as $entry) {
    if (strpos($entry->getFilename(), "te") === 0) {
        $filelist[] = $entry->getFilename();
    }
}
With the code above, the result is the same as in the previous examples.
The second example, which uses the RegexIterator, is:
<?php
$iterator = new FilesystemIterator(".");
$filter = new RegexIterator($iterator, '/t\.(php|dat)$/');
$filelist = array();
foreach($filter as $entry) {
    $filelist[] = $entry->getFilename();
}
In this case the output is:
array ( 0 => 'script.php', 1 => 'test.dat' )
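The RegexIterator above tests the string form of each element. For the SplFileInfo objects yielded by FilesystemIterator, that string is the full path (for example “./test.dat”), so a pattern anchored with ^ would never match a bare file name. One way around this, sketched below, is to use file names as keys and tell RegexIterator to match against the key with the USE_KEY flag. It assumes PHP 7.0 or later; namesMatching() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: names in $dir whose file name (not full path) matches $regex.
function namesMatching(string $dir, string $regex): array
{
    // File names as keys; SKIP_DOTS listed explicitly for PHP 8.2+.
    $iterator = new FilesystemIterator(
        $dir,
        FilesystemIterator::KEY_AS_FILENAME
        | FilesystemIterator::CURRENT_AS_FILEINFO
        | FilesystemIterator::SKIP_DOTS
    );

    // USE_KEY: apply the regex to the key (the file name) instead of the value.
    $filter = new RegexIterator($iterator, $regex, RegexIterator::MATCH, RegexIterator::USE_KEY);

    $names = array_keys(iterator_to_array($filter));
    sort($names, SORT_STRING);
    return $names;
}
```

With the sample structure, namesMatching(".", '/^te/') should return test.dat and text.txt.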
The RecursiveDirectoryIterator
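The RecursiveDirectoryIterator provides an interface for iterating recursively over filesystem directories. For this purpose it has some useful methods, such as hasChildren(), which tells whether the current entry is a directory, and getChildren(), which returns an iterator over the current entry if it is a directory. In most cases you don’t call them yourself: wrapping the iterator in a RecursiveIteratorIterator makes it call them for you and walk the whole tree. To see RecursiveDirectoryIterator in action, I’ll rewrite the last example so that it also searches the subdirectories. With the sample structure the result is the same, because none of the files inside manager matches the pattern.
<?php
// RecursiveIteratorIterator uses hasChildren()/getChildren() to descend into subdirectories
$iterator = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator('.', FilesystemIterator::SKIP_DOTS)
);
$filter = new RegexIterator($iterator, '/t\.(php|dat)$/');
$filelist = array();
foreach($filter as $entry) {
    $filelist[] = $entry->getFilename();
}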
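hasChildren() and getChildren() can also be called directly while looping over a RecursiveDirectoryIterator. As a sketch, assuming PHP 7.0 or later, the following lists each directory found in $dir together with the names of its direct children; childrenOfTopLevelDirs() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: maps each directory inside $dir to its direct children.
function childrenOfTopLevelDirs(string $dir): array
{
    $iterator = new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS);
    $result = array();

    foreach ($iterator as $entry) {
        if (!$iterator->hasChildren()) {
            continue; // a file: nothing to descend into
        }

        // getChildren() returns a new RecursiveDirectoryIterator for this
        // subdirectory, inheriting the SKIP_DOTS flag.
        $children = array();
        foreach ($iterator->getChildren() as $child) {
            $children[] = $child->getFilename();
        }
        sort($children, SORT_STRING);
        $result[$entry->getFilename()] = $children;
    }

    ksort($result);
    return $result;
}
```

With the sample structure, calling it on the current directory should report manager with a single child, user.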
The GlobIterator
The GlobIterator iterates through the file system in a similar way to the glob() function, so the first parameter can include wildcards. The code below shows the usual example with the GlobIterator.
<?php
$iterator = new GlobIterator("te*");
$filelist = array();
foreach($iterator as $entry) {
    $filelist[] = $entry->getFilename();
}
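GlobIterator also implements Countable, and its pattern may include a directory part while getFilename() still returns bare names. A sketch, assuming PHP 7.0 or later and a pattern that matches at least one path; globSummary() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: number of matches plus their bare file names.
function globSummary(string $pattern): array
{
    $iterator = new GlobIterator($pattern);

    $names = array();
    foreach ($iterator as $entry) {
        // getFilename() drops any directory part that was in the pattern
        $names[] = $entry->getFilename();
    }

    // GlobIterator::count() reports how many paths matched the pattern
    return array('count' => $iterator->count(), 'names' => $names);
}
```

For example, globSummary("/path/to/dir/te*") would return the count together with names such as test.dat rather than full paths.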
Conclusions
In this article I’ve illustrated different ways to achieve the same goal: how to retrieve and filter files and directories in a given path. These are some key points to remember:
- The function glob() is a one-line solution and allows filtering, but it isn’t very flexible.
- The solution using opendir(), readdir(), and closedir() is a bit verbose and needs post-filtering, but it is more flexible.
- The function scandir() requires post-filtering as well, but doesn’t need you to manage the handle.
- If you want to use an OOP approach, you should use the SPL library. Moreover, you can extend its classes to fit your needs.
- While the GlobIterator has the ability to do pre-filtering, the others can do the same comfortably using the RegexIterator.
Do you know of other approaches to achieve the goal? If so, please share them with us. Knowledge sharing is always welcome.