My recent task was to use PHP to search a Google Cloud Storage bucket for objects located in selected "directories". The list of directories in the bucket is not fixed: new ones can be added every few months. So the search has to fetch all top-level "directories" that match a pattern and then search inside each of them.
[NOTE] I put the word directories in quotes because there are no directories in Google Cloud Storage (unless hierarchical namespace is enabled, which at the time of writing is in preview). Buckets store objects in a flat structure without a hierarchy, so there are no directories or folders. However, because object names can contain `/`, applications can group objects that share a prefix and display them in a directory-like structure.
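To make the flat model concrete, here is a minimal sketch showing that the directory-like view can be derived purely from object names. `impliedDirectories()` is a hypothetical helper written for illustration (it is not part of the GCS library); it collects every leading part of a name that ends in `/`, using the object names from the test bucket described below.

```php
<?php

// Hypothetical helper (not part of google/cloud-storage): derives every
// "directory" implied by a flat list of object names.
function impliedDirectories(array $objectNames, string $delimiter = '/'): array
{
    $dirs = [];
    foreach ($objectNames as $name) {
        $offset = 0;
        // Each occurrence of the delimiter ends one directory-like prefix.
        while (($pos = strpos($name, $delimiter, $offset)) !== false) {
            $dirs[substr($name, 0, $pos + strlen($delimiter))] = true;
            $offset = $pos + strlen($delimiter);
        }
    }
    $result = array_keys($dirs);
    sort($result, SORT_STRING);
    return $result;
}

$names = [
    'dir_1/subdir_1/test1.txt',
    'dir_2/subdir_2/test2.txt',
    'dir_2/test3.txt',
    'test4.txt',
];

echo implode(', ', impliedDirectories($names)), PHP_EOL;
// dir_1/, dir_1/subdir_1/, dir_2/, dir_2/subdir_2/
```

Nothing in the bucket stores these four "directories"; they exist only as common leading parts of object names.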
The Google Cloud Storage PHP library offers the `$bucket->objects()` method for listing and searching objects. However, after reading the documentation, examples and API spec, it was still not clear to me how to fetch only prefixes from a bucket.
I spent some time experimenting with the GCS library and found a few ways to list only prefixes (a.k.a. directories). I present them here in case they help other users.
Here is the example file structure I synchronized with my test bucket:
```
bucket/
├── dir_1
│   └── subdir_1
│       └── test1.txt
├── dir_2
│   ├── subdir_2
│   │   └── test2.txt
│   └── test3.txt
└── test4.txt
```
After copying the files, the bucket contains these objects:
```
gs://bucket/dir_1/subdir_1/test1.txt
gs://bucket/dir_2/subdir_2/test2.txt
gs://bucket/dir_2/test3.txt
gs://bucket/test4.txt
```
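When a list request sets `delimiter` to `/`, GCS returns as items only the objects whose names, after the requested `prefix`, contain no further `/`; every other name is truncated just after the first `/` following the prefix and returned once in `prefixes`. The sketch below emulates that rule locally on the four objects above. `emulateList()` is a hypothetical helper that mimics only the prefix/delimiter semantics (it ignores `matchGlob`, paging and everything else the real API does), so treat it as an illustration, not a replacement for the library.

```php
<?php

// Hypothetical local emulation of a GCS list request with `prefix` and
// `delimiter`; it is NOT part of google/cloud-storage.
function emulateList(array $objectNames, string $prefix = '', string $delimiter = '/'): array
{
    $items = [];
    $prefixes = [];
    foreach ($objectNames as $name) {
        // Only names that start with the requested prefix are considered.
        if (strncmp($name, $prefix, strlen($prefix)) !== 0) {
            continue;
        }
        $rest = substr($name, strlen($prefix));
        $pos = strpos($rest, $delimiter);
        if ($pos === false) {
            // No delimiter after the prefix: a regular object ("file").
            $items[] = $name;
        } else {
            // Delimiter found: collapse into a unique prefix ("directory").
            $prefixes[$prefix . substr($rest, 0, $pos + strlen($delimiter))] = true;
        }
    }
    $prefixes = array_keys($prefixes);
    sort($items, SORT_STRING);
    sort($prefixes, SORT_STRING);
    return ['items' => $items, 'prefixes' => $prefixes];
}

$names = [
    'dir_1/subdir_1/test1.txt',
    'dir_2/subdir_2/test2.txt',
    'dir_2/test3.txt',
    'test4.txt',
];

echo json_encode(emulateList($names), JSON_UNESCAPED_SLASHES), PHP_EOL;
// {"items":["test4.txt"],"prefixes":["dir_1/","dir_2/"]}
echo json_encode(emulateList($names, 'dir_2/'), JSON_UNESCAPED_SLASHES), PHP_EOL;
// {"items":["dir_2/test3.txt"],"prefixes":["dir_2/subdir_2/"]}
```

Only one level is collapsed per request, which is why listing subdirectories needs a second request with a longer `prefix`.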
Here is a code snippet illustrating several ways to list only the prefixes in a bucket.
```php
<?php

require 'vendor/autoload.php';

use Google\Cloud\Storage\StorageClient;

$storage = new StorageClient();
$bucket = $storage->bucket(getenv('GOOGLE_CLOUD_STORAGE_BUCKET'));

$baseOptions = [
    // must be set so that GCS returns
    // results in a directory-like mode
    'delimiter' => '/',
    // we are interested in prefixes only, so limit the response
    // to prefixes and nextPageToken (needed to fetch further pages)
    'fields' => 'prefixes,nextPageToken',
];

$testCases = [
    [
        'name' => 'Use matchGlob to retrieve main directories',
        'options' => [
            // the trailing slash is important:
            // it matches directories instead of files
            'matchGlob' => '*/',
        ],
        'expectedResults' => [
            'dir_1/',
            'dir_2/',
        ],
    ],
    [
        'name' => 'Use matchGlob to retrieve subdirectories',
        'options' => [
            'matchGlob' => 'dir_1/*/',
        ],
        'expectedResults' => [
            'dir_1/subdir_1/',
        ],
    ],
    [
        'name' => 'Use prefix to retrieve main directories',
        'options' => [
            // can also be omitted since it's the default value
            'prefix' => '',
        ],
        'expectedResults' => [
            'dir_1/',
            'dir_2/',
        ],
    ],
    [
        'name' => 'Use prefix to retrieve subdirectories',
        'options' => [
            'prefix' => 'dir_1/',
        ],
        'expectedResults' => [
            'dir_1/subdir_1/',
        ],
    ],
    [
        // just an example showing that matchGlob and prefix
        // can be used together; it is easier to use only one of them
        'name' => 'Use prefix and matchGlob together',
        'options' => [
            // the double asterisk is important: a single * does not
            // match across "/", so it would not go deeper than the first level
            'matchGlob' => '**/',
            'prefix' => 'dir_1/',
        ],
        'expectedResults' => [
            'dir_1/subdir_1/',
        ],
    ],
];

foreach ($testCases as $testCase) {
    $options = $baseOptions + $testCase['options'];
    printf(
        'Running test case: "%s" with options: %s' . PHP_EOL,
        $testCase['name'],
        json_encode($options)
    );

    $objects = $bucket->objects($options);

    // iterate over the results to trigger the API call:
    // requests are made lazily, only when the data is accessed,
    // so prefixes() would still be empty at this point
    iterator_to_array($objects);

    // get prefixes
    $directories = $objects->prefixes();

    if ($directories === $testCase['expectedResults']) {
        echo '✅ Test case passed.' . PHP_EOL;
    } else {
        printf(
            '❌ Test case failed! Expected: %s, got: %s' . PHP_EOL,
            json_encode($testCase['expectedResults']),
            json_encode($directories)
        );
    }
}
```
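Going back to the original task (fetch the top-level "directories" that match a pattern, then search inside each of them), the two steps could be combined roughly like this. It is a sketch under assumptions, not a tested production helper: `searchInMatchingDirectories()` and the `dir_*/` pattern are hypothetical, and it assumes that `matchGlob` combined with `delimiter` behaves as in the test cases above. The second listing omits `delimiter`, so it returns every object under the prefix, including objects in nested "subdirectories".

```php
<?php

/**
 * Hypothetical helper: returns the names of all objects inside every
 * top-level "directory" whose prefix matches $dirGlob (e.g. 'dir_*/').
 *
 * $bucket is expected to be a Google\Cloud\Storage\Bucket; it is left
 * untyped only so the function can also be exercised with a stub.
 */
function searchInMatchingDirectories($bucket, string $dirGlob): array
{
    // Step 1: list only the matching top-level prefixes.
    $dirs = $bucket->objects([
        'delimiter' => '/',
        'matchGlob' => $dirGlob,
        'fields' => 'prefixes,nextPageToken',
    ]);
    // Iterate to trigger the lazy API call(s) before reading prefixes().
    iterator_to_array($dirs);

    $result = [];
    foreach ($dirs->prefixes() as $dir) {
        // Step 2: list all objects under each matching prefix.
        foreach ($bucket->objects(['prefix' => $dir]) as $object) {
            $result[$dir][] = $object->name();
        }
    }
    return $result;
}

// Usage with the real client (requires google/cloud-storage and credentials):
// require 'vendor/autoload.php';
// $storage = new Google\Cloud\Storage\StorageClient();
// $bucket = $storage->bucket(getenv('GOOGLE_CLOUD_STORAGE_BUCKET'));
// print_r(searchInMatchingDirectories($bucket, 'dir_*/'));
```

Keeping the prefix listing separate from the object listing means the first request stays cheap (only prefixes are returned), and new top-level directories are picked up automatically on every run.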