Rethinking Filesystems

Files and folders are artifacts of the past, so what's the next step forward?

Impetus

A few years ago (2021), I came across an article about how some younger students didn't automatically comprehend the directory structure prevalent on computers. Instructors were facing students who had grown up not with documents and photos and files in folders in folders in folders, but with their data scattered about websites, in their Google Docs and Drives, and in the apps on their phones. Gone are once-powerful incantations such as "save to Desktop" and "/Users/<username>/Documents/school/essay.txt", replaced with a new generation of unhallowed sortings and searches.

Commentary on and responses to this article, for all intents and purposes now lost in the wind, were split. As there always are, there were several lamenting the ways of the youth, and how things are not the same as they had once been. Others blamed the software, which had failed to keep up with the times, forcing instructors to grapple with teaching something they were probably never explicitly taught in the first place. Perhaps there were others still with more nuanced takes; I cannot remember, it was so long ago.

The article and others like it have come and gone, but the gaps still remain. The directory metaphor, while tried and true, can feel as musty as an actual array of filing cabinets. And people still can't find their files. I propose a closing of this gap, a new way forward, via a system designed with how people actually use their files in mind.

What I Have in Mind

What would something better look like? How do people actually use their files? Maybe more importantly, how are they misused? I'm sure many of us have seen someone with a desktop overflowing with icons, with icons in vaguely-related piles. One of my own sins is a downloads folder overflowing with things I don't remember. Another is small one-off projects that consist of a couple of files that end up not sorted into anywhere in particular. How do we prevent things from getting 'in the way', and how do we find things that we are so rarely concerned about?

How do people find their files? If the finding were easier, maybe less effort would need to go into the initial sorting. I find myself just using the built-in search of whatever operating system I'm on over other methods of locating files. I often have a vague sense of where the file in question is, but narrowing the location down often requires some sort of searching step, either by using search or filtering tools or just by going through files manually.

I have more ideas, but thinking about how to better search through one's files is a good place to start.

An Attempt

I attempted a solution to this problem within recent memory. My lack-of-solution was fairly enlightening, and frankly made me hopeful for a future where this problem is lessened. I will loosely describe what I did, what was good, what can be better, and what more might need to be done.

I implemented part of a filesystem in FUSE, and while I'm not sure FUSE is enough to capture my vision, I haven't yet been convinced that it isn't. Aside from FUSE, the other most significant software component was SQLite, which both served as my way of organizing data and metadata and suggested a way of searching and sorting through data with which I was already familiar. Though my experience with SQL is from a purely technical background, I believe the way it allows one to search, filter, sort, and so on is adaptable to a less technical interface that can still manage to be expressive and useful.
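To make the SQLite side a little more concrete, here is a minimal sketch of how files and tags might be stored and queried. This is not the schema I actually used; the table names, column names, and the helper functions open_index, add_file, and files_with_all_tags are all made up for illustration.

```python
import sqlite3

def open_index(path=":memory:"):
    """Open (or create) a hypothetical tag index. Schema is illustrative only."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS files (
            id      INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            created TEXT NOT NULL          -- ISO date, e.g. 2023-05-06
        );
        CREATE TABLE IF NOT EXISTS tags (
            file_id INTEGER NOT NULL REFERENCES files(id),
            tag     TEXT NOT NULL,
            PRIMARY KEY (file_id, tag)
        );
    """)
    return conn

def add_file(conn, name, created, tags):
    """Record a file and attach each of its tags."""
    cur = conn.execute(
        "INSERT INTO files (name, created) VALUES (?, ?)", (name, created)
    )
    file_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO tags (file_id, tag) VALUES (?, ?)",
        [(file_id, tag) for tag in set(tags)],
    )
    return file_id

def files_with_all_tags(conn, tags):
    """Return the names of files carrying every tag in `tags`, sorted by name."""
    tags = sorted(set(tags))
    if not tags:
        rows = conn.execute("SELECT name FROM files ORDER BY name")
    else:
        placeholders = ",".join("?" * len(tags))
        rows = conn.execute(
            f"""
            SELECT f.name
            FROM files f JOIN tags t ON t.file_id = f.id
            WHERE t.tag IN ({placeholders})
            GROUP BY f.id
            HAVING COUNT(DISTINCT t.tag) = ?
            ORDER BY f.name
            """,
            (*tags, len(tags)),
        )
    return [row[0] for row in rows]
```

The GROUP BY / HAVING pair is what makes the query mean "has all of these tags" rather than "has any of them", and it does not care what order the tags were given in.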

My initial attempt focused on tags and metadata, which I still think is a good way to approach this. Paths were still used, but instead of representing a hierarchy of files, paths represented a filtering over the whole filesystem. Instead of finding a folder for 'Client A' inside a directory for 'invoices', which would look like /invoices/client_a/, one would be able to narrow down all of one's files to only the invoices, and then filter to only the ones for 'Client A', resulting again in /invoices/client_a/. These are the same paths. The difference is that in my tag-based filesystem, the path /client_a/invoices/ results in the exact same set of files, whereas this path may not even exist in a traditional filesystem, and if it did, would probably point to a different set of files entirely. Metadata worked similarly: for example, one could further filter Client A's invoices to those from before a certain date by giving a path like /invoices/client_a/before:2023-05-06.
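The path-as-filter idea can be sketched in a few lines. This is an in-memory toy rather than the FUSE code itself: the entry dictionaries, the function names, and the choice to treat any segment containing a colon as a metadata filter are assumptions made for the example, and "before" is the only filter it understands.

```python
from datetime import date

def parse_filter_path(path):
    """Split a path into a set of tags and a list of (key, value) filters."""
    tags, filters = set(), []
    for segment in path.strip("/").split("/"):
        if not segment:
            continue
        if ":" in segment:
            # Assumed convention: "key:value" segments are metadata filters.
            key, _, value = segment.partition(":")
            filters.append((key, value))
        else:
            tags.add(segment)
    return tags, filters

def matches(entry, tags, filters):
    """True if the entry carries every tag and passes every metadata filter."""
    if not tags <= entry["tags"]:
        return False
    for key, value in filters:
        if key == "before":
            if not entry["created"] < date.fromisoformat(value):
                return False
        else:
            raise ValueError(f"unknown filter: {key}")
    return True

def resolve(path, entries):
    """Return the sorted names of all entries selected by the path."""
    tags, filters = parse_filter_path(path)
    return sorted(e["name"] for e in entries if matches(e, tags, filters))
```

Because the tags end up in a set, resolve("/invoices/client_a/", entries) and resolve("/client_a/invoices/", entries) select the same files by construction.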

I didn't finish this filesystem, but I made enough progress to reasonably prove the concept. What was built felt great, but was not without its shortcomings. Sorting and filtering files this way felt natural and easy, and it didn't require much effort to feel well-organized. One of the problems is that if the criteria are broad, you'll likely get more files in your result than can fit in your head at one time. I imagine that by providing a list of tags or core metadata features on the remaining files in addition to the list of files (which I believe is reasonably doable), it may give you enough to narrow down your files. Another major issue was actually identifying when you wanted a file as opposed to when you wanted to filter further. I ended up throwing a special character on the path to determine whether you wanted the filter or the file, and even then, this system didn't do much to prevent multiple files with the same name. This problem I think is partly a shortcoming of my overall concept, but also a major shortcoming of the context in which the project lived.
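The "list of tags on the remaining files" idea might look something like the sketch below. I did not build this part, so everything here is hypothetical: the function name remaining_tags, the shape of the entries, and the choice to rank suggestions by how many of the remaining files carry each tag.

```python
from collections import Counter

def remaining_tags(entries, applied):
    """Filter entries by the applied tags, then count the tags left over.

    Returns (matching_entries, suggestions), where suggestions is a list of
    (tag, count) pairs for tags not yet applied, most common first and
    alphabetical among ties.
    """
    applied = set(applied)
    matching = [e for e in entries if applied <= e["tags"]]
    counts = Counter(
        tag for e in matching for tag in e["tags"] - applied
    )
    suggestions = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return matching, suggestions
```

A listing for a broad filter could then show these suggestions next to the files, so that the next narrowing step is offered instead of having to be remembered.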

Since I implemented this in FUSE, this was bound to be pretty much just a drop-in replacement for a filesystem, that is, it had to rely on and bend to the infrastructure already in place for UNIX filesystems. Though filesystems as they exist are really quite good at what they do, there's not much space for asking them to do much more without building an entirely different infrastructure around the new concepts. Which is, frankly, what I'm afraid might be necessary.

In my mind the most successful part of this was realising how much potential there is for auto-tagging and automatic metadata creation. One of the more encouraging parts of the system I ended up with was the automatic metadata feature, in which whenever a file was saved, its filetype, size, and other prominent features were automatically added to the database, and for some special file types, additional data could also be pulled, such as the dimensions of an image, the runtime of a video, or the length of an audio file. This was a tedious part to code, but I felt the results were absolutely worth it. Having this data built into the filesystem made using the metadata so much more actionable and natural.
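As a rough illustration of the automatic metadata step, the sketch below gathers the basic fields for a file using only the Python standard library. It is not the code from my filesystem; the function name basic_metadata and the field names are invented, and the file type is guessed from the extension rather than from the file's contents.

```python
import mimetypes
import os

def basic_metadata(path):
    """Collect a few core metadata fields for the file at `path`."""
    st = os.stat(path)
    # guess_type works from the file name only; unknown types come back as None.
    mime, _encoding = mimetypes.guess_type(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": st.st_size,
        "mime_type": mime or "application/octet-stream",
        "modified": st.st_mtime,  # seconds since the epoch
    }
```

In a real filesystem this would run whenever a file is written, with the resulting dictionary stored alongside the file's tags.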

To Do

So now what? I still think a system like this is worth pursuing. There are a number of questions that need addressing before I go for round two.

How can we make the auto-tagging and metadata even better? Naturally, more sophisticated auto-tagging and metadata are a must, perhaps including user access, last application to open, last application to edit, and so forth. Along those lines, I want something like a plug-in system, where different filetypes can have their specific metadata included in the filesystem. Expanding the tagging system to include things like sub-tags, project tags, and other context-specific information is another high priority, as this could make the system much more usable for projects that span a large number of files.
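One possible shape for such a plug-in system is a registry of extractor functions keyed by file extension. This is only a sketch of the idea: the names EXTRACTORS, extractor, and extra_metadata are hypothetical, and the single example plug-in reads the length of a WAV file with Python's standard wave module.

```python
import os
import wave

# Hypothetical registry: file extension -> function returning a metadata dict.
EXTRACTORS = {}

def extractor(*extensions):
    """Decorator that registers a function for one or more extensions."""
    def register(func):
        for ext in extensions:
            EXTRACTORS[ext.lower()] = func
        return func
    return register

@extractor(".wav")
def wav_metadata(path):
    """Example plug-in: length of a WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return {"duration_seconds": w.getnframes() / w.getframerate()}

def extra_metadata(path):
    """Run the plug-in registered for this file's extension, if there is one."""
    ext = os.path.splitext(path)[1].lower()
    func = EXTRACTORS.get(ext)
    return func(path) if func else {}
```

Supporting a new filetype would then mean writing one decorated function, without touching the code that walks the files.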

More fundamentally, do I abandon the UNIX-style filesystem completely? Is object storage the way to go here? Should I write a new set of tools to work with files, and new file pickers? It's far more ambitious than just a filesystem, but the benefits could be substantial. I am imagining perhaps a new file interface for mobile devices, to replace the odd, uncomfortable ways files are currently managed within and among apps. Building on top of the traditional UNIX patterns, instead of for them, could definitely open up a greater range of interactions and allow more fine-tuned control over files, which leads to another question.

What other features do I want this to have? If I'm doing something fantastically different, I can maybe have radically different features. If we're thinking about ways people actually use their files, building in some sort of versioning system would be a natural next step. Other features could be taken from existing file systems, and I will need to study those, as well as other types of blob storage systems for further inspiration.

This is all premature. While I hope to achieve something like I've described, it will require a lot more thought and a lot of work. I am looking forward to doing that thinking and working, and to developing the tools to make this more possible.