
2009-01-10: Keeping myself busy, organizing folders

related: hoarding, bookmarking, etc.

It's always interesting to meet people who are information packrats. Just looking at their hard drive folder organization can be interesting, or even physical file folder setup. Here's what one of my disks looked like five months ago:


When I want to be, I can be amazingly thorough in organizing information. This is especially true when there is only a limited number of categories for things to go into. Whenever my father wanted to clean my room, he'd say that everything is either trash or something that must go into a pre-set category (I very much disagree with this- don't ever do this, it *hurts*). I could personally trace every single detail of everything that I ever had: receipts, wrappers, cords, wires, electronic equipment, thousands of pages of notes hand-written while away from a keyboard, etc. So as you reduce the number of categories into which to sort things, naturally there are more mistakes, even though you end up with a high level of cleanliness and the appearance of a job well done. Meanwhile, future requirements might depend on information that has been incorrectly sorted into 'trash' (or wherever) by others who don't know how to properly sort or deal with the information, which is strong motivation for not dealing with a significant portion of my own personal information archive. I am not convinced that increasing the number of folders or the number of categories into which to sort things is the proper way to do it either, and the same applies to tagging and other semantically or philosophically equivalent schemes, but it's kind of what the current file systems force you to do, and it's certainly what you have to do with physical file folder systems.

So, for now, let's just all assume we're happy with the hierarchical file folder sorting system plus symbolic links.

An additional problem creeps up. As the number of file folders increases, from my own personal observations of my own habits, it gets increasingly hard to maintain the integrity of each of the categories and to make sure that I don't forget that there is a specific place for a specific type of information to go. On top of this, the deeper you go into the hierarchy, the more time you have to spend typing out the path name (or you can train yourself to remember some very convoluted single-symbol acronyms that are all symlinked over to the proper places, but still).

What helps is a consistent periodic routine of organizing the framework: spring cleaning, sorting, renaming, moving, relinking, etc., and archiving/backing up rather than deleting.

Recently Mac Cowell and Jason Morrison were asking me about how I go about organizing my information. How do I decide which mailing list to submit something to? Which forum? Which blog? In truth, I don't- I have been in a "rut" and have just focused mostly on email for a while now, mostly ignoring my forum subscriptions and missing out on a lot of old friends in old communities that I hope to one day meet back up with. But one idea that I've been kicking around is a Bayesian filtering mechanism much like those used in spam filters, except for classifying my own writings and then figuring out, from historical information, which mailing list or which forum has had statistically similar content and whether the reactions to that content were good or bad. Maybe this way I can have an outgoing filtering mechanism that sends my information to where it needs to go. Unfortunately, with incoming files, binary data and scanned data aren't as easy to quickly sort into specific categories, and there's always so much external context to the files, which might involve my various plans for how I was going to use the files or how I wasn't going to use them.
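The outgoing filter idea can be sketched as a small multinomial naive Bayes classifier, the same family of technique that spam filters use. This is only a sketch under assumptions: the destination names ("diybio-list", "hardware-forum") and the training snippets are made up for illustration, and it scores word similarity only, leaving out the "were the reactions good or bad" half of the idea.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class OutgoingRouter:
    """Multinomial naive Bayes over past posts, one class per destination."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # destination -> word counts
        self.post_counts = Counter()             # destination -> posts seen
        self.vocabulary = set()

    def train(self, destination, text):
        """Record one past post that was sent to `destination`."""
        words = tokenize(text)
        self.word_counts[destination].update(words)
        self.post_counts[destination] += 1
        self.vocabulary.update(words)

    def scores(self, text):
        """Return log P(destination) + sum of log P(word | destination)."""
        words = tokenize(text)
        total_posts = sum(self.post_counts.values())
        vocab_size = len(self.vocabulary)
        result = {}
        for destination, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.post_counts[destination] / total_posts)
            for word in words:
                # Add-one smoothing so an unseen word does not zero out a class.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            result[destination] = score
        return result

    def suggest(self, text):
        """Return the destination whose past posts look most like `text`."""
        scores = self.scores(text)
        return max(scores, key=scores.get)


# Hypothetical history: (destination, text of something posted there).
router = OutgoingRouter()
router.train("diybio-list", "pcr primers gel electrophoresis plasmid protocol")
router.train("diybio-list", "cheap thermocycler build for pcr at home")
router.train("hardware-forum", "stepper motor driver firmware for the cnc mill")
router.train("hardware-forum", "soldering the microcontroller board and firmware flashing")

print(router.suggest("notes on a plasmid prep protocol"))
print(router.suggest("firmware bug in my stepper driver"))
```

With this toy history the first draft is routed to the mailing list and the second to the forum. A real version would train on the actual archives and would probably need a "none of the above" option for low scores.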

List of problems

  1. Long-term viability of the folder hierarchy structure. Making sure the right files end up accumulating in the right places.
  2. Amount of time it takes to sort a new file to the right place in the hierarchy.
  3. Remembering the organization scheme (esp. a scheme that *grows*).
  4. Actually spending the time to organize and fix problems with the system. Two ways to solve this that I can think of off the top of my head. First, install a bug submission system for files for yourself so that you can quickly write up a bug report mentioning where a file is and where it might be better to put it, to be dealt with at a later time. Second, when the number of 'bugs' in placing things in the folders hits some threshold, go into a spring-cleaning mode. The bug submission process *must* be easy to do- as easy as saving a file.
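The "bug tracker for files" in item 4 could be as small as an append-only log plus a threshold check. The following is a minimal sketch, not an existing tool: the class name, the JSON-lines log format and the default threshold of 20 are all assumptions made for illustration.

```python
import json
import tempfile
import time
from pathlib import Path


class FilingBugLog:
    """Append-only log of 'this file is in the wrong place' reports."""

    def __init__(self, log_path, threshold=20):
        self.log_path = Path(log_path)
        self.threshold = threshold  # open reports that trigger spring cleaning

    def report(self, current_path, better_path, note=""):
        """Record one misplaced file; meant to be as cheap as saving a file."""
        entry = {
            "when": time.time(),
            "current": str(current_path),
            "better": str(better_path),
            "note": note,
        }
        with self.log_path.open("a", encoding="utf-8") as log:
            log.write(json.dumps(entry) + "\n")

    def open_reports(self):
        """Return every report written so far, oldest first."""
        if not self.log_path.exists():
            return []
        with self.log_path.open(encoding="utf-8") as log:
            return [json.loads(line) for line in log if line.strip()]

    def needs_spring_cleaning(self):
        """True once the number of reports reaches the threshold."""
        return len(self.open_reports()) >= self.threshold


# Demo in a scratch directory; the paths in the report are hypothetical.
with tempfile.TemporaryDirectory() as scratch:
    bugs = FilingBugLog(Path(scratch) / "filing-bugs.jsonl", threshold=2)
    bugs.report("~/tmp/scan0042.png", "~/finance/receipts/", "scanned receipt")
    print(bugs.needs_spring_cleaning())  # one report so far, threshold is 2
```

Wrapping `report` in a one-letter shell alias would get it close to the "as easy as saving a file" requirement; clearing reports after a spring-cleaning pass is left out of the sketch.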


http://www.43folders.com/2006/08/10/folders-for-action

So, first off, be mindful about what’s likely to happen to a folder’s contents hours, days, months, or years from now.

Above all, try to envision the future moment at which this information will become useful and necessary again, and make sure your filing and piling support that scenario and lead quickly to any needed actions [which should be scripted anyway, but whatever].


At what point are there diminishing returns on the effort spent filing just one file, versus giant archives that might have more uniform meaning/usefulness/value in general?

So, second, and more to the point of email and physical “pending” folders, I think it’s useful to think of all the information in your world in terms of potential activity. Remember that demonstration from 8th grade science? The bow drawn back represents potential energy, and the arrow in flight is kinetic energy. Don’t get stuck thinking that kinetic action is the only game in town, and definitely don’t let your byzantine folder system lull you into missing all the action potential currently unmined in your files.

The danger of too much foldering in your email program, in particular, should be self-evident. The more folders you have, the more thinking you have to do on both ends of information and action management: you have to first ruminate on the “right place” to put that email and then you’ll again have to recall where that right place was once you need it again. Is there a way you could just convert it to an action right now and be done with it forever? And is an email folder actually the best place for you to store a particular piece of information that you’ll need again someday? Only you know, chief.

For me, these folder structures just get simpler and simpler all the time; 90% of my email work now goes from “In” to either “Respond/Action” or “Archive.” What else is there to maintain? Do you really need five levels of time- or project-based archiving when you have a modern search-friendly program like Mail.app or Google Desktop? Maybe. Again, it’s your call.

Bottom line: ensure that all the folders, buckets, nets, and boxes in your life exist to support action above all else. The short-term buzz of “getting something out of your way” will fade quickly and is way offset by the future hassle of having to dig it out of your crazy nested system later on. Organize to act, not the other way around.

How to organize your files during your research career -- bibliography dir, 'done' dir, scripts/hacks, a dir for things to learn, todo, log, etc. Mail yourself things to do. Every paper and directory has its own directory. Use version control management tools. You might want to use general unix conventions like ~/bin, ~/lib, ~/pkg, ~/tmp. Be sure to keep a portfolio of your work so that you can take it with you whenever you go to talk with people- papers, publications, screenshots, code snippets that you are particularly proud of, stats, photographs, schematics, etc. You already have this on your website, of course, right?

Ideally, everything that I genuinely need to share should be based off of microformats and service daemons that I could write. For instance, BibTeX services should be integrated from the very moment that I download a dot tar with a PDF and BibTeX and some other information in it, to the moment that somebody queries my server for information on what I've read. Similarly, the same should hold for important financial documents and information-- I should be able to just download some financial automation tools and keep track of my transactions (and so on) without having to stress out about all of these hundreds of different forms and documents. Seriously, this is the information age, why isn't this hidden behind the scenes for me? I'm a better programmer than this, why am I tolerating this?

# ls -R | grep ":$" | sed -e 's/:$//' -e 's/[^-][^\/]*\//--/g' -e 's/^/ /' -e 's/-/|/'

One method that might help the situation is to use something like 'dselect' plus "tree" (the short script given above which outputs the overall tree structure of some directory) or something like that. The 'dselect' ncurses select-the-dir interface wouldn't be "just scroll until you hit the directory that you want"; rather, the user would type out a few tags for what the content is, and then the "use-cases" that are in the database would pop up and suggest the right places to put the files. And if there are multiple places in which to file them, those should be selected too of course and symlinks can be made and it's no big deal. This is somewhat related to the idea of using a tool like dselect to select which "use-case" you want to use for the parameters to a program (see shell.html). It may also help to have a "context signature" - a list of actions that the user has executed since booting up the computer (like bash, bash, bash, konqueror, konsole, vim, bash, xterm, firefox) leading to that action, plus other related information, so that the tool has more information to work from and can cache some likely useful features immediately, but this isn't required at all. The whole point of the shell.html file was to talk about the interoperability of programs via the standardization of the command line arguments or the ability to describe valid command line argument structures (getopt, man pages, --help stuff) for a given program, and this tool is just an extension of that idea to the use of file folders ("save-where?" and "open-with + apt-cache-search" per the recent debian-devel mailing list discussions).
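Leaving the ncurses front end aside, the "type a few tags, get suggested places, symlink into the rest" core might look something like this. Everything named here is hypothetical: the USE_CASES table stands in for the "use-case" database, and the directory names and tags are invented for the example.

```python
import os
from pathlib import Path

# Hypothetical "use-case" database: directory -> tags describing what belongs there.
USE_CASES = {
    "papers/bibliography": {"pdf", "paper", "bibtex", "reference"},
    "finance/receipts": {"receipt", "invoice", "pdf", "tax"},
    "projects/scripts": {"script", "hack", "shell", "python"},
}


def suggest_directories(tags, use_cases=USE_CASES, limit=3):
    """Rank directories by how many of the typed tags they share."""
    wanted = {tag.lower() for tag in tags}
    ranked = []
    for directory, known_tags in use_cases.items():
        overlap = len(wanted & known_tags)
        if overlap:
            ranked.append((overlap, directory))
    # Most shared tags first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return [directory for _, directory in ranked[:limit]]


def file_into(source, root, directories):
    """Move the file into the first directory and symlink it into the rest."""
    source = Path(source)
    root = Path(root).resolve()  # absolute, so the symlink targets stay valid
    primary = root / directories[0] / source.name
    primary.parent.mkdir(parents=True, exist_ok=True)
    source.rename(primary)  # assumes source and root are on the same filesystem
    for extra in directories[1:]:
        link = root / extra / source.name
        link.parent.mkdir(parents=True, exist_ok=True)
        os.symlink(primary, link)
    return primary


print(suggest_directories(["pdf", "receipt"]))
# prints ['finance/receipts', 'papers/bibliography']
```

A "context signature" could be folded in later as extra tags fed to `suggest_directories`; the sketch ignores it, in line with it not being required.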

dselect tool:


Organizing mistakes
HowToOrganizeStuff.com
FileSystemAlternatives
meshes-and-hierarchies
Limits of Hierarchies
Clay Shirky: Hierarchies are Overrated
faceted hierarchies
ThereAreNoTypes
NetworkVsHierarchy


I think it's -partially anyway- about the fact that none of the steps you take to increase productivity are automated? Whenever a path changes, your shortcuts lose their value. Unless you update them manually.



http://bbs.archlinux.org/viewtopic.php?pid=301679#p301679
my home hierarchy often moves , but I follow two rules I found to be critical for an efficient hierarchy:
1. no more than 8 folders at any point in the hierarchy
2. no more than a 3 level deep hierarchy

what's more I use two other 'tips'
a. files can be stored at any level (not only leaves)
b. files can be stored in multiple places (via hardlinks)

those numbers are not 'cat /dev/random', they are key numbers in human brain memory model: the instant memory (memory that stores data seen in a blink of an eye, literally) can memorize about 8 objects, and the brain has a hard time keeping up with deeper than 3 levels of hierarchy without resorting to time-costly abstractions (~context switching)
this way I can navigate faster, as I can recall the hierarchy of a thing very quickly and as a whole, and in each level I can see and "compute" all folders at once and not resort to some linear or dichotomic search navigation.
also with 8 folders, graphical navigation (in a file browser) can be greatly improved with visual clues: nautilus emblems make navigation a snap.
what's more, spatial mode with 3 levels max is really useful and much less cluttered than some people think.
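(Not part of the forum post: the two rules are easy to check mechanically. Below is a rough sketch that walks a directory tree and reports where it breaks them. It assumes that "3 levels deep" means at most three directory levels below the root being checked, and it skips hidden directories, which the post does not mention.)

```python
import os

MAX_FOLDERS = 8  # rule 1: at most 8 folders at any point in the hierarchy
MAX_DEPTH = 3    # rule 2: at most 3 levels below the root (assumed reading)


def check_hierarchy(root, max_folders=MAX_FOLDERS, max_depth=MAX_DEPTH):
    """Return a list of human-readable violations of the two rules."""
    root = os.path.abspath(root)
    violations = []
    for current, subdirs, _files in os.walk(root):
        # Skip hidden directories such as .git; an assumption, not part of the rules.
        subdirs[:] = sorted(d for d in subdirs if not d.startswith("."))
        relative = os.path.relpath(current, root)
        depth = 0 if relative == "." else relative.count(os.sep) + 1
        if depth > max_depth:
            violations.append(f"{relative}: depth {depth} exceeds {max_depth}")
            subdirs[:] = []  # do not also report everything underneath
            continue
        if len(subdirs) > max_folders:
            violations.append(
                f"{relative}: {len(subdirs)} folders exceeds {max_folders}"
            )
    return violations
```

Calling `check_hierarchy(os.path.expanduser("~"))` returns one line per offending directory, which could double as input for a spring-cleaning pass.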

notes:
- it's awesome how people overlook tip (a.) and end up only putting files in leaves of the tree, often creating dubious leaf subdirs for that purpose (and having a hard time figuring a name), when the file could well enough fit in the parent.
- (b.) gives me the advantage of directories behaving like both a tree hierarchy and tags. indeed what I do is some kind of tree sorting where in my memory I associate a file with a tag, and that tag belongs to a known, capped hierarchy. so my search has a cost of max 1+3*log(8)/n, n being the number of tags the file is associated with.
- the home versioning idea is indeed a good one. it somehow reminds me of the wanna-be TimeMachine of Leopard, except that you have total control over it. with my hierarchy, I rarely have a need to 'go back': what doesn't belong to the tree is outdated, and gets moved into an 'archives' accumulating directory. now that I think of it, this 'archives' dir is just like the .git dir, except it's manually managed. also I was thinking of having the home files hardlinked into archives (on a 'commit'), so that I can just delete a file and it would still be in 'archives', while not taking any more space when not yet deleted. I'll have to think of it and read those articles above.
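The hardlink-on-'commit' idea from the last note can be tried out in a few lines. This is only a sketch of the idea as I read it, not anything the poster wrote: the function name is made up, and it assumes the home and archives directories live on the same filesystem, since hardlinks cannot cross filesystems.

```python
import os
from pathlib import Path


def commit_to_archives(home, archives):
    """Hardlink every regular file under `home` into `archives`.

    A hardlink is a second name for the same data, so the archived name
    costs no extra space while the original still exists, and the data
    survives when the original name is deleted.
    """
    home, archives = Path(home).resolve(), Path(archives).resolve()
    linked = []
    for current, subdirs, files in os.walk(home):
        current = Path(current)
        # Never descend into the archive itself if it lives inside home.
        subdirs[:] = [d for d in subdirs if (current / d) != archives]
        for name in files:
            source = current / name
            if source.is_symlink() or not source.is_file():
                continue
            target = archives / source.relative_to(home)
            if target.exists():
                continue  # already archived by an earlier commit
            target.parent.mkdir(parents=True, exist_ok=True)
            os.link(source, target)
            linked.append(target)
    return linked
```

One caveat with this approach: because both names point at the same data, editing a file in place also changes the archived copy, so this protects against deletion but is not versioning the way the .git comparison might suggest.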


2009-02-01

Once upon a time, there was a King who ruled most nobly his kingdom. This was a special kingdom, fit for a special King, for all across the worlds, it was said, a thousand journeymen could tread for a thousand lifetimes and still not find a more magnificent kingdom than the King's. No peasant, tailor, blacksmith, knight nor fair maiden had ever known a more vast, bountiful land of all lands; indeed, many immigrated into the magnificent kingdom, some traveling very far over many mountains. But why, then, did the king wake up in a sweat one night? He rushed out of bed, donning his kingly robes; the mighty metal doors to his royal chambers slammed open and shut- immediately, the maid service scrambled down the hallway, tripping over one another after a rather boring game of cards. "Yes, your majesty, what, my King, is the problem?" said one particularly young, naive maidgirl. The King, who was pacing about, stopped and paused. "Woman," he raised his voice, "fetch my ten brightest!" And the young girl, as well as the others, rushed out of sight to the royal think tank to fetch the King's ten brightest men, who were woken and pushed out of bed in a rush to scramble to the King, who now resided in the dining room, looking rather perplexed, worried, confused. "My King," one of the ten bright men began to speak- "Ten brightest men. Every night I sleep and dream, but all is not well as it seems. Day after day, this magnificent kingdom, this golden kingdom sends me so much information. I can hardly swim. Thus I order you to construct a Royal Ontology fit for a King in no more than ten days' time." And so the men looked at each other and nodded their confirmation.
On the first day, the youngest of the men came back to the King, and showed the King a computer with three folders: "in", "out", "other". The King was not amused, and so the youngest of the ten men, who turned out to be not so bright, was never to be seen again in this world or the next.
On the second day, the second of the ten men met with the King in the royal garden.