5 reasons to keep everything: An argument for digital information hoarding in the Government of Canada

A key theme of the upcoming Government of Canada discussions on the “digital office” will be electronic recordkeeping. This is not surprising: the prime mover behind the initiative is Library and Archives Canada, the institution charged with preserving records of historical value, and one could hardly imagine a digital office of the future that does not rest on well-managed, accessible digital information. Sound digital recordkeeping brings transparency, preserves our legacy knowledge and, most importantly, gives us the ability to contribute, find and share the right information.

As these discussions gear up, government employees who work in information management will face a major challenge: coming to grips with the way digital information is fundamentally changing recordkeeping. Digital information calls into question some well-established recordkeeping roles, and nowhere is this more evident than in deciding what information to keep and what to delete. In a paper-based world, ‘keep or delete’ procedures let us weed out older records so that we could live within the limits of physical storage space and more easily find valuable records. Digital records do not take up physical space, and search engines have dramatically improved the searchability of digital information. So the real question is: why don’t we just keep all of our digital information?

Here are 5 reasons to keep everything: 

1) Costs of storing digital information continue to plummet. A gigabyte of digital storage (roughly 200,000 pages of text) cost $12,000 in 1989 and only $0.64 in 2010.

2) Web 2.0 and other new tools are compounding the complexity of deciding what to delete. These tools do not behave like traditional records; wikis, for example, are continuously evolving, organic and integrated.

3) Knowledge management is broadening the scope of what information is considered valuable. In the past, final versions and major drafts of important documents were considered valuable records. Knowledge management has broadened the notion of what is valuable to include the discussions, decisions and evolution of thinking that go into creating these ‘official’ records.

4) Search engines and other technologies make it possible to manage much larger collections of unstructured (unorganized) information, filtering out the noise to get at the information that matters. For example, Google brings order to the chaos of the Internet, and version control lets us designate major drafts and final documents without deleting anything.

5) Unpredictability of the future value of information and knowledge. It is hard to assess what will have future business or archival value, so it is safer to keep information than to discard it. For example, climate change scientists are finding valuable data in ships’ logs from the 17th and 18th centuries, recently published by the British Admiralty.
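The figures in reason 1 are easier to appreciate with a little arithmetic. This Python snippet uses only the two prices and the rough 200,000-pages-per-gigabyte conversion quoted above; the variable names are illustrative, and the prices are taken as given rather than adjusted for inflation.

```python
# Prices quoted in reason 1 (dollars per gigabyte), not adjusted for inflation.
COST_PER_GB_1989 = 12_000.00
COST_PER_GB_2010 = 0.64
# The post's rough conversion: one gigabyte holds about 200,000 pages of text.
PAGES_PER_GB = 200_000

# How many times cheaper a gigabyte became over the period.
decline_factor = COST_PER_GB_1989 / COST_PER_GB_2010

# Storage cost of a single page of text in each year.
cost_per_page_1989 = COST_PER_GB_1989 / PAGES_PER_GB
cost_per_page_2010 = COST_PER_GB_2010 / PAGES_PER_GB

print(f"A gigabyte became {decline_factor:,.0f} times cheaper")
print(f"Per page: ${cost_per_page_1989:.2f} in 1989 vs ${cost_per_page_2010:.7f} in 2010")
```

On these numbers a gigabyte became about 18,750 times cheaper, and the cost of storing one page of text fell from about six cents to roughly $0.0000032.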

Taking the idea of keeping everything seriously does not mean that information should go unmanaged. It is still fundamentally important to have processes in place to identify major and final documents, and to set rules for how new tools like wikis are managed so that the information in them is reliable and findable.
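Reason 4 and the paragraph above rest on the idea that version control lets people identify major and final documents instead of deleting everything else. A minimal Python sketch of that idea, assuming a simple in-memory store: every save is kept, and people only flag which versions are major drafts or final. The `Version` and `Document` classes, the status labels and the sample policy are hypothetical and do not model any particular EDRMS or wiki.

```python
from dataclasses import dataclass, field
from typing import List, Optional

STATUSES = ("minor", "major", "final")  # hypothetical labels


@dataclass
class Version:
    number: int
    text: str
    status: str = "minor"  # every save starts life as a minor version


@dataclass
class Document:
    title: str
    versions: List[Version] = field(default_factory=list)

    def save(self, text: str) -> Version:
        """Append a new version; earlier versions are never overwritten or deleted."""
        version = Version(number=len(self.versions) + 1, text=text)
        self.versions.append(version)
        return version

    def designate(self, number: int, status: str) -> None:
        """Flag an existing version as minor, major or final instead of deleting the rest."""
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status}")
        self.versions[number - 1].status = status

    def latest(self, status: str) -> Optional[Version]:
        """Return the most recent version carrying the given status, if any."""
        for version in reversed(self.versions):
            if version.status == status:
                return version
        return None

    def major_versions(self) -> List[Version]:
        """Major drafts and final versions: the 'official' trail within the full history."""
        return [v for v in self.versions if v.status in ("major", "final")]


# Usage with a hypothetical policy document.
policy = Document("Travel policy")
for text in ["first outline", "draft for consultation", "typo fixes", "approved text"]:
    policy.save(text)
policy.designate(2, "major")
policy.designate(4, "final")

print(policy.latest("final").text)                   # approved text
print([v.number for v in policy.major_versions()])   # [2, 4]
print(len(policy.versions))                          # 4
```

The design choice is that designation is cheap and reversible while deletion is neither; finding the authoritative copy becomes a query rather than a cleanup task.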

It is likely to be too administratively costly, and even risky, to continue the age-old processes of disposition (deleting) in a digital environment. At any rate, Information Management professionals need to examine fundamentally what this all means in a digital world.

This entry was posted in IM, recordkeeping.

16 Responses to “5 reasons to keep everything: An argument for digital information hoarding in the Government of Canada”

  1. archivanator says:

    Compelling arguments but… what about the costs of sifting through all of that, whether for research (by future users) or for litigation? What about building a story from the information: wouldn’t the person building the story be sidetracked and even misled by information that really isn’t that important but was kept because it was easier than deleting it? There are also compelling reasons for not keeping things. Deleting is, in a way, also an act of creation: as the creator of the information, you have the power (and in some cases, I would argue, the duty) to determine what that record will be. Are search engines powerful enough, or even smart enough, to ensure that whatever you need will come up? It is too easy to create information and then decide that everything is valuable. Also, in my experience, no machine at this point can replace a human brain (a good one, at least) in doing research, making the intellectual links between information resources needed to draw a complete portrait of a complex question. Shouldn’t we instead teach people how to save only the right information, rather than assume machines will make those links for us? Just because we can save everything, does it mean we should? Just some thoughts…

    • deepishthoughts says:

      I really liked your insight that recordkeeping is about giving people the ability to tell the real story without having to wade through, or be sidetracked by, useless information. Maybe “keeping everything” means that we rethink the scope of the information resource types needed to tell the whole story.

      The best example I can think of is the current retention and disposition authorities for policy documents. Under our current RDA (MIDA) processes, we have decided that policy documents can be deleted two years after they are superseded by a new policy. Knowledge management suggests that to truly tell the whole story, you need to keep and manage these historical versions because they provide the context and narrative. To take it a step further: not only do we allow these final documents to be deleted, but the intermediate documents, the discussions on blogs and the email communications that went into creating the final document are not even covered by our current process (please correct me if I am wrong about this).

      If we continue to delete these knowledge resources we only tell part of the story and more importantly we lose the capacity to learn from the past and risk re-inventing the wheel.

      So, what “keep everything” means in this instance is re-examining the types of documents we now routinely delete, and asking ourselves whether they have value as knowledge and how we can manage them effectively so that the story remains accurate and true.

    • deepishthoughts says:

      Oh, by the way, I am not ignoring your obvious reference to the most controversial element of a “keep everything” principle: transitory records. I am addressing that in a response to Danielle.

  2. Danielle says:

    As a point of clarification, don’t we want people deleting unnecessary electronic information the same way they destroy traditional paper transitory records? I think this is an issue that needs to be clarified. I understand that storage capacity allows us to ‘keep’ more information, but I’m hesitant to agree that this means we should. Arguing that we keep everything seems like one big step towards electronic hoarding; then again, maybe my fine-tuned librarian senses just can’t turn their back on the benefits of weeding.

    • deepishthoughts says:

      Part of the argument is that it may take more effort to delete transitory records than to manage them. This argument hinges on using technologies that have document management, and in particular version control, built in. If our EDRMS and our wikis automatically generate a version history, it is likely easier to have knowledge workers identify major versus minor versions than to have them go through and delete the minor versions as part of their IM chores. IM responsibilities then shift from cleaning up afterwards to managing information up front. In this context, the argument goes, it is not necessary to delete if you can find the major or final version with a reasonable degree of confidence, and there may be some benefit to preserving even the garbage for its unanticipated future value.

      I doubt the ancient Romans ever thought that future archaeologists would find value in their garbage dumps.

  3. Jenn says:

    This is an interesting blog and will, without a doubt, raise a lot of eyebrows (especially based on how things have been dealt with in the “paper” world).

    It is definitely a culture change and needs to be looked at… however, “keep everything” has always made me gasp! I don’t think there is either an easy or a fast approach to take when thinking about the digital office. Since the Government of Canada is so fond of its working committees, working groups, etc., I seriously think this is the route to take to put some type of systematic, government-wide approach in place to handle information, its retention and its disposition.

  4. @Scilib says:

    (posted from @scilibs twitter comments)

    There’s an interesting gap between inflexible rules about where and how info should be managed and the reality of what people do. For example, we’re told not to keep things in Outlook forever, but that’s the most natural thing for many people to do. Here’s the channel through which the majority of your information circulates, and you’re forbidden to use it as a searchable archive.

    • deepishthoughts says:

      One of the perennial challenges of recordkeeping in government is to motivate public servants to organize, store and delete their information. Until recently this has meant policies, directives and guidelines that make information management a responsibility and training to impress upon employees its importance.

      What you have appealed for is an alignment of recordkeeping with how employees actually use and manage their information, and I wholeheartedly agree. As the government envisions the digital office of the future, adapting IM and recordkeeping tools and processes to the way public servants naturally do their work (in other words, the “ergonomics” of information management) has to be an important topic of discussion.

      In fact, ergonomics of the digital office is a key theme of the initiative, so, more on that later.

  5. Fascinating blog post! The phrase that comes to mind when reading the arguments for not deleting anything is, “One man’s trash is another man’s treasure.” No two people will ever look at a piece of content and make the same value judgement. There are so many perspectives to consider when evaluating information that resting the decision about its fate with only one person or a handful of people fails to take into consideration the full spectrum of information and knowledge value.

    One of the benefits of keeping everything is that it shifts the IM focus, resources and costs away from retention and disposition and puts the emphasis on improving the processes through which knowledge and information are created, how they are organized, the metadata we apply, the ways we provide access, and information literacy.

    At the same time, I’m still open to hearing some compelling stories that truly convey the risk of keeping everything.

  6. Angela says:

    I must disagree with your idea of keeping everything. Can you imagine what the world would look like if we did keep EVERYTHING? Watch the TV show Hoarders on TLC and you will see the effects of keeping everything. Just because we can’t see the information piling up in digital format doesn’t mean it’s not really happening. It’s like the compulsive shopper who whips out her credit card every time she makes a purchase and justifies it by saying, “Well, I can’t see the money draining out of my accounts and drowning me in debt, so it’s OK to buy this expensive stuff.”

    Imagine if you took the 50-some terabytes of digital information we now have on NRCan networks and put it into paper form. The volume would be astronomical! Could you imagine trying to make your way to your desk every morning over the mountains of information? You’d be buried alive!
    With this image in mind, does your opinion still stand?

    What if in the next 2, 5 or 10 years NRCan is hit with a major lawsuit? Having worked at PWGSC as part of the “Sponsorship Scandal” team responsible for ensuring that all of the information needed was supplied and organized (the amount of information gathered was 2 to 3 times the height of the Eiffel Tower!), I’ve got to say that keeping everything is irresponsible. If we followed established retention periods, we wouldn’t have to go through such a strenuous exercise, nor would potentially damaging information need to be submitted to lawyers, because it would already have been disposed of as per LAC guidelines.

    This is also the case with ATIP requests. The department receives a very high number of requests each year, which consume a lot of people’s time in search and retrieval. If we had less information to go through because we were deleting transitory records and official records that have met their retention periods, we would save a lot of time in answering these requests and, as a result, save taxpayers money.

    Your last comment, that “it is likely to be administratively costly to go through the age old process of disposition in a digital environment”, is not entirely true. Yes, it may be costly up front, but, as per my points above, in the long run it could save the department and taxpayers millions of dollars in litigation costs as well as in the cost of processing ATIP requests.

    Finally, who really wants to keep everything anyway? As the age-old adage goes, a cluttered life (or in this case desk, computer, filing cabinet) is a cluttered mind. Wouldn’t we all be able to think a little more clearly and manage our time and priorities a little better if we didn’t have a gazillion electronic records staring us in the face every day?

    My 2 cents…

  7. @rebecca_blake says:

    In my short three years as a public servant I honestly don’t think I’ve ever (besides spam) deleted anything electronic.

    A couple reasons for this include:
    – Time and effort. I’m currently not convinced that wading through emails, documents, etc and unilaterally deciding what items to delete is a good use of time and tax dollars.
    – A desire to make the information I generate available to future generations. I can’t count the number of times I’ve been frustrated by the lack of available, easily findable background documents on past policy decisions. Discussions and debates leading up to ‘final’ policies are the golden nuggets. Not only do they help paint a full picture, they enable policy makers to assess whether the assumptions and rationales behind past policy decisions are still relevant and applicable today.

    The real challenges, in my opinion, are the following:
    – Easily sharing the information I generate with other public servants. In an effort to reduce potential duplication and limit the need to reinvent the wheel, I often copy and paste content from emails to internal wikis to increase accessibility for others and make information easier to find for my own future use.
    – Powerful search engines. As I understand it, NRCan’s internal search engine is powerful for two reasons. First, the machine integrates a growing number of collections (wiki, intranet, blogs, website, departmental news, ‘published’ SharePoint documents, etc.) in its search results. Second, and perhaps more importantly, it integrates the human mind. By enabling employees to rate useful search results, add relevant tags, filter results, etc., it helps the next user find what they’re looking for even more quickly.

    Great post and very cool discussion all!

  8. Angela says:

    I really can’t justify using the fact that we may have powerful search engines for the argument of keeping everything.
    Let’s say that we do have an amazing search engine at NRCan and it’s able to find exactly what we’re looking for; this doesn’t mean that we won’t have to then sift through the results to get the document we want.

    It also doesn’t help for litigation or ATIP purposes. When we receive an ATIP request on any particular topic and we do a search on the search engine, we will have to go through all of the returned results to sift through what is relevant to the request. We will then have to hand over a mountain of documents to the ATIP office in order for them to process. This is already an issue; I can’t imagine what it would be like in a couple of years from now if we did decide that keeping everything was a best practice.

    I will also reiterate my point about lawsuits. If we keep everything, we will have to supply everything during litigation. A great search engine makes no difference here. Sure, it may help us find records related to the litigation, but we will then have to sort, organize and go through them all to provide them as evidence. And having gone through this on a couple of occasions, let me tell you, it’s no walk in the park.

  9. Jac says:

    I tend to agree with Angela. ATIP requests are already extremely time-consuming. Imagine if we kept everything! It would not only affect the time departmental ATIP staff need to sort through the information, but would certainly spill over to lawyers’ offices and the courts, since all of them would likely end up with more voluminous files. It would overburden the system.

    Another issue is the Privacy Act. Citizens and other individuals we collect information from expect us to dispose of their personal information at some point. As a citizen, I’m not really sure how I would feel about the idea that my personal information could be kept forever, especially if it no longer serves its original purpose. Setting aside what I know about the challenges of information management, my immediate reaction would probably be, “What for??!!”

    Also, a particular piece of information means something specific at a specific point in time, but can take on an entirely new meaning at another. There is a fine line between the benefits and the drawbacks of keeping information after it has served its purpose.

    In my opinion, a balance has to be struck: we should keep information at a reasonable volume, according to its nature, so as to respect its purpose and keep its management efficient (from an ATIP and Privacy Act standpoint, for example), while not spending unreasonable amounts of time weeding out and disposing of information that may be less susceptible.

    This is a very important discussion! A lot of work has yet to be done!

    • Sue says:

      I agree completely with Angela and Jac. Keep everything? Imagine what the lawyers would have to go through to make their case… Imagine an employee finding documents in their personal file that should have been destroyed (e.g. a letter of reprimand, performance evaluations from 10 years ago, etc.). As Jac mentioned… “Why?” How are you going to handle ATIP requests when search results multiply by the thousands because nothing has been destroyed? How will anyone read these records many years from now? What about migration challenges?

  10. This conversation, which is excellent and desperately needed, is becoming somewhat confusing as different IM perspectives are added (RM, DM, ATIP and more) and as we are talking about three different situations – the past, the current state and a future state. For my own sanity, I’ve tried to separate them out a bit while providing my thoughts…

    • In a digital world we do not necessarily need to follow the RM practices that were necessary in a paper based world. Every one of these practices should be held up for inspection – what were the origins of these practices, what were these practices in aid of, are they practical or even possible anymore, are they necessary etc.?

    • Additionally, in the current digital world we are already faced with content types and information life-cycle variations that do not mesh with our traditional document-based RM perspectives and practices so we are compelled to consider new approaches to recordkeeping.

    • As I interpret the original author, “keeping everything” does not mean keeping all the transitory dross we create on a daily basis. Rather, the point is this: without the crippling storage costs (paper and electronic) we have faced in the past, and with the advent of improved search tools and centralized electronic repositories (with version control and the like), shouldn’t we begin to consider the potential benefits of keeping our valuable business records, and their contextual information, for longer periods of time?

    • Similarly, I do not believe the original author was suggesting that the “keep everything” concept would extend to ignoring existing legislation, such as that governing the collection and strict management of personal information.

    • Finally, regarding the ATIP and litigation issues raised: the mounting challenges of responding to ATIP requests and litigation today have a number of causes, including:
    o Total volume of information (including transitory materials, business-value materials whose retention periods have expired, and business-value records)
    o Myriad, and often unmanaged, shared storage locations of information – electronic and paper – and the lack of sophisticated tools to find and assess information in these locations.
    o Myriad, and unmanaged, non-shared storage locations of information – electronic and paper.

    Consider these thoughts:
    1) Perhaps “keeping everything” does not mean that from this day forward we keep everything we generate; perhaps “keeping everything” is a practical solution for our existing collections of unmanaged legacy electronic records, which most of us simply cannot afford the manpower to weed. Cleaning up the mess of the past is not something most organizations can realistically consider, nor is it the most appropriate place to spend our limited resources, which should go to higher-value, forward-looking activities.

    2) From a day-forward perspective, if we are in fact doing what we are supposed to be doing in our IM Programs, employees should have the tools and knowledge required to make decisions about what needs to be kept and what can be safely deleted and apply such knowledge on a daily basis (thus reducing, at a minimum, the mounting volumes of transitory information that have been such a problem for us all).

    3) Looking to the digital future, records management should be a rules-based activity that takes place in the background – something that employees are entirely unaware of. Employees should be concerned with managing their information only to the extent of:
    a. ensuring that it is stored somewhere that is accessible
    b. ensuring that it is stored with a minimum of metadata that will make it findable (and allow for records management business rules to take place in the background)
    c. ensuring that they are not cluttering organizational storage locations with transitory information

    4) In the digital world of the future, where non-transitory business-value information is stored and managed in accessible (i.e. not personal) repositories against which sophisticated search technologies can be applied, a large majority of the ATIP and litigation issues that are so crippling today will be obviated.
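    Point 3 describes records management as rules running in the background over a minimum of metadata. A minimal Python sketch of that idea follows, under stated assumptions: the `Record` fields, the rule table and the repository names are all hypothetical, the only retention period used is the “two years after superseded” policy example mentioned earlier in the thread, and the background pass merely flags records for disposition review rather than deleting anything.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Optional


@dataclass
class Record:
    # Minimal metadata an employee would supply when saving (hypothetical fields).
    title: str
    repository: str  # a shared, accessible location rather than a personal drive
    record_type: str  # e.g. "policy", "transitory", "decision"
    superseded_on: Optional[date] = None


def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def keep(record: Record, today: date) -> bool:
    """Default for record types with no rule: never flag them."""
    return False


# Hypothetical rule table: record type -> "is this record due for disposition review?".
# The policy rule mirrors the "two years after superseded" example in the thread;
# real periods would come from a department's own retention and disposition authorities.
RULES: Dict[str, Callable[[Record, date], bool]] = {
    "policy": lambda r, today: (
        r.superseded_on is not None and today >= add_years(r.superseded_on, 2)
    ),
    "transitory": lambda r, today: True,
}


def due_for_review(records: List[Record], today: date) -> List[Record]:
    """Background pass: apply each record type's rule; nothing is deleted here."""
    return [r for r in records if RULES.get(r.record_type, keep)(r, today)]


# Usage with hypothetical records.
records = [
    Record("Telework policy v1", "policy-wiki", "policy", superseded_on=date(2008, 3, 1)),
    Record("Telework policy v2", "policy-wiki", "policy"),
    Record("Meeting reminder", "shared-drive", "transitory"),
    Record("Board decision memo", "edrms", "decision"),
]
print([r.title for r in due_for_review(records, date(2010, 6, 1))])
# ['Telework policy v1', 'Meeting reminder']
```

    Keeping the rules in a table separate from the records is what lets the rules change (or a “keep everything” default be adopted for a record type) without asking employees to do anything beyond supplying metadata.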

    As the original author stated “Information Management professionals need to examine fundamentally what this all means in a digital world” – as an Information Management professional I am thankful to be part of this excellent conversation and I look forward to its continuation.

  11. Pingback: System Scope Test | » 5 reasons to keep everything: A Blog Response
