In another symptom of ongoing tech-bro model collapse, Elon Musk recently tweeted out an impossible plan to have (his) AI rewrite the entire internet and then re-train (his) Grok AI on a “corrected version” of – wait for it – the “entire corpus of human knowledge”. The AI will “correct” the data, he says. The “uncorrected garbage” will be eliminated, he says.
Besides the fact that his models are likely based on the “one one-hundred-billionth of the internet” that’s accessible to be scraped for AI foundation models – hardly the entirety of human knowledge – this deranged fantasy completely misunderstands the nature of data.
Data is a living thing. Data evolves. Data is connected to other data, creating bonds that are meaningful in themselves. And data exists in wild places beyond the reach of AI. Data is hard to make disappear entirely, but not impossible to lose.
There are many groups, organizations, and institutions actively working to preserve and protect endangered data: to protect its existence, and to protect its discoverability and accessibility, which are just as important as long-term preservation.
An example: Remember when the government purged images from the Department of Defense website using random keywords that (they thought) promoted diversity, equity and inclusion? The AP created a searchable database of the eliminated images and made it available to the public.
Beautiful Public Data
Individuals are also contributing. The National Archives hosts rich digital image collections for public research that are endangered and under-protected in the current moment.
An independent scholar recently used one of these collections to create a website for at-risk photographs from the WWII Japanese Mass Incarceration collection. These images tell the story of an important part of US history and highlight what could be lost if we don’t properly archive our government data.
A small but beautiful effort.
What Can You Do?
Don’t despair when another government data purge is announced. Support individual data preservation efforts. Explore one of the many data preservation repositories. Join an open archiving project.
Become part of the growing network of researchers, activists, artists, and technologists working to ensure our public data—our public memory—is not just saved, but seen.