replied to petersuber's status

Update. "Inside the race to archive the US government’s websites"
https://www.technologyreview.com/2025/02/07/1111328/inside-the-race-to-archive-the-us-governments-websites/

Surveying a range of initiatives with good clarity on the obstacles.

"There are questions about whether scraping the data will really be enough. Restoring websites and complex data sets is often not a simple process. … 'The repairs and attempts to recover are sometimes insurmountable where we need continuous readings of data.' 'All of this data archiving work is a temporary Band-Aid,' says Gosnell. 'If data sets are removed and are no longer updated, our archived data will become increasingly stale and thus ineffective at informing decisions over time.'"

Update. "The Public Environmental Data Partners are committed to preserving and providing public access to federal environmental data. We are a volunteer coalition of several environmental, justice, and policy organizations, researchers across several universities, archivists, and students who rely on federal datasets and tools to support critical research, advocacy, policy, and litigation work. To gather insights on what data to preserve, we reached out to our networks, which consist largely of environmental justice groups and networks, state and local government climate offices, and academic researchers. We compiled a large list of federal databases and tools, and prioritized them based on their relative impact, our confidence that we could archive them, and the relative effort it would take to obtain and archive them."
https://screening-tools.com/

Continuously updated.

Update. "Today we [Harvard Law School @harvard_law Library Innovation Lab @harvardlil] released our archive of data.gov on Source Cooperative. The 16TB collection includes over 311,000 datasets harvested during 2024 and 2025, a complete archive of federal public datasets linked by data.gov. It will be updated daily as new datasets are added to data.gov. This is the first release in our new data vault project to preserve and authenticate vital public datasets for academic research, policymaking, and public use."
https://lil.law.harvard.edu/blog/2025/02/06/announcing-data-gov-archive/

Update. "Federal data is disappearing. On Thursday, meet the teams working to rescue it and learn how you can help. Join the Internet Archive [@internetarchive] and the Library Innovation Lab [@harvardlil] on Feb. 13, 3pm Eastern for a special event exploring the terabytes of data they have already saved and how to access it."
https://www.muckrock.com/news/archives/2025/feb/10/federal-data-is-disappearing-on-thursday-meet-the-teams-working-to-rescue-it-and-learn-how-you-can-help/

Update. If you're following this thread, you should also follow the Data Rescue Project by visiting its web site and subscribing to its email list. It aims "to serve as a clearinghouse for rescue-related efforts and data access points for public US governmental data that are currently at risk." And it's crowdsourced, which gives it a fighting chance to be comprehensive and up to date.
https://www.datarescueproject.org/about-data-rescue-project/

If you're on Bluesky, also follow its account there.
https://bsky.app/profile/datarescueproject.org

I'm very aware that a solo effort, like this Mastodon thread, doesn't scale to the size of this task, and I welcome the arrival of a crowdsourced effort. I will use it and refer people to it.

Update. "As the US government removes health websites and data, here’s a list of non-government data alternatives and archives"
https://journalistsresource.org/home/as-the-us-government-removes-health-websites-and-data-heres-a-list-of-non-government-data-alternatives/

"There’s no perfect alternative to the government databases, but some non-governmental organizations have their own datasets, which can be useful to journalists. Several associations have also been downloading government data and making them available to their members. To help journalists with their continued reporting, we have curated a list of non-government websites that have health data, although some use government data to create their reports. We’ll continue to update this list. If you have a suggestion for a database, please email us."

h/t @kdnyhan