To help coordinate and support the work of saving government data, ensure that individual efforts didn’t duplicate one another, and provide a secure, accessible repository for archived material, a group of concerned librarians created the Data Rescue Project (DRP). A “clearinghouse” for data preservation efforts, DRP builds on efforts that began during Trump’s first term. LJ spoke with DRP organizer Lynda Kellam about the project and how to get involved.
In late January, as President Donald Trump began rolling out executive orders intended to scrub federal websites of data on public health; the environment; gender identity; diversity, equity, and inclusion initiatives; and census research (to name a few), archivists across the country stepped up to preserve as many files as they could. Often working on their own time, data librarians (and others who understood what was at stake) began crawling government sites to save endangered information.
To help coordinate and support this work, ensure that individual efforts didn’t duplicate one another, and provide a secure, accessible repository for archived material, a group of concerned librarians created the Data Rescue Project (DRP). A “clearinghouse” for data preservation efforts, DRP builds on efforts that began during Trump’s first term. The site provides a Data Rescue Tracker spreadsheet that documents existing work, links to current efforts and organized data rescue events, and a list of resources and tools—centralized, easy-to-use assets for anyone interested in helping preserve the national record.
LJ spoke with DRP organizer Lynda Kellam, a data librarian at an Ivy League university, about the project and how to get involved.
LJ: Can you talk about the genesis of DRP?
Lynda Kellam: The groups involved this time around were a lot of the data librarians from professional organizations: IASSIST [International Association for Social Science Information Service & Technology], RDAP [Research Data Access & Preservation], and the Data Curation Network. Members of those organizations came together because, especially after the CDC data sets were taken down, we were seeing a lot of the same conversations across our different communities. There was a realization that we needed to do something. We were also seeing calls for the data rescues to restart. There had been data rescues in 2017, but that infrastructure didn’t really exist anymore. A lot of the people had gone on to other jobs, so there wasn’t really a core group that was going to come in and start saving things. We wanted to focus on making sure we were coordinating across all the different efforts.
We knew that certain things still existed, like End of Term Web Archive, and we knew that new things were popping up, like Harvard’s data.gov dump. But sometimes those have a tendency to get siloed, and we wanted to make sure that people were seeing all of the different things that were going on, and then also make sure that all those different groups are communicating with each other and not doing the same thing.
For instance, EDGI [Environmental Data & Governance Initiative] and PEDP [Public Environmental Data Partners] are doing a great job with environmental data, so in my mind there’s no need for our group to worry about environmental data, because they’ve been doing it for a long time. We wanted to focus on things that we were worried would be at risk, a concern spawned by the USAID dismantling and the realization that this is going beyond environmental data. It’s impacting data that wasn’t impacted last time.
Who helped develop the project?
Most of us are data librarians. We also have a person from Saving Ukrainian Cultural Heritage Online [SUCHO]. His name is Sebastian [Majstorovic], and he’s based in Germany. He’s helped us out a lot and has a lot of experience with web scraping. We have a communication channel called Mattermost, where people can talk about things that they’re working on—all of the groups are able to talk to each other if they need to.
We have a very active Bluesky channel that I run. That has been a huge part of what I do, the communications side of things—other groups don’t necessarily have the capacity to do that because they have other jobs that they have to focus on. That’s why DRP has picked up that role.
Are you externally funded at all?
We have no funding. We are a completely volunteer-based group at this point.
How has DRP’s work changed over the past couple of months, and what does it look like now?
Those first few weeks were a little tenuous, in the sense of feeling on edge about what we were doing and whether or not we were going to get in trouble. That feeling has dissipated a little bit, because more of the bigger institutions, like ICPSR [Inter-university Consortium for Political and Social Research] and the Center for Open Science, have come out to support us. They’ve made public statements about how this is important: this isn’t just about politics, this is about being able to save our data. That has given a lot of safety to those of us who are volunteers working on this. It’s made me feel more comfortable about talking to the press. So it’s gotten better, and we’ve gotten more people. We have about 700 active volunteers. About 1,000 people have subscribed to our newsletter, and then over 3,000 people follow us on Bluesky. So we have a pretty decent reach. Of the core 700 volunteers, not all of them are active, but there’s a group of those that are doing a lot of work, so it’s a matter of helping coordinate them. A lot of sitting on Saturday afternoon trying to figure out who’s working on what, tracking down people, things like that.
One volunteer has done a huge number of data set downloads. In that first weekend, I think he did about 40 percent of the work. And he said it was because he had friends who were federal employees. It was his way of giving back to them. He wasn’t even a librarian—he was just a volunteer who was helping out. He found us through Bluesky.
The first two weeks were really busy, and then there was a bit of a lull. Then in response to every action that’s been taken, especially this month, with actions against the Department of Education, IMLS, and other smaller offices, it’s been a little more active.
Do you have concerns about your own privacy, or any kind of backlash to doing this work?
As we’ve gotten more press, I’ve definitely been mindful of the fact that there might be trolls. I don’t imagine a larger reprisal. We’re not doing anything that librarians haven’t always done, which is preserving access to information. We have gotten nothing but support on social media. I think the closest thing we’ve had to people questioning what we’re doing has been people thinking that the Wayback Machine can capture everything. We work with the Wayback Machine and End of Term Web Archive, and we know what their possibilities are, but there are also limitations.
Who have been helpful partners?
EDGI and PEDP, definitely. The Harvard Library Innovation Lab was super supportive of us from the very beginning. Wayback and End of Term Web Archive have been great. There are a lot of organizations that have reached out to us and have been really helpful.
America’s Essential Data is a new group that is looking more at tracking changes in websites, and they’re monitoring the legality of the end of a data series. Some data series are mandated by Congress—the data has to be collected by the agency, so it would be in violation of that if they didn’t collect it. And then there are others that Congress gives them the authority to collect. So they’re looking at that distinction and seeing which data sets are the ones where pressure can be put onto Congress or the agency to continue the collection of that data. It’s a completely different kind of question from what we look at, but it’s still helpful, and it’s nice to have that conversation. SUCHO was very helpful in terms of the way we set up our original inventory for tracking what we were doing. They gave us a lot of advice on how to handle the actual rescue process, based on what they did.
How will you make this data accessible to users?
All of our data we try to put into DataLumos, which is ICPSR’s archive. One of our goals was to avoid having data just sitting out there somewhere without any home, so a lot of my work has been encouraging people to make sure that they know what the workflow we use is and doing a little bit of quality control on the deposits they’re doing.
That’s a huge part of what we want to do, and why we put it in DataLumos, because it becomes immediately accessible to anybody. I was kind of adamant that we put it somewhere like that, especially when we get into the education data and certain data sets where the communities aren’t necessarily as technical as other communities, so they need something that’s a little bit more immediate and clickable to access. A lot of people have been using BitTorrent for sharing data, which is super cool and helpful, but you have to know how to use that. I wanted to have something that was a little more immediately user-friendly for people. Access was a lesson learned from 2017. They were making things accessible, but it isn’t clear to me that all the data from the different data rescues ended up in the access that they had. With this, I wanted to make sure, as much as possible, we didn’t have data loss.
How can people or libraries get involved?
One is they could just follow the Bluesky [account]; that’s the easiest way. Two, they could subscribe to our website. You’ll get an email with our newsletter, and that gives you different ways you can get involved as well. And three, we have a Google volunteer interest form, which is where people can go if they want to really get into it. This is on the FAQ as well. We do have a lot of people in there, and we’re getting to a saturation point when it comes to hyper-interested volunteers, but we’ll always try. If you have skill sets that might work better for somebody else, we will refer you. Once you sign up for that, you’ll get an email that has a lot more information about the rescues that we’re doing, and access to that information. And then you’ll be invited to the Mattermost channel, a chat platform like Slack.
For data rescue events, if people are interested, they should get in touch with us, and we can give them some playbooks on how to do that. We would love to know if somebody’s doing a data rescue event, because we can share our information so they’re not rescuing things that have already been done. I encourage anybody interested in doing a data rescue event to email us, and we can point them in the right direction. And we would love to know if people are creating websites or are doing events—just get in touch with us.
But the other thing that’s as important as data rescues is data impact stories: collecting those in your community, having an understanding of how your local community or city or state uses data, and making sure to explain that to people in the community so that it doesn’t seem like just a library topic or a nerd topic, and so they understand that the lack of that data is going to impact our communities. We need to make sure that our local politicians and people on the local level understand what that impact will be.
If you go through our home page, it has a form to fill out an impact story. They’re not very long—they just talk about what data is important and why. The importance for the nation, especially at the local level, can’t be over-stressed.