There is an adage that has been around since the early days of the web: “Nothing is ever deleted from the internet.” It reflects the fact that once something is released into the world, there’s little to stop others from copying and reposting it, and they often do.
Attorneys have a number of tools they can use to find materials that seem to have disappeared.
The best-known and most comprehensive archive site is the Wayback Machine, run by the Internet Archive (archive.org). The Internet Archive (IA) is a project that began in 1996 to preserve the history of the web. To do this, IA runs automated crawlers on a large server farm that follow links from one page to another and save a copy of each page they find. As a result, not every page is saved in the Wayback Machine, since whether a site gets found is sometimes a matter of luck. There also won’t always be a copy from a particular date, since the crawlers might not have visited the site during the period you need. Furthermore, some sites ask the Internet Archive not to save copies of their pages. Nevertheless, the Wayback Machine is an extremely powerful resource for finding information that has been removed from a website, such as statements deleted from a blog post, profiles of employees who have since left a company, and recent versions of websites that are currently down for technical reasons.
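For those comfortable with a little scripting, the Internet Archive also offers a simple availability API at `https://archive.org/wayback/available` that reports the saved snapshot closest to a requested date. The sketch below assumes that endpoint and its JSON response shape (an `archived_snapshots` object containing a `closest` entry); the function names are hypothetical, and the parsing can be tried on a made-up sample response without a network connection.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build an availability query; timestamp is YYYYMMDD or YYYYMMDDhhmmss."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response_json: str) -> Optional[str]:
    """Return the URL of the closest archived snapshot, or None if there is none."""
    data = json.loads(response_json)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(url: str, timestamp: Optional[str] = None) -> Optional[str]:
    """Query the live API (requires network access)."""
    with urlopen(availability_query(url, timestamp), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical sample in the assumed response shape; not a real lookup result.
    sample = (
        '{"archived_snapshots": {"closest": {"available": true, '
        '"url": "http://web.archive.org/web/20130919044612/http://example.com/", '
        '"timestamp": "20130919044612", "status": "200"}}}'
    )
    print(availability_query("example.com", "20130101"))
    print(closest_snapshot(sample))
```

Calling `lookup("example.com", "20130101")` would ask the live service for the snapshot nearest January 1, 2013; if nothing was archived, the response has an empty `archived_snapshots` object and the function returns `None`.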
If what you’re looking for isn’t in the Wayback Machine, there are other services that operate the same way, but may have saved the site you need on a different date.
A website cache is a saved copy of a website as it existed the last time it was visited. Some services save caches of a large number of sites and make them available. Unlike archive services, web caches don’t keep many copies of the same site going back through time; they hold only a snapshot of the most recent version. Even so, they’re useful for finding sites that are offline due to technical problems, or for catching very recent changes to a site (like a post that was just removed).
Because search engines are constantly browsing and saving information to update their results, it should come as no surprise that Google has long been one of the best sources for website caches.
To access the most recent version of a site that Google’s servers have saved, go to Google.com (rather than typing into the browser’s address bar) and search for the URL you want to see, e.g. “wikipedia.org”. The site will usually be the first result; if there is a green triangle to the right of the URL, click it and select the cached version of the site. Be aware that Google began removing public access to its cached pages in 2024, so this option may no longer appear.
Bing’s search engine also saves caches, which can be accessed in much the same way as Google’s.
Occasionally, the best copy of a site that has recently changed might already be on your computer. Whenever you visit a website, your browser has to download all the elements of the site (the code, the pictures, etc.) so it can display them for you. To save time, most modern browsers store those elements in a temporary cache so they can load them more quickly if you return to a site you just visited.
Accessing local caches can be tricky, and the method varies depending on your browser.
Social media is often not included in the Wayback Machine or stored in a Google cache, but it is frequently relevant in legal proceedings. If you need information that a user has “deleted” from a social media site, it is likely not actually gone. For example, if someone commits a crime, posts video of themselves committing it to Facebook, and then thinks better of it and presses the delete button, Facebook doesn’t immediately erase the video from its servers; it just changes the status of the file to “deleted.” If you have reason to think that evidence has been deleted from social media, you may have to subpoena the company to get a copy.
A major problem in legal scholarship and court opinions is that both now frequently cite to websites, and those websites may later change or be deleted. This is called link rot, and there are some tools to combat it.
Perma.cc is a project created by the Harvard Law School Library, and it is designed to let scholars permanently save copies of the websites they cite to. The UMN Law Library is a partner of the project, and Scott Uhl can help you use Perma if you have questions.
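Archiving protects future citations, but it can also help to find out which links in an existing draft or opinion have already rotted. The sketch below is a hypothetical helper, not part of Perma.cc, that uses only Python’s standard library to send a lightweight HTTP `HEAD` request to each cited URL and flag those that fail or return an error status. It assumes the servers answer `HEAD` requests; some do not, and a page can return a normal status even after its content has changed, so a passing check does not prove the cited text is still there.

```python
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request, or 0 if the site is unreachable."""
    req = Request(url, method="HEAD", headers={"User-Agent": "link-rot-check-sketch"})
    try:
        # urlopen follows redirects, so this is usually the final page's status.
        with urlopen(req, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code  # e.g. 404 Not Found or 410 Gone
    except OSError:
        return 0  # DNS failure, refused connection, or timeout


def find_rotted(urls: Iterable[str], fetch: Callable[[str], int] = http_status) -> Dict[str, int]:
    """Map each URL that did not return a 2xx or 3xx status to the status it got."""
    rotted = {}
    for url in urls:
        status = fetch(url)
        if not 200 <= status < 400:
            rotted[url] = status
    return rotted
```

Passing a list of cited URLs to `find_rotted` returns only the problem links; the `fetch` parameter exists so the checking logic can be tried with canned statuses instead of live requests.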