TalkTalk
TalkTalk | |
URL | TalkTalk - Broadband, Fibre, TV and Calls |
Status | Closing (personal web space) |
Archiving status | Not saved yet (personal web space) |
Archiving type | Unknown |
IRC channel | #talkbork (on hackint) |
TalkTalk plc is a company which provides television, telecommunications, Internet access, and mobile network services to businesses and consumers in the United Kingdom.[1]
Personal Webspace Closure
In early 2018, TalkTalk announced that their customer webspace was closing down[2]:
An update about our Webspace service
We regularly review our products to make sure we’re providing only the best value and service to our customers, which is why we’ve decided to remove Webspace and focus on the innovation of our core products.
Webspace will no longer be available from 16th July; however we’ve teamed up with Wix – one of the largest online website builders – to give you a great offer and support you if you want to move your website. Simply create a Wix account and you’ll be able to rebuild your website with a fresh design and brand new features.
Like TalkTalk, Wix offers a free service; and if you’d like to use a premium plan which includes your own domain name, you’ll be able to take advantage of our exclusive TalkTalk discount.
This webspace hosts many old "web 1.0" websites of historical interest, e.g. family history sites, war memorials, local businesses, and village cricket teams.
Although the shutdown was announced for 2018-07-16, the sites appear to still be online as of 2018-07-21.
TalkTalk exists as a result of historic takeovers, mergers, and rebrandings of defunct ISPs dating back to the 1990s. As a consequence, TalkTalk inherited, and is responsible for managing, the webspace data of four known domains. A forum post from a representative is useful for determining how much webspace is disappearing. A rough back-of-the-envelope estimate based on search results indicates over 100,000 sites.
Grabbing Webspace
To gather a rough picture of the number of sites, or at least of the sites indexed by search engines, we can perform a site:http://... search engine query, which gives the following rounded results.
Closing Domain | Google Results | Bing Results |
---|---|---|
http://website.lineone.net/~USERNAME/ | 11,000 | 13,000 |
http://myweb.tiscali.co.uk/USERNAME/ | 54,000 | 36,000 |
http://www.USERNAME.dsl.pipex.com/ | 14,000 | 15,000 |
http://www.USERNAME.talktalk.net/ | 51,000 | 25,000 |
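As a rough check on the "over 100,000 sites" estimate, the rounded counts in the table above can be summed per search engine. This is only a sketch: treating each result count as a site count, and taking the larger of the two counts per domain as a bound, are assumptions made here for illustration, not a method stated by the source.

```python
# Rounded search result counts copied from the table above.
results = {
    "website.lineone.net": {"google": 11_000, "bing": 13_000},
    "myweb.tiscali.co.uk": {"google": 54_000, "bing": 36_000},
    "dsl.pipex.com": {"google": 14_000, "bing": 15_000},
    "talktalk.net": {"google": 51_000, "bing": 25_000},
}

# Total per search engine across the four closing domains.
google_total = sum(counts["google"] for counts in results.values())
bing_total = sum(counts["bing"] for counts in results.values())

# Assumption: the larger count per domain is the better lower bound
# on what that domain holds, so sum the per-domain maxima.
best_total = sum(max(counts.values()) for counts in results.values())

print(google_total)  # 130000
print(bing_total)    # 89000
print(best_total)    # 133000
```

Search engines only report indexed pages, so these totals say nothing about sites that were never crawled.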
Ideas
- Scrape search engines for full page URLs and extract deduplicated base URLs (in the format of the previous section)
- For every URL found:
  curl http://web.archive.org/save/[URL]
- Use the Wayback API to gather a list of images, pages, or base URLs that are already partially archived
- Archive the base URL recursively.
- Older (lineone, tiscali, and pipex era) sites should likely take priority, as they are often very "web 1.0".
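The first two ideas above could be sketched roughly as follows: reduce full page URLs (for example, scraped from search results) to deduplicated base URLs in the four formats listed earlier, then build the matching save URL for each. The function names `base_url`, `dedupe_base_urls`, and `save_url` are hypothetical, the sample URLs are made up, and the rule that skips a bare `www` host such as `www.talktalk.net` is an assumption about which hosts are not user sites.

```python
from urllib.parse import urlsplit

# Hosts where each user's site lives under the first path segment.
PATH_BASED_HOSTS = {"website.lineone.net", "myweb.tiscali.co.uk"}
# Domain suffixes where each user's site lives on its own subdomain.
SUBDOMAIN_SUFFIXES = (".dsl.pipex.com", ".talktalk.net")


def base_url(page_url):
    """Return the per-user base URL for a page URL, or None if unrecognised."""
    parts = urlsplit(page_url)
    host = (parts.hostname or "").lower()

    if host in PATH_BASED_HOSTS:
        segments = [s for s in parts.path.split("/") if s]
        if not segments:
            return None
        # lineone user directories are written as ~USERNAME.
        if host == "website.lineone.net" and not segments[0].startswith("~"):
            return None
        return f"http://{host}/{segments[0]}/"

    for suffix in SUBDOMAIN_SUFFIXES:
        if host.endswith(suffix):
            label = host[: -len(suffix)]
            if label.startswith("www."):
                label = label[len("www."):]
            # Assumption: a bare "www" host is the ISP's own site, not a user's.
            if label and label != "www":
                return f"http://{host}/"
    return None


def dedupe_base_urls(page_urls):
    """Map page URLs to base URLs, keeping first-seen order without repeats."""
    seen = set()
    bases = []
    for url in page_urls:
        base = base_url(url)
        if base is not None and base not in seen:
            seen.add(base)
            bases.append(base)
    return bases


def save_url(base):
    """Build the Wayback Machine save URL used in the curl idea above."""
    return "http://web.archive.org/save/" + base


# Made-up sample input for illustration only.
sample = [
    "http://website.lineone.net/~example/index.htm",
    "http://website.lineone.net/~example/photos/a.jpg",
    "http://myweb.tiscali.co.uk/example/page.html",
    "http://www.example.dsl.pipex.com/about.html",
    "http://www.example.talktalk.net/",
    "http://www.talktalk.net/help",
    "http://unrelated.example.org/",
]
bases = dedupe_base_urls(sample)
for base in bases:
    print(save_url(base))
```

One known limitation of this sketch: a tiscali URL that points at a file directly under the host root would have that file name treated as a username, so the output is worth a manual skim before feeding it to an archiving job.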
Archival
Based on DNS History (with obvious rubbish filtered out), 96 subdomains of talktalk.net and 517 subdomains of dsl.pipex.com were archived through ArchiveBot job:bbjyy6chn3i3h1jbdc6ya3nm0 and job:4738iqfdngwdloyi7ye541u06, respectively. However, many of these sites did not actually exist, and this is only a very small part of the total webspace.