Difference between revisions of "TalkTalk"

From Archiveteam
(Created TalkTalk page, closure of webspace)
 
(Mass-edit to update uses of Template:IRC)
(6 intermediate revisions by the same user not shown)
Line 3: Line 3:
| image = TalkTalkLogo.png
| image = TalkTalkLogo.png
| URL = {{url|http://talktalk.co.uk | TalkTalk - Broadband, Fibre, TV and Calls}}
| URL = {{url|http://talktalk.co.uk | TalkTalk - Broadband, Fibre, TV and Calls}}
| project_status = {{closing}} <small>(personal web space)</small>
| project_status = {{offline}} <small>(personal web space)</small>
| archiving_status = {{notsavedyet}} <small>(personal web space)</small>
| archiving_status = {{partiallysaved}} <small>(personal web space)</small>
| irc = archiveteam
| irc = talkbork
| irc_network = EFnet
| irc_abandoned = true
| lead = [[User:JustAnotherArchivist]]
}}
}}


Line 11: Line 14:


== Personal Webspace Closure ==
== Personal Webspace Closure ==
In early 2018, TalkTalk announced that their customer webspace was closing down<ref>https://help2.talktalk.co.uk/webspace-service-update</ref>:


In early 2018 TalkTalk announced that their customer '''webspace is closing down'''.<ref>https://help2.talktalk.co.uk/webspace-service-update</ref>
<blockquote>
An update about our Webspace service


TalkTalk exists as a result of historic takeovers, mergers and the rebranding of defunct ISPs from as early as the 1990s. As a consequence of this, TalkTalk inherited and is responsible for managing the webspace data of four known domains. A forum post from a representative [http://web.archive.org/web/20180525205942/https://community.talktalk.co.uk/t5/Email-Webmail/Webspace-is-being-deleted-Question/td-p/2208094 is useful for determining the quantity of webspace disappearing]. A rough back-of-the-envelope estimate using search results indicates over 100,000 sites.
We regularly review our products to make sure we’re providing only the best value and service to our customers, which is why we’ve decided to remove Webspace and focus on the innovation of our core products.


Four historic "branches" of webspace under different domains are going offline.
Webspace will no longer be available from 16th July; however we’ve teamed up with Wix – one of the largest online website builders – to give you a great offer and support you if you want to move your website. Simply create a Wix account and you’ll be able to rebuild your website with a fresh design and brand new features.


== Grabbing Webspace ==
Like TalkTalk, Wix offers a free service; and if you’d like to use a premium plan which includes your own domain name, you’ll be able to take advantage of our exclusive TalkTalk discount.
</blockquote>


To gather a ''rough'' picture of the amount of sites, or at least sites indexed by search engines, we can perform a <code>site:http://...</code> search engine query; which outputs the following rounded results.
This webspace served a lot of old web 1.0 websites holding history, e.g. family history sites, war memorials, local businesses, and village cricket teams.
 
Although the shutdown was announced for 2018-07-16, the sites still appeared to be online as of 2018-07-21. As of 2018-07-26, everything redirected to a notice stating that the webspace was being shut down; the original content could still be accessed by appending a <code>showpage=true</code> parameter to the URL. According to the notice, final deletion was planned for 2018-08-10<ref>https://www.talktalk.co.uk/notices/web-space-closing.html?accessurl=http://www.talktalk.net/</ref>:
 
<blockquote>
From 16th July we'll no longer offer Webspace, and we'll be shutting down our Webspace platform by the 10th August.
If you are the owner of this Webspace and want your content to remain on the web after this date, you'll need to move it to a new platform.
 
Find out more about how to move your content on our FAQ page.
</blockquote>
 
The webspace contents were deleted sometime between 2018-08-16 and 2018-09-05. The webspace itself, as well as the FTP accounts, were actually still available even after the latter date.
 
TalkTalk exists as a result of historic takeovers, mergers, and the rebranding of defunct ISPs from as early as the 1990s. As a consequence of this, TalkTalk inherited and was responsible for managing the webspace data of several domains. A forum post from a representative [http://web.archive.org/web/20180525205942/https://community.talktalk.co.uk/t5/Email-Webmail/Webspace-is-being-deleted-Question/td-p/2208094 is useful for determining the quantity of webspace disappearing]. A rough back-of-the-envelope estimate using search results indicates over 100,000 sites.
 
=== Grabbing Webspace ===
To gather a ''rough'' picture of the number of sites, or at least sites indexed by search engines, we can perform a <code>site:http://...</code> search engine query, which outputs the following rounded results.


{| class="wikitable"
{| class="wikitable"
Line 42: Line 64:
| 51,000
| 51,000
| 25,000
| 25,000
|-
| http://users.tinyworld.co.uk/USERNAME/
| N/A
| N/A
|-
| http://users.tinyonline.co.uk/USERNAME/
| N/A
| N/A
|-
| http://www.USERNAME.screaming.net/
| N/A
| N/A
|-
| http://www.USERNAME.homecall.co.uk/
| N/A
| N/A
|-
| http://www.USERNAME.ukgateway.net/
| N/A
| N/A
|-
| http://www.USERNAME.worldonline.co.uk/
| N/A
| N/A
|-
| http://homepages.nildram.co.uk/~USERNAME/
| N/A
| N/A
|-
| http://www.USERNAME.nildram.co.uk/
| N/A
| N/A
|-
| http://web.onetel.com/~USERNAME
| N/A
| N/A
|-
| http://web.onetel.net.uk/~USERNAME
| N/A
| N/A
|}
|}


It is confirmed that data on these domains will no longer be accessible on '''July 16, 2018'''. From Web 1.0 Family History Sites, War Memorials, Local Businesses to Village Cricket Teams, many sites hold history.
Notes:


== Ideas ==
* http://homepages.nildram.co.uk/~USERNAME/ and http://www.USERNAME.nildram.co.uk/ serve the same content in some cases.
* http://web.onetel.com/~USERNAME and http://web.onetel.net.uk/~USERNAME serve the same content.


Scrape search engines for full site URLs
=== Ideas ===
* Scrape search engines for full page URLs and extract deduplicated base URLs (in the format of the previous section)
* For every URL found: <code>curl http://web.archive.org/save/[URL]</code>
* Use Wayback API to gather list of already partially archived images or pages/base URLs
* Archive the base URL recursively.
* Older (lineone, tiscali and pipex era) sites should likely take priority. These are often old and very "web 1.0".


For every URL found, <code>curl http://web.archive.org/save/[URL]</code>  
=== Archival ===
Based on [[DNS History]] (with obvious rubbish filtered out), 96 subdomains of <tt>talktalk.net</tt> and 517 subdomains of <tt>dsl.pipex.com</tt> were archived through ArchiveBot {{Job|bbjyy6chn3i3h1jbdc6ya3nm0}} and {{Job|4738iqfdngwdloyi7ye541u06}}, respectively. However, many of these sites did not actually exist, and this is only a very small part of the total webspace.


Use Wayback API to gather list of already partially archived images or pages, and archive the root site fully
From scraping Bing and other sources, further ArchiveBot jobs were run for 2088 additional <tt>talktalk.net</tt> subdomains ({{Job|9490x1sgho1s9hr4p6ki1v6cq}}, {{Job|6pwm5oyomcfdbtmy7qifydepj}}, and {{Job|biywpihh8lxd77uisihkwczzl}}), 723 additional <tt>dsl.pipex.com</tt> subdomains ({{Job|8qwxayost81j0p78k4hflp5a4}}), 518 <tt>website.lineone.net</tt> sites/directories ({{Job|61r3cgx9plfxe95pldp4bfvnf}}), and 1281 <tt>myweb.tiscali.co.uk</tt> sites/directories ({{Job|9sw3f9v1f9pzpfcf9mao5ubn4}}).


Older (lineone, tiscali and pipex era) sites should likely take priority? These are often old and very "Web 1.0"
A thorough archival based on about 151k URLs extracted and derived from Bing, DNS History, DuckDuckGo, EntireWeb, Reddit, and the Wayback Machine was done by [[User:JustAnotherArchivist]] using wpull. 2.53 million URLs across the various domains were grabbed, resulting in 102.6 GiB of (compressed) WARCs.


== References ==
== References ==

Revision as of 19:03, 31 October 2021

TalkTalk
URL TalkTalk - Broadband, Fibre, TV and Calls (http://talktalk.co.uk)
Status Offline (personal web space)
Archiving status Partially saved (personal web space)
Archiving type Unknown
IRC channel #archiveteam-bs (on hackint)
(formerly #talkbork (on EFnet))
Project lead User:JustAnotherArchivist

TalkTalk plc is a company which provides television, telecommunications, Internet access, and mobile network services to businesses and consumers in the United Kingdom.[1]

Personal Webspace Closure

In early 2018, TalkTalk announced that their customer webspace was closing down[2]:

An update about our Webspace service

We regularly review our products to make sure we’re providing only the best value and service to our customers, which is why we’ve decided to remove Webspace and focus on the innovation of our core products.

Webspace will no longer be available from 16th July; however we’ve teamed up with Wix – one of the largest online website builders – to give you a great offer and support you if you want to move your website. Simply create a Wix account and you’ll be able to rebuild your website with a fresh design and brand new features.

Like TalkTalk, Wix offers a free service; and if you’d like to use a premium plan which includes your own domain name, you’ll be able to take advantage of our exclusive TalkTalk discount.

This webspace served a lot of old web 1.0 websites holding history, e.g. family history sites, war memorials, local businesses, and village cricket teams.

Although the shutdown was announced for 2018-07-16, the sites still appeared to be online as of 2018-07-21. As of 2018-07-26, everything redirected to a notice stating that the webspace was being shut down; the original content could still be accessed by appending a showpage=true parameter to the URL. According to the notice, final deletion was planned for 2018-08-10[3]:

From 16th July we'll no longer offer Webspace, and we'll be shutting down our Webspace platform by the 10th August. If you are the owner of this Webspace and want your content to remain on the web after this date, you'll need to move it to a new platform.

Find out more about how to move your content on our FAQ page.
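
The showpage=true workaround mentioned before the notice could be applied to a list of page URLs with a small helper. The following is only a sketch: the function name with_showpage and the username "example" are hypothetical, and how the server treated URLs that already carried a query string is an assumption, not something the notice documents.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_showpage(url: str) -> str:
    """Return url with showpage=true added to its query string.

    Assumption: the parameter is simply appended to any existing query;
    an existing showpage value is replaced rather than duplicated.
    """
    parts = urlsplit(url)
    # Keep every existing parameter except a previous showpage value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "showpage"
    ]
    query.append(("showpage", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # "example" stands in for a real USERNAME.
    print(with_showpage("http://www.example.talktalk.net/index.html"))
```

For a URL without a query string this yields the same address followed by ?showpage=true.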

The webspace contents were deleted sometime between 2018-08-16 and 2018-09-05. The webspace itself, as well as the FTP accounts, were actually still available even after the latter date.

TalkTalk exists as a result of historic takeovers, mergers, and the rebranding of defunct ISPs from as early as the 1990s. As a consequence of this, TalkTalk inherited and was responsible for managing the webspace data of several domains. A forum post from a representative is useful for determining the quantity of webspace disappearing. A rough back-of-the-envelope estimate using search results indicates over 100,000 sites.
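
The "over 100,000 sites" figure can be checked roughly against the rounded search counts listed under Grabbing Webspace. The page does not say how the estimate was formed, so the sketch below is only one plausible reading: it sums each search engine's column separately, since the two engines largely index the same sites and their counts should not be added together.

```python
# Rounded site: query results from the table under "Grabbing Webspace",
# stored as (Google, Bing) per domain. Domains listed as N/A are omitted.
counts = {
    "website.lineone.net": (11_000, 13_000),
    "myweb.tiscali.co.uk": (54_000, 36_000),
    "dsl.pipex.com": (14_000, 15_000),
    "talktalk.net": (51_000, 25_000),
}

google_total = sum(google for google, _ in counts.values())
bing_total = sum(bing for _, bing in counts.values())

if __name__ == "__main__":
    print(f"Google total: {google_total:,}")  # Google total: 130,000
    print(f"Bing total:   {bing_total:,}")    # Bing total:   89,000
```

The Google column alone is consistent with the "over 100,000" estimate; the ten domains without counts would only add to it.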

Grabbing Webspace

To gather a rough picture of the number of sites, or at least sites indexed by search engines, we can perform a site:http://... search engine query, which outputs the following rounded results.

{| class="wikitable"
! Closing Domain
! Google Results
! Bing Results
|-
| http://website.lineone.net/~USERNAME/
| 11,000
| 13,000
|-
| http://myweb.tiscali.co.uk/USERNAME/
| 54,000
| 36,000
|-
| http://www.USERNAME.dsl.pipex.com/
| 14,000
| 15,000
|-
| http://www.USERNAME.talktalk.net/
| 51,000
| 25,000
|-
| http://users.tinyworld.co.uk/USERNAME/
| N/A
| N/A
|-
| http://users.tinyonline.co.uk/USERNAME/
| N/A
| N/A
|-
| http://www.USERNAME.screaming.net/
| N/A
| N/A
|-
| http://www.USERNAME.homecall.co.uk/
| N/A
| N/A
|-
| http://www.USERNAME.ukgateway.net/
| N/A
| N/A
|-
| http://www.USERNAME.worldonline.co.uk/
| N/A
| N/A
|-
| http://homepages.nildram.co.uk/~USERNAME/
| N/A
| N/A
|-
| http://www.USERNAME.nildram.co.uk/
| N/A
| N/A
|-
| http://web.onetel.com/~USERNAME
| N/A
| N/A
|-
| http://web.onetel.net.uk/~USERNAME
| N/A
| N/A
|}

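Notes:

  • http://homepages.nildram.co.uk/~USERNAME/ and http://www.USERNAME.nildram.co.uk/ serve the same content in some cases.
  • http://web.onetel.com/~USERNAME and http://web.onetel.net.uk/~USERNAME serve the same content.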

Ideas

  • Scrape search engines for full page URLs and extract deduplicated base URLs (in the format of the previous section)
  • For every URL found: curl http://web.archive.org/save/[URL]
  • Use Wayback API to gather list of already partially archived images or pages/base URLs
  • Archive the base URL recursively.
  • Older (lineone, tiscali and pipex era) sites should likely take priority. These are often old and very "web 1.0".
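
The first three ideas above can be sketched in Python. This is an illustration under stated assumptions, not the tooling actually used: the function names are hypothetical, base URLs are normalised with a trailing slash (the onetel patterns in the table have none), and the CDX endpoint is just one Wayback API that could serve for listing already-archived URLs, since the list does not say which API was meant. The code only builds URLs and makes no network requests.

```python
from urllib.parse import urlencode, urlsplit

# Hosts where the site lives under the first path segment (from the table).
PATH_HOSTS = {
    "website.lineone.net", "myweb.tiscali.co.uk", "users.tinyworld.co.uk",
    "users.tinyonline.co.uk", "homepages.nildram.co.uk",
    "web.onetel.com", "web.onetel.net.uk",
}
# Domains where the site is identified by the subdomain (from the table).
SUBDOMAIN_SUFFIXES = (
    ".dsl.pipex.com", ".talktalk.net", ".screaming.net", ".homecall.co.uk",
    ".ukgateway.net", ".worldonline.co.uk", ".nildram.co.uk",
)


def base_url(page_url):
    """Map a full page URL to its site base URL, or None if unrecognised."""
    parts = urlsplit(page_url)
    host = (parts.hostname or "").lower()
    if host in PATH_HOSTS:
        segments = [s for s in parts.path.split("/") if s]
        if not segments:
            return None  # host root, no user directory
        # Assumption: the first path segment is always the user directory.
        return f"http://{host}/{segments[0]}/"
    if host.endswith(SUBDOMAIN_SUFFIXES):
        return f"http://{host}/"
    return None


def unique_base_urls(page_urls):
    """Deduplicate base URLs while keeping first-seen order."""
    seen = {}
    for url in page_urls:
        base = base_url(url)
        if base is not None:
            seen.setdefault(base, None)
    return list(seen)


def save_url(url):
    """URL to request (e.g. with curl) to ask the Wayback Machine for a capture."""
    return "http://web.archive.org/save/" + url


def cdx_query_url(base):
    """CDX query listing already-captured URLs under base (assumed API choice)."""
    params = {"url": base, "matchType": "prefix", "output": "json",
              "fl": "original", "collapse": "urlkey"}
    return "http://web.archive.org/cdx/search/cdx?" + urlencode(params)


if __name__ == "__main__":
    pages = [  # hypothetical scraped search results
        "http://www.example.talktalk.net/photos/1.html",
        "http://www.example.talktalk.net/index.html",
        "http://website.lineone.net/~someone/a/b.htm",
    ]
    for base in unique_base_urls(pages):
        print(save_url(base))
        print(cdx_query_url(base))
```

This deduplication does not merge the nildram and onetel aliases that serve the same content; those would need an extra mapping step.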

Archival

Based on DNS History (with obvious rubbish filtered out), 96 subdomains of talktalk.net and 517 subdomains of dsl.pipex.com were archived through ArchiveBot job:bbjyy6chn3i3h1jbdc6ya3nm0 and job:4738iqfdngwdloyi7ye541u06, respectively. However, many of these sites did not actually exist, and this is only a very small part of the total webspace.

From scraping Bing and other sources, further ArchiveBot jobs were run for 2088 additional talktalk.net subdomains (job:9490x1sgho1s9hr4p6ki1v6cq, job:6pwm5oyomcfdbtmy7qifydepj, and job:biywpihh8lxd77uisihkwczzl), 723 additional dsl.pipex.com subdomains (job:8qwxayost81j0p78k4hflp5a4), 518 website.lineone.net sites/directories (job:61r3cgx9plfxe95pldp4bfvnf), and 1281 myweb.tiscali.co.uk sites/directories (job:9sw3f9v1f9pzpfcf9mao5ubn4).

A thorough archival based on about 151k URLs extracted and derived from Bing, DNS History, DuckDuckGo, EntireWeb, Reddit, and the Wayback Machine was done by User:JustAnotherArchivist using wpull. 2.53 million URLs across the various domains were grabbed, resulting in 102.6 GiB of (compressed) WARCs.

References

[2] https://help2.talktalk.co.uk/webspace-service-update
[3] https://www.talktalk.co.uk/notices/web-space-closing.html?accessurl=http://www.talktalk.net/