From Archiveteam
Revision as of 20:27, 22 November 2021 by TheTechRobo (Add stuff that I just found out through experimentation.)
URL http://4shared.com
Status Unknown
Archiving status Unknown
Archiving type Unknown
IRC channel #archiveteam-bs (on hackint)

4shared is a file-sharing service. It does not seem to have an inactivity time bomb, as files from 2009 [IA • Wcite • .today • MemWeb] are still available. Downloading will not be simple: the site enforces a JavaScript-based 60-second wait that seems to be bypassable only with a paid subscription. In addition, downloading seems to require logging in in the first place. NetSurf, a browser without JavaScript support, does not bypass the time limit. Spoofing user agents such as Googlebot has not been tested yet, nor has reverse-engineering the JavaScript (at least not by User:TheTechRobo, who plans to try this). 4shared does not appear to be related to 4chan.

There appears to be a sitemap at https://www.4shared.com/web/sitemap.xml, which should be useful for discovering files when archiving. However, the sitemap is incomplete: as of 2021-10-31 it lists about 62.4 million files, whereas according to https://blog.4shared.com/infographic-4shared-2020-review/ [IA • Wcite • .today • MemWeb], 193 million files had been uploaded as of December 2020. The statistics box on that blog page also claims a total hosted size of 940 TB (as of 2021-10-31), but the same figure was already shown there in 2010, so it is probably out of date.
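A minimal Python sketch for pulling URLs out of the sitemap follows. It assumes the file uses the standard sitemaps.org protocol and may be either a sitemap index (pointing at child sitemaps, possibly gzip-compressed) or a plain URL set; neither has been verified against 4shared's actual sitemap, and the function names are hypothetical.

```python
import gzip
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace of the sitemaps.org protocol (assumed, not verified, for 4shared).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def maybe_decompress(data):
    """Child sitemaps are often served as .xml.gz; gzip data starts with 0x1f 0x8b."""
    if data[:2] == b"\x1f\x8b":
        return gzip.decompress(data)
    return data


def parse_sitemap(data):
    """Return ("index", child_sitemap_urls) or ("urlset", page_urls)."""
    root = ET.fromstring(maybe_decompress(data))
    # root.tag looks like "{http://www.sitemaps.org/schemas/sitemap/0.9}urlset"
    kind = root.tag.rsplit("}", 1)[-1]
    if kind == "sitemapindex":
        path, label = "sm:sitemap/sm:loc", "index"
    elif kind == "urlset":
        path, label = "sm:url/sm:loc", "urlset"
    else:
        raise ValueError(f"unexpected root element: {root.tag}")
    locs = [el.text.strip() for el in root.findall(path, SITEMAP_NS) if el.text]
    return label, locs


def fetch(url):
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def walk_sitemap(url):
    """Yield page URLs, descending into child sitemaps of a sitemap index."""
    kind, locs = parse_sitemap(fetch(url))
    if kind == "index":
        for child in locs:
            yield from walk_sitemap(child)
    else:
        yield from locs
```

Iterating over `walk_sitemap("https://www.4shared.com/web/sitemap.xml")` would then yield page URLs lazily, which matters at the scale of tens of millions of entries.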

While logged in, TheTechRobo found the following:

  • The download link is stored in the DOM as an input element with the ID "baseDownloadLink".
  • baseDownloadLink seems to be used in d2Script.js, on line 96.
  • Requesting the download link still seems to require the session cookies, since nothing else about the request looks special, though something may have been missed.
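These observations suggest a download flow along the following lines, sketched in Python. It is untested against 4shared and rests on assumptions: that the URL sits in the `value` attribute of the `baseDownloadLink` input, that the session cookies can be copied from a logged-in browser as a raw `Cookie` header, and that fetching the link with those cookies is enough. Whether the 60-second wait is also enforced server-side is unknown. The helper names are hypothetical.

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class BaseDownloadLinkParser(HTMLParser):
    """Record the value of the first <input id="baseDownloadLink"> element."""

    def __init__(self):
        super().__init__()
        self.link = None

    def handle_starttag(self, tag, attrs):
        # A self-closing <input ... /> also ends up here via handle_startendtag.
        if tag != "input" or self.link is not None:
            return
        attributes = dict(attrs)
        if attributes.get("id") == "baseDownloadLink":
            # Assumption: the URL is kept in the value attribute.
            self.link = attributes.get("value")


def extract_download_link(html):
    parser = BaseDownloadLinkParser()
    parser.feed(html)
    parser.close()
    return parser.link


def build_request(url, cookie_header):
    # cookie_header: raw "name=value; name2=value2" string copied from a
    # logged-in browser session (hypothetical; the required cookie names are unknown).
    return urllib.request.Request(url, headers={"Cookie": cookie_header})


def download_file(file_page_url, cookie_header):
    """Fetch a file page, read baseDownloadLink, then fetch that link with the same cookies."""
    with urllib.request.urlopen(build_request(file_page_url, cookie_header), timeout=60) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    link = extract_download_link(html)
    if not link:
        raise RuntimeError("baseDownloadLink not found; possibly not logged in")
    # Resolve the link in case it is relative to the file page.
    link = urllib.parse.urljoin(file_page_url, link)
    with urllib.request.urlopen(build_request(link, cookie_header), timeout=300) as resp:
        return resp.read()
```

If the assumptions hold, passing a file page URL and the browser's cookie string to `download_file` would return the file's bytes; if the page comes back without the input element, the session cookies are the first thing to check.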