From Archiveteam
Revision as of 21:32, 3 January 2016 by Bzc6p (talk | contribs) (→‎What I've done: note about

What I'm trying to look smart with

The era when a single wget -r -p command could mirror an entire website is far behind us. Nowadays every website has its own soul and its own hideous JavaScript-linked content, not to mention the various file formats and ways of embedding content. Thus, anyone serious about web archiving has to explore each website carefully, often painstakingly, and in far too few cases can this be automated.

What I probably shouldn't have archived

Requests for removal of content uploaded by me should be sent to

Who I am

Hungarian amateur who joined the efforts of ArchiveTeam. "Specialized" in watching and saving Hungarian websites.

What I've done

This user is on a tin can connected to a windmill,[1] likes simplicity, likes archiving websites, and therefore hates JavaScript being used just to display parts of a website.

Also saved a few wikis in the beginning.

I sometimes also take part in Warrior projects, in those rare cases when the tracker limit is not already saturated.

My to-do list (Hungarian deathwatch)

As soon as I have enough time, I should archive the following sites, as they are at some risk of disappearing. (Sites I've already started working on are not listed.) In order of urgency:

  1. – currently crashed, keeping an eye on it
  2. – site seems to be okay, but the company behind it has been deep in the red for years
  3. – the company that bought it is performing very badly
  4. – owner seems okay, but the site is not cared for and is full of spam
  5. Ingyenweb – a very old, obsolete free web hosting site that its maintainer no longer cares for. It is also sitting on some valuable domain names
  6. – a video sharing site that is no longer very popular and not really cared for; same owner as Ingyenweb

What I recommend

  • Chfoo's Wpull for saving to WARC
  • Ikreymer's when things are too difficult
    • wget still lacks some handy features that wpull already has
    • No, I don't prefer ArchiveBot: most websites can't be saved automatically, and one can't really fine-tune a specific ArchiveBot job. It can be useful and powerful, but it's still rather rough and under development. Use with caution.
  • Alard's warc-proxy or Ikreymer's webarchiveplayer for testing WARCs
  • Kngenie's ias3upload for uploading to IA
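
As a rough illustration of the first recommendation, here is a minimal Python sketch that assembles a wpull command line for a recursive crawl written to a WARC file. The helper name build_wpull_command, the example URL and the WARC name are all made up for illustration, and the options shown (--recursive, --page-requisites, --warc-file) are just the wget-compatible basics; a real job on a real site usually needs more fine-tuning than this.

```python
import shlex


def build_wpull_command(start_url, warc_name):
    """Return an argument list for a basic recursive wpull crawl.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    return [
        "wpull",
        start_url,
        "--recursive",            # follow links within the site
        "--page-requisites",      # also fetch images, CSS and scripts
        "--warc-file", warc_name, # write everything fetched to a WARC
    ]


if __name__ == "__main__":
    # The URL and the WARC name below are placeholders.
    cmd = build_wpull_command("http://example.hu/", "example-hu")
    print(shlex.join(cmd))
```

Building the command as a list keeps it easy to append site-specific options before handing it to subprocess.run or pasting the printed line into a shell.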

My dreams

Short term (may be realized one day)

I'd like to create a small Hungarian Internet Archive, collecting and presenting archives of websites saved by me or others, just like the Internet Archive and the Wayback Machine do. (Don't think of large scale, just starting with a few second-hand 1 TB hard disks connected to a home server.) Also, I'd restore dead websites at their original location (domain) where possible, so that rotten links could come back to life.

This could also act as a mirror of some websites uploaded to IA.

Long term (unlikely ever to be realized)

Recording the complete programming (0–24) of the main Hungarian radio and television channels, and giving the public restricted access to these archives. (There is NAVA, but that doesn't record everything, and is often a bit difficult to access.)

What I've found in Hungarian about ArchiveTeam