Difference between revisions of "User:Bzc6p"

From Archiveteam
(→‎My toolbox: Update)
(→‎Restored websites: + oszdmeg.com)
 
(11 intermediate revisions by the same user not shown)
Line 1: Line 1:
<DIV style="background-color:yellow; padding: 5px; width:80%; margin-bottom:20px; margin-left:auto; margin-right:auto; text-align:center"><P style="font-size:x-large">'''<span class="plainlinks">[https://archiveteam.hu archiveteam.hu]</span> has launched!'''</P><P>Information in Hungarian about ArchiveTeam's activities and the fate of Hungarian websites!</P><P style="font-size:x-large">'''The <span class="plainlinks">[https://archiveteam.hu/rszi RSZI]</span> Hungarian web archive has launched!'''</P><P>Nearly '''2.5 million images from 3 image hosting services that are not available even from the Wayback Machine''' are accessible again!</P></DIV>
{{Hungarian websites}}
'''bzc6p''' is a [http://en.wikipedia.org/wiki/Hungary Hungarian] amateur archivist who joined the efforts of ArchiveTeam. "Specialized" in watching and saving [[Template:Hungarian websites|Hungarian websites]].
'''bzc6p''' is a [http://en.wikipedia.org/wiki/Hungary Hungarian] amateur archivist who joined the efforts of ArchiveTeam. "Specialized" in watching and saving [[Template:Hungarian websites|Hungarian websites]].


Line 9: Line 5:
</div>
</div>


I haven't been doing anything spectacular recently, but I'm still running my long-running projects in my, now much scarcer, free time. You can still reach me on my [[User_talk:bzc6p|talk page]] or via email if necessary.
I also check this wiki once a week, so you can contact me on my [[User_talk:bzc6p|talk page]] as well.


See [https://archive.org/details/@bzc6p what I'm archiving].
See [https://archive.org/details/@bzc6p what I'm archiving].


== My projects ==
== My projects ==
Line 18: Line 13:


=== Large websites ===
=== Large websites ===
The archive of each is in the terabyte range.
* [[Indafotó]]
* [[Indafotó]]
* [[eOldal]]
* [[eOldal]]
* [[network.hu]]
* [[network.hu]]
* [[TVN.hu]]
* [[TVN.hu]]
* [[pics.coldline.hu]]
* [[kepfeltoltes.hu]]
* [[kepfeltoltes.hu]]
* [[kephost.com]]
* [[kephost.com]]
* Selected [[YouTube]] videos, as part of [[News+C/hu|News+C]] project
* [[myVIP]]
* [[myVIP]]


=== Medium-sized websites ===
The archive of each ranges from a few gigabytes to a few hundred gigabytes.


=== Medium-sized websites ===
* [[cafeblog.hu]]
* [[Volán#Centralization,_round_3|mavcsoport.hu]]
* [[Volán#Centralization,_round_3|volanbusz.hu]]
* [https://vadhajtasok.hu vadhajtasok.hu], as part of [[News+C/hu]] ({{blue|continuous}})
* [https://vadhajtasok.hu vadhajtasok.hu], as part of [[News+C/hu]] ({{blue|continuous}})
* [[kephost.net]] ({{blue|continuous}})
* [[kephost.net]] ({{blue|continuous}})
Line 38: Line 41:
* [https://hvg.hu hvg.hu], as part of [[News+C/hu]] ({{blue|continuous}})
* [https://hvg.hu hvg.hu], as part of [[News+C/hu]] ({{blue|continuous}})
* [https://kuruc.info kuruc.info], as part of [[News+C/hu]] ({{blue|continuous}})
* [https://kuruc.info kuruc.info], as part of [[News+C/hu]] ({{blue|continuous}})
* [[pics.coldline.hu]]
* [http://nol.hu nol.hu] ([https://archive.org/details/nol_hu_2017 archive])
* [http://nol.hu nol.hu] ([https://archive.org/details/nol_hu_2017 archive])
* [[Wikispot]]
* [[Wikispot]]
* [[PSharing]]
* [[PSharing]]
* [[TVN.hu#tudjatok.hu|tudjatok.hu]]
* [[nolblog.hu]]
* [[legalja.hu]]
* [[netszar.com]]
* [[Volán]] websites
* [[keptarad.hu]]
* [[keptarad.hu]]
* [[kepkezelo.com]]
* [[kepkezelo.com]]
* [[noob.hu]]
* [[noob.hu]]
* [[GTF Képhost]]
* [[GTF Képhost]]
=== Small websites ===
* [http://dagalyfurdo.hu Dagály Fürdő] ([http://archive.org/details/dagalyfurdo_hu_20161010 archive])
* [http://wirtschaftsblatt.at WirtschaftsBlatt] ([http://archive.org/details/wirtschaftsblatt_at_articles archive])
* [http://balassiintezet.hu Balassi Intézet] ([http://archive.org/details/balassiintezet_hu_20160829 archive])
* [http://hi.co hi.co] ([http://archive.org/details/hi_co_20160829 archive])
* [http://ketezer.hu 2000] ([http://archive.org/details/ketezer_hu_20160825 archive])
* [http://mozaiktv.rs Mozaik TV] ([http://archive.org/details/mozaiktv_rs_20160827 archive])
* [http://precedensnyelvstudio.hu Precedens Nyelvstúdió] ([http://archive.org/details/precedensnyelvstudio_hu_20160827 archive])
* [http://alomauto.eu Álomautó Múzeum] ([http://archive.org/details/alomauto_eu_20160827 archive])
* [[nolblog.hu]]
* [[TVN.hu#tudjatok.hu|tudjatok.hu]]
* [http://kszz.profitarhely.hu Kecskeméti Szimfonikus Zenekar] ([https://archive.org/details/kszz_profitarhely_hu_20160128 archive])
* [http://melecafe.com Mele Café] ([https://archive.org/details/melecafe_com_20160128 archive])
* [http://cafealibi.hu Café Alibi] ([https://archive.org/details/cafealibi_hu_20160126 archive])
* [http://efmk.hu Kecskeméti Kulturális és Konferencia Központ] ([https://archive.org/details/efmk_hu_20160107 archive])
* [http://szeplakierzsebet.hu Széplaki Erzsébet] ([https://archive.org/details/szeplakierzsebet_hu_20160111 archive])
* [http://freddyfitness.hu Freddy Fitness] ([https://archive.org/details/freddyfitness_hu_20160106 archive])
* [http://legalja.hu legalja.hu] ([https://archive.org/details/legalja_hu_20151227 archive])
* Astra Insurance: [http://www.astrasig.ro Romania] ([http://archive.org/details/astrasig_ro_20151128 archive]), [http://www.astrabiztosito.hu Hungary] ([http://archive.org/details/astrabiztosito_hu_20151126 archive])
* [http://kajaszoszentpeter.hu Kajászószentpéter] (archives: [https://archive.org/details/kajaszoszentpeter_hu_2015_06 website], [https://archive.org/details/kajaszoszentpeter_photos photos], [https://archive.org/details/kajaszoszentpeter_videos videos])
* [http://netszar.com netszar.com] ([https://archive.org/details/netszar_com_2015_06 archive])
* Hungarian [[Volán]] websites
* [[Demotivalo.net]]
* [[Demotivalo.net]]
* [http://wikiapiary.com/wiki/Special:Contributions/bzc6p A few wikis] in the beginning.


=== Non-web stuff ===
=== Non-web stuff ===
I'm also archiving some Hungarian TV and radio programmes, magazines and [https://twitter.com/textfiles/status/1317124202688925701 shop flyers].
I'm also archiving some Hungarian TV and radio programs, magazines and [https://twitter.com/textfiles/status/1317124202688925701 shop flyers].


== Archiving schedule ==
== Archiving schedule ==
Line 88: Line 70:
* [https://vadhajtasok.hu vadhajtasok.hu] as part of [[News+C/hu|News+C]]
* [https://vadhajtasok.hu vadhajtasok.hu] as part of [[News+C/hu|News+C]]


=== Already started ===
=== 2025 ===
 
* [[Indafotó]]


=== 2025 ===
# [[blogger.hu]]
# [[blogger.hu]]
# [[cafeblog.hu]]
# [[cafeblog.hu]]
Line 108: Line 87:


However, they might be difficult to archive, too much to archive, not be of high historical importance, run by stable operators (rare!), or a combination of these, which keeps them out of focus.
However, they might be difficult to archive, too much to archive, not be of high historical importance, run by stable operators (rare!), or a combination of these, which keeps them out of focus.
== Restored websites ==
I'm hunting for Hungarian domain names that are currently parked but whose original websites have been completely archived. The goal is to restore content at its original location, thus reviving lots of dead links, as well as providing a near-perfect browsing experience (Wayback Machine is sometimes unable to correctly reproduce links to other pages and page requisites).
Websites successfully restored so far:
* [http://bakonyvolan.hu bakonyvolan.hu]
* [http://balatonvolan.hu balatonvolan.hu]
* [http://idokapu.com idokapu.com]
* [http://klonok.com klonok.com]
* [http://kommenthuszar.com kommenthuszar.com]
* [http://oszdmeg.com oszdmeg.com]
* [http://somlovolan.hu somlovolan.hu]
These archives are entirely self-hosted; they don't rely on the Internet Archive. Due to the way they are served, there may be some lag on occasion, but they're still usable.
== archiveteam.hu ==
On 2021-01-01, I started a Hungarian website for ArchiveTeam, [https://archiveteam.hu archiveteam.hu], with the most important information about ArchiveTeam in general, and archiving efforts of Hungarian websites, for Hungarian readers. (The design of the website is intentionally minimalistic. What I hate about the web these days is that it's full of bloat!) It has also been hosting some interactive services, see below.
I have various plans to make this website better, but at the moment my focus is on saving content, and uploading already saved content to the Internet Archive. The archiveteam.org wiki and its pages continue to be the comprehensive and most up-to-date source of information about Hungarian websites.
=== RSZI ===
[https://archiveteam.hu/rszi A Wayback Machine-like search tool] for retrieving certain archived files – currently, images saved from image hosting websites. Launched at the same time as archiveteam.hu itself. The motivation was that the WARCs I have archived recently (since ~2016) have not been ingested into the Wayback Machine, so after the websites went down, there was no easy way to access a given file by URL. Currently, the RSZI service provides access to approx. 2.5 million images from three image hosting websites, relying on WARC files hosted by the Internet Archive.
=== Lecsű ===
A short-lived (August 2021 – January 2022) on-demand semi-automated YouTube video archiving service. Internet Archive didn't like me uploading thousands of random YouTube videos, so the service got discontinued. Fortunately, now there's [[YouTube#Archive Team project|ArchiveTeam's own service]] that can be used for this purpose.


== Philosophy ==
== Philosophy ==
My experience with my few website archiving endeavours so far suggests that there are very few websites today that can be mirrored completely in automated ways without human control and intervention. Thus, if one wants to make a quality archive even of a small website, it needs some attention, often additional work, or several supplemental runs of archiving tools.
My experience with my few website archiving endeavours so far suggests that there are very few websites today that can be mirrored completely in automated ways without human control and intervention. Thus, if one wants to make a quality archive even of a small website, it needs some attention, often additional work, or several supplemental runs of archiving tools.


These archiving tools ([https://www.gnu.org/software/wget/manual/wget.html wget], [http://github.com/chfoo/wpull wpull], [[ArchiveBot]] etc.) are very important and useful, but in most cases they are incapable of making complete archives on their own. My philosophy is that once we set off on the journey of archiving a website, we should make the archive as complete and high-quality as possible, so we cannot rely solely on these tools. Of course, constrained by time and resources, we must make compromises. Otherwise, however, the above applies. At least for me. This is how I archive.
These archiving tools ([https://www.gnu.org/software/wget/manual/wget.html wget], [http://github.com/chfoo/wpull wpull], [[ArchiveBot]] etc.) are very important and useful, but in most cases they are incapable of making complete archives on their own. My philosophy is that once we set off on the journey of archiving a website, we should make the archive as complete and high-quality as possible, so we cannot rely solely on these tools. Of course, constrained by time and resources, we must make compromises. ''Something'' is better than ''nothing''. Otherwise, however, the above applies. At least for me. This is how I archive.


== My toolbox ==
== My toolbox ==
Line 133: Line 141:


== Further plans ==
== Further plans ==
I hope one day I can re-host Hungarian websites that are dead now but have been archived. Or, at least, create a Wayback Machine for Hungarian websites, that would also serve as a mirror to the corresponding Internet Archive items.


As for the [[URL Team]] project, given that the discovered URLs have not been saved in WARC format (yet) but in a format difficult to access and read, a shorturl-resolver service for already gone URL shorteners would be useful. It would be kind of a Wayback Machine for URL shorteners. It wouldn't even be difficult to set up, based on URL Team databases.
As for the [[URL Team]] project, given that the discovered URLs have not been saved in WARC format but in a format difficult to access and read, a shorturl-resolver service for already gone URL shorteners would be useful. It would be kind of a Wayback Machine for URL shorteners. It wouldn't even be difficult to set up, based on URL Team databases.


I would also be glad to record the programmes of Hungarian radio and television channels 24/7, but that would require a vast amount of resources. Until then (or instead), I'm collecting some recordings of notable Hungarian TV and radio programmes and moments from [[YouTube]] (and of course, I'm uploading them to the Archive).
As for Hungarian ones, until the corresponding domain names get caught, this could be a new feature on [https://archiveteam.hu archiveteam.hu]. But this is a future project.


== Hungarian articles about Archive Team ==
== Hungarian articles about Archive Team ==
Line 143: Line 150:


*I've proudly discovered that Archive Team got its own [https://webarchivum.oszk.hu/mediawiki/index.php?title=Archive_Team article] (among ''Organizations'') on the [https://webarchivum.oszk.hu/mediawiki/index.php?title=MIA_WIKI knowledge base] of the Hungarian Internet Archive, that is, the Web Archiving Department of National Széchényi Library, the national library of Hungary! (Date: 2017-07-25).
*I've proudly discovered that Archive Team got its own [https://webarchivum.oszk.hu/mediawiki/index.php?title=Archive_Team article] (among ''Organizations'') on the [https://webarchivum.oszk.hu/mediawiki/index.php?title=MIA_WIKI knowledge base] of the Hungarian Internet Archive, that is, the Web Archiving Department of National Széchényi Library, the national library of Hungary! (Date: 2017-07-25).
** In 2021, they approached me and we started conversations that looked promising (e.g. them keeping a copy of everything I archived), but after a change in their contact person, I got ignored.
*Péter Szűcs: ''[http://itcafe.hu/hir/az_internet_nem_felejt.html Az internet nem felejt]'' (''The internet doesn't forget''). itcafe.hu, 2015-03-05. (About ArchiveTeam's activity in general.)
*Péter Szűcs: ''[http://itcafe.hu/hir/az_internet_nem_felejt.html Az internet nem felejt]'' (''The internet doesn't forget''). itcafe.hu, 2015-03-05. (About ArchiveTeam's activity in general.)
*Dániel Dojcsák: ''[http://www.hwsw.hu/hirek/51400/blip-videomegoszto-online-tartalom-premium-torles-arhivum.html Elpusztulhat a nem profitképes online tartalom]'' (''Non-profitable online content may vanish''). hwsw.hu, 2013-12-03. (Mentions ArchiveTeam saving [[Blip.tv|Blip]] videos.)
*Dániel Dojcsák: ''[http://www.hwsw.hu/hirek/51400/blip-videomegoszto-online-tartalom-premium-torles-arhivum.html Elpusztulhat a nem profitképes online tartalom]'' (''Non-profitable online content may vanish''). hwsw.hu, 2013-12-03. (Mentions ArchiveTeam saving [[Blip.tv|Blip]] videos.)

Latest revision as of 09:53, 20 August 2025

bzc6p is a Hungarian amateur archivist who joined the efforts of ArchiveTeam. "Specialized" in watching and saving Hungarian websites.

Contact: vichratimot (at) archiveteam (dot) hu

I also check this wiki once a week, so you can contact me on my talk page as well.

See what I'm archiving.

My projects

Websites that I've archived, that I'm archiving, or whose archival I've helped organize, in reverse chronological order within each category. If the website has an entry on this wiki, consult that page for the archives. If not, a link to the archives should be found in the appropriate line.

Large websites

The archive of each is in the terabyte range.

Medium-sized websites

The archive of each ranges from a few gigabytes to a few hundred gigabytes.

Non-web stuff

I'm also archiving some Hungarian TV and radio programs, magazines and shop flyers.

Archiving schedule

This is a list of my ongoing and planned future projects. They are usually preemptive efforts targeting websites that are fine at the moment but seem to be approaching the end (abandonment, read-only state, operational issues, change in operator etc.), or websites that are easy to archive with an incremental approach.

Continuous

2025

  1. blogger.hu
  2. cafeblog.hu
  3. fotozz.hu

2026

  1. G-Portál
  2. Ultraweb.hu
  3. gyertyalang.hu

As needed (keeping an eye on them)

Nothing is safe! We have seen multi-terabyte websites go down immediately or with only a few months' notice!

However, they might be difficult to archive, too much to archive, not be of high historical importance, run by stable operators (rare!), or a combination of these, which keeps them out of focus.

Restored websites

I'm hunting for Hungarian domain names that are currently parked but whose original websites have been completely archived. The goal is to restore content at its original location, thus reviving lots of dead links, as well as providing a near-perfect browsing experience (Wayback Machine is sometimes unable to correctly reproduce links to other pages and page requisites).

Websites successfully restored so far:

These archives are entirely self-hosted; they don't rely on the Internet Archive. Due to the way they are served, there may be some lag on occasion, but they're still usable.

archiveteam.hu

On 2021-01-01, I started a Hungarian website for ArchiveTeam, archiveteam.hu, with the most important information about ArchiveTeam in general, and archiving efforts of Hungarian websites, for Hungarian readers. (The design of the website is intentionally minimalistic. What I hate about the web these days is that it's full of bloat!) It has also been hosting some interactive services, see below.

I have various plans to make this website better, but at the moment my focus is on saving content, and uploading already saved content to the Internet Archive. The archiveteam.org wiki and its pages continue to be the comprehensive and most up-to-date source of information about Hungarian websites.

RSZI

A Wayback Machine-like search tool for retrieving certain archived files – currently, images saved from image hosting websites. Launched at the same time as archiveteam.hu itself. The motivation was that the WARCs I have archived recently (since ~2016) have not been ingested into the Wayback Machine, so after the websites went down, there was no easy way to access a given file by URL. Currently, the RSZI service provides access to approx. 2.5 million images from three image hosting websites, relying on WARC files hosted by the Internet Archive.
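
How RSZI works internally is not described on this page. Purely as an illustration of the general idea (look up a file by its original URL, then fetch only that record from a remotely hosted WARC), here is a minimal Python sketch. It assumes a CDX-style index that maps each URL to a WARC file, a byte offset and a record length, and it assumes record-at-a-time compressed .warc.gz files so that a single record can be requested with an HTTP Range header. All names, URLs and numbers in it are hypothetical.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class WarcLocation(NamedTuple):
    warc_url: str   # where the WARC file is hosted (hypothetical URL)
    offset: int     # byte offset of the record inside the WARC file
    length: int     # length of the stored record in bytes


def normalize(url: str) -> str:
    """Reduce a URL to a lookup key: no scheme, lowercase host, no leading 'www.'."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + (parts.path or "/")
    if parts.query:
        key += "?" + parts.query
    return key


def range_header(loc: WarcLocation) -> dict:
    """HTTP header requesting only this record's bytes (Range ends are inclusive)."""
    return {"Range": f"bytes={loc.offset}-{loc.offset + loc.length - 1}"}


def lookup(index: dict, url: str) -> Optional[WarcLocation]:
    """Return the location of the archived copy of `url`, or None if unknown."""
    return index.get(normalize(url))


# Hypothetical index entry; real entries would come from a CDX-style listing.
INDEX = {
    normalize("http://example.hu/images/cat.jpg"): WarcLocation(
        "https://archive.example/items/example_hu/example_hu.warc.gz", 1024, 2048
    )
}
```

With such an index, a request for https://www.EXAMPLE.hu/images/cat.jpg resolves to the same entry as the originally archived URL, and the returned header can be passed to any HTTP client to download just that one record.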

Lecsű

A short-lived (August 2021 – January 2022) on-demand semi-automated YouTube video archiving service. Internet Archive didn't like me uploading thousands of random YouTube videos, so the service got discontinued. Fortunately, now there's ArchiveTeam's own service that can be used for this purpose.

Philosophy

My experience with my few website archiving endeavours so far suggests that there are very few websites today that can be mirrored completely in automated ways without human control and intervention. Thus, if one wants to make a quality archive even of a small website, it needs some attention, often additional work, or several supplemental runs of archiving tools.

These archiving tools (wget, wpull, ArchiveBot etc.) are very important and useful, but in most cases they are incapable of making complete archives on their own. My philosophy is that once we set off on the journey of archiving a website, we should make the archive as complete and high-quality as possible, so we cannot rely solely on these tools. Of course, constrained by time and resources, we must make compromises. Something is better than nothing. Otherwise, however, the above applies. At least for me. This is how I archive.

My toolbox

Archiving websites

  • Chfoo's wpull: No longer maintained, but it's still my favorite tool for archiving websites
    • I'm running Debian 8 (EOL 2020) in VirtualBox in 2025 just for wpull to work... 😅
  • wget: Old but gold, now also with WARC support. Very fast, but it lacks some handy features wpull has, and vice versa.
    • Notably, it can also save POST requests to WARC, which wpull can't
    • Otherwise I use it for website discovery in my archiving scripts. I do the actual WARCing with wpull.
  • Internet Archive's warcprox: provides a proxy to your web browser, so you can easily create WARCs as you browse. Very useful for the News+C project combined with automating a web browser.
  • Bash scripts for website discovery, as well as for collecting URLs in archiving scripts. Simple and fast.
  • Python scripts for more sophisticated tasks (rare).
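
As an illustration of the URL-collecting step mentioned in the list above, here is a minimal Python sketch that extracts same-host links from one downloaded page. It is not one of my actual scripts (those are mostly Bash); the class name, function name and example URLs are made up for the example, and only the standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes from a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def discover(html: str, base_url: str) -> list:
    """Return sorted, de-duplicated absolute URLs on the same host as base_url."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlsplit(base_url).hostname
    found = set()
    for link in collector.links:
        # Resolve relative links and drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(base_url, link))
        parts = urlsplit(absolute)
        if parts.scheme in ("http", "https") and parts.hostname == host:
            found.add(absolute)
    return sorted(found)
```

The resulting list can be written to a text file, one URL per line, and handed to the tool that does the actual WARC capture.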

Replaying WARCs

  • ReplayWebPage: Very convenient, and is similar to how the Wayback Machine works.

Uploading to IA

Further plans

As for the URL Team project, given that the discovered URLs have not been saved in WARC format but in a format difficult to access and read, a shorturl-resolver service for already gone URL shorteners would be useful. It would be kind of a Wayback Machine for URL shorteners. It wouldn't even be difficult to set up, based on URL Team databases.

As for Hungarian ones, until the corresponding domain names get caught, this could be a new feature on archiveteam.hu. But this is a future project.
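
To show how little such a resolver would need, here is a minimal Python sketch. It assumes, purely for illustration, that a shortener's dump is a text file with one "shortcode|long URL" pair per line; the actual layout of the URL Team databases should be checked before relying on this. The function names and example URLs are hypothetical.

```python
from typing import Dict, Iterable, Optional


def load_mappings(lines: Iterable[str], sep: str = "|") -> Dict[str, str]:
    """Parse 'shortcode<sep>long URL' lines into a dict; skip blank or malformed lines."""
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        # Split on the first separator only, so targets containing it stay intact.
        code, found, target = line.partition(sep)
        if found and code and target:
            table[code] = target
    return table


def resolve(table: Dict[str, str], short_url: str) -> Optional[str]:
    """Look up a short URL by its code (the last path segment)."""
    code = short_url.rstrip("/").rsplit("/", 1)[-1]
    return table.get(code)
```

A web front end would only have to call resolve() for the requested path and answer with a redirect to the stored target, or with a "not found" page when the code is unknown.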

Hungarian articles about Archive Team

Below I've collected the online Hungarian news articles about Archive Team that I've been able to find. The list is in reverse chronological order.