Difference between revisions of "Gna!"

From Archiveteam
(clearer status, requests for help + a few more details from my own investigations)
(moved rsync table to (renamed) subpage)
Data stored:
* '''Code hosting''' using CVS, Subversion, and Arch  ('''<font color=green>done</font>''' 2017-02-25, not updated since; see [[Gna!/code_and_downloads|subpage]])
** '''<font color=red>Third parties</font>''' can grab all code with full history:
*** All subversion repos available via anonymous rsync: rsync://svn.gna.org/svn/ (ref: bottom of every project's svn page e.g. [https://gna.org/svn/?group=admin]). (In [http://svnbook.red-bean.com/en/1.7/svn.reposadmin.planning.html#svn.reposadmin.basics.backends.fsfs FSFS] format, which is supposed to be portable.)
*** '''<font color=red>Third parties</font>''' can scrape attachments by relying on their increasing integer IDs, e.g. [https://gna.org/bugs/download.php?file_id=29845 file #29845]. The 'bugs' part of the URL apparently doesn't have to match the tracker the file belongs to, so it's possible to scrape all public files by varying the ID. ('''<font color=green>done</font>'''/ongoing by [[User:JTN|JTN]], not uploaded anywhere yet)
** Individual tickets can be private. (Maybe files too?) But the XML export includes private tickets (yes, to an unauthenticated URL).
* '''File hosting''' at http://download.gna.org/  ('''<font color=green>done</font>''' 2017-02-25, not updated since; see [[Gna!/code_and_downloads|subpage]])
** '''<font color=red>Third parties</font>''' can do anonymous rsync from rsync://download.gna.org/download/
* '''Project websites''' on home.gna.org ('''<font color=green>done</font>''' 2017-02-25, not updated since; see [[Gna!/code_and_downloads|subpage]])
** e.g. http://home.gna.org/freeciv/
** These are managed via Subversion [https://gna.org/cookbook/?func=detailitem&item_id=107], so grabbing svn by rsync as above should also save website data + history
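The attachment scrape described above could look roughly like this in Python. This is a minimal sketch, not JTN's actual tool. It assumes the URL pattern from the example link, and it assumes that private or missing IDs come back as HTTP errors; if Savane instead serves an error page with status 200, this sketch would save that page as if it were a file. The output directory, the `.bin` naming and the one-second delay are arbitrary, hypothetical choices.

```python
import os
import time
import urllib.request
from urllib.error import HTTPError, URLError

# URL pattern from the example link; the page notes that the 'bugs'
# segment apparently works for attachments from any tracker.
URL_TEMPLATE = "https://gna.org/bugs/download.php?file_id={}"


def attachment_urls(first_id, last_id):
    """Yield (file_id, url) pairs for every ID in the inclusive range."""
    for file_id in range(first_id, last_id + 1):
        yield file_id, URL_TEMPLATE.format(file_id)


def fetch_range(first_id, last_id, out_dir="gna_attachments", delay=1.0):
    """Download each attachment in the range into out_dir, one file per ID."""
    os.makedirs(out_dir, exist_ok=True)
    for file_id, url in attachment_urls(first_id, last_id):
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                data = resp.read()
        except (HTTPError, URLError):
            # Assumption: private or missing IDs fail with an HTTP error.
            data = None
        if data is not None:
            # Hypothetical naming scheme; the original filename is not recovered here.
            path = os.path.join(out_dir, f"{file_id}.bin")
            with open(path, "wb") as fh:
                fh.write(data)
        time.sleep(delay)  # keep the load on the server low
```

For example, `fetch_range(29845, 29900)` would try 56 consecutive IDs.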
* A news item<ref>https://gna.org/forum/forum.php?forum_id=2545</ref> about shutdown was posted to the front page 2017-01-31 linking to the above. A reply to that on 4 Feb suggests shutdown will happen "within 3 months, or when the hardware dies".
* This suggests shutdown by around the beginning of May 2017.
== rsync grab sign-up ==
:''All done!''
This covers code and file hosting but nothing else: under 180 GiB in total.
Please choose --bwlimit wisely (5M?)
{| border=1 cellspacing=1
! What
! Size
! No. of files
! Who/when
|-
| rsync://svn.gna.org/svn/
| ~41 Gibyte
| ~1M
| PurpleSym 2017-02-25 (via svnrdump; 18G lzip'd)<br>mkram 2017-02-26 (via rsync)
|-
| rsync://svn.gna.org/cvs/
| ~7.5 Gibyte
| ~200k
| mkram 2017-02-25
|-
| rsync://download.gna.org/arch/
| ~318 Mibyte
| ~71k
| mkram 2017-02-25 (except admindir)
|-
| rsync://download.gna.org/download/
| ~116 Gibyte
| ~130k
| mkram 2017-02-25
|-
| rsync://download.gna.org/www/
| ~6.4 Gibyte
| ~177k
| mkram 2017-02-25 (except "some authentication folder and .bashhistory")
|}
For mkram's rsync grab, breakdown by project and upload schedule at [[Gna!/projects]].
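A minimal sketch of driving this grab from Python, assuming a local `rsync` binary. The module list and approximate sizes are copied from the table. The destination layout, `--partial` (so an interrupted run can resume) and the `5M` default for `--bwlimit` are illustrative choices. Recent rsync releases accept size suffixes such as `5M`; older ones may need a plain KiB/s figure such as `5000`. The excludes mkram used (admindir, the authentication folder, .bashhistory) are not reproduced. With `dry_run=True` it only prints the commands. Summing the table's sizes gives about 171 GiB, consistent with the 180 GiB upper bound quoted above.

```python
import subprocess

# rsync modules and approximate sizes (GiB), copied from the table above.
MODULES = {
    "rsync://svn.gna.org/svn/": 41.0,
    "rsync://svn.gna.org/cvs/": 7.5,
    "rsync://download.gna.org/arch/": 318 / 1024,  # ~318 MiB
    "rsync://download.gna.org/download/": 116.0,
    "rsync://download.gna.org/www/": 6.4,
}


def rsync_command(source, dest, bwlimit="5M"):
    """Build an rsync command line: -a preserves times, permissions and
    symlinks; --partial keeps partly transferred files so a rerun can resume."""
    return ["rsync", "-a", "--partial", f"--bwlimit={bwlimit}", source, dest]


def mirror_all(dest_root, bwlimit="5M", dry_run=True):
    """Mirror every module into dest_root/<module name>/."""
    for source in MODULES:
        name = source.rstrip("/").rsplit("/", 1)[-1]  # e.g. "svn", "cvs"
        cmd = rsync_command(source, f"{dest_root}/{name}/", bwlimit)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


total_gib = sum(MODULES.values())
print(f"approx. {total_gib:.1f} GiB in total")
```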


== References ==


<references />

Revision as of 17:51, 17 April 2017

Gna!
[Screenshot of gna.org, 2017-02-25]
URL: https://gna.org/
Status: Closing
Archiving status: In progress...
Archiving type: Unknown
IRC channel: #gnarm (on hackint)

Gna! is a centralized location where software developers can develop, distribute and maintain free (GPL-compatible) software. It is an instance of the Savane code-hosting platform[1].

It is shutting down due to lack of admin effort; probably by end of April / early May 2017.

Hosted data

As of 2017-02 it claimed to have 1458 hosted projects. (Many are probably abandoned and will not be saved by their project admins before shutdown.)

Here's a breakdown of the kinds of data stored and what various people can do to grab the data:

  • Third party describes what random anonymous Internet people (e.g., Archive Team) can do
    • Done shows bits that we have already rescued
    • Help shows bits that someone could usefully do
  • Members describes things that only members of the relevant project can do (where that is better than the third-party option)

Data stored:

  • Code hosting using CVS, Subversion, and Arch (done 2017-02-25, not updated since; see subpage)
    • Third parties can grab all code with full history:
      • All subversion repos available via anonymous rsync: rsync://svn.gna.org/svn/ (ref: bottom of every project's svn page e.g. [1]). (In FSFS format, which is supposed to be portable.)
      • CVS appears to be available the same way: rsync://svn.gna.org/cvs/
      • Arch/tla [2]: rsync://download.gna.org/arch/
    • There's also a ViewVC web front-end for browsing code. (No need to grab this if you have the above.)
  • Ticket tracking (not saved, help wanted)
    • Up to 4 trackers per project: 'bugs', 'patch', 'task', 'support'
    • Project members (only) can set up XML export of their own ticket text/metadata ("Export" item on tracker admin menu).
      • The only option for third parties appears to be web scraping. Help: can someone look into this? (Someone pointed ArchiveBot at it, but it doesn't seem to have grabbed much.)
      • Exported XML is published to an unauthenticated URL of the form https://gna.org/export/project/user/number.xml. The number might be global; a recent export had number 66. In principle third parties could mine this namespace, but the search space is large (1458 projects × 9116 users × 66 numbers ≈ 877 million URLs), and it would only catch recent or periodic exports, since exports are cleared out quickly.
    • There is no supported interface for grabbing issue attachments (such as patches), even for project admins.
      • Third parties can scrape attachments by relying on their increasing integer IDs, e.g. file #29845. The 'bugs' part of the URL apparently doesn't have to match the tracker the file belongs to, so it's possible to scrape all public files by varying the ID. (done/ongoing by JTN, not uploaded anywhere yet)
    • Individual tickets can be private. (Maybe files too?) But the XML export includes private tickets (yes, to an unauthenticated URL).
  • File hosting at http://download.gna.org/ (done 2017-02-25, not updated since; see subpage)
    • Third parties can do anonymous rsync from rsync://download.gna.org/download/
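  • Project websites on home.gna.org (done 2017-02-25, not updated since; see subpage)
    • e.g. http://home.gna.org/freeciv/
    • These are managed via Subversion [3], so grabbing svn by rsync as above should also save website data + history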
  • Mailing lists using Mailman (not saved, help wanted)
    • Public archives are therefore available to third parties in mbox format (albeit with email addresses mangled), e.g. [4]. Help: can someone scrape these? Should be easy.
      • Note, the most recent mbox link on inactive lists (e.g., [5]) is broken; replace "2014-09.partial.mbox.gz" with "2014-09.mbox.gz" to fix it
      • It may be worth grabbing the HTML archives too, as they contain some info not available in the mboxes, e.g. "X-From-R13" in HTML comments contains reversibly obfuscated From address
    • Some mailing lists are private. Even project admins can't see the archives at the moment (sr 3421).
  • Project metadata: groups, users, news, help topics etc. In a database and probably only available via web scraping. Help: can someone look into this?
  • Usage stats at http://stats.gna.org/
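The ticket-export namespace mentioned under ticket tracking could be enumerated as below, which also checks the search-space arithmetic. This is a hypothetical sketch. The URL template and the counts are the ones quoted on this page, but the actual project and user names would have to be scraped separately, and starting the export numbers at 1 is an assumption.

```python
from itertools import product

# Figures quoted on this page (2017-02); a recent export had number 66.
N_PROJECTS = 1458
N_USERS = 9116
MAX_NUMBER = 66


def export_url(project, user, number):
    """Build one candidate URL of the form /export/project/user/number.xml."""
    return f"https://gna.org/export/{project}/{user}/{number}.xml"


def candidate_urls(projects, users, max_number):
    """Yield every project/user/number combination (numbers assumed to start at 1)."""
    for project, user, number in product(projects, users, range(1, max_number + 1)):
        yield export_url(project, user, number)


search_space = N_PROJECTS * N_USERS * MAX_NUMBER
print(f"{search_space:,} candidate URLs")  # 877,214,448 candidate URLs
```

If the number really is global, each number would belong to only one project/user pair, but without a way to look that mapping up the full product still has to be probed.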
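Two small helpers for the mailing-list archives, as a sketch. The first applies the ".partial.mbox.gz" to ".mbox.gz" fix described under mailing lists. The second assumes that the "R13" in `X-From-R13` means ROT13 applied to the letters (as in MHonArc-style HTML archives). That reading is an assumption, and any characters it leaves garbled would need further handling. No archive base URL is assumed; both functions work on whatever URL or page text you pass in.

```python
import codecs
import re


def fix_partial_mbox_link(url):
    """Turn a broken 'YYYY-MM.partial.mbox.gz' link into 'YYYY-MM.mbox.gz'."""
    return re.sub(r"(\d{4}-\d{2})\.partial\.mbox\.gz$", r"\1.mbox.gz", url)


# Assumption: X-From-R13 holds the From address with ROT13 applied to letters;
# digits and punctuation pass through unchanged.
R13_COMMENT = re.compile(r"<!--X-From-R13:\s*(.*?)\s*-->")


def decode_from_r13(html):
    """Return the de-obfuscated From value from an archive page, or None."""
    match = R13_COMMENT.search(html)
    if match is None:
        return None
    return codecs.decode(match.group(1), "rot_13")
```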

Shutdown Notice

  • A notice of pending shutdown / request for takeover was first announced in Nov 2016[2], suggesting a time frame of six months
  • A news item[3] about shutdown was posted to the front page 2017-01-31 linking to the above. A reply to that on 4 Feb suggests shutdown will happen "within 3 months, or when the hardware dies".
  • This suggests shutdown by around the beginning of May 2017.

References