Difference between revisions of "Gna!"

Revision as of 17:37, 17 April 2017

Gna!

URL	https://gna.org/
Status	Closing
Archiving status	In progress...
Archiving type	Unknown
IRC channel	#gnarm (on hackint)

Gna! is a centralized location where software developers can develop, distribute and maintain free (GPL-compatible) software. It is an instance of the Savane code-hosting platform^[1].

It is shutting down due to lack of admin effort, probably by the end of April or early May 2017.

Hosted data

As of 2017-02 it claimed to have 1458 hosted projects. (Many are probably abandoned and will not be saved by their project admins before shutdown.)

Here's a breakdown of the kinds of data stored and what various people can do to grab the data:

Third party describes what random anonymous Internet people (e.g., Archive Team) can do
- Done shows bits that we have already rescued
- Help shows bits that someone could usefully do
Members describes things that only members of the relevant project can do (if better)

Data stored:

Code hosting using CVS, Subversion, and Arch (done 2017-02-25, not updated since)
- Third parties can grab all code with full history:
  - All subversion repos available via anonymous rsync: rsync://svn.gna.org/svn/ (ref: bottom of every project's svn page e.g. [1]). (In FSFS format, which is supposed to be portable.)
  - Ditto CVS, it looks like: rsync://svn.gna.org/cvs/
  - Arch/tla [2]: rsync://download.gna.org/arch/
- There's also a ViewVC web front-end to browse code. (No point grabbing this if you've got the above)
Ticket tracking (not saved, help wanted)
- Up to 4 trackers per project: 'bugs', 'patch', 'task', 'support'
- Project members (only) can set up XML export of their own ticket text/metadata ("Export" item on tracker admin menu).
  - Only option for third parties looks like web scraping. Help: can someone look into this? (Someone pointed ArchiveBot at it but it doesn't seem to have grabbed much)
  - Exported XML is published to an unauthenticated URL of the form https://gna.org/export/project/user/number.xml, where project, user and number are placeholders. number might be global; a recent export had number 66. In principle third parties could mine this namespace, although the search space is rather large (1458 projects * 9116 users * 66 numbers) and it would only catch recent or periodic exports, since they are cleared out quickly.
- There's no supported interface for grabbing issue attachments (such as patches), even for project admins.
  - Third parties can scrape attachments by relying on their increasing integer IDs, e.g. file #29845. It looks like you don't have to get the 'bugs' bit correct, so it's possible to scrape all public files by varying the ID. (done/ongoing by JTN, not uploaded anywhere yet)
- Individual tickets can be private. (Maybe files too?) But the XML export includes private tickets (yes, to an unauthenticated URL).
File hosting at http://download.gna.org/ (done 2017-02-25, not updated since)
- Third parties can do anonymous rsync from rsync://download.gna.org/download/
Project websites on home.gna.org (done 2017-02-25, not updated since)
- e.g. http://home.gna.org/freeciv/
- These are managed via Subversion [3], so grabbing svn by rsync as above should also save website data + history
Mailing lists using Mailman (not saved, help wanted)
- Because the lists run on Mailman, public archives are available to third parties in mbox format (albeit with email addresses mangled), e.g. [4]. Help: can someone scrape these? Should be easy.
  - Note, the most recent mbox link on inactive lists (e.g., [5]) is broken; replace "2014-09.partial.mbox.gz" with "2014-09.mbox.gz" to fix it
  - It may be worth grabbing the HTML archives too, as they contain some info not available in the mboxes, e.g. "X-From-R13" in HTML comments contains reversibly obfuscated From address
- Some mailing lists are private. Even project admins can't see the archives at the moment (sr 3421).
Project metadata: groups, users, news, help topics etc. In a database and probably only available via web scraping. Help: can someone look into this?
Usage stats at http://stats.gna.org/
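
The URL patterns described in the list above can be sketched in Python. This is a rough sketch, not a tested scraper: the function names are made up for illustration, the download endpoint is taken from the example link for file #29845, and the ID range to walk is an assumption (the highest attachment ID is not stated here).

```python
from urllib.parse import urlencode

# Endpoint taken from the example attachment link (file #29845).
# The tracker name in the path ('bugs') reportedly does not need to match.
ATTACHMENT_ENDPOINT = "https://gna.org/bugs/download.php"


def attachment_url(file_id: int) -> str:
    """Build the download URL for one attachment ID."""
    return f"{ATTACHMENT_ENDPOINT}?{urlencode({'file_id': file_id})}"


def attachment_urls(first_id: int, last_id: int):
    """Yield download URLs for every ID in [first_id, last_id] (range is an assumption)."""
    for file_id in range(first_id, last_id + 1):
        yield attachment_url(file_id)


def fix_partial_mbox_link(url: str) -> str:
    """Rewrite a broken '...partial.mbox.gz' archive link to '...mbox.gz'."""
    suffix = ".partial.mbox.gz"
    if url.endswith(suffix):
        return url[: -len(suffix)] + ".mbox.gz"
    return url


def export_search_space(projects: int = 1458, users: int = 9116, numbers: int = 66) -> int:
    """Number of candidate export URLs if project, user and number are all guessed."""
    return projects * users * numbers
```

With the figures quoted above, export_search_space() comes to 877,214,448 candidate URLs, which is why blind enumeration of the export namespace looks impractical compared with walking attachment IDs.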

Shutdown Notice

A notice of pending shutdown and request for takeover was first posted in Nov 2016^[2], suggesting a time frame of six months.
A news item^[3] about the shutdown was posted to the front page on 2017-01-31, linking to the above. A reply to that on 4 Feb suggests shutdown will happen "within 3 months, or when the hardware dies".
Both estimates point to shutdown around the beginning of May 2017.
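
The date arithmetic behind that estimate can be checked with a few lines of Python. This is only a sketch: the exact day of the November 2016 announcement is not given here, so the 1st of the month is assumed, and add_months is a helper written for this example.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole months (day clamped to 28 so it is always valid)."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    return date(year, month0 + 1, min(d.day, 28))


# Assumption: announcement treated as 1 Nov 2016 (only the month is known).
from_announcement = add_months(date(2016, 11, 1), 6)
# The reply of 4 Feb 2017 said "within 3 months".
from_reply = add_months(date(2017, 2, 4), 3)

print(from_announcement.isoformat(), from_reply.isoformat())
```

Both results fall in the first week of May 2017.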

rsync grab sign-up

All done!

This gets code and file hosting but not other stuff. Under 180 Gibyte in total.

Please choose --bwlimit wisely (5M?)

What	Size	No. of files	Who/when
rsync://svn.gna.org/svn/	~41 Gibyte	~1 million	PurpleSym 2017-02-25 (via svnrdump; 18G lzip'd); mkram 2017-02-26 (via rsync)
rsync://svn.gna.org/cvs/	~7.5 Gibyte	~200k	mkram 2017-02-25
rsync://download.gna.org/arch/	~318 Mibyte	~71k	mkram 2017-02-25 (except admindir)
rsync://download.gna.org/download/	~116 Gibyte	~130k	mkram 2017-02-25
rsync://download.gna.org/www/	~6.4 Gibyte	~177k	mkram 2017-02-25 (except "some authentication folder and .bashhistory")

For mkram's rsync grab, breakdown by project and upload schedule at Gna!/projects.
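
For anyone repeating the grab, the rsync invocations could be generated along these lines. This is a sketch under assumptions: the local directory layout ("gna-mirror/host/module/") and the --partial flag are choices made for this example, and the "5M" bandwidth value relies on rsync 3.1 or later accepting size suffixes for --bwlimit. It only builds the command lines and does not run them.

```python
from urllib.parse import urlsplit

# rsync modules listed in the sign-up table above.
RSYNC_MODULES = [
    "rsync://svn.gna.org/svn/",
    "rsync://svn.gna.org/cvs/",
    "rsync://download.gna.org/arch/",
    "rsync://download.gna.org/download/",
    "rsync://download.gna.org/www/",
]


def rsync_command(source: str, dest_root: str = "gna-mirror", bwlimit: str = "5M") -> list:
    """Build an rsync argv list for one module, with a bandwidth cap."""
    parts = urlsplit(source)
    # Hypothetical layout: <dest_root>/<host>/<module>/
    dest = f"{dest_root}/{parts.netloc}{parts.path}"
    return ["rsync", "--archive", "--partial", f"--bwlimit={bwlimit}", source, dest]


if __name__ == "__main__":
    for module in RSYNC_MODULES:
        # Print rather than execute; parent directories of the destination
        # should be created before actually running these.
        print(" ".join(rsync_command(module)))
```

Each command list could be passed to subprocess.run once the destination directories exist.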

References

[1] https://gna.org/cookbook/?func=detailitem&item_id=105

[2] https://mail.gna.org/public/project/2016-11/msg00001.html

[3] https://gna.org/forum/forum.php?forum_id=2545


@@ Line 1: / Line 1: @@
 {{Infobox project
 | title = Gna!
-| URL = http://www.gna.org
+| URL = https://gna.org/
 | project_status = {{closing}}
 | archiving_status = {{inprogress}}
@@ Line 9: / Line 9: @@
 '''Gna!''' is a centralized location where software developers can develop, distribute and maintain free (GPL-compatible) software. It is an instance of the [http://savannah.nongnu.org/p/savane-cleanup Savane] code-hosting platform<ref>https://gna.org/cookbook/?func=detailitem&item_id=105</ref>.
+It is shutting down due to lack of admin effort; probably by end of April / early May 2017.
 == Hosted data ==
@@ Line 14: / Line 16: @@
 As of 2017-02 it claimed to have 1458 hosted projects. (Many are probably abandoned and will not be saved by their project admins before shutdown.)
-* '''Code hosting''' using CVS, Subversion, and Arch
+Here's a breakdown of the kinds of data stored and what various people can do to grab the data:
-** All subversion repos available via anonymous rsync: rsync://svn.gna.org/svn/ (ref: bottom of every project's svn page e.g. [https://gna.org/svn/?group=admin]). (In [http://svnbook.red-bean.com/en/1.7/svn.reposadmin.planning.html#svn.reposadmin.basics.backends.fsfs FSFS] format, which is supposed to be portable.)
+* '''<font color=red>Third party</font>''' describes what random anonymous Internet people (e.g., Archive Team) can do
-** Ditto CVS, it looks like: rsync://svn.gna.org/cvs/
+** '''<font color=green>Done</font>''' shows bits that we have already rescued
-** Arch/tla [https://gna.org/cookbook/?func=detailitem&item_id=101]: rsync://download.gna.org/arch/
+** '''<span style="background-color: yellow">Help</span>''' shows bits that someone could usefully do
-** There's also a ViewVC web front-end to browse code.
+* '''<font color=blue>Members</font>''' describes things that only members of the relevant project can do (if better)
-* '''Ticket tracking'''
+Data stored:
+* '''Code hosting''' using CVS, Subversion, and Arch  ('''<font color=green>done</font>''' 2017-02-25, not updated since)
+** '''<font color=red>Third parties</font>''' can grab all code with full history:
+*** All subversion repos available via anonymous rsync: rsync://svn.gna.org/svn/ (ref: bottom of every project's svn page e.g. [https://gna.org/svn/?group=admin]). (In [http://svnbook.red-bean.com/en/1.7/svn.reposadmin.planning.html#svn.reposadmin.basics.backends.fsfs FSFS] format, which is supposed to be portable.)
+*** Ditto CVS, it looks like: rsync://svn.gna.org/cvs/
+*** Arch/tla [https://gna.org/cookbook/?func=detailitem&item_id=101]: rsync://download.gna.org/arch/
+** There's also a ViewVC web front-end to browse code. (No point grabbing this if you've got the above)
+* '''Ticket tracking''' ('''<span style="background-color: yellow">not saved, help wanted</span>''')
 ** Up to 4 trackers per project: 'bugs', 'patch', 'task', 'support'
-** Project admins (only) can set up XML export of their own ticket text/metadata ("Export" item on tracker admin menu).
+** '''<font color=blue>Project members</font>''' (only) can set up XML export of their own ticket text/metadata ("Export" item on tracker admin menu).
-*** Only option for third parties looks like web scraping.
+*** Only option for '''<font color=red>third parties</font>''' looks like web scraping. '''<span style="background-color: yellow">Help</span>''': can someone look into this? (Someone pointed [[ArchiveBot]] at it but [http://archive.fart.website/archivebot/viewer/?q=gna.org it doesn't seem to have grabbed much])
+*** Exported XML is published to an unauthenticated URL of the form <nowiki>https://</nowiki>gna.org/export/''project''/''user''/''number''.xml . ''number'' might be global; a recent export had number 66. In principle this namespace could be mined by third parties although it's a rather large search space (1458 projects * 9116 users * 66 numbers) and would only catch recent or periodic exports, since they are cleared out quickly.
 ** There's no supported interface for grabbing issue attachments (such as patches) even for project admins though.
-*** Attached files are allocated global increasing integer IDs, e.g. [https://gna.org/bugs/download.php?file_id=29845 file #29845]. It looks like you don't have to get the 'bugs' bit correct, so it's possible to scrape all public files by varying the ID.
+*** '''<font color=red>Third parties</font>''' can scrape attachments by relying on their increasing integer IDs, e.g. [https://gna.org/bugs/download.php?file_id=29845 file #29845]. It looks like you don't have to get the 'bugs' bit correct, so it's possible to scrape all public files by varying the ID. ('''<font color=green>done</font>'''/ongoing by [[User:JTN|JTN]], not uploaded anywhere yet)
-** Individual tickets can be private. (Maybe files too?)
+** Individual tickets can be private. (Maybe files too?) But the XML export includes private tickets (yes, to an unauthenticated URL).
-* '''File hosting''' at http://download.gna.org/
+* '''File hosting''' at http://download.gna.org/  ('''<font color=green>done</font>''' 2017-02-25, not updated since)
-** Anonymous rsync available at rsync://download.gna.org/download/
+** '''<font color=red>Third parties</font>''' can do anonymous rsync from rsync://download.gna.org/download/
-* '''Project websites''' on home.gna.org
+* '''Project websites''' on home.gna.org ('''<font color=green>done</font>''' 2017-02-25, not updated since)
 ** e.g. http://home.gna.org/freeciv/
 ** These are managed via Subversion [https://gna.org/cookbook/?func=detailitem&item_id=107], so grabbing svn by rsync as above should also save website data + history
-* '''Mailing lists''' using [[Mailman]]
+* '''Mailing lists''' using [[Mailman]] ('''<span style="background-color: yellow">not saved, help wanted</span>''')
-** Which means public archives are available in mbox format (albeit with email addresses mangled). e.g. [https://mail.gna.org/public/freeciv-announce/]
+** Which means public archives are available to '''<font color=red>third parties</font>''' in mbox format (albeit with email addresses mangled). e.g. [https://mail.gna.org/public/freeciv-announce/] '''<span style="background-color: yellow">Help</span>''': can someone scrape these? Should be easy.
-** Some mailing lists are private.
+*** Note, the most recent mbox link on inactive lists (e.g., [https://mail.gna.org/public/freeciv-warclient-commits/]) is broken; replace "2014-09.partial.mbox.gz" with "2014-09.mbox.gz" to fix it
-* '''Project metadata''': groups, users, news, help topics etc. In a database and probably only available via web scraping.
+*** It may be worth grabbing the HTML archives too, as they contain some info not available in the mboxes, e.g. "X-From-R13" in HTML comments contains reversibly obfuscated From address
+** Some mailing lists are private. Even '''<font color=blue>project admins</font>''' can't see the archives at the moment ([https://gna.org/support/?3421 sr 3421]).
+* '''Project metadata''': groups, users, news, help topics etc. In a database and probably only available via web scraping. '''<span style="background-color: yellow">Help</span>''': can someone look into this?
 * '''Usage stats''' at http://stats.gna.org/
