From Archiveteam
Revision as of 20:54, 22 November 2021 by Jake (talk | contribs) (Update to IA item)
[Image: Gna! screenshot 20170225.png]
Status: Offline
Archiving status: Partially saved
Archiving type: Unknown
IRC channel: #archiveteam-bs (on hackint); formerly #gnarm (on EFnet)
Data: see "Where's the stuff then?" below

Gna! was a centralized location where software developers could develop, distribute and maintain free (GPL-compatible) software. It was an instance of the Savane code-hosting platform[1]. It hosted popular free software projects such as Battle for Wesnoth[2] and Freeciv[3] (full list).

It shut down due to lack of admin effort; the pending shutdown was first announced in Nov 2016, and it finally shut down on May 24 2017.

Where's the stuff then?

Here are links to all the known saved material:

| What | mkram, 2017-02 | Zeryl, 2017-05 | JTN, 2017-05 |
|------|----------------|----------------|--------------|
| Code hosting, project files, project websites | Gna_code_hosting as of 2017-02-25 (more details) | ? | - |
| Mailing list archives | - | mail.gna.org_2017-05-04 (ZIP); gna.org_2017-05-08_html_mailman_archive (WARC) | |
| Ticket trackers | - | gna_tickets (inaccessible because marked as spam!) | |
| Ticket attachments | - | ? | FIXME: as of 2017-05-24; not uploaded yet |

Zeryl may also have arranged that some of their material (including some not shown here?) was ingested into the Wayback Machine.

On IRC, Zeryl said at one point "I've got: bugs, rsync items, support tickets, tasks, tickets, the warc of the mail site, mailman mboxes, mailman by thread, mailman by date. 213gb, heh". JTN doesn't know where their copy of 'rsync items' went.

Hosted data

As of 2017-02 it claimed to have 1458 hosted projects. (Many are probably abandoned and will not be saved by their project admins before shutdown.)

Here's a breakdown of the kinds of data stored and what various people can do to grab the data:

  • Third party describes what random anonymous Internet people (e.g., Archive Team) can do
    • Done shows bits that we have already rescued
    • Not done shows bits that aren't known to have been saved
  • Members describes things that only members of the relevant project can do (if better)

Data stored:

  • Code hosting using CVS, Subversion, and Arch (done 2017-02-25, not updated since; see subpage)
    • Third parties can grab all code with full history:
      • All subversion repos available via (insecure) anonymous rsync: rsync:// (ref: bottom of every project's svn page e.g. [1]). (In FSFS format, which is supposed to be portable.)
        • Gna members can get the same data with integrity protection over SSH (for any svn repository), but must use svnrdump; this supposedly creates a faithful copy of the important stuff, but is lightly munged.
      • Ditto CVS, it looks like: rsync://
        • Gna members can get the same data over SSH (for any project), but must use CVS commands. Don't know if there's a standard tool for reconstructing server-side repo state.
      • Arch/tla [2]: rsync://
        • Gna members can get the same data securely over sftp (for any project)
    • There's also a ViewVC web front-end to browse SVN/CVS code. (No point grabbing this if you've got the above)
  • Ticket tracking (done 2017-05-04 by Zeryl04: gna_tickets)
    • Up to 4 trackers per project: 'bugs', 'patch', 'task', 'support'
    • Gna members (only) can set up XML export of their own ticket text/metadata ("Export" item on tracker admin menu).
      • Only option for third parties looks like web scraping. (Someone pointed ArchiveBot at it but it doesn't seem to have grabbed much)
      • Exported XML is published to an unauthenticated URL of the form . The number in that URL might be global; a recent export had number 66. In principle third parties could mine this namespace, although the search space is rather large (1458 projects * 9116 users * 66 numbers) and mining would only catch recent or periodic exports, since they are cleared out quickly.
    • There's no supported interface for grabbing issue attachments (such as patches) even for project admins though.
      • Third parties can scrape attachments by relying on their increasing integer IDs, e.g. file #29845. It looks like you don't have to get the 'bugs' bit correct, so it's possible to scrape all public files by varying the ID. (done to 2017-05-24 by JTN, not uploaded anywhere yet)
    • Individual tickets can be private. (Maybe files too?) But the XML export includes private tickets (yes, to an unauthenticated URL).
  • File hosting at (done 2017-02-25, not updated since; see subpage)
    • Third parties can do (insecure) anonymous rsync from rsync://
    • Gna members can get the same data (for any project) securely with rsync-over-SSH (rsync -avz dest/), or with sftp
  • Project websites on (done 2017-02-25, not updated since; see subpage)
  • Mailing lists using Mailman (done 2017-05-04 by Zeryl04 using this code; got public HTML+mbox, uploaded as mail.gna.org_2017-05-04; also 2017-05-08 gna.org_2017-05-08_html_mailman_archive. ArchiveBot also has something, not sure what.)
    • Which means public archives are available to third parties in mbox format (albeit with email addresses mangled). e.g. [4]
      • Note, the most recent mbox link on inactive lists (e.g., [5]) is broken; replace "2014-09.partial.mbox.gz" with "2014-09.mbox.gz" to fix it
      • It may be worth grabbing the HTML archives too, as they contain some info not available in the mboxes, e.g. "X-From-R13" in HTML comments contains reversibly obfuscated From address
    • Some mailing lists are private. Even project admins can't see the archives at the moment (sr 3421).
  • Project metadata: groups, users, news, help topics etc. In a database and probably only available via web scraping. Not done
  • Usage stats at Not done
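To put the XML-export search space from the ticket-tracking notes above into numbers, here is a back-of-the-envelope sketch. The three counts are the ones quoted in the text; the request rate is purely an assumed figure for illustration, not a measurement of anything Gna could sustain.

```python
# Counts quoted in the ticket-tracking notes above.
PROJECTS = 1458
USERS = 9116
EXPORT_NUMBERS = 66

# Assumption for illustration only: a polite scraper making one request per second.
ASSUMED_REQUESTS_PER_SECOND = 1.0


def search_space(projects: int, users: int, numbers: int) -> int:
    """Number of candidate export URLs if every combination must be tried."""
    return projects * users * numbers


def days_to_enumerate(candidates: int, requests_per_second: float) -> float:
    """Days needed to try every candidate once at a fixed request rate."""
    return candidates / requests_per_second / 86_400


if __name__ == "__main__":
    total = search_space(PROJECTS, USERS, EXPORT_NUMBERS)
    days = days_to_enumerate(total, ASSUMED_REQUESTS_PER_SECOND)
    print(f"{total:,} candidate URLs")
    print(f"{days:,.0f} days at the assumed rate")
```

With these inputs the product is 877,214,448 candidates. At the assumed one request per second that is roughly 28 years of requests, so blind enumeration only looks feasible if the space can be narrowed, for example if the export number really is global.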
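The two mailing-list quirks noted above (the broken ".partial.mbox.gz" link on inactive lists, and the obfuscated "X-From-R13" address in the HTML archives) can be handled with two small helpers. This is a minimal sketch, not the code anyone actually used: the function names are made up for illustration, the example URL is a placeholder, and the decoder assumes that "R13" means plain ROT13, which the name suggests but the text above does not confirm. Real comments may carry extra encoding that this does not handle.

```python
import codecs


def fix_partial_mbox_link(url: str) -> str:
    """Rewrite a broken '<YYYY-MM>.partial.mbox.gz' link to '<YYYY-MM>.mbox.gz'.

    URLs that do not end in '.partial.mbox.gz' are returned unchanged.
    """
    suffix = ".partial.mbox.gz"
    if url.endswith(suffix):
        return url[: -len(suffix)] + ".mbox.gz"
    return url


def decode_x_from_r13(obfuscated: str) -> str:
    """Undo the obfuscation, ASSUMING it is plain ROT13.

    ROT13 rotates ASCII letters by 13 places and leaves digits and
    punctuation (such as '@' and '.') untouched, so applying it twice
    returns the original string.
    """
    return codecs.decode(obfuscated, "rot_13")


if __name__ == "__main__":
    # Placeholder URL; only the file-name suffix matters here.
    print(fix_partial_mbox_link("http://lists.example.invalid/2014-09.partial.mbox.gz"))
    # Made-up address, ROT13-encoded by hand for the example.
    print(decode_x_from_r13("nyvpr@rknzcyr.bet"))
```

Because ROT13 is its own inverse, the same helper can also be used to check a decoded address by encoding it again and comparing with the original comment.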

Gna admins have so far not responded to requests for help from at least some project members wishing to migrate or rescue their data, presumably because of the same lack of admin effort that is causing the shutdown. They haven't been approached about Archive Team style bulk backup (or at least JTN has not done so).

Shutdown Notice

  • A notice of pending shutdown / request for takeover was first posted in Nov 2016[4], suggesting a time frame of six months.
  • A news item[5] about the shutdown was posted to the front page on 2017-01-31, linking to the above. A reply to that on 4 Feb suggests shutdown will happen "within 3 months, or when the hardware dies".
  • This suggests shutdown by around the beginning of May 2017.
  • As of early May 2017, it was still up, although its SSL certificate had been allowed to lapse.

Shutdown

The site stopped responding on 24 May 2017. This was unannounced, but a Gna admin confirmed on #gna IRC later that day that the shutdown was deliberate:

20:53 < jtn> has stopped responding. I guess this is it. Thanks for 
21:50 < zerodeux> yes, it's been shut down for good
21:50 < zerodeux> some traces now left on!