Difference between revisions of "SourceForge"

From Archiveteam
m (sourceforge.jp is a completely different website)
m
(12 intermediate revisions by 8 users not shown)
Line 3: Line 3:
| image = SourceForge.png
| image = SourceForge.png
| description =  
| description =  
| URL = {{url|1=http://sourceforge.net/|2=sourceforge.net}}
| URL = {{url|https://sourceforge.net/}}
| project_status = {{online}}
| project_status = {{online}}
| archiving_status = {{inprogress}}
| archiving_status = {{notsavedyet}}
| source = [https://github.com/ArchiveTeam/sourceforge-grab sourceforge-grab], [https://github.com/ArchiveTeam/sourceforge-grab-rsync sourceforge-grab-rsync]
| source = [https://github.com/ArchiveTeam/sourceforge-grab sourceforge-grab], [https://github.com/ArchiveTeam/sourceforge-grab-rsync sourceforge-grab-rsync]
| tracker = [http://tracker.archiveteam.org/sourceforge sourceforge], [http://tracker.archiveteam.org/sourceforgersync sourceforgersync]
| tracker = [https://tracker.archiveteam.org/sourceforge sourceforge], [https://tracker.archiveteam.org/sourceforgersync sourceforgersync]
| irc = coldstorage
| irc = sourceforget
| irc_network = hackint
}}
}}


Line 15: Line 16:
It's really old, ad supported, adware supported. And yet, it is still alive.
It's really old, ad supported, adware supported. And yet, it is still alive.


It hosts code migrated from [[BerliOS]]<ref>https://joinup.ec.europa.eu/news/german-open-source-development-site-berlios-joins-sourceforge</ref> which shut down.
It hosts code migrated from [[BerliOS]]<ref>https://joinup.ec.europa.eu/collection/open-source-observatory-osor/news/german-open-source-developmen</ref> which shut down.


== Shutdown? ==
== Shutdown? ==
Line 35: Line 36:
=== 2015: Admins hijacking projects to add more adware ===
=== 2015: Admins hijacking projects to add more adware ===


http://lwn.net/SubscriberLink/646118/f8f6483b64fdafb9/
https://lwn.net/Articles/646118/


== Site Structure ==  
== Site Structure ==  
Line 41: Line 42:
* 444,202 project URLs found: https://github.com/marcroberts/archiveteam-sourceforge-lister/blob/master/projects-sorted.txt
* 444,202 project URLs found: https://github.com/marcroberts/archiveteam-sourceforge-lister/blob/master/projects-sorted.txt


Download files can be found on public ftp mirrors, priority on the rest of the site then download files last? e.g. http://www.mirrorservice.org/sites/ftp.sourceforge.net/
Download files can be found on public ftp mirrors, priority on the rest of the site then download files last? e.g. https://www.mirrorservice.org/sites/ftp.sourceforge.net/


CVS/svn/git/hg/bzr repositories should be a priority; many projects do not have their source code on the ftp mirrors.
CVS/svn/git/hg/bzr repositories should be a priority; many projects do not have their source code on the ftp mirrors.


The main API is documented here: http://sourceforge.net/p/forge/documentation/Allura%20API/ and allows unauthenticated access to most services.  It also can indicate what revision control system is used.
The main API is documented here: https://sourceforge.net/p/forge/documentation/Allura%20API/ and allows unauthenticated access to most services.  It also can indicate what revision control system is used.


Appropriate tools, (such as git clone -m and svnrdump) can be used to backup, but SF suggests using rsync regardless of the actual revision control system used.
Appropriate tools, (such as git clone -m and svnrdump) can be used to backup, but SF suggests using rsync regardless of the actual revision control system used.
Line 51: Line 52:
* Some projects have subdomain sites. Ex: http://supertuxkart.sourceforge.net/  Many can be listed by using the project API as an "external_homepage".
* Some projects have subdomain sites. Ex: http://supertuxkart.sourceforge.net/  Many can be listed by using the project API as an "external_homepage".


== How can I help? ==
== Archiving ==


There are two projects: one that grabs the web content and a copy of the binaries, and another that grabs the sourcecode repositories via rsync.
On June 17, 2015, ArchiveTeam started two simultaneous grabbing process: one for web-based content and binaries, and one for rsync-able source code repositories. Shortly afterwards, someone claiming to be a SourceForge staff member, told us to stop and first contact their representative.


For both, you can choose selecting the project in the [[Warrior]] appliance (only one of them), or set up and run the script(s) manually.
<div style="width:100%">
<pre>
jún 18 22:08:45 <burley-sf> FYI: I just blocked your archive client
jún 18 22:09:05 <JRWR>      oh?
jún 18 22:09:07 <burley-sf> it's not following robots.txt, and hitting recursive deep dives
jún 18 22:09:18 <JRWR>      oh my
jún 18 22:09:24 <arkiver>  burley-sf: We're currently trying to archive the software on your website
jún 18 22:09:26 <burley-sf> I'll also be killing the rsync's here soon, you are going too heavy on this
jún 18 22:09:37 <burley-sf> I understand, and I am OK with that -- but not the way you are doing it
jún 18 22:09:58 <arkiver>  burley-sf: What is your limit?
jún 18 22:09:59 <burley-sf> I suggest you stop, so I don't have to block the IPs for rsync
jún 18 22:10:06 <burley-sf> and reach out to our community guy
jún 18 22:10:14 <burley-sf> gimme min and I'll give you an email address
jún 18 22:10:24 <achip>    rsync is paused
jún 18 22:10:38 <arkiver>  burley-sf: thank you
jún 18 22:11:05 <burley-sf> rgaloppini@slashdotmedia.com
[...]
jún 18 22:36:13 <burley-sf> So reach out to Roberto at the address above and then I am sure we can sort something out that doesn't cause impact to the other users
jún 18 22:37:35 <burley-sf> And if you need to reach me for some reason -- david@sourceforge.net
</pre>
</div>


=== Web grab ===
We attempted to contact them but got no reply.
 
'''Warrior:''' SourceForge
 
'''Script:''' http://github.com/ArchiveTeam/sourceforge-grab
 
=== Code rsync ===
 
'''Warrior:''' SourceForge Rsync
 
'''Script:''' http://github.com/ArchiveTeam/sourceforge-grab-rsync
 
'''IMPORTANT''': in case of the rsync project, an item might be even tens (in rare cases, hundreds) of gigabytes in size! If you don't have very much free disk space, don't use high concurreny level!
 
=== General info for script runners ===
Read the instructions (README) of the corresponding repository.
 
{| class="mw-collapsible mw-collapsed" style="text-align:left;"
! Some additional information
|-
| Don't forget to replace YOURNICKHERE with your nickname.
 
The number after <code>--concurrent</code> determines how many threads run at the same time. You can increase this number if your resources (RAM, CPU, HDD, bandwidth) are sufficient. However, if you constantly see messages about rate limiting, there is no need to increase the concurrency.
 
If you want to stop the script, please do it gracefully if possible. To do so, create an empty file named '''STOP''' in the folder of the script (terminal command: <code>touch STOP</code>). The script finishes the current item(s) and stops only after that. (If you kill the script immediately, the items get broken, and they will need to be reassigned to another user.) – Before starting the script again, don't forget to remove the STOP file.
 
If you see "Project code is out of date", kill the script, go to its folder and issue <code>git pull REPOSITORY</code>, where REPOSITORY stands for the URL of either the <code>sourceforge-grab</code> or the <code>sourceforge-grab-rsync</code> repository, see above. After the updating has finished, re-launch the script.
|}
 
=== Donating to the Internet Archive ===
 
Content downloaded by the ArchiveTeam will be uploaded to the [[Internet Archive]], where it will be stored and be available – hopefully – forever. However, storing it costs thousands of dollars in the long run. So, if you can afford, please consider donating to the Internet Archive, so that this piece of history can be kept for us all. http://archive.org/donate
 
=== Do you like our cause? ===
 
If you want to help in other projects, want to learn more about ArchiveTeam, or even help in development in general, navigate to the [[Main Page]] of this wiki, from there you can reach a lot of information. The Team consists of volunteers working on the projects in their free time, so helping hands (and resources) are always welcome.


== References ==
== References ==
Line 99: Line 85:


== External links ==
== External links ==
* {{url|1=http://sourceforge.net/|2=SourceForge}}
* {{url|https://sourceforge.net/|2=SourceForge}}


{{Navigation box}}
{{Navigation box}}
[[category:Code]]

Revision as of 03:21, 29 March 2021

SourceForge
SourceForge.png
URL https://sourceforge.net/
Status Online!
Archiving status Not saved yet
Archiving type Unknown
Project source sourceforge-grab, sourceforge-grab-rsync
Project tracker sourceforge, sourceforgersync
IRC channel #sourceforget (on hackint)

SourceForge is a free software repository.

It's really old, ad-supported, and adware-supported. And yet, it is still alive.

It hosts code migrated from BerliOS,[1] which has shut down.

Shutdown?

2015: Removal of FRS Area

Hello,
You have been identified as having saved files in you user FRS profile area (/home/pfs/<username>. We are planning on removing this
area for user accounts on March 17th 2015. We wanted to give you the opportunity to move your data to a new location before we
remove the data. Here is a link that should help you with moving your data:
https://sourceforge.net/p/forge/documentation/SFTP/
If you need any help please contact us.
Thanks
SourceForge.net Support
sfnet_ops@slashdotmedia.com
https://sourceforge.net/support

[2]

2015: Admins hijacking projects to add more adware

https://lwn.net/Articles/646118/

Site Structure

Download files can be found on public FTP mirrors (e.g. https://www.mirrorservice.org/sites/ftp.sourceforge.net/), so should the rest of the site take priority, with download files grabbed last?

CVS/svn/git/hg/bzr repositories should be a priority; many projects do not have their source code on the ftp mirrors.

The main API is documented at https://sourceforge.net/p/forge/documentation/Allura%20API/ and allows unauthenticated access to most services. It can also indicate which revision control system a project uses.
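
As a rough illustration of how the API could be used to find out which revision control system a project uses, here is a minimal Python sketch. The `/rest/p/<shortname>/` path, the `tools`, `name` and `mount_point` field names, and the sample response are assumptions made for illustration, so check the Allura API documentation before relying on them. The sketch only parses a response; it does not make any network request.

```python
import json
from typing import List, Tuple
from urllib.parse import quote

# Assumption: project metadata is served under /rest/p/<shortname>/.
API_ROOT = "https://sourceforge.net/rest/p/"

# Assumption: repository tools are reported under these names.
VCS_TOOLS = {"git", "svn", "hg", "cvs", "bzr"}


def project_api_url(shortname: str) -> str:
    """Build the (assumed) REST URL for a project's metadata."""
    return f"{API_ROOT}{quote(shortname, safe='')}/"


def vcs_tools(project: dict) -> List[Tuple[str, str]]:
    """Return (tool name, mount point) pairs for tools that look like repositories."""
    found = []
    for tool in project.get("tools", []):
        name = tool.get("name", "")
        if name in VCS_TOOLS:
            found.append((name, tool.get("mount_point", "")))
    return found


# Hypothetical response body, trimmed to the fields used in this sketch.
SAMPLE = json.loads("""
{"shortname": "example",
 "external_homepage": "http://example.sourceforge.net/",
 "tools": [{"name": "git", "mount_point": "code"},
           {"name": "wiki", "mount_point": "wiki"},
           {"name": "svn", "mount_point": "svn"}]}
""")

if __name__ == "__main__":
    print(project_api_url("example"))
    print(vcs_tools(SAMPLE))
```

Keeping the parsing separate from the fetching means the same function can be run over saved API responses without hitting the site again.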

Appropriate tools (such as git clone --mirror and svnrdump) can be used to back up repositories, but SF suggests using rsync regardless of the actual revision control system used.
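
The choice between these tools can be sketched as a small helper that builds the command line for one repository. This is only an illustration under stated assumptions: the function name `backup_command` and its parameters are hypothetical, the source URL must be supplied by the caller, and nothing is executed. The git case uses the `--mirror` option of `git clone`, and every other system falls back to rsync, following SF's suggestion.

```python
from typing import List


def backup_command(vcs: str, source: str, dest: str) -> List[str]:
    """Return an argument list for backing up one repository (hypothetical helper)."""
    vcs = vcs.lower()
    if vcs == "git":
        # A mirror clone copies all refs, not just the default branch.
        return ["git", "clone", "--mirror", source, dest]
    if vcs == "svn":
        # svnrdump writes the dump stream to stdout, so dest is not part of
        # the command; the caller is expected to redirect output to a file.
        return ["svnrdump", "dump", source]
    # Fallback for other systems: copy the repository files with rsync.
    return ["rsync", "-a", source, dest]


if __name__ == "__main__":
    # SRC and DEST are placeholders, not real SourceForge locations.
    print(backup_command("git", "SRC", "DEST"))
```

Returning an argument list rather than a shell string lets the caller pass it to `subprocess.run` without worrying about quoting.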

Archiving

On June 17, 2015, ArchiveTeam started two simultaneous grabbing processes: one for web-based content and binaries, and one for rsync-able source code repositories. Shortly afterwards, someone claiming to be a SourceForge staff member told us to stop and to contact their representative first.

Jun 18 22:08:45 <burley-sf> FYI: I just blocked your archive client
Jun 18 22:09:05 <JRWR>      oh?
Jun 18 22:09:07 <burley-sf> it's not following robots.txt, and hitting recursive deep dives
Jun 18 22:09:18 <JRWR>      oh my
Jun 18 22:09:24 <arkiver>   burley-sf: We're currently trying to archive the software on your website
Jun 18 22:09:26 <burley-sf> I'll also be killing the rsync's here soon, you are going too heavy on this
Jun 18 22:09:37 <burley-sf> I understand, and I am OK with that -- but not the way you are doing it
Jun 18 22:09:58 <arkiver>   burley-sf: What is your limit?
Jun 18 22:09:59 <burley-sf> I suggest you stop, so I don't have to block the IPs for rsync
Jun 18 22:10:06 <burley-sf> and reach out to our community guy
Jun 18 22:10:14 <burley-sf> gimme min and I'll give you an email address
Jun 18 22:10:24 <achip>     rsync is paused
Jun 18 22:10:38 <arkiver>   burley-sf: thank you
Jun 18 22:11:05 <burley-sf> rgaloppini@slashdotmedia.com
[...]
Jun 18 22:36:13 <burley-sf> So reach out to Roberto at the address above and then I am sure we can sort something out that doesn't cause impact to the other users
Jun 18 22:37:35 <burley-sf> And if you need to reach me for some reason -- david@sourceforge.net

We attempted to contact them but got no reply.

References

External links