https://wiki.archiveteam.org/api.php?action=feedcontributions&user=No2pencil&feedformat=atomArchiveteam - User contributions [en]2024-03-28T11:37:23ZUser contributionsMediaWiki 1.37.1https://wiki.archiveteam.org/index.php?title=FortuneCity&diff=7430FortuneCity2012-03-18T08:34:54Z<p>No2pencil: /* configure: error: --with-ssl was given, but GNUTLS is not available. */</p>
<hr />
<div>{{Infobox project<br />
| title = FortuneCity<br />
| image = Fortunecity 1304522840896.png<br />
| description = <br />
| URL = {{url|1=http://www.fortunecity.com/}}<br />
| project_status = {{closing}} April 30th, 2012<br />
| tracker = http://focity.heroku.com/<br />
| source = https://github.com/ArchiveTeam/fortunecity<br />
| archiving_status = {{nosavedyet}}<br />
| irc = fortuneshitty<br />
}}<br />
<br />
<br />
== How to help ==<br />
<br />
To run one or more FortuneCity downloaders, you'll need to be on Linux or a Linux-like OS.<br />
<br />
Setting up:<br />
<pre><br />
git clone git://github.com/ArchiveTeam/fortunecity.git<br />
cd fortunecity<br />
./get-wget-warc.sh<br />
</pre><br />
<br />
Check the output: does it say wget is successfully compiled? Great!<br />
<br />
Now you can run a download client. Choose a nickname and run:<br />
<pre><br />
./seesaw.sh YOURNICK<br />
</pre><br />
<br />
The script should start downloading and uploading. If it works, feel free to run a few more!<br />
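<br />
If you want to run several at once, one convenient approach is detached GNU screen sessions. This is just a sketch, not part of the official scripts; whether multiple instances can share one checkout depends on seesaw.sh, so use a separate clone per instance if in doubt, and mind the limit below:<br />
<pre><br />
# start three downloaders, each in its own detached screen session<br />
for i in 1 2 3; do<br />
  screen -dmS "fortunecity-$i" ./seesaw.sh YOURNICK<br />
done<br />
</pre><br />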
<br />
<b>Please do not run more than 10 downloaders.</b> Running more won't help; we need many individuals making small contributions.<br />
<br />
If you want to stop, you can just kill the scripts. To stop gracefully,<br />
<pre><br />
touch STOP<br />
</pre><br />
and the script will stop after the current user.<br />
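<br />
For the curious, a graceful stop like this is usually just a file check at the top of the work loop. This is a hypothetical outline, not the actual seesaw.sh code:<br />
<pre><br />
# hypothetical outline of the download loop's stop check<br />
while [ ! -f STOP ]<br />
do<br />
  # ... download and upload one user here, as seesaw.sh does ...<br />
  sleep 1   # placeholder for the real work<br />
done<br />
echo "STOP file found; exiting after the current user."<br />
</pre><br />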
<br />
There is no need to run upload-finished.sh. seesaw.sh will automatically upload your finished users to us.<br />
<br />
== Common Problems ==<br />
<br />
=== configure: error: --with-ssl was given, but GNUTLS is not available. ===<br />
<br />
You don't have the GnuTLS development headers needed to compile wget-warc. This is usually easy to fix. If you're using a Debian- or Ubuntu-based Linux distribution:<br />
<br />
<pre><br />
apt-get install libgnutls-dev<br />
</pre><br />
<br />
If you're using a Fedora distribution:<br />
<br />
<pre><br />
yum install gnutls-devel<br />
</pre><br />
<br />
If you're using FreeBSD:<br />
<pre><br />
cd /usr/ports/security/gnutls/<br />
make all install clean<br />
</pre><br />
<br />
If you're using something else, you'll have to poke around in your system's documentation and figure it out :)<br />
<br />
== Status ==<br />
<br />
Archiving is well under way. Check the tracker for an up-to-date status report.<br />
<br />
== Site structure ==<br />
<br />
FortuneCity operated under multiple TLDs: <code>com co.uk es it se</code><br />
<br />
(Once operated but now dead: <code>de fr nl cn.fortunecity.com</code>)<br />
<br />
Main website:<br />
<ul><br />
<li><nowiki>http://www.fortunecity.${tld}/</nowiki></li><br />
</ul><br />
<br />
Username-based sites:<br />
<ul><br />
<li><nowiki>http://members.fortunecity.${tld}/</nowiki></li><br />
<li><nowiki>http://${username}.fortunecity.${tld}/${username}/</nowiki></li><br />
</ul><br />
<br />
Area/street-based sites:<br />
<ul><br />
<li><nowiki>http://www.fortunecity.${tld}/${area}/${street}/${number}/</nowiki></li><br />
<li>on .com also: <nowiki>http://${area}.fortunecity.com/${street}/${number}/</nowiki></li><br />
</ul><br />
<br />
The range of numbers is uncertain; it includes at least 0 through 2600. Note: the street names are case-sensitive.<br />
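<br />
To make the pattern concrete, here is an illustrative snippet that enumerates candidate URLs for one area/street on .com; the area and street values are placeholders, and the 0-2600 range is the estimate above:<br />
<pre><br />
area=millenium      # example values only -- substitute a real,<br />
street=mainstreet   # case-sensitive area and street name<br />
for n in $(seq 0 2600); do<br />
  echo "http://${area}.fortunecity.com/${street}/${n}/"<br />
done > urls.txt<br />
</pre><br />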
<br />
=== Categories, areas, streets ===<br />
<br />
Before 2001, FortuneCity used a category-area-street-based system.<br />
<br />
The categories were the same for all tlds:<br />
<pre><br />
artsandhumanities<br />
businessandcareers<br />
computersandinternet<br />
entertainment<br />
homeandfamily<br />
international<br />
peopleandchat<br />
recreationandsports<br />
scifiandparanormal<br />
travelandtransport<br />
</pre><br />
<br />
(This data is on https://github.com/ArchiveTeam/fortunecity/tree/master/explore)<br />
<br />
The areas for each category can be found via the Wayback Machine (1999 or 2000 snapshots): <nowiki>http://www.fortunecity.${tld}/explore/category/${category}.html</nowiki><br />
<br />
The streets for each area can be found the same way.<br />
For com, in 1999 snapshots: <nowiki>http://${area}.fortunecity.com/</nowiki><br />
For the other TLDs, in 2000 snapshots: <nowiki>http://www.fortunecity.${tld}/explore/area/${area}.html</nowiki><br />
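<br />
For example, one such page can be requested through the Wayback Machine's standard <nowiki>web.archive.org/web/<timestamp>/<url></nowiki> form, which redirects to the snapshot nearest the timestamp (tld and category here are placeholders):<br />
<pre><br />
tld=com<br />
category=entertainment<br />
wget "http://web.archive.org/web/2000/http://www.fortunecity.${tld}/explore/category/${category}.html"<br />
</pre><br />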
<br />
The streets for the following .com areas couldn't be found via Wayback:<br />
<pre>challenge cratervalley lavender littleitaly marina olympia skyscraper tatooine victorian</pre><br />
<br />
{{Navigation box}}</div>No2pencilhttps://wiki.archiveteam.org/index.php?title=User:No2pencil&diff=6517User:No2pencil2011-10-18T04:53:53Z<p>No2pencil: </p>
<hr />
<div>My name is no2pencil, & I have a desire to help create, adapt, & execute Linux & Unix shell scripts. This seems to complement what is going on here, so I decided to join in.<br />
<br />
I grew up with computers in the '80s. Back then I wanted a Commodore 64 so very much. When my parents brought home an IBM clone, I was a little disappointed: there was so much reading required, & work to be done, just to do stuff. As time has passed I've become so very thankful for that situation, & for what I learned in those days.<br />
<br />
After watching the BBS documentary by Jason Scott, I realized just how much I missed out on by not having a modem with my antique machine. After I got my first laptop in '95, I finally found a 14.4 PCMCIA card. With that card, I loved writing programs & scripts on Slackware to dial the modem, calling payphones in lobbies. This led to me collecting & contributing numbers for [http://payphone-project.com the Payphone project]. Then, after finding out about his desire to archive Geocities, just for the sake of doing it for others, & that I actually could add some value to this project... well, I just had to join & help out as much or as little as possible.<br />
<br />
'''tl;dr''' - Jason Scott fucking rules.<br />
<br />
<br />
[[File:No2pencil.jpg]]<br />
<br />
A little bit about me:<br />
<br />
[http://dreamincode.net Dream In Code] Admin<br />
<br />
[http://akroncdnr.com Computer Design & Repair] Shop owner</div>No2pencilhttps://wiki.archiveteam.org/index.php?title=File:No2pencil.jpg&diff=6236File:No2pencil.jpg2011-08-03T02:53:19Z<p>No2pencil: Profile photo</p>
<hr />
<div>Profile photo</div>No2pencilhttps://wiki.archiveteam.org/index.php?title=Lulu_Poetry&diff=4716Lulu Poetry2011-05-02T08:31:35Z<p>No2pencil: /* Tools */</p>
<hr />
<div>'''Lulu Poetry''' or '''Poetry.com''' announced on May 1, 2011 that it would close four days later on May 4, deleting all 14 million poems. Archive Team members instantly mobilized to find out how to help and aim their [http://en.wikipedia.org/wiki/LOIC LOIC]s at it. (By the way, I actually mean their crawlers, not DDoS cannons.)<br />
<br />
<br />
==Site Structure==<br />
The URLs appear to be flexible and sequential:<br />
<br><br />
(12:13:09 AM) closure: <nowiki>http://www.poetry.com/poems/archiveteam-bitches/3535201/</nowiki> , heh, look at that, you can just put in any number you like I think<br><br />
(12:15:16 AM) closure: <nowiki>http://www.poetry.com/user/allofthem/7936443/</nowiki> same for the users<br />
<br><br />
There are apparently over 14 million poems. The numbers go up to <nowiki>http://www.poetry.com/user/whatever/14712220</nowiki>, though URLs without poems (author deletions?) are interspersed.<br />
<br />
==Howto==<br />
# Claim a range of numbers below.<br />
# Generate a hotlist of urls for wget to download by running this, editing in your start and end number: <tt>perl -le 'print "<nowiki>http://www.poetry.com/poems/archiveteam/$_/</nowiki>" for 1000000..2000000' > hotlist</tt><br />
# Split the hotlist into roughly 100 sublists of 10,000 URLs each: <tt>split -l 10000 hotlist</tt><br />
# Run wget on each sublist: <tt>wget -x -i xaa</tt><br />
# To avoid getting too many files in one directory, which some filesystems will choke on, we recommend moving into a new subdirectory before running wget on each sublist.<br />
# For the daring, here's how to run all wgets on all the sublists in parallel, in subdirs: <tt>for x in ???; do mkdir $x.dir; cd $x.dir; wget -x -i ../$x & cd ..; done</tt><br />
Important note: Everyone's getting a lot of 500 errors, probably because we're whacking their server. Because of this, make sure you '''keep all your log files'''. Then you can search them later to generate a list of URLs to retry. Suggested ways to do this:<br />
* run the command <tt>grep -h -B1 "ERROR 500" *.log | grep ^http | sed 's/:$//'</tt><br />
* use a perl script to search your download directory for missing folders in the sequence (a combined sketch of both approaches follows the list)<br />
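<br />
Here is a sketch combining both suggestions; it assumes the <tt>-nv</tt> log format and <tt>-x</tt> directory layout from the recommended command below, and the number range is just an example:<br />
<pre><br />
# 1. pull the failed URLs out of all logs and dedupe them<br />
grep -h -B1 "ERROR 500" *.log | grep '^http' | sed 's/:$//' | sort -u > retry.txt<br />
# 2. add URLs from your claimed range whose directory was never created<br />
perl -le 'for (3000000..3999999) {<br />
  print "http://www.poetry.com/poems/archiveteam/$_/"<br />
    unless -e "www.poetry.com/poems/archiveteam/$_"; }' >> retry.txt<br />
# 3. retry with the recommended options<br />
wget -E -k -T 8 -o retry.log -nv -nc -x -i retry.txt<br />
</pre><br />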
<br />
'''Recommended wget command''': <tt>wget -E -k -T 8 -o logfile.log -nv -nc -x -i urls.txt</tt><br />
{| class="wikitable"<br />
|-<br />
|colspan="3"|<center>'''wget Options Translation'''</center><br />
|-<br />
! short !! long version !! meaning <br />
|-<br />
| <tt>-E</tt> || <tt>--adjust-extension</tt> || adds ".html" to files that are html but didn't originally end in .html<br />
|-<br />
| <tt>-k</tt> || <tt>--convert-links</tt> || change links in html files to point to the local versions of the resources<br />
|-<br />
| <tt>-T</tt> || <tt>--timeout=</tt> || if a request hangs for this long (in seconds), retry instead of waiting indefinitely<br />
|-<br />
| <tt>-o</tt> || <tt>--output-file</tt> || use the following filename as a log file instead of printing to screen<br />
|-<br />
| <tt>-nv</tt> || <tt>--no-verbose</tt> || don't write every little thing to the log file<br />
|-<br />
| <tt>-nc</tt> || <tt>--no-clobber</tt> || if a file is already present on disk, skip it instead of re-downloading it<br />
|-<br />
| <tt>-x</tt> || <tt>--force-directories</tt> || force it to create a hierarchy of directories mirroring the hierarchy in the url structure<br />
|-<br />
| <tt>-i</tt> || <tt>--input-file</tt> || use the following filename as a source of urls to download<br />
|}<br />
<br />
==Coordination==<br />
{| class="wikitable" style="text-align: center;"<br />
|-<br />
|colspan="4"|'''Who is handling which chunks of urls?'''<br />
|-<br />
! IRC name !! starting number !! ending number !! Progress <br />
|-<br />
| closure || 0 || 200,000 || complete<br />
|-<br />
| closure || 200,000 || 999,999 || in progress<br />
|-<br />
| jag || 1,000,000 || 2,000,000 || in progress<br />
|-<br />
| notakp || 2,000,000 || 3,000,000 || in progress<br />
|-<br />
| no2pencil || 3,000,000 || 3,999,999 || in progress<br />
|-<br />
| d8uv || 4,000,000 || 4,499,999 || in progress<br />
|-<br />
| Qwerty01 || 4,500,000 || 4,999,999 || getting started<br />
|-<br />
| Awsm || ??? || ??? || in progress?<br />
|-<br />
| underscor || ??? || ??? || in progress?<br />
|-<br />
| BlueMax|| ??? || ??? || in progress?<br />
|-<br />
| SketchCow || ??? || ??? || in progress?<br />
|-<br />
| [yourusernamehere] || x,000,000 || x,999,999 || in progress<br />
|-<br />
| [seriouslyeditme] || x,000,000 || x,999,999 || in progress<br />
|}<br />
<br />
==Tools==<br />
Some downloads capture the site's "currently performing site maintenance" page instead of a poem. To detect and re-fetch those, I created the following correction script:<br><br />
<br />
flist=`grep "currently performing site maintenance" *.html | cut -d: -f1`<br />
<br />
x=0<br />
for file in ${flist};<br />
do<br />
if [ -f ${file} ];<br />
then<br />
echo correcting ${file}<br />
html=`echo ${file} | cut -c5-11`<br />
wget -E <nowiki>http://www.poetry.com/poems/archiveteam/${html}/</nowiki> -O poem${html}.html 2>/dev/null<br />
echo done...<br />
x=`expr ${x} + 1`<br />
fi<br />
done<br />
<br />
if [ ${x} -eq 0 ]; <br />
then<br />
echo Directory clean<br />
else<br />
echo ${x} files corrected<br />
fi</div>No2pencil