Revision as of 20:33, 30 January 2013
AOL
Status: Online! on January 28, 2013
Archiving status: Researching
Archiving type: Unknown
IRC channel: #aohell (on hackint)
This page is about archiving the original AOL service, not AOL's current website. The AOL system is currently in major disrepair: it is as if the machines were left sitting in the datacenter and nothing gets fixed as they die. Much of the infrastructure is broken.
Protocol
- http://web.archive.org/web/20020205182212/http://www.aol-files.com/fdo91/index.html
- http://web.archive.org/web/20020329213511/http://www.aol-files.com/downloads/docs/index.shtml
- http://www.angelfire.com/sk2/Twisted/Anti-AOL.htm
- http://web.archive.org/web/20011011181824/http://www.aol-files.com/misc/theaolprotocol.wri
- http://sicexcels.tripod.com/rm-vpd.html
Reverse Engineering
The trunk version of Wireshark includes a dissector for the AOL protocol that breaks out the basic header information, such as the packet type and the token. It doesn't go into any detail about the contents of the packet, but it is a good start. It isn't available for download yet, so you'll have to build it yourself from the svn trunk; once built, Wireshark reports itself as version 1.9.0.
http://db48x.net/temp/Screenshot%20-%2001292013%20-%2008:28:31%20PM.png
[21:24:10] <db48x> there's a packet coming from the server with a token tf
[21:24:16] <db48x> the data has a filename in it
[21:24:59] <db48x> the data is in a series of packets with token FF and F7 (no explanation of the difference is available)
[21:25:24] <balrog_> but like when you view a file library
[21:25:34] <balrog_> how does it tell the server which library to display?
[21:25:36] <db48x> the last packet of the file has token F9
[21:25:42] <db48x> haven't figured that out yet
[21:25:56] <balrog_> ah
[21:26:01] <db48x> before this file in the capture there are packets with tokens EB and uJ going from the client to the server
[21:26:03] <balrog_> none of the documentation covers this?
[21:26:09] <balrog_> aaah
[21:26:44] <db48x> and mD
[21:26:51] <db48x> and tokens AT and tD coming back
[21:29:29] <db48x> looks like the tD coming back has the metadata in it
[21:30:50] <balrog_> http://sicexcels.tripod.com/~SicExcels/rm-vpd_info/TokenTypes_Basic.txt
[21:31:12] <balrog_> http://sicexcels.tripod.com/~SicExcels/rm-vpd_info/TokenTypes_Plus.txt
[21:31:16] <balrog_> quite incomplete
[21:33:26] <db48x> mD = download now, then
[21:34:31] <db48x> and an mF, file description
[21:34:41] <db48x> followed by an AT with a bunch of data
[21:35:35] <db48x> looks like labels for buttons like 'download now', 'download later', 'ask the staff', 'related files'
[21:35:56] <db48x> packet 538
[21:37:19] <db48x> continues in the next AT packet, 540, which looks like it has the description in it
[21:37:29] <db48x> talks about using ShrinkIt to unpack the file
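The download sequence described in the conversation above (data packets with tokens FF and F7, then a final packet with token F9) could be reassembled roughly as follows. This is only a sketch under assumptions: the input is a hypothetical list of (token, payload) pairs already extracted from a capture, the function name is made up for illustration, and it assumes the F9 packet carries the last piece of file data and that payloads contain nothing but file data. None of this has been confirmed against the real protocol.

```python
from typing import Iterable, Iterator, List, Tuple

# Tokens observed in the capture; the difference between FF and F7 is unknown.
DATA_TOKENS = {"FF", "F7"}
# Token seen on the last packet of a file.
LAST_TOKEN = "F9"


def reassemble_files(packets: Iterable[Tuple[str, bytes]]) -> Iterator[bytes]:
    """Yield one bytes object per completed file transfer.

    `packets` is a hypothetical stream of (token, payload) pairs in
    capture order, server-to-client only.
    """
    chunks: List[bytes] = []
    for token, payload in packets:
        if token in DATA_TOKENS:
            chunks.append(payload)
        elif token == LAST_TOKEN:
            # Assumption: the F9 packet also carries file data.
            chunks.append(payload)
            yield b"".join(chunks)
            chunks = []
        # Any other token (tf, AT, tD, ...) is ignored in this sketch.


if __name__ == "__main__":
    capture = [("tf", b"name"), ("FF", b"ab"), ("F7", b"cd"), ("F9", b"e")]
    for data in reassemble_files(capture):
        print(len(data), "bytes")
```

The filename from the tf packet and the metadata from the tD packet are deliberately left out, since their layout has not been worked out yet.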
URLs
aol://nnnn
- 1722: Keywords
- 2719: Chatrooms
- 3548: User profiles
- 4344: Interactive page
- 4400: File libraries
- 4401: Files
- 586x: ???
Examples
- aol://4344:1264.a2main.10029531.514525857
- aol://4400:8287
- aol://4344:1264.a2abt.10037404
- aol://4344:117.mtv.591130
- aol://4344:226.llll.2755674.520114429 (Access code: 3675)
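A scraper will need to sort these URLs by type. The following is a minimal sketch of that, using only the type numbers listed above; the function name and the "unknown" fallback are assumptions for illustration, and the 586x range is left unclassified because its meaning is not known.

```python
from typing import Tuple

# Type numbers taken from the list above; 586x is omitted because it is unknown.
URL_TYPES = {
    "1722": "Keywords",
    "2719": "Chatrooms",
    "3548": "User profiles",
    "4344": "Interactive page",
    "4400": "File libraries",
    "4401": "Files",
}


def parse_aol_url(url: str) -> Tuple[str, str, str]:
    """Split an aol:// URL into (type number, type name, remainder)."""
    prefix = "aol://"
    if not url.startswith(prefix):
        raise ValueError("not an aol:// URL: %r" % url)
    type_code, sep, rest = url[len(prefix):].partition(":")
    if not sep or not type_code.isdigit():
        raise ValueError("missing numeric type in %r" % url)
    return type_code, URL_TYPES.get(type_code, "unknown"), rest


if __name__ == "__main__":
    print(parse_aol_url("aol://4400:8287"))
    print(parse_aol_url("aol://4344:117.mtv.591130"))
```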
Sources
- http://web.archive.org/web/20060207004722/http://daol.aol.com/aolatoz
- http://aolhostages.tripod.com/oldused-KWs.txt
- http://www.oocities.org/sunsetstrip/club/5468/secretz.txt
- http://koin.org/files/aol.aim/aol/fdo/tools/software%20library%20list%20bmb_libs.xls
Lots of sources: Documents covering how to make AOL forms and various such things:
Samples of custom forms:
- http://www.mattmazur.com/archive/aol-files/downloads/tools/win/star/index.html - Open up the links under More Info
Some FDO lessons:
- http://www.mattmazur.com/archive/aol-files/fdo91/tutorial_lesson01.html
- http://www.mattmazur.com/archive/aol-files/fdo91/tutorial_lesson02.html
- http://www.mattmazur.com/archive/aol-files/fdo91/tutorial_lesson03.html
- http://web.archive.org/web/20010418134911/http://www.aol-files.com/fdo91/fdoman.html
About the class names:
Token list:
Here is an early version of aol-files.com:
Atoms list:
Structure
<balrog_> yes, but aol://4344:nnnn doesn't work without the extra
[19:52] <balrog_> aol://4344:1264.a2main.10029531 also works
<balrog_> simply aol://4344:1264.a2main does not work.
[20:17] <DrainLbry> so to summarize we've got aol://4400:ID (from spreadsheet), for file libraries, and aol://4344:uniqueidentifier for interactive content
[20:18] <balrog_> aol://4344:uniqueidentifier:ID
<balrog_> as per http://web.archive.org/web/20060207004722/http://daol.aol.com/aolatoz keywords used to be aol://1722:keyword
<balrog_> but that's no longer working
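The observations above suggest that a 4344 URL needs at least three dot-separated fields to resolve. A rough check based on that, inferred from only the handful of URLs tested in the conversation, might look like this. The function names are hypothetical, and the three-field rule is a guess rather than a documented part of the URL scheme.

```python
from typing import List


def interactive_fields(url: str) -> List[str]:
    """Return the dot-separated fields of an aol://4344: URL."""
    prefix = "aol://4344:"
    if not url.startswith(prefix):
        raise ValueError("not an aol://4344: URL: %r" % url)
    return url[len(prefix):].split(".")


def looks_complete(url: str) -> bool:
    """Guess whether a 4344 URL has enough fields to resolve.

    Assumption: three or more fields are needed, because
    1264.a2main failed while 1264.a2main.10029531 worked.
    """
    return len(interactive_fields(url)) >= 3


if __name__ == "__main__":
    for candidate in ("aol://4344:1264.a2main",
                      "aol://4344:1264.a2main.10029531"):
        print(candidate, looks_complete(candidate))
```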
Software
- http://koin.org/files/aol.aim/aol/mAOL/
- http://web.archive.org/web/20010713011523/http://www.aol-files.com/downloads/tools/win/dbview/index.html
- http://web.archive.org/web/20010128152400/http://www.aol-files.com/downloads/tools/mac.html
- http://www.mattmazur.com/archive/aol-files/downloads/tools/win/star/
- http://www.mattmazur.com/archive/aol-files/index.html
- http://www.ppcmla.com/downloads/
Goals
save forums/files/etc
AOL has a large number of forums on every topic, file libraries containing art, shareware, game mods, etc, etc. These should be fairly easy to enumerate, and once found it should be fairly easy to download all of the forum messages and files. Archives of these would be worth saving.
save everything
Every window that you can click on in AOL was created by a 'producer' at AOL. Many of them are essentially identical, file libraries for instance, but many are also one-offs created for a specific purpose. We ought to save these as well. Going this route will take a more thorough understanding of both the AOL protocol and the FDO scripts.
Plans
There are several ways to go about this, with tradeoffs that are only lightly explored.
custom scraper
Write a scraper in Python that understands the AOL protocol and FDO scripts, and writes everything to WARC files. WARC files save us much of the trouble of figuring out how to organize everything on disk. They also make it much easier to create a server that can pretend to be the AOL server, or that can translate into HTTP/HTML to allow anyone with a web browser to see what AOL was like.
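To give an idea of what the scraper's output could look like, here is a sketch that wraps a downloaded payload in a WARC 1.0 "resource" record using only the standard library. The function name is hypothetical, and using an aol:// URL as the WARC-Target-URI is an assumption about how we might choose to label records, not something any existing tool does. A real scraper would probably use an established WARC library instead.

```python
import uuid
from datetime import datetime, timezone


def warc_resource_record(target_uri: str, payload: bytes,
                         content_type: str = "application/octet-stream") -> bytes:
    """Build one uncompressed WARC/1.0 resource record as bytes."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        "WARC-Record-ID: <urn:uuid:%s>" % uuid.uuid4(),
        "WARC-Date: %s" % now,
        # Assumption: records are keyed by the aol:// URL they came from.
        "WARC-Target-URI: %s" % target_uri,
        "Content-Type: %s" % content_type,
        "Content-Length: %d" % len(payload),
    ]
    # Header block, blank line, payload, then the two CRLFs that end a record.
    return "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n" + payload + b"\r\n\r\n"


if __name__ == "__main__":
    record = warc_resource_record("aol://4400:8287", b"example payload")
    with open("aol-sample.warc", "ab") as out:
        out.write(record)
```

Appending records one after another to the same file gives a valid multi-record WARC, which is part of why the format is convenient here.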
wget-aol
Modify wget to support the AOL protocol. Very ambitious, but it would let us reuse wget's infrastructure, which may make the task easier; we'd be able to concentrate on just implementing the protocol and FDO parsing and leave the rest to wget. Would that reuse save us time, or would dealing with wget's internals drive us mad? Hard to say. This method would also allow us to create WARC files.
script the client
Drive the real AOL client, perhaps with debugging tools installed, in order to capture both the FDO sources and screenshots of the rendering. This is probably more fragile, but we wouldn't have to understand the actual protocol. We wouldn't be able to create WARC files this way.