Difference between revisions of "NeoGAF"
Revision as of 10:28, 25 October 2017
|Archiving status||Not saved yet|
|IRC channel||(on EFnet)|
NeoGAF is a forum dedicated to discussing video games and related topics in the video gaming industry. It was known as the Gaming-Age Forums from its inception in 1999 until April 2006. It is among the largest vBulletin forums in existence, with the site reporting just shy of 120 million posts spread across about 830K threads.
The site is extremely unstable in the wake of a sexual harassment scandal with accusations involving the site's owner. This has sparked Reddit-like riots on the forums, with several moderators resigning and users asking to be banned from the site. Site performance has tanked, and the site is completely unavailable at times.
The site runs on vBulletin, forum software that is also used by many other forums, such as Valve's Steam Users' Forums.
All thread URLs are in this form:
Where $thread_id is between 1 and approximately 1500000 (as of October 24, 2017).
The $page_count, which is present only in threads with more than one page, can be found within the HTML, e.g.:
<li class="pageof"> Page 1 of 32 </li>
The $page_count can be extracted using an HTML parser that supports CSS-style selectors. First, select the "li" tag with the class name "pageof". Then extract the text within the "li" tag and trim any whitespace. Finally, apply a regex to extract the $page_count.
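The extraction steps above can be sketched roughly as follows. This is a minimal illustration, not a tested scraper: it swaps the CSS-selector parser for Python's standard-library `html.parser` so that it has no dependencies, the names `PageOfParser` and `extract_page_count` are made up for this example, and returning 1 when no "pageof" element is present is an assumption based on the note that the element appears only in multi-page threads.

```python
import re
from html.parser import HTMLParser


class PageOfParser(HTMLParser):
    """Collects the text inside the first <li class="pageof"> element."""

    def __init__(self):
        super().__init__()
        self._inside = False   # currently inside the target <li>
        self._done = False     # first matching <li> already closed
        self.text = ""

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "pageof" in classes and not self._done:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "li" and self._inside:
            self._inside = False
            self._done = True

    def handle_data(self, data):
        if self._inside:
            self.text += data


def extract_page_count(html: str) -> int:
    """Return the page count of a thread page (hypothetical helper)."""
    parser = PageOfParser()
    parser.feed(html)
    # Trim whitespace, then pull the second number out of "Page X of Y".
    match = re.search(r"Page\s+\d+\s+of\s+(\d+)", parser.text.strip())
    # Assumption: no "pageof" element means the thread has a single page.
    return int(match.group(1)) if match else 1


print(extract_page_count('<li class="pageof"> Page 1 of 32 </li>'))
```

With a selector-capable parser, the class lookup would collapse to a single `li.pageof` query; the trimming and regex steps stay the same.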
Once the page count is extracted, the URL to any other page is:
where $page_number is between 1 and $page_count.
Some forums, including the archive subforum, require you to log in to view the threads.
Both valid and invalid pages return an HTTP 200 status code, which represents success. To determine whether a page was fetched correctly, you therefore cannot rely on the status code alone; you must parse the HTML and test whether the page contains actual posts or an explicit message stating that the thread does not exist.
To test whether a thread actually contains posts, you can use an HTML parser and check whether any tags exist with classes named "post", "postbit-details-username", etc.
If a thread DOES_NOT_EXIST, the page will contain the following HTML code:
<td class="tcat">NeoGAF Message</td>
</tr>
<tr>
<td class="panelsurround" align="center">
  <div class="panel">
    <div align="left">
      <div style="margin: 12px">No Thread specified. If you followed a valid link, please notify the <a href="sendmessage.php">administrator</a></div>
    </div>
  </div>
  <!--
  <div style="margin-top:8px">
    <input type="submit" class="button" value="Go Back" accesskey="s" onclick="history.back(1); return false" />
  </div>
  -->
</td>
Similarly, if a thread is INVALID, the page will display "NeoGAF Message" along with the text "Invalid Thread specified. If you followed a valid link, please notify the administrator".
If you are NOT_AUTHORIZED to access a thread (e.g. a thread is in the moderators-only forum), the page will display "NeoGAF Message" and a login form containing the text "You are not logged in or you do not have permission to access this page. This could be due to one of several reasons:".
If there are no posts in the thread and it displays none of those messages, then it is an UNKNOWN_ERROR. An UNKNOWN_ERROR may be a connection problem or a temporary server failure in serving the page, so it's best to re-fetch the threads with UNKNOWN_ERRORs at a later time.
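The validation rules above could be combined into a single classifier along these lines. This is only a sketch under stated assumptions: it uses the standard-library `html.parser` instead of a selector-based parser, checks only the two class names mentioned above, matches the error messages by plain substring search, and introduces the hypothetical names `PostDetector` and `classify_page` plus an "OK" label that does not appear elsewhere on this page.

```python
from html.parser import HTMLParser

# Class names taken from the text above; the real markup may use more.
POST_CLASSES = {"post", "postbit-details-username"}


class PostDetector(HTMLParser):
    """Sets .found when any tag carries one of the post-related classes."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        if classes & POST_CLASSES:
            self.found = True


def classify_page(html: str) -> str:
    """Classify a fetched thread page (hypothetical helper)."""
    detector = PostDetector()
    detector.feed(html)
    if detector.found:
        return "OK"  # label assumed for this sketch
    if "NeoGAF Message" in html:
        if "Invalid Thread specified" in html:
            return "INVALID"
        if "No Thread specified" in html:
            return "DOES_NOT_EXIST"
        if "You are not logged in or you do not have permission" in html:
            return "NOT_AUTHORIZED"
    # Neither posts nor a known message: re-fetch this thread later.
    return "UNKNOWN_ERROR"


print(classify_page('<td class="tcat">NeoGAF Message</td> No Thread specified.'))
```

Posts are checked before the error messages so that a valid thread whose posts happen to quote one of the error strings is not misclassified.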