Polygamia.pl

From Archiveteam
(Article from http://pastebin.com/6rrWQ1Gd)
 

Revision as of 06:00, 21 February 2016

Polygamia.pl is a Polish video gaming news website. It was part of the gazeta.pl media portal until December 2015, when it was bought by another company.

In February 2016, the website announced an upcoming major redesign, which is currently available as a preview at http://new.polygamia.pl. While all old articles will apparently be retained, it is unknown what will happen to old user accounts (which were tied to the gazeta.pl portal): the new.polygamia.pl portal requires users to register a new account, and while all articles are available on it[*], user comments and blogs are missing. Despite numerous questions from users, the website administrators have not yet made an official statement on the matter.

In particular, it is unknown what will happen to 1) user blogs and 2) comments under articles, both of which are tied to the currently existing accounts.

[*] This includes numerous very old pre-2009 articles, which have no comments to archive.

Article comments

The article URLs are of the form http://polygamia.pl/Polygamia/1,107162,19605465,zapraszamy-na-nowa-polygamie-w-early-access.html
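As a rough illustration of this URL form, the Python sketch below splits an article URL into its comma-separated fields. The field names (kind, section_id, article_id) and the helper parse_article_url are hypothetical: the site does not document what the numbers mean. In the example URL, the third number (19605465) is the same value the site puts in obxx= on the "show all comments" page.

```python
import re
from typing import NamedTuple
from urllib.parse import urlsplit


class ArticleRef(NamedTuple):
    # Hypothetical field names; the meaning of the numbers is not documented.
    kind: str        # first number, e.g. "1"
    section_id: str  # second number, e.g. "107162"
    article_id: str  # third number, e.g. "19605465"
    slug: str        # title slug, e.g. "zapraszamy-na-nowa-polygamie-w-early-access"


# Assumed path shape: /Polygamia/<n>,<n>,<n>,<slug>.html
_ARTICLE_PATH = re.compile(r"^/Polygamia/(\d+),(\d+),(\d+),([^/]+)\.html$")


def parse_article_url(url: str) -> ArticleRef:
    """Split a Polygamia article URL into its numeric fields and slug.

    The query string and fragment are ignored, so a "show all comments"
    URL parses to the same fields as the plain article URL.
    """
    match = _ARTICLE_PATH.match(urlsplit(url).path)
    if match is None:
        raise ValueError(f"not an article URL: {url}")
    return ArticleRef(*match.groups())
```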

Comments are shown under each article, but articles with a large number of comments use a limited 'infinite scroll' system: scrolling to the bottom of the page loads several more comments. After two or three rounds of this, the page shows a link to the "show all comments" page instead. The URL of that page has the form http://polygamia.pl/Polygamia/1,107162,19605465,zapraszamy-na-nowa-polygamie-w-early-access.html?v=1&obxx=19605465&offset=19#opinions

In other words, it is the article URL with several additional parameters:

  • v=1 hides the full text of the article and shows only the heading.
  • obxx= marks this as the "show all comments" page. Its value does not seem to matter: even when it is blank, the comments are shown correctly.
  • offset=19 skips the first several comments (those already shown by the infinite scroll). Removing this parameter gives you every single comment.
  • There is also a page= parameter (starting at 1). However, it is rarely needed: only a small fraction of articles need more than one page to fit all their comments.
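The parameter rules above can be sketched in Python as a function that rewrites an article URL into its "show all comments" form. This is a minimal sketch assuming the rules hold for every article; all_comments_url is a hypothetical helper name, and leaving obxx blank relies on the observation that its value does not matter.

```python
from typing import Optional
from urllib.parse import urlencode, urlsplit, urlunsplit


def all_comments_url(article_url: str, page: Optional[int] = None) -> str:
    """Build the "show all comments" URL for a Polygamia article.

    Applies the parameter rules observed on the site:
      v=1     show only the article heading, not the full text
      obxx=   marks the "show all comments" page; left blank here
      offset  omitted, so no comments are skipped
      page    added only when requested (pages start at 1)
    """
    parts = urlsplit(article_url)
    params = {"v": "1", "obxx": ""}
    if page is not None:
        if page < 1:
            raise ValueError("page numbers start at 1")
        params["page"] = str(page)
    # Any existing query (such as offset=19) is discarded; the #opinions
    # fragment mirrors the link the site itself generates.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), "opinions"))
```

To build a full list, map this over every article URL, e.g. `[all_comments_url(u) for u in article_urls]`; the page argument is only needed for the few articles whose comments do not fit on one page.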

So it is possible to build a list of specially prepared URLs, with parameters chosen to display all comments. Such a list (up to date as of Feb 20) is here: https://dl.dropboxusercontent.com/u/89415187/polygamia.pl_article_comments.txt

One downside is that these URLs are not linked directly from the article pages, so once they are in the Internet Archive (IA), end users may have some trouble finding them. There is no real way around that.

Blogs

A list of every single blog post, in reverse chronological order, is at http://polygamia.pl/blogi. Scraping all the blog post URLs from it is easy, either by hand or by pointing a web scraper at this address and telling it to scrape everything in that folder.
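For the scraper route, here is a minimal Python sketch that pulls post links out of the HTML of the listing page using only the standard library. It assumes the listing links to posts with ordinary <a href> tags whose paths look like /blogi/<user>/<yyyy>/<mm>/<slug>/<n>; blog_post_urls and LISTING_URL are hypothetical names, and fetching the page itself (for example with urllib.request) is left out. Filtering by pattern, rather than crawling the whole folder, keeps author profile links and navigation out of the list.

```python
import re
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin

LISTING_URL = "http://polygamia.pl/blogi/"
# Assumed post path: /blogi/<user>/<yyyy>/<mm>/<slug>/<n>
_POST_URL = re.compile(r"^https?://polygamia\.pl/blogi/[^/]+/\d{4}/\d{2}/[^/]+/\d+/?$")


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def blog_post_urls(listing_html: str, base: str = LISTING_URL) -> List[str]:
    """Return the unique blog post URLs linked from a listing page, in page order."""
    collector = _LinkCollector()
    collector.feed(listing_html)
    seen = set()
    posts = []
    for href in collector.hrefs:
        url = urljoin(base, href)  # resolve relative links against the listing page
        if _POST_URL.match(url) and url not in seen:
            seen.add(url)
            posts.append(url)
    return posts
```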

A blog post has a URL of the form http://polygamia.pl/blogi/kumasztotera/2012/05/jak_zjezdzilem_pol_warszawy_by_kupic_gre_ktora_miala_dzis_premiere/2 ("kumasztotera" is the author's username, followed by the date and the title. The significance of the number at the end is unknown, but it is necessary.)
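A small Python sketch of that URL form, assuming every post URL has exactly these five path segments after /blogi/. BlogPostRef, parse_blog_url and the field names are hypothetical; to_url always keeps the trailing number, since it is necessary even though its meaning is unknown. One possible use is grouping scraped post URLs by author or by month.

```python
import re
from typing import NamedTuple
from urllib.parse import urlsplit

# Assumed path shape: /blogi/<user>/<yyyy>/<mm>/<slug>/<n>
_BLOG_PATH = re.compile(r"^/blogi/([^/]+)/(\d{4})/(\d{2})/([^/]+)/(\d+)/?$")


class BlogPostRef(NamedTuple):
    author: str   # username, e.g. "kumasztotera"
    year: int
    month: int
    slug: str     # title slug
    suffix: str   # trailing number; meaning unknown, but required

    def to_url(self) -> str:
        # Always rebuild with the trailing number; the URL needs it.
        return (f"http://polygamia.pl/blogi/{self.author}/{self.year:04d}/"
                f"{self.month:02d}/{self.slug}/{self.suffix}")


def parse_blog_url(url: str) -> BlogPostRef:
    """Split a blog post URL into author, date, slug and trailing number."""
    match = _BLOG_PATH.match(urlsplit(url).path)
    if match is None:
        raise ValueError(f"not a blog post URL: {url}")
    author, year, month, slug, suffix = match.groups()
    return BlogPostRef(author, int(year), int(month), slug, suffix)
```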

Comments are displayed directly beneath a blog post. Only a handful of blog posts have so many comments that they have a "show all comments" link (no infinite scrolling here), whose URL follows the same rules as explained in the "Article comments" section.

Sources

This article is from [1].