https://wiki.archiveteam.org/api.php?action=feedcontributions&user=Yin&feedformat=atomArchiveteam - User contributions [en]2024-03-29T11:19:27ZUser contributionsMediaWiki 1.37.1https://wiki.archiveteam.org/index.php?title=Tumblr&diff=32783Tumblr2018-12-05T08:50:00Z<p>Yin: Update list.</p>
<hr />
<div>{{Infobox project<br />
| title = Tumblr<br />
| logo = Tumblr on white.png<br />
| image = Tumblr_staff_blog.png<br />
| URL = <nowiki>http://www.tumblr.com/</nowiki><br />
| project_status = {{online}}<br />
| archiving_status = {{nosavedyet}}<br />
| source = https://github.com/ArchiveTeam/tumblr-grab<br />
| irc = tumbledown<br />
}}<br />
<br />
[[Image:Yahoobuystumblr.gif]]<br />
<br />
'''Tumblr''' is a social networking microblog.<br />
<br />
[[Yahoo!]] has purchased Tumblr for 1.1 billion dollars. Tumblr allegedly [http://blogs.wsj.com/digits/2014/10/21/yahoo-tumblr-to-make-over-100-million-in-revenue-next-year/ doubled in number of blogs in 2014] and will become profitable in 2015.<br />
<br />
In December 2015, Yahoo put their Tumblr service into the "decide on" category in their Action Plan, according to their [http://www.wsj.com/public/resources/documents/yahoopresentation.pdf 2015 shareholder presentation].<br />
<br />
In June 2017, Tumblr tightened up "Safe mode", which restricts access to "sensitive content" and the viewing of blogs marked as explicit for all users below 18 years old, potentially causing a major move away from Tumblr due to Internet Backdraft from its users. Given Yahoo's tendency to ax things that become less popular than expected, it might be important to keep an eye on it.<br />
<br />
On 3 Dec 2018, [https://tumblr.zendesk.com/hc/en-us/articles/231885248-Sensitive-content Tumblr announced] that all NSFW content will be removed on 17 Dec 2018.<br />
<br />
== Quirks ==<br />
Users can change their account names into the format used for deleted accounts: specifically, USERNAME-deactivated-[any number of digits, 0-9]. Users who do this are inaccessible via their main account page or via direct links to their posts. Their posts will still show up in searches, and their "archive" URL will work. This doesn't seem to have an effect on the API, and tumblr-utils will still work just fine. For an example of this tomfoolery, see [http://diediedie3344-deactivated-204913.tumblr.com/archive the archive page of user "diediedie3344-deactivated-204913"].<br />
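As a rough illustration of the naming format described above, the following sketch recognises such names and builds the "archive" URL for them. The function names are hypothetical, and the pattern assumes at least one digit after "-deactivated-"; neither comes from Tumblr or from tumblr-utils.

```python
import re

# Assumed pattern: any blog name, the literal "-deactivated-", then one or
# more digits. The wiki text only says "any number of digits".
DEACTIVATED_RE = re.compile(r"^(?P<name>.+)-deactivated-(?P<digits>[0-9]+)$")


def parse_deactivated(blog_name):
    """Return (original_name, digits) if blog_name uses the format, else None."""
    match = DEACTIVATED_RE.match(blog_name)
    if match is None:
        return None
    return match.group("name"), match.group("digits")


def archive_url(blog_name):
    """Build the "archive" URL, which is reported to keep working for such names."""
    return "http://{}.tumblr.com/archive".format(blog_name)


print(parse_deactivated("diediedie3344-deactivated-204913"))
print(archive_url("diediedie3344-deactivated-204913"))
```

A check like this only looks at the name; it cannot tell a genuinely deleted account from one that was merely renamed into this format.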
<br />
Another quirk is that Tumblr accounts that appear to be on a different domain name are still accessible at, and show up in searches as, their account name. Trying to go to any page on the accountname.tumblr.com end redirects you to the same page on the custom-url-here.com end. For an example of this behavior, see [http://homosethsual.tumblr.com user "homosethsual"], which redirects to [http://ranpos.star.is/ ranpos.star.is].<br />
<br />
As of 30 July 2017, <s>it is no longer possible to access NSFW accounts outside of http://tumblr.com/blog/<name> URLs. Attempting to access an NSFW account normally will now cause infinite redirecting.</s> NSFW-marked Tumblrs are inaccessible to signed-out users.<br />
<br />
== Lists of Tumblr blogs ==<br />
* {{URL|https://transfer.sh/13Aa3n/tumblr.com.txt}} (2.6 million; [[Project Sonar]] 2018-10-26 FDNS data)<br />
* {{URL|https://files.catbox.moe/o1di6l.xz}} (~7 million, scraped in April 2018, CSV formatted with additional metadata: blog,url,post count,likes count, ... The 5th CSV field should be the is_nsfw indicator)<br />
* {{URL|https://files.catbox.moe/ve5hb4.xz}} (~12 million, scraped in April 2018, contains everything from above and more. Schema as stated in the first line of the file: tumblelog,url,posts,likes,adult,nsfw,groupchan)<br />
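A minimal sketch of reading the larger list, assuming the .xz file is a plain xz-compressed CSV whose first line is the header quoted above. How the adult and nsfw columns encode "true" is not stated, so the TRUTHY set, the function names and the sample rows below are all assumptions for illustration.

```python
import csv
import io
import lzma

# Assumption: strings that count as "true" in the nsfw column.
TRUTHY = {"1", "true", "t", "yes"}


def iter_blog_rows(path):
    """Yield one dict per CSV row, keyed by the header line of the file."""
    with lzma.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle)


def nsfw_blog_names(rows):
    """Return the tumblelog names of rows whose nsfw column looks truthy."""
    return [
        row["tumblelog"]
        for row in rows
        if (row.get("nsfw") or "").strip().lower() in TRUTHY
    ]


# Made-up sample rows in the stated schema, used instead of the real download.
SAMPLE = (
    "tumblelog,url,posts,likes,adult,nsfw,groupchan\n"
    "example-a,http://example-a.tumblr.com/,10,5,0,1,0\n"
    "example-b,http://example-b.tumblr.com/,3,0,0,0,0\n"
)
sample_rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(nsfw_blog_names(sample_rows))
```

For the real file, nsfw_blog_names(iter_blog_rows("ve5hb4.xz")) would stream the rows without decompressing the whole list to disk first; the local file name is only a guess at what the download is saved as.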
<br />
== Archiving ==<br />
As the sources above are incomplete, blogs to be archived can also be submitted using [https://docs.google.com/forms/d/e/1FAIpQLSdoYnlweKF-5iQ2G0FB9s7pDV_Le61dDU-gMMDsc8CQ50YBjQ/viewform?hl=en this form].<br />
<br />
== See also ==<br />
* [http://sourceforge.net/projects/gettumblrpics/ gettumblrpics], simple script to download images from a tumblr feed as they appear in it<br />
* [https://github.com/bbolli/tumblr-utils/ tumblr-utils], tumblr_backup.py can make a local backup of posts (XML default), video, audio and images. Uses APIv2<br />
* [https://github.com/woodenphone/tumblrsagi Tumblrsagi], Code to grab blogs from the API and stuff them into a database for rehosting, used by [https://t.archive.horse/blogs this tumblr archive]<br />
* [http://soup.io] can automatically mirror the contents of a tumblr blog as they are posted, which may be useful for maintaining an offsite-copy which can be archived later.<br />
* [https://www.jzab.de/content/tumblthree TumblThree], can archive an entire blog by feeding it a URL, including asks, text posts and reblogs to XML format, and can download all images. [https://github.com/johanneszab/TumblThree/releases/latest Downloadable here.] Windows only until the dev implements Mono support.<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Microblogging services]] [[Category:Yahoo!]]</div>Yinhttps://wiki.archiveteam.org/index.php?title=Tumblr&diff=32770Tumblr2018-12-04T04:35:38Z<p>Yin: Adding another list of tumblr blogs</p>
<hr />
<div>{{Infobox project<br />
| title = Tumblr<br />
| logo = Tumblr on white.png<br />
| image = Tumblr_staff_blog.png<br />
| URL = <nowiki>http://www.tumblr.com/</nowiki><br />
| project_status = {{online}}<br />
| archiving_status = {{nosavedyet}}<br />
| source = https://github.com/ArchiveTeam/tumblr-grab<br />
| irc = tumbledown<br />
}}<br />
<br />
[[Image:Yahoobuystumblr.gif]]<br />
<br />
'''Tumblr''' is a social networking microblog.<br />
<br />
[[Yahoo!]] has purchased Tumblr for 1.1 billion dollars. Tumblr allegedly [http://blogs.wsj.com/digits/2014/10/21/yahoo-tumblr-to-make-over-100-million-in-revenue-next-year/ doubled in number of blogs in 2014] and will become profitable in 2015.<br />
<br />
In December 2015, Yahoo put their Tumblr service into the "decide on" category in their Action Plan, according to their [http://www.wsj.com/public/resources/documents/yahoopresentation.pdf 2015 shareholder presentation].<br />
<br />
In June 2017, Tumblr tightened up "Safe mode", which restricts access to "sensitive content" and the viewing of blogs marked as explicit for all users below 18 years old, potentially causing a major move away from Tumblr due to Internet Backdraft from its users. Given Yahoo's tendency to ax things that become less popular than expected, it might be important to keep an eye on it.<br />
<br />
On 3 Dec 2018, [https://tumblr.zendesk.com/hc/en-us/articles/231885248-Sensitive-content Tumblr announced] that all NSFW content will be removed on 17 Dec 2018.<br />
<br />
== Quirks ==<br />
Users can change their account names into the format used for deleted accounts: specifically, USERNAME-deactivated-[any number of digits, 0-9]. Users who do this are inaccessible via their main account page or via direct links to their posts. Their posts will still show up in searches, and their "archive" URL will work. This doesn't seem to have an effect on the API, and tumblr-utils will still work just fine. For an example of this tomfoolery, see [http://diediedie3344-deactivated-204913.tumblr.com/archive the archive page of user "diediedie3344-deactivated-204913"].<br />
<br />
Another quirk is that Tumblr accounts that appear to be on a different domain name are still accessible at, and show up in searches as, their account name. Trying to go to any page on the accountname.tumblr.com end redirects you to the same page on the custom-url-here.com end. For an example of this behavior, see [http://homosethsual.tumblr.com user "homosethsual"], which redirects to [http://ranpos.star.is/ ranpos.star.is].<br />
<br />
As of 30 July 2017, <s>it is no longer possible to access NSFW accounts outside of http://tumblr.com/blog/<name> URLs. Attempting to access an NSFW account normally will now cause infinite redirecting.</s> NSFW-marked Tumblrs are inaccessible to signed-out users.<br />
<br />
== Lists of Tumblr blogs ==<br />
* {{URL|https://transfer.sh/13Aa3n/tumblr.com.txt}} (2.6 million; [[Project Sonar]] 2018-10-26 FDNS data)<br />
* {{URL|https://files.catbox.moe/o1di6l.xz}} (~7 million, scraped in April 2018, CSV formatted with additional metadata: blog,url,post count,likes count, ... The 6th CSV field should be the is_nsfw indicator)<br />
<br />
== See also ==<br />
* [http://sourceforge.net/projects/gettumblrpics/ gettumblrpics], simple script to download images from a tumblr feed as they appear in it<br />
* [https://github.com/bbolli/tumblr-utils/ tumblr-utils], tumblr_backup.py can make a local backup of posts (XML default), video, audio and images.<br />
* [https://github.com/woodenphone/tumblrsagi Tumblrsagi], Code to grab blogs from the API and stuff them into a database for rehosting, used by [https://t.archive.horse/blogs this tumblr archive]<br />
* [http://soup.io] can automatically mirror the contents of a tumblr blog as they are posted, which may be useful for maintaining an offsite-copy which can be archived later.<br />
* [https://www.jzab.de/content/tumblthree TumblThree], can archive an entire blog by feeding it a URL, including asks, text posts and reblogs to XML format, and can download all images. [https://github.com/johanneszab/TumblThree/releases/latest Downloadable here.] Windows only until the dev implements Mono support.<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Microblogging services]] [[Category:Yahoo!]]</div>Yinhttps://wiki.archiveteam.org/index.php?title=Wallhaven&diff=30418Wallhaven2018-03-10T12:19:44Z<p>Yin: Add Information about NSFW wallpapers and link to stats page.</p>
<hr />
<div>{{Infobox project<br />
| title = Wallhaven (Alpha Phase)<br />
| logo = Wallhaven_logo.png<br />
| image = Wallhaven.jpg<br />
| description = wallpaper repository<br />
| URL = http://alpha.wallhaven.cc<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
| irc = archiveteam<br />
}}<br />
<br />
'''wallhaven.cc''' is a store of wallpapers and other high-resolution media typically scraped from chans' /hr, /wg, and /w boards.<br />
<br />
It seems to be a replacement for the wallbase.cc project.<br />
<br />
==Overview==<br />
<br />
It is in alpha phase now. Content uploaded to alpha.wallhaven.cc will likely be deleted after that phase is over.<br />
<br />
The notice on the page reads:<br />
<br />
Alpha Notice: We are expecting to start fresh at the end of the alpha phase. The alpha is only intended as a sneak peek and a quick and dirty bug test.<br />
<br />
We should archive the content on alpha.wallhaven.cc.<br />
<br />
Alpha phase ending soon.<br />
https://alpha.wallhaven.cc/forums/post/16193#post-16193<br />
<br />
Yeah, we're very close to ending alpha now. There will be a few "big" changes that will be breaking for a lot of 3rd parties, so we're going to try and make sure to have a decent plan to allow a bit of time to react before we just break everything for them.<br />
<br />
==Work thus far==<br />
<br />
Some page analysis.<br />
<br />
==Site Specifics==<br />
<br />
The structure is very similar to wallbase.cc. Scraping is very easy. Some URLs have changed a bit.<br />
<br />
Stats: https://alpha.wallhaven.cc/stats<br />
<br />
Data:<br />
* Categories: alpha.wallhaven.cc/tags/''id''<br />
* Tags: alpha.wallhaven.cc/tag/''id''<br />
* Wallpapers: alpha.wallhaven.cc/wallpaper/''id''<br />
* Users: alpha.wallhaven.cc/user/''id''<br />
<br />
Media:<br />
* Wallpapers: alpha.wallhaven.cc/wallpapers/full/wallhaven-ID(.jpg/.png)<br />
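The URL patterns listed above could be turned into candidate URLs for a grab roughly as follows. This is only a sketch: the https scheme, the function names and the numeric example IDs are assumptions, and since the extension of a given wallpaper is not known in advance, both listed candidates are returned.

```python
# Assumption: https is used, as in the stats link; the lists give no scheme.
BASE = "https://alpha.wallhaven.cc"

# Path segment for each kind of data page, taken from the "Data" list.
PAGE_PATHS = {
    "category": "tags",
    "tag": "tag",
    "wallpaper": "wallpaper",
    "user": "user",
}


def page_url(kind, item_id):
    """Build the page URL for a category, tag, wallpaper or user ID."""
    return "{}/{}/{}".format(BASE, PAGE_PATHS[kind], item_id)


def full_image_urls(wallpaper_id):
    """Return both candidate full-size image URLs (.jpg and .png)."""
    return [
        "{}/wallpapers/full/wallhaven-{}{}".format(BASE, wallpaper_id, ext)
        for ext in (".jpg", ".png")
    ]


print(page_url("wallpaper", 42))
print(full_image_urls(42))
```

A grabber would request the candidates in order and keep whichever one exists, ideally with a delay between requests given how slowly the site responds.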
<br />
Other notes:<br />
* You have to be logged in to view NSFW flagged wallpapers.<br />
* Tags can have aliases. This seems to be new. It's kinda cool, I think.<br />
* Either the domain implements rate limiting, or the infrastructure is a lot slower than the wallbase.cc infrastructure.<br />
<br />
{{Navigation box}}<br />
[[Category:Image hosting]]</div>Yinhttps://wiki.archiveteam.org/index.php?title=Talk:Wallhaven&diff=30417Talk:Wallhaven2018-03-10T12:16:50Z<p>Yin: Update on scrape process</p>
<hr />
<div>Started grabbing all wallpapers with tags. I'm using wpull, so I should be able to provide WARCs as well. Will report back in a few days.<br />
<br />
--[[User:Yin|Yin]] 14:51, 02 March 2018 (UTC)<br />
<br />
Initial scrape done. Got ~71% of all wallpapers if I compare my number with the one listed on https://alpha.wallhaven.cc/stats. Turns out you have to be logged in to see NSFW flagged wallpapers. Going to do another scrape and will report back again.<br />
<br />
--[[User:Yin|Yin]] 12:12, 10 March 2018 (UTC)</div>Yinhttps://wiki.archiveteam.org/index.php?title=Talk:Wallhaven&diff=30395Talk:Wallhaven2018-03-02T13:53:29Z<p>Yin: grabbing wallhaven</p>
<hr />
<div>Started grabbing all wallpapers with tags. I'm using wpull, so I should be able to provide WARCs as well. Will report back in a few days.<br />
<br />
--[[User:Yin|Yin]] 14:51, 02 March 2018 (UTC)</div>Yinhttps://wiki.archiveteam.org/index.php?title=Wallhaven&diff=30394Wallhaven2018-03-02T13:47:12Z<p>Yin: update info about alpha and stats</p>
<hr />
<div>{{Infobox project<br />
| title = Wallhaven (Alpha Phase)<br />
| logo = Wallhaven_logo.png<br />
| image = Wallhaven.jpg<br />
| description = wallpaper repository<br />
| URL = http://alpha.wallhaven.cc<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
| irc = archiveteam<br />
}}<br />
<br />
'''wallhaven.cc''' is a store of wallpapers and other high-resolution media typically scraped from chans' /hr, /wg, and /w boards.<br />
<br />
It seems to be a replacement for the wallbase.cc project.<br />
<br />
==Overview==<br />
<br />
It is in alpha phase now. Content uploaded to alpha.wallhaven.cc will likely be deleted after that phase is over.<br />
<br />
The notice on the page reads:<br />
<br />
Alpha Notice: We are expecting to start fresh at the end of the alpha phase. The alpha is only intended as a sneak peek and a quick and dirty bug test.<br />
<br />
We should archive the content on alpha.wallhaven.cc.<br />
<br />
Alpha phase ending soon.<br />
https://alpha.wallhaven.cc/forums/post/16193#post-16193<br />
<br />
Yeah, we're very close to ending alpha now. There will be a few "big" changes that will be breaking for a lot of 3rd parties, so we're going to try and make sure to have a decent plan to allow a bit of time to react before we just break everything for them.<br />
<br />
==Work thus far==<br />
<br />
Some page analysis.<br />
<br />
==Site Specifics==<br />
<br />
The structure is very similar to wallbase.cc. Scraping is very easy. Some URLs have changed a bit.<br />
<br />
Stats:<br />
* Around 700k wallpapers so far.<br />
* Per day around 2.5k new wallpapers are uploaded.<br />
<br />
Data:<br />
* Categories: alpha.wallhaven.cc/tags/''id''<br />
* Tags: alpha.wallhaven.cc/tag/''id''<br />
* Wallpapers: alpha.wallhaven.cc/wallpaper/''id''<br />
* Users: alpha.wallhaven.cc/user/''id''<br />
<br />
Media:<br />
* Wallpapers: alpha.wallhaven.cc/wallpapers/full/wallhaven-ID(.jpg/.png)<br />
<br />
Other notes:<br />
* Tags can have aliases. This seems to be new. It's kinda cool, I think.<br />
* Either the domain implements rate limiting, or the infrastructure is a lot slower than the wallbase.cc infrastructure.<br />
<br />
{{Navigation box}}<br />
[[Category:Image hosting]]</div>Yin