Difference between revisions of "Puu.sh"

From Archiveteam

Revision as of 11:49, 17 January 2017

puu.sh
Puu.sh logo
That puush could not be found
URL http://puush.me/
Status Special case
Archiving status Saved! ~17 TB of data
Archiving type Unknown
Project source https://github.com/ArchiveTeam/puush-grab
Project tracker http://chfoo-d1.mooo.com:8031/puush/
IRC channel #pushharder (on hackint)

puu.sh is a file sharing service that was created in 2010.

How to Help

If you are comfortable running scripts manually (i.e., outside the Warrior), go to the GitHub repo for information on how to run the scripts.

Where can I find a file?

If you know the item ID, go to the Wayback Machine and enter the URL as http://puu.sh/XXXXX without any filename extension. The Wayback Machine treats the URL as case-insensitive, so you may need to check several results to find the one you are looking for.

If the Puush is private, it is unlikely to have been archived, because we do not guess the access code (the string of characters after the item ID). You can, however, use wildcards as a way of browsing the Wayback Machine. Here's an example.
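
As a rough sketch of the lookups described above, the following helpers build Wayback Machine URLs for an item ID. The function names are hypothetical, and the URL patterns (the `/web/*/` listing form with a trailing `*` wildcard, and the CDX search endpoint with its `url`, `output`, and `limit` parameters) are assumptions about the Wayback Machine's public interface rather than anything taken from the project scripts. Nothing here contacts the network; it only constructs strings.

```python
from urllib.parse import urlencode

# Assumed Wayback Machine endpoints (not taken from the puush-grab scripts).
WAYBACK_LISTING = "https://web.archive.org/web/*/"
WAYBACK_CDX = "https://web.archive.org/cdx/search/cdx"


def wayback_browse_url(item_id: str, wildcard: bool = False) -> str:
    """Build a capture-listing URL for http://puu.sh/<item_id>.

    With wildcard=True a trailing '*' is appended, which asks for every
    archived URL that starts with the item ID (for example, private
    puushes whose access code is not known).
    """
    target = f"http://puu.sh/{item_id}"
    if wildcard:
        target += "*"
    return WAYBACK_LISTING + target


def cdx_query_url(item_id: str, limit: int = 50) -> str:
    """Build a CDX search URL listing captures whose URL starts with the item ID."""
    params = {"url": f"puu.sh/{item_id}*", "output": "json", "limit": str(limit)}
    # Keep '/' and '*' readable instead of percent-encoding them.
    return WAYBACK_CDX + "?" + urlencode(params, safe="/*")


if __name__ == "__main__":
    # "XXXXX" is the placeholder item ID used in the text above.
    print(wayback_browse_url("XXXXX"))
    print(wayback_browse_url("XXXXX", wildcard=True))
    print(cdx_query_url("XXXXX"))
```

Because lookups are case-insensitive, the listing may mix captures of several distinct items whose IDs differ only in letter case.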

Archives

Archives are uploaded to the Archive Team Puush collection. These are the original WARC files. They are 10 GB in size instead of the typical 50 GB because the project is staged on cloud hosting with limited disk space.

Tracker information

  • The tracker and rsync target is being run by User:Chfoo.
  • On 2013-08-22, Redis was unable to background save due to failed fork().
  • On 2013-08-27, an attempt was made to clear out the tracker log. Redis crashed.
  • On 2014-01-01, an old, vulnerable auto-queue script attempted to load 36,000,000 items. Redis was killed by OOM killer. (Offending Tweet).
  • On 2014-04-27, the IP addresses of the tracker and of regular clients were banned. Puush also switched the default pool to private and added a robots.txt file.
  • On 2014-05-28, the tracker was officially decommissioned.

Logs

Ranges

Date Loaded  Start (Base 10)    End (Base 10)      Alphabet  Notes
2013-08-06   0 (0)              3UXX3 (51607749)   Legacy    At most 10 URLs per item
2013-08-27   10 (62)            3UXX3 (51607749)   Legacy    At most 13 URLs per item (unlucky 13)
2013-09-08   3UXX4 (51607750)   49999 (61285459)   Legacy    At most 13 URLs per item
2013-09-13   4999a (61285460)   4mPOO (64547754)   Puush     At most 13 URLs per item
2013-09-15   4mPOP (64547755)   4rrrr (65645689)   Puush     At most 13 URLs per item
2013-09-16   4rrrs (65645690)   4sQ00 (65978416)   Puush     At most 13 URLs per item
-            4sQ01 (65978417)   -                  Puush     At most 13 URLs per item. Auto-queues using a script that checks Twitter.
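
The base-10 values in the table can be reproduced with an ordinary base-62 conversion. The sketch below is illustrative only: the function names are hypothetical, and the two alphabet orderings are inferred from the table's own values (digits, then uppercase, then lowercase for "Legacy"; digits, then lowercase, then uppercase for "Puush"), not copied from the project scripts.

```python
import string

# Assumed orderings, inferred from the table values above.
LEGACY_ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
PUUSH_ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def decode(shortcode: str, alphabet: str) -> int:
    """Convert a shortcode to its base-10 value using the given alphabet."""
    base = len(alphabet)
    value = 0
    for ch in shortcode:
        value = value * base + alphabet.index(ch)  # ValueError on unknown chars
    return value


def encode(number: int, alphabet: str) -> str:
    """Convert a non-negative base-10 value to a shortcode."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while number:
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))


if __name__ == "__main__":
    print(decode("3UXX3", LEGACY_ALPHABET))   # 51607749, as in the table
    print(decode("4mPOO", PUUSH_ALPHABET))    # 64547754, as in the table
    print(encode(65978417, PUUSH_ALPHABET))   # 4sQ01, as in the table
```

Under these assumed orderings the two alphabets agree on digit-only codes such as 49999 and differ only in how letters map to values, which is consistent with the switch from Legacy to Puush at 4999a in the table.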

Statistics are occasionally updated on a Puush ID Increment Stats spreadsheet.

Ideas

  • Keep accessing each and every file - likely unsustainable in the long run in the event that expiry times are shortened
  • Grab everything - the site appears to use incremental image IDs

Shortcode Stats

Number of shortcodes:	 526
Number of string lengths:	 3
3 	 5 	   0.951%
4 	 125 	  23.764%
5 	 396 	  75.285%
Number of unique characters:	 62
Number of characters used:	 2495
0 	 24 	   0.962%
1 	 155 	   6.212%
2 	 234 	   9.379%
3 	 121 	   4.850%
4 	 24 	   0.962%
5 	 45 	   1.804%
6 	 26 	   1.042%
7 	 37 	   1.483%
8 	 25 	   1.002%
9 	 34 	   1.363%
A 	 46 	   1.844%
B 	 37 	   1.483%
C 	 46 	   1.844%
D 	 38 	   1.523%
E 	 36 	   1.443%
F 	 42 	   1.683%
G 	 33 	   1.323%
H 	 31 	   1.242%
I 	 37 	   1.483%
J 	 32 	   1.283%
K 	 38 	   1.523%
L 	 35 	   1.403%
M 	 28 	   1.122%
N 	 39 	   1.563%
O 	 31 	   1.242%
P 	 44 	   1.764%
Q 	 28 	   1.122%
R 	 36 	   1.443%
S 	 31 	   1.242%
T 	 26 	   1.042%
U 	 29 	   1.162%
V 	 32 	   1.283%
W 	 45 	   1.804%
X 	 30 	   1.202%
Y 	 29 	   1.162%
Z 	 30 	   1.202%
a 	 34 	   1.363%
b 	 39 	   1.563%
c 	 32 	   1.283%
d 	 46 	   1.844%
e 	 27 	   1.082%
f 	 30 	   1.202%
g 	 39 	   1.563%
h 	 38 	   1.523%
i 	 30 	   1.202%
j 	 34 	   1.363%
k 	 24 	   0.962%
l 	 29 	   1.162%
m 	 40 	   1.603%
n 	 40 	   1.603%
o 	 38 	   1.523%
p 	 25 	   1.002%
q 	 26 	   1.042%
r 	 34 	   1.363%
s 	 23 	   0.922%
t 	 45 	   1.804%
u 	 36 	   1.443%
v 	 27 	   1.082%
w 	 32 	   1.283%
x 	 45 	   1.804%
y 	 26 	   1.042%
z 	 22 	   0.882%
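
A report of this shape can be produced with a couple of counters. The following is a minimal sketch, not the script that generated the numbers above; the function names are hypothetical and the sample shortcodes in the usage section are made up for illustration.

```python
from collections import Counter


def shortcode_stats(shortcodes):
    """Return (length_counts, char_counts, total_chars) for a list of shortcodes."""
    length_counts = Counter(len(code) for code in shortcodes)
    char_counts = Counter(ch for code in shortcodes for ch in code)
    total_chars = sum(char_counts.values())
    return length_counts, char_counts, total_chars


def format_report(shortcodes):
    """Format the statistics in a layout similar to the listing above."""
    length_counts, char_counts, total_chars = shortcode_stats(shortcodes)
    lines = [
        f"Number of shortcodes:\t {len(shortcodes)}",
        f"Number of string lengths:\t {len(length_counts)}",
    ]
    # One row per shortcode length: length, count, share of all shortcodes.
    for length in sorted(length_counts):
        count = length_counts[length]
        lines.append(f"{length} \t {count} \t {count / len(shortcodes):8.3%}")
    lines.append(f"Number of unique characters:\t {len(char_counts)}")
    lines.append(f"Number of characters used:\t {total_chars}")
    # One row per character: character, count, share of all characters.
    # sorted() orders digits, then uppercase, then lowercase (ASCII order).
    for ch in sorted(char_counts):
        count = char_counts[ch]
        lines.append(f"{ch} \t {count} \t {count / total_chars:8.3%}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = ["1aB", "2cDe", "3fGhI"]  # made-up shortcodes, not real items
    print(format_report(sample))
```

The totals in the listing are internally consistent with this method: 5 codes of length 3, 125 of length 4, and 396 of length 5 give 15 + 500 + 1980 = 2495 characters.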

How many items are there?

<chfoo> [...] using the decentralized script i wrote, i've grabbed [randomly] 3824 items (totalling 785M) out of 6409 requests (a 60% hit rate at a max id of "40000" or 59,105,344). so, in theory, there's 35,463,206 items based on this sample and max id.
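
The estimate in the quote is a simple extrapolation: hit rate times the size of the sampled ID space. The sketch below reproduces it with integer arithmetic. The function name is hypothetical, and the calculation assumes the requests were spread uniformly at random over IDs below the maximum, which the quote implies but does not state. The quoted figure matches the rounded 60% hit rate; the unrounded ratio 3824/6409 gives a slightly lower number.

```python
MAX_ID = 4 * 62 ** 4  # shortcode "40000" in base 62 = 59,105,344


def estimate_item_count(hits: int, requests: int, max_id: int) -> int:
    """Extrapolate the number of existing items from a random sample.

    Assumes the sampled IDs are uniformly distributed over [0, max_id).
    """
    if requests <= 0:
        raise ValueError("requests must be positive")
    return hits * max_id // requests


if __name__ == "__main__":
    # Rounded 60% hit rate, as used in the quote.
    print(estimate_item_count(60, 100, MAX_ID))     # 35463206
    # Unrounded sample ratio from the same quote.
    print(estimate_item_count(3824, 6409, MAX_ID))  # 35265850
```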