Time |
Nickname |
Message |
00:00
🔗
|
closure |
once again I find myself signing up for a dying platform. gotta love it |
00:00
🔗
|
balrog_ |
you can't sign up |
00:00
🔗
|
balrog_ |
the signup page is broken |
00:00
🔗
|
closure |
lol |
00:01
🔗
|
balrog_ |
however... http://www.bugmenot.com/view/posterous.com |
00:01
🔗
|
balrog_ |
I just reset the password of the first one to what's listed on bugmenot |
00:02
🔗
|
balrog_ |
hmm |
00:03
🔗
|
balrog_ |
there's an "id" key |
00:03
🔗
|
balrog_ |
I wonder if that's serial |
00:03
🔗
|
balrog_ |
and an api call to retrieve by id |
00:03
🔗
|
balrog_ |
yeah it seems serial |
00:03
🔗
|
balrog_ |
GET sites/:id |
00:03
🔗
|
balrog_ |
closure: you paying attention? |
00:04
🔗
|
balrog_ |
weird, the highest id that works is 299999 |
00:04
🔗
|
closure |
heh |
00:04
🔗
|
balrog_ |
hmm no |
00:04
🔗
|
balrog_ |
300001 works too |
00:04
🔗
|
closure |
missing users |
00:04
🔗
|
closure |
so you have a list? |
00:05
🔗
|
balrog_ |
no, I'm using the api at https://posterous.com/api |
00:05
🔗
|
balrog_ |
after logging in with bugmenot credentials |
00:06
🔗
|
balrog_ |
estimated about 10 million possibly valid ids |
00:08
🔗
|
balrog_ |
so you should be able to whip up a program to make a list |
00:08
🔗
|
balrog_ |
right? |
00:10
🔗
|
closure |
I'd think so. It's not clear to me how their API uses the api token |
00:10
🔗
|
balrog_ |
see https://posterous.com/api/docs/pages/overview |
00:11
🔗
|
closure |
10k per day rate limit |
00:11
🔗
|
balrog_ |
:/ |
00:12
🔗
|
chronomex |
only 100 days! |
00:12
🔗
|
closure |
would need quite a lot of accounts, at least 15 or so working for months |
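The arithmetic behind these estimates can be sketched as follows. This is only an illustration: it assumes roughly 10 million ids and the 10k-per-day limit quoted above, and the function name `days_needed` is made up for the example.

```shell
#!/bin/sh
# Days needed to walk a range of ids under a per-account daily request cap.
# Usage: days_needed TOTAL_IDS REQUESTS_PER_DAY ACCOUNTS
days_needed() {
    total=$1; per_day=$2; accounts=$3
    capacity=$((per_day * accounts))
    # Round up so that a partial day still counts as a day.
    echo $(( (total + capacity - 1) / capacity ))
}

days_needed 10000000 10000 1     # a single account
days_needed 10000000 10000 15    # fifteen accounts sharing the work
```

With these inputs a single account needs 1000 days and fifteen accounts need 67 days, which is roughly the "months" mentioned above.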
00:12
🔗
|
Cameron_D |
and we can always make more keys |
00:12
🔗
|
balrog_ |
how? |
00:12
🔗
|
Cameron_D |
lots of accounts? |
00:12
🔗
|
balrog_ |
account creation is disabled |
00:12
🔗
|
Cameron_D |
ahh |
00:13
🔗
|
closure |
emailing them a post still seems to open an account |
00:13
🔗
|
chronomex |
wheee |
00:13
🔗
|
chronomex |
really? |
00:13
🔗
|
chronomex |
nice |
00:13
🔗
|
closure |
http://joey-toqiv.posterous.com/post |
00:13
🔗
|
balrog_ |
oh really? |
00:13
🔗
|
balrog_ |
what email address? |
00:14
🔗
|
chronomex |
post@posterous.com ? |
00:14
🔗
|
ersi |
Remember that they have a shortener called Post.ly |
00:14
🔗
|
closure |
yes |
00:15
🔗
|
Cameron_D |
use the URL shortener project to get a list of URLs, and therefore users |
00:15
🔗
|
chronomex |
good thinking |
00:15
🔗
|
chronomex |
who has the giant url archive? swebb? |
00:15
🔗
|
ersi |
URLTeam does not have any Post.ly links as far as I know |
00:15
🔗
|
chronomex |
k |
00:15
🔗
|
chronomex |
weird |
00:15
🔗
|
ersi |
Yes, swebb's got links from the tweethose |
00:16
🔗
|
closure |
2 birds |
00:16
🔗
|
ersi |
so if there's any post.ly links, he has a few. |
00:16
🔗
|
chronomex |
bird channel |
00:16
🔗
|
ersi |
Two birds, one cup. |
00:16
🔗
|
chronomex |
ogod |
00:17
🔗
|
Cameron_D |
If we start getting post.ly URLs now (How hard is it to add a new domain?) then after a few days we should have a decent list of usernames to start working with |
00:19
🔗
|
closure |
curl -X GET --user bugmenot@trash-mail.com:bugmenot23 -d "api_token=JAHeFmHJldwlsrlyHcutDEBBhvhDvFbt" http://posterous.com/api/2/sites/99999 |
00:20
🔗
|
closure |
well, that works.. json parsing and looping 1 to 10000 left |
00:21
🔗
|
closure |
also they have < 10800000 sites |
00:23
🔗
|
balrog_ |
creating an account by email seems to take me to https://posterous.com/register?flow=newcomment |
00:23
🔗
|
balrog_ |
closure: based on? |
00:25
🔗
|
closure |
exactly 10793724, based on bisection |
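A bisection of this kind might look like the sketch below. It assumes ids are serial and that every id up to the maximum exists, so gaps near the top would skew the answer. `site_exists` is a stand-in that only compares against a fixed number (the value quoted above); a real probe would call the API instead.

```shell
#!/bin/sh
# Find the highest id for which a probe succeeds, by binary search.
FAKE_MAX=10793724    # illustrative value, used only by the stand-in probe

# Stand-in probe: succeeds when the id is at or below FAKE_MAX.
# Replace the body with a real API check.
site_exists() {
    [ "$1" -le "$FAKE_MAX" ]
}

# Usage: highest_id LOW HIGH
# Invariant: LOW is known to exist, HIGH is known not to.
highest_id() {
    lo=$1; hi=$2
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if site_exists "$mid"; then lo=$mid; else hi=$mid; fi
    done
    echo "$lo"
}

highest_id 1 20000000
```

Starting from the bounds 1 and 20000000 this takes about 25 probes, which is why it fits comfortably inside a daily request limit.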
00:26
🔗
|
balrog_ |
be warned, there are some invalid/deleted ones |
00:26
🔗
|
balrog_ |
closure: does opening an account by email actually /work/? |
00:26
🔗
|
balrog_ |
hmm |
00:26
🔗
|
balrog_ |
the iOS app is still alive |
00:26
🔗
|
* |
balrog_ grabs it |
00:31
🔗
|
closure |
hmm, posting by email seems to make a site, but you still have to sign up to claim it |
00:31
🔗
|
* |
closure tries email password recovery |
00:31
🔗
|
closure |
nope |
00:31
🔗
|
balrog_ |
I'm going to try ios account creation |
00:34
🔗
|
balrog_ |
YES |
00:34
🔗
|
balrog_ |
you can sign up from the iphone app |
00:34
🔗
|
balrog_ |
"Thanks for using Posterous Spaces! Please click the link below to confirm your email address." |
00:34
🔗
|
closure |
lol |
00:34
🔗
|
chronomex |
awesome |
00:35
🔗
|
* |
closure rushes out to buy an iphone. 100 iphones |
00:35
🔗
|
balrog_ |
you only need one |
00:35
🔗
|
chronomex |
do they have an android app |
00:35
🔗
|
balrog_ |
or an ipad |
00:35
🔗
|
balrog_ |
or an ipod touch |
00:35
🔗
|
balrog_ |
I want to say yes |
00:35
🔗
|
balrog_ |
https://play.google.com/store/apps/details?id=com.Posterous&hl=en |
00:35
🔗
|
balrog_ |
aaah |
00:35
🔗
|
balrog_ |
that's not gonna help |
00:35
🔗
|
balrog_ |
you have to request an API key |
00:36
🔗
|
balrog_ |
and all requests appear to be reviewed? huh |
00:36
🔗
|
chronomex |
suck |
00:36
🔗
|
closure |
nooo.. really? |
00:36
🔗
|
closure |
I used the bugmenot account, went to the api page, there's a place to click to see the key |
00:36
🔗
|
balrog_ |
it's possible someone created it with a different email |
00:36
🔗
|
balrog_ |
then changed it to bugmenot after the api key was generated |
00:37
🔗
|
closure |
so you don't have view token on http://posterous.com/api ? |
00:40
🔗
|
balrog_ |
when I click it, it says: "To gain access to the Posterous Spaces API, please submit a request via our API request form." |
00:40
🔗
|
balrog_ |
with the form linked |
00:40
🔗
|
balrog_ |
the form asks for your name, email, and phone number, and why you want to use the API. |
00:40
🔗
|
closure |
I'm sure you can write something interesting there ;) |
00:41
🔗
|
balrog_ |
what's funny is that the send request buttons on the page all work |
00:43
🔗
|
closure |
oh, in that case.. load it up in firebug or chrome dev console, you can see it make the request there and get key |
00:43
🔗
|
balrog_ |
it may be using the cookie there |
00:43
🔗
|
balrog_ |
but it is making a GET request on the API |
00:43
🔗
|
balrog_ |
yeah that's what it appears to be doing |
00:43
🔗
|
balrog_ |
using cookie-auth |
00:44
🔗
|
balrog_ |
possibly referer-checking and such as well |
00:44
🔗
|
closure |
look for api_token= |
00:44
🔗
|
chronomex |
winrar |
00:44
🔗
|
balrog_ |
there isn't any |
00:44
🔗
|
balrog_ |
I'm looking at the request in wireshark |
00:45
🔗
|
balrog_ |
huh |
00:45
🔗
|
balrog_ |
now I'm getting "error": "Unauthorized You are not authorized to view this site." |
00:45
🔗
|
balrog_ |
oh wait |
00:45
🔗
|
balrog_ |
nvm |
00:46
🔗
|
balrog_ |
yeah it's using an XMLHttpRequest |
00:46
🔗
|
balrog_ |
and I see no api key |
00:47
🔗
|
balrog_ |
X-RateLimit-Remaining doesn't seem to be decrementing though |
00:48
🔗
|
closure |
there's a cookie |
00:49
🔗
|
closure |
ooh, that's interesting |
00:49
🔗
|
closure |
perhaps they forgot to rate limit this way? |
00:49
🔗
|
Wack0 |
what's up |
00:50
🔗
|
balrog_ |
it's possible |
00:52
🔗
|
closure |
curl -H "X-Requested-With: XMLHttpRequest" -H "X-Xhrsource: posterous" -X GET -H "Cookie: _sharebymail_session_id=e55e807375f457efa9a22e091c0685c7; email=bugmenot%40trash-mail.com; _plogin=Veritas; logged_in_before=true" http://posterous.com/api/2/sites/107 |
00:52
🔗
|
closure |
try that |
00:52
🔗
|
closure |
erm, assuming this is not our only usable api :) |
00:52
🔗
|
balrog_ |
http://hastebin.com/finekoveva |
00:53
🔗
|
balrog_ |
closure: hm? |
00:53
🔗
|
closure |
ok, so they also limit by IP, because that worked for me |
00:54
🔗
|
balrog_ |
so that didn't work? |
00:54
🔗
|
closure |
for you? |
00:54
🔗
|
balrog_ |
see the hastebin |
00:54
🔗
|
closure |
oh, NM, I misread it. ok. |
00:54
🔗
|
closure |
well, until they notice us and ban.. let's see if the request count is going down now |
00:55
🔗
|
balrog_ |
I suggest creating like 100 accounts |
00:56
🔗
|
closure |
hmm, it seems your curl logged me out ;) |
00:56
🔗
|
balrog_ |
LOL |
00:56
🔗
|
balrog_ |
well they're clearly vulnerable to the firesheep exploit |
00:57
🔗
|
closure |
ah, no, I misunderstood what curl -I does |
00:57
🔗
|
balrog_ |
ohh |
00:57
🔗
|
closure |
well, the api rate limit is *not* going down |
00:57
🔗
|
balrog_ |
hah! |
00:57
🔗
|
closure |
so.. time to put some load on the servers people |
00:57
🔗
|
chronomex |
balrog_: firesheep author is a friend of mine ;) |
00:57
🔗
|
* |
closure claims first million |
00:58
🔗
|
chronomex |
hahaha |
00:58
🔗
|
chronomex |
you would |
01:00
🔗
|
beardicus |
just ran the afore-pasted curl 3 times here, btw. seemed to work fine. |
01:00
🔗
|
closure |
so, I'm running this: for s in $(seq 1 1000000); do curl -H "X-Requested-With: XMLHttpRequest" -H "X-Xhrsource: posterous" -X GET -H "Cookie: _sharebymail_session_id=e55e807375f457efa9a22e091c0685c7; email=bugmenot%40trash-mail.com; _plogin=Veritas; logged_in_before=true" http://posterous.com/api/2/sites/$s >| site.$s; done |
01:00
🔗
|
closure |
can pull the user urls out of the files later |
01:01
🔗
|
closure |
have 300 done, no sign of the rate limit header going down |
01:02
🔗
|
balrog_ |
the apps probably use the API too and have their own platform API keys |
01:02
🔗
|
balrog_ |
it's not exactly nice but we could rip them out :) |
01:03
🔗
|
closure |
seems you don't need a key, just a login cookie |
01:03
🔗
|
closure |
so log in with bugmenot, get the cookie |
01:05
🔗
|
chronomex |
haha |
01:08
🔗
|
kanzure |
balrog_: i suggest using HTTP HEAD to determine the existence of pages, instead of GET. |
01:11
🔗
|
closure |
I'm running 100 concurrent api grabbers now. Seems to be working. |
01:12
🔗
|
closure |
hah, they rename spam sites to $site-banned.posterous.com |
01:13
🔗
|
closure |
20k users snarfed |
01:13
🔗
|
beardicus |
closure, should i start another range, or have you got this covered? |
01:13
🔗
|
closure |
I think I have 1-1 million |
01:14
🔗
|
closure |
just getting usernames, not downloading any site |
01:14
🔗
|
beardicus |
righto. |
01:15
🔗
|
closure |
which will be around a 4 gb download itself |
01:23
🔗
|
beardicus |
10793855 is the highest number that seems to give me results, though if there are gaps i may be mucking things up a bit. |
01:23
🔗
|
balrog_ |
there are gaps but that seems to be a fairly good ballpark estimate |
01:23
🔗
|
balrog_ |
we need a new irc channel for this |
01:24
🔗
|
EcapsCore |
GENERIC MESSAGE ABOUT POSTEROUS |
01:24
🔗
|
beardicus |
EcapsCore, GENERIC "WE'RE ON IT" RESPONSE |
01:24
🔗
|
beardicus |
and thanks :) |
01:24
🔗
|
EcapsCore |
GENERIC "OKAY" RESPONSE |
01:24
🔗
|
beardicus |
or rather, closure / balrog_ / other_smart_people are on it. |
01:25
🔗
|
EcapsCore |
Think we'll be able to do it? |
01:25
🔗
|
balrog_ |
I'm still mad about nbc's handling of everyblock |
01:25
🔗
|
balrog_ |
it's like they deliberately did it to prevent archival |
01:26
🔗
|
balrog_ |
http://www.wbez.org/blogs/britt-julious/2013-02/being-here-vs-living-here-why-everyblock-mattered-105550 |
01:27
🔗
|
closure |
need about 10 other people to run this on a million each: http://pastebin.com/4ka5niDy |
01:28
🔗
|
balrog_ |
I can run it, but we need to keep order |
01:28
🔗
|
closure |
wiki? |
01:28
🔗
|
closure |
btw, that session_id might expire in an hour or who knows |
01:30
🔗
|
NotGLaDOS |
Gah, horrible mobile IRC clients. |
01:32
🔗
|
* |
closure has 112 thousand sites already |
01:32
🔗
|
closure |
so, until they ban me, I guess it's going pretty well ;) |
01:32
🔗
|
balrog_ |
;) |
01:33
🔗
|
beardicus |
shall i start a table on the wiki? |
01:33
🔗
|
closure |
yes please |
01:33
🔗
|
balrog_ |
once we have all names, how do we download sites? |
01:34
🔗
|
closure |
probably one of the new fancy warrior jobs |
01:36
🔗
|
dashcloud |
so is there a page or something that details what people have done already? |
01:36
🔗
|
dashcloud |
it's just downloading names right? |
01:36
🔗
|
closure |
right, names and a few other bits and bobs |
01:38
🔗
|
beardicus |
ugh. http://archiveteam.org/index.php?title=Posterous |
01:38
🔗
|
beardicus |
wikis. |
01:39
🔗
|
beardicus |
filling in all ranges now... |
01:39
🔗
|
balrog_ |
I'll grab 7m |
01:39
🔗
|
dashcloud |
if you don't like the wiki, just use some collaborative editing tool online |
01:40
🔗
|
dashcloud |
I'll do 1-2m unless it's already done (example makes me think so) |
01:41
🔗
|
balrog_ |
please change the script to make curl not dump anything to the screen |
01:41
🔗
|
beardicus |
ranges 1 - 11,000,000 on the wiki for claiming. |
01:42
🔗
|
balrog_ |
I have 7m |
01:42
🔗
|
beardicus |
ok. will mark it for balrog_ . |
01:42
🔗
|
NotGLaDOS |
I'll grab 9m |
01:42
🔗
|
beardicus |
got it NotGLaDOS |
01:43
🔗
|
balrog_ |
can someone please make the script not spit out so much crap to console? |
01:43
🔗
|
balrog_ |
add -s to the curl line |
01:45
🔗
|
dashcloud |
curl -sH ? |
01:45
🔗
|
balrog_ |
curl -s -H ... |
01:49
🔗
|
balrog_ |
hah |
01:49
🔗
|
balrog_ |
it's not going very fast for me, 5589 so far |
01:49
🔗
|
balrog_ |
and load averages: 95.54 79.33 41.68 |
01:49
🔗
|
chronomex |
o_O |
01:49
🔗
|
NotGLaDOS |
Nah, you'll be alright aye |
01:49
🔗
|
closure |
here is a simple data processor to get a list of sites found: http://pastebin.com/YpLXBb1w |
01:50
🔗
|
closure |
hmm, my load is only 50 or so |
01:50
🔗
|
balrog_ |
what speed are you getting? |
01:50
🔗
|
balrog_ |
1 million may take too long |
01:50
🔗
|
balrog_ |
and/or the cookie may expire |
01:50
🔗
|
closure |
well, it happens to be round-tripping all the way to the UK, and I have 206000 done |
01:51
🔗
|
balrog_ |
this box is IO-starved :/ |
01:51
🔗
|
closure |
granted, this is a well-connected server |
01:51
🔗
|
balrog_ |
my connection is fine |
01:51
🔗
|
balrog_ |
my IO is shit |
01:51
🔗
|
balrog_ |
I need something with eSATA |
01:51
🔗
|
closure |
it'd be possible to hook the curl output up to the perl script and then you just get a list of sites |
01:51
🔗
|
balrog_ |
right now it's running off a USB disk |
01:56
🔗
|
dashcloud |
how am I supposed to run it? for chunk in $(seq 100 199); do ./snarf $chunk & done ? so I'm guessing the file should be called snarf.sh? |
01:56
🔗
|
closure |
yeah |
01:57
🔗
|
closure |
well, snarf |
01:57
🔗
|
NotGLaDOS |
Yay, dos line endings interfering with the script! |
01:58
🔗
|
dashcloud |
so, it's not just me- I feel a lot better now |
01:58
🔗
|
closure |
yay appropriate irc nick |
01:59
🔗
|
NotGLaDOS |
Ugh, forgot how to change a file encoding in vim |
01:59
🔗
|
closure |
:%s/\r//g maybe |
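Outside of vim, the same cleanup can be done from the shell. A minimal sketch, where `strip_cr` is a name invented for this example:

```shell
#!/bin/sh
# Strip carriage returns so a script saved with DOS (CRLF) line endings
# runs under a Unix shell. Reads stdin, writes stdout.
strip_cr() {
    tr -d '\r'
}

# Demo on an in-memory string rather than a real file:
printf 'echo hello\r\necho world\r\n' | strip_cr
```

For a file, something like `strip_cr < snarf > snarf.unix && mv snarf.unix snarf` would apply it. Inside vim, `:set ff=unix` followed by `:w` is the usual way to get the same result.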
02:00
🔗
|
dashcloud |
apparently the tools are now called tofromdos , and you call fromdos or todos |
02:01
🔗
|
NotGLaDOS |
"Pattern not found: Y0LOSWAG" |
02:01
🔗
|
NotGLaDOS |
Wut |
02:08
🔗
|
closure |
ok, they're overloaded I think |
02:08
🔗
|
closure |
getting connect fails |
02:08
🔗
|
closure |
or I'm banned |
02:08
🔗
|
balrog_ |
getting fails here too |
02:09
🔗
|
closure |
I'm getting connect hangs |
02:09
🔗
|
balrog_ |
actually no |
02:09
🔗
|
balrog_ |
it's still working here |
02:13
🔗
|
closure |
I have a better (less disk IO) script |
02:13
🔗
|
balrog_ |
will it pick up if I kill this one? |
02:14
🔗
|
closure |
no |
02:14
🔗
|
closure |
well, I could make it |
02:14
🔗
|
closure |
one sec |
02:14
🔗
|
NotGLaDOS |
Going to wait for that script before |
02:15
🔗
|
NotGLaDOS |
I start. Last time I had a high disk IO (feeding 103 books to my bot), my host yelled at me |
02:16
🔗
|
beardicus |
hmm. doing about 2000 per minute here... load avg 2. |
02:16
🔗
|
beardicus |
scratch that, 4 :) |
02:17
🔗
|
closure |
http://pastebin.com/ZJ1WEi56 |
02:17
🔗
|
balrog_ |
load averages: 93.97 94.47 86.21 |
02:18
🔗
|
closure |
named in honor of my recently meeting Kryton. nice guy ;) |
02:27
🔗
|
closure |
my ip is 100% banned. I have 4 other IPs here.. was there a way to make curl use a different one? |
02:28
🔗
|
kanzure |
can't you just ask them for a dump |
02:29
🔗
|
balrog_ |
so you can't even access their website? |
02:30
🔗
|
closure |
nope |
02:30
🔗
|
balrog_ |
wow |
02:31
🔗
|
closure |
I guess someone notices the .24 million hits |
02:31
🔗
|
balrog_ |
33633 grabbed before stop |
02:32
🔗
|
closure |
smeg will replay the site.* files you got and resume |
02:32
🔗
|
balrog_ |
running it. |
02:32
🔗
|
balrog_ |
not as fast as I'd like because of I/O contention |
02:34
🔗
|
closure |
yeah, it'll be IO-y when resuming |
02:34
🔗
|
* |
closure guesses that aroud 99% of powerous sites are spam |
02:35
🔗
|
closure |
typoy tonigt |
02:35
🔗
|
balrog_ |
lol why you think that? |
02:35
🔗
|
closure |
just looking at all the iphonefoo and dealershipsblah names |
02:35
🔗
|
balrog_ |
how do I know when they ban my IP? |
02:36
🔗
|
balrog_ |
maybe I should not let them do this? |
02:37
🔗
|
closure |
432175 iphone-wodq.posterous.com |
02:37
🔗
|
closure |
432177 iphone-sylf.posterous.com |
02:37
🔗
|
closure |
432179 iphone-afii.posterous.com |
02:39
🔗
|
closure |
522173 idxf0d1mnl-banned.posterous.com |
02:39
🔗
|
closure |
522175 o6nuzdet0g-banned.posterous.com |
02:39
🔗
|
closure |
522177 bgv23e4mls-banned.posterous.com |
02:39
🔗
|
closure |
522179 ik1gtjuxyi-banned.posterous.com |
02:39
🔗
|
closure |
lol |
02:39
🔗
|
closure |
I have pages of that |
02:39
🔗
|
balrog_ |
cat *.hostnames | wc -l gives me 33028 |
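A slightly richer summary of the hostname lists could be sketched as below. It assumes each line has the form `<id> <hostname>`, as in the lines pasted above; the function names and the `sample` helper are invented for the example.

```shell
#!/bin/sh
# Summarise hostname lists of the form "<id> <hostname>", one per line.
# Both functions read from stdin.

# Count non-empty lines, i.e. sites found.
count_sites() {
    grep -c .
}

# Count sites that were renamed with a "-banned" suffix.
count_banned() {
    grep -c -e '-banned\.posterous\.com$'
}

# Small sample taken from lines pasted in the chat.
sample() {
    cat <<'EOF'
432175 iphone-wodq.posterous.com
522173 idxf0d1mnl-banned.posterous.com
522175 o6nuzdet0g-banned.posterous.com
522177 bgv23e4mls-banned.posterous.com
EOF
}

sample | count_sites
sample | count_banned
```

On real data the input would be `cat *.hostnames` instead of `sample`. For the four sample lines the two counts are 4 and 3.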
02:43
🔗
|
balrog_ |
btw what happened with xanga-grab? |
02:49
🔗
|
balrog_ |
closure: http://www.sysadminvalley.com/2009/06/29/curl-requests-by-binding-to-different-ip-address/ |
02:49
🔗
|
balrog_ |
see man page |
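The option in question is curl's `--interface`, which selects the local address or interface to send from. A rough sketch of rotating over several source addresses follows. The addresses are placeholders from the documentation range, not anything from this log, and the function names are invented. The cookie headers from the earlier curl command are omitted for brevity.

```shell
#!/bin/sh
# Rotate requests across several local source addresses using
# curl's --interface option.
# Placeholder addresses; substitute ones actually bound on your machine.
SOURCE_IPS="192.0.2.10 192.0.2.11 192.0.2.12"

# Usage: pick_ip N
# Prints the source address for the N-th request (round robin).
pick_ip() {
    n=$1
    set -- $SOURCE_IPS        # split the list into $1, $2, ...
    shift $(( n % $# ))       # skip forward to the chosen entry
    echo "$1"
}

# Usage: fetch_site ID N
# Fetches one site record from the N-th address in the rotation.
# Defined only; it is not called here because it needs network access.
fetch_site() {
    curl -s --interface "$(pick_ip "$2")" "http://posterous.com/api/2/sites/$1"
}

pick_ip 0
pick_ip 4
```

With three addresses, request 0 goes out from the first address and request 4 from the second.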
02:53
🔗
|
closure |
950831 best-price-for-nexium-generic-banned-banned.posterous.com |
02:53
🔗
|
closure |
double banned! |
02:54
🔗
|
balrog_ |
:D |
02:54
🔗
|
Wack0 |
someone really likes their banhammer |
02:55
🔗
|
closure |
cool, got around the ban |
02:55
🔗
|
closure |
yeah |
02:56
🔗
|
balrog_ |
hmm, the numbers stopped advancing |
02:56
🔗
|
closure |
that's ban time, probably |
02:56
🔗
|
balrog_ |
wait, they started again |
02:57
🔗
|
balrog_ |
but that may not mean much |
02:57
🔗
|
closure |
or it's a little overloaded maybe? |
02:57
🔗
|
balrog_ |
how do I check if it's valid? |
02:57
🔗
|
closure |
the hostname files will only grow if it's fail |
02:58
🔗
|
closure |
tail -f *.hostnames |
02:58
🔗
|
closure |
er, grow if it's valid |
02:59
🔗
|
balrog_ |
we is banned |
03:00
🔗
|
balrog_ |
cat *.hostnames | wc -l : 39462 |
03:00
🔗
|
balrog_ |
:\ |
03:00
🔗
|
balrog_ |
closure: add some throttling |
03:01
🔗
|
balrog_ |
curl http://posterous.com |
03:01
🔗
|
balrog_ |
curl: (7) couldn't connect to host |
03:02
🔗
|
closure |
well, you could run less than 100 at a time |
03:02
🔗
|
closure |
maybe they won't ban at 10 at a time or something |
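One way to cap concurrency, instead of backgrounding every chunk at once, is `xargs -P`. The sketch below is hypothetical: the worker only echoes, where a real run would call the per-chunk grab script, and `run_chunks` is a name made up for the example. `-P` is an extension found in GNU and BSD xargs rather than a POSIX feature.

```shell
#!/bin/sh
# Run one worker per chunk number, with at most MAX_PARALLEL running at once.
# Usage: run_chunks FIRST LAST MAX_PARALLEL
run_chunks() {
    # The inline worker is a placeholder that just reports the chunk;
    # swap the echo for the real per-chunk command.
    seq "$1" "$2" | xargs -P "$3" -I{} sh -c 'echo "chunk $1 done"' _ {}
}

# Ten chunks, ten at a time; sorted because completion order is not fixed.
run_chunks 100 109 10 | sort
```

Lowering the third argument is the throttle. Whether a lower rate actually avoids a ban is not something this sketch can tell you.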
03:07
🔗
|
dashcloud |
so does it really start at 100100 or is that a mistake? if it's supposed to start at 100000, could someone grab those first ones? |
03:08
🔗
|
closure |
dashcloud: you did a million already? |
03:08
🔗
|
closure |
oh, I'm getting from 1, don't worry |
03:08
🔗
|
dashcloud |
I ran this: for chunk in $(seq 100 199); do ./snarf $chunk & done |
03:08
🔗
|
dashcloud |
wait- I think I know what happened |
03:09
🔗
|
dashcloud |
I think I botched part of the download for 1-2 million |
03:09
🔗
|
closure |
I'm curious how you evaded the IP ban |
03:09
🔗
|
dashcloud |
probably because I didn't actually download a million |
03:10
🔗
|
dashcloud |
looks like I had done 10k or so |
03:10
🔗
|
closure |
switch to smeg, it'll resume |
03:10
🔗
|
balrog_ |
they banned me at about 40k |
03:11
🔗
|
dashcloud |
I think I edited the snarf.sh and accidentally shrunk the range |
03:16
🔗
|
closure |
this posterous thing is a nice preview of how much twitter cares about preserving all their data, btw.. |
03:20
🔗
|
dashcloud |
they've got it preserved- you just can't get to it unless you've got $$$$$$ bucks, and an "appropriate" business plan |
03:20
🔗
|
dashcloud |
you can't do analytics and data-mining stuff without an extensive archive |
03:21
🔗
|
closure |
well, I just found my 250000'th hostname |
03:23
🔗
|
* |
closure thinks it's hilarious that they have an api key with such an easily bypassed rate limit. wonder how common that is? |
03:26
🔗
|
closure |
fwiw, they seem to have around 25 boxes in the cluster handling these api calls, based on some headers |
03:26
🔗
|
balrog_ |
closure: lmk if you figure a way around the ip-block |
03:26
🔗
|
closure |
well, if you have more IP addresses, I do |
03:27
🔗
|
balrog_ |
I don't :/ |
03:27
🔗
|
balrog_ |
well I do but I don't want to waste them,,, the banned one was my primary |
03:29
🔗
|
closure |
there's always EC2 |
03:29
🔗
|
godane |
looks like SketchCow got some computer power magazines uploaded: http://archive.org/details/computer_shopper |
03:38
🔗
|
closure |
I've made a better smeg that automatically resumes from the *.hostnames files when re-run. So you can move the files elsewhere or give them to someone else to continue. |
03:38
🔗
|
closure |
http://pastebin.com/VUvydX0q |
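The pastebin script itself is not reproduced here. Purely as an illustration of the idea, resuming from a hostnames file could work like the sketch below. It assumes lines of the form `<id> <hostname>` written in increasing id order, and `next_id` is an invented name. The real smeg may do this differently.

```shell
#!/bin/sh
# Work out where to resume a chunk from its hostnames file.
# Usage: next_id FILE START
# Prints the id to continue from, or START if the file is missing or empty.
next_id() {
    file=$1; start=$2
    if [ -s "$file" ]; then
        # The last line holds the highest id that produced a hostname.
        last=$(tail -n 1 "$file" | cut -d' ' -f1)
        echo $((last + 1))
    else
        echo "$start"
    fi
}

# Demo with a temporary file built from two lines pasted in the chat.
tmp=$(mktemp)
printf '432175 iphone-wodq.posterous.com\n432177 iphone-sylf.posterous.com\n' > "$tmp"
next_id "$tmp" 430000
next_id "$tmp.missing" 430000
rm -f "$tmp"
```

A limitation of this approach is that ids tried after the last hit leave no trace in the file, so they get requested again on resume.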
03:39
🔗
|
closure |
hmm, may be the ugliest for loop I've ever written in shell |
03:39
🔗
|
balrog_ |
well someone wants these? |
03:39
🔗
|
balrog_ |
ew |
03:39
🔗
|
closure |
I'll take them |
03:39
🔗
|
closure |
if you're done |
03:40
🔗
|
balrog_ |
I have 39462 grabbed and I'm banned |
03:41
🔗
|
balrog_ |
closure: see pm |
03:44
🔗
|
closure |
you sure got banned before many were done.. I'll bet it's not automatic, just they noticed you |
03:44
🔗
|
* |
closure reserves an ip for the 3 am run ;) |
03:47
🔗
|
dashcloud |
here's an interesting project I found: https://github.com/calufa/tales-core |
03:47
🔗
|
dashcloud |
block-tolerant scraper |
03:49
🔗
|
dashcloud |
I'm going to remove my name from my blocks, because I won't be around this week to babysit them |
03:51
🔗
|
closure |
we can always go full-on-warrior job just to get the hostnames if it comes to it |
03:52
🔗
|
closure |
... and take a month or whatever |
03:52
🔗
|
closure |
I'd worry they might fix their broken rate limit before done |
03:55
🔗
|
closure |
whups 2 more ips banned |
04:02
🔗
|
beardicus |
whack-a-mole |
04:02
🔗
|
beardicus |
try rate-limiting? |
04:02
🔗
|
closure |
s'pose I ought to |
04:02
🔗
|
beardicus |
maybe it's time for the week-long slow burn? |
04:03
🔗
|
beardicus |
i mean... we have six whole weeks... how "generous" of them. |
04:03
🔗
|
closure |
gotta get the data too |
04:03
🔗
|
closure |
if they rate limit just hostnames ...... |
04:04
🔗
|
closure |
otoh, this is probably running on some barely scaled part of their architecture |
04:08
🔗
|
closure |
I have over half a million hostnames if someone wants to start the site mirroring BTW |
04:28
🔗
|
Wack0 |
hey. what's going on. |
04:29
🔗
|
beardicus |
closure, i'm going to keep at my range overnight... one measly thread on a new IP... just to see what happens. |
04:33
🔗
|
closure |
k.. I have an ip that's running only 10 threads, also to see |
04:33
🔗
|
beardicus |
hmm. 'bout 80 per minute with one thread. ten days to finish a block of 1 million. |
04:53
🔗
|
closure |
not bad |
04:57
🔗
|
beardicus |
100 per minute now. 7 days. |
04:57
🔗
|
beardicus |
though i guess that's not counting misses at all, so it may be a bit faster. |
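The day estimates can be checked with a few lines of shell arithmetic. A sketch, with `days_for` as an invented name. The rates are counted per hostname found, so the real request rate is somewhat higher, as noted above.

```shell
#!/bin/sh
# Whole days (rounded up) to get through COUNT ids at RATE per minute.
# Usage: days_for COUNT RATE_PER_MINUTE
days_for() {
    count=$1; per_min=$2
    per_day=$((per_min * 60 * 24))
    echo $(( (count + per_day - 1) / per_day ))
}

days_for 1000000 80
days_for 1000000 100
```

This gives 9 days at 80 per minute and 7 days at 100 per minute for a block of 1 million, in line with the rough figures quoted.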
05:00
🔗
|
beardicus |
poop. banned with one thread after just 2000 or so requests. |
05:01
🔗
|
closure |
wow |
05:01
🔗
|
beardicus |
or i'm just hitting a big dry patch. |
05:01
🔗
|
closure |
yeah, me also banned |
05:02
🔗
|
closure |
so, admins are sitting at the console with a red bull in one hand and a banhammer in the other |
05:02
🔗
|
closure |
time to go away for 12 hours ;) |
05:03
🔗
|
* |
closure has 935845 hostnames including other grabs |
05:39
🔗
|
SketchCow |
have we set up #perposterus yet? |
05:39
🔗
|
SketchCow |
sorry, #preposterus |
05:46
🔗
|
closure |
beats #closurus (no) |
05:47
🔗
|
closure |
the constant ip banning is a bit of a problem |
05:47
🔗
|
closure |
although I am over 1 million, on my 5th ip |
05:49
🔗
|
beardicus |
i got 5000 with a single thread on my third IP. sadface. |
05:50
🔗
|
beardicus |
i guess maybe balls-out is the way to go... slurp it quick 'til you're banned. |
05:52
🔗
|
closure |
possibly. or it just keeps them opening the red bull |
05:52
🔗
|
chronomex |
schhk |
09:35
🔗
|
lemonkey |
http://techcrunch.com/2013/02/15/posterous-will-shut-down-on-april-30th-co-founder-garry-tan-launches-posthaven-to-save-your-sites |
09:37
🔗
|
omf_ |
we are already mapping the site out |
09:38
🔗
|
omf_ |
but we keep getting banned |
09:39
🔗
|
BlueMax |
has that ever stopped us before? :D |
09:41
🔗
|
omf_ |
not that I know of |
09:42
🔗
|
omf_ |
banning only builds the rage |
10:35
🔗
|
GLaDOS |
closure: the script seems to be scarce with output info |
10:35
🔗
|
db48x |
sweet, just found a picture of my great great great grandmother, great great grandmother, great grandmother and my grandmother all in a line out on the farm |
11:00
🔗
|
Nemo_bis |
:o |
11:34
🔗
|
SketchCow |
Could someone write to Garry Tan to ask him what the story is and how big Posterous is? |
12:51
🔗
|
godane |
SketchCow: i have over 7k videos now in g4video-web collection |
13:45
🔗
|
closure |
SketchCow: we know Posterous has just over 10 million sites. Size of sites unknown |
20:49
🔗
|
SketchCow |
closure: Thanks |
21:24
🔗
|
SketchCow |
Oh boy, ANOTHER person going "it was free, what do you suspect" |
21:24
🔗
|
SketchCow |
"What did you expect" |
21:34
🔗
|
ersi |
There's a bunch of those morons. |
21:35
🔗
|
ersi |
Don't go reading CircleHackerJerkerNews. ;-) |
21:38
🔗
|
SketchCow |
Wait- you mean the Y combinator message board thinks this is fine? |
21:41
🔗
|
ersi |
Yes, basically. But it might be related to the fact that it's mainly a bunch of startup inmasturbators there |
21:42
🔗
|
ersi |
"Of COURSE it's fine to burn the house down when money runs out!" |
22:10
🔗
|
SketchCow |
http://www.archiveteam.org/index.php?title=Main_Page |
22:12
🔗
|
Smiley |
:) |
22:12
🔗
|
omf_ |
and most hners think they could build something to take posterous' place |
22:13
🔗
|
SketchCow |
OK, I have to go take a flight from Wellington to Auckland. |
22:14
🔗
|
Smiley |
plz try not to fall in the sea. |
22:14
🔗
|
SketchCow |
On it |
22:15
🔗
|
SketchCow |
This Posterous one is different. People are getting it. |
22:15
🔗
|
SketchCow |
I mean, they got it with geocities and others. |
22:15
🔗
|
SketchCow |
But people are really getting this one, what it represents. |
22:15
🔗
|
Smiley |
\o/ |
22:17
🔗
|
SketchCow |
OK, I'll be back on later. |
22:22
🔗
|
ersi |
SketchCow: Neat. New logo/image. |