| Time |
Nickname |
Message |
|
00:01
🔗
|
godane |
shaqfu: I'm getting the ?page now |
|
00:02
🔗
|
godane |
i only need --post-data instead of --post-data --user=blah --password=blah |
|
00:02
🔗
|
godane |
otherwise i will get ?page=2.html?user=blah.html or something |
|
00:43
🔗
|
shaqfu |
Ah, clever |
|
00:51
🔗
|
instence |
shaqfu, if you have a gun shoot me in the brain |
|
00:51
🔗
|
instence |
or give me temporary amnesia |
|
00:53
🔗
|
instence |
i just wish during archiving there was a way to de-stress the brain somehow so you could start fresh |
|
00:53
🔗
|
instence |
i guess that is what naps are for |
|
00:53
🔗
|
instence |
but time is always of the essence so its like *fuck* |
|
00:59
🔗
|
Coderjoe |
woo |
|
00:59
🔗
|
Coderjoe |
infocube 2.0 is now at 221% |
|
01:00
🔗
|
balrog_ |
wow. |
|
03:10
🔗
|
godane |
Coderjoe: i thought we was doing in -bs |
|
03:11
🔗
|
godane |
*talking |
|
03:13
🔗
|
godane |
looks like starfinder is in avgeeks |
|
03:16
🔗
|
godane |
looks like a ton of nasa videos were saved by avgeeks too |
|
03:16
🔗
|
Coderjoe |
i don't need a running tally of what is there |
|
04:39
🔗
|
godane |
just found something funny |
|
04:40
🔗
|
godane |
a torrent from kat.ph was removed at the request of the copyright owner |
|
04:42
🔗
|
shaqfu |
Which? |
|
04:43
🔗
|
godane |
http://kat.ph/keri-hilson-pretty-girl-rock-2010-single-sw-t4672360.html |
|
12:58
🔗
|
Schbirid |
hm, "q2l\#354ft.map": Invalid or incomplete multibyte or wide character". would that be a ascii ì ? |
|
12:58
🔗
|
Schbirid |
any idea how i can find out? |
|
12:58
🔗
|
Schbirid |
my fs are utf8 but no idea what the source was |
|
13:21
🔗
|
ersi |
http://www.michaelnielsen.org/ddi/how-to-crawl-a-quarter-billion-webpages-in-40-hours/ [HN discussion: ] http://news.ycombinator.com/item?id=4367933 |
|
13:35
🔗
|
Schbirid |
On December 19, 2008, BusinessWeek listed Cuil as one of the most successful U.S. startups of 2008 |
|
13:35
🔗
|
Schbirid |
, based on the amount of money they raised. |
|
13:36
🔗
|
godane |
my kat.ph-community is still going |
|
13:39
🔗
|
winr4r |
Schbirid: lol, cuil |
|
13:49
🔗
|
Schbirid |
wicked, i mounted that forumplanet bz2 again and now cpu usage is no problem. i wonder what went wrong the other time |
|
13:49
🔗
|
Schbirid |
s |
|
13:49
🔗
|
Schbirid |
this rock |
|
14:01
🔗
|
ersi |
I've encountered Common Crawl before, but the Everything-Amazon-tech-and-Cloud stuff scares me away |
|
14:15
🔗
|
alard |
Can't you just download the data and use it somewhere else? |
|
14:17
🔗
|
ersi |
yeah, but you need an Amazon account and pay for the download etc |
|
14:17
🔗
|
ersi |
I mean, sure - that's fair. But it makes me reluctant to take a look at it |
|
14:20
🔗
|
alard |
https://aws-publicdatasets.s3.amazonaws.com/?prefix=common-crawl/crawl-002 |
|
14:21
🔗
|
alard |
I think you can download everything for free, no account needed. |
|
14:22
🔗
|
alard |
https://s3.amazonaws.com/aws-publicdatasets/common-crawl/parse-output/segment/1341690169105/1341826131693_45.arc.gz |
|
14:23
🔗
|
ersi |
oh, cool |
|
15:18
🔗
|
godane |
almost 13000 forum posts from kat.ph/community have been downloaded |
|
15:32
🔗
|
godane |
i'm getting a lot of 404s in my kat.ph/community dump |
|
15:33
🔗
|
godane |
there is also stuff like this that needs to be backed up: http://kat.ph/blog/TheBatman/ |
|
15:35
🔗
|
godane |
i just have no idea how, other than scanning my newer dump for http://kat.ph/user/[[:alnum:]]* or something to get user name urls |
|
15:36
🔗
|
godane |
then change the user part to blog and start grabbing |
|
15:36
🔗
|
godane |
i also have to look at images from all urls in this dump |
|
16:01
🔗
|
godane |
blog posts like this need to be saved for them: http://kat.ph/blog/Nemesis43/post/5200/ |
|
17:11
🔗
|
godane |
just updated my linux journal collection |
|
17:12
🔗
|
ersi |
linux journal collection? |
|
17:12
🔗
|
godane |
you can get some here: http://www.missoulapubliclibrary.org/online-resources/317-linux |
|
17:12
🔗
|
godane |
whats funny is that its a library |
|
17:13
🔗
|
ersi |
ah |
|
17:14
🔗
|
godane |
also here: www.iar.unlp.edu.ar/biblio/htdocs/artic/bajad/linuxj/linuxj.htm |
|
17:15
🔗
|
godane |
the library has some pdfs that are indexed |
|
17:15
🔗
|
godane |
so i grab those indexed ones too |
|
18:59
🔗
|
arkhive |
I'm picking up 'hundreds' of 5.25" floppies Monday. Will be dumping like crazy. |
|
19:03
🔗
|
winr4r |
arkhive: excellent |
|
19:04
🔗
|
balrog_ |
arkhive: what sort of floppies? |
|
19:19
🔗
|
winr4r |
good evening, btw |
|
19:28
🔗
|
arkhive |
Not sure yet. |
|
19:28
🔗
|
arkhive |
:) |
|
19:28
🔗
|
arkhive |
evenin' |
|
19:33
🔗
|
godane |
hey winr4r |
|
19:33
🔗
|
winr4r |
:) |
|
19:33
🔗
|
winr4r |
been busy, godane? |
|
19:34
🔗
|
godane |
my kat.ph/community still is |
|
19:34
🔗
|
godane |
thanks to alard i will be able to grab all images off of kat.ph/community dump |
|
19:35
🔗
|
godane |
still pulling new images from it |
|
19:37
🔗
|
godane |
so doing sort and uniq works, not just uniq |
|
20:07
🔗
|
godane |
its in a url loop |
|
20:09
🔗
|
godane |
i think i got most of it anyway |
|
20:10
🔗
|
godane |
i should have blocked ?p_id paths |
|
20:11
🔗
|
godane |
and blocked 26799 post |
|
20:42
🔗
|
godane |
getting a ton of user pictures now |
|
20:44
🔗
|
godane |
there are 5000+ user pics |
|
20:44
🔗
|
godane |
from kastatic.com/i2/u/# path |
|
20:45
🔗
|
godane |
then there is kastatic.com/i2/userpics/# |
|
20:55
🔗
|
godane |
the kastatic.com image dump is very big |
|
20:55
🔗
|
godane |
and i have not got to kastatic.com/i2/userpics/ |
|
20:55
🔗
|
godane |
yet |
|
21:05
🔗
|
godane |
my eyes |
|
21:05
🔗
|
godane |
a fat guy took picture of himself naked |
|
21:06
🔗
|
godane |
that is what is data dump |
|
23:47
🔗
|
godane |
i'm downloading 8-bit theatre |
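
A note on the kat.ph blog grab godane describes at 15:35 and 15:36, and on the sort/uniq remark at 19:37: the idea of scanning a dump for user URLs, de-duplicating them, and swapping the user part for blog could look roughly like the sketch below. It is only an illustration, not what godane actually ran. The function name `blog_urls` and the exact pattern are assumptions; the pattern mirrors the `[[:alnum:]]` class from the log and the blog URL shape seen in http://kat.ph/blog/TheBatman/. A set is used because `uniq` on its own only collapses adjacent duplicate lines, which is why sorting first matters.

```python
import re

# Assumed URL shape, taken from the log: http://kat.ph/user/<alphanumeric name>
USER_URL = re.compile(r"https?://kat\.ph/user/([A-Za-z0-9]+)")


def blog_urls(dump_text):
    """Return sorted, de-duplicated blog URLs for every user link in dump_text."""
    # A set drops duplicates no matter where they occur in the dump,
    # which is the job `sort | uniq` does on the command line.
    names = {match.group(1) for match in USER_URL.finditer(dump_text)}
    # Swap the "user" path segment for "blog" to get the page to grab.
    return ["http://kat.ph/blog/%s/" % name for name in sorted(names)]


if __name__ == "__main__":
    sample = (
        '<a href="http://kat.ph/user/TheBatman/">TheBatman</a>\n'
        '<a href="http://kat.ph/user/Nemesis43/">Nemesis43</a>\n'
        '<a href="http://kat.ph/user/TheBatman/">TheBatman</a>\n'
    )
    for url in blog_urls(sample):
        print(url)
```

The resulting list could then be written to a file and handed to a downloader as its URL list.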