#archiveteam-bs 2012-08-11,Sat


Time Nickname Message
00:01 🔗 godane shaqfu: I'm getting the ?page now
00:02 🔗 godane i only need --post-data instead of --post-data --user=blah --password=blah
00:02 🔗 godane otherwise i will get ?page=2.html?user=blah.html or something
00:43 🔗 shaqfu Ah, clever
00:51 🔗 instence shaqfu, if you have a gun shoot me in the brain
00:51 🔗 instence or give me temporary amnesia
00:53 🔗 instence i just wish during archiving there was a way to de-stress the brain somehow so you could start fresh
00:53 🔗 instence i guess that is what naps are for
00:53 🔗 instence but time is always of the essence so its like *fuck*
00:59 🔗 Coderjoe woo
00:59 🔗 Coderjoe infocube 2.0 is now at 221%
01:00 🔗 balrog_ wow.
03:10 🔗 godane Coderjoe: i thought we was doing in -bs
03:11 🔗 godane *talking
03:13 🔗 godane looks like starfinder is in avgeeks
03:16 🔗 godane looks like a ton of nasa videos were saved by avgeeks too
03:16 🔗 Coderjoe i don't need a running tally of what is there
04:39 🔗 godane just found something funny
04:40 🔗 godane a torrent from kat.ph was removed by the request of the copyright owner
04:42 🔗 shaqfu Which?
04:43 🔗 godane http://kat.ph/keri-hilson-pretty-girl-rock-2010-single-sw-t4672360.html
12:58 🔗 Schbirid hm, "q2l\#354ft.map": "Invalid or incomplete multibyte or wide character". would that be a latin-1 ì ?
12:58 🔗 Schbirid any idea how i can find out?
12:58 🔗 Schbirid my fs are utf8 but no idea what the source was
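A minimal sketch of one way to investigate the question above, assuming the `\#354` in the error message is an octal escape for a single raw byte (0o354, i.e. 0xEC). The list of candidate encodings is an illustrative guess and not something stated in the log; the helper name `decode_candidates` is hypothetical.

```python
# Assumption: "\#354" stands for one raw byte with octal value 354 (0xEC).
raw = bytes([0o354])

# Illustrative candidates only; the real source encoding is unknown.
CANDIDATES = ["utf-8", "latin-1", "cp1252", "cp437", "cp850"]


def decode_candidates(data, encodings=CANDIDATES):
    """Try each encoding and return {encoding: decoded text or None}."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            # The byte sequence is not valid in this encoding.
            results[enc] = None
    return results


if __name__ == "__main__":
    for enc, text in decode_candidates(raw).items():
        print(enc, repr(text))
```

A lone 0xEC byte is not valid UTF-8, which matches the error, while Latin-1 and cp1252 both map it to ì. Which encoding the original file system used still has to be judged from context, such as whether the decoded name looks plausible.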
13:21 🔗 ersi http://www.michaelnielsen.org/ddi/how-to-crawl-a-quarter-billion-webpages-in-40-hours/ [HN discussion: ] http://news.ycombinator.com/item?id=4367933
13:35 🔗 Schbirid On December 19, 2008, BusinessWeek listed Cuil as one of the most successful U.S. startups of 2008
13:35 🔗 Schbirid , based on the amount of money they raised.
13:36 🔗 godane my kat.ph-community is still going
13:39 🔗 winr4r Schbirid: lol, cuil
13:49 🔗 Schbirid wicked, i mounted that forumplanet bz2 again and now cpu usage is no problem. i wonder what went wrong the other time
13:49 🔗 Schbirid s
13:49 🔗 Schbirid this rock
14:01 🔗 ersi I've encountered Common Crawl before, but the Everything-Amazon-tech-and-Cloud stuff scares me away
14:15 🔗 alard Can't you just download the data and use it somewhere else?
14:17 🔗 ersi yeah, but you need an Amazon account and pay for the download etc
14:17 🔗 ersi I mean, sure - that's fair. But it makes me reluctant to take a look at it
14:20 🔗 alard https://aws-publicdatasets.s3.amazonaws.com/?prefix=common-crawl/crawl-002
14:21 🔗 alard I think you can download everything for free, no account needed.
14:22 🔗 alard https://s3.amazonaws.com/aws-publicdatasets/common-crawl/parse-output/segment/1341690169105/1341826131693_45.arc.gz
14:23 🔗 ersi oh, cool
15:18 🔗 godane almost 13000 forum posts from kat.ph/community have been downloaded
15:32 🔗 godane i'm getting a lot of 404s in my kat.ph/community dump
15:33 🔗 godane there is also stuff like this that needs to be backed up: http://kat.ph/blog/TheBatman/
15:35 🔗 godane i just have no idea how other than to scan my newer dump with http://kat.ph/user/[[:alnum:]]* or something to get user name urls
15:36 🔗 godane then change the user part to blog and start grabbing
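The idea described above can be sketched roughly as follows: scan the dump text for `kat.ph/user/<name>` URLs and rewrite each name into a `kat.ph/blog/<name>/` URL. This assumes user names are purely alphanumeric (as the `[[:alnum:]]` pattern suggests) and that every user name maps directly onto a blog path, which the log proposes but does not confirm. The function name `blog_urls` is hypothetical.

```python
import re

# Matches user profile URLs; the capture group is the alphanumeric user name.
USER_URL = re.compile(r"https?://kat\.ph/user/([A-Za-z0-9]+)")


def blog_urls(text):
    """Return sorted, de-duplicated blog URLs for every user name in text."""
    names = sorted(set(USER_URL.findall(text)))
    # Assumption: /user/<name> and /blog/<name>/ share the same name.
    return [f"http://kat.ph/blog/{name}/" for name in names]


if __name__ == "__main__":
    sample = (
        '<a href="http://kat.ph/user/TheBatman/">x</a> '
        "http://kat.ph/user/Nemesis43/ http://kat.ph/user/TheBatman/"
    )
    for url in blog_urls(sample):
        print(url)
```

The resulting list could be written to a file and handed to whatever downloader is already in use.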
15:36 🔗 godane i also have to look at images from all urls in this dump
16:01 🔗 godane blog posts like this need to be saved for them: http://kat.ph/blog/Nemesis43/post/5200/
17:11 🔗 godane just updated my linux journal collection
17:12 🔗 ersi linux journal collection?
17:12 🔗 godane you get some here: http://www.missoulapubliclibrary.org/online-resources/317-linux
17:12 🔗 godane whats funny is that its a library
17:13 🔗 ersi ah
17:14 🔗 godane also here: www.iar.unlp.edu.ar/biblio/htdocs/artic/bajad/linuxj/linuxj.htm
17:15 🔗 godane the library has some pdfs that are indexed
17:15 🔗 godane so i grab those indexed ones too
18:59 🔗 arkhive I'm picking up 'hundreds' of 5.25" floppies Monday. Will be dumping like crazy.
19:03 🔗 winr4r arkhive: excellent
19:04 🔗 balrog_ arkhive: what sort of floppies?
19:19 🔗 winr4r good evening, btw
19:28 🔗 arkhive Not sure yet.
19:28 🔗 arkhive :)
19:28 🔗 arkhive evenin'
19:33 🔗 godane hey winr4r
19:33 🔗 winr4r :)
19:33 🔗 winr4r been busy, godane?
19:34 🔗 godane my kat.ph/community still is
19:34 🔗 godane thanks to alard i will be able to grab all images off of kat.ph/community dump
19:35 🔗 godane still pulling new images from it
19:37 🔗 godane so doing sort and uniq works, not just uniq
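The point about sort and uniq can be illustrated with a small sketch: `uniq` only collapses duplicates that sit next to each other, so an unsorted list keeps its non-adjacent repeats. The `uniq` function below is a Python stand-in for that behaviour, and the file names are made up for the example.

```python
from itertools import groupby


def uniq(lines):
    """Collapse runs of adjacent equal lines, like the uniq command."""
    return [key for key, _group in groupby(lines)]


# Made-up image names; "a.png" appears both adjacent and non-adjacent.
urls = ["a.png", "b.png", "a.png", "a.png"]

if __name__ == "__main__":
    print(uniq(urls))          # the non-adjacent duplicate survives
    print(uniq(sorted(urls)))  # sorting first makes all duplicates adjacent
```

Sorting first groups identical lines together, which is why the combination removes every duplicate while `uniq` alone does not.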
20:07 🔗 godane its in a url loop
20:09 🔗 godane i think i got most of it anyway
20:10 🔗 godane i should have blocked ?p_id paths
20:11 🔗 godane and blocked 26799 post
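A minimal sketch of the kind of URL filter described above, assuming the loop comes from links carrying a `p_id` query parameter. The function name `should_skip`, its parameters, and the example URLs are hypothetical; the log does not give the exact shape of the post 26799 URL, so extra blocked substrings are left for the caller to supply.

```python
from urllib.parse import urlsplit, parse_qs


def should_skip(url, blocked_params=("p_id",), blocked_fragments=()):
    """Return True if the URL should be excluded from the crawl."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    # Skip any URL that carries one of the blocked query parameters.
    if any(param in query for param in blocked_params):
        return True
    # Optionally skip URLs containing caller-supplied substrings.
    return any(fragment in url for fragment in blocked_fragments)


if __name__ == "__main__":
    # Hypothetical example URLs, not taken from the dump.
    for candidate in (
        "http://kat.ph/community/?p_id=5",
        "http://kat.ph/community/",
    ):
        print(candidate, should_skip(candidate))
```

Checking the parsed query rather than the raw string avoids false matches on URLs that merely contain the text `p_id` in their path.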
20:42 🔗 godane getting a ton of user pictures now
20:44 🔗 godane there are 5000+ user pics
20:44 🔗 godane from kastatic.com/i2/u/# path
20:45 🔗 godane then there is kastatic.com/i2/userpics/#
20:55 🔗 godane the kastatic.com image dump is very big
20:55 🔗 godane and i have not gotten to kastatic.com/i2/userpics/
20:55 🔗 godane yet
21:05 🔗 godane my eyes
21:05 🔗 godane a fat guy took a picture of himself naked
21:06 🔗 godane that is what is in the data dump
23:47 🔗 godane i'm downloading 8-bit theatre
