#archiveteam 2014-06-15,Sun


Time Nickname Message
10:34 🔗 schbirid we have archived all ~140000 mp3s of earbits now
11:04 🔗 joepie91 131,224 songs according to the frontpage
11:06 🔗 schbirid details ;)
11:29 🔗 Nemo_bis WikiTeam vs. bitrot: 249 : 189 https://wikiapiary.com/w/index.php?title=WikiTeam_websites
11:30 🔗 Nemo_bis (I hope the actual number is better, not all dumps are mapped by wikiapiary yet.)
13:15 🔗 ivan` schbirid: very cool
14:22 🔗 Lord_Nigh this is all the elite-tnk stuff i could find on the internet, some mirrored from before various (gtoal, for instance) takedowns: https://dl.dropboxusercontent.com/u/79094972/elite-tnk.7z
14:23 🔗 Lord_Nigh i have no idea how that should be archived, some probably should be blacked out
14:23 🔗 Lord_Nigh gtoal stuff used to live at http://www.gtoal.com/athome/tailgunner/java/elite/elite-the_new_kind-1.0/
14:24 🔗 Lord_Nigh elitegl still exists at http://web.archive.org/web/20060515000000*/http://homepage.ntlworld.com/paul.dunn4/elitegl.zip
14:25 🔗 Lord_Nigh we have david braben to thank for all this stuff disappearing
14:29 🔗 Lord_Nigh ooh one more file, just found it: ftp://faime.demon.co.uk/pub/files/newkind.tar.bz2
14:35 🔗 Lord_Nigh wow that ftp is unreliable as hell, i may have just taken it down :(
14:43 🔗 joepie91 weird, it shouldn't be
14:43 🔗 joepie91 though, actually
14:43 🔗 joepie91 never mind
14:43 🔗 joepie91 that's a home-hosted box
14:44 🔗 joepie91 Lord_Nigh: demon is (was?) an ISP that allows every customer to get their own DNS subdomain for their home IP
14:44 🔗 joepie91 similar to what XS4ALL (and previously Demon NL before it got absorbed into XS4ALL) do
14:57 🔗 underscor that's sweet
14:57 🔗 underscor wish US ISPs did that
14:58 🔗 underscor I mean, I suppose I'd rather have properly priced symmetric broadband first
14:58 🔗 underscor lol
15:26 🔗 dashcloud there's something weird going on with my angelfire grab- it's now taking between 5 and 6 minutes per page downloaded. It started off very quickly, but now it's really, really slow
15:27 🔗 schbirid give the tape feeding robot its time ;)
15:37 🔗 db48x heh
16:29 🔗 Nemo_bis yay now automatic links to inside tars! 1.0 GB [contents] https://archive.org/details/wiki.urbandead.com
16:36 🔗 dashcloud so, instead of the OOM killer ending my grab, wget just ran out of memory and killed the grab
16:37 🔗 dashcloud it looks like I got bogged down in a lot of blog links from the pages like this: http://midlandlocks.angelfire.com/blog/index.blog?start=1340286959 should I use --reject-regex (what one?) or reject domain?
17:04 🔗 schbirid yay infinite calendars
17:07 🔗 dashcloud will a *blog* regex work like I want it to? (don't go to any page with blog in the url), or do I need a more targeted regex?
17:08 🔗 schbirid '/blog/' should work, maybe even 'midlandlocks.angelfire.com/blog/' (never tried that and ignoring that the . = any character)
17:09 🔗 schbirid maybe use '/blog/index.blog\?start=' rather?
17:10 🔗 dashcloud I imagine it's on a bunch of pages, so I guess /blog/index.blog\? would be a better choice
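For reference, wget's --reject-regex is matched against the whole URL, so the pattern above will skip every page of the infinite calendar. A minimal sketch of how it slots into a grab command; everything besides the regex is an illustrative assumption, not dashcloud's actual invocation:

    wget --mirror --page-requisites --warc-file=midlandlocks \
         --reject-regex '/blog/index\.blog\?' \
         http://midlandlocks.angelfire.com/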
17:10 🔗 dashcloud it'd be nice if I had a good list of old-style angelfire pages, instead of the new ones
17:12 🔗 dashcloud anyone tried crawling a webring before? this page: http://leocentaur.angelfire.com/theremin.html mentions it is part of a webring
17:12 🔗 schbirid http://www.angelfire.com/ga/quake4/
17:12 🔗 schbirid http://www.angelfire.com/ult/td/
17:12 🔗 schbirid are two old ones i know
17:13 🔗 schbirid but wasnt there a list of them all available?
17:13 🔗 schbirid dashcloud: http://www.angelfire.com/robots.txt
17:14 🔗 exmic aw yiss
17:14 🔗 exmic sitemaps <3
17:14 🔗 schbirid thanks for archiving angelfire btw, great target
17:15 🔗 dashcloud now that makes things easier
17:15 🔗 schbirid might not be complete but a great start
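The robots.txt route comes down to two steps: pull the Sitemap: lines out of robots.txt, then pull the <loc> entries out of each sitemap. A rough sketch, assuming the sitemaps may be gzipped; sitemap-index files would add one more level of the same <loc> extraction:

    curl -s http://www.angelfire.com/robots.txt | tr -d '\r' \
        | awk '/^Sitemap:/ {print $2}' > sitemaps.txt
    while read -r sm; do
        curl -s "$sm" | gzip -dcf \
            | grep -o '<loc>[^<]*</loc>' | sed 's/<[^>]*>//g'
    done < sitemaps.txt > urls.txt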
17:15 🔗 dashcloud is there a way to make wget multi-threaded, or do I need to just run multiple wgets?
17:17 🔗 exmic wpull might do what you want
17:17 🔗 exmic also might be a better choice for a large site like angelfire, because it uses sqlite instead of a list in ram
17:20 🔗 dashcloud I am definitely open to trying different things
17:21 🔗 dashcloud here's the current command I'm using for wget: http://paste.archivingyoursh.it/fucominiha.mel
17:21 🔗 dashcloud how do I get a copy of wpull, and what would an equivalent command look like?
17:27 🔗 exmic instructions are here https://github.com/chfoo/wpull
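A rough wpull equivalent, with flag names as documented in the wpull README of the time; the host, regex, and concurrency level are examples, since the original wget command only lived in the paste:

    pip install wpull
    wpull http://midlandlocks.angelfire.com/ \
        --recursive --page-requisites \
        --warc-file midlandlocks \
        --reject-regex '/blog/index\.blog\?' \
        --database midlandlocks.db \
        --concurrent 4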
17:43 🔗 schbirid what's that recommended upload script again? i forgot that curl cannot do multiple files at once and i dont want to use s3cmd again
17:46 🔗 schbirid nvm, i guess i can use multiple --upload-file args for curl
17:49 🔗 exmic internetarchive-python?
17:49 🔗 dashcloud this is probably what you want: https://pypi.python.org/pypi/internetarchive
17:50 🔗 exmic yeah, that
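Either route talks to archive.org's S3-like API. A hedged sketch with placeholder item name and keys: curl pairs each --upload-file with the URL that follows it, and the internetarchive package also installs an `ia` command:

    # S3-style keys from https://archive.org/account/s3.php
    curl --location \
        --header "authorization: LOW $ACCESSKEY:$SECRET" \
        --upload-file first.warc.gz  http://s3.us.archive.org/example-item/first.warc.gz \
        --upload-file second.warc.gz http://s3.us.archive.org/example-item/second.warc.gz

    # or, after `ia configure`:
    ia upload example-item first.warc.gz second.warc.gz --metadata="mediatype:web"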
17:51 🔗 joepie91 [18:29] <Nemo_bis> yay now automatic links to inside tars! 1.0 GB [contents] https://archive.org/details/wiki.urbandead.com
17:51 🔗 joepie91 whoo, so the "add that to the item page" thing happened :D
17:52 🔗 joepie91 dashcloud: http://dir.yahoo.com/computers_and_internet/internet/world_wide_web/searching_the_web/webrings/
17:52 🔗 joepie91 but it's Yahoo
17:52 🔗 joepie91 so be fast
17:52 🔗 joepie91 lol
17:54 🔗 dashcloud if my wget grab fails for a fourth time, I'll switch to that instead
17:55 🔗 Nemo_bis joepie91: it did!
18:04 🔗 schbirid great, that python thing fails with "OverflowError: long int too large to convert to int" on my warc.gz
18:05 🔗 schbirid and a similar issue has not been replied to in 2 months https://github.com/jjjake/ia-wrapper/issues/57
18:09 🔗 SN4T14 schbirid, stop being a peasant and use 64-bit Python. :p
18:09 🔗 schbirid SN4T14: the vm is 32 bit, so no
18:10 🔗 SN4T14 Switch to a 64-bit one, then
18:10 🔗 SN4T14 It's a Python problem, not a script one, it seems
18:10 🔗 schbirid not my problem... >:(
18:11 🔗 SN4T14 >.>
18:11 🔗 SN4T14 Alternatively, split the files into <2GB chunks
18:11 🔗 SN4T14 I'm guessing it's just taking the size in bytes and crapping its pants.
18:13 🔗 schbirid alternatively i will just use s3cmd again...
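Falling back to s3cmd against archive.org is mostly a config change; a sketch with placeholder keys and item name, assuming the usual .s3cfg fields:

    # in ~/.s3cfg: set access_key/secret_key to the archive.org S3 keys and
    #   host_base   = s3.us.archive.org
    #   host_bucket = %(bucket)s.s3.us.archive.org
    s3cmd put example.warc.gz s3://example-item/example.warc.gz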
18:48 🔗 db48x hmm
21:41 🔗 garyrh https://twitter.com/textfiles/status/478278683551993856
21:42 🔗 garyrh "We will be purging the media files over the next couple of weeks. Please make sure to retain your original copies in case you decide to license them directly in the future."
21:48 🔗 garyrh ooh fun: http://rawporter.s3.amazonaws.com/
21:50 🔗 midas WE ARE PLEASED TO ANNOUNCE RAWPORTER HAS ENTERED INTO AN EXCLUSIVE BUSINESS PARTNERSHIP AND WILL BE JOINING A MUCH LARGER ORGANIZATION.
21:50 🔗 midas oh fuck you yahoo.
21:51 🔗 joepie91 wtf is it with companies and misconfigured S3 buckets
21:51 🔗 joepie91 lol
21:52 🔗 Smiley joepie91: in a good or bad way?
21:52 🔗 joepie91 good way
21:52 🔗 joepie91 https://rawporter.s3.amazonaws.com/?marker=photos/4r0um476v3qmjp.jpg
21:52 🔗 joepie91 here
21:52 🔗 joepie91 marker works
21:52 🔗 joepie91 problem solved
21:52 🔗 joepie91 cc garyrh
21:52 🔗 joepie91 so we can just iterate over their shit, basically
21:53 🔗 garyrh yeah
21:53 🔗 garyrh i'm looking to see how they store videos
21:53 🔗 Smiley yey someone figured out markers.
21:53 🔗 Smiley plz be video/something.bla
21:54 🔗 joepie91 Smiley: markers are well-documented, they just didn't work with earbits because that wasn't direct S3
21:54 🔗 joepie91 but cloudfront-proxied
21:54 🔗 joepie91 which stripped GET params
21:54 🔗 joepie91 (which is why the params didn't appear to do anything)
21:54 🔗 joepie91 anyway
21:54 🔗 joepie91 marker: just set the last Key you encountered as marker
21:54 🔗 joepie91 and it'll be the first Key of the next page
21:54 🔗 joepie91 all alphabetically sorted
21:55 🔗 joepie91 should be 10 minutes of writing Python, tops
21:57 🔗 joepie91 :p
21:57 🔗 joepie91 bright side: they gave us pictures of the founders, so that SketchCow can put them on the slide-of-shame for his next presentation
21:57 🔗 midas indeed :p
21:57 🔗 midas prominent figure: CTO Michael Robinson, thanks for not securing your S3 bucket
21:58 🔗 midas hm, no CFO? perfect nobody to blame when they get the S3 bill
21:58 🔗 joepie91 hahaha
21:59 🔗 joepie91 I wonder what earbits guys will think
21:59 🔗 joepie91 "... the fuck is that spike"
21:59 🔗 SN4T14 lol
22:00 🔗 garyrh confirmed: videos are on s3
22:00 🔗 SN4T14 I wonder how many people you guys have fucked over with multi-thousand dollar AWS bills. :p
22:00 🔗 garyrh http://rawporter.s3.amazonaws.com/uploads/tmd57xoakjyy64.flv
22:00 🔗 midas and the hand written letter from Jeff Bezos "Thanks for the bandwidth usage this week!"
22:00 🔗 joepie91 hehe
22:01 🔗 joepie91 garyrh: awesome, it'll be in the list then
22:01 🔗 joepie91 so yeah
22:01 🔗 joepie91 somebody write something that iterates
22:01 🔗 joepie91 over the /
22:01 🔗 joepie91 and you're done
22:01 🔗 joepie91 I can't do that now, I need sleep
22:01 🔗 SN4T14 while true; do echo penis; done
22:01 🔗 joepie91 and I'll probably end up accidentally blowing up an S3 datacenter or something
22:01 🔗 SN4T14 That's something that iterates
22:01 🔗 SN4T14 :p
22:01 🔗 SN4T14 Kind of
22:01 🔗 SN4T14 Well, that looks
22:01 🔗 SN4T14 hang on
22:01 🔗 SN4T14 loops
22:02 🔗 garyrh i'll give it a shot
22:02 🔗 SN4T14 iteration=("iteration" "is" "silly"); for i in "${iteration[@]}"; do echo $i; done
22:02 🔗 SN4T14 That's something that iterates! :D
22:03 🔗 joepie91 [00:01] <SN4T14> while true; do echo penis; done
22:03 🔗 joepie91 I'm not sure I'd object to this
22:03 🔗 joepie91 also, ew, bash
22:03 🔗 SN4T14 lol
22:03 🔗 SN4T14 alternatively
22:03 🔗 SN4T14 yes penis
22:03 🔗 midas also, -bs
22:03 🔗 SN4T14 does the same. :p
22:04 🔗 SN4T14 ...I thought I was in -bs. :p
22:04 🔗 joepie91 bs indeed!
22:04 🔗 joepie91 as did I, actually
22:04 🔗 joepie91 lol
22:04 🔗 midas ;-)
22:04 🔗 SN4T14 They both just say #ArchiveTe in HexChat before getting cut off. :p
22:04 🔗 midas lol
22:11 🔗 db48x I have too many ssh connections
22:11 🔗 db48x and too many tmux sessions
22:17 🔗 SN4T14 db48x, put them all inside a tmux session! :D
22:19 🔗 joepie91 I put a tmux in your tmux so you can tmux while you tmux...
22:19 🔗 SN4T14 Oh yes, tmux me harder. :p
22:19 🔗 * SN4T14 moans
22:21 🔗 * joepie91 raises eyebrow
22:21 🔗 SN4T14 Cue the porn music!
22:21 🔗 SN4T14 (Crap, this isn't -bs, why do we keep moving over? >.>)
22:32 🔗 db48x the difference between a window manager and tmux is pretty slim
23:17 🔗 db48x so what's the channel name for rawporter?
23:18 🔗 garyrh db48x, i got a list of files from s3 and am currently downloading them
23:19 🔗 db48x estimated size?
23:19 🔗 garyrh probably a few gigs
23:20 🔗 db48x that's pretty small
23:20 🔗 garyrh yeah, i think the site was small
23:20 🔗 db48x do we know what their url structure was like?
23:21 🔗 db48x or were there any public videos at all?
23:22 🔗 garyrh files were stored on http://rawporter.s3.amazonaws.com/
23:22 🔗 garyrh pictures and videos
23:22 🔗 db48x sure, but if we're going to put up an archive of the site, it will help to know what was visible at what url
23:24 🔗 garyrh profiles: http://rawporter.com/pm/NUMBER, media:http://rawporter.com/m/NUMBER
23:24 🔗 garyrh but now the site is down, so it's hard to tell what else there is/was
23:24 🔗 Lord_Nigh is the s3 down too now?
23:25 🔗 garyrh no s3 is still up, but you can't access the profiles/media on their website anymore
23:26 🔗 db48x https://webcache.googleusercontent.com/search?q=cache:V4yT2PCIjlkJ:rawporter.com/m+&cd=4&hl=en&ct=clnk&gl=us&client=firefox-nightly
23:26 🔗 db48x not very helpful
23:27 🔗 garyrh yep, that's how I found out the s3 link.
23:27 🔗 garyrh too bad there was no warning
23:28 🔗 db48x http://blog.rawporter.com/ is still there
23:29 🔗 zyce there's some cached pages of it on bing, might help
23:33 🔗 ete multiple blog posts angry at how google does not disable right clicking to save images
23:36 🔗 db48x yea, hilarious actually
