#archiveteam 2012-03-14,Wed


Time Nickname Message
00:28 🔗 chronomex but we are all stuck with their bullshit from now into eternity
00:39 🔗 db48x chronomex: we always are
00:39 🔗 db48x chronomex: geocities had a novel but ultimately silly way of organizing users
00:40 🔗 db48x gotta deal with that every time you deal with geocities in any way
00:40 🔗 chronomex and they tried to foist it upon the world as a web standard
00:40 🔗 chronomex unlike geocities
00:40 🔗 db48x true :)
01:21 🔗 SketchCow HI
01:21 🔗 SketchCow Better run, faster than my bullet
01:21 🔗 SketchCow DFJustin: Congrats, you're now an admin.
01:22 🔗 DFJustin yay
01:22 🔗 DFJustin already been tweaking some stuff
01:23 🔗 DFJustin also you can throw these in the archiveteam collection or w/e http://www.archive.org/details/konachan-siterip-2010 http://www.archive.org/details/konachan-siterip-2011
01:28 🔗 DFJustin who do I bother about bugs in the isoview script
01:35 🔗 SketchCow Gear them up and mail them to me.
01:35 🔗 SketchCow Ultimately I'd like someone to rewrite it to be pretty and HTML compliant.
01:42 🔗 nitro2k01 Is this some php (or other script) code that reads data directly from an ISO on request?
01:43 🔗 chronomex yes
01:43 🔗 nitro2k01 Maybe there's something to be won from putting that data in a database
01:43 🔗 nitro2k01 Not sure
01:43 🔗 nitro2k01 And my not sure is not sarcasm
01:44 🔗 chronomex or dump a 'find -ls' from every iso into its metadata
01:44 🔗 chronomex same for zip, tar, etc
01:44 🔗 nitro2k01 Or that
01:45 🔗 chronomex put into metadata, database is step #17
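The listing idea above can be sketched in shell. This is only an illustration, not the isoview script: instead of mounting each image and running 'find -ls', it asks the archive tool for a verbose listing. The function names and the ".listing.txt" suffix are made up for the example, and reading ISO images assumes bsdtar (libarchive) is installed.

```shell
#!/bin/sh
# Sketch: write a file listing next to an archive so the listing can be
# kept as metadata instead of opening the archive on every request.
# The "<archive>.listing.txt" name is an assumption for this example.

list_archive() {
    archive=$1
    case "$archive" in
        *.tar|*.tar.gz|*.tgz|*.tar.bz2)
            tar -tvf "$archive" ;;
        *.zip)
            unzip -l "$archive" ;;
        *.iso)
            # Assumes bsdtar (libarchive), which can read ISO 9660 images.
            bsdtar -tvf "$archive" ;;
        *)
            echo "unsupported archive type: $archive" >&2
            return 1 ;;
    esac
}

write_listing() {
    list_archive "$1" > "$1.listing.txt"
}
```

Usage would be something like `write_listing disc.iso`, after which the text file can be uploaded alongside the item.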
01:46 🔗 nitro2k01 Another project someone might want to dump at some point. 8bitcollective, if/when the site comes back
01:47 🔗 nitro2k01 About 100 GB of chiptune IIRC
01:47 🔗 chronomex yow, thats a lot of chiptune
01:47 🔗 nitro2k01 Yup
01:48 🔗 nitro2k01 I may be wrong on the exact number, and how it's developed over the last years
01:48 🔗 chronomex sure
01:48 🔗 nitro2k01 But in the order 50-150 GB
01:49 🔗 nitro2k01 Complication: The guy who owns the place is known not to pay his bills in time. And his father is a trigger-happy lawyer
01:49 🔗 nitro2k01 If his bandwidth bill doubled over a month, he might wtf a bit
01:50 🔗 db48x heh
01:50 🔗 chronomex aha excellent
01:50 🔗 chronomex i'm sure he'd be willing to make a donation
01:51 🔗 SketchCow If you liked it you should have put metadata on it.
01:51 🔗 SketchCow Complication whaaaat
01:51 🔗 SketchCow How about this
01:51 🔗 SketchCow How about I talk to Nullsleep about talking to the guy about mirroring it on archive.org
01:52 🔗 nitro2k01 The owner is Jose Torres, hated by most of the old chip world
01:52 🔗 nitro2k01 Most people, including nullsleep, wouldn't want to have anything to do with him if they could help it
01:54 🔗 SketchCow haa
01:54 🔗 SketchCow Well, that's charming.
01:57 🔗 chronomex yay computers
01:57 🔗 chronomex bringing out the dickheads in all of us
02:01 🔗 nitro2k01 I'm in the mood for some internet http://welcometointernet.org/
02:01 🔗 chronomex iiiiinternetttt
02:01 🔗 chronomex we love us some iiiiinternettttt
02:04 🔗 nitro2k01 Also, I need to learn wget/set up a way to mirror pages. Far too often shit just disappears on me
02:04 🔗 nitro2k01 Not cool, internet
02:04 🔗 nitro2k01 Today, devrs.com
02:04 🔗 lemonkey blogs.starwars.com shutting down
02:04 🔗 lemonkey http://www.theforce.net/latestnews/story/BlogsStarWarscom_Is_Shutting_Down_144246.asp
02:05 🔗 lemonkey following cancellation of hyperspace fan club
02:05 🔗 shaqfu Speaking of Twitter, is Posterous at-risk now?
02:05 🔗 lemonkey YES
02:05 🔗 lemonkey it was an acqui-hire
02:05 🔗 lemonkey so I don't think they're interested in the posterous tech at all
02:06 🔗 nitro2k01 Luckily, someone had a mirror of it from a couple of years back (including zip files) and with some addition of GCache and IA material I should be able to create a pretty up to date copy
02:06 🔗 lemonkey I think they're providing migration tools for tumblr and google blogs
02:06 🔗 lemonkey er to
07:24 🔗 SketchCow Well, that was a great show
07:24 🔗 SketchCow I think a dude was hitting on me
07:24 🔗 SketchCow But no boldness.
07:24 🔗 SketchCow Danced, but no bear hug
07:24 🔗 SketchCow Go for the bear hug, jesus
07:24 🔗 SketchCow What's to lose, it's 1am
07:38 🔗 chronomex for serious
07:40 🔗 chronomex you can't get laid if you don't ask
07:40 🔗 chronomex why'd you let him get away, man
07:40 🔗 chronomex I'm sure that would have made an awesome story
07:50 🔗 jonas__ hi:)
07:51 🔗 chronomex y0
07:55 🔗 SketchCow Cool story, bro
07:55 🔗 SketchCow Tie a handkerchief to his angle
07:55 🔗 SketchCow ankle
07:59 🔗 chronomex what's up, jonas__?
08:02 🔗 SketchCow OK, next. http://www.archiveteam.org/index.php?title=Main_Page
08:02 🔗 SketchCow Are people seeing the new site?
08:02 🔗 SketchCow With the broken thumbnail?
08:02 🔗 chronomex I see we are going to rescue your shit
08:03 🔗 jonas__ This page was last modified on 25 January 2012, at 20:50.
08:07 🔗 SketchCow I see it didn't 'take'.
08:18 🔗 SketchCow OK, it now 'took'.
08:18 🔗 SketchCow Let's hope it works.
08:19 🔗 jonas__ f5,f5,f5,not yet
08:24 🔗 jonas__ what's your current take on me.com? wondering whether to join and put my nick in the "highscore" at memac.heroku
08:26 🔗 chronomex that'd be excellent
08:26 🔗 chronomex me.com is chugging along slowly
08:26 🔗 chronomex jesus shit I didn't realize we had 50T already
08:36 🔗 jonas__ so far archiveteam is only allowed to store at archive.org, not run the scripts there? ^_^
08:37 🔗 jonas__ and how were the 200k usernames collected actually, is that list complete?
08:38 🔗 chronomex as usual we have no good way to know for sure
08:39 🔗 ersi Chaos is our fuel
08:40 🔗 ersi I'd be willing to bet it isn't a complete list of usernames, but you know what's greater than 0? 200k!
08:42 🔗 jonas__ :)
08:43 🔗 jonas__ just wondering how its made
08:43 🔗 chronomex google + wordlists
08:43 🔗 ersi most likely yeah
08:44 🔗 chronomex no, like, that's how archiveteam does this shit.
08:44 🔗 chronomex google + wordlists are source number one
08:46 🔗 jonas__ didnt google increase restrictions on that?
08:47 🔗 chronomex the whole world hates us
08:47 🔗 chronomex ;)
08:48 🔗 jonas__ *shakehands*
08:49 🔗 jonas__ so you need a big set of IPs, which aren't commonly known proxies, to get >99% of the google results with the wordlist
08:49 🔗 jonas__ ?
08:51 🔗 chronomex I dunno, I haven't done that side of things in a while
08:52 🔗 ersi jonas__: IPv6 man, IPv6
08:55 🔗 SketchCow Good luck, I'm behind seven dead hookers
09:16 🔗 db48x jonas__: you'd have to ask alard, I think he was the one that came up with the user list
09:16 🔗 db48x also, it's fairly easy to add in new users if you can find them :)
09:20 🔗 alard Hi.
09:20 🔗 alard ipv6 is indeed very helpful.
09:50 🔗 SketchCow Can people visit www.tapedocumentary.com
09:52 🔗 db48x SketchCow: yes
09:52 🔗 SketchCow Thanks
09:52 🔗 db48x your </td> is after your </tr>
09:52 🔗 db48x should be the other way around
09:52 🔗 db48x and you have no doctype
09:53 🔗 db48x no html5 for you
10:05 🔗 db48x SketchCow: what did you change on the front page?
10:06 🔗 db48x I don't see anything in the history...
10:06 🔗 db48x I do see a new spammer though :)
10:08 🔗 SketchCow Do you.
10:08 🔗 db48x alas, I do
10:09 🔗 SketchCow Extra bonus points for telling me before I fall asleep
10:10 🔗 db48x http://www.archiveteam.org/index.php?title=User:DugaldHurst3751
10:11 🔗 SketchCow Congrats on misdirecting me.
10:12 🔗 db48x oh?
10:13 🔗 SketchCow Spammer will temporarily win
10:13 🔗 SketchCow We were discussing tapedocumentary.com
10:13 🔗 SketchCow then you go 'the front page'
10:13 🔗 SketchCow So I wasted, oh, 2-3 minutes on "find spammer on front page of tapedocumentary.com"
10:14 🔗 SketchCow Going to bed now, will deal tomorrow
10:14 🔗 db48x oh, my bad
10:15 🔗 db48x sleep well
10:15 🔗 SketchCow I am finally hunkering down and fixing all 10 sites that have spam or other issues.
10:15 🔗 SketchCow And are hacked a la dreamhost.
10:16 🔗 SketchCow This is taking a little time.
10:17 🔗 db48x yea, I imagine
10:20 🔗 SketchCow New archiveteam.org has fixed image.
10:20 🔗 SketchCow I found a weird bug there.
12:13 🔗 PepsiMax What was the last reason I quit?
12:13 🔗 PepsiMax I seem to have a very borken IPv6 connection.
12:14 🔗 db48x PepsiMax: just a ping timeout
13:19 🔗 PepsiMax db48x: darnit, then something is wrong with mah IPv6.
13:20 🔗 db48x alas
13:20 🔗 db48x is it a tunnel?
14:47 🔗 PepsiMax db48x: no, my router supports native IPv6.
14:48 🔗 PepsiMax I told Irssi to stop preferring IPv6, so I should be on IPv4 now.
14:57 🔗 alard kennethre: Hi. The MobileMe break is over, apparently. So if you want start again... (But maybe your virtual bill is already large enough.)
14:57 🔗 kennethre alard: excellent! I'll spin up a few
14:58 🔗 kennethre alard: I need to think about the virtual bill though :)
14:58 🔗 alard Sure. But everything helps!
15:21 🔗 SketchCow MORNING
15:21 🔗 SketchCow OK, there goes the spammers on archive.org
15:28 🔗 isforinse I haven't tried to get a range of content from archive.org in a while. How would I go about downloading all of project gutenberg?
15:32 🔗 SketchCow I mean archiveteam.org
15:32 🔗 SketchCow Now adding the Cite and Captcha extensions back, now that the rest is working.
15:36 🔗 isforinse I guess I can just wget http://gutenberg.readingroo.ms/
15:39 🔗 alard isforinse: Or use rsync. http://www.gutenberg.org/wiki/Gutenberg:Mirroring_How-To
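For the plain wget route isforinse mentions, a rough sketch of a polite recursive fetch might look like the following. The one-second wait and the mirror_site function name are choices made for this example, not anything the mirror requires; WGET can be set to echo to preview the command without downloading anything.

```shell
#!/bin/sh
# Sketch: build a polite recursive wget invocation for mirroring a site.
# WGET can be overridden (e.g. WGET=echo) to preview the command.
WGET=${WGET:-wget}

mirror_site() {
    url=$1
    # --mirror turns on recursion and timestamping, --no-parent stays
    # below the starting directory, and the wait options space out requests.
    "$WGET" --mirror --no-parent --wait=1 --random-wait "$url"
}
```

Calling `mirror_site http://gutenberg.readingroo.ms/` would start the crawl. For a collection this size, the rsync method in the Mirroring How-To is likely the lighter option for both sides.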
15:48 🔗 SketchCow OK, I got the Cite and Blacklist extensions in
15:48 🔗 SketchCow And UserMerge
15:55 🔗 DFJustin underscor wrote an ia_grab script but I haven't seen it posted anywhere, he got distracted by doing it with git-annex instead
15:56 🔗 DFJustin which is cool but not really a grab-n-go solution for people
16:10 🔗 db48x yea, git annex is shiny
16:12 🔗 godane is there a way to bypass a 302 with wget
16:12 🔗 godane i'm trying to get the whitepapers from the register
16:13 🔗 Schbirid well, depends what triggers it
16:13 🔗 Schbirid wait what
16:13 🔗 godane it goes to a login screen
16:13 🔗 godane there are 2404 whitepapers on the register
16:14 🔗 Schbirid freely available?
16:14 🔗 godane you have to login to get them
16:15 🔗 godane i have a login account but i don't want to manually download every one through firefox
16:15 🔗 Schbirid then login, save the cookie data, use the cookie with wget
16:15 🔗 Schbirid might wor
16:15 🔗 Schbirid k
16:27 🔗 Coderjoe you need to figure out the login form data you need to send, as well as the url to send it to. then you can tell wget to save cookies (including session cookies) to a file while sending the form data to the login handler. then, on your download attempts with wget, you just tell it to load the cookies from the file
16:27 🔗 alard wget --save-cookies=cookies.txt --keep-session-cookies --post-data="username=X&password=Y" http://.../login
16:27 🔗 alard wget --load-cookies=cookies.txt --keep-session-cookies http://.../download
16:27 🔗 Coderjoe (provided they are using cookies to track logins and not http auth)
16:28 🔗 Coderjoe pretty much what alard said as well. I don't think he provided the form data you need, however. you still need to figure that out.
16:28 🔗 alard :)
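The two steps Coderjoe and alard describe can be wrapped as shell functions, roughly as below. This is a sketch under assumptions: the form field names username and password are alard's placeholders and have to be replaced with whatever the real login form posts, and the names WGET, COOKIES, login and fetch are invented for the example.

```shell
#!/bin/sh
# Sketch of a cookie-based login followed by authenticated downloads.
# The field names "username" and "password" are placeholders; check the
# real login form for the names and the URL it posts to.
WGET=${WGET:-wget}
COOKIES=${COOKIES:-cookies.txt}

# Step 1: post the login form and save cookies, session cookies included.
login() {
    login_url=$1
    user=$2
    pass=$3
    "$WGET" --save-cookies="$COOKIES" --keep-session-cookies \
        --post-data="username=$user&password=$pass" \
        -O /dev/null "$login_url"
}

# Step 2: load the saved cookies for each later download.
fetch() {
    "$WGET" --load-cookies="$COOKIES" --keep-session-cookies "$1"
}
```

Run `login` once, then `fetch` per file. Values containing characters such as & or = would need percent-encoding before going into --post-data, and if cookies.txt stays empty the site may be setting its cookie on a different request than the one being posted to.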
16:33 🔗 godane my cookie is always empty
16:33 🔗 isforinse godane: link to site?
16:33 🔗 godane http://account.theregister.co.uk/login/
16:35 🔗 isforinse <form action="/paper/download/2339/tms-customssd-outlook-report-final.pdf" method="get">
16:35 🔗 isforinse hrrm, nvm
16:37 🔗 Coderjoe looks like you may need to make something to give the urls to wget, since I am pretty sure wget will not spider forms
17:42 🔗 db48x http://raganwald.posterous.com/dear-landlord
18:05 🔗 godane how do you redirect wget to get the right filename?
18:06 🔗 godane i do this: wget --keep-session-cookies --load-cookies=cookies.txt http://whitepapers.theregister.co.uk/paper/download/0003
18:07 🔗 godane it redirects to filename in browser but not with wget
18:07 🔗 db48x use -O
18:08 🔗 godane i want their file name
18:08 🔗 godane i don't want to have to type it in for everyone
18:09 🔗 godane *every time
18:11 🔗 isforinse godane: does the file provide the filename?
18:11 🔗 isforinse ...download/0003 probably saves with a useless filename, yes?
18:12 🔗 db48x I imagine they're giving the filename in one of the headers
18:12 🔗 db48x Content-disposition or whatever it is
18:14 🔗 alard I think you need --trust-server-names for that.
18:31 🔗 SketchCow WELL TODAY JASON LEARNED A LITTLE LESSON ABOUT NPR'S "ON THE MEDIA"
18:31 🔗 SketchCow I think the best part was when he compared me to OJ Simpson
18:32 🔗 SketchCow Also, they kept gearing the conversation to Ghostbusters and we are like Ghostbusters and aren't we like Ghostbusters and at the end the host said "Cue the Ghostbusters theme" so I think we know where this is going.
18:32 🔗 chronomex this deserves an image macro
18:33 🔗 chronomex archiveteam: ghostbusters or oj simpson? BOTH!
18:33 🔗 nitro2k01 Link?
18:34 🔗 chronomex host has yet to edit himself into seeming less hung over
18:34 🔗 chronomex which he totally is, clearly
18:34 🔗 chronomex (I listened in)
18:36 🔗 db48x heh
18:36 🔗 SketchCow yes, it was meant to be 'come, see how professionals conduct themselves' and it was 'come, see a professional trip over his own dick so many times you want to call 911 and make him recite the alphabet'
18:37 🔗 SketchCow He said "how's the law" and I said this and that
18:37 🔗 SketchCow And he said "Well, OJ simpson is in jail for using guns to get his own property"
18:37 🔗 SketchCow And I said "Archive Team does not employ weapons or firepower to acquire websites."
18:37 🔗 SketchCow I hope that goes in
18:37 🔗 SketchCow Singularly awful
18:38 🔗 chronomex a singularity of fuckery
18:39 🔗 nitro2k01 "This is a robbery. Hand over your web site or I'll shoot your fucking CPU out!"
18:40 🔗 chronomex haxxxx
18:41 🔗 chronomex watchout we haax the gibsonnn
18:44 🔗 chronomex i think the show comes out on fridays
19:02 🔗 lemonkey oink shutting down http://techcrunch.com/2012/03/14/kevin-roses-oink-shuts-down/
19:03 🔗 chronomex they have export tool, they say
19:24 🔗 shaqfu_ Oh, other Oink
19:24 🔗 shaqfu I was about to say; that's been gone for years...
19:43 🔗 lemonkey http://lifehacker.com/5893278/how-to-protect-your-data-in-the-event-of-a-webapp-shutdown-and-prevent-the-problem-in-the-future
19:46 🔗 lemonkey doh
19:47 🔗 lemonkey joe's barbershop to be shutdown for a few months
19:47 🔗 lemonkey fire in basement
19:47 🔗 lemonkey http://sfist.com/2012/03/14/cyclists_terrorized_on_the_wiggle_s.php
19:48 🔗 lemonkey sry wrong chan
19:49 🔗 chronomex heh
20:53 🔗 Coderjoe hahaha
20:53 🔗 Coderjoe http://holistic.xkcd.com/
20:54 🔗 db48x yea
21:10 🔗 godane alard: --trust-server-names doesn't work
21:10 🔗 alard But you do get a file? Or not even that?
21:11 🔗 godane i get the file as 0003
21:12 🔗 godane but i don't want to use -O cause there is over 2000 urls like this
21:16 🔗 db48x godane: use web-sniffer.net or something to get a look at the response headers
21:19 🔗 isforinse Who is it who does the wiki archiving?
21:19 🔗 alard godane: Try --content-disposition (if your wget is recent enough).
21:19 🔗 Coderjoe --content-disposition
21:19 🔗 Coderjoe and i'd been using it for a few years. I'd be surprised if the version in use is too old
21:19 🔗 godane that worked
21:20 🔗 alard Coderjoe: The manual says 'experimental (not fully-functional)', but yeah, who knows how long it has been like that.
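Since wget will not spider the download forms, one way to cover the 2000-odd URLs is to generate the list and hand it to wget with -i, letting --content-disposition pick the file names. A sketch, with two assumptions flagged: the IDs are taken to be zero-padded to four digits purely because godane's one example ends in 0003, and gen_urls is a name made up here.

```shell
#!/bin/sh
# Sketch: print one download URL per ID so wget can read them with -i.
# BASE is the URL godane used; four-digit zero padding is an assumption
# based on his single example (.../download/0003).
BASE=http://whitepapers.theregister.co.uk/paper/download

gen_urls() {
    i=$1
    last=$2
    while [ "$i" -le "$last" ]; do
        printf '%s/%04d\n' "$BASE" "$i"
        i=$((i + 1))
    done
}

# Example run (commented out so nothing is fetched by accident):
# gen_urls 1 2404 > urls.txt
# wget --load-cookies=cookies.txt --keep-session-cookies \
#      --content-disposition --wait=1 -i urls.txt
```

IDs that do not exist will simply fail, and wget moves on to the next line of urls.txt.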
21:21 🔗 ersi isforinse: emijrp is a driving force at that
21:21 🔗 ersi isforinse: I'd recommend popping into #wikiteam
21:21 🔗 isforinse That's the name, thanks ersi
21:21 🔗 ersi he's not there (or here) right now though
21:22 🔗 ersi yeah :) np, you're welcome
21:22 🔗 Coderjoe date on the earliest script I have with --content-disposition: Apr 18 2009
21:31 🔗 godane looks like whitepapers 4 through 8 don't exist
21:33 🔗 chronomex ugghhhh im getting sick
21:33 🔗 chronomex good thing im coming home RIGHT NOW
21:35 🔗 nitro2k01 More data deletion http://0000free.com/e/403.html
21:35 🔗 nitro2k01 (A site I liked was hosted there.)
21:51 🔗 godane looks like some of the whitepapers need more information
21:52 🔗 godane so i end up doing those manually
22:00 🔗 Nemo_bis already linked here? http://www.ipetitions.com/petition/save-cbc-music-archives
22:02 🔗 SketchCow Yes
22:02 🔗 SketchCow You know what I hate? Petitions.
22:02 🔗 SketchCow Know what I like? Action.
22:02 🔗 SketchCow Over two months ago I called the archivists in charge of the archives.
22:02 🔗 SketchCow They're going to good homes, and archive.org is there if they can't be found.
22:02 🔗 is4 +1
22:10 🔗 godane oh goodie
22:11 🔗 godane i'm getting a 34mb flash video
22:12 🔗 Nemo_bis So it's not true that they're throwing away things which have not been digitized?
22:13 🔗 Nemo_bis Of course I didn't link it to ask signatures. :p
22:15 🔗 Nemo_bis ah http://calgary.openfile.ca/blog/curator-blog/curated-news/2012/calgary-music-store-buys-entire-music-archive-cbc-calgary
22:38 🔗 kennethre of interest: githubarchive.org
22:38 🔗 kennethre code: https://github.com/igrigorik/githubarchive.org/
22:56 🔗 closure good way to eventually enumerate all github repos
22:56 🔗 * closure removes that from his todo list
