[00:29] episode 4x07 is up of hak5
[01:04] how's this look? http://pastebin.com/BwEgbDT1
[01:13] how about some line breaks
[01:16] I was told one giant line of text is what was wanted, so that's how I re-formatted it
[01:17] if you've got an example or a sample I can look at, that would be appreciated
[01:17] Almost got it.
[01:17] I don't literally mean HEADER:
[01:18] I want it like this, no line reaks
[01:18] But HEADER isn't needed, page numbers not needed.
[01:27] how's this: http://pastebin.com/w85muBcN ?
[01:33] Big K Magazine, April 1984. Contents. Games Programs: ROCKET for VIC 20, BOMB RUN for ORIC, DEMON DRIVER for COMMODORE 64, DOWN FALL for BBC Model 8, ESCAPE for SPECTRUM. SOFTWARE REVIEWS: Charlie Nicholas reviews for us. HARDWARE: Wonderful Widgets, Brilliant Bodges- A Cheapo Epro, Goad Your Code the 6502 Way, Squaring Up- Atari v. Acorn 91. FEATURES: Do you Sincerely Want to be Rich? ...
[01:33] See what I did there?
[01:33] okay- got it
[01:34] I guess Games Programs should be GAMES PROGRAMS:
[01:38] What about the free-standing bits at the end?
[01:42] Don't go crazy
[01:42] okay then- http://pastebin.com/FDBHL2MX
[01:43] Yes.
[01:43] proceed
[01:44] move on to the next one? or did you want more stuff captured from the contents page?
[01:45] I mean go on, do it all
[01:45] The style is good
[01:45] Do this issue, and move on to next issue, this will be good.
[01:45] okay
[02:16] did you also want the boilerplate at the bottom of the page?
[02:23] No
[02:26] okay- think I got the first one done then: http://pastebin.com/79nPpTLk
[02:32] I showed your post about the pirate radio archive to a guy I know online who's into it, and he pointed me to http://radio-airchecks.nl/ , which he says has 500 GB at least of pirate radio recordings
[02:42] just watch a clip of glenn beck on black tom explosion
[02:42] http://en.wikipedia.org/wiki/Black_Tom_explosion
[04:56] Well, even though it's taking a billion years, I am uploading those 80 Microcomputers.
[05:20] Just blew in 61 issues of Commodore Format magazine, which was dedicated to the Commodore 64.
[05:20] http://www.archive.org/details/commodore-format-magazine
[05:20] Should be ready for perusal in an hour or so.
[05:50] just thought I say hi and let you know I haven't forgotten you. One day... in some way... you will be repaid. Dishonesty is too nice a word for you... Have a happy holiday season... NOT.
[05:50] My e-mail is awesome
[05:50] This is a guy whose hard drive I still have
[05:50] Some of you have noticed my slow turnaround
[05:50] He turned abusive
[05:51] Not surprisingly, I think you'll understand, his turnaround became slower
[05:51] So we're now in xeno's paradox
[05:54] ShetchCow: i readed about your distriwiki
[05:54] couldn't that be done using a git/mecurinal like vcs
[05:57] It could be done a ton of ways.
[05:57] It'd be a module.
[05:57] i think my linux source dvd will be of some use
[05:58] i have full distro that will be able to rebuild it self
[05:58] websites, sources tarballs ( recompressed to .tar.lzma to save space), and repos of projects
[06:01] i also use dokuwiki cause slitaz doc website does
[06:02] dokuwiki puts all docs in plan text :-D
[06:02] the history is a problem when the users of changed history doesn't exist though
[06:04] Not true
[06:04] It means there has to be a shared update process
[06:04] And it means you will have race conditions
[06:05] And those are all problems that need fixing.
[06:05] ok
[06:06] but all changes should be able to be reverse anytime?
[06:06] i only think git or mecurinal cause you can just branch off the master/default branch
[06:26] not often you see SketchCow talking with himself
[06:29] So.. this is the fifth day of my instructables.com mirroring
[07:07] It' a big one.
[07:07] indeed
[07:19] It's growing slowly.. I'm up at 28GB now
[07:20] that's 5-6GB/day
[07:21] someone here had downloaded 40GB "once upon a time)..
I'm atleast 1-3 days away from that
[07:48] http://vimeo.com/28976327
[07:50] Ooh
[07:52] Hm, I'd do either 6502 or Tape - but Arcade would be interesting as well, even though that's atleast a little covered
[07:53] SketchCow: interesting video. I thought the tense music was odd though
[07:53] I laughed at the tagline for the Tape documentary though, which is good :)
[07:53] It added atmosphere
[07:54] This whole thing is completely off-kilter.
[07:55] yea, the atmosphere felt wrong somehow
[07:56] i liked it
[07:56] felt, human.
[07:58] It is intentionally wrong.
[07:58] You won't forget it soon, will you.
[08:01] Ha, strategic
[08:02] The whole thing is strategic.
[08:02] It appeals to a certain kind of person.
[08:02] A person who would give me hundreds of dollars and not see a thing for years.
[08:03] That's not reddit people.
[08:03] :)
[08:03] It also gets weirder the more times you play it.
[08:04] heh
[08:06] Doesn't it.
[08:07] It does.
[08:09] You know those guys who make something filmy and then run around showing you their stuff and watching you and quizzing you on what you think?
[08:09] I ain't one of those guys.
[08:10] But I will say, I accounted for the liking two out of three.
[08:10] You can invest with premiums in two
[08:11] or you can invest in all three for slightly less than normal all three.
[08:11] ah, I was mislead by the "beta" designation
[08:12] Well, I like reactions.
[08:12] But I don't seek it out.
[08:13] Either people will invest, and I'll hit my goal, or they won't.
[08:13] And then I merely have to archive forever
[08:13] heh
[08:14] Beta is merely my worry of it not rendering.,
[08:14] ah
[08:15] so which of the three would you prefer to do?
[08:16] All
[08:16] otherwise he would have promoted only one :) (I think)
[08:17] ha ha
[08:17] Uploaded it to kickstarter page (preview)
[08:17] I just love it
[08:17] What a weird video
[08:23] * db48x2 yawns
[09:07] hrm
[09:07] I can't find my book of stamps
[10:09] I had it here somewhere not even a year ago
[10:41] finally, my cap shall refresh at midnight!
[10:42] heh
[10:44] gah! i hate this stupid shake thing in windows7
[10:45] shake thing?
[10:45] yeah, where you grab the window panel
[10:45] you shake it left and right like twice and it minimizes all the windows except the one your dragging
[10:47] oh, right
[10:47] why do you hate it?
[10:47] because i have 3 monitors, so, when i drag something across, it assumes im doing the shake thing
[10:47] and minimizes everything when i like ot have it up where it is
[10:48] huh, I don't have that problem with my three monitors
[10:48] you using eyefinity though?
[10:48] hmm, not at the moment
[10:48] ahh, se eim using eyefinity
[10:48] see im*
[10:50] lol
[10:50] I turned on eyefinity and it's got my monitors arranged vertically
[10:52] you should be able to drga the monitors around on the plotter thing to put them right lol
[10:52] no, I had to disable eyefinity and set it up again
[10:52] O.o
[10:52] then it let me choose between 1x3 and 3x1
[10:53] ahh yeah
[10:53] ive done 5 monitor setups with eyefinity lol
[10:54] ok, now I've got it set up "right"
[10:54] awsome!
[10:54] dragging windows around doesn't trigger the shake gesture though
[10:54] even across monitor boundaries
[10:54] for mine it does oddly enough
[10:54] im not sure how i can turn it off
[10:54] weird
[10:55] though im using an XFX card and sometimes i wonder how shoddy the drivers are
[10:56] already got a problem with my display port because of the cards bios and they refused to give me an updated bios
[10:57] fun
[10:57] well, I have to go back to my old settings
[10:57] eyefinity is ok for games, but terrible for a normal windows desktop
[10:57] and also my monitors are not all the same resolution
[10:59] there, back to normal
[11:00] well, except that all my windows are on the wrong displays :)
[11:00] Arg, OCD overload
[11:00] Windows are not allowed to move
[11:00] :)
[11:01] they should all be maximized
[11:01] or at least almost all of them should be maximized
[11:02] main screen has my main program maximised and my 2nd screen has IRC and whatever other chat windows laid out in a way where I can see most of them
[11:27] SketchCow: nice shortened url in your pitch vid :P
[11:28] the page doesn't appear to be up yet. are you proposing all three of those or letting people choose one?
[11:29] because, fuck, I want all three
[11:42] random question: how do you store your archives? warc? arc? tgz of directory?
[11:43] I store them in a gigantic .derp
[11:43] all files appended after each other
[11:44] do you have a .herp file with the offsets
[11:48] lol
[11:48] No, that's not derpy or herpy at all
[11:49] :(
[11:49] just I notice from http://www.archiveteam.org/index.php?title=Wget_with_WARC_output you're using the old version of warctools
[11:50] (which is well, unpleasant)
[11:50] I just wget, without WARC
[11:50] ah ok
[11:50] it is just I am the person who is writing the new one
[11:50] also, you're free to uphax the code for warc support
[11:51] alard wrote that warc support a little while ago
[11:51] well the problem is that we use python now instead of C, so it isn't as easily hacked into wget
[11:52] but this is why I was asking about warc files
[11:52] we who what?
[11:52] oh
[11:52] and are you saying WARC changes frequently?
[11:52] I work for the company that wrote warc-tools (the c lib on google code)
[11:53] I'll take a python WARC library
[11:53] We no longer use or maintain it, and we're currently using a python library instead
[11:53] ah-ha
[11:53] sorry, yeah I should have owned up to that earlier
[11:55] Cameron_D: I could probably knock up a wget like script that uses it
[11:55] Cameron_D: we use it in production but heh, my attention has been on the bits that use the library rather than the library itself
[11:55] Hi all.
[11:55] but I have time alloted to deal with support issues for it
[11:55] http://code.hanzoarchives.com/warc-tools/overview
[11:55] o/ alard
[11:55] tef: yes, wget-warc uses the old c version (which seems to work pretty well).
[11:59] meh, as long as it doesn't produce unusable WARC archives
[11:59] not so far
[11:59] (heh)
[11:59] but it turns out lots of warcs are a bit special
[12:00] oh?
[12:00] I found one with unix line separators and gzipped fully rather than crlf and each record gzipped
[12:00] and there are a bunch of pre 1.0 ones floating around
[12:00] that's always fun
[12:01] tef, cant look through the code at the moment, but is there a usage example of sorts?
[12:01] Cameron_D: there are some scripts in the repo for opening/reading warcs and arc2warc conversion
[12:02] apologies for the lack of documentation. we're a small company and we're a little rushed off our feet at the moment
[12:03] it's ok
[12:03] i'll see if I can merge in a python wget example to it
[12:03] I have some code knocking around for that that doesn't use wget https://github.com/tef/codesamples/tree/master/pyget
[12:04] Mmmmh, 8.4GB memory usage from wget
[12:04] doesn't use warctools either
[12:04] tef, thanks, I'll take a look
[12:04] Cameron_D: if you have any questions about warctools email me directly at thomas.figg@hanzoarchives.com
[12:05] hmm, i got a question about Wget actually
[12:05] I have *some* time alloted to deal with support/features
[12:05] about wget, or wget and warc?
[12:05] just wget on it's own
[12:05] what i want to know is
[12:05] say if im poking a url, for example www.example.com/millenium/0001/ and it has a number heirarchy for directories right
[12:06] Fortunecity? :)
[12:06] is there a script i can use to incrementally increase the number to a specified limit and stop when it hits the number or?
[12:06] yeah
[12:06] bash
[12:06] bash would be your friend, yes
[12:07] www.example.com/millenium/{0001..9999}/ should do the trick. it'll be longer than the maximum command line length
[12:07] SketchCow: that last shot in the kickstarter vid is kinda awkward
[12:07] for i in `seq ... ...`; do ....; done
[12:07] db48x2: cheers mate (:
[12:09] yw
[12:09] ersi: btw, which features of wget are the most useful to you?
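A minimal sketch of the numbered-directory loop suggested at 12:06-12:07, assuming bash and the chat's example URL. The function name `gen_urls` and the file name `urls.txt` are made up for illustration. Writing the URLs to a file and passing it to wget's `-i` option is one way to avoid the command-line-length concern raised about the brace expansion.

```bash
#!/usr/bin/env bash
# Sketch: print one URL per numbered directory, zero-padded to four digits.
# The base URL and the range are placeholders taken from the chat's example.
gen_urls() {
  local base=$1
  local first=$((10#$2))   # force base 10 so "0009" is not read as octal
  local last=$((10#$3))
  local i
  for ((i = first; i <= last; i++)); do
    printf '%s/%04d/\n' "$base" "$i"
  done
}

# Usage (not run here): write the list to a file, then hand it to wget.
#   gen_urls http://www.example.com/millenium 1 9999 > urls.txt
#   wget -i urls.txt
```

The loop stops at the upper bound given as the third argument, which matches the "stop when it hits the number" part of the question.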
[12:10] a compress option i think is need for wget-warc
[12:10] (my boss is happy for me to make a simplified wget example for the new warctools)
[12:10] only cause right now it only saves to as gzip/tar.gz
[12:11] godane: there is one
[12:11] per record compression ?
[12:11] tef: the regular mirror switch, convert links to local ones and keep original (-kK) http/ftp support
[12:11] cool
[12:11] godane: --no-warc-compression
[12:11] --content-disposition is crucial for me
[12:12] --random-wait -EkKp --protocol-directories -np --follow-ftp
[12:12] was talking about about changing the it to bz2 or .lzma
[12:13] gzip is pretty de-facto for warcs
[12:13] ok
[12:13] oh, and --user-agent, but that's easy to do
[12:14] thought it would be nice to add lzma so you can save more space
[12:14] this is really helpful
[12:15] ersi: most stuff does re-writing after creating warc files
[12:15] the idea being the warc record being an exact snapshot of the wire traffic
[12:15] (near enough)
[12:15] tef: honestly I think it'd be easier to integrate the python library into wget
[12:15] tef: that reminds me
[12:15] hmm
[12:16] tef: alard was working on a way to feed a warc into wget and have wget output the set of mirrored directories
[12:16] ah I see
[12:16] warc unpacker
[12:17] yea
[12:17] it could do the -k (and -K) stuff
[12:17] well
[12:17] in a warc record
[12:17] you can have request/response
[12:18] as well as conversion records
[12:18] yea
[12:18] so the -K stuff would be writing those in some fashion
[12:18] I had wanted to put conversion records into the warc
[12:18] it's a bit tricky, so I haven't done it yet
[12:18] (we strip transfer-chunked and content-encoding)
[12:21] it seems most of the wget options you guys use are about unpacking/rewriting the content i.e -EkK
[12:21] and a few for navigation i.e -p -np --follow-ftp
[12:23] so there is less need to clone wget if wget can read from warcs via some method (i.e a proxy)
[12:23] and generally wget's link traversing
(mirroring)
[12:26] i'm not sure what that meansin specific - do you mean keeping an existing archive up to date?
[12:27] or do you mean the scope of links that are checked
[12:27] the scope of links fetched
[12:27] ah
[12:27] yea, wget does a lot of parsing of html and css
[12:27] it seems to do the job well of going deeper
[12:28] and when it's getting page recreciuits it's awesome
[12:28] oops, what the hell happened there
[12:28] page requisits(sp?)
[12:28] to some extent I think it would be better trying to play to the strengths of being written in python - hackable, rather than apeing wget entirely
[12:28] i.e scriptable for those more awkward things rather than having to resort to bash :-)
[12:29] Nothing wrong with blunt tools, even though I like python
[12:29] ;D
[12:29] yeah but you have a very good blunt tool
[12:30] ersi: thanks again for taking the time to explain this stuff
[12:30] fwiw both I and my boss have a soft spot for the work archive team does so we'd like to help out where we can, esp re warctools
[12:31]
[12:34] anyway, i'll shut up now and i'll talk again when i've got something to show for it
[12:34] cheers
[12:41] heh
[12:41] you can talk whenever you like
[12:45] tef: no prob of course :)
[12:45] db48x2: I hate that thing where someone comes in with a driveby idea
[12:46] :)
[12:46] well, it's different from getting feedback when one is more likely to do something about it
[12:46] and most of us here have a softspot for IA, so WARC is closeby in our hearts
[12:46] even if they... care about robots.txts
[12:47] (I see why though, sucks getting blocked or taken unseriously)
[12:47] Booya! One more bug/defect reported~ then I'll look extra productive
[13:42] http://nationalmap.gov/historical/
[14:30] Schbirid: sweet
[14:36] Bluh, ..instructables.. /keyword-iphone/keyword-easy/index.html
[14:40] Schbirid: are you able to download any maps from that?
[14:41] the links in the 'Download GeoPDF' column all just point back to the search results
[14:42] the map search works though
[14:43] Morning.
[14:44] hello SketchCow
[14:45] sorry didnt try
[14:45] downloading maps is very slow
[14:45] 25kBps :)
[14:46] it was announced today so surely a lot of traffic
[14:46] yea
[16:05] Hey, so I wrote the 80 micro archive that has some "offline" and asked for them
[16:05] Less than 24 hours later, here they come.
[16:14] http://www.abandonware-magazines.org/index.php
[16:16] does anyone have a link to a pdf on an https server?
[16:16] I have to test specifically that combination
[16:16] That's quite a link, DFJustin
[16:31] underscor: today would be a good day to fix your olduse.net shell box. (on Boing Boing)
[16:36] underscor: oh, it works again, NM
[18:08] http://kck.st/jasonscott
[18:15] cool, will brute force spread :)
[18:17] big jump from $10 to $100
[18:19] Yes.
[18:24] Will these each be shorter/less comprehensive than BBS/Getlamp?
[18:28] No.
[18:29] wow
[18:42] fucking scanner is misbehaving
[18:42] today is an angry technology day
[18:42] actually it's an angry chronomex day
[18:43] SketchCow: radio silence, btw
[18:45] grr technology
[18:47] ALexis got very sick
[18:47] She's in bed since Friday
[18:48] oh dear
[18:48] send her my regards
[18:49] So you're not being ignored.
[18:52] ok
[18:54] I hope "RISE OF THE METADATA WARRIOR" will be recorded; the title is awesome
[18:56] Who is ALexis ?
[18:57] SketchCow's boss at IA
[18:58] - Create a public, museum-like archive of 3D Porch in about 30 days. This museum will probably just have 500 or so photos.
[18:58] damn
[18:58] hello everybody
[18:58] i dunno if it's significant enough for you guys but I'm just throwing it out there
[18:58] it's a site where you can upload 3D photos
[18:58] it's not very big/popular and I don't think it's been around very long
[18:58] so uh, there's this site called 3D Porch which might be shutting down http://3dporch.com/
[18:58] 50,000 items, 7 files per
[18:58] :|
[18:58] I wonder how much it costs
[18:59] Someone please grab it
[19:00] gonna try to get in touch with him
[19:00] grab first if it's that small
[19:04] man some people do NOT know how to use 3D cameras
[19:26] wish I could do higher than "project backer"
[19:27] I'm surprised tape is more popular than arcade at the moment
[19:28] i am glad :)
[19:28] I'm on that one
[19:30] It's a good and interesting way to get opinion. :)
[19:31] Ok, off to broadway
[19:31] seeing a musical for my birthday
[19:31] happy birthday! have fun
[19:40] I'd like Tape over Arcade as well
[19:44] well, buy it now!
[19:44] receive it in 4 years or so :)
[19:46] There are 46425 photos on 3dporch, I think (the photos on the 'popular' lists). I have a list of the photo ids, downloading them now, unless someone else is doing that too.
[19:47] each one is 7 files
[19:47] + metadata if you care
[19:47] 7? I have 6.
[19:48] .jps
[19:48] .left.jpg
[19:48] .mpo
[19:48] .redcyan.jpg
[19:48] .right.jpg
[19:48] .wiggle.gif
[19:48] missing .sbs
[19:48] oh nm
[19:48] those just rearrange left and right
[19:48] you're right
[19:49] I am missing the wiggle.thumb
[19:49] what kind of speed are you getting
[19:49] 4MB/s
[19:49] nice. should be quick work
[19:49] It's amazon, so as fast as I can.
[19:51] how did you get all of the IDs?
[19:52] I grabbed the 'popular' pages and extracted the IDs.
[19:53] So I only have the popular page stuff, but maybe that's all there is?
[19:53] Everything is popular?
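A rough sketch of the per-photo file list from 19:47-19:49, assuming bash. It covers only the six suffixes spelled out in the chat, and the seventh ("wiggle.thumb") is left out because its exact file name is not given. The `<id><suffix>` naming and the function name `photo_files` are assumptions for illustration, not confirmed details of the site.

```bash
#!/usr/bin/env bash
# Suffixes exactly as listed in the chat. The seventh file ("wiggle.thumb")
# is omitted because the log does not give its full name.
SUFFIXES=(.jps .left.jpg .mpo .redcyan.jpg .right.jpg .wiggle.gif)

# Assumption: each file is named <id><suffix>.
# Prints the expected file names for one photo ID, one per line.
photo_files() {
  local id=$1 suffix
  for suffix in "${SUFFIXES[@]}"; do
    printf '%s%s\n' "$id" "$suffix"
  done
}
```

A list like this could be compared against what was actually downloaded to spot IDs with missing variants.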
[19:56] you can go to each type of camera
[19:56] and cross check
[19:56] hmm
[19:56] looks like IDs are just [0-9a-z]{4}
[19:56] caps also but yes
[19:57] hahah
[19:57] http://3dporch.com/b6gp
[19:58] i have not yet encountered an uppercase letter in the ids
[19:58] alard: I really wish I knew how to do all that :|
[19:59] Coderjoe: go to nintendo 3ds there are a bunch on the first page
[19:59] ah
[19:59] i think my favorite is this one http://3dporch.com/4gro
[20:00] [0-9A-Za-z]{4} is enough for 14.8M IDs
[20:01] not seeing much in the way of metadata
[20:01] hey, author replied
[20:01] er, owner
[20:01] I am really honored that you would pick my site to archive.
[20:01] Right now all the images are hosted on S3, which doesn't provide a convenient gunzip tool. The total data is about 50GB.
[20:01] Also, if you crawled 3D Porch, you'd miss a lot of anonymous uploads (about 50-80% of the site's content).
[20:01] I think the ideal would be for me to generate a static HTML version of the site, zip that and send it to you, and then let you fetch the individual S3 assets from that HTML. Then when you're done, I'll delete my S3 store.
[20:01] viewcount and creator
[20:01] What do you think?
[20:01] Cool!
[20:02] should I have him do that?
[20:03] Yes, I think that would be very helpful. Getting a list of the ids is key, having the metadata in the html files is even better.
[20:04] replied
[20:04] no wonder he can't afford to keep it up if it's all on s3
[20:13] do ip registries such as ripe, arin etc offer their whois databases to the public?
[20:14] the content returned in response to whois queries
[20:16] i am currently trying to get http://who.is to stop showing an archived version of a site of mine where i used my full name so i hope not .p
[20:16] ip assignment whois or domain whois?
[20:16] many lookups will actually show you legalese about how youare not allowed to store that information iirc
[20:16] sorry
[20:16] domain
[20:17] <-stooopid
[20:17] do you know of whois.sc?
[20:17] domaintools.com nowadays
[20:17] yes
[20:17] they compile that information and sell it for commercial gain
[20:17] yeah
[20:21] nighty!
[20:23] is anyone interested in starting a project to archive ip assignment data, domain whois, and dns history?
[20:26] would that be in accord with the archive team philosophy? the intention is to make available this data to the world for free
[20:27] the philosophy is do whatever, aslong as you're doing something
[20:28] I'd be a bit worried about doing that though, there's a fuckload of fucktards who will spam you to oblivion if you keep that kind of data and say you do online AND provide it
[20:30] who would have reason to be angry? (other than domaintools, heh ;)
[20:33] i think i'll work independently and write about my progress on archive team wiki
[20:36] verisign might not be too happy
[20:49] verisign can suck it