[00:02] The bytesize is correct, yay [00:03] So to restart, use --warc-dedup it looks like? [00:30] Alard, when you have a chance, it'd be good to have an option in the bookmarklet to read the article. [00:30] You know, so it has a second purpose. [01:06] Oh, if I want to pull in things linked from another domain (not necessarily spider the whole other domain)....how greedy is wget -H exactly? [01:08] it will follow any links to anything on the specified domain. if something on your source domain links to a page on another domain in -H and that page links to other pages on itself, it will go follow them [01:11] So if someone linked to yamaha.com in a forum...oh dear. That's not quite what I'm after. [01:12] if yamaha.com is in the -H [01:13] rt [01:13] er [01:13] hmm [01:13] my memory of what -H was is flawed [01:14] mixed it up with -D [01:14] I am not really sure what -H does [01:14] Ah. [01:58] Hooray, 5 terabytes of Friendster uploaded. [02:58] thanks for the heads-up about the radio appearance- I visited the thisiswhyimbroke.com site and it is indeed really awesome [02:59] (rest of the show is great as well) [03:46] So dangerous! [03:46] Camera Lens Cups [03:47] http://t.co/czPoAYa5 [04:22] Hi, I'm looking for a range of Friendster data, ids 3000-3999, but I don't see it in this list: http://www.archive.org/details/FRIENDSTER-000000000 [04:22] Is it actually available? [04:23] Uploads still happening. [04:23] Should be available, I am sure we have it. [04:23] Great, will that page be updated when it gets uploaded? Or is there a better place to look? [04:23] Keep watching. [04:23] I'm just uploading as much and as fast as I can, and people are providing the uploading machine with data constantly. [04:24] It's just a lot of data, terabytes, and it's taking a while to upload. [04:24] Cool, thanks. [04:31] yes, we have that range [06:07] metadata metadata metadata [06:29] Unfortunately, I don't think it can be summoned like Beetlejuice. [06:31] what? 
[06:34] Metadata, unlike Hastur and Beetlejuice, cannot be called forth by speaking thrice its name. [06:35] It'd be much easier if it could. [06:35] blowjob blowjob blowjob [06:35] .... [06:35] DAMNIT [06:36] Actually, a few volunteers are doing some kickass work for me. [06:36] That means you have minions? Cool [06:37] Oh yeah, has anyone else encountered segfaults with alard's wget branch? [07:32] 1. Always download and upload when you click 'View PDF', then offer a 'read a copy yourself' button. [07:32] 2. Add an extra link to the box on the JSTOR site: 'View PDF' works as normal, 'Liberate/save/free PDF' does the download-upload thing. [07:32] SketchCow: Like a sort of JSTOR shuffle. [07:32] Three options, which do you prefer? [07:32] 3. Like option 2, with two links, but download-upload is always enabled, even if you click 'View PDF'. [07:32] I prefer 3 [07:32] Wyatt: Have you checked if the 'normal' wget doesn't segfault? What were the options you tried? [07:34] I like 3 [07:34] I just want it that you always have the option, the encouragement, to read it [07:34] You freed it, you read it [07:35] It's not about speed. [07:36] I want a thousand people leisurely sucking them dry [07:36] I want them forced into a dick move. [07:36] That's why the reading is critical. [07:36] You're just reading it! [07:36] I'm trying to decide if I can afford a second pair of new shoes. [07:37] what could be wrong with that? [07:37] two pairs?!? [07:37] bourgeois [07:37] alard: Been fiddling with it a bit, trying to replicate it more thoroughly. If I'm not mistaken it's when trying to resume with --warc-dedup [07:37] You have no idea. [07:37] These are expensive, lovely shoes [07:37] doubtless [07:38] But are they comfortable? That's important too. [07:38] SketchCow: If they'll last long - do it! [07:38] CHOOSE ONE, MAKING YOU BETTER FEELING [07:38] MAKING YOU! [07:38] my shoes tend to wear out after about 2,000,000 steps. [07:38] BETTER FEELING! 
[07:38] Shoe one: http://www.bornshoes.com/Product.aspx?ProductID=5516 [07:38] http://www.youtube.com/watch?v=hHjSj_nKTws [07:39] Shoe 2: http://www.bornshoes.com/Product.aspx?ProductID=3877 [07:40] I like the first one, it says "sizes 8-14" and someone is of course asking for a size 7 [07:40] why is a data archiving channel talking about shoes [07:40] The black ones look like they might be pretty comfortable. [07:41] BlueMax: because I'm waiting on curl [07:41] Though I really do prefer to try footwear on before purchase; I have a weird-shaped arch. [07:41] Shoes are data too, right? [07:42] When someone acts up, I kick them with the fuckin' shoes [07:42] SketchCow: I got my uploader stuff working, successfully separated in time metadata and data [07:42] also made it so I can update the metadata, then rerun the same uploader and it'll see that only the metadata is new [07:42] Excellent. [07:42] Sorry it's so freaky to learn [07:43] no worries [07:43] Luckily, you can rape one item over and over [07:43] yeah, usually a small item [07:43] :P [07:43] unless you're like you and have more bandwidth than god [07:43] :( [07:43] The littlest ones are the most fun to rape [07:43] uh i heard [07:43] >.> [07:43] * SketchCow takes a seat over there [07:43] rape means steal, not deposit [07:43] THIS IS MUCH BETTER THAN SHOES [07:43] hurdur [07:45] my metadata thing is pretty nice to use, for a 100-line shellscript [07:45] * BlueMax slowly hides the shoes [07:45] display the document in xzgv, ask questions [07:45] Yeah, Friendsmash is helping upload friendster data like crazy [07:45] I'd love to have that here, but right now I don't do that (for bitsavers documents) [07:45] aye [07:46] I'll share it tomorrow or something if you want [07:46] do you know how to display a progress bar on http file upload? [07:50] curl? "If you want a progress meter for HTTP POST or PUT requests, you need to redirect the response output to a file, using shell redirect (>), -o [file] or similar." 
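The curl manual text quoted at [07:50] can be turned into a small sketch. This is only an illustration under assumptions: the function name `upload_with_progress`, the file name, and the URL are hypothetical, and this is not the uploader script mentioned in the log. It sends the response body to /dev/null so that curl has room to draw a progress display during the PUT.

```shell
#!/bin/sh
# Sketch only: upload a file with curl and show a progress bar.
# upload_with_progress and its two arguments are hypothetical names.
upload_with_progress() {
    file=$1
    url=$2
    # --upload-file sends the file (HTTP PUT for http/https URLs).
    # --output /dev/null keeps the response body off the terminal, which
    #   is the redirect the curl manual asks for on POST/PUT requests.
    # --progress-bar replaces the default meter with a simple bar.
    curl --progress-bar --upload-file "$file" --output /dev/null "$url"
}

# Example call (placeholder URL, not a real endpoint):
# upload_with_progress scan.pdf https://example.org/upload/scan.pdf
```

Discarding the response hides any error message from the server, so a real uploader would probably write it to a log file instead of /dev/null.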
[07:51] doesn't give me any progress indicator. [07:51] oh. [07:52] oh, um, okay. got it. [07:52] that's more like it [07:59] I'd love to see THAT code. [07:59] Bought the shoes, got free shipping and cheaper price. [08:07] I wonder if you need to feel fabulous while archiving [08:07] I think you can get along with hungover [08:09] OK, note to self, keep it down for SketchCow's sake [08:09] right, he wants to talk about shoes today [08:11] I can't imagine my "I'm uploading friendster!!!" getting anything but boring. [08:12] I'm uploading scans!!! [08:12] watch them go! [08:12] Because I'm uploading a metric asston of friendster. [08:12] http://www.archive.org/details/bellsystem_CD-1C605-01 [08:40] Done, JSTOR now gets two buttons: "View & Save PDF" and "Just Save PDF". [08:41] Can I see? [08:41] I still owe you verbiage [08:43] If you still have the bookmarklet, just go to www.jstor.org and click on it. [08:43] If you've lost the bookmarklet: http://severe-samurai-6114.heroku.com/ [08:44] It would also be nice to decide on the right words to use: what are you doing? Have you 'saved', 'freed', 'liberated', 'stored', 'stolen' the PDF? [08:44] But that probably should depend on the tone of your texts. [08:44] I'm sticking with liberator [08:44] Liberated. [08:45] You read it and gave away a copy. [08:47] So that's probably also what the buttons should say: 'View & Liberate' vs 'Just Liberate'? (Probably reduces confusion, since 'View & Save' could also mean that you save it for yourself.) [08:48] Yes. [08:52] Done. [08:53] Thanks [08:53] There are 449,287 free articles to download, by the way, so there's probably enough for everyone. [08:53] Agreed. [08:56] OK. [08:56] I just re-read the terms. [08:56] We're golden. [08:57] I owe you verbiage for that page, and we need to have a server that takes in the data. [08:57] I have some theoretical ones. [08:58] Let me know how it pushes it, and so on. 
[08:58] Also, I guess we need to make a liberator.archiveteam.org [08:59] Ok, bed, it's 5am, we'll talk tomorrow [08:59] Great job. [08:59] I think we'll announce Monday, once we get all ducks in a row [08:59] Okay. Good night. [08:59] (We should probably check first to see if what comes out at the other end is actually useful.) [09:00] Put it in a directory for me. [09:00] Exciting stuff [09:00] It would be really useful if that server could run something like Redis to keep the list of things to do. [09:13] * EDream Great Electronics Sale! Prices are reduced up to 50%! Laptops, PDAs, Tablet PCs and more only at X Laptops Co, Ltd. Check us out at http://XLaptops.net [09:59] So what's this new project if I can ask [10:30] It's to archive ALL electronics! And have a GREAT SALE! reduced prices up to 50%! [17:54] Does anyone know of a way to archive articles / newspapers from the Google News Archives? [17:55] they stopped adding new content back in May, and now they've removed the archive search page that was at http://www.google.com/archivesearch/advanced_search [17:55] how long until they kill it completely? [18:04] hmm [18:48] Heads up, gang: [18:48] http://s.assetbar.com/index [18:51] http://public.numair.com/2011_fbfool.html [19:14] they sound like they were more interested in their technology than what they were doing with it: http://www.assetbar.com/index_about_us [19:19] anyone know a pastebin where i can upload a 5mb textfile? or is it my browser that refuses to paste [19:21] try Chrome/Chromium [19:22] it froze [19:22] it pasted [19:23] The connection to pastebin.com was interrupted. 
[19:23] http://p.defau.lt/new.html [19:24] filehost that might be dying http://stashbox.org/ [19:28] thanks [19:28] oh, emijrp ain't here [19:42] argh, wget does not like -c and --content-disposition together it seems [19:52] help, i am too dumb to make aria2c simply not download a file if it already exists locally [19:55] --auto-file-renaming=false seems like a dirty hack (and results in an error) [19:57] if [ ! -f "$file" ]; then $aria ...; fi [19:59] can't do that. i am fetching albums from jamendo and the url i pass to aria2c is not what the downloaded file is named [20:00] example http://www.jamendo.com/get/album/id/album/archiverestricted/redirect/29/?p2pnet=bittorrent&are=ogg3 [20:00] with wget i would need to use --content-disposition [20:08] oh actually aria2c is being smart [20:08] many albums at jamendo were changed, aria2c notices that and decides to download even though the _filename_ already exists [20:08] should have expected that [20:12] eh, on another run it does not notice that filename.1 already is the renamed one [20:12] meh [20:52] speaking about delicious, this is an interesting read: http://simonwillison.net/notes/2006/summit/schachter.txt [20:53] Morals: You have to develop a sense of morals when you build your system. It's [20:53] the user's data; it's not yours. Make sure they can remove themselves and [20:53] their account if they want to. [20:53] hmmm. [20:53] In del.icio.us if a user deletes something they really do purge the data from [20:53] the system. No transaction logs etc for getting stuff back. [21:49] no backups? [22:27] chronomex: I think I would prefer that to having it retained indefinitely, though that opens concerns of malicious deletion by other people, etc. [22:27] yeah...
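On the aria2c question from [19:52] to [20:00], where the URL does not reveal the name of the downloaded file, one possible workaround is to ask the server for the name before downloading. The sketch below is an assumption-laden illustration, not something anyone in the log actually ran: `filename_from_headers` and `fetch_unless_present` are hypothetical helper names, and it only works if the server answers a HEAD request with the same Content-Disposition header it would send for the download itself.

```shell
#!/bin/sh
# Sketch only: skip a download when the server-announced filename
# already exists locally. Both function names are hypothetical.

# Read raw HTTP response headers on stdin and print the filename from the
# last Content-Disposition header (the final response after redirects).
# Prints nothing when no such header is present.
filename_from_headers() {
    tr -d '\r' |
    grep -i '^content-disposition:' |
    sed -n 's/.*filename="\{0,1\}\([^";]*\)"\{0,1\}.*/\1/p' |
    tail -n 1
}

# Assumption: a HEAD request returns the same headers as the real GET.
fetch_unless_present() {
    url=$1
    # -s: silent, -I: HEAD request only, -L: follow redirects.
    name=$(curl -sIL "$url" | filename_from_headers)
    if [ -n "$name" ] && [ -f "$name" ]; then
        echo "skip: $name already exists"
    else
        aria2c "$url"
    fi
}

# Example call (placeholder URL):
# fetch_unless_present 'http://example.org/get/album/29'
```

A name-only check like this cannot tell a changed album from an unchanged one, which is the case aria2c was handling at [20:08], so it trades correctness for fewer downloads.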