#archiveteam 2011-09-23,Fri

Time Nickname Message
00:02 🔗 SketchCow The bytesize is correct, yay
00:03 🔗 Wyatt So to restart, use --warc-dedup it looks like?
00:30 🔗 SketchCow Alard, when you have a chance, it'd be good to have an option in the bookmarklet to read the article.
00:30 🔗 SketchCow You know, so it has a second purpose.
01:06 🔗 Wyatt Oh, if I want to pull in things linked from another domain (not necessarily spider the whole other domain)....how greedy is wget -H exactly?
01:08 🔗 Coderjoe it will follow any links to anything on the specified domain. if something on your source domain links to a page on another domain in -H and that page links to other pages on itself, it will go follow them
01:11 🔗 Wyatt So if someone linked to yamaha.com in a forum...oh dear. That's not quite what I'm after.
01:12 🔗 Coderjoe if yamaha.com is in the -H
01:13 🔗 Coderjoe rt
01:13 🔗 Coderjoe er
01:13 🔗 Coderjoe hmm
01:13 🔗 Coderjoe my memory of what -H was is flawed
01:14 🔗 Coderjoe mixed it up with -D
01:14 🔗 Coderjoe I am not really sure what -H does
01:14 🔗 Wyatt Ah.
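
For reference on the -H / -D mix-up above: per the wget manual, -H (--span-hosts) allows a recursive retrieval to cross onto other hosts, while -D (--domains) restricts which domains are followed and does not turn on -H by itself. Below is a minimal sketch of a bounded cross-host crawl. The helper name build_wget_cmd, the example hostnames, and the depth are assumptions for illustration, and the function only prints the command instead of running it.

```shell
# Hypothetical helper: print (do not run) a wget command that spans hosts
# but only follows links into an explicit list of domains.
build_wget_cmd() {
  start_url=$1   # page to start the recursive crawl from
  domains=$2     # comma-separated list of domains wget may follow
  depth=$3       # maximum recursion depth, applies to the whole crawl
  printf 'wget --recursive --level=%s --span-hosts --domains=%s --page-requisites %s\n' \
    "$depth" "$domains" "$start_url"
}

# Example hostnames are placeholders, not sites from the log.
build_wget_cmd "http://forum.example.com/" "forum.example.com,files.example.net" 2
```

Listing only the wanted domains in --domains and keeping --level small is one way to pick up off-site links without spidering every site a forum post happens to mention.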
01:58 🔗 SketchCow Hooray, 5 terabytes of Friendster uploaded.
02:58 🔗 dashcloud thanks for the heads-up about the radio appearance- I visited the thisiswhyimbroke.com site and it is indeed really awesome
02:59 🔗 dashcloud (rest of the show is great as well)
03:46 🔗 SketchCow So dangerous!
03:46 🔗 SketchCow Camera Lens Cups
03:47 🔗 SketchCow http://t.co/czPoAYa5
04:22 🔗 bertrando Hi, I'm looking for a range of Friendster data, ids 3000-3999, but I don't see it in this list: http://www.archive.org/details/FRIENDSTER-000000000
04:22 🔗 bertrando Is it actually available?
04:23 🔗 SketchCow Uploads still happening.
04:23 🔗 SketchCow Should be available, I am sure we have it.
04:23 🔗 bertrando Great, will that page be updated when it gets uploaded? Or is there a better place to look?
04:23 🔗 SketchCow Keep watching.
04:23 🔗 SketchCow I'm just uploading as much and as fast as I can, and people are providing the uploading machine with data constantly.
04:24 🔗 SketchCow It's just a lot of data, terabytes, and it's taking a while to upload.
04:24 🔗 bertrando Cool, thanks.
04:31 🔗 db48x yes, we have that range
06:07 🔗 chronomex metadata metadata metadata
06:29 🔗 Wyatt Unfortunately, I don't think it can be summoned like Beetlejuice.
06:31 🔗 chronomex what?
06:34 🔗 Wyatt Metadata, unlike Hastur and Beetlejuice, cannot be called forth by speaking thrice its name.
06:35 🔗 Wyatt It'd be much easier if it could.
06:35 🔗 SketchCow blowjob blowjob blowjob
06:35 🔗 SketchCow ....
06:35 🔗 SketchCow DAMNIT
06:36 🔗 SketchCow Actually, a few volunteers are doing some kickass work for me.
06:36 🔗 Wyatt That means you have minions? Cool
06:37 🔗 Wyatt Oh yeah, has anyone else encountered segfaults with alard's wget branch?
07:32 🔗 alard SketchCow: Like a sort of JSTOR shuffle.
07:32 🔗 alard Three options, which do you prefer?
07:32 🔗 alard 1. Always download and upload when you click 'View PDF', then offer a 'read a copy yourself' button.
07:32 🔗 alard 2. Add an extra link to the box on the JSTOR site: 'View PDF' works as normal, 'Liberate/save/free PDF' does the download-upload thing.
07:32 🔗 alard 3. Like option 2, with two links, but download-upload is always enabled, even if you click 'View PDF'.
07:32 🔗 chronomex I prefer 3
07:32 🔗 alard Wyatt: Have you checked if the 'normal' wget doesn't segfault? What were the options you tried?
07:34 🔗 SketchCow I like 3
07:34 🔗 SketchCow I just want it that you always have the option, the encouragement, to read it
07:34 🔗 SketchCow You freed it, you read it
07:35 🔗 SketchCow It's not about speed.
07:36 🔗 SketchCow I want a thousand people leisurely sucking them dry
07:36 🔗 SketchCow I want them forced into a dick move.
07:36 🔗 SketchCow That's why the reading is critical.
07:36 🔗 SketchCow You're just reading it!
07:36 🔗 SketchCow I'm trying to decide if I can afford a second pair of new shoes.
07:37 🔗 chronomex what could be wrong with that?
07:37 🔗 chronomex two pairs?!?
07:37 🔗 chronomex bourgeois
07:37 🔗 Wyatt alard: Been fiddling with it a bit, trying to replicate it more thoroughly. If I'm not mistaken it's when trying to resume with --warc-dedup
07:37 🔗 SketchCow You have no idea.
07:37 🔗 SketchCow These are expensive, lovely shoes
07:37 🔗 chronomex doubtless
07:38 🔗 Wyatt But are they comfortable? That's important too.
07:38 🔗 ersi SketchCow: If they'll last long - do it!
07:38 🔗 ersi CHOOSE ONE, MAKING YOU BETTER FEELING
07:38 🔗 ersi MAKING YOU!
07:38 🔗 chronomex my shoes tend to wear out after about 2,000,000 steps.
07:38 🔗 ersi BETTER FEELING!
07:38 🔗 SketchCow Shoe one: http://www.bornshoes.com/Product.aspx?ProductID=5516
07:38 🔗 ersi http://www.youtube.com/watch?v=hHjSj_nKTws
07:39 🔗 SketchCow Shoe 2: http://www.bornshoes.com/Product.aspx?ProductID=3877
07:40 🔗 chronomex I like the first one, it says "sizes 8-14" and someone is of course asking for a size 7
07:40 🔗 BlueMax why is a data archiving channel talking about shoes
07:40 🔗 Wyatt The black ones look like they might be pretty comfortable.
07:41 🔗 chronomex BlueMax: because I'm waiting on curl
07:41 🔗 Wyatt Though I really do prefer to try footwear on before purchase; I have a weird-shaped arch.
07:41 🔗 Wyatt Shoes are data too, right?
07:42 🔗 SketchCow When someone acts up, I kick them with the fuckin' shoes
07:42 🔗 chronomex SketchCow: I got my uploader stuff working, successfully separated in time metadata and data
07:42 🔗 chronomex also made it so I can update the metadata, then rerun the same uploader and it'll see that only the metadata is new
07:42 🔗 SketchCow Excellent.
07:42 🔗 SketchCow Sorry it's so freaky to learn
07:43 🔗 chronomex no worries
07:43 🔗 SketchCow Luckily, you can rape one item over and over
07:43 🔗 chronomex yeah, usually a small item
07:43 🔗 chronomex :P
07:43 🔗 chronomex unless you're like you and have more bandwidth than god
07:43 🔗 chronomex :(
07:43 🔗 SketchCow The littlest ones are the most fun to rape
07:43 🔗 SketchCow uh i heard
07:43 🔗 chronomex >.>
07:43 🔗 * SketchCow takes a seat over there
07:43 🔗 chronomex rape means steal, not deposit
07:43 🔗 SketchCow THIS IS MUCH BETTER THAN SHOES
07:43 🔗 chronomex hurdur
07:45 🔗 chronomex my metadata thing is pretty nice to use, for a 100-line shellscript
07:45 🔗 * BlueMax slowly hides the shoes
07:45 🔗 chronomex display the document in xzgv, ask questions
07:45 🔗 SketchCow Yeah, Friendsmash is helping upload friendster data like crazy
07:45 🔗 SketchCow I'd love to have that here, but right now I don't do that (for bitsavers documents)
07:45 🔗 chronomex aye
07:46 🔗 chronomex I'll share it tomorrow or something if you want
07:46 🔗 chronomex do you know how to display a progress bar on http file upload?
07:50 🔗 alard curl? "If you want a progress meter for HTTP POST or PUT requests, you need to redirect the response output to a file, using shell redirect (>), -o [file] or similar."
07:51 🔗 chronomex doesn't give me any progress indicator.
07:51 🔗 alard oh.
07:52 🔗 chronomex oh, um, okay. got it.
07:52 🔗 chronomex that's more like it
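
The curl manual passage alard quotes can be sketched as follows: the progress display only appears for an upload once the response body is sent somewhere other than the terminal. This is a sketch under assumptions, not chronomex's actual script. The helper name upload_with_progress, the DRY_RUN switch, the file name, and the URL are all hypothetical; the curl options used are --progress-bar, --upload-file, and --output.

```shell
# Hypothetical helper: upload one file over HTTP with a progress bar.
# With DRY_RUN=1 (the default here) it only prints the curl command.
upload_with_progress() {
  file=$1   # local file to upload
  url=$2    # destination URL (placeholder in the example below)
  # --progress-bar draws a simple bar instead of the default meter.
  # --output /dev/null moves the server response off the terminal,
  # which is what lets curl show progress for a PUT or POST.
  set -- curl --progress-bar --upload-file "$file" --output /dev/null "$url"
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Placeholder file name and host, not taken from the log.
upload_with_progress scan.pdf http://upload.example.org/scan.pdf
```

Setting DRY_RUN=0 would run the real upload; redirecting with a shell `>` instead of --output should have the same effect on the progress display.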
07:59 🔗 SketchCow I'd love to see THAT code.
07:59 🔗 SketchCow Bought the shoes, got free shipping and cheaper price.
08:07 🔗 BlueMax I wonder if you need to feel fabulous while archiving
08:07 🔗 chronomex I think you can get along with hungover
08:09 🔗 BlueMax OK, note to self, keep it down for SketchCow's sake
08:09 🔗 chronomex right, he wants to talk about shoes today
08:11 🔗 SketchCow I can't imagine my "I'm uploading friendster!!!" getting anything but boring.
08:12 🔗 chronomex I'm uploading scans!!!
08:12 🔗 chronomex watch them go!
08:12 🔗 SketchCow Because I'm uploading a metric asston of friendster.
08:12 🔗 chronomex http://www.archive.org/details/bellsystem_CD-1C605-01
08:40 🔗 alard Done, JSTOR now gets two buttons: "View & Save PDF" and "Just Save PDF".
08:41 🔗 SketchCow Can I see?
08:41 🔗 SketchCow I still owe you verbiage
08:43 🔗 alard If you still have the bookmarklet, just go to www.jstor.org and click on it.
08:43 🔗 alard If you've lost the bookmarklet: http://severe-samurai-6114.heroku.com/
08:44 🔗 alard It would also be nice to decide on the right words to use: what are you doing? Have you 'saved', 'freed', 'liberated', 'stored', 'stolen' the PDF?
08:44 🔗 alard But that probably should depend on the tone of your texts.
08:44 🔗 SketchCow I'm sticking with liberator
08:44 🔗 SketchCow Liberated.
08:45 🔗 SketchCow You read it and gave away a copy.
08:47 🔗 alard So that's probably also what the buttons should say: 'View & Liberate' vs 'Just Liberate'? (Probably reduces confusion, since 'View & Save' could also mean that you save it for yourself.)
08:48 🔗 SketchCow Yes.
08:52 🔗 alard Done.
08:53 🔗 SketchCow Thanks
08:53 🔗 alard There are 449.287 free articles to download, by the way, so there's probably enough for everyone.
08:53 🔗 SketchCow Agreed.
08:56 🔗 SketchCow OK.
08:56 🔗 SketchCow I just re-read the terms.
08:56 🔗 SketchCow We're golden.
08:57 🔗 SketchCow I owe you verbiage for that page, and we need to have a server that takes in the data.
08:57 🔗 SketchCow I have some theoretical ones.
08:58 🔗 SketchCow Let me know how it pushes it, and so on.
08:58 🔗 SketchCow Also, I guess we need to make a liberator.archiveteam.org
08:59 🔗 SketchCow Ok, bed, it's 5am, we'll talk tomorrow
08:59 🔗 SketchCow Great job.
08:59 🔗 SketchCow I think we'll announce Monday, once we get all ducks in a row
08:59 🔗 alard Okay. Good night.
08:59 🔗 alard (We should probably check first to see if what comes out at the other end is actually useful.)
09:00 🔗 SketchCow Put it in a directory for me.
09:00 🔗 SketchCow Exciting stuff
09:00 🔗 alard It would be really useful if that server could run something like Redis to keep the list of things to do.
09:13 🔗 * EDream Great Electronics Sale! Prices are reduced up to 50%! Laptops, PDAs, Tablet PCs and more only at X Laptops Co, Ltd. Check us out at http://XLaptops.net
09:59 🔗 BlueMax So what's this new project if I can ask
10:30 🔗 ersi It's to archive ALL electronics! And have a GREAT SALE! reduced prices up to 50%!
17:54 🔗 sankin Does anyone know of a way to archive articles / newspapers from the Google News Archives?
17:55 🔗 sankin they stopped adding new content back in May, and now they've removed the archive search page that was at http://www.google.com/archivesearch/advanced_search
17:55 🔗 sankin how long until they kill it completely?
18:04 🔗 chronomex hmm
18:48 🔗 SketchCow Heads up, gang:
18:48 🔗 SketchCow http://s.assetbar.com/index
18:51 🔗 db48x2 http://public.numair.com/2011_fbfool.html
19:14 🔗 chronomex they sound like they were more interested in their technology than what they were doing with it: http://www.assetbar.com/index_about_us
19:19 🔗 Schbirid anyone know a pastebin where i can upload a 5mb textfile? or is it my browser that refuses to paste
19:21 🔗 Nemo_bis try Chrome/Chromium
19:22 🔗 Schbirid it froze
19:22 🔗 Schbirid it pasted
19:23 🔗 Schbirid The connection to pastebin.com was interrupted.
19:23 🔗 Nemo_bis http://p.defau.lt/new.html
19:24 🔗 Schbirid filehost that might be dying http://stashbox.org/
19:28 🔗 Schbirid thanks
19:28 🔗 Schbirid oh, emijrp aint here
19:42 🔗 Schbirid argh, wget does not like -c and --content-disposition together it seems
19:52 🔗 Schbirid help, i am too dumb to make aria2c simply not download a file if it already exists locally
19:55 🔗 Schbirid --auto-file-renaming=false seems like a dirty hack (and results in an error)
19:57 🔗 db48x2 if [ ! -f "$file" ]; then $aria ...; fi
19:59 🔗 Schbirid can't do that. i am fetching albums from jamendo and the url i pass to aria2c is not what the downloaded file is named
20:00 🔗 Schbirid example http://www.jamendo.com/get/album/id/album/archiverestricted/redirect/29/?p2pnet=bittorrent&are=ogg3
20:00 🔗 Schbirid with wget i would need to use --content-disposition
20:08 🔗 Schbirid oh actually aria2c is being smart
20:08 🔗 Schbirid many albums at jamendo were changed, aria2c notices that and decides to download even though the _filename_ already exists
20:08 🔗 Schbirid should have expected that
20:12 🔗 Schbirid eh, on another run it does not notice that filename.1 already is the renamed one
20:12 🔗 Schbirid meh
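
Since the local file name is not known from the URL, one possible workaround for the "skip if already fetched" problem is to key the check on the URL itself instead of on the file. The sketch below assumes a plain-text list of completed URLs; the names fetch_once, done_list, and FETCH_CMD are hypothetical, and this approach would not notice albums that changed on the server.

```shell
# Hypothetical helper: download a URL only if it is not already recorded
# in a local "done" list (one URL per line).
fetch_once() {
  url=$1         # URL to fetch
  done_list=$2   # path to the text file of completed URLs
  # -x matches whole lines, -F treats the URL as a fixed string.
  if [ -f "$done_list" ] && grep -qxF -- "$url" "$done_list"; then
    echo "skip $url"
    return 0
  fi
  # FETCH_CMD is an assumed override for testing; defaults to aria2c.
  ${FETCH_CMD:-aria2c} "$url" || return 1
  # Record the URL only after the download command succeeded.
  printf '%s\n' "$url" >> "$done_list"
}
```

A call such as `fetch_once "$album_url" done.txt` in a loop would then skip anything fetched on an earlier run, regardless of what name the server gave the file.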
20:52 🔗 chronomex speaking about delicious, this is an interesting read: http://simonwillison.net/notes/2006/summit/schachter.txt
20:53 🔗 chronomex Morals: You have to develop a sense of morals when you build your system. It's
20:53 🔗 chronomex the user's data; it's not yours. Make sure they can remove themselves and
20:53 🔗 chronomex their account if they want to.
20:53 🔗 chronomex hmmm.
20:53 🔗 chronomex In del.icio.us if a user deletes something they really do purge the data from
20:53 🔗 chronomex the system. No transaction logs etc for getting stuff back.
21:49 🔗 Ymgve no backups?
22:27 🔗 Zebranky chronomex: I think I would prefer that to having it retained indefinitely, though that opens concerns of malicious deletion by other people, etc.
22:27 🔗 chronomex yeah...
