#archiveteam 2013-01-15,Tue

Time Nickname Message
00:12 πŸ”— filer [16:11:28.004] GET http://aaronsw.archiveteam.org/next-item?r=0.25202059401081345 [HTTP/1.1 500 Internal Server Error 256ms]
00:13 πŸ”— filer is this related to the DOS attack?
00:13 πŸ”— Coderjoe one per person, iirc
00:13 πŸ”— balrog_ see -bs
00:13 πŸ”— balrog_ there's something broken :(
00:13 πŸ”— SketchCow He's looking at it
00:13 πŸ”— SketchCow We already blew out file handles. :)
00:13 πŸ”— filer heh
00:14 πŸ”— SketchCow This is a very nice experience test for underscor
00:14 πŸ”— SketchCow He's going to learn a lot tonight
00:15 πŸ”— chronomex haha :)
00:18 πŸ”— BlueMaxim you must be so proud of your star pupil SketchCow
00:19 πŸ”— SketchCow Every damned day
00:20 πŸ”— SketchCow And you, you're like the Voldemort. I expect you to rise against us from an australian law firm in 2023, having bided your time appropriately
00:21 πŸ”— SketchCow New meaning for the term "Kangaroo Court"
00:22 πŸ”— TomRiddle actually, one difference is that Voldemort knew what he was doing
00:23 πŸ”— TomRiddle comparing your knowledge of computers to mine is like a needle and a haystack
00:23 πŸ”— SketchCow Not initially
00:23 πŸ”— SketchCow Did you really just say that
00:23 πŸ”— TomRiddle ...I got it the wrong way around
00:23 πŸ”— TomRiddle you know what I meant >_<
00:25 πŸ”— SketchCow https://twitter.com/textfiles/status/290975346147340288
00:32 πŸ”— kanzure yo
00:33 πŸ”— SketchCow WELCOME
00:34 πŸ”— SketchCow aaaaand now it's down
00:34 πŸ”— kanzure hi ivan, X-Scale, nitro2k01
00:34 πŸ”— kanzure jason sent me here
00:34 πŸ”— kanzure said something about some infrastructure for rapidly archiving a failing site?
00:35 πŸ”— nitro2k01 Jason Fucking Scott; Middle name Fucking, hence the capitalization.
00:35 πŸ”— SketchCow Fuuuuuuuuuuuuuuuuuuuuuuuuuuuuucking
00:35 πŸ”— SketchCow Someone point him to the Warrior
00:35 πŸ”— nitro2k01 Yeah, isn't the link to it supposed to be in the /topic?
00:36 πŸ”— kanzure also this:
00:36 πŸ”— kanzure https://groups.google.com/group/science-liberation-front
00:36 πŸ”— kanzure i've been working on some mobile app that serves as a proxy for android and iphone that college students run to grab papers
00:38 πŸ”— kanzure dunno if you guys would be into that
00:39 πŸ”— BlueMax wow there's 100 people in this channel
00:39 πŸ”— kanzure nitro2k01: don't i know you from somewhere?
00:40 πŸ”— balrog_ kanzure: there an irc channel for that?
00:40 πŸ”— balrog_ also I want a proxy that not only grabs papers
00:42 πŸ”— kanzure there's ##hplusroadmap on irc.freenode.net i guess
00:42 πŸ”— kanzure we do do-it-yourself biohacking/genetic engineering/dna synthesis/nootropics and things.
00:42 πŸ”— kanzure and paperbot, our paper-fetching irc bot
00:43 πŸ”— kanzure balrog_: well, we could just deploy a botnet
00:44 πŸ”— kanzure unfortunately i'm not as hooked into the android malware scene these days, i have no idea what software would be a good choice
00:44 πŸ”— kanzure transproxy doesn't look like what i need, and proxydroid is only for redirecting your outgoing requests (not accepting incoming connections)
00:44 πŸ”— kanzure plus proxydroid totally fails to run on android-x86 because it's all armeabi junk
00:44 πŸ”— balrog_ I'm thinking more of browser plugins
00:44 πŸ”— balrog_ for desktop browsers
00:45 πŸ”— kanzure you're going to run a proxy in a browser plugin?
00:45 πŸ”— balrog_ no, a browser plugin that just saves viewed PDFs and metadata
00:45 πŸ”— kanzure zotero does that already
00:45 πŸ”— balrog_ or something of that sort
00:45 πŸ”— kanzure paperbot is based on a headless version of zotero translators
00:45 πŸ”— kanzure https://github.com/zotero/translators
00:45 πŸ”— kanzure https://github.com/zotero/translation-server
00:46 πŸ”— kanzure however, you have to click 'save'- this could be enabled by default instead and it could be switched to HTTP POST to somewhere
00:46 πŸ”— kanzure i think there's also a zotero server for collecting pdfs/bibliographies but i've never used it
00:47 πŸ”— kanzure (like for managing a few institutional users)
00:49 πŸ”— kanzure is that what you had in mind?
00:51 πŸ”— balrog_ brb
01:08 πŸ”— GLaDOS Liberator is stuck on uploading.
01:08 πŸ”— GLaDOS Damn you DoS!
01:12 πŸ”— GLaDOS http://arstechnica.com/tech-policy/2013/01/aaron-swartz-memorial-jstor-liberator-sets-public-domain-academic-articles-free/ Article is up, SketchCow
01:13 πŸ”— kanzure neat
01:13 πŸ”— BlueMax nice, hope that brings in some attention
01:14 πŸ”— kanzure "By running the script--which is limited to once per browser" what
01:14 πŸ”— kanzure that must be a misunderstanding
01:15 πŸ”— balrog_ that's deliberate
01:15 πŸ”— kanzure that's silly
01:15 πŸ”— GLaDOS Each browser can run it only once.
01:15 πŸ”— GLaDOS It's a memorial more than an archiving effort.
01:15 πŸ”— GLaDOS If it were the latter, we would've fired our warriors up.
01:15 πŸ”— kanzure do your warriors have access?
01:16 πŸ”— kanzure "(they were only dropped this morning)" also a misunderstanding
01:17 πŸ”— GLaDOS No, they were.
01:18 πŸ”— kanzure i thought that's because they can't go after his estate
01:19 πŸ”— kanzure it would be more relevant to report if that /didn't/ happen
01:21 πŸ”— balrog_ http://www.huffingtonpost.com/2013/01/14/aaron-swartz-stephen-heymann_n_2473278.html?utm_hp_ref=tw
01:28 πŸ”— SketchCow chronomex: ping
01:28 πŸ”— chronomex pong
01:31 πŸ”— SketchCow alard: redis seems to not be working for underscor now
01:31 πŸ”— ex-parrot someone has probably already spotted this, but http://aaronsw.archiveteam.org/ just seems to have gone down for me
01:32 πŸ”— chronomex we're on it
01:32 πŸ”— SketchCow http://arstechnica.com/tech-policy/2013/01/aaron-swartz-memorial-jstor-liberator-sets-public-domain-academic-articles-free/
01:32 πŸ”— ex-parrot cool :) I almost managed to run the bookmarklet before it died :)
01:34 πŸ”— SketchCow underscor is here now.
01:34 πŸ”— SketchCow Let's sort this
01:35 πŸ”— underscor pong
01:41 πŸ”— SketchCow underscor: anything else needed?
01:41 πŸ”— chronomex we appear to be handling this in -bs
01:41 πŸ”— chronomex for better or for worse
01:41 πŸ”— underscor No, other than alard's thoughts on what happened
01:41 πŸ”— underscor also that
01:41 πŸ”— SketchCow Thanks
01:45 πŸ”— DonnchaC Hi
01:45 πŸ”— ex-parrot fwiw, the JSTOR liberator seems to be sticking at "Asking for next item..." on my machine still. though I am running iceweasel 18, which is probably not well tested
01:46 πŸ”— GLaDOS ex-parrot: did you run it successfully before?
01:46 πŸ”— ex-parrot GLaDOS: nope, but the site went down at roughly the same instant I tried to run it for the first time, so who knows what state it's in
01:47 πŸ”— GLaDOS Hm
01:47 πŸ”— ex-parrot I also have Ghostery, AdBlock and NoScript installed which have a tendency to break javascript in unusual ways
01:48 πŸ”— ex-parrot disabling them makes no difference
01:49 πŸ”— chronomex please hold
01:50 πŸ”— DonnchaC I created a crappy PoC maybe a week ago for a bug I saw on SpringerLink. Their "LookInside" functionality just loads up a png of the page with JS.
01:51 πŸ”— DonnchaC It turns out the png url is /000.png, /001.png. It is possible to incrementally get each page image, download them into local browser and upload back to a server where they are converted to PDF.
01:52 πŸ”— DonnchaC I had created a shitty greasemonkey script for this last week and its available at http://0bin.net/paste/29713b9cbf8d1cd60f3cf07e71757ba429196833#SalSJ4E3+RxzQz15KrnaJ9g6gtUpGXj65YFUIH3rBTw=
01:52 πŸ”— chronomex nice
01:53 πŸ”— DonnchaC I had a look through the terms and there doesn't appear to be anything explicitly restricting viewing the "preview" in your browser. Obviously it would be possible to expand a similar script to download original PDF instead if available.
01:53 πŸ”— DonnchaC I am obviously not condoning the use of this or similar script by anyone to violate any laws in their respective countries.
01:54 πŸ”— chronomex wink wink nod nod
01:55 πŸ”— kanzure DonnchaC: could you also post that information here? https://groups.google.com/group/science-liberation-front
01:55 πŸ”— kanzure i wonder about dumping zotero translators into a greasemonkey script
01:55 πŸ”— kanzure i think the api is different. i haven't used greasemonkey in, gosh, 4 years at least
01:57 πŸ”— kanzure also, i have some PoC in the works for removing watermarks from pdfs from publishers. not quite ready yet.. but if we can detect malware in pdf, we can certainly detect watermarks.
01:57 πŸ”— kanzure so far i've found that sciencedirect/elsevier/nature publishing group don't seem to add watermarks (confirming via md5sum of the documents from multiple different retrievals on different ezproxy endpoints)
01:57 πŸ”— kanzure ieee definitely adds visible watermarks..
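The md5sum check kanzure describes (fetch the same document more than once and see whether the copies are byte-identical) could be sketched roughly as follows. This is only an illustration, not the script anyone in the channel used; the function names `md5_hex` and `copies_identical` are hypothetical, and the sample byte strings stand in for real downloads.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of a byte string, like `md5sum` prints."""
    return hashlib.md5(data).hexdigest()


def copies_identical(copies: list[bytes]) -> bool:
    """True when every retrieved copy hashes to the same digest.

    Identical digests suggest the publisher did not stamp anything
    per-download into the file; differing digests mean something
    in the bytes changed between retrievals.
    """
    return len({md5_hex(copy) for copy in copies}) <= 1


# Hypothetical stand-ins for two retrievals of the same paper.
first = b"%PDF-1.4 example body"
second = b"%PDF-1.4 example body"
stamped = b"%PDF-1.4 example body, downloaded by 192.0.2.1"

print(copies_identical([first, second]))   # same bytes, same digest
print(copies_identical([first, stamped]))  # extra bytes, different digest
```

A matching digest only says the files are identical; it says nothing about which part differs when they do not match.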
02:00 πŸ”— DonnchaC RSC journals add visible watermarks around the margins, not sure if they have other watermarks.
02:04 πŸ”— kanzure i keep forgetting who it is that adds that entire first page of watermarking
02:04 πŸ”— kanzure is it wiley?? i want to say wiley. :(
02:04 πŸ”— kanzure anyway the number one problem i am encountering is that i can't pick a reasonable pdf modification library for python
02:04 πŸ”— kanzure maybe there's something in pdf.js that could be used
02:04 πŸ”— kanzure https://github.com/mozilla/pdf.js
02:06 πŸ”— balrog_ watermarking is easy to remove from scan-sourced media
02:07 πŸ”— balrog_ you just extract the images and use them and that's it
02:07 πŸ”— chronomex yes
02:09 πŸ”— kanzure in pdf it's even easier because they are extra xml attributes in the file (more or less)
02:09 πŸ”— kanzure (please don't murder me; i'm not a pdf spec wizard yet)
02:10 πŸ”— kanzure xml elements, i mean. not attributes.
02:11 πŸ”— instence string him up! PDF wizard mana too low!
02:11 πŸ”— balrog_ pdf is a messy standard
02:11 πŸ”— balrog_ a lot of bells and whistles
02:11 πŸ”— balrog_ I suggest decompressing though if you want to analyze as the first step
02:11 πŸ”— chronomex pdf allows all kinds of scary things like embedded flash
02:11 πŸ”— kanzure and javascript
02:12 πŸ”— DonnchaC Yeah it should be relatively straightforward to remove the copyright strings from a PDF
02:12 πŸ”— kanzure no it's not the copyright strings that matter
02:12 πŸ”— DonnchaC I have done some playing around with the format before.
02:12 πŸ”— DonnchaC (the identifying source strings)
02:12 πŸ”— kanzure "Authorized licensed use limited to: University of Texas at Austin. Downloaded on July 22, 2009 at 15:50 from IEEE Xplore. Restrictions apply."
02:12 πŸ”— kanzure that shit.
02:12 πŸ”— kanzure that shit's gotta go.
02:13 πŸ”— DonnchaC Have you go
02:13 πŸ”— kanzure http://scholar.google.com/scholar?q=%22IEEE+Xplore.+Restrictions+apply.%22
02:36 πŸ”— tef download it twice, from different sources, null out the bits that are different :v
02:37 πŸ”— kanzure well
02:38 πŸ”— kanzure one way is to pipe it into ghostscript and just convert it to another format and then back again
02:38 πŸ”— kanzure the problem with downloading from multiple sources is that it would require keeping track of which ezproxy servers have access to which publishers
02:38 πŸ”— kanzure i mean, that's not a huge problem. it's just annoying.
02:43 πŸ”— DonnchaC It is indeed. How extensively are articles watermarked?
02:43 πŸ”— filer I've stripped that message from IEEE Xplore documents before
02:43 πŸ”— filer it was plain text inside the PDF
02:44 πŸ”— filer so I just replaced it with spaces
02:44 πŸ”— DonnchaC Is it just a couple of the big players or are a lot of publishers doing that?
02:44 πŸ”— kanzure manually or with some script?
02:44 πŸ”— chronomex hah
02:44 πŸ”— DonnchaC Keeping it simple.
02:44 πŸ”— kanzure DonnchaC: it's really random, some publishers do others dont
02:44 πŸ”— filer yeah, that was my experience
02:44 πŸ”— kanzure i really want to write up a quick script to do it though
02:44 πŸ”— chronomex there are lots of sneakier ways you can watermark a pdf, but I haven't heard of them in use yet
02:45 πŸ”— filer once I realized it was a fixed string, I think I just used sed
02:45 πŸ”— kanzure chronomex: yeah, i think we should be out looking for them, but for now we shouldn't assume they are being that sneaky
02:45 πŸ”— DonnchaC Most watermarks will probably just be a plaintext tag in the PDF.
02:45 πŸ”— chronomex yes
02:45 πŸ”— filer but yeah, get a couple copies and cmp
02:45 πŸ”— kanzure sed works if you know the string in advance
02:45 πŸ”— kanzure i think what we need is a simple script that has a list of regexes
02:45 πŸ”— DonnchaC I suppose it will be an arms race, they will only advance to sneakier techniques when there is mass sharing and watermark removal
02:45 πŸ”— chronomex yes
02:46 πŸ”— kanzure the zotero team has proved that we can win the arms race
02:46 πŸ”— kanzure scrapers break -> they fix within 24 hours
02:46 πŸ”— filer well, if you get two copies through two different netblocks and the diffs are simple, then you have a profile for that particular publisher
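The "get a couple copies and cmp" idea filer mentions amounts to a positional byte comparison. A minimal sketch, assuming both copies are already in memory as bytes; the function name `differing_ranges` is hypothetical, and the sample data is made up:

```python
def differing_ranges(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Return half-open (start, end) byte ranges where a and b differ.

    This is a positional comparison, roughly a summarised `cmp -l`.
    If one input is longer, the extra tail counts as a difference.
    """
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended just before i
            start = None
    longest = max(len(a), len(b))
    if start is not None:
        # The last run reached the end of the shorter input.
        ranges.append((start, longest))
    elif longest > common:
        ranges.append((common, longest))
    return ranges


copy_one = b"header AAAA body"
copy_two = b"header BBBB body"
print(differing_ranges(copy_one, copy_two))
```

Because the comparison is positional, it is only informative when the varying parts have the same length in both copies and the streams are not compressed; balrog_'s earlier suggestion to decompress the PDF first would apply before a comparison like this.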
02:46 πŸ”— chronomex also who cares about "hey this document was contributed to the public domain by this cool person on tuesday jan 15 2013"
02:46 πŸ”— filer also that
02:46 πŸ”— kanzure well, sometimes it includes an ip address
02:46 πŸ”— chronomex I don't mind
02:47 πŸ”— kanzure you would mind if you are downloading in mass
02:47 πŸ”— kanzure suddenly your professor gets the blame because you were in his lab for whatever reason
02:47 πŸ”— chronomex IA's scans of books include the library's stickers
02:47 πŸ”— DonnchaC You would want something that fails safe? If no matching watermark is found for a site you know watermarks documents, you probably don't want that shared in case there is a new form of watermark, potentially getting someone in journal
02:48 πŸ”— kanzure especially if the document is redistributed
02:48 πŸ”— chronomex NOBODY IS GOING TO JAIL FOR PUBLIC DOMAIN WORKS
02:48 πŸ”— kanzure the last thing you want to do is get some poor bastard blamed for a pdf or some shit
02:48 πŸ”— chronomex NOT UNDER MY WATCH
02:52 πŸ”— DonnchaC Unfortunately if there is large scale information liberation and redistribution they will target the small guys and whoever they can get
02:52 πŸ”— filer for distributing public domain materials?
02:52 πŸ”— kanzure haha, no not public domain
02:53 πŸ”— adamcaudi It's less about the distribution than it is about accessing them to distribute
02:57 πŸ”— kanzure well, proxies are very easy to deploy. i should go writeup my mobile proxy idea somewhere.
02:59 πŸ”— filer not sure what sense of "mobile" you're referring to, but I have a stack of $20 TP-Link TL-WR703N OpenWRT-compatible routers with USB ports
02:59 πŸ”— filer easy to velcro to things, heh
02:59 πŸ”— adamcaudi The 703N is great, had lots of fun with those
03:00 πŸ”— ex-parrot I wish they were easier to solar power
03:00 πŸ”— kanzure filer: i mean for students to run on their phone while they are on campus
03:00 πŸ”— kanzure browser extensions are cool but phones are always on
03:00 πŸ”— filer ah
03:00 πŸ”— kanzure just think how much battery life you could potentially be draining!
03:01 πŸ”— filer I wonder how long one of those routers could run on a cheap battery
03:01 πŸ”— filer I think they consume something like 100mw
03:01 πŸ”— adamcaudi ex-parrot, I just embedded one in a power strip - hidden and hard-wired for power
03:01 πŸ”— filer nice
03:01 πŸ”— ex-parrot that's genius, assuming you're installing it inside :)
03:02 πŸ”— ex-parrot and assuming the switching PSU small enough to fit inside a power strip is also well made enough not to catch fire after a while :/
03:02 πŸ”— kanzure i keep forgetting the name of that really cheap board that you rop into a powerstrip
03:02 πŸ”— kanzure *drop
03:02 πŸ”— filer someone else I talked to had such an idea, but at the time there weren't routers that were tiny enough
03:02 πŸ”— ex-parrot filer: check dealextreme, there are versions which have a built in battery already. I did some numbers on trying to solar power them but it didn't look too practical
03:02 πŸ”— kanzure it was basically a linux server that was powered by cat5 or something
03:02 πŸ”— kanzure am i making this up?
03:03 πŸ”— ex-parrot shivaplug?
03:03 πŸ”— filer *sheeva
03:04 πŸ”— filer the SheevaPlug is cool, but more powerful and more expensive
03:04 πŸ”— filer I have one router that is actually the size of an iphone charger
03:04 πŸ”— filer unfortunately, I think it must use some RTOS
03:05 πŸ”— adamcaudi ex-parrot, https://twitter.com/adamcaudill/status/227249569765916672
03:05 πŸ”— filer oh, don't those just have tp-links inside?
03:06 πŸ”— ex-parrot very nice adamcaudi, certainly better than the overpriced govt engineered one doing the rounds a few months ago
03:06 πŸ”— adamcaudi That's what inspired it :)
03:06 πŸ”— filer oh yeah http://www.minipwner.com/index.php/minipwner-build
03:07 πŸ”— adamcaudi Actually talked to the guy that designed the $1300 version - I don't think he realized just how close you could get for $50
03:07 πŸ”— chronomex ha
03:09 πŸ”— ex-parrot you could have built it 5+ years ago I guess, a gumstix would fit and they have had low end units which definitely came in at < $1300
03:10 πŸ”— filer having a $20 router with USB definitely helps though
03:10 πŸ”— chronomex filer: speaking of, mind if I swing over in 20?
03:10 πŸ”— ex-parrot they are great. I have a few here for various projects. friend is using them as radio modules for robotics control
03:10 πŸ”— chronomex might want to unload a TPlink from you
03:10 πŸ”— filer no problem
03:11 πŸ”— chronomex coolz
03:13 πŸ”— adamcaudi Have you seen thegrugq's PORTAL project? It's a 703N that routes everything over TOR
03:14 πŸ”— chronomex neat
03:15 πŸ”— filer cool, I've wanted to have something like that
03:15 πŸ”— filer glad to know someone's already made it, saves me work :)
03:18 πŸ”— kanzure blast from the past:
03:18 πŸ”— kanzure https://groups.google.com/forum/?fromgroups=#!topic/diybio/SFuyGIAt74k
03:18 πŸ”— kanzure this was from when aaronsw was starting the getarticles group
03:21 πŸ”— execute why don't you start publishing the aaronsw documents as torrents, and distribute the torrents' magnet links via an RSS feed? I think a lot of people would subscribe their torrent clients to that feed and help store and distribute it
03:22 πŸ”— kanzure because nobody seeds
03:22 πŸ”— kanzure library genesis did that, and nobody fucking seeds it
03:22 πŸ”— kanzure http://libgen.net/
03:22 πŸ”— execute ah, well, that sucks
03:23 πŸ”— kanzure it's probably the greatest dump of ebooks and academic articles ever
03:47 πŸ”— kanzure hmm there's a zotero plugin that is supposed to autosave pdfs when you browse to a page
03:47 πŸ”— kanzure (according to zotero's maintainer)
03:47 πŸ”— kanzure but he left in a cloud of smoke and now i'm not sure what he is talking about. any ideas?
03:56 πŸ”— lithiumg greetings everyone
03:58 πŸ”— lithiumg does a scripted version of jstor liberator exist?
04:00 πŸ”— kanzure there's a springerlink version, https://groups.google.com/group/science-liberation-front/t/d6bb86b96de8c6a6
04:00 πŸ”— kanzure if that's what you mean?
04:01 πŸ”— lithiumg i was hoping to find a bash/perl/python/etc version
04:02 πŸ”— lithiumg I've got a few linux boxes scattered about that I'd like to toss at it
04:02 πŸ”— kanzure they seem to only accept one article per user, it's limited on the server end
04:03 πŸ”— lithiumg I haven't seen that limit
04:03 πŸ”— kanzure oh, maybe it's on the client side neat
04:03 πŸ”— kanzure you guys all lied to me
04:04 πŸ”— lithiumg someone on the internet lied to you?
04:06 πŸ”— kanzure hmm multiple people have asked me to change the name of that mailing list, any suggestions?
05:01 πŸ”— kanzure auto-save plugin for zotero https://groups.google.com/group/science-liberation-front/t/2b3b468fca63a6b2
05:12 πŸ”— SketchCow OK, I've piled all the godane material into collections n' crap
06:02 πŸ”— ersi Whoa.
06:13 πŸ”— mjb_b greetings. sorry if this has been asked 8971236497861 times already, but what's the status of getting the JSTOR liberator bookmarklet working again?
06:14 πŸ”— mjb_b and is there anything I can do to help diagnose?
06:15 πŸ”— GLaDOS mjb_b: http://aaronsw.archiveteam.org/
06:15 πŸ”— GLaDOS It's working, you may only use it once though.
06:18 πŸ”— slythfox It'd be neat if there was a system for people to suggest articles for others to liberate, thus further encouraging the one-per-browser?
06:20 πŸ”— SketchCow Our admin is either asleep or broken
06:21 πŸ”— tsp_ Why is it only one per browser? JSTOR limitation?
06:22 πŸ”— chronomex policy choice
06:24 πŸ”— mjb_b it didn't work even once for me - on win7, with chrome
06:25 πŸ”— mjb_b the next-item GET hangs
06:25 πŸ”— mjb_b I think because something goes wrong with the frameset creation
06:25 πŸ”— mjb_b the jstor document doesnt get put into the lower frame
06:27 πŸ”— mjb_b the part of the script that tries to put it into the lower frame is resulting in an immediately canceled GET, according to the network tab in the developer tools
06:28 πŸ”— mjb_b aaronsw.archiveteam.org homepage keeps showing the same most recently liberated doc...nothing liberated for a while...
06:29 πŸ”— chronomex immediately canceled GET may be symptomatic of needing an Access-Control-Allow-Origin HTTP header
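chronomex's guess is that the browser cancels the cross-origin `next-item` GET because the response lacks an `Access-Control-Allow-Origin` header. Whether that was the actual cause is not established in the log. As a sketch of what sending that header looks like, here is a tiny standard-library server; the handler name, the JSON payload, and the wildcard origin are all assumptions for illustration and say nothing about how the real aaronsw.archiveteam.org tracker was built.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def cors_headers(allowed_origin: str = "*") -> dict[str, str]:
    """Headers that let a script on another origin read the response."""
    return {"Access-Control-Allow-Origin": allowed_origin}


class NextItemHandler(BaseHTTPRequestHandler):
    """Hypothetical handler answering GET requests such as /next-item."""

    def do_GET(self):
        body = b'{"item": "placeholder"}'  # made-up payload
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        for name, value in cors_headers().items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 0) -> HTTPServer:
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), NextItemHandler)
```

Calling `make_server().serve_forever()` would start it. Without the header, a browser still sends the request but refuses to hand the response to the calling script, which can show up in developer tools as a failed or canceled request.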
06:30 πŸ”— slythfox Try running it in --disable-web-security (chrome) ?
06:30 πŸ”— kanzure my favorite option
06:30 πŸ”— chronomex --enable-surprise-buttsex
06:30 πŸ”— kanzure chronomex: do you guys mind me linking to science-liberation-front?
06:31 πŸ”— kanzure chronomex: yes that is the correct reading of that option
06:31 πŸ”— chronomex what's SLF, kanzure?
06:31 πŸ”— Cameron_D https://groups.google.com/group/science-liberation-front/ this?
06:31 πŸ”— SketchCow chronomex: Things stopped working - could you check the box?
06:32 πŸ”— kanzure chronomex: https://groups.google.com/group/science-liberation-front/t/2b3b468fca63a6b2
06:32 πŸ”— kanzure and so on
06:32 πŸ”— kanzure chronomex: just grouping together some peeps who want to work on crawlers and things
06:32 πŸ”— chronomex kanzure: archiveteam welcomes inbound links from all comers
06:33 πŸ”— chronomex SketchCow: I don't see anything wrong.
06:33 πŸ”— kanzure ok. because when people are wondering about why you guys don't want more than 1 document per person, i feel sort of compelled to point out that there are others who would want more by linking to that. heh.
06:33 πŸ”— kanzure and it feels disingenuous to be spamming your excellent channel
06:33 πŸ”— chronomex there are moving parts into which I have no visibility and no insight
06:34 πŸ”— mjb_b no luck with --disable-web-security. Still hangs with empty lower frame, upper frame "Looking for another liberated item"
06:34 πŸ”— chronomex kanzure: if I'm not mistaken, this in particular is about making a statement rather than hoovering JSTOR
06:35 πŸ”— kanzure right, but some people might want to do more
06:35 πŸ”— chronomex I understand
06:35 πŸ”— kanzure actually i think you guys should probably elaborate on the page itself
06:35 πŸ”— chronomex probably, yes
06:36 πŸ”— kanzure although that might shoot yourself in the foot. tough call.
06:36 πŸ”— chronomex he who shoots from the hip sometimes forgets to point away from foot first
06:36 πŸ”— chronomex archiveteam shoots from hip
06:38 πŸ”— SketchCow chronomex: Thanks
06:41 πŸ”— SketchCow Back and running again
06:42 πŸ”— SketchCow He doesn't know how he fixed it
06:42 πŸ”— SketchCow He logged in and it just worked
06:42 πŸ”— chronomex sometimes you just need to kick something
06:42 πŸ”— chronomex that's always the scariest kind of fix
06:43 πŸ”— filer yayyyy
06:43 πŸ”— * filer has successfully contributed
06:46 πŸ”— mjb_b yes! it just worked for me
06:46 πŸ”— chronomex \o/
06:46 πŸ”— mjb_b the homepage is updating like gangbusters too
06:46 πŸ”— filer indeed
06:49 πŸ”— filer chronomex: apropos of nothing, I noticed recently that the north wall of the gov pubs collection at Suzzallo has many, many red boxes of microcards of parliamentary transcripts or something
06:49 πŸ”— filer I wonder if those are online
06:49 πŸ”— chronomex hm
06:52 πŸ”— filer yes, as a ProQuest service... http://parlipapers.chadwyck.co.uk/marketing/index.jsp ... bleaugh
06:54 πŸ”— filer good thing to know that all of those materials dating back to 1688 are safely protected by a paywall
06:55 πŸ”— chronomex hooray
06:56 πŸ”— Coderjoe btw, the "just liberated" list is showing doubles for me
06:56 πŸ”— chronomex looks fine to me, refresh?
06:57 πŸ”— Coderjoe hmm
06:57 πŸ”— Coderjoe full refresh (or perhaps it was just the second refresh) seems to have fixed it
07:01 πŸ”— Coderjoe darn it. my article was just an abstract.
07:01 πŸ”— chronomex I got one that was paywalled for $34
07:01 πŸ”— chronomex so I tried again
07:02 πŸ”— kanzure 23:01 <@jblake> scrape liberty from the heels of your oppressors
07:02 πŸ”— kanzure 23:01 <@jblake> march against the paywalls of injustice!
07:02 πŸ”— chronomex Coderjoe: it could be sillier ... http://www.jstor.org/stable/3253788
08:06 πŸ”— filer "march against the paywalls of injustice!"
08:06 πŸ”— filer I like this
08:09 πŸ”— kanzure filer: join science-liberation-front
08:09 πŸ”— filer #?
08:10 πŸ”— kanzure filer: it's a mailing list. http://groups.google.com/group/science-liberation-front
08:10 πŸ”— kanzure although we have a bunch of people in ##hplusroadmap
08:10 πŸ”— kanzure .. kind of a happenstance i guess. maybe a different channel should be used. btw that was freenode.
08:12 πŸ”— filer cool, joined
08:12 πŸ”— kanzure i am busy poking at a possible ezproxy exploit
08:16 πŸ”— chronomex chumby is perhaps closing their remaining assets http://forum.chumby.com/viewtopic.php?id=8457
08:16 πŸ”— chronomex I'll fire off a warc
08:18 πŸ”— filer ouch, $4300-$5500/mo
08:18 πŸ”— filer I wonder how many chumbies that is
08:19 πŸ”— chronomex 40k chumbies
08:19 πŸ”— chronomex it's 11 cents a month per chumby
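The per-device figure follows from dividing the quoted monthly hosting cost by the 40k device count. A quick check of the arithmetic, using only the numbers given in the channel (the function name is hypothetical):

```python
def cost_per_device_cents(monthly_cost_usd: float, devices: int) -> float:
    """Monthly hosting cost per device, in US cents."""
    return monthly_cost_usd * 100 / devices


DEVICES = 40_000  # chronomex's "40k chumbies"

low = cost_per_device_cents(4300, DEVICES)   # lower end of the quoted range
high = cost_per_device_cents(5500, DEVICES)  # upper end of the quoted range

print(f"{low:.2f} to {high:.2f} cents per chumby per month")
```

So the "11 cents" figure matches the low end of the range; at the high end it comes to just under 14 cents.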
08:22 πŸ”— Cameron_D "3) Find someone in the community to host this forum and the wiki. If you can do this, please contact me." anyone want to offer?
08:47 πŸ”— chronomex well I'm sucking down the source code site and the forum
08:48 πŸ”— chronomex might as well get the wiki too while I'm at it
08:50 πŸ”— chronomex relatively small wiki, 197 pages
09:23 πŸ”— Nemo_bis underscor, SketchCow, it's not a "secret" that https://archive.org/details/philosophicaltransactions come from JSTOR, is it?
09:23 πŸ”— Nemo_bis (It could be at most a "segreto di pulcinella", as we'd say in Italian.)
09:36 πŸ”— filer a secret puffin?
09:42 πŸ”— SketchCow It's not a secret.
09:42 πŸ”— SketchCow But it's beside the point related to the liberator.
09:45 πŸ”— tef SketchCow: so i have a warc of stuff, which ia collection? yours or brewsters?
09:45 πŸ”— tef (sure i've mentioned the content of said warc enough times)
09:45 πŸ”— SketchCow I've forgotten
09:46 πŸ”— tef ok, I took a crawl of hn front page + articles with a crawler that supports ajax
09:46 πŸ”— tef so I stick it in ark-aaronsw or aaronsw
09:46 πŸ”— tef assuming it's relevant
09:50 πŸ”— SketchCow Yeah
09:50 πŸ”— SketchCow Do it for either
09:51 πŸ”— SketchCow http://archive.org/details/magazine_rack_misc is fun stuff
09:53 πŸ”— kanzure "Hey all, I'm coordinating a series of memorial hackathons for Aaron Swartz. Currently there's going to be one at Noisebridge in SF on Jan. 26 (ish) and another somewhere in Boston, but the more the better."
09:53 πŸ”— kanzure "The idea is to bring together people at hackerspaces around the world to work on projects that in some way continue the work that Aaron did to facilitate the sharing of human knowledge, social/political justice, and free culture."
09:53 πŸ”— kanzure https://groups.google.com/group/science-liberation-front/t/3d17904bef7759b0
09:55 πŸ”— tef ok officially I am too incompetent to use the archive uploader http://archive.org/details/NewsYcFrontpagePlusArticlesThreads
09:55 πŸ”— tef batcave was so much easier for my poor brain
10:46 πŸ”— alard tef: It needs a different media type, but the files are there.
11:20 πŸ”— Smiley https://ia601608.us.archive.org/23/items/NewsYcFrontpagePlusArticlesThreads/ << see :)
13:49 πŸ”— maxigas hi, can somebody tell me where the docs uploaded to http://aaronsw.archiveteam.org/ are available?
13:51 πŸ”— maxigas also, i think the counter is restarted every once in a while.
14:18 πŸ”— alard I think the counter is missing a zero.
14:19 πŸ”— maxigas i saw it go from around 696 to 12 when i refreshed the page after a few minutes.
14:26 πŸ”— alard 12 was probably 1002, but there's a bug in the code that removes the 00 when it adds a thin space between 1 and 002.
14:26 πŸ”— alard It does Math.floor(n/1000) + " " + (n%1000).
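The bug alard describes can be reproduced in a few lines. The site's code is JavaScript; the sketch below mirrors the quoted expression in Python, and the function names and the choice of U+2009 as the thin space are assumptions. The fix shown, zero-padding the remainder to three digits, is one obvious option and not necessarily what was deployed.

```python
THIN_SPACE = "\u2009"  # assumed separator; the log only says "thin space"


def format_buggy(n: int) -> str:
    """Mirror of Math.floor(n/1000) + " " + (n%1000): remainder is not padded."""
    return f"{n // 1000}{THIN_SPACE}{n % 1000}"


def format_fixed(n: int) -> str:
    """Pad the remainder to three digits so 1002 keeps its zeros."""
    if n < 1000:
        return str(n)
    return f"{n // 1000}{THIN_SPACE}{n % 1000:03d}"


for count in (696, 1002):
    print(count, repr(format_buggy(count)), repr(format_fixed(count)))
```

With the buggy version, 1002 renders as "1", a thin space, then "2", which is easy to misread as 12 and matches what maxigas saw. The fixed version here only inserts a single separator, so counts of a million or more would need further grouping.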
14:54 πŸ”— maxigas so where are the downloaded dox?
14:55 πŸ”— alard I don't know. They're probably not available at the moment, but SketchCow will surely find a way to make them available later.
14:59 πŸ”— maxigas hm it undermines the legitimacy of the project a bit... documents should be available shortly after you submitted them.
15:00 πŸ”— maxigas i just showed the website to four people and each of them asked about where to find the assembled documents.
15:03 πŸ”— alard That may be true, but it's also easier said than done.
15:11 πŸ”— SketchCow Awwwwwwww.
15:11 πŸ”— SketchCow You know what I love? I mean love?
15:11 πŸ”— SketchCow When someone comes to a project and complains.
15:11 πŸ”— SketchCow Let's see.
15:11 πŸ”— SketchCow The project was launched around 5pm last night.
15:12 πŸ”— SketchCow So that's... hmmm, 16 hours or so.
15:12 πŸ”— SketchCow We immediately started getting swamped.
15:12 πŸ”— SketchCow We dealt with being swamped.
15:12 πŸ”— SketchCow So I guess.... well....
15:12 πŸ”— SketchCow I know. Fuck you.
15:13 πŸ”— SketchCow How long did we spend trying to keep the server up and dealing with DoS attempts and hacking attacks? Probably 8 of those 16 hours.
15:13 πŸ”— SketchCow So.... there we go.
15:13 πŸ”— SketchCow Morning, alard.
15:14 πŸ”— alard Hello. (Afternoon.)
15:17 πŸ”— SketchCow Do you want to send underscor a suggestion to fix the counter thing?
15:18 πŸ”— SketchCow That'll save him when he wakes up in whatever addled state he does this morning
15:18 πŸ”— alard I've done so, before I responded here.
15:19 πŸ”— maxigas SketchCow: good point, sorry about that. :/
15:19 πŸ”— balrog_ another thing for underscor: "millions of dollars in fees" -> "millions of dollars in fines"
15:19 πŸ”— maxigas can i help in some way?
15:20 πŸ”— SketchCow Yes, you can shut the hell up.
15:22 πŸ”— SketchCow (fees/fines, wasn't sure of the best term)
15:22 πŸ”— balrog_ a fee is something you pay voluntarily, so yeah. it just sounds a bit weird
15:23 πŸ”— SketchCow I will not re-iterate to you the circumstances in which I wrote the verbiage.
15:24 πŸ”— SketchCow http://archive.org/stream/1975PredictionsInAwakeMagazineByJehovahsWitnesses/1975_Predictions_Awake_Magazine#page/n11/mode/2up
15:24 πŸ”— SketchCow Finally, someone speaks the truth
15:25 πŸ”— alard maxigas: Perhaps with other ArchiveTeam projects, later, or think of a project of your own. Are you running a warrior yet?
15:25 πŸ”— mistym That title design is delightful.
15:26 πŸ”— SketchCow It is.
15:26 πŸ”— alard That's a question I've often asked myself, so I'm glad to see it answered.
15:26 πŸ”— SketchCow As I mentioned last night, I put http://archive.org/details/magazine_rack_misc together, grabbing 100+ orphaned magazines and shoving them in, so it's this pretty crazy bin of magazines, superpamphlets, and screeds.
15:32 πŸ”— SketchCow And as many of these items were sitting around in the collection for years, they have ridiculous download stats.
15:40 πŸ”— SketchCow http://archive.org/details/thescreensavers - 87 episodes saved
15:40 πŸ”— SketchCow Oh, so under "oh no, what the fuck", Myspace is starting to begin to get rid of old profiles.
15:40 πŸ”— SketchCow Or, let people voluntarily upgrade to the new format.
15:43 πŸ”— Smiley o_O
15:44 πŸ”— Smiley my old myspace is so amusing, I linked it on facebook recently :<
16:29 πŸ”— SketchCow I'm speaking in Germany first week of February.
16:29 πŸ”— SketchCow I just found out they will have a pneumatic tube system active for the event between locations
16:29 πŸ”— SketchCow And the orientation letter just let us know to expect capsules to slam into the room during our panels
16:30 πŸ”— mistym Maybe the greatest distraction?
16:30 πŸ”— SketchCow That's a good question.
16:30 πŸ”— SketchCow It may be.
16:32 πŸ”— Smiley o_O
16:44 πŸ”— kanzure embedding metadata in pdfs https://groups.google.com/group/science-liberation-front/t/b73592f3606b9420
16:47 πŸ”— Smiley nice.
16:47 πŸ”— Smiley but whats wrong with md5 sums? :D
17:00 πŸ”— kanzure Smiley: nothing, i think md5sum is a good idea
17:00 πŸ”— kanzure but i am also of the opinion that people should be splicing supplemental material into the pdf
17:00 πŸ”— kanzure chances are, if you don't include it in the .pdf itself, it's not going to be distributed when the paper is read/downloaded
17:01 πŸ”— Smiley kanzure: a md5sum IS the pdf.
17:01 πŸ”— Smiley Hence why it works so well :D
18:12 πŸ”— kanzure On Tue, Jan 15, 2013 at 12:10 PM, Piotr Migdal <pmigdal@gmail.com> wrote:
18:12 πŸ”— kanzure > (Silly remark: anyway, for the "guerrilla" Zotero, a good name is
18:12 πŸ”— kanzure > "Zorrotero" ;))
18:14 πŸ”— X-Scale Speaking of "guerrilla" ... http://www.1000manifestos.com/aaron-swartz-the-guerilla-open-access-manifesto/
18:22 πŸ”— kanzure X-Scale: yes that was what the reference was to
18:25 πŸ”— kanzure retroshare/jstor dump https://groups.google.com/group/science-liberation-front/t/9f6c865cfdb43382?hl=en_US
18:36 πŸ”— balrog_ kanzure: are you sure that isn't the collection of Philosophical Transactions of the Royal Society papers released by Greg Maxwell?
18:37 πŸ”— kanzure balrog_: it seems to include other things, because it's retroshare
18:37 πŸ”— kanzure balrog_: but yeah this is useless
19:02 πŸ”— SketchCow Journal of Higher Education statement sent.
19:02 πŸ”— SketchCow next: Forbes
19:06 πŸ”— kanzure link?
19:06 πŸ”— X-Scale balrog_: http://h33t.com/torrent/04934029/r-i-p-aaron-swartz-jstor-archive-35gb
19:06 πŸ”— X-Scale vs http://thepiratebay.se/torrent/6554331/Papers_from_Philosophical_Transactions_of_the_Royal_Society__fro
19:07 πŸ”— X-Scale Seems to be the same package
19:07 πŸ”— kanzure man this distribution infrastructure sucks
19:09 πŸ”— X-Scale one comment: "Brilliant upload and thoughtful rationale. Many thanks for this. For those who care, a fair amount of the non-PD portions of these journals can be found on rutracker; search for Royal Society."
19:10 πŸ”— balrog_ hah
19:10 πŸ”— * balrog_ takes a look
19:10 πŸ”— kanzure or on libgen
19:17 πŸ”— SketchCow Forbes Guy is Dulllllllllllllllllllllllllllllll
19:26 πŸ”— SketchCow Dulllllll
19:26 πŸ”— SketchCow He's gone now
19:26 πŸ”— SketchCow sooooo dullllllll
19:58 πŸ”— SketchCow http://chronicle.com/blogs/profhacker/civil-disobedience-the-aaron-swartz-memorial-jstor-liberator/45397
20:06 πŸ”— chronomex "fair and balanced"
20:17 πŸ”— SketchCow http://web.archive.org/web/*/http://archive.org is a real thing
20:19 πŸ”— ersi We need to go deeper
20:19 πŸ”— SketchCow http://media.tumblr.com/tumblr_li3y1guDbS1qbdtco.gif
20:20 πŸ”— ersi <3
20:23 πŸ”— ersi Hmmmm, the new wayback machine doesn't redirect you to the links you click on. Like clicking on "Webmasters" on the eldest archived version, the address bar is still at archive.org/ then
20:23 πŸ”— Ymgve http://web.archive.org/robots.txt
20:23 πŸ”— Ymgve you can't go deeper
20:23 πŸ”— ersi But at least the new wayback is super duper fast
20:23 πŸ”— ersi Ymgve: Aww :<
20:24 πŸ”— Ymgve but I wonder what /1 matches
20:24 πŸ”— Ymgve oh wait, of course, 199x
20:25 πŸ”— ersi http://web.archive.org/web/19980301000000/http://archive.org/get_archived.html
20:31 πŸ”— SketchCow OK, so, seriously.
20:31 πŸ”— SketchCow archiveteam.org closed wiki. That needs to stop.
20:31 πŸ”— SketchCow Can people please help me find working, installable anti-spam measures?
20:31 πŸ”— SketchCow And then we'll open it again.
20:31 πŸ”— Nemo_bis Yes|
20:31 πŸ”— SketchCow Let's fix this. Today.
20:32 πŸ”— Nemo_bis I just did a big research.
20:32 πŸ”— Nemo_bis https://www.mediawiki.org/wiki/Thread:Extension_talk:ConfirmEdit/Wikis_account_registration_tour
20:32 πŸ”— SketchCow I hope your solution isn't "massive burlap sacks"
20:32 πŸ”— SketchCow Because that's your solution to everything
20:32 πŸ”— Nemo_bis My solution is copy Arch Wiki
20:33 πŸ”— Nemo_bis SketchCow: did you receive the third one?
20:33 πŸ”— SketchCow not yet
20:33 πŸ”— SketchCow Dude, mail.
20:33 πŸ”— SketchCow You sent it in a sack
20:33 πŸ”— Nemo_bis That's the only authorised kind of sack btw. I had to fetch it with a bike+train ride 30 km away from home.
20:33 πŸ”— SketchCow It might be on another ship
20:33 πŸ”— Nemo_bis It comes by plane.
20:33 πŸ”— chronomex giant canvas sack?
20:33 πŸ”— SketchCow So basically you live on the set of the Godfather's flashbacks
20:33 πŸ”— Nemo_bis So they said.
20:33 πŸ”— chronomex nice.
20:33 πŸ”— Nemo_bis No, it's plastic.
20:33 πŸ”— chronomex oh
20:33 πŸ”— chronomex :(
20:33 πŸ”— S[h]O[r]T What is the output of "date -u +%V`uname`|sha256sum|sed 's/\W//g'"?
20:34 πŸ”— S[h]O[r]T lol
20:35 πŸ”— balrog_ 979aa183120fc18c292abab0ab967e5bcf132b375f7f8f3283637e6bb10996bb
20:49 πŸ”— DFJustin The Archive will provide historians, researchers, scholars, and others access to this vast collection of data (reaching ten terabytes), and ensure the longevity of this information.
20:50 πŸ”— DFJustin so that's three orders of magnitude in 16 years
20:52 πŸ”— * DFJustin awaits the 10 exabyte party
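[Editor's note] The "three orders of magnitude in 16 years" remark can be turned into a yearly growth rate. The sketch below is only back-of-the-envelope arithmetic on the two figures quoted above (a factor of 1000 over 16 years); it assumes smooth exponential growth, which the log does not claim, and the function names are made up for illustration.

```python
import math


def annual_growth(factor: float, years: float) -> float:
    """Constant yearly multiplier that compounds to `factor` over `years`."""
    return factor ** (1.0 / years)


def doubling_time(factor: float, years: float) -> float:
    """Years needed to double at that same constant rate."""
    return years * math.log(2) / math.log(factor)


# Figures from the chat: three orders of magnitude (x1000) in 16 years.
rate = annual_growth(1000, 16)
doubling = doubling_time(1000, 16)

print(f"yearly multiplier: about {rate:.2f}")
print(f"doubling time: about {doubling:.1f} years")
```

At the same pace, another three orders of magnitude (the "10 exabyte party") would take roughly another 16 years.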
20:54 πŸ”— Nemo_bis SketchCow: was the advice enough?
20:55 πŸ”— Nemo_bis In short, just use https://www.mediawiki.org/wiki/Extension:ConfirmEdit#QuestyCaptcha
20:55 πŸ”— Nemo_bis I'm sure you can find all the special witty questions one might ever need
20:57 πŸ”— Coderjoe I would not suggest the uname bit, because that would differ between OSes (obviously)
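[Editor's note] Coderjoe's point is easier to see when the shell pipeline is spelled out. The sketch below is a Python approximation of `date -u +%V`uname`|sha256sum|sed 's/\W//g'`: the ISO week number in UTC, followed by the kernel name, followed by the newline that `date` prints, hashed with SHA-256. Using `platform.system()` as a stand-in for `uname` is an assumption (it matches on Linux and macOS but is not guaranteed everywhere), and `captcha_answer` is a name invented for this example.

```python
import hashlib
import platform
from datetime import datetime, timezone


def captcha_answer(when=None, sysname=None):
    """Approximate the pipeline's output for a given moment and kernel name."""
    if when is None:
        when = datetime.now(timezone.utc)
    if sysname is None:
        # Assumption: platform.system() returns what `uname` would print.
        sysname = platform.system()
    week = when.isocalendar()[1]       # ISO week number, like date's %V
    text = f"{week:02d}{sysname}\n"    # date ends its output with a newline
    # sha256sum prints "<hex>  -"; the sed step strips the spaces and dash,
    # leaving only the hex digest.
    return hashlib.sha256(text.encode("ascii")).hexdigest()


moment = datetime(2013, 1, 15, tzinfo=timezone.utc)
print(captcha_answer(moment, "Linux"))
print(captcha_answer(moment, "Darwin"))
```

The two printed digests differ, so the same question has a different correct answer depending on which operating system the visitor runs the command on, and the answer also changes every week.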
21:00 πŸ”— Nemo_bis Well one does not really *have to* copy Arch Wiki. :D
21:00 πŸ”— Nemo_bis It was just funny
21:01 πŸ”— Nemo_bis Of course Arch Wiki must be different from all others. ;)
21:01 πŸ”— Coderjoe and stupid
21:01 πŸ”— Nemo_bis Why stupid?
21:02 πŸ”— Coderjoe what if I am a macos or bsd (or windows) user that happens to be trying out arch on a second machine and want to use my primary system for doing stuff on the wiki (or something)?
21:03 πŸ”— kanzure did anyone archive the tweets of @tomjdolan
21:04 πŸ”— balrog_ kanzure: pull it from google cache
21:04 πŸ”— SketchCow http://i.imgur.com/o92sl.gif
21:04 πŸ”— balrog_ though that's not the whole thing. I've seen it across news sites though
21:05 πŸ”— SketchCow Will do, Nemo_bis
21:05 πŸ”— kanzure buzzfeed might have it?
21:06 πŸ”— Nemo_bis Coderjoe: sure, but most people going there surely have linux.
21:07 πŸ”— Nemo_bis Coderjoe: trust me, it's not worse than SpongeBob questions in German. *That* was impossible and unfair. At least date has man page.
21:18 πŸ”— S[h]O[r]T what if it gave you the id of a textfiles tweet and then you had to go copy/paste it :P
21:19 πŸ”— SketchCow I really like questycaptcha
21:20 πŸ”— SketchCow http://www.nextlevelofnews.com/2013/01/prosecutors-husband-tomjdolan-aaron-swartz-was-offered-a-6-month-deal-by-buzzfeed.html
21:20 πŸ”— SketchCow http://topsy.com/twitter/tomjdolan <---- grab that immediately.
21:27 πŸ”— SketchCow http://archive.is/ is real
21:28 πŸ”— ersi indeed, and it's great.
21:28 πŸ”— ersi I wonder who's behind it
21:29 πŸ”— alard http://blog.archive.is/post/38139265209/what-will-happen-to-the-data-when-you-shut-the-site
21:30 πŸ”— ersi 10TB?! :O
21:30 πŸ”— SketchCow hahaha
21:30 πŸ”— SketchCow Wow, seems kind of sketchy, huh
21:30 πŸ”— tef he saves screenshots of the sites too.
21:30 πŸ”— tef probably as png.
21:34 πŸ”— ersi probably means he's using a browser, right?
21:36 πŸ”— alard http://archive.is/clqhG
21:38 πŸ”— ersi ah, coolio
21:38 πŸ”— alard It doesn't seem to have any plugins. http://archive.is/FfdwK
21:40 πŸ”— ersi Wonder if it'll do flash/Java applets any way though
21:41 πŸ”— kanzure phantomjs used to do flash :/
21:41 πŸ”— kanzure until they removed the plugin
21:53 πŸ”— bsmith094 anyone archive aaron's blog yet?
22:00 πŸ”— adamc[a] bsmith094, pretty sure I saw a couple copies on archive.org
22:00 πŸ”— SketchCow https://twitter.com/textfiles/status/291303908205268994
22:01 πŸ”— DFJustin https://archive.org/details/www.aaronsw.com-20130112-mirror
22:20 πŸ”— SketchCow OK, here we go.
22:27 πŸ”— SketchCow Archive Team Wiki is BACK TO NEW USER CREATION
22:27 πŸ”— beardicus should i be worried that i'm downloading lots of wikipedia in the yahooblog-grab ?
22:28 πŸ”— beardicus seems like it's grabbing urls such as "index.php?title=Special:WhatLinksHere&target=File:Jackie+Chan+2002.jpg.html" when maybe it should just be grabbing hotlinked images?
22:29 πŸ”— alard I find the Yahooblogs quite annoying. They're slow, messy, and are they even disappearing?
22:30 πŸ”— ersi There seems to be a few asking about that though (yahooblogs grab -> wikipedia)
22:31 πŸ”— adamcaudi I just kill the ones that do that - they never seem to make it back out
22:33 πŸ”— alard If someone wants to add wikipedia to the reject-regex, here it is: https://github.com/ArchiveTeam/yahooblog-grab/blob/master/pipeline.py#L87-L88
22:35 πŸ”— alard At the moment it tries to download every url that ends with an image extension, including these Wikipedia urls.
22:37 πŸ”— beardicus alard, these all end in .html though... does the "$" in the regex not mean "end of line"?
22:37 πŸ”— beardicus in the accept-regex, that is ^^^^^
22:37 πŸ”— alard No, the url ends in .jpg, Wget adds the .html later (the --adjust-extension option).
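[Editor's note] alard's answer can be illustrated with a small sketch. The pattern below is a hypothetical accept regex, not the one in the yahooblog-grab pipeline, and the URL host is only an example. The point is that the `$` anchor is tested against the URL, which ends in `.jpg`, while the `.html` seen on disk is appended to the saved file name afterwards.

```python
import re

# Hypothetical accept pattern: any URL that ends in an image extension.
IMAGE_URL = re.compile(r"\.(?:jpe?g|png|gif)$", re.IGNORECASE)

# Example URL of the kind beardicus reported (host chosen for illustration).
url = ("http://vi.wikipedia.org/w/index.php"
       "?title=Special:WhatLinksHere&target=File:Jackie+Chan+2002.jpg")

# Simplified model of the saved file name: Wget gets an HTML response and
# appends ".html" to the local name; the URL itself is unchanged.
local_name = url.rsplit("/", 1)[-1] + ".html"

url_matches = bool(IMAGE_URL.search(url))
name_matches = bool(IMAGE_URL.search(local_name))

print("URL accepted by the pattern:", url_matches)
print("saved name matches the pattern:", name_matches)
```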
22:38 πŸ”— ersi Ah
22:38 πŸ”— ersi makes perfect sense, or well - it makes sense
22:38 πŸ”— chronomex to someone it makes sense
22:38 πŸ”— chronomex that's what is important
22:38 πŸ”— alard The --adjust-extension option is handy if you have folders that aren't folders, for the sites where you download a page /a and then later find /a/b and discover that /a should have been a folder.
22:39 πŸ”— alard If that makes any sense. :)
22:39 πŸ”— SketchCow Who here: 1. Has been around forever 2. I know you 3. Has time to do a VERY boring Wiki thing.
22:39 πŸ”— chronomex not I
22:40 πŸ”— beardicus aha. thanks alard. i've got some hackerspace time to spend in a few hours... maybe i'll figure out the wizardry needed to blacklist wikipedia images or figure out why it seems to never stop slurping.
22:42 πŸ”— alard beardicus: That would be nice. It's a pity that Wget's --span-hosts option doesn't make a difference between urls from image tags and non-image urls.
22:43 πŸ”— beardicus seems more a shame that an html page would be served up to a .jpg request :)
22:44 πŸ”— alard Perhaps you can convince the mediawiki-people to change their software. :)
22:45 πŸ”— beardicus looks like keeping upload.wikimedia would get us the actual jpgs, whereas vi.wikipedia could get the boot.
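[Editor's note] beardicus's observation suggests a host-based filter: drop the wiki page hosts but keep the host that serves the actual image files. The sketch below models that idea in Python. The pattern and the `should_fetch` helper are hypothetical; the real pipeline hands its regexes to Wget, which matches them against the full URL, so this is only a model of the intended behaviour.

```python
import re
from urllib.parse import urlsplit

# Hypothetical reject pattern: wikipedia.org and any of its subdomains.
REJECT_HOST = re.compile(r"(?:^|\.)wikipedia\.org$", re.IGNORECASE)


def should_fetch(url: str) -> bool:
    """Return False for URLs on a rejected host, True otherwise."""
    host = urlsplit(url).hostname or ""
    return REJECT_HOST.search(host) is None


examples = [
    # Wiki page that merely ends in .jpg: rejected.
    "http://vi.wikipedia.org/w/index.php?title=Special:WhatLinksHere"
    "&target=File:Jackie+Chan+2002.jpg",
    # Image host (path elided on purpose): kept.
    "http://upload.wikimedia.org/",
]

for example in examples:
    print(should_fetch(example), example)
```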
22:48 πŸ”— ersi So, either fix/change a regexp (in yahooblag-grab) or fix/change another regexp (mediawiki is a regexp of regexps)
22:48 πŸ”— ersi :D
23:37 πŸ”— t4rx is there a way to run ATW in non-virtual environment? my distro is too cool to offer stable and reliable virtualization options...
23:42 πŸ”— t4rx oh, found a repo at github, guess i can set up non-virtual environment for it.
23:42 πŸ”— t4rx s/a repo/the repo
23:45 πŸ”— balrog_ t4rx: pip install seesaw
23:45 πŸ”— balrog_ then clone the repo for the project that you want to participate in
23:45 πŸ”— balrog_ then run the get wget lua script which will download and compile wget-lue
23:45 πŸ”— balrog_ lua*
23:46 πŸ”— balrog_ then run seesaw as follows: run-pipeline ./pipeline.py username
23:46 πŸ”— balrog_ run-pipeline --help if you want info on parameters
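[Editor's note] balrog_'s steps can be collected into one ordered plan. The sketch below only builds and prints the commands; it does not run anything. The script name `./get-wget-lua.sh` is an assumption (the chat only says "the get wget lua script"), and the repository URL and nickname are examples taken from elsewhere in this log.

```python
import shlex


def setup_commands(repo_url, nickname, wget_script="./get-wget-lua.sh"):
    """Return (working_dir, argv) pairs for a non-VM setup, in order.

    working_dir is None for commands that can run from anywhere.
    wget_script is an assumed file name; check the project repository.
    """
    project_dir = repo_url.rstrip("/").rsplit("/", 1)[-1]
    if project_dir.endswith(".git"):
        project_dir = project_dir[:-4]
    return [
        (None, ["pip", "install", "seesaw"]),
        (None, ["git", "clone", repo_url]),
        (project_dir, [wget_script]),  # downloads and compiles wget-lua
        (project_dir, ["run-pipeline", "./pipeline.py", nickname]),
    ]


plan = setup_commands("https://github.com/ArchiveTeam/yahooblog-grab", "t4rx")
for cwd, argv in plan:
    prefix = f"(in {cwd}) " if cwd else ""
    print(prefix + shlex.join(argv))
```

As balrog_ notes, `run-pipeline --help` lists the available parameters.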
23:50 πŸ”— Nemo_bis http://aubreymcfato.com/2013/01/15/how-to-exploit-academics/
23:59 πŸ”— kanzure why not just get a mole into elsevier and dump the databases
