[00:12] [16:11:28.004] GET http://aaronsw.archiveteam.org/next-item?r=0.25202059401081345 [HTTP/1.1 500 Internal Server Error 256ms] [00:13] is this related to the DOS attack? [00:13] one per person, iirc [00:13] see -bs [00:13] there's something broken :( [00:13] He's looking at it [00:13] We already blew out file handles. :) [00:13] heh [00:14] This is a very nice experience test for underscor [00:14] He's going to learn a lot tonight [00:15] haha :) [00:18] you must be so proud of your star pupil SketchCow [00:19] Every damned day [00:20] And you, you're like the Voldemort. I expect you to rise against us from an australian law firm in 2023, having bided your time appropritately [00:21] New meaning for the term "Kangaroo Court" [00:22] actually, one difference is that Voldemort knew what he was doing [00:23] comparing your knowledge of computers to mine is like a needle and a haystack [00:23] Not intially [00:23] Did you really just say that [00:23] ...I got it the wrong way around [00:23] you know what I meant >_< [00:25] https://twitter.com/textfiles/status/290975346147340288 [00:32] yo [00:33] WELCOME [00:34] aaaaand now it's down [00:34] hi ivan, X-Scale, nitro2k01 [00:34] jason sent me here [00:34] said something about some infrastructure for rapidly archiving a failing site? [00:35] Jason Fucking Scott; Middle name Fucking, hence the capitalization. [00:35] Fuuuuuuuuuuuuuuuuuuuuuuuuuuuuucking [00:35] Someone point him to the Warrior [00:35] Ywah, isn't the link to it supposed to be in the /topic? [00:36] also this: [00:36] https://groups.google.com/group/science-liberation-front [00:36] i've been working on some mobile app that serves as a proxy for android and iphone that college students run to grab papers [00:38] dunno if you guys would be into that [00:39] wow there's 100 people in this channel [00:39] nitro2k01: don't i know you from somewhere? [00:40] kanzure: there an irc channel for that? [00:40] also I want a proxy that not only grabs papers [00:42] there's ##hplusroadmap on irc.freenode.net i guess [00:42] we do do-it-yourself biohacking/genetic engineering/dna synthesis/nootropics and things. [00:42] and paperbot, our paper-fetching irc bot [00:43] balrog_: well, we could just deploy a botnet [00:44] unfortunately i'm not as hooked into the android malware scene these days, i have no idea what software would be a good choice [00:44] transproxy doesn't look like what i need, and proxydroid is only for redirecting your outgoing requests (not accepting incoming connections) [00:44] plus proxydriod totally fails to run on android-x86 because it's all armeabi junk [00:44] I'm thinking more of browser plugins [00:44] for desktop browsers [00:45] you're going to run a proxy in a browser plugin? [00:45] no, a browser plugin that just saves viewed PDFs and metadata [00:45] zotero does that already [00:45] or something of that sort [00:45] paperbot is based on a headless version of zotero translators [00:45] https://github.com/zotero/translators [00:45] https://github.com/zotero/translation-server [00:46] however, you have to click 'save'- this could be enabled by default instead and it could be switched to HTTP POST to somewhere [00:46] i think there's also a zotero server for collecting pdfs/bibliographies but i've never used it [00:47] (like for managing a few institutional users) [00:49] is that what you had in mind? [00:51] brb [01:08] Liberator is stuck on uploading. [01:08] Damn you DoS! [01:12] http://arstechnica.com/tech-policy/2013/01/aaron-swartz-memorial-jstor-liberator-sets-public-domain-academic-articles-free/ Article is up, SketchCow [01:13] neat [01:13] nice, hope that brings in some attention [01:14] "By running the scriptâwhich is limited to once per browser" what [01:14] that must be a misunderstanding [01:15] that's deliberate [01:15] that's silly [01:15] Each browser can run it only once. [01:15] It's a memorial more than an archiving effort. [01:15] If it were the latter, we would've fired our warriors up. [01:15] do your warriors have access? [01:16] "(they were only dropped this morning)" also a misunderstanding [01:17] No, they were. [01:18] i thought that's because they can't go after his estate [01:19] it would be more relevant to report if that /didn't/ happen [01:21] http://www.huffingtonpost.com/2013/01/14/aaron-swartz-stephen-heymann_n_2473278.html?utm_hp_ref=tw [01:28] chronomex: ping [01:28] pong [01:31] alard: redis seems to not be working for underscor now [01:31] someone has probably already spotted this, but http://aaronsw.archiveteam.org/ just seems to have gone down for me [01:32] we're on it [01:32] http://arstechnica.com/tech-policy/2013/01/aaron-swartz-memorial-jstor-liberator-sets-public-domain-academic-articles-free/ [01:32] cool :) I almost managed to run the bookmarklet before it died :) [01:34] underscor is here now. [01:34] Let's sort this [01:35] pong [01:41] underscor: anything else needed? [01:41] we appear to be handling this in -bs [01:41] for better or for worse [01:41] No, other than alard's thoughts on what happened [01:41] also that [01:41] Thanks [01:45] Hi [01:45] fwiw, the JSTOR liberator seems to be sticking at "Asking for next item..." on my machine still. though I am running iceweasel 18, which is probably not well tested [01:46] ex-parrot: did you run it successfully before? [01:46] GLaDOS: nope, but the site went down at roughly the same instant I tried to run it for the first time, so who knows what state it's in [01:47] Hm [01:47] I also have Ghostery, AdBlock and NoScript installed which have a tendency to break javascript in unusual ways [01:48] disabling them makes no difference [01:49] please hold [01:50] I created a crappy PoC maybe a week ago for a bug I saw on SpringerLink. Their "LookInside" functionality just loads up a png of the page with JS. [01:51] It turns out the png url is /000.png, /001.png. It is possible to incrementally get each page image, download them into local browser and upload back to a server where they are converted to PDF. [01:52] I had created a shitty greasemonkey script for this last week and its available at http://0bin.net/paste/29713b9cbf8d1cd60f3cf07e71757ba429196833#SalSJ4E3+RxzQz15KrnaJ9g6gtUpGXj65YFUIH3rBTw= [01:52] nice [01:53] I had a look through the terms and this doesn't appear to be anything explictly restricting, viewing the "preview" in your browser. Obviously it would be possible to expand a similar script to download original PDF instead if available. [01:53] I am obviously not condoning the use of this or similar script by anyone to violate any laws in there respective countries. [01:54] wink wink nod nod [01:55] DonnchaC: could you also post that information here? https://groups.google.com/group/science-liberation-front [01:55] i wonder about dumping zotero translators into a greasemonkey csript [01:55] i think the api is different. i haven't used greasemonkey in, gosh, 4 years at least [01:57] also, i have some PoC in the works for removing watermarks from pdfs from publishers. not quite ready yet.. but if we can detect malware in pdf, we can certainly detect watermarks. [01:57] so far i've found that sciencedirect/elsevier/nature publishing group don't seem to add watermarks (confirming via md5sum of the documents from multiple different retrievals on different ezproxy endpoints) [01:57] ieee definitely adds visible watermarks.. [02:00] RSC journals add visible watermarks around the margins, not sure if they have other watermarks. [02:04] i keep forgetting who it is that adds that entire first page of watermarking [02:04] is it wiley?? i want to say wiley. :( [02:04] anyway the number one problem i am encountering is that i can't pick a reasonable pdf modification library for python [02:04] maybe there's something in pdf.js that could be used [02:04] https://github.com/mozilla/pdf.js [02:06] watermarking is easy to remove from scan-sourced media [02:07] you just extract the images and use them and that's it [02:07] yes [02:09] in pdf it's even easier because they are extra xml attributes in the file (more or less) [02:09] (please don't murder me; i'm not a pdf spec wizard yet) [02:10] xml elements, i mean. not attributes. [02:11] string him up! PDF wizard mana too low! [02:11] pdf is a messy standard [02:11] a lot of bells and whistles [02:11] I suggest decompressing though if you want to analyze as the first step [02:11] pdf allows all kinds of scary things like embedded flash [02:11] and javascript [02:12] Yeah it should be relativily straightforward to remove the copyright strings from a PDF [02:12] no it's not the copyright strings that matter [02:12] I have done some playing around with the format before. [02:12] (the identifying source strings) [02:12] "Authorized licensed use limited to: University of Texas at Austin. Downloaded on July 22, 2009 at 15:50 from IEEE Xplore. Restrictions apply." [02:12] that shit. [02:12] that shit's gotta go. [02:13] Have you go [02:13] http://scholar.google.com/scholar?q=%22IEEE+Xplore.+Restrictions+apply.%22 [02:36] download it twice, from different sources, null out the bits that are different :v [02:37] well [02:38] oen way is to pipe it into ghostscript and just convert it to another format and then back again [02:38] the problem with downloading from multiple sources is that it would require keeping track of which ezproxy servers have access to which publishers [02:38] i mean, that's not a huge problem. it's just annoying. [02:43] It is indeed. How extensivily are articles watermarked? [02:43] I've stripped that message from IEEE Xplore documents before [02:43] it was plain text inside the PDF [02:44] so I just replaced it with spaces [02:44] Is it just a couple of the big players or are a lot of pubishers doing that? [02:44] manually or with some script? [02:44] hah [02:44] Keeping it simple. [02:44] DonnchaC: it's really random, some publishers do others dont [02:44] yeah, that was my experience [02:44] i really want to write up a quick script to do it though [02:44] there are lots of sneakier ways you can watermark a pdf, but I haven't heard of them in use yet [02:45] once I realized it was a fixed string, I think I just used sed [02:45] chronomex: yeah, i think we should be out looking for them, but for now we shouldn't assume they are being that sneaky [02:45] Most watermarks will probably just be a plaintext tag in the PDF. [02:45] yes [02:45] but yeah, get a couple copies and cmp [02:45] sed works if you know the string in advance [02:45] i think what we need is a simple script that has a list of regexes [02:45] I suppose it will be an arms rase, they will only advanced to sneaker techniques when there is mass sharing and watermark removal [02:45] yes [02:46] the zotero team has proved that we can win the arms race [02:46] scrapers break -> they fix within 24 hours [02:46] well, if you get two copies through two different netblocks and the diffs are simple, then you have a profile for that particular publisher [02:46] also who cares about "hey this document was contributed to the public domain by this cool person on tuesday jan 15 2013" [02:46] also that [02:46] well, sometimes it includes an ip address [02:46] I don't mind [02:47] you would mind if you are downloading in mass [02:47] suddenly your professor gets the blame because you were in his lab for whatever reason [02:47] IA's scans of books include the library's stickers [02:47] You would want something that fails safe? If no matching watermark is found for a site you know watermarks documents. You probably don't want that shared in case there is a new form of watermark, potentionally getting someone in journal [02:48] especially if the document is redistributed [02:48] NOBODY IS GOING TO JAIL FOR PUBLIC DOMAIN WORKS [02:48] the last thing you want to do is get some poor bastard blamed for a pdf or some shit [02:48] NOT UNDER MY WATCH [02:52] Unfortunatly if there is large scale information liberation and redistribution they will target the small guys and whoever they can get [02:52] for distributing public domain materials? [02:52] haha, no not public domain [02:53] It's less about the distrobution than it is about accessing them to distribute [02:57] well, proxies are very easy to deploy. i should go writeup my mobile proxy idea somewhere. [02:59] not sure what sense of "mobile" you're referring to, but I have a stack of $20 TP-Link TL-WR703N OpenWRT-compatible routers with USB ports [02:59] easy to velcro to things, heh [02:59] The 703N is great, had lots of fun with those [03:00] I wish they were easier to solar power [03:00] filer: i mean for students to run on their phone while htey are on campus [03:00] browser extensions are cool but phones are always on [03:00] ah [03:00] just think how much battery life you could potentially be draining! [03:01] I wonder how long one of those routers could run on a cheap battery [03:01] I think they consume something like 100mw [03:01] ex-parrot, I just embedded one in a power strip - hidden and hard-wired for power [03:01] nice [03:01] that's genius, assuming you're installing it inside :) [03:02] and assuming the switching PSU small enough to fit inside a power strip is also well made enough not to catch fire after a while :/ [03:02] i keep forgetting the name of that really cheap board that you rop into a powerstrip [03:02] *drop [03:02] someone else I talked to had such an idea, but at the time there weren't routers that were tiny enough [03:02] filer: check dealextreme, there are versions which have a built in battery already. I did some numbers on trying to solar power them but it didn't look too practical [03:02] it was basically a linux server that was powered by cat5 or something [03:02] am i making this up? [03:03] shivaplug? [03:03] *sheeva [03:04] the SheevaPlug is cool, but more powerful and more expensive [03:04] I have one router that is actually the size of an iphone charger [03:04] unfortunately, I think it must use some RTOS [03:05] ex-parrot, https://twitter.com/adamcaudill/status/227249569765916672 [03:05] oh, don't thost just have tp-links inside? [03:06] very nice adamcaudi, certainly better than the overpriced govt engineered one doing the rounds a few months ago [03:06] That's what inspired it :) [03:06] oh yeah http://www.minipwner.com/index.php/minipwner-build [03:07] Actually talked to the guy that designed the $1300 version - I don't think he realized just how close you could get for $50 [03:07] ha [03:09] you could have built it 5+ years ago I guess, a gumstix would fit and they have had low end units which definitely came in at < $1300 [03:10] having a $20 router with USB definitely helps though [03:10] filer: speaking of, mind if I swing over in 20? [03:10] they are great. I have a few here for various projects. friend is using them as radio modules for robotics control [03:10] might want to unload a TPlink from you [03:10] no problem [03:11] coolz [03:13] Have you seen thegrugq's PORTAL project? It's a 703N that routes everything over TOR [03:14] neat [03:15] cool, I've wanted to have something like that [03:15] glad to know someone's already made it, saves me work :) [03:18] blast from the past: [03:18] https://groups.google.com/forum/?fromgroups=#!topic/diybio/SFuyGIAt74k [03:18] this was from when aaronsw was starting the getarticles group [03:21] why don't you start publishing the aaronsw documents as torrents, and distribute the torrents magnet links via an RSS feed? I think a lot of people would subscribe their torrent clients to the that feed and help store and distribute it [03:22] because nobody seeds [03:22] library genesis did that, and nobody fucking seeds it [03:22] http://libgen.net/ [03:22] ah, well, that sucks [03:23] it's probably the greatest dump of ebooks and academic articles ever [03:47] hmm there's a zotero plugin that is supposed to autosave pdfs when you browse to a page [03:47] (according to zotero's maintainer) [03:47] but he left in a cloud of smoke and now i'm not sure what he is talking about. any ideas? [03:56] greetings everyone [03:58] does a scripted version of jstor liberator exist? [04:00] there's a springerlink version, https://groups.google.com/group/science-liberation-front/t/d6bb86b96de8c6a6 [04:00] if that's what you mean? [04:01] i was hoping to find a bash/perl/python/etc version [04:02] I've got a few linux boxes scattered about that I'd like to toss at it [04:02] they seem to only accept one article per user, it's limited on the server end [04:03] I haven't seen that limit [04:03] oh, maybe it's on the client side neat [04:03] you guys all lied to me [04:04] someone on the internet lied to you? [04:06] hmm multiple people have asked me to change the name of that mailing list, any suggestions? [05:01] auto-save plugin for zotero https://groups.google.com/group/science-liberation-front/t/2b3b468fca63a6b2 [05:12] OK, I've piled all the godane material into collections n' crap [06:02] Whoa. [06:13] greetings. sorry if this has been asked 8971236497861 times already, but what's the status of getting the JSTOR liberator bookmarklet working again? [06:14] and is there anything I can do to help diagnose? [06:15] mjb_b: http://aaronsw.archiveteam.org/ [06:15] It's working, you may only use it once though. [06:18] I'd be neat if there was a system for people to suggest articles for others to liberate, thus further encouraging the one-per-browser? [06:20] Our admin is either asleep or broken [06:21] Why is it only one per browser? JSTOR limitation? [06:22] policy choice [06:24] it didn't work even once for me - on win7, with chrome [06:25] the next-item GET hangs [06:25] I think because something goes wrong with the frameset creation [06:25] the jstor document doesnt get put into the lower frame [06:27] the part of the script that tries to put it into the lower frame is resulting in an immediately canceled GET, according to the network tab in the developer tools [06:28] aaronsw.archiveteam.org homepage keeps showing the same most recently liberated doc...nothing liberated for a while... [06:29] immediately canceled GET may be symptomatic of needing an Access-Control-Allow-Origin HTTP header [06:30] Try running it in --disable-web-security (chrome) ? [06:30] my favorite option [06:30] --enable-surprise-buttsex [06:30] chronomex: do you guys mind me linking to science-liberation-front? [06:31] chronomex: yes that is the correct reading of that option [06:31] what's SLF, kanzure? [06:31] https://groups.google.com/group/science-liberation-front/ this? [06:31] chronomex: Things stopped woking - could you check the box? [06:32] chronomex: https://groups.google.com/group/science-liberation-front/t/2b3b468fca63a6b2 [06:32] and so on [06:32] chronomex: just grouping together some peeps who want to work on crawlers and things [06:32] kanzure: archiveteam welcomes inbound links from all comers [06:33] SketchCow: I don't see anything wrong. [06:33] ok. because when people are wondering about why you guys don't want more than 1 document per person, i feel sort of compelled to point out that there are others who would want more by linking to that. heh. [06:33] and it feels disingenous to be spamming your excellent channel [06:33] there are moving parts into which I have no visibility and no insight [06:34] no luck with --disable-web-security. Still hangs with empty lower frame, upper frame "Looking for another liberated item" [06:34] kanzure: if I'm not mistaken, this in particular is about making a statement rather than hoovering JSTOR [06:35] right, but some people might want to do more [06:35] I understnad [06:35] actually i think you guys should probably elaborate on the page itself [06:35] probably, yes [06:36] although that might shoot yourself in the foot. tough call. [06:36] he who shoots from the hip sometimes forgets to point away from foot first [06:36] archiveteam shoots from hip [06:38] chronomex: Thanks [06:41] Back and running again [06:42] He doesn't know how he fixed it [06:42] He logged in and it just worked [06:42] sometimes you just need to kick something [06:42] that's always the scariest kind of fix [06:43] yayyyy [06:43] * filer has successfully contributed [06:46] yes! it just worked for me [06:46] \o/ [06:46] the homepage is updating like gangbusters too [06:46] indeed [06:49] chronomex: apropos of nothing, I noticed recently that the north wall of the gov pubs collection at Suzzallo has many, many red boxes of microcards of parliamentary transcripts or something [06:49] I wonder if those are online [06:49] hm [06:52] yes, as a ProQuest service... http://parlipapers.chadwyck.co.uk/marketing/index.jsp ... bleaugh [06:54] good thing to know that all of those materials dating back to 1688 are safely protected by a paywall [06:55] hooray [06:56] btw, the "just liberated" list is showing doubles for me [06:56] looks fine to me, refresh? [06:57] hmm [06:57] full refresh (or perhaps it was just the second refresh) seems to have fixed it [07:01] darn it. my article was just an abstract. [07:01] I got one that was paywalled for $34 [07:01] so I tried again [07:02] 23:01 <@jblake> scrape liberty from the heels of your oppressors [07:02] 23:01 <@jblake> march against the paywalls of injustice! [07:02] Coderjoe: it could be sillier ... http://www.jstor.org/stable/3253788 [08:06] "march against the paywalls of injustice!" [08:06] I like this [08:09] filer: join science-liberation-front [08:09] #? [08:10] filer: it's a mailing list. http://groups.google.com/group/science-liberation-front [08:10] although we have a bunch of people in ##hplusroadmap [08:10] .. kind of a happenstance i guess. maybe a different channel should be used. btw that was freenode. [08:12] cool, joined [08:12] i am busy poking at a possible ezproxy exploit [08:16] chumby is perhaps closing their remaining assets http://forum.chumby.com/viewtopic.php?id=8457 [08:16] I'll fire off a warc [08:18] ouch, $4300-$5500/mo [08:18] I wonder how many chumbies that is [08:19] 40k chumbies [08:19] it's 11 cents a month per chumby [08:22] "3) Find someone in the community to host this forum and the wiki. If you can do this, please contact me." anyone want to offer? [08:47] well I'm sucking down the source code site and the forum [08:48] might as well get the wiki too while I'm at it [08:50] relatively small wiki, 197 pages [09:23] underscor, SketchCow, it's not a "secret" that https://archive.org/details/philosophicaltransactions come from JSTOR, is it? [09:23] (It could be at most a "segreto di pulcinella", as we'd say in Italian.) [09:36] a secret puffin? [09:42] It's not a secret. [09:42] But it's besides the point related to the liberator. [09:45] SketchCow: so i hav a warc of stuff, which ia collection? yours or brewsters? [09:45] (sure i've mentioned the content of said warc enough times) [09:45] I've forgotten [09:46] ok, I took a crawl of hn front page + articles with a crawler that supports ajax [09:46] so I stick it in ark-aaronsw or aaronsw [09:46] assuming it's relevant [09:50] Yeah [09:50] Do it for either [09:51] http://archive.org/details/magazine_rack_misc is fun stuff [09:53] "Hey all, I'm coordinating a series of memorial hackathons for Aaron Swartz. Currently there's going to be one at Noisebridge in SF on Jan. 26 (ish) and another somewhere in Boston, but the more the better." [09:53] "The idea is to bring together people at hackerspaces around the world to work on projects that in some way continue the work that Aaron did to facilitate the sharing of human knowledge, social/political justice, and free culture." [09:53] https://groups.google.com/group/science-liberation-front/t/3d17904bef7759b0 [09:55] ok officially I am too incompetent to use the archive uploader http://archive.org/details/NewsYcFrontpagePlusArticlesThreads [09:55] batcave was so much easier for my poor brain [10:46] tef: It needs a different media type, but the files are there. [11:20] https://ia601608.us.archive.org/23/items/NewsYcFrontpagePlusArticlesThreads/ << see :) [13:49] hi, can somebody tell me where are the docs uploaded to http://aaronsw.archiveteam.org/ are available? [13:51] also, i think the counter is restarted every once in a while. [14:18] I think the counter is missing a zero. [14:19] i saw it go from around 696 to 12 when i refreshed the page after a few minutes. [14:26] 12 was probably 1002, but there's a bug in the code that removes the 00 when it adds a thin space between 1 and 002. [14:26] It does Math.floor(n/1000) + " " + (n%1000). [14:54] so where are the downloaded dox? [14:55] I don't know. They're probably not available at the moment, but SketchCow will surely find a way to make them available later. [14:59] hm it undermines the legitimacy of the project a bit... documents should be available shortly after you submitted them. [15:00] i just showed the website to four people and each of them asked about where to find the assembled documents. [15:03] That may be true, but it's also easier said than done. [15:11] Awwwwwwww. [15:11] You know what I love? I mean love? [15:11] When someone comes to a project and complains. [15:11] Let's see. [15:11] The project was launched around 5pm last night. [15:12] So that's... hmmm, 16 hours or so. [15:12] We immediately started getting swamped. [15:12] We dealt with being swamped. [15:12] So I guess.... well.... [15:12] I know. Fuck you. [15:13] How long did we spend trying to keep the server up and dealing with DoS attempts and hacking attacks? Probably 8 of those 16 hours. [15:13] So.... there we go. [15:13] Morning, alard. [15:14] Hello. (Afternoon.) [15:17] Do you want to send underscor a suggestion to fix the counter thing? [15:18] That'll save him when he wakes up in whatever addled state he does this morning [15:18] I've done so, before I responded here. [15:19] SketchCow: good point, sorry about that. :/ [15:19] another thing for underscor: "millions of dollars in fees" -> "millions of dollars in fines" [15:19] can i help in some way? [15:20] Yes, you can shut the hell up. [15:22] (fees/fines, wasn't sure of the best term) [15:22] a fee is something you pay voluntarily, so yeah. it just sounds a bit weird [15:23]