#archiveteam 2013-01-15,Tue

Time	Nickname	Message
00:12 ^🔗	filer	[16:11:28.004] GET http://aaronsw.archiveteam.org/next-item?r=0.25202059401081345 [HTTP/1.1 500 Internal Server Error 256ms]
00:13 ^🔗	filer	is this related to the DOS attack?
00:13 ^🔗	Coderjoe	one per person, iirc
00:13 ^🔗	balrog_	see -bs
00:13 ^🔗	balrog_	there's something broken :(
00:13 ^🔗	SketchCow	He's looking at it
00:13 ^🔗	SketchCow	We already blew out file handles. :)
00:13 ^🔗	filer	heh
00:14 ^🔗	SketchCow	This is a very nice experience test for underscor
00:14 ^🔗	SketchCow	He's going to learn a lot tonight
00:15 ^🔗	chronomex	haha :)
00:18 ^🔗	BlueMaxim	you must be so proud of your star pupil SketchCow
00:19 ^🔗	SketchCow	Every damned day
00:20 ^🔗	SketchCow	And you, you're like the Voldemort. I expect you to rise against us from an Australian law firm in 2023, having bided your time appropriately
00:21 ^🔗	SketchCow	New meaning for the term "Kangaroo Court"
00:22 ^🔗	TomRiddle	actually, one difference is that Voldemort knew what he was doing
00:23 ^🔗	TomRiddle	comparing your knowledge of computers to mine is like a needle and a haystack
00:23 ^🔗	SketchCow	Not initially
00:23 ^🔗	SketchCow	Did you really just say that
00:23 ^🔗	TomRiddle	...I got it the wrong way around
00:23 ^🔗	TomRiddle	you know what I meant >_<
00:25 ^🔗	SketchCow	https://twitter.com/textfiles/status/290975346147340288
00:32 ^🔗	kanzure	yo
00:33 ^🔗	SketchCow	WELCOME
00:34 ^🔗	SketchCow	aaaaand now it's down
00:34 ^🔗	kanzure	hi ivan, X-Scale, nitro2k01
00:34 ^🔗	kanzure	jason sent me here
00:34 ^🔗	kanzure	said something about some infrastructure for rapidly archiving a failing site?
00:35 ^🔗	nitro2k01	Jason Fucking Scott; Middle name Fucking, hence the capitalization.
00:35 ^🔗	SketchCow	Fuuuuuuuuuuuuuuuuuuuuuuuuuuuuucking
00:35 ^🔗	SketchCow	Someone point him to the Warrior
00:35 ^🔗	nitro2k01	Yeah, isn't the link to it supposed to be in the /topic?
00:36 ^🔗	kanzure	also this:
00:36 ^🔗	kanzure	https://groups.google.com/group/science-liberation-front
00:36 ^🔗	kanzure	i've been working on some mobile app that serves as a proxy for android and iphone that college students run to grab papers
00:38 ^🔗	kanzure	dunno if you guys would be into that
00:39 ^🔗	BlueMax	wow there's 100 people in this channel
00:39 ^🔗	kanzure	nitro2k01: don't i know you from somewhere?
00:40 ^🔗	balrog_	kanzure: there an irc channel for that?
00:40 ^🔗	balrog_	also I want a proxy that not only grabs papers
00:42 ^🔗	kanzure	there's ##hplusroadmap on irc.freenode.net i guess
00:42 ^🔗	kanzure	we do do-it-yourself biohacking/genetic engineering/dna synthesis/nootropics and things.
00:42 ^🔗	kanzure	and paperbot, our paper-fetching irc bot
00:43 ^🔗	kanzure	balrog_: well, we could just deploy a botnet
00:44 ^🔗	kanzure	unfortunately i'm not as hooked into the android malware scene these days, i have no idea what software would be a good choice
00:44 ^🔗	kanzure	transproxy doesn't look like what i need, and proxydroid is only for redirecting your outgoing requests (not accepting incoming connections)
00:44 ^🔗	kanzure	plus proxydroid totally fails to run on android-x86 because it's all armeabi junk
00:44 ^🔗	balrog_	I'm thinking more of browser plugins
00:44 ^🔗	balrog_	for desktop browsers
00:45 ^🔗	kanzure	you're going to run a proxy in a browser plugin?
00:45 ^🔗	balrog_	no, a browser plugin that just saves viewed PDFs and metadata
00:45 ^🔗	kanzure	zotero does that already
00:45 ^🔗	balrog_	or something of that sort
00:45 ^🔗	kanzure	paperbot is based on a headless version of zotero translators
00:45 ^🔗	kanzure	https://github.com/zotero/translators
00:45 ^🔗	kanzure	https://github.com/zotero/translation-server
00:46 ^🔗	kanzure	however, you have to click 'save'- this could be enabled by default instead and it could be switched to HTTP POST to somewhere
00:46 ^🔗	kanzure	i think there's also a zotero server for collecting pdfs/bibliographies but i've never used it
00:47 ^🔗	kanzure	(like for managing a few institutional users)
00:49 ^🔗	kanzure	is that what you had in mind?
00:51 ^🔗	balrog_	brb
01:08 ^🔗	GLaDOS	Liberator is stuck on uploading.
01:08 ^🔗	GLaDOS	Damn you DoS!
01:12 ^🔗	GLaDOS	http://arstechnica.com/tech-policy/2013/01/aaron-swartz-memorial-jstor-liberator-sets-public-domain-academic-articles-free/ Article is up, SketchCow
01:13 ^🔗	kanzure	neat
01:13 ^🔗	BlueMax	nice, hope that brings in some attention
01:14 ^🔗	kanzure	"By running the script--which is limited to once per browser" what
01:14 ^🔗	kanzure	that must be a misunderstanding
01:15 ^🔗	balrog_	that's deliberate
01:15 ^🔗	kanzure	that's silly
01:15 ^🔗	GLaDOS	Each browser can run it only once.
01:15 ^🔗	GLaDOS	It's a memorial more than an archiving effort.
01:15 ^🔗	GLaDOS	If it were the latter, we would've fired our warriors up.
01:15 ^🔗	kanzure	do your warriors have access?
01:16 ^🔗	kanzure	"(they were only dropped this morning)" also a misunderstanding
01:17 ^🔗	GLaDOS	No, they were.
01:18 ^🔗	kanzure	i thought that's because they can't go after his estate
01:19 ^🔗	kanzure	it would be more relevant to report if that /didn't/ happen
01:21 ^🔗	balrog_	http://www.huffingtonpost.com/2013/01/14/aaron-swartz-stephen-heymann_n_2473278.html?utm_hp_ref=tw
01:28 ^🔗	SketchCow	chronomex: ping
01:28 ^🔗	chronomex	pong
01:31 ^🔗	SketchCow	alard: redis seems to not be working for underscor now
01:31 ^🔗	ex-parrot	someone has probably already spotted this, but http://aaronsw.archiveteam.org/ just seems to have gone down for me
01:32 ^🔗	chronomex	we're on it
01:32 ^🔗	SketchCow	http://arstechnica.com/tech-policy/2013/01/aaron-swartz-memorial-jstor-liberator-sets-public-domain-academic-articles-free/
01:32 ^🔗	ex-parrot	cool :) I almost managed to run the bookmarklet before it died :)
01:34 ^🔗	SketchCow	underscor is here now.
01:34 ^🔗	SketchCow	Let's sort this
01:35 ^🔗	underscor	pong
01:41 ^🔗	SketchCow	underscor: anything else needed?
01:41 ^🔗	chronomex	we appear to be handling this in -bs
01:41 ^🔗	chronomex	for better or for worse
01:41 ^🔗	underscor	No, other than alard's thoughts on what happened
01:41 ^🔗	underscor	also that
01:41 ^🔗	SketchCow	Thanks
01:45 ^🔗	DonnchaC	Hi
01:45 ^🔗	ex-parrot	fwiw, the JSTOR liberator seems to be sticking at "Asking for next item..." on my machine still. though I am running iceweasel 18, which is probably not well tested
01:46 ^🔗	GLaDOS	ex-parrot: did you run it successfully before?
01:46 ^🔗	ex-parrot	GLaDOS: nope, but the site went down at roughly the same instant I tried to run it for the first time, so who knows what state it's in
01:47 ^🔗	GLaDOS	Hm
01:47 ^🔗	ex-parrot	I also have Ghostery, AdBlock and NoScript installed which have a tendency to break javascript in unusual ways
01:48 ^🔗	ex-parrot	disabling them makes no difference
01:49 ^🔗	chronomex	please hold
01:50 ^🔗	DonnchaC	I created a crappy PoC maybe a week ago for a bug I saw on SpringerLink. Their "LookInside" functionality just loads up a png of the page with JS.
01:51 ^🔗	DonnchaC	It turns out the png url is /000.png, /001.png. It is possible to incrementally get each page image, download them into local browser and upload back to a server where they are converted to PDF.
01:52 ^🔗	DonnchaC	I had created a shitty greasemonkey script for this last week and it's available at http://0bin.net/paste/29713b9cbf8d1cd60f3cf07e71757ba429196833#SalSJ4E3+RxzQz15KrnaJ9g6gtUpGXj65YFUIH3rBTw=
01:52 ^🔗	chronomex	nice
01:53 ^🔗	DonnchaC	I had a look through the terms and there doesn't appear to be anything explicitly restricting viewing the "preview" in your browser. Obviously it would be possible to expand a similar script to download the original PDF instead if available.
01:53 ^🔗	DonnchaC	I am obviously not condoning the use of this or a similar script by anyone to violate any laws in their respective countries.
01:54 ^🔗	chronomex	wink wink nod nod
01:55 ^🔗	kanzure	DonnchaC: could you also post that information here? https://groups.google.com/group/science-liberation-front
01:55 ^🔗	kanzure	i wonder about dumping zotero translators into a greasemonkey script
01:55 ^🔗	kanzure	i think the api is different. i haven't used greasemonkey in, gosh, 4 years at least
01:57 ^🔗	kanzure	also, i have some PoC in the works for removing watermarks from pdfs from publishers. not quite ready yet.. but if we can detect malware in pdf, we can certainly detect watermarks.
01:57 ^🔗	kanzure	so far i've found that sciencedirect/elsevier/nature publishing group don't seem to add watermarks (confirming via md5sum of the documents from multiple different retrievals on different ezproxy endpoints)
01:57 ^🔗	kanzure	ieee definitely adds visible watermarks..
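The md5sum check kanzure describes above could be sketched in Python roughly as follows. This is only an illustration: the function names are made up here, and it assumes the copies retrieved through different endpoints are already in memory as bytes. Identical digests suggest no per-download data was added, but they do not rule out a mark that is the same in every copy.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of one retrieved copy (same value md5sum would print)."""
    return hashlib.md5(data).hexdigest()


def copies_identical(copies: list[bytes]) -> bool:
    """True when every copy hashes to the same digest.

    Hypothetical helper: 'copies' holds the same document fetched
    several times, e.g. through different proxy endpoints.
    """
    return len({md5_hex(c) for c in copies}) <= 1
```

If the digests differ, the copies need a closer byte-level comparison to see what changed between retrievals.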
02:00 ^🔗	DonnchaC	RSC journals add visible watermarks around the margins, not sure if they have other watermarks.
02:04 ^🔗	kanzure	i keep forgetting who it is that adds that entire first page of watermarking
02:04 ^🔗	kanzure	is it wiley?? i want to say wiley. :(
02:04 ^🔗	kanzure	anyway the number one problem i am encountering is that i can't pick a reasonable pdf modification library for python
02:04 ^🔗	kanzure	maybe there's something in pdf.js that could be used
02:04 ^🔗	kanzure	https://github.com/mozilla/pdf.js
02:06 ^🔗	balrog_	watermarking is easy to remove from scan-sourced media
02:07 ^🔗	balrog_	you just extract the images and use them and that's it
02:07 ^🔗	chronomex	yes
02:09 ^🔗	kanzure	in pdf it's even easier because they are extra xml attributes in the file (more or less)
02:09 ^🔗	kanzure	(please don't murder me; i'm not a pdf spec wizard yet)
02:10 ^🔗	kanzure	xml elements, i mean. not attributes.
02:11 ^🔗	instence	string him up! PDF wizard mana too low!
02:11 ^🔗	balrog_	pdf is a messy standard
02:11 ^🔗	balrog_	a lot of bells and whistles
02:11 ^🔗	balrog_	I suggest decompressing though if you want to analyze as the first step
02:11 ^🔗	chronomex	pdf allows all kinds of scary things like embedded flash
02:11 ^🔗	kanzure	and javascript
02:12 ^🔗	DonnchaC	Yeah it should be relatively straightforward to remove the copyright strings from a PDF
02:12 ^🔗	kanzure	no it's not the copyright strings that matter
02:12 ^🔗	DonnchaC	I have done some playing around with the format before.
02:12 ^🔗	DonnchaC	(the identifying source strings)
02:12 ^🔗	kanzure	"Authorized licensed use limited to: University of Texas at Austin. Downloaded on July 22, 2009 at 15:50 from IEEE Xplore. Restrictions apply."
02:12 ^🔗	kanzure	that shit.
02:12 ^🔗	kanzure	that shit's gotta go.
02:13 ^🔗	DonnchaC	Have you go
02:13 ^🔗	kanzure	http://scholar.google.com/scholar?q=%22IEEE+Xplore.+Restrictions+apply.%22
02:36 ^🔗	tef	download it twice, from different sources, null out the bits that are different :v
02:37 ^🔗	kanzure	well
02:38 ^🔗	kanzure	one way is to pipe it into ghostscript and just convert it to another format and then back again
02:38 ^🔗	kanzure	the problem with downloading from multiple sources is that it would require keeping track of which ezproxy servers have access to which publishers
02:38 ^🔗	kanzure	i mean, that's not a huge problem. it's just annoying.
02:43 ^🔗	DonnchaC	It is indeed. How extensively are articles watermarked?
02:43 ^🔗	filer	I've stripped that message from IEEE Xplore documents before
02:43 ^🔗	filer	it was plain text inside the PDF
02:44 ^🔗	filer	so I just replaced it with spaces
02:44 ^🔗	DonnchaC	Is it just a couple of the big players or are a lot of publishers doing that?
02:44 ^🔗	kanzure	manually or with some script?
02:44 ^🔗	chronomex	hah
02:44 ^🔗	DonnchaC	Keeping it simple.
02:44 ^🔗	kanzure	DonnchaC: it's really random, some publishers do others dont
02:44 ^🔗	filer	yeah, that was my experience
02:44 ^🔗	kanzure	i really want to write up a quick script to do it though
02:44 ^🔗	chronomex	there are lots of sneakier ways you can watermark a pdf, but I haven't heard of them in use yet
02:45 ^🔗	filer	once I realized it was a fixed string, I think I just used sed
02:45 ^🔗	kanzure	chronomex: yeah, i think we should be out looking for them, but for now we shouldn't assume they are being that sneaky
02:45 ^🔗	DonnchaC	Most watermarks will probably just be a plaintext tag in the PDF.
02:45 ^🔗	chronomex	yes
02:45 ^🔗	filer	but yeah, get a couple copies and cmp
02:45 ^🔗	kanzure	sed works if you know the string in advance
02:45 ^🔗	kanzure	i think what we need is a simple script that has a list of regexes
02:45 ^🔗	DonnchaC	I suppose it will be an arms race, they will only advance to sneakier techniques when there is mass sharing and watermark removal
02:45 ^🔗	chronomex	yes
02:46 ^🔗	kanzure	the zotero team has proved that we can win the arms race
02:46 ^🔗	kanzure	scrapers break -> they fix within 24 hours
02:46 ^🔗	filer	well, if you get two copies through two different netblocks and the diffs are simple, then you have a profile for that particular publisher
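The "get a couple copies and cmp" idea from filer and tef can be sketched as a positional byte comparison. This is a minimal sketch, not anyone's actual tool: the function name is invented here, and it assumes both copies have the same length, which will not hold if a publisher inserts variable-length text or recompresses streams.

```python
def differing_ranges(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Return half-open (start, end) byte ranges where two copies differ.

    Assumption: both copies are the same length, so a position-by-position
    comparison is meaningful. Unequal lengths raise ValueError.
    """
    if len(a) != len(b):
        raise ValueError("copies differ in length; positional compare does not apply")
    ranges = []
    start = None  # index where the current run of differences began
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            if start is None:
                start = i
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, len(a)))
    return ranges
```

A short list of small ranges would match what filer calls "simple" diffs; many scattered ranges would suggest the two files were generated differently throughout.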
02:46 ^🔗	chronomex	also who cares about "hey this document was contributed to the public domain by this cool person on tuesday jan 15 2013"
02:46 ^🔗	filer	also that
02:46 ^🔗	kanzure	well, sometimes it includes an ip address
02:46 ^🔗	chronomex	I don't mind
02:47 ^🔗	kanzure	you would mind if you are downloading in mass
02:47 ^🔗	kanzure	suddenly your professor gets the blame because you were in his lab for whatever reason
02:47 ^🔗	chronomex	IA's scans of books include the library's stickers
02:47 ^🔗	DonnchaC	You would want something that fails safe? If no matching watermark is found for a site you know watermarks documents, you probably don't want that shared in case there is a new form of watermark, potentially getting someone in jail
02:48 ^🔗	kanzure	especially if the document is redistributed
02:48 ^🔗	chronomex	NOBODY IS GOING TO JAIL FOR PUBLIC DOMAIN WORKS
02:48 ^🔗	kanzure	the last thing you want to do is get some poor bastard blamed for a pdf or some shit
02:48 ^🔗	chronomex	NOT UNDER MY WATCH
02:52 ^🔗	DonnchaC	Unfortunately if there is large scale information liberation and redistribution they will target the small guys and whoever they can get
02:52 ^🔗	filer	for distributing public domain materials?
02:52 ^🔗	kanzure	haha, no not public domain
02:53 ^🔗	adamcaudi	It's less about the distribution than it is about accessing them to distribute
02:57 ^🔗	kanzure	well, proxies are very easy to deploy. i should go writeup my mobile proxy idea somewhere.
02:59 ^🔗	filer	not sure what sense of "mobile" you're referring to, but I have a stack of $20 TP-Link TL-WR703N OpenWRT-compatible routers with USB ports
02:59 ^🔗	filer	easy to velcro to things, heh
02:59 ^🔗	adamcaudi	The 703N is great, had lots of fun with those
03:00 ^🔗	ex-parrot	I wish they were easier to solar power
03:00 ^🔗	kanzure	filer: i mean for students to run on their phone while they are on campus
03:00 ^🔗	kanzure	browser extensions are cool but phones are always on
03:00 ^🔗	filer	ah
03:00 ^🔗	kanzure	just think how much battery life you could potentially be draining!
03:01 ^🔗	filer	I wonder how long one of those routers could run on a cheap battery
03:01 ^🔗	filer	I think they consume something like 100mw
03:01 ^🔗	adamcaudi	ex-parrot, I just embedded one in a power strip - hidden and hard-wired for power
03:01 ^🔗	filer	nice
03:01 ^🔗	ex-parrot	that's genius, assuming you're installing it inside :)
03:02 ^🔗	ex-parrot	and assuming the switching PSU small enough to fit inside a power strip is also well made enough not to catch fire after a while :/
03:02 ^🔗	kanzure	i keep forgetting the name of that really cheap board that you rop into a powerstrip
03:02 ^🔗	kanzure	*drop
03:02 ^🔗	filer	someone else I talked to had such an idea, but at the time there weren't routers that were tiny enough
03:02 ^🔗	ex-parrot	filer: check dealextreme, there are versions which have a built in battery already. I did some numbers on trying to solar power them but it didn't look too practical
03:02 ^🔗	kanzure	it was basically a linux server that was powered by cat5 or something
03:02 ^🔗	kanzure	am i making this up?
03:03 ^🔗	ex-parrot	shivaplug?
03:03 ^🔗	filer	*sheeva
03:04 ^🔗	filer	the SheevaPlug is cool, but more powerful and more expensive
03:04 ^🔗	filer	I have one router that is actually the size of an iphone charger
03:04 ^🔗	filer	unfortunately, I think it must use some RTOS
03:05 ^🔗	adamcaudi	ex-parrot, https://twitter.com/adamcaudill/status/227249569765916672
03:05 ^🔗	filer	oh, don't those just have tp-links inside?
03:06 ^🔗	ex-parrot	very nice adamcaudi, certainly better than the overpriced govt engineered one doing the rounds a few months ago
03:06 ^🔗	adamcaudi	That's what inspired it :)
03:06 ^🔗	filer	oh yeah http://www.minipwner.com/index.php/minipwner-build
03:07 ^🔗	adamcaudi	Actually talked to the guy that designed the $1300 version - I don't think he realized just how close you could get for $50
03:07 ^🔗	chronomex	ha
03:09 ^🔗	ex-parrot	you could have built it 5+ years ago I guess, a gumstix would fit and they have had low end units which definitely came in at < $1300
03:10 ^🔗	filer	having a $20 router with USB definitely helps though
03:10 ^🔗	chronomex	filer: speaking of, mind if I swing over in 20?
03:10 ^🔗	ex-parrot	they are great. I have a few here for various projects. friend is using them as radio modules for robotics control
03:10 ^🔗	chronomex	might want to unload a TPlink from you
03:10 ^🔗	filer	no problem
03:11 ^🔗	chronomex	coolz
03:13 ^🔗	adamcaudi	Have you seen thegrugq's PORTAL project? It's a 703N that routes everything over TOR
03:14 ^🔗	chronomex	neat
03:15 ^🔗	filer	cool, I've wanted to have something like that
03:15 ^🔗	filer	glad to know someone's already made it, saves me work :)
03:18 ^🔗	kanzure	blast from the past:
03:18 ^🔗	kanzure	https://groups.google.com/forum/?fromgroups=#!topic/diybio/SFuyGIAt74k
03:18 ^🔗	kanzure	this was from when aaronsw was starting the getarticles group
03:21 ^🔗	execute	why don't you start publishing the aaronsw documents as torrents, and distribute the torrents' magnet links via an RSS feed? I think a lot of people would subscribe their torrent clients to that feed and help store and distribute it
03:22 ^🔗	kanzure	because nobody seeds
03:22 ^🔗	kanzure	library genesis did that, and nobody fucking seeds it
03:22 ^🔗	kanzure	http://libgen.net/
03:22 ^🔗	execute	ah, well, that sucks
03:23 ^🔗	kanzure	it's probably the greatest dump of ebooks and academic articles ever
03:47 ^🔗	kanzure	hmm there's a zotero plugin that is supposed to autosave pdfs when you browse to a page
03:47 ^🔗	kanzure	(according to zotero's maintainer)
03:47 ^🔗	kanzure	but he left in a cloud of smoke and now i'm not sure what he is talking about. any ideas?
03:56 ^🔗	lithiumg	greetings everyone
03:58 ^🔗	lithiumg	does a scripted version of jstor liberator exist?
04:00 ^🔗	kanzure	there's a springerlink version, https://groups.google.com/group/science-liberation-front/t/d6bb86b96de8c6a6
04:00 ^🔗	kanzure	if that's what you mean?
04:01 ^🔗	lithiumg	i was hoping to find a bash/perl/python/etc version
04:02 ^🔗	lithiumg	I've got a few linux boxes scattered about that I'd like to toss at it
04:02 ^🔗	kanzure	they seem to only accept one article per user, it's limited on the server end
04:03 ^🔗	lithiumg	I haven't seen that limit
04:03 ^🔗	kanzure	oh, maybe it's on the client side neat
04:03 ^🔗	kanzure	you guys all lied to me
04:04 ^🔗	lithiumg	someone on the internet lied to you?
04:06 ^🔗	kanzure	hmm multiple people have asked me to change the name of that mailing list, any suggestions?
05:01 ^🔗	kanzure	auto-save plugin for zotero https://groups.google.com/group/science-liberation-front/t/2b3b468fca63a6b2
05:12 ^🔗	SketchCow	OK, I've piled all the godane material into collections n' crap
06:02 ^🔗	ersi	Whoa.
06:13 ^🔗	mjb_b	greetings. sorry if this has been asked 8971236497861 times already, but what's the status of getting the JSTOR liberator bookmarklet working again?
06:14 ^🔗	mjb_b	and is there anything I can do to help diagnose?
06:15 ^🔗	GLaDOS	mjb_b: http://aaronsw.archiveteam.org/
06:15 ^🔗	GLaDOS	It's working, you may only use it once though.
06:18 ^🔗	slythfox	I'd be neat if there was a system for people to suggest articles for others to liberate, thus further encouraging the one-per-browser?
06:20 ^🔗	SketchCow	Our admin is either asleep or broken
06:21 ^🔗	tsp_	Why is it only one per browser? JSTOR limitation?
06:22 ^🔗	chronomex	policy choice
06:24 ^🔗	mjb_b	it didn't work even once for me - on win7, with chrome
06:25 ^🔗	mjb_b	the next-item GET hangs
06:25 ^🔗	mjb_b	I think because something goes wrong with the frameset creation
06:25 ^🔗	mjb_b	the jstor document doesnt get put into the lower frame
06:27 ^🔗	mjb_b	the part of the script that tries to put it into the lower frame is resulting in an immediately canceled GET, according to the network tab in the developer tools
06:28 ^🔗	mjb_b	aaronsw.archiveteam.org homepage keeps showing the same most recently liberated doc...nothing liberated for a while...
06:29 ^🔗	chronomex	immediately canceled GET may be symptomatic of needing an Access-Control-Allow-Origin HTTP header
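For context on chronomex's guess, a server opts in to cross-origin reads by adding an Access-Control-Allow-Origin header to its responses. The sketch below is a hypothetical WSGI handler, not the Liberator's actual backend: the response body and the wildcard origin are assumptions made for illustration, and the log never confirms that a missing header was the real cause.

```python
def app(environ, start_response):
    """Minimal WSGI app that answers every request with a CORS header.

    Assumptions: the JSON body is a placeholder, and "*" (any origin)
    is used only to keep the example short.
    """
    body = b'{"item": null}'
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
        # Without this header, a browser blocks a page on another origin
        # from reading the response.
        ("Access-Control-Allow-Origin", "*"),
    ]
    start_response("200 OK", headers)
    return [body]
```

The function can be served locally with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` to watch the header show up in the browser's network tab.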
06:30 ^🔗	slythfox	Try running it in --disable-web-security (chrome) ?
06:30 ^🔗	kanzure	my favorite option
06:30 ^🔗	chronomex	--enable-surprise-buttsex
06:30 ^🔗	kanzure	chronomex: do you guys mind me linking to science-liberation-front?
06:31 ^🔗	kanzure	chronomex: yes that is the correct reading of that option
06:31 ^🔗	chronomex	what's SLF, kanzure?
06:31 ^🔗	Cameron_D	https://groups.google.com/group/science-liberation-front/ this?
06:31 ^🔗	SketchCow	chronomex: Things stopped working - could you check the box?
06:32 ^🔗	kanzure	chronomex: https://groups.google.com/group/science-liberation-front/t/2b3b468fca63a6b2
06:32 ^🔗	kanzure	and so on
06:32 ^🔗	kanzure	chronomex: just grouping together some peeps who want to work on crawlers and things
06:32 ^🔗	chronomex	kanzure: archiveteam welcomes inbound links from all comers
06:33 ^🔗	chronomex	SketchCow: I don't see anything wrong.
06:33 ^🔗	kanzure	ok. because when people are wondering about why you guys don't want more than 1 document per person, i feel sort of compelled to point out that there are others who would want more by linking to that. heh.
06:33 ^🔗	kanzure	and it feels disingenuous to be spamming your excellent channel
06:33 ^🔗	chronomex	there are moving parts into which I have no visibility and no insight
06:34 ^🔗	mjb_b	no luck with --disable-web-security. Still hangs with empty lower frame, upper frame "Looking for another liberated item"
06:34 ^🔗	chronomex	kanzure: if I'm not mistaken, this in particular is about making a statement rather than hoovering JSTOR
06:35 ^🔗	kanzure	right, but some people might want to do more
06:35 ^🔗	chronomex	I understand
06:35 ^🔗	kanzure	actually i think you guys should probably elaborate on the page itself
06:35 ^🔗	chronomex	probably, yes
06:36 ^🔗	kanzure	although that might shoot yourself in the foot. tough call.
06:36 ^🔗	chronomex	he who shoots from the hip sometimes forgets to point away from foot first
06:36 ^🔗	chronomex	archiveteam shoots from hip
06:38 ^🔗	SketchCow	chronomex: Thanks
06:41 ^🔗	SketchCow	Back and running again
06:42 ^🔗	SketchCow	He doesn't know how he fixed it
06:42 ^🔗	SketchCow	He logged in and it just worked
06:42 ^🔗	chronomex	sometimes you just need to kick something
06:42 ^🔗	chronomex	that's always the scariest kind of fix
06:43 ^🔗	filer	yayyyy
06:43 ^🔗	*	filer has successfully contributed
06:46 ^🔗	mjb_b	yes! it just worked for me
06:46 ^🔗	chronomex	\o/
06:46 ^🔗	mjb_b	the homepage is updating like gangbusters too
06:46 ^🔗	filer	indeed
06:49 ^🔗	filer	chronomex: apropos of nothing, I noticed recently that the north wall of the gov pubs collection at Suzzallo has many, many red boxes of microcards of parliamentary transcripts or something
06:49 ^🔗	filer	I wonder if those are online
06:49 ^🔗	chronomex	hm
06:52 ^🔗	filer	yes, as a ProQuest service... http://parlipapers.chadwyck.co.uk/marketing/index.jsp ... bleaugh
06:54 ^🔗	filer	good thing to know that all of those materials dating back to 1688 are safely protected by a paywall
06:55 ^🔗	chronomex	hooray
06:56 ^🔗	Coderjoe	btw, the "just liberated" list is showing doubles for me
06:56 ^🔗	chronomex	looks fine to me, refresh?
06:57 ^🔗	Coderjoe	hmm
06:57 ^🔗	Coderjoe	full refresh (or perhaps it was just the second refresh) seems to have fixed it
07:01 ^🔗	Coderjoe	darn it. my article was just an abstract.
07:01 ^🔗	chronomex	I got one that was paywalled for $34
07:01 ^🔗	chronomex	so I tried again
07:02 ^🔗	kanzure	23:01 <@jblake> scrape liberty from the heels of your oppressors
07:02 ^🔗	kanzure	23:01 <@jblake> march against the paywalls of injustice!
07:02 ^🔗	chronomex	Coderjoe: it could be sillier ... http://www.jstor.org/stable/3253788
08:06 ^🔗	filer	"march against the paywalls of injustice!"
08:06 ^🔗	filer	I like this
08:09 ^🔗	kanzure	filer: join science-liberation-front
08:09 ^🔗	filer	#?
08:10 ^🔗	kanzure	filer: it's a mailing list. http://groups.google.com/group/science-liberation-front
08:10 ^🔗	kanzure	although we have a bunch of people in ##hplusroadmap
08:10 ^🔗	kanzure	.. kind of a happenstance i guess. maybe a different channel should be used. btw that was freenode.
08:12 ^🔗	filer	cool, joined
08:12 ^🔗	kanzure	i am busy poking at a possible ezproxy exploit
08:16 ^🔗	chronomex	chumby is perhaps closing their remaining assets http://forum.chumby.com/viewtopic.php?id=8457
08:16 ^🔗	chronomex	I'll fire off a warc
08:18 ^🔗	filer	ouch, $4300-$5500/mo
08:18 ^🔗	filer	I wonder how many chumbies that is
08:19 ^🔗	chronomex	40k chumbies
08:19 ^🔗	chronomex	it's 11 cents a month per chumby
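The per-chumby figure follows from dividing the quoted monthly cost by the device count. A quick check, taking filer's $4300-$5500/mo range and chronomex's 40k device count at face value (the function name is just for illustration):

```python
def cents_per_device(monthly_dollars: float, devices: int) -> float:
    """Monthly cost per device in cents."""
    return monthly_dollars * 100 / devices


low = cents_per_device(4300, 40_000)   # lower end of the quoted range
high = cents_per_device(5500, 40_000)  # upper end of the quoted range
print(f"{low:.2f} to {high:.2f} cents per chumby per month")
```

The lower bound rounds to the 11 cents chronomex quotes; the upper bound comes out closer to 14 cents.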
08:22 ^🔗	Cameron_D	"3) Find someone in the community to host this forum and the wiki. If you can do this, please contact me." anyone want to offer?
08:47 ^🔗	chronomex	well I'm sucking down the source code site and the forum
08:48 ^🔗	chronomex	might as well get the wiki too while I'm at it
08:50 ^🔗	chronomex	relatively small wiki, 197 pages
09:23 ^🔗	Nemo_bis	underscor, SketchCow, it's not a "secret" that https://archive.org/details/philosophicaltransactions come from JSTOR, is it?
09:23 ^🔗	Nemo_bis	(It could be at most a "segreto di pulcinella", as we'd say in Italian.)
09:36 ^🔗	filer	a secret puffin?
09:42 ^🔗	SketchCow	It's not a secret.
09:42 ^🔗	SketchCow	But it's besides the point related to the liberator.
09:45 ^🔗	tef	SketchCow: so i have a warc of stuff, which ia collection? yours or brewsters?
09:45 ^🔗	tef	(sure i've mentioned the content of said warc enough times)
09:45 ^🔗	SketchCow	I've forgotten
09:46 ^🔗	tef	ok, I took a crawl of hn front page + articles with a crawler that supports ajax
09:46 ^🔗	tef	so I stick it in ark-aaronsw or aaronsw
09:46 ^🔗	tef	assuming it's relevant
09:50 ^🔗	SketchCow	Yeah
09:50 ^🔗	SketchCow	Do it for either
09:51 ^🔗	SketchCow	http://archive.org/details/magazine_rack_misc is fun stuff
09:53 ^🔗	kanzure	"Hey all, I'm coordinating a series of memorial hackathons for Aaron Swartz. Currently there's going to be one at Noisebridge in SF on Jan. 26 (ish) and another somewhere in Boston, but the more the better."
09:53 ^🔗	kanzure	"The idea is to bring together people at hackerspaces around the world to work on projects that in some way continue the work that Aaron did to facilitate the sharing of human knowledge, social/political justice, and free culture."
09:53 ^🔗	kanzure	https://groups.google.com/group/science-liberation-front/t/3d17904bef7759b0
09:55 ^🔗	tef	ok officially I am too incompetent to use the archive uploader http://archive.org/details/NewsYcFrontpagePlusArticlesThreads
09:55 ^🔗	tef	batcave was so much easier for my poor brain
10:46 ^🔗	alard	tef: It needs a different media type, but the files are there.
11:20 ^🔗	Smiley	https://ia601608.us.archive.org/23/items/NewsYcFrontpagePlusArticlesThreads/ << see :)
13:49 ^🔗	maxigas	hi, can somebody tell me where the docs uploaded to http://aaronsw.archiveteam.org/ are available?
13:51 ^🔗	maxigas	also, i think the counter is restarted every once in a while.
14:18 ^🔗	alard	I think the counter is missing a zero.
14:19 ^🔗	maxigas	i saw it go from around 696 to 12 when i refreshed the page after a few minutes.
14:26 ^🔗	alard	12 was probably 1002, but there's a bug in the code that removes the 00 when it adds a thin space between 1 and 002.
14:26 ^🔗	alard	It does Math.floor(n/1000) + " " + (n%1000).
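The bug alard describes is a missing zero-pad on the remainder. The real code is JavaScript; the following is a Python translation for illustration only, assuming the separator is a Unicode thin space (U+2009) and that counts below 1000 are shown without a separator.

```python
THIN_SPACE = "\u2009"  # assumed separator, per alard's "thin space"


def buggy_counter(n: int) -> str:
    """Mirrors Math.floor(n/1000) + " " + (n%1000): remainder is not padded."""
    return f"{n // 1000}{THIN_SPACE}{n % 1000}"


def fixed_counter(n: int) -> str:
    """Pads the remainder to three digits so 1002 keeps its zeros."""
    if n < 1000:
        return str(n)
    return f"{n // 1000}{THIN_SPACE}{n % 1000:03d}"
```

With these definitions `buggy_counter(1002)` yields "1", a thin space, then "2", which reads as "12" at a glance, while `fixed_counter(1002)` yields "1", a thin space, then "002". The sketch only inserts one separator, so it would need extending for counts of a million or more.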
14:54 ^🔗	maxigas	so where are the downloaded dox?
14:55 ^🔗	alard	I don't know. They're probably not available at the moment, but SketchCow will surely find a way to make them available later.
14:59 ^🔗	maxigas	hm it undermines the legitimacy of the project a bit... documents should be available shortly after you submitted them.
15:00 ^🔗	maxigas	i just showed the website to four people and each of them asked about where to find the assembled documents.
15:03 ^🔗	alard	That may be true, but it's also easier said than done.
15:11 ^🔗	SketchCow	Awwwwwwww.
15:11 ^🔗	SketchCow	You know what I love? I mean love?
15:11 ^🔗	SketchCow	When someone comes to a project and complains.
15:11 ^🔗	SketchCow	Let's see.
15:11 ^🔗	SketchCow	The project was launched around 5pm last night.
15:12 ^🔗	SketchCow	So that's... hmmm, 16 hours or so.
15:12 ^🔗	SketchCow	We immediately started getting swamped.
15:12 ^🔗	SketchCow	We dealt with being swamped.
15:12 ^🔗	SketchCow	So I guess.... well....
15:12 ^🔗	SketchCow	I know. Fuck you.
15:13 ^🔗	SketchCow	How long did we spend trying to keep the server up and dealing with DoS attempts and hacking attacks? Probably 8 of those 16 hours.
15:13 ^🔗	SketchCow	So.... there we go.
15:13 ^🔗	SketchCow	Morning, alard.
15:14 ^🔗	alard	Hello. (Afternoon.)
15:17 ^🔗	SketchCow	Do you want to send underscor a suggestion to fix the counter thing?
15:18 ^🔗	SketchCow	That'll save him when he wakes up in whatever addled state he does this morning
15:18 ^🔗	alard	I've done so, before I responded here.
15:19 ^🔗	maxigas	SketchCow: good point, sorry about that. :/
15:19 ^🔗	balrog_	another thing for underscor: "millions of dollars in fees" -> "millions of dollars in fines"
15:19 ^🔗	maxigas	can i help in some way?
15:20 ^🔗	SketchCow	Yes, you can shut the hell up.
15:22 ^🔗	SketchCow	(fees/fines, wasn't sure of the best term)
15:22 ^🔗	balrog_	a fee is something you pay voluntarily, so yeah. it just sounds a bit weird
15:23 ^🔗	SketchCow	I will not re-iterate to you the circumstances in which I wrote the verbiage.
15:24 ^🔗	SketchCow	http://archive.org/stream/1975PredictionsInAwakeMagazineByJehovahsWitnesses/1975_Predictions_Awake_Magazine#page/n11/mode/2up
15:24 ^🔗	SketchCow	Finally, someone speaks the truth
15:25 ^🔗	alard	maxigas: Perhaps with other ArchiveTeam projects, later, or think of a project of your own. Are you running a warrior yet?
15:25 ^🔗	mistym	That title design is delightful.
15:26 ^🔗	SketchCow	It is.
15:26 ^🔗	alard	That's a question I've often asked myself, so I'm glad to see it answered.
15:26 ^🔗	SketchCow	As I mentioned last night, I put http://archive.org/details/magazine_rack_misc together, grabbing 100+ orphaned magazines and shoving them in, so it's this pretty crazy bin of magazines, superpamphlets, and screeds.
15:32 ^🔗	SketchCow	And as many of these items were sitting around in the collection for years, they have ridiculous download stats.
15:40 ^🔗	SketchCow	http://archive.org/details/thescreensavers - 87 episodes saved
15:40 ^🔗	SketchCow	Oh, so under "oh no, what the fuck", Myspace is starting to begin to get rid of old profiles.
15:40 ^🔗	SketchCow	Or, let people voluntarily upgrade to the new format.
15:43 ^🔗	Smiley	o_O
15:44 ^🔗	Smiley	my old myspace is so amusing, I linked it on facebook recently :<
16:29 ^🔗	SketchCow	I'm speaking in Germany first week of February.
16:29 ^🔗	SketchCow	I just found out they will have a pneumatic tube system active for the event between locations
16:29 ^🔗	SketchCow	And the orientation letter just let us know to expect capsules to slam into the room during our panels
16:30 ^🔗	mistym	Maybe the greatest distraction?
16:30 ^🔗	SketchCow	That's a good question.
16:30 ^🔗	SketchCow	It may be.
16:32 ^🔗	Smiley	o_O
16:44 ^🔗	kanzure	embedding metadata in pdfs https://groups.google.com/group/science-liberation-front/t/b73592f3606b9420
16:47 ^🔗	Smiley	nice.
16:47 ^🔗	Smiley	but whats wrong with md5 sums? :D
17:00 ^🔗	kanzure	Smiley: nothing, i think md5sum is a good idea
17:00 ^🔗	kanzure	but i am also of the opinion that people should be splicing supplemental material into the pdf
17:00 ^🔗	kanzure	chances are, if you don't include it in the .pdf itself, it's not going to be distributed when the paper is read/downloaded
17:01 ^🔗	Smiley	kanzure: an md5sum IS the pdf.
17:01 ^🔗	Smiley	Hence why it works so well :D
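A minimal sketch of the checksum idea from the exchange above, in Python: hash the bytes of a PDF so that any copy can be checked against a published digest. The function name `md5_of_file` and the chunk size are illustrative assumptions, not something named in the discussion.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large PDFs never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A digest only identifies one exact sequence of bytes. It does not carry supplemental material along with the paper, which is the separate point about splicing that material into the PDF itself.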
18:12 ^🔗	kanzure	On Tue, Jan 15, 2013 at 12:10 PM, Piotr Migdal <pmigdal@gmail.com> wrote:
18:12 ^🔗	kanzure	> (Silly remark: anyway, for the "guerrilla" Zotero, a good name is
18:12 ^🔗	kanzure	> "Zorrotero" ;))
18:14 ^🔗	X-Scale	Speaking of "guerrilla" ... http://www.1000manifestos.com/aaron-swartz-the-guerilla-open-access-manifesto/
18:22 ^🔗	kanzure	X-Scale: yes that was what the reference was to
18:25 ^🔗	kanzure	retroshare/jstor dump https://groups.google.com/group/science-liberation-front/t/9f6c865cfdb43382?hl=en_US
18:36 ^🔗	balrog_	kanzure: are you sure that isn't the collection of Philosophical Transactions of the Royal Society papers released by Greg Maxwell?
18:37 ^🔗	kanzure	balrog_: it seems to include other things, because it's retroshare
18:37 ^🔗	kanzure	balrog_: but yeah this is useless
19:02 ^🔗	SketchCow	Journal of Higher Education statement sent.
19:02 ^🔗	SketchCow	next: Forbes
19:06 ^🔗	kanzure	link?
19:06 ^🔗	X-Scale	balrog_: http://h33t.com/torrent/04934029/r-i-p-aaron-swartz-jstor-archive-35gb
19:06 ^🔗	X-Scale	vs http://thepiratebay.se/torrent/6554331/Papers_from_Philosophical_Transactions_of_the_Royal_Society__fro
19:07 ^🔗	X-Scale	Seems to be the same package
19:07 ^🔗	kanzure	man this distribution infrastructure sucks
19:09 ^🔗	X-Scale	one comment: "Brilliant upload and thoughtful rationale. Many thanks for this. For those who care, a fair amount of the non-PD portions of these journals can be found on rutracker; search for Royal Society."
19:10 ^🔗	balrog_	hah
19:10 ^🔗	*	balrog_ takes a look
19:10 ^🔗	kanzure	or on libgen
19:17 ^🔗	SketchCow	Forbes Guy is Dulllllllllllllllllllllllllllllll
19:26 ^🔗	SketchCow	Dulllllll
19:26 ^🔗	SketchCow	He's gone now
19:26 ^🔗	SketchCow	sooooo dullllllll
19:58 ^🔗	SketchCow	http://chronicle.com/blogs/profhacker/civil-disobedience-the-aaron-swartz-memorial-jstor-liberator/45397
20:06 ^🔗	chronomex	"fair and balanced"
20:17 ^🔗	SketchCow	http://web.archive.org/web/*/http://archive.org is a real thing
20:19 ^🔗	ersi	We need to go deeper
20:19 ^🔗	SketchCow	http://media.tumblr.com/tumblr_li3y1guDbS1qbdtco.gif
20:20 ^🔗	ersi	<3
20:23 ^🔗	ersi	Hmmmm, the new wayback machine doesn't redirect you to the links you click on. Like clicking on "Webmasters" on the eldest archived version, the address bar is still at archive.org/ then
20:23 ^🔗	Ymgve	http://web.archive.org/robots.txt
20:23 ^🔗	Ymgve	you can't go deeper
20:23 ^🔗	ersi	But at least the new wayback is super duper fast
20:23 ^🔗	ersi	Ymgve: Aww :<
20:24 ^🔗	Ymgve	but I wonder what /1 matches
20:24 ^🔗	Ymgve	oh wait, of course, 199x
20:25 ^🔗	ersi	http://web.archive.org/web/19980301000000/http://archive.org/get_archived.html
20:31 ^🔗	SketchCow	OK, so, seriously.
20:31 ^🔗	SketchCow	archiveteam.org closed wiki. That needs to stop.
20:31 ^🔗	SketchCow	Can people please help me find working, installable anti-spam measures?
20:31 ^🔗	SketchCow	And then we'll open it again.
20:31 ^🔗	Nemo_bis	Yes|
20:31 ^🔗	SketchCow	Let's fix this. Today.
20:32 ^🔗	Nemo_bis	I just did a big research.
20:32 ^🔗	Nemo_bis	https://www.mediawiki.org/wiki/Thread:Extension_talk:ConfirmEdit/Wikis_account_registration_tour
20:32 ^🔗	SketchCow	I hope your solution isn't "massive burlap sacks"
20:32 ^🔗	SketchCow	Because that's your solution to everything
20:32 ^🔗	Nemo_bis	My solution is copy Arch Wiki
20:33 ^🔗	Nemo_bis	SketchCow: did you receive the third one?
20:33 ^🔗	SketchCow	not yet
20:33 ^🔗	SketchCow	Dude, mail.
20:33 ^🔗	SketchCow	You sent it in a sack
20:33 ^🔗	Nemo_bis	That's the only authorised kind of sack btw. I had to fetch it with a bike+train ride 30 km away from home.
20:33 ^🔗	SketchCow	It might be on another ship
20:33 ^🔗	Nemo_bis	It comes by plane.
20:33 ^🔗	chronomex	giant canvas sack?
20:33 ^🔗	SketchCow	So basically you live on the set of the Godfather's flashbacks
20:33 ^🔗	Nemo_bis	So they said.
20:33 ^🔗	chronomex	nice.
20:33 ^🔗	Nemo_bis	No, it's plastic.
20:33 ^🔗	chronomex	oh
20:33 ^🔗	chronomex	:(
20:33 ^🔗	S[h]O[r]T	What is the output of "date -u +%V`uname`|sha256sum|sed 's/\W//g'"?
20:34 ^🔗	S[h]O[r]T	lol
20:35 ^🔗	balrog_	979aa183120fc18c292abab0ab967e5bcf132b375f7f8f3283637e6bb10996bb
20:49 ^🔗	DFJustin	The Archive will provide historians, researchers, scholars, and others access to this vast collection of data (reaching ten terabytes), and ensure the longevity of this information.
20:50 ^🔗	DFJustin	so that's three orders of magnitude in 16 years
20:52 ^🔗	*	DFJustin awaits the 10 exabyte party
20:54 ^🔗	Nemo_bis	SketchCow: was the advice enough?
20:55 ^🔗	Nemo_bis	In short, just use https://www.mediawiki.org/wiki/Extension:ConfirmEdit#QuestyCaptcha
20:55 ^🔗	Nemo_bis	I'm sure you can find all the special witty questions one might ever need
20:57 ^🔗	Coderjoe	I would not suggest the uname bit, because that would differ between OSes (obviously)
21:00 ^🔗	Nemo_bis	Well one does not really have to copy Arch Wiki. :D
21:00 ^🔗	Nemo_bis	It was just funny
21:01 ^🔗	Nemo_bis	Of course Arch Wiki must be different from all others. ;)
21:01 ^🔗	Coderjoe	and stupid
21:01 ^🔗	Nemo_bis	Why stupid?
21:02 ^🔗	Coderjoe	what if I am a macos or bsd (or windows) user that happens to be trying out arch on a second machine and want to use my primary system for doing stuff on the wiki (or something)?
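A rough Python sketch of what the quoted captcha command computes, to show why the answer depends on the operating system. It assumes the shell pipeline hashes the two-digit ISO week number, the `uname` string, and a trailing newline; the function name `captcha_answer` and its parameters are hypothetical.

```python
import hashlib
import platform
from datetime import datetime, timezone


def captcha_answer(now=None, system=None):
    """Approximate the pipeline: date -u +%V`uname` | sha256sum | sed 's/\\W//g'."""
    if now is None:
        now = datetime.now(timezone.utc)
    if system is None:
        # platform.system() gives "Linux", "Darwin", "Windows", and so on.
        system = platform.system()
    # %V is the ISO 8601 week number, zero-padded to two digits.
    week = now.isocalendar()[1]
    # `date` ends its output with a newline, which sha256sum hashes too.
    payload = f"{week:02d}{system}\n".encode("ascii")
    # sed strips the trailing "  -" that sha256sum prints, leaving only hex.
    return hashlib.sha256(payload).hexdigest()
```

Calling `captcha_answer(system="Linux")` and `captcha_answer(system="Darwin")` in the same week gives two different digests, which is the objection raised above: a visitor on macOS, BSD, or Windows would compute a different answer from the one the wiki expects.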
21:03 ^🔗	kanzure	did anyone archive the tweets of @tomjdolan
21:04 ^🔗	balrog_	kanzure: pull it from google cache
21:04 ^🔗	SketchCow	http://i.imgur.com/o92sl.gif
21:04 ^🔗	balrog_	though that's not the whole thing. I've seen it across news sites though
21:05 ^🔗	SketchCow	Will do, Nemo_bis
21:05 ^🔗	kanzure	buzzfeed might have it?
21:06 ^🔗	Nemo_bis	Coderjoe: sure, but most people going there surely have linux.
21:07 ^🔗	Nemo_bis	Coderjoe: trust me, it's not worse than SpongeBob questions in German. That was impossible and unfair. At least date has a man page.
21:18 ^🔗	S[h]O[r]T	what if it gave you the id of a textfiles tweet and then you had to go copy/paste it :P
21:19 ^🔗	SketchCow	I really like questycaptcha
21:20 ^🔗	SketchCow	http://www.nextlevelofnews.com/2013/01/prosecutors-husband-tomjdolan-aaron-swartz-was-offered-a-6-month-deal-by-buzzfeed.html
21:20 ^🔗	SketchCow	http://topsy.com/twitter/tomjdolan <---- grab that immediately.
21:27 ^🔗	SketchCow	http://archive.is/ is real
21:28 ^🔗	ersi	indeed, and it's great.
21:28 ^🔗	ersi	I wonder who's behind it
21:29 ^🔗	alard	http://blog.archive.is/post/38139265209/what-will-happen-to-the-data-when-you-shut-the-site
21:30 ^🔗	ersi	10TB?! :O
21:30 ^🔗	SketchCow	hahaha
21:30 ^🔗	SketchCow	Wow, seems kind of sketchy, huh
21:30 ^🔗	tef	he saves screenshots of the sites too.
21:30 ^🔗	tef	probably as png.
21:34 ^🔗	ersi	probably means he's using a browser, right?
21:36 ^🔗	alard	http://archive.is/clqhG
21:38 ^🔗	ersi	ah, coolio
21:38 ^🔗	alard	It doesn't seem to have any plugins. http://archive.is/FfdwK
21:40 ^🔗	ersi	Wonder if it'll do flash/Java applets any way though
21:41 ^🔗	kanzure	phantomjs used to do flash :/
21:41 ^🔗	kanzure	until they removed the plugin
21:53 ^🔗	bsmith094	anyone archive aaron's blog yet?
22:00 ^🔗	adamc[a]	bsmith094, pretty sure I saw a couple copies on archive.org
22:00 ^🔗	SketchCow	https://twitter.com/textfiles/status/291303908205268994
22:01 ^🔗	DFJustin	https://archive.org/details/www.aaronsw.com-20130112-mirror
22:20 ^🔗	SketchCow	OK, here we go.
22:27 ^🔗	SketchCow	Archive Team Wiki is BACK TO NEW USER CREATION
22:27 ^🔗	beardicus	should i be worried that i'm downloading lots of wikipedia in the yahooblog-grab ?
22:28 ^🔗	beardicus	seems like it's grabbing urls such as "index.php?title=Special:WhatLinksHere&target=File:Jackie+Chan+2002.jpg.html" when maybe it should just be grabbing hotlinked images?
22:29 ^🔗	alard	I find the Yahooblogs quite annoying. They're slow, messy, and are they even disappearing?
22:30 ^🔗	ersi	There seems to be a few asking about that though (yahooblogs grab -> wikipedia)
22:31 ^🔗	adamcaudi	I just kill the ones that do that - they never seem to make it back out
22:33 ^🔗	alard	If someone wants to add wikipedia to the reject-regex, here it is: https://github.com/ArchiveTeam/yahooblog-grab/blob/master/pipeline.py#L87-L88
22:35 ^🔗	alard	At the moment it tries to download every url that ends with an image extension, including these Wikipedia urls.
22:37 ^🔗	beardicus	alard, these all end in .html though... does the "$" in the regex not mean "end of line"?
22:37 ^🔗	beardicus	in the accept-regex, that is ^^^^^
22:37 ^🔗	alard	No, the url ends in .jpg, Wget adds the .html later (the --adjust-extension option).
22:38 ^🔗	ersi	Ah
22:38 ^🔗	ersi	makes perfect sense, or well - it makes sense
22:38 ^🔗	chronomex	to someone it makes sense
22:38 ^🔗	chronomex	that's what is important
22:38 ^🔗	alard	The --adjust-extension option is handy if you have folders that aren't folders, for the sites where you download a page /a and then later find /a/b and discover that /a should have been a folder.
22:39 ^🔗	alard	If that makes any sense. :)
22:39 ^🔗	SketchCow	Who here: 1. Has been around forever 2. I know you 3. Has time to do a VERY boring Wiki thing.
22:39 ^🔗	chronomex	not I
22:40 ^🔗	beardicus	aha. thanks alard. i've got some hackerspace time to spend in a few hours... maybe i'll figure out the wizardry needed to blacklist wikipedia images or figure out why it seems to never stop slurping.
22:42 ^🔗	alard	beardicus: That would be nice. It's a pity that Wget's --span-hosts option doesn't make a difference between urls from image tags and non-image urls.
22:43 ^🔗	beardicus	seems more a shame that an html page would be served up to a .jpg request :)
22:44 ^🔗	alard	Perhaps you can convince the mediawiki-people to change their software. :)
22:45 ^🔗	beardicus	looks like keeping upload.wikimedia would get us the actual jpgs, whereas vi.wikipedia could get the boot.
22:48 ^🔗	ersi	So, either fix/change a regexp (in yahooblag-grab) or fix/change another regexp (mediawiki is a regexp of regexps)
22:48 ^🔗	ersi	:D
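The filtering idea discussed above can be sketched in Python as follows. This is not the code from `pipeline.py`: the two patterns and the `should_fetch` helper are hypothetical, and the host check is done on the parsed hostname here, whereas Wget's own regex options match against the whole URL.

```python
import re
from urllib.parse import urlsplit

# Hypothetical patterns for illustration; the real ones live in pipeline.py.
ACCEPT_RE = re.compile(r"\.(jpe?g|png|gif)$", re.IGNORECASE)
REJECT_HOST_RE = re.compile(r"(^|\.)wikipedia\.org$", re.IGNORECASE)


def should_fetch(url):
    """Accept URLs ending in an image extension unless the host is a Wikipedia site."""
    host = urlsplit(url).hostname or ""
    if REJECT_HOST_RE.search(host):
        # e.g. vi.wikipedia.org serves an HTML page even when the URL ends in .jpg
        return False
    # upload.wikimedia.org is not matched above, so real image files still pass.
    return bool(ACCEPT_RE.search(url))
```

With these assumed patterns, a `Special:WhatLinksHere` URL on vi.wikipedia.org whose target ends in `.jpg` is rejected, while a `.jpg` on upload.wikimedia.org is still fetched, matching the split suggested in the conversation.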
23:37 ^🔗	t4rx	is there a way to run ATW in non-virtual environment? my distro is too cool to offer stable and reliable virtualization options...
23:42 ^🔗	t4rx	oh, found a repo at github, guess i can set up non-virtual environment for it.
23:42 ^🔗	t4rx	s/a repo/the repo
23:45 ^🔗	balrog_	t4rx: pip install seesaw
23:45 ^🔗	balrog_	then clone the repo for the project that you want to participate in
23:45 ^🔗	balrog_	then run the get wget lua script which will download and compile wget-lue
23:45 ^🔗	balrog_	lua*
23:46 ^🔗	balrog_	then run seesaw as follows: run-pipeline ./pipeline.py username
23:46 ^🔗	balrog_	run-pipeline --help if you want info on parameters
23:50 ^🔗	Nemo_bis	http://aubreymcfato.com/2013/01/15/how-to-exploit-academics/
23:59 ^🔗	kanzure	why not just get a mole into elsevier and dump the databases
