[00:26] <namespace> dashcloud: Ooh.
[00:26] <namespace> I'd like to see support for those wordstar files or whatever.
[00:27] <namespace> You know, all those 80's word processors.
[01:01] <dashcloud> get in touch- I'm sure they've love to support them
[01:14] <dashcloud> here's a good list of the formats supported, and what they could use help with (samples especially): http://sourceforge.net/p/libmwaw/wiki/Home/
[01:31] <omf_> I have used libreoffice in headless mode to do backoffice document conversion arrith1
[01:36] <DFJustin> wow didn't know openoffice.org had kept going after the fork, that's pretty lame
[01:38] <arrith1> all their assets were transferred to the apache foundation. combining the projects would be good of course, not sure where the progress is on that
[01:39] <DFJustin> sounds like they're just moving further apart
[01:40] <dashcloud> it's unfortunate, but they're unlikely to be reconciled
[01:41] <arrith1> everyone in-the-know has switched to LibreOffice. i guess the apache people are used to maintaining stuff for a long time without a lot of fanfare
[01:47] <Aranje> yep. and libreoffice has taken to ripping out all the shit code, java, and apis no longer in use. 4.0 was smaller than 3.6 or whatever
[01:47] <arrith1> Mar 27 2013    http://arstechnica.com/information-technology/2013/03/libreoffice-adoption-soaring-but-openoffice-still-open-source-king/
[01:47] <Aranje> libreoffice hasn't added a massive number of new features, but it's gotten 10x faster and lighter
[01:50] <Aranje> I can't remember if they finished, but libreoffice had summer of code interns rewriting all the java-using utilities in python.
[01:51] <omf_> openoffice is backed by Apache and IBM. The only reason people still use them is existing installs. When libreoffice shipped its first stable version all the major linux distros switched to LO from OO
[01:51] <omf_> Besides the license issue between the two OO has a stank on them from Oracle and IBM
[01:53] <omf_> LO is paying down the decades of technical debt in the code base. There is an excellent talk from FOSDEM 2013 about what has been done. https://www.youtube.com/watch?v=r5DOOlNN9GU
[01:54] <omf_> TL;DR The code base is copy pasta hell
[01:56] <arrith1> it would be really nice if they could put together an updater like firefox's updater. at least on osx it sends you to a download page to grab a new 150-200MB archive. i'm usually on some linux so i don't have to deal with it but still
[01:57] <DFJustin> I haven't bothered updating since 3.1.1, guess it's about time
[02:01] <arrith1> i do think it's gotten faster over each update, based on my limited usage
[02:03] <omf_> It has, there are numerous articles about speedups for each new version. Speed is a primary concern
[02:09] <arrith1> that's good. the only one issue i'd notice is when you'd try to resize the window the contents would disappear, pretty sure it still had that issue 6 months ago or less
[04:59] <agkjdflha> WHAT FORSOOTH, PRITHEE TELL ME THE SECRET
[05:01] <agkjdflha> Please
[05:05] <omf_> the secret is 'yahoosucks'
[05:06] <agkjdflha> Thanks
[06:56] <namespace> Don't they?
[06:56] <namespace> Let me guess, they killed something else didn't they?
[06:59] <namespace> Nope, nevermind.
[13:37] <gui77> in the warrior, is there any way of stopping download until the data's been uploaded? it's beginning to add up...
[13:54] <GLaDOS> Hitting stop will prevent any new jobs from starting.
[15:07] <ivan`> third and final greader-related grab is ready, https://github.com/ArchiveTeam/greader-stats-grab http://tracker-alt.dyn.ludios.net:9292/greader-stats/
[15:09] <winr4r> ivan`: bandwidth-intensive?
[15:11] <ivan`> winr4r: also very low bandwidth, 240 URLs per item, ~800KB upload
[15:15] <winr4r> ivan`: which do you think is higher-priority, directory-grab or stats-grab?
[15:16] <winr4r> ivan`: you're doing awesome work, by the way
[15:17] <ivan`> thanks
[15:18] <ivan`> winr4r: I don't really know :)
[15:18] <ivan`> probably directory by a slight margin
[15:18] <winr4r> well stats is running now so i'll leave it running
[15:19] <ivan`> yep, that's fine
[15:19] <winr4r> although it looks like you're going to get most of it by yourself at this rate
[15:22] <ivan`> more IPs are helpful if google starts throwing banhammers
[15:22] <ivan`> but hopefully they've all gone home for the weekend
[15:23] <winr4r> i doubt google would even notice
[17:03] <gui77> GLaDOS: thanks. is it normal for the warrior to take so long to upload? practically all the jobs are in the upload phase...
[17:26] <ivan`> anyone have ideas for queries to feed into Reader Feed Search?
[17:27] <ivan`> already done common words for many languages, some English n-grams, numbers, some interest-specific things
[17:34] <winr4r> gui77: not sure what's going on, the tracker appears to be frozen
[17:34] <winr4r> ivan`: find railway-related ones and i'll hug you :)
[17:35] <winr4r> meanwhile, a saturday afternoon side-project: http://archiveteam.org/index.php?title=In_The_Media
[17:35] <winr4r> collecting major media articles on archive team
[17:36] <winr4r> if anyone knows of others (and i know there are more) go ahead and add them
[17:52] <ivan`> winr4r: you can upload all your railway-related queries to http://allyourfeed.ludios.org:8080/
[18:08] <elgarfo> hi there, i stumbled over archiveteam site while i was searching for a website which seems to be long gone. its a prodigy.net site, and the archiveteam wiki says, some hundred pages were archived. but i can not find a download link for this particular archive. is there any possibility to get the prodigy.net-dump?
[18:12] <winr4r> elgarfo: oh hello
[18:12] <winr4r> let me go back through my logs to see who was working on it at the time
[18:12] <elgarfo> thanks winr4r
[18:13] <winr4r> elgarfo: okay, i think underscor was working on it, you may want to ask him
[18:13] * winr4r pokes underscor
[18:14] <winr4r> (brb, store)
[18:17] <godane> looks like i have a bbc click: https://archive.org/details/bbcclick
[18:17] <godane> i was hoping it would go into the computer and tech videos collection
[18:37] <winr4r> godane: hey, jason will be in there somewhere!
[18:51] <underscor> elgarfo: Did you check the wayback machine at archive.org?
[18:51] <underscor> I think those were incorproated there
[18:53] <elgarfo> how could i not think of the wayback machine -.- thank you for pointing it out. the content is actually there :)
[18:54] <winr4r> hooray!
[18:54] <winr4r> elgarfo: a lot of AT's stuff goes into the wayback machine, as well as putting out big archives of All The Shit
[18:57] <winr4r> that page on the wiki should probably be updated though
[18:58] <elgarfo> i've used the wayback machine before, so the first logical step should be to look there, but for some reason, i only tried google this time. after finding archiveteam wiki which stated that robots.txt prevented the internet archive from crawling it, i did not even try.
[18:58] <winr4r> ah, hm
[18:59] <winr4r> i don't know how it works if robots.txt blocks it AND it gets manually ingested into the wayback machine
[18:59] <underscor> the robots.txt check is done every time you load a page in wayback
[18:59] <underscor> and cached for like 5 or 30 mintues
[18:59] <DFJustin> I think it prevents you from viewing it, but maybe in this case they (or a new domain owner) changed the robots.txt later
[18:59] <underscor> minutes*
[19:00] <underscor> elgarfo: If you email me about what you want (abuie@archive.org) I have internal buttons that let me get stuff out even if robots.txt'd
[19:00] <underscor> ((we don't make it public cause it has to go through policy checking and stuff)
[19:06] <elgarfo> underscor, its all fine, i already found what i wanted to find. the one snapshot from 2010 worked like a charm
[19:07] * winr4r likes a happy ending!
[19:42] <underscor> elgarfo: yay!
[20:50] <namespace> Is there a way to get wget to not download certain subdomains?
[20:50] <namespace> Or even better, *only* download certain subdomains?
[20:50] <omf_> yes
[20:50] <namespace> How?
[20:50] <omf_> give me asec
[20:51] <omf_> take a look at this example http://www.archiveteam.org/index.php?title=Ispygames look at the --span-hosts and --domains= combo
[20:51] <omf_> that is how you do selective domains
[20:52] <namespace> Span hosts opens me up to grabbing *more* stuff, which is the opposite of what I want.
[20:54] <namespace> Actually, I have it wrong, sorry. I don't want to grab certain subdomains, I want to grab certain directories.
[20:55] <namespace> Which when phrased like that I guess just means don't use -mirror
[20:56] <namespace> Wait nevermind, found the options in the manual.
[22:12] <dashcloud> someone should make (or expand the existing one maybe) a wget warc example page with all the differents commands people use and what they use them for
[22:12] <arrith1> yeah, specific wget_warc_examples page would be awesome