#archiveteam 2013-06-22,Sat


Time Nickname Message
00:26 🔗 namespace dashcloud: Ooh.
00:26 🔗 namespace I'd like to see support for those wordstar files or whatever.
00:27 🔗 namespace You know, all those 80's word processors.
01:01 🔗 dashcloud get in touch- I'm sure they'd love to support them
01:14 🔗 dashcloud here's a good list of the formats supported, and what they could use help with (samples especially): http://sourceforge.net/p/libmwaw/wiki/Home/
01:31 🔗 omf_ I have used libreoffice in headless mode to do backoffice document conversion arrith1
01:36 🔗 DFJustin wow didn't know openoffice.org had kept going after the fork, that's pretty lame
01:38 🔗 arrith1 all their assets were transferred to the apache foundation. combining the projects would be good of course, not sure where the progress is on that
01:39 🔗 DFJustin sounds like they're just moving further apart
01:40 🔗 dashcloud it's unfortunate, but they're unlikely to be reconciled
01:41 🔗 arrith1 everyone in-the-know has switched to LibreOffice. i guess the apache people are used to maintaining stuff for a long time without a lot of fanfare
01:47 🔗 Aranje yep. and libreoffice has taken to ripping out all the shit code, java, and apis no longer in use. 4.0 was smaller than 3.6 or whatever
01:47 🔗 arrith1 Mar 27 2013 http://arstechnica.com/information-technology/2013/03/libreoffice-adoption-soaring-but-openoffice-still-open-source-king/
01:47 🔗 Aranje libreoffice hasn't added a massive number of new features, but it's gotten 10x faster and lighter
01:50 🔗 Aranje I can't remember if they finished, but libreoffice had summer of code interns rewriting all the java-using utilities in python.
01:51 🔗 omf_ openoffice is backed by Apache and IBM. The only reason people still use them is existing installs. When libreoffice shipped its first stable version all the major linux distros switched to LO from OO
01:51 🔗 omf_ Besides the license issue between the two OO has a stank on them from Oracle and IBM
01:53 🔗 omf_ LO is paying down the decades of technical debt in the code base. There is an excellent talk from FOSDEM 2013 about what has been done. https://www.youtube.com/watch?v=r5DOOlNN9GU
01:54 🔗 omf_ TL;DR The code base is copy pasta hell
01:56 🔗 arrith1 it would be really nice if they could put together an updater like firefox's updater. at least on osx it sends you to a download page to grab a new 150-200MB archive. i'm usually on some linux so i don't have to deal with it but still
01:57 🔗 DFJustin I haven't bothered updating since 3.1.1, guess it's about time
02:01 🔗 arrith1 i do think it's gotten faster over each update, based on my limited usage
02:03 🔗 omf_ It has, there are numerous articles about speedups for each new version. Speed is a primary concern
02:09 🔗 arrith1 that's good. the only one issue i'd notice is when you'd try to resize the window the contents would disappear, pretty sure it still had that issue 6 months ago or less
04:59 🔗 agkjdflha WHAT FORSOOTH, PRITHEE TELL ME THE SECRET
05:01 🔗 agkjdflha Please
05:05 🔗 omf_ the secret is 'yahoosucks'
05:06 🔗 agkjdflha Thanks
06:56 🔗 namespace Don't they?
06:56 🔗 namespace Let me guess, they killed something else didn't they?
06:59 🔗 namespace Nope, nevermind.
13:37 🔗 gui77 in the warrior, is there any way of stopping download until the data's been uploaded? it's beginning to add up...
13:54 🔗 GLaDOS Hitting stop will prevent any new jobs from starting.
15:07 🔗 ivan` third and final greader-related grab is ready, https://github.com/ArchiveTeam/greader-stats-grab http://tracker-alt.dyn.ludios.net:9292/greader-stats/
15:09 🔗 winr4r ivan`: bandwidth-intensive?
15:11 🔗 ivan` winr4r: also very low bandwidth, 240 URLs per item, ~800KB upload
15:15 🔗 winr4r ivan`: which do you think is higher-priority, directory-grab or stats-grab?
15:16 🔗 winr4r ivan`: you're doing awesome work, by the way
15:17 🔗 ivan` thanks
15:18 🔗 ivan` winr4r: I don't really know :)
15:18 🔗 ivan` probably directory by a slight margin
15:18 🔗 winr4r well stats is running now so i'll leave it running
15:19 🔗 ivan` yep, that's fine
15:19 🔗 winr4r although it looks like you're going to get most of it by yourself at this rate
15:22 🔗 ivan` more IPs are helpful if google starts throwing banhammers
15:22 🔗 ivan` but hopefully they've all gone home for the weekend
15:23 🔗 winr4r i doubt google would even notice
17:03 🔗 gui77 GLaDOS: thanks. is it normal for the warrior to take so long to upload? practically all the jobs are in the upload phase...
17:26 🔗 ivan` anyone have ideas for queries to feed into Reader Feed Search?
17:27 🔗 ivan` already done common words for many languages, some English n-grams, numbers, some interest-specific things
17:34 🔗 winr4r gui77: not sure what's going on, the tracker appears to be frozen
17:34 🔗 winr4r ivan`: find railway-related ones and i'll hug you :)
17:35 🔗 winr4r meanwhile, a saturday afternoon side-project: http://archiveteam.org/index.php?title=In_The_Media
17:35 🔗 winr4r collecting major media articles on archive team
17:36 🔗 winr4r if anyone knows of others (and i know there are more) go ahead and add them
17:52 🔗 ivan` winr4r: you can upload all your railway-related queries to http://allyourfeed.ludios.org:8080/
18:08 🔗 elgarfo hi there, i stumbled over archiveteam site while i was searching for a website which seems to be long gone. its a prodigy.net site, and the archiveteam wiki says, some hundred pages were archived. but i can not find a download link for this particular archive. is there any possibility to get the prodigy.net-dump?
18:12 🔗 winr4r elgarfo: oh hello
18:12 🔗 winr4r let me go back through my logs to see who was working on it at the time
18:12 🔗 elgarfo thanks winr4r
18:13 🔗 winr4r elgarfo: okay, i think underscor was working on it, you may want to ask him
18:13 🔗 * winr4r pokes underscor
18:14 🔗 winr4r (brb, store)
18:17 🔗 godane looks like i have a bbc click: https://archive.org/details/bbcclick
18:17 🔗 godane i was hoping it would go into the computer and tech videos collection
18:37 🔗 winr4r godane: hey, jason will be in there somewhere!
18:51 🔗 underscor elgarfo: Did you check the wayback machine at archive.org?
18:51 🔗 underscor I think those were incorporated there
18:53 🔗 elgarfo how could i not think of the wayback machine -.- thank you for pointing it out. the content is actually there :)
18:54 🔗 winr4r hooray!
18:54 🔗 winr4r elgarfo: a lot of AT's stuff goes into the wayback machine, as well as putting out big archives of All The Shit
18:57 🔗 winr4r that page on the wiki should probably be updated though
18:58 🔗 elgarfo i've used the wayback machine before, so the first logical step should be to look there, but for some reason, i only tried google this time. after finding archiveteam wiki which stated that robots.txt prevented the internet archive from crawling it, i did not even try.
18:58 🔗 winr4r ah, hm
18:59 🔗 winr4r i don't know how it works if robots.txt blocks it AND it gets manually ingested into the wayback machine
18:59 🔗 underscor the robots.txt check is done every time you load a page in wayback
18:59 🔗 underscor and cached for like 5 or 30 mintues
18:59 🔗 DFJustin I think it prevents you from viewing it, but maybe in this case they (or a new domain owner) changed the robots.txt later
18:59 🔗 underscor minutes*
19:00 🔗 underscor elgarfo: If you email me about what you want (abuie@archive.org) I have internal buttons that let me get stuff out even if robots.txt'd
19:00 🔗 underscor (we don't make it public cause it has to go through policy checking and stuff)
19:06 🔗 elgarfo underscor, its all fine, i already found what i wanted to find. the one snapshot from 2010 worked like a charm
19:07 🔗 * winr4r likes a happy ending!
19:42 🔗 underscor elgarfo: yay!
20:50 🔗 namespace Is there a way to get wget to not download certain subdomains?
20:50 🔗 namespace Or even better, *only* download certain subdomains?
20:50 🔗 omf_ yes
20:50 🔗 namespace How?
20:50 🔗 omf_ give me a sec
20:51 🔗 omf_ take a look at this example http://www.archiveteam.org/index.php?title=Ispygames look at the --span-hosts and --domains= combo
20:51 🔗 omf_ that is how you do selective domains
20:52 🔗 namespace Span hosts opens me up to grabbing *more* stuff, which is the opposite of what I want.
20:54 🔗 namespace Actually, I have it wrong, sorry. I don't want to grab certain subdomains, I want to grab certain directories.
20:55 🔗 namespace Which when phrased like that I guess just means don't use --mirror
20:56 🔗 namespace Wait nevermind, found the options in the manual.
22:12 🔗 dashcloud someone should make (or expand the existing one maybe) a wget warc example page with all the different commands people use and what they use them for
22:12 🔗 arrith1 yeah, specific wget_warc_examples page would be awesome
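
Editor's sketch of the wget options discussed between 20:50 and 20:56: limiting a recursive grab to chosen hosts with --span-hosts plus --domains, and limiting it to chosen directories. The host names, directory paths and WARC file name below are hypothetical, and the exact flags namespace settled on are not stated in the log, so treat this as an illustration of real wget options rather than the commands anyone in the channel ran. By default the script only prints each command; set RUN=1 to execute them.

```shell
#!/bin/sh
# Hypothetical site, used for illustration only.
SITE="example.com"

# Print the command instead of running it unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# 1. Selective hosts: --span-hosts lets wget leave the start host,
#    and --domains restricts which hosts it may then follow.
run wget --recursive --level=inf --span-hosts \
    --domains="www.$SITE,files.$SITE" \
    "http://www.$SITE/"

# 2. Selective directories: stay on one host and only descend into
#    the listed paths (assumed paths /docs and /blog).
run wget --recursive --level=inf --no-parent \
    --include-directories=/docs,/blog \
    "http://www.$SITE/docs/"

# 3. Same as 2, but also write a WARC (needs wget 1.14 or later).
run wget --recursive --level=inf --no-parent \
    --include-directories=/docs,/blog \
    --warc-file="$SITE-docs" \
    "http://www.$SITE/docs/"
```

Note that --domains matches by host name suffix, so listing a bare domain would admit all of its subdomains; listing full host names, as above, keeps the grab to those hosts. --exclude-domains and --exclude-directories are the corresponding options for skipping hosts or paths.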
