#archiveteam 2013-06-22,Sat

Time	Nickname	Message
00:26 ^🔗	namespace	dashcloud: Ooh.
00:26 ^🔗	namespace	I'd like to see support for those wordstar files or whatever.
00:27 ^🔗	namespace	You know, all those 80's word processors.
01:01 ^🔗	dashcloud	get in touch - I'm sure they'd love to support them
01:14 ^🔗	dashcloud	here's a good list of the formats supported, and what they could use help with (samples especially): http://sourceforge.net/p/libmwaw/wiki/Home/
01:31 ^🔗	omf_	I have used libreoffice in headless mode to do backoffice document conversion arrith1
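[Editor's note: a minimal sketch of the kind of headless conversion omf_ describes, not his actual setup. It assumes the LibreOffice binary is on PATH as `soffice` (on some systems it is `libreoffice`); the helper names, the `converted` output directory, and the example file name are hypothetical.]

```python
import subprocess


def build_convert_command(source, target_format="pdf", outdir="converted",
                          binary="soffice"):
    """Return the argv list for a headless LibreOffice conversion.

    Nothing is executed here. `binary` is an assumption: depending on the
    install it may need to be "libreoffice" or a full path.
    """
    return [
        binary,
        "--headless",                  # run without opening any GUI window
        "--convert-to", target_format,
        "--outdir", str(outdir),
        str(source),
    ]


def convert(source, **kwargs):
    """Run the conversion; raises CalledProcessError if LibreOffice fails."""
    subprocess.run(build_convert_command(source, **kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(" ".join(build_convert_command("report.doc")))
```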
01:36 ^🔗	DFJustin	wow didn't know openoffice.org had kept going after the fork, that's pretty lame
01:38 ^🔗	arrith1	all their assets were transferred to the apache foundation. combining the projects would be good of course, not sure where the progress is on that
01:39 ^🔗	DFJustin	sounds like they're just moving further apart
01:40 ^🔗	dashcloud	it's unfortunate, but they're unlikely to be reconciled
01:41 ^🔗	arrith1	everyone in-the-know has switched to LibreOffice. i guess the apache people are used to maintaining stuff for a long time without a lot of fanfare
01:47 ^🔗	Aranje	yep. and libreoffice has taken to ripping out all the shit code, java, and apis no longer in use. 4.0 was smaller than 3.6 or whatever
01:47 ^🔗	arrith1	Mar 27 2013 http://arstechnica.com/information-technology/2013/03/libreoffice-adoption-soaring-but-openoffice-still-open-source-king/
01:47 ^🔗	Aranje	libreoffice hasn't added a massive number of new features, but it's gotten 10x faster and lighter
01:50 ^🔗	Aranje	I can't remember if they finished, but libreoffice had summer of code interns rewriting all the java-using utilities in python.
01:51 ^🔗	omf_	openoffice is backed by Apache and IBM. The only reason people still use them is existing installs. When libreoffice shipped its first stable version all the major linux distros switched to LO from OO
01:51 ^🔗	omf_	Besides the license issue between the two OO has a stank on them from Oracle and IBM
01:53 ^🔗	omf_	LO is paying down the decades of technical debt in the code base. There is an excellent talk from FOSDEM 2013 about what has been done. https://www.youtube.com/watch?v=r5DOOlNN9GU
01:54 ^🔗	omf_	TL;DR The code base is copy pasta hell
01:56 ^🔗	arrith1	it would be really nice if they could put together an updater like firefox's updater. at least on osx it sends you to a download page to grab a new 150-200MB archive. i'm usually on some linux so i don't have to deal with it but still
01:57 ^🔗	DFJustin	I haven't bothered updating since 3.1.1, guess it's about time
02:01 ^🔗	arrith1	i do think it's gotten faster over each update, based on my limited usage
02:03 ^🔗	omf_	It has, there are numerous articles about speedups for each new version. Speed is a primary concern
02:09 ^🔗	arrith1	that's good. the only one issue i'd notice is when you'd try to resize the window the contents would disappear, pretty sure it still had that issue 6 months ago or less
04:59 ^🔗	agkjdflha	WHAT FORSOOTH, PRITHEE TELL ME THE SECRET
05:01 ^🔗	agkjdflha	Please
05:05 ^🔗	omf_	the secret is 'yahoosucks'
05:06 ^🔗	agkjdflha	Thanks
06:56 ^🔗	namespace	Don't they?
06:56 ^🔗	namespace	Let me guess, they killed something else didn't they?
06:59 ^🔗	namespace	Nope, nevermind.
13:37 ^🔗	gui77	in the warrior, is there any way of stopping download until the data's been uploaded? it's beginning to add up...
13:54 ^🔗	GLaDOS	Hitting stop will prevent any new jobs from starting.
15:07 ^🔗	ivan`	third and final greader-related grab is ready, https://github.com/ArchiveTeam/greader-stats-grab http://tracker-alt.dyn.ludios.net:9292/greader-stats/
15:09 ^🔗	winr4r	ivan`: bandwidth-intensive?
15:11 ^🔗	ivan`	winr4r: also very low bandwidth, 240 URLs per item, ~800KB upload
15:15 ^🔗	winr4r	ivan`: which do you think is higher-priority, directory-grab or stats-grab?
15:16 ^🔗	winr4r	ivan`: you're doing awesome work, by the way
15:17 ^🔗	ivan`	thanks
15:18 ^🔗	ivan`	winr4r: I don't really know :)
15:18 ^🔗	ivan`	probably directory by a slight margin
15:18 ^🔗	winr4r	well stats is running now so i'll leave it running
15:19 ^🔗	ivan`	yep, that's fine
15:19 ^🔗	winr4r	although it looks like you're going to get most of it by yourself at this rate
15:22 ^🔗	ivan`	more IPs are helpful if google starts throwing banhammers
15:22 ^🔗	ivan`	but hopefully they've all gone home for the weekend
15:23 ^🔗	winr4r	i doubt google would even notice
17:03 ^🔗	gui77	GLaDOS: thanks. is it normal for the warrior to take so long to upload? practically all the jobs are in the upload phase...
17:26 ^🔗	ivan`	anyone have ideas for queries to feed into Reader Feed Search?
17:27 ^🔗	ivan`	already done common words for many languages, some English n-grams, numbers, some interest-specific things
17:34 ^🔗	winr4r	gui77: not sure what's going on, the tracker appears to be frozen
17:34 ^🔗	winr4r	ivan`: find railway-related ones and i'll hug you :)
17:35 ^🔗	winr4r	meanwhile, a saturday afternoon side-project: http://archiveteam.org/index.php?title=In_The_Media
17:35 ^🔗	winr4r	collecting major media articles on archive team
17:36 ^🔗	winr4r	if anyone knows of others (and i know there are more) go ahead and add them
17:52 ^🔗	ivan`	winr4r: you can upload all your railway-related queries to http://allyourfeed.ludios.org:8080/
18:08 ^🔗	elgarfo	hi there, i stumbled over archiveteam site while i was searching for a website which seems to be long gone. its a prodigy.net site, and the archiveteam wiki says, some hundred pages were archived. but i can not find a download link for this particular archive. is there any possibility to get the prodigy.net-dump?
18:12 ^🔗	winr4r	elgarfo: oh hello
18:12 ^🔗	winr4r	let me go back through my logs to see who was working on it at the time
18:12 ^🔗	elgarfo	thanks winr4r
18:13 ^🔗	winr4r	elgarfo: okay, i think underscor was working on it, you may want to ask him
18:13 ^🔗	*	winr4r pokes underscor
18:14 ^🔗	winr4r	(brb, store)
18:17 ^🔗	godane	looks like i have a bbc click: https://archive.org/details/bbcclick
18:17 ^🔗	godane	i was hoping it would go into the computer and tech videos collection
18:37 ^🔗	winr4r	godane: hey, jason will be in there somewhere!
18:51 ^🔗	underscor	elgarfo: Did you check the wayback machine at archive.org?
18:51 ^🔗	underscor	I think those were incorporated there
18:53 ^🔗	elgarfo	how could i not think of the wayback machine -.- thank you for pointing it out. the content is actually there :)
18:54 ^🔗	winr4r	hooray!
18:54 ^🔗	winr4r	elgarfo: a lot of AT's stuff goes into the wayback machine, as well as putting out big archives of All The Shit
18:57 ^🔗	winr4r	that page on the wiki should probably be updated though
18:58 ^🔗	elgarfo	i've used the wayback machine before, so the first logical step should be to look there, but for some reason, i only tried google this time. after finding archiveteam wiki which stated that robots.txt prevented the internet archive from crawling it, i did not even try.
18:58 ^🔗	winr4r	ah, hm
18:59 ^🔗	winr4r	i don't know how it works if robots.txt blocks it AND it gets manually ingested into the wayback machine
18:59 ^🔗	underscor	the robots.txt check is done every time you load a page in wayback
18:59 ^🔗	underscor	and cached for like 5 or 30 mintues
18:59 ^🔗	DFJustin	I think it prevents you from viewing it, but maybe in this case they (or a new domain owner) changed the robots.txt later
18:59 ^🔗	underscor	minutes*
19:00 ^🔗	underscor	elgarfo: If you email me about what you want (abuie@archive.org) I have internal buttons that let me get stuff out even if robots.txt'd
19:00 ^🔗	underscor	(we don't make it public cause it has to go through policy checking and stuff)
19:06 ^🔗	elgarfo	underscor, its all fine, i already found what i wanted to find. the one snapshot from 2010 worked like a charm
19:07 ^🔗	*	winr4r likes a happy ending!
19:42 ^🔗	underscor	elgarfo: yay!
20:50 ^🔗	namespace	Is there a way to get wget to not download certain subdomains?
20:50 ^🔗	namespace	Or even better, only download certain subdomains?
20:50 ^🔗	omf_	yes
20:50 ^🔗	namespace	How?
20:50 ^🔗	omf_	give me a sec
20:51 ^🔗	omf_	take a look at this example http://www.archiveteam.org/index.php?title=Ispygames look at the --span-hosts and --domains= combo
20:51 ^🔗	omf_	that is how you do selective domains
20:52 ^🔗	namespace	Span hosts opens me up to grabbing more stuff, which is the opposite of what I want.
20:54 ^🔗	namespace	Actually, I have it wrong, sorry. I don't want to grab certain subdomains, I want to grab certain directories.
20:55 ^🔗	namespace	Which when phrased like that I guess just means don't use --mirror
20:56 ^🔗	namespace	Wait nevermind, found the options in the manual.
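[Editor's note: a minimal sketch of the two approaches discussed above, not the exact commands anyone in the channel ran. It builds a wget argument list using wget's `--span-hosts` / `--domains` combination for selecting hosts and `--include-directories` / `--exclude-directories` for selecting directories. The function name and the example.com URLs are hypothetical, and namespace did not say which options he found in the manual.]

```python
import shlex


def build_wget_command(start_url, domains=(), exclude_domains=(),
                       include_dirs=(), exclude_dirs=()):
    """Return an argv list for a recursive wget crawl (nothing is executed)."""
    cmd = ["wget", "--recursive", "--no-parent"]
    if domains or exclude_domains:
        # --span-hosts lets wget leave the starting host; the domain lists
        # then narrow the crawl back down to the hosts that are wanted.
        cmd.append("--span-hosts")
    if domains:
        cmd.append("--domains=" + ",".join(domains))
    if exclude_domains:
        cmd.append("--exclude-domains=" + ",".join(exclude_domains))
    if include_dirs:
        # Only follow links that fall under these directories.
        cmd.append("--include-directories=" + ",".join(include_dirs))
    if exclude_dirs:
        cmd.append("--exclude-directories=" + ",".join(exclude_dirs))
    cmd.append(start_url)
    return cmd


if __name__ == "__main__":
    # Restrict a crawl to two directories on a single host.
    args = build_wget_command("http://example.com/docs/",
                              include_dirs=["/docs", "/files"])
    print(" ".join(shlex.quote(a) for a in args))
```

[Leaving `domains` empty keeps wget on the starting host, which matches namespace's concern that spanning hosts would pull in more than he wanted.]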
22:12 ^🔗	dashcloud	someone should make (or expand the existing one maybe) a wget warc example page with all the different commands people use and what they use them for
22:12 ^🔗	arrith1	yeah, specific wget_warc_examples page would be awesome
