#archiveteam-bs 2014-03-12,Wed


Time	Nickname	Message
01:06 ^🔗	dashcloud	damn that's awesome
01:35 ^🔗	Leo_TCK	did he give a list of what cds he got? or is he just gonna surprise upload them?
02:29 ^🔗	SketchCow	Surprise!
02:36 ^🔗	BlueMax	well ain't that gonna be fun
02:46 ^🔗	yipdw_	cmake is so maddening
02:46 ^🔗	yipdw_	I'm trying to figure out how to shell out as part of a custom target
02:47 ^🔗	yipdw_	add_custom_target and execute_process are both cmake commands, but composing them together is full of SURPRISE
02:53 ^🔗	yipdw_	oh, apparently the answer is to not use execute_process, because that's meant to be used to execute a program at configure time
03:30 ^🔗	mistym	Anyone know a decent place to get old Windows versions? Looking for Windows 2.0ish
04:00 ^🔗	SketchCow	Trying to collect as I can
05:05 ^🔗	BlueMax	used to know a good place for old windows versions but now I'd be damned if I could remember
05:43 ^🔗	DFJustin	http://wdl2.winworldpc.com/Abandonware%20Operating%20Systems/PC/Microsoft%20Windows/
09:53 ^🔗	midas	stupid robots.txt
11:08 ^🔗	dashcloud	on the subject of old software versions, this site is pretty comprehensive: http://vetusware.com/
11:08 ^🔗	dashcloud	there's a group trying to re-create Opera v12 here: http://otter-browser.org
12:41 ^🔗	balrog	dashcloud: yeah, there's no easy way to archive that site though
12:41 ^🔗	balrog	even with the highest membership you can only do 10 downloads per day
14:27 ^🔗	DFJustin	there was a mirror at http://files.wehack.net/Vetusware/ that seemed pretty comprehensive
14:27 ^🔗	DFJustin	it seems down now but archivebot grabbed it
14:27 ^🔗	DFJustin	https://archive.org/download/archiveteam_archivebot_go_024/files.wehack.net-inf-20140128-223321-bfv4w.warc.gz
16:40 ^🔗	ohhdemgir	how was archiveteam getting 4chan?
16:42 ^🔗	DFJustin	I don't think we are specifically, jason has some older archives that I think he was given
16:43 ^🔗	ohhdemgir	I used to archive sma with https://code.google.com/p/libchan/ and it no longer works, looking for a new automated method
17:00 ^🔗	ersi	Ouch, no development since 2012
17:04 ^🔗	DFJustin	there was this story recently https://library.stanford.edu/blogs/digital-library-blog/2014/01/sdr-deposit-week-4chan-forum-archives
17:05 ^🔗	DFJustin	maybe these yotsuba society folks have a tool
17:21 ^🔗	RedType	DFJustin: i think it's more a combination of their crawler as well as sites such as 4chan archive
17:21 ^🔗	RedType	thought*
17:50 ^🔗	underscor	This is a very Cool Thing
17:50 ^🔗	underscor	http://pywb.herokuapp.com/
17:50 ^🔗	underscor	Check out the twitter scrolling that works!
17:50 ^🔗	underscor	Even facebook too
17:50 ^🔗	underscor	https://groups.google.com/forum/#!msg/openwayback-dev/MAFY4Q0Jo8Y/nsHKReRwfyAJ
17:58 ^🔗	arkiver	wow
17:58 ^🔗	arkiver	ay idea how they are doing that?
17:58 ^🔗	arkiver	any*
17:59 ^🔗	arkiver	hmm
17:59 ^🔗	arkiver	http://pywb.herokuapp.com/pywb/*/https://pbs.twimg.com/profile_images/2179744751/IIPC_Twitter_ProfilePic_bigger.gif
17:59 ^🔗	arkiver	same as wayback
18:00 ^🔗	arkiver	not the interface, but the way the urls are built
18:00 ^🔗	arkiver	with the *
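The wayback-style links above share one path layout: a collection name, then either a capture timestamp of up to 14 digits (replay that capture) or `*` (list every capture), then the original URL. A minimal Python sketch of splitting such a path, assuming exactly that layout; the regex and the `ReplayRequest` / `parse_replay_path` names are hypothetical, and real wayback paths can carry extra modifiers on the timestamp that this sketch does not handle.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for replay paths shaped like the links in this log:
#   /<collection>/<timestamp or *>/<original url>
REPLAY_PATH = re.compile(r"^/(?P<coll>[^/]+)/(?P<ts>\d{1,14}|\*)/(?P<url>.+)$")


class ReplayRequest(NamedTuple):
    collection: str
    timestamp: Optional[str]  # None means the "*" form: list all captures
    original_url: str


def parse_replay_path(path: str) -> ReplayRequest:
    """Split a wayback-style replay path into its three parts."""
    m = REPLAY_PATH.match(path)
    if m is None:
        raise ValueError(f"not a replay path: {path!r}")
    ts = m.group("ts")
    return ReplayRequest(m.group("coll"), None if ts == "*" else ts, m.group("url"))
```

For example, `parse_replay_path("/web/20140312182903/https://www.uber.com/")` gives collection `web`, timestamp `20140312182903` and original URL `https://www.uber.com/`, while the `/pywb/*/...` link above yields a timestamp of `None`.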
18:01 ^🔗	underscor	https://github.com/ikreymer/pywb
18:01 ^🔗	underscor	It's a (nearly) feature-complete rewrite of the wayback machine in python by the engineer who wrote a lot of the old wayback components
18:02 ^🔗	underscor	including the new "save-page-now" feature and the new api that was released on 10/24/13
18:02 ^🔗	arkiver	wow
18:02 ^🔗	underscor	(pywb is fully unaffiliated with IA, though)
18:02 ^🔗	arkiver	gosh, IA could look at the code and implement facebook and twitter scrolling in heritrix...
18:03 ^🔗	arkiver	but it creates warc.gz files
18:03 ^🔗	arkiver	so it should work with wayback
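A WARC file is a sequence of records, each a `WARC/1.0` version line, header lines, a blank line, then `Content-Length` bytes of payload; a `.warc.gz` is usually compressed one gzip member per record, which Python's `gzip` module reads back as one continuous stream. A minimal stdlib-only sketch of walking such a file, assuming well-formed records with CRLF line endings and no folded (multi-line) headers; the function names are hypothetical, and dedicated libraries such as warcio handle the format far more robustly.

```python
import gzip
from typing import BinaryIO, Dict, Iterator, Tuple

Record = Tuple[Dict[str, str], bytes]


def iter_warc_records(stream: BinaryIO) -> Iterator[Record]:
    """Yield (headers, payload) for each record in an uncompressed WARC stream."""
    while True:
        version = stream.readline()
        if not version:
            return  # clean end of file
        if not version.strip():
            # skips the CRLF CRLF that terminates the previous record
            continue
        if not version.startswith(b"WARC/"):
            raise ValueError(f"unexpected line: {version!r}")
        headers: Dict[str, str] = {}
        # read header lines until the blank CRLF line that ends the header block
        for line in iter(stream.readline, b"\r\n"):
            if not line:
                raise ValueError("truncated header block")
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


def iter_warc_gz(path: str) -> Iterator[Record]:
    # gzip.open decompresses all concatenated gzip members as a single stream
    with gzip.open(path, "rb") as f:
        yield from iter_warc_records(f)
```

Usage: `for headers, payload in iter_warc_gz("some.warc.gz"): print(headers.get("WARC-Target-URI"))` lists the captured URLs.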
18:09 ^🔗	underscor	the big issue is that java wayback doesn't support domain specific rules
18:10 ^🔗	underscor	which you really need to play back weird ajaxy content
18:10 ^🔗	underscor	certain headers, certain string manipulations, etc
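To make "domain specific rules" concrete: a replay tool can keep a table keyed by domain and apply that domain's string rewrites to a response before serving it. A minimal Python sketch, assuming rules are simple regex substitutions matched on the host or any parent domain; the rule table, the `example.com` entry and the function names are all hypothetical illustrations, not pywb's actual rules or config format, and header rules are left out.

```python
import re
from typing import Dict, List, Tuple

Rule = Tuple[re.Pattern, str]

# Hypothetical rule table: domain -> (regex, replacement) pairs applied to the body.
RULES: Dict[str, List[Rule]] = {
    "example.com": [
        # illustrative only: disable a script call that would force a live reload
        (re.compile(r"window\.location\.reload\(\)"), "void 0"),
    ],
}


def rules_for_host(host: str) -> List[Rule]:
    """Collect rules whose key equals the host or is a parent domain of it."""
    host = host.lower().rstrip(".")
    matched: List[Rule] = []
    for domain, rules in RULES.items():
        if host == domain or host.endswith("." + domain):
            matched.extend(rules)
    return matched


def rewrite_body(host: str, body: str) -> str:
    """Apply every matching domain rule, in table order, to a response body."""
    for pattern, replacement in rules_for_host(host):
        body = pattern.sub(replacement, body)
    return body
```

Matching on the parent domain means one entry covers `www.` and other subdomains, while unrelated hosts such as `notexample.com` are left untouched.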
18:10 ^🔗	DFJustin	isn't it doing something custom for youtube
18:11 ^🔗	underscor	kinda
18:11 ^🔗	underscor	but it's very hacky
18:11 ^🔗	underscor	and not scalable at all
18:12 ^🔗	underscor	also youtube doesn't require nearly as many "special" things, since it just replaces the yt player with a custom jw one
18:12 ^🔗	underscor	jwplayer*
18:13 ^🔗	DFJustin	it also seems to redirect you if there is a grab of the same video with different url parameters
18:24 ^🔗	arkiver	is there an example in the wayback machine with a youtube page that actually plays a video?
18:24 ^🔗	arkiver	I actually have never found such a video
18:29 ^🔗	DFJustin	hmm the one I saved before doesn't work anymore, it seems kind of inconsistent
18:35 ^🔗	arkiver	yes
18:36 ^🔗	arkiver	but man the wayback machine is showing pages better and better
18:36 ^🔗	arkiver	http://web.archive.org/web/20140312182903/https://www.uber.com/
18:36 ^🔗	arkiver	looks great
18:37 ^🔗	arkiver	DFJustin: they are crawling youtube videos: https://archive.org/details/youtubecrawl
18:37 ^🔗	arkiver	but just blocking them
18:38 ^🔗	DFJustin	I don't think they're blocking them so much as the wayback hack stuff is flaky
18:45 ^🔗	DFJustin	you can see how it works with the archive-it ones https://wayback.archive-it.org/4399/20140301235240/http://www.youtube.com/watch?v=n1Q1p7Oc_5g
18:46 ^🔗	arkiver	Ah I see, thank you
18:47 ^🔗	arkiver	but why don't they just add that to the IA wayback machine too then?
18:51 ^🔗	DFJustin	the code is in place it's just not working, probably because they have no manpower to fix everything all the time
18:52 ^🔗	arkiver	ah, well, it's good to see the information is saved, even though it's not playable
19:02 ^🔗	yipdw_	another really annoying problem with infinite scrollers is that some sites (e.g. patch.com) shove a timestamp into the URL
19:03 ^🔗	yipdw_	so recording the request/response is insufficient
19:03 ^🔗	yipdw_	this is the result of voodoo web programming by people who think that cache-busting needs to be done by fucking with the query string
19:39 ^🔗	DFJustin	yeah seen a bunch of ?nocache=137891278941 stuff in archivebot
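Recording the request/response is not enough because on replay the page builds a fresh timestamp, so an exact-URL lookup never finds the capture. One workaround is to canonicalise URLs before lookup, dropping parameters that only bust caches. A minimal Python sketch, assuming a hypothetical list of cache-buster names plus a heuristic that any 10 to 13 digit value is a Unix timestamp; that heuristic can wrongly drop real IDs of the same length, so treat it as illustrative only.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names treated as pure cache busters.
CACHE_BUSTER_NAMES = {"nocache", "_", "cb", "cachebuster"}
# Heuristic: 10 to 13 digits looks like a Unix timestamp in seconds or milliseconds.
TIMESTAMP_VALUE = re.compile(r"^\d{10,13}$")


def lookup_key(url: str) -> str:
    """Canonicalise a URL for capture lookup by dropping cache-busting params."""
    parts = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if not (k.lower() in CACHE_BUSTER_NAMES or TIMESTAMP_VALUE.match(v))
    ]
    kept.sort()  # parameter order should not affect matching
    # the fragment is dropped because browsers never send it to the server
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), ""))
```

The same function would be applied both when indexing captures and when looking them up, so `?page=2&nocache=137891278941` and `?nocache=139463000000&page=2` resolve to the same key, `http://patch.com/feed?page=2`.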
19:43 ^🔗	ohhdemgir	http://www.reddit.com/r/IAmA/comments/2091d4/i_am_tim_bernerslee_i_invented_the_www_25_years/
19:45 ^🔗	Coderjoe	well, URL-based cache busting is sometimes needed to get around stupid caching proxies that don't respect the cache control headers
19:52 ^🔗	DFJustin	oh myspace brought old photos back
20:02 ^🔗	balrog	DFJustin: did they bring journal content back too?
20:03 ^🔗	DFJustin	the email only mentions photos
20:08 ^🔗	yipdw_	Coderjoe: I maintain that the blame is still on shitty web programmers who expect that a request will always come from a browser
20:09 ^🔗	yipdw_	and designing their applications on faulty assumptions
20:18 ^🔗	Coderjoe	I hate sites where all I get is a blank page until I start blessing javascript
20:59 ^🔗	exmic	I hate a lot of things
20:59 ^🔗	exmic	I practice hate-driven development
21:00 ^🔗	arrith	i hear that's good for your health
23:43 ^🔗	nico	http://redmine.replicant.us/projects/replicant/wiki/SamsungGalaxyBackdoor
23:43 ^🔗	nico	dashcloud: :)
