#archiveteam-bs 2014-03-12,Wed

Time Nickname Message
01:06 🔗 dashcloud damn that's awesome
01:35 🔗 Leo_TCK did he give a list of what cds he got? or is he just gonna surprise upload them?
02:29 🔗 SketchCow Surprise!
02:36 🔗 BlueMax well ain't that gonna be fun
02:46 🔗 yipdw_ cmake is so maddening
02:46 🔗 yipdw_ I'm trying to figure out how to shell out as part of a custom target
02:47 🔗 yipdw_ add_custom_target and execute_process are both cmake commands, but composing them together is full of SURPRISE
02:53 🔗 yipdw_ oh, apparently the answer is to not use execute_process, because that's meant to be used to execute a program at configure time
03:30 🔗 mistym Anyone know a decent place to get old Windows versions? Looking for Windows 2.0ish
04:00 🔗 SketchCow Trying to collect as I can
05:05 🔗 BlueMax used to know a good place for old windows versions but now I'd be damned if I could remember
05:43 🔗 DFJustin http://wdl2.winworldpc.com/Abandonware%20Operating%20Systems/PC/Microsoft%20Windows/
09:53 🔗 midas stupid robots.txt
11:08 🔗 dashcloud on the subject of old software versions, this site is pretty comprehensive: http://vetusware.com/
11:08 🔗 dashcloud there's a group trying to re-create Opera v12 here: http://otter-browser.org
12:41 🔗 balrog dashcloud: yeah, there's no easy way to archive that site though
12:41 🔗 balrog even with the highest membership you only can do 10 downloads per day
14:27 🔗 DFJustin there was a mirror at http://files.wehack.net/Vetusware/ that seemed pretty comprehensive
14:27 🔗 DFJustin it seems down now but archivebot grabbed it
14:27 🔗 DFJustin https://archive.org/download/archiveteam_archivebot_go_024/files.wehack.net-inf-20140128-223321-bfv4w.warc.gz
16:40 🔗 ohhdemgir how was archiveteam getting 4chan?
16:42 🔗 DFJustin I don't think we are specifically, jason has some older archives that I think he was given
16:43 🔗 ohhdemgir I used to archive sma with https://code.google.com/p/libchan/ and it no longer works, looking for a new automated method
17:00 🔗 ersi Ouch, no development since 2012
17:04 🔗 DFJustin there was this story recently https://library.stanford.edu/blogs/digital-library-blog/2014/01/sdr-deposit-week-4chan-forum-archives
17:05 🔗 DFJustin maybe these yotsuba society folks have a tool
17:21 🔗 RedType DFJustin: i think it's more a combination of their crawler as well as sites such as 4chan archive
17:21 🔗 RedType thought*
17:50 🔗 underscor This is a very Cool Thing
17:50 🔗 underscor http://pywb.herokuapp.com/
17:50 🔗 underscor Check out the twitter scrolling that works!
17:50 🔗 underscor Even facebook too
17:50 🔗 underscor https://groups.google.com/forum/#!msg/openwayback-dev/MAFY4Q0Jo8Y/nsHKReRwfyAJ
17:58 🔗 arkiver wow
17:58 🔗 arkiver ay idea how they are doing that?
17:58 🔗 arkiver any*
17:59 🔗 arkiver hmm
17:59 🔗 arkiver http://pywb.herokuapp.com/pywb/*/https://pbs.twimg.com/profile_images/2179744751/IIPC_Twitter_ProfilePic_bigger.gif
17:59 🔗 arkiver same as wayback
18:00 🔗 arkiver not the interface, but the way of the urls
18:00 🔗 arkiver with the *
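The URL shape arkiver is pointing at can be sketched in a few lines. This is only an illustration of the pattern visible in the URLs quoted in this log (a replay prefix, then either a timestamp or `*`, then the original URL); the names `WAYBACK_STYLE`, `split_replay_url`, and `stamp_to_datetime` are made up here and are not part of pywb or the Wayback Machine.

```python
import re
from datetime import datetime

# Assumed layout, inferred only from the URLs quoted in this log:
#   <prefix>/<timestamp or *>/<original url>
WAYBACK_STYLE = re.compile(
    r"^(?P<prefix>.+?)/(?P<stamp>\d{1,14}|\*)/(?P<target>https?://.+)$"
)

def split_replay_url(url):
    """Split a wayback-style URL into (prefix, stamp, target)."""
    match = WAYBACK_STYLE.match(url)
    if match is None:
        raise ValueError("not a wayback-style URL: %r" % url)
    return match.group("prefix"), match.group("stamp"), match.group("target")

def stamp_to_datetime(stamp):
    """Parse a full 14-digit stamp; return None for '*' or shorter stamps."""
    if len(stamp) != 14 or not stamp.isdigit():
        return None
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")

if __name__ == "__main__":
    for example in (
        "http://pywb.herokuapp.com/pywb/*/https://pbs.twimg.com/profile_images/2179744751/IIPC_Twitter_ProfilePic_bigger.gif",
        "http://web.archive.org/web/20140312182903/https://www.uber.com/",
    ):
        prefix, stamp, target = split_replay_url(example)
        print(prefix, stamp, stamp_to_datetime(stamp), target)
```

In both examples the `*` or timestamp sits in the same slot, which is the similarity being described.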
18:01 🔗 underscor https://github.com/ikreymer/pywb
18:01 🔗 underscor It's a (nearly) feature-complete rewrite of the wayback machine in python by the engineer who wrote a lot of the old wayback components
18:02 🔗 underscor including the new "save-page-now" feature and the new api that was released on 10/24/13
18:02 🔗 arkiver wow
18:02 🔗 underscor (pywb is fully unaffiliated with IA, though)
18:02 🔗 arkiver gosh, IA could look at the code and implement facebook and twitter scrolling in heritrix...
18:03 🔗 arkiver but it creates warc.gz files
18:03 🔗 arkiver so it should work with wayback
18:09 🔗 underscor the big issue is that java wayback doesn't support domain specific rules
18:10 🔗 underscor which you really need to playback weird ajaxy content
18:10 🔗 underscor certain headers, certain string manipulations, etc
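A rough sketch of what "domain specific rules" could look like at playback time, assuming a per-domain table of headers to drop and strings to rewrite. Everything here is hypothetical: the `RULES` table, the `example-social.test` domain, the `/replay/` prefix, and the function names are invented for illustration and do not reflect how pywb or java wayback actually implement this.

```python
from urllib.parse import urlsplit

# Hypothetical rule table: domain -> headers to drop and body strings to rewrite.
RULES = {
    "example-social.test": {
        "drop_headers": ["content-security-policy"],
        "replace": [
            ("https://api.example-social.test/",
             "/replay/https://api.example-social.test/"),
        ],
    },
}

NO_RULES = {"drop_headers": [], "replace": []}

def rules_for(url):
    """Return the rule set whose domain matches the URL's host, if any."""
    host = urlsplit(url).hostname or ""
    for domain, rule in RULES.items():
        if host == domain or host.endswith("." + domain):
            return rule
    return NO_RULES

def apply_rules(url, headers, body):
    """Drop listed headers and apply string rewrites for the URL's domain."""
    rule = rules_for(url)
    drop = {name.lower() for name in rule["drop_headers"]}
    kept = {k: v for k, v in headers.items() if k.lower() not in drop}
    for old, new in rule["replace"]:
        body = body.replace(old, new)
    return kept, body
```

Pages from domains with no entry pass through unchanged, so the special handling stays isolated to the sites that need it.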
18:10 🔗 DFJustin isn't it doing something custom for youtube
18:11 🔗 underscor kinda
18:11 🔗 underscor but it's very hacky
18:11 🔗 underscor and not scalable at all
18:12 🔗 underscor also youtube doesn't require nearly as many "special" things, since it just replaces the yt player with a custom jw one
18:12 🔗 underscor jwplayer*
18:13 🔗 DFJustin it also seems to redirect you if there is a grab of the same video with different url parameters
18:24 🔗 arkiver is there an example in the wayback machine with a youtube page that actually plays a video?
18:24 🔗 arkiver I actually have never found such a video
18:29 🔗 DFJustin hmm the one I saved before doesn't work anymore, it seems kind of inconsistent
18:35 🔗 arkiver yes
18:36 🔗 arkiver but man the wayback machine is showing pages better and better
18:36 🔗 arkiver http://web.archive.org/web/20140312182903/https://www.uber.com/
18:36 🔗 arkiver looks great
18:37 🔗 arkiver DFJustin: they are crawling youtube videos: https://archive.org/details/youtubecrawl
18:37 🔗 arkiver but just blocking the
18:37 🔗 arkiver m
18:38 🔗 DFJustin I don't think they're blocking them so much as the wayback hack stuff is flaky
18:45 🔗 DFJustin you can see how it works with the archive-it ones https://wayback.archive-it.org/4399/20140301235240/http://www.youtube.com/watch?v=n1Q1p7Oc_5g
18:46 🔗 arkiver Ah I see, thank you
18:47 🔗 arkiver but why don't they just add that to the IA wayback machine too then?
18:51 🔗 DFJustin the code is in place it's just not working, probably because they have no manpower to fix everything all the time
18:52 🔗 arkiver ah, well, it's good to see the information is saved, even though it's not playable
19:02 🔗 yipdw_ another really annoying problem with infinite scrollers is that some sites (e.g. patch.com) shove a timestamp into the URL
19:03 🔗 yipdw_ so recording the request/response is insufficient
19:03 🔗 yipdw_ this is the result of voodoo web programming by people who think that cache-busting needs to be done with fucking the query string
19:39 🔗 DFJustin yeah seen a bunch of ?nocache=137891278941 stuff in archivebot
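One way to cope with this when matching recorded requests is to strip the cache-busting parameter before comparing URLs. The sketch below assumes the parameter names are known in advance; `CACHE_BUSTER_NAMES` and `canonicalize` are illustrative names, and only `nocache` comes from the example above (the other entry is a guess at a common variant). It is not how archivebot or any wayback implementation is known to do it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed parameter names; only "nocache" appears in the log above.
CACHE_BUSTER_NAMES = {"nocache", "cachebust"}

def canonicalize(url):
    """Return the URL with known cache-busting query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in CACHE_BUSTER_NAMES
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )

if __name__ == "__main__":
    first = canonicalize("http://example.test/feed?page=2&nocache=137891278941")
    second = canonicalize("http://example.test/feed?page=2&nocache=137891279999")
    print(first, first == second)
```

Two requests that differ only in the timestamp then map to the same key, so a recorded response can be found again. Note that `urlencode` may re-encode the remaining values, so the result is meant as a lookup key rather than a URL to fetch.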
19:43 🔗 ohhdemgir http://www.reddit.com/r/IAmA/comments/2091d4/i_am_tim_bernerslee_i_invented_the_www_25_years/
19:45 🔗 Coderjoe well, URL-based cache busting is sometimes needed to get around stupid caching proxies that don't respect the cache control headers
19:52 🔗 DFJustin oh myspace brought old photos back
20:02 🔗 balrog DFJustin: did they bring journal content back too?
20:03 🔗 DFJustin the email only mentions photos
20:08 🔗 yipdw_ Coderjoe: I maintain that the blame is still on shitty web programmers who expect that they will always be able to expect a request from a browser
20:09 🔗 yipdw_ and designing their applications on faulty assumptions
20:18 🔗 Coderjoe I hate sites where all I get is a blank page until I start blessing javascript
20:59 🔗 exmic I hate a lot of things
20:59 🔗 exmic I practice hate-driven development
21:00 🔗 arrith i hear that's good for your health
23:43 🔗 nico http://redmine.replicant.us/projects/replicant/wiki/SamsungGalaxyBackdoor
23:43 🔗 nico dashcloud: :)