#archiveteam-bs 2016-12-02,Fri

Time	Nickname	Message
00:02 ^🔗		Stilett0 has quit IRC (Ping timeout: 246 seconds)
00:07 ^🔗		yipdw has quit IRC (ny.us.hub irc.servercentral.net)
00:07 ^🔗		ravetcofx has quit IRC (ny.us.hub irc.servercentral.net)
00:07 ^🔗		robink has quit IRC (ny.us.hub irc.servercentral.net)
00:07 ^🔗		swebb has quit IRC (ny.us.hub irc.servercentral.net)
00:07 ^🔗		Laverne has quit IRC (ny.us.hub irc.servercentral.net)
00:07 ^🔗		Flierp has quit IRC (ny.us.hub irc.servercentral.net)
00:07 ^🔗		ZizzyDizz has quit IRC (ny.us.hub irc.servercentral.net)
00:14 ^🔗		Start has joined #archiveteam-bs
00:14 ^🔗		GE has quit IRC (Quit: zzz)
00:21 ^🔗		ravetcofx has joined #archiveteam-bs
00:21 ^🔗		Laverne has joined #archiveteam-bs
00:21 ^🔗		chazchaz has joined #archiveteam-bs
00:21 ^🔗		atlogbot has joined #archiveteam-bs
00:21 ^🔗		slyphic_ has joined #archiveteam-bs
00:21 ^🔗		Cameron_D has joined #archiveteam-bs
00:21 ^🔗		MrRadar has joined #archiveteam-bs
00:21 ^🔗		Flierp has joined #archiveteam-bs
00:21 ^🔗		ZizzyDizz has joined #archiveteam-bs
00:22 ^🔗		swebb_ is now known as swebb
00:25 ^🔗		brayden has joined #archiveteam-bs
00:53 ^🔗		tfgbd_znc has joined #archiveteam-bs
01:15 ^🔗		Somebody has joined #archiveteam-bs
01:17 ^🔗		Start has quit IRC (Quit: Disconnected.)
01:20 ^🔗		Start has joined #archiveteam-bs
01:23 ^🔗		Stiletto has joined #archiveteam-bs
01:37 ^🔗		VADemon has quit IRC (Quit: left4dead)
02:21 ^🔗		tfgbd_znc has quit IRC (Read error: Connection reset by peer)
02:23 ^🔗	Somebody	This delights my weird archivist heart: https://archive.org/details/MusicLocker_201608
02:26 ^🔗		tfgbd_znc has joined #archiveteam-bs
03:05 ^🔗		Somebody has quit IRC (Ping timeout: 370 seconds)
03:18 ^🔗		jrwr has quit IRC (Leaving)
03:21 ^🔗		Stiletto has quit IRC (Ping timeout: 246 seconds)
03:28 ^🔗		Ravenloft has quit IRC (Ping timeout: 244 seconds)
03:43 ^🔗		Somebody has joined #archiveteam-bs
03:50 ^🔗		Stiletto has joined #archiveteam-bs
03:56 ^🔗		dashcloud has quit IRC (Read error: Operation timed out)
03:57 ^🔗		dashcloud has joined #archiveteam-bs
04:26 ^🔗		vitzli has joined #archiveteam-bs
04:29 ^🔗		BlueMaxim has joined #archiveteam-bs
04:55 ^🔗		Yoshimura has quit IRC (Ping timeout: 255 seconds)
05:20 ^🔗	ranma	https://twitter.com/mikko/status/804232169728053252
05:20 ^🔗	ranma	<Chii> Mikko Hypponen on Twitter: "I'm a bit worried about what's going to happen to Pebble now that Fitbit seems to be acquiring them. https://t.co/vPjz2WRk1F" ~ twitter.com
05:32 ^🔗	yipdw_	ranma: https://badcheese.com/~steve/atlogs/?chan=archiveteam&day=2016-12-01
05:35 ^🔗	ranma	ah, i missed the !a request
05:45 ^🔗		Sk1d has quit IRC (Ping timeout: 250 seconds)
05:48 ^🔗		Aranje has joined #archiveteam-bs
05:52 ^🔗		Sk1d has joined #archiveteam-bs
05:54 ^🔗		Aranje has quit IRC (Read error: Connection timed out)
05:54 ^🔗		Aranje has joined #archiveteam-bs
06:02 ^🔗		ravetcofx has quit IRC (Read error: Operation timed out)
06:10 ^🔗		ndiddy has quit IRC (Read error: Connection reset by peer)
06:14 ^🔗		Somebody has quit IRC (Ping timeout: 370 seconds)
06:18 ^🔗		ravetcofx has joined #archiveteam-bs
06:26 ^🔗		krazedkat has quit IRC (Ping timeout: 244 seconds)
06:29 ^🔗		krazedkat has joined #archiveteam-bs
06:33 ^🔗		jsp12345 has quit IRC (Ping timeout: 492 seconds)
06:41 ^🔗		krazedkat has quit IRC (Ping timeout: 244 seconds)
06:42 ^🔗		krazedkat has joined #archiveteam-bs
06:55 ^🔗		Aranje has quit IRC (Quit: Three sheets to the wind)
07:04 ^🔗		alembic has joined #archiveteam-bs
07:07 ^🔗		alembic has quit IRC (Client Quit)
07:10 ^🔗		alembic has joined #archiveteam-bs
07:11 ^🔗		Somebody has joined #archiveteam-bs
07:11 ^🔗		alembic has quit IRC (Client Quit)
07:11 ^🔗		alembic has joined #archiveteam-bs
07:12 ^🔗		krazedkat has quit IRC (Quit: Leaving)
07:15 ^🔗		REiN^ has quit IRC (Max SendQ exceeded)
07:15 ^🔗		REiN^ has joined #archiveteam-bs
07:39 ^🔗		Stiletto has quit IRC (Read error: Connection reset by peer)
07:40 ^🔗		Stiletto has joined #archiveteam-bs
08:23 ^🔗		Somebody has quit IRC (Ping timeout: 370 seconds)
08:33 ^🔗		GE has joined #archiveteam-bs
09:12 ^🔗		yipdw_ is now known as yipdw
09:19 ^🔗		hawc145 is now known as HCross
09:22 ^🔗		vitzli has quit IRC (Quit: Leaving)
09:28 ^🔗		vitzli has joined #archiveteam-bs
09:34 ^🔗		HCross has quit IRC (Read error: Connection reset by peer)
09:35 ^🔗		HCross has joined #archiveteam-bs
09:36 ^🔗		xx343 has quit IRC (Read error: Connection reset by peer)
09:37 ^🔗		xx343 has joined #archiveteam-bs
10:00 ^🔗	godane	so Arirang Business Daily is almost done
10:00 ^🔗	godane	i'm uploading the 2016-10-24 episode right now
10:37 ^🔗		BlueMaxim has quit IRC (Read error: Operation timed out)
10:38 ^🔗		BlueMaxim has joined #archiveteam-bs
10:39 ^🔗		ravetcofx has quit IRC (Read error: Operation timed out)
11:12 ^🔗	godane	i'm starting to upload free north korea radio
11:13 ^🔗	godane	i got mp3s going back to feb 2010
11:13 ^🔗		GE has quit IRC (Remote host closed the connection)
11:40 ^🔗		BlueMaxim has quit IRC (Ping timeout: 370 seconds)
12:16 ^🔗		signius_ has joined #archiveteam-bs
12:54 ^🔗		VADemon has joined #archiveteam-bs
13:01 ^🔗		GE has joined #archiveteam-bs
14:09 ^🔗	godane	SketchCow: can you find out why WBAI archives are mostly gone
14:09 ^🔗	godane	there was like 10 years worth of mp3s about 18 months ago
14:27 ^🔗		vitzli has quit IRC (Quit: Leaving)
15:30 ^🔗		RichardG_ has joined #archiveteam-bs
15:31 ^🔗	godane	SketchCow: this needs to be put into a collection: https://archive.org/search.php?query=subject%3A%22nerdtv%22&sort=-publicdate&and[]=subject%3A%22cringely%22
15:31 ^🔗	godane	its the PBS NerdTV series
15:34 ^🔗		RichardG has quit IRC (Ping timeout: 364 seconds)
15:34 ^🔗	godane	we also got lucky cause the site is down
15:42 ^🔗		RichardG_ is now known as RichardG
15:59 ^🔗		fie has joined #archiveteam-bs
16:44 ^🔗		RichardG has quit IRC (Ping timeout: 250 seconds)
16:59 ^🔗		RichardG has joined #archiveteam-bs
17:20 ^🔗	godane	i'm grabbing tons of mp3s from way back in 2005 for the wbai archive collection
17:36 ^🔗		Somebody has joined #archiveteam-bs
18:43 ^🔗		Somebody has quit IRC (Ping timeout: 370 seconds)
19:31 ^🔗		ravetcofx has joined #archiveteam-bs
19:34 ^🔗		drunksci has quit IRC (Remote host closed the connection)
19:38 ^🔗		jsp12345 has joined #archiveteam-bs
20:26 ^🔗		drunksci has joined #archiveteam-bs
20:27 ^🔗		BlueMaxim has joined #archiveteam-bs
20:31 ^🔗		Coderjoe has joined #archiveteam-bs
20:32 ^🔗		jrwr has joined #archiveteam-bs
20:33 ^🔗	Coderjoe	grr. I see several places linking to mailing list messages in the pipermail archive that used to be hosted at arduino.cc, but those links are now dead and I can't seem to find copies of those messages.
20:36 ^🔗	Coderjoe	the specific message I am currently trying to find used to live at http://arduino.cc/pipermail/developers_arduino.cc/2011-September/005568.html
20:37 ^🔗	Coderjoe	but I see several other dead links. this makes me both angry and sad.
20:39 ^🔗	joepie91	Coderjoe: it's possible that they got renumbered
20:39 ^🔗	joepie91	this happens with python mailinglists every few months
20:39 ^🔗	joepie91	it's very irritating
20:59 ^🔗		powerKitt has joined #archiveteam-bs
21:00 ^🔗	powerKitt	I want to start a project to archive the SCP Foundation wiki (and other Wikidot sites) but there's one "small" problem.
21:00 ^🔗	powerKitt	Usage of the API requires a $49.90 payment to Wikidot yearly.
21:01 ^🔗	powerKitt	http://www.wikidot.com/plans https://www.wikidot.com/doc:api
21:01 ^🔗	xmc	fffff
21:04 ^🔗	powerKitt	Also, it appears you may need to be a "member" of a wiki before you can scrape it using the API. I'd check, but I don't have $49.90 payable to Wikidot on hand.
21:05 ^🔗	joepie91	you wouldn't want to archive through the API anyway
21:05 ^🔗	joepie91	at least not initially
21:05 ^🔗	joepie91	a WARC from a web scrape is more useful, generally
21:05 ^🔗	xmc	also, i'm sure you could construct a pretty straightforward api from the website anyway
21:06 ^🔗	powerKitt	The main issue with WARC scrapes, Wikidot-wise at least, is that Wikidot pages are messes of javascript.
21:06 ^🔗	xmc	yeah
21:08 ^🔗	powerKitt	Take http://www.scp-wiki.net/scp-343 for example. Revision history and files uploaded are javascript drop downs.
21:09 ^🔗	powerKitt	Viewing a past revision requires using the History dropdown, and then clicking the button to view the revision. URL does not change. Source code for an article revision is obtained the same way.
21:10 ^🔗	powerKitt	Oh, and the dropdowns don't even appear if you aren't a logged in Wikidot user who's a "member" of the SCP Foundation wiki.
21:10 ^🔗	xmc	i have a wikidot account, haven't formally joined scp-wiki in any way, and they're visible to me
21:10 ^🔗	xmc	when logged in
21:14 ^🔗	powerKitt	Huh.
21:15 ^🔗	powerKitt	http://ci-wiki.wikidot.com/item-experimentation They don't appear on this one, though. Which is strange.
21:17 ^🔗	powerKitt	http://ci-wiki.wikidot.com/system:list-all-pages It should be noted that this is the Wikidot equivalent of MediaWiki's Special:AllPages
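A minimal sketch of how a listing page like system:list-all-pages could seed a scrape: pull every site-relative link out of the listing HTML. The markup in `sample` is made up for illustration, and the assumption that wiki pages are linked with paths such as `/item-experimentation` is not confirmed by the log. Pagination and fetching are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageLinkCollector(HTMLParser):
    """Collect same-site links found in a listing page's HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: wiki pages use site-relative paths ("/some-page").
        # Protocol-relative ("//host/...") and absolute links are skipped.
        if href and href.startswith("/") and not href.startswith("//"):
            url = urljoin(self.base_url, href)
            if url not in self.links:
                self.links.append(url)


def extract_page_links(html, base_url):
    """Return de-duplicated absolute URLs for site-relative links in html."""
    collector = PageLinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical listing markup, not copied from a real Wikidot page.
sample = (
    '<div><a href="/item-experimentation">Item experimentation</a>'
    '<a href="http://example.com/">external</a>'
    '<a href="/system:list-all-pages">all pages</a></div>'
)
links = extract_page_links(sample, "http://ci-wiki.wikidot.com/")
```

Each URL in `links` could then be handed to whatever grabs the pages; the listing itself would still need to be walked page by page if it is paginated.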
21:26 ^🔗	powerKitt	http://r.wikidot.com/ What kind of absurd mistake is this. http://r.wikidot.com/system:list-all-pages
21:27 ^🔗	powerKitt	Who thought a Wikidot based //URL SHORTENER// was a good idea??
21:27 ^🔗	xmc	is this real?!?
21:27 ^🔗	xmc	oh my gosh
21:27 ^🔗	powerKitt	Apparently so
21:28 ^🔗	powerKitt	I mean, I've seen some complete mistakes of free site usage before.
21:28 ^🔗	powerKitt	But this is next level madness.
21:29 ^🔗	ae_g_i_s	especially since the domain is way too long for a shortener
21:31 ^🔗	powerKitt	Some quick calculations reveal that there's roughly 118762 pages on http://r.wikidot.com/
21:33 ^🔗	powerKitt	So there's probably a bit under that many links "shortened" this way, as I haven't subtracted non-link pages.
21:49 ^🔗	powerKitt	brb, checking something
21:49 ^🔗		powerKitt has quit IRC (Quit: Page closed)
21:52 ^🔗		powerKitt has joined #archiveteam-bs
21:53 ^🔗	powerKitt	Well, I did some looking around, but I can't seem to find a way to get page source/revisions/filelists without javascript.
21:56 ^🔗		drunksci has quit IRC ()
21:56 ^🔗	powerKitt	http://web.archive.org/web/20161202215539/http://scp-wiki.wikidot.com/scp-2111/ Wow that is one ugly mess the Wayback machine spit out
21:58 ^🔗	powerKitt	http://i.imgur.com/pouy0Ar.png
22:05 ^🔗	xmc	my gosh
22:07 ^🔗	arkiver	powerKitt: try https://web-beta.archive.org/web/20161202215539/http://scp-wiki.wikidot.com/scp-2111/
22:08 ^🔗	powerKitt	Apparently if you try to save a wikidot page manually with a logged in account, it freaks out.
22:09 ^🔗	Coderjoe	joepie91: the entire pipermail tree is gone. the pipermail directory itself redirects to a google groups mailing list, and I don't think it includes any of the old list's messages
22:22 ^🔗	powerKitt	http://scp-jp-sandbox2.wikidot.com/system:list-all-pages Apparently it's possible for a Wikidot wiki to delete system:list-all-pages
22:22 ^🔗	powerKitt	great.
22:24 ^🔗	powerKitt	That, or it's localized to the wiki region
22:24 ^🔗	powerKitt	Either way, /great/.
22:27 ^🔗	ae_g_i_s	the js is also quite horrible ^^
22:28 ^🔗	ae_g_i_s	did a quick check how difficult it'd be to emulate/rewrite it, but it's minified and just...bah
22:29 ^🔗	yipdw	web-beta works fine, but honestly I like the glitchy version more
22:29 ^🔗	yipdw	I mean, really, the page has a mention of "memetic security systems"
22:29 ^🔗	yipdw	OF COURSE you need glitches
22:30 ^🔗	powerKitt	ae_g_i_s: If you're using Notepad++, the JSTool plugin can make it less hideous looking.
22:32 ^🔗	yipdw	people who run cyberpunkish fiction sites should totally install request filters that, if they detect something like ia_archiver, introduce glitch CSS
22:32 ^🔗	yipdw	that would be awesome and would drive people here insane
22:32 ^🔗	*	yipdw +1
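A rough sketch of the request-filter idea, assuming the filter only looks at the User-Agent header. The token list, the stylesheet names and the sample header strings are all hypothetical; only `ia_archiver` comes from the discussion.

```python
# Hypothetical list of substrings that mark an archiving crawler.
ARCHIVER_TOKENS = ("ia_archiver", "archivebot")


def is_archiver(user_agent):
    """Return True if the User-Agent header contains a known archiver token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in ARCHIVER_TOKENS)


def stylesheets_for(user_agent):
    """Pick the stylesheets to link; archivers also get the glitch sheet."""
    sheets = ["site.css"]           # hypothetical normal stylesheet
    if is_archiver(user_agent):
        sheets.append("glitch.css")  # hypothetical glitch stylesheet
    return sheets


# Made-up header values, for illustration only.
normal = stylesheets_for("Mozilla/5.0 (X11; Linux x86_64)")
crawler = stylesheets_for("Mozilla/5.0 (compatible; ia_archiver)")
```

User-Agent matching is easy to get wrong and easy to spoof, so this only covers crawlers that identify themselves.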
22:32 ^🔗	ae_g_i_s	powerKitt: thx, chrome does have a pretty printer too, but what it can't do is refactor variables and other identifiers :/
22:32 ^🔗	xmc	that's a perfect item for "evil thought of the day"
22:32 ^🔗	yipdw	I prefer to think of it as performance art
22:33 ^🔗	ae_g_i_s	:D
22:33 ^🔗	yipdw	you don't destroy the content, you merely jack with its form
22:33 ^🔗	xmc	https://twitter.com/search?q=3totd
22:33 ^🔗	yipdw	oh i didn't know that was a thing
22:34 ^🔗	xmc	it's mostly a few people in seattle
22:35 ^🔗	yipdw	OR
22:36 ^🔗	yipdw	ok so, these days, you can get the current date from Javascript really easily and it's not too hard to use that to do things like manipulate CSS classes
22:36 ^🔗	yipdw	if you detect ia_archiver or ArchiveBot: introduce CSS and Javascript to activate it, but the change is subtle and occurs over time
22:36 ^🔗	yipdw	like, have the page slowly rot
22:37 ^🔗	xmc	or you could make two requests to the same endpoint, which should return different results; if you get the same result then you're being cached or archived, so activate the payload
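The two-request check could look something like this sketch. `fetch` is a stand-in for whatever calls the endpoint, and `fresh_token` is a hypothetical endpoint that returns a new value on every live request; neither name comes from the log.

```python
import secrets


def fresh_token():
    """Stand-in for a live endpoint: a new random value on every call."""
    return secrets.token_hex(8)


def looks_replayed(fetch):
    """Call the endpoint twice; identical answers suggest a cache or archive."""
    first = fetch()
    second = fetch()
    return first == second


live = looks_replayed(fresh_token)          # live endpoint, answers differ
frozen = looks_replayed(lambda: "a1b2c3")   # replayed copy, same answer twice
```

In a page this would run client-side; the Python version only shows the comparison logic.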
22:37 ^🔗	ae_g_i_s	a dali painting over 4 years of wayback machine
22:37 ^🔗	xmc	hm
22:37 ^🔗	ae_g_i_s	melting away
22:37 ^🔗	xmc	i like this rotting idea though
22:39 ^🔗	yipdw	I wonder if there's a way to do this just with CSS
22:41 ^🔗	Sanqui	yipdw: css prefixes are literally a way of webpage rot
22:41 ^🔗	ae_g_i_s	yipdw: that's exactly the reason i have the wikipedia page for media queries open
22:41 ^🔗	yipdw	yeah, but I want something that occurs over time
22:41 ^🔗	Sanqui	because they stop being supported at some point
22:41 ^🔗	yipdw	controlled
22:41 ^🔗	yipdw	not via vendor prefixes
22:41 ^🔗	powerKitt	http://de-scp.wikidot.com/ http://scp-wiki-de.wikidot.com/ Weird. There's actually two German SCP Foundation wikis.
22:41 ^🔗	ae_g_i_s	but they don't seem to do this kind of thing, i can't find anything that'd depend on a long-term state or date
22:42 ^🔗	yipdw	yeah, me either
22:42 ^🔗	xmc	so you serve it with the current unix time, and the further the page's stored time is from its run-time, the more it degrades?
22:42 ^🔗	xmc	stored in the source for the page or whatever
22:42 ^🔗	yipdw	the closest thing I've found so far are the :past/:future selectors in the CSS level 4 proposal
22:42 ^🔗	yipdw	but that's intended for WebVTT, which is all relative times
22:43 ^🔗	yipdw	xmc: something like that, but only if the page was requested in a way that it's clear an archiver user-agent was involved
22:43 ^🔗	Sanqui	really long term css animations
22:43 ^🔗	yipdw	so the rot has to be client-side and ideally would not involve JS
22:43 ^🔗	Sanqui	will degrade the page if you keep it open
22:43 ^🔗	Sanqui	lol
22:43 ^🔗	xmc	yipdw: hm.
22:44 ^🔗	xmc	i was just thinking "if this page was autogenerated more than X days ago, activate progressive rot"
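One way to turn "generated more than X days ago" into a degradation amount, as a sketch: compare the unix time stored in the page with the time it is viewed and map the gap to a value between 0 and 1. The 30-day grace period and the four-year ramp are arbitrary assumptions.

```python
SECONDS_PER_DAY = 86400


def rot_level(generated_at, now, grace_days=30, full_rot_days=4 * 365):
    """Map page age to a rot factor in [0.0, 1.0].

    generated_at and now are unix timestamps in seconds. Pages younger
    than grace_days do not rot; rot is complete at full_rot_days.
    """
    age_days = (now - generated_at) / SECONDS_PER_DAY
    if age_days <= grace_days:
        return 0.0
    return min(1.0, (age_days - grace_days) / (full_rot_days - grace_days))


fresh = rot_level(0, 10 * SECONDS_PER_DAY)     # inside the grace period
halfway = rot_level(0, 745 * SECONDS_PER_DAY)  # midway through the ramp
ancient = rot_level(0, 5000 * SECONDS_PER_DAY) # clamped at full rot
```

The factor could then drive anything visual, such as blur or opacity.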
22:44 ^🔗	yipdw	I figure someone must have done this already
22:44 ^🔗	xmc	oh! (1) does wayback serve with the date-modified header, and (2) can you fetch that from page scripting
22:44 ^🔗	yipdw	you can parse out the grab date from the URL
22:45 ^🔗	yipdw	but I think you still need Javascript to do that
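Parsing the grab date out of a Wayback URL is a small regex job. This sketch assumes the usual `/web/<14-digit timestamp>/` layout, optionally followed by a short modifier such as `id_`, and treats the timestamp as UTC. It is written in Python to show the logic; a page script would need the equivalent in Javascript.

```python
import re
from datetime import datetime, timezone

# 14 digits (YYYYMMDDhhmmss), then an optional modifier like "id_".
WAYBACK_TS = re.compile(r"/web/(\d{14})[a-z_]*/")


def capture_time(url):
    """Return the capture time encoded in a Wayback URL, or None."""
    match = WAYBACK_TS.search(url)
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return stamp.replace(tzinfo=timezone.utc)


grabbed = capture_time(
    "http://web.archive.org/web/20161202215539/http://scp-wiki.wikidot.com/scp-2111/"
)
```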
22:45 ^🔗	ae_g_i_s	yeah, there's no 'text matching' in CSS selectors
22:45 ^🔗	xmc	i'm thinking a thing that works independent of wayback itself
22:45 ^🔗	xmc	just an age-of-page thingy
22:46 ^🔗	yipdw	oh
22:47 ^🔗	ae_g_i_s	one ugly and naive way to do it would be writing the age of the page as a class into a specific element in a server-side script...combined with a huge amount of css selectors
22:47 ^🔗	ae_g_i_s	i.e. one per day or whatever time unit you're using
22:47 ^🔗	ae_g_i_s	well, not just selectors, but also "CSS rules of the day" for every day in the future
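The naive class-per-day approach could be generated rather than written by hand. In this sketch the `age-N` class names and the blur effect are assumptions chosen for illustration; the server would put the matching class on some element when it renders the page.

```python
def age_class(age_days):
    """Class name the server-side script would write into the page."""
    return f"age-{age_days}"


def rot_rules(max_days, px_per_day=0.05):
    """Generate one CSS rule per day, blurring a little more each day."""
    rules = []
    for day in range(max_days + 1):
        blur = day * px_per_day
        rules.append(f".{age_class(day)} {{ filter: blur({blur:.2f}px); }}")
    return rules


stylesheet = "\n".join(rot_rules(2))
```

Since the class is fixed when the page is served, a stored copy keeps whichever rule applied at that moment; the stylesheet just has to cover every value the server might emit.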
22:48 ^🔗	yipdw	er wait lol
22:48 ^🔗	yipdw	<time>
22:48 ^🔗	yipdw	hmm
22:48 ^🔗	ae_g_i_s	?
22:48 ^🔗	ae_g_i_s	is that an actual tag?
22:48 ^🔗	yipdw	wait sorry
22:48 ^🔗	yipdw	that's a markup tag, not an input control
22:49 ^🔗	yipdw	there is an input type="time" and maybe you can do some stuff with attr^=value
22:49 ^🔗	ae_g_i_s	damn :/ also, i don't think you can select based on form controls' content
22:49 ^🔗	yipdw	er sorry, datetime
22:49 ^🔗	yipdw	yeah, I think that's true too
22:51 ^🔗	yipdw	that and date/datetime/datetime-local/etc. has pretty poor browser support, plus I don't see a way to autopopulate those with the current time
22:51 ^🔗	yipdw	maybe in the future though
22:54 ^🔗	ae_g_i_s	was gonna check out the 'turing complete' argument for CSS3/HTML5, but it requires user interaction
22:54 ^🔗	powerKitt	http://pastebin.com/FuVWe9nY Preliminary Wikidot scrape for SCP Foundation wikis.
22:56 ^🔗	yipdw	ae_g_i_s: oh, the Rule 110 automaton?
22:57 ^🔗	ae_g_i_s	yipdw: yeah, exactly...was considering if maybe some parts of it would be reusable for this, but probably not
22:57 ^🔗	yipdw	ah
23:01 ^🔗	powerKitt	http://developer.wikidot.com/i-want-api-access
23:01 ^🔗	powerKitt	"Note the API access needs to be enabled in _admin" well there goes my plans.
23:04 ^🔗	powerKitt	Looks like I'm definitely going to have to whip up some kind of scraper.
23:10 ^🔗		GE has quit IRC (Quit: zzz)
23:12 ^🔗	powerKitt	https://github.com/wertercatt/Wikidot-Scraper Used Google Chrome's inspect element tool to save the packets sent and received when you view:
23:13 ^🔗	ae_g_i_s	oh, cool
23:13 ^🔗	powerKitt	Page history, A specific page revision, source of a page revision, and file listing
23:14 ^🔗	powerKitt	I have no idea where to start on writing a scraper though, so help would be appreciated.
23:19 ^🔗	powerKitt	Fun fact: I have one of those "instantly save to Internet Archive" bookmarklets.
23:20 ^🔗	powerKitt	and every so often I accidentally hit it while trying to click in the url bar.
23:20 ^🔗	xmc	:)
23:27 ^🔗	powerKitt	Anyway, I guess I'll try to figure out how site scraping works.
23:30 ^🔗	powerKitt	Idea: script that finds YouTube video pages saved to the wayback machine, and then runs tubeup.py on them.
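The first half of that idea, picking YouTube video pages out of a list of Wayback URLs, might look like this sketch. The accepted hostnames and the sample video ID are assumptions, and how the list of captures is obtained is left open. Handing the result to tubeup.py is also left out, since its invocation is not shown in the log.

```python
from urllib.parse import parse_qs, urlparse

YOUTUBE_HOSTS = ("youtube.com", "www.youtube.com")  # assumed host list


def youtube_watch_url(wayback_url):
    """Return the live YouTube watch URL behind a Wayback capture, or None."""
    marker = "/web/"
    start = wayback_url.find(marker)
    if start == -1:
        return None
    # Drop the timestamp segment; what follows is the original URL.
    _, _, original = wayback_url[start + len(marker):].partition("/")
    parsed = urlparse(original)
    host = (parsed.hostname or "").lower()
    if host not in YOUTUBE_HOSTS or parsed.path != "/watch":
        return None
    video_ids = parse_qs(parsed.query).get("v")
    if not video_ids:
        return None
    return f"https://www.youtube.com/watch?v={video_ids[0]}"


# "abc123XYZ_-" is a made-up video ID.
video = youtube_watch_url(
    "http://web.archive.org/web/20161202000000/https://www.youtube.com/watch?v=abc123XYZ_-"
)
not_video = youtube_watch_url(
    "http://web.archive.org/web/20161202215539/http://scp-wiki.wikidot.com/scp-2111/"
)
```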
23:34 ^🔗		ndiddy has joined #archiveteam-bs
23:37 ^🔗		powerKitt has quit IRC (Quit: Page closed)