[00:02] *** Stilett0 has quit IRC (Ping timeout: 246 seconds)
[00:07] *** yipdw has quit IRC (ny.us.hub irc.servercentral.net)
[00:07] *** ravetcofx has quit IRC (ny.us.hub irc.servercentral.net)
[00:07] *** robink has quit IRC (ny.us.hub irc.servercentral.net)
[00:07] *** swebb has quit IRC (ny.us.hub irc.servercentral.net)
[00:07] *** Laverne has quit IRC (ny.us.hub irc.servercentral.net)
[00:07] *** Flierp has quit IRC (ny.us.hub irc.servercentral.net)
[00:07] *** ZizzyDizz has quit IRC (ny.us.hub irc.servercentral.net)
[00:14] *** Start has joined #archiveteam-bs
[00:14] *** GE has quit IRC (Quit: zzz)
[00:21] *** ravetcofx has joined #archiveteam-bs
[00:21] *** Laverne has joined #archiveteam-bs
[00:21] *** chazchaz has joined #archiveteam-bs
[00:21] *** atlogbot has joined #archiveteam-bs
[00:21] *** slyphic_ has joined #archiveteam-bs
[00:21] *** Cameron_D has joined #archiveteam-bs
[00:21] *** MrRadar has joined #archiveteam-bs
[00:21] *** Flierp has joined #archiveteam-bs
[00:21] *** ZizzyDizz has joined #archiveteam-bs
[00:22] *** swebb_ is now known as swebb
[00:25] *** brayden has joined #archiveteam-bs
[00:53] *** tfgbd_znc has joined #archiveteam-bs
[01:15] *** Somebody has joined #archiveteam-bs
[01:17] *** Start has quit IRC (Quit: Disconnected.)
[01:20] *** Start has joined #archiveteam-bs
[01:23] *** Stiletto has joined #archiveteam-bs
[01:37] *** VADemon has quit IRC (Quit: left4dead)
[02:21] *** tfgbd_znc has quit IRC (Read error: Connection reset by peer)
[02:23] <Somebody> This delights my weird archivist heart: https://archive.org/details/MusicLocker_201608
[02:26] *** tfgbd_znc has joined #archiveteam-bs
[03:05] *** Somebody has quit IRC (Ping timeout: 370 seconds)
[03:18] *** jrwr has quit IRC (Leaving)
[03:21] *** Stiletto has quit IRC (Ping timeout: 246 seconds)
[03:28] *** Ravenloft has quit IRC (Ping timeout: 244 seconds)
[03:43] *** Somebody has joined #archiveteam-bs
[03:50] *** Stiletto has joined #archiveteam-bs
[03:56] *** dashcloud has quit IRC (Read error: Operation timed out)
[03:57] *** dashcloud has joined #archiveteam-bs
[04:26] *** vitzli has joined #archiveteam-bs
[04:29] *** BlueMaxim has joined #archiveteam-bs
[04:55] *** Yoshimura has quit IRC (Ping timeout: 255 seconds)
[05:20] <ranma> https://twitter.com/mikko/status/804232169728053252
[05:20] <ranma> <Chii> Mikko Hypponen on Twitter: "I'm a bit worried about what's going to happen to Pebble now that Fitbit seems to be acquiring them. https://t.co/vPjz2WRk1F" ~ twitter.com
[05:32] <yipdw_> ranma: https://badcheese.com/~steve/atlogs/?chan=archiveteam&day=2016-12-01
[05:35] <ranma> ah, i missed the !a request
[05:45] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[05:48] *** Aranje has joined #archiveteam-bs
[05:52] *** Sk1d has joined #archiveteam-bs
[05:54] *** Aranje has quit IRC (Read error: Connection timed out)
[05:54] *** Aranje has joined #archiveteam-bs
[06:02] *** ravetcofx has quit IRC (Read error: Operation timed out)
[06:10] *** ndiddy has quit IRC (Read error: Connection reset by peer)
[06:14] *** Somebody has quit IRC (Ping timeout: 370 seconds)
[06:18] *** ravetcofx has joined #archiveteam-bs
[06:26] *** krazedkat has quit IRC (Ping timeout: 244 seconds)
[06:29] *** krazedkat has joined #archiveteam-bs
[06:33] *** jsp12345 has quit IRC (Ping timeout: 492 seconds)
[06:41] *** krazedkat has quit IRC (Ping timeout: 244 seconds)
[06:42] *** krazedkat has joined #archiveteam-bs
[06:55] *** Aranje has quit IRC (Quit: Three sheets to the wind)
[07:04] *** alembic has joined #archiveteam-bs
[07:07] *** alembic has quit IRC (Client Quit)
[07:10] *** alembic has joined #archiveteam-bs
[07:11] *** Somebody has joined #archiveteam-bs
[07:11] *** alembic has quit IRC (Client Quit)
[07:11] *** alembic has joined #archiveteam-bs
[07:12] *** krazedkat has quit IRC (Quit: Leaving)
[07:15] *** REiN^ has quit IRC (Max SendQ exceeded)
[07:15] *** REiN^ has joined #archiveteam-bs
[07:39] *** Stiletto has quit IRC (Read error: Connection reset by peer)
[07:40] *** Stiletto has joined #archiveteam-bs
[08:23] *** Somebody has quit IRC (Ping timeout: 370 seconds)
[08:33] *** GE has joined #archiveteam-bs
[09:12] *** yipdw_ is now known as yipdw
[09:19] *** hawc145 is now known as HCross
[09:22] *** vitzli has quit IRC (Quit: Leaving)
[09:28] *** vitzli has joined #archiveteam-bs
[09:34] *** HCross has quit IRC (Read error: Connection reset by peer)
[09:35] *** HCross has joined #archiveteam-bs
[09:36] *** xx343 has quit IRC (Read error: Connection reset by peer)
[09:37] *** xx343 has joined #archiveteam-bs
[10:00] <godane> so Arirang Business Daily is almost done
[10:00] <godane> i'm uploading the 2016-10-24 episode right now
[10:37] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[10:38] *** BlueMaxim has joined #archiveteam-bs
[10:39] *** ravetcofx has quit IRC (Read error: Operation timed out)
[11:12] <godane> i'm starting to upload free north korea radio
[11:13] <godane> i got mp3s going back to feb 2010
[11:13] *** GE has quit IRC (Remote host closed the connection)
[11:40] *** BlueMaxim has quit IRC (Ping timeout: 370 seconds)
[12:16] *** signius_ has joined #archiveteam-bs
[12:54] *** VADemon has joined #archiveteam-bs
[13:01] *** GE has joined #archiveteam-bs
[14:09] <godane> SketchCow: can you find out why WBAI archives are mostly gone
[14:09] <godane> there was like 10 years worth of mp3s about 18 months ago
[14:27] *** vitzli has quit IRC (Quit: Leaving)
[15:30] *** RichardG_ has joined #archiveteam-bs
[15:31] <godane> SketchCow: this needs to be put into a collection: https://archive.org/search.php?query=subject%3A%22nerdtv%22&sort=-publicdate&and[]=subject%3A%22cringely%22
[15:31] <godane> it's the PBS NerdTV series
[15:34] *** RichardG has quit IRC (Ping timeout: 364 seconds)
[15:34] <godane> we also got lucky cause the site is down
[15:42] *** RichardG_ is now known as RichardG
[15:59] *** fie has joined #archiveteam-bs
[16:44] *** RichardG has quit IRC (Ping timeout: 250 seconds)
[16:59] *** RichardG has joined #archiveteam-bs
[17:20] <godane> i'm grabbing tons of mp3s from way back in 2005 for the wbai archive collection
[17:36] *** Somebody has joined #archiveteam-bs
[18:43] *** Somebody has quit IRC (Ping timeout: 370 seconds)
[19:31] *** ravetcofx has joined #archiveteam-bs
[19:34] *** drunksci has quit IRC (Remote host closed the connection)
[19:38] *** jsp12345 has joined #archiveteam-bs
[20:26] *** drunksci has joined #archiveteam-bs
[20:27] *** BlueMaxim has joined #archiveteam-bs
[20:31] *** Coderjoe has joined #archiveteam-bs
[20:32] *** jrwr has joined #archiveteam-bs
[20:33] <Coderjoe> grr. I see several places linking to mailing list messages in the pipermail archive that used to be hosted at arduino.cc, but those links are now dead and I can't seem to find copies of those messages.
[20:36] <Coderjoe> the specific message I am currently trying to find used to live at http://arduino.cc/pipermail/developers_arduino.cc/2011-September/005568.html
[20:37] <Coderjoe> but I see several other dead links. this makes me both angry and sad.
[20:39] <joepie91> Coderjoe: it's possible that they got renumbered
[20:39] <joepie91> this happens with python mailinglists every few months
[20:39] <joepie91> it's very irritating
[20:59] *** powerKitt has joined #archiveteam-bs
[21:00] <powerKitt> I want to start a project to archive the SCP Foundation wiki (and other Wikidot sites) but there's one "small" problem.
[21:00] <powerKitt> Usage of the API requires a $49.90 payment to Wikidot yearly. 
[21:01] <powerKitt> http://www.wikidot.com/plans https://www.wikidot.com/doc:api
[21:01] <xmc> fffff
[21:04] <powerKitt> Also, it appears you may need to be a "member" of a wiki before you can scrape it using the API. I'd check, but I don't have $49.90 payable to Wikidot on hand.
[21:05] <joepie91> you wouldn't want to archive through the API anyway
[21:05] <joepie91> at least not initially
[21:05] <joepie91> a WARC from a web scrape is more useful, generally
[21:05] <xmc> also, i'm sure you could construct a pretty straightforward api from the website anyway
[21:06] <powerKitt> The main issue with WARC scrapes, Wikidot-wise at least, is that Wikidot pages are messes of javascript.
[21:06] <xmc> yeah
[21:08] <powerKitt> Take http://www.scp-wiki.net/scp-343 for example. Revision history and files uploaded are javascript drop downs.
[21:09] <powerKitt> Viewing a past revision requires using the History dropdown, and then clicking the button to view the revision. URL does not change. Source code for an article revision is obtained the same way.
[21:10] <powerKitt> Oh, and the dropdowns don't even appear if you aren't a logged in Wikidot user who's a "member" of the SCP Foundation wiki.
[21:10] <xmc> i have a wikidot account, haven't formally joined scp-wiki in any way, and they're visible to me
[21:10] <xmc> when logged in
[21:14] <powerKitt> Huh.
[21:15] <powerKitt> http://ci-wiki.wikidot.com/item-experimentation They don't appear on this one, though. Which is strange.
[21:17] <powerKitt> http://ci-wiki.wikidot.com/system:list-all-pages It should be noted that this is the Wikidot equivalent of MediaWiki's Special:AllPages
[21:26] <powerKitt> http://r.wikidot.com/ What kind of absurd mistake is this. http://r.wikidot.com/system:list-all-pages
[21:27] <powerKitt> Who thought a Wikidot based //URL SHORTENER// was a good idea??
[21:27] <xmc> is this real?!?
[21:27] <xmc> oh my gosh
[21:27] <powerKitt> Apparently so
[21:28] <powerKitt> I mean, I've seen some complete mistakes of free site usage before.
[21:28] <powerKitt> But this is next level madness.
[21:29] <ae_g_i_s> especially since the domain is way too long for a shortener
[21:31] <powerKitt> Some quick calculations reveal that there's roughly 118762 pages on http://r.wikidot.com/
[21:33] <powerKitt> So there's probably a bit under that many links "shortened" this way, as I haven't subtracted non-link pages.
[21:49] <powerKitt> brb, checking something
[21:49] *** powerKitt has quit IRC (Quit: Page closed)
[21:52] *** powerKitt has joined #archiveteam-bs
[21:53] <powerKitt> Well, I did some looking around, but I can't seem to find a way to get page source/revisions/filelists without javascript.
[21:56] *** drunksci has quit IRC ()
[21:56] <powerKitt> http://web.archive.org/web/20161202215539/http://scp-wiki.wikidot.com/scp-2111/ Wow that is one ugly mess the Wayback machine spit out
[21:58] <powerKitt> http://i.imgur.com/pouy0Ar.png 
[22:05] <xmc> my gosh
[22:07] <arkiver> powerKitt: try https://web-beta.archive.org/web/20161202215539/http://scp-wiki.wikidot.com/scp-2111/
[22:08] <powerKitt> Apparently if you try to save a wikidot page manually with a logged in account, it freaks out.
[22:09] <Coderjoe> joepie91: the entire pipermail tree is gone. the pipermail directory itself redirects to a google groups mailing list, and I don't think it includes any of the old list's messages
[22:22] <powerKitt> http://scp-jp-sandbox2.wikidot.com/system:list-all-pages Apparently it's possible for a Wikidot wiki to delete system:list-all-pages
[22:22] <powerKitt> great.
[22:24] <powerKitt> That, or it's localized to the wiki region
[22:24] <powerKitt> Either way, /great/.
[22:27] <ae_g_i_s> the js is also quite horrible ^^
[22:28] <ae_g_i_s> did a quick check how difficult it'd be to emulate/rewrite it, but it's minified and just...bah
[22:29] <yipdw> web-beta works fine, but honestly I like the glitchy version more
[22:29] <yipdw> I mean, really, the page has a mention of "memetic security systems"
[22:29] <yipdw> OF COURSE you need glitches
[22:30] <powerKitt> ae_g_i_s: If you're using Notepad++, the JSTool plugin can make it less hideous looking.
[22:32] <yipdw> people who run cyberpunkish fiction sites should totally install request filters that, if they detect something like ia_archiver, introduce glitch CSS
[22:32] <yipdw> that would be awesome and would drive people here insane
[22:32] * yipdw +1
[22:32] <ae_g_i_s> powerKitt: thx, chrome does have a pretty printer too, but what it can't do is refactor variables and other identifiers :/
[22:32] <xmc> that's a perfect item for "evil thought of the day"
[22:32] <yipdw> I prefer to think of it as performance art
[22:33] <ae_g_i_s> :D
[22:33] <yipdw> you don't destroy the content, you merely jack with its form
[22:33] <xmc> https://twitter.com/search?q=3totd
[22:33] <yipdw> oh i didn't know that was a thing
[22:34] <xmc> it's mostly a few people in seattle
[22:35] <yipdw> OR 
[22:36] <yipdw> ok so, these days, you can get the current date from Javascript really easily and it's not too hard to use that to do things like manipulate CSS classes
[22:36] <yipdw> if you detect ia_archiver or ArchiveBot: introduce CSS and Javascript to activate it, but the change is subtle and occurs over time
[22:36] <yipdw> like, have the page slowly rot
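A minimal sketch of the request-filter idea yipdw describes, assuming the filter only looks at the User-Agent header. The function names and the `rot` class are hypothetical; `ia_archiver` and `ArchiveBot` are the two agents named in the discussion, and real crawlers may identify themselves differently.

```python
from typing import Optional

# Substrings named in the discussion above; matching is case-insensitive.
ARCHIVER_TOKENS = ("ia_archiver", "archivebot")


def is_archiver(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent string mentions a known archiver."""
    if not user_agent:
        return False
    lowered = user_agent.lower()
    return any(token in lowered for token in ARCHIVER_TOKENS)


def body_class_for(user_agent: Optional[str]) -> str:
    """Pick the (hypothetical) CSS class the server would put on <body>."""
    return "rot" if is_archiver(user_agent) else ""
```

A server would call `body_class_for(request_headers.get("User-Agent"))` and ship the rot stylesheet only when the result is non-empty, so ordinary visitors never receive it.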
[22:37] <xmc> or you could make two requests to the same endpoint, which should return different results; if you get the same result then you're being cached or archived, so activate the payload
[22:37] <ae_g_i_s> a dali painting over 4 years of wayback machine
[22:37] <xmc> hm
[22:37] <ae_g_i_s> melting away
[22:37] <xmc> i like this rotting idea though
[22:39] <yipdw> I wonder if there's a way to do this just with CSS
[22:41] <Sanqui> yipdw: css prefixes are literally a way of webpage rot
[22:41] <ae_g_i_s> yipdw: that's exactly the reason i have the wikipedia page for media queries open
[22:41] <yipdw> yeah, but I want something that occurs over time
[22:41] <Sanqui> because they stop being supported at some point
[22:41] <yipdw> controlled
[22:41] <yipdw> not via vendor prefixes
[22:41] <powerKitt> http://de-scp.wikidot.com/ http://scp-wiki-de.wikidot.com/ Weird. There's actually two German SCP Foundation wikis.
[22:41] <ae_g_i_s> but they don't seem to do this kind of thing, i can't find anything that'd depend on a long-term state or date
[22:42] <yipdw> yeah, me either
[22:42] <xmc> so you serve it with the current unix time, and the further the page's stored time is from its run-time, it degrades more?
[22:42] <xmc> stored in the source for the page or whatever
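What xmc describes can be sketched as a function from "time stored in the page" and "time the page is being viewed" to a rot amount. This is only an illustration: the linear ramp and the four-year horizon (borrowed from the "4 years of wayback machine" remark) are assumptions, and in a browser the same arithmetic would have to run in page script.

```python
SECONDS_PER_DAY = 86400


def rot_level(stored_unix: float, now_unix: float,
              full_rot_days: float = 4 * 365) -> float:
    """Return a value in [0.0, 1.0]: 0 is pristine, 1 is fully rotted.

    stored_unix is the Unix time the server embedded in the page source;
    now_unix is the time at which the page is being displayed.
    """
    age_days = max(0.0, now_unix - stored_unix) / SECONDS_PER_DAY
    return min(1.0, age_days / full_rot_days)
```

A freshly served page gets 0.0, and a copy replayed two years later gets 0.5 under these assumed defaults.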
[22:42] <yipdw> the closest thing I've found so far are the :past/:future selectors in the CSS level 4 proposal
[22:42] <yipdw> but that's intended for WebVTT, which is all relative times
[22:43] <yipdw> xmc: something like that, but only if the page was requested in a way that it's clear an archiver user-agent was involved
[22:43] <Sanqui> really long term css animations
[22:43] <yipdw> so the rot has to be client-side and ideally would not involve JS
[22:43] <Sanqui> will degrade the page if you keep it open
[22:43] <Sanqui> lol
[22:43] <xmc> yipdw: hm.
[22:44] <xmc> i was just thinking "if this page was autogenerated more than X days ago, activate progressive rot"
[22:44] <yipdw> I figure someone must have done this already
[22:44] <xmc> oh! (1) does wayback serve with the date-modified header, and (2) can you fetch that from page scripting
[22:44] <yipdw> you can parse out the grab date from the URL
[22:45] <yipdw> but I think you still need Javascript to do that
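As a sketch of "parse out the grab date from the URL": Wayback Machine replay URLs carry a 14-digit UTC timestamp after `/web/`, as in the links pasted earlier in the log. The function name is hypothetical, and this is Python rather than the in-page JavaScript yipdw has in mind.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# 14-digit capture timestamp, optionally followed by a two-letter
# replay modifier such as "id_".
_WAYBACK_RE = re.compile(r"/web/(\d{14})(?:[a-z]{2}_)?/")


def grab_time(url: str) -> Optional[datetime]:
    """Return the capture time encoded in a Wayback URL, or None."""
    match = _WAYBACK_RE.search(url)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)
```

For the scp-2111 capture linked above, `grab_time` yields 2016-12-02 21:55:39 UTC; a URL without a `/web/<timestamp>/` segment yields `None`.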
[22:45] <ae_g_i_s> yeah, there's no 'text matching' in CSS selectors
[22:45] <xmc> i'm thinking a thing that works independent of wayback itself
[22:45] <xmc> just an age-of-page thingy
[22:46] <yipdw> oh
[22:47] <ae_g_i_s> one ugly and naive way to do it would be writing the age of the page as a class into a specific element in a server-side script...combined with a huge amount of css selectors
[22:47] <ae_g_i_s> i.e. one per day or whatever time unit you're using
[22:47] <ae_g_i_s> well, not just selectors, but also "CSS rules of the day" for every day in the future
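The "ugly and naive" approach ae_g_i_s outlines could look roughly like this: the server writes an `age-N` class into the page and a generated stylesheet holds one rule per day. The class naming and the choice of fading opacity down to 0.2 are assumptions made for illustration. Note that the class is fixed at the moment the server renders the page, so an archived copy would stay at whatever age it was captured with.

```python
def age_class(age_days: int, max_days: int) -> str:
    """Class name the server-side script would write into the element."""
    clamped = min(max(age_days, 0), max_days)
    return f"age-{clamped}"


def rot_rules(max_days: int) -> str:
    """Generate one CSS rule per day, fading opacity from 1.0 to 0.2."""
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    lines = []
    for day in range(max_days + 1):
        opacity = 1.0 - 0.8 * day / max_days
        lines.append(f".age-{day} {{ opacity: {opacity:.2f}; }}")
    return "\n".join(lines)
```

`rot_rules(1460)` would emit 1461 rules, which is exactly the "huge amount of css selectors" cost mentioned above.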
[22:48] <yipdw> er wait lol
[22:48] <yipdw> <time>
[22:48] <yipdw> hmm
[22:48] <ae_g_i_s> ?
[22:48] <ae_g_i_s> is that an actual tag?
[22:48] <yipdw> wait sorry
[22:48] <yipdw> that's a markup tag, not an input control
[22:49] <yipdw> there is an input type="time" and maybe you can do some stuff with attr^=value
[22:49] <ae_g_i_s> damn :/ also, i don't think you can select based on form controls' content
[22:49] <yipdw> er sorry, datetime
[22:49] <yipdw> yeah, I think that's true too 
[22:51] <yipdw> that and date/datetime/datetime-local/etc. has pretty poor browser support, plus I don't see a way to autopopulate those with the current time
[22:51] <yipdw> maybe in the future though
[22:54] <ae_g_i_s> was gonna check out the 'turing complete' argument for CSS3/HTML5, but it requires user interaction
[22:54] <powerKitt> http://pastebin.com/FuVWe9nY Preliminary Wikidot scrape for SCP Foundation wikis.
[22:56] <yipdw> ae_g_i_s: oh, the Rule 110 automaton?
[22:57] <ae_g_i_s> yipdw: yeah, exactly...was considering if maybe some parts of it would be reusable for this, but probably not
[22:57] <yipdw> ah
[23:01] <powerKitt> http://developer.wikidot.com/i-want-api-access 
[23:01] <powerKitt> "Note the API access needs to be enabled in _admin" well there goes my plans.
[23:04] <powerKitt> Looks like I'm definitely going to have to whip up some kind of scraper.
[23:10] *** GE has quit IRC (Quit: zzz)
[23:12] <powerKitt> https://github.com/wertercatt/Wikidot-Scraper Used Google Chrome's inspect element tool to save the packets sent and received when you view:
[23:13] <ae_g_i_s> oh, cool
[23:13] <powerKitt> Page history, A specific page revision, source of a page revision, and file listing
[23:14] <powerKitt> I have no idea where to start on writing a scraper though, so help would be appreciated.
[23:19] <powerKitt> Fun fact: I have one of those "instantly save to Internet Archive" bookmarklets.
[23:20] <powerKitt> and every so often I accidentally hit it while trying to click in the url bar.
[23:20] <xmc> :)
[23:27] <powerKitt> Anyway, I guess I'll try to figure out how site scraping works.
[23:30] <powerKitt> Idea: script that finds YouTube video pages saved to the wayback machine, and then runs tubeup.py on them.
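A rough sketch of the first half of that idea, assuming the Wayback Machine's CDX search endpoint is used to list captured YouTube watch pages. The endpoint and the query parameters (`matchType`, `fl`, `collapse`, `limit`) are written from memory and should be checked against the CDX server documentation; the function names are hypothetical, and handing the resulting URLs to tubeup.py is left out because its command line is not shown in the log.

```python
import json
from typing import List
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed endpoint for the Wayback CDX search API.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def cdx_query_url(limit: int = 100) -> str:
    """Build a CDX query for captured youtube.com/watch pages."""
    params = {
        "url": "youtube.com/watch",
        "matchType": "prefix",
        "output": "json",
        "fl": "original",
        "collapse": "urlkey",
        "limit": str(limit),
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def video_ids(cdx_json: str) -> List[str]:
    """Extract unique video IDs from a JSON CDX response.

    The JSON form is a list of rows whose first row is the header.
    """
    rows = json.loads(cdx_json)
    ids: List[str] = []
    for row in rows[1:]:
        query = parse_qs(urlparse(row[0]).query)
        for vid in query.get("v", []):
            if vid not in ids:
                ids.append(vid)
    return ids
```

Fetching `cdx_query_url()` and passing the response body to `video_ids` would give the list of videos to feed to the uploader, one watch URL per ID.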
[23:34] *** ndiddy has joined #archiveteam-bs
[23:37] *** powerKitt has quit IRC (Quit: Page closed)