[00:32] *** JesseW has joined #archiveteam
[00:35] *** Morbus has quit IRC (Read error: Operation timed out)
[01:08] *** kremlin has joined #archiveteam
[01:16] *** wp494_ is now known as wp494
[01:23] Who is uploading into wacko_files on FOS?
[01:23] *** hawc145 has joined #archiveteam
[01:24] Whoever it is, you uploaded 67 gigabytes of basically commercial crap into it. Ub Iwerks, Animaniacs, Powerpuff Girls, etc.
[01:24] Deleted.
[01:24] All of it.
[01:25] Nothing uploaded. If we ever figure out who that is, they should be told to stop.
[01:26] Same with a 14gb collection of "toontastic". Deleted.
[01:27] *** HCross has quit IRC (Ping timeout: 370 seconds)
[01:48] *** BlueMaxim has joined #archiveteam
[01:59] *** WinterFox has joined #archiveteam
[02:06] *** kyounko has joined #archiveteam
[02:22] *** balrog has joined #archiveteam
[02:22] *** swebb sets mode: +o balrog
[02:23] ok, hackforums is sorted.
[02:23] >10reqs/second, heavily filtered to just useful stuff.
[02:25] *** balrog has quit IRC (Read error: Operation timed out)
[02:36] *** balrog has joined #archiveteam
[02:36] *** swebb sets mode: +o balrog
[02:57] *** balrog has quit IRC (Quit: Bye)
[03:21] *** zyphlar has joined #archiveteam
[03:26] *** maelstrom has quit IRC (Quit: Leaving)
[03:28] *** maelstrom has joined #archiveteam
[03:36] *** balrog has joined #archiveteam
[03:36] *** swebb sets mode: +o balrog
[04:27] *** maelstrom has quit IRC (Read error: Operation timed out)
[04:33] *** maelstrom has joined #archiveteam
[04:42] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[04:49] *** Sk1d has joined #archiveteam
[05:03] *** maelstrom has quit IRC (Leaving)
[05:20] *** randomdes has quit IRC (Read error: Operation timed out)
[05:24] *** twrist has joined #archiveteam
[05:25] *** GLaDOS has quit IRC (Read error: Operation timed out)
[05:25] *** twrist is now known as GLaDOS
[05:36] *** yuitimoth has quit IRC (Ping timeout: 506 seconds)
[05:50] *** yuitimoth has joined #archiveteam
[06:52] *** JesseW has quit IRC (Read error: Operation timed out)
[07:40] *** ravetcofx has quit IRC (Quit: Leaving)
[07:55] https://archive.org/details/GithubReadme2016September
[07:55] ^ capture of every single github readme :)
[08:14] *** vOYtEC has quit IRC (Quit: rm -r *)
[08:19] *** vOYtEC has joined #archiveteam
[08:19] *** schbird has joined #archiveteam
[08:19] https://de.wikipedia.org/wiki/EinsPlus and https://de.wikipedia.org/wiki/ZDFkultur are going to be shut down, not sure about their existing content
[08:38] *** vOYtEC has quit IRC (Ping timeout: 370 seconds)
[08:48] also, oh shit. http://www.bbc.co.uk/news/business-37503109
[08:49] Spotify 'in talks to take over Soundcloud'
[09:16] *** atomotic has joined #archiveteam
[09:27] *** schbirid has joined #archiveteam
[09:35] *** schbird has quit IRC (Ping timeout: 255 seconds)
[09:35] *** atomotic has quit IRC (Ping timeout: 260 seconds)
[09:44] *** vOYtEC has joined #archiveteam
[09:55] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[09:59] *** yuitimoth has quit IRC (Ping timeout: 246 seconds)
[10:05] *** vOYtEC has quit IRC (Ping timeout: 259 seconds)
[10:08] *** yuitimoth has joined #archiveteam
[10:08] *** zyphlar has quit IRC (Quit: Connection closed for inactivity)
[10:13] *** godane has quit IRC (Ping timeout: 244 seconds)
[10:24] *** BartoCH has joined #archiveteam
[10:25] zout: awesome, so you're now using wpull with a custom script?
[10:27] *** godane has joined #archiveteam
[10:30] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[10:31] *** BartoCH has joined #archiveteam
[11:13] *** kyounko has quit IRC (KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
[11:14] *** BlueMaxim has quit IRC (Quit: Leaving)
[11:40] *** r3c0d3x has quit IRC (Ping timeout: 260 seconds)
[11:52] *** r3c0d3x has joined #archiveteam
[12:15] *** kris33 has joined #archiveteam
[12:24] *** kristian_ has joined #archiveteam
[12:38] *** godane has quit IRC (Quit: Leaving.)
[12:52] *** RichardG has quit IRC (Ping timeout: 255 seconds)
[13:01] *** RichardG has joined #archiveteam
[13:24] *** yuitimoth has quit IRC (Ping timeout: 246 seconds)
[13:25] *** yuitimoth has joined #archiveteam
[13:33] *** kris33_ has joined #archiveteam
[13:33] *** kris33 has quit IRC (Read error: Connection reset by peer)
[13:34] *** dashcloud has quit IRC (Read error: Operation timed out)
[13:37] *** dashcloud has joined #archiveteam
[13:44] *** Super-Dot has quit IRC (Quit: Connection closed for inactivity)
[14:17] *** godane has joined #archiveteam
[14:21] *** yeoldetoa has joined #archiveteam
[14:22] *** kris33_ has quit IRC (Textual IRC Client: www.textualapp.com)
[14:34] *** Start has quit IRC (Ping timeout: 506 seconds)
[14:39] *** dashcloud has quit IRC (Read error: Operation timed out)
[14:43] *** dashcloud has joined #archiveteam
[14:44] *** Start has joined #archiveteam
[14:45] *** PurpleSym sets mode: -b *!*@c-24-1-111-31.hsd1.il.comcast.net
[14:50] *** laufwerkf has joined #archiveteam
[14:51] *** Start has quit IRC (Read error: Operation timed out)
[14:52] *** laufwerkf has quit IRC (Read error: Connection reset by peer)
[14:52] *** PurpleSym sets mode: +b *!*@c-24-1-111-31.hsd1.il.comcast.net
[15:20] *** WinterFox has quit IRC (Read error: Operation timed out)
[15:44] *** JesseW has joined #archiveteam
[15:44] *** kristian_ has quit IRC (Quit: Leaving)
[16:12] *** VADemon has joined #archiveteam
[16:17] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[16:52] *** randomdes has joined #archiveteam
[17:57] *** bRick5772 has joined #archiveteam
[18:01] *** JW_work1 has joined #archiveteam
[18:03] *** JW_work has quit IRC (Read error: Operation timed out)
[18:13] *** AlexLehm has joined #archiveteam
[18:56] *** phuzion has quit IRC (Read error: Operation timed out)
[19:20] *** bRick5772 has quit IRC (Quit: Leaving.)
[19:26] *** maelstrom has joined #archiveteam
[19:33] *** phuzion has joined #archiveteam
[20:13] *** Stiletto has quit IRC (Read error: Operation timed out)
[20:50] *** VADemon has quit IRC (Quit: left4dead)
[21:30] So, there is some talk at work about sunsetting a feature. Or more accurately, finally remove it from life support. It is a publishing thing that hasn't been supported for a couple of years now. No way to edit things, no way to create new ones, no way to view old ones. They're still accessible if you know the URL, so still available via Google/bookmarks/etc, but you're not going to stumble across them.
[21:31] I am of the opinion that, absent a good reason to remove it, we should just leave it. It's not doing any harm (no lost productivity/minimal maintenance burden, etc), and people trusted us with their content. On the other hand, it is deprecated, and it is legacy code, and we would like to clean up that kind of stuff. So there is a growing position for "Anyone who wanted this will have taken a copy already, let's just kill
[21:31] "no way to view old ones" and then you give the way to view old ones
[21:31] Can anyone recommend any good links that advocate for my perspective on this? I'd like to have something better than my own voice in this discussion.
[21:31] xmc: Sure, but I mean it's not exposed in the site navigation.
[21:31] make a list of urls, run a warc-creating spider (e.g. wpull) on it, and give the warc files to us
[21:32] How long would you estimate that process would take me? Setting up the spider & shipping you the content? We have a hack day coming up tomorrow, so I could squeeze it in to that.
[21:34] Like, is that more like "This'll take you an hour to set up" or "this is fiddly, you'll need to spin up a VM & fight with some bash scripts" kind of thing?
[21:34] wpull is a python thing, you can install it p quick
[21:35] How big is the list of sites? Both in URL count and storage size
[21:36] creature: no way to bake the existing entries into html and serve them statically forever?
[21:36] you could do that with wpull
[21:37] Kaz: I'm not sure, and it turns out that our internal query tool is broken at the moment. But I think it's around 200,000 items. Basically "blog posts".
[21:38] Sanqui: No doubt there is, but the bigger problem is convincing people that this is something they should care about and that it's worth putting in the time to do that.
[21:38] if you can provide a list of base URLs, those could be grabbed pretty quickly
[21:38] yeah, we could probably squeeze that into archivebot
[21:39] *** Stiletto has joined #archiveteam
[21:39] the way that would work, you'd just give us a list of urls, and keep it running until we give you the signal, then you can take it down, and it'll stay in the wayback machine
[21:39] I'll see if I can do that as a contingency plan. That's a good idea.
[21:40] is there media? images/videos?
[21:40] or just plain blog posts?
[21:40] Images. No videos.
[21:40] ok
[21:40] But as I say, I'm sure I've seen some articles/talks about why this stuff matters. I'd love something like that that I could share around.
[21:40] sketchcow has done tons on this topic, hold on
[21:41] It might have been one of SketchCow's talks. I remember a thing about Geocities, and about how people have died who built the site & their loved ones still visit it.
[21:41] yes
[21:41] http://www.archiveteam.org/index.php?title=User:Jscott
[21:41] that is why we do this
[21:43] those are all pretty old though
[21:44] Like, my ideal outcome of this situation is that I convince a couple of the other devs & at least one product person that there's very little cost associated with it (hosting is basically free, not a lot of bandwidth, doesn't cause much maintenance, doesn't suck up developer time) and we _should_ host it (because we invited people to host it with us in the first place), so let's just keep it.
[21:44] Because at least it's not just me taking that position then.
[21:44] this talk is also good but not on that page and has some about geocities https://www.youtube.com/watch?v=-2ZTmuX3cog
[21:44] archive it, quietly, in warc form, and also carry on this conversation
[21:45] yeah, it would be great if you just kept it up. all of us are too gloomy at this point so we just focus on archival but if you can keep it running, that's ideal
[21:45] *** AlexLehm has quit IRC (Ping timeout: 260 seconds)
[21:45] *** schbirid has quit IRC (Quit: Leaving)
[21:46] but if you're thinking of archiving it, you should
[21:46] yeah. so just pass us the urls silently :)
[21:47] I don't think it's an imminent thing, it just happened to come up again in passing & I wanted to do a better job of explaining to others why I care about it.
[21:47] I'll have a look through Jason's articles and see if there's a good intro to it in there.
[21:54] I HAVE DONE MORE RECENT TALKS
[21:56] creature: I'm assuming that not naming the service in question was intentional?
[21:58] i would assume so
[21:59] joepie91: Pretty much. It wouldn't be too tricky to track it down, but as I say: no need to panic/scramble to archive it.
[21:59] alright :)
[22:00] It's an old topic that happened to come up again, and I figured this would be a good channel to ask about good intros to this stuff.
[22:00] creature: I'm mostly asking because I might have some "in the wild" data of whether the data is still considered important by people
[22:00] but that's more a curiosity point really
[22:00] archival is a good idea regardless :P
[22:01] Yeah. TBH, I'm not so worried about that. As I say, I think it's a couple of hundred thousand posts; there's gotta be at least a few in there that people care about.
[22:03] There was a form being designed the other day that was asking for a user's gender, and it was really useful to be able to link to https://www.gov.uk/service-manual/user-centred-design/resources/patterns/gender-and-sex.html to explain why that's a more complicated thing than it might seem. So I was hoping for a similar intro/resource as a counterpoint to "Let's just turn it off, nobody uses it any more, we deprecated it ag
[22:04] :)
[22:10] Oh, one complication to wpull: our site uses a _lot_ of front-end rendering. The content might not render without JavaScript. Does wpull use a full-on headless browser, or is it expecting it to be in the returned content from a URL?
[22:11] there is a phantomjs option with wpull
[22:12] Super. :)
[22:12] *** zout has quit IRC (Ping timeout: 244 seconds)
[22:17] note, this runs the js and scrolls down and stores all the resources. it doesn't flatten the DOM and then recompose it as HTML
[22:18] I'll give it a try, and see how it gets on.
[22:22] Am I imagining a Wired profile of Archiveteam, that also touches a lot on why people want to preserve this stuff?
[22:25] creature: do a test crawl and use webarchiveplayer to test it
[22:43] *** Stiletto has quit IRC (Ping timeout: 244 seconds)
[22:56] *** Start has joined #archiveteam
[23:00] *** pfallenop has quit IRC (Ping timeout: 260 seconds)
[23:08] *** Morbus has joined #archiveteam
[23:16] *** hawc145 has quit IRC (Ping timeout: 250 seconds)
[23:17] *** hawc145 has joined #archiveteam
[23:28] *** Morbus has quit IRC (hub.efnet.us ircd.choopa.net)
[23:28] *** phuzion has quit IRC (hub.efnet.us ircd.choopa.net)
[23:28] *** GLaDOS has quit IRC (hub.efnet.us ircd.choopa.net)
[23:28] *** balrog has quit IRC (hub.efnet.us ircd.choopa.net)
[23:28] *** kremlin has quit IRC (hub.efnet.us ircd.choopa.net)
[23:28] *** Smiley has quit IRC (hub.efnet.us ircd.choopa.net)
[23:28] *** MMovie has quit IRC (hub.efnet.us ircd.choopa.net)
[23:28] *** nwf has quit IRC (hub.efnet.us ircd.choopa.net)
[23:28] *** trs80 has quit IRC (hub.efnet.us ircd.choopa.net)
[23:28] *** bwn has quit IRC (hub.efnet.us ircd.choopa.net)
[23:28] *** Kenshin has quit IRC (hub.efnet.us ircd.choopa.net)
[23:28] *** maseck has quit IRC (hub.efnet.us ircd.choopa.net)
[23:28] *** creature has quit IRC (hub.efnet.us ircd.choopa.net)
[23:28] *** thefinn93 has quit IRC (hub.efnet.us ircd.choopa.net)
[23:28] *** aschmitz has quit IRC (hub.efnet.us ircd.choopa.net)
[23:28] *** Gfy has quit IRC (hub.efnet.us ircd.choopa.net)
[23:28] *** fusl has quit IRC (hub.efnet.us ircd.choopa.net)
[23:28] *** jk[SVP] has quit IRC (hub.efnet.us ircd.choopa.net)
[23:28] *** ivan has quit IRC (hub.efnet.us ircd.choopa.net)
[23:28] *** Petri152 has quit IRC (hub.efnet.us ircd.choopa.net)
[23:28] *** _Zialus_ has quit IRC (hub.efnet.us ircd.choopa.net)
[23:28] *** aMunster has quit IRC (hub.efnet.us ircd.choopa.net)
[23:33] *** fusl has joined #archiveteam
[23:33] *** Morbus has joined #archiveteam
[23:33] *** phuzion has joined #archiveteam
[23:33] *** GLaDOS has joined #archiveteam
[23:33] *** balrog has joined #archiveteam
[23:33] *** kremlin has joined #archiveteam
[23:33] *** Smiley has joined #archiveteam
[23:33] *** MMovie has joined #archiveteam
[23:33] *** nwf has joined #archiveteam
[23:33] *** trs80 has joined #archiveteam
[23:33] *** bwn has joined #archiveteam
[23:33] *** Kenshin has joined #archiveteam
[23:33] *** maseck has joined #archiveteam
[23:33] *** creature has joined #archiveteam
[23:33] *** thefinn93 has joined #archiveteam
[23:33] *** aschmitz has joined #archiveteam
[23:33] *** Gfy has joined #archiveteam
[23:33] *** jk[SVP] has joined #archiveteam
[23:33] *** ivan has joined #archiveteam
[23:33] *** Petri152 has joined #archiveteam
[23:33] *** _Zialus_ has joined #archiveteam
[23:33] *** aMunster has joined #archiveteam
[23:33] *** ircd.choopa.net sets mode: +o balrog
[23:33] *** swebb sets mode: +o balrog
[23:34] *** yuitimoth has quit IRC (Ping timeout: 246 seconds)
[23:37] *** yuitimoth has joined #archiveteam
[23:45] creature: maybe one of these? http://www.archiveteam.org/index.php?title=In_The_Media
[23:51] *** pfalleno1 has joined #archiveteam
[23:54] this seems like a thing I may have to write at some point...