#archiveteam 2016-09-29,Thu

Time Nickname Message
00:32 🔗 JesseW has joined #archiveteam
00:35 🔗 Morbus has quit IRC (Read error: Operation timed out)
01:08 🔗 kremlin has joined #archiveteam
01:16 🔗 wp494_ is now known as wp494
01:23 🔗 SketchCow Who is uploading into wacko_files on FOS?
01:23 🔗 hawc145 has joined #archiveteam
01:24 🔗 SketchCow Whoever it is, you uploaded 67 gigabytes of basically commercial crap into it. Ub Iwerks, Animaniacs, Powerpuff Girls, etc.
01:24 🔗 SketchCow Deleted.
01:24 🔗 SketchCow All of it.
01:25 🔗 SketchCow Nothing uploaded. If we ever figure out who that is, they should be told to stop.
01:26 🔗 SketchCow Same with a 14gb collection of "toontastic". Deleted.
01:27 🔗 HCross has quit IRC (Ping timeout: 370 seconds)
01:48 🔗 BlueMaxim has joined #archiveteam
01:59 🔗 WinterFox has joined #archiveteam
02:06 🔗 kyounko has joined #archiveteam
02:22 🔗 balrog has joined #archiveteam
02:22 🔗 swebb sets mode: +o balrog
02:23 🔗 zout ok, hackforums is sorted.
02:23 🔗 zout >10reqs/second, heavily filtered to just useful stuff.
02:25 🔗 balrog has quit IRC (Read error: Operation timed out)
02:36 🔗 balrog has joined #archiveteam
02:36 🔗 swebb sets mode: +o balrog
02:57 🔗 balrog has quit IRC (Quit: Bye)
03:21 🔗 zyphlar has joined #archiveteam
03:26 🔗 maelstrom has quit IRC (Quit: Leaving)
03:28 🔗 maelstrom has joined #archiveteam
03:36 🔗 balrog has joined #archiveteam
03:36 🔗 swebb sets mode: +o balrog
04:27 🔗 maelstrom has quit IRC (Read error: Operation timed out)
04:33 🔗 maelstrom has joined #archiveteam
04:42 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
04:49 🔗 Sk1d has joined #archiveteam
05:03 🔗 maelstrom has quit IRC (Leaving)
05:20 🔗 randomdes has quit IRC (Read error: Operation timed out)
05:24 🔗 twrist has joined #archiveteam
05:25 🔗 GLaDOS has quit IRC (Read error: Operation timed out)
05:25 🔗 twrist is now known as GLaDOS
05:36 🔗 yuitimoth has quit IRC (Ping timeout: 506 seconds)
05:50 🔗 yuitimoth has joined #archiveteam
06:52 🔗 JesseW has quit IRC (Read error: Operation timed out)
07:40 🔗 ravetcofx has quit IRC (Quit: Leaving)
07:55 🔗 zout https://archive.org/details/GithubReadme2016September
07:55 🔗 zout ^ capture of every single github readme :)
08:14 🔗 vOYtEC has quit IRC (Quit: rm -r *)
08:19 🔗 vOYtEC has joined #archiveteam
08:19 🔗 schbird has joined #archiveteam
08:19 🔗 schbird https://de.wikipedia.org/wiki/EinsPlus and https://de.wikipedia.org/wiki/ZDFkultur are going to be shut down, not sure about their existing content
08:38 🔗 vOYtEC has quit IRC (Ping timeout: 370 seconds)
08:48 🔗 schbird also, oh shit. http://www.bbc.co.uk/news/business-37503109
08:49 🔗 schbird Spotify 'in talks to take over Soundcloud'
09:16 🔗 atomotic has joined #archiveteam
09:27 🔗 schbirid has joined #archiveteam
09:35 🔗 schbird has quit IRC (Ping timeout: 255 seconds)
09:35 🔗 atomotic has quit IRC (Ping timeout: 260 seconds)
09:44 🔗 vOYtEC has joined #archiveteam
09:55 🔗 BartoCH has quit IRC (Ping timeout: 260 seconds)
09:59 🔗 yuitimoth has quit IRC (Ping timeout: 246 seconds)
10:05 🔗 vOYtEC has quit IRC (Ping timeout: 259 seconds)
10:08 🔗 yuitimoth has joined #archiveteam
10:08 🔗 zyphlar has quit IRC (Quit: Connection closed for inactivity)
10:13 🔗 godane has quit IRC (Ping timeout: 244 seconds)
10:24 🔗 BartoCH has joined #archiveteam
10:25 🔗 arkiver zout: awesome, so you're now using wpull with a custom script?
10:27 🔗 godane has joined #archiveteam
10:30 🔗 BartoCH has quit IRC (Ping timeout: 260 seconds)
10:31 🔗 BartoCH has joined #archiveteam
11:13 🔗 kyounko has quit IRC (KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
11:14 🔗 BlueMaxim has quit IRC (Quit: Leaving)
11:40 🔗 r3c0d3x has quit IRC (Ping timeout: 260 seconds)
11:52 🔗 r3c0d3x has joined #archiveteam
12:15 🔗 kris33 has joined #archiveteam
12:24 🔗 kristian_ has joined #archiveteam
12:38 🔗 godane has quit IRC (Quit: Leaving.)
12:52 🔗 RichardG has quit IRC (Ping timeout: 255 seconds)
13:01 🔗 RichardG has joined #archiveteam
13:24 🔗 yuitimoth has quit IRC (Ping timeout: 246 seconds)
13:25 🔗 yuitimoth has joined #archiveteam
13:33 🔗 kris33_ has joined #archiveteam
13:33 🔗 kris33 has quit IRC (Read error: Connection reset by peer)
13:34 🔗 dashcloud has quit IRC (Read error: Operation timed out)
13:37 🔗 dashcloud has joined #archiveteam
13:44 🔗 Super-Dot has quit IRC (Quit: Connection closed for inactivity)
14:17 🔗 godane has joined #archiveteam
14:21 🔗 yeoldetoa has joined #archiveteam
14:22 🔗 kris33_ has quit IRC (Textual IRC Client: www.textualapp.com)
14:34 🔗 Start has quit IRC (Ping timeout: 506 seconds)
14:39 🔗 dashcloud has quit IRC (Read error: Operation timed out)
14:43 🔗 dashcloud has joined #archiveteam
14:44 🔗 Start has joined #archiveteam
14:45 🔗 PurpleSym sets mode: -b *!*@c-24-1-111-31.hsd1.il.comcast.net
14:50 🔗 laufwerkf has joined #archiveteam
14:51 🔗 Start has quit IRC (Read error: Operation timed out)
14:52 🔗 laufwerkf has quit IRC (Read error: Connection reset by peer)
14:52 🔗 PurpleSym sets mode: +b *!*@c-24-1-111-31.hsd1.il.comcast.net
15:20 🔗 WinterFox has quit IRC (Read error: Operation timed out)
15:44 🔗 JesseW has joined #archiveteam
15:44 🔗 kristian_ has quit IRC (Quit: Leaving)
16:12 🔗 VADemon has joined #archiveteam
16:17 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
16:52 🔗 randomdes has joined #archiveteam
17:57 🔗 bRick5772 has joined #archiveteam
18:01 🔗 JW_work1 has joined #archiveteam
18:03 🔗 JW_work has quit IRC (Read error: Operation timed out)
18:13 🔗 AlexLehm has joined #archiveteam
18:56 🔗 phuzion has quit IRC (Read error: Operation timed out)
19:20 🔗 bRick5772 has quit IRC (Quit: Leaving.)
19:26 🔗 maelstrom has joined #archiveteam
19:33 🔗 phuzion has joined #archiveteam
20:13 🔗 Stiletto has quit IRC (Read error: Operation timed out)
20:50 🔗 VADemon has quit IRC (Quit: left4dead)
21:30 🔗 creature So, there is some talk at work about sunsetting a feature. Or more accurately, finally remove it from life support. It is a publishing thing that hasn't been supported for a couple of years now. No way to edit things, no way to create new ones, no way to view old ones. They're still accessible if you know the URL, so still available via Google/bookmarks/etc, but you're not going to stumble across them.
21:31 🔗 creature I am of the opinion that, absent a good reason to remove it, we should just leave it. It's not doing any harm (no lost productivity/minimal maintenance burden, etc), and people trusted us with their content. On the other hand, it is deprecated, and it is legacy code, and we would like to clean up that kind of stuff. So there is a growing position for "Anyone who wanted this will have taken a copy already, let's just kill
21:31 🔗 xmc "no way to view old ones" and then you give the way to view old ones
21:31 🔗 creature Can anyone recommend any good links that advocate for my perspective on this? I'd like to have something better than my own voice in this discussion.
21:31 🔗 creature xmc: Sure, but I mean it's not exposed in the site navigation.
21:31 🔗 xmc make a list of urls, run a warc-creating spider (e.g. wpull) on it, and give the warc files to us
21:32 🔗 creature How long would you estimate that process would take me? Setting up the spider & shipping you the content? We have a hack day coming up tomorrow, so I could squeeze it into that.
21:34 🔗 creature Like, is that more like "This'll take you an hour to set up" or "this is fiddly, you'll need to spin up a VM & fight with some bash scripts" kind of thing?
21:34 🔗 xmc wpull is a python thing, you can install it p quick
21:35 🔗 Kaz How big is the list of sites? Both in URL count and storage size
21:36 🔗 Sanqui creature: no way to bake the existing entries into html and serve them statically forever?
21:36 🔗 xmc you could do that with wpull
21:37 🔗 creature Kaz: I'm not sure, and it turns out that our internal query tool is broken at the moment. But I think it's around 200,000 items. Basically "blog posts".
21:38 🔗 creature Sanqui: No doubt there is, but the bigger problem is convincing people that this is something they should care about and that it's worth putting in the time to do that.
21:38 🔗 Kaz if you can provide a list of base URLs, those could be grabbed pretty quickly
21:38 🔗 Sanqui yeah, we could probably squeeze that into archivebot
21:39 🔗 Stiletto has joined #archiveteam
21:39 🔗 Sanqui the way that would work, you'd just give us a list of urls, and keep it running until we give you the signal, then you can take it down, and it'll stay in the wayback machine
21:39 🔗 creature I'll see if I can do that as a contingency plan. That's a good idea.
21:40 🔗 Sanqui is there media? images/videos?
21:40 🔗 Sanqui or just plain blog posts?
21:40 🔗 creature Images. No videos.
21:40 🔗 Sanqui ok
21:40 🔗 creature But as I say, I'm sure I've seen some articles/talks about why this stuff matters. I'd love something like that that I could share around.
21:40 🔗 Sanqui sketchcow has done tons on this topic, hold on
21:41 🔗 creature It might have been one of SketchCow's talks. I remember a thing about Geocities, and about how people have died who built the site & their loved ones still visit it.
21:41 🔗 xmc yes
21:41 🔗 Sanqui http://www.archiveteam.org/index.php?title=User:Jscott
21:41 🔗 xmc that is why we do this
21:43 🔗 Sanqui those are all pretty old though
21:44 🔗 creature Like, my ideal outcome of this situation is that I convince a couple of the other devs & at least one product person that there's very little cost associated with it (hosting is basically free, not a lot of bandwidth, doesn't cause much maintenance, doesn't suck up developer time) and we _should_ host it (because we invited people to host it with us in the first place), so let's just keep it.
21:44 🔗 creature Because at least it's not just me taking that position then.
21:44 🔗 achip this talk is also good but not on that page and has some about geocities https://www.youtube.com/watch?v=-2ZTmuX3cog
21:44 🔗 xmc archive it, quietly, in warc form, and also carry on this conversation
21:45 🔗 Sanqui yeah, it would be great if you just kept it up. all of us are too gloomy at this point so we just focus on archival but if you can keep it running, that's ideal
21:45 🔗 AlexLehm has quit IRC (Ping timeout: 260 seconds)
21:45 🔗 schbirid has quit IRC (Quit: Leaving)
21:46 🔗 xmc but if you're thinking of archiving it, you should
21:46 🔗 Sanqui yeah. so just pass us the urls silently :)
21:47 🔗 creature I don't think it's an imminent thing, it just happened to come up again in passing & I wanted to do a better job of explaining to others why I care about it.
21:47 🔗 creature I'll have a look through Jason's articles and see if there's a good intro to it in there.
21:54 🔗 SketchCow I HAVE DONE MORE RECENT TALKS
21:56 🔗 joepie91 creature: I'm assuming that not naming the service in question was intentional?
21:58 🔗 xmc i would assume so
21:59 🔗 creature joepie91: Pretty much. It wouldn't be too tricky to track it down, but as I say: no need to panic/scramble to archive it.
21:59 🔗 joepie91 alright :)
22:00 🔗 creature It's an old topic that happened to come up again, and I figured this would be a good channel to ask about good intros to this stuff.
22:00 🔗 joepie91 creature: I'm mostly asking because I might have some "in the wild" data of whether the data is still considered important by people
22:00 🔗 joepie91 but that's more a curiosity point really
22:00 🔗 joepie91 archival is a good idea regardless :P
22:01 🔗 creature Yeah. TBH, I'm not so worried about that. As I say, I think it's a couple of hundred thousand posts; there's gotta be at least a few in there that people care about.
22:03 🔗 creature There was a form being designed the other day that was asking for a user's gender, and it was really useful to be able to link to https://www.gov.uk/service-manual/user-centred-design/resources/patterns/gender-and-sex.html to explain why that's a more complicated thing than it might seem. So I was hoping for a similar intro/resource as a counterpoint to "Let's just turn it off, nobody uses it any more, we deprecated it ag
22:04 🔗 xmc :)
22:10 🔗 creature Oh, one complication to wpull: our site uses a _lot_ of front-end rendering. The content might not render without JavaScript. Does wpull use a full-on headless browser, or is it expecting it to be in the returned content from a URL?
22:11 🔗 Frogging there is a phantomjs option with wpull
22:12 🔗 creature Super. :)
22:12 🔗 zout has quit IRC (Ping timeout: 244 seconds)
22:17 🔗 xmc note, this runs the js and scrolls down and stores all the resources. it doesn't flatten the DOM and then recompose it as HTML
22:18 🔗 creature I'll give it a try, and see how it gets on.
22:22 🔗 creature Am I imagining a Wired profile of Archiveteam, that also touches a lot on why people want to preserve this stuff?
22:25 🔗 HCross2 creature: do a test crawl and use webarchiveplayer to test it
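
[Editor's sketch] The workflow xmc outlines at 21:31 (make a list of URLs, run a WARC-creating spider such as wpull on it) together with Frogging's PhantomJS note at 22:11 could look roughly like the following. This is only an illustration under assumptions: the file name `urls.txt`, the WARC base name `legacy-posts`, the example URLs, and both helper functions are hypothetical, and the flags shown (`--input-file`, `--warc-file`, `--page-requisites`, `--phantomjs`) should be checked against `wpull --help` for the installed version. The script only builds and prints the command; it does not run a crawl.

```python
import shlex


def clean_urls(urls):
    """Strip whitespace, drop blank lines and duplicates, keep original order."""
    seen = set()
    cleaned = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned


def build_wpull_command(url_list_path, warc_name, use_phantomjs=False):
    """Return the argument list for a wpull run over a prepared URL list.

    Flag names are assumed from wpull's wget-style interface; verify them
    locally before relying on this.
    """
    cmd = [
        "wpull",
        "--input-file", str(url_list_path),  # one URL per line
        "--warc-file", warc_name,            # base name of the WARC output
        "--page-requisites",                 # also fetch images, CSS, etc.
    ]
    if use_phantomjs:
        # For pages that only render their content with JavaScript.
        cmd.append("--phantomjs")
    return cmd


if __name__ == "__main__":
    # Hypothetical input; a real list would come from the site's own database.
    urls = clean_urls([
        "https://example.com/posts/1",
        "https://example.com/posts/1",
        "",
        "https://example.com/posts/2 ",
    ])
    print(len(urls), "unique URLs")
    print(shlex.join(build_wpull_command("urls.txt", "legacy-posts", use_phantomjs=True)))
```

As HCross2 suggests, a small test crawl replayed in webarchiveplayer is a sensible check before committing to the full list, especially given xmc's caveat that the PhantomJS mode stores the fetched resources rather than a flattened DOM.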
22:43 🔗 Stiletto has quit IRC (Ping timeout: 244 seconds)
22:56 🔗 Start has joined #archiveteam
23:00 🔗 pfallenop has quit IRC (Ping timeout: 260 seconds)
23:08 🔗 Morbus has joined #archiveteam
23:16 🔗 hawc145 has quit IRC (Ping timeout: 250 seconds)
23:17 🔗 hawc145 has joined #archiveteam
23:28 🔗 Morbus has quit IRC (hub.efnet.us ircd.choopa.net)
23:28 🔗 phuzion has quit IRC (hub.efnet.us ircd.choopa.net)
23:28 🔗 GLaDOS has quit IRC (hub.efnet.us ircd.choopa.net)
23:28 🔗 balrog has quit IRC (hub.efnet.us ircd.choopa.net)
23:28 🔗 kremlin has quit IRC (hub.efnet.us ircd.choopa.net)
23:28 🔗 Smiley has quit IRC (hub.efnet.us ircd.choopa.net)
23:28 🔗 MMovie has quit IRC (hub.efnet.us ircd.choopa.net)
23:28 🔗 nwf has quit IRC (hub.efnet.us ircd.choopa.net)
23:28 🔗 trs80 has quit IRC (hub.efnet.us ircd.choopa.net)
23:28 🔗 bwn has quit IRC (hub.efnet.us ircd.choopa.net)
23:28 🔗 Kenshin has quit IRC (hub.efnet.us ircd.choopa.net)
23:28 🔗 maseck has quit IRC (hub.efnet.us ircd.choopa.net)
23:28 🔗 creature has quit IRC (hub.efnet.us ircd.choopa.net)
23:28 🔗 thefinn93 has quit IRC (hub.efnet.us ircd.choopa.net)
23:28 🔗 aschmitz has quit IRC (hub.efnet.us ircd.choopa.net)
23:28 🔗 Gfy has quit IRC (hub.efnet.us ircd.choopa.net)
23:28 🔗 fusl has quit IRC (hub.efnet.us ircd.choopa.net)
23:28 🔗 jk[SVP] has quit IRC (hub.efnet.us ircd.choopa.net)
23:28 🔗 ivan has quit IRC (hub.efnet.us ircd.choopa.net)
23:28 🔗 Petri152 has quit IRC (hub.efnet.us ircd.choopa.net)
23:28 🔗 _Zialus_ has quit IRC (hub.efnet.us ircd.choopa.net)
23:28 🔗 aMunster has quit IRC (hub.efnet.us ircd.choopa.net)
23:33 🔗 fusl has joined #archiveteam
23:33 🔗 Morbus has joined #archiveteam
23:33 🔗 phuzion has joined #archiveteam
23:33 🔗 GLaDOS has joined #archiveteam
23:33 🔗 balrog has joined #archiveteam
23:33 🔗 kremlin has joined #archiveteam
23:33 🔗 Smiley has joined #archiveteam
23:33 🔗 MMovie has joined #archiveteam
23:33 🔗 nwf has joined #archiveteam
23:33 🔗 trs80 has joined #archiveteam
23:33 🔗 bwn has joined #archiveteam
23:33 🔗 Kenshin has joined #archiveteam
23:33 🔗 maseck has joined #archiveteam
23:33 🔗 creature has joined #archiveteam
23:33 🔗 thefinn93 has joined #archiveteam
23:33 🔗 aschmitz has joined #archiveteam
23:33 🔗 Gfy has joined #archiveteam
23:33 🔗 jk[SVP] has joined #archiveteam
23:33 🔗 ivan has joined #archiveteam
23:33 🔗 Petri152 has joined #archiveteam
23:33 🔗 _Zialus_ has joined #archiveteam
23:33 🔗 aMunster has joined #archiveteam
23:33 🔗 ircd.choopa.net sets mode: +o balrog
23:33 🔗 swebb sets mode: +o balrog
23:34 🔗 yuitimoth has quit IRC (Ping timeout: 246 seconds)
23:37 🔗 yuitimoth has joined #archiveteam
23:45 🔗 DFJustin creature: maybe one of these? http://www.archiveteam.org/index.php?title=In_The_Media
23:51 🔗 pfalleno1 has joined #archiveteam
23:54 🔗 joepie91 this seems like a thing I may have to write at some point...
