#archiveteam 2016-06-24,Fri

Time Nickname Message
00:10 🔗 Stiletto has quit IRC (Ping timeout: 260 seconds)
00:21 🔗 hook54321 has quit IRC (Quit: Connection closed for inactivity)
00:31 🔗 brayden has joined #archiveteam
00:31 🔗 swebb sets mode: +o brayden
00:42 🔗 DoomTay !ao https://www.youtube.com/watch?v=yompC3id0-4 --youtube-dl
00:53 🔗 SN4T14 has quit IRC (Remote host closed the connection)
00:55 🔗 SN4T14 has joined #archiveteam
00:57 🔗 fie has joined #archiveteam
01:05 🔗 ris has quit IRC ()
01:24 🔗 hook54321 has joined #archiveteam
01:46 🔗 Jonimus has quit IRC (Read error: Connection reset by peer)
02:05 🔗 Stiletto has joined #archiveteam
02:12 🔗 vitzli has joined #archiveteam
02:31 🔗 sdfsdf has quit IRC (Quit: Page closed)
02:40 🔗 DoomTay Can someone save http://www.templeos.org/ ?
02:40 🔗 Jonimus has joined #archiveteam
02:40 🔗 swebb sets mode: +o Jonimus
02:42 🔗 tfgbd_znc has quit IRC (Read error: Connection reset by peer)
02:45 🔗 VADemon has quit IRC (Read error: Connection reset by peer)
03:31 🔗 luckcolor has quit IRC (Read error: Connection reset by peer)
03:31 🔗 hook54321 has quit IRC (Quit: Connection closed for inactivity)
03:34 🔗 luckcolor has joined #archiveteam
04:15 🔗 galaxy_an has joined #archiveteam
04:15 🔗 galaxy_an For anyone in here that's not in #archivebot, I wanted to mention:
04:15 🔗 dashcloud has quit IRC (Read error: Operation timed out)
04:17 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
04:17 🔗 galaxy_an it looks like the google melange site (where all the work from GCI and GSoC is) is going down (I've been told any day/any time now, although there's no public word)
04:17 🔗 BlueMaxim has joined #archiveteam
04:18 🔗 galaxy_an there's an "archive" on the site, but that only includes (at least for gci, I'm not sure exactly what is up with gsoc) the descriptions of tasks; it is missing all of the comments and all of the actual work submitted (which is quite important)
04:18 🔗 dashcloud has joined #archiveteam
04:19 🔗 galaxy_an for gsoc it looks like the "archive" has the abstracts, but none of the detailed proposal work or code samples or anything
04:19 🔗 galaxy_an so I was talking to a few people in #archivebot about trying to archive it quickly before it goes down (as I said, any time now)
04:20 🔗 galaxy_an JesseW got a few jobs running for GCI <= 2013 (the total data is from GCI 2010-2014, GSoC 2009-2015)
04:20 🔗 galaxy_an unfortunately, JesseW had to leave now for the weekend, so the effort is stuck halfway
04:21 🔗 galaxy_an (I can't crawl it properly on my own, and I can't submit jobs to ArchiveBot)
04:21 🔗 galaxy_an I just wanted to check if anyone else in here was interested in helping look into how to archive some of that data, which seems like it has a fair amount of value
04:21 🔗 galaxy_an (especially given how many projects reference gsoc proposals heavily in planning documents, etc.)
04:24 🔗 joepie91 wait, gsoc is going away?
04:26 🔗 MMovie1 has quit IRC (Read error: Operation timed out)
04:27 🔗 RichardG_ has joined #archiveteam
04:27 🔗 RichardG has quit IRC (Ping timeout: 260 seconds)
04:28 🔗 MMovie has joined #archiveteam
04:46 🔗 MMovie1 has joined #archiveteam
04:48 🔗 MMovie has quit IRC (Read error: Operation timed out)
04:51 🔗 MMovie1 has quit IRC (Client Quit)
04:52 🔗 MMovie has joined #archiveteam
05:01 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
05:06 🔗 DoomTay has quit IRC (Quit: Page closed)
05:07 🔗 Sk1d has joined #archiveteam
05:47 🔗 SmileyG has joined #archiveteam
05:49 🔗 Smiley has quit IRC (Read error: Operation timed out)
05:52 🔗 SketchCow UK Leaving EU
05:52 🔗 SketchCow We probably want to grab some web stuff
06:11 🔗 godane SketchCow: i got gizmodo from 2002 to 2007
06:12 🔗 godane i'm uploading 2007 right now
06:12 🔗 godane i will have to upload 2003 and 2004 later
06:14 🔗 godane SketchCow: i'm also going after mp3s for RN Breakfast from ABC
06:15 🔗 godane i'm also going to see what full episodes are in the Wayback Machine so we have a few more mp3s in the collection
06:15 🔗 godane the complete mp3s only go back to 2012-08-22
06:15 🔗 godane but they exist before that
06:17 🔗 tomwsmf-a has quit IRC (Ping timeout: 258 seconds)
06:18 🔗 Sue_ has quit IRC (Remote host closed the connection)
06:30 🔗 n0000 has joined #archiveteam
06:31 🔗 n0000 WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
06:31 🔗 n0000 has quit IRC (Client Quit)
06:33 🔗 pikhq YOUSUCKATIRC
06:38 🔗 Atluxity :P
06:39 🔗 SketchCow Folks.
06:39 🔗 SketchCow Someone asks, we give them access
06:39 🔗 SketchCow When did we add more interview questions
06:40 🔗 SketchCow (I know that didn't happen this time, but I saw it elsewhere.)
06:44 🔗 Atluxity sure
06:47 🔗 Sue_ has joined #archiveteam
07:28 🔗 Fake-Name has quit IRC (Ping timeout: 258 seconds)
07:58 🔗 dashcloud has quit IRC (Read error: Operation timed out)
08:01 🔗 schbirid has joined #archiveteam
08:02 🔗 dashcloud has joined #archiveteam
08:09 🔗 Fake-Name has joined #archiveteam
09:12 🔗 fie_ has joined #archiveteam
09:13 🔗 fie has quit IRC (Read error: Operation timed out)
10:16 🔗 BlueMaxim has quit IRC (Quit: Leaving)
10:31 🔗 Gfy has quit IRC (Quit: I'll be back!)
10:47 🔗 Gfy has joined #archiveteam
11:04 🔗 VADemon has joined #archiveteam
11:21 🔗 Morbus has joined #archiveteam
11:30 🔗 dashcloud has quit IRC (Ping timeout: 250 seconds)
11:38 🔗 dashcloud has joined #archiveteam
12:46 🔗 RichardG has joined #archiveteam
12:47 🔗 RichardG_ has quit IRC (Ping timeout: 244 seconds)
13:27 🔗 arkiver scripts for arto are updated
13:27 🔗 arkiver items requeued
13:27 🔗 arkiver they'll be closing the server the 30th of this month
13:53 🔗 dashcloud has quit IRC (Read error: Operation timed out)
13:56 🔗 dashcloud has joined #archiveteam
14:01 🔗 j08nY has joined #archiveteam
14:10 🔗 arrith has joined #archiveteam
14:59 🔗 DoomTay has joined #archiveteam
15:19 🔗 khaoohs has joined #archiveteam
15:34 🔗 metalcamp has joined #archiveteam
15:51 🔗 mutoso_ has joined #archiveteam
15:53 🔗 mutoso has quit IRC (Read error: Operation timed out)
16:44 🔗 dashcloud has quit IRC (Read error: Operation timed out)
16:46 🔗 atomotic has joined #archiveteam
16:48 🔗 atomotic has quit IRC (Client Quit)
16:54 🔗 godane has quit IRC (Read error: Operation timed out)
16:58 🔗 dashcloud has joined #archiveteam
17:06 🔗 RichardG has quit IRC (Read error: Operation timed out)
17:19 🔗 godane has joined #archiveteam
17:28 🔗 Tomcat_ has joined #archiveteam
17:44 🔗 RichardG has joined #archiveteam
18:01 🔗 galaxy_an joepie91: sorry for disappearing, this whole thing has been at the extreme of inopportune times for me
18:02 🔗 galaxy_an GSoC the competition is still happening, but it's hosted on a new platform
18:02 🔗 galaxy_an google is shutting down the old platform for GSoC and their other competition (GCI)
18:02 🔗 galaxy_an they have an "archive" but it doesn't archive most of the important stuff
18:03 🔗 galaxy_an it's hard to crawl, since all of the links to the individual pages with important information come from tables that are dynamically set up by js that parses json from an xhr
18:05 🔗 galaxy_an I should be around most of the rest of today, with intermittent connectivity drops
18:05 🔗 galaxy_an but I don't have the infra needed to actually do the crawling
18:35 🔗 Froggypwn has joined #archiveteam
18:38 🔗 joepie91 galaxy_an: hm. would phantomjs be sufficient for archiving it?
18:39 🔗 DoomTay I think it was brought up earlier that it MIGHT help, but no guarantees
18:45 🔗 galaxy_an joepie91: we tried that in archivebot the night before last
18:45 🔗 galaxy_an it doesn't seem to work
18:46 🔗 galaxy_an (especially since the whole list is paginated, but there's another problem as well)
18:46 🔗 galaxy_an I believe that we decided that the best way to do it was to use the JSON that is used to make the lists
18:47 🔗 galaxy_an (I'd imagine that we'd ideally archive that JSON as well, just in case it has any extra useful metadata)
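The JSON-first approach described above could be sketched roughly as follows. This is a minimal Python sketch, not what was actually run: the field names `data`, `link`, and `next`, the `fetch_page` callback, and the example.org URLs are all assumptions made for illustration, since the log does not show the real response format.

```python
import json
from urllib.parse import urljoin


def extract_links(payload, base_url):
    """Return absolute URLs for the rows in one listing response.

    Assumption: rows live under payload["data"] and each row has a
    relative URL under row["link"]. Both names are hypothetical.
    """
    return [
        urljoin(base_url, row["link"])
        for row in payload.get("data", [])
        if row.get("link")
    ]


def crawl_listing(fetch_page, base_url):
    """Walk a paginated JSON listing and collect per-item page URLs.

    fetch_page(cursor) returns the raw JSON text of one page; the first
    call receives None. Assumption: a hypothetical "next" field holds
    the cursor of the following page and is missing or null at the end.
    The raw pages are returned too, so the JSON itself can be archived.
    """
    raw_pages, urls, seen = [], [], set()
    cursor = None
    while True:
        raw = fetch_page(cursor)
        raw_pages.append(raw)
        payload = json.loads(raw)
        urls.extend(extract_links(payload, base_url))
        cursor = payload.get("next")
        # Stop at the last page, or if the server repeats a cursor.
        if not cursor or cursor in seen:
            break
        seen.add(cursor)
    return raw_pages, urls


# Canned two-page listing standing in for the real XHR responses.
pages = {
    None: json.dumps({"data": [{"link": "/task/1"}, {"link": "/task/2"}], "next": "p2"}),
    "p2": json.dumps({"data": [{"link": "/task/3"}], "next": None}),
}
raw_pages, urls = crawl_listing(pages.__getitem__, "https://example.org/")
```

The resulting URL list could then be handed to a crawler as a plain seed list, while `raw_pages` keeps the listing JSON in case it carries extra metadata.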
19:03 🔗 atomotic has joined #archiveteam
19:10 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
19:13 🔗 swebb I started a full gawker grab of all of their sites using the heritrix 3.3.0 engine. Will probably take a while.
19:35 🔗 vitzli has quit IRC (Leaving)
19:56 🔗 BartoCH has quit IRC (Read error: Connection reset by peer)
20:06 🔗 BartoCH has joined #archiveteam
20:21 🔗 schbirid https://twitter.com/AuswaertigesAmt/status/746422386598223872 please :D
20:24 🔗 tomwsmf-a has joined #archiveteam
20:32 🔗 DoomTay Done
20:33 🔗 ris has joined #archiveteam
20:36 🔗 schbirid thanks
21:37 🔗 Aranje has joined #archiveteam
21:48 🔗 Tomcat_ has quit IRC (Remote host closed the connection)
22:27 🔗 DoomTay has quit IRC (Quit: Page closed)
22:51 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
23:24 🔗 j08nY has quit IRC (Quit: Leaving)
23:24 🔗 galaxy_an the strategy that we've been trying to use for gci/gsoc crawling isn't working
23:24 🔗 galaxy_an (the main task tables don't have all the tasks in them, for gci at least)
23:24 🔗 galaxy_an I am quite out of time to work on this now, unfortunately; does anyone else have the time to look into it?
23:28 🔗 galaxy_an oh....
