Time |
Nickname |
Message |
00:10
🔗
|
|
Stiletto has quit IRC (Ping timeout: 260 seconds) |
00:21
🔗
|
|
hook54321 has quit IRC (Quit: Connection closed for inactivity) |
00:31
🔗
|
|
brayden has joined #archiveteam |
00:31
🔗
|
|
swebb sets mode: +o brayden |
00:42
🔗
|
DoomTay |
!ao https://www.youtube.com/watch?v=yompC3id0-4 --youtube-dl |
00:53
🔗
|
|
SN4T14 has quit IRC (Remote host closed the connection) |
00:55
🔗
|
|
SN4T14 has joined #archiveteam |
00:57
🔗
|
|
fie has joined #archiveteam |
01:05
🔗
|
|
ris has quit IRC () |
01:24
🔗
|
|
hook54321 has joined #archiveteam |
01:46
🔗
|
|
Jonimus has quit IRC (Read error: Connection reset by peer) |
02:05
🔗
|
|
Stiletto has joined #archiveteam |
02:12
🔗
|
|
vitzli has joined #archiveteam |
02:31
🔗
|
|
sdfsdf has quit IRC (Quit: Page closed) |
02:40
🔗
|
DoomTay |
Can someone save http://www.templeos.org/ ? |
02:40
🔗
|
|
Jonimus has joined #archiveteam |
02:40
🔗
|
|
swebb sets mode: +o Jonimus |
02:42
🔗
|
|
tfgbd_znc has quit IRC (Read error: Connection reset by peer) |
02:45
🔗
|
|
VADemon has quit IRC (Read error: Connection reset by peer) |
03:31
🔗
|
|
luckcolor has quit IRC (Read error: Connection reset by peer) |
03:31
🔗
|
|
hook54321 has quit IRC (Quit: Connection closed for inactivity) |
03:34
🔗
|
|
luckcolor has joined #archiveteam |
04:15
🔗
|
|
galaxy_an has joined #archiveteam |
04:15
🔗
|
galaxy_an |
For anyone in here that's not in #archivebot, I wanted to mention: |
04:15
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
04:17
🔗
|
|
BlueMaxim has quit IRC (Read error: Operation timed out) |
04:17
🔗
|
galaxy_an |
it looks like the google melange site (where all the work from GCI and GSoC is) is going down (I've been told any day/any time now, although there's no public word) |
04:17
🔗
|
|
BlueMaxim has joined #archiveteam |
04:18
🔗
|
galaxy_an |
there's an "archive" on the site, but that only includes (at least for gci, I'm not sure exactly what is up with gsoc) the descriptions of tasks; it is missing all of the comments and all of the actual work submitted (which is quite important) |
04:18
🔗
|
|
dashcloud has joined #archiveteam |
04:19
🔗
|
galaxy_an |
for gsoc it looks like the "archive" has the abstracts, but none of the detailed proposal work or code samples or anything |
04:19
🔗
|
galaxy_an |
so I was talking to a few people in #archivebot about trying to archive it quickly before it goes down (as I said, any time now) |
04:20
🔗
|
galaxy_an |
JesseW got a few jobs running for GCI <= 2013 (the total data is from GCI 2010-2014, GSoC 2009-2015) |
04:20
🔗
|
galaxy_an |
unfortunately, JesseW had to leave now for the weekend, so the effort is stuck halfway |
04:21
🔗
|
galaxy_an |
(I can't crawl it properly on my own, and I can't submit jobs to ArchiveBot) |
04:21
🔗
|
galaxy_an |
I just wanted to check if anyone else in here was interested in helping look into how to archive some of that data, which seems like it has a fair amount of value |
04:21
🔗
|
galaxy_an |
(especially given how many projects reference gsoc proposals heavily in planning documents, etc.) |
04:24
🔗
|
joepie91 |
wait, gsoc is going away? |
04:26
🔗
|
|
MMovie1 has quit IRC (Read error: Operation timed out) |
04:27
🔗
|
|
RichardG_ has joined #archiveteam |
04:27
🔗
|
|
RichardG has quit IRC (Ping timeout: 260 seconds) |
04:28
🔗
|
|
MMovie has joined #archiveteam |
04:46
🔗
|
|
MMovie1 has joined #archiveteam |
04:48
🔗
|
|
MMovie has quit IRC (Read error: Operation timed out) |
04:51
🔗
|
|
MMovie1 has quit IRC (Client Quit) |
04:52
🔗
|
|
MMovie has joined #archiveteam |
05:01
🔗
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
05:06
🔗
|
|
DoomTay has quit IRC (Quit: Page closed) |
05:07
🔗
|
|
Sk1d has joined #archiveteam |
05:47
🔗
|
|
SmileyG has joined #archiveteam |
05:49
🔗
|
|
Smiley has quit IRC (Read error: Operation timed out) |
05:52
🔗
|
SketchCow |
UK Leaving EU |
05:52
🔗
|
SketchCow |
We probably want to grab some web stuff |
06:11
🔗
|
godane |
SketchCow: i got gizmodo from 2002 to 2007 |
06:12
🔗
|
godane |
i'm uploading 2007 right now |
06:12
🔗
|
godane |
i will have to upload 2003 and 2004 later |
06:14
🔗
|
godane |
SketchCow: i'm also going after mp3s for RN Breakfast from ABC |
06:15
🔗
|
godane |
i'm also going to see what full episodes are in way back so we have few more mp3s in the collection |
06:15
🔗
|
godane |
the complete mp3s only go back to 2012-08-22 |
06:15
🔗
|
godane |
but they exist before that |
06:17
🔗
|
|
tomwsmf-a has quit IRC (Ping timeout: 258 seconds) |
06:18
🔗
|
|
Sue_ has quit IRC (Remote host closed the connection) |
06:30
🔗
|
|
n0000 has joined #archiveteam |
06:31
🔗
|
n0000 |
WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD |
06:31
🔗
|
|
n0000 has quit IRC (Client Quit) |
06:33
🔗
|
pikhq |
YOUSUCKATIRC |
06:38
🔗
|
Atluxity |
:P |
06:39
🔗
|
SketchCow |
Folks. |
06:39
🔗
|
SketchCow |
Someone asks, we give them access |
06:39
🔗
|
SketchCow |
When did we add more interview questions |
06:40
🔗
|
SketchCow |
(I know that didn't happen this time, but I saw it elsewhere.) |
06:44
🔗
|
Atluxity |
sure |
06:47
🔗
|
|
Sue_ has joined #archiveteam |
07:28
🔗
|
|
Fake-Name has quit IRC (Ping timeout: 258 seconds) |
07:58
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
08:01
🔗
|
|
schbirid has joined #archiveteam |
08:02
🔗
|
|
dashcloud has joined #archiveteam |
08:09
🔗
|
|
Fake-Name has joined #archiveteam |
09:12
🔗
|
|
fie_ has joined #archiveteam |
09:13
🔗
|
|
fie has quit IRC (Read error: Operation timed out) |
10:16
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
10:31
🔗
|
|
Gfy has quit IRC (Quit: I'll be back!) |
10:47
🔗
|
|
Gfy has joined #archiveteam |
11:04
🔗
|
|
VADemon has joined #archiveteam |
11:21
🔗
|
|
Morbus has joined #archiveteam |
11:30
🔗
|
|
dashcloud has quit IRC (Ping timeout: 250 seconds) |
11:38
🔗
|
|
dashcloud has joined #archiveteam |
12:46
🔗
|
|
RichardG has joined #archiveteam |
12:47
🔗
|
|
RichardG_ has quit IRC (Ping timeout: 244 seconds) |
13:27
🔗
|
arkiver |
scripts for arto are updated |
13:27
🔗
|
arkiver |
items requeued |
13:27
🔗
|
arkiver |
they'll be closing the server the 30th of this month |
13:53
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
13:56
🔗
|
|
dashcloud has joined #archiveteam |
14:01
🔗
|
|
j08nY has joined #archiveteam |
14:10
🔗
|
|
arrith has joined #archiveteam |
14:59
🔗
|
|
DoomTay has joined #archiveteam |
15:19
🔗
|
|
khaoohs has joined #archiveteam |
15:34
🔗
|
|
metalcamp has joined #archiveteam |
15:51
🔗
|
|
mutoso_ has joined #archiveteam |
15:53
🔗
|
|
mutoso has quit IRC (Read error: Operation timed out) |
16:44
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
16:46
🔗
|
|
atomotic has joined #archiveteam |
16:48
🔗
|
|
atomotic has quit IRC (Client Quit) |
16:54
🔗
|
|
godane has quit IRC (Read error: Operation timed out) |
16:58
🔗
|
|
dashcloud has joined #archiveteam |
17:06
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
17:19
🔗
|
|
godane has joined #archiveteam |
17:28
🔗
|
|
Tomcat_ has joined #archiveteam |
17:44
🔗
|
|
RichardG has joined #archiveteam |
18:01
🔗
|
galaxy_an |
joepie91: sorry for disappearing, this whole thing has been at the extreme of inopportune times for me |
18:02
🔗
|
galaxy_an |
GSoC the competition is still happening, but it's hosted on a new platform |
18:02
🔗
|
galaxy_an |
google is shutting down the old platform for GSoC and their other competition (GCI) |
18:02
🔗
|
galaxy_an |
they have an "archive" but it doesn't archive most of the important stuff |
18:03
🔗
|
galaxy_an |
it's hard to crawl, since all of the links to the individual pages with important information come from tables that are dynamically set up by js that parses json from an xhr |
18:05
🔗
|
galaxy_an |
I should be around most of the rest of today, with intermittent connectivity drops |
18:05
🔗
|
galaxy_an |
but I don't have the infra needed to actually do the crawling |
18:35
🔗
|
|
Froggypwn has joined #archiveteam |
18:38
🔗
|
joepie91 |
galaxy_an: hm. would phantomjs be sufficient for archiving it? |
18:39
🔗
|
DoomTay |
I think it was brought up earlier that it MIGHT help, but no guarantees |
18:45
🔗
|
galaxy_an |
joepie91: we tried that in archivebot the night before last |
18:45
🔗
|
galaxy_an |
it doesn |
18:45
🔗
|
galaxy_an |
't seem to work |
18:46
🔗
|
galaxy_an |
(especially since the whole list is paginated, but there's another problem as well) |
18:46
🔗
|
galaxy_an |
I believe that we decided that the best way to do it was to use the JSON that is used to make the lists |
18:47
🔗
|
galaxy_an |
(I'd imagine that we'd ideally archive that JSON as well, just in case it has any extra useful metadata) |
19:03
🔗
|
|
atomotic has joined #archiveteam |
19:10
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
19:13
🔗
|
swebb |
I started a full gawker grab of all of their sites using the heritrix 3.3.0 engine. Will probably take a while. |
19:35
🔗
|
|
vitzli has quit IRC (Leaving) |
19:56
🔗
|
|
BartoCH has quit IRC (Read error: Connection reset by peer) |
20:06
🔗
|
|
BartoCH has joined #archiveteam |
20:21
🔗
|
schbirid |
https://twitter.com/AuswaertigesAmt/status/746422386598223872 please :D |
20:24
🔗
|
|
tomwsmf-a has joined #archiveteam |
20:32
🔗
|
DoomTay |
Done |
20:33
🔗
|
|
ris has joined #archiveteam |
20:36
🔗
|
schbirid |
thanks |
21:37
🔗
|
|
Aranje has joined #archiveteam |
21:48
🔗
|
|
Tomcat_ has quit IRC (Remote host closed the connection) |
22:27
🔗
|
|
DoomTay has quit IRC (Quit: Page closed) |
22:51
🔗
|
|
metalcamp has quit IRC (Ping timeout: 244 seconds) |
23:24
🔗
|
|
j08nY has quit IRC (Quit: Leaving) |
23:24
🔗
|
galaxy_an |
the strategy that we've been trying to use for gci/gsoc crawling isn't working |
23:24
🔗
|
galaxy_an |
(the main task tables don't have all the tasks in them, for gci at least) |
23:24
🔗
|
galaxy_an |
I am quite out of time to work on this now, unfortunately; does anyone else have the time to look into it? |
23:28
🔗
|
galaxy_an |
oh.... |