[00:02] I got to see Dweezil perform a set before- it was pretty good.
[00:24] *** BlueMaxim has joined #archiveteam-bs
[00:44] *** will has quit IRC (Ping timeout: 244 seconds)
[00:59] *** will has joined #archiveteam-bs
[01:00] *** lytv has joined #archiveteam-bs
[01:05] *** vtyl has quit IRC (Read error: Operation timed out)
[01:13] *** dashcloud has quit IRC (Read error: Operation timed out)
[01:16] *** JesseW has joined #archiveteam-bs
[01:28] *** dashcloud has joined #archiveteam-bs
[02:00] *** tomwsmf-a has joined #archiveteam-bs
[03:01] *** bwn has quit IRC (Read error: Operation timed out)
[04:00] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[04:08] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[04:10] *** dashcloud has quit IRC (Read error: Operation timed out)
[04:13] *** Sk1d has joined #archiveteam-bs
[04:17] *** dashcloud has joined #archiveteam-bs
[04:48] JesseW: one last thing regarding that sql db you made, how far past 10 million did the stories go? for the sake of round numbers, that's where my new grab starts, and i can't think of a quick way to check.
[04:49] JesseW: story id is not a number, and i can't think of a way to break it up that way, plus the numbers aren't zero padded, so that's annoying.
[05:03] *** marvinw is now known as ivan`
[05:40] bsmith093: hm
[05:44] Well, ordering the "Story URL" column gives me:
[05:48] it goes up to at least https://www.fanfiction.net/s/9999903/1/
[05:50] now running select "Story URL" from metadata order by "Story URL" desc limit 10;
[05:50] it is taking a while
[05:51] JesseW: is null data literally nothing, or is it a single space, in this table?
[05:52] nothing -- it's not a single space
[05:53] select path from metadata where "story url" is null that should return something, because you told me there were entries with a lot of missing data
[05:54] yes, 59, IIRC
[05:54] can i pull data where any column is null?
[05:54] query finished
[05:54] The largest story URL is https://www.fanfiction.net/s/9999999/1/
[05:54] select path from metadata where * is null that's not working either
[05:55] * means "all the columns" -- you can't ask "are all the columns null" that way
[05:55] awesome. no overlap!
[05:55] also, even if you did, you'd just get null back :-)
[05:55] right, because path would never be null.
[05:56] return path where at least one other column is null?
[05:58] well, afaik, all the ones with one null column have all the rest null as well (except path)
[06:00] JesseW: but i keep checking, and it's not returning anything
[06:02] apparently they are empty strings, not null
[06:02] select * from metadata where language = '';
[06:02] will give you the paths
[06:03] and there should be 79 of them
[06:04] i've also found a weird quirk. apparently some stories literally have a word count of 0, that's what the website returns as the metadata. nothing we can do about it, of course, just interesting
[06:05] here's an example http://www.fanfiction.net/s/2279489/1/
[06:05] hm, that is odd
[06:08] Twilight - Drowning in Chaos - Mesmerizing angel.txt id is fanfiction.net/s/6958332, according to google. these must have been an edge case for the scraper.
[06:10] how does count work, i can't find decent docs?
[06:16] i figured it out, but damn, that's not intuitive syntax at *all*.
[06:16] lol. No, no it is not.
[06:16] select count (*) from metadata where language = '';
[06:17] what else could i possibly want but all of that group, why give the option?!
[06:22] because it is actually counting the number of *non-null* values *in the given column*.
[06:22] it's just that most often, you want a count of any non-null columns.
[06:23] er, a count of all rows with *any* non-null column
[06:24] *** Honno has joined #archiveteam-bs
[06:28] thanks.
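Editor's sketch of the points discussed above, using Python's built-in sqlite3 module. The table layout and the four rows are made up for illustration (the real schema is not shown in the log), and the helper names `scalar` and `story_id` are hypothetical. It shows three things: `count(*)` counts every matching row while `count(column)` skips NULLs, an empty string is not NULL, and sorting the "Story URL" column as text can hide ids above 9999999 because the numbers are not zero padded.

```python
import re
import sqlite3

# Hypothetical miniature of the metadata table; the real columns and rows
# are not shown in the log, so everything here is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE metadata (path TEXT, "Story URL" TEXT, language TEXT)')
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [
        ("a.txt", "https://www.fanfiction.net/s/9999903/1/", "English"),
        ("b.txt", "https://www.fanfiction.net/s/10000001/1/", "English"),
        ("c.txt", "", ""),      # missing metadata stored as empty strings
        ("d.txt", None, None),  # missing metadata stored as NULL
    ],
)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# count(*) counts rows; count(language) counts rows where language is not NULL.
total_rows = scalar("SELECT count(*) FROM metadata")
non_null_lang = scalar("SELECT count(language) FROM metadata")

# '' and NULL are different: each filter matches only its own kind of "missing".
empty_lang = scalar("SELECT count(*) FROM metadata WHERE language = ''")
null_lang = scalar("SELECT count(*) FROM metadata WHERE language IS NULL")

# Text comparison goes character by character, so ".../s/9999903/..." sorts
# above ".../s/10000001/..." even though the second id is numerically larger.
max_text = scalar('SELECT max("Story URL") FROM metadata')


def story_id(url):
    """Pull the numeric id out of a story URL, or None if there is none."""
    match = re.search(r"/s/(\d+)", url or "")
    return int(match.group(1)) if match else None


ids = [story_id(url) for (url,) in conn.execute('SELECT "Story URL" FROM metadata')]
max_numeric = max(i for i in ids if i is not None)

print(total_rows, non_null_lang, empty_lang, null_lang)
print(max_text, max_numeric)
```

If the real table could contain ids of eight or more digits, a numeric comparison like the one in `story_id` would be a safer check for overlap than `order by "Story URL" desc`.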
[06:33] glad to help
[06:36] *** metalcamp has joined #archiveteam-bs
[06:48] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[07:09] *** bwn has joined #archiveteam-bs
[09:38] *** mr-b has quit IRC (Read error: Operation timed out)
[09:38] *** Jonimus has quit IRC (Read error: Operation timed out)
[09:39] *** Jonimus has joined #archiveteam-bs
[09:39] *** swebb sets mode: +o Jonimus
[09:55] *** mr-b has joined #archiveteam-bs
[10:52] *** Stiletto has quit IRC (Ping timeout: 250 seconds)
[10:53] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[10:53] *** logchfoo4 has quit IRC (Ping timeout: 250 seconds)
[10:57] *** logchfoo1 starts logging #archiveteam-bs at Fri Apr 08 10:57:28 2016
[10:57] *** logchfoo1 has joined #archiveteam-bs
[11:02] *** koon has quit IRC (Ping timeout: 250 seconds)
[11:38] *** koon has joined #archiveteam-bs
[11:44] *** Fletcher_ has joined #archiveteam-bs
[12:18] *** BlueMaxim has quit IRC (Quit: Leaving)
[12:53] *** metalcamp has quit IRC (Quit: Bye)
[13:05] *** metalcamp has joined #archiveteam-bs
[13:38] http://archiveguide.witness.org/
[13:51] all the videos i download aren't for public consumption, lol
[13:51] actually, i don't even know who would own the copyright on the shit i download
[14:16] *** jut has joined #archiveteam-bs
[14:24] *** Start has quit IRC (Quit: Disconnected.)
[14:38] *** vitzli has joined #archiveteam-bs
[15:26] *** Start has joined #archiveteam-bs
[15:40] *** JesseW has joined #archiveteam-bs
[16:09] *** Start has quit IRC (Quit: Disconnected.)
[16:13] *** Start has joined #archiveteam-bs
[16:15] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[16:20] *** Start has quit IRC (Quit: Disconnected.)
[16:26] atrocity: Needs more LZMA
[16:27] oh wait, 7z is LZMA
[16:29] 7zma
[16:29] *** vitzli has quit IRC (Quit: Leaving)
[16:38] *** VADemon has joined #archiveteam-bs
[16:48] *** Honno has quit IRC (Read error: Operation timed out)
[16:49] *** Honno has joined #archiveteam-bs
[17:09] *** jspiros has joined #archiveteam-bs
[17:09] *** yakfish has joined #archiveteam-bs
[17:09] *** matthusby has joined #archiveteam-bs
[17:13] *** SadDM has joined #archiveteam-bs
[17:13] *** swebb sets mode: +o SadDM
[17:47] *** Stilett0 is now known as Stiletto
[17:50] https://www.youtube.com/watch?v=D2fSXp6N-vs
[17:59] I wonder if there’s a program to make these kind of videos. As in: Feed it timestamped transcripts of audio/video and a text. Should be fairly easy actually – except for the timestamped transcripts.
[18:01] Yes
[18:23] *** Start has joined #archiveteam-bs
[18:49] *** Honno has quit IRC (Read error: Operation timed out)
[19:01] *** Honno has joined #archiveteam-bs
[19:04] *** Boppen has quit IRC (Ping timeout: 194 seconds)
[19:04] *** Sk2d has joined #archiveteam-bs
[19:09] *** jut has quit IRC (Read error: Connection reset by peer)
[19:10] *** Sk1d has quit IRC (hub.se irc.du.se)
[19:23] *** Start has quit IRC (Ping timeout: 260 seconds)
[19:25] *** Sk2d is now known as Sk1d
[19:30] *** bwn has quit IRC (Read error: Operation timed out)
[19:31] *** Start has joined #archiveteam-bs
[19:44] *** Start has quit IRC (Quit: Disconnected.)
[19:49] *** Start has joined #archiveteam-bs
[19:49] *** bwn has joined #archiveteam-bs
[19:51] *** Sanqui has quit IRC (Read error: Operation timed out)
[19:51] *** Sanqui has joined #archiveteam-bs
[19:53] *** Start has quit IRC (Client Quit)
[19:56] *** Infreq_ has quit IRC (Read error: Operation timed out)
[19:56] *** Sanqui has quit IRC (Read error: Operation timed out)
[20:04] *** Infreq has joined #archiveteam-bs
[20:13] *** Sanqui has joined #archiveteam-bs
[20:47] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[20:59] SketchCow: i'm up to 2015-04-30 with kpfa
[21:00] we are less than a year behind now
[21:02] I have just acquired two dv-tape recorders/players and will start a project digitizing ~150 dv tapes my organization got archived
[21:04] *** Start has joined #archiveteam-bs
[21:05] *** Honno has quit IRC (Read error: Operation timed out)
[21:10] *** BlueMaxim has joined #archiveteam-bs
[21:14] *** Start has quit IRC (Quit: Disconnected.)
[21:35] nice!
[21:42] tens of years of free software talks
[21:51] *** Honno has joined #archiveteam-bs
[22:05] *** Start has joined #archiveteam-bs
[22:18] oh god joepie91 that brings back so many memories
[22:22] :
[22:22] :D*
[22:27] *** slpeeds has joined #archiveteam-bs
[22:34] *** fdo54ss has quit IRC (Ping timeout: 633 seconds)
[22:43] *** Stiletto is now known as Stilett0
[23:08] *** Honno has quit IRC (Read error: Operation timed out)
[23:53] *** dashcloud has quit IRC (Read error: Operation timed out)