[00:05] *** mismatch has joined #archiveteam-bs
[00:58] *** JesseW has joined #archiveteam-bs
[01:01] *** xXx_ndidd has quit IRC (Ping timeout: 633 seconds)
[01:20] JesseW: how goes the upload? and the csv
[01:20] the csvs have finished -- I need to load them into a database.
[01:22] the upload has also finished, after a total of about 63 hours.
[01:26] I broke the csvs into separate files per directory -- I'm calculating the total size now.
[01:34] so i'm close to having all of gawker.com sitemap
[01:37] *** bwn_ has joined #archiveteam-bs
[01:39] godane: nice!
[01:49] *** bwn has quit IRC (Read error: Operation timed out)
[01:57] bsmith093: so the total size of the CSVs is 3.5GB
[01:58] holy crap. will anything even read that massive sql thing?!
[02:06] 3.5GB of data in a sql database isn't particularly large.
[02:07] It might be painful for sqlite (maybe), but not for other databases.
[02:35] * JesseW would like help/a listening chatroom (does that make sense?) in figuring out how to filter leaf nodes out of an adjacency list...
[02:36] I have a list of IA identifier -> collection it is in, and I want to filter out non-collections (i.e. leaf nodes) without loading the whole thing into a graph system
[03:02] being a CS student that sounds like something I should know
[03:02] but alas
[03:39] heh
[04:07] *** tomwsmf-a has quit IRC (Ping timeout: 258 seconds)
[04:09] Amusing item: https://archive.org/metadata/ia-das/metadata -- a collection which is a member of itself.
[04:38] *** bwn has joined #archiveteam-bs
[04:48] *** bwn_ has quit IRC (Read error: Operation timed out)
[05:17] .j #justsolve
[05:17] Shit.
[05:34] remsen: ?
[05:35] * JesseW has solved the graph problem I mentioned; it turned out just listing all the internal nodes, then running over the list again worked fine.
[05:35] I was going through the IA census collections data -- it turns out there are about 16,000 collections.
[05:35] JesseW, command fuckup! I need now to log off and toss my modem into the garbage.
[05:38] sympathy. modems -> :-(
[05:40] My new modem/router combo (!!!) from TWC is actually an upgrade from the Linksys one I bought myself.
[05:41] Well, the one that was purchased for the household.
[05:43] It actually has decent default security too. Good on Arris.
[05:44] It's obviously leased so I can't flash it.
[05:53] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[05:55] *** JesseW has quit IRC (Remote host closed the connection)
[05:56] *** Honno has joined #archiveteam-bs
[05:57] *** JesseW has joined #archiveteam-bs
[06:01] *** Sk1d has joined #archiveteam-bs
[06:26] i'm starting to uploading Sky & Telescope: https://archive.org/details/Sky_and_Telescope_1941-11-cbr
[06:27] the cbr files
[06:28] your going to get cbr and pdf collections of it
[06:29] this is mostly cause there could be gaps in both collections
[07:09] *** Start has quit IRC (Ping timeout: 260 seconds)
[07:36] *** JesseW has quit IRC (Quit: Leaving.)
[07:41] *** JesseW has joined #archiveteam-bs
[08:05] *** schbirid has joined #archiveteam-bs
[08:09] *** mismatch has quit IRC (Ping timeout: 633 seconds)
[08:16] *** JesseW has quit IRC (Quit: Leaving.)
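JesseW's [05:35] fix for the leaf-filtering question (list all the internal nodes, then run over the list again) can be sketched as a two-pass filter over an adjacency list. This is a minimal sketch, not the code that was actually run. It assumes the census data is a CSV with hypothetical `identifier,collection` rows, one membership per row; the function names are illustrative. Only the set of collection names (about 16,000, per the log) is held in memory, so no graph library is needed. One caveat follows from using only the membership list: a collection with zero members never appears on the right-hand side and would be treated as a leaf. A self-member such as `ia-das` is kept, because it appears as its own parent.

```python
import csv
from typing import Iterable, Iterator, List, Set, Tuple

Edge = Tuple[str, str]  # (identifier, collection it is a member of)


def read_edges(path: str) -> Iterator[Edge]:
    """Stream edges from a CSV with hypothetical rows: identifier,collection."""
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if len(row) >= 2:  # skip blank or malformed rows
                yield row[0], row[1]


def internal_nodes(edges: Iterable[Edge]) -> Set[str]:
    """Pass 1: anything that appears as a parent has members, so it is a collection."""
    return {collection for _identifier, collection in edges}


def drop_leaves(edges: Iterable[Edge], internal: Set[str]) -> List[Edge]:
    """Pass 2: keep only edges whose child is itself a collection (drop leaf items)."""
    return [(ident, coll) for ident, coll in edges if ident in internal]


def collection_edges_from_csv(path: str) -> List[Edge]:
    # Reading the file twice means only the set of collection names is kept
    # in memory between passes, instead of the whole graph.
    internal = internal_nodes(read_edges(path))
    return drop_leaves(read_edges(path), internal)
```

For example, with the edges `item1 -> colA`, `colA -> colB`, `ia-das -> ia-das`, `item2 -> colB` and `colB -> root`, the first pass finds `{colA, colB, ia-das, root}` and the second pass keeps only the three collection-to-collection edges.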
[08:54] *** marvinw has quit IRC (Ping timeout: 633 seconds)
[09:21] I'm starting to upload gawker.com sitemap for 2005
[10:01] *** marvinw has joined #archiveteam-bs
[10:04] *** bwn has quit IRC (Read error: Operation timed out)
[10:12] *** bwn has joined #archiveteam-bs
[10:20] *** marvinw has quit IRC (Read error: Connection reset by peer)
[10:29] *** marvinw has joined #archiveteam-bs
[10:45] *** BlueMaxim has quit IRC (Quit: Leaving)
[10:59] SketchCow: the collection for this item is a item: https://archive.org/details/Lifehacker_Extra_17
[11:00] https://archive.org/details/lifehacker
[11:00] may want to change it to lifehacker-extra or rev3-lifehacker-extra
[11:02] SketchCow: also some got dark being mark spam: https://archive.org/details/Lifehacker_Extra_1
[11:16] godane: that's awesome! https://archive.org/details/Sky_and_Telescope_1941-11-cbr
[11:16] will you upload all years?
[11:23] up to 2009
[11:25] i'm up to this far : https://archive.org/details/Sky_and_Telescope_1960-12-cbr
[11:29] so looks like cbr files have no gaps
[11:29] it was only 1949 in the pdf format that had a gap
[11:29] mostly cause there is only 1 1949 magazine in pdf format
[13:18] *** Stiletto has quit IRC (Ping timeout: 260 seconds)
[13:19] *** Famicoma1 has quit IRC (Ping timeout: 260 seconds)
[13:19] *** Muad-Dib has quit IRC (Ping timeout: 260 seconds)
[13:19] *** Stiletto has joined #archiveteam-bs
[13:21] *** Famicoma1 has joined #archiveteam-bs
[13:26] *** vitzli has joined #archiveteam-bs
[13:31] *** Famicoma1 has quit IRC (Remote host closed the connection)
[13:31] *** Famicoma1 has joined #archiveteam-bs
[13:32] *** Muad-Dib has joined #archiveteam-bs
[14:37] *** metalcamp has joined #archiveteam-bs
[15:24] *** JesseW has joined #archiveteam-bs
[15:38] *** JesseW has quit IRC (Quit: Leaving.)
[16:05] *** altlabel has joined #archiveteam-bs
[16:11] *** zino has quit IRC (Read error: Operation timed out)
[16:12] *** Start has joined #archiveteam-bs
[16:17] *** vitzli has quit IRC (Leaving)
[19:36] *** bwn has quit IRC (Read error: Operation timed out)
[20:03] *** bwn has joined #archiveteam-bs
[20:31] *** schbirid has quit IRC (Quit: Leaving)
[21:02] *** mksplg has quit IRC (Ping timeout: 260 seconds)
[21:05] *** Rickster has quit IRC (Remote host closed the connection)
[21:07] *** Rickster has joined #archiveteam-bs
[21:14] *** mksplg has joined #archiveteam-bs
[21:15] *** zino has joined #archiveteam-bs
[21:22] *** Stiletto is now known as Stilett0
[21:39] *** bauruine has quit IRC (Ping timeout: 260 seconds)
[21:42] *** Stilett0 has quit IRC (Read error: Operation timed out)
[21:48] *** BlueMaxim has joined #archiveteam-bs
[21:53] *** bauruine has joined #archiveteam-bs
[22:02] *** Stiletto has joined #archiveteam-bs
[22:06] *** BlueMaxim has quit IRC (Quit: Leaving)
[22:06] *** BlueMaxim has joined #archiveteam-bs
[22:18] *** Honno has quit IRC (Ping timeout: 492 seconds)
[23:14] *** toad2 has quit IRC (Read error: Operation timed out)
[23:14] *** toad1 has joined #archiveteam-bs
[23:27] *** Stiletto has quit IRC (Ping timeout: 260 seconds)
[23:32] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[23:39] *** JetBalsa has joined #archiveteam-bs
[23:59] *** ndiddy has joined #archiveteam-bs