[00:07] *** primus has quit IRC (Ping timeout: 512 seconds) [00:10] SketchCow: for the items "2015_ovi_store_panic" and "2015_ovi_store_panic_2", the CDX files don't seem to be generated. could you check up on that? [00:10] *** primus has joined #archiveteam [00:18] *** primus104 has joined #archiveteam [00:22] anyone know if there's a python script for scraping reddit.com/domain/ ? [00:25] !a http://www.reddit.com/r/subreddit --ignore-sets=reddit [00:25] or https://github.com/ludios/grab-site as a distributable [00:30] i'm more interested in just getting the urls from a specific domain, for example reddit.com/domain/layervault.com [00:32] *** ohhdemgir has quit IRC (Quit: Leaving) [00:53] *** kyan has joined #archiveteam [00:56] *** beardicus has quit IRC (Quit: My MacBook Pro has gone to sleep. ZZZzzz…) [01:07] *** RichardG has joined #archiveteam [01:08] *** Wizardcry has quit IRC (Read error: Operation timed out) [01:19] *** RichardG has quit IRC (Quit: No keyboard found, press F1 to continue) [01:20] *** RichardG has joined #archiveteam [01:27] Start: there's a node.js module named reddit-stream that might do what you want [01:34] *** NovaKing_ has quit IRC (Read error: Operation timed out) [01:34] *** Selanda has quit IRC (Read error: Operation timed out) [01:35] *** lytv has quit IRC (Read error: Operation timed out) [01:35] *** cadbury_ has quit IRC (Read error: Operation timed out) [01:35] *** aNthraXx has quit IRC (Read error: Operation timed out) [01:35] *** caber has quit IRC (Read error: Operation timed out) [01:36] *** lytv has joined #archiveteam [01:36] *** Coderjoe has quit IRC (Read error: Operation timed out) [01:37] *** Coderjoe has joined #archiveteam [01:37] *** caber has joined #archiveteam [01:40] *** brayden has quit IRC (Read error: Operation timed out) [01:40] *** caber has quit IRC (Read error: Operation timed out) [01:41] *** Selanda has joined #archiveteam [01:42] *** Coderjoe has quit IRC (Read error: Operation timed out) [01:51] *** 
primus104 has quit IRC (Leaving.) [01:53] *** Coderjoe has joined #archiveteam [01:59] *** caber has joined #archiveteam [02:01] *** cadbury_ has joined #archiveteam [02:03] *** aNthraXx has joined #archiveteam [02:12] *** NovaKing_ has joined #archiveteam [02:44] arkiver: we should be able to begin layervault in the next few days [02:44] i've discovered some sequential api urls [02:45] i'd recommend having layervault.com and news.layervault.com (designer news) as separate warrior projects, as they are completely different sites [03:05] *** brayden has joined #archiveteam [03:19] *** primus has quit IRC (ircd.choopa.net irc.eversible.com) [03:19] *** BlueMaxim has quit IRC (ircd.choopa.net irc.eversible.com) [03:19] *** SN4T14_ has quit IRC (ircd.choopa.net irc.eversible.com) [03:19] *** Emcy has quit IRC (ircd.choopa.net irc.eversible.com) [03:19] *** Mayonaise has quit IRC (ircd.choopa.net irc.eversible.com) [03:19] *** rejon has quit IRC (ircd.choopa.net irc.eversible.com) [03:19] *** Rickster has quit IRC (ircd.choopa.net irc.eversible.com) [03:19] *** ryan_ has quit IRC (ircd.choopa.net irc.eversible.com) [03:19] *** xmc has quit IRC (ircd.choopa.net irc.eversible.com) [03:19] *** Sue_ has quit IRC (ircd.choopa.net irc.eversible.com) [03:19] *** yipdw has quit IRC (ircd.choopa.net irc.eversible.com) [03:19] *** dcmorton has quit IRC (ircd.choopa.net irc.eversible.com) [03:19] *** marnold has quit IRC (ircd.choopa.net irc.eversible.com) [03:19] *** ersi has quit IRC (ircd.choopa.net irc.eversible.com) [03:19] *** slash` has quit IRC (ircd.choopa.net irc.eversible.com) [03:19] *** Famicoman has quit IRC (ircd.choopa.net irc.eversible.com) [03:19] *** eprillios has quit IRC (ircd.choopa.net irc.eversible.com) [03:19] *** Cameron_D has quit IRC (ircd.choopa.net irc.eversible.com) [03:21] *** SN4T14 has joined #archiveteam [03:23] *** primus has joined #archiveteam [03:23] *** BlueMaxim has joined #archiveteam [03:23] *** SN4T14_ has joined #archiveteam [03:23] *** 
Mayonaise has joined #archiveteam [03:23] *** rejon has joined #archiveteam [03:23] *** Rickster has joined #archiveteam [03:23] *** ryan_ has joined #archiveteam [03:23] *** xmc has joined #archiveteam [03:23] *** Sue_ has joined #archiveteam [03:23] *** yipdw has joined #archiveteam [03:23] *** dcmorton has joined #archiveteam [03:23] *** marnold has joined #archiveteam [03:23] *** ersi has joined #archiveteam [03:23] *** slash` has joined #archiveteam [03:23] *** Famicoman has joined #archiveteam [03:23] *** Cameron_D has joined #archiveteam [03:23] *** irc.eversible.com sets mode: +oooo xmc dcmorton ersi Cameron_D [03:23] *** swebb sets mode: +o xmc [03:23] *** swebb sets mode: +o ersi [03:25] *** Wolfie has quit IRC (Read error: Connection reset by peer) [03:26] *** dcmorton has quit IRC (Excess Flood) [03:26] *** dcmorton has joined #archiveteam [03:27] *** Famicoman has quit IRC (Remote host closed the connection) [03:27] *** ersi has quit IRC (Read error: Connection reset by peer) [03:27] *** ersi has joined #archiveteam [03:27] *** swebb sets mode: +o ersi [03:30] *** SN4T14_ has quit IRC (Ping timeout: 512 seconds) [03:35] *** eprillios has joined #archiveteam [03:36] *** Famicoman has joined #archiveteam [03:37] *** fiatjaf has left undefined [04:16] chfoo: Restarted - let's see if it derives [04:18] *** Infreq has joined #archiveteam [04:21] *** chazchaz_ has quit IRC (Remote host closed the connection) [04:22] *** chazchaz_ has joined #archiveteam [04:33] *** svchfoo2 has quit IRC (Quit: Closing) [04:36] *** svchfoo2 has joined #archiveteam [06:06] I'm writing up a grab project for blingee. 
[06:06] (channel name suggestion: #jankee) [06:18] *** mistym has joined #archiveteam [06:45] *** JMC has quit IRC (Ping timeout: 370 seconds) [07:18] *** scyther has joined #archiveteam [07:32] Is there any ETA on when the Google Code Grab is likely to start [07:33] *** dashcloud has quit IRC (Read error: Operation timed out) [08:07] *** primus104 has joined #archiveteam [08:24] *** schbirid has joined #archiveteam [08:50] *** mistym has quit IRC (Remote host closed the connection) [08:58] *** BlueMaxim has quit IRC (Ping timeout: 512 seconds) [08:59] *** BlueMaxim has joined #archiveteam [09:44] *** Ymgve has joined #archiveteam [09:45] *** habi has joined #archiveteam [09:47] *** habi has left [10:02] *** signius has quit IRC (Ping timeout: 306 seconds) [10:14] *** signius has joined #archiveteam [10:22] could someone fully archive https://www.reddit.com/r/IAmA/comments/31esm0/iama_95_year_old_german_women_from_a_village_in/ ? it's wonderful [10:31] archivebot no good for it?? [11:09] I imagine that loading all the comments would be the problem [11:17] *** BlueMaxim has quit IRC (Read error: Connection reset by peer) [11:34] *** SimpBrain has joined #archiveteam [11:48] *** primus has quit IRC (Read error: Connection timed out) [11:49] *** primus has joined #archiveteam [11:59] *** dashcloud has joined #archiveteam [12:32] *** Ara_ has joined #archiveteam [12:35] *** philpem has joined #archiveteam [12:38] *** Ara__ has quit IRC (Ping timeout: 492 seconds) [12:45] *** Ara__ has joined #archiveteam [12:51] *** Ara_ has quit IRC (Ping timeout: 492 seconds) [12:53] *** Ara_ has joined #archiveteam [12:54] *** Ara__ has quit IRC (Ping timeout: 492 seconds) [13:10] got a new server ready to pile archiveteam data on to. Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz (Cores 8), 2 x 3TB hdd's. 
16GB Ram [13:29] *** Ara_ has quit IRC (Ping timeout: 492 seconds) [13:35] *** monod has joined #archiveteam [13:42] Would someone be able to help recover lost files related to the Charity programming language? https://github.com/mietek/charity-language/issues/1 [13:43] I’ve manually archived the Charity website (http://pll.cpsc.ucalgary.ca/charity1/www/home.html) while it’s still available, fixing broken links and restoring papers from other locations: https://github.com/mietek/charity-language [13:43] There are, however, some files which I cannot find [13:45] I’ve also made a full mirror of the website, which includes many TeX source files, and unrelated papers; if anyone is interested, I can upload the tarball somewhere. [13:46] It’s 160MB compressed [13:46] you can always upload those archives to the Internet Archive [13:46] they'd happily store it for you :) [13:49] Didn’t know they take tarballs [13:50] they take any kind of file [14:01] *** antomatic has quit IRC () [14:03] *** Wolfie has joined #archiveteam [14:05] *** antomatic has joined #archiveteam [14:08] *** primus104 has quit IRC (Leaving.) [14:38] *** bzc6p has joined #archiveteam [14:39] *** bzc6p has left [14:50] *** Peetz0r_ has joined #archiveteam [14:50] *** Peetz0r has quit IRC (Read error: Connection reset by peer) [14:58] *** Ara_ has joined #archiveteam [15:13] *** monod has quit IRC (Ping timeout: 512 seconds) [15:18] *** SimpBrai1 has joined #archiveteam [15:25] *** SimpBrain has quit IRC (Ping timeout: 512 seconds) [15:31] *** Infreq has quit IRC () [15:46] *** signius has quit IRC (Quit: Leaving) [15:47] *** signius has joined #archiveteam [15:48] *** signius has quit IRC (Client Quit) [15:48] *** signius has joined #archiveteam [16:40] *** monod has joined #archiveteam [16:40] mietek: what happened to ftp.cpsc.ucalgary.ca? [16:41] balrog: good question [16:41] has anyone asked the university? 
[16:41] Probably overzealous IT departments [16:41] Note the Calgary pages block IA [16:41] (asked as a programmer/researcher) [16:41] I’ve contacted the main researcher behind the project; no response yet [16:42] :/ ok [16:42] I’m now working down the list of people associated with the project [16:42] But they’re all long gone from the university [16:42] *** primus104 has joined #archiveteam [16:42] It really pisses me off that universities delete people’s home pages [16:42] It should be a crime to do that [16:43] http://pll.cpsc.ucalgary.ca/charity1/www/home.html does seem still to be up [16:43] and that server has no robots.txt [16:43] I know. I pasted that above :) [16:43] That server is pretty badly set up, so you can actually browse the entire hierarchy [16:43] And so I was able to recover almost all of their papers [16:44] Home pages were hosted on e.g. http://web.archive.org/web/*/pages.cpsc.ucalgary.ca/%7Espoonerd/ [16:44] and there's no robots.txt there either [16:44] this is an issue with IA where it doesn't refresh if the robots.txt is removed, apparently :/ [16:45] I’m holding out hope that IA crawls even if it’s blocked [16:45] And just silently collects the data [16:45] For the future [16:45] *** habi has joined #archiveteam [16:45] afaik IA does not [16:45] :( [16:46] *** habi has left [16:46] archivebot does! [16:47] Was it around in 1997? [16:47] no. [16:50] Do you have any tips for locating people? [16:50] https://github.com/mietek/charity-language/blob/master/doc/pdf/2003-zeng-an-implementation-of-charity.pdf [16:51] Min Zeng, Calgary MSc 2003 [16:51] *** habi1 has joined #archiveteam [16:51] Actually, that’s probably easy. 
[16:55] *** Ara_ has quit IRC (Ping timeout: 240 seconds) [16:58] *** habi1 has left [17:04] *** mistym has joined #archiveteam [17:28] *** Wizardcry has joined #archiveteam [17:53] *** monod has quit IRC (Ping timeout: 512 seconds) [17:56] *** Wizardcry has quit IRC (Read error: Operation timed out) [18:02] *** appledash has quit IRC (Read error: Connection reset by peer) [18:12] *** Ara_ has joined #archiveteam [18:19] *** rolfb has joined #archiveteam [18:21] *** aliz has joined #archiveteam [18:29] *** rolfb has quit IRC (Leaving...) [19:25] *** garyrh has quit IRC (Write error: Broken pipe) [19:28] *** useretail has quit IRC (hub.se irc.ac.za) [19:35] *** garyrh has joined #archiveteam [19:36] *** lytv has quit IRC (Ping timeout: 265 seconds) [19:37] arkiver: once we've started with friendfeed, we'll be able to start layervault [19:37] i found a way of grabbing everything sequentially through their api [19:39] *** lytv has joined #archiveteam [19:40] Start: awesome [19:40] looks like I can safely fully start the grab of friendfeed tonight, which means less work on that [19:40] then I'll get on layervault [19:47] *** Rickster has quit IRC (Quit: ZNC - http://znc.in) [19:48] *** SN4T14_ has joined #archiveteam [19:51] *** Rickster has joined #archiveteam [19:52] *** Mayonaise has quit IRC (Ping timeout: 512 seconds) [19:53] *** SN4T14 has quit IRC (Ping timeout: 306 seconds) [20:39] *** Mayonaise has joined #archiveteam [20:40] *** godane has quit IRC (Read error: Operation timed out) [20:47] *** svchfoo2 has quit IRC (Ping timeout: 240 seconds) [20:52] *** svchfoo2 has joined #archiveteam [21:06] *** SimpBrai1 has quit IRC (Quit: Leaving) [21:08] *** Deewiant has joined #archiveteam [21:11] *** aaaaaaaaa has joined #archiveteam [21:13] *** godane has joined #archiveteam [21:39] *** Peetz0r_ is now known as Peetz0r [21:56] *** scyther has quit IRC (Read error: Connection reset by peer) [22:15] *** wtron has joined #archiveteam [22:16] *** BlueMaxim has joined 
#archiveteam [22:21] *** mistym has quit IRC (Remote host closed the connection) [22:51] Start: can we talk in ~10 hours about the findings you got from layervault? [22:51] we'll be starting a discovery tomorrow for that [22:52] ok [23:15] *** mahadri has joined #archiveteam [23:58] hmmm.. I just had a fantasy about an archive warrior using html5, websockets etc... All one would need to participate would be to visit a warrior-url with a modern browser, and then it would do the job from there [23:58] *** Ara_ has quit IRC (Ping timeout: 240 seconds) [23:59] *** philpem has quit IRC (Ping timeout: 260 seconds) [23:59] *** mistym has joined #archiveteam