[00:11] *** Ymgve has quit IRC ()
[00:30] *** LordNigh2 has joined #archiveteam
[00:38] *** Lord_Nigh has quit IRC (Ping timeout: 600 seconds)
[00:38] *** LordNigh2 is now known as Lord_Nigh
[01:22] *** mutoso has joined #archiveteam
[01:39] *** cf has joined #archiveteam
[01:41] *** ete_ has joined #archiveteam
[01:48] *** primus104 has quit IRC (Leaving.)
[01:48] *** arkhive has joined #archiveteam
[01:49] *** the_fox has quit IRC (Ping timeout: 335 seconds)
[01:49] *** mistym has quit IRC (Remote host closed the connection)
[01:50] *** the_fox has joined #archiveteam
[02:13] *** Aranje has quit IRC (Read error: Operation timed out)
[02:16] *** philpem has quit IRC (Ping timeout: 272 seconds)
[02:24] *** Aranje has joined #archiveteam
[02:37] *** REiN^ has quit IRC ()
[02:37] *** REiN^ has joined #archiveteam
[02:56] *** signius_ has quit IRC (Ping timeout: 258 seconds)
[02:57] *** ete_ has quit IRC (Remote host closed the connection)
[03:09] *** mistym has joined #archiveteam
[03:09] *** signius_ has joined #archiveteam
[03:28] *** rejon has joined #archiveteam
[04:17] *** ex-parro1 has quit IRC (Leaving.)
[04:28] *** ruukasu has quit IRC (Quit: WeeChat 1.0.1)
[04:28] *** ruukasu has joined #archiveteam
[04:29] *** ruukasu has quit IRC (Client Quit)
[04:29] *** ruukasu has joined #archiveteam
[04:50] *** BlueMaxim has joined #archiveteam
[04:55] *** aaaaaaaaa has quit IRC (Leaving)
[05:03] *** todrobbin has joined #archiveteam
[05:09] *** ruukasu has quit IRC (Quit: WeeChat 1.0.1)
[05:10] *** mistym has quit IRC (Remote host closed the connection)
[05:10] *** mistym has joined #archiveteam
[05:17] *** ruukasu has joined #archiveteam
[05:33] *** Start is now known as StartAway
[05:34] *** antomati_ has joined #archiveteam
[05:36] *** antomatic has quit IRC (Ping timeout: 633 seconds)
[05:45] *** mistym has quit IRC (Remote host closed the connection)
[05:48] *** todrobbin has quit IRC (todrobbin)
[05:50] *** todrobbin has joined #archiveteam
[05:56] *** todrobbin has quit IRC (Quit: todrobbin)
[05:59] *** dashcloud has quit IRC (Read error: Operation timed out)
[06:06] *** dashcloud has joined #archiveteam
[06:11] *** BiggieJo1 has joined #archiveteam
[06:15] *** BiggieJon has quit IRC (Read error: Operation timed out)
[07:24] *** ZorbaTHut has quit IRC (Read error: Connection reset by peer)
[07:25] *** ZorbaTHut has joined #archiveteam
[07:37] SketchCow: do you have a collection ready for Viddy?
[07:50] *** primus104 has joined #archiveteam
[07:57] *** dashcloud has quit IRC (Read error: Operation timed out)
[08:04] *** dashcloud has joined #archiveteam
[08:05] *** ex-parrot has quit IRC (Read error: Operation timed out)
[08:06] *** ex-parrot has joined #archiveteam
[08:07] *** APerti has quit IRC (Read error: Operation timed out)
[08:13] *** APerti has joined #archiveteam
[08:18] Yes. I need to know the user account on IA to grant admin
[08:19] *** mistym has joined #archiveteam
[08:29] Done. archiveteam_viddy is now your victim.
[08:30] *** mistym has quit IRC (Read error: Operation timed out)
[08:30] It has all the proper logo and writing and so on.
[08:31] *** primus104 has quit IRC (Leaving.)
[08:34] thanks SketchCow !
[08:54] *** amerrykan has quit IRC (Quit: Quitting)
[09:26] *** APerti has quit IRC (Ping timeout: 480 seconds)
[09:27] *** amerrykan has joined #archiveteam
[09:29] *** primus104 has joined #archiveteam
[09:36] *** primus104 has quit IRC (Leaving.)
[10:41] *** antomati_ is now known as antomatic
[10:48] *** dashcloud has quit IRC (Read error: Operation timed out)
[10:55] *** dashcloud has joined #archiveteam
[11:26] *** Ymgve has joined #archiveteam
[11:38] *** ruukasu has quit IRC (Ping timeout: 265 seconds)
[12:21] *** schbirid has joined #archiveteam
[12:21] *** Emcy_ has quit IRC (Read error: Connection reset by peer)
[12:55] *** cf has quit IRC (cf)
[13:11] *** Morbus has quit IRC (Quit: http://www.disobey.com/)
[13:14] *** Morbus has joined #archiveteam
[13:16] *** ruukasu has joined #archiveteam
[13:33] *** useretail has quit IRC (ircd.shaw.ca irc.shaw.ca)
[13:33] *** rduser has quit IRC (ircd.shaw.ca irc.shaw.ca)
[13:33] *** Jogie has quit IRC (ircd.shaw.ca irc.shaw.ca)
[13:33] *** w0rp has quit IRC (ircd.shaw.ca irc.shaw.ca)
[13:33] *** SadDM has quit IRC (ircd.shaw.ca irc.shaw.ca)
[13:33] *** Sellyme has quit IRC (ircd.shaw.ca irc.shaw.ca)
[13:33] *** w0rp_ has joined #archiveteam
[13:34] *** sankin has joined #archiveteam
[13:34] *** Sellyme has joined #archiveteam
[13:34] *** SadDM has joined #archiveteam
[13:35] *** rduser has joined #archiveteam
[13:42] *** primus104 has joined #archiveteam
[13:48] *** w0rp_ is now known as w0rp
[13:49] *** sankin has quit IRC (Leaving.)
[13:49] *** useretail has joined #archiveteam
[14:00] *** sankin has joined #archiveteam
[14:02] *** ruukasu has quit IRC (Quit: WeeChat 1.0.1)
[14:07] *** ruukasu has joined #archiveteam
[14:22] *** ruukasuu has joined #archiveteam
[14:22] *** ruukasu has quit IRC (Ping timeout: 265 seconds)
[14:23] *** ruukasuu has quit IRC (Client Quit)
[14:37] *** REiN^ has quit IRC ()
[14:38] *** REiN^ has joined #archiveteam
[14:57] *** BiggieJo1 is now known as BiggieJon
[15:19] *** StartAway is now known as Start
[15:24] *** BiggieJon has left
[15:26] *** cf has joined #archiveteam
[15:34] *** mistym has joined #archiveteam
[15:34] *** mistym has quit IRC (Remote host closed the connection)
[15:39] *** BiggieJon has joined #archiveteam
[15:43] *** primus104 has quit IRC (Leaving.)
[15:44] *** Start has quit IRC (Remote host closed the connection)
[15:55] *** mistym has joined #archiveteam
[15:56] *** aaaaaaaaa has joined #archiveteam
[16:00] privat.t-online.de has a lot of personal homepages, no idea how to discover them all though
[16:00] google site:privat.t-online.de ?
[16:02] yeah but google does not let one paginate anymore after ~25 or something
[16:03] Google will give you the number of found links, like 1 million, but will only allow you to view 1000
[16:16] *** thechip has quit IRC (Read error: Connection reset by peer)
[16:18] *** Emcy has joined #archiveteam
[16:23] *** chipper_ has joined #archiveteam
[16:24] *** chipper_ has left
[16:34] SketchCow: can you move https://archive.org/details/DarkHorseComicsMessageBoards-FinalGrab into the archive team collection when you have a moment?
[16:37] Done
[16:50] *** ruukasu has joined #archiveteam
[16:55] Tripod.com is going down
[16:56] tripod.com
[16:56] Maybe
[16:58] ! ?
[16:59] *** Start_ has joined #archiveteam
[16:59] Sites aren't hard to save, problem is the discovery of the sites that exist.
http://196thovi.tripod.com/
[16:59] somewhere between 25-jun-2014 and today they got rid of , but still link it from the front page
[17:00] http://web.archive.org/web/20140625035208/http://team-blog.tripod.com/
[17:00] we have various sources (wayback, google, etc.)
[17:00] but those will most likely not get everything. The wayback just doesn't have all websites and google only shows the first 1000 results
[17:01] there's also searching wayback for the old url, http://members.tripod.com/*
[17:01] http://urlsearch.commoncrawl.org/?q=tripod.com
[17:01] yeah, I mentioned that
[17:02] I mean the wayback
[17:02] not the commoncrawl yet
[17:02] SketchCow: if you are not able to get a full list of websites some way (they might have some hidden index on their site?), would you like to contact them about this?
[17:02] we can do a discovery scraping google/bing with a dictionary if that's needed
[17:03] a dictionary on google?
[17:03] a word list i mean
[17:04] Like: site:*.tripod.com *aaa*
[17:04] site:*.tripod.com *the* etc.?
[17:05] hmm it's not accepting my tripod password
[17:06] should we start a project for http://ep1c.com?
[17:06] it's also owned by viddy and shutting down on the same date (dec. 15)
[17:07] Start_: yep, I saw your posts about it (sorry for not responding)
[17:07] reset works though
[17:09] *** dashcloud has quit IRC (Read error: Operation timed out)
[17:10] *** mistym has quit IRC (Remote host closed the connection)
[17:15] *** dashcloud has joined #archiveteam
[17:22] *** ruukasu has quit IRC (Ping timeout: 265 seconds)
[17:23] not tripod :((
[17:24] http://members.tripod.com/robots.txt has sitemaps
[17:24] *** Start_ is now known as Start
[17:25] whoever does this, please grab angelfire in the same go.
same sitemap structure
[17:25] also please educate me how you do it, because i got stuck with angelfire and got no help
[17:26] thanks for those sitemaps
[17:26] I'll create a discovery project which will find all the sites using those sitemaps
[17:26] all URLs are inside the sitemaps
[17:27] yes
[17:28] just not the media embedded in those sites, that was my problem
[17:29] schbirid: do you have an example for me?
[17:29] and were you using wget lua?
[17:29] http://members.tripod.com/a1modularhomes/sitemap.xml random
[17:29] nope
[17:29] i gave up because i would have made a mess
[17:29] *** lbft has quit IRC (Read error: Operation timed out)
[17:30] do you mean by "the media embedded in those sites" external pictures and videos?
[17:30] the sitemaps only have html pages
[17:30] so any images etc need to be found
[17:30] I see what you mean now, sorry
[17:30] :)
[17:30] Yeah, I'll get those done by wget lua
[17:31] maybe there should be a tripod channel
[17:31] Maybe it's not going to be a discovery project btw
[17:31] but we'll see
[17:32] #wobbly ?
[17:33] SketchCow: do we have the shutdown date?
[17:33] *** lbft has joined #archiveteam
[17:34] No, and there's a chance this tip may have just come from someone finding what you did - the site seems really on the rack, blog no longer works, etc.
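The point made above, that the sitemaps list only HTML pages and embedded media still has to be found, can be sketched roughly as follows. This is only an illustration in Python, not the wget-lua script mentioned in the channel: the sample sitemap and page contents are hypothetical, the function and class names are made up, and it assumes the Tripod sitemaps use the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: the sitemaps follow the standard sitemaps.org schema.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_locs(xml_text):
    """Return every <loc> URL found in a sitemap (or sitemap index) document."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]


class AssetCollector(HTMLParser):
    """Collect the src targets of tags that usually embed media."""

    MEDIA_TAGS = ("img", "embed", "source", "audio", "video")

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        if tag in self.MEDIA_TAGS:
            for name, value in attrs:
                if name == "src" and value:
                    # Resolve relative paths against the page URL.
                    self.assets.append(urljoin(self.base_url, value))


def page_assets(page_url, html_text):
    """Return absolute URLs of media embedded in one HTML page."""
    collector = AssetCollector(page_url)
    collector.feed(html_text)
    return collector.assets


# Hypothetical sample data, not fetched from the real site.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://members.tripod.com/a1modularhomes/index.html</loc></url>
  <url><loc>http://members.tripod.com/a1modularhomes/about.html</loc></url>
</urlset>"""

SAMPLE_PAGE = '<img src="pics/house.jpg"><a href="about.html">about</a>'

if __name__ == "__main__":
    for page in sitemap_locs(SAMPLE_SITEMAP):
        print(page)
    print(page_assets("http://members.tripod.com/a1modularhomes/index.html", SAMPLE_PAGE))
```

The sitemap gives the list of pages to fetch; each fetched page then has to be parsed for its own media links, which is the step the sitemaps alone do not cover.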
[17:34] *** mistym has joined #archiveteam
[17:38] #byepod is what I was thinking
[17:43] *** philpem has joined #archiveteam
[17:50] *** Start has quit IRC (Ping timeout: 272 seconds)
[18:05] *** Jogie has joined #archiveteam
[18:09] *** APerti has joined #archiveteam
[18:10] *** rejon has quit IRC (Ping timeout: 480 seconds)
[18:35] *** cf_ has joined #archiveteam
[18:36] *** cf has quit IRC (Ping timeout: 246 seconds)
[18:36] *** cf_ is now known as cf
[18:39] *** primus104 has joined #archiveteam
[18:43] *** thechip has joined #archiveteam
[18:56] *** thechip has quit IRC (Read error: Operation timed out)
[19:03] SketchCow: ok if I wait till there is more information on the shutdown before I get the scripts ready?
[19:04] *** Sk2d has joined #archiveteam
[19:04] *** Sk1d has quit IRC (Read error: Operation timed out)
[19:04] *** Sk2d is now known as Sk1d
[19:11] *** Sk1d has quit IRC (Ping timeout: 265 seconds)
[19:11] *** dashcloud has quit IRC (Read error: Operation timed out)
[19:14] *** Sk1d has joined #archiveteam
[19:21] *** dashcloud has joined #archiveteam
[19:23] *** primus104 has quit IRC (Leaving.)
[19:30] *** dashcloud has quit IRC (Remote host closed the connection)
[19:31] *** dashcloud has joined #archiveteam
[19:32] midas: http://dat.serveert.me.uk/p/ftp
[19:32] is currently down :/
[19:34] Yes, please do.
[19:35] ok
[19:42] *** Start has joined #archiveteam
[19:43] *** ruukasu has joined #archiveteam
[19:46] *** bauruine has quit IRC (Ping timeout: 265 seconds)
[19:48] *** philpem has quit IRC (Ping timeout: 272 seconds)
[19:51] *** bauruine has joined #archiveteam
[20:02] https://roon.io
[20:02] http://blog.ghost.org/roon/
[20:02] "The Roon.io hosted platform will be closing its doors on December 31st, 2014."
[20:03] *** Kniffy has quit IRC (Quit: pup)
[20:03] *** thechip has joined #archiveteam
[20:05] *** Kniffy has joined #archiveteam
[20:08] *** ruukasu has quit IRC (Ping timeout: 265 seconds)
[20:16] here's a google crawl for roon: http://paste.archivingyoursh.it/goxowihalo.avrasm
[20:19] *** SN4T14 has quit IRC (Ping timeout: 369 seconds)
[20:19] looks like roon can be sequentially scraped through its api: https://roon.io/developer/blogs
[20:28] *** Start has quit IRC (Read error: Connection reset by peer)
[20:32] *** Start has joined #archiveteam
[20:34] *** primus104 has joined #archiveteam
[20:37] SketchCow: a fast and small project is starting very soon: ziplist
[20:37] #zipyourlips
[20:37] *** ex-parro1 has joined #archiveteam
[20:37] That one is going to FOS, currently 30.000 warc's
[20:42] *** dashcloud has quit IRC (Read error: Operation timed out)
[20:43] cf: since you've been doing API scrapes for a couple recent projects, mind doing one for roon?
[20:44] http://archiveteam.org/index.php?title=Roon
[20:44] Start: I’ll have a go at it. Not sure when I’ll get around to it, but within a week or so
[20:45] Start: are those api's just incremental numbered?
[20:45] yes
[20:45] *** dashcloud has joined #archiveteam
[20:45] then I'll do them in the scripts, we also save the api urls that way
[20:45] ok
[20:45] Yea, just about to say
[20:46] we need an irc channel name for roon
[20:46] #rooin
[20:48] or maybe #rooined
[20:49] i like rooined better
[21:01] *** T31M has quit IRC (Quit: Leaving)
[21:09] *** aaaaaaaaa has quit IRC (Leaving)
[21:09] *** aaaaaaaaa has joined #archiveteam
[21:25] *** cf has quit IRC (Ping timeout: 265 seconds)
[21:26] *** Start_ has joined #archiveteam
[21:27] *** Start has quit IRC (Read error: Connection reset by peer)
[21:28] arkiver: ill fix it in a minute
[21:36] fixed
[21:36] forgot it rebooted this box
[21:36] *** bauruine has quit IRC (Ping timeout: 265 seconds)
[21:36] *** K4k has joined #archiveteam
[21:38] *** xk_id has quit IRC (Read error: Operation timed out)
[21:42] *** bauruine has joined #archiveteam
[21:54] thanks midas
[21:56] *** SN4T14 has joined #archiveteam
[21:57] *** sankin has quit IRC (Leaving.)
[22:00] *** hive-mind has quit IRC (Ping timeout: 272 seconds)
[22:07] *** ruukasu has joined #archiveteam
[22:09] *** cbb has joined #archiveteam
[22:11] *** thechip has quit IRC (Quit: Leaving...)
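Since the roon blog IDs are confirmed above to be incrementally numbered, the API URLs for a grab can simply be generated from a range. A minimal sketch in Python, assuming the `https://roon.io/api/v1/blogs/<id>` pattern that is quoted later in this log; the function names and the idea of batching the URLs into work items are illustrative assumptions, not something the channel specified.

```python
from typing import Iterator, List

# URL pattern as quoted later in this log; treated here as an assumption.
API_TEMPLATE = "https://roon.io/api/v1/blogs/{id}"


def blog_api_urls(first_id: int, last_id: int) -> Iterator[str]:
    """Yield one API URL per blog ID in the inclusive range."""
    for blog_id in range(first_id, last_id + 1):
        yield API_TEMPLATE.format(id=blog_id)


def work_items(first_id: int, last_id: int, batch_size: int) -> Iterator[List[str]]:
    """Group the API URLs into batches (hypothetical work items)."""
    batch: List[str] = []
    for url in blog_api_urls(first_id, last_id):
        batch.append(url)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


if __name__ == "__main__":
    for item in work_items(1, 5, batch_size=2):
        print(item)
```

Generating the URLs inside the grab scripts, rather than in a separate scrape, means the API responses are fetched and stored in the same run as the pages.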
[22:12] *** hive-mind has joined #archiveteam
[22:14] *** Start_ is now known as Start
[22:17] *** dashcloud has quit IRC (Read error: Operation timed out)
[22:20] *** dashcloud has joined #archiveteam
[22:22] *** Start has quit IRC (Read error: Connection reset by peer)
[22:24] SketchCow: ziplist should be incoming on FOS now, 30.000 warc's
[22:24] *** Start has joined #archiveteam
[22:25] *** K4k has quit IRC (Ping timeout: 378 seconds)
[22:25] *** schbirid has quit IRC (Leaving)
[22:27] *** cf has joined #archiveteam
[22:40] *** REiN^ has quit IRC (Read error: Connection reset by peer)
[22:41] *** REiN^ has joined #archiveteam
[22:46] arkiver: the highest valid roon blog i could find was: https://roon.io/api/v1/blogs/122233
[22:47] *** REiN^ has quit IRC (Read error: Connection reset by peer)
[22:47] Start: thanks, but first ep1c
[22:47] Tomorrow is ep1c day
[22:47] ok
[22:48] i'm guessing that ep1c's grab scripts will be very similar to viddy's?
[22:48] *** REiN^ has joined #archiveteam
[22:49] *** signius_ has quit IRC (Read error: Operation timed out)
[22:50] probably, but I'll see that tomorrow
[23:02] *** signius_ has joined #archiveteam
[23:07] *** ex-parro1 has quit IRC (Remote host closed the connection)
[23:07] so tripod really is going down?
[23:08] *** Start has quit IRC (Ping timeout: 378 seconds)
[23:08] maaaybe
[23:08] might as well grab angelfire while we're at it- you'd then have the three big players from the 90s
[23:10] does lycos still host homepages?
[23:11] *** REiN^ has quit IRC (Read error: Operation timed out)
[23:15] yeah- there's classic 90s Angelfire: http://www.angelfire.com/sd/ScrewAOL/ and the modern Angelfire: http://www.angelfire.lycos.com/
[23:17] *** ex-parro1 has joined #archiveteam
[23:26] so modern angelfire is probably not too hard to archive because there's sitemaps listing the pages: http://www.angelfire.com/sitemap-index-00.xml.gz
[23:26] I started that project but when wget got killed because of memkiller, I stopped
[23:50] *** cf has quit IRC (Quit: cf)
[23:57] so some of the KBS News Today i got are incomplete
[23:58] doing a 2nd rtmpdump gets me a bigger file
[23:58] i must have been doing too many at once