[00:15] *** sivoais has joined #archiveteam
[00:16] *** kyan has joined #archiveteam
[00:23] *** mistym has quit IRC (Remote host closed the connection)
[00:26] *** mistym has joined #archiveteam
[00:26] *** mistym has quit IRC (Remote host closed the connection)
[00:32] *** kyan has quit IRC (Quit: This computer has gone to sleep)
[00:37] *** kyan has joined #archiveteam
[00:39] *** mistym has joined #archiveteam
[00:53] *** philpem has quit IRC (Ping timeout: 252 seconds)
[00:58] *** Yiffiel_d has joined #archiveteam
[00:59] oh, it finally connected me, woohoo
[01:00] *** link343 has quit IRC ()
[01:11] *** sb057 has joined #archiveteam
[01:11] so have the blip alarms been sounded yet
[01:12] sb057: yup, #blooper.tv
[01:13] *** DFJustin has quit IRC (Quit: IMHOSTFU)
[01:16] *** DFJustin has joined #archiveteam
[01:16] *** swebb sets mode: +o DFJustin
[01:18] *** DFJustin has quit IRC (Client Quit)
[01:18] *** DFJustin has joined #archiveteam
[01:18] *** swebb sets mode: +o DFJustin
[01:18] *** DFJustin has quit IRC (Client Quit)
[01:19] *** DFJustin has joined #archiveteam
[01:19] *** swebb sets mode: +o DFJustin
[01:23] wow, didn't know rapidshare finally died before hitting the site. Thought they would just keep making bad decisions yet mysteriously survive forever like reddit and yahoo.
[01:40] *** Spritecla has quit IRC (Read error: Operation timed out)
[01:40] *** Spritecla has joined #archiveteam
[01:48] *** Asparagir has quit IRC (Asparagir)
[01:50] *** schbirid2 has quit IRC (Read error: Operation timed out)
[01:54] Dead dead everybody's dead
[01:54] I'm FINALLY uploading the fanfiction grab.
[02:04] *** schbirid2 has joined #archiveteam
[02:11] ff.net?
[02:12] ...that reminds me, did anyone ever save all the x-/r-rated stuff there or was that before this site's time
[02:14] Archive Team: Saving Your Porn Since 2009
[02:16] nah nah that was back in like....'03? Because a bunch of soccer moms complained about their kids reading naughty fics.
[02:16] nm
[02:19] *** BlueMaxim has joined #archiveteam
[02:21] *** primus104 has quit IRC (Leaving.)
[02:22] https://en.wikipedia.org/wiki/FanFiction.Net#NC-17_ratings oh 2002
[02:23] wow they had three content purges.
[02:24] First that, then a CYOA purge in 2005, and then they purged naughty stories again 10 years later: https://en.wikinews.org/wiki/FanFiction.Net_adult_content_purge_felt_across_fandom_two_weeks_on
[02:32] -bs
[02:32] Took a 552-mb first letter file, got it down to 152mb.
[02:32] So that's good.
[02:33] *** JesseW has joined #archiveteam
[02:40] *** Asparagir has joined #archiveteam
[02:49] *** Coderjoe has quit IRC (Ping timeout: 252 seconds)
[02:49] *** Ravenloft has quit IRC ()
[02:50] *** Coderjoe has joined #archiveteam
[04:17] *** Stiletto has quit IRC (Read error: Connection reset by peer)
[04:18] *** Stiletto has joined #archiveteam
[04:31] *** aaaaaaaaa has quit IRC (Leaving)
[04:45] *** JesseW has quit IRC (Quit: Leaving.)
[04:53] *** Asparagir has quit IRC (Asparagir)
[04:55] *** Stiletto has quit IRC (Read error: Operation timed out)
[04:56] *** Stiletto has joined #archiveteam
[05:02] *** cadbury_ has quit IRC (Ping timeout: 606 seconds)
[05:10] *** cadbury_ has joined #archiveteam
[05:12] *** JesseW has joined #archiveteam
[05:29] *** john1 has quit IRC (WeeChat 1.2)
[05:46] *** JesseW has quit IRC (hub.dk irc.underworld.no)
[05:46] *** espes__ has quit IRC (hub.dk irc.underworld.no)
[05:46] *** will has quit IRC (hub.dk irc.underworld.no)
[05:46] *** filippo__ has quit IRC (hub.dk irc.underworld.no)
[05:46] *** Aranje has quit IRC (hub.dk irc.underworld.no)
[05:46] *** PepsiMax has quit IRC (hub.dk irc.underworld.no)
[05:46] *** zhongfu has quit IRC (hub.dk irc.underworld.no)
[05:46] *** vOYtEC has quit IRC (hub.dk irc.underworld.no)
[05:46] *** wm_ has quit IRC (hub.dk irc.underworld.no)
[05:46] *** ersi has quit IRC (hub.dk irc.underworld.no)
[05:46] *** raylee has quit IRC (hub.dk irc.underworld.no)
[05:46] *** Atluxity has quit IRC (hub.dk irc.underworld.no)
[05:48] *** espes___ has joined #archiveteam
[05:49] *** ersi_ has joined #archiveteam
[05:50] *** PepsiMax_ has joined #archiveteam
[05:50] *** zhongfu_ has joined #archiveteam
[05:51] *** will__ has joined #archiveteam
[06:02] *** will__ is now known as will
[06:08] *** vOYtEC has joined #archiveteam
[06:13] *** filippo__ has joined #archiveteam
[06:23] *** wp494 has quit IRC (Ping timeout: 306 seconds)
[06:38] *** Start has quit IRC (Read error: Operation timed out)
[06:59] *** wp494 has joined #archiveteam
[06:59] *** wp494 has quit IRC (Excess Flood)
[06:59] *** wp494 has joined #archiveteam
[07:03] *** WinterFox has joined #archiveteam
[07:04] *** mistym has quit IRC (Remote host closed the connection)
[07:07] *** mistym has joined #archiveteam
[07:08] *** WinterFox has quit IRC (Read error: Operation timed out)
[07:11] *** mistym has quit IRC (Remote host closed the connection)
[07:14] *** Wyatts has quit IRC (Remote host closed the connection)
[07:19] *** Wyatts has joined #archiveteam
[07:21] *** WinterFox has joined #archiveteam
[07:28] *** mafrasi2 has quit IRC (Read error: Connection reset by peer)
[07:30] *** mafrasi2 has joined #archiveteam
[07:31] *** Spritecla has quit IRC (Quit: If wanting to be in a world filled with human horses is wrong, then I don't want to be right.)
[07:53] *** mafrasi2 has quit IRC (Read error: Connection reset by peer)
[07:57] *** mafrasi2 has joined #archiveteam
[08:11] *** mistym has joined #archiveteam
[08:21] *** mistym has quit IRC (Read error: Operation timed out)
[08:30] *** primus104 has joined #archiveteam
[08:34] *** atomotic has joined #archiveteam
[08:39] *** dashcloud has quit IRC (Read error: Operation timed out)
[08:47] *** dashcloud has joined #archiveteam
[09:01] *** zenguy_pc has quit IRC (Read error: Connection reset by peer)
[09:01] *** zenguy_pc has joined #archiveteam
[09:09] *** raylee has joined #archiveteam
[09:10] *** wm_ has joined #archiveteam
[09:15] *** mistym has joined #archiveteam
[09:19] *** mistym has quit IRC (Ping timeout: 252 seconds)
[09:31] *** garyrh has quit IRC (http://bnc4free.com/)
[10:18] *** MMovie has quit IRC (Ping timeout: 306 seconds)
[10:27] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[10:42] *** link343 has joined #archiveteam
[10:44] *** RedType has quit IRC (Read error: Operation timed out)
[10:44] *** xmc has quit IRC (Ping timeout: 483 seconds)
[10:50] *** RedType has joined #archiveteam
[10:50] *** WinterFox has quit IRC (Read error: Operation timed out)
[10:54] *** primus104 has quit IRC (Leaving.)
[10:57] *** xmc has joined #archiveteam
[10:57] *** swebb sets mode: +o xmc
[11:04] *** WinterFox has joined #archiveteam
[11:39] *** atomotic has joined #archiveteam
[12:26] *** McGee has joined #archiveteam
[12:28] *** godane has quit IRC (Quit: Leaving.)
[12:31] *** godane has joined #archiveteam
[12:52] so
[12:52] what projects require assistance?
[13:11] *** primus104 has joined #archiveteam
[13:16] *** mistym has joined #archiveteam
[13:21] *** mistym has quit IRC (Ping timeout: 252 seconds)
[13:26] *** oldcad has joined #archiveteam
[13:35] *** Start has joined #archiveteam
[13:46] *** moof_ has joined #archiveteam
[13:48] *** BlueMaxim has quit IRC (Quit: Leaving)
[13:53] *** Ungstein has quit IRC (Quit: Leaving.)
[13:54] *** zenguy_pc has quit IRC (Read error: Connection reset by peer)
[13:54] *** zenguy_pc has joined #archiveteam
[13:57] *** MMovie has joined #archiveteam
[13:58] *** moof_ has quit IRC (Quit: Page closed)
[14:10] *** garyrh has joined #archiveteam
[14:17] *** Atluxity has joined #archiveteam
[14:19] *** Ungstein has joined #archiveteam
[14:20] *** Ungstein has quit IRC (Read error: Connection reset by peer)
[14:24] *** Ungstein has joined #archiveteam
[14:25] *** Ungstein has quit IRC (Read error: Connection reset by peer)
[14:28] *** vitzli has joined #archiveteam
[14:30] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[14:31] *** mistym has joined #archiveteam
[14:44] *** mistym has quit IRC (Remote host closed the connection)
[14:46] *** Ungstein has joined #archiveteam
[15:00] *** Ungstein has quit IRC (Quit: Leaving.)
[15:01] google moderator needs to be started soon.
[15:02] *** mistym has joined #archiveteam
[15:02] okay
[15:02] chfoo: any way I can help?
[15:04] Google moderator has a lot of #'s in the urls
[15:04] some urls don't
[15:04] ok
[15:04] McGee: maybe try sampling the csv urls to see how many exist or not. e.g., if more than 50% of the urls exist, then we just grab them all without checking if they exist
[15:04] we'll grab the urls without it with archivebot
[15:04] cool
[15:04] there's at least 2 million csv urls
[15:04] which all return http status 200
[15:05] chfoo: https://github.com/ArchiveTeam/googlemoderator-items/blob/master/01_moderator.txt
[15:05] if they all return 200 let's grab all of them
[15:05] tiny project
[15:05] arkiver: that list is far from complete
[15:05] ^
[15:06] i mean they return 200 regardless whether the csv file exists
[15:06] so that makes 16777216 urls
[15:06] yes, so we'll grab all of them
[15:07] 21fdbf is the highest i found
[15:07] ok, good
[15:07] will have scripts ready today
[15:07] ok
[15:11] chfoo: yes, I think we got that
[15:11] oh oops
[15:11] scrolled back a bit, saw "i mean they return 200 regardless whether the csv file exists" :P
[15:12] :D
[15:12] chfoo: do you know of any way to correctly grab the urls with #?
[15:13] arkiver: no, i haven't yet
[15:13] *** primus104 has quit IRC (Leaving.)
[15:15] chfoo: ok
[15:23] http://arstechnica.com/tech-policy/2015/07/site-revives-grooveshark-playlists-free-streaming-and-says-its-100-legit/ -- wonder how hard it would be to scrape the db data before this goes down
[15:37] *** closure has quit IRC (Read error: Connection reset by peer)
[15:37] To be honest, I think we need more people for Archivebot
[15:42] oh SketchCow ?
[15:42] SketchCow: we should be do something like rss/google scrap in archivebot
[15:43] that way we can set it to check ever 3 or 12 hours rss feeds and grab all links that not in archivebot alreadly
[15:43] *** closure has joined #archiveteam
[15:55] I have a group offering me a superfeed.
[15:55] I'd rather archivebot be a laser than a harvester.
[15:55] Otherwise we end up with the nightmare that is the Internet Archive crawlers.
[16:01] true
[16:05] chfoo: can you please add http://tracker.archiveteam.org/googlemoderator/ to the projects.json?
[16:05] and you can remove the atomicgamer from the projects.json if you have time
[16:05] SketchCow: btw i have all of the KBS World Radio from 2004 to 2014 now
[16:05] at least the english program
[16:05] *** mistym has quit IRC (Quit: Leaving...)
[16:07] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[16:08] *** dashcloud has joined #archiveteam
[16:08] chfoo: https://www.google.com/moderator/g/csv/moderator-20fdb1.zip return 403
[16:09] and for exa
[16:15] *** Emcy has joined #archiveteam
[16:19] scripts for google moderator are ready
[16:20] ok, servers are ready
[16:21] cool
[16:21] arkiver: where?
[16:21] it's visible after chfoo has added it to the projects.json
[16:21] I'll load up the items and do a few more tests to be sure
[16:22] we're also going to start the grab of last.fm profiles soon
[16:22] working on that now too ^
[16:22] ok
[16:24] IRC Channel?
[16:24] #lastchance.fm
[16:24] #moderhater
[16:25] *** Ungstein has joined #archiveteam
[16:26] thx
[17:04] *** dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
[17:05] *** dashcloud has joined #archiveteam
[17:10] Interesting https://tcrf.net/The_Cutting_Room_Floor
[17:15] *** McGee has quit IRC (http://www.mibbit.com ajax IRC Client)
[17:29] Yes, we've grabbed it before
[17:31] *** VADemon has joined #archiveteam
[17:34] *** K4k has joined #archiveteam
[17:41] *** primus104 has joined #archiveteam
[18:06] The Google Moderator grab is running!
[18:07] *** primus104 has quit IRC (Leaving.)
[18:07] *** ersi_ is now known as ersi
[18:15] not far from the 1st 1000 items
[18:20] *** vitzli has quit IRC (Quit: Leaving)
[18:24] *** Ungstein has quit IRC (Ping timeout: 265 seconds)
[18:28] *** kyan has quit IRC (Quit: This computer has gone to sleep)
[18:35] *** primus104 has joined #archiveteam
[18:57] *** primus104 has quit IRC (Leaving.)
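Editorial sketch for the Google Moderator discussion above: the log quotes 16777216 candidate URLs (which equals 16^6, i.e. a six-digit hex ID space), a highest observed ID of 21fdbf, and one example URL. The Python below only enumerates candidate URLs and groups them into work items. Treating the suffix as a zero-padded lowercase hex counter is an assumption drawn from that single example, and the names `candidate_urls` and `batches` are hypothetical, not taken from the project's actual scripts.

```python
from itertools import islice

# URL pattern copied from the example URL in the log. Treating the
# six-character suffix as a zero-padded lowercase hex counter is an assumption.
URL_TEMPLATE = "https://www.google.com/moderator/g/csv/moderator-{:06x}.zip"

ID_SPACE = 16 ** 6        # six hex digits: the 16777216 figure quoted in the log
HIGHEST_SEEN = 0x21FDBF   # highest ID reported in the log


def candidate_urls(start=0, stop=HIGHEST_SEEN + 1):
    """Yield one candidate URL for every ID in the range [start, stop)."""
    for item_id in range(start, stop):
        yield URL_TEMPLATE.format(item_id)


def batches(urls, size):
    """Group an iterable of URLs into lists of at most `size` entries."""
    iterator = iter(urls)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # Print the first two hypothetical work items of three URLs each.
    for item in islice(batches(candidate_urls(), 3), 2):
        print(item)
```

Stopping at the highest observed ID keeps the list near the "at least 2 million" figure mentioned in the log; passing `stop=ID_SPACE` would cover the full six-digit space instead.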
[19:02] *** aaaaaaaaa has joined #archiveteam
[19:02] *** swebb sets mode: +o aaaaaaaaa
[19:03] *** kyan has joined #archiveteam
[19:12] *** Stiletto has quit IRC (Ping timeout: 624 seconds)
[19:47] i wrote my first wiki article http://archiveteam.org/index.php?title=Niconico
[19:47] sorry, should be in #archiveteam-bs
[19:48] good article!
[19:50] *** kyan has quit IRC (Quit: This computer has gone to sleep)
[20:03] *** VADemon has quit IRC (Quit: left4dead)
[20:03] http://tech.slashdot.org/story/15/07/21/1757239/google-photos-to-shut-down-august-1
[20:12] *** primus104 has joined #archiveteam
[20:31] *** Stiletto has joined #archiveteam
[20:39] *** primus104 has quit IRC (Leaving.)
[20:49] *** K4k has quit IRC (WeeChat 1.2)
[21:22] *** primus104 has joined #archiveteam
[21:32] *** dashcloud has quit IRC (Read error: Operation timed out)
[21:33] *** dashcloud has joined #archiveteam
[21:33] yeah actually it's pretty well done szalwia
[21:37] *** PotcFdk has joined #archiveteam
[21:38] Hey, on http://archiveteam.org/index.php?title=Pomf.se it says "to prevent link rot, all a.pomf.se links now redirect to the corresponding file on the Wayback Machine.". However, the Wayback Machine says "Page cannot be crawled or displayed due to robots.txt.
[21:38] "
[21:38] Has anyone noticed yet?
[21:39] they broke their robots.txt by redirecting it to wayback, iirc
[21:40] Yeah that's what I assumed. Maybe they should make an exception for robots.txt
[21:41] How would one go about contacting the correct person?
[21:43] you need to contact the owner of pomf
[21:44] maybe you can through his/her twitter account or mail maybe
[21:44] the owner needs to exclude the robots.txt from being redirecte to the wayback macine
[21:44] Will try my best, thanks. o/
[21:44] machine*
[21:45] *** McGEE has joined #archiveteam
[21:48] *** philpem has joined #archiveteam
[21:50] PotcFdk: try neku@pomf.se or https://twitter.com/nekunekus
[21:54] szalwia: Yeah I just sent a mail.
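Editorial sketch for the pomf.se exchange above: the fix proposed in the log is to redirect every request to the Wayback Machine except robots.txt. The Python below shows only that routing decision. The function name `redirect_target`, the file name in the usage note, and the timestamp-free Wayback URL form are assumptions for illustration; nothing here reflects how the site was actually configured.

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumed Wayback Machine URL form: the archive prefix followed by the original
# URL, with no timestamp. This is an illustration, not the site's real setup.
WAYBACK_PREFIX = "https://web.archive.org/web/"


def redirect_target(url: str) -> Optional[str]:
    """Return the Wayback URL to redirect to, or None to serve the request locally."""
    path = urlsplit(url).path
    if path == "/robots.txt":
        # Keep robots.txt on the origin host so crawlers read the real file
        # instead of following a redirect.
        return None
    return WAYBACK_PREFIX + url
```

With this rule, `redirect_target("http://a.pomf.se/robots.txt")` returns `None`, while a hypothetical file such as `http://a.pomf.se/abcdef.png` maps to the same URL behind the Wayback prefix.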
[21:58] *** dan- has quit IRC (Quit: Nyan nyan)
[22:07] *** dan- has joined #archiveteam
[22:19] *** McGEE has quit IRC (http://www.kiwiirc.com/ - A hand crafted IRC client)
[22:36] *** PotcFdk has quit IRC (Quit: ~/~)
[22:55] *** aaaaaaaaa has quit IRC (Leaving)
[23:29] *** Spritecla has joined #archiveteam
[23:42] *** kyan has joined #archiveteam
[23:47] *** aaaaaaaaa has joined #archiveteam
[23:47] *** swebb sets mode: +o aaaaaaaaa
[23:58] *** Stiletto has quit IRC (Read error: Operation timed out)