[00:00] so lifehacker.com and deadspin.com sitemap urls are all updated
[00:07] tay.kotaku.com redirects to tay.kinja.com
[00:11] *** JesseW has joined #archiveteam-bs
[00:12] *** BartoCH has joined #archiveteam-bs
[00:15] Suggestions for scripts for owners of a private github repo to use to extract the issues and publish them? https://github.com/npm/www/issues/9
[00:15] I suggested joeyh's github-backup -- but other suggestions would also be very welcome.
[00:25] *** RichardG_ is now known as RichardG
[00:27] https://archive.org/details/Making_of_Antarctica
[00:27] *** JesseW has quit IRC (Read error: Operation timed out)
[00:27] https://archive.org/details/Otaku_JJ_Beineix
[00:28] https://archive.org/details/Raid_1954
[00:28] https://archive.org/details/Audience_of_One_-_2007
[00:29] https://archive.org/details/Showdown_in_Little_Tokyo_Uncut_CG
[00:29] https://archive.org/details/Hollywood_Mavericks.1990.Florence_Dauman.Dale_Ann_Steiber.mkv
[00:30] thats all of the Cinemageddon videos i uploaded to FOS
[00:30] i figure people here would want them
[00:37] I'm pretty sure lycos ignores robots.txt. At least partially...
[00:39] Does Lycos have any search operators?
[00:40] looks like they lost most of them in 2004 when they switched to Yahoo! DB?
[00:40] http://www.searchengineshowdown.com/features/lycos/
[00:45] the advanced page search doesn't seem to exist anymore :/
[00:47] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[00:57] *** BlueMaxim has joined #archiveteam-bs
[01:02] *** username1 has joined #archiveteam-bs
[01:05] *** schbirid2 has quit IRC (Read error: Operation timed out)
[01:59] *** schbirid2 has joined #archiveteam-bs
[02:03] *** username1 has quit IRC (Read error: Operation timed out)
[02:06] *** Aranje has joined #archiveteam-bs
[02:19] *** hook54321 has left
[03:05] *** hook54321 has joined #archiveteam-bs
[03:13] can someone re-op me in #archivebot? svchfoo disappeared
[03:28] *** mutoso_ has joined #archiveteam-bs
[03:29] *** Smiley has quit IRC (Read error: Operation timed out)
[03:29] *** beardicus has quit IRC (Read error: Operation timed out)
[03:31] *** Whopper_ has joined #archiveteam-bs
[03:31] *** mutoso has quit IRC (Read error: Operation timed out)
[03:31] *** VADemon has quit IRC (Quit: left4dead)
[03:32] *** closure has quit IRC (Read error: Operation timed out)
[03:34] *** Smiley has joined #archiveteam-bs
[03:37] *** Whopper has quit IRC (Read error: Operation timed out)
[03:47] *** closure has joined #archiveteam-bs
[04:20] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[04:27] *** Sk1d has joined #archiveteam-bs
[04:28] *** Aranje has quit IRC (Ping timeout: 260 seconds)
[04:36] *** beardicus has joined #archiveteam-bs
[04:49] i'm uploading an adland.tv web archive from 2014-09-01
[04:50] just know it's incomplete, meaning it stopped before being completed
[04:50] but it's +500M of it
[04:51] Star Trek Beyond
[04:52] a-ok
[04:52] i watched that on my birthday
[04:52] i also went to five guys
[04:53] https://archive.org/details/adland.tv-20140901
[05:04] *** brayden has joined #archiveteam-bs
[05:04] *** swebb sets mode: +o brayden
[05:09] *** brayden_ has quit IRC (Read error: Operation timed out)
[05:36] *** tomwsmf has quit IRC (Ping timeout: 255 seconds)
[06:09] *** dashcloud has quit IRC (Read error: Operation timed out)
[06:12] wow
[06:12] The Cuban CDN http://hn.premii.com/#/article/12319063
[06:13] *** dashcloud has joined #archiveteam-bs
[07:14] *** JesseW has joined #archiveteam-bs
[07:27] *** acridAxid has quit IRC (marauder)
[07:28] *** acridAxid has joined #archiveteam-bs
[07:51] *** Honno has joined #archiveteam-bs
[07:59] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[08:29] *** GE has joined #archiveteam-bs
[08:35] *** BartoCH has joined #archiveteam-bs
[10:51] *** GE_ has joined #archiveteam-bs
[10:54] *** GE has quit IRC (Ping timeout: 255 seconds)
[10:54] *** GE_ is now known as GE
[11:47] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[11:48] *** BartoCH has joined #archiveteam-bs
[12:03] *** BartoCH has quit IRC (Quit: WeeChat 1.5)
[12:05] *** BartoCH has joined #archiveteam-bs
[12:57] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[13:02] *** BartoCH has joined #archiveteam-bs
[13:08] *** luckcolor has quit IRC (Read error: Operation timed out)
[13:08] *** luckcolor has joined #archiveteam-bs
[13:21] *** davidar has quit IRC (Quit: Connection closed for inactivity)
[13:26] *** GE has quit IRC (Quit: zzz)
[14:08] *** atrocity has joined #archiveteam-bs
[14:08] oh 90%, why was I so young...
[14:08] 90's...
[14:08] https://www.youtube.com/watch?v=IY2j_GPIqRA
[14:41] *** BlueMaxim has quit IRC (Quit: Leaving)
[15:04] *** RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
[15:06] *** RichardG has joined #archiveteam-bs
[15:19] *** GE has joined #archiveteam-bs
[16:26] *** GE_ has joined #archiveteam-bs
[16:28] *** GE has quit IRC (Ping timeout: 255 seconds)
[16:28] *** GE_ is now known as GE
[17:38] *** JesseW has joined #archiveteam-bs
[17:59] i'm uploading more pdfs from the Sky and Telescope
[18:59] *** GE_ has joined #archiveteam-bs
[19:00] *** GE has quit IRC (Ping timeout: 255 seconds)
[19:00] *** GE_ is now known as GE
[19:27] Is that the one I tried yesterday HCross
[19:27] 4 million items?
[19:28] is it the NASA funded docs?
[19:28] Yis
[19:28] (the NASA items aren't that big tho)
[19:29] ah I was looking at it too
[19:29] ArchiveBot prob wont get it all
[19:29] ive exported it as XML and am looking at it
[19:29] Archivebot went a bit mad.
[19:29] I was working through it as individual items
[19:32] Igloo^, doing some testing, but it may be easier to create a large list
[19:36] *** RichardG has quit IRC (Ping timeout: 250 seconds)
[19:37] Yeah, Create a list and whack it through AB
[19:42] that nasa docs i uploaded is around 90k
[19:42] *** RichardG has joined #archiveteam-bs
[19:43] *** tomwsmf has joined #archiveteam-bs
[19:44] Igloo^, they let you export the ID #'s but it's per page. If I list it by 100 then it's only 9 pages. I'll then write something that will generate the URLs
[19:45] !ao works
[19:45] However, Doesn't get the sub images
[19:45] or not. Managed to download a complete list
[19:45] yea
[19:45] Which is a bit ropey.
[19:46] !ao gets too much other stuff
[19:46] ao gets the full site for the waybackmachine
[19:46] it may be that it's better if I do a grab-site instance with some custom ignores etc
[19:46] I was thinking of doing a custom Heritrix run
[19:46] BUT I don't think that'll work
[19:46] as it'll be huge
[19:46] want me to generate a full list of URLs anyway
[19:47] Sure
[20:07] Igloo^, www.ncbi.nlm.nih.gov/pmc/articles/PMC4973959
[20:07] www.ncbi.nlm.nih.gov/pmc/articles/PMC4980455
[20:07] www.ncbi.nlm.nih.gov/pmc/articles/PMC4971634
[20:07] www.ncbi.nlm.nih.gov/pmc/articles/PMC4964660
[20:07] www.ncbi.nlm.nih.gov/pmc/articles/PMC4971156
[20:07] www.ncbi.nlm.nih.gov/pmc/articles/PMC4939048
[20:07] www.ncbi.nlm.nih.gov/pmc/articles/PMC4937211
[20:07] www.ncbi.nlm.nih.gov/pmc/articles/PMC4917110
[20:07] www.ncbi.nlm.nih.gov/pmc/articles/PMC4934352
[20:07] www.ncbi.nlm.nih.gov/pmc/articles/PMC4926486
[20:07] www.ncbi.nlm.nih.gov/pmc/articles/PMC4932956
[20:07] www.ncbi.nlm.nih.gov/pmc/articles/PMC4872529
[20:07] www.ncbi.nlm.nih.gov/pmc/articles/PMC4870578
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4896262
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4919777
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4848480
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4831017
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4846461
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4820435
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4814050
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4797119
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4794207
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4808930
[20:08] Fucking pastebin it or something
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4866469
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4771323
[20:08] Instead of several hundred lines
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4751316
[20:08] :P
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4750446
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4760178
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4810239
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4738353
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4829277
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4770934
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4731148
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4729913
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4728390
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4727388
[20:08] www.ncbi.nlm.nih.gov/pmc/articles/PMC4718941
[20:08] *** HCross was kicked by Frogging (HCross)
[20:08] Thank you Frogging
[20:10] I wonder if he meant to paste the pastebin link but still had the list in his clipboard :po
[20:10] :p *
[20:10] I think that's what he meant to do :P
[20:10] But yaknow, noob etc
[20:12] Now waiting while my hexchat stops having a meltdown over that, sorry
[20:12] no problem :p
[20:12] *** Frogging sets mode: +o HCross2
[20:13] *** HCross has joined #archiveteam-bs
[20:13] there we go
[20:13] http://paste.nerds.io/axorogoxif.avrasm
[20:13] *** Frogging sets mode: +o HCross
[20:13] thanks
[20:15] *** JesseW has quit IRC (Quit: Leaving.)
[20:15] *** JesseW has joined #archiveteam-bs
[20:16] *** kristian_ has joined #archiveteam-bs
[20:26] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[20:29] is it 4 million docs?
[20:29] the new ones that they released are just 900 odd
[20:34] It would be a good warrior job.
[20:41] http://www.ncbi.nlm.nih.gov/pmc/journals/1978/
[20:41] you grab by journal number
[20:41] then grab the links from those pages
[20:41] the pdfs are linked there
[20:42] http://www.ncbi.nlm.nih.gov/pmc/issues/218561/
[21:17] *** Coderjoe has quit IRC (Read error: Operation timed out)
[21:45] *** GE has quit IRC (Quit: zzz)
[21:56] *** GE has joined #archiveteam-bs
[22:17] *** GE_ has joined #archiveteam-bs
[22:20] *** GE has quit IRC (Ping timeout: 255 seconds)
[22:20] *** GE_ is now known as GE
[22:34] *** Start has quit IRC (Quit: Disconnected.)
[22:34] *** Start has joined #archiveteam-bs
[22:45] *** Coderjoe has joined #archiveteam-bs
[22:59] *** GE has quit IRC (Remote host closed the connection)
[23:13] *** kristian_ has quit IRC (Leaving)
[23:14] *** Honno has quit IRC (Read error: Operation timed out)
[23:14] *** tomwsmf has quit IRC (Read error: Operation timed out)
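
Editor's sketch of the two steps discussed in the log: turning an exported list of PMC IDs into article URLs for a crawl list, and pulling PDF links out of a journal or issue page. This is only a rough illustration, not what anyone in the channel actually ran. The function and class names are hypothetical, the http:// scheme and the idea that PDF links end in ".pdf" are assumptions, and nothing here fetches pages from the network.

```python
"""Sketch: build PMC article URLs from exported IDs and find PDF links in a page."""
from html.parser import HTMLParser
from urllib.parse import urljoin

# Host and path come from the URLs pasted in the log; the scheme is an assumption.
ARTICLE_BASE = "http://www.ncbi.nlm.nih.gov/pmc/articles/"


def build_article_urls(raw_ids):
    """Return one article URL per unique ID, keeping the input order."""
    seen = set()
    urls = []
    for raw in raw_ids:
        token = raw.strip().upper()
        if not token:
            continue  # blank line in the export
        if not token.startswith("PMC"):
            token = "PMC" + token  # accept bare numeric IDs too
        if not token[3:].isdigit():
            continue  # skip anything that is not "PMC" followed by digits
        if token in seen:
            continue  # drop duplicates
        seen.add(token)
        urls.append(ARTICLE_BASE + token)
    return urls


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> targets that look like PDFs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: a PDF link is any href whose path ends in ".pdf".
        if href and href.lower().split("?")[0].endswith(".pdf"):
            self.links.append(urljoin(self.base_url, href))


def extract_pdf_links(html, base_url):
    """Parse already-downloaded HTML and return the PDF links found in it."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    # Two IDs taken from the list pasted in the log.
    for url in build_article_urls(["PMC4973959", "4980455"]):
        print(url)
```

The URL list can be written one per line to a text file and handed to whatever crawler ends up doing the job; the link extractor would be run on each journal or issue page after it has been downloaded separately.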