[00:08] *** RichardG_ has joined #archiveteam-bs
[00:15] *** RichardG has quit IRC (Ping timeout: 499 seconds)
[00:16] *** RichardG_ is now known as RichardG
[00:20] *** ris has quit IRC ()
[00:21] *** JesseW has joined #archiveteam-bs
[00:25] *** VADemon has quit IRC (left4dead)
[00:27] *** DoomTay has quit IRC (Ping timeout: 268 seconds)
[00:34] *** DoomTay has joined #archiveteam-bs
[00:55] *** Jeroen52 has quit IRC (Ping timeout: 260 seconds)
[00:58] *** coretx has quit IRC (Ping timeout: 506 seconds)
[01:02] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[01:03] *** Jeroen52 has joined #archiveteam-bs
[01:11] *** coretx has joined #archiveteam-bs
[01:37] *** mutoso has joined #archiveteam-bs
[01:37] *** mutoso_ has quit IRC (Read error: Connection reset by peer)
[01:41] *** davidar_ has quit IRC (Quit: Connection closed for inactivity)
[01:55] *** arkiver has quit IRC (Read error: Operation timed out)
[01:57] *** arkiver has joined #archiveteam-bs
[02:00] *** zenguy has quit IRC (Ping timeout: 370 seconds)
[02:01] *** zenguy has joined #archiveteam-bs
[02:02] *** dcmorton has quit IRC (Ping timeout: 370 seconds)
[02:03] *** winr5r has joined #archiveteam-bs
[02:03] *** winr4r has quit IRC (Read error: Operation timed out)
[02:07] *** dcmorton has joined #archiveteam-bs
[02:09] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[02:09] *** dcmorton has quit IRC (Excess Flood)
[02:10] *** dcmorton has joined #archiveteam-bs
[02:10] *** dcmorton has quit IRC (Excess Flood)
[02:11] *** dcmorton has joined #archiveteam-bs
[02:12] *** BlueMaxim has joined #archiveteam-bs
[02:35] *** dcmorton has quit IRC (Ping timeout: 370 seconds)
[02:41] *** dcmorton has joined #archiveteam-bs
[02:56] *** Coderjoe has quit IRC (Read error: Operation timed out)
[03:06] *** dcmorton has quit IRC (Ping timeout: 370 seconds)
[03:12] *** dcmorton has joined #archiveteam-bs
[03:21] *** Coderjoe has joined #archiveteam-bs
[03:21] *** nickname_ has joined #archiveteam-bs
[03:44] *** dcmorton has quit IRC (Excess Flood)
[03:47] *** dcmorton has joined #archiveteam-bs
[04:04] *** JesseW has joined #archiveteam-bs
[04:16] *** dcmorton has quit IRC (Max SendQ exceeded)
[04:19] *** dcmorton has joined #archiveteam-bs
[04:40] *** DFJustin has quit IRC (Remote host closed the connection)
[04:42] *** DFJustin has joined #archiveteam-bs
[04:42] *** swebb sets mode: +o DFJustin
[04:43] *** nickname_ has quit IRC (Read error: Operation timed out)
[05:01] *** jut has joined #archiveteam-bs
[05:01] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[05:10] *** Sk1d has joined #archiveteam-bs
[05:11] *** hook54321 has quit IRC (Quit: Connection closed for inactivity)
[05:30] *** antomati_ has quit IRC (Ping timeout: 258 seconds)
[05:50] SketchCow: i'm up to 2015 with deadspin.com grab
[05:51] i'm uploading 2013 to 2015 right now of it
[05:51] i'm also grabbing 2016-01 to 2016-05 of deadspin.com
[05:52] godane: btw, we're working on grabbing GSOC web pages right now in #archivebot -- your help could probably be useful
[05:58] *** DoomTay has quit IRC (Quit: Page closed)
[06:11] *** Cameron_D has quit IRC (Ping timeout: 370 seconds)
[06:17] *** Cameron_D has joined #archiveteam-bs
[06:22] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[06:24] *** BlueMaxim has joined #archiveteam-bs
[06:25] *** aschmitz has quit IRC (Read error: Operation timed out)
[06:48] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[06:58] *** aschmitz has joined #archiveteam-bs
[07:18] *** vtyl has joined #archiveteam-bs
[07:18] *** lytv has quit IRC (Ping timeout: 258 seconds)
[07:58] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[08:25] *** schbirid has joined #archiveteam-bs
[09:01] *** antomatic has joined #archiveteam-bs
[09:01] *** swebb sets mode: +o antomatic
[09:02] *** antomatic has quit IRC (Client Quit)
[09:09] *** antomatic has joined #archiveteam-bs
[09:09] *** swebb sets mode: +o antomatic
[09:16] *** dashcloud has quit IRC (Read error: Operation timed out)
[09:20] *** dashcloud has joined #archiveteam-bs
[10:45] *** anjacks0n has joined #archiveteam-bs
[10:53] *** anjacks0n has quit IRC (anjacks0n)
[11:15] *** signius has quit IRC (Ping timeout: 260 seconds)
[11:22] *** signius has joined #archiveteam-bs
[11:40] *** anjacks0n has joined #archiveteam-bs
[11:41] *** anjacks0n has quit IRC (Client Quit)
[11:48] *** hook54321 has joined #archiveteam-bs
[12:24] *** jut has quit IRC (Read error: Connection reset by peer)
[12:37] *** anjacks0n has joined #archiveteam-bs
[12:53] *** Boppen has joined #archiveteam-bs
[12:58] *** BlueMaxim has quit IRC (Quit: Leaving)
[13:02] *** anjacks0n has quit IRC (anjacks0n)
[13:23] *** anjacks0n has joined #archiveteam-bs
[13:41] *** anjacks0n has quit IRC (anjacks0n)
[13:42] *** anjacks0n has joined #archiveteam-bs
[13:47] *** kristian_ has joined #archiveteam-bs
[13:47] *** anjacks0n has quit IRC (anjacks0n)
[13:48] *** anjacks0n has joined #archiveteam-bs
[13:49] *** anjacks0n has quit IRC (Client Quit)
[13:50] *** anjacks0n has joined #archiveteam-bs
[13:50] *** anjacks0n has quit IRC (Client Quit)
[13:54] So, don't spread to social media or post anywhere...
[13:54] ...there's a new beta version of the next iteration of the Wayback machine.
[13:57] https://web-beta.archive.org
[13:57] Please consider yourselves invited to bang the living shit out of it.
[13:58] If you hit something SUPER broken, mail Mark at mark@archive.org.
[13:58] He's head of Wayback
[14:00] cool
[14:06] it shows the source of the crawl, that is awesome. https://wayback-beta.archive.org/web/20160312075544/http://www.whtimes.co.uk/home :)
[14:07] is that good? it may lead to people going after WARCs to get them darked
[14:08] wouldn't they just contact the IA and go "delete xxxx.co.uk please"
[14:08] anyway, without the warc
[14:09] eh, maybe they'd just throw robots.txt at it
[14:09] dunno, just idle speculation :p
[14:11] *** hook54321 has quit IRC (Quit: Connection closed for inactivity)
[14:13] *** DoomTay has joined #archiveteam-bs
[14:17] For everyone who wishes they could look at 1,500 of my Japan photos: https://www.flickr.com/photos/textfiles/albums/72157669136764700
[14:18] *** anjacks0n has joined #archiveteam-bs
[14:25] So how's that GCI sweeping going in?
[14:37] *** j08nY has joined #archiveteam-bs
[14:38] *** anjacks0n has quit IRC (anjacks0n)
[14:39] *** nickname_ has joined #archiveteam-bs
[15:20] *** VADemon has joined #archiveteam-bs
[15:40] *** Kenshin has quit IRC (Remote host closed the connection)
[15:46] *** JesseW has joined #archiveteam-bs
[15:48] *** kristian_ has quit IRC (Leaving)
[15:48] *** nickname_ has quit IRC (Read error: Operation timed out)
[15:50] *** Kenshin has joined #archiveteam-bs
[15:53] *** nickname_ has joined #archiveteam-bs
[16:11] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[16:13] *** RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
[16:16] *** nickname_ has quit IRC (Read error: Connection reset by peer)
[16:21] *** RichardG has joined #archiveteam-bs
[17:17] SketchCow: minor UI bug that makes it nigh impossible to click the "why" items in the domain timeline because the chart is overlapping it.... should I report that via that email as well, or is that just for severe functionality breakage?
[17:22] meh, found some breakage, I'll just combine it into one email
[17:23] *** metalcamp has joined #archiveteam-bs
[17:24] hm. Google is not visible due to robots.txt, in the current wayback machine? wut?
[17:25] *** Stilett0 has quit IRC ()
[17:30] Works fine for me
[17:34] * joepie91 plays QA engineer
[17:34] up to 6 issues: 2 functionality issues, 2 UI quirks, 2 possible UI improvements
[17:52] *** VADemon has quit IRC (Quit: left4dead)
[17:53] *** VADemon has joined #archiveteam-bs
[18:05] *** JW_work has quit IRC (Quit: Leaving.)
[18:07] *** mutoso has quit IRC (Read error: Operation timed out)
[18:08] *** JW_work has joined #archiveteam-bs
[18:09] *** mutoso has joined #archiveteam-bs
[18:23] *** ris has joined #archiveteam-bs
[18:27] guys, just installed the latest update of wpull, 2.0.1
[18:27] Traceback (most recent call last):
[18:27] File "/usr/local/bin/grab-site", line 4, in <module>
[18:27] main.main()
[18:27] File "/usr/local/lib/python3.4/site-packages/click/core.py", line 716, in __call__
[18:27] return self.main(*args, **kwargs)
[18:27] File "/usr/local/lib/python3.4/site-packages/click/core.py", line 696, in main
[18:27] rv = self.invoke(ctx)
[18:28] File "/usr/local/lib/python3.4/site-packages/click/core.py", line 889, in invoke
[18:28] return ctx.invoke(self.callback, **ctx.params)
[18:28] File "/usr/local/lib/python3.4/site-packages/click/core.py", line 534, in invoke
[18:28] return callback(*args, **kwargs)
[18:28] File "/usr/local/lib/python3.4/site-packages/libgrabsite/main.py", line 359, in main
[18:28] from wpull.app import Application
[18:28] ImportError: No module named 'wpull.app'
[18:28] any ideas?
[18:28] already tried to reinstall it
[18:43] leftover .pyc files?
[18:44] mmh
[18:44] i'll do a rm -r *.pyc
[18:44] this is not recursive
[18:45] I usually do `find . -name '*.pyc' -delete`
[18:46] (I missed the -r in your command, sorry)
[18:47] nope
[18:47] doesn't work
[18:47] maybe it's grabsite?
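The stale-bytecode check suggested at [18:43] and [18:45] (`find . -name '*.pyc' -delete`) can also be done from Python. A minimal sketch using `pathlib.Path.rglob`, which matches recursively in the same way as the `**/*.pyc` pattern raised a little later in the log; the root directory is whatever you pass in (for example the `site-packages` path from the traceback above). On Python 3 most bytecode sits in `__pycache__` directories, which this also removes once they are empty. As the log shows, clearing bytecode did not fix this particular `wpull.app` ImportError.

```python
from pathlib import Path


def delete_pyc(root: str) -> int:
    """Recursively delete *.pyc files under root and return how many were removed.

    Roughly equivalent to: find ROOT -name '*.pyc' -delete
    """
    base = Path(root)
    removed = 0
    # rglob("*.pyc") is the recursive "**/*.pyc" match; list() so we don't
    # delete files while the directory walk is still in progress.
    for pyc in list(base.rglob("*.pyc")):
        if pyc.is_file():
            pyc.unlink()
            removed += 1
    # Python 3 keeps bytecode in __pycache__/; drop those dirs if now empty.
    # Deepest paths first, so nested empty dirs are handled before parents.
    for cache_dir in sorted(base.rglob("__pycache__"), key=lambda p: len(p.parts), reverse=True):
        if cache_dir.is_dir() and not any(cache_dir.iterdir()):
            cache_dir.rmdir()
    return removed
```

For example, `delete_pyc("/usr/local/lib/python3.4/site-packages")` would clear bytecode for every installed package at that path, which is broader than `find .` run from inside a single package directory.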
[18:48] i'll reboot archivebot and see if it has errors
[18:53] *** arrith has quit IRC (Read error: Operation timed out)
[19:03] ok archivebot works
[19:03] which means it's a grab.site bug
[19:03] *grab-site
[19:05] will file an issue; if someone can make a patch soon it would be amazing
[19:06] SketchCow: deadspin.com is up to 2016-05 now and all uploaded
[19:06] i'm grabbing gizmodo.com right now
[19:11] https://github.com/ludios/grab-site/issues/92
[19:11] for whoever is interested
[19:13] [20:46] (I missed the -r in your command, sorry)
[19:13] still wouldn't make it recursive I think
[19:14] since you're only selecting *.pyc and not all folders
[19:14] you'd need something like... **/*.pyc?
[19:14] yeah I thought of the expansion after that
[19:14] ("zero or more path segments containing anything, followed by *.pyc")
[19:15] anyways that wasn't the problem
[19:15] well
[19:15] for what i know ofc
[19:49] *** Stiletto has joined #archiveteam-bs
[20:09] Okay, I started getting WARCs of a site that is going to change on the 20th. What's the next step? How do I get these into the Wayback Machine?
[20:12] *** tomwsmf-a has joined #archiveteam-bs
[20:15] *** schbirid has quit IRC (Quit: Leaving)
[21:06] DoomTay: what site
[21:06] portalgraphics.net
[21:07] Yes, I sicced ArchiveBot on that site twice, though I remember that yipdw kind of threw a fit the second time
[21:08] Plus, if there's another advantage to the way I'm doing it now, it's that cookie injection means the site will always come out in English
[21:09] why not just use /en/ for english?
[21:10] Hmm.. lemme try that real quick....
[21:12] Agh, knew it. Did no good for http://www.portalgraphics.net/pg/illust/?image_id=90308. Putting "&lang=en" did no good either
[21:14] What cookie are you using
[21:14] (haven't had a look at them yet)
[21:14] langset=en
[21:17] *** decay has quit IRC (Read error: Operation timed out)
[21:17] *** decay has joined #archiveteam-bs
[21:17] *** Lord_Nigh has quit IRC (Read error: Operation timed out)
[21:17] And why English? It looks like the website wants to serve Japanese by default.
[21:18] so I'm not sure if you'd want to force english
[21:18] *** Lord_Nigh has joined #archiveteam-bs
[21:18] maybe do both
[21:18] Also, a normal grab probably won't grab http://www.portalgraphics.net/pg/illust/?image_id=90301 correctly
[21:19] *** ring has quit IRC (Read error: Operation timed out)
[21:19] *** luckcolor has quit IRC (Read error: Operation timed out)
[21:19] *** SilSte has quit IRC (Read error: Operation timed out)
[21:19] *** j08nY has quit IRC (Read error: Operation timed out)
[21:19] *** MrRadar has quit IRC (Read error: Operation timed out)
[21:19] it looks like the flash player loads http://www.portalgraphics.net/pg/movie/address.php?image%5Fid=90301
[21:19] Oh, right
[21:19] *** MrRadar has joined #archiveteam-bs
[21:19] *** luckcolor has joined #archiveteam-bs
[21:19] Hmm...
[21:19] *** chazchaz_ has quit IRC (Read error: Operation timed out)
[21:19] which contains info, movie and image
[21:19] *** chazchaz has joined #archiveteam-bs
[21:19] Apart from that, language selection seems to be random
[21:20] Hell, I don't know if it would even be possible to save both
[21:20] *** Fletcher has quit IRC (Read error: Operation timed out)
[21:20] And have them both on the Wayback Machine
[21:20] *** alfie has quit IRC (Read error: Operation timed out)
[21:20] Well, at least wget could pull them both
[21:20] It is possible. But language selection in the Wayback Machine would be 'random' too
[21:20] But it doesn't save the video items correctly
[21:21] *** brayden has quit IRC (Read error: Operation timed out)
[21:21] *** alfie has joined #archiveteam-bs
[21:21] *** Fletcher has joined #archiveteam-bs
[21:21] *** Baljem_ has quit IRC (Read error: Connection reset by peer)
[21:22] *** ring has joined #archiveteam-bs
[21:22] *** SilSte has joined #archiveteam-bs
[21:22] *** Baljem has joined #archiveteam-bs
[21:23] *** joepie91 has quit IRC (Excess Flood)
[21:23] At least we know the movie file is at http://www.portalgraphics.net/data/movie/90000/90301.mp4
[21:23] The URL would be pretty easy to guess for others
[21:24] I could probably fix the "not accessed properly" part on the Wayback Machine with a userscript when it comes time
[21:24] the movie path is in http://www.portalgraphics.net/pg/movie/address.php?image%5Fid=90301
[21:24]
[21:24] http://www.portalgraphics.net/pg/movie/movie.php?movie_path=90000/90301 redirects to http://www.portalgraphics.net/data/movie/90000/90301.mp4
[21:25] *** joepie91 has joined #archiveteam-bs
[21:25] *** midas sets mode: +o joepie91
[21:29] also, URLs like http://www.portalgraphics.net/pg/movie/pg_player/res_movie_data.php?mid=90301&lang=en won't be extracted by wget or wpull from http://www.portalgraphics.net/pg/illust/?image_id=90301
[21:29] *** Aranje has quit IRC (Quit: Three sheets to the wind)
[21:29] they also contain URLs that should be grabbed
[21:32] *** Stiletto has quit IRC (Ping timeout: 260 seconds)
[21:33] Hmm
[21:38] *** Aranje has joined #archiveteam-bs
[21:51] *** hook54321 has joined #archiveteam-bs
[21:56] *** Stiletto has joined #archiveteam-bs
[22:05] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[22:16] https://en.wikipedia.org/wiki/Wikipedia:TWA/Portal this is pretty cool
[22:21] Huh, there's http://www.portalgraphics.net/lang.php?lang=en&url=http://www.portalgraphics.net/pg/illust/?image_id=90301
[22:22] Okay, never mind, that completely failed
[22:37] Still, getting that "information page" and fullsize image for each thing would be miles better than nothing
[22:39] *** JesseW has joined #archiveteam-bs
[22:46] Besides, the URL for each of those can be guessed easily for other images
[22:46] *** Aranje has quit IRC (Quit: Three sheets to the wind)
[22:51] when is portalgraphics closing
[23:10] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[23:18] *** BlueMaxim has joined #archiveteam-bs
[23:20] *** DoomTay has quit IRC (Ping timeout: 268 seconds)
[23:39] *** DoomTay has joined #archiveteam-bs
[23:39] Well, it's not closing per se, but on 7/20, they will be deleting account user data and associated data, according to http://www.portalgraphics.net/pg/guide/news20160520.html