[00:17] *** j08nY has quit IRC (Remote host closed the connection)
[00:44] *** RichardG has quit IRC (Read error: Operation timed out)
[00:44] *** RichardG has joined #archiveteam-bs
[01:11] *** RichardG has quit IRC (Read error: Operation timed out)
[01:11] *** RichardG has joined #archiveteam-bs
[01:38] *** RichardG has quit IRC (Read error: Operation timed out)
[01:38] *** RichardG has joined #archiveteam-bs
[02:24] *** RichardG has quit IRC (Read error: Operation timed out)
[02:24] *** RichardG has joined #archiveteam-bs
[02:25] * Somebody2 just discovered that the Wayback Save feature works just fine through curl -- and presumably wrapped in cron.
[02:25] If there's any single page you think would be nice if it was regularly scooped up into the Wayback Machine (and it's not blocked by robots.txt)...
[02:25] that might be a nice way to do it.
[02:38] *** schbirid2 has joined #archiveteam-bs
[02:41] *** schbirid has quit IRC (Read error: Operation timed out)
[02:48] *** pizzaiolo has left
[02:49] *** bwn has quit IRC (Read error: Operation timed out)
[02:59] *** kyounko has joined #archiveteam-bs
[03:00] *** bwn has joined #archiveteam-bs
[03:27] *** RichardG has quit IRC (Read error: Operation timed out)
[03:27] *** RichardG has joined #archiveteam-bs
[03:34] Question: If we download and archive into IA a site that's blocked by robots.txt, is that site accessible through the Wayback Machine once the site goes down, or will the site only be accessible through WARCs for all eternity?
[03:36] *** ndiddy has quit IRC ()
[03:37] the latter, basically
[03:39] Unless the site fixes its robots.txt before going offline
[03:40] aaw
[03:41] In our archiving, I know we ignore robots.txt, but do we include it in the WARCs for upload?
[03:42] we include it, yes
[03:56] SketchCow: your russian pc world magazine says it's in english: https://archive.org/details/PC-World_Magazine_2007-11
[04:02] What's the best way to add metadata to them?
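The curl-plus-cron idea from 02:25 can be sketched as a small shell script. This is a minimal sketch, assuming the https://web.archive.org/save/ endpoint accepts a plain GET request the way it did at the time of this log; the script name, function names, and crontab schedule are hypothetical. As the later discussion in this log notes, a capture requested this way covers the page itself, but embedded images, stylesheets, and JavaScript-generated content may be missing unless they were already archived.

```shell
#!/bin/sh
# Minimal sketch (hypothetical script, e.g. saved as wayback-save.sh):
# ask the Wayback Machine to capture one page server-side via /save/.
# Assumes the page is not excluded by robots.txt and that /save/ accepts a
# plain GET request; both function names are illustrative, not an official API.

# Print the /save/ URL for the page given as $1.
wayback_save_url() {
  printf 'https://web.archive.org/save/%s\n' "$1"
}

# Request a capture of $1 and print the final HTTP status code.
# curl only downloads the response body (discarded here); it does not fetch
# the page's images or stylesheets or run its JavaScript.
wayback_save() {
  curl --silent --show-error --location --output /dev/null \
    --write-out '%{http_code}\n' "$(wayback_save_url "$1")"
}

# Only make a network request when a URL is passed on the command line.
if [ "$#" -gt 0 ]; then
  wayback_save "$1"
fi

# Hypothetical crontab entry: capture a page every day at 03:00.
#   0 3 * * * /usr/local/bin/wayback-save.sh https://example.org/page
```

Keeping the URL construction in its own function makes it easy to check without touching the network; the request itself only runs when the script is given a URL, as in the crontab line.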
[04:22] *** icedice has quit IRC (Quit: Leaving)
[04:31] *** RichardG has quit IRC (Read error: Operation timed out)
[04:31] *** RichardG has joined #archiveteam-bs
[04:52] hardly "
[04:52] ... "all eternity"
[04:53] archive.org may change their mind
[04:53] or someone else may buy the domain
[04:55] and if the site owner hears about the archive, and contacts IA with a request, it can easily be included in the Wayback Machine, or made unavailable via the archivebot collection, depending on what the site owner asks for.
[05:10] ---
[05:10] On another topic -- does anyone remember the online short story about 'the day a Google engineer decided to turn off all authentication for all Google services'?
[05:11] Unsurprisingly, that description doesn't work very well to search for. :-)
[05:31] https://archive.org/details/youtube-pWpqGKUG5yY <- neat
[05:50] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[05:56] *** Sk1d has joined #archiveteam-bs
[06:03] Somebody2: yeah, there's room for optimism in such cases. i'm just too used to seeing the opposite happen at the hands of squatters et al >.>
[06:22] *** RichardG has quit IRC (Read error: Operation timed out)
[06:22] *** RichardG has joined #archiveteam-bs
[06:49] *** RichardG has quit IRC (Read error: Operation timed out)
[06:49] *** RichardG has joined #archiveteam-bs
[07:18] *** RichardG has quit IRC (Read error: Operation timed out)
[07:19] *** RichardG has joined #archiveteam-bs
[07:40] *** j08nY has joined #archiveteam-bs
[07:58] Frogging: agreed
[08:03] *** GE has joined #archiveteam-bs
[08:34] *** kristian_ has joined #archiveteam-bs
[08:47] Somebody2: 435k metadata and 429k parse
[08:47] In 17 hours
[09:23] before i waste my day on crappy shareware tools, is there a straightforward way to create a windows _installation_ (not _installer_) on a usb stick from linux?
[09:23] got a windows 10 iso and that's it
[09:23] i just need a way to run discimagecreator...
[09:24] alternatively a redump-compatible ripper for CDs on linux :|
[09:53] *** RichardG has quit IRC (Read error: Operation timed out)
[09:53] *** RichardG has joined #archiveteam-bs
[10:08] *** dashcloud has quit IRC (Read error: Operation timed out)
[10:11] *** dashcloud has joined #archiveteam-bs
[10:40] *** RichardG has quit IRC (Read error: Operation timed out)
[10:40] *** RichardG has joined #archiveteam-bs
[10:50] *** VADemon has quit IRC (Read error: Connection reset by peer)
[11:07] *** RichardG_ has joined #archiveteam-bs
[11:08] *** RichardG has quit IRC (Read error: Operation timed out)
[11:21] i'm uploading more npr morning edition
[11:21] from year 2008
[11:41] *** JAA has joined #archiveteam-bs
[12:08] Somebody2: that was a Tom Scott video, wasn't it?
[12:08] Somebody2: https://www.youtube.com/playlist?list=PL96C35uN7xGI08uVWv9iEX-maKrBhEcIn
[12:08] specifically
[12:08] https://www.youtube.com/watch?v=y4GB_NDU43Q&list=PL96C35uN7xGI08uVWv9iEX-maKrBhEcIn&index=8
[12:09] "Single Point of Failure: The (Fictional) Day Google Forgot To Check Passwords"
[12:09] Somebody2, regarding Wayback Machine save and curl: I'm not sure how curl handles images and stylesheets, but I don't think it loads them by default, meaning that the save will be incomplete. For obvious reasons, it also won't handle anything generated by JavaScript. Headless Chromium might be a better option.
[12:15] i think Somebody2 meant the /save/ url; that works server-side at IA, right?
[12:16] It does. It may even handle images, stylesheets, etc. (not sure about that), but without a JavaScript interpreter, it certainly won't handle any dynamic content. Yes, the Wayback Machine isn't very good at that anyway, but some things do work.
[12:17] ah
[12:23] shit, i accidentally hit ctrl-c in a long-running wpull
[12:24] is there a chance to resume while it is at the "INFO Stopping once all requests complete... INFO Interrupt again to force stopping immediately." stage?
[12:29] *** BlueMaxim has quit IRC (Quit: Leaving)
[12:38] SketchCow: you may want to look at this: https://computerarchive.org/
[12:38] it may have stuff you don't have but i don't know
[12:41] SketchCow: here is 64 tape computing: https://computerarchive.org/files/comp/magazines/64-tape-computing/
[12:47] *** username1 has joined #archiveteam-bs
[12:48] schbirid2: Based on a quick glance at the wpull source code, that doesn't seem to be possible
[12:50] *** schbirid2 has quit IRC (Read error: Operation timed out)
[13:14] *** GE has quit IRC (Remote host closed the connection)
[13:14] dang
[13:14] thanks
[13:14] *** username1 is now known as schbirid
[13:15] well i saw some garbage to add to the reject regex anyways :)
[13:19] godane: Interesting. I will compare some of his work.
[13:20] *** kristian_ has quit IRC (Quit: Leaving)
[13:46] *** yuitimoth has quit IRC (Remote host closed the connection)
[13:46] *** yuitimoth has joined #archiveteam-bs
[14:09] *** pnJay has joined #archiveteam-bs
[14:46] *** ndiddy has joined #archiveteam-bs
[14:52] SketchCow: here is another magazine i could find in the archives: https://computerarchive.org/files/comp/magazines/video-toaster-user/
[14:53] i'm not saying it couldn't be in a dark collection, but i would think that less likely
[14:59] *** GE has joined #archiveteam-bs
[15:06] I may go in the front door with this guy.
[15:12] and more disks for your emulators: https://computerarchive.org/files/comp/disks/commodore/disks/
[15:14] anyways i'm up to 2008-03-31 with npr morning edition
[15:15] btw i'm grabbing network world from google books
[15:16] i think they did a lot better job with network world than with infoworld and pc magazine
[15:31] *** ndiddy has quit IRC ()
[17:31] *** odemg has joined #archiveteam-bs
[17:36] *** schbirid2 has joined #archiveteam-bs
[17:39] *** schbirid has quit IRC (Read error: Operation timed out)
[17:43] *** ndiddy has joined #archiveteam-bs
[17:54] https://www.ssllabs.com/ssltest/analyze.html?d=tracker.archiveteam.org
[17:54] B
[17:54] cool
[18:01] *** pizzaiolo has joined #archiveteam-bs
[18:03] *** mls has quit IRC (Ping timeout: 250 seconds)
[18:21] *** mls has joined #archiveteam-bs
[18:32] *** Stilett0 has quit IRC (Read error: Operation timed out)
[18:36] *** j08nY has quit IRC (Quit: Leaving)
[18:37] JensRex: I just realized Debian fixed WeakDH in security updates for Apache, but not nginx... (That version of nginx seems to be Debian stable, right?)
[18:38] *** VADemon has joined #archiveteam-bs
[19:02] *** RichardG_ is now known as RichardG
[19:12] *** odemg has quit IRC (Remote host closed the connection)
[19:13] *** odemg has joined #archiveteam-bs
[19:29] *** odemg has quit IRC (Remote host closed the connection)
[19:30] *** GE has quit IRC (Remote host closed the connection)
[19:36] *** odemg has joined #archiveteam-bs
[19:39] TobiX: You can just create DH primes yourself.
[19:40] TobiX: It's probably a Good Idea to do that anyway.
[20:04] openssl dhparam -out dhparams.pem 2048
[20:15] *** Stilett0 has joined #archiveteam-bs
[20:19] *** odemg has quit IRC (Remote host closed the connection)
[20:20] *** odemg has joined #archiveteam-bs
[20:46] JAA: https://github.com/JensRex/appnet-grab/blob/master/get-wget-lua.sh
[20:46] Now with more file and less pipe.
[20:46] schbirid2: Are you able to use wine to run discimagecreator? If not, Microsoft does make free VMs available, and you could run the program inside the VM
[20:51] *** GE has joined #archiveteam-bs
[21:01] *** GE_ has joined #archiveteam-bs
[21:03] *** GE has quit IRC (Ping timeout: 255 seconds)
[21:03] *** GE_ is now known as GE
[21:19] joepie91: YES! That was it. I found a transcript somewhere...
[21:25] http://www.allreadable.com/a6da4fyv
[21:30] *** DFJustin has quit IRC (Remote host closed the connection)
[21:35] *** DFJustin has joined #archiveteam-bs
[21:38] Somebody2: I recommend pretty much everything on Tom Scott's channel, fwiw :)
[21:43] Yes, Tom is another excellent Scott-on-the-Internet. :-)
[21:43] *** odemg has quit IRC (Remote host closed the connection)
[21:44] *** odemg has joined #archiveteam-bs
[21:45] HCross2: re: census progress; ok, good to know. It's not urgent -- if it takes a week rather than a day, because of the lack of iamine... shrug.
[21:46] Yea. I'll let it trundle along
[21:48] JAA: yes, I was referring to web.archive.org/save/ -- which does work server-side, I think. Got an example of a dynamic page I can try?
[21:48] *** BlueMaxim has joined #archiveteam-bs
[22:19] *** Stilett0 has quit IRC (Ping timeout: 244 seconds)
[22:20] JensRex: looks good to me. You could use sha256sum --check instead of sha256sum | diff, but that doesn't really make a difference. Maybe keep the temporary file/directory if there is a mismatch so it can be investigated?
[22:21] JAA: --check requires a checksum file to check against.
[22:23] *** icedice has joined #archiveteam-bs
[22:23] Somebody2: not right now, I'd have to dig a bit. Also, I don't think that /save saves the embedded media server-side; if anything, it might also rewrite those links to /save URLs. I could be wrong, though.
[22:24] JensRex: echo "$checksum $filename" | sha256sum --check - or sha256sum --check <(echo "$checksum $filename") should work
[22:25] JAA: yeah, that makes sense. The page I'm saving is a pretty plain one, though, so I think it should work fine.
[22:25] Yeah, in that case, it'll probably be fine
[22:26] JAA: Oh right... duh.
[22:27] Actually, the trailing dash isn't necessary in the first version, as sha256sum reads from stdin by default. The --status option might also be interesting.
[22:28] sha256sum outputs the filename as "-" when checking stdin.
[22:29] Only when you run sha256sum in the normal mode for hashing data, but not when checking
[22:29] JAA: you do seem to be right, though -- https://web.archive.org/web/20170318222720/https:/twitter.com/textfiles/status/843092127705944064 <- saved via curl
[22:30] vs https://web.archive.org/web/20170318222856/https:/twitter.com/textfiles/status/843092127705944064 <- saved via Firefox
[22:30] JensRex: When checking, the output will just be "file: OK" with "file" coming from the input "hash file"
[22:31] Somebody2: yeah, it'll work fine for very common pages like Twitter, since the images, stylesheets etc. will already be archived from other saves
[22:31] Okay. I'll look into that.
[22:33] Somebody2: also, I think that the images on that curl'ed save were only archived when you accessed the saved page through the browser. (When you access an image through the Wayback Machine which isn't archived yet, it's archived transparently.)
[22:37] Note the time in the URL in the curl link you posted vs. https://web.archive.org/web/20170318222858im_/https://pbs.twimg.com/media/C7NDu1eXgAEdIsI.jpg
[22:40] (I'm really tired and need to go to bed now; I'll read the logs though)
[22:40] *** JAA has quit IRC (Quit: Page closed)
[22:42] dashcloud: i never thought about using low-level stuff like that with wine. is there any chance it could work?
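The sha256sum --check suggestion from 22:20 to 22:30 can be sketched as follows. This is a minimal sketch, not the actual contents of get-wget-lua.sh: the helper names verify_sha256 and fetch_verified are hypothetical, and it assumes GNU coreutils sha256sum. The expected digest is fed on stdin in the standard "digest, two spaces, file name" format, --status reports the result only through the exit code, and the temporary directory is kept on a mismatch so the bad download can be investigated, as suggested at 22:20.

```shell
#!/bin/sh
# Minimal sketch: verify a download with `sha256sum --check` instead of
# piping `sha256sum` output into `diff`. Assumes GNU coreutils sha256sum;
# verify_sha256 and fetch_verified are hypothetical helper names.

# verify_sha256 EXPECTED_HEX FILE
# Exit status 0 if FILE's SHA-256 digest equals EXPECTED_HEX, non-zero otherwise.
verify_sha256() {
  # Checksum lines use the format "<hex digest><two spaces><file name>".
  # With no FILE operand, sha256sum --check reads them from stdin, and
  # --status suppresses the "FILE: OK" line, reporting only via exit status.
  printf '%s  %s\n' "$1" "$2" | sha256sum --check --status
}

# fetch_verified URL EXPECTED_HEX DEST
# Download URL into a temporary directory and move it to DEST only if the
# digest matches; on a mismatch, keep the temporary copy for inspection.
fetch_verified() {
  tmpdir=$(mktemp -d) || return 1
  tmpfile="$tmpdir/download"
  curl --silent --show-error --location --fail --output "$tmpfile" "$1" || return 1
  if verify_sha256 "$2" "$tmpfile"; then
    mv "$tmpfile" "$3" && rmdir "$tmpdir"
  else
    echo "SHA-256 mismatch for $1; kept $tmpfile for inspection" >&2
    return 1
  fi
}
```

For example, fetch_verified "$url" "$expected_sha256" ./download.tar.gz (hypothetical file name) either moves the verified file into place or returns non-zero and prints where the mismatched copy was left.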
[22:42] wine or vm
[22:42] wine possibly, VM very likely if your client supports USB passthrough
[22:43] otherwise, if you have physical access to a spare machine with a CD drive, just boot the ISO, and build the USB drive from there; it takes longer, but doesn't require any software
[22:44] oh nice
[22:44] that may not actually work; you might have to do an install first, which isn't terribly convenient
[22:45] my cd drive is actually sata so no usb passthrough should be needed, right?
[22:45] don't think so; just make sure the drive is attached to the VM and not the host
[22:46] although I've never burned a physical disc from a VM; I've made ISOs and mounted CDs inside one, but never burned a non-ISO CD
[22:46] if you need a prebuilt VM, visit modern.ie
[22:47] if the version you want isn't there, visit the page in the wayback machine, and go back until you find the version you want
[22:48] <3
[22:48] first i have to find a disc i would not mind losing, haven't used the drive ever before
[23:09] no luck in wine: fixme:ntdll:server_ioctl_file Unsupported ioctl 2d1400 (device=2d access=0 func=500 method=0)
[23:09] will grab a win7 vm
[23:48] *** odemg has quit IRC (Remote host closed the connection)
[23:55] *** GE has quit IRC (Remote host closed the connection)