#archiveteam-bs 2017-03-18,Sat


Time Nickname Message
00:17 🔗 j08nY has quit IRC (Remote host closed the connection)
00:44 🔗 RichardG has quit IRC (Read error: Operation timed out)
00:44 🔗 RichardG has joined #archiveteam-bs
01:11 🔗 RichardG has quit IRC (Read error: Operation timed out)
01:11 🔗 RichardG has joined #archiveteam-bs
01:38 🔗 RichardG has quit IRC (Read error: Operation timed out)
01:38 🔗 RichardG has joined #archiveteam-bs
02:24 🔗 RichardG has quit IRC (Read error: Operation timed out)
02:24 🔗 RichardG has joined #archiveteam-bs
02:25 🔗 * Somebody2 just discovered that the Wayback Save feature works just fine through curl -- and presumably wrapped in cron.
02:25 🔗 Somebody2 If there's any single page you think would be nice to have regularly scooped up into the Wayback Machine (and it's not blocked by robots.txt)...
02:25 🔗 Somebody2 that might be a nice way to do it.
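A minimal sketch of the curl-plus-cron idea, assuming the web.archive.org/save/ endpoint mentioned later in the log accepts a plain GET. The function names, script path, schedule and target URL are hypothetical. As discussed further down, a save triggered this way may not capture embedded images, stylesheets or JavaScript-generated content.

```shell
# Build the Wayback "save" URL for a page by prefixing the save endpoint.
wayback_save_url() {
    printf 'https://web.archive.org/save/%s\n' "$1"
}

# Ask the Wayback Machine to capture one page. The response body is discarded;
# --fail makes curl exit non-zero on an HTTP error so cron can report it.
wayback_save() {
    curl --silent --show-error --fail --location --output /dev/null \
        "$(wayback_save_url "$1")"
}

# Example crontab entry (hypothetical script path and page), once a day at 03:00:
# 0 3 * * * /usr/local/bin/wayback-save.sh 'https://example.com/page'
```

A wrapper script would source these functions and call wayback_save with the page URL as its only argument.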
02:38 🔗 schbirid2 has joined #archiveteam-bs
02:41 🔗 schbirid has quit IRC (Read error: Operation timed out)
02:48 🔗 pizzaiolo has left
02:49 🔗 bwn has quit IRC (Read error: Operation timed out)
02:59 🔗 kyounko has joined #archiveteam-bs
03:00 🔗 bwn has joined #archiveteam-bs
03:27 🔗 RichardG has quit IRC (Read error: Operation timed out)
03:27 🔗 RichardG has joined #archiveteam-bs
03:34 🔗 Matt_Lock Question: If we download and archive into IA a site that's blocked by robots.txt, is that site accessible through the Wayback Machine once the site goes down, or will the site only be accessible through WARCs for all eternity?
03:36 🔗 ndiddy has quit IRC ()
03:37 🔗 Frogging the latter, basically
03:39 🔗 Frogging Unless the site fixes its robots.txt before going offline
03:40 🔗 Matt_Lock aaw
03:41 🔗 Matt_Lock In our archiving, I know we ignore robots.txt, but do we include it in the WARCs for upload?
03:42 🔗 xmc we include it, yes
03:56 🔗 godane SketchCow: your russian pc world magazines say they're in english: https://archive.org/details/PC-World_Magazine_2007-11
04:02 🔗 VADemon What's the best way to add metadata to them?
04:22 🔗 icedice has quit IRC (Quit: Leaving)
04:31 🔗 RichardG has quit IRC (Read error: Operation timed out)
04:31 🔗 RichardG has joined #archiveteam-bs
04:52 🔗 Somebody2 hardly "
04:52 🔗 Somebody2 ... "all eternity"
04:53 🔗 Somebody2 archive.org may change their mind
04:53 🔗 Somebody2 or someone else may buy the domain
04:55 🔗 Somebody2 and if the site owner hears about the archive, and contacts IA with a request, it can easily be included in the Wayback Machine, or made unavailable via the archivebot collection, depending on what the site owner asks for.
05:10 🔗 Somebody2 ---
05:10 🔗 Somebody2 On another topic -- does anyone remember the online short story about 'the day a Google engineer decided to turn off all authentication for all Google services'?
05:11 🔗 Somebody2 Unsurprisingly, that description doesn't work very well to search for. :-)
05:31 🔗 Somebody2 https://archive.org/details/youtube-pWpqGKUG5yY <- neat
05:50 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
05:56 🔗 Sk1d has joined #archiveteam-bs
06:03 🔗 Frogging Somebody2: yeah, there's room for optimism in such cases. i'm just too used to seeing the opposite happen at the hands of squatters et al >.>
06:22 🔗 RichardG has quit IRC (Read error: Operation timed out)
06:22 🔗 RichardG has joined #archiveteam-bs
06:49 🔗 RichardG has quit IRC (Read error: Operation timed out)
06:49 🔗 RichardG has joined #archiveteam-bs
07:18 🔗 RichardG has quit IRC (Read error: Operation timed out)
07:19 🔗 RichardG has joined #archiveteam-bs
07:40 🔗 j08nY has joined #archiveteam-bs
07:58 🔗 Somebody2 Frogging: agreed
08:03 🔗 GE has joined #archiveteam-bs
08:34 🔗 kristian_ has joined #archiveteam-bs
08:47 🔗 HCross2 Somebody2: 435k metadata and 429k parse
08:47 🔗 HCross2 In 17 hours
09:23 🔗 schbirid2 before i waste my day on crappy shareware tools, is there a straightforward way to create a windows _installation_ (not _installer_) on a usb stick from linux?
09:23 🔗 schbirid2 got a windows 10 iso and that's it
09:23 🔗 schbirid2 i just need a way to run discimagecreator...
09:24 🔗 schbirid2 alternatively a redump compatible ripper for CDs on linux :|
09:53 🔗 RichardG has quit IRC (Read error: Operation timed out)
09:53 🔗 RichardG has joined #archiveteam-bs
10:08 🔗 dashcloud has quit IRC (Read error: Operation timed out)
10:11 🔗 dashcloud has joined #archiveteam-bs
10:40 🔗 RichardG has quit IRC (Read error: Operation timed out)
10:40 🔗 RichardG has joined #archiveteam-bs
10:50 🔗 VADemon has quit IRC (Read error: Connection reset by peer)
11:07 🔗 RichardG_ has joined #archiveteam-bs
11:08 🔗 RichardG has quit IRC (Read error: Operation timed out)
11:21 🔗 godane i'm uploading more npr morning edition
11:21 🔗 godane from year 2008
11:41 🔗 JAA has joined #archiveteam-bs
12:08 🔗 joepie91 Somebody2: that was a Tom Scott video wasn't it?
12:08 🔗 joepie91 Somebody2: https://www.youtube.com/playlist?list=PL96C35uN7xGI08uVWv9iEX-maKrBhEcIn
12:08 🔗 joepie91 specifically
12:08 🔗 joepie91 https://www.youtube.com/watch?v=y4GB_NDU43Q&list=PL96C35uN7xGI08uVWv9iEX-maKrBhEcIn&index=8
12:09 🔗 joepie91 "Single Point of Failure: The (Fictional) Day Google Forgot To Check Passwords"
12:09 🔗 JAA Somebody2, regarding Wayback Machine save and curl: I'm not sure how curl handles images and stylesheets, but I don't think it loads them by default, meaning that the save will be incomplete. For obvious reasons, it also won't handle anything generated by JavaScript. Headless chromium might be a better option.
12:15 🔗 schbirid2 i think Somebody2 meant the /save/ url, that works serverside at IA right?
12:16 🔗 JAA It does. It may even handle images, stylesheets, etc. (not sure about that), but without a JavaScript interpreter, it certainly won't handle any dynamic content. Yes, the Wayback Machine isn't very good at that anyway, but some things do work.
12:17 🔗 schbirid2 ah
12:23 🔗 schbirid2 shit, i accidentally hit ctrl-c in a long-running wpull
12:24 🔗 schbirid2 is there a chance to resume while it is at the "INFO Stopping once all requests complete... INFO Interrupt again to force stopping immediately." stage?
12:29 🔗 BlueMaxim has quit IRC (Quit: Leaving)
12:38 🔗 godane SketchCow: you may want to look at this: https://computerarchive.org/
12:38 🔗 godane it may have stuff you don't have but i don't know
12:41 🔗 godane SketchCow: here is 64 tape computing: https://computerarchive.org/files/comp/magazines/64-tape-computing/
12:47 🔗 username1 has joined #archiveteam-bs
12:48 🔗 JAA schbirid2: Based on a quick glance at the wpull source code, that doesn't seem to be possible
12:50 🔗 schbirid2 has quit IRC (Read error: Operation timed out)
13:14 🔗 GE has quit IRC (Remote host closed the connection)
13:14 🔗 username1 dang
13:14 🔗 username1 thanks
13:14 🔗 username1 is now known as schbirid
13:15 🔗 schbirid well i saw some garbage to add to the reject regex anyways :)
13:19 🔗 SketchCow godane: Interesting. I will compare some of his work.
13:20 🔗 kristian_ has quit IRC (Quit: Leaving)
13:46 🔗 yuitimoth has quit IRC (Remote host closed the connection)
13:46 🔗 yuitimoth has joined #archiveteam-bs
14:09 🔗 pnJay has joined #archiveteam-bs
14:46 🔗 ndiddy has joined #archiveteam-bs
14:52 🔗 godane SketchCow: here is another magazine i could find in the archives: https://computerarchive.org/files/comp/magazines/video-toaster-user/
14:53 🔗 godane i'm not saying it couldn't be a dark collection, but i would think that's less likely
14:59 🔗 GE has joined #archiveteam-bs
15:06 🔗 SketchCow I may go in the front door with this guy.
15:12 🔗 godane and more disks for your emulators: https://computerarchive.org/files/comp/disks/commodore/disks/
15:14 🔗 godane anyways i'm up to 2008-03-31 with npr morning edition
15:15 🔗 godane btw i'm grabbing network world from google books
15:16 🔗 godane i think they did a lot better job with network world than with infoworld and pc magazine
15:31 🔗 ndiddy has quit IRC ()
17:31 🔗 odemg has joined #archiveteam-bs
17:36 🔗 schbirid2 has joined #archiveteam-bs
17:39 🔗 schbirid has quit IRC (Read error: Operation timed out)
17:43 🔗 ndiddy has joined #archiveteam-bs
17:54 🔗 JensRex https://www.ssllabs.com/ssltest/analyze.html?d=tracker.archiveteam.org
17:54 🔗 JensRex B
17:54 🔗 Frogging cool
18:01 🔗 pizzaiolo has joined #archiveteam-bs
18:03 🔗 mls has quit IRC (Ping timeout: 250 seconds)
18:21 🔗 mls has joined #archiveteam-bs
18:32 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
18:36 🔗 j08nY has quit IRC (Quit: Leaving)
18:37 🔗 TobiX JensRex: I just realized Debian fixed WeakDH in security updates for Apache, but not nginx... (That version of nginx seems to be Debian stable, right)
18:38 🔗 VADemon has joined #archiveteam-bs
19:02 🔗 RichardG_ is now known as RichardG
19:12 🔗 odemg has quit IRC (Remote host closed the connection)
19:13 🔗 odemg has joined #archiveteam-bs
19:29 🔗 odemg has quit IRC (Remote host closed the connection)
19:30 🔗 GE has quit IRC (Remote host closed the connection)
19:36 🔗 odemg has joined #archiveteam-bs
19:39 🔗 JensRex TobiX: You can just create DH primes yourself.
19:40 🔗 JensRex TobiX: It's probably a Good Idea to do that anyway.
20:04 🔗 Frogging openssl dhparam -out dhparams.pem 2048
20:15 🔗 Stilett0 has joined #archiveteam-bs
20:19 🔗 odemg has quit IRC (Remote host closed the connection)
20:20 🔗 odemg has joined #archiveteam-bs
20:46 🔗 JensRex JAA: https://github.com/JensRex/appnet-grab/blob/master/get-wget-lua.sh
20:46 🔗 JensRex Now with more file and less pipe.
20:46 🔗 dashcloud schbirid2: Are you able to use wine to run discimagecreator? If not, Microsoft does make free VMs available, and you could run the program inside the VM
20:51 🔗 GE has joined #archiveteam-bs
21:01 🔗 GE_ has joined #archiveteam-bs
21:03 🔗 GE has quit IRC (Ping timeout: 255 seconds)
21:03 🔗 GE_ is now known as GE
21:19 🔗 Somebody2 joepie91: YES! That was it. I found a transcript somewhere...
21:25 🔗 Somebody2 http://www.allreadable.com/a6da4fyv
21:30 🔗 DFJustin has quit IRC (Remote host closed the connection)
21:35 🔗 DFJustin has joined #archiveteam-bs
21:38 🔗 joepie91 Somebody2: I recommend pretty much everything on Tom Scott's channel, fwiw :)
21:43 🔗 Somebody2 Yes Tom is another excellent Scott-on-the-Internet. :-)
21:43 🔗 odemg has quit IRC (Remote host closed the connection)
21:44 🔗 odemg has joined #archiveteam-bs
21:45 🔗 Somebody2 HCross2: re: census progress; ok, good to know. It's not urgent -- if it takes a week rather than a day, because of the lack of iamine... shrug.
21:46 🔗 HCross2 Yea. I'll let it trundle along
21:48 🔗 Somebody2 JAA: yes, I was referring to web.archive.org/save/ -- which does work server-side, I think. Got an example of a dynamic page I can try?
21:48 🔗 BlueMaxim has joined #archiveteam-bs
22:19 🔗 Stilett0 has quit IRC (Ping timeout: 244 seconds)
22:20 🔗 JAA JensRex: looks good to me. You could use sha256sum --check instead of sha256sum | diff, but that doesn't really make a difference. Maybe keep the temporary file/directory if there is a mismatch so it can be investigated?
22:21 🔗 JensRex JAA: --check requires a checksum file to check against.
22:23 🔗 icedice has joined #archiveteam-bs
22:23 🔗 JAA Somebody2: not right now, I'd have to dig a bit. Also, I don't think that /save saves the embedded media server-side; if anything, it might rewrite those links also to /save URLs. I could be wrong though.
22:24 🔗 JAA JensRex: echo "$checksum $filename" | sha256sum --check - or sha256sum --check <(echo "$checksum $filename") should work
22:25 🔗 Somebody2 JAA: yeah, that makes sense. The page I'm saving is a pretty plain one, though, so I think it should work fine.
22:25 🔗 JAA Yeah, in that case, it'll probably be fine
22:26 🔗 JensRex JAA: Oh right... duh.
22:27 🔗 JAA Actually, the trailing dash isn't necessary in the first version, as sha256sum reads from stdin by default. The --status option might also be interesting.
22:28 🔗 JensRex sha256sum outputs filename as "-" when checking stdin.
22:29 🔗 JAA Only when you run sha256sum in the normal mode for hashing data, but not when checking
22:29 🔗 Somebody2 JAA: you do seem to be right, though -- https://web.archive.org/web/20170318222720/https:/twitter.com/textfiles/status/843092127705944064 <- saved via curl
22:30 🔗 Somebody2 vs https://web.archive.org/web/20170318222856/https:/twitter.com/textfiles/status/843092127705944064 <- saved via Firefox
22:30 🔗 JAA JensRex: When checking, the output will just be "file: OK" with "file" coming from the input "hash file"
22:31 🔗 JAA Somebody2: yeah, it'll work fine for very common pages like Twitter, since the images, stylesheets etc. will already be archived from other saves
22:31 🔗 JensRex Okay. I'll look into that.
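The sha256sum --check approach suggested above could look roughly like this. It is only a sketch: verify_sha256 and the variables in the commented usage are hypothetical names, not taken from get-wget-lua.sh. The checksum line is fed to sha256sum on stdin, and --status leaves the exit code as the only result.

```shell
# verify_sha256 EXPECTED_HASH FILE
# Returns 0 when FILE hashes to EXPECTED_HASH, non-zero otherwise.
verify_sha256() {
    # In --check mode sha256sum reads "HASH  FILENAME" lines, from stdin when
    # no file operand is given. --status suppresses the "FILE: OK" line.
    printf '%s  %s\n' "$1" "$2" | sha256sum --check --status
}

# Hypothetical usage: keep the download around when the hash does not match,
# so the mismatch can be investigated.
# if verify_sha256 "$checksum" "$tmpfile"; then
#     mv "$tmpfile" "$target"
# else
#     echo "checksum mismatch, keeping $tmpfile for inspection" >&2
# fi
```

Using the exit code directly avoids comparing text output with diff, and the commented usage follows the suggestion to keep the temporary file on a mismatch.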
22:33 🔗 JAA Somebody2: also, I think that the images on that curl'ed save were only archived when you accessed the saved page through the browser. (When you access an image through the Wayback Machine which isn't archived yet, it's archived transparently.)
22:37 🔗 JAA Note the time in the URL in the curl link you posted vs. https://web.archive.org/web/20170318222858im_/https://pbs.twimg.com/media/C7NDu1eXgAEdIsI.jpg
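To make the timing comparison concrete, here is a small sketch that pulls the 14-digit capture timestamp (YYYYMMDDhhmmss) out of the three Wayback URLs posted above. wayback_timestamp is a hypothetical helper name. The image capture (22:28:58) is later than the curl save (22:27:20) and two seconds after the Firefox save (22:28:56), which fits the image having been archived only once the page was opened in a browser.

```shell
# Print the 14-digit capture timestamp embedded in a Wayback URL.
# Returns non-zero when the URL contains no such timestamp.
wayback_timestamp() {
    ts=$(printf '%s\n' "$1" | sed -n 's#.*/web/\([0-9]\{14\}\).*#\1#p')
    [ -n "$ts" ] && printf '%s\n' "$ts"
}

curl_save=$(wayback_timestamp 'https://web.archive.org/web/20170318222720/https:/twitter.com/textfiles/status/843092127705944064')
firefox_save=$(wayback_timestamp 'https://web.archive.org/web/20170318222856/https:/twitter.com/textfiles/status/843092127705944064')
image=$(wayback_timestamp 'https://web.archive.org/web/20170318222858im_/https://pbs.twimg.com/media/C7NDu1eXgAEdIsI.jpg')

# Show the three capture times side by side for comparison.
printf 'curl save:    %s\nfirefox save: %s\nimage:        %s\n' \
    "$curl_save" "$firefox_save" "$image"
```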
22:40 🔗 JAA (I'm really tired and need to go to bed now; I'll read the logs though)
22:40 🔗 JAA has quit IRC (Quit: Page closed)
22:42 🔗 schbirid2 dashcloud: i never thought about using lowlevel stuff like that with wine. is there any chance it could work?
22:42 🔗 schbirid2 wine or vm
22:42 🔗 dashcloud wine possibly, VM very likely if your client supports USB passthrough
22:43 🔗 dashcloud otherwise, if you have physical access to a spare machine with a CD drive, just boot the ISO, and build the USB drive from there- it takes longer, but doesn't require any software
22:44 🔗 schbirid2 oh nice
22:44 🔗 dashcloud that may not actually work- you might have to do an install first, which isn't terribly convenient
22:45 🔗 schbirid2 my cd drive is actually sata so no usb passthrough should be needed, right?
22:45 🔗 dashcloud don't think so- just make sure the drive is attached to the VM and not the host
22:46 🔗 dashcloud although I can say I've never burned a physical disc from a VM- made ISOs, and mounted CDs inside one, but not burned a non-ISO CD
22:46 🔗 dashcloud if you need a prebuilt VM, visit modern.ie
22:47 🔗 dashcloud if the version you want isn't there, visit the page in the wayback machine, and go back until you find the version you want
22:48 🔗 schbirid2 <3
22:48 🔗 schbirid2 first i have to find a disc i would not mind losing, haven't used the drive ever before
23:09 🔗 schbirid2 no luck in wine: fixme:ntdll:server_ioctl_file Unsupported ioctl 2d1400 (device=2d access=0 func=500 method=0)
23:09 🔗 schbirid2 will grab a win7 vm
23:48 🔗 odemg has quit IRC (Remote host closed the connection)
23:55 🔗 GE has quit IRC (Remote host closed the connection)
