#archiveteam-bs 2019-10-27,Sun

Time Nickname Message
00:01 🔗 benjinss has quit IRC (Quit: Leaving)
00:03 🔗 benjins has joined #archiveteam-bs
00:04 🔗 icedice has quit IRC (Quit: Leaving)
00:06 🔗 SketchCow So I am about to start uploading the next large batch of items off FOS into archive. This will make the machine mostly empty.
00:06 🔗 SketchCow Then I will go through and figure out where the last terabyte or so of this-or-that is hiding.
00:07 🔗 SketchCow At some point, maybe, it'll be empty enough that it can do duties, but I also have people wanting to give me terabyte donations
00:27 🔗 coderobe has quit IRC (Ping timeout: 252 seconds)
00:27 🔗 mc2 has quit IRC (Read error: Operation timed out)
00:28 🔗 mc2 has joined #archiveteam-bs
00:29 🔗 paul2520 has quit IRC (Read error: Operation timed out)
00:29 🔗 closure has quit IRC (Read error: Operation timed out)
00:29 🔗 closure has joined #archiveteam-bs
00:29 🔗 tonsofpcs has quit IRC (Read error: Operation timed out)
00:32 🔗 robogoat has joined #archiveteam-bs
00:32 🔗 eythian_ has joined #archiveteam-bs
00:32 🔗 eythian has quit IRC (Read error: Operation timed out)
00:33 🔗 antomati_ has joined #archiveteam-bs
00:33 🔗 chazchaz_ has joined #archiveteam-bs
00:34 🔗 robogoat_ has quit IRC (Ping timeout: 496 seconds)
00:35 🔗 Ryz has quit IRC (Read error: Connection reset by peer)
00:35 🔗 paul2520 has joined #archiveteam-bs
00:36 🔗 Ryz has joined #archiveteam-bs
00:37 🔗 chazchaz has quit IRC (Read error: Operation timed out)
00:37 🔗 Hooloovoo has quit IRC (Read error: Connection reset by peer)
00:37 🔗 Datechnom I've offered JAA to donate space and am happy to help out where I can, SketchCow. I'm working with markedL and betamax atm to archive yahoo but always have spare space and bandwidth to throw around
00:37 🔗 pikami_ has joined #archiveteam-bs
00:38 🔗 ppsym has joined #archiveteam-bs
00:38 🔗 Fusl____ sets mode: +o ppsym
00:38 🔗 Fusl sets mode: +o ppsym
00:38 🔗 Fusl_ sets mode: +o ppsym
00:38 🔗 RichardG_ has joined #archiveteam-bs
00:38 🔗 PurpleSym has quit IRC (Ping timeout: 496 seconds)
00:38 🔗 ppsym is now known as PurpleSym
00:38 🔗 antomatic has quit IRC (Read error: Operation timed out)
00:39 🔗 Hooloovoo has joined #archiveteam-bs
00:40 🔗 RichardG has quit IRC (Ping timeout: 496 seconds)
00:40 🔗 _niklas has quit IRC (Ping timeout: 744 seconds)
00:41 🔗 markedL has quit IRC (Ping timeout: 496 seconds)
00:41 🔗 pikami has quit IRC (Ping timeout: 496 seconds)
00:42 🔗 markedL has joined #archiveteam-bs
00:42 🔗 Fionera has quit IRC (Ping timeout: 744 seconds)
00:43 🔗 Fionera has joined #archiveteam-bs
00:44 🔗 nyany__ has joined #archiveteam-bs
00:44 🔗 nyany_ has quit IRC (Remote host closed the connection)
00:45 🔗 markedL has quit IRC (Client Quit)
00:45 🔗 tonsofpcs has joined #archiveteam-bs
00:46 🔗 paul2520 has quit IRC (Ping timeout: 496 seconds)
00:47 🔗 paul2520 has joined #archiveteam-bs
00:48 🔗 _niklas has joined #archiveteam-bs
00:51 🔗 benjins has quit IRC (Remote host closed the connection)
00:53 🔗 benjins has joined #archiveteam-bs
01:05 🔗 Raccoon has quit IRC (Ping timeout: 745 seconds)
01:29 🔗 d5f4a3622 has quit IRC (Quit: https://i.imgur.com/xacQ09F.mp4)
01:31 🔗 d5f4a3622 has joined #archiveteam-bs
01:40 🔗 sHATNER has quit IRC (Read error: Operation timed out)
01:40 🔗 sHATNER has joined #archiveteam-bs
01:46 🔗 MrRadar2 has quit IRC (Read error: Connection reset by peer)
01:48 🔗 anarcat has quit IRC (Read error: Connection reset by peer)
01:50 🔗 coderobe has joined #archiveteam-bs
01:52 🔗 Dallas has quit IRC (Ping timeout: 864 seconds)
01:52 🔗 brayden has quit IRC (Read error: Operation timed out)
01:53 🔗 Dallas has joined #archiveteam-bs
01:53 🔗 brayden has joined #archiveteam-bs
01:56 🔗 anarcat has joined #archiveteam-bs
01:56 🔗 anarcat has quit IRC (Handshake flooding)
01:58 🔗 killsushi has quit IRC (Quit: Leaving)
02:07 🔗 godane has quit IRC (Ping timeout: 864 seconds)
02:07 🔗 godane has joined #archiveteam-bs
02:11 🔗 anarcat has joined #archiveteam-bs
02:15 🔗 brayden has quit IRC (Ping timeout: 864 seconds)
02:19 🔗 brayden has joined #archiveteam-bs
02:20 🔗 MrRadar2 has joined #archiveteam-bs
02:24 🔗 manjaro-u has quit IRC (Read error: Operation timed out)
03:26 🔗 fredgido_ has joined #archiveteam-bs
03:26 🔗 HashbangI has quit IRC (Read error: Operation timed out)
03:27 🔗 PhrackD has quit IRC (Read error: Connection reset by peer)
03:28 🔗 c4rc4s has quit IRC (Read error: Operation timed out)
03:28 🔗 RichardG_ has quit IRC (Read error: Operation timed out)
03:29 🔗 RichardG has joined #archiveteam-bs
03:29 🔗 klg_ has joined #archiveteam-bs
03:29 🔗 ShellyRol has quit IRC (Read error: Operation timed out)
03:29 🔗 Mayonaise has quit IRC (Read error: Operation timed out)
03:30 🔗 ShellyRol has joined #archiveteam-bs
03:30 🔗 Mateon1 has quit IRC (Read error: Operation timed out)
03:30 🔗 fredgido has quit IRC (Read error: Operation timed out)
03:31 🔗 Mateon1 has joined #archiveteam-bs
03:31 🔗 Mayonaise has joined #archiveteam-bs
03:31 🔗 systwi has quit IRC (Read error: Operation timed out)
03:34 🔗 klg has quit IRC (Read error: Operation timed out)
04:04 🔗 HashbangI has joined #archiveteam-bs
04:04 🔗 systwi has joined #archiveteam-bs
04:13 🔗 PhrackD has joined #archiveteam-bs
04:24 🔗 markedL has joined #archiveteam-bs
04:46 🔗 odemgi_ has joined #archiveteam-bs
04:51 🔗 odemgi has quit IRC (Read error: Operation timed out)
04:54 🔗 qw3rty has joined #archiveteam-bs
04:57 🔗 odemg has quit IRC (Ping timeout: 745 seconds)
05:01 🔗 odemg has joined #archiveteam-bs
05:01 🔗 qw3rty2 has quit IRC (Ping timeout: 745 seconds)
05:57 🔗 jodizzle Looks like the mips crawl of the royal society articles is done. If either JAA or Fusl could do another crawl with the ones that 403'd when they get a chance, that would be helpful.
05:57 🔗 jodizzle Thanks for doing this.
08:07 🔗 manjaro-u has joined #archiveteam-bs
08:08 🔗 d5f4a3622 has quit IRC (Ping timeout: 496 seconds)
08:34 🔗 d5f4a3622 has joined #archiveteam-bs
10:23 🔗 omglolba- has quit IRC (Ping timeout: 258 seconds)
10:25 🔗 omglolbah has joined #archiveteam-bs
11:03 🔗 Stiletto has quit IRC (Read error: Operation timed out)
11:05 🔗 Stiletto has joined #archiveteam-bs
11:06 🔗 BlueMax has quit IRC (Quit: Leaving)
11:25 🔗 bluefoo has joined #archiveteam-bs
11:28 🔗 d5f4a3622 has quit IRC (Ping timeout: 864 seconds)
11:52 🔗 nanoadmin has joined #archiveteam-bs
11:56 🔗 VerifiedJ has joined #archiveteam-bs
12:01 🔗 nanoadmin has quit IRC (Quit: Leaving)
12:07 🔗 Maylay has quit IRC (Read error: Operation timed out)
12:10 🔗 Maylay has joined #archiveteam-bs
14:02 🔗 manjaro-u has quit IRC (Quit: Konversation terminated!)
14:20 🔗 wyatt8740 has quit IRC (Read error: Operation timed out)
14:21 🔗 wyatt8740 has joined #archiveteam-bs
14:46 🔗 Raccoon has joined #archiveteam-bs
15:50 🔗 Zerote_ has quit IRC (Read error: Operation timed out)
16:05 🔗 wyatt8750 has joined #archiveteam-bs
16:05 🔗 wyatt8740 has quit IRC (Read error: Connection reset by peer)
16:05 🔗 Zerote has joined #archiveteam-bs
16:36 🔗 ivan- has joined #archiveteam-bs
16:36 🔗 Fusl____ sets mode: +o ivan-
16:36 🔗 Fusl sets mode: +o ivan-
16:36 🔗 Fusl_ sets mode: +o ivan-
16:47 🔗 qw3rty2 has joined #archiveteam-bs
16:47 🔗 ivan has quit IRC (Ping timeout: 745 seconds)
16:52 🔗 qw3rty has quit IRC (Ping timeout: 745 seconds)
17:14 🔗 qw3rty has joined #archiveteam-bs
17:16 🔗 manjaro-u has joined #archiveteam-bs
17:22 🔗 qw3rty2 has quit IRC (Ping timeout: 745 seconds)
17:45 🔗 bluefoo has quit IRC (Ping timeout: 610 seconds)
18:24 🔗 godane has quit IRC (Quit: Leaving.)
19:06 🔗 MaximeleG has joined #archiveteam-bs
19:13 🔗 godane has joined #archiveteam-bs
19:30 🔗 ats has quit IRC (Read error: Connection reset by peer)
19:41 🔗 ats has joined #archiveteam-bs
19:42 🔗 alt__ has quit IRC (Quit: Reconnecting)
19:42 🔗 alt__ has joined #archiveteam-bs
19:47 🔗 d5f4a3622 has joined #archiveteam-bs
20:18 🔗 JAA jodizzle: Looks like the Royal Society stuff is still free to access for now. I'll set up another mips job for those 403s now.
20:18 🔗 JAA (Cc Fusl)
20:19 🔗 Flashfire https://www.tweakguides.com/Site_updates.html
20:21 🔗 JAA jodizzle: Only 667 URLs failed like that, by the way.
20:25 🔗 jodizzle JAA: Great, thanks.
20:25 🔗 JAA https://transfer.notkiska.pw/KBF36/royalsocietypublishing.org-articles-pdf-403s
20:25 🔗 JAA Running now.
20:33 🔗 JAA So apparently the cookies expire after a while.
20:35 🔗 MaximeleG has quit IRC (Remote host closed the connection)
20:35 🔗 JAA jodizzle: All grabbed now.
20:37 🔗 jodizzle JAA: Great. No more 403's?
20:39 🔗 JAA jodizzle: Just three, which I ran again manually. So yes, everything should be covered now.
20:39 🔗 jodizzle Cool.
20:39 🔗 jodizzle There were also some 302's I saw in the original run which I was curious about, but those are probably normal aspects of the site.
20:42 🔗 JAA Yes, that's the cookie-setting thing.
20:42 🔗 JAA Very common on many journals actually.
20:42 🔗 JAA When you first request a page, the server tries to set a cookie and redirects you to another URL. The second request checks whether the cookie was set and redirects you back to the original URL if so, or to an error page if not.
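(Editor's note: the redirect sequence JAA describes can be sketched as a toy model. This is a minimal illustration, not the Royal Society site's actual implementation: the paths `/cookie-check` and `/cookie-error`, the `next` query parameter, and the 403 on the error page are all assumptions made for the example. Only the cookie name I2KBRCK and its value 1 come from the log.)

```python
from urllib.parse import parse_qs, urlencode, urlsplit

COOKIE_NAME = "I2KBRCK"        # cookie name reported in the log
CHECK_PATH = "/cookie-check"   # hypothetical path for the cookie check
ERROR_PATH = "/cookie-error"   # hypothetical path for the error page


def serve(url, cookies):
    """Toy server: return (status, redirect_location, set_cookie) for one request."""
    parts = urlsplit(url)
    if parts.path == ERROR_PATH:
        # Assumption: the error page answers with a 403.
        return 403, None, None
    if parts.path == CHECK_PATH:
        # Second request: was the cookie from the first response sent back?
        target = parse_qs(parts.query)["next"][0]
        if cookies.get(COOKIE_NAME) == "1":
            return 302, target, None
        return 302, ERROR_PATH, None
    if cookies.get(COOKIE_NAME) == "1":
        # Cookie already present: serve the content directly.
        return 200, None, None
    # First request: set the cookie and bounce through the check URL.
    location = CHECK_PATH + "?" + urlencode({"next": url})
    return 302, location, (COOKIE_NAME, "1")


def fetch(url, keep_cookies, max_hops=5):
    """Toy client: follow redirects and return the list of status codes seen."""
    cookies = {}
    statuses = []
    for _ in range(max_hops):
        status, location, set_cookie = serve(url, cookies)
        statuses.append(status)
        if set_cookie and keep_cookies:
            cookies[set_cookie[0]] = set_cookie[1]
        if location is None:
            break
        url = location
    return statuses
```

(In this model a client that keeps cookies sees two 302s and then a 200, while a client that drops them ends on the error page. `fetch("/doi/pdf/example", keep_cookies=True)` uses a made-up path.)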
20:57 🔗 jodizzle I see, that makes sense.
20:57 🔗 jodizzle Is the cookie a tracking cookie, in this case?
21:03 🔗 BlueMax has joined #archiveteam-bs
21:08 🔗 JAA I have no idea what it is. It's just a fixed I2KBRCK=1. ¯\_(ツ)_/¯
21:09 🔗 Video Sorry to divert the topic at hand but this is something I need to note: Apple still has a bunch of old versions of their software online (some software that I'm guessing has been discontinued is even on there) for download. URLs are incremental (https://support.apple.com/kb/DL*). An example of a URL would be https://support.apple.com/kb/DL3
21:09 🔗 Video I don't know the future of the old files they still offer the downloads for
21:10 🔗 JAA Video: How high do the numbers go?
21:12 🔗 Video I have no idea, but the highest I've gotten to is 2020
21:12 🔗 Video There is some software there that goes all the way back to 2004
21:13 🔗 Video (the year, that is)
21:16 🔗 JAA Hmm, any idea about the size?
21:16 🔗 Video I don't have one. Here's the one from 2004 though: https://support.apple.com/kb/DL600?locale=en_US
21:22 🔗 Video data like this should not go unpreserved
21:23 🔗 JAA Looks like files are at most a couple hundred MB.
21:24 🔗 JAA Just some random sampling: seq 1 2020 | shuf -n 10 | while read -r id; do curl -sL https://support.apple.com/kb/DL${id} | grep -Po '"metaUrl"\s*:\s*"\K[^"]+' | xargs curl -I 2>&1 | grep Content-Length; done
21:24 🔗 JAA So at most we're looking at a couple hundred GB, but almost certainly it'll be less.
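(Editor's note: the building blocks of the sampling one-liner and the size bound can be sketched offline in Python. This is a rough sketch under stated assumptions, not what was actually run: the helper names are hypothetical, no network requests are made, and the 200 MB per-file ceiling is only one reading of "a couple hundred MB". The `metaUrl` pattern and the ID range 1..2020 come from the log.)

```python
import random
import re

# Same pattern as the grep in the one-liner, with a capture group instead of \K.
META_URL_RE = re.compile(r'"metaUrl"\s*:\s*"([^"]+)"')


def sample_ids(max_id, k, seed=None):
    """Pick k distinct download IDs from 1..max_id (like `seq | shuf -n k`)."""
    rng = random.Random(seed)
    return rng.sample(range(1, max_id + 1), k)


def extract_meta_urls(html):
    """Return every metaUrl value embedded in a download page's HTML."""
    return META_URL_RE.findall(html)


def parse_content_length(header_text):
    """Return the last Content-Length found in raw response headers, or None.

    With redirects followed, the last header belongs to the final response.
    """
    length = None
    for line in header_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    return length


def upper_bound_bytes(max_file_bytes, id_count):
    """Worst-case total if every ID had one file of the maximum observed size."""
    return max_file_bytes * id_count
```

(With an assumed ceiling of 200 MB per file, `upper_bound_bytes(200 * 10**6, 2020)` gives 404 * 10**9 bytes, i.e. a few hundred GB in the worst case. Since most sampled files were smaller and some IDs have no download at all, the real total should be lower.)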
21:31 🔗 Video Definitely
21:31 🔗 Video some links don't work for some reason but most do
21:42 🔗 Datechnom If we have an automated way of pulling the data in, I'm happy to download for now and move it to IA if required
21:59 🔗 steve5430 has joined #archiveteam-bs
22:04 🔗 steve5430 has quit IRC (Client Quit)
22:08 🔗 BartoCH has quit IRC (Ping timeout: 615 seconds)
22:09 🔗 alt__ has quit IRC (Quit: leaving)
22:09 🔗 ctrl has joined #archiveteam-bs
22:38 🔗 JAA picosong has shut down. I think I got nearly all of it, though I didn't have a chance to look again at a few weird errors. I'll try to get some numbers on that soonish.
22:49 🔗 BartoCH has joined #archiveteam-bs
22:57 🔗 ivan- is now known as ivan
23:15 🔗 JAA Video: Do you have an example where the download differs by locale?
23:17 🔗 JAA According to some old snapshots in the WBM, the download URLs used to contain the locale, but that's no longer the case at least for the ones I tried now.
23:22 🔗 JAA And some aren't downloadable, e.g. https://support.apple.com/kb/DL1060?locale=fr_FR
23:51 🔗 Video it's apparently downloadable via itunes