#archiveteam-bs 2019-12-03,Tue

Time Nickname Message
00:10 🔗 britm0b has quit IRC (Read error: Connection reset by peer)
00:13 🔗 JAA The worst part of web archival: finding old archives hidden away in deeply nested directories, discovering that I've already uploaded something to IA, and trying to figure out whether everything is already uploaded and, if not, what isn't. Bonus points for having renamed things on upload because the initial directory structure during the archival was a mess.
00:14 🔗 JAA (Without changing the directory structure itself, of course.)
00:14 🔗 astrid aaaa yes the worst
00:16 🔗 JAA Oh also, the filenames are all the same. (ノ°Д°)ノ︵ ┻━┻
00:20 🔗 britmob has joined #archiveteam-bs
00:58 🔗 Raccoon It's going to get interesting when Archive Team 2040 is racing to archive The Internet Archive hours before shutdown (probably owing to copyright and right-to-be-forgotten doctrine)
00:59 🔗 ivan I expect Google to have stopped hosting petabytes for free by 2040 as well, so your only hope is great new storage devices
01:04 🔗 Raccoon What does one PB reasonably cost today? About $20,000 for disks and probably that much for the machines to spin them?
01:05 🔗 Raccoon We could probably GoFundMe that in a couple months
01:07 🔗 JAA That's 1 PB of raw storage, not 1 PB of usable storage with redundancy, power, network, backups, etc.
01:08 🔗 britmob 15k with 12tb easystores + some shopping around could get you 2x cheap 36 bays :P
01:09 🔗 britmob It would probably cost a lot to pay people to shuck them though..
01:09 🔗 Raccoon So we need to build 2 or 3 of these. Make a few appeal videos on YouTube and find several sympathetic personalities with 10 million subscribers to run it for us
01:09 🔗 Raccoon I don't know that you can shuck too many enclosed drives anymore, since they usually don't include header pins
01:10 🔗 britmob I just shucked a few easystores the other day, they're popular among the datahoarder community
01:10 🔗 Raccoon more reasonable to leave them as is though?
01:10 🔗 JAA This is getting into -ot territory.
01:11 🔗 Raccoon we're talking about building a bike shed!
01:11 🔗 Raccoon ;) ok
01:11 🔗 kiska Also we have a channel for this, it's called #huntinggrounds
01:15 🔗 Raccoon [#huntinggrounds] britmob: why is it more ideal to shuck enclosed drives than run them over USB as intended?
02:14 🔗 pew has quit IRC (Ping timeout: 252 seconds)
02:26 🔗 pew has joined #archiveteam-bs
02:55 🔗 prq has quit IRC (Quit: WeeChat 2.1)
03:01 🔗 prq has joined #archiveteam-bs
03:27 🔗 DLoader has quit IRC (Ping timeout: 745 seconds)
03:46 🔗 HP_Archiv has joined #archiveteam-bs
03:51 🔗 BlueMax has joined #archiveteam-bs
03:53 🔗 Nick-PC has quit IRC (Read error: Operation timed out)
04:11 🔗 qw3rty2 has joined #archiveteam-bs
04:15 🔗 eientei95 JAA: Found a bug with s3-bucket-list when doing appengage-video.s3.amazonaws.com
04:17 🔗 eientei95 `assert all(a[1:] == b[b.rindex(b'</') + 2:] for a, b in zip(tags[:-2:2], tags[1:-2:2]))` is failing on '<Key>campaign_videos/0LegacyUpload/#EveryoneIsWelcome13.mp4</Key><LastModified>2017-04-21T22:42:04.000Z</LastModified><ETag>&quot;40a219b5bd9456b1b07b8ce273fc61fc&quot;</ETag><Size>4942607</Size><Owner><ID>cb27edd331e6627421c28b0ed1ea1f23cf27300871696b4941d2c95df1d68a42</ID><DisplayName>hostmaster</DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents>'
04:19 🔗 systwi has quit IRC (Remote host closed the connection)
04:20 🔗 Raccoon` has joined #archiveteam-bs
04:21 🔗 qw3rty has quit IRC (Ping timeout: 745 seconds)
04:21 🔗 JAA eientei95: Huh, I thought I had fixed that a while ago.
04:21 🔗 JAA Clearly my XML "parser" is extremely robust. :-)
04:23 🔗 eientei95 lol yeah :p
04:23 🔗 JAA Ah no, I only wrote down that it breaks in this case. :-P
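For readers following the s3-bucket-list discussion above: a minimal sketch of reading the same `<Contents>` entry with Python's standard `xml.etree.ElementTree` instead of pairing tags by hand. This is not how s3-bucket-list works, and the log does not say why the assertion failed; the sketch only shows that a real parser copes with the nested `<Owner>` element and the `&quot;` entities. The helper names (`local`, `to_dict`, `list_contents`) and the bucket name `example-bucket` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample listing built from the entry quoted in the log.
# "example-bucket" is a made-up name used only for illustration.
SAMPLE = (
    b'<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">'
    b'<Name>example-bucket</Name><IsTruncated>false</IsTruncated>'
    b'<Contents>'
    b'<Key>campaign_videos/0LegacyUpload/#EveryoneIsWelcome13.mp4</Key>'
    b'<LastModified>2017-04-21T22:42:04.000Z</LastModified>'
    b'<ETag>&quot;40a219b5bd9456b1b07b8ce273fc61fc&quot;</ETag>'
    b'<Size>4942607</Size>'
    b'<Owner>'
    b'<ID>cb27edd331e6627421c28b0ed1ea1f23cf27300871696b4941d2c95df1d68a42</ID>'
    b'<DisplayName>hostmaster</DisplayName>'
    b'</Owner>'
    b'<StorageClass>STANDARD</StorageClass>'
    b'</Contents>'
    b'</ListBucketResult>'
)


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit('}', 1)[-1]


def to_dict(element):
    """Turn an element's children into a dict, recursing into nested elements."""
    result = {}
    for child in element:
        result[local(child.tag)] = to_dict(child) if len(child) else child.text
    return result


def list_contents(xml_bytes):
    """Return one dict per <Contents> element in a bucket listing."""
    root = ET.fromstring(xml_bytes)
    return [to_dict(el) for el in root if local(el.tag) == 'Contents']


if __name__ == '__main__':
    for entry in list_contents(SAMPLE):
        print(entry['Key'], entry['Size'], entry['Owner']['DisplayName'])
```

The parser decodes entities on its own, so the `ETag` value comes back with literal double quotes rather than `&quot;`.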
04:24 🔗 eientei95 https://audio-market-dev.s3.amazonaws.com raises the marker loop error
04:25 🔗 JAA Works fine here.
04:25 🔗 JAA Unless it happens somewhere on a later page?
04:26 🔗 Raccoon has quit IRC (Read error: Operation timed out)
04:26 🔗 Raccoon` is now known as Raccoon
04:26 🔗 eientei95 Later page
04:26 🔗 JAA Have the marker for that?
04:27 🔗 Raccoon has quit IRC (Remote host closed the connection)
04:28 🔗 odemgi has joined #archiveteam-bs
04:29 🔗 eientei95 ... odd, now it works fine
04:29 🔗 eientei95 spoke too soon
04:30 🔗 odemgi_ has quit IRC (Read error: Operation timed out)
04:30 🔗 eientei95 JAA: `media/23/Hard Style Producer&apos;s Multi Toolkit Vol 1/HS_PUNCH_KICK_30_p.mp3`
04:31 🔗 eientei95 Is it failing on the &apos;
04:31 🔗 JAA Ah
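A minimal sketch of one plausible mechanism for the marker loop, assuming the still-escaped key was sent back as the `marker` parameter; the log does not confirm that this was the actual cause. S3 returns keys that sort after the marker, and `&` (0x26) sorts before `'` (0x27), so an escaped marker sorts before the real key and the same key is returned again. The function names are hypothetical.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import unescape

# saxutils.unescape handles &amp;, &lt; and &gt; by default;
# the quote entities have to be passed in explicitly.
EXTRA_ENTITIES = {"&apos;": "'", "&quot;": '"'}


def decode_key(raw_key):
    """Decode XML entities in a key taken verbatim from the listing XML."""
    return unescape(raw_key, EXTRA_ENTITIES)


def next_page_query(last_key):
    """Build the query string for the next page, URL-encoding the marker."""
    return urlencode({"marker": last_key})


raw = "media/23/Hard Style Producer&apos;s Multi Toolkit Vol 1/HS_PUNCH_KICK_30_p.mp3"
key = decode_key(raw)

if __name__ == "__main__":
    print(key)
    # The escaped form sorts before the real key, so using it as the
    # marker would make the listing start at the same key again.
    print(raw < key)
    print(next_page_query(key))
```

Decoding first and URL-encoding afterwards keeps the two escaping layers (XML and URL) separate.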
04:33 🔗 systwi has joined #archiveteam-bs
04:36 🔗 eientei95 `hostname 'birdy-app.com.s3.amazonaws.com' doesn't match either of '*.s3.amazonaws.com', 's3.amazonaws.com'`
04:37 🔗 JAA Stop uncovering all the bugs in my code. :-P
04:40 🔗 eientei95 That's not a bug in your code, it fails the same way when using requests or curl :p
04:45 🔗 JAA Well yes, but only when accessed over HTTPS. You can access it just fine over HTTP in the other tools, but s3-bucket-list forces HTTPS.
04:45 🔗 JAA But it's also partially a bug on Amazon's side.
04:45 🔗 JAA Anyway, the appengage-video.s3.amazonaws.com issue is fixed.
04:46 🔗 Pixi` has quit IRC (Quit: Pixi`)
04:49 🔗 JAA And the marker loop as well.
04:49 🔗 Pixi has joined #archiveteam-bs
04:52 🔗 JAA eientei95: Regarding birdy-app.com: https://stackoverflow.com/questions/3048236/amazon-s3-https-ssl-is-it-possible
04:53 🔗 eientei95 Huh
04:54 🔗 JAA "To work around this, use HTTP or write your own certificate verification logic."
04:54 🔗 JAA ...
04:54 🔗 JAA This is going to give loads of people headaches when vhost-style access becomes mandatory soon.
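A short sketch of why `birdy-app.com` fails over HTTPS, assuming the usual rule that a certificate wildcard covers exactly one DNS label: the dot in the bucket name adds an extra label, so `*.s3.amazonaws.com` no longer matches. The matcher below is deliberately simplified (the wildcard is only accepted as the whole leftmost label) and is not a replacement for real certificate verification. The function names are hypothetical, and the path-style URL is the form that avoided the mismatch at the time of this log.

```python
def wildcard_matches(pattern, hostname):
    """Simplified certificate-name match: '*' stands for exactly one label."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


def vhost_name(bucket):
    """Virtual-hosted-style hostname for a bucket."""
    return f"{bucket}.s3.amazonaws.com"


def path_style_url(bucket):
    """Path-style URL; the hostname is the same regardless of the bucket name."""
    return f"https://s3.amazonaws.com/{bucket}/"


if __name__ == "__main__":
    for bucket in ("appengage-video", "birdy-app.com"):
        ok = wildcard_matches("*.s3.amazonaws.com", vhost_name(bucket))
        print(bucket, "vhost HTTPS ok" if ok else "mismatch, try " + path_style_url(bucket))
```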
05:19 🔗 RichardG_ has joined #archiveteam-bs
05:23 🔗 RichardG has quit IRC (Read error: Operation timed out)
05:32 🔗 icedice has joined #archiveteam-bs
05:32 🔗 icedice has quit IRC (Client Quit)
06:28 🔗 Frogging has quit IRC (Quit: Close the World, Open the nExt)
06:30 🔗 systwi_ has joined #archiveteam-bs
06:32 🔗 systwi has quit IRC (Read error: Operation timed out)
06:38 🔗 HP_Archiv has quit IRC (Quit: Leaving)
06:39 🔗 HP_Archiv has joined #archiveteam-bs
06:41 🔗 Frogging has joined #archiveteam-bs
06:52 🔗 Raccoon has joined #archiveteam-bs
07:48 🔗 fredgido has quit IRC (Read error: Connection reset by peer)
07:50 🔗 fredgido has joined #archiveteam-bs
07:50 🔗 Zerote_ has joined #archiveteam-bs
07:54 🔗 Zerote has quit IRC (Read error: Operation timed out)
08:12 🔗 systwi_ is now known as systwi
09:06 🔗 ablabiX has joined #archiveteam-bs
09:06 🔗 Xibalba has quit IRC (Read error: Connection reset by peer)
09:06 🔗 ablabiX is now known as Xibalba
10:05 🔗 d5f4a3622 has quit IRC (Quit: https://i.imgur.com/xacQ09F.mp4)
10:07 🔗 d5f4a3622 has joined #archiveteam-bs
10:09 🔗 cppchrisc has quit IRC (Ping timeout: 496 seconds)
11:05 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
11:36 🔗 DLoader has joined #archiveteam-bs
11:37 🔗 SilSte has quit IRC (Ping timeout: 745 seconds)
12:20 🔗 britmob has quit IRC (Read error: Operation timed out)
13:06 🔗 kiska18 has quit IRC (Remote host closed the connection)
13:06 🔗 Ryz has quit IRC (Remote host closed the connection)
13:07 🔗 kiska18 has joined #archiveteam-bs
13:07 🔗 Ryz has joined #archiveteam-bs
13:08 🔗 svchfoo3 sets mode: +o kiska18
13:08 🔗 svchfoo1 sets mode: +o kiska18
14:46 🔗 katocala has quit IRC ()
14:49 🔗 katocala has joined #archiveteam-bs
15:44 🔗 RichardG_ is now known as RichardG
16:04 🔗 Ryz Is anyone still working on getting as much content of YouTube 'Liked videos' playlist as possible? There hasn't been any sufficient activity as far as I know D:
16:04 🔗 Ryz It's gonna be a goner on or after 2019 December 05: https://support.google.com/youtube/answer/6083270
16:04 🔗 hknowles has joined #archiveteam-bs
16:12 🔗 markedL this has been like an 80:20 rule. small buggy sites consume more time than large fast sites.
16:19 🔗 kiska More like 95:5 :D
16:59 🔗 BlueMax has joined #archiveteam-bs
17:34 🔗 icedice has joined #archiveteam-bs
17:34 🔗 icedice has quit IRC (Client Quit)
17:37 🔗 icedice has joined #archiveteam-bs
19:30 🔗 superkuh has joined #archiveteam-bs
19:30 🔗 superkuh has quit IRC (Connection closed)
19:58 🔗 HP_Archiv has quit IRC (Quit: Leaving)
19:58 🔗 asdf0101 has quit IRC (The Lounge - https://thelounge.chat)
19:58 🔗 markedL has quit IRC (Quit: The Lounge - https://thelounge.chat)
19:58 🔗 HP_Archiv has joined #archiveteam-bs
19:59 🔗 markedL has joined #archiveteam-bs
19:59 🔗 asdf0101 has joined #archiveteam-bs
20:00 🔗 asdf0101 has quit IRC (Client Quit)
20:00 🔗 markedL has quit IRC (Client Quit)
20:10 🔗 markedL has joined #archiveteam-bs
20:23 🔗 bluefoo has quit IRC (Ping timeout: 610 seconds)
20:23 🔗 HP_Archiv has quit IRC (Quit: Leaving)
20:25 🔗 schbirid has joined #archiveteam-bs
20:32 🔗 DLoader_ has joined #archiveteam-bs
20:34 🔗 bluefoo has joined #archiveteam-bs
20:38 🔗 tech234a has joined #archiveteam-bs
20:43 🔗 DLoader has quit IRC (Ping timeout: 745 seconds)
20:43 🔗 DLoader_ is now known as DLoader
20:48 🔗 Jopik has joined #archiveteam-bs
20:50 🔗 Jopik has quit IRC (Client Quit)
21:03 🔗 LowLevelM arkiver: http://174.87.20.246/uploads/77136fe4d57fcf83/Screen%20Shot%202019-12-03%20at%201.02.56%20PM.png
21:11 🔗 schbirid maybe the bouncer aint letting them in
21:32 🔗 ranma has quit IRC (Quit: ZNC - http://znc.in)
21:34 🔗 manwith1n has joined #archiveteam-bs
21:48 🔗 britmob has joined #archiveteam-bs
21:51 🔗 schbirid has quit IRC (Quit: Leaving)
22:02 🔗 trc has joined #archiveteam-bs
22:29 🔗 Kaz the longest running joke is that arkiver's uptime is worse than Efnet's
22:44 🔗 trc has quit IRC (Quit: Leaving)
23:04 🔗 godane has joined #archiveteam-bs
23:08 🔗 Kaz SketchCow: whoever deals with CDX-Writer needs to push the latest update ASAP please - all our derives are now failing. Looks like the fix is in the repo, but not deployed yet
23:08 🔗 Kaz See https://github.com/internetarchive/CDX-Writer/commit/d3d43ad38b333269bdebcb4a0d35b77eca5be9b0
23:12 🔗 Flashfire has quit IRC (Remote host closed the connection)
23:12 🔗 kiska has quit IRC (Remote host closed the connection)
23:13 🔗 Flashfire has joined #archiveteam-bs
23:14 🔗 kiska has joined #archiveteam-bs
23:15 🔗 svchfoo3 sets mode: +o kiska
23:15 🔗 svchfoo1 sets mode: +o kiska
23:46 🔗 hknowles has quit IRC (Quit: Page closed)
23:50 🔗 dxrt [19:11:19] <Nemo_bis> Dozens of newspapers are changing hands at once in Italy https://www.bnnbloomberg.ca/italy-s-agnellis-add-la-repubblica-publisher-to-media-holdings-1.1356533
23:50 🔗 dxrt [19:14:47] <Nemo_bis> So gelocal.it would use a deep archival https://www.google.com/search?q=site%3Agelocal.it
23:50 🔗 dxrt Any movement on this one?
23:56 🔗 X-Scale` has joined #archiveteam-bs
23:58 🔗 X-Scale has quit IRC (Ping timeout: 252 seconds)
23:58 🔗 X-Scale` is now known as X-Scale
