Time |
Nickname |
Message |
00:10
🔗
|
|
britm0b has quit IRC (Read error: Connection reset by peer) |
00:13
🔗
|
JAA |
The worst part of web archival: finding old archives hidden away in deeply nested directories, discovering that I've already uploaded something to IA, and trying to figure out whether everything is already uploaded and, if not, what isn't. Bonus points for having renamed things on upload because the initial directory structure during the archival was a mess. |
00:14
🔗
|
JAA |
(Without changing the directory structure itself, of course.) |
00:14
🔗
|
astrid |
aaaa yes the worst |
00:16
🔗
|
JAA |
Oh also, the filenames are all the same. (ノ°Д°)ノ︵ ┻━┻ |
00:20
🔗
|
|
britmob has joined #archiveteam-bs |
00:58
🔗
|
Raccoon |
It's going to get interesting when Archive Team 2040 is racing to archive The Internet Archive hours before shutdown (probably owing to copyright and right-to-be-forgotten doctrine) |
00:59
🔗
|
ivan |
I expect Google to have stopped hosting petabytes for free by 2040 as well, so your only hope is great new storage devices |
01:04
🔗
|
Raccoon |
What does one PB reasonably cost today? About $20,000 for disks and probably that much for the machines to spin them? |
01:05
🔗
|
Raccoon |
We could probably GoFundMe that in a couple months |
01:07
🔗
|
JAA |
That's 1 PB of raw storage, not 1 PB of usable storage with redundancy, power, network, backups, etc. |
01:08
🔗
|
britmob |
15k with 12tb easystores + some shopping around could get you 2x cheap 36 bays :P |
01:09
🔗
|
britmob |
It would probably cost a lot to pay people to shuck them though.. |
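A rough sketch of the arithmetic behind the raw-versus-usable point above. The drive size and bay count come from the chat. The redundancy layout (groups of 12 drives with 2 parity drives each) is purely an assumption for illustration, not something anyone in the channel proposed.

```python
import math

PB_IN_TB = 1000        # decimal units, as drive vendors count them
DRIVE_TB = 12          # 12 TB drives, as mentioned in the chat
BAYS = 2 * 36          # two 36-bay chassis, as mentioned in the chat

# Assumption for illustration only: 12-drive groups, 2 parity drives per group.
GROUP_SIZE = 12
PARITY_PER_GROUP = 2


def drives_for_raw(raw_tb: int) -> int:
    """Drives needed to reach a given raw capacity."""
    return math.ceil(raw_tb / DRIVE_TB)


def drives_for_usable(usable_tb: int) -> int:
    """Drives needed for a given usable capacity, buying whole parity groups."""
    data_drives = math.ceil(usable_tb / DRIVE_TB)
    data_per_group = GROUP_SIZE - PARITY_PER_GROUP
    groups = math.ceil(data_drives / data_per_group)
    return groups * GROUP_SIZE


raw_drives = drives_for_raw(PB_IN_TB)
usable_drives = drives_for_usable(PB_IN_TB)
print(f"raw 1 PB: {raw_drives} drives; usable 1 PB: {usable_drives} drives; bays: {BAYS}")
```

Under these assumptions even the raw figure needs more drives than two 36-bay chassis can hold, before power, network, or backups enter the picture.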
01:09
🔗
|
Raccoon |
So we need to build 2 or 3 of these. Make a few appeal videos on YouTube and find several sympathetic personalities with 10 million subscribers to run it for us |
01:09
🔗
|
Raccoon |
I don't know that you can shuck too many enclosed drives anymore, since they usually don't include header pins |
01:10
🔗
|
britmob |
I just shucked a few easystores the other day, they're popular among the datahoarder community |
01:10
🔗
|
Raccoon |
more reasonable to leave them as is though? |
01:10
🔗
|
JAA |
This is getting into -ot territory. |
01:11
🔗
|
Raccoon |
we're talking about building a bike shed! |
01:11
🔗
|
Raccoon |
;) ok |
01:11
🔗
|
kiska |
Also, we have a channel for this; it's called #huntinggrounds |
01:15
🔗
|
Raccoon |
[#huntinggrounds] britmob: why is it more ideal to shuck enclosed drives than run them over USB as intended? |
02:14
🔗
|
|
pew has quit IRC (Ping timeout: 252 seconds) |
02:26
🔗
|
|
pew has joined #archiveteam-bs |
02:55
🔗
|
|
prq has quit IRC (Quit: WeeChat 2.1) |
03:01
🔗
|
|
prq has joined #archiveteam-bs |
03:27
🔗
|
|
DLoader has quit IRC (Ping timeout: 745 seconds) |
03:46
🔗
|
|
HP_Archiv has joined #archiveteam-bs |
03:51
🔗
|
|
BlueMax has joined #archiveteam-bs |
03:53
🔗
|
|
Nick-PC has quit IRC (Read error: Operation timed out) |
04:11
🔗
|
|
qw3rty2 has joined #archiveteam-bs |
04:15
🔗
|
eientei95 |
JAA: Found a bug with s3-bucket-list when doing appengage-video.s3.amazonaws.com |
04:17
🔗
|
eientei95 |
`assert all(a[1:] == b[b.rindex(b'</') + 2:] for a, b in zip(tags[:-2:2], tags[1:-2:2]))` is failing on '<Key>campaign_videos/0LegacyUpload/#EveryoneIsWelcome13.mp4</Key><LastModified>2017-04-21T22:42:04.000Z</LastModified><ETag>"40a219b5bd9456b1b07b8ce273fc61fc"</ETag><Size>4942607</Size><Owner><ID>cb27edd331e6627421c28b0ed1ea1f23cf27300871696b4941d2c95df1d68a42</ID><DisplayName>hostmaster</DisplayName></Owner><StorageClass>STANDAR |
04:17
🔗
|
eientei95 |
D</StorageClass></Contents>' |
04:19
🔗
|
|
systwi has quit IRC (Remote host closed the connection) |
04:20
🔗
|
|
Raccoon` has joined #archiveteam-bs |
04:21
🔗
|
|
qw3rty has quit IRC (Ping timeout: 745 seconds) |
04:21
🔗
|
JAA |
eientei95: Huh, I thought I had fixed that a while ago. |
04:21
🔗
|
JAA |
Clearly my XML "parser" is extremely robust. :-) |
04:23
🔗
|
eientei95 |
lol yeah :p |
04:23
🔗
|
JAA |
Ah no, I only wrote down that it breaks in this case. :-P |
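For comparison, a minimal sketch of reading the same kind of `<Contents>` entry with a real XML parser, which copes with the nested `<Owner>` element. This is not how s3-bucket-list works; `parse_contents` and the wrapped sample string are hypothetical. Real listing responses also declare an XML namespace on the root element, which this fragment-level example leaves out.

```python
import xml.etree.ElementTree as ET

# The entry quoted in the chat, wrapped in its <Contents> element.
SAMPLE = (
    '<Contents>'
    '<Key>campaign_videos/0LegacyUpload/#EveryoneIsWelcome13.mp4</Key>'
    '<LastModified>2017-04-21T22:42:04.000Z</LastModified>'
    '<ETag>"40a219b5bd9456b1b07b8ce273fc61fc"</ETag>'
    '<Size>4942607</Size>'
    '<Owner>'
    '<ID>cb27edd331e6627421c28b0ed1ea1f23cf27300871696b4941d2c95df1d68a42</ID>'
    '<DisplayName>hostmaster</DisplayName>'
    '</Owner>'
    '<StorageClass>STANDARD</StorageClass>'
    '</Contents>'
)


def parse_contents(fragment: str) -> dict:
    """Parse one <Contents> element into a flat dict (hypothetical helper)."""
    elem = ET.fromstring(fragment)
    owner = elem.find('Owner')  # optional nested element
    return {
        'key': elem.findtext('Key'),
        'last_modified': elem.findtext('LastModified'),
        'etag': elem.findtext('ETag'),
        'size': int(elem.findtext('Size')),
        'owner': owner.findtext('DisplayName') if owner is not None else None,
        'storage_class': elem.findtext('StorageClass'),
    }


entry = parse_contents(SAMPLE)
print(entry['key'], entry['size'], entry['owner'])
```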
04:24
🔗
|
eientei95 |
https://audio-market-dev.s3.amazonaws.com raises the marker loop error |
04:25
🔗
|
JAA |
Works fine here. |
04:25
🔗
|
JAA |
Unless it happens somewhere on a later page? |
04:26
🔗
|
|
Raccoon has quit IRC (Read error: Operation timed out) |
04:26
🔗
|
|
Raccoon` is now known as Raccoon |
04:26
🔗
|
eientei95 |
Later page |
04:26
🔗
|
JAA |
Have the marker for that? |
04:27
🔗
|
|
Raccoon has quit IRC (Remote host closed the connection) |
04:28
🔗
|
|
odemgi has joined #archiveteam-bs |
04:29
🔗
|
eientei95 |
... odd, now it works fine |
04:29
🔗
|
eientei95 |
spoke too soon |
04:30
🔗
|
|
odemgi_ has quit IRC (Read error: Operation timed out) |
04:30
🔗
|
eientei95 |
JAA: `media/23/Hard Style Producer's Multi Toolkit Vol 1/HS_PUNCH_KICK_30_p.mp3` |
04:31
🔗
|
eientei95 |
Is it failing on the ' |
04:31
🔗
|
JAA |
Ah |
04:33
🔗
|
|
systwi has joined #archiveteam-bs |
04:36
🔗
|
eientei95 |
`hostname 'birdy-app.com.s3.amazonaws.com' doesn't match either of '*.s3.amazonaws.com', 's3.amazonaws.com'` |
04:37
🔗
|
JAA |
Stop uncovering all the bugs in my code. :-P |
04:40
🔗
|
eientei95 |
That's not a bug in your code, it fails the same way when using requests or curl :p |
04:45
🔗
|
JAA |
Well yes, but only when accessed over HTTPS. You can access it just fine over HTTP in the other tools, but s3-bucket-list forces HTTPS. |
04:45
🔗
|
JAA |
But it's also partially a bug on Amazon's side. |
04:45
🔗
|
JAA |
Anyway, the appengage-video.s3.amazonaws.com issue is fixed. |
04:46
🔗
|
|
Pixi` has quit IRC (Quit: Pixi`) |
04:49
🔗
|
JAA |
And the marker loop as well. |
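The log does not say what caused the marker loop or how it was fixed. As a hedged illustration of the general technique, the sketch below paginates a listing by passing the last key of each page as the next marker, percent-encodes the marker (keys may contain characters such as `'` or spaces), and stops if a marker repeats. `iter_keys`, `marker_query`, and the `fetch_page` callable are hypothetical names, not part of s3-bucket-list.

```python
from typing import Callable, Iterator, List, Tuple
from urllib.parse import quote

Page = Tuple[List[str], bool]  # (keys on this page, is_truncated)


def marker_query(marker: str) -> str:
    """Build the marker query parameter with every reserved character encoded."""
    return 'marker=' + quote(marker, safe='')


def iter_keys(fetch_page: Callable[[str], Page]) -> Iterator[str]:
    """Yield all keys, using the last key of each page as the next marker."""
    marker = ''
    seen = set()
    while True:
        keys, truncated = fetch_page(marker)
        # A truncated page that is empty or ends on an already-used marker
        # means the listing is not advancing.
        if truncated and (not keys or keys[-1] in seen):
            raise RuntimeError(f'marker loop at {marker!r}')
        yield from keys
        if not truncated:
            return
        marker = keys[-1]
        seen.add(marker)


# Usage with an in-memory stand-in for the HTTP requests.
fake_pages = {'': (['a', 'b'], True), 'b': (['c'], False)}
print(list(iter_keys(lambda m: fake_pages[m])))
print(marker_query("media/23/Hard Style Producer's Multi Toolkit Vol 1/HS_PUNCH_KICK_30_p.mp3"))
```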
04:49
🔗
|
|
Pixi has joined #archiveteam-bs |
04:52
🔗
|
JAA |
eientei95: Regarding birdy-app.com: https://stackoverflow.com/questions/3048236/amazon-s3-https-ssl-is-it-possible |
04:53
🔗
|
eientei95 |
Huh |
04:54
🔗
|
JAA |
"To work around this, use HTTP or write your own certificate verification logic." |
04:54
🔗
|
JAA |
... |
04:54
🔗
|
JAA |
This is going to give loads of people headaches when vhost-style access becomes mandatory soon. |
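A simplified sketch of why the certificate check fails for `birdy-app.com`: a wildcard in a certificate name stands in for exactly one DNS label, so a bucket name containing a dot adds a label that `*.s3.amazonaws.com` cannot cover. `wildcard_matches` is a hypothetical helper that handles only a whole-label wildcard in the leftmost position. It is not the matching logic of any particular TLS library.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Return True if hostname matches pattern, where a leading '*' label
    matches exactly one label (simplified, whole-label wildcards only)."""
    p_labels = pattern.lower().split('.')
    h_labels = hostname.lower().split('.')
    if len(p_labels) != len(h_labels):
        return False  # the wildcard never spans more than one label
    if p_labels[0] == '*':
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


CERT_NAMES = ['*.s3.amazonaws.com', 's3.amazonaws.com']

for host in ('appengage-video.s3.amazonaws.com', 'birdy-app.com.s3.amazonaws.com'):
    ok = any(wildcard_matches(name, host) for name in CERT_NAMES)
    print(host, 'matches' if ok else 'does not match')
```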
05:19
🔗
|
|
RichardG_ has joined #archiveteam-bs |
05:23
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
05:32
🔗
|
|
icedice has joined #archiveteam-bs |
05:32
🔗
|
|
icedice has quit IRC (Client Quit) |
06:28
🔗
|
|
Frogging has quit IRC (Quit: Close the World, Open the nExt) |
06:30
🔗
|
|
systwi_ has joined #archiveteam-bs |
06:32
🔗
|
|
systwi has quit IRC (Read error: Operation timed out) |
06:38
🔗
|
|
HP_Archiv has quit IRC (Quit: Leaving) |
06:39
🔗
|
|
HP_Archiv has joined #archiveteam-bs |
06:41
🔗
|
|
Frogging has joined #archiveteam-bs |
06:52
🔗
|
|
Raccoon has joined #archiveteam-bs |
07:48
🔗
|
|
fredgido has quit IRC (Read error: Connection reset by peer) |
07:50
🔗
|
|
fredgido has joined #archiveteam-bs |
07:50
🔗
|
|
Zerote_ has joined #archiveteam-bs |
07:54
🔗
|
|
Zerote has quit IRC (Read error: Operation timed out) |
08:12
🔗
|
|
systwi_ is now known as systwi |
09:06
🔗
|
|
ablabiX has joined #archiveteam-bs |
09:06
🔗
|
|
Xibalba has quit IRC (Read error: Connection reset by peer) |
09:06
🔗
|
|
ablabiX is now known as Xibalba |
10:05
🔗
|
|
d5f4a3622 has quit IRC (Quit: https://i.imgur.com/xacQ09F.mp4) |
10:07
🔗
|
|
d5f4a3622 has joined #archiveteam-bs |
10:09
🔗
|
|
cppchrisc has quit IRC (Ping timeout: 496 seconds) |
11:05
🔗
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
11:36
🔗
|
|
DLoader has joined #archiveteam-bs |
11:37
🔗
|
|
SilSte has quit IRC (Ping timeout: 745 seconds) |
12:20
🔗
|
|
britmob has quit IRC (Read error: Operation timed out) |
13:06
🔗
|
|
kiska18 has quit IRC (Remote host closed the connection) |
13:06
🔗
|
|
Ryz has quit IRC (Remote host closed the connection) |
13:07
🔗
|
|
kiska18 has joined #archiveteam-bs |
13:07
🔗
|
|
Ryz has joined #archiveteam-bs |
13:08
🔗
|
|
svchfoo3 sets mode: +o kiska18 |
13:08
🔗
|
|
svchfoo1 sets mode: +o kiska18 |
14:46
🔗
|
|
katocala has quit IRC () |
14:49
🔗
|
|
katocala has joined #archiveteam-bs |
15:44
🔗
|
|
RichardG_ is now known as RichardG |
16:04
🔗
|
Ryz |
Is anyone still working on getting as much content of YouTube 'Liked videos' playlist as possible? There hasn't been any sufficient activity as far as I know D: |
16:04
🔗
|
Ryz |
It's gonna be a goner on or after 2019 December 05: https://support.google.com/youtube/answer/6083270 |
16:04
🔗
|
|
hknowles has joined #archiveteam-bs |
16:12
🔗
|
markedL |
this has been like an 80:20 rule. small buggy sites consume more time than large fast sites. |
16:19
🔗
|
kiska |
More like 95:5 :D |
16:59
🔗
|
|
BlueMax has joined #archiveteam-bs |
17:34
🔗
|
|
icedice has joined #archiveteam-bs |
17:34
🔗
|
|
icedice has quit IRC (Client Quit) |
17:37
🔗
|
|
icedice has joined #archiveteam-bs |
19:30
🔗
|
|
superkuh has joined #archiveteam-bs |
19:30
🔗
|
|
superkuh has quit IRC (Connection closed) |
19:58
🔗
|
|
HP_Archiv has quit IRC (Quit: Leaving) |
19:58
🔗
|
|
asdf0101 has quit IRC (The Lounge - https://thelounge.chat) |
19:58
🔗
|
|
markedL has quit IRC (Quit: The Lounge - https://thelounge.chat) |
19:58
🔗
|
|
HP_Archiv has joined #archiveteam-bs |
19:59
🔗
|
|
markedL has joined #archiveteam-bs |
19:59
🔗
|
|
asdf0101 has joined #archiveteam-bs |
20:00
🔗
|
|
asdf0101 has quit IRC (Client Quit) |
20:00
🔗
|
|
markedL has quit IRC (Client Quit) |
20:10
🔗
|
|
markedL has joined #archiveteam-bs |
20:23
🔗
|
|
bluefoo has quit IRC (Ping timeout: 610 seconds) |
20:23
🔗
|
|
HP_Archiv has quit IRC (Quit: Leaving) |
20:25
🔗
|
|
schbirid has joined #archiveteam-bs |
20:32
🔗
|
|
DLoader_ has joined #archiveteam-bs |
20:34
🔗
|
|
bluefoo has joined #archiveteam-bs |
20:38
🔗
|
|
tech234a has joined #archiveteam-bs |
20:43
🔗
|
|
DLoader has quit IRC (Ping timeout: 745 seconds) |
20:43
🔗
|
|
DLoader_ is now known as DLoader |
20:48
🔗
|
|
Jopik has joined #archiveteam-bs |
20:50
🔗
|
|
Jopik has quit IRC (Client Quit) |
21:03
🔗
|
LowLevelM |
arkiver: http://174.87.20.246/uploads/77136fe4d57fcf83/Screen%20Shot%202019-12-03%20at%201.02.56%20PM.png |
21:11
🔗
|
schbirid |
maybe the bouncer aint letting them in |
21:32
🔗
|
|
ranma has quit IRC (Quit: ZNC - http://znc.in) |
21:34
🔗
|
|
manwith1n has joined #archiveteam-bs |
21:48
🔗
|
|
britmob has joined #archiveteam-bs |
21:51
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
22:02
🔗
|
|
trc has joined #archiveteam-bs |
22:29
🔗
|
Kaz |
the longest running joke is that arkiver's uptime is worse than Efnet's |
22:44
🔗
|
|
trc has quit IRC (Quit: Leaving) |
23:04
🔗
|
|
godane has joined #archiveteam-bs |
23:08
🔗
|
Kaz |
SketchCow: whoever deals with CDX-Writer needs to push the latest update ASAP please - all our derives are now failing. Looks like the fix is in the repo, but not deployed yet |
23:08
🔗
|
Kaz |
See https://github.com/internetarchive/CDX-Writer/commit/d3d43ad38b333269bdebcb4a0d35b77eca5be9b0 |
23:12
🔗
|
|
Flashfire has quit IRC (Remote host closed the connection) |
23:12
🔗
|
|
kiska has quit IRC (Remote host closed the connection) |
23:13
🔗
|
|
Flashfire has joined #archiveteam-bs |
23:14
🔗
|
|
kiska has joined #archiveteam-bs |
23:15
🔗
|
|
svchfoo3 sets mode: +o kiska |
23:15
🔗
|
|
svchfoo1 sets mode: +o kiska |
23:46
🔗
|
|
hknowles has quit IRC (Quit: Page closed) |
23:50
🔗
|
dxrt |
[19:11:19] <Nemo_bis> Dozens of newspapers are changing hands at once in Italy https://www.bnnbloomberg.ca/italy-s-agnellis-add-la-repubblica-publisher-to-media-holdings-1.1356533 |
23:50
🔗
|
dxrt |
[19:14:47] <Nemo_bis> So gelocal.it would use a deep archival https://www.google.com/search?q=site%3Agelocal.it |
23:50
🔗
|
dxrt |
Any movement on this one? |
23:56
🔗
|
|
X-Scale` has joined #archiveteam-bs |
23:58
🔗
|
|
X-Scale has quit IRC (Ping timeout: 252 seconds) |
23:58
🔗
|
|
X-Scale` is now known as X-Scale |