Time |
Nickname |
Message |
00:05
🔗
|
atphoenix |
betamax: I used to request AOL signup kits under multiple names. There were post-paid postcards in the computer magazines. I always checked the 3.5" floppies checkbox, as the CDs were always coming in the mail. Years ago I recall people making art out of the countless AOL CDs that they received. The floppies were a godsend, and things got better as browsers got bigger. By the end I think I got 6 1.44MB floppies per postcard. |
00:05
🔗
|
atphoenix |
I became an expert at cleanly removing the labels as well :D |
00:06
🔗
|
atphoenix |
I likely have some of those in storage (floppies and AOL CDs). |
00:10
🔗
|
|
britmob has quit IRC (Read error: Connection reset by peer) |
00:24
🔗
|
|
HP_Archiv has joined #archiveteam-bs |
01:01
🔗
|
|
icedice has joined #archiveteam-bs |
01:09
🔗
|
|
fredgido has joined #archiveteam-bs |
01:13
🔗
|
|
fredgido_ has quit IRC (Read error: Operation timed out) |
02:03
🔗
|
|
Maylay has quit IRC (Ping timeout: 745 seconds) |
02:09
🔗
|
|
thuban4 has joined #archiveteam-bs |
02:11
🔗
|
|
thuban3 has quit IRC (Read error: Operation timed out) |
02:18
🔗
|
|
godane has quit IRC (Ping timeout: 255 seconds) |
02:27
🔗
|
|
Maylay has joined #archiveteam-bs |
02:27
🔗
|
|
Maylay has quit IRC (Remote host closed the connection) |
02:27
🔗
|
|
Maylay has joined #archiveteam-bs |
02:32
🔗
|
|
godane has joined #archiveteam-bs |
02:39
🔗
|
|
Flashfire has quit IRC (Remote host closed the connection) |
02:39
🔗
|
|
kiska has quit IRC (Remote host closed the connection) |
02:40
🔗
|
|
bitbit has quit IRC (Quit: Leaving) |
02:40
🔗
|
|
kiska has joined #archiveteam-bs |
02:40
🔗
|
|
svchfoo3 sets mode: +o kiska |
02:40
🔗
|
|
svchfoo1 sets mode: +o kiska |
02:40
🔗
|
|
Flashfire has joined #archiveteam-bs |
03:05
🔗
|
|
Stilett0 has joined #archiveteam-bs |
03:06
🔗
|
|
Stiletto has quit IRC (Read error: Operation timed out) |
04:06
🔗
|
marked1 |
g |
04:40
🔗
|
|
icedice has quit IRC (Leaving) |
04:44
🔗
|
|
qw3rty__ has joined #archiveteam-bs |
04:48
🔗
|
|
qw3rty_ has quit IRC (Ping timeout: 276 seconds) |
05:22
🔗
|
|
HP_Archiv has quit IRC (Quit: Leaving) |
05:32
🔗
|
|
thuban has joined #archiveteam-bs |
05:35
🔗
|
|
thuban4 has quit IRC (Ping timeout: 258 seconds) |
06:21
🔗
|
|
wp494 has quit IRC (LOUD UNNECESSARY QUIT MESSAGES) |
06:26
🔗
|
|
systwi_ has joined #archiveteam-bs |
06:32
🔗
|
|
systwi has quit IRC (Ping timeout: 622 seconds) |
06:44
🔗
|
|
NIC007a83 has quit IRC (Read error: Connection reset by peer) |
06:44
🔗
|
|
wp494 has joined #archiveteam-bs |
07:35
🔗
|
|
Larsenv has quit IRC (Quit: ZNC 1.7.5 - https://znc.in) |
07:35
🔗
|
|
Larsenv has joined #archiveteam-bs |
07:36
🔗
|
|
thuban has quit IRC (Read error: Operation timed out) |
08:11
🔗
|
|
alex73_ has joined #archiveteam-bs |
08:17
🔗
|
alex73_ |
So, about uploading WARCs to the Wayback Machine as described on https://www.archiveteam.org/index.php?title=Frequently_Asked_Questions. I uploaded one, but can't set the mediatype to "web", because an ordinary user can't do that himself. I wrote to Jason Scott about the uploaded archive, but there has been no answer for more than a week. So the method from the FAQ doesn't work. |
08:19
🔗
|
alex73_ |
As far as I understand, there are two other ways to push data to the Wayback Machine: feeding SPN, and ArchiveBot, and ArchiveBot is the preferred way for a site with tens of thousands. Right? |
08:29
🔗
|
|
atbk_ has quit IRC (Ping timeout: 745 seconds) |
09:08
🔗
|
|
atbk has joined #archiveteam-bs |
09:43
🔗
|
|
thuban has joined #archiveteam-bs |
10:00
🔗
|
|
Datechnom has quit IRC (Quit: Ping timeout (120 seconds)) |
10:00
🔗
|
|
Datechnom has joined #archiveteam-bs |
10:06
🔗
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
10:14
🔗
|
|
atbk has quit IRC (Ping timeout: 745 seconds) |
12:00
🔗
|
|
figgyc has joined #archiveteam-bs |
12:31
🔗
|
|
figgyc has quit IRC (Ping timeout: 260 seconds) |
13:17
🔗
|
|
atbk has joined #archiveteam-bs |
13:39
🔗
|
JAA |
alex73_: mediatype:web has to be set on the initial upload. It can indeed only be changed by IA admins once the item exists. |
13:39
🔗
|
JAA |
And yes, ArchiveBot is the easiest way to get sites into the WBM. |
13:40
🔗
|
JAA |
Jason will get back to you eventually. If he doesn't after a few weeks, ping him again about it since he might've missed the mail. He gets a lot of them. |
13:48
🔗
|
alex73_ |
JAA: There is no mediatype field in the web upload form, only "Page Title", "Page URL", "Description", "Subject Tags", "Creator", "Date", "Collection", "Test Item", "Language", "License" and custom fields. |
13:48
🔗
|
alex73_ |
And Jeff Kaplan said: user uploaded warcs typically remain as mediatype=data - https://archive.org/post/1105805/please-change-mediatype-from-data-to-web |
14:07
🔗
|
JAA |
alex73_: Ah, I've never used the web upload form. |
14:16
🔗
|
|
bitbit has joined #archiveteam-bs |
14:40
🔗
|
|
mtntmnky has joined #archiveteam-bs |
14:51
🔗
|
|
mtntmnky_ has quit IRC (Remote host closed the connection) |
15:02
🔗
|
OrIdow6 |
To anyone interested: there are two different PlayStation forums, https://community.playstation.com/ and https://www.playstation.com/en-gb/community.topic.html/announcement_commun-zRTI/ - I think that I remember somebody running the first through AB (not sure if it worked or not), where there was also a brief discussion of this |
15:03
🔗
|
OrIdow6 |
The second forums are a mess - it's a bunch of AJAJ, and (as I recall from my brief investigation) pagination of thread lists and lists of pages within threads are done by requests with a very recent timestamp in them |
15:23
🔗
|
JAA |
https://community.playstation.com/ is a 403 for me. |
15:29
🔗
|
|
prq has joined #archiveteam-bs |
15:49
🔗
|
|
wp494 has quit IRC (Read error: Operation timed out) |
15:49
🔗
|
|
britmob has joined #archiveteam-bs |
15:53
🔗
|
|
wp494 has joined #archiveteam-bs |
16:21
🔗
|
|
DogsRNice has joined #archiveteam-bs |
16:52
🔗
|
godane |
SketchCow: so i got my 50 tapes i bought from ebay |
16:52
🔗
|
godane |
i'm seeing about using easycap for these cause they're not SP commercial tapes |
16:53
🔗
|
godane |
i also don't want each tape to be over 40gb in size |
17:04
🔗
|
SketchCow |
Go with your heart |
17:15
🔗
|
godane |
also what works, cause easycap gave me problems with the old computer |
17:15
🔗
|
godane |
like sync issues and stuff |
17:15
🔗
|
godane |
just maybe this new computer can handle the easycap much better |
17:16
🔗
|
godane |
i have not noticed any problems so far in the 30 minutes of this tape |
17:29
🔗
|
|
thuban has quit IRC (Read error: Connection reset by peer) |
17:30
🔗
|
|
thuban has joined #archiveteam-bs |
17:41
🔗
|
|
Clefairy has joined #archiveteam-bs |
18:10
🔗
|
prq |
every time I look into tape for diy archival, it just seems to be somewhat lacking. non-sealed cassettes have lower environmental tolerances than optical archival media or even spinning sata drives. |
18:18
🔗
|
|
Lancet has joined #archiveteam-bs |
18:18
🔗
|
Lancet |
Hello all! I am looking for help in sourcing a particular MP3 file from an Archive Team python archive, for the former picosong.com website. Is this channel an appropriate place to ask for help? |
18:19
🔗
|
JAA |
Lancet: I'm the one who archived picosong, and I think everything should be available in the Wayback Machine. That requires you know the URL though. |
18:20
🔗
|
Lancet |
Thank you for your effort! I do have the URL - it is http://picosong.com/8dQk/. Unfortunately, plugging that straight into archive.org returned an error message. |
18:21
🔗
|
Lancet |
https://web.archive.org/web/20190401000000*/http://picosong.com/8dQk/ |
18:21
🔗
|
JAA |
Right, so one complication with picosong is that its URLs were case-sensitive but the Wayback Machine is case-insensitive. |
18:22
🔗
|
Lancet |
ah... |
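The case problem JAA describes can be illustrated with a deliberately simplified sketch. It assumes the index key is built by reversing the host labels and lowercasing the path, which is consistent with the CDX keys quoted later in this log (`com,picosong)/8dqk`). The function name `simplified_url_key` is hypothetical, and this is not the Wayback Machine's actual canonicalisation code; query strings and other normalisation steps are ignored here.

```python
from urllib.parse import urlsplit


def simplified_url_key(url: str) -> str:
    """Build a simplified, case-folded lookup key for a URL.

    Illustration only: real canonicalisation handles many more cases
    (query strings, ports, escaping), all of which are ignored here.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""                  # urlsplit lowercases the host
    reversed_host = ",".join(reversed(host.split(".")))
    path = parts.path.lower().rstrip("/")        # case information is lost here
    return f"{reversed_host}){path or '/'}"


# Two picosong URLs that differ only by case end up under the same key.
key_a = simplified_url_key("https://picosong.com/8dQk/")
key_b = simplified_url_key("https://picosong.com/8dqk/")
print(key_a, key_a == key_b)
```

With a key like this, two distinct songs whose IDs differ only in letter case would land in the same index entry, which may be why the direct lookup is unreliable for this site.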
18:24
🔗
|
JAA |
I definitely archived that song though. Just a second. |
18:25
🔗
|
Lancet |
Fair play to you! I presume specific exceptions must have been made at the Wayback Machine for other case-sensitive sites like YouTube?
18:27
🔗
|
JAA |
Hmm, it should be here, but that just throws an error: https://web.archive.org/web/20191006025227/http://picosong.com/8dQk/ |
18:27
🔗
|
JAA |
This is the actual file, but same error there: https://web.archive.org/web/2019*/http://picosong.s3.amazonaws.com/8dQk/The%20Alan%20Kelly%20Rap-245312842.mp3?Signature=ilt4iP917gN0hZ%2FvsJVqdB%2FXIHE%3D&Expires=1570331247&AWSAccessKeyId=AKIAIVYGJY7GGRJY2Y3A |
18:31
🔗
|
JAA |
Here's the page data: https://transfer.notkiska.pw/inline/isJAI/picosong-8dQk-page |
18:34
🔗
|
Lancet |
Thanks - I presume it means one would need to track through the python libraries to find the specific mp3? |
18:34
🔗
|
|
bitbit has quit IRC (Ping timeout: 276 seconds) |
18:34
🔗
|
JAA |
Through the WARC data, but yeah. I'm extracting it right now. |
18:36
🔗
|
Lancet |
Thank you - really appreciate this, I'm fairly sure this is the last version on the internet! |
18:36
🔗
|
JAA |
Here you go: https://transfer.notkiska.pw/dj2jW/picosong-8dQk.mp3 |
18:39
🔗
|
Lancet |
Thank you so much - I appreciate what you guys do! Do you have a tip jar/donation link anywhere? |
18:41
🔗
|
Frogging |
https://www.archiveteam.org/index.php?title=Donate |
18:41
🔗
|
JAA |
We have a thingy on Open Collective, but that money isn't being spent, so consider donating to the Internet Archive instead. |
18:42
🔗
|
Frogging |
yeah, donate to IA: https://archive.org/donate/ |
18:42
🔗
|
Frogging |
They store *a lot* of stuff :) |
18:47
🔗
|
|
systwi has joined #archiveteam-bs |
18:49
🔗
|
Lancet |
Done - just realised I've benefitted from IA for two decades or so but have never contributed |
18:49
🔗
|
Frogging |
Great ^^ |
18:54
🔗
|
|
systwi_ has quit IRC (Ping timeout: 622 seconds) |
19:01
🔗
|
JAA |
Lancet: Oh, just realised that I forgot to cut off the linebreaks at the end of the file, so you should remove the last 4 bytes of that MP3 file. |
19:03
🔗
|
JAA |
Fixed file: https://transfer.notkiska.pw/moSVv/picosong-8dQk.mp3 |
19:06
🔗
|
JAA |
In case someone needs to do that again or wants to know how that works, here's what I did: |
19:06
🔗
|
JAA |
curl -sL https://archive.org/download/picosong.com_201910_part0/picosong.com_201910_part0.cdx.gz | zgrep -F 8dQk |
19:06
🔗
|
JAA |
This produces, among others, these two lines, where the 9th field is the length and the 10th the offset of the corresponding WARC record in the WARC file (11th field):
19:06
🔗
|
JAA |
com,amazonaws,s3,picosong)/8dqk/<trimmed> 20191006025228 http://picosong.s3.amazonaws.com/8dQk/<trimmed> application/octet-stream 200 2T4RL35AJMP3CTBHFLKAS6EFDVOA6RPC - - 3672733 359120473 picosong.com_201910_part0/picosong-site-00052.warc.gz |
19:07
🔗
|
JAA |
com,picosong)/8dqk 20191006025227 https://picosong.com/8dQk/ text/html 200 KRK6VAGMNC74USNPCGJQL7SAYZOX4PZS - - 2659 349480170 picosong.com_201910_part0/picosong-site-00052.warc.gz |
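The field layout described above can be turned into a byte range mechanically. The following is a minimal Python sketch, assuming the space-separated 11-field CDX layout of the two lines quoted here; the function name `cdx_byte_range` is made up for illustration.

```python
from typing import Tuple


def cdx_byte_range(cdx_line: str) -> Tuple[str, str]:
    """Return (warc_path, Range header value) for one 11-field CDX line."""
    fields = cdx_line.split()
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    length = int(fields[8])    # 9th field: length of the record in the .warc.gz
    offset = int(fields[9])    # 10th field: offset of the record in the .warc.gz
    warc_path = fields[10]     # 11th field: WARC file containing the record
    last_byte = offset + length - 1   # HTTP ranges are inclusive on both ends
    return warc_path, f"bytes={offset}-{last_byte}"


line = (
    "com,picosong)/8dqk 20191006025227 https://picosong.com/8dQk/ text/html 200 "
    "KRK6VAGMNC74USNPCGJQL7SAYZOX4PZS - - 2659 349480170 "
    "picosong.com_201910_part0/picosong-site-00052.warc.gz"
)
print(cdx_byte_range(line))
```

The `- 1` is there because HTTP byte ranges include the last byte; it corresponds to the `+2658` (rather than `+2659`) in the `curl --range` call for the page record.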
19:07
🔗
|
JAA |
So, to retrieve and upload the WARC record of the page: |
19:07
🔗
|
JAA |
curl -sL --range "349480170-$((349480170+2658))" https://archive.org/download/picosong.com_201910_part0/picosong-site-00052.warc.gz | curl --upload-file - https://transfer.notkiska.pw/picosong-8dQk-page; echo |
19:07
🔗
|
JAA |
And to extract the MP3 file: |
19:07
🔗
|
JAA |
curl -sL --range "359120473-$((359120473+3672732))" https://archive.org/download/picosong.com_201910_part0/picosong-site-00052.warc.gz >picosong-8dQk-warc |
19:07
🔗
|
JAA |
zcat picosong-8dQk-warc | tail -c+1144 | head -c-4 | curl -sv --upload-file - https://transfer.notkiska.pw/picosong-8dQk.mp3; echo |
19:07
🔗
|
JAA |
Here, 1144 is the 1-based byte position at which the HTTP body starts (`tail -c+1144` skips the first 1143 bytes, i.e. the WARC and HTTP headers). This would not work if chunked transfer encoding were involved, and probably in a few other cases, so a proper WARC reader would be needed then. The `head -c-4` removes the double CRLF at the end of the WARC record.
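To avoid hard-coding the header offset, the same extraction could be sketched in Python by locating the header boundaries instead. This is only a sketch under the same assumptions as the shell pipeline (a single decompressed `response` record, no chunked transfer encoding, no content encoding); the function name `http_body_from_warc_record` is hypothetical, and a proper WARC library is still the safer choice in general.

```python
import re


def http_body_from_warc_record(record: bytes) -> bytes:
    """Extract the HTTP body from one decompressed WARC response record."""
    # The WARC headers end at the first blank line.
    warc_head, sep, rest = record.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("end of WARC headers not found")

    # The WARC Content-Length gives the exact block size, so the trailing
    # double CRLF of the record never has to be guessed at.
    match = re.search(rb"(?im)^Content-Length:[ \t]*(\d+)[ \t]*\r?$", warc_head)
    if match is None:
        raise ValueError("WARC Content-Length header not found")
    block = rest[: int(match.group(1))]

    # Inside the block, the HTTP headers again end at the first blank line.
    http_head, sep, body = block.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("end of HTTP headers not found")
    if re.search(rb"(?im)^Transfer-Encoding:.*chunked", http_head):
        raise ValueError("chunked transfer encoding is not handled here")
    return body


# Usage with the file saved by the curl command above (not run here):
#   import gzip
#   raw = gzip.decompress(open("picosong-8dQk-warc", "rb").read())
#   open("picosong-8dQk.mp3", "wb").write(http_body_from_warc_record(raw))
```

Slicing by the WARC `Content-Length` rather than stripping four bytes also keeps a body intact if it happens to end in CRLF itself.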
19:08
🔗
|
|
opticnerv has joined #archiveteam-bs |
19:10
🔗
|
|
opticnerv has quit IRC (Leaving) |
19:15
🔗
|
marked1 |
prq: tradeoffs in other ways. tape drives' selling point is low cost per additional GB |
19:17
🔗
|
prq |
yup. all my archival projects that I have in mind don't quite hit that price point where I can justify that margin (when including the cost of the drive itself) |
19:18
🔗
|
prq |
I love the idea of tape though |
19:19
🔗
|
marked1 |
yeah, some context: the institutions I know that use tape have tape robots |
19:20
🔗
|
marked1 |
bigger than my bedroom actually |
19:20
🔗
|
prq |
indeed. |
19:26
🔗
|
|
Stilett0 has quit IRC (Ping timeout: 255 seconds) |
19:29
🔗
|
|
Stiletto has joined #archiveteam-bs |
19:43
🔗
|
|
asdf0101 has quit IRC (The Lounge - https://thelounge.chat) |
19:43
🔗
|
|
marked1 has quit IRC (Quit: The Lounge - https://thelounge.chat) |
19:47
🔗
|
|
asdf0101 has joined #archiveteam-bs |
19:50
🔗
|
|
marked1 has joined #archiveteam-bs |
19:50
🔗
|
|
asdf0101 has quit IRC (Read error: Operation timed out) |
19:51
🔗
|
|
asdf0101 has joined #archiveteam-bs |
20:01
🔗
|
|
icedice has joined #archiveteam-bs |
20:09
🔗
|
|
BlueMax has joined #archiveteam-bs |
20:48
🔗
|
|
Clefairy has quit IRC (Quit: ZNC: the superior metal to CBLT) |
20:55
🔗
|
|
Clefable has joined #archiveteam-bs |
21:31
🔗
|
|
ShellyRol has quit IRC (Read error: Connection reset by peer) |
21:32
🔗
|
|
ShellyRol has joined #archiveteam-bs |
21:54
🔗
|
|
Lancet has quit IRC (Ping timeout: 260 seconds) |
21:59
🔗
|
|
bsmith094 has quit IRC (Remote host closed the connection) |
22:03
🔗
|
|
bsmith093 has joined #archiveteam-bs |
22:11
🔗
|
|
bitbit has joined #archiveteam-bs |
22:11
🔗
|
|
bsmith093 has quit IRC (Leaving.) |
22:14
🔗
|
|
bsmith093 has joined #archiveteam-bs |
22:15
🔗
|
|
bsmith093 has quit IRC (Remote host closed the connection) |
22:17
🔗
|
|
bsmith093 has joined #archiveteam-bs |
22:19
🔗
|
godane |
SketchCow: you can upload my captures |
22:19
🔗
|
godane |
i'm not uploading right now |
22:30
🔗
|
|
icedice has quit IRC (Leaving) |
23:51
🔗
|
|
thuban1 has joined #archiveteam-bs |
23:53
🔗
|
|
thuban has quit IRC (Read error: Operation timed out) |