#archiveteam-bs 2020-02-18,Tue

Time Nickname Message
00:05 🔗 atphoenix betamax: I used to request AOL signup kits under multiple names. There were post-paid postcards in the computer magazines. I always checked the 3.5" floppies checkbox, as the CDs were always coming in the mail. Years ago I recall people making art out of the countless AOL CDs that they received. The floppies were a godsend, and things got better as browsers got bigger. By the end I think I got 6 1.44MB floppies per postcard.
00:05 🔗 atphoenix I became an expert at cleanly removing the labels as well :D
00:06 🔗 atphoenix I likely have some of those in storage (floppies and AOL CDs).
00:10 🔗 britmob has quit IRC (Read error: Connection reset by peer)
00:24 🔗 HP_Archiv has joined #archiveteam-bs
01:01 🔗 icedice has joined #archiveteam-bs
01:09 🔗 fredgido has joined #archiveteam-bs
01:13 🔗 fredgido_ has quit IRC (Read error: Operation timed out)
02:03 🔗 Maylay has quit IRC (Ping timeout: 745 seconds)
02:09 🔗 thuban4 has joined #archiveteam-bs
02:11 🔗 thuban3 has quit IRC (Read error: Operation timed out)
02:18 🔗 godane has quit IRC (Ping timeout: 255 seconds)
02:27 🔗 Maylay has joined #archiveteam-bs
02:27 🔗 Maylay has quit IRC (Remote host closed the connection)
02:27 🔗 Maylay has joined #archiveteam-bs
02:32 🔗 godane has joined #archiveteam-bs
02:39 🔗 Flashfire has quit IRC (Remote host closed the connection)
02:39 🔗 kiska has quit IRC (Remote host closed the connection)
02:40 🔗 bitbit has quit IRC (Quit: Leaving)
02:40 🔗 kiska has joined #archiveteam-bs
02:40 🔗 svchfoo3 sets mode: +o kiska
02:40 🔗 svchfoo1 sets mode: +o kiska
02:40 🔗 Flashfire has joined #archiveteam-bs
03:05 🔗 Stilett0 has joined #archiveteam-bs
03:06 🔗 Stiletto has quit IRC (Read error: Operation timed out)
04:06 🔗 marked1 g
04:40 🔗 icedice has quit IRC (Leaving)
04:44 🔗 qw3rty__ has joined #archiveteam-bs
04:48 🔗 qw3rty_ has quit IRC (Ping timeout: 276 seconds)
05:22 🔗 HP_Archiv has quit IRC (Quit: Leaving)
05:32 🔗 thuban has joined #archiveteam-bs
05:35 🔗 thuban4 has quit IRC (Ping timeout: 258 seconds)
06:21 🔗 wp494 has quit IRC (LOUD UNNECESSARY QUIT MESSAGES)
06:26 🔗 systwi_ has joined #archiveteam-bs
06:32 🔗 systwi has quit IRC (Ping timeout: 622 seconds)
06:44 🔗 NIC007a83 has quit IRC (Read error: Connection reset by peer)
06:44 🔗 wp494 has joined #archiveteam-bs
07:35 🔗 Larsenv has quit IRC (Quit: ZNC 1.7.5 - https://znc.in)
07:35 🔗 Larsenv has joined #archiveteam-bs
07:36 🔗 thuban has quit IRC (Read error: Operation timed out)
08:11 🔗 alex73_ has joined #archiveteam-bs
08:17 🔗 alex73_ So, about uploading WARCs to the Wayback Machine as described at https://www.archiveteam.org/index.php?title=Frequently_Asked_Questions : I uploaded one, but I can't set the mediatype to "web", because a regular user can't do that himself. I wrote to Jason Scott about the uploaded archive, but there has been no answer for more than a week. So the method from the FAQ doesn't work.
08:19 🔗 alex73_ As far as I understand, there are two other ways to push data into the Wayback Machine, feeding SPN and ArchiveBot, and ArchiveBot is the preferred way for a site with tens of thousands. Right?
08:29 🔗 atbk_ has quit IRC (Ping timeout: 745 seconds)
09:08 🔗 atbk has joined #archiveteam-bs
09:43 🔗 thuban has joined #archiveteam-bs
10:00 🔗 Datechnom has quit IRC (Quit: Ping timeout (120 seconds))
10:00 🔗 Datechnom has joined #archiveteam-bs
10:06 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
10:14 🔗 atbk has quit IRC (Ping timeout: 745 seconds)
12:00 🔗 figgyc has joined #archiveteam-bs
12:31 🔗 figgyc has quit IRC (Ping timeout: 260 seconds)
13:17 🔗 atbk has joined #archiveteam-bs
13:39 🔗 JAA alex73_: mediatype:web has to be set on the initial upload. It can indeed only be changed by IA admins once the item exists.
13:39 🔗 JAA And yes, ArchiveBot is the easiest way to get sites into the WBM.
13:40 🔗 JAA Jason will get back to you eventually. If he doesn't after a few weeks, ping him again about it since he might've missed the mail. He gets a lot of them.
13:48 🔗 alex73_ JAA: There is no mediatype field in the web upload form, only "Page Title", "Page URL", "Description", "Subject Tags", "Creator", "Date", "Collection", "Test Item", "Language", "License" and custom fields.
13:48 🔗 alex73_ And Jeff Kaplan said: user uploaded warcs typically remain as mediatype=data - https://archive.org/post/1105805/please-change-mediatype-from-data-to-web
14:07 🔗 JAA alex73_: Ah, I've never used the web upload form.
14:16 🔗 bitbit has joined #archiveteam-bs
14:40 🔗 mtntmnky has joined #archiveteam-bs
14:51 🔗 mtntmnky_ has quit IRC (Remote host closed the connection)
15:02 🔗 OrIdow6 To anyone interested: there are two different PlayStation forums, https://community.playstation.com/ and https://www.playstation.com/en-gb/community.topic.html/announcement_commun-zRTI/ - I think that I remember somebody running the first through AB (not sure if it worked or not), where there was also a brief discussion of this
15:03 🔗 OrIdow6 The second forums are a mess - it's a bunch of AJAJ, and (as I recall from my brief investigation) pagination of thread lists and lists of pages within threads are done by requests with a very recent timestamp in them
15:23 🔗 JAA https://community.playstation.com/ is a 403 for me.
15:29 🔗 prq has joined #archiveteam-bs
15:49 🔗 wp494 has quit IRC (Read error: Operation timed out)
15:49 🔗 britmob has joined #archiveteam-bs
15:53 🔗 wp494 has joined #archiveteam-bs
16:21 🔗 DogsRNice has joined #archiveteam-bs
16:52 🔗 godane SketchCow: so i got my 50 tapes i bought from ebay
16:52 🔗 godane i'm seeing about using easycap for these cause they're not SP commercial tapes
16:53 🔗 godane i also don't want each tape to be over 40gb in size
17:04 🔗 SketchCow Go with your heart
17:15 🔗 godane also what works cause easycap gave me problems with old computer
17:15 🔗 godane like sync issues and stuff
17:15 🔗 godane just maybe this new computer can handle the easycap much better
17:16 🔗 godane i have not noticed any problems so far for the 30 minutes of this tape
17:29 🔗 thuban has quit IRC (Read error: Connection reset by peer)
17:30 🔗 thuban has joined #archiveteam-bs
17:41 🔗 Clefairy has joined #archiveteam-bs
18:10 🔗 prq every time I look into tape for diy archival, it just seems to be somewhat lacking. non-sealed cassettes have lower environmental tolerances than optical archival media or even spinning sata drives.
18:18 🔗 Lancet has joined #archiveteam-bs
18:18 🔗 Lancet Hello all! I am looking for help in sourcing a particular MP3 file from an Archive Team python archive, for the former picosong.com website. Is this channel an appropriate place to ask for help?
18:19 🔗 JAA Lancet: I'm the one who archived picosong, and I think everything should be available in the Wayback Machine. That requires you know the URL though.
18:20 🔗 Lancet Thank you for your effort! I do have the URL - it is http://picosong.com/8dQk/. Unfortunately, plugging that straight into archive.org returned an error message.
18:21 🔗 Lancet https://web.archive.org/web/20190401000000*/http://picosong.com/8dQk/
18:21 🔗 JAA Right, so one complication with picosong is that its URLs were case-sensitive but the Wayback Machine is case-insensitive.
18:22 🔗 Lancet ah...
18:24 🔗 JAA I definitely archived that song though. Just a second.
18:25 🔗 Lancet Fair play to you! I presume specific exceptions must have been made at the Wayback Machine for other case-sensitive sites like Youtube?
18:27 🔗 JAA Hmm, it should be here, but that just throws an error: https://web.archive.org/web/20191006025227/http://picosong.com/8dQk/
18:27 🔗 JAA This is the actual file, but same error there: https://web.archive.org/web/2019*/http://picosong.s3.amazonaws.com/8dQk/The%20Alan%20Kelly%20Rap-245312842.mp3?Signature=ilt4iP917gN0hZ%2FvsJVqdB%2FXIHE%3D&Expires=1570331247&AWSAccessKeyId=AKIAIVYGJY7GGRJY2Y3A
18:31 🔗 JAA Here's the page data: https://transfer.notkiska.pw/inline/isJAI/picosong-8dQk-page
18:34 🔗 Lancet Thanks - I presume it means one would need to track through the python libraries to find the specific mp3?
18:34 🔗 bitbit has quit IRC (Ping timeout: 276 seconds)
18:34 🔗 JAA Through the WARC data, but yeah. I'm extracting it right now.
18:36 🔗 Lancet Thank you - really appreciate this, I'm fairly sure this is the last version on the internet!
18:36 🔗 JAA Here you go: https://transfer.notkiska.pw/dj2jW/picosong-8dQk.mp3
18:39 🔗 Lancet Thank you so much - I appreciate what you guys do! Do you have a tip jar/donation link anywhere?
18:41 🔗 Frogging https://www.archiveteam.org/index.php?title=Donate
18:41 🔗 JAA We have a thingy on Open Collective, but that money isn't being spent, so consider donating to the Internet Archive instead.
18:42 🔗 Frogging yeah, donate to IA: https://archive.org/donate/
18:42 🔗 Frogging They store *a lot* of stuff :)
18:47 🔗 systwi has joined #archiveteam-bs
18:49 🔗 Lancet Done - just realised I've benefitted from IA for two decades or so but have never contributed
18:49 🔗 Frogging Great ^^
18:54 🔗 systwi_ has quit IRC (Ping timeout: 622 seconds)
19:01 🔗 JAA Lancet: Oh, just realised that I forgot to cut off the linebreaks at the end of the file, so you should remove the last 4 bytes of that MP3 file.
19:03 🔗 JAA Fixed file: https://transfer.notkiska.pw/moSVv/picosong-8dQk.mp3
19:06 🔗 JAA In case someone needs to do that again or wants to know how that works, here's what I did:
19:06 🔗 JAA curl -sL https://archive.org/download/picosong.com_201910_part0/picosong.com_201910_part0.cdx.gz | zgrep -F 8dQk
19:06 🔗 JAA This produces, among others, these two lines, where the 9th field is the length and the 10th the offset of the corresponding WARC record in the WARC file (11th field):
19:06 🔗 JAA com,amazonaws,s3,picosong)/8dqk/<trimmed> 20191006025228 http://picosong.s3.amazonaws.com/8dQk/<trimmed> application/octet-stream 200 2T4RL35AJMP3CTBHFLKAS6EFDVOA6RPC - - 3672733 359120473 picosong.com_201910_part0/picosong-site-00052.warc.gz
19:07 🔗 JAA com,picosong)/8dqk 20191006025227 https://picosong.com/8dQk/ text/html 200 KRK6VAGMNC74USNPCGJQL7SAYZOX4PZS - - 2659 349480170 picosong.com_201910_part0/picosong-site-00052.warc.gz
19:07 🔗 JAA So, to retrieve and upload the WARC record of the page:
19:07 🔗 JAA curl -sL --range "349480170-$((349480170+2658))" https://archive.org/download/picosong.com_201910_part0/picosong-site-00052.warc.gz | curl --upload-file - https://transfer.notkiska.pw/picosong-8dQk-page; echo
19:07 🔗 JAA And to extract the MP3 file:
19:07 🔗 JAA curl -sL --range "359120473-$((359120473+3672732))" https://archive.org/download/picosong.com_201910_part0/picosong-site-00052.warc.gz >picosong-8dQk-warc
19:07 🔗 JAA zcat picosong-8dQk-warc | tail -c+1144 | head -c-4 | curl -sv --upload-file - https://transfer.notkiska.pw/picosong-8dQk.mp3; echo
19:07 🔗 JAA Here, 1144 is the offset of the HTTP body. This would not work if there was chunked transfer encoding involved and probably in a few other cases, so a proper WARC reader would be needed then. The `head -c-4` removes the double CRLF at the end of the WARC block.
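The steps JAA describes above can also be sketched in Python. This is only an illustration, not the commands that were actually run: the function names `cdx_to_range` and `http_body` are hypothetical, the CDX line is assumed to be space-separated with 11 fields as in the log, and the body extraction assumes a single gzipped response record without chunked transfer encoding (a proper WARC reader would be needed otherwise). Instead of hard-coding the 1144-byte offset, it finds the HTTP body by looking for the blank lines that end the WARC and HTTP headers.

```python
import gzip

# CDX line for the page, copied from the log above.
CDX_LINE = (
    "com,picosong)/8dqk 20191006025227 https://picosong.com/8dQk/ text/html 200 "
    "KRK6VAGMNC74USNPCGJQL7SAYZOX4PZS - - 2659 349480170 "
    "picosong.com_201910_part0/picosong-site-00052.warc.gz"
)


def cdx_to_range(line):
    """Return (warc_path, Range header value) for one CDX line."""
    fields = line.split()
    # 9th field = compressed record length, 10th = offset, 11th = WARC file.
    length, offset, path = int(fields[8]), int(fields[9]), fields[10]
    # HTTP byte ranges are inclusive on both ends, hence the -1.
    return path, f"bytes={offset}-{offset + length - 1}"


def http_body(record_gz):
    """Return the HTTP body of one gzipped WARC response record.

    Assumes no chunked transfer encoding or content encoding.
    """
    record = gzip.decompress(record_gz)
    # The WARC headers end at the first blank line, the HTTP headers at the next.
    _warc_headers, _, rest = record.partition(b"\r\n\r\n")
    _http_headers, _, body = rest.partition(b"\r\n\r\n")
    # Every WARC record is terminated by two CRLFs after the block; drop them.
    if body.endswith(b"\r\n\r\n"):
        body = body[:-4]
    return body


if __name__ == "__main__":
    path, range_header = cdx_to_range(CDX_LINE)
    print(path)
    print(range_header)
```

The `Range` value would be sent as a header with a request for `https://archive.org/download/<warc_path>`, and the returned bytes passed to `http_body`. Locating the header boundaries this way avoids having to work out the body offset by hand for each record.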
19:08 🔗 opticnerv has joined #archiveteam-bs
19:10 🔗 opticnerv has quit IRC (Leaving)
19:15 🔗 marked1 prq: tradeoffs in other ways. tape drives' selling point is low cost per additional GB
19:17 🔗 prq yup. all my archival projects that I have in mind don't quite hit that price point where I can justify that margin (when including the cost of the drive itself)
19:18 🔗 prq I love the idea of tape though
19:19 🔗 marked1 yeah, some context: the institutions I know using tape have tape robots
19:20 🔗 marked1 bigger than my bedroom actually
19:20 🔗 prq indeed.
19:26 🔗 Stilett0 has quit IRC (Ping timeout: 255 seconds)
19:29 🔗 Stiletto has joined #archiveteam-bs
19:43 🔗 asdf0101 has quit IRC (The Lounge - https://thelounge.chat)
19:43 🔗 marked1 has quit IRC (Quit: The Lounge - https://thelounge.chat)
19:47 🔗 asdf0101 has joined #archiveteam-bs
19:50 🔗 marked1 has joined #archiveteam-bs
19:50 🔗 asdf0101 has quit IRC (Read error: Operation timed out)
19:51 🔗 asdf0101 has joined #archiveteam-bs
20:01 🔗 icedice has joined #archiveteam-bs
20:09 🔗 BlueMax has joined #archiveteam-bs
20:48 🔗 Clefairy has quit IRC (Quit: ZNC: the superior metal to CBLT)
20:55 🔗 Clefable has joined #archiveteam-bs
21:31 🔗 ShellyRol has quit IRC (Read error: Connection reset by peer)
21:32 🔗 ShellyRol has joined #archiveteam-bs
21:54 🔗 Lancet has quit IRC (Ping timeout: 260 seconds)
21:59 🔗 bsmith094 has quit IRC (Remote host closed the connection)
22:03 🔗 bsmith093 has joined #archiveteam-bs
22:11 🔗 bitbit has joined #archiveteam-bs
22:11 🔗 bsmith093 has quit IRC (Leaving.)
22:14 🔗 bsmith093 has joined #archiveteam-bs
22:15 🔗 bsmith093 has quit IRC (Remote host closed the connection)
22:17 🔗 bsmith093 has joined #archiveteam-bs
22:19 🔗 godane SketchCow: you can upload my captures
22:19 🔗 godane i'm not uploading right now
22:30 🔗 icedice has quit IRC (Leaving)
23:51 🔗 thuban1 has joined #archiveteam-bs
23:53 🔗 thuban has quit IRC (Read error: Operation timed out)