[00:05] betamax: I used to request AOL signup kits under multiple names. There were post-paid postcards in the computer magazines. I always checked the 3.5" floppies checkbox, as the CDs were always coming in the mail. Years ago I recall people making art out of the countless AOL CDs that they received. The floppies were a godsend, and things got better as browsers got bigger. By the end I think I got 6 1.44MB floppies per postcard.
[00:05] I became an expert at cleanly removing the labels as well :D
[00:06] I likely have some of those in storage (floppies and AOL CDs).
[00:10] *** britmob has quit IRC (Read error: Connection reset by peer)
[00:24] *** HP_Archiv has joined #archiveteam-bs
[01:01] *** icedice has joined #archiveteam-bs
[01:09] *** fredgido has joined #archiveteam-bs
[01:13] *** fredgido_ has quit IRC (Read error: Operation timed out)
[02:03] *** Maylay has quit IRC (Ping timeout: 745 seconds)
[02:09] *** thuban4 has joined #archiveteam-bs
[02:11] *** thuban3 has quit IRC (Read error: Operation timed out)
[02:18] *** godane has quit IRC (Ping timeout: 255 seconds)
[02:27] *** Maylay has joined #archiveteam-bs
[02:27] *** Maylay has quit IRC (Remote host closed the connection!)
[02:27] *** Maylay has joined #archiveteam-bs
[02:32] *** godane has joined #archiveteam-bs
[02:39] *** Flashfire has quit IRC (Remote host closed the connection)
[02:39] *** kiska has quit IRC (Remote host closed the connection)
[02:40] *** bitbit has quit IRC (Quit: Leaving)
[02:40] *** kiska has joined #archiveteam-bs
[02:40] *** svchfoo3 sets mode: +o kiska
[02:40] *** svchfoo1 sets mode: +o kiska
[02:40] *** Flashfire has joined #archiveteam-bs
[03:05] *** Stilett0 has joined #archiveteam-bs
[03:06] *** Stiletto has quit IRC (Read error: Operation timed out)
[04:06] g
[04:40] *** icedice has quit IRC (Leaving)
[04:44] *** qw3rty__ has joined #archiveteam-bs
[04:48] *** qw3rty_ has quit IRC (Ping timeout: 276 seconds)
[05:22] *** HP_Archiv has quit IRC (Quit: Leaving)
[05:32] *** thuban has joined #archiveteam-bs
[05:35] *** thuban4 has quit IRC (Ping timeout: 258 seconds)
[06:21] *** wp494 has quit IRC (LOUD UNNECESSARY QUIT MESSAGES)
[06:26] *** systwi_ has joined #archiveteam-bs
[06:32] *** systwi has quit IRC (Ping timeout: 622 seconds)
[06:44] *** NIC007a83 has quit IRC (Read error: Connection reset by peer)
[06:44] *** wp494 has joined #archiveteam-bs
[07:35] *** Larsenv has quit IRC (Quit: ZNC 1.7.5 - https://znc.in)
[07:35] *** Larsenv has joined #archiveteam-bs
[07:36] *** thuban has quit IRC (Read error: Operation timed out)
[08:11] *** alex73_ has joined #archiveteam-bs
[08:17] So, about WARC uploading to Wayback Machine as described on https://www.archiveteam.org/index.php?title=Frequently_Asked_Questions. I uploaded it, but can't set mediatype to "web", because usual user can't do it himself. I wrote about uploaded archive to Jason Scott, but there is no any answer more than a week. So, this way from FAQ doesn't work.
[08:19] As far as I understand, there is two other ways for push data to Wayback Machine - feed SPN and Archivebot, and Archivebot is preferred way for site with tens of thousands. Right ?
[08:29] *** atbk_ has quit IRC (Ping timeout: 745 seconds)
[09:08] *** atbk has joined #archiveteam-bs
[09:43] *** thuban has joined #archiveteam-bs
[10:00] *** Datechnom has quit IRC (Quit: Ping timeout (120 seconds))
[10:00] *** Datechnom has joined #archiveteam-bs
[10:06] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[10:14] *** atbk has quit IRC (Ping timeout: 745 seconds)
[12:00] *** figgyc has joined #archiveteam-bs
[12:31] *** figgyc has quit IRC (Ping timeout: 260 seconds)
[13:17] *** atbk has joined #archiveteam-bs
[13:39] alex73_: mediatype:web has to be set on the initial upload. It can indeed only be changed by IA admins once the item exists.
[13:39] And yes, ArchiveBot is the easiest way to get sites into the WBM.
[13:40] Jason will get back to you eventually. If he doesn't after a few weeks, ping him again about it since he might've missed the mail. He gets a lot of them.
[13:48] JAA: There is no mediatype in web uploading, only "Page Title", "Page URL", "Description", "Subject Tags", "Creator", "Date", "Collection", "Test Item", "Language", "License" and custom fields.
[13:48] And Jeff Kaplan said: user uploaded warcs typically remain as mediatype=data - https://archive.org/post/1105805/please-change-mediatype-from-data-to-web
[14:07] alex73_: Ah, I've never used the web upload form.
[14:16] *** bitbit has joined #archiveteam-bs
[14:40] *** mtntmnky has joined #archiveteam-bs
[14:51] *** mtntmnky_ has quit IRC (Remote host closed the connection)
[15:02] To anyone interested: there are two different PlayStation forums, https://community.playstation.com/ and https://www.playstation.com/en-gb/community.topic.html/announcement_commun-zRTI/ - I think that I remember somebody running the first through AB (not sure if it worked or not), where there was also a brief discussion of this
[15:03] The second forums are a mess - it's a bunch of AJAJ, and (as I recall from my brief investigation) pagination of thread lists and lists of pages within threads are done by requests with a very recent timestamp in them
[15:23] https://community.playstation.com/ is a 403 for me.
[15:29] *** prq has joined #archiveteam-bs
[15:49] *** wp494 has quit IRC (Read error: Operation timed out)
[15:49] *** britmob has joined #archiveteam-bs
[15:53] *** wp494 has joined #archiveteam-bs
[16:21] *** DogsRNice has joined #archiveteam-bs
[16:52] SketchCow: so i got my 50 tapes i bought from ebay
[16:52] i'm seeing about using easycap for these cause there not SP commercial tapes
[16:53] i also don't want each tape to be over 40gb in size
[17:04] Go with your heart
[17:15] also what works cause easycap gave me problems with old computer
[17:15] like sync issues and stuff
[17:15] just maybe this new computer can handle the easycap much better
[17:16] i have not notice any problems so far for the 30 minutes of this tape
[17:29] *** thuban has quit IRC (Read error: Connection reset by peer)
[17:30] *** thuban has joined #archiveteam-bs
[17:41] *** Clefairy has joined #archiveteam-bs
[18:10] every time I look into tape for diy archival, it just seems to be somewhat lacking. non-sealed cassettes have lower environmental tolerances than optical archival media or even spinning sata drives.
[18:18] *** Lancet has joined #archiveteam-bs
[18:18] Hello all! I am looking for help in sourcing a particular MP3 file from an Archive Team python archive, for the former picosong.com website. Is this channel an appropriate place to ask for help?
[18:19] Lancet: I'm the one who archived picosong, and I think everything should be available in the Wayback Machine. That requires you know the URL though.
[18:20] Thank you for your effort! I do have the URL - it is http://picosong.com/8dQk/. Unfortunately, plugging that straight into archive.org returned an error message.
[18:21] https://web.archive.org/web/20190401000000*/http://picosong.com/8dQk/
[18:21] Right, so one complication with picosong is that its URLs were case-sensitive but the Wayback Machine is case-insensitive.
[18:22] ah...
[18:24] I definitely archived that song though. Just a second.
[18:25] Fair play to you! I presume specific exceptions must have been made at the Wayback Machine for other case-sensitive sites like Youtube?
[18:27] Hmm, it should be here, but that just throws an error: https://web.archive.org/web/20191006025227/http://picosong.com/8dQk/
[18:27] This is the actual file, but same error there: https://web.archive.org/web/2019*/http://picosong.s3.amazonaws.com/8dQk/The%20Alan%20Kelly%20Rap-245312842.mp3?Signature=ilt4iP917gN0hZ%2FvsJVqdB%2FXIHE%3D&Expires=1570331247&AWSAccessKeyId=AKIAIVYGJY7GGRJY2Y3A
[18:31] Here's the page data: https://transfer.notkiska.pw/inline/isJAI/picosong-8dQk-page
[18:34] Thanks - I presume it means one would need to track through the python libraries to find the specific mp3?
[18:34] *** bitbit has quit IRC (Ping timeout: 276 seconds)
[18:34] Through the WARC data, but yeah. I'm extracting it right now.
[18:36] Thank you - really appreciate this, I'm fairly sure this is the last version on the internet!
[18:36] Here you go: https://transfer.notkiska.pw/dj2jW/picosong-8dQk.mp3
[18:39] Thank you so much - I appreciate what you guys do! Do you have a tip jar/donation link anywhere?
[18:41] https://www.archiveteam.org/index.php?title=Donate
[18:41] We have a thingy on Open Collective, but that money isn't being spent, so consider donating to the Internet Archive instead.
[18:42] yeah, donate to IA: https://archive.org/donate/
[18:42] They store *a lot* of stuff :)
[18:47] *** systwi has joined #archiveteam-bs
[18:49] Done - just realised I've benefitted from IA for two decades or so but have never contributed
[18:49] Great ^^
[18:54] *** systwi_ has quit IRC (Ping timeout: 622 seconds)
[19:01] Lancet: Oh, just realised that I forgot to cut off the linebreaks at the end of the file, so you should remove the last 4 bytes of that MP3 file.
[19:03] Fixed file: https://transfer.notkiska.pw/moSVv/picosong-8dQk.mp3
[19:06] In case someone needs to do that again or wants to know how that works, here's what I did:
[19:06] curl -sL https://archive.org/download/picosong.com_201910_part0/picosong.com_201910_part0.cdx.gz | zgrep -F 8dQk
[19:06] This produces, among others these two lines, where the 9th field is the length and the 10th the offset of the corresponding WARC record in the WARC file (11th field):
[19:06] com,amazonaws,s3,picosong)/8dqk/ 20191006025228 http://picosong.s3.amazonaws.com/8dQk/ application/octet-stream 200 2T4RL35AJMP3CTBHFLKAS6EFDVOA6RPC - - 3672733 359120473 picosong.com_201910_part0/picosong-site-00052.warc.gz
[19:07] com,picosong)/8dqk 20191006025227 https://picosong.com/8dQk/ text/html 200 KRK6VAGMNC74USNPCGJQL7SAYZOX4PZS - - 2659 349480170 picosong.com_201910_part0/picosong-site-00052.warc.gz
[19:07] So, to retrieve and upload the WARC record of the page:
[19:07] curl -sL --range "349480170-$((349480170+2658))" https://archive.org/download/picosong.com_201910_part0/picosong-site-00052.warc.gz | curl --upload-file - https://transfer.notkiska.pw/picosong-8dQk-page; echo
[19:07] And to extract the MP3 file:
[19:07] curl -sL --range "359120473-$((359120473+3672732))" https://archive.org/download/picosong.com_201910_part0/picosong-site-00052.warc.gz >picosong-8dQk-warc
[19:07] zcat picosong-8dQk-warc | tail -c+1144 | head -c-4 | curl -sv --upload-file - https://transfer.notkiska.pw/picosong-8dQk.mp3; echo
[19:07] Here, 1144 is the offset of the HTTP body. This would not work if there was chunked transfer encoding involved and probably in a few other cases, so a proper WARC reader would be needed then. The `head -c-4` removes the double CRLF at the end of the WARC block.
[19:08] *** opticnerv has joined #archiveteam-bs
[19:10] *** opticnerv has quit IRC (Leaving)
[19:15] prq : tradeoffs in other ways. tape drives selling point is low cost per additional GB
[19:17] yup. all my archival projects that I have in mind don't quite hit that price point where I can justify that margin (when including the cost of the drive itself)
[19:18] I love the idea of tape though
[19:19] yeah, some context, the institutions I know using tape, have tape robots
[19:20] bigger than my bedroom actually
[19:20] indeed.
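Annotation on the [19:06]-[19:07] walkthrough: the byte-range arithmetic done by hand there can be sketched as two small shell helpers. This is only an illustration, not part of the original conversation. The function names `cdx_range` and `cdx_file` are hypothetical, and the sketch assumes the 11-field CDX layout quoted in the log (field 9 = record length, field 10 = offset, field 11 = WARC path relative to https://archive.org/download/).

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the log).
# Assumption: CDX lines use the 11-field layout quoted at [19:06].

# Print "START-END", the inclusive byte range of the WARC record.
# The body runs in a subshell so that `set -f` (globbing off, needed because
# URLs may contain ? or *) does not leak into the calling shell.
cdx_range() (
    set -f
    set -- $1                              # split the CDX line on whitespace
    echo "${10}-$(( ${10} + $9 - 1 ))"     # offset .. offset + length - 1
)

# Print the 11th field: the WARC file path within the item.
cdx_file() (
    set -f
    set -- $1
    echo "${11}"
)

# The MP3 record line quoted in the log.
line='com,amazonaws,s3,picosong)/8dqk/ 20191006025228 http://picosong.s3.amazonaws.com/8dQk/ application/octet-stream 200 2T4RL35AJMP3CTBHFLKAS6EFDVOA6RPC - - 3672733 359120473 picosong.com_201910_part0/picosong-site-00052.warc.gz'

cdx_range "$line"   # prints 359120473-362793205
cdx_file "$line"    # prints picosong.com_201910_part0/picosong-site-00052.warc.gz

# Usage, mirroring the command at [19:07] (needs network access, so left commented out):
# curl -sL --range "$(cdx_range "$line")" "https://archive.org/download/$(cdx_file "$line")" >picosong-8dQk-warc
```

The end of the range is offset + length - 1 because HTTP byte ranges are inclusive, which is why the commands in the log add 3672732 and 2658 rather than the full lengths. As noted at [19:07], cutting the HTTP body out of the record with `tail`/`head` afterwards only works for simple responses; chunked transfer encoding would need a proper WARC reader.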
[19:26] *** Stilett0 has quit IRC (Ping timeout: 255 seconds)
[19:29] *** Stiletto has joined #archiveteam-bs
[19:43] *** asdf0101 has quit IRC (The Lounge - https://thelounge.chat)
[19:43] *** marked1 has quit IRC (Quit: The Lounge - https://thelounge.chat)
[19:47] *** asdf0101 has joined #archiveteam-bs
[19:50] *** marked1 has joined #archiveteam-bs
[19:50] *** asdf0101 has quit IRC (Read error: Operation timed out)
[19:51] *** asdf0101 has joined #archiveteam-bs
[20:01] *** icedice has joined #archiveteam-bs
[20:09] *** BlueMax has joined #archiveteam-bs
[20:48] *** Clefairy has quit IRC (Quit: ZNC: the superior metal to CBLT)
[20:55] *** Clefable has joined #archiveteam-bs
[21:31] *** ShellyRol has quit IRC (Read error: Connection reset by peer)
[21:32] *** ShellyRol has joined #archiveteam-bs
[21:54] *** Lancet has quit IRC (Ping timeout: 260 seconds)
[21:59] *** bsmith094 has quit IRC (Remote host closed the connection)
[22:03] *** bsmith093 has joined #archiveteam-bs
[22:11] *** bitbit has joined #archiveteam-bs
[22:11] *** bsmith093 has quit IRC (Leaving.)
[22:14] *** bsmith093 has joined #archiveteam-bs
[22:15] *** bsmith093 has quit IRC (Remote host closed the connection)
[22:17] *** bsmith093 has joined #archiveteam-bs
[22:19] SketchCow: you can upload my captures
[22:19] i'm not uploading right now
[22:30] *** icedice has quit IRC (Leaving)
[23:51] *** thuban1 has joined #archiveteam-bs
[23:53] *** thuban has quit IRC (Read error: Operation timed out)