[00:39] The InterfaceLIFT grab from two years ago is *finally* transferring over to my machine now. Will upload that to the Internet Archive shortly. Also, the site's back up, so I'll probably regrab it given how unstable it is.
[00:40] I'm really happy that it all worked out fine in the end. :-)
[01:01] *** killsushi has quit IRC (Quit: Leaving)
[01:02] *** killsushi has joined #archiveteam-bs
[01:14] *** ATrescue has quit IRC (Ping timeout: 260 seconds)
[01:14] Ivan do we want that channel on the wiki?
[01:19] Flashfire: what channel
[01:19] *** bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
[01:19] *** ATrescue has joined #archiveteam-bs
[01:20] Check out the second most recent purplebot
[01:20] notice
[01:20] https://mozilla.logbot.info/ has some 16 million events in logs spanning 14.5 years. That might be worth a grab once the IRC servers are shut down. (I don't expect those logs to disappear anytime soon. LogBot has been around for a while.)
[01:23] Flashfire: Do you mean Instagram on ArchiveTeam Wiki? (got disconnected, can't see)
[01:24] *** killsushi has quit IRC (Quit: Leaving)
[01:24] I meant the youtube one
[01:24] *** killsushi has joined #archiveteam-bs
[01:24] https://www.archiveteam.org/index.php?title=List_of_lost_online_videos
[01:25] *** killsushi has quit IRC (Client Quit)
[01:25] Yes I wasn’t sure if Ivan wanted the channel public
[01:27] Flashfire: It was private? Oh, sorry. Didn't know. If it is not appropriate, we can remove it. (stays in version history, but nearly nobody digs that deep)
[01:28] I don’t know is all so thought I’d ask
[01:28] It was mentioned a couple times in public channels before, so not exactly private.
[01:29] But probably best not to advertise it too much either.
[01:29] Should we get rid of it (and refer to #archiveteam instead?)
[01:39] JAA: Re mozilla.logbot.info some smart person grabbed it last year already https://archive.org/details/mozilla.logbot.info_201809
[01:43] :-)
[01:44] That was you, wasn't it?
[01:55] *** Terbium has quit IRC (https://quassel-irc.org - Chat comfortably. Anywhere.)
[01:58] *** Terbium has joined #archiveteam-bs
[02:05] *** icedice has joined #archiveteam-bs
[02:11] haha, you bet!
[02:16] *** Rome_Silv has quit IRC (Read error: Connection reset by peer)
[02:17] *** Rome_Silv has joined #archiveteam-bs
[03:19] *** odemgi_ has joined #archiveteam-bs
[03:22] *** odemgi has quit IRC (Ping timeout: 252 seconds)
[03:42] *** BlueMax has quit IRC (Quit: Leaving)
[04:40] *** icedice has quit IRC (Ping timeout: 252 seconds)
[05:07] *** enowaldo has joined #archiveteam-bs
[05:22] *** atbk has quit IRC (Quit: ZNC - https://znc.in)
[05:22] *** atbk has joined #archiveteam-bs
[05:29] *** enowaldo has quit IRC (Read error: Operation timed out)
[05:41] *** ndiddy has quit IRC (Ping timeout: 268 seconds)
[05:48] *** Rome_Silv has quit IRC (Read error: Connection reset by peer)
[05:48] *** Rome_Silv has joined #archiveteam-bs
[06:54] *** enowaldo has joined #archiveteam-bs
[07:05] *** Atom-- has joined #archiveteam-bs
[07:07] *** enowaldo has quit IRC (Read error: Operation timed out)
[07:11] *** Atom__ has quit IRC (Read error: Operation timed out)
[07:22] *** Mateon1 has quit IRC (Remote host closed the connection)
[07:22] *** Mateon1 has joined #archiveteam-bs
[07:52] *** BlueMax has joined #archiveteam-bs
[07:56] I don't know about you guys, but I think IMs fall squarely within Archive Team's sphere of interest
[07:58] Sanqui I may be able to dig up some more for you if you would like
[07:59] Flashfire: would be lovely
[07:59] I will DM you a few
[09:05] *** enowaldo has joined #archiveteam-bs
[09:10] *** enowaldo has quit IRC (Ping timeout: 265 seconds)
[10:27] *** BlueMax has quit IRC (Quit: Leaving)
[11:25] *** Zerote has quit IRC (Read error: Operation timed out)
[12:04] Sanqui: For sure. The only issue is that often there isn't much to archive after the fact. And of course much of it is private talk which we really shouldn't archive.
[12:05] JAA: I mainly want to provide resources to help people and communities download chat logs and media for personal archives
[12:05] Ah yeah
[12:06] i feel like community archival and chronicling is often overlooked
[12:06] Very true
[12:06] it's not public per se, but microcultures have value
[12:07] and i feel like what was on forums and was implicitly public is now on discord, which is implicitly closed, but maybe not intentionally
[12:08] i'm not quite arguing for dumping discord servers and publishing them on the internet, but i am arguing for people in communities to back up and hoard their data
[12:49] https://rarelust.com/ Godane I’m not sure if this is of any interest to you but it might be
[12:57] *** Zerote has joined #archiveteam-bs
[13:43] Soo, I'm trying to verify that some WARC files I got are intact using warcat, and I'm getting lots of errors like "Content block length changed from 4408 to 4399". These were produced by wpull, and it very much looks like they're perfectly alright (at least the one I looked into manually).
[13:44] The length change is interesting in that it matches the number of lines in the block (it's a warcinfo record). Smells a bit like CRLF vs LF.
[13:45] Frogging, you've had that issue before about two years ago. Did you ever figure out the reason? http://archive.fart.website/bin/irclogger_log/archiveteam-bs?date=2017-03-25,Sat&sel=48#l44
[13:46] chfoo: ^ Any ideas?
[13:46] There was thus https://github.com/ArchiveTeam/wpull/pull/360
[13:46] this*
[13:47] Yeah, I'm aware of that one, but the payload hash compares fine apparently.
[13:47] But that was a hash error, not content block length
[13:47] I only get the content length warning.
[13:48] Did these files go over ftp before getting to you?
[13:48] Yeah, I don't believe I figured that one out.
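The CRLF-vs-LF suspicion voiced at 13:44 rests on simple arithmetic: 4408 - 4399 = 9, and rewriting every CRLF line ending as LF removes exactly one byte per line. The sketch below only illustrates that arithmetic. The sample block and the helper name `crlf_shrinkage` are made up for the example, it does not call warcat or wpull, and it is not a diagnosis of the actual warning.

```python
def crlf_shrinkage(block: bytes) -> int:
    """Return how many bytes the block loses if every CRLF becomes LF."""
    normalised = block.replace(b"\r\n", b"\n")
    return len(block) - len(normalised)


# Hypothetical nine-line block with CRLF line endings (not real warcinfo data).
sample_block = b"".join(b"field-%d: value\r\n" % i for i in range(9))

# One byte is lost per CRLF, so the shrinkage equals the number of lines.
shrinkage = crlf_shrinkage(sample_block)
crlf_count = sample_block.count(b"\r\n")
print(shrinkage, crlf_count)

# The difference reported in the warning quoted in the log.
print(4408 - 4399)
```

If a nine-line block had its line endings rewritten, a length difference of nine bytes is what one would see, which is why the number looked suspicious.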
[13:49] marked: Not when I did it
[13:49] That reminds me I should add more to the FTP list soon
[13:50] This particular file was copied through five or six machines, which is in particular why I want to check the integrity. But I can reproduce it with a file that was never moved at all as well.
[13:50] *** enowaldo has joined #archiveteam-bs
[13:51] Doubt it's an integrity problem
[13:51] I've observed it with no copying
[13:56] Yeah, it certainly doesn't look like one.
[14:15] *** VerifiedJ has joined #archiveteam-bs
[14:17] Looks like newlines are handled correctly, but I have a feeling it has to do with the header field normalisation, somewhat similar to https://github.com/chfoo/warcat/issues/1
[14:19] In any case, it is a bug in warcat I think since "verify" should only validate that the file is a valid WARC file.
[14:20] *** enowaldo has quit IRC (Read error: Operation timed out)
[14:31] could you file the steps to reproduce somewhere in github?
[14:33] *** dxrt_ has quit IRC (Read error: Operation timed out)
[14:35] *** bitBaron has joined #archiveteam-bs
[14:35] *** dxrt_ has joined #archiveteam-bs
[14:35] *** dxrt sets mode: +o dxrt_
[14:51] *** enowaldo has joined #archiveteam-bs
[14:56] *** Rome_Silv has quit IRC (Read error: Operation timed out)
[15:02] *** enowaldo has quit IRC (Ping timeout: 265 seconds)
[15:08] Welp, now I'm running into a ValueError crash due to a malformed HTTP response. Meh...
[15:08] Yeah, of course I'll file an issue later.
[15:40] *** Zerote_ has joined #archiveteam-bs
[15:43] *** Zerote has quit IRC (Ping timeout: 600 seconds)
[15:47] *** bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
[15:50] *** Zerote_ has quit IRC (Ping timeout: 604 seconds)
[15:50] *** Zerote has joined #archiveteam-bs
[16:07] *** Zerote has quit IRC (Ping timeout: 604 seconds)
[16:11] *** Zerote has joined #archiveteam-bs
[16:12] *** dxrt_ has quit IRC (Read error: Operation timed out)
[16:23] *** dxrt_ has joined #archiveteam-bs
[16:23] *** dxrt sets mode: +o dxrt_
[16:24] *** enowaldo has joined #archiveteam-bs
[17:03] *** tomaspark has joined #archiveteam-bs
[17:04] *** enowaldo has quit IRC (Read error: Operation timed out)
[17:06] *** RomeSilva has joined #archiveteam-bs
[17:08] JAA arkiver: I discovered 23M sketches and 1.3M users from sketch.sonymobile.com: https://6xq.net/paste/sony-sketch_sketches.txt.lz https://6xq.net/paste/sony-sketch_artists.txt.lz
[17:09] Warrior project?
[17:18] *** killsushi has joined #archiveteam-bs
[17:19] *** Rome has joined #archiveteam-bs
[17:22] *** RomeSilva has quit IRC (Read error: Operation timed out)
[17:22] How many days do we have left?
[17:24] Until end of september.
[17:26] *** bitBaron has joined #archiveteam-bs
[17:30] *** Rome_Silv has joined #archiveteam-bs
[17:34] *** RomeSilva has joined #archiveteam-bs
[17:37] *** Rome has quit IRC (Read error: Operation timed out)
[17:37] *** bitBaron has quit IRC (Ping timeout: 615 seconds)
[17:37] *** Rome has joined #archiveteam-bs
[17:38] *** RomeSilva has quit IRC (Ping timeout: 252 seconds)
[17:39] *** Rome_Silv has quit IRC (Read error: Operation timed out)
[17:39] *** RomeSilva has joined #archiveteam-bs
[17:43] *** Rome has quit IRC (Read error: Operation timed out)
[17:47] *** Rome has joined #archiveteam-bs
[17:51] *** ndiddy has joined #archiveteam-bs
[17:52] *** RomeSilva has quit IRC (Read error: Operation timed out)
[17:58] *** killsushi has quit IRC (Quit: Leaving)
[17:58] *** killsushi has joined #archiveteam-bs
[17:58] *** RomeSilva has joined #archiveteam-bs
[18:00] *** Rome has quit IRC (Ping timeout: 246 seconds)
[18:02] *** Rome has joined #archiveteam-bs
[18:05] *** RomeSilva has quit IRC (Ping timeout: 252 seconds)
[18:11] *** ATrescue has quit IRC (Ping timeout: 260 seconds)
[18:11] *** bitBaron has joined #archiveteam-bs
[18:17] PurpleSym: Nice, thank you! I might try grabbing it with qwarc if only to get more experience with it and fix bugs. That doesn't mean it isn't worth investigating a warrior project as well though.
[18:18] *** ATrescue_ has joined #archiveteam-bs
[18:21] *** bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
[18:22] *** Rome has quit IRC (Read error: Connection reset by peer)
[18:23] *** Rome has joined #archiveteam-bs
[18:29] *** enowaldo has joined #archiveteam-bs
[18:43] Loss: https://www.gonintendo.com/stories/333803-nintendo-removes-wii-wiiware-ds-and-dsiware-games-from-their-w
[18:54] *** DogsRNice has joined #archiveteam-bs
[18:55] Sorry if this isn't a good question to ask but why is the archivebot irc log password protected?
[19:01] DogsRNice: You mean here: http://archive.fart.website/bin/irclogger_logs/ ?
[19:01] DogsRNice: Several of them are password-protected. I am not sure why.
[19:02] yeah
[19:09] the logic (at least to me) is most people expect irc not to be logged except for people's own logs
[19:11] DogsRNice: The logs are operated by chfoo. You can ask him.
[19:11] Kaz: https://www.archiveteam.org/index.php?title=Archiveteam:IRC#Why_does_IRC_need_chat_logs.3F
[19:11] nod
[19:12] DogsRNice: (He will now be notified anyway because I typed his name in the message.)
[19:28] DogsRNice: maybe so that one can crawl a site without appearing in the google results for "domain"
[19:37] Not disagreeing with the culture of IRC, but since we are creating archives, I feel an IRC log is additional documentation of how an archive was created
[19:44] disagree
[19:44] not to mention most of the project channels are not logged anyway
[19:44] at least, in a public way
[19:44] want me to publish all zE LOGS?
[19:45] nop
[19:45] aweww
[19:45] marked: Good point. Sometimes I found something on the Internet and I forgot how exactly I found it.
[19:48] if you want something logged, keep your own logs
[19:48] simple
[19:50] If the IRC logs are part of the documentation, then the documentation is wrong. The relevant stuff should be on the wiki page or the IA item page (depending on the circumstances). And of course there's things that need to be kept in a channel. #archivebot has many of those, and that's why the logs are password-protected I believe.
[19:51] By "need to be kept in a channel" I mean "shouldn't be public".
[19:53] *** Anthony1 has joined #archiveteam-bs
[19:54] WikiHow
[19:57] *** bitBaron has joined #archiveteam-bs
[20:04] *** DogsRNice has left
[20:07] Guys
[20:07] *** Despatche has quit IRC (Quit: Read error: Connection reset by deer)
[20:08] How do I go about making a WikiHow project?
[20:13] *** odemg has joined #archiveteam-bs
[20:18] Anthony1: Register on WikiHow.
[20:22] The thing that got me into this was Google+
[20:28] We had an ArchiveBot job for WikiHow about 1.5 years ago. Have you checked how good the coverage on the WBM is?
[20:29] I am new to this and probably not
[20:30] WBM = Wayback Machine
[20:30] The ArchiveBot job was in October 2017.
[20:32] I know the Wayback machine
[20:34] Just didn't know the abbreviation for that
[20:35] Here's our crawl, try clicking around a bit: https://web.archive.org/web/20171010022335/https://www.wikihow.com/Main-Page
[20:37] *** bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
[20:37] *** bitBaron has joined #archiveteam-bs
[20:39] @JAA Aside from WikiHow what is going to be the next Warrior project?
[20:40] Reddit
[20:42] Posts? Subreddits? Images?
[20:42] *** enowaldo has quit IRC (Ping timeout: 252 seconds)
[20:43] User profiles?
[20:51] *** Despatche has joined #archiveteam-bs
[20:51] Anthony1: My best guess: Mainly text content and metadata first. Multimedia can be captured later, because it is space consuming. I often see missing reddit users and posts.
[20:53] *** icedice has joined #archiveteam-bs
[20:53] Posts. Subreddits are useless because they only list the newest 1k submissions. Same for user profiles.
[20:53] But for anything regarding Reddit: #shreddit
[20:53] *** bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
[21:01] *** VerifiedJ has quit IRC (Quit: Leaving)
[21:21] *** ndiddy has quit IRC ()
[21:21] *** ndiddy has joined #archiveteam-bs
[21:22] *** Anthony1 has quit IRC (Ping timeout: 260 seconds)
[21:30] *** ndiddy has quit IRC (Quit: WeeChat 1.4)
[21:30] *** ndiddy has joined #archiveteam-bs
[21:52] *** enowaldo has joined #archiveteam-bs
[22:02] *** enowaldo has quit IRC (Ping timeout: 268 seconds)
[22:37] *** bitBaron has joined #archiveteam-bs
[22:37] *** GuysFree has joined #archiveteam-bs
[22:47] *** bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
[23:17] *** enowaldo has joined #archiveteam-bs
[23:27] *** enowaldo has quit IRC (Ping timeout: 252 seconds)
[23:48] hook54321: My git repo bundles for octo.sh are now uploading to https://archive.org/details/octo.sh_git_repo_bundles_201904
[23:49] *** BlueMax has joined #archiveteam-bs