[00:12] *** kode54 has quit IRC (Quit: ZNC 1.7.2 - https://znc.in)
[00:13] *** swebb has joined #archiveteam-bs
[00:20] *** kode54 has joined #archiveteam-bs
[00:28] can someone here change the default warrior project back to urlteam?
[00:30] Kaz, astrid, chfoo: ^ ? No idea who exactly has global tracker admin.
[00:32] done
[00:32] thanks
[00:52] *** BlueMax has joined #archiveteam-bs
[01:10] *** exoire has quit IRC (Remote host closed the connection)
[01:17] *** SimpBrain has quit IRC (Read error: Operation timed out)
[01:18] *** SimpBrain has joined #archiveteam-bs
[03:09] *** ReimuHaku has quit IRC (Ping timeout: 268 seconds)
[03:15] *** bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
[03:19] *** nerd1337 has joined #archiveteam-bs
[03:21] https://archive.org/details/FanficRepack_Redux i uploaded this, its an updated repack of fanfiction.net, how do i get the torrent to rebuild itself, there's one file per letter, and the torrent stops after O
[03:27] *** ReimuHaku has joined #archiveteam-bs
[03:40] *** Despatche has quit IRC (Read error: Connection reset by peer)
[03:40] *** Despatche has joined #archiveteam-bs
[03:43] *** Despatche has quit IRC (Remote host closed the connection)
[03:44] *** Despatche has joined #archiveteam-bs
[03:59] *** bitBaron has joined #archiveteam-bs
[04:07] *** odemgi_ has joined #archiveteam-bs
[04:09] *** odemgi has quit IRC (Ping timeout: 252 seconds)
[04:16] *** odemg has quit IRC (Ping timeout: 615 seconds)
[04:22] *** odemg has joined #archiveteam-bs
[04:30] *** kiska1 has quit IRC (Ping timeout (120 seconds))
[04:34] *** kiska1 has joined #archiveteam-bs
[04:48] *** qw3rty115 has joined #archiveteam-bs
[04:48] *** ndiddy has quit IRC ()
[04:53] *** qw3rty114 has quit IRC (Read error: Operation timed out)
[05:18] *** wacky has quit IRC (Remote host closed the connection)
[06:06] *** bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
[06:27] *** SimpBrain has quit IRC (Read error: Connection reset by peer)
[06:28] *** SimpBrain has joined #archiveteam-bs
[06:33] *** Atom-- has joined #archiveteam-bs
[06:35] Hi, I'm the webmaster for a small barbershop chorus that has recently disbanded, and our website will be taken down on March 18.
[06:36] I came across Archiveteam while looking for ways to publicly archive our web presence and you guys seem like the best way to get a good archive.
[06:36] Our website is a small Wordpress site, we also have some social media but that has no fixed deadline (except for g+) and can be dealt with after the main site.
[06:37] *** Atom has quit IRC (Read error: Operation timed out)
[06:40] SketchCow: we got 2 movies that are woc from 1986-11
[06:41] those are uploaded already
[06:41] *** SimpBrain has quit IRC (Remote host closed the connection)
[06:41] *** SimpBrain has joined #archiveteam-bs
[06:47] *** Despatche has quit IRC (Read error: Operation timed out)
[06:48] Exairnous: What's the URLs of your site and social media accounts?
[06:49] *** Despatche has joined #archiveteam-bs
[06:52] eientei95: http://ngharmony.ca, https://twitter.com/NGH_Singers, https://facebook.com/NGHSingers, http://plus.google.com/111467764711942357704, https://www.instagram.com/ngh_singers, https://youtube.com/channel/UC8VFq9R5XDOlh8KmV5G_CYA
[06:52] so, no trouble with wordpress sites?
[06:54] I've been lurking in #archivebot a bit, so I've heard some sites work better than others
[07:05] eientei95: is it true that it takes a week for archivebot to upload to the wayback machine?
[07:06] It takes a day to upload to IA and then yeah, probably a week to get processed and go into IA
[07:06] does size contribute?
[07:06] it's a pretty small site
[07:09] eientei95: ^^
[07:09] I'm not sure, someone like SketchCow would know more
[07:09] ok
[07:12] eientei95: I 've heard instagram is basically a no go for IA, but what about the other SM accounts?
[07:12] what are the chances they'll work?
[07:17] *** Atom-- has quit IRC (Ping timeout: 252 seconds)
[07:22] eientei95: for IA do I need to worry about these robots.txt rules?:
[07:22] User-agent: *
[07:22] Disallow: /wp-content/
[07:22] IA yes, archiveteam no
[07:22] Disallow: /wp-admin/
[07:22] Disallow: /wp-includes/
[07:22] *** Atom-- has joined #archiveteam-bs
[07:23] eientei95: so what's the chance of IA setting stuff to private if these rules remain until the site is shut down?
[07:24] Exairnous: Basically that just means that your standard "good" robot won't archive images from the site. We don't follow robots.txt's rules so you don't have to worry about that
[07:25] eientei95: I'm not worried about archiveteam :) I just don't want IA taking stuff down by mistake
[07:25] Don't worry about that, IA doesn't delete anything, just hides it
[07:26] eientei95: that kind of defeats the purpose of a public archive though
[07:26] Blame robots.txt
[07:26] eientei95: I guess I'll change it then
[07:27] Basically the only rule you need to remove is "Disallow: /wp-content/"
[07:27] ok
[07:31] eientei95: I also have Disallow: /members-only/ & Disallow: /wp/members-only/
[07:31] those pages don't matter and are empty now, but I don't want IA to do anything drastic
[07:32] They don't matter as long as they don't exist and/or are empty
[07:33] they exist, I just don't care if they're archived or not
[07:33] so long as IA doesn't make the whole site private because of the rules
[07:35] eientei95: ^^ should I remove them just to be sure?
[07:35] If you want
[07:35] The only ones that you probably don't want accessible or indexed are wp-admin and wp-includes
[07:35] and wp-admin and wp-includes won't harm anything?
[07:37] eientei95: ^^
[07:38] Yeah, they won't harm anything being in there
[07:38] eientei95: does IA take down just the page when it hits a robots.txt rule or the whole site?
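The robots.txt rules quoted at 07:22 (plus the members-only rules mentioned at 07:31) are matched path by path by any crawler that honours them. As a rough sketch of that per-path behaviour, Python's standard urllib.robotparser module can report which URLs such a crawler would skip. This models only a generic robots.txt-respecting crawler, not the Internet Archive's actual access policy, and the page and image URLs below are hypothetical examples on the site named in the log.

```python
from urllib.robotparser import RobotFileParser

# Rules quoted in the log at 07:22 and 07:31.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /members-only/
Disallow: /wp/members-only/
"""

# Hypothetical URLs on the site; only the domain comes from the log.
URLS = [
    "http://ngharmony.ca/",
    "http://ngharmony.ca/about/",
    "http://ngharmony.ca/wp-content/uploads/concert.jpg",
    "http://ngharmony.ca/wp-admin/",
]


def blocked(robots_txt, urls, agent="*"):
    """Return the URLs that a robots.txt-respecting crawler would skip."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


# With the original rules, the image under /wp-content/ is skipped.
print(blocked(ROBOTS_TXT, URLS))

# Dropping only the /wp-content/ rule (the 07:27 suggestion) unblocks the
# image, while /wp-admin/ stays excluded.
relaxed = ROBOTS_TXT.replace("Disallow: /wp-content/\n", "")
print(blocked(relaxed, URLS))
```

Each rule only affects URLs whose path starts with the disallowed prefix; everything else on the site stays fetchable.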
[07:38] If the file is affected by the robots.txt rule, then IA won't make it available
[07:41] eientei95: ok, then the members-only rules should be fine. The pages/links aren't visible unless you're logged in so they shouldn't be caught by archivebot anyway?
[07:41] Yeah, they shouldn't, unless archivebot has gained sentience
[07:42] and can guess usernames and passwords :)
[08:10] *** kbtoo__ has joined #archiveteam-bs
[08:11] arkiver: Is blox.pl still on your radar?
[08:14] *** kbtoo_ has quit IRC (Ping timeout: 255 seconds)
[08:25] hey dashcloud
[08:25] i got your tapes
[08:26] there being digitized using SVIDEO cable
[08:26] and a new vcr
[08:41] *** m007a83 has quit IRC (Read error: Connection reset by peer)
[08:43] *** m007a83 has joined #archiveteam-bs
[08:50] *** wp494 has quit IRC (Ping timeout: 268 seconds)
[08:50] *** wp494 has joined #archiveteam-bs
[08:50] *** kbtoo_ has joined #archiveteam-bs
[08:54] *** kbtoo__ has quit IRC (Ping timeout: 255 seconds)
[09:22] *** m007a83 has quit IRC (Ping timeout: 252 seconds)
[09:23] *** m007a83 has joined #archiveteam-bs
[09:39] Exairnous: I've taken care of the social media accounts. Instagram won't playback properly in the Wayback Machine, but the content will be preserved at least. The other sites work okay I believe.
[10:23] *** BlueMax has quit IRC (Quit: Leaving)
[11:06] *** SimpBrain has quit IRC (Read error: Operation timed out)
[11:07] *** chimyatta has joined #archiveteam-bs
[11:07] *** SimpBrain has joined #archiveteam-bs
[11:16] *** VADemon has joined #archiveteam-bs
[13:08] PurpleSym: yes
[13:08] but thanks for the reminder
[13:11] *** bitBaron has joined #archiveteam-bs
[13:27] *** bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
[13:32] *** bitBaron has joined #archiveteam-bs
[13:35] *** wp494 has quit IRC (Ping timeout: 1212 seconds)
[13:36] *** deevious has quit IRC (Ping timeout: 252 seconds)
[14:01] *** wp494 has joined #archiveteam-bs
[14:04] SketchCow, dude decided to stick it here instead: http://www.os2bbs.com/zippedfilecollection/
[14:05] *** Mateon1 has quit IRC (Ping timeout: 615 seconds)
[14:09] He says... >I may take the collection offline again in a few weeks.
[14:09] So I'm mirroring it to the-eye too
[14:10] *** Mateon1 has joined #archiveteam-bs
[14:16] *** deevious has joined #archiveteam-bs
[14:35] *** kiska1 has quit IRC (Ping timeout (120 seconds))
[14:42] *** kiska1 has joined #archiveteam-bs
[14:45] *** kiska1 has quit IRC (Remote host closed the connection)
[14:45] *** kiska1 has joined #archiveteam-bs
[14:57] *** arbin_ has quit IRC (Quit: .)
[14:58] *** arbin has joined #archiveteam-bs
[15:04] *** ciunwired has joined #archiveteam-bs
[15:04] *** Despatche has quit IRC (Read error: Operation timed out)
[15:05] *** Despatche has joined #archiveteam-bs
[15:22] *** ciunwired has quit IRC (Read error: Connection reset by peer)
[15:22] *** ciunwired has joined #archiveteam-bs
[15:23] *** bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
[15:23] *** ciunwired has quit IRC (Read error: Connection reset by peer)
[15:23] *** ciunwired has joined #archiveteam-bs
[15:26] *** ciunwired has quit IRC (Read error: Connection reset by peer)
[15:26] *** ciunwired has joined #archiveteam-bs
[15:26] *** VADemon has quit IRC (Read error: Operation timed out)
[15:34] *** ciunwired has quit IRC (Leaving)
[15:47] *** VADemon has joined #archiveteam-bs
[16:27] *** bitBaron has joined #archiveteam-bs
[16:34] *** Despatche has quit IRC (Read error: Connection reset by peer)
[16:35] *** Despatche has joined #archiveteam-bs
[16:36] *** Despatche has quit IRC (Remote host closed the connection)
[16:37] *** Despatche has joined #archiveteam-bs
[16:39] *** VerifiedJ has quit IRC (Ping timeout: 252 seconds)
[16:51] *** VerifiedJ has joined #archiveteam-bs
[16:51] *** SimpBrain has quit IRC (Remote host closed the connection)
[16:52] *** SimpBrain has joined #archiveteam-bs
[17:14] *** Despatche has quit IRC (Ping timeout: 252 seconds)
[17:15] *** Despatche has joined #archiveteam-bs
[17:18] *** schbirid has joined #archiveteam-bs
[17:28] someone should tell them to add export https://groups.io/static/transfer
[17:35] *** Despatche has quit IRC (Ping timeout: 252 seconds)
[17:37] *** Despatche has joined #archiveteam-bs
[17:48] *** wp494 has quit IRC (Ping timeout: 255 seconds)
[17:49] *** wp494 has joined #archiveteam-bs
[18:05] *** bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…)
[18:46] *** bitBaron has joined #archiveteam-bs
[18:50] *** fredgido_ has quit IRC (Ping timeout: 252 seconds)
[19:04] *** SimpBrain has quit IRC (Remote host closed the connection)
[19:04] *** SimpBrain has joined #archiveteam-bs
[19:10] odemgi_: He's letting me dupe it directly
[19:10] AND we had it apparently
[19:10] Although I'm now comparing
[19:11] *** Despatche has quit IRC (Ping timeout: 600 seconds)
[19:25] JAA: Thanks. I assume it'll take a week for the social media accounts to get into IA as well?
[19:25] anyone know of a public online archive that handles instagram properly?
[19:34] Exairnous: Usually just a day or two, but yeah, it can take up to a week. Not sure how large the uploading backlog is currently on the pipeline that ran these jobs.
[19:35] ok, good to know
[19:35] That machine is having some network issues, so there was a backlog of almost 500 GiB yesterday.
[19:36] SketchCow, hmm, reading his initial post I figured it would be unique, or at least partially so, let me know if it's a 100% dupe please
[19:37] JAA: did the youtube job from #youtubearchive finish or is it still going?
[19:49] Exairnous: Looks like it finished.
[19:49] :)
[19:49] Be aware that those videos don't get uploaded to IA automatically.
[19:49] They'll eventually end up there if/when they get deleted on YT.
[19:49] *** nataraj_ has joined #archiveteam-bs
[19:50] so, no way to test fully whether the archive worked until after the channel has been deleted?
[19:53] does anyone have any plans whatosever to fix the regression in the warrior code from may 2017 which prevents youtube-dl from working? I keep bringing it up and getting very loud silence
[19:53] JAA: ^^
[19:53] IIRC the cause of the bug is removal of the CONNECT function, which did indeed break object oriented programming rules, but is necessary for youtube-dl to work
[19:54] Exairnous: Here's what it grabbed: https://transfer.sh/G9npU/ts-ls-UC8VFq9R5XDOlh8KmV5G_CYA
[19:56] Lord_Nigh: Yes, I want to fix that once I get https://github.com/ArchiveTeam/wpull/pull/393 merged.
[19:56] I can't even find the commit which removed it now :(
[19:56] I remember it was removed by someone who ran a lot of warriors but isn't even in #archivebot anymore and may have gone AWOL?
[19:56] Commit 561380774baf5fd44990d16d64f545259c7385a1
[19:57] In wpull
[19:57] falconk
[19:57] Yep
[19:58] i honestly say we just revert that commit and fix stuff that aged badly and breaks because of the revert
[19:58] Well yeah, but we need to revert it only partially.
[19:59] *** m007a83 has quit IRC (Read error: Connection reset by peer)
[19:59] Anyway, until PR 393 is merged, it's unlikely that anything else will happen.
[19:59] *** m007a83 has joined #archiveteam-bs
[19:59] nothing's been merged since october, which doesn't bode well...
[20:00] Yeah
[20:00] time for a fork?
[20:00] lol
[20:00] It's our repo.
[20:00] time for a project administrator change? :P
[20:00] Heh
[20:00] There used to be tons of wpull forks, it was a mess.
[20:01] Yup, "wpull 2.0.3" was FalconK's fork for example. It was merged back into the official repo because it was used for so long on ArchiveBot that there are a lot of WARCs with that version number out there.
[20:01] ivan also has his own fork used in grab-site with version numbers 3.x.
[20:01] oh awesome, is wpull usable again then?
[20:01] only read the last two lines here :D
[20:02] not until 561380774baf5fd44990d16d64f545259c7385a1 gets (partially?) reverted
[20:02] Yeah, 2.0.3 is usable. You need to install from GitHub though as it's not on PyPI yet.
[20:02] *** Oddly has joined #archiveteam-bs
[20:02] ^^ That commit only breaks the youtube-dl integration. Otherwise, 2.0.3 is fine-ish.
[20:02] did anyone bother ps/top testing whether the frozendict change actually helps reduce load in that commit?
[20:02] There are annoying bugs in the network stack though.
[20:02] it seems sort of arbitrary
[20:03] Lord_Nigh: I believe FalconK and/or yipdw did performance testing at the time. At least I think I read something about that in the #archivebot logs. (I wasn't around yet when that happened.)
[20:03] there's no linked document explaining why frozendict hurts performance, just "it does"
[20:04] Or rather, they ran a profiler and found that the FrozenDict takes up a lot of CPU time.
[20:04] if that's true, then the removal makes sense
[20:05] wpull incurred something like a 5% performance hit by reallocating that metadata into a FrozenDict structure
[20:05] if frozendict was working around a poorly implemented python feature which was fixed in the last 5 years, then it makes even more sense
[20:05] ok, having never run wpull, is it python 2.x or 3.x code?
[20:05] (i'm assuming 2.x)
[20:06] python 3 according to github
[20:06] oh. that's good! less issues in the long run
[20:06] 2.x is dying an altogether too slow death
[20:07] wpull was always 3.x-only, fortunately.
[20:07] though as cuavas pointed out, 3.x has some issues 2.x didn't have, particularly with arrays of text
[20:07] where the assumption in 2.x that text arrays were ascii and one-byte-per-char made things easier, while 3.x's utf8 causes problems
[20:08] That's because in 3.x, strings are actually strings of characters.
[20:08] Rather than strings of bytes.
[20:08] yes
[20:08] I find the 3.x concept much, *much* better.
[20:08] I do too
[20:09] Do you have any example of such an issue?
[20:09] Most "3.x strings are so annoying" issues were simply due to people not using strings properly.
[20:09] not offhand, I'd have to dig in a few years of irc logs to find the specific issue
[20:09] Most such issues I've seen*
[20:10] for 99% of cases the python 3 way is better
[20:10] this specific case i think is in the 1%
[20:10] My only issue with the 3.x situation is that I always forget whether the method to go from str to bytes is .decode or .encode. :-P
[20:11] i'd assume its decode
[20:11] because a string is utf-8 encoded
[20:11] Nope, it's encode.
[20:11] great...
[20:11] Because you encode a string of characters in UTF-8.
[20:11] Or something like that.
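To pin down the 20:10 question in code: in Python 3, str.encode turns characters into bytes, and bytes.decode turns bytes back into characters. A minimal sketch; the sample text is arbitrary and chosen only because it contains two non-ASCII characters.

```python
text = "naïve café"           # a str: a sequence of 10 characters
data = text.encode("utf-8")   # str -> bytes
print(type(data).__name__, len(data))   # bytes 12 (ï and é take two bytes each)

back = data.decode("utf-8")   # bytes -> str
print(back == text)           # True

# In Python 3 each method exists in only one direction.
print(hasattr(str, "decode"), hasattr(bytes, "encode"))  # False False
```

A way to remember it: text is what you work with, and you encode it only when it has to leave the program as bytes (a file, a socket, a WARC record).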
[20:12] oh, so you're 'encoding' the intermediate representation of "a string" into a canonical byte format
[20:12] Yeah
[20:12] hmm does this mean that python can .encode into non-utf8 representations too? like shift-jis or CP437 or etc?
[20:12] Yes, I think so.
[20:12] that's handy!
[20:13] are those built-in? or only utf8?
[20:14] since being able to encode to utf16 (for legacy windows stuff), UCS-2 (apple stuff) etc is very useful
[20:14] Not sure since I never needed it.
[20:14] I'd expect it to be built-in though. Possibly in the "codecs" module.
[20:14] We're very far in -ot territory by now, so let's move this there.
[20:16] nah, i'm done, we can go back to -bs'ing
[20:16] *** glmd has joined #archiveteam-bs
[20:16] ok here's an on-topic thing: a while back someone linked me this: https://archive.org/stream/TNM_Music__Speech_Audio_synthesizer_-_Logistics__20170915_0481
[20:16] glmd: is all your stuff being uploaded to the IA?
[20:17] There's a project trying to archive Blogspot Google+ comments. We have less than a day to get them all.
[20:17] I've been trying to find more information about that device, digging in IA and in other places, and have come up pretty much blank
[20:17] We have a server for reporting technical issues here: https://discord.gg/dP4Pu6d
[20:17] If you'd like to help, clone this repo and follow the instructions in the README: https://github.com/afrmtbl/blogspot-comment-backup
[20:17] I found https://stacks.stanford.edu/file/jc317zm3296/jc317zm3296_31_0000.pdf and https://monoskop.org/images/8/80/Synapse_Vol_2_No_2.pdf but no concrete evidence that that device was ever actually sold
[20:18] @HCross: We plan on doing so.
[20:23] *** wacky has joined #archiveteam-bs
[20:25] it wasn't always 3.x https://github.com/ArchiveTeam/wpull/commit/45262a580f7654c2f27bb7ee68a24a7ae3a59d9a
[20:26] -only
[20:27] Huh, TIL.
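On the question raised between 20:12 and 20:14: Python's standard library does ship codecs for Shift JIS, CP437 and UTF-16 (among many others), and str.encode accepts those names directly, so nothing extra needs to be installed. A short sketch; the sample strings are arbitrary illustrations, not taken from any archive.

```python
jp = "日本語"
sjis = jp.encode("shift_jis")        # three kanji, two bytes each in Shift JIS
print(len(sjis), sjis.decode("shift_jis") == jp)   # 6 True

box = "╔═╗"                          # DOS box-drawing characters
dos = box.encode("cp437")            # one byte per character in CP437
print(dos)                           # b'\xc9\xcd\xbb'

# "utf-16" prepends a byte-order mark; "utf-16-le" and "utf-16-be" do not.
print(len("A".encode("utf-16")), "A".encode("utf-16-le"))   # 4 b'A\x00'

# Characters missing from the target code page raise UnicodeEncodeError
# unless an error handler such as "replace" is given.
try:
    jp.encode("cp437")
except UnicodeEncodeError:
    print("CP437 has no kanji")
print(jp.encode("cp437", errors="replace"))   # b'???'
```

The same names work in the other direction with bytes.decode, which is the usual route for reading legacy text files such as DOS-era NFOs.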
[20:39] odemgi_: I am fine with a dupe
[20:40] https://blog.stephenwolfram.com/2019/02/seeking-the-productive-life-some-details-of-my-personal-infrastructure/
[20:40] this is the greatest thing
[20:42] So
[20:42] https://blog.stephenwolfram.com/2019/02/seeking-the-productive-life-some-details-of-my-personal-infrastructure/
[20:42] I mean
[20:42] https://archive.org/details/scenenotices
[20:42] I just spent some time on this. Turns out we have a fuckton of NFOs on the archive.
[20:43] I've gone ahead and consolidated all the items into one collection.
[20:44] *** fredgido_ has joined #archiveteam-bs
[20:53] https://archive.org/details/FanficRepack_Redux hey SketchCow, how do i regenerate a torrent? this one doesn't have all the zip files in it.
[21:33] *** nataraj_ has quit IRC (Read error: Operation timed out)
[21:34] so
[21:35] that blogspot stuff is going away in <24 hours right?
[21:44] Yep.
[21:50] *** noirscape has quit IRC (Quit: ZNC 1.7.2 - https://znc.in)
[21:50] *** argus has quit IRC (Read error: Connection reset by peer)
[21:51] *** argus has joined #archiveteam-bs
[21:51] *** noirscape has joined #archiveteam-bs
[21:53] *** BlueMax has joined #archiveteam-bs
[21:54] *** schbirid has quit IRC (Remote host closed the connection)
[22:01] *** bitBaron has quit IRC (Read error: Connection reset by peer)
[22:02] *** bitBaron has joined #archiveteam-bs
[22:18] *** nerd1337 has quit IRC (Quit: http://www.mibbit.com ajax IRC Client)
[22:55] *** Despatche has joined #archiveteam-bs
[23:21] *** VADemon has quit IRC (Read error: Connection reset by peer)
[23:23] *** glmd has quit IRC (Ping timeout: 260 seconds)
[23:49] *** Oddly has quit IRC (Ping timeout: 255 seconds)