[00:02] *** icedice has quit IRC (Read error: Connection reset by peer)
[00:02] *** icedice has joined #archiveteam-bs
[02:23] Grab it all
[02:23] Yank it
[02:23] In case the contract dies
[02:24] As for the spam reviewer HA HA MOTHERFUCKERS
[02:24] I have a set of 5 scripts running like a sentinel. It is currently not possible to post any of those URLs in any form in any review in the Internet Archive.
[02:24] They don't last for more than 15 minutes at the most. I watched him do 10,000-11,000 spam reviews today, they're gone, gone gone. Also any going back anywhere in time.
[02:26] Like that shit is done, and that's a couple hours I'm never getting back.
[02:29] *** sec^nd has quit IRC (Read error: Connection reset by peer)
[02:31] *** wyatt8740 has quit IRC (Quit: Ceci n'est pas un IRC quit message.)
[02:33] *** wyatt8740 has joined #archiveteam-bs
[02:36] *** second has joined #archiveteam-bs
[03:31] *** odemgi_ has joined #archiveteam-bs
[03:34] *** odemg has quit IRC (Ping timeout: 265 seconds)
[03:37] *** Fusl6 has joined #archiveteam-bs
[03:37] *** odemgi has quit IRC (Read error: Operation timed out)
[03:40] I'll share my hours with you SketchCow
[03:41] *** Fusl5 has quit IRC (Read error: Operation timed out)
[03:46] *** odemg has joined #archiveteam-bs
[04:29] *** jspiros has quit IRC (Read error: Operation timed out)
[04:30] *** jspiros has joined #archiveteam-bs
[05:15] *** godane has joined #archiveteam-bs
[05:17] !ao https://www.yahoo.com/finance/news/40-acres-and-a-mule-reparations-in-2019-190018747.html
[05:17] wrong channel
[05:19] *** AnthonyI has quit IRC (Ping timeout: 264 seconds)
[05:24] *** PurpleSym has quit IRC (Read error: Operation timed out)
[05:24] *** PurpleSym has joined #archiveteam-bs
[05:25] *** purplebot has quit IRC (Read error: Operation timed out)
[05:26] *** Fusl has quit IRC (Read error: Operation timed out)
[05:26] *** robogoat_ has joined #archiveteam-bs
[05:26] *** Fusl has joined #archiveteam-bs
[05:26] *** nyany has quit IRC (Ping timeout: 506 seconds)
[05:26] *** Fusl_ sets mode: +o Fusl
[05:27] *** wp494 has quit IRC (Ping timeout: 506 seconds)
[05:27] *** svchfoo3 has quit IRC (Read error: Operation timed out)
[05:28] *** wp494 has joined #archiveteam-bs
[05:29] *** atomicthu has quit IRC (Ping timeout: 506 seconds)
[05:29] *** robogoat has quit IRC (Ping timeout: 506 seconds)
[05:29] *** mr_archiv has quit IRC (Ping timeout: 506 seconds)
[05:29] *** atomicthu has joined #archiveteam-bs
[05:31] *** mr_archiv has joined #archiveteam-bs
[05:32] *** icedice2 has joined #archiveteam-bs
[05:32] *** icedice2 has quit IRC (Connection closed)
[05:33] *** icedice2 has joined #archiveteam-bs
[05:37] *** icedice has quit IRC (Read error: Operation timed out)
[05:42] *** nyaomi has quit IRC (Quit: meow)
[06:10] *** nyaomi has joined #archiveteam-bs
[06:21] *** purplebot has joined #archiveteam-bs
[06:22] *** nyany has joined #archiveteam-bs
[06:23] *** svchfoo3 has joined #archiveteam-bs
[06:23] *** Fusl sets mode: +o svchfoo3
[06:23] *** svchfoo1 sets mode: +o svchfoo3
[06:32] *** icedice has joined #archiveteam-bs
[06:39] *** icedice2 has quit IRC (Read error: Operation timed out)
[07:03] *** stapler11 has quit IRC (Read error: Connection reset by peer)
[07:04] *** stapler11 has joined #archiveteam-bs
[07:16] *** Raccoon has joined #archiveteam-bs
[10:30] *** icedice has quit IRC (Read error: Connection reset by peer)
[10:30] *** icedice has joined #archiveteam-bs
[10:37] *** icedice has quit IRC (Read error: Connection reset by peer)
[10:37] *** icedice has joined #archiveteam-bs
[11:12] *** Raccoon has quit IRC (Ping timeout: 265 seconds)
[11:50] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[11:53] *** icedice has quit IRC (Quit: Leaving)
[13:09] *** killsushi has quit IRC (Quit: Leaving)
[13:10] *** schbirid has joined #archiveteam-bs
[13:16] *** BartoCH has quit IRC (Ping timeout: 615 seconds)
[13:47] *** katocala has quit IRC (Read error: Operation timed out)
[15:03] *** Raccoon has joined #archiveteam-bs
[15:12] i have https://github.com/Fusl/ateam-scripts now for anyone interested in knowing how i do my stuff
[15:28] JAA: https://github.com/Fusl/ateam-scripts/blob/master/df/nratv/files/script.sh
[15:30] Fusl_: Nice.
[15:30] Soon we might need a list of repositories containing useful scripts. lol
[15:30] :D
[15:30] Fusl_: I think wpull won't write a meta WARC with those args anyway, but why do you filter it out before the upload?
[15:31] its just a copy paste of the sonysketch one: https://github.com/Fusl/ateam-scripts/blob/master/df/sonysketch/files/script.sh
[15:32] grab-site definitely writes meta WARCs. So I guess those are lost?
[15:32] yeah i dont grab those meta warcs, the data warcs ended up in megawarcs
[15:32] :-(
[15:32] meta WARCs are important as well, they contain the retrieval log.
[15:32] are they any useful?
[15:33] ic
[15:33] but we dont want them in megawarcs, right
[15:33] right?
[15:34] Not sure. It might be best to have two megawarcs, one with the data and one with the meta WARCs. That way you can easily access the logs without downloading the entire thing.
[15:34] why even megawarc grab-site output?
[15:34] related sites?
[15:35] ivan_: sony sketch stuff
[15:35] ah
[15:35] roundabout 2.4tb of sony sketch images, each batch was 150 sketch ids and was assigned to one grab-site worker
[15:36] also, JAA, looks like grab-site is actually not writing the meta warcs for me anymore
[15:37] i just checked the running jobs on mips and none of them has any meta warcs
[15:37] Fusl_: The meta WARC is only written at the end.
[15:37] (And not when wpull crashes)
[15:37] oic
[15:38] welp
[15:46] *** BartoCH has joined #archiveteam-bs
[16:54] http://xor.meo.ws/e1db0857/1c6e/4b4d/800a/a71830b109b8.png not gonna cancel any of my ex42-nvme servers any time soon it seems :D
[17:09] they've been out for a few days at least, bad times
[17:11] Yeah, noticed that the other day as well.
[17:23] heh, I was wondering how much stock they had
[17:35] *** godane has quit IRC (Quit: Leaving.)
[17:36] *** Stilettoo is now known as Stiletto
[17:38] *** Kenshin has quit IRC (Quit: ZNC - http://znc.in)
[17:40] *** Kenshin has joined #archiveteam-bs
[17:40] *** Fusl sets mode: +o Kenshin
[17:41] so uh if anyone around wants to get rid of their ex42-nvme server, preferably in finland but germany is also fine, let me know and i'll add them to the ateam hoard http://xor.meo.ws/8000c941/de80/43e1/82e7/05587f15c62b.png :P
[17:44] *** Hani111 has joined #archiveteam-bs
[17:54] *** Hani has quit IRC (Ping timeout: 615 seconds)
[17:54] *** Hani111 is now known as Hani
[18:39] *** Atom-- has quit IRC (Ping timeout: 604 seconds)
[18:52] *** Raccoon has quit IRC (Read error: Connection reset by peer)
[19:02] JAA: fyi i'm doing a custom crawl of www.edis.at the old website pointing dns to the old ip address on mips. they recently got a new website design and lots of information was not ported over to it, especially support articles, etc.
[19:03] imma dump it directly into IA once done but dont think we want it in the WBM i guess?
[19:04] Fusl: Hmm, last time I had this issue it was not concerning to have it in the WBM because the domain was gone, but yeah, in this case, it might be better to not have it in there since it would collide with the actual current website.
[19:06] arkiver, SketchCow: Thoughts? ^
[19:06] Fusl: I assume the old site is not accessible directly with just the IP but requires the domain in the Host header?
[19:07] correct
[19:10] You could try asking them to point a subdomain to the old site, I'm guessing they probably wouldn't bother though.
[19:11] thats not easily doable
[19:11] IA would love the data
[19:11] the software running on that thing doesnt allow just "pointing a domain to it"
[19:11] But it might not be a good idea to put this in the Wayback Machine
[19:12] One more alternative: grab it under the IP and use --header 'Host: www.edis.at'. That should make wpull write the IP to the WARC headers, so the snapshots wouldn't collide with the current site.
[19:12] But I'm not sure how that behaves for offsite links.
[19:12] Or images etc.
[19:12] hmm interesting
[19:13] And also it would still be incorrect since you can't access the site under the URL in the WARC then...
[19:13] yeah that
[19:14] all the page resources wouldnt load correctly
[19:15] In theory, wpull should probably overwrite the Host header if a child URL is on a different host. In practice, it probably doesn't do that.
[19:16] In practice, it actually doesn't do that
[20:20] JAA Igloo: fyi, there's been a brief network outage on OSS host as the maximum-prefix limit on the routers were tripped due to me adding two more /24s for mips. everything is back now and the maximum-prefix limit on NFOrce router side has been raised by x3
[20:23] *** Dj-Wawa has quit IRC (Quit: Connection closed for inactivity)
[20:37] *** thepaul has joined #archiveteam-bs
[20:37] ivan_ JAA: it was knzk
[20:39] thepaul: URLs?
[20:41] knzk.me?
[20:43] that one'
[20:44] chromebot was used to archive one page on it so it's probably just that and whatever other scraps are in wayback
[20:45] is mastodon some sort of running joke I'm not in on? why does it exist and why is it so bad
[20:46] a playground for failing at federated social networking
[20:47] federation but with a large amount of centralization because who can operate servers anyway
[20:47] no data escrow/evacuation
[20:50] sounds like a right laugh, unless you're one of the suckers that suddenly has all their data dropped because someone forgot to pay a bill
[20:53] thepaul: you could try looking for your cached data in your browser profile or other people's mastodon instances
[21:12] *** icedice has joined #archiveteam-bs
[21:19] *** icedice has quit IRC (Read error: Connection reset by peer)
[21:19] *** icedice has joined #archiveteam-bs
[21:20] or uh ask this guy http://web.archive.org/web/20180307150224/https://knzk.me/@Knzk
[21:21] *** Stilettoo has joined #archiveteam-bs
[21:21] *** Stiletto has quit IRC (Read error: Operation timed out)
[21:52] ive yet to figure out whos killing the connection between me and archive.org
[21:52] i can get to the homepage but any attempt to view an archived page results in "web.archive.org unexpectedly closed the connection"
[21:55] *** icedice has quit IRC (Read error: Connection reset by peer)
[21:55] *** icedice has joined #archiveteam-bs
[22:08] *** Raccoon has joined #archiveteam-bs
[22:16] *** BlueMax has joined #archiveteam-bs
[22:26] connecting from where?
[22:35] The A-team hoard need better names besides 'archivebox-hel1'
[22:36] Like... Hannibal and Howling Mad and Faceman and Bad A Baracus
[22:36] sounds great until you forgot which is which
[22:36] fun names don't scale past.. 5 machine
[22:37] Do the Mormons number their kids? Do the Roman Catholics?
[22:37] roman numeral kids?
[22:39] yes, actually: https://www.bbc.co.uk/news/uk-politics-40506109
[22:39] 5 got normal names, 6th was 'Sixtus'
[22:41] heh. I pitty that foo
[22:59] *** killsushi has joined #archiveteam-bs
[23:45] *** kiskabak has quit IRC (Remote host closed the connection)
[23:45] *** kiskabak has joined #archiveteam-bs
[23:45] *** Fusl sets mode: +o kiskabak
[23:46] abstract I've got that issue before, try clearing your site cookies/cached data for *.archive.org
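Note on the 19:06-19:16 exchange: a minimal Python sketch of addressing the old server by IP while sending the site's domain in the Host header, plus the per-URL Host choice that the 19:15 message says a crawler should make in theory. The IP 192.0.2.10 is a documentation placeholder, and host_header_for and build_request are hypothetical helpers for illustration only, not wpull or grab-site code. The requests are built but never sent.

```python
from urllib.parse import urlsplit
from urllib.request import Request

OLD_SITE_IP = "192.0.2.10"    # placeholder (TEST-NET-1), not the real old IP
OLD_SITE_HOST = "www.edis.at"  # domain the old server expects in Host


def host_header_for(url: str) -> str:
    """Pick the Host header to send for `url` (hypothetical helper)."""
    host = urlsplit(url).hostname or ""
    # Only URLs addressed to the old server's IP get the overridden Host;
    # offsite links and resources keep their own hostname.
    return OLD_SITE_HOST if host == OLD_SITE_IP else host


def build_request(url: str) -> Request:
    """Build, but do not send, a request carrying the chosen Host header."""
    return Request(url, headers={"Host": host_header_for(url)})


page = build_request(f"http://{OLD_SITE_IP}/")
offsite = build_request("http://example.com/logo.png")
```

As the 19:13 and 19:14 messages point out, a capture made this way would still record URLs under the IP, so page resources would not load correctly on playback.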