[00:01] working on it now
[00:02] anyone wants to make a wiki page for it?
[00:09] JAA: shall we just get the review URLs as well :P
[00:09] the outlinks
[00:10] That can work, in a separate job yeah~
[00:10] Ran the 2 jobs with no-offsite-links because of a lot of friction potential
[00:11] will keep it in the same warrior project
[00:12] only getting the HTML, no page requisites
[00:13] Yeah, grabbing the reviews as well would be nice. Not sure if it should be done in a second phase though so we're sure we can grab the site itself in time before the shutdown.
[00:13] if we have a large amount of concurrent it should not be a problem
[00:13] I know it's only a few million URLs (including the image pages etc.), but still.
[00:13] in case of problem on the outlinks I'll just skip them
[00:13] problems*
[00:15] do we know of any games with a very large number of reviews?
[00:15] possible pagination
[00:16] actually yeah, let's do it in a seperate grab project, we can also get the news URLs then
[00:19] Gears of War has 108 reviews and no pagination: https://www.gamerankings.com/xbox360/928234-gears-of-war/articles.html
[00:20] But if we just follow any link of https://www.gamerankings.com/PLATFORM/ID-SLUG/ (after detection of PLATFORM and SLUG, for IDs in the item range), any such cases should be covered.
[00:21] yeah, but indeed let's do that later
[00:21] and also get all articles
[00:21] the news
[00:21] Why later for that part?
[00:21] Articles and news outlinks, sure.
[00:22] (And just to be sure it's not missed, the images are offsite.)
[00:23] cbistatic yeah
[00:23] cbsistatic
[00:23] getting those
[00:24] :-)
[00:30] tracker is online at gamerankings
[00:30] we can use some targets :P
[00:32] Kaz, kiska: ^
[00:34] alright ready I think
[00:35] it was an easy site
[00:36] will do 10 IDs/item
[00:36] *** Terbium has joined #archiveteam-bs
[00:38] *** superkuh has joined #archiveteam-bs
[00:44] JAA: using FOS now
[00:44] SketchCow: gamerankings is going to FOS
[00:45] it's on the tracker, probably need the new wget-lua
[00:45] I got a CONSSLERR with the old wget-lua versions
[00:49] tiny problem, working on fix
[00:49] tracker paused
[00:52] *** odemgi_ has joined #archiveteam-bs
[00:53] updated and items requeued
[00:54] *** Terbium has quit IRC (Quit: Page closed)
[00:59] *** odemgi has quit IRC (Read error: Operation timed out)
[01:00] updated again
[01:00] *** Terbium has joined #archiveteam-bs
[01:02] jodizzle: thanks for staying up to date :P
[01:03] Once again, the hero of archiveteam is revealed
[01:04] haha, :P
[01:04] well all should be running good now
[01:04] * arkiver is afk for some food
[01:09] made it the default project
[01:13] limiting at 2000 items/min
[01:13] should be done soon
[01:14] *** odemgi has joined #archiveteam-bs
[01:14] 2k per minute, wow. And Fusl isn't even on it.
[01:14] Lots of 0 MB items, is that legit?
[01:14] yeah
[01:15] only a few ranges actually have games
[01:15] Oh, yano is on it, nvm.
[01:15] Right.
[01:15] others just 302 to the main page
[01:15] lol
[01:15] hai
[01:15] :-)
[01:15] I'll switch default back to yahoo :P we need it more there
[01:16] yano: have fun
[01:16] :D
[01:16] (And don't tell Fusl. :-P)
[01:16] heh
[01:16] I only queued 0-999999
[01:16] didn't see any IDs above 999999
[01:17] Yeah, might be worth checking the files returned over 925k or so if there's anything that isn't tiny.
[01:17] will just queue 1000000-1499999 after this
[01:17] should be all 0 MB then
[01:17] Or that.
[01:18] to be save :P
[01:18] "Because we can."
[01:18] :P
[01:18] and website is fast sooo
[01:18] *** odemgi_ has quit IRC (Read error: Operation timed out)
[01:18] i set these up on my phone while i was waiting for my partner to come out of his house; i got a notification is got added to the tracker :D
[01:18] nice!
[01:19] * arkiver is really off for food now
[01:19] i almost got this down to a science :3
[01:21] s/is got added/it got added/
[01:24] i just need to script the git clone of the latest *-grab repo on the GitHub page :3
[01:28] *** odemgi_ has joined #archiveteam-bs
[01:36] *** odemgi has quit IRC (Read error: Operation timed out)
[01:38] *** d5f4a3622 has joined #archiveteam-bs
[01:41] *** odemgi has joined #archiveteam-bs
[01:44] *** atbk_ has joined #archiveteam-bs
[01:47] *** odemgi_ has quit IRC (Read error: Operation timed out)
[01:48] *** atbk has quit IRC (Ping timeout: 745 seconds)
[01:52] *** odemgi_ has joined #archiveteam-bs
[01:56] *** odemgi has quit IRC (Read error: Operation timed out)
[01:59] done
[02:01] arkiver: are you gonna queue the next batch for gameranking?
[02:04] *** odemgi has joined #archiveteam-bs
[02:08] *** odemgi_ has quit IRC (Read error: Operation timed out)
[02:14] *** d5f4a3622 has quit IRC (Read error: Connection reset by peer)
[02:25] *** d5f4a3622 has joined #archiveteam-bs
[02:26] *** odemgi_ has joined #archiveteam-bs
[02:29] i stopped all of my workers gracefully
[02:29] *** odemgi has quit IRC (Read error: Operation timed out)
[02:38] *** odemgi has joined #archiveteam-bs
[02:41] *** godane1 has quit IRC (Read error: Operation timed out)
[02:42] *** godane has joined #archiveteam-bs
[02:47] *** odemgi_ has quit IRC (Ping timeout: 610 seconds)
[02:49] *** odemgi has quit IRC (Read error: Operation timed out)
[02:55] *** HP_Archiv has joined #archiveteam-bs
[02:57] *** odemgi has joined #archiveteam-bs
[03:01] this is crazy https://www.zdnet.com/article/20-vps-providers-to-shut-down-on-monday-giving-customers-two-days-to-save-their-data/
[03:06] *** odemgi_ has joined #archiveteam-bs
[03:08] *** godane has quit IRC (Ping timeout: 745 seconds)
[03:09] *** odemgi has quit IRC (Ping timeout: 252 seconds)
[03:15] *** godane has joined #archiveteam-bs
[03:15] Hmm. We should probably grab the websites for those services
[03:20] anarcat, jodizzle - doing it right now~
[03:20] Baking up a list - better than having to toss it in one by one
[03:21] *** odemgi has joined #archiveteam-bs
[03:23] Thanks Ryz
[03:23] *** odemgi_ has quit IRC (Ping timeout: 252 seconds)
[03:30] If there's more like that anarcat and anyone else that read the article, pitch in more info~
[03:48] *** JackT has joined #archiveteam-bs
[03:56] *** Ravenloft has quit IRC (Read error: Connection reset by peer)
[04:22] *** odemgi_ has joined #archiveteam-bs
[04:24] *** odemgi has quit IRC (Ping timeout: 252 seconds)
[04:35] *** WooWoo289 has joined #archiveteam-bs
[04:55] *** coldon2dr has joined #archiveteam-bs
[04:58] *** ryangerma has joined #archiveteam-bs
[04:58] hi i have this IBM CD i found but it says US Government users Restrict Rights use duplication or disclosure by GSA ADP Schedule Contract with IBM corp since it wasnt from a government entity it doesnt apply right? Can I send it to you?
[05:00] *** Terbium has quit IRC (Read error: Operation timed out)
[05:00] *** qw3rty has joined #archiveteam-bs
[05:01] that means "there are extra rules that apply to you if you are the government" but unless you are government you can do more or less what you want
[05:05] *** coldon2dr has quit IRC (Leaving)
[05:07] *** qw3rty2 has quit IRC (Ping timeout: 745 seconds)
[05:07] *** ryangerma has quit IRC (Ping timeout: 260 seconds)
[05:17] *** Craigle has quit IRC ()
[05:17] *** WooWoo289 is now known as Craigle
[08:57] *** britm0b has joined #archiveteam-bs
[08:58] *** Atom-- has joined #archiveteam-bs
[08:59] *** yano_ has joined #archiveteam-bs
[08:59] *** dxrt- has joined #archiveteam-bs
[09:00] *** Hoolootwo has joined #archiveteam-bs
[09:00] *** foureyes_ has joined #archiveteam-bs
[09:02] *** odemgi has joined #archiveteam-bs
[09:02] *** is-_ has joined #archiveteam-bs
[09:02] *** Dj-Wawa_ has joined #archiveteam-bs
[09:02] *** Craigle has quit IRC (Quit: The Lounge - https://thelounge.chat)
[09:02] *** klg_ has joined #archiveteam-bs
[09:07] *** ats_ has joined #archiveteam-bs
[09:12] *** odemgi_ has quit IRC (se.hub irc.underworld.no)
[09:12] *** ats has quit IRC (se.hub irc.underworld.no)
[09:12] *** i0npulse has quit IRC (se.hub irc.underworld.no)
[09:12] *** purplebot has quit IRC (se.hub irc.underworld.no)
[09:12] *** kiska has quit IRC (se.hub irc.underworld.no)
[09:12] *** Flashfire has quit IRC (se.hub irc.underworld.no)
[09:12] *** Atom__ has quit IRC (se.hub irc.underworld.no)
[09:12] *** britmob_ has quit IRC (se.hub irc.underworld.no)
[09:12] *** britmob has quit IRC (se.hub irc.underworld.no)
[09:12] *** pew has quit IRC (se.hub irc.underworld.no)
[09:12] *** klg has quit IRC (se.hub irc.underworld.no)
[09:12] *** ranma has quit IRC (se.hub irc.underworld.no)
[09:12] *** OrIdow6 has quit IRC (se.hub irc.underworld.no)
[09:12] *** deevious has quit IRC (se.hub irc.underworld.no)
[09:12] *** Dj-Wawa has quit IRC (se.hub irc.underworld.no)
[09:12] *** wp494 has quit IRC (se.hub irc.underworld.no)
[09:12] *** lempamo has quit IRC (se.hub irc.underworld.no)
[09:12] *** VoynichCr has quit IRC (se.hub irc.underworld.no)
[09:12] *** dxrt has quit IRC (se.hub irc.underworld.no)
[09:12] *** yano has quit IRC (se.hub irc.underworld.no)
[09:12] *** coderobe has quit IRC (se.hub irc.underworld.no)
[09:12] *** foureyes has quit IRC (se.hub irc.underworld.no)
[09:12] *** is- has quit IRC (se.hub irc.underworld.no)
[09:12] *** Hooloovoo has quit IRC (se.hub irc.underworld.no)
[09:25] *** Medowar has joined #archiveteam-bs
[09:28] *** wp494 has joined #archiveteam-bs
[09:28] *** ranma has joined #archiveteam-bs
[09:28] *** britmob has joined #archiveteam-bs
[09:29] *** lempamo has joined #archiveteam-bs
[09:37] *** dxrt- is now known as dxrt
[09:37] *** dxrt has quit IRC (ZNC - http://znc.sourceforge.net)
[09:37] *** dxrt has joined #archiveteam-bs
[09:38] *** svchfoo3 sets mode: +o dxrt
[09:38] *** svchfoo1 sets mode: +o dxrt
[09:39] *** VoynichCr has joined #archiveteam-bs
[09:44] *** tech234a has quit IRC (Quit: Connection closed for inactivity)
[10:09] *** OrIdow6 has joined #archiveteam-bs
[10:09] *** pew has joined #archiveteam-bs
[10:11] *** i0npulse has joined #archiveteam-bs
[10:24] *** deevious has joined #archiveteam-bs
[11:14] from #warrior:
[11:14] The warrior wikipage doesn't point anyone towards the docker container. This is much easier for a ton of people than downloading the VM, these days
[11:14] if someone wants to have a go at fixing that, please do
[11:20] *** VerifiedJ has joined #archiveteam-bs
[11:27] It does
[11:29] you're right. Could do with being more prominent though
[11:51] *** kiska has joined #archiveteam-bs
[11:51] *** svchfoo1 sets mode: +o kiska
[11:51] *** svchfoo3 sets mode: +o kiska
[12:44] *** BlueMax has quit IRC (Quit: Leaving)
[12:59] *** foureyes_ is now known as foureyes
[13:11] *** indelible has joined #archiveteam-bs
[13:27] *** yano_ is now known as yano
[13:29] *** yano has quit IRC (WeeChat, The Better IRC Client, https://weechat.org/)
[13:32] *** yano has joined #archiveteam-bs
[13:32] *** i0npulse has quit IRC (Ping timeout: 248 seconds)
[13:32] *** i0npulse has joined #archiveteam-bs
[14:12] *** indelible has quit IRC (Remote host closed the connection)
[14:12] *** indelible has joined #archiveteam-bs
[14:39] We need to start plays.tv for a 1 day grab
[15:07] @Kaz I've updated the page, I agree as well
[15:08] Cheers
[15:09] at some point, I think it would be cool to have something like the alpine virtualbox image auto built for a rPi and require a 64GB SD Card and offer that up for super easy installs
[15:10] something to throw at the pile :)
[15:11] throw a X Server and Firefox ESR so you can manage it right from the Pi
[15:11] Ryz: thanks!
[15:23] I'm not a fan of having RPi's mixed in with servers
[15:26] for the standard warrior, most people are using Desktops running virtual box
[15:26] with a rPi4 its faster then some cheap VPSes
[15:29] *** MaximeleG has joined #archiveteam-bs
[15:32] *** tech234a has joined #archiveteam-bs
[15:33] *** SoraUta has joined #archiveteam-bs
[15:37] *** killsushi has joined #archiveteam-bs
[15:41] *** DigiDigi has quit IRC (Remote host closed the connection)
[16:27] *** killsushi has quit IRC (Quit: Leaving)
[16:29] Hold up, motherfuckers, I'm uploading 234gb of Professional Darts Competitions to IA
[16:29] I mean, who cares about Yahoo
[16:29] Daaaaaaarts
[16:38] *** coderobe has joined #archiveteam-bs
[16:44] *** phiresky9 has joined #archiveteam-bs
[16:46] *** phiresky9 is now known as phiresky
[16:50] *** DigiDigi has joined #archiveteam-bs
[17:02] arkiver: As usual, you set up gamerankings so it's thousands of tiny, tiny files, which will take a long time to process.
[17:03] I see it's at 7.2gb and already past 16,663 items in one object.
[17:03] Try and not do that again, next time
[17:03] *** X-Scale` has joined #archiveteam-bs
[17:10] *** X-Scale has quit IRC (Ping timeout: 610 seconds)
[17:10] *** X-Scale` is now known as X-Scale
[17:18] *** Craigle has joined #archiveteam-bs
[17:18] https://arstechnica.com/tech-policy/2019/12/verizon-reportedly-blocks-archivists-from-yahoo-groups-days-before-deletion/
[17:18] PurpleSym: purplebot's disappeared.
[17:18] that's wack, yo
[17:19] Traffic has increased to the wiki, I've done some light mitigations to ensure it survives
[17:21] Inb4 508 Resource Limit Is Reached
[17:22] *** SoraUta has quit IRC (Ping timeout: 610 seconds)
[17:22] I do have a special error page for that
[17:22] it links to the wayback version of the wiki (current latest version)
[17:23] https://archiveteam.org/508.html
[17:23] nice
[17:23] Ah right, cool!
[17:24] now, you can stil bury the webserver, but its super hard to cause the 508 to bury itself
[17:24] * Frogging gets the shovel and starts digging
[17:24] :p
[17:24] * JAA grabs the qwarc cannon.
[17:28] Real time usage: 0.114 seconds
[17:28] so 100ms to gen a page
[17:28] not too bad
[17:28] but we have like 10 Chinese bots scraping the site right now
[17:28] lol
[17:33] its interesting, I'm watching the access logs to yahoo_groups
[17:34] most are now coming off twitter and Mastodon
[17:34] *** purplebot has joined #archiveteam-bs
[17:34] and the sheer amount of Mastodon instances
[17:35] I wonder if that page is up to date sufficiently
[17:35] JAA: It’s back.
[18:15] *** Medowar has quit IRC (Quit: Connection closed for inactivity)
[18:50] *** systwi_ is now known as systwi
[19:38] *** jc86035 has joined #archiveteam-bs
[19:54] *** DogsRNice has joined #archiveteam-bs
[19:58] *** Jopik has joined #archiveteam-bs
[19:59] *** Jopik has quit IRC (Client Quit)
[19:59] *** Jopik has joined #archiveteam-bs
[20:27] *** MaximeleG has quit IRC (Quit: MaximeleG)
[20:48] *** tech234a has quit IRC (Quit: Connection closed for inactivity)
[20:50] *** tech234a has joined #archiveteam-bs
[20:52] *** HP_Archiv has quit IRC (Read error: Connection reset by peer)
[20:58] *** jamiew has joined #archiveteam-bs
[21:04] *** OrIdow6 has quit IRC (Ping timeout: 248 seconds)
[21:28] *** BlueMax has joined #archiveteam-bs
[21:46] *** bluefoo has quit IRC (Ping timeout: 255 seconds)
[21:55] *** X-Scale` has joined #archiveteam-bs
[21:59] *** X-Scale has quit IRC (Read error: Operation timed out)
[21:59] *** X-Scale` is now known as X-Scale
[22:02] *** HP_Archiv has joined #archiveteam-bs
[22:12] *** jamiew has quit IRC (zzz)
[22:18] *** bluefoo has joined #archiveteam-bs
[23:00] *** kode54 has quit IRC (Quit: The Lounge - https://thelounge.chat)
[23:27] *** britm0b has quit IRC (Read error: Connection reset by peer)
[23:27] *** britmob has quit IRC (Read error: Connection reset by peer)
[23:31] *** britmob has joined #archiveteam-bs
[23:31] *** OrIdow6 has joined #archiveteam-bs
[23:58] *** jc86035 has quit IRC (Quit: Connection closed for inactivity)