[00:05] Tigris is dead since a couple minutes.
[00:08] Nevermind, it's back. :-P
[00:13] Yoyo time!
[00:17] *** Lord_Nigh has quit IRC (Quit: ZNC - http://znc.in)
[00:19] *** Lord_Nigh has joined #archiveteam-bs
[00:38] *** Arcorann has joined #archiveteam-bs
[01:44] *** britmob has quit IRC (Read error: Connection reset by peer)
[01:47] *** britmob has joined #archiveteam-bs
[02:04] *** LowLevelM has joined #archiveteam-bs
[02:16] *** Lord_Nigh has quit IRC (Read error: Operation timed out)
[02:16] *** Lord_Nigh has joined #archiveteam-bs
[02:46] *** Datechnom has quit IRC (Read error: Connection reset by peer)
[03:09] Here's one thing about Tigris I haven't mentioned yet (I think): it has a file sharing thingy (intended for release binaries etc.), and it's horrible and broken. Try to navigate http://subversion.tigris.org/servlets/ProjectDocumentList for example. You can't get to the nested directories at all. This means that probably it won't be possible to reliably retrieve all downloads.
[03:13] There is a search function for documents, which might at first seem like a workaround for this but is also horrible and broken. For example, searching the subversion project for .tar.gz files http://subversion.tigris.org/servlets/Search?artifact=nidaba+document&query=.tar.gz&resultsPerPage=40&scope=project does not list the subversion-0.19.1.tar.gz file that appears when you search for that version
[03:13] number: http://subversion.tigris.org/servlets/Search?resultsPerPage=50&query=0.19.1&scope=project&artifact=nidaba+document
[03:13] And searching for 0.19.1.tar.gz produces no results at all.
[03:13] So yeah...
[03:19] Oh, actually, you can browse that document thing kind of, but only with cookies. Nevermind then.
[03:26] The faster grab of subversion's discussions should finish in 2-3 hours.
[03:26] The other one is still running but will probably just error out eventually when they pull the plug.
[03:26] *** qw3rty__ has joined #archiveteam-bs
[03:26] So in 2-3 hours, I will have everything I know of and intended to grab.
[03:32] *** LowLevelM has quit IRC (The Lounge - https://thelounge.chat)
[03:33] *** qw3rty_ has quit IRC (Read error: Operation timed out)
[03:53] *** godane has quit IRC (Ping timeout: 265 seconds)
[03:53] *** DogsRNice has quit IRC (Read error: Connection reset by peer)
[03:54] *** godane has joined #archiveteam-bs
[04:05] Komixxy is eating through the user profiles now and should be done in 23 hours or so.
[04:05] Turiver is at ~75 % done.
[04:11] my tiny VPS ran out of disk space, I should have made the script gzip the chat files as soon as they're written (80% savings)
[04:53] (#archiveteam) Winnipeg Free Press seems to use a lot of javascript. Comments in particular are apparently thanks to "OpenWeb": https://www.openweb.com/
[04:55] So, ArchiveBot is a no-go for this, even just to get articles.
[05:00] *** Datechnom has joined #archiveteam-bs
[06:30] *** nicolas17 has quit IRC (Quit: Konversation terminated!)
[06:46] *** HP_Archiv has joined #archiveteam-bs
[06:55] *** HP_Archiv has quit IRC (Quit: Leaving)
[07:27] *** HP_Archiv has joined #archiveteam-bs
[07:29] The comment provider's homepage does not bode well
[08:49] *** godane has quit IRC (Quit: Leaving.)
[11:00] *** godane has joined #archiveteam-bs
[11:09] *** BlueMax has quit IRC (Quit: Leaving)
[11:23] *** rziman has joined #archiveteam-bs
[11:24] *** schbirid has joined #archiveteam-bs
[12:03] "OpenWeb is a social engagement platform that builds online communities around digital content.[1] OpenWeb works with publishers to bring conversations back from social networks to publisher sites."
[12:04] (From Wikipedia)
[12:04] "It's another silo, but it's not Facebook, so it's *obviously* better!!1!"
[12:07] The comments are loaded with POST (because of course they are), so even if we manage to grab them, the WBM won't play them back.
[12:08] The post (= article) identifier is also passed through a header, so even if the WBM somehow managed to play back the POST requests, it would probably load random comments rather than the ones belonging to a particular article.
[12:42] *** dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
[12:42] *** rziman has quit IRC (Ping timeout: 252 seconds)
[12:43] *** dashcloud has joined #archiveteam-bs
[12:51] schbirid: The AB job crashed due to a big file. I'll rerun it later when the pipelines have drained a bit.
[12:59] JAA: huh, weird. anyways, the site seems half broken at the moment anyways
[13:00] As predicted, my Tigris subversion discussion grab finished at about 05:50. There were a handful of errors due to the short outage just after midnight, but otherwise, it looks good, apart from a few attachments that can't be downloaded "because malware".
[13:00] schbirid: Mhm, want to wait and hope it gets fixed?
[13:08] probably best
[13:09] Yeah, unless it suddenly disappears forever.
[13:10] Could run a quick job without offsite links and aggressive ignores to at least get a basic dump of what's accessible now, then a more thorough archive when/if it comes back properly.
[13:10] :D
[13:10] brb
[13:10] *** schbirid has quit IRC (Quit: Leaving)
[13:13] *** schbirid has joined #archiveteam-bs
[14:28] *** schbirid has quit IRC (Quit: Leaving)
[14:43] *** HP_Archiv has quit IRC (Quit: Leaving)
[14:54] *** HP_Archiv has joined #archiveteam-bs
[15:07] *** Arcorann has quit IRC (Read error: Connection reset by peer)
[15:12] *** qw3rty__ has quit IRC (Leaving)
[15:36] *** maxfan8_ has quit IRC (WeeChat 2.8)
[15:36] *** maxfan8 has joined #archiveteam-bs
[15:38] *** DigiDigi has quit IRC (Read error: Operation timed out)
[15:49] *** DigiDigi has joined #archiveteam-bs
[15:53] *** HP_Archiv has quit IRC (Quit: Leaving)
[17:14] Tigris can be considered done. My original crawl is still slowly going through the subversion discussions, and I can't really kill that cleanly, but I grabbed those separately anyway; it will eventually die when they take the site offline. The AB job for CVS repos is also still running, and I'll let it run to the bitter end because my crawl only focused on getting as much as possible of the actual
[17:15] repo data, not the various links that could be useful for users. There are a handful of projects which are almost or entirely impossible to grab because they're very slow or throw errors, so those are not covered at all, but I'm not sure anything can be done about that: argouml-groovy, fikafighters, lawngnome, odontosoft, realmforge, and roseevo.
[17:57] *** Nikchemny has joined #archiveteam-bs
[17:58] Hello, there is site goodgame.ru and some streams from this site: https://docs.google.com/spreadsheets/u/0/d/1sXWdJBU17YmLBNfYBPE6-ag1oqCTyCOhQnykr5JHOSw/htmlview#gid=1621977900
[17:58] !ig 9ipompjl9jo5lsvvanx7wbz1w ^https?://www\.afroamcivilwar\.org/events/eventsbyday/
[17:59] blah
[18:20] *** SynMonger has quit IRC (Quit: Wait, what?)
[18:23] *** SynMonger has joined #archiveteam-bs
[18:24] *** SynMonger has quit IRC (Client Quit)
[18:25] *** SynMonger has joined #archiveteam-bs
[18:37] *** DLoader_ has joined #archiveteam-bs
[18:47] *** DLoader has quit IRC (Ping timeout: 745 seconds)
[18:47] *** DLoader_ is now known as DLoader
[18:50] *** RichardG_ has joined #archiveteam-bs
[18:52] *** nicolas17 has joined #archiveteam-bs
[18:55] *** RichardG has quit IRC (Ping timeout: 496 seconds)
[19:05] *** fredgido has joined #archiveteam-bs
[19:22] *** Xibalba has quit IRC (Quit: ZNC - https://znc.in)
[19:30] *** Xibalba has joined #archiveteam-bs
[19:39] #archivebot
[19:59] Komixxy finished about an hour ago. Will check later if there's anything left to do there.
[20:03] So I'll put out a tentative "claim" on the Winnipeg Free Press comments (in light of the impediments to playback, may see about warc + text files of the extracted comments or something like that); hopefully it doesn't end up being too complicated
[20:04] Have fun with the access tokens, device UUIDs, and all that crap I've seen. :-|
[20:21] curiously, page moves in wiki must be approved by a moderator, but i am one myself
[20:22] weeeird
[20:28] You're an automoderated user, not a mod. But yeah, odd. jrwr?
[20:30] well, now i see page was moved correctly, did anybody approve the change? or it was auto?
[20:30] https://www.archiveteam.org/index.php?title=List_of_newspapers&action=history
[20:31] I approved it.
[20:51] VoynichCr: Will you grow "list of newspapers" page? Or will it only be for Wikidata lists?
[20:52] I mean can I add vedomosti.ru and kommersant.ru?
[20:52] sure, add them in a Russian == section ==
[20:53] Ok, I'll do it later. Btw, must the site literally have a real newspaper, or can it be just a news site?
[20:54] https://www.ng.ru/ and https://www.gazeta.ru/ , https://novayagazeta.ru/
[20:55] VoynichCr
[20:55] news site is fine
[20:55] ok
[20:56] Hmm, I have many examples of them.
[20:56] *** Nikchemny has quit IRC (Quit: Page closed)
[20:57] *** godane has quit IRC (Quit: Leaving.)
[21:10] *** Mayonaise has quit IRC (Read error: Operation timed out)
[21:57] *** Maylay_ has quit IRC (Read error: Operation timed out)
[22:01] *** Maylay has joined #archiveteam-bs
[22:47] https://nicolas17.s3.amazonaws.com/reckful-meta-with-chats.zip.torrent metadata, HLS playlists, and chat logs for every Twitch VOD of user Reckful
[23:47] *** ranma has joined #archiveteam-bs
[23:55] *** BlueMax has joined #archiveteam-bs