#archiveteam-bs 2020-07-04,Sat

Time Nickname Message
00:05 🔗 JAA Tigris has been dead for a couple of minutes.
00:08 🔗 JAA Nevermind, it's back. :-P
00:13 🔗 kiska Yoyo time!
00:17 🔗 Lord_Nigh has quit IRC (Quit: ZNC - http://znc.in)
00:19 🔗 Lord_Nigh has joined #archiveteam-bs
00:38 🔗 Arcorann has joined #archiveteam-bs
01:44 🔗 britmob has quit IRC (Read error: Connection reset by peer)
01:47 🔗 britmob has joined #archiveteam-bs
02:04 🔗 LowLevelM has joined #archiveteam-bs
02:16 🔗 Lord_Nigh has quit IRC (Read error: Operation timed out)
02:16 🔗 Lord_Nigh has joined #archiveteam-bs
02:46 🔗 Datechnom has quit IRC (Read error: Connection reset by peer)
03:09 🔗 JAA Here's one thing about Tigris I haven't mentioned yet (I think): it has a file sharing thingy (intended for release binaries etc.), and it's horrible and broken. Try to navigate http://subversion.tigris.org/servlets/ProjectDocumentList for example. You can't get to the nested directories at all. This means that probably it won't be possible to reliably retrieve all downloads.
03:13 🔗 JAA There is a search function for documents, which might at first seem like a workaround for this but is also horrible and broken. For example, searching the subversion project for .tar.gz files http://subversion.tigris.org/servlets/Search?artifact=nidaba+document&query=.tar.gz&resultsPerPage=40&scope=project does not list the subversion-0.19.1.tar.gz file that appears when you search for that version
03:13 🔗 JAA number: http://subversion.tigris.org/servlets/Search?resultsPerPage=50&query=0.19.1&scope=project&artifact=nidaba+document
03:13 🔗 JAA And searching for 0.19.1.tar.gz produces no results at all.
03:13 🔗 JAA So yeah...
03:19 🔗 JAA Oh, actually, you can browse that document thing kind of, but only with cookies. Nevermind then.
03:26 🔗 JAA The faster grab of subversion's discussions should finish in 2-3 hours.
03:26 🔗 JAA The other one is still running but will probably just error out eventually when they pull the plug.
03:26 🔗 qw3rty__ has joined #archiveteam-bs
03:26 🔗 JAA So in 2-3 hours, I will have everything I know of and intended to grab.
03:32 🔗 LowLevelM has quit IRC (The Lounge - https://thelounge.chat)
03:33 🔗 qw3rty_ has quit IRC (Read error: Operation timed out)
03:53 🔗 godane has quit IRC (Ping timeout: 265 seconds)
03:53 🔗 DogsRNice has quit IRC (Read error: Connection reset by peer)
03:54 🔗 godane has joined #archiveteam-bs
04:05 🔗 JAA Komixxy is eating through the user profiles now and should be done in 23 hours or so.
04:05 🔗 JAA Turiver is at ~75 % done.
04:11 🔗 nicolas17 my tiny VPS ran out of disk space, I should have made the script gzip the chat files as soon as they're written (80% savings)
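The "gzip as soon as they're written" idea can be sketched as follows. This is a minimal illustration, not nicolas17's actual script: the function names, the `.jsonl.gz` file layout, and the message structure are all assumptions made for the example.

```python
import gzip
import json
from pathlib import Path


def write_chat_gz(out_dir, vod_id, messages):
    """Write one chat log as gzip-compressed JSON lines and return its path.

    out_dir, vod_id and the one-JSON-object-per-line layout are hypothetical;
    the point is only that the file is compressed while it is being written,
    so an uncompressed copy never sits on disk.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{vod_id}.jsonl.gz"
    # "wt" opens the gzip stream in text mode, so str lines can be written.
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for msg in messages:
            fh.write(json.dumps(msg, ensure_ascii=False) + "\n")
    return path


def read_chat_gz(path):
    """Read a file produced by write_chat_gz back into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh]
```

The actual savings depend on the data; repetitive JSON chat logs tend to compress well, which is consistent with the 80% figure mentioned above, but the sketch makes no claim about a specific ratio.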
04:53 🔗 jodizzle (#archiveteam) Winnipeg Free Press seems to use a lot of javascript. Comments in particular are apparently thanks to "OpenWeb": https://www.openweb.com/
04:55 🔗 jodizzle So, ArchiveBot is a no-go for this, even just to get articles.
05:00 🔗 Datechnom has joined #archiveteam-bs
06:30 🔗 nicolas17 has quit IRC (Quit: Konversation terminated!)
06:46 🔗 HP_Archiv has joined #archiveteam-bs
06:55 🔗 HP_Archiv has quit IRC (Quit: Leaving)
07:27 🔗 HP_Archiv has joined #archiveteam-bs
07:29 🔗 OrIdow6 The comment provider's homepage does not bode well
08:49 🔗 godane has quit IRC (Quit: Leaving.)
11:00 🔗 godane has joined #archiveteam-bs
11:09 🔗 BlueMax has quit IRC (Quit: Leaving)
11:23 🔗 rziman has joined #archiveteam-bs
11:24 🔗 schbirid has joined #archiveteam-bs
12:03 🔗 JAA "OpenWeb is a social engagement platform that builds online communities around digital content.[1] OpenWeb works with publishers to bring conversations back from social networks to publisher sites."
12:04 🔗 JAA (From Wikipedia)
12:04 🔗 JAA "It's another silo, but it's not Facebook, so it's *obviously* better!!1!"
12:07 🔗 JAA The comments are loaded with POST (because of course they are), so even if we manage to grab them, the WBM won't play them back.
12:08 🔗 JAA The post (= article) identifier is also passed through a header, so even if the WBM somehow managed to play back the POST requests, it would probably load random comments rather than the ones belonging to a particular article.
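The playback problem described here can be illustrated with a toy model. This is not how the Wayback Machine is implemented; `ToyArchive`, `replay_key`, the example URL, and the `x-post-id` header name are all hypothetical. The only assumption carried over from the discussion is that playback looks records up by request method and URL, without considering request headers.

```python
def replay_key(method, url, headers):
    """Build the lookup key for a request.

    Toy assumption: the index only sees the method and the URL.
    The headers argument is accepted but deliberately ignored.
    """
    return (method.upper(), url)


class ToyArchive:
    """Stores captured responses and replays them by (method, URL)."""

    def __init__(self):
        self._records = {}

    def record(self, method, url, headers, response):
        key = replay_key(method, url, headers)
        self._records.setdefault(key, []).append(response)

    def replay(self, method, url, headers):
        matches = self._records.get(replay_key(method, url, headers), [])
        # With several captures under the same key, there is no way to tell
        # which one belongs to the requested article; return the first.
        return matches[0] if matches else None


archive = ToyArchive()
endpoint = "https://comments.example/api/conversation"  # hypothetical URL
archive.record("POST", endpoint, {"x-post-id": "article-A"}, "comments for A")
archive.record("POST", endpoint, {"x-post-id": "article-B"}, "comments for B")

# A request for article B is indistinguishable from one for article A.
result = archive.replay("POST", endpoint, {"x-post-id": "article-B"})
```

In this model, `result` holds the comments captured for article A even though article B was requested, which is the "random comments" outcome described above.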
12:42 🔗 dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
12:42 🔗 rziman has quit IRC (Ping timeout: 252 seconds)
12:43 🔗 dashcloud has joined #archiveteam-bs
12:51 🔗 JAA schbirid: The AB job crashed due to a big file. I'll rerun it later when the pipelines have drained a bit.
12:59 🔗 schbirid JAA: huh, weird. anyway, the site seems half broken at the moment
13:00 🔗 JAA As predicted, my Tigris subversion discussion grab finished at about 05:50. There were a handful of errors due to the short outage just after midnight, but otherwise, it looks good, apart from a few attachments that can't be downloaded "because malware".
13:00 🔗 JAA schbirid: Mhm, want to wait and hope it gets fixed?
13:08 🔗 schbirid probably best
13:09 🔗 JAA Yeah, unless it suddenly disappears forever.
13:10 🔗 JAA Could run a quick job with no offsite links and with aggressive ignores to at least get a basic dump of what's accessible now, then a more thorough archive when/if it comes back properly.
13:10 🔗 schbirid :D
13:10 🔗 schbirid brb
13:10 🔗 schbirid has quit IRC (Quit: Leaving)
13:13 🔗 schbirid has joined #archiveteam-bs
14:28 🔗 schbirid has quit IRC (Quit: Leaving)
14:43 🔗 HP_Archiv has quit IRC (Quit: Leaving)
14:54 🔗 HP_Archiv has joined #archiveteam-bs
15:07 🔗 Arcorann has quit IRC (Read error: Connection reset by peer)
15:12 🔗 qw3rty__ has quit IRC (Leaving)
15:36 🔗 maxfan8_ has quit IRC (WeeChat 2.8)
15:36 🔗 maxfan8 has joined #archiveteam-bs
15:38 🔗 DigiDigi has quit IRC (Read error: Operation timed out)
15:49 🔗 DigiDigi has joined #archiveteam-bs
15:53 🔗 HP_Archiv has quit IRC (Quit: Leaving)
17:14 🔗 JAA Tigris can be considered done. My original crawl is still slowly going through the subversion discussions, and I can't really kill that cleanly, but I grabbed those separately anyway; it will eventually die when they take the site offline. The AB job for CVS repos is also still running, and I'll let it run to the bitter end because my crawl only focused on getting as much as possible of the actual
17:15 🔗 JAA repo data, not the various links that could be useful for users. There are a handful of projects which are almost or entirely impossible to grab because they're very slow or throw errors, so those are not covered at all, but I'm not sure anything can be done about that: argouml-groovy, fikafighters, lawngnome, odontosoft, realmforge, and roseevo.
17:57 🔗 Nikchemny has joined #archiveteam-bs
17:58 🔗 Nikchemny Hello, there is a site, goodgame.ru, and some streams from this site: https://docs.google.com/spreadsheets/u/0/d/1sXWdJBU17YmLBNfYBPE6-ag1oqCTyCOhQnykr5JHOSw/htmlview#gid=1621977900
17:58 🔗 katocala !ig 9ipompjl9jo5lsvvanx7wbz1w ^https?://www\.afroamcivilwar\.org/events/eventsbyday/
17:59 🔗 katocala blah
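The ignore pattern in the `!ig` command above is a regular expression over URLs. A rough sketch of what it would and would not match, assuming the pattern is applied as an unanchored regex search against the full URL (how ArchiveBot actually evaluates ignores is not shown in this log); the test URLs are made up for illustration:

```python
import re

# Pattern copied verbatim from the !ig command above.
IGNORE = re.compile(r"^https?://www\.afroamcivilwar\.org/events/eventsbyday/")


def is_ignored(url):
    """Return True if the URL matches the ignore pattern."""
    return IGNORE.search(url) is not None


# Hypothetical URLs, not taken from the actual crawl.
examples = [
    "http://www.afroamcivilwar.org/events/eventsbyday/2020/07/04",
    "https://www.afroamcivilwar.org/events/eventsbyday/",
    "https://www.afroamcivilwar.org/events/",
    "https://afroamcivilwar.org/events/eventsbyday/",
]
ignored = [is_ignored(u) for u in examples]
```

The first two URLs match. The third does not because it lacks the `eventsbyday/` path, and the fourth does not because the pattern requires the `www.` host prefix.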
18:20 🔗 SynMonger has quit IRC (Quit: Wait, what?)
18:23 🔗 SynMonger has joined #archiveteam-bs
18:24 🔗 SynMonger has quit IRC (Client Quit)
18:25 🔗 SynMonger has joined #archiveteam-bs
18:37 🔗 DLoader_ has joined #archiveteam-bs
18:47 🔗 DLoader has quit IRC (Ping timeout: 745 seconds)
18:47 🔗 DLoader_ is now known as DLoader
18:50 🔗 RichardG_ has joined #archiveteam-bs
18:52 🔗 nicolas17 has joined #archiveteam-bs
18:55 🔗 RichardG has quit IRC (Ping timeout: 496 seconds)
19:05 🔗 fredgido has joined #archiveteam-bs
19:22 🔗 Xibalba has quit IRC (Quit: ZNC - https://znc.in)
19:30 🔗 Xibalba has joined #archiveteam-bs
19:39 🔗 Nikchemny #archivebot
19:59 🔗 JAA Komixxy finished about an hour ago. Will check later if there's anything left to do there.
20:03 🔗 OrIdow6 So I'll put out a tentative "claim" on the Winnipeg Free Press comments (in light of the impediments to playback, may see about warc + text files of the extracted comments or something like that); hopefully it doesn't end up being too complicated
20:04 🔗 JAA Have fun with the access tokens, device UUIDs, and all that crap I've seen. :-|
20:21 🔗 VoynichCr curiously, page moves in the wiki must be approved by a moderator, but i am one myself
20:22 🔗 VoynichCr weeeird
20:28 🔗 JAA You're an automoderated user, not a mod. But yeah, odd. jrwr?
20:30 🔗 VoynichCr well, now i see the page was moved correctly, did anybody approve the change? or was it automatic?
20:30 🔗 VoynichCr https://www.archiveteam.org/index.php?title=List_of_newspapers&action=history
20:31 🔗 JAA I approved it.
20:51 🔗 Nikchemny VoynichCr: Will you grow the "list of newspapers" page? Or will it only have Wikidata lists?
20:52 🔗 Nikchemny I mean can I add vedomosti.ru and kommersant.ru?
20:52 🔗 VoynichCr sure, add them in a Russian == section ==
20:53 🔗 Nikchemny Ok, I'll do it later. Btw, must the site literally have a real newspaper, or is just a news site enough?
20:54 🔗 Nikchemny https://www.ng.ru/ and https://www.gazeta.ru/ , https://novayagazeta.ru/
20:55 🔗 Nikchemny VoynichCr
20:55 🔗 VoynichCr news site is fine
20:55 🔗 Nikchemny ok
20:56 🔗 Nikchemny Hmm, I have many examples of them.
20:56 🔗 Nikchemny has quit IRC (Quit: Page closed)
20:57 🔗 godane has quit IRC (Quit: Leaving.)
21:10 🔗 Mayonaise has quit IRC (Read error: Operation timed out)
21:57 🔗 Maylay_ has quit IRC (Read error: Operation timed out)
22:01 🔗 Maylay has joined #archiveteam-bs
22:47 🔗 nicolas17 https://nicolas17.s3.amazonaws.com/reckful-meta-with-chats.zip.torrent metadata, HLS playlists, and chat logs for every Twitch VOD of user Reckful
23:47 🔗 ranma has joined #archiveteam-bs
23:55 🔗 BlueMax has joined #archiveteam-bs