[00:05] *** bztoot has joined #archiveteam-bs
[00:05] *** t2t2 has quit IRC (Ping timeout: 633 seconds)
[00:07] *** BlueMax has joined #archiveteam-bs
[00:23] *** Soni has joined #archiveteam-bs
[00:23] *** t2t2 has joined #archiveteam-bs
[00:28] *** Stilett0 has quit IRC (Ping timeout: 252 seconds)
[00:28] *** bztoot has quit IRC (Ping timeout: 633 seconds)
[00:30] *** Stilett0 has joined #archiveteam-bs
[00:38] *** Stiletto has joined #archiveteam-bs
[00:38] *** t2t2 has quit IRC (Ping timeout: 260 seconds)
[00:38] *** t2t2 has joined #archiveteam-bs
[00:43] *** Stilett0 has quit IRC (Read error: Operation timed out)
[00:53] *** t2t2 has quit IRC (Read error: Connection reset by peer)
[00:56] *** t2t2 has joined #archiveteam-bs
[01:17] *** bztoot has joined #archiveteam-bs
[01:21] *** t2t2 has quit IRC (Ping timeout: 633 seconds)
[01:24] *** bztoot has quit IRC (Ping timeout: 260 seconds)
[01:28] *** t2t2 has joined #archiveteam-bs
[01:41] *** Stilett0 has joined #archiveteam-bs
[01:43] *** Stiletto has quit IRC (Read error: Operation timed out)
[02:18] *** Soni has quit IRC (Ping timeout: 264 seconds)
[02:33] *** Stilett0 has quit IRC (Ping timeout: 268 seconds)
[02:36] *** Stilett0 has joined #archiveteam-bs
[02:54] *** t2t2 has quit IRC (Read error: Connection reset by peer)
[02:55] *** t2t2 has joined #archiveteam-bs
[03:14] *** t2t2 has quit IRC (Ping timeout: 260 seconds)
[03:18] *** t2t2 has joined #archiveteam-bs
[03:33] *** Stiletto has joined #archiveteam-bs
[03:35] *** Stilett0 has quit IRC (Ping timeout: 268 seconds)
[03:38] *** Stilett0 has joined #archiveteam-bs
[03:38] *** Stiletto has quit IRC (Ping timeout: 261 seconds)
[03:41] *** t2t2 has quit IRC (Ping timeout: 259 seconds)
[03:44] *** t2t2 has joined #archiveteam-bs
[03:48] *** Stilett0 has quit IRC (Ping timeout: 360 seconds)
[03:50] *** Stilett0 has joined #archiveteam-bs
[03:53] *** archodg_ has joined #archiveteam-bs
[03:54] *** t2t2 has quit IRC (Ping timeout: 260 seconds)
[03:54] *** t2t2 has joined #archiveteam-bs
[03:55] *** archodg__ has quit IRC (Ping timeout: 252 seconds)
[03:55] *** odemg has quit IRC (Ping timeout: 260 seconds)
[04:07] *** Stilett0 has quit IRC (Read error: Operation timed out)
[04:08] *** odemg has joined #archiveteam-bs
[04:09] *** bztoot has joined #archiveteam-bs
[04:14] *** t2t2 has quit IRC (Ping timeout: 633 seconds)
[04:27] *** bztoot has quit IRC (Ping timeout: 260 seconds)
[04:27] *** t2t2 has joined #archiveteam-bs
[04:41] *** archodg__ has joined #archiveteam-bs
[04:42] *** archodg_ has quit IRC (Read error: Connection reset by peer)
[05:08] *** bztoot has joined #archiveteam-bs
[05:11] *** t2t2 has quit IRC (Ping timeout: 633 seconds)
[05:34] *** bztoot has quit IRC (Quit: bztoot)
[06:20] *** wp494 has quit IRC (Read error: Connection reset by peer)
[06:33] *** sam-p has joined #archiveteam-bs
[06:37] *** m007a83_ has quit IRC (Read error: Operation timed out)
[06:37] *** wp494 has joined #archiveteam-bs
[06:44] *** m007a83 has joined #archiveteam-bs
[07:02] *** Stilett0 has joined #archiveteam-bs
[08:19] *** ta9le has joined #archiveteam-bs
[10:04] *** icedice has joined #archiveteam-bs
[10:26] *** ta9le has quit IRC (Quit: Connection closed for inactivity)
[10:27] *** fredgido has quit IRC (Quit: Connection closed for inactivity)
[10:41] *** m007a83_ has joined #archiveteam-bs
[10:43] *** m007a83 has quit IRC (Read error: Operation timed out)
[11:02] *** Jusque has quit IRC (Ping timeout: 268 seconds)
[11:03] *** Jusque has joined #archiveteam-bs
[11:16] *** m007a83 has joined #archiveteam-bs
[11:19] *** m007a83_ has quit IRC (Read error: Operation timed out)
[11:23] *** Darkstar has quit IRC (Ping timeout: 260 seconds)
[11:47] *** Darkstar has joined #archiveteam-bs
[12:00] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[12:17] arkiver, are we done with google-newspapers?
[12:18] Not even close.
[12:18] (I think)
[12:19] Still 72k todo items on the tracker, and I think that's only the "A" papers.
[12:19] tracker isn't moving, what's going on?
[12:20] Has been paused for a while since we kept getting banned IIRC.
[12:20] Anyway, let's take this to #papersplease.
[13:06] *** bitBaron has joined #archiveteam-bs
[13:55] *** davidar has quit IRC (Quit: Connection closed for inactivity)
[14:03] *** Soni has joined #archiveteam-bs
[14:14] As usual, when I'm actually paying attention to the pipelines, I see stuff go by where I ask myself "why did we archive this"
[14:14] Like archive.scene.org. Why
[14:16] *** Odd0002 has quit IRC (Read error: Operation timed out)
[14:20] *** Odd0002 has joined #archiveteam-bs
[14:32] Moving SO MANY PODCASTS
[14:35] I'll definitely be interested why we downloaded 55gb of scene.archive.org and put it into archivebot
[14:42] Flashfire: ^ (I assume you mean archive.scene.org.)
[14:43] Yeah
[14:43] I'm about to shove whatever this newspapers collection was we're grabbing
[14:43] Google News Archive, in #papersplease?
[14:44] I assume? Maybe?
[14:44] root@teamarchive2:/2/ARCHIVETEAM/NEWSPAPERS/archive# ls
[14:44] 20180812054958 20180812055001 20180812055005 20180812055008 20180812055012
[14:44] 20180812054959 20180812055003 20180812055006 20180812055010 20180812055013
[14:44] np-p_8022000-8022999-20180718-205346.warc.gz
[14:46] It's going in https://archive.org/details/archiveteam_newspapers
[14:46] ok
[14:47] Shit's moving, buddy!
[14:47] Can I blast 4tb of material before they take the machine down?
[14:48] of course you can :)
[14:49] How big is FOS?
[14:50] CODEBLENDER.txtfiles.tar
[14:50] Where is that going, arkiver
[14:51] FOS is as big as a crime boss walking into a parlor and the piano stopping
[14:52] I see a year of archiveteam xml dumps
[14:54] 16gb. foosh oosh
[14:55] I don´t remember codeblender
[14:55] Pile of hiphop tapes on the way
[14:55] Someone wanted the textfiles
[14:55] don´t see it on the tracker either
[14:55] It's usually tou
[14:55] you
[14:55] what is the size?
[14:56] Not big, 45mb
[14:56] ok, got a link to download?
[14:57] Need a little time
[14:57] I'm moving very fast over here
[14:58] We backed up salon too, I see
[14:59] And obsoletemedia.org and romulation.net
[14:59] and we got purevolume and ytmnd still sitting on FOS waiting to be uploaded
[14:59] Well, look the fuck out, right after I get this hiphop sorted away, another window ill go to those
[15:00] https://archive.org/details/archiveteam_github
[15:00] wooh
[15:03] fos.textfiles.com/CODEBLENDER.txtfiles.tar
[15:03] 787gb of ytmnd
[15:14] got the codeblender file
[15:42] *** wp494 has quit IRC (Read error: Operation timed out)
[15:42] *** wp494 has joined #archiveteam-bs
[15:53] *** m007a83_ has joined #archiveteam-bs
[15:54] *** m007a83 has quit IRC (Ping timeout: 252 seconds)
[15:56] Purevolume joins the fun
[16:22] github, ytmnd and newspapers now uploading, that's 2.5tb of material
[16:24] So would IA mind storing potentially 0.5TB per day of atc feeds for preservation?
[16:29] Unlikely.
[16:29] That's a lot of data.
[16:29] Speaking of which... story-raw - JAA, is that yours?
[16:29] Who's got that project going? We should move on that. It's 648gb sitting here.
[16:30] SketchCow: Storify? Oh yeah, shit. Sorry, completely forgot about that. Yes, this is mine. Need to filter out the bogus responses etc.
[16:30] How do we want to do this
[16:30] I can probably give you a temp account
[16:30] Yeah, that was the idea I think.
[16:30] or I can run something? or something.
[16:31] I don't have a script yet or anything. Haven't done this before, so I'll figure it out as I go. I guess I'll filter out all response and request records that are going to be a problem in browsing the archives.
[16:35] Ok, then. We'll arrange a user account for you Wednesday (after the shutdown) that lets you do whatever you want with it
[16:36] Sounds great, thanks.
[16:36] Stay on me if I forget
[16:38] Yeah, I'll try to remember it this time.
[16:51] *** m007a83 has joined #archiveteam-bs
[16:52] *** m007a83_ has quit IRC (Ping timeout: 252 seconds)
[17:02] kiska: let´s take this to #radio-archive
[17:18] *** bitBaron has quit IRC (Quit: Bye!)
[17:37] are there any scripts or orchestrators that let me more easily manage multiple concurrent instances? right now i'm manually bash-for-looping a hundred screen'd python instances (with disable-web-server and binding to a different IP)
[17:41] *** m007a83_ has joined #archiveteam-bs
[17:44] *** m007a83 has quit IRC (Ping timeout: 252 seconds)
[17:53] catboy: You talking about the warrior?
[17:54] i.. think so. i'm running with the `screen su - [...] archiveteam` thing, not the VM
[18:00] *** betamax has quit IRC (Read error: Operation timed out)
[18:01] the warrior is the vm
[18:01] you're runing the scripts manually - I don't think there's much available in terms of orchestration, everyone has their own little scripts here and there
[18:02] i tend to just have a script that kicks off x amount of pipelines, and restarts them every (hour|day|whatever)
[18:14] works for me, i guess. i was hoping to be able to maybe start one of the warrior webservers or something where i can keep track of all the jobs in one interface
[18:14] but it only shows the two concurrent on that bound ip or something, unless i'm running it in some horribly wrong method
[18:17] *** fredgido has joined #archiveteam-bs
[18:34] *** archodg_ has joined #archiveteam-bs
[18:36] *** archodg__ has quit IRC (Ping timeout: 252 seconds)
[18:36] *** odemg has quit IRC (Ping timeout: 260 seconds)
[18:43] *** m007a83 has joined #archiveteam-bs
[18:44] *** m007a83_ has quit IRC (Ping timeout: 252 seconds)
[18:50] *** odemg has joined #archiveteam-bs
[19:03] catboy: I don't think it's possible to have one dashboard for multiple parallel instances. You can of course run only one pipeline instance with a higher concurrency though.
[19:05] can i bind each 'concurrency' to a different outgoing wget-lua IP?
[19:05] *** schbirid has quit IRC (Remote host closed the connection)
[19:06] catboy: I don't think that's supported, no.
[19:19] *** m007a83_ has joined #archiveteam-bs
[19:23] *** m007a83 has quit IRC (Read error: Operation timed out)
[19:27] *** bsmith093 has quit IRC (Remote host closed the connection)
[19:44] webinterface is CPU overhead, bin it off
[19:44] in theory you shouldn't need to watch progress unless there's a problem
[19:45] catboy: can't bind each concurrent to a different IP, but nothing stopping you from running the pipeline multiple times with different IPs (and a concurrency of 1)
[19:54] AccuWeather archive project name discussion
[19:55] accio-weather?
[19:55] there's gotta be a rain/storm pun here somewhere
[19:55] something with torrent?
[19:57] F5-torrent?
[19:59] where does the idea come from to do accuweather?
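Editor's sketch, not part of the log: the suggestion at 19:45 (run the pipeline once per outgoing IP, each with a concurrency of 1) could be scripted roughly as below. This is a minimal sketch under assumptions. The `run-pipeline pipeline.py ... NICK` invocation and the `--concurrent` flag are assumed from common seesaw usage and should be checked against the installed version; `BIND_ADDRESS` is a hypothetical environment variable that the project's `pipeline.py` would have to read itself; the IPs are documentation placeholders. The function only builds command strings and runs nothing.

```python
import shlex


def build_launch_commands(ips, nick, concurrency=1):
    """Return one detached-screen command string per outgoing IP.

    Assumptions (not from the log): the pipeline is started as
    `run-pipeline pipeline.py ... NICK`, and the outgoing address is
    passed through a hypothetical BIND_ADDRESS environment variable
    that pipeline.py would need to honour.
    """
    commands = []
    for index, ip in enumerate(ips):
        session = f"pipeline-{index}"
        # Command run inside the screen session; the web interface is
        # disabled, as mentioned at 17:37 and 19:44.
        inner = (
            f"BIND_ADDRESS={shlex.quote(ip)} "
            f"run-pipeline pipeline.py --disable-web-server "
            f"--concurrent {int(concurrency)} {shlex.quote(nick)}"
        )
        # `screen -dmS NAME CMD` starts CMD in a detached, named session.
        commands.append(f"screen -dmS {session} sh -c {shlex.quote(inner)}")
    return commands


if __name__ == "__main__":
    for command in build_launch_commands(["192.0.2.1", "192.0.2.2"], "catboy"):
        print(command)
```

Printing the commands instead of executing them keeps the sketch easy to review; a periodic restart like the one described at 18:02 would be a separate wrapper.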
[20:00] nvm :)
[20:02] JAA ^
[20:03] topics seem to be sequentially numbered
[20:04] users have nice numbers too
[20:04] could archive single post URLs too if there´s enough time
[20:22] *** nyaomi has quit IRC (Quit: meow)
[20:24] Looks like standard IP.Board (aka Invision Forums). We'll want to be a bit careful to keep session IDs out of the links in the archives so everything's nice and browsable.
[20:26] There are different ways to do that, e.g. a separate process that writes a cookie jar to be used by the actual archival (like we did for login on SPUF) or by requesting an extra page at the beginning of each job. Once the cookies are set, session IDs aren't inserted into links anymore.
[20:26] *** fredgido has quit IRC (Quit: Connection closed for inactivity)
[20:27] Requesting an extra page is probably the easiest. But both are fine.
[20:29] Regarding the name, maybe something related to storm drains (i.e. the site going down the drain)? I suck at puns though.
[20:30] latest digitize tapes uploaded: https://www.patreon.com/posts/digitize-tapes-20740076
[20:32] Looks like it's also possible for users to leave comments on other users' profile pages.
[20:33] And there are frames on the profile pages which are only loaded on a click (the tabs above "My content" on the right).
[20:38] This is the most official statement I was able to find regarding the shutdown: http://forums.accuweather.com/index.php?showtopic=33652&st=0&p=2331556&#entry2331556
[20:40] Do you want to write the scripts for this project? Or shall I work on them
[20:40] If we want to do a project
[20:40] Would be nice to archive URLs for individual posts too
[20:41] Agreed. Yeah, I'll write the scripts.
[20:43] Awesome
[20:44] thx kaz - yeah i am running just a hundred copies of the pipeline stuff right now. was hoping there was a better way but it'll work
[20:56] *** bsmith093 has joined #archiveteam-bs
[20:59] *** nyaomi has joined #archiveteam-bs
[21:00] *** bsmith093 has quit IRC (Client Quit)
[21:03] *** bsmith093 has joined #archiveteam-bs
[21:06] I just realised that the Quizlet project was never announced in the main channels. The channel for that is #quizletusin.
[21:06] (For those who are not aware: "Quizlet is a mobile and web-based study application that allows students to study information via learning tools and games." It's not at risk currently as far as we can tell, but it's still worth a grab.)
[21:07] i found out about AT because of that actually, some of my friends said they have started doing mass takedowns of entire chapters/books/publishers on request or something
[21:08] And set concurecy to 2
[21:08] concurrency
[21:14] catboy: Oh, interesting, haven't heard about that before.
[21:16] *** m007a83_ has quit IRC (Ping timeout: 252 seconds)
[21:43] *** Stilett0 has quit IRC (Read error: Operation timed out)
[22:01] *** lindalap has quit IRC (Quit: lindalap)
[22:07] JAA / arkiver: Did we decide a name/should we split the channel?
[22:07] tyzoid: No name yet, any suggestions?
[22:08] my vote is still #accio-weather
[22:08] #rainingdata
[22:09] #acquireweather
[22:09] Although I am a fan of the suggestion of tyzoid
[22:09] as well
[23:16] *** m007a83 has joined #archiveteam-bs
[23:32] *** tyzoid has quit IRC (Read error: Operation timed out)
[23:54] *** adinbied has quit IRC (Left Channel.)
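Editor's sketch, not part of the log: two small helpers relating to the forum discussion between 20:03 and 20:38. `topic_urls` enumerates topic pages using the `index.php?showtopic=N` pattern visible in the URL quoted at 20:38, following the remark that topics are sequentially numbered. `strip_session_id` is not the cookie-based approach described at 20:26; it is a separate after-the-fact check that removes a session parameter from a link. The parameter name `s` is an assumption about this IP.Board install, and the ID range in the usage line is purely illustrative.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Base URL taken from the link quoted in the log at 20:38.
BASE = "http://forums.accuweather.com/index.php"


def topic_urls(first, last):
    """List topic URLs for the inclusive ID range first..last."""
    return [f"{BASE}?showtopic={n}" for n in range(first, last + 1)]


def strip_session_id(url, param="s"):
    """Return url without the query parameter named `param`.

    Assumption: the board carries its session ID in a query parameter
    called "s"; pass a different name if that turns out to be wrong.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    for url in topic_urls(33652, 33654):  # illustrative range only
        print(url)
```

If the extra-page or cookie-jar approach works as described, `strip_session_id` should find nothing to remove, which makes it usable as a sanity check on discovered links.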
[23:55] *** ivan has quit IRC (Read error: Operation timed out)
[23:55] *** adinbied has joined #archiveteam-bs
[23:55] *** JAA has quit IRC (Read error: Operation timed out)
[23:55] *** ivan has joined #archiveteam-bs
[23:56] *** zyphlar has quit IRC (Read error: Operation timed out)
[23:56] *** jspiros has quit IRC (Read error: Operation timed out)
[23:56] *** wabu has quit IRC (Read error: Operation timed out)
[23:57] *** Petri152 has quit IRC (Read error: Operation timed out)
[23:57] *** Jusque has quit IRC (Read error: Operation timed out)
[23:58] *** Jusque has joined #archiveteam-bs