#archiveteam-bs 2018-08-13,Mon

Time Nickname Message
00:05 bztoot has joined #archiveteam-bs
00:05 t2t2 has quit IRC (Ping timeout: 633 seconds)
00:07 BlueMax has joined #archiveteam-bs
00:23 Soni has joined #archiveteam-bs
00:23 t2t2 has joined #archiveteam-bs
00:28 Stilett0 has quit IRC (Ping timeout: 252 seconds)
00:28 bztoot has quit IRC (Ping timeout: 633 seconds)
00:30 Stilett0 has joined #archiveteam-bs
00:38 Stiletto has joined #archiveteam-bs
00:38 t2t2 has quit IRC (Ping timeout: 260 seconds)
00:38 t2t2 has joined #archiveteam-bs
00:43 Stilett0 has quit IRC (Read error: Operation timed out)
00:53 t2t2 has quit IRC (Read error: Connection reset by peer)
00:56 t2t2 has joined #archiveteam-bs
01:17 bztoot has joined #archiveteam-bs
01:21 t2t2 has quit IRC (Ping timeout: 633 seconds)
01:24 bztoot has quit IRC (Ping timeout: 260 seconds)
01:28 t2t2 has joined #archiveteam-bs
01:41 Stilett0 has joined #archiveteam-bs
01:43 Stiletto has quit IRC (Read error: Operation timed out)
02:18 Soni has quit IRC (Ping timeout: 264 seconds)
02:33 Stilett0 has quit IRC (Ping timeout: 268 seconds)
02:36 Stilett0 has joined #archiveteam-bs
02:54 t2t2 has quit IRC (Read error: Connection reset by peer)
02:55 t2t2 has joined #archiveteam-bs
03:14 t2t2 has quit IRC (Ping timeout: 260 seconds)
03:18 t2t2 has joined #archiveteam-bs
03:33 Stiletto has joined #archiveteam-bs
03:35 Stilett0 has quit IRC (Ping timeout: 268 seconds)
03:38 Stilett0 has joined #archiveteam-bs
03:38 Stiletto has quit IRC (Ping timeout: 261 seconds)
03:41 t2t2 has quit IRC (Ping timeout: 259 seconds)
03:44 t2t2 has joined #archiveteam-bs
03:48 Stilett0 has quit IRC (Ping timeout: 360 seconds)
03:50 Stilett0 has joined #archiveteam-bs
03:53 archodg_ has joined #archiveteam-bs
03:54 t2t2 has quit IRC (Ping timeout: 260 seconds)
03:54 t2t2 has joined #archiveteam-bs
03:55 archodg__ has quit IRC (Ping timeout: 252 seconds)
03:55 odemg has quit IRC (Ping timeout: 260 seconds)
04:07 Stilett0 has quit IRC (Read error: Operation timed out)
04:08 odemg has joined #archiveteam-bs
04:09 bztoot has joined #archiveteam-bs
04:14 t2t2 has quit IRC (Ping timeout: 633 seconds)
04:27 bztoot has quit IRC (Ping timeout: 260 seconds)
04:27 t2t2 has joined #archiveteam-bs
04:41 archodg__ has joined #archiveteam-bs
04:42 archodg_ has quit IRC (Read error: Connection reset by peer)
05:08 bztoot has joined #archiveteam-bs
05:11 t2t2 has quit IRC (Ping timeout: 633 seconds)
05:34 bztoot has quit IRC (Quit: bztoot)
06:20 wp494 has quit IRC (Read error: Connection reset by peer)
06:33 sam-p has joined #archiveteam-bs
06:37 m007a83_ has quit IRC (Read error: Operation timed out)
06:37 wp494 has joined #archiveteam-bs
06:44 m007a83 has joined #archiveteam-bs
07:02 Stilett0 has joined #archiveteam-bs
08:19 ta9le has joined #archiveteam-bs
10:04 icedice has joined #archiveteam-bs
10:26 ta9le has quit IRC (Quit: Connection closed for inactivity)
10:27 fredgido has quit IRC (Quit: Connection closed for inactivity)
10:41 m007a83_ has joined #archiveteam-bs
10:43 m007a83 has quit IRC (Read error: Operation timed out)
11:02 Jusque has quit IRC (Ping timeout: 268 seconds)
11:03 Jusque has joined #archiveteam-bs
11:16 m007a83 has joined #archiveteam-bs
11:19 m007a83_ has quit IRC (Read error: Operation timed out)
11:23 Darkstar has quit IRC (Ping timeout: 260 seconds)
11:47 Darkstar has joined #archiveteam-bs
12:00 BlueMax has quit IRC (Read error: Connection reset by peer)
12:17 archodg__ arkiver, are we done with google-newspapers?
12:18 JAA Not even close.
12:18 JAA (I think)
12:19 JAA Still 72k todo items on the tracker, and I think that's only the "A" papers.
12:19 archodg__ tracker isn't moving, what's going on?
12:20 JAA Has been paused for a while since we kept getting banned IIRC.
12:20 JAA Anyway, let's take this to #papersplease.
13:06 bitBaron has joined #archiveteam-bs
13:55 davidar has quit IRC (Quit: Connection closed for inactivity)
14:03 Soni has joined #archiveteam-bs
14:14 SketchCow As usual, when I'm actually paying attention to the pipelines, I see stuff go by where I ask myself "why did we archive this"
14:14 SketchCow Like archive.scene.org. Why
14:16 Odd0002 has quit IRC (Read error: Operation timed out)
14:20 Odd0002 has joined #archiveteam-bs
14:32 SketchCow Moving SO MANY PODCASTS
14:35 SketchCow I'll definitely be interested why we downloaded 55gb of scene.archive.org and put it into archivebot
14:42 JAA Flashfire: ^ (I assume you mean archive.scene.org.)
14:43 SketchCow Yeah
14:43 SketchCow I'm about to shove whatever this newspapers collection was we're grabbing
14:43 JAA Google News Archive, in #papersplease?
14:44 SketchCow I assume? Maybe?
14:44 SketchCow root@teamarchive2:/2/ARCHIVETEAM/NEWSPAPERS/archive# ls
14:44 SketchCow 20180812054958 20180812055001 20180812055005 20180812055008 20180812055012
14:44 SketchCow 20180812054959 20180812055003 20180812055006 20180812055010 20180812055013
14:44 SketchCow np-p_8022000-8022999-20180718-205346.warc.gz
14:46 SketchCow It's going in https://archive.org/details/archiveteam_newspapers
14:46 arkiver ok
14:47 SketchCow Shit's moving, buddy!
14:47 SketchCow Can I blast 4tb of material before they take the machine down?
14:48 arkiver of course you can :)
14:49 jut How big is FOS?
14:50 SketchCow CODEBLENDER.txtfiles.tar
14:50 SketchCow Where is that going, arkiver
14:51 SketchCow FOS is as big as a crime boss walking into a parlor and the piano stopping
14:52 SketchCow I see a year of archiveteam xml dumps
14:54 SketchCow 16gb. foosh oosh
14:55 arkiver I don't remember codeblender
14:55 SketchCow Pile of hiphop tapes on the way
14:55 SketchCow Someone wanted the textfiles
14:55 arkiver don't see it on the tracker either
14:55 SketchCow It's usually tou
14:55 SketchCow you
14:55 arkiver what is the size?
14:56 SketchCow Not big, 45mb
14:56 arkiver ok, got a link to download?
14:57 SketchCow Need a little time
14:57 SketchCow I'm moving very fast over here
14:58 SketchCow We backed up salon too, I see
14:59 SketchCow And obsoletemedia.org and romulation.net
14:59 arkiver and we got purevolume and ytmnd still sitting on FOS waiting to be uploaded
14:59 SketchCow Well, look the fuck out, right after I get this hiphop sorted away, another window will go to those
15:00 SketchCow https://archive.org/details/archiveteam_github
15:00 arkiver wooh
15:03 SketchCow fos.textfiles.com/CODEBLENDER.txtfiles.tar
15:03 SketchCow 787gb of ytmnd
15:14 arkiver got the codeblender file
15:42 wp494 has quit IRC (Read error: Operation timed out)
15:42 wp494 has joined #archiveteam-bs
15:53 m007a83_ has joined #archiveteam-bs
15:54 m007a83 has quit IRC (Ping timeout: 252 seconds)
15:56 SketchCow Purevolume joins the fun
16:22 SketchCow github, ytmnd and newspapers now uploading, that's 2.5tb of material
16:24 kiska So would IA mind storing potentially 0.5TB per day of atc feeds for preservation?
16:29 SketchCow Unlikely.
16:29 SketchCow That's a lot of data.
16:29 SketchCow Speaking of which... story-raw - JAA, is that yours?
16:29 SketchCow Who's got that project going? We should move on that. It's 648gb sitting here.
16:30 JAA SketchCow: Storify? Oh yeah, shit. Sorry, completely forgot about that. Yes, this is mine. Need to filter out the bogus responses etc.
16:30 SketchCow How do we want to do this
16:30 SketchCow I can probably give you a temp account
16:30 JAA Yeah, that was the idea I think.
16:30 SketchCow or I can run something? or something.
16:31 JAA I don't have a script yet or anything. Haven't done this before, so I'll figure it out as I go. I guess I'll filter out all response and request records that are going to be a problem in browsing the archives.
16:35 SketchCow Ok, then. We'll arrange a user account for you Wednesday (after the shutdown) that lets you do whatever you want with it
16:36 JAA Sounds great, thanks.
16:36 SketchCow Stay on me if I forget
16:38 JAA Yeah, I'll try to remember it this time.
16:51 m007a83 has joined #archiveteam-bs
16:52 m007a83_ has quit IRC (Ping timeout: 252 seconds)
17:02 arkiver kiska: let's take this to #radio-archive
17:18 bitBaron has quit IRC (Quit: Bye!)
17:37 catboy are there any scripts or orchestrators that let me more easily manage multiple concurrent instances? right now i'm manually bash-for-looping a hundred screen'd python instances (with disable-web-server and binding to a different IP)
17:41 m007a83_ has joined #archiveteam-bs
17:44 m007a83 has quit IRC (Ping timeout: 252 seconds)
17:53 tyzoid catboy: You talking about the warrior?
17:54 catboy i.. think so. i'm running with the `screen su - [...] archiveteam` thing, not the VM
18:00 betamax has quit IRC (Read error: Operation timed out)
18:01 Kaz the warrior is the vm
18:01 Kaz you're running the scripts manually - I don't think there's much available in terms of orchestration, everyone has their own little scripts here and there
18:02 Kaz i tend to just have a script that kicks off x amount of pipelines, and restarts them every (hour|day|whatever)
18:14 catboy works for me, i guess. i was hoping to be able to maybe start one of the warrior webservers or something where i can keep track of all the jobs in one interface
18:14 catboy but it only shows the two concurrent on that bound ip or something, unless i'm running it in some horribly wrong method
18:17 fredgido has joined #archiveteam-bs
18:34 archodg_ has joined #archiveteam-bs
18:36 archodg__ has quit IRC (Ping timeout: 252 seconds)
18:36 odemg has quit IRC (Ping timeout: 260 seconds)
18:43 m007a83 has joined #archiveteam-bs
18:44 m007a83_ has quit IRC (Ping timeout: 252 seconds)
18:50 odemg has joined #archiveteam-bs
19:03 JAA catboy: I don't think it's possible to have one dashboard for multiple parallel instances. You can of course run only one pipeline instance with a higher concurrency though.
19:05 catboy can i bind each 'concurrency' to a different outgoing wget-lua IP?
19:05 schbirid has quit IRC (Remote host closed the connection)
19:06 JAA catboy: I don't think that's supported, no.
19:19 m007a83_ has joined #archiveteam-bs
19:23 m007a83 has quit IRC (Read error: Operation timed out)
19:27 bsmith093 has quit IRC (Remote host closed the connection)
19:44 Kaz webinterface is CPU overhead, bin it off
19:44 Kaz in theory you shouldn't need to watch progress unless there's a problem
19:45 Kaz catboy: can't bind each concurrent to a different IP, but nothing stopping you from running the pipeline multiple times with different IPs (and a concurrency of 1)
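
[Editor's sketch] The approach Kaz describes (one pipeline process per outgoing IP, each with a concurrency of 1 and no web interface) could look roughly like the following. This is only a sketch under assumptions: the `run-pipeline pipeline.py --concurrent N --disable-web-server NICK` invocation is taken from the usual seesaw usage mentioned in the discussion, while the `BIND_ADDRESS` environment variable is hypothetical and would only have an effect if a locally modified pipeline.py read it and passed it on to wget-lua. Periodic restarts are not covered.

```python
import os
import subprocess
from typing import Dict, List, Tuple

Job = Tuple[List[str], Dict[str, str]]


def build_jobs(ips: List[str], nick: str) -> List[Job]:
    """Return one (argv, extra_env) pair per outgoing IP address."""
    jobs = []
    for ip in ips:
        argv = [
            "run-pipeline", "pipeline.py",
            "--concurrent", "1",       # one item at a time per process
            "--disable-web-server",    # skip the per-process dashboard
            nick,
        ]
        # BIND_ADDRESS is a hypothetical variable name: a locally edited
        # pipeline.py would have to read it and hand it to wget-lua.
        jobs.append((argv, {"BIND_ADDRESS": ip}))
    return jobs


def launch(jobs: List[Job]) -> List[subprocess.Popen]:
    """Start every job as a child process and return the handles."""
    procs = []
    for argv, extra_env in jobs:
        env = dict(os.environ, **extra_env)
        procs.append(subprocess.Popen(argv, env=env))
    return procs
```

Calling `launch(build_jobs(["192.0.2.10", "192.0.2.11"], "YOURNICK"))` from the project directory would start two pipelines; the addresses and nickname here are placeholders.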
19:54 tyzoid AccuWeather archive project name discussion
19:55 tyzoid accio-weather?
19:55 Kaz there's gotta be a rain/storm pun here somewhere
19:55 tyzoid something with torrent?
19:57 tyzoid F5-torrent?
19:59 arkiver where does the idea come from to do accuweather?
20:00 arkiver nvm :)
20:02 arkiver JAA ^
20:03 arkiver topics seem to be sequentially numbered
20:04 arkiver users have nice numbers too
20:04 arkiver could archive single post URLs too if there's enough time
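
[Editor's sketch] Because topics appear to be numbered sequentially, the URL space can be enumerated and cut into fixed-size work items. The sketch below assumes the `index.php?showtopic=N` pattern seen in the forum link quoted later in this log; the item size and the helper names are illustrative only and do not describe the project's actual scripts.

```python
from typing import Iterator, List, Tuple

BASE = "http://forums.accuweather.com/index.php"


def topic_urls(first: int, last: int) -> Iterator[str]:
    """Yield one topic URL per ID in the inclusive range."""
    for topic_id in range(first, last + 1):
        yield f"{BASE}?showtopic={topic_id}"


def split_into_items(max_id: int, size: int) -> List[Tuple[int, int]]:
    """Cut the ID space 1..max_id into inclusive (start, end) ranges."""
    return [
        (start, min(start + size - 1, max_id))
        for start in range(1, max_id + 1, size)
    ]
```

Each `(start, end)` pair could then be handed out as one unit of work, with `topic_urls(start, end)` producing the URLs to fetch for it.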
20:22 nyaomi has quit IRC (Quit: meow)
20:24 JAA Looks like standard IP.Board (aka Invision Forums). We'll want to be a bit careful to keep session IDs out of the links in the archives so everything's nice and browsable.
20:26 JAA There are different ways to do that, e.g. a separate process that writes a cookie jar to be used by the actual archival (like we did for login on SPUF) or by requesting an extra page at the beginning of each job. Once the cookies are set, session IDs aren't inserted into links anymore.
20:26 fredgido has quit IRC (Quit: Connection closed for inactivity)
20:27 arkiver Requesting an extra page is probably the easiest. But both are fine.
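
[Editor's sketch] The cookie-jar variant JAA mentions can be illustrated with the Python standard library: request one page so the forum sets its session cookie, save the jar in the Netscape cookies.txt format (which wget can read via `--load-cookies`), then check that extracted links no longer carry a session ID. The query parameter name `s` is an assumption about this IP.Board installation, and both function names are illustrative.

```python
import http.cookiejar
import urllib.request
from urllib.parse import parse_qs, urlsplit


def prime_cookie_jar(url: str, jar_path: str) -> http.cookiejar.MozillaCookieJar:
    """Fetch one page so the server sets its cookies, then save them."""
    jar = http.cookiejar.MozillaCookieJar(jar_path)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(url, timeout=30):
        pass  # only the Set-Cookie headers matter here
    # Session cookies are normally marked "discard"; keep them anyway.
    jar.save(ignore_discard=True, ignore_expires=True)
    return jar


def has_session_id(url: str, param: str = "s") -> bool:
    """True if the URL's query string carries the (assumed) session parameter."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return param in query
```

A crawl could run `has_session_id` over discovered links as a sanity check that the cookies took effect before anything is written to the archive.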
20:29 JAA Regarding the name, maybe something related to storm drains (i.e. the site going down the drain)? I suck at puns though.
20:30 godane latest digitize tapes uploaded: https://www.patreon.com/posts/digitize-tapes-20740076
20:32 JAA Looks like it's also possible for users to leave comments on other users' profile pages.
20:33 JAA And there are frames on the profile pages which are only loaded on a click (the tabs above "My content" on the right).
20:38 JAA This is the most official statement I was able to find regarding the shutdown: http://forums.accuweather.com/index.php?showtopic=33652&st=0&p=2331556&#entry2331556
20:40 arkiver Do you want to write the scripts for this project? Or shall I work on them
20:40 arkiver If we want to do a project
20:40 arkiver Would be nice to archive URLs for individual posts too
20:41 JAA Agreed. Yeah, I'll write the scripts.
20:43 arkiver Awesome
20:44 catboy thx kaz - yeah i am running just a hundred copies of the pipeline stuff right now. was hoping there was a better way but it'll work
20:56 bsmith093 has joined #archiveteam-bs
20:59 nyaomi has joined #archiveteam-bs
21:00 bsmith093 has quit IRC (Client Quit)
21:03 bsmith093 has joined #archiveteam-bs
21:06 JAA I just realised that the Quizlet project was never announced in the main channels. The channel for that is #quizletusin.
21:06 JAA (For those who are not aware: "Quizlet is a mobile and web-based study application that allows students to study information via learning tools and games." It's not at risk currently as far as we can tell, but it's still worth a grab.)
21:07 catboy i found out about AT because of that actually, some of my friends said they have started doing mass takedowns of entire chapters/books/publishers on request or something
21:08 jut And set concurecy to 2
21:08 jut concurrency
21:14 JAA catboy: Oh, interesting, haven't heard about that before.
21:16 m007a83_ has quit IRC (Ping timeout: 252 seconds)
21:43 Stilett0 has quit IRC (Read error: Operation timed out)
22:01 lindalap has quit IRC (Quit: lindalap)
22:07 tyzoid JAA / arkiver: Did we decide a name/should we split the channel?
22:07 JAA tyzoid: No name yet, any suggestions?
22:08 tyzoid my vote is still #accio-weather
22:08 Flashfire #rainingdata
22:09 Flashfire #acquireweather
22:09 Flashfire Although I am a fan of the suggestion of tyzoid
22:09 Flashfire as well
23:16 m007a83 has joined #archiveteam-bs
23:32 tyzoid has quit IRC (Read error: Operation timed out)
23:54 adinbied has quit IRC (Left Channel.)
23:55 ivan has quit IRC (Read error: Operation timed out)
23:55 adinbied has joined #archiveteam-bs
23:55 JAA has quit IRC (Read error: Operation timed out)
23:55 ivan has joined #archiveteam-bs
23:56 zyphlar has quit IRC (Read error: Operation timed out)
23:56 jspiros has quit IRC (Read error: Operation timed out)
23:56 wabu has quit IRC (Read error: Operation timed out)
23:57 Petri152 has quit IRC (Read error: Operation timed out)
23:57 Jusque has quit IRC (Read error: Operation timed out)
23:58 Jusque has joined #archiveteam-bs
