[00:50] No, it is not possible to give FOS "a bit of a kick"
[00:51] Also, a special place in hell for the team members who are opening large parallel rsync streams and wondering why it's not working
[00:53] is it currently disk-bound?
[00:54] because I can't imagine people have ~500 concurrent rsync streams into it
[01:35] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[02:06] The irony... https://twitter.com/aaronpk/status/688518372170977280
[02:09] slack does something like that
[02:20] *** JesseW has joined #archiveteam-bs
[02:29] chfoo -- There is now a copy of pywb-ia running on http://archivelab.org:3579/ -- so rather than running your own, you could just add links to that from the viewer. I'll see about making such a PR soon.
[03:30] *** JesseW has quit IRC (Quit: Leaving.)
[03:58] *** JesseW has joined #archiveteam-bs
[04:03] *** bwn has quit IRC (Read error: Operation timed out)
[04:08] dxrt, if you're having problems holding pipeline data I should be able to take some of it off your hands
[04:09] Fletcher: I'm doing OK at the moment, but I'll let you know. Thanks!
[04:10] *** Boppen has joined #archiveteam-bs
[04:14] JesseW: ok, sounds good
[04:23] so architecturally, what does FOS ultimately do with the WARCs?
[04:23] *** Boppen has quit IRC (hub.se irc.du.se)
[04:23] It combines them together into "megawarcs" and then uploads them to the IA
[04:23] It's apparently on a network with a very solid route to them
[04:24] I wonder if a sufficiently provisioned and active pipeline could bypass it
[04:25] *** bwn has joined #archiveteam-bs
[04:28] All so adorable.
[04:28] What I need to do when I'm onsite in about 2 weeks is talk about a FOS replacement.
[04:28] FOS is a virtual box running on a shared resource. It was never meant for this.
[04:28] 19:54 < FalconK> because I can't imagine people have ~500 concurrent rsync streams into it
[04:28] No, they really fucking do
[04:29] brutal!
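A rough illustration of the "combine WARCs into megawarcs" step mentioned at 04:23. This is not the actual megawarc tool, whose format is not described in this log; it is a minimal sketch that only assumes the inputs are gzip-compressed files and relies on the fact that concatenated gzip members form a valid gzip stream. The names `combine_members` and `extract_member` and the file names are hypothetical.

```python
import gzip
import io


def combine_members(members):
    """Concatenate gzip blobs into one stream.

    `members` is an iterable of (name, gzip_bytes) pairs. Returns the combined
    bytes plus an index recording where each original blob sits, so a single
    member can be pulled back out without reading the whole stream.
    """
    combined = io.BytesIO()
    index = []
    for name, blob in members:
        offset = combined.tell()
        combined.write(blob)
        index.append({"name": name, "offset": offset, "length": len(blob)})
    return combined.getvalue(), index


def extract_member(combined, entry):
    """Decompress one original blob using its index entry."""
    start = entry["offset"]
    chunk = combined[start:start + entry["length"]]
    return gzip.decompress(chunk)


if __name__ == "__main__":
    # Hypothetical stand-ins for two small compressed WARC files.
    parts = [
        ("a.warc.gz", gzip.compress(b"first record\n")),
        ("b.warc.gz", gzip.compress(b"second record\n")),
    ]
    mega, index = combine_members(parts)
    for entry in index:
        print(entry["name"], entry["offset"], entry["length"])
    print(extract_member(mega, index[1]))
```

Keeping an offset index next to the combined stream is one way to make a large bundle cheap to upload as a single object while still letting individual pieces be located later.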
[04:30] With a workload like that it's amazing it runs as well as it does
[04:30] I guess if it is on site at IA the architecture makes a lot more sense
[04:30] on the other hand my crawler has as much provisioned bandwidth as FOS
[04:40] *** mismatch_ has quit IRC (Remote host closed the connection)
[04:41] *** mismatch_ has joined #archiveteam-bs
[04:41] *** kvieta has quit IRC (Ping timeout: 633 seconds)
[04:43] *** botpie91 has quit IRC (Ping timeout: 633 seconds)
[04:43] *** kvieta has joined #archiveteam-bs
[04:43] *** botpie91 has joined #archiveteam-bs
[04:44] *** Boppen has joined #archiveteam-bs
[04:49] *** Boppen has quit IRC (Ping timeout: 200 seconds)
[04:56] *** bsmith093 has joined #archiveteam-bs
[04:59] Jessw, any updates on the repack of the fanfic grab?
[04:59] JesseW: any updates on the repack of the fanfic grab?
[05:19] bsmith093: still waiting on reply from people at IA about uploading the census data.
[05:19] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[05:22] I should maybe ask one of the other people with accounts at FOS if they'd like to upload it -- I could do it myself, but I don't want my name permanently connected to it, to avoid random angry people emailing me years later because they are angry about something on IA.
[05:27] how large is it?
[05:28] *** Sk1d has joined #archiveteam-bs
[05:29] SketchCow: did you see my question from earlier? Is FOS CPU bound right now or disk? If it's CPU bound, could we try lowering the rsync compression levels and see if that does anything?
[05:30] *** dashcloud has quit IRC (Read error: Operation timed out)
[05:30] Even if only as a temporary measure to get FOS a little healthier until a real replacement can be procured
[05:31] It's not CPU bound
[05:31] And no, don't lower the compression levels.
[05:33] it's a moot point anyways, but the compression levels I was talking about were not related to the data on disk, but instead the compression levels used in transit. Currently, it looks like --compress-level=9 is what is in use, which is very CPU-intensive.
[05:40] * phuzion heads out
[05:40] *** dashcloud has joined #archiveteam-bs
[05:42] Fletcher: how large are the census files to be uploaded to IA? 20G
[05:42] and they are already on FOS
[05:42] *** ndiddy has quit IRC (Read error: Operation timed out)
[05:43] I just don't want to delete my working files (which are rather larger, because uncompressed and duplicative) until I've got the results up to IA.
[05:43] https://archive.org/details/Secrets_of_The_Unknown_The_Titanic_1987_VHSRip
[05:48] *** dashcloud has quit IRC (Read error: Operation timed out)
[05:49] phuzion: It's rather a lot worse for the sending end than receiving end, though.
[05:52] *** dashcloud has joined #archiveteam-bs
[05:56] https://archive.org/details/Tackling_Football_A_Womans_Guide_to_Watching_the_Game_1986_VHSRip
[06:06] *** vitzli has joined #archiveteam-bs
[06:35] *** Boppen has joined #archiveteam-bs
[06:39] *** dashcloud has quit IRC (Read error: Operation timed out)
[06:43] *** Fletcher_ has joined #archiveteam-bs
[06:43] *** dashcloud has joined #archiveteam-bs
[06:45] *** Fletcher sets mode: +o Fletcher_
[06:52] *** dashcloud has quit IRC (Read error: Operation timed out)
[06:56] *** dashcloud has joined #archiveteam-bs
[07:04] *** dashcloud has quit IRC (Read error: Operation timed out)
[07:08] *** dashcloud has joined #archiveteam-bs
[07:20] *** vitzli has quit IRC (Leaving)
[07:26] *** JesseW has quit IRC (Quit: Leaving.)
[07:30] *** Boppen has quit IRC (hub.se irc.du.se)
[08:43] *** bwn has quit IRC (Read error: Operation timed out)
[08:58] *** schbirid has joined #archiveteam-bs
[09:14] *** robink has joined #archiveteam-bs
[09:22] *** BnA-Rob1n has quit IRC (Remote host closed the connection)
[09:25] *** bwn has joined #archiveteam-bs
[09:28] *** robink has quit IRC (Ping timeout: 260 seconds)
[09:30] *** bwn_ has joined #archiveteam-bs
[09:34] *** robink has joined #archiveteam-bs
[09:41] *** bwn has quit IRC (Read error: Operation timed out)
[09:43] *** bwn__ has joined #archiveteam-bs
[09:46] *** dashcloud has quit IRC (Read error: Operation timed out)
[09:49] *** dashcloud has joined #archiveteam-bs
[09:56] *** bwn_ has quit IRC (Read error: Operation timed out)
[10:12] *** jut has joined #archiveteam-bs
[10:16] *** vitzli has joined #archiveteam-bs
[10:53] *** schbirid has quit IRC (Quit: Leaving)
[10:57] *** schbirid has joined #archiveteam-bs
[11:06] *** dashcloud has quit IRC (Read error: Operation timed out)
[11:10] *** dashcloud has joined #archiveteam-bs
[11:12] *** dashcloud has quit IRC (Read error: Operation timed out)
[11:13] *** jut has quit IRC (Read error: Connection reset by peer)
[11:16] *** dashcloud has joined #archiveteam-bs
[11:36] *** Rye has joined #archiveteam-bs
[11:36] *** useretail has joined #archiveteam-bs
[11:46] *** jut has joined #archiveteam-bs
[12:14] *** jut has quit IRC (jut)
[12:36] *** BnA-Rob1n has joined #archiveteam-bs
[13:05] *** rduser has quit IRC (Read error: Operation timed out)
[13:08] *** is-_ is now known as is-
[13:17] pikhq: Yes, the sending end gets the worst of it, but when the receiving end is one machine doing 500 simultaneous decompress operations, it's gonna get beat up pretty bad too.
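To put rough numbers on the compression-level exchange (05:29 to 13:17), here is a small sketch that times zlib compression and decompression at a few levels. It assumes the transfers use rsync's traditional zlib-based `-z` compression; the sample payload and the names `make_sample` and `measure_level` are made up for illustration, and real WARC data will behave differently.

```python
import time
import zlib


def make_sample(records: int = 20_000) -> bytes:
    """Build a made-up, fairly repetitive payload to compress."""
    return b"".join(b"record %d: status=ok\n" % i for i in range(records))


def measure_level(data: bytes, level: int) -> dict:
    """Compress and decompress `data` once at `level`, timing each direction."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    compress_seconds = time.perf_counter() - start

    start = time.perf_counter()
    unpacked = zlib.decompress(packed)
    decompress_seconds = time.perf_counter() - start

    return {
        "level": level,
        "compressed_bytes": len(packed),
        "compress_seconds": compress_seconds,
        "decompress_seconds": decompress_seconds,
        "roundtrip_ok": unpacked == data,
    }


if __name__ == "__main__":
    sample = make_sample()
    for level in (1, 6, 9):
        print(measure_level(sample, level))
```

The sender's cost shows up in `compress_seconds`, the receiver's in `decompress_seconds`. Multiplying the receiver figure by the number of simultaneous streams gives a feel for the load on a single receiving machine.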
[13:52] *** vitzli has quit IRC (Leaving)
[14:23] https://archive.org/details/Cirque_Du_Soleil_Presents_Inside_La_Nouba_2000_VHSRip
[14:45] ok, just found 21GB of comcast data on my box
[14:45] *** dashcloud has quit IRC (Read error: Operation timed out)
[14:46] *** bauruine has quit IRC (Ping timeout: 260 seconds)
[14:47] *** BnA-Rob1n has quit IRC (Remote host closed the connection)
[14:49] *** dashcloud has joined #archiveteam-bs
[14:50] *** bauruine has joined #archiveteam-bs
[14:50] *** vitzli has joined #archiveteam-bs
[14:52] *** Start has quit IRC (Quit: Disconnected.)
[14:55] SketchCow, thank you for putting the images into collection!
[15:09] Right. This CDN im downloading from has some strange stuff on it, like http://c1.glitch.bz/wp-uploads/main/2010/11/The_Rube_Song_1290596637.mp3
[15:16] *** dashcloud has quit IRC (Read error: Operation timed out)
[15:20] *** dashcloud has joined #archiveteam-bs
[15:25] *** Muad-Dib has quit IRC (Ping timeout: 260 seconds)
[15:34] *** Kazzy has quit IRC (Ping timeout: 260 seconds)
[15:35] *** robink has quit IRC (Remote host closed the connection)
[15:35] *** zhongfu has quit IRC (Remote host closed the connection)
[15:36] *** FalconK has quit IRC (Ping timeout: 260 seconds)
[15:37] *** robink has joined #archiveteam-bs
[15:41] *** Kazzy has joined #archiveteam-bs
[15:41] *** FalconK has joined #archiveteam-bs
[15:41] *** zhongfu has joined #archiveteam-bs
[15:45] *** Start has joined #archiveteam-bs
[16:06] SketchCow, a lot of data from the UK National Archive, and the UK government is going to be coming your way eventually
[16:07] *** ErkDog has joined #archiveteam-bs
[16:07] oops
[16:07] soz @midas
[16:08] we try to keep #archiveteam clean so when someone comes in just to scream HEY SHIT IS ON FIRE AT ... we can find it again and start working on it, thats why we have -bs ;-)
[16:10] Good to know.
[16:10] I was up until a ridiculous time last night writing something that yanks pages out of an item (magazines) on archive.
[16:11] worked?
[16:12] *** Muad-Dib has joined #archiveteam-bs
[16:14] *** will has quit IRC (Ping timeout: 250 seconds)
[16:14] *** Kazzy has quit IRC (Ping timeout: 250 seconds)
[16:14] *** Fletcher_ has quit IRC (Ping timeout: 250 seconds)
[16:14] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[16:14] *** ersi_ has quit IRC (Ping timeout: 250 seconds)
[16:15] SketchCow: march 2010 of kpfa is going up
[16:15] Thank you.
[16:15] The script worked fine, and it enabled me to do the work relatively easily, which is good, because Harlan Ellison Is Coming
[16:16] How do you mean it yanks out pages?
[16:16] But it was still SOMEWHAT tedious, even if the script is doing a ton of heavy lifting.
[16:16] As in extract articles?
[16:16] *** Kazzy has joined #archiveteam-bs
[16:16] *** Fletcher_ has joined #archiveteam-bs
[16:16] It makes a dupe object, darks the old one, and then I pull out individual jp2s in it
[16:16] So we get the magazine without the objected-to articles.
[16:16] Instead of a straight darking.
[16:17] @SketchCow you are Jeff/Textfiles right? :-D
[16:17] Oh, that's nice
[16:17] Who. the fuck. is Jeff
[16:17] >_> <_<
[16:17] Jason
[16:17] sorry doing lots of things
[16:17] So what is being done with the pages that are pulled out?
[16:17] Do less things.
[16:17] arkiver: I delete them
[16:17] Again, dupe object. I create itemid_modified and dark itemid
[16:18] *** Sk1d has joined #archiveteam-bs
[16:18] And mark it as modified and list what was modified.
[16:18] I see
[16:18] *** Fletcher sets mode: +o Fletcher_
[16:18] So original is also kept
[16:18] Nice project!
[16:18] Oh yeah.
[16:18] *** will has joined #archiveteam-bs
[16:19] so then, you are Jason/TextFiles? :-D
[16:19] im just calling it right now, dec-99.
[16:19] and ohhhh how I missed IRC
[16:20] Are you dec3199?
[16:20] i dont think he is, im just saying the same thing might happen
[16:20] ok, nvm
[16:20] *** dashcloud has quit IRC (Read error: Operation timed out)
[16:21] You guys should make a betting pool/deadpool channel
[16:21] (And don't tell me about it)
[16:21] Be sure to add odds-modifiers like "is on new diet" "got into unreasonable twitter battle"
[16:22] Unless.... you already have....
[16:22] *** JesseW has joined #archiveteam-bs
[16:23] well if you -are- Jason, textfiles.com is pretty cool, I especially liked the text file about how to "rob" the bank by changing writing in your withdrawal book, lol
[16:23] soooo old
[16:23] but that's the point
[16:26] Noted.
[16:27] *** dashcloud has joined #archiveteam-bs
[16:28] PurpleSym, "Site remains online so long as at least 1 peer serving it"
[16:29] * vitzli cries - so many 0s/0l torrents
[16:30] Sure, but centralized isn’t much better in that regard.
[16:30] I guess everyone builds his own P2P website protocol nowadays.
[16:31] ErkDog: Labels itself the “permanent web”, see http://ipfs.io/
[16:32] SketchCow: piratepad with bets, no we wont give you the url
[16:32] ;-)
[16:32] not just own p2p proto, ipfs has its own storage, *sob*
[16:32] Good solution
[16:32] Occasionally they surprise you.
[16:32] Usually they don't.
[16:35] IPFS guy and I are buddies.
[16:35] He was at that Google meeting I was at with brewster and vint cerf.
[16:37] * arkiver would love to be at such a meeting
[16:39] I generally take a dim view of cryptocurrencies, but IPFS is a genuinely cool bit of tech built on top of them
[16:39] (In terms of how it uses a cryptocurrency to incentivize people to store data for the IPFS network)
[16:42] *** bauruine has quit IRC (Ping timeout: 260 seconds)
[16:43] SimpBrain: are you still doing the livejournal discovery?
[16:45] *** bauruine has joined #archiveteam-bs
[16:51] MrRadar: AFAIK, the cryptocurrency part is just in planning at this point, I think?
[16:52] *** ErkDog has quit IRC ()
[16:52] *** ErkDog has joined #archiveteam-bs
[16:54] It has been a while since I looked at it
[16:57] lol bitcoin
[16:58] The key difference between IPFS's (proposed?) crypto and Bitcoin is that Bitcoin incentivizes useless busywork whereas IPFS's incentivizes people to store data for IPFS which is not useless
[16:59] *** JesseW has quit IRC (Quit: Leaving.)
[17:03] *** ersi has joined #archiveteam-bs
[17:03] *** swebb sets mode: +o ersi
[17:07] *** Start has quit IRC (Quit: Disconnected.)
[17:11] SketchCow, 'free' as in 'here is where you can download it' or free as in 'CC/GPL/Public domain'?
[17:12] MrRadar: BUT WHATS IPFSSSS BUSINESS MODEL???
[17:14] *** Swizzle has joined #archiveteam-bs
[17:26] *** altlabel has quit IRC (hub.dk irc.homelien.no)
[17:26] *** PurpleSym has quit IRC (hub.dk irc.homelien.no)
[17:26] *** PotcFdk has quit IRC (hub.dk irc.homelien.no)
[17:26] *** pikhq has quit IRC (hub.dk irc.homelien.no)
[17:26] *** i0npulse has quit IRC (hub.dk irc.homelien.no)
[17:26] *** limebyte has quit IRC (hub.dk irc.homelien.no)
[17:26] *** coretx has quit IRC (hub.dk irc.homelien.no)
[17:29] *** bauruine has quit IRC (Ping timeout: 256 seconds)
[17:33] *** bauruine has joined #archiveteam-bs
[17:35] *** xmc sets mode: +o yipdw
[17:45] arkiver, going slow, due to the rate limiting
[17:46] 1 server got banned, the other 3 are churning out the profiles about 1-1.5secs each hit
[17:47] the script randomly dies due to reset by peer and a few other errors
[17:47] so during those spells, i rename the txt files and upload them to github
[17:48] *** altlabel has joined #archiveteam-bs
[17:48] *** PurpleSym has joined #archiveteam-bs
[17:48] *** PotcFdk has joined #archiveteam-bs
[17:48] *** pikhq has joined #archiveteam-bs
[17:48] *** i0npulse has joined #archiveteam-bs
[17:48] *** limebyte has joined #archiveteam-bs
[17:48] *** coretx has joined #archiveteam-bs
[18:01] *** JW_work1 has joined #archiveteam-bs
[18:02] *** JW_work has quit IRC (Read error: Operation timed out)
[18:05] *** JW_work1 has quit IRC (Client Quit)
[18:08] *** JW_work has joined #archiveteam-bs
[18:09] *** vitzli has quit IRC (Leaving)
[18:23] thanks PurpleSym
[18:25] *** rduser has joined #archiveteam-bs
[18:26] *** RichardG has quit IRC (Ping timeout: 258 seconds)
[18:33] *** RichardG has joined #archiveteam-bs
[18:44] *** Start has joined #archiveteam-bs
[18:46] IPFS is craaazy
[19:15] *** Start has quit IRC (Quit: Disconnected.)
[19:17] *** robink has quit IRC (Ping timeout: 633 seconds)
[19:19] *** bwn__ has quit IRC (Read error: Operation timed out)
[19:19] *** Start has joined #archiveteam-bs
[19:20] !igset d51o6r27tk48yxiypvd5jl1pu blogs
[19:20] !igset d51o6r27tk48yxiypvd5jl1pu blog
[19:26] MrRadar: wrong channel
[19:27] I know
[19:29] *** tomwsmf-a has joined #archiveteam-bs
[19:32] *** robink has joined #archiveteam-bs
[19:35] *** bwn__ has joined #archiveteam-bs
[19:44] *** JW_work has quit IRC (Quit: Leaving.)
[19:46] *** JW_work has joined #archiveteam-bs
[19:55] *** BnA-Rob1n has joined #archiveteam-bs
[19:55] *** BnA-Rob1n has quit IRC (Remote host closed the connection)
[19:55] *** BnA-Rob1n has joined #archiveteam-bs
[19:56] *** Swizzle_ has joined #archiveteam-bs
[20:09] *** Swizzle has quit IRC (Read error: Operation timed out)
[20:45] *** Start has quit IRC (Quit: Disconnected.)
[20:58] *** Swizzle__ has joined #archiveteam-bs
[21:11] *** Swizzle_ has quit IRC (Read error: Operation timed out)
[21:12] lost 1 server, gain another
[21:14] it's all servers and roundabouts?
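For the scraping script described at 17:45 to 17:47 (rate limited, roughly one hit per second, dying on "reset by peer"), one common remedy is to retry with a growing delay instead of letting the process exit. The sketch below is not that script: `fetch` is a hypothetical caller-supplied function that downloads one URL, and the delays and attempt count are assumptions.

```python
import time


def fetch_with_retry(fetch, url, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on connection resets and timeouts.

    The wait doubles after each failure (base_delay, 2 * base_delay, ...).
    The last failure is re-raised so the caller can log or skip the URL.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except (ConnectionResetError, TimeoutError):
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


def crawl(fetch, urls, delay=1.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing `delay` seconds between hits."""
    results = {}
    for url in urls:
        results[url] = fetch_with_retry(fetch, url, sleep=sleep)
        sleep(delay)  # stay under the remote rate limit
    return results
```

Passing `sleep` in as a parameter keeps the pacing logic testable without real waiting; in normal use the default `time.sleep` applies.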
[21:19] dumped an idle dedi, got a scaleway
[21:20] nice, thats where newsbuddy started
[21:31] *** mr-b has quit IRC (Read error: Operation timed out)
[21:31] *** SadDM has quit IRC (Read error: Operation timed out)
[21:31] *** Mayonaise has quit IRC (Read error: Operation timed out)
[21:32] *** acridAxid has quit IRC (Read error: Operation timed out)
[21:32] *** rduser has quit IRC (Read error: Operation timed out)
[21:32] *** jspiros has quit IRC (Read error: Operation timed out)
[21:32] *** wp494_ has joined #archiveteam-bs
[21:33] *** wp494 has quit IRC (Ping timeout: 246 seconds)
[21:33] *** matthusby has quit IRC (Ping timeout: 246 seconds)
[21:33] *** SN4T14 has quit IRC (Read error: Operation timed out)
[21:33] *** remsen has quit IRC (Ping timeout: 246 seconds)
[21:35] *** schbirid has quit IRC (Quit: Leaving)
[21:40] *** remsen has joined #archiveteam-bs
[21:41] *** SN4T14 has joined #archiveteam-bs
[21:43] *** rduser has joined #archiveteam-bs
[21:47] *** acridAxid has joined #archiveteam-bs
[21:49] *** mr-b has joined #archiveteam-bs
[22:10] *** lytv has joined #archiveteam-bs
[22:15] *** vtyl has quit IRC (Read error: Operation timed out)
[22:47] *** kvieta has quit IRC (Read error: Operation timed out)
[22:48] *** remsen has quit IRC (Read error: Operation timed out)
[22:48] *** mismatch has joined #archiveteam-bs
[22:49] *** botpie91 has quit IRC (Read error: Operation timed out)
[22:51] *** botpie91 has joined #archiveteam-bs
[22:52] *** mismatch_ has quit IRC (Read error: Operation timed out)
[22:53] *** ersi has quit IRC (Read error: Operation timed out)
[22:56] *** Zebranky has quit IRC (Remote host closed the connection)
[22:58] *** godane has quit IRC (Read error: Operation timed out)
[22:59] *** beardicus has quit IRC (Read error: Operation timed out)
[23:00] *** godane has joined #archiveteam-bs
[23:04] *** beardicus has joined #archiveteam-bs
[23:04] *** kvieta has joined #archiveteam-bs
[23:06] *** remsen has joined #archiveteam-bs
[23:06] *** ersi has joined #archiveteam-bs
[23:06] *** swebb sets mode: +o ersi
[23:07] *** Zebranky has joined #archiveteam-bs
[23:31] *** kvieta has quit IRC (Read error: Operation timed out)
[23:32] *** remsen has quit IRC (Read error: Operation timed out)
[23:33] *** beardicus has quit IRC (Read error: Operation timed out)
[23:33] *** closure has quit IRC (Read error: Operation timed out)
[23:33] *** kvieta has joined #archiveteam-bs
[23:34] *** beardicus has joined #archiveteam-bs
[23:35] *** remsen has joined #archiveteam-bs
[23:38] *** closure has joined #archiveteam-bs
[23:40] *** kvieta has quit IRC (Read error: Operation timed out)
[23:40] *** mr-b has quit IRC (Read error: Operation timed out)
[23:41] *** remsen has quit IRC (Read error: Operation timed out)
[23:41] *** closure has quit IRC (Read error: Operation timed out)
[23:43] *** beardicus has quit IRC (Read error: Operation timed out)
[23:45] *** botpie91 has quit IRC (Ping timeout: 633 seconds)
[23:50] *** remsen has joined #archiveteam-bs
[23:50] *** botpie91 has joined #archiveteam-bs
[23:50] *** kvieta has joined #archiveteam-bs
[23:51] *** beardicus has joined #archiveteam-bs
[23:53] *** closure has joined #archiveteam-bs
[23:55] *** Start has joined #archiveteam-bs