[00:06] *** mistym has quit IRC (Remote host closed the connection)
[00:16] *** godane has joined #archiveteam
[00:57] *** primus104 has quit IRC (Leaving.)
[00:57] *** mistym has joined #archiveteam
[00:58] *** primus104 has joined #archiveteam
[01:11] *** primus104 has quit IRC (Leaving.)
[01:12] *** primus104 has joined #archiveteam
[01:58] *** primus104 has quit IRC (Leaving.)
[02:06] *** mistym has quit IRC (Remote host closed the connection)
[02:19] *** philpem has quit IRC (Ping timeout: 272 seconds)
[02:26] *** mistym has joined #archiveteam
[02:30] *** Ymgve has quit IRC ()
[02:33] *** ruukasu has quit IRC (Quit: WeeChat 1.0.1)
[02:36] *** ruukasu has joined #archiveteam
[04:14] *** ruukasu has quit IRC (Quit: WeeChat 1.0.1)
[04:16] *** Start is now known as StartAway
[04:22] *** wp494 has quit IRC (Ping timeout: 186 seconds)
[04:23] *** ruukasu has joined #archiveteam
[04:27] *** wp494 has joined #archiveteam
[05:03] *** aaaaaaaaa has quit IRC (Leaving)
[06:06] Last of the nightmare directories.
[06:08] hey SketchCow
[06:11] In THEORY, and I mean IN THEORY, FOS's mrtgs should get better now
[06:12] everything with computers is in THEORY
[06:12] like in THEORY the clown should save everything
[06:13] but it doesn't when it goes bankrupt or gets bought
[06:14] http://fos.textfiles.com:8088/mrtg/
[06:14] godane, you're so cute when you're cynical
[06:16] SketchCow: you may be getting radionz mp3 files
[06:16] it goes back to 2008
[06:18] it's funny that you have these talks on these radio channels that end up putting it online
[06:18] *** mistym has quit IRC (Remote host closed the connection)
[06:18] but there's proof that they're doing the very thing you're fighting against
[06:19] I can see MRTG's pain is heavily reduced
[06:45] *** mistym has joined #archiveteam
[06:48] I pity the foo
[06:50] *** Aranje has joined #archiveteam
[07:15] *** mutoso has quit IRC (Read error: Connection reset by peer)
[07:20] *** APerti has joined #archiveteam
[07:24] *** slipstrea has quit IRC (Ping timeout: 480 seconds)
[07:24] *** pfallenop has quit IRC (Ping timeout: 480 seconds)
[07:28] *** pfallenop has joined #archiveteam
[07:35] *** slipstrea has joined #archiveteam
[07:40] *** dxdx is now known as dx
[07:44] *** slipstrea has quit IRC (Ping timeout: 480 seconds)
[07:55] *** slipstrea has joined #archiveteam
[08:07] *** dxdx has joined #archiveteam
[08:07] *** dx has quit IRC (Read error: Operation timed out)
[08:19] *** primus104 has joined #archiveteam
[08:37] *** philpem has joined #archiveteam
[08:48] *** dashcloud has quit IRC (Read error: Operation timed out)
[08:51] *** dashcloud has joined #archiveteam
[09:05] *** APerti has quit IRC (Ping timeout: 378 seconds)
[09:10] *** dx has joined #archiveteam
[09:11] *** slipstrea has quit IRC (Ping timeout: 480 seconds)
[09:12] *** dxdx has quit IRC (Read error: Operation timed out)
[09:21] *** dx has quit IRC (Read error: Operation timed out)
[09:26] *** dx has joined #archiveteam
[09:26] *** slipstrea has joined #archiveteam
[09:41] *** dx has quit IRC (Ping timeout: 272 seconds)
[09:41] *** dx has joined #archiveteam
[09:48] *** dx has quit IRC (Ping timeout: 265 seconds)
[09:48] *** dx has joined #archiveteam
[10:09] *** schbirid has joined #archiveteam
[10:11] SketchCow: viddy is going to start after IA is back from their free days
[10:13] problem is, there are over 3.5 million user id's and they are all random numbers (example: f874d7f4-b3fb-4bdc-b83f-2f944cf290ac)
[10:14] and then we have 1.5 million video id's, which are constructed the same way
[10:15] SketchCow: so I'm afraid this will have to be over 5 million items :/
[10:21] https://twitter.com/ezmobius has passed away
[10:43] schbirid: According to his entry he was some scientist... How did he die?
[10:46] Yeah. I see there are no details yet
[10:57] *** mutoso has joined #archiveteam
[10:59] http://www.justtechnews.net/ezra-zygmuntowicz-has-passed-away/
[10:59] https://news.ycombinator.com/item?id=8676140&utm_source=dlvr.it&utm_medium=tumblr
[11:03] *** dashcloud has quit IRC (Ping timeout: 335 seconds)
[11:04] *** rejon has joined #archiveteam
[11:08] *** dashcloud has joined #archiveteam
[11:27] *** mistym has quit IRC (Remote host closed the connection)
[12:04] *** Ymgve has joined #archiveteam
[12:09] *** bauruine has quit IRC (Ping timeout: 265 seconds)
[12:28] *** bauruine has joined #archiveteam
[12:45] *** bauruine has quit IRC (Read error: Connection timed out)
[12:54] *** primus104 has quit IRC (Leaving.)
[13:03] *** bauruine has joined #archiveteam
[13:28] *** bauruine has quit IRC (Quit: ZNC - http://znc.in)
[13:31] *** bauruine has joined #archiveteam
[13:39] *** BlueMaxim has quit IRC (Quit: Leaving)
[14:27] *** adrian_ has joined #archiveteam
[14:30] hey everyone
[14:30] a while back you guys helped me archive a website I am going to turn off
[14:30] http://archive.fart.website/archivebot/viewer/job/9tchq
[14:32] i’m wondering if it’s possible to do a partial crawl of things that have changed since then
[14:32] i want to sunset the site tomorrow
[14:34] i don't think archivebot can do that atm
[14:34] partial crawls, no.
[14:34] unless it's a subsection of the site it can crawl entirely
[14:37] such as a subdomain in case you don't know what GLaDOS means
[14:37] or someone who's good with the archivebot can add ignore masks
[14:41] i could do it myself through curl, maybe ?
[14:49] *** fluff is now known as fluff_
[14:50] *** fluff_ is now known as fluff
[14:55] *** fluff is now known as fluff_
[15:11] *** primus104 has joined #archiveteam
[15:31] *** T31M has joined #archiveteam
[15:32] *** cf has quit IRC (Quit: cf)
[17:19] *** logchfoo has quit IRC (Ping timeout: 612 seconds)
[17:21] *** logchfoo starts logging #archiveteam at Sun Nov 30 17:21:14 2014
[17:21] *** logchfoo has joined #archiveteam
[17:51] *** signius_ has quit IRC (Ping timeout: 258 seconds)
[17:58] *** Aranje has quit IRC (Read error: Connection reset by peer)
[17:58] *** Aranje has joined #archiveteam
[18:03] *** signius_ has joined #archiveteam
[18:19] *** cf has joined #archiveteam
[18:40] *** StartAway is now known as Start
[18:41] *** APerti has joined #archiveteam
[19:06] *** cf has quit IRC (cf)
[19:19] *** mistym has joined #archiveteam
[19:19] *** cf has joined #archiveteam
[19:26] arkiver: Not 5 million IA items, right?
[20:15] Time to write something.
[20:39] SketchCow: I just mean 5 million warc.gz items for FOS, which need to be made into megawarcs
[20:42] are the user pages huge?
If not it might be better to combine them into packs.
[20:43] aaaaaaaaa: problem is, there are over 3.5 million user id's and they are all random numbers (example: f874d7f4-b3fb-4bdc-b83f-2f944cf290ac)
[20:46] *** mistym has quit IRC (Remote host closed the connection)
[20:58] We NEED to come up with a way to create the megawarcs that doesn't kill FOS again.
[21:08] *** mistym has joined #archiveteam
[21:20] *** schbirid has quit IRC (Leaving)
[21:24] http://archiveteam.org/index.php?title=Nightmare_Projects
[21:32] *** BlueMaxim has joined #archiveteam
[21:37] SketchCow: my funny or die collection is sort of a nightmare project
[21:37] only cause i have no idea of its final size
[21:38] but i'm uploading it slowly so IA can get the space needed for it
[21:43] It is a little silly
[21:43] But I think it's a useful thing, although if they turn, it could disappear
[21:47] There's a conspiracy theory among whatevers that you are some agent of Turner Broadcasting trying to false flag archive.org.
[21:49] Which is hysTERRRRICAL
[21:49] I remember when a somewhat respected member of the security industry (nowadays) accused me of being an FBI plant
[21:51] Paranoia and imagination make quite a combination.
[21:56] SketchCow: seriously? hahaha! what kind of "whatevers" are these?
[22:01] https://archive.org/search.php?query=forumPost%3A1%20AND%20%22funny%20or%20die%22
[22:03] The forums have some... unique individuals.
[22:03] SketchCow: when can I start halo again?
[22:03] that i can believe ... >.>
[22:04] And alright if I start viddy on FOS very soon? (size will be something like 4-5 TB probably)
[22:05] ^ and a few million items, as I wrote before.
[22:05] *** toad1 has quit IRC (Leaving.)
[22:07] Is there ANY way to pre-cook the megawarcs, or is my poor machine in hell again?
[22:08] "pre-cook"
[22:08] The halo should be as slow as possible.
It's filling drives
[22:09] Well, I can try to have multiple items per item, but I'm not sure how long the item name can be
[22:09] Yeah, pre-cook. In ideal world, they're turned into megawarc before going on FOS. If not, keep it slow.
[22:09] Let's put it this way.
[22:09] Halo is killing FOS.
[22:09] KILLING IT.
[22:10] chfoo: is there a limit on the length of the item names?
[22:10] *** toad has joined #archiveteam
[22:10] SketchCow: we'll run halo slow
[22:11] SketchCow: it might be better to ask another AT member if he/she wants to help us out with the rsync?
[22:12] hmmmm. halfbaked idea, megawarc'ing in the process that receives incoming warcs
[22:13] garyrh: xmc: I generally find the IA forums to be the most... "insane" part of IA
[22:15] midas: would you be able to help us out with the Viddy project with some rsync space and processing power for the megawarcs?
[22:16] *** fluff_ is now known as fluff
[22:16] I have to go to a movie, then drive north to the compound.
[22:16] But I'm happy to go into the ridiculous requirements of things.
[22:18] I mean, right now, SWIPNET and VERIZON are no longer in horrendous directories filled with hundreds of thousands of individual files
[22:18] Now they're in massive processes of turning 3-4 directories into megawarcs.
[22:18] All these are taking days.
[22:20] what's the resource constraint in the megawarc process?
[22:20] io wait?
[22:20] cpu time?
[22:20] so can FOS take viddy? Right now I'm hearing that FOS is being killed by halo, is being killed by nightmare directories, and those things. So I'm starting to think it might be better to have this project not on FOS
[22:21] it might be a good idea for a feature kind of warrior thing, that people can volunteer as Megawarc creators.
[22:22] The main constraint with FOS is that it is doing about 10-20 things.
[22:22] They need to have at least 100 GB, the files are going to those volunteers, they create the megawarcs, the megawarcs are sent to FOS
[22:22] volunteer megawarcing could be a solution, although not everyone has the capacity to create 50gb warcs to send to fos/ia
[22:23] different size options would be needed. 10/20/30gb etc
[22:23] I know, so it needs to be some separate project people can select
[22:23] the problem with that is that the files can be deleted by the volunteer
[22:23] Some of those are very simple (download and later upload DNA Lounge) and some are nightmare (literally, copying ONE USER'S directory of swipnet grabs took this machine three solid days. just cp -v)
[22:24] OK, so.
[22:24] In an ideal world, we could probably have items creating 5 or 10gb megawarcs.
[22:25] Upload those to FOS, and then it can upload into the site.
[22:25] Like, Halo isn't hard, but it's pulling in a lot.
[22:26] And right now, I'm watching Swipnet go for days, making a megawarc
[22:30] I just wrote a couple scripts to make things more efficient, and I can shore things up more, but it's still making megawarcs of massive amounts of 1mb items
[22:30] arkiver: no limit on item names i think. if you want to use long item names, you'll need to do something like sha1 digest to keep the filenames from exceeding the limit
[22:30] That will always kill the machine, and it got to the point I'd see things take hours to do, like a ls
[22:32] One mistake on my part was that the ia uploader now has a nice --checksum and --delete pair.
[22:32] So if something is uploaded, it will delete it off the drive, having verified it got through.
[22:32] That will help a LITTLE.
[22:36] Since this mess started, the developer of the ia client has added a bunch of stuff for me to make life easier
[22:36] Was using it for archivebot, now the uploaders will too.
[22:38] I see archivebot's up to a terabyte buffer
[22:43] SketchCow: I'm going to do this.
Tomorrow in the afternoon I'll start viddy, I will then use FOS's rsync. But in the meantime I'll try to find an rsync other than FOS. If I have another place to send the files to I'll disable the upload to FOS and continue with the new upload target.
[22:44] chfoo: thanks, I'll see what I'm going to do with those item names
[22:52] Viddy project channel: #viddiot
[22:58] *** slash` has quit IRC (Quit: Coyote finally caught me)
[23:00] *** slash` has joined #archiveteam
[23:33] i've made notes on some sites that are currently shutting down (ziplist, nokia memories, etc.): http://paste.archivingyoursh.it/micevefine.avrasm
[23:34] ziplist seems to be the easiest to archive, then exfm
[23:34] brace.io reminds me of jux
[23:34] and relay.im and nokia memories will be the trickiest for discovery
[23:35] *** mistym has quit IRC (Read error: Connection reset by peer)
[23:36] here's the correct link: http://paste.archivingyoursh.it/sebugupute.avrasm
[23:36] first one has an error
[23:36] *** mistym has joined #archiveteam
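
Editor's sketch of the 22:30 suggestion about long item names: when several random ids are packed into one item, the joined name can get too long, so it can be swapped for a SHA-1 digest of the ids. This is only a rough illustration, not what the Viddy project actually ran. The function name `item_name`, the `viddy` prefix, and the 100-character limit are all assumptions made for the example; no actual limit was given in the channel.

```python
import hashlib

# Hypothetical limit for illustration only; the channel never stated a real one.
MAX_NAME_LEN = 100


def item_name(prefix: str, ids: list[str], max_len: int = MAX_NAME_LEN) -> str:
    """Name an item after the ids it holds, using a SHA-1 digest if too long."""
    joined = prefix + "-" + "-".join(ids)
    if len(joined) <= max_len:
        # Short enough: keep the readable name.
        return joined
    # Too long: hash the sorted ids so the same set always gets the same name.
    digest = hashlib.sha1("\n".join(sorted(ids)).encode("utf-8")).hexdigest()
    return f"{prefix}-{len(ids)}ids-{digest}"


if __name__ == "__main__":
    # The example id is the one quoted at 10:13; the rest are made-up variants.
    one = ["f874d7f4-b3fb-4bdc-b83f-2f944cf290ac"]
    many = [f"f874d7f4-b3fb-4bdc-b83f-2f944cf290a{i}" for i in range(5)]
    print(item_name("viddy", one))
    print(item_name("viddy", many))
```

A digest name is fixed-length no matter how many ids go into the pack, but it is not reversible, so a separate index mapping each digest to its ids would still be needed.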