[00:43] Do we, as heavy computer users, really work out enough?
[00:43] I wouldn't be surprised if everyone in this IRC channel is unfit, myself included.
[00:44] Do things like standing desks help out?
[00:49] ha ha.
[00:49] Well, I have a Fitbit scale and Fitbit wearable.
[00:49] I have been low carb for over a week
[00:50] CPAP machine and blood pressure meds to keep that under control
[00:50] Oh I didn't realize you dropped carbs AND caffeine. You must enjoy sleeping at least for a week or two
[00:51] No sense in going low carb and drinking diet soda
[00:52] I gave up caffeine over a year ago and it was really fucking hard. I was at 8+ cups a day for 11 years
[00:52] Low carb worked for me as well but both at once, I give you credit man, that is hard
[01:03] archiveteam-health
[01:03] (Don't go there)
[01:03] (It's full of fried recipes)
[01:03] SketchCow: i have mirrored cultofmac articles
[01:04] it's about 469.3mb
[01:18] Bravo
[01:19] still have to go after images
[01:21] So you backed up the pages but not the images... why not get both at once?
[01:21] i do more than one archive
[01:21] first the index so i can get the links
[01:21] then the articles
[01:21] speaking of which
[01:22] I had a bit of an issue wget-warc'ing engadget
[01:22] it... ran out of RAM
[01:22] how fix?
[01:22] engadget is done
[01:22] well yes, but assuming a theoretical future crawl
[01:22] that might have the same issue
[01:22] (did you get joystiq and their subsites, btw?)
[01:22] (massively and wow in particular)
[01:22] i did based on years
[01:23] after i grabbed the pages
[01:23] I see
[01:23] is there any way to do a 'regular' warc but have it store whatever it's throwing into RAM, in some other place?
[01:23] (I assume the URL list?)
[01:24] godane, How is that more efficient than using the spidering features built into most website mirror software?
[01:24] joepie91, have you looked at the wget code?
[01:24] it was mostly so i didn't get hit by the 4gb wget warc limit
[01:25] and my wifi drops sometimes
[01:25] I have 20gb warcs I made with wget. That sounds more like a 32bit problem
[01:25] or you're running on Windows
[01:26] omf_: I have not
[01:26] also this gives order to things
[01:26] I was more looking for a command line switch kind of thing
[01:26] joepie91, that does not exist
[01:27] irony http://www.nature.com/nature/journal/v497/n7448/full/497183a.html
[01:27] Paywalls all the way down
[01:28] godane, how does having things in "order" help at all? When this gets shoved into the wayback machine it doesn't matter
[01:28] it helps me do it in warc-proxy
[01:29] it takes a very long time to make idx files on the fly locally
[01:29] and again my wifi sucks sometimes
[01:29] or my internet sucks sometimes
[01:29] Is wifi all they offer in your area?
[01:29] we have cable
[01:30] but my room is not where the cable is
[01:31] also i don't like uploading 20gb files
[01:31] even 5gb to 10gb files i try my best to not grab cause it takes too long
[01:32] So you put up with wifi drops and change your workflow to work around them when a few dollars in cable and some time would fix the problem permanently.
[01:32] even on a wired connection I've had hiccups uploading 20gb stuff to IA with consumer-level upstream
[01:33] my dad will not like 60ft of cable running across the living room
[01:33] So have I, but that happens way less than bad wifi signal
[01:34] again this is my way of doing things
[01:34] i don't need the #comments pages
[01:34] They sell these little plastic hooks with a sticky side at Lowes or Home Depot. They are paint safe and easy to remove. I have used them to run line on a ceiling so it is out of the way
[01:34] or #top
[01:34] cause they're the same page
[01:34] again i will not do that
[01:35] wifi is my only option without me going with a netbook into that room
[01:36] omf_: paint safe? that's a blatant lie.
[01:36] chronomex, I have removed them after 3 years use and no paint damage
[01:36] depends on the paint I guess
[01:39] also based on the link list i cut my mirror to half cause there are no bad or double links to the same story
[01:39] all cause of a #comments like link
[01:43] godane, URL fragments (aka #comment) can be handled by modern spiders automagically.
[01:44] again i want to upload small sites
[01:44] this made cultofmac small
[01:44] we have the articles
[01:45] also just spidering will get crap like this
[01:45] http://cdn.cultofmac.com/wp-content/themes/com2012/i/a/gu
[01:46] it's a broken link but cultofmac.com redirects
[01:46] more data
[01:46] which makes it harder for me to upload
[01:46] should have been this link: http://cdn.cultofmac.com/wp-content/themes/com2012/i/a/guest_post.jpg
[01:46] But that redirection information can be useful. What if those links are used on another site? Leaving them out means your grabs are incomplete
[01:47] BlueMax, they just make printers
[01:48] a few people got in trouble for hosting the files for 3d guns
[01:48] ah right OK
[01:48] there are other 3d printer makers as well, all trying to make the best printer
[01:48] that redirect is to something i grabbed in my article dump
[01:48] If I had the money spare I would own one
[01:48] no need to do a 2nd grab in my image dump
[01:49] It doesn't matter. If the warc does not have information that the url you left out is a redirect then there is no record of that path to that content.
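A rough sketch of the link-list cleanup discussed above, where `#comments` and `#top` links point at the same page as the story itself. This is not godane's actual script, which never appears in the log; the function name `dedupe_links` and the sample URLs are made up for illustration. It assumes the link list is already in memory as strings, and uses the standard library's `urllib.parse.urldefrag` to strip the fragment before keeping the first occurrence of each page.

```python
from urllib.parse import urldefrag


def dedupe_links(links):
    """Strip URL fragments and return each page once, in first-seen order."""
    seen = set()
    unique = []
    for link in links:
        # urldefrag splits "page#fragment" into (page, fragment).
        page, _fragment = urldefrag(link.strip())
        if page and page not in seen:
            seen.add(page)
            unique.append(page)
    return unique


if __name__ == "__main__":
    # Hypothetical sample links, not taken from the real index dump.
    sample = [
        "http://www.cultofmac.com/example-story/",
        "http://www.cultofmac.com/example-story/#comments",
        "http://www.cultofmac.com/example-story/#top",
        "http://www.cultofmac.com/another-story/",
    ]
    for url in dedupe_links(sample):
        print(url)
```

This only trims the list of URLs to fetch. As pointed out in the log, anything left out of the crawl (a redirecting broken link, for instance) leaves no record in the resulting warc.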
[01:50] plus if you deduplicate before uploading then content fingerprinting cleans things up
[01:50] i talked about dedup with the same warc
[01:50] wget warc doesn't work that way
[01:51] it can't figure it out unless it's in an older dump
[01:51] that you point the cdx file to
[02:03] Why not use warcdump to map and then warcfilter to clean it up? I assume that is why those warc tools were written
[02:06] i never use those tools
[02:06] will try at some point
[02:06] but right now i'm doing it my way
[02:15] so, does anyone know of stories, videos, etc on the history of the slow-cooker (I guess Crock-Pot is the most well-known of the bunch)? It must have been a fascinating invention at the time: a cooking appliance you can safely leave on for 8 hours and not worry about something catching fire
[02:15] I like how rice cookers can be used like crockpots now
[02:17] apparently some of the Japanese models play a short bit of music once you successfully cook the rice
[02:35] I heard some use a form of fuzzy logic to determine cooking time.
[02:53] I know there are a fuck ton of bloggers out there now but how many are about software programming, do you think? A few thousand that are actively updated?
[02:54] I was looking at the government numbers for programmers and it is tiny. Then consider that most of them do not blog, and what exactly is left?
[03:01] so i got at least 40 videos today
[03:01] most are tss previews for shows
[03:01] but weeds out the index so i can search for other stuff
[08:03] So here I am at work
[08:04] Nothing ever changes, this makes me lololol.
[08:26] /dev/sdc1 1.4T 685G 642G 52% /home/tim.bowers/bin/ign/storage
[08:26] yes, warc's
[08:29] "Ain't nobody got time for that"... lol... Huge data
[08:30] norbert79: :D
[08:30] I have about 50+ more metadata files to write :<
[12:51] so i have started to upload the cultofmac.com dumps i have
[12:51] uploaded: https://archive.org/details/www.cultofmac.com-index-20130513
[12:52] also note that the pages may also be in my articles dump too
[13:54] damn, it's quiet in here today?
[14:03] uploaded: https://archive.org/details/www.cultofmac.com-articles-20130513
[14:08] i may be able to get gphoria 2004
[14:08] what is/was that?
[14:09] it was an award show on g4
[14:10] i got the 2006 one on IA
[14:12] funny that mpg can be played right away when it's been downloaded but not mp4
[14:12] right
[14:12] im so bored out of my skull at work and have no clue wtf is going on atm (my brain has fried and I'm honestly lost) I'm just gonna work on metadata
[14:13] damn, newamerica is still fetching.
[14:13] as is pouet.
[14:13] how big is it?
[14:13] waiting for it to tell me :DX
[14:13] pouet!
[14:14] 141x1.1Gb
[14:14] for new america, so far.
[14:14] -rw-r--r-- 1 tim.bowers games 34G May 14 15:14 ./rotavault.ign.com-2013-04-19.warc
[14:15] -rw-r--r-- 1 tim.bowers games 10G May 14 15:15 ./pouet/pouet.net_06052013.warc
[14:25] i'm just glad i didn't have to do that now
[14:25] newamerica that is
[14:31] aye
[15:16] https://twitter.com/mikko/status/334286983193047040
[15:17] http://www.h-online.com/security/news/item/Skype-with-care-Microsoft-is-reading-everything-you-write-1862870.html english version
[18:13] SmileyG, hey yo ;)
[18:14] hey
[18:14] su
[18:14] sup?
[18:14] need some jobs to run?
[18:14] Just currently spazzing out atm so no thanks D:
[18:15] Everything all right?
[18:19] http://yro.slashdot.org/story/13/05/14/0134224/new-prenda-law-shell-corp-threatening-to-tell-your-neighbors-you-pirated-porn
[18:19] I cannot wait for the new round of kick Prenda in court
[18:19] what's left of them? thought they got dismantled by a judge.
[18:20] or was that just their law firm maybe
[18:20] They formed a new shell company and started right back up
[18:20] same people
[18:20] they've done that before, you'd think the judge woulda seen it coming
[18:22] The judge cannot stop them from breaking the law again, only punish them
[18:22] So today it is finally back into the 70s outside, we had a couple days back in the 40s and that sucked
[18:22] looks like the previous punishment was just an $80k fine (and the breakup)
[18:23] yeah it's nice now! actually below freezing last night, ridiculous
[18:27] omf_: hahahaha
[18:27] these guys are just an infinite source of entertainment, aren't they
[18:27] The cool thing is now that the judge has slapped them and they did it again, RICO can be brought in
[18:27] also, completely unrelated
[18:27] linux kernel sploit
[18:27] root priv escalation
[18:27] affects .32 (vswap) openvz kernels
[18:27] if you have an openvz VPS with vswap, I recommend backing up your shit and informing your provider
[18:27] http://www.lowendtalk.com/discussion/10514/linux-kernel-2.6.37-3.8.8-0day
[18:28] supposedly it allows you to gain system root from a container
[18:38] Oh look Ubuntu canceled brainstorm, that's a shock. A distro runs a site for community input and then ignores all that input
[18:40] seems typical
[18:44] omf_: Is setting up your jobs hard? Is it suitable to do in the warrior?
[18:45] If I overhauled the warrior
[18:47] So, that's a no?
[18:50] omf_: btw what are the jobs? :D
[18:50] Running the jobs is easy, the initial setup on Debian and CentOS is pretty involved because of how out of date the software is on those distros
[18:51] SmileyG: Posterous-screenshotting
[18:51] gentoo \o/
[18:51] and just bumping to unstable or testing causes conflicts in the dependencies
[18:51] I can ssh into work and give it a poke
[18:51] though I just signed myself back off work D:
[18:51] SmileyG, You running gentoo at work?
[18:51] yes, I'm fucking epic
[18:51] haven't tried it on there, in theory it should be the easiest
[18:52] indeed
[18:52] give me commands
[18:52] FEED ME MOAR.
[18:52] sorry
[18:52] hyper/headfucked right now
[18:52] you will have to figure out some of the package names since they are changed with every distro
[18:52] nod
[18:52] znurt to teh rescue.
[18:53] imagemagick by any chance?
[18:53] nope
[18:53] bit of python I guess...
[18:53] not even a chance
[18:53] :D
[18:53] Ok cool, just give me commands and set me going with a small set
[18:54] Also job tuning is based on cpu cores, and RAM
[18:56] :O
[18:56] well ram is at a premium atm due to some huge wgets
[18:57] It is more CPU bound
[18:57] much more
[18:57] Ok good xD
[18:57] I can free up a lot of CPU quite easily.
[18:57] Just sorting font packages atm.. I suspect most are already installed.
[18:57] blah
[18:58] my internet is now so fast that my bottleneck is my disk I/O...
[18:58] xD
[18:58] ssd + ramdisks ftw.
[18:58] I'm doing an emergency backup of all my VPSes running on openvz .32 atm
[18:58] just in case
[18:58] I doubt it. The bulk of them are Asian languages. So unless you frequently use Mandarin or Japanese I would think no.
[18:58] And then realise if you can download faster than you can save, you can stream stuff faster than you can consume it.
[18:58] joepie91: learn to QoS also.
[18:58] SmileyG: ?
[18:59] gentoo-portage.com/media-fonts/corefonts isn't loading for me
[18:59] (also, I'm having a race with my internet right now - trying to clean out my disk in time before it fills up)
[19:00] :O
[19:00] pah, let me dig it out
[19:01] http://corefonts.sourceforge.net/
[19:35] i just found Jurassic Park the ride E! Live Premiere Special
[20:00] So bash fails with 100,000 items to glob
[20:00] something new I learned today
[20:02] use xargs
[20:06] usual bash line length is limited to 256 kB
[20:12] I used to compile the kernel so this problem wouldn't happen. I got around it by doing: find . -mindepth 1 -maxdepth 1 -iname "*.png" | zip -0 -@ images.zip
[20:17] Yeah sep332 I had forgotten there was a size limit
[20:17] yeah it happens :)
[20:17] I thought for a minute it was # of items and not size of buffer
[20:28] Some days it is hard to keep track of all the moving parts
[21:33] Fight the power - https://9gag.com/gag/aejoYKv
[21:36] so looks like usatoday.com is not in wayback
[21:49] SketchCow: cultofmac.com is backed up now
[21:49] uploaded: https://archive.org/details/cdn.cultofmac.com-images-20130513
[22:41] i had to fix a desc and link for july 2 2008 episode of buzz out loud in the wiki i'm getting links and descs from
[22:52] Load average: 27.43
[22:52] Not sure if I am doing enough work yet
[22:56] omf_: pft, real men have at least 41
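On the glob problem from the 20:00 to 20:17 exchange: a glob like `*.png` is expanded by the shell into one argument list, and with enough files that list exceeds the operating system's size limit for a command's arguments, which is why the `find ... | zip -@` workaround passes names on stdin instead. A minimal Python sketch of the same idea follows, assuming you only want the files directly inside one directory, stored without compression like `zip -0`. The function name `store_pngs` is made up for illustration, and unlike the `find` command in the log it skips anything that is not a regular file.

```python
import os
import zipfile


def store_pngs(directory, archive_path):
    """Store every *.png file directly inside `directory` in an uncompressed zip.

    Returns the number of files added.
    """
    count = 0
    # ZIP_STORED means no compression, the equivalent of `zip -0`.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as archive:
        # os.scandir yields entries one at a time, so no argument list is built.
        with os.scandir(directory) as entries:
            for entry in entries:
                # Case-insensitive match, similar in spirit to -iname "*.png".
                if entry.is_file() and entry.name.lower().endswith(".png"):
                    archive.write(entry.path, arcname=entry.name)
                    count += 1
    return count
```

Calling `store_pngs(".", "images.zip")` would mirror the command in the log. Because the names never pass through a command line, the number of files does not matter here.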