[10:17] what do you guys do when trying to archive sites that could have say 500,000+ pages, or even over 1,000,000 in the case of a large forum?
[10:18] wget uses quite a bit of memory when doing recursive retrievals, or anything with -E or -p turned on
[10:19] I shouldn't say quite a bit. Rather, it uses the appropriate amount to get the job done, which just happens to become larger and larger when dealing with big sites.
[10:20] I use httrack for large sites. It has much better memory management
[10:20] really, hmm
[10:20] the documentation was really poor for httrack last time i checked
[10:21] or rather it wasn't nearly as explanatory as wget
[10:21] a member of the community had written the doc, rather than the actual author of the app
[10:22] I have always avoided forums in my website archives since... they are just too big and would butcher my drives. However lately I might archive some, and just package them up on the server and not do any post processing work on them.
[10:23] Whatever I get is what I get
[10:23] the only issue though is when the forum grab starts getting duplicate stuff, like hitting jump links to individual posts in a phpBB2 forum.
[10:24] that is where the more advanced filtering in httrack comes in
[10:25] ah ok cool, I will have to take another stab at fully learning httrack when I decide to start hitting some big forums
[10:26] you can do domain, subdomain, file format, directory depth and regular expression matching with no limitations on how many rules you create
[10:27] one thing I have been loving lately is using a RAM disk to extract content for post processing and packaging
[10:28] ah yea I could have some use for that level of granularity, especially with sites that are like user.domain.com and domain.com/user/, where the admin hard-linked content in his HTML from either one
[10:28] the domain scoping in wget is just -D
[15:44] joepie91: I want a MP10 powerhead and controller... only £200+!
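
The kind of rule-based filtering described above for the forum case (domain, subdomain, directory depth, and regular expression rules) could be sketched roughly as follows. This is not httrack's or wget's actual rule engine, just a minimal illustration. The host name `example.com`, the depth limit, and the assumption that per-post jump links look like `viewtopic.php?p=<number>` are all placeholders and would need checking against the real forum.

```python
import re
from urllib.parse import urlsplit

# Hypothetical settings for illustration only.
ALLOWED_DOMAIN = "example.com"   # placeholder domain; subdomains are allowed too
MAX_DEPTH = 3                    # placeholder limit on path segments

# Assumed URL shape for jump links to a single post in a phpBB2 forum.
# These usually duplicate content already reachable through the topic pages.
EXCLUDE_PATTERNS = [
    re.compile(r"viewtopic\.php\?.*\bp=\d+"),
]


def should_fetch(url):
    """Return True if the URL passes the domain, depth, and regex rules."""
    parts = urlsplit(url)
    host = parts.hostname or ""

    # Domain / subdomain rule: accept example.com and anything under it.
    if host != ALLOWED_DOMAIN and not host.endswith("." + ALLOWED_DOMAIN):
        return False

    # Directory depth rule: count non-empty path segments.
    depth = len([segment for segment in parts.path.split("/") if segment])
    if depth > MAX_DEPTH:
        return False

    # Regular expression rules: reject anything matching an exclude pattern.
    return not any(pattern.search(url) for pattern in EXCLUDE_PATTERNS)
```

With these placeholder rules, a topic page such as `http://forum.example.com/viewtopic.php?t=12` would be kept, while a per-post link such as `http://forum.example.com/viewtopic.php?p=345#345` would be skipped.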
[15:46] errr wrong channel and person!
[15:53] :P
[15:53] SmileyG: classy
[16:15] so, got a question for anyone else who has had an SSD die on them: did you get any kind of warning, or know it was dying before it died?
[18:09] dashcloud: never had one die myself. supposedly wear leveling on the Intel ones is supposed to make them go read-only when they'd run out of spare sectors, but i don't know if that actually works or they lose the remap table sectors first
[18:09] which kills the ssd
[18:09] or more specifically is like losing the FAT of a filesystem; the data is all there, you just have no idea what order it's supposed to be in
[18:27] the first notice I had that something was wrong was turning the laptop on, and wondering why it's sitting at the logo screen for so long
[18:36] heh
[18:36] sucks, hope you got backup and this is why I don't trust SSDs yet.
[18:37] All hard drives fail and this is why frequent backups are necessary
[18:37] yes but spinning rust has a generally well known failure style
[18:37] unless you hit a power spike, or punch your PC.
[18:39] That said, for some hard drive failures checking the SMART data frequently can clue you into failures
[18:39] SMART does not catch all problems but it is far better than what we used to have
[18:41] You still have the beginning of the bathtub curve failures which usually go undetected till they happen
[18:54] SmileyG, I have a VAIO Z, 3rd gen with all the trimmings. It is a powerhouse laptop, with a quad core i7 (desktop power, not low voltage cpu), 8GB of RAM, 1080p display that has 98% Adobe RGB color gamut reproduction. The Power Media Dock that it connects to has a Radeon 7670M, USB 3.0 ports, can handle 4 connected displays, and the thing is so light it would blow your mind.
[18:55] And?
[18:55] The SSD is proprietary Sony NAND Flash memory in a RAID 0 config, and might even be soldered to the motherboard.
[18:55] I run gentoo and boot in 3 seconds
[18:55] :D
[18:55] If that SSD dies... it's a very very expensive brick.
[18:57] it's actually light? I'd imagine a desktop replacement like that would weigh a considerable amount
[18:57] So I have been taking every precaution to minimize writes to the SSD itself. I have been trying to treat it as read-only as much as possible, and push everything off to an external 2TB USB 3.0 HD, as well as using a 2GB RAM disk.
[18:57] the good news about the SSD is it's still under warranty, so I'll get a replacement - still sucks having it die suddenly, and needing to reinstall everything
[18:57] It's 2.5 lbs
[19:01] http://www.mobiletechreview.com/notebooks/Sony-Vaio-Z-2012.htm
[19:02] dashcloud: yea if you can replace the drive, then that is great, it would be crummy if the laptop ended up becoming unusable
[19:03] running off of a live USB drive right now. Found Youtube's html5 player pretty good (better than I expected)
[19:04] If you can, carve out a section of your RAM disk and use that to move tmp/temp dirs and partitions off the SSD, and also use it as scratch space to extract packages that might have thousands of files in them.
[19:11] So far I am mostly experienced with optimizing Windows 7 for minimizing SSD writes, and I have taken it really far. Moving everything from tmp/browser cache to RDP Bitmap cache, killing Office recent files, even killing all types of other unnecessary writes like Beyond Compare's BCState.xml.tmp.
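
The scratch-space idea from the [19:04] message (extracting archives with thousands of small files onto a RAM disk instead of the SSD) might look something like this on Linux. This is only a sketch under assumptions: `/dev/shm` is used as the RAM-backed location because it is a tmpfs mount on most Linux systems, the function name `extract_to_scratch` is made up for illustration, and the archive is assumed to be a zip file.

```python
import tempfile
import zipfile
from pathlib import Path

# Assumption: /dev/shm is a RAM-backed tmpfs mount (true on most Linux systems).
SCRATCH_ROOT = Path("/dev/shm")


def extract_to_scratch(archive_path, scratch_root=SCRATCH_ROOT):
    """Extract a zip archive into a fresh directory under the scratch root.

    If the scratch root does not exist, fall back to the system default
    temporary directory, which may or may not live on the SSD.
    """
    base = scratch_root if Path(scratch_root).is_dir() else None
    workdir = Path(tempfile.mkdtemp(prefix="scratch-", dir=base))
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(workdir)
    return workdir
```

Anything extracted this way is gone after a reboot, so the packaged result still needs to be copied to persistent storage once post-processing is finished.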
[19:12] If you have PowerISO running with no disc in the drive, it writes over 3,000 log entries per day telling you it can't find a disc in the drive lol
[19:12] But, I have started looking up some stuff for Linux, and here is a good starting point:
[19:12] http://superuser.com/questions/228657/which-linux-filesystem-works-best-with-ssd
[19:13] I still need to find more links, but that url is pretty meaty
[19:18] and SmileyG: 3 sec boot is awesome :D nice
[19:36] dashcloud: no warning for me, but the SSD isn't actually dead
[19:36] dashcloud: it just pops in and out occasionally -- I suspect it's a controller problem
[19:36] I've only had problems with OCZ drives :P
[19:36] the Intel X25-Ms I've had for about three years now are still going fine
[19:37] how big a drive yipdw ?
[19:38] omf_: 240 GB
[19:39] I use it for ephemeral VMs and a Steam installation
[19:39] so it was a surprising non-event when it went :P
[19:39] was like "huh, ok" *reboot* "oh there it is"
[19:40] how many years did it last?
[19:40] less than one, though it's still working
[19:40] the X25-Ms just passed three
[19:41] I am trying to figure out the ideal size
[19:41] I've been okay with 64 GB drives
[19:41] that's on my laptop, though, which mostly hosts source code
[19:42] the desktop has two 80 GB SSDs as well as that 240 GB
[22:28] looks like i'm grabbing old articles of dailymail.co.uk that are really from femail.co.uk
[22:28] even the id number of the article is the same
[22:54] yes, I think Femail is the Daily Mail's women's supplement
[22:54] or some such bullshit. I try and avoid the Mail as much as possible, lest my brains start dribbling out my ears
[22:55] I don't mind reading Daily Mail articles, because I have AdBlock enabled on their site, so I'm costing them money.
[22:56] Additionally, they can serve as reliable news.
[22:56] Just assume that the opposite of whatever they say is true, and bam, reliable news
[22:57] i'm only going after the first 100000 articles
[22:58] there are over 2.5 million article ids to check
[22:58] and i don't want to do that much
[22:59] so the first 199 episodes of destructoid are uploaded
[23:00] i'm downloading the 2xx episodes right now
[23:00] also geekbrief tv is going to get uploaded
[23:01] i decided to use the basename of the video files
[23:02] one of my bosses is of the opinion that the Daily Fail is more truthful than mainstream. :-\
[23:03] (he's also been (still is?) a truther. and it seems he's going down the conspiracy hole.)
[23:03] also out of 20000 ids there are only about 5500 that are real articles on the site
[23:04] truthers are a real nutty group
[23:06] also i think the truthers got their theory from a failed x-files spin off
[23:15] signs of an empire in decline imo
[23:15] i'm thinking the same thing with revision3
[23:16] trying to grab like everything that i can from it
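
For the article ID figures quoted above (about 5500 real articles out of 20000 IDs checked), a rough projection of how many articles the larger ID ranges might hold can be worked out as below. It assumes the hit rate stays the same across the whole ID range, which nothing in the chat confirms, so treat the numbers as ballpark only.

```python
# Figures taken from the chat: roughly 5500 real articles in 20000 checked IDs.
SAMPLED_IDS = 20_000
REAL_ARTICLES = 5_500


def expected_articles(id_count):
    """Project the article count for id_count IDs, assuming a uniform hit rate."""
    return id_count * REAL_ARTICLES // SAMPLED_IDS


hit_rate_percent = 100 * REAL_ARTICLES / SAMPLED_IDS
print(hit_rate_percent)               # 27.5
print(expected_articles(100_000))     # 27500
print(expected_articles(2_500_000))   # 687500
```

So under that assumption the first 100000 IDs would yield somewhere around 27,500 articles, and checking all 2.5 million IDs would yield roughly 687,500.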