[00:23] awww hell wtf
[00:23] * Smiley ponders what to tell the doctor tomorrow
[00:23] "HI, most of the time i feel completely normal, apart from those times I wish I didn't exist."
[00:23] You have moments that make you anxious for no reason
[00:24] yes
[00:25] Dr Cope << hahaha
[00:25] That is how you should explain it and then give a few examples.
[00:25] yes.
[00:25] yes I should
[00:27] from twitter: Allow me to donate this slogan to any evanescent startup with no business model beyond acquisition. "We'll make you the star in rm -rf *"
[00:27] yes dashcloud
[00:49] Wikis must drive OCD people nuts
[01:10] yeah I put in like 7000 spelling and formatting edits to wikipedia before basically giving up
[01:18] I think our wiki is a lot better than it was this time last year
[01:19] More content and much less spam
[01:19] so I was just handed a printout of someone's geocities site that archiveteam saved. He was very happy.
[01:23] awww
[02:29] Looking at the 60-day history, there was a fuck ton of backing off http://archive.org/stats/s3.php#60d
[03:19] 753 of the top 1 million sites are google
[03:20] #2, #12, #23, #25, etc....
[04:35] On a 2gb 2core butt I can grab 7 screenshots at a time
[04:35] CPU usage being the primary limiter
[04:36] I am slowly scaling this up to see if any bottlenecks arise
[04:36] The end goal is being able to just fire this up and capture any number of pages in a reasonable amount of time
[04:37] I still cannot figure out why images were not showing up from amazon on the posterous test
[04:57] what is the best way to resolve a large batch of URLs that are being redirected? aka have been "moved"
[04:58] is there a better way than wget spider?
[04:58] or cleaner I should say
[05:07] are you just trying to get the resolved urls?
[05:09] yep, i am using an app to check the status of a large URL list, which kicks back 404 not found, "ok", timed out, no connection, no such host, etc
[05:10] and some are 301 object permanently moved, or 302 object temporarily moved
[05:10] and I want to just get back where they were moved to
[05:13] how big a url list?
[05:19] Hmm, anywhere from 50 to 500,000
[05:19] i think on average it would be 100-1000 URLs that I would be trying to resolve
[05:22] I am basically using this app to chew through sites I have archived for any externally linked URLs, and I run a check on those, looking for more sites to archive, and I do this on each site.
[05:22] The list of potentials grows and grows
[05:22] Yeah I am dealing with that kind of problem right now
[05:22] I am testing out 7 urls at a time
[05:24] I use a windows app called XENU to do URL checking
[05:24] Scaling up the server size will help but I am looking into dns caching as well
[05:27] I could use wget to spider, and set a depth limit, causing it to resolve, but the verbose output is too much.
[05:27] I would love to just get a straight list somehow
[05:28] i might have to do that though, and pump the output through sed or something
[05:28] just that I know how to do that in theory, not necessarily in practice off the top of my head lol
[05:28] as i am not a wizard at regex yet
[06:09] Ah yeah Xenu is really good.
[06:15] instence, so if they give a 3xx HTTP result, the script needs to just say where the new location is?
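(The wget-plus-sed idea floated above could be sketched as a one-liner along these lines. This is a guess at the plumbing, not a command from the log, and the sed pattern would need checking against real wget output:)

    # --spider checks URLs without downloading bodies, -S prints the
    # response headers (to stderr, hence 2>&1), and sed keeps only the
    # Location: lines, i.e. the redirect targets
    wget --spider -S -i urls.txt 2>&1 | sed -n 's/^ *Location: //p'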
[06:17] yea
[06:18] I export what is in xenu to a tab list and then into a spreadsheet
[06:19] basically i want to just take the list of URLs that are 3xx and resolve them, to find out their new location
[06:20] Well whilst I was waiting for you to respond I made a Python script using some code I stole off of stackexchange, like usual, which kind of does that
[06:20] http://brayden.ur.cx/redirect.py
[06:20] it'll output 1 on the first line if a redirect then on the 2nd line it'll include the new location
[06:20] otherwise 0
[06:20] using sed it would be something like sed -n1p or whatever it was to get the result
[06:21] ah I was close
[06:21] sed -n 1p
[06:21] for line one, apparently
[06:21] Is the spreadsheet just a csv or is it a proper excel one?
[06:22] well, it can be anything really, as what gets dumped from xenu is just a tab delimited file
[06:23] i usually drop it into a spreadsheet just to do sorting and contains searches, categorizing what is there
[06:23] i would filter by 3xx and just select the urls, paste to a txt file, and want to run that list through something to get the resolved URLs
[06:25] thanks for the python example
[06:25] parsing tab delimited files should be pretty easy.
[06:25] just making a thing for that now
[06:26] hmm almost wonder if I could do something like that in php as well, as I have written scraper scripts in php that work great
[06:26] this is cool cause it will get me into learning a bit of python, which I haven't worked with before yet ;)
[06:26] Can you provide an example of a line from the tab delimited file?
[06:27] I don't even know if I would process the tab delimited file. As I would do my sorting first and just select the range of URLs that I want to resolve, so honestly whatever the script targets, it could be a txt file with just a single list of URLs
[06:28] so urls.txt, and just 100 URLs or whatever
[06:30] C:\Users\brayden\PycharmProjects\PythonXMLTest>python show-redirect.py urls.txt
[06:30] 1 http://www.google.com/
[06:30] 1 http://archiveteam.org/index.php?title=Main_Page
[06:30] 1 http://www.google.com.au/
[06:30] seems to be working
[06:31] awesome
[06:31] http://brayden.ur.cx/redirect.py just updated this one
[06:31] it has no error checking though or anything like that
[06:31] if a site goes down or times out for whatever reason it will crash
[06:31] really easy to fix though if that becomes a problem
[06:32] cool, at least if something does bork I will know why
[06:33] this is great though
[06:33] thanks a bunch for the script
[06:33] no worries. Python is a really easy language to learn so you should try it
[06:34] quick question: does it treat tabs/spaces as contextual? since there are no curly braces? like CoffeeScript?
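(redirect.py itself isn't quoted in the log. As a rough reconstruction of the behaviour described above — each line of urls.txt goes to a testUrl function, a redirect prints 1 plus the new location, otherwise 0 — a stdlib-only Python 3 sketch might look like the following. Beyond the testUrl name and the output format, everything here is guesswork, not the actual script:)

    import sys
    import urllib.error
    import urllib.request

    class NoRedirect(urllib.request.HTTPRedirectHandler):
        # returning None stops urllib from following the redirect, so
        # the 3xx response surfaces as an HTTPError we can inspect
        def redirect_request(self, req, fp, code, msg, headers, newurl):
            return None

    opener = urllib.request.build_opener(NoRedirect)

    def testUrl(url):
        # returns the Location header for a 3xx response, else None;
        # deliberately no handling for dead hosts or timeouts, matching
        # the log's caveat that those crash the real script too
        try:
            opener.open(url, timeout=30)
        except urllib.error.HTTPError as err:
            if 300 <= err.code < 400:
                return err.headers.get("Location")
        return None

    if __name__ == "__main__":
        for line in open(sys.argv[1]):  # e.g. python show-redirect.py urls.txt
            url = line.strip()
            if not url:
                continue
            location = testUrl(url)
            if location:
                print("1", location)
            else:
                print("0", url)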
[06:35] for every line in the text file it'll just send the text off to the testUrl function
[06:35] oh i mean for the python code itself
[06:35] like the syntax
[06:35] oh
[06:35] it uses indenting
[06:36] after every : you raise the indenting level
[06:36] ok cool
[06:36] It doesn't use braces or anything
[06:37] this is the perfect thing to get me into experimenting with python, as I can already clearly see what this script is doing and it relates to stuff I need to do
[06:37] so I can probably work from this, expand on it, and branch out into other tasks I need done
[06:37] so looks like this weekend I will be getting into python a bit ;)
[06:37] I'll be here around this time for the weekend if you need to ask anything
[06:38] ok cool, i will most likely be up Fri/Sat night all night working on various archiving tasks
[06:39] so I will ping you if I have any questions
[06:39] I am going to get python up and running locally tomorrow night and test out this script for myself
[06:39] yeah well python is really easy to set up on windows/linux so you shouldn't have any trouble
[06:40] this just uses standard libraries
[06:41] I have set up python before, for some application dependency in the past. It was pretty straightforward.
[06:44] Do you just use notepad++ or something like that to write it?
[06:48] like as far as preferred editor?
[06:49] well what are you planning on using?
[06:50] well, I am sort of IDE agnostic at the moment. At work I use a combination of eclipse, notepad++, and Dreamweaver (coder mode only, mainly because you can custom collapse any selected region of code)
[06:50] though I have ambitions to learn VIM and migrate to that
[06:50] I am a designer/front end developer
[06:52] if I write python it might be in notepad++ for now
[06:54] Well PhpStorm can collapse bits of code and I bet it has a hell of a lot better completion than Dreamweaver, as well as being way cheaper.
[06:55] and having Linux/Mac/Windows support
[06:55] Personally I have a lot of trouble remembering things so I use a fully fledged IDE
[06:55] PyCharm is the one I use. It is pretty expensive and I'm a student so can't really afford it.
[06:55] Had to go via other means :(
[06:55] but they have a 30 day trial
[06:56] A lot of nice features are shared between editions, for instance, I'm doing a tornado template and it has detected it is HTML syntax.
[06:56] and providing me with proper completion etc.
[06:56] it even was nice enough to download the twitter bootstrap js and provide completion with that!
[06:57] https://www.jetbrains.com/ anyway it is this lot. They do some seriously good software!
[07:00] cool I will have to check it out
[07:01] I actually have been shying away from hinting/completion aside from auto-closing html tags and auto-generating blocks of preformatted code
[07:01] Why?
[07:01] basically because I found myself getting too reliant on the auto completion, and i didn't really know the languages as well
[07:01] Well yeah there is that.
[07:02] But that's not really a problem of using auto completion, I reckon at least.
[07:02] so to force myself to remember and learn the languages better I have been just referring to api documentation and writing as much as I can
[07:03] yea it's not that auto completion is the problem per se
[07:03] and few auto completions are powerful enough to let you get away with that anyway
[07:03] I still have to go through module docs all the time.
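(A self-contained toy example of the indentation rule described above — each colon opens a block, and the block's extent is given by indentation rather than braces; the function and data are made up for illustration:)

    # count how many status codes in a list are 3xx redirects
    def count_redirects(status_codes):
        redirects = 0
        for code in status_codes:        # for-block starts after this colon
            if code.startswith("3"):     # nested block, one indent deeper
                redirects += 1
        return redirects                 # dedenting back out ends the blocks

    print(count_redirects(["301", "200", "302"]))  # prints 2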
[07:05] I am going to check out PyCharm
[07:05] looks interesting
[07:06] They made a nice theme for it too which is pretty easy on the eyes
[07:06] this whole "darcula" thing they're integrating into their stuff
[07:06] lol
[07:06] nice name
[09:30] so i found another episode of gamespot tv
[09:30] i really wish these rips were at 165kbps
[09:31] maybe not that great but still a lot better than dialup
[10:53] yes
[10:53] found another episode of call for help
[11:13] psychiatric nurse :o
[13:53] raaawr
[14:06] Raring rawrtail
[14:09] another gaming gem rescued from the dustbin of history http://archive.org/download/Nextys_Archive/OW__B.ISO/Butt_Slam%2FBUTTSLAM.ZIP
[14:10] Whew, back
[14:10] Butt slam :D Bwahaha
[14:10] http://www.theverge.com/2013/5/3/4294548/tears-in-rain-how-snapchat-showed-me-the-glory-of-fading-data
[15:03] Hackers for Charity documentary has 2 days left http://www.kickstarter.com/projects/1456247168/hackers-in-uganda-a-documentary?ref=live
[15:39] I won't support it
[15:39] They basically dumped in 5k of their own money to guarantee investment
[17:57] http://www.forbes.com/sites/andygreenberg/2013/05/03/this-is-the-worlds-first-entirely-3d-printed-gun-photos/
[18:28] Thingiverse dropped the funs
[18:28] guns
[18:28] Talked to Bre about it
[18:28] Told him I expected it, I said I expect Guniverse within seconds and he says there's already one
[18:33] haha
[18:33] http://i.imgur.com/KBCsbVi.gif
[19:14] I am hitting the final stretch of having refreshed all my backups. It is a huge relief. Sometimes you delete a few old things; most of the time you add more.
[19:22] only 21gb left to go through
[19:23] I really should get more drive trays but they are $17 each.
[19:26] I mention this because a friend just had his 3rd hard drive failure since I have known him and he lost everything again
[20:04] Ok so another week off work :/
[20:05] however that means I may be crunching again late night, as that seems to be the time I become active here.
[20:05] However right now my only concern is this vodka.
[20:46] soultcer: Haha, thanks for the chocolate! Awesome packaging
[20:49] Super tasty :3
[20:51] :O
[21:00] http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/100000/80000/3000/300/183359/183359.strip.gif
[21:11] there already was defcad.org
[21:48] woo got all of simtelnet on my laptop
[21:48] 1997 me would be so jealous
[21:49] seems to be missing a lot of games compared to what I remember though
[21:49] the mirror at ftp.riken.go.jp is the same so maybe they cleared them out at some point