Time |
Nickname |
Message |
00:23
🔗
|
Smiley |
awww hell wtf |
00:23
🔗
|
* |
Smiley ponders what to tell the doctor tomorrow |
00:23
🔗
|
Smiley |
"HI, most of the time i feel completely normal, apart from those times I wish I didn't exist." |
00:23
🔗
|
omf_ |
You have moments that make you anxious for no reason |
00:24
🔗
|
Smiley |
yes |
00:25
🔗
|
Smiley |
Dr Cope << hahaha |
00:25
🔗
|
omf_ |
That is how you should explain it and then give a few examples. |
00:25
🔗
|
Smiley |
yes. |
00:25
🔗
|
Smiley |
yes I should |
00:27
🔗
|
dashcloud |
from twitter: Allow me to donate this slogan to any evanescent startup with no business model beyond acquisition. "We'll make you the star in rm -rf *" |
00:27
🔗
|
omf_ |
yes dashcloud |
00:49
🔗
|
omf_ |
Wikis must drive OCD people nuts |
01:10
🔗
|
DFJustin |
yeah I put in like 7000 spelling and formatting edits to wikipedia before basically giving up |
01:18
🔗
|
omf_ |
I think our wiki is a lot better than it was this time last year |
01:19
🔗
|
omf_ |
More content and much less spam |
01:19
🔗
|
Aranje |
so I was just handed a printout of someone's geocities site that archiveteam saved. He was very happy. |
01:23
🔗
|
SketchCow |
awww |
02:29
🔗
|
omf_ |
Looking at the 60-day history, there was a fuck ton of backing off http://archive.org/stats/s3.php#60d |
03:19
🔗
|
omf_ |
753 of the top 1 million sites are google |
03:20
🔗
|
omf_ |
#2, #12, #23, #25, etc.... |
04:35
🔗
|
omf_ |
On a 2gb 2core butt I can grab 7 screenshots at a time |
04:35
🔗
|
omf_ |
CPU usage being the primary limiter |
04:36
🔗
|
omf_ |
I am slowly scaling this up to see if any bottlenecks arise |
04:36
🔗
|
omf_ |
The end goal being able to just fire this up and capture any amount of pages in a reasonable amount of time |
04:37
🔗
|
omf_ |
I still cannot figure out why images were not showing up from amazon on the posterous test |
04:57
🔗
|
instence |
what is the best way to resolve a large batch of URL's that are being redirected? aka have been "moved" |
04:58
🔗
|
instence |
is there a better way than wget spider? |
04:58
🔗
|
instence |
or cleaner I should say |
05:07
🔗
|
omf_ |
are you just trying to get the resolved urls? |
05:09
🔗
|
instence |
yep, i am using an app to check the status of a large URL list, which kicks back 404 not found, "ok", timed out, no connection, no such host, etc |
05:10
🔗
|
instence |
and some are 301 object permanently moved, or 302 object temporarily moved |
05:10
🔗
|
instence |
and I want to just get back where they were moved to |
05:13
🔗
|
omf_ |
how big a url list |
05:19
🔗
|
instence |
Hmm anywhere from 50 to 500,000 |
05:19
🔗
|
instence |
i think on average it would be 100-1000 url's that I would be trying to resolve |
05:22
🔗
|
instence |
I am basically using this app to chew through sites I have archived for any externally linked URLs, and I run a check on those, looking for more sites to archive, and I do this on each site. |
05:22
🔗
|
instence |
The list of potentials grows and grows |
05:22
🔗
|
omf_ |
Yeah I am dealing with that kind of problem right now |
05:22
🔗
|
omf_ |
I am testing out 7 urls at a time |
05:24
🔗
|
instence |
I use a windows app called XENU to do URL checking |
05:24
🔗
|
omf_ |
Scaling up the server size will help but I am looking into dns caching as well |
05:27
🔗
|
instence |
I could use wget to spider, and set a level depth limit, causing it to resolve, but the verbose output is too much. |
05:27
🔗
|
instence |
I would love to just get a straight list somehow |
05:28
🔗
|
instence |
i might have to do that though, and pump the output through SED or something |
05:28
🔗
|
instence |
just that I know how to do that in theory, not necessarily in practice off the top of my head lol |
05:28
🔗
|
instence |
as i am not a wizard in regex yet |
06:09
🔗
|
brayden |
Ah yeah Xenu is really good. |
06:15
🔗
|
brayden |
instence, so if they give a 3xx HTTP result, the script needs to just say where the new location is? |
06:17
🔗
|
instence |
yea |
06:18
🔗
|
instence |
I export what is in xenu to tab list and then into spreadsheet |
06:19
🔗
|
instence |
basically i want to just take the list of url's that are 3xx and resolve them, to find out their new location |
06:20
🔗
|
brayden |
Well whilst I was waiting for you to respond I made a Python script using some code I stole off of stackexchange, like usual, which kind of does that |
06:20
🔗
|
brayden |
http://brayden.ur.cx/redirect.py |
06:20
🔗
|
brayden |
it'll output 1 on the first line if a redirect then 2nd line it'll include the new location |
06:20
🔗
|
brayden |
otherwise 0 |
06:20
🔗
|
brayden |
using sed it would be something like sed -n1p or whatever it was to get the result |
06:21
🔗
|
brayden |
ah I was close |
06:21
🔗
|
brayden |
sed -n 1p |
06:21
🔗
|
brayden |
for line one, apparently |
06:21
🔗
|
brayden |
Is the spreadsheet just a csv or is it a proper excel one? |
06:22
🔗
|
instence |
well, it can be anything really, as what gets dumped from xenu is just a tab delimited file |
06:23
🔗
|
instence |
i usually drop it into a spreadsheet just to do sorting and contains searches, categorizing what is there |
06:23
🔗
|
instence |
i would filter by 3xx and just select the urls, paste to a txt file, and then run that list through something to get the resolved URLs |
06:25
🔗
|
instence |
thanks for the python example |
06:25
🔗
|
brayden |
parsing tab delimited files should be pretty easy. |
06:25
🔗
|
brayden |
just making a thing for that now |
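A rough sketch of what parsing the tab-delimited export could look like. The log never shows a sample line, so the header names `Address` and `Status-Code` are assumptions, as is the function name `redirect_urls`; adjust them to whatever the real export uses.

```python
import csv
import io


def redirect_urls(handle, url_col="Address", status_col="Status-Code"):
    """Yield the URLs whose status code is 3xx from a tab-delimited export.

    The column names are assumptions, not taken from a real Xenu export.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        status = (row.get(status_col) or "").strip()
        # Keep only three-digit codes that start with 3 (301, 302, ...).
        if len(status) == 3 and status.isdigit() and status.startswith("3"):
            yield (row.get(url_col) or "").strip()


# Small in-memory sample standing in for the exported file.
sample = io.StringIO(
    "Address\tStatus-Code\tStatus-Text\n"
    "http://example.com/old\t301\tmoved permanently\n"
    "http://example.com/ok\t200\tok\n"
    "http://example.com/tmp\t302\tmoved temporarily\n"
)
for url in redirect_urls(sample):
    print(url)
```

For a real file, pass an open file object instead of the `io.StringIO` sample.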
06:26
🔗
|
instence |
hmm almost wonder if I could do something like that in php as well, as I have written scraper scripts in php that work great |
06:26
🔗
|
instence |
this is cool cause it will get me into learning a bit of python, which I haven't worked with before ;) |
06:26
🔗
|
brayden |
Can you provide an example of a line from the tab delimited file? |
06:27
🔗
|
instence |
I don't even know if I would process the tab delimited file. As I would do my sorting first and just select the range of URL's that I want to resolve so honestly whatever the script targets it could be a txt file with just a single list of URL's |
06:28
🔗
|
instence |
so urls.txt, and just 100 URL's or whatever |
06:30
🔗
|
brayden |
C:\Users\brayden\PycharmProjects\PythonXMLTest>python show-redirect.py urls.txt |
06:30
🔗
|
brayden |
1 http://www.google.com/ |
06:30
🔗
|
brayden |
1 http://archiveteam.org/index.php?title=Main_Page |
06:30
🔗
|
brayden |
1 http://www.google.com.au/ |
06:30
🔗
|
brayden |
seems to be working |
06:31
🔗
|
instence |
awesome |
06:31
🔗
|
brayden |
http://brayden.ur.cx/redirect.py just updated this one |
06:31
🔗
|
brayden |
it has no error checking though or anything like that |
06:31
🔗
|
brayden |
if a site goes down or times out for whatever reason it will crash |
06:31
🔗
|
brayden |
really easy to fix though if that becomes a problem |
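A minimal sketch of the kind of check discussed here, with the missing error handling added. This is not brayden's redirect.py (its source is not shown in the log); the names `describe`, `check_url` and `main` are made up for illustration. It sends a HEAD request without following redirects, reports `1` plus the new location for a redirect and `0` otherwise, and turns timeouts or dead hosts into an `error` line instead of a crash.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Status codes treated as "moved"; other 3xx codes carry no new location.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def describe(url, status, location):
    """Return '1 <new location>' for a redirect, otherwise '0'."""
    if status in REDIRECT_CODES and location:
        # Location may be relative, so resolve it against the original URL.
        return "1 " + urljoin(url, location)
    return "0"


def check_url(url, timeout=10):
    """Send one HEAD request without following redirects."""
    try:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            return "error not an absolute http(s) URL"
        if parts.scheme == "https":
            conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
        else:
            conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        try:
            conn.request("HEAD", path)
            response = conn.getresponse()
            return describe(url, response.status, response.getheader("Location"))
        finally:
            conn.close()
    except (OSError, ValueError, http.client.HTTPException) as exc:
        # Timeouts, DNS failures and refused connections all end up here.
        return "error " + str(exc)


def main(list_path):
    """Print one tab-separated 'result, original URL' line per URL in the file."""
    with open(list_path) as handle:
        for line in handle:
            url = line.strip()
            if url:
                print(check_url(url) + "\t" + url)
```

Calling `main("urls.txt")` would walk a plain list like the one described above. Some servers answer HEAD differently from GET, so the output is best treated as a first pass.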
06:32
🔗
|
instence |
cool, at least if something does bork I will know why |
06:33
🔗
|
instence |
this is great though |
06:33
🔗
|
instence |
thanks a bunch for the script |
06:33
🔗
|
brayden |
no worries. Python is a really easy language to learn so you should try |
06:34
🔗
|
instence |
quick question: does it treat tabs/spaces as contextual? since there are no curly braces? like coffee script? |
06:35
🔗
|
brayden |
for every line in the text file it'll just send the text off to the testUrl function |
06:35
🔗
|
instence |
oh i mean for the python code itself |
06:35
🔗
|
instence |
like the syntax |
06:35
🔗
|
brayden |
oh |
06:35
🔗
|
brayden |
it uses indenting |
06:36
🔗
|
brayden |
after every : you raise the indenting level |
06:36
🔗
|
instence |
ok cool |
06:36
🔗
|
brayden |
It doesn't use braces or anything |
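A tiny illustration of the indentation rule described here. The function name `classify` is made up for the example. Strictly speaking it is the colons that open a block (`def`, `if`, `for`, and so on) that raise the indentation level; colons inside slices or dictionaries do not.

```python
def classify(status):
    # The colon ends the "def" line, so the body is indented one level.
    if 300 <= status < 400:
        # A second block-opening colon, a second level of indentation.
        return "redirect"
    # Dedenting back to this level closes the "if" block; no braces needed.
    return "other"


print(classify(301))
print(classify(200))
```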
06:37
🔗
|
instence |
this is the perfect thing to get me into experimenting with python, as I can already clearly see what this script is doing and it relates to stuff I need to do |
06:37
🔗
|
instence |
so I can probably work from this, expand on it, and branch out into other tasks I need done |
06:37
🔗
|
instence |
so looks like this weekend I will be getting into python a bit ;) |
06:37
🔗
|
brayden |
I'll be here around this time for the weekend if you need to ask anything |
06:38
🔗
|
instence |
ok cool, i will most likely be up Fri/Sat night all night working on various archiving tasks |
06:39
🔗
|
instence |
so I will ping you if I have any questions |
06:39
🔗
|
instence |
I am going to get python up and running locally tomorrow night and test out this script for myself |
06:39
🔗
|
brayden |
yeah well python is really easy to setup on windows/linux so you shouldn't have any trouble |
06:40
🔗
|
brayden |
this just uses standard libraries |
06:41
🔗
|
instence |
I have set up python before, for some application dependency in the past. It was pretty straightforward. |
06:44
🔗
|
brayden |
Do you just use notepad++ or something like that to write it? |
06:48
🔗
|
instence |
like as far as preferred editor? |
06:49
🔗
|
brayden |
well what are you planning on using? |
06:50
🔗
|
instence |
well, I am sort of IDE agnostic at the moment. At work I use a combination of eclipse, notepad++, and Dreamweaver (coder mode only, mainly because you can custom collapse any selected region of code) |
06:50
🔗
|
instence |
though I have ambitions to learn VIM and migrate to that |
06:50
🔗
|
instence |
I am designer/front end developer |
06:52
🔗
|
instence |
if I write python it might be in notepad++ for now |
06:54
🔗
|
brayden |
Well PHPstorm can collapse bits of code and I bet it has a hell of a lot better completion than Dreamweaver as well as being way cheaper. |
06:55
🔗
|
brayden |
and having Linux/Mac/Windows support |
06:55
🔗
|
brayden |
Personally I have a lot of trouble remembering things so I use a fully fledged IDE |
06:55
🔗
|
brayden |
PyCharm is the one I use. It is pretty expensive and I'm a student so can't really afford it. |
06:55
🔗
|
brayden |
Had to go via other means :( |
06:55
🔗
|
brayden |
but they have a 30 day trial |
06:56
🔗
|
brayden |
A lot of nice features are shared between editions, for instance, I'm doing a tornado template and it has detected it is HTML syntax. |
06:56
🔗
|
brayden |
and providing me with proper completion etc. |
06:56
🔗
|
brayden |
it even was nice enough to download the twitter bootstrap js and provide completion with that! |
06:57
🔗
|
brayden |
https://www.jetbrains.com/ anyway it is this lot. They do some seriously good software! |
07:00
🔗
|
instence |
cool I will have to check it out |
07:01
🔗
|
instence |
I actually have been shying away from hinting/completion aside from auto-closing html tags and auto-generating blocks of preformatted code |
07:01
🔗
|
brayden |
Why? |
07:01
🔗
|
instence |
basically because I found myself getting too reliant on the auto completion, and i didn't really know the languages as well |
07:01
🔗
|
brayden |
Well yeah there is that. |
07:02
🔗
|
brayden |
But that's not really a problem of using auto completion, I reckon at least. |
07:02
🔗
|
instence |
so to force myself to remember and learn the languages better I have been just referring to api documentation and writing as much as I can |
07:03
🔗
|
instence |
yea it's not that auto completion is the problem per se |
07:03
🔗
|
brayden |
and few auto completions are powerful enough to let you get away with that anyway |
07:03
🔗
|
brayden |
I still have to go through module docs all the time. |
07:05
🔗
|
instence |
I am going to checkout PyCharm |
07:05
🔗
|
instence |
looks interesting |
07:06
🔗
|
brayden |
They made a nice theme for it too which is pretty easy on the eyes |
07:06
🔗
|
brayden |
this whole "darcula" thing they're integrating into their stuff |
07:06
🔗
|
instence |
lol |
07:06
🔗
|
instence |
nice name |
09:30
🔗
|
godane |
so i found another episode of gamespot tv |
09:30
🔗
|
godane |
i really wish these rips were at 165kbs |
09:31
🔗
|
godane |
maybe not that great but still a lot better than dialup |
10:53
🔗
|
godane |
yes |
10:53
🔗
|
godane |
found another episode of call for help |
11:13
🔗
|
Smiley |
psychiatric nurse :o |
13:53
🔗
|
Smiley |
raaawr |
14:06
🔗
|
ersi |
Raring rawrtail |
14:09
🔗
|
DFJustin |
another gaming gem rescued from the dustbin of history http://archive.org/download/Nextys_Archive/OW__B.ISO/Butt_Slam%2FBUTTSLAM.ZIP |
14:10
🔗
|
SketchCow |
Whew, back |
14:10
🔗
|
ersi |
Butt slam :D Bwahaha |
14:10
🔗
|
Cameron_D |
http://www.theverge.com/2013/5/3/4294548/tears-in-rain-how-snapchat-showed-me-the-glory-of-fading-data |
15:03
🔗
|
sep332 |
Hackers for Charity documentary has 2 days left http://www.kickstarter.com/projects/1456247168/hackers-in-uganda-a-documentary?ref=live |
15:39
🔗
|
SketchCow |
I won't support it |
15:39
🔗
|
SketchCow |
They basically dumped in 5k of their own money to guarantee investment |
17:57
🔗
|
DFJustin |
http://www.forbes.com/sites/andygreenberg/2013/05/03/this-is-the-worlds-first-entirely-3d-printed-gun-photos/ |
18:28
🔗
|
SketchCow |
Thingiverse dropped the funs |
18:28
🔗
|
SketchCow |
guns |
18:28
🔗
|
SketchCow |
Talked to Bre about it |
18:28
🔗
|
SketchCow |
Told him I expected it, I said I expect Guniverse within seconds and he says there's already one |
18:33
🔗
|
chronomex |
haha |
18:33
🔗
|
SketchCow |
http://i.imgur.com/KBCsbVi.gif |
19:14
🔗
|
omf_ |
I am hitting the final stretch of having refreshed all my backups. It is a huge relief. Sometimes you delete a few old things, most of the time you add more. |
19:22
🔗
|
omf_ |
only 21gb left to go through |
19:23
🔗
|
omf_ |
I really should get more drive trays but they are $17 each. |
19:26
🔗
|
omf_ |
I mention this because a friend just had his 3rd hard drive failure since I have known him and he lost everything again |
20:04
🔗
|
Smiley |
Ok so another week off work :/ |
20:05
🔗
|
Smiley |
however, that means I may be crunching again late at night, as that seems to be the time I become active here. |
20:05
🔗
|
Smiley |
However right now my only concern is this vodka. |
20:46
🔗
|
ersi |
soultcer: Haha, thanks for the chocolate! Awesome packaging |
20:49
🔗
|
ersi |
Super tasty :3 |
20:51
🔗
|
Smiley |
:O |
21:00
🔗
|
ersi |
http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/100000/80000/3000/300/183359/183359.strip.gif |
21:11
🔗
|
Coderjoe |
there already was defcad.org |
21:48
🔗
|
DopefishJ |
woo got all of simtelnet on my laptop |
21:48
🔗
|
DFJustin |
1997 me would be so jealous |
21:49
🔗
|
DFJustin |
seems to be missing a lot of games compared to what I remember though |
21:49
🔗
|
DFJustin |
the mirror at ftp.riken.go.jp is the same so maybe they cleared them out at some point |