| Time |
Nickname |
Message |
|
00:24
π
|
joepie92 |
kennethre: quick q... does requests not allow for retrieving the response body if the response code is a 404? |
|
00:24
π
|
joepie92 |
actually wait |
|
00:25
π
|
joepie92 |
I'm an idiot, disregard above |
|
00:45
π
|
godane |
http://dogtv.com/ |
|
01:05
π
|
DFJustin |
I grabbed the jedi academy & outcast source zips when it was up |
|
01:05
π
|
omf_ |
DFJustin, could you sha256sum both files so I can compare mine and make sure everything matches up |
|
01:06
π
|
DFJustin |
266c87c4fa204c8da87f1d5968abf115e33b332e45f265226cffed7e4a621641 *jediAcademy.zip |
|
01:06
π
|
DFJustin |
89a0d8bfb45b194fe8ab0943a25b65089c28457423380aecb760c7e0b2a98209 *jediOutcast.zip |
|
01:07
π
|
omf_ |
Mine match up |
|
01:07
π
|
omf_ |
\o/ |
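A minimal sketch of the checksum comparison above, assuming both zips sit in one directory. The file names and digests are the ones DFJustin pasted; `sha256_of` and `check` are hypothetical helper names, and the result should agree with what `sha256sum` prints for the same files.

```python
import hashlib
from pathlib import Path

# Digests exactly as pasted in the channel.
EXPECTED = {
    "jediAcademy.zip": "266c87c4fa204c8da87f1d5968abf115e33b332e45f265226cffed7e4a621641",
    "jediOutcast.zip": "89a0d8bfb45b194fe8ab0943a25b65089c28457423380aecb760c7e0b2a98209",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check(directory="."):
    """Map each expected file name to 'match', 'MISMATCH', or 'missing'."""
    results = {}
    for name, expected in EXPECTED.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif sha256_of(path) == expected:
            results[name] = "match"
        else:
            results[name] = "MISMATCH"
    return results
```

Calling `check()` in the download directory returns one status per file, so a mismatch on either zip is visible before anything is uploaded.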
|
01:24
π
|
SketchCow |
Please upload to archive.org. |
|
01:27
π
|
DFJustin |
on it |
|
01:35
π
|
dashcloud |
the unedited version? |
|
01:36
π
|
DFJustin |
http://archive.org/details/jediacademy_source |
|
01:36
π
|
DFJustin |
http://archive.org/details/jedioutcast_source |
|
01:39
π
|
dashcloud |
thanks |
|
01:44
π
|
SketchCow |
Excellent |
|
02:14
π
|
omf_ |
I want to stab the guy who presented after you SketchCow. He is totally a publisher apologist |
|
02:16
π
|
omf_ |
"Publishers want to get it right." Consider the fucking stacks of shit that come out of places like Apress, Wrox, nostarch |
|
02:17
π
|
omf_ |
"They don't want to lock things down." Which is why everything is fucking locked down hard |
|
02:21
π
|
omf_ |
More than half way done and no mention of Elsevier, the boycott or the dozens of articles talking about authors who hate the publishing industry |
|
02:41
π
|
dashcloud |
omf_: is your second sentence from the presentation itself? |
|
02:49
π
|
bsmith093 |
i just popped in, SketchCow has a presentation ?? link? |
|
02:49
π
|
omf_ |
bsmith093, - https://www.youtube.com/watch?v=M41806HPaCY |
|
02:50
π
|
omf_ |
dashcloud, I am looking for the time so you can watch it yourself |
|
02:51
π
|
dashcloud |
thanks! |
|
03:06
π
|
omf_ |
start at 10 minutes 40 seconds in |
|
03:07
π
|
omf_ |
He then goes on to talk about all the work publishers do |
|
03:07
π
|
omf_ |
Here is how unimportant this guy is |
|
03:08
π
|
omf_ |
He has like a staff of 40 and then run a university library and interface with people and buy journals |
|
03:09
π
|
omf_ |
SketchCow has 20-30 core volunteers, 100 more that are big helpers and 100+ more people he pulls in just to run warriors, which saves millions of people's content and then gives it away for free |
|
03:11
π
|
omf_ |
This speaker also keeps mentioning how he has friends in the publishing industry, how the publishing industry is all idealists and to go buy your publisher a drink |
|
03:13
π
|
omf_ |
and this talk started off so interesting |
|
03:13
π
|
omf_ |
Open Access Journals or Die |
|
03:14
π
|
dashcloud |
thanks! |
|
03:15
π
|
omf_ |
dashcloud, I am always on the hunt for good talks to watch. What are you interested in? I might have some suggestions |
|
03:19
π
|
dashcloud |
maybe some reverse engineering talks (saw a good one by Travis Goodspeed), but in general I just don't have the time to watch everything I want to see/listen to |
|
03:35
π
|
omf_ |
I got nothing off the top of my head. |
|
03:35
π
|
omf_ |
I am actually extracting data from the DMOZ rdf dump |
|
03:36
π
|
omf_ |
A very useful little bit of data for mapping out a small fraction of the internet |
|
04:04
π
|
DFJustin |
dashcloud: https://www.youtube.com/watch?v=PR9tFXz4Quc |
|
04:54
π
|
omf_ |
Ooh just read a good quote |
|
04:54
π
|
omf_ |
Violence is like duct tape. If it doesn't solve the problem, you didn't use enough. |
|
04:56
π
|
DFJustin |
dynamite is another example of that |
|
04:56
π
|
omf_ |
good point |
|
04:58
π
|
omf_ |
Anyone here had to set up and run a search engine that had to crawl web pages? |
|
04:58
π
|
omf_ |
I think I am running into edge cases in url de-duplication and I wanted to take a look at how the existing open source players handle it |
|
05:00
π
|
omf_ |
Also webcrawler.com still exists |
|
05:01
π
|
DFJustin |
how the-- |
|
05:02
π
|
DFJustin |
I guess some people have really old start pages bookmarked |
|
05:09
π
|
Lord_Nigh |
does metacrawler still exist? |
|
05:10
π
|
Lord_Nigh |
i remember patching the webcrawler.com page locally around 1997 to allow an arbitrary number of results returned |
|
05:10
π
|
Lord_Nigh |
it used to limit you to 5,10,20,or 30 results |
|
05:10
π
|
Lord_Nigh |
i wanted to see 200 |
|
05:11
π
|
Lord_Nigh |
paginated results were still a ways off |
|
05:11
π
|
Lord_Nigh |
back in the ancient days |
|
05:17
π
|
omf_ |
I think I can use a Bloom filter to speed up url lookups and de-duplication |
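A minimal sketch of the Bloom filter idea for URL de-duplication, not omf_'s actual code. The class and function names are hypothetical, the sizing uses the textbook formulas, and the k bit positions are derived by double hashing from a single SHA-256 digest. A Bloom filter never forgets a URL it has seen, but it can occasionally report an unseen URL as seen, so a crawler using it will skip a small fraction of new URLs.

```python
import hashlib
import math


class BloomFilter:
    """Probabilistic set: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        # Textbook sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        ln2 = math.log(2)
        self.size = max(8, math.ceil(-expected_items * math.log(false_positive_rate) / ln2 ** 2))
        self.hash_count = max(1, round(self.size / expected_items * ln2))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item):
        # Double hashing: two 64-bit values from one digest give k positions.
        raw = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(raw[:8], "big")
        h2 = int.from_bytes(raw[8:16], "big") | 1  # force odd so steps vary
        for i in range(self.hash_count):
            yield (h1 + i * h2) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def dedupe(urls, bloom):
    """Yield URLs the filter has not seen yet; a false positive drops a new URL."""
    for url in urls:
        if url not in bloom:
            bloom.add(url)
            yield url
```

The filter only compares strings, so the edge cases mentioned above (trailing slashes, fragment identifiers, parameter order) still need a normalization step before URLs are added.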
|
05:24
π
|
omf_ |
The Lorena Bobbitt emoji |
|
05:24
π
|
omf_ |
( ＾◡＾)っ✂╰⋃╯ |
|
05:24
π
|
omf_ |
wow. I am so impressed, I mean fucking look at that. Creativity all around |
|
05:41
π
|
omf_ |
For the pythonistas in here - http://www.michaelnielsen.org/ddi/how-to-crawl-a-quarter-billion-webpages-in-40-hours/ |
|
05:41
π
|
omf_ |
It walks through the problem but the guy did not release the code because he was afraid people would abuse it |
|
05:43
π
|
omf_ |
He used 20 Amazon EC2 extra large instances. |
|
06:26
π
|
omf_ |
chfoo did you read the ISO or the other warc tools to build your warc file reader? |
|
06:26
π
|
omf_ |
and writer |
|
06:50
π
|
omf_ |
Anyone here have experience with Amazon Elastic MapReduce> |
|
06:50
π
|
omf_ |
? |
|
09:57
π
|
godane |
so looks like i'm starting to upload 2011 clips from g4 |
|
09:58
π
|
godane |
AOTS Reveals World's Largest T-Bag: https://archive.org/details/g4tv.com-video50827 |
|
13:19
π
|
ersi |
Hehe, Virgin Lounge. |
|
13:19
π
|
SketchCow |
SEXY virgin lounge |
|
13:19
π
|
SketchCow |
I just had a smoothie brought to my couch |
|
13:19
π
|
SketchCow |
They have US and UK plugs, makes life easier. |
|
13:19
π
|
SketchCow |
Wine tasting upstairs, not attending |
|
13:20
π
|
ersi |
I think it's hilarious that, that airline is actually named Virgin :D |
|
13:20
π
|
SketchCow |
Virgin the company has an interesting history. I'd check it out, as well as Richard Branson |
|
13:20
π
|
ersi |
Indeed, Richard is indeed quite interesting |
|
13:21
π
|
ersi |
Lounges are pretty sweet. |
|
13:21
π
|
ersi |
Don't you have enough points, seeing how you criss cross the world, to not pay for access to those? |
|
13:33
π
|
BlueMax |
lucky SketchCow |
|
13:39
π
|
ersi |
How much does luck play a role in this? Paid for entrance :p |
|
13:42
π
|
SketchCow |
Yeah, SketchCow with 60 pounds. |
|
13:43
π
|
SketchCow |
If I was hungry enough, I could totally nail that in just free food/drink |
|
13:50
π
|
SmileyG |
Virgin Coke anyone? |
|
13:50
π
|
SmileyG |
I mean how does a record company end up selling soft drinks o_O |
|
13:51
π
|
SmileyG |
http://i.telegraph.co.uk/multimedia/archive/01556/VIRGINCOLA_1556623i.jpg |
|
13:51
π
|
GLaDOS |
..what |
|
13:51
π
|
GLaDOS |
why would you |
|
13:51
π
|
GLaDOS |
Richard, what are you doing? |
|
13:51
π
|
SmileyG |
Was years ago |
|
13:52
π
|
GLaDOS |
Still.. |
|
13:52
π
|
SmileyG |
then again how does a record company end up flying planes o_O |
|
13:52
π
|
GLaDOS |
Boredom, obviously. |
|
13:52
π
|
SmileyG |
I've made lots of money, wtf do I do now? |
|
13:53
π
|
GLaDOS |
Or, got annoyed at Emirates not serving up Virgin Cola on a flight, and bought a whole fleet of aircraft. |
|
13:53
π
|
SmileyG |
xD |
|
13:53
π
|
SmileyG |
"They said I couldn't bring drinks on a plane.... I proved them wrong!" |
|
13:54
π
|
SmileyG |
How does a record company end up buying an ISP?! |
|
13:54
π
|
GLaDOS |
It all makes sense.. |
|
13:54
π
|
GLaDOS |
Hm, this one is tough.. |
|
13:54
π
|
SmileyG |
he was really annoyed with bt? |
|
13:55
π
|
GLaDOS |
He wanted the extra kilobyte/second. |
|
14:01
π
|
SmileyG |
Anyway the interview seemed ok :O |
|
14:02
π
|
GLaDOS |
That's good. |
|
14:02
π
|
GLaDOS |
..what's this about again? |
|
14:02
π
|
SmileyG |
yeah hope so ;D |
|
14:02
π
|
SmileyG |
my new job hopefully |
|
14:02
π
|
GLaDOS |
Ah |
|
14:03
π
|
SmileyG |
from 4-7k payrise, easily reaching 9k pay rise |
|
14:03
π
|
SmileyG |
considering I've only had 2k in my present company in 2 years.... I'd be very happy |
|
14:03
π
|
GLaDOS |
Well that's good! |
|
14:04
π
|
GLaDOS |
Going to do a wiki spambot cleanup tomorrow. |
|
14:04
π
|
GLaDOS |
This time, I'm using my new PC that can have Nightly open while I do other things! |
|
14:05
π
|
SmileyG |
Nightly is...? |
|
14:06
π
|
GLaDOS |
Firefox Nightly. |
|
14:07
π
|
* |
GLaDOS crashes into bed |
|
14:11
π
|
SmileyG |
ahh |
|
14:39
π
|
chfoo |
omf_: i was reading the iso when i was writing it. i didn't look at code from other projects because i wanted to avoid copying conformance bugs if any by accident. |
|
14:39
π
|
chfoo |
have you been trying it out? |
|
14:40
π
|
chfoo |
the next thing i plan to do is a verify command to check the digests and conformance issues |
|
14:45
π
|
chfoo |
in terms of conformance, what i noticed so far is that wget produces warcs with duplicate record ids when saving some metadata |
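A rough sketch of how duplicate record IDs could be detected, assuming the basic WARC record layout (a `WARC/` version line, header lines, a blank line, then `Content-Length` bytes of content). This is not chfoo's tool; the function names are hypothetical, and folded (continuation) header lines are not handled.

```python
from collections import Counter


def iter_record_ids(stream):
    """Yield the WARC-Record-ID of each record in a binary stream (None if absent)."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header section
            name, _, value = line.partition(b":")
            headers[name.strip().lower()] = value.strip()
        # Skip the content block by length so payload bytes are never parsed.
        stream.read(int(headers[b"content-length"]))
        record_id = headers.get(b"warc-record-id")
        yield record_id.decode("utf-8") if record_id is not None else None


def duplicate_record_ids(stream):
    """Return {record_id: count} for every ID that appears more than once."""
    counts = Counter(iter_record_ids(stream))
    return {rid: n for rid, n in counts.items() if rid is not None and n > 1}
```

For a compressed file, opening it with `gzip.open(path, "rb")` and passing the result to `duplicate_record_ids` should work, since Python's `gzip` module reads concatenated gzip members as one stream.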
|
15:02
π
|
SketchCow |
http://i.imgur.com/LrASrN8.jpg |
|
15:02
π
|
SketchCow |
lost it at broconut |
|
15:05
π
|
SmileyG |
xD |
|
15:06
π
|
BlueMax |
I put the lime in the broconut |
|
15:30
π
|
godane |
so i looked at g4tv.com/images again |
|
15:31
π
|
godane |
i think i can get them all in warc.gz format |
|
15:31
π
|
godane |
or at least whats on the website |
|
15:32
π
|
godane |
only cause i checked the source and it looks like every url for images are in there |
|
15:32
π
|
godane |
even if its not being displayed |
|
15:35
π
|
SketchCow |
When's G4 down for good? |
|
15:35
π
|
SketchCow |
11 days, I see |
|
15:39
π
|
godane |
i think so |
|
15:42
π
|
godane |
SketchCow: i did find it funny that you had an interview on g4 |
|
15:43
π
|
godane |
i was always thinking about you going on to techtv if you could |
|
15:43
π
|
SketchCow |
Yes, that was a LONG time ago and I was one of Kevin's first big interviews |
|
15:43
π
|
SketchCow |
He worked so hard on this and we talked a lot before and after. |
|
15:43
π
|
SketchCow |
He probably would remember me |
|
15:43
π
|
SketchCow |
I have a memory he was so happy to get a copy of the doc when it did come out a year later. |
|
15:44
π
|
SketchCow |
My flight leaves soon |
|
15:44
π
|
godane |
its on his youtube account |
|
15:44
π
|
SketchCow |
As per the upgrade, they come and get me and I get pre-boarding |
|
15:44
π
|
SketchCow |
And now I can polish off a few more movies. |
|
15:44
π
|
SketchCow |
(Saw Jack Reacher, Hotel Transylvania on the way in) |
|
15:45
π
|
balrog_ |
is there a guide anywhere for archiving IP.board (invision powerboard) forums? |
|
15:45
π
|
SketchCow |
Hotel Transylvania is Adam Sandler with an actual director giving him characterization directions |
|
15:47
π
|
BlueMax |
so...good? |
|
15:57
π
|
Schbirid |
it has lots of singing and other childish things, one of the few movies i stopped watching midway through |
|
16:02
π
|
godane |
so g4 has stopped releasing new videos like 3 weeks ago |
|
16:03
π
|
godane |
so thats why i'm panicking to get as much stuff as i can |
|
17:31
π
|
godane |
i have passed 28k videos in g4video-web |
|
18:58
π
|
omf_ |
After over a day I can say that DigitalOcean droplets respond faster than joyent instances |
|
18:58
π
|
omf_ |
It all comes down to DO using SSD and joyent does not |
|
18:58
π
|
omf_ |
DigitalOcean has a better online interface than Joyent and costs half the price for the same thing |
|
19:11
π
|
soultcer |
Interesting. I read that DO has a pretty bad network. Did you have any trouble with it? |
|
19:12
π
|
omf_ |
None so far |
|
19:12
π
|
omf_ |
I will give them a month just like I did joyent |
|
19:12
π
|
soultcer |
Also re joyent: It's been over a month and I still haven't received a billing statement. |
|
19:13
π
|
soultcer |
Preliminary calculations suggest though that I have so little traffic that the price advantage of Amazon Spot instances negates the savings for having cheap traffic at DO or joyent |
|
19:14
π
|
omf_ |
I tried DO because I got $10 account credit for free |
|
19:14
π
|
omf_ |
and Joyent gave me a bunch free too |
|
19:14
π
|
omf_ |
I will probably end up back at Amazon simply because they offer better features |
|
19:23
π
|
ersi |
DO still has the SSDTWEET promo code activated btw |
|
19:24
π
|
ersi |
which entitles you to $10 credit |
|
19:59
π
|
DFJustin |
http://archive.org/details/cst_000027 |
|
20:32
π
|
omf_ |
These DO instances are so much faster than joyent that I have to up my droplet size to get more hard drive space |
|
21:21
π
|
godane |
DICE 2011 "Exploring the Ocean Deep in 3D" Presentation: https://archive.org/details/g4tv.com-video51230 |
|
22:04
π
|
omf_ |
Internet Archive eat another 2.2gb doughnut. You know you love it. |
|
22:04
π
|
joepie92 |
omf_: it's clearly a marketing tactic! ;) |
|
22:04
π
|
omf_ |
joepie92, yeah now it costs what I was paying for joyent |
|
22:05
π
|
omf_ |
$0.03 an hour |
|
22:06
π
|
omf_ |
See I am old school. I use the butt, then I lose the butt |
|
22:06
π
|
omf_ |
If the butts aren't making me money there is no reason to keep them around :) |
|
22:10
π
|
S[h]O[r]T |
are you talking about buttcoins? |
|
22:12
π
|
dashcloud |
nope- the "cloud" |
|
22:13
π
|
omf_ |
The clown when in public |
|
22:13
π
|
dashcloud |
I think I prefer "moon" more- it makes it very clear how far away your stuff is and your chances of getting it if something goes wrong |
|
22:16
π
|
omf_ |
I might spin up an instance of the common crawl tonight to try MapReduce and see how fast things are |
|
22:16
π
|
omf_ |
The thing I am trying to figure out is do I have to load the whole thing first or is it already in a format that I can just query against |
|
22:17
π
|
omf_ |
I don't even need all 81tb, just the URL frontier |
|
22:28
π
|
omf_ |
Here is what is interesting about DO. $0.007 per hour = 20gb storage, 1tb transfer |
|
22:29
π
|
omf_ |
$0.015 per hour = 30gb, 2tb transfer |
|
22:29
π
|
omf_ |
So I should just get 2 of the smallest instances to get more hard drive space |
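The arithmetic behind that conclusion, as a small sketch using only the prices quoted above. The plan names and the `Plan` class are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    hourly_usd: float
    disk_gb: int
    transfer_tb: int

    def times(self, count):
        """Totals for running `count` instances of this plan side by side."""
        return Plan(f"{count} x {self.name}", self.hourly_usd * count,
                    self.disk_gb * count, self.transfer_tb * count)


# Figures as quoted in the channel; the names are placeholders.
SMALLEST = Plan("smallest", 0.007, 20, 1)
NEXT_UP = Plan("next size up", 0.015, 30, 2)


def usd_per_gb_hour(plan):
    """Hourly price divided by disk size, for comparing storage value."""
    return plan.hourly_usd / plan.disk_gb


for plan in (SMALLEST, NEXT_UP, SMALLEST.times(2)):
    print(f"{plan.name}: ${plan.hourly_usd:.3f}/h, {plan.disk_gb} GB disk, "
          f"{plan.transfer_tb} TB transfer, ${usd_per_gb_hour(plan):.5f} per GB-hour")
```

Two of the smallest instances come to $0.014 per hour for 40 GB and 2 TB of transfer, against $0.015 per hour for 30 GB and 2 TB. The catch is that the 40 GB is split into two separate 20 GB disks rather than one volume.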
|
22:32
π
|
joepie92 |
haha |
|
22:41
π
|
omf_ |
I added more clown info to the wiki - http://www.archiveteam.org/index.php?title=Clown_hosting |
|
22:43
π
|
joepie92 |
omf_: DO offers unmetered again? |
|
22:43
π
|
omf_ |
I didn't put that note on there. Not sure |
|
22:43
π
|
omf_ |
All the plans have transfer limits on them |
|
22:44
π
|
joepie92 |
All servers come with 1Gb/sec. network interface. Plans start with 1TB per month and increase incrementally. Once the monthly transfer limit has been exceeded, it's $0.02 per GB thereafter. You'll save a ton of money with our network and it's easy to get started. |
|
22:59
π
|
joepie92 |
http://www.vr.org/buy-vps/#usa |
|
22:59
π
|
joepie92 |
free inbound |
|
23:00
π
|
dashcloud |
interesting move by Google: http://dataliberation.blogspot.com/2013/04/plan-your-digital-afterlife-with.html |
|
23:32
π
|
joepie92 |
dashcloud: whoa. |
|
23:33
π
|
Aranje- |
yeah man |
|
23:33
π
|
Aranje- |
I'm excited about it |
|
23:33
π
|
dashcloud |
so, not perfect, but far better than everyone else out there |
|
23:33
π
|
Aranje- |
I'm setting mine up today |
|
23:33
π
|
Aranje- |
yeah |