Time |
Nickname |
Message |
00:24
π
|
joepie92 |
kennethre: quick q... does requests not allow for retrieving the response body if the response code is a 404? |
00:24
π
|
joepie92 |
actually wait |
00:25
π
|
joepie92 |
I'm an idiot, disregard above |
00:45
π
|
godane |
http://dogtv.com/ |
01:05
π
|
DFJustin |
I grabbed the jedi academy & outcast source zips when it was up |
01:05
π
|
omf_ |
DFJustin, could you sha256sum both files so I can compare mine and make sure everything matches up |
01:06
π
|
DFJustin |
266c87c4fa204c8da87f1d5968abf115e33b332e45f265226cffed7e4a621641 *jediAcademy.zip |
01:06
π
|
DFJustin |
89a0d8bfb45b194fe8ab0943a25b65089c28457423380aecb760c7e0b2a98209 *jediOutcast.zip |
01:07
π
|
omf_ |
Mine match up |
01:07
π
|
omf_ |
\o/ |
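The checksum comparison above can also be scripted. This is a minimal sketch, not what anyone in the channel ran: the function names `sha256_of_file` and `matches` are made up for illustration, and the file is read in chunks so that large zips do not need to fit in memory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's digest equals a hex digest pasted by someone else."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

With a local copy, `matches("jediAcademy.zip", "266c87c4fa204c8da87f1d5968abf115e33b332e45f265226cffed7e4a621641")` would perform the same check as comparing the `sha256sum` output by eye.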
01:24
π
|
SketchCow |
Please upload to archive.org. |
01:27
π
|
DFJustin |
on it |
01:35
π
|
dashcloud |
the unedited version? |
01:36
π
|
DFJustin |
http://archive.org/details/jediacademy_source |
01:36
π
|
DFJustin |
http://archive.org/details/jedioutcast_source |
01:39
π
|
dashcloud |
thanks |
01:44
π
|
SketchCow |
Excellent |
02:14
π
|
omf_ |
I want to stab the guy who presented after you SketchCow. He is totally a publisher apologist |
02:16
π
|
omf_ |
"Publishers want to get it right." Consider the fucking stacks of shit that comes out places like Apress, Wrox, nostarch |
02:17
π
|
omf_ |
"They don't want to lock things down." Which is why everything is fucking locked down hard |
02:21
π
|
omf_ |
More than halfway done and no mention of Elsevier, the boycott or the dozens of articles talking about authors who hate the publishing industry |
02:41
π
|
dashcloud |
omf_: is your second sentence from the presentation itself? |
02:49
π
|
bsmith093 |
i just popped in, SketchCow has a presentation ?? link? |
02:49
π
|
omf_ |
bsmith093, - https://www.youtube.com/watch?v=M41806HPaCY |
02:50
π
|
omf_ |
dashcloud, I am looking for the time so you can watch it yourself |
02:51
π
|
dashcloud |
thanks! |
03:06
π
|
omf_ |
start at 10 minutes 40 seconds in |
03:07
π
|
omf_ |
He then goes on to talk about all the work publishers do |
03:07
π
|
omf_ |
Here is how unimportant this guy is |
03:08
π
|
omf_ |
He has like a staff of 40 and they run a university library, interface with people and buy journals |
03:09
π
|
omf_ |
SketchCow has 20-30 core volunteers, 100 more that are big helpers and 100+ more people he pulls in just to run warriors, which saves millions of people's content and then gives it away for free |
03:11
π
|
omf_ |
This speaker also keeps mentioning how he has friends in the publishing industry, how the publishing industry is all idealists and to go buy your publisher a drink |
03:13
π
|
omf_ |
and this talk started off so interesting |
03:13
π
|
omf_ |
Open Access Journals or Die |
03:14
π
|
dashcloud |
thanks! |
03:15
π
|
omf_ |
dashcloud, I am always on the hunt for good talks to watch. What are you interested in? I might have some suggestions |
03:19
π
|
dashcloud |
maybe some reverse engineering talks (saw a good one by Travis Goodspeed), but in general I just don't have the time to watch everything I want to see/listen to |
03:35
π
|
omf_ |
I got nothing off the top of my head. |
03:35
π
|
omf_ |
I am actually extracting data from the DMOZ rdf dump |
03:36
π
|
omf_ |
A very useful little bit of data for mapping out a small fraction of the internet |
04:04
π
|
DFJustin |
dashcloud: https://www.youtube.com/watch?v=PR9tFXz4Quc |
04:54
π
|
omf_ |
Ooh just read a good quote |
04:54
π
|
omf_ |
Violence is like duct tape. If it doesn't solve the problem, you didn't use enough. |
04:56
π
|
DFJustin |
dynamite is another example of that |
04:56
π
|
omf_ |
good point |
04:58
π
|
omf_ |
Anyone here had to set up and run a search engine that had to crawl web pages? |
04:58
π
|
omf_ |
I think I am running into edge cases in url de-duplication and I wanted to take a look at how the existing open source players handle it |
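One common source of de-duplication edge cases is that the same page can be written as several different URL strings. The sketch below is only one possible normalization, not how any particular open source crawler does it: `canonicalize` and `dedupe` are hypothetical names, and the chosen rules (lowercase scheme and host, drop default ports and fragments, sort query parameters, drop userinfo) are assumptions that will be wrong for some sites.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url):
    """Reduce a URL to a comparison key (assumed rules, see lead-in)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    netloc = host
    # Keep the port only when it is not the default for the scheme.
    if parts.port is not None and DEFAULT_PORTS.get(scheme) != parts.port:
        netloc = f"{host}:{parts.port}"
    path = parts.path or "/"
    # Sorting makes ?a=1&b=2 and ?b=2&a=1 compare equal.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # The fragment never reaches the server, so it is dropped.
    return urlunsplit((scheme, netloc, path, query, ""))


def dedupe(urls):
    """Return the first URL seen for each canonical key, in input order."""
    seen = set()
    unique = []
    for url in urls:
        key = canonicalize(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Trailing slashes, session IDs in query strings and `www.` prefixes are deliberately left alone here, since whether they identify the same page depends on the site.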
05:00
π
|
omf_ |
Also webcrawler.com still exists |
05:01
π
|
DFJustin |
how the-- |
05:02
π
|
DFJustin |
I guess some people have really old start pages bookmarked |
05:09
π
|
Lord_Nigh |
does metacrawler still exist? |
05:10
π
|
Lord_Nigh |
i remember patching the webcrawler.com page locally around 1997 to allow an arbitrary number of results returned |
05:10
π
|
Lord_Nigh |
it used to limit you to 5,10,20,or 30 results |
05:10
π
|
Lord_Nigh |
i wanted to see 200 |
05:11
π
|
Lord_Nigh |
paginated results were still a ways off |
05:11
π
|
Lord_Nigh |
back in the ancient days |
05:17
π
|
omf_ |
I think I can use a Bloom filter to speed up url lookups and de-duplication |
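For reference, a Bloom filter answers "have I possibly seen this URL?" with no false negatives and a tunable false positive rate, using far less memory than a set of full URLs. The class below is a rough sketch under stated assumptions, not code from the discussion: the name `BloomFilter`, the use of SHA-256 with double hashing to derive bit positions, and the default 1% false positive rate are all illustrative choices.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, expected_items, false_positive_rate=0.01):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.size = max(8, int(-expected_items * math.log(false_positive_rate)
                               / (math.log(2) ** 2)))
        self.hash_count = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.hash_count):
            yield (h1 + i * h2) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly seen"; False means "definitely not seen".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

A crawler would typically check `url in seen` first and only fall back to an exact lookup (disk or database) when the filter says "possibly seen", since a false positive would otherwise cause a page to be skipped.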
05:24
π
|
omf_ |
The Lorena Bobbitt emoji |
05:24
π
|
omf_ |
( ＾◡＾)っ✂╰⋃╯ |
05:24
π
|
omf_ |
wow. I am so impressed, I mean fucking look at that. Creativity all around |
05:41
π
|
omf_ |
For the pythonistas in here - http://www.michaelnielsen.org/ddi/how-to-crawl-a-quarter-billion-webpages-in-40-hours/ |
05:41
π
|
omf_ |
It walks through the problem but the guy did not release the code because he was afraid people would abuse it |
05:43
π
|
omf_ |
He used 20 Amazon EC2 extra large instances. |
06:26
π
|
omf_ |
chfoo did you read the ISO or the other warc tools to build your warc file reader? |
06:26
π
|
omf_ |
and writer |
06:50
π
|
omf_ |
Anyone here have experience with Amazon Elastic MapReduce> |
06:50
π
|
omf_ |
? |
09:57
π
|
godane |
so looks like i'm starting to upload 2011 clips from g4 |
09:58
π
|
godane |
AOTS Reveals World's Largest T-Bag: https://archive.org/details/g4tv.com-video50827 |
13:19
π
|
ersi |
Hehe, Virgin Lounge. |
13:19
π
|
SketchCow |
SEXY virgin lounge |
13:19
π
|
SketchCow |
I just had a smoothie brought to my couch |
13:19
π
|
SketchCow |
They have US and UK plugs, makes life easier. |
13:19
π
|
SketchCow |
Wine tasting upstairs, not attending |
13:20
π
|
ersi |
I think it's hilarious that, that airline is actually named Virgin :D |
13:20
π
|
SketchCow |
Virgin the company has an interesting history. I'd check it out, as well as Richard Branson |
13:20
π
|
ersi |
Indeed, Richard is indeed quite interesting |
13:21
π
|
ersi |
Lounges are pretty sweet. |
13:21
π
|
ersi |
Don't you have enough points, seeing how you criss cross the world, to not pay for access to those? |
13:33
π
|
BlueMax |
lucky SketchCow |
13:39
π
|
ersi |
How much does luck play a role in this? Paid for entrance :p |
13:42
π
|
SketchCow |
Yeah, SketchCow with 60 pounds. |
13:43
π
|
SketchCow |
If I was hungry enough, I could totally nail that in just free food/drink |
13:50
π
|
SmileyG |
Virgin Coke anyone? |
13:50
π
|
SmileyG |
I mean how does a record company end up selling soft drinks o_O |
13:51
π
|
SmileyG |
http://i.telegraph.co.uk/multimedia/archive/01556/VIRGINCOLA_1556623i.jpg |
13:51
π
|
GLaDOS |
..what |
13:51
π
|
GLaDOS |
why would you |
13:51
π
|
GLaDOS |
Richard, what are you doing? |
13:51
π
|
SmileyG |
Was years ago |
13:52
π
|
GLaDOS |
Still.. |
13:52
π
|
SmileyG |
then again how does a record company end up flying planes o_O |
13:52
π
|
GLaDOS |
Boredom, obviously. |
13:52
π
|
SmileyG |
I've made lots of money, wtf do I do now? |
13:53
π
|
GLaDOS |
Or, got annoyed at Emirates not serving up Virgin Cola on a flight, and bought a whole fleet of aircraft. |
13:53
π
|
SmileyG |
xD |
13:53
π
|
SmileyG |
"They said I couldn't bring drinks on a plane.... I proved them wrong!" |
13:54
π
|
SmileyG |
How does a record company end up buying an ISP?! |
13:54
π
|
GLaDOS |
It all makes sense.. |
13:54
π
|
GLaDOS |
Hm, this one is tough.. |
13:54
π
|
SmileyG |
he was really annoyed with bt? |
13:55
π
|
GLaDOS |
He wanted the extra kilobyte/second. |
14:01
π
|
SmileyG |
Anyway the interview seemed ok :O |
14:02
π
|
GLaDOS |
That's good. |
14:02
π
|
GLaDOS |
..what's this about again? |
14:02
π
|
SmileyG |
yeah hope so ;D |
14:02
π
|
SmileyG |
my new job hopefully |
14:02
π
|
GLaDOS |
Ah |
14:03
π
|
SmileyG |
from 4-7k payrise, easily reaching 9k pay rise |
14:03
π
|
SmileyG |
considering I've only had 2k in my present company in 2 years.... I'd be very happy |
14:03
π
|
GLaDOS |
Well that's good! |
14:04
π
|
GLaDOS |
Going to do a wiki spambot cleanup tomorrow. |
14:04
π
|
GLaDOS |
This time, I'm using my new PC that can have Nightly open while I do other things! |
14:05
π
|
SmileyG |
Nightly is...? |
14:06
π
|
GLaDOS |
Firefox Nightly. |
14:07
π
|
* |
GLaDOS crashes into bed |
14:11
π
|
SmileyG |
ahh |
14:39
π
|
chfoo |
omf_: i was reading the iso when i was writing it. i didn't look at code from other projects because i wanted to avoid copying conformance bugs if any by accident. |
14:39
π
|
chfoo |
have you been trying it out? |
14:40
π
|
chfoo |
the next thing i plan to do is a verify command to check the digests and conformance issues |
14:45
π
|
chfoo |
in terms of conformance, what i've noticed so far is that wget produces warcs with duplicate record ids when saving some metadata |
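A check for duplicate record IDs could look roughly like the following. This is a hedged sketch and not chfoo's tool or its planned verify command: the function names are invented, header parsing is simplified (no folded continuation lines), and it assumes each record carries a correct `Content-Length` so the record body can be skipped rather than scanned.

```python
import gzip
from collections import Counter


def iter_warc_headers(stream):
    """Yield a dict of lowercased header fields for each record in a WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line[:40]!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # end of the header block
            name, _, value = line.partition(b":")
            headers[name.strip().lower().decode("ascii")] = (
                value.strip().decode("utf-8", "replace"))
        # Skip the record body so header-like text inside it is never parsed.
        stream.read(int(headers.get("content-length", "0")))
        yield headers


def duplicate_ids_in_stream(stream):
    """Map each WARC-Record-ID that occurs more than once to its count."""
    counts = Counter(h.get("warc-record-id") for h in iter_warc_headers(stream))
    return {rid: n for rid, n in counts.items() if rid and n > 1}


def duplicate_record_ids(path):
    """Same check for a file on disk; .gz files are read through gzip."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as stream:
        return duplicate_ids_in_stream(stream)
```

An empty result from `duplicate_record_ids("example.warc.gz")` (a hypothetical file name) would mean every record ID in the file is unique.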
15:02
π
|
SketchCow |
http://i.imgur.com/LrASrN8.jpg |
15:02
π
|
SketchCow |
lost it at broconut |
15:05
π
|
SmileyG |
xD |
15:06
π
|
BlueMax |
I put the lime in the broconut |
15:30
π
|
godane |
so i looked at g4tv.com/images again |
15:31
π
|
godane |
i think i can get them all in warc.gz format |
15:31
π
|
godane |
or at least whats on the website |
15:32
π
|
godane |
only cause i checked the source and it looks like every url for images are in there |
15:32
π
|
godane |
even if its not being displayed |
15:35
π
|
SketchCow |
When's G4 down for good? |
15:35
π
|
SketchCow |
11 days, I see |
15:39
π
|
godane |
i think so |
15:42
π
|
godane |
SketchCow: i did find it funny that you had an interview on g4 |
15:43
π
|
godane |
i was always thinking about you going on to techtv if you could |
15:43
π
|
SketchCow |
Yes, that was a LONG time ago and I was one of Kevin's first big interviews |
15:43
π
|
SketchCow |
He worked so hard on this and we talked a lot before and after. |
15:43
π
|
SketchCow |
He probably would remember me |
15:43
π
|
SketchCow |
I have a memory he was so happy to get a copy of the doc when it did come out a year later. |
15:44
π
|
SketchCow |
My flight leaves soon |
15:44
π
|
godane |
its on his youtube account |
15:44
π
|
SketchCow |
As per the upgrade, they come and get me and I get pre-boarding |
15:44
π
|
SketchCow |
And now I can polish off a few more movies. |
15:44
π
|
SketchCow |
(Saw Jack Reacher, Hotel Transylvania on the way in) |
15:45
π
|
balrog_ |
is there a guide anywhere for archiving IP.board (invision powerboard) forums? |
15:45
π
|
SketchCow |
Hotel Transylvania is Adam Sandler with an actual director giving him characterization directions |
15:47
π
|
BlueMax |
so...good? |
15:57
π
|
Schbirid |
it has lots of singing and other childish things, one of the few movies i stopped watching midway through |
16:02
π
|
godane |
so g4 has stopped releasing new videos like 3 weeks ago |
16:03
π
|
godane |
so thats why i'm panicking to get as much stuff as i can |
17:31
π
|
godane |
i have passed 28k videos in g4video-web |
18:58
π
|
omf_ |
After over a day I can say that DigitalOcean droplets respond faster than joyent instances |
18:58
π
|
omf_ |
It all comes down to DO using SSD and joyent does not |
18:58
π
|
omf_ |
DigitalOcean has a better online interface than Joyent and costs half the price for the same thing |
19:11
π
|
soultcer |
Interesting. I read that DO has a pretty bad network. Did you have any trouble with it? |
19:12
π
|
omf_ |
None so far |
19:12
π
|
omf_ |
I will give them a month just like I did joyent |
19:12
π
|
soultcer |
Also re joyent: It's been over a month and I still haven't received a billing statement. |
19:13
π
|
soultcer |
Preliminary calculations suggest though that I have so little traffic that the price advantage of Amazon Spot instances negates the savings for having cheap traffic at DO or joyent |
19:14
π
|
omf_ |
I tried DO because I got $10 account credit for free |
19:14
π
|
omf_ |
and Joyent gave me a bunch free too |
19:14
π
|
omf_ |
I will probably end up back at Amazon simply because they offer better features |
19:23
π
|
ersi |
DO still has the SSDTWEET promo code activated btw |
19:24
π
|
ersi |
which entitled you to $10 credit |
19:59
π
|
DFJustin |
http://archive.org/details/cst_000027 |
20:32
π
|
omf_ |
These DO instances are so much faster than joyent that I have to up my droplet size to get more hard drive space |
21:21
π
|
godane |
DICE 2011 "Exploring the Ocean Deep in 3D" Presentation: https://archive.org/details/g4tv.com-video51230 |
22:04
π
|
omf_ |
Internet Archive eat another 2.2gb doughnut. You know you love it. |
22:04
π
|
joepie92 |
omf_: it's clearly a marketing tactic! ;) |
22:04
π
|
omf_ |
joepie92, yeah now it costs what I was paying for joyent |
22:05
π
|
omf_ |
$0.03 an hour |
22:06
π
|
omf_ |
See I am old school. I use the butt, then I lose the butt |
22:06
π
|
omf_ |
If the butts aren't making me money there is no reason to keep them around :) |
22:10
π
|
S[h]O[r]T |
are you talking about buttcoins? |
22:12
π
|
dashcloud |
nope- the "cloud" |
22:13
π
|
omf_ |
The clown when in public |
22:13
π
|
dashcloud |
I think I prefer "moon" more- it makes it very clear how far away your stuff is and your chances of getting it if something goes wrong |
22:16
π
|
omf_ |
I might spin up an instance of the common crawl tonight to try MapReduce and see how fast things are |
22:16
π
|
omf_ |
The thing I am trying to figure out is do I have to load the whole thing first or is it already in a format that I can just query against |
22:17
π
|
omf_ |
I don't even need all 81tb, just the URL frontier |
22:28
π
|
omf_ |
Here is what is interesting about DO. $0.007 per hour = 20gb storage, 1tb transfer |
22:29
π
|
omf_ |
$0.015 per hour = 30gb, 2tb transfer |
22:29
π
|
omf_ |
So I should just get 2 of the smallest instances to get more hard drive space |
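The arithmetic behind that conclusion, using only the two price points quoted above, can be written out as a small sketch. The `Plan` class and its method names are made up for illustration, and the comparison ignores that two droplets give two separate 20gb disks rather than one 40gb volume.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    usd_per_hour: float
    disk_gb: int
    transfer_tb: int

    def times(self, count):
        """Totals for running `count` instances of this plan side by side."""
        return Plan(self.usd_per_hour * count,
                    self.disk_gb * count,
                    self.transfer_tb * count)

    def usd_per_gb_hour(self):
        return self.usd_per_hour / self.disk_gb


# Figures as quoted in the log; nothing else about the plans is assumed.
SMALLEST = Plan(usd_per_hour=0.007, disk_gb=20, transfer_tb=1)
NEXT_SIZE = Plan(usd_per_hour=0.015, disk_gb=30, transfer_tb=2)

two_small = SMALLEST.times(2)
print(f"2 x smallest: ${two_small.usd_per_hour:.3f}/h, "
      f"{two_small.disk_gb}gb disk, {two_small.transfer_tb}tb transfer")
print(f"next size up: ${NEXT_SIZE.usd_per_hour:.3f}/h, "
      f"{NEXT_SIZE.disk_gb}gb disk, {NEXT_SIZE.transfer_tb}tb transfer")
```

Two of the smallest instances come to about $0.014 per hour for 40gb and 2tb of transfer, against $0.015 per hour for 30gb and 2tb on the next size up.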
22:32
π
|
joepie92 |
haha |
22:41
π
|
omf_ |
I added more clown info to the wiki - http://www.archiveteam.org/index.php?title=Clown_hosting |
22:43
π
|
joepie92 |
omf_: DO offers unmetered again? |
22:43
π
|
omf_ |
I didn't put that note on there. Not sure |
22:43
π
|
omf_ |
All the plans have transfer limits on them |
22:44
π
|
joepie92 |
All servers come with 1Gb/sec. network interface. Plans start with 1TB per month and increase incrementally. Once the monthly transfer limit has been exceeded, it's $0.02 per GB thereafter. You'll save a ton of money with our network and it's easy to get started. |
22:59
π
|
joepie92 |
http://www.vr.org/buy-vps/#usa |
22:59
π
|
joepie92 |
free inbound |
23:00
π
|
dashcloud |
interesting move by Google: http://dataliberation.blogspot.com/2013/04/plan-your-digital-afterlife-with.html |
23:32
π
|
joepie92 |
dashcloud: whoa. |
23:33
π
|
Aranje- |
yeah man |
23:33
π
|
Aranje- |
I'm excited about it |
23:33
π
|
dashcloud |
so, not perfect, but far better than everyone else out there |
23:33
π
|
Aranje- |
I'm setting mine up today |
23:33
π
|
Aranje- |
yeah |