Time |
Nickname |
Message |
01:32
🔗
|
swebb |
chronomex: Yea, I'm doing the auto-op stuff. |
01:33
🔗
|
chronomex |
ok, my hostname has changed; I'm now coming from numbertron.com (and never from gir.seattlewireless.net) |
08:02
🔗
|
BlueMax |
I hate how one of the best ZX Spectrum emulators on Windows isn't free |
08:41
🔗
|
godane |
i broke the 2400+ mark in my g4video-web collection |
08:56
🔗
|
SmileyG |
hmmm |
08:56
🔗
|
SmileyG |
punchfork has a date, but xanga doesn't.. |
08:56
🔗
|
SmileyG |
is punchfork done too? |
08:56
🔗
|
BlueMax |
...why did they call their site "punchfork"? |
08:56
🔗
|
BlueMax |
that just sounds painful. |
08:57
🔗
|
SmileyG |
;) |
08:57
🔗
|
SmileyG |
Hmmm |
08:57
🔗
|
* |
SmileyG adds another suggestion for the warrior |
08:59
🔗
|
SmileyG |
https://github.com/ArchiveTeam/seesaw-kit/issues/19 |
13:14
🔗
|
swebb |
chronomex: ok, I've updated the auto-op to use your new source domain. Welcome back! |
13:26
🔗
|
SmileyG |
hmmm |
13:26
🔗
|
SmileyG |
SCO requesting permission to "lose" documents |
13:26
🔗
|
SmileyG |
suggest IA offers free storage and digital conversion |
13:29
🔗
|
ersi |
EFF has taken similar things up previously AFAIK |
13:29
🔗
|
ersi |
I think that's why SketchCow got a few pallets of paper previously |
13:33
🔗
|
SmileyG |
heh, I was joking, but it would be rather amusing |
13:38
🔗
|
ersi |
I think you'd need several warehouses though |
14:44
🔗
|
xk_id |
I'm here |
14:45
🔗
|
ersi |
enjoy the cat and mouse game of Banhammer 3k |
14:46
🔗
|
xk_id |
I just need a bunch of cheap machines on which to run the spider and an ssh tunnel, I think. |
14:46
🔗
|
ersi |
throw in more hosts, go slower/faster in turns, switch patterns/user agents etc |
14:46
🔗
|
xk_id |
what's the name for providers of such services? |
14:46
🔗
|
ersi |
lowendbox.com |
14:47
🔗
|
xk_id |
I just found out some guys called rackspace also provide 512 MB RAM machines with full root for 2p an hour |
14:47
🔗
|
ersi |
yeah |
14:48
🔗
|
ersi |
there's a bunch of providers.. linode is another one |
14:48
🔗
|
ersi |
slicehost |
14:48
🔗
|
ersi |
etc etc |
14:48
🔗
|
ersi |
You can even get VMs at Google these days, that'd be interesting. I'd like to see them ban Google |
14:48
🔗
|
xk_id |
hah |
14:49
🔗
|
ersi |
https://cloud.google.com/products/compute-engine |
14:49
🔗
|
xk_id |
wow |
14:49
🔗
|
xk_id |
well, if I can get 5 machines in 5 different countries |
14:50
🔗
|
xk_id |
and iterate each time I get banned... |
14:50
🔗
|
xk_id |
it should work, right? |
14:50
🔗
|
xk_id |
and thanks for the recommendations, there seems to be a plethora of choices available |
14:50
🔗
|
ersi |
I enjoy killing services |
14:50
🔗
|
ersi |
And seeing how you most likely hammer the shit out of that social network, it's my pleasure |
14:50
🔗
|
ersi |
:D |
14:50
🔗
|
xk_id |
Nooooo |
14:51
🔗
|
xk_id |
I promise you I am not... |
14:51
🔗
|
xk_id |
I ran *one* single threaded crawler |
14:51
🔗
|
ersi |
Well, you're probably as hard to spot as a naked man on a Town Square |
14:51
🔗
|
xk_id |
for most of the time |
14:51
🔗
|
ersi |
I'd just request a bunch of IPs for each VM |
14:51
🔗
|
xk_id |
I only started running a second one from my laptop for about an hour, before the first one got banned. |
14:51
🔗
|
ersi |
until they're smart and block the whole provider |
14:51
🔗
|
xk_id |
delays between requests were between 0.8 and 1s |
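The pacing described here (a pause of roughly 0.8 to 1 s between requests, with some variation so the pattern is less regular) could be sketched as follows. This is only an illustration in Python, not xk_id's actual Node.js crawler; `polite_delays`, `crawl`, and all of their parameters are hypothetical names introduced for the example.

```python
import random
import time
from typing import Callable, Iterable, Iterator, List, Optional


def polite_delays(low: float = 0.8, high: float = 1.0,
                  seed: Optional[int] = None) -> Iterator[float]:
    """Yield an endless stream of pause lengths drawn uniformly from [low, high]."""
    if low < 0 or high < low:
        raise ValueError("need 0 <= low <= high")
    rng = random.Random(seed)
    while True:
        yield rng.uniform(low, high)


def crawl(urls: Iterable[str], fetch: Callable[[str], object],
          delays: Optional[Iterator[float]] = None,
          sleep: Callable[[float], None] = time.sleep) -> List[object]:
    """Call fetch(url) for each URL, pausing a randomised interval between requests."""
    if delays is None:
        delays = polite_delays()
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            # No pause before the very first request, one pause before each later one.
            sleep(next(delays))
        results.append(fetch(url))
    return results
```

Passing `sleep` as a parameter is a design choice for testability: a test can record the pauses instead of actually waiting.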
14:52
🔗
|
ersi |
Well, it doesn't look like a large social network.. I'd not be surprised if they got super sucky infra |
14:55
🔗
|
xk_id |
is it illegal what I'm doing? |
14:57
🔗
|
ersi |
I dunno, maybe. Depends on country, county and how tech literate your justice system is in general |
14:57
🔗
|
ersi |
Then again, most people probably break a few laws every now and then. There's plenty of laws. |
14:58
🔗
|
xk_id |
? |
14:58
🔗
|
xk_id |
is it a bad idea to explain to the rackspace customer support what I am doing |
14:59
🔗
|
ersi |
Why would you, though? |
14:59
🔗
|
xk_id |
I wanted to know if they have something useful for me. |
14:59
🔗
|
xk_id |
I had to explain what my requirements were |
14:59
🔗
|
ersi |
Well, you just want a VM and possibly a few IPs |
15:00
🔗
|
xk_id |
I've already mentioned crawling, and now they ask if I can provide more details about usage |
15:00
🔗
|
Schbirid |
mention it is a research project |
15:00
🔗
|
xk_id |
k |
15:04
🔗
|
xk_id |
is there any possibility I might get accused of ddos'ing them? |
15:04
🔗
|
xk_id |
or am I just becoming paranoid? |
15:04
🔗
|
Schbirid |
anything is possible :( |
15:04
🔗
|
xk_id |
don't joke.. |
15:05
🔗
|
ersi |
We're not joking |
15:05
🔗
|
ersi |
then again, what the fuck do we care. We eat services for breakfast occasionally |
15:06
🔗
|
xk_id |
have you ever had problems? |
15:07
🔗
|
ersi |
sure, of course services fight back |
15:07
🔗
|
Schbirid |
what was that poetry site named again? |
15:07
🔗
|
ersi |
Schbirid: Lulu. |
15:08
🔗
|
ersi |
Or well, poetry.com |
15:08
🔗
|
Schbirid |
i think jason mentioned their struggle in his talks |
15:08
🔗
|
Schbirid |
http://ascii.textfiles.com/archives/3278 maybe |
15:11
🔗
|
xk_id |
heh |
15:11
🔗
|
xk_id |
funny title :) |
15:11
🔗
|
xk_id |
well |
15:12
🔗
|
xk_id |
not finishing my degree would probably be worse than being sued |
15:12
🔗
|
xk_id |
that aside, I hope my supervisor is illiterate enough not to realise what I'm actually up to. |
15:12
🔗
|
xk_id |
(illiterate in respect to IT) |
15:14
🔗
|
SmileyG |
hmmm |
15:14
🔗
|
SmileyG |
what IS your degree btw? |
15:15
🔗
|
SmileyG |
just randomly out of interest? |
15:16
🔗
|
DFJustin |
http://www.nytimes.com/2013/02/04/world/africa/saving-timbuktus-priceless-artifacts-from-militants-clutches.html?_r=0 |
15:17
🔗
|
xk_id |
uh, two rackspace guys asked me online what I want to use the servers for. so now a third guy called me to ask the same thing |
15:17
🔗
|
xk_id |
Oh, my degree is IT/management consultancy. Not very romantic. |
15:17
🔗
|
ersi |
rackspace does that |
15:17
🔗
|
ersi |
they phone all new customers |
15:17
🔗
|
xk_id |
But I managed to get away with an interesting dissertation topic |
15:17
🔗
|
ersi |
xk_id: That's a.. degree? |
15:18
🔗
|
xk_id |
well, it's not what it's called |
15:18
🔗
|
xk_id |
but it's what it is, essentially |
15:18
🔗
|
xk_id |
if you're a really good student, you end up an IT/mngmt consultant. |
15:18
🔗
|
xk_id |
but the degree is called Information Management for Business |
15:18
🔗
|
ersi |
In other words, can't get a proper employment? |
15:19
🔗
|
xk_id |
hmmm? I thought those guys are pretty sorted |
15:19
🔗
|
ersi |
I don't have much time for people who instantly turn into consultants |
15:19
🔗
|
ersi |
Got a pretty bad rep |
15:19
🔗
|
xk_id |
well, as you can see, I kind of drift away from my degree |
15:19
🔗
|
xk_id |
:P |
15:19
🔗
|
xk_id |
my dissertation is on network science |
16:14
🔗
|
SmileyG |
you have an interesting dissertation on a crap sounding degree. |
16:14
🔗
|
SmileyG |
I was the other way around, |
16:14
🔗
|
xk_id |
what was your degree and dissertation? |
16:14
🔗
|
SmileyG |
Comp Sci |
16:14
🔗
|
SmileyG |
and errr |
16:15
🔗
|
SmileyG |
multiplatform location aware social gaming and interaction tool |
16:15
🔗
|
SmileyG |
a website which found gamers with similar interests who were located nearby. |
16:15
🔗
|
SmileyG |
I didn't even build the site. |
16:15
🔗
|
SmileyG |
It was more fun looking at all the issues surrounding the idea. |
16:16
🔗
|
SmileyG |
From: wtf is the point, anyone online can game anyone else so why be local? |
16:16
🔗
|
SmileyG |
to "WHO WILL THINK OF THE CHILDREN?!" |
16:27
🔗
|
xk_id |
heh |
16:28
🔗
|
xk_id |
I doubt I'll get a good grade on my dissertation, despite the extreme effort I'm putting in it. Because I think it's the kind of looking at issues that you mentioned which is expected.. Not so much trying to crawl a website without getting banned.. |
16:28
🔗
|
xk_id |
I'm doing it wrong. |
16:29
🔗
|
Schbirid |
the result does not count, the approach, working, thoughts etc do. at least in germany |
16:50
🔗
|
xk_id |
same here. and I'm not following it at all. |
16:50
🔗
|
xk_id |
I'm very retarded, I don't know what I'm thinking... |
16:50
🔗
|
xk_id |
I got too enamoured with this topic..... |
16:50
🔗
|
xk_id |
I have less than 2 months left |
16:50
🔗
|
xk_id |
oh, god. |
16:51
🔗
|
SmileyG |
I did that |
16:51
🔗
|
SmileyG |
step back |
16:51
🔗
|
SmileyG |
drink something |
16:51
🔗
|
SmileyG |
the crazy thing is, while doing that |
16:51
🔗
|
SmileyG |
I re-wrote my wife's dissertation, fixing all her grammar and stuff. |
16:51
🔗
|
SmileyG |
I'm dyslexic, but it was a good way to not think about mine at the time D: |
16:52
🔗
|
Schbirid |
procrastination is evil |
16:56
🔗
|
xk_id |
btw, sorry, I know you won't be able to help, but I really need to tell this to someone to get it off my chest. I caught my crawler malfunctioning (i.e. extracting incorrect data from the webpage; 2 pages in the friendlist, only extracted the first one). Now I'm running everything again and it works well. I have no, absolutely no idea what could be going on, how often it happens, and why |
16:56
🔗
|
* |
xk_id rips the last hair on his scalp |
16:56
🔗
|
SmileyG |
write about the bug!!!!!!! |
16:57
🔗
|
Schbirid |
what he said! |
16:57
🔗
|
xk_id |
can I do that?! |
16:57
🔗
|
SmileyG |
lol that's kind of like the point. |
16:57
🔗
|
SmileyG |
:/ |
16:57
🔗
|
SmileyG |
Or it was in CompSci |
16:57
🔗
|
SmileyG |
It was never about the end project, it's about the journey. |
16:57
🔗
|
SmileyG |
Show how you adapt to problems |
16:57
🔗
|
Schbirid |
yeah |
16:57
🔗
|
SmileyG |
Show how you've used your learning of the last 3 years to get around issues. |
16:58
🔗
|
SmileyG |
SHINE BRIGHTER THAN THE BRIGHTEST STAR |
16:58
🔗
|
SmileyG |
BURN LIKE THE NUCLEAR FORCE YOU ARE. |
17:00
🔗
|
xk_id |
I'm wondering whether I should restrict the scope of my dissertation to the crawler, tbh |
17:04
🔗
|
tef |
xk_id: you know that thing social networks do? return a fail whale page |
17:04
🔗
|
Schbirid |
speak to your supervisor (or whatever that is called) if possible |
17:04
🔗
|
tef |
that |
17:05
🔗
|
xk_id |
tef: sorry? |
17:05
🔗
|
tef |
xk_id: every so often you get a crap page and you have to hit f5. your crawler needs to have the same data |
17:05
🔗
|
tef |
behaviour even |
17:05
🔗
|
xk_id |
Schbirid: I tried, I dunno, I think I'm going really wrong about the whole thing. I'm 110% about results. |
17:06
🔗
|
tef |
you need to be stricter about your scraper, and ensure it fails fast if the page is not what it expects, and says 'page error' rather than 'page ok, no data' |
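tef's advice, to report 'page error' instead of 'page ok, no data', could look roughly like the sketch below. It is written in Python for illustration and is not the crawler under discussion; the HTML markers, `PageError`, and `parse_friend_page` are all hypothetical, and a real scraper would check for whatever structure the target site actually emits.

```python
import re
from typing import List


class PageError(Exception):
    """Raised when a fetched page does not look like the page that was requested."""


# Assumed markup, invented for this example only.
FRIEND_LIST_MARKER = '<ul class="friends">'
FRIEND_PATTERN = re.compile(r'<li class="friend">([^<]+)</li>')


def parse_friend_page(html: str) -> List[str]:
    """Return the friend names on a page, or raise PageError if the page is malformed.

    An empty list is returned only when the friend-list container is present
    but empty, so "no friends" and "broken page" are never confused.
    """
    if FRIEND_LIST_MARKER not in html:
        # Fail fast: an error page or truncated response must not count as "no data".
        raise PageError("friend list container missing; possibly an error page")
    return [name.strip() for name in FRIEND_PATTERN.findall(html)]
```

The point of the design is that the caller can tell three outcomes apart: names found, a valid page with no names, and a page that should be fetched again.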
17:07
🔗
|
xk_id |
I haven't implemented any tests for page errors. |
17:07
🔗
|
xk_id |
gather.com didn't seem that dynamic, I thought it won't do that sort of crap |
17:08
🔗
|
tef |
heh |
17:08
🔗
|
tef |
web pages fail |
17:08
🔗
|
tef |
they fail more when you hammer them |
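Since pages fail more often under load, the "hit f5" behaviour tef mentions is usually paired with a growing pause between attempts. A minimal Python sketch, assuming the fetch function raises a `PageError` when it gets a bad page; `PageError`, `fetch_with_retry`, and the default values are hypothetical and not taken from the discussion.

```python
import time
from typing import Callable


class PageError(Exception):
    """Raised by fetch() when the response is not the expected page."""


def fetch_with_retry(fetch: Callable[[str], str], url: str, attempts: int = 4,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch(url), retrying on PageError with exponentially growing pauses.

    Pauses are base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    The last PageError is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch(url)
        except PageError:
            if attempt == attempts - 1:
                raise
            # Back off so a struggling server is not hit even harder.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Giving up after a fixed number of attempts keeps a permanently broken page from stalling the whole crawl; the failure can be logged and revisited later.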
17:08
🔗
|
xk_id |
I shall not sob! |
17:30
🔗
|
xk_id |
if only there was somewhere on the user's profile the total number of friends.. |
17:52
🔗
|
soultcer |
xk_id: What website are you scraping? |
17:52
🔗
|
xk_id |
Gather.com |
17:54
🔗
|
soultcer |
What's the crawler written in? |
17:54
🔗
|
xk_id |
Node.js |
19:21
🔗
|
xk_id |
\o/ |
19:21
🔗
|
xk_id |
crawler operational on cloudshards.com |
20:46
🔗
|
* |
xk_id cries with joy |
20:47
🔗
|
xk_id |
I found the bug |
20:47
🔗
|
ersi |
\o/ |
20:49
🔗
|
xk_id |
programming is.. well.. it's something. the joys and the pains certainly balance each other.... |
20:49
🔗
|
xk_id |
:) |
20:50
🔗
|
xk_id |
It's as much beautiful as it is horrible. |
20:50
🔗
|
xk_id |
Those balances in life.. |
20:52
🔗
|
ersi |
Heh, yeah. |
21:13
🔗
|
schbiridi |
xk_id: document it, you got another page written :) |
21:14
🔗
|
xk_id |
:) |
21:15
🔗
|
xk_id |
I really thought my dissertation should resemble more a journal article, rather than a sort of reflective/auto-biographical piece. but your suggestions are consistent with what I think my supervisor has been trying to explain to me for a while... |
21:38
🔗
|
SmileyG |
explain what you're going to do |
21:39
🔗
|
SmileyG |
explain what you did |
21:39
🔗
|
SmileyG |
explain everything else. |
22:11
🔗
|
xk_id |
I'm successfully running my crawlers on two VPSs. that I'm paying $3/month |
22:12
🔗
|
xk_id |
*that I'm paying $3/month for |
22:12
🔗
|
xk_id |
I don't think it's too bad |
22:12
🔗
|
xk_id |
and there seem to be lots of providers like that |
22:15
🔗
|
chronomex |
I like the 21st century :) |
22:15
🔗
|
xk_id |
that's exactly what I was thinking too :) |
22:19
🔗
|
SmileyG |
we do live in that future I dreamt of as a child :o |
22:19
🔗
|
godane |
so i have uploaded 2679 g4tv.com web videos |
22:20
🔗
|
SmileyG |
:O |
22:39
🔗
|
omf_ |
godane, are you just working your way through that site? |
22:47
🔗
|
godane |
i'm working on the videos |
22:48
🔗
|
godane |
i can't get everything by myself |
22:48
🔗
|
omf_ |
do you have a list of what is left and what is done? |
22:50
🔗
|
godane |
i got most of the forums so i think i can do the forums |
22:50
🔗
|
godane |
i also have the feed |
22:51
🔗
|
godane |
i want to grab this: http://www.g4tv.com/techtvvault/index.html |
22:51
🔗
|
godane |
but there is no easy way i think |
22:52
🔗
|
godane |
very old techtv articles |
22:53
🔗
|
godane |
i'm getting triumph of the nerds |
22:54
🔗
|
godane |
i'm also getting secret life of machines |
22:55
🔗
|
godane |
that is only found on p2p i think |
22:55
🔗
|
godane |
the author even says to get it from p2p |
22:55
🔗
|
godane |
http://en.wikipedia.org/wiki/Secret_Life_of_Machines |
22:57
🔗
|
godane |
it looks like it was released on video tape and dvd |
22:58
🔗
|
chronomex |
secret life of machines is cool |
22:58
🔗
|
godane |
it's old enough that it shouldn't get darked |
23:11
🔗
|
xk_id |
do you guys archive vimeo/youtube? |
23:11
🔗
|
xk_id |
or, does anybody, for that matter? |
23:11
🔗
|
xk_id |
better I should google.. |
23:11
🔗
|
balrog |
"72 hours of video are uploaded to YouTube every minute" |
23:11
🔗
|
balrog |
good luck. |
23:11
🔗
|
xk_id |
I had no idea that was the scale |
23:16
🔗
|
godane |
i think IA tries to back up the ones with bigger videos |
23:16
🔗
|
chronomex |
IA sucks in the videos that are mentioned on the twitter feed they archive |
23:16
🔗
|
godane |
that's my thought |
23:17
🔗
|
godane |
i also think it would be best to just also back up the videos said users have uploaded |
23:18
🔗
|
chronomex |
sure |
23:18
🔗
|
godane |
its a way to do sort of a tree grab of youtube |
23:19
🔗
|
godane |
then you can start looking at playlists and grab those videos and all those users' videos |
23:19
🔗
|
godane |
and etc |
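The "tree grab" godane describes is essentially a breadth-first walk: start from a set of known videos, find each uploader, then queue everything else that uploader has posted. A rough Python sketch under that reading; `tree_grab`, `uploader_of`, and `videos_by` are hypothetical callbacks supplied by the caller, not calls into any real YouTube API, and playlists would simply be one more kind of edge to follow.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, List, Optional


def tree_grab(seed_videos: Iterable[Hashable],
              uploader_of: Callable[[Hashable], Optional[Hashable]],
              videos_by: Callable[[Hashable], Iterable[Hashable]],
              max_videos: int = 1000) -> List[Hashable]:
    """Breadth-first walk from seed videos to their uploaders' other videos.

    Returns video ids in the order they would be grabbed, never repeating a
    video and never expanding the same uploader twice.
    """
    seen_videos = set()
    seen_users = set()
    order = []
    queue = deque(seed_videos)
    while queue and len(order) < max_videos:
        video = queue.popleft()
        if video in seen_videos:
            continue
        seen_videos.add(video)
        order.append(video)
        user = uploader_of(video)
        if user is not None and user not in seen_users:
            seen_users.add(user)
            # Queue everything this uploader has posted; duplicates are skipped above.
            queue.extend(videos_by(user))
    return order
```

The `max_videos` cap matters in practice: given the upload rate quoted earlier, an unbounded walk would never finish.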
23:57
🔗
|
dashcloud |
xk_id: can you access any of the things you want over IPv6? There's a vastly larger number of IPv6 addresses available for you if you could |
23:58
🔗
|
dashcloud |
also, if you're really stuck, there's a Plan Z: bulletproof hosting (which is something to consider only if everything else fails) |