| Time |
Nickname |
Message |
|
00:02
|
SketchCow |
Pete Chivani. |
|
00:02
|
SketchCow |
Now I'm spelling it wrong. |
|
00:02
|
SketchCow |
YOU DID THOS TO ME |
|
00:02
|
SketchCow |
sdlkfslfkjdsf |
|
00:04
|
SketchCow |
So, this hotel network? Suuuuuuucl |
|
00:04
|
SketchCow |
Can't believe I paid for it. |
|
00:04
|
SketchCow |
win 8 |
|
00:08
|
SketchCow |
OK, feeling better. |
|
00:08
|
SketchCow |
His name is Pete Chvany. |
|
00:08
|
SketchCow |
"Computer Networks: The Heralds of Resource Sharing". 1972. |
|
00:08
|
SketchCow |
It's on archive.org. |
|
00:28
|
bsmith093 |
thanks so much |
|
02:02
|
underscor |
33 on the ACT |
|
02:02
|
underscor |
I'm pretty excited |
|
02:02
|
underscor |
:B |
|
02:02
|
* |
BlueMax stabs underscor |
|
02:15
|
underscor |
http://en.wikipedia.org/wiki/Super_High_Me |
|
03:49
|
chronomex |
underscor: is that out of 32? |
|
03:49
|
chronomex |
:P |
|
03:49
|
underscor |
chronomex: 36 |
|
03:49
|
underscor |
:P |
|
03:49
|
chronomex |
I kid I kid |
|
03:50
|
chronomex |
good job |
|
03:50
|
underscor |
Thanks :D |
|
03:50
|
chronomex |
I think I got something around 33 too |
|
03:50
|
chronomex |
so ... you're in good company |
|
03:50
|
underscor |
haha |
|
03:50
|
underscor |
99th percentile, fuck yeah |
|
03:50
|
* |
chronomex currently packaging up symbian for the torrent |
|
03:50
|
underscor |
99th percentile on ACT, 94th percentile on SAT, and 3.05 GPA |
|
03:51
|
underscor |
One of these is not like the other |
|
03:51
|
underscor |
:V |
|
03:51
|
chronomex |
94th? :( |
|
03:51
|
underscor |
Supposedly |
|
03:51
|
underscor |
I couldn't find any concrete numbers anywhere |
|
03:51
|
chronomex |
you disappoint |
|
03:51
|
underscor |
I got a 2010 |
|
03:51
|
chronomex |
2010 is last year |
|
03:51
|
underscor |
Whatever percentile that is |
|
03:51
|
underscor |
1380 on the old scale |
|
03:51
|
chronomex |
mmmm |
|
03:53
|
underscor |
Either way, I'm pretty happy |
|
03:53
|
underscor |
Except for my GPA |
|
03:53
|
underscor |
lol |
|
03:53
|
DFJustin |
heh I got like 1510 on the old scale |
|
03:53
|
underscor |
But that's because I hate mundane work |
|
03:53
|
underscor |
and I spend all my time on archiveteam and other fun things |
|
03:53
|
underscor |
instead of doing homework |
|
03:53
|
underscor |
:< |
|
03:53
|
underscor |
DFJustin: That's almost perfect |
|
03:54
|
underscor |
:P |
|
03:54
|
chronomex |
archiveteam is a good thing to spend your time on |
|
03:54
|
underscor |
Hopefully this internship thing in august works out well too |
|
05:58
|
underscor |
ndurner: Able to get a stats update when you have a minute? |
|
05:58
|
underscor |
:) |
|
06:08
|
ndurner |
will do :-) |
|
10:44
|
Spirit_ |
useless fact of the day, the number of domains starting with each character in alexa's top 1M list http://pastebin.com/gpPxZWZY |
|
10:44
|
Spirit_ |
M as in million, not thousand |
|
10:48
|
Spirit_ |
and on place 744459 there is "_live.it"... |
|
10:49
|
Spirit_ |
i wonder if i should try to grab 100.000 robots.txt per day instead of 10.000 |
|
11:34
|
bbot_ |
Spirit_: that's a lot of suicide notes |
|
12:58
|
Spirit_ |
bbot_: hm? |
|
13:00
|
bbot_ |
as in, jason's "robots.txt is a suicide note" essay |
|
13:03
|
Spirit_ |
ah yes |
|
13:03
|
Spirit_ |
currently thinking how to make it nicely accessible |
|
13:04
|
Spirit_ |
maybe after each scrape, check which files were changed or are new/gone and put that information in a database |
|
13:05
|
Spirit_ |
well, let's try if i can get 100000 down instead |
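
The per-scrape bookkeeping described above (changed, new, or gone) could be sketched roughly as follows. This is only an illustration: it assumes each scrape sits in its own directory with matching relative file names, `classify_changes` is a made-up helper name, and it prints one line per file where a real setup would write to the database.

```shell
#!/bin/sh
# classify_changes OLD_DIR NEW_DIR (hypothetical helper, not from the log).
# Prints "gone", "new", or "changed" followed by the relative file name.
classify_changes() (
    old=$1
    new=$2
    work=$(mktemp -d) || exit 1
    # Sorted listings of both scrapes; comm requires sorted input.
    (cd "$old" && find . -type f | LC_ALL=C sort) > "$work/old"
    (cd "$new" && find . -type f | LC_ALL=C sort) > "$work/new"
    # Only in the old scrape: the file disappeared.
    LC_ALL=C comm -23 "$work/old" "$work/new" | sed 's/^/gone /'
    # Only in the new scrape: the file appeared.
    LC_ALL=C comm -13 "$work/old" "$work/new" | sed 's/^/new /'
    # In both scrapes: report it only if the bytes differ.
    LC_ALL=C comm -12 "$work/old" "$work/new" |
    while IFS= read -r f; do
        cmp -s "$old/$f" "$new/$f" || echo "changed $f"
    done
    rm -rf "$work"
)
```

Called as `classify_changes scrape-old scrape-new` (both directory names are placeholders), it leaves unchanged files out of the output entirely.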
|
13:27
|
Lembam |
Hi there. :-) |
|
13:29
|
Spirit_ |
h |
|
13:29
|
Spirit_ |
i |
|
13:32
|
Lembam |
I'm just having a look, what you guys are doing is so great. But I assume you've been receiving lots of thanks lately. :-P |
|
13:35
|
Spirit_ |
careful if you look too much, i am afraid some guys in here wear no pants |
|
13:35
|
Spirit_ |
actually it is not that often that people come here i think but i am just a side peobn |
|
13:35
|
Spirit_ |
peon |
|
13:45
|
emijrp |
What is the problem with wearing skirt? |
|
13:46
|
emijrp |
Female archivist here. |
|
13:47
|
Spirit_ |
ha, i never knew |
|
13:47
|
Spirit_ |
girls in skirts are cool |
|
13:52
|
Lembam |
brb |
|
14:04
|
Spirit_ |
17962 files so far |
|
14:04
|
Spirit_ |
i estimate 70k, since i always got ~7k from 10k |
|
14:05
|
Spirit_ |
so something around 5-6 hours, that is great |
|
14:11
|
Lembam |
back |
|
14:29
|
sadcarrot |
any word on yahoo video? |
|
14:30
|
ersi |
sadcarrot: What kind of words are you looking for? :) |
|
14:30
|
sadcarrot |
lol |
|
14:30
|
Spirit_ |
bash question: if i get an error, i would like it to be in $result, result=$(diff -q $yesterday $today) |
|
14:30
|
sadcarrot |
the good kinds! |
|
14:30
|
Spirit_ |
any hint? |
|
14:30
|
Spirit_ |
i mean this is my line "result=$(diff -q $yesterday $today)" |
|
14:30
|
sadcarrot |
i can no longer rsync my yahoo video |
|
14:30
|
Spirit_ |
but if a file is missing, i get an error and $result is empty |
|
14:30
|
sadcarrot |
so, just wanted to verify that is complete |
|
14:30
|
sadcarrot |
(password doesn't work) |
|
14:30
|
Spirit_ |
wait a second |
|
14:31
|
ersi |
sadcarrot: Oh, well - it's best if you'd check with SketchCow on that |
|
14:31
|
Spirit_ |
yes |
|
14:31
|
Spirit_ |
result=$(diff -q $yesterday $today 2>&1) |
|
14:31
|
Spirit_ |
thanks :P |
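
The `2>&1` fix works because `$(...)` captures only standard output, while diff reports a missing file on standard error. A minimal sketch of the same idea that also keeps diff's exit status, so a missing file can be told apart from a real difference. `compare_snapshots` is a hypothetical helper name, and the 0/1/2 exit codes are those documented for common diff implementations such as GNU diff.

```shell
#!/bin/sh
# compare_snapshots OLD NEW (hypothetical helper, not from the log).
# Prints "same", "changed", or "error: <diff's own message>".
compare_snapshots() {
    # 2>&1 inside the substitution folds diff's stderr into the captured text.
    result=$(diff -q "$1" "$2" 2>&1)
    status=$?   # diff: 0 = identical, 1 = different, 2 = trouble (e.g. missing file)
    case $status in
        0) echo "same" ;;
        1) echo "changed" ;;
        *) echo "error: $result" ;;
    esac
}
```

Branching on the exit status avoids having to parse diff's message text, which varies between implementations and locales.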
|
14:32
|
sadcarrot |
gotcha |
|
14:39
|
Spirit_ |
does anyone have a tested and proven method how to identify true HTML files from bash? many sites serve random crap pages when i ask them for a robots.txt |
|
14:40
|
Spirit_ |
i am afraid that "file" might misclassify some |
|
14:46
|
alard |
grep "<" ? |
|
14:47
|
Spirit_ |
that character is in txt files |
|
14:47
|
Spirit_ |
file seems to do a good job actually |
|
14:48
|
Spirit_ |
http://pastebin.com/raw.php?i=2ymRsydX |
|
14:49
|
Spirit_ |
seems like people like to roundrobin and serve different files too, meh |
|
14:59
|
soultcer |
Spirit_: Did you check the mime type? |
|
15:36
|
db48x |
look for a doctype |
|
15:37
|
db48x |
<!DOCTYPE ...> |
|
15:37
|
db48x |
lots of people leave them off though |
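
One way to combine the doctype suggestion with a fallback for pages that omit it is to scan only the first bytes of each response for HTML markers. This is a heuristic sketch, not a proven method: `looks_like_html` is a hypothetical name, the 1024-byte window is an arbitrary choice, and running `file --mime-type` on the download is an alternative.

```shell
#!/bin/sh
# looks_like_html FILE (hypothetical helper, not from the log).
# Exit status 0 if the first 1024 bytes contain a doctype declaration
# or an opening <html>, <head>, or <body> tag (case-insensitive).
looks_like_html() {
    head -c 1024 "$1" |
        grep -qiE '<!doctype[[:space:]]+html|<(html|head|body)[[:space:]>]'
}
```

Usage would look like `looks_like_html "$f" && echo "$f is probably not a robots.txt"`. A bare `<` is not enough to trigger a match, which addresses the objection that the character also appears in plain text files.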
|
15:43
|
alard |
In other news: I did a little experimenting based on Coderjoe's idea for a whois archiver. http://whoisarchive.heroku.com/ |
|
15:45
|
Lembam |
The whois/domain lookup archiver looks cool. :-) |
|
15:49
|
alard |
There is a paid service that does the same, though: http://www.domaintools.com/ |
|
15:56
|
Spirit_ |
i think i will go with "file" |
|
15:56
|
Spirit_ |
i dont like whois archiving, especially not indexed by search engines |
|
15:56
|
Spirit_ |
actually, i wish whois would vanish |
|
15:57
|
Spirit_ |
for privacy |
|
15:57
|
underscor |
Spirit_: Why? |
|
15:57
|
underscor |
It's the same thing as a business license, or a car registration |
|
15:57
|
Spirit_ |
because i do not want john doe to google my name and find domain x and y |
|
15:57
|
underscor |
They're all public information |
|
15:57
|
Spirit_ |
in the US maybe |
|
15:58
|
underscor |
Then buy a domain privacy thing |
|
15:58
|
underscor |
Well, com and net are administered in the us, so... :P |
|
15:58
|
Spirit_ |
yeah, but fuck that! :P |
|
15:58
|
underscor |
Convince your local ccTLD to get rid of whois |
|
15:58
|
underscor |
and there you go |
|
15:58
|
db48x |
yea, that's good information to archive |
|
15:59
|
Spirit_ |
The compilation, |
|
15:59
|
Spirit_ |
repackaging, dissemination or other use of this Data is expressly |
|
15:59
|
Spirit_ |
prohibited without the prior written consent of VeriSign. |
|
15:59
|
Spirit_ |
says many (all?) com whois' |
|
15:59
|
db48x |
well, yea. they would say that |
|
15:59
|
underscor |
Registrant Organization:ARCHIVE TEAM IS GO |
|
15:59
|
underscor |
hahaha |
|
15:59
|
db48x |
they want to have control |
|
16:00
|
db48x |
heh |
|
16:00
|
underscor |
Server Name: FRIENDSTER.COM.ZZZZZ.GET.LAID.AT.WWW.SWINGINGCOMMUNITY.COM |
|
16:00
|
underscor |
IP Address: 69.41.185.226 |
|
16:00
|
underscor |
Referral URL: http://domainhelp.opensrs.net |
|
16:00
|
underscor |
Registrar: TUCOWS.COM CO. |
|
16:00
|
underscor |
Whois Server: whois.tucows.com |
|
16:00
|
underscor |
What?!?! |
|
16:00
|
underscor |
http://whoisarchive.heroku.com/friendster.com/20110628142730.txt |
|
16:01
|
db48x |
heh |
|
16:01
|
Lembam |
brb |
|
16:02
|
Spirit_ |
that would be the database lookup i guess |
|
16:05
|
Spirit_ |
hm, do i want to delete html responses? |
|
16:06
|
Spirit_ |
bash: /bin/ls: Argument list too long :( |
|
16:06
|
db48x |
xargs |
|
16:06
|
db48x |
find whatever -print0 | xargs -0 rm |
|
16:07
|
Spirit_ |
sorry, completely unrelated to the deletion |
|
16:09
|
Spirit_ |
but find was a good suggestion, thanks |
|
16:10
|
Spirit_ |
or not |
|
16:10
|
Spirit_ |
uggestion, thanks |
|
16:10
|
Spirit_ |
bash: /usr/bin/find: Argument list too long |
|
16:10
|
Spirit_ |
find robotstxt2/files/*/*/20110628 | wc -l |
|
16:10
|
Spirit_ |
there was a trick with echo for this, hm |
|
16:20
|
db48x |
any time the arguments list is too long, use find |
|
16:20
|
db48x |
find whatever -print0 | xargs -0 wc -l |
|
16:21
|
db48x |
sorry, for that you'll want find whatever -print0 | xargs -0 ls -l | wc -l |
|
16:21
|
db48x |
annoying, but |
|
16:22
|
db48x |
anyway, I'm late |
|
16:22
|
db48x |
bbl |
|
16:25
|
Spirit_ |
thanks, that works |
|
16:25
|
sadcarrot |
underscor: hey man |
|
16:25
|
sadcarrot |
underscor: can you check the status of my yahoo vid upload? |
|
16:25
|
Spirit_ |
i guess -print0 does not buffer like without |
|
16:26
|
underscor |
sadcarrot: Were you uploading to me or to rsync.net? |
|
16:27
|
closure |
"(If you have reviews, I'd begin the process of archiving them via a Word document." http://wheredangerlives.blogspot.com/2011/06/professor-is-dead-long-live-netflix.html |
|
16:28
|
closure |
netflix reviews that is |
|
16:28
|
sadcarrot |
underscor: datadump.textfiles.com |
|
16:29
|
underscor |
You'll have to talk to SketchCow then |
|
16:29
|
sadcarrot |
oh ok |
|
16:32
|
Spirit_ |
55k files down |
|
16:33
|
Spirit_ |
about 7/10th through the 100k list |
|
17:50
|
Spirit_ |
db48x: |
|
17:50
|
Spirit_ |
$ time find files/*/*/20110628 -print0 | xargs -0 ls -l | wc -l |
|
17:50
|
Spirit_ |
bash: /usr/bin/find: Argument list too long |
|
17:50
|
Spirit_ |
:] |
|
17:50
|
Spirit_ |
i guess 64k is a limit |
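
The error here comes from the shell rather than from find itself: `files/*/*/20110628` is expanded into one argument per match before find starts, and the combined size runs into the system's argument limit (system-dependent; `getconf ARG_MAX` reports it). A sketch that passes find only the top directory and lets it do the matching. It assumes the `files/<x>/<y>/<date>` layout implied by the glob and that names contain no newlines; `count_snapshots` is a hypothetical helper name.

```shell
#!/bin/sh
# count_snapshots ROOT DATE (hypothetical helper, not from the log).
# Counts entries named DATE exactly three levels below ROOT, i.e. the
# same paths ROOT/*/*/DATE would match, without any shell glob expansion.
count_snapshots() {
    # tr strips the padding some wc implementations put around the number.
    find "$1" -mindepth 3 -maxdepth 3 -name "$2" | wc -l | tr -d ' '
}
```

For the case in the log this would be `count_snapshots files 20110628`. Unlike the original command, it counts only the entries named `20110628` themselves, not anything inside them if they are directories.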
|
18:27
|
balrog |
SketchCow: ping |
|
18:27
|
balrog |
as for the bitsavers stuff ... are you familiar with Manx? |
|
18:48
|
ndurner |
alard: is there a problem with your Google Groups script? |
|
18:48
|
Spirit_ |
now lets see if 7z likes to pack these files |
|
18:50
|
alard |
ndurner: No, it's just switched off. |
|
18:51
|
alard |
My connection is currently busy with downloading Friendster user connections and uploading the other Friendster data. |
|
18:51
|
alard |
I'll probably turn ggroups back on when those things are done. |
|
18:57
|
ndurner |
ah, ok |
|
18:58
|
ndurner |
can you upload your script somewhere so that someone else can jump in? |
|
18:58
|
Spirit_ |
seems to work |
|
18:58
|
ndurner |
(also, having the code for that kind of trickery might help future projects) |
|
19:01
|
alard |
ndurner: the ggroups script? |
|
19:40
|
ndurner |
alard: yes |
|
20:06
|
alard |
ndurner: Sorry for the delay, I had to find my notes on ipv6 tunnels first. |
|
20:06
|
alard |
https://gist.github.com/30cff29b602b818d018c#file_instructions.txt |
|
20:06
|
ndurner |
thanks! |
|
20:06
|
alard |
https://gist.github.com/30cff29b602b818d018c#file_ggroups_zipdl_ipv6.sh |
|
20:23
|
ndurner |
underscor: Google Groups update: |
|
20:23
|
ndurner |
directories: TOTAL: 243898, NEW: 105872, PROCESSING: 15, DONE_DIR: 138011 |
|
20:23
|
ndurner |
completion rate: directories: 337/hr, groups: 865/hr |
|
20:23
|
ndurner |
groups: TOTAL: 1245968, NEW: 767342, PROCESSING: 44, ERROR: 10944, ADULT: 4236, DONE_GRP: 463402 |
|
21:23
|
alard |
marceloan: Hi, have you been able to upload your twaud.io files yet? |
|
21:24
|
alard |
Or haven't you been able to contact SketchCow? |
|
21:28
|
marceloan |
Hi |
|
21:30
|
marceloan |
alard: No and no. |
|
21:30
|
alard |
Ah. |
|
21:30
|
marceloan |
alard: What compression should I use? |
|
21:30
|
alard |
No compression, I guess. |
|
21:31
|
marceloan |
alard: I have to send all the data unzipped? |
|
21:31
|
alard |
You can try bzip or gzip, but it probably won't help. mp3's are already pretty compressed. |
|
21:31
|
alard |
If it helps, you could rsync it to me and then I'll upload it along with my part. |
|
21:32
|
marceloan |
Yes, how can I do it? |
|
21:32
|
alard |
Is rsync okay? |
|
21:32
|
marceloan |
I have to use Linux? |
|
21:33
|
alard |
No, you can also use cwRsync, the Windows version. |
|
21:34
|
marceloan |
That? http://www.itefix.no/cwrsync/ |
|
21:34
|
alard |
Yes. And then you probably don't want the server, just the client. |
|
21:40
|
marceloan |
3.6MB, downloading... 10 minutes left... |
|
21:40
|
alard |
Ah, that takes a while. |
|
21:41
|
alard |
That gives me the time to figure out how I can set up an rsyncd server. |
|
21:53
|
marceloan |
Ok, I installed it. |
|
21:56
|
alard |
Great. Let's continue in a private message. |