Time |
Nickname |
Message |
00:02
|
SketchCow |
Pete Chivani. |
00:02
|
SketchCow |
Now I'm spelling it wrong. |
00:02
|
SketchCow |
YOU DID THOS TO ME |
00:02
|
SketchCow |
sdlkfslfkjdsf |
00:04
|
SketchCow |
So, this hotel network? Suuuuuuuck |
00:04
|
SketchCow |
Can't believe I paid for it. |
00:04
|
SketchCow |
win 8 |
00:08
|
SketchCow |
OK, feeling better. |
00:08
|
SketchCow |
His name is Pete Chvany. |
00:08
|
SketchCow |
"Computer Networks: The Heralds of Resource Sharing". 1972. |
00:08
|
SketchCow |
It's on archive.org. |
00:28
|
bsmith093 |
thanks so much |
02:02
|
underscor |
33 on the ACT |
02:02
|
underscor |
I'm pretty excited |
02:02
|
underscor |
:B |
02:02
|
* |
BlueMax stabs underscor |
02:15
|
underscor |
http://en.wikipedia.org/wiki/Super_High_Me |
03:49
|
chronomex |
underscor: is that out of 32? |
03:49
|
chronomex |
:P |
03:49
|
underscor |
chronomex: 36 |
03:49
|
underscor |
:P |
03:49
|
chronomex |
I kid I kid |
03:50
|
chronomex |
good job |
03:50
|
underscor |
Thanks :D |
03:50
|
chronomex |
I think I got something around 33 too |
03:50
|
chronomex |
so ... you're in good company |
03:50
|
underscor |
haha |
03:50
|
underscor |
99th percentile, fuck yeah |
03:50
|
* |
chronomex currently packaging up symbian for the torrent |
03:50
|
underscor |
99th percentile on ACT, 94th percentile on SAT, and 3.05 GPA |
03:51
|
underscor |
One of these is not like the other |
03:51
|
underscor |
:V |
03:51
|
chronomex |
94th? :( |
03:51
|
underscor |
Supposedly |
03:51
|
underscor |
I couldn't find any concrete numbers anywhere |
03:51
|
chronomex |
you disappoint |
03:51
|
underscor |
I got a 2010 |
03:51
|
chronomex |
2010 is last year |
03:51
|
underscor |
Whatever percentile that is |
03:51
|
underscor |
1380 on the old scale |
03:51
|
chronomex |
mmmm |
03:53
|
underscor |
Either way, I'm pretty happy |
03:53
|
underscor |
Except for my GPA |
03:53
|
underscor |
lol |
03:53
|
DFJustin |
heh I got like 1510 on the old scale |
03:53
|
underscor |
But that's because I hate mundane work |
03:53
|
underscor |
and I spend all my time on archiveteam and other fun things |
03:53
|
underscor |
instead of doing homework |
03:53
|
underscor |
:< |
03:53
|
underscor |
DFJustin: That's almost perfect |
03:54
|
underscor |
:P |
03:54
|
chronomex |
archiveteam is a good thing to spend your time on |
03:54
|
underscor |
Hopefully this internship thing in august works out well too |
05:58
|
underscor |
ndurner: Able to get a stats update when you have a minute? |
05:58
|
underscor |
:) |
06:08
|
ndurner |
will do :-) |
10:44
|
Spirit_ |
useless fact of the day, the number of domains starting with each character in alexa's top 1M list http://pastebin.com/gpPxZWZY |
10:44
|
Spirit_ |
M as in million, not thousand |
10:48
|
Spirit_ |
and on place 744459 there is "_live.it"... |
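Spirit_'s pastebin output is not reproduced here, but a tally like it can be produced with standard text tools. A minimal sketch, assuming the list is a `rank,domain` CSV with one entry per line; the `count_first_chars` name and the `top-1m.csv` file name are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Count domains by their first character in a "rank,domain" CSV.
# Assumption: one entry per line, e.g. "1,google.com".
count_first_chars() {
  local csv="$1"
  cut -d, -f2 "$csv" |   # keep only the domain column
    cut -c1 |            # keep the first character of each domain
    sort |               # group identical characters together
    uniq -c |            # count each group
    sort -rn             # most common first
}

# Usage (file name is an assumption):
#   count_first_chars top-1m.csv
```

Since some `cut` implementations treat `-c` as bytes, the tally is reliable only for ASCII names, including punycode-encoded ones.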
10:49
|
Spirit_ |
i wonder if i should try to grab 100.000 robots.txt per day instead of 10.000 |
11:34
|
bbot_ |
Spirit_: that's a lot of suicide notes |
12:58
|
Spirit_ |
bbot_: hm? |
13:00
|
bbot_ |
as in, jason's "robots.txt is a suicide note" essay |
13:03
|
Spirit_ |
ah yes |
13:03
|
Spirit_ |
currently thinking how to make it nicely accessible |
13:04
|
Spirit_ |
maybe after each scrape, check which files were changed or are new/gone and put that information in a database |
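One way to get the changed/new/gone information Spirit_ describes is to compare each domain's files from two scrape dates. A sketch, assuming the layout `ROOT/<first char>/<domain>/<YYYYMMDD>` that the `find` commands later in this log suggest (inferred, not confirmed); `report_changes` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Report robots.txt changes between two scrape dates.
# Assumed layout (inferred from paths in the log, not confirmed):
#   ROOT/<first char>/<domain>/<YYYYMMDD>
# Prints one tab-separated line per affected domain: STATUS, domain.
report_changes() {
  local root="$1" old="$2" new="$3" dir domain
  for dir in "$root"/*/*/; do
    [ -d "$dir" ] || continue            # the glob matched nothing
    domain=$(basename "$dir")
    if [ -e "$dir$new" ] && [ ! -e "$dir$old" ]; then
      printf 'NEW\t%s\n' "$domain"
    elif [ -e "$dir$old" ] && [ ! -e "$dir$new" ]; then
      printf 'GONE\t%s\n' "$domain"
    elif [ -e "$dir$old" ] && ! cmp -s "$dir$old" "$dir$new"; then
      # Both dates exist here; cmp -s is silent and fails on any difference.
      printf 'CHANGED\t%s\n' "$domain"
    fi
  done
}
```

The tab-separated output (for example `report_changes robotstxt2/files 20110627 20110628 > changes.tsv`) could then be loaded into whatever database is chosen; no particular database is assumed here.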
13:05
|
Spirit_ |
well, let's try if i can get 100000 down instead |
13:27
|
Lembam |
Hi there. :-) |
13:29
|
Spirit_ |
h |
13:29
|
Spirit_ |
i |
13:32
|
Lembam |
I'm just having a look, what you guys are doing is so great. But I assume you've been receiving lots of thanks lately. :-P |
13:35
|
Spirit_ |
careful if you look too much, i am afraid some guys in here wear no pants |
13:35
|
Spirit_ |
actually it is not that often that people come here i think but i am just a side peobn |
13:35
|
Spirit_ |
peon |
13:45
|
emijrp |
What is the problem with wearing skirt? |
13:46
|
emijrp |
Female archivist here. |
13:47
|
Spirit_ |
ha, i never knew |
13:47
|
Spirit_ |
girls in skirts are cool |
13:52
|
Lembam |
brb |
14:04
|
Spirit_ |
17962 files so far |
14:04
|
Spirit_ |
i estimate 70k, since i always got ~7k from 10k |
14:05
|
Spirit_ |
so something around 5-6 hours, that is great |
14:11
|
Lembam |
back |
14:29
|
sadcarrot |
any word on yahoo video? |
14:30
|
ersi |
sadcarrot: What kind of words are you looking for? :) |
14:30
|
sadcarrot |
lol |
14:30
|
Spirit_ |
bash question: if i get an error, i would like it to be in $result, result=$(diff -q $yesterday $today) |
14:30
|
sadcarrot |
the good kinds! |
14:30
|
Spirit_ |
any hint? |
14:30
|
Spirit_ |
i mean this is my line "result=$(diff -q $yesterday $today)" |
14:30
|
sadcarrot |
i can no longer rsync my yahoo video |
14:30
|
Spirit_ |
but if a file is missing, i get an error and $result is empty |
14:30
|
sadcarrot |
so, just wanted to verify that is complete |
14:30
|
sadcarrot |
(password doesn't work) |
14:30
|
Spirit_ |
wait a second |
14:31
|
ersi |
sadcarrot: Oh, well - it's best if you'd check with SketchCow on that |
14:31
|
Spirit_ |
yes |
14:31
|
Spirit_ |
result=$(diff -q $yesterday $today 2>&1) |
14:31
|
Spirit_ |
thanks :P |
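Spirit_'s fix works because `$(...)` captures only standard output, while `diff` reports problems such as a missing file on standard error. A slightly fuller sketch (the `compare_files` name is hypothetical) also keeps diff's exit status, which distinguishes identical files (0), different files (1) and trouble (greater than 1), and quotes the variables so paths containing spaces survive:

```shell
#!/usr/bin/env bash
# Compare two snapshot files and always capture something useful.
# 2>&1 merges diff's error messages (e.g. "No such file or directory")
# into the captured text alongside its normal report.
compare_files() {
  local yesterday="$1" today="$2" result status
  result=$(diff -q "$yesterday" "$today" 2>&1)
  status=$?   # exit status of the command substitution above
  case $status in
    0) echo "SAME" ;;
    1) echo "DIFFERENT" ;;
    *) echo "ERROR: $result" ;;
  esac
}
```

Declaring `result` with `local` on its own line matters: `local result=$(...)` would return the status of `local` (always 0) instead of diff's.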
14:32
|
sadcarrot |
gotcha |
14:39
|
Spirit_ |
does anyone have a tested and proven method how to identify true HTML files from bash? many sites serve random crap pages when i ask them for a robots.txt |
14:40
|
Spirit_ |
i am afraid that "file" might misclassify some |
14:46
|
alard |
grep "<" ? |
14:47
|
Spirit_ |
that character is in txt files |
14:47
|
Spirit_ |
file seems to do a good job actually |
14:48
|
Spirit_ |
http://pastebin.com/raw.php?i=2ymRsydX |
14:49
|
Spirit_ |
seems like people like to roundrobin and serve different files too, meh |
14:59
|
soultcer |
Spirit_: Did you check the mime type? |
15:36
|
db48x |
look for a doctype |
15:37
|
db48x |
<!DOCTYPE ...> |
15:37
|
db48x |
lots of people leave them off though |
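Of the suggestions above, `file --mime-type -b FILE` prints a MIME type such as `text/html` and is probably the simplest check. Where `file` is unavailable or its verdict looks doubtful, a heuristic along these lines could serve; the `looks_like_html` name and the marker list are assumptions, not a tested standard. It accepts a doctype but, because many pages omit one, also an `<html>`, `<head>` or `<body>` tag near the top of the file:

```shell
#!/usr/bin/env bash
# Heuristic: does a downloaded "robots.txt" look like an HTML page?
# Looks only at the first 1024 bytes, case-insensitively, for markers
# that plain robots.txt files practically never contain.
looks_like_html() {
  head -c 1024 "$1" |
    grep -qiE '<!doctype[[:space:]]+html|<html[[:space:]>]|<head[[:space:]>]|<body[[:space:]>]'
}

# Usage (path is illustrative):
#   looks_like_html robotstxt2/files/a/example.com/20110628 && echo "html"
```

A robots.txt that happens to quote one of these tags in its first kilobyte would be misclassified, so spot-checking the results is still worthwhile.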
15:43
|
alard |
In other news: I did a little experimenting based on Coderjoe's idea for a whois archiver. http://whoisarchive.heroku.com/ |
15:45
|
Lembam |
The whois/domain lookup archiver looks cool. :-) |
15:49
|
alard |
There is a paid service that does the same, though: http://www.domaintools.com/ |
15:56
|
Spirit_ |
i think i will go with "file" |
15:56
|
Spirit_ |
i dont like whois archiving, especially not indexed by search engines |
15:56
|
Spirit_ |
actually, i wish whois would vanish |
15:57
|
Spirit_ |
for privacy |
15:57
|
underscor |
Spirit_: Why? |
15:57
|
underscor |
It's the same thing as a business license, or a car registration |
15:57
|
Spirit_ |
because i do not want john doe to google my name and find domain x and y |
15:57
|
underscor |
They're all public information |
15:57
|
Spirit_ |
in the US maybe |
15:58
|
underscor |
Then buy a domain privacy thing |
15:58
|
underscor |
Well, com and net are administered in the us, so... :P |
15:58
|
Spirit_ |
yeah, but fuck that! :P |
15:58
|
underscor |
Convince your local ccTLD to get rid of whois |
15:58
|
underscor |
and there you go |
15:58
|
db48x |
yea, that's good information to archive |
15:59
|
Spirit_ |
The compilation, |
15:59
|
Spirit_ |
repackaging, dissemination or other use of this Data is expressly |
15:59
|
Spirit_ |
prohibited without the prior written consent of VeriSign. |
15:59
|
Spirit_ |
says many (all?) com whois' |
15:59
|
db48x |
well, yea. they would say that |
15:59
|
underscor |
Registrant Organization:ARCHIVE TEAM IS GO |
15:59
|
underscor |
hahaha |
15:59
|
db48x |
they want to have control |
16:00
|
db48x |
heh |
16:00
|
underscor |
Server Name: FRIENDSTER.COM.ZZZZZ.GET.LAID.AT.WWW.SWINGINGCOMMUNITY.COM |
16:00
|
underscor |
IP Address: 69.41.185.226 |
16:00
|
underscor |
Referral URL: http://domainhelp.opensrs.net |
16:00
|
underscor |
Registrar: TUCOWS.COM CO. |
16:00
|
underscor |
Whois Server: whois.tucows.com |
16:00
|
underscor |
What?!?! |
16:00
|
underscor |
http://whoisarchive.heroku.com/friendster.com/20110628142730.txt |
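The archived Friendster record above is stored as `<domain>/<timestamp>.txt`. A snapshot archiver with that layout might look roughly like the following. It is a hypothetical sketch, not the code behind whoisarchive.heroku.com, and it assumes a local `whois` client and UTC timestamps; `whois_lookup` and `archive_whois` are made-up names. Identical consecutive answers are skipped so only changes accumulate:

```shell
#!/usr/bin/env bash
# Hypothetical whois snapshot archiver (not alard's implementation).
# Stores each lookup as ROOT/DOMAIN/YYYYMMDDHHMMSS.txt (UTC assumed)
# and skips the write when the answer matches the newest snapshot.

# Thin wrapper so the lookup can be swapped out, e.g. in tests.
whois_lookup() {
  whois "$1"
}

archive_whois() {
  local root="$1" domain="$2" dir latest tmp
  dir="$root/$domain"
  mkdir -p "$dir" || return 1
  tmp=$(mktemp) || return 1
  if ! whois_lookup "$domain" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  # Timestamped names sort chronologically, so the last one is newest.
  latest=$(ls -1 "$dir" | sort | tail -n 1)
  if [ -n "$latest" ] && cmp -s "$tmp" "$dir/$latest"; then
    rm -f "$tmp"                 # unchanged: keep only the old snapshot
    return 0
  fi
  mv "$tmp" "$dir/$(date -u +%Y%m%d%H%M%S).txt"
}

# Usage: archive_whois whois-archive friendster.com
```

Run periodically (for example from cron), this would build up one file per observed change for each domain.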
16:01
|
db48x |
heh |
16:01
|
Lembam |
brb |
16:02
|
Spirit_ |
that would be the database lookup i guess |
16:05
|
Spirit_ |
hm, do i want to delete html responses? |
16:06
|
Spirit_ |
bash: /bin/ls: Argument list too long :( |
16:06
|
db48x |
xargs |
16:06
|
db48x |
find whatever -print0 | xargs -0 rm |
16:07
|
Spirit_ |
sorry, completely unrelated to the deletion |
16:09
|
Spirit_ |
but find was a good suggestion, thanks |
16:10
|
Spirit_ |
or not |
16:10
|
Spirit_ |
uggestion, thanks |
16:10
|
Spirit_ |
bash: /usr/bin/find: Argument list too long |
16:10
|
Spirit_ |
find robotstxt2/files/*/*/20110628 | wc -l |
16:10
|
Spirit_ |
there was a trick with echo for this, hm |
16:20
|
db48x |
any time the arguments list is too long, use find |
16:20
|
db48x |
find whatever -print0 | xargs -0 wc -l |
16:21
|
db48x |
sorry, for that you'll want find whatever -print0 | xargs -0 ls -l | wc -l |
16:21
|
db48x |
annoying, but |
16:22
|
db48x |
anyway, I'm late |
16:22
|
db48x |
bbl |
16:25
|
Spirit_ |
thanks, that works |
16:25
|
sadcarrot |
underscor: hey man |
16:25
|
sadcarrot |
underscor: can you check the status of my yahoo vid upload? |
16:25
|
Spirit_ |
i guess -print0 does not buffer like without |
16:26
|
underscor |
sadcarrot: Were you uploading to me or to rsync.net? |
16:27
|
closure |
"(If you have reviews, I'd begin the process of archiving them via a Word document." http://wheredangerlives.blogspot.com/2011/06/professor-is-dead-long-live-netflix.html |
16:28
|
closure |
netflix reviews that is |
16:28
|
sadcarrot |
underscor: datadump.textfiles.com |
16:29
|
underscor |
You'll have to talk to SketchCow then |
16:29
|
sadcarrot |
oh ok |
16:32
|
Spirit_ |
55k files down |
16:33
|
Spirit_ |
about 7/10th through the 100k list |
17:50
|
Spirit_ |
db48x: |
17:50
|
Spirit_ |
$ time find files/*/*/20110628 -print0 | xargs -0 ls -l | wc -l |
17:50
|
Spirit_ |
bash: /usr/bin/find: Argument list too long |
17:50
|
Spirit_ |
:] |
17:50
|
Spirit_ |
i guess 64k is a limit |
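The error is raised when the kernel starts a program, not by `find` itself: the shell first expands `files/*/*/20110628` into every matching path, and that argument list exceeds the system limit before `find` runs. On Linux the limit applies to the combined size of the arguments and environment (see `getconf ARG_MAX`) rather than to a fixed number of files, and piping into `xargs` cannot help when the overflow happens on the left-hand command. Two sketches, assuming the `ROOT/<x>/<domain>/<date>` layout the paths suggest; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# 1) Let find do the matching itself, so the shell expands nothing.
#    Depth 3 below ROOT corresponds to the glob ROOT/*/*/<name>.
count_dated_find() {
  local root="$1" name="$2"
  find "$root" -mindepth 3 -maxdepth 3 -name "$name" | wc -l
}

# 2) Expand the glob inside bash only. Filling an array needs no exec,
#    so ARG_MAX does not apply (the same reason the builtin echo or
#    printf "trick" works). The ( ) body keeps nullglob from leaking.
count_dated_glob() (
  shopt -s nullglob                 # no match -> empty list, not the pattern
  matches=( "$1"/*/*/"$2" )
  echo "${#matches[@]}"
)
```

Either `count_dated_find robotstxt2/files 20110628` or `count_dated_glob robotstxt2/files 20110628` avoids the error. The glob version skips hidden directories, while `find` does not.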
18:27
|
balrog |
SketchCow: ping |
18:27
|
balrog |
as for the bitsavers stuff ... are you familiar with Manx? |
18:48
|
ndurner |
alard: is there a problem with your Google Groups script? |
18:48
|
Spirit_ |
now lets see if 7z likes to pack these files |
18:50
|
alard |
ndurner: No, it's just switched off. |
18:51
|
alard |
My connection is currently busy with downloading Friendster user connections and uploading the other Friendster data. |
18:51
|
alard |
I'll probably turn ggroups back on when those things are done. |
18:57
|
ndurner |
ah, ok |
18:58
|
ndurner |
can you upload your script somewhere so that someone else can jump in? |
18:58
|
Spirit_ |
seems to work |
18:58
|
ndurner |
(also, having the code for that kind of trickery might help future projects) |
19:01
|
alard |
ndurner: the ggroups script? |
19:40
|
ndurner |
alard: yes |
20:06
|
alard |
ndurner: Sorry for the delay, I had to find my notes on ipv6 tunnels first. |
20:06
|
alard |
https://gist.github.com/30cff29b602b818d018c#file_instructions.txt |
20:06
|
ndurner |
thanks! |
20:06
|
alard |
https://gist.github.com/30cff29b602b818d018c#file_ggroups_zipdl_ipv6.sh |
20:23
|
ndurner |
underscor: Google Groups update: |
20:23
|
ndurner |
directories: TOTAL: 243898, NEW: 105872, PROCESSING: 15, DONE_DIR: 138011 |
20:23
|
ndurner |
completion rate: directories: 337/hr, groups: 865/hr |
20:23
|
ndurner |
groups: TOTAL: 1245968, NEW: 767342, PROCESSING: 44, ERROR: 10944, ADULT: 4236, DONE_GRP: 463402 |
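The rates in this update allow a rough time-to-finish estimate: remaining NEW items divided by the hourly completion rate. A sketch that assumes the rates stay constant and ignores items still PROCESSING or later ending up in ERROR; `eta_hours` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Rough time-to-finish: remaining items / items completed per hour.
# Assumes a constant rate; treat the result as an estimate only.
eta_hours() {
  local remaining="$1" per_hour="$2"
  awk -v r="$remaining" -v p="$per_hour" 'BEGIN { printf "%.1f\n", r / p }'
}
```

With the reported figures, `eta_hours 105872 337` gives 314.2 hours (about 13 days) for the directories and `eta_hours 767342 865` gives 887.1 hours (about 37 days) for the groups.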
21:23
|
alard |
marceloan: Hi, have you been able to upload your twaud.io files yet? |
21:24
|
alard |
Or haven't you been able to contact SketchCow? |
21:28
|
marceloan |
Hi |
21:30
|
marceloan |
alard: No and no. |
21:30
|
alard |
Ah. |
21:30
|
marceloan |
alard: What compression should I use? |
21:30
|
alard |
No compression, I guess. |
21:31
|
marceloan |
alard: I have to send all the data unzipped? |
21:31
|
alard |
You can try bzip or gzip, but it probably won't help. mp3's are already pretty compressed. |
21:31
|
alard |
If it helps, you could rsync it to me and then I'll upload it along with my part. |
21:32
|
marceloan |
Yes, how can I do it? |
21:32
|
alard |
Is rsync okay? |
21:32
|
marceloan |
I have to use Linux? |
21:33
|
alard |
No, you can also use cwRsync, the Windows version. |
21:34
|
marceloan |
That? http://www.itefix.no/cwrsync/ |
21:34
|
alard |
Yes. And then you probably don't want the server, just the client. |
21:40
|
marceloan |
3.6MB, downloading... 10 minutes left... |
21:40
|
alard |
Ah, that takes a while. |
21:41
|
alard |
That gives me the time to figure out how I can set up an rsyncd server. |
21:53
|
marceloan |
Ok, I installed it. |
21:56
|
alard |
Great. Let's continue in a private message. |