#archiveteam 2011-06-28,Tue

Time Nickname Message
00:02 SketchCow Pete Chivani.
00:02 SketchCow Now I'm spelling it wrong.
00:02 SketchCow YOU DID THOS TO ME
00:02 SketchCow sdlkfslfkjdsf
00:04 SketchCow So, this hotel network? Suuuuuuucl
00:04 SketchCow Can't believe I paid for it.
00:04 SketchCow win 8
00:08 SketchCow OK, feeling better.
00:08 SketchCow His name is Pete Chvany.
00:08 SketchCow "Computer Networks: The Heralds of Resource Sharing". 1972.
00:08 SketchCow It's on archive.org.
00:28 bsmith093 thanks so much
02:02 underscor 33 on the ACT
02:02 underscor I'm pretty excited
02:02 underscor :B
02:02 * BlueMax stabs underscor
02:15 underscor http://en.wikipedia.org/wiki/Super_High_Me
03:49 chronomex underscor: is that out of 32?
03:49 chronomex :P
03:49 underscor chronomex: 36
03:49 underscor :P
03:49 chronomex I kid I kid
03:50 chronomex good job
03:50 underscor Thanks :D
03:50 chronomex I think I got something around 33 too
03:50 chronomex so ... you're in good company
03:50 underscor haha
03:50 underscor 99th percentile, fuck yeah
03:50 * chronomex currently packaging up symbian for the torrent
03:50 underscor 99th percentile on ACT, 94th percentile on SAT, and 3.05 GPA
03:51 underscor One of these is not like the other
03:51 underscor :V
03:51 chronomex 94th? :(
03:51 underscor Supposedly
03:51 underscor I couldn't find any concrete numbers anywhere
03:51 chronomex you disappoint
03:51 underscor I got a 2010
03:51 chronomex 2010 is last year
03:51 underscor Whatever percentile that is
03:51 underscor 1380 on the old scale
03:51 chronomex mmmm
03:53 underscor Either way, I'm pretty happy
03:53 underscor Except for my GPA
03:53 underscor lol
03:53 DFJustin heh I got like 1510 on the old scale
03:53 underscor But that's because I hate mundane work
03:53 underscor and I spend all my time on archiveteam and other fun things
03:53 underscor instead of doing homework
03:53 underscor :<
03:53 underscor DFJustin: That's almost perfect
03:54 underscor :P
03:54 chronomex archiveteam is a good thing to spend your time on
03:54 underscor Hopefully this internship thing in august works out well too
05:58 underscor ndurner: Able to get a stats update when you have a minute?
05:58 underscor :)
06:08 ndurner will do :-)
10:44 Spirit_ useless fact of the day, the number of domains starting with each character in alexa's top 1M list http://pastebin.com/gpPxZWZY
10:44 Spirit_ M as in million, not thousand
10:48 Spirit_ and on place 744459 there is "_live.it"...
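A sketch of how such a per-character tally could be produced. It assumes the Alexa list is a CSV with one `rank,domain` pair per line (the layout of Alexa's `top-1m.csv` as I recall it; treat that as an assumption), and the function name is illustrative, not taken from Spirit_'s pastebin.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def first_char_counts(rows: Iterable[list[str]]) -> Counter:
    """Count domains by their first character, given (rank, domain) rows."""
    counts: Counter = Counter()
    for row in rows:
        if len(row) < 2 or not row[1]:
            continue  # skip blank or malformed lines
        counts[row[1][0].lower()] += 1
    return counts


if __name__ == "__main__":
    # Tiny stand-in for top-1m.csv (assumed "rank,domain" layout).
    sample = "1,google.com\n2,facebook.com\n3,youtube.com\n744459,_live.it\n"
    counts = first_char_counts(csv.reader(io.StringIO(sample)))
    for char, n in sorted(counts.items()):
        print(char, n)
```

For the real file, open it with `newline=""` and pass `csv.reader(fh)` to the same function; the reader streams rows, so the whole million-line list never needs to sit in memory at once.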
10:49 Spirit_ i wonder if i should try to grab 100.000 robots.txt per day instead of 10.000
11:34 bbot_ Spirit_: that's a lot of suicide notes
12:58 Spirit_ bbot_: hm?
13:00 bbot_ as in, jason's "robots.txt is a suicide note" essay
13:03 Spirit_ ah yes
13:03 Spirit_ currently thinking how to make it nicely accessible
13:04 Spirit_ maybe after each scrape, check which files were changed or are new/gone and put that information in a database
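One way Spirit_'s idea could look, as a minimal sketch: hash every file in two daily scrape directories, classify paths as new, gone, or changed, and record the result in SQLite. The one-file-per-site directory layout, the `changes` table, and all function names are hypothetical.

```python
import hashlib
import os
import sqlite3
import tempfile


def digest_tree(root: str) -> dict[str, str]:
    """Map each file's path, relative to root, to the SHA-1 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                digests[os.path.relpath(path, root)] = hashlib.sha1(fh.read()).hexdigest()
    return digests


def compare_snapshots(old_root: str, new_root: str) -> dict[str, list[str]]:
    """Classify files as new, gone, or changed between two scrape directories."""
    old, new = digest_tree(old_root), digest_tree(new_root)
    return {
        "new": sorted(new.keys() - old.keys()),
        "gone": sorted(old.keys() - new.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }


def record_changes(db: sqlite3.Connection, day: str, changes: dict[str, list[str]]) -> None:
    """Store one row per (day, status, path) in a hypothetical 'changes' table."""
    db.execute("CREATE TABLE IF NOT EXISTS changes (day TEXT, status TEXT, path TEXT)")
    db.executemany(
        "INSERT INTO changes VALUES (?, ?, ?)",
        [(day, status, path) for status, paths in changes.items() for path in paths],
    )
    db.commit()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as old_dir, tempfile.TemporaryDirectory() as new_dir:
        # Yesterday: a.txt and b.txt; today: a.txt edited, b.txt gone, c.txt new.
        for d, files in ((old_dir, {"a.txt": "x", "b.txt": "y"}), (new_dir, {"a.txt": "x2", "c.txt": "z"})):
            for name, text in files.items():
                with open(os.path.join(d, name), "w") as fh:
                    fh.write(text)
        changes = compare_snapshots(old_dir, new_dir)
        print(changes)
        db = sqlite3.connect(":memory:")
        record_changes(db, "20110628", changes)
        print(db.execute("SELECT COUNT(*) FROM changes").fetchone()[0])
```

Hashing rather than comparing files pairwise means each day's digests could also be stored and reused, so only the new scrape has to be read.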
13:05 Spirit_ well, let's try if i can get 100000 down instead
13:27 Lembam Hi there. :-)
13:29 Spirit_ h
13:29 Spirit_ i
13:32 Lembam I'm just having a look, what you guys are doing is so great. But I assume you've been receiving lots of thanks lately. :-P
13:35 Spirit_ careful if you look to much, i am afraid some guys in here wear no pants
13:35 Spirit_ actually it is not that often that people come here i think but i am just a side peobn
13:35 Spirit_ peon
13:45 emijrp What is the problem with wearing skirt?
13:46 emijrp Female archivist here.
13:47 Spirit_ ha, i never knew
13:47 Spirit_ girls in skirts are cool
13:52 Lembam brb
14:04 Spirit_ 17962 files so far
14:04 Spirit_ i estimate 70k, since i always got ~7k from 10k
14:05 Spirit_ so something around 5-6 hours, that is great
14:11 Lembam back
14:29 sadcarrot any word on yahoo video?
14:30 ersi sadcarrot: What kind of words are you looking for? :)
14:30 sadcarrot lol
14:30 Spirit_ bash question: if i get an error, i would it to be in $result, result=$(diff -q $yesterday $today)
14:30 sadcarrot the good kinds!
14:30 Spirit_ any hint?
14:30 Spirit_ i mean this is my line "result=$(diff -q $yesterday $today)"
14:30 sadcarrot i can no longer rsync my yahoo video
14:30 Spirit_ but if a file is missing, i get an error and $result is empty
14:30 sadcarrot so, just wanted to verify that is complete
14:30 sadcarrot (password doesn't work)
14:30 Spirit_ wait a second
14:31 ersi sadcarrot: Oh, well - it's best if you'd check with SketchCow on that
14:31 Spirit_ yes
14:31 Spirit_ result=$(diff -q $yesterday $today 2>&1)
14:31 Spirit_ thanks :P
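Why the fix works: `$(...)` captures only standard output, while diff writes its "No such file or directory" complaint to standard error; `2>&1` points stderr at the same pipe, so both land in `$result`. The same idea in Python, as a sketch (the file names are hypothetical stand-ins for `$yesterday` and `$today`):

```python
import subprocess
import sys


def run_merged(cmd: list[str]) -> tuple[int, str]:
    """Run cmd and return (exit status, combined output).

    stderr=subprocess.STDOUT sends error messages into the same pipe as
    normal output, which is what 2>&1 does inside $(...).
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Hypothetical paths standing in for $yesterday and $today.
    yesterday, today = "robots-20110627.txt", "robots-20110628.txt"
    try:
        status, result = run_merged(["diff", "-q", yesterday, today])
        # diff exits 0 if the files match, 1 if they differ, 2 on trouble such as a missing file.
        print(status, result, end="")
    except FileNotFoundError:
        print("diff is not installed")
```

Keeping the exit status alongside the text is useful here: a status of 2 distinguishes "a file was missing" from "the files differ" without parsing the message.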
14:32 sadcarrot gotcha
14:39 Spirit_ does anyone have a tested and proven method how to identify true HTML files from bash? many sites serve random crap pages when i ask them for a robots.txt
14:40 Spirit_ i am afraid that "file" might misclassify some
14:46 alard grep "<" ?
14:47 Spirit_ that character is in txt files
14:47 Spirit_ file seems to do a good job actually
14:48 Spirit_ http://pastebin.com/raw.php?i=2ymRsydX
14:49 Spirit_ seems like people ilke to roundrobin and serve different files too, meh
14:59 soultcer Spirit_: Did you check the mime type?
15:36 db48x look for a doctype
15:37 db48x <!DOCTYPE ...!>
15:37 db48x lots of people leave them off though
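A sketch of the doctype-or-tags heuristic discussed here, for weeding HTML error pages out of fetched robots.txt files. It inspects only the first 4 KB of the body and requires a doctype or a common opening tag, so a stray `<` in a plain-text file is not enough, and pages without a doctype are still caught. The tag list and window size are assumptions to tune; the MIME type soultcer mentions could be checked as an extra signal, though it depends on server configuration.

```python
import re

# Heuristic markers: a doctype or an opening tag that plain-text robots.txt files
# essentially never contain. The list is an assumption; extend it as needed.
_HTML_MARKERS = re.compile(
    rb"<\s*(?:!doctype\s+html|html|head|body|title|script|meta)\b",
    re.IGNORECASE,
)


def looks_like_html(body: bytes, window: int = 4096) -> bool:
    """Guess whether a fetched robots.txt is really an HTML page.

    Only the first `window` bytes are inspected, since error pages announce
    themselves near the top; a bare '<' on its own does not count.
    """
    return _HTML_MARKERS.search(body[:window]) is not None


if __name__ == "__main__":
    samples = {
        "robots": b"User-agent: *\nDisallow: /cgi-bin/\n# note: a < b\n",
        "error page": b"<!DOCTYPE html>\n<html><head><title>404</title></head>",
        "no doctype": b"\n<HTML><BODY>Not Found</BODY></HTML>",
    }
    for name, body in samples.items():
        print(name, looks_like_html(body))
```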
15:43 alard In other news: I did a little experimenting based on Coderjoe's idea for a whois archiver. http://whoisarchive.heroku.com/
15:45 Lembam The whois/domain lookup archiver looks cool. :-)
15:49 alard There is a paid service that does the same, though: http://www.domaintools.com/
15:56 Spirit_ i think i will go with "file"
15:56 Spirit_ i dont like whois archiving, especially not indexed by search engines
15:56 Spirit_ actually, i wish whois would vanish
15:57 Spirit_ for privacy
15:57 underscor Spirit_: Why?
15:57 underscor IT's the same thing as a business license, or a car registration
15:57 Spirit_ because i do not want john doe to google my name and find domain x and y
15:57 underscor They're all public information
15:57 Spirit_ in the US maybe
15:58 underscor Then buy a domain privacy thing
15:58 underscor Well, com and net are administered in the us, so... :P
15:58 Spirit_ yeah, but fuck that! :P
15:58 underscor Convince your local ccTLD to get rid of whois
15:58 underscor and there you go
15:58 db48x yea, that's good information to archive
15:59 Spirit_ The compilation,
15:59 Spirit_ prohibited without the prior written consent of VeriSign.
15:59 Spirit_ repackaging, dissemination or other use of this Data is expressly
15:59 Spirit_ says many (all?) com whois'
15:59 db48x well, yea. they would say that
15:59 underscor Registrant Organization:ARCHIVE TEAM IS GO
15:59 underscor hahaha
15:59 db48x they want to have control
16:00 db48x heh
16:00 underscor Server Name: FRIENDSTER.COM.ZZZZZ.GET.LAID.AT.WWW.SWINGINGCOMMUNITY.COM
16:00 underscor IP Address: 69.41.185.226
16:00 underscor Referral URL: http://domainhelp.opensrs.net
16:00 underscor Registrar: TUCOWS.COM CO.
16:00 underscor Whois Server: whois.tucows.com
16:00 underscor What?!?!
16:00 underscor http://whoisarchive.heroku.com/friendster.com/20110628142730.txt
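The archive URL above suggests one directory per domain and one file per lookup, named by a 14-digit `YYYYMMDDHHMMSS` timestamp. A minimal sketch of storing whois text that way, writing a new file only when the text has changed. This is a guess at the scheme, not alard's code; the function names are hypothetical and the sketch stamps files in UTC, since the archive's timezone is unknown. Fixed-width timestamps sort lexicographically in time order, which is what `latest_snapshot` relies on.

```python
import os
import tempfile
from datetime import datetime, timezone
from typing import Optional


def latest_snapshot(root: str, domain: str) -> Optional[str]:
    """Return the newest stored snapshot text for domain, or None."""
    domain_dir = os.path.join(root, domain)
    if not os.path.isdir(domain_dir):
        return None
    names = sorted(n for n in os.listdir(domain_dir) if n.endswith(".txt"))
    if not names:
        return None
    with open(os.path.join(domain_dir, names[-1]), encoding="utf-8") as fh:
        return fh.read()


def store_snapshot(root: str, domain: str, whois_text: str, when: Optional[datetime] = None) -> Optional[str]:
    """Save whois_text as <root>/<domain>/<YYYYMMDDHHMMSS>.txt if it changed.

    Returns the new file's path, or None when the text matches the latest snapshot.
    """
    if latest_snapshot(root, domain) == whois_text:
        return None
    when = when or datetime.now(timezone.utc)
    domain_dir = os.path.join(root, domain)
    os.makedirs(domain_dir, exist_ok=True)
    path = os.path.join(domain_dir, when.strftime("%Y%m%d%H%M%S") + ".txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(whois_text)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        t = datetime(2011, 6, 28, 14, 27, 30)
        print(store_snapshot(root, "friendster.com", "Registrar: TUCOWS.COM CO.\n", t))
        print(store_snapshot(root, "friendster.com", "Registrar: TUCOWS.COM CO.\n"))  # unchanged -> None
```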
16:01 db48x heh
16:01 Lembam brb
16:02 Spirit_ that would be the database lookup i guess
16:05 Spirit_ hm, do i want to delete html responses?
16:06 Spirit_ bash: /bin/ls: Argument list too long :(
16:06 db48x xargs
16:06 db48x find whatever -print0 | xargs -0 rm
16:07 Spirit_ sorry, completely unrelated to the deletion
16:09 Spirit_ but find was a good suggestion, thanks
16:10 Spirit_ or not
16:10 Spirit_ uggestion, thanks
16:10 Spirit_ bash: /usr/bin/find: Argument list too long
16:10 Spirit_ find robotstxt2/files/*/*/20110628 | wc -l
16:10 Spirit_ there was a trick with echo for this, hm
16:20 db48x any time the arguments list is too long, use find
16:20 db48x find whatever -print0 | xargs -0 wc -l
16:21 db48x sorry, for that you'll want find whatever -print0 | xargs -0 ls -l | wc -l
16:21 db48x annoying, but
16:22 db48x anyway, I'm late
16:22 db48x bbl
16:25 Spirit_ thanks, that works
16:25 sadcarrot underscor: hey man
16:25 sadcarrot underscor: can you check the status of my yahoo vid upload?
16:25 Spirit_ i guess -print0 does not buffer like without
16:26 underscor sadcarrot: Were you uploading to me or to rsync.net?
16:27 closure "(If you have reviews, I’d begin the process of archiving them via a Word document." http://wheredangerlives.blogspot.com/2011/06/professor-is-dead-long-live-netflix.html
16:28 closure netflix reviews that is
16:28 sadcarrot underscor: datadump.textfiles.com
16:29 underscor You'll have to talk to SketchCow then
16:29 sadcarrot oh ok
16:32 Spirit_ 55k files down
16:33 Spirit_ about 7/10th through the 100k list
17:50 Spirit_ db48x:
17:50 Spirit_ $ time find files/*/*/20110628 -print0 | xargs -0 ls -l | wc -l
17:50 Spirit_ bash: /usr/bin/find: Argument list too long
17:50 Spirit_ :]
17:50 Spirit_ i guess 64k is a limit
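The error comes from the shell, not from `ls` or `find`: bash expands `files/*/*/20110628` into one argument per match before the program starts, and the kernel caps the combined size of the argument list and environment (see `getconf ARG_MAX`). The limit is therefore in bytes, so the failure point depends on total path length rather than on a fixed count like 64k, and `find ... -print0 | xargs -0` only helps when the starting point handed to `find` is not itself a huge glob. A Python sketch that walks the same pattern without building an argument list; the tree layout is assumed from the paths in the log and the helper names are hypothetical:

```python
import glob
import os
import tempfile


def count_matches(pattern: str) -> int:
    """Count paths matching a shell-style pattern without building an argv.

    glob.iglob yields matches one at a time inside this process, so no
    exec() happens and the kernel's argument-size limit never applies.
    """
    return sum(1 for _ in glob.iglob(pattern))


def remove_matches(pattern: str) -> int:
    """Delete every file matching pattern (like find ... -print0 | xargs -0 rm)."""
    removed = 0
    for path in glob.iglob(pattern):
        if os.path.isfile(path):
            os.remove(path)
            removed += 1
    return removed


if __name__ == "__main__":
    # Build a small tree shaped like files/<a>/<b>/20110628 (layout assumed from the log).
    with tempfile.TemporaryDirectory() as root:
        for i in range(3):
            d = os.path.join(root, "files", f"x{i}", "y")
            os.makedirs(d)
            with open(os.path.join(d, "20110628"), "w") as fh:
                fh.write("User-agent: *\n")
        # glob.escape keeps any special characters in the temp path literal.
        pattern = os.path.join(glob.escape(root), "files", "*", "*", "20110628")
        print(count_matches(pattern))
        print(remove_matches(pattern), count_matches(pattern))
```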
18:27 balrog SketchCow: ping
18:27 balrog as for the bitsavers stuff ... are you familiar with Manx?
18:48 ndurner alard: is there a problem with your Google Groups script?
18:48 Spirit_ now lets see if 7z likes to pack these files
18:50 alard ndurner: No, it's just switched off.
18:51 alard My connection is currently busy with downloading Friendster user connections and uploading the other Friendster data.
18:51 alard I'll probably turn ggroups back on when those things are done.
18:57 ndurner ah, ok
18:58 ndurner can you upload your script somewhere so that someone else can jump in?
18:58 Spirit_ seems to work
18:58 ndurner (also, having the code for that kind of trickery might help future projects)
19:01 alard ndurner: the ggroups script?
19:40 ndurner alard: yes
20:06 alard ndurner: Sorry for the delay, I had to find my notes on ipv6 tunnels first.
20:06 alard https://gist.github.com/30cff29b602b818d018c#file_instructions.txt
20:06 ndurner thanks!
20:06 alard https://gist.github.com/30cff29b602b818d018c#file_ggroups_zipdl_ipv6.sh
20:23 ndurner underscor: Google Groups update:
20:23 ndurner directories: TOTAL: 243898, NEW: 105872, PROCESSING: 15, DONE_DIR: 138011
20:23 ndurner completion rate: directories: 337/hr, groups: 865/hr
20:23 ndurner groups: TOTAL: 1245968, NEW: 767342, PROCESSING: 44, ERROR: 10944, ADULT: 4236, DONE_GRP: 463402
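The counters in this update are internally consistent: for directories, NEW + PROCESSING + DONE_DIR equals TOTAL, and for groups, NEW + PROCESSING + ERROR + ADULT + DONE_GRP equals TOTAL. Dividing the NEW backlog by the hourly rates gives a naive time-to-finish of roughly 314 hours (about 13 days) for directories and 887 hours (about 37 days) for groups. That assumes constant rates and fixed totals, which is only a rough approximation, since processing directories may well keep turning up new groups.

```python
# Counters copied from ndurner's 20:23 update.
directories = {"NEW": 105872, "PROCESSING": 15, "DONE_DIR": 138011}
groups = {"NEW": 767342, "PROCESSING": 44, "ERROR": 10944, "ADULT": 4236, "DONE_GRP": 463402}
DIR_TOTAL, GROUP_TOTAL = 243898, 1245968
DIR_RATE, GROUP_RATE = 337, 865  # items completed per hour


def hours_left(backlog: int, rate_per_hour: int) -> float:
    """Naive time-to-finish: remaining items divided by a constant hourly rate."""
    return backlog / rate_per_hour


if __name__ == "__main__":
    # If every item is in exactly one state, the states should sum to TOTAL.
    assert sum(directories.values()) == DIR_TOTAL
    assert sum(groups.values()) == GROUP_TOTAL
    for name, counts, rate in (("directories", directories, DIR_RATE), ("groups", groups, GROUP_RATE)):
        h = hours_left(counts["NEW"], rate)
        print(f"{name}: {h:.0f} h (~{h / 24:.1f} days) at {rate}/hr")
```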
21:23 alard marceloan: Hi, have you been able to upload your twaud.io files yet?
21:24 alard Or haven't you been able to contact SketchCow?
21:28 marceloan Hi
21:30 marceloan alard: No and no.
21:30 alard Ah.
21:30 marceloan alard: What compression should I use?
21:30 alard No compression, I guess.
21:31 marceloan alard: I have to send all the data unzipped?
21:31 alard You can try bzip or gzip, but it probably won't help. mp3's are already pretty compressed.
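alard's point can be checked directly: general-purpose compressors find almost nothing to remove in data that is already compressed. This sketch uses seeded pseudo-random bytes as a stand-in for MP3 audio (an approximation; real files also carry small tags and headers that may shave off a little), with repetitive text for contrast. It needs Python 3.9+ for `random.Random.randbytes`.

```python
import bz2
import gzip
import random


def ratio(compress, data: bytes) -> float:
    """Compressed size divided by original size (1.0 means no savings)."""
    return len(compress(data)) / len(data)


if __name__ == "__main__":
    # Stand-in for an MP3: pseudo-random bytes have no redundancy left to remove.
    noisy = random.Random(0).randbytes(1_000_000)
    # Contrast: highly repetitive text compresses very well.
    text = b"User-agent: *\nDisallow: /\n" * 40_000
    for name, fn in (("gzip", gzip.compress), ("bzip2", bz2.compress)):
        print(f"{name}: noisy {ratio(fn, noisy):.3f}, text {ratio(fn, text):.4f}")
```

On high-entropy input both ratios land at or slightly above 1.0, so the CPU time spent compressing buys nothing; sending the files as-is is the practical choice.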
21:31 alard If it helps, you could rsync it to me and then I'll upload it along with my part.
21:32 marceloan Yes, how can I do it?
21:32 alard Is rsync okay?
21:32 marceloan I have to use Linux?
21:33 alard No, you can also use cwRsync, the Windows version.
21:34 marceloan That? http://www.itefix.no/cwrsync/
21:34 alard Yes. And then you probably don't want the server, just the client.
21:40 marceloan 3.6MB, downloading... 10 minutes left...
21:40 alard Ah, that takes a while.
21:41 alard That gives me the time to figure out how I can set up an rsyncd server.
21:53 marceloan Ok, I installed it.
21:56 alard Great. Let's continue in a private message.
