[04:15] are you guys starting to backup youtube?
[04:15] No.
[04:15] i was thinking backing up only videos over 10 million views
[04:15] well, only certain tiny bits of it.
[04:15] as needed.
[04:16] that'd be an interesting metric
[04:16] well its to start with something then go from there
[04:16] that why we have something already archived
[04:16] *way
[04:20] youtube receives more than 24 hours of new video every minute.
[04:20] i know
[04:20] i was only talking older ones
[04:20] right
[04:21] with very high views
[04:22] the only way i think youtube could be archived would be to start with the very high view clips then go from there
[04:24] after that you start getting the full youtube accounts of the users that had a video with high views
[04:37] I was at one point thinking of downloading crap from youtube, starting with something to download videos that show up on the "new videos" rss feed and then working on the enormous backlog
[04:37] but then I realized how much disk space that would take, let alone the bandwidth required
[04:38] thats why i was thinking the more important videos first
[04:38] cause i don't think you're going to be able to archive everything
[04:38] of youtube
[04:39] even in 10 years with 80tb hard drives
[04:39] cause for a good at least 5 years everything will be hd 1080p if not 4096p
[04:41] i don't think youtube is in TB space :D
[04:41] it is at least in PB
[04:42] i was just thinking the most popular videos for now
[04:42] thats the best way to attack youtube as the archiveteam
[04:43] anything over 100 million views
[04:43] Can we "forget" any Justin Bieber videos?
[04:43] yes
[04:43] that includes bieber and ms. black
[04:45] also once these videos are backed up we shouldn't have to back them up again
[04:45] until they add a higher quality option
[04:45] Yeah, it can be a slow, ongoing project
[04:45] but its a start
[04:46] one thing I liked about google video (for awhile at least) was that I could download an avi or mpg file (which in many cases was probably the original upload, but I am not certain)
[04:47] yes, I had to add a greasemonkey script to get the button, but the download was direct, not through some other server
[08:22] huh, archive.org is unreachable
[08:22] pinged it from external machine as well, still no dice
[08:24] yea, it seems to be down :(
[08:25] its down, omg what happened?!
[08:25] 10 64.125.188.242 22.304 ms 18.375 ms 22.112 ms
[08:25]
[08:25] 11 207.241.224.2 662.163 ms 661.735 ms 661.682 ms
[08:25] still responds to ICMP
[08:25] also can i get a rough idea of local time for whoever's still awake on this channel
[08:26] midnight
[08:26] 2:25AM
[08:26] 325am
[08:26] 10 64.125.188.242.t01072-01.above.net (64.125.188.242) 21.093 ms 22.594 ms 19.376 ms
[08:26] 11 * * *
[08:26] well, sometimes still responds to ICMP
[08:26] ca, central, and eastern, not bad
[08:27] solid 80ms here
[08:28] 31 packets transmitted, 31 received, 0% packet loss, time 30044ms
[08:28] rtt min/avg/max/mdev = 79.446/83.262/99.299/4.459 ms
[08:29] 64 bytes from 207.241.224.2: icmp_seq=11 ttl=50 time=797 ms
[08:29] PING 207.241.224.2 (207.241.224.2) 56(84) bytes of data.
[08:30] hrm, archive.org: Name or service not known
[08:30] --- 207.241.224.2 ping statistics ---
[08:30] 55 packets transmitted, 8 received, 85% packet loss, time 54255ms
[08:30] rtt min/avg/max/mdev = 783.744/911.610/1024.098/80.249 ms, pipe 2
[08:31] 0332
[08:32] time, 30 seconds? uh... I think something bad may be happening and filling buffers in routers
[08:33] oh, you ran for 50 and 54 seconds
[08:33] there needs to be a red phone
[08:33] someone to call and wake up
[08:34] if they're set up right, some monitoring system should be sending out pages to netops people (provided they don't have a netops person on duty around the clock)
[08:35] i bet the guy on duty was sitting in the middle of the data center, eating spaghetti
[08:35] and knocked over a bottle of wine
[08:35] and shorted out everything
[08:36] i have a friend that works at a communications company. he told me about this one time a contractor was picking something up and hit the big red estop button with his ass
[08:36] shutting down the entire datacenter
[08:36] hah
[08:36] hahahah
[08:38] hmm
[08:38] isc's nameservers (which are the only non-archive.org nameservers listed as authoritative) are not answering questions for archive.org
[08:39] that's a failure of asstronomical proportions
[08:39] dammit, root servers, give me glue for ns[123].archive.org
[08:39] yipdw^ lol
[08:39] fucking glueless operation
[08:40] yay. I got some glue that time
[08:41] ns1 is responding to pings, ns2 is not
[08:41] a little while ago web.archive.org was responding but gave a message "The Wayback server is down."
[08:42] and ns3 is also responding to pings
[08:42] so something is going on over there
[08:43] (ns1 and ns2 are on the same /27)
[08:44] (ns3 is on a different /1)
[08:44] er, /2
[08:45] (as far as where the first difference in IP occurs)
[08:46] bsmith093: I'm pacific time, it's nearly 1am here
[08:47] wow nice spread of interest, geographically speaking
[08:47] hardly :P
[08:47] ya pacific here too
[08:47] Coderjoe: what does the /# mean for an ip addr
[08:47] eastern
[08:48] 3hrs ahead greetings from the future
[08:50] bsmith093: that's the netmask
[08:51] ok im actually studying for my ccna class, does that have anything to do with a subnet?
[08:54] bsmith093: yes; /x indicates that the first x bits of the netmask are on
[08:55] the /n is the number of 1 bits in a netmask, and the netmask is how you define the subnet
[08:55] so *thats* what he meant by stealing bits from the host portion
[08:55] so /8 is the same as 255.0.0.0, /16 is 255.255.0.0, /24 is 255.255.255.0
[08:55] yes, or stealing them from the network portion
[08:55] to have more hosts per subnet, but fewer subnets
[08:55] dear c'thulu, but I hate subnetting
[08:56] well
[08:56] it gets worse
[08:56] "stealing bits from the host portion" is an anachronism
[08:56] why not just have everything on one network?
[08:56] an IPv4 anachronism that is
[08:56] a subnet could be 255.63.0.0
[08:56] routing
[08:56] bsmith093: routing
[08:56] i figured
[08:56] in IPv6, switching out the prefix is a lot easier
[08:56] in theory
[08:56] 11111111.00001111.00000000.00000000
[08:57] that's 12 bits, but not /12
[08:57] who does that?
[08:57] so its just easier on the routers?, or easier on the poor bastards who have to move all that cat5e?
[08:57] I've not seen an IP routing scheme in wide use
[08:57] like that
[08:57] you would have to be slightly mad
[08:57] bsmith093: it simplifies routing by a lot
[08:57] has nothing to do with the catN
[08:57] bsmith093: there is a flat address space in use; it's called MAC addresses
[08:58] try devising a routing system for that
[08:58] the subnets make the routing tables much smaller, which means that the routers can be more capable with the same hardware
[08:58] so its purely a logical network thing, rather than for physical reasons?
[08:58] you don't have to list every possible ip address in your routing table
[08:58] yes
[08:58] all of IP is logical rather than physical
[08:59] all of the physical stuff is handled by the lower levels of the stack
[08:59] in the case of ethernet, your TCP/IP packet gets wrapped in an Ethernet frame which is sent across the network
[08:59] bsmith093: I guess the post is a good analogy
[08:59] the Ethernet frame handles all of the physical transport information
[08:59] a post office isn't going to try to route your message based on the full address; they're going to go by the postal code first
[09:00] then a local branch routes it to the right city
[09:00] a smaller branch handles individual addresses
[09:00] the point is to make routing feasible via hierarchical organization
[09:00] i knew that, i just meant subnetting is really confusing, and also it was mentioned at the very beginning that we'd only focus on ipv4 since thats what we'd realistically see in a corporate environment, in all but the hugest networks
[09:00] ...
[09:01] Cisco still believes that, eh :P
[09:01] apparently
[09:01] as much as i am not looking forward to ipv6, it WILL be coming to homes and offices large and small
[09:01] well
[09:01] I like ipv6
[09:01] so many addresses :)
[09:02] I think that thinking of subnetting in terms of hierarchy will make it easier
[09:02] because that's really all it is
[09:02] which reminds me, where do they get off charging $6,500 for this course im taking, if i wasnt taking it at college?, which i am, so its much cheaper, but still WTF?!
[09:02] I mean, here's another way to think about it
[09:02] let's say that you are a router
[09:03] bsmith093: they can charge what the market will bear
[09:03] bsmith093: you could learn the same things from a book or two
[09:03] you have two interfaces, one of which is labeled as 192.168.0.0/16 and the other is 192.168.1.0/24
[09:03] that's where I learned it all, ages ago
[09:03] when you receive an IP packet with an address you can easily figure out which interface to route the packet to by ANDing the address with the subnet masks
[09:04] actually if nanotech ever gets off the ground, and every bot has an ip addr, they will only be able to eat 35% of the planet under ipv6
[09:04] aarggh so it was ANDing, i knew that, god i hate cisco exams
[09:04] er
[09:04] wait
[09:04] I may have gotten that wrong
[09:04] what if they use continent-level NAT
[09:04] i am a bit sleepy
[09:06] yes, you AND the address with the netmask to determine the network
[09:06] yeah, it's bitwise AND
[09:06] phew
[09:07] and you AND the inverse of the netmask against the address if you want just the host portion (but you would never really need to do this)
[09:09] well that is interesting...
[09:10] from one host, tracerouting 208.70.31.251 is going through tinet.net, while 208.70.31.236 is going through he.net
[09:10] I'm a bit surprised to hear that the CCNA curriculum doesn't cover IPv6, considering (1) IPv6 in China is in wide use and (2) Cisco had a hand in Golden Shield
[09:10] but now I'm just being snarky
[09:12] hm
[09:13] and now I can not ping either (where one of them was responding before. those are ns1 and ns2)
[09:20] dammit, google, you really are not helpful. when I say "archive.org" (with the quotes) I fucking MEAN the full string. stop dropping the .org and searching for just "archive". also, why do you show results for "down" (with it bolded) when it is not in my search terms and you did not say "did you mean?"
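Note on the netmask arithmetic from the subnetting discussion above: the sketch below shows /n as a count of leading 1 bits, bitwise AND with the mask to get the network, and AND with the inverted mask to get the host portion. It uses Python's standard ipaddress module. The function names and the sample address 192.168.1.77 are illustrative assumptions, not from the log. The log leaves open how the overlap between 192.168.0.0/16 and 192.168.1.0/24 is resolved; the sketch assumes the usual longest-prefix-match rule.

```python
import ipaddress
from typing import List, Optional, Tuple


def prefix_to_netmask(prefix_len: int) -> str:
    """Return the dotted-quad netmask for an IPv4 prefix length (/n)."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("IPv4 prefix length must be between 0 and 32")
    # /n means the first n bits of the 32-bit mask are 1.
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(mask))


def split_address(address: str, prefix_len: int) -> Tuple[str, str]:
    """Return (network part, host part) of an address for a given /n."""
    addr = int(ipaddress.IPv4Address(address))
    mask = int(ipaddress.IPv4Address(prefix_to_netmask(prefix_len)))
    network = addr & mask                # AND with the netmask
    host = addr & ~mask & 0xFFFFFFFF     # AND with the inverted netmask
    return str(ipaddress.IPv4Address(network)), str(ipaddress.IPv4Address(host))


def pick_route(address: str, routes: List[str]) -> Optional[str]:
    """Pick the matching route with the longest prefix (assumed rule)."""
    addr = ipaddress.IPv4Address(address)
    matching = [n for n in map(ipaddress.IPv4Network, routes) if addr in n]
    if not matching:
        return None
    return str(max(matching, key=lambda n: n.prefixlen))


if __name__ == "__main__":
    print(prefix_to_netmask(24))
    print(split_address("192.168.1.77", 24))
    print(pick_route("192.168.1.77", ["192.168.0.0/16", "192.168.1.0/24"]))
```

With the two interfaces from the log, an address inside 192.168.1.0/24 matches both networks, and the more specific /24 is chosen.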
[09:21] um, click "more search tools" and then hit "verbatim"
[09:21] http://duckduckgo.com/?q=%22archive.org%22
[09:21] that seems to work
[09:21] that turns on a special "fuck you google" mode
[09:23] i can't seem to do both verbatim and past 24 hours
[09:24] google has a VERBATIM MODE where has that been for, forever ?!, that would have been so useful, for all those very specific searches i ran where it would pick words from the same root but different tenses, and so on
[09:24] verbatim mode is new this month
[09:24] it used to be that quotes meant verbatim
[09:24] and + meant "require this"
[09:25] it used to be that +"verbatim" meant verbatim
[09:25] quotes have never meant "exactly this"
[09:25] fun fact, google used to consider _ a letter
[09:25] google likes to make their search less useful for non-shopping purposes
[09:29] funny i thought quotes did mean "exactly this" thats what they're *for*
[09:29] actually, it still is a letter sometimes
[09:30] who honestly ultimately uses google to shop?, i use amazon, through google, first usually, but the point stands
[09:30] wow, there's a http://en.wikipedia.org/wiki/.gg
[09:31] why?
[09:31] because it's a country code TLD
[09:31] I just like that there's actually a gg TLD
[09:31] trying to think of some sufficiently memey domain name to go with that
[09:32] also i didnt think . was a valid character in a url, that it would be escaped
[09:32] goo.gg
[09:32] the fact that . is unescaped is how you can write things like en.wikipedia.org
[09:32] oh, huh
[09:32] there is no requirement that the host portion have any sort of hierarchy
[09:33] really, i thought most of the web was all about hierarchy
[09:33] URLs and URIs (and URNs) aren't really a Web thing
[09:34] they're intended to address resources
[09:34] but that doesn't mean that they have to be on the Web
[09:34] well its almost 5am est, to bed i go !
[09:34] //localhost:6379 is a valid URI for example
[09:34] yipdw^ remembering im barely older than the web, where else would they be
[09:34] oh, yeah, um , that
[09:35] bsmith093: ftp://ftp.cdrom.com for an example
[09:35] k i feel stupid having just used that today :\
[09:35] actually
[09:36] technically //localhost:6379 is not a valid *full* URI
[09:36] it is a valid URI reference
[09:36] it's not a full URI because it has no scheme name
[09:36] wouldnt localhost count
[09:36] no
[09:36] scheme name, aka protocol
[09:36] refs the loopback hostname
[09:36] http:, ftp:, irc:, etc
[09:36] the scheme is also sometimes called the protocol
[09:37] how old are these rules?
[09:37] also, I was wrong again
[09:37] the part following the scheme should be hierarchical in nature
[09:37] (but it doesn't have to be)
[09:37] uh... you have somefilename.html... why would you think the . needed to be escaped?
[09:38] bsmith093: the URI format has been around since 1994
[09:38] bsmith093: first came about http://tools.ietf.org/html/rfc1738
[09:38] http://tools.ietf.org/html/rfc3986 is an update
[09:41] bsmith093: also, for an example of a URI that looks nothing like what we've been talking about so far, check out some XML namespaces
[09:41] wow, I didn't realize it was that new
[09:42] bsmith093: for example, XMPP declares a namespace called "jabber:iq:register", which is a valid URI
[09:42] bsmith093: http://xmpp.org/registrar/namespaces.html for more info
[09:46] huh, Ruby's URI class doesn't handle jabber:iq:register the way it should:
[09:46] ruby-1.9.2-p290 :013 > URI.parse("jabber:iq:register").path => nil
[09:46] interesting
[09:46] fail!
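Note on the URI discussion above: the distinction between a full URI (which has a scheme) and a scheme-less URI reference such as //localhost:6379 can be checked with Python's standard urllib.parse.urlsplit. This is a rough sketch, not a full RFC 3986 validator. The function name describe_reference and the dictionary keys are illustrative assumptions. Unlike the Ruby result quoted in the log, urlsplit puts the non-hierarchical remainder of jabber:iq:register in the path field.

```python
from urllib.parse import urlsplit


def describe_reference(ref: str) -> dict:
    """Split a URI reference into scheme, authority and path.

    A reference with an empty scheme is not a full URI; a reference
    such as //host:port carries only an authority component.
    """
    parts = urlsplit(ref)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "is_full_uri": bool(parts.scheme),
    }


if __name__ == "__main__":
    for ref in ("//localhost:6379", "ftp://ftp.cdrom.com", "jabber:iq:register"):
        print(ref, describe_reference(ref))
```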
[09:47] i wonder if the xml ns parsing uses that or if it just stores it as a string
[09:48] dunno
[09:59] * db48x sighs
[09:59] hard drive prices :P
[10:54] also, webpages that move the focus away from the password field after you've started typing in it
[11:19] #klol
[12:47] webcite doesn't archive images in knol articles (if uploaded to knol site)
[12:48] images hosted in external sites and hotlinked in knol articles are archived correctly
[12:49] and IA doesn't archive knols because of robots.txt
[12:50] http://knol.google.com/robots.txt
[15:28] doing a knol scraper
[15:47] Do it!
[16:03] i hope google doesnt ban me
[16:03] using time.sleep
[16:03] this scraper downloads knol metadata, not content
[16:04] list of knols, user, description, pageviews, date
[16:04] using a list of words and the knol search engine
[16:05] If you can determine a good scraper approach, we can spread it amongst many people.
[16:05] scrapes this http://knol.google.com/k/knol/Search?q=incategory%3Ascience
[16:07] emijrp: knol seems to be available over ipv6.
[16:08] what does that mean for us?
[16:09] We get around rate limiting a lot easier
[16:10] SketchCow: German TV makes selected video content available online but "depublishes" the videos after a certain time period (depending on "importance", mostly between 7 days and maybe 6 months). Are you generally interested?
[16:12] Vaguely.
[16:13] 1000+ knols metadata using only 4 words
[16:13] archive.org has been archiving television, if a pump could be made, that would be good.
[16:13] i heard there are 300k knols or more
[16:13] not sure
[16:15] are there existing interfaces that could be used for the pump?
[16:16] emijrp: A free ipv6 tunnel + a /48 subnet from http://tunnelbroker.net/ gets you a large number of ip addresses, so instead of time.sleep you'd switch to the next ipv6 ip.
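Note on the /48 suggestion above: a /48 leaves 128 - 48 = 80 host bits, so it contains 2^80 addresses. The sketch below only does the address arithmetic with Python's standard ipaddress module. The prefix 2001:db8:1234::/48 is a documentation-range placeholder, not a real tunnel allocation, and the function names are illustrative assumptions. Actually sending requests from these addresses would need a configured tunnel, which is outside this sketch, and the log does not establish how Knol rate limiting works.

```python
import ipaddress
from typing import Iterator


def address_count(prefix: str) -> int:
    """Number of addresses inside an IPv6 prefix."""
    return ipaddress.IPv6Network(prefix).num_addresses


def first_addresses(prefix: str, count: int) -> Iterator[ipaddress.IPv6Address]:
    """Yield the first `count` addresses after the network address itself."""
    network = ipaddress.IPv6Network(prefix)
    for offset in range(1, count + 1):
        # Indexing a network returns the address at that offset.
        yield network[offset]


if __name__ == "__main__":
    placeholder = "2001:db8:1234::/48"  # hypothetical, documentation range
    print(address_count(placeholder) == 2 ** 80)
    for addr in first_addresses(placeholder, 3):
        print(addr)
```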
[18:23] can someone verify my rsync
[18:45] I'm watching the ROFLCON video with Brewster and Jason, and I didn't realize that Yahoo provided as much help as they did
[18:51] what roflcon video, with brewster?
[19:08] http://vimeo.com/31739539
[19:10] what did Yahoo ever do that was good?
[19:12] Love the forever alone face in the background
[19:24] if you watch the first 10 minutes or so, Brewster talks about Yahoo & Geocities, and mentions a couple of things Yahoo did to make things easier for them (I'm guessing Internet Archive)
[19:24] ah.
[19:24] I can't watch the video.
[19:30] question: I'd like to archive a site with wget-warc, and so I thought I would just modify the splinder grabber line- am I missing anything? http://pastebin.com/UdAhF7mc
[19:34] I just researched the internet archiving process of the national library of my home country and apparently they ignore robots.txt because the law specifically allows them to
[19:35] that's good news right?
[19:38] Dunno. Mediawiki has the normal pages at domain.example/wiki/ARTICLE, but all the dynamic pages at domain.example/w/index.php?..., so obviously the latter part is blocked by robots.txt
[19:38] Crawling that part would be a waste of resources for both the archive guys and the webmaster
[20:29] soultcer: all mediawiki pages are dynamic, and example.com/wiki/Article is usually rewritten by an apache mod_rewrite rule to example.com/w/index.php?title=Article
[20:33] Coderjoe: Yes, of course they are also dynamic. Let me rephrase: /wiki/ARTICLE is where the content really is, /w/ is where the "edit page", "diff", ... stuff lives, which is irrelevant in most cases.
[21:17] ...
[21:17] i knew that, i just meant subnetting is really confusing, and also it was mentioned at the very beginning that we'd only focus on ipv4 since thats what we'd realistically see in a corporate environment, in all but the hugest networks
[21:17] Cisco still believes that, eh :P
[21:17] Yep, I get dirty looks every time I bring up IPv6 in my Network Engineering IV class
[21:17] (CCNA class)
[21:18] that's a problem.
[21:18] lol at him not getting subnets
[21:18] It's really easy to practice
[21:22] ciao donbex
[21:25] chronomex: It doesn't help that the teacher is afraid of them
[21:25] She doesn't like Linux either
[21:25] "Windows servers are where most business takes place, so it's better to familiarize yourself with them"
[21:26] Not too shabby argument. There's a lot of Windows servers.
[21:26] Don't know what that has to do with CCNA though
[21:28] Right
[21:29] But I think a balance of both would be better
[21:29] Most people have their own taste of things, use what you're comfortable.. It only makes sense that the teacher would recommend using what the teacher can help with
[21:29] A lot of server marketshare is *nix, and especially for virtualized windows guests
[21:29] sure, but it doesn't matter for a networking course
[21:29] yeah
[21:31] Life's full of shit to not care about
[21:32] wait
[21:32] windows ... servers?
[21:32] that doesn't even make any sense
[21:33] they're used to run crappy code from bad decisions made in Microsofts piece of ass tech
[21:34] chronomex: Yep :D s/?/>/
[21:54] I'm going to Finland!
[21:54] February
[21:54] finland!
[21:54] Helvete
[21:54] I have a friend in Finland, you should say hi to him!
[21:54] his name is E-J, or that's what he's named on EFNet
[21:57] satan perkele
[22:10] What's in Finland in February?
[22:10] Me
[22:10] Snow
[22:11] Cold
[22:11] Very cold
[22:11] And a lot of darkness, I assume.
[22:12] I live in Minnesota.
[22:13] I figure Finland is a place that normally gets visited in the summer.
[22:15] SketchCow: some scans for you http://www.apple2online.com/index.php?p=1_65_Apple-IIGS-Buyer-s-Guide http://www.apple2online.com/index.php?p=1_70_inCider-Magazine http://www.apple2online.com/index.php?p=1_53_Newsletters
[22:18] hi folks, can I have someone check out a wget-warc mirror I did of a site?