[00:07] anyone use mysqldump to do backups from remote systems? i cant get it to work remotely - seems to ignore the --host= option
[00:07] so its trying to connect to localhost instead
[00:07] any ideas?
[00:11] try "-h (hostname)" instead of --host=(hostname) ?
[00:12] And perhaps set the port number as well?
[00:12] i've used -h with other mysql tools, even when using the default port
[00:12] yeah all got it now, im just dumb
[00:13] i just need to ask in 10 irc chans at once
[00:13] so everyone thinks im dumb
[00:13] good job
[00:14] :D
[00:14] trying to centralise my mysql backups
[00:14] always fun :|
[00:20] Heh, I have an 18G user currently uploading to FOC at about 200kb/s
[00:20] 36520460288 25% 175.22kB/s 165:24:30
[00:23] foc?
[00:23] fos*
[00:30] ...I never get names right.
[00:38] how do you browse .tar files again on archive.org?
[00:39] S[h]O[r]T: Tack a / at the end
[00:39] e.g. http://archive.org/details/FileplanetFiles_150500-150999/
[00:42] worked, thanks
[00:50] wfnx.com/robots.txt
[00:50] User-agent: wget; disallow: /
[00:50] lol
[01:05] i have some webuser pdfs for the dark-magazine rack
[01:06] some from 2005 and 2006
[01:06] full 2010 i think
[01:19] mistym: was that recently added?
[01:19] Coderjoe: Not sure? I looked before today.
[01:19] Did someone else already grab it?
[01:20] not I
[01:20] Still needs grabbing then.
[01:24] I assume the #archiveteam answer to "robots.txt" is not just to say "ok"
[01:25] "FUCK ROBOTS.TXT"
[01:25] wget: -e robots=no
[01:25] i thought it was -e robots=off
[01:26] though I did run across a site where the robots.txt was sane, and just blocked an automated-reports section of the site that becomes a crawler trap
[01:26] godane: either
[01:26] ok
[01:26] thought i was doing it wrong
[01:32] Coderjoe: Yeah, that's different - it makes sense that you wouldn't want to crawl that
[01:33] I remember getting into an infinite loop on one site I tried downloading; they could have USED a robots.txt there
[01:34] Dynamic calendar that doesn't care what date you put in, so you can endlessly follow "next month" links to the year 10000+
[01:35] We'll make sure to put it in a time capsule and seal it for 8000 years
[02:34] mistym: the archiveteam answer to robots.txt is "thanks for the advice about crawler traps, we'll keep an eye on that"
[02:34] you can quote me on that
[02:40] Seattle (Reuters) -- Today Archive Team member "chronomex" was quoted as saying, "the archiveteam answer to robots.txt is "thanks for the advice about crawler traps, we'll keep an eye on that"". More on this story as it develops.
[02:48] bbs was just on jeopardy :3
[02:48] also, no one knew the question
[02:50] faiiil
[02:50] also, video?
[02:51] haha
[02:52] watching it live, not sure where you could catch a copy
[02:52] I would be smug as hell, getting that one correct and then losing the rest of the show by an embarrassing margin that would call into question the casting directors' ability to weed out rubes.
[02:52] well, the playback anyway
[02:52] I think i could happily not answer any of them, just smile
[02:56] I always loved getting computer history questions in trivia contests, because so few people know it :)
[02:57] Free points!
[03:22] some used old CDs seem to have a giant unbalanced mass.
[03:22] i should not read them anymore while someone is sleeping in the house
[03:23] 1x
[03:23] good point
[03:23] :D
[03:43] did you guys notice about dotster?
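[Editor's note: a minimal sketch of the remote mysqldump invocation discussed at 00:07-00:12 and the wget robots.txt override from 01:25-01:26; the hostname, port, user and database below are placeholders.]

    # dump a database from a remote server; -h/--host and -P/--port both work,
    # and -p prompts for the password instead of leaving it in shell history
    mysqldump -h db.example.org -P 3306 -u backupuser -p somedatabase > somedatabase.sql

    # make wget ignore robots.txt while mirroring; per the channel, both
    # "robots=off" and "robots=no" are accepted
    wget -e robots=off --mirror http://example.org/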
[04:25] http://www.youtube.com/watch?v=8KvRe4hJyjU
[04:30] we may need to backup digg.com
[04:46] ehm ehm ehm
[04:46] by the way:
[04:46] year-to-date traffic stats on buttcoin.org
[04:46] http://www.welp.im/ages/buttcoin-stats2.png
[04:52] found an iso scene nfo collection v4 on underground-gamer
[04:52] 3358 nfos
[07:08] damn, underscor got a job at IA?
[07:09] apparently
[07:10] http://archive.org/details/archiveteam-splinder
[07:10] so now how do I know if a user has uploaded his stuff or not?
[07:11] alard, do you still have from the tracker the info of who's supposed to have downloaded what?
[07:12] if you still have splinder, please do not delete it
[07:13] contact me for a few commands to run against it
[07:13] we lost a decent size chunk
[07:14] chronomex, what did we lose?
[07:14] and, did you upload the data you had in some way?
[07:14] I guess now you should just tar it and add another archive.org item
[07:14] one of the tars got truncated during final assembly
[07:14] but we have a list of exactly what is missing
[07:14] by SketchCow you mean?
[07:15] and where is this list
[07:15] yea
[07:15] ok, I'll paste them in here
[07:16] wget http://archive.org/download/archiveteam-splinder-00000017/00000017.txt -O splinder-missing.txt
[07:16] cat splinder-missing.txt |cut -d/ -f2-6|uniq|sed -e 's,^,data/,' > splinder-missing-paths.txt
[07:16] tar cvf splinder-for-sketchcow.tar -T splinder-missing-paths.txt
[07:17] ??
[07:17] run those three commands in your splinder working directory (the one that contains data/)
[07:18] and put the tar file somewhere good
[07:18] didn't you already do so?
[07:18] I did, you need to run it in your local splinder cache if you still have it
[07:18] I asked you for a list of the missing stuff, I don't have any Splinder data
[07:19] ah
[07:19] well, there's a list at the url on the first line
[07:19] running against my splinder data
[07:19] rad
[07:20] how can that file contain the list?
[07:20] got 10mb worth
[07:20] the list of files was generated before creating the tar?
[07:20] Nemo_bis: wtf are you saying?
[07:20] I'm asking where the data got lost
[07:21] or the tar was truncated after creation, uploaded in part and deleted locally [old problem]?
[07:21] it was created by a broken script, uploaded without checking, and the sources were removed from fortress of solitude
[07:22] a good amount of it is no doubt lost for good
[07:22] so where does that list come from?
[07:22] the list of files in the directory before it was deleted
[07:22] Nemo_bis: that list is the original list of files that should have gone into the .tar
[07:22] ok
[07:23] about ummm I guess 10% actually made it
[07:23] that's one of 26 final .tar files
[07:25] make sense?
[07:25] DFJustin: neat, care to share my way?
[07:25] http://interbutt.com/misc/splinder-for-sketchcow.tar
[07:26] this Splinder grab has been very very unlucky
[07:26] so sad
[07:27] cool, DFJustin, I get that with md5sum a4fd0a673c5b830c2748fa7822a8231b
[07:27] correct
[07:28] rad
[07:28] I'll hold on to it and take care of it from here
[07:28] Nemo_bis: I forget what else there was, care to remind me?
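[Editor's note: the same three recovery commands chronomex pastes at 07:16, restated with comments; as noted at 07:17, run them from the splinder working directory that contains data/.]

    # 1. fetch the list of files that should have been in the truncated 17th tar
    wget http://archive.org/download/archiveteam-splinder-00000017/00000017.txt -O splinder-missing.txt
    # 2. keep just the directory part of each path (fields 2-6), collapse adjacent
    #    duplicates, and prefix data/ to get local cache paths
    cat splinder-missing.txt | cut -d/ -f2-6 | uniq | sed -e 's,^,data/,' > splinder-missing-paths.txt
    # 3. tar up those paths from the local cache (-T reads the list of paths to include)
    tar cvf splinder-for-sketchcow.tar -T splinder-missing-paths.txt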
[07:29] lots of users weren't downloaded because of the splinder_noconns when the server was overloaded
[07:29] (we never checked them to redownload them)
[07:29] oh, yeah
[07:29] many users with weird characters in the domain weren't downloaded
[07:29] several people never uploaded their stuff
[07:30] downloaded users with some unescaped characters have not been uploaded and we never got to fix them
[07:30] I think a lot of blame is squarely in splinder's court for having a janked up system that wasn't reliable
[07:30] and now we lost 5% of the downloaded data
[07:30] not really
[07:30] very well
[07:31] it was mostly bugs in our software and human errors
[07:31] too much stuff going on I guess, to do everything correctly :)
[07:31] also, when the deadline was moved the project was completely forgotten
[07:32] we discovered that rsync doesn't work very well when you have files named with asterisks in
[07:32] as I recall, it completely eats itself
[07:36] it wasn't only that
[07:42] I've updated http://archiveteam.org/index.php?title=Splinder [so depressing]
[07:43] cheer up, the other 95% is safe
[07:44] it's 95% of x%
[07:44] with x still unknown
[07:44] it's a metric shitton more than fuck-all
[07:44] meh
[07:44] we've had so much time to fix everything...
[07:51] --------------------------------------------
[07:51] ok stealing noticeboard now
[07:51] BEFORE YOUR DATA GETS LOST ON YOUR DRIVES TOO
[07:51] PLEASE HELP FIX THE 17TH CHUNK DATA DESTRUCTION DISASTER
[07:51] URGENT REQUEST FOR EVERYONE WHO HELPED DOWNLOAD SPLINDER SOME MONTHS AGO
[07:51] http://archiveteam.org/index.php?title=Splinder#disasterfix
[07:51] (ALSO, MAKE SURE YOU'VE UPLOADED EVERYTHING YOU HAD)
[07:51] THANKS
[07:51] (alard, Angra, anonymous, arima, asdf, bsmith093, cameron_d, closure, dashcloud, db48x, dnova, donbex, DoubleJ, hrbrmstr, Hydriz, kenneth, kennethreitz, Konklone, koon, marceloantonio1, ndurner, NotGLaDOS, Paradoks, pberry, PepsiMax, proub, rebiolca, sarpedon, sente, shoop, soultcer, spirit, tef, undercave, underscor, VMB, Wyatt, yipdw)
[07:51] --------------------------------------------
[07:57] chronomex, were the other tars checked?
[07:57] so far as I know they are good
[07:58] I don't know what level of checking they got
[07:58] http://archive.org/details/archiveteam-splinder-00000008 has no tar
[07:59] odd.
[07:59] could someone download the tars and check them against the file lists?
[07:59] also, combine the lists to see how many users they contain
[08:23] trying to get the list of Splinder users on IA from the combined files lists
[08:23] $ sed -r "s/^SPLINDER\/[^/]{2}\/[^/]\/[^/]{2}\/[^/]{3}\/([^/]+)\/.+$/\1/g" list.txt | sort -u > users.txt
[08:24] that'll be slow
[08:24] * Nemo_bis shrugs
[08:24] that's why I did just |cut -d/ -f2-6
[08:24] takes a few min at most
[08:25] also cut is less typing :P
[08:27] your advice came too late
[08:27] asking or learning takes more time than DIY as you're able to
[08:29] * chronomex shrugs
[08:29] you're doing the work, you get to choose how it gets done
[08:44] hmmmmmm
[08:44] $ wc -l users.txt
[08:44] 1164439 users.txt
[08:56] http://archiveteam.org/index.php?title=Splinder&action=historysubmit&diff=7800&oldid=7799
[09:29] Nemo_bis: what happened to cause this?
[09:43] Coderjoe, see wiki
[09:44] or do you mean the 200k missing users?
[09:44] i see the wiki, but that doesn't say what happened
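[Editor's note: a sketch of the cut-based alternative chronomex suggests at 08:24 for the username extraction Nemo_bis does with sed at 08:23; it assumes the same SPLINDER/aa/b/cc/ddd/username/... path layout, so the username is the sixth /-separated field.]

    cut -d/ -f6 list.txt | sort -u > users.txt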
[09:44] just "hey, we're missing shit. please do this to help us recover"
[09:45] the tar didn't contain all files, tar was uploaded and files deleted, missing files are lost
[09:45] "most of the data for the 17th item wasn't put in the tar and the files were deleted"
[09:45] to me the wiki page seems to say it
[09:46] oh
[09:46] above the spot you linked
[09:47] yes
[10:12] Nemo_bis/Coderjoe: Here's the splinder tracker log, if that helps: http://db.tt/ZDTea9nG
[10:49] alard, thanks
[10:59] alard, how can I extract the list of users originally added to the tracker?
[13:21] Morning.
[13:25] What happened was I had a truncated .tarring and then I deleted the original later.
[13:25] Totally my mistake, I feel terrible about it.
[13:26] We did get a lot, we can't fall into the trap of hating all we didn't get or lost.
[13:32] (Grabbing MASTER PENIS) (6150)
[13:32] Oh, IUMA, you hosted all the best bands.
[13:33] http://archive.org/details/iuma-master_penis
[13:43] In terms of the Splinder analysis, I was going to do that.
[13:44] I can download them quickly (being local after all) and do analysis.
[13:44] Right now I am focused on blowing up IUMA and uploading Fortunecity.
[13:46] Don't know how much longer the splinder thing will take so I'll let it run overnight and see about uploading it in the morning
[13:53] Excellent.
[13:59] "Censored websites from the Internet (mostly science-related). Upon a request, whole websites may be downloaded from the Internet, undergo review and censorship, and be published on Kwangmyong" http://en.wikipedia.org/wiki/Kwangmyong_(network)
[14:00] you go North Korea. Archive and censor that internet
[14:09] I'm cranking IUMA up to 11, and I've got the fortunecity merge running.
[14:09] These merges are murder, but thanks to rsync, not worried they're being done wrong.
[14:09] I know what I did wrong with the Splinder upload - won't happen again.
[14:09] Always a learning process.
[14:14] http://archive.org/details/iuma-pdsa
[14:14] Now THAT's some UTF-8 encoding!
[14:18] OK, with 15 processes jamming IUMA songs into archive.org, I think that's the limit of FOS for that process.
[14:18] MobileMe is now smooth as butter - it just works.
[14:21] SketchCow: do you ever sleep o_O
[14:21] What? I slept plenty - today it was 2:30am EST to 8:30 EST
[14:29] Nemo_bis: The usernames are in the log file. (In the "item" or "user" field of the JSON object.)
[14:35] alard, ok
[14:35] * Nemo_bis still has to understand how to extract them
[14:47] MASTER_PENIS_-_SUCK_MY_DICK_CHAINEY
[14:47] very inspiring
[14:47] lol
[14:50] Nemo_bis: with a JSON parser (python, ruby etc.) or bunzip2 -c splinder.log.bz2 | grep -ohE '"user":"[^"]+"' | cut -d '"' -f 4
[14:51] alard, thanks :)
[14:52] * Nemo_bis had actually understood he needed to learn cut
[14:52] it's like python re.split or whatnot
[16:06] http://defacto2.wordpress.com/2012/05/15/mindcandy-volume-2-amiga-demos-has-sold-out-so-it-is-now-available-for-free-online/
[16:06] something for archive.org?
[16:36] I expect it's there.
[16:36] If not I'll grab it.
[18:19] hm http://archive.org/details/uihucvj
[18:19] mmm spammers
[18:20] oh jeez.
[18:20] i just realized... the varying types of spam emails are also internet cultural artifacts
[18:21] "these are examples of the type of bullshit we had to wade through in our email"
[18:31] sure
[18:31] that's why I'm archiving lots of wikis with GBs of spam
[18:31] dozens of GB sometimes
[18:31] which 7z reduces by 3-4 orders of magnitude
[19:06] Fortunecity is now integrated into one directory (and one "doubles" directory.)
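[Editor's note: an alternative to the grep pipeline alard gives at 14:50, assuming jq is installed and that the tracker log is one JSON object per line with the username in a "user" (or "item") field, as alard describes at 14:29.]

    # jq's // operator falls back to "item" when "user" is absent
    bunzip2 -c splinder.log.bz2 | jq -r '.user // .item' | sort -u > tracker-users.txt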
[19:06] The doubles is a mere 1.3gb, which is not bad against a 1.2tb set.
[19:24] SketchCow: i saved this: http://archive.org/details/TechTVSept2002
[19:25] took over 2 hours to upload
[19:25] You're a hard worker with little encouragement, godane
[19:26] that episode could only be found on usenet
[19:26] and it was not indexed everywhere
[19:29] still looking for techtv music wars
[19:29] old edonkey link: ed2k://|file|TechTV_-_Music.Wars_-_9-12-2003..rm|364096025|44F4B72EBB8BA3F5740698149FD8AAF 8|/
[19:30] sadly it doesn't work anymore
[19:30] it was hosted on razeback and that was seized on feb 21 2006
[19:45] is that a valid link? I removed spaces in the hash
[19:53] Fortunecity link being built.
[19:57] n
[19:57] sorry, wrong screen :)
[20:22] how do you download from links like these: ftp://[2001:da8:b800:253:202:118:224:241]/
[20:22] thats ipv6
[20:22] is there a way to translate to ipv4
[20:22] my firefox can't open it
[20:22] nah
[20:23] you'll need an ipv6 tunnel broker
[20:23] as I presume your ISP doesn't support ipv6
[20:28] is it so rare?
[20:29] not super rare, but I can only name one "consumer isp" who supports it (and even then I think its patchy)
[20:29] mine supports it and it's the second biggest ISP in Italy
[20:30] unless I'm missing something
[20:30] in the UK the two big major players don't
[20:30] and they supply pretty much everyone else so...
[20:48] i found something you may want to mirror
[20:48] http://archive.org/details/iuma-pdsa <-- most of the mp3s are 0 bytes, utf-8 fuckery? ;)
[20:48] ftp://211.99.140.238/
[20:48] there are a lot of chinese videos on there
[20:49] there are 56k, 300k, 1.5mb versions with some of them
[21:19] archive.org doesn't have this: http://geocities.com/techtvvids
[21:19] :-(
[21:22] godane: I have a box with ipv6, what do you need?
[21:24] i was trying to get access to these files: http://www.goobye.net/ftphtml/v6_2001_da8_b800_253_202_118_224_241/EF778C6C597A1B2113560AEC2CA21B51_1.shtml
[21:25] it's the only copy of webuser issue 224 that i have found that may not be a dead link
[21:25] * chronomex investigates
[21:26] do you know the path it should be in on the ftp server?
[21:26] ah
[21:26] Warez/0day/200910/1007/Webuser.Issue.224.October.08.2009.Retail.Ebook-ATTiCA/acwu224a.zip
[21:28] sweet, it exists
[21:28] wget'ing now
[21:30] some pirate part of me is yelling to mirror the site
[21:30] http://gir.seattlewireless.net/~chronomex/crap/webuser.tar
[21:30] some pirate part of me is agreeing
[21:32] they have some weird shit here
[21:32] agilent genespring?
[21:32] there are like school videos on the sites
[21:32] for bioinformatics pathway analysis
[21:33] o_O
[21:33] some secret undercover US research lab? :D
[21:33] lol
[21:33] more like a china research lab
[21:33] well I was gonna say that....
[21:34] but they have eyes..
[21:34] this is just the weirdest collection
[21:35] ...
[21:35] right next to angry birds
[21:35] and globe and mail, 2012-01-06, alberta edition (ebook)
[21:36] request only stuff?
[21:36] it must be the piratebay/research labs of china
[21:36] possible, SmileyG
[21:36] godane: is "research labs" a pirate site I haven't heard of?
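[Editor's note: a sketch of fetching from the IPv6-only FTP servers discussed at 20:22 and 21:24, assuming the machine already has working IPv6 (native or via a tunnel broker); 2001:db8::1 is a documentation address standing in for the real host, and the path is made up.]

    # wget accepts an IPv6 literal in square brackets
    wget "ftp://[2001:db8::1]/pub/some/file.zip"
    # curl needs -g/--globoff so the brackets aren't parsed as a glob range
    curl -g -O "ftp://[2001:db8::1]/pub/some/file.zip"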
[21:37] :D
[21:38] godane: anyway, you're doing a really good job of chasing these things down
[21:39] i dream of a time when I don't have 4 pcs to fix at home and my network is sweet, and I can spend time working on this stuff like you guys :/
[21:40] heh, you think I have time
[21:40] no, but I really don't have time ;D
[21:41]
[21:41] ah, so that's what WBM stands for
[21:41] :O
[21:42] holy crap
[21:42] simple minds in lateral efforts yank grands
[21:42] this webuser has an index
[21:43] like I wanna setup ipv6 tunnel but not had a chance yet to look at IPv6 iptabling...
[21:43] SmileyG: I only have v6 because Linode provides it
[21:43] ah nice :)
[21:44] Once I figure out ip6tables, I'll re-enable my tunnel via he.net and be happy ;D
[21:45] godane: hmmm?
[21:46] the pdf was different in that there was a table of contents
[21:46] ah
[21:46] perhaps the pirates were being useful :P
[21:46] I mean more than usual
[21:46] lol
[21:47] one of the servers has movies
[21:48] there's "It's a Wonderful Life" colorized version that's 720p
[21:48] shouldn't wonderful life be in public domain?
[21:48] these sorts of sites seem to collect the most wonderful digital flotsam
[21:49] godane: the film probably is, but the colorization probably wouldn't be
[21:49] er, no, I don't see why the film would be
[21:49] 1946
[21:50] 75 years now, no?
[21:50] A clerical error at NTA prevented the copyright from being renewed properly in 1974.[55][56] Despite the lapsed copyright, television stations that aired it still were required to pay royalties. Although the film's images had entered the public domain, the film's story was still protected by virtue of it being a derivative work of the published story "The Greatest Gift", whose copyright was properly renewed by Philip Van Doren Stern in 19
[21:51] o_O
[21:51] even weirder
[21:51] In 1993, Republic Pictures, which was the successor to NTA, relied on the 1990 U.S. Supreme Court ruling in Stewart v. Abend (which involved another Stewart film, Rear Window) to enforce its claim to the copyright. While the film's copyright had not been renewed, Republic still owned the original film elements, the music score, and the film rights to "The Greatest Gift"; thus the plaintiffs were able to argue its status as a derivative wo
[21:52] anyway, a colorization is probably a derivative creative work, and so has copyright of its own
[21:52] yes, copyright is fucked up.
[21:53] when pasting long things, it would be nice if you weren't stopped by candleja
[21:53] candleja?
[21:54] if you say candlejack's name, he will suddenly abduct y
[21:54] what the fuck
[21:56] oh.
[22:02] that site has to be in the TBs
[22:03] ftp://221.208.245.212/091230/%C2%BD%C2%A1%C2%BF%C2%B5%C3%89%C3%BA%C2%BB%C3%AE/100105/
[22:04] darn it
[22:08] probably expects paths to be retrieved in GB2312 instead of UTF-8
[22:11] ftp://221.208.245.212/091230/%BD%A1%BF%B5%C9%FA%BB%EE/100105/
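[Editor's note: a sketch of the re-encoding Coderjoe does by hand at 22:08-22:11: the server wants the path percent-encoded as raw GB2312 bytes, but the failing URL carried a UTF-8 layer on top. This assumes the whole path segment is percent-encoded and that xxd and iconv are available.]

    seg='%C2%BD%C2%A1%C2%BF%C2%B5%C3%89%C3%BA%C2%BB%C3%AE'
    # strip the % signs, turn the hex back into bytes, undo the extra UTF-8 layer
    # (leaving the original GB2312 bytes), then percent-encode those bytes again
    echo -n "$seg" | sed 's/%//g' | xxd -r -p \
      | iconv -f UTF-8 -t LATIN1 \
      | xxd -p | sed 's/../%&/g' | tr '[:lower:]' '[:upper:]'
    # -> %BD%A1%BF%B5%C9%FA%BB%EE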
[22:11] Coderjoe: I've always wondered why irc clients don't deal with the protocol's size limit by splitting messages
[22:11] some do
[22:11] like xchat
[22:11] and probably mirc
[22:12] mirc doesn't
[22:12] let's try irssi
[22:12] Some of the prefixes formerly used in the metric system have fallen into disuse and were not adopted into the SI. The prefix myria-, ten thousand,[6][7] denoting a factor of 10000, originated from the Greek μύριοι (mýrioi), that is, myriad, for ten thousand, and the prefixes demi- and double-, denoting factors of 1⁄2 and 2, respectively,[8] were parts of the original metric system adopted by France in 1795. Thes
[22:13] that looked like one big line here
[22:13] how about out there in ircland?
[22:13] looked like one truncated big line here
[22:14] then irssi also fails
[22:15] epic4 also doesn't work. I suspect epic5 also fails
[22:15] if xchat does, that's another reason to switch back to it, but last I tried it, it couldn't connect to multiple irc networks
[22:15] it can
[22:16] same time?
[22:16] it has been able to for like 10 years
[22:16] yes
[22:16] huh, I looked pretty hard and didn't see how iirc
[22:16] before my hard drive failed, I had xchat connecting to like 10 networks at the same time
[22:17] how old was your hard drive
[22:18] 1962
[22:18] 1792
[22:18] no, more like 1492
[22:18] 8000BC
[22:19] i dunno. 8 years or so
[22:22] must have had a brain freeze, it certainly does
[22:23] how do i get ipv6 to work?
[22:23] tunnel?
[22:23] without tunnel
[22:23] time machine? million dollars?
[22:24] i would think comcast supports ipv6 by now
[22:24] when it was finally failing, it started out reading at 22kB/s. then it was reading fine. then it stopped and after a power cycle just clicks and clicks
[22:24] if they do and your router supports it, your OS should autonegotiate an address
[22:27] found another webuser magazine: http://www.goobye.net/search?business=ftpsrc&act=search&curPage=4&q=webuser
[22:28] from april 14 2005
[22:32] one of the sites had 384gb of NHK
[22:42] I just checked. The PDSA entry only has 0-length mp3s.
[22:42] That's the source of THAT issue - just checked.
[22:45] well ain't that fucked
[22:50] found a chinese bbs
[22:50] https://bbs.sjtu.edu.cn/file/bbs/index/index.htm
[22:52] email for you SketchCow
[23:08] godane: that's an interesting index
[23:08] pretty cool
[23:08] :D
[23:08] I think we should mirror them all
[23:11] since its a bbs from 1996 to now
[23:11] start with old stuff
[23:12] then work your way up
[23:12] since content will still be added for years to come, is my guess
[23:14] chronomex: can you look at this: ftp://[2001:da8:215:4020:204:23ff:feb8:e020]
[23:14] there is a webuser magazine from 2005 on it
[23:14] sure, where?
[23:15] pub/Documents/so-many-notsorted/new/220.168.7.12:10021Webuser.Magazine.Apr.14.2005.eBook\-LinG
[23:15] how i got it: http://www.goobye.net/search?q=webuser+2005&business=ftpsrc&act=search
[23:16] Oh, this is cute: http://projects.radiusic.com/koorogi/2007/07/04/finishing-torrents-over-http/
[23:18] got it: http://gir.seattlewireless.net/~chronomex/crap/webuser-2005-04-14.tar
[23:23] Wyatt: spiffy
[23:24] another "archive" site: http://dumpz.ru/
[23:28] looks like archive_king was taken
[23:41] there is also m2v.ru
[23:41] I really should learn russian
[23:42] i want to get this: http://www.m2v.ru/?id=8275&func=sub&name=Webuser.Issue.Mar.03.2005.PDF.eBook-LiB
[23:43] but there are no links
[23:47] huh
[23:47] it may have most of webuser from 2005
[23:52] found this too: http://www.5isucai.com/ftpsearch/?showftp=33&action=browse&path=%2F2005%2F01%2F0112%2FWebuser.Issue.100.Jan.06.2005.PDF.eBook-LinG/
[23:55] what the hell does this mean: NO:2 0day-www.5isucai.com-2002-2011]ftp:// 127.0.0.1/2005/01/0112/Webuser.Issue.100.Jan.06.2005.PDF.eBook-LinG/
[23:55] 127.0.0.1, eh hahaha
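[Editor's note: on the client message-splitting question from 22:11 - an IRC protocol line (prefix, command, target, text and the trailing CRLF) must fit in 512 bytes, so a client that splits long messages has to chop the text well short of that. A rough sketch; 400 bytes is an arbitrary headroom figure, the channel name is just an example, and splitting on raw bytes like this can cut a multi-byte character in half.]

    # emit long_message.txt as raw PRIVMSG lines of at most ~400 bytes of text each
    fold -b -w 400 long_message.txt | while IFS= read -r chunk; do
        printf 'PRIVMSG #archiveteam :%s\r\n' "$chunk"
    done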