[01:42] folks, if anyone here has a twitter account, there was a huge breach: 55k accounts + password hashes or actual passwords were leaked: http://www.airdemon.net/hacker107.html
[01:43] yay im not there
[01:43] Thought that was just some spammer's account that got compromised. Mostly fake accounts.
[01:44] yeah probably, lol
[01:44] By "spammer's account" I mean their list
[01:45] well, at least one list has email accounts + actual clear-text passwords, which is disturbing on a number of levels
[01:49] I really wouldn't be surprised if a spammer didn't care enough to salt and hash the passwords of his list.
[01:50] I followed some of the discussion from here: http://news.ycombinator.com/item?id=3945410
[01:53] interesting
[05:26] http://www.theblaze.com/stories/great-leap-backwards-why-did-wikipedia-scrub-a-year-a-half-old-article-titled-forward-generic-name-of-socialist-publications/
[05:26] looks like wikipedia may be removing "Forward" from it
[06:57] PooSkin: .. what
[07:24] New patchset: ArielGlenn; "create tarballs of media uploaded locally and remote (to commons) per wiki" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/7034
[11:14] spot the mistake :((( 55000-59999.tar -> s3://FileplanetFiles_50000-559999/55000-59999.tar
[11:20] Extra FileplanetFiles_50000-559999
[11:21] nah, 55000-59999 is correct, i named the item wrong
[11:23] Do note my baldness
[11:23] I mean, boldness.
[11:25] oops
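A quick aside on the salt-and-hash point from [01:49]: below is a minimal, purely illustrative Python sketch of what salted password storage looks like. The function names, the PBKDF2-SHA256 choice, the 16-byte salt, and the iteration count are all example values, not anything taken from Twitter or from the leaked lists.

    import hashlib
    import hmac
    import os

    def hash_password(password, iterations=100000):
        # A fresh random salt per password means identical passwords do not
        # produce identical hashes, and precomputed rainbow tables don't apply.
        salt = os.urandom(16)
        key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
        return salt, key            # store both; the salt is not secret

    def verify_password(password, salt, key, iterations=100000):
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
        return hmac.compare_digest(candidate, key)   # constant-time comparison

A clear-text list like the one mentioned at [01:45] skips every one of these steps, which is why it is the worst case for the affected accounts.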
[13:52] Someone has sent me a thread, an epic thread.
[13:54] Basically, some project added a new version, and it had a bug.
[13:54] .....it would delete /usr
[13:54] Quite a bug
[13:54] Imagine the thread
[13:54] Now you don't have to.
[13:54] Ah, I know this project :D It's 'bumblebee' :D
[13:54] or whatever the nvidia optimus/graphic card switcheroo software is called. I think it was bumblebee >_>
[13:55] http://sudokode.net/~tim/bumblebee/
[13:55] http://sudokode.net/~tim/bumblebee/index_files/687474703a2f2f692e696d6775722e636f6d2f6f387731612e6a7067.jpe
[13:55] Aw, he's deleted the original at Github/MrEEEE/bumblebee/ :(
[13:56] heh, i remember that
[13:56] and now i own a netbook and run bumblebee on it
[13:56] I installed it once, I remember never upgrading it :D
[13:56] because I had seen that issue on the tracker
[13:56] http://sudokode.net/~tim/bumblebee/index_files/687474703a2f2f746d702e6b72616c2e686b2f622e6275672e7370696465.png
[13:57] Good fun
[14:00] http://sudokode.net/~tim/bumblebee/index_files/687474703a2f2f692e696d6775722e636f6d2f44674454572e676966.gif
[14:04] next fileplanet chunk is up http://archive.org/details/FileplanetFiles_50000-559999
[14:04] i hope the typo is no problem
[14:04] i guess it can be renamed when moved into a collection later
[14:09] http://archive.org/details/archiveteam-bumblebee-usr-thread
[14:09] At the end, talk to me about us setting it up for permanent-permanent.
[14:09] aye
[14:10] SketchCow, I really enjoyed that thread back then :D
[14:11] Yeah, he was a smart guy to archive it.
[14:16] Schbirid: Is it just you downloading fileplanet?
[14:16] so far yes :\
[14:17] Start a channel, get some people.
[14:17] Assign chunks.
[14:17] How big are the chunks
[14:17] growing bigger, no idea how big. at the moment it is ~25GB
[14:17] per chunk
[14:18] trying to think of a good channel name
[14:19] fileplanot
[14:19] fryplanet
[14:19] fuckign :)
[14:20] #fryplanet.
[14:20] No
[14:20] Wait
[14:20] #fireplanet
[14:20] ha
[14:20] that is good
[14:21] alright, let me write a proper guide on how to help
[14:23] i am not sure about making the TARring part of the script, to me it would be annoying since i would want to do that myself
[14:23] When's shutdown?
[14:25] not announced
[14:25] http://www.gamefront.com/breaking-ign-to-close-fileplanet/
[14:25] it is nothing TOO panicky
[14:33] Well, it's worth going after now.
[14:35] Blargh, gonna be busy tonight after work as well T_T
[14:35] was thinking of starting crawling http://resdagboken.se, to archive it ASAP since it's closing down next month
[14:36] Get the project page up on the wiki
[14:37] can't register
[14:37] Oooo yes
[14:37] Someone find me a good set of plugin items
[14:37] And we can begin working on getting registration working.
[14:38] shaqfu_: you offered your account. yes, please :)
[14:38] I got an ATwiki account, so I'll go ahead and create a project page~
[14:39] there is one already
[14:39] http://archiveteam.org/index.php?title=Fileplanet
[14:39] just wrote http://www.quaddicted.com/forum/viewtopic.php?pid=257#p257
[15:45] Schbirid: Bad news - you're not grabbing the HTML page with the metadata on it :(
[15:45] I checked via tar viewer, and the download page just says "thank you for downloading "
[15:47] <shaqfu> Example: http://ia601207.us.archive.org/tarview.php?tar=/12/items/FileplanetFiles_00000-09999/00000-09999.tar&file=00000-09999/www.fileplanet.com/1467/download/index.html
[15:51] <Schbirid> "Home / Gaming / Action / First Person / Quake Series / Quake / Maps / Deathmatch" was all the metadata i am after
[15:51] <shaqfu> No interest in /fileinfo?
[15:51] <Schbirid> i agree that the fileinfo pages would be handy but i found no sane way to get them
[15:52] <shaqfu> Schbirid: sitemaps, with the tags stripped out, are a big list of all of them :)
[15:52] <shaqfu> Or you could brute force them
[15:53] <Schbirid> oooh http://www.fileplanet.com/robots.txt
[15:53] <Schbirid> excellent
[15:53] <Schbirid> thanks
[15:54] <shaqfu> I was looking to use the sitemaps to build a list of IDs to grab, but they did the hard work for me :)
[15:55] <Schbirid> nice, we can get the whole size from that too then
[15:55] <Schbirid> gotta go
[15:56] <shaqfu> Alright
[15:56] <Schbirid> will grab those tomorrow
[15:58] <shaqfu> Awesome
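A rough sketch of the sitemap approach shaqfu describes above: read the Sitemap: lines out of robots.txt, then reduce the sitemap XML to a sorted list of Fileplanet file IDs. The helper names, the expectation of plain (un-gzipped) XML, and the guess that download URLs look like http://www.fileplanet.com/<id>/... are all assumptions for illustration, so the regexes would need checking against the real files.

    import re
    import urllib.request

    BASE = "http://www.fileplanet.com"

    def fetch(url):
        return urllib.request.urlopen(url).read().decode("utf-8", "replace")

    def sitemap_urls():
        # robots.txt advertises the sitemaps, as spotted at [15:53]
        robots = fetch(BASE + "/robots.txt")
        return re.findall(r"(?im)^Sitemap:\s*(\S+)", robots)

    def file_ids():
        ids = set()
        for sitemap in sitemap_urls():
            xml = fetch(sitemap)
            # assumed URL shape inside <loc> tags: http://www.fileplanet.com/<id>/...
            for match in re.findall(r"<loc>%s/(\d+)/" % re.escape(BASE), xml):
                ids.add(int(match))
        return sorted(ids)

    if __name__ == "__main__":
        print(len(file_ids()), "file IDs found")

A flat ID list like this is also what makes it easy to assign fixed-size chunks (00000-09999, 10000-19999, ...) to different downloaders.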
[16:46] <SketchCow> your.com is not enjoying my downloading.
[16:49] <Nemo_bis> bad boy
[16:55] <SketchCow> We're probably talking 3-4 days of downloading.
[17:40] <emijrp> who knows how to archive tweets by hashtag?
[17:42] <yipdw> well, first approach: suck down a shitload of tweets, scan for #\w+
[17:42] <yipdw> I'm sure we can work Hadoop into this
[17:42] <yipdw> ooh
[17:42] <yipdw> closure, SketchCow: op spread
[17:45] <mistym> emijrp: It's pretty easy. Twitter's API is actually pretty well thought through (despite the weirdness with the search API)
[17:47] <mistym> What're you looking to archive?
[17:50] <emijrp> a hashtag that is going to be used a lot in the next few days
[17:51] <emijrp> also user accounts http://code.activestate.com/recipes/576594-backupdownload-your-tweets-or-anyones-tweets/
[17:54] <mistym> That example won't work anymore, since they were screenscraping instead of using the Twitter API. But it's not hard to archive your own tweets, yeah - should take roughly as many lines of code.
[17:54] <mistym> (i.e. they were screenscraping and the API changed in the meantime.)
[17:55] <mistym> (The design changed, even)
[17:55] <emijrp> it works for me
[17:55] <mistym> Never mind then, I was wrong!
[17:55] <mistym> Good that it works. Not the 100% best solution, but at the end of the day, if you get your tweets, that's what matters.
[18:00] <mistym> Hootsuite has an archiving tool you can set to grab tweets by hashtag, then export to CSV. Probably the easiest solution if you don't want to roll your own.
[18:01] <mistym> Oh, never mind. They charge money for it? That's pretty silly.
[18:04] <emijrp> i will make my own approach, without beautiful soup
[18:04] <emijrp> beautiful soup is for gays
[18:06] <mistym> Can we tone the homophobia down a little? :V
[18:06] <yipdw> well, I guess that means that I dress well and get along well with women
[18:06] <yipdw> I'll take that
[18:06] <mistym> While I am not a pythonista, I am 9000% sure that there is a convenience API wrapper library available for Python. (If you even need it, the API is so simple you're probably fine with just a json parser.)
[18:07] <emijrp> api needs keys?
[18:07] <mistym> Only for write functions, you can search and retrieve keys all you want with no API key.
[18:08] <mistym> *search and retrieve tweets
[18:10] <emijrp> and request frequency limits?
[18:12] <mistym> There's an hourly API limit, unauthenticated users get 150 requests per hour.
[18:12] <mistym> The docs are on Twitter's site: https://dev.twitter.com/docs/rate-limiting
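For reference, a sketch of the roll-your-own route emijrp is talking about, using nothing but a JSON parser against the unauthenticated search endpoint mistym describes. The endpoint URL and the rpp/since_id parameters follow the 2012-era v1 Search API, which has long since been retired; archive_hashtag is a hypothetical helper, and the 30-second pause is only a guess at staying comfortably under the 150 requests/hour figure quoted above.

    import json
    import time
    import urllib.parse
    import urllib.request

    SEARCH = "http://search.twitter.com/search.json"   # old, unauthenticated v1 endpoint

    def archive_hashtag(tag, outfile, pause=30):
        """Poll the search API for #tag and append each new tweet as one JSON line."""
        since_id = 0
        with open(outfile, "a") as out:
            while True:   # run until interrupted; the hashtag is expected to stay active
                params = {"q": "#" + tag, "rpp": 100, "since_id": since_id}
                url = SEARCH + "?" + urllib.parse.urlencode(params)
                reply = json.loads(urllib.request.urlopen(url).read().decode("utf-8"))
                for tweet in reply.get("results", []):
                    out.write(json.dumps(tweet) + "\n")
                    since_id = max(since_id, tweet["id"])   # only ask for newer tweets next time
                out.flush()
                time.sleep(pause)   # ~120 polls/hour, under the quoted 150/hour limit

Exporting to CSV afterwards, Hootsuite-style, would just be a matter of picking fields out of each stored JSON object.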
[18:13] <yipdw> if you want to latch onto *all* tweets, you may also want to investigate their streaming API
[18:13] <yipdw> https://dev.twitter.com/docs/streaming-api
[18:14] <yipdw> I don't think that they just let anyone use that, though
[18:16] <yipdw> there are also some severe rate limits on the streaming API
[18:16] <yipdw> not in terms of requests/sec, but more like "You will not be served more than 1% of all public tweets matching your request"
[18:18] * mistym needs to grab some lunch
[18:26] <Coderjoe> unless you pay stanley to be able to drink from the firehose
[18:29] <Coderjoe> hmm
[18:29] <Coderjoe> statuses/sample
[18:29] <Coderjoe> Returns a random sample of all public statuses. The default access level, ‘Spritzer’ provides a small proportion of the Firehose, very roughly, 1% of all public statuses.
[18:29] <Coderjoe> I don't see such text on the statuses/filter method
[18:31] <Coderjoe> mmm
[18:31] <Coderjoe> https://dev.twitter.com/docs/twitter-data-providers
[18:37] <SketchCow> https://twitter.com/#!/TheRealNimoy/status/200287158299406336 is pretty awesome
[18:45] <yipdw> Coderjoe: that list of data providers just made me irrationally happy to not be in the twitter cogwork
[19:43] <SketchCow> http://scottbeale.org/2012/05/09/jason-scott-of-textfiles/
[19:47] <Schbirid> totally nws but that gentleman bears some resemblance to SketchCow http://www.imagebam.com/image/906de5189359872
[20:10] <shaqfu> Schbirid: FP still chugging along?
[20:26] <Coderjoe> mmm
[20:26] <Coderjoe> some websites have awesome color choices
[20:26] <Coderjoe> such as black on dark blue
[21:14] <SketchCow> http://ascii.textfiles.com/archives/3559 Javascript Hero. I'm announcing generally that #jsmess on EFnet could use your attention and help.
[23:23] <dashcloud> if you'd like to see a website time forgot, check out epicclassics.com - it's pretty much the exact same site now as when it launched in 1999
[23:49] <mistym> dashcloud: I like that they haven't fixed the order page that broke ~4-5 years ago
[23:50] <dashcloud> actually, it still works - I placed an order for nearly everything on the page, and got a box with the correct items (plus, I accidentally sent too much money, so they sent me a refund of that + a copy of Jazz 2)
[23:51] <dashcloud> for those that haven't been there, that means printing out the document, writing a check, and mailing it all to the address listed, then waiting a couple of weeks for a box in the mail
[23:59] <mistym> Oh, I see - the top order form works, I was clicking on the image
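Closing the loop on the streaming-API tangent from [18:13]-[18:16]: a rough sketch of what tracking a term through statuses/filter looked like at the time. The endpoint path, the API version in the URL, and the OAuth requirement reflect 2012 and have changed since; the credential strings are placeholders, the track function is a hypothetical helper, and the third-party requests and requests_oauthlib packages are assumed to be installed. The roughly-1%-of-public-tweets cap discussed above applies no matter how the connection is made.

    import requests
    from requests_oauthlib import OAuth1

    # 2012-era streaming endpoint; the versioned path and auth rules have changed since.
    STREAM = "https://stream.twitter.com/1/statuses/filter.json"

    # Placeholder credentials - substitute real OAuth tokens for your own account/app.
    auth = OAuth1("CONSUMER_KEY", "CONSUMER_SECRET",
                  "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

    def track(term, outfile):
        """Hold a long-lived connection open and append every matching tweet as a JSON line."""
        resp = requests.post(STREAM, data={"track": term}, auth=auth, stream=True)
        resp.raise_for_status()
        with open(outfile, "a") as out:
            for line in resp.iter_lines():
                if line:                      # the stream interleaves blank keep-alive lines
                    out.write(line.decode("utf-8") + "\n")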