#archiveteam 2012-05-09,Wed


Time Nickname Message
01:42 πŸ”— dashcloud folks, if anyone here has a twitter account, there was a huge breach: 55k accounts+password hash or actual password were leaked: http://www.airdemon.net/hacker107.html
01:43 πŸ”— oli yay im not there
01:43 πŸ”— aggro Thought that was just some spammer's account that got compromised. Mostly fake accounts.
01:44 πŸ”— oli yeah probably, lol
01:44 πŸ”— aggro By "spammer's account" I mean their list
01:45 πŸ”— dashcloud well, at least one list has email accounts+actual clear text passwords, which is disturbing on a number of levels
01:49 πŸ”— aggro I really wouldn't be surprised if a spammer didn't care enough to salt and hash the passwords of his list.
01:50 πŸ”— aggro I followed some of the discussion from here: http://news.ycombinator.com/item?id=3945410
01:53 πŸ”— dashcloud interesting
05:26 πŸ”— godane http://www.theblaze.com/stories/great-leap-backwards-why-did-wikipedia-scrub-a-year-a-half-old-article-titled-forward-generic-name-of-socialist-publications/
05:26 πŸ”— godane looks like wikipedia may be removing forward from it
05:31 πŸ”— PooSkin (Best $20 you'll ever spend.)
05:31 πŸ”— PooSkin That would definitely be for the best if you could do that. He'll learn faster and better because you don't have to go slow for the "learning-challenged" kids in the classroom, and he won't have libtards filling his head with pro-Nigger propaganda six hours a day. That crap can rapidly turn him into someone you won't even recognize. I don't think most parents have any idea what those monsters are "teaching" their kids.
05:31 πŸ”— PooSkin You can also sit him down in front of a TV for an hour with the promise of a quarter for every pro-nigger or pro-multiculturalist reference he can point out. Make him aware of the way they try to manipulate us.
06:57 πŸ”— ersi PooSkin: .. what
07:24 πŸ”— Nemo_bis <gerrit-wm> New patchset: ArielGlenn; "create tarballs of media uploaded locally and remote (to commons) per wiki" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/7034
08:49 πŸ”— mib_rpa6q Want Free Prizes? Click Here! ->> http://quickprize.info/?ref=11509
11:14 πŸ”— Schbirid spot the mistake :((( 55000-59999.tar -> s3://FileplanetFiles_50000-559999/55000-59999.tar
11:20 πŸ”— ersi Extra FileplanetFiles_50000-559999
11:21 πŸ”— Schbirid nah, 55000-59999 is correct, i named the item wrong
11:23 πŸ”— ersi Do note my baldness
11:23 πŸ”— ersi I mean, boldness.
11:25 πŸ”— Schbirid oops
13:52 πŸ”— SketchCow Someone has sent me a thread, an epic thread.
13:54 πŸ”— SketchCow Basically, some project added a new version, and it had a bug.
13:54 πŸ”— SketchCow .....it would delete /usr
13:54 πŸ”— SketchCow Quite a bug
13:54 πŸ”— SketchCow Imagine the thread
13:54 πŸ”— SketchCow Now you don't have to.
13:54 πŸ”— ersi Ah, I know this project :D It's 'bumblbee' :D
13:54 πŸ”— ersi or whatever the nvidia optimus/graphic card switcheroo software is called. I think it was bumblebee >_>
13:55 πŸ”— SketchCow http://sudokode.net/~tim/bumblebee/
13:55 πŸ”— SketchCow http://sudokode.net/~tim/bumblebee/index_files/687474703a2f2f692e696d6775722e636f6d2f6f387731612e6a7067.jpe
13:55 πŸ”— ersi Aw, he's deleted the original at Github/MrEEEE/bumblebee/ :(
13:56 πŸ”— Schbirid heh, i remember that
13:56 πŸ”— Schbirid and now i own a netbook and run bumblebee on it
13:56 πŸ”— ersi I installed it once, I remember never upgrading it :D
13:56 πŸ”— ersi because I had seen that issue on the tracker
13:56 πŸ”— SketchCow http://sudokode.net/~tim/bumblebee/index_files/687474703a2f2f746d702e6b72616c2e686b2f622e6275672e7370696465.png
13:57 πŸ”— ersi Good fun
14:00 πŸ”— SketchCow http://sudokode.net/~tim/bumblebee/index_files/687474703a2f2f692e696d6775722e636f6d2f44674454572e676966.gif
14:04 πŸ”— Schbirid next fileplanet chunk is up http://archive.org/details/FileplanetFiles_50000-559999
14:04 πŸ”— Schbirid i hope the typo is no problem
14:04 πŸ”— Schbirid i guess it can be renamed when moved into a collection later
14:09 πŸ”— SketchCow http://archive.org/details/archiveteam-bumblebee-usr-thread
14:09 πŸ”— SketchCow At the end, talk to me about us setting it up for permanent-permanent.
14:09 πŸ”— Schbirid aye
14:10 πŸ”— Nemo_bis SketchCow, I really enjoyed that thread back then :D
14:11 πŸ”— SketchCow Yeah, he was a smart guy to archive it.
14:16 πŸ”— SketchCow Schbirid: Is it just you downloading fileplanet?
14:16 πŸ”— Schbirid so far yes :\
14:17 πŸ”— SketchCow Start a channel, get some people.
14:17 πŸ”— SketchCow Assign chunks.
14:17 πŸ”— SketchCow How big are the chunks
14:17 πŸ”— Schbirid growing bigger, no idea how big. at the moment it is ~25GB
14:17 πŸ”— Schbirid per chunk
14:18 πŸ”— Schbirid trying to think of a good channel name
14:19 πŸ”— Schbirid fileplanot
14:19 πŸ”— Schbirid fryplanet
14:19 πŸ”— Schbirid fuckign :)
14:20 πŸ”— SketchCow #fryplanet.
14:20 πŸ”— SketchCow No
14:20 πŸ”— SketchCow Wait
14:20 πŸ”— SketchCow #fireplanet
14:20 πŸ”— Schbirid ha
14:20 πŸ”— Schbirid that is good
14:21 πŸ”— Schbirid alright, let me write a proper guide how to help
14:23 πŸ”— Schbirid i am not sure about making the TARring part of the script; to me it would be annoying since i would want to do that myself
14:23 πŸ”— SketchCow When's shutdown?
14:25 πŸ”— Schbirid not announced
14:25 πŸ”— Schbirid http://www.gamefront.com/breaking-ign-to-close-fileplanet/
14:25 πŸ”— Schbirid it is nothing TOO panicky
14:33 πŸ”— SketchCow Well, it's worth going after now.
14:35 πŸ”— ersi Blargh, gonna be busy tonight after work as well T_ T
14:35 πŸ”— ersi was thinking of starting crawling http://resdagboken.se, to archive it ASAP since it's closing down next month
14:36 πŸ”— SketchCow Get the project page up on the wiki
14:37 πŸ”— Schbirid cant register
14:37 πŸ”— SketchCow Oooo yes
14:37 πŸ”— SketchCow Someone find me a good set of plugin items
14:37 πŸ”— SketchCow And we can begin working on getting registration working.
14:38 πŸ”— Schbirid shaqfu_: you offered your account. yes, please :)
14:38 πŸ”— ersi I got an ATwiki account, so I'll go ahead and create a project page~
14:39 πŸ”— Schbirid there is one already
14:39 πŸ”— Schbirid http://archiveteam.org/index.php?title=Fileplanet
14:39 πŸ”— Schbirid just wrote http://www.quaddicted.com/forum/viewtopic.php?pid=257#p257
15:45 πŸ”— shaqfu_ Schbirid: Bad news - you're not grabbing the HTML page with the metadata on it :(
15:45 πŸ”— shaqfu_ I checked via tar viewer, and the download page just says "thank you for downloading <title>"
15:47 πŸ”— shaqfu Example: http://ia601207.us.archive.org/tarview.php?tar=/12/items/FileplanetFiles_00000-09999/00000-09999.tar&file=00000-09999/www.fileplanet.com/1467/download/index.html
15:51 πŸ”— Schbirid "Home / Gaming / Action / First Person / Quake Series / Quake / Maps / Deathmatch" was all the metadata i am after
15:51 πŸ”— shaqfu No interest in /fileinfo?
15:51 πŸ”— Schbirid i agree that the fileinfo pages would be handy but i found no sane way to get them
15:52 πŸ”— shaqfu Schbirid: sitemaps, with the tags stripped out, are a big list of all of them :)
15:52 πŸ”— shaqfu Or you could brute force them
15:53 πŸ”— Schbirid oooh http://www.fileplanet.com/robots.txt
15:53 πŸ”— Schbirid excellent
15:53 πŸ”— Schbirid thanks
15:54 πŸ”— shaqfu I was looking to use the sitemaps to build a list of IDs to grab, but they did the hard work for me :)
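The sitemap idea discussed above can be sketched roughly as follows. This is only an illustration: the helper names `sitemap_urls_from_robots` and `locs_from_sitemap` are made up here, and it assumes the site's robots.txt lists standard `Sitemap:` lines pointing at sitemap XML files with `<loc>` entries. The actual layout of the Fileplanet sitemaps is not shown in the log.

```python
import xml.etree.ElementTree as ET


def sitemap_urls_from_robots(robots_txt):
    """Collect the URLs named on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Split only at the first colon so the URL's own colons survive.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def locs_from_sitemap(xml_text):
    """Strip the tags from a sitemap, keeping only the <loc> URLs."""
    root = ET.fromstring(xml_text)
    locs = []
    for el in root.iter():
        # Sitemap tags normally carry a namespace prefix like "{...}loc".
        if (el.tag == "loc" or el.tag.endswith("}loc")) and el.text:
            locs.append(el.text.strip())
    return locs
```

The resulting list could then be fed to whatever downloader is already in use; fetching the files themselves is left out of the sketch.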
15:55 πŸ”— Schbirid nice, we can get the whole size from that too then
15:55 πŸ”— Schbirid gotta go
15:56 πŸ”— shaqfu Alright
15:56 πŸ”— Schbirid will grab those tomorrow
15:58 πŸ”— shaqfu Awesome
16:46 πŸ”— SketchCow your.com is not enjoying my downloading.
16:49 πŸ”— Nemo_bis bad boy
16:55 πŸ”— SketchCow We're probably talking 3-4 days of downloading.
17:40 πŸ”— emijrp who knows how to archive tweets by hashtag?
17:42 πŸ”— yipdw well, first approach: suck down a shitload of tweets, scan for #\w+
17:42 πŸ”— yipdw I'm sure we can work Hadoop into this
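The "scan for #\w+" approach could look something like the sketch below. It assumes the tweet texts have already been collected somehow as plain strings; the function names are hypothetical and nothing here talks to Twitter.

```python
import re
from collections import Counter

# The pattern from the message above: a '#' followed by word characters.
HASHTAG_RE = re.compile(r"#\w+")


def extract_hashtags(text):
    """Return every hashtag in one tweet's text, lowercased."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def count_hashtags(tweets):
    """Tally hashtags across an iterable of tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(extract_hashtags(text))
    return counts


def filter_by_hashtag(tweets, hashtag):
    """Keep only the tweets that contain the given hashtag (case-insensitive)."""
    wanted = hashtag.lower()
    return [text for text in tweets if wanted in extract_hashtags(text)]
```

Note that this simple pattern also matches a '#' in the middle of a word or URL fragment, so a stricter pattern may be needed for clean results.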
17:42 πŸ”— yipdw ooh
17:42 πŸ”— yipdw closure, SketchCow: op spread
17:45 πŸ”— mistym emijrp: It's pretty easy. Twitter's API is actually pretty well thought through (despite the weirdness with the search API)
17:47 πŸ”— mistym What're you looking to archive?
17:50 πŸ”— emijrp a hashtag that is going to be used a lot in the next days
17:51 πŸ”— emijrp also user accounts http://code.activestate.com/recipes/576594-backupdownload-your-tweets-or-anyones-tweets/
17:54 πŸ”— mistym That example won't work anymore, since they were screenscraping instead of using the Twitter API. But it's not hard to archive your own tweets, yeah - should take roughly as many lines of code.
17:54 πŸ”— mistym (e.g. they were screenscraping and the API changed in the meantime.)
17:55 πŸ”— mistym (The design changed, even)
17:55 πŸ”— emijrp it works for me
17:55 πŸ”— mistym Never mind then, I was wrong!
17:55 πŸ”— mistym Good it works. Not the 100% best solution, but at the end of the day if you get your tweets it's what matters.
18:00 πŸ”— mistym Hootsuite has an archiving tool you can set to grab tweets on hashtag, then export to CSV. Probably the easiest solution if you don't want to roll your own.
18:01 πŸ”— mistym Oh, never mind. They charge money for it? That's pretty silly.
18:04 πŸ”— emijrp i will make my own approach, without beautiful soup
18:04 πŸ”— emijrp beautiful soup is for gays
18:06 πŸ”— mistym Can we tone the homophobia down a little? :V
18:06 πŸ”— yipdw well, I guess that means that I dress well and get along well with women
18:06 πŸ”— yipdw I'll take that
18:06 πŸ”— mistym While I am not a pythonista, I am 9000% sure that there is a convenience API wrapper library available for Python. (If you even need it, the API is so simple you're probably fine with just a json parser.)
18:07 πŸ”— emijrp api needs keys?
18:07 πŸ”— mistym Only for write functions, you can search and retrieve keys all you want with no API key.
18:08 πŸ”— mistym *search and retrieve tweets
18:10 πŸ”— emijrp and request frequency limits?
18:12 πŸ”— mistym There's an hourly API limit, unauthenticated users get 150 requests per hour.
18:12 πŸ”— mistym The docs are on Twitter's site: https://dev.twitter.com/docs/rate-limiting
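Staying under the 150 requests per hour mentioned above works out to one request every 24 seconds (3600 / 150). A minimal pacing sketch, assuming a simple fixed spacing between requests is acceptable; the class name `HourlyThrottle` is invented for illustration and the actual request code is deliberately left out.

```python
import time


class HourlyThrottle:
    """Space calls evenly so no more than max_per_hour happen per hour."""

    def __init__(self, max_per_hour=150, clock=time.monotonic, sleep=time.sleep):
        # 3600 seconds / 150 requests = 24 seconds between requests.
        self.interval = 3600.0 / max_per_hour
        self._clock = clock      # injectable so the pacing can be tested
        self._sleep = sleep
        self._next_allowed = None

    def wait(self):
        """Block until the next request is allowed, then reserve the slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Calling `wait()` before each request keeps a long-running hashtag poller within the limit without tracking a rolling window.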
18:13 πŸ”— yipdw if you want to latch onto *all* tweets, you may also want to investigate their streaming API
18:13 πŸ”— yipdw https://dev.twitter.com/docs/streaming-api
18:14 πŸ”— yipdw I don't think that they just let anyone use that, though
18:16 πŸ”— yipdw there's also some severe rate limits on the streaming API
18:16 πŸ”— yipdw not in terms of requests/sec, but more like "You will not be served more than 1% of all public tweets matching your request"
18:18 πŸ”— * mistym needs to grab some lunch
18:26 πŸ”— Coderjoe unless you pay stanley to be able to drink from the firehose
18:29 πŸ”— Coderjoe hmm
18:29 πŸ”— Coderjoe statuses/sample
18:29 πŸ”— Coderjoe Returns a random sample of all public statuses. The default access level, ‘Spritzer’ provides a small proportion of the Firehose, very roughly, 1% of all public statuses.
18:29 πŸ”— Coderjoe I don't see such text on the statuses/filter method
18:31 πŸ”— Coderjoe mmm
18:31 πŸ”— Coderjoe https://dev.twitter.com/docs/twitter-data-providers
18:37 πŸ”— SketchCow https://twitter.com/#!/TheRealNimoy/status/200287158299406336 is pretty awesome
18:45 πŸ”— yipdw Coderjoe: that list of data providers just made me irrationally happy to not be in the twitter cogwork
19:43 πŸ”— SketchCow http://scottbeale.org/2012/05/09/jason-scott-of-textfiles/
19:47 πŸ”— Schbirid totally nws but that gentleman has some resemblance with SketchCow http://www.imagebam.com/image/906de5189359872
20:10 πŸ”— shaqfu Schbirid: FP still chugging along?
20:26 πŸ”— Coderjoe mmm
20:26 πŸ”— Coderjoe some websites have awesome color choices
20:26 πŸ”— Coderjoe such as black on dark blue
21:14 πŸ”— SketchCow http://ascii.textfiles.com/archives/3559 Javascript Hero. I'm announcing generally that #jsmess in EFnet could use your attention and help.
23:23 πŸ”— dashcloud if you'd like to see a website time forgot, check out epicclassics.com -it's pretty much the exact same site now as when it launched in 1999
23:49 πŸ”— mistym dashcloud: I like that they haven't fixed the order page that broke ~4-5 years ago
23:50 πŸ”— dashcloud actually, it still works- I placed an order for nearly everything on the page, and got a box with the correct items (plus, I accidentally sent too much money, so they sent me a refund of that+a copy of Jazz 2)
23:51 πŸ”— dashcloud for those that haven't been there, that means printing out the document, writing a check, and mailing it all to the address listed, then waiting a couple of weeks for a box in the mail
23:59 πŸ”— mistym Oh, I see - the top order form works, I was clicking on the image
