#archiveteam 2012-02-27,Mon

Time Nickname Message
00:52 πŸ”— closure undersco2: worst bit is they had the non-archive url at first and then changed it
01:06 πŸ”— chronomex what was the original url?
01:07 πŸ”— undersco2 circlejerk.k-srv.info
01:07 πŸ”— undersco2 but it was just a CNAME to an archive box, which was fucking dumb
01:10 πŸ”— closure classy they took the article down
01:11 πŸ”— undersco2 very
01:11 πŸ”— closure meanwhile, wikileaks is baaack
01:11 πŸ”— undersco2 jason's friends with a lot of them though
01:11 πŸ”— undersco2 hahaha, I saw that
01:11 πŸ”— closure he paid the $2
01:12 πŸ”— NovaKing http://pastebin.com/D7sR4zhT Wikileaks Stratfor dump out right now too
01:13 πŸ”— kennethre closure:I couldn't find the data
01:13 πŸ”— closure well, it's wikileaks, you just wait for the leaked torrent
01:14 πŸ”— closure and then you mairix the fucker and have some datamining fun
01:15 πŸ”— undersco2 haha
01:16 πŸ”— Soojin https://www.youtube.com/watch?v=DKjd3plMCYA - Taiwan's last printing press
01:29 πŸ”— NovaKing undersco2: see pm?
01:30 πŸ”— undersco2 replied
02:11 πŸ”— dcmorton uh oh.. routing loop inside of archive.org's network
02:12 πŸ”— undersco2 yeah, saw that
02:14 πŸ”— dcmorton of course it had to happen when the ~35 gig rsync was 98% done
02:31 πŸ”— chronomex yum
02:32 πŸ”— chronomex packets, packets, eat them up, yum!
02:43 πŸ”— undersco2 dcmorton: :(
02:43 πŸ”— undersco2 one of the DCs is offline (the one where fos lives)
02:43 πŸ”— undersco2 So that's why there's a loop
03:03 πŸ”— dcmorton issues look to be resolved now.. working for me at least
03:26 πŸ”— kennethre it's really nice to see that number going up again
04:59 πŸ”— undersco2 <joepie91> HAHAHA:
04:59 πŸ”— undersco2 <joepie91> For example, while it's likely that Delicious has a long-term strategy for becoming a
04:59 πŸ”— undersco2 <joepie91> profitable business, it's almost certain that Yahoo has simply forgotten that Yahoo
04:59 πŸ”— undersco2 <joepie91> Bookmarks still exists, and will shut it down as soon as a manager accidentally stumbles
04:59 πŸ”— undersco2 <joepie91> across their office.
05:47 πŸ”— undersco2 https://knol-redirects.appspot.com/faq.html
05:52 πŸ”— Coderjoe shut down which?
05:56 πŸ”— undersco2 Yahoo bookmarks
05:56 πŸ”— undersco2 (if you're referring to that copypata)
05:57 πŸ”— undersco2 pasta*
05:58 πŸ”— Coderjoe yes
05:58 πŸ”— Coderjoe blargh. bed
06:37 πŸ”— chronomex bed? already?
08:04 πŸ”— SketchCow Back.
08:10 πŸ”— SketchCow Boy, whoever thought giving Metafilter a link to "Circlejerk" as a host for a Archiveteam Project is going to not get a christmas card from me.
08:12 πŸ”— ersi Isn't it a little fitting though? That a circlejerk host went up at a circlejerk-site? :)
08:12 πŸ”— ersi I mean, beside that the host being associated to a archive.org boxen
08:14 πŸ”— SketchCow You expect for the part where it sucked, it ruled.
08:15 πŸ”— SketchCow Except.
08:15 πŸ”— SketchCow Anyway, now I have to do a bunch of audits tomorrow.
08:15 πŸ”— ersi that's another big suck
08:17 πŸ”— arrith is there documentation anywhere on archive.org's storage infrastructure in terms of how they ensure data integrity?
08:32 πŸ”— SketchCow Not to an adequate about.
08:32 πŸ”— SketchCow Amount
08:42 πŸ”— alard If a task on archive.org says 'Waiting for admin', I assume the admin will arrive by himself (red lights flashing, sirens blaring)? There's no need to start mailing people, right?
08:44 πŸ”— chronomex archive.org provides you with no guarantees
08:44 πŸ”— chronomex what is the item?
08:46 πŸ”— SketchCow I'm the Archive Teams pimp bitch janitor on administration.
08:46 πŸ”— SketchCow So hit me up, I fast-track
08:46 πŸ”— chronomex pimp bitch janitor
08:46 πŸ”— chronomex that's a new one to me
08:47 πŸ”— alard Basically, there's eight items: archiveteam-mobileme-hero-1 to -8. They all have a task that failed with something like "ssh: connect to host ia700807.us.archive.org port 22: No route to host".
08:48 πŸ”— alard As a result, because the items are blocked, there's currently a list of 127 tasks waiting to run.
08:48 πŸ”— alard It's probably because of the network error underscor mentioned.
08:48 πŸ”— chronomex sounds like the pimp bitches are stuck up, so the pimp bitch janitor is the right man to call
08:49 πŸ”— chronomex sounds like
08:49 πŸ”— chronomex can you poke at one and make it run again?
08:50 πŸ”— alard I certainly can't poke.
08:53 πŸ”— SketchCow OK one moment.
08:54 πŸ”— SketchCow Doing -1 to see how it goes.
08:54 πŸ”— SketchCow If it goes fine, I do all
08:57 πŸ”— db48x I think my computer is broken :(
09:02 πŸ”— * chronomex casts fixing spell, lays on hands
09:05 πŸ”— db48x that's got to be a memory error in the video card
09:05 πŸ”— db48x I've got 'stuck' pixels on my crt
09:06 πŸ”— db48x hrm
09:07 πŸ”— db48x and in text mode during bootup some of the text is wrong
09:07 πŸ”— db48x I can't remember, do the basic text modes store the text buffer on the video card or in main memory?
09:07 πŸ”— arrith well i'm curious if it's all custom or if there's some enterprisey oss thing that does large-scale data integrity, or if it's paperclips and par2s
09:08 πŸ”— chronomex db48x: the video card, I think
09:10 πŸ”— SketchCow db48x: You are owed dinner
09:10 πŸ”— SketchCow I can take you to a nice party on Tuesday
09:10 πŸ”— SketchCow Bourbon, man. Bourbon
09:10 πŸ”— db48x heh
09:12 πŸ”— db48x that's kind of you
09:12 πŸ”— SketchCow Too much jerking around of a good team member
09:12 πŸ”— db48x what's the occasion?
09:13 πŸ”— SketchCow Private party
09:14 πŸ”— db48x could be fun, I suppose, but I'm not a drinker
09:15 πŸ”— db48x memtest86 looks funny with spurious characters
09:15 πŸ”— db48x one of the # in the progress bars invariably becomes a "
09:28 πŸ”— db48x cool
09:28 πŸ”— db48x it's getting worse over time
09:34 πŸ”— alard The queue seems to be moving again. Thanks.
09:41 πŸ”— Nemo_bis alard, how many tars are you going to put in each item?
09:42 πŸ”— alard I think it was 40.
09:42 πŸ”— alard So you'd get 40*5=200GB items.
09:42 πŸ”— Nemo_bis ok
09:43 πŸ”— alard Why? Is that enough/too many?
09:43 πŸ”— alard (Not that it's easy to change now, but anyway.)
09:43 πŸ”— Nemo_bis Just out of curiosity, and to add it to the wiki.
09:44 πŸ”— alard Ha. At least someone is keeping the documentation up to date. :)
09:47 πŸ”— Nemo_bis Yes. If only I didn't have to submit a captcha on each edit.
09:48 πŸ”— db48x that reminds me
09:48 πŸ”— db48x every month or so I see the main page and think about trying to fix the obvious bugs there
09:48 πŸ”— db48x the worst is the misplaced menu on the side
09:48 πŸ”— db48x I think it's missing a style rule
09:49 πŸ”— Nemo_bis alard, bigger users, over 5 GiB, are still put in a single archive, right?
09:49 πŸ”— db48x SketchCow: how is the wiki set up? did you make any changes to the theme at all?
09:50 πŸ”— Nemo_bis db48x, do you mean the sidebar? that's Firefox
09:50 πŸ”— db48x yea
09:50 πŸ”— db48x Nemo_bis: do you have a firefox bug number?
09:50 πŸ”— db48x I'd be greatly surprised if it's really a rendering bug
09:51 πŸ”— Nemo_bis db48x, you have to used AdBlockPlus to block some KHTML thingy
09:51 πŸ”— Nemo_bis it's just an old MediaWiki which doesn't work
09:51 πŸ”— alard Nemo_bis: Yes, archives can be larger than 5GB. The script keeps adding users until the size is at least 5GB, then tars and uploads. So the file size could be 5GB+size of last user.
09:51 πŸ”— Nemo_bis alard, thanks.
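The batching rule alard describes (keep adding users until the running size reaches at least 5GB, then tar and upload) can be sketched roughly as below. This is an illustration only, not the actual mobileme script: `batch_users`, the `(name, size)` input shape, and the decimal-gigabyte constant are all assumptions.

```python
from typing import Iterable, Iterator, List, Tuple

GB = 10**9  # assumption: decimal gigabytes; the real script's unit is unknown


def batch_users(
    users: Iterable[Tuple[str, int]], threshold: int = 5 * GB
) -> Iterator[List[str]]:
    """Group (user, size_in_bytes) pairs into batches of at least `threshold`.

    A batch is closed as soon as its running total reaches the threshold,
    so a batch can be as large as threshold + size of the last user added.
    """
    batch: List[str] = []
    size = 0
    for name, nbytes in users:
        batch.append(name)
        size += nbytes
        if size >= threshold:
            yield batch  # this batch would become one tar file
            batch, size = [], 0
    if batch:
        yield batch  # leftover users that never reached the threshold
```

A single user over 5GB ends up alone in one batch or closes the batch it joins, which matches the "5GB+size of last user" upper bound mentioned above.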
09:57 πŸ”— Nemo_bis alard, and then how will one be able to find a user in all that mass of stuff?
09:58 πŸ”— alard Well, you start downloading the first tar file, see if you're in there. If not, download the next.
09:58 πŸ”— alard No, there is a txt file for each tar that lists the users. :)
09:58 πŸ”— Nemo_bis Oh, how couldn't I think of this.
09:58 πŸ”— Nemo_bis ok
09:59 πŸ”— Nemo_bis oh, silly me, I thought that was the _meta
09:59 πŸ”— alard But eventually, it might be useful to make one big list that points to a specific tar file.
09:59 πŸ”— Nemo_bis Or, those user names could be all put in some ad-hoc metadata tag
09:59 πŸ”— db48x which project are you uploading?
09:59 πŸ”— SketchCow For what it's worth, the mobileme-heros are all progressing properly.
09:59 πŸ”— Nemo_bis So that you can search them
09:59 πŸ”— alard http://ia600808.us.archive.org/6/items/archiveteam-mobileme-hero-1/archiveteam-mobileme-hero-1-8.txt
10:00 πŸ”— SketchCow I agree, as we go, that we will want to consider curating these things.
10:00 πŸ”— alard SketchCow: Yes, thanks for poking.
10:00 πŸ”— SketchCow Over time, we can generate things for them, that will allow better searching down the line.
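The "one big list that points to a specific tar file" idea could look something like the sketch below. It assumes each per-tar txt file holds one username per line, which the log does not confirm; `build_user_index` and the sample file names are hypothetical.

```python
from typing import Dict


def build_user_index(listings: Dict[str, str]) -> Dict[str, str]:
    """Map each username to the tar file whose listing mentions it.

    `listings` maps a tar file name to the text of its companion txt file.
    Assumption: one username per line; blank lines are ignored.
    """
    index: Dict[str, str] = {}
    for tar_name, text in listings.items():
        for line in text.splitlines():
            user = line.strip()
            if user:
                # Keep the first tar seen if a name somehow appears twice.
                index.setdefault(user, tar_name)
    return index
```

With an index like this, finding a user is a single dictionary lookup instead of downloading the txt files one by one.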
10:00 πŸ”— ersi 300k users to GoGo though
10:01 πŸ”— SketchCow I'll push the mobileme heroes to a collection at some point.
10:01 πŸ”— ersi 292k actually
10:01 πŸ”— SketchCow Are these all coming from kenneth's work?
10:01 πŸ”— SketchCow Where are these tars coming from?
10:01 πŸ”— alard These tars are coming from kenneth's instances, plus the one that I'm running.
10:01 πŸ”— SketchCow OK.
10:01 πŸ”— alard It's a pity that the s3 upload is so slow.
10:01 πŸ”— SketchCow So at what rate? I assume we're not back to awful
10:02 πŸ”— SketchCow Or I should say breathtaking
10:02 πŸ”— SketchCow Yes, S3 is a problem.
10:02 πŸ”— ersi How's fortress doing by the way?
10:02 πŸ”— SketchCow I will work with S3's admins to discuss ways to improve.
10:02 πŸ”— SketchCow Fortress got super-ganked by last night's network storm.
10:02 πŸ”— Nemo_bis \o/
10:02 πŸ”— Nemo_bis (that was about s3)
10:02 πŸ”— ersi just curious since I've been pushing 30-50Mbit/s to it for quite a while ^_^
10:02 πŸ”— db48x oh, nice
10:03 πŸ”— db48x you guys have been making good progress
10:03 πŸ”— db48x I'm nearly out of the top 10
10:03 πŸ”— ersi >:]
10:04 πŸ”— db48x I will have to acquire some free space somehow
10:05 πŸ”— ersi I got another 0.5-1TiB coming online soon~
10:06 πŸ”— db48x ok, memtest has done a few passes
10:06 πŸ”— db48x nothing apparently wrong with my real memory
10:06 πŸ”— db48x just the video card
10:06 πŸ”— db48x I wonder which of the three is the one that's broken...
10:09 πŸ”— * db48x follows the cabling
10:11 πŸ”— db48x ouch :(
10:11 πŸ”— db48x burnt myself :(
10:11 πŸ”— db48x thing is HOT
10:14 πŸ”— SketchCow OK, going to bed
10:14 πŸ”— SketchCow tomorrow morning, off to work to face the music, and get some shit done
10:14 πŸ”— SketchCow DONE I tell you
10:15 πŸ”— ersi [1]+ Done
12:21 πŸ”— godane is there a archive of 2600 magzines on archive.org?
12:29 πŸ”— ersi there is a search feature on archive.org
12:34 πŸ”— godane i couldn't find them using search
12:38 πŸ”— db48x :)
14:56 πŸ”— blink_ Hello all
14:58 πŸ”— dnova hello
15:00 πŸ”— blink_ just out of curiosity, does anyone know that there is a warning that the archive team site might be compromised?
15:01 πŸ”— Nemo_bis blink_, a warbning where?
15:02 πŸ”— blink_ i tried to go there a minute ago, and my browser showed a warning
15:02 πŸ”— blink_ and this "This site may be compromised.
15:02 πŸ”— blink_ Viagra levitra comparison, order levitra - Pill store, best prices!. Save money with generics. Money back guarantee!"
15:05 πŸ”— emijrp on google
15:05 πŸ”— emijrp i guess it is due to spammers editing the wiki
15:06 πŸ”— blink_ so...its safe to click?
15:06 πŸ”— emijrp viagra is safe
15:06 πŸ”— blink_ ha
15:06 πŸ”— blink_ i means the site itself
15:06 πŸ”— emijrp yes
15:07 πŸ”— blink_ okies
15:10 πŸ”— Nemo_bis sigh, why don't people just use the URL bar
15:11 πŸ”— ersi because they use the URL bar
15:12 πŸ”— ersi and go to google, then type where they want and then they feel lucky!
15:13 πŸ”— Nemo_bis he didn't feel lucky, or he wouldn't have seen the warning :p
15:57 πŸ”— balrog_ph TPB down for anyone?
15:59 πŸ”— ersi balrog: Nope. It's up for me.
15:59 πŸ”— ersi but it's been shakier than a heroinist with withdrawal symptoms of lately
16:00 πŸ”— balrog it was going up and down today
16:02 πŸ”— ersi note my earlier line
16:04 πŸ”— balrog yea I see
16:23 πŸ”— emijrp ok
16:30 πŸ”— db48x uh, wtf
16:30 πŸ”— db48x I swapped out my video card
16:30 πŸ”— db48x now I can see the bios drawing each individual character one row at a time
16:30 πŸ”— db48x it hasn't even gotten past the first page of the post process yet
16:37 πŸ”— kennethre memories
16:49 πŸ”— emijrp i need some volunteers to archive http://commons.wikimedia.org, it contains 12M files, but the first chunk is about 1M files ~= 500 gb, that chunk is made of daily chunks
16:49 πŸ”— emijrp im going to upload the script and feed list
16:50 πŸ”— emijrp #wikiteam
16:59 πŸ”— Ymgve don't wikimedia sites have easily downloadable archives?
17:00 πŸ”— Nemo_bis not if images
17:01 πŸ”— Nemo_bis Ymgve, https://wikitech.wikimedia.org/view/Dumps/Image_dumps
17:03 πŸ”— Ymgve don't worry, after all male genitalia has been removed, the remaining collection will be 2gb
17:05 πŸ”— DFJustin lmao
17:07 πŸ”— SketchCow Good compression technique
17:08 πŸ”— SketchCow I show uploading via S3 is about 2 hours for 50gb
17:08 πŸ”— emijrp no, really, they are deleting pics because some countries doesn't have "freedom of panorama", so many pics about monuments are biting the dust
17:08 πŸ”— DFJustin somebody at IA is doing these already right http://www.archive.org/details/wikimediadownloads
17:08 πŸ”— DFJustin can't they just add image dumps to the list
17:09 πŸ”— emijrp DFJustin: that is only text
17:09 πŸ”— DFJustin or are we ripping the images directly
17:09 πŸ”— emijrp this new project is only images
17:09 πŸ”— emijrp texts + images = win
17:09 πŸ”— DFJustin yes ofc
17:10 πŸ”— SketchCow Well, the liberator is liberated.
17:10 πŸ”— SketchCow I can't stop that.
17:12 πŸ”— DFJustin I guess we are ripping the images directly, in that case carry on :)
17:13 πŸ”— db48x freedom of panorama?
17:13 πŸ”— Nemo_bis SketchCow, is there some public graph for network stats of the s3 hosts?
17:14 πŸ”— SketchCow Not really in the way I'd like.
17:14 πŸ”— DFJustin db48x: US law is you can take pictures of buildings, statues, etc if they are in public and do whatever you want with the pictures
17:15 πŸ”— DFJustin some other countries you need permission from whoever's stuff it is you're photographing
17:15 πŸ”— DFJustin to over-simplify
17:16 πŸ”— Nemo_bis emijrp, it's not you DDOSing WMF servers, is it? :)
17:16 πŸ”— balrog so multiupload seems dead for good
17:16 πŸ”— balrog :|
17:16 πŸ”— SketchCow Someone help me here.
17:16 πŸ”— emijrp Nemo_bis: no, but toolserver is slow as hell
17:17 πŸ”— Nemo_bis emijrp, don't complain, wikis are down :p
17:17 πŸ”— SketchCow 1. http://jliberate.k-srv.info/index.php - who is that, and whos hosting it.
17:17 πŸ”— SketchCow 2. The bookmarklet sends the documents "somewhere". Where is that?
17:17 πŸ”— Nemo_bis emijrp, but here's one of the responsibles for that https://ru.wikipedia.org/wiki/%D0%9E%D0%B1%D1%81%D1%83%D0%B6%D0%B4%D0%B5%D0%BD%D0%B8%D0%B5_%D1%83%D1%87%D0%B0%D1%81%D1%82%D0%BD%D0%B8%D0%BA%D0%B0:Dmitry89#Toolserver
17:17 πŸ”— SketchCow Like, are we actually getting those documents?
17:18 πŸ”— DFJustin yeah I was wondering about that, I don't know that the backend ever got hooked up for that
17:18 πŸ”— Nemo_bis db48x, https://commons.wikimedia.org/wiki/Commons:FOP
17:20 πŸ”— Nemo_bis emijrp, <domas> DDoS in progress, please be silent
17:27 πŸ”— SketchCow https://twitter.com/#!/JSTOR/status/174155323668574208
17:27 πŸ”— Nemo_bis :O
17:30 πŸ”— SketchCow https://twitter.com/#!/archiveteam/status/174184820782542849
17:30 πŸ”— SketchCow I'd like that http://jliberate.k-srv.info/index.php to go down
17:31 πŸ”— SketchCow Oh, I see it's an undersco2 special
17:31 πŸ”— db48x Nemo_bis: sheesh
17:32 πŸ”— emijrp SketchCow: what format is prefered for the 1,000-5,000 image packages? zip, tar, 7z ? it includes also one .xml per image with the wikitext description metadata
17:33 πŸ”— SketchCow zip.
17:33 πŸ”— emijrp ok
17:34 πŸ”— SketchCow Oh goddamnit.
17:34 πŸ”— yipdw I think ArchiveTeam needs a gravatar
17:35 πŸ”— yipdw the middle finger GIF will do
17:35 πŸ”— SketchCow http://tracker.archive.org/
17:35 πŸ”— SketchCow See that? That's the sound of someone going into The Hole for a week.
17:35 πŸ”— DFJustin wow
17:35 πŸ”— yipdw SketchCow: wtf
17:42 πŸ”— SketchCow A shame, too, because I had actually calmed down.
17:42 πŸ”— SketchCow Whih is are, I haven't calmed down since 1996
17:43 πŸ”— SketchCow Wow, typing is gone, I blame the keyboard.
17:50 πŸ”— SketchCow Well, lost my my line place for the shower.
17:52 πŸ”— emijrp what is the best way to zip a directory?
17:52 πŸ”— emijrp zip a.zip folder/subfolder/*
17:52 πŸ”— emijrp ?
17:53 πŸ”— SketchCow zip -9 -r azip topmostfolder
17:53 πŸ”— SketchCow zip -9 -r a.zip topmostfolder
17:55 πŸ”— emijrp if i dont add /* to end, it doesnt work
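For anyone who would rather do the recursive zip from a script, here is a rough Python equivalent of `zip -9 -r a.zip topmostfolder` using only the standard library. The function name `zip_tree` is made up for this sketch, and unlike `zip -r` it stores files only, so empty directories are not recorded.

```python
import os
import zipfile


def zip_tree(top: str, archive: str) -> None:
    """Recursively zip the folder `top` into `archive` at maximum deflate level.

    Entry names start with the folder's own name, like `zip -r a.zip folder`.
    Keep `archive` outside `top`, or the archive would try to include itself.
    """
    top = os.path.abspath(top)
    base = os.path.dirname(top)  # entries are stored relative to the parent
    with zipfile.ZipFile(
        archive, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9
    ) as zf:
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                zf.write(full, os.path.relpath(full, base))
```

Walking the tree explicitly avoids the `folder/subfolder/*` glob problem: a shell glob only expands one level, while the walk descends into every subfolder.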
17:58 πŸ”— db48x I don't really feel like doing any real work today
18:15 πŸ”— alard SketchCow: If you have a moment, could you rerun this task? http://www.us.archive.org/log_show.php?task_id=97292517 Another network error. [archiveteam-mobileme-hero-4]
18:16 πŸ”— SketchCow For the record, we're murdering the s3 interface.
18:17 πŸ”— SketchCow But I think it needs to be murdered, increased in benefit by working better.
18:19 πŸ”— alard Should we go slower? (As far as I can see it's still responsive, so we open fewer connections than yesterday.)
18:19 πŸ”— alard Heroku - archive.org : 1 - 0 ?
18:24 πŸ”— Coderjoe SketchCow: any progress on getting the friendster off that USB drive I sent? If not, I'd be willing to switch to JFS or even just a tarball-in-partition or whatever, if the drive were sent back here.
18:26 πŸ”— SketchCow We should go a little slower, if possible.
18:26 πŸ”— SketchCow Coderjoe: I've not had a chance to look at it
18:26 πŸ”— SketchCow But I will
18:27 πŸ”— SketchCow I am willing to pay for a second drive so you can try again
18:28 πŸ”— Coderjoe alright. I can do that. and it can be a bare SATA drive, too.
18:29 πŸ”— Coderjoe I was planning to use that USB drive for other stuff later.
18:29 πŸ”— Coderjoe hmm
18:30 πŸ”— Coderjoe I could apparently be at JSTOR in a few hours...
18:30 πŸ”— Coderjoe if I were already going that way, it might be amusing to go knock on their door
18:32 πŸ”— Coderjoe at least based on the location in their twitter profile
18:33 πŸ”— SketchCow I've been talking to archive.org about it.
18:33 πŸ”— SketchCow They already have a link in back. JSTOR wants to give archive.org a copy of their PD stuff.
18:33 πŸ”— Coderjoe cool
18:33 πŸ”— SketchCow It's just arrangement now.
18:34 πŸ”— SketchCow Hence this liberator, which is redundant and non-supported, is not needed.
18:38 πŸ”— db48x I think we need to outlaw pollen
18:39 πŸ”— DFJustin if pollen is outlawed, then only outlaws will have pollen!
18:39 πŸ”— dnova that would pretty much wipe out all above-water life
18:39 πŸ”— Coderjoe dnova: I'm pretty sure it would also have a detremental effect on underwater life as well
18:40 πŸ”— db48x we would have to re-engineer our plant life, granted
18:40 πŸ”— db48x they can all just use wifi or something
18:41 πŸ”— SketchCow OK, I need to drive to work.
18:41 πŸ”— SketchCow Let's try and not have a massive political and reputational blowup in the hour commute.
18:42 πŸ”— dnova you're asking the impossible
18:42 πŸ”— SketchCow I know, I am just required to ask under the terms of my gitmo parole
18:42 πŸ”— dnova oh, just "try"... ok.
18:42 πŸ”— db48x SketchCow: why do you have an hour-long commute in a town you are visiting?
18:43 πŸ”— SketchCow I stay in San Jose
18:43 πŸ”— emijrp SketchCow: zip browser works whit subfolders?
18:43 πŸ”— db48x huh
18:43 πŸ”— SketchCow emijrp: Yes.
18:44 πŸ”— SketchCow http://www.archive.org/download/SuperBlue/pc_blue_ii.zip/
18:45 πŸ”— emijrp great
18:46 πŸ”— emijrp ok, we can make some tests http://www.archiveteam.org/index.php?title=Wikimedia_Commons#Archiving_process
18:46 πŸ”— emijrp i have tested the script, but, i hope we can find any error before we start to DDoS wikipedia servers
18:47 πŸ”— emijrp by the way, my upload stream is shit, so, i wont use this script much, irony
18:48 πŸ”— db48x heh
18:49 πŸ”— emijrp file list available from 2004-09-07 to 2006-12-31
18:49 πŸ”— db48x throw that thing up on github :)
18:49 πŸ”— db48x pastebins are... annoying
18:49 πŸ”— emijrp i will create further lists in the next days
18:50 πŸ”— emijrp db48x: http://code.google.com/p/wikiteam/source/browse/trunk/commonsdownloader.py
18:52 πŸ”— db48x emijrp: perfect
19:09 πŸ”— db48x emijrp: seems to be working
19:09 πŸ”— Nemo_bis db48x, you edit conflicted me, why didnj't you join #wikiteam? :p
19:09 πŸ”— emijrp db48x: ok, please go to #wikiteam we are coordinating there
19:12 πŸ”— emijrp SketchCow: why isnt there a link to the browsable version from here http://www.archive.org/details/SuperBlue ?
19:14 πŸ”— db48x heh
19:17 πŸ”— Nemo_bis emijrp, this is how they generate links: http://www.archive.org/details/archiveteam-googlegroups-jz
19:17 πŸ”— Nemo_bis in The Right Way
19:18 πŸ”— Nemo_bis trailing slash is the trick I suppose
19:22 πŸ”— topaz http://archiveteam.org does not appear to be a helpful URL. :-)
19:22 πŸ”— topaz (or is that old news?)
19:23 πŸ”— db48x topaz: yea, the front page is a little out of date
19:23 πŸ”— db48x splinder is finished, but mobileme is still going
19:23 πŸ”— topaz right now the front page looks like a viagra ad on my browser.
19:24 πŸ”— db48x nice
19:24 πŸ”— db48x your user agent is set to the googlebot
19:24 πŸ”— chronomex looks fine from my phone
19:25 πŸ”— db48x the wiki is occasionally slightly hacked
19:25 πŸ”— db48x if it thinks you're google then it's a pharma scam
19:26 πŸ”— Coderjoe topaz: using a googlebot UA?
19:26 πŸ”— topaz no, using Chrome
19:26 πŸ”— Coderjoe oh hi. I ken read 5 lines up
19:26 πŸ”— topaz I was spoofing user agents some time ago but I thought I'd turned all that off, double checking now
19:26 πŸ”— Coderjoe perhaps the UA hack the crackers put in just checks for "google" and not "googlebot"
19:27 πŸ”— Coderjoe UA masking of spammy crap is unfortunately not uncommon
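One way to test the user-agent theory is to request the same URL under several User-Agent strings and compare what comes back. The sketch below is an assumption-laden illustration, not a tool anyone in the channel used: `fetch`, `digests_by_user_agent`, and `looks_cloaked` are hypothetical names, and a dynamic page can differ between requests for innocent reasons, so differing hashes are a hint rather than proof.

```python
import hashlib
import urllib.request
from typing import Callable, Dict, Iterable


def fetch(url: str, user_agent: str) -> bytes:
    """Download `url` while presenting the given User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def digests_by_user_agent(
    url: str,
    agents: Iterable[str],
    fetcher: Callable[[str, str], bytes] = fetch,
) -> Dict[str, str]:
    """Return a SHA-256 digest of the response body for each User-Agent."""
    return {ua: hashlib.sha256(fetcher(url, ua)).hexdigest() for ua in agents}


def looks_cloaked(digests: Dict[str, str]) -> bool:
    """True if at least two User-Agents received different bodies."""
    return len(set(digests.values())) > 1
```

This only varies the User-Agent; it cannot detect cloaking keyed on the client's IP address or referrer.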
19:27 πŸ”— topaz yeah, I'm not overriding the user-agent.
19:28 πŸ”— Coderjoe refresh?
19:28 πŸ”— Coderjoe I'm not seeing viagra as googlebot
19:29 πŸ”— topaz I still am.
19:29 πŸ”— topaz hmm, hang on
19:29 πŸ”— topaz yeah, still spam
19:29 πŸ”— Coderjoe I turned off adblock and refreshed as googlebot and still am not
19:29 πŸ”— Coderjoe are you perhaps using a hacked proxy?
19:30 πŸ”— topaz conceivable. I'm at work and have not examined the network settings closely. I'm getting the same result when I use Safari.
19:32 πŸ”— topaz but my browser user-agent string is "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11" (just confirmed against my own web server)
19:45 πŸ”— LimbClock i've been meaning to ask, have you guys made a siterip of Old Man Murray yet?
19:55 πŸ”— topaz anyway, I know none of the rest of you are seeing the viagra spam on the archiveteam.org page, but who'd be responsible for fixing it anyway? :-)
19:57 πŸ”— ersi SketchCow as usual. He'll get to it
19:57 πŸ”— ersi it's not like it hasn't happened before
19:57 πŸ”— topaz ah ok
19:57 πŸ”— ersi dreamhost is a turd
19:57 πŸ”— ersi shiny shiny unlimited turd
19:57 πŸ”— balrog topaz: why are you seeing it?
19:57 πŸ”— topaz balrog: haven't been able to figure it out.
19:57 πŸ”— balrog you're sure you don't have a rootkit?
19:57 πŸ”— ersi he's probably saying he's googlebot in his useragent string
19:57 πŸ”— topaz seeing it in both Chrome and Safari, OSX
19:58 πŸ”— ersi wät
19:58 πŸ”— topaz ersi: unmodified user-agent
19:58 πŸ”— topaz scroll back
19:58 πŸ”— topaz I'm happy to conduct more experiments but not sure where else to look on my end.
19:58 πŸ”— ersi yeah, I was too lazy to connect the dots. but yes, I see now~
20:04 πŸ”— topaz I seriously doubt that my laptop has been rooted. Not that it couldn't happen, but a root that only announces itself on specific URLs like this?
20:04 πŸ”— ersi It's most likely the wiki
20:04 πŸ”— emijrp topaz: perhaps, next time google index AT wiki, it will be fiexed
20:05 πŸ”— topaz emijrp: doesn't have anything to do with the Google index.
20:06 πŸ”— emijrp then ?
20:10 πŸ”— topaz precisely my question.
20:11 πŸ”— kennethre topaz: dns maybe
20:11 πŸ”— kennethre google things archiveteam.org is a potential threat
20:11 πŸ”— kennethre *thinks
20:12 πŸ”— kennethre http://cl.ly/1Y473S411Z2I1c2I0V31
20:12 πŸ”— topaz kennethre: doubt it's DNS poisoning. the page source is littered with what looks like real references from the archiveteam.org wiki.
20:12 πŸ”— kennethre topaz ^
20:12 πŸ”— kennethre topaz: google agrees with you http://cl.ly/1Y473S411Z2I1c2I0V31
20:13 πŸ”— topaz yeah, so the wiki is returning viagra spam to google to poison the search engine cache. what I can't figure out is why it's returning it to my browser too,.
20:13 πŸ”— kennethre hahahaha
20:13 πŸ”— kennethre that's awesome
20:14 πŸ”— kennethre topaz: you're using chrome?
20:14 πŸ”— topaz and, sorry, I know this has gone beyond anything that the rest of you folks can do anything about :-)
20:14 πŸ”— topaz kennethre: I'm seeing it in both Chrome and Safari on OSX. unmodified user-agent strings.
20:14 πŸ”— kennethre topaz: i wonder if the page pre-fetching has a different user agent
20:14 πŸ”— Coderjoe kennethre: I'm not seeing it with a googlebot UA
20:14 πŸ”— kennethre well, why are we poisioning the cache?
20:14 πŸ”— kennethre that's rediculous
20:15 πŸ”— kennethre *ridiculous
20:15 πŸ”— ersi The wiki has gotten viagra aids before, several times
20:15 πŸ”— kennethre oh bots
20:15 πŸ”— ersi yeah
20:16 πŸ”— kennethre haha
20:16 πŸ”— kennethre viagra aids
20:16 πŸ”— ersi I rule etc
20:17 πŸ”— topaz I get it even if I explicitly mask myself as IE9. Amazing.
20:17 πŸ”— topaz I can only imagine that the hacked code is returning it based on my IP address for some bizarre reason
20:28 πŸ”— db48x why is my latency so high?
20:33 πŸ”— ersi db48x: because someone is snoopin' on yer tubes
20:52 πŸ”— SketchCow MORNING
20:52 πŸ”— SketchCow OK AM AT THE ARCHIVE AGAIN
20:53 πŸ”— hybernaut topaz: are you reloading the page with pragma: no-cache?
20:54 πŸ”— DFJustin hide your kids hide your files
20:54 πŸ”— hybernaut (cmd-shift-R on Chrome/Mac)
20:55 πŸ”— topaz yes, I'm doing force-reload with command-shift-R
20:55 πŸ”— hybernaut topaz: can you do it again with Developer Tools open to the Network tab?
20:55 πŸ”— topaz sure, sec
20:56 πŸ”— hybernaut just curious if you're getting a 200 OK response, or a 304 Not Modified
20:56 πŸ”— SketchCow Wow, everyone has questions for me.
20:56 πŸ”— topaz Developer Tools confirms that I am getting objects
20:57 πŸ”— SketchCow * Why does CD-ROM not have link to browseable version? We added iso and zip browsing after it was uploaded.
20:57 πŸ”— topaz a long list of 200 OK, not seeing any 304 responses
20:57 πŸ”— SketchCow * Why does archiveteam.org have spam? Dreamhost blows, looking for a new host now.
20:57 πŸ”— hybernaut topaz: any proxy headers in the response headers?
20:57 πŸ”— topaz SketchCow: mainly I was trying to figure out why I'm the only one who seems to be seeing it :-)
20:57 πŸ”— SketchCow It's because your user agents are hipster.
20:57 πŸ”— SketchCow That's why
20:57 πŸ”— topaz (which is now primarily of interest to me, but whatever)
20:57 πŸ”— topaz NO IT IS NOT
20:58 πŸ”— topaz dude :-)
20:58 πŸ”— SketchCow Yes... it is.
20:58 πŸ”— SketchCow It spams based on user-agent string.
20:58 πŸ”— topaz ok, so if I were to switch my user agent to match, say, IE9, it should work right?
20:58 πŸ”— topaz because it doesn't.
20:59 πŸ”— emijrp but does Google send different result description by user-agent? :/
21:00 πŸ”— hybernaut Greetings, Doctor SketchCow: my service are at your…um, service
21:00 πŸ”— topaz hybernaut: where would I find proxy headers? sorry, I haven't dug that deep into Developer Tools :-/
21:01 πŸ”— hybernaut topaz: select the index.php entry, then look at Response Headers
21:01 πŸ”— hybernaut but I don't know anything about the wiki's perverse behavior; I am just curious
21:02 πŸ”— topaz ah, I see. no, I'm not seeing any proxy response headers there.
21:03 πŸ”— SketchCow OK, first.
21:04 πŸ”— SketchCow Im switching archiveteam.org hosts, right now.
21:04 πŸ”— SketchCow Thank you for enrolling your card to be billed in your local currency!
21:04 πŸ”— topaz :-)
21:06 πŸ”— kennethre SketchCow: webfaction is awesome
21:06 πŸ”— kennethre SketchCow: if you don't already have something planned
21:07 πŸ”— ersi emijrp: AFAIK they don't, but they do on other factors such as which ccTLD you came to, language settings, if you are logged in to a Google account or not
21:08 πŸ”— SketchCow We are switching to hostgato.
21:08 πŸ”— SketchCow Hostgator.
21:08 πŸ”— SketchCow I know there's a bunch, I'm just liking Hostgator.
21:08 πŸ”— SketchCow Just bought a 3 year membership.
21:08 πŸ”— alard Do you want feedback? :)
21:10 πŸ”— SketchCow What, is there something wrong with hostgator?
21:11 πŸ”— ersi There's always someone not liking a shared hosting provider
21:11 πŸ”— SketchCow Yeah, but I figured I'd let him say it anyway.
21:12 πŸ”— alard No, but since we recently established you like feedback on business cards and engagements, I thought we might have to find something on hosting providers too.
21:12 πŸ”— SketchCow Never let it be said I don't listen to feedback (while watching the price is right, and jerking off)
21:12 πŸ”— SketchCow SHOWCASE........SHOWDOWN
21:12 πŸ”— ersi Price is right? Not wheel of fortune? :(
21:12 πŸ”— ersi Sorry, let me fix that
21:12 πŸ”— SketchCow Wheel of fortune rounds end too quick
21:12 πŸ”— ersi WHEEL. OF. FORTUNE!
21:13 πŸ”— SketchCow So, Sam, the archive.org admin will be coming in to talk with me, kennethre, alard about maximizing the S3 interface
21:13 πŸ”— SketchCow We're the big test, we're going to find all the limits.
21:14 πŸ”— SketchCow The explosion on the weekend was finding one, since repaired somewhat.
21:14 πŸ”— SketchCow Soon we can go to max.
21:14 πŸ”— alard http://duckduckgo.com/?q=hostgator+sucks
21:14 πŸ”— alard :)
21:15 πŸ”— kennethre SketchCow: fantastic :)
21:15 πŸ”— ersi Um, should dld-me really output to /dev/null for public.me.com? :o
21:15 πŸ”— SketchCow 74.52.105.105 is our ip
21:16 πŸ”— ersi hm, seems like it's writing to the warc though, that's good
21:16 πŸ”— SketchCow http://duckduckgo.com/?q=dreamhost+sucks&t=1
21:16 πŸ”— alard ersi: Yes, it should. The files are saved in the warc, so there's no need to write them to disk.
21:16 πŸ”— alard The web is full of hate.
21:16 πŸ”— ersi Hate Machine
21:16 πŸ”— SketchCow http://duckduckgo.com/?q=webfaction+sucks&t=1
21:16 πŸ”— ersi http://duckduckgo.com/?q=archiveteam+sucks
21:16 πŸ”— kennethre they all http://duckduckgo.com/?q=heroku+sucks&t=1
21:16 πŸ”— ersi wät
21:16 πŸ”— kennethre oh look nothing
21:16 πŸ”— kennethre :)
21:17 πŸ”— SketchCow http://duckduckgo.com/?q=your+favorite+sucks&t=1
21:17 πŸ”— kennethre haha
21:18 πŸ”— SketchCow MediaWiki is a free software open source wiki package written in PHP, originally for use on Wikipedia.
21:18 πŸ”— SketchCow Version: 1.18.1
21:18 πŸ”— SketchCow OK, we're installing that.
21:20 πŸ”— SketchCow I'm going to need help with the transfer, imagine that.
21:20 πŸ”— SketchCow But it's the only way to ensure clearing
21:20 πŸ”— DFJustin "maximize the s3 interface" sounds like something you could do after remodulating the deflector dish
21:21 πŸ”— kennethre reverse the polarity
21:27 πŸ”— SketchCow I love messing with hosts files so much
21:30 πŸ”— SketchCow OK, who wants to help transfer this thing to the new box.
21:31 πŸ”— Coderjoe alard: wasn't there an issue with wget not being able to look for links in files downloaded to /dev/null?
21:32 πŸ”— alard Yes, but > /dev/null is only for public.me.com, where --mirror is not used.
21:32 πŸ”— alard web and homepage are --mirror and rm -rf files/
21:40 πŸ”— ersi yeah
21:43 πŸ”— SketchCow OH BOY I LOVE SWITCHING SERVERS
21:44 πŸ”— SketchCow Just learning hostgator's himblehabble
21:54 πŸ”— Coderjoe ersi: ah
21:55 πŸ”— Coderjoe alard: are the ones that need a temporary output location able to use a specified location? (to allow tmpfs/ramfs, for example?)
21:57 πŸ”— alard Yes, if you're careful you can replace the references to the files/ subdirectory with something else. (But be extra careful if you're running multiple scripts in the same directory.)
21:57 πŸ”— alard Is I/O a problem?
21:58 πŸ”— Coderjoe i don't know offhand, for mobileme. i think it was for another project
21:59 πŸ”— alard An easier optimization: set the wget-warc tempdir (if the scripts don't already do that).
21:59 πŸ”— alard Every warc record is first written to a temporary file.
22:00 πŸ”— alard --warc-tempdir
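As a rough picture of the setup alard describes (bodies discarded, everything kept in the warc, temp records redirected), a command line could be assembled like this. It is a sketch under assumptions: the real dld-me script is not shown in this log, `wget_warc_command` is a hypothetical helper, and the paths in the usage note are placeholders. Only the wget options `--warc-file`, `--warc-tempdir`, and `--output-document` are taken as given.

```python
from typing import List


def wget_warc_command(
    url: str, warc_name: str, tempdir: str, discard_body: bool = True
) -> List[str]:
    """Build an argv list for a WARC-recording wget run.

    tempdir:      where wget writes each WARC record before appending it,
                  e.g. a tmpfs mount to keep that I/O off the disk.
    discard_body: send the downloaded file to /dev/null; only suitable for
                  non-recursive fetches, since wget cannot then parse the
                  file for further links.
    """
    cmd = ["wget", f"--warc-file={warc_name}", f"--warc-tempdir={tempdir}"]
    if discard_body:
        cmd.append("--output-document=/dev/null")
    cmd.append(url)
    return cmd
```

The list can be passed to `subprocess.run(cmd, check=True)`; for example `wget_warc_command(url, "user-example", "/dev/shm/warc-tmp")`, where both the warc name and the tmpfs path are made up.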
22:17 πŸ”— alard SketchCow: For planning, when would Sam come in to talk about the s3 api?
22:18 πŸ”— SketchCow He just went into a meeting. Here, hit him up: samuel@archive.org
22:39 πŸ”— SketchCow HYBERNAUT SAYS HI
22:39 πŸ”— SketchCow Someone give him something to do
22:40 πŸ”— hybernaut Greetings!
22:41 πŸ”— alard Hi.
22:41 πŸ”— SketchCow Oh, so many project.
22:41 πŸ”— hybernaut I have skillz in the Ruby, JavaScript, databases
22:41 πŸ”— hybernaut and competent knowledge of teh unix command line
22:43 πŸ”— ersi I got skills at making worthless remarks, like this one
22:45 πŸ”— hybernaut well if I can help with something, I will have more worthful remarks to make, too
22:45 πŸ”— hybernaut otherwise, I will lurk politely, sorry
22:54 πŸ”— ersi like this one meant mine, not your
22:57 πŸ”— SketchCow New wiki is functioning, but it has no data from the old wiki... yet.
22:57 πŸ”— SketchCow I hate this work, by the way
23:01 πŸ”— hybernaut are you copying the database, or do you have another plan?
23:14 πŸ”— alard SketchCow: Just sent the email to Sam.
23:14 πŸ”— alard Now going to bed. Bye.
23:51 πŸ”— SketchCow Thanks, alard
