Time |
Nickname |
Message |
00:52
π
|
closure |
undersco2: worst bit is they had the non-archive url at first and then changed it |
01:06
π
|
chronomex |
what was the original url? |
01:07
π
|
undersco2 |
circlejerk.k-srv.info |
01:07
π
|
undersco2 |
but it was just a CNAME to an archive box, which was fucking dumb |
01:10
π
|
closure |
classy they took the article down |
01:11
π
|
undersco2 |
very |
01:11
π
|
closure |
meanwhile, wikileaks is baaack |
01:11
π
|
undersco2 |
jason's friends with a lot of them though |
01:11
π
|
undersco2 |
hahaha, I saw that |
01:11
π
|
closure |
he paid the $2 |
01:12
π
|
NovaKing |
http://pastebin.com/D7sR4zhT Wikileaks Stratfor dump out right now too |
01:13
π
|
kennethre |
closure:I couldn't find the data |
01:13
π
|
closure |
well, it's wikileaks, you just wait for the leaked torrent |
01:14
π
|
closure |
and then you mairix the fucker and have some datamining fun |
01:15
π
|
undersco2 |
haha |
01:16
π
|
Soojin |
https://www.youtube.com/watch?v=DKjd3plMCYA - Taiwan's last printing press |
01:29
π
|
NovaKing |
undersco2: see pm? |
01:30
π
|
undersco2 |
replied |
02:11
π
|
dcmorton |
uh oh.. routing loop inside of archive.org's network |
02:12
π
|
undersco2 |
yeah, saw that |
02:14
π
|
dcmorton |
of course it had to happen when the ~35 gig rsync was 98% done |
02:31
π
|
chronomex |
yum |
02:32
π
|
chronomex |
packets, packets, eat them up, yum! |
02:43
π
|
undersco2 |
dcmorton: :( |
02:43
π
|
undersco2 |
one of the DCs is offline (the one where fos lives) |
02:43
π
|
undersco2 |
So that's why there's a loop |
03:03
π
|
dcmorton |
issues look to be resolved now.. working for me at least |
03:26
π
|
kennethre |
it's really nice to see that number going up again |
04:59
π
|
undersco2 |
<joepie91> HAHAHA: |
04:59
π
|
undersco2 |
<joepie91> For example, while it's likely that Delicious has a long-term strategy for becoming a |
04:59
π
|
undersco2 |
<joepie91> profitable business, it's almost certain that Yahoo has simply forgotten that Yahoo |
04:59
π
|
undersco2 |
<joepie91> Bookmarks still exists, and will shut it down as soon as a manager accidentally stumbles |
04:59
π
|
undersco2 |
<joepie91> across their office. |
05:47
π
|
undersco2 |
https://knol-redirects.appspot.com/faq.html |
05:52
π
|
Coderjoe |
shut down which? |
05:56
π
|
undersco2 |
Yahoo bookmarks |
05:56
π
|
undersco2 |
(if you're referring to that copypata) |
05:57
π
|
undersco2 |
pasta* |
05:58
π
|
Coderjoe |
yes |
05:58
π
|
Coderjoe |
blargh. bed |
06:37
π
|
chronomex |
bed? already? |
08:04
π
|
SketchCow |
Back. |
08:10
π
|
SketchCow |
Boy, whoever thought of giving Metafilter a link to "Circlejerk" as a host for an Archiveteam Project is going to not get a christmas card from me. |
08:12
π
|
ersi |
Isn't it a little fitting though? That a circlejerk host went up at a circlejerk-site? :) |
08:12
π
|
ersi |
I mean, besides the host being associated with an archive.org boxen |
08:14
π
|
SketchCow |
You expect for the part where it sucked, it ruled. |
08:15
π
|
SketchCow |
Except. |
08:15
π
|
SketchCow |
Anyway, now I have to do a bunch of audits tomorrow. |
08:15
π
|
ersi |
that's another big suck |
08:17
π
|
arrith |
is there documentation anywhere on archive.org's storage infrastructure in terms of how they ensure data integrity? |
08:32
π
|
SketchCow |
Not to an adequate about. |
08:32
π
|
SketchCow |
Amount |
08:42
π
|
alard |
If a task on archive.org says 'Waiting for admin', I assume the admin will arrive by himself (red lights flashing, sirens blaring)? There's no need to start mailing people, right? |
08:44
π
|
chronomex |
archive.org provides you with no guarantees |
08:44
π
|
chronomex |
what is the item? |
08:46
π
|
SketchCow |
I'm the Archive Team's pimp bitch janitor on administration. |
08:46
π
|
SketchCow |
So hit me up, I fast-track |
08:46
π
|
chronomex |
pimp bitch janitor |
08:46
π
|
chronomex |
that's a new one to me |
08:47
π
|
alard |
Basically, there's eight items: archiveteam-mobileme-hero-1 to -8. They all have a task that failed with something like "ssh: connect to host ia700807.us.archive.org port 22: No route to host". |
08:48
π
|
alard |
As a result, because the items are blocked, there's currently a list of 127 tasks waiting to run. |
08:48
π
|
alard |
It's probably because of the network error underscor mentioned. |
08:48
π
|
chronomex |
sounds like the pimp bitches are stuck up, so the pimp bitch janitor is the right man to call |
08:49
π
|
chronomex |
sounds like |
08:49
π
|
chronomex |
can you poke at one and make it run again? |
08:50
π
|
alard |
I certainly can't poke. |
08:53
π
|
SketchCow |
OK one moment. |
08:54
π
|
SketchCow |
Doing -1 to see how it goes. |
08:54
π
|
SketchCow |
If it goes fine, I do all |
08:57
π
|
db48x |
I think my computer is broken :( |
09:02
π
|
* |
chronomex casts fixing spell, lays on hands |
09:05
π
|
db48x |
that's got to be a memory error in the video card |
09:05
π
|
db48x |
I've got 'stuck' pixels on my crt |
09:06
π
|
db48x |
hrm |
09:07
π
|
db48x |
and in text mode during bootup some of the text is wrong |
09:07
π
|
db48x |
I can't remember, do the basic text modes store the text buffer on the video card or in main memory? |
09:07
π
|
arrith |
well i'm curious if it's all custom or if there's some enterprisey oss thing that does large-scale data integrity, or if it's paperclips and par2s |
09:08
π
|
chronomex |
db48x: the video card, I think |
09:10
π
|
SketchCow |
db48x: You are owed dinner |
09:10
π
|
SketchCow |
I can take you to a nice party on Tuesday |
09:10
π
|
SketchCow |
Bourbon, man. Bourbon |
09:10
π
|
db48x |
heh |
09:12
π
|
db48x |
that's kind of you |
09:12
π
|
SketchCow |
Too much jerking around of a good team member |
09:12
π
|
db48x |
what's the occasion? |
09:13
π
|
SketchCow |
Private party |
09:14
π
|
db48x |
could be fun, I suppose, but I'm not a drinker |
09:15
π
|
db48x |
memtest86 looks funny with spurious characters |
09:15
π
|
db48x |
one of the # in the progress bars invariably becomes a " |
09:28
π
|
db48x |
cool |
09:28
π
|
db48x |
it's getting worse over time |
09:34
π
|
alard |
The queue seems to be moving again. Thanks. |
09:41
π
|
Nemo_bis |
alard, how many tars are you going to put in each item? |
09:42
π
|
alard |
I think it was 40. |
09:42
π
|
alard |
So you'd get 40*5=200GB items. |
09:42
π
|
Nemo_bis |
ok |
09:43
π
|
alard |
Why? Is that enough/too many? |
09:43
π
|
alard |
(Not that it's easy to change now, but anyway.) |
09:43
π
|
Nemo_bis |
Just out of curiosity, and to add it to the wiki. |
09:44
π
|
alard |
Ha. At least someone is keeping the documentation up to date. :) |
09:47
π
|
Nemo_bis |
Yes. If only I didn't have to submit a captcha on each edit. |
09:48
π
|
db48x |
that reminds me |
09:48
π
|
db48x |
every month or so I see the main page and think about trying to fix the obvious bugs there |
09:48
π
|
db48x |
the worst is the misplaced menu on the side |
09:48
π
|
db48x |
I think it's missing a style rule |
09:49
π
|
Nemo_bis |
alard, bigger users, over 5 GiB, are still put in a single archive, right? |
09:49
π
|
db48x |
SketchCow: how is the wiki set up? did you make any changes to the theme at all? |
09:50
π
|
Nemo_bis |
db48x, do you mean the sidebar? that's Firefox |
09:50
π
|
db48x |
yea |
09:50
π
|
db48x |
Nemo_bis: do you have a firefox bug number? |
09:50
π
|
db48x |
I'd be greatly surprised if it's really a rendering bug |
09:51
π
|
Nemo_bis |
db48x, you have to use AdBlockPlus to block some KHTML thingy |
09:51
π
|
Nemo_bis |
it's just an old MediaWiki which doesn't work |
09:51
π
|
alard |
Nemo_bis: Yes, archives can be larger than 5GB. The script keeps adding users until the size is at least 5GB, then tars and uploads. So the file size could be 5GB+size of last user. |
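The batching rule alard describes (keep adding users until the running size reaches at least 5GB, then tar and upload) can be sketched roughly as below. This is not the actual upload script: `batch_users`, the `(name, size)` input shape, and the decimal `GB` constant are assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

GB = 10**9  # assumption: decimal gigabytes; the real script may count differently


def batch_users(users: Iterable[Tuple[str, int]],
                threshold: int = 5 * GB) -> Iterator[List[str]]:
    """Group (user_name, size_in_bytes) pairs into batches of at least `threshold` bytes.

    A batch is closed as soon as its running size reaches the threshold,
    so its final size is at most threshold + size of the last user added.
    """
    batch: List[str] = []
    size = 0
    for name, nbytes in users:
        batch.append(name)
        size += nbytes
        if size >= threshold:
            yield batch          # this is where the tar + upload would happen
            batch, size = [], 0
    if batch:                    # leftover users that never reached the threshold
        yield batch
```

A single user larger than 5GB simply closes whatever batch it lands in, which matches the "5GB+size of last user" upper bound mentioned above.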
09:51
π
|
Nemo_bis |
alard, thanks. |
09:57
π
|
Nemo_bis |
alard, and then how will one be able to find a user in all that mass of stuff? |
09:58
π
|
alard |
Well, you start downloading the first tar file, see if you're in there. If not, download the next. |
09:58
π
|
alard |
No, there is a txt file for each tar that lists the users. :) |
09:58
π
|
Nemo_bis |
Oh, how couldn't I think of this. |
09:58
π
|
Nemo_bis |
ok |
09:59
π
|
Nemo_bis |
oh, silly me, I thought that was the _meta |
09:59
π
|
alard |
But eventually, it might be useful to make one big list that points to a specific tar file. |
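The "one big list" idea could look something like the sketch below: merge the per-tar .txt listings into a single lookup from user name to tar file. The listing format is an assumption here (one user name per line), and `build_index` is a hypothetical helper, not part of any existing script.

```python
from typing import Dict


def build_index(listings: Dict[str, str]) -> Dict[str, str]:
    """Map each user name to the tar file whose listing mentions it.

    `listings` maps a tar file name to the text of its .txt listing,
    assumed to contain one user name per line.
    """
    index: Dict[str, str] = {}
    for tar_name, text in listings.items():
        for line in text.splitlines():
            user = line.strip()
            if user:
                # keep the first tar seen for a user; duplicates are not expected
                index.setdefault(user, tar_name)
    return index
```

With such an index, finding a user is one dictionary lookup instead of downloading the listings one by one.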
09:59
π
|
Nemo_bis |
Or, those user names could be all put in some ad-hoc metadata tag |
09:59
π
|
db48x |
which project are you uploading? |
09:59
π
|
SketchCow |
For what it's worth, the mobileme-heros are all progressing properly. |
09:59
π
|
Nemo_bis |
So that you can search them |
09:59
π
|
alard |
http://ia600808.us.archive.org/6/items/archiveteam-mobileme-hero-1/archiveteam-mobileme-hero-1-8.txt |
10:00
π
|
SketchCow |
I agree, as we go, that we will want to consider curating these things. |
10:00
π
|
alard |
SketchCow: Yes, thanks for poking. |
10:00
π
|
SketchCow |
Over time, we can generate things for them, that will allow better searching down the line. |
10:00
π
|
ersi |
300k users to GoGo though |
10:01
π
|
SketchCow |
I'll push the mobileme heroes to a collection at some point. |
10:01
π
|
ersi |
292k actually |
10:01
π
|
SketchCow |
Are these all coming from kenneth's work? |
10:01
π
|
SketchCow |
Where are these tars coming from? |
10:01
π
|
alard |
These tars are coming from kenneth's instances, plus the one that I'm running. |
10:01
π
|
SketchCow |
OK. |
10:01
π
|
alard |
It's a pity that the s3 upload is so slow. |
10:01
π
|
SketchCow |
So at what rate? I assume we're not back to awful |
10:02
π
|
SketchCow |
Or I should say breathtaking |
10:02
π
|
SketchCow |
Yes, S3 is a problem. |
10:02
π
|
ersi |
How's fortress doing by the way? |
10:02
π
|
SketchCow |
I will work with S3's admins to discuss ways to improve. |
10:02
π
|
SketchCow |
Fortress got super-ganked by last night's network storm. |
10:02
π
|
Nemo_bis |
\o/ |
10:02
π
|
Nemo_bis |
(that was about s3) |
10:02
π
|
ersi |
just curious since I've been pushing 30-50Mbit/s to it for quite a while ^_^ |
10:02
π
|
db48x |
oh, nice |
10:03
π
|
db48x |
you guys have been making good progress |
10:03
π
|
db48x |
I'm nearly out of the top 10 |
10:03
π
|
ersi |
>:] |
10:04
π
|
db48x |
I will have to acquire some free space somehow |
10:05
π
|
ersi |
I got another 0.5-1TiB coming online soon~ |
10:06
π
|
db48x |
ok, memtest has done a few passes |
10:06
π
|
db48x |
nothing apparently wrong with my real memory |
10:06
π
|
db48x |
just the video card |
10:06
π
|
db48x |
I wonder which of the three is the one that's broken... |
10:09
π
|
* |
db48x follows the cabling |
10:11
π
|
db48x |
ouch :( |
10:11
π
|
db48x |
burnt myself :( |
10:11
π
|
db48x |
thing is HOT |
10:14
π
|
SketchCow |
OK, going to bed |
10:14
π
|
SketchCow |
tomorrow morning, off to work to face the music, and get some shit done |
10:14
π
|
SketchCow |
DONE I tell you |
10:15
π
|
ersi |
[1]+ Done |
12:21
π
|
godane |
is there an archive of 2600 magazines on archive.org? |
12:29
π
|
ersi |
there is a search feature on archive.org |
12:34
π
|
godane |
i couldn't find them using search |
12:38
π
|
db48x |
:) |
14:56
π
|
blink_ |
Hello all |
14:58
π
|
dnova |
hello |
15:00
π
|
blink_ |
just out of curiosity, does anyone know that there is a warning that the archive team site might be compromised? |
15:01
π
|
Nemo_bis |
blink_, a warning where? |
15:02
π
|
blink_ |
i tried to go there a minute ago, and my browser showed a warning |
15:02
π
|
blink_ |
Viagra levitra comparison, order levitra - Pill store, best prices!. Save money with generics. Money back guarantee!" |
15:02
π
|
blink_ |
and this "This site may be compromised. |
15:05
π
|
emijrp |
on google |
15:05
π
|
emijrp |
i guess it is due to spammers editing the wiki |
15:06
π
|
blink_ |
so...its safe to click? |
15:06
π
|
emijrp |
viagra is safe |
15:06
π
|
blink_ |
ha |
15:06
π
|
blink_ |
i mean the site itself |
15:06
π
|
emijrp |
yes |
15:07
π
|
blink_ |
okies |
15:10
π
|
Nemo_bis |
sigh, why don't people just use the URL bar |
15:11
π
|
ersi |
because they use the URL bar |
15:12
π
|
ersi |
and go to google, then type where they want and then they feel lucky! |
15:13
π
|
Nemo_bis |
he didn't feel lucky, or he wouldn't have seen the warning :p |
15:57
π
|
balrog_ph |
TPB down for anyone? |
15:59
π
|
ersi |
balrog: Nope. It's up for me. |
15:59
π
|
ersi |
but it's been shakier than a heroinist with withdrawal symptoms lately |
16:00
π
|
balrog |
it was going up and down today |
16:02
π
|
ersi |
note my earlier line |
16:04
π
|
balrog |
yea I see |
16:23
π
|
emijrp |
ok |
16:30
π
|
db48x |
uh, wtf |
16:30
π
|
db48x |
I swapped out my video card |
16:30
π
|
db48x |
now I can see the bios drawing each individual character one row at a time |
16:30
π
|
db48x |
it hasn't even gotten past the first page of the post process yet |
16:37
π
|
kennethre |
memories |
16:49
π
|
emijrp |
i need some volunteers to archive http://commons.wikimedia.org, it contains 12M files, but the first chunk is about 1M files ~= 500 gb, that chunk is made of daily chunks |
16:49
π
|
emijrp |
im going to upload the script and feed list |
16:50
π
|
emijrp |
#wikiteam |
16:59
π
|
Ymgve |
don't wikimedia sites have easily downloadable archives? |
17:00
π
|
Nemo_bis |
not if images |
17:01
π
|
Nemo_bis |
Ymgve, https://wikitech.wikimedia.org/view/Dumps/Image_dumps |
17:03
π
|
Ymgve |
don't worry, after all male genitalia has been removed, the remaining collection will be 2gb |
17:05
π
|
DFJustin |
lmao |
17:07
π
|
SketchCow |
Good compression technique |
17:08
π
|
SketchCow |
I show uploading via S3 is about 2 hours for 50gb |
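For scale, the "2 hours for 50gb" figure works out to roughly 55 Mbit/s sustained. A quick check, assuming decimal gigabytes (the `throughput_mbit` helper is only for illustration):

```python
def throughput_mbit(gigabytes: float, hours: float) -> float:
    """Average throughput in Mbit/s for a transfer of `gigabytes` (10**9 bytes) in `hours`."""
    bits = gigabytes * 1e9 * 8
    seconds = hours * 3600
    return bits / seconds / 1e6


# 50 GB in 2 hours is about 55.6 Mbit/s (about 6.9 MB/s)
rate = throughput_mbit(50, 2)
```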
17:08
π
|
emijrp |
no, really, they are deleting pics because some countries don't have "freedom of panorama", so many pics about monuments are biting the dust |
17:08
π
|
DFJustin |
somebody at IA is doing these already right http://www.archive.org/details/wikimediadownloads |
17:08
π
|
DFJustin |
can't they just add image dumps to the list |
17:09
π
|
emijrp |
DFJustin: that is only text |
17:09
π
|
DFJustin |
or are we ripping the images directly |
17:09
π
|
emijrp |
this new project is only images |
17:09
π
|
emijrp |
texts + images = win |
17:09
π
|
DFJustin |
yes ofc |
17:10
π
|
SketchCow |
Well, the liberator is liberated. |
17:10
π
|
SketchCow |
I can't stop that. |
17:12
π
|
DFJustin |
I guess we are ripping the images directly, in that case carry on :) |
17:13
π
|
db48x |
freedom of panorama? |
17:13
π
|
Nemo_bis |
SketchCow, is there some public graph for network stats of the s3 hosts? |
17:14
π
|
SketchCow |
Not really in the way I'd like. |
17:14
π
|
DFJustin |
db48x: US law is you can take pictures of buildings, statues, etc if they are in public and do whatever you want with the pictures |
17:15
π
|
DFJustin |
some other countries you need permission from whoever's stuff it is you're photographing |
17:15
π
|
DFJustin |
to over-simplify |
17:16
π
|
Nemo_bis |
emijrp, it's not you DDOSing WMF servers, is it? :) |
17:16
π
|
balrog |
so multiupload seems dead for good |
17:16
π
|
balrog |
:| |
17:16
π
|
SketchCow |
Someone help me here. |
17:16
π
|
emijrp |
Nemo_bis: no, but toolserver is slow as hell |
17:17
π
|
Nemo_bis |
emijrp, don't complain, wikis are down :p |
17:17
π
|
SketchCow |
1. http://jliberate.k-srv.info/index.php - who is that, and who's hosting it. |
17:17
π
|
SketchCow |
2. The bookmarklet sends the documents "somewhere". Where is that? |
17:17
π
|
Nemo_bis |
emijrp, but here's one of the responsibles for that https://ru.wikipedia.org/wiki/%D0%9E%D0%B1%D1%81%D1%83%D0%B6%D0%B4%D0%B5%D0%BD%D0%B8%D0%B5_%D1%83%D1%87%D0%B0%D1%81%D1%82%D0%BD%D0%B8%D0%BA%D0%B0:Dmitry89#Toolserver |
17:17
π
|
SketchCow |
Like, are we actually getting those documents? |
17:18
π
|
DFJustin |
yeah I was wondering about that, I don't know that the backend ever got hooked up for that |
17:18
π
|
Nemo_bis |
db48x, https://commons.wikimedia.org/wiki/Commons:FOP |
17:20
π
|
Nemo_bis |
emijrp, <domas> DDoS in progress, please be silent |
17:27
π
|
SketchCow |
https://twitter.com/#!/JSTOR/status/174155323668574208 |
17:27
π
|
Nemo_bis |
:O |
17:30
π
|
SketchCow |
https://twitter.com/#!/archiveteam/status/174184820782542849 |
17:30
π
|
SketchCow |
I'd like that http://jliberate.k-srv.info/index.php to go down |
17:31
π
|
SketchCow |
Oh, I see it's an undersco2 special |
17:31
π
|
db48x |
Nemo_bis: sheesh |
17:32
π
|
emijrp |
SketchCow: what format is preferred for the 1,000-5,000 image packages? zip, tar, 7z ? it also includes one .xml per image with the wikitext description metadata |
17:33
π
|
SketchCow |
zip. |
17:33
π
|
emijrp |
ok |
17:34
π
|
SketchCow |
Oh goddamnit. |
17:34
π
|
yipdw |
I think ArchiveTeam needs a gravatar |
17:35
π
|
yipdw |
the middle finger GIF will do |
17:35
π
|
SketchCow |
http://tracker.archive.org/ |
17:35
π
|
SketchCow |
See that? That's the sound of someone going into The Hole for a week. |
17:35
π
|
DFJustin |
wow |
17:35
π
|
yipdw |
SketchCow: wtf |
17:42
π
|
SketchCow |
A shame, too, because I had actually calmed down. |
17:42
π
|
SketchCow |
Whih is are, I haven't calmed down since 1996 |
17:43
π
|
SketchCow |
Wow, typing is gone, I blame the keyboard. |
17:50
π
|
SketchCow |
Well, lost my my line place for the shower. |
17:52
π
|
emijrp |
what is the best way to zip a directory? |
17:52
π
|
emijrp |
zip a.zip folder/subfolder/* |
17:52
π
|
emijrp |
? |
17:53
π
|
SketchCow |
zip -9 -r azip topmostfolder |
17:53
π
|
SketchCow |
zip -9 -r a.zip topmostfolder |
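In the command above, `-r` is what makes zip recurse into the folder and `-9` asks for maximum compression. For a script that is already in Python, a rough equivalent with the standard `zipfile` module might look like this. `zip_tree` is a hypothetical helper; unlike `zip -r` it stores only files, not directory entries.

```python
import zipfile
from pathlib import Path
from typing import List


def zip_tree(top: str, archive: str) -> List[str]:
    """Zip every file under `top` into `archive`, keeping `top` as the root folder name."""
    top_path = Path(top)
    names: List[str] = []
    # ZIP_DEFLATED with compresslevel=9 roughly corresponds to `zip -9` (Python 3.7+)
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
        for path in sorted(top_path.rglob("*")):
            if path.is_file():
                # store paths relative to the parent, e.g. topmostfolder/sub/file.xml
                arcname = path.relative_to(top_path.parent).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names
```

The archive should be written outside `top`, otherwise the walk may pick up the half-written zip file itself.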
17:55
π
|
emijrp |
if i dont add /* to end, it doesnt work |
17:58
π
|
db48x |
I don't really feel like doing any real work today |
18:15
π
|
alard |
SketchCow: If you have a moment, could you rerun this task? http://www.us.archive.org/log_show.php?task_id=97292517 Another network error. [archiveteam-mobileme-hero-4] |
18:16
π
|
SketchCow |
For the record, we're murdering the s3 interface. |
18:17
π
|
SketchCow |
But I think it needs to be murdered, increased in benefit by working better. |
18:19
π
|
alard |
Should we go slower? (As far as I can see it's still responsive, so we open fewer connections than yesterday.) |
18:19
π
|
alard |
Heroku - archive.org : 1 - 0 ? |
18:24
π
|
Coderjoe |
SketchCow: any progress on getting the friendster off that USB drive I sent? If not, I'd be willing to switch to JFS or even just a tarball-in-partition or whatever, if the drive were sent back here. |
18:26
π
|
SketchCow |
We should go a little slower, if possible. |
18:26
π
|
SketchCow |
Coderjoe: I've not had a chance to look at it |
18:26
π
|
SketchCow |
But I will |
18:27
π
|
SketchCow |
I am willing to pay for a second drive so you can try again |
18:28
π
|
Coderjoe |
alright. I can do that. and it can be a bare SATA drive, too. |
18:29
π
|
Coderjoe |
I was planning to use that USB drive for other stuff later. |
18:29
π
|
Coderjoe |
hmm |
18:30
π
|
Coderjoe |
I could apparently be at JSTOR in a few hours... |
18:30
π
|
Coderjoe |
if I were already going that way, it might be amusing to go knock on their door |
18:32
π
|
Coderjoe |
at least based on the location in their twitter profile |
18:33
π
|
SketchCow |
I've been talking to archive.org about it. |
18:33
π
|
SketchCow |
They already have a link in back. JSTOR wants to give archive.org a copy of their PD stuff. |
18:33
π
|
Coderjoe |
cool |
18:33
π
|
SketchCow |
It's just arrangement now. |
18:34
π
|
SketchCow |
Hence this liberator, which is redundant and non-supported, is not needed. |
18:38
π
|
db48x |
I think we need to outlaw pollen |
18:39
π
|
DFJustin |
if pollen is outlawed, then only outlaws will have pollen! |
18:39
π
|
dnova |
that would pretty much wipe out all above-water life |
18:39
π
|
Coderjoe |
dnova: I'm pretty sure it would also have a detrimental effect on underwater life as well |
18:40
π
|
db48x |
we would have to re-engineer our plant life, granted |
18:40
π
|
db48x |
they can all just use wifi or something |
18:41
π
|
SketchCow |
OK, I need to drive to work. |
18:41
π
|
SketchCow |
Let's try and not have a massive political and reputational blowup in the hour commute. |
18:42
π
|
dnova |
you're asking the impossible |
18:42
π
|
SketchCow |
I know, I am just required to ask under the terms of my gitmo parole |
18:42
π
|
dnova |
oh, just "try"... ok. |
18:42
π
|
db48x |
SketchCow: why do you have an hour-long commute in a town you are visiting? |
18:43
π
|
SketchCow |
I stay in San Jose |
18:43
π
|
emijrp |
SketchCow: does the zip browser work with subfolders? |
18:43
π
|
db48x |
huh |
18:43
π
|
SketchCow |
emijrp: Yes. |
18:44
π
|
SketchCow |
http://www.archive.org/download/SuperBlue/pc_blue_ii.zip/ |
18:45
π
|
emijrp |
great |
18:46
π
|
emijrp |
ok, we can make some tests http://www.archiveteam.org/index.php?title=Wikimedia_Commons#Archiving_process |
18:46
π
|
emijrp |
i have tested the script, but, i hope we can find any error before we start to DDoS wikipedia servers |
18:47
π
|
emijrp |
by the way, my upload stream is shit, so, i wont use this script much, irony |
18:48
π
|
db48x |
heh |
18:49
π
|
emijrp |
file list available from 2004-09-07 to 2006-12-31 |
18:49
π
|
db48x |
throw that thing up on github :) |
18:49
π
|
db48x |
pastebins are... annoying |
18:49
π
|
emijrp |
i will create further lists in the next days |
18:50
π
|
emijrp |
db48x: http://code.google.com/p/wikiteam/source/browse/trunk/commonsdownloader.py |
18:52
π
|
db48x |
emijrp: perfect |
19:09
π
|
db48x |
emijrp: seems to be working |
19:09
π
|
Nemo_bis |
db48x, you edit conflicted me, why didn't you join #wikiteam? :p |
19:09
π
|
emijrp |
db48x: ok, please go to #wikiteam we are coordinating there |
19:12
π
|
emijrp |
SketchCow: why isnt there a link to the browsable version from here http://www.archive.org/details/SuperBlue ? |
19:14
π
|
db48x |
heh |
19:17
π
|
Nemo_bis |
emijrp, this is how they generate links: http://www.archive.org/details/archiveteam-googlegroups-jz |
19:17
π
|
Nemo_bis |
in The Right Way |
19:18
π
|
Nemo_bis |
trailing slash is the trick I suppose |
19:22
π
|
topaz |
http://archiveteam.org does not appear to be a helpful URL. :-) |
19:22
π
|
topaz |
(or is that old news?) |
19:23
π
|
db48x |
topaz: yea, the front page is a little out of date |
19:23
π
|
db48x |
splinder is finished, but mobileme is still going |
19:23
π
|
topaz |
right now the front page looks like a viagra ad on my browser. |
19:24
π
|
db48x |
nice |
19:24
π
|
db48x |
your user agent is set to the googlebot |
19:24
π
|
chronomex |
looks fine from my phone |
19:25
π
|
db48x |
the wiki is occasionally slightly hacked |
19:25
π
|
db48x |
if it thinks you're google then it's a pharma scam |
19:26
π
|
Coderjoe |
topaz: using a googlebot UA? |
19:26
π
|
topaz |
no, using Chrome |
19:26
π
|
Coderjoe |
oh hi. I ken read 5 lines up |
19:26
π
|
topaz |
I was spoofing user agents some time ago but I thought I'd turned all that off, double checking now |
19:26
π
|
Coderjoe |
perhaps the UA hack the crackers put in just checks for "google" and not "googlebot" |
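One way to test the user-agent theory is to fetch the same URL under several User-Agent strings and see which responses contain the spam. The sketch below is only an illustration: `agents_served_spam` and `http_fetch` are hypothetical helpers, and the `"viagra"` marker is an assumption about what the injected page contains.

```python
import urllib.request
from typing import Callable, Iterable, List


def http_fetch(url: str, user_agent: str) -> str:
    """Fetch `url` with the given User-Agent header and return the decoded body."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", "replace")


def agents_served_spam(fetch: Callable[[str, str], str], url: str,
                       agents: Iterable[str], marker: str = "viagra") -> List[str]:
    """Return the user agents whose response body contains `marker` (case-insensitive)."""
    return [ua for ua in agents if marker in fetch(url, ua).lower()]
```

Passing the fetch function in makes it easy to try the logic against a stub first. If every user agent gets the spam from one network and none from another, the cloaking is probably keyed on something other than the User-Agent header.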
19:27
π
|
Coderjoe |
UA masking of spammy crap is unfortunately not uncommon |
19:27
π
|
topaz |
yeah, I'm not overriding the user-agent. |
19:28
π
|
Coderjoe |
refresh? |
19:28
π
|
Coderjoe |
I'm not seeing viagra as googlebot |
19:29
π
|
topaz |
I still am. |
19:29
π
|
topaz |
hmm, hang on |
19:29
π
|
topaz |
yeah, still spam |
19:29
π
|
Coderjoe |
I turned off adblock and refreshed as googlebot and still am not |
19:29
π
|
Coderjoe |
are you perhaps using a hacked proxy? |
19:30
π
|
topaz |
conceivable. I'm at work and have not examined the network settings closely. I'm getting the same result when I use Safari. |
19:32
π
|
topaz |
but my browser user-agent string is "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11" (just confirmed against my own web server) |
19:45
π
|
LimbClock |
i've been meaning to ask, have you guys made a siterip of Old Man Murray yet? |
19:55
π
|
topaz |
anyway, I know none of the rest of you are seeing the viagra spam on the archiveteam.org page, but who'd be responsible for fixing it anyway? :-) |
19:57
π
|
ersi |
SketchCow as usual. He'll get to it |
19:57
π
|
ersi |
it's not like it hasn't happened before |
19:57
π
|
topaz |
ah ok |
19:57
π
|
ersi |
dreamhost is a turd |
19:57
π
|
ersi |
shiny shiny unlimited turd |
19:57
π
|
balrog |
topaz: why are you seeing it? |
19:57
π
|
topaz |
balrog: haven't been able to figure it out. |
19:57
π
|
balrog |
you're sure you don't have a rootkit? |
19:57
π
|
ersi |
he's probably saying he's googlebot in his useragent string |
19:57
π
|
topaz |
seeing it in both Chrome and Safari, OSX |
19:58
π
|
ersi |
wät |
19:58
π
|
topaz |
ersi: unmodified user-agent |
19:58
π
|
topaz |
scroll back |
19:58
π
|
topaz |
I'm happy to conduct more experiments but not sure where else to look on my end. |
19:58
π
|
ersi |
yeah, I was too lazy to connect the dots. but yes, I see now~ |
20:04
π
|
topaz |
I seriously doubt that my laptop has been rooted. Not that it couldn't happen, but a root that only announces itself on specific URLs like this? |
20:04
π
|
ersi |
It's most likely the wiki |
20:04
π
|
emijrp |
topaz: perhaps next time google indexes the AT wiki, it will be fixed |
20:05
π
|
topaz |
emijrp: doesn't have anything to do with the Google index. |
20:06
π
|
emijrp |
then ? |
20:10
π
|
topaz |
precisely my question. |
20:11
π
|
kennethre |
topaz: dns maybe |
20:11
π
|
kennethre |
google things archiveteam.org is a potential threat |
20:11
π
|
kennethre |
*thinks |
20:12
π
|
kennethre |
http://cl.ly/1Y473S411Z2I1c2I0V31 |
20:12
π
|
topaz |
kennethre: doubt it's DNS poisoning. the page source is littered with what looks like real references from the archiveteam.org wiki. |
20:12
π
|
kennethre |
topaz ^ |
20:12
π
|
kennethre |
topaz: google agrees with you http://cl.ly/1Y473S411Z2I1c2I0V31 |
20:13
π
|
topaz |
yeah, so the wiki is returning viagra spam to google to poison the search engine cache. what I can't figure out is why it's returning it to my browser too,. |
20:13
π
|
kennethre |
hahahaha |
20:13
π
|
kennethre |
that's awesome |
20:14
π
|
kennethre |
topaz: you're using chrome? |
20:14
π
|
topaz |
and, sorry, I know this has gone beyond anything that the rest of you folks can do anything about :-) |
20:14
π
|
topaz |
kennethre: I'm seeing it in both Chrome and Safari on OSX. unmodified user-agent strings. |
20:14
π
|
kennethre |
topaz: i wonder if the page pre-fetching has a different user agent |
20:14
π
|
Coderjoe |
kennethre: I'm not seeing it with a googlebot UA |
20:14
π
|
kennethre |
well, why are we poisoning the cache? |
20:14
π
|
kennethre |
that's rediculous |
20:15
π
|
kennethre |
*ridiculous |
20:15
π
|
ersi |
The wiki has gotten viagra aids before, several times |
20:15
π
|
kennethre |
oh bots |
20:15
π
|
ersi |
yeah |
20:16
π
|
kennethre |
haha |
20:16
π
|
kennethre |
viagra aids |
20:16
π
|
ersi |
I rule etc |
20:17
π
|
topaz |
I get it even if I explicitly mask myself as IE9. Amazing. |
20:17
π
|
topaz |
I can only imagine that the hacked code is returning it based on my IP address for some bizarre reason |
20:28
π
|
db48x |
why is my latency so high? |
20:33
π
|
ersi |
db48x: because someone is snoopin' on yer tubes |
20:52
π
|
SketchCow |
MORNING |
20:52
π
|
SketchCow |
OK AM AT THE ARCHIVE AGAIN |
20:53
π
|
hybernaut |
topaz: are you reloading the page with pragma: no-cache? |
20:54
π
|
DFJustin |
hide your kids hide your files |
20:54
π
|
hybernaut |
(cmd-shift-R on Chrome/Mac) |
20:55
π
|
topaz |
yes, I'm doing force-reload with command-shift-R |
20:55
π
|
hybernaut |
topaz: can you do it again with Developer Tools open to the Network tab? |
20:55
π
|
topaz |
sure, sec |
20:56
π
|
hybernaut |
just curious if you're getting a 200 OK response, or a 304 Not Modified |
20:56
π
|
SketchCow |
Wow, everyone has questions for me. |
20:56
π
|
topaz |
Developer Tools confirms that I am getting objects |
20:57
π
|
SketchCow |
* Why does CD-ROM not have link to browseable version? We added iso and zip browsing after it was uploaded. |
20:57
π
|
topaz |
a long list of 200 OK, not seeing any 304 responses |
20:57
π
|
SketchCow |
* Why does archiveteam.org have spam? Dreamhost blows, looking for a new host now. |
20:57
π
|
hybernaut |
topaz: any proxy headers in the response headers? |
20:57
π
|
topaz |
SketchCow: mainly I was trying to figure out why I'm the only one who seems to be seeing it :-) |
20:57
π
|
SketchCow |
It's because your user agents are hipster. |
20:57
π
|
SketchCow |
That's why |
20:57
π
|
topaz |
(which is now primarily of interest to me, but whatever) |
20:57
π
|
topaz |
NO IT IS NOT |
20:58
π
|
topaz |
dude :-) |
20:58
π
|
SketchCow |
Yes... it is. |
20:58
π
|
SketchCow |
It spams based on user-agent string. |
20:58
π
|
topaz |
ok, so if I were to switch my user agent to match, say, IE9, it should work right? |
20:58
π
|
topaz |
because it doesn't. |
20:59
π
|
emijrp |
but does Google send different result description by user-agent? :/ |
21:00
π
|
hybernaut |
Greetings, Doctor SketchCow: my services are at your…um, service |
21:00
π
|
topaz |
hybernaut: where would I find proxy headers? sorry, I haven't dug that deep into Developer Tools :-/ |
21:01
π
|
hybernaut |
topaz: select the index.php entry, then look at Response Headers |
21:01
π
|
hybernaut |
but I don't know anything about the wiki's perverse behavior; I am just curious |
21:02
π
|
topaz |
ah, I see. no, I'm not seeing any proxy response headers there. |
21:03
π
|
SketchCow |
OK, first. |
21:04
π
|
SketchCow |
I'm switching archiveteam.org hosts, right now. |
21:04
π
|
SketchCow |
Thank you for enrolling your card to be billed in your local currency! |
21:04
π
|
topaz |
:-) |
21:06
π
|
kennethre |
SketchCow: webfaction is awesome |
21:06
π
|
kennethre |
SketchCow: if you don't already have something planned |
21:07
π
|
ersi |
emijrp: AFAIK they don't, but they do on other factors such as which ccTLD you came to, language settings, if you are logged in to a Google account or not |
21:08
π
|
SketchCow |
We are switching to hostgato. |
21:08
π
|
SketchCow |
Hostgator. |
21:08
π
|
SketchCow |
I know there's a bunch, I'm just liking Hostgator. |
21:08
π
|
SketchCow |
Just bought a 3 year membership. |
21:08
π
|
alard |
Do you want feedback? :) |
21:10
π
|
SketchCow |
What, is there something wrong with hostgator? |
21:11
π
|
ersi |
There's always someone not liking a shared hosting provider |
21:11
π
|
SketchCow |
Yeah, but I figured I'd let him say it anyway. |
21:12
π
|
alard |
No, but since we recently established you like feedback on business cards and engagements, I thought we might have to find something on hosting providers too. |
21:12
π
|
SketchCow |
Never let it be said I don't listen to feedback (while watching the price is right, and jerking off) |
21:12
π
|
SketchCow |
SHOWCASE........SHOWDOWN |
21:12
π
|
ersi |
Price is right? Not wheel of fortune? :( |
21:12
π
|
ersi |
Sorry, let me fix that |
21:12
π
|
SketchCow |
Wheel of fortune rounds end too quick |
21:12
π
|
ersi |
WHEEL. OF. FORTUNE! |
21:13
π
|
SketchCow |
So, Sam, the archive.org admin will be coming in to talk with me, kennethre, alard about maximizing the S3 interface |
21:13
π
|
SketchCow |
We're the big test, we're going to find all the limits. |
21:14
π
|
SketchCow |
The explosion on the weekend was finding one, since repaired somewhat. |
21:14
π
|
SketchCow |
Soon we can go to max. |
21:14
π
|
alard |
http://duckduckgo.com/?q=hostgator+sucks |
21:14
π
|
alard |
:) |
21:15
π
|
kennethre |
SketchCow: fantastic :) |
21:15
π
|
ersi |
Um, should dld-me really output to /dev/null for public.me.com? :o |
21:15
π
|
SketchCow |
74.52.105.105 is our ip |
21:16
π
|
ersi |
hm, seems like it's writing to the warc though, that's good |
21:16
π
|
SketchCow |
http://duckduckgo.com/?q=dreamhost+sucks&t=1 |
21:16
π
|
alard |
ersi: Yes, it should. The files are saved in the warc, so there's no need to write them to disk. |
21:16
π
|
alard |
The web is full of hate. |
21:16
π
|
ersi |
Hate Machine |
21:16
π
|
SketchCow |
http://duckduckgo.com/?q=webfaction+sucks&t=1 |
21:16
π
|
ersi |
http://duckduckgo.com/?q=archiveteam+sucks |
21:16
π
|
kennethre |
they all http://duckduckgo.com/?q=heroku+sucks&t=1 |
21:16
π
|
ersi |
wät |
21:16
π
|
kennethre |
oh look nothing |
21:16
π
|
kennethre |
:) |
21:17
π
|
SketchCow |
http://duckduckgo.com/?q=your+favorite+sucks&t=1 |
21:17
π
|
kennethre |
haha |
21:18
π
|
SketchCow |
MediaWiki is a free software open source wiki package written in PHP, originally for use on Wikipedia. |
21:18
π
|
SketchCow |
Version: 1.18.1 |
21:18
π
|
SketchCow |
OK, we're installing that. |
21:20
π
|
SketchCow |
I'm going to need help with the transfer, imagine that. |
21:20
π
|
SketchCow |
But it's the only way to ensure clearing |
21:20
π
|
DFJustin |
"maximize the s3 interface" sounds like something you could do after remodulating the deflector dish |
21:21
π
|
kennethre |
reverse the polarity |
21:27
π
|
SketchCow |
I love messing with hosts files so much |
21:30
π
|
SketchCow |
OK, who wants to help transfer this thing to the new box. |
21:31
π
|
Coderjoe |
alard: wasn't there an issue with wget not being able to look for links in files downloaded to /dev/null? |
21:32
π
|
alard |
Yes, but > /dev/null is only for public.me.com, where --mirror is not used. |
21:32
π
|
alard |
web and homepage are --mirror and rm -rf files/ |
21:40
π
|
ersi |
yeah |
21:43
π
|
SketchCow |
OH BOY I LOVE SWITCHING SERVERS |
21:44
π
|
SketchCow |
Just learning hostgator's himblehabble |
21:54
π
|
Coderjoe |
ersi: ah |
21:55
π
|
Coderjoe |
alard: are the ones that need a temporary output location able to use a specified location? (to allow tmpfs/ramfs, for example?) |
21:57
π
|
alard |
Yes, if you're careful you can replace the references to the files/ subdirectory with something else. (But be extra careful if you're running multiple scripts in the same directory.) |
21:57
π
|
alard |
Is I/O a problem? |
21:58
π
|
Coderjoe |
i don't know offhand, for mobileme. i think it was for another project |
21:59
π
|
alard |
An easier optimization: set the wget-warc tempdir (if the scripts don't already do that). |
21:59
π
|
alard |
Every warc record is first written to a temporary file. |
22:00
π
|
alard |
--warc-tempdir |
22:17
π
|
alard |
SketchCow: For planning, when would Sam come in to talk about the s3 api? |
22:18
π
|
SketchCow |
He just went into a meeting. Here, hit him up: samuel@archive.org |
22:39
π
|
SketchCow |
HYBERNAUT SAYS HI |
22:39
π
|
SketchCow |
Someone give him something to do |
22:40
π
|
hybernaut |
Greetings! |
22:41
π
|
alard |
Hi. |
22:41
π
|
SketchCow |
Oh, so many projects. |
22:41
π
|
hybernaut |
I have skillz in the Ruby, JavaScript, databases |
22:41
π
|
hybernaut |
and competent knowledge of teh unix command line |
22:43
π
|
ersi |
I got skills at making worthless remarks, like this one |
22:45
π
|
hybernaut |
well if I can help with something, I will have more worthful remarks to make, too |
22:45
π
|
hybernaut |
otherwise, I will lurk politely, sorry |
22:54
π
|
ersi |
"like this one" meant mine, not yours |
22:57
π
|
SketchCow |
New wiki is functioning, but it has no data from the old wiki... yet. |
22:57
π
|
SketchCow |
I hate this work, by the way |
23:01
π
|
hybernaut |
are you copying the database, or do you have another plan? |
23:14
π
|
alard |
SketchCow: Just sent the email to Sam. |
23:14
π
|
alard |
Now going to bed. Bye. |
23:51
π
|
SketchCow |
Thanks, alard |