Time |
Nickname |
Message |
00:01
🔗
|
yipdw |
hmm |
00:02
🔗
|
yipdw |
come to think of it, I may not have anything that would otherwise be blocked by robots exclusion lists |
00:02
🔗
|
yipdw |
because my main index source was Google News, and Google obeys those |
00:02
🔗
|
yipdw |
well, still worth a once-over |
00:17
🔗
|
bsmith093 |
is there anything that losslessly converts cbr and cbz files to pdf |
00:18
🔗
|
bsmith093 |
i can do it with gscan to pdf, but they always come out fuzzy, like they've been slightly oversized |
00:22
🔗
|
yipdw |
if Wikipedia is correct and CBR/CBZs just wrap a collection of PNGs or JPEGs, then you don't need to PDF them |
00:22
🔗
|
yipdw |
however, if you want to, what you need to do is ensure that the DPI of the PDF is the same as the DPI of the PNG or JPEG |
00:24
🔗
|
bsmith093 |
i know i don't have to, but my windows friends like pdfs rather than cbr/z, and i want to share some hilarious webcomics. |
00:32
🔗
|
yipdw |
holy hell, I started a gunzip job on an EBS volume attached to an EC2 micro six hours ago and it's still not done |
00:32
🔗
|
yipdw |
what does Amazon run those things on |
00:32
🔗
|
yipdw |
hamsters? |
00:35
🔗
|
DFJustin |
kindles |
00:36
🔗
|
yipdw |
on fire |
00:42
🔗
|
bsmith093 |
is dpi something i have to set, because i can't find it in the image files |
00:43
🔗
|
bsmith093 |
they're gifs, if it matters |
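A minimal sketch of the DPI point above, assuming the goal is a PDF page that shows each image at its native pixel grid so nothing is resampled. PDF page sizes are measured in points (1/72 inch), so the page size follows from the pixel dimensions and a chosen DPI. As far as I know GIF carries no resolution field, so the DPI here is whatever the caller picks; the function name and the sample dimensions are hypothetical.

```python
def page_size_points(width_px, height_px, dpi):
    """Return the PDF page size in points (1 pt = 1/72 inch) at which an
    image of the given pixel size is shown at exactly `dpi` pixels per inch."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return (width_px * 72.0 / dpi, height_px * 72.0 / dpi)


# Hypothetical 800x1200 pixel comic page, placed at an assumed 100 DPI.
print(page_size_points(800, 1200, 100))  # (576.0, 864.0)
```

If the converter is told the same DPI that was used to compute the page size, one image pixel maps to one page pixel and the fuzziness from rescaling should not appear.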
01:11
🔗
|
LordNlptp |
hamsters with tails on fire |
01:12
🔗
|
LordNlptp |
in little wheels which run a big mechanical computer |
02:47
🔗
|
SketchCow |
Where's the MobileMe channel? |
03:32
🔗
|
Paradoks |
#memac? I don't think we ever made one. |
04:10
🔗
|
Coderjoe |
www.youtube.com/watch?v=7ezeYJUz-84 |
08:09
🔗
|
ersi |
http://yro.slashdot.org/story/12/01/23/1725231/carl-malamud-answers-goading-the-government-to-make-public-data-public |
09:25
🔗
|
yipdw |
http://www.ninjawedding.org/sopa/stories.html |
09:25
🔗
|
yipdw |
and that'll probably be the last I write about SOPA archiving unless someone else demands more |
09:31
🔗
|
chronomex |
you sure it works? http://wayback.at.ninjawedding.org/*/http://www.bigshinyrobot.com/reviews/archives/35921 |
09:31
🔗
|
yipdw |
some probably aren't quite working |
09:31
🔗
|
yipdw |
I didn't review the full import log |
09:31
🔗
|
chronomex |
k |
09:31
🔗
|
yipdw |
most I've tried do resolve, though |
09:32
🔗
|
yipdw |
e.g. http://wayback.at.ninjawedding.org/20120118202651/http://blogs.telegraph.co.uk/technology/alexisdormandy/100007102/jimmy-wales-is-showing-a-lack-of-imagination-over-the-wikipedia-shutdown/ |
09:32
🔗
|
yipdw |
chronomex: if you find more, let me know what they are and I can see if wayback printed log messages pertinent to those WARCs |
09:33
🔗
|
chronomex |
is it relatively straightforward to get a wayback machine running? I might do my own deployment... |
09:33
🔗
|
yipdw |
well, the basics aren't bad -- it's much like any other Java webapp |
09:33
🔗
|
chronomex |
euh :P |
09:33
🔗
|
yipdw |
there is quite a bit you can do with Hadoop etc though to speed up indexing that I haven't yet enabled |
09:34
🔗
|
ersi |
Heritrix has hadoop support? |
09:34
🔗
|
yipdw |
Wayback does |
09:34
🔗
|
yipdw |
Heritrix, I don't know |
09:34
🔗
|
yipdw |
(maybe) |
09:34
🔗
|
yipdw |
and if it doesn't you could write a Hadoop job to wrap it |
09:34
🔗
|
ersi |
Hmm. |
09:34
🔗
|
ersi |
Heritrix is just the grabber/crawler? |
09:34
🔗
|
yipdw |
yeah |
09:35
🔗
|
yipdw |
Wayback's the only thing I've found (so far) that will actually let me poke around in WARCs |
09:35
🔗
|
yipdw |
I've been throwing around the idea of a desktop application into which you load a bunch of WARCs |
09:35
🔗
|
yipdw |
it inflates each record and rewrites URLs for local viewing |
09:35
🔗
|
yipdw |
but that's a way down the road :P |
09:35
🔗
|
yipdw |
I think it'd be useful, though |
09:36
🔗
|
ersi |
sounds awesome (as well as somewhat painful) |
09:36
🔗
|
yipdw |
heh |
09:36
🔗
|
yipdw |
I dunno |
09:36
🔗
|
yipdw |
WARC handling isn't too bad if the WARC is compressed per-record |
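A rough illustration of why per-record compression helps, assuming the file is a plain concatenation of gzip members, one per record. Each member can then be inflated on its own, without touching the rest of the file. The helper name is hypothetical and the payloads below are placeholders, not real WARC records.

```python
import gzip
import zlib


def iter_gzip_members(data):
    """Yield the decompressed bytes of each gzip member in `data`."""
    while data:
        # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
        inflater = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
        record = inflater.decompress(data) + inflater.flush()
        yield record
        # Bytes after the end of this member belong to the next one.
        data = inflater.unused_data


# Two independently compressed placeholder "records", concatenated.
blob = gzip.compress(b"record one") + gzip.compress(b"record two")
for record in iter_gzip_members(blob):
    print(record)
```

A real reader would stream from disk and remember the byte offset of each member, so a single record can be fetched later by seeking straight to it.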
09:36
🔗
|
yipdw |
URL rewriting...yeah, that's bitchy |
09:37
🔗
|
yipdw |
actually, impossible |
09:37
🔗
|
yipdw |
unless you catch outbound network requests or some crazy shit you're never going to catch them all |
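A small sketch of the naive rewriting approach and its blind spot, assuming absolute http(s) URLs in `href` and `src` attributes are simply prefixed with a local path. The `/archive/` prefix and the function name are made up for illustration. URLs that a script assembles at runtime never appear as a literal attribute, which is why they slip through.

```python
import re

# Matches href="..." or src="..." holding an absolute http(s) URL.
ATTR_URL = re.compile(
    r'''(\b(?:href|src)\s*=\s*)(["'])(https?://[^"']+)\2''',
    re.IGNORECASE,
)


def rewrite_urls(html, prefix="/archive/"):
    """Prefix absolute URLs in href/src attributes with a local path."""
    def repl(match):
        attr, quote, url = match.group(1), match.group(2), match.group(3)
        return f"{attr}{quote}{prefix}{url}{quote}"
    return ATTR_URL.sub(repl, html)


page = (
    '<a href="http://example.com/a">x</a>'
    '<script>load("http://" + host)</script>'
)
print(rewrite_urls(page))
```

The link is rewritten, while the URL built inside the script is left alone and would still reach out to the live network when the page runs.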
09:37
🔗
|
yipdw |
but DTSTTCPW etc |
09:38
🔗
|
yipdw |
actually, on that note, catching outbound network requests is probably imperative |
09:39
🔗
|
yipdw |
if you're viewing archived data, you are working with data from God-knows-where, and as such you do need to sandbox it |
09:39
🔗
|
yipdw |
maybe Webkit has a way to intercept such things |
09:40
🔗
|
yipdw |
crash time! |
09:42
🔗
|
yipdw |
oh, also, wayback.at.ninjawedding.org is an EC2 micro instance, so don't be surprised if it chokes every now and then |
13:42
🔗
|
emijrp |
http://yro.slashdot.org/story/12/01/23/1725231/carl-malamud-answers-goading-the-government-to-make-public-data-public |
16:14
🔗
|
Coderjoe |
the WARCs written by wget are supposed to be compressed per-record. |
17:11
🔗
|
Coderjoe |
... |
17:11
🔗
|
Coderjoe |
http://www.youtube.com/watch?v=Uae58589aec |
17:11
🔗
|
Coderjoe |
youtube has an "original" quality setting there |
17:25
🔗
|
nitro2k01 |
Oh nice! For the times when I want to watch a video at a higher resolution than my display's, and have the video lag! |
17:28
🔗
|
DFJustin |
will be handy for youtube-dl and future-proofing |
17:52
🔗
|
yipdw |
jhttps://wwws.whitehouse.gov/petitions#!/petition/investigate-chris-dodd-and-mpaa-bribery-after-he-publicly-admited-bribing-politicans-pass/DffX0YQv |
17:52
🔗
|
yipdw |
er |
17:52
🔗
|
yipdw |
https://wwws.whitehouse.gov/petitions#!/petition/investigate-chris-dodd-and-mpaa-bribery-after-he-publicly-admited-bribing-politicans-pass/DffX0YQv |
17:52
🔗
|
yipdw |
who wants to bet on Noncommittal Half-Response |
18:02
🔗
|
Coderjoe |
"first name, last initial and city and state will be publicly displayed on the petition page." |
18:02
🔗
|
SketchCow |
Hey, so I finished the Sound of Young America upload last night! |
18:02
🔗
|
SketchCow |
611 audio and video |
18:03
🔗
|
Coderjoe |
which is fine if you have a common enough name that there are multiple in your city... |
18:03
🔗
|
mach |
even if there were multiple people, how hard would it be to identify the sort of person who would sign the petition? |
18:10
🔗
|
yipdw |
Coderjoe: meh, I don't care |
18:10
🔗
|
yipdw |
the information displayed on that page is enough to identify me |
18:10
🔗
|
yipdw |
for that matter, so is the information in a WHOIS query or a credit report gone astray |
18:54
🔗
|
yipdw |
Rust looks like a neat language |
18:55
🔗
|
yipdw |
its standard library strikes me as a bit weird |
18:55
🔗
|
yipdw |
http://doc.rust-lang.org/doc/std/files/four-rs.html |
18:55
🔗
|
yipdw |
what is that doing in there? |
18:57
🔗
|
Coderjoe |
darn. nothing implemented yet on http://rosettacode.org/wiki/Rust |
18:58
🔗
|
yipdw |
it feels a bit like Go |
18:59
🔗
|
yipdw |
I don't like its typing system, though, because (from what I've read so far) it seems inconsistent |
18:59
🔗
|
yipdw |
you can specify interfaces -- collections of methods -- but you can't say "this function accepts anything that responds to m" |
18:59
🔗
|
Coderjoe |
hmm. that page was just added about 50 minutes ago |
19:00
🔗
|
yipdw |
it looks like you have to say "this function accepts things that implement Somethingable" |
19:00
🔗
|
yipdw |
and that sucks |
19:01
🔗
|
yipdw |
hm, yeah, type-by-name is encoded into the language itself, too, in e.g. the requirement that the conditional on an if must receive a value of "type boolean" |
19:01
🔗
|
yipdw |
oh well |
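To make the distinction in the messages above concrete, here is a sketch in Python rather than Rust of "this function accepts anything that responds to m". It uses `typing.Protocol`, where a class matches by having the method and not by declaring that it implements a named interface. All class and function names are hypothetical, and this says nothing about how Rust itself behaves.

```python
from typing import Protocol


class RespondsToM(Protocol):
    """Structural type: anything that has an m() method returning str."""
    def m(self) -> str: ...


class Duck:
    # Note: no inheritance from RespondsToM and no "implements" declaration.
    def m(self) -> str:
        return "duck"


class Robot:
    def m(self) -> str:
        return "robot"


def call_m(thing: RespondsToM) -> str:
    """Accepts any object that responds to m."""
    return thing.m()


print(call_m(Duck()), call_m(Robot()))
```

Under a by-name scheme, `Duck` and `Robot` would each have to state that they implement the interface before `call_m` accepted them.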
20:08
🔗
|
SketchCow |
Guys, where can one get wget-warc binaries? |
20:15
🔗
|
yipdw |
I build it using get-wget-warc.sh |
20:16
🔗
|
yipdw |
from splinder-grab or one of the other grabber codebases that generate WARCs |
20:58
🔗
|
SketchCow |
Greetings, |
20:58
🔗
|
SketchCow |
I just wanted to give you a heads up that you might want to take one last pass at archiving any www-personal.umich.edu websites, they will be deleted this year. |
20:58
🔗
|
SketchCow |
The University of Michigan recently announced that they are contracting out email and web services to Google, and all accounts will be transitioned later this year. As part of this all the old websites will be deleted by August. They aren't providing any support for moving the websites and won't support redirection after August, so i suspect most of the web pages will simply vanish. They also have not made any of this clear in their public anno |
20:58
🔗
|
SketchCow |
I've moved my own web site and have redirection set up (at least until August) but you might want to get a copy of the rest of them while they still exist. |
20:58
🔗
|
SketchCow |
Thanks! |
20:58
🔗
|
SketchCow |
... |
20:58
🔗
|
SketchCow |
They aren't providing any support for moving the websites and won't support redirection after August, so i suspect most of the web pages will simply vanish. They also have not made any of this clear in their public announcements, so I suspect many of the web site authors will be caught by surprise |
20:58
🔗
|
SketchCow |
I only found out by having several conversations with the IT folks. I get the feeling this aspect of the transition was overlooked. |
20:58
🔗
|
balrog |
:/ |
20:59
🔗
|
SketchCow |
Let's do it. |
20:59
🔗
|
Nemo_bis |
omg crazy |
20:59
🔗
|
SketchCow |
#uwish |
21:00
🔗
|
nitro2k01 |
This might possibly be possible to do from the inside with that guy's help |
21:00
🔗
|
nitro2k01 |
If this is a standard UNIX system, it's likely that public_html is world-readable |
21:01
🔗
|
nitro2k01 |
So you could read /home/xxx/public_html/* |
21:01
🔗
|
balrog |
nitro2k01: if there are cgi scripts those may not be world readable |
21:01
🔗
|
nitro2k01 |
Even though you can't read /home/xxx/* in general |
21:01
🔗
|
nitro2k01 |
Right |
21:01
🔗
|
nitro2k01 |
Still worth a shot if he wants to give it a go |
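A quick way to test the world-readable idea from a shell account, sketched under the assumption of ordinary POSIX permission bits. It looks only at the "other" bits on the path itself, not on the parent directories, and the function name is hypothetical.

```python
import os
import stat
import tempfile


def others_can_read(path):
    """True if the 'other' permission bits allow reading `path`.
    A directory also needs the execute (search) bit to be usable."""
    mode = os.stat(path).st_mode
    if stat.S_ISDIR(mode):
        needed = stat.S_IROTH | stat.S_IXOTH
    else:
        needed = stat.S_IROTH
    return (mode & needed) == needed


# Demo on a throwaway file rather than a real home directory.
with tempfile.NamedTemporaryFile() as f:
    os.chmod(f.name, 0o644)
    print(others_can_read(f.name))
    os.chmod(f.name, 0o600)
    print(others_can_read(f.name))
```

CGI scripts that are executable but not readable by others would come back as unreadable here, which matches the caveat raised above.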
21:03
🔗
|
emijrp |
Archive Team goes to University. |
21:04
🔗
|
DoubleJ |
Any CMU or MIT kids in here? Unless things have changed, you should still be able to hit umich.edu over AFS/Athena. |
21:04
🔗
|
DoubleJ |
Assuming of course, that umich won't mind you hammering their servers with a shell script... |
21:04
🔗
|
DoubleJ |
And of course, WARCs will still be needed. |
21:05
🔗
|
nitro2k01 |
Doesn't even need to be a shell script |
21:05
🔗
|
nitro2k01 |
tar + wildcards ftw |
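The "tar + wildcards" idea can be sketched with the standard library as well, assuming the sites live under a pattern like `/home/*/public_html`. The demo below builds a temporary stand-in tree with made-up user names so it does not touch any real system; the function name is hypothetical.

```python
import glob
import os
import tarfile
import tempfile


def archive_matching(pattern, out_path):
    """Add every path matching `pattern` to a gzip-compressed tarball."""
    paths = sorted(glob.glob(pattern))
    with tarfile.open(out_path, "w:gz") as tar:
        for path in paths:
            tar.add(path)  # directories are added recursively
    return paths


# Stand-in for /home/*/public_html, built in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    for user in ("alice", "bob"):
        site = os.path.join(root, "home", user, "public_html")
        os.makedirs(site)
        with open(os.path.join(site, "index.html"), "w") as f:
            f.write("<p>hello</p>")
    out = os.path.join(root, "sites.tar.gz")
    found = archive_matching(os.path.join(root, "home", "*", "public_html"), out)
    print(len(found), "directories archived")
```

A tarball made this way keeps file contents and timestamps but not HTTP headers, so it complements WARCs and does not replace them.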
21:05
🔗
|
balrog |
DoubleJ: you could throttle it :p |
21:07
🔗
|
emijrp |
What do a bunch of archivists inside an University full of chicks? |
21:07
🔗
|
ersi |
you accidentally words |
21:09
🔗
|
nitro2k01 |
"an ooniversity" |
22:55
🔗
|
bbot_ |
SketchCow: https://secure.flickr.com/photos/textfiles/6716867195/in/photostream/ |
22:55
🔗
|
bbot_ |
What's the jointed-arm-thing under the dust cover? |
22:57
🔗
|
nitro2k01 |
Could be a microscope of some kind |
23:30
🔗
|
SketchCow |
Magnifying glass. |
23:56
🔗
|
BlueMax |
Microscope? |
23:57
🔗
|
DFJustin |
http://news.thomasnet.com/fullstory/Bench-Magnifier-offers-accurate-view-across-surface-area-484982 |