Time |
Nickname |
Message |
00:19
🔗
|
dashcloud |
so is there a general tool that will scrape urls from a google search for you? |
02:18
🔗
|
Coderjoe |
no kidding about metadata being a love note to the future... |
02:18
🔗
|
Coderjoe |
I have a couple hundred unlabeled VHS tapes, a number of unlabeled hard drives (offline), a bunch of unlabeled floppies, etc... |
02:19
🔗
|
Coderjoe |
but something is a little odd about the shirt coloring around the text, like they had the heat press turned up too high, pressed too long, or with too much pressure |
02:33
🔗
|
asking |
can we archive lachlan cranswick's page? |
02:33
🔗
|
asking |
he died and i don't know how long it will be up |
02:34
🔗
|
asking |
he was a physicist from australia with a bunch of topics on his page ranging from physics to poetry |
02:34
🔗
|
asking |
lachlan.bluehaze.com.au/ |
02:34
🔗
|
asking |
proof: www.abc.net.au/news/2010-06-16/aussie-scientists-body-found-in-canadian-river/869416 |
02:34
🔗
|
asking |
more proof www.cbc.ca/news/canada/ottawa/story/2010/02/04/deep-river-lachlan-cranswick.html |
02:40
🔗
|
Coderjoe |
go ahead |
02:41
🔗
|
asking |
i don't have the space |
02:41
🔗
|
Coderjoe |
Nothing stops an individual from archiving something they feel is important |
02:41
🔗
|
Coderjoe |
... aside from that :-\ |
02:41
🔗
|
asking |
yes there are a shitload of things |
02:41
🔗
|
asking |
time, space, bandwidth |
02:41
🔗
|
asking |
knowledge |
02:42
🔗
|
asking |
do you feel this is an important website? |
02:43
🔗
|
Coderjoe |
hmm |
02:43
🔗
|
Coderjoe |
it has been slightly modified from how he left it by the person adding the note that it is being left how he left it |
02:45
🔗
|
asking |
yea that was probably a woman |
02:45
🔗
|
asking |
you know, it's not like he gave his admin passwords to his nuclear physicist hacker co-workers |
02:45
🔗
|
asking |
i was wget'ing for an hour and it was over 600mb |
02:46
🔗
|
Coderjoe |
nice robots.txt... only thing disallowed is the web server statistics |
02:47
🔗
|
asking |
how can i estimate the website's size? |
02:47
🔗
|
asking |
in its entirety |
02:48
🔗
|
asking |
sum of all files |
02:50
🔗
|
Coderjoe |
without fetching everything? not really possible |
02:50
🔗
|
asking |
curl has -I |
02:50
🔗
|
asking |
curl displays the file size and last modification time only. |
02:50
🔗
|
Coderjoe |
you would have to spider everything with head requests as a minimum |
02:51
🔗
|
asking |
doesn't seem to work for index.html |
02:51
🔗
|
Coderjoe |
i've a mirroring underway |
02:51
🔗
|
asking |
do you think he has stats in that hidden folder mentioned in robots.txt? |
02:51
🔗
|
Coderjoe |
yeah, the server seems to not be sending content lengths for anything |
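The size-estimation idea discussed above (spider the site, send a HEAD request per URL, add up the reported sizes) could be sketched roughly as follows. This is only an illustration: the function names are made up for this sketch, the list of URLs still has to come from a crawl, and, as noted in the log, a server that omits `Content-Length` leaves the total unknown for those pages.

```python
import urllib.request


def parse_content_length(value):
    """Turn a raw Content-Length header value into an int, or None if missing/invalid."""
    if value is None:
        return None
    value = value.strip()
    return int(value) if value.isdigit() else None


def head_content_length(url, timeout=10):
    """Send a HEAD request and return the reported size in bytes, or None if absent."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_content_length(resp.headers.get("Content-Length"))


def summarize(lengths):
    """Return (total_bytes, unknown_count) for an iterable of int-or-None sizes."""
    total = 0
    unknown = 0
    for n in lengths:
        if n is None:
            unknown += 1  # server sent no usable Content-Length
        else:
            total += n
    return total, unknown
```

Given a list `urls` gathered by a spider, `summarize(head_content_length(u) for u in urls)` would give a lower bound on the site size plus a count of pages whose size could not be determined without fetching them.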
02:51
🔗
|
asking |
\fuck/ |
02:52
🔗
|
Coderjoe |
yes, there are website statistics (like hit counts and such) |
02:52
🔗
|
asking |
are there size statistics? |
02:53
🔗
|
Coderjoe |
oh crap, I think I might want to respect that robots.txt entry. |
02:53
🔗
|
Coderjoe |
the statistics reports appear to be generated on request |
02:53
🔗
|
asking |
what does that mean for me? |
02:54
🔗
|
asking |
btw archiveteam has a really big list of likely-to-die pages, do you think you will manage to save them all? |
02:54
🔗
|
Coderjoe |
oh yay. no last-modified header either |
02:56
🔗
|
* |
Coderjoe tells wget to ignore /repwork, where the realtime report scripts are located |
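A crawler can make the same decision in code: honor robots.txt and additionally skip a directory by prefix. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt body shown is hypothetical (the real file's contents are not quoted in the log), and `should_fetch` is a name invented for this example.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /repwork/
"""


def build_parser(robots_text):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def should_fetch(parser, url, extra_excluded=("/repwork",), agent="*"):
    """Return False if the URL is under an excluded directory or disallowed by robots.txt."""
    path = urlparse(url).path
    for prefix in extra_excluded:
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            return False
    return parser.can_fetch(agent, url)
```

Keeping the explicit prefix list separate from robots.txt means dynamically generated report pages are skipped even if the site's robots.txt does not list them.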
02:58
🔗
|
asking |
robots.txt has content-length:36 with accept-ranges:bytes, does that mean 36 bytes? |
02:59
🔗
|
Coderjoe |
content-length: 36 means 36 bytes |
02:59
🔗
|
Coderjoe |
whee. deep.html is over 1MB |
03:01
🔗
|
asking |
what are you doing? |
03:01
🔗
|
Coderjoe |
watching wget take forever to grab some files |
03:01
🔗
|
Coderjoe |
wow |
03:01
🔗
|
asking |
btw another website where the owner died that has a shitload of web history on it is fravia's website @ searchlores.org |
03:01
🔗
|
Coderjoe |
other-links.html is at 6MB and growing |
03:02
🔗
|
Coderjoe |
at 80KB/s this is going to take awhile |
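For a sense of scale, the transfer time at a fixed rate is just size divided by rate. The sketch below uses the 600 MB that asking reported having fetched as a lower bound on the site size, and assumes the 80 KB/s rate holds steady; both helper names are made up for this example.

```python
KB = 1024
MB = 1024 * KB


def transfer_seconds(size_bytes, rate_bytes_per_s):
    """Time in seconds to move size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_s


def fmt_hms(seconds):
    """Format a duration in seconds as e.g. '2h08m00s'."""
    seconds = int(round(seconds))
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}h{m:02d}m{s:02d}s"


# 600 MB at 80 KB/s -> 7680 seconds, i.e. a little over two hours.
print(fmt_hms(transfer_seconds(600 * MB, 80 * KB)))
```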
03:02
🔗
|
asking |
better known by his nickname Fravia, was a software reverse engineer and "seeker" known for his web ... |
03:02
🔗
|
asking |
Coderjoe: what are you mirroring? |
03:02
🔗
|
Coderjoe |
lachlan.bluehaze.com.au |
03:02
🔗
|
asking |
any way i can mirror it directly to a vps? |
03:03
🔗
|
Coderjoe |
run the wget on the vps? |
03:03
🔗
|
asking |
no run the curl on my site that pushes every file to a vps then deletes it, i can run that in memory |
03:03
🔗
|
Coderjoe |
the bottleneck isn't on my end |
03:03
🔗
|
asking |
or run it all in memory .. i have 16 giga |
03:04
🔗
|
asking |
if i had a place to store all this information i could do it myself |
03:04
🔗
|
asking |
or together |
03:04
🔗
|
Coderjoe |
if you are running linux, you can mount a ramfs and wget to that |
03:04
🔗
|
Coderjoe |
you don't have much free disk space? |
03:05
🔗
|
asking |
like 50mb |
03:05
🔗
|
asking |
i have a 16giga ramfs, but then what |
03:05
🔗
|
asking |
everyone reboots sometimes |
03:05
🔗
|
Coderjoe |
tar it up? |
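The "tar it up" suggestion, packing a mirror held in RAM into a single archive that can then be copied somewhere persistent, might look like this with Python's standard `tarfile` module. The function name and the paths in the usage note are placeholders, not anything from the log.

```python
import tarfile
from pathlib import Path


def tar_directory(src_dir, archive_path):
    """Pack src_dir into a gzip-compressed tar archive and return the archive path."""
    src = Path(src_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the directory name
        tar.add(src, arcname=src.name)
    return Path(archive_path)
```

For example, `tar_directory("/mnt/ram/mirror", "/mnt/ram/mirror.tar.gz")` would leave one file to push to a VPS or FTP server before the next reboot wipes the RAM-backed filesystem. The archive itself still needs space somewhere while it is being written.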
03:06
🔗
|
Coderjoe |
yeesh. only 50mb? |
03:06
🔗
|
asking |
actually less, let me check |
03:06
🔗
|
asking |
last thing i did was tar up my repository of code |
03:06
🔗
|
Coderjoe |
i'm grabbing it to a disk with 400GB free |
03:06
🔗
|
Coderjoe |
hmm |
03:06
🔗
|
asking |
set up an ftp server and i'll dump to it? |
03:07
🔗
|
Coderjoe |
i am getting content lengths and mtimes on things like jpg files |
03:07
🔗
|
asking |
it can't be that big, his website is pretty old |
03:07
🔗
|
Coderjoe |
makes me think the html is being processed |
03:18
🔗
|
Coderjoe |
mmm |
03:19
🔗
|
Coderjoe |
404 |
03:19
🔗
|
Coderjoe |
amusing 404 page |
03:19
🔗
|
Coderjoe |
http://lachlan.bluehaze.com.au/ccp14admin/security/index.html |
03:19
🔗
|
Coderjoe |
you know... i probably would be better off using wget-warc |
03:21
🔗
|
asking |
his email address is invalid |
03:23
🔗
|
Coderjoe |
Paradoks: is your password "username"? |
03:23
🔗
|
asking |
are you fetching it in parallel? |
03:24
🔗
|
asking |
do you ever get harassed by your isp? |
03:24
🔗
|
Coderjoe |
not for a number of years |
03:25
🔗
|
Coderjoe |
comcast disconnected me without notice a few years back. (and I mean notice as in "we're disconnecting you", not like they claim they did, which was 2 months prior for a warning) |
03:25
🔗
|
balrog |
Coderjoe: for overuse? |
03:25
🔗
|
Coderjoe |
yep |
03:25
🔗
|
balrog |
verizon doesn't harrass |
03:26
🔗
|
balrog |
harass * |
03:26
🔗
|
balrog |
I wish I had FiOS though. it is available here. |
03:26
🔗
|
Coderjoe |
this was back when they had the unpublished limit, before they started with the overage charge |
03:26
🔗
|
asking |
i was reading in a thread today, one guy says he's being harassed because he got caught a few times for pirating torrents |
03:26
🔗
|
asking |
then someone else replies 'tell them to fuck off or else you'll choose another isp' |
03:26
🔗
|
Coderjoe |
I now have at&t uverse. they were supposedly going to put caps on it, but it doesn't appear to have happened yet |
03:26
🔗
|
asking |
you think they are likely to sue or obey? |
03:27
🔗
|
Coderjoe |
(possibly because they have trouble distinguishing TV traffic from the user's own internet traffic) |
03:27
🔗
|
Coderjoe |
that who is likely to sue or obey? |
03:27
🔗
|
asking |
that isp |
03:27
🔗
|
asking |
the isp in the guy in my story |
03:28
🔗
|
Coderjoe |
it is not the ISP's responsibility to sue over copyright infringement. it is the infringed party's. |
03:28
🔗
|
asking |
oh true |
03:29
🔗
|
Coderjoe |
hmm. this site might be large... looks like he published a lot of logs (with photos) of his travels |
03:30
🔗
|
asking |
wget downloaded those first for me, and it was just over 500 from 2002 to 2007 |
03:30
🔗
|
asking |
as i remember they only went to 2008, and the later years only had a few pics in them |
03:31
🔗
|
asking |
you can also delete some of the gifs, he had copies of jpgs as gifs for thumbnails or quicker page loading |
03:31
🔗
|
asking |
(they're copies but one of the copies is lower quality) |
03:31
🔗
|
Coderjoe |
i will not be deleting anything |
03:32
🔗
|
asking |
i wonder if he embedded archives in his pictures |
03:35
🔗
|
m0lson |
asking, you should tell the user on the forum to stay off the piratebay |
03:38
🔗
|
asking |
what? |
03:39
🔗
|
m0lson |
"one guy says he's being harassed because he got caught a few times for pirating torrents" |
03:41
🔗
|
asking |
oh that, i didn't save the link |
03:41
🔗
|
asking |
he didn't necessarily have to be on the pirate bay |
03:41
🔗
|
m0lson |
public trackers in general |
03:41
🔗
|
asking |
sure |
04:26
🔗
|
asking |
Coderjoe: did you leave it running this whole time? how much did it download? |
04:27
🔗
|
Coderjoe |
currently at 250M |
04:27
🔗
|
Coderjoe |
and in the reports |
04:28
🔗
|
Coderjoe |
(i told wget to ignore the directory where the realtime report scripts live) |
05:44
🔗
|
SketchCow |
Uploaded 4 terabytes of Yahoo Video! |
05:44
🔗
|
asking |
:O |
05:44
🔗
|
SketchCow |
Now I'm adding more, doing up .tars for it, and so on. |
05:45
🔗
|
SketchCow |
So hopefully tomorrow, I can set the thing to start uploading them again. |
05:53
🔗
|
SketchCow |
I've uploaded 41 sets of videos so far. |
05:53
🔗
|
SketchCow |
I forgot it's only, like, 9.7 million users. |
05:53
🔗
|
asking |
all in flv? |
05:53
🔗
|
SketchCow |
http://www.textfiles.com/videoyahoo/USERSCRAPE/USERLISTS/ |
05:53
🔗
|
SketchCow |
I believe so, yes. |
05:54
🔗
|
asking |
it's too bad you guys are so strict on altering the content, you would achieve better compression if you converted flv to another format before compressing as an archive |
05:54
🔗
|
asking |
there are lossless functions if you want |
05:54
🔗
|
SketchCow |
Yes, that's really too bad. |
05:54
🔗
|
SketchCow |
Remember when you see those old books? |
05:54
🔗
|
SketchCow |
It's too bad they didn't cut out the parts that weren't that politically expedient. |
05:54
🔗
|
SketchCow |
Or took out the black people |
05:55
🔗
|
asking |
it's lossless |
05:55
🔗
|
SketchCow |
Or thought they could rewrite them in shorthand to save space |
05:55
🔗
|
asking |
no but how about tripping margins |
05:55
🔗
|
SketchCow |
Is that like tripping balls? |
05:55
🔗
|
asking |
what? |
05:55
🔗
|
asking |
you mean lossless compression isn't good enough? |
05:56
🔗
|
SketchCow |
You are free to download the files we're uploading, compress them any way you want, and make a new set. |
05:56
🔗
|
SketchCow |
Give me another week or two, still uploading. |
05:56
🔗
|
asking |
and then throw it away in /dev/null? fantastic! |
05:56
🔗
|
SketchCow |
That doesn't sound very lossless |
05:57
🔗
|
asking |
the files are lossless, your prerogative is lossy |
05:58
🔗
|
asking |
perhaps i didn't understand clearly (i'm new here), you want to mirror them as a compressed torrent, or exactly as the user interface for yahoo.com/videos was |
05:59
🔗
|
SketchCow |
I want to take these 15-20 terabytes of Yahoo! Video we downloaded over 4-5 months and put them on archive.org. |
05:59
🔗
|
underscor |
And let people do whatever they want with them |
05:59
🔗
|
SketchCow |
And I'm something like 10 terabytes in, things are going swimmingly. |
06:00
🔗
|
asking |
after them download it, or browser all those videos on archive.org |
06:00
🔗
|
asking |
...? |
06:00
🔗
|
SketchCow |
That wasn't english |
06:00
🔗
|
asking |
after that download it, or browse all those videos on archive.org |
06:00
🔗
|
SketchCow |
Good question. Not sure. |
06:00
🔗
|
SketchCow |
I'm sure something will happen with them. |
06:01
🔗
|
asking |
do you have a small dataset uploaded? |
06:01
🔗
|
SketchCow |
I am probably going to write something to download the files, run some number crunching, upload the resulting number crunching. |
06:01
🔗
|
SketchCow |
I have a fucking huge dataset uploaded, that's even better than a small one. |
06:01
🔗
|
asking |
are the videos divided into folders of users? |
06:02
🔗
|
SketchCow |
I guess there's a small one in the big one. |
06:02
🔗
|
SketchCow |
Yes. |
06:02
🔗
|
asking |
where? |
06:02
🔗
|
asking |
what's the url |
06:02
🔗
|
SketchCow |
http://www.archive.org/details/archiveteam-yahoovideo |
06:02
🔗
|
asking |
i'll get back to you in a few days |
06:49
🔗
|
Coderjoe |
lachlan mirror still going. grabbing lots of jpgs now.. currently at 748M |
07:06
🔗
|
asking |
jeez |
07:07
🔗
|
asking |
is there a way to retrieve usenet posts that had the x-no-distribute header? |
07:08
🔗
|
Coderjoe |
if you have the article id or server-specific index number, perhaps |
07:08
🔗
|
Coderjoe |
otherwise, you have to pull full headers (not xover) of all the articles to find the ones that have headers like that |
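Once the full header blocks have been pulled, picking out the articles that carry a given header is a simple filter. The sketch below assumes the headers are already in hand as `(article_id, raw_header_text)` pairs; how they are fetched from the news server is left out, and the function names are invented for this example.

```python
from email.parser import HeaderParser


def has_header(raw_headers, name):
    """Return True if the raw header block contains the named header (case-insensitive)."""
    msg = HeaderParser().parsestr(raw_headers)
    return msg.get(name) is not None


def select_articles(articles, name):
    """Return the IDs of articles whose full headers include the named header.

    articles: iterable of (article_id, raw_header_text) pairs.
    """
    return [article_id for article_id, raw in articles if has_header(raw, name)]
```

This is why an overview (xover) listing is not enough: it carries only a fixed set of fields, so a non-standard header only shows up when each article's full headers are retrieved.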
12:01
🔗
|
Coderjoe |
now at 1.6G and still going |
12:55
🔗
|
Paradoks |
Coderjoe: Of course "username" is my password. What's the point of having a password if it doesn't logically fit the username? (I have no idea why my User ID is "password". If I set it, I set it years ago.) |
13:52
🔗
|
bearh |
Where should i upload my backup of an old website (only 30mb zipped)? |
15:10
🔗
|
inv |
Bear_: you're bearh? |
15:11
🔗
|
inv |
msg SketchCow, he'll hook you up with an rsync account |
15:11
🔗
|
inv |
I can host it temporarily if you can't keep a stable connection up |
15:13
🔗
|
Bear_ |
kk |
15:13
🔗
|
Bear_ |
Sorry, I wasn't looking at my irc client. |
16:18
🔗
|
Schbirid |
anyone know a good tool to download images off a flickr account without being the owner of that account? |
16:22
🔗
|
underscor |
Schbirid: I have some |
16:23
🔗
|
underscor |
Linux only though |
16:23
🔗
|
Schbirid |
splendid |
16:23
🔗
|
Schbirid |
i remember the term "flickr fuckr" but could not find it anymore |
16:23
🔗
|
underscor |
Let me go find them |
16:24
🔗
|
underscor |
Hmm, I think this is the current version |
16:24
🔗
|
underscor |
Gimme a username to test |
16:24
🔗
|
underscor |
:) |
16:25
🔗
|
Schbirid |
random flickr says "minkee" |
16:27
🔗
|
underscor |
Schbirid: Thanks |
16:27
🔗
|
underscor |
One second |
16:27
🔗
|
underscor |
Making a few changes |
16:27
🔗
|
Schbirid |
you rock |
16:27
🔗
|
Schbirid |
i'll be back in ~30 minutes |
16:27
🔗
|
underscor |
Ok |
16:28
🔗
|
underscor |
If you want to fully extract everything, you need ruby and the json and yaml gems |
16:28
🔗
|
underscor |
(just fyi) |
16:55
🔗
|
asking |
Coderjoe: did it finish? |
16:56
🔗
|
asking |
wow 1.6 |
16:57
🔗
|
asking |
Schbirid: you can't if they're marked private, but if you're going to index flickr, javascript can do it |
16:57
🔗
|
asking |
javascript is great because you can use it like the firefox XPS attacks against irc networks |
16:58
🔗
|
asking |
you get random users to go on your web site and the code runs |
16:58
🔗
|
asking |
they leave the site open and it scrapes and sends back the pics to you |
16:58
🔗
|
asking |
there's also javascript ddos websites done in the same fashion |
16:58
🔗
|
asking |
or bandwidth killers, those load up as many pictures as possible from the website |
16:59
🔗
|
asking |
it's really easy to get people to use that software to help you since you don't have to explain much to them |
17:00
🔗
|
asking |
like if u want grandma to help you, you can do it this way |
17:20
🔗
|
underscor |
or you can not violate flickr's TOS by doing it the legitimate way |
17:22
🔗
|
underscor |
Schbirid: http://tracker.archive.org/flickr/ |
17:22
🔗
|
underscor |
There's an example of the output too |
17:22
🔗
|
underscor |
Need ruby, libjson-ruby, and libyaml-ruby |
17:23
🔗
|
underscor |
(if you're on debian or ubuntu) |
17:39
🔗
|
Schbirid |
underscor: works partially, it does not seem to like usernames like 54421772@N03 but looks like it is downloading the images alright |
17:39
🔗
|
Schbirid |
thank you! |
17:47
🔗
|
underscor |
Make sure you put the username in quotes |
17:47
🔗
|
underscor |
But other than that, yay |
17:49
🔗
|
Schbirid |
i did, it still seemed to use the @ as separator somewhere |
17:50
🔗
|
Schbirid |
wait, actually this is not those usernames |
17:50
🔗
|
Schbirid |
got a normal one with the same error. let me pastebin it |
18:00
🔗
|
Schbirid |
underscor: http://pastebin.com/Hfk4tKBi |
18:00
🔗
|
Schbirid |
i did "gem install json", not yaml since google told me that stuff is builtin. using ruby 1.9.something (latest on archlinux) |
18:03
🔗
|
underscor |
oh okay, yeah |
18:03
🔗
|
underscor |
1.8 needs it |
18:04
🔗
|
underscor |
Ah, numeric username was borking some regexps |
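The log does not show the pattern that failed, so the following is only an illustration of the general problem: a downloader has to cope with both plain screen names and Flickr's numeric user IDs of the form `54421772@N03`. The regular expression and the `classify` helper are assumptions made for this sketch, not the tool's actual code.

```python
import re

# Numeric Flickr user ID: digits, "@N", digits (e.g. 54421772@N03).
NSID_RE = re.compile(r"^\d+@N\d+$")


def classify(name):
    """Return 'nsid' for a numeric Flickr user ID, otherwise 'screen_name'."""
    return "nsid" if NSID_RE.match(name) else "screen_name"
```

Classifying the argument up front avoids later patterns having to guess whether an `@` or a run of digits is part of the identifier.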
18:05
🔗
|
underscor |
Think I fixed it, let me test |
18:06
🔗
|
asking |
underscor: that's legitimate |
18:06
🔗
|
asking |
it's as legitimate as using wget to scrape it all |
18:06
🔗
|
asking |
except you're using javascript, and you can distribute easily |
18:08
🔗
|
underscor |
except it's a lot more work to write the backend stuff to control it all |
18:08
🔗
|
underscor |
Also, you run into XSS issues too |
18:09
🔗
|
underscor |
Schbirid: flickrgrabr updated, should work now |
18:10
🔗
|
Schbirid |
thanks! |
18:24
🔗
|
alard |
That's a funny use of Ruby: as a json to yaml converter. (For people who don't like json, presumably. :) |
18:28
🔗
|
Schbirid |
underscor: yes, that fixed it :) |
18:29
🔗
|
underscor |
awesome |
18:29
🔗
|
underscor |
alard: ;) |
19:05
🔗
|
underscor |
<underscor> (It's only singleuser, so if someone's already connected it'll drop new ones) |
19:05
🔗
|
underscor |
<underscor> ? :) |
19:05
🔗
|
underscor |
<underscor> Someone willing to test it by telnetting to 71.126.138.142:42 |
19:05
🔗
|
underscor |
<underscor> Whee, got my parallax propeller running hangman over telnet |
19:15
🔗
|
closure |
wtf is a parallax propeller? |
19:15
🔗
|
alard |
Tried it, lost. Nice. (I'm probably not familiar enough with the type of words it uses.) |
19:15
🔗
|
underscor |
it's an 8 core microchip |
19:15
🔗
|
underscor |
http://tracker.archive.org/wordbank.txt |
19:16
🔗
|
underscor |
closure: running at 80mhz with (I think) 32k ram |
19:17
🔗
|
balrog |
an interesting microcontroller |
19:17
🔗
|
alard |
underscor: Ha. I recognize only a few. |
19:17
🔗
|
underscor |
:) |
19:18
🔗
|
alard |
I did learn the word 'gonkulator', so it's a very educational game. |
19:18
🔗
|
Coderjoe |
asking: still going. now at 2.7G |
19:19
🔗
|
underscor |
alard: hahha |
19:41
🔗
|
lemonkey |
http://laughingsquid.com/fifteen-people-youll-see-at-every-video-gamecomicnerd-convention/ |
19:47
🔗
|
lemonkey |
http://sfist.com/attachments/SFist_AndrewD/OccupySF_Oct15_19_steverhodes.jpg?487 |
19:47
🔗
|
lemonkey |
woops wrong chan |
19:48
🔗
|
balrog |
lemonkey: got my PM? |
19:52
🔗
|
alard |
underscor: https://gist.github.com/3e333ef4b583117928ee (Sorry, couldn't help it.) |
19:53
🔗
|
underscor |
:D |
19:53
🔗
|
underscor |
That's awesome |
20:46
🔗
|
human39_ |
I just acquired some old flyers/pictures zines from a basement. They're a bit damp. Any tips on how to dry these guys out? |
21:48
🔗
|
bsmith093 |
anyone good with archive conversion? |
21:50
🔗
|
bsmith093 |
specifically cbz cbr to pdf? |
22:36
🔗
|
alard |
underscor: I was wondering, would you be able (and willing) to set up a listerine-like thing for the downloading of MobileMe? It would be really helpful. |