Time |
Nickname |
Message |
00:12
|
ivan` |
anyone backed up any of what.cd? |
00:13
|
ivan` |
esp. collages |
00:15
|
DFJustin |
yes |
00:22
|
Aranje |
^ |
01:20
|
turnkit |
Archiving some CD-ROMs using imgburn. Can anyone confirm whether imgburn will create a .cue/.bin automatically if the disc is dual mode (mode1 and mode 2 -- i.e. data + audio)? So far all the discs are ripping as .iso. I just want to make sure a dual mode disc will be archived correctly and that I'm using the right tool. |
03:06
|
DFJustin |
turnkit: yes imgburn will automatically switch to bin/cue for anything that's not vanilla mode1 |
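A rough sketch of the rule DFJustin describes, applied to a cue sheet: a disc that is exactly one Mode 1 data track fits in a plain .iso, while any audio or Mode 2 track means the BIN/CUE pair should be kept. This is an illustration under that assumption, not ImgBurn's actual detection logic, and the function names are invented for the example.

```python
import re

TRACK_LINE = re.compile(r"^\s*TRACK\s+\d+\s+(\S+)", re.MULTILINE | re.IGNORECASE)


def track_modes(cue_text: str) -> list[str]:
    """Return the mode of each TRACK line in a cue sheet, e.g. 'MODE1/2352' or 'AUDIO'."""
    return [mode.upper() for mode in TRACK_LINE.findall(cue_text)]


def iso_is_enough(cue_text: str) -> bool:
    """True only for a 'vanilla' disc: exactly one track, and it is Mode 1 data.

    Extra audio tracks or Mode 2 data need the raw BIN/CUE pair to be
    preserved faithfully.
    """
    modes = track_modes(cue_text)
    return len(modes) == 1 and modes[0].startswith("MODE1/")


# A mixed-mode (data + audio) disc, the case turnkit asked about.
mixed = """FILE "disc.bin" BINARY
  TRACK 01 MODE1/2352
    INDEX 01 00:00:00
  TRACK 02 AUDIO
    INDEX 01 12:34:56
"""
print(track_modes(mixed))    # ['MODE1/2352', 'AUDIO']
print(iso_is_enough(mixed))  # False
```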
03:07
|
DFJustin |
what kind of stuff are you archiving |
03:29
|
SketchCow |
I hope it's awesome stuff |
03:29
|
SketchCow |
DFJustin: You're able to fling stuff into cdbbsarchive, I see. Good. |
03:30
|
DFJustin |
have been for the better part of the last year :P |
03:30
|
SketchCow |
I figured, just figured I'd say it. |
03:32
|
DFJustin |
I'd appreciate a mass bchunk run when you get a chance, would be more efficient than converting the stuff locally and uploading it over cable |
03:33
|
SketchCow |
Explain bchunk run here (I'm tired, drove through 5 states) |
03:33
|
SketchCow |
What did bchunk do again |
03:34
|
DFJustin |
convert .bin/.cue into browseable .iso |
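For the simplest case, a single MODE1/2352 data track, the conversion boils down to keeping the 2048-byte user-data field of every 2352-byte raw sector and dropping the 12-byte sync pattern, the 4-byte header and the 288 bytes of EDC/ECC. A minimal Python sketch of just that case follows; it is not bchunk's source, and real tools also deal with audio and Mode 2 tracks.

```python
SECTOR_RAW = 2352                      # bytes per raw sector in a MODE1/2352 .bin
HEADER = 16                            # 12-byte sync pattern + 4-byte header (address + mode)
USER_DATA = 2048                       # Mode 1 payload; the remaining 288 bytes are EDC/ECC
SYNC = b"\x00" + b"\xff" * 10 + b"\x00"


def mode1_bin_to_iso(raw: bytes) -> bytes:
    """Keep only the 2048-byte user data of each raw Mode 1 sector."""
    if len(raw) % SECTOR_RAW:
        raise ValueError("input is not a whole number of 2352-byte sectors")
    out = bytearray()
    for off in range(0, len(raw), SECTOR_RAW):
        sector = raw[off:off + SECTOR_RAW]
        # Byte 15 is the mode byte; refuse anything that is not Mode 1 data.
        if sector[:12] != SYNC or sector[15] != 1:
            raise ValueError(f"sector at byte {off} is not a Mode 1 data sector")
        out += sector[HEADER:HEADER + USER_DATA]
    return bytes(out)
```

For a whole disc this reads the entire .bin into memory; streaming it one sector at a time would be the sensible choice for large images.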
03:34
|
SketchCow |
Oh that's right. |
03:34
|
SketchCow |
I can do that. |
03:34
|
SketchCow |
Right now, however, I'm working on a bitsavers ingestor |
03:34
|
DFJustin |
>:D |
03:34
|
SketchCow |
If I get that working, boom, 25,000 documents |
03:35
|
DFJustin |
al's been flinging dumps of one obscure-ass system after another towards messdev |
03:35
|
SketchCow |
One little thing, though. |
03:35
|
SketchCow |
https://archive.org/details/cdrom-descent-dimensions |
03:36
|
SketchCow |
Try to avoid going "this was sued out of existence, haw haw" in a description. |
03:36
|
DFJustin |
hah yeah I guess so, that was a copy-paste from elsewhere |
03:36
|
SketchCow |
And yes, Al's been on a hell of a run. |
03:36
|
SketchCow |
To be honest, bitsavers is terrifying at how much it's bringing in now. |
03:38
|
SketchCow |
wc -l IndexByDate.txt 25872 IndexByDate.txt |
03:38
|
SketchCow |
So 25,872 individual PDFs |
03:40
|
godane |
SketchCow: i'm uploading my official xbox magazine pdfs |
03:40
|
SketchCow |
Yes. |
03:40
|
SketchCow |
You spelled Magazine wrong. |
03:40
|
SketchCow |
it's rather exciting to watch |
03:41
|
SketchCow |
Also, they're almost guaranteed to disappear - Future Publishing is a strong, modern entity. |
03:42
|
godane |
yes but this stuff is being given away for free |
03:44
|
godane |
http://www.oxmonline.com/secretstash |
03:45
|
SketchCow |
Hmm, I may be wrong. |
03:45
|
SketchCow |
Worth seeing. |
03:48
|
godane |
i'm also fixing the typo |
03:50
|
SketchCow |
So the good news is by mistake, I uploaded a script to archive.org with my s3 keys in it. |
03:50
|
SketchCow |
I immediately deleted it when I saw what the script had done, of course. |
03:50
|
SketchCow |
But therefore, I HAD to smash my s3 keys |
03:50
|
SketchCow |
Therefore I HAVE to smash all my scripts that use those keys. |
03:51
|
SketchCow |
Therefore, I HAVE NO REASON not to make them all access an external file for their s3 key information. |
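A minimal sketch of what "access an external file for their s3 key information" could look like in Python: the keys live in one INI file outside any directory that gets uploaded, and every script reads them from there. The file name and the `[s3]` section layout are assumptions made up for this example; the `authorization: LOW access:secret` header is the form archive.org's S3-like API documents.

```python
import configparser
from pathlib import Path

# Hypothetical location; keep it outside anything you upload and chmod it 600.
KEY_FILE = Path.home() / ".ia_s3_keys.ini"


def load_s3_keys(path: Path = KEY_FILE) -> tuple[str, str]:
    """Read (access_key, secret_key) from an INI file with an [s3] section."""
    cfg = configparser.ConfigParser()
    if not cfg.read(path):             # read() returns the list of files it parsed
        raise FileNotFoundError(f"no key file at {path}")
    return cfg["s3"]["access"], cfg["s3"]["secret"]


def authorization_header(path: Path = KEY_FILE) -> dict[str, str]:
    """Build the header scripts attach to each upload request."""
    access, secret = load_s3_keys(path)
    return {"authorization": f"LOW {access}:{secret}"}
```

The expected file would contain an `[s3]` section with `access = ...` and `secret = ...` lines. Rotating ("smashing") the keys then means editing one file instead of every script.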
03:51
|
SketchCow |
AND FINALLY THEREFORE, that means as I finish these new scripts, I can drop them on people for github or whatever |
03:54
|
SketchCow |
And then I'm going after that mkepunk.com site because holy shit |
05:26
|
Aranje |
oooh |
05:59
|
turnkit |
DFJustin: MacAddict set. I collected about 70% of the set but only maybe 50% of the sleeves. Unsure when sleeves stopped and restarted. I've got a lot of silkscreen (cd) and sleeve scanning left but am uploading slowly now. Need to find a source for the missing discs still. Am trying to track them down. |
06:01
|
turnkit |
DFJustin: thx for the imgburn confirmation. Puts my mind at ease. |
06:17
|
SketchCow |
It's written! |
06:17
|
SketchCow |
I just started doing it. |
06:17
|
SketchCow |
I can do masses of specific machine uploads |
06:18
|
SketchCow |
And as you can see, it refuses to upload twice. |
06:18
|
SketchCow |
[!] beehive/118010M_A_Family_of_Video_Display_Terminals_for_OEM_Planning_Brochure.pdf already uploaded! |
06:18
|
SketchCow |
[!] beehive/BM1912RAM08-A_Model_DM20_Data_Sheet_March1980.pdf already uploaded! |
06:18
|
SketchCow |
[!] beehive/BM1912RAM08-B_Model_DM1A_Data_Sheet_March1980.pdf already uploaded! |
06:18
|
SketchCow |
[!] beehive/BM1912RAM08-C_Model_DM30_Data_Sheet_March1980.pdf already uploaded! |
06:18
|
SketchCow |
[!] beehive/BM1912RAM08-D_Model_DM10_Data_Sheet_March1980.pdf already uploaded! |
06:18
|
SketchCow |
[!] beehive/BM1912RAM08-E_Model_DM_S_Data_Sheet_March1980.pdf already uploaded! |
06:18
|
SketchCow |
[!] beehive/BM1913MAR80-Micro_4400_Brochure_March1980.pdf already uploaded! |
06:18
|
SketchCow |
[!] beehive/Beehive_Service_Field_Engineering_Brochure.pdf already uploaded! |
06:18
|
SketchCow |
[root@www /usr/home/jason/BITARCHIVERS]# for each in `grep beehive *Date* | cut -f3- -d" "`;do ./bitsaver "$each";done |
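The `[!] ... already uploaded!` lines suggest the script remembers what it has already sent. The bitsaver script itself isn't shown, so this is a hypothetical Python sketch of the same idea: the `grep | cut` pipeline from the one-liner plus a plain-text log of completed paths. The `DATE TIME PATH` index-line layout (inferred from `cut -f3-`), the log file and the `upload` callback are all assumptions.

```python
from pathlib import Path


def paths_matching(index_lines, needle):
    """Roughly `grep NEEDLE | cut -f3- -d" "`: yield fields 3 onward of matching lines.

    Lines with fewer than three space-separated fields are skipped
    (cut would print them differently).
    """
    for line in index_lines:
        if needle in line:
            fields = line.rstrip("\n").split(" ")
            if len(fields) >= 3:
                yield " ".join(fields[2:])


def upload_all(paths, done_log: Path, upload) -> None:
    """Call upload(path) once per path, skipping anything already in done_log."""
    done = set(done_log.read_text().splitlines()) if done_log.exists() else set()
    for p in paths:
        if p in done:
            print(f"[!] {p} already uploaded!")
            continue
        upload(p)
        with done_log.open("a") as log:
            log.write(p + "\n")        # record success only after upload() returns
        done.add(p)
```

Wired to the real script it might look like `upload_all(paths_matching(Path("IndexByDate.txt").read_text().splitlines(), "beehive"), Path("uploaded.txt"), lambda p: subprocess.run(["./bitsaver", p], check=True))`, where `uploaded.txt` is a made-up name. Unlike the backtick loop, this keeps paths containing spaces intact.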
06:19
|
hiker1 |
Does that upload to Archive.org? |
06:20
|
SketchCow |
Yes, that's what this is. |
06:22
|
SketchCow |
http://archive.org/details/bitsavers_beehive118isplayTerminalsforOEMPlanningBrochure_9464917 |
06:22
|
SketchCow |
for example |
06:38
|
DFJustin |
bits: fukken saved |
06:40
|
SketchCow |
yeah |
06:40
|
SketchCow |
Well, more that they're now up in the archive.org collection. |
06:40
|
SketchCow |
Now, remember the big second part to my doing this - pulling the item information into a wiki so people can edit it and we can sync it back to archive.org |
06:41
|
SketchCow |
That'll be the big thing |
06:41
|
SketchCow |
Then we can set it up for other collections. |
06:41
|
SketchCow |
total hack around internet archive not buying into collaborative metadata |
06:42
|
SketchCow |
http://archive.org/details/bitsavers_motorola68_1675238 is what we have for now |
06:49
|
SketchCow |
https://archive.org/details/bitsavers_atari40080mTextEditor1981_3442791 |
07:02
|
turnkit |
love that updating solutions are being addressed. |
07:02
|
SketchCow |
has to be. |
07:02
|
SketchCow |
this is the year! |
07:02
|
SketchCow |
I am going to fix some shit. I know how things work now, and we're going to get cruising. |
07:05
|
chronomex |
fix the shit out of it this year |
07:07
|
SketchCow |
Yeah |
07:07
|
SketchCow |
And among that is fixing the metadata thing. |
07:07
|
SketchCow |
So I'll need help with that, as we make a second wiki for pulling in collections to work on |
07:22
|
SketchCow |
x-archive-meta-title:ibm :: 1130 :: subroutines :: 00.3.003 Magnetic Tape Subroutines For Assembler and Fortran Compiled Programs for the IBM 1130 |
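That title looks like the bitsavers directory path joined with ` :: `. Assuming that is how it was built (a guess from this single example), a sketch of deriving it, along with the other `x-archive-meta-*` headers sent with an upload, might be the following; the collection name passed in is hypothetical.

```python
def bitsavers_title(path: str) -> str:
    """'ibm/1130/subroutines/Foo_Bar.pdf' -> 'ibm :: 1130 :: subroutines :: Foo Bar'."""
    *dirs, filename = path.split("/")
    stem = filename.rsplit(".", 1)[0]          # drop the .pdf extension
    return " :: ".join(dirs + [stem.replace("_", " ")])


def meta_headers(path: str, collection: str, mediatype: str = "texts") -> dict[str, str]:
    """Item metadata expressed as x-archive-meta-* request headers."""
    return {
        "x-archive-meta-title": bitsavers_title(path),
        "x-archive-meta-mediatype": mediatype,
        "x-archive-meta-collection": collection,
    }
```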
07:22
|
SketchCow |
Drool |
07:23
|
chronomex |
ooo |
07:23
|
chronomex |
now you're talking |
07:24
|
hiker1 |
You guys saved all those old documents for atari? |
07:25
|
hiker1 |
wow |
07:39
|
godane |
is trs-80 microcomputer news uploaded to archive.org? |
07:42
|
godane |
i only ask because it looks like romsheperd has it |
08:41
|
Nemo_bis |
Nice, SketchCow is keeping the OCR boxes busy. :) |
08:54
|
godane |
all of the free xbox magazines are uploaded now |
08:54
|
hiker1 |
good work |
09:04
|
Nemo_bis |
godane: come on, only 835 texts uploads? You can do better. ;-) https://archive.org/search.php?query=uploader%3Aslaxemulator%20AND%20mediatype%3Atexts |
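That link is simply a field query (`uploader:` combined with `mediatype:`) URL-encoded into `search.php`. A tiny sketch that rebuilds it, so the same query can be pointed at another uploader or mediatype:

```python
from urllib.parse import quote


def search_url(query: str) -> str:
    """Build an archive.org search.php link for a field query string."""
    # safe="" makes quote() escape ':' as %3A and spaces as %20, as in the link above.
    return "https://archive.org/search.php?query=" + quote(query, safe="")


print(search_url("uploader:slaxemulator AND mediatype:texts"))
```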
09:05
|
godane |
why is it that some of my website dumps are in texts? |
09:05
|
godane |
i know i uploaded it that way but jason moved it to archiveteam-file |
09:06
|
godane |
i do notice it doesn't have the archiveteam-file web interface |
09:07
|
godane |
SketchCow: i think you need to change the mediatype of some of my webdumps and iso files |
09:08
|
godane |
you only put them into the collection |
09:08
|
godane |
without the mediatype change it doesn't get the collection web interface |
09:10
|
godane |
SketchCow: you also missed one of my groklaw.net pdf dumps: https://archive.org/details/groklaw.net-pdfs-2007-20120827 |
09:10
|
godane |
even though it's pdfs, they're in a warc.gz and tar.gz |
09:11
|
godane |
this one needs its mediatype changed: https://archive.org/details/TechTV_Computer_Basics_with_Chris_Pirillo_and_Kate_Botello |
14:12
|
hiker1 |
Does anyone here use warc-proxy on Linux or OS x? |
14:54
|
alard |
hiker1: I use Linux. |
14:55
|
hiker1 |
ah. I was just wondering how portable it was. Apparently very. |
14:55
|
alard |
You're using it on Windows? I find that even more interesting. |
14:56
|
hiker1 |
heh. well, it works here too, with no problems |
14:59
|
alard |
Have you used the Firefox extension as well? Apparently OS X is (or was, perhaps) more difficult https://github.com/alard/warc-proxy/issues/1 |
14:59
|
hiker1 |
No, I didn't try it. My FireFox is so slow already ;_; |
14:59
|
hiker1 |
plus I haven't restarted it in god knows how long |
15:00
|
hiker1 |
That is an old ticket! |
15:05
|
hiker1 |
alard: It probably doesn't like having the two extensions |
15:05
|
alard |
This was before the two extensions. At that time WARC *.warc.gz was the only file type (and there wasn't an All files option). |
15:06
|
hiker1 |
I mean .warc.gz |
15:06
|
hiker1 |
instead of .gz |
15:06
|
alard |
Ah, I see. |
15:06
|
hiker1 |
That would be my guess |
15:07
|
hiker1 |
Probably not much is lost by changing it to just .gz. |
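hiker1's guess is that the file picker compares only the final extension. Python's `os.path.splitext` follows the same "last dot wins" convention, so it makes a handy illustration of why a `.warc.gz` filter can fail when matched against a single extension, and of a plain suffix check that handles compound extensions. This is not Firefox's nsIFilePicker code; `matches_filter` is a made-up helper.

```python
import os

name = "example.warc.gz"

# splitext() only looks at the last dot, so the "extension" is '.gz', not '.warc.gz'.
print(os.path.splitext(name))  # ('example.warc', '.gz')


def matches_filter(filename: str, patterns: list[str]) -> bool:
    """Match compound extensions by case-insensitive suffix instead of by last extension."""
    return any(filename.lower().endswith(p.lower()) for p in patterns)


print(matches_filter(name, [".warc.gz"]))  # True
```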
15:07
|
alard |
But does the 'filename extension' even exist in OS X? I thought that was a Windows thing. |
15:07
|
hiker1 |
I don't know |
15:08
|
Smiley |
hmmm |
15:08
|
Smiley |
is it like the linux version? |
15:09
|
Smiley |
If it exists, it obeys it (or complains about it) and if it doesn't exist, it doesn't care? |
15:10
|
alard |
Could this have something to do with it? https://bugzilla.mozilla.org/show_bug.cgi?id=444423 |
15:14
|
hiker1 |
might. would have to see the Firefox code for nsIFilePicker |
15:15
|
hiker1 |
well, for the OS X code. that is just an interface |
15:20
|
hiker1 |
That bug is probably related. I guess they'll be fixing it any time now. Yep, any year now. Maybe before the decade is out? Well, two decades? |
15:28
|
godane |
uploaded: http://archive.org/details/Call.For.Help.Canada.2004.08.17 |
15:29
|
godane |
SketchCow: there is going to be a call for help canada collection soon in computers and tech videos |
15:29
|
godane |
i also plan on doing the same thing for all of the screen savers episodes i have |
15:59
|
hiker1 |
What is call for help canada? |
16:00
|
hiker1 |
tech tv show |
16:00
|
hiker1 |
godane: How are you going to create the collection? |
16:00
|
godane |
i'm not creating the collection |
16:01
|
godane |
but jason scott puts my files into a collection |
16:01
|
hiker1 |
ah |
16:13
|
SketchCow |
He lights a candle and I am there |
16:17
|
godane |
the screen savers have like the last 8 months of |
16:17
|
hiker1 |
IIPC is working to create an archiving proxy: http://netpreserve.org/projects/live-archiving-http-proxy |
16:18
|
godane |
i'm thinking of something crazy that could in theory save space |
16:18
|
hiker1 |
? |
16:18
|
godane |
like merging multiple warc.gz of the same site into sort of a megawarc |
16:19
|
hiker1 |
won't help |
16:19
|
godane |
i was thinking of a way to dedup multiple warc.gz |
16:19
|
hiker1 |
all you need to do is use actual compression instead of warc.gz files. |
16:20
|
hiker1 |
warc.gz is not a true compressed file. It is a bunch of compressed files merged together. Each HTML file is compressed by itself without knowledge of the other html files. |
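hiker1's point can be checked with Python's gzip module: compress a batch of similar records one member at a time (the warc.gz layout) and again as one stream. The records here are synthetic, near-identical HTML pages, so the gap they show is illustrative only; real savings depend on the crawl.

```python
import gzip

# Synthetic "records": many near-identical pages, as a site crawl tends to produce.
template = ("<html><head><title>Page {n}</title></head><body>"
            + "shared navigation " * 50 + "</body></html>\n")
records = [template.format(n=n).encode() for n in range(200)]

# warc.gz style: every record is its own gzip member, concatenated back to back.
per_record = b"".join(gzip.compress(r) for r in records)

# One stream: the compressor can reuse matches across record boundaries.
whole = gzip.compress(b"".join(records))

# Both hold the same bytes; gzip.decompress reads all concatenated members.
assert gzip.decompress(per_record) == gzip.decompress(whole)
print(len(per_record), len(whole))  # per_record comes out noticeably larger
```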
16:20
|
alard |
SketchCow: Semantic MediaWiki + Semantic Forms might be something to look at for your metadata-wiki. |
16:20
|
godane |
this way something like my 5 torrentfreak warc.gz can be in one mega warc but a lot smaller |
16:20
|
hiker1 |
This severely hurts compression |
16:20
|
alard |
You can make structured forms like these: http://hackerspaces.org/w/index.php?title=The_1st_SPACE_Retreat&action=formedit |
16:21
|
hiker1 |
godane: Just extract a .warc.gz file, then compress it with .7z. Alternatively, extract a warc file and compress with 7-zip along with another .warc file |
16:22
|
alard |
Is compression that important? |
16:22
|
godane |
hiker1: i'm thinking of something that can still work with the wayback machine |
16:22
|
hiker1 |
alard: Yes... especially for transferring files to IA. |
16:22
|
hiker1 |
and it saves bandwidth for people downloading from IA |
16:22
|
alard |
But you're making your warc files much, much less useful. |
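One concrete reason behind alard's objection, offered as a hedged illustration: because each record in a .warc.gz is its own gzip member, a replay tool can jump to a byte offset taken from an index (CDX files record such offsets and lengths) and decompress just that one record. A single solid .7z stream gives up that random access. A minimal sketch with made-up function names, using in-memory slicing where a real reader would `seek()` and `read()`:

```python
import gzip


def write_members(records: list[bytes]) -> tuple[bytes, list[tuple[int, int]]]:
    """Gzip each record separately and note the (offset, length) of every member."""
    blob = bytearray()
    index = []
    for rec in records:
        member = gzip.compress(rec)
        index.append((len(blob), len(member)))
        blob += member
    return bytes(blob), index


def read_member(blob: bytes, offset: int, length: int) -> bytes:
    """Decompress exactly one record without touching the rest of the file."""
    return gzip.decompress(blob[offset:offset + length])
```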
16:23
|
godane |
i'm also thinking of some way for IA to have a kit to set up an archive.org at home of sorts |
16:23
|
godane |
dedup matters when you think that way |
16:23
|
hiker1 |
Does IA even use the original warc.gz file in production? I assume they use it to feed the machine, but then they could have just extracted a 7z and fed the machine with that |
16:24
|
alard |
I don't know. The wayback machine that you can download from GitHub certainly reads from .warc.gz files. |
16:25
|
hiker1 |
but I'm guessing IA has two copies |
16:25
|
hiker1 |
one as the Item, and one for the machine |
16:25
|
hiker1 |
I am not sure though, it's just a guess |
16:28
|
hiker1 |
For a continuous crawler, you could save space by checking old records in a WARC file and then adding revisit records as necessary. But this would not work with any current WARC readers (wayback machine, warc-proxy, etc.) |
16:29
|
hiker1 |
continuous crawler being e.g. one that crawls every week to check for changes to the site |
16:31
|
hiker1 |
godane: If you have the WARC ISO draft, it discusses this type of example on p. 27 |
16:39
|
godane |
hiker1: again i was thinking of a way to dedup multiple warcs into one big warc |
16:39
|
godane |
not revisit records in another file |
16:41
|
godane |
the idea is to store the file once |
16:41
|
hiker1 |
godane: If you had two snapshots, warc1 taken first then warc2. You could run through warc2 and see if the HTTP body matches the record in warc1. If it matches, append a revisit record to warc1. If they are different, append the new record to warc1. |
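The merge hiker1 describes can be sketched with plain Python objects standing in for WARC records: hash each body in the second snapshot, emit a revisit record when an identical body is already stored, and a full response otherwise, so every distinct body is kept once (godane's goal). The `Record` class and function names are hypothetical, and real revisit records also carry headers such as WARC-Profile and WARC-Refers-To, which this omits. The digest uses the `sha1:` plus base32 form commonly written into WARC-Payload-Digest by tools such as wget and Heritrix.

```python
import base64
import hashlib
from dataclasses import dataclass


@dataclass
class Record:
    uri: str
    kind: str             # "response" or "revisit"
    payload_digest: str   # same form as a WARC-Payload-Digest value
    body: bytes = b""     # empty for revisit records


def payload_digest(body: bytes) -> str:
    """SHA-1 of the HTTP body, base32-encoded, prefixed with 'sha1:'."""
    return "sha1:" + base64.b32encode(hashlib.sha1(body).digest()).decode("ascii")


def merge_snapshot(warc1: list[Record], warc2: list[tuple[str, bytes]]) -> list[Record]:
    """Append warc2's (uri, body) captures to warc1, storing each distinct body once."""
    seen = {r.payload_digest for r in warc1 if r.kind == "response"}
    merged = list(warc1)
    for uri, body in warc2:
        digest = payload_digest(body)
        if digest in seen:
            merged.append(Record(uri, "revisit", digest))        # body already stored
        else:
            merged.append(Record(uri, "response", digest, body))
            seen.add(digest)
    return merged
```

After the merge, warc2 is no longer needed, which is the "delete warc2" step in the discussion.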
16:42
|
hiker1 |
godane: Two snapshots of the same site, right? |
16:42
|
hiker1 |
or do you mean two warc files with different contents, e.g. html in one and images in another? |
16:43
|
godane |
i mean content with the same md5sum or checksum |
16:44
|
hiker1 |
Yes, as I said, run through the two and check if the http body is the same |
16:44
|
hiker1 |
WARC files already offer the payload-digest to check if the http body is the same. |
16:44
|
godane |
again i want the two to merge into one file |
16:45
|
godane |
also say that the site is dead |
16:45
|
hiker1 |
that would merge them effectively |
16:45
|
hiker1 |
after you did it, you could delete warc2. |
16:45
|
hiker1 |
since all the contents would be in warc1 |
17:58
|
SketchCow |
Just had a fascinating conversation with brewster about internet archive. |
17:58
|
SketchCow |
Talking about fundraising, still on his mind, how can that be done better next year. |
18:00
|
SketchCow |
one of the things I mentioned was Internet Archive taking over or bringing back certain "services" people have expected over the years. |
18:00
|
SketchCow |
So have a bank of virtual servers that are basically "this service" |
18:00
|
SketchCow |
brainstorming on that would be good at some point. |
18:00
|
SketchCow |
So basically, he's up for non-archive.org-interface things |
18:02
|
SketchCow |
Also, I am fucking DESTROYING the submit queue |
18:02
|
SketchCow |
\o/ |
18:05
|
* |
Nemo_bis feels the derivers gasping |
18:05
|
* |
Nemo_bis laughs devilishly |
18:14
|
SketchCow |
I can already see a few dozen, maybe a couple hundred, will come out 'wrong'. |
18:17
|
Nemo_bis |
SketchCow: my meccano-magazine-* derives always failed when rerun all together. I had to rerun only ~100 at a time to avoid them timing out on solr_post.py or getting stuck with high load on OCR hosts. |
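Nemo_bis's workaround of resubmitting about 100 items at a time is plain batching. A small Python sketch follows; the identifiers are placeholders, and how each batch is actually submitted and waited on is left to a hypothetical `submit` step.

```python
from itertools import islice


def batches(identifiers, size=100):
    """Yield consecutive lists of at most `size` identifiers, in order."""
    it = iter(identifiers)
    while batch := list(islice(it, size)):
        yield batch


ids = [f"meccano-magazine-{n}" for n in range(250)]
print([len(b) for b in batches(ids)])  # [100, 100, 50]
```

Each batch would then be handed to something like `submit(batch)`, waiting for the derive queue to drain before sending the next one.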
18:18
|
SketchCow |
Right. |
18:18
|
SketchCow |
We should make those a collection, huh. |
18:18
|
SketchCow |
I mean, it IS 650 issues |
18:21
|
SketchCow |
http://archive.org/details/meccano_magazine |
18:22
|
beardicus |
SketchCow, do you have an example service re: your talk with brewster? |
18:25
|
beardicus |
ah... maybe i see what you're getting at. eg: provide gopher archives through an actual gopher server instead of all webbed up in the archive.org interface. |
18:31
|
SketchCow |
http://archive.org/details/meccano_magazine is now coming along nicely. |
18:31
|
SketchCow |
Yes |
18:31
|
SketchCow |
That is exactly what I mean |
18:31
|
SketchCow |
"stuff" |
19:03
|
hiker1 |
How is chunked HTTP encoding supposed to be handled in a WARC file? |
19:03
|
hiker1 |
Should I just remove the chunked header from the response? |
19:18
|
hiker1 |
alard: warc-proxy passes the transfer-encoding header. This seems to leave the connection open forever. |
19:19
|
hiker1 |
for responses that have it set |
19:28
|
hiker1 |
I think I might be saving my chunks wrong. |
19:40
|
hiker1 |
No, I think I am saving them right. Hanzo warctools handles decoding the chunks, so I don't think warc-proxy should pass the transfer-encoding header since that would be telling the browser to handle the chunks. |
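hiker1's reasoning, sketched: once the stored chunked body has been decoded, the proxy should drop `Transfer-Encoding: chunked` and send an explicit `Content-Length`. Otherwise the browser expects chunk framing that never arrives, which would fit the connection staying open. This is a minimal Python illustration of the chunked format (RFC 7230 section 4.1) with hypothetical function names, not the actual code of warc-proxy or Hanzo warctools.

```python
def dechunk(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body; chunk extensions and trailers are ignored."""
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        size = int(body[pos:eol].split(b";", 1)[0], 16)   # hex size, minus any ";ext"
        pos = eol + 2
        if size == 0:                                      # last-chunk reached
            return bytes(out)
        out += body[pos:pos + size]
        pos += size + 2                                    # skip chunk data and its CRLF


def replay_headers(headers: dict[str, str], body: bytes) -> dict[str, str]:
    """Headers to send with an already-decoded body: no Transfer-Encoding, exact length."""
    clean = {k: v for k, v in headers.items()
             if k.lower() not in ("transfer-encoding", "content-length")}
    clean["Content-Length"] = str(len(body))
    return clean


raw = b"4\r\nWiki\r\n5;ext=1\r\npedia\r\n0\r\n\r\n"
print(dechunk(raw))  # b'Wikipedia'
```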
19:50
|
swebb |
That Adobe CS2 stuff can be found here: http://noelchenier.blogspot.ca/2013/01/download-adobe-cs2-for-free.html |
20:00
|
SketchCow |
Thanks. |
20:00
|
SketchCow |
Grabbing. |
20:05
|
SketchCow |
Up past 1,500 red rows on archive.org! |
20:05
|
SketchCow |
Deriver is dying on me |
20:05
|
SketchCow |
TAKE IT |
20:05
|
SketchCow |
TAKE ALL OF IT |
20:06
|
Smiley |
MOAR FATA!!!! |
20:15
|
SketchCow |
I had to stop it. |
20:15
|
SketchCow |
It's at 1,480. |
20:16
|
SketchCow |
I'll let things die down and do another massive submit after these get down |
20:16
|
SketchCow |
Or it'll murder a dev |
20:45
|
alard |
hiker1: Are you certain you have the latest warc-proxy? The latest version shouldn't send the Transfer-Encoding header: https://github.com/alard/warc-proxy/blob/master/warcproxy.py#L263 |
21:09
|
Nemo_bis |
SketchCow: I so much miss having a ganglia graph for all servers to see the CPU load. |
21:12
|
Nemo_bis |
SketchCow: some meccano-magazine-* were not added to the collection (like half of them?), in case you don't know |
21:13
|
Nemo_bis |
eg https://archive.org/catalog.php?history=1&identifier=meccano-magazine-1966-03 |
21:29
|
godane |
i got one of my smart computing magazines in the mail |
21:29
|
godane |
there is some highlighting and writing on the front |
21:29
|
godane |
but that can be fixed in gimp |
21:31
|
Nemo_bis |
omg godane's scans are RETOUCHED |
21:34
|
godane |
the original will be posted too |
21:34
|
Nemo_bis |
j/k |
21:34
|
godane |
the cover mostly had the writing on the front |
21:34
|
godane |
where this white |
21:36
|
godane |
omg: http://www.ebay.com/sch/Magazine-Back-Issues-/280/i.html?_ipg=25&_from=&_nkw=&_armrs=1&_ssn=treasures-again |
21:37
|
godane |
who is willing to give me money for archiving this? |
21:38
|
godane |
just know that all these items have about 3 days or less left |
21:40
|
DFJustin |
is there any point in paying full price for this stuff on ebay, you can probably get stacks at your local salvation army for 25c a pop |
21:42
|
godane |
i'm in new england, i don't know if the local salvation army will have this stuff? |
21:44
|
DFJustin |
your local university library probably has them then |
21:46
|
DFJustin |
at least you should check places before paying $7 an issue for stuff that's not super old and rare |
21:46
|
godane |
DFJustin: i don't drive |
21:47
|
godane |
my brother is the one that drives |
21:47
|
Nemo_bis |
there's also surely someone with a cellar full of thousands of magazines if one can drive to them and collect them |
21:47
|
Nemo_bis |
but you don't, too bad |
22:26
|
DFJustin |
http://www.forbes.com/sites/adriankingsleyhughes/2013/01/07/download-adobe-cs2-applications-for-free/ |
22:29
|
DFJustin |
* not actually free |
22:35
|
db48x |
lol |
23:06
|
apokalypt |
http://windowsphone.bboard.de/board/ |