Time | Nickname | Message
00:17 | DFJustin | there'll be plenty of time to argue about login walls after the manuscripts aren't disintegrating
02:06 | BlueMax | Is ExoDOS up yet?
02:19 | SketchCow | Still uploading.
02:19 | SketchCow | These are large items.
02:21 | SketchCow | http://archive.org/details/eXoDOSAct.v1.6 is the first and it's STILL regulating out.
02:25 | dashcloud | SketchCow: can you tweet about the Timbuktu manuscript project on the archiveteam twitter? http://www.indiegogo.com/projects/timbuktu-libraries-in-exile/x/7939
02:25 | BlueMax | fair enough SketchCow
08:40 | SketchCow | Digitizin' Apple Diskettes, rendering documentary, uploading Formspring, uploading ExoDOS, uploading bitsavers.
09:35 | BlueMax | That's the SketchCow life.
09:35 | BlueMax | Digiupding.
09:37 | BlueMax | Will it be possible to browse the large ExoDOS zip files?
09:40 | godane | looks like i have 142 episodes of This Week in Computer Hardware uploaded
10:23 | ruairi | BlueMax: what was the deal with that, are they getting uploaded?
10:24 | BlueMax | ruairi, they already are http://archive.org/details/eXoDOSAct.v1.6
10:28 | ruairi | WHOA, right on! :D
10:29 | ruairi | BlueMax: What wouldn't be welcome at archive.org then? - I can try and get some ex UG content up if I reach out
10:30 | BlueMax | I think SketchCow is more of an expert on that front than I am
10:30 | ruairi | When does he get online (ish)?
10:31 | BlueMax | Well he was online like an hour or two ago, just hang around and see if he pops back in
10:31 | SketchCow | Justin Bieber. totally unwelcome
10:32 | ruairi | SketchCow: 'ello sir! I'm ruairi aka rc55 who runs the uk demoscene party "Sundown"
10:32 | Smiley | ruairi: upload _ALL_ the things
10:32 | Smiley | worry about if it's welcome later.
10:33 | Smiley | Anything unwelcome goes dark I believe. Delete nothing, Archive everything!
10:34 | ruairi | Yeah, doesn't it get very iffy if current stuff goes up though? There are fullsets of PSX ISOs out there, but if people start uploading PS2, PS3 etc...
10:34 | Ymgve | I wonder how large a complete ps2 library is
10:34 | godane | i would think ps2 ps3 demo discs would be less iffy
10:35 | ruairi | Ymgve: I'm guessing about 1.8TB
10:35 | godane | i think it's around 500 to 800gb
10:35 | Ymgve | ruairi: that sounds way too small
10:35 | BlueMax | actually it's around that size for a single PS2 region
10:35 | BlueMax | from what I remember of the old UG torrents
10:35 | ruairi | hm
10:35 | Ymgve | that's like only 500 titles
10:35 | ruairi | Fair play :)
10:36 | godane | that must have been ps1 i was thinking of
10:36 | Ymgve | yeah, for ps1 it's probably correct since each title is at most 700mb
10:36 | Ymgve | but for ps2 a title can be up to 8gb
10:37 | Ymgve | "thankfully" there are fewer ps2 games than ps1 games
10:37 | Ymgve | and even fewer ps3 games
10:38 | ruairi | Could there be any collaboration between archive.org and redump.org possibly?
10:38 | ruairi | 23,000 dumps done there
10:45 | godane | i'm thinking of grabbing all batman 1966 tv show master vhs tapes
10:46 | godane | i just could see the comments/reviews on it now
10:46 | BlueMax | man I still have a copy of that series around here somewhere.
10:47 | godane | it was never released on dvd or vhs
10:47 | godane | these are studio vhs masters
10:50 | antomatic | Rights hell, apparently - supposed to be a very complicated production agreement and a lot of areas (e.g. home video rights) that were never discussed or agreed and aren't fully clear.
10:51 | godane | also because of all the cameo appearances
10:51 | godane | they didn't sign anything for home video rights
10:52 | antomatic | Doesn't really make the series public domain, though - I can't see how it could be legitimately carried on archive.org. (Then again a lot of what's there already surprises me, so don't listen to me..)
10:56 | godane | plus side is youtube also has it: https://www.youtube.com/playlist?list=PLA7491A8FC7830D6E
10:56 | godane | it's been uploaded for over a year
10:57 | godane | so i got with if it can stay up on youtube it should stay up on archive.org
10:57 | godane | *go with
10:58 | antomatic | Doesn't make it legitimate, though, unless the studio has uploaded it themselves. Ultimately it belongs to the studio, not anyone else. As you say, if it's on YouTube then that might indicate that the studio isn't confident in their ownership or there are other reasons why they haven't taken it down (and just not noticing is a possibility) but it doesn't legitimise it.
10:58 | antomatic | Not to be a nay-sayer, though - I'm not saying you shouldn't do it. Just that I can see the 'ethical dilemmas'.
11:00 | antomatic | If more people had made their own archiving efforts back in the 1950s and 1960s there wouldn't be so many episodes of Doctor Who missing today, for example. :)
11:00 | antomatic | Grey area, though.
11:03 | godane | Batman 1966 TV Series DVD Set Review: https://www.youtube.com/watch?v=GuvOZ8vZARM
11:04 | godane | it's from a bootleg dvd
11:05 | BlueMax | antomatic, well it's either they're not confident or they haven't got an auto Content ID check / some guy looking for it yet.
11:09 | godane | now this is funny
11:10 | godane | i was looking up the 17bit software
11:10 | godane | the amiga pd disks i have
11:10 | godane | their origin is from team17
11:12 | Tephra | WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
11:13 | antomatic | Ah, I believe it to be 'yahoosucks', Tephra.
11:13 | Tephra | antomatic: thanks :)
11:13 | godane | i need help
11:14 | godane | i didn't upload the index text file for my 17bit phase four item
11:16 | godane | now it works
11:17 | godane | i had to right click on the tree root and then click upload file
11:17 | godane | classic uploader just is not working for me for some reason
12:14 | omf_ | Ymgve, PS1 games can span multiple disks so 700mb is not the max size for a game
12:14 | Smiley | antomatic: Just because it can't be seen doesn't mean IA doesn't want it.
12:17 | omf_ | Don't we have a FAQ page for these same tired questions?
12:17 | Smiley | maybe
12:17 | Smiley | But I like repeating myself ;)
12:21 | antomatic | Aah. I see. :)
12:29 | omf_ | I cannot be the only one tired of seeing the same questions over and over again. In fact I know I am not since we are always pointing people to the wiki at least for projects and application information. I am going to look through the wiki and if we don't have a page answering these questions I am going to make one
12:33 | godane | so i'm finally uploading the last few forums of g4
12:38 | Smiley | omf_: cool
13:12 | * | ats notes someone ought to mirror http://arcade.demon.co.uk/ and its file section at ftp://arcade.demon.co.uk/ ...
13:13 | ats | although sadly I can't remember my username and password from when I used to use it back in 1993 or so...
13:22 | godane | ats: i will see about mirroring that
13:24 | BlueMax | heesh what's on that FTP? I see nothing but a bunch of nothing files with no extensions
13:25 | godane | it's because the file names are only on the website
13:25 | godane | http://arcade.demon.co.uk/filepages/file56.htm
13:25 | godane | it will give you a name and a desc of the file
13:25 | ivan` | are there any massive URL lists besides common_crawl_index?
13:27 | godane | what's the best option for mirroring an ftp site
13:29 | godane | i'm just going to mirror it
13:29 | godane | but there will be a wget.log for this
13:40 | omf_ | Okay here is the start. I am trying to keep it short and on point http://pad.archivingyoursh.it/p/faqs
13:55 | Smiley | Hmmm "Won't I get sued" ?
13:55 | Smiley | A: "Ask a lawyer"
14:33 | omf_ | added
14:37 | DFJustin | sketchcow is open to archiving porn
14:40 | BlueMax | mostly beard porn
14:40 | omf_ | I thought hat porn as well
14:41 | BlueMax | floppy porn?
14:41 | omf_ | Our logo is floppy porn
15:23 | ivan` | hard to believe, but qq.com and 163.com have rss feeds
15:54 | omf_ | We have Transclusion enabled on our wiki right?
16:24 | SketchCow | omf_: I prefer the human touch
16:31 | omf_ | So I should just copy and paste stuff from existing pages? There is material spread across multiple pages that I believe would also benefit users if presented as a single page
16:37 | SketchCow | How are you so much more angrier than me?
16:38 | omf_ | I am between jobs
16:38 | SketchCow | I'm like, the angriest person in the world.
16:38 | SketchCow | That'll do it.
16:42 | omf_ | Do we have a write once, display multiple places solution in mediawiki?
16:46 | Ravenloft | AngryCow, the new Angry Birds spin-off
16:49 | ivan` | it would be quite helpful if IA published a 2.4TB bz2 or leveldb dump of all their URLs
16:49 | omf_ | I second that motion ivan`
16:51 | ivan` | I could probably rescue thrice as much of Reader with it
16:51 | omf_ | It would help many projects
18:47 | SketchCow | ivan`: I just asked about feasibility in the dev channel.
18:54 | ivan` | cool
18:54 | SketchCow | He said it's feasible.
18:57 | omf_ | That is awesome news
18:58 | omf_ | I assume the dataset would be CC or public domain so it can be used freely.
19:31 | underscor | ivan`: you're probably looking at 5-7TB compressed, though
19:31 | underscor | it would probably have to be chunked
19:32 | ivan` | I don't have that much space at the moment, but I would definitely buy a drive or two to hold it soon
19:33 | ivan` | IA could provide full text search on the URLs :-)
19:36 | underscor | We kind of have that
19:36 | underscor | but it's internal right now
19:38 | Coderjoe | what is this url thing for? (i'm not understanding how it helps a mirror/save project)
19:38 | underscor | Same thing we use google scraping for
19:38 | underscor | finding subdomains/user pages
19:38 | Coderjoe | ah
19:39 | Coderjoe | but if ia knows of them, won't it usually already have crawled the content? (unless blocked by robots.txt)
19:39 | ivan` | there might be newer content, or in my case, a heck of a lot more content in Reader's cache
19:40 | DFJustin | often ia will have hit it with a shallow crawl but not a deep crawl of all subpages and images
19:41 | DFJustin | they have to be sparing with how deep they go in order to try and cover the whole internet
19:42 | omf_ | Archive Team always goes deep, shit we'll take the damn hard drives if we can
19:43 | godane | we are the NSA elves
20:01 | ivan` | you have been selected for backup
20:03 | omf_ | Resistance is futile
20:07 | sep332 | your historical and technical distinctiveness will be added to our... oh wait this is our own user-generated data to begin with?
20:11 | SilSte | Hi
20:11 | SilSte | are there any problems with the tracker?
20:12 | ivan` | yes
20:13 | SilSte | kk
20:13 | SilSte | do you know how long they will take?
20:13 | omf_ | nope
20:14 | alard | SilSte: The tracker is doing something -- I don't know exactly what -- and perhaps it will come back when that's finished.
20:14 | SilSte | :D
20:14 | SilSte | k
20:14 | ivan` | boy am I glad I set up my own tracker ;)
20:14 | SilSte | uploaded a 20gb+ file... don't want it to get deleted ^^
20:15 | alard | SilSte: Just wait, your warrior will keep trying to contact the tracker.
20:16 | SilSte | i know
20:18 | SketchCow | http://archive.org/sitemap/sitemap.xml
20:20 | omf_ | SketchCow, is that all the items or just all the collections?
20:21 | ersi | Interesting. *clicks*
20:21 | omf_ | there are 182 xml.gz files and each looks to have ~50,000 links
20:23 | omf_ | 9.1e6 items
20:36 | alard | The tracker is back.
20:37 | antomatic | posterous is fucked - all invalid jobs
20:44 | Coderjoe | well that's cool. one could use the sitemap to build a database of file checksums, with the lastmod date telling you if you need to grab an updated _files.xml
20:45 | SketchCow | http://web.archive.org/cdx/
21:09 | xmc | alard: got a notification that the tracker was being migrated to a new host, I take it you didn't initiate that?
21:12 | omf_ | alard, did that
21:12 | omf_ | It was for the free RAM upgrade
21:15 | xmc | thought as much
23:53 | godane | so i added "browse here" links to my g4 image dumps