#archiveteam 2013-06-17,Mon


Time Nickname Message
00:17 🔗 DFJustin there'll be plenty of time to argue about login walls after the manuscripts aren't disintegrating
02:06 🔗 BlueMax Is ExoDOS up yet?
02:19 🔗 SketchCow Still uploading.
02:19 🔗 SketchCow These are large items.
02:21 🔗 SketchCow http://archive.org/details/eXoDOSAct.v1.6 is the first and it's STILL regulating out.
02:25 🔗 dashcloud SketchCow: can you tweet about the Timbuktu manuscript project on the archiveteam twitter? http://www.indiegogo.com/projects/timbuktu-libraries-in-exile/x/7939
02:25 🔗 BlueMax fair enough SketchCow
08:40 🔗 SketchCow Digitizin' Apple Diskettes, rendering documentary, uploading Formspring, uploading ExoDOS, uploading bitsavers.
09:35 🔗 BlueMax That's the SketchCow life.
09:35 🔗 BlueMax Digiupding.
09:37 🔗 BlueMax Will it be possible to browse the large ExoDOS zip files?
09:40 🔗 godane looks like i have 142 episodes of this week in computer hardware uploaded
10:23 🔗 ruairi BlueMax: what was the deal with that, are they getting uploaded?
10:24 🔗 BlueMax ruairi, they already are http://archive.org/details/eXoDOSAct.v1.6
10:28 🔗 ruairi WHOA, right on! :D
10:29 🔗 ruairi BlueMax: What wouldn't be welcome at archive.org then? - I can try and get some ex UG content up if I reach out
10:30 🔗 BlueMax I think SketchCow is more of an expert on that front than I am
10:30 🔗 ruairi When does he get online (ish)?
10:31 🔗 BlueMax Well he was online like an hour or two ago, just hang around and see if he pops back in
10:31 🔗 SketchCow Justin Bieber. totally unwelcome
10:32 🔗 ruairi SketchCow: 'ello sir! I'm ruairi aka rc55 who runs the uk demoscene party "Sundown"
10:32 🔗 Smiley ruairi: upload _ALL_ the things
10:32 🔗 Smiley worry about if it's welcome later.
10:33 🔗 Smiley Anything unwelcome goes dark I believe. Delete nothing, Archive everything!
10:34 🔗 ruairi Yeah, doesn't it get very iffy if current stuff goes up though? There are fullsets of PSX ISOs out there, but if people start uploading PS2, PS3 etc...
10:34 🔗 Ymgve I wonder how large a complete ps2 library is
10:34 🔗 godane i would think ps2 ps3 demo discs would be less iffy
10:35 🔗 ruairi Ymgve: I'm guessing about 1.8TB
10:35 🔗 godane i think its around 500 to 800gb
10:35 🔗 Ymgve ruairi: that sounds way too small
10:35 🔗 BlueMax actually it's around that size for a single PS2 region
10:35 🔗 BlueMax from what I remember of the old UG torrents
10:35 🔗 ruairi hm
10:35 🔗 Ymgve that's like only 500 titles
10:35 🔗 ruairi Fair play :)
10:36 🔗 godane that must have been ps1 i was thinking of
10:36 🔗 Ymgve yeah, for ps1 it's probably correct since each title is at most 700mb
10:36 🔗 Ymgve but for ps2 a title can be up to 8gb
10:37 🔗 Ymgve "thankfully" there are fewer ps2 games than ps1 games
10:37 🔗 Ymgve and even fewer ps3 games
10:38 🔗 ruairi Could there be any collaboration between archive.org and redump.org possibly?
10:38 🔗 ruairi 23,000 dumps done there
10:45 🔗 godane i'm thinking of grabbing all batman 1966 tv show master vhs tapes
10:46 🔗 godane i just could see the comments/reviews on it now
10:46 🔗 BlueMax man I still have a copy of that series around here somewhere.
10:47 🔗 godane it was never released on dvd or vhs
10:47 🔗 godane these are studio vhs masters
10:50 🔗 antomatic Rights hell, apparently - supposed to be a very complicated production agreement and a lot of areas (e.g. home video rights) that were never discussed or agreed and aren't fully clear.
10:51 🔗 godane also cause of all the cameo appearances
10:51 🔗 godane they didn't sign anything for home video rights
10:52 🔗 antomatic Doesn't really make the series public domain, though - I can't see how it could be legitimately carried on archive.org. (Then again a lot of what's there already surprises me, so don't listen to me..)
10:56 🔗 godane plus side is youtube also has it: https://www.youtube.com/playlist?list=PLA7491A8FC7830D6E
10:56 🔗 godane it's been uploaded for over a year
10:57 🔗 godane so i got with if it can stay up on youtube it should stay up on archive.org
10:57 🔗 godane *go with
10:58 🔗 antomatic Doesn't make it legitimate, though, unless the studio has uploaded it themselves. Ultimately it belongs to the studio, not anyone else. As you say, if it's on YouTube then that might indicate that the studio isn't confident in their ownership or there are other reasons why they haven't taken it down (and just not noticing is a possibility) but it doesn't legitimise it.
10:58 🔗 antomatic Not to be a nay-sayer, though - I'm not saying you shouldn't do it. Just that I can see the 'ethical dilemmas'.
11:00 🔗 antomatic If more people had made their own archiving efforts back in the 1950s and 1960s there wouldn't be so many episodes of Doctor Who missing today, for example. :)
11:00 🔗 antomatic Grey area, though.
11:03 🔗 godane Batman 1966 TV Series DVD Set Review: https://www.youtube.com/watch?v=GuvOZ8vZARM
11:04 🔗 godane its from a bootleg dvd
11:05 🔗 BlueMax antomatic, well it's either they're not confident or they haven't got an auto Content ID check / some guy looking for it yet.
11:09 🔗 godane now this is funny
11:10 🔗 godane i was looking up the 17bit software
11:10 🔗 godane the amiga pd disks i have
11:10 🔗 godane their origin is from team17
11:12 🔗 Tephra WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
11:13 🔗 antomatic Ah, I believe it to be 'yahoosucks', Tephra.
11:13 🔗 Tephra antomatic: thanks :)
11:13 🔗 godane i need help
11:14 🔗 godane i didn't upload the index text file for my 17bit phase four item
11:16 🔗 godane now it works
11:17 🔗 godane i have to right click on the tree root and then click upload file
11:17 🔗 godane classic uploader just is not working for me for some reason
12:14 🔗 omf_ Ymgve, PS1 games can span multiple disks so 700mb is not the max size for a game
12:14 🔗 Smiley antomatic: Just because it can't be seen doesn't mean IA doesn't want it.
12:17 🔗 omf_ Don't we have a FAQ page for these same tired questions?
12:17 🔗 Smiley maybe
12:17 🔗 Smiley But I like repeating myself ;)
12:21 🔗 antomatic Aah. I see. :)
12:29 🔗 omf_ I cannot be the only one tired of seeing the same questions over and over again. In fact I know I am not since we are always pointing people to the wiki at least for projects and application information. I am going to look through the wiki and if we don't have a page answering these questions I am going to make one
12:33 🔗 godane so i'm finally uploading the last few forums of g4
12:38 🔗 Smiley omf_: cool,
13:12 🔗 * ats notes someone ought to mirror http://arcade.demon.co.uk/ and its file section at ftp://arcade.demon.co.uk/ ...
13:13 🔗 ats although sadly I can't remember my username and password from when I used to use it back in 1993 or so...
13:22 🔗 godane ats: i will see about mirroring that
13:24 🔗 BlueMax heesh what's on that FTP? I see nothing but a bunch of nothing files with no extensions
13:25 🔗 godane its cause the file names are only on the website
13:25 🔗 godane http://arcade.demon.co.uk/filepages/file56.htm
13:25 🔗 godane it will give you a name and a desc of the file
13:25 🔗 ivan` are there any massive URL lists beside common_crawl_index?
13:27 🔗 godane what are the best options for mirroring an ftp site
13:29 🔗 godane i'm just going to mirror it
13:29 🔗 godane but there will be a wget.log for this
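
[Editor's note] The mirroring approach godane describes can be sketched as a small Python helper that builds a wget command line. This is only an illustration under assumptions: the helper name `wget_mirror_command` is hypothetical, the log does not record which options were actually used, and the sketch only prints the command rather than running it. `--mirror` and `-o` are standard wget options (recursive mirroring and log-to-file).

```python
import shlex


def wget_mirror_command(url, log_path="wget.log"):
    """Build (but do not run) a wget command that mirrors `url`.

    --mirror turns on recursion and timestamping; -o writes wget's
    messages to `log_path` instead of the terminal.
    """
    return ["wget", "--mirror", "-o", log_path, url]


if __name__ == "__main__":
    # Print the command so it can be reviewed before being run by hand.
    print(shlex.join(wget_mirror_command("ftp://arcade.demon.co.uk/")))
```

Because this FTP server stores files without meaningful names, the descriptions on the website's file pages would still need to be saved alongside the mirror.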
13:40 🔗 omf_ Okay here is the start. I am trying to keep it short and on point http://pad.archivingyoursh.it/p/faqs
13:55 🔗 Smiley Hmmm "Won't I get sued" ?
13:55 🔗 Smiley A: "Ask a lawyer"
14:33 🔗 omf_ added
14:37 🔗 DFJustin sketchcow is open to archiving porn
14:40 🔗 BlueMax mostly beard porn
14:40 🔗 omf_ I thought hat porn as well
14:41 🔗 BlueMax floppy porn?
14:41 🔗 omf_ Our logo is floppy porn
15:23 🔗 ivan` hard to believe, but qq.com and 163.com have rss feeds
15:54 🔗 omf_ We have Transclusion enabled on our wiki right?
16:24 🔗 SketchCow omf_: I prefer the human touch
16:31 🔗 omf_ So I should just copy and paste stuff from existing pages? There is material spread across multiple pages that I believe would also benefit users if presented as a single page
16:37 🔗 SketchCow How are you so much angrier than me?
16:38 🔗 omf_ I am between jobs
16:38 🔗 SketchCow I'm like, the angriest person in the world.
16:38 🔗 SketchCow That'll do it.
16:42 🔗 omf_ Do we have a write once, display multiple places solution in mediawiki?
16:46 🔗 Ravenloft AngryCow, the new Angry Birds spin-off
16:49 🔗 ivan` it would be quite helpful if IA published a 2.4TB bz2 or leveldb dump of all their URLs
16:49 🔗 omf_ I second that motion ivan`
16:51 🔗 ivan` I could probably rescue thrice as much of Reader with it
16:51 🔗 omf_ It would help many projects
18:47 🔗 SketchCow ivan`: I just asked about feasibility in the dev channel.
18:54 🔗 ivan` cool
18:54 🔗 SketchCow He said it's feasible.
18:57 🔗 omf_ That is awesome news
18:58 🔗 omf_ I assume the dataset would be CC or public domain so it can be used freely.
19:31 🔗 underscor ivan`: you're probably looking at 5-7TB compressed, though
19:31 🔗 underscor it would probably have to be chunked
19:32 🔗 ivan` I don't have that much space at the moment, but I would definitely buy a drive or two to hold it soon
19:33 🔗 ivan` IA could provide full text search on the URLs :-)
19:36 🔗 underscor We kind of have that
19:36 🔗 underscor but it's internal right now
19:38 🔗 Coderjoe what is this url thing for? (i'm not understanding how it helps a mirror/save project)
19:38 🔗 underscor Same thing we use google scraping for
19:38 🔗 underscor finding subdomains/user pages
19:38 🔗 Coderjoe ah
19:39 🔗 Coderjoe but if ia knows of them, won't it usually already have crawled the content? (unless blocked by robots.txt)
19:39 🔗 ivan` there might be newer content, or in my case, a heck of a lot more content in Reader's cache
19:40 🔗 DFJustin often ia will have hit it with a shallow crawl but not a deep crawl of all subpages and images
19:41 🔗 DFJustin they have to be sparing with how deep they go in order to try and cover the whole internet
19:42 🔗 omf_ Archive Team always goes deep, shit we'll take the damn hard drives if we can
19:43 🔗 godane we are the NSA elfs
20:01 🔗 ivan` you have been selected for backup
20:03 🔗 omf_ Resistance is futile
20:07 🔗 sep332 your historical and technical distinctiveness will be added to our... oh wait this is our own user-generated data to begin with?
20:11 🔗 SilSte Hi
20:11 🔗 SilSte are there any problems with the tracker?
20:12 🔗 ivan` yes
20:13 🔗 SilSte kk
20:13 🔗 SilSte do you know how long they will take?
20:13 🔗 omf_ nope
20:14 🔗 alard SilSte: The tracker is doing something -- I don't know exactly what -- and perhaps it will come back when that's finished.
20:14 🔗 SilSte :D
20:14 🔗 SilSte k
20:14 🔗 ivan` boy am I glad I set up my own tracker ;)
20:14 🔗 SilSte uploaded a 20gb+ file... don't want it to get deleted ^^
20:15 🔗 alard SilSte: Just wait, your warrior will keep trying to contact the tracker.
20:16 🔗 SilSte i know
20:18 🔗 SketchCow http://archive.org/sitemap/sitemap.xml
20:20 🔗 omf_ SketchCow, is that all the items or just all the collections?
20:21 🔗 ersi Interesting. *clicks*
20:21 🔗 omf_ there are 182 xml.gz files and each looks to have ~50,000 links
20:23 🔗 omf_ 9.1e6 items
20:36 🔗 alard The tracker is back.
20:37 🔗 antomatic posterous is fucked - all invalid jobs
20:44 🔗 Coderjoe well that's cool. one could use the sitemap to build a database of file checksums. the lastmod date telling you if you need to grab an updated _files.xml
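
[Editor's note] Coderjoe's idea (use each sitemap entry's lastmod date to decide whether an item's _files.xml needs re-fetching) might look roughly like the Python sketch below. It assumes the sitemap follows the standard sitemaps.org schema; the sample XML, the item names, and the function names are made up for illustration and are not taken from archive.org. Fetching and parsing _files.xml itself is left out.

```python
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemaps.org schema (assumed here).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Made-up sample data; real sitemap files would be downloaded and gunzipped.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://archive.org/details/example-item-a</loc><lastmod>2013-06-01</lastmod></url>
  <url><loc>http://archive.org/details/example-item-b</loc><lastmod>2013-06-10</lastmod></url>
</urlset>"""


def parse_sitemap(xml_text):
    """Return {loc: lastmod} for every <url> entry in one sitemap document."""
    root = ET.fromstring(xml_text)
    entries = {}
    for url in root.iter(SITEMAP_NS + "url"):
        loc = url.findtext(SITEMAP_NS + "loc")
        lastmod = url.findtext(SITEMAP_NS + "lastmod")
        if loc:
            entries[loc.strip()] = (lastmod or "").strip()
    return entries


def items_needing_refresh(entries, seen):
    """List locs whose lastmod differs from the value recorded in `seen`.

    `seen` maps loc -> lastmod from the previous run; unknown locs count
    as needing a refresh.
    """
    return sorted(loc for loc, lastmod in entries.items()
                  if seen.get(loc) != lastmod)


if __name__ == "__main__":
    previous = {"http://archive.org/details/example-item-a": "2013-06-01"}
    for loc in items_needing_refresh(parse_sitemap(SAMPLE), previous):
        print("re-fetch _files.xml for", loc)
```

Comparing lastmod strings for inequality, rather than parsing them as dates, keeps the sketch simple: any change in the sitemap entry triggers a re-fetch.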
20:45 🔗 SketchCow http://web.archive.org/cdx/
21:09 🔗 xmc alard: got a notification that the tracker was being migrated to a new host, I take it you didn't initiate that?
21:12 🔗 omf_ alard, did that
21:12 🔗 omf_ It was for the free RAM upgrade
21:15 🔗 xmc thought as much
23:53 🔗 godane so i added browse here links in my g4 image dumps
