#archiveteam 2013-05-06,Mon


Time Nickname Message
00:25 🔗 DFJustin https://archive.org/details/archive.pdp11.org.ru-20130504
00:44 🔗 ivan` it looks like Feed API hits URLs outside /reader/ and is a lot more annoying to use
02:54 🔗 SketchCow Slightly redid the page, nothing earth-shattering.
05:27 🔗 SketchCow Time to set up some floppy scannin'!
05:28 🔗 SketchCow I'm going to start with Compute! floppies, work from there.
07:52 🔗 Nemo_bis aww floppies
07:53 🔗 Nemo_bis number of (media) files on Wikimedia Foundation servers: http://ganglia.wikimedia.org/latest/graph.php?r=year&z=xlarge&m=swift_object_count&h=Swift+pmtpa+prod&c=Swift+pmtpa&trend=1
07:53 🔗 Nemo_bis mostly thumbnails
13:21 🔗 omf_ I uploaded the first block of 10,000 screenshots that are finished - http://archive.org/details/posterous_screens_01
13:21 🔗 omf_ The item contains a tar of pngs, a log file and the list of urls used
13:22 🔗 godane i want your script so i can do that
13:22 🔗 omf_ What other information should I include?
14:41 🔗 omf_ SketchCow, when you have a minute could you flip this from software to texts Mediatype. I goofed the first one - http://archive.org/details/posterous_screens_01
15:00 🔗 SketchCow Done
15:05 🔗 omf_ SketchCow, you flipped the collection to Ebook and text, but the mediatype is still software
15:09 🔗 SketchCow So I did, so I did. Fixed.
15:12 🔗 yan heads up, shouting coming up!
15:12 🔗 yan WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
15:12 🔗 SketchCow HUZZAH GOOD SIR, THE MAGIC WORD OF SECRET IS "yahoosucks"
15:19 🔗 godane SketchCow: i uploaded 3 of my scanned magazines
15:19 🔗 godane https://archive.org/details/pc-novice-1995-04
15:19 🔗 godane https://archive.org/details/pc-novice-1995-05
15:19 🔗 godane https://archive.org/details/pc-novice-1995-06
15:23 🔗 yan thank ye, kind fere, henceforth I am proud to consider myself a proby among your fine band of warriors!
15:38 🔗 SketchCow Why would you make this posterous screens collection text versus software?
15:52 🔗 audy what do archive now?
15:53 🔗 godane fuck me
15:54 🔗 godane looks like spark cbc links are not working for older episodes now
16:05 🔗 godane good news is the wayback machine last year took a good archive it looks like of it
16:23 🔗 godane so we may get lucky and have a very almost complete collection of cbc spark soon
16:49 🔗 MrArgent whoo, 600mb till i've got the Textfiles backup stored locally!
18:53 🔗 berndj is there a usenet archive that's free from the shackles of dejanews' new owners?
19:59 🔗 Tux berndj: olduse.net?
20:02 🔗 berndj ah. as long as the usenet legacy isn't vulnerable to getting geocitied
21:03 🔗 omf_ SketchCow, I should be setting this stuff to collection "ourmedia", mediatype "image" for the posterous screens. That seems a better fit
21:22 🔗 omf_ Then again the screenshot grabs from last year are all mediatype: web
21:23 🔗 omf_ If anyone else has an opinion or view on this, I am open to suggestions
21:23 🔗 omf_ http://archive.org/details/archiveteam-fortunecity-screenshots for example
21:24 🔗 Smiley sounds good to me
21:27 🔗 omf_ Smiley, should I be setting them to mediatype image or web?
21:27 🔗 omf_ It is a creative commons collection of images that represent a crawl of the web
21:28 🔗 omf_ Well the web aspect is there via the webcrawl tag
21:28 🔗 omf_ Take note everyone. Metadata is harder than getting the data :)
21:32 🔗 Smiley I'd say image.
21:32 🔗 Smiley are the metatags used to work out how to display said data?
21:32 🔗 Smiley if so, it needs to be image.
21:33 🔗 omf_ IA just uses mime types and magic files
21:33 🔗 omf_ it is the fastest and most common method
21:34 🔗 omf_ can you really trust user data?
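omf_'s point is that file type is worked out from the bytes themselves rather than from labels the uploader supplies. As a minimal sketch of that idea (not the Internet Archive's actual code), the function below checks two well-known signatures relevant to the posterous item: the fixed 8-byte PNG header and the `ustar` marker that tar archives carry at byte offset 257. The function name and the fallback type are illustrative choices.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte header at the start of every PNG file
TAR_MAGIC_OFFSET = 257                # POSIX and GNU tar headers store "ustar" here


def sniff_type(data: bytes) -> str:
    """Guess a MIME type from leading bytes instead of trusting a filename or label."""
    if data.startswith(PNG_SIGNATURE):
        return "image/png"
    if data[TAR_MAGIC_OFFSET:TAR_MAGIC_OFFSET + 5] == b"ustar":
        return "application/x-tar"
    # Unknown signature: fall back to the generic binary type
    return "application/octet-stream"
```

Both signatures fall within the first 512 bytes of a file, so reading that much is enough for this sketch; a real pipeline would rely on a full magic database such as the one behind the `file` command.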
21:34 🔗 DFJustin the mediatype does determine the layout of the item page
21:34 🔗 DFJustin e.g. "text" doesn't have links to the raw files unless you click through to the HTTP directory view
21:35 🔗 omf_ I was thinking of the data work on the files not the display
21:35 🔗 omf_ what DFJustin said ^^
21:36 🔗 DFJustin software/image/web/data all seem to have the same or basically the same layout for now, but that could change down the road
21:37 🔗 DFJustin for screenshots I'd probably go image because web is more designed for warcs for the wayback machine, and in the future they may add thumbnail browsing or the like
22:06 🔗 SketchCow Just got heads
22:06 🔗 SketchCow up
22:06 🔗 SketchCow Huge internal fight at pouet.net
22:06 🔗 SketchCow We need to grab this thing
22:07 🔗 Smiley Awww crap, huge forum
22:07 🔗 omf_ The url format is pretty easy - http://pouet.net/topic.php?which=9389&page=1&x=25&y=11
22:08 🔗 omf_ which is a thread
22:08 🔗 omf_ page is pagination
22:08 🔗 omf_ and the x & y can be left off
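The thread URL pattern omf_ describes can be sketched as a small URL builder. This assumes `which` is the thread ID and that `page` counts from 1 (as in the example URL); the number of pages in a thread is not given in the log and would have to be discovered separately. The function name is hypothetical.

```python
from urllib.parse import urlencode

TOPIC_BASE = "http://pouet.net/topic.php"


def topic_page_urls(thread_id: int, num_pages: int) -> list:
    """Build one URL per page of a pouet.net thread.

    `which` selects the thread and `page` the pagination; the x and y
    parameters seen in browser URLs are omitted, since they can be left off.
    """
    return [
        f"{TOPIC_BASE}?{urlencode({'which': thread_id, 'page': page})}"
        for page in range(1, num_pages + 1)
    ]


print(topic_page_urls(9389, 2))
# ['http://pouet.net/topic.php?which=9389&page=1', 'http://pouet.net/topic.php?which=9389&page=2']
```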
22:09 🔗 balrog ohshit
22:09 🔗 balrog pouet.net is important
22:09 🔗 Smiley Doing "normal" wget grab to warc
22:09 🔗 omf_ which is also everything else
22:10 🔗 Smiley well the server is nice and fast at least.
22:11 🔗 omf_ not for long ;)
22:11 🔗 Smiley ;)
22:11 🔗 omf_ This should be pretty easy to run on the warrior
22:12 🔗 Smiley get to it then guys
22:12 🔗 Smiley :D
22:12 🔗 Smiley Not like I know how to setup warrior tasks yet and have far too much to do :(
22:12 🔗 omf_ Smiley, For once you're not actually bored?!?
22:12 🔗 omf_ there is also this url pattern
22:13 🔗 omf_ http://pouet.net/download.php?which=55993
22:14 🔗 Smiley omf_: nope, far too much to do! :D
22:18 🔗 omf_ wow so I just read up on pouet and what is going on
22:19 🔗 omf_ A user wrote them a new code base and is now using it as leverage to make a land grab
22:19 🔗 omf_ and the users are freaking out that all the data is going to get closed and erased
22:21 🔗 balrog omf_: where is this?
22:21 🔗 chronomex as well they should
22:21 🔗 balrog if they can't figure out how to migrate the data, they should freeze and archive the old site as read-only.
22:23 🔗 omf_ balrog, it has nothing to do with data migration and more to do with who is going to "own" the data in the future
22:24 🔗 Smiley like thingiverse?
22:24 🔗 Smiley but not quite.
22:24 🔗 omf_ It is 10 pages of comments so far
22:26 🔗 omf_ Also pouet made the mistake of not open-sourcing 2.0 from the beginning of the project, and this multiyear closed development project is full of ego
22:26 🔗 balrog ughhhh
22:27 🔗 omf_ They let a coder run wild on his own for years, what did they expect to happen
22:31 🔗 omf_ downloading the files is going to be tricky
22:31 🔗 balrog omf_: I've heard that FurAffinity has had similar internal political issues for what, the past 2 years? Not that I have an account there, as I don't
22:36 🔗 philpem I do.
22:36 🔗 philpem The URLs are straightforward incrementing ID numbers.
22:37 🔗 philpem Journals and submissions would get you >80% of the "interesting" data. Userpages (linked from submissions) would get you the rest.
22:37 🔗 philpem Parse the page and look for usernames in the comments, look who submitted the thing etc.
22:38 🔗 philpem The only catch is that they have options for restricting viewing to logged-in users; also anything rated NSFW requires login (and account set to "allow adult content") to see it.
22:38 🔗 balrog I do know that they deliberately block all robots including googlebot
22:39 🔗 philpem Well yeah, robots.txt
22:39 🔗 balrog well yeah that's what I meant
22:39 🔗 philpem They used to allow them until Dragoneer had a hissy fit about it
22:39 🔗 omf_ Robots.txt does not mean shit, when I think blocking I think of the shit google and yahoo do
22:40 🔗 philpem Deviantart, Inkbunny, SoFurry, Weasyl and Nabyn all allow bots in. Better to let them index the site, then you can find stuff with internet searches. (but we're straying off topic)
22:40 🔗 HeaD_ grepped over http://pouet.net/groups.php?pattern=[a-z]
22:41 🔗 HeaD_ got 61243 prod-ids
22:41 🔗 HeaD_ where can i paste it?
22:41 🔗 omf_ paste.archivingyoursh.it or pad.archivingyoursh.it
22:45 🔗 HeaD_ http://paste.archivingyoursh.it/wadarihewe.md
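HeaD_'s approach of grepping the `groups.php` listing pages for prod IDs can be sketched as below. It assumes, without confirmation from the log, that the listing pages link to productions as `prod.php?which=<id>` and that `pattern` accepts single letters a to z. Fetching is left out, so the parser only works on HTML it is given; the function names and the sample markup are hypothetical.

```python
import re
import string

# Assumed link format for production pages; not confirmed in the log
PROD_LINK = re.compile(r"prod\.php\?which=(\d+)")


def listing_urls() -> list:
    """One groups.php listing URL per letter, mirroring pattern=[a-z]."""
    return [f"http://pouet.net/groups.php?pattern={c}" for c in string.ascii_lowercase]


def extract_prod_ids(html: str) -> set:
    """Collect unique prod IDs from the links on one listing page."""
    return {int(m) for m in PROD_LINK.findall(html)}


sample = (
    '<a href="prod.php?which=55993">x</a> '
    '<a href="prod.php?which=12">y</a> <a href="prod.php?which=12">z</a>'
)
print(sorted(extract_prod_ids(sample)))  # [12, 55993]
```

A set is used because the same prod can be linked more than once across listing pages.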
22:47 🔗 Smiley for x,y in ./list_of_ids; do wget ....$x..$y; done
22:47 🔗 Smiley except I'm unsure the exact wget we need.
22:48 🔗 Smiley Oh wait thats line numbers ¬_¬
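Since the numbers Smiley took for x,y pairs were the paste's line numbers, the loop only needs one variable per line. A minimal sketch, assuming the pasted list holds one prod ID per line and that the `which` parameter of `download.php` takes those same prod IDs (the log shows `download.php?which=55993` but does not state this outright); the function name is hypothetical.

```python
DOWNLOAD_BASE = "http://pouet.net/download.php"


def download_urls(lines):
    """Map a one-ID-per-line list to download.php URLs, skipping blank or non-numeric lines."""
    urls = []
    for line in lines:
        token = line.strip()
        if token.isdigit():
            urls.append(f"{DOWNLOAD_BASE}?which={token}")
    return urls


print(download_urls(["55993", "", "12\n"]))
# ['http://pouet.net/download.php?which=55993', 'http://pouet.net/download.php?which=12']
```

Writing these URLs one per line to a file lets a single wget run replace the per-ID shell loop, e.g. `wget --input-file=urls.txt --warc-file=pouet-downloads` (both file names are placeholders).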
22:58 🔗 HeaD_ i found 11139 groups: http://paste.archivingyoursh.it/nexamejoyi.apache
23:10 🔗 SketchCow http://archive.org/details/DOS.Memories.Project.1980-2003
23:13 🔗 SketchCow http://ia601705.us.archive.org/zipview.php?zip=/7/items/DOS.Memories.Project.1980-2003/DOS.Memories.Project.1980-2003.zip
23:16 🔗 DFJustin https://archive.org/details/oldies-but-goldies-1740-games https://archive.org/details/Nextys_Archive https://archive.org/details/PC98_Games_1813
23:16 🔗 DFJustin zipview still doesn't like japanese filenames though :(
23:18 🔗 DFJustin more to come too, I have a 7.75gb home of the underdogs set and a bunch more .jp stuff
