#archiveteam 2013-05-23,Thu

Time Nickname Message
00:35 🔗 SketchCow OK, here we go - uploading 100+ CD-ROMs
00:39 🔗 godane cool
00:46 🔗 dashcloud awesome!
00:54 🔗 SketchCow http://archive.org/search.php?query=collection%3Acdbbsarchive&sort=-publicdate
00:54 🔗 SketchCow Watch the fun
00:54 🔗 SketchCow they'll pop in like crazy
00:56 🔗 godane SketchCow: you should get me access to cdbbsarchive and archiveteam
00:57 🔗 godane just so when i add web dumps they're not stuck in texts forever
01:00 🔗 dashcloud why would web dumps go in cdbbsarchive?
01:01 🔗 godane they go to archiveteam
01:01 🔗 dashcloud okay- that makes more sense
01:01 🔗 godane access to cdbbsarchive will help me when uploading stuff like twilight series of cds/dvds
01:18 🔗 SketchCow There is no situation where you would upload twilight DVDs to that collection.
01:18 🔗 SketchCow It's software, godane.
01:19 🔗 godane ok
01:20 🔗 godane why?
01:20 🔗 godane its dvd iso or bin/cue image
01:20 🔗 SketchCow Twilight. Like the Twilight movies?
01:20 🔗 godane no
01:20 🔗 SketchCow Explain to me what they are. Link?
01:21 🔗 SketchCow 02
01:22 🔗 godane https://archive.org/details/cdrom-twilight-003
01:22 🔗 godane its shareware cds/dvds
01:24 🔗 godane i converted the first 5 isos from mdf files
01:24 🔗 godane the mdf and mds files are still there too
01:29 🔗 balrog keep the mdf/mds, yes
01:31 🔗 godane its only the first 15 that i have in mdf/mds format
01:31 🔗 godane after that they're in iso or bin/cue
01:31 🔗 balrog bin/cue is preferred over iso
01:31 🔗 balrog but still not great for multisession/etc
01:37 🔗 SketchCow OK, you have access to cdbbsarchive
01:37 🔗 godane anyways i have uploaded over 44k videos to g4video-web
01:37 🔗 godane cool
01:38 🔗 SketchCow And archiveteam
01:41 🔗 godane great
01:42 🔗 godane now i don't have to wait to push my stuff to your collection
01:42 🔗 godane *my stuff to be add to your collection
02:07 🔗 Start WHAT FORSOOTH, PRITHEE TELL ME THE SECRET
02:07 🔗 BlueMax "yahooschmahoo"
02:12 🔗 SketchCow "yahoosucks"
02:13 🔗 SketchCow Ignore Bluemax, we leave him around so the children feel better about themselves
02:13 🔗 SketchCow 95 CD-ROMs uploaded. 56gb of content.
02:14 🔗 BlueMax what collection again?
02:20 🔗 dashcloud is there a way to search through the CD's for a particular file name yet?
02:37 🔗 SketchCow cdbbsarchive
02:37 🔗 SketchCow Absolutely not.
02:41 🔗 SketchCow http://archive.org/search.php?query=collection%3Acommodore_c64_books&sort=-publicdate
02:41 🔗 SketchCow That will eventually expand out to 95 books
02:52 🔗 godane found a 3min interview with john ritter
03:05 🔗 SketchCow Now at 27
03:05 🔗 SketchCow Get reading, you'll never catch up.
03:15 🔗 DFJustin this pc gamer stuff is all dupes
03:17 🔗 SketchCow Is it? I scanned them in myself.
03:18 🔗 SketchCow I'm glad there's some dupes, it was getting freaky.
03:18 🔗 godane i think i can edit anything
03:18 🔗 SketchCow Oh man, godane.
03:19 🔗 SketchCow If you went ahead and fixed metadata on any objects in the CD Archive, they'd sing songs about you
03:19 🔗 godane you're giving me full admin?
03:19 🔗 SketchCow You're an admin of that collection, yes.
03:19 🔗 SketchCow Delete something and they'll find you in 12 coffee cans.
03:19 🔗 SketchCow Let me do that.
03:19 🔗 SketchCow But feel free to increase the quality of metadata for that thing.
03:24 🔗 godane uploaded: https://archive.org/details/cdrom-twilight-006
03:26 🔗 godane for a few seconds there i thought i could move my stuff to archiveteam
03:31 🔗 SketchCow So many Commodore C64 books.
03:31 🔗 SketchCow Uploading manuals by the metric ton, too
03:31 🔗 SketchCow http://archive.org/details/commodore_c64_manuals
03:32 🔗 SketchCow http://archive.org/stream/1541_Flash_Disk_Speedup_for_SX-64_1985_Skyles_Electric_Works#page/n23/mode/2up
03:32 🔗 SketchCow now those are clear instructions
03:32 🔗 SketchCow not scary at all
03:33 🔗 SketchCow Staring into a wire-mass maw of SX64 misery
03:53 🔗 godane SketchCow: i uploaded those JumpStart games
03:53 🔗 godane JumpStart 1st Grade: https://archive.org/details/JumpStart_1st_Grade
03:54 🔗 godane https://archive.org/details/Jumpstart_2nd_Grade
03:54 🔗 godane https://archive.org/details/Jumpstart_3rd_Grade
03:54 🔗 godane https://archive.org/details/Jumpstart_4th_Grade
03:55 🔗 godane https://archive.org/details/Jumpstart_5th_Grade
03:55 🔗 godane can these be moved to cdbbsarchive?
04:18 🔗 SketchCow Yes
04:18 🔗 SketchCow But they're .zips when ISOs are better.
04:18 🔗 SketchCow But whatever, get them in
04:23 🔗 godane there are clonecd images in the zips
04:24 🔗 godane it was just 4 files in them and i don't know how to do more than one file with my s3 upload script
04:26 🔗 SketchCow You don't, just do them over and over
04:27 🔗 godane oh
04:28 🔗 godane i normally use ftp for more than one file
04:37 🔗 godane SketchCow: i don't see archiveteam collection when i'm checking in my item
04:57 🔗 SketchCow You absolutely have admin access.
05:07 🔗 godane i know i do in the software collection
05:07 🔗 godane but i don't see web crawls drop down menu or something saying archiveteam
05:11 🔗 godane i will test it later with my s3 upload script
05:16 🔗 SketchCow Oh, yes.
05:16 🔗 SketchCow That's true, you might be fucked there.
05:18 🔗 omf_ godane what script are you using to upload?
05:18 🔗 godane a custom script i have
05:26 🔗 omf_ And it cannot upload more than one file at once? This is why I do development on the open source ias3upload so that everyone can benefit
05:58 🔗 ivan` how do I install and run universal-tracker? I am unfamiliar with gem
06:01 🔗 ivan` (and do rubyists run all of these tools as root or what?)
06:35 🔗 ivan` okay, I did ~/.gem/ruby/1.9.1/bin/bundle install --path=~/.gem; cp config/redis.json.example config/redis.json; rackup
06:35 🔗 ivan` that seems to be working
06:50 🔗 ivan` so I'm writing the pipeline stuff for greader, is it reasonable to put 1000 unrelated feeds into each warc?
06:51 🔗 ivan` most of them will be 404 anyway
06:51 🔗 omf_ depends on the output size, most people have slow ass upload speeds
06:51 🔗 ivan` it's mostly gzipped json, no images
06:51 🔗 omf_ then I don't think it would be a problem
06:52 🔗 omf_ I mention it because we constantly see people asking about uploads, not understanding that uploading, not downloading, is the part that takes forever
06:52 🔗 ivan` too bad it ain't .warc.lzma2
09:10 🔗 ivan` the inversion of pipeline is really frustrating :/
09:10 🔗 ivan` trying to pass multiple URLs to WgetDownload
09:29 🔗 ersi ivan`: Just pass a list/text file with targets?
09:40 🔗 Smiley the warc uploads are STILL going for that site godane
09:43 🔗 Smiley 2013-05-22 22:30:39 (92.3 MB/s) - 'rbelmont.mameworld.info/index.html?feed=rss2&page_id=70' saved [751]
09:43 🔗 Smiley FINISHED --2013-05-22 22:30:39--
09:44 🔗 Smiley Total wall clock time: 46m 27s
09:44 🔗 Smiley Downloaded: 1495 files, 220M in 7m 5s (530 KB/s)
09:45 🔗 Smiley :)
09:46 🔗 omf_ Smiley, Baljem did this yesterday https://archive.org/details/rbelmont.mameworld.info.warc
09:49 🔗 ivan` https://github.com/ArchiveTeam/greader-grab :-)
09:50 🔗 ivan` ersi: considered that, but didn't want to code the cleanup
09:50 🔗 ivan` I wrote a ConcatenatedList with a .realize
09:53 🔗 ivan` some feeds have a massive history, most are near-empty or 404, hopefully slow people will not get 1000 massive feeds :/
09:56 🔗 Smiley omf_: good good.
09:56 🔗 ersi ivan`: Maybe make smaller chunks, just in case?
09:59 🔗 ivan` yeah, perhaps 100 or 200
10:02 🔗 ersi sounds good
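[Editor's sketch] The chunking idea ivan` and ersi settle on above (100 or 200 feeds per item instead of 1000) could look roughly like this. The function name and the default of 200 are assumptions for illustration only; the actual greader-grab code is not shown in this log.

```python
from typing import Iterator, List


def chunk_feeds(feed_urls: List[str], chunk_size: int = 200) -> Iterator[List[str]]:
    """Yield consecutive slices of feed_urls, each at most chunk_size long.

    chunk_size=200 is a hypothetical default taken from the "100 or 200"
    range mentioned in the channel, not a confirmed project setting.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(feed_urls), chunk_size):
        yield feed_urls[start:start + chunk_size]


if __name__ == "__main__":
    urls = [f"http://example.com/feed/{n}" for n in range(450)]
    for number, chunk in enumerate(chunk_feeds(urls), start=1):
        print(number, len(chunk))
```

Smaller chunks bound the worst case for a slow uploader: one item can hold at most chunk_size large feeds.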
10:32 🔗 ivan` how in the world does this work in posterous-grab? id_function=(lambda item: {"ua": item["user_agent"] })
10:33 🔗 ivan` my item is lacking this user_agent in my pipeline
10:34 🔗 ersi The tracker for posterous dishes out user-agents.. and that's where it was introduced first
10:34 🔗 ersi It's for overriding the useragent in the WgetDownload
10:35 🔗 ivan` oh man, I didn't even realize the tracker could dish out json
10:36 🔗 ivan` I suppose I should stop doing my string splitting now
10:36 🔗 ivan` (unless adding jobs with item_name and some extra unique data is hard?)
10:42 🔗 ivan` right now my items are 00000001|feed1`feed2`...
10:45 🔗 ersi Why not just give them a name? Like task1 2 3 etc
10:48 🔗 ivan` I don't understand :)
10:50 🔗 ivan` instead of doing the string splitting stuff, should I write a program for inserting a job like {"item_name": "0000001", "feed_urls": [...]} into redis?
10:51 🔗 ersi I think so, but I'm not certain.. I think alard's the man
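[Editor's sketch] The two item shapes ivan` compares above, the packed string `00000001|feed1`feed2`...` and the JSON job with `item_name` and `feed_urls`, can be converted as below. The helper names are hypothetical, and whether the tracker accepts such a JSON job is left open in the channel, so this only shows the conversion itself.

```python
import json
from typing import Dict, List, Union


def parse_packed_item(packed: str) -> Dict[str, Union[str, List[str]]]:
    """Split 'name|url1`url2`...' into the JSON-style job discussed above.

    Only the first '|' separates the name from the feeds, so a '|' inside
    a URL survives; a backtick inside a URL would still break the split.
    """
    item_name, _, feeds = packed.partition("|")
    feed_urls = [url for url in feeds.split("`") if url]
    return {"item_name": item_name, "feed_urls": feed_urls}


def to_job_json(packed: str) -> str:
    """Serialize the parsed item; sort_keys keeps the output stable."""
    return json.dumps(parse_packed_item(packed), sort_keys=True)


if __name__ == "__main__":
    print(to_job_json("00000001|http://a.example/rss`http://b.example/atom"))
```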
17:58 🔗 SketchCow Hey godane
19:25 🔗 godane SketchCow: hey
19:25 🔗 SketchCow Hey.
19:25 🔗 SketchCow So, in the future, for texts
19:25 🔗 SketchCow Please enter subjects as subject[1] subject[2] and so on, not as just subject
19:26 🔗 SketchCow Small bug with s3, and how it works
19:26 🔗 SketchCow if you do subject=keyword1;keyword2, it just makes a single subject "keyword1;keyword2" instead of the right thing.
19:26 🔗 SketchCow This is just for texts.
19:26 🔗 SketchCow Got it?
19:26 🔗 SketchCow It'll help going forward
19:27 🔗 godane this is a problem with texts keywords
19:27 🔗 godane ok
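[Editor's sketch] SketchCow's advice above is to send each subject as its own field (subject[1], subject[2], ...) rather than one `keyword1;keyword2` string. A minimal sketch of that split follows. The numbered header names (`x-archive-meta01-subject` and so on) are an assumption based on the numbered-header convention of the archive.org S3-like API, not something stated in this log, and should be checked against the current documentation before use.

```python
from typing import Dict, List


def split_subjects(raw: str) -> List[str]:
    """Turn 'keyword1;keyword2' into ['keyword1', 'keyword2']."""
    return [part.strip() for part in raw.split(";") if part.strip()]


def subject_headers(subjects: List[str]) -> Dict[str, str]:
    """Build one numbered metadata header per subject.

    Header naming is assumed (see the note above); the point is one
    header per keyword instead of a single joined value.
    """
    return {
        f"x-archive-meta{index:02d}-subject": subject
        for index, subject in enumerate(subjects, start=1)
    }


if __name__ == "__main__":
    for name, value in subject_headers(split_subjects("keyword1;keyword2")).items():
        print(f"{name}: {value}")
```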
19:27 🔗 alard ivan`: Haven't looked at your pipeline code, but if you need more details, perhaps #warrior is a useful channel for discussion?
19:29 🔗 godane SketchCow: i don't have this problem here:
19:29 🔗 godane https://archive.org/details/g4tv.com-hdvideo-xml-20130228
19:37 🔗 SketchCow Yes, that's video.
19:37 🔗 SketchCow mediatype movies
19:37 🔗 SketchCow When it's mediatype texts, the problem shows.
19:38 🔗 godane that is a web dump
19:38 🔗 godane mediatype is texts
19:38 🔗 godane collection is opensource
19:38 🔗 godane :P
19:39 🔗 SketchCow This is what has been passed to me.
19:39 🔗 SketchCow http://archive.org/details/Loadstar-Letter-Issue-45 is the example he used.
19:40 🔗 godane how do you fix that problem?
19:43 🔗 godane i tried removing it and adding it back in
19:43 🔗 SketchCow 15:27 <@SketchCow> Please enter subjects as subject[1] subject[2] and so on, not as just subject
19:43 🔗 godane how do you do that in web edit?
19:44 🔗 godane there is only subject
19:44 🔗 SketchCow Add at the bottom
19:44 🔗 Nemo_bis adding another field
19:50 🔗 godane ok this is weird
19:50 🔗 godane it doesn't cause any problems in search pages
19:52 🔗 godane also looks like i only add key words to loadstar letter like this
19:53 🔗 godane i think i didn't add any key works to stuff like my sandhills publishing pdfs i uploaded
19:53 🔗 godane *words
19:54 🔗 SketchCow Great.
19:55 🔗 godane so this bug looks to be limited to the page of the item
19:56 🔗 godane not search results
19:56 🔗 godane so its not a very big bug
19:59 🔗 Nemo_bis not being able to click a keyword is annoying
19:59 🔗 godane i know
19:59 🔗 godane but this is in a collection
20:00 🔗 godane so it's not as bad as it could be
20:03 🔗 godane http://developers.slashdot.org/story/13/05/23/1752201/google-code-deprecates-download-service-for-project-hosting
20:09 🔗 balrog what... why
20:10 🔗 godane figure guys should get one that
20:11 🔗 godane *on that
21:27 🔗 ivan` alard: thanks, didn't know that existed
21:27 🔗 Smiley Pouet still going;
21:28 🔗 Smiley newAmerica is now on the dedibox, and I need to write up the metadata for it
21:38 🔗 Smiley And it's uploading :) This is gonna take _AWHILE_
22:23 🔗 Smiley godane: on warc 13..... I doubt by the morning it'll even be finished :D
23:14 🔗 wp494 in the event that I need to suspend a warrior when it's uploading something, will the upload be fucked or will things be normal upon resume?
23:16 🔗 Smiley most likely resume
23:16 🔗 Smiley else it'll just start again
23:16 🔗 Smiley Depends if it ends up connecting to the same server on resume, and if the resume data is intact.
23:28 🔗 S[h]O[r]T i think its just fos right now where everything is being pushed
23:29 🔗 Smiley cool
23:50 🔗 SketchCow So, Brewster asked me if one of you wanted to take a shot at a pretty weird little project.
23:50 🔗 SketchCow If you want to fuck with WARCs, let me know
23:57 🔗 omf_ SketchCow, I already fuck with warcs
23:58 🔗 omf_ If it is programming I am all ears
23:58 🔗 SketchCow Take a WARC, download it, rip out a collection of "interesting" GIFs, JPGs and PNGs, and make a collage.
23:58 🔗 SketchCow Write it so "make a collage" can be different things.
23:58 🔗 omf_ Do you mean different file types for the collage or different image types
23:59 🔗 SketchCow Sorry.
23:59 🔗 SketchCow WARC --> Rip Images --> Make Collage
23:59 🔗 SketchCow WARC --> Rip Images --> Make Gallery
23:59 🔗 SketchCow WARC --> Rip Images --> Rotating Head of Bear Roaring out GIFs
23:59 🔗 omf_ Okay
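[Editor's sketch] The three-stage shape SketchCow describes above (WARC, then rip images, then a swappable final step) might be structured like this. Everything here is hypothetical: records are modelled as plain (Content-Type, payload) pairs so the sketch runs without a WARC reader, "interesting" is approximated by a minimum size, and the renderers only return a description instead of producing real output.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical stand-in for a parsed WARC response: (Content-Type, payload).
Record = Tuple[str, bytes]
Renderer = Callable[[List[bytes]], str]

IMAGE_TYPES = {"image/gif", "image/jpeg", "image/png"}


def rip_images(records: Iterable[Record], min_bytes: int = 1024) -> List[bytes]:
    """Keep GIF/JPEG/PNG payloads of at least min_bytes.

    The size threshold is an assumed proxy for "interesting".
    """
    images = []
    for content_type, payload in records:
        mime = content_type.split(";")[0].strip().lower()
        if mime in IMAGE_TYPES and len(payload) >= min_bytes:
            images.append(payload)
    return images


def make_collage(images: List[bytes]) -> str:
    return f"collage of {len(images)} images"


def make_gallery(images: List[bytes]) -> str:
    return f"gallery of {len(images)} images"


# New output styles plug in here without touching rip_images.
RENDERERS: Dict[str, Renderer] = {"collage": make_collage, "gallery": make_gallery}


def process(records: Iterable[Record], output: str = "collage", min_bytes: int = 1024) -> str:
    """Run the rip stage, then hand the images to the chosen renderer."""
    return RENDERERS[output](rip_images(records, min_bytes))


if __name__ == "__main__":
    sample = [("image/gif", b"GIF89a" + b"\0" * 2000), ("text/html", b"<html></html>")]
    print(process(sample, output="gallery"))
```

Keeping the final stage behind a dictionary of callables is one way to honour "write it so 'make a collage' can be different things".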
