[00:35] OK, here we go - uploading 100+ CD-ROMs
[00:39] cool
[00:46] awesome!
[00:54] http://archive.org/search.php?query=collection%3Acdbbsarchive&sort=-publicdate
[00:54] Watch the fun
[00:54] they'll pop in like crazy
[00:56] SketchCow: you should get me access to cdbbsarchive and archiveteam
[00:57] just so when i add web dumps they're not stuck in texts forever
[01:00] why would web dumps go in cdbbsarchive?
[01:01] they go to archiveteam
[01:01] okay - that makes more sense
[01:01] access to cdbbsarchive will help me when uploading stuff like the twilight series of cds/dvds
[01:18] There is no situation where you would upload twilight DVDs to that collection.
[01:18] It's software, godane.
[01:19] ok
[01:20] why?
[01:20] it's a dvd iso or bin/cue image
[01:20] Twilight. Like the Twilight movies?
[01:20] no
[01:20] Explain to me what they are. Link?
[01:22] https://archive.org/details/cdrom-twilight-003
[01:22] it's shareware cds/dvds
[01:24] i converted the first 5 isos from mdf files
[01:24] the mdf and mds files are still there too
[01:29] keep the mdf/mds, yes
[01:31] it's only the first 15 that i have in mdf/mds format
[01:31] after that they're in iso or bin/cue
[01:31] bin/cue is preferred over iso
[01:31] but still not great for multisession/etc
[01:37] OK, you have access to cdbbsarchive
[01:37] anyways i have uploaded over 44k videos to g4video-web
[01:37] cool
[01:38] And archiveteam
[01:41] great
[01:42] now i don't have to wait to push my stuff to your collection
[01:42] *my stuff to be added to your collection
[02:07] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET
[02:07] "yahooschmahoo"
[02:12] "yahoosucks"
[02:13] Ignore Bluemax, we leave him around so the children feel better about themselves
[02:13] 95 CD-ROMs uploaded. 56 GB of content.
[02:14] what collection again?
[02:20] is there a way to search through the CDs for a particular file name yet?
[02:37] cdbbsarchive
[02:37] Absolutely not.
[02:41] http://archive.org/search.php?query=collection%3Acommodore_c64_books&sort=-publicdate
[02:41] That will eventually expand out to 95 books
[02:52] found a 3min interview with john ritter
[03:05] Now at 27
[03:05] Get reading, you'll never catch up.
[03:15] this pc gamer stuff is all dupes
[03:17] Is it? I scanned them in myself.
[03:18] I'm glad there's some dupes, it was getting freaky.
[03:18] i think i can edit anything
[03:18] Oh man, godane.
[03:19] If you went ahead and fixed metadata on any objects in the CD Archive, they'd sing songs about you
[03:19] you're giving me full admin?
[03:19] You're an admin of that collection, yes.
[03:19] Delete something and they'll find you in 12 coffee cans.
[03:19] Let me do that.
[03:19] But feel free to increase the quality of metadata for that thing.
[03:24] uploaded: https://archive.org/details/cdrom-twilight-006
[03:26] for a few seconds there i thought i could move my stuff to archiveteam
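For the mdf-to-iso conversions mentioned above, here is a minimal sketch of how such a batch conversion might be scripted, assuming the mdf2iso command-line tool is installed and on PATH; the directory argument is a placeholder, and the original .mdf/.mds files are deliberately left in place, as recommended in the log.

```python
#!/usr/bin/env python
# Minimal sketch: batch-convert .mdf disc images to .iso with the mdf2iso tool,
# keeping the original .mdf/.mds files next to the converted images.
# Assumes mdf2iso is on PATH; the directory argument is a placeholder.
import os
import subprocess
import sys

def convert_tree(root):
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith('.mdf'):
                continue
            mdf = os.path.join(dirpath, name)
            iso = os.path.splitext(mdf)[0] + '.iso'
            if os.path.exists(iso):
                continue  # already converted
            print('converting %s -> %s' % (mdf, iso))
            subprocess.check_call(['mdf2iso', mdf, iso])
            # the .mdf/.mds stay in place so the lossless originals are preserved

if __name__ == '__main__':
    convert_tree(sys.argv[1] if len(sys.argv) > 1 else '.')
```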
[03:31] So many Commodore C64 books.
[03:31] Uploading manuals by the metric ton, too
[03:31] http://archive.org/details/commodore_c64_manuals
[03:32] http://archive.org/stream/1541_Flash_Disk_Speedup_for_SX-64_1985_Skyles_Electric_Works#page/n23/mode/2up
[03:32] now those are clear instructions
[03:32] not scary at all
[03:33] Staring into a wire-mass maw of SX64 misery
[03:53] SketchCow: i uploaded those JumpStart games
[03:53] JumpStart 1st Grade: https://archive.org/details/JumpStart_1st_Grade
[03:54] https://archive.org/details/Jumpstart_2nd_Grade
[03:54] https://archive.org/details/Jumpstart_3rd_Grade
[03:54] https://archive.org/details/Jumpstart_4th_Grade
[03:55] https://archive.org/details/Jumpstart_5th_Grade
[03:55] can these be moved to cdbbsarchive?
[04:18] Yes
[04:18] But they're .zips when ISOs are better.
[04:18] But whatever, get them in
[04:23] they're clonecd images in the zips
[04:24] it was just 4 files in them and i don't know how to do more than one file with my s3 upload script
[04:26] You don't, just do them over and over
[04:27] oh
[04:28] i normally use ftp for more than one file
[04:37] SketchCow: i don't see the archiveteam collection when i'm checking in my item
[04:57] You absolutely have admin access.
[05:07] i know i do in the software collection
[05:07] but i don't see a web crawls drop-down menu or something saying archiveteam
[05:11] i will test it later with my s3 upload script
[05:16] Oh, yes.
[05:16] That's true, you might be fucked there.
[05:18] godane what script are you using to upload?
[05:18] a custom script i have
[05:26] And it cannot upload more than one file at once? This is why I do development on the open source ias3upload so that everyone can benefit
[05:58] how do I install and run universal-tracker? I am unfamiliar with gem
[06:01] (and do rubyists run all of these tools as root or what?)
[06:35] okay, I did ~/.gem/ruby/1.9.1/bin/bundle install --path=~/.gem; cp config/redis.json.example config/redis.json; rackup
[06:35] that seems to be working
[06:50] so I'm writing the pipeline stuff for greader, is it reasonable to put 1000 unrelated feeds into each warc?
[06:51] most of them will be 404 anyway
[06:51] depends on the output size, most people have slow ass upload speeds
[06:51] it's mostly gzipped json, no images
[06:51] then I don't think it would be a problem
[06:52] I mention it because we constantly see people asking about uploads, not understanding that's the part that takes forever, not downloading
[06:52] too bad it ain't .warc.lzma2
[09:10] the inversion of control in pipeline is really frustrating :/
[09:10] trying to pass multiple URLs to WgetDownload
[09:29] ivan`: Just pass a list/text file with targets?
[09:40] the warc uploads are STILL going for that site godane
[09:43] 2013-05-22 22:30:39 (92.3 MB/s) - 'rbelmont.mameworld.info/index.html?feed=rss2&page_id=70' saved [751]
[09:43] FINISHED --2013-05-22 22:30:39--
[09:44] Total wall clock time: 46m 27s
[09:44] Downloaded: 1495 files, 220M in 7m 5s (530 KB/s)
[09:45] :)
[09:46] Smiley, Baljem did this yesterday https://archive.org/details/rbelmont.mameworld.info.warc
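On the question raised above of uploading more than one file with a custom S3 script ("just do them over and over"), here is a rough sketch of what a one-PUT-per-file loop against the archive.org S3-like API can look like, assuming the requests library; the identifier, file names, and credentials are placeholders, not godane's actual script.

```python
# Rough sketch: one PUT per file into a single archive.org item via the S3-like
# API ("just do them over and over"). Identifier, file list and keys are
# placeholders; the metadata headers matter on the request that creates the item.
import requests

ACCESS_KEY = 'IA_ACCESS_KEY'        # placeholder
SECRET_KEY = 'IA_SECRET_KEY'        # placeholder
IDENTIFIER = 'JumpStart_1st_Grade'  # example item from the log
FILES = ['game.img', 'game.ccd', 'game.cue', 'game.sub']  # hypothetical CloneCD set

headers = {
    'authorization': 'LOW %s:%s' % (ACCESS_KEY, SECRET_KEY),
    'x-amz-auto-make-bucket': '1',
    'x-archive-meta-mediatype': 'software',
    'x-archive-meta01-collection': 'cdbbsarchive',
}

for name in FILES:
    with open(name, 'rb') as f:
        url = 'https://s3.us.archive.org/%s/%s' % (IDENTIFIER, name)
        requests.put(url, data=f, headers=headers).raise_for_status()
        print('uploaded', name)
```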
[09:49] https://github.com/ArchiveTeam/greader-grab :-)
[09:50] ersi: considered that, but didn't want to code the cleanup
[09:50] I wrote a ConcatenatedList with a .realize
[09:53] some feeds have a massive history, most are near-empty or 404, hopefully slow people will not get 1000 massive feeds :/
[09:56] omf_: good good.
[09:56] ivan`: Maybe make smaller chunks, just in case?
[09:59] yeah, perhaps 100 or 200
[10:02] sounds good
[10:32] how in the world does this work in posterous-grab? id_function=(lambda item: {"ua": item["user_agent"] })
[10:33] my item is lacking this user_agent in my pipeline
[10:34] The tracker for posterous dishes out user-agents.. and that's where it was introduced first
[10:34] It's for overriding the useragent in the WgetDownload
[10:35] oh man, I didn't even realize the tracker could dish out json
[10:36] I suppose I should stop doing my string splitting now
[10:36] (unless adding jobs with item_name and some extra unique data is hard?)
[10:42] right now my items are 00000001|feed1`feed2`...
[10:45] Why not just give them a name? Like task1 2 3 etc
[10:48] I don't understand :)
[10:50] instead of doing the string splitting stuff, should I write a program for inserting a job like {"item_name": "0000001", "feed_urls": [...]} into redis?
[10:51] I think so, but I'm not certain.. I think alard's the man
[17:58] Hey godane
[19:25] SketchCow: hey
[19:25] Hey.
[19:25] So, in the future, for texts
[19:25] Please enter subjects as subject[1] subject[2] and so on, not as just subject
[19:26] Small bug with s3, and how it works
[19:26] if you do subject=keyword1;keyword2, it just makes a single subject "keyword1;keyword2" instead of the right thing.
[19:26] This is just for texts.
[19:26] Got it?
[19:26] It'll help going forward
[19:27] this is a problem with texts keywords
[19:27] ok
[19:27] ivan`: Haven't looked at your pipeline code, but if you need more details, perhaps #warrior is a useful channel for discussion?
[19:29] SketchCow: i don't have this problem here:
[19:29] https://archive.org/details/g4tv.com-hdvideo-xml-20130228
[19:37] Yes, that's video.
[19:37] mediatype movies
[19:37] When it's mediatype texts, the problem shows.
[19:38] that is a web dump
[19:38] mediatype is texts
[19:38] collection is opensource
[19:38] :P
[19:39] This is what has been passed to me.
[19:39] http://archive.org/details/Loadstar-Letter-Issue-45 is the example he used.
[19:40] how do you fix that problem?
[19:43] i tried removing it and adding it back in
[19:43] 15:27 <@SketchCow> Please enter subjects as subject[1] subject[2] and so on, not as just subject
[19:43] how do you do that in the web editor?
[19:44] there is only one subject field
[19:44] Add at the bottom
[19:44] adding another field
[19:50] ok this is weird
[19:50] it doesn't cause any problems in search pages
[19:52] also looks like i only added key words to loadstar letter like this
[19:53] i think i didn't add any key works to stuff like my sandhills publishing pdfs i uploaded
[19:53] *words
[19:54] Great.
[19:55] so this bug looks to be limited to the item page
[19:56] not search results
[19:56] so it's not a very big bug
[19:59] not being able to click a keyword is annoying
[19:59] i know
[19:59] but this is in a collection
[20:00] so it's not as bad as it could be
[20:03] http://developers.slashdot.org/story/13/05/23/1752201/google-code-deprecates-download-service-for-project-hosting
[20:09] what... why
[20:10] figure guys should get one that
[20:11] *on that
[21:27] alard: thanks, didn't know that existed
[21:27] Pouet still going;
[21:28] newAmerica is now on the dedibox, and I need to write up the metadata for it
[21:38] And it's uploading :) This is gonna take _AWHILE_
[22:23] godane: on warc 13..... I doubt by the morning it'll even be finished :D
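As an illustration of the string-splitting approach ivan` describes above for items of the form 00000001|feed1`feed2`..., here is a small seesaw-style task sketch; SplitItemName and the feed_urls key are assumed names for illustration, not the actual greader-grab code.

```python
# Sketch of turning a tracker item name like "00000001|feedA`feedB" into a list
# of feed URLs inside a seesaw pipeline. SplitItemName and the feed_urls key
# are hypothetical names, not the real greader-grab code.
from seesaw.task import SimpleTask

class SplitItemName(SimpleTask):
    def __init__(self):
        SimpleTask.__init__(self, 'SplitItemName')

    def process(self, item):
        # item["item_name"] comes from the tracker, e.g. "00000001|feedA`feedB"
        name, _, packed = item['item_name'].partition('|')
        item['item_id'] = name
        item['feed_urls'] = packed.split('`') if packed else []
        item.log_output('item %s has %d feeds' % (name, len(item['feed_urls'])))
```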
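For the subject bug SketchCow explains above, here is a minimal sketch of how multiple subject values can be sent as numbered x-archive-metaNN-subject headers through the S3-like API instead of a single semicolon-joined string; the identifier, filename, and credentials are placeholders.

```python
# Minimal sketch: multiple subject values sent as numbered x-archive-metaNN-subject
# headers through the archive.org S3-like API, instead of one "keyword1;keyword2"
# string. Identifier, filename and credentials are placeholders.
import requests

headers = {
    'authorization': 'LOW IA_ACCESS_KEY:IA_SECRET_KEY',  # placeholder
    'x-amz-auto-make-bucket': '1',
    'x-archive-meta-mediatype': 'texts',
    'x-archive-meta01-subject': 'keyword1',
    'x-archive-meta02-subject': 'keyword2',
}

with open('newsletter.pdf', 'rb') as f:  # hypothetical file
    requests.put(
        'https://s3.us.archive.org/Loadstar-Letter-Issue-45/newsletter.pdf',
        data=f, headers=headers,
    ).raise_for_status()
```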
[23:14] in the event that I need to suspend a warrior when it's uploading something, will the upload be fucked or will things be normal upon resume?
[23:16] most likely resume
[23:16] else it'll just start again
[23:16] Depends if it ends up connecting to the same server on resume, and if the resume data is intact.
[23:28] i think it's just fos right now where everything is being pushed
[23:29] cool
[23:50] So, Brewster asked me if one of you wanted to take a shot at a pretty weird little project.
[23:50] If you want to fuck with WARCs, let me know
[23:57] SketchCow, I already fuck with warcs
[23:58] If it is programming I am all ears
[23:58] Take a WARC, download it, rip out a collection of "interesting" GIFs, JPGs and PNGs, and make a collage.
[23:58] Write it so "make a collage" can be different things.
[23:58] Do you mean different file types for the collage or different image types?
[23:59] Sorry.
[23:59] WARC --> Rip Images --> Make Collage
[23:59] WARC --> Rip Images --> Make Gallery
[23:59] WARC --> Rip Images --> Rotating Head of Bear Roaring out GIFs
[23:59] Okay
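A rough sketch of the WARC --> Rip Images --> Make Collage idea described above, assuming the warcio and Pillow libraries; "interesting" is reduced here to a simple minimum-size filter, and the grid layout is arbitrary.

```python
# Rough sketch: pull GIF/JPEG/PNG responses out of a WARC and paste them into a
# simple grid collage. Assumes the warcio and Pillow libraries; "interesting"
# here just means "at least 5 KB", and the grid layout is arbitrary.
import io
import sys
from warcio.archiveiterator import ArchiveIterator
from PIL import Image

IMAGE_TYPES = ('image/gif', 'image/jpeg', 'image/png')

def extract_images(warc_path, min_bytes=5120):
    with open(warc_path, 'rb') as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != 'response' or record.http_headers is None:
                continue
            ctype = (record.http_headers.get_header('Content-Type') or '').split(';')[0]
            if ctype not in IMAGE_TYPES:
                continue
            payload = record.content_stream().read()
            if len(payload) < min_bytes:
                continue
            try:
                yield Image.open(io.BytesIO(payload)).convert('RGB')
            except OSError:
                pass  # truncated or unparsable image data

def make_collage(images, cols=8, tile=128):
    tiles = []
    for img in images:
        img.thumbnail((tile, tile))
        tiles.append(img)
    rows = max((len(tiles) + cols - 1) // cols, 1)
    canvas = Image.new('RGB', (cols * tile, rows * tile), 'black')
    for i, img in enumerate(tiles):
        canvas.paste(img, ((i % cols) * tile, (i // cols) * tile))
    return canvas

if __name__ == '__main__':
    make_collage(extract_images(sys.argv[1])).save('collage.png')
```

Swapping make_collage for a "make a gallery" function is the natural way to satisfy the "can be different things" requirement: extract_images stays the same and only the rendering step changes.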