[00:03] i'm searching for blip.tv urls in the urlteam torrent
[00:07] If I have WARC files I've crawled, is there a good way to upload them to the Internet Archive?
[00:11] kyan, https://github.com/kimmel/ias3upload
[00:11] it is a bulk upload script
[00:13] omf_, it looks like I would need collection admin access for that. I just have a normal account there…
[00:14] by the way: the files I've got are not "of a type" (they don't all go together), just a few websites I've seen fit to archive.
[00:15] conceivably I could just upload the files into the Community Texts collection, although that really seems like the wrong place for archived websites.
[00:18] kyan, you set the mediatype as texts and collection as opensource then an admin moves it to the proper place
[00:19] omf_: Ok, cool. Sounds good. Thanks :D
[00:26] 20,000 blip.tv urls: http://paste.archivingyoursh.it/raw/gidaqotimo.sm
[00:31] #blooper.tv
[00:44] is anyone trying to archive the BBC iPM podcast? they delete it after 30 days. http://www.bbc.co.uk/podcasts/series/ipm
[00:54] It's you!
[00:54] You're doing it!
[00:55] heh. :D
[01:34] http://i.imgur.com/mk5rbvg.gif Yahoo's migrating servers again
[01:34] ouch
[01:35] good thing it only lost a few blanking panels :)
[01:35] ya, sure
[01:49] Who is the admin I should contact for getting website archives into the right places at the Internet Archive?
[01:50] Me. What do you need?
[01:51] SketchCow, I've uploaded some stuff. Should I PM you to keep the channel quiet?
[02:13] kyan: i may be able to help with the iPM podcast
[02:14] the bbc streams almost all podcasts
[02:14] even if there is only a 30 day download limit on them
[02:16] i should be able to get stuff from as old as 2009
[02:19] godane: I've gotten a script set up to download the iPM podcasts regularly as a WARC. What data do you have?
We should probably try to get it all into the Internet Archive…
[02:27] i just plan on grabbing the flash audio files and converting them to m4a
[02:27] that's what rtmpdump will give me
[02:28] then i will make my upload script with desc, title, and date upload to the audio collection
[02:31] godane: cool. I can get the MP3s, XML files, and description pages using my script for upcoming podcasts. Time for battle!
[02:38] looks like the wayback has nothing on it
[02:39] there are like 3 mp3 urls but i think all are broken
[02:43] godane: here's the snapshot I took of the currently available podcasts: https://ia601008.us.archive.org/10/items/BBC8oct2013.warc/ (it's the next-to-last item in the list)
[02:44] godane: I think that if I take a snapshot using that script on a regular basis, future episodes should be preserved
[02:50] ok
[03:02] kyan, just tell me your archive.org account name
[03:45] SketchCow: EchoBrightstorm
[04:03] (note: not everything I've uploaded is a WARC, just some items)
[04:14] https://ludios.org/tmp/calvinanddune.tumblr.com.7z HTTrack of this dead site
[04:55] kyan converted
[05:01] The Bebo wiki page on archiveteam.org, in the 'archiving status' panel on the left, lists 1.0 as 'lost'
[05:02] I thought 1.0 is what is being replaced by 2.0 (now in development) and that 1.0 is still accessible
[05:02] not 'lost'
[05:03] or is 2.0 the version of Bebo that will be replaced after the complete overhaul/redo/redesign/start over is done being developed?
[05:04] http://archive.bebo.com/Profile.jsp?MemberId=1 It's in the archive.bebo.com/Profile.f.jadlkjfla but still accessible. Or am I not understanding it right?
[05:23] well, i wrote the page but i never heard of bebo so i don't know exactly what's there. but there's a few broken links and images. i also checked to see stuff like whether the wall was in the wayback machine and it's not.
[05:24] the wayback machine has only "http://www.bebo.com/Timeout.jsp" recorded
[05:36] i updated the page.
i'll call the archive version bebo 1.5.
[05:41] anyone is more than welcome to fix it up though
[05:42] re
[05:48] dyld: Library not loaded: /opt/local/lib/libx264.98.dylib Referenced from: /opt/local/bin/ffmpeg Reason: image not found
[05:48] fcomputer:~ marc$ ffmpeg
[05:48] sorry misfire
[05:48] anybody ever scrape ustream?
[06:10] Why? Module threw exception:rar x '/var/tmp/autoclean/derive/manga_Bleach-c323/Bleach-c323.cbr' -w '/var/tmp/autoclean/derive-manga_Bleach-c323-ProcessJP2/RawImagesBookPreprocessor-manga_Bleach-c323/' failed with exit code: 10 https://catalogd.archive.org/log/172961378
[07:21] http://ascii.textfiles.com/archives/4099
[09:51] Did we use the warrior on yahoo video? I cannot tell from reading the wiki pages about it.
[09:54] alard made the first warrior commit on 2012-04-03
[09:54] https://github.com/ArchiveTeam/warrior-code/commits/master
[09:55] I see
[09:57] So that's a 'nope'
[10:47] chfoo, are you still looking for things to fix up on the wiki?
[11:40] i move and change metadata on the newamerica.net dump
[11:40] it is not going to be shut down
[14:00] I have done a writeup of how the warrior virtual machine works. Please review and comment --> http://www.archiveteam.org/index.php?title=ArchiveTeam_Warrior#How_the_warrior_works
[14:14] blip's going on the Warrior, right? I have to shuffle some VMs around at work, I might be able to free up a host to run solely Warriors for a while...
[18:22] SketchCow, I've got more WARC items uploaded to the Internet Archive to be moved, but there are some I'm still downloading. Would you rather have me tell you about them as I upload them, or wait until I've got a good many uploaded?
[18:37] Wait until you have many
[18:37] I can do group actions.
[18:40] SketchCow: Cool, thanks.
[18:41] SketchCow: can you move these to wikimediacommons collection?
https://archive.org/search.php?query=subject%3A%22Wikimedia%20Commons%22%20collection%3Awikimedia-other
[19:09] Another question… (sorry for all the noise) Is there a way to rerun wget with WARC output to make sure that nothing got left out? When I try to do that it just overwrites my old WARC file and starts from scratch.
[19:18] Nemo_bis: Done
[19:19] thanks
[19:19] SketchCow: can you also force the creation of torrents for all of them till 2012?
[19:20] I don't know how hard a request this is, ignore me if it's not trivial and don't glare at me disapprovingly.
[19:20] I'll glare no matter what
[19:31] Ran it against all 98 because fuck the police
[19:38] kyan: with wget's WARC output, no
[19:38] kyan: well, you can create a new one, and it may be possible to reuse a generated WARC index to avoid refetch (you'd probably use a Lua script for that) but there's no "resume from this point in the WARC" option
[19:41] yipdw: I see. I was wondering because I've noticed that sometimes links aren't getting crawled (at all) by wget, as far as I can tell, and thus are getting left out of the archive.
[19:41] yipdw: Thanks.
[19:45] kyan: yeah, wget's URL heuristics can be a bit odd -- if push comes to shove, though, you can augment them with the Lua hooks
[20:07] yipdw: huh, that looks cool. Hadn't seen that before. Installing…
[21:14] yipdw, Can you create a tracker instance for blooper.tv? I should have a url list soon
[21:14] or xmc or underscor
[21:16] omf_: http://tracker.archiveteam.org/bloopertv live, you have admin
[21:16] omf_: no upload locations set for it yet, though
[21:17] that's fine
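
Editor's sketch for the 19:09 question about wget overwriting the old WARC on a rerun: since there is no resume option, one workaround is to give every run its own WARC name so the earlier file survives and the crawls can be compared afterwards. The helper name wget_warc_command and the name-plus-UTC-timestamp scheme are assumptions made for illustration, not something anyone in the channel used. --warc-file and --warc-cdx are wget options; --mirror and --page-requisites are only an example crawl setup.

```python
import shlex
from datetime import datetime, timezone


def wget_warc_command(url, base_name, now=None):
    """Build a wget command whose WARC output name is unique per run.

    Hypothetical helper: the timestamp suffix is an assumed naming
    scheme, chosen only so that a rerun cannot overwrite an older WARC.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d-%H%M%S")
    warc_name = f"{base_name}-{stamp}"
    return [
        "wget",
        "--mirror",                   # example crawl options, adjust as needed
        "--page-requisites",
        f"--warc-file={warc_name}",   # wget appends the WARC extension itself
        "--warc-cdx",                 # also write a CDX index for this run
        url,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(wget_warc_command("http://example.com/", "example")))
```

This does not avoid refetching anything; it only keeps each crawl's WARC and CDX index side by side, which is the starting point for the index-reuse idea from 19:38.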