[00:03] <chfoo> i'm searching for blip.tv urls in the urlteam torrent
[00:07] <kyan> If I have WARC files I've crawled, is there a good way to upload them to the Internet Archive??
[00:11] <omf_> kyan, https://github.com/kimmel/ias3upload
[00:11] <omf_> it is a bulk upload script
[00:13] <kyan> omf_, it looks like I would need collection admin access for that. I just have a normal account there…
[00:14] <kyan> by the way: the files I've got are not "of a type" (they don't all go together) -- just a few websites I've seen fit to archive.
[00:15] <kyan> conceivably I could just upload the files into the Community Texts collection, although that really seems like the wrong place for archived websites.
[00:18] <omf_> kyan, you set the mediatype as texts and collection as opensource then an admin moves it to the proper place
[00:19] <kyan> omf_: Ok, cool. Sounds good. Thanks :D
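A minimal sketch of the metadata omf_ describes, assuming the upload is done by hand against archive.org's S3-like API (which ias3upload appears to be built on) rather than with that script. The function name `ias3_headers` and its parameters are hypothetical; only the mediatype `texts` and the collection `opensource` come from the log.

```python
def ias3_headers(access_key, secret_key, title,
                 mediatype="texts", collection="opensource"):
    """Build the HTTP headers for uploading one file to a new item.

    Defaults follow the advice in the log: a normal account uploads as
    mediatype "texts" into collection "opensource", and an admin moves
    the item to the proper place afterwards.
    """
    return {
        # archive.org's S3-like API uses "LOW key:secret" authorization.
        "authorization": f"LOW {access_key}:{secret_key}",
        # Create the item automatically if it does not exist yet.
        "x-archive-auto-make-bucket": "1",
        # Item metadata is passed as x-archive-meta-<field> headers.
        "x-archive-meta-mediatype": mediatype,
        "x-archive-meta-collection": collection,
        "x-archive-meta-title": title,
    }
```

These headers would accompany an HTTP PUT of the WARC to `http://s3.us.archive.org/<identifier>/<filename>`, with the identifier and filename chosen by the uploader.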
[00:26] <chfoo> 20,000 blip.tv urls: http://paste.archivingyoursh.it/raw/gidaqotimo.sm
[00:31] <SketchCow> #blooper.tv
[00:44] <kyan> is anyone trying to archive the BBC iPM podcast? they delete it after 30 days. http://www.bbc.co.uk/podcasts/series/ipm
[00:54] <SketchCow> It's you!
[00:54] <SketchCow> You're doing it!
[00:55] <kyan> heh. :D
[01:34] <SketchCow> http://i.imgur.com/mk5rbvg.gif Yahoo's migrating servers again
[01:34] <BiggieJon> ouch
[01:35] <BiggieJon> good thing it only lost a few blanking panels:)
[01:35] <BiggieJon> ya, sure
[01:49] <kyan> Who is the admin I should contact for getting website archives into the right places at the Internet Archive?
[01:50] <SketchCow> Me. What do you need?
[01:51] <kyan> SketchCow, I've uploaded some stuff. Should I PM you to keep the channel quiet?
[02:13] <godane> kyan: i may be able to help with the iPM podcast
[02:14] <godane> the bbc streams almost all podcasts
[02:14] <godane> even if there is only a 30 day download limit on them
[02:16] <godane> i should be able to get stuff from as old as 2009
[02:19] <kyan> godane: I've gotten a script set up to download the iPM podcasts regularly as a WARC. What data do you have? We should probably try to get it all into the Internet Archive…
[02:27] <godane> i just plan on grabbing the flash audio files and converting them to m4a
[02:27] <godane> that's what rtmpdump will give me
[02:28] <godane> then i will make my upload script with desc, title, and date upload to the audio collection
[02:31] <kyan> godane: cool. I can get the MP3s, XML files, and description pages using my script for upcoming podcasts. Time for battle!
[02:38] <godane> looks like the wayback has nothing on it
[02:39] <godane> there are like 3 mp3 urls but i think all are broken
[02:43] <kyan> godane: here's the snapshot I took of the currently available podcasts: https://ia601008.us.archive.org/10/items/BBC8oct2013.warc/ (it's the next-to-last item in the list)
[02:44] <kyan> godane: I think that if I take a snapshot using that script on a regular basis, future episodes should be preserved
[02:50] <godane> ok
[03:02] <SketchCow> kyan, just tell me your archive.org account name
[03:45] <kyan> SketchCow: EchoBrightstorm
[04:03] <kyan> (note: not everything I've uploaded is a WARC, just some items)
[04:14] <ivan`> https://ludios.org/tmp/calvinanddune.tumblr.com.7z HTTrack of this dead site
[04:55] <SketchCow> kyan converted
[05:01] <arkhive> The Bebo wiki page on archiveteam.org: the panel on the left lists the 'archiving status' of 1.0 as 'lost'
[05:02] <arkhive> I thought 1.0 is what is being replaced by 2.0 (now in development) and that 1.0 is still accessible
[05:02] <arkhive> not 'lost'
[05:03] <arkhive> or is 2.0 the version of Bebo that will be replaced after the complete overhaul/redo/redesign/start over is done being developed?
[05:04] <arkhive> http://archive.bebo.com/Profile.jsp?MemberId=1   It's in the archive.bebo.com/Profile.f.jadlkjfla but still accessible. Or am I not understanding it right?
[05:23] <chfoo> well, i wrote the page but i never heard of bebo so i don't know exactly what's there. but there's a few broken links and images. i also checked to see stuff like whether the wall was in the wayback machine and it's not.
[05:24] <chfoo> the wayback machine has only "http://www.bebo.com/Timeout.jsp" recorded
[05:36] <chfoo> i updated the page. i'll call the archive version bebo 1.5.
[05:41] <chfoo> anyone is more than welcome to fix it up though
[05:42] <marc> re
[05:48] <marc> dyld: Library not loaded: /opt/local/lib/libx264.98.dylib Referenced from: /opt/local/bin/ffmpeg Reason: image not found
[05:48] <marc> fcomputer:~ marc$ ffmpeg
[05:48] <marc> sorry misfire
[05:48] <marc> anybody ever scrape ustream?
[06:10] <Nemo_bis> Why? Module threw exception:rar x '/var/tmp/autoclean/derive/manga_Bleach-c323/Bleach-c323.cbr' -w '/var/tmp/autoclean/derive-manga_Bleach-c323-ProcessJP2/RawImagesBookPreprocessor-manga_Bleach-c323/' failed with exit code: 10 https://catalogd.archive.org/log/172961378
[07:21] <SketchCow> http://ascii.textfiles.com/archives/4099
[09:51] <omf_> Did we use the warrior on yahoo video? I cannot tell from reading the wiki pages about it.
[09:54] <ivan`> alard made the first warrior commit on 2012-04-03
[09:54] <ivan`> https://github.com/ArchiveTeam/warrior-code/commits/master
[09:55] <omf_> I see
[09:57] <ersi> So that's a 'nope'
[10:47] <omf_> chfoo, are you still looking for things to fix up on the wiki?
[11:40] <godane> i moved and changed metadata on the newamerica.net dump
[11:40] <godane> it is not going to be shut down
[14:00] <omf_> I have done a writeup of how the warrior virtual machine works. Please review and comment --> http://www.archiveteam.org/index.php?title=ArchiveTeam_Warrior#How_the_warrior_works
[14:14] <Baljem> blip's going on the Warrior, right?  I have to shuffle some VMs around at work, I might be able to free up a host to run solely Warriors for a while...
[18:22] <kyan> SketchCow, I've got more WARC items uploaded to the Internet Archive to be moved, but there are some I'm still downloading. Would you rather have me tell you about them as I upload them, or wait until I've got a good many uploaded?
[18:37] <SketchCow> Wait until you have many
[18:37] <SketchCow> I can do group actions.
[18:40] <kyan> SketchCow: Cool, thanks.
[18:41] <Nemo_bis> SketchCow: can you move these to wikimediacommons collection? https://archive.org/search.php?query=subject%3A%22Wikimedia%20Commons%22%20collection%3Awikimedia-other
[19:09] <kyan> Another question… (sorry for all the noise) Is there a way to rerun wget with WARC output to make sure that nothing got left out? When I try to do that it just overwrites my old WARC file and starts from scratch.
[19:18] <SketchCow> Nemo_bis: Done
[19:19] <Nemo_bis> thanks
[19:19] <Nemo_bis> SketchCow: can you also force the creation of torrents for all of them till 2012?
[19:20] <Nemo_bis> I don't know how hard a request this is, ignore me if it's not trivial and don't glare at me disapprovingly.
[19:20] <SketchCow> I'll glare no matter what
[19:31] <SketchCow> Ran it against all 98 because fuck the police
[19:38] <yipdw> kyan: with wget's WARC output, no
[19:38] <yipdw> kyan: well, you can create a new one, and it may be possible to reuse a generated WARC index to avoid refetch (you'd probably use a Lua script for that) but there's no "resume from this point in the WARC" option
[19:41] <kyan> yipdw: I see. I was wondering because I've noticed that sometimes links aren't getting crawled (at all) by wget, as far as I can tell, and thus are getting left out of the archive.
[19:41] <kyan> yipdw: Thanks.
[19:45] <yipdw> kyan: yeah, wget's URL heuristics can be a bit odd -- if push comes to shove, though, you can augment them with the Lua hooks
[20:07] <kyan> yipdw: huh, that looks cool. Hadn't seen that before. Installing…
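One way to approximate the "reuse a generated WARC index to avoid refetch" idea without Lua is to compare a list of wanted URLs against the CDX index that wget can write alongside the WARC (its `--warc-cdx` option). This is only a sketch under assumptions: the function names are hypothetical, the index is assumed to start with a ` CDX ...` header line whose `a` column holds the original URL, and URLs are compared as exact strings with no normalization.

```python
def urls_in_cdx(cdx_lines):
    """Return the set of original URLs recorded in a CDX index."""
    lines = [ln.strip() for ln in cdx_lines if ln.strip()]
    if not lines or not lines[0].startswith("CDX"):
        raise ValueError("expected a CDX header line")
    # The header names each column with a letter; "a" is the original URL.
    fields = lines[0].split()[1:]
    url_col = fields.index("a")
    urls = set()
    for ln in lines[1:]:
        cols = ln.split()
        if len(cols) > url_col:  # skip truncated rows
            urls.add(cols[url_col])
    return urls


def missing_urls(wanted, cdx_lines):
    """Return the wanted URLs that do not appear in the index, in order."""
    seen = urls_in_cdx(cdx_lines)
    return [u for u in wanted if u not in seen]
```

The resulting list could be written to a file and passed to wget with `-i` and a different `--warc-file` name, so the follow-up crawl goes into a new WARC instead of overwriting the old one.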
[21:14] <omf_> yipdw, Can you create a tracker instance for blooper.tv? I should have a url list soon
[21:14] <omf_> or xmc or underscor
[21:16] <yipdw> omf_: http://tracker.archiveteam.org/bloopertv live, you have admin
[21:16] <yipdw> omf_: no upload locations set for it yet, though
[21:17] <omf_> that's fine