#archiveteam 2013-10-09,Wed


Time Nickname Message
00:03 chfoo i'm searching for blip.tv urls in the urlteam torrent
00:07 kyan If I have WARC files I've crawled, is there a good way to upload them to the Internet Archive??
00:11 omf_ kyan, https://github.com/kimmel/ias3upload
00:11 omf_ it is a bulk upload script
00:13 kyan omf_, it looks like I would need collection admin access for that. I just have a normal account there…
00:14 kyan by the way: the files I've got are not "of a type" (they don't all go together) -- just a few websites I've seen fit to archive.
00:15 kyan conceivably I could just upload the files into the Community Texts collection, although that really seems like the wrong place for archived websites.
00:18 omf_ kyan, you set the mediatype as texts and collection as opensource then an admin moves it to the proper place
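A minimal sketch of omf_'s advice expressed against archive.org's S3-like upload API (the "ias3" interface that bulk tools such as ias3upload talk to). The endpoint `s3.us.archive.org` and the `x-amz-auto-make-bucket` / `x-archive-meta-*` headers are assumed to match IA's S3 documentation; check them there before relying on this. The keys, identifier and filename are hypothetical placeholders. The function only builds the request; passing it to `urllib.request.urlopen` would actually send it.

```python
import urllib.request

# Hypothetical placeholders: substitute your own archive.org S3 keys.
ACCESS_KEY = "YOUR_ACCESS_KEY"
SECRET_KEY = "YOUR_SECRET_KEY"


def build_upload_request(identifier: str, filename: str, data: bytes) -> urllib.request.Request:
    """Build (but do not send) a PUT that uploads one file into an IA item."""
    url = f"https://s3.us.archive.org/{identifier}/{filename}"
    headers = {
        "authorization": f"LOW {ACCESS_KEY}:{SECRET_KEY}",
        # Ask IA to create the item on the first upload.
        "x-amz-auto-make-bucket": "1",
        # Per the advice in the log: texts + opensource, then an admin moves it.
        "x-archive-meta-mediatype": "texts",
        "x-archive-meta-collection": "opensource",
    }
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")
```

urllib normalizes header names (e.g. `X-archive-meta-collection`), which the server treats case-insensitively.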
00:19 kyan omf_: Ok, cool. Sounds good. Thanks :D
00:26 chfoo 20,000 blip.tv urls: http://paste.archivingyoursh.it/raw/gidaqotimo.sm
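Filtering a URLteam dump for blip.tv links could look like the sketch below. It assumes each line of the dump is `shortcode|long_url`; that layout is an assumption, so check it against the actual torrent files first. Matching on the parsed hostname rather than a substring avoids false hits such as `notblip.tv`.

```python
from typing import Iterable, Iterator
from urllib.parse import urlsplit


def blip_urls(lines: Iterable[str]) -> Iterator[str]:
    """Yield long URLs whose host is blip.tv or one of its subdomains.

    Assumes "shortcode|long_url" lines; lines without "|" are skipped.
    """
    for line in lines:
        _code, sep, long_url = line.rstrip("\n").partition("|")
        if not sep:
            continue
        host = (urlsplit(long_url).hostname or "").lower()
        if host == "blip.tv" or host.endswith(".blip.tv"):
            yield long_url
```

If the dump files are xz-compressed (also an assumption), they can be streamed with `lzma.open(path, "rt")` and passed straight to `blip_urls`.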
00:31 SketchCow #blooper.tv
00:44 kyan is anyone trying to archive the BBC iPM podcast? they delete it after 30 days. http://www.bbc.co.uk/podcasts/series/ipm
00:54 SketchCow It's you!
00:54 SketchCow You're doing it!
00:55 kyan heh. :D
01:34 SketchCow http://i.imgur.com/mk5rbvg.gif Yahoo's migrating servers again
01:34 BiggieJon ouch
01:35 BiggieJon good thing it only lost a few blanking panels:)
01:35 BiggieJon ya, sure
01:49 kyan Who is the admin I should contact for getting website archives into the right places at the Internet Archive?
01:50 SketchCow Me. What do you need?
01:51 kyan SketchCow, I've uploaded some stuff. Should I PM you to keep the channel quiet?
02:13 godane kyan: i may be able to help with the iPM podcast
02:14 godane the bbc streams almost all podcasts
02:14 godane even if there is only a 30-day download limit on them
02:16 godane i should be able to get stuff as old as 2009
02:19 kyan godane: I've gotten a script set up to download the iPM podcasts regularly as a WARC. What data do you have? We should probably try to get it all into the Internet Archive…
02:27 godane i just plan on grabbing the flash audio files and converting them to m4a
02:27 godane that's what rtmpdump will give me
02:28 godane then i will make my upload script upload them to the audio collection with desc, title, and date
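godane's pipeline, sketched as the two commands a script would run. `rtmpdump -r URL -o FILE` and `ffmpeg -i IN -vn -c:a copy OUT` are standard options, but the RTMP URL below is a placeholder, and BBC streams may need further rtmpdump options (playpath, SWF verification) that are not shown. Copying the audio stream into `.m4a` without re-encoding assumes the FLV carries AAC audio.

```python
import subprocess


def rtmp_to_m4a_commands(rtmp_url: str, stem: str) -> list[list[str]]:
    """Return two commands: dump the stream to FLV, then remux its audio to M4A."""
    flv = f"{stem}.flv"
    m4a = f"{stem}.m4a"
    return [
        # rtmpdump: -r is the RTMP URL, -o the output FLV file.
        ["rtmpdump", "-r", rtmp_url, "-o", flv],
        # ffmpeg: drop any video (-vn) and copy the audio stream unchanged.
        # Stream copy assumes AAC audio; other codecs would need re-encoding.
        ["ffmpeg", "-i", flv, "-vn", "-c:a", "copy", m4a],
    ]


def convert(rtmp_url: str, stem: str) -> None:
    """Run both steps, stopping if either command fails."""
    for cmd in rtmp_to_m4a_commands(rtmp_url, stem):
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to log or dry-run the exact commands before an upload script attaches title, description and date.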
02:31 kyan godane: cool. I can get the MP3s, XML files, and description pages using my script for upcoming podcasts. Time for battle!
02:38 godane looks like the wayback has nothing on it
02:39 godane there are like 3 mp3 urls but i think all are broken
02:43 kyan godane: here's the snapshot I took of the currently available podcasts: https://ia601008.us.archive.org/10/items/BBC8oct2013.warc/ (it's the next-to-last item in the list)
02:44 kyan godane: I think that if I take a snapshot using that script on a regular basis, future episodes should be preserved
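One way to take regular snapshots, sketched under assumptions (this is not kyan's script). Putting the date in `--warc-file` gives each run its own `ipm-YYYY-MM-DD.warc.gz`, since wget appends `.warc.gz` to the name, so a later run never overwrites an earlier WARC. The recursion flags are illustrative; if the MP3s are served from another host, wget also needs `--span-hosts` together with `--domains` to fetch them.

```python
import datetime
import subprocess

# Page from the log; the flags below are an illustrative choice.
FEED_PAGE = "http://www.bbc.co.uk/podcasts/series/ipm"


def snapshot_command(day: datetime.date, prefix: str = "ipm") -> list[str]:
    """Build a wget command whose WARC name includes the date of the run."""
    return [
        "wget",
        # wget writes <name>.warc.gz, so each day gets its own file.
        f"--warc-file={prefix}-{day.isoformat()}",
        "--page-requisites",
        "--recursive",
        "--level=1",
        "--no-parent",
        FEED_PAGE,
    ]


def take_snapshot() -> int:
    """Run today's snapshot and return wget's exit status.

    Not check=True: wget exits non-zero when some links fail (e.g. 404s),
    which is usually still a useful snapshot.
    """
    result = subprocess.run(snapshot_command(datetime.date.today()))
    return result.returncode
```

Calling `take_snapshot()` from cron (for example, daily) keeps pace with the 30-day deletion window mentioned earlier in the log.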
02:50 πŸ”— godane ok
03:02 πŸ”— SketchCow kyan, just tell me your archive.org account name
03:45 πŸ”— kyan SketchCow: EchoBrightstorm
04:03 πŸ”— kyan (note: not everything I've uploaded is a WARC, just some items)
04:14 πŸ”— ivan` https://ludios.org/tmp/calvinanddune.tumblr.com.7z HTTrack of this dead site
04:55 πŸ”— SketchCow kyan converted
05:01 arkhive The Bebo wiki page on archiveteam.org lists 1.0 as 'lost' under 'archiving status' in the panel on the left
05:02 arkhive I thought 1.0 is what is being replaced by 2.0 (now in development) and that 1.0 is still accessible
05:02 arkhive not 'lost'
05:03 arkhive or is 2.0 the version of Bebo that will be replaced after the complete overhaul/redo/redesign/start over is done being developed?
05:04 arkhive http://archive.bebo.com/Profile.jsp?MemberId=1 It's at archive.bebo.com/Profile.f.jadlkjfla but still accessible. Or am I not understanding it right?
05:23 chfoo well, i wrote the page but i had never heard of bebo so i don't know exactly what's there. but there are a few broken links and images. i also checked to see stuff like whether the wall was in the wayback machine and it's not.
05:24 chfoo the wayback machine has only "http://www.bebo.com/Timeout.jsp" recorded
05:36 chfoo i updated the page. i'll call the archive version bebo 1.5.
05:41 chfoo anyone is more than welcome to fix it up though
05:42 marc re
05:48 marc dyld: Library not loaded: /opt/local/lib/libx264.98.dylib Referenced from: /opt/local/bin/ffmpeg Reason: image not found
05:48 marc fcomputer:~ marc$ ffmpeg
05:48 marc sorry misfire
05:48 marc anybody ever scrape ustream?
06:10 Nemo_bis Why? Module threw exception:rar x '/var/tmp/autoclean/derive/manga_Bleach-c323/Bleach-c323.cbr' -w '/var/tmp/autoclean/derive-manga_Bleach-c323-ProcessJP2/RawImagesBookPreprocessor-manga_Bleach-c323/' failed with exit code: 10 https://catalogd.archive.org/log/172961378
07:21 SketchCow http://ascii.textfiles.com/archives/4099
09:51 omf_ Did we use the warrior on yahoo video? I cannot tell from reading the wiki pages about it.
09:54 ivan` alard made the first warrior commit on 2012-04-03
09:54 ivan` https://github.com/ArchiveTeam/warrior-code/commits/master
09:55 omf_ I see
09:57 ersi So that's a 'nope'
10:47 omf_ chfoo, are you still looking for things to fix up on the wiki?
11:40 godane i moved and changed the metadata on the newamerica.net dump
11:40 godane it is not going to be shut down
14:00 omf_ I have done a writeup of how the warrior virtual machine works. Please review and comment --> http://www.archiveteam.org/index.php?title=ArchiveTeam_Warrior#How_the_warrior_works
14:14 Baljem blip's going on the Warrior, right? I have to shuffle some VMs around at work, I might be able to free up a host to run solely Warriors for a while...
18:22 kyan SketchCow, I've got more WARC items uploaded to the Internet Archive to be moved, but there are some I'm still downloading. Would you rather have me tell you about them as I upload them, or wait until I've got a good many uploaded?
18:37 SketchCow Wait until you have many
18:37 SketchCow I can do group actions.
18:40 kyan SketchCow: Cool, thanks.
18:41 Nemo_bis SketchCow: can you move these to wikimediacommons collection? https://archive.org/search.php?query=subject%3A%22Wikimedia%20Commons%22%20collection%3Awikimedia-other
19:09 kyan Another question… (sorry for all the noise) Is there a way to rerun wget with WARC output to make sure that nothing got left out? When I try to do that it just overwrites my old WARC file and starts from scratch.
19:18 SketchCow Nemo_bis: Done
19:19 Nemo_bis thanks
19:19 Nemo_bis SketchCow: can you also force the creation of torrents for all of them till 2012?
19:20 Nemo_bis I don't know how hard a request this is, ignore me if it's not trivial and don't glare at me disapprovingly.
19:20 SketchCow I'll glare no matter what
19:31 SketchCow Ran it against all 98 because fuck the police
19:38 yipdw kyan: with wget's WARC output, no
19:38 yipdw kyan: well, you can create a new one, and it may be possible to reuse a generated WARC index to avoid refetch (you'd probably use a Lua script for that) but there's no "resume from this point in the WARC" option
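wget cannot resume into an existing WARC, but as a rough workaround (a swapped-in technique, not the Lua/index approach yipdw mentions) you can read the old WARC, list which URLs already have a `response` record, and hand only the rest to a fresh run writing a new WARC, e.g. `wget -i missing.txt --warc-file=site-pass2`. The parser is a minimal sketch: it assumes well-formed headers with no folded lines, and it counts any response, including 404s, as captured.

```python
import gzip
from typing import BinaryIO, Iterator


def _records(stream: BinaryIO) -> Iterator[dict[str, str]]:
    """Yield each WARC record's header fields (names lowercased), skipping its block."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # the blank lines that end the previous record
        # `line` is the version line, e.g. b"WARC/1.0\r\n".
        headers: dict[str, str] = {}
        while True:
            line = stream.readline()
            if not line.strip():
                break  # blank line (or EOF) ends the header section
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        # Skip the record block; its size is the Content-Length header.
        stream.read(int(headers.get("content-length", "0")))
        yield headers


def captured_uris(path: str) -> set[str]:
    """Target URIs that already have a 'response' record (any HTTP status)."""
    # gzip.open also reads the one-gzip-member-per-record .warc.gz layout.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as stream:
        return {
            h["warc-target-uri"].strip("<>")
            for h in _records(stream)
            if h.get("warc-type") == "response" and "warc-target-uri" in h
        }


def missing(wanted: list[str], path: str) -> list[str]:
    """URLs from `wanted` that the existing WARC does not contain."""
    have = captured_uris(path)
    return [url for url in wanted if url not in have]
```

Writing `missing(...)` one URL per line to a file gives wget its `-i` input for the second pass.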
19:41 kyan yipdw: I see. I was wondering because I've noticed that sometimes links aren't getting crawled (at all) by wget, as far as I can tell, and thus are getting left out of the archive.
19:41 kyan yipdw: Thanks.
19:45 yipdw kyan: yeah, wget's URL heuristics can be a bit odd -- if push comes to shove, though, you can augment them with the Lua hooks
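The Lua hooks require wget-lua. Without it, a rough alternative (not the Lua mechanism) is to parse saved HTML yourself, collect URLs that wget's heuristics skipped, and fetch them in a second run with `wget -i`. A minimal sketch using Python's `html.parser`; the attribute set `URL_ATTRS` is a hypothetical choice and will not match what wget itself extracts, and URLs built in JavaScript or referenced from CSS are not found.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

# Illustrative attribute list; wget's own extraction rules differ.
URL_ATTRS = {"href", "src", "data-src", "poster"}


class LinkCollector(HTMLParser):
    """Collect absolute http(s) URLs from selected tag attributes."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in URL_ATTRS and value:
                # Resolve relative links and drop any #fragment.
                absolute, _frag = urldefrag(urljoin(self.base_url, value.strip()))
                if absolute.startswith(("http://", "https://")):
                    self.links.add(absolute)


def extract_links(html: str, base_url: str) -> set[str]:
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Diffing `extract_links(...)` against the URLs already in the WARC gives a candidate list of what the first crawl left out.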
20:07 kyan yipdw: huh, that looks cool. Hadn't seen that before. Installing…
21:14 omf_ yipdw, Can you create a tracker instance for blooper.tv? I should have a url list soon
21:14 omf_ or xmc or underscor
21:16 yipdw omf_: http://tracker.archiveteam.org/bloopertv live, you have admin
21:16 yipdw omf_: no upload locations set for it yet, though
21:17 omf_ that's fine
