Time |
Nickname |
Message |
00:03
|
chfoo |
i'm searching for blip.tv urls in the urlteam torrent |
00:07
|
kyan |
If I have WARC files I've crawled, is there a good way to upload them to the Internet Archive?? |
00:11
|
omf_ |
kyan, https://github.com/kimmel/ias3upload |
00:11
|
omf_ |
it is a bulk upload script |
00:13
|
kyan |
omf_, it looks like I would need collection admin access for that. I just have a normal account there... |
00:14
|
kyan |
by the way: the files I've got are not "of a type" (they don't all go together) -- just a few websites I've seen fit to archive. |
00:15
|
kyan |
conceivably I could just upload the files into the Community Texts collection, although that really seems like the wrong place for archived websites. |
00:18
|
omf_ |
kyan, you set the mediatype as texts and collection as opensource then an admin moves it to the proper place |
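Editor's sketch of the metadata omf_ describes, expressed as the request an uploader would send to the Internet Archive's S3-like endpoint. The function name, the item identifier, and the key values are hypothetical; only the `texts` mediatype and `opensource` collection come from the conversation. This is an illustration under those assumptions, not the way ias3upload itself is implemented.

```python
from urllib.parse import quote


def ias3_upload_request(identifier, filename, access_key, secret_key):
    """Build the URL and headers for one PUT to the archive.org S3-like API.

    identifier, access_key and secret_key are placeholders supplied by the
    caller; nothing is sent over the network here.
    """
    url = "https://s3.us.archive.org/%s/%s" % (quote(identifier), quote(filename))
    headers = {
        # archive.org's "LOW" authorization scheme: access key and secret.
        "authorization": "LOW %s:%s" % (access_key, secret_key),
        # Create the item if it does not exist yet.
        "x-archive-auto-make-bucket": "1",
        # Values suggested in the channel; an admin moves the item later.
        "x-archive-meta-mediatype": "texts",
        "x-archive-meta-collection": "opensource",
    }
    return url, headers
```

The returned pair can be handed to any HTTP client that supports PUT with the WARC file as the request body.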
00:19
|
kyan |
omf_: Ok, cool. Sounds good. Thanks :D |
00:26
|
chfoo |
20,000 blip.tv urls: http://paste.archivingyoursh.it/raw/gidaqotimo.sm |
00:31
|
SketchCow |
#blooper.tv |
00:44
|
kyan |
is anyone trying to archive the BBC iPM podcast? they delete it after 30 days. http://www.bbc.co.uk/podcasts/series/ipm |
00:54
|
SketchCow |
It's you! |
00:54
|
SketchCow |
You're doing it! |
00:55
|
kyan |
heh. :D |
01:34
|
SketchCow |
http://i.imgur.com/mk5rbvg.gif Yahoo's migrating servers again |
01:34
|
BiggieJon |
ouch |
01:35
|
BiggieJon |
good thing it only lost a few blanking panels:) |
01:35
|
BiggieJon |
ya, sure |
01:49
|
kyan |
Who is the admin I should contact for getting website archives into the right places at the Internet Archive? |
01:50
|
SketchCow |
Me. What do you need? |
01:51
|
kyan |
SketchCow, I've uploaded some stuff. Should I PM you to keep the channel quiet? |
02:13
|
godane |
kyan: i may be able to help with the iPM podcast |
02:14
|
godane |
the bbc streams almost all podcasts |
02:14
|
godane |
even if there is only a 30 day download limit on them |
02:16
|
godane |
i should be able to get stuff from as old as 2009 |
02:19
|
kyan |
godane: I've gotten a script set up to download the iPM podcasts regularly as a WARC. What data do you have? We should probably try to get it all into the Internet Archive... |
02:27
|
godane |
i just plan on grabbing the flash audio files and converting them to m4a |
02:27
|
godane |
that's what rtmpdump will give me |
02:28
|
godane |
then i will make my upload script with desc, title, and date upload to the audio collection |
02:31
|
kyan |
godane: cool. I can get the MP3s, XML files, and description pages using my script for upcoming podcasts. Time for battle! |
02:38
|
godane |
looks like the wayback has nothing on it |
02:39
|
godane |
there are like 3 mp3 urls but i think all are broken |
02:43
|
kyan |
godane: here's the snapshot I took of the currently available podcasts: https://ia601008.us.archive.org/10/items/BBC8oct2013.warc/ (it's the next-to-last item in the list) |
02:44
|
kyan |
godane: I think that if I take a snapshot using that script on a regular basis, future episodes should be preserved |
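Editor's sketch of a recurring snapshot along the lines kyan describes. kyan's actual script is not shown in the log, so the function name, the label, and the crawl depth are assumptions. It builds a wget command line whose WARC name carries a timestamp, so that each scheduled run writes a new file.

```python
from datetime import datetime, timezone


def snapshot_command(url, label, now=None):
    """Return a wget argv list that writes a timestamped WARC.

    label and the recursion settings are illustrative choices, not taken
    from the conversation.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return [
        "wget",
        "--recursive",
        "--level=1",
        "--page-requisites",
        # wget appends .warc.gz to this base name.
        "--warc-file=%s-%s" % (label, stamp),
        # Also write a CDX index next to the WARC.
        "--warc-cdx",
        url,
    ]
```

The list can be passed to `subprocess.run` from a cron job or similar scheduler.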
02:50
|
godane |
ok |
03:02
|
SketchCow |
kyan, just tell me your archive.org account name |
03:45
|
kyan |
SketchCow: EchoBrightstorm |
04:03
|
kyan |
(note: not everything I've uploaded is a WARC, just some items) |
04:14
|
ivan` |
https://ludios.org/tmp/calvinanddune.tumblr.com.7z HTTrack of this dead site |
04:55
|
SketchCow |
kyan converted |
05:01
|
arkhive |
The Bebo wiki page on archiveteam.org lists 1.0 as 'lost' under 'archiving status' in the panel on the left |
05:02
|
arkhive |
I thought 1.0 is what is being replaced by 2.0 (now in development) and that 1.0 is still accessible |
05:02
|
arkhive |
not 'lost' |
05:03
|
arkhive |
or is 2.0 the version of Bebo that will be replaced after the complete overhaul/redo/redesign/start over is done being developed? |
05:04
|
arkhive |
http://archive.bebo.com/Profile.jsp?MemberId=1 It's in the archive.bebo.com/Profile.f.jadlkjfla but still accessible. Or am I not understanding it right? |
05:23
|
chfoo |
well, i wrote the page but i never heard of bebo so i don't know exactly what's there. but there's a few broken links and images. i also checked to see stuff like whether the wall was in the wayback machine and it's not. |
05:24
|
chfoo |
the wayback machine has only "http://www.bebo.com/Timeout.jsp" recorded |
05:36
|
chfoo |
i updated the page. i'll call the archive version bebo 1.5. |
05:41
|
chfoo |
anyone is more than welcome to fix it up though |
05:42
|
marc |
re |
05:48
|
marc |
dyld: Library not loaded: /opt/local/lib/libx264.98.dylib Referenced from: /opt/local/bin/ffmpeg Reason: image not found |
05:48
|
marc |
fcomputer:~ marc$ ffmpeg |
05:48
|
marc |
sorry misfire |
05:48
|
marc |
anybody ever scrape ustream? |
06:10
|
Nemo_bis |
Why? Module threw exception:rar x '/var/tmp/autoclean/derive/manga_Bleach-c323/Bleach-c323.cbr' -w '/var/tmp/autoclean/derive-manga_Bleach-c323-ProcessJP2/RawImagesBookPreprocessor-manga_Bleach-c323/' failed with exit code: 10 https://catalogd.archive.org/log/172961378 |
07:21
|
SketchCow |
http://ascii.textfiles.com/archives/4099 |
09:51
|
omf_ |
Did we use the warrior on yahoo video? I cannot tell from reading the wiki pages about it. |
09:54
|
ivan` |
alard made the first warrior commit on 2012-04-03 |
09:54
|
ivan` |
https://github.com/ArchiveTeam/warrior-code/commits/master |
09:55
|
omf_ |
I see |
09:57
|
ersi |
So that's a 'nope' |
10:47
|
omf_ |
chfoo, are you still looking for things to fix up on the wiki? |
11:40
|
godane |
i move and change metadata on the newamerica.net dump |
11:40
|
godane |
it is not going to be shutdown |
14:00
|
omf_ |
I have done a writeup of how the warrior virtual machine works. Please review and comment --> http://www.archiveteam.org/index.php?title=ArchiveTeam_Warrior#How_the_warrior_works |
14:14
|
Baljem |
blip's going on the Warrior, right? I have to shuffle some VMs around at work, I might be able to free up a host to run solely Warriors for a while... |
18:22
|
kyan |
SketchCow, I've got more WARC items uploaded to the Internet Archive to be moved, but there are some I'm still downloading. Would you rather have me tell you about them as I upload them, or wait until I've got a good many uploaded? |
18:37
|
SketchCow |
Wait until you have many |
18:37
|
SketchCow |
I can do group actions. |
18:40
|
kyan |
SketchCow: Cool, thanks. |
18:41
|
Nemo_bis |
SketchCow: can you move these to wikimediacommons collection? https://archive.org/search.php?query=subject%3A%22Wikimedia%20Commons%22%20collection%3Awikimedia-other |
19:09
|
kyan |
Another question... (sorry for all the noise) Is there a way to rerun wget with WARC output to make sure that nothing got left out? When I try to do that it just overwrites my old WARC file and starts from scratch. |
19:18
|
SketchCow |
Nemo_bis: Done |
19:19
|
Nemo_bis |
thanks |
19:19
|
Nemo_bis |
SketchCow: can you also force the creation of torrents for all of them till 2012? |
19:20
|
Nemo_bis |
I don't know how hard a request this is, ignore me if it's not trivial and don't glare at me disapprovingly. |
19:20
|
SketchCow |
I'll glare no matter what |
19:31
|
SketchCow |
Ran it against all 98 because fuck the police |
19:38
|
yipdw |
kyan: with wget's WARC output, no |
19:38
|
yipdw |
kyan: well, you can create a new one, and it may be possible to reuse a generated WARC index to avoid refetch (you'd probably use a Lua script for that) but there's no "resume from this point in the WARC" option |
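Editor's sketch of the "avoid refetch" idea yipdw mentions, done outside wget rather than with a Lua script. It reads an existing WARC, collects the target URIs of its response records, and filters a URL list down to what is still missing. All function names and the sample record are hypothetical; it assumes well-formed records with a `Content-Length` header and is not a full WARC parser.

```python
import gzip
import io


def fetched_urls(stream):
    """Collect WARC-Target-URI values of response records from a binary stream."""
    urls = set()
    while True:
        line = stream.readline()
        if not line:
            break
        if not line.startswith(b"WARC/"):
            continue  # blank separator lines between records
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
            name, _, value = line.partition(b":")
            headers[name.strip().lower()] = value.strip()
        # Skip the record body so payload bytes are never read as headers.
        stream.read(int(headers.get(b"content-length", b"0")))
        if headers.get(b"warc-type") == b"response":
            uri = headers.get(b"warc-target-uri", b"").decode("utf-8", "replace")
            urls.add(uri.strip("<>"))  # some writers wrap the URI in <>
    return urls


def remaining(all_urls, warc_path):
    """Return the URLs from all_urls that have no response record in the WARC."""
    opener = gzip.open if warc_path.endswith(".gz") else open
    with opener(warc_path, "rb") as stream:
        done = fetched_urls(stream)
    return [u for u in all_urls if u not in done]


# Tiny in-memory record, for illustration only.
SAMPLE = io.BytesIO(
    b"WARC/1.0\r\nWARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.com/a\r\n"
    b"Content-Length: 5\r\n\r\nhello\r\n\r\n"
)
```

The remaining URLs could then be fed to a second wget run that writes to a new WARC file, which matches yipdw's point that a fresh file is needed rather than a resume.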
19:41
|
kyan |
yipdw: I see. I was wondering because I've noticed that sometimes links aren't getting crawled (at all) by wget, as far as I can tell, and thus are getting left out of the archive. |
19:41
|
kyan |
yipdw: Thanks. |
19:45
|
yipdw |
kyan: yeah, wget's URL heuristics can be a bit odd -- if push comes to shove, though, you can augment them with the Lua hooks |
20:07
|
kyan |
yipdw: huh, that looks cool. Hadn't seen that before. Installing... |
21:14
|
omf_ |
yipdw, Can you create a tracker instance for blooper.tv? I should have a url list soon |
21:14
|
omf_ |
or xmc or underscor |
21:16
|
yipdw |
omf_: http://tracker.archiveteam.org/bloopertv live, you have admin |
21:16
|
yipdw |
omf_: no upload locations set for it yet, though |
21:17
|
omf_ |
that's fine |