Time |
Nickname |
Message |
00:00
|
nickname_ |
I'm surprised that this 11-year-old file is still on NPR.org and still works in RealPlayer |
00:01
|
dashcloud |
hmm- let me try to track down an mplayer install and the codec pack- that should handle the file then |
00:02
|
nickname_ |
Here is the one with the weird file format on archive.org: https://archive.org/details/MorningEditionForDecember22200520051222Me14 |
00:03
|
nickname_ |
(it's not in the player, you have to go to show all) |
00:09
|
godane |
nickname_: i will look at npr |
00:10
|
godane |
i have been doing it: https://archive.org/details/npr-morning-edition |
00:10
|
nickname_ |
okay, i've been doing it manually with firefox using FlashGot and DownThemAll... |
00:11
|
nickname_ |
I saw that collection and decided to start from January 2005, but when you click on some of them they are unavailable for some reason |
00:11
|
nickname_ |
(copyright?) |
00:12
|
godane |
i brute force the paths using wget |
00:12
|
godane |
so sometimes there is no mp3 but there is real media |
00:13
|
godane |
or sometimes it's real media but no mp3 |
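godane's approach of brute-forcing the paths can be sketched in Python. This is a minimal sketch, not his actual wget setup: it sends HEAD requests with `urllib` instead of using wget, assumes file names follow the `YYYYMMDD_me_NN.ext` pattern seen in `20051222_me_13.rm` later in this log, and uses a hypothetical `BASE_URL` because the real NPR media path is not given here. Each guess is tried as both `.mp3` and `.rm`, since either one may be missing for a given day.

```python
import datetime
import urllib.request

# HYPOTHETICAL: the real NPR media directory is not given in this log.
BASE_URL = "https://media.example.invalid/npr/me/{year}/{month:02d}"


def candidate_urls(day, show="me", max_segment=20, exts=("mp3", "rm")):
    """Yield guesses like .../20051222_me_13.rm for one broadcast date.

    The YYYYMMDD_<show>_NN naming is inferred from 20051222_me_13.rm;
    max_segment and the two-digit padding are assumptions.
    """
    base = BASE_URL.format(year=day.year, month=day.month)
    stamp = day.strftime("%Y%m%d")
    for segment in range(1, max_segment + 1):
        for ext in exts:
            yield f"{base}/{stamp}_{show}_{segment:02d}.{ext}"


def exists(url, timeout=10):
    """True if a HEAD request succeeds; urlopen raises on 4xx/5xx."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except OSError:  # covers HTTPError, URLError and socket timeouts
        return False


def probe_day(day, **kwargs):
    """Return only the candidate URLs the server actually answers for."""
    return [url for url in candidate_urls(day, **kwargs) if exists(url)]
```

Calling `probe_day(datetime.date(2005, 12, 22))` would return whichever guesses resolve; the segment range and padding are the first things to adjust against the real server.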
00:13
|
nickname_ |
on each page there is a download button for usually mp3 that is a direct link |
00:13
|
godane |
for Morning Edition yes |
00:13
|
nickname_ |
this page (http://www.npr.org/programs/morning-edition/2005/12/22/12927340/?showDate=2005-12-22) is one of those annoying real media ones |
00:13
|
godane |
for show like Talk of the Nation there was real media for a long time |
00:15
|
nickname_ |
I tried wget, but there was no easy way to do it quickly |
00:15
|
godane |
https://archive.org/details/npr-talk-of-the-nation |
00:16
|
godane |
talk of the nation real media files after august 20, 2001 don't encode into mp3 well |
00:16
|
nickname_ |
some of those are unavailable as well |
00:17
|
dashcloud |
holy shit- mplayer plays that first one just fine- 20051222_me_13.rm |
00:17
|
nickname_ |
what! |
00:17
|
godane |
mplayer plays the real media files fine |
00:18
|
godane |
it just doesn't re-encode on archive.org right |
00:18
|
nickname_ |
hmm |
00:20
|
|
BlueMaxim has quit IRC (Read error: Operation timed out) |
00:28
|
|
ris has quit IRC () |
00:57
|
|
VADemon has quit IRC (Quit: left4dead) |
00:57
|
nickname_ |
godane: have you tried converting the *.rm file with mencoder to mp3? |
01:01
|
dashcloud |
maybe it would be better to convert to a .wav file, then upload that so IA can make derivatives? |
01:09
|
godane |
i upload the original files |
01:10
|
godane |
the derive problem started around august 20 2001 |
01:10
|
godane |
after that IA couldn't re-encode the real media files |
01:10
|
joepie91 |
dashcloud: better to do it as FLAC then |
01:11
|
joepie91 |
space-waste-wise |
01:11
|
godane |
i prefer just to upload original files |
01:11
|
joepie91 |
sure, but you can upload both the original and a FLAC version |
01:11
|
joepie91 |
and let it derive from the FLAC |
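The decode-then-FLAC route dashcloud and joepie91 describe could look like the sketch below. It assumes the `mplayer` and `flac` command-line tools are installed (mplayer is what played `20051222_me_13.rm` above); the `-8` compression level and deleting the intermediate WAV are choices made for this sketch, not something from the discussion. The original `.rm` is left untouched so both files can go into the item.

```python
import shutil
import subprocess
from pathlib import Path


def decode_cmd(rm_path, wav_path):
    """mplayer: decode the audio to a PCM WAV file, ignoring any video.

    Note: a wav_path containing ':' or ',' would need mplayer's
    suboption escaping; plain file names are assumed here.
    """
    return ["mplayer", "-really-quiet", "-vc", "null", "-vo", "null",
            "-ao", f"pcm:file={wav_path}", str(rm_path)]


def flac_cmd(wav_path, flac_path):
    """flac: lossless compression at the slowest/smallest level (-8)."""
    return ["flac", "-8", "-f", "-o", str(flac_path), str(wav_path)]


def rm_to_flac(rm_path):
    """Write <name>.flac next to <name>.rm, leaving the original in place."""
    rm_path = Path(rm_path)
    wav_path = rm_path.with_suffix(".wav")
    flac_path = rm_path.with_suffix(".flac")
    for tool in ("mplayer", "flac"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} is not on PATH")
    subprocess.run(decode_cmd(rm_path, wav_path), check=True)
    subprocess.run(flac_cmd(wav_path, flac_path), check=True)
    wav_path.unlink()  # the intermediate WAV is only a stepping stone
    return flac_path
```

Both files could then be uploaded to the same item with, e.g., `ia upload <identifier> show.rm show.flac`; whether IA's deriver then picks the FLAC over the `.rm` as its source is not confirmed in this log.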
01:12
|
godane |
then IA can fix this later |
01:12
|
godane |
i figure it's an IA derive process problem |
01:15
|
|
tomwsmf-a has joined #archiveteam-bs |
01:16
|
godane |
the one that works before they stop deriving : https://archive.org/details/npr-talk-of-the-nation-08-20-2001 |
01:19
|
|
JesseW has joined #archiveteam-bs |
01:59
|
|
j08nY has quit IRC (Quit: Leaving) |
02:00
|
nickname_ |
that one says there are issues with this item's content |
02:02
|
SketchCow |
Safe at home. |
02:05
|
godane |
nickname_: i can see the item |
02:05
|
godane |
also some of my stuff i upload goes to podcast collection |
02:05
|
godane |
that collection makes things disappear publicly for some reason |
02:07
|
nickname_ |
you can see it, but i can't (https://i.imgur.com/fQLJU5F.png) |
02:07
|
godane |
i know |
02:07
|
godane |
it works when i'm logged in |
02:08
|
godane |
looks like i can't even download it on wget |
02:08
|
godane |
*using wget |
02:08
|
godane |
SketchCow: we need to fix that podcast collection |
02:09
|
godane |
even if we have to move the collections out of the podcast collection so they're viewable/downloadable |
02:12
|
SketchCow |
Whut |
02:14
|
godane |
it's a bug from a long time ago |
02:15
|
godane |
the podcast collection i think blocks non-logged-in users from downloading the items in it |
02:15
|
godane |
SketchCow: you noticed this bug too at one point |
02:21
|
godane |
fun fact: i was around 617k around jan 22 2016 |
02:21
|
godane |
i'm at 718k so i got up about 101k in about 5 months |
02:23
|
godane |
so i figure the goal would be to get around to 850k+ before the end of the year |
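A quick check of godane's numbers. The counts are taken as stated (the log doesn't say what the "k" counts), and the current date is assumed to be 21 June 2016, close to the dates of links posted in this log; both are assumptions.

```python
from datetime import date

# Figures from the log; the unit ("k") is whatever godane was counting.
start_count, start_day = 617, date(2016, 1, 22)
now_count, now_day = 718, date(2016, 6, 21)  # ASSUMED date of this log
goal, deadline = 850, date(2016, 12, 31)

days_so_far = (now_day - start_day).days
days_left = (deadline - now_day).days

per_day = (now_count - start_count) / days_so_far
projected = now_count + per_day * days_left
needed_per_day = (goal - now_count) / days_left

print(f"pace so far: {per_day:.3f}k/day")
print(f"projected by year end: {projected:.1f}k")
print(f"needed for {goal}k: {needed_per_day:.3f}k/day")
```

With those dates the pace so far is about 0.669k per day, which projects to roughly 847k by 31 December; getting to 850k+ would take about 0.684k per day.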
02:26
|
|
Start_ has joined #archiveteam-bs |
02:26
|
|
Start has quit IRC (Ping timeout: 260 seconds) |
02:37
|
|
acridAxid has quit IRC (marauder) |
02:38
|
|
acridAxid has joined #archiveteam-bs |
02:49
|
godane |
so i got a book called The official AT&T Worldnet Web Discovery Guide |
02:49
|
godane |
it came with a cd with a free month of at&t worldnet |
02:51
|
DoomTay |
Now I'm thinking of http://motherboard.vice.com/read/this-dude-is-collecting-your-old-aol-cds |
02:51
|
DoomTay |
Wait... |
03:01
|
dashcloud |
glad to hear you made it home safely SketchCow |
03:12
|
godane |
i'm uploading the iso now |
03:14
|
godane |
i'm listening to the All Things Considered -- A review of the World Wide Web and RealAudio! |
03:14
|
godane |
1995-06-06 episode |
03:23
|
godane |
https://archive.org/details/The_Official_ATT_Worldnet_Web_Discovery_Guide_CD |
03:24
|
DoomTay |
You should scan the CD cover as well |
03:27
|
|
nickname_ has quit IRC (Ping timeout: 492 seconds) |
03:29
|
|
xXx_ndidd has joined #archiveteam-bs |
03:30
|
|
tomwsmf-a has quit IRC (Read error: Operation timed out) |
03:32
|
|
ndiddy has quit IRC (Ping timeout: 244 seconds) |
04:01
|
|
vitzli has joined #archiveteam-bs |
04:06
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
04:12
|
|
FalconK has quit IRC (Ping timeout: 260 seconds) |
04:13
|
|
FalconK has joined #archiveteam-bs |
04:15
|
|
Sk1d has joined #archiveteam-bs |
04:22
|
|
tomwsmf-a has joined #archiveteam-bs |
04:31
|
|
tomwsmf-a has quit IRC (Ping timeout: 258 seconds) |
04:52
|
|
DoomTay has quit IRC (Quit: Page closed) |
04:52
|
godane |
https://archive.org/details/Signing_For_Dummies_CD |
05:07
|
|
BlueMaxim has joined #archiveteam-bs |
05:16
|
godane |
so i'm grabbing Analytical Chemistry journals |
05:28
|
godane |
someone else can go after it: https://thepiratebay.org/torrent/13670809/Analytical_Chemistry_(1929-2015) |
05:28
|
godane |
this guy has a lot of interesting stuff: https://thepiratebay.org/user/clouderone/0/5/0 |
05:36
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
06:07
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
06:09
|
|
dashcloud has joined #archiveteam-bs |
06:11
|
godane |
starting to upload NASA docs again: https://archive.org/details/NASA_NTRS_Archive_19630002631 |
06:24
|
godane |
i'm also starting to upload my deadspin.com sitemap urls |
06:24
|
godane |
i'm starting to download 2010 urls |
06:48
|
ivan` |
when you notice that your ubuntu 16.04 tmuxes are locking up, grep this log and read what I wrote here :-) https://github.com/ludios/grab-site/commit/ca7bc71045784b1cfda2a6d1d6dbc44a4000aa13 |
07:39
|
|
is- has quit IRC (Ping timeout: 362 seconds) |
07:46
|
|
schbirid has joined #archiveteam-bs |
08:00
|
|
vitzli has quit IRC (Quit: Leaving) |
08:31
|
|
is- has joined #archiveteam-bs |
08:53
|
|
Stilett0 has joined #archiveteam-bs |
08:53
|
|
Stiletto has quit IRC (Read error: Operation timed out) |
09:26
|
|
Zebranky has quit IRC (Read error: Operation timed out) |
09:36
|
|
Zebranky has joined #archiveteam-bs |
09:48
|
|
vitzli has joined #archiveteam-bs |
11:51
|
midas |
ivan`: ubuntu 16.04, systemd? |
11:54
|
|
jut has joined #archiveteam-bs |
12:38
|
|
xXx_ndidd has quit IRC (Read error: Connection reset by peer) |
12:41
|
|
j08nY has joined #archiveteam-bs |
12:47
|
|
VADemon has joined #archiveteam-bs |
13:36
|
|
nickname_ has joined #archiveteam-bs |
13:43
|
|
VADemon has quit IRC (Quit: left4dead) |
13:44
|
|
nickname_ has quit IRC (Read error: Operation timed out) |
13:50
|
|
sep332 has quit IRC (Quit: Konversation terminated!) |
14:00
|
|
jut has quit IRC (Read error: Connection reset by peer) |
14:25
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
14:25
|
|
aschmitz has quit IRC (Read error: Operation timed out) |
15:18
|
|
VADemon has joined #archiveteam-bs |
15:30
|
|
DoomTay has joined #archiveteam-bs |
15:36
|
luckcolor |
guys any good tool to download an entire archive.org collection? |
15:38
|
DoomTay |
So we're clear, you mean a single collection, not the entire archive.org database? |
15:38
|
luckcolor |
ofc a single collection :P |
15:39
|
luckcolor |
entire archive.org collection == entire collection? |
15:39
|
luckcolor |
:P |
15:39
|
JW_work |
luckcolor: Use the internetarchive python module, and some scripting around it |
15:39
|
JW_work |
MacNN is going away — http://www.macnn.com/articles/16/06/20/long.time.staff.winding.up.two.decades.of.service.at.the.end.of.june.134716/ |
15:39
|
* |
luckcolor doesn't use python |
15:40
|
luckcolor |
but i will check out how that module works; would prefer other stuff |
15:46
|
JW_work |
well, you're going to need to do *some* scripting, afaik |
15:46
|
JW_work |
what language do you prefer? |
15:47
|
luckcolor |
What languages do you know? maybe it's easier :) |
15:48
|
luckcolor |
well found one https://github.com/ghalfacree/bash-scripts/blob/master/archivedownload.sh |
15:49
|
JW_work |
add it to the wiki if it isn't already there |
15:49
|
luckcolor |
sigh wiki editing incoming.... |
15:52
|
luckcolor |
actually this script requires you to tell it which file type to download |
15:53
|
|
j08nY has quit IRC (Quit: Leaving) |
15:53
|
luckcolor |
ia download --search 'collection:terroroftinytown' |
15:54
|
JW_work |
You want to download the zip files |
15:59
|
luckcolor |
ia download --search 'subject:terroroftinytown' |
15:59
|
luckcolor |
error: the query "'subject:terroroftinytown'" returned no results |
16:00
|
JW_work |
I think you want 'collection:urlteam' |
16:00
|
luckcolor |
trying |
16:02
|
luckcolor |
ia download --search subject:terroroftinytown |
16:02
|
luckcolor |
urlteam_2014-11-06-20-33-23 (1/512): dd |
16:02
|
luckcolor |
had to remove the unix like quotation marks |
16:02
|
JW_work |
ha |
16:02
|
luckcolor |
apparently windows or the script doesn't like them |
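The quoting trouble comes from the shell rather than from `ia`: Windows `cmd.exe` does not treat single quotes as quoting characters, so `'subject:terroroftinytown'` reached `ia` with the quotes still attached, which is why the error message echoed them back. A minimal sketch that avoids any shell by handing `subprocess` an argument list; the `collection:urlteam` query and the `*.zip` pattern follow JW_work's suggestions above and are assumptions about what is wanted.

```python
import shutil
import subprocess


def ia_download_argv(query, glob=None):
    """Build the argv for `ia download --search QUERY [--glob PATTERN]`.

    Each element reaches `ia` verbatim because no shell parses it,
    so neither the query nor the glob needs quoting on any OS.
    """
    argv = ["ia", "download", "--search", query]
    if glob is not None:
        argv += ["--glob", glob]
    return argv


def run_ia(argv):
    """Run the command, failing early if the `ia` CLI is not installed."""
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError("`ia` not found; it ships with `pip install internetarchive`")
    return subprocess.run(argv, check=True)
```

`run_ia(ia_download_argv("collection:urlteam", "*.zip"))` would then fetch only the zip files. From Python directly, the `internetarchive` module's `search_items` and `download(..., glob_pattern="*.zip")` cover the same ground without going through the CLI.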
16:02
|
luckcolor |
:P |
16:03
|
luckcolor |
side note: that's the worst looking chracter for a progress bar |
16:03
|
luckcolor |
*character |
16:03
|
luckcolor |
oh well |
16:04
|
JW_work |
heh, feel free to make a PR |
16:27
|
|
vitzli has quit IRC (Quit: Leaving) |
16:29
|
|
Rickster has quit IRC (Quit: ZNC 1.6.1 - http://znc.in) |
16:30
|
|
Rickster has joined #archiveteam-bs |
17:02
|
luckcolor |
Also it's ultra slow don't know if it's my connection but will run it on my server |
17:02
|
luckcolor |
200mbit/s go |
17:02
|
luckcolor |
! |
17:06
|
|
metalcamp has joined #archiveteam-bs |
17:07
|
luckcolor |
still incredibly slow, i will leave it running overnight |
17:07
|
luckcolor |
JW_work do you perhaps know how big the collection is? |
17:11
|
luckcolor |
ok on my server it says it had errors |
17:12
|
* |
luckcolor decides to change tool as this one seems to be stupid |
17:15
|
JW_work |
less than a terabyte |
17:16
|
JW_work |
I'm seeding the whole collection, also — so you could get it that way if you prefer |
17:16
|
JW_work |
but you'll still need to grab the torrents (or magnet links) |
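For the torrent route JW_work mentions: archive.org generates a torrent for each item, normally named `<identifier>_archive.torrent` in the item's download directory. A minimal sketch that turns a list of identifiers (for example the output of `ia search 'collection:urlteam' --itemlist`, one identifier per line) into torrent URLs; the naming convention is assumed from how IA usually names these files, so it is worth checking one URL by hand first.

```python
def torrent_url(identifier):
    """Assumed IA convention: <identifier>_archive.torrent inside the item."""
    return f"https://archive.org/download/{identifier}/{identifier}_archive.torrent"


def torrent_urls(itemlist_text):
    """One torrent URL per non-blank line of an identifier list."""
    return [torrent_url(line.strip())
            for line in itemlist_text.splitlines()
            if line.strip()]
```

The resulting URLs can be fed to any BitTorrent client that accepts .torrent URLs, which also lets a partial download pick only the items that fit on the available disk.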
17:17
|
luckcolor |
JW_work good news then because i don't have enough space anywhere |
17:17
|
DoomTay |
200mbits is better than what I have |
17:17
|
luckcolor |
it's on the server |
17:17
|
luckcolor |
and apparently it says error |
17:18
|
luckcolor |
SIGH python |
17:18
|
JW_work |
luckcolor: how much space do you have? |
17:18
|
luckcolor |
not enough 250 gb here |
17:18
|
luckcolor |
36 on the server |
17:18
|
JW_work |
36gb? yeah, you'll need more than that :-) |
17:19
|
luckcolor |
well ssd volumes aren't cheap |
17:19
|
JW_work |
why does it need to be an SSD? |
17:20
|
luckcolor |
Scaleway |
17:20
|
luckcolor |
only ssds there |
17:20
|
JW_work |
ah, makes sense |
17:32
|
yipdw |
must be the program, can't be the network or infrastructure or anything else, oh no |
17:34
|
luckcolor |
i mean i don't have the space anyway |
17:48
|
|
Start_ is now known as Start |
17:57
|
espes__ |
true? https://twitter.com/marcan42/status/745183129283919872 |
17:57
|
espes__ |
'Why does RT get to retroactively edit YouTube videos, keeping the URL, even though YT says that is "not possible"?' |
17:58
|
espes__ |
https://meduza.io/en/news/2016/06/20/russian-state-television-accidentally-broadcasts-evidence-that-moscow-uses-cluster-bombs-in-syria |
18:14
|
arkiver |
I still see the big barrels here https://www.youtube.com/watch?v=dNbIRD8Cq48 |
18:15
|
espes__ |
see the note |
18:16
|
espes__ |
guess youtube is mutable -_- |
18:18
|
schbirid |
i thought partners can replace videos |
18:19
|
yipdw |
if RT didn't get to rewrite history in their own image it wouldn't really be RT |
18:20
|
xmc |
harsh |
18:20
|
yipdw |
wabam |
18:32
|
|
tomwsmf-a has joined #archiveteam-bs |
18:43
|
DoomTay |
The sad thing about sites going down is that even if they were perfectly saved, they would still be harder to search |
18:43
|
xmc |
ya |
18:55
|
JW_work |
eh, not always — once they are converted into downloadable dumps, it can be *easier* to search them |
18:55
|
JW_work |
just because we don't *currently* have fabulous WARC searching tools doesn't mean that such things can't be written |
18:56
|
xmc |
yeah you could build a thing that pulls unusual words out of the warc'd html and adds it to a cdx-style index |
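xmc's idea of pulling unusual words out of archived HTML into a sorted, CDX-style lookup file could be sketched as below. Assumptions: the `(timestamp, url, html)` records have already been extracted from the WARCs (a library such as warcio can do that part), "unusual" is approximated as a word that appears in at most `max_docs` captures, and the output lines use a simplified `word timestamp url` layout, not the real CDX field specification.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def words_in(html):
    """Lower-cased words of 3+ characters from the page's visible text."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(extractor.parts).lower()
    return set(re.findall(r"[a-z][a-z0-9']{2,}", text))


def build_rare_word_index(records, max_docs=1):
    """records: iterable of (timestamp, url, html) tuples.

    Returns sorted "word timestamp url" lines for words found in at most
    max_docs captures; sorted text can be binary-searched like a CDX file.
    """
    postings = defaultdict(set)
    for timestamp, url, html in records:
        for word in words_in(html):
            postings[word].add((timestamp, url))
    lines = [
        f"{word} {timestamp} {url}"
        for word, hits in postings.items()
        if len(hits) <= max_docs
        for timestamp, url in hits
    ]
    return sorted(lines)
```

Keeping only rare words keeps the index small, at the cost of missing pages that only contain common terms; raising `max_docs` trades size for recall.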
19:16
|
|
j08nY has joined #archiveteam-bs |
19:26
|
|
nickname_ has joined #archiveteam-bs |
19:26
|
|
antomati_ has joined #archiveteam-bs |
19:26
|
|
swebb sets mode: +o antomati_ |
19:26
|
|
RichardG_ has joined #archiveteam-bs |
19:28
|
|
godane has quit IRC (Read error: Operation timed out) |
19:29
|
|
godane has joined #archiveteam-bs |
19:29
|
|
antomatic has quit IRC (Read error: Operation timed out) |
19:29
|
|
Igloo has quit IRC (Read error: Operation timed out) |
19:29
|
|
whopper has quit IRC (Read error: Operation timed out) |
19:29
|
|
Igloo has joined #archiveteam-bs |
19:31
|
|
RichardG has quit IRC (Read error: Operation timed out) |
19:33
|
|
Whopper has joined #archiveteam-bs |
20:20
|
|
nickname_ has quit IRC (Read error: Operation timed out) |
20:35
|
|
anjacks0n has joined #archiveteam-bs |
20:49
|
|
schbirid has quit IRC (Quit: Leaving) |
21:04
|
|
ris has joined #archiveteam-bs |
21:36
|
|
metalcamp has quit IRC (Quit: Bye) |
21:44
|
|
ndiddy has joined #archiveteam-bs |
21:53
|
xmc |
ticalc.org turned 20 today and we're not closing the site http://www.ticalc.org/archives/news/articles/14/148/148917.html |
21:53
|
|
Simpbrain has quit IRC (Read error: Connection reset by peer) |
21:53
|
|
anjacks0n has quit IRC (anjacks0n) |
22:05
|
|
Sue_ has quit IRC (Ping timeout: 260 seconds) |
22:27
|
JW_work |
we should probably drop it in archivebot anyway :-P |
22:28
|
JW_work |
(but likely not with high priority) |
22:28
|
DoomTay |
And to think two jobs "failed" on me recently |
22:39
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
22:43
|
|
dashcloud has joined #archiveteam-bs |
23:21
|
|
tomwsmf-a has quit IRC (Read error: Operation timed out) |
23:35
|
|
Whopper has quit IRC (Read error: Operation timed out) |
23:40
|
|
tomwsmf-a has joined #archiveteam-bs |
23:57
|
|
Sue_ has joined #archiveteam-bs |