#archiveteam 2013-12-21,Sat


Time	Nickname	Message
00:10 ^🔗	NK_	Asparagir: what about rtmpdump? 99% of the streaming content is rtmp
00:10 ^🔗	NK_	you could try livestreamer, based on rtmpdump, it could just work out of the box
00:12 ^🔗	Asparagir	Oh wow, livestreamer looks perfect. Thanks! http://livestreamer.tanuki.se/en/latest/
00:13 ^🔗	NK_	Asparagir: yeah, it can work great, but it's luck
00:13 ^🔗	NK_	if it doesn't, just use rtmpdump, it's included within the livestreamer release
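NK_'s advice (try livestreamer first, fall back to the rtmpdump bundled with it) can be scripted. The following is a minimal Python sketch. It assumes both tools are on the PATH and that you have already found the stream's rtmp:// URL yourself for the fallback. The helper names and example URLs are hypothetical. The flags shown (livestreamer's -o; rtmpdump's -r, -v for live streams, and -o) are the commonly used ones, so check each tool's --help for your version.

```python
import subprocess

def livestreamer_cmd(page_url: str, output: str, quality: str = "best") -> list[str]:
    # livestreamer [options] URL STREAM; -o writes the stream to a file
    return ["livestreamer", "-o", output, page_url, quality]

def rtmpdump_cmd(rtmp_url: str, output: str) -> list[str]:
    # rtmpdump: -r is the rtmp:// URL, -v marks it as a live stream, -o is the output file
    return ["rtmpdump", "-v", "-r", rtmp_url, "-o", output]

def grab_stream(page_url: str, rtmp_url: str, output: str, run=subprocess.run) -> list[str]:
    """Try livestreamer first; if it exits non-zero, fall back to plain rtmpdump.
    Returns the command that was finally used."""
    first = livestreamer_cmd(page_url, output)
    if run(first).returncode == 0:
        return first
    fallback = rtmpdump_cmd(rtmp_url, output)
    run(fallback, check=True)  # raises CalledProcessError if rtmpdump fails too
    return fallback
```

The `run` parameter exists only so the fallback logic can be exercised without either tool installed.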
00:27 ^🔗	Nemo_bis	youtube-dl works
00:27 ^🔗	Nemo_bis	iirc I used it to download some stuff
00:28 ^🔗	xmc	youtube-dl works for so many things
00:28 ^🔗	xmc	it's like a sonic screwdriver for web video
00:29 ^🔗	Nemo_bis	yep, I think they support several hundred websites by now
00:29 ^🔗	xmc	geez
00:31 ^🔗	NK_	Nemo_bis: you can watch live streams with youtube-dl?
00:33 ^🔗	Nemo_bis	the -dl part of it suggests not
00:33 ^🔗	Nemo_bis	but who knows
00:44 ^🔗	Asparagir	Livestreamer seems to be doing the trick. Really easy to install. Currently grabbing stream from a local Utah station.
00:44 ^🔗	Asparagir	So, thanks!
02:08 ^🔗	Hadouken	is there an archive of shoutcast.com and playlists?
03:45 ^🔗	Sum1	With the Archivebot how long is it before the upload to IA starts?
03:46 ^🔗	joepie91	Sum1: they are uploaded in batches
03:46 ^🔗	joepie91	I don't think there's a fixed interval for that
03:46 ^🔗	joepie91	afaik it's just "when the server is full"
03:47 ^🔗	Sum1	Ah, thanks.
03:58 ^🔗	kyan	other band that bothers me is Darkside of Innocence, who post unfinished mixes of their tracks on soundcloud for a couple days and then pull them down again
03:58 ^🔗	kyan	I love the band
03:58 ^🔗	kyan	but not that
03:58 ^🔗	kyan	Ouch. That was supposed to be in -bs
03:58 ^🔗	kyan	sorry
04:44 ^🔗	DFJustin	https://twitter.com/textfiles/status/414242871261548544
07:23 ^🔗	joepie91	https://dnscensus2013.neocities.org/
08:03 ^🔗	yipdw	DFJustin: done
08:05 ^🔗	ivan`	http://www.poynter.org/latest-news/top-stories/234307/singaporean-government-bureaucracy-effectively-closes-news-site/
12:04 ^🔗	arkiver	winamp should be closed now...
12:04 ^🔗	arkiver	but still online
12:04 ^🔗	arkiver	:D
12:05 ^🔗	arkiver	archiving everything as much as I can
12:05 ^🔗	arkiver	everything is probably already downloaded
12:05 ^🔗	arkiver	but just doing a second one to be sure everything is there 100%
12:14 ^🔗	joepie91	http://blogs.smithsonianmag.com/science/2013/12/the-vast-majority-of-raw-data-from-old-scientific-studies-may-now-be-missing/
13:45 ^🔗	tephra	joepie91: I have been thinking of going after data for some time, seems like a good idea to start now. I have access to a lot of resources (papers, databases, etc.) via my university
13:46 ^🔗	joepie91	tephra: I assume those are normally restricted-access documents?
13:47 ^🔗	joepie91	because you'd probably want to look into pdfparanoia
13:51 ^🔗	tephra	joepie91: well yes like most say no automated downloading and restrictions on sharing material especially on papers
13:51 ^🔗	joepie91	right, definitely pdfparanoia (and general carefulness) then
13:52 ^🔗	joepie91	(pdfparanoia == watermark stripper)
13:53 ^🔗	tephra	nice, downloaded.
14:01 ^🔗	joepie91	tephra: if you need help with automation of this, let me know :)
14:06 ^🔗	tephra	joepie91: thanks! I think automation of paper downloading would be a bad idea since it could get me kicked out of school but I'm making a list of databases that would be good to mirror. Will start tomorrow when
14:49 ^🔗	chfoo	a gentle reminder: #shipwretched . ~5 days remaining.
14:51 ^🔗	Nemo_bis	oh, it only needs to get 46 times faster to complete
14:54 ^🔗	arkiver	everyone!!
14:54 ^🔗	arkiver	http://www.scirus.com/
14:54 ^🔗	arkiver	new website dying:
14:54 ^🔗	arkiver	one of the biggest scientific search engines
14:54 ^🔗	arkiver	but
14:55 ^🔗	arkiver	I think we can't like "download" search engines right?
14:55 ^🔗	joepie91	arkiver: this might be a good opportunity to do some diplomatic negotiations about them uploading their dataset to IA
14:56 ^🔗	joepie91	:)
14:56 ^🔗	arkiver	hmm
14:56 ^🔗	arkiver	shall I try to mail them?
14:56 ^🔗	joepie91	oh jesus fuck
14:56 ^🔗	joepie91	elsevier
14:56 ^🔗	joepie91	of fucking course
14:56 ^🔗	arkiver	elsevier?
14:56 ^🔗	joepie91	arkiver: please do
14:56 ^🔗	joepie91	I don't think I can type a full e-mail to Elsevier
14:56 ^🔗	joepie91	without ranting at them
14:56 ^🔗	joepie91	Elsevier owns that site
14:57 ^🔗	arkiver	ah I see
14:57 ^🔗	arkiver	"Elsevier B.V."
14:57 ^🔗	arkiver	but maybe it would be better to ask if yipdw or sketchcow can send an email
15:57 ^🔗	arkiver	since I haven't been into this for very long
14:57 ^🔗	joepie91	perhaps
14:57 ^🔗	arkiver	so there are some things I don't know yet
14:57 ^🔗	joepie91	paging SketchCow, paging SketchCow
14:58 ^🔗	arkiver	and I don't know exactly what the IA can handle and not handle
14:58 ^🔗	arkiver	what do you mean?
14:58 ^🔗	arkiver	paging?
15:05 ^🔗	joepie91	arkiver: just highlighting him and pointing out that it's important :)
15:05 ^🔗	Ymgve	arkiver: http://en.wikipedia.org/wiki/Public_address
15:06 ^🔗	arkiver	joepie91: ah, I see... :D
15:07 ^🔗	Ymgve	in the old days before cell phones people in for example a hospital would say "Paging Dr. XXX, please call number yyy" over the loudspeaker system
15:07 ^🔗	Ymgve	so no matter where XXX was, he got the message and could call whoever tried to contact him
15:08 ^🔗	arkiver	Ymgve: ah, haha, I get it now
15:08 ^🔗	arkiver	thanks
15:18 ^🔗	arkiver	does someone know if it is possible to download a website for which you need an account?
15:22 ^🔗	joepie91	arkiver: yes.
15:22 ^🔗	joepie91	you will need to export your cookies
15:22 ^🔗	joepie91	and import them into wget/httrack/whatever
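joepie91's cookie recipe relies on wget reading the Netscape "cookies.txt" format that most browser cookie-export tools produce, via `--load-cookies`. The following is a minimal Python sketch that writes such a file, sanity-checks it with the standard library's parser, and builds the wget command. The forum domain, cookie name and value, and WARC file name are all hypothetical placeholders for whatever your own export contains.

```python
import http.cookiejar
import os
import tempfile

# Hypothetical cookie for a hypothetical forum, in the Netscape cookies.txt format:
# domain, include-subdomains, path, secure, expiry (Unix time), name, value (tab-separated)
COOKIES_TXT = (
    "# Netscape HTTP Cookie File\n"
    ".forum.example.com\tTRUE\t/\tFALSE\t4102444800\tsessionid\tabc123\n"
)

def write_cookie_file(text: str, directory: str) -> str:
    path = os.path.join(directory, "cookies.txt")
    with open(path, "w") as f:
        f.write(text)
    return path

def cookie_names(path: str) -> list[str]:
    # Sanity check: parse the file with the stdlib's Netscape-format reader
    # (raises http.cookiejar.LoadError if the file is malformed).
    jar = http.cookiejar.MozillaCookieJar(path)
    jar.load(ignore_discard=True, ignore_expires=True)
    return [c.name for c in jar]

def wget_with_cookies(cookie_path: str, url: str) -> list[str]:
    # --load-cookies makes wget send the logged-in session's cookies
    return ["wget", "--load-cookies", cookie_path, "--mirror", "--warc-file=forum", url]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = write_cookie_file(COOKIES_TXT, d)
        print(cookie_names(path))
        print(" ".join(wget_with_cookies(path, "http://forum.example.com/")))
```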
15:27 ^🔗	Nemo_bis	yay, this time I managed to follow CameronD's suggestion, with some changes https://github.com/ArchiveTeam/wretch-grab/pull/2
15:31 ^🔗	arkiver	joepie91: hmm, I will take a look if that is possible with heritrix
15:31 ^🔗	joepie91	arkiver: I feel like you're the one user of heritrix besides IA
15:31 ^🔗	joepie91	heh
15:31 ^🔗	arkiver	but would it be legal to download forum with also the account-only parts and then upload it to the archive?
15:33 ^🔗	arkiver	joepie91: yeah, sometimes I have that feeling too... :P
15:33 ^🔗	joepie91	<arkiver>but would it be legal to download forum with also the account-only parts and then upload it to the archive?
15:33 ^🔗	joepie91	don't worry about that
15:33 ^🔗	joepie91	archive first, ask questions later
15:33 ^🔗	joepie91	:)
15:34 ^🔗	arkiver	I don't really get the point in using wget while there is also heritrix which is made to create warc's
15:34 ^🔗	joepie91	arkiver: wget setup time, 12 seconds (apt-get install wget)
15:34 ^🔗	joepie91	heritrix setup time... how many days? :P
15:34 ^🔗	arkiver	??
15:34 ^🔗	arkiver	with setup time you mean getting it running?
15:34 ^🔗	joepie91	yes
15:34 ^🔗	arkiver	lol
15:34 ^🔗	arkiver	less than 12 seconds
15:35 ^🔗	arkiver	if you are running windows (like me):
15:35 ^🔗	arkiver	create "run.bat" in bin folder
15:35 ^🔗	arkiver	add "heritrix -a admin:admin"
15:35 ^🔗	arkiver	in that bat file
15:35 ^🔗	arkiver	click it
15:35 ^🔗	arkiver	done!
15:36 ^🔗	arkiver	that's it
15:38 ^🔗	arkiver	and it's working very good
15:39 ^🔗	arkiver	if internet connection is out
15:39 ^🔗	arkiver	it pauses the download for 15 minutes
15:39 ^🔗	arkiver	and then tries again 30 times
15:39 ^🔗	arkiver	so I don't have to worry for my internet connection
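The retry behaviour arkiver describes (wait 15 minutes after a connection failure, then retry up to 30 times) is easy to reproduce around any fetcher. The following is a generic Python sketch of that policy, not Heritrix's code. The numbers come from the log. `fetch_with_retries` is a hypothetical name, and treating only `ConnectionError`/`TimeoutError` as "the connection is out" is an assumption.

```python
import time
from typing import Callable

RETRY_DELAY_SECONDS = 15 * 60  # "pauses the download for 15 minutes"
MAX_RETRIES = 30               # "and then tries again 30 times"

def fetch_with_retries(fetch: Callable[[], bytes],
                       sleep: Callable[[float], None] = time.sleep,
                       delay: float = RETRY_DELAY_SECONDS,
                       max_retries: int = MAX_RETRIES) -> bytes:
    """Call fetch(); after each connection failure wait `delay` seconds and try
    again. Re-raises the last error once max_retries retries are used up."""
    attempt = 0
    while True:
        try:
            return fetch()
        except (ConnectionError, TimeoutError):  # assumed "network is down" errors
            if attempt >= max_retries:
                raise
            attempt += 1
            sleep(delay)
```

Injecting `sleep` keeps the policy testable without actually waiting 15 minutes.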
15:39 ^🔗	arkiver	and it's running way faster i think than other programs
15:39 ^🔗	arkiver	also
15:39 ^🔗	arkiver	it unpacks swf files
15:39 ^🔗	arkiver	and checks them for links
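To illustrate what "unpacks swf files and checks them for links" involves, here is a much cruder Python sketch than whatever Heritrix's SWF extractor actually does. A SWF file starts with a 3-byte signature and an 8-byte header. "FWS" means uncompressed and "CWS" means the rest is zlib-compressed. This sketch inflates the body and regex-scans it for absolute URLs instead of parsing SWF tags. LZMA-compressed "ZWS" files are deliberately not handled, and relative links will be missed.

```python
import re
import zlib

# Printable ASCII minus quotes and angle brackets; SWF strings are NUL-terminated,
# so a match stops at the terminator.
URL_RE = re.compile(rb"https?://[^\x00-\x20\x7f-\xff\"'<>]+")

def swf_body(data: bytes) -> bytes:
    """Return the SWF contents after the 8-byte header, inflating zlib ('CWS') files."""
    signature = data[:3]
    if signature == b"FWS":      # uncompressed
        return data[8:]
    if signature == b"CWS":      # zlib-compressed (SWF 6 and later)
        return zlib.decompress(data[8:])
    raise ValueError(f"unsupported SWF signature {signature!r}")  # e.g. LZMA 'ZWS'

def links_in_swf(data: bytes) -> list[str]:
    # Naive: look for absolute URLs anywhere in the body instead of parsing SWF tags.
    return sorted({m.decode("ascii") for m in URL_RE.findall(swf_body(data))})
```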
16:31 ^🔗	joepie91	huh, really
16:31 ^🔗	joepie91	that's actually kinda cool
16:43 ^🔗	dovahkiin	so can anyone explain to me what the 4 download/upload speeds in the left bottom of the 8001 localhost page exactly mean?
16:44 ^🔗	dovahkiin	bottom speeds = current and top speeds = total?
16:49 ^🔗	antomatic	green numbers are data downloaded and speed downloaded
16:49 ^🔗	antomatic	white numbers are data UPLOADED and speed uploaded
16:50 ^🔗	antomatic	number at the top of the chart is max upload/download speed within that chart
16:50 ^🔗	antomatic	(which is why it goes up and down as the chart resizes)
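antomatic's point about the chart's top number (it is the maximum speed within the samples currently on screen, so it rises and falls as old samples scroll off) can be modelled as a sliding-window maximum. The following is a hypothetical Python model of that labelling rule, not the warrior page's actual code. The class name and window size are assumptions.

```python
from collections import deque

class SpeedChart:
    """Hypothetical model of a scrolling speed chart: it keeps the last
    `window` samples and labels its top edge with the largest one."""

    def __init__(self, window: int):
        self.samples: deque[float] = deque(maxlen=window)  # old samples fall off the left

    def add(self, speed: float) -> None:
        self.samples.append(speed)

    def top_label(self) -> float:
        return max(self.samples, default=0.0)
```

For example, with a 3-sample window, after speeds 100, 500, 200, 50, 60 the label drops from 500 to 200 once the 500 spike scrolls out of view.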
16:52 ^🔗	dovahkiin	ah thanks!
17:53 ^🔗	ZoeB	hey hey ^.^
18:17 ^🔗	SketchCow	Hi.
18:19 ^🔗	turnip	Hello. Maybe
19:02 ^🔗	ZoeB	Whoever uploaded substantial amounts of USENET to the Internet Archive, thanks! :D
19:03 ^🔗	dashcloud	ZoeB: you should promote your USENET tool here- other folks might like to use it
19:03 ^🔗	ZoeB	Oh, the https://github.com/ZoeB/arcmesg one?
19:03 ^🔗	dashcloud	I think so
19:04 ^🔗	ZoeB	Much like how Git works behind the scenes, you can throw tonnes of messages at it, and it sorts them neatly into a directory with filenames that are SHA1s of the unique message ID
19:05 ^🔗	ZoeB	My dream here is to somehow implement the Message-ID part of the URL spec, so you can link to the message ID and know where you can look it up with reasonable certainty that it'll be there. :)
19:05 ^🔗	ZoeB	But I'm not that good a programmer, so I'm certainly open to features, bug fixes, etc
19:07 ^🔗	ZoeB	But you can give it POP3 details, NNTP details, or now a single-message-per-file list of local files (useful in combination with Git, which can split an mbox into one file per message) and it'll import all of the messages from all these places into a big Katamari-style directory
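The storage idea ZoeB describes (one file per message, named after the SHA-1 of its unique Message-ID, so the same message arriving from POP3, NNTP, or a split mbox lands in the same place only once) can be sketched in a few lines of Python. This is an illustration of the idea, not arcmesg's code. Whether arcmesg hashes the Message-ID with or without the angle brackets is unknown, so hashing the header value as written is an assumption, and the helper names are hypothetical.

```python
import email
import hashlib
import mailbox
import tempfile
from pathlib import Path
from typing import Optional

def message_path(root: Path, message_id: str) -> Path:
    # Hypothetical layout: file name = SHA-1 hex digest of the Message-ID as it
    # appears in the header (angle brackets included; arcmesg may normalise differently).
    digest = hashlib.sha1(message_id.strip().encode("utf-8")).hexdigest()
    return root / digest

def store(root: Path, raw: bytes) -> Optional[Path]:
    msg = email.message_from_bytes(raw)
    message_id = msg["Message-ID"]
    if message_id is None:
        return None  # nothing to key the file on
    path = message_path(root, message_id)
    if not path.exists():  # the same message from POP3, NNTP and an mbox is stored once
        path.write_bytes(raw)
    return path

def import_mbox(root: Path, mbox_path: str) -> int:
    """Store every message in an mbox file; returns how many had a Message-ID."""
    stored = 0
    for msg in mailbox.mbox(mbox_path):
        if store(root, msg.as_bytes()) is not None:
            stored += 1
    return stored

if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    print(store(root, b"Message-ID: <5946@unc.UUCP>\nSubject: test\n\nhello\n"))
```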
19:09 ^🔗	ZoeB	Anyway, anyone's free to use it. I've got several gigabytes' worth of messages in my personal stash, mostly from USENET, but also a few public mailing lists (mostly GNU, plus a few more obscure ones). The existing USENET archives on the Internet Archive plus Gmane make it somewhat redundant, but I'm all in favour of redundancy where data preservation is concerned. :)
19:10 ^🔗	ZoeB	It's a fun use of a Raspberry Pi IMO, anyway. :D
19:11 ^🔗	dashcloud	I think someone here created this site: http://olduse.net/ which shows you what you would've seen many years ago today on Usenet
19:12 ^🔗	ZoeB	Yes, that one's neat too!
19:13 ^🔗	ZoeB	I started off just archiving a few synthesiser-based mailing lists over at http://analogue.bytenoise.co.uk but I kinda increased my scope a bit >.>
19:14 ^🔗	ZoeB	Eventually I hope to put this on a website so that all the messages are searchable and browseable, and to put up a torrent of the message tarball, but those are a while off yet. I'm hoping to amass a few TBs' worth of messages first.
19:14 ^🔗	chavezery	you could use doxbin's source
19:14 ^🔗	chavezery	iirc they have a search bar thinger
19:15 ^🔗	ZoeB	Doxbin?
19:16 ^🔗	chavezery	it's exactly what it sounds like
19:16 ^🔗	ZoeB	Heh, OK
19:16 ^🔗	chavezery	i'd provide a link but it's easy to google and i'd rather not get b& ::P
19:16 ^🔗	chavezery	*:P
19:17 ^🔗	ZoeB	Searching does get a bit tricky when you get into the several-gigabytes'-worth-of-data territory, yeah.
19:17 ^🔗	ZoeB	Although a nice thing about messages is that they're plaintext, but still, grepping them would take a while
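The usual alternative to grepping gigabytes of plaintext on every search is to build an inverted index once, mapping each word to the messages that contain it, so a query becomes a few set intersections. The following is a toy in-memory Python sketch of that technique. The tokenising rule (lowercase alphanumeric runs) and all names are assumptions. A real archive of this size would want an on-disk index instead.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[a-z0-9]+")

def build_index(messages: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of message IDs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for msg_id, text in messages.items():
        for word in WORD_RE.findall(text.lower()):
            index[word].add(msg_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """IDs of messages containing every word of the query (AND search)."""
    words = WORD_RE.findall(query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```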
19:17 ^🔗	ZoeB	BIAB, helping Nina install furniture...
19:32 ^🔗	ZoeB	back
19:35 ^🔗	ZoeB	This olduse.net is such a time suck... reminds me of when I was using Pine via a 7" black and white monitor. :)
19:36 ^🔗	chavezery	downloading rationalwiki
19:36 ^🔗	chavezery	this is gonna take a while :\|
19:37 ^🔗	ZoeB	Ooh, good one
19:46 ^🔗	ZoeB	I just got a few transsex and intersex specific sites (research and support groups), and some obscure operating system sites. Not that I have anything to actually do with these warcs, mind. :) Still, it'll be nice to preserve more obscure parts of what this period in history's like.
19:53 ^🔗	ZoeB	OMG, KA9Q's site is still up! :gets:
19:58 ^🔗	ZoeB	Back when I worked for an ISP, I used to fear getting a tech support call for KA9Q
19:59 ^🔗	closure	ZoeB: glad you like my olduse.net :)
20:00 ^🔗	ZoeB	Ah, it's yours? Yeah, it's really neat!
20:00 ^🔗	dashcloud	ZoeB: not sure if you know about it, but there's archivebot in #archivebot that will automatically archive smaller sites and pages if you have ops (for handling small jobs and making sure things can get archived quickly, in a standard fashion)
20:01 ^🔗	ZoeB	Ooh, I heard about archivebot in a speech of Jason Scott's I was just watching. Neat, thanks!
20:01 ^🔗	ZoeB	What is the standard fashion, incidentally? It'd be nice to use that when I'm grabbing stuff too, I'm guessing
20:01 ^🔗	closure	still looking for usenet archives from the 90's
20:02 ^🔗	ZoeB	closure: So http://article.olduse.net/5946%40unc.UUCP was doing the rounds, and I showed it to my mother, and then we got reminiscing about her old Philips word processor with that kind of screen. So thank you for that. :)
20:02 ^🔗	ZoeB	I'd love to get my hands on the Walnut Creek USENET CD-ROMs
20:02 ^🔗	yipdw	ZoeB: standard fashion?
20:02 ^🔗	yipdw	anyone can use it
20:03 ^🔗	*	closure too
20:03 ^🔗	ZoeB	yipdw: dashcloud said something about a standard fashion. I assumed he meant something like standard wget arguments, filenames to save to, etc?
20:03 ^🔗	yipdw	ZoeB: oh
20:03 ^🔗	ZoeB	well, I presume he. I should know better too. :/
20:03 ^🔗	yipdw	archivebot automates Archive Team standard practices, i.e. shoving stuff into a gzipped WARC
20:04 ^🔗	ZoeB	Ah, cool
20:04 ^🔗	yipdw	it has limitations
20:04 ^🔗	ZoeB	I use this personally: "wget -mbc --warc-file=www.example.com --warc-cdx --wait=5 http://www.example.com -o www.example.com.log" But I'm open to suggestions, it'd be nice to conform to what everyone else is doing for consistency's sake, to make other people's lives easier
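ZoeB's wget invocation is easy to reuse over a list of sites if it is kept as an argument list. The following is a minimal Python sketch. The function names are hypothetical, and the flags are exactly the ones in her command, with comments on what each does. Because of `-b`, each call returns as soon as wget has backgrounded itself, so every site in the list is fetched concurrently. Dropping `-b` (and keeping `-o`) would make them run one after another.

```python
import subprocess

def wget_warc_args(site: str, wait: int = 5) -> list[str]:
    """ZoeB's command from the log, as an argv list (function name is hypothetical)."""
    return [
        "wget",
        "-m",                   # --mirror: recursive, infinite depth, timestamping
        "-b",                   # detach and run in the background
        "-c",                   # continue partially downloaded files
        f"--warc-file={site}",  # write <site>.warc.gz
        "--warc-cdx",           # also write a <site>.cdx index of the WARC
        f"--wait={wait}",       # pause between requests to go easy on the server
        f"http://{site}",
        "-o", f"{site}.log",    # log here instead of the default wget-log
    ]

def archive_sites(sites: list[str], run=subprocess.run) -> None:
    for site in sites:
        run(wget_warc_args(site), check=True)
```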
20:04 ^🔗	dashcloud	here's the github for it: https://github.com/ArchiveTeam/ArchiveBot
20:04 ^🔗	ZoeB	biab, you groovy people
20:05 ^🔗	yipdw	two big ones: (1) it doesn't yet get Flash videos; (2) Javascript-heavy sites may or may not work, depending on how AJAXed up it is; and (3) there are capacity limits
20:05 ^🔗	yipdw	and I just realized I put three for two, oh well
20:05 ^🔗	yipdw	but that said it seems to do pretty well based on what I've seen coming in
20:06 ^🔗	yipdw	I hear Heritrix has SWF-specific filters; maybe it'd make sense to integrate that
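For a sense of what "shoving stuff into a gzipped WARC" produces, the following is a minimal Python sketch of the WARC/1.0 record layout written with only the standard library. It is not ArchiveBot's code. Each record is a version line, named header fields, a blank line, the payload, and two CRLFs. Each record is gzipped as its own member, the convention that lets tools jump to a single record. Real crawlers write `request`/`response` records capturing the full HTTP exchange, so the simpler `resource` record type here is a deliberate simplification.

```python
import gzip
import io
import uuid
from datetime import datetime, timezone

def warc_resource_record(uri: str, payload: bytes, content_type: str = "text/html") -> bytes:
    """Build one WARC/1.0 'resource' record: header lines, blank line, payload, two CRLFs."""
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "WARC-Date: " + datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        f"WARC-Target-URI: {uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    head = ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8")
    return head + payload + b"\r\n\r\n"

def write_warc_gz(out, records) -> None:
    # Each record becomes its own gzip member, so readers can seek to and
    # decompress any single record without inflating the whole file.
    for record in records:
        out.write(gzip.compress(record))

if __name__ == "__main__":
    buf = io.BytesIO()
    write_warc_gz(buf, [warc_resource_record("http://example.com/", b"<html></html>")])
    print(gzip.decompress(buf.getvalue()).decode())
```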
20:08 ^🔗	ZoeB	closure: how much she liked browsing that site: https://twitter.com/thomasbeth/status/407593998195257346
20:09 ^🔗	ZoeB	So thank you. :)
20:28 ^🔗	Smiley	Anyone imported the VM image into virtualbox on the commandline?
20:53 ^🔗	touya	hm clanbase shut down. not sure if anyone has a backup
22:24 ^🔗	ZoeB	"Don't look down, never look away; ArchiveBot's like the wind." Nice. :)
