#archiveteam 2013-12-21,Sat


Time Nickname Message
00:10 🔗 NK_ Asparagir: what about rtmpdump? 99% of the streaming content is rtmp
00:10 🔗 NK_ you could try livestreamer, based on rtmpdump, it could just work out of the box
00:12 🔗 Asparagir Oh wow, livestreamer looks perfect. Thanks! http://livestreamer.tanuki.se/en/latest/
00:13 🔗 NK_ Asparagir: yeah, it can work great, but it's luck
00:13 🔗 NK_ if it doesn't, just use rtmpdump, it's included within the livestreamer release
00:27 🔗 Nemo_bis youtube-dl works
00:27 🔗 Nemo_bis iirc I used it to download some stuff
00:28 🔗 xmc youtube-dl works for so many things
00:28 🔗 xmc it's like a sonic screwdriver for web video
00:29 🔗 Nemo_bis yep, I think they support several hundreds websites by now
00:29 🔗 xmc geez
00:31 🔗 NK_ Nemo_bis: you can watch live streams with youtube-dl?
00:33 🔗 Nemo_bis the -dl part of it suggests not
00:33 🔗 Nemo_bis but who knows
00:44 🔗 Asparagir Livestreamer seems to be doing the trick. Really easy to install. Currently grabbing stream from a local Utah station.
00:44 🔗 Asparagir So, thanks!
02:08 🔗 Hadouken is there an archive of shoutcast.com and playlists?
03:45 🔗 Sum1 With the Archivebot how long is it before the upload to IA starts?
03:46 🔗 joepie91 Sum1: they are uploaded in batches
03:46 🔗 joepie91 I don't think there's a fixed interval for that
03:46 🔗 joepie91 afaik it's just "when the server is full"
03:47 🔗 Sum1 Ah, thanks.
03:58 🔗 kyan other band that bothers me is Darkside of Innocence, who post unfinished mixes of their tracks on soundcloud for a couple days and then pull them down again
03:58 🔗 kyan I love the band
03:58 🔗 kyan but not that
03:58 🔗 kyan Ouch. That was supposed to be in -bs
03:58 🔗 kyan sorry
04:44 🔗 DFJustin https://twitter.com/textfiles/status/414242871261548544
07:23 🔗 joepie91 https://dnscensus2013.neocities.org/
08:03 🔗 yipdw DFJustin: done
08:05 🔗 ivan` http://www.poynter.org/latest-news/top-stories/234307/singaporean-government-bureaucracy-effectively-closes-news-site/
12:04 🔗 arkiver winamp should be closed now...
12:04 🔗 arkiver but still online
12:04 🔗 arkiver :D
12:05 🔗 arkiver archiving everything as much as I can
12:05 🔗 arkiver everything is probably already downloaded
12:05 🔗 arkiver but just doing a second one to be sure everything is there 100%
12:14 🔗 joepie91 http://blogs.smithsonianmag.com/science/2013/12/the-vast-majority-of-raw-data-from-old-scientific-studies-may-now-be-missing/
13:45 🔗 tephra joepie91: I have been thinking of going after data for some time, seems like a good idea to start now. I have access to a lot of resources (papers, databases, etc.) via my university
13:46 🔗 joepie91 tephra: I assume those are normally restricted-access documents?
13:47 🔗 joepie91 because you'd probably want to look into pdfparanoia
13:51 🔗 tephra joepie91: well yes like most say no automated downloading and restrictions on sharing material especially on papers
13:51 🔗 joepie91 right, definitely pdfparanoia (and general carefulness) then
13:52 🔗 joepie91 (pdfparanoia == watermark stripper)
13:53 🔗 tephra nice, downloaded.
14:01 🔗 joepie91 tephra: if you need help with automation of this, let me know :)
14:06 🔗 tephra joepie91: thanks! I think automation of paper downloading would be a bad idea since it could get me kicked out of school but I'm making a list of databases that would be good to mirror. Will start tomorrow when
14:49 🔗 chfoo a gentle reminder: #shipwretched . ~5 days remaining.
14:51 🔗 Nemo_bis oh, it only needs to get 46 times faster to complete
14:54 🔗 arkiver everyone!!
14:54 🔗 arkiver http://www.scirus.com/
14:54 🔗 arkiver new website dying:
14:54 🔗 arkiver one of the biggest scientific search engines
14:54 🔗 arkiver but
14:55 🔗 arkiver I think we can't like "download" search engines right?
14:55 🔗 joepie91 arkiver: this might be a good opportunity to do some diplomatic negotiations about them uploading their dataset to IA
14:56 🔗 joepie91 :)
14:56 🔗 arkiver hmm
14:56 🔗 arkiver shall I try to mail them?
14:56 🔗 joepie91 oh jesus fuck
14:56 🔗 joepie91 elsevier
14:56 🔗 joepie91 of fucking course
14:56 🔗 arkiver elsevier?
14:56 🔗 joepie91 arkiver: please do
14:56 🔗 joepie91 I don't think I can type a full e-mail to Elsevier
14:56 🔗 joepie91 without ranting at them
14:56 🔗 joepie91 Elsevier owns that site
14:57 🔗 arkiver ah I see
14:57 🔗 arkiver "Elsevier B.V."
14:57 🔗 arkiver but maybe it would be better to ask if yipdw or sketchcow can send an email
14:57 🔗 arkiver since I'm not really into this a long time
14:57 🔗 joepie91 perhaps
14:57 🔗 arkiver so there are some things I don't know yet
14:57 🔗 joepie91 paging SketchCow, paging SketchCow
14:58 🔗 arkiver and I don't know exactly what the IA can handle and not handle
14:58 🔗 arkiver what do you mean?
14:58 🔗 arkiver paging?
15:05 🔗 joepie91 arkiver: just highlighting him and pointing out that it's important :)
15:05 🔗 Ymgve arkiver: http://en.wikipedia.org/wiki/Public_address
15:06 🔗 arkiver joepie91: ah, I see... :D
15:07 🔗 Ymgve in the old days before cell phones people in for example a hospital would say "Paging Dr. XXX, please call number yyy" over the loudspeaker system
15:07 🔗 Ymgve so no matter where XXX was, he got the message and could call whoever tried to contact him
15:08 🔗 arkiver Ymgve: ah, haha, I get it now
15:08 🔗 arkiver thanks
15:18 🔗 arkiver does someone know if it is possible to download a website for which you need an account?
15:22 🔗 joepie91 arkiver: yes.
15:22 🔗 joepie91 you will need to export your cookies
15:22 🔗 joepie91 and import them into wget/httrack/whatever
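The export-and-import-cookies approach joepie91 describes can be sketched with Python's standard library. This is a minimal sketch, assuming a hypothetical domain `forum.example.com` and cookie values; a real run would load a `cookies.txt` actually exported from the browser.

```python
import http.cookiejar
import os
import tempfile
import urllib.request

# Hypothetical cookies.txt in the Netscape export format that browser
# extensions produce; a real run would use the file exported from the browser.
cookie_file = os.path.join(tempfile.mkdtemp(), "cookies.txt")
with open(cookie_file, "w") as f:
    f.write("# Netscape HTTP Cookie File\n")
    f.write("forum.example.com\tFALSE\t/\tFALSE\t2147483647\tsession\tabc123\n")

jar = http.cookiejar.MozillaCookieJar(cookie_file)
jar.load()  # parse the exported cookies

# Requests made through this opener now carry the session cookie,
# the same way wget --load-cookies cookies.txt would send it.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
print([c.name for c in jar])  # → ['session']
```

The wget equivalent would be along the lines of `wget --load-cookies cookies.txt --mirror --warc-file=site http://forum.example.com/`.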
15:27 🔗 Nemo_bis yay, this time I managed to follow CameronD's suggestion, with some changes https://github.com/ArchiveTeam/wretch-grab/pull/2
15:31 🔗 arkiver joepie91: hmm, I will take a look if that is possible with heritrix
15:31 🔗 joepie91 arkiver: I feel like you're the one user of heritrix besides IA
15:31 🔗 joepie91 heh
15:31 🔗 arkiver but would it be legal to download forum with also the account-only parts and then upload it to the archive?
15:33 🔗 arkiver joepie91: yeah, sometimes I have that feeling too... :P
15:33 🔗 joepie91 <arkiver>but would it be legal to download forum with also the account-only parts and then upload it to the archive?
15:33 🔗 joepie91 don't worry about that
15:33 🔗 joepie91 archive first, ask questions later
15:33 🔗 joepie91 :)
15:34 🔗 arkiver I don't really get the point in using wget while there is also heritrix which is made to create warc's
15:34 🔗 joepie91 arkiver: wget setup time, 12 seconds (apt-get install wget)
15:34 🔗 joepie91 heritrix setup time... how many days? :P
15:34 🔗 arkiver ??
15:34 🔗 arkiver with setup time you mean get it running?
15:34 🔗 joepie91 yes
15:34 🔗 arkiver lol
15:34 🔗 arkiver less than 12 seconds
15:35 🔗 arkiver if you are running windows (like me):
15:35 🔗 arkiver create "run.bat" in bin folder
15:35 🔗 arkiver add "heritrix -a admin:admin"
15:35 🔗 arkiver in that bat file
15:35 🔗 arkiver click it
15:35 🔗 arkiver done!
15:36 🔗 arkiver that's it
15:38 🔗 arkiver and it's working very well
15:39 🔗 arkiver if internet connection is out
15:39 🔗 arkiver it pauses the download for 15 minutes
15:39 🔗 arkiver and then tries again 30 times
15:39 🔗 arkiver so I don't have to worry for my internet connection
15:39 🔗 arkiver and it's running way faster, I think, than other programs
15:39 🔗 arkiver also
15:39 🔗 arkiver it unpacks swf files
15:39 🔗 arkiver and checks them for links
16:31 🔗 joepie91 huh, really
16:31 🔗 joepie91 that's actually kinda cool
16:43 🔗 dovahkiin so can anyone explain to me what the 4 download/upload speeds in the bottom left of the 8001 localhost page exactly mean?
16:44 🔗 dovahkiin bottom speeds = current and top speeds = total?
16:49 🔗 antomatic green numbers are data downloaded and speed downloaded
16:49 🔗 antomatic white numbers are data UPLOADED and speed uploaded
16:50 🔗 antomatic number at the top of the chart is max upload/download speed within that chart
16:50 🔗 antomatic (which is why it goes up and down as the chart resizes)
16:52 🔗 dovahkiin ah thanks!
17:53 🔗 ZoeB hey hey ^.^
18:17 🔗 SketchCow Hi.
18:19 🔗 turnip Hello. Maybe
19:02 🔗 ZoeB Whoever uploaded substantial amounts of USENET to the Internet Archive, thanks! :D
19:03 🔗 dashcloud ZoeB: you should promote your USENET tool here- other folks might like to use it
19:03 🔗 ZoeB Oh, the https://github.com/ZoeB/arcmesg one?
19:03 🔗 dashcloud I think so
19:04 🔗 ZoeB Much like how Git works behind the scenes, you can throw tonnes of messages at it, and it sorts them neatly into a directory with filenames that are SHA1s of the unique message ID
19:05 🔗 ZoeB My dream here is to somehow implement the Message-ID part of the URL spec, so you can link to the message ID and know where you can look it up with reasonable certainty that it'll be there. :)
19:05 🔗 ZoeB But I'm not *that* good a programmer, so I'm certainly open to features, bug fixes, etc
19:07 🔗 ZoeB But you can give it POP3 details, NNTP details, or now a single-message-per-file list of local files (useful in combination with Git, which can split an mbox into one file per message) and it'll import all of the messages from all these places into a big Katamari-style directory
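The git-style content addressing ZoeB describes can be sketched in a few lines. This is a sketch of the general idea only: the two-character directory split and the exact path layout are assumptions, and arcmesg's actual naming may differ.

```python
import hashlib

def message_path(message_id: str) -> str:
    # Content-address a message by the SHA-1 of its unique Message-ID,
    # split git-style into a two-character directory plus filename.
    # (arcmesg's real layout may differ; this shows the principle.)
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    return f"{digest[:2]}/{digest[2:]}"

# The same Message-ID always maps to the same path, so re-importing a
# message already in the stash is a harmless overwrite, not a duplicate.
print(message_path("<5946@unc.UUCP>"))
```

This is also what makes a Message-ID-based lookup URL plausible: given only the ID, anyone can recompute where the message must live.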
19:09 🔗 ZoeB Anyway, anyone's free to use it. I've got several gigabytes' worth of messages in my personal stash, mostly from USENET, but also a few public mailing lists (mostly GNU, plus a few more obscure ones). The existing USENET archives on the Internet Archive plus Gmane make it somewhat redundant, but I'm all in favour of redundancy where data preservation is concerned. :)
19:10 🔗 ZoeB It's a fun use of a Raspberry Pi IMO, anyway. :D
19:11 🔗 dashcloud I think someone here created this site: http://olduse.net/ which shows you what you would've seen many years ago today on Usenet
19:12 🔗 ZoeB Yes, that one's neat too!
19:13 🔗 ZoeB I started off just archiving a few synthesiser-based mailing lists over at http://analogue.bytenoise.co.uk but I kinda increased my scope a bit >.>
19:14 🔗 ZoeB Eventually I hope to put this on a website so that all the messages are searchable and browseable, and to put up a torrent of the message tarball, but those are a while off yet. I'm hoping to amass a few TBs' worth of messages first.
19:14 🔗 chavezery you could use doxbin's source
19:14 🔗 chavezery iirc they have a search bar thinger
19:15 🔗 ZoeB Doxbin?
19:16 🔗 chavezery it's exactly what it sounds like
19:16 🔗 ZoeB Heh, OK
19:16 🔗 chavezery i'd provide a link but it's easy to google and i'd rather not get b& ::P
19:16 🔗 chavezery *:P
19:17 🔗 ZoeB Searching does get a bit tricky when you get into the several-gigabytes'-worth-of-data territory, yeah.
19:17 🔗 ZoeB Although a nice thing about messages is that they're plaintext, but still, grepping them would take a while
19:17 🔗 ZoeB BIAB, helping Nina install furniture...
19:32 🔗 ZoeB back
19:35 🔗 ZoeB This olduse.net is such a time suck... reminds me of when I was using Pine via a 7" black and white monitor. :)
19:36 🔗 chavezery downloading rationalwiki
19:36 🔗 chavezery this is gonna take a while :|
19:37 🔗 ZoeB Ooh, good one
19:46 🔗 ZoeB I just got a few transsex and intersex specific sites (research and support groups), and some obscure operating system sites. Not that I have anything to actually do with these warcs, mind. :) Still, it'll be nice to preserve more obscure parts of what this period in history's like.
19:53 🔗 ZoeB OMG, KA9Q's site is still up! :gets:
19:58 🔗 ZoeB Back when I worked for an ISP, I used to fear getting a tech support call for KA9Q
19:59 🔗 closure ZoeB: glad you like my olduse.net :)
20:00 🔗 ZoeB Ah, it's yours? Yeah, it's really neat!
20:00 🔗 dashcloud ZoeB: not sure if you know about it, but there's archivebot in #archivebot that will automatically archive smaller sites and pages if you have ops (for handling small jobs and making sure things can get archived quickly, in a standard fashion)
20:01 🔗 ZoeB Ooh, I heard about archivebot in a speech of Jason Scott's I was just watching. Neat, thanks!
20:01 🔗 ZoeB What is the standard fashion, incidentally? It'd be nice to use that when I'm grabbing stuff too, I'm guessing
20:01 🔗 closure still looking for usenet archives from the 90's
20:02 🔗 ZoeB closure: So http://article.olduse.net/5946%40unc.UUCP was doing the rounds, and I showed it to my mother, and then we got reminiscing about her old Philips word processor with that kind of screen. So thank you for that. :)
20:02 🔗 ZoeB I'd love to get my hands on the Walnut Creek USENET CD-ROMs
20:02 🔗 yipdw ZoeB: standard fashion?
20:02 🔗 yipdw anyone can use it
20:03 🔗 * closure too
20:03 🔗 ZoeB yipdw: dashcloud said something about a standard fashion. I assumed he meant something like standard wget arguments, filenames to save to, etc?
20:03 🔗 yipdw ZoeB: oh
20:03 🔗 ZoeB well, I presume he. I should know better too. :/
20:03 🔗 yipdw archivebot automates Archive Team standard practices, i.e. shoving stuff into a gzipped WARC
20:04 🔗 ZoeB Ah, cool
20:04 🔗 yipdw it has limitations
20:04 🔗 ZoeB I use this personally: "wget -mbc --warc-file=www.example.com --warc-cdx --wait=5 http://www.example.com -o www.example.com.log" But I'm open to suggestions, it'd be nice to conform to what everyone else is doing for consistency's sake, to make other people's lives easier
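Commands like the one above produce the gzipped WARC files yipdw mentions. The record format itself can be illustrated with a hand-rolled record using only the standard library — a sketch only: the URI and body are placeholders, and a real crawl (wget's `--warc-file` or a proper WARC library) also writes warcinfo, request, and response records.

```python
import datetime
import gzip
import os
import tempfile
import uuid

# Minimal hand-rolled WARC "resource" record, gzipped. The target URI
# and body are placeholders for illustration.
body = b"<html><body>hello</body></html>"
headers = (
    "WARC/1.0\r\n"
    "WARC-Type: resource\r\n"
    "WARC-Target-URI: http://www.example.com/\r\n"
    f"WARC-Date: {datetime.datetime.now(datetime.timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')}\r\n"
    f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>\r\n"
    "Content-Type: text/html\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
).encode("ascii")

path = os.path.join(tempfile.mkdtemp(), "example.warc.gz")
with gzip.open(path, "wb") as f:
    f.write(headers + body + b"\r\n\r\n")  # each record ends with two CRLFs

with gzip.open(path, "rb") as f:
    data = f.read()
print(data.startswith(b"WARC/1.0"))  # → True
```

Because gzip members concatenate, tools can append one compressed record after another to the same `.warc.gz`, which is why the format suits long-running grabs.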
20:04 🔗 dashcloud here's the github for it: https://github.com/ArchiveTeam/ArchiveBot
20:04 🔗 ZoeB biab, you groovy people
20:05 🔗 yipdw two big ones: (1) it doesn't yet get Flash videos; (2) Javascript-heavy sites may or may not work, depending on how AJAXed up it is; and (3) there are capacity limits
20:05 🔗 yipdw and I just realized I put three for two, oh well
20:05 🔗 yipdw but that said it seems to do pretty well based on what I've seen coming in
20:06 🔗 yipdw I hear Heritrix has SWF-specific filters; maybe it'd make sense to integrate that
20:08 🔗 ZoeB closure: how much she liked browsing that site: https://twitter.com/thomasbeth/status/407593998195257346
20:09 🔗 ZoeB So thank you. :)
20:28 🔗 Smiley Anyone imported the VM image into virtualbox on the commandline?
20:53 🔗 touya hm clanbase shut down. not sure if anyone has a backup
22:24 🔗 ZoeB "Don't look down, never look away; ArchiveBot's like the wind." Nice. :)
