#archiveteam-bs 2017-11-01,Wed

Time Nickname Message
00:04 TC01 has quit IRC (Read error: Connection reset by peer)
00:09 TC01 has joined #archiveteam-bs
00:38 bitBaron has quit IRC (Quit: My computer has gone to sleep. ZZZzzz…)
01:04 SilSte has joined #archiveteam-bs
01:29 MRX3 has quit IRC (Quit: Leaving)
01:36 hook54321 If a teacher says something and a student records it, is the recording part of the public domain?
01:42 drumstick has quit IRC (Ping timeout: 255 seconds)
01:43 drumstick has joined #archiveteam-bs
01:54 dashcloud no
01:55 dashcloud everything now is copyrighted, unless you do something specifically to change that
01:56 dashcloud that's what Creative Commons does for non-software things, and what all the open-source licenses do for software
01:56 Somebody2 Copyright in recordings of extemporaneous speech generally belongs to the person who makes the recording, IIRC.
01:57 Somebody2 But recording people without their consent can be illegal, depending on other factors.
01:57 Somebody2 (like what state you are in, whether it was a private conversation, and others)
01:58 dashcloud if you're in person, in a school environment, unless you are specifically requested not to do so, you should be able to record without issue
01:58 Somebody2 And if a speech was written down before it was delivered, whoever wrote it down holds the copyright on it, and audio recordings are derivative works.
01:59 dashcloud hook54321: I get the feeling that none of this really answers the question you had
01:59 Somebody2 It's a *VERY* gray area just how detailed notes have to be to make a recording of a speech a derivative work.
01:59 Somebody2 But yeah, I suspect you had a different question.
02:00 hook54321 It wasn't really a speech, it was this teacher's rant.
02:00 hook54321 https://archive.org/details/BillJohnson
02:47 Somebody2 That certainly sounds extemporaneous, so copyright is likely not a concern.
02:48 Somebody2 But it also seems likely to attract the attention of an irrational and angry person, so I, at least, will be staying far away.
02:50 schbirid2 has quit IRC (Ping timeout: 255 seconds)
02:57 godane i'm splitting the bbc america bowie tape into 2 parts
02:58 godane cause one recording is from 2000 and the other is from 1975
03:03 schbirid2 has joined #archiveteam-bs
03:10 godane so another unlabeled tape has a Cinemax recording of Excalibur
03:10 godane i'm very sure that's on dvd somewhere
03:14 godane anyways turns out there is some sort of live event recorded after it
03:14 godane called Film Independent's Spirit Awards
03:18 godane this must have been the 2008 one
03:19 godane SketchCow: btw it's hosted by Rainn Wilson
03:20 godane also you're going to get a better bitrate than the commercial tapes with this one
03:21 godane i'm getting 8300k to 8700k
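As a rough sanity check on storage, assuming godane's "8300k to 8700k" means kilobits per second with decimal prefixes, and picking one hour as a purely illustrative duration, the bitrate converts to file size like this (a sketch, not anyone's actual capture tooling):

```python
# Hypothetical helper: converts a capture bitrate into an approximate
# file size. Assumes the rate is in kilobits per second (1 kb = 1000 bits).
def capture_size_gb(kbps: float, seconds: float) -> float:
    """Approximate size in gigabytes (10**9 bytes) of a capture."""
    total_bits = kbps * 1000 * seconds
    return total_bits / 8 / 1e9


# One hour is an illustrative duration, not a figure from the log.
for rate in (8300, 8700):
    print(f"{rate}k for one hour: about {capture_size_gb(rate, 3600):.2f} GB")
```

Under those assumptions that is roughly 3.7 to 3.9 GB per hour of tape, before any separate audio or container overhead.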
03:25 Stilett0 has joined #archiveteam-bs
03:29 pizzaiolo has quit IRC (Remote host closed the connection)
03:53 zhongfu has quit IRC (Ping timeout: 260 seconds)
04:17 qw3rty116 has joined #archiveteam-bs
04:20 bitBaron has joined #archiveteam-bs
04:23 qw3rty115 has quit IRC (Read error: Operation timed out)
04:25 bitBaron has quit IRC (Quit: My computer has gone to sleep. ZZZzzz…)
04:48 zhongfu has joined #archiveteam-bs
05:11 Lord_Nigh has quit IRC (Read error: Operation timed out)
05:12 hook54321 JAA: Do you still have the partial recording of Bryan Lunduke's 24-hour thing?
05:14 Lord_Nigh has joined #archiveteam-bs
05:42 balrog has quit IRC (Read error: Operation timed out)
05:46 balrog has joined #archiveteam-bs
05:46 swebb sets mode: +o balrog
07:17 godane SketchCow: we got some good old hbo previews at the end of this tape too
07:34 godane these hbo previews are from 1990/1991 since there's Inside the NFL talking about super bowl 25
07:57 Stilett0 is now known as Stiletto
08:05 godane one tape i'm skipping has the 'in treatment' hbo episodes
08:06 godane cause it's from a 2008 series and it's on dvd
09:25 godane 1 minute of footage from this tape is missing
09:26 pizzaiolo has joined #archiveteam-bs
09:26 godane audio captured but video goes black and white, then to black
09:26 godane around 34:16 to 35:15 this happens
09:30 Stiletto has quit IRC ()
09:41 Stilett0 has joined #archiveteam-bs
09:50 nyaomi has quit IRC (Read error: Operation timed out)
09:59 nyaomi has joined #archiveteam-bs
09:59 drumstick has quit IRC (Ping timeout: 255 seconds)
10:00 drumstick has joined #archiveteam-bs
10:00 pizzaiolo has quit IRC (pizzaiolo)
10:01 pizzaiolo has joined #archiveteam-bs
10:05 pizzaiolo has quit IRC (Ping timeout: 246 seconds)
10:31 godane i may stop the tape after the current episode only cause it's having problems
10:32 godane there is a frame issue with episode 4 on this tape
10:34 BlueMaxim has quit IRC (Quit: Leaving)
10:45 JAA hook54321: Yes, I do.
10:57 zhongfu has quit IRC (Ping timeout: 260 seconds)
11:03 zhongfu has joined #archiveteam-bs
11:12 zhongfu has quit IRC (Ping timeout: 260 seconds)
11:12 zhongfu has joined #archiveteam-bs
11:16 ScruffyB has joined #archiveteam-bs
11:19 decay_ has joined #archiveteam-bs
11:19 pikhq_ has joined #archiveteam-bs
11:20 RKenshin has joined #archiveteam-bs
11:20 tuluu has joined #archiveteam-bs
11:22 SN4T14_ has joined #archiveteam-bs
11:22 ppsym has joined #archiveteam-bs
11:22 Hecatz- has joined #archiveteam-bs
11:23 db420 has joined #archiveteam-bs
11:23 db420 has quit IRC (Connection closed)
11:23 LeG0ax has joined #archiveteam-bs
11:29 phillipsj has quit IRC (se.hub irc.underworld.no)
11:29 Aerochrom has quit IRC (se.hub irc.underworld.no)
11:29 purplebot has quit IRC (se.hub irc.underworld.no)
11:29 PurpleSym has quit IRC (se.hub irc.underworld.no)
11:29 JensRex has quit IRC (se.hub irc.underworld.no)
11:29 tuluu_ has quit IRC (se.hub irc.underworld.no)
11:29 i0npulse has quit IRC (se.hub irc.underworld.no)
11:29 pikhq has quit IRC (se.hub irc.underworld.no)
11:29 Hecatz has quit IRC (se.hub irc.underworld.no)
11:29 Kenshin has quit IRC (se.hub irc.underworld.no)
11:29 Ing3b0rg has quit IRC (se.hub irc.underworld.no)
11:29 dboard2 has quit IRC (se.hub irc.underworld.no)
11:29 Rai-chan has quit IRC (se.hub irc.underworld.no)
11:29 medowar has quit IRC (se.hub irc.underworld.no)
11:29 decay has quit IRC (se.hub irc.underworld.no)
11:29 SN4T14 has quit IRC (se.hub irc.underworld.no)
11:37 zhongfu has quit IRC (Ping timeout: 260 seconds)
11:37 zhongfu has joined #archiveteam-bs
11:45 RKenshin is now known as Kenshin
11:45 ppsym is now known as PurpleSym
11:45 LeG0ax is now known as Ing3b0rg
11:45 Hecatz- is now known as Hecatz
11:45 Aerochrom has joined #archiveteam-bs
11:56 pizzaiolo has joined #archiveteam-bs
11:57 drumstick has quit IRC (Read error: Operation timed out)
11:59 zhongfu has quit IRC (Ping timeout: 260 seconds)
11:59 zhongfu has joined #archiveteam-bs
12:24 dboard2 has joined #archiveteam-bs
12:24 dboard2 has quit IRC (Connection closed)
13:27 godane so i decided to subscribe to The New Yorker for the digital issues on their site
13:50 kyounko_ has joined #archiveteam-bs
13:50 kyounko_ has quit IRC (Excess Flood)
13:50 alfie has quit IRC (Ping timeout: 260 seconds)
13:50 r3c0d3x has quit IRC (Ping timeout: 260 seconds)
13:50 Meroje has quit IRC (Ping timeout: 260 seconds)
13:50 dan- has quit IRC (Ping timeout: 260 seconds)
13:50 DopefishJ has joined #archiveteam-bs
13:50 swebb sets mode: +o DopefishJ
13:50 Meroje has joined #archiveteam-bs
13:50 kyounko_ has joined #archiveteam-bs
13:51 r3c0d3x has joined #archiveteam-bs
13:51 dan- has joined #archiveteam-bs
13:51 Hecatz has quit IRC (Ping timeout: 260 seconds)
13:51 ld1 has quit IRC (Ping timeout: 260 seconds)
13:51 Muad-Dib has quit IRC (Ping timeout: 260 seconds)
13:51 ZexaronS- has joined #archiveteam-bs
13:51 jsa has quit IRC (Quit: No Ping reply in 180 seconds.)
13:51 zhongfu has quit IRC (Remote host closed the connection)
13:51 ld1 has joined #archiveteam-bs
13:52 jsa has joined #archiveteam-bs
13:52 kyounko has quit IRC (Ping timeout: 260 seconds)
13:52 DFJustin has quit IRC (Ping timeout: 260 seconds)
13:52 ZexaronS has quit IRC (Ping timeout: 260 seconds)
13:52 Hecatz has joined #archiveteam-bs
13:52 alfie has joined #archiveteam-bs
13:52 JAA What is going on with all these ping timeouts?
13:53 zhongfu has joined #archiveteam-bs
13:57 alfie has quit IRC (Ping timeout: 260 seconds)
13:57 alembic has quit IRC (Ping timeout: 260 seconds)
13:57 riking has quit IRC (Ping timeout: 260 seconds)
13:57 ThisAsYou has quit IRC (Ping timeout: 260 seconds)
13:57 midas has quit IRC (Ping timeout: 260 seconds)
13:57 ItsYoda has quit IRC (Ping timeout: 260 seconds)
13:57 JSharp has quit IRC (Ping timeout: 260 seconds)
13:57 riking has joined #archiveteam-bs
13:57 JSharp has joined #archiveteam-bs
13:57 ThisAsYou has joined #archiveteam-bs
13:57 alembic has joined #archiveteam-bs
13:58 DopefishJ has quit IRC (Ping timeout: 260 seconds)
13:58 DrasticAc has quit IRC (Ping timeout: 260 seconds)
13:58 bitspill has quit IRC (Ping timeout: 260 seconds)
13:58 robogoat has quit IRC (Ping timeout: 260 seconds)
13:58 trvz has quit IRC (Ping timeout: 260 seconds)
13:58 Hecatz has quit IRC (Ping timeout: 260 seconds)
13:58 ld1 has quit IRC (Ping timeout: 260 seconds)
13:58 dan- has quit IRC (Ping timeout: 260 seconds)
13:58 pikhq_ has quit IRC (Ping timeout: 260 seconds)
13:58 spacegirl has quit IRC (Ping timeout: 260 seconds)
13:59 zhongfu has quit IRC (Ping timeout: 260 seconds)
13:59 robogoat has joined #archiveteam-bs
14:00 pikhq has joined #archiveteam-bs
14:00 spacegirl has joined #archiveteam-bs
14:02 DFJustin has joined #archiveteam-bs
14:02 swebb sets mode: +o DFJustin
14:02 zhongfu has joined #archiveteam-bs
14:03 ld1 has joined #archiveteam-bs
14:03 midas has joined #archiveteam-bs
14:03 DrasticAc has joined #archiveteam-bs
14:03 bitspill has joined #archiveteam-bs
14:03 Hecatz has joined #archiveteam-bs
14:03 alfie has joined #archiveteam-bs
14:03 ItsYoda has joined #archiveteam-bs
14:04 trvz has joined #archiveteam-bs
14:04 Muad-Dib has joined #archiveteam-bs
14:13 godane so looks like i have this tape from jason: https://archive.org/details/ShigeruMiyamotoGdcKeynote1999
14:14 godane we have the opening of it so it is different than that one
14:17 dan- has joined #archiveteam-bs
14:34 tuluu has quit IRC (Read error: Operation timed out)
14:35 tuluu has joined #archiveteam-bs
14:37 purplebot has joined #archiveteam-bs
14:37 Rai-chan has joined #archiveteam-bs
14:38 dboard2 has joined #archiveteam-bs
14:42 i0npulse has joined #archiveteam-bs
14:48 godane has left
14:48 godane has joined #archiveteam-bs
14:50 bitBaron has joined #archiveteam-bs
14:54 bitBaron has quit IRC (Client Quit)
16:22 Stilett0 is now known as Stiletto
16:26 HCross2 has quit IRC (Ping timeout: 260 seconds)
16:26 mattl has quit IRC (Ping timeout: 260 seconds)
16:26 voltagex has quit IRC (Ping timeout: 260 seconds)
16:26 jiphex has quit IRC (Ping timeout: 260 seconds)
16:27 trvz has quit IRC (Ping timeout: 260 seconds)
16:27 bitspill has quit IRC (Ping timeout: 260 seconds)
16:27 DrasticAc has quit IRC (Ping timeout: 260 seconds)
16:27 r3c0d3x has quit IRC (Ping timeout: 260 seconds)
16:27 tklk has quit IRC (Ping timeout: 260 seconds)
16:27 floogulin has quit IRC (Ping timeout: 260 seconds)
16:27 DedSec has quit IRC (Ping timeout: 260 seconds)
16:27 fallenoak has quit IRC (Ping timeout: 260 seconds)
16:28 ThisAsYou has quit IRC (Ping timeout: 260 seconds)
16:28 alembic has quit IRC (Ping timeout: 260 seconds)
16:28 JSharp has quit IRC (Ping timeout: 260 seconds)
16:28 riking has quit IRC (Ping timeout: 260 seconds)
16:28 SN4T14_ has quit IRC (Ping timeout: 260 seconds)
16:28 BartoCH has quit IRC (Ping timeout: 260 seconds)
16:28 Ctrl-S___ has quit IRC (Ping timeout: 260 seconds)
16:28 deathy has quit IRC (Ping timeout: 260 seconds)
16:28 xarph has quit IRC (Ping timeout: 260 seconds)
16:28 Muad-Dib has quit IRC (Ping timeout: 260 seconds)
16:28 jsa has quit IRC (Ping timeout: 260 seconds)
16:28 Meroje has quit IRC (Ping timeout: 260 seconds)
16:28 victorbje has quit IRC (Ping timeout: 260 seconds)
16:28 johtso has quit IRC (Ping timeout: 260 seconds)
16:28 octarine has quit IRC (Ping timeout: 260 seconds)
16:28 jrwr has quit IRC (Ping timeout: 260 seconds)
16:28 JAA WTF
16:29 BartoCH has joined #archiveteam-bs
16:29 mattl has joined #archiveteam-bs
16:29 deathy has joined #archiveteam-bs
16:29 JSharp has joined #archiveteam-bs
16:29 ThisAsYou has joined #archiveteam-bs
16:29 riking has joined #archiveteam-bs
16:29 jiphex has joined #archiveteam-bs
16:29 voltagex has joined #archiveteam-bs
16:29 alembic has joined #archiveteam-bs
16:29 Ctrl-S___ has joined #archiveteam-bs
16:29 HCross2 has joined #archiveteam-bs
16:29 trvz has joined #archiveteam-bs
16:29 octarine has joined #archiveteam-bs
16:29 victorbje has joined #archiveteam-bs
16:29 r3c0d3x has joined #archiveteam-bs
16:30 Meroje has joined #archiveteam-bs
16:30 floogulin has joined #archiveteam-bs
16:30 tklk has joined #archiveteam-bs
16:30 DrasticAc has joined #archiveteam-bs
16:30 fallenoak has joined #archiveteam-bs
16:30 DedSec has joined #archiveteam-bs
16:30 bitspill has joined #archiveteam-bs
16:30 johtso has joined #archiveteam-bs
16:31 jsa has joined #archiveteam-bs
16:31 SN4T14 has joined #archiveteam-bs
16:34 Muad-Dib has joined #archiveteam-bs
17:03 joepie91 https://motherboard.vice.com/en_us/article/bj7vam/why-twitter-is-the-best-social-media-platform-for-disinformation
17:10 midas2 has quit IRC (Read error: Operation timed out)
17:16 midas2 has joined #archiveteam-bs
17:22 K4k has quit IRC (Read error: Operation timed out)
17:24 K4k has joined #archiveteam-bs
17:30 Coderjo hmm... not quite a user-driven site, but...
17:30 Coderjo http://support.comixology.com/customer/portal/articles/2887181-pull-list-retirement-faq
17:30 Coderjo it somewhat was, with the retailer portal bit, I guess
17:33 Coderjo And who is surprised at Amazon killing this part of the company after acquiring it? Show of hands?
17:45 xarph has joined #archiveteam-bs
18:02 JensRex has joined #archiveteam-bs
18:28 K4k has quit IRC (Quit: WeeChat 1.9.1)
18:29 jrwr has joined #archiveteam-bs
18:30 K4k has joined #archiveteam-bs
18:37 K4k has quit IRC (Quit: WeeChat 1.9.1)
18:37 K4k has joined #archiveteam-bs
18:38 jrochkind has joined #archiveteam-bs
18:38 jrochkind Hello, I am a librarian-programmer, but not professionally involved in digital archiving, and don’t know much about archiveteam. BUT…
18:39 jrochkind Baltimore City Paper, Baltimore’s 40-year-old alternative free weekly, just published their last issue, after being bought by Tribune Media/TRONC. The website is still up, with lots and lots of content, but who knows for how long. I want to try to preserve as much as possible.
18:39 jrochkind Can anyone here help? Either via archiveteam project, or advice, or whatever?
18:42 JAA Thank you. I'll add it to ArchiveBot.
18:43 JAA That might not exactly grab everything though.
18:45 jrochkind Thanks! http://www.citypaper.com/ I will continue exploring various other approaches. Is there a place I can check to see ArchiveBot progress, or find the results of what it manages to get? Sorry, I am starting from zero knowledge about how your tools work, although I am an engineer and understand stuff.
18:45 JAA Yeah, the site uses JS for quite a lot of stuff.
18:46 JAA http://dashboard.at.ninjawedding.org/
18:46 jrochkind awesome, thank you.
18:46 JAA It will be job at569nt11fsuk3019kimdq036 (displayed on the far right), but it might not start for a few days.
18:46 JAA I'll also throw in some subdomains, e.g. http://events.citypaper.com/
18:47 JAA http://digitaledition.citypaper.com/ definitely won't work with ArchiveBot at all.
18:47 JAA And even on the main site, galleries etc. all only work with JavaScript. :-|
18:47 jrochkind i actually didn’t even know about digitaledition.citypaper.com, ha. There’s def years of content just available on HTML pages, although I don’t know about the internal links, if a scraper is going to find them.
18:50 JAA Yeah, I'm not quite sure either.
18:50 jrochkind Here’s an example page I found on google (happens to have a letter to the editor from me, is how I targeted it), which is not currently in the IA Wayback Machine. It’s just an ordinary HTML page, but I dunno about internal links for a scraper to find it. http://www.citypaper.com/bcp-cms-1-1406281-migrated-story-cp-20121121-mail-20121121-story.html
18:56 JAA I think it should discover quite a large part of the site through http://www.citypaper.com/topic/
18:56 JAA Luckily, the listings within topics are using URL-based pagination, e.g. http://www.citypaper.com/topic/politics-government/government/catherine-e.-pugh-PEPLT00007656-topic.html -> http://www.citypaper.com/topic/politics-government/government/catherine-e.-pugh-PEPLT00007656-topic.html?page=2&
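Because the page number is carried in the query string, a crawler can reach deeper listing pages just by following links, and the same pattern could be used to seed a crawl by hand. A minimal sketch, assuming every topic listing follows the `?page=N&` form quoted in the log and using a hypothetical `max_page` (the real page count per topic is not known from this log):

```python
from typing import List


def topic_page_urls(topic_url: str, max_page: int) -> List[str]:
    """Return listing URLs for pages 1..max_page of one topic.

    Page 1 is the bare topic URL; later pages append "?page=N&",
    mirroring the citypaper.com example. max_page is hypothetical.
    """
    urls = [topic_url]
    for n in range(2, max_page + 1):
        urls.append(f"{topic_url}?page={n}&")
    return urls


base = ("http://www.citypaper.com/topic/politics-government/government/"
        "catherine-e.-pugh-PEPLT00007656-topic.html")
for url in topic_page_urls(base, 3):
    print(url)
```

In practice the last page would have to be detected (for example, a page with no article links), since guessing too high just produces empty or error pages.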
18:59 jrochkind hmm. what if I get a list of every `site:citypaper.com` hit URL from google, perhaps using a google CSE I pay for. Is there anything useful I can do with that?
19:01 JAA Yes, we could make use of that. But keep in mind that search engines (especially Google) have strict rate limits. Scraping it for results is only really possible for smallish websites, in my experience.
19:01 jrochkind oh nice, yeah that topics index with paginated lists of topics is pretty good.
19:01 JAA Specifically, they'll make you fill out captchas, so you can't really automate it.
19:03 jrochkind Google has 30-40K hits for citypaper.com. If you pay google, you actually get an allowed API, no captcha, unless they’ve cancelled that service since I used it last. It will not be expensive to use to just get all the paginated results. (I’d pay for it). Although the allowed API actually might not let me get em all, it might stop you from paginating beyond a certain point. But I might mess with it, if a giant list of
19:03 jrochkind URLs would be useful to you. If I do get a list of a few tens of K of URLs, can I share them with you somehow?
19:03 JAA Ah, right.
19:04 jrochkind $5 per 1000 queries, if it really lets me paginate through 30K at 10 at a time, that’s only $15.
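The cost estimate is simple arithmetic: one paid query returns one page of 10 results, so paging through every hit takes total/10 queries. A small sketch using the figures quoted in the log (30K hits, 10 per page, $5 per 1,000 queries; these are jrochkind's numbers, not verified against Google's pricing):

```python
import math


def search_cost_usd(total_results: int, per_page: int, usd_per_1000: float) -> float:
    """Cost of paging through all results, one paid query per page."""
    queries = math.ceil(total_results / per_page)
    return queries * usd_per_1000 / 1000


# 30,000 results / 10 per page = 3,000 queries; 3,000 * $5 / 1,000 = $15.
print(search_cost_usd(30_000, 10, 5.0))  # 15.0
```

At the upper end of the 30-40K estimate, the same arithmetic gives 4,000 queries, or $20, assuming the API lets you paginate that far at all.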
19:06 JAA It could be useful, but if those articles are all (or almost all) discovered through /topic anyway, it's probably not worth it.
19:07 JAA I need to leave for a bit. Maybe someone else has better ideas.
19:07 jrochkind drat, I believe Google actually shut down that API anyway. Even though their docs still document it, it gives me an error when I try to create one, and I vaguely remember them saying they were gonna shut it down. Ah, Google. Anyway, ok, thank you JAA!
19:33 TheLovina has quit IRC (Read error: Connection reset by peer)
19:36 jrochkind JAA, if they come back, or any other interested parties: they do have a sitemap.xml, although it seems to only have some very limited things in it; it’s not really a sitemap. Don’t know if your tools will use sitemaps.
19:41 jrochkind their robots.txt actually disallows all those topic/ pages, which seemed the most useful for scraping links. don’t know what archivebot does with robots.txt
19:52 JAA jrochkind: wpull (used by ArchiveBot) knows about both sitemaps and robots.txt. With the options used in ArchiveBot, it grabs both to discover additional content (i.e. ignores Disallow directives).
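To illustrate the difference JAA describes, here is a small sketch using only Python's standard library: it collects `<loc>` URLs from a sitemap and then either honors or ignores robots.txt `Disallow` rules. This is not wpull's implementation; the sample sitemap and robots.txt are made up (they only mimic the `/topic/` disallow mentioned in the log), and `example.com` stands in for the real site.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.robotparser import RobotFileParser

# Namespace used by standard sitemap files (sitemaps.org protocol 0.9).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text: str) -> List[str]:
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def seed_urls(sitemap_xml: str, robots_txt: str, honor_robots: bool) -> List[str]:
    """Collect sitemap URLs, optionally dropping ones robots.txt disallows.

    honor_robots=False models a crawl that does not obey Disallow rules.
    """
    urls = sitemap_urls(sitemap_xml)
    if not honor_robots:
        return urls
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch("*", u)]


# Made-up sample data; the real citypaper.com files are not shown in the log.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/story-1.html</loc></url>
  <url><loc>http://www.example.com/topic/politics-topic.html</loc></url>
</urlset>"""
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /topic/\n"

print(seed_urls(SAMPLE_SITEMAP, SAMPLE_ROBOTS, honor_robots=True))
# ['http://www.example.com/story-1.html']
print(seed_urls(SAMPLE_SITEMAP, SAMPLE_ROBOTS, honor_robots=False))
# ['http://www.example.com/story-1.html', 'http://www.example.com/topic/politics-topic.html']
```

With `honor_robots=False`, which mirrors the ArchiveBot behaviour JAA describes, the disallowed `/topic/` URL stays in the crawl queue.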
19:52 jrochkind cool. looking at it, this site might not be very scrapable, it’s a pretty poorly designed site. we’ll find out!
19:53 jrochkind those topics are actually pretty useless. I think it’s just a listing of terms from some standard vocabulary, I have yet to find one that actually leads to articles.
19:53 jrochkind which may be why they are disallowed in robots.txt.
19:54 JAA Yes, most of those "topics" seem useless, but some do have links to articles, e.g. the one I linked above.
19:55 JAA In that case, it seems to be the author of the articles.
19:56 jrochkind ah, cool. it might trip up a scraper in requesting thousands of useless links too though.
19:57 JAA Yeah, but some thousands of links aren't really that problematic in the big picture.
19:58 jrochkind interesting. there are some weird topic links for sure. http://www.citypaper.com/topic/education/schools/high-schools/05005003-topic.html
19:58 jrochkind i wonder who they licensed that vocabulary from haha
20:25 schbirid2 has quit IRC (Quit: Leaving)
20:33 jschwart has joined #archiveteam-bs
20:38 Mateon1 has quit IRC (Ping timeout: 250 seconds)
20:40 Mateon1 has joined #archiveteam-bs
21:16 tuluu has quit IRC (Remote host closed the connection)
21:19 tuluu has joined #archiveteam-bs
21:53 dashcloud has quit IRC (Remote host closed the connection)
22:02 kyounko_ has quit IRC (Ping timeout: 255 seconds)
22:27 drumstick has joined #archiveteam-bs
22:44 jschwart has quit IRC (Konversation terminated!)
23:24 dashcloud has joined #archiveteam-bs
23:28 BlueMaxim has joined #archiveteam-bs
23:47 jrochkind has quit IRC (jrochkind)
