#archiveteam-bs 2019-07-23,Tue


Time Nickname Message
00:10 πŸ”— BlueMax has quit IRC (Quit: Leaving)
00:35 πŸ”— HashbangI has quit IRC (Read error: Connection reset by peer)
00:45 πŸ”— hshashash has joined #archiveteam-bs
00:48 πŸ”— hshashash has there been any discussion about archiving podcasts? very culturally significant and they're not going to be around forever
00:49 πŸ”— hshashash scraping large podcast hosting sites would get the bulk of it with relatively little effort
01:09 πŸ”— BlueMax has joined #archiveteam-bs
02:19 πŸ”— Raccoon i think it'd be cool if someone had/is archiving NPR's hourly news updates radio stream (3~5 minute mp3 each hour)
02:19 πŸ”— Raccoon url never changes
02:22 πŸ”— Raccoon http://public.npr.org/anon.npr-mp3/npr/news/newscast.mp3
02:31 πŸ”— Raccoon has quit IRC (Read error: Connection reset by peer)
02:31 πŸ”— DogsRNice has quit IRC (Read error: Connection reset by peer)
02:32 πŸ”— Raccoon has joined #archiveteam-bs
02:39 πŸ”— shashasha has joined #archiveteam-bs
02:39 πŸ”— yano has quit IRC (Read error: Connection reset by peer)
02:39 πŸ”— yano_ has joined #archiveteam-bs
02:40 πŸ”— fuzzy8021 has quit IRC (Read error: Operation timed out)
02:41 πŸ”— fuzzy8021 has joined #archiveteam-bs
02:41 πŸ”— mundus20- has quit IRC (Read error: Operation timed out)
02:41 πŸ”— mundus201 has joined #archiveteam-bs
02:42 πŸ”— hshashash has quit IRC (Read error: Connection reset by peer)
02:43 πŸ”— paul2520 has quit IRC (Read error: Operation timed out)
02:43 πŸ”— paul2520 has joined #archiveteam-bs
02:45 πŸ”— shashasha has quit IRC (Remote host closed the connection)
02:45 πŸ”— qw3rty112 has joined #archiveteam-bs
02:47 πŸ”— systwi_ has joined #archiveteam-bs
02:47 πŸ”— d5f4a3622 has quit IRC (Read error: Connection reset by peer)
02:47 πŸ”— qw3rty111 has quit IRC (Read error: Operation timed out)
02:47 πŸ”— d5f4a3622 has joined #archiveteam-bs
02:47 πŸ”— Flashfloo has quit IRC (Ping timeout: 601 seconds)
02:47 πŸ”— TigerbotH has quit IRC (Ping timeout: 601 seconds)
02:49 πŸ”— shashasha has joined #archiveteam-bs
02:49 πŸ”— shashasha has quit IRC (Remote host closed the connection)
02:49 πŸ”— kiska11 has joined #archiveteam-bs
02:50 πŸ”— Igloo_ has joined #archiveteam-bs
02:53 πŸ”— ndiddy has quit IRC (Ping timeout: 602 seconds)
02:53 πŸ”— TigerbotH has joined #archiveteam-bs
02:54 πŸ”— paul2520 has quit IRC (Ping timeout: 602 seconds)
02:54 πŸ”— systwi has quit IRC (Ping timeout: 602 seconds)
02:54 πŸ”— luckcolor has quit IRC (Ping timeout: 602 seconds)
02:54 πŸ”— Kenshin has quit IRC (Ping timeout: 602 seconds)
02:54 πŸ”— luckcolor has joined #archiveteam-bs
02:54 πŸ”— kiska1 has quit IRC (Ping timeout: 600 seconds)
02:54 πŸ”— Igloo has quit IRC (Ping timeout: 600 seconds)
02:54 πŸ”— shashasha has joined #archiveteam-bs
02:54 πŸ”— abstract has quit IRC (Ping timeout: 600 seconds)
02:55 πŸ”— ivan- has joined #archiveteam-bs
02:55 πŸ”— Fusl sets mode: +o ivan-
02:56 πŸ”— Kenshin has joined #archiveteam-bs
02:56 πŸ”— Fusl sets mode: +o Kenshin
02:56 πŸ”— ndiddy has joined #archiveteam-bs
02:57 πŸ”— paul2520 has joined #archiveteam-bs
02:57 πŸ”— joepie91 has quit IRC (Ping timeout: 600 seconds)
02:58 πŸ”— ivan_ has quit IRC (Ping timeout: 600 seconds)
02:58 πŸ”— abstract has joined #archiveteam-bs
02:59 πŸ”— PotcFdk has quit IRC (Ping timeout: 600 seconds)
02:59 πŸ”— ivan- is now known as ivan_
03:03 πŸ”— joepie91 has joined #archiveteam-bs
03:06 πŸ”— PotcFdk has joined #archiveteam-bs
03:20 πŸ”— step has quit IRC (Read error: Operation timed out)
03:47 πŸ”— wyatt8740 has joined #archiveteam-bs
03:52 πŸ”— wyatt8740 has quit IRC (Ceci n'est pas un IRC quit message.)
03:53 πŸ”— wyatt8740 has joined #archiveteam-bs
03:56 πŸ”— qw3rty113 has joined #archiveteam-bs
03:58 πŸ”— systwi_ is now known as systwi
04:00 πŸ”— qw3rty112 has quit IRC (Ping timeout: 600 seconds)
04:19 πŸ”— wyatt8740 has quit IRC (Read error: Operation timed out)
04:37 πŸ”— kiskabak has quit IRC (Remote host closed the connection)
04:37 πŸ”— kiskabak has joined #archiveteam-bs
04:37 πŸ”— Fusl sets mode: +o kiskabak
04:38 πŸ”— kiska11 is now known as kiska1
05:12 πŸ”— fredgido has quit IRC (Ping timeout: 252 seconds)
05:12 πŸ”— fredgido has joined #archiveteam-bs
05:15 πŸ”— Rotietip has joined #archiveteam-bs
05:16 πŸ”— Rotietip Hello everyone, there are some sites of which I am a little worried about their future/long-term stability, could anyone initiate a work in the Warrior for http://www.mundoparanormal.com http://grupoelron.org/ and http://www.videosporno.tv/ ?
05:27 πŸ”— wyatt8740 has joined #archiveteam-bs
05:40 πŸ”— terry1 has quit IRC (Read error: Operation timed out)
05:50 πŸ”— terry1 has joined #archiveteam-bs
05:50 πŸ”— PhrackD has quit IRC (Read error: Operation timed out)
05:52 πŸ”— SketchCow Well, the last one is a straight up porn site
05:52 πŸ”— PhrackD has joined #archiveteam-bs
05:53 πŸ”— SketchCow Other two went in
05:56 πŸ”— Rotietip I know, but it has not been updated for more than a year and I am worried that it will be closed at any time, in addition to the fact that it only has videos embedded from external tubes, so the backup should not weigh too much.
05:57 πŸ”— Flashfire Use the save now feature is my suggestion. I dont speak for archiveteam but I havent seen us do much to save porn
05:57 πŸ”— Flashfire Except for EroShare I think it was
06:06 πŸ”— Rotietip Flashfire: It is a site with thousands of entries, if you refer to the tool that has archive.org it would take me days to save the whole site but it doesn't matter. By the way, can ArchiveBot save individual boards from 8ch.net (including the threads archive that is loaded with javascript)?
06:06 πŸ”— Flashfire https://web.archive.org/save/ I was refering to this yes
06:10 πŸ”— Flashfire Rotietip https://web.archive.org/web/*/www.videosporno.tv/* I wouldnt worry too much about that site seems reasonably covered for a site that is mostly embeds from other sites
06:11 πŸ”— m007a83 has quit IRC (Ping timeout: 252 seconds)
06:15 πŸ”— Rotietip At least that leaves me calmer, I imagined it would be a lesser known site (like the other two) and therefore I thought I would not be so indexed.
06:15 πŸ”— ivan_ last I checked IA doesn't take 8ch
06:16 πŸ”— ivan_ oh never mind it's in wayback
06:21 πŸ”— m007a83 has joined #archiveteam-bs
06:32 πŸ”— step has joined #archiveteam-bs
06:33 πŸ”— Rotietip 8ch.net is like 4chan but everyone can create their own board (so there are thousands of all kinds). The issue is that if the owner of a board does not appear in two weeks it becomes "reclaimable" by anyone. There is also the fact that old threads are indexed in a file (although this is configurable) like this: https://8ch.net/hisparefugio/archive/index.html (if it takes a long time to load, it is because there are thousands of threads which are in JSON format within index.html).
06:33 πŸ”— Rotietip The idea would be to save some boards currently inactive and with valuable content before some asshole claims them and sends everything to hell for "lulz/trolling"
06:42 πŸ”— Flashfire I cant help there at all its blocked by the Australian Government
06:46 πŸ”— Rotietip No problem, however it is possible that 8chan requires more specialized tools than wget or ArchiveBot (at least to get the thread links)
06:56 πŸ”— Flashfire I would assume chromebot it being headless Chromium might be able to do it
07:32 πŸ”— Rotietip Speaking of Chromebot, with that would it be possible to archive a site like https://radio.garden/ ?
07:40 πŸ”— PurpleSym Possible, Rotietip. The site needs a lot of user interaction though, so custom click selectors are necessary.
07:41 πŸ”— Dimtree has quit IRC (Ping timeout: 745 seconds)
08:06 πŸ”— Dimtree has joined #archiveteam-bs
08:30 πŸ”— Sokar has quit IRC (Quit: KVIrc 4.9.3 Aria http://www.kvirc.net/)
08:52 πŸ”— Flashfire can someone add http://www.fuzzymemories.tv/ to archivebot and possibly chromebot looks unique and interesting though not sure how the videos will go
09:58 πŸ”— Fusl_ what do you all think about archiving https://picosong.com/ ? i did a quick dive through a majestic report of 51k urls and saw a lot of content that may get DMCAd when its up at the IA
09:59 πŸ”— Fusl JAA Kaz SketchCow?
10:03 πŸ”— killsushi has quit IRC (Quit: Leaving)
10:09 πŸ”— Dj-Wawa has joined #archiveteam-bs
10:26 πŸ”— t3 has quit IRC (Quit: Connection closed for inactivity)
10:29 πŸ”— Flashfire Does a DMCA cost IA anything? Would the benefits outweigh the risks?
10:49 πŸ”— Rotietip In the case of radio.garden the links of the radios that appear in the lower right corner can be detected with ".channel-list-item-link", for cities with many radios you can go to the next page with ".channel-list-navigation > .mod-next" and the most difficult thing would be to get the green dots to click since it seems to be all drawn on a canvas element (which can be found with ".cesium-widget")
10:54 πŸ”— Rotietip A project like that is not something I can dedicate a lot of time to at the moment but there I leave the idea in case someone wants to make a script to emulate the necessary interactions with the site to be able to archive it with Chromebot.
11:18 πŸ”— Rotietip has quit IRC (Read error: Operation timed out)
11:27 πŸ”— BlueMax has quit IRC (Quit: Leaving)
11:31 πŸ”— Rotietip has joined #archiveteam-bs
12:15 πŸ”— TC01 has joined #archiveteam-bs
12:17 πŸ”— Rotietip has quit IRC (Quit: KVIrc 5.0.0 Aria http://www.kvirc.net/)
12:38 πŸ”— Dj-Wawa has quit IRC (Quit: Connection closed for inactivity)
12:43 πŸ”— Dj-Wawa has joined #archiveteam-bs
12:57 πŸ”— yano_ is now known as yano
13:01 πŸ”— killsushi has joined #archiveteam-bs
13:38 πŸ”— SketchCow Oh youuuuuuuu
13:45 πŸ”— SketchCow https://www.pgmusic.com/forums/ubbthreads.php?ubb=showflat&Number=319856
13:45 πŸ”— SketchCow My attitude about Picosong is into the wayback.
13:47 πŸ”— Igloo_ is now known as Igloo
13:48 πŸ”— JAA This sounds like a task for qwarc, unless someone wants to set up a warrior project.
13:49 πŸ”— Igloo I can look when I get home as I thought qwarc was buggy?
13:49 πŸ”— Igloo #notsosung ? :)
13:53 πŸ”— JAA Yes, it is.
13:53 πŸ”— JAA (But I blame aiohttp.)
14:00 πŸ”— JAA Based on Fusl's Majestic list, the song IDs are [0-9A-Za-z]{,4}. There are also some w[0-9A-Za-z]{4} (more recent uploads it seems). Inexistent IDs redirect back to the homepage.
14:02 πŸ”— Fusl ,5
14:02 πŸ”— Fusl not ,4
14:02 πŸ”— Fusl some of them are 5 chars
14:03 πŸ”— JAA Yes, but all of them start with "w".
14:03 πŸ”— JAA Well, all that I found.
14:04 πŸ”— JAA The IDs also appear to be somewhat sequential.
14:04 πŸ”— Igloo Got an example?
14:04 πŸ”— Fusl weird
14:04 πŸ”— arkiver picosong sounds nice
14:04 πŸ”— arkiver does it require a warrior project?
14:05 πŸ”— Fusl so it's w?[0-9A-Za-z]{1,4}
14:05 πŸ”— JAA Yup
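The ID pattern agreed on above can be checked with a small sketch like the following. The helper name is hypothetical, and the pattern only reflects what was seen in the Majestic list, not a documented picosong rule.

```python
import re

# Pattern from the discussion: optional leading "w", then 1 to 4 alphanumerics.
SONG_ID = re.compile(r"w?[0-9A-Za-z]{1,4}")


def is_candidate_id(song_id: str) -> bool:
    """Return True if the whole string matches the observed ID pattern."""
    return SONG_ID.fullmatch(song_id) is not None


if __name__ == "__main__":
    for sample in ("ytid", "jLBh", "wK3ag", "abcde"):
        print(sample, is_candidate_id(sample))
```

A five-character ID that does not start with "w" is rejected, which matches JAA's observation that all five-character IDs found so far begin with "w".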
14:05 πŸ”— JAA arkiver: Don't start another one. :-P
14:05 πŸ”— arkiver well if we need it I'm fine with getting one running
14:05 πŸ”— arkiver I have time
14:05 πŸ”— JAA Yeah, let's get the others off the ground first.
14:05 πŸ”— arkiver but if we don't need it, then fine too
14:05 πŸ”— JAA This one will stay online until October.
14:06 πŸ”— arkiver yeah
14:06 πŸ”— arkiver but let's not forget about it
14:06 πŸ”— Igloo We have time, I almost have a full list of freeml too
14:06 πŸ”— Igloo Which is due to due in December.
14:06 πŸ”— Igloo *to die
14:07 πŸ”— arkiver hmm yes
14:07 πŸ”— JAA Should be ~30 million IDs to attempt on picosong (2 * 62**4). I'll try with qwarc for everything but the actual audio files (because qwarc's very slow and leaking memory left and right on processing large files).
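The ~30 million estimate can be reproduced with a short sketch. It assumes the ID space is exactly w?[0-9A-Za-z]{1,4} as discussed; the function names are hypothetical. Note that "w" followed by 1 to 3 characters is already covered by the unprefixed IDs of length 2 to 4, so only the "w" plus four characters case adds new strings.

```python
import string
from itertools import islice, product

# 62-character alphabet: digits, uppercase, lowercase.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def count_ids(max_len: int = 4) -> int:
    """Count distinct strings matching w?[0-9A-Za-z]{1,max_len}."""
    unprefixed = sum(len(ALPHABET) ** n for n in range(1, max_len + 1))
    # Only "w" + max_len characters is not already among the unprefixed IDs.
    return unprefixed + len(ALPHABET) ** max_len


def iter_ids(max_len: int = 4):
    """Yield every distinct candidate ID exactly once."""
    for n in range(1, max_len + 1):
        for chars in product(ALPHABET, repeat=n):
            yield "".join(chars)
    for chars in product(ALPHABET, repeat=max_len):
        yield "w" + "".join(chars)


if __name__ == "__main__":
    print(2 * 62 ** 4)   # JAA's estimate: 4-character IDs plus "w" + 4 characters
    print(count_ids())   # also includes the shorter 1- to 3-character IDs
    print(list(islice(iter_ids(), 3)))
```

Both figures come out just under 30 million, so the estimate holds whether or not the shorter IDs are attempted.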
14:10 πŸ”— arkiver yes
14:11 πŸ”— Fusl "I have time" there's still sony sketch and reddit :P
14:12 πŸ”— Igloo Sketch is the important one as that deadline is looming
14:15 πŸ”— arkiver alright
14:17 πŸ”— JAA There are also Disqus comments on picosong. Do we have a method for grabbing those yet?
14:21 πŸ”— arkiver no
14:21 πŸ”— arkiver afaik
14:24 πŸ”— JAA :-/
14:24 πŸ”— JAA At least it seems that most songs don't have any comments.
14:24 πŸ”— JAA Found two examples so far with one each: https://picosong.com/ytid/ https://picosong.com/jLBh/
14:26 πŸ”— JAA Ok, 19 on this one: https://picosong.com/wK3ag/
14:26 πŸ”— JAA Anyway
14:26 πŸ”— JAA I'll just leave https://disqus.com/home/forum/picosong/ here.
14:44 πŸ”— kiska sets mode: +o Igloo
14:55 πŸ”— Mayonaise has quit IRC (Read error: Connection reset by peer)
14:56 πŸ”— Mayonaise has joined #archiveteam-bs
17:18 πŸ”— Dj-Wawa has quit IRC (Quit: Connection closed for inactivity)
17:24 πŸ”— bithippo has joined #archiveteam-bs
17:30 πŸ”— bithippo has quit IRC (Textual IRC Client: www.textualapp.com)
17:57 πŸ”— schbirid has joined #archiveteam-bs
18:19 πŸ”— wyatt8740 has quit IRC (Read error: Operation timed out)
18:24 πŸ”— wyatt8740 has joined #archiveteam-bs
18:52 πŸ”— wyatt8740 has quit IRC (Read error: Operation timed out)
19:05 πŸ”— wyatt8740 has joined #archiveteam-bs
19:18 πŸ”— wyatt8740 has quit IRC (Read error: Operation timed out)
19:44 πŸ”— schbirid has quit IRC (Remote host closed the connection)
22:15 πŸ”— SketchCow Fusl: Do we have lingering "need admin IA work done" tasks
22:15 πŸ”— Fusl yes
22:15 πŸ”— Fusl sonysketch
22:15 πŸ”— Fusl and archiveteam_sonysketchimg
22:16 πŸ”— Fusl i PMd you the email addresses list on twitter
22:16 πŸ”— Fusl and we'll need another one for reddit soon i believe
22:21 πŸ”— SketchCow It's a huge pain in the ass
22:21 πŸ”— SketchCow I wish I could make it a command line but it's not possible
22:22 πŸ”— SketchCow I'm thinking of making an archiveteam holdingbay collection
22:22 πŸ”— SketchCow Then all your shit can go in there, and I write scripts to transfer over to final resting place.
22:22 πŸ”— SketchCow Wayback doesn't care
22:24 πŸ”— SketchCow Yes, I'm doing this
22:25 πŸ”— SketchCow Less pain down the line and you're not all sitting around kicking the can waiting for me
22:26 πŸ”— SketchCow https://archive.org/details/archiveteam_inbox
22:29 πŸ”— JAA Yeah, seems like a good idea.
22:29 πŸ”— SketchCow Fusl, Igloo, HCross and Kiska all have access to this collection now.
22:30 πŸ”— SketchCow Is there anyone else doing high-grade uploads we need to add to it
22:30 πŸ”— * Kaz waves
22:30 πŸ”— Kaz checking address now
22:31 πŸ”— Kaz pm'd
22:31 πŸ”— JAA Me too.
22:31 πŸ”— JAA Also PM'd.
22:36 πŸ”— JAA SketchCow: By the way, you missed this beautiful question about the Tumblr collection over in #tumbledown yesterday:
22:36 πŸ”— JAA 2019-07-22 20:50:35 UTC < rbarreto> hi everyone - i'm a researcher on the impacts of SESTA / FOSTA and I'm trying to get data on the blogs that were impacted by NSFW
22:36 πŸ”— JAA 2019-07-22 20:50:55 UTC < rbarreto> but when I look at what the internet archive has ...it's pretty much just images of this one guy
22:49 πŸ”— ivan_ what
22:49 πŸ”— ivan_ not enough tumblr porn on IA? I doubt this very much
22:53 πŸ”— JAA They're referring to the images on this page: https://archive.org/details/archiveteam_tumblr
22:55 πŸ”— ivan_ haha
22:55 πŸ”— ivan_ people take thumbnails very seriously these days
22:57 πŸ”— second has quit IRC (Quit: ZNC 1.6.5 - http://znc.in)
23:06 πŸ”— second has joined #archiveteam-bs
23:27 πŸ”— BlueMax has joined #archiveteam-bs
23:34 πŸ”— wyatt8740 has joined #archiveteam-bs
23:41 πŸ”— VerifiedJ has quit IRC (Quit: Leaving)
23:45 πŸ”— wyatt8740 has quit IRC (Read error: Operation timed out)
23:47 πŸ”— wyatt8740 has joined #archiveteam-bs
