#archiveteam-bs 2019-05-21,Tue


Time Nickname Message
00:01 🔗 RichardG has quit IRC (Ping timeout: 268 seconds)
00:05 🔗 RichardG has joined #archiveteam-bs
00:24 🔗 killsushi has quit IRC (Quit: Leaving)
00:41 🔗 enowaldo has joined #archiveteam-bs
00:43 🔗 BlueMax has joined #archiveteam-bs
00:48 🔗 enowaldo has quit IRC (Read error: Operation timed out)
00:58 🔗 benjins has quit IRC (Quit: Leaving)
01:03 🔗 Zerote_ has quit IRC (Ping timeout: 600 seconds)
01:45 🔗 icedice has quit IRC (Quit: Leaving)
01:48 🔗 enowaldo has joined #archiveteam-bs
01:50 🔗 webdownlo has quit IRC (Quit: Page closed)
02:04 🔗 Kaz https://www.bbc.co.uk/news/world-europe-48345660
02:06 🔗 SketchCow godane: https://twitter.com/textfiles/status/1130655750475472896
02:08 🔗 kisspunch JAA: maybe put in a ArchiveBot patch for ?dl=0 -> ?dl=1 ?
02:11 🔗 godane i noticed it
02:12 🔗 godane he sent me a message from vanderbilt library askus service
02:13 🔗 godane i forwarded the email to your address
02:14 🔗 kisspunch If anyone's at Carnegie Mellon I'm trying to get a paper on compression, msg me
02:16 🔗 SketchCow godane: Please just ignore him, forward any reviews he sends, etc.
02:16 🔗 SketchCow I'm gathering all of this.
02:16 🔗 SketchCow (I also know where he lives now)
02:17 🔗 godane i just hope he is not in New england
02:18 🔗 godane anyways your getting tons of japanese manuals
02:18 🔗 godane with metadata mostly
02:19 🔗 godane so its going better then amazon manuals at least
02:21 🔗 godane so i found some japanese hiphop band page : http://www.harlem.co.jp/
02:22 🔗 godane SketchCow: you may have something for your hiphop tape collection here : https://soundcloud.com/club_harlem
02:29 🔗 enowaldo has quit IRC (Read error: Operation timed out)
03:05 🔗 Anthony1 has joined #archiveteam-bs
03:21 🔗 qw3rty111 has joined #archiveteam-bs
03:27 🔗 marked1 has quit IRC (Quit: WeeChat 2.4)
03:28 🔗 qw3rty119 has quit IRC (Read error: Operation timed out)
03:31 🔗 Anthony1 has quit IRC (Quit: Page closed)
03:50 🔗 odemgi_ has joined #archiveteam-bs
03:53 🔗 odemgi has quit IRC (Ping timeout: 252 seconds)
03:53 🔗 odemg has quit IRC (Ping timeout: 265 seconds)
03:56 🔗 enowaldo has joined #archiveteam-bs
04:05 🔗 odemg has joined #archiveteam-bs
04:08 🔗 enowaldo has quit IRC (Read error: Operation timed out)
04:14 🔗 halt_ has quit IRC (irc.efnet.nl efnet.deic.eu)
04:23 🔗 systwi has joined #archiveteam-bs
04:53 🔗 paul2520 has quit IRC (Read error: Operation timed out)
04:53 🔗 Lord_Nigh has quit IRC (Read error: Operation timed out)
04:53 🔗 dxrt_ has quit IRC (Write error: Broken pipe)
04:53 🔗 ivan has quit IRC (Write error: Broken pipe)
04:53 🔗 kiska1 has quit IRC (Read error: Operation timed out)
04:53 🔗 TC01 has quit IRC (Read error: Operation timed out)
04:53 🔗 colona has quit IRC (Read error: Operation timed out)
04:53 🔗 JH88 has quit IRC (Read error: Operation timed out)
04:53 🔗 HashbangI has quit IRC (Read error: Operation timed out)
04:53 🔗 ivan has joined #archiveteam-bs
04:54 🔗 systwi has quit IRC (Read error: Operation timed out)
04:54 🔗 wyatt8740 has joined #archiveteam-bs
04:54 🔗 TigerbotH has quit IRC (Read error: Operation timed out)
04:54 🔗 Yurume has quit IRC (Read error: Operation timed out)
04:55 🔗 PhrackD has quit IRC (Read error: Operation timed out)
04:55 🔗 Lord_Nigh has joined #archiveteam-bs
04:55 🔗 Yurume has joined #archiveteam-bs
04:55 🔗 PotcFdk has quit IRC (Read error: Operation timed out)
04:55 🔗 step has quit IRC (Read error: Operation timed out)
04:57 🔗 TC01 has joined #archiveteam-bs
04:57 🔗 colona has joined #archiveteam-bs
04:58 🔗 qw3rty111 has quit IRC (Ping timeout: 600 seconds)
05:04 🔗 Fusl SketchCow: that tweet, mind looping me in on the story about it?
05:05 🔗 wyatt8740 has quit IRC (Read error: Operation timed out)
05:32 🔗 kpcyrd ^ also interested in this
05:34 🔗 Flashfire I assume it has something to do with the crazy whackjob threatening Godane
05:52 🔗 qw3rty111 has joined #archiveteam-bs
05:52 🔗 kiska1 has joined #archiveteam-bs
05:52 🔗 svchfoo3 sets mode: +o kiska1
05:56 🔗 dxrt_ has joined #archiveteam-bs
05:56 🔗 dxrt sets mode: +o dxrt_
06:01 🔗 step has joined #archiveteam-bs
06:02 🔗 PhrackD has joined #archiveteam-bs
06:02 🔗 HashbangI has joined #archiveteam-bs
06:02 🔗 systwi has joined #archiveteam-bs
06:02 🔗 TigerbotH has joined #archiveteam-bs
06:05 🔗 enowaldo has joined #archiveteam-bs
06:06 🔗 PotcFdk has joined #archiveteam-bs
06:10 🔗 enowaldo has quit IRC (Ping timeout: 265 seconds)
06:14 🔗 paul2520 has joined #archiveteam-bs
06:21 🔗 Zerote_ has joined #archiveteam-bs
06:58 🔗 bsmith093 Flashfire: there's a whackjob? fun story or sad, realistic one?
07:38 🔗 Zerote_ has quit IRC (Ping timeout: 600 seconds)
07:46 🔗 Zerote has joined #archiveteam-bs
07:50 🔗 BlueMax has quit IRC (Quit: Leaving)
07:59 🔗 enowaldo has joined #archiveteam-bs
08:18 🔗 Kaz eientei95: nvm I can't tabcomplete good
08:18 🔗 eientei95 lol
08:18 🔗 Kaz No idea if we ever got that dump though - that edit seems to be from some point last year
08:18 🔗 eientei95 Huh
08:19 🔗 enowaldo has quit IRC (Read error: Operation timed out)
08:20 🔗 Kaz Frogging: any idea?
08:31 🔗 Kaz I've pm'd the guy anyway
08:42 🔗 HashbangI has quit IRC (Read error: Connection reset by peer)
08:43 🔗 enowaldo has joined #archiveteam-bs
08:56 🔗 enowaldo has quit IRC (Read error: Operation timed out)
10:10 🔗 enowaldo has joined #archiveteam-bs
10:32 🔗 enowaldo has quit IRC (Read error: Operation timed out)
10:45 🔗 zerkalo has joined #archiveteam-bs
10:56 🔗 Oddly has joined #archiveteam-bs
11:12 🔗 enowaldo has joined #archiveteam-bs
11:12 🔗 wp494 has quit IRC (Ping timeout: 615 seconds)
11:13 🔗 wp494 has joined #archiveteam-bs
11:24 🔗 enowaldo has quit IRC (Ping timeout: 252 seconds)
11:52 🔗 enowaldo has joined #archiveteam-bs
12:06 🔗 JAA kisspunch: Won't help. As mentioned, AB doesn't grab the actual download on ?dl=1 links. It's due to the --no-parent flag for wpull and Dropbox's redirect setup.
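A rough illustration of why a `--no-parent` restriction can drop the real download. This is a simplified model, not wpull's actual code. It assumes a URL is in scope only if it is on the same host as the start URL and its path stays under the start URL's directory. The Dropbox URLs in the usage note, including the redirect host, are hypothetical examples.

```python
import posixpath
from urllib.parse import urlsplit


def parent_dir(path: str) -> str:
    """Directory part of a URL path, always ending with '/'."""
    if path.endswith("/"):
        return path
    d = posixpath.dirname(path)
    return d if d.endswith("/") else d + "/"


def within_no_parent(start_url: str, candidate: str) -> bool:
    """Simplified --no-parent check (assumption, not wpull's implementation):
    the candidate must be on the same host and under the start URL's directory."""
    s, c = urlsplit(start_url), urlsplit(candidate)
    if s.hostname != c.hostname:
        # A redirect to a different download host falls out of scope here
        return False
    return c.path.startswith(parent_dir(s.path))
```

With a start URL like `https://www.dropbox.com/s/abc123/file.zip?dl=1`, a redirect to a separate file-serving host is rejected. Rewriting `?dl=0` to `?dl=1` therefore does not change the outcome under this model.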
12:13 🔗 m007a83_ is now known as m007a83
12:37 🔗 enowaldo has quit IRC (Ping timeout: 252 seconds)
12:56 🔗 HashbangI has joined #archiveteam-bs
12:59 🔗 cfarquhar has quit IRC (Read error: Operation timed out)
12:59 🔗 cfarquhar has joined #archiveteam-bs
13:14 🔗 Dj-Wawa has joined #archiveteam-bs
13:28 🔗 enowaldo has joined #archiveteam-bs
14:07 🔗 enowaldo has quit IRC (Ping timeout: 268 seconds)
14:16 🔗 SketchCow So, the thing I tweeted about
14:16 🔗 SketchCow Godane uploads a lot of video. A lot of everything, a saint.
14:17 🔗 SketchCow So, it's pretty usual that a bunch of tapes come out, and some of our more... special users post "reviews" like "I AM REQUESTING (OTHER SHOW) PLZ"
14:17 🔗 SketchCow Like, I get it, self-centered nerds. You post 50 live shows of The Blathering Blootz playing at various NJ nightclubs and someone "reviews" it going "DO YOU HAVE THEIR 1984 CBGB TAPE"
14:18 🔗 SketchCow But we got a guy
14:18 🔗 SketchCow And he's super into wanting a specific run of Nightlight
14:18 🔗 SketchCow Nightline
14:18 🔗 SketchCow And he has, for months, months mind you, posting a review on almost everything, demanding we post Nightlight from two specific years
14:18 🔗 SketchCow Reviews, like... reviews on everything uploaded by godane. Reviews on my uploads.
14:18 🔗 SketchCow And they started getting weird.
14:19 🔗 SketchCow Literally "Am I going to have to kill you, am I going to track you down and murder you if you don't post these tapes"
14:19 🔗 SketchCow So now he just upped the game
14:19 🔗 SketchCow He just wrote to a third-party archive AS me, and demanded things
14:19 🔗 SketchCow And now that archive is flipping out
14:19 🔗 SketchCow So now I have to step careful
14:20 🔗 SketchCow But it's very likely, thousands of items will go dark because of him
14:21 🔗 Igloo Can't we do something about *him*?
14:21 🔗 SketchCow I mean, I am
14:22 🔗 SketchCow Hence my tweet?
14:22 🔗 SketchCow Because now he's in my fuckin' backyard?
14:22 🔗 Igloo Forgive me, I don't follow you on twitter
14:22 🔗 SketchCow You're missing out
14:22 🔗 Igloo (Or even go on twitter all much)
14:22 🔗 SketchCow I'm the best thing since the thing that slices bread
14:23 🔗 eientei95 Igloo: https://twitter.com/textfiles/status/1130655750475472896
14:23 🔗 Igloo I shall follow
14:23 🔗 eientei95 SketchCannedBread
14:23 🔗 Fusl JAA: want to write a qwarc script for minecraftforum.net?
14:23 🔗 Fusl most of it was grabbed in my earlier run
14:23 🔗 Igloo I think we just warrior the minecraftforum
14:23 🔗 Fusl but there may be some new stuff
14:24 🔗 JAA Sure, when it goes read-only I guess?
14:24 🔗 Fusl i'm not giving them another bit of trust on this
14:24 🔗 Fusl last time we did, remember
14:24 🔗 Fusl 1/3rd of the entire forum vanished
14:24 🔗 JAA Right
14:25 🔗 Fusl Igloo: it's fucking huge, it took me weeks to get all forums archived with multiple grab-site jobs running across several systems
14:25 🔗 Fusl the html parsing of it is very cpu hungry
14:25 🔗 Igloo #shmimecraft ?
14:25 🔗 Fusl #cursed
14:26 🔗 Zerote has quit IRC (Read error: Operation timed out)
14:27 🔗 jrwr Why does that photo exist
14:27 🔗 jrwr Jason you mad man
14:27 🔗 Fusl :D
14:27 🔗 Fusl i love that photo :D
14:27 🔗 jrwr Did you see https://www.reddit.com/r/DataHoarder/comments/bn1f8j/introducing_datahoardercloud_a_new_standard_for/
14:34 🔗 jrwr I love people who put IPFS out there as a solution to stuff like this. I tried to use IPFS as a method to provide backups of a service I ran, holy shit the diskIO/CPU usage was insane for a 500GB collection of crap
14:39 🔗 enowaldo has joined #archiveteam-bs
14:41 🔗 deevious has quit IRC (Quit: deevious)
14:49 🔗 enowaldo has quit IRC (Ping timeout: 252 seconds)
14:54 🔗 Zerote has joined #archiveteam-bs
14:57 🔗 SketchCow Two responses.
14:57 🔗 SketchCow First, I really like the IPFS guy, as a person and as a thinker and as a dedication to his craft and goals.
14:57 🔗 SketchCow Second, the amount of secret bro money that drives a lot of tech and hides the true cost of items and usage that people then come to concoct "plans" against is utterly ridiculous.
14:58 🔗 SketchCow IPFS has a lot of inefficiencies because it's focused on being distributed and being outside of the, let's face it, law.
14:58 🔗 SketchCow Like, law WANTS centralization and two companies whose pencil necked geeks they can strong-arm into giving up the keys because brown people did something wrong
14:58 🔗 Oddly has quit IRC (Read error: Operation timed out)
14:59 🔗 SketchCow But I think that it's distribution first, cheapness second, and that will always be an issue.
14:59 🔗 SketchCow We've discovered that with internetarchive.bak and we'll discover it elsewhere.
15:09 🔗 enowaldo has joined #archiveteam-bs
15:40 🔗 enowaldo has quit IRC (Read error: Operation timed out)
15:59 🔗 tomaspark has quit IRC (Read error: Connection reset by peer)
16:03 🔗 Dj-Wawa has quit IRC (Quit: Connection closed for inactivity)
17:23 🔗 joshua_ is it possible to --no-offsitelinks a running ArchiveBot run without restarting it? I am concerned that given the volume of offsite links to a 2m+ post forum, ew0hhphkhlajc4w23hfr7e6km will not finish before the shutdown
17:25 🔗 enowaldo has joined #archiveteam-bs
17:29 🔗 schbirid mozilla is turning into Google Sunsets Everything #2
17:57 🔗 Lord_Nigh didn't google inherit the screenshots thing from when they acquired pocket?
17:57 🔗 Lord_Nigh er
17:57 🔗 Lord_Nigh mozilla
17:58 🔗 Lord_Nigh didn't mozilla inherit the screenshots thing from when they acquired pocket?
17:59 🔗 Lord_Nigh Maybe mozilla is trying to limit their liability by not hosting content created by users on their sync servers where they can be transferred from machine to machine, but that doesn't make sense since you could abuse the url/history sync database stuff to transfer data anyway
17:59 🔗 Lord_Nigh by filling the history with fake urls with data encoded as base64
17:59 🔗 Lord_Nigh in the url itself
17:59 🔗 Lord_Nigh i suppose screenshots allow it to be done more easily...
17:59 🔗 JAA joshua_: No, that's not possible.
18:00 🔗 SketchCow https://www.vice.com/en_us/article/a3x7qz/newly-surfaced-arcade-management-documents-from-the-1970s-predicted-a-wild-future-for-video-games
18:00 🔗 Lord_Nigh :o
18:00 🔗 Lord_Nigh *furiously reads*
18:00 🔗 * Lord_Nigh pokes Stiletto
18:00 🔗 SketchCow This was the stuff I was talking about
18:00 🔗 SketchCow I tweeted this
18:00 🔗 SketchCow Stay up on it
18:01 🔗 joshua_ JAA: ah, that's a shame. well, if it's looking sketch in the next few days, we might want to turn up the concurrency / turn down the delays on that job to intercept before the turndown
18:01 🔗 joshua_ (also the archive is getting huge already, and I am not sure how much of the actual forum it has already archived; is there a max size limit on any one archivebot job that we're worried about hitting?)
18:02 🔗 JAA Nope, no size limit. We've had jobs of several TB.
18:02 🔗 Lord_Nigh isn't the max size limit 'when the disk fills, at which point the job crashes/hangs and has to be manually unstuck to upload'?
18:02 🔗 JAA The data is uploaded to IA while the job's still running.
18:02 🔗 Lord_Nigh ah, i didn't realize that
18:02 🔗 joshua_ phew. I was reading the archivebot wiki page and it indicates disks of 100GB recommended or so
18:02 🔗 joshua_ yeah
18:03 🔗 JAA But yes, that is the limit. Specifically, if the log file or URL DB gets too large.
18:03 🔗 JAA And of course those jobs take forever.
18:04 🔗 Lord_Nigh ... does the code currently slow down the url pulling if the disk is dangerously full with a lot of data still pending to upload to IA? i.e. to let the uploadable-and-clearable parts of the disk get freed up
18:04 🔗 Lord_Nigh using the disk like a giant buffer
18:04 🔗 joshua_ it appears to be mirroring half of the known universe of some site's video archive
18:04 🔗 joshua_ well, no kill like overkill.
18:05 🔗 Lord_Nigh the url db can't do that since its a database, so the limit of total size would be when the urldb fills the whole disk leaving no space for files to be pulled
18:05 🔗 joshua_ (also, hi, Lord_Nigh, ltns)
18:05 🔗 Lord_Nigh the only way i can see to get around that would be to 'flush' the database when it gets too big AND a specific subdirectory on the server is completely pulled, but this would be complicated
18:06 🔗 Lord_Nigh hi joshua_ ! i don't think we've spoken in quite some time (I think the last time we talked was me asking if you still had a copy of the gnuboy SVN?)
18:06 🔗 Lord_Nigh which was ~10 years ago
18:07 🔗 Lord_Nigh how are things?
18:09 🔗 JAA Lord_Nigh: wpull will pause if it detects that there's less than 500 MiB of free disk space (by default, the exact value is customisable). However, that check only runs between URLs, so it can still happen that the disk fills up if it's grabbing huge files.
18:13 🔗 JAA The exact limit is a bit more complicated. There needs to be enough space also for the temporary files and the log file.
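A minimal sketch of the behaviour JAA describes: pause while free space is below a threshold, checking only between URLs. It assumes a simple sequential crawler. `crawl`, `fetch` and `wait_for_free_space` are hypothetical names, not wpull's API.

```python
import shutil
import time

MIN_FREE_BYTES = 500 * 1024 * 1024  # 500 MiB, the default threshold mentioned above


def wait_for_free_space(path, min_free=MIN_FREE_BYTES, poll_seconds=60.0):
    """Block until the filesystem holding `path` has at least `min_free`
    bytes free. Returns how many times it had to sleep."""
    waits = 0
    while shutil.disk_usage(path).free < min_free:
        waits += 1
        time.sleep(poll_seconds)
    return waits


def crawl(urls, fetch, path=".", min_free=MIN_FREE_BYTES):
    """Hypothetical sequential crawl loop."""
    for url in urls:
        # The check runs only here, between URLs: a single very large
        # response can still fill the disk while it is being downloaded.
        wait_for_free_space(path, min_free)
        fetch(url)
```

This makes the limitation easy to see. The guard never runs during `fetch`, so it cannot protect against one oversized file.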
18:15 🔗 Lord_Nigh hmm. for web servers which report the size of the file being retrieved, this could be used to ensure no single file will fill the entire rest of the disk...
18:15 🔗 Lord_Nigh but with many retrievals happening in parallel this is still a problem
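One way to handle the parallel case Lord_Nigh raises is to reserve each response's advertised Content-Length before downloading. All parallel downloads then draw from one shared budget. This is a hypothetical scheme for illustration, not something ArchiveBot or wpull is known to implement. Responses without a known size cannot be reserved up front.

```python
import threading


class DiskBudget:
    """Hypothetical shared disk budget for parallel downloads."""

    def __init__(self, free_bytes: int, safety_margin: int = 0):
        self._available = free_bytes - safety_margin
        self._lock = threading.Lock()

    def try_reserve(self, content_length):
        """Reserve space for a download; False if it would not fit."""
        if content_length is None:
            # Unknown size (e.g. chunked transfer): cannot reserve in advance
            return False
        with self._lock:
            if content_length > self._available:
                return False
            self._available -= content_length
            return True

    def release(self, reserved, actually_written):
        """After a download ends, return the part of the reservation
        that was not actually written to disk."""
        with self._lock:
            self._available += reserved - actually_written

    @property
    def available(self):
        with self._lock:
            return self._available
```

The lock matters. Without it, two workers could each see enough space and both proceed, which is exactly the parallel-retrieval problem.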
18:18 🔗 icedice has joined #archiveteam-bs
18:19 🔗 RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
18:29 🔗 Zerote_ has joined #archiveteam-bs
18:33 🔗 Zerote has quit IRC (Ping timeout: 252 seconds)
18:36 🔗 Oddly has joined #archiveteam-bs
18:50 🔗 Dj-Wawa has joined #archiveteam-bs
19:39 🔗 JAA Do we want a copy of the EU parliament's video library (committee meetings, plenary speeches, etc.)? Could be huge obviously.
19:39 🔗 JAA E.g. this: http://www.europarl.europa.eu/ep-live/en/plenary/
19:41 🔗 joshua_ Lord_Nigh, things are ok; work is work, etc. yeah, I am sad that the gbdev channel went full Nazi...
19:45 🔗 Sanqui joshua_: there's always the discord
19:46 🔗 Sanqui which only contains trace amounts of beware
19:46 🔗 joshua_ yeah, though, discord: "like IRC, but with a mandatory 600MB client RAM footprint!"
19:47 🔗 astrid other day i discovered ripcord https://cancel.fm/ripcord/
19:47 🔗 astrid #1 feature is "not made from a web browser"
19:49 🔗 joshua_ tempting.
19:49 🔗 Kaz I see discord using 256M, vs irccloud using 400M
19:49 🔗 Kaz but hey, mobile app /shrug
19:52 🔗 enowaldo_ has joined #archiveteam-bs
19:55 🔗 enowaldo has quit IRC (Ping timeout: 252 seconds)
19:59 🔗 JAA irssi: 31 MiB... Just saying...
20:01 🔗 astrid my weechat is taking 87 megs right now
20:01 🔗 astrid that's RSS; virtual size is about 300 megs
20:01 🔗 JAA Yeah, same here, 31 MiB RSS, 154 MiB VIRT
20:03 🔗 kiska thelounge is using 134M RSS, 1007M VIRT
20:04 🔗 JAA Server or client?
20:04 🔗 JAA Also, let's move this discussion to -ot.
20:06 🔗 godane JAA: i'm looking at eu parliament videos library
20:09 🔗 JAA Cheers. I know that VideoBot used to upload some videos from europarl, but that stopped last year, and I have no idea how complete it is.
20:09 🔗 godane JAA: code i used to grab 2008-09-01 date urls : curl -s http://www.europarl.europa.eu/ep-live/en/plenary/video?date=01-09-2008 | grep debate= | sed 's|.*href="|http://www.europarl.europa.eu|g' | sed 's|".*||g'
20:10 🔗 godane they have a day-month-year date in there urls
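The listing URLs take the date as day-month-year, so the per-day URLs for a range can be generated up front and fed to curl or a crawler. This is a sketch only. Whether every calendar day has plenary videos is not established here; empty days are not filtered out.

```python
from datetime import date, timedelta

BASE = "http://www.europarl.europa.eu/ep-live/en/plenary/video?date="


def daily_urls(start: date, end: date):
    """Yield one plenary video listing URL per day (inclusive), DD-MM-YYYY."""
    d = start
    while d <= end:
        yield BASE + d.strftime("%d-%m-%Y")
        d += timedelta(days=1)
```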
20:10 🔗 JAA I was going to just extract the VOD URLs from the ArchiveBot log for the europarl.europa.eu job.
20:10 🔗 JAA I ignored those earlier.
20:12 🔗 godane code to grab videos from the debate urls: curl -s http://www.europarl.europa.eu/ep-live/en/plenary/video?debate=1220281482063 | grep mp4 | sed 's|> <|>\n<|g' | sed 's|.*value="||g' | sed 's|".*||g'
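A stricter Python take on the grep/sed pipeline above. It assumes, as the sed expressions imply, that the video URLs sit in `value="…"` attributes. It keeps only values that mention mp4, in page order. The sample markup in the test is hypothetical.

```python
import re

VALUE_ATTR = re.compile(r'value="([^"]*)"')


def mp4_urls(html: str):
    """Return the contents of value="..." attributes that mention mp4."""
    return [v for v in VALUE_ATTR.findall(html) if "mp4" in v]
```

Unlike the line-based sed version, lines without a `value=` attribute cannot leak into the output.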
20:13 🔗 Coderjo has quit IRC (Ping timeout: 252 seconds)
20:13 🔗 godane i could use this code to start making daily dump videos of eu parliament video library
20:14 🔗 zerkalo has quit IRC (Read error: Operation timed out)
20:14 🔗 HashbangI has quit IRC (Read error: Operation timed out)
20:14 🔗 dxrt_ has quit IRC (Read error: Operation timed out)
20:14 🔗 jspiros has quit IRC (Read error: Operation timed out)
20:14 🔗 paul2520 has quit IRC (Write error: Broken pipe)
20:14 🔗 kiska1 has quit IRC (Write error: Broken pipe)
20:15 🔗 TigerbotH has quit IRC (Read error: Operation timed out)
20:15 🔗 PotcFdk has quit IRC (Read error: Operation timed out)
20:16 🔗 PhrackD has quit IRC (Read error: Operation timed out)
20:16 🔗 jspiros has joined #archiveteam-bs
20:16 🔗 kiska1 has joined #archiveteam-bs
20:16 🔗 qw3rty112 has joined #archiveteam-bs
20:16 🔗 godane this could be better cause then you could set the videos in order maybe : http://www.europarl.europa.eu/ep-live/en/plenary/search-by-date/results?date=01-09-2008&start=0
20:16 🔗 svchfoo3 sets mode: +o kiska1
20:17 🔗 paul2520 has joined #archiveteam-bs
20:18 🔗 PhrackD has joined #archiveteam-bs
20:19 🔗 qw3rty111 has quit IRC (Ping timeout: 600 seconds)
20:20 🔗 dxrt_ has joined #archiveteam-bs
20:20 🔗 dxrt sets mode: +o dxrt_
20:20 🔗 TigerbotH has joined #archiveteam-bs
20:20 🔗 systwi has quit IRC (Read error: Operation timed out)
20:24 🔗 HashbangI has joined #archiveteam-bs
20:24 🔗 godane JAA: better code : curl -s http://www.europarl.europa.eu/ep-live/en/plenary/video?date=02-09-2008 | grep 'li id=' | sed 's|.*li id="c|http://www.europarl.europa.eu/ep-live/en/plenary/video?debate=|g' | sed 's|".*||g' | grep ^http
20:25 🔗 systwi has joined #archiveteam-bs
20:25 🔗 godane one of videos didn't have a debate= url cause the first one is a selected video
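The improved pipeline maps each `<li id="c…">` entry on a day's listing to a `debate=` URL. The equivalent in Python is sketched below. List items whose id does not start with `c`, such as the selected video godane mentions, are simply skipped, much like the final `grep ^http`. The sample HTML in the test is made up.

```python
import re

DEBATE_BASE = "http://www.europarl.europa.eu/ep-live/en/plenary/video?debate="
LI_ID = re.compile(r'li id="c([^"]+)"')


def debate_urls(listing_html: str):
    """Turn <li id="c<debate id>"> entries into debate page URLs, in page order."""
    return [DEBATE_BASE + debate_id for debate_id in LI_ID.findall(listing_html)]
```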
20:25 🔗 phiresky1 JAA godane: no idea if this is useful, but there's this secondary website web.ep.streamovations.be that also has an index where you can get a list of all videos ever in a single request
20:25 🔗 Zerote_ has quit IRC (Read error: Connection reset by peer)
20:25 🔗 Zerote__ has joined #archiveteam-bs
20:27 🔗 phiresky1 Here's a script I used previously (that intentionally only gets the DASH urls for currently running european parliament live streams though): https://gist.github.com/phiresky/2312c9d6eb3f69067b61fa1e267867c3
20:27 🔗 Coderjo has joined #archiveteam-bs
20:28 🔗 zerkalo has joined #archiveteam-bs
20:29 🔗 RichardG has joined #archiveteam-bs
20:32 🔗 JAA Looks like there's a lot more content on https://multimedia.europarl.europa.eu/en/home also.
20:32 🔗 BartoCH has joined #archiveteam-bs
20:33 🔗 phiresky1 if i remember correctly, web.ep.streamovations.be has recordings you can't (easily?) find on www.europarl.europa.eu . but it's probably only the stuff that went through their live streaming system, not clips and stuff
20:34 🔗 Oddly has quit IRC (Read error: Operation timed out)
20:34 🔗 phiresky1 is now known as phiresky
20:44 🔗 PotcFdk has joined #archiveteam-bs
21:12 🔗 godane so to everyone here all the nightline and old abc, nbc, cbs news broadcasts i have are dark
21:13 🔗 godane i'm not going to bother uploading anymore cause of that
21:13 🔗 enowaldo_ has quit IRC (Read error: Operation timed out)
21:15 🔗 JAA :-(
21:15 🔗 JAA Don't let that jerk win.
21:25 🔗 enowaldo has joined #archiveteam-bs
21:26 🔗 Igloo Oh godane i am so sorry
21:27 🔗 ealgase godane: why did they go dark?
21:27 🔗 godane no i mean i'm not uploading anymore old broadcasts from Vanderbilt
21:27 🔗 ealgase did they send a DMCA?
21:28 🔗 godane cause there from Vanderbilt archive
21:28 🔗 ealgase so they sent a DMCA?
21:28 🔗 Igloo Ah, Not because of jerkhead?
21:29 🔗 godane that crazy guy emailed Vanderbilt as me and jason scott asking for episodes of nightline from what i got from the emails
21:29 🔗 ealgase ah
21:29 🔗 ealgase wtf
21:30 🔗 godane ^(10:19:36 AM) SketchCow: He just wrote to a third-party archive AS me, and demanded things
21:30 🔗 godane he has a boner for 2004-2005 Nightline episodes
21:31 🔗 Igloo Ah I see
21:31 🔗 ealgase so Vanderbilt blocked you?
21:31 🔗 ealgase I don't understand what happened really
21:31 🔗 Igloo Humans can be shitty
21:32 🔗 godane that crazy guy talked about ted koppel hair color a few times with passion
21:37 🔗 godane i grab the videos looking for m3u8 files in pages like this : https://tvnews.vanderbilt.edu/broadcasts/895918
21:38 🔗 ealgase I wonder if youtube-dl supports it
21:39 🔗 ealgase would probably make that faster for you
21:39 🔗 godane i then use something like this recently : youtube-dl --hls-prefer-native --hls-use-mpegts --fixup warn --keep-fragments -o $(basename $(dirname $m3u8url) .mp4).ts
21:39 🔗 godane yes it can
21:39 🔗 godane but use those commands cause mpegts will give back same checksum
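The `-o` template above names the output after the playlist's parent path component, minus a `.mp4` suffix, plus `.ts`. A Python equivalent is sketched below, along with the full argument list including the playlist URL itself. The example URL shape (`…/video123.mp4/index.m3u8`) is an assumption, and URLs with query strings are not handled.

```python
import posixpath


def ts_name(m3u8_url: str) -> str:
    """Equivalent of $(basename $(dirname $m3u8url) .mp4).ts"""
    name = posixpath.basename(posixpath.dirname(m3u8_url))
    # basename NAME SUFFIX strips the suffix unless it is the whole name
    if name.endswith(".mp4") and name != ".mp4":
        name = name[: -len(".mp4")]
    return name + ".ts"


def ytdl_command(m3u8_url: str):
    """Argument list for the youtube-dl invocation quoted above."""
    return [
        "youtube-dl",
        "--hls-prefer-native",
        "--hls-use-mpegts",
        "--fixup", "warn",
        "--keep-fragments",
        "-o", ts_name(m3u8_url),
        m3u8_url,
    ]
```

Writing raw MPEG-TS with the native HLS downloader avoids a remux step. Per godane, this is what makes repeated downloads come out with the same checksum.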
21:39 🔗 ealgase ah ok
21:39 🔗 ealgase anyway
21:40 🔗 ealgase so why were the items blocked on archive.org?
21:40 🔗 ealgase who blocked them?
21:40 🔗 godane it has to be blocked cause that crazy guy that bother me for 2004-2005 nightline episodes
21:41 🔗 ealgase ah
21:41 🔗 JAA I still don't understand why the guy can't be blocked instead.
21:41 🔗 ealgase wait, so you intentionally blocked them?
21:42 🔗 godane cause Vanderbilt doesn't want IA hosting them i guest
21:42 🔗 godane *guess
21:42 🔗 godane anyways i got to go
21:42 🔗 godane bbl
21:42 🔗 ealgase ah ok, i understand it now
21:42 🔗 ealgase (sorry for bothering you)
21:50 🔗 qwebirc78 has joined #archiveteam-bs
21:50 🔗 qwebirc78 Are we going to archive Ghostbin?
22:00 🔗 enowaldo has quit IRC (Ping timeout: 265 seconds)
22:02 🔗 qwebirc78 has left
22:03 🔗 godane anyways i'm looking for old japanese computer magazines in pdf/cbr/cbz format
22:04 🔗 godane a big upload happened a few months ago : https://archive.org/details/micomBASIC19841994
22:05 🔗 godane whats weird is there is another big 2 uploads of Oh! MZ and Oh! X
22:06 🔗 JAA EU election timetable:
22:06 🔗 JAA - Thursday, 23 May: UK, Netherlands
22:06 🔗 JAA - Friday, 24 May: Czech Republic, Ireland
22:06 🔗 JAA - Saturday, 25 May: Czech Republic, Slovakia, Latvia, Malta
22:06 🔗 JAA - Sunday, 26 May: all other countries
22:06 🔗 JAA So we should try to cover them in this order.
22:06 🔗 JAA I'm running betamax's list from the UK through ArchiveBot currently.
22:09 🔗 godane those 3 sets of pdfs all have date of feb 28 2014 which makes me think there from the same person or release group
22:09 🔗 godane based on pdfinfo command
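Comparing `pdfinfo` metadata across a set of files can be scripted. The sketch below assumes the text output of `pdfinfo file.pdf` has already been captured per file. It groups files by their `CreationDate` value. The filenames and dates in the test are hypothetical.

```python
from collections import defaultdict


def pdfinfo_field(pdfinfo_output: str, field: str = "CreationDate"):
    """Extract one 'Key: value' field from pdfinfo text output, or None."""
    prefix = field + ":"
    for line in pdfinfo_output.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


def group_by_field(outputs: dict, field: str = "CreationDate"):
    """Map each field value to the list of files that share it."""
    groups = defaultdict(list)
    for filename, text in outputs.items():
        groups[pdfinfo_field(text, field)].append(filename)
    return dict(groups)
```

Grouping on the full timestamp is strict. Truncating the value to the date would match godane's looser "same day" observation.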
22:23 🔗 enowaldo has joined #archiveteam-bs
22:24 🔗 godane SketchCow: did we grab Afterhoursdjs.org : http://95.46.199.251/
22:25 🔗 godane cause its closing at end of the month: https://old.reddit.com/r/opendirectories/comments/bquq60/afterhoursdjsorg_liveset_recordings_mostly_2000s/
22:25 🔗 astrid omg
22:26 🔗 ealgase godane: #archivebot ?
22:26 🔗 godane i figure it was too big for archivebot
22:26 🔗 JAA Yep, probably.
22:26 🔗 ealgase nah
22:27 🔗 JAA Or at least AB won't like it.
22:27 🔗 ealgase anyway, keep in mind "There's a copy of these livesets on archive.org already, but it's a little trickier to bulk-download from there"
22:28 🔗 astrid ohh
22:29 🔗 ealgase so it's not like this stuff would be lost
22:29 🔗 astrid i did not know about this :P https://archive.org/details/afterhoursdjs_livesets
22:29 🔗 JAA It's actually easier to bulk-download from IA than a recursive wget.
22:30 🔗 JAA ia download --search='collection:afterhoursdjs_livesets'
22:31 🔗 Despatche has quit IRC (Ping timeout: 255 seconds)
22:49 🔗 JAA "The following text is what triggered our spam filter: loan"
22:49 🔗 JAA PLS
22:50 🔗 JAA The offending link: http://www.karinegloanecmaurin.eu , website of Karine Gloanec Maurin, French member of the EU Parliament
22:53 🔗 betamax JAA: fyi, I've just put a large list of tweets (~100MB, 1.75 million tweets) coming from twitter accounts owned by UK MEP candidates, into archivebot
22:54 🔗 JAA betamax: Nice, thanks!
22:55 🔗 betamax I put it in as a single list, if you think it would be better as multiple smaller lists, feel free to split it up and do that instead (you're much more familiar with the archivebot system)
22:55 🔗 JAA We'll see how it handles that.
22:56 🔗 JAA Should probably be fine.
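If a list of this size ever did need splitting, a simple line-based chunker would do. `split_lines` and `write_chunks` are hypothetical helpers. The chunk size and the output filenames are arbitrary choices, not ArchiveBot requirements.

```python
from pathlib import Path


def split_lines(lines, chunk_size):
    """Split a list of URLs into consecutive chunks of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def write_chunks(lines, chunk_size, out_dir, stem="tweets"):
    """Write each chunk to its own numbered text file, one URL per line."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, chunk in enumerate(split_lines(lines, chunk_size)):
        p = out / f"{stem}-{n:04d}.txt"
        p.write_text("\n".join(chunk) + "\n")
        paths.append(p)
    return paths
```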
22:57 🔗 betamax I assume it's safe to "!yahoo" that job and twitter will handle the load?
22:57 🔗 JAA I already changed the settings.
22:57 🔗 JAA (To more than what !yahoo does.)
22:57 🔗 JAA But yes, Twitter's fine with a high request rate.
22:58 🔗 betamax Ah, excellent! (And btw, all those tweets were obtained with snscrape, which zoomed through the list on a fast 1Gb/sec up/down connection)
23:00 🔗 JAA :-)
23:00 🔗 JAA How many scrapes were you running concurrently?
23:01 🔗 JAA Also, I hope to implement https://github.com/JustAnotherArchivist/snscrape/issues/34 soonish, which should make it even faster.
23:02 🔗 betamax None! All done one after another (but on a very fast server - 64 core Xeon, 400GB RAM)
23:02 🔗 JAA Heh, cool.
23:03 🔗 JAA Though snscrape's single-threaded, so the cores won't matter.
23:05 🔗 betamax Oh, I was wondering if there was a way to limit snscrape's twitter-user scrape to only include tweets after a certain date (e.g: so I can run another scrape in, say, a weeks time and get everything from now until after the election)
23:06 🔗 JAA Yeah, two ways to do that: --since option or using twitter-search directly instead of twitter-user.
23:06 🔗 JAA I consider the latter more reliable, but --since should probably work fine for Twitter.
23:07 🔗 JAA The syntax for the search would be something like this: snscrape twitter-search 'from:username since:2019-01-01'
23:07 🔗 betamax Thanks!
23:07 🔗 enowaldo has quit IRC (Read error: Operation timed out)
23:07 🔗 JAA And it's more reliable because it does the filtering on the server side.
23:07 🔗 JAA --since iterates over results until it finds a result that is older than the specified datetime.
23:08 🔗 JAA Twitter seems to reliably return results in reverse-chronological order, but Instagram hashtag searches for example are very unreliable in that regard since there are sometimes old results at the end of a page.
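The difference JAA describes can be modelled in a few lines. A client-side cutoff like `--since` stops at the first result older than the cutoff. It is only correct if results really arrive newest first. The server-side search query filters before anything is returned. `take_since` is a simplified model for illustration, not snscrape's code. The query string follows the syntax JAA gives above.

```python
from datetime import datetime


def take_since(results, since: datetime):
    """Model of a --since style cutoff: consume (timestamp, item) pairs,
    assumed newest first, and stop at the first one older than `since`."""
    kept = []
    for ts, item in results:
        if ts < since:
            break
        kept.append(item)
    return kept


def search_query(username: str, since: datetime) -> str:
    """Server-side alternative: let Twitter search do the date filtering."""
    return f"from:{username} since:{since.strftime('%Y-%m-%d')}"
```

With out-of-order results, like the Instagram hashtag case, the cutoff fires early. Newer items after a stray old one are silently missed.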
23:10 🔗 BlueMax has joined #archiveteam-bs
23:28 🔗 icedice has quit IRC (Quit: Leaving)
23:45 🔗 icedice has joined #archiveteam-bs
23:50 🔗 enowaldo has joined #archiveteam-bs
