[00:03] *** Jens has quit IRC (Remote host closed the connection)
[00:04] *** Jens has joined #archiveteam-bs
[00:07] *** vectr0n has quit IRC (ZNC - https://znc.in)
[00:24] *** vectr0n has joined #archiveteam-bs
[00:29] *** vectr0n has quit IRC (ZNC - https://znc.in)
[00:37] *** vectr0n has joined #archiveteam-bs
[00:40] *** vectr0n has quit IRC (Client Quit)
[00:43] *** BlueMax has joined #archiveteam-bs
[01:48] *** vectr0n has joined #archiveteam-bs
[02:38] *** Selavi has quit IRC (Read error: Connection reset by peer)
[02:38] *** superkuh has quit IRC (Read error: Operation timed out)
[02:38] *** Pixi has joined #archiveteam-bs
[02:39] *** superkuh has joined #archiveteam-bs
[02:39] *** ivan has quit IRC (Read error: Operation timed out)
[02:39] *** zyphlar has quit IRC (Read error: Operation timed out)
[02:40] *** jspiros has quit IRC (Read error: Operation timed out)
[02:40] *** Petri152 has quit IRC (Read error: Operation timed out)
[02:40] *** JAA has quit IRC (Read error: Operation timed out)
[02:40] *** wabu has quit IRC (Read error: Operation timed out)
[02:40] *** Stilett0 has quit IRC (Read error: Operation timed out)
[02:40] *** Stilett0 has joined #archiveteam-bs
[02:41] *** Pixi` has quit IRC (Read error: Operation timed out)
[02:42] *** ivan has joined #archiveteam-bs
[02:43] *** svchfoo3 sets mode: +o ivan
[02:44] *** wp494 has quit IRC (Read error: Operation timed out)
[02:44] *** wp494 has joined #archiveteam-bs
[02:54] *** Selavi has joined #archiveteam-bs
[03:08] *** ta9le has quit IRC (Quit: Connection closed for inactivity)
[03:16] *** m007a83 has quit IRC (Quit: Leaving)
[03:33] *** rcunning_ has quit IRC (Connection closed for inactivity)
[03:40] *** JAA has joined #archiveteam-bs
[03:40] *** swebb sets mode: +o JAA
[03:40] *** bakJAA sets mode: +o JAA
[03:40] *** Petri152 has joined #archiveteam-bs
[03:40] *** wabu has joined #archiveteam-bs
[03:41] *** zyphlar has joined #archiveteam-bs
[03:42] *** archodg_ has joined #archiveteam-bs
[03:43] *** archodg has quit IRC (Read error: Operation timed out)
[03:44] *** jspiros has joined #archiveteam-bs
[03:45] *** odemg has quit IRC (Ping timeout: 268 seconds)
[03:57] *** m007a83 has joined #archiveteam-bs
[03:57] *** odemg has joined #archiveteam-bs
[05:27] *** cf has left Bye
[05:28] *** cf has joined #archiveteam-bs
[05:28] *** cf has left Bye.
[05:45] *** Pixi` has joined #archiveteam-bs
[05:47] *** Pixi has quit IRC (west.us.hub irc.Prison.NET)
[05:47] *** achip has quit IRC (west.us.hub irc.Prison.NET)
[05:47] *** Mateon1 has quit IRC (west.us.hub irc.Prison.NET)
[06:18] *** Mateon1 has joined #archiveteam-bs
[06:18] *** achip has joined #archiveteam-bs
[07:00] *** BlueMaxim has joined #archiveteam-bs
[07:04] *** dxrt- has joined #archiveteam-bs
[07:04] *** dxrt has quit IRC (ZNC - http://znc.sourceforge.net)
[07:08] *** BlueMax has quit IRC (Ping timeout: 604 seconds)
[07:08] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[07:09] *** BlueMax has joined #archiveteam-bs
[07:26] *** schbirid has joined #archiveteam-bs
[08:37] *** wp494 has quit IRC (Read error: Operation timed out)
[08:39] *** wp494 has joined #archiveteam-bs
[08:48] *** dxrt- is now known as dxrt
[08:49] *** dxrt has quit IRC (Quit: ZNC - http://znc.sourceforge.net)
[08:50] *** dxrt has joined #archiveteam-bs
[08:56] *** Mateon1 has quit IRC (Ping timeout: 255 seconds)
[08:56] *** Mateon1 has joined #archiveteam-bs
[09:10] so dtic.mil is now not funny anymore
[09:11] just trying to upload a file will cause a 403 error to dtic.mil
[09:11] cause i try to scrape metadata from their website so the files can have metadata
[09:33] *** jschwart has joined #archiveteam-bs
[09:37] one possible theory is when i'm curling the metadata it causes a 403 error cause i don't have firefox as user-agent
[09:38] it's a best guess cause i remember i could download pdfs with Firefox as user-agent also
[09:51] Archivebot it?
[10:13] ok i can access the website again
[10:16] i think i got it working again
[10:17] it was just that the 403 error blocking made no sense for the amount i was grabbing
[10:18] cause someone just browsing the website could get blocked based on the fact that i was like 1 url being scraped 6 times
[10:21] one of the newer ones: https://archive.org/details/DTIC_ADA497001
[10:21] i have been lacking in uploading those this month cause i have tapes to digitize and upload
[10:25] *** BlueMax has quit IRC (Quit: Leaving)
[10:54] *** VoynichCr has joined #archiveteam-bs
[10:55] has anyone thought about archiving all youtube metadata?
[10:56] and maybe some frames or the main thumb
[11:04] 07:42 < archodg_> SketchCow, arkiver I'm working on this, https://old.reddit.com/r/DataHoarder/comments/906884/youtube_metadata_archive_because_working_with/ something that
[11:04] er, sorry for the highlights
[11:04] heh
[11:08] it's going well, I'm up to 600,500,000+ video ids
[11:11] amazing
[11:12] looks heavily biased to french language videos
[11:35] *** ta9le has joined #archiveteam-bs
[11:49] fenn, yeah that was the test file I was using from a french guy on the-eyes discord, the other lists I'm working with are 99% english
[13:11] *** REiN^ has joined #archiveteam-bs
[13:25] *** plue has quit IRC (Remote host closed the connection)
[13:28] *** REiN^ has quit IRC (Read error: Connection reset by peer)
[13:30] *** REiN^ has joined #archiveteam-bs
[14:34] To the person editing imgur's page, please replace the source with this: https://www.reddit.com/r/patreon/comments/7x4wx1
[14:43] kiska: Thanks. I knew there was a better link out there but couldn't find it.
[14:58] *** plue has joined #archiveteam-bs
[14:58] *** plue has quit IRC (Client Quit)
[14:58] *** plue has joined #archiveteam-bs
[15:15] *** m007a83 has quit IRC (Leaving)
[15:17] Thanks JAA
[15:17] btw JAA it was the topic for #imgone
[15:24] *** Mateon1 has quit IRC (Remote host closed the connection)
[15:24] *** Mateon1 has joined #archiveteam-bs
[15:28] Whoops
[15:34] *** m007a83 has joined #archiveteam-bs
[16:39] *** achip has quit IRC (west.us.hub irc.Prison.NET)
[16:55] Oh FFS, Twitter's new site also uses that awful scrolling thing where off-screen elements are removed from the DOM. Sigh.
[17:01] They also nuked the non-JS mobile site, mobile.twitter.com.
[17:01] Unless that's now UA-dependent or something.
[17:04] Ah, it needs a cookie. You get asked whether you want the legacy site when you access mobile.twitter.com without JS, which then sets the relevant cookie(s). Afterwards, it serves you the non-JS page.
[17:13] it might be interesting to design a generic mitigation that no-ops the removal of DOM elements that are not in the viewport
[17:16] *** achip has joined #archiveteam-bs
[17:56] *** SoniEx2 has quit IRC (Ping timeout: 264 seconds)
[17:56] *** vectr0n_ has joined #archiveteam-bs
[18:02] *** vectr0n has quit IRC (Read error: Operation timed out)
[18:02] *** vectr0n_ is now known as vectr0n
[18:08] *** SoniEx2 has joined #archiveteam-bs
[18:29] *** SoniEx2 has quit IRC (Ping timeout: 360 seconds)
[18:44] *** SoniEx2 has joined #archiveteam-bs
[19:03] JAA: was already UA dependent iirc, i used an older opera mobile UA
[19:59] *** SoniEx2 has quit IRC (Ping timeout: 264 seconds)
[20:12] *** SoniEx2 has joined #archiveteam-bs
[20:22] schbirid: I'm pretty sure I was able to access it without a special UA with Firefox on Linux previously. As in, a few months ago or so.
[20:27] I'm scraping various sources for TalkTalk sites currently. Haven't quite figured out yet what to do with the pages I find though. Maybe I'll just !a < them.
[20:28] Bing appears to be fairly scraping-friendly. At least they don't insta-ban you like many other services if you use a reasonable delay between requests.
[20:33] *** betamax has joined #archiveteam-bs
[21:13] *** m007a83 has quit IRC (Leaving)
[22:52] Does anyone have any search term suggestions for TalkTalk? So far, I've searched for a plain site:talktalk.net and together with these terms: family history, genealogy, club, society, clan, company. That yielded 1243 websites through Bing. There must be more though.
[22:57] *** jut has joined #archiveteam-bs
[23:10] *** BlueMax has joined #archiveteam-bs
[23:41] I propose a warrior project for grabbing steam profiles. With the bans constantly sweeping over if we grab the numerical profiles?
[23:47] *** m007a83 has joined #archiveteam-bs
[23:48] *** achip has quit IRC (west.us.hub irc.Prison.NET)