Time |
Nickname |
Message |
00:03
|
|
Jens has quit IRC (Remote host closed the connection) |
00:04
|
|
Jens has joined #archiveteam-bs |
00:07
|
|
vectr0n has quit IRC (ZNC - https://znc.in) |
00:24
|
|
vectr0n has joined #archiveteam-bs |
00:29
|
|
vectr0n has quit IRC (ZNC - https://znc.in) |
00:37
|
|
vectr0n has joined #archiveteam-bs |
00:40
|
|
vectr0n has quit IRC (Client Quit) |
00:43
|
|
BlueMax has joined #archiveteam-bs |
01:48
|
|
vectr0n has joined #archiveteam-bs |
02:38
|
|
Selavi has quit IRC (Read error: Connection reset by peer) |
02:38
|
|
superkuh has quit IRC (Read error: Operation timed out) |
02:38
|
|
Pixi has joined #archiveteam-bs |
02:39
|
|
superkuh has joined #archiveteam-bs |
02:39
|
|
ivan has quit IRC (Read error: Operation timed out) |
02:39
|
|
zyphlar has quit IRC (Read error: Operation timed out) |
02:40
|
|
jspiros has quit IRC (Read error: Operation timed out) |
02:40
|
|
Petri152 has quit IRC (Read error: Operation timed out) |
02:40
|
|
JAA has quit IRC (Read error: Operation timed out) |
02:40
|
|
wabu has quit IRC (Read error: Operation timed out) |
02:40
|
|
Stilett0 has quit IRC (Read error: Operation timed out) |
02:40
|
|
Stilett0 has joined #archiveteam-bs |
02:41
|
|
Pixi` has quit IRC (Read error: Operation timed out) |
02:42
|
|
ivan has joined #archiveteam-bs |
02:43
|
|
svchfoo3 sets mode: +o ivan |
02:44
|
|
wp494 has quit IRC (Read error: Operation timed out) |
02:44
|
|
wp494 has joined #archiveteam-bs |
02:54
|
|
Selavi has joined #archiveteam-bs |
03:08
|
|
ta9le has quit IRC (Quit: Connection closed for inactivity) |
03:16
|
|
m007a83 has quit IRC (Quit: Leaving) |
03:33
|
|
rcunning_ has quit IRC (Connection closed for inactivity) |
03:40
|
|
JAA has joined #archiveteam-bs |
03:40
|
|
swebb sets mode: +o JAA |
03:40
|
|
bakJAA sets mode: +o JAA |
03:40
|
|
Petri152 has joined #archiveteam-bs |
03:40
|
|
wabu has joined #archiveteam-bs |
03:41
|
|
zyphlar has joined #archiveteam-bs |
03:42
|
|
archodg_ has joined #archiveteam-bs |
03:43
|
|
archodg has quit IRC (Read error: Operation timed out) |
03:44
|
|
jspiros has joined #archiveteam-bs |
03:45
|
|
odemg has quit IRC (Ping timeout: 268 seconds) |
03:57
|
|
m007a83 has joined #archiveteam-bs |
03:57
|
|
odemg has joined #archiveteam-bs |
05:27
|
|
cf has left Bye |
05:28
|
|
cf has joined #archiveteam-bs |
05:28
|
|
cf has left Bye. |
05:45
|
|
Pixi` has joined #archiveteam-bs |
05:47
|
|
Pixi has quit IRC (west.us.hub irc.Prison.NET) |
05:47
|
|
achip has quit IRC (west.us.hub irc.Prison.NET) |
05:47
|
|
Mateon1 has quit IRC (west.us.hub irc.Prison.NET) |
06:18
|
|
Mateon1 has joined #archiveteam-bs |
06:18
|
|
achip has joined #archiveteam-bs |
07:00
|
|
BlueMaxim has joined #archiveteam-bs |
07:04
|
|
dxrt- has joined #archiveteam-bs |
07:04
|
|
dxrt has quit IRC (ZNC - http://znc.sourceforge.net) |
07:08
|
|
BlueMax has quit IRC (Ping timeout: 604 seconds) |
07:08
|
|
BlueMaxim has quit IRC (Read error: Operation timed out) |
07:09
|
|
BlueMax has joined #archiveteam-bs |
07:26
|
|
schbirid has joined #archiveteam-bs |
08:37
|
|
wp494 has quit IRC (Read error: Operation timed out) |
08:39
|
|
wp494 has joined #archiveteam-bs |
08:48
|
|
dxrt- is now known as dxrt |
08:49
|
|
dxrt has quit IRC (Quit: ZNC - http://znc.sourceforge.net) |
08:50
|
|
dxrt has joined #archiveteam-bs |
08:56
|
|
Mateon1 has quit IRC (Ping timeout: 255 seconds) |
08:56
|
|
Mateon1 has joined #archiveteam-bs |
09:10
|
godane |
so dtic.mil is now not funny anymore |
09:11
|
godane |
just trying to upload a file will cause a 403 error from dtic.mil
09:11
|
godane |
cause i try to scrape metadata from their website so the files can have metadata
09:33
|
|
jschwart has joined #archiveteam-bs |
09:37
|
godane |
one possible theory is that when i'm curling the metadata it causes a 403 error because i don't have firefox as the user-agent
09:38
|
godane |
it's a best guess, cause i remember i could also download pdfs with Firefox as the user-agent
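A minimal sketch of godane's workaround, assuming his theory that dtic.mil filters on User-Agent is right (the log never confirms it). It builds a request that replaces urllib's default `Python-urllib/x.y` User-Agent with a Firefox-style one; the exact UA string is an illustrative assumption. With curl itself, the equivalent is the `-A`/`--user-agent` option.

```python
import urllib.request

# Illustrative Firefox-style UA string (an assumption, not one known to work on dtic.mil).
FIREFOX_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0"


def build_request(url: str, user_agent: str = FIREFOX_UA) -> urllib.request.Request:
    """Build a GET request that overrides urllib's default User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Fetch a URL with the Firefox-style UA; a 403 still raises urllib.error.HTTPError."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

If the 403s continue even with a browser UA, that points at rate-based blocking rather than UA filtering.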
09:51
|
Flashfire |
ArchiveBot it?
10:13
|
godane |
ok i can access the website again
10:16
|
godane |
i think i got it working again |
10:17
|
godane |
it was just that the 403 blocking made no sense for the amount i was grabbing
10:18
|
godane |
cause someone just browsing the website could get blocked, given that i was like 1 url being scraped 6 times
10:21
|
godane |
one of the newer ones: https://archive.org/details/DTIC_ADA497001 |
10:21
|
godane |
i have been behind on uploading those this month cause i have tapes to digitize and upload
10:25
|
|
BlueMax has quit IRC (Quit: Leaving) |
10:54
|
|
VoynichCr has joined #archiveteam-bs |
10:55
|
VoynichCr |
has anyone thought about archiving all youtube metadata?
10:56
|
VoynichCr |
and maybe some frames or the main thumb |
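A sketch of the "main thumb" part, assuming video IDs are the usual 11 characters drawn from letters, digits, `_` and `-`, and that thumbnails are served at the commonly seen `https://i.ytimg.com/vi/<id>/<quality>.jpg` pattern. Neither is a documented guarantee, and higher-resolution variants such as `maxresdefault` do not exist for every video.

```python
import re

# Assumption: video IDs are 11 characters from [A-Za-z0-9_-].
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def thumbnail_url(video_id: str, quality: str = "hqdefault") -> str:
    """Return the conventional thumbnail URL for a video ID (URL pattern is an assumption)."""
    if not VIDEO_ID_RE.fullmatch(video_id):
        raise ValueError(f"not a valid-looking video id: {video_id!r}")
    return f"https://i.ytimg.com/vi/{video_id}/{quality}.jpg"
```

Validating IDs first keeps malformed lines in a large ID list from turning into junk requests.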
11:04
|
fenn |
07:42 < archodg_> SketchCow, arkiver I'm working on this, https://old.reddit.com/r/DataHoarder/comments/906884/youtube_metadata_archive_because_working_with/ something that |
11:04
|
fenn |
er, sorry for the highlights |
11:04
|
archodg_ |
heh |
11:08
|
archodg_ |
it's going well, I'm up to 600,500,000+ video ids
11:11
|
VoynichCr |
amazing |
11:12
|
fenn |
looks heavily biased towards French-language videos
11:35
|
|
ta9le has joined #archiveteam-bs |
11:49
|
odemg |
fenn, yeah that was the test file I was using, from a French guy on the-eye's Discord; the other lists I'm working with are 99% English
13:11
|
|
REiN^ has joined #archiveteam-bs |
13:25
|
|
plue has quit IRC (Remote host closed the connection) |
13:28
|
|
REiN^ has quit IRC (Read error: Connection reset by peer) |
13:30
|
|
REiN^ has joined #archiveteam-bs |
14:34
|
kiska |
To the person editing imgur's page, please replace the source with this: https://www.reddit.com/r/patreon/comments/7x4wx1 |
14:43
|
JAA |
kiska: Thanks. I knew there was a better link out there but couldn't find it. |
14:58
|
|
plue has joined #archiveteam-bs |
14:58
|
|
plue has quit IRC (Client Quit) |
14:58
|
|
plue has joined #archiveteam-bs |
15:15
|
|
m007a83 has quit IRC (Leaving) |
15:17
|
kiska |
Thanks JAA |
15:17
|
kiska |
btw JAA it was the topic for #imgone |
15:24
|
|
Mateon1 has quit IRC (Remote host closed the connection) |
15:24
|
|
Mateon1 has joined #archiveteam-bs |
15:28
|
JAA |
Whoops |
15:34
|
|
m007a83 has joined #archiveteam-bs |
16:39
|
|
achip has quit IRC (west.us.hub irc.Prison.NET) |
16:55
|
JAA |
Oh FFS, Twitter's new site also uses that awful scrolling thing where off-screen elements are removed from the DOM. Sigh. |
17:01
|
JAA |
They also nuked the non-JS mobile site, mobile.twitter.com. |
17:01
|
JAA |
Unless that's now UA-dependent or something. |
17:04
|
JAA |
Ah, it needs a cookie. You get asked whether you want the legacy site when you access mobile.twitter.com without JS, which then sets the relevant cookie(s). Afterwards, it serves you the non-JS page. |
17:13
|
ivan |
it might be interesting to design a generic mitigation that no-ops the removal of DOM elements that are not in the viewport |
17:16
|
|
achip has joined #archiveteam-bs |
17:56
|
|
SoniEx2 has quit IRC (Ping timeout: 264 seconds) |
17:56
|
|
vectr0n_ has joined #archiveteam-bs |
18:02
|
|
vectr0n has quit IRC (Read error: Operation timed out) |
18:02
|
|
vectr0n_ is now known as vectr0n |
18:08
|
|
SoniEx2 has joined #archiveteam-bs |
18:29
|
|
SoniEx2 has quit IRC (Ping timeout: 360 seconds) |
18:44
|
|
SoniEx2 has joined #archiveteam-bs |
19:03
|
schbirid |
JAA: it was already UA-dependent iirc, i used an older Opera Mobile UA
19:59
|
|
SoniEx2 has quit IRC (Ping timeout: 264 seconds) |
20:12
|
|
SoniEx2 has joined #archiveteam-bs |
20:22
|
JAA |
schbirid: I'm pretty sure I was able to access it without a special UA with Firefox on Linux previously. As in, a few months ago or so. |
20:27
|
JAA |
I'm scraping various sources for TalkTalk sites currently. Haven't quite figured out yet what to do with the pages I find though. Maybe I'll just !a < them. |
20:28
|
JAA |
Bing appears to be fairly scraping-friendly. At least they don't insta-ban you like many other services if you use a reasonable delay between requests. |
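The "reasonable delay between requests" JAA mentions can be sketched as a throttle that enforces a minimum interval between successive requests. The 5-second default is an arbitrary assumption, not a known Bing limit; the clock and sleep functions are injectable so the behaviour can be checked without actually waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval: float = 5.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request, if any

    def wait(self) -> None:
        """Sleep just long enough that requests are at least min_interval apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Usage: create one `Throttle` per target host and call `throttle.wait()` immediately before each search request.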
20:33
|
|
betamax has joined #archiveteam-bs |
21:13
|
|
m007a83 has quit IRC (Leaving) |
22:52
|
JAA |
Does anyone have any search term suggestions for TalkTalk? So far, I've searched for a plain site:talktalk.net and together with these terms: family history, genealogy, club, society, clan, company. That yielded 1243 websites through Bing. There must be more though. |
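One plausible way to turn those searches into a count of distinct websites (the log doesn't say how the 1243 figure was derived): issue the plain `site:talktalk.net` query plus one query per extra term, then reduce every result URL to its hostname. This assumes each TalkTalk site lives on its own subdomain; path-based sites would need the path included in the key. Fetching and parsing the Bing result pages is left out.

```python
from urllib.parse import urlsplit

# The extra terms listed above; add new suggestions here.
TERMS = ("family history", "genealogy", "club", "society", "clan", "company")


def build_queries(site: str = "talktalk.net", terms: tuple[str, ...] = TERMS) -> list[str]:
    """The plain site: query first, then one query per additional search term."""
    base = f"site:{site}"
    return [base] + [f"{base} {term}" for term in terms]


def unique_sites(urls: list[str], site: str = "talktalk.net") -> set[str]:
    """Reduce result URLs to the distinct hostnames on (or under) the given domain."""
    hosts = set()
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        # Require an exact match or a real subdomain, so e.g. "notalktalk.net" is excluded.
        if host == site or host.endswith("." + site):
            hosts.add(host)
    return hosts
```

Merging the per-query host sets also shows how many new sites each additional term contributes, which helps judge whether further terms are worth trying.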
22:57
|
|
jut has joined #archiveteam-bs |
23:10
|
|
BlueMax has joined #archiveteam-bs |
23:41
|
Flashfire |
I propose a Warrior project for grabbing Steam profiles. With the bans constantly sweeping over, what if we grab the numerical profiles?
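For reference, the numerical profiles live at `https://steamcommunity.com/profiles/<SteamID64>`, and an individual account's SteamID64 is 76561197960265728 plus its 32-bit account ID. The sketch below generates one batch of profile URLs, e.g. as a work item for the proposed Warrior project; the start/count batching is a hypothetical design, not an existing project.

```python
# SteamID64 of an individual (user) account = this base + its 32-bit account ID.
STEAMID64_BASE = 76561197960265728
PROFILE_URL = "https://steamcommunity.com/profiles/{}"


def profile_urls(start: int, count: int) -> list[str]:
    """Return numeric profile URLs for `count` consecutive account IDs starting at `start`."""
    if start < 0 or count < 0 or start + count > 2**32:
        raise ValueError("account IDs must be in the 32-bit range [0, 2**32)")
    return [PROFILE_URL.format(STEAMID64_BASE + account_id)
            for account_id in range(start, start + count)]
```

Numeric URLs stay valid when a user changes or drops their custom vanity URL, which is what makes them suitable for exhaustive enumeration.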
23:47
|
|
m007a83 has joined #archiveteam-bs |
23:48
|
|
achip has quit IRC (west.us.hub irc.Prison.NET) |